Admin, can I play with XML please?

So you learnt about the beauty of XML, and you want to use it. You scan the web to find out how to do that. You find half-baked Internet Explorer solutions like Data Islands, and loads of XSLT parsers and Apache add-ons that help you use XML.

Great, all you need to do is patch and upgrade your server and the world of content and display separated web development is open for you to explore.

You read the specifications of the various tools (Cocoon, Sablotron, Saxon, Xalan, Xerces and the whole lot) and mail the tech support of your web space provider, asking them to install these for you so you can XML-enable your site.

Only to get an email back stating "sorry, we don't do that".

The end of the road? Not really. There is a quick and dirty way to use XML to generate your pages, using two really easy PHP functions and some clever HTML.

This solution has some drawbacks, which are explained at the end of this article. However, with a bit of clever XML structure, you can do a lot without getting your head around XPath, XSLT and server administration.

PHP untag() to the rescue

The main PHP function is this:

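function untag($string,$tag,$mode){
    // collect all matches in an array first
    $tmpval=array();
    // "\/" escapes the slash, because "/" is also the pattern delimiter
    $preg="/<".$tag.">(.*?)<\/".$tag.">/si";
    preg_match_all($preg,$string,$tags);
    foreach ($tags[1] as $tmpcont){
        $tmpval[]=$tmpcont;
    }
    // mode 1 returns the array, any other mode one concatenated string
    if ($mode==1){return $tmpval;}
    else {return implode("",$tmpval);}
}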

It extracts the content of every tag "$tag" found in the string "$string". When "$mode" is 1, it returns the matches as an array; otherwise it returns them joined into one string.
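To make the two modes tangible, here is a small sketch. The function is repeated in a compact form so the snippet runs on its own, and the sample XML is made up for illustration; it is not taken from the demo files.

```php
<?php
// Compact copy of untag(); "\/" escapes the slash because "/" is the delimiter.
function untag($string, $tag, $mode) {
    preg_match_all("/<" . $tag . ">(.*?)<\/" . $tag . ">/si", $string, $tags);
    return ($mode == 1) ? $tags[1] : implode("", $tags[1]);
}

// Hypothetical sample data.
$xml = "<page><item><headline>First</headline></item>"
     . "<item><headline>Second</headline></item></page>";

$asArray  = untag($xml, "headline", 1); // one array entry per <headline>
$asString = untag($xml, "headline", 0); // all matches joined into one string

print_r($asArray);
echo $asString, "\n";
```

Mode 1 is the one to use when you want to loop over several elements; mode 0 is handy when you expect exactly one match, such as the headline of a single item.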

No rocket science, just another example of what regular expressions can do for you.

And that is all you want to do: extract the content from the XML tags.

With XSLT you can do that a lot better, but for smaller solutions, and with a bit of PHP knowledge, a lot can be done.

Now, how to use that? Let's say you want to create a news page on your site and you want to use XML to store the data.

You want to make sure that only the last three news-items are displayed, and the rest will be stored in an archive. To make the news page really handy, toss in a "search" functionality.

The first step is to define the XML you will use. For this exercise, you type the XML by hand; in a second phase we can also explore how to generate it via a web form, which will make your news page fully maintainable online.

Enter: The XML data

The XML is as follows:

<?xml version="1.0"?>

<page>

<item>

<id></id>

<date></date>

<headline></headline>

<copy></copy>

</item>

</page>

Each piece of news is an item and has a unique ID (to make it easier to identify in the editable version), the date it was entered, a headline and the news text (as copy).

Now, as you want to separate display from logic and content, use an HTML template for display. Of course the whole HTML could be in the PHP page, but by using a template, the display could also be tweaked by someone not knowing PHP at all.

To tell PHP what to display, add HTML comments to the template. The script will replace them with the real data later. Furthermore, add comments with "start:" and "end:" so that PHP can keep or delete parts of the HTML depending on which page should be displayed.

Teaching your HTML how to speak PHP

The template in the zip file that accompanies this article has all the necessary comments added and explained. For the moment, let's focus on the display of the news only:

<!-- start:newsitem -->

<div class="date"> <!-- date --> </div>

<div class="headline"> <!-- headline --> </div>

<div class="copy"> <!-- copy --> </div>

<div class="shadow"></div>

<br />

<!-- end:newsitem -->

This is how one piece of news will be displayed. The date, the headline and the copy comments will be replaced by the XML data.

To read the template and the XML, we use a function called load(), which does nothing but load the content of a file and store it in a variable.
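The article does not show load() itself, so the following is only a guess at what such a helper could look like, built on PHP's file_get_contents(). The temporary file merely stands in for your real template or XML file.

```php
<?php
// Hypothetical stand-in for load(); the real one is in the demo script.
function load($filename) {
    $content = file_get_contents($filename);
    // file_get_contents() returns false on failure; fall back to "".
    return ($content === false) ? "" : $content;
}

// Use a temporary file so the snippet runs without the demo files.
$tmpfile = tempnam(sys_get_temp_dir(), "news");
file_put_contents($tmpfile, "<page><item><id>1</id></item></page>");

$xml = load($tmpfile);
unlink($tmpfile);

echo $xml, "\n";
```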

To extract the "news HTML block" from the template, use preg_match:

preg_match("/<!-- start:newsitem -->(.*?)<!-- end:newsitem -->/si", $template, $newshtml);

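This grabs everything between those two comments and stores it in the array $newshtml: $newshtml[0] holds the whole match including the two comments, and $newshtml[1] holds only the HTML between them.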
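Here is the same call on a shortened, made-up template, to show what ends up in the result array. The template string is an assumption for illustration only.

```php
<?php
// Hypothetical, shortened template; the real one ships with the demo files.
$template = "<h1>News</h1>\n"
          . "<!-- start:newsitem -->\n"
          . "<div class=\"headline\"> <!-- headline --> </div>\n"
          . "<!-- end:newsitem -->\n"
          . "<p>Footer</p>";

preg_match("/<!-- start:newsitem -->(.*?)<!-- end:newsitem -->/si", $template, $newshtml);

// $newshtml[0]: the whole match, including both marker comments.
// $newshtml[1]: only the markup between the markers.
echo $newshtml[1];
```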

From the XML, grab all the news items by calling

$items=untag($xml,"item",1);

and reverse the order of this array, so that the most recent news item comes first.

Now you have all the news items in an array called $items. Each item contains the date, the headline, the copy and the id.
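One way to do the reversing is PHP's array_reverse(). The sketch below assumes that new items are appended at the end of the XML file, so the last item is the newest; the sample data is invented.

```php
<?php
// Compact copy of untag() so the snippet runs on its own.
function untag($string, $tag, $mode) {
    preg_match_all("/<" . $tag . ">(.*?)<\/" . $tag . ">/si", $string, $tags);
    return ($mode == 1) ? $tags[1] : implode("", $tags[1]);
}

// Hypothetical data: the newest item is the last one in the file.
$xml = "<page>"
     . "<item><id>1</id><headline>Older</headline></item>"
     . "<item><id>2</id><headline>Newer</headline></item>"
     . "</page>";

// Grab all items, then flip the array so the newest comes first.
$items = array_reverse(untag($xml, "item", 1));

echo untag($items[0], "headline", 0), "\n";
```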

Displaying the news

To display the news on the page, loop through the array.

Grab the "news HTML block" each time and replace the comments with the corresponding XML data, using untag(). Add each of these HTML chunks to an include variable.

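// start empty, so the first .= does not hit an undefined variable
$newsinclude="";
foreach ($items as $i){
    // take a fresh copy of the "news HTML block" for every item
    $tmphtml=$newshtml[0];
    $tmphtml=str_replace("<!-- date -->",untag($i,"date",0),$tmphtml);
    $tmphtml=str_replace("<!-- headline -->",untag($i,"headline",0),$tmphtml);
    $tmphtml=str_replace("<!-- copy -->",untag($i,"copy",0),$tmphtml);
    // append the filled-in block to the include
    $newsinclude.=$tmphtml;
}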

Then replace the raw "news HTML block" in the template with the include:

$template= preg_replace ("/<!-- start:newsitem -->(.*?)<!-- end:newsitem -->/si", $newsinclude, $template);

and display it.
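One thing to watch out for: preg_replace() treats "$" or "\" followed by a number in the replacement string as a reference to a captured group, so a price in your news copy can get mangled. A possible way around that is preg_replace_callback(), whose return value is inserted as it is. This is a sketch with made-up data, not the way the demo script does it.

```php
<?php
// Hypothetical template and include, just to show the difference.
$template    = "<!-- start:newsitem -->placeholder<!-- end:newsitem -->";
$newsinclude = '<div class="copy">Tickets cost $5</div>';
$pattern     = "/<!-- start:newsitem -->(.*?)<!-- end:newsitem -->/si";

// preg_replace() reads "$5" as a reference to capture group 5.
$risky = preg_replace($pattern, $newsinclude, $template);

// The callback's return value is inserted literally.
$safe = preg_replace_callback($pattern, function ($match) use ($newsinclude) {
    return $newsinclude;
}, $template);

echo $risky, "\n", $safe, "\n";
```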

To differentiate between the news page and the archive page, don't loop over all the items; display only a part of them (items 1 to 3 for the news page, item 4 to the end for the archive). To find search results, compare each item with the search query.

Put it all together and you have your XML-based news page.

The script that accompanies this article is heavily commented and should be quite self-explanatory. After all, this article is only meant to show how much power the small but fine function untag() gives you.
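As a rough sketch of both ideas: array_slice() can cut the item array into a news part and an archive part, and stripos() can do a simple case-insensitive search. The five items and the query are invented, and the demo script may well solve this differently.

```php
<?php
// Compact copy of untag() so the snippet runs on its own.
function untag($string, $tag, $mode) {
    preg_match_all("/<" . $tag . ">(.*?)<\/" . $tag . ">/si", $string, $tags);
    return ($mode == 1) ? $tags[1] : implode("", $tags[1]);
}

// Hypothetical data: five items, already sorted newest first.
$items = array();
for ($n = 5; $n >= 1; $n--) {
    $items[] = "<id>$n</id><headline>Headline $n</headline><copy>Text $n</copy>";
}

$news    = array_slice($items, 0, 3); // items 1 to 3 for the news page
$archive = array_slice($items, 3);    // item 4 to the end for the archive

// Simple search: keep every item whose headline or copy contains the query.
$query = "headline 4";
$hits  = array();
foreach ($items as $i) {
    $text = untag($i, "headline", 0) . " " . untag($i, "copy", 0);
    if (stripos($text, $query) !== false) {
        $hits[] = $i;
    }
}

echo count($news), " news, ", count($archive), " archived, ", count($hits), " hit\n";
```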

The good, the bad and the ugly

This technique does not give you full control over XML documents. It's just an easy way to use basic XML and create HTML from that.

By adding a form to add, delete and alter the XML items, you can create a basic, lightweight CMS tool.

The flat-file ASCII data in XML format is also readable and editable directly, unlike old-school flat files that separate the news fields with pipe characters or commas.

The news data can contain almost any character, even HTML, without breaking apart (a thing that easily happens with comma-separated data files). The one thing a field must not contain is its own closing tag, because untag() stops at the first one it finds.

However, it cannot handle the more complex XML structures that a real solution using XSLT and a parser can handle.
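The "add" part could look roughly like this. add_item() is a hypothetical helper, not part of the demo script, and it assumes the values have already been checked before they reach it.

```php
<?php
// Hypothetical helper: append one news item to the XML string.
// Assumes the values are already validated; no escaping is done here.
function add_item($xml, $id, $date, $headline, $copy) {
    $item = "<item>\n<id>$id</id>\n<date>$date</date>\n"
          . "<headline>$headline</headline>\n<copy>$copy</copy>\n</item>\n";
    // Insert the new item just before the closing </page> tag.
    $pos = strrpos($xml, "</page>");
    if ($pos === false) {
        return $xml; // no </page> found, leave the data untouched
    }
    return substr($xml, 0, $pos) . $item . substr($xml, $pos);
}

// Made-up values, as they might arrive from a form.
$xml = "<?xml version=\"1.0\"?>\n<page>\n</page>";
$xml = add_item($xml, 1, "2024-01-01", "Hello", "First post");

echo $xml, "\n";
```

Writing the result back to the file, and deleting or altering items, would follow the same pattern of plain string handling.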

This technique does not recognise:

Furthermore, you need to use unique tag names within each item. A real XSLT parser can differentiate between the <element>s in this example:

<item>

<element>

<element></element>

</element>

</item>

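untag() cannot tell them apart. Its match starts at the outer opening tag and stops at the first closing tag it finds, so it returns the content of the first element with the opening tag of the second one embedded, and drops anything that follows the inner closing tag.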

It works much like an <xsl:value-of select="//element"/> command in XSLT.
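You can see the nesting problem for yourself with a few lines. The words "outer", "inner" and "tail" are made up to make the result easier to read.

```php
<?php
// Compact copy of untag() so the snippet runs on its own.
function untag($string, $tag, $mode) {
    preg_match_all("/<" . $tag . ">(.*?)<\/" . $tag . ">/si", $string, $tags);
    return ($mode == 1) ? $tags[1] : implode("", $tags[1]);
}

// Two nested elements with the same tag name.
$xml = "<item><element>outer <element>inner</element> tail</element></item>";

$result = untag($xml, "element", 1);

// One match only: it runs from the outer opening tag
// to the first closing tag, so " tail" is lost.
print_r($result);
```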

Now grab the sample script, template and XML and see for yourself.