Ever tried embedding HTML within XML, only to find the XML no longer validates? Using ASP as the back end to generate your XML output, this solution lets you keep backwards-compatible HTML in your XML applications.

Why Would I Do That?

When developing an XML news feed, a forum, or a CMS feature for creating web pages, a useful feature to include is the ability to edit the HTML input right in the browser (a lot like the process evolt.org uses for adding these articles).

The downside is that these HTML fragments then have to be embedded, unmodified, within XML result sets at the point the XML is created, so that they can be written to the HTML-based forums or added to the XML news feed with their HTML formatting intact.

For example

Say you have an XML news feed pulling news (TITLE and DATA) from a database. It's easy with just plain text stored in the database: grab it and print it, and the XML and XSL do the hard work.
But what if you have HTML tags in the news feed (images, tables, font styles, etc.)? The HTML would display fine if it were on a normal page, but how will it look when passed through the XML parser and displayed as part of the XML feed?

So, Where's The Problem?

The problem is that HTML is not necessarily well-formed XML, and I don't want to start validating the HTML tags typed into the text boxes, client or server side, since HTML can be very sloppily put together and still work. The key point is that you don't want any unnecessary processing of what could be large HTML strings, whether at the read or the write stage. So, here's what to do about it:

The XML:

When I retrieve the HTML and create the XML output, I put the untouched HTML into an XML fragment using ASP code like this:

' Here I concatenate in the data's title record,
' which is not XML, so I process it using a simple
' XMLEncode function which replaces all the
' reserved XML characters with entities
sXML = sXML & " <TITLE>" & XMLEncode(RS("Title")) & "</TITLE>" & vbCrLf

' The following line places the HTML in a comment block,
' which means its markup is ignored by the XML parser,
' so it can contain malformed XML. (The one restriction:
' an XML comment may not contain "--".)
sXML = sXML & " <DATA><!--" & RS("Data") & "--></DATA>" & vbCrLf
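The XMLEncode function used above is not shown in the article. A minimal sketch of what such a helper might look like follows, assuming it only needs to escape the five reserved XML characters. IsCommentSafe is a hypothetical extra, not part of the original technique: it only reports whether a string could legally sit inside an XML comment, without altering the HTML.

```vbscript
' Hypothetical sketch of an XMLEncode helper (the article's own version is not shown).
Function XMLEncode(ByVal vText)
    Dim sOut
    ' Appending "" turns a Null database value into an empty string
    sOut = vText & ""
    ' Escape the ampersand first so the entities added below are not re-escaped
    sOut = Replace(sOut, "&", "&amp;")
    sOut = Replace(sOut, "<", "&lt;")
    sOut = Replace(sOut, ">", "&gt;")
    sOut = Replace(sOut, """", "&quot;")
    sOut = Replace(sOut, "'", "&apos;")
    XMLEncode = sOut
End Function

' Hypothetical check: an XML comment may not contain "--" or end with "-".
' Returns True when the HTML can be placed inside <!-- ... --> unchanged.
Function IsCommentSafe(ByVal vHTML)
    Dim sHTML
    sHTML = vHTML & ""
    IsCommentSafe = (InStr(sHTML, "--") = 0) And (Right(sHTML, 1) <> "-")
End Function
```

If IsCommentSafe returns False (for example, when the stored HTML contains its own HTML comments), the XML will not be well-formed, so that record would need separate handling before being written out.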

The XSL:

This part is very easy, but there are two things you must always remember. The typical way of outputting the <TITLE> element's contents would be:

<xsl:value-of select="TITLE"/>

The way to output the embedded HTML is to use the comment() XPath node test. The other very important step is to add the disable-output-escaping="yes" attribute to your value-of element. So:

<xsl:value-of select="DATA/comment()" disable-output-escaping="yes"/>

outputs the HTML intact. Disabling output escaping prevents your HTML from coming out looking like &lt;HTML&gt;, which is not the desired effect. Lastly, to prevent the XSL processor from spitting out garbage instead of all your nice font characters, declare an encoding in both the XML and XSL files.
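The article does not say where the transform is run. If it is done server-side in ASP, it might look roughly like the sketch below, which goes inside the page's script block. It assumes MSXML 3.0 or later is installed; the file names news.xml and news.xsl are placeholders, not names from the article.

```vbscript
' Sketch only: server-side XSL transform using MSXML.
' "news.xml" and "news.xsl" are placeholder file names.
Dim oXML, oXSL
Set oXML = Server.CreateObject("MSXML2.DOMDocument")
Set oXSL = Server.CreateObject("MSXML2.DOMDocument")

' Load synchronously so each document is fully parsed before it is used
oXML.async = False
oXSL.async = False

If Not oXML.load(Server.MapPath("news.xml")) Then
    Response.Write "XML error: " & Server.HTMLEncode(oXML.parseError.reason)
ElseIf Not oXSL.load(Server.MapPath("news.xsl")) Then
    Response.Write "XSL error: " & Server.HTMLEncode(oXSL.parseError.reason)
Else
    ' transformNode returns the transformed output as a string
    Response.Write oXML.transformNode(oXSL)
End If

Set oXSL = Nothing
Set oXML = Nothing
```

Checking the return value of load is worthwhile here: if the HTML in the comment block breaks the XML, parseError.reason says why instead of the page silently producing nothing.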

The Final Code:


<?xml version="1.0" encoding="iso-8859-15"?>
<MYDATA>
<TITLE>This is a News Title</TITLE>
<HTMLDATA><!--
<BODY bgcolor=white><P>This contains malformed HTML<BR>
<IMG SRC=http://www.evolt.org/images/logo.gif width=120 height=30>
--></HTMLDATA>
</MYDATA>

<?xml version="1.0" encoding="iso-8859-15"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:value-of select="MYDATA/TITLE"/>
<xsl:value-of select="MYDATA/HTMLDATA/comment()" disable-output-escaping="yes"/>
</xsl:template>
</xsl:stylesheet>

You now have a page whose HTML is invalid, but which still validates properly as XML.