Embed HTML in XML & Retrieve it with XSL

By Nautilus (evolt.org member since May 21, 2002)

Ever tried adding HTML to an XML document, only to find you can't keep the XML valid? Using ASP as the back end to generate your XML output, this solution lets you carry backwards-compatible HTML through your XML applications.

Why Would I Do That?

When developing an XML news feed, a forum, or a CMS feature for creating web pages, it is useful to let users edit their HTML input right in the browser (much like the process evolt.org uses for adding these articles).

The downside is that the resulting HTML fragments then have to be embedded in XML result sets, unmodified at the XML creation stage, so that they can be written out to HTML-based forums or added to the XML news feed with their HTML formatting intact.

For example

Say you have an XML news feed pulling news (TITLE and DATA) from a database. It's easy when the database holds plain text: just grab it and print it, and the XML and XSL do the hard work.
But what if the news items contain HTML tags (images, tables, font styles, etc.)? The HTML would display fine on a normal page, but how will it look once it has passed through the XML parser and been displayed as part of the XML feed?

So, Where's The Problem?

The problem is that HTML is frequently not well-formed XML. I don't want to start processing the text typed into the input boxes to validate the HTML tags, either client side or server side, since HTML can be very sloppily put together and still work. The key point is that you don't want any unnecessary processing of what could be very large HTML strings, whether at the read stage or the write stage. So, here's what to do about it:

The XML:

When I retrieve the HTML and create the XML output, I put the untouched HTML into an XML fragment using ASP code like this:

' Here I concatenate in the record's title,
' which is plain text rather than XML, so I run it
' through a simple XMLEncode function that replaces
' the reserved XML characters with their entity references

sXML = sXML & "  <TITLE>" & XMLEncode(RS("Title")) & "</TITLE>" & vbcrlf

' The following line places the HTML in a comment block,
' which the XML parser skips, so the HTML inside can be
' malformed (it must not contain "--", which is illegal in an XML comment).

sXML = sXML & "  <DATA><!--" & RS("Data") & "--></DATA>" & vbcrlf
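The article does not show XMLEncode itself. As a rough sketch only, here is the same record-building step in Python rather than ASP. `xml_encode` and `build_record` are hypothetical names standing in for the article's XMLEncode function and the `sXML` concatenation. The sketch also checks that the fragment parses even though the HTML inside the comment is malformed. In real Python code, `xml.sax.saxutils.escape` already covers `&`, `<` and `>`.

```python
import xml.etree.ElementTree as ET

# Order matters: '&' goes first so the entities added afterwards are not re-escaped.
_ENTITIES = [("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"),
             ('"', "&quot;"), ("'", "&apos;")]


def xml_encode(text: str) -> str:
    """Hypothetical stand-in for XMLEncode: escape the five reserved characters."""
    for char, entity in _ENTITIES:
        text = text.replace(char, entity)
    return text


def build_record(title: str, html: str) -> str:
    """Escape the title, but drop the HTML into a comment untouched.
    Assumes the HTML never contains "--"."""
    return ("  <TITLE>" + xml_encode(title) + "</TITLE>\n"
            "  <DATA><!--" + html + "--></DATA>\n")


record = build_record("Q&A: <b>news</b>", "<P>Unclosed paragraph<BR><IMG SRC=logo.gif>")
root = ET.fromstring("<NEWS>\n" + record + "</NEWS>")  # parses despite the bad HTML
print(root.find("TITLE").text)  # Q&A: <b>news</b>
```

The parser decodes the title's entities again on the way back in. The comment is simply skipped, so `DATA` has no text as far as ElementTree's default parser is concerned.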

The XSL:

This part is very easy, but there are two things you must always remember. The typical way of outputting the <TITLE> element's contents would be:

<xsl:value-of select="TITLE"/>

The way to output the embedded HTML is to use the comment() XPath node test. The other very important step is to add the disable-output-escaping="yes" attribute to your value-of element. So:

<xsl:value-of select="DATA/comment()" disable-output-escaping="yes"/>

outputs the HTML intact. Disabling output escaping prevents your HTML from coming out escaped as &lt;HTML&gt;, which is not the desired effect. Lastly, to stop the XSL processor spitting out garbage instead of all your special characters, declare an encoding on both the XML and the XSL files.
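For readers working outside XSLT, a rough Python analogue of `select="DATA/comment()"` with `disable-output-escaping="yes"` might look like the sketch below. It is not an XSLT processor. It assumes Python 3.8 or later (for `TreeBuilder(insert_comments=True)`) and uses a hypothetical `embedded_html` helper. It only shows that the comment's text comes back verbatim, tags and all.

```python
import xml.etree.ElementTree as ET


def embedded_html(xml_text: str, element: str = "DATA") -> str:
    """Hypothetical helper: return the raw text of the comments inside <element>."""
    # ElementTree normally throws comments away; this builder keeps them (3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    parser.feed(xml_text)
    root = parser.close()
    node = next(root.iter(element), None)  # iter() also checks the root itself
    if node is None:
        return ""
    # Comment nodes carry the ET.Comment factory function as their tag.
    return "".join(child.text or "" for child in node if child.tag is ET.Comment)


doc = "<NEWS><TITLE>Hi</TITLE><DATA><!--<P>Malformed<BR>--></DATA></NEWS>"
print(embedded_html(doc))  # <P>Malformed<BR>
```

Because the helper hands back the comment text as a plain string, nothing gets escaped. That is the effect `disable-output-escaping` has in the XSL version.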

The Final Code:

XML:

<?xml version="1.0" encoding="iso-8859-1"?>
<MYDATA>
    <TITLE>This is a News Title</TITLE>
    <HTMLDATA>
        <!--
        <BODY bgcolor=white><P>This contains malformed HTML<BR>
        <IMG SRC=http://www.evolt.org/images/logo.gif width=120 height=30>
        -->
    </HTMLDATA>
</MYDATA>

XSL:

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
      <xsl:value-of select="MYDATA/TITLE"/>
      <xsl:value-of select="MYDATA/HTMLDATA/comment()" disable-output-escaping="yes"/>
</xsl:template>

</xsl:stylesheet>

You now have a document whose embedded HTML is malformed, but which is still well-formed XML.
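To see why the encoding declaration matters, here is a small Python illustration; it is not part of the ASP/XSL setup. It assumes the database hands back ISO-8859-1 bytes, and `parse_bytes` is a hypothetical helper. Without a declaration, an XML parser assumes UTF-8 and rejects the accented byte.

```python
import xml.etree.ElementTree as ET


def parse_bytes(raw: bytes):
    """Hypothetical helper: parse raw bytes, or return None if not well-formed."""
    try:
        return ET.fromstring(raw)
    except ET.ParseError:
        return None


title = "Caf\u00e9 news"  # 'é' is the single byte 0xE9 in ISO-8859-1
declared = ('<?xml version="1.0" encoding="iso-8859-1"?>'
            "<TITLE>" + title + "</TITLE>").encode("iso-8859-1")
undeclared = ("<TITLE>" + title + "</TITLE>").encode("iso-8859-1")

print(parse_bytes(declared).text)  # Café news
print(parse_bytes(undeclared))     # None: read as UTF-8, byte 0xE9 is invalid here
```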

Comments in the HTML

Submitted by endquote on June 4, 2002 - 12:13.

Would this work if there were comment tags in the HTML?


Use CDATA sections instead of XML comments

Submitted by coinz on June 4, 2002 - 16:11.

Embedded HTML comments would break the solution above, because the string "--" is not allowed inside an XML comment. Besides, we should be using CDATA sections instead of XML comments anyway; that's one of the things they're meant for.

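XML:
<?xml version="1.0"?>
<MYDATA>
	<TITLE>This is a News Title</TITLE>
	<HTMLDATA>
		<![CDATA[
		<BODY bgcolor=white><P>This contains malformed HTML<BR>
		<IMG SRC=http://www.evolt.org/images/logo.gif width=120 height=30>
		]]>
	</HTMLDATA>
</MYDATA>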
XSL:

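Here is a quick Python sketch of coinz's point. It uses hypothetical helpers `wrap_in_comment` and `wrap_in_cdata`, and shows that an HTML comment inside an XML comment breaks well-formedness while a CDATA section copes. There is one caveat the sketch handles: the sequence `]]>` ends a CDATA section, so any occurrence inside the HTML is split across two adjacent sections.

```python
import xml.etree.ElementTree as ET


def wrap_in_comment(html: str) -> str:
    """The article's approach: hide the HTML inside an XML comment."""
    return "<DATA><!--" + html + "--></DATA>"


def wrap_in_cdata(html: str) -> str:
    """coinz's approach: a CDATA section. "]]>" would end the section early,
    so any occurrence is split across two adjacent CDATA sections."""
    return "<DATA><![CDATA[" + html.replace("]]>", "]]]]><![CDATA[>") + "]]></DATA>"


def is_well_formed(xml_text: str) -> bool:
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


html = "<P>Hello<!-- editor note --><BR>"
print(is_well_formed(wrap_in_comment(html)))            # False: "--" inside a comment
print(is_well_formed(wrap_in_cdata(html)))              # True
print(ET.fromstring(wrap_in_cdata(html)).text == html)  # True
```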

PHP

Submitted by uioreanu on June 5, 2002 - 03:09.

How about PHP? Which conversion functions can we use to put HTML inside XML? Calin


XSL for extracting CDATA from XML?

Submitted by dbo on June 6, 2002 - 16:38.

Coinz writes above that we should be using CDATA sections to hold the HTML instead of comments. So how do we extract the content of a CDATA section using XSL without the tags showing?
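The CDATA markers are not part of the data: the parser reports the section's contents as ordinary text. In XSLT 1.0 that usually means `<xsl:value-of select="MYDATA/HTMLDATA" disable-output-escaping="yes"/>` will do the job, although processors are permitted to ignore `disable-output-escaping`, so check yours. The Python sketch below illustrates the same distinction with ElementTree, using a hypothetical `html_payload` helper. The raw text keeps the tags intact, while re-serializing escapes them, which is the "tags showing" effect.

```python
import xml.etree.ElementTree as ET


def html_payload(xml_text: str, element: str = "HTMLDATA"):
    """Hypothetical helper: return (raw_text, escaped_serialization) of <element>."""
    node = next(ET.fromstring(xml_text).iter(element))
    raw = node.text or ""  # CDATA markers are gone; the tags are plain characters
    # Re-serializing escapes '&', '<' and '>', which is what makes tags "show".
    escaped = ET.tostring(node, encoding="unicode")
    return raw, escaped


doc = "<MYDATA><HTMLDATA><![CDATA[<P>Bold <B>text</B>]]></HTMLDATA></MYDATA>"
raw, escaped = html_payload(doc)
print(raw)      # <P>Bold <B>text</B>
print(escaped)  # <HTMLDATA>&lt;P&gt;Bold &lt;B&gt;text&lt;/B&gt;</HTMLDATA>
```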


embed javascript in xml doc so it will validate

Submitted by davecale on September 18, 2002 - 07:13.

I'm wondering what code can be used so that an XML document with JavaScript embedded in it will validate and import, and so that the JavaScript runs properly in the resulting imported document.


Javascript within XSL HowTo: Static and XML-driven

Submitted by coinz on September 18, 2002 - 08:24.

Here's how I go about it...

XML
<?xml version="1.0"?>
<root>
<node>XML-driven hello world</node>
</root>

XSL
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<!-- Javascript example 1: using CDATA sections -->
<!-- Example 1 should be used when embedding javascript snippets that don't make use of XML data during the transform -->
<xsl:text disable-output-escaping="yes">
<![CDATA[
<script>
alert("Generic hello world")
</script>
]]>
</xsl:text>
<!-- Javascript example 2: using xsl:text and disable-output-escaping -->
<!-- Example 2 should be used when embedding javascript snippets that need to make use of XML data during the transform -->
<xsl:text disable-output-escaping="yes">&lt;</xsl:text>script<xsl:text disable-output-escaping="yes">&gt;</xsl:text>
alert("<xsl:value-of select="root/node" />")
<xsl:text disable-output-escaping="yes">&lt;</xsl:text>/script<xsl:text disable-output-escaping="yes">&gt;</xsl:text>

<!-- the rest of your XSL transformation logic here -->
<xsl:value-of select="root/node" />
</xsl:template>
</xsl:stylesheet>


hmm just thinking

Submitted by evol on March 15, 2004 - 14:47.

Most of the articles I've read online about this topic suggest using CDATA, which for me is one of the easier solutions. I've been looking for a solution to this problem because I want to build a website with XML, PHP, XSL-FO and an online WYSIWYG editor. Any idea how, while using CDATA, you can get the content (without the HTML) into a PDF output? I was thinking of using a simple XSL-FO conversion (but then how would I handle, for example, a bold tag inside a CDATA section?). uioreanu: for PHP, many people suggest the Sablotron engine, which puts the XML and the stylesheet together in a few lines of code :-) Search Google for "xslt sablotron" and you'll end up with a lot of great tutorials.



evolt.org is an all-volunteer resource for web developers made up of a discussion list, a browser archive, and member-submitted articles. This article is the property of its author; please do not redistribute or use it elsewhere without checking with the author.