The XHTML Transition: It's not that difficult

Elfur Logadòttir

Now that XHTML 1.0 is the W3C's Recommendation for the latest version of HTML, you should have started preparing your code for it. You're already coding to the HTML 4.01 Recommendation and validating your code (well, if you aren't, you should start *now*), so all you need to know is how to make the transition, not to mention when. But let's start with why:

Why the transition to XHTML?

XHTML is a family of current and future document types and modules that reproduce, subset, and extend HTML 4. The XHTML family document types are XML based, and ultimately are designed to work in conjunction with XML-based user agents. Well-designed HTML documents that distinguish structure and presentation will adapt more easily to new technologies. On top of that, some of the most popular elements of the past are deprecated today and on their way to becoming obsolete.

XHTML is a reformulation of the three HTML 4 document types as applications of XML 1.0. It is intended to be used as a language for content that is both XML-conforming and, if some simple guidelines are followed, operates in HTML 4 conforming user agents.

The XHTML family is the next step in the evolution of the Internet. By migrating to XHTML today, content developers can enter the XML world with all of its intended benefits, while still remaining confident in their content's backward and future compatibility.

The Route

There are three steps on the way to perfect XHTML coding. If you haven't been coding to the HTML 4.01 Recommendation, that is your first step. Next, you make small adjustments to your coding habits while still validating your code against HTML 4.01. Finally, you make the complete transition by changing the HTML version information in your document type declarations.

Step One: Coding to the HTML 4.01 Recommendations

Adding HTML Version Information

In your leap to HTML 4.01, your first amendment to your code is adding HTML version information at the top of each document. Here you have three document types to choose from: strict, transitional or frameset.

  • The HTML 4.01 Strict DTD includes all elements and attributes that have not been deprecated or do not appear in frameset documents. For documents that use this DTD, use this document type declaration:
     <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
            "http://www.w3.org/TR/html4/strict.dtd">
  • The HTML 4.01 Transitional DTD includes everything in the strict DTD plus deprecated elements and attributes (most of which concern visual presentation). For documents that use this DTD, use this document type declaration:
     <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
            "http://www.w3.org/TR/html4/loose.dtd">
  • The HTML 4.01 Frameset DTD includes everything in the transitional DTD plus the tags for frames. For documents that use this DTD, use this document type declaration:
     <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
            "http://www.w3.org/TR/html4/frameset.dtd">

Deleting your font elements and your color/alignment attributes

The second adjustment you make is deleting your font elements, not only from your documents, but from your mind as well. The essential change between HTML 3.2 and HTML 4.0, and then 4.01, is separating presentation from content. Therefore most elements dealing with presentation are deprecated, in favor of Cascading Style Sheets (CSS). For the same reason, color and alignment attributes should also be removed.
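As a rough sketch of what this looks like in practice, the fragment below moves the presentation of one paragraph out of the markup and into a style rule. The class name teaser and the chosen colors and sizes are made up for illustration.

```html
<!-- Before: presentation mixed into the markup (deprecated in HTML 4.01) -->
<p align="center"><font color="#990000" face="Arial" size="4">Latest articles</font></p>

<!-- After: the markup carries structure only -->
<p class="teaser">Latest articles</p>

<!-- The presentation lives in a style sheet (here embedded in the head) -->
<style type="text/css">
  p.teaser {
    color: #990000;
    font-family: Arial, sans-serif;
    font-size: 120%;
    text-align: center;
  }
</style>
```

In a real site the rule would more likely sit in an external style sheet, so that one change restyles every page.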

Adding the title attribute

Another adjustment is an addition, both to your code and your mind. Use the title attribute basically everywhere: in your anchors, your abbreviations and anywhere you feel an explanation might make your content more accessible.
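A small illustration, with title texts invented for the example:

```html
<!-- A link whose title explains where it leads -->
<a href="http://www.w3.org/TR/xhtml1/"
   title="The XHTML 1.0 Recommendation at the W3C">XHTML 1.0</a>

<!-- An abbreviation whose title spells it out -->
<abbr title="Cascading Style Sheets">CSS</abbr>
```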

Increasing accessibility for people with physical limitations

I've already mentioned the title attribute, but you can do so much more: the alt attribute, the accesskey attribute, the lang attribute and the label element. Use them all.
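The fragment below is one possible way to combine these in HTML 4.01. The file names, the access key and the field name are placeholders, not taken from any real site.

```html
<!-- alt gives a text equivalent for the image -->
<img src="logo.gif" alt="evolt.org logo" width="120" height="40">

<!-- accesskey gives keyboard users a shortcut to the link -->
<a href="search.html" accesskey="s"
   title="Search the site (access key: s)">Search</a>

<!-- lang marks a change of language (is = Icelandic) -->
<p lang="is">Góðan daginn</p>

<!-- label ties the visible text to its form control through for/id -->
<form action="search.cgi" method="get">
  <p>
    <label for="query">Search for:</label>
    <input type="text" name="query" id="query">
  </p>
</form>
```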

Remembering the meta tags

Meta tags are good for specifying information about the content of a page. Tell user agents what to expect by defining the content type and the content language of your HTML page. Examples:

<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
<META http-equiv="Content-Language" content="en-us">

Replacing your name attributes with id's

The final adjustment is a replacement, still both to code and mind. Wherever you have used the name attribute in the past to identify an element, start using the id attribute. The id attribute uniquely identifies any item in your content, which is pretty useful, not only for increased CSS usage, but also for marking destination anchors of links. This is also important for the XHTML transition later on, since the name attribute is deprecated in XHTML 1.0 within the a, applet, form, frame, iframe, img and map elements. Caveat: Support for using id as a link destination is shaky in earlier browsers, so also including a name attribute with the same value on anchors might be a good idea for backwards compatibility.
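A minimal sketch of a destination anchor that carries both attributes, with the identifier route chosen only for illustration. When id and name appear together on an anchor, they must have the same value.

```html
<!-- Destination: id for current browsers, name for older ones -->
<h2><a id="route" name="route">The Route</a></h2>

<!-- Link pointing at the destination -->
<p><a href="#route">Jump to The Route</a></p>
```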

Step Two: Adjusting your code to XHTML, but not your DTDs

The second step in your XHTML transition is to add those eccentric XHTML features to your HTML code. XHTML documents must be well-formed. This means that all elements must be nested correctly, have closing tags or be closed in the empty tag with a space and a slash ( />).

Keeping the tags lowercase

XML is case-sensitive, so all HTML element and attribute names must be lowercase in XHTML documents. The same applies to element names used in the selectors of your cascading style sheets.
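For example (the class name and file name are placeholders):

```html
<!-- HTML 4.01 accepts any case for element and attribute names -->
<P CLASS="intro"><A HREF="about.html">About us</A></P>

<!-- XHTML: element and attribute names in lowercase -->
<p class="intro"><a href="about.html">About us</a></p>
```

Attribute values such as class names and URLs keep whatever case they had; only the element and attribute names change.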

Closing and correctly nesting all tags

If an element is made up of opening and closing tags, use the closing tag, even where it was marked optional in past versions of HTML. It is equally important to nest tags correctly: for example, close a previously opened <em> before closing the paragraph it resides in.
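A short before-and-after sketch of both rules:

```html
<!-- Tolerated by many browsers, but not well-formed:
     em is closed after p, and the li elements are never closed -->
<p>This is <em>emphasized text.</p></em>
<ul>
  <li>First item
  <li>Second item
</ul>

<!-- Well-formed: inner elements close before outer ones,
     and every element has its closing tag -->
<p>This is <em>emphasized</em> text.</p>
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
```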

"space-slashing" empty tags

Space-slashing means adding a space and a slash at the end of all empty tags, that is, tags that don't have closing tags. The slash tells an XML parser that the element ends right there. The XML specification also allows you to write an explicit closing tag for those empty elements, but as I understand it, browser support for that is shaky at best. The space before the slash is there mainly for backwards compatibility: older browsers might choke on your page without it.
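The most common empty elements, written the space-slashed way (attribute values are placeholders):

```html
<br />
<hr />
<img src="logo.gif" alt="evolt.org logo" />
<input type="text" name="query" id="query" />
<meta http-equiv="Content-Language" content="en-us" />
<link rel="stylesheet" type="text/css" href="site.css" />
```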

Wrap attribute values in quotes

All attribute values must be quoted, even those which appear to be numeric. Example:
border="1".
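A slightly larger illustration, using an invented table:

```html
<!-- HTML 4.01 lets simple values go unquoted -->
<table border=1 cellpadding=4>
  <tr><td colspan=2>Results</td></tr>
</table>

<!-- XHTML: every value is quoted -->
<table border="1" cellpadding="4">
  <tr><td colspan="2">Results</td></tr>
</table>
```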

Adding the lang and xml:lang attributes

Use both the lang and xml:lang attributes when specifying the language of an element. The value of the xml:lang attribute takes precedence. Example:
<html lang="en" xml:lang="en">

Stopping the Attribute Minimization

XML does not support attribute minimization. Attribute-value pairs must be written in full. Attribute names such as nowrap cannot occur in elements without their value being specified, as in nowrap="nowrap". Caveat: Some older HTML user agents are unable to interpret boolean attributes when these appear in their full (non-minimized) form, as required by XML 1.0. Note, this problem doesn't affect user agents compliant with HTML 4. The following attributes are involved: compact, nowrap, ismap, declare, noshade, checked, disabled, readonly, multiple, selected, noresize, defer.
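Here is how that looks for two of the listed attributes. The form fields are invented for the example.

```html
<!-- HTML 4.01: minimized boolean attributes -->
<input type="checkbox" name="subscribe" checked>
<select name="country">
  <option selected>Iceland</option>
</select>

<!-- XHTML: the attribute name is repeated as its value -->
<input type="checkbox" name="subscribe" checked="checked" />
<select name="country">
  <option selected="selected">Iceland</option>
</select>
```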

Adding Character Encoding

To specify a character encoding in the document, use both the encoding attribute on the XML declaration (e.g., <?xml version="1.0" encoding="EUC-JP"?>) and a meta http-equiv statement (e.g., <meta http-equiv="Content-type" content="text/html; charset=EUC-JP" />). The value of the encoding attribute of the XML declaration takes precedence.

Embedding Style Sheets and Scripts

Use external style sheets if your style sheet uses < or & or ]]> or --. Note that XML parsers are permitted to silently remove the contents of comments. Therefore, the historical practice of "hiding" scripts and style sheets within comments to make the documents backward compatible is likely to not work as expected in XML-based implementations.
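The simplest way to sidestep these problems is to keep styles and scripts in separate files. A sketch, assuming files named site.css and site.js:

```html
<head>
  <title>evolt.org</title>
  <!-- Styles and scripts live in external files, so characters such as
       < and & never appear inside the XHTML document itself -->
  <link rel="stylesheet" type="text/css" href="site.css" />
  <script type="text/javascript" src="site.js"></script>
</head>
```

Note that script keeps its explicit closing tag even though it is empty here; only elements that are declared empty, such as link, get the space-slash form.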

Adjusting to allowed nesting

XHTML has stricter nesting rules than HTML. You have to be more careful as to how you build up your code and which elements you nest within another. Some combinations of nesting elements are forbidden. The elements in question are the following:

  •  a cannot contain other a elements.
  •  pre cannot contain the img, object, big, small, sub, or sup elements.
  •  button cannot contain the input, select, textarea, label, button, form, fieldset, iframe or isindex elements.
  •  label cannot contain other label elements.
  •  form cannot contain other form elements.
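The first rule in the list, sketched with placeholder links:

```html
<!-- Not allowed: an a element inside another a element -->
<a href="articles.html">Read <a href="code.html">code</a> articles</a>

<!-- Allowed: two separate links -->
<a href="articles.html">Read articles</a> or
<a href="code.html">browse the code section</a>
```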

Adding the XML namespace attribute

The XML namespace attribute is needed in all XHTML documents. It's good practice to start adding it to the root element (<html>) right away. The correct syntax is as follows:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

Step Three: Making that Transition

Adding the XML declaration

An XML declaration is not required, but it is strongly encouraged. It is necessary whenever the character encoding differs from the defaults (UTF-8 or UTF-16).
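A sketch of a document start that needs the declaration, assuming the file is saved as ISO-8859-1. The declaration has to be the very first thing in the file, with nothing before it, not even whitespace.

```html
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
```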

Changing the HTML Version Information

Replace the HTML 4.01 document type declaration with the matching XHTML 1.0 one:

  •  <!DOCTYPE html
        PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  •  <!DOCTYPE html
        PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  •  <!DOCTYPE html
        PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

Simple example of an XHTML document

Finally, let's put together a basic XHTML document showcasing what has been mentioned earlier in this article.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>evolt.org</title>
    <meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
  </head>
  <body>
    <p>evolt.org a community for the web developers,
    by the web developers.</p>
    <hr />
  </body>
</html>
Elfur Logadòttir (elfur) is The Icelandic One. She is a student, a freelance Web developer, a mother, a soccer club manager, a founding member of evolt.org and evolt.org's current secretary. Elfur has been attached to the Web since its early days, when the likes of Netscape 1.0 were The Ultimate Experience and Wired was the place to love.

XML declaration bad for Mac

Submitted by bmason on April 27, 2001 - 10:16.

Just a note that putting any XML declaration in the code will kill the page on some of the Mac browsers -- I think it's IE 4 that suffers this problem. On that, nothing on the page will render.

Worked nice on my OSX

Submitted by angievert on April 27, 2001 - 10:40.

Opened it into explorer 5.1 beta. Fine... netscape 4.7 running under classic... Fine.

Coding to XHTML 1.0 Strict for a while

Submitted by deanburge on April 28, 2001 - 18:01.

I've been coding to XHTML 1.0 Strict for quite a while now (over a year) and I can say that it is not difficult at all... anyone who suggests that it is does not understand it. I actually recommend it - the process (from actually marking up the code to discovering new interests in related areas) made me a more confident and more educated person in my field. And the result in the web pages is invisible - anyone who does not know about, or does not need to know about, the nuts & bolts under their web pages is none the wiser.

ie4/Mac & XML

Submitted by MartinB on April 29, 2001 - 03:47.

xHTML works fine with ie4.5 - and the last time that ie4.01 was pushed to Mac users was several OS versions ago. iCab is fine. Opera is fine. Netscape (any Mac version) is fine. The worst you'll get is slight stylesheet dodgyness - which is all around CSS support than anything to do with xHTML.

IE 4.5 and more

Submitted by bmason on April 29, 2001 - 10:23.

I'm generally required to maintain, if not 100% of what you'd see in a 5.x browser, pretty much full functionality in a 4.x browser in my browsers. This led to my comment about XML declarations and Mac IE 4.0. With Mac IE 4.0 and the fun CSS quirks in Netscape 4.0x, you can imagine how my days go!

Much of this Applies to HTML 4.01, Too.

Submitted by Calum on May 1, 2001 - 11:51.

Please note, this is not a flame - if enough people believe that XHTML follows rules (while HTML is just a bunch of pointy brackets for NetExploder 4 layout instructions) then let's convince them to switch to XHTML.

However, I suspect that the majority of Evolters already realise that HTML is a markup application, and that there are rules about nesting and quoting attributes. For us, a little thought about what content markup is all about and some quality assurance from an SGML validator allows us to produce forward capable documents in HTML 4.01

Transformation from valid HTML 4.01 to valid XHTML 1.0 is mostly a doddle. Element and attribute names become lower case, space-slashes append to empty elements, quotes are added for all attributes, non-empty elements are explicitly closed, attribute values are de-minimised and the DocType declaration is changed. Easy!

This is not the case for Tag Soup. Sloppy invalid (non-)HTML is just as useless as sloppy invalid (non-)XHTML. Transformations and repurposing are not a doddle with bad markup.

Still, if we can convince those people who think that HTML's a joke that XHTML's serious, all well and good.

NB: The XML declaration (which causes problems for some browsers) is optional for UTF-8 and UTF-16.

WARNING for mozilla/ns 6

Submitted by nastySprite on May 5, 2001 - 15:22.

If you're designing for mozilla - and you should do that ;) -, using tables and converting your code to xhtml you should be aware of a really annoying bug: Funnily, the correct doctype definition leads to a broken table design (you'll have strange gaps in between your cells, that is). I don't know how a doctype could possibly be related to this quirk, but it exists nonetheless - and it took me a long time to track it down, because you usually don't search for display errors outside your actual code. What you have to do is insert a space between the ! and DOCTYPE. For those validator freaks out there (like me...), to get around my pages not being well formed, I use a simple php command: $MOZ = preg_match("'^Mozilla/5'",$HTTP_USER_AGENT)?" ":"";?> DOCTYPE blah blah...>

Submitted by nastySprite on May 5, 2001 - 15:27.

oops... I forgot escaping HTML... Once again: <?php $MOZ = preg_match("'^Mozilla/5'",$HTTP_USER_AGENT)?" ":"";?> <!<?=$MOZ?>DOCTYPE ....

Re: Coding to XHTML 1.0 Strict for a while

Submitted by Martin Tsachev on June 29, 2001 - 07:05.

I've been coding to XHTML 1.0 Strict for quite a while now (over a year) and I can say that it is not difficult at all

It's not that it is difficult that prevents people from coding XHTML but that it may possibly not display so good as when they used good old HTML.
And also have you noticed how many of the sites now use mainly CSS for rendering contents rather than the deprecated HTML layout.

Re: Difference in code display

Submitted by elfur on June 29, 2001 - 08:23.

"It's not that it is difficult that prevents people from coding XHTML but that it may possibly not display so good as when they used good old HTML."

Which is exactly why I wrote the article - to remind you of the things to do, to make sure that it's the same display whether you use HTML or XHTML. The only difference being the fact that with XHTML you're coding to current recommendation and preparing for the future ...

Re: Difference in code display

Submitted by Martin Tsachev on June 29, 2001 - 09:22.

What I meant is that moving to XHTML is one thing (to the Transitional, of course), but coding in XHTML Strict just won't work for now.
It just relies too heavily on CSS support and current browsers will most likely fail to show the content the way you wanted it to be.

torn between two lovers...

Submitted by cursif on August 7, 2001 - 10:16.

dear evolt

as a man who fully loves his standards, i have found myself torn lately. torn between remaining faithful to my new love... and the lingering doubts that my old friends [visual design and actual day-to-day browser habits] arent coming over as much anymore. doubts which nag me to include a few stray "align" attributes in my tables. "just to make sure that everyone can see those sliced images realigned perfectly, just the way you designed them," they assure me.

"but thats not standard!" i protest. "im taking us further off the road to compliance!"

"but how many browsers really are?" they retorted. "if you totally ignore an albeit lagging marketshare, what makes you think thats any better than whoring around with us deprecated elements?"

whats a designer/coder to do? should i feel safe slipping back into my old life, only in tables... only when i need to insure the image [not the content, or font style] is fitted together in a coherent way?

signed, torn between two lovers in atlanta

Yea, but what's the freekin' point?

Submitted by biolight on August 9, 2001 - 13:05.

I mean, I went down to the bookstore yesterday and looked at all 5-6 books on xhtml. I was looking for a really compelling reason to implement xhtml. It's not a lot of work in itself, but man, converting 50k documents to xhtml sure is. I didn't find one. Besides the obvious answers, that it lets anybody see the page & that it increases usability for the disabled (does anyone have any statistics that show that someone out there is actually visiting sites with this stuff -- beside the generic, there are so many million people out there who are blind, deaf, or as dumb as a web developer?), what's the point? It looks like just another academic standard that isn't ready for commercial prime-time, and certainly is not worth the time or money to implement. I'm sure it will be, but I don't see a *really* compelling reason to transition beyond supporting the few dozen clients a year that actually use non-standard browsers. Bear in mind, I'm talking about a number of sites that get, oh, say, .5 million page views a day. So the decision is pretty much paramount. See, it's all about cost-benefit here. I'm using xhtml on some of my personal sites, btw. Come on, give me a *good* reason!!! :)

XHTML still makes sense

Submitted by cacklebunny on October 6, 2002 - 10:23.

Now that we've seen how XML has finally begun living up to the hype we'd started hearing about in the late 1990's, XHTML makes a lot more sense.

Unfortunately, there's still a long way to go in terms of making the DTD's bug-free. I find that if I re-code some pages from HTML 4.0 to XHTML 1.0, strange side-effects emerge that are only solved by removing the transitional or strict doctype header, even after successfully validating.

As a curious sidenote, did anyone notice that this very page is still in HTML 4.0 transitional format? ;)

Re: XHTML still makes sense

Submitted by bmason on October 6, 2002 - 11:39.

I've yet to hear of a rendering problem caused by an error in the DTD. More likely, the browser is doctype-sniffing its way into a standards-rendering mode and doing something differently than in its more-forgiving quirky mode.

There's nothing wrong with staying with HTML 4, really. It's still a valid spec and will probably be so for awhile. In fact, there is a body of argument that sending XHTML as text/html is a bad thing.

This page HTML version

Submitted by MartinB on October 7, 2002 - 10:17.

Cacklebunny

This page (actually the entire site) is indeed in HTML4.01. Take a look at how (and when) we redesigned the evolt site - we also recognise that when we move to XHTML, we've got a major job recoding all the articles on the site.

Transition to XHTML

Submitted by Martin Tsachev on October 7, 2002 - 12:46.

And Martin don't forget the comments too. I think that there are lots of them which are not HTML 4 now.

CS3 standard to do the textarea WRAP

Submitted by djr on September 30, 2003 - 16:31.

Hi, The XHTML code you presented is in a textarea that uses the WRAP attribute (WRAP="off" ) which is illegal to W3C. Could you (anyone) show how to achieve the same utility in valid HTML 4.01 or XHTML? Thanks if you can. -jr

Evolt.org is an all-volunteer resource for web developers made up of a discussion list, a browser archive, and member-submitted articles. This article is the property of its author; please do not redistribute or use elsewhere without checking with the author.