The XHTML Transition: It's Not That Difficult

Elfur Logadòttir

Now that XHTML 1.0 is the W3C's Recommendation for the latest version of HTML, you should have started preparing your code for it. You're already coding to the HTML 4.01 recommendation and validating your code (well, if you aren't, you should start *now*), so all you need to know is how to make the transition, not to mention when. But let's start with why:

Why the transition to XHTML?

XHTML is a family of current and future document types and modules that reproduce, subset, and extend HTML 4. The XHTML family document types are XML based, and ultimately are designed to work in conjunction with XML-based user agents. Well-designed HTML documents that distinguish structure and presentation will adapt more easily to new technologies. Not to mention the fact that some of the most popular elements of the past are deprecated today and on their way to becoming obsolete.

XHTML is a reformulation of the three HTML 4 document types as applications of XML 1.0. It is intended to be used as a language for content that is both XML-conforming and, if some simple guidelines are followed, operates in HTML 4 conforming user agents.

The XHTML family is the next step in the evolution of the Internet. By migrating to XHTML today, content developers can enter the XML world with all of its intended benefits, while still remaining confident in their content's backward and future compatibility.

The Route

There are three steps on your way to perfect XHTML coding. If you haven't been coding to the HTML 4.01 recommendation, that is your first step. Next, you make little adjustments to your coding habits, while still validating your code against HTML 4.01. Finally, you make the complete transition by changing the HTML version information in your document type declarations.

Step One: Coding to the HTML 4.01 Recommendations

Adding HTML Version Information

In your leap to HTML 4.01, your first amendment to your code is adding HTML version information at the top of each document. Here you have three document types to choose from: strict, transitional or frameset.

  • The HTML 4.01 Strict DTD includes all elements and attributes that have not been deprecated or do not appear in frameset documents. For documents that use this DTD, use this document type declaration:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
            "http://www.w3.org/TR/html4/strict.dtd">

  • The HTML 4.01 Transitional DTD includes everything in the strict DTD plus deprecated elements and attributes (most of which concern visual presentation). For documents that use this DTD, use this document type declaration:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
            "http://www.w3.org/TR/html4/loose.dtd">

  • The HTML 4.01 Frameset DTD includes everything in the transitional DTD plus the tags for frames. For documents that use this DTD, use this document type declaration:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
            "http://www.w3.org/TR/html4/frameset.dtd">


Deleting your font elements and your color/alignment attributes

The second adjustment you make is deleting your font elements, not only from your documents, but from your mind as well. The essential change between HTML 3.2 and HTML 4.0, and then 4.01, is separating presentation from content. Therefore most elements dealing with presentation are deprecated, in favor of Cascading Style Sheets (CSS). For the same reason, color and alignment attributes should also be removed.
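
As a rough sketch of what this means in practice, here is one paragraph written first with the deprecated font element and align attribute, and then with CSS. The class name intro and the colour and size values are made up for illustration.

```html
<!-- Before: presentation mixed into the markup (deprecated) -->
<p align="center"><font face="Arial" color="#990000" size="4">Welcome</font></p>

<!-- After: the markup carries only the structure -->
<p class="intro">Welcome</p>

<!-- The presentation moves to a style sheet (in the head, or better, an external file) -->
<style type="text/css">
  p.intro {
    text-align: center;
    font-family: Arial, sans-serif;
    color: #990000;
    font-size: 120%;
  }
</style>
```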

Adding the title attribute

Another adjustment is an addition, both to your code and your mind. Use the title attribute basically everywhere: in your anchors, in your abbreviations and anywhere you feel an explanation might make your content more accessible.
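
A small sketch of the title attribute at work on an anchor and an abbreviation. The file name spec.html is only a placeholder.

```html
<p>
  Read the
  <a href="spec.html" title="The full text of the specification">specification</a>
  published by the
  <abbr title="World Wide Web Consortium">W3C</abbr>.
</p>
```

Most graphical browsers show the title text as a tooltip, and assistive software can read it out.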

Increasing accessibility for people with physical limitations

I've already mentioned the title attribute, but you can do so much more. The alt attribute, the accesskey attribute, the lang attribute, the label element. Use them all.
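
Here is one way these could come together in a small search form, assuming HTML 4.01 syntax since we are still in step one. The file names search.cgi and go.gif, the id query and the access key s are all invented for the example.

```html
<form action="search.cgi" method="get">
  <p>
    <!-- label is tied to the text field through for/id -->
    <label for="query">Search the site:</label>
    <!-- accesskey lets keyboard users jump straight to the field -->
    <input type="text" id="query" name="query" accesskey="s">
    <!-- alt describes the image button for those who cannot see it -->
    <input type="image" src="go.gif" alt="Start searching">
  </p>
</form>

<!-- lang marks a passage written in another language (here Icelandic) -->
<p lang="is">Góðan daginn</p>
```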

Remembering the meta tags

Meta tags are good for many things with regard to specifying information about the content on a page. Mark your audience by defining the content-type and the content-language of your HTML page. Examples:

<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">

<META http-equiv="Content-Language" content="en-us">

Replacing your name attributes with ids

The final adjustment is a replacement, still both to code and mind. Wherever you have used the name attribute in the past, start using the id attribute. The id attribute uniquely identifies any item in your content, which is pretty useful, not only for increased CSS usage, but also for marking destination anchors of links. This is also important with regard to the XHTML transition further on, since the name attribute is deprecated in XHTML within the a, applet, form, frame, iframe, img and map elements. Caveat: Support for this behavior of id is shaky in earlier browsers, so also including a name for anchoring might be a good idea, for backwards compatibility.
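
A minimal sketch of a destination anchor that carries both attributes, plus a link pointing at it. The value route is just an example; when both attributes are used on the same element, they must have the same value.

```html
<!-- Destination anchor: id for XHTML, name kept for older browsers -->
<h2><a id="route" name="route">The Route</a></h2>

<!-- A link that jumps to it -->
<p><a href="#route">Jump to The Route</a></p>
```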

Step Two: Adjusting your code to XHTML, but not your DTDs

The second step in your XHTML transition is to add those eccentric XHTML features to your HTML code. XHTML documents must be well-formed. This means that all elements must be nested correctly and must either have closing tags or, if they are empty, be closed within the tag itself with a space and a slash ( />).

Keeping the tags lowercase

XML is case-sensitive, so all HTML element and attribute names must be written in lowercase in XHTML documents. The same goes for the element and attribute names you refer to in your cascading style sheets.
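
For instance, a table fragment before and after lowercasing might look like this. The class name price is invented for the example.

```html
<!-- HTML habit: names in capitals -->
<TABLE BORDER="1"><TR><TD CLASS="price">42</TD></TR></TABLE>

<!-- XHTML-ready: element and attribute names in lowercase -->
<table border="1"><tr><td class="price">42</td></tr></table>
```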

Closing and correctly nesting all tags

If an element is made up of opening and closing tags, use the closing tag, even where it has been marked optional in past versions of HTML. It is equally important to nest tags correctly: close the previously opened <em> before closing the paragraph it resides in.
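
A short sketch of both rules, with the sloppy version first. The content is made up.

```html
<!-- Not well-formed: li left open, em closed after its parent p -->
<ul>
  <li>First item
  <li>Second item
</ul>
<p>Some <em>emphasised text</p></em>

<!-- Well-formed: every element closed, innermost first -->
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
<p>Some <em>emphasised text</em></p>
```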

"space-slashing" empty tags

Space-slashing means adding a space and a slash at the end of all empty tags, that is, tags that don't have closing tags (such as br, hr and img). The slash tells an XML parser that the element ends right there. The XML specification says you could instead add a closing tag to those empty elements, but as I understand it, browser support for that is shaky at best. The space before the slash is there mainly for backwards compatibility; older browsers might choke on your page without it.
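
A few space-slashed empty elements, as a sketch. The file name logo.gif and the form field name email are placeholders.

```html
<p>First line<br />
second line</p>

<hr />

<p><img src="logo.gif" alt="Site logo" /></p>

<p><input type="text" name="email" /></p>
```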

Wrap attribute values in quotes

All attribute values must be quoted, even those which appear to be numeric. Example:

<td rowspan="3">

Adding the lang and xml:lang attributes

Use both the lang and xml:lang attributes when specifying the language of an element. The value of the xml:lang attribute takes precedence. Example:

<html lang="en" xml:lang="en">

Stopping the Attribute Minimization

XML does not support attribute minimization. Attribute-value pairs must be written in full. Attribute names such as nowrap cannot occur in elements without their value being specified (nowrap="nowrap"). Caveat: Some older HTML user agents are unable to interpret boolean attributes when these appear in their full (non-minimized) form, as required by XML 1.0. Note that this problem doesn't affect user agents compliant with HTML 4. The following attributes are involved: compact, nowrap, ismap, declare, noshade, checked, disabled, readonly, multiple, selected, noresize, defer.
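
To illustrate with two of the attributes listed, checked and selected, here is a form fragment in both styles. The field names news and country are invented.

```html
<!-- Minimized: fine in HTML, not allowed in XHTML -->
<input type="checkbox" name="news" checked>
<select name="country">
  <option selected>Iceland</option>
</select>

<!-- Written in full: every attribute has a quoted value -->
<input type="checkbox" name="news" checked="checked" />
<select name="country">
  <option selected="selected">Iceland</option>
</select>
```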

Adding Character Encoding

To specify a character encoding in the document, use both the encoding attribute specification on the xml declaration (e.g., <?xml version="1.0" encoding="EUC-JP"?>) and a meta http-equiv statement (e.g., <meta http-equiv="Content-type" content='text/html; charset="EUC-JP"' />). The value of the encoding attribute of the xml declaration takes precedence.

Embedding Style Sheets and Scripts

Use external style sheets if your style sheet uses < or & or ]]> or --. Note that XML parsers are permitted to silently remove the contents of comments. Therefore, the historical practice of "hiding" scripts and style sheets within comments to make the documents backward compatible is likely to not work as expected in XML-based implementations.
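
One way to sidestep the problem is to move both out of the document altogether. The sketch below assumes two hypothetical files, site.css and site.js, sitting next to the page.

```html
<head>
  <title>External style sheet and script</title>
  <!-- Style rules live in site.css, so < & ]]> inside them cannot upset an XML parser -->
  <link rel="stylesheet" type="text/css" href="site.css" />
  <!-- The script lives in site.js; no comment "hiding" is needed -->
  <script type="text/javascript" src="site.js"></script>
</head>
```

Note that the script element is written with a separate closing tag rather than as an empty tag. It is not an empty element in HTML, and HTML browsers are likely to mishandle the short form.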

Adjusting to allowed nesting

XHTML has stricter nesting rules than HTML. You have to be more careful about how you build up your code and which elements you nest within one another. Some combinations of nested elements are forbidden. The elements in question are the following:

  • a cannot contain other a elements.
  • pre cannot contain the img, object, big, small, sub, or sup elements.
  • button cannot contain the input, select, textarea, label, button, form, fieldset, iframe or isindex elements.
  • label cannot contain other label elements.
  • form cannot contain other form elements.
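
Taking the first rule in the list as an example, a link nested inside another link has to be split into two separate links. The file names are placeholders.

```html
<!-- Forbidden: an a element inside another a element -->
<p><a href="intro.html">Read <a href="spec.html">the specification</a> first</a></p>

<!-- Allowed: two links side by side -->
<p><a href="intro.html">Read</a> <a href="spec.html">the specification</a> first</p>
```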

Adding the XML namespace attribute

The XML namespace attribute (xmlns) is needed in all XHTML documents. It's good practice to start adding it to the root element (<html>) right away. The correct syntax is as follows:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

Step Three: Making that Transition

Adding the XML declaration

An XML declaration is not required, but it is strongly encouraged. Whenever the character encoding differs from the defaults (UTF-8 or UTF-16), it is necessary.

Changing the HTML Version Information

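Replace the HTML 4.01 document type declaration at the top of each document with its XHTML 1.0 counterpart, strict, transitional or frameset:

  • <!DOCTYPE html
       PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

  • <!DOCTYPE html
       PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

  • <!DOCTYPE html
       PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">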


Simple example of an XHTML document

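Finally, let's put together a basic XHTML document showcasing what has been mentioned before in this article. It uses the Transitional DTD, because the noshade attribute is deprecated and not part of the Strict DTD. The title text is only a placeholder.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>A simple XHTML document</title>
<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
</head>
<body>
<p> a community for the web developers,
by the web developers.</p>
<hr noshade="noshade" />
</body>
</html>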



Elfur Logadòttir (elfur) is The Icelandic One. She is a student, a freelance Web developer, a mother, a soccer club manager, a founding member of and's current secretary. Elfur has been attached to the Web since its early days, when the likes of Netscape 1.0 were The Ultimate Experience and Wired was the place to love.

This article is the property of its author; please do not redistribute or use it elsewhere without checking with the author.