The XHTML Transition: It's Not That Difficult
Posted on 26 Apr 2001
by Elfur Logadòttir (elfur)
Now that XHTML 1.0 is the W3C's Recommendation for the latest version of HTML, you should have started to prepare your code for it. You're already coding to the HTML 4.01 Recommendation and validating your code (well, if you aren't, you should start *now*), so all you need to know is how to make the transition, and when. But let's start with why:
Why the transition to XHTML?
XHTML is a family of current and future document types and modules that reproduce, subset, and extend HTML 4. The XHTML family document types are XML based, and ultimately are designed to work in conjunction with XML-based user agents. Well-designed HTML documents that distinguish structure and presentation will adapt more easily to new technologies. Not to mention the fact that some of the most popular elements of the past are deprecated today, on their way to becoming obsolete.
XHTML is a reformulation of the three HTML 4 document types as applications of XML 1.0. It is intended to be used as a language for content that is both XML-conforming and, if some simple guidelines are followed, operates in HTML 4 conforming user agents.
The XHTML family is the next step in the evolution of the Internet. By migrating to XHTML today, content developers can enter the XML world with all of its intended benefits, while still remaining confident in their content's backward and future compatibility.
There are three steps on your way to perfect XHTML coding. If you haven't been coding to the HTML 4.01 Recommendation, that is your first step. Next, you make small adjustments to your coding habits while still validating your code against HTML 4.01. Finally, you make the complete transition by changing the HTML version information in your document type declarations.
Step One: Coding to the HTML 4.01 Recommendations
Adding HTML Version Information
In your leap to HTML 4.01, your first amendment to your code is adding HTML version information at the top of each document. Here you have three document types to choose from: strict, transitional or frameset.
- The HTML 4.01 Strict DTD includes all elements and attributes that have not been deprecated or do not appear in frameset documents. For documents that use this DTD, use this document type declaration:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
- The HTML 4.01 Transitional DTD includes everything in the strict DTD plus deprecated elements and attributes (most of which concern visual presentation). For documents that use this DTD, use this document type declaration:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
- The HTML 4.01 Frameset DTD includes everything in the transitional DTD plus the tags for frames. For documents that use this DTD, use this document type declaration:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
Deleting your font elements and your color/alignment attributes
The second adjustment is deleting your font elements, not only from your documents but from your mind as well. The essential change between HTML 3.2 and HTML 4.0 (and then 4.01) is the separation of presentation from content. Therefore most elements dealing with presentation are deprecated in favor of Cascading Style Sheets (CSS). For the same reason, color and alignment attributes should also be removed.
Another adjustment is an addition, both to your code and your mind: use the title attribute basically everywhere. Put it in your anchors, your abbreviations and anywhere else you feel an explanation might make your content more accessible.
Increasing accessibility for people with physical limitations
I've already mentioned the title attribute, but you can do so much more: the alt, accesskey, lang and label attributes all help. Use them all.
Remembering the meta tags
Meta tags are good for many things with regard to specifying information about the content on a page. Mark your audience by defining the content-type and the content-language of your HTML page. Examples:
<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
<META http-equiv="Content-Language" content="en-us">
Replacing name attributes with id attributes
The final adjustment is a replacement, still both to code and mind. Wherever you have used the name attribute in the past, start using the id attribute. The id attribute uniquely identifies any element in your content, which is useful not only for increased CSS usage but also for marking the destination anchors of links. This also matters for the XHTML transition further on, since the name attribute of the a, applet, form, frame, iframe, img and map elements is deprecated in XHTML. Caveat: support for id as a link destination is shaky in older browsers, so also including a name attribute on your anchors might be a good idea for backwards compatibility.
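Because an id must be unique within a document, a quick automated check can catch accidental reuse before you rely on ids for links and CSS. This is a minimal sketch in Python (not part of the original article), assuming the page is well-formed enough for the standard-library xml.etree.ElementTree parser to read; find_duplicate_ids is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_duplicate_ids(markup):
    """Hypothetical helper: return id values used more than once, sorted."""
    root = ET.fromstring(markup)
    # Count every id attribute on every element, including the root.
    counts = Counter(el.get("id") for el in root.iter() if el.get("id") is not None)
    return sorted(value for value, n in counts.items() if n > 1)


page = """<body>
  <h2 id="intro">Intro</h2>
  <p id="intro">Oops, this id is reused.</p>
  <a id="top" name="top"></a>
</body>"""

print(find_duplicate_ids(page))  # ['intro']
```

The anchor carrying both id="top" and name="top" follows the backwards-compatibility advice above and is not reported, since each value appears only once as an id.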
Step Two: Adjusting your code to XHTML, but not your DTDs
The second step in your XHTML transition is to add those eccentric XHTML features to your HTML code. XHTML documents must be well-formed. This means that all elements must be nested correctly and must either have closing tags or, if they are empty, be closed with a space and a slash ( />).
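One way to see what "well-formed" means in practice is to hand the same content, written both ways, to an XML parser. This minimal sketch assumes Python's standard-library xml.etree.ElementTree as a stand-in for an XML-based user agent; is_well_formed is a hypothetical helper, not an existing API.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Hypothetical helper: True if the XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Valid HTML 4, but <li> and <br> are never closed.
html4_style = "<ul><li>One<li>Two<br></ul>"
# The same content with every element closed.
xhtml_style = "<ul><li>One</li><li>Two<br /></li></ul>"

print(is_well_formed(html4_style))  # False
print(is_well_formed(xhtml_style))  # True
```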
Keeping the tags lowercase
XML is case-sensitive; therefore, all HTML element and attribute names must be lowercase in XHTML documents. This also affects your cascading style sheets: element selectors should be lowercase so that they match the lowercase XHTML elements.
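To an XML parser, P and p are simply two different names. This minimal sketch, assuming Python's standard-library xml.etree.ElementTree parser as a stand-in for an XML user agent, shows both failure modes: mismatched case is a parse error, and consistent uppercase parses but produces names that XHTML-aware software will not recognize.

```python
import xml.etree.ElementTree as ET


def parses(markup):
    """True if the XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Mixed case: <P> and </p> do not match, so this is not well-formed.
print(parses("<P>Hello</p>"))  # False

# Consistent uppercase is well-formed, but the element is "P", not "p",
# and the attribute is "CLASS", not "class".
upper = ET.fromstring('<P CLASS="note">Hello</P>')
print(upper.tag, upper.get("class"), upper.get("CLASS"))  # P None note
```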
Closing and correctly nesting all tags
If an element is made up of opening and closing tags, use the closing tag, even where it was marked optional in past versions of HTML. It is equally important to nest tags correctly: close a previously opened <em> before closing the paragraph it resides in.
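Both rules can be checked mechanically. This minimal sketch assumes Python's standard-library xml.etree.ElementTree parser; the parses helper is hypothetical and simply reports whether the parser accepted the fragment.

```python
import xml.etree.ElementTree as ET


def parses(markup):
    """True if the XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Optional closing tags omitted: fine in HTML 4, an error in XML.
print(parses("<ol><li>First<li>Second</ol>"))            # False
print(parses("<ol><li>First</li><li>Second</li></ol>"))  # True

# Overlapping tags: <em> must close before the paragraph that contains it.
print(parses("<p>Some <em>stressed text</p></em>"))      # False
print(parses("<p>Some <em>stressed text</em></p>"))      # True
```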
"space-slashing" empty tags
Space-slashing means ending every empty tag (a tag without a closing tag, such as <br /> or <hr />) with a space and a slash. The slash is what tells an XML parser that the element has ended. XML also allows writing an empty element with a separate closing tag (<br></br>), but support for that in existing browsers is shaky at best. The space before the slash is there for backwards compatibility: older browsers might choke on your page without it.
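From the XML side, the slash is the part that matters. This minimal sketch, assuming Python's standard-library xml.etree.ElementTree parser, shows that an unclosed <br> breaks parsing, while <br /> and <br></br> produce the same element; the choice between them is purely about older HTML browsers.

```python
import xml.etree.ElementTree as ET

# An HTML 4 style <br> is never closed, so the parser reports an error.
try:
    ET.fromstring("<p>Line one<br>Line two</p>")
    unclosed_ok = True
except ET.ParseError:
    unclosed_ok = False

# To an XML parser, the short and long forms are the same empty element;
# ElementTree serializes both as <br />.
short = ET.tostring(ET.fromstring("<br />"))
long_form = ET.tostring(ET.fromstring("<br></br>"))

print(unclosed_ok, short == long_form)  # False True
```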
Wrap attribute values in quotes
All attribute values must be quoted, even those that appear to be numeric. Example: write <td rowspan="3">, not <td rowspan=3>.
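An XML parser rejects unquoted values outright, and quoted values always come back as strings. This minimal sketch assumes Python's standard-library xml.etree.ElementTree parser as a stand-in XML user agent; parses is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def parses(markup):
    """True if the XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(parses("<td rowspan=3>data</td>"))    # False
print(parses('<td rowspan="3">data</td>'))  # True

# Single quotes are also legal XML; the value is always a string.
print(repr(ET.fromstring("<td rowspan='3'>data</td>").get("rowspan")))  # '3'
```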
Use both the lang and xml:lang attributes when specifying the language of an element. The value of the xml:lang attribute takes precedence. Example:
<html lang="en" xml:lang="en">
Stopping the Attribute Minimization
XML does not support attribute minimization. Attribute-value pairs must be written in full: attribute names such as nowrap cannot occur in elements without their value being specified, so write nowrap="nowrap". Caveat: some older HTML user agents are unable to interpret boolean attributes when these appear in their full (non-minimized) form, as required by XML 1.0. Note that this problem doesn't affect user agents compliant with HTML 4. The following attributes are involved: compact, nowrap, ismap, declare, noshade, checked, disabled, readonly, multiple, selected, noresize, defer.
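A minimized boolean attribute is a syntax error to an XML parser, not just a style issue. This minimal sketch, assuming Python's standard-library xml.etree.ElementTree parser and a hypothetical parses helper, compares the minimized and full forms of two attributes from the list.

```python
import xml.etree.ElementTree as ET


def parses(markup):
    """True if the XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Minimized (HTML 4 only) versus full (XHTML) boolean attributes.
print(parses('<input type="checkbox" checked />'))            # False
print(parses('<input type="checkbox" checked="checked" />'))  # True
print(parses("<hr noshade />"))                               # False
print(parses('<hr noshade="noshade" />'))                     # True
```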
Adding Character Encoding
To specify a character encoding in the document, use both the encoding attribute of the XML declaration (e.g., <?xml version="1.0" encoding="EUC-JP"?>) and a meta http-equiv statement (e.g., <meta http-equiv="Content-type" content='text/html; charset="EUC-JP"' />). The value of the encoding attribute of the XML declaration takes precedence.
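Why the XML declaration wins is easy to demonstrate: an XML parser treats the meta element as ordinary markup and takes the encoding only from the declaration (or assumes UTF-8). This minimal sketch assumes Python's standard-library xml.etree.ElementTree parser and uses ISO-8859-1 rather than EUC-JP, because Python's bundled expat does not handle multi-byte legacy encodings; parse_latin1 is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

meta = '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />'
doc = "<html><head>" + meta + "</head><body><p>Caf\u00e9</p></body></html>"
decl = '<?xml version="1.0" encoding="ISO-8859-1"?>'


def parse_latin1(text):
    """Hypothetical helper: encode as ISO-8859-1 bytes, then parse; None on error."""
    try:
        return ET.fromstring(text.encode("iso-8859-1"))
    except ET.ParseError:
        return None


# Meta element only: the parser assumes UTF-8 and rejects the e-acute byte.
print(parse_latin1(doc))  # None

# With the XML declaration, the same bytes decode correctly.
root = parse_latin1(decl + doc)
print(root.find("body/p").text)  # Café
```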
Embedding Style Sheets and Scripts
Use external style sheets if your style sheet uses < or & or ]]> or --, and external scripts if your script does. Note that XML parsers are permitted to silently remove the contents of comments. Therefore, the historical practice of "hiding" scripts and style sheets within comments to make documents backward compatible is unlikely to work as expected in XML-based implementations.
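Both warnings can be reproduced with an ordinary XML parser. This minimal sketch assumes Python's standard-library xml.etree.ElementTree, whose default tree builder discards comments. It also shows a CDATA marked section, which the XHTML 1.0 specification describes for inline script and style content; external files remain the safer choice for older HTML browsers.

```python
import xml.etree.ElementTree as ET

# 1. The HTML-era trick of hiding a script inside a comment.
hidden = '<script type="text/javascript"><!-- document.write("hi"); --></script>'
script = ET.fromstring(hidden)
# The default tree builder drops comments, so the script body is gone.
print(script.text is None or script.text.strip() == "")  # True

# 2. An unescaped < in inline script content is a well-formedness error.
try:
    ET.fromstring('<script type="text/javascript">if (a < b) { go(); }</script>')
    raw_ok = True
except ET.ParseError:
    raw_ok = False
print(raw_ok)  # False

# 3. Inside a CDATA section, < and & are plain character data.
cdata = '<script type="text/javascript"><![CDATA[ if (a < b && c) { go(); } ]]></script>'
print(ET.fromstring(cdata).text.strip())  # if (a < b && c) { go(); }
```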
Adjusting to allowed nesting
XHTML has stricter nesting rules than HTML. You have to be more careful about how you build up your code and which elements you nest within others, because some combinations are forbidden at any depth of nesting:
- a cannot contain other a elements.
- pre cannot contain the img, object, big, small, sub or sup elements.
- button cannot contain the input, select, textarea, label, button, form, fieldset, iframe or isindex elements.
- label cannot contain other label elements.
- form cannot contain other form elements.
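These nestings are still well-formed XML, so an XML parser alone will not flag them; you need a validator or a check of your own. This minimal sketch in Python uses the standard-library xml.etree.ElementTree module; find_prohibited_nesting and local_name are hypothetical helpers, and the PROHIBITED table follows the element prohibitions in the XHTML 1.0 specification.

```python
import xml.etree.ElementTree as ET

# XHTML 1.0 element prohibitions: outer element -> elements it must not contain.
PROHIBITED = {
    "a": {"a"},
    "pre": {"img", "object", "big", "small", "sub", "sup"},
    "button": {"input", "select", "textarea", "label", "button",
               "form", "fieldset", "iframe", "isindex"},
    "label": {"label"},
    "form": {"form"},
}


def local_name(tag):
    """Strip an ElementTree namespace prefix such as {http://www.w3.org/1999/xhtml}."""
    return tag.rsplit("}", 1)[-1]


def find_prohibited_nesting(markup):
    """Hypothetical helper: return (outer, inner) pairs that break a prohibition."""
    problems = []
    for outer in ET.fromstring(markup).iter():
        banned = PROHIBITED.get(local_name(outer.tag), set())
        # iter() walks all descendants, so nesting at any depth is caught.
        for inner in outer.iter():
            if inner is not outer and local_name(inner.tag) in banned:
                problems.append((local_name(outer.tag), local_name(inner.tag)))
    return problems


print(find_prohibited_nesting('<p><a href="#x">x <a href="#y">y</a></a></p>'))  # [('a', 'a')]
print(find_prohibited_nesting("<pre>H<sub>2</sub>O</pre>"))                    # [('pre', 'sub')]
```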
Adding the XML namespace
The XML namespace attribute (xmlns) is needed in all XHTML documents. It's good practice to start adding it to the root element (<html>) right away. The correct syntax is as follows:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
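The namespace is what turns a generic html element into an XHTML one for XML software. This minimal sketch, assuming Python's standard-library xml.etree.ElementTree parser, compares the same root element with and without xmlns and shows that xml:lang lands in the predefined XML namespace while the plain lang attribute does not.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace bound to the xml: prefix

with_ns = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><body/></html>'
)
without_ns = ET.fromstring('<html lang="en"><body/></html>')

print(with_ns.tag)     # {http://www.w3.org/1999/xhtml}html
print(without_ns.tag)  # html

# xml:lang and lang are two distinct attributes to the parser.
print(with_ns.get("{%s}lang" % XML_NS), with_ns.get("lang"))  # en en

# Child lookups must use the namespace too.
print(with_ns.find("{%s}body" % XHTML_NS) is not None, with_ns.find("body"))  # True None
```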
Step Three: Making that Transition
Adding the XML declaration
An XML declaration is not required, but it is strongly encouraged. It becomes necessary whenever the character encoding differs from the defaults (UTF-8 or UTF-16) and no encoding is supplied by a higher-level protocol such as HTTP.
Changing the HTML Version Information
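Replace your HTML 4.01 document type declaration with its XHTML 1.0 counterpart. Note that the root element name, html, is now lowercase:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">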
Simple example of an XHTML document
Finally, let's put together a basic XHTML document showcasing what has been mentioned before in this article.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title>evolt.org</title>
  <meta http-equiv="Content-type" content='text/html; charset="UTF-8"' />
 </head>
 <body>
  <p>evolt.org: a community for web developers, by web developers.</p>
  <hr noshade="noshade" />
 </body>
</html>