Now that XHTML 1.0 is the W3C Recommendation for the latest version of HTML, you should already be preparing your code for it. You're already coding to the HTML 4.01 Recommendation and validating your code (well, if you aren't, you should start *now*), so all you need to know is how to make the transition, and when. But let's start with why:
XHTML is a family of current and future document types and modules that reproduce, subset, and extend HTML 4. The XHTML family document types are XML based, and ultimately are designed to work in conjunction with XML-based user agents. Well-designed HTML documents that distinguish structure and presentation will adapt more easily to new technologies. On top of that, some of the most popular elements of the past are deprecated today and on their way to becoming obsolete.
XHTML is a reformulation of the three HTML 4 document types as applications of XML 1.0. It is intended to be used as a language for content that is both XML-conforming and, if some simple guidelines are followed, operates in HTML 4 conforming user agents.
The XHTML family is the next step in the evolution of the Internet. By migrating to XHTML today, content developers can enter the XML world with all of its intended benefits, while still remaining confident in their content's backward and future compatibility.
There are three steps on the way to perfect XHTML coding. If you haven't been coding to the HTML 4.01 Recommendation, that is your first step. Next, you make small adjustments to your coding habits while still validating your code against HTML 4.01. Finally, you make the complete transition by replacing the HTML 4.01 version information in your DOCTYPE declaration with its XHTML 1.0 equivalent.
In your leap to HTML 4.01, your first change to your code is adding HTML version information, in the form of a DOCTYPE declaration, at the top of each document. Here you have three document types to choose from: Strict, Transitional or Frameset.
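<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">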
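For orientation, here is a sketch of a minimal HTML 4.01 Strict page with the DOCTYPE in place. Strict leaves out the deprecated presentational elements and attributes as well as frames, Transitional still allows the deprecated features, and Frameset is for documents that divide the window into frames. The title and paragraph text are placeholders.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
  <title>My HTML 4.01 page</title>
</head>
<body>
  <!-- In the Strict DTD, body text must sit inside a block element such as p -->
  <p>The DOCTYPE goes at the very top of the file, before the html element.</p>
</body>
</html>
```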
The second adjustment you make is deleting your <font> elements, not only from your documents, but from your mind as well. The essential change between HTML 3.2 and HTML 4.0, and then 4.01, is separating presentation from content. Therefore most elements dealing with presentation are deprecated in favor of Cascading Style Sheets (CSS). For the same reason, color and alignment attributes should also be removed.
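A before-and-after sketch of that change. The class name notice is only an illustrative choice, and font-size: large is a rough stand-in for size="4". The style element belongs in the document's head, or the rule can move into an external style sheet.

```html
<!-- HTML 3.2 habit: presentation mixed into the markup -->
<p align="center"><font color="red" size="4">Sale ends Friday</font></p>

<!-- HTML 4.01: the markup carries structure, CSS carries presentation -->
<style type="text/css">
  p.notice { text-align: center; color: red; font-size: large; }
</style>
<p class="notice">Sale ends Friday</p>
```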
Another adjustment is an addition, both to your code and your mind: use the title attribute generously, in your anchors, your abbreviations and anywhere you feel an explanation might make your content more accessible.
I've already mentioned the title attribute, but there is much more you can do: the alt attribute (required on <img> and <area>), the accesskey attribute, the lang attribute and the label attribute. Use them all.
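A sketch of those attributes in HTML 4.01 Strict markup. The specification URL is real; the logo file name, form action and field names are made-up placeholders. Note that the label attribute (on optgroup and option) is different from the label element used to caption the select.

```html
<p>
  <abbr title="World Wide Web Consortium">W3C</abbr> publishes the
  <a href="http://www.w3.org/TR/html401/" title="The HTML 4.01 Specification"
     accesskey="s">HTML 4.01 specification</a>.
  <img src="logo.png" alt="evolt.org logo">
  The French greeting <span lang="fr">bonjour</span> means hello.
</p>
<form action="search.html">
  <p>
    <label for="site-lang">Language:</label>
    <select id="site-lang" name="site-lang">
      <optgroup label="European">
        <option label="English" value="en">English (en)</option>
      </optgroup>
    </select>
  </p>
</form>
```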
Meta elements can specify many kinds of information about the content of a page. Tell user agents who your page is for by declaring its content type (including the character encoding) and its content language. Examples:
<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
<META http-equiv="Content-Language" content="en-us">
The final adjustment is a replacement, again both in code and in mind. Wherever you have used the name attribute to identify an element in the past, start using the id attribute. The id attribute uniquely identifies any element in your content, which is useful not only for styling with CSS but also for marking the destinations of links. This also matters for the XHTML transition further on, since XHTML 1.0 formally deprecates the name attribute on the a, applet, form, frame, iframe, img and map elements. (Form controls such as <input> and <select> still need name, because that is what gets submitted with the form.) Caveat: support for id as a link destination is shaky in earlier browsers, so for backward compatibility it is a good idea to also include a name attribute with the same value on your anchors.
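A sketch of a link destination that works in both old and new browsers; the fragment name section2 is arbitrary. In HTML 4.01, id and name share one namespace, so when both appear on the same element their values must be identical.

```html
<!-- The link: points at a fragment on the same page -->
<p><a href="#section2">Jump to section 2</a></p>

<!-- The destination: id for current browsers, a matching name for older ones -->
<h2><a id="section2" name="section2">Section 2</a></h2>
```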
The second step in your XHTML transition is to add those eccentric XHTML features to your HTML code. XHTML documents must be well-formed. This means that all elements must be nested correctly, and every element must either have an end tag or, if it is empty, be closed with a space and a slash ( />).
XML is case-sensitive, so all element and attribute names in XHTML documents must be lowercase. This affects your cascading style sheets as well: selectors that refer to element names must also be written in lowercase to match.
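A sketch of the change; the class name intro is only an example. The style sheet uses a lowercase element selector so it keeps matching once the document is treated as XML.

```html
<!-- Accepted by HTML 4.01, not allowed in XHTML -->
<P CLASS="intro">Welcome to the site.</P>

<!-- XHTML: element and attribute names in lowercase -->
<p class="intro">Welcome to the site.</p>

<!-- Matching CSS, with the element selector in lowercase -->
<style type="text/css">
  p.intro { font-weight: bold; }
</style>
```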
If an element has an end tag, always write it, even where past versions of HTML marked it optional (as with </p> and </li>). It is just as important to nest tags correctly: close an <em> opened inside a paragraph before you close the paragraph it resides in.
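A sketch contrasting tag soup that HTML browsers tolerate with its well-formed equivalent:

```html
<!-- Not well-formed: missing end tags, and em is closed after its parent p -->
<ul>
  <li>First item
  <li>Second item
</ul>
<p>Some <em>emphasized text</p></em>

<!-- Well-formed: every end tag present, elements closed in reverse order -->
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
<p>Some <em>emphasized text</em></p>
```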
Space-slashing means adding a space and a slash at the end of every empty element, that is, every element that has no end tag, such as <br />, <hr /> and <img />. The slash tells an XML parser that the element has ended. The XML specification also allows an explicit end tag on empty elements (<br></br>), but existing browsers handle that form unreliably at best, so stick with the minimized syntax. The slash itself is required by XML; the space before it is there mainly for backward compatibility, because older browsers might choke on your page without it.
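The same empty elements written both ways; the image file name and author value are placeholders. This applies only to elements that are empty by definition: an element such as p that merely happens to have no content should still be written <p></p>, not <p />, since older browsers may misread the minimized form.

```html
<!-- HTML 4.01 -->
<br>
<hr>
<img src="photo.jpg" alt="Team photo">
<meta name="author" content="Jane Doe">

<!-- XHTML 1.0: a space and a slash before the closing bracket -->
<br />
<hr />
<img src="photo.jpg" alt="Team photo" />
<meta name="author" content="Jane Doe" />
```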
All attribute values must be quoted, even those that appear to be numeric. Example:
<td rowspan="3">
Use both the lang and xml:lang attributes when specifying the language of an element. The value of the xml:lang attribute takes precedence. Example:
<html lang="en" xml:lang="en">
XML does not support attribute minimization: attribute-value pairs must be written in full. Attribute names such as nowrap cannot appear in an element without a value; write nowrap="nowrap" instead. Caveat: some older HTML user agents are unable to interpret boolean attributes when these appear in their full (non-minimized) form, as required by XML 1.0. Note that this problem doesn't affect user agents compliant with HTML 4. The following attributes are involved: compact, nowrap, ismap, declare, noshade, checked, disabled, readonly, multiple, selected, noresize, defer.
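A before-and-after sketch for three of the attributes in that list; the field names and values are placeholders.

```html
<!-- HTML 4.01: minimized attributes -->
<input type="checkbox" name="newsletter" value="yes" checked>
<select name="topic" multiple>
  <option value="css" selected>CSS</option>
  <option value="xhtml">XHTML</option>
</select>

<!-- XHTML 1.0: every attribute written out as name="value" -->
<input type="checkbox" name="newsletter" value="yes" checked="checked" />
<select name="topic" multiple="multiple">
  <option value="css" selected="selected">CSS</option>
  <option value="xhtml">XHTML</option>
</select>
```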
To specify a character encoding in the document, use both the encoding attribute on the XML declaration (e.g., <?xml version="1.0" encoding="EUC-JP"?>) and a meta http-equiv statement (e.g., <meta http-equiv="Content-type" content='text/html; charset="EUC-JP"' />), and make sure both name the same encoding. The XML declaration must be the very first thing in the file, and the value of its encoding attribute takes precedence.
Use external style sheets if your style sheet uses < or & or ]]> or --, and external scripts if your script does. Note that XML parsers are permitted to silently remove the contents of comments. Therefore, the historical practice of "hiding" scripts and style sheets within comments to make documents backward compatible is unlikely to work as expected in XML-based implementations.
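A sketch of a head that keeps style and script out of the markup; site.css and site.js are placeholder file names. The script element keeps an explicit end tag: it is not an empty element, so the space-slash form does not apply to it.

```html
<head>
  <title>External style and script</title>
  <!-- Rules containing <, &, ]]> or -- are safe in an external file -->
  <link rel="stylesheet" type="text/css" href="site.css" />
  <!-- Written with an end tag, because script is not an empty element -->
  <script type="text/javascript" src="site.js"></script>
</head>
```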
You have to be careful about how you build up your code and which elements you nest within one another, because some combinations are forbidden. XHTML 1.0 inherits these prohibitions from HTML 4, but since an XML DTD cannot express them, the XHTML specification lists them explicitly. The elements in question are the following:
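a cannot contain other a elements.
pre cannot contain the img, object, big, small, sub or sup elements.
button cannot contain the input, select, textarea, label, button, form, fieldset, iframe or isindex elements.
label cannot contain other label elements.
form cannot contain other form elements.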
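Two of those prohibitions in a sketch; the URLs are placeholders. Browsers may still render the forbidden versions, but they are invalid, and a link nested inside another link has no sensible meaning.

```html
<!-- Invalid: an a element inside another a element -->
<p><a href="/articles/">Articles about <a href="/tags/xhtml/">XHTML</a></a></p>

<!-- Valid: two sibling links -->
<p><a href="/articles/">Articles</a> about <a href="/tags/xhtml/">XHTML</a></p>

<!-- Invalid: sup inside pre -->
<pre>E = mc<sup>2</sup></pre>

<!-- Valid: sup inside an ordinary paragraph -->
<p>E = mc<sup>2</sup></p>
```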
The xmlns namespace attribute is required in every XHTML document: the root element must declare the XHTML namespace. It's good practice to start adding it to the root element (<html>) right away. The correct syntax is as follows:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
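An XML declaration is not required, but it is strongly encouraged. It becomes necessary whenever the character encoding differs from the defaults (UTF-8 or UTF-16) and no encoding is supplied by a higher-level protocol, such as the HTTP Content-Type header.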
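The third step is to replace your HTML 4.01 DOCTYPE with the matching XHTML 1.0 one:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">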
Finally, let's put together a basic XHTML document showcasing what has been mentioned before in this article.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>evolt.org</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  </head>
  <body>
    <p>evolt.org, a community for web developers, by web developers.</p>
    <hr />
  </body>
</html>