Why standards-compliant HTML matters
Web technologies have always been misused to achieve effects they were not (yet) designed to achieve. While this is a very common and natural human drive, and sometimes a very fruitful approach, the creative use of computer tools has had serious negative consequences for the quality of web products.
When Mosaic, the first widely used graphical browser, was released, nobody was talking about page layout yet. It was a graphical user interface for an essentially text-based browser, and the web was still all about content. (Side note: content, and ASCII art, another creative use of technology that we humans can perceive and appreciate, while it makes no sense at all to the computer.)
The first additions, such as the ability to embed inline graphics, did serve informational purposes, but they also triggered creative, unintended uses. Netscape 1.1 introduced tables, Netscape 2.0 followed with frames, and things continued evolving dramatically in that direction. Proprietary extensions that were soon folded into the official HTML standard allowed webmasters to display tabular data, but they were quickly repurposed. Tables, images, frames and font colors, each designed to carry a certain type of information, were (and still are) being used for purely visual purposes.
The frame was supposed to separate independent elements of content and to allow the rational display of content and navigation, or more generally of pieces of content that are related but of a different nature. The actual use was pretty close to the original idea: here the content, there the logo and main nav, here the footer and copyright notice. However, empty frames were quickly added to emulate various behaviors deemed desirable by someone on the project team: to display only one URL for the entire site (at some point, some considered it clean!), or to center the site's content in the browser window.
Similarly, tables were meant to organize information in rows and columns. However, it became clear pretty quickly that with invisible borders, tables could be used to cut pages into regions of a controlled size, to hold various pieces of text or images together in a visually satisfying manner, to selectively apply a background image or color, etc. More than abundant literature was written on the topic, and some graphic design packages even included the ability to produce table-based web layouts on the fly.
Again, inline images (i.e. embedded in the text) were intended to display, well, visual information: graphics, photos, logos, and the like. But those were soon repurposed to participate in layout and branding. Overall page visuals, as created by graphic designers, were being built out of bits of images put together in a totally meaningless manner, which nevertheless looked good in a graphical browser. Large images containing everything (from the site's identity to its textual content) were a way to fully control the layout. But the crowning glory of this practice was the spacer gif, a 1x1 pixel image which conveyed no information at all, but enabled HTML coders to control the behavior of tables quite finely.
Layers and scripts, <font> tags and other remnants of the first browser wars show the same repurposing pattern.
Creative (but unorthodox) use of technology
Creative minds were applied to the problem of creating visually appealing pages with the limitations of the very inappropriate tools available. And creative solutions were found. And the web started to look good. Designers became more and more astute at using those tricks, and attained a certain balance in the efficient but totally unorthodox usage of the technologies.
The problem here is that each site was creating its own standard of information encoding. This essentially defeated the original purpose of HTML and neutralized one of its most powerful uses: packaging information in a way that the computer can process, even if the other major purpose, getting information across to users, was reached in most cases. Basing their work on the reality that most users would access the content via a certain browser, web builders started tweaking their input until they were happy with the output.
Ideally, all web sites should feature structurally correct markup: <h1> describing a top-level heading, <blockquote> wrapping a quotation block, headings organized in a logical way, <strong> or <em> applied to mark emphasis, tables to display tabular data, etc. When you have this, you reach two goals:
- get your information across to your users: because all browsers will display that structurally correct code in a good (but probably boring or ugly) manner
- make that information usable by a computer: any program capable of parsing HTML can be made aware of much of the content's context, structure, and highlights
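To make the second goal concrete, here is a minimal sketch of a program recovering a document's outline from its headings. It assumes Python's standard html.parser module; the OutlineParser class, the extract_outline function and both sample pages are made up for this illustration. The two samples show the same text to a human reader, but only the structurally marked-up one yields anything to the program.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect (level, text) pairs for every <h1>..<h6> heading."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []    # finished headings as (level, text)
        self._level = None   # level of the heading currently open, if any
        self._chunks = []    # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._chunks = []

    def handle_data(self, data):
        if self._level is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            # Collapse runs of whitespace inside the heading text.
            text = " ".join("".join(self._chunks).split())
            self.outline.append((self._level, text))
            self._level = None


def extract_outline(html):
    """Return the list of (level, text) headings found in an HTML string."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


# Hypothetical page using structural markup.
STRUCTURED = """
<h1>Annual report</h1>
<p>Summary of the year.</p>
<h2>Sales</h2>
<p>Figures by region.</p>
<h2>Outlook</h2>
"""

# Same visible text, but the "headings" are only font tricks.
VISUAL_ONLY = """
<font size="6"><b>Annual report</b></font><br>
Summary of the year.<br>
<font size="4"><b>Sales</b></font><br>
Figures by region.<br>
<font size="4"><b>Outlook</b></font>
"""

print(extract_outline(STRUCTURED))   # [(1, 'Annual report'), (2, 'Sales'), (2, 'Outlook')]
print(extract_outline(VISUAL_ONLY))  # []
```

A search spider or a screen reader is in the same position as this little script: it can only work with the structure that the markup actually declares.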
But the actual practice was to use those structural elements for the visual effect they had. And web designers would happily mix and match until the desired effect was reached, when viewed in a particular browser. Besides the enormously costly cross-platform discrepancies that arose from that practice, the consequence was HTML code that did not separate the content and its structure from the presentation.
All sorts of bad things happen when you do this, most of them related to the limits of using a technology in ways it was not intended for:
- lessened impact on search engines, as related information may be technically unavailable for the search spiders despite being visible to most of your visitors
- access difficulty for people using a browser other than the one(s) you chose as your reference target, such as people with a disability
- bandwidth waste, as the ratio between content and markup in your HTML code plunges to abysmal levels
- terrible difficulties in maintaining the content, changing the layout, and dealing with new access technologies (mobile phones, PDAs, RSS feed readers).
Nowadays, content is often pulled from the database of some Content Management System, and the ugly structural-markup-for-visual-purposes could be part of the display template. In that case, the last consequence obviously no longer applies. But then you get something else:
- content delivery must be handled on a per-method basis, with a different template handling the display for each different access method
And that assumes that you used only clean markup when maintaining content in your CMS, which is not always possible (for example, browser-based rich-text editors such as the very clever and powerful HTMLArea deliver terrible code).
So the case for standards-compliant, structurally correct HTML markup is the following: because it makes use of the features of HTML as they were designed, it is efficient in what it does (describing the nature of the various pieces of content that compose a document), and it is future-proof (if need be, good but deprecated markup can easily be converted to a newer version).
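The "easily converted" claim can be illustrated with a rough sketch: when markup is used consistently, moving it to a newer vocabulary is a mechanical substitution. The example below uses Python's standard re module; the modernize function and the tag mapping (b to strong, i to em) are assumptions chosen for illustration only, and whether a given <b> really means emphasis is an editorial decision that no script can make.

```python
import re

# Illustrative mapping only: old tag name -> replacement tag name.
REPLACEMENTS = {"b": "strong", "i": "em"}

# Matches opening and closing <b>/<i> tags, with or without attributes.
# The lookahead keeps <br>, <body>, <img> and friends untouched.
_TAG = re.compile(r"<(/?)(b|i)(?=[\s>])([^>]*)>", re.IGNORECASE)


def modernize(html):
    """Rename the tags listed in REPLACEMENTS, keeping attributes as they are."""

    def swap(match):
        slash, name, rest = match.groups()
        return f"<{slash}{REPLACEMENTS[name.lower()]}{rest}>"

    return _TAG.sub(swap, html)


print(modernize('<p>A <b>bold</b> and <i class="x">slanted</i> word<br></p>'))
# <p>A <strong>bold</strong> and <em class="x">slanted</em> word<br></p>
```

No comparable shortcut exists for a page assembled from nested layout tables and spacer images, because nothing in that markup says what each piece of content is.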
This does matter
In practice, this must serve as a warning against creative and entirely unintended uses of a technology in the domain of content management, storage and maintenance. Of course, we're talking here about taking this element into account in your decision process: in some cases, the short-term benefits may outweigh the long-term drawbacks. Honestly going through the list of negative aspects above with the next few years in mind may help you find a site's sweet spot in this compromise.
Take some time to carefully review the costs of the drawbacks above:
- "searchability" (increasing your site's affinity with search engine spiders, ensuring that the site's information registers in search engines) greatly increases the impact of a site, and is a very affordable improvement (a "low hanging fruit" waiting to be plucked). Many other factors are important in this regard, such as the number and wording of links leading to your site, but while not difficult at all to achieve, structurally correct markup can help very much, particularly for specialized content (business-specific vocabulary, original material, etc.).
- while a limited population (say, people with a major visual disability) might make up such a small part of your intended audience that it does not seem worth the effort, bad press, customer complaints and legal pressure may force you in the end to cater for their specific needs. And then, patching things up once the site is live is often much more painful and expensive than doing them right in the first place.
- caring about bandwidth waste may sound ludicrous now, in our days of cheap and plentiful broadband access. But bandwidth usage is not always free, and even when it is, the hosting provider may still choose to shut down your site for abuse. In all cases, having a wasteful site puts a larger price tag on success, thereby lessening one of the positive aspects of the online medium.
- at this point, many sites are generated from the database of a CMS, and therefore the impact of content repackaging is usually limited to creating a new template. However, the repackaging trajectory for a given target platform may be more or less costly depending on the current state of the information.
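The bandwidth point in the list above can be estimated rather than guessed. The following is a rough sketch, assuming Python's standard html.parser module; the content_ratio function and both sample snippets are invented for this illustration. It measures which share of a page's characters is text the visitor actually reads, ignoring whitespace. It is deliberately crude: inline script and style content would be counted as text, for instance.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Gather every piece of character data found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def content_ratio(html):
    """Return text characters divided by total characters (0.0 to 1.0).

    Whitespace is ignored on both sides so that indentation does not
    distort the comparison.
    """
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    text = "".join("".join(collector.chunks).split())
    total = len("".join(html.split()))
    return len(text) / total if total else 0.0


# Hypothetical table-and-spacer layout around a single sentence.
TABLE_LAYOUT = """
<table border="0" cellpadding="0" cellspacing="0" width="600">
  <tr>
    <td width="10"><img src="spacer.gif" width="10" height="1"></td>
    <td><font face="Arial" size="2">Welcome to our site</font></td>
  </tr>
</table>
"""

# The same sentence with structural markup only.
CLEAN = "<p>Welcome to our site</p>"

print(f"table layout: {content_ratio(TABLE_LAYOUT):.0%} content")
print(f"clean markup: {content_ratio(CLEAN):.0%} content")
```

Running such a measure over a few representative pages gives a first idea of how much of the traffic is markup overhead, before any decision about a rebuild is made.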
Assessment in your context
In the context of a particular web site, the web builder will have to assess the importance and relevance of the various factors influencing the technical realization. The checklist above can help, but many other factors come into play, such as the intended audience and its social and technical characteristics, the expected lifetime of the site, as well as its lifecycle (the way it will be built, used, reused, and archived or destroyed). And the cost of standards-compliant building must be taken into consideration.
Regarding costs and efforts, here are a few hints:
- old habits die hard, and several activities, such as visual design and coding, need to be tackled in a radically different manner. Switching to standards-compliant coding is not an implementation detail, and it may cost a team quite a bit of effort. Don't decide this at the last minute without adapting your project timeline.
- tables-based layouts are easy to build and rather reliable (within their limits), while CSS-based layouts on top of structurally correct HTML are still quirky and tend to irritate graphic designers, as they are quite a departure from the safe tables- and paper-based design. Ensure that all parties involved in designing and producing the visual aspect of the site are aware of the change.
- it took time to build up experience in tables-based layouts, and it will take time to come up with solutions to all problems for CSS-based layouts. It is normal that something really easy to do with tables should take more effort with CSS, and it will also take a while for everyone on the team to know how to relate to other people's tasks in the new context.
Not for you?
In some cases, you might decide that the standards-compliant route is too expensive and too troublesome for your project.
Building for an intranet with a tightly controlled set of machines and browsers is not in itself an argument against standards-compliant building, and should only be considered in combination with other, more compelling factors. Indeed, while designing for one browser may save a bit of effort, or provide extra possibilities, it also increases the reliance of the site on external factors you have no control over: not only could the browser be replaced following a company-wide policy reversal, but the company making the browser could also choose to change the software's technical characteristics in future versions. With standards-compliant code, you're safe in both cases.
The single most relevant reason for building a site that is not standards-compliant is time, for instance if you are creating an event site which will be taken offline within a few weeks or months of its launch. In that case, layout and content will probably be extremely tightly integrated (as in "rich media" sites...), and you won't care much about the drawbacks mentioned above.
However, in most other cases, there is simply no reasonable business case to be made for non-standards-compliant web building: it is very costly in the long term, while the standards-compliant alternative is not necessarily more expensive in the short term.
What you can do
Switching over to such building methods isn't simply a case of reading a few good books and "doing it". You have to want it, and to practice a lot. In practice, watch out for the following:
- the graphic designers should be aware of and familiar with the characteristics and peculiarities of good HTML+CSS web sites, to use them at their best
- someone has to be responsible for content markup and structure, which is a different job than HTML coding (but requires some understanding of the same technology), and a different job than copywriting (but also requires some understanding of the content and the communication objectives)
- the technology must be tamed for that specific use: some CMSs simply can't handle this in a proper way
- all stakeholders, including the client, should be (made) aware of the benefits as well as the requirements
Suggested further reading
Much is already available on what is actually needed for standards-compliant web building, particularly from A List Apart and in Zeldman's book (primarily targeted at graphic designers making the switch from a paper-based approach).
For the more technical people: the Web Standards Project's blog about standards compliance, and the W3C validator.