Sunday, June 10, 2007

HTML vs. XHTML

About the delay in this post… We spent two weeks on vacation in Alaska and a week cleaning up all the things that accumulated while we were gone. By way of atonement, here’s a photo of the Margerie Glacier in Glacier Bay. It’s about 250 feet high and periodically sheds chunks of ice the size of phone booths, accompanied by gigantic cracking sounds.

And now, back to computers…

If you evaluate different help authoring tools, you’ll see that many use HTML but some use a language called XHTML. In order to understand the difference between the two and why it matters, you need to understand some basic concepts of XML.

HTML (Hypertext Markup Language) dates back to 1990 and is the basis of the web. It’s still widely used but its limits began to appear as early as the mid-90s. To deal with those limits, the W3C (World Wide Web Consortium, the main standard-setting body for the web) introduced a new language called XML (Extensible Markup Language), which became an official W3C Recommendation in 1998.

Unlike HTML, which has about 90 codes, XML technically has none and isn’t a language at all. Instead, it’s a “meta-language,” a set of rules for creating custom codes/languages optimized for specific situations, as opposed to the general purpose but less efficient HTML. By way of analogy, if HTML is equivalent to the alphabet, XML is equivalent to a set of rules that let you create your own custom alphabets.
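To make the "meta-language" idea concrete, here is a minimal sketch in Python. The recipe vocabulary (the `recipe`, `ingredient`, and `step` codes) is invented purely for illustration and isn't part of any real standard. The point is only that XML supplies the syntax rules while the author supplies the codes.

```python
import xml.etree.ElementTree as ET

# A made-up "recipe" vocabulary: none of these tag names are defined by XML itself.
RECIPE_XML = """<recipe name="Pancakes">
  <ingredient quantity="2" unit="cups">flour</ingredient>
  <ingredient quantity="2" unit="each">egg</ingredient>
  <step>Mix the ingredients.</step>
  <step>Fry on a hot griddle.</step>
</recipe>"""


def summarize(xml_text):
    """Return the recipe name, the ingredient names, and the number of steps."""
    root = ET.fromstring(xml_text)
    ingredients = [item.text for item in root.findall("ingredient")]
    steps = root.findall("step")
    return root.get("name"), ingredients, len(steps)


name, ingredients, step_count = summarize(RECIPE_XML)
print(name, ingredients, step_count)
```

A generic XML parser can read this file without knowing anything about recipes, because the custom codes still follow XML's syntax rules.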

No one seems to know how many XML-based languages there are, but estimates are well over a thousand, including CBML (Comic Book Markup Language), CSML (Cave Survey Markup Language), MAML (MicroArray Markup Language), another MAML (Microsoft Assistance Markup Language), and so on.

The ability to create your own custom languages obviously offers great flexibility but can cause problems. Someone must have defined the codes for the custom language that you want to use. That definition process is fraught with design difficulties (what codes do we need now and in the future?), technical difficulties (how do we implement those codes?), business difficulties (if two groups in the same company or two companies in the same industry create different sets of codes to do the same thing, whose codes get used?), and implementation difficulties (how do we get people to change the way they work in order to use XML?). I’ll discuss the partial solution – XHTML – a few paragraphs further on.

In addition to supporting custom languages, XML also requires adherence to code syntax rules. HTML has syntax rules too, but developers often ignore them in order to do things more efficiently, more creatively, or just differently for the sake of being different. Most such “hacks” still work, even though they violate the syntax rules, because HTML browsers like Internet Explorer and Firefox are very forgiving. But this still causes problems: the browsers waste time figuring out how to deal with various hacks, HTML files from different developers may not work together cleanly, and so on.
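As a rough illustration of that forgiveness, the sketch below feeds some rule-breaking markup to Python's built-in `html.parser`. This is only a stand-in for a browser, not how Internet Explorer or Firefox actually work, and the `TagCollector` class and sample markup are my own invention. The sample leaves the `p` and `br` codes unclosed and skips the quotes around an attribute value.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start and end tag the lenient parser reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.end_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_endtag(self, tag):
        self.end_tags.append(tag)


# Unclosed <p> and <br>, plus an unquoted attribute value.
SLOPPY_HTML = "<div><p class=intro>First line<br>Second line</div>"

collector = TagCollector()
collector.feed(SLOPPY_HTML)
collector.close()
print(collector.start_tags)  # ['div', 'p', 'br']
print(collector.end_tags)    # ['div']
```

The parser raises no error. It reports three opening codes but only one closing code and leaves it to whoever consumes the result to guess what the author meant.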

So, given the difficulties involved in defining custom languages and the difficulties due to syntax violations in HTML, what’s needed is a language that addresses both problems, which brings us to XHTML.

XHTML (Extensible HTML) is basically HTML revised so as to follow the syntax rules of XML. In other words, it keeps the familiar HTML codes but enforces the syntax rules much more tightly. Developers can still break the rules, but they get away with far less. Because of the need to follow the rules, XHTML is sometimes referred to as “HTML done right.” The W3C considered it useful enough to publish XHTML 1.0 as the official successor to HTML 4 in 2000, although HTML is still widely used.
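Here is a small sketch of what "enforcing the syntax rules" means in practice, using Python's standard XML parser as the strict judge. The two sample fragments and the `is_well_formed` helper are hypothetical, made up for this illustration. Note that this checks only XML well-formedness, not whether the codes are valid XHTML.

```python
import xml.etree.ElementTree as ET

# Typical forgiving-browser HTML: unclosed <p> and <br>, unquoted attribute value.
SLOPPY_HTML = "<div><p class=intro>First line<br>Second line</div>"

# The same content following XML's rules: every code closed, attribute value quoted.
XHTML_FRAGMENT = '<div><p class="intro">First line<br />Second line</p></div>'


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(SLOPPY_HTML))     # False
print(is_well_formed(XHTML_FRAGMENT))  # True
```

The first fragment would display fine in a browser, but an XML tool rejects it outright. The second passes because it closes every code and quotes every attribute value.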

I’ll summarize by addressing the obvious question – does it matter if you use HTML or XHTML? The answer depends on what you need to create.

If you want to create standard web sites or help systems, HTML is fine. There’s no need for XHTML. (But most authoring tools that create HTML today will probably switch to XHTML in the next few years so you’ll go from HTML to XHTML without even knowing it.) One of the best known help authoring tools, RoboHelp 6, is based on HTML but lets you import XHTML files into a project, converting them to HTML, or export HTML files out to XHTML.

If you want to create web sites or help systems that use newer XML-based standards like MathML (Mathematics Markup Language), SVG (Scalable Vector Graphics), or DITA (Darwin Information Typing Architecture), you’ll need to use XHTML instead. Because XHTML follows XML’s syntax rules, it effectively is XML, whereas HTML is not and thus can’t handle those new standards. One of the newest help authoring tools, Flare, uses native XHTML.
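To show why being XML matters here, the sketch below embeds a short MathML formula in an XHTML page and asks a generic XML parser which vocabularies the page uses. The page content and the `vocabularies_used` helper are assumptions of mine for illustration. The two namespace addresses are the ones the W3C assigned to XHTML and MathML.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# An XHTML page with a MathML formula (A = pi * r squared) mixed in.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Area of a circle</title></head>
  <body>
    <p>The formula is:</p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mi>A</mi><mo>=</mo><mi>&#960;</mi><msup><mi>r</mi><mn>2</mn></msup>
    </math>
  </body>
</html>"""


def vocabularies_used(xml_text):
    """Return the set of namespace URIs that appear on elements in the document."""
    root = ET.fromstring(xml_text)
    namespaces = set()
    for element in root.iter():
        # ElementTree writes namespaced tags as "{uri}localname".
        if element.tag.startswith("{"):
            namespaces.add(element.tag[1:].split("}", 1)[0])
    return namespaces


print(vocabularies_used(PAGE) == {XHTML_NS, MATHML_NS})  # True
```

Because both vocabularies obey the same XML rules, one parser can tell the XHTML codes from the MathML codes in a single file.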