Wednesday, May 21, 2008

Microformats

In April, I gave a workshop for the Boston STC on doing structured authoring without using DITA or Frame. In the workshop, I happened to mention a technology called microformats, which several people asked me to define during and after the workshop. I did but, in retrospect, I wasn't satisfied with the answer that I gave. So, here's a better definition that also has a number of additional ramifications.

Microformats use elements from existing languages or standards, like HTML, to mark up web content in such a way as to add semantic information to that content for use in Web 2.0, without having to adopt new languages or standards. Basically, microformats re-use existing features of current languages and standards.

There are several issues tied up in this definition. Let's take a look at two big ones.

- Existing languages or standards... - Every language or standard has a number of widely-used features and an often much larger number of little-known features. The latter often go unused or unnoticed. For example, the rel attribute of the link tag tells the browser that the linked file is a stylesheet when we attach CSS to a topic in a help system (the href attribute supplies the file's location), but rel often goes unnoticed unless you delve into the code. Yet rel is actually pretty flexible, offering a bunch of pre-defined values *and* the ability to define your own. This can get pretty esoteric, but it does not require you to buy and learn new software, just new ways to work with what you already have.

- Semantic... - HTML tags like h1 mark a generic heading level and, in practice, are used mostly to control presentation. In other words, applying h1 to text tells us how prominently to display that text but not what it is. For example, consider an online book store that uses HTML to mark up its listings. We could create a book listing and use h1 for the book title and h2 for the name of the author. We can then format the display by specifying the style attributes for h1 and h2, but we have no way of knowing that h1 is actually the book title and h2 is the author's name - i.e. the semantics of the information. XML lets us fix this by creating our own semantically meaningful tags, such as one tag for the book title and another for the author's name, rather than h1 and h2. But HTML already has elements that carry semantic information, such as the "cite" element that lets us identify a piece of text as a citation. In other words, we may well be able to add semantic information without having to move to XML or DITA.
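
To make the two points above concrete, here's a minimal sketch in Python, using only the standard library. It scans a small HTML fragment and pulls out the semantic information that plain HTML already carries: the rel values on links and the text of cite elements. The listing, the URLs, the rel values other than "stylesheet", and the names SemanticScanner and scan are all invented for illustration; this is one way to look at the idea, not how any particular microformat tool works.

```python
from html.parser import HTMLParser

# Hypothetical book listing: the markup and URLs are invented for illustration.
LISTING = """
<link rel="stylesheet" href="help.css">
<p><cite>Microformats: Empowering Your Markup for Web 2.0</cite>
by <a rel="author" href="/authors/allsopp">John Allsopp</a></p>
<p>Released under a <a rel="license" href="/license">license</a>.</p>
"""


class SemanticScanner(HTMLParser):
    """Collects rel values and the text of cite elements."""

    def __init__(self):
        super().__init__()
        self.rels = []        # (tag, rel value, href) triples
        self.citations = []   # text found inside <cite> elements
        self._in_cite = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be empty
        for value in (attributes.get("rel") or "").split():
            self.rels.append((tag, value, attributes.get("href")))
        if tag == "cite":
            self._in_cite = True
            self.citations.append("")

    def handle_endtag(self, tag):
        if tag == "cite":
            self._in_cite = False

    def handle_data(self, data):
        # Only text inside a cite element is treated as a citation
        if self._in_cite:
            self.citations[-1] += data


def scan(html):
    """Return (rels, citations) found in an HTML string."""
    scanner = SemanticScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.rels, scanner.citations


if __name__ == "__main__":
    rels, citations = scan(LISTING)
    for tag, value, href in rels:
        print(tag, value, href)
    for title in citations:
        print("cited:", title)
```

Notice that nothing in the fragment is new technology. The same rel attribute that quietly attaches a stylesheet can also say "this link points to the author," and a program can read that without any move to XML.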

For a detailed overview of microformats, I recommend Microformats: Empowering Your Markup for Web 2.0 by John Allsopp, published by friends of ED. In fact, I recommend reading the book even if you never plan to use microformats because of two other useful aspects of the book.

The first is the author's discussion of structural and semantic HTML in chapter 3. Here, he discusses some of the more rigorous programmatic aspects of HTML, how they're implemented, and why they're important for the long run.

The second is the nuggets sprinkled throughout the book, such as this one on why XML is important for RSS feeds.

"...RSSs are also XML-based languages, meaning that feeds must at least be well-formed..."

(from Microformats: Empowering Your Markup for Web 2.0, page 226.)

Why does this matter for technical communication? Today, most material produced by technical communicators is self-contained - e.g. a help system or user manual produced by one developer. But the web already has features like RSS feeds and aggregators that may be just as useful for technical communication. What the nugget above is saying is that RSS feeds and aggregators will most likely require XML, which will require a move away from HTML and the adoption of authoring tools that produce content that's at least well-formed if not valid. ("Well-formed" and "valid" are used here in the programmatic sense: well-formed content follows the XML syntax rules, and valid content also conforms to a DTD or schema.)
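
For a feel of what "well-formed" means in practice, here's a minimal sketch in Python that runs two fragments through the standard library's XML parser. The function name is_well_formed and both sample strings are my own, made up for illustration. Note that this checks well-formedness only; it does not check validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical samples: the same content, written two ways.
XML_STYLE = "<p>First line<br/>second line</p>"   # every tag is closed
HTML_STYLE = "<p>First line<br>second line</p>"   # br is never closed

if __name__ == "__main__":
    print(is_well_formed(XML_STYLE))
    print(is_well_formed(HTML_STYLE))
```

A browser will happily display both fragments, but an XML parser rejects the second one, which is why an authoring tool that emits loose HTML can be a problem for feeds.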

The book assumes familiarity with HTML, XML, IETF, and other acronyms and is very dense, but it's a quick read if you just look at a few code samples to get the idea and focus instead on the larger issues of programmatic rigor. Highly recommended.

Saturday, May 3, 2008

How We Got Here - A History of DITA (and other things that we see today, not necessarily DITA-related)

Technical communication tends to focus either on the present (gotta finish the project...) or on the future (what's the next big thing and how might it affect me). The past sometimes gets lost. Yet if we get beyond the "When I was your age..." stories, the past can often teach us a lot - why did a particular technology or tool or methodology fail ten years ago, for example - letting us draw parallels to something we're doing today. And if nothing else, the past is intellectually interesting. How did we get to where we are today...

On that note, I recommend reading a history of DITA, written by long-time DITA consultant (among many other things) Bob Doyle. The article, available at http://dita.xml.org/book/history-of-dita, describes the history of DITA but, in a larger sense, also describes the evolution of today's technical communication field. Highly recommended.