
Feature Article

Friday, April 11, 2008

Choosing an XML Schema: DocBook or DITA?

By Richard Hamilton, special to The Content Wrangler (reprinted with permission)

If you follow the latest trends or have been to a conference recently, you may find the idea of choosing an XML schema puzzling.  Isn’t the question really, “How should I customize DITA to do what I want?”  While there are many good reasons to choose DITA, it’s not the only schema in town.

The two most popular schemas at the moment are DocBook and DITA, and I’ll use them as examples.  There are other choices—S1000D and TEI come to mind—but the chances are good that if you’re not in an industry that mandates a particular schema, you’ll end up using DocBook or DITA.

Full disclosure: I’m a long-time DocBook user (this article was authored in DocBook) and a member of the OASIS DocBook Technical Committee. That makes me at least somewhat partisan. I’ll do my best to be even-handed, but you should know where I’m coming from.

Your decision will depend on the following considerations:

  • Content:  Is your content narrative (books, articles, etc.), modular (topics, reference pages, help pages, etc.), or both?

  • Deliverables:  Do you deliver printed documentation, web pages, help systems, or all of the above?  And how important is each type relative to the others?

  • Customization:  How specialized is your content?  Do you need to create new markup unique to your application?

  • Scale:  How much content do you need to manage and how many writers do you have working on that content?

Let’s consider each of these in turn.

Content

Common wisdom is that if you develop narrative content, you should use DocBook, and if you develop modular or topic-based content, you should use DITA.  While this is true to an extent, it’s misleading.  You can write books using DITA and modular content using DocBook.

That said, there are important differences, and you will probably find that one or the other will be a more natural fit for your content.  To look for the best fit, you need to consider two kinds of markup:

  • Structural:  Structural markup defines the organization of your content.  It includes markup to identify sections, modules, chapters, or books, as well as markup that builds larger structures from smaller pieces.

    DITA is designed for modular, topic-oriented content. Typically, writers create individual topics, which are then aggregated into deliverables of various kinds using a “ditamap.”

    DocBook was originally designed to support documentation structured like a book, with front matter, chapters, and back matter (appendices, glossary, index, etc.).  However, it has evolved over the years to support a much wider variety of structures.

  • Inline:  Inline markup lets you tag pieces of content, typically to define their semantics.  For example, in this article, inline markup is used to identify things like links, author information, and book titles.

    Both DITA and DocBook have markup for most common software and hardware components, as well as standard inlines for links, metadata, references, and so forth.  In keeping with its philosophy, DITA has fewer inline elements and encourages users to create additional elements as specializations.  DocBook has more choices, and is less likely to need specialization.
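To make the inline comparison concrete, here is a minimal sketch that tags the same sentence with DocBook inlines and with DITA inlines, then lists the elements each version uses. The sentence, the file path, and the book title are invented for illustration, and the helper inline_elements is hypothetical. The element names themselves (command, filename, and citetitle in DocBook; cmdname, filepath, and cite in DITA) are standard, though you should check them against the schema version you actually use.

```python
import xml.etree.ElementTree as ET

# Hypothetical sentence tagged with DocBook inline elements.
DOCBOOK_PARA = (
    "<para>Run <command>make</command>, then edit "
    "<filename>/etc/widget.conf</filename> as described in "
    "<citetitle>Widget Administration Guide</citetitle>.</para>"
)

# The same hypothetical sentence tagged with DITA inline elements.
DITA_P = (
    "<p>Run <cmdname>make</cmdname>, then edit "
    "<filepath>/etc/widget.conf</filepath> as described in "
    "<cite>Widget Administration Guide</cite>.</p>"
)


def inline_elements(xml_text):
    """Return (element name, text) pairs for each inline child of the paragraph."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]


def plain_text(xml_text):
    """Return the paragraph text with all markup stripped."""
    return "".join(ET.fromstring(xml_text).itertext())


if __name__ == "__main__":
    for label, sample in (("DocBook", DOCBOOK_PARA), ("DITA", DITA_P)):
        print(label, inline_elements(sample))
    # The words are identical; only the semantic tagging differs.
    print(plain_text(DOCBOOK_PARA) == plain_text(DITA_P))
```

The two versions carry the same words and differ only in the vocabulary used to label them, which is why a side-by-side look at each schema's inline elements is a practical way to judge fit.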

The most significant markup differences between DITA and DocBook are structural; therefore, I give them the greatest weight. If you use a topic-based, modular methodology, and you want the schema to help enforce that methodology, you will probably find DITA more to your liking.  If you use a more traditional methodology, or if you don’t want to enforce a particular methodology, you will probably find DocBook more to your liking.
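The structural difference is easier to see side by side. The sketch below is only an illustration: the titles and file names are invented, the helpers ditamap_outline and docbook_outline are hypothetical, and DOCTYPE declarations and namespaces are omitted to keep the samples short. In the DITA sample the map holds nothing but references to separately stored topics, while in the DocBook sample the hierarchy lives in the document itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical DITA map: the deliverable is assembled from topic files by reference.
DITAMAP = """\
<map>
  <title>Widget User Guide</title>
  <topicref href="installing.dita"/>
  <topicref href="configuring.dita">
    <topicref href="config-reference.dita"/>
  </topicref>
</map>
"""

# Hypothetical DocBook book: chapters and appendices are nested in one document.
DOCBOOK = """\
<book>
  <title>Widget User Guide</title>
  <chapter><title>Installing</title><para>Placeholder text.</para></chapter>
  <chapter><title>Configuring</title>
    <section><title>Configuration Reference</title><para>Placeholder text.</para></section>
  </chapter>
  <appendix><title>Troubleshooting</title><para>Placeholder text.</para></appendix>
</book>
"""


def ditamap_outline(xml_text):
    """List (depth, href) for each topicref in the map, in document order."""
    def walk(node, depth):
        for ref in node.findall("topicref"):
            yield depth, ref.get("href")
            yield from walk(ref, depth + 1)
    return list(walk(ET.fromstring(xml_text), 0))


def docbook_outline(xml_text):
    """List (element name, title) for each top-level division of the book."""
    book = ET.fromstring(xml_text)
    return [(div.tag, div.findtext("title")) for div in book if div.tag != "title"]


if __name__ == "__main__":
    for depth, href in ditamap_outline(DITAMAP):
        print("  " * depth + href)
    for tag, title in docbook_outline(DOCBOOK):
        print(tag + ": " + title)
```

Reordering or reusing content in the DITA sample means editing the map, not the topics. In the DocBook sample the same change means moving elements within the book, unless you adopt a modular approach on top of it.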

If your team writes both narrative and modular documentation, it’s a tougher call.  Either schema will handle both types of documentation, but other things being equal, I give the edge to DocBook. I think DocBook does a better job handling modularity than DITA does handling books.

Regarding inlines, I suggest that you look at the choices offered by each schema.  If you need a lot of new inlines, you’ll probably be happier with DITA, but if your content is mainstream software or hardware documentation, you’ll probably find that DocBook already contains what you need.

Deliverables

Both schemas have open source XSL stylesheets that generate a range of deliverables.  As I’m writing this, both the DocBook XSL stylesheets and the DITA Open Toolkit will produce print (using XSL-FO), HTML, XHTML, HTML Help (HTML that can be compiled into Microsoft HTML Help), JavaHelp, and Eclipse help.  In addition, DocBook has stylesheets to generate WordML and plain text, and DITA has stylesheets to generate Microsoft’s Rich Text Format (RTF) and troff.  And if you can’t make up your mind, there are stylesheets that convert DocBook to DITA and DITA to DocBook.

Since both support essentially the same formats, the important differentiator is how well the stylesheets work for your deliverables.  The DocBook stylesheets are more mature, and are well documented in Bob Stayton’s DocBook XSL: The Complete Guide.  The DITA stylesheets are newer, and their documentation is less well developed.  Both are actively maintained, and both have strong communities of interest that are willing to help.

No matter which you choose, you will need to customize the stylesheets.  I’ve never seen an organization that didn’t need something different from the standard look and feel.  If you have XSL-knowledgeable staff or contractors ...


Filed under: DITA, DocBook, Structured Content, Technical Writing, XML

News & Notes
(updated daily. almost.)

What’s Your DITA Quotient?

Monday, April 21, 2008

DITA Users has announced a new online tool designed to help organizations determine the value of adopting the Darwin Information Typing Architecture. Complete the 10-question profiler and learn what value DITA may have for your organization.

DITA for Publishing: DITA2InDesign and Project Gutenberg

Friday, March 21, 2008

From the Really Strategies blog, XML guru Eliot Kimber explores how DITA can be used to create content that’s not technical documentation. Kimber explains the basic idea behind the DITA2InDesign project and provides links to DITA Project Gutenberg.

Content Management Is Most Important Issue, Say Technical Documentation Managers

Thursday, February 14, 2008

In the latest edition of the Techcom Manager e-newsletter, technical documentation managers were asked which topic is the “most important or getting the most attention” in the organizations for which they work. The top answer: Content Management (41.5%). See the survey results.

