Content Reuse: Is It Harmful?
By Richard Hamilton, special to The Content Wrangler
A place for everything and everything in its place—Isabella Mary Beeton, The Book of Household Management, 1861
For a number of years it has been a matter of faith that the more content a technical documentation team reuses, the more efficient they are presumed to be. Vasont Systems, a content management system (CMS) vendor, claims its users average 71% content reuse. That is a bold claim, but I suspect that if you could show even 30% or 40% content reuse, you would earn bonus brownie points with nearly any manager. But, are you really more efficient? Let’s take a deeper look.
Terminology
- Duplication: You separately maintain more than one copy of some piece of content in source control or a Content Management System (CMS). If you keep two copies of a glossary definition in your source control system, that would be duplication.
- Reuse: You put the same piece of content into more than one deliverable in the same output medium. If you have just one copy of that glossary definition in source control, but include it in the printed versions of your Installation Guide and User’s Guide, that would be reuse.
- Single sourcing: You deliver the same piece of content via different media. If you deliver the Installation Guide in print and also on the web, that would be single sourcing.
[Note: Any given piece of content can be reused or single sourced or both. Defining single sourcing and reuse separately may seem to be nitpicking, but the distinction is important.]
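The three definitions above can be made concrete with a small sketch. This is only an illustration: the `Placement` record, the `classify` function, and the module identifiers are hypothetical names, not part of any particular CMS, and exact text equality stands in for "the same piece of content."

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """One appearance of a stored module in a deliverable (hypothetical model)."""
    module_id: str    # identifier of the stored copy in source control or a CMS
    deliverable: str  # e.g. "Installation Guide"
    medium: str       # e.g. "print" or "web"


def classify(stored_text, placements):
    """Label each stored module using the three definitions above.

    stored_text maps module_id -> the text of that stored copy.
    """
    # Duplication: the same text is maintained under more than one module_id.
    ids_by_text = defaultdict(set)
    for module_id, text in stored_text.items():
        ids_by_text[text.strip()].add(module_id)

    # For each module, collect the deliverables it appears in, per medium.
    deliverables_by_medium = defaultdict(lambda: defaultdict(set))
    for p in placements:
        deliverables_by_medium[p.module_id][p.medium].add(p.deliverable)

    labels = {}
    for module_id, text in stored_text.items():
        by_medium = deliverables_by_medium[module_id]
        labels[module_id] = {
            "duplicated": len(ids_by_text[text.strip()]) > 1,
            # Reuse: more than one deliverable within the same medium.
            "reused": any(len(names) > 1 for names in by_medium.values()),
            # Single sourcing: the same content delivered via different media.
            "single_sourced": len(by_medium) > 1,
        }
    return labels
```

Under this model, a glossary definition stored once and placed in the printed Installation Guide and the printed User's Guide is reused but not single sourced, while an Installation Guide module delivered in print and on the web is single sourced but not reused.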
Why Minimize Reuse?
I doubt anyone would argue against minimizing duplication. The benefits are clear and the exceptions are relatively few. I also agree wholeheartedly that single sourcing makes very good sense. But, I differ with the mainstream regarding reuse. I believe you should minimize reuse, not maximize it.
There are two main reasons for minimizing reuse:
- Every time you reuse content, you give your users another place to look when they search for that topic. If you have the same content in several different places, your users can end up jumping around among those places, trying to figure out which one they should use. Having one, authoritative place for any particular module will simplify their search and avoid confusion.
- Even with highly structured methodologies, reuse is not free. When you reuse content, you need to take steps to be sure that content will work in multiple locations. This takes effort that might not need to be expended for content that is not reused.
Driving out Duplication
Most efforts to maximize reuse start by looking for and driving out duplication. The search for duplication typically identifies places where there is an exact or close match between two or more pieces of content. For each of those matches, you have three choices:
- Continue to maintain two (or more) versions of the content.
- Merge the matching content into one version, store that version in source control, and use it in each of the original locations as you build deliverables.
- Remove all but one instance of the content, and if you must, point to it rather than copy it.
All three of these choices cost something. Choice one abandons reuse, giving you more content to maintain. That is not always a bad choice; if there are enough differences in the content to make maintaining one version more expensive than maintaining two versions, this might be a valid option.
Choice two is classic reuse; you will have some additional work making the module work in multiple contexts, and you will have some additional work over time maintaining that independence. But, you will usually save effort over maintaining separate versions.
Choice three eliminates duplication and reuse. If you can eliminate all but one of the situations that used the content, you have not only eliminated the duplication, you have reduced the overall size of your deliverables. When it works, this choice is the most efficient of the three.
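The search for exact or close matches that starts this process can be sketched in a few lines. This is a minimal illustration using Python's standard `difflib` module, not a description of how any CMS does it; the function name `find_matches` and the 0.9 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_matches(modules, threshold=0.9):
    """Return (id_a, id_b, ratio) for module pairs that are exact or close matches.

    modules maps a module id to its text. The default threshold of 0.9 is an
    arbitrary illustration, not a recommended value.
    """
    matches = []
    for (id_a, text_a), (id_b, text_b) in combinations(sorted(modules.items()), 2):
        # Normalize case and whitespace so trivial differences do not hide a match.
        a = " ".join(text_a.lower().split())
        b = " ".join(text_b.lower().split())
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            matches.append((id_a, id_b, ratio))
    return matches
```

Each pair this returns is only a candidate. Deciding among the three choices for that pair is still a judgment about structure and context, which a similarity score cannot make. Comparing every pair also grows quadratically with the number of modules, so a large documentation set would need a cheaper first pass.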
The Bias Towards Maximizing Reuse
I see a bias towards choice two in most of what I have read about content reuse; in fact, often choice three is nowhere in sight. While structure is given its due, I see little discussion about structuring content to minimize the need for reuse. Several factors fuel this bias:
- Metrics: It is easier to create a metric to measure reuse than it is to create one to measure where you have avoided the need to reuse.
- Human nature: Choice three requires you to eliminate content. Since nearly all content was originally generated because someone needed it, your natural inclination will be to keep content, even if it is redundant.
- Content Management Systems: The typical CMS makes reuse easy. Just mix and match modules, push a button, and poof, you have a new deliverable.
- Structure: Choice three requires you to look more deeply into your structure, which your team may not have the time or inclination to do.
If unchecked, these biases can leave you with a lot of unnecessary reuse. You can argue that’s not a big deal, but even when well structured, a heavily reused module will take more maintenance than one that is used in just one place. In addition, it will needlessly increase the bulk of your deliverables. Both of these factors decrease efficiency. If you are serious about maximizing your efficiency, you need to structure your documentation with a bias against both duplication and reuse.
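The metrics bias is easy to see in a sketch. The definition below (the share of modules that appear in more than one deliverable) is an assumption made for illustration; it is not how Vasont or any other vendor calculates its reuse figure, and `reuse_percentage` is a hypothetical name.

```python
from collections import defaultdict


def reuse_percentage(placements):
    """Percentage of modules that appear in more than one deliverable.

    placements is an iterable of (module_id, deliverable) pairs.
    """
    deliverables = defaultdict(set)
    for module_id, deliverable in placements:
        deliverables[module_id].add(deliverable)
    if not deliverables:
        return 0.0
    reused = sum(1 for used_in in deliverables.values() if len(used_in) > 1)
    return 100.0 * reused / len(deliverables)


# Choice two: the glossary is included in both guides.
choice_two = [
    ("glossary", "Installation Guide"),
    ("glossary", "User's Guide"),
    ("install-steps", "Installation Guide"),
    ("usage", "User's Guide"),
]

# Choice three: the glossary lives in one guide only.
choice_three = [
    ("glossary", "User's Guide"),
    ("install-steps", "Installation Guide"),
    ("usage", "User's Guide"),
]
```

With this definition, `reuse_percentage(choice_two)` is higher than `reuse_percentage(choice_three)`, even though choice three leaves less content to maintain and deliver. A metric like this rewards the reuse it can count and has no way to credit the reuse you avoided.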
Implications for Modular Documentation
So, am I arguing against modular documentation? No. Consistent structure and style help people use your documentation. And good methodologies give your authors the guidelines they need to produce consistent structure and style. Where things go off the rails is when you try to treat your documentation as a set of modules that can be indiscriminately mixed and matched to create whatever deliverables you want.
Content that is central to your message deserves a context within which it can live. If it is pulled out of context, it will either be confusing, or it will require additional information to provide that context, either as part of the module itself, or in the including document.
Jon Bosak summed up the problem nicely in his Closing Keynote at the XML 2006 conference:
Another ancient subject that seems to be popping up again is the idea of modular document creation. This is one of those concepts that comes through about once a decade, seduces all the writing managers with the prospect of greater efficiency, takes over entire writing departments for a couple of years, and then falls out of favor as people finally realize that document reuse is not a solvable problem in document delivery but rather an intractable problem in document writing – which is, how to retain any sense of logical connection between pieces of information while writing as if your target audience consisted entirely of people afflicted with ADD.
While I do not have quite as pessimistic a view of modular documentation as Bosak expresses here, I do think that maximizing reuse without considering context and structure yields documentation that is difficult to use. Even if your structure allows you to easily reuse modules, there is benefit in doing so only when you have a compelling reason.
What I advocate is to first build a structure that minimizes the need to reuse content, then judiciously choose where, within that structure, you will reuse. Obvious cases include glossary entries, legal boilerplate, and repeated procedures. And you will find other places where it simply makes sense to include content rather than sending the user off somewhere else to find it.
If you start with Isabella Beeton’s words in mind, you will end up with less reuse, better structured documentation, a more efficient process, and maybe customers who do not feel you are inflicting ADD upon them.
About the Author
Richard Hamilton is principal consultant with R.L. Hamilton & Associates, specializing in documentation management and the application of XML technology to documentation. He is the author of the forthcoming book, Managing Writers: A Real World Guide to Managing Technical Documentation, which will be published by XML Press later this year.
The Content Wrangler

You make some good points.
Your article sparked another thought for me. As a writer, I’m all for structure and efficiency, but as a some-time user of documentation, I have some qualms. On more than one occasion, I had trouble understanding something as it was written in one place in the doc but I did understand a similar passage about the same topic that appeared elsewhere, mainly because a quirk in the wording helped clarify something for me. As a reader, I was able to exploit the writing team’s “messy” inconsistency to my benefit.
Likewise, if you don’t understand what someone is saying in casual conversation, you question the speaker. Unless he or she is a robot, the other person usually replies by giving you the same information but with slightly different wording so that you “get it” the second time around. Scrubbing our language of all variation closes that loophole.
Don’t get me wrong. I’m not arguing in favor of thoughtless repetition and variation. I’m just wondering if it might have some cognitive benefit, since it seems to be hard-wired into human communication patterns.
Hallo Richard
As a reader and writer of technical documentation, I agree whole-heartedly that we should apply careful judgement when re-using content.
As a reader, when I come across a chunk of text that I recognise having seen before in a different part of the same documentation set, I am usually a bit taken aback. Also, being of a distinctly distrustful nature, I usually go back to the other place and check both passages almost word for word, to see if they are the same. This is because I _know_ that duplication happens in technical documentation, and I also know that there’s no guarantee that both occurrences are up to date.
So, as a technical writer, I’ve often suggested that a re-used chunk of content should actually include a statement at the top, telling the reader that this piece of text comes from a library of re-usable chunks.
As you say, there definitely are cases where content re-use is advisable. For example, you may need two versions of an installation guide, for two different technical environments, but where large portions of the guide are the same in both environments. Instead of telling your reader to go somewhere for the common instructions, then come back for the special bit, then go back to the common set, etc, you should be able to give them a single set of sequential steps. So the common instructions would be from a re-usable chunk.
Another case is when you need to issue a product warning of some kind, which affects more than one version of the product. If it’s in a re-usable chunk, you can simply empty the content when the warning is no longer required.
A good, thought-provoking article. Thanks!
I agree with what you say. I think that one of the major differences between re-use and single sourcing is that in re-use, the author specifies the relationship between two fragments of information, whereas in single sourcing, this is done by the developer aggregating the fragments into an output. In the event that a new configuration is required for a brand new output, it will probably make less sense for the author to go back and rework the relationships than for the developer to just be told how to do the new aggregation.
This would result in a hybrid system though, where the author couldn’t modify the content without first ascertaining whether the developer’s aggregation had some stake in that fragment. An arguably more robust approach is to have the author create the information in a way that makes sense for them, then have them confirm that the output produced is fit for purpose.
This should be more productive for the authors and developers as well as providing more predictable outputs.
As a translator of technical documentation, I see many unfortunate results from content re-use. I often get the impression that it encourages a certain laziness in the persons preparing the documentation, who often repeat large chunks of text unnecessarily in the same document or fail to adapt a re-used text chunk to remove information relevant to a different product. This problem is widespread: I see it in documentation from small companies as well as some of the world’s leading corporations.
The more fundamental problem, I think, is the refusal to allow time for full, careful review of the text by writers, editors and approvers. How to solve that with the constant mantra of cost-cutting droning in one’s ears is anybody’s guess.
Hi, Dick!
Your article found me in the middle of a re-use conundrum involving seven books describing a single product for different operating systems with varying degrees of similarity. I have the necessary tools for the job (DITA and a CMS), but as I analyze each chunk of information, I’m faced with the decision of whether to create a stand-alone chunk for that particular OS or to take the effort to apply the filters and changes necessary for it to be shared in multiple contexts. It’s a seat-of-the-pants decision generally left up to the front-line writer, but because it has long-term effects, we would all benefit from objective criteria and strategic direction.
I’ve struggled with the consequences of choice 3, “Remove all but one instance of the content, and if you must, point to it rather than copy it,” in a non-structured environment in which references were made to multiple documents, all of which had to be downloaded for the reader to make use of the references. And the impossibility of keeping duplicated information up-to-date is what attracted most of us to structured re-use in the first place.
In general, I’ve found that the rewards of re-use far exceed the pain of making the content fit multiple contexts. But I agree that we need better guidelines for determining when reuse just isn’t worth the effort. Will this topic be addressed in your forthcoming book on managing writers?
Thanks from the trenches,
Jim
I think one way to avoid some of the ADD neurosis of duplicated content is to pick your deliverable formats wisely. Obviously, you don’t want to eliminate your ability to single-source by cutting out bad deliverable formats, but you may want to think of deliverable formats in terms of feature sets. For example, the Eclipse Infocenter deliverable provided by the Open Toolkit renders the content as much more atomic pages, so that a reader is very unlikely to confront duplicate information (and if she does, she is unlikely to find it strange). I think a lot of these issues are really about reader expectations, and if you can frame their expectations through the delivery mechanism (usually technological in some way), then you avoid any cognitive dissonance. Likewise, if you write in a way that feels right for an atomic page on an Infocenter, you are likely to craft your prose in a way that is easily reused.
Quinn DuPont, I prefer to look at it the other way around. Since there is no way to predict all future delivery formats, the effort is better placed in authoring good information than in trying to author information appropriate for a delivery format. In some cases, I think that the delivery format is at odds with what an authoring team might consider to be “a natural approach”. I’m not suggesting that the delivery formats should be ignored, just that they be considered along with a range of other factors, not as a primary driver.
Why doesn’t the first reason given for believing that reuse should be minimised weigh equally strongly against single-sourcing? If I have the same content in the help file and in the user manual, why aren’t my users “jumping around among those [two] places, trying to figure out which one they should use”?
I’m sceptical about how much “jumping around” users do anyway. Users are generally looking for answers. When they find the answer, they tend to stop looking. If I’m right about that, by reducing reuse you could be making your users work harder to find what they’re looking for.
As for the second reason, well, yes, reuse is not free. But the effort needed to make content reusable is pretty much the effort needed to make it useful in the first place. Predominantly, content that cannot be reused is badly written content.
And surely reducing the “bulk” of your deliverables to increase efficiency is today only relevant to printed media.
That said, I agree that the power to reuse content should not be abused. But the danger of excessive reuse is poorly structured and incoherent documentation, not jumpy readers or over-worked authors.