By Paul Trotter, CEO, Author-it Software Corporation

Various reports have shown that knowledge workers spend about 30% of their time looking for content that has already been created.  If that sounds like a colossal waste of time and money, it is.

But in terms of waste, time and money are only the tip of the iceberg.  The more pressing problem is that by continually creating corporate documents from scratch, companies run the risk of producing external and internal communications that are inconsistent in style, appearance, and – even worse – message.  The ramifications of these shortcomings can be disastrous, particularly with respect to industry-specific compliance issues.

Consider a financial institution that generates vast amounts of investment offers for its customer base.  Disseminating the most up-to-date information quickly and accurately to end users is critical.  However, if a law is changed or the restrictions on an investment vehicle are revised, and the material reflecting these changes is either inaccurate or not distributed in a timely manner, an institution leaves itself vulnerable to potentially massive fines for non-compliance – perhaps even a litigious situation.

Clearly, such examples, as well as less dramatic ones, highlight the need for content management within most organizations.  However, many of these entities are relying on content management systems (CMSs) that are woefully insufficient, or on no system at all.  Frankly, many organizations that are authoring and managing huge volumes of content are using nothing more than simple desktop products.  What they’ve realized, whether the trigger is compliance issues, a lack of resources to manage the problem, fear of litigation because of inconsistent content, or the increasing cost of content translation (localization), is, “We’re not doing this very efficiently.”

The primary reason people consider component content management is duplication of content: they realize that they can save time and money, as well as increase the consistency of their internal and external communications, by simply reusing previously approved information.  Further, reuse holds the promise of improving the speed and productivity of the people producing these communications.  There are “trickle down” effects as well: editing is easier and translation to multiple languages is simpler, to name just two.

When company management reaches the conclusion that reuse is not feasible using the current tool set, they explore the idea of reuse with increased diligence; what they find is a fundamental difference at the very core of competing systems.

A Topical Approach

Most content management organizations promote the concept that in order to reuse content you must segment content into topics. This approach works well for technical information because with technical content you are describing concepts, asking people to perform tasks or follow steps, or providing reference material.  Consequently, you can reasonably and easily create topics that represent concise ideas, and ultimately, small chunks of content.

However, while people might comprehend the benefits that topic-oriented documentation provides, they generally don’t grasp the downsides of such an approach.  One of the first requirements that needs to be fulfilled in order to use the topic-based method is for people to start examining how they write.  They must figure out how they’re segmenting topics.  They also have to write in a consistent style, so that when other personnel are assembling documents, the documents sound and read like they’ve been written by the same group of people.  But this is not always possible from document to document.

For instance, a user document is typically written in the second person (“you will do this or that”). Conversely, a sales proposal or a response to an RFP would normally be written in the third person (“the user will do this or that”). This seemingly minor difference in the way different types of documents are written poses a formidable challenge to the practicality of topic-based reuse.

Secondly, topics must be crafted in a way that makes them reusable so they can be slotted into any number of documents without causing problems of context.  Then, people have to be disciplined enough to go out and actually seek the information they need and place it in their documents.  Obviously, it is essential to have access to tools that will support this task, but it’s even more crucial to have the required discipline in order to gain the benefits that topic-based content management promises.

Few content management companies truly understand the problems behind topic-based content reuse.  In the end, all of the problems actually come down to one factor: people.  Either people are unwilling to take the time to seek out content similar to what they’re writing, or they don’t even know it exists, so they don’t think to look for it.

How can these problems be addressed?  If the information is made even more granular, then it’s even harder to work with.  Part of the reason topics became difficult for people to work with is that when they’re used to working with a 100-page document, a two-paragraph piece of information seems too small, too granular to deal with.  What’s more, the time saved through the reuse of content – by reducing the time spent on other parts of the writing and editing process downstream – is negated by the time and management overhead required to deal with these minute pieces of information.  And the more granular the information, the more burdensome it is to manage.  If you can’t save money, there’s no point in doing it.

Even when a writer finds the topic that he or she is looking for, making it usable for the next person is a job in itself.  Let’s say you’re writing some marketing content for a website.  You know someone has written something similar, but you’re not sure where it is.  Using traditional search methods, you find the piece you are after, but it sits inside a larger topic that describes the subject in far greater detail than you need; two of the paragraphs are exactly what you want to use, but the rest is superfluous.

In order to reuse just those two paragraphs using a topic framework, you would first have to turn those two paragraphs into a separate topic.  Then you would nest that topic in the original place you found it; you would then have that smaller topic to reuse where you wanted it in the new document.  What this all means is that if you’re creating topics, not only do you have to create a topic that makes sense for your current purpose, you must be prescient enough to guess or foresee the ways people might use that content in the future, possibly in more granular form.  Clearly, this is not a practical exercise; the amount of effort to break up the desired content and “re-reference” it is substantial – to the point where you’re simply going to copy and paste it.  In a topic-oriented world, that creates duplication, the very thing you’re trying to avoid.

Paragraphs Preferred

The obvious question, then, is, “Why not just save the paragraphs and forget the topics?” In other words, go to the paragraph level, but do it in a way in which the user doesn’t really have to do anything.  There are a few products that do store content in paragraphs rather than topics.  As a result, if a writer were to copy and paste a paragraph into another topic and save it, the system would say, “I’ve got that paragraph; I’m going to reference it.” In this way, content is never duplicated, and all identical content residing in the company’s database is instantly consolidated.
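
To make the “I’ve got that paragraph; I’m going to reference it” behavior concrete, here is a minimal Python sketch. It is an illustration only, not how any particular product works: the ParagraphStore class, its method names, and the choice of a SHA-256 hash as the paragraph identifier are all assumptions made for the example.

```python
import hashlib


class ParagraphStore:
    """Hypothetical store: each distinct paragraph is kept once, and
    topics hold references (ids) to paragraphs rather than copies."""

    def __init__(self):
        self.paragraphs = {}  # paragraph id -> paragraph text
        self.topics = {}      # topic name -> ordered list of paragraph ids

    def add_paragraph(self, text):
        # Identical text always hashes to the same id, so a pasted copy
        # resolves to the existing entry instead of creating a duplicate.
        pid = hashlib.sha256(text.encode("utf-8")).hexdigest()
        self.paragraphs.setdefault(pid, text)
        return pid

    def save_topic(self, name, paragraph_texts):
        self.topics[name] = [self.add_paragraph(t) for t in paragraph_texts]


store = ParagraphStore()
store.save_topic("User Guide", ["Keep your password secret.",
                                "Click Save to finish."])
store.save_topic("Sales Proposal", ["Our product is secure.",
                                    "Keep your password secret."])

# Four paragraphs were saved, but only three distinct ones are stored.
print(len(store.paragraphs))
```

Because the writer simply saves a topic and the store does the consolidation, reuse happens without the writer having to do anything extra.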

Even fewer of these products can go to the next level.  When the developers of these products started looking at paragraphs lined up next to each other, they saw many that were very similar.  The paragraphs might have tiny differences that were not obvious to the user, but the computer could easily identify them.  So the products were equipped with a background algorithm that compares every paragraph to every other paragraph in the database and generates a “matrix of similarity.” Through visual highlighting, the user is shown paragraphs similar to the one being written; most of the products use a color to show paragraphs that exhibit similarity of 95% or higher.  This affords the user the opportunity to consolidate similar content into one way of saying it.
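
A “matrix of similarity” of this kind can be sketched in a few lines of Python. This is an assumption-laden illustration: the article does not say which similarity measure the products use, so the sketch substitutes the character-based ratio from Python’s standard difflib module, and the function names are invented for the example. Only the 95% threshold comes from the text above.

```python
from difflib import SequenceMatcher
from itertools import combinations

SIMILARITY_THRESHOLD = 0.95  # the 95% figure cited in the article


def similarity_matrix(paragraphs):
    """Score every paragraph against every other one (0.0 to 1.0)."""
    n = len(paragraphs)
    matrix = [[1.0] * n for _ in range(n)]  # a paragraph matches itself
    for i, j in combinations(range(n), 2):
        # Stand-in measure (assumption): difflib's similarity ratio.
        score = SequenceMatcher(None, paragraphs[i], paragraphs[j]).ratio()
        matrix[i][j] = matrix[j][i] = score
    return matrix


def similar_pairs(paragraphs, threshold=SIMILARITY_THRESHOLD):
    """Return index pairs whose similarity meets the threshold."""
    matrix = similarity_matrix(paragraphs)
    return [(i, j) for i, j in combinations(range(len(paragraphs)), 2)
            if matrix[i][j] >= threshold]


paragraphs = [
    "Click the Save button to store your changes to the document.",
    "Click the Save button to store your changes to the document",
    "Contact support if the installation fails.",
]
print(similar_pairs(paragraphs))  # prints [(0, 1)]
```

Comparing every paragraph with every other one grows quadratically with the size of the database, which is presumably why the article describes it as a background algorithm.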

Sometimes, the only differences between paragraphs are white space, punctuation, or capitalization.  Paragraphs that differ only in these ways are placed into a special category called “Exact Match,” while the others are called “Fuzzy Match.”
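
One way to picture the “Exact Match” category is as a comparison made after white space, punctuation, and capitalization have been stripped away. The Python sketch below shows that idea; the function names are hypothetical, and it handles only ASCII punctuation (curly quotes, for example, would need extra handling).

```python
import re
import string

# Translation table that deletes ASCII punctuation (an assumption; a real
# system would likely need to cover typographic punctuation as well).
_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Remove punctuation, collapse white space, and ignore case."""
    without_punctuation = text.translate(_PUNCTUATION)
    return re.sub(r"\s+", " ", without_punctuation).strip().casefold()


def is_exact_match(a, b):
    """True when two paragraphs differ only in white space,
    punctuation, or capitalization."""
    return normalize(a) == normalize(b)


print(is_exact_match("Click  Save, then close.", "click save then close"))
print(is_exact_match("Click Save, then close.", "Click Save, then exit."))
```

The first comparison is an exact match in this sense; the second is not, so it would be a candidate for the fuzzy category only if it cleared the similarity threshold.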

Of course, it is also important to prevent people from creating these differences in the first place – that is one of the primary tenets of successful content reuse.  Consequently, it is optimal if these differences can be circumvented at the most effective time possible: the actual typing process.  As the writer types, similar content is suggested, so the writer can choose to reuse content at the point of content creation.  If you analyze the value and investment of a CMS, the only opportunity you have to save financial and human resources is at the time the user is typing; once content has been typed, time has been used, and the opportunity evaporates.
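
As a rough illustration of suggesting content at the point of creation, the following Python sketch ranks stored paragraphs by how many of the words typed so far they contain. The word-overlap scoring, the suggest function, and its min_share and limit parameters are assumptions chosen for simplicity; the article does not describe how any product ranks its suggestions.

```python
import re


def words(text):
    """Lower-cased words of a text, with punctuation ignored."""
    return set(re.findall(r"\w+", text.lower()))


def suggest(typed, stored_paragraphs, min_share=0.5, limit=3):
    """Return stored paragraphs that contain at least `min_share`
    of the words typed so far, best match first."""
    typed_words = words(typed)
    if not typed_words:
        return []
    scored = []
    for paragraph in stored_paragraphs:
        share = len(typed_words & words(paragraph)) / len(typed_words)
        if share >= min_share:
            scored.append((share, paragraph))
    # sort() is stable, so equally scored paragraphs keep stored order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [paragraph for _, paragraph in scored[:limit]]


stored = [
    "To reset your password, open Settings and choose Security.",
    "Contact support if the installation fails.",
    "To change your password, open Settings and choose Account.",
]
for suggestion in suggest("To reset your password", stored):
    print(suggestion)
```

Here the writer has typed only four words, yet two approved paragraphs are already on offer before any new content has been written.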

There is an ancillary benefit to an effective CMS.  Not only are users prompted with suggestions of available, matching content, but it also becomes obvious to users how certain documents are written.  The writer might be composing a piece that is slightly different from the corporate “norm,” but as soon as suggestions of content are presented, it is quickly apparent what the proper style is for that type of document.  The writer might well create a unique manuscript, but it is still going to be written and structured the way other content is written and structured within the company, department or division – not from a language or technical point of view but from the perspective of actual sentence structure.  Thus, there will be consistency not only in reuse of content but also in style.

It should be noted that while it is preferable for a CMS to operate on a paragraph level, the ideal CMS can function on a topic level as well.  Once a solution has offered content suggestions at the paragraph level, it will optimally allow the user to view all of the contexts, or topics, in which the information appears.
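
Viewing every context in which a paragraph appears amounts to a reverse lookup from paragraph to topics. The short Python sketch below shows one way to build such a “where used” index; the build_where_used function and the sample topic names are hypothetical.

```python
from collections import defaultdict


def build_where_used(topics):
    """Map each paragraph to the topics that contain it.

    `topics` maps a topic name to its ordered list of paragraphs.
    """
    where_used = defaultdict(list)
    for topic_name, paragraphs in topics.items():
        for paragraph in paragraphs:
            # List a topic once even if it repeats the paragraph.
            if topic_name not in where_used[paragraph]:
                where_used[paragraph].append(topic_name)
    return where_used


topics = {
    "User Guide": ["Keep your password secret.", "Click Save to finish."],
    "Sales Proposal": ["Our product is secure.", "Keep your password secret."],
}
where_used = build_where_used(topics)
print(where_used["Keep your password secret."])
```

With an index like this, a writer who accepts a suggested paragraph can see at a glance which documents would be affected if that paragraph were later changed.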

The Human Factor

As pointed out earlier, even with the most powerful content management tools offering content suggestions to the user at the paragraph level – without the user being burdened by the need to search or even ask for it – there are human factors that can prevent a successful system implementation.  The net effect is that many of these systems never reach their true potential.

The range of human factors is almost as diverse as the people who use these systems.  Most are thinly veiled excuses that, unfortunately, mask some writers’ insecurities, biases, or inability to adapt to new technology, or even change in general, such as:

  • “John’s new with the company; he didn’t know we had that kind of system.”
  • “I knew we had the system, but I just figured it would be easier to write the document over again.”
  • “No one can write as well as I can – I don’t want to use other people’s inferior material.”

Clearly, the goals of the organization, at least in the area of document creation, are sometimes at odds with the goals – or at least the approach – of the people who work there.  There are a number of reasons this can occur, as stated above.  But in the end, there is usually one overriding factor that serves as the foundation of all the adoption issues: employees don’t view content as a corporate asset that takes time and resources to create.  And because few people consider it an asset, few people track it and manage it the way they would manage, say, the company’s financial resources.

In the final analysis, the best content management tool will not succeed unless it is easy to use and people are willing to use it.  What’s more, there has to be an initial set of guidelines, established at the corporate or department level, that people will actually follow.  If they don’t adhere to them, the entire exercise is a waste of time.

Granted, this will involve some work, and it might seem that the initial effort to integrate the system into the corporate culture will make the overall task of creating documents harder than before.  But if done correctly, it will be the classic case of taking one step back to take 10 steps forward.

About the Author

Paul Trotter is the founder and CEO of Author-it Software Corporation. He is a sought-after presenter and well-known expert on the subjects of single sourcing, component content management, collaborative authoring, and localization.