
Case Study in Controlling Documentation Quality with acrocheck: Assisted Writing and Editing at SAS

March 22, 2008

By John Kohl, SAS Institute (reprinted with permission from Client Side News)

In a previous article, Uwe Muegge speculated about why we don’t hear about more companies using controlled languages. According to Muegge, “anyone new to the field may have a hard time finding reliable, vendor-independent information on what [controlled-language] solutions are available and what the costs and benefits of deploying those solutions are.”

We found that to be true in our investigations at SAS Institute as well, but at least part of the problem is in the interpretation of “controlled language.” In the course of our investigation, we found that the technology has evolved into controlled authoring, and that it is no longer limited to helping authors conform to a strictly controlled language such as Simplified Technical English. Instead, many companies use controlled-authoring software to:

  • Ensure a high degree of language quality and consistency in their publications
  • Increase the productivity of content authors, editors, and translators
  • Help non-native authors produce better quality English source texts
  • Meet other business goals

Our investigations led us to one such controlled-authoring product: acrocheck. The acrocheck suite of Content Quality Management tools is based on a Natural Language Processing engine that evolved over the course of 15 years of research and development at the German Research Center for Artificial Intelligence (DFKI in Saarbruecken). Acrocheck is sold and supported by acrolinx GmbH in Berlin, with offices in the US.

Overview of SAS

SAS is the largest privately owned software company in the world, and it is the global leader in business intelligence and analytical software. It has 10,000 employees worldwide and annual revenues of about $1.9 billion. In our Documentation Division we have 53 technical writers and 12 editors. Of course, we have content creators in other divisions as well, but so far we have implemented acrocheck only in the Documentation Division.

Why acrocheck

Our implementation was motivated partly by the need to standardize and control terminology. In recent years, SAS products have become more integrated. We also began publishing documentation on the web with a consolidated index and full-text search. Terminology issues became more visible to us, and to customers, than ever before.

The intensified pace of globalization also meant that we had to find an efficient way of making our documentation more suitable for translation and easier for non-native speakers of English to understand. To address this need, we developed a detailed set of “Global English” guidelines. But even the best technical writers find it difficult to apply complex style guidelines or to conform consistently to lists of approved and deprecated terms. Deadlines and time pressures make it impractical for authors and editors to consult style guides and glossaries frequently.

Since SAS is all about using technology to support business processes and decision-making, it is only natural that we would look for a technological solution to help our authors follow our style and terminology guidelines. We also anticipate that the increased consistency in our documentation will make the use of translation memory more effective, and that consistent terminology and phrasing will make our documentation more usable for all our audiences.

Implementation

We were fortunate to have an active executive-level champion, in addition to great management support throughout the company. To emphasize the goal of helping our authors communicate clearly and consistently, we used Assisted Writing and Editing (AWE) as the name of the project. Although we realize that we are controlling the English language to some degree, we wanted to avoid the negative connotation of the term “controlled authoring.” Besides, we’re not really depriving authors of anything they need; we’re just helping them make optimal choices.

The rollout, which began in May of 2007, has gone quite smoothly. Overall, the response from our writers and editors has been extremely favorable. We’ve gotten quite a number of positive comments, including the following:

  • “This looks like a great tool and I’m really glad you all have gone ahead with it. It should free up editors to do some deeper edits.”
  • “I’m really impressed with this software. I had no difficulty installing or running it.”
  • “I’ve used it on some HTML Help files. Very pleased with the results. It was straightforward and really nailed me in a few places. (Ouch!)”
  • “This tool is AWEsome! I love it!”

Because acrocheck gives authors immediate feedback on their own writing, they quickly learn to follow guidelines that they never quite grasped before. After an initial productivity hit, this training effect leads to the opposite: a significant productivity increase. Writers fix grammar, spelling, style and terminology issues early in the writing process, so there are fewer corrections to be made late in the documentation cycle, when the pressure to deliver is greatest. Because much of the copy editing work is now done during the writing process, our editors have more time to devote to more substantive issues.

Implementation Details

The whole implementation process, from the initial decision to proceed through our rollout, took a little more than a year, although I hasten to add that our experience was atypical. Most companies have done it in half that time or less.

The first issue that contributed to the extended timeline was our extensive collection of deprecated terms, in addition to approved terms, for which we had some background information that we did not want to lose track of. Because that information was scattered around in several places, it took a while to consolidate it into one Excel spreadsheet that we could then use as the basis for an acrocheck term bank. Then we had to specify what the Help topics for those terms should look like, because we wanted to structure them differently from acrocheck’s default format.

Second, SAS documentation contains many oddities that we had to “teach” acrocheck to handle. For example, acrocheck initially interpreted the word “%tmfilter” (the name of a software concept called a macro) as two “tokens”: the percent sign and “tmfilter.” That issue became apparent when “tmfilter” was flagged as a spelling error, as if it had no percent sign attached to it. Acrolinx defined dozens of token classes such as “PercentLowercaseWord” for us so that acrocheck would recognize these “word shapes” as single terms that should not be checked for spelling. According to the chief linguist at acrolinx, SAS has more token classes than any other acrolinx customer!
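To make the idea of a token class concrete, here is a minimal Python sketch of a tokenizer that tries “word shape” patterns in priority order, so that “%tmfilter” comes out as one token and never reaches the spell checker. Apart from the name “PercentLowercaseWord”, the class names and every regular expression are assumptions for illustration; they are not acrolinx’s actual definitions, and acrocheck does not expose its tokenizer this way.

```python
import re

# Hypothetical "word shape" classes, tried in order. Only the name
# PercentLowercaseWord comes from the article; the regexes are illustrative.
TOKEN_CLASSES = [
    ("PercentLowercaseWord", re.compile(r"%[a-z][a-z0-9_]*")),
    ("Word", re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")),
    ("Number", re.compile(r"\d+(?:\.\d+)?")),
    ("Punct", re.compile(r"[^\sA-Za-z0-9]")),  # any other single character
]


def tokenize(text):
    """Return (token, class_name) pairs; the first matching class wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for name, pattern in TOKEN_CLASSES:
            match = pattern.match(text, pos)
            if match:
                tokens.append((match.group(), name))
                pos = match.end()
                break
    return tokens


def spellcheck_candidates(text):
    """Only plain Word tokens go to the spell checker; special shapes are skipped."""
    return [tok for tok, cls in tokenize(text) if cls == "Word"]


print(tokenize("Run the %tmfilter macro."))
```

Without the PercentLowercaseWord entry, the same input would split into “%” and “tmfilter”, which is exactly the false spelling error described above.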

Third, a few months into our implementation, acrolinx rebuilt their batch client, which we planned to use for checking HTML documents. We gladly accepted a two-month delay because the new client included support for checking SGML documents. That was a huge benefit to SAS. We are moving to an XML-based publishing system, but a lot of our content is still authored in SGML.

Fourth, we tested and optimized the acrocheck rules quite thoroughly in order to reduce “false alarms” to a minimum. In hindsight, it wasn’t necessary to be so thorough (most of the standard acrocheck rules are quite accurate “out of the box”), but at the time we were perhaps more concerned about user acceptance than we needed to be.

To facilitate testing, we assembled a large collection of our documentation as our test corpus, which took quite a while. We ended up with 64,000 files and 17,000,000 words in our collection. We used a 1,000,000-word subset of the collection for early testing of new rules that we asked acrolinx to develop. We would run acrocheck in batch mode against the entire collection, review the output from each rule, work with acrolinx to make corrections, and test again.
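The batch review loop can be approximated in a few lines: apply every rule to every file, then group the hits by rule so that each rule’s output can be reviewed on its own, noisiest rules first. In this sketch the rules are plain regular expressions and the corpus is an in-memory dict; both are simplifying assumptions, since acrocheck’s batch client uses its own rule engine and reads real document formats.

```python
import re
from collections import defaultdict


def run_batch(rules, corpus):
    """Apply every rule to every document and group the hits by rule.

    rules:  {rule_name: compiled regex}, hypothetical stand-ins for checker rules
    corpus: {file_name: text}
    Returns {rule_name: [(file_name, matched_text), ...]}.
    """
    hits = defaultdict(list)
    for file_name, text in corpus.items():
        for rule_name, pattern in rules.items():
            for match in pattern.finditer(text):
                hits[rule_name].append((file_name, match.group()))
    return dict(hits)


def review_order(hits):
    """Rule names sorted by hit count, so the noisiest rules get reviewed first."""
    return sorted(hits, key=lambda name: len(hits[name]), reverse=True)


# Illustrative rules and a two-file corpus (both hypothetical).
rules = {
    "repeated_word": re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE),
    "the_at": re.compile(r"\bthe at\b"),
}
corpus = {
    "a.txt": "Run the the macro.",
    "b.txt": "Type the at sign, or the at symbol.",
}

hits = run_batch(rules, corpus)
print(review_order(hits))
```

Reviewing the grouped output for “the_at” would immediately show that both hits are false alarms caused by “at sign” and “at symbol”, which is the kind of finding that fed back into rule corrections.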

Most of the refinements to the grammar and style rules reflect the nature of our content. For example, acrocheck flags the following sentence as an error because “the at” seems to be an ungrammatical sequence of words:

The remaining seven characters can include letters, digits, underscores, the dollar sign ($), or the at sign (@).

But you can prevent that “false alarm” from being triggered by modifying the rule so that it ignores any occurrence of “the at” that is immediately followed by “sign.”
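In regular-expression terms, that refinement is a negative lookahead: match “the at” unless the next word is “sign.” The sketch below is only an assumption about how such an exception could be expressed with Python’s re module; acrocheck rules are written in acrolinx’s own rule formalism, not as regexes.

```python
import re

# Hypothetical stand-in for the grammar rule: flag "the at" as a likely
# ungrammatical sequence, unless "sign" immediately follows.
THE_AT = re.compile(r"\bthe\s+at\b(?!\s+sign\b)", re.IGNORECASE)


def flag_the_at(sentence):
    """Return the character offsets of suspicious "the at" sequences."""
    return [match.start() for match in THE_AT.finditer(sentence)]


example = ("The remaining seven characters can include letters, digits, "
           "underscores, the dollar sign ($), or the at sign (@).")
print(flag_the_at(example))
```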

Here’s another example. Acrocheck flags “a HMDA” as an error and suggests “an HMDA” instead:

To view a HMDA Edit Analysis Report, complete these steps:

But “HMDA” is pronounced as an acronym (HUM-dah), not as an initialism (H-M-D-A). So “a HMDA” is correct, and we needed to tweak the rule to prevent this false alarm.
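One way to picture this tweak is an a/an check that decides by the spoken name of the first letter (“aitch”, “em”, “ess”, and so on begin with vowel sounds) and consults an exception list of all-caps terms that are pronounced as words. The letter set, the regex, and the PRONOUNCED_AS_WORDS list are illustrative assumptions, not the actual acrocheck rule.

```python
import re

# Letters whose spoken names begin with a vowel sound (A, E, "ef", "aitch", ...).
VOWEL_SOUND_LETTERS = set("AEFHILMNORSX")

# Hypothetical exception list: all-caps terms pronounced as words, not letter by letter.
PRONOUNCED_AS_WORDS = {"HMDA"}

A_BEFORE_CAPS = re.compile(r"\b([Aa])\s+([A-Z]{2,})\b")


def check_article(sentence):
    """Suggest "an X" wherever "a X" precedes a term that is read letter by letter."""
    suggestions = []
    for match in A_BEFORE_CAPS.finditer(sentence):
        article, term = match.group(1), match.group(2)
        if term in PRONOUNCED_AS_WORDS:
            continue  # "a HMDA" is correct because HMDA is said "HUM-dah"
        if term[0] in VOWEL_SOUND_LETTERS:
            suggestions.append(f"{article}n {term}")
    return suggestions


print(check_article("To view a HMDA Edit Analysis Report, complete these steps:"))
```

Keeping the exceptions in a data list rather than in the rule logic means that new acronyms can be added without touching the rule itself.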

Pilot projects were another part of the implementation process. It takes time to solicit volunteers, do initial training, and collect feedback, so that is another factor on the customer end that affects how long an acrocheck implementation takes. We also heavily customized the Help files for grammar rules and style rules so that they use examples from our own documentation.

Results

With our initial set of style rules and terminology guidance, we’ve found that acrocheck eliminates a lot of unnecessary variation. Even at this early stage of our implementation, we can see that the writing/editing process is more efficient, and that our terminology is much more consistent.

It’s too soon to have hard figures, but we certainly expect to see reductions in translation time and cost as a result of standardizing our terminology, our phrasing, and even our punctuation. Since we don’t have a content management system, writers often cut and paste information between documents, sometimes modifying the material according to their stylistic preferences. Then the editors suggest additional changes. acrocheck’s consistent, objective feedback minimizes most of this variability. Its ability to flag unnecessary words and phrases also reduces the volume of words to be translated.

Another major and unexpected benefit is that the deprecated terms that we collected for use with acrocheck are now being used by our research and development divisions. Developers run scripts that detect any deprecated terms that are in their software messages and user-interface labels. As I’m sure our readers will understand, fixing terminology problems that far upstream in the development process is a dream come true.
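A developer-side check of this kind could be as simple as the following sketch, which loads deprecated/approved pairs from a CSV export of the term list and scans message strings for whole-word matches. The CSV column names, the example terms, and the message IDs are all hypothetical; the article does not describe the actual scripts or list their terms.

```python
import csv
import io
import re

# Hypothetical CSV export of the term bank; columns and terms are made up.
TERM_BANK_CSV = """deprecated,approved
log in to,log on to
hit,press
"""


def load_deprecated(csv_text):
    """Map each deprecated term (lowercased) to its approved replacement."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["deprecated"].lower(): row["approved"] for row in reader}


def find_deprecated(messages, deprecated):
    """Return (message_id, deprecated_term, approved_term) for every hit."""
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in deprecated
    }
    report = []
    for msg_id, text in messages.items():
        for term, pattern in patterns.items():
            if pattern.search(text):
                report.append((msg_id, term, deprecated[term]))
    return report


# Hypothetical software messages keyed by message ID.
messages = {
    "MSG001": "Hit ENTER to continue.",
    "MSG002": "You must log in to the server.",
    "MSG003": "Press ENTER to continue.",
}
print(find_deprecated(messages, load_deprecated(TERM_BANK_CSV)))
```

Whole-word matching keeps “hit” from firing inside words like “white”, but it will still flag legitimate uses such as “hit count”, so a real list would need the same false-alarm tuning described earlier.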

What The Future Holds

In the coming months, we plan to focus more on exploiting acrocheck’s intelligent reuse functionality: the ability to identify variant sentences, not just variant terms and phrases. We’ll identify standard sentences, and acrocheck will flag linguistically equivalent variants to be replaced by the standard sentence.
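Acrocheck’s reuse checking works from linguistic analysis; the sketch below swaps in a much cruder technique, word-level similarity from Python’s difflib, only to show the workflow of mapping a near-variant back to its standard sentence. The standard sentence and the 0.7 threshold are assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical standard sentence; a real list would be curated by the doc team.
STANDARD_SENTENCES = [
    "To open the dialog box, select File > Open.",
]


def words(sentence):
    """Lowercase word list with commas and the final period removed."""
    return sentence.lower().replace(",", " ").rstrip(".").split()


def suggest_standard(sentence, standards=STANDARD_SENTENCES, threshold=0.7):
    """Return the closest standard sentence if `sentence` is a near variant of it."""
    best, best_score = None, 0.0
    for std in standards:
        score = SequenceMatcher(None, words(sentence), words(std)).ratio()
        if score > best_score:
            best, best_score = std, score
    if best is not None and best_score >= threshold and sentence != best:
        return best
    return None


print(suggest_standard("To open the dialog box, click File > Open."))
```

A word-overlap score like this misses paraphrases that reorder clauses or change voice, which is where linguistic equivalence checking goes further than string similarity.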

I’m also hoping that we will put more attention on content reduction. I totally agree with Hans Fenstermacher …it is really the best way to reduce localization costs. Now that our editors are spending less time marking up the issues that acrocheck detects, they will be able to focus more on eliminating unnecessary content.

It’s also quite likely that other divisions at SAS will begin using acrocheck as soon as we are ready to support them. We need to better understand the ongoing support requirements before we reach out to other divisions.

Currently I provide most of the support, but a few of my colleagues assist with training, systems support, and the interaction between acrocheck and our XML-based publishing system. We get very few trouble reports from users (maybe because we were so thorough in eliminating most of those false alarms), but I have no shortage of work. Unlike most acrocheck administrators, I use the acrocheck development environment to develop and test my own style rules, in addition to collaborating with acrolinx on other rules of shared interest.

We need to determine how far we want to take acrocheck's functionality. The amount of support that is required isn't that great, but you definitely get out of it what you put into it. You can develop your own controlled language if you are that ambitious and have the skills. (I have dreamed of doing that for years!) However, at SAS we don't ever want to constrain our authors too much or produce language that sounds stilted to native speakers. So the challenge will be to see how far we can go toward controlled language without crossing that line.
