By Jean Graef, Founder of the Montague Institute, special to TheContentWrangler.com
In the pre-Internet days, users relied on indexes, tables of contents, databases, card catalogs, and annotated bibliographies to find information. These “legacy retrieval tools” worked reasonably well in the orderly system created by editors, publishers, and librarians. With the World Wide Web came a new kind of system: a giant electronic warehouse with no shelves, no labels, no maps, and even no lights. Into this virtual warehouse, we dumped a huge pile of content — much of it lacking author, publisher, subject, publication date or even a meaningful title. By default, full text search emerged as the dominant retrieval tool because there was nothing else.
We now know that search engines have limitations: too much irrelevant information and the inability to find items that are known to exist. Although some of them have begun to incorporate many of the features of legacy retrieval systems, such as synonyms and fielded search, for many tasks search still can’t match the efficiency of traditional tools.
There are three problems with using legacy retrieval tools in a Web environment:
- How and when to use them
- How to integrate them technically
- How to show a return on investment
In this article we look at the pros and cons of four different kinds of legacy tools in the context of a specific Web site. As a purely academic exercise, we selected the official Supreme Court site because the Court is a well-known organization of current interest, and its site covers both people and documents and is aimed at multiple audiences.
First, we’ll look at what information is currently accessible on the Supreme Court Web site. Then we’ll make some assumptions about the audience. Next, we’ll look at features that might make the site easier to use. Finally, we’ll apply these features to a hypothetical design and look behind the scenes to see how they could be implemented by using a metadata repository similar to the one developed for our A – Z index and teaching lab. Our goal is to see how the addition of legacy tools might increase retrieval efficiency.
Contents of the Supreme Court Web site
We can get a rough idea of what’s currently available by looking at the site map. There are 13 top-level headings, most of which have subheadings. In addition, two kinds of full text search are available: Supreme Court Files or Supreme Court Docket Files.
TOC subheadings appear as drop-down lists when the cursor hovers over a main category; you can also view them all on the site map page.
Who’s the audience?
The primary audiences for the Supreme Court Web site are the bar, the public, and the media. Users in the public category might include:
- Tourists who want to visit the Supreme Court Building
- People looking for internships or jobs
- Teachers creating lesson plans, projects, or field trips
- Students doing research for class assignments
- Activists and others with an interest in a specific issue (e.g. women’s rights)
The three audiences present a problem in terminology. While we can assume that members of the bar will understand legal terms, we can’t make the same assumption about the public and the media. How to deal with this problem is a major issue in the redesign.
Now let’s compare the Supreme Court Web site as an information source with four other kinds of tools:
- Book: The Supreme Court (intended for a general audience)
- Ad-supported Web site: US Supreme Court Center (published by the West Group, a legal publisher)
- Librarian’s guide: Web Guide to U.S. Supreme Court Research
- Database: U. S. Supreme Court Justices database (a series of Excel files compiled by a law professor)
1. Book: The Supreme Court
The book is an introduction to the court and its impact on American life aimed at the layman. On Amazon.com, we can look at its table of contents, indexes, and an excerpt. We can also do a full text search of the book.
Although the basic vocabulary is plain language, there’s a glossary of legal terms mentioned in the text. In addition, there are two kinds of indexes: one for names and subjects and another for Supreme Court cases.
Pros and cons
The book, written by a law professor, is well organized and uses a style that is targeted to the layman. It has several reader-friendly features, including a bibliography, two kinds of indexes, and a glossary. On the other hand, it suffers from the limitations of any book. Unlike the Supreme Court Web site, it is neither free nor instantly available. It lacks current information (for visitors, job seekers, and the media), and it doesn’t provide access to primary research material (e.g. the text of court opinions). Background information on the author is limited, and there’s no method of contacting him.
2. Ad-supported Web site: US Supreme Court Center
The US Supreme Court Center is a collection of information published by FindLaw, an ad-supported Web site owned by legal publisher West Group. It features biographies of justices, a history of the court, a calendar (docket), decisions and opinions, current news stories, and an index to cases by topic for each term. Visitors can sign up to receive free same-day US Supreme Court case summaries and read commentary on current Supreme Court issues by FindLaw columnists. There are 12 main topics in the “Court Resources” section, similar to those listed on the home page of the official Supreme Court Web site.
Pros and cons
The FindLaw Web site is free, instantly accessible, easy to use, and contains a wealth of current information about the Supreme Court. It includes both topic browse and “fielded search” of court records by year, volume, citation, or party (i.e. the name of the person or organization involved in the case). However, biographical information on past justices is not comprehensive, and some court documents published prior to 1999 are not accessible. The design is somewhat cluttered with ads, and there’s no glossary or general A – Z index.
3. Librarian’s guide: Web Guide to U.S. Supreme Court Research
This article by a law school librarian is an annotated bibliography of sources about the Supreme Court. The Guide is geared to sources of information, not the actual information itself. For example, while FindLaw cites individual newspaper articles, the Guide refers to newspaper titles (e.g. Washington Post). Among the sources mentioned in the Guide is the FindLaw Hot Topics page, which contains links to recent articles on such issues as abortion, immigration, and elections. The Supreme Court is itself listed as a “hot topic.”
Pros and cons
Sources in the Guide have been selected by an expert who we can assume is quality-conscious and relatively objective. Sources are annotated with descriptions about their scope and content. The “hot topics” item will be especially appealing to the media and the public. The reader is given some background on the author and can send her an e-mail message. On the other hand, the focus on sources, rather than actual information, makes the site more useful to other librarians than to end users looking for a quick answer. Moreover, it’s hard to find (it doesn’t appear on the first five pages of a Google search on “Supreme Court”).
4. Database: U. S. Supreme Court Justices database
This database is not intended for the general public. Compiled by a law professor, it contains data on individuals nominated (whether confirmed or not) to the U.S. Supreme Court. The database consists of 263 variables, falling roughly into five categories: identifiers, background characteristics and personal attributes, nomination and confirmation, service on the Court, and departures from the bench.
The database is available for download in three formats: SPSS and Stata (statistical software programs) as well as Excel. The files are accompanied by a 140-page user manual. There is no Web search interface.
The database contains 263 kinds of data on people nominated to the Supreme Court (whether or not they were confirmed and served).
Pros and cons
The database is perfect for scholars, writers, and reporters who want to use statistical software to unearth trends and mine interesting nuggets of information on people nominated to the Supreme Court. However, to be useful to the public, the factual data about people should be linked to related information, such as books, articles, and Web sites. It also needs a Web interface to allow users to browse and search the data without having to download it.
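As a rough illustration of what browsing and searching by multiple attributes could mean, here is a minimal Python sketch. The field names (`home_state`, `confirmed`, `military_service`) and the sample records are hypothetical placeholders; the real database defines its own 263 variables, and nothing here reflects its actual layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nominee:
    # Hypothetical fields chosen for illustration only.
    name: str
    home_state: str
    confirmed: bool
    military_service: bool


def browse(nominees, **criteria):
    """Return the nominees whose attributes match every given criterion."""
    return [
        n for n in nominees
        if all(getattr(n, field) == value for field, value in criteria.items())
    ]


# Placeholder records, not real data.
SAMPLE = [
    Nominee("Nominee A", "NY", True, True),
    Nominee("Nominee B", "NY", False, False),
    Nominee("Nominee C", "OH", True, False),
]

# A specialized list: confirmed nominees from one state.
print([n.name for n in browse(SAMPLE, home_state="NY", confirmed=True)])
```

A Web interface would expose the same idea through form fields or browse links, so that users never have to download the files.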
Putting it all together
If we were the editor of the Supreme Court Web site, here are the features we’d borrow from each of the tools above:
* From the book: plain language table of contents, general index, case index, bibliography, glossary;
* From FindLaw’s US Supreme Court Center: browse cases by topic for each year, view court docket by month, browse hot topics;
* From Web Guide to Supreme Court Research: on the Supreme Court home page create a link to the entire document;
* From U. S. Supreme Court Justices database: search and browse Court nominees by multiple attributes, provide specialized lists (e.g. nominees by state, members who served in the military).
The new Supreme Court home page might look something like this:
We’ve reduced the size of the photo and blank space to give more room for a two-level table of contents as well as left and right sidebars that list navigation tools and related information.
What about search?
Search on the current Supreme Court site is a great step forward in providing instant access to source documents, but there’s some room for improvement. In the new design, database search and topic browse play a major role, while full text site search serves as the navigation tool of last resort.
Supreme Court full text search results.
Search could be improved in three ways:
- Making results more predictable. For example, a search for ‘rowe vs. wade’ finds nothing, while ‘rowe and wade’ finds 21 items, some of which are not relevant if you’re looking for information on the Roe v. Wade abortion case.
- Using titles and descriptions entered by humans as metadata values instead of letting the search engine guess them.
- Reducing the number of items in the results list by narrowing the scope of the collection to be searched, by constructing rules that use metadata values for specific keywords, or both.
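The three improvements can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the rule table, the connector-word list, and the fictitious case name "Smith v. Jones" are all hypothetical, and a production search engine would handle these steps differently.

```python
import re

# Hypothetical, human-entered metadata records (all values are placeholders).
RECORDS = {
    "doc-1": {
        "title": "Example opinion",
        "description": "Placeholder description written by an editor.",
        "collection": "opinions",
        "url": "https://example.org/opinions/doc-1",
    },
}

# Editor-maintained rules: normalized keyword phrase -> record ids to return.
RULES = {"smith jones": ["doc-1"]}

# Words that join party names and should not change the result.
CONNECTORS = {"v", "vs", "versus", "and"}


def normalize(query):
    """Lowercase the query, drop punctuation, and remove connector words."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return " ".join(w for w in words if w not in CONNECTORS)


def search(query, collection=None):
    """Look up a rule for the normalized query, optionally scoped to a collection."""
    hits = [RECORDS[i] for i in RULES.get(normalize(query), [])]
    if collection is not None:
        hits = [r for r in hits if r["collection"] == collection]
    # Titles and descriptions come from the metadata, not from guessing.
    return [(r["title"], r["description"], r["url"]) for r in hits]


print(search("Smith vs. Jones"))
print(search("smith and jones", collection="opinions"))
```

Because both spellings of the query normalize to the same key, they return the same short, curated list.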
Behind the scenes
We can infer that there is already metadata in machine-readable format for:
- Justices and nominees
- Merits Briefs
- Cases organized by topic and year
But since some of this data was created by a third party, we’d have to obtain permission to use it. Moreover, we’d need a data structure (i.e. a metadata repository) that would allow us to:
- Associate Justices with cases, speeches, and other documents
- Associate index terms with documents and with each other in thesaurus relationships
- Quickly produce up-to-date bibliographies, a glossary, and other specialized lists (e.g. Justices who have served in the military)
Once the right repository structure was in place, we’d need to populate it with data. The documents themselves would not be stored in the repository; we’d only store pointers or URLs that would allow them to be retrieved from a server.
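To make the repository requirements more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, relation codes, and sample rows are assumptions made for illustration; they do not describe the Montague Institute's actual repository. Note that only document URLs are stored, not the documents themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT NOT NULL, url TEXT NOT NULL);
CREATE TABLE person   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE term     (id INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE);
CREATE TABLE person_document (person_id INTEGER, document_id INTEGER);
CREATE TABLE document_term   (document_id INTEGER, term_id INTEGER);
-- relation: 'NT' = narrower term, 'USE' = preferred term for a synonym
CREATE TABLE term_relation (term_id INTEGER, relation TEXT NOT NULL, related_id INTEGER);
""")

# Placeholder rows, not real data.
conn.executemany("INSERT INTO document VALUES (?, ?, ?)", [
    (1, "Placeholder opinion", "https://example.org/docs/1"),
    (2, "Placeholder speech", "https://example.org/docs/2"),
])
conn.execute("INSERT INTO person VALUES (1, 'Justice Example')")
conn.executemany("INSERT INTO term VALUES (?, ?)", [
    (1, "civil rights"), (2, "voting rights"), (3, "suffrage"),
])
conn.executemany("INSERT INTO person_document VALUES (?, ?)", [(1, 1), (1, 2)])
conn.executemany("INSERT INTO document_term VALUES (?, ?)", [(1, 2), (2, 1)])
conn.executemany("INSERT INTO term_relation VALUES (?, ?, ?)", [
    (1, "NT", 2),   # "civil rights" has the narrower term "voting rights"
    (3, "USE", 2),  # "suffrage" is a synonym; use "voting rights"
])


def documents_for(label):
    """Return (title, url) pairs for a term, following USE and NT links one level."""
    row = conn.execute("SELECT id FROM term WHERE label = ?", (label,)).fetchone()
    if row is None:
        return []
    term_id = row[0]
    use = conn.execute(
        "SELECT related_id FROM term_relation WHERE term_id = ? AND relation = 'USE'",
        (term_id,),
    ).fetchone()
    if use:
        term_id = use[0]
    narrower = conn.execute(
        "SELECT related_id FROM term_relation WHERE term_id = ? AND relation = 'NT'",
        (term_id,),
    ).fetchall()
    ids = [term_id] + [r[0] for r in narrower]
    marks = ",".join("?" * len(ids))
    return conn.execute(
        "SELECT DISTINCT d.title, d.url FROM document d "
        "JOIN document_term dt ON dt.document_id = d.id "
        f"WHERE dt.term_id IN ({marks}) ORDER BY d.id",
        ids,
    ).fetchall()


def documents_by(person_name):
    """Return the titles of documents associated with a person."""
    rows = conn.execute(
        "SELECT d.title FROM document d "
        "JOIN person_document pd ON pd.document_id = d.id "
        "JOIN person p ON p.id = pd.person_id "
        "WHERE p.name = ? ORDER BY d.id",
        (person_name,),
    ).fetchall()
    return [r[0] for r in rows]


print(documents_for("suffrage"))
print(documents_by("Justice Example"))
```

Keeping the relationships in separate tables is what allows the same data to feed an A – Z index, a glossary, or a specialized list without re-entering anything.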
Metadata repository example showing the segment containing index terms (keywords), thesaurus relationships, and document/term relationships.
Other segments of the repository contain metadata about people and documents.
This example from the Montague Institute drives our A – Z index, serves as a teaching lab, and makes our internal business processes more efficient.
Return on investment
The cost of creating and populating a metadata repository would include:
- Software license fees
- Time required to gather, input and maintain data
- Time required to select and enhance content (e.g. add meaningful titles)
- System integration fees
The return on this investment would come primarily from four sources:
- Time saved in creating and updating indexes, bibliographies, and other lists automatically instead of by hand
- Time saved by substituting self service information retrieval for phone calls to staff members
- Time saved by users in finding information
- Sale of products and services
For government agencies, it’s more practical to calculate internal time savings in content management and customer service, but saving time for users can be a lucrative business opportunity for third parties (e.g. FindLaw). For intranet managers, all four ROI factors come into play, but time saved by users usually gets short shrift. There’s no one to advocate for users, and they aren’t well defined as a specific audience.
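The cost and return factors listed above can be combined in a simple calculation. The sketch below is only one way to do the arithmetic; the function name, the category labels, and every figure are hypothetical placeholders, not estimates for any real site.

```python
def annual_roi(costs, hours_saved, hourly_rate, sales=0.0):
    """Return (net_benefit, roi_ratio) for one year.

    costs       -- mapping of cost category to amount
    hours_saved -- mapping of time-saving category to hours per year
    hourly_rate -- value assigned to one saved hour
    sales       -- revenue from products and services, if any
    """
    total_cost = sum(costs.values())
    benefit = sum(hours_saved.values()) * hourly_rate + sales
    net = benefit - total_cost
    return net, net / total_cost


# Placeholder figures for illustration only.
costs = {
    "software licenses": 10_000,
    "data entry and maintenance": 20_000,
    "content selection and enhancement": 5_000,
    "system integration": 15_000,
}
hours_saved = {
    "automatic list creation": 400,
    "self-service retrieval": 600,
    "user search time": 0,  # often left uncounted, as noted above
}

net, ratio = annual_roi(costs, hours_saved, hourly_rate=60.0)
print(net, ratio)
```

Setting the user search time to zero mirrors the situation described above, where time saved by users is left out; counting it would raise the return.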
One of the reasons why some Web sites fall short in user efficiency is that we abandon traditional editorial and retrieval tools. Our first priority is to make content accessible. Only later do we think about how a work (in this case a Web site or site segment) meets the needs of an audience or how to save time for users. Instead of trying to make search engines be everything to everybody, it’s time to rethink legacy retrieval tools in a Web context and consider a metadata repository as the implementation vehicle. This approach not only saves money on content management, but it also saves time for users.
About the author
Jean Graef founded the Montague Institute in 1992 and the Society of Knowledge Base Publishers in 1998. She writes and edits articles for the Institute’s monthly Web journal, the Montague Institute Review. Her work has appeared in many other publications, including CIO magazine, Research Technology Management, Internet Week, Sales & Field Force Automation, the Wall Street Journal, the Boston Globe, and CFO magazine.
The Montague Institute educates executives and information professionals on cutting-edge topics relating to corporate information services with a focus on those that cross organizational boundaries. It is known for its hands-on, project-oriented Web courses that use an online Knowledge Base Publishing system as a metadata repository lab.