© 1997 Nancy C. Mulvany
This invited article was originally published in the ACM Journal of Computer Documentation, May 1997.
Thus far, the 1990s have been a busy decade for indexing. There has been a flurry of publishing activity; that is, if you consider the appearance of two major books a flurry! The publication of Indexing Books (Mulvany, 1994a) and Indexing from A to Z (Wellisch, 1996) filled a gap left by the now out-of-print book, Indexing, The Art of (Knight, 1979). These two books are the core resources for information about indexing methods and techniques, editing indexes, formatting indexes, indexing standards, and typical publishers’ style guides for indexes.
Today, index writers, like technical writers, are facing the challenge of transferring their writing skills to a new medium, the electronic publishing environment. However, we all know that print-based publishing is still of major importance. Consider the following comments about an index:
The index and figures of this book are indeed alone worth its whole price, because they make it much easier to use …. so that everybody who wants to find quickly something that is contained in this little book can find it.
This is just the kind of statement I would like to see in a review of a manual I had indexed. This quotation dates back to a book published in 1465 (Wellisch, 1994). Yes, that is 532 years ago. I bring this up only to point out that indexing has a long tradition in the West. In a nutshell, primary tenets of this tradition include: indexes are for readers; indexes are written by people; indexes provide access to information contained in texts.
It seems obvious that readers use indexes. In regard to computer documentation, indexes often play an important role in product usability. In 1991 PC Computing magazine surveyed its readers and found that the index is considered the most important feature in both hardware and software manuals (Grech, 1992). A survey by the Dataquest Desktop Software Support group concluded, "By far, the index is considered to be the most important software documentation support feature" (Dataquest, 1993). High-quality indexes can significantly reduce calls to technical support, and although free, unlimited technical support seems to have disappeared from the industry, good indexes can enhance user satisfaction, improve product usability, and influence sales.
This is probably the most controversial statement I will make here. I will hold to it with tenacity. Indexes are creative, authored works eligible for U.S. copyright registration. Just as no ten technical writers will produce the same written description of a software product, no ten indexers will index the documentation for the software product in the same way. The underlying premise of an automatic "indexing" program is as insulting to indexers as an automatic "writing" program is to writers.
In 1994, Indexicon, The Only Fully Automatic Indexer jumped into the market with such brazen claims as, "with just a click of the mouse, you create back-of-the-book indexes," and "produce professional quality indexes at a rate of up to 50 pages per minute." A thorough review of these claims and the functionality of the program revealed a rather pitiful attempt at producing an index (Mulvany and Milstead, 1994). The review concluded, "Our test results (Figs. 1-4) could be used as a Turing Test of Indexes. Can you tell which ones were done by a computer?" Yes, the difference between a handcrafted index and a computer-generated list is quite obvious.
During the past fifteen years the computer tools for document processing have developed impressively. Unfortunately, the same cannot be said for the indexing modules included in the document processing programs. Compared to stand-alone indexing programs like Macrex and Cindex, the modules for embedded indexing are poorly designed, tedious to use, and stand in the way of efficient indexing. Most of these tools provide an index writing environment that is absurd. Instead of being able to work with the index as it develops, the indexer must work in the dark. It would be comparable to using a word processing program where you could only see one sentence at a time; in order to see the context of your sentence (within a paragraph, paragraphs within a chapter, chapters within a book) you would need to compile your document.
Stand-alone indexing software allows the index writer to work within the context of the index itself. When an entry is added to the index, the writer can see exactly where it will appear in alphabetic order. The indexer is alerted when an improper cross-reference is added; e.g., a See cross-reference to an entry that does not exist. Since the writer is working directly with the index document, on-the-spot editing is a fundamental part of the index-writing process. This dedicated indexing software frees the indexer to focus on the quality and content of the index.
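The two behaviors just described, seeing where a new entry lands in alphabetic order and being warned about a See cross-reference that points nowhere, can be illustrated with a minimal sketch. It does not reproduce how Macrex or Cindex work; the function names, the sample headings, and the simplified case-insensitive sort rule are all assumptions made for illustration.

```python
import bisect


def sort_key(heading):
    # Assumption: simple case-insensitive ordering. Real indexing standards
    # define more detailed alphabetizing rules than this.
    return heading.casefold()


def add_entry(index, heading):
    """Insert a heading in alphabetic order and return the position it lands at."""
    keys = [sort_key(h) for h in index]
    position = bisect.bisect_left(keys, sort_key(heading))
    index.insert(position, heading)
    return position


def dangling_see_references(index, see_refs):
    """Return (source, target) pairs whose target heading is not in the index."""
    known = {sort_key(h) for h in index}
    return [(src, dst) for src, dst in see_refs.items() if sort_key(dst) not in known]


# Hypothetical working index.
index = []
for heading in ["hypertext", "cross-references", "indexes"]:
    add_entry(index, heading)

# The indexer sees immediately where the new entry falls.
position = add_entry(index, "embedded indexing")

# One See reference points at a heading that has not been written yet.
see_refs = {"FTP": "file transfer protocol", "links": "hypertext"}
problems = dangling_see_references(index, see_refs)
```

In this sketch the check runs against the live index, so a missing target is reported while the indexer is still working on the entry rather than after a separate compile step.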
Nothing new has been written recently about the design failures of embedded indexing software, as this issue has already been addressed in detail (Mulvany, 1989; Wittmann, 1991; Mulvany, 1994b). If you think you are the only writer being driven crazy by your indexing software, you will surely enjoy "Attachment B: Selected Quotes from Respondents" in the article, "Embedded Indexing Software: Users Speak Out" (Mulvany, 1994b).
It would require the space of a book to discuss how indexes provide access to information within a document. The best synopsis can be found in the American Society of Indexers’ criteria for the ASI/H.W. Wilson Award for Excellence in Book Indexing (American Society of Indexers, 1996). While these criteria are primarily aimed at printed book indexes, many of the guidelines are applicable to indexes for other media.
We have all heard that hypertext will free us from linear reading of text, but for indexers and users of indexes, hypertext is a new technique rather than a new concept. Indexes are one of the access aids that enable readers to jump into discrete locations within the text, and thus users of indexes are already free from linear text reading. A well-designed index is an exceptional navigation aid for reference documents. Since so much computer documentation is now presented in an online format, it is important to note that there is a great deal of overlap between functional index structure and hypertext structure.
A quick look at the index entries for "indexing" at the Memex Web site (Fig. 1) illustrates the diverse nature of indexing concerns among hypertext developers. (The Memex and Beyond Web site is a major research, educational, and collaborative web site integrating the historical record of and current research in hypermedia.) The references in Figure 1 are to papers from the following conferences: Hypertext'87, Hypertext'89, ECHT'90, Hypertext'91, ECHT'92, Hypertext'93, ECHT'94, IWHD'95, and Hypertext'96.
Figure 1
"Indexing" entries from the Memex Web site (http://www.cs.brown.edu/memex/i.index.html). The Global Index at the Memex Web site is a fine example of a handcrafted index providing access to online information.
indexing,
See Also information, retrieval; navigation
access paths in gIBIS; Conklin(102)-249
as access structure, in OOHDM design process; Schwabe(12)-121
adaptive, illustration of SuperBook features; Remde(98)-183
automatic,
creating hypertext structure from linear documents with; Nanard(191)-331
current research that has hypertext relevance; Walker(106)-321
CYBERMAP's use to identify related nodes; Gloor(175)-110
by documentation readers, permitted in DIF; Garg(112)-415
content analysis based on, in CYBERMAP; Gloor(175)-112
content search mechanism, supported by query-based access in OpenBook system; Ichimura(229)-63
context displaying, Document Examiner facilities for; Walker(106)-320
contextual,
for hypertext documents; Boy(171)-51
value for both design and operational documentation; Boy(171)-57
as data structure that facilitates global navigation; Frisse(128)-200
deficiencies of context-free descriptors; Boy(171)-54
documents for use in hypertext, as CID capability; Boy(171)-51
embedded menus vs, experimental studies on with Hyperties; Shneiderman(99)-192
entries, semantics of; Garg(110)-390
extracting terms from text, in CID; Boy(171)-53
full-text,
Pixlook System support of; Egan(189)-303
SuperBook System support of; Egan(189)-302
generation in CYBERMAP; Gloor(175)-113
hyperindices; Bruza(149)-109
in electronic documents; Walker(106)-313
in media-based navigation systems; Hirata(237)-159
in text to hypertext conversion, the most difficult step; Frisse(91)-58
index nodes, term definition and characteristics; Botafogo(172)-64
information retrieval issues, small document vs graph traversal approach; Frisse(91)-59
inverted,
hypertext information retrieval use of; Frisse(91)-61
limitations of in searching; Walker(106)-317
manual generation,
as gold standard for judging recall and precision; Cleary(4)-33
ASK system use of; Cleary(4)-32
as navigational class; Schwabe(12)-122
network structures, importance for large hypertexts; Hara(173)-77
online, difficulties with; Wright(167)-7
overview map generation with automatic, CYBERMAP use of; Gloor(175)-108
paper documents, value of; Walker(106)-312
problems with multiple referents for the same term, CID solutions to; Boy(171)-53
rich, characteristics of; Remde(98)-177
separating index space from document space, in CID; Boy(171)-58
similarity relation representation; Oren(105)-300
topology relationship to document topology; Frisse(128)-202
user-defined terms, capability of SuperBook; Remde(98)-177
user-generated, in DIF documentation; Garg(112)-415
user-generated synonyms, in SuperBook; Remde(98)-178
It is not uncommon for a customer to purchase a complex software product and receive a skimpy printed manual and a CD-ROM disc. The bulk of the product documentation is contained in help files. In the Windows environment an index is a common component of the online help. As pointed out earlier, the embedded indexing modules in print-based document processing software have many problems. Unfortunately, these indexing tools appear elegant compared to the methods used to produce online indexes. Although Microsoft appears to be abandoning its WinHelp Compiler for HTML-based help (see http://www.microsoft.com/workshop/author/htmlhelp), the techniques used by experienced writers of Windows help indexes will be of benefit regardless of the tool used for producing the help files.
One of the most thorough articles about help indexes is the two-part series written by Jan Wright (1995, 1996) for the WinHelp Journal. Wright examines fixing common keyword problems, standardizing style, dealing with sorting problems, cross-referencing, and editing techniques and provides general work-arounds that help create a usable index. Wright’s series delves into the nuts’n’bolts of producing an online index.
Another good source of information about indexes for Windows help is Designing Windows 95 Help (Deaton and Zubak, 1996). This 684-page book (+ CD-ROM) offers good technical and conceptual advice about index design.
An excellent article appeared recently in a newsletter for law indexers, "Indexing for CD-ROM Products" (Mertes, 1996). Mertes works in a legal publishing environment where she has been involved in the production of indexes for both print and electronic products. Indexes for legal publications are often very lengthy, and particular attention is devoted in the article to searchability of the index. "Make sure the index is both browsable and searchable. Readers should be able to flip through an electronic index just as they can through a paper index, seeing what’s available in the text and reading around a general topic. The index is still a great selling point for electronic publications. But it’s important that the index can be keyword-searched; it allows the reader to use the index in idiosyncratic and unexpected ways, and it’s also a great tool for the indexer to improve the product."
Mertes’ discussion of searching the index includes suggestions for the improvement of the common search tools. She points out, "One of the problems with doing a keyword search on an index with multiple points of entry is that the end result is so repetitive and so messy-looking." Sensibly she suggests that the search program evaluate links and display only one entry per unique locator.
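One way to read this suggestion is as a filter on search results: when several entries lead to the same place in the text, show only one of them. The sketch below is an illustration under assumptions, not Mertes' design or any product's behavior; the entry list, the locator names, and the choice to keep the first matching entry are all hypothetical.

```python
def search_index(entries, keyword):
    """Return matching (heading, locator) pairs, one per unique locator."""
    seen = set()
    results = []
    needle = keyword.casefold()
    for heading, locator in entries:
        # Keep an entry only if it matches and its locator is not already shown.
        if needle in heading.casefold() and locator not in seen:
            seen.add(locator)
            results.append((heading, locator))
    return results


# Hypothetical index with multiple points of entry to the same topic.
entries = [
    ("file access", "topic-12"),
    ("access to files", "topic-12"),
    ("file access, restricting", "topic-40"),
    ("printing", "topic-7"),
]

hits = search_index(entries, "access")
```

Here the two entries that lead to topic-12 collapse into a single hit, which removes the repetition the quotation complains about while keeping every distinct destination.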
Another problem addressed in the article is dealing with abbreviations and acronyms in relation to searching the index. In printed indexes it is common to establish a rule for the handling of abbreviations. For example, the rule might be: use the full term with the abbreviation in parentheses in main headings; use the abbreviation in subentries. A series of entries may look like:
file transfer protocol (FTP)
FTP. See file transfer protocol (FTP)
transferring files with FTP
Readers searching for "file transfer protocol" in the index will find only the first two entries. However, those searching for "FTP" will find all three entries. Mertes suggests, "Consistency of use is the key; for electronic products it’s best to use both the full term and the acronym in parentheses, so that whatever the user looks for will be caught."
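The difference between the two searches can be checked with a short sketch. It assumes a plain case-insensitive substring search, which may not match how any particular help viewer searches, and the revised third entry is one hypothetical way to apply Mertes' advice.

```python
def keyword_hits(entries, query):
    """Return the entries that contain the query, ignoring case."""
    needle = query.casefold()
    return [entry for entry in entries if needle in entry.casefold()]


# The three entries from the example, following the print-style rule.
print_style = [
    "file transfer protocol (FTP)",
    "FTP. See file transfer protocol (FTP)",
    "transferring files with FTP",
]

full_term_hits = keyword_hits(print_style, "file transfer protocol")
acronym_hits = keyword_hits(print_style, "FTP")

# Hypothetical revision: full term plus acronym in every entry.
electronic_style = [
    "file transfer protocol (FTP)",
    "FTP. See file transfer protocol (FTP)",
    "transferring files with file transfer protocol (FTP)",
]

revised_full_term_hits = keyword_hits(electronic_style, "file transfer protocol")
```

With the print-style entries the full-term search misses the subentry; once every entry carries both forms, either search term reaches all three.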
Mertes (1996) and Wright (1995, 1996) are examples of articles that address very specific concerns about the presentation of non-print indexes. In many respects, online indexes are much more difficult to format for a computer screen than for the printed page. While a complex legal index may be four levels deep on a printed page and still be easily scanned and usable, this number of subentry levels is often too cluttered for the computer screen. What is the indexer to do? Eliminate the complexity? Or, maintain the complexity and eliminate screen clutter through the use of "collapsible" entries? Then there is the issue of how to indicate multiple references to a topic. In a printed index this is easily done with the use of multiple page references, "file access, 75-78, 125-127, 187". When documentation is put online, the "pages" are no longer static. How do we indicate these multiple references in an online index? These are the types of questions that indexers are trying to address today.
No article about current issues in indexing would be complete without some discussion of search tools. After all, for years it has been argued that full-text or keyword searching would eliminate the need for handcrafted indexes. If I may be allowed to grossly generalize, I see three camps: indexers, a subgroup of information science people, and the Searching Public. The indexers have been hearing about their imminent demise for years and find the arguments tedious and of little interest. One subset of the information science group has been focusing for decades on various types of automatic content analysis and is still actively trying to find a method that works. The Searching Public is a very broad collection of people who have no background in information science or indexing; they tend to be quite enamored with spiffy search tools. The Searching Public is distinguished from the other two groups in a very significant way -- they often control the money. I have seen long-term technical indexing contracts dissolve because "all the material is going online; we don’t need indexes anymore because users can search the text." Fortunately, the technical indexing market is so broad that there are many clients who have "been there, failed at that," and have returned to traditional indexing methods.
The American Society for Information Science (ASIS) is the dominant professional society that brings together current research in the information science field. You will find the Proceedings of its 59th Annual Meeting (Oct. 1996) at the ASIS Web site (http://www.asis.org/annual-96/ElectronicProceedings/index.html). Here you will find discussion of indexing methods, search models, evaluations of search engines, etc. Of particular interest to traditional indexers is Bella Hass Weinberg’s paper, "Complexity in Indexing Systems -- Abandonment and Failure: Implications for Organizing the Internet" (Weinberg, 1996a). This paper presents an overview of classification systems, and suggests that indexes to large electronic collections of documents might best be structured like book indexes with specific headings and coined modifications.
A special interest group within ASIS that is of particular interest is SIG/CR (Classification Research). In an indexing publication Weinberg presents an overview of the SIG/CR papers and many of the general session papers at the ASIS meeting (Weinberg, 1996b). Weinberg calls this article "a slanted abstract for indexers." The article is witty and cuts to the chase. Indexers who read this article should feel reassured about the viability of their skills. Members of the Searching Public probably will not understand all the jokes, but one could hope that their search engine pedestal might be lowered a few notches.
I think it is extremely important to recognize that both search engines and handcrafted indexes have their place. The search engines are becoming more sophisticated and somewhat easier to use. However, the user of a search engine still bears the burden of meticulous query construction, anticipation of language variants, and filtering of results. A well-written index performs many of these tasks for users. The indexer acknowledges and resolves language differences such as the use of synonyms and explicit and implicit concept analysis. Developed index entries that result from careful analysis and synthesis of text pull together related information that might otherwise be missed. This serendipity is often lacking when users must rely only on a literal-minded search algorithm to locate information.
In massively large environments where traditional indexing is impossible, search tools are the only option for information retrieval. Luckily, within the domain of online computer documentation, information retrieval methods are not so limited. A search engine is a useful adjunct to an index in this environment.
As I indicated earlier, software tools for online indexing are tedious and costly to use, and I am concerned that indexes will be abandoned altogether in favor of a search engine. Given the design of current search engines, this brave new world may not be much fun. In Wired magazine Steve Steinberg wrote, "As automated indexing becomes available, we will begin to depend on it. It will encourage people to write plainly, without metaphors or double entendres that might confuse a search engine. After all, everyone wants people to be able to find what they have written" (Steinberg, 1996, p. 182).
It is unsettling to imagine the day when a writer’s audience is not the reader, but a search engine. We have seen the proliferation of words in titles of scientific articles because the greater the number of words, the better the chance of retrieving the article with a keyword search engine. Will writers follow Steinberg’s advice and use nice, neat, plain, mundane language so as not to confuse the search engine?
The merits of Blair and Maron’s 1985 paper, "An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System," have been debated widely, and I do not wish to rekindle the debate about their methodology or interpretation of results. However, there is one aspect of their findings that is inarguable and still very disturbing. In their study the full-text retrieval system retrieved "only 20 percent of the relevant documents, whereas the lawyers using the system believed they were retrieving a much higher percentage (i.e., over 75 percent)" (Blair and Maron, 1985, p. 293). I am not surprised that only a small number of the relevant items were identified. What is surprising is that the searchers thought they were retrieving so much more.
Perhaps as more people use multiple Web search engines and compare the disparate results, we will be less naïve about the thoroughness of the lists of hits. In the meantime, the traditional index remains an efficient and highly effective information access tool that links topics within texts. There are more resources about indexing available today than ever before, and the American Society of Indexers Web site (http://www.asindexing.org) is a good starting point for explorations.
Nancy Mulvany is a past president of the American Society of Indexers and is currently the Associate Editor of the British journal, The Indexer, and owner of Bayside Indexing Service. She can be reached at: nmulvany@bayside-indexing.com.