About Thesauri
© 2000. Jessica L. Milstead. All Rights Reserved
What is a thesaurus?
- For writers, it is a tool like Roget’s, with words grouped and classified to help select the best word to convey a specific nuance of meaning.
- For indexers and searchers, it is an information storage and retrieval tool: a listing of words and phrases authorized for use in an indexing system, together with relationships, variants and synonyms, and aids to navigation through the thesaurus.
For more information on what an information retrieval thesaurus is and what it contains, see the American standard for thesauri: National Information Standards Organization. American National Standard Guidelines for the Construction, Format, and Management of Monolingual Thesauri. Bethesda, MD: NISO Press, 1994. (ANSI/NISO Z39.19-1993). While standard thesauri are formalized and highly structured, for some purposes a less complex vocabulary is adequate.
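To make the contents of an information retrieval thesaurus concrete, here is a minimal sketch of one possible data model in Python. It is an illustration only, not a structure prescribed by the standard: the class and method names are assumptions, and the sample terms are invented. The comments use the conventional relationship labels (USE/UF, BT, NT, RT, SN).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Term:
    """One preferred term and its relationships (hypothetical model)."""
    name: str
    scope_note: str = ""                             # SN: how the term is used
    used_for: Set[str] = field(default_factory=set)  # UF: variants and synonyms
    broader: Set[str] = field(default_factory=set)   # BT: broader terms
    narrower: Set[str] = field(default_factory=set)  # NT: narrower terms
    related: Set[str] = field(default_factory=set)   # RT: associative relationships


class Thesaurus:
    def __init__(self) -> None:
        self.terms: Dict[str, Term] = {}  # preferred terms by name
        self.use: Dict[str, str] = {}     # USE references: variant -> preferred term

    def add_term(self, name: str, scope_note: str = "") -> Term:
        # Create the term if it is new; otherwise return the existing entry.
        return self.terms.setdefault(name, Term(name, scope_note))

    def add_variant(self, variant: str, preferred: str) -> None:
        # Record both directions: "variant USE preferred" and "preferred UF variant".
        self.add_term(preferred).used_for.add(variant)
        self.use[variant] = preferred

    def add_broader(self, narrower: str, broader: str) -> None:
        # Hierarchical relationships are reciprocal: BT on one side, NT on the other.
        self.add_term(narrower).broader.add(broader)
        self.add_term(broader).narrower.add(narrower)

    def add_related(self, a: str, b: str) -> None:
        # Associative relationships are symmetric.
        self.add_term(a).related.add(b)
        self.add_term(b).related.add(a)

    def resolve(self, word: str) -> Optional[str]:
        """Return the preferred term for a word, or None if it is not in the vocabulary."""
        if word in self.terms:
            return word
        return self.use.get(word)


# Invented sample vocabulary.
sample = Thesaurus()
sample.add_variant("automobiles", "cars")
sample.add_broader("cars", "motor vehicles")
sample.add_related("cars", "roads")
```

With this sample, `sample.resolve("automobiles")` leads an indexer or searcher from the variant to the authorized term "cars".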
When does my organization need a thesaurus?
- If it has large files of text and unstructured information, and needs to control and provide access to that information.
- If it is using a search engine for its information, and has found that the engine does not provide adequate results: perhaps too much irrelevant information for some queries, while missing useful information that is in the file for other queries.
- If the customers of the information system are demanding better access.
What can a thesaurus do for me and for my organization?
Properly developed and used, a thesaurus can play several roles:
- It can be a separate tool to which both indexers and searchers refer in deciding how to tag documents and queries for indexing and retrieval.
- It can also sit behind a search interface, facilitating searches without requiring users to interact with it as a separate operation.
- It can be used to improve the retrieval results from a search engine. For more on the value of thesauri in a full-text situation, click here.
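One way a thesaurus can sit behind a search interface is by expanding the user's query before it reaches the search engine. The following Python sketch shows the idea under stated assumptions: the two small dictionaries are invented stand-ins for a real thesaurus file, and `expand_query` is a hypothetical helper, not part of any particular search product.

```python
# Invented vocabulary fragments; a real system would load these from the thesaurus.
USE = {"automobiles": "cars", "autos": "cars"}   # variant -> preferred term
NARROWER = {"cars": ["sports cars", "sedans"]}   # preferred term -> narrower terms


def expand_query(word, include_narrower=True):
    """Return the set of search strings to send to the engine for one user word."""
    # Map the user's word to the preferred term, if it is a known variant.
    preferred = USE.get(word, word)
    expanded = {preferred}
    # Add every variant of the preferred term, so full text using any form matches.
    expanded.update(v for v, p in USE.items() if p == preferred)
    # Optionally add narrower terms, so a broad query also finds specific documents.
    if include_narrower:
        expanded.update(NARROWER.get(preferred, []))
    return expanded


terms = expand_query("autos")
```

Here the user types only "autos", and the interface searches on "cars", "automobiles", "autos", "sports cars", and "sedans" without the user ever consulting the thesaurus directly. Whether to include narrower terms is a design choice: it tends to find more documents but may also bring in less relevant ones.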
What about using an existing thesaurus?
- Different fields vary in how well they are served by thesauri. With the exception of the Art and Architecture Thesaurus, science and technology are far better covered than the humanities, with the social sciences falling somewhere in between.
- Almost all published thesauri have been developed to serve the needs of a particular database. You are likely to have to make adjustments and extensions to pre-existing thesauri.
- However, there may be an existing thesaurus that you can adapt or even use as is, and it’s worth looking. Most thesauri are available only in print or by license from the developer, but you can click here for a listing of thesauri and word lists that are available on the Web.
- Keep proprietary and copyright issues in mind if you do use an existing thesaurus.
If I decide to build my own thesaurus, how do I go about the process?
In addition to the ANSI/NISO standard, the best guide to development of a thesaurus is: Aitchison, Jean & Gilchrist, Alan. Thesaurus Construction: A Practical Guide. 3rd ed. London: Aslib, 1997. (Available from Portland Press).
There are two basic methods for building a thesaurus: "top-down" and "bottom-up." In real life most thesaurus development efforts are a mixture of the two.
The top-down method:
- Convene a group of subject experts to decide on the scope and broad categories of terms to be included.
- Use existing dictionaries and thesauri to decide on the terms and their relationships.
- Review and organize the preliminary term set: decide on preferred terms, make Use references from the variants and synonyms, and build hierarchical and associative relationships among the preferred terms.
- Produce a draft thesaurus, test index, and revise.
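As an illustration of what a draft thesaurus produced by these steps might look like, the Python sketch below prints entries in a simple alphabetical display. The layout and the function names `render_entry` and `render_use` are assumptions made for this example; the relationship labels (USE, UF, BT, NT, RT) are the conventional ones, and the sample terms are invented.

```python
def render_entry(term, uf=(), bt=(), nt=(), rt=()):
    """Format one preferred term with its relationships, one per line."""
    lines = [term]
    # Fixed label order: variants first, then hierarchy, then associative links.
    for label, values in (("UF", uf), ("BT", bt), ("NT", nt), ("RT", rt)):
        for value in sorted(values):
            lines.append(f"  {label} {value}")
    return "\n".join(lines)


def render_use(variant, preferred):
    """Format a Use reference leading from a variant to the preferred term."""
    return f"{variant}\n  USE {preferred}"


draft = "\n".join([
    render_use("automobiles", "cars"),
    render_entry("cars", uf=["automobiles"], bt=["motor vehicles"], rt=["roads"]),
])
print(draft)
```

A display like this gives the subject experts and test indexers something concrete to review and mark up before the draft is revised.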
The bottom-up method:
- Assemble a group of subject experts to serve as advisors; work with them to determine the scope if it has not already been determined.
- If you have a representative set of already-indexed documents, use the index terms from this set as your preliminary term list.
- If not, index a set of representative documents using free language (i.e., no vocabulary control), and take this term set as your preliminary list.
- Build your thesaurus by reviewing and organizing these terms, using a variety of resources as aids, as in the top-down method.
- Refer to your subject experts on terms whose meaning or usage is unclear, and for advice on which variant or synonym to prefer (or on whether two terms really are synonyms in the field).
- Produce a draft thesaurus, test index, and revise.
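The first bottom-up step, gathering index terms from already-indexed documents into a preliminary list, can be sketched in Python as follows. This is a minimal illustration under assumptions: each document is represented simply as a list of its index terms, the function name and `min_count` parameter are hypothetical, and the sample data is invented.

```python
from collections import Counter


def preliminary_term_list(indexed_docs, min_count=1):
    """Return (term, document count) pairs, most frequently used terms first."""
    counts = Counter()
    for terms in indexed_docs:
        # Normalize case and spacing, and count each term once per document.
        counts.update({" ".join(t.lower().split()) for t in terms})
    # Sort by descending count, then alphabetically for a stable listing.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(term, n) for term, n in ranked if n >= min_count]


# Invented sample: the index terms assigned to three documents.
docs = [
    ["Cars", "Roads"],
    ["cars", "Bridges"],
    ["Automobiles", "roads ", "cars"],
]
candidates = preliminary_term_list(docs)
```

In this sample the list starts with "cars" (3 documents) and "roads" (2), followed by "automobiles" and "bridges" (1 each). The counts are only an aid: deciding that "automobiles" and "cars" are synonyms, and which one to prefer, remains a judgment for the thesaurus builder and the subject experts.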
Maintain your thesaurus. A thesaurus is never "finished," unless it is no longer being used for indexing or its database is no longer being updated. Plan for maintenance before you even begin developing your thesaurus. A thesaurus that is not well maintained quickly becomes a liability rather than an asset.
If you are thinking about using database software you already have, or even building a thesaurus using word processing software, DON'T. Even a small thesaurus will represent a very large investment of time and intellectual effort. Software which will automate the clerical and repetitive tasks is available for a cost that is very reasonable when you consider the true cost and value of the tool you will be producing.
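As one example of the clerical tasks that are tedious by hand, consider checking that every hierarchical relationship has its reciprocal: if one term lists another as a broader term, the second should list the first as a narrower term. The Python sketch below illustrates such a check. It is not how any particular package works; the function name and the dictionary layout are assumptions, and the sample terms are invented.

```python
def missing_reciprocals(broader, narrower):
    """Report reciprocal entries that are missing.

    broader:  dict mapping a term to the set of its broader terms (BT)
    narrower: dict mapping a term to the set of its narrower terms (NT)
    Returns a sorted list of (term, label, target) entries that should be added.
    """
    problems = []
    # Every "term BT parent" needs a matching "parent NT term".
    for term, parents in broader.items():
        for parent in parents:
            if term not in narrower.get(parent, set()):
                problems.append((parent, "NT", term))
    # Every "term NT child" needs a matching "child BT term".
    for term, children in narrower.items():
        for child in children:
            if term not in broader.get(child, set()):
                problems.append((child, "BT", term))
    return sorted(problems)


# Invented sample: "trucks" was given a broader term, but the reciprocal was forgotten.
bt = {"cars": {"motor vehicles"}, "trucks": {"motor vehicles"}}
nt = {"motor vehicles": {"cars"}}
report = missing_reciprocals(bt, nt)
```

For this sample the report contains a single entry, `("motor vehicles", "NT", "trucks")`. In a thesaurus of thousands of terms, errors of this kind are easy to make and hard to spot without automated help.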
A listing of some of the thesaurus management packages which are on the market today follows. Do not misinterpret statements about "has a thesaurus" in the publicity for packages designed for other purposes such as retrieval or library automation. Many of these do not manage a thesaurus; they simply provide access to a thesaurus file that is read in from another resource.
Standalone packages
Except as noted, these packages run on PCs and/or are available via the Web. Some also run under Unix; check with the vendors for specifics.
- a.k.a.
- Data Harmony
- LEXICO
- MultiTes
- STRIDE
- Synaptica
- Term Tree 2000
- Thesaurus Builder
Database modules
In general, these modules are integral parts of the larger system, and cannot be run separately. Their availability may vary, depending on the vendor’s development priorities.
- BASIS
- Bibliotech PRO
- MINISIS STEMMA
- STAR