Use of Controlled Vocabulary at the National Agricultural Library. EPA System of Registries Conference May 20, 2009 Lori Finch Thesaurus Coordinator Agricultural Research Service, U.S. Department of Agriculture.
Established in 1862 as the primary agricultural information resource in the Nation. Statutory mandate to identify, collect, preserve in perpetuity, and provide access to quality information relevant to agriculture. One of four national libraries. Serves as USDA’s library.
NAL Services: AGRICOLA. Online catalog of ~5 million items, “free for all”.
NAL Services: Special Collections. Rare books, manuscripts, nursery and seed trade catalogs, botanical illustrations, photographs, and posters from the 1500s to the present.
Agriculture Network Information Center (AgNIC). NAL provides leadership in AgNIC, a voluntary alliance and partnership of 60 member institutions and organizations working to offer quick and reliable access to quality agricultural information.
NAL Agricultural Thesaurus. Contains approximately 45,000 descriptors and 25,000 synonyms in the areas of agriculture and biology. Each term has an equivalent in English and in Spanish. Definitions for over 2,500 terms are available in an easy-to-use glossary. Updated annually by USDA, the Inter-American Institute for Cooperation on Agriculture (IICA), and specialized national institutions in Latin American countries. First edition published in 2002 in English; Spanish added in 2007. The entire thesaurus is browsable and downloadable in several formats from the web site; it is available only in digital form.
Issue #1: “Going digital” and “old records”. “Going digital” means we now have NEW publisher-supplied data that feeds into the AGRICOLA database: some of this data has no subjects in the metadata, and some has author-supplied keywords. OLD records, 1970-1985, have incomplete metadata and no subjects. The thesaurus is the controlled vocabulary of AGRICOLA, so there is a disconnect in the database. It is impossible with the given resources to index these items manually.
Solution to Issue #1: Use the NAL Thesaurus for machine-aided processing of these records. A simple homegrown lexical matching program enhances records with controlled vocabulary terms. The program uses the synonym rings in the thesaurus and other “hidden label” equivalence relationships to aid in the selection of terms, e.g., continuous cropping, monocrop, monocropped, monocrops, monocropping, monoculture.
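The lexical matching idea can be illustrated with a minimal Python sketch. This is not NAL's homegrown program: the tiny `SYNONYM_RINGS` table, the choice of "continuous cropping" as the preferred descriptor, and the function names are all assumptions made for illustration. Each descriptor gets one pattern that matches any label in its ring, and a record is assigned every descriptor whose pattern appears in its text.

```python
import re

# Hypothetical miniature thesaurus: preferred descriptor -> labels in its
# synonym ring (including "hidden label" variants). Sample data only.
SYNONYM_RINGS = {
    "continuous cropping": [
        "continuous cropping", "monocrop", "monocropped", "monocrops",
        "monocropping", "monoculture",
    ],
    "testa": ["testa", "seed coat"],
}


def build_matchers(rings):
    """Compile one whole-word, case-insensitive regex per descriptor."""
    matchers = {}
    for descriptor, labels in rings.items():
        # Try longer labels first so multi-word labels are preferred.
        alternatives = sorted((re.escape(label) for label in labels),
                              key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(alternatives) + r")\b"
        matchers[descriptor] = re.compile(pattern, re.IGNORECASE)
    return matchers


def suggest_descriptors(text, matchers):
    """Return the descriptors whose ring has at least one label in the text."""
    return sorted(d for d, pattern in matchers.items() if pattern.search(text))


MATCHERS = build_matchers(SYNONYM_RINGS)
title = "Effects of monocropping on seed coat thickness in maize"
print(suggest_descriptors(title, MATCHERS))
```

A real matcher would also need stemming or plural handling and some way to rank or filter candidate terms; the sketch only shows how a synonym ring turns free text into controlled vocabulary terms.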
Issue #2: Search is minimal in AGRICOLA. NAL uses the Voyager integrated library system, where search is rather basic. The searcher bears the burden of creating good queries. What terms would you use to retrieve information on this animal?
Solution to Issue #2: “Smart search”. Use the relationships in the thesaurus (synonyms and hierarchy) to expand the search and increase recall and precision for the user. If the searcher uses any term in a synonym ring, the others can be added automatically, e.g., swine, pigs, hogs, Sus scrofa domestica, porcine. If the searcher uses a term that has narrower terms in the hierarchy, those terms can be added automatically, e.g., piglets, boars, sows, barrows, gilts, etc.
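The two expansion rules can be sketched in a few lines of Python. The data structures, the function names, and the quoted `OR` query syntax are assumptions for illustration; they do not represent Voyager's query language or NAL's implementation. The terms are the ones from the example above.

```python
# Hypothetical thesaurus fragments built from the example terms.
SYNONYM_RINGS = [
    ["swine", "pigs", "hogs", "Sus scrofa domestica", "porcine"],
]
# Narrower terms keyed by the casefolded broader term (assumed structure).
NARROWER = {
    "swine": ["piglets", "boars", "sows", "barrows", "gilts"],
}


def expand_term(term):
    """Return the term plus its ring members and their narrower terms."""
    key = term.casefold()
    expanded = [term]
    # Rule 1: add every other member of any synonym ring containing the term.
    for ring in SYNONYM_RINGS:
        if key in (label.casefold() for label in ring):
            expanded.extend(l for l in ring if l.casefold() != key)
    # Rule 2: add narrower terms of the term or of any of its synonyms.
    for label in list(expanded):
        for narrower in NARROWER.get(label.casefold(), []):
            if narrower not in expanded:
                expanded.append(narrower)
    return expanded


def to_boolean_query(terms):
    """Join terms into a quoted OR query (illustrative syntax only)."""
    return " OR ".join(f'"{t}"' for t in terms)


print(to_boolean_query(expand_term("hogs")))
```

A searcher who types only "hogs" ends up querying all ten terms, which is the burden the thesaurus takes off the user.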
Issue #3: Silos of information. Even though we are a library, and are accustomed to acquiring, organizing, and describing items with metadata, we STILL have silos of information…
NAL Agricultural Thesaurus: “The glue that binds”. Controlled vocabulary for: the AGRICOLA database; the AgNIC database / website; image collections; NAL web pages, static and dynamic; Information Center web pages, databases, and resources. Federated search is enabled across resources at NAL; the controlled vocabulary (CV) gives the searcher more “handles” to find items of interest.
Issue #4: Indexers working offsite. We want to accommodate indexers working at home (“Flexiplace”) and contract indexers offsite. Indexers wanted to be able to complete work outside of the Voyager integrated library system, but Voyager has system checks to prevent errors. How do we maintain data integrity?
Solution to Issue #4: Indexers can use a macro in a text editor to complete work at home and then load the file when they return to the NAL work location. Use terms and synonyms in the thesaurus to spell check and validate input. Sample output: Name: Bob Cow; Action: REPLACE; Record: 1084033; Date: 01-MAY-09; Note: ‘seed coat’ is a non-descriptor for ‘testa’; Error: ‘Minnesita’ is not a valid NALT term. • Send email to the indexer when a record does not pass this quality check. • The system can automatically correct a term when a cross reference in the thesaurus exists.
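A quality check of this kind might look like the following Python sketch. It is an illustration under stated assumptions, not the loader NAL uses: the sample `DESCRIPTORS` and `NON_DESCRIPTORS` tables are hypothetical, and the spelling suggestion via the standard library's `difflib` is an added convenience that the slides do not describe. Terms with a cross reference are corrected automatically and noted; unknown terms are reported as errors.

```python
import difflib

# Hypothetical sample data standing in for the thesaurus.
DESCRIPTORS = {"testa", "Minnesota"}
NON_DESCRIPTORS = {"seed coat": "testa"}  # cross reference: USE "testa"

_BY_KEY = {d.casefold(): d for d in DESCRIPTORS}


def validate_terms(terms):
    """Return (corrected terms, notes, errors) for one indexed record."""
    corrected, notes, errors = [], [], []
    for term in terms:
        key = term.strip().casefold()
        if key in _BY_KEY:
            corrected.append(_BY_KEY[key])
        elif key in NON_DESCRIPTORS:
            # A cross reference exists, so correct automatically and note it.
            preferred = NON_DESCRIPTORS[key]
            corrected.append(preferred)
            notes.append(f"'{term}' is a non-descriptor for '{preferred}'")
        else:
            errors.append(f"'{term}' is not a valid NALT term")
    return corrected, notes, errors


def suggest(term):
    """Offer the closest descriptor, if any, for a likely misspelling."""
    close = difflib.get_close_matches(term.casefold(), list(_BY_KEY),
                                      n=1, cutoff=0.8)
    return [_BY_KEY[k] for k in close]


corrected, notes, errors = validate_terms(["seed coat", "Minnesita"])
print(corrected, notes, errors, suggest("Minnesita"))
```

Any record with a non-empty `errors` list would be held back and reported to the indexer by email, as described above.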
Issue #5: AgNIC: How to present information to the user? The AgNIC database wanted to: reveal the content and breadth of the subject area to searchers; allow a “browse” of the content in combination with a “search box”.
Uses of controlled vocabulary: Machine-aided indexing of new and old records with little or no subject metadata. “Smart search” enabled for searchers of AGRICOLA. Federated search of NAL web pages, the AgNIC database, the AGRICOLA database, and Information Center web pages and resources. Data integrity: validation and control of typographical errors. “Browse” enabled, since the thesaurus is arranged into subject categories.
Benefits of controlled vocabulary and metadata. Web content productivity: create and describe web content once, and it can be reused in NAL by different groups. Employee productivity: staff can find existing content, so it is not re-created. Reduced costs: the public can find content without assistance; self-service saves money, at least $5 per call. Shared solutions: provide an institution-wide content organization resource and methodology that does not have to be reinvented by each group for each new project.
More benefits. Interoperability: enables federated search when you want to combine the silos into “one-stop shopping” for an audience. Standards compliance: Dublin Core recommends the use of controlled vocabularies for the metadata element “subject”. “Smart search”: leverage the hierarchical structure for “expand” in search; leverage the synonyms for automatic query expansion; offer related terms to searchers to refine a search.