Knowledge Organization: Library Tools and Taxonomies for the Web • Jan Herd, jher@loc.gov • Business Reference Services, Science, Technology & Business Division • The Library of Congress
Web is too big to organize? • One billion pages • 1.5 million pages added daily • Selection of sites by collection development specialists/reference librarians
Librarians work in corporate settings • Yahoo.com (directory) • Northern Light.com (search engine) • Amazon.com (e-book seller) • Microsoft.com
OCLC Library Corporation Cooperatively Catalogs: • 45 Million Works • 350,000 Web sites and growing
Traditional Library Tools on the Web • Medical Subject Headings 1996 • Web Dewey 2000 • Classification Web 2001 (LCSH & LCC)
Importance of controlled vocabulary as metadata • American Library Association Subject Analysis Committee (SAC), Subcommittee on Metadata and Subject Analysis: recommendations • http://www.ala.org/alcts/organization/ccs/metarept2.html
Controlled Vocabularies: Why We Need Them • Used “behind” search engines • Standard in online databases • New adherents (e.g., Web content managers using taxonomies) • They work!
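A minimal sketch of what "behind the search engine" can mean: variant query terms are mapped to one preferred heading before the index is consulted. The term list, the document IDs, and the function names are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical controlled vocabulary: variant term -> preferred heading.
PREFERRED: Dict[str, str] = {
    "cars": "Automobiles",
    "autos": "Automobiles",
    "automobiles": "Automobiles",
    "movies": "Motion pictures",
    "films": "Motion pictures",
    "motion pictures": "Motion pictures",
}

# Hypothetical index: preferred heading -> documents tagged with it.
INDEX: Dict[str, Set[str]] = {
    "Automobiles": {"doc1", "doc3"},
    "Motion pictures": {"doc2"},
}


def normalize(term: str) -> Optional[str]:
    """Return the preferred heading for a term, or None if it is not controlled."""
    return PREFERRED.get(term.strip().lower())


def search(query: str, index: Dict[str, Set[str]]) -> List[str]:
    """Look up documents by preferred heading, so all variants retrieve the same set."""
    heading = normalize(query)
    if heading is None:
        return []
    return sorted(index.get(heading, set()))
```

With this arrangement a search for "Cars" and a search for "autos" return the same documents, which is the practical payoff of a controlled vocabulary.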
Sherry Vellucci, Associate Professor, St. John’s Univ., during the Conference on Bibliographic Control for the New Millennium: “authority control is not only wonderful, but critical. Controlled vocabulary mediating tools should cover Subjects, Genres, Gazetteers, Names and Titles, etc.”
Metathesauri/Subject Correlations • Unified Medical Language System (UMLS) maps over 60 medical and health care thesauri in one: http://www.nlm.nih.gov/pubs/factsheets/umlsmeta.html • Classification Web: the Library of Congress Subject Headings and LC Classification correlations: http://classweb.loc.gov
Mapping: Standard information exchange systems • Dublin Core to MARC: http://lcweb.loc.gov/marc/dccross.html • MARC to Dublin Core: http://www.loc.gov/marc/marc2dc.html • XML MARC Crosswalk: http://lcweb.loc.gov/marc/marcsgml.html (must download files) • MARC to XML to MARC Converter: http://www.logos.com/marc/default.asp
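A crosswalk of this kind is essentially a lookup table from elements of one scheme to fields of another. The sketch below assumes a handful of unqualified Dublin Core to MARC mappings recalled from the Library of Congress crosswalk; they should be verified against the published table before use. The function name and the sample record are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed subset of the unqualified Dublin Core -> MARC 21 crosswalk:
# element name -> (MARC tag, subfield code). Verify against the published table.
DC_TO_MARC: Dict[str, Tuple[str, str]] = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
}


def dc_to_marc(record: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Convert a Dublin Core record to (tag, subfield, value) triples.

    Elements with no entry in the table are skipped rather than guessed.
    """
    fields = []
    for element, values in record.items():
        target = DC_TO_MARC.get(element.lower())
        if target is None:
            continue
        tag, code = target
        for value in values:
            fields.append((tag, code, value))
    return sorted(fields)  # order by MARC tag for readability


# Hypothetical record for illustration.
example = dc_to_marc({
    "Title": ["Knowledge Organization"],
    "Creator": ["Herd, Jan"],
    "Rights": ["not mapped in this sketch"],
})
```

Skipping unmapped elements is a deliberate choice here: a crosswalk that silently invents a target field is harder to audit than one that drops data visibly.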
Mapping: Specialized information exchange systems • Standard Industrial Classification (SIC codes) to North American Industry Classification System (NAICS codes)
SIC Code Example • Major group 73 = Business services • 737 = Computer programming, data processing, and other computer related services • 7372 = Prepackaged software • Equivalent NAICS codes are: • Sector 51 = Information • 511 = Publishing industries • 5112 = Software publishers (with cross-reference to Sector 42 for reselling packaged software)
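The example above can be sketched as two small lookup tables plus a conversion table. Only the codes and titles shown on the slide are included; the function names are hypothetical, and the cross-reference to Sector 42 is not modeled.

```python
from typing import Dict, List, Tuple

# Titles taken from the slide; nothing beyond it is assumed.
SIC_TITLES: Dict[str, str] = {
    "73": "Business services",
    "737": "Computer programming, data processing, and other computer related services",
    "7372": "Prepackaged software",
}
NAICS_TITLES: Dict[str, str] = {
    "51": "Information",
    "511": "Publishing industries",
    "5112": "Software publishers",
}

# One SIC code may map to several NAICS codes, hence a list.
SIC_TO_NAICS: Dict[str, List[str]] = {"7372": ["5112"]}


def hierarchy(code: str, titles: Dict[str, str]) -> List[Tuple[str, str]]:
    """List a code's levels from broadest to narrowest using its prefixes."""
    prefixes = [code[:n] for n in range(2, len(code) + 1)]
    return [(p, titles[p]) for p in prefixes if p in titles]


def sic_to_naics(sic: str) -> List[str]:
    """Return the equivalent NAICS codes, or an empty list if unmapped."""
    return SIC_TO_NAICS.get(sic, [])
```

Both schemes are hierarchical by prefix, which is why `hierarchy` works unchanged for SIC and NAICS.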
Using old and new tools for knowledge organization on the Web: Water into Wine
What is a Taxonomy? A high-level information search device constructed to provide a means of understanding, navigating, and gaining access to intellectual capital.
History of Taxonomies • Aristotle, 384 - 322 B.C. • Kallimachos, 305 - 240 B.C., Library of Alexandria • Carl Linnaeus, 1707-1778
“Classification” is used much more frequently than “Taxonomy”, in all fields of study.
Numerous formal taxonomies are maintained by government and commercial enterprises
Taxonomies are used in: • Customized search engines • Interfaces in web portals
Service Codes
CODE  TITLE
A     Research and Development
B     Special Studies and Analysis - Not R&D
C     Architect and Engineering Services - Construction
D     Information Technology Services, including Telecommunication Services
E     Purchase of Structures and Facilities
F     Natural Resources and Conservation Services
G     Social Services
H     Quality Control, Testing and Inspection Services
J     Maintenance, Repair, and Rebuilding of Equipment
K     Modification of Equipment
L     Technical Representative Services
M     Operation of Government-Owned Facilities
N     Installation of Equipment
P     Salvage Services
Q     Medical Services
R     Professional, Administrative and Management Support Services
S     Utilities and Housekeeping Services
T     Photographic, Mapping, Printing, and Publication Services
U     Education and Training Services
V     Transportation, Travel and Relocation Services
W     Lease or Rental of Equipment
X     Lease or Rental of Facilities
Y     Construction of Structures and Facilities
Z     Maintenance, Repair or Alteration of Real Property
How do we define taxonomies in a wired world? • Taxonomy: a classification of elements within a domain • Domain: a sphere of knowledge, influence, or activity • Classification: the operation of grouping elements and establishing relationships between them (or the product of that operation) • Relationships: a defined linkage between two elements • Element: an object or concept • Source: Crandall, Mike. “Taxonomies for the Real World: The Business Imperative to Simplify Content Access.” TFPL Taxonomies for Business Conference, London, Oct. 23, 2000.
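These definitions translate fairly directly into a data structure: a taxonomy holds a domain, a set of elements, and typed relationships between pairs of elements. The sketch below is one possible reading, not Crandall's own model; the class names, the "broader" relationship type, and the sample data (borrowed from the SIC example) are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Relationship:
    """A defined linkage between two elements."""
    source: str
    kind: str
    target: str


@dataclass
class Taxonomy:
    """A classification of elements within a domain."""
    domain: str
    elements: Set[str] = field(default_factory=set)
    relationships: List[Relationship] = field(default_factory=list)

    def relate(self, source: str, kind: str, target: str) -> None:
        """Record a relationship and register both elements."""
        self.elements.update((source, target))
        self.relationships.append(Relationship(source, kind, target))

    def broader_chain(self, element: str) -> List[str]:
        """Follow 'broader' links upward from an element, stopping on a cycle."""
        chain: List[str] = []
        current = element
        while True:
            parents = [r.target for r in self.relationships
                       if r.source == current and r.kind == "broader"]
            if not parents or parents[0] in chain:
                return chain
            current = parents[0]
            chain.append(current)


# Sample data for illustration only.
industries = Taxonomy(domain="Industry classification")
industries.relate("Prepackaged software", "broader", "Computer services")
industries.relate("Computer services", "broader", "Business services")
```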
What are Taxonomies Good For? Taxonomies are applied to: • Items (aka resources): individual pieces of information (documents, people...) By the use of: • Metadata (aka properties, attributes): information describing types of data Which may or may not use values from a: • Vocabulary: selection of terms, classified or sorted To create: • Content: an item and its associated metadata • Source: Crandall, Mike. “Taxonomies for the Real World: The Business Imperative to Simplify Content Access.” TFPL Taxonomies for Business Conference, London, Oct. 23, 2000.
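The chain from item to content can be sketched as a small function: metadata is attached to an item, and any property that has a vocabulary is checked against it, while other properties stay free text ("may or may not use values from a vocabulary"). The vocabulary terms, property names, and function name below are hypothetical.

```python
from typing import Dict, Set

# Hypothetical vocabulary: property name -> allowed terms.
# Properties not listed here accept free text.
VOCABULARY: Dict[str, Set[str]] = {
    "subject": {"Taxonomy", "Metadata", "Classification"},
}


def make_content(item: str, metadata: Dict[str, str]) -> Dict[str, object]:
    """Combine an item with its metadata, enforcing the vocabulary where one exists."""
    for prop, value in metadata.items():
        allowed = VOCABULARY.get(prop)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{value!r} is not a controlled term for {prop!r}")
    return {"item": item, "metadata": dict(metadata)}


# 'subject' is controlled; 'author' is free text.
content = make_content("report.pdf", {"subject": "Taxonomy", "author": "J. Smith"})
```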
Challenges • Information management across divisions of your agency • Agency global intranets/Internet portals • Global or national document management, including technical documentation • Incorporating taxonomy technology into agency technology and information policies • Cost of building a taxonomy • Moving a taxonomy from overhead to being a core part of your agency’s information management.
More Challenges • Certification of the taxonomy by an authoritative body. • Finding common ground across multiple taxonomies or schemas with similar terms and different meanings. • Ensuring the ongoing integrity of the taxonomy with constant maintenance. • Acceptance by developers of tagging tools. • Integrating with a legacy system and external content.
The core expertise required for constructing a taxonomy is: • Systems Analyst who understands specifications for creating taxonomies • Domain expert/Subject expert in the subject of the taxonomy • Computational linguist, AI engineer • Linguist and/or Lexicographer • Database/Application Development Expert • Administrative Support • Review Support
Example of a custom taxonomy marked up in XBRL:
<?xml version="1.0" encoding="utf-8"?>
<schema xmlns:xbrl="http://www.xbrl.org/core/2000-07-31"
        targetNamespace="http://www.xbrl.org/us/gaap/ci/2000-07-31">
  <import namespace="http://www.xbrl.org/core/2000-07-31/"
          schemaLocation="http://www.xbrl.org/core/2000-07-31/xbrl-meta-2000-07-31.xsd"/>
  <element name="propertyPlantAndEquipmentGrossNote.purchasedSoftwareForInternalUse" type="monetary">
    <annotation>
      <documentation>this is software that...</documentation>
      <appinfo>
        <xbrl:rollup to="ci:propertyPlantAndEquipmentNetNote.propertyPlantAndEquipmentGrossNote" weight="1" order="7.5" />
        <xbrl:label xml:lang="en">Purchased software for internal use</xbrl:label>
        <xbrl:reference name="GPSI" number="73" chapter="11" paragraph="b" subparagraph="i" />
      </appinfo>
    </annotation>
  </element>
</schema>
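To show how such markup can be consumed, the sketch below reads a cleaned-up, abridged copy of the example (straight quotes, XML declaration and `import` omitted) with Python's standard `xml.etree.ElementTree` and pulls out each element's label and roll-up parent. The function name and the shape of the returned dictionaries are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespace bound to the 'xbrl' prefix in the example markup.
NS = {"xbrl": "http://www.xbrl.org/core/2000-07-31"}

# Abridged, well-formed copy of the slide's example.
SAMPLE = """
<schema xmlns:xbrl="http://www.xbrl.org/core/2000-07-31"
        targetNamespace="http://www.xbrl.org/us/gaap/ci/2000-07-31">
  <element name="propertyPlantAndEquipmentGrossNote.purchasedSoftwareForInternalUse" type="monetary">
    <annotation>
      <documentation>this is software that...</documentation>
      <appinfo>
        <xbrl:rollup to="ci:propertyPlantAndEquipmentNetNote.propertyPlantAndEquipmentGrossNote" weight="1" order="7.5" />
        <xbrl:label xml:lang="en">Purchased software for internal use</xbrl:label>
      </appinfo>
    </annotation>
  </element>
</schema>
"""


def read_elements(xml_text: str) -> List[Dict[str, object]]:
    """Extract name, type, label, and roll-up target for each taxonomy element."""
    root = ET.fromstring(xml_text)
    rows = []
    for el in root.iter("element"):
        rollup = el.find(".//xbrl:rollup", NS)
        label = el.find(".//xbrl:label", NS)
        rows.append({
            "name": el.get("name"),
            "type": el.get("type"),
            "label": label.text if label is not None else None,
            "rolls_up_to": rollup.get("to") if rollup is not None else None,
            "weight": float(rollup.get("weight")) if rollup is not None else None,
        })
    return rows


rows = read_elements(SAMPLE)
```

The `rollup` element is what makes this a taxonomy rather than a flat list: each element names its parent, so the hierarchy can be rebuilt from the individual entries.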
Recommendations: • Actively seek out existing taxonomies in the target discipline or subject area. If your needs are met in part by an existing taxonomy use it and build on it. • Look at the intended purpose of the taxonomy and select appropriate software tools. • Consider scalability of the taxonomy. Look at the big picture and see how the taxonomy will be able to hook into others. • Consider utilizing numerical taxonomy as a schema in the metadata in order to merge documents in foreign languages. • Accommodate new standards whenever possible. • Document “Best Practices” while creating the taxonomy and review them regularly. • Maintain and update the taxonomy continually.
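The recommendation on numerical taxonomies can be illustrated with a short sketch: if documents in different languages carry the same numeric class code in their metadata, they can be merged without translating any subject terms. The documents below are hypothetical, and the codes reuse the NAICS example from earlier in the presentation.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical documents in three languages, each tagged with a numeric code.
DOCS = [
    {"title": "Software publishers report", "lang": "en", "code": "5112"},
    {"title": "Informe sobre editores de software", "lang": "es", "code": "5112"},
    {"title": "Rapport sur l'édition", "lang": "fr", "code": "511"},
]


def merge_by_code(docs: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group document titles by numeric class code, regardless of language."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for doc in docs:
        groups[doc["code"]].append(doc["title"])
    return dict(groups)


merged = merge_by_code(DOCS)
```

The English and Spanish documents land in the same group because the code, not the wording, carries the subject.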
[Diagram] Meta Model (describes how taxonomies are created) • Existing Taxonomy in your Field • Your Agency Taxonomy • Related Taxonomy of other agency in same field • Related Taxonomy of other agency hooked to one above • Core Schema (describes how document is to be created) • Electronic Document in XML
Efficient Web information retrieval systems in the form of search engines or Web portals require continued support and improvement of:
Web based classification and numerical taxonomic tools to use in • Web based cataloging tools such as CORC, which provides metadata based on • Taxonomies such as controlled vocabularies/thesauri which will be hooked together using • Metathesauri and standard information exchange systems such as MARC-XML
And this is the house that Jack built… With a wine cellar...