Taxonomies, Lexicons and Organizing Knowledge Wendi Pohs, IBM Software Group
Agenda • Benefits, business and technical • A few definitions • Planning • Issues • Measuring value • Futures • Q&A
The Mantra • Knowledge is in the eye of the beholder, but reflecting end-user needs is as critical as representing texts ... and it takes work!
Business Benefits • "If only I could find information to help me do my job better ..." • Mergers and acquisitions • Research and development • Industries: Consulting, Pharmaceuticals, Financial services, Legal
Technical Benefits • Site creation • Navigation/search • Personalization • Defining areas of expertise
Definitions: Taxonomy • “The science, laws or principles of classification” (From the Greek: rules of arrangement) • Biology (Linnaeus) • Education (Bloom) • A hierarchical collection of categories and documents • Structure and content
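The "hierarchical collection of categories and documents" can be pictured as a simple tree. The sketch below is only an illustration: the `Category` class, its method names, and the sample categories and file names are all hypothetical, not taken from the talk or from any IBM product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """One taxonomy node: a label (structure) plus assigned documents (content)."""
    name: str
    children: list[Category] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)

    def add_child(self, name: str) -> Category:
        """Create a subcategory under this one and return it."""
        child = Category(name)
        self.children.append(child)
        return child

    def walk(self, depth: int = 0):
        """Yield (depth, category) pairs, parents before children."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)

    def all_documents(self) -> list[str]:
        """Documents filed here or anywhere beneath this category."""
        docs = list(self.documents)
        for child in self.children:
            docs.extend(child.all_documents())
        return docs


# Hypothetical sample content, for illustration only.
root = Category("Products")
software = root.add_child("Software")
hardware = root.add_child("Hardware")
groupware = software.add_child("Groupware")
groupware.documents.append("notes-admin-guide.txt")
hardware.documents.append("server-spec.txt")

# Indented outline of the category structure.
outline = [("  " * depth) + cat.name for depth, cat in root.walk()]
print("\n".join(outline))
```

Keeping documents on the nodes rather than in a separate index mirrors the slide's "structure and content" pairing; a real system would more likely store document identifiers than file names.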
Definitions: Directory • More general than taxonomy • Natural structure • Wide vs deep • Category structure less controlled • File system • Yahoo (http://www.yahoo.com) • Yellow Pages • Corporate Web sites (http://www.ibm.com)
Definitions: Thesaurus • Controlled vocabulary • Subject headings, labels • Synonyms (U, UF) • Relation types (TT, BT, NT, SN, HN, RT, SA) • Examples: http://www.loc.gov/flicc/wg/taxonomy.html
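A rough sketch of how a few of these relations fit together, reading the abbreviations in their conventional thesaurus sense (U = Use, UF = Used For, BT = Broader Term, NT = Narrower Term, RT = Related Term, TT = Top Term). The `Thesaurus` class, its method names, and the sample terms are hypothetical, and the sketch assumes the broader-term links contain no cycles.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal controlled vocabulary with U/UF, BT/NT and RT relations."""

    def __init__(self):
        self.use = {}                     # U:  variant term -> preferred term
        self.used_for = defaultdict(set)  # UF: preferred term -> variants
        self.broader = defaultdict(set)   # BT: term -> broader terms
        self.narrower = defaultdict(set)  # NT: term -> narrower terms
        self.related = defaultdict(set)   # RT: term -> related terms

    def add_synonym(self, preferred, variant):
        """Record both directions of the U / UF pair."""
        self.use[variant] = preferred
        self.used_for[preferred].add(variant)

    def add_broader(self, term, broader_term):
        """BT and NT are reciprocal, so store both at once."""
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def add_related(self, a, b):
        """RT is symmetric."""
        self.related[a].add(b)
        self.related[b].add(a)

    def preferred(self, term):
        """Resolve a variant to its preferred term (or return it unchanged)."""
        return self.use.get(term, term)

    def top_terms(self, term):
        """TT: follow BT links upward until a term has no broader term."""
        term = self.preferred(term)
        parents = self.broader.get(term)
        if not parents:
            return {term}
        tops = set()
        for parent in parents:
            tops |= self.top_terms(parent)
        return tops


# Hypothetical sample vocabulary.
t = Thesaurus()
t.add_broader("Laptops", "Computers")
t.add_broader("Computers", "Hardware")
t.add_synonym("Laptops", "Notebook computers")
t.add_related("Laptops", "Batteries")

print(t.preferred("Notebook computers"))
print(t.top_terms("Notebook computers"))
```

Scope notes (SN), history notes (HN) and see-also references (SA) are left out here; they would be free-text or link fields attached to each term.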
Definitions: Meta-data and tagging • Meta-data • Properties, attributes: information describing types of data [Crandall] • The ‘energy’ required to keep things organized [Earley] • Tagging • <META>, <Source> • Document Properties
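As one illustration of <META> tagging, the sketch below pulls name/content pairs out of an HTML page with Python's standard `html.parser` module. The `MetaCollector` class and the sample page are hypothetical; how clean such tags are in practice is exactly what a content audit has to check.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


# Hypothetical sample page.
page = """
<html><head>
<title>Taxonomy primer</title>
<META NAME="keywords" CONTENT="taxonomy, thesaurus, classification">
<meta name="author" content="Example Author">
</head><body>...</body></html>
"""

collector = MetaCollector()
collector.feed(page)
print(collector.meta)
```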
Definitions: Classification • Analyzing documents and assigning them to predefined categories • Rule-based vs natural • Classification schemes • Dewey • Library of Congress • Industry-specific
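A minimal sketch of the rule-based side of classification: each predefined category gets a set of trigger keywords, and a document is assigned to every category whose keywords it mentions. The `RULES` table, the `classify` function, and the keyword choices are assumptions made for illustration; real rule languages are usually richer (phrases, boolean operators, weights).

```python
import re

# Hypothetical keyword rules: category -> trigger words.
RULES = {
    "Legal": {"contract", "liability", "patent"},
    "Financial services": {"loan", "portfolio", "interest"},
    "Pharmaceuticals": {"clinical", "dosage", "trial"},
}


def classify(text, rules=RULES, min_hits=1):
    """Return matching categories, best match first.

    A category matches when at least `min_hits` of its keywords
    occur in the text. Ties are broken alphabetically.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {category: len(words & keywords) for category, keywords in rules.items()}
    matches = [category for category, score in scores.items() if score >= min_hits]
    return sorted(matches, key=lambda category: (-scores[category], category))


print(classify("The clinical trial contract covers dosage limits."))
print(classify("Quarterly picnic schedule"))
```

A document that matches nothing comes back with an empty list, which is a useful signal that either the rules or the category set need editing.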
Definitions: Clustering • Automatically generating groups of similar documents based on distance or proximity measures • "Bags of words" • Vector analysis determines boundaries • Adaptive, but not abstract
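The "bags of words" idea can be sketched in a few lines: each document becomes a word-count vector, and documents whose vectors are close end up in the same group. The sketch uses cosine similarity and a greedy single-pass grouping with a fixed threshold, chosen here only for brevity; it is not claimed to be the method behind any particular product, and the function names, threshold, and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Turn text into a word-count vector, ignoring word order."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering.

    Each document joins the most similar existing cluster if the
    similarity reaches `threshold`; otherwise it starts a new cluster.
    Returns lists of document indexes.
    """
    clusters = []  # each entry: {"centroid": Counter, "members": [indexes]}
    for index, text in enumerate(docs):
        vector = bag_of_words(text)
        best, best_similarity = None, threshold
        for candidate in clusters:
            similarity = cosine(vector, candidate["centroid"])
            if similarity >= best_similarity:
                best, best_similarity = candidate, similarity
        if best is None:
            clusters.append({"centroid": Counter(vector), "members": [index]})
        else:
            best["centroid"].update(vector)  # centroid = summed word counts
            best["members"].append(index)
    return [c["members"] for c in clusters]


# Hypothetical sample documents.
docs = [
    "server memory upgrade install",
    "install memory in the server",
    "quarterly sales forecast report",
    "sales report for the quarter",
]
groups = cluster(docs)
print(groups)
```

Note that the groups come back as bare lists of documents with no names attached, which is one way to read "adaptive, but not abstract": the grouping follows the content, but someone still has to label the result.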
Develop a Plan • Determine user information needs • Information audit, Content audit • Select appropriate sources • Create initial taxonomy • Edit categories • Categorize new documents • Test the UI • Train the taxonomy
Plan: Information audit • What is the objective of the system? • Who owns the project? • What do users need? • What do content creators need? • What do system managers need?
Plan: Content audit • Is there an existing taxonomy? • How clean is the meta-data? • Is the content suited to automatic classification techniques? • Good example: Notes discussion databases • Not-so-good example: Web site with little text, lots of links • Is a subset of a source better than the whole?
Plan: Select sources • Which sources? • Who owns them? • Which sources do users access most often? • How do users access these sources? • What is the lifecycle of the content? • Who identifies the most current content?
Plan: Maintenance • Resources • Centralized or department-level • Who decides when new content is added? • Term approval process • How do new concepts get into the taxonomy?
Identify issues • Getting user involvement and buy-in • Maintenance resources • Directory versus taxonomy • Meta-data • Globalization and regionalization • Hidden vs published taxonomies
Understand the BIG issues • Organizational “perfection complex” [Chait] • Multiple taxonomies • Automated versus manual categorization
Multiple taxonomies • Many editors • Term approval process, synonyms • Standard tools across the enterprise • Federated taxonomies • Taxonomy links, “cross-connections,” facets, views • Taxonomy mapping
Measuring value • NCR Corporation - Support Organization • Needed to convince organization of the value of captured content • Managers resisted diverting resources to maintaining content • Current measure: Time per incident • How could the value of a knowledge classification system be demonstrated?
Measuring value • NCR developed a new parameter: • Knowledge helpful (the answer was in the support database and was used to solve the problem) • Knowledge not effective (the answer sent them in the wrong direction, did not help to address the issue) • Knowledge not available (nothing available to assist in solving the problem) • Knowledge not required (problem solved without the use of the knowledge base)
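One way to picture the four-valued parameter is as a field recorded on each closed support incident and then tallied. The sketch below is hypothetical throughout: the `KnowledgeOutcome` enum, the `summarize` function, and the incident counts are made up for illustration and are not NCR's data or tooling.

```python
from collections import Counter
from enum import Enum


class KnowledgeOutcome(Enum):
    """The four values of the parameter described on the slide."""
    HELPFUL = "knowledge helpful"
    NOT_EFFECTIVE = "knowledge not effective"
    NOT_AVAILABLE = "knowledge not available"
    NOT_REQUIRED = "knowledge not required"


def summarize(outcomes):
    """Return the share of incidents falling under each outcome."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {
        outcome: (counts[outcome] / total if total else 0.0)
        for outcome in KnowledgeOutcome
    }


# Made-up incident log, one outcome per closed incident.
incidents = (
    [KnowledgeOutcome.HELPFUL] * 5
    + [KnowledgeOutcome.NOT_EFFECTIVE] * 1
    + [KnowledgeOutcome.NOT_AVAILABLE] * 2
    + [KnowledgeOutcome.NOT_REQUIRED] * 2
)

shares = summarize(incidents)
for outcome, share in shares.items():
    print(f"{outcome.value}: {share:.0%}")
```

Tracked over time alongside time per incident, shares like these would show whether the captured content is being found and used, and where the gaps are ("not available") or the quality problems lie ("not effective").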
Futures • Methods: • Feature extraction, statistical analysis, rules-based, label generation • Starter taxonomies, imports • Taxonomy mapping • Interfaces: Visualization, better training tools
Q&A