
Taxonomies: Hidden but Critical Tools


Presentation Transcript


  1. Taxonomies: Hidden but Critical Tools • Marjorie M.K. Hlava, President, Access Innovations, Inc.

  2. Industry in change • Technology changes • Evolving standards • Mergers • New buzzwords • Hard to tell what is real

  3. Popular Misconceptions • Computers can do it all • No need to index • No need for thesauri or subject headings • Full text gives all we need • Automatic full text • User friendly search engines • Search engines are indexes • User profiles provide the right context • Data filters give right answers

  4. Some of it is true • What can we use? • Automatic - semi - classification • Depends….. • Size of collection • Cost of the effort

  5. What’s in?? • Taxonomies • thesauri • hierarchies - classification • categorization • browsing • Wellformedness • Bricks and mortar, i.e., profit

  6. Options for Access/Control • Keep track of the input • Thesaurus • Authority file • Maximize the access • Search engine • Browse list • Power of the word • McCain

  7. What do we need? • The basics... • Authority file • People, places, things • Taxonomy • Thesaurus* with authority file or document instance • “Automatic” Classification

  8. Thesaurus Construction • Parts of a whole • Noun and noun phrases • People, places, things • Actions and reactions • Concepts and processes

  9. Term Records - Thesaurus - Format • Main Entries • Top Terms - TT • Broader Terms - BT • Narrower Terms - NT • Scope Notes - SN • History - HI • Date Term - added/changed - DA

  10. Thesaurus - Format • Related Terms - RT • See - S • See Also - SA • Use - U • Use For - UF • “Wellformedness” = W3C
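
The term record fields listed on slides 9 and 10 can be pictured as a small data structure. The following is only a minimal sketch: the class name `TermRecord`, the helper functions, and the sample terms are hypothetical and are not taken from the presentation or from any particular thesaurus product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TermRecord:
    """One thesaurus entry, using the field tags named on the slides."""
    term: str                                    # main entry (preferred term)
    tt: List[str] = field(default_factory=list)  # Top Terms
    bt: List[str] = field(default_factory=list)  # Broader Terms
    nt: List[str] = field(default_factory=list)  # Narrower Terms
    rt: List[str] = field(default_factory=list)  # Related Terms
    uf: List[str] = field(default_factory=list)  # Use For (non-preferred terms)
    sn: Optional[str] = None                     # Scope Note
    hi: Optional[str] = None                     # History
    da: Optional[str] = None                     # Date term added/changed


def build_use_index(records: List[TermRecord]) -> Dict[str, str]:
    """Invert the UF lists: each non-preferred term points to the term to Use."""
    index: Dict[str, str] = {}
    for rec in records:
        for variant in rec.uf:
            index[variant.lower()] = rec.term
    return index


def resolve(term: str, use_index: Dict[str, str]) -> str:
    """Return the preferred term, or the input unchanged if no Use reference exists."""
    return use_index.get(term.lower(), term)


# Hypothetical sample records, for illustration only.
records = [
    TermRecord(
        term="Automobiles",
        tt=["Vehicles"],
        bt=["Vehicles"],
        nt=["Sports cars"],
        uf=["Cars", "Motor cars"],
        sn="Passenger road vehicles.",
    ),
    TermRecord(term="Vehicles", nt=["Automobiles"]),
]
use_index = build_use_index(records)
print(resolve("Cars", use_index))
```

Storing Use For on the preferred term and deriving the Use references from it keeps the two directions of the synonym link from drifting apart.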

  11. What are the parts? • Natural Language Processing • Term forms • Term Relationships • Term Associations

  12. Natural Language Processing • Morphological • Lexical Analysis • Syntactic • Numerical • Phraseological • Semantic Analysis • Pragmatic

  13. Seven Major Parts of NLP 1. Morphological • plural • past tense to present

  14. Seven Major Parts of NLP 2. Lexical Analysis • part of speech tagging 3. Syntactic analysis • noun phrase id • proper name boundary

  15. Seven Major Parts of NLP 4. Numeric concept boundary 5. Semantic analysis • Proper name concept categorization • Numeric concept categorization • Semantic relation extraction 6. Phraseological - discourse analysis • Text structure identification

  16. Seven Major Parts of NLP 7. Pragmatic analysis • Cause and effect relationships • Nurse and nursing • Common sense reasoning (buy → possess) • Who has x? • These are the people who brought you...

  17. Say it another way • Term standardization • Term forms • Term relationships • Term associations • Rule building / domain creation

  18. Word Standardization • Split out chemical & drug terms • Separates chemical & drug terms for special treatment • Split out homonyms, non-English terms, and authority terms • Separates objects, proper names, place names, and dates for special treatment • Run spelling standardization program • Identifies variant spellings

  19. Word Standardization • Run word standardization program • -ie, -ing, -ed, -s, -es, pre-, non-, and “-” • Match preferred terms and synonyms
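
The word standardization steps on slides 18 and 19 might look roughly like the sketch below. It is a deliberately naive suffix and prefix stripper written for illustration, not the standardization program the slides refer to; the function names, the minimum stem length of 3, and the sample terms are all assumptions.

```python
from typing import Dict, Optional

# Affixes taken from the slide; suffixes are tried longest-first.
SUFFIXES = ("ing", "ed", "es", "s")
PREFIXES = ("pre-", "non-")


def standardize(word: str) -> str:
    """Reduce a word to a crude standard form by stripping listed affixes."""
    w = word.lower().strip()
    for prefix in PREFIXES:
        if w.startswith(prefix):
            w = w[len(prefix):]
            break
    w = w.replace("-", "")  # drop remaining hyphens
    for suffix in SUFFIXES:
        # Assumed guard: keep at least 3 characters of stem.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def match_preferred(word: str, preferred: Dict[str, str]) -> Optional[str]:
    """Look up the preferred term whose standard form matches the word."""
    return preferred.get(standardize(word))


# Hypothetical preferred terms, keyed by their standardized form.
preferred = {standardize(t): t for t in ["Indexing", "Thesauri"]}

print(match_preferred("indexed", preferred))
print(match_preferred("Non-indexed", preferred))
```

Note that stripping "non-" collapses a term and its negation into one form, which is one reason a step like this usually needs human review before terms are merged.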

  20. Term Forms • Noun • Adjective • Verb, adverb • Singular, plural • Initial articles • Spelling variants

  21. Term Forms • Punctuation • Capitalization • Abbreviations

  22. Term Relationships • Generic • Hierarchical • Systematic • Alphabetic • Instance • Poly-hierarchical
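
To illustrate the poly-hierarchical case on slide 22, where a term may sit under more than one broader term, here is a small sketch that collects every ancestor of a term. The `BROADER` mapping and its terms are hypothetical examples, not content from the presentation.

```python
from typing import Dict, List, Set

# Hypothetical broader-term links; "Nursing informatics" has two parents,
# which is what makes this hierarchy poly-hierarchical.
BROADER: Dict[str, List[str]] = {
    "Nursing informatics": ["Nursing", "Informatics"],
    "Nursing": ["Health care"],
    "Informatics": ["Information science"],
}


def ancestors(term: str, broader: Dict[str, List[str]]) -> Set[str]:
    """Return every term reachable by following broader-term links upward."""
    seen: Set[str] = set()
    stack = list(broader.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # guards against shared parents and cycles
            seen.add(current)
            stack.extend(broader.get(current, []))
    return seen


print(sorted(ancestors("Nursing informatics", BROADER)))
```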

  23. Term Associations • Cross references • All and some rule • Associative terms • Related terms

  24. “Rule building”* process • Put terms in context • Group like categories • Consider relationships • Standardize variants • Meld to a single concept rule • How much is really automatic???

  25. Domains • Taxonomy • Term Record - thesaurus • Hierarchical Browse-able list • Handout in Booth 150

  26. What else can we have? • Proximity • Stemming (lemmatization) • Truncation • Statistical clustering • Bayesian and others

  27. Other terms and tools • Neural networks • Word normalization • Lexical (word) networks • Distance mapping • Pattern recognition

  28. Moving toward the search engines • Term weighting • Frequency counts • Relevance • Precision • Recall
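
Frequency counts, precision, and recall from slide 28 have standard definitions that can be shown in a few lines. This is a minimal sketch using those textbook definitions; the function names and the document identifiers are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def term_frequencies(text: str) -> Counter:
    """Count how often each whitespace-separated word occurs."""
    return Counter(text.lower().split())


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical search result: 4 documents returned, 3 documents actually relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d5"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall of about 0.67).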

  29. Classification of “Automatic Classification Systems” • Evolving model… • Noun Extractors • Rule Based Systems • Semantic Processors • Fuzzy Search Systems • Filtering Systems

  30. (Semi) Automatic Indexing • Basic theories • Thesaurus construction • Natural language processing • Domain specific

  31. Noun extractors • Noun Extractors • Use stop word list and frequency counts • Semio • Word Perfect 5.0 • Recon • Prebuilt domains • Autonomy • Net Owl • Newsindexer
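
Slide 31 describes noun extractors as using a stop word list and frequency counts. A bare-bones sketch of that idea follows. It does no part-of-speech tagging, so it surfaces frequent non-stop words rather than true nouns, and it does not represent how any of the products named on the slide work. The stop list and sample text are assumptions.

```python
import re
from collections import Counter
from typing import List, Tuple

# A tiny illustrative stop list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for",
              "are", "by", "on", "with"}


def extract_candidates(text: str, top_n: int = 3) -> List[Tuple[str, int]]:
    """Return the most frequent words that are not on the stop list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)


sample = ("The thesaurus controls the terms. A thesaurus links terms to a "
          "taxonomy, and the thesaurus is maintained by indexers.")
candidates = extract_candidates(sample, top_n=2)
print(candidates)
```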

  32. Rules Based Systems • Rule Based • Data Harmony • API • DTIC • Mapit

  33. Semantic Processors • Synth Bank • n-Stein - expected • Quiver - beta

  34. Fuzzy Search Systems • Dr. Link • Sovereign Hill

  35. Filtering Systems • Screaming Media • Data Harmony

  36. New Directions • Topic Maps - TAO • Topic • Associations • Occurrences • Relational Indexing • Index Visualization • Based on term records • Add the search engines….
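
The three Topic Maps building blocks named on slide 36 (topics, associations, occurrences) can be modeled very simply. The sketch below is an illustration only and does not follow the Topic Maps interchange syntax; the class names, association types, and document identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Topic:
    """A subject, plus occurrences: pointers to documents about it."""
    name: str
    occurrences: List[str] = field(default_factory=list)


@dataclass
class Association:
    """A typed link between two topics."""
    kind: str
    members: Tuple[str, str]


def related(topic_name: str, associations: List[Association]) -> List[Tuple[str, str]]:
    """List (association type, other topic) pairs for the given topic."""
    result = []
    for assoc in associations:
        if topic_name in assoc.members:
            first, second = assoc.members
            other = second if first == topic_name else first
            result.append((assoc.kind, other))
    return result


# Hypothetical topics and associations, for illustration only.
topics = [
    Topic("Taxonomy", occurrences=["doc-001"]),
    Topic("Thesaurus", occurrences=["doc-001", "doc-002"]),
    Topic("Search engine", occurrences=["doc-003"]),
]
associations = [
    Association("is-built-from", ("Taxonomy", "Thesaurus")),
    Association("is-used-by", ("Thesaurus", "Search engine")),
]
print(related("Thesaurus", associations))
```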

  37. What’s a user to do? • Enjoy the presentation • What about a database producer? • Look at the options • Build from the basics • Evaluate the new tools • See it work before you buy

  38. Give me your card I will email the presentation tonight

  39. Thank You • Marjorie M.K. Hlava • President, Access Innovations, Inc. • www.accessinn.com • Chairman, Data Harmony • mhlava@accessinn.com • 505-998-0800 • Booth 150
