INFM 700: Session 6 Taxonomies and Metadata. Paul Jacobs The iSchool University of Maryland Tuesday, March 24, 2009. This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.
Today’s Topics • Nature and types of metadata • General-purpose taxonomies (ontologies, thesauri, …) • Special-purpose taxonomies & thesauri • Practical use of taxonomies and metadata Metadata Taxonomies & Thesauri Practical Uses
Metadata
• Literally “data about data”
• “a set of data that describes and gives information about other data” (Oxford English Dictionary)
• Why do we need this?
• Types of metadata
  • Descriptive/subjective/content (e.g., author, subject, keywords, …)
  • Administrative (e.g., owner, rights, cost, creation date, version, …)
  • Technical (e.g., format, size, dependencies, programs)
  • …
• In practical terms:
  • Metadata helps users locate, navigate, and interpret content
  • Metadata helps organizations manage content
  • Metadata helps systems manipulate content
Data without Metadata… can be pretty useless!
• Who: authored it? to contact about the data?
• What: are the contents of the database?
• When: was it collected? processed? finalized?
• Where: was the study done?
• Why: was the data collected?
• How: were the data collected? processed? verified?
Early Example of Metadata
Related Terms & Techniques (in order of increasing complexity and richness)
• Taxonomies
  • Anything organized in some sort of hierarchical structure
• Tagging
  • Adding almost any kind of metadata to content, but now often descriptive and user-provided
• Thesauri
  • Focus on relations between terms
  • Focus on “concepts”
• Ontologies
  • Usually model a specific domain or part of the world
  • Generally machine-readable
Menagerie of Terms
• Classification
• Hierarchies
• Epistemology
• Directories
• Controlled vocabularies
• Knowledge representation
Let’s focus on significant differences. Let’s focus on advantages/disadvantages. Let’s focus on how each is useful. Let’s not quibble over exactly what to call each.
Segue – Metadata to Taxonomies
What do taxonomies, thesauri, etc., have to do with metadata?
Taxonomies
• Organization of objects according to some principle
• Familiar examples:
  • Linnaean taxonomy (for living organisms)
  • Web directories (e.g., Yahoo or ODP)
  • Corporate directories
  • Organization charts
  • Organizational structures previously discussed
Thesauri: Motivation
• “Semantic gap” between concepts and words
• Words are used to evoke concepts
  • Concrete objects: MacBook Pro, iPhone
  • Abstract ideas: freedom, peace
[Diagram labels: Concepts, Ideas, Words, Meaning]
Words and Concepts
• The semantic gap: What’s the problem?
  • Synonymy: roughly, different words or phrases can be used to express similar ideas (e.g., “notebook”, “laptop”)
  • Polysemy: roughly, the same word can have different meanings (e.g., “line”: fishing, code, queue, …)
• Taxonomies try to group similar concepts
• “Tags” often assign words to concepts, making it easier to find related concepts
• Controlled vocabularies avoid ambiguity (like a specific tag set)
• Thesauri represent attempts to better organize mappings between words and concepts
Do these present precision or recall problems?
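As a rough illustration of the closing question, the sketch below scores a toy keyword search. The three documents, the relevance judgments, and the helper names `search` and `precision_recall` are all invented for this example; it is not part of the original slides.

```python
# Toy collection (hypothetical): two documents about portable computers,
# one about paper notebooks.
DOCS = {
    1: "new laptop reviews",
    2: "notebook computer buying guide",
    3: "spiral notebook paper for school",
}
# Assumed relevance judgments for the information need "portable computers".
RELEVANT = {1, 2}


def search(terms):
    """Return ids of documents containing at least one query term."""
    return {doc_id for doc_id, text in DOCS.items()
            if any(term in text.split() for term in terms)}


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Synonymy: searching only "laptop" misses the "notebook computer" document.
narrow = precision_recall(search(["laptop"]), RELEVANT)

# Polysemy: adding "notebook" also pulls in the paper notebook document.
broad = precision_recall(search(["laptop", "notebook"]), RELEVANT)
```

In this toy case the narrow query has perfect precision but finds only half of the relevant documents, while the expanded query finds both relevant documents but also retrieves an irrelevant one. That matches the usual reading: synonymy tends to hurt recall, polysemy tends to hurt precision.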
Some Real Examples
• Content tagging and social media (e.g., Flickr, del.icio.us)
• Special-purpose classification schemes and thesauri (e.g., Art & Architecture Thesaurus (AAT), UMLS)
• General semantic tools and classification schemes (e.g., Princeton WordNet, Roget’s Thesaurus)
Think for a sec…
• You are developing a content-rich site and need organization and labeling schemes to help users view/browse/learn/find stuff – what do you do?
  • Define your own tagging/organization scheme?
  • Let the users define their own?
  • Leave it all to a search engine?
  • Use some existing scheme?
  • …
Flickr – popular tags
Flickr – related tags
Del.icio.us – related tags
Art & Architecture Thesaurus http://www.getty.edu/research/conducting_research/vocabularies/aat/
UMLS (Unified Medical Language System) Source: National Library of Medicine (NIH)
3 Knowledge Sources, used separately or together:
• Metathesaurus: 1 million+ biomedical concepts from over 100 sources
• Semantic Network: 135 broad categories and 54 relationships between them
• SPECIALIST Lexicon + Tools: lexical information and programs for language processing
UMLS (Unified Medical Language System) Source: National Library of Medicine (NIH)
Began in 1986 as a long-term R&D project
• Designed for systems developers
• Develop multi-purpose tools to enhance understanding of medical meaning across systems
• Overcome barriers to effective retrieval of machine-readable information
• Overcome the variety of ways the same concepts are expressed in machine-readable and human language
UMLS Uses Source: National Library of Medicine (NIH)
• Information retrieval
• Thesaurus construction
• Natural language processing
• Automated indexing
• Electronic health records (EHR)
• Distribution mechanism for
  • HIPAA, CHI, PHIN regulatory standards
  • SNOMED CT
UMLS Metathesaurus http://www.nlm.nih.gov/research/umls/
UMLS Thesaurus Browser http://www.nlm.nih.gov/research/umls/
Applying IA Principles
• Focus on users and user needs: users are different, and have different models
• Focus on content: concepts are different, too (different levels, words, complexity, vagueness)
• Examples:
  • What’s the difference between laptop, PDA, phone, and convergence device?
  • When is “cancer research” “oncology”?
  • When a user browses a furniture catalog for chairs, do you show them ottomans and footstools?
Standard Thesaurus Structure
• Broader terms: Computer
• Preferred term: Notebook (IS-A Computer)
• Synonyms (variants): Laptop (AKA Notebook)
• Narrower terms: Desktop Replacement, Ultraportable, Tablet PC (each IS-A Notebook)
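One way to hold this structure in code is sketched below. The `Concept` class, its field names, and the lookup helpers are hypothetical, and the sketch assumes "Notebook" is the preferred term with "Laptop" as its variant, which is how the diagram appears to read.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One thesaurus entry: a preferred term plus its relations."""
    preferred: str
    variants: list = field(default_factory=list)   # synonyms (AKA)
    broader: list = field(default_factory=list)    # IS-A parents
    narrower: list = field(default_factory=list)   # IS-A children


# Entry taken from the slide's example (reading of the diagram is assumed).
NOTEBOOK = Concept(
    preferred="Notebook",
    variants=["Laptop"],
    broader=["Computer"],
    narrower=["Desktop Replacement", "Ultraportable", "Tablet PC"],
)


def build_index(concepts):
    """Map every preferred term and variant (lowercased) to its concept."""
    index = {}
    for concept in concepts:
        for term in [concept.preferred, *concept.variants]:
            index[term.lower()] = concept
    return index


def preferred_term(index, word):
    """Resolve a user's word to the controlled preferred term, if known."""
    concept = index.get(word.lower())
    return concept.preferred if concept else None


INDEX = build_index([NOTEBOOK])
```

Resolving variants to a single preferred term at indexing and query time is what lets a search for "laptop" find content tagged "Notebook".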
IA Uses of Thesauri
• For organization
• For navigation
• For indexing content
• For searching
Poly-Hierarchies
• Concepts can have multiple parents
• Example (from the Shoah Foundation’s thesaurus of Holocaust terms):
  • Parents: Cracow (Poland : Voivodship); German death camps
  • Concept: Auschwitz II-Birkenau (Poland : Death Camp)
  • Children: Block 25 (Auschwitz II-Birkenau); Kanada (Auschwitz II-Birkenau)
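A poly-hierarchy can be sketched as a mapping from each term to a list of parents rather than a single parent. The `PARENTS` dictionary and the `ancestors` helper below are illustrative names, and the parent/child reading of the slide's example is an assumption drawn from its layout.

```python
# Each term maps to ALL of its parents (a tree would allow only one).
PARENTS = {
    "Auschwitz II-Birkenau (Poland : Death Camp)": [
        "Cracow (Poland : Voivodship)",
        "German death camps",
    ],
    "Block 25 (Auschwitz II-Birkenau)": [
        "Auschwitz II-Birkenau (Poland : Death Camp)",
    ],
    "Kanada (Auschwitz II-Birkenau)": [
        "Auschwitz II-Birkenau (Poland : Death Camp)",
    ],
}


def ancestors(term, parents=PARENTS):
    """Collect every broader term reachable through any parent."""
    found = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:       # a term may be reached by several paths
            found.add(current)
            stack.extend(parents.get(current, []))
    return found
```

Because a term can be reached along more than one path, the traversal tracks visited terms; a user can then arrive at "Block 25" by browsing either by place or by camp type.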
Poly-Hierarchies
• What are the advantages and disadvantages?
• What’s the relationship to polysemy?
Practical Uses & Implementation
• What are we trying to do (e.g., help users find stuff)?
• What tools are at our disposal (e.g., tags, XML, databases)?
• Given the above, how do we use/implement hierarchies and thesauri?
Faceted Hierarchies
• Alternative to single and poly-hierarchies
• Basic idea:
  • Describe objects along multiple facets
  • Each facet has its associated hierarchy
• Issues:
  • What’s a facet?
  • How do you navigate faceted hierarchies?
Faceted Browsing Example
Demo: http://flamenco.berkeley.edu/demos.html
Advantages of Facets
• Integrates searching and browsing
• Easy to build complex queries
• Easy to narrow, broaden, shift focus
• Helps users avoid getting lost
• Helps to prevent “categorization wars”
Relationship to IA?
• Architecture: Database, Web Server, Application Server, Network
• Ontologies are implicitly “hidden” here!
• Example ontology (diagram labels):
  • Concepts: Trip, Flight, Airplane, Equipment
  • Airplane attributes: Type, Capacity
  • Relation: Part-of
  • Other attributes: From, Departure Time, Origin; To, Arrival Time, Destination
• Rule: Arrival Time is always after Departure Time
• Rule: Distance from Origin to Destination is typically > 100 miles
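To make the point about "hidden" ontologies concrete, here is a minimal sketch of how the two rules might end up buried in application code. The `Flight` class, the `distance_miles` field, and `check_flight` are hypothetical; the slide gives only the attribute labels and the two rules, and the sketch assumes both times are in the same time zone.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Flight:
    origin: str
    destination: str
    departure_time: datetime
    arrival_time: datetime
    distance_miles: float  # assumed field; the slide only states the rule


def check_flight(flight):
    """Return a list of rule violations (empty if the record looks sane)."""
    problems = []
    # Hard rule: arrival time is always after departure time.
    if flight.arrival_time <= flight.departure_time:
        problems.append("error: arrival time must be after departure time")
    # Soft rule: distance is typically more than 100 miles.
    if flight.distance_miles <= 100:
        problems.append("warning: distance is typically over 100 miles")
    return problems
```

The first rule is treated as a constraint and the second only as a warning, since the slide words it as "typically" rather than "always".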
Putting it all together…
• Two-Layer Architecture: Database (MySQL), Web Server (Apache, PHP), Network
• Three-Layer Architecture: Database, Web Server, Application Server, Network
Popular Implementation
• Presentation: PHP/HTML
• Content and Metadata: SQL Database
Encoding Hierarchies
• Store the tree (nodes A through H) in an RDBMS table: Hierarchy
• Finding children of A: SELECT child FROM Hierarchy WHERE parent = 'A' (returns B, C)
• Finding parent of G: SELECT parent FROM Hierarchy WHERE child = 'G' (returns D)
• Finding siblings of D: find its parent, and then find that parent’s children
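The three lookups above can be tried end to end with Python's built-in `sqlite3` module. This is a sketch, not the course's implementation: the slide confirms only that A's children are B and C and that G's parent is D, so the remaining edges in `EDGES` are assumed for illustration.

```python
import sqlite3

# (child, parent) edges. A -> B, C and D -> G come from the slide;
# the other edges are assumptions made to complete the example tree.
EDGES = [("B", "A"), ("C", "A"), ("D", "C"), ("E", "C"),
         ("F", "D"), ("G", "D"), ("H", "D")]


def build_hierarchy(conn):
    """Create and populate the Hierarchy(child, parent) table."""
    conn.execute("CREATE TABLE Hierarchy (child TEXT PRIMARY KEY, parent TEXT)")
    conn.executemany("INSERT INTO Hierarchy (child, parent) VALUES (?, ?)", EDGES)


def children_of(conn, node):
    rows = conn.execute(
        "SELECT child FROM Hierarchy WHERE parent = ? ORDER BY child", (node,))
    return [row[0] for row in rows]


def parent_of(conn, node):
    row = conn.execute(
        "SELECT parent FROM Hierarchy WHERE child = ?", (node,)).fetchone()
    return row[0] if row else None


def siblings_of(conn, node):
    """Find the node's parent, then that parent's other children."""
    rows = conn.execute(
        "SELECT child FROM Hierarchy "
        "WHERE parent = (SELECT parent FROM Hierarchy WHERE child = ?) "
        "AND child <> ? ORDER BY child", (node, node))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
build_hierarchy(conn)
```

Making `child` the primary key enforces a single parent per node; a poly-hierarchy would instead need a key over the (child, parent) pair.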
Encoding Metadata
• Table: Items (one row per item in the tree, nodes A through H)
Content Presentation
• Breadcrumb: You are here: A > C > D
• Related: D, E
• Main area: contents at D
• Tables used: Hierarchy(child, parent); Content(id, attribute1, attribute2, attribute3, …)
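The "You are here" trail and the "Related" list can both be derived from child-to-parent links. The sketch below uses an in-memory dictionary in place of the Hierarchy table; the `PARENT` mapping and both function names are illustrative, with the edges chosen to match the slide's A > C > D example.

```python
# child -> parent links, standing in for rows of Hierarchy(child, parent).
PARENT = {"B": "A", "C": "A", "D": "C", "E": "C"}


def breadcrumb(node, parent=PARENT):
    """Walk up to the root and render the trail root-first."""
    path = [node]
    seen = {node}
    while path[-1] in parent:
        up = parent[path[-1]]
        if up in seen:                 # guard against corrupt (cyclic) data
            raise ValueError("cycle in hierarchy at " + up)
        path.append(up)
        seen.add(up)
    return " > ".join(reversed(path))


def related(node, parent=PARENT):
    """The node together with its siblings, as in the 'Related' box."""
    up = parent.get(node)
    if up is None:
        return [node]
    return sorted(child for child, p in parent.items() if p == up)
```

For node D this yields the trail "A > C > D" and the related list D, E, matching the slide.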
Faceted Browsing
• Filter by:
  • Facet1 (possible values)
  • Facet2 (possible values)
• Matching results
• Tables used: Hierarchy(child, parent); Content(id, attribute1, attribute2, attribute3, …)
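A minimal sketch of the filtering step follows, treating each attribute of a Content row as a facet. The furniture items, the facet names `type` and `material`, and the helper functions are invented for illustration.

```python
from collections import Counter

# Hypothetical Content rows; each non-id attribute acts as a facet.
ITEMS = [
    {"id": 1, "type": "chair", "material": "wood"},
    {"id": 2, "type": "chair", "material": "metal"},
    {"id": 3, "type": "ottoman", "material": "wood"},
]


def filter_items(items, selected):
    """Keep items that match every selected facet value."""
    return [item for item in items
            if all(item.get(facet) == value for facet, value in selected.items())]


def facet_counts(items, facet):
    """Count the remaining values of a facet, for the 'Filter by' panel."""
    return Counter(item[facet] for item in items if facet in item)


# Narrow by one facet, then show what the other facet still offers.
wood = filter_items(ITEMS, {"material": "wood"})
remaining_types = facet_counts(wood, "type")
```

Recomputing the counts after each selection is what lets the interface show only facet values that still lead to results, which helps users avoid dead ends.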
Recap
• Metadata
  • General function
  • Types of metadata
• Taxonomies and Thesauri
  • Role in organizing, navigating, and searching content
  • General-purpose taxonomies
  • Special-purpose taxonomies
• Practical use & implementation