1 / 44

IBE312: Information Architecture 2013

IBE312: Information Architecture 2013. Ch . 9 – Metadata Many of the slides in this slideset are reproduced and/or modified content from publically available slidesets by Paul Jacobs (2012), The iSchool , University of Maryland

dandre
Download Presentation

IBE312: Information Architecture 2013

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. IBE312: InformationArchitecture2013 Ch. 9 – Metadata Manyofthe slides in thisslidesetarereproduced and/or modifiedcontent from publicallyavailableslidesets by Paul Jacobs (2012), The iSchool, University of Maryland http://terpconnect.umd.edu/~psjacobs/s12/INFM700s12.htm. These materials were made available and licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United StatesSee http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.

  2. Metadata • “Data about data” - Definitional and descriptive documentation/information about data… • From Free On-line Dictionary of Computing:Data about data. In data processing, meta-data is definitional data that provides information about or documentation of other data managed within an application or environment. For example, meta-data would document data about data elements or attributes, (name, size, data type, etc) and data about records or data structures (length, fields, columns, etc) and data about data (where it is located, how it is associated, ownership, etc.). Meta-data may include descriptive information about the context, quality and condition, or characteristics of the data. • (Some otherdefinitions.)

  3. Metadata • Why do we need this? • Types of metadata • Descriptive/subjective/content (e.g. author, subject, keywords, …) • Administrative (e.g. owner, rights, cost, creation date, version, …) • Technical (e.g. format, size, dependencies, programs) • . . . . • In practical terms: • Metadata helps users locate, navigate, interpret content • Metadata helps organizations manage content • Metadata helps systems manipulate content

  4. Data without Metadata… Who: authored it? to contact about data? What: are contents of database? When: was it collected? processed? finalized? Where: was the study done? Why: was the data collected? How: were data collected? processed? Verified? … can be pretty useless!

  5. Early Example of Metadata

  6. Menagerie of Terms • Classification • Hierarchies • Epistemology • Directories • Controlled vocabularies • Knowledge representation Let’s focus on significant differences. Let’s focus on advantages/disadvantages. Let’s focus on how each is useful.

  7. Controlled Vocabulary • Any defined subset of natural language • List of equivalent terms (synonym rings) • Use search logs. • List of preferred terms (authority files) • Commonly also include variant terms • Educating users, enabling browsing • Term rotation (pointers in index) p.201 • Classification scheme / taxonomy • Hierarchical relationships (narrower/broader)

  8. ControlledVocabulary Queriescan be ”exploded” to increaserecall

  9. ControlledVocabularyauthority file – inclusive, preferred term can serve as theuniqueidentifier for a collectionof terms, educateusers

  10. Related Terms & Techniques • Taxonomies • Anything organized in some sort of hierarchical structure • Tagging • Adding almost any kind of metadata to content, but now often descriptive and user-provided • Thesauri • Focus on relations between terms • Focus on “concepts” • Ontologies • Usually model a specific domain or part of the world • Generally machine-readable Increasing complexity and richness Metadata Taxonomies & Thesauri Practical Uses

  11. How are taxonomies, tagging, controlled vocabularies and thesauri used? • The semantic gap: What’s the problem? • Synonymy – roughly, different words or phrases can be used to express similar ideas (e.g. “notebook”, “laptop”) • Polysemy – roughly, the same word can have different meanings (e.g., “line” (fishing, code, queue, . . .) ) • Taxonomies try to group similar concepts • “Tags” often assign words to concepts, making it easier to find related concepts • Controlled vocabularies avoid ambiguity (like a specific tag set) • Thesauri represent attempts to better organize mappings between words and concepts Do these present precision or recall problems?

  12. Taxonomies • Organization of objects according to some principle • Familiar examples: • Linnaean taxonomy (for living organisms) • Web directories (e.g., Yahoo or ODP) • Corporate directories • Organization charts • Organizational structures previously discussed Metadata Taxonomies & Thesauri Practical Uses

  13. Tagging- e.g. Flickr– popular tags Metadata Taxonomies & Thesauri Practical Uses

  14. Flickr – related tags Metadata Taxonomies & Thesauri Practical Uses

  15. Del.icio.us – related tags Metadata Taxonomies & Thesauri Practical Uses

  16. Thesauri: Motivation • “Semantic gap” between concepts and words • Online thesauri help mapping many synonyms or word variants onto one preferred term – improve precision in retrieval (p.203) • Words are used to evoke concepts • Concrete objects: MacBook Pro, iPhone • Abstract ideas: freedom, peace Concepts Ideas Words Meaning

  17. Thesauri • Book of synonyms, often including related and contrasting words and antonyms. • In this class: • A controlled vocabulary in which equivalence, hierarchical, and associative relationships are identified for purposes of improved retrieval. • Technical lingo … • Thesauri standards: ISO 2788, …

  18. Thesauri Types

  19. IA Uses of Thesauri • For organization • For navigation • For indexing content • For searching

  20. Applying IA Principles • Focus on users and user needs – users are different, and have different models • Focus on content – concepts are different, too – different levels, words, complexity, vagueness • Examples: • What’s the difference between laptop, PDA, phone, and convergence device? • When is “cancer research” “oncology”? • When a user browses a furniture catalog for chairs, do you show them ottomans and footstools?

  21. Standard Thesaurus Structure Broader Terms Computer IS-A Preferred Notebook Laptop Synonyms (variants) AKA IS-A Narrower Terms DesktopReplacement Ultraportable Tablet PC

  22. Semanticrelationships in a thesaurus • (pp. 204-205): Abbreviations: PT, VT, BT, NT, RT, Use (U) – VT use PT, Use For (UF) – full list of VT onthe PT record, Scope Note (SN) – meaningofthe term to ruleoutambiguity.

  23. Semanticrelationshipsof a winethesaurus, p. 206

  24. Some Real Examples • Content tagging and social media (e.g. flickr, del.i.cious) • Special-purpose classification schemes and thesauri (e.g. art & architecture thesaurus – AAT, UMLS) • General semantic tools and classification schemes (e.g., Princeton WordNet, Roget’s Thesaurus)

  25. Art & Architecture Thesaurus http://www.getty.edu/research/conducting_research/vocabularies/aat/ Metadata Taxonomies & Thesauri Practical Uses

  26. UMLS (Unified Medical Labeling System) Source: National Library of Medicine (NIH) SPECIALIST Lexicon +Tools Semantic Network Metathesaurus 135 broad categories and 54 relationships between them lexical information and programs for language processing 1 million+ biomedical concepts from over 100 sources Metadata Taxonomies & Thesauri Practical Uses 3 Knowledge Sources used separately or together

  27. E.g. UMLS (Unified Medical Labeling System) Source: National Library of Medicine (NIH) Began in 1986 as long-term R&D project • Designed for systems developers • Develop multi-purpose tools to enhance understanding of medical meaning across systems • Overcome barriers to effective retrieval of machine-readable information • Overcome variety of ways the same concepts are expressed in machine readable and human language Metadata Taxonomies & Thesauri Practical Uses

  28. UMLS Uses Source: National Library of Medicine (NIH) • Information retrieval • Thesaurus construction • Natural language processing • Automated indexing • Electronic health records (EHR) • Distribution mechanism for • HIPAA, CHI, PHIN regulatory standards • SNOMED CT Metadata Taxonomies & Thesauri Practical Uses

  29. UMLS Metathesaurus http://www.nlm.nih.gov/research/umls/

  30. UMLS Metathesaurus http://www.nlm.nih.gov/research/umls/

  31. UMLS Thesaurus Browser http://www.nlm.nih.gov/research/umls/

  32. Semantic Relationships • Equivalence (PT = VT) • Hierarchical: Generic (Bird NT Magpie), whole-part (Foot NT big toe) or instance (Seas NT Mediterranean Sea) • Faceted / multiple hierarchies • Associative • Related terms (hammer RT nail) • Preferred terms: • Form, selection, definition and specificity • Polyhierarchy(Medline corss-lists viral pneumonia under both ...Fig 9-25, p. 220) • Faceted classification – multiple taxonomies that focus on different dimensions of the content. (e.g. wine.com pp. 223-224.)

  33. Associative Term

  34. Poly-Hierarchies • Concepts can have multiple parents • Example: • What are the advantages and disadvantages? • What’s the relationship to polysemy? Cracow (Poland : Voivodship) German death camps Auschwitz II-Birkenau (Poland : Death Camp) Block 25 (Auschwitz II-Birkenau) Kanada(Auschwitz II-Birkenau) From Shoah Foundation’s thesaurus of holocaust terms

  35. Faceted Hierarchies • Alternative to single and poly-hierarchies • Basic idea: • Describe objects along multiple facets • Each facet has its associated hierarchy • Issues: • What’s a facet? • How do you navigate faceted hierarchies?

  36. Faceted Browsing Example

  37. Faceted Browsing Example Demo: http://flamenco.berkeley.edu/demos.html

  38. Advantages of Facets • Integrates searching and browsing • Easy to build complex queries • Easy to narrow, broaden, shift focus • Helps users avoid getting lost • Helps to prevent “categorization wars”

  39. Relationship to IA? Database WebServer ApplicationServer Network Ontologies are implicitly “hidden” here!!! Trip Airplane Type: Capacity: Part-of Equipment Flight From: Departure Time: Origin: To: Arrival Time: Destination: Rule: Arrival Time is always after Departure Time Rule: Distance from Origin to Destination typical > 100 miles

  40. Putting it all together… mySQL Apache Database WebServer PHP Network Two-Layer Architecture Database WebServer ApplicationServer Network Three-Layer Architecture

  41. Popular Implementation Presentation PHP/HTML Content Metadata SQL Database

  42. Content  Presentation A You are here: A > C > D Related - D - E B C Contents at D D E F G H Hierarchy(child, parent) Content(id, attribute1, attribute2, attribute3, …)

  43. Faceted Browsing Filter by - Facet1 (possible values) - Facet2 (possible values) Matching Results Hierarchy(child, parent) Content(id, attribute1, attribute2, attribute3, …)

  44. Summary • Meta-data • General function • Types of meta-data • Taxonomies and Thesauri • Role in organizing, navigating and searching content • General-purpose taxonomies • Special-purpose taxonomies • Practical use & implementation

More Related