
eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration, Berkeley, California, October 24, 2006. Bruce Bargmeyer, Lawrence Berkeley National Laboratory, University of California. Tel: +1 510-495-2905, bebargmeyer@lbl.gov.





Presentation Transcript


  1. eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration, Berkeley, California, October 24, 2006. Bruce Bargmeyer, Lawrence Berkeley National Laboratory, University of California. Tel: +1 510-495-2905, bebargmeyer@lbl.gov

  2. Topics • Challenges to address • A brief tutorial on Semantics and semantic computing • where XMDR fits • Semantic computing technologies • Traditional Data Administration • XMDR project • Test Bed demonstrations

  3. The Internet Revolution A world wide web of diverse content: The information glut is nothing new. The access to it is astonishing.

  4. Challenge: Find and process non-explicit data. For example, patient data on drugs contains brand names (e.g., Tylenol, Anacin-3, Datril, …); however, we want to study patients taking analgesic agents. [Figure: concept hierarchy with the concepts Analgesic Agent, Non-Narcotic Analgesic, Analgesic and Antipyretic, Nonsteroidal Antiinflammatory Drug, and Acetaminophen, and the brand names Datril, Tylenol, and Anacin-3.]

  5. Challenge: Specify and compute across Relations, e.g., within a food web in an Arctic ecosystem An organism is connected to another organism for which it is a source of food energy and material by an arrow representing the direction of biomass transfer. Source: http://en.wikipedia.org/wiki/Food_web#Food_web (from SPIRE)
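Computing across such relations amounts to traversing a directed graph. The sketch below is a minimal illustration, assuming a handful of hypothetical Arctic species and "eaten by" edges chosen for illustration (they are not read from the slide's figure). Each edge points in the direction of biomass transfer, and the hypothetical function `reachable_from` follows edges breadth-first to find every organism that biomass from a given source can eventually reach.

```python
from collections import deque

# Hypothetical, simplified Arctic food web (illustrative species, not the slide's data).
# Each edge points in the direction of biomass transfer: prey -> predator.
EATEN_BY = {
    "phytoplankton": ["zooplankton"],
    "zooplankton": ["arctic cod"],
    "arctic cod": ["ringed seal", "seabirds"],
    "ringed seal": ["polar bear"],
}

def reachable_from(source):
    """Breadth-first search: every organism that biomass from `source` can reach."""
    seen = set()
    queue = deque(EATEN_BY.get(source, []))
    while queue:
        organism = queue.popleft()
        if organism not in seen:
            seen.add(organism)
            queue.extend(EATEN_BY.get(organism, []))
    return seen

print(sorted(reachable_from("zooplankton")))
# ['arctic cod', 'polar bear', 'ringed seal', 'seabirds']
```

The `seen` set keeps the traversal finite even if a real food web contains cycles (e.g., detritus loops).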

  6. Challenge: Combine Data, Metadata & Concept Systems. Inference search query: “find water bodies downstream from Fletcher Creek where chemical contamination was over 10 micrograms per liter between December 2001 and March 2003”. [Figure: panels labeled Concept system, Data, and Metadata; the concept system divides Contamination into Biological, Radioactive, and Chemical, with mercury, lead, and cadmium under Chemical.]
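One way to picture how the three sources combine is the sketch below. It is a minimal, in-memory illustration under stated assumptions: the "flows into" links, the sample records, and the function name `contaminated_downstream` are all hypothetical (not real hydrology or monitoring data), and "between December 2001 and March 2003" is read as 2001-12-01 through 2003-03-31 inclusive. The concept system expands "Chemical" to its narrower terms, the relations supply the downstream water bodies, and the metadata (units of micrograms per liter) is what makes the threshold comparison meaningful.

```python
from datetime import date

# Hypothetical concept system: broader term -> narrower terms.
NARROWER = {
    "Contamination": ["Biological", "Radioactive", "Chemical"],
    "Chemical": ["mercury", "lead", "cadmium"],
}

# Hypothetical "flows into" links between water bodies (illustrative, not real hydrology).
FLOWS_INTO = {
    "Fletcher Creek": ["Merced Lake"],
    "Merced Lake": ["Merced River"],
}

# Hypothetical measurements: (water body, substance, micrograms per liter, sample date).
SAMPLES = [
    ("Merced Lake", "mercury", 12.0, date(2002, 6, 1)),
    ("Merced River", "lead", 4.0, date(2002, 7, 15)),
    ("Merced River", "cadmium", 15.5, date(2004, 1, 10)),
    ("Fletcher Creek", "lead", 30.0, date(2002, 3, 3)),
]

def descendants(graph, start):
    """All nodes reachable from `start` by following edges (start itself excluded)."""
    seen, stack = set(), list(graph.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen

def contaminated_downstream(origin, category, threshold, start, end):
    substances = descendants(NARROWER, category)  # concept system: Chemical -> mercury, ...
    bodies = descendants(FLOWS_INTO, origin)      # relations: water bodies downstream
    return sorted({
        body
        for body, substance, level, when in SAMPLES  # data (levels in micrograms per liter)
        if body in bodies and substance in substances
        and level > threshold and start <= when <= end
    })

print(contaminated_downstream("Fletcher Creek", "Chemical", 10.0,
                              date(2001, 12, 1), date(2003, 3, 31)))
# ['Merced Lake']
```

Fletcher Creek's own sample is excluded because the query asks for water bodies downstream of it, and the 2004 cadmium reading falls outside the date window.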

  7. Challenge: Use data from systems that record the same facts with different terms. [Figure: common content held in Dublin Core registries, software component registries, database catalogs, ISO 11179 registries, UDDI registries, OASIS/ebXML registries, CASE tool repositories, and ontological registries, under terms such as table column, data element, business specification, country identifier, XML tag, attribute, business object, coverage, and term hierarchy.]
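A common first step is a crosswalk that maps each system's local name for a fact onto one registered data element. The sketch below assumes two hypothetical systems (`sales_db`, `partner_feed`) with hypothetical field names; the data element names stand in for entries in a metadata registry and are not taken from a real one.

```python
# Hypothetical crosswalk: (system, local field name) -> registered data element name.
CROSSWALK = {
    ("sales_db", "CNTRY_CD"): "Country Identifier",
    ("partner_feed", "countryCode"): "Country Identifier",
    ("sales_db", "AMT"): "Sale Amount",
}

def harmonize(system, record):
    """Rename a record's fields to registered data element names; unmapped fields are dropped."""
    return {
        CROSSWALK[(system, field)]: value
        for field, value in record.items()
        if (system, field) in CROSSWALK
    }

a = harmonize("sales_db", {"CNTRY_CD": "FR", "AMT": 120})
b = harmonize("partner_feed", {"countryCode": "FR", "note": "n/a"})
print(a["Country Identifier"] == b["Country Identifier"])  # True
```

Dropping unmapped fields is one design choice; a registry-backed tool might instead flag them for a data steward to register.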

  8. Challenge: Draw information together from a broad range of studies, databases, reports, etc.

  9. Challenge: Gain common understanding of meaning between data creators and data users. [Figure: data creators (EEA, USGS, DoD, EPA, and others) produce text and numeric data on topics such as environment, agriculture, climate, human health, industry, tourism, soil, water, and air, with some sources labeling the same topics in Spanish; information systems and users need a common interpretation of what the data represents.]

  10. Semantic Computing and XMDR • We are laying the foundation to make a quantum leap toward a substantially new way of computing: Semantic Computing • How can we make use of semantic computing for the environment and health? • What do environmental agencies need to do to prepare for and stimulate semantic computing? • What are the ecoinformatics challenges?

  11. Coming: A Semantic Revolution • Searching and ranking • Pattern analysis • Knowledge discovery • Question answering • Reasoning • Semi-automated decision making

  12. The Nub of It • Processing that takes “meaning” into account • Processing based on the relations between things not just computing about the things themselves. • Processing that takes people out of the processing, reducing the human toil • Data access, extraction, mapping, translation, formatting, validation, inferencing, … • Delivering higher-level results that are more helpful for the user’s thought and action

  13. A Brief Tutorial on Semantics • What is meaning? • What are concepts? • What are relations? • What are concept systems? • What is “reasoning”?

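  14. Meaning: The Semiotic Triangle. [Figure: triangle linking Thought or Reference (Concept), Symbol, and Referent, with edges labeled refers to (concept–referent), symbolises (concept–symbol), and stands for (symbol–referent); examples shown are the word “Rose” and a clip-art image.] Source: C. K. Ogden and I. A. Richards, The Meaning of Meaning.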

  15. Semiotic Triangle: Concepts, Definitions and Signs. [Figure: the semiotic triangle with Concept, Sign (e.g., “Rose”), and Referent, edges labeled refers to, symbolizes, and stands for, and a Definition added alongside the concept.]

  16. Forms of Definitions. A definition can define a concept by: • Essence & Differentia • Relations • Axioms [Figure: the semiotic triangle of concept, sign (“Rose”), and referent.]

  17. Definition of Concept - Rose: Dictionary - Essence & Differentia • 1. any of the wild or cultivated, usually prickly-stemmed, pinnate-leaved, showy-flowered shrubs of the genus Rosa. Cf. rose family. • 2. any of various related or similar plants. • 3. the flower of any such shrub, of a red, pink, white, or yellow color. --Random House Webster’s Unabridged Dictionary (2003)

  18. Definitions in the EPA Environmental Data Registry • Mailing Address (http://www.epa/gov/edr/sw/AdministeredItem#MailingAddress): The exact address where a mail piece is intended to be delivered, including urban-style address, rural route, and PO Box • Mailing Address State USPS Code (http://www.epa/gov/edr/sw/AdministeredItem#StateUSPSCode): The U.S. Postal Service (USPS) abbreviation that represents a state or state equivalent for the U.S. or Canada • Mailing Address State Name (http://www.epa/gov/edr/sw/AdministeredItem#StateName): The name of the state where mail is delivered

  19. Definition of Concept - Rose: Relations to Other Concepts [Figure: the concept Rose on the semiotic triangle, linked to the related concepts Love, Romance, and Marriage.]

  20. SNOMED – Terms Defined by Relations

  21. Definition of Concept - Rose: Defined by Axioms in OWL [Figure: the concept Rose on the semiotic triangle, defined through the OWL constructs rdfs:subClassOf, owl:equivalentClass, and owl:disjointWith.]

  22. Class Axiom (Definitions): Class Description is Building Block of Class Axiom • A class description is the term used in the W3C OWL Reference (and in the OWL Semantics and Abstract Syntax) for the basic building blocks of class axioms (informally called class definitions in the Overview and Guide documents). A class description describes an OWL class, either by a class name or by specifying the class extension of an unnamed anonymous class. • OWL distinguishes six types of class descriptions: • a class identifier (a URI reference) • an exhaustive enumeration of individuals that together form the instances of a class • a property restriction • the intersection of two or more class descriptions • the union of two or more class descriptions • the complement of a class description • The first type is special in the sense that it describes a class through a class name (syntactically represented as a URI reference). The other five types of class descriptions describe an anonymous class by placing constraints on the class extension. • Class descriptions of type 2-6 describe, respectively, a class that contains exactly the enumerated individuals (2nd type), a class of all individuals which satisfy a particular property restriction (3rd type), or a class that satisfies boolean combinations of class descriptions (4th, 5th and 6th type). Intersection, union and complement can be respectively seen as the logical AND, OR and NOT operators. The four latter types of class descriptions lead to nested class descriptions and can thus in theory lead to arbitrarily complex class descriptions. In practice, the level of nesting is usually limited.
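The correspondence with AND, OR, and NOT can be made concrete with plain sets. The sketch below is a simplification, assuming a small closed domain of hypothetical individuals and a hypothetical `HAS_PET` property: OWL itself works under an open-world assumption, so its complement ranges over everything not known to be in the class, whereas here the complement is taken relative to the fixed set `DOMAIN`.

```python
# A closed, finite toy domain of individuals (hypothetical names).
DOMAIN = {"alice", "bob", "carol", "rex"}

# Named classes (type 1) given by explicit class extensions.
HUMAN = {"alice", "bob", "carol"}
DOG = {"rex"}

# Hypothetical property used for the restriction example: (subject, object) pairs.
HAS_PET = {("alice", "rex")}

def enumeration(*individuals):        # type 2: owl:oneOf
    return set(individuals)

def some_values_from(prop, filler):   # type 3: a property restriction (owl:someValuesFrom)
    return {s for (s, o) in prop if o in filler}

def intersection(*classes):           # type 4: owl:intersectionOf (AND)
    return set.intersection(*classes)

def union(*classes):                  # type 5: owl:unionOf (OR)
    return set.union(*classes)

def complement(cls):                  # type 6: owl:complementOf (NOT), relative to DOMAIN
    return DOMAIN - cls

# Nested description: humans that have some pet which is a dog.
dog_owner = intersection(HUMAN, some_values_from(HAS_PET, DOG))
print(sorted(dog_owner))                       # ['alice']
print(sorted(complement(HUMAN)))               # ['rex']
print(sorted(union(enumeration("bob"), DOG)))  # ['bob', 'rex']
```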

  23. Class Descriptions -> Class Axiom • Class descriptions form the building blocks for defining classes through class axioms. The simplest form of a class axiom is a class description of type 1. It just states the existence of a class, using owl:Class with a class identifier. • For example, the following class axiom declares the URI reference #Human to be the name of an OWL class: • <owl:Class rdf:ID="Human"/> This is correct OWL, but does not tell us very much about the class Human. Class axioms typically contain additional components that state necessary and/or sufficient characteristics of a class. OWL contains three language constructs for combining class descriptions into class axioms: • rdfs:subClassOf allows one to say that the class extension of a class description is a subset of the class extension of another class description. • owl:equivalentClass allows one to say that a class description has exactly the same class extension as another class description. • owl:disjointWith allows one to say that the class extension of a class description has no members in common with the class extension of another class description.
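Each of the three constructs is a statement about class extensions, so, given explicit extensions, it can be checked as a set relation. The sketch below assumes a toy closed world with hypothetical individuals (`r1`, `r2`, `d1`) and a hypothetical second class name `ROSA`; it only checks whether given extensions satisfy an axiom, whereas an OWL reasoner works the other way, treating asserted axioms as constraints on all possible extensions.

```python
# Class extensions in a toy, closed world (hypothetical individuals).
FLOWER = {"r1", "r2", "d1"}
ROSE = {"r1", "r2"}
ROSA = {"r1", "r2"}       # hypothetical second name for the same class
DAFFODIL = {"d1"}

def sub_class_of(c, d):      # rdfs:subClassOf: extension of c is a subset of d's
    return c <= d

def equivalent_class(c, d):  # owl:equivalentClass: identical extensions
    return c == d

def disjoint_with(c, d):     # owl:disjointWith: no members in common
    return not (c & d)

print(sub_class_of(ROSE, FLOWER), equivalent_class(ROSE, ROSA), disjoint_with(ROSE, DAFFODIL))
# True True True
```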

  24. Computable Meaning [Figure: the concept Rose on the semiotic triangle, with the OWL constructs rdfs:subClassOf, owl:equivalentClass, and owl:disjointWith.] If “rose” is owl:disjointWith “daffodil”, then a computer can determine that any assertion (e.g., in a knowledge base) stating that something is both a rose and a daffodil is invalid.
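A minimal sketch of such a check follows, assuming hypothetical class names (`TeaRose`) and individuals, and assuming each class has at most one direct superclass so the subclass chain can be walked with a simple loop. It first infers every type an individual has via rdfs:subClassOf, then flags any individual whose inferred types include a disjoint pair. This is a hand-rolled illustration, not how a production OWL reasoner is implemented.

```python
# Hypothetical axioms: single-parent subclass links and one disjointness axiom.
SUBCLASS_OF = {"TeaRose": "Rose", "Rose": "Flower", "Daffodil": "Flower"}
DISJOINT = {frozenset({"Rose", "Daffodil"})}

# Hypothetical knowledge-base assertions: individual -> asserted classes.
ASSERTED = {
    "specimen1": {"TeaRose"},
    "specimen2": {"TeaRose", "Daffodil"},  # a tea rose is a rose, so this clashes
}

def all_types(classes):
    """Asserted classes plus every superclass reachable via rdfs:subClassOf."""
    types = set()
    for c in classes:
        while c is not None and c not in types:
            types.add(c)
            c = SUBCLASS_OF.get(c)
    return types

def violations():
    """Individuals whose inferred types include a pair of disjoint classes."""
    bad = []
    for individual, classes in ASSERTED.items():
        types = all_types(classes)
        if any(pair <= types for pair in DISJOINT):
            bad.append(individual)
    return bad

print(violations())  # ['specimen2']
```

The clash for `specimen2` is found only after inference: the assertion never says "Rose" directly, but `TeaRose` is a subclass of `Rose`.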

  25. What are Relations? [Figure: the concept WaterBody and the water bodies Merced River, Fletcher Creek, and Merced Lake, connected by edges labeled isA and Relation.] Concepts and relations can be represented as nodes and edges in formal graph structures, e.g., “is-a” hierarchies.

  26. Concept Systems have Nodes and may have Relations • Nodes represent concepts • Lines (arcs) represent relations • Concept systems can be represented & queried as graphs [Figure: example graph with nodes labeled A, 1, 2, a, b, c, and d.]

  27. A More Complex Concept Graph: Concept lattice of inland water features. [Figure: lattice with attribute nodes such as linear, non-linear, large linear, small linear, small non-linear, deep, shallow, natural, artificial, flowing, and stagnant, and the feature types River, Stream, Canal, Reservoir, Lake, Marsh, and Pond.] From: Paulo Santos and Brandon Bennett, “Supervaluation Semantics for an Inland Water Feature Ontology”, http://ijcai.org/papers/1187.pdf#search=%22terminology%20water%20ontology%22

  28. Types of Concept System Graph Structures • Trees • Partially Ordered Trees • Ordered Trees • Faceted Classifications • Directed Acyclic Graphs • Partially Ordered Graphs • Lattices • Bipartite Graphs • Directed Graphs • Cliques • Compound Graphs

  29. Types of Concept System Graph Structures [Figure: example drawings of a directed acyclic graph, tree, bipartite graph, partial order graph, partial order tree, clique, powerset of a 3-element set, ordered tree, compound graph, and faceted classification.]

  30. Graph Taxonomy [Figure: taxonomy of graph types relating Graph, Directed Graph, Undirected Graph, Directed Acyclic Graph, Clique, Bipartite Graph, Partial Order Graph, Faceted Classification, Lattice, Partial Order Tree, Tree, and Ordered Tree.] Note: not all bipartite graphs are undirected.
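Several of these structural classes can be tested mechanically, which is one way a registry could tell which kind of concept system it holds. The sketch below assumes a graph is given as a node list plus directed (parent, child) edge pairs, and covers three of the classes: directed acyclic graph (via Kahn's topological-sort algorithm), rooted tree, and bipartite graph (via two-coloring the underlying undirected graph). The function names and example graphs are hypothetical.

```python
from collections import deque

def is_dag(nodes, edges):
    """Kahn's algorithm: acyclic iff every node can be removed in topological order."""
    indegree = {n: 0 for n in nodes}
    for _, v in edges:
        indegree[v] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for a, b in edges:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    queue.append(b)
    return removed == len(nodes)

def is_rooted_tree(nodes, edges):
    """A DAG with exactly one root, where every other node has exactly one parent."""
    parents = {n: 0 for n in nodes}
    for _, v in edges:
        parents[v] += 1
    roots = [n for n in nodes if parents[n] == 0]
    return (is_dag(nodes, edges) and len(roots) == 1
            and all(parents[n] == 1 for n in nodes if n != roots[0]))

def is_bipartite(nodes, edges):
    """Two-color the underlying undirected graph breadth-first."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    color = {}
    for start in nodes:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False
    return True

nodes = ["Disease", "Infectious", "Chronic", "Polio", "Diabetes"]
tree_edges = [("Disease", "Infectious"), ("Disease", "Chronic"),
              ("Infectious", "Polio"), ("Chronic", "Diabetes")]
triangle = [("a", "b"), ("b", "c"), ("c", "a")]

print(is_dag(nodes, tree_edges), is_rooted_tree(nodes, tree_edges), is_bipartite(nodes, tree_edges))
# True True True  (every tree is bipartite)
print(is_dag(["a", "b", "c"], triangle), is_bipartite(["a", "b", "c"], triangle))
# False False
```

The inner loop over all edges keeps the code short at the cost of O(V·E) time; an adjacency list would make `is_dag` linear.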

  31. What Kind of Relations are There? Lots! Relationship class: A particular type of connection existing between people related to or having dealings with each other. • acquaintanceOf - A person having more than slight or superficial knowledge of this person but short of friendship. • ambivalentOf - A person towards whom this person has mixed feelings or emotions. • ancestorOf - A person who is a descendant of this person. • antagonistOf - A person who opposes and contends against this person. • apprenticeTo - A person to whom this person serves as a trusted counselor or teacher. • childOf - A person who was given birth to or nurtured and raised by this person. • closeFriendOf - A person who shares a close mutual friendship with this person. • collaboratesWith - A person who works towards a common goal with this person. • …

  32. Example of relations in a food web in an Arctic ecosystem An organism is connected to another organism for which it is a source of food energy and material by an arrow representing the direction of biomass transfer. Source: http://en.wikipedia.org/wiki/Food_web#Food_web (from SPIRE)

  33. Ontologies are a type of Concept System • Ontology: explicit formal specifications of the terms in the domain and relations among them (Gruber 1993) • An ontology defines a common vocabulary for researchers who need to share information in a domain. It includes machine-interpretable definitions of basic concepts in the domain and relations among them. • Why would someone want to develop an ontology? Some of the reasons are: • To share common understanding of the structure of information among people or software agents • To enable reuse of domain knowledge • To make domain assumptions explicit • To separate domain knowledge from the operational knowledge • To analyze domain knowledge http://www.ksl.stanford.edu/people/dlm/papers/ontology101/ontology101-noy-mcguinness.html

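  34. What is Reasoning? Inference [Figure: is-a hierarchy in which Infectious Disease and Chronic Disease are each is-a Disease; Heart disease and Diabetes are is-a Chronic Disease; Polio and Smallpox are is-a Infectious Disease; a separate arc style signifies inferred is-a relationships.]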
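The inferred links follow from the transitivity of is-a. The sketch below assumes the direct links on this slide are as read from the figure (Heart disease and Diabetes under Chronic Disease, Polio and Smallpox under Infectious Disease); it computes each concept's ancestors and reports the pairs that hold by transitivity but were never asserted directly. The names `IS_A`, `ancestors`, and `inferred_is_a` are hypothetical.

```python
# Direct is-a links (child -> parents), as read from the slide's figure.
IS_A = {
    "Infectious Disease": ["Disease"],
    "Chronic Disease": ["Disease"],
    "Heart disease": ["Chronic Disease"],
    "Diabetes": ["Chronic Disease"],
    "Polio": ["Infectious Disease"],
    "Smallpox": ["Infectious Disease"],
}

def ancestors(concept):
    """All concepts reachable upward via is-a (transitivity)."""
    result, stack = set(), list(IS_A.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in result:
            result.add(parent)
            stack.extend(IS_A.get(parent, []))
    return result

def inferred_is_a():
    """Is-a pairs implied by transitivity but not asserted directly."""
    return sorted(
        (child, anc)
        for child in IS_A
        for anc in ancestors(child)
        if anc not in IS_A[child]
    )

print(inferred_is_a())
# [('Diabetes', 'Disease'), ('Heart disease', 'Disease'), ('Polio', 'Disease'), ('Smallpox', 'Disease')]
```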

  35. Reasoning: Taxonomies & partonomies can be used to support inference queries. [Figure: partonomy in which Alameda County and Santa Clara County are part-of California; Berkeley and Oakland are part-of Alameda County; San Jose and Santa Clara are part-of Santa Clara County.] E.g., if a database contains information on events by city, we could query that database for events that happened in a particular county or state, even though the event data does not contain explicit state or county codes.
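A minimal sketch of such a query, using the part-of links from this slide and assuming hypothetical event records (`E1`, `E2`, `E3`) that store only a city name. The hypothetical helper `is_within` walks up the partonomy, so a query for a county or the state matches events recorded only by city. It assumes each place has a single parent, which holds for this example.

```python
# part-of links from the slide (each place has one parent here).
PART_OF = {
    "Berkeley": "Alameda County",
    "Oakland": "Alameda County",
    "San Jose": "Santa Clara County",
    "Santa Clara": "Santa Clara County",
    "Alameda County": "California",
    "Santa Clara County": "California",
}

# Hypothetical event records that store only a city name.
EVENTS = [("E1", "Berkeley"), ("E2", "San Jose"), ("E3", "Oakland")]

def is_within(place, region):
    """True if place equals region or is transitively part-of it."""
    while place is not None:
        if place == region:
            return True
        place = PART_OF.get(place)
    return False

def events_in(region):
    return [event for event, city in EVENTS if is_within(city, region)]

print(events_in("Alameda County"))  # ['E1', 'E3']
print(events_in("California"))      # ['E1', 'E2', 'E3']
```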

  36. Reasoning: Relationship metadata can be used to infer non-explicit data • For example: • (1) patient data on drugs currently being taken contains brand names (e.g., Tylenol, Anacin-3, Datril, …); • (2) a concept system connects different drug types and names with one another (via is-a, part-of, and other relationships); • (3) so patient data can be linked and searched by inferred terms like “acetaminophen” and “analgesic”, as well as by the trade names explicitly stored as text strings in the database. [Figure: concept hierarchy with Analgesic Agent, Non-Narcotic Analgesic, Analgesic and Antipyretic, Nonsteroidal Antiinflammatory Drug, Acetaminophen, Datril, Tylenol, and Anacin-3.]
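Step (3) can be sketched as query expansion: expand the search concept downward to every narrower term, brand names included, then match the text strings stored in the patient records. The hierarchy below is an assumed, simplified one for illustration (it is not copied from the NCI Thesaurus), and the patient records are hypothetical.

```python
from collections import defaultdict

# Assumed, simplified concept system (term -> broader term); illustrative only.
BROADER = {
    "Tylenol": "Acetaminophen",
    "Datril": "Acetaminophen",
    "Anacin-3": "Acetaminophen",
    "Acetaminophen": "Analgesic and Antipyretic",
    "Analgesic and Antipyretic": "Non-Narcotic Analgesic",
    "Non-Narcotic Analgesic": "Analgesic Agent",
}

# Hypothetical patient records storing only the brand name as text.
PATIENT_DRUGS = [("P1", "Tylenol"), ("P2", "Datril"), ("P3", "Vitamin C")]

# Invert the hierarchy: concept -> narrower concepts.
NARROWER = defaultdict(list)
for term, parent in BROADER.items():
    NARROWER[parent].append(term)

def expand(concept):
    """Query expansion: the concept plus all narrower terms, down to brand names."""
    terms, stack = set(), [concept]
    while stack:
        t = stack.pop()
        if t not in terms:
            terms.add(t)
            stack.extend(NARROWER.get(t, []))
    return terms

def patients_taking(concept):
    terms = expand(concept)
    return [p for p, drug in PATIENT_DRUGS if drug in terms]

print(patients_taking("Analgesic Agent"))  # ['P1', 'P2']
```

Expanding the query once and then filtering keeps the database untouched: the records still hold only the trade names.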

  37. Reasoning: Least Common Ancestor Query. What is the least common ancestor concept in the NCI Thesaurus for Acetaminophen and Morphine Sulfate? (Answer: Analgesic Agent) [Figure: concept hierarchy with Analgesic Agent, Opioid, Opiate, Morphine Sulfate, Codeine Phosphate, Non-Narcotic Analgesic, Analgesic and Antipyretic, Nonsteroidal Antiinflammatory Drug, and Acetaminophen.]
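In a concept graph where a concept may have several parents, a least common ancestor is a shared ancestor that is not itself an ancestor of another shared ancestor, and there can be more than one. The sketch below assumes a parent structure read from the concepts on this slide; it is an illustration, not the actual NCI Thesaurus hierarchy.

```python
# Assumed reading of the slide's concepts (child -> parents); not the NCI Thesaurus itself.
PARENTS = {
    "Opioid": ["Analgesic Agent"],
    "Opiate": ["Opioid"],
    "Morphine Sulfate": ["Opiate"],
    "Codeine Phosphate": ["Opiate"],
    "Non-Narcotic Analgesic": ["Analgesic Agent"],
    "Analgesic and Antipyretic": ["Non-Narcotic Analgesic"],
    "Nonsteroidal Antiinflammatory Drug": ["Non-Narcotic Analgesic"],
    "Acetaminophen": ["Analgesic and Antipyretic"],
}

def ancestors_or_self(concept):
    result, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in result:
            result.add(c)
            stack.extend(PARENTS.get(c, []))
    return result

def least_common_ancestors(a, b):
    """Common ancestors that are not an ancestor of any other common ancestor."""
    common = ancestors_or_self(a) & ancestors_or_self(b)
    return {c for c in common
            if not any(c != d and c in ancestors_or_self(d) for d in common)}

print(least_common_ancestors("Acetaminophen", "Morphine Sulfate"))  # {'Analgesic Agent'}
```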

  38. Reasoning: Example “sibling” queries: concepts that share a common ancestor • Environmental: • "siblings" of Wetland (in NASA SWEET ontology) • Health • Siblings of ERK1 finds all 700+ other kinase enzymes • Siblings of Novastatin finds all other statins • 11179 Metadata • Sibling values in an enumerated value domain
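A sibling query needs only the direct parent links: two concepts are siblings when they share at least one parent. The sketch below reuses the disease concepts from the inference example earlier in this deck (with the parent links as read from that figure) rather than the ontologies named on this slide; `siblings` is a hypothetical helper name.

```python
# Is-a links (child -> parents) from the disease example earlier in the deck.
PARENTS = {
    "Infectious Disease": ["Disease"],
    "Chronic Disease": ["Disease"],
    "Heart disease": ["Chronic Disease"],
    "Diabetes": ["Chronic Disease"],
    "Polio": ["Infectious Disease"],
    "Smallpox": ["Infectious Disease"],
}

def siblings(concept):
    """Concepts that share at least one direct parent with `concept`."""
    mine = set(PARENTS.get(concept, []))
    return sorted(c for c, ps in PARENTS.items()
                  if c != concept and mine & set(ps))

print(siblings("Polio"))            # ['Smallpox']
print(siblings("Chronic Disease"))  # ['Infectious Disease']
```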

  39. Reasoning: More complex “sibling” queries: concepts with multiple ancestors • Health • Find all the siblings of Breast Neoplasm • Environmental • Find all chemicals that are: • a carcinogen (cause cancer) and • a toxin (are poisonous) and • teratogenic (cause birth defects) [Figure: Breast Neoplasm with two parents, site neoplasms and breast disorders; other concepts shown are Eye Neoplasm, Respiratory System Neoplasm, and Non-Neoplastic Breast Disorder.]
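With multiple ancestors, the two queries on this slide combine parent sets in different ways: siblings of Breast Neoplasm are gathered separately under each of its parents (OR), while the chemical query requires membership in all three categories (AND). The sketch below assumes that Eye Neoplasm and Respiratory System Neoplasm sit under site neoplasms and Non-Neoplastic Breast Disorder under breast disorders, and uses placeholder chemicals (`chemical_A` etc.) with made-up classifications, not real toxicology data.

```python
# Assumed parent links for the breast-neoplasm example (child -> parents).
BREAST_PARENTS = {
    "Breast Neoplasm": ["Site Neoplasm", "Breast Disorder"],
    "Eye Neoplasm": ["Site Neoplasm"],
    "Respiratory System Neoplasm": ["Site Neoplasm"],
    "Non-Neoplastic Breast Disorder": ["Breast Disorder"],
}

# Hypothetical classification of placeholder chemicals (not real toxicology data).
CLASSIFIED_AS = {
    "chemical_A": {"carcinogen", "toxin", "teratogen"},
    "chemical_B": {"carcinogen", "toxin"},
    "chemical_C": {"toxin", "teratogen"},
}

def siblings_by_parent(concept, parents_of):
    """Group a concept's siblings under each of its (possibly several) parents."""
    return {
        parent: sorted(c for c, ps in parents_of.items() if c != concept and parent in ps)
        for parent in parents_of.get(concept, [])
    }

def members_of_all(*categories):
    """Items that belong to every listed category (logical AND)."""
    required = set(categories)
    return sorted(c for c, cats in CLASSIFIED_AS.items() if required <= cats)

print(siblings_by_parent("Breast Neoplasm", BREAST_PARENTS))
# {'Site Neoplasm': ['Eye Neoplasm', 'Respiratory System Neoplasm'], 'Breast Disorder': ['Non-Neoplastic Breast Disorder']}
print(members_of_all("carcinogen", "toxin", "teratogen"))  # ['chemical_A']
```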

  40. End of tutorial about concept systems. Where does ISO/IEC 11179 fit?

  41. Data Generation and Use: Cost vs. Coordination [Figure: cost ($) of data creation plotted against reporting coordination, from autonomous through community of interest to full control.]

  42. Data Generation and Use: Cost vs. Coordination [Figure: the cost ($) of both data creation and data use plotted against reporting coordination, from autonomous through community of interest to full control.]

  43. ISO/IEC 11179 Metadata Registries Reduce Cost of Data Creation and Use [Figure: the cost ($) of data creation and data use plotted against reporting coordination, from autonomous through community of interest to full control.]

  44. Metadata Registries Increase the Benefit from Data (Strategic Effectiveness) [Figure: benefit plotted against reporting coordination (autonomous, community of interest, full control), with a curve labeled MDR.]

  45. What Can ISO/IEC 11179 MDR Do? Traditional Data Management (11179 Edition 2) • Register metadata which describes data—in databases, applications, XML Schemas, data models, flat files, paper • Assist in harmonizing, standardizing, and vetting metadata • Assist data engineering • Provide a source of well formed data designs for system designers • Record reporting requirements • Assist data generation, by describing the meaning of data entry fields and the potential valid values • Register provenance information that can be provided to end users of data • Assist with information discovery by pointing to systems where particular data is maintained.

  46. Traditional MDR: Manage Code Sets. [Figure: a Data Element Concept record (Name: Country Identifiers; Unique ID: 5769; fields for Context, Definition, Conceptual Domain, Maintenance Org., Steward, Classification, Registration Authority, and others) with the values Algeria, Belgium, China, Denmark, Egypt, France, …, Zimbabwe, linked to several Data Elements, each described by its own record (e.g., Unique ID: 4572, with fields for Context, Definition, Value Domain, Maintenance Org., Steward, Classification, Registration Authority, and others).] Data Elements and their value domains: • ISO 3166 English Name: Algeria, Belgium, China, Denmark, Egypt, France, …, Zimbabwe • ISO 3166 French Name: L'Algérie, Belgique, Chine, Danemark, Egypte, La France, …, Zimbabwe • ISO 3166 2-Alpha Code: DZ, BE, CN, DK, EG, FR, …, ZW • ISO 3166 3-Alpha Code: DZA, BEL, CHN, DNK, EGY, FRA, …, ZWE • ISO 3166 3-Numeric Code: 012, 056, 156, 208, 818, 250, …, 716
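Because all five data elements share one data element concept, a value can be translated from one value domain to another by going through the shared concept (here, a row). The sketch below uses the sample values shown on this slide; the domain labels (`alpha2`, `numeric`, etc.) and the function `translate` are hypothetical, and a real registry would hold the full ISO 3166 code lists.

```python
# Sample rows from the slide's ISO 3166 value domains (subset only).
COUNTRIES = [
    # English name, French name, 2-alpha, 3-alpha, 3-numeric
    ("Algeria", "L'Algérie", "DZ", "DZA", "012"),
    ("Belgium", "Belgique", "BE", "BEL", "056"),
    ("China", "Chine", "CN", "CHN", "156"),
    ("Denmark", "Danemark", "DK", "DNK", "208"),
    ("Egypt", "Egypte", "EG", "EGY", "818"),
    ("France", "La France", "FR", "FRA", "250"),
    ("Zimbabwe", "Zimbabwe", "ZW", "ZWE", "716"),
]
DOMAINS = ["english", "french", "alpha2", "alpha3", "numeric"]  # hypothetical labels

def translate(value, source, target):
    """Map a value from one value domain to another via the shared concept (row)."""
    s, t = DOMAINS.index(source), DOMAINS.index(target)
    for row in COUNTRIES:
        if row[s] == value:
            return row[t]
    raise KeyError(f"{value!r} not found in {source} value domain")

print(translate("DZA", "alpha3", "numeric"))  # 012
print(translate("250", "numeric", "french"))  # La France
```

Numeric codes are kept as strings so that leading zeros such as "012" survive, since they are identifiers rather than quantities.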

  47. What Can XMDR Do? Support a new generation of semantic computing • Concept system management • Harmonizing and vetting concept systems • Linkage of concept systems to data • Interrelation of multiple concept systems • Grounding ontologies and RDF in agreed upon semantics • Reasoning across XMDR content • Provision of Semantic Services

  48. Coming: A Semantic Revolution • Searching and ranking • Pattern analysis • Knowledge discovery • Question answering • Reasoning • Semi-automated decision making [Figure: reporting coordination axis from autonomous through community of interest to full control.]

  49. We are trying to manage semantics in an increasingly complex content space: structured data, semi-structured data, unstructured data, text, pictographic content, graphics, multimedia, voice, and video.

  50. 11179-3 (E3) Increases MDR Benefit. When communities create information according to a common vocabulary, the value of the resulting information increases dramatically. [Figure: benefit plotted against reporting coordination (autonomous, community of interest, full control), with a curve labeled MDR.]
