eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration Berkeley, California October 24, 2006 Bruce Bargmeyer, Lawrence Berkeley National Laboratory University of California Tel: +1 510-495-2905 bebargmeyer@lbl.gov
Topics • Challenges to address • A brief tutorial on Semantics and semantic computing • where XMDR fits • Semantic computing technologies • Traditional Data Administration • XMDR project • Test Bed demonstrations
The Internet Revolution A world wide web of diverse content: The information glut is nothing new. The access to it is astonishing.
Challenge: Find and process non-explicit data • For example… patient data on drugs contains brand names (e.g. Tylenol, Anacin-3, Datril, …); however, we want to study patients taking analgesic agents [Diagram concepts: Analgesic Agent, Non-Narcotic Analgesic, Analgesic and Antipyretic, Nonsteroidal Antiinflammatory Drug, Acetaminophen, Datril, Tylenol, Anacin-3.]
Challenge: Specify and compute across Relations, e.g., within a food web in an Arctic ecosystem An organism is connected to another organism for which it is a source of food energy and material by an arrow representing the direction of biomass transfer. Source: http://en.wikipedia.org/wiki/Food_web#Food_web (from SPIRE)
Challenge: Combine Data, Metadata & Concept Systems • Inference Search Query: “find water bodies downstream from Fletcher Creek where chemical contamination was over 10 micrograms per liter between December 2001 and March 2003” [Diagram panels: Concept system, Data, Metadata. The concept system shows Contamination with the kinds Biological, Radioactive, and Chemical, and the chemicals mercury, lead, and cadmium.]
Challenge: Use data from systems that record the same facts with different terms [Diagram: registries and repositories that overlap in Common Content: Dublin Core Registries, Software Component Registries, Database Catalogs, ISO 11179 Registries, UDDI Registries, OASIS/ebXML Registries, CASE Tool Repositories, Ontological Registries. Labels used across these systems: Table Column, Data Element, Business Specification, Country Identifier, XML Tag, Attribute, Business Object, Coverage, Term Hierarchy.]
Challenge: Draw information together from a broad range of studies, databases, reports, etc.
Challenge: Gain Common Understanding of meaning between Data Creators and Data Users • A common interpretation of what the data represents [Diagram: data creation, information systems, and users at EEA, USGS, DoD, EPA, and others exchange text and numeric data. Subject terms appear in English (environ, agriculture, climate, human health, industry, tourism, soil, water, air) and in Spanish (ambiente, agricultura, tiempo, salud humano, industria, turismo, tierra, agua, aire).]
Semantic Computing and XMDR • We are laying the foundation to make a quantum leap toward a substantially new way of computing: Semantic Computing • How can we make use of semantic computing for the environment and health? • What do environmental agencies need to do to prepare for and stimulate semantic computing? • What are the ecoinformatics challenges?
Coming: A Semantic Revolution • Searching and ranking • Pattern analysis • Knowledge discovery • Question answering • Reasoning • Semi-automated decision making
The Nub of It • Processing that takes “meaning” into account • Processing based on the relations between things not just computing about the things themselves. • Processing that takes people out of the processing, reducing the human toil • Data access, extraction, mapping, translation, formatting, validation, inferencing, … • Delivering higher-level results that are more helpful for the user’s thought and action
A Brief Tutorial on Semantics • What is meaning? • What are concepts? • What are relations? • What are concept systems? • What is “reasoning”?
Meaning: The Semiotic Triangle [Diagram: a Thought or Reference (Concept) "refers to" a Referent; a Symbol (“Rose”, “ClipArt”) "symbolises" the Thought or Reference and "stands for" the Referent.] C. K. Ogden and I. A. Richards. The Meaning of Meaning.
Semiotic Triangle: Concepts, Definitions and Signs [Diagram: the semiotic triangle (CONCEPT, Referent, and the symbols “Rose”, “ClipArt”) with the labels Definition and Sign added.]
Forms of Definitions • Definition - Define by: • Essence & Differentia • Relations • Axioms [Diagram: the semiotic triangle with the labels Definition and Sign.]
Definition of Concept - Rose: Dictionary - Essence & Differentia • 1. any of the wild or cultivated, usually prickly-stemmed, pinnate-leaved, showy-flowered shrubs of the genus Rosa. Cf. rose family. • 2. any of various related or similar plants. • 3. the flower of any such shrub, of a red, pink, white, or yellow color. --Random House Webster’s Unabridged Dictionary (2003)
Definitions in the EPA Environmental Data Registry • Mailing Address: http://www.epa/gov/edr/sw/AdministeredItem#MailingAddress The exact address where a mail piece is intended to be delivered, including urban-style address, rural route, and PO Box • State USPS Code: http://www.epa/gov/edr/sw/AdministeredItem#StateUSPSCode The U.S. Postal Service (USPS) abbreviation that represents a state or state equivalent for the U.S. or Canada • Mailing Address State Name: http://www.epa/gov/edr/sw/AdministeredItem#StateName The name of the state where mail is delivered
Definition of Concept - Rose: Relations to Other Concepts [Diagram: the concept Rose in the semiotic triangle, related to the concepts Love, Romance, and Marriage.]
Definition of Concept - Rose: Defined by Axioms in OWL • rdfs:subClassOf • owl:equivalentClass • owl:disjointWith [Diagram: the concept Rose in the semiotic triangle.]
Class Axiom (Definitions): Class Description is Building Block of Class Axiom • A class description is the term used in this document (and in the OWL Semantics and Abstract Syntax) for the basic building blocks of class axioms (informally called class definitions in the Overview and Guide documents). A class description describes an OWL class, either by a class name or by specifying the class extension of an unnamed anonymous class. • OWL distinguishes six types of class descriptions: 1. a class identifier (a URI reference) 2. an exhaustive enumeration of individuals that together form the instances of a class 3. a property restriction 4. the intersection of two or more class descriptions 5. the union of two or more class descriptions 6. the complement of a class description • The first type is special in the sense that it describes a class through a class name (syntactically represented as a URI reference). The other five types of class descriptions describe an anonymous class by placing constraints on the class extension. • Class descriptions of type 2-6 describe, respectively, a class that contains exactly the enumerated individuals (2nd type), a class of all individuals which satisfy a particular property restriction (3rd type), or a class that satisfies boolean combinations of class descriptions (4th, 5th and 6th type). Intersection, union and complement can be respectively seen as the logical AND, OR and NOT operators. The four latter types of class descriptions lead to nested class descriptions and can thus in theory lead to arbitrarily complex class descriptions. In practice, the level of nesting is usually limited.
Class Descriptions -> Class Axiom • Class descriptions form the building blocks for defining classes through class axioms. The simplest form of a class axiom is a class description of type 1. It just states the existence of a class, using owl:Class with a class identifier. • For example, the following class axiom declares the URI reference #Human to be the name of an OWL class: • <owl:Class rdf:ID="Human"/> • This is correct OWL, but does not tell us very much about the class Human. Class axioms typically contain additional components that state necessary and/or sufficient characteristics of a class. OWL contains three language constructs for combining class descriptions into class axioms: • rdfs:subClassOf allows one to say that the class extension of a class description is a subset of the class extension of another class description. • owl:equivalentClass allows one to say that a class description has exactly the same class extension as another class description. • owl:disjointWith allows one to say that the class extension of a class description has no members in common with the class extension of another class description.
Computable Meaning • rdfs:subClassOf • owl:equivalentClass • owl:disjointWith [Diagram: the concept Rose in the semiotic triangle.] If “rose” is owl:disjointWith “daffodil”, then a computer can determine that an assertion is invalid if it states that a rose is also a daffodil (e.g., in a knowledgebase).
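The disjointness check on this slide can be sketched in a few lines of Python. This is an illustration only, not how an OWL reasoner is implemented: the individuals `flower1` and `flower2`, the helper `find_violations`, and the tiny knowledge base are all hypothetical.

```python
# Hypothetical mini knowledge base: pairs of classes declared disjoint,
# mirroring "rose owl:disjointWith daffodil" from the slide.
disjoint_pairs = {frozenset({"Rose", "Daffodil"})}


def find_violations(assertions, disjoint_pairs):
    """Return individuals asserted to be members of two disjoint classes."""
    classes_by_individual = {}
    for individual, cls in assertions:
        classes_by_individual.setdefault(individual, set()).add(cls)

    violations = []
    for individual, classes in classes_by_individual.items():
        for pair in disjoint_pairs:
            if pair <= classes:  # both disjoint classes were asserted
                violations.append((individual, tuple(sorted(pair))))
    return violations


# Made-up type assertions: flower1 is claimed to be both a rose and a daffodil.
assertions = [
    ("flower1", "Rose"),
    ("flower1", "Daffodil"),
    ("flower2", "Rose"),
]
violations = find_violations(assertions, disjoint_pairs)
print(violations)
```

Only `flower1` is reported, because it is the only individual asserted to belong to both disjoint classes.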
What are Relations? • Concepts and relations can be represented as nodes and edges in formal graph structures, e.g., “is-a” hierarchies. [Diagram: the nodes WaterBody, Merced River, Fletcher Creek, and Merced Lake, connected by isA edges and one further edge labeled Relation.]
Concept Systems have Nodes and may have Relations • Nodes represent concepts • Lines (arcs) represent relations • Concept systems can be represented & queried as graphs [Diagram: an example graph with nodes A, 1, 2, a, b, c, d.]
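As a rough sketch of "represented & queried as graphs", a concept system can be stored as labeled edges and queried in either direction. The class `ConceptGraph` and the specific isA edges below are assumptions made for illustration, using the water body names from the previous slide.

```python
from collections import defaultdict


class ConceptGraph:
    """Concepts are nodes; each relation is a labeled, directed edge."""

    def __init__(self):
        # (source concept, relation label) -> set of target concepts
        self._out = defaultdict(set)

    def add(self, source, relation, target):
        self._out[(source, relation)].add(target)

    def targets(self, source, relation):
        """Concepts reached from `source` by following `relation`."""
        return sorted(self._out.get((source, relation), ()))

    def sources(self, relation, target):
        """Concepts that point at `target` through `relation`."""
        return sorted(
            s for (s, r), ts in self._out.items()
            if r == relation and target in ts
        )


graph = ConceptGraph()
# Assumed edges for illustration.
graph.add("Merced River", "isA", "WaterBody")
graph.add("Fletcher Creek", "isA", "WaterBody")
graph.add("Merced Lake", "isA", "WaterBody")

water_bodies = graph.sources("isA", "WaterBody")
print(water_bodies)
```

Keying the edges by (source, relation) keeps different relation types separate, so the same structure can hold isA, part-of, or any other relation.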
A More Complex Concept Graph: Concept lattice of inland water features [Diagram: a lattice whose attribute nodes are Linear, Non-linear, Large linear, Large non-linear, Small linear, Small non-linear, Deep, Shallow, Natural, Artificial, Flowing, and Stagnant, and whose feature nodes are River, Stream, Canal, Reservoir, Lake, Marsh, and Pond.] From Supervaluation Semantics for an Inland Water Feature Ontology, Paulo Santos and Brandon Bennett http://ijcai.org/papers/1187.pdf#search=%22terminology%20water%20ontology%22
Types of Concept System Graph Structures • Trees • Partially Ordered Trees • Ordered Trees • Faceted Classifications • Directed Acyclic Graphs • Partially Ordered Graphs • Lattices • Bipartite Graphs • Directed Graphs • Cliques • Compound Graphs
Types of Concept System Graph Structures [Figure panels: Tree, Ordered Tree, Partial Order Tree, Faceted Classification, Directed Acyclic Graph, Partial Order Graph, Bipartite Graph, Clique, Compound Graph, Powerset of 3 element set.]
Graph Taxonomy [Diagram nodes: Graph, Directed Graph, Undirected Graph, Directed Acyclic Graph, Clique, Bipartite Graph, Partial Order Graph, Faceted Classification, Lattice, Partial Order Tree, Tree, Ordered Tree.] Note: not all bipartite graphs are undirected.
What Kind of Relations are There? Lots! Relationship class: A particular type of connection existing between people related to or having dealings with each other. • acquaintanceOf - A person having more than slight or superficial knowledge of this person but short of friendship. • ambivalentOf - A person towards whom this person has mixed feelings or emotions. • ancestorOf - A person who is a descendant of this person. • antagonistOf - A person who opposes and contends against this person. • apprenticeTo - A person to whom this person serves as a trusted counselor or teacher. • childOf - A person who was given birth to or nurtured and raised by this person. • closeFriendOf - A person who shares a close mutual friendship with this person. • collaboratesWith - A person who works towards a common goal with this person. • …
Example of relations in a food web in an Arctic ecosystem An organism is connected to another organism for which it is a source of food energy and material by an arrow representing the direction of biomass transfer. Source: http://en.wikipedia.org/wiki/Food_web#Food_web (from SPIRE)
Ontologies are a type of Concept System • Ontology: explicit formal specifications of the terms in the domain and relations among them (Gruber 1993) • An ontology defines a common vocabulary for researchers who need to share information in a domain. It includes machine-interpretable definitions of basic concepts in the domain and relations among them. • Why would someone want to develop an ontology? Some of the reasons are: • To share common understanding of the structure of information among people or software agents • To enable reuse of domain knowledge • To make domain assumptions explicit • To separate domain knowledge from the operational knowledge • To analyze domain knowledge http://www.ksl.stanford.edu/people/dlm/papers/ontology101/ontology101-noy-mcguinness.html
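What is Reasoning? Inference [Diagram: Infectious Disease is-a Disease; Chronic Disease is-a Disease; Polio is-a Infectious Disease; Smallpox is-a Infectious Disease; Heart disease is-a Chronic Disease; Diabetes is-a Chronic Disease. A separate arrow style signifies an inferred is-a relationship, e.g., Polio is-a Disease.]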
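Inferring is-a relationships amounts to following the asserted is-a links upward. A minimal sketch, assuming the disease hierarchy the diagram appears to show; the function names are made up for this example.

```python
# Asserted (direct) is-a links, read off the diagram: child -> parent.
is_a = {
    "Infectious Disease": "Disease",
    "Chronic Disease": "Disease",
    "Polio": "Infectious Disease",
    "Smallpox": "Infectious Disease",
    "Heart disease": "Chronic Disease",
    "Diabetes": "Chronic Disease",
}


def ancestors(concept, parent_of):
    """All concepts reachable by repeatedly following is-a, nearest first."""
    found = []
    while concept in parent_of:
        concept = parent_of[concept]
        found.append(concept)
    return found


def inferred_is_a(concept, parent_of):
    """Ancestors that were not asserted directly, i.e. the inferred links."""
    return ancestors(concept, parent_of)[1:]


print(ancestors("Polio", is_a))
print(inferred_is_a("Polio", is_a))
```

The single-parent dictionary is enough for a tree like this one; a concept system with multiple parents per concept would need a set of parents and a graph traversal instead.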
Reasoning: Taxonomies & partonomies can be used to support inference queries • E.g., if a database contains information on events by city, we could query that database for events that happened in a particular county or state, even though the event data does not contain explicit state or county codes. [Diagram: Alameda County and Santa Clara County are part-of California; Berkeley and Oakland are part-of Alameda County; San Jose and Santa Clara are part-of Santa Clara County.]
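The county and state query described on this slide can be sketched as follows. The part-of links come from the diagram; the event records and the helper names `is_within` and `events_in` are invented for illustration.

```python
# part-of links from the diagram: place -> the region that contains it.
part_of = {
    "Berkeley": "Alameda County",
    "Oakland": "Alameda County",
    "San Jose": "Santa Clara County",
    "Santa Clara": "Santa Clara County",
    "Alameda County": "California",
    "Santa Clara County": "California",
}

# Hypothetical event table that records only the city.
events = [("E1", "Berkeley"), ("E2", "San Jose"), ("E3", "Oakland")]


def is_within(place, region, part_of):
    """True if `place` equals `region` or lies inside it via part-of links."""
    while place is not None:
        if place == region:
            return True
        place = part_of.get(place)
    return False


def events_in(region, events, part_of):
    """Events whose city falls inside `region`, though no region is stored."""
    return [event_id for event_id, city in events
            if is_within(city, region, part_of)]


print(events_in("Alameda County", events, part_of))
print(events_in("California", events, part_of))
```

The event data never mentions a county or state; the partonomy supplies that information at query time.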
Reasoning: Relationship metadata can be used to infer non-explicit data • For example… • (1) patient data on drugs currently being taken contains brand names (e.g. Tylenol, Anacin-3, Datril, …); • (2) a concept system connects different drug types and names with one another (via is-a, part-of, etc. relationships); • (3) so… patient data can be linked and searched by inferred terms like “acetaminophen” and “analgesic” as well as trade names explicitly stored as text strings in the database [Diagram concepts: Analgesic Agent, Non-Narcotic Analgesic, Analgesic and Antipyretic, Nonsteroidal Antiinflammatory Drug, Acetaminophen, Datril, Tylenol, Anacin-3.]
Reasoning: Least Common Ancestor Query • What is the least common ancestor concept in the NCI Thesaurus for Acetaminophen and Morphine Sulfate? (answer = Analgesic Agent) [Diagram concepts: Analgesic Agent, Opioid, Opiate, Morphine Sulfate, Codeine Phosphate, Non-Narcotic Analgesic, Analgesic and Antipyretic, Nonsteroidal Antiinflammatory Drug, Acetaminophen.]
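A least common ancestor query can be sketched with a breadth-first walk up the is-a links. The parent links below are a simplified assumption built from the concept names on the slide; they are not taken from the NCI Thesaurus itself, and the function names are hypothetical.

```python
from collections import deque

# Assumed, simplified is-a links: concept -> list of parent concepts.
parents = {
    "Acetaminophen": ["Analgesic and Antipyretic"],
    "Analgesic and Antipyretic": ["Non-Narcotic Analgesic"],
    "Nonsteroidal Antiinflammatory Drug": ["Non-Narcotic Analgesic"],
    "Non-Narcotic Analgesic": ["Analgesic Agent"],
    "Morphine Sulfate": ["Opiate"],
    "Codeine Phosphate": ["Opiate"],
    "Opiate": ["Opioid"],
    "Opioid": ["Analgesic Agent"],
}


def ancestor_distances(concept, parents):
    """Map each ancestor (and the concept itself, at 0) to its distance."""
    distance = {concept: 0}
    queue = deque([concept])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in distance:
                distance[parent] = distance[node] + 1
                queue.append(parent)
    return distance


def least_common_ancestor(a, b, parents):
    """Closest shared ancestor by total distance; None if there is none."""
    dist_a = ancestor_distances(a, parents)
    dist_b = ancestor_distances(b, parents)
    common = set(dist_a) & set(dist_b)
    if not common:
        return None
    # Ties are broken alphabetically so the result is deterministic.
    return min(common, key=lambda c: (dist_a[c] + dist_b[c], c))


print(least_common_ancestor("Acetaminophen", "Morphine Sulfate", parents))
```

Because each concept maps to a list of parents, the same code works when a concept has several parents, as is common in thesauri.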
Reasoning: Example “sibling” queries: concepts that share a common ancestor • Environmental: • "siblings" of Wetland (in NASA SWEET ontology) • Health • Siblings of ERK1 finds all 700+ other kinase enzymes • Siblings of Lovastatin finds all other statins • 11179 Metadata • Sibling values in an enumerated value domain
Reasoning: More complex “sibling” queries: concepts with multiple ancestors • Health • Find all the siblings of Breast Neoplasm • Environmental • Find all chemicals that are a • carcinogen (cause cancer) and • toxin (are poisonous) and • teratogenic (cause birth defects) [Diagram: Breast neoplasm has two parents, site neoplasms and breast disorders; its siblings are Eye neoplasm and Respiratory System neoplasm under site neoplasms, and Non-Neoplastic Breast Disorder under breast disorders.]
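A sibling query over a concept with several parents can be sketched as a set intersection. The parent sets below are an assumed reading of the diagram, and the concept labels are normalized for the example; `siblings` is a hypothetical helper.

```python
# Assumed parent sets, read off the diagram: concept -> set of parents.
parents = {
    "Breast Neoplasm": {"Site Neoplasms", "Breast Disorders"},
    "Eye Neoplasm": {"Site Neoplasms"},
    "Respiratory System Neoplasm": {"Site Neoplasms"},
    "Non-Neoplastic Breast Disorder": {"Breast Disorders"},
}


def siblings(concept, parents):
    """Concepts that share at least one parent with `concept`."""
    own_parents = parents.get(concept, set())
    return sorted(
        other for other, other_parents in parents.items()
        if other != concept and other_parents & own_parents
    )


print(siblings("Breast Neoplasm", parents))
print(siblings("Eye Neoplasm", parents))
```

Breast Neoplasm picks up siblings from both of its parents, while Eye Neoplasm only sees the concepts under Site Neoplasms.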
End of Tutorial about concept systems Where does ISO/IEC 11179 fit?
Data Generation and Use: Cost vs. Coordination [Chart: cost ($) of Data Creation plotted against degree of Coordination, with the coordination levels Autonomous, Reporting, Community of Interest, and Full Control.]
Data Generation and Use: Cost vs. Coordination [Chart: the same axes, with a Data Use cost curve added to the Data Creation curve.]
ISO/IEC 11179 Metadata Registries Reduce Cost of Data Creation and Use [Chart: cost ($) of Data Creation and Data Use plotted against degree of Coordination, with the coordination levels Autonomous, Reporting, Community of Interest, and Full Control.]
Metadata Registries Increase the Benefit from Data (Strategic Effectiveness) [Chart: Benefit plotted against the coordination levels Autonomous, Reporting, Community of Interest, and Full Control, with an MDR curve.]
What Can ISO/IEC 11179 MDR Do? Traditional Data Management (11179 Edition 2) • Register metadata which describes data—in databases, applications, XML Schemas, data models, flat files, paper • Assist in harmonizing, standardizing, and vetting metadata • Assist data engineering • Provide a source of well formed data designs for system designers • Record reporting requirements • Assist data generation, by describing the meaning of data entry fields and the potential valid values • Register provenance information that can be provided to end users of data • Assist with information discovery by pointing to systems where particular data is maintained.
Traditional MDR: Manage Code Sets
Data Element Concept: Name: Country Identifiers; Context: ; Definition: ; Unique ID: 5769; Conceptual Domain: ; Maintenance Org.: ; Steward: ; Classification: ; Registration Authority: ; Others
Data Elements: Name: ; Context: ; Definition: ; Unique ID: 4572; Value Domain: ; Maintenance Org.: ; Steward: ; Classification: ; Registration Authority: ; Others
ISO 3166 English Name   ISO 3166 French Name   ISO 3166 2-Alpha Code   ISO 3166 3-Alpha Code   ISO 3166 3-Numeric Code
Algeria                 L'Algérie              DZ                      DZA                     012
Belgium                 Belgique               BE                      BEL                     056
China                   Chine                  CN                      CHN                     156
Denmark                 Danemark               DK                      DNK                     208
Egypt                   Egypte                 EG                      EGY                     818
France                  La France              FR                      FRA                     250
. . .                   . . .                  . . .                   . . .                   . . .
Zimbabwe                Zimbabwe               ZW                      ZWE                     716
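One way to picture a managed code set is as a table in which each column is a value domain for the same conceptual domain. The sketch below uses the country rows shown on the slide; the column keys and the `translate` helper are assumptions for illustration and are not part of ISO/IEC 11179.

```python
# Column keys are made-up identifiers for the five value domains on the slide.
COLUMNS = ("english_name", "french_name", "alpha2", "alpha3", "numeric")

# Rows copied from the slide (the slide elides the countries in between).
ROWS = [
    ("Algeria", "L'Algérie", "DZ", "DZA", "012"),
    ("Belgium", "Belgique", "BE", "BEL", "056"),
    ("China", "Chine", "CN", "CHN", "156"),
    ("Denmark", "Danemark", "DK", "DNK", "208"),
    ("Egypt", "Egypte", "EG", "EGY", "818"),
    ("France", "La France", "FR", "FRA", "250"),
    ("Zimbabwe", "Zimbabwe", "ZW", "ZWE", "716"),
]


def translate(value, from_column, to_column):
    """Map a permissible value in one value domain to its equivalent in another."""
    source = COLUMNS.index(from_column)
    target = COLUMNS.index(to_column)
    for row in ROWS:
        if row[source] == value:
            return row[target]
    return None  # not a permissible value in the source domain


print(translate("DZ", "alpha2", "numeric"))
print(translate("Denmark", "english_name", "alpha3"))
```

Numeric codes are kept as strings so that leading zeros such as "012" survive.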
What Can XMDR Do? Support a new generation of semantic computing • Concept system management • Harmonizing and vetting concept systems • Linkage of concept systems to data • Interrelation of multiple concept systems • Grounding ontologies and RDF in agreed upon semantics • Reasoning across XMDR content • Provision of Semantic Services
Coming: A Semantic Revolution • Searching and ranking • Pattern analysis • Knowledge discovery • Question answering • Reasoning • Semi-automated decision making [Chart: the coordination levels Autonomous, Reporting, Community of Interest, and Full Control.]
We are trying to manage semantics in an increasingly complex content space • Structured data • Semi-structured data • Unstructured data • Text • Pictographic • Graphics • Multimedia • Voice • Video
11179-3 (E3) Increases MDR Benefit • When communities create information according to a common vocabulary, the value of the resulting information increases dramatically. [Chart: Benefit plotted against the coordination levels Autonomous, Reporting, Community of Interest, and Full Control, with an MDR curve.]