Ontologies in Ecology and Biodiversity Informatics Dave Thau With some slides by Shawn Bowers and Josh Madin gratefully reused with permission Dave Thau PASI, Costa Rica, June 7, 2008
Four Chapters
• What are ontologies and why should we care?
• Some nitty gritty
• Ontologies in ecology and biodiversity informatics
• Tools
Talk Goals
• Learn about ontology successes
• Learn basic terminology / buzz words
• Get a sense for ontology development
• See how they apply to ecology and biodiversity
• Learn what remains to be done
  • Bottom line: A LOT!
Ontology Defined
[Diagram: nodes Acropora, Ocean, Trapeziid Crab, and Pincer; edges labeled "lives in" (twice) and "has part"]
The Way It’s Been
[Image: notebook]
The Plan
How are the finches doing these days?
1. Find data sets: “give me all data sets describing finch abundance”
2. Find analysis: “find a way to plot their distribution”
3. Integrate data, plug it in, get results
[Diagram labels: Finch Fancy Repository, World Finch Database, Finches R’ Us, Plotter Workflow]
Where Ontologies Can Help
• Finding the right data sets
• Integrating the data
• Finding a good analysis and making sure the data fits the analysis
• Making the results discoverable
[Diagram labels: Finch Fancy Repository, World Finch Database, Finches R’ Us, Plotter Workflow]
Other Ways Ontologies Help
• Crystallize knowledge
• Lay open assumptions
• Make for great parties
Successes
[Diagram labels: Simple Assembly, Subclass Of, Assembly With Switch, Instance Of, Assembly-1]
The GO Ontology: www.geneontology.org
Gene Ontology widely adopted
[Image label: AgBase]
GO Stats
GO
• Over 25,000 terms
• 19 contributing groups
GO Annotations
UniProtKB O13035 GO:0004098
UniProtKB O13035 GO:0004336
UniProtKB O13035 GO:0004348
• Total manual GO annotations: 388,633
• Total proteins with manual annotations: 80,402
• Total number of distinct proteins: 2,971,374
• Total number of taxa: 129,318
Ontologies and You
• User of “invisible” ontologies
  • like search
• User of created ontologies
  • annotating data sets
• Collaborator in ontology creation
  • biologist working with ontologist
• Hands-on ontology builder
  • you’ll need more than a 1 hour talk…
Chapter I Summary
• Ontologies can help
  • Locate data
  • Add semantics to data
  • Integrate data
  • Clarify domains
• There are already good examples
  • In genomics
  • In the biomedical field
  • In engineering
The Nitty Gritty
• XML, RDF, OWL and other 3 letter words
• Ontology Basics
• Reasoning with Ontologies
XML, DTDs, XML Schema
Not good for machines
• tools can’t automatically process
• how do you know it’s valid?
XML, XML Schema
Raw data:
Col.,Ht.,Crabs
hya,1.5,11
XML Schema types: string (col), float (ht), integer (crabs)
XML:
<?xml version='1.0'?>
<dataset>
  <dataitem>
    <col>hya</col>
    <ht>1.5</ht>
    <crabs>11</crabs>
  </dataitem>
  …
</dataset>
XML and XML Schema
• Now any machine can validate an XML document, given a schema
• Languages to translate XML to PDF or HTML exist
• But… can’t relate things
  • Like “the data in this file relates to study X”
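The validation idea can be sketched in Python using only the standard library. This is a minimal illustration, not a real XML Schema processor: the `EXPECTED_TYPES` mapping and the `validate` function are hypothetical stand-ins for a schema and a validator, and they only check the three column types shown on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an XML Schema: element name -> Python type.
EXPECTED_TYPES = {"col": str, "ht": float, "crabs": int}

DOCUMENT = """<?xml version='1.0'?>
<dataset>
  <dataitem>
    <col>hya</col>
    <ht>1.5</ht>
    <crabs>11</crabs>
  </dataitem>
</dataset>"""


def validate(xml_text):
    """Return a list of problems; an empty list means the document passed."""
    problems = []
    root = ET.fromstring(xml_text)
    for item in root.findall("dataitem"):
        for name, caster in EXPECTED_TYPES.items():
            node = item.find(name)
            if node is None:
                problems.append(f"missing <{name}>")
                continue
            try:
                caster(node.text)  # conversion fails if the text has the wrong type
            except (TypeError, ValueError):
                problems.append(
                    f"<{name}> is not a valid {caster.__name__}: {node.text!r}"
                )
    return problems
```

Calling `validate(DOCUMENT)` returns an empty list, while a document with `<crabs>many</crabs>` would report one problem. Note what the check cannot express: nothing here says which study the data belongs to, which is the gap the slide points at.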
RDF and RDF Schema
The Resource Description Framework (RDF)
• individuals (objects), properties, and classes
[Diagram: T.Crab subClassOf Crab; A. Coral subClassOf Coral; T.Crab livesIn A. Coral; My Crab type T.Crab; That Coral type A. Coral; My Crab livesIn That Coral]
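To make the triple idea concrete, here is a small Python sketch that stores statements as (subject, predicate, object) tuples and follows `subClassOf` links to work out every class an individual belongs to. The triples are read off the flattened diagram, so their exact edges are an assumption, and the helper names are hypothetical; a real project would use an RDF library rather than a bare set.

```python
# Triples as read off the slide's diagram (an assumption about its edges).
TRIPLES = {
    ("T.Crab", "subClassOf", "Crab"),
    ("A.Coral", "subClassOf", "Coral"),
    ("T.Crab", "livesIn", "A.Coral"),
    ("MyCrab", "type", "T.Crab"),
    ("ThatCoral", "type", "A.Coral"),
    ("MyCrab", "livesIn", "ThatCoral"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a known triple."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def superclasses(cls):
    """cls plus every class reachable from it through subClassOf."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for parent in objects(current, "subClassOf"):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen


def types_of(individual):
    """Declared types of an individual plus all their superclasses."""
    result = set()
    for cls in objects(individual, "type"):
        result |= superclasses(cls)
    return result
```

With these triples, `types_of("MyCrab")` contains both `T.Crab` and `Crab`, even though only the first was stated directly.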
RDF is Useful
• GO is available in RDF
• FOAF - Friend of a Friend
  • For example, go to http://xml.mfd-consult.dk/foaf/explorer/
  • Enter: http://hello.typepad.com/foaf.rdf
• RSS - Really Simple Syndication
  • It’s probably in your browser
  • Yahoo pipes rss blender
[Diagram: two Person nodes joined by "knows"; properties name, img, homepage, seeAlso; names David Jacobs and Jesse James Garrett; homepages randomwalks.com and blog.jjg.net]
FOAF:
<foaf:Person>
  <foaf:weblog rdf:resource="http://hello.typepad.com/" />
  <foaf:homepage rdf:resource="http://www.randomwalks.com" />
  <foaf:name>David Jacobs</foaf:name>
  <bio:olb>I work in New York City with filmmakers, activists and educators. </bio:olb>
  <foaf:img rdf:resource="http://hello.typepad.com/mirrorshot.jpg" />
  <foaf:knows>
    <foaf:Person>
      <foaf:name>Jesse James Garrett</foaf:name>
      <foaf:homepage rdf:resource="http://blog.jjg.net/weblog/" />
      <rdfs:seeAlso rdf:resource="http://blog.jjg.net/foaf.rdf" />
    </foaf:Person>
  </foaf:knows>
</foaf:Person>
Basic Ontology Building Blocks
Instances
• The actual things of interest
• For example, a specimen (that crab)
Classes (concepts)
• A set of instances that share certain characteristics
• For example, the set of all crabs
is-a
• A is-a B means every instance of A is also an instance of B
• A might have additional characteristics; more restrictions
Properties (has-a / part-of)
• Represent a characteristic
• e.g., has Wings, has-color Yellow
[Diagram labels: The crab that bit me, crab, T.crab, color, isa, has-color]
Example of Pollution Ontology
Classes versus Instances - tricky!
• If A is-a B, then every A is B
• Every human, in this case, must also be a species
• But “John” is not a species
[Diagram: Human is-a Species; John is an instance of Human] (Guarino)
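The trap can be reproduced with ordinary Python classes. The sketch below first models "Human is-a Species" as a subclass link, which forces the unwanted conclusion about John, and then shows one possible remodeling in which Human is an instance of Species. The remodeling and all class names are assumptions made for illustration, not something the slide prescribes.

```python
# Mistaken modeling: Human is-a Species, so every human is a species.
class MistakenSpecies:
    pass


class MistakenHuman(MistakenSpecies):
    pass


mistaken_john = MistakenHuman()


# One possible remodeling (an assumption): Human is an *instance* of Species,
# and John is an organism that belongs to that species.
class Species:
    def __init__(self, name):
        self.name = name


class Organism:
    def __init__(self, name, species):
        self.name = name
        self.species = species


human = Species("Human")
john = Organism("John", human)

# Under the mistaken model John counts as a species; under the remodeling
# he does not, while Human still does.
john_is_species_mistaken = isinstance(mistaken_john, MistakenSpecies)
john_is_species_fixed = isinstance(john, Species)
human_is_species_fixed = isinstance(human, Species)
```

The design choice is the one the slide hints at: is-a passes membership down to every instance, so it should only link two classes whose instances are the same kind of thing.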
is-a is not part-of
• What are essential properties of Cars?
  • E.g., that they accommodate people?
• Are these also essential for Engines?
[Diagram: Engine part-of Car; Wheel part-of Car; Engine is-a Car is crossed out] [Guarino]
Limitations of RDF-based Ontologies
• No constraints
  • “all red things have the color property with value red”
  • “Costa Rica has only one President”
• Can’t create definitions by combining other definitions
  • Mother = Parent and Female
• Can’t say concepts are equivalent or disjoint
OWL - The Web Ontology Language
• Three different kinds
  • Lite - limited, but still powerful
  • DL - very expressive, can still reason
  • Full - extremely expressive, but unreasonable
• Example OWL reasoning
  • If all apples are red, and apples and manzanas are the same, then all manzanas are red
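The apples and manzanas inference can be imitated in a few lines of Python. This is only a toy sketch of what a reasoner does with class equivalence: the `FACTS` and `EQUIVALENT` structures and both function names are hypothetical, and real OWL reasoners handle far richer axioms.

```python
# "All members of <class> have <value>" statements.
FACTS = {("Apple", "red")}

# Pairs of class names declared equivalent.
EQUIVALENT = {("Apple", "Manzana")}


def equivalence_class(name, equivalences):
    """All names linked to `name` by the symmetric, transitive equivalences."""
    group = {name}
    changed = True
    while changed:
        changed = False
        for a, b in equivalences:
            # Pull in the pair when exactly one side is already in the group.
            if (a in group) != (b in group):
                group |= {a, b}
                changed = True
    return group


def all_members_have(cls, value, facts, equivalences):
    """True if the fact is stated for cls or for any class equivalent to it."""
    return any((c, value) in facts for c in equivalence_class(cls, equivalences))
```

Here `all_members_have("Manzana", "red", FACTS, EQUIVALENT)` holds even though nothing was said about manzanas directly, which is the inference on the slide.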
Reasoning about Taxonomy
Peet’s 2005 Ranunculus data set:
• 9 taxonomies
• 654 taxa
• 704 relations
[Figure: visualization by Martin Graham]
Is This Right?
Assuming disjoint children and complete partitioning of parents
[Figure: two taxonomies of Ranunculus hydrocharoides, Benson 1948 and Kartesz 2004, with varieties R.h. var stolonifer, R.h. var typicus, and R.h. var natans; the articulations drawn between them are marked ≡ and ⊋]
Peet, 2005:
• B.1948:R.h.stolonifer is congruent to K.2004:R.h.stolonifer
• B.1948:R.h.typicus is congruent to K.2004:R.h.typicus
• B.1948:R. hydrocharoides is congruent to K.2004:R. hydrocharoides
The most likely fix here is to change the congruence relation between the top two nodes to state instead that Benson's R. hydrocharoides includes Kartesz's.
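The inconsistency can be checked mechanically. The sketch below compares two parent taxa from their children, under the slide's assumptions that siblings are disjoint and that children completely partition the parent; it further assumes every child taxon is non-empty and that the child congruences are one-to-one. Which taxonomy carries var natans is not recoverable from the flattened figure, so placing it in Benson's is an assumption chosen to agree with the stated fix. The function name is hypothetical.

```python
def parent_relation(children_a, children_b, congruent_pairs):
    """Relation between parents A and B implied by their children.

    Assumes disjoint, non-empty siblings, complete partitioning of each
    parent, and a one-to-one set of (child_of_a, child_of_b) congruences.
    """
    matched_a = {a for a, _ in congruent_pairs}
    matched_b = {b for _, b in congruent_pairs}
    extra_a = set(children_a) - matched_a  # children of A with no partner
    extra_b = set(children_b) - matched_b  # children of B with no partner
    if not extra_a and not extra_b:
        return "congruent"
    if extra_a and not extra_b:
        return "A properly includes B"
    if extra_b and not extra_a:
        return "B properly includes A"
    return "undetermined"


# Assumption: var natans belongs to Benson's taxonomy.
BENSON = ["stolonifer", "typicus", "natans"]
KARTESZ = ["stolonifer", "typicus"]
PAIRS = {("stolonifer", "stolonifer"), ("typicus", "typicus")}

derived = parent_relation(BENSON, KARTESZ, PAIRS)
asserted = "congruent"
is_consistent = derived == asserted
```

The derived relation is proper inclusion, so the asserted congruence between the two top nodes cannot hold alongside the two child-level congruences.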
Getting Crazy with Properties
• Properties can be:
  • Transitive (a is inCountry b, b is inCountry c..)
  • Inverse (a partOf b, b has_part a)
  • Functional (dave’s birthMother is vera)
  • Inverse functional (dave’s ssn is ….)
• And you can say stuff like
  • Apples are only red
  • Some apples are red
  • Crabs have 2 pincers
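These property characteristics can be sketched over plain sets of (subject, value) pairs. The helper names and the sample data (`located_in`, `part_of`, `birth_mother`) are hypothetical. One caveat: this sketch treats different names as different things, whereas an OWL reasoner makes no such assumption and would instead infer that two values of a functional property denote the same individual.

```python
def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing changes."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def inverse(pairs):
    """Swap subject and value, e.g. partOf becomes has_part."""
    return {(b, a) for a, b in pairs}


def is_functional(pairs):
    """True when no subject maps to two different values."""
    seen = {}
    for subject, value in pairs:
        if seen.setdefault(subject, value) != value:
            return False
    return True


def is_inverse_functional(pairs):
    """True when no value is shared by two different subjects."""
    return is_functional(inverse(pairs))


# Hypothetical sample data.
located_in = {("SanJose", "CostaRica"), ("CostaRica", "CentralAmerica")}
part_of = {("pincer", "crab")}
birth_mother = {("dave", "vera")}
```

For instance, `transitive_closure(located_in)` adds the pair linking SanJose to CentralAmerica, and `inverse(part_of)` yields the has_part direction.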
Chapter II Summary
• XML is about syntax
• RDF is about relationships
• OWL is about more complex constraints
• Tips:
  • If A is-a B, then every instance of A is also an instance of B
  • Keep classes and instances separate
  • is-a is not part-of
Chapter III: Ontologies in Ecology
• GO and friends are successful, but…
  • Hard to represent processes
    • Show me studies about the flow of nitrogen in highly saline lakes, starting with lake-side nitrate
  • Can’t be used for data integration
  • Ecologists use complex models that involve many relations beyond is-a and part-of
Reminder: Where Ontologies Can Help
• Crystallizing domain knowledge
• Marking up metadata and data sets
• Marking up analyses and analysis components
Marking Up Metadata and Data
• ALTER-Net: http://www5.umweltbundesamt.at/ALTERNet
• Taxonomic Databases Working Group (TDWG) standards: http://rs.tdwg.org/ontology/voc/
[Ontology files shown: Geo.owl, Species.owl, Vegetation.owl, Geography.owl, Water.owl, Ecosystem.owl]
Metadata and Data with OBOE
Example data set: the abundance of Trapeziid crabs in coral colonies (Stewart et al. 2006)
Metadata and Data with OBOE
[Diagram: Observation ofEntity Organism; Observation hasMeasurement Measurement; Measurement ofCharacteristic TaxonName, usesStandard TaxonCatalog, hasValue “Acropora hyacinthus”]
Two measurements of the organism: the name …
Metadata and Data with OBOE
[Diagram: Observation ofEntity Organism, with two Measurements. First: ofCharacteristic TaxonName, usesStandard TaxonCatalog, hasValue “Acropora hyacinthus”. Second: ofCharacteristic Height, usesStandard Meter, hasPrecision “0.01”, hasValue “1.25”]
Two measurements of the organism: the name … height
Metadata and Data with OBOE
[Diagram: two Observations linked by hasContext. The first, ofEntity Organism, has Measurements of TaxonName (usesStandard TaxonCatalog, hasValue “Acropora hyacinthus”) and Height (usesStandard Meter, hasPrecision “0.01”, hasValue “1.25”). The second has Measurements of TaxonName (usesStandard TaxonCatalog, hasValue “Trapeziid crab”) and Abundance (usesStandard Individual, hasValue “11”)]
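The observation and measurement structure in these diagrams maps naturally onto two small record types. The Python sketch below is an illustration only, not the OBOE ontology itself: the dataclass and field names are hypothetical, the entity of the crab observation is not shown on the slide and is assumed to be Organism, and the direction of hasContext (crab observation pointing to the coral observation) is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    characteristic: str               # ofCharacteristic
    standard: str                     # usesStandard
    value: str                        # hasValue
    precision: Optional[str] = None   # hasPrecision, when given


@dataclass
class Observation:
    entity: str                                                  # ofEntity
    measurements: List[Measurement] = field(default_factory=list)  # hasMeasurement
    context: Optional["Observation"] = None                      # hasContext


def value_of(observation, characteristic):
    """Value of the first measurement of the given characteristic, or None."""
    for m in observation.measurements:
        if m.characteristic == characteristic:
            return m.value
    return None


coral = Observation(
    "Organism",
    [
        Measurement("TaxonName", "TaxonCatalog", "Acropora hyacinthus"),
        Measurement("Height", "Meter", "1.25", precision="0.01"),
    ],
)

# Assumptions: entity "Organism" and context direction crab -> coral.
crabs = Observation(
    "Organism",
    [
        Measurement("TaxonName", "TaxonCatalog", "Trapeziid crab"),
        Measurement("Abundance", "Individual", "11"),
    ],
    context=coral,
)
```

Following `crabs.context` then answers "in which coral colony were these crabs counted?", which a flat table row leaves implicit.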
Data Integration with OBOE
[Diagram (a): Observation ofEntity Coral; Measurement ofCharacteristic Diameter, usesStandard Meter, hasPrecision “0.01”, hasValue “1.25”]
[Diagram (b): Observation ofEntity Animal; Measurement ofCharacteristic ColonyDiameter, usesStandard Centimeter, hasPrecision “10”, hasValue “320”]
Integration of data sets given their observation semantics
Data Integration with OBOE
[Diagram (a): Observation ofEntity Coral; Measurement ofCharacteristic Diameter, usesStandard Meter, hasPrecision “0.01”, hasValue “1.25”]
[Diagram (b): Observation ofEntity Animal; Measurement ofCharacteristic ColonyDiameter, usesStandard Centimeter, hasPrecision “10”, hasValue “320”]
[Added links: Meter and Centimeter both hasDimension Length; two is-a edges relate terms of (a) and (b)]
Integration involves data set observation structures
Data Integration with OBOE
[Diagram (c): Observation ofEntity Animal; Measurement ofCharacteristic Diameter, usesStandard Meter, hasPrecision “0.1”, hasValue “1.3” from (a) and “3.2” from (b)]
And then applying appropriate conversions, etc.
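The conversion step can be sketched with Python's `decimal` module. The sketch converts each value and its precision to meters, then rounds everything to the coarsest precision among the inputs. The `TO_METERS` table and the function names are hypothetical, and round-half-up is an assumption chosen because it reproduces the slide's 1.25 to 1.3.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical conversion table: unit name -> factor to meters.
TO_METERS = {"Meter": Decimal("1"), "Centimeter": Decimal("0.01")}


def to_meters(value, unit):
    """Convert a decimal string in the given unit to a Decimal in meters."""
    return Decimal(value) * TO_METERS[unit]


def integrate(measurements):
    """Merge (value, precision, unit) string triples into meters.

    Returns (values, precision), with every value rounded to the
    coarsest precision found among the inputs.
    """
    converted = [(to_meters(v, u), to_meters(p, u)) for v, p, u in measurements]
    # normalize() drops trailing zeros so quantize() rounds to 0.1, not 0.10.
    coarsest = max(p for _, p in converted).normalize()
    values = [v.quantize(coarsest, rounding=ROUND_HALF_UP) for v, _ in converted]
    return values, coarsest


values, precision = integrate(
    [("1.25", "0.01", "Meter"), ("320", "10", "Centimeter")]
)
```

`Decimal` is used instead of `float` because binary floats combined with Python's built-in `round` would turn 1.25 into 1.2 rather than 1.3.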
Marking up Analyses
• Scientific Workflow Systems help:
  • Make analyses reproducible
  • Make parts of analyses reusable
• But…
  • 100’s of workflows and templates
  • 1000’s of actors (e.g. actors for web services, data analytics, …)
  • Need to find what you want
Semantic Type Annotation in Kepler
• Component input and output port annotation
• Each port can be annotated with multiple terms from multiple ontologies
• Annotations are stored within the actor metadata
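Port annotations make it possible to search for components by meaning. The Python sketch below is not Kepler's API: the `IS_A` hierarchy, the `ACTORS` table, and both function names are hypothetical. It only illustrates how an is-a hierarchy lets a data set annotated with a specific term match an actor whose input port is annotated with a more general one.

```python
# Hypothetical is-a hierarchy: term -> parent term.
IS_A = {"ColonyDiameter": "Diameter", "Diameter": "Characteristic"}

# Hypothetical actors with the ontology terms annotating their input ports.
ACTORS = {
    "Plotter": {"input": {"Diameter"}},
    "NameResolver": {"input": {"TaxonName"}},
}


def ancestors(term):
    """The term followed by all of its is-a ancestors, nearest first."""
    chain = [term]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def actors_accepting(term):
    """Names of actors whose input annotation matches the term or an ancestor."""
    accepted = set(ancestors(term))
    return sorted(
        name for name, ports in ACTORS.items() if ports["input"] & accepted
    )
```

A data set annotated with `ColonyDiameter` would be offered the `Plotter` actor here, because ColonyDiameter is-a Diameter.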
Chapter III Summary
• Taxonomies and partonomies are useful but limiting
• We saw a couple of ontologies for
  • Representing a domain
  • Describing data
• Again, the focus is always on discovery, integration and reuse
Tools
• For RDF:
  • Simile: simile.mit.edu - nice RDF tools
• For OWL:
  • Protégé: protege.stanford.edu
• For reasoning:
  • Pellet: http://www.mindswap.org/2003/pellet/
  • Jena: http://jena.sourceforge.net/inference/
Protégé
OWLViz Tab
Summing Up
• Ontologies are useful for
  • Data discovery
  • Data integration
  • Terminology regulation
  • Analysis reuse
• Ontology in ecology and biodiversity is just getting started
Lastly: Back to the Goals
• Learn about ontology successes
• Learn basic terminology / buzz words
• Get a sense for ontology development
• See how and where they apply to ecology and biodiversity studies
• Learn what remains to be done
  • Bottom line: A LOT!
Some References
Practical guides/references
• Protégé. Open source ontology editor. http://protege.stanford.edu/
• CO-ODE. Various resources on ontologies, tutorials, best practices, etc. http://www.co-ode.org/
• W3C Semantic Web Activity. Various pointers, standardization efforts, etc. http://www.w3.org/2001/sw/
• OWL Resources: OWL-Guide (http://www.w3.org/TR/owl-guide/), OWL-Reference (http://www.w3.org/TR/owl-ref/)
• Pizza Tutorials. http://www.co-ode.org/resources/tutorials/
Academic Papers/Collections
• Bard and Rhee. Ontologies in biology: Design, applications and future challenges. Nature Reviews Genetics, vol. 5, 2004.
• The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genet. 25: 25-29, 2000.
• Barry Smith, http://ontology.buffalo.edu/smith/, various papers on ontologies (even for ecology)
• Sowa, J. F. Knowledge Representation: Logical, Philosophical, and Computational Foundations. PWS Publishing Co., Boston, 1999.
• Baader F., Calvanese D., McGuinness D., Nardi D., and Patel-Schneider P. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge Univ. Press, 2003.
• Thomas R. Gruber. Toward principles for the design of ontologies used for knowledge sharing. In Formal Ontology in Conceptual Analysis and Knowledge Representation, Kluwer Academic Publishers, 1993.
• Nicola Guarino. Formal ontology and information systems. In Proc. of Formal Ontology in Information Systems, IOS Press, pp. 3-15, 1998.
Exercise: Ontology Engineering
1. Choose the specific “domain” you want to tackle:
  • Based on a specific collection of data that you are familiar with
  • Based on an existing project/experiment you are working on or understand
  • Focus on use: data set markup or describing a domain
2. Define (a part of) an ontology for the domain
  • Start with the classes
  • Then arrange into an is-a hierarchy
  • Then add properties between the classes
  • If you feel mighty, try some property constraints
3. Capture your ontology on a whiteboard, poster board, or cmap tool as one or more diagrams
[Side panel: Transitive; Inverse; Functional; Inverse Functional; All apples have a color; Some apples have a color; All apples are red; Some apples are red; Crabs have 2 pincers]