
Towards interoperability of bio-ontologies or Statistics vs Logic


kasia

Presentation Transcript


  1. Towards interoperability of bio-ontologies or Statistics vs Logic

  2. Towards interoperability of bio-ontologies • Part I: Problem: How to define meaning? • Part II: Solution: The meaning of a word is its use in language • Part III: Solution: The meaning of a word is OWL • Part IV: Conclusion <owl:Class rdf:ID="LifeAndAllTheRest" /> <owl:DatatypeProperty rdf:ID="lifeValue"> <rdfs:domain rdf:resource="#LifeAndAllTheRest" /> <rdfs:range rdf:resource="&xsd;positiveInteger"/> </owl:DatatypeProperty>

  3. Part I: How to define meaning • Some hints: • Aardvark • To Belch • Mermaids

  4. Part I: How to define meaning • Defining concepts is difficult • Aardvark: Ambiguous definition that also matches a bee • Belch: Verbal description vs. "just doing it" • Ambiguity • C = sea: "Big blue wobbly thing that mermaids live in" • Avoid negation • Dog = not a cat • Completeness (not part of video) • Johnson forgot "sausage" in his dictionary

  5. Part II: The meaning of a word is its use in language

  6. Use in language = use in PubMed, use in UniProt, use in PDB, … • >30,000 3D structures • >1,000,000 sequences

  7. Textmining can help • GoPubMed.org • MeshPubMed.org • Cohse • Textpresso • EbiMed • Whatizit • Termine • Vivisimo • … • BioCreative

  8. Sync data, text and ontology

  9. Apoptosis vs programmed cell death • apoptosis NOT "programmed cell death": >120,000 papers • "programmed cell death" NOT apoptosis: 1609 papers • "programmed cell death (apoptosis)": 903 papers • "apoptosis (programmed cell death)": 455 papers

  10. Rethinking the microprocessor • Stemming: • binds, binding, bind! • Dimerization = dimer? • Organisation = organ! • Missing words: • Text "...a transcription factor that binds..." = "transcription factor binding" • 1/3 of GO terms end with "activity" • Word sense disambiguation: • "Rethinking the microprocessor" is about microRNA
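
The stemming pitfalls on this slide can be reproduced with a deliberately naive suffix stripper. This is a minimal sketch, not the stemmer used by any of the tools named in the talk; the suffix list and the minimum stem length of three letters are assumptions chosen only to replay the slide's examples.

```python
# Hypothetical suffix list, ordered so that longer suffixes are tried first.
SUFFIXES = ("isation", "ization", "ing", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# "binds", "binding" and "bind" collapse to one stem, which is what we want.
useful = {naive_stem(w) for w in ("binds", "binding", "bind")}

# "organisation" collapses onto "organ", which is clearly wrong.
harmful = naive_stem("organisation")

# "dimerization" collapses onto "dimer", which is debatable.
debatable = naive_stem("dimerization")
```

The same rule that correctly merges the inflections of "bind" also merges a process with a body part, which is why stemming alone cannot decide whether two strings name the same concept.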

  11. Word sense disambiguation • Tell me who your friends are and I will tell you who you are • Co-occurrence graphs and Support Vector Machines achieve • 80%-95% for word sense disambiguation for terms like development, spindle, nucleus, envelope, … • >80% for gene identification task in BioCreative competition • (Identifying terms is much harder though)

  12. Candidate terms • Statistics on a text corpus can reveal candidate terms • Composition: membrane < inner membrane < mitochondrial inner membrane < mitochondrial inner membrane peptidase complex • "The compositional structure of gene ontology terms" [Ogren et al., 2004] • Systems: Text2onto, Ontolearn
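
The "tell me who your friends are" idea can be illustrated with a much simpler method than the co-occurrence graphs and Support Vector Machines mentioned on the slide: count how many context words overlap with a word list per sense. This is only a sketch of the principle. The two sense profiles below are hypothetical and hand-written, not derived from any corpus, and the overlap count is a stand-in for a trained classifier.

```python
# Hypothetical co-occurrence profiles for two senses of "nucleus".
SENSE_CONTEXTS = {
    "cell nucleus": {"chromatin", "membrane", "dna", "transcription"},
    "atomic nucleus": {"proton", "neutron", "isotope", "decay"},
}


def disambiguate(context_words, sense_contexts):
    """Return the sense whose profile shares the most words with the context."""
    words = {w.lower() for w in context_words}
    scores = {sense: len(words & profile) for sense, profile in sense_contexts.items()}
    best = max(scores, key=scores.get)
    return best, scores


sentence = "The nucleus contains chromatin and DNA".split()
best_sense, scores = disambiguate(sentence, SENSE_CONTEXTS)
```

A real system would learn the profiles from annotated text and weight the words; the sketch only shows why the neighbours of an ambiguous term carry the signal.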
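
The compositional structure on this slide can be detected mechanically: a term is a candidate parent of another if its words occur as a contiguous run inside the longer term. The following is a minimal sketch of that check, assuming whitespace tokenisation; the function name `nested_terms` is hypothetical, and this is not the method of Ogren et al., Text2onto, or Ontolearn.

```python
def nested_terms(terms):
    """Map each term to the other terms that occur inside it as whole words."""
    result = {}
    for term in terms:
        words = term.split()
        parts = []
        for other in terms:
            if other == term:
                continue
            other_words = other.split()
            n = len(other_words)
            # Slide a window of n words over the longer term.
            if any(words[i:i + n] == other_words for i in range(len(words) - n + 1)):
                parts.append(other)
        result[term] = sorted(parts, key=len)
    return result


terms = [
    "membrane",
    "inner membrane",
    "mitochondrial inner membrane",
    "mitochondrial inner membrane peptidase complex",
]
composition = nested_terms(terms)
```

Matching on whole words rather than raw substrings avoids false hits such as "organ" inside "organisation".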

  13. Defining concepts • Caspases are a family of cysteine proteases that cleave proteins after • HIV is a disease that affects the immune system • The liver is the largest internal organ in the body • Small GTPases are monomeric guanine nucleotide-binding proteins • Endocytosis is a process by which cells internalize ... • Endosomes are membrane-bound vesicles • See also:

  14. Part III: The meaning of a word is OWL

  15. Why logic is promising • If all facts are formally defined, we can reason over models • Example Glycogen storage disease: • All metabolic reactions expressed as rules • Facts: Glucose down, pyruvate and lactate up • Inconsistent with model • Which facts can be added to restore consistency? • Alpha-glucosidase malfunctioning (GSD II) • Amylo-alpha-1,6-glucosidase malfunctioning (GSD III)

  16. Long history of computational logic for computational biology • Leibniz (1646-1716) • Lingua universalis and calculus ratiocinator • Idea: Reasoning = prime factor computation • concepts = numbers, • basic concepts = primes, • complex concepts = non-primes composed of the basic concepts' primes • Example • animal = 2, rational = 3, therefore human = 2 x 3 = 6 • Assuming monkey = 10, he concludes: monkey =/= human, because neither 10/6 nor 6/10 is a whole number (neither number divides the other). • To prove the usefulness of his calculus, he assumes "those marvelous characteristic numbers" as given
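
Leibniz's scheme is easy to replay: a complex concept is the product of the primes of its basic concepts, and one concept subsumes another exactly when its number divides the other's. The sketch below uses only the numbers given on the slide; the helper names `compose` and `is_a` are hypothetical.

```python
from math import prod

# Characteristic numbers of the basic concepts, as given on the slide.
BASIC = {"animal": 2, "rational": 3}


def compose(*concepts: str) -> int:
    """A complex concept is the product of its basic concepts' primes."""
    return prod(BASIC[c] for c in concepts)


def is_a(specific: int, general: int) -> bool:
    """The specific concept falls under the general one if general divides it."""
    return specific % general == 0


human = compose("animal", "rational")  # 2 x 3 = 6
monkey = 10                            # assumed on the slide

human_is_animal = is_a(human, BASIC["animal"])
monkey_is_animal = is_a(monkey, BASIC["animal"])
monkey_is_human = is_a(monkey, human)
human_is_monkey = is_a(human, monkey)
```

The arithmetic works, but only once the characteristic numbers exist, which is exactly the part Leibniz had to assume.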

  17. Boole (1815-1864) • "clean beasts (x) are those which both divide the hoof (y) and chew the cud (z)": x = yz • 1. Division: z = x/y • 2. Development: z = 1/1 xy + 1/0 x(1-y) + 0/1 (1-x)y + 0/0 (1-x)(1-y) = xy + 1/0 x(1-y) + 0 (1-x)y + 0/0 (1-x)(1-y) • 3. Interpretation: Beasts which chew the cud [z] consist of all clean beasts (which also divide the hoof) [xy] together with an indefinite remainder (some, none, or all) [indicated by 0/0] of unclean beasts which do not divide the hoof [(1-x)(1-y)] • Note: No statement about 0 (n/a) and 1/0 (no statement about z)
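
Boole's development can be checked by brute force over truth values instead of by formal division. The sketch below enumerates every combination of x and y and records which values of z satisfy x = yz. It assumes the usual reading of Boole's coefficients: 1 means the class is wholly included in z, 0 means it is excluded, 0/0 means z is left indefinite, and 1/0 marks a combination that the premise rules out. The function name `classify` is hypothetical.

```python
from itertools import product


def classify(constraint):
    """For each (x, y), report which values of z the constraint allows."""
    table = {}
    for x, y in product((0, 1), repeat=2):
        allowed = [z for z in (0, 1) if constraint(x, y, z)]
        if allowed == [1]:
            label = "1"
        elif allowed == [0]:
            label = "0"
        elif not allowed:
            label = "1/0"  # no z satisfies the premise for this (x, y)
        else:
            label = "0/0"  # both values of z are possible
        table[(x, y)] = label
    return table


# Premise: clean beasts (x) divide the hoof (y) and chew the cud (z).
coefficients = classify(lambda x, y, z: x == y * z)
```

The resulting table assigns 1 to xy, 1/0 to x(1-y), 0 to (1-x)y and 0/0 to (1-x)(1-y), matching the four terms of the development.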

  18. So, now it is OWL then <owl:Class rdf:ID="LifeAndAllTheRest" /> <owl:DatatypeProperty rdf:ID="lifeValue"> <rdfs:domain rdf:resource="#LifeAndAllTheRest" /> <rdfs:range rdf:resource="&xsd;positiveInteger"/> </owl:DatatypeProperty>

  19. Let's get real

  20. OWL is everywhere • 1600 OWL ontologies • Baker et al., Ontology Evaluation – Beauty in the eye of the beholder, Poster, 2005, NCOR inauguration

  21. Implicit vs explicit • SNOMED uses only existential restriction • General: • Life scientists make observations and only state facts, as life is too complex to generalise • Computer scientists make abstractions where possible • Spackman and Reynoso. Examining SNOMED from the Perspective of Formal Ontological Principles: Some Preliminary Analysis and Observations

  22. Dealing with exceptions • Any logical approach should handle exceptions, as they are the norm in the life sciences • Non-monotonic reasoning: • Every member state of the EU is in Europe, Britain is a member state of the EU, but… • Birds fly, penguins do not, penguins are birds
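
One common way to get the non-monotonic behaviour of the bird example is default inheritance with overriding: a property is looked up on the most specific concept first, and a more general default applies only if nothing more specific contradicts it. This is a minimal sketch of that one strategy, not a full non-monotonic logic; "sparrow" is a hypothetical extra concept added for contrast.

```python
# Hypothetical toy hierarchy; "sparrow" is not on the slide.
IS_A = {"penguin": "bird", "sparrow": "bird"}

# Defaults stated per concept; the penguin entry is the exception.
DEFAULTS = {
    "bird": {"flies": True},
    "penguin": {"flies": False},
}


def lookup(concept, prop):
    """Walk up the is-a chain and return the most specific default found."""
    while concept is not None:
        if prop in DEFAULTS.get(concept, {}):
            return DEFAULTS[concept][prop]
        concept = IS_A.get(concept)
    return None  # nothing known about this property


sparrow_flies = lookup("sparrow", "flies")
penguin_flies = lookup("penguin", "flies")
```

Adding the penguin fact retracts a conclusion that the bird default alone would have supported, which is what makes the reasoning non-monotonic.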

  23. Dealing with negation • Science is geared towards positive statements, but negative information is often equally important • Journal of Negative Results in BioMedicine • Defining a negative interaction dataset • Select two random proteins • Select two proteins with different localisation • Different types of negation • Explicit: "…HFR1 was shown not to interact with phyB…" • Implicit: "…Kip3 is not known to interact with Kar9…"

  24. Dealing with negation • Many different semantics su/a=su/d su/u su/sa sa/u=sa/d=sa/a su/su u/a=u/d=u/sa sa/su=sa/sa u/su=u/u d/su=d/u=d/a=d/d=d/sa a/su=a/u=a/a=a/d=a/sa

  25. Reasoning • GONG Example (Mike Bada et al.): • glycosaminoglycan biosynthesis and heparin biosynthesis were unrelated GO concepts • Using formal reasoning and other ontologies containing heparin is-a glycosaminoglycan, infer automatically • Heparin biosynthesis is-a glycosaminoglycan biosynthesis • But why not use textmining? • Robert: If X is-a Y and there are concepts XZ and YZ, then suggest that XZ is-a YZ • Yves: Sugar - sugar phosphotransferase - Phosphotransferase

  26. Part IV: Conclusions

  27. Link text and ontologies • Textmining is difficult. • Thus, let authors write abstracts for machines! (Force authors to submit data) • Dietrich: Cashew prize • Kuhn et al., DILS2006
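
Robert's rule, read here as "if X is-a Y and the terms XZ and YZ both exist, suggest XZ is-a YZ", is a plain string operation and needs no reasoner. The sketch below applies it to the GONG example. The function name `suggest_is_a` is hypothetical, and "heparin binding" is included only as a distractor to show that the rule fires when both terms are present and not otherwise.

```python
def suggest_is_a(is_a_pairs, terms):
    """Suggest (XZ, YZ) pairs for every known X is-a Y with both terms present."""
    term_set = set(terms)
    suggestions = set()
    for x, y in is_a_pairs:
        for term in terms:
            if term.startswith(x + " "):
                suffix = term[len(x):]  # " Z", including the leading space
                candidate = y + suffix
                if candidate in term_set:
                    suggestions.add((term, candidate))
    return sorted(suggestions)


known = [("heparin", "glycosaminoglycan")]
terms = [
    "heparin biosynthesis",
    "glycosaminoglycan biosynthesis",
    "heparin binding",  # no matching "glycosaminoglycan binding" in this list
]
suggested = suggest_is_a(known, terms)
```

The output is a list of suggestions for a curator, not a list of inferences: the rule is lexical and can propose links that do not hold biologically.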

  28. But make it simple!

  29. Beware • Both statistical and logical approaches organise knowledge as a hierarchy • But hierarchies are just a means to structure data in a continuous space

  30. Bioinformatics = Biological + Informatics

  31. Bioinformatics = Biological + Informatics - Logical

  32. Towards inter-operability • Retrieval • Linking data, text, and ontologies semi-automatically • Evaluating ontologies on data/text, evaluating data/text on ontologies • Generating ontologies semi-automatically from text and data • Representation • Zebrafish vs Drosophila vs mouse anatomy • "Benign dictator vs UN" • Relation ontology, meta data • Formats: XML, OBO, OWL, … • Reasoning • Need? Expressiveness: DL, rules, non-monotonic reasoning, negation, closed/open world, …
