
Ontology development and use for efficient information input and retrieval

Ontology development and use for efficient information input and retrieval. 1 Alice Clara Augustine, Vijayalakshmi K, Shobha Char, Naveen Sylvester, Mittur N Jagadish; 2 Mike Edgerton



Presentation Transcript


  1. Ontology development and use for efficient information input and retrieval. Alice Clara Augustine, Vijayalakshmi K, Shobha Char, Naveen Sylvester, Mittur N Jagadish (1); Mike Edgerton (2). (1) Monsanto Research Centre, Divn. of Monsanto Holdings Pvt. Ltd., #44/2A, “Vasants Business Park”, Bellary Road, NH-7, Hebbal, Bangalore 560 092, India. (2) Monsanto Company, 800 North Lindbergh Blvd., Creve Coeur, Missouri 63167, United States. Seventh Agricultural Ontology Services Workshop, November 9-10, 2006, Bangalore (India).

  2. Outline • What is an ontology; desiderata • What we have done at Monsanto • Tools used, method, etc. • Example of information retrieval • Challenges

  3. What is an Ontology • a common vocabulary • a shared understanding for people and machines • an established list of standardized terminology for use in indexing and retrieval of information. For our purposes, an ontology is a set of terms encompassing a domain of biology that is organized according to biological relationships.

  4. Why do we need an ontology? There is tremendous variation in the way phenotypes, traits, gene expression and protein localization are described. In addition, the nomenclature used to describe anatomy and development varies across taxa. For example: 1. A plant that flowers late can be described in many ways (late flowering, delayed flowering, flowers at 36 days after sowing). 2. Panicle, ear and tassel are all words used to describe an inflorescence. To make meaningful comparisons within and across different databases, we need a shared descriptive language that is uniformly applied to the data. Slide courtesy of the Plant Ontology Consortium (POC).
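The variation described on this slide can be illustrated with a small thesaurus lookup. This is only a sketch: the `THESAURUS` table and the `normalize` function are hypothetical names, the mappings are taken from the slide's two examples, and treating "panicle", "ear" and "tassel" as entries that resolve to "inflorescence" is a simplifying assumption (a quantitative description such as "flowers at 36 days after sowing" would need a threshold rule that the slides do not give).

```python
from typing import Optional

# Hypothetical thesaurus: free-text descriptions mapped to one preferred term.
# The entries mirror the slide's examples; they are not a real curated vocabulary.
THESAURUS = {
    "late flowering": "late flowering",
    "delayed flowering": "late flowering",
    "panicle": "inflorescence",
    "ear": "inflorescence",
    "tassel": "inflorescence",
}


def normalize(description: str) -> Optional[str]:
    """Return the preferred term for a description, or None if it is unknown."""
    return THESAURUS.get(description.strip().lower())


if __name__ == "__main__":
    for text in ["Delayed flowering", "tassel", "short stature"]:
        print(text, "->", normalize(text))
```

Returning `None` for unknown descriptions, rather than guessing, leaves the decision to a curator, which matches the slide's emphasis on a uniformly applied vocabulary.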

  5. Ontology Desiderata • Precision: the degree of mutual agreement, or strict conformity to a rule or standard (e.g. Gene Ontology terms) • Flexibility: the quality of being adaptable or variable (e.g. a thesaurus) • Explicitness: unambiguous, indicating a single clearly defined meaning (e.g. ‘Flower’) • Systematic: characterized by order and planning (e.g. a hierarchy)

  6. Objectives of the Ontology team at Monsanto • To establish a company-standard Plant Ontology for curating and editing data points in varied databases in a consistent manner • To incorporate a hierarchy indicating relationships among terms • To associate terms with succinct definitions (a glossary) for uniform understanding • To build in a thesaurus for variability / adaptability • To ensure the use of this standardized terminology for indexing, entry and retrieval of plant-relevant biological information • To be a part of the public ontology effort (Plant Ontology Consortium; GO and TO - Gramene) in an attempt to improve both our ontology and the public ontology

  7. Salient features: Monsanto ontology • Terms sourced from literature (dicot and monocot), plant biology books and public databases (Gene Ontology, Plant Ontology, Trait Ontology, RiceGenes, MaizeDB, Gramene) • Covers plant morphology/anatomy and developmental stages, trait terms and phenotypes • Aligned with terms in the Plant Ontology (PO) and Trait Ontology (TO) (shared with POC and Gramene) • Hierarchical organization • Facilitates thesaurus building for enhanced information retrieval • Includes a glossary of terms

  8. Term organization & tools The tool: DAG-Edit. The structure: Directed Acyclic Graph (DAG). Biological concepts are represented as a tree-like graph. • Branches represent broader terms • Leaves are more specific terms. As in a simple hierarchy, child terms cannot be their own ancestors, so cycles are forbidden. Unlike a simple hierarchy, however, a child term may have more than one parent.
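The two structural rules on this slide (no cycles, but more than one parent allowed) can be sketched as a small data structure. This is an illustrative sketch, not how DAG-Edit stores terms: the `TermDAG` class is a hypothetical name, and the example edges are assumptions chosen only to show a term with two parents.

```python
from collections import defaultdict


class TermDAG:
    """Terms linked child -> parents; multiple parents allowed, cycles rejected."""

    def __init__(self):
        self.parents = defaultdict(set)

    def ancestors(self, term):
        """Return every term reachable by following parent links upward."""
        seen = set()
        stack = list(self.parents.get(term, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return seen

    def add_edge(self, child, parent):
        """Link child to parent unless that would make a term its own ancestor."""
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"{child} -> {parent} would create a cycle")
        self.parents[child].add(parent)


if __name__ == "__main__":
    dag = TermDAG()
    # Illustrative edges only (assumed, not taken from the Monsanto ontology).
    dag.add_edge("floral organ", "organ")
    dag.add_edge("petal", "floral organ")
    dag.add_edge("petal", "flower")  # second parent for the same child
    print(sorted(dag.ancestors("petal")))
    try:
        dag.add_edge("organ", "petal")  # organ is already an ancestor of petal
    except ValueError as err:
        print("rejected:", err)
```

Checking for a cycle at insertion time keeps the graph acyclic by construction, so later traversals never need their own loop guards beyond the `seen` set.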

  9. Each 'child term' has a unique relationship to its 'parent term' • Instance of (is a, type of): the child term is a specific type of the more general parent term. For example: a silique is a type of fruit; a panicle is an inflorescence. • Part of: the child term is a part of the parent term. For example: the ectocarp is a part of the pericarp, which in turn is part of the fruit. • Develops from: the child term develops from its parent term. For example: a seed coat (testa) develops from the integuments; a leaf develops from a leaf primordium.
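The three relationship types can be sketched as labelled edges, using only the examples given on this slide. The `EDGES` list, the relation labels `is_a`, `part_of` and `develops_from`, and the `reachable` function are hypothetical names chosen for this sketch.

```python
# (child, relation, parent) triples taken from the slide's examples.
EDGES = [
    ("silique", "is_a", "fruit"),
    ("panicle", "is_a", "inflorescence"),
    ("ectocarp", "part_of", "pericarp"),
    ("pericarp", "part_of", "fruit"),
    ("seed coat", "develops_from", "integuments"),
    ("leaf", "develops_from", "leaf primordium"),
]


def reachable(term, relations):
    """Follow only the given relation types upward from term, transitively."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child, relation, parent in EDGES:
            if child == current and relation in relations and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


if __name__ == "__main__":
    # The ectocarp is part of the pericarp, which is part of the fruit.
    print(sorted(reachable("ectocarp", {"part_of"})))
    print(sorted(reachable("seed coat", {"develops_from"})))
```

Keeping the relation label on each edge lets a query choose which relationships to follow, for example only `part_of` when asking what structure a tissue belongs to.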

  10. A graphical view of some terms and their relationships. [Figure: graph of vehicle terms. Legend: Instance of, Part of, term. Node labels: 4 wheelers, Two wheelers; SUV, Car, Bus, Truck; Safari, SUMO, Sierra, Indica V2, Indigo, Indiva, Leyland, TATA truck; Tyre, Carburetor, Clutch plate, Shock absorber.] The terms can be children of other parents too.

  11. A graphical view of some terms and their relationships: Plant Ontology (PO). [Figure: graph of plant anatomy terms. Legend: Part of, Instance of, Develops from, term, gene. Node labels: anatomy, Whole plant, organ, tissue, shoot, floral organ, inflorescence, Petal primordium, flower, tapetum, petal, anther, stamen, filament. Gene label: Yfg1.]

  12. Examples of queries that will be possible using annotations with POC terms. What genes are expressed in the same tissues or organs as my gene of interest? When and where are homologous genes expressed in different organisms? What genes are expressed in both monocot and dicot flowers? What genes are expressed in maize leaves but NOT in Arabidopsis or rice?
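Two of the queries listed on this slide can be sketched as set operations over annotations. Everything in the data below is made up for illustration: `geneA`, `geneB` and `geneC` are hypothetical placeholders standing for groups of homologous genes, and the `ANNOTATIONS` table and `genes_expressed` function are assumed names, not part of any POC tool.

```python
# (gene, species, anatomy term) expression annotations; all values hypothetical.
ANNOTATIONS = [
    ("geneA", "maize", "leaf"),
    ("geneA", "rice", "leaf"),
    ("geneB", "maize", "leaf"),
    ("geneB", "Arabidopsis", "flower"),
    ("geneC", "maize", "flower"),
    ("geneC", "Arabidopsis", "flower"),
]


def genes_expressed(species, term):
    """Genes annotated as expressed in the given structure of the given species."""
    return {g for g, s, t in ANNOTATIONS if s == species and t == term}


# Expressed in maize leaves but NOT in Arabidopsis or rice leaves.
maize_leaf_only = (
    genes_expressed("maize", "leaf")
    - genes_expressed("Arabidopsis", "leaf")
    - genes_expressed("rice", "leaf")
)

# Expressed in both monocot (maize) and dicot (Arabidopsis) flowers.
both_flowers = genes_expressed("maize", "flower") & genes_expressed("Arabidopsis", "flower")

if __name__ == "__main__":
    print(sorted(maize_leaf_only))
    print(sorted(both_flowers))
```

These comparisons only work because "leaf" and "flower" mean the same thing in every species' annotations, which is the point of annotating with shared ontology terms.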

  13. Terms are then used to annotate genes • What are the genes that are associated with conferring a pod shatter related trait? • What are the genes that are associated with WUE AND increased root mass phenotype? [Figure: Trait ontology graph linking terms to genes. Legend: term, gene. Term labels: Stress related, Plant anatomy related, Yield related; abiotic (appears twice in the transcript); Root related, Shoot related, Fruit related; WUE, NUE, Cold, Pod related; Root mass, Root volume, Root weight, Pod weight, Pod length, Pod shatter. Gene labels: AGL8, ALC, P5CS.] The terms can be children of other parents too.
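The two questions on this slide can be sketched as lookups that expand a trait term to its more specific child terms before matching genes. The parent-child groupings in `CHILDREN` are assumptions inferred from the figure labels, and `gene1` to `gene3` are hypothetical placeholders; the transcript does not record which gene is annotated with which term, so no real gene-to-trait assignments are implied.

```python
# Assumed groupings of trait terms (inferred from the figure labels).
CHILDREN = {
    "Pod related": {"Pod weight", "Pod length", "Pod shatter"},
    "Root related": {"Root mass", "Root volume", "Root weight"},
}

# Hypothetical gene-to-term annotations, for illustration only.
GENE_TERMS = {
    "gene1": {"Pod shatter"},
    "gene2": {"WUE", "Root mass"},
    "gene3": {"WUE"},
}


def descendants(term):
    """Return the term together with all of its more specific child terms."""
    found = {term}
    stack = [term]
    while stack:
        for child in CHILDREN.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def genes_for(term):
    """Genes annotated with the term or with any of its descendants."""
    terms = descendants(term)
    return {gene for gene, annotated in GENE_TERMS.items() if annotated & terms}


if __name__ == "__main__":
    print(sorted(genes_for("Pod related")))                    # broader query
    print(sorted(genes_for("WUE") & genes_for("Root mass")))   # WUE AND root mass
```

Expanding to descendants means a query on a broad term such as "Pod related" also finds genes annotated only with a specific term such as "Pod shatter".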

  14. Using linguistics for database interoperability: cross-database querying and reporting

  15. CV – Mediated Benefits of Data Integration, Consistency and Accuracy Information sharing • Common understanding amongst all • Reuse of domain knowledge • Standards for interoperability • A community reference. Database interoperability • A common framework for integration Intelligent interface • Querying, indexing & data capture. Support for knowledge intensive applications • Text extraction. • Decision support. • Knowledge discovery • Hypothesis generation.

  16. Challenges • Common platform, user interface and format for a unified CV • Alignment with external public databases (GO / TO / PO) • Heterogeneous sources / heterogeneous IDs • Inconsistency in annotation (tackling legacy ontologies across databases) • Implementation of a company-standard Plant Ontology that would ultimately support the integration of varied kinds of databases with diverse data types

  17. Acknowledgements Jagadish Mittur Shobha Char Vijay Paranjape BSIB team@MRC, Bangalore The organizers of this conference – in particular Dr. Gauri Salokhe (Information Management Officer) Dr. V. C. Patil (The Organizing Secretary)
