
Anatomy ontology evaluation @ ArrayExpress






Presentation Transcript


  1. Anatomy ontology evaluation @ ArrayExpress Helen Parkinson, PhD

  2. Content • ArrayExpress use cases • Fuzzy matching of ontology terms • Data driven ontology building • Wish list

  3. ArrayExpress: Overview [Diagram labels: Submit, Hybs, Public/Private, Re-annotate, Summarize, ATLAS, Public Only, Experiment queries, Gene queries, Cross expt/species queries, Genes]

  4. Fuzzy matching of ontology terms – why? • Clean up ArrayExpress OE and synonym tables • OE based integration • Constrain OEs on data entry/validation • Improved searches in repository/DW web interface • Data integration across species, experiments and experimental designs • Automated mapping of free text to ontology terms for data import

  5. Phonetic Matching • Precompute phonetic encodings of all terms in the ontology • Match each target term by comparing these encodings • Soundex: Robert Russell and Margaret Odell (1918), described by Donald Knuth in The Art of Computer Programming • Double Metaphone: Lawrence Philips (2000) • Metaphone: Lawrence Philips (1990) • Most matches are single • Highest success rate
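The precompute-then-compare scheme on this slide can be sketched with American Soundex, the simplest of the three encodings listed. This is a minimal sketch under assumptions, not the ArrayExpress code: multi-word terms are encoded token by token, the function names (`soundex`, `encode_term`, `build_index`, `match`) are hypothetical, and the three anatomy labels stand in for a real ontology.

```python
from collections import defaultdict

# American Soundex digit groups; vowels and h, w, y get no digit.
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the four-character American Soundex code of `word`."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0].upper()]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        # h and w do not separate equal codes; vowels (including y) do.
        if ch not in "hw":
            prev = code
    return "".join(result)[:4].ljust(4, "0")


def encode_term(term: str) -> tuple[str, ...]:
    """Hypothetical multi-word handling: one Soundex code per token."""
    codes = (soundex(tok) for tok in term.split())
    return tuple(code for code in codes if code)


def build_index(labels: list[str]) -> dict[tuple[str, ...], list[str]]:
    """Precompute the encoding of every ontology label once."""
    index: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for label in labels:
        index[encode_term(label)].append(label)
    return index


def match(target: str, index: dict[tuple[str, ...], list[str]]) -> list[str]:
    """Return every label whose encoding equals the target's encoding."""
    return index.get(encode_term(target), [])


# Illustrative labels only, not an extract of a real ontology.
index = build_index(["hypothalamus", "hippocampus", "cerebral cortex"])
print(match("hypothalmus", index))  # ['hypothalamus']
print(match("Hipocampus", index))   # ['hippocampus']
```

Because only exact code equality counts, a dropped consonant can still defeat the match: "cerebal" encodes to C614 while "cerebral" encodes to C616.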

  6. Algorithm comparisons

  7. Percent matches using automated mapping

  8. Failures to match • Species (or Kingdom)-specific terms (e.g. plant anatomy) • Conflated terms (e.g. diseased cell types) • Compound terms (e.g. "cerebral cortex and hypothalamus") • Genuinely missing terms • Esoteric terms less of a priority • Most trivial misspellings, however, were matched • Dirty input data

  9. Implications • Need more terms in some commonly used ontologies • Synonyms are important: they generate less noise and give better coverage • Choice of ontology can limit expressivity, which will frustrate biologists
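A small illustration of why synonyms matter, as a sketch under assumptions: if every label and synonym is normalised and indexed against its term ID, a free-text value that uses a synonym resolves by exact lookup instead of falling through to fuzzy matching. The `HYP:` IDs, the dictionary layout and the helper names are hypothetical.

```python
import re


def normalise(text: str) -> str:
    """Lower-case, turn runs of punctuation/space into one space, trim."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def build_lookup(ontology: dict[str, dict]) -> dict[str, str]:
    """Map each normalised label and synonym to its term ID."""
    lookup: dict[str, str] = {}
    for term_id, entry in ontology.items():
        for name in [entry["label"], *entry.get("synonyms", [])]:
            # Keep the first ID seen if two terms share a name.
            lookup.setdefault(normalise(name), term_id)
    return lookup


# Hypothetical term IDs; the label/synonym pairs are illustrative.
ontology = {
    "HYP:0001": {"label": "erythrocyte", "synonyms": ["red blood cell"]},
    "HYP:0002": {"label": "leukocyte", "synonyms": ["white blood cell"]},
}
lookup = build_lookup(ontology)

print(lookup.get(normalise("Red Blood Cell")))    # HYP:0001
print(lookup.get(normalise("white-blood cell")))  # HYP:0002
print(lookup.get(normalise("leucocyte")))         # None: variant not listed
```

Adding "leucocyte" as a synonym of HYP:0002 would resolve the last lookup without any fuzzy step, which is the coverage gain this slide points to.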

  10. Why? • Clean up ArrayExpress OE and synonym tables • Add accessions/DB links to these tables • Constrain OEs on data entry/validation • Improved searches in repository/DW web interface • Generate suggestions for new OE terms • Evaluate domain coverage by a given ontology
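Evaluating domain coverage and generating suggestions for new OE terms can share one pass over the annotation values. A sketch under assumptions: coverage is taken as the percentage of distinct (lower-cased) values that a mapper resolves, any callable returning a term ID or `None` stands in for the real matcher, and the example values and IDs are invented for illustration.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


def evaluate_coverage(
    values: Iterable[str],
    mapper: Callable[[str], Optional[str]],
) -> tuple[float, list[tuple[str, int]]]:
    """Return (% of distinct values mapped, unmapped values by frequency)."""
    distinct = Counter(v.strip().lower() for v in values if v.strip())
    unmapped = Counter({v: n for v, n in distinct.items() if mapper(v) is None})
    if not distinct:
        return 0.0, []
    percent = 100.0 * (len(distinct) - len(unmapped)) / len(distinct)
    # Most frequent unmapped values are the strongest candidates for new terms.
    return percent, unmapped.most_common()


# Hypothetical mapper: a plain dictionary lookup with made-up IDs.
known = {"liver": "HYP:0101", "kidney": "HYP:0102"}
values = ["liver", "Liver", "kidney", "rosette leaf", "rosette leaf",
          "cerebral cortex and hypothalamus"]
percent, suggestions = evaluate_coverage(values, known.get)
print(percent)      # 50.0
print(suggestions)  # [('rosette leaf', 2), ('cerebral cortex and hypothalamus', 1)]
```

The invented unmapped values echo two failure types from slide 8 (a species-specific term and a compound term), so the suggestion list needs curation before anything is added to an ontology.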

  11. ArrayExpress Ontology Development and Future Directions Developing the Ontology • Define Scope: ArrayExpress already has some useful structure in the current database, plus a rich source of use cases and competency questions. • Build: Ontology Capture: Identify key concepts and relationships within our domain and give explicit definitions to these features: • Middle-out approach – specify a core of basic terms, then specialise and generalise as required • Mappings – text mining approach for initial semi-automated mappings to external resources, giving rapid coverage • Manual mapping for data warehouse data and selected data sets
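The split between semi-automated and manual mapping can be sketched as a triage step: exact (case-insensitive) matches are accepted automatically and everything else goes to a curator with a short list of candidates. As an assumption, Python's `difflib.get_close_matches` stands in for the text-mining step; the 0.8 cutoff, the function name `propose_mappings` and the example labels are illustrative choices, not taken from the slides.

```python
import difflib


def propose_mappings(
    values: list[str], labels: list[str], cutoff: float = 0.8
) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Auto-accept exact matches; queue the rest with candidates for review."""
    by_lower = {label.lower(): label for label in labels}
    accepted: dict[str, str] = {}
    review: dict[str, list[str]] = {}
    for value in values:
        key = value.strip().lower()
        if key in by_lower:
            accepted[value] = by_lower[key]
        else:
            # Up to three similar labels (SequenceMatcher ratio >= cutoff).
            close = difflib.get_close_matches(key, list(by_lower), n=3, cutoff=cutoff)
            review[value] = [by_lower[c] for c in close]
    return accepted, review


accepted, review = propose_mappings(
    ["Hypothalamus", "hipocampus", "rosette leaf"],
    ["hypothalamus", "hippocampus", "cerebellum"],
)
print(accepted)  # {'Hypothalamus': 'hypothalamus'}
print(review)    # {'hipocampus': ['hippocampus'], 'rosette leaf': []}
```

A value with an empty candidate list, like "rosette leaf" here, is the kind that needs manual mapping or a new term.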

  12. ArrayExpress Ontology Development and Future Directions Capture to Code: Definitions and Hierarchy

  13. ArrayExpress Ontology Development and Future Directions Semantic Roadmap • Position of the ArrayExpress Experimental Factor Ontology in the ‘bigger picture’ • Key is orthogonal coverage, reuse of existing resources and shared frameworks [Diagram labels: AE Ontology, Chemical Entities of Biological Interest (ChEBI), NCI, Cell Type Ontology, Various Species Anatomy Ontologies, Common Anatomy Reference Ontology, Disease Ontology]

  14. Wish list • NOT to build our own anatomy ontology • CARO extension • CARO evaluation • Mapping CARO to relevant multi-species ontologies • Application of CARO to ArrayExpress data • Use of CARO in ArrayExpress tools

  15. Acknowledgments • Anna Farne • Ele Holloway • James Malone • Margus Lukk • ArrayExpress Production Team • Helen Parkinson • Tim Rayner • Faisal Rezwan • Eleanor Williams • Mengyao Zhao • Holly Zheng
