
The Ontrez project at NCBO




  1. The Ontrez project at NCBO. Nigam Shah, nigam@stanford.edu

  2. Public data repositories
     • Around 1100 databases in the 2008 NAR database issue.
     • High-throughput gene expression data in repositories such as GEO, SMD, ArrayExpress
     • Clinical trial repositories such as caBIG, TrialBank, clinicaltrials.gov
     • Guideline repositories such as www.guideline.gov
     • Image repositories such as BIRN
     • Observational studies such as Framingham, NHANES, AMCIS.

  3. Database annotation
     • Ontology-based annotation is not as widespread as desired
     • Most annotation is still free text
     • Possible reasons:
       • Lack of a one-stop shop for bio-ontologies
       • Lack of tools to annotate experimental data
         • Manual: Phenote
         • Automatic: ?
       • Lack of a sustainable mechanism to create ontology-based annotations

  4. Different kinds of annotations
     Figure labels: low-level result; metadata; summary result annotation; "Chronic Bladder Overdistension".
     • ELMO1 expression is altered by mechanical stimuli (followed by other experiments)
     • ELMO1 associated_with actin cytoskeleton organization and biogenesis
     • Expression profiling of cultured bladder smooth muscle cells subjected to repetitive mechanical stimulation for 4 hours. Chronic overdistension results in bladder wall thickening, associated with loss of muscle contractility. Results identify genes whose expression is altered by mechanical stimuli.

  5. Annotations as assertions
     • Annotation = an assertion declaring a relationship between a biomedical entity and a type in an ontology.
       • e.g. p53 <associated_with> cell death
     • Annotations tell us what biologists believe to be true (in particular or in general)
     • Most annotations are based on particular observations and are generalized during interpretation by a biologist/curator.
     • Semantics of annotations are not always declared a priori (e.g. associated_with, involves)
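The assertion view of an annotation can be sketched as a small record type. This is only an illustration: the class name `Annotation`, its field names, and the helper `as_triple` are hypothetical and not taken from Ontrez.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One assertion: entity <relation> ontology term (hypothetical model)."""
    entity: str    # biomedical entity, e.g. a gene or protein
    relation: str  # relationship name; its semantics may not be declared
    term: str      # a type (term) in an ontology


def as_triple(a: Annotation) -> str:
    """Render the assertion in the 'entity <relation> term' form used on the slide."""
    return f"{a.entity} <{a.relation}> {a.term}"


# The example from the slide.
p53_death = Annotation(entity="p53", relation="associated_with", term="cell death")
triple = as_triple(p53_death)
```

Keeping the relation as a plain string mirrors the point above that relation semantics such as associated_with are often left undeclared.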

  6. Annotations as ‘Meta-data’
     • Metadata: the text description accompanying a dataset in a database.
     • Metadata annotations should be machine processed (and indexed using ontologies) because
       • The volume is orders of magnitude greater than that of the summary results
       • These annotations do not state any biological fact
         • Hence they don’t need a curator to create them
       • These annotations are used to LOCATE datasets accurately as soon as they are available in a public repository
         • We cannot afford a curation bottleneck

  7. High-level goal
     • Process the metadata annotations to automatically tag the ‘elements’ in public repositories with as many ontology terms as possible.
     • For example, in the case of GEO dataset 906:
       • Expression profiling of cultured bladder smooth muscle cells subjected to repetitive mechanical stimulation for 4 hours. Chronic overdistension results in bladder wall thickening, associated with loss of muscle contractility. Results identify genes whose expression is altered by mechanical stimuli.
     • Gets tagged with:
       • Expression, Expression of bladder, bladder, smooth, bladder muscle, muscle, smooth muscle, cells, mechanical, mechanical stimulation, stimulation, Chronic, results, bladder overdistension, associated, associated with, with, loss, genes, altered
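A minimal sketch of this kind of tagging, assuming a plain dictionary lookup over word n-grams. The slides do not describe how Ontrez recognizes terms, so this is a stand-in technique rather than the project's method; the function `tag_text` and the tiny `lexicon` are hypothetical, and a real system would load terms from actual ontologies.

```python
import re


def tag_text(text, lexicon, max_words=3):
    """Return lexicon terms that occur as contiguous word sequences in text.

    Overlapping matches are all kept (e.g. both 'muscle' and 'smooth muscle'),
    since the slide's example output also contains overlapping tags.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    terms = {t.lower() for t in lexicon}
    found = []
    for n in range(1, max_words + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in terms and phrase not in found:
                found.append(phrase)
    return found


# Toy lexicon (assumption): a handful of terms standing in for ontology labels.
lexicon = {"bladder", "muscle", "smooth muscle", "cells", "mechanical stimulation"}
description = (
    "Expression profiling of cultured bladder smooth muscle cells "
    "subjected to repetitive mechanical stimulation for 4 hours."
)
tags = tag_text(description, lexicon)
```

Exact string matching like this cannot produce a tag such as "bladder muscle", whose words are not adjacent in the description, so the real tagger evidently does more than this sketch.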

  8. Tagging [annotating] with ontology terms

  9. Querying the annotation index

  10. What new science do we enable?

  11. New science enabled
     • Nature study on image features and gene expression
     • Correlation between protein and gene expression for cancer classification
     • Correlating gene expression and drug effect information for predicting drug efficacy
     • Training and testing image processing algorithms

  12. Decoding global gene expression programs in liver cancer by noninvasive imaging. Eran Segal, Claude B Sirlin, Clara Ooi, Adam S Adler, Jeremy Gollub, Xin Chen, Bryan K Chan, George R Matcuk, Christopher T Barry, Howard Y Chang & Michael D Kuo. Nature Biotechnology 25, 675-680 (2007). Published online: 21 May 2007.

  13. Correlation of protein and gene expression for the stratification of breast cancer patients

  14. There are 20 other diseases for which this is possible!

  15. TMAD incorporates the NCI Thesaurus ontology for searching tissues in the cancer domain. Image processing researchers can extract images and scores for training and testing classification algorithms.

  16. Current status of the prototype

  17. Ontrez: Target resources

  18. Where can we go?
     • Become a service for ‘annotating’ biomedical text.
       • People send us text, we send back recognized concepts (maybe even relationships)
       • Given a set of concepts, we provide a similarity metric between them
       • Both these services can be plugged into a variety of community and collaborative annotation tools
     • Become ‘the one stop shop’ for finding items across a wide variety of resources …
       • Integrate on the ‘disease’ dimension. Gene cards exist, disease cards don’t
     • Focus on approx. 15 resources in the next year.
       • PDB and PLoS are interested
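The slides do not say which similarity metric is intended. As one possible illustration, the sketch below scores two concepts by the length of the path through their nearest common ancestor in an is-a hierarchy. The functions `ancestors` and `path_similarity` and the toy `parent_of` hierarchy are hypothetical, the hierarchy is not taken from any real ontology, and it is assumed to be acyclic with at most one parent per term.

```python
def ancestors(term, parent_of):
    """Return the chain from term up to its root, starting with term itself."""
    chain = [term]
    while term in parent_of:
        term = parent_of[term]
        chain.append(term)
    return chain


def path_similarity(a, b, parent_of):
    """Score 1 / (1 + edges on the path via the nearest common ancestor).

    Returns 1.0 for identical terms and 0.0 when no common ancestor exists.
    """
    up_a = ancestors(a, parent_of)
    up_b = ancestors(b, parent_of)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            steps_b = up_b.index(node)
            return 1.0 / (1 + steps_a + steps_b)
    return 0.0


# Toy is-a hierarchy (assumption): child term -> parent term.
parent_of = {
    "smooth muscle": "muscle",
    "skeletal muscle": "muscle",
    "muscle": "tissue",
    "epithelium": "tissue",
}
sibling_score = path_similarity("smooth muscle", "skeletal muscle", parent_of)
distant_score = path_similarity("smooth muscle", "epithelium", parent_of)
```

In this toy hierarchy the two muscle types score higher than smooth muscle and epithelium, because their common ancestor is closer. Real ontologies allow multiple parents, so a production metric would need a graph search rather than a single parent chain.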

  19. Research questions - 1

  20. Research questions - 2

  21. Credits and collaborations
     • Clement Jonquet
     • Nipun Bhatia
     • Manhong Dai
     • Fan Meng
     • Brian Athey
     • Mark Musen
