The Ontrez project at NCBO
Nigam Shah, nigam@stanford.edu
Public data repositories
• Around 1100 databases in the 2008 NAR database issue
• High-throughput gene expression data in repositories such as GEO, SMD, ArrayExpress
• Clinical trial repositories such as caBIG, TrialBank, clinicaltrials.gov
• Guideline repositories such as www.guideline.gov
• Image repositories such as BIRN
• Observational studies such as Framingham, NHANES, AMCIS
Database annotation
• Ontology-based annotation is not as widespread as desired
• Most annotation is still free text
• Possible reasons:
  • Lack of a one-stop shop for bio-ontologies
  • Lack of tools to annotate experimental data
    • Manual: Phenote
    • Automatic: ?
  • Lack of a sustainable mechanism to create ontology-based annotations
Different kinds of annotations
[Figure: three kinds of annotation attached to one dataset, "Chronic Bladder Overdistension"]
• Metadata: Expression profiling of cultured bladder smooth muscle cells subjected to repetitive mechanical stimulation for 4 hours. Chronic overdistension results in bladder wall thickening, associated with loss of muscle contractility. Results identify genes whose expression is altered by mechanical stimuli.
• Low-level result: ELMO1 expression is altered by mechanical stimuli (plus other experiments)
• Summary result annotation: ELMO1 associated_with actin cytoskeleton organization and biogenesis
Annotations as assertions
• Annotation = an assertion declaring a relationship between a biomedical entity and a type in an ontology
  • e.g. p53 <associated_with> cell death
• Annotations tell us what biologists believe to be true (in particular or in general)
• Most annotations are based on particular observations and are generalized during interpretation by a biologist/curator
• The semantics of annotations are not always declared a priori (e.g. associated_with, involves)
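The definition above can be pictured as a small record type. This is only a minimal sketch: the `Annotation` class and its field names are hypothetical and are not taken from the Ontrez project; only the p53 example comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One assertion: <entity> <relation> <ontology term> (hypothetical layout)."""
    entity: str    # biomedical entity, e.g. a gene or protein
    relation: str  # relation name; its semantics may not be declared a priori
    term: str      # the type in an ontology that the entity is related to

    def as_triple(self):
        # Return the assertion as a plain (entity, relation, term) tuple.
        return (self.entity, self.relation, self.term)


# The example from the slide: p53 <associated_with> cell death
p53_annotation = Annotation("p53", "associated_with", "cell death")
print(p53_annotation.as_triple())
```

Keeping the relation as a free string mirrors the last bullet: nothing in the record itself says what associated_with or involves formally means.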
Annotations as ‘Meta-data’
• Metadata: the text description accompanying a dataset in a database
• Metadata annotations should be machine-processed (and indexed using ontologies) because:
  • Their volume is orders of magnitude larger than that of the summary results
  • They do not state any biological fact, and hence do not need a curator to create them
  • They are meant to be used to LOCATE datasets accurately as soon as they are available in a public repository; we cannot afford a curation bottleneck
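Locating datasets by their machine-generated tags amounts to a lookup from ontology term to dataset. The following is a rough sketch of that idea using a plain inverted index; the dataset identifiers, the tags in `TOY_TAGS`, and the function names are all invented for illustration and do not describe how Ontrez stores its index.

```python
from collections import defaultdict


def build_index(tagged):
    """Map each (lower-cased) term to the set of dataset ids tagged with it."""
    index = defaultdict(set)
    for dataset_id, terms in tagged.items():
        for term in terms:
            index[term.lower()].add(dataset_id)
    return index


def locate(index, *terms):
    """Return the dataset ids tagged with every one of the given terms."""
    hits = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*hits) if hits else set()


# Hypothetical tagging output for three made-up datasets.
TOY_TAGS = {
    "dataset-A": ["bladder", "smooth muscle", "mechanical stimulation"],
    "dataset-B": ["liver", "gene expression"],
    "dataset-C": ["bladder", "gene expression"],
}

index = build_index(TOY_TAGS)
print(sorted(locate(index, "bladder")))
print(sorted(locate(index, "bladder", "gene expression")))
```

Because the index is built mechanically from the tags, a new dataset becomes findable as soon as its description has been processed, with no curator in the loop.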
High-level goal
• Process the metadata annotations to automatically tag the ‘elements’ in public repositories with as many ontology terms as possible
• For example, the description of GEO dataset 906:
  • Expression profiling of cultured bladder smooth muscle cells subjected to repetitive mechanical stimulation for 4 hours. Chronic overdistension results in bladder wall thickening, associated with loss of muscle contractility. Results identify genes whose expression is altered by mechanical stimuli.
• Gets tagged with:
  • Expression, Expression of bladder, bladder, smooth, bladder muscle, muscle, smooth muscle, cells, mechanical, mechanical stimulation, stimulation, Chronic, results, bladder overdistension, associated, associated with, with, loss, genes, altered
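One simple way to picture this tagging step is dictionary lookup over word n-grams. The sketch below is an assumption-laden illustration, not the Ontrez implementation: `tag_text`, `TOY_LEXICON`, and the `max_len` parameter are hypothetical, and the lexicon is a handful of terms picked from the slide rather than a real ontology. It keeps overlapping matches (e.g. both "smooth muscle" and "muscle"), in the spirit of tagging with as many terms as possible.

```python
import re


def tag_text(text, lexicon, max_len=3):
    """Return every lexicon term that occurs in the text as a word n-gram."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    terms = {term.lower() for term in lexicon}
    found = []
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            if start + length > len(tokens):
                break
            candidate = " ".join(tokens[start:start + length])
            # Keep first-seen order and skip repeats of the same term.
            if candidate in terms and candidate not in found:
                found.append(candidate)
    return found


# Toy stand-in for ontology term names (illustrative only).
TOY_LEXICON = {"bladder", "smooth muscle", "muscle",
               "mechanical stimulation", "genes"}

description = ("Expression profiling of cultured bladder smooth muscle cells "
               "subjected to repetitive mechanical stimulation for 4 hours.")

tags = tag_text(description, TOY_LEXICON)
print(tags)
```

Exact n-gram matching of this kind misses morphological variants, for example an entry "bladder muscle" against the text "bladder smooth muscle", so a real concept recognizer would need more than this sketch does.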
New science enabled
• Nature Biotechnology study on image features and gene expression
• Correlation between protein and gene expression for cancer classification
• Correlating gene expression and drug effect information for predicting drug efficacy
• Training and testing image processing algorithms
Decoding global gene expression programs in liver cancer by noninvasive imaging
Eran Segal, Claude B Sirlin, Clara Ooi, Adam S Adler, Jeremy Gollub, Xin Chen, Bryan K Chan, George R Matcuk, Christopher T Barry, Howard Y Chang & Michael D Kuo
Nature Biotechnology 25, 675-680 (2007). Published online: 21 May 2007
Correlation of protein and gene expression for the stratification of breast cancer patients
TMAD incorporates the NCI Thesaurus ontology for searching tissues in the cancer domain. Image processing researchers can extract images and scores for training and testing classification algorithms.
Where can we go?
• Become a service for ‘annotating’ biomedical text
  • People send us text; we send back recognized concepts (maybe even relationships)
  • Given a set of concepts, we provide a similarity metric between them
  • Both these services can be plugged into a variety of community and collaborative annotation tools
• Become ‘the one-stop shop’ for finding items across a wide variety of resources …
  • Integrate on the ‘disease’ dimension. Gene cards exist, disease cards don’t
  • Focus on approx. 15 resources in the next year
  • PDB and PLoS are interested
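The slide does not say which similarity metric is intended. As one possible illustration, the sketch below scores two concepts by the Jaccard overlap of their ancestor sets in an is_a hierarchy. The hierarchy in `TOY_PARENTS` and the function names are invented for this example; this is a stand-in technique, not the project's metric.

```python
def ancestors(term, parents):
    """Return the term together with all of its ancestors in the hierarchy."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def similarity(a, b, parents):
    """Jaccard overlap of ancestor sets: 1.0 for identical terms, 0.0 if disjoint."""
    anc_a = ancestors(a, parents)
    anc_b = ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


# Toy is_a hierarchy (child -> list of parents), for illustration only.
TOY_PARENTS = {
    "smooth muscle": ["muscle"],
    "skeletal muscle": ["muscle"],
    "muscle": ["tissue"],
    "bladder": ["organ"],
}

print(similarity("smooth muscle", "skeletal muscle", TOY_PARENTS))
print(similarity("smooth muscle", "bladder", TOY_PARENTS))
```

A structural measure like this needs only the ontology itself, which would make it easy to offer as a service alongside concept recognition; corpus-based measures are an alternative when annotation counts are available.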
Credits and collaborations • Clement Jonquet • Nipun Bhatia • Manhong Dai • Fan Meng • Brian Athey • Mark Musen