The Ontrez project at NCBO

Nigam Shah, nigam@stanford.edu

Public data repositories
• Around 1100 databases in NAR’s 2008 database issue
• High-throughput gene expression data in repositories such as GEO, SMD, ArrayExpress
• Clinical trial repositories such as caBIG, TrialBank, clinicaltrials.gov
• Guideline repositories such as www.guideline.gov
• Image repositories such as BIRN
• Observational studies such as Framingham, NHANES, AMCIS

Database annotation
• Ontology-based annotation is not as widespread as desired
• Most annotation is still free text
• Possible reasons:
  • Lack of a one-stop shop for bio-ontologies
  • Lack of tools to annotate experimental data
    • Manual: Phenote
    • Automatic: ?
  • Lack of a sustainable mechanism to create ontology-based annotations

Different kinds of annotations
• Metadata (dataset "Chronic Bladder Overdistension"): Expression profiling of cultured bladder smooth muscle cells subjected to repetitive mechanical stimulation for 4 hours. Chronic overdistension results in bladder wall thickening, associated with loss of muscle contractility. Results identify genes whose expression is altered by mechanical stimuli.
• Low-level result: ELMO1 expression is altered by mechanical stimuli (plus results from other experiments)
• Summary result annotation: ELMO1 associated_with actin cytoskeleton organization and biogenesis

Annotations as assertions
• Annotation = an assertion declaring a relationship between a biomedical entity and a type in an ontology.
  • e.g. p53 <associated_with> cell death
• Annotations tell us what the biologists believe to be true (in particular or in general)
• Most annotations are based on particular observations and are generalized during interpretation by a biologist/curator.
• Semantics of annotations are not always declared a priori (e.g. associated_with, involves)

Annotations as ‘Meta-data’
• Metadata: the text description accompanying a dataset in a database.
• Metadata annotations should be machine processed (and indexed using ontologies) because:
  • The volume is orders of magnitude larger than that of the summary results
  • These annotations do not state any biological fact
    • Hence they don’t need a curator to create them
  • These annotations are to be used to LOCATE datasets accurately as soon as they are available in a public repository
    • We cannot afford a curation bottleneck

High level goal
• Process the metadata annotations to automatically tag the ‘elements’ in public repositories with as many ontology terms as possible.
• For example, in the case of GEO dataset 906:
  • Expression profiling of cultured bladder smooth muscle cells subjected to repetitive mechanical stimulation for 4 hours. Chronic overdistension results in bladder wall thickening, associated with loss of muscle contractility. Results identify genes whose expression is altered by mechanical stimuli.
• Gets tagged with:
  • Expression, Expression of bladder, bladder, smooth, bladder muscle, muscle, smooth muscle, cells, mechanical, mechanical stimulation, stimulation, Chronic, results, bladder overdistension, associated, associated with, with, loss, genes, altered
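The tagging step above can be illustrated with a minimal sketch of dictionary-based term matching. This is not the Ontrez implementation: the lexicon, the `EX:` concept identifiers, and the function names `tokenize` and `tag` are all hypothetical, and exact n-gram lookup is just one simple way to find ontology terms in free text.

```python
import re

# Hypothetical toy lexicon (term -> made-up concept ID); not real ontology content.
LEXICON = {
    "bladder": "EX:0001",
    "smooth muscle": "EX:0002",
    "muscle": "EX:0003",
    "cells": "EX:0004",
    "mechanical stimulation": "EX:0005",
}

TEXT = ("Expression profiling of cultured bladder smooth muscle cells "
        "subjected to repetitive mechanical stimulation for 4 hours.")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag(text, lexicon, max_words=3):
    """Return (term, concept_id) pairs for every lexicon term found in text.

    Every word n-gram of up to max_words tokens is looked up in the lexicon,
    so overlapping terms ("smooth muscle" and "muscle") are both reported.
    Each term is reported once, in order of first appearance.
    """
    tokens = tokenize(text)
    found, seen = [], set()
    for start in range(len(tokens)):
        for n in range(1, max_words + 1):
            phrase = " ".join(tokens[start:start + n])
            if phrase in lexicon and phrase not in seen:
                seen.add(phrase)
                found.append((phrase, lexicon[phrase]))
    return found


if __name__ == "__main__":
    for term, concept_id in tag(TEXT, LEXICON):
        print(term, concept_id)
```

Reporting overlapping matches rather than only the longest one mirrors the goal stated above of tagging each element with as many ontology terms as possible.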

Tagging [annotating] with ontology terms

Querying the annotation index

What new science do we enable?

New Science enabled
• Nature Biotechnology study on image features and gene expression
• Correlation between protein and gene expression for cancer classification
• Correlating gene expression and drug effect information for predicting drug efficacy
• Training and testing image processing algorithms

Decoding global gene expression programs in liver cancer by noninvasive imaging
Eran Segal, Claude B Sirlin, Clara Ooi, Adam S Adler, Jeremy Gollub, Xin Chen, Bryan K Chan, George R Matcuk, Christopher T Barry, Howard Y Chang & Michael D Kuo
Nature Biotechnology 25, 675-680 (2007). Published online: 21 May 2007

Correlation of protein and gene expression for the stratification of breast cancer patients

There are 20 other diseases for which this is possible!

TMAD incorporates the NCI Thesaurus ontology for searching tissues in the cancer domain. Image processing researchers can extract images and scores for training and testing classification algorithms.

Current status of the prototype

Ontrez: Target resources

Where can we go?
• Become a service for ‘annotating’ biomedical text.
  • People send us text, we send back recognized concepts (maybe even relationships)
  • Given a set of concepts, we provide a similarity metric between them
  • Both of these services can be plugged into a variety of community and collaborative annotation tools
• Become ‘the one stop shop’ for finding items across a wide variety of resources …
  • Integrate on the ‘disease’ dimension. Gene cards exist, disease cards don’t
  • Focus on approx. 15 resources in the next year.
  • PDB and PLoS are interested
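The slide does not say which similarity metric between concepts is meant. As one possible illustration only, the sketch below uses a simple path-based measure over an is-a hierarchy: the fewer edges between two concepts via their lowest common ancestor, the higher the score. The `PARENTS` hierarchy and the function names are hypothetical and not taken from any NCBO ontology or service.

```python
# Hypothetical toy is-a hierarchy (child -> parent); None marks the root.
PARENTS = {
    "smooth muscle": "muscle",
    "skeletal muscle": "muscle",
    "muscle": "tissue",
    "epithelium": "tissue",
    "tissue": None,
}


def ancestors(concept, parents):
    """Return the chain from a concept up to the root, concept first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = parents[concept]
    return chain


def path_similarity(a, b, parents=PARENTS):
    """Score two concepts as 1 / (1 + number of edges between them).

    The path runs through the lowest common ancestor. Identical concepts
    score 1.0; concepts with no common ancestor score 0.0.
    """
    chain_a = ancestors(a, parents)
    chain_b = ancestors(b, parents)
    for steps_a, node in enumerate(chain_a):
        if node in chain_b:
            steps_b = chain_b.index(node)
            return 1.0 / (1 + steps_a + steps_b)
    return 0.0


if __name__ == "__main__":
    print(path_similarity("smooth muscle", "skeletal muscle"))
    print(path_similarity("smooth muscle", "epithelium"))
```

A path-based score needs nothing beyond the ontology structure itself, which makes it easy to offer as a generic service. Measures that also weight concepts by how often they appear in annotations are a common alternative.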

Research questions - 1

Research questions - 2

Credits and collaborations
• Clement Jonquet
• Nipun Bhatia
• Manhong Dai
• Fan Meng
• Brian Athey
• Mark Musen
