Accelerating Candidate Gene Discovery through Ontological Indexing of Large Scale Data Repositories
Simon Twigger, Ph.D.
MCW Department of Physiology
Human & Molecular Genetics Center
http://rgd.mcw.edu
Rat researchers ask...
• Has anyone done any expression studies using congenic rats?
• What tissue is this gene expressed in?
• What expression data is known for SD (aka SD/NHsd, Harlan Sprague Dawley, Sprague Dawley) rats?
• Are any of these genes associated with my phenotype?
• Has this gene been seen in the brain?
• What rat expression studies have been done on Mammary Cancer (aka breast neoplasms, breast cancer, cancer of the breast, breast carcinoma...)?
Really important piece of data... Biological Data Warehouse
Where, what, when? Problem...
Where, what, when? (one) Solution?
Examine One by One?
Analysis of anterior pituitary glands of ACI, Copenhagen, and Brown Norway males following treatment with the synthetic estrogen diethylstilbestrol (DES).
Copenhagen = COP
Brown Norway = BN
NCBO ontology services http://bioportal.bioontology.org/annotator
Open Biomedical Annotator http://www.bioontology.org/wiki/index.php/Annotator_Web_service
Initial Ontologies & Workflow • Datasets • Series • Samples
Initial Test Load:
• 30 Rat Dataset records (GDS) out of 236
• 32 Series records (GSE) out of 750
• 587 Sample records (GSM) out of 7288
Ruby on Rails web application to view data: http://gminer.mcw.edu/
Concurrent Annotation Results (chart: August, October)
Initial Observations - Synonyms
Searching with synonyms can be great:
Ept6 = ACI.COP-(D3Mgh16-D3Rat119)/Shul
DES = Diethylstilbestrol
Initial Observations - Synonyms
Searching with synonyms can cause problems:
Estrogen-induced pituitary tumorigenesis = EPT
Ethanolaminephosphotransferase activity = EPT
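Both sides of the synonym story can be shown with a small lookup table. The following is a minimal sketch, not the GMiner or NCBO Annotator implementation: the function names and the case-insensitive matching are assumptions made for illustration, and the synonym pairs are the ones quoted on the slides.

```python
from collections import defaultdict

# (synonym, concept) pairs quoted on the slides.
SYNONYMS = [
    ("Ept6", "ACI.COP-(D3Mgh16-D3Rat119)/Shul"),
    ("DES", "Diethylstilbestrol"),
    ("EPT", "Estrogen-induced pituitary tumorigenesis"),
    ("EPT", "Ethanolaminephosphotransferase activity"),
]


def build_index(pairs):
    """Map each lower-cased synonym to the set of concepts it may denote.

    Case-insensitive matching is an assumption of this sketch.
    """
    index = defaultdict(set)
    for synonym, concept in pairs:
        index[synonym.lower()].add(concept)
    return index


def lookup(index, term):
    """Return every concept a search term could refer to, sorted by name."""
    return sorted(index.get(term.lower(), set()))


def ambiguous(index):
    """Return the synonyms that point at more than one concept."""
    return sorted(s for s, concepts in index.items() if len(concepts) > 1)


index = build_index(SYNONYMS)
print(lookup(index, "Ept6"))  # a single congenic strain: the helpful case
print(lookup(index, "EPT"))   # two unrelated concepts: the problem case
print(ambiguous(index))       # synonyms that need disambiguation or curation
```

Listing the ambiguous synonyms up front is one way to decide which terms need manual review before they are used for indexing.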
Initial Observations 2
Rat Strain symbols: AT, AN, AS, A, B, CD, G (1000 x g), C (˚C), TX (Abbreviation for Texas)
Real strain phrases:
...pituitary gland of the ACI, Copenhagen and Brown Norway Rat.
...16 month-old Sprague-Dawley females that...
...expression data from female SD rats with access to lifelong...
...Strain or Line: F344/NCrl ...
...dahl Salt-sensitive (S) rat and S.R(9)x3A congenic rat....
...kidneys from Dahl salt-sensitive males...
Train classifier on real strain phrases? Look for relevant neighboring terms?
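The "relevant neighboring terms" idea could be sketched as a simple context window: accept a strain symbol only when a rat-related cue word sits close to it. This is a hypothetical heuristic, not the method used in GMiner; the cue-word list, the window size, and the function name are all assumptions.

```python
import re

# Strain symbols from the slides, including the short ones that also
# match ordinary text (G as in "1000 x g", C as in degrees, and so on).
STRAIN_SYMBOLS = {"SD", "ACI", "COP", "BN",
                  "AT", "AN", "AS", "A", "B", "CD", "G", "C", "TX"}

# Hypothetical cue words suggesting that a nearby symbol names a strain.
CONTEXT_CUES = {"rat", "rats", "strain", "male", "males",
                "female", "females", "congenic"}


def strain_mentions(text, symbols=STRAIN_SYMBOLS, window=2):
    """Return symbol tokens that have a cue word within `window` tokens."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.upper() not in symbols:
            continue
        # Look at up to `window` tokens on each side of the candidate.
        neighbors = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if any(n.lower() in CONTEXT_CUES for n in neighbors):
            hits.append(token)
    return hits


print(strain_mentions("expression data from female SD rats with access to lifelong"))
print(strain_mentions("centrifuged at 1000 x g for 10 min"))
```

A window this small will still misfire on phrases such as "a rat", where the article "a" sits next to a cue word, which is one reason a classifier trained on real strain phrases remains an open option.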
Initial Observations - Anatomy Potential synonyms that could be added to MA
Larger scale data load
• 0 Rat Dataset records (GDS)
• 479 Series records (GSE)
• 12,012 Sample records (GSM)
Targeted Indexing Mouse Adult Gross Anatomy Ontology
Linking annotations to data
Tm2d1, RGD1306410, Svs4, Hbb, Scgb2a1, Alb
Linking annotations to data
Human (U133, U133v2), Mouse (430, U74, U95) and Rat (U34a/b/c, 230, 230v2)
62,000 samples x ca. 25,000 genes/sample = 1.5B data points
Tm2d1, RGD1306410, Svs4, Hbb, Scgb2a1, Alb
Hbb is_expressed_in rat kidney
Tm2d1 is_expressed_in rat kidney
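The gene-to-tissue statements above have the subject, predicate, object shape of RDF triples. As a rough illustration only, they could be serialized as N-Triples lines like this; the `http://example.org/gminer/` base URI and the `gene/` and `tissue/` path layout are placeholders invented for the sketch, not the project's actual identifiers.

```python
# Placeholder namespace: an assumption for this sketch, not a real project URI.
BASE = "http://example.org/gminer/"


def triple(gene, predicate, tissue):
    """Format one gene/predicate/tissue statement as an N-Triples line."""
    tissue_id = tissue.replace(" ", "_")  # keep the URI free of spaces
    return (f"<{BASE}gene/{gene}> "
            f"<{BASE}{predicate}> "
            f"<{BASE}tissue/{tissue_id}> .")


# The two statements shown on the slide.
statements = [
    ("Hbb", "is_expressed_in", "rat kidney"),
    ("Tm2d1", "is_expressed_in", "rat kidney"),
]
for gene, predicate, tissue in statements:
    print(triple(gene, predicate, tissue))

# Scale quoted on the slide: 62,000 samples x ca. 25,000 genes per sample.
data_points = 62_000 * 25_000
print(f"{data_points:,} data points")  # the slide rounds this to 1.5B
```

Lines in this form can be bulk-loaded by most triple stores, which is what makes a plain-text serialization a convenient intermediate step.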
RDF Data integration
• Triple Store: OpenRDF Sesame, Virtuoso Open Source
• Data: Mouse Anatomy Ontology, Probeset to RGD ID, Probeset to MA, Rat Genes & xrefs
Ongoing • Work on term recognition, strains, etc. • Evaluation of Probeset-to-Anatomy results • Curation interface to add additional terms • RDF formats, Triple Store implementation • Integrate Strain and tissue results into RGD
You! More knowledge through education = bigger lever! Ontologies Researchers Heavy Scientific Problem
Future Videos
Target is the scientist! • Solve common tasks • Use annotation tools • Evaluate annotations • Intro to specific ontologies • Interview ontology teams • Ideas? • What does your community need?
Acknowledgements • Joey Geiger - Development of GMiner • Jennifer Smith - Video creation, data curation • Rajni Nigam - Rat Strain Ontology • Clement Jonquet - NCBO OBA tools • Trish Whetzel - Video script feedback • Mark Musen & NIH Roadmap Initiative - Our Funding!
Links • http://twigger.hmgc.mcw.edu/ncbo/ Project webpage • http://gminer.mcw.edu Web application • http://github.com/mcwbbc/gminer Gminer Code • http://github.com/simont/MCW-RDF RDFizer code
simont@mcw.edu