
MCW Department of Physiology

Accelerating Candidate Gene Discovery through Ontological Indexing of Large Scale Data Repositories. Simon Twigger, Ph.D., MCW Department of Physiology, Human & Molecular Genetics Center. http://rgd.mcw.edu

Presentation Transcript

  1. Accelerating Candidate Gene Discovery through Ontological Indexing of Large Scale Data Repositories - Simon Twigger, Ph.D.

  2. MCW Department of Physiology Human & Molecular Genetics Center http://rgd.mcw.edu

  3. Meet the client

  4. Rat researchers ask... Has anyone done any expression studies using congenic rats? What tissue is this gene expressed in? What expression data is known for SD (aka SD/NHsd, Harlan Sprague Dawley, Sprague Dawley) rats? Are any of these genes associated with my phenotype? Has this gene been seen in the brain? What rat expression studies have been done on Mammary Cancer (aka breast neoplasms, breast cancer, cancer of the breast, breast carcinoma...)?

  5. Really important piece of data... Biological Data Warehouse

  6. Where, what, when? Problem...

  7. Where, what, when? (one) Solution?

  8. How to create the index?

  9. Examine One by One? Example record: "Analysis of anterior pituitary glands of ACI, Copenhagen, and Brown Norway males following treatment with the synthetic estrogen diethylstilbestrol (DES)." Copenhagen = COP; Brown Norway = BN
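
Reading records one by one does not scale to thousands of GEO entries. A minimal sketch of dictionary-based term recognition shows how synonyms such as Copenhagen = COP let the record above be indexed automatically. The `STRAIN_SYNONYMS` table and the `annotate` function are hypothetical illustrations, not the NCBO Annotator's implementation:

```python
import re

# Hypothetical strain dictionary: concept symbol -> synonyms seen in free text.
STRAIN_SYNONYMS = {
    "ACI": ["ACI"],
    "COP": ["COP", "Copenhagen"],
    "BN": ["BN", "Brown Norway"],
}


def annotate(text, dictionary):
    """Return the sorted concept symbols whose synonyms occur in text.

    Matching is case-insensitive and restricted to whole words, so 'BN'
    does not match inside a longer token such as 'BNX'.
    """
    found = set()
    for concept, synonyms in dictionary.items():
        for synonym in synonyms:
            pattern = r"\b" + re.escape(synonym) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(concept)
                break  # one matching synonym is enough for this concept
    return sorted(found)


record = ("Analysis of anterior pituitary glands of ACI, Copenhagen, and Brown "
          "Norway males following treatment with the synthetic estrogen "
          "diethylstilbestrol (DES).")
print(annotate(record, STRAIN_SYNONYMS))  # ['ACI', 'BN', 'COP']
```

Whole-word matching avoids hits inside longer tokens, but short symbols can still collide with unrelated text.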

  10. NCBO ontology services http://bioportal.bioontology.org/annotator

  11. Open Biomedical Annotator http://www.bioontology.org/wiki/index.php/Annotator_Web_service

  12. Initial Ontologies & Workflow • Datasets • Series • Samples
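
The three GEO record types can be modeled as nested objects whose free text is sent for annotation. This is a sketch under assumptions: the class names, the single `text` field and the accessions in the example are hypothetical, and it assumes a Dataset (GDS) is built from one Series (GSE) that groups its Samples (GSM).

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Sample:
    """A GSM record: one sample and its free-text description."""
    accession: str
    text: str


@dataclass
class Series:
    """A GSE record: an experiment grouping related samples."""
    accession: str
    text: str
    samples: List[Sample] = field(default_factory=list)


@dataclass
class Dataset:
    """A GDS record: a curated view, assumed here to come from one series."""
    accession: str
    text: str
    series: Series


def annotatable_texts(dataset: Dataset) -> Iterator[Tuple[str, str]]:
    """Yield (accession, text) pairs for the dataset, its series and samples."""
    yield dataset.accession, dataset.text
    yield dataset.series.accession, dataset.series.text
    for sample in dataset.series.samples:
        yield sample.accession, sample.text


# Hypothetical accessions and text, for illustration only.
series = Series("GSE0000", "Pituitary response to DES", [
    Sample("GSM0001", "ACI male, anterior pituitary"),
    Sample("GSM0002", "BN male, anterior pituitary"),
])
dataset = Dataset("GDS0000", "DES effect on pituitary", series)
for accession, text in annotatable_texts(dataset):
    print(accession, "|", text)
```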

  13. Phase 1: Small Scale Testing

  14. Initial Test Load: • 30 Rat Dataset records (GDS) out of 236 • 32 Series records (GSE) out of 750 • 587 Sample records (GSM) out of 7,288 • Ruby on Rails web application to view data: http://gminer.mcw.edu/
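
The Phase 1 load was a small slice of the rat data. The proportions can be checked directly from the counts on this slide; the sketch only restates those numbers, and reading the second figure as all rat records available at the time is an assumption.

```python
def coverage(loaded: int, total: int) -> float:
    """Fraction of the available records that were included in a load."""
    return loaded / total


# (loaded, available) pairs taken from the initial test load slide.
TEST_LOAD = {
    "Dataset records (GDS)": (30, 236),
    "Series records (GSE)": (32, 750),
    "Sample records (GSM)": (587, 7288),
}

for record_type, (loaded, total) in TEST_LOAD.items():
    print(f"{record_type}: {loaded}/{total} = {coverage(loaded, total):.1%}")
# Dataset records (GDS): 30/236 = 12.7%
# Series records (GSE): 32/750 = 4.3%
# Sample records (GSM): 587/7288 = 8.1%
```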

  15. Parallel Annotation Workflow
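
Calls to a remote annotation web service spend most of their time waiting on the network, so running several at once can shorten a bulk load. A minimal sketch using Python's `concurrent.futures` thread pool; `annotate_record` is a hypothetical stub standing in for the real service call, and the worker count and sample IDs are illustrative assumptions, not the project's settings:

```python
from concurrent.futures import ThreadPoolExecutor


def annotate_record(record):
    """Hypothetical stand-in for one call to a remote annotation service.

    A real workflow would send the record's text over HTTP here; this stub
    returns the known terms that appear as words in the text.
    """
    terms = {"kidney", "liver", "pituitary"}
    words = record["text"].lower().replace(",", " ").split()
    return record["id"], sorted(terms.intersection(words))


def annotate_all(records, max_workers=4):
    """Annotate records concurrently; pool.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(annotate_record, records))


# Hypothetical sample records.
records = [
    {"id": "GSM0001", "text": "Kidney from Dahl salt-sensitive males"},
    {"id": "GSM0002", "text": "Liver, 16 month-old females"},
]
print(annotate_all(records))
# {'GSM0001': ['kidney'], 'GSM0002': ['liver']}
```

Threads suit this I/O-bound case; CPU-heavy local text processing would instead call for processes.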

  16. Concurrent Annotation Results (chart panels: October, August)

  17. Cloud-enabled Workflow?

  18. Results/Demo

  19. Initial Observations - Synonyms: Searching with synonyms can be great: Ept6 = ACI.COP-(D3Mgh16-D3Rat119)/Shul; DES = Diethylstilbestrol

  20. Initial Observations - Synonyms: Searching with synonyms can cause problems: EPT = Estrogen-induced pituitary tumorigenesis; EPT = Ethanolaminephosphotransferase activity
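
One cheap safeguard is to index synonyms across ontologies and flag any synonym that resolves to more than one concept before it is used to expand a search. A sketch, assuming a hypothetical in-memory `CONCEPTS` table built from the EPT expansions above:

```python
from collections import defaultdict

# Hypothetical concept -> synonyms table; both EPT expansions come from the slide.
CONCEPTS = {
    "Estrogen-induced pituitary tumorigenesis": ["EPT"],
    "Ethanolaminephosphotransferase activity": ["EPT"],
    "Diethylstilbestrol": ["DES"],
}


def ambiguous_synonyms(concepts):
    """Map each synonym shared by more than one concept to those concepts."""
    index = defaultdict(set)
    for concept, synonyms in concepts.items():
        for synonym in synonyms:
            index[synonym.upper()].add(concept)  # normalize case before comparing
    return {syn: sorted(names) for syn, names in index.items() if len(names) > 1}


print(ambiguous_synonyms(CONCEPTS))
# {'EPT': ['Estrogen-induced pituitary tumorigenesis', 'Ethanolaminephosphotransferase activity']}
```

Flagged synonyms could then be excluded from automatic expansion or routed to a curator.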

  21. Initial Observations 2 - Rat Strain symbols. Real strain phrases: • "...pituitary gland of the ACI, Copenhagen and Brown Norway rat." • "...16 month-old Sprague-Dawley females that..." • "...expression data from female SD rats with access to lifelong..." • "...Strain or Line: F344/NCrl..." • "...dahl Salt-sensitive (S) rat and S.R(9)x3A congenic rat..." • "...kidneys from Dahl salt-sensitive males..." Rat strain symbols that also match unrelated text: AT, AN, AS, A, B, CD; G (1000 x g); C (°C); TX (abbreviation for Texas). Train classifier on real strain phrases? Look for relevant neighboring terms?
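
The "look for relevant neighboring terms" idea can be sketched as a context filter: accept a strain symbol only when a cue word appears within a few tokens of it. The cue list, window size and symbol set are hypothetical choices for illustration, and this is a simple rule, not the trained classifier the slide also raises:

```python
import re

# Hypothetical cue words suggesting a nearby token names a rat strain.
STRAIN_CUES = {"rat", "rats", "strain", "line", "males", "females", "congenic"}

# Hypothetical subset of strain symbols to look for (case-sensitive).
STRAIN_SYMBOLS = {"SD", "G", "C", "TX", "F344/NCrl"}


def strain_mentions(text, symbols=STRAIN_SYMBOLS, window=3):
    """Return symbols that have a cue word within `window` tokens on either side."""
    tokens = re.findall(r"[A-Za-z0-9/]+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token not in symbols:
            continue
        nearby = tokens[max(0, i - window): i + window + 1]
        if any(word.lower() in STRAIN_CUES for word in nearby):
            hits.append(token)
    return hits


print(strain_mentions("expression data from female SD rats with access to lifelong"))  # ['SD']
print(strain_mentions("Strain or Line: F344/NCrl"))  # ['F344/NCrl']
print(strain_mentions("pellets were spun at 1000 x G"))  # [] (hypothetical sentence)
```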

  22. Initial Observations - Anatomy: Potential synonyms that could be added to MA (Mouse Adult Gross Anatomy Ontology)

  23. Phase 2: All Rat Affy Samples, 1 ontology (Anatomy)

  24. Larger scale data load: • 0 Rat Dataset records (GDS) • 479 Series records (GSE) • 12,012 Sample records (GSM)

  25. Targeted Indexing Mouse Adult Gross Anatomy Ontology

  26. Results/Demo

  27. Linking annotations to data (example genes: Tm2d1, RGD1306410, Svs4, Hbb, Scgb2a1, Alb)

  28. Linking annotations to data: Human (U133, U133v2), Mouse (430, U74, U95) and Rat (U34a/b/c, 230, 230v2) arrays; 62,000 samples x ca. 25,000 genes/sample ≈ 1.5B data points. Example genes: Tm2d1, RGD1306410, Svs4, Hbb, Scgb2a1, Alb. Derived statements: Hbb is_expressed_in rat kidney; Tm2d1 is_expressed_in rat kidney
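
The scale and the linking step can both be sketched. The arithmetic reproduces the slide's estimate (62,000 x 25,000 is 1.55 billion, rounded to 1.5B). The `expressed_in` function is a hypothetical rule, not the project's actual method: it links a gene to a tissue when the gene is called present in at least half of the samples annotated to that tissue. The sample IDs and present calls are toy values invented for illustration, not real data.

```python
from collections import defaultdict

# Scale estimate from the slide: samples x genes measured per sample.
SAMPLES = 62_000
GENES_PER_SAMPLE = 25_000
print(f"{SAMPLES * GENES_PER_SAMPLE:,} data points")  # 1,550,000,000 data points


def expressed_in(sample_tissue, present_calls, min_fraction=0.5):
    """Infer (gene, "is_expressed_in", tissue) statements.

    sample_tissue: sample ID -> anatomy term assigned by text annotation.
    present_calls: sample ID -> set of genes called present in that sample.
    A gene is linked to a tissue when it is present in at least
    `min_fraction` of that tissue's samples (a hypothetical rule).
    """
    samples_per_tissue = defaultdict(int)
    present_per_pair = defaultdict(int)
    for sample, tissue in sample_tissue.items():
        samples_per_tissue[tissue] += 1
        for gene in present_calls.get(sample, set()):
            present_per_pair[(gene, tissue)] += 1
    return sorted(
        (gene, "is_expressed_in", tissue)
        for (gene, tissue), count in present_per_pair.items()
        if count / samples_per_tissue[tissue] >= min_fraction
    )


# Toy inputs invented for illustration; not real expression calls.
sample_tissue = {"S1": "kidney", "S2": "kidney", "S3": "kidney", "S4": "liver"}
present_calls = {
    "S1": {"Hbb", "Tm2d1"},
    "S2": {"Hbb", "Tm2d1", "Alb"},
    "S3": {"Hbb"},
    "S4": {"Alb", "Hbb"},
}
for statement in expressed_in(sample_tissue, present_calls):
    print(*statement)
# Alb is_expressed_in liver
# Hbb is_expressed_in kidney
# Hbb is_expressed_in liver
# Tm2d1 is_expressed_in kidney
```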

  29. Probeset results on GMiner Gabdr

  30. Probeset results on GMiner

  31. RDF Data integration: Triple Store (OpenRDF Sesame, Virtuoso Open Source) holding Mouse Anatomy Ontology, Probeset to RGD ID, Probeset to MA, Rat Genes & xrefs
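
Statements like "Hbb is_expressed_in kidney" map naturally onto RDF triples. A minimal sketch that writes them as N-Triples, a line-based RDF serialization that triple stores such as Sesame and Virtuoso can load. The `http://example.org/gminer/` namespace, the predicate names and the probeset identifier are hypothetical placeholders, not the project's actual vocabulary:

```python
# Hypothetical namespace; not a real GMiner or RGD URI.
BASE = "http://example.org/gminer/"


def iri(local_name):
    """Wrap a local name in the base namespace as an N-Triples IRI."""
    return f"<{BASE}{local_name}>"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) local names, one triple per line."""
    return "\n".join(f"{iri(s)} {iri(p)} {iri(o)} ." for s, p, o in triples)


# Hypothetical triples mirroring the probeset-to-gene and probeset-to-anatomy links.
triples = [
    ("probeset/EXAMPLE_1", "maps_to_gene", "gene/Hbb"),
    ("probeset/EXAMPLE_1", "maps_to_anatomy", "anatomy/kidney"),
    ("gene/Hbb", "is_expressed_in", "anatomy/kidney"),
]
print(to_ntriples(triples))
```

Real data would reuse published ontology and gene IRIs rather than a private namespace, so the store can join against other sources.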

  32. Ongoing • Work on term recognition, strains, etc. • Evaluation of Probeset-to-Anatomy results • Curation interface to add additional terms • RDF formats, Triple Store implementation • Integrate Strain and tissue results into RGD

  33. Education & Outreach

  34. Meet the student

  35. You! More knowledge through education = bigger lever! (Lever diagram: Ontologies, Researchers, Heavy Scientific Problem)

  36. Video #3 is being shot this week

  37. Future Videos: Target is the scientist! • Solve common tasks • Use annotation tools • Evaluate annotations • Intro to specific ontologies • Interview ontology teams • Ideas? • What does your community need?

  38. Acknowledgements • Joey Geiger - Development of GMiner • Jennifer Smith - Video creation, data curation • Rajni Nigam - Rat Strain Ontology • Clement Jonquet - NCBO OBA tools • Trish Whetzel - Video script feedback • Mark Musen & NIH Roadmap Initiative - Our Funding!

  39. Links • http://twigger.hmgc.mcw.edu/ncbo/ Project webpage • http://gminer.mcw.edu Web application • http://github.com/mcwbbc/gminer GMiner code • http://github.com/simont/MCW-RDF RDFizer code • simont@mcw.edu
