
ODD-Genes: Accelerating data-driven scientific discovery

ODD-Genes: Accelerating data-driven scientific discovery. NeSC Review 2003 NeSC 2003-09-30. Introduction. ODD-Genes Background Science enabled by ODD-Genes Automating routine statistical conditioning of highly variable microarray results. Discovering related data sources

Presentation Transcript


  1. ODD-Genes: Accelerating data-driven scientific discovery. NeSC Review 2003. NeSC 2003-09-30

  2. Introduction • ODD-Genes Background • Science enabled by ODD-Genes • Automating routine statistical conditioning of highly variable microarray results. • Discovering related data sources • Querying discovered data sources for relevant data • Identifying significant targets for focussed investigation • Caveats & further work

  3. ODD-Genes Background • ODD-Genes is a demonstrator • Demonstrates how Grid technologies enable e-Science, accelerating scientific discovery • SunDCG’s TOG software allows for job submission on remote compute resources • OGSA-DAI provides access, control and discovery of data resources • ODD-Genes used to investigate Wilms Tumour • Routine statistical conditioning of microarray results • Data-driven discovery of novel targets for investigation and potential therapy • Collaborative project • NeSC/EPCC, Edinburgh, UK • Scottish Centre for Genomic Technology and Informatics, Edinburgh, UK (GTI) • Human Genetics Unit at MRC, Western General Hospital, Edinburgh, UK (HGU)

  4. SunDCG – Enabling Routine Statistical Conditioning • Choose analysis to perform • Automates analysis process • Provides predetermined workflow • Can run more than one analysis at a time • Multiple reproducible avenues for investigation • Reduces cost (human, machine), increases availability • TOG enables this by allowing access to HPC resources

  5. SunDCG - Conditioning Results • Results of conditioning can be analysed and investigated • Researcher has potentially several views of data to explore, all presented in parallel (cf. traditional serialised, manual process) • Researcher can reproduce this initial conditioning for repeated analyses • Researcher need not perform each step manually and serially, or ask a dedicated statistician to do so
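The idea on slides 4 and 5, a predetermined workflow that runs several conditioning analyses at once and returns one view per analysis, can be sketched roughly as follows. This is a minimal illustration only, not the TOG or SunDCG interface: the toy intensities, the three conditioning steps and every function name are assumptions made for the example, and a local thread pool stands in for remote compute resources.

```python
import math
import statistics
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: raw intensities for three placeholder genes on four arrays.
RAW = {
    "geneA": [120.0, 135.0, 980.0, 1010.0],
    "geneB": [400.0, 390.0, 410.0, 405.0],
    "geneC": [60.0, 55.0, 240.0, 260.0],
}


def log2_transform(data):
    """Log2-transform every intensity."""
    return {gene: [math.log2(v) for v in vals] for gene, vals in data.items()}


def median_centre(data):
    """Subtract each gene's median log2 intensity from its values."""
    logged = log2_transform(data)
    return {
        gene: [v - statistics.median(vals) for v in vals]
        for gene, vals in logged.items()
    }


def z_score(data):
    """Scale each gene's log2 intensities to zero mean and unit sample deviation."""
    result = {}
    for gene, vals in log2_transform(data).items():
        mean = statistics.mean(vals)
        sd = statistics.stdev(vals)
        result[gene] = [(v - mean) / sd for v in vals]
    return result


# The "predetermined workflow": a fixed, named set of analyses (assumed for this sketch).
ANALYSES = {
    "log2": log2_transform,
    "median_centred": median_centre,
    "z_score": z_score,
}


def run_conditioning(data, analyses=ANALYSES):
    """Run all chosen analyses concurrently and return one result view per analysis."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, data) for name, fn in analyses.items()}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    for name, view in run_conditioning(RAW).items():
        print(name, {g: [round(v, 2) for v in vals] for g, vals in view.items()})
```

Because the set of analyses is fixed and each one is a pure function of the input, rerunning `run_conditioning` on the same data reproduces the same views, which is the property slide 5 highlights.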

  6. OGSA-DAI - Results Investigation • Multiple views of data • Raw • Heat Map • Cluster Map • Wilms Tumour study takes a new direction • two genes appear significant in early development • Researchers would like more info on these genes…

  7. OGSA-DAI - Data Resource Discovery • OGSA-DAI uses keywords to locate relevant data resources • May return data resources previously unknown to researcher • Researcher selects most interesting data resource to query for information about gene • Researcher selects Mouse atlas – narrow, deep database of spatial gene expression in mouse embryonic development • Contrast with GTI database of broad, shallow genome-wide gene expression across multiple organisms, stages & conditions
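Keyword-based discovery of the kind described on slide 7 can be pictured as a lookup against a registry of described data resources. The sketch below is an assumption-laden illustration, not the OGSA-DAI API: the `DataResource` type, the `discover` function and all keyword sets are hypothetical, and only the two resource descriptions are taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataResource:
    """A registered data resource with free-text description and keywords."""
    name: str
    description: str
    keywords: frozenset = frozenset()


# Hypothetical registry. Descriptions follow slide 7; keywords are invented.
REGISTRY = [
    DataResource(
        "Mouse atlas",
        "Narrow, deep: spatial gene expression in mouse embryonic development",
        frozenset({"mouse", "embryo", "spatial", "gene expression", "development"}),
    ),
    DataResource(
        "GTI database",
        "Broad, shallow: genome-wide gene expression across organisms, stages and conditions",
        frozenset({"genome-wide", "gene expression", "microarray", "multi-organism"}),
    ),
    DataResource(
        "Unrelated resource",
        "Placeholder entry that should not match a gene expression query",
        frozenset({"astronomy"}),
    ),
]


def discover(registry, query_keywords):
    """Return resources sharing at least one keyword with the query, best match first."""
    wanted = {keyword.lower() for keyword in query_keywords}
    scored = [(len(wanted & resource.keywords), resource) for resource in registry]
    hits = [(score, resource) for score, resource in scored if score > 0]
    # Highest overlap first; ties broken alphabetically so the order is stable.
    hits.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [resource for _, resource in hits]


if __name__ == "__main__":
    for resource in discover(REGISTRY, ["gene expression", "embryo", "development"]):
        print(resource.name, "-", resource.description)
```

Ranking by keyword overlap is one simple choice; it lets a resource the researcher has never heard of surface at the top of the list as long as its registered keywords match the query.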

  8. OGSA-DAI - Data Resource Query • OGSA-DAI returns data from query • Data and annotation displayed • Data contains references to related images • Researcher rapidly moves from numeric and textual description to spatial representation of relevant gene expression • These show that the genes are stem cell markers • Targets for focussed investigation, potential therapy
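The query step on slide 8, where a gene lookup returns data, annotation and references to related images, might look something like the sketch below. It uses an in-memory SQLite database purely as a stand-in for a remote data resource; the table layout, gene identifier, stage labels, annotations and image paths are all hypothetical placeholders, not content from the Mouse atlas or calls from OGSA-DAI.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for a remote expression database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE expression ("
        "gene TEXT, stage TEXT, annotation TEXT, image_ref TEXT)"
    )
    # Placeholder rows: none of these values come from a real resource.
    conn.executemany(
        "INSERT INTO expression VALUES (?, ?, ?, ?)",
        [
            ("GENE_X", "stage_2", "placeholder annotation B", "images/gene_x_stage_2.png"),
            ("GENE_X", "stage_1", "placeholder annotation A", "images/gene_x_stage_1.png"),
            ("GENE_Y", "stage_1", "placeholder annotation C", "images/gene_y_stage_1.png"),
        ],
    )
    conn.commit()
    return conn


def query_gene(conn, gene):
    """Return annotation and image references for one gene, ordered by stage."""
    cursor = conn.execute(
        "SELECT stage, annotation, image_ref FROM expression "
        "WHERE gene = ? ORDER BY stage",
        (gene,),
    )
    return [
        {"stage": stage, "annotation": annotation, "image_ref": image_ref}
        for stage, annotation, image_ref in cursor.fetchall()
    ]


if __name__ == "__main__":
    connection = build_demo_db()
    for row in query_gene(connection, "GENE_X"):
        # The image reference is what lets a front-end jump from text to a picture.
        print(row["stage"], row["annotation"], row["image_ref"])
```

Returning the image reference alongside each record is what would let a front-end move directly from the numeric and textual description to the spatial representation.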

  9. ODD-Genes Caveats & Further Work • ODD-Genes is a demonstrator • Need to develop production applications for both routine statistical processing and data resource discovery and query • Need to parameterise routine conditioning appropriately to complete automation • ODD-Genes requires Grid infrastructure • Participating researchers need to partner with centres that host application front-ends (or host the infrastructure themselves) • However, alternatives are often proprietary, expensive, less flexible • ODD-Genes requires registration by data-hosts • Needs a critical mass of registered data sources
