ODD-Genes: Accelerating data-driven scientific discovery
NeSC Review 2003
NeSC 2003-09-30
Introduction
• ODD-Genes Background
• Science enabled by ODD-Genes
  • Automating routine statistical conditioning of highly variable microarray results
  • Discovering related data sources
  • Querying discovered data sources for relevant data
  • Identifying significant targets for focussed investigation
• Caveats & further work
ODD-Genes Background
• ODD-Genes is a demonstrator
  • Demonstrates how Grid technologies enable e-Science, accelerating scientific discovery
  • SunDCG’s TOG software allows for job submission on remote compute resources
  • OGSA-DAI provides access, control and discovery of data resources
• ODD-Genes used to investigate Wilms Tumour
  • Routine statistical conditioning of microarray results
  • Data-driven discovery of novel targets for investigation and potential therapy
• Collaborative project
  • NeSC/EPCC, Edinburgh, UK
  • Scottish Centre for Genomic Technology and Informatics, Edinburgh, UK (GTI)
  • Human Genetics Unit at MRC, Western General Hospital, Edinburgh, UK (HGU)
SunDCG - Enabling Routine Statistical Conditioning
• Choose analysis to perform
• Automates analysis process
• Provides predetermined workflow
• Can run more than one analysis at a time
• Multiple reproducible avenues for investigation
• Reduces cost (human, machine), increases availability
• TOG enables this by allowing access to HPC resources
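The idea of choosing analyses from a predetermined workflow and running several at once can be illustrated with a minimal local sketch. This is not the TOG or SunDCG interface: the function names, the two conditioning steps and the toy values are all assumptions made for illustration, and a thread pool stands in for remote HPC job submission.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, median, pstdev

# Hypothetical toy expression values for one microarray (not real data).
RAW = [2.0, 4.0, 4.0, 6.0, 9.0]

def median_centre(values):
    """Subtract the median so the centred values have median 0."""
    m = median(values)
    return [v - m for v in values]

def z_score(values):
    """Scale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# A predetermined workflow: the named analyses a researcher may choose from.
WORKFLOW = {"median_centre": median_centre, "z_score": z_score}

def run_analyses(chosen, values):
    """Run the chosen analyses concurrently and return results keyed by name."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(WORKFLOW[name], values) for name in chosen}
        return {name: fut.result() for name, fut in futures.items()}

results = run_analyses(["median_centre", "z_score"], RAW)
```

Because each analysis is a deterministic function of its input, calling `run_analyses` again with the same arguments gives the same results, which is the sense in which each avenue of investigation is reproducible.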
SunDCG - Conditioning Results
• Results of conditioning can be analysed and investigated
• Researcher potentially has several views of the data to explore, all presented in parallel (cf. the traditional serialised, manual process)
• Researcher can reproduce this initial condition for repeated analyses
• Researcher need not perform each step manually and serially, or ask a dedicated statistician to do so
OGSA-DAI - Results Investigation
• Multiple views of data
  • Raw
  • Heat Map
  • Cluster Map
• Wilms Tumour study takes a new direction
  • Two genes appear significant in early development
  • Researchers would like more information on these genes…
OGSA-DAI - Data Resource Discovery
• OGSA-DAI uses keywords to locate relevant data resources
• May return data resources previously unknown to researcher
• Researcher selects most interesting data resource to query for information about gene
• Researcher selects Mouse atlas – narrow, deep database of spatial gene expression in mouse embryonic development
• Contrast with GTI database of broad, shallow genome-wide gene expression across multiple organisms, stages & conditions
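Keyword-based discovery of this kind can be sketched as a lookup against a registry of described resources. The sketch below is not the OGSA-DAI API: the registry entries, keyword sets and the `discover` function are hypothetical, and ranking by number of shared keywords is an assumption rather than the method the demonstrator used.

```python
# Hypothetical registry entries; names and keywords are illustrative only.
REGISTRY = [
    {"name": "mouse-atlas",
     "keywords": {"mouse", "embryo", "spatial", "gene expression"}},
    {"name": "gti-expression",
     "keywords": {"genome-wide", "gene expression", "multi-organism"}},
    {"name": "unrelated-archive",
     "keywords": {"astronomy", "images"}},
]

def discover(registry, query_keywords):
    """Return names of resources sharing at least one keyword, best match first."""
    query = {k.lower() for k in query_keywords}
    scored = [(len(query & r["keywords"]), r["name"]) for r in registry]
    matches = [(score, name) for score, name in scored if score > 0]
    # Sort by descending overlap, then by name so ties are ordered stably.
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in matches]

hits = discover(REGISTRY, ["gene expression", "mouse", "embryo"])
```

A researcher who knew only the broad GTI-style resource would still see the narrower atlas-style resource ranked first here, which mirrors the point that discovery may surface data sources previously unknown to them.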
OGSA-DAI - Data Resource Query
• OGSA-DAI returns data from query
• Data and annotation displayed
• Data contains references to related images
• Researcher rapidly moves from numeric and textual description to spatial representation of relevant gene expression
• These show that the genes are stem cell markers
• Targets for focussed investigation, potential therapy
ODD-Genes Caveats & Further Work
• ODD-Genes is a demonstrator
  • Need to develop production applications for both routine statistical processing and data resource discovery and query
  • Need to parameterise routine conditioning appropriately to complete automation
• ODD-Genes requires Grid infrastructure
  • Participating researchers need to partner with centres that host application front-ends (or host the infrastructure themselves)
  • However, alternatives are often proprietary, expensive and less flexible
• ODD-Genes requires registration by data-hosts
  • Requires a critical mass of registered data sources