Power of integrative approaches for genomic medicine Data and Knowledge Base Integration Analysis and Understanding
The Challenge
• Flood of high-throughput biological data: genomic sequence, global mRNA expression profiles, copy number and LOH, epigenetic data, protein level and modification status, metabolite profiles
• Proliferation of tools: databases, visualization, and analysis
• Each data type is accessed, analyzed, and visualized separately
The Need
• To gain insights through integrative studies of all data types
• To overcome the difficulty of getting tools to work together
Computational Genomics
Hypothetical use case: no single tool suffices
[Figure: workflow diagram that runs from an idea to a conclusion across five tools: GenePattern, Cytoscape, IGV/UCSC, Genomica, and CMAP. The step order is not recoverable from the text. Recoverable labels:]
• Data nodes: Compendium, Network, Expression, Alterations, Pathway activation
• Load compendium; Show module map; Extract module
• Show network; Expand +1 (include neighbors)
• Show chromosome; Add Transcription Factor track from UCSC
• Learn p53 site/score on promoter
• Test for similarity of p53 and gene location ("Looks close to p53 site": atcgcgtttattcgataagg vs. atcgcgttttttcgataagg)
• GSEA test enrichment
• ComparativeMarkerSelection (added to GenePattern)
• Annotation: Arrests G2/M
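Actual use case (Chang lab @ Stanford): no single tool suffices
[Figure: workflow diagram across five tools: GenePattern, Cytoscape, IGV/UCSC, Genomica, and MSigDB. The step order is not recoverable from the text. Recoverable labels:]
• GenePattern modules: GEOImporter, Expression File Creator, ComparativeMarkerSelection
• Data nodes: Expression, Compendium, Network
• Load compendium; Show module map
• Download gene lists; Check for enrichment
• Add Transcription Factor track from UCSC; Check for common TFs
• Visualize on protein network
• End point: Conclusion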
9 steps, 5 tools, 6 transitions
[Figure: nine numbered steps distributed across five tools: GenePattern, Cytoscape, Genomica, MSigDB, and UCSC.]
Stanford's data types
• Gene Expression
• Gene Lists
• Sample metadata
• Gene Ontology
• Protein Network
• Copy Number (aCGH)
Here is the real problem
• They (Stanford) have the skills to make these transitions
• Most labs/biologists can't do this
9 steps, 5 tools, 6 transitions, plus:
• 11 Perl scripts to transform, map, extract, etc.
• 14 different file formats, 3+ identifier formats
DBP1_required_files liefeld$ ls *.pl
AffyID2GeneSymbol.pl
GES_exp.pl
acgh2tab.pl
combinegct.pl
extract.pl
gct2tab.pl
generate_res.pl
gmt2tab.pl
grepAffyID2GeneSymbol.pl
symbol2probe.pl
upcase.pl
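The slides list the glue scripts but do not show their contents, so the following is only a sketch of the kind of step they imply: converting a GCT expression file to a plain tab-delimited table while mapping probe IDs to gene symbols. The function name `gct_to_tab`, the sample data, and the probe-to-symbol map are hypothetical; only the GCT layout (version line, dimensions line, header row with Name and Description columns) follows the GenePattern file format.

```python
import csv
import io


def gct_to_tab(gct_text, probe_to_symbol):
    """Convert GCT text to a tab-delimited table keyed by gene symbol.

    Hypothetical helper: it illustrates the kind of transform the Perl
    scripts performed, not their actual logic.
    """
    lines = gct_text.strip().splitlines()
    if not lines or not lines[0].startswith("#1."):
        raise ValueError("not a GCT file: missing '#1.x' version line")
    n_rows, n_cols = (int(x) for x in lines[1].split("\t")[:2])

    reader = csv.reader(lines[2:], delimiter="\t")
    header = next(reader)  # Name, Description, then one column per sample
    samples = header[2:]
    if len(samples) != n_cols:
        raise ValueError("sample count does not match the dimensions line")

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Symbol"] + samples)
    n_seen = 0
    for row in reader:
        n_seen += 1
        probe_id, values = row[0], row[2:]  # drop the Description column
        # Assumption: probes without a mapping keep their original ID.
        writer.writerow([probe_to_symbol.get(probe_id, probe_id)] + values)
    if n_seen != n_rows:
        raise ValueError("row count does not match the dimensions line")
    return out.getvalue()


# Illustrative input only; not data from the Stanford project.
EXAMPLE_GCT = "\n".join([
    "#1.2",
    "2\t2",
    "Name\tDescription\tsample_A\tsample_B",
    "1007_s_at\tna\t10.5\t11.0",
    "unknown_probe\tna\t3.2\t2.9",
])
EXAMPLE_MAP = {"1007_s_at": "DDR1"}

if __name__ == "__main__":
    print(gct_to_tab(EXAMPLE_GCT, EXAMPLE_MAP))
```

Each of the 11 scripts is a small step of this kind, and a lab without scripting skills has to reproduce every one of them by hand at each tool transition.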
Possible Solutions
Self-contained monolithic tool
• Requires user re-training
• Requires re-engineering new tools to fit a prescribed architecture
• Limited flexibility
• Increasingly hard to maintain and expand over time
Plug and play cooperative approach
• Build on scientists' existing skills
• A common layer for interoperability
• Leverage capabilities of multiple groups and existing tools
• Provide access to familiar tools with customary look and feel
An online community to share diverse computational tools (www.genomespace.org)
• 6 seed tools: Cytoscape, Galaxy, GenePattern, Genomica, IGV, UCSC Browser
• 2 driving biological projects (DBPs): lincRNAs, stem cell circuits
• Outreach: new tools
• Outreach: new DBPs
GenomeSpace (www.genomespace.org)
• A community to share diverse computational tools
• Aimed towards non-programming users
• Tools maintain native look and feel
• Support interoperability through transparent cross-tool data transfer
• Reproducibility, analytic workflows, comprehensive documentation
• Tools retain their identity, original distribution methods, and use as stand-alone software
GenomeSpace (www.genomespace.org)
Six seed tools
• Cytoscape
• Galaxy
• GenePattern
• Genomica
• Integrative Genomics Viewer
• UCSC Genome Browser
2 driving biological projects
• Regulatory networks in stem cells
• Functional role of large non-coding RNAs
High Level Architecture
[Figure: architecture diagram. Recoverable components:]
• GS-enabled tools: Cytoscape, GenePattern, Galaxy, Integrative Genomics Viewer
• External data sources and tools
• Data Management (DM)
• Analysis & Tool Management (ATM)
• Authentication & Authorization
• Provenance
• GenomeSpace metadata
Deployment Architecture
[Figure: deployment diagram. Recoverable components:]
• GS clients: GS UI, IGV, Genomica, GenePattern, Galaxy, UCSC Browser, connecting through the CDK or REST
• Services hosted on Amazon: Clustered Identity Service (SSO), Analysis Task Manager (ATM), Data Manager (DM), Provenance
• Storage: SimpleDB, S3 (file transfers)
• External data sources (e.g. GEO)
Technology Stack(s)
• Services: Identity, Data Manager, ATM
• Technologies: Jersey, Maven, OSGi (Virgo), JetS3t, S3, SimpleDB
State of the Space
Today
• Core services in place (ATM, DM, Identity)
• First transformations implemented
• Transformation engine in place
• CDK integrated into 4 of 6 seed tools at dev/test level
• First tool analyses loaded
• Servers hosted on Amazon, code in BitBucket
• 3 pre-beta users trying it out
By Mid June
• All 4 seed tools updated and in the same environment
• 2-4 more format transforms
• CDK added into UCSC (dev/test)
• More tool analyses loaded for Genomica, GenePattern
• 5-10 more beta users
In Fall
• All 6 seed tools available
• All transforms to support the Stanford problem ++
• Provenance roughed in
• ## Real users?
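The slides do not describe how the transformation engine works internally. One plausible shape, shown here purely as an assumption, is a registry of pairwise format converters plus a shortest-path search that chains them when no direct converter exists. The class `TransformRegistry`, the format names, and both toy converters are hypothetical.

```python
from collections import deque


class TransformRegistry:
    """Hypothetical registry of pairwise format converters."""

    def __init__(self):
        # Maps (source_format, target_format) to a converter callable.
        self._converters = {}

    def register(self, source, target, func):
        self._converters[(source, target)] = func

    def find_path(self, source, target):
        """Breadth-first search for the shortest chain of formats."""
        queue = deque([[source]])
        seen = {source}
        while queue:
            path = queue.popleft()
            if path[-1] == target:
                return path
            for src, dst in self._converters:
                if src == path[-1] and dst not in seen:
                    seen.add(dst)
                    queue.append(path + [dst])
        return None

    def convert(self, data, source, target):
        """Apply each converter along the shortest chain, in order."""
        path = self.find_path(source, target)
        if path is None:
            raise ValueError(f"no transform chain from {source} to {target}")
        for src, dst in zip(path, path[1:]):
            data = self._converters[(src, dst)](data)
        return data


def strip_gct_preamble(text):
    # Toy converter: drop the GCT version and dimensions lines.
    return "\n".join(text.splitlines()[2:])


def first_column(text):
    # Toy converter: keep the first column of each data row, skip the header.
    return [line.split("\t")[0] for line in text.splitlines()[1:]]


registry = TransformRegistry()
registry.register("gct", "tab", strip_gct_preamble)
registry.register("tab", "genelist", first_column)

SAMPLE = "#1.2\n2\t1\nName\tDescription\tS1\nTP53\tna\t1.0\nMDM2\tna\t2.0"

if __name__ == "__main__":
    print(registry.find_path("gct", "genelist"))
    print(registry.convert(SAMPLE, "gct", "genelist"))
```

With this design, each new format transform is a single `register` call, and existing chains pick it up without changes to the tools themselves.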
Why bother when there is Gaggle?
• When we began, Gaggle was RMI only
• To support non-Java clients (Galaxy, UCSC) we decided to put in JSON RESTful services
• Our design focus is on bigger data than we felt Gaggle could handle, with indexed access (e.g. HTTP range queries)
• We were contemplating making a GenomeSpace Gaggle boss
Gaggle – JSON
• Makes this much more open for all of our seed tools
• Want to make a GS-Gaggle boss even more now
• Ideal for supporting more interactive use of the tools
• e.g. I have IGV and UCSC open. Send a list of gene names from one to the other to update the display area
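The two ideas on this slide, JSON messages between tools and indexed access through HTTP range queries, can be sketched as follows. The message schema (field names `from`, `to`, `type`, `genes`) and the example URL are assumptions made for illustration and are not the GenomeSpace or Gaggle API. The `Range` header itself is standard HTTP. No request is sent; the code only builds the message and the request object.

```python
import json
import urllib.request


def build_gene_list_message(sender, receiver, genes):
    """Serialize a gene list as JSON (hypothetical message schema)."""
    return json.dumps(
        {"from": sender, "to": receiver, "type": "geneList", "genes": list(genes)},
        sort_keys=True,
    )


def build_range_request(url, start, end):
    """Prepare a GET for bytes start..end (inclusive) without sending it."""
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


if __name__ == "__main__":
    # A gene list that one open tool could hand to another.
    print(build_gene_list_message("IGV", "UCSC", ["TP53", "MDM2"]))

    # Placeholder URL: fetch only the first 64 KiB of a large file.
    request = build_range_request("https://example.org/data/large_file", 0, 65535)
    print(request.get_header("Range"))
```

A server that honors the header answers with status 206 (Partial Content) and only the requested bytes, which is what lets a viewer read one region of a large file without downloading all of it.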
GenomeSpace Early Adopters
• We are looking for tool developers who wish to integrate with any of the 6 seed tools
• At genomespace.org you can get
• Architectural docs
• CDK (for Java) and Web API doc (for others)
• Mailing list: join by emailing gs-tool-developers-join@broadinstitute.org
http://www.genomespace.org/adopters
GenomeSpace Team
• Jill Mesirov, Principal Investigator
• Michael Reich, Director, Cancer Informatics Development
• Cytoscape: Mike Smoot, Johannes Ruscheinski
• Galaxy: Greg Von Kuster
• GenePattern: Peter Carr, Thorin Tabor
• Genomica: Gil Ben-Artzi
• IGV: Jim Robinson
• UCSC Genome Browser: Galt Barber
• Chang Lab: Howard Chang, Kun Qu
• Regev Lab: Aviv Regev, Maxim Artomov
• GenomeSpace Development: Jared Nedzel, Marco Ocana, Eliot Polk