GenePattern
Overview for MAGE-TAB Workshop
Ted Liefeld
January 24, 2007
GenePattern: a platform for integrative genomics
[Figure: architecture diagram. Components: Module Repository, Pipeline Environment, Client User Interfaces (Desktop, Web, Programming), Module Integrator. Example pipeline on the all_aml_train and all_aml_test datasets (Golub and Slonim et al. 1999). Modules shown: Preprocess, KNN, GSEA, NMF, SVM, SOM Clustering, Class Neighbors, Weighted Voting Cross-Val, Weighted Voting Train/Test, SOM, PCA. Viewers shown: SOM Cluster Viewer, Marker Selection Viewer, Prediction Results Viewer.]
Features
Comprehensive Module Repository
• ~90 modules: analysis, visualization, pipelines
• Expression, proteomic, sequence, variation (SNP), and whole genome association data
• Construction of context-sensitive, flexible analytic workflows
• Module suites
Multiple user interfaces
• Desktop client
• Web client
• Programmatic interfaces to Java, MATLAB, R
Automatic Module Integration
• Add new modules without writing code
• Supports any command line callable code (language independent)
Local and Distributed Computing
• Laptop
• Client/Server
• Compute farm
• Public server (1/2008)
Analytic Reproducibility
• Easy, rapid sharing of methodologies via pipelines
• Versioning using Life Sciences Identifier (LSID)
• Executable history of all sessions
• Automatic pipeline generation from result files
• Executable research documents
Interoperability
• caBIG
• caArray
• caGrid
• geWorkbench
• Cytoscape
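The LSID-based versioning listed under Analytic Reproducibility can be illustrated with a small sketch. It assumes the generic LSID URN layout (urn:lsid:authority:namespace:object, optionally followed by :revision). The example identifiers and the helper names parse_lsid and same_module are hypothetical and are not taken from GenePattern's own code.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]  # None when the LSID carries no version


def parse_lsid(text: str) -> Lsid:
    """Split an LSID URN into its components (illustrative helper)."""
    parts = text.split(":")
    prefix_ok = [p.lower() for p in parts[:2]] == ["urn", "lsid"]
    if len(parts) not in (5, 6) or not prefix_ok:
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


def same_module(a: str, b: str) -> bool:
    """True when two LSIDs name the same object, ignoring the revision."""
    return parse_lsid(a)[:3] == parse_lsid(b)[:3]


# Hypothetical identifiers for two versions of one module.
v1 = "urn:lsid:example.org:module.analysis:00020:1"
v2 = "urn:lsid:example.org:module.analysis:00020:2"
```

Keeping the revision as a separate field is what lets a saved pipeline pin an exact module version while still recognizing newer versions of the same module.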
Module Integrator
• Add modules and visualizers without writing code
• Share custom analysis tasks
• Integrate your own or "third-party" tools easily
• Add tools to a common repository
GenePattern as a Visualization & Analysis Engine
[Figure: deployment diagram. Labels: Portal, Run GenePattern Analyses, GenePattern, LSF Worker Nodes, GenePattern SNPViewer visualizer (running as applet).]
http://www.broad.mit.edu/mmgp
MAGE-TAB use tomorrow
• Ideally
  • Be able to automatically find raw/derived bioassay data when parsing MAGE-TAB files
  • Use MAGE-TAB like our native (tab-delimited) data formats, GCT and RES, in (almost) any GenePattern analysis module
  • Not require user interaction to specify assays or quantitation types
  • Open question: MGED Ontology terms for common data transform protocols (e.g. RMA, MAS5) in addition to free text?
• Sub-optimal but still good
  • Have an interactive viewer to convert from MAGE-TAB to a native format (e.g. MAGE-ML import viewer)
  • Human interaction required…
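As a rough illustration of the conversion step discussed above, the sketch below writes an expression matrix in the GCT layout (a "#1.2" version line, a row/column count line, a header line, then one row per feature). It assumes the assay columns and quantitation type have already been chosen from the MAGE-TAB data; the function name to_gct and the sample and gene names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (row name, description, one value per sample)
Row = Tuple[str, str, Sequence[float]]


def to_gct(samples: Sequence[str], rows: Sequence[Row]) -> str:
    """Render an expression matrix as GCT text."""
    lines: List[str] = ["#1.2", f"{len(rows)}\t{len(samples)}"]
    lines.append("\t".join(["Name", "Description", *samples]))
    for name, description, values in rows:
        if len(values) != len(samples):
            raise ValueError(
                f"row {name!r} has {len(values)} values, expected {len(samples)}"
            )
        lines.append("\t".join([name, description, *(str(v) for v in values)]))
    return "\n".join(lines) + "\n"


# Hypothetical two-sample matrix with a single feature.
gct = to_gct(
    ["sample_1", "sample_2"],
    [("gene_a", "hypothetical gene", [1.5, 2.0])],
)
```

The hard part in practice is the selection this sketch skips: deciding which SDRF rows and which quantitation type feed the matrix without asking the user.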
More MAGE-TAB thoughts
• Define a structure/format for keeping multiple MAGE-TAB files together
  • IDF, ADF, SDRF, raw data files -> package together as ZIP? tgz?
  • Subdirectories in the zip? (defined)
• Does MAGE-TAB support multiple arrays in one file?
  • Useful, and MAGE-ML allows this now (but I don't like it for automated processing)
  • E.g. E-GEOD-995.mageml.tgz from ArrayExpress
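One possible shape for the packaging idea above, sketched with Python's standard zipfile module. The archive layout (fixed file names at the top level, raw data under a raw/ subdirectory) is an assumption made for illustration, not a defined MAGE-TAB convention, and package_mage_tab is a hypothetical helper.

```python
import io
import zipfile
from typing import Dict


def package_mage_tab(idf: str, sdrf: str, adf: str,
                     raw_files: Dict[str, bytes]) -> bytes:
    """Bundle one MAGE-TAB submission into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Assumed layout: metadata files at the top level.
        archive.writestr("experiment.idf.txt", idf)
        archive.writestr("experiment.sdrf.txt", sdrf)
        archive.writestr("array.adf.txt", adf)
        # Assumed layout: raw data files under a defined subdirectory.
        for name, payload in raw_files.items():
            archive.writestr(f"raw/{name}", payload)
    return buffer.getvalue()


# Placeholder contents; real files would be read from disk.
bundle = package_mage_tab("idf text", "sdrf text", "adf text",
                          {"sample_1.CEL": b"raw bytes"})
```

A fixed, documented layout is what would let a tool locate the SDRF and the raw files without user input.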
More MAGE-TAB thoughts
• Persistent identifiers
  • For protocols, samples, etc.
  • Allow use of SDRF, data matrix (e.g. in GP, with persistent references to external entities)
  • Array details, experiment design, etc.
• Question
  • Should we consider using the MAGE-TAB DAG to record data processing pipelines (provenance - HLA)?
  • E.g. a protocol for each module execution added to MAGE-TAB file outputs
  • File growth issues…
  • Record all analysis for a publication
  • Add an additional SDRF file at each step
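To make the provenance question concrete, here is a minimal sketch of recording each module execution as an extra protocol step in an SDRF-like table. It uses the SDRF column names "Protocol REF" and "Derived Array Data File"; the table representation, the append_step helper, and the protocol and file names are all hypothetical.

```python
from typing import List, Tuple

# (header row, data rows); lists are used because SDRF repeats column names.
Table = Tuple[List[str], List[List[str]]]


def append_step(table: Table, protocol_ref: str, outputs: List[str]) -> Table:
    """Return a new SDRF-like table with one processing step appended."""
    header, rows = table
    if len(outputs) != len(rows):
        raise ValueError("need exactly one output file per SDRF row")
    new_header = header + ["Protocol REF", "Derived Array Data File"]
    new_rows = [row + [protocol_ref, out] for row, out in zip(rows, outputs)]
    return new_header, new_rows


# Hypothetical two-sample experiment passed through two modules.
sdrf: Table = (
    ["Source Name", "Array Data File"],
    [["s1", "s1.CEL"], ["s2", "s2.CEL"]],
)
sdrf = append_step(sdrf, "P-PREPROCESS-1", ["s1.pre.gct", "s2.pre.gct"])
sdrf = append_step(sdrf, "P-KNN-1", ["s1.pred.txt", "s2.pred.txt"])
```

Each step adds two columns to every row, which is the file growth issue in miniature: a long pipeline widens the table with every module run.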
Release Information
• Initially released in March 2004
• Current version 3.0, released April 2007
• 3.1 due Feb 08
• Currently 5900+ users, 500+ organizations, ~90 countries
Availability
• Freely available
• Windows, Mac OS, and Unix platforms
Resources
• http://www.genepattern.org
• User workshops, documentation, email help desk, online user forum
• Reich et al. (2006) Nature Genetics
Collaborations
• caBIG
• MAGNet NCBC
• NCIBI NCBC
GenePattern is a winner of the 2005 BioIT World Best Practices Award