GenePattern

GenePattern: Overview for MAGE-TAB Workshop. Ted Liefeld, January 24, 2007. A platform for integrative genomics.


Presentation Transcript


  1. GenePattern: Overview for MAGE-TAB Workshop. Ted Liefeld, January 24, 2007

  2. A platform for integrative genomics (diagram; labels only)
     • Components: Module Repository, Pipeline Environment, Client User Interfaces, Module Integrator
     • Client interfaces: Desktop, Web, Programming
     • Example datasets: all_aml_train, all_aml_test (Golub and Slonim et al. 1999)
     • Modules shown: Preprocess, KNN, GSEA, NMF, SVM, SOM Clustering, SOM, PCA, Class Neighbors, Weighted Voting Cross-Val, Weighted Voting Train/Test
     • Viewers shown: SOM Cluster Viewer, Marker Selection Viewer, Prediction Results Viewer

  3. Features
     Comprehensive Module Repository
       • ~90 modules: analysis, visualization, pipelines
       • Expression, proteomic, sequence, variation (SNP), and whole genome association data
       • Construction of context-sensitive, flexible analytic workflows
       • Module suites
     Multiple user interfaces
       • Desktop client
       • Web client
       • Programmatic interfaces to Java, MATLAB, R
     Automatic Module Integration
       • Add new modules without writing code
       • Supports any command line callable code (language independent)
     Local and Distributed Computing
       • Laptop
       • Client/Server
       • Compute farm
       • Public server (1/2008)
     Analytic Reproducibility
       • Easy, rapid sharing of methodologies via pipelines
       • Versioning using Life Sciences Identifier (LSID)
       • Executable history of all sessions
       • Automatic pipeline generation from result files
       • Executable research documents
     Interoperability
       • caBIG
       • caArray
       • caGrid
       • geWorkbench
       • Cytoscape

  4. Module Integrator
     • Add modules and visualizers without writing code
     • Share custom analysis tasks
     • Integrate your own or “third-party” tools easily
     • Add tools to a common repository

  5. GenePattern as a Visualization & Analysis Engine (diagram; labels only)
     • Portal
     • Run GenePattern Analyses
     • GenePattern
     • LSF Worker Nodes
     • GenePattern SNPViewer visualizer (running as applet)
     • http://www.broad.mit.edu/mmgp

  6. Using MAGE-ML today

  7. MAGE-TAB use tomorrow
     Ideally
       • Be able to automatically find raw/derived bioassay data when parsing MAGE-TAB files
       • Use MAGE-TAB like our native (tab-delimited) data formats, GCT and RES, in (almost) any GenePattern analysis module
       • Not require user interaction to specify assays or quantitation types
       • ? Use the MGED Ontology for common data transformation protocols (e.g. RMA, MAS5) in addition to free text
     Sub-optimal but still good
       • Have an interactive viewer to convert from MAGE-TAB to a native format (like the existing MAGE-ML import viewer)
       • Human interaction required…
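
The two "ideal" cases on this slide (locating data files from a MAGE-TAB file, and treating the data like GCT) can be illustrated with a minimal Python sketch. It is not GenePattern code. It assumes the SDRF is a plain tab-delimited table whose data-file columns use the MAGE-TAB header names listed in `DATA_FILE_COLUMNS`; the function names `find_data_files` and `to_gct` are hypothetical, and the Description column of the GCT output is simply filled with the row name.

```python
import csv
import io

# SDRF column headers assumed to point at data files (MAGE-TAB naming).
DATA_FILE_COLUMNS = (
    "Array Data File",
    "Derived Array Data File",
    "Array Data Matrix File",
    "Derived Array Data Matrix File",
)


def find_data_files(sdrf_text):
    """Return {column header: [unique file names]} found in an SDRF table."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = [name.strip() for name in next(reader)]
    found = {}
    for row in reader:
        for name, value in zip(header, row):
            value = value.strip()
            if name not in DATA_FILE_COLUMNS or not value:
                continue
            files = found.setdefault(name, [])
            if value not in files:
                files.append(value)
    return found


def to_gct(row_names, sample_names, values):
    """Render a matrix as GCT text: version line, dimensions, header, rows."""
    lines = ["#1.2", f"{len(row_names)}\t{len(sample_names)}"]
    lines.append("\t".join(["Name", "Description"] + list(sample_names)))
    for name, row in zip(row_names, values):
        # Hypothetical choice: reuse the row name as the description.
        lines.append("\t".join([name, name] + [str(v) for v in row]))
    return "\n".join(lines) + "\n"
```

No user interaction is needed in this sketch because the file columns are identified by header name alone; choosing which quantitation type to export from a multi-column data file is the part that would still need a convention or an ontology term.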

  8. More MAGE-TAB thoughts
     • Define structure/format for keeping multiple MAGE-TAB files together
       • IDF, ADF, SDRF, raw data files -> package together as ZIP? tgz?
       • Sub directories in the zip? (defined)
     • Does MAGE-TAB support multiple arrays in one file?
       • Useful, and MAGE-ML allows this now (but I don’t like it for automated processing)
       • E.g. E-GEOD-995.mageml.tgz from ArrayExpress
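
As an illustration of the packaging question above, here is a minimal Python sketch that bundles IDF, SDRF, ADF, and raw data files into one ZIP with a defined sub-directory layout. The layout (IDF and SDRF at the top level, `adf/` and `data/` sub-directories) and the file-name suffixes `.idf.txt`, `.sdrf.txt`, `.adf.txt` are assumptions made for the example, not something the slides or the MAGE-TAB specification define; `archive_path` and `build_package` are hypothetical names.

```python
import io
import zipfile


def archive_path(filename):
    """Map a file name to its (assumed) location inside the package."""
    lower = filename.lower()
    if lower.endswith((".idf.txt", ".sdrf.txt")):
        return filename            # experiment-level files at the top level
    if lower.endswith(".adf.txt"):
        return "adf/" + filename   # array design files
    return "data/" + filename      # everything else is treated as data


def build_package(files):
    """Build a ZIP from {file name: bytes} and return the archive bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(files):
            archive.writestr(archive_path(name), files[name])
    return buffer.getvalue()
```

With a fixed layout like this, an automated consumer can open the archive and find the IDF without asking the user which file is which.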

  9. More MAGE-TAB thoughts
     • Persistent identifiers
       • For protocols, samples, etc.
       • Allow use of SDRF and data matrix (e.g. in GP) with persistent references to external entities
       • Array details, experiment design, etc.
     • Question?
       • Should we consider using the MAGE-TAB DAG to record data processing pipelines (provenance - HLA)?
       • E.g. a protocol for each module execution added to MAGE-TAB file outputs
       • File growth issues…
       • Record all analysis for a publication
       • Add additional SDRF file at each step
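
The provenance idea above (one protocol per module execution added to the MAGE-TAB output) can be sketched as follows. This is a hedged illustration only: it assumes an SDRF held in memory as a list of rows, appends a `Protocol REF` column and a `Derived Array Data File` column for each processing step, and uses made-up protocol and file names. The function name `add_processing_step` is hypothetical.

```python
def add_processing_step(table, protocol_ref, output_file):
    """Return a new SDRF table with one processing step appended.

    `table` is a list of rows; the first row is the header. Every data
    row gets the same protocol reference and output file name.
    """
    header, *rows = table
    new_header = header + ["Protocol REF", "Derived Array Data File"]
    new_rows = [row + [protocol_ref, output_file] for row in rows]
    return [new_header] + new_rows


# Usage: two module executions, each recorded as a protocol plus its output.
sdrf = [
    ["Source Name", "Array Data File"],
    ["S1", "s1.CEL"],
    ["S2", "s2.CEL"],
]
step1 = add_processing_step(sdrf, "example:Preprocess", "all.preprocessed.gct")
step2 = add_processing_step(step1, "example:Cluster", "all.clustered.gct")
```

Each step adds two columns to every row, which makes the "file growth" concern concrete: a long pipeline widens the table steadily, which is one argument for writing an additional SDRF file at each step instead.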

  10. Release Information
       • Initially released in March 2004
       • Current version 3.0, released April 2007
       • 3.1 due Feb 08
       • Currently 5900+ users, 500+ organizations, ~90 countries
      Availability
       • Freely available
       • Windows, Mac OS, and Unix platforms
      Resources
       • http://www.genepattern.org
       • User workshops, documentation, email help desk, online user forum
       • Reich et al. (2006) Nature Genetics
      Collaborations
       • caBIG
       • MAGNet NCBC
       • NCIBI NCBC
      GenePattern is a winner of the 2005 BioIT World Best Practices Award
