
GenePattern caBIG Adaptor Overview

GenePattern caBIG Adaptor Overview. Session Date: 1/26/05. Session Length: 30 minutes. Target Audience: caArray users from the caBIG community. Trainer: Ted Liefeld, Senior Software Architect, Cancer Informatics, Broad Institute of MIT and Harvard.





Presentation Transcript


  1. GenePattern caBIG Adaptor Overview • Session Date: 1/26/05 • Session Length: 30 minutes • Target Audience: caArray users from the caBIG community • Trainer: Ted Liefeld, Senior Software Architect, Cancer Informatics, Broad Institute of MIT and Harvard

  2. Session Details: Session Objectives • Overview of GenePattern • caArrayImportViewer and the GCT result file format • Using caArrayImportViewer from a GenePattern server • Using caArrayImportViewer directly from a caBIG CVS repository directory

  3. Session Details: Lesson Plan • Lesson 1: Introduction to GenePattern • Lesson 2: Introduction to caArrayImportViewer and GCT files • Lesson 3: Running caArrayImportViewer in GenePattern • Lesson 4: Running caArrayImportViewer from the caBIG CVS repository

  4. Introduction to GenePattern • In this Lesson, we will: • Review GenePattern

  5. GenePattern: A platform for integrative genomics [Figure: overview of the GenePattern environments (Module Repository, Graphical Environment, Pipeline Environment, Programming Environment, Task Integrator, Analysis Task Manager), with example analysis tasks (PCA, KNN, WV, Bicluster, NMF, SVM, SOM, FWER, Marker Selection, Transpose), visualizers (Heat Map, Prediction Results), an example pipeline (remote data source → Threshold → PreprocessDataset → GeneNeighbors → SelectFeaturesColumns → SelectFeaturesRows → HeatMapViewer), and a screenshot of an R script that implements the supervised class-prediction method of Golub et al. 1999 (Science 286:531-537) through GenePattern, including a MarkerSelection call]

  6. GenePattern Module Repository [Diagram: the Broad Institute's GenePattern Module Repository supplying modules to a user's GenePattern installation] • Modules are hosted on the Broad Institute's GenePattern Module Repository • Users download modules from the module repository onto their own GenePattern server • Users can check for new and updated modules and install them automatically
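The update check described above amounts to comparing the versions installed on a local server against the versions offered by the repository. The sketch below assumes each module is identified by a name plus a single integer version; the `ModuleInfo` class, `find_updates` function, and version numbers are hypothetical and do not reflect GenePattern's actual repository protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleInfo:
    """Hypothetical module descriptor: a name plus an integer version."""
    name: str
    version: int


def find_updates(installed, repository):
    """Return repository modules that are not installed or are newer than the installed copy."""
    installed_versions = {m.name: m.version for m in installed}
    updates = []
    for module in repository:
        current = installed_versions.get(module.name)
        if current is None or module.version > current:
            updates.append(module)
    return updates


# Illustrative data only; these version numbers are made up.
installed = [ModuleInfo("PreprocessDataset", 3), ModuleInfo("HeatMapViewer", 8)]
repository = [
    ModuleInfo("PreprocessDataset", 4),  # newer than the installed copy
    ModuleInfo("HeatMapViewer", 8),      # already up to date
    ModuleInfo("GeneNeighbors", 5),      # not installed yet
]
print([f"{m.name} v{m.version}" for m in find_updates(installed, repository)])
```

Here the result lists PreprocessDataset (updated) and GeneNeighbors (new), while the up-to-date HeatMapViewer is skipped.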

  7. ~65 GenePattern Modules (1/06) • External modules adapted from: Bioconductor, MeV (TIGR), Fred Hutchinson Cancer Research Center • Clustering • SOM, Hierarchical, Consensus • Prediction • kNN, Weighted Voting, SVM • Proteomics • AreaChange, CompareSpectra, LocatePeaks, mzXMLToCsv, ProteoArray, etc. • Marker Selection • Class Neighbors, Gene Neighbors, FWER, Q-value, FDR • Preprocessing/Utilities • Threshold, Variation Filter, MAGE-ML, GEO Download, Transpose, etc. • Statistical Methods • Missing value imputation, Kolmogorov-Smirnov score, NMF, PCA • Visualizers • Heat Map, Hierarchical Clustering, SOM, PCA, Feature Summary, Prediction Results, Gene List Significance • Annotation • GeneCruiser, Affymetrix Chip Probe Conversion

  8. GenePattern Graphical Environment

  9. Graphical Environment Features • Object Browser • organizes data and result files • allows easy manipulation of local and remote files • Analysis UI • provides a simple, flexible interface for launching analyses and visualizations • documents parameters and input types • maintains a history of analyses • automatically determines which modules can be run on a given file
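One simple way to decide which modules can run on a file is to match the file's format against the input formats each module declares. This is a minimal sketch, assuming the format is inferred from the file extension; the `MODULE_INPUTS` catalogue and `modules_for_file` function are hypothetical, not the mechanism the GenePattern client actually uses.

```python
from pathlib import PurePath

# Hypothetical catalogue: module name -> input file formats it accepts.
MODULE_INPUTS = {
    "HeatMapViewer": {"gct", "res"},
    "PreprocessDataset": {"gct", "res"},
    "Transpose": {"gct", "res"},
    "PredictionResultsViewer": {"odf"},
}


def modules_for_file(filename, catalogue=MODULE_INPUTS):
    """Return, sorted by name, the modules whose accepted formats include the file's extension."""
    suffix = PurePath(filename).suffix.lower().lstrip(".")
    return sorted(name for name, formats in catalogue.items() if suffix in formats)


print(modules_for_file("all_aml_train.gct"))
```

The extension check is case-insensitive, so `results.ODF` maps to the same modules as `results.odf`.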

  10. GenePattern Pipeline Environment • remote data source [http://research.dfci.harvard.edu/justin/demo/nci60.res] • Threshold: impose a baseline and a ceiling • PreprocessDataset: extract breast samples • GeneNeighbors: compute nearest neighbors of cyclin D1 in breast cells • SelectFeaturesColumns: extract ovary samples • SelectFeaturesRows: get expression data for breast neighbors in ovary cells • HeatMapViewer: project data as a heat map

  11. Pipeline Use Cases • Create a workflow that runs an analysis automatically • Run the same analysis over different data sets • Encapsulate the algorithms and parameters you used for a method so they can be recalled later • Share a pipeline with another researcher who wants to reproduce your results
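A pipeline can be thought of as an ordered list of parameterized steps, each taking a dataset and returning a new one. The sketch below models three of the steps named above (a threshold that imposes a baseline and a ceiling, column selection, row selection) as plain Python functions. It is an illustration under stated assumptions, not GenePattern's pipeline engine: the `Dataset` class, the step functions, the sample names, and the 20/20000 floor and ceiling are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Dataset:
    """Hypothetical expression matrix: sample (column) names plus gene -> values."""
    columns: List[str]
    rows: Dict[str, List[float]]


Step = Callable[[Dataset], Dataset]


def threshold(floor: float, ceiling: float) -> Step:
    """Impose a baseline (floor) and a ceiling on every value."""
    def step(ds: Dataset) -> Dataset:
        rows = {name: [min(max(v, float(floor)), float(ceiling)) for v in values]
                for name, values in ds.rows.items()}
        return Dataset(ds.columns, rows)
    return step


def select_columns(keep: List[str]) -> Step:
    """Keep only the named samples, in the given order."""
    def step(ds: Dataset) -> Dataset:
        idx = [ds.columns.index(c) for c in keep]
        rows = {name: [values[i] for i in idx] for name, values in ds.rows.items()}
        return Dataset(list(keep), rows)
    return step


def select_rows(keep: List[str]) -> Step:
    """Keep only the named genes."""
    def step(ds: Dataset) -> Dataset:
        return Dataset(ds.columns, {name: ds.rows[name] for name in keep})
    return step


def run_pipeline(ds: Dataset, steps: List[Step]) -> Dataset:
    """Apply each step in order, feeding each result into the next step."""
    for step in steps:
        ds = step(ds)
    return ds


# The same parameterized pipeline can be rerun unchanged on other data sets.
pipeline = [threshold(20, 20000), select_columns(["breast_1", "breast_2"]), select_rows(["CCND1"])]
data = Dataset(columns=["breast_1", "breast_2", "ovary_1"],
               rows={"CCND1": [5.0, 40000.0, 300.0], "ESR1": [150.0, 90.0, 12.0]})
result = run_pipeline(data, pipeline)
print(result)
```

Because the steps and their parameters live in one list, that list records exactly how a result was produced, which is the point of the "encapsulate" and "share" use cases.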

  12. Task Integrator Features • Add tasks and visualizers without writing code, via a Web-based form • Modules can be written in any language • Once added, modules are usable by other users of a GenePattern server • Edits to modules are automatically versioned, so a pipeline can specify which version of a module to run
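Automatic versioning means an edit creates a new version instead of overwriting the old one, so a pipeline that names a specific version keeps getting the same behavior. A minimal sketch, assuming versions are consecutive integers starting at 1; `ModuleRegistry` and its `save`/`get` methods are hypothetical and are not the Task Integrator's API.

```python
from typing import Callable, Dict, Optional


class ModuleRegistry:
    """Hypothetical store that keeps every saved version of each module."""

    def __init__(self) -> None:
        self._versions: Dict[str, Dict[int, Callable]] = {}

    def save(self, name: str, implementation: Callable) -> int:
        """Store an edit as a new version and return its version number."""
        versions = self._versions.setdefault(name, {})
        version = len(versions) + 1
        versions[version] = implementation
        return version

    def get(self, name: str, version: Optional[int] = None) -> Callable:
        """Return the pinned version if one is given, otherwise the latest."""
        versions = self._versions[name]
        if version is None:
            version = max(versions)
        return versions[version]


registry = ModuleRegistry()
registry.save("Threshold", lambda x: max(x, 20))  # version 1
registry.save("Threshold", lambda x: max(x, 10))  # an edit creates version 2

pinned = registry.get("Threshold", version=1)  # what a pipeline pinned to v1 runs
latest = registry.get("Threshold")
print(pinned(15), latest(15))
```

The pinned version still applies the original floor of 20, while the latest version reflects the edit.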

  13. GenePattern Programming Language Environment • Users can run any module or pipeline as a routine call in a programming language. • GenePattern server synchronizes programming libraries with current available modules. • Any pipeline can be converted to equivalent code. • Available languages are Java, MATLAB, and R.

  14. GenePattern Architecture • Standalone • Desktop • Laptop • Client/server • Single server • Compute grid (e.g., LSF)

  15. GenePattern Users • Over 1,850 registered users (1/06) in 450+ organizations in 53 countries • Used as courseware for MIT Materials Science classes (wrapping Fortran code on a Linux cluster): 3.320 Atomistic Modeling of Materials; 3.021j Introduction to Modeling and Simulation • Training: over 200 users have attended GenePattern workshops

  16. Introduction: Any Questions? • GenePattern • Module Repository • Graphical Environment • Pipeline Environment • Task Integrator • Architecture

  17. caArrayImportViewer and gct files • In this Lesson, we will: • Learn what caArrayImportViewer retrieves from caArray • Learn about the gct file format

  18. caArrayImportViewer • A GenePattern module that can retrieve data from caArray • Implemented in Java • Runnable from the command line • Allows the user to select DerivedBioAssays from an experiment and write them to a GCT file

  19. Derived BioAssays • Definition (from MGED): “A bioassay is a single step within a microarray experiment. There are 3 types of bioassays. A physical bioassay correspond to wet-lab microarray experimental step. A measured bioassays corresponds to a situation after feature extraction has been performed. A derived bioassay corresponds to data processing experimental steps.” • caArrayImportViewer only retrieves Derived BioAssays • Typically we want the ‘signal’ generated by MAS5 or RMA for an Affymetrix array • Physical and measured bioassays are not appropriate input for other (existing) GenePattern tasks
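The three bioassay kinds in the MGED definition, and the rule that only derived bioassays are retrieved, can be sketched as a simple filter. The `BioAssayKind` enum, `BioAssay` class, `derived_only` function, and bioassay names below are hypothetical stand-ins, not classes from the caArray client library.

```python
from dataclasses import dataclass
from enum import Enum


class BioAssayKind(Enum):
    PHYSICAL = "physical"  # wet-lab microarray experimental step
    MEASURED = "measured"  # after feature extraction
    DERIVED = "derived"    # after data processing, e.g. MAS5 or RMA signal


@dataclass(frozen=True)
class BioAssay:
    """Hypothetical bioassay record: a name plus its kind."""
    name: str
    kind: BioAssayKind


def derived_only(bioassays):
    """Keep only derived bioassays, the kind suitable as GenePattern input."""
    return [b for b in bioassays if b.kind is BioAssayKind.DERIVED]


assays = [
    BioAssay("hyb1_scan", BioAssayKind.PHYSICAL),
    BioAssay("hyb1_features", BioAssayKind.MEASURED),
    BioAssay("hyb1_MAS5_signal", BioAssayKind.DERIVED),
    BioAssay("hyb2_RMA", BioAssayKind.DERIVED),
]
print([b.name for b in derived_only(assays)])
```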

  20. caArrayImportViewer parameters • caArrayImportViewer requires you to have, in advance: • caArray URL • //caarray-mageom-server.nci.nih.gov:8080/ • Username/password to log in • Uses caArray-client.jar and other NCICB jar files to securely connect to the specified caArray instance • Does not store or remember login information in any way • The user then navigates the GUI (next lesson) to select the experiment, bioassays, etc. to save to file

  21. caArrayImportViewer Output – GCT file • Defined in our tutorial http://www.broad.mit.edu/cancer/software/genepattern/tutorial/

  22. GCT File • Tab-delimited: 2 header lines, 2 columns of annotation • Very similar to the Stanford-format CDT file
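As a concrete illustration of the layout, a GCT 1.2 file starts with the two header lines (the version tag `#1.2`, then the row and column counts), followed by a column-heading row `Name`, `Description`, and one heading per sample; each data row then carries the two annotation columns followed by the values. The reader and writer below are a minimal sketch using only the standard `csv` module; the probe names, descriptions, and values are made up, and real GCT files may need more tolerant parsing (for example, missing values).

```python
import csv
import io


def write_gct(out, columns, rows):
    """Write GCT 1.2. rows is a list of (name, description, values) tuples."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["#1.2"])                          # header line 1: version
    writer.writerow([len(rows), len(columns)])         # header line 2: dimensions
    writer.writerow(["Name", "Description", *columns])
    for name, description, values in rows:
        writer.writerow([name, description, *values])


def read_gct(src):
    """Parse GCT 1.2 text into (columns, rows), checking the declared dimensions."""
    reader = csv.reader(src, delimiter="\t")
    version = next(reader)
    if not version or version[0] != "#1.2":
        raise ValueError("not a GCT 1.2 file")
    n_rows, n_cols = (int(x) for x in next(reader)[:2])
    columns = next(reader)[2:]                         # skip Name and Description
    rows = []
    for record in reader:
        if record:                                     # ignore blank trailing lines
            rows.append((record[0], record[1], [float(v) for v in record[2:]]))
    if len(rows) != n_rows or len(columns) != n_cols:
        raise ValueError("dimension line does not match the data")
    return columns, rows


buf = io.StringIO()
write_gct(buf, ["sample_A", "sample_B"],
          [("1007_s_at", "probe 1", [120.5, 98.0]), ("1053_at", "probe 2", [45.0, 60.25])])
text = buf.getvalue()
parsed = read_gct(io.StringIO(text))
print(text)
```

When working with real files, open them with `newline=""` so the `csv` module handles line endings itself.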

  23. caArrayImportViewer & GCT files: Any Questions? • caArrayImportViewer • Derived BioAssay data • URL and login • GCT file format

  24. caArrayImportViewer use in GenePattern • In this lesson, we will use the caArrayImportViewer from a GenePattern server • Launch caArrayImportViewer • Login • Select experiment • Select Derived BioAssays • Select Quantitation Type • Provide file name • Save result file
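Selecting derived bioassays and then a single quantitation type effectively picks one value per probe from each selected bioassay, producing the matrix that is saved to the result file. This sketch assumes the data has already been fetched into nested dictionaries (bioassay, then reporter, then quantitation type); `build_matrix`, the bioassay names, and the quantitation type names are hypothetical and do not come from the caArray API. Probes missing from a bioassay are filled with NaN here as an illustrative choice.

```python
import math
from typing import Dict, List

# Hypothetical stand-in for fetched data:
# bioassay name -> reporter (probe set) -> quantitation type -> value.
BioAssayData = Dict[str, Dict[str, Dict[str, float]]]


def build_matrix(data: BioAssayData, selected: List[str], quantitation_type: str):
    """Return (columns, rows): one column per selected bioassay, one row per reporter."""
    reporters = sorted(set().union(*(data[b].keys() for b in selected)))
    rows = []
    for reporter in reporters:
        values = [data[b].get(reporter, {}).get(quantitation_type, math.nan) for b in selected]
        rows.append((reporter, values))
    return list(selected), rows


data = {
    "hyb1_MAS5": {"1007_s_at": {"Signal": 120.5, "Detection p-value": 0.01},
                  "1053_at": {"Signal": 45.0, "Detection p-value": 0.2}},
    "hyb2_MAS5": {"1007_s_at": {"Signal": 98.0, "Detection p-value": 0.02}},
}
columns, rows = build_matrix(data, ["hyb1_MAS5", "hyb2_MAS5"], "Signal")
print(columns, rows)
```

Each resulting row pairs a reporter name with one value per selected bioassay, matching the per-row layout of a GCT file.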

  25. Launching caArrayImportViewer from GenePattern

  26. Login to caArray

  27. Select an experiment

  28. Select Derived BioAssays

  29. Select Quantitation Type

  30. Select Output File name

  31. Review summary

  32. Done.

  33. caArrayImportViewer in GenePattern: Any Questions? • In this lesson, we used caArrayImportViewer from a GenePattern server • Launched caArrayImportViewer • Logged in • Selected an experiment • Selected Derived BioAssays • Selected a Quantitation Type • Provided a file name • Saved the result file

  34. Session Review: Questions? • Any questions about anything we have covered in the entire class: • Lesson 1: Introduction to GenePattern • Lesson 2: Introduction to caArrayImportViewer and GCT files • Lesson 3: Running caArrayImportViewer in GenePattern • Lesson 4: Running caArrayImportViewer from the caBIG CVS repository

  35. Recommended Follow-On Training for GenePattern • GenePattern Workshops • Offered at the Broad Institute and at conferences • Registration and dates at http://www.broad.mit.edu/cancer/software/genepattern/workshop/ • What are the next steps (if no additional training is offered)? • There are many resources to help users learn the features of GenePattern and communicate with the development team: • email help desk, gp-help@broad.mit.edu • online user forum, http://groups.yahoo.com/group/GenePatternUserForum/ • online tutorial and FAQ (www.genepattern.org)

  36. Additional Questions • For additional assistance please contact • Ted Liefeld, liefeld@broad.mit.edu • gp-help@broad.mit.edu
