GenePattern caBIG Adaptor Overview. Session Date: 1/26/05 Session Length: 30 minutes Target Audience: caArray users from the caBIG community Trainer: Ted Liefeld Senior Software Architect Cancer Informatics Broad Institute of MIT and Harvard.
Session Details: Session Objectives • Overview of GenePattern • caArrayImportViewer & the gct result file format • Using the caArrayImportViewer from a GenePattern server • Using caArrayImportViewer directly from a caBIG CVS repository directory
Session Details: Lesson Plan • Lesson 1: Introduction to GenePattern • Lesson 2: Introduction to caArrayImportViewer and gct files • Lesson 3: Running caArrayImportViewer in GenePattern • Lesson 4: Running caArrayImportViewer from the caBIG CVS repository
Introduction to GenePattern • In this Lesson, we will: • Review GenePattern
GenePattern: A platform for integrative genomics • Pipeline Environment • Graphical Environment • Module Repository • Programming Environment • Task Integrator [Figure: overview diagram of the five components. It includes a sample analysis pipeline (remote data source, Threshold, PreprocessDataset, GeneNeighbors, SelectFeaturesColumns, SelectFeaturesRows, HeatMapViewer), a list of analysis tasks (Marker Selection, WV, SOM, Transpose), and a truncated R script, Golub_et_al_1999.R, that implements the supervised prediction method of Golub et al., Science 286:531-537 (1999) and calls the MarkerSelection module.]
GenePattern Module Repository [Diagram: GenePattern Module Repository and User’s GenePattern installation] • Modules are hosted on the Broad Institute’s GenePattern Module Repository • Users download modules from the module repository onto their own GP server • Users check for new and updated modules and install them automatically
~65 GenePattern Modules (1/06) • External modules adapted from: Bioconductor, MeV (TIGR), Fred Hutchinson Cancer Research Center • Clustering: SOM, Hierarchical, Consensus • Prediction: kNN, Weighted Voting, SVM • Proteomics: AreaChange, CompareSpectra, LocatePeaks, mzXMLToCsv, ProteoArray, etc. • Marker Selection: Class Neighbors, Gene Neighbors, FWER, Q-value, FDR • Preprocessing/Utilities: Threshold, Variation Filter, MAGE-ML, GEO Download, Transpose, etc. • Statistical Methods: Missing value imputation, Kolmogorov-Smirnov score, NMF, PCA • Visualizers: Heat Map, Hierarchical Clustering, SOM, PCA, Feature Summary, Prediction Results, Gene List Significance • Annotation: GeneCruiser, Affymetrix Chip Probe Conversion
Graphical Environment Features • Object Browser: organizes data and result files; allows easy manipulation of local and remote files • Analysis UI: provides a simple, flexible interface for launching analyses and visualizations; documents parameters and input types • Maintains a history of analyses • Automatically determines which modules can be run on a given file
GenePattern Pipeline Environment • Remote data source [http://research.dfci.harvard.edu/justin/demo/nci60.res] • Threshold: impose a baseline and a ceiling • PreprocessDataset: extract breast samples • GeneNeighbors: compute nearest neighbors of cyclin D1 in breast cells • SelectFeaturesColumns: extract ovary samples • SelectFeaturesRows: get expression data for breast neighbors in ovary cells • HeatMapViewer: project data as a heat map
Pipeline Use Cases • Create a workflow that runs an analysis automatically • Run the same analysis over different data sets • Encapsulate the algorithms and parameters you used for a method so it can be remembered later • Share a pipeline with another researcher who wants to reproduce your results
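The idea behind these use cases can be sketched in a few lines of Python: if a pipeline is stored as an ordered list of steps plus their parameters, the same list can be rerun on any data set or handed to a colleague. This is only an illustration, not GenePattern's implementation. The function names (`threshold`, `select_columns`, `run_pipeline`), the dataset layout, the floor/ceiling values, and the demo numbers are all assumptions made for the example.

```python
# Hypothetical in-memory dataset: sample names plus one list of values per gene.
def make_dataset(samples, rows):
    return {"samples": list(samples), "rows": {g: list(v) for g, v in rows.items()}}


def threshold(ds, floor, ceiling):
    """Clamp every value into [floor, ceiling] (impose a baseline and a ceiling)."""
    rows = {
        gene: [min(max(x, floor), ceiling) for x in values]
        for gene, values in ds["rows"].items()
    }
    return {"samples": ds["samples"], "rows": rows}


def select_columns(ds, keep):
    """Keep only the samples for which keep(sample_name) is true."""
    idx = [i for i, name in enumerate(ds["samples"]) if keep(name)]
    return {
        "samples": [ds["samples"][i] for i in idx],
        "rows": {g: [values[i] for i in idx] for g, values in ds["rows"].items()},
    }


def run_pipeline(ds, steps):
    """Apply each (function, keyword-arguments) step in order."""
    for func, params in steps:
        ds = func(ds, **params)
    return ds


# The pipeline is plain data, so the same steps can be rerun on another dataset.
# The floor and ceiling values are illustrative, not taken from the slides.
PIPELINE = [
    (threshold, {"floor": 20, "ceiling": 16000}),
    (select_columns, {"keep": lambda name: name.startswith("breast")}),
]

# Made-up demo values.
demo = make_dataset(
    ["breast_1", "ovary_1", "breast_2"],
    {"geneA": [5, 100, 20000], "geneB": [50, 10, 300]},
)
result = run_pipeline(demo, PIPELINE)
print(result["samples"])
print(result["rows"])
```

Because the steps and parameters live in one list, rerunning the analysis on a second data set is a single call to `run_pipeline` with a different first argument.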
Task Integrator Features • Add tasks and visualizers without writing code, via a Web-based form • Modules can be written in any language • Once added, modules are usable by other users of a GenePattern server • Edits to modules are automatically versioned, so a pipeline can specify which version of a module to run
GenePattern Programming Language Environment • Users can run any module or pipeline as a routine call in a programming language. • The GenePattern server synchronizes programming libraries with the currently available modules. • Any pipeline can be converted to equivalent code. • Available languages are Java, MATLAB, and R.
GenePattern Architecture • Standalone: desktop, laptop • Client/server: single server, compute grid (e.g., LSF)
GenePattern Users • Over 1850 registered users (1/06) in 450+ organizations in 53 countries • Used as courseware for MIT Materials Science classes (wrapping Fortran code on a Linux cluster): 3.320 Atomistic Modeling of Materials; 3.021j Introduction to Modeling and Simulation • Training: over 200 users have attended GenePattern workshops
Introduction: Any Questions? • GenePattern • Module Repository • Graphical Environment • Pipeline Environment • Task Integrator • Architecture
caArrayImportViewer and gct files • In this Lesson, we will: • Learn what caArrayImportViewer retrieves from caArray • Learn about the gct file format
caArrayImportViewer • A GenePattern module that can retrieve data from caArray • Implemented in Java • Runnable from the command line • Allows the user to select DerivedBioAssays from an experiment and write them to a GCT file
Derived BioAssays • Definition (from MGED): “A bioassay is a single step within a microarray experiment. There are 3 types of bioassays. A physical bioassay correspond to wet-lab microarray experimental step. A measured bioassays corresponds to a situation after feature extraction has been performed. A derived bioassay corresponds to data processing experimental steps.” • caArrayImportViewer only retrieves derived bioassays • Typically we want the ‘signal’ generated from MAS5 or RMA for an Affymetrix array • Physical and measured bioassays are not appropriate input for other (existing) GenePattern tasks
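The restriction to derived bioassays, followed by the choice of a single quantitation type such as the signal, can be pictured as a filter over typed records. The Python sketch below is purely illustrative: the `BioAssay` class, its fields, and all names and values are hypothetical and do not reflect the caArray object model or its client API.

```python
from dataclasses import dataclass, field


@dataclass
class BioAssay:
    """Hypothetical stand-in for a bioassay record (not the caArray API)."""
    name: str
    kind: str  # "physical", "measured", or "derived"
    # quantitation type -> {probe id: value}
    values: dict = field(default_factory=dict)


def derived_signal(bioassays, quantitation_type):
    """Return {bioassay name: {probe: value}} for derived bioassays only,
    restricted to the requested quantitation type."""
    return {
        b.name: b.values[quantitation_type]
        for b in bioassays
        if b.kind == "derived" and quantitation_type in b.values
    }


# Made-up records for illustration.
assays = [
    BioAssay("scan_01", "measured", {"Signal": {"probe_1": 10.0}}),
    BioAssay("mas5_01", "derived", {"Signal": {"probe_1": 812.4}}),
    BioAssay("rma_01", "derived", {"Signal": {"probe_1": 9.7}}),
]

selected = derived_signal(assays, "Signal")
print(sorted(selected))
```

The measured record is skipped even though it carries a value under the same quantitation type, which mirrors the point that only derived data is suitable input for downstream tasks.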
caArrayImportViewer parameters • caArrayImportViewer requires you to have in advance: • the caArray URL (//caarray-mageom-server.nci.nih.gov:8080/) • a username and password to log in • Uses caArray-client.jar and other NCICB jar files to connect securely to the specified caArray instance • Does not store or remember login information in any way • The user then navigates the GUI (next lesson) to select the experiment, bioassays, etc., to save to file.
caArrayImportViewer Output – GCT file • Defined in our tutorial http://www.broad.mit.edu/cancer/software/genepattern/tutorial/
GCT File • Tab delimited – 2 header lines, 2 columns of annotation • Very similar to Stanford format cdt file
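As a rough illustration of the layout, the sketch below writes and reads a small GCT file in Python. It assumes the usual GCT arrangement: a version line (`#1.2`), a line giving the number of data rows and sample columns, a column-name row, and then one tab-delimited row per probe whose first two columns (`Name`, `Description`) are the annotation. The helper names `write_gct` and `read_gct` and the demo values are made up for this example; they are not part of GenePattern or caArrayImportViewer, and the tutorial linked above remains the authoritative definition.

```python
import io


def write_gct(out, samples, rows):
    """Write rows of (name, description, values) as GCT text to a file-like object."""
    out.write("#1.2\n")                              # header line 1: version
    out.write(f"{len(rows)}\t{len(samples)}\n")      # header line 2: dimensions
    out.write("\t".join(["Name", "Description"] + list(samples)) + "\n")
    for name, description, values in rows:
        out.write("\t".join([name, description] + [str(v) for v in values]) + "\n")


def read_gct(src):
    """Parse GCT text from a file-like object into (version, samples, rows)."""
    version = src.readline().strip()
    n_rows, n_cols = (int(x) for x in src.readline().split()[:2])
    samples = src.readline().rstrip("\n").split("\t")[2:]
    rows = []
    for line in src:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        rows.append((fields[0], fields[1], [float(x) for x in fields[2:]]))
    if len(rows) != n_rows or len(samples) != n_cols:
        raise ValueError("dimensions line does not match the data")
    return version, samples, rows


# Round trip with made-up values.
buffer = io.StringIO()
write_gct(
    buffer,
    ["sample_1", "sample_2"],
    [("probe_1", "first probe", [1.5, 2.0]), ("probe_2", "second probe", [3.0, 4.25])],
)
print(buffer.getvalue())
buffer.seek(0)
version, samples, rows = read_gct(buffer)
```

The dimensions check in `read_gct` is a cheap way to catch truncated or hand-edited files before they are passed to an analysis.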
caArrayImportViewer & gct files: Any Questions? • caArrayImportViewer • Derived BioAssay data • URL and login • gct file format
caArrayImportViewer use in GenePattern • In this lesson, we will use the caArrayImportViewer from a GenePattern server • Launch caArrayImportViewer • Login • Select experiment • Select Derived BioAssays • Select Quantitation Type • Provide file name • Save result file
caArrayImportViewer in GenePattern: Any Questions? • In this lesson, we used caArrayImportViewer from a GenePattern server • Launched caArrayImportViewer • Logged in • Selected experiment • Selected Derived BioAssays • Selected Quantitation Type • Provided file name • Saved result file
Session Review: Questions? • Any questions about anything we have covered in the entire class: • Lesson 1: Introduction to GenePattern • Lesson 2: Introduction to caArrayImportViewer and gct files • Lesson 3: Running caArrayImportViewer in GenePattern • Lesson 4: Running caArrayImportViewer from the caBIG CVS repository
Recommended Follow-On Training for GenePattern • GenePattern Workshops • Offered at the Broad Institute and at conferences • Registration and dates at http://www.broad.mit.edu/cancer/software/genepattern/workshop/ • Next steps (if no additional training is offered) • There are many resources to help users learn the features of GenePattern and communicate with the development team: • email help desk, gp-help@broad.mit.edu • online user forum, http://groups.yahoo.com/group/GenePatternUserForum/ • online tutorial and FAQ (www.genepattern.org)
Additional Questions • For additional assistance please contact • Ted Liefeld, liefeld@broad.mit.edu • gp-help@broad.mit.edu