GenMAPP A Software Tool for Analyzing Genome-Scale Data in the Context of Biological Pathways and the Gene Ontology J. David Gladstone Institute of Cardiovascular Disease UCSF
Overview • Intro to GenMAPP - GenMAPP analysis example • Advanced features
Analyzing Large-Scale Data in the Context of Biological Pathways • Which genes are expressed in my dataset? • What biological processes are important in my data model? • New insight into underlying biology
Analyzing Large-Scale Data in the Context of Biological Pathways • View data in the context of known biology • Rather than showing which individual genes are changed, pathway analysis emphasizes the processes that are changed • Biologists are familiar with pathways, so they are a natural way of sharing data
GenMAPP: Gene Map Annotator and Pathway Profiler (www.GenMAPP.org) • Visualize gene expression and other genomic data on biological pathways and other groupings of genes • Global analysis identifies significantly changed processes and functional groups
GenMAPP • Developed in the Conklin lab at Gladstone as an internal tool for dealing with microarray data • Approximately 12,000 registered users to date • 100% Free!! • Used in 150-200 publications • Open source, code available at SourceForge.net • Current version for Windows only (coded in VB)
SNPs with Predicted Effects http://alto.compbio.ucsf.edu/LS-SNP/
SNPs that Predispose to Myocardial Infarction • 547 acute MI cases; 505 controls • 58 SNPs in 35 genes => SNPs in 5 different genes showed statistical association with MI • Study spans 19 pathways => 4 of 5 hits are on a single pathway Tobin et al., European Heart Journal 2004
SNPs and Myocardial Infarction Tobin et al, European Heart Journal 2004
SNP Data in GenMAPP • Visualization: distribution of SNPs per gene • Prioritization: mapping SNP annotations onto pathways • Analysis: interpreting SNP data in the context of biological pathways • Future directions: high-resolution visualization of individual SNPs with the ability to overlay data
MAPPFinder
• Originally developed as a separate application by Scott Doniger*
• Inputs: Gene Ontology terms, experimental data, GenMAPP pathways
• Analysis: global comparison of changes in the dataset to changes expected by chance
• Output: pathways and GO terms with significant changes
* Doniger et al. Genome Biology 4(1):R7
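The comparison against "changes expected by chance" can be illustrated with a z-score built on the hypergeometric distribution. This is a minimal sketch that assumes that standard statistic; it is not MAPPFinder's own code, and the function name and the example counts are made up for illustration.

```python
import math

def pathway_z_score(N, R, n, r):
    """z-score for over-representation of changed genes in one pathway or GO term.

    N: genes measured in the experiment
    R: genes in the experiment that meet the user's criterion
    n: measured genes that belong to the pathway
    r: pathway genes that meet the criterion
    """
    if N < 2 or not (0 <= R <= N) or not (0 <= r <= n <= N):
        raise ValueError("counts are inconsistent")
    p = R / N                      # fraction of all measured genes that changed
    expected = n * p               # changed genes expected in the pathway by chance
    # Variance with the finite-population correction of the hypergeometric model.
    variance = n * p * (1 - p) * (1 - (n - 1) / (N - 1))
    if variance == 0:
        return 0.0                 # no variation possible, so no evidence either way
    return (r - expected) / math.sqrt(variance)

# Illustrative counts only: 1000 genes measured, 100 changed,
# a 50-gene pathway with 15 changed genes.
z = pathway_z_score(N=1000, R=100, n=50, r=15)
print(f"z = {z:.2f}")
```

A positive score means the pathway holds more changed genes than chance predicts; ranking pathways and GO terms by this score is one way to surface the "significant changes" named on the slide.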
GenMAPP Relationship Schema
• User Dataset (GEX): Gene ID = 1415904_at, System = Affymetrix, Criterion = Blue
• Pathway MAPP: Gene Name = Lpl, System = EntrezGene, Gene ID = 16956
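The schema links a gene box on a MAPP to a row in the user's dataset even when the two use different ID systems. The sketch below shows that idea with plain dictionaries. The data structures, the function name, and the "White" fallback color are assumptions for illustration; GenMAPP's actual database layout is not shown in the slides.

```python
# Hypothetical in-memory stand-ins for the three pieces of the schema.
# User dataset (GEX): (system, gene ID) -> color of the criterion the row met.
dataset = {("Affymetrix", "1415904_at"): "Blue"}

# Gene database: dataset ID -> equivalent IDs in other systems.
gene_database = {("Affymetrix", "1415904_at"): [("EntrezGene", "16956")]}

# Pathway MAPP: gene boxes, each carrying its own ID and system.
mapp = [{"name": "Lpl", "system": "EntrezGene", "id": "16956"}]

def color_for(mapp_gene, dataset, gene_database, default="White"):
    """Return the dataset color for a MAPP gene, following ID links if needed."""
    target = (mapp_gene["system"], mapp_gene["id"])
    if target in dataset:                      # same ID system on both sides
        return dataset[target]
    for data_id, linked_ids in gene_database.items():
        if target in linked_ids and data_id in dataset:
            return dataset[data_id]            # matched through the gene database
    return default                             # gene not found in the dataset

for gene in mapp:
    print(gene["name"], color_for(gene, dataset, gene_database))
```

Here the Lpl box (EntrezGene 16956) picks up the color assigned to Affymetrix probe set 1415904_at, which is the relationship the slide illustrates.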
GenMAPP Supported Species • Fruit fly • Human • Mouse • Rat • Worm • Yeast • Zebrafish • Chicken • Dog • Cow • By request: Chimp, Frog, Fugu (F. rubripes), Honey bee, Mosquito, Pufferfish (T. nigroviridis)
GenMAPP Supported Gene IDs
• Species-specific: MGI, RGD, SGD, WormBase, ZFIN, HUGO, FlyBase
• Annotations: InterPro, EMBL, OMIM, Pfam, Gene Ontology
• Gene IDs: Affymetrix, Entrez Gene, RefSeq (protein only), UniGene, UniProt, Ensembl, PDB
Available MAPP Archives
• Contributed MAPPs: Hand-curated pathways created at GenMAPP.org or submitted by GenMAPP users. >70 MAPPs for human, mouse and rat.
• Inferred MAPPs: Inferred from human contributed MAPPs, using homology information from HomoloGene and Ensembl.
• Tissue-Specific MAPPs (human and mouse only): Based on the analysis of two microarray datasets generated by the Genomics Institute of the Novartis Research Foundation.
• GO Sample MAPPs: A partial collection of GO terms formatted as GenMAPP MAPP files, each containing between 100 and 300 genes. GO MAPPs are formatted as lists of genes and do not contain any graphics other than the gene object and the label.
• SGD metabolic MAPPs (yeast only): Derived from the yeast pathways at SGD.
• KEGG converted MAPPs: Converted from the Pathway Resource at the Kyoto Encyclopedia of Genes and Genomes.
Download all MAPPs through the Downloader in GenMAPP.
Input Data • Data in spreadsheet summary format • NO raw data • Data should include metrics that you want to use as cutoffs: • avg signal, ratio, fold, signal quality, p-value, cluster ID, other statistics • Include ALL genes measured in experiment, DO NOT pre-filter • Choose optimal primary gene ID • Custom annotation can be useful (Database includes standard annotation) • Example: Group Comparison Experiment • Fold changes between groups • p-value associated with fold • Average signal per group
GenMAPP Workflow
• Start from pre-processed, formatted data (with statistics, metrics)
• Import data and set color criteria (Expression Dataset Manager)
• Create/edit/convert pathways (Drafting Board, MAPPBuilder, Converter)
• Display data on pathways (Drafting Board)
• Gene Ontology analysis (MAPPFinder)
• Export pathways to the web (MAPPSets)
Example: Analysis of Complex Time-Course Data Challenges: • How to represent your data in an intuitive manner • How to analyze patterns rather than specific comparisons. Approach: • Set up hypotheses to test • Attach global statistics (e.g. ANOVA) and pattern recognition • Efficiently import data into GenMAPP • Visualize cluster and time-point data (new in GenMAPP 2.1) • Global analysis of pathways/ontologies (MAPPFinder) • Export results to the web or for publication
Set Up Hypotheses to Test Build a MAPP to Test a Hypothesis • Use the literature and previous knowledge about the model you are studying to build a list of candidate genes or a pathway. Step 1: • Collect a list of gene IDs • Import them using the MAPPBuilder function • Organize them into a biological pathway along with predictions of expected changes. Salomonis N, et al. Genome Biol. 2005 6:R12–R12.16
Dataset: Mouse Uterine Pregnancy Time-Course Experiment Design: • Analyzed 7 time-points (3-8 replicates each): • Non-pregnant mice • 14.5, 16.5 and 17.5 days post fertilization • 18.5 days (term pregnancy) • 6 hours and 24 hours postpartum • Hybridized to mouse 11k Affymetrix arrays Analysis: • Normalized and adjusted expression (gcrma, R) • Performed a global F-test (multtest, R) • Hierarchical and partitioned clustering (hopach, R) Salomonis N, et al. Genome Biol. 2005 6:R12–R12.16
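To make the "global F-test" step concrete, here is a minimal sketch of a one-way ANOVA F statistic for a single probe set across time-point groups. It is written in plain Python rather than R, it does not reproduce the multtest package, and it stops at the statistic without computing a p-value. The function name and the numbers are illustrative assumptions.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F statistic; `groups` is a list of lists of replicate values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and spare replicates")
    grand_mean = mean([v for g in groups for v in g])
    group_means = [mean(g) for g in groups]
    # Variation of group means around the grand mean (between-group).
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of replicates around their own group mean (within-group).
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    if ms_within == 0:
        return float("inf") if ms_between > 0 else 0.0
    return ms_between / ms_within

# Made-up values for one probe set in three time-point groups.
print(f_statistic([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

A large F value indicates that expression differs between at least some time points by more than replicate noise would explain, which is why the statistic suits filtering before clustering.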
HOPACH (Hierarchical Ordered Partitioning and Collapsing Hybrid) Clustering • Use the global F-test to filter the probe set list down to 3500 entries. • Cluster fold changes for each replicate compared to the non-pregnant baseline mean. • Take the top-level cluster (left) and re-associate it with the expression data.
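The second step, fold changes for each replicate relative to the non-pregnant baseline mean, could look like the sketch below. It assumes expression values are already on a log2 scale, so a fold change is a simple difference; the group labels, function name, and values are hypothetical, and the clustering itself (hopach in R) is not reproduced.

```python
from statistics import mean

def log_fold_vs_baseline(replicates_by_group, baseline="non_pregnant"):
    """Per-replicate log2 fold change against the mean of the baseline group.

    Assumes log2-scale input, so fold change = value - baseline mean.
    """
    base = mean(replicates_by_group[baseline])
    return {
        group: [value - base for value in values]
        for group, values in replicates_by_group.items()
        if group != baseline
    }

# Made-up log2 expression values for one probe set.
probe = {
    "non_pregnant": [8.0, 8.4],
    "d14.5": [9.2, 9.0],
    "d18.5": [7.1, 7.3],
}
for group, folds in log_fold_vs_baseline(probe).items():
    print(group, [round(f, 2) for f in folds])
```

Keeping one value per replicate, rather than one per group, preserves replicate-level variation in the vectors passed to clustering.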
GenMAPP Input Import File Design: • Include all probe data (not just filtered) • Include the following columns of data • Multtest p-values • HOPACH clusters • Average group expression values • Fold changes (all relevant pairwise comparisons) • Gene Database system code Salomonis N, et al. Genome Biol. 2005 6:R12–R12.16
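A tab-delimited import file with these columns can be assembled with a short script. This is a sketch under stated assumptions: the column names, the "X" system code for Affymetrix, and every data value are placeholders, so check the GenMAPP documentation for the exact header and system codes the importer expects.

```python
import csv
import io

# Hypothetical column layout: gene ID and system code first, then the metrics.
COLUMNS = [
    "GeneID", "SystemCode", "ftest_p", "hopach_cluster",
    "avg_non_pregnant", "avg_d14_5", "log2_fold_d14_5_vs_np",
]

def write_import_file(rows, handle):
    """Write every probe set (no pre-filtering) as tab-delimited text."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        missing = [c for c in COLUMNS if c not in row]
        if missing:
            raise ValueError(f"row {row.get('GeneID')} lacks columns: {missing}")
        writer.writerow(row)

# One made-up row; a real file would hold all probe sets on the array.
rows = [{
    "GeneID": "1415904_at", "SystemCode": "X", "ftest_p": 0.003,
    "hopach_cluster": 2, "avg_non_pregnant": 8.2, "avg_d14_5": 9.1,
    "log2_fold_d14_5_vs_np": 0.9,
}]
buffer = io.StringIO()
write_import_file(rows, buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the example self-contained; passing an open file handle instead produces a file ready for the Expression Dataset Manager.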
GenMAPP Expression Dataset Manager Import Text File into GenMAPP • Tell GenMAPP which columns have non-numeric data. Establishing Rules for Coloring Gene Boxes: • Design criteria that capture any patterns you want to see. • Here we want: • Fold change gradients for up- and down-regulated genes in time-point comparisons (Color Sets) • Different colors assigned to each HOPACH cluster Salomonis N, et al. Genome Biol. 2005 6:R12–R12.16
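The two coloring schemes described here, a fold-change gradient and one color per HOPACH cluster, can be sketched as ordered rule lists. This sketch assumes the first criterion a gene meets decides its color; the thresholds, color names, field names, and the gray fallback are illustrative and are not taken from the slides.

```python
# Hypothetical fold-change color set: (label, test, color), checked in order.
FOLD_CRITERIA = [
    ("up > 2-fold", lambda r: r["p"] < 0.05 and r["log2_fold"] >= 1.0, "dark red"),
    ("up > 1.5-fold", lambda r: r["p"] < 0.05 and r["log2_fold"] >= 0.585, "light red"),
    ("down > 2-fold", lambda r: r["p"] < 0.05 and r["log2_fold"] <= -1.0, "dark blue"),
    ("down > 1.5-fold", lambda r: r["p"] < 0.05 and r["log2_fold"] <= -0.585, "light blue"),
]

# Hypothetical cluster color set: one color per HOPACH cluster label.
CLUSTER_COLORS = {1: "orange", 2: "green", 3: "purple"}

def cluster_criteria(cluster_colors):
    """Turn a cluster -> color mapping into the same (label, test, color) form."""
    return [
        (f"cluster {c}", lambda r, c=c: r["cluster"] == c, color)
        for c, color in cluster_colors.items()
    ]

def pick_color(record, criteria, no_match="gray"):
    """Return the color of the first criterion the record satisfies."""
    for _label, test, color in criteria:
        if test(record):
            return color
    return no_match

probe = {"log2_fold": 1.2, "p": 0.01, "cluster": 2}
print(pick_color(probe, FOLD_CRITERIA))
print(pick_color(probe, cluster_criteria(CLUSTER_COLORS)))
```

Because rules are checked in order, the stricter 2-fold tests sit above the 1.5-fold tests; reversing them would color every strongly changed gene with the lighter shade.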
Viewing Time-Course Data on MAPPs Method 1) • View criteria one at a time on pathways of interest. Method 2) • View clusters directly on the pathway. Method 3) • View all criteria of interest simultaneously.
Advanced Features • Customizing a Gene Database / Creating a Gene Database for a non-supported species => Implement GenMAPP for a novel model species • Create your own pathway MAPPs => Implement GenMAPP for a novel model species => Author novel pathways based on your discoveries • High-throughput export of a browsable HTML pathway archive => For interactive web display of data on a pathway archive International Gene Trap Consortium