Pathway Analysis

Presentation Transcript


  1. Pathway Analysis Martina Kutmon

  2. Contents • Background on Pathway Analysis • Data Analysis with PathVisio • Introduction to the Afternoon Session

  3. Biological Pathways

  4. Why Pathway Analysis? • Intuitive to biologists • Puts data in biological context • More intuitive way of looking at your data • More efficient than looking up gene-by-gene • Computational analysis • Overrepresentation analysis • Network analysis

  5. Why Pathway Analysis?

  6. Biological Context • Statistical results: • 1,300 genes are significantly regulated after treatment with X • Biological meaning: • Is a certain biological pathway activated or deactivated? • Which genes in this pathway are significantly changed?

  7. Pathway Collection • Where to get pathways? • Online pathway databases • WikiPathways www.wikipathways.org • Reactome www.reactome.org • Many more ... http://pathguide.org

  8. Identifier Mapping • Annotation: ENSG00000131828

  9. Identifier Mapping • Microarrays typically use internal ids: • Affymetrix: 205749_at • Agilent: A_14_P106416 • Illumina: ILMN_4380 • Pathways typically use gene/protein ids • Entrez Gene: 1543 • Ensembl: ENSG00000140465 • UniProt: P04637

  10. Identifier Mapping • Two scenarios: • Software will take care of it • e.g. PathVisio uses synonym databases • You will have to convert the ids yourself • DAVID: http://david.abcc.ncifcrf.gov • SOURCE: http://smd.stanford.edu/cgi-bin/source/sourceBatchSearch • BioMart: http://www.biomart.org • NetAffx: http://www.affymetrix.com
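
For the second scenario (converting the ids yourself), the conversion usually comes down to joining your data against a lookup table exported from one of the tools above. The following is a minimal sketch in base R. The data frames, column names, the `probe_B`/`probe_C` ids and the probe-to-gene pairings are hypothetical and for illustration only; a real mapping table would come from the conversion tool.

```r
# Hypothetical expression data keyed by microarray probe id.
expr <- data.frame(
  probe = c("205749_at", "probe_B", "probe_C"),
  logFC = c(1.8, -0.4, 0.9),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table (illustrative pairings, not verified annotation).
mapping <- data.frame(
  probe  = c("205749_at", "probe_B"),
  entrez = c("1543", "5326"),
  stringsAsFactors = FALSE
)

# Left join: keep every probe; entrez is NA where no mapping exists.
mapped <- merge(expr, mapping, by = "probe", all.x = TRUE)

# Probes that could not be converted and need manual attention.
unmapped <- mapped$probe[is.na(mapped$entrez)]
print(mapped)
print(unmapped)
```

Keeping the unmapped probes in the result (rather than silently dropping them) makes it easy to see how much of the dataset will be missing from the pathway analysis.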

  11. Pathway Analysis Tools • PathVisio • BioRAG • MetaCore (GeneGO) • Pathway-Express • GenMAPP / MAPPFinder

  12. PathVisio www.pathvisio.org

  13. Data Analysis with PathVisio

  14. Pathway Analysis Workflow • Prepare your data • Import your data into PathVisio • Find „enriched“ pathways • Visualize data on pathways • Export pathway images

  15. 1. Prepare your data

  16. File Format • PathVisio accepts delimited text files • Prepare and export from Excel

  17. File Format • Export from R:
      write.table(myTable, file = txtFile, col.names = NA, sep = "\t", quote = FALSE, na = "NaN")
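
To make the export call concrete, here is a small self-contained sketch that builds a table, writes it with the same `write.table` arguments as on the slide, and reads it back. The contents of `myTable`, the column names and the temporary file location are made up for illustration.

```r
# Hypothetical example data: gene ids as row names, two made-up statistics.
myTable <- data.frame(
  logFC   = c(1.8, -0.4, NA),
  P.Value = c(0.001, 0.52, 0.03),
  row.names = c("1543", "5326", "4357")
)

# Write to a temporary file so the example does not touch your working directory.
txtFile <- file.path(tempdir(), "expression_data.txt")

# Same arguments as on the slide: tab-delimited, no quotes, NaN for missing
# values. col.names = NA writes an empty header cell above the row-name column.
write.table(myTable, file = txtFile, col.names = NA,
            sep = "\t", quote = FALSE, na = "NaN")

# Inspect the header line and read the file back to confirm the layout.
header <- readLines(txtFile)[1]
check <- read.delim(txtFile, row.names = 1, check.names = FALSE)
print(check)
```

The `col.names = NA` setting matters when row names hold the identifiers: without it the header would have one column fewer than the data rows.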

  18. Identifier Systems PathVisio accepts many identifier systems: • Probes • Affymetrix, Illumina, Agilent,... • Genes and Proteins • Entrez Gene, Ensembl, UniProt, HUGO,... • Metabolites • ChEBI, HMDB, PubChem,...

  19. 2. Import your data

  20. Import Expression Data

  21. Gene Database [Figure: identifiers in your data (Entrez Gene: 5326, 153, 4357, 65543, 2094, 90218, …) compared with identifiers in a pathway (4357, ??, ENS0002114, P4235)]

  22. Gene Database • Download from www.pathvisio.org/wiki/PathVisioDownload • 32 species supported

  23. Identifier and System Code

  24. Exception File

  25. Pgex File • Imported data is stored in a .pgex file • Load an existing dataset:

  26. 3. Find „enriched“ pathways

  27. Statistics [Figure legend: unchanged gene, changed gene] Question: • Does the small circle have a higher percentage of changed genes than the large circle? • Is this difference significant?

  28. Calculate Z-scores • The Z-score can be used as a measure of how much a subset of genes differs from the rest • r = changed genes in Pathway • n = total genes in Pathway • R = changed genes • N = total genes • Other enrichment calculation methods: Ackermann M et al., A general modular framework for gene set enrichment analysis, BMC Bioinformatics, 2009
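
The formula shown on this slide did not survive in the transcript. As an assumption, the MAPPFinder-style Z-score, written with the symbols defined above, would be:

```latex
z = \frac{r - n\,\dfrac{R}{N}}
         {\sqrt{\,n\,\dfrac{R}{N}\left(1 - \dfrac{R}{N}\right)\left(1 - \dfrac{n-1}{N-1}\right)}}
```

Here $n R / N$ is the number of changed genes expected in the pathway if changed genes were spread evenly over the dataset, and the denominator is the standard deviation of that count when sampling without replacement.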

  29. Z-score • The Z-score is a ranking method. • High Z-score → the selection is very different from the rest of the dataset • Z-score = 0 → the selection is not different at all

  30. Criteria • Define criterion and select pathway collection

  31. Z-score Calculation • r = changed genes in Pathway • n = total genes in Pathway

  32. Z-score Calculation
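
To see how r, n, R and N combine into a score, the sketch below computes a Z-score in R. It assumes the MAPPFinder-style definition (observed minus expected changed genes, divided by the hypergeometric standard deviation); the function name `pathway_zscore` and all counts are hypothetical, and PathVisio's own implementation may differ in detail.

```r
# Z-score for one pathway, assuming the MAPPFinder-style definition:
#   r = changed genes in pathway, n = total genes in pathway,
#   R = changed genes in dataset, N = total genes in dataset.
pathway_zscore <- function(r, n, R, N) {
  p <- R / N                      # overall fraction of changed genes
  expected <- n * p               # changed genes expected in the pathway
  variance <- n * p * (1 - p) * (1 - (n - 1) / (N - 1))
  (r - expected) / sqrt(variance)
}

# Made-up counts: 1,300 of 13,000 genes changed; the pathway has 50 genes.
z_enriched <- pathway_zscore(r = 20, n = 50, R = 1300, N = 13000)
z_neutral  <- pathway_zscore(r = 5,  n = 50, R = 1300, N = 13000)

print(z_enriched)  # clearly positive: more changed genes than expected
print(z_neutral)   # zero: observed equals expected (5 of 50)
```

Because the score is used for ranking, pathways are typically sorted by decreasing Z-score rather than judged against a fixed cutoff.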

  33. 4. Visualize your data

  34. Create a Visualization • Add/Remove Visualizations • Activate visualization options

  35. Color by Data Values

  36. Color Set based on Criterion

  37. Color Set based on Gradient

  38. Visualizations • Gradient based • Fold-change • Rule based • Significant genes
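
The difference between the two visualization types can be sketched outside PathVisio in a few lines of R. This is only an illustration of the idea, not how PathVisio computes colors: the thresholds (|logFC| > 1, p < 0.05), the gradient range (-2 to 2), the colors and the data are all assumptions.

```r
# Hypothetical results for four genes.
expr <- data.frame(
  gene    = c("1543", "5326", "4357", "2094"),
  logFC   = c(2.0, -2.0, 0.0, 0.6),
  P.Value = c(0.001, 0.004, 0.80, 0.20),
  stringsAsFactors = FALSE
)

# Rule based: one color for genes that meet a criterion, another for the rest.
expr$significant <- abs(expr$logFC) > 1 & expr$P.Value < 0.05
expr$rule_color  <- ifelse(expr$significant, "orange", "grey")

# Gradient based: map fold-change on [-2, 2] to a blue-white-red ramp.
ramp    <- colorRampPalette(c("blue", "white", "red"))(101)
clamped <- pmax(pmin(expr$logFC, 2), -2)      # clip values outside the range
idx     <- round((clamped + 2) / 4 * 100) + 1 # position 1..101 on the ramp
expr$gradient_color <- ramp[idx]

print(expr)
```

A rule answers a yes/no question per gene (is it significant?), while a gradient shows direction and size of the change; the two are often combined on the same pathway.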

  39. Gradient based

  40. Rule based

  41. 5. Export Pathways

  42. Export Pathway • Export to image formats PNG

  43. Any data associated with a gene, protein or metabolite

  44. PathVisio Team • Maastricht University • Martijn van Iersel • Thomas Kelder • Chris Evelo • Gladstone Institute (San Francisco) • Alexander Pico • Kristina Hanspers • Bruce Conklin • Around the world • Open Source Community

  45. Afternoon Session

  46. Afternoon Session • Pathway Analysis of liver data set with PathVisio • Find „enriched“ pathways in a WikiPathways analysis collection for rats • Create visualization and set the data in a biological context
