
Cancer Genomics Pipeline with Multivariate Correspondence Analysis

A pipeline based on multivariate correspondence analysis for cancer genomics that combines information sources and data mining techniques from several biological levels. It covers techniques for DNA, RNA, protein, and phenotype analysis.

adorothy




Presentation Transcript


  1. Max Planck Institute for Molecular Genetics. A pipeline based on multivariate correspondence analysis with supplementary variables for cancer genomics. Christine Steinhoff, Max Planck Institute for Molecular Genetics, Berlin, Germany.

  2. Data Sources. Biological levels: DNA/genome, RNA, protein, phenotype. Information sources and technology examples:
     • Literature/database: EST library; physical parameters of DNA, RNA, proteins, etc.; DNA sequence; data mining; literature mining; ...
     • In silico experimental: methylation prediction; TFBS prediction; functional annotations (repetitive elements, functional categories, ...); splicing
     • Profiling/characterizing: epigenetics; SNP arrays; arrayCGH; sequencing; expression arrays; ...
     • Interaction: ChIP-chip; protein interaction; MASS of complexes; ...
     • Phenomics: imaging; RNAi techniques; MASS; medical observations

  3. Problems. [Figure: distributions of the input data types: discrete categories, Cat(m, c); approximately lognormal and symmetric after appropriate normalization; not symmetric (skewed).] Scale and distribution differ between data types!

  4. Procedure: data input -> discretization -> filtering -> indicator coding -> multiple correspondence analysis

  5. Step 1: Discretization. Data for each patient: covariates, arrayCGH, expression. Covariates are categorical, e.g. staging, grading, smoking, mutation, ...

  6. Step 1: Discretization. arrayCGH: segmentation and discretization of arrayCGH data with the package DNAcopy. Expression: probability of expression; fold-change criterion.
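The slide names DNAcopy (an R/Bioconductor package) for segmenting arrayCGH data and a fold-change criterion for expression. The Python sketch below assumes segment log2 ratios have already been computed, for example as DNAcopy segment means, and maps them to three states with hypothetical cutoffs of ±0.2; the fold-change cutoff of 2 is likewise an illustrative assumption, not a value from the talk. The probability-of-expression approach is not shown.

```python
def discretize_cgh(segment_log2_ratios, loss_cut=-0.2, gain_cut=0.2):
    """Map segmented arrayCGH log2 ratios to 'loss' / 'normal' / 'gain'.
    The cutoffs are hypothetical, chosen only for illustration."""
    states = []
    for value in segment_log2_ratios:
        if value <= loss_cut:
            states.append("loss")
        elif value >= gain_cut:
            states.append("gain")
        else:
            states.append("normal")
    return states


def discretize_expression(fold_changes, cutoff=2.0):
    """Fold-change criterion: 'under' if FC <= 1/cutoff, 'over' if FC >= cutoff,
    otherwise 'normal'. A cutoff of 2 is an assumed, not prescribed, value."""
    states = []
    for fc in fold_changes:
        if fc <= 1.0 / cutoff:
            states.append("under")
        elif fc >= cutoff:
            states.append("over")
        else:
            states.append("normal")
    return states


print(discretize_cgh([0.45, 0.02, -0.6]))         # ['gain', 'normal', 'loss']
print(discretize_expression([3.1, 1.2, 0.4]))     # ['over', 'normal', 'under']
```

Three states per data type keep the later indicator matrix small; finer schemes (e.g. separating amplification from single-copy gain) would simply add categories.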

  7. Step 1: Discretization. Data for each patient: covariates, arrayCGH, expression. Typically n ~ 23,000 genes, so the number of genes must be reduced.

  8. Step 2: Filtering (optional). Possibilities:
     • Discard genes that show no change in any patient
     • Choose the genes with the highest variance across patients
     • Select genes with a high correlation between arrayCGH and expression
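The three filtering options can be sketched as follows. This is a minimal illustration assuming per-gene profiles across patients are stored in plain dictionaries; the function names, the top-k rule, the correlation cutoff of 0.5, and the use of Pearson correlation are all hypothetical choices.

```python
from statistics import mean, pvariance


def drop_unchanged(states_by_gene, normal="normal"):
    """Option 1: discard genes whose state is 'normal' in every patient."""
    return {g: s for g, s in states_by_gene.items() if any(x != normal for x in s)}


def top_variance(values_by_gene, k):
    """Option 2: keep the k genes with the highest variance across patients."""
    ranked = sorted(values_by_gene, key=lambda g: pvariance(values_by_gene[g]), reverse=True)
    return ranked[:k]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def correlated_genes(cgh_by_gene, expr_by_gene, min_r=0.5):
    """Option 3: keep genes whose arrayCGH and expression profiles correlate."""
    return [g for g in cgh_by_gene if pearson(cgh_by_gene[g], expr_by_gene[g]) >= min_r]


# Toy profiles across three patients (invented for illustration).
states = {"gene_a": ["gain", "normal", "gain"], "gene_b": ["normal"] * 3}
values = {"gene_a": [0.1, 0.2, 0.1], "gene_b": [1.0, -1.0, 2.0], "gene_c": [0.0, 0.0, 0.5]}
print(list(drop_unchanged(states)))   # ['gene_a']
print(top_variance(values, 2))        # ['gene_b', 'gene_c']
print(correlated_genes({"gene_a": [1, 2, 3], "gene_b": [1, 2, 3]},
                       {"gene_a": [2, 4, 6], "gene_b": [3, 2, 1]}))  # ['gene_a']
```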

  9. Procedure: data input -> discretization -> filtering -> indicator coding -> multiple correspondence analysis

  10. Step 3: Indicator matrix (binary coding). [Figure: original matrix with categories -> indicator matrix with binary coding]
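Indicator (disjunctive) coding replaces each categorical column with one 0/1 column per category. A minimal sketch, assuming rows are patients and columns are discretized variables; the variable and category names are made up for illustration:

```python
def indicator_matrix(rows, variable_names):
    """Binary (disjunctive) coding: one 0/1 column per (variable, category)."""
    columns = []
    for j, name in enumerate(variable_names):
        categories = sorted({row[j] for row in rows})  # observed categories only
        columns.extend((j, name, cat) for cat in categories)
    header = [f"{name}:{cat}" for _, name, cat in columns]
    matrix = [[1 if row[j] == cat else 0 for j, _, cat in columns] for row in rows]
    return header, matrix


# Toy data: one row per patient, one column per discretized variable.
patients = [
    ("gain", "over", "III"),
    ("normal", "normal", "II"),
    ("gain", "normal", "III"),
]
header, Z = indicator_matrix(patients, ["geneA_cgh", "geneA_expr", "stage"])
print(header)
# ['geneA_cgh:gain', 'geneA_cgh:normal', 'geneA_expr:normal', 'geneA_expr:over', 'stage:II', 'stage:III']
print(Z)  # [[1, 0, 0, 1, 0, 1], [0, 1, 1, 0, 1, 0], [1, 0, 1, 0, 0, 1]]
```

Each row sums to the number of original variables (here 3), and MCA can be defined as the correspondence analysis of this matrix.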

  11. From: Multiple Correspondence Analysis and Related Methods
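One standard way to compute MCA, covered in the book cited on the slide, is a simple correspondence analysis of the indicator matrix via the singular value decomposition of its standardized residuals, S = D_r^{-1/2}(P - r c^T) D_c^{-1/2}. The NumPy sketch below follows that route under the assumption that no category column is empty; it does not implement the Burt-matrix or adjusted-inertia variants, and the function name is hypothetical.

```python
import numpy as np


def mca(Z, n_dims=2):
    """MCA as correspondence analysis of an indicator matrix Z
    (rows: individuals, columns: categories; no empty columns).
    Returns row and column principal coordinates and principal inertias."""
    Z = np.asarray(Z, dtype=float)
    P = Z / Z.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    U, sv, Vt = U[:, :n_dims], sv[:n_dims], Vt[:n_dims]
    F = U / np.sqrt(r)[:, None] * sv         # row principal coordinates
    G = Vt.T / np.sqrt(c)[:, None] * sv      # column principal coordinates
    return F, G, sv ** 2


Z = [[1, 0, 0, 1, 0, 1],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]
F, G, inertias = mca(Z)
print(F.shape, G.shape)  # (3, 2) (6, 2)
```

A useful sanity check: summed over all dimensions, the principal inertias of an indicator matrix with J categories and Q variables equal J/Q - 1 (here 6/3 - 1 = 1).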

  12. Example: published data

  13. Covariate states' display
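Slide 1 refers to supplementary variables. Assuming the covariate states shown here are displayed as supplementary categories (the title suggests this, but the transcript does not spell it out), each state can be placed in the existing map with the transition formula: its principal coordinates are the mean of the row standard coordinates of the patients in that category. A NumPy sketch with hypothetical names and made-up coordinates:

```python
import numpy as np


def project_supplementary_columns(F, principal_inertias, H):
    """Place supplementary categories in an existing MCA map (axes unchanged).

    F: (n, k) row principal coordinates of the active analysis.
    principal_inertias: (k,) squared singular values of the active analysis.
    H: (n, m) 0/1 indicator columns of a supplementary variable (no empty column).
    Returns (m, k) principal coordinates: each category is the mean of its
    individuals' row standard coordinates."""
    F = np.asarray(F, dtype=float)
    H = np.asarray(H, dtype=float)
    sv = np.sqrt(np.asarray(principal_inertias, dtype=float))
    row_standard = F / sv              # principal -> standard coordinates
    profiles = H / H.sum(axis=0)       # each column now sums to 1
    return profiles.T @ row_standard


# Hypothetical active-map coordinates for three patients and a staging covariate.
F = [[1.0, 0.0], [-1.0, 0.5], [0.0, -0.5]]
stage = [[1, 0], [0, 1], [1, 0]]       # columns: stage III, stage II
print(project_supplementary_columns(F, [1.0, 0.25], stage))
# rows: [0.5, -0.5] and [-1.0, 1.0]
```

Because supplementary categories do not enter the SVD, covariates such as staging or smoking can be shown without influencing the axes defined by the genomic data.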

  14. Explore ERBB2 and MYC. ERBB2 amplified in arrayCGH; ERBB2 overexpression; ERBB2 normal in arrayCGH

  15. ERBB2 underexpression; ERBB2 loss in arrayCGH

  16. MYC overexpression; MYC amplification

  17. MYC normal in arrayCGH; MYC underexpression
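Slides 14 to 17 inspect where the ERBB2 and MYC states fall in the map. One simple, hypothetical way to do this programmatically is to rank the other category points by Euclidean distance to a query category in principal coordinates. Distances between category points in an MCA map are only an approximate guide to association, so this is an exploratory aid, not the method used in the talk; the labels and coordinates below are made up.

```python
import math


def nearest_categories(labels, coords, query, k=3):
    """Rank other category points by Euclidean distance to `query`
    in the MCA map (principal coordinates). Exploratory only."""
    q = coords[labels.index(query)]
    ranked = sorted(
        (math.dist(q, p), lab) for lab, p in zip(labels, coords) if lab != query
    )
    return [lab for _, lab in ranked[:k]]


# Made-up 2-D coordinates, not results from the talk.
labels = ["geneX_cgh:gain", "geneX_expr:over", "geneX_cgh:normal", "stage:III"]
coords = [(1.0, 0.2), (0.9, 0.3), (-0.5, 0.0), (0.4, -0.1)]
print(nearest_categories(labels, coords, "geneX_cgh:gain", k=2))
# ['geneX_expr:over', 'stage:III']
```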

  18. Enrichment of GO categories
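The transcript does not say which enrichment test was used. A common choice is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); the sketch below assumes it, uses only the standard library, and applies no multiple-testing correction, which a real analysis over many GO categories would need. The function name and counts are hypothetical.

```python
from math import comb


def go_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the universe, K: of those annotated to the GO category,
    n: genes in the selected set, k: selected genes in the category."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: both selected genes fall in a category of size 2 (N = 10).
print(go_enrichment_p(k=2, n=2, K=2, N=10))  # 1/45, about 0.0222
```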

  19. Thank you for your attention! Acknowledgements: Sensor Lab, CNR-INFM; Max Planck Institute for Molecular Genetics; Martin Vingron; Matteo Pardo
