APO-SYS workshop on data analysis and pathway charting

Presentation Transcript


  1. APO-SYS workshop on data analysis and pathway charting. Igor Ulitsky, Ron Shamir’s Computational Genomics Group

  2. Part I: Presentations • EXPANDER • AMADEUS • SPIKE • MATISSE

  3. Part II: Hands-on Session • EXPANDER • MATISSE • SPIKE

  4. EXPression ANalyzer and DisplayER (EXPANDER). Adi Maron-Katz, Chaim Linhart, Amos Tanay, Rani Elkon, Israel Steinfeld, Seagull Shavit, Igor Ulitsky, Roded Sharan, Yossi Shiloh, Ron Shamir. http://acgt.cs.tau.ac.il/expander

  5. EXPANDER • Low level analysis: • Missing data estimation (KNN or manual) • Normalization: quantile, loess • Filtering: fold change, variation, t-test • Standardization: mean 0 std 1, take log, fixed norm • High level gene partition analysis: • Clustering • Biclustering • Ascribing biological meaning to patterns: • Enriched functional categories (Gene Ontology) • Identify transcriptional regulators – promoter analysis • Built-in support for 9 organisms: • human, mouse, rat, chicken, zebrafish, fly, worm, arabidopsis, yeast
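Slide 5 lists KNN as one option for missing-data estimation. As a rough illustration of the idea, and not EXPANDER's own implementation, the sketch below fills each missing entry with the average of that condition's values in the k most similar probes. The function name `knn_impute`, the default k=2 and the rescaling of the distance over shared conditions are assumptions made for this example.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries in a probes x conditions matrix from the k nearest rows.

    Distance between two rows is Euclidean over the conditions both rows have,
    rescaled to the full number of conditions (an assumed, not prescribed, choice).
    Returns a new matrix; the input is left unchanged.
    """
    n_cols = len(matrix[0])
    result = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        candidates = []
        for r, other in enumerate(matrix):
            if r == i:
                continue
            shared = [j for j in range(n_cols)
                      if row[j] is not None and other[j] is not None]
            if not shared:
                continue
            sq = sum((row[j] - other[j]) ** 2 for j in shared)
            candidates.append((math.sqrt(sq * n_cols / len(shared)), other))
        candidates.sort(key=lambda t: t[0])  # nearest rows first
        for j in missing:
            # Average the k nearest rows that actually have a value in column j.
            donors = [other[j] for _, other in candidates if other[j] is not None][:k]
            if donors:
                result[i][j] = sum(donors) / len(donors)
    return result
```

For example, `knn_impute([[1.0, 2.0, None], [1.1, 2.1, 3.0], [0.9, 1.9, 5.0], [10.0, 10.0, 100.0]])` fills the gap from the two close rows and ignores the distant fourth row.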

  6. Input data Normalization/ Filtering Links to public annotation databases Visualization utilities Clustering (CLICK, SOM, K-means, Hierarchical) Biclustering (SAMBA) Functional enrichment (TANGO) Promoter signals (PRIMA)

  7. EXPANDER - Preprocessing • Input data: • Expression matrix (probe-row; condition-column) • One-channel data (e.g., Affymetrix) • Dual-channel data (cDNA microarrays, data are (log) ratios between the Red and Green channels) • ‘.cel’ files • ID conversion file: map probes to genes • Gene sets data • Data definitions: • Defining condition subsets • Data type & scale (log)
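Slide 7 mentions an ID conversion file that maps probes to genes. The slides do not say how several probes for the same gene are combined, so the hypothetical helper below simply averages them and drops unmapped probes; this is one common choice, not EXPANDER's documented behavior. The probe IDs and values in the usage example are made up.

```python
from collections import defaultdict


def collapse_probes(expression, probe_to_gene):
    """Map probe-level rows to gene-level rows by averaging probes of the same gene.

    expression: probe ID -> list of values (one per condition)
    probe_to_gene: probe ID -> gene symbol; probes missing from it are dropped
    """
    grouped = defaultdict(list)
    for probe, values in expression.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            grouped[gene].append(values)
    # Column-wise mean over all probes assigned to each gene.
    return {gene: [sum(col) / len(col) for col in zip(*rows)]
            for gene, rows in grouped.items()}
```

With `{"p1": [1.0, 3.0], "p2": [3.0, 5.0], "p3": [7.0, 7.0]}` and a mapping of p1 and p2 to TP53 and p3 to ATM, TP53 becomes `[2.0, 4.0]` and ATM stays `[7.0, 7.0]`.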

  8. EXPANDER – Preprocessing (II) • Data adjustments: • Missing value estimation (KNN or arbitrary) • Merging conditions • Normalization: removal of systematic biases from the analyzed chips • Implemented methods: quantile, lowess • Visualization: box plots, scatter plots (simple, M vs. A)
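Quantile normalization, one of the methods listed on slide 8, forces every chip to share the same value distribution: each value is replaced by the mean, across chips, of the values holding the same rank. The sketch below shows the textbook procedure under the assumptions that there are no missing values and that ties are broken by position; it takes one list per chip (a column of the probe-row, condition-column matrix) and is not EXPANDER's code.

```python
def quantile_normalize(columns):
    """Quantile-normalize chips given as columns (one list of probe values per chip).

    Assumes every chip has the same probes in the same order and no missing values.
    Ties are ranked by original position, a simplification.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the r-th smallest value across all chips.
    rank_means = [sum(col[r] for col in sorted_cols) / len(columns) for r in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

After normalization each chip contains exactly the same set of values, only in a chip-specific order, which is why the method removes distribution-level biases between chips.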

  9. EXPANDER – Preprocessing (III) • Filtering: focus downstream analysis on the set of “responding genes” • Fold change • Variation • Statistical tests (t-test) • Standardization: create a common scale • For each probe, mean = 0, STD = 1 • Log data (base 2) • Fixed norm (divide by the norm of the probe vector)
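The three standardization options on slide 9 are simple per-probe transformations. The hypothetical helpers below sketch each one on a single probe's row of values; using the population standard deviation, and returning zeros for a constant row, are assumptions of this example rather than documented EXPANDER behavior.

```python
import math
import statistics


def standardize(row):
    """Rescale one probe's values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(row)
    sd = statistics.pstdev(row)
    if sd == 0:
        return [0.0] * len(row)  # constant probe: nothing to scale
    return [(v - mean) / sd for v in row]


def log2_transform(row):
    """Log base 2 of positive, linear-scale values."""
    return [math.log2(v) for v in row]


def fixed_norm(row):
    """Divide the probe vector by its Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row] if norm > 0 else list(row)
```

For instance, `log2_transform([1, 2, 8])` gives `[0.0, 1.0, 3.0]`, and `fixed_norm([3, 4])` gives a unit-length vector close to `[0.6, 0.8]`.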

  10. Input data Normalization/ Filtering Links to public annotation databases Visualization utilities Clustering (CLICK, SOM, K-means, Hierarchical) Biclustering (SAMBA) Functional enrichment (TANGO) Promoter signals (PRIMA)

  11. Cluster Analysis • Partition the responding genes into distinct sets, each with a particular expression pattern • Identify major patterns in the data: reduce the dimensionality of the problem • co-expression → co-function • co-expression → co-regulation • Partition the genes to achieve: • Homogeneity: genes inside a cluster show highly similar expression patterns • Separation: genes from different clusters have different expression patterns
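Homogeneity and separation can be turned into numbers. One common formulation, assumed here rather than taken from the slides, scores homogeneity as the average Pearson correlation of each gene with its own cluster's mean pattern, and separation as the size-weighted average correlation between the mean patterns of different clusters (higher homogeneity and lower separation are better). The sketch assumes no row has zero variance.

```python
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def centroid(rows):
    return [sum(col) / len(col) for col in zip(*rows)]


def homogeneity(clusters):
    """Average correlation between each gene and its own cluster centroid."""
    scores = [pearson(r, centroid(rows)) for rows in clusters for r in rows]
    return sum(scores) / len(scores)


def separation(clusters):
    """Size-weighted average correlation between centroids of different clusters."""
    cents = [centroid(rows) for rows in clusters]
    num = den = 0.0
    for i in range(len(clusters)):
        for j in range(len(clusters)):
            if i != j:
                w = len(clusters[i]) * len(clusters[j])
                num += w * pearson(cents[i], cents[j])
                den += w
    return num / den
```

Two clusters of perfectly rising and perfectly falling profiles give a homogeneity of 1 and a separation of -1, the best possible values under this measure.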

  12. Cluster Analysis (II) • Implemented algorithms: • CLICK, K-means, SOM, Hierarchical • Visualization: • Mean expression patterns • Heat-maps
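Of the algorithms listed on slide 12, K-means is the simplest to sketch: alternate between assigning each gene to its nearest center and moving each center to the mean of its genes, until assignments stop changing. The version below uses Euclidean distance and random initial centers from a seeded generator; both are assumptions, it is not EXPANDER's implementation, and CLICK is not reproduced here.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of equal-length vectors; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k distinct starting points
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no gene changed cluster
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(col) for col in zip(*members)]
    return labels, centers
```

Because K-means needs k in advance and depends on the starting centers, it is usually run with several seeds; the slides do not say how EXPANDER handles this.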

  13. Example study: responses to ionizing radiation [Diagram labels: Ionizing radiation, Double strand breaks, Sensors, ATM, Effectors (p53, BRCA1, CHK2), Stress responses, DNA repair, Cell cycle arrest, Survival pathways, Cell death pathways, Apoptosis]

  14. Example study: experimental design • Genotypes: Atm-/- and control w.t. mice • Tissue: Lymph node • Treatment: Ionizing radiation • Time points: 0, 30 min, 120 min • Microarrays: Affymetrix U74Av2 (12k probesets)

  15. Test case – Data Analysis • Dataset: six conditions (2 genotypes, 3 time points) • Normalization • Filtering step – define the ‘responding genes’ set • genes whose expression level changed at least 1.75-fold • Over 700 genes met this criterion • The set contains genes with various response patterns – we applied CLICK to this set of genes
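The filtering step on slide 15 keeps genes changed at least 1.75-fold. The slide does not state the reference for the comparison, so the sketch below assumes each irradiated time point is compared with the untreated (time 0) sample of the same genotype, that values are on a linear scale, and that a change in either direction counts. The function and condition names are hypothetical.

```python
def responding_genes(expression, comparisons, threshold=1.75):
    """Return genes changed at least `threshold`-fold (up or down) in any comparison.

    expression: gene -> {condition: value on a linear (not log) scale}
    comparisons: list of (treated_condition, baseline_condition) pairs
    """
    selected = []
    for gene, values in expression.items():
        for treated, baseline in comparisons:
            ratio = values[treated] / values[baseline]
            if ratio >= threshold or ratio <= 1 / threshold:
                selected.append(gene)
                break  # one qualifying comparison is enough
    return selected
```

A gene that doubles at 30 min or halves at 120 min passes; a gene that moves from 100 to 120 does not.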

  16. Major Gene Clusters – Irradiated Lymph node Atm-dependent early responding genes

  17. Major Gene Clusters – Irradiated Lymph node Atm-dependent 2nd wave of responding genes

  18. Input data Normalization/ Filtering Links to public annotation databases Visualization utilities Clustering (CLICK, SOM, K-means, Hierarchical) Biclustering (SAMBA) Functional enrichment (TANGO) Promoter signals (PRIMA)

  19. Ascribe Functional Meaning to the Clusters • Gene Ontology (GO) annotations for human, mouse, rat, chicken, fly, worm, Arabidopsis, Zebrafish and yeast. • TANGO: Apply statistical tests that seek over-represented GO functional categories in the clusters.
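A standard way to test whether a GO category is over-represented in a cluster is the hypergeometric tail probability: the chance of seeing at least k annotated genes in a cluster of n genes drawn from N background genes, K of which carry the annotation. The sketch below computes only this raw p-value; TANGO's own procedure (including its correction for testing many categories) is not reproduced, and the name `hypergeom_sf` is hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn).

    k: annotated genes observed in the cluster
    N: background size, K: annotated genes in the background, n: cluster size
    """
    top = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / comb(N, n)  # exact integer arithmetic until the final division
```

A hypothetical call such as `hypergeom_sf(12, 700, 60, 40)` asks how surprising 12 category members would be in a 40-gene cluster when 60 of 700 responding genes carry the annotation.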

  20. Functional Enrichment - Visualization

  21. Functional Categories: Cell cycle control (p<1x10^-6)

  22. Functional Categories: Cell cycle control (p<5x10^-6), Apoptosis (p=0.001)

  23. Input data Normalization/ Filtering Links to public annotation databases Visualization utilities Clustering (CLICK, SOM, K-means, Hierarchical) Biclustering (SAMBA) Functional enrichment (TANGO) Promoter signals (PRIMA)

  24. Clues are in the promoters: Identify Transcriptional Regulators [Diagram: ATM acts on a hidden layer of transcription factors (p53, TF-A, TF-B, TF-C, new and unknown "?" regulators) that control an observed layer of genes g1 to g13]

  25. ‘Reverse engineering’ of transcriptional networks • Infers regulatory mechanisms from gene expression data • Assumption: co-expression → transcriptional co-regulation → common cis-regulatory promoter elements • Step 1: Identification of co-expressed genes using microarray technology (clustering algorithms) • Step 2: Computational identification of cis-regulatory elements that are over-represented in the promoters of the co-expressed genes

  26. PRIMA – general description • Input: • Target set (e.g., co-expressed genes) • Background set (e.g., all genes on the chip) • Analysis: • Identify transcription factors whose binding site signatures are enriched in the ‘Target set’ with respect to the ‘Background set’ • TF binding site models: TRANSFAC DB • Default promoter region: from -1000 bp to +200 bp relative to the TSS
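Binding site models such as those in TRANSFAC are typically position weight matrices (PWMs). The sketch below shows a generic log-odds PWM scan over the default promoter window from slide 26: position counts are converted to log2 ratios against a uniform background, and the best-scoring window is reported. The pseudocount, uniform background, forward-strand-only scan and window slicing convention are all assumptions; PRIMA's own scoring and hit thresholds are not reproduced.

```python
import math


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """counts: one dict per motif position mapping base -> observed count."""
    matrix = []
    for pos in counts:
        total = sum(pos.get(b, 0) for b in "ACGT") + 4 * pseudocount
        matrix.append({b: math.log2((pos.get(b, 0) + pseudocount) / total / background)
                       for b in "ACGT"})
    return matrix


def promoter_window(sequence, tss_index, upstream=1000, downstream=200):
    """Slice roughly -upstream..+downstream around a 0-based TSS index."""
    return sequence[max(0, tss_index - upstream):tss_index + downstream]


def best_score(sequence, matrix):
    """Highest log-odds score of any forward-strand window; -inf if none is valid."""
    width = len(matrix)
    best = float("-inf")
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if set(window) <= set("ACGT"):  # skip windows containing N or other symbols
            best = max(best, sum(matrix[j][base] for j, base in enumerate(window)))
    return best
```

A promoter would then be called a "hit" for a factor when its best score exceeds some threshold, which is the per-promoter signal that an enrichment test compares between the target and background sets.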

  27. Promoter Analysis - Visualization

  28. PRIMA - Results

  29. PRIMA – Results NF-B 5.1 3.8x10-8 p53 4.2 9.6x10-7 STAT-1 3.2 5.4x10-6 Sp-1 1.7 6.5x10-4

  30. Input data Normalization/ Filtering Links to public annotation databases Visualization utilities Clustering (CLICK, SOM, K-means, Hierarchical) Biclustering (SAMBA) Functional enrichment (TANGO) Promoter signals (PRIMA)

  31. Biclustering • Clustering becomes too restrictive on large datasets: • Seeks global partition of genes according to similarity in their expression across ALL conditions • Relevant knowledge can be revealed by identifying genes with common pattern across a subset of the conditions • Biclustering algorithmic approach
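The point of slide 31 can be seen on a toy matrix: three genes that rise together in the first three conditions but behave differently elsewhere look only weakly related when compared across all conditions, yet form a perfectly coherent gene-by-condition block. The sketch below only illustrates that motivation using mean pairwise Pearson correlation; SAMBA uses its own statistical scoring, which is not reproduced, and the data are made up.

```python
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def mean_pairwise_correlation(rows, cols):
    """Average correlation over all gene pairs, restricted to the given columns."""
    sub = [[row[c] for c in cols] for row in rows]
    pairs = [(i, j) for i in range(len(sub)) for j in range(i + 1, len(sub))]
    return sum(pearson(sub[i], sub[j]) for i, j in pairs) / len(pairs)


# Hypothetical data: 3 genes x 6 conditions, coherent only in conditions 0-2.
genes = [
    [1, 2, 3, 5, 1, 4],
    [2, 4, 6, 1, 5, 2],
    [1, 3, 5, 3, 3, 0],
]
all_conditions = mean_pairwise_correlation(genes, range(6))
subset = mean_pairwise_correlation(genes, [0, 1, 2])
```

Here `subset` is 1.0 while `all_conditions` is close to zero, so a clustering that compares genes over all six conditions would likely not group them, whereas the gene subset together with conditions 0 to 2 is exactly the kind of module a bicluster describes.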

  32. Biclustering: SAMBA (Statistical Algorithmic Method for Bicluster Analysis) • A. Tanay, R. Sharan, R. Shamir, RECOMB '02 • Bicluster (= module): a subset of genes with similar behavior in a subset of conditions • Computationally challenging: has to consider many combinations of sub-conditions

  33. Biclustering Visualization
