Multivariate Analysis of Pathways: Multivariate Approaches to Gene Set Selection
Key Multivariate Ideas • PCA (Principal Components Analysis) • SVD (Singular Value Decomposition) • MDS (Multi-dimensional Scaling) • Hotelling’s T²
PCA • PC1 lies along the direction of maximal variation; PC2 lies at right angles to PC1, along the direction of next-highest variation • Example: three correlated variables
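The idea of PC1 as the direction of maximal variation can be sketched in a few lines. This is a minimal illustration, assuming power iteration on the sample covariance matrix; the function names and the toy data are hypothetical and not taken from the slides.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix; each row is one observation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def first_principal_component(rows, iterations=200):
    """Power iteration: returns (unit vector along PC1, variance along PC1)."""
    cov = covariance_matrix(rows)
    p = len(cov)
    # Arbitrary starting direction; it must not be orthogonal to PC1.
    v = [1.0 + 0.1 * j for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return v, variance

# Three perfectly correlated variables (toy data): all variation lies on one line.
toy = [(t, 2 * t, -t) for t in (-2, -1, 0, 1, 2)]
pc1, var1 = first_principal_component(toy)
print(pc1, var1)
```

PC2 would be found the same way after removing the PC1 component from the covariance matrix; real analyses would normally use a library SVD routine instead.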
Multivariate Representation of Pathways • BAD pathway (groups shown: Normal, IBC, Other BC) • Clear separation between groups • Variation differences
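Hotelling’s T² • Compute distance between sample means using a (common) metric of covariation • T² = (n1 n2 / (n1 + n2)) Dx' S⁻¹ Dx, where Dx is the difference between the two sample mean vectors and S is the common (pooled) covariance estimate • Multidimensional analog of the t (actually F) statistic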
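A small worked sketch of the two-sample statistic may help. It assumes the standard pooled-covariance form of Hotelling's T² and its usual conversion to an F statistic; the helper names are hypothetical, and the linear solve will fail when the pooled covariance is singular (for example when N < p).

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter_matrix(rows):
    """Sum of outer products of deviations from the group mean."""
    m = mean_vector(rows)
    p = len(m)
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows)
             for j in range(p)] for i in range(p)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def hotelling_t2(group1, group2):
    """Return (T2, F) for two groups of p-dimensional samples."""
    n1, n2, p = len(group1), len(group2), len(group1[0])
    s1, s2 = scatter_matrix(group1), scatter_matrix(group2)
    pooled = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(p)]
              for i in range(p)]
    dx = [a - b for a, b in zip(mean_vector(group1), mean_vector(group2))]
    t2 = n1 * n2 / (n1 + n2) * sum(d * y for d, y in zip(dx, solve(pooled, dx)))
    # Under the null, F follows an F distribution with (p, n1 + n2 - p - 1) df.
    f_stat = t2 * (n1 + n2 - p - 1) / (p * (n1 + n2 - 2))
    return t2, f_stat

normal = [(0, 0), (1, 0), (0, 1), (1, 1)]   # toy two-gene expression values
tumor = [(2, 0), (3, 0), (2, 1), (3, 1)]
print(hotelling_t2(normal, tumor))
```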
Principles of Kong et al Method • Normal covariation generally acts to preserve homeostasis • The transcription of genes that participate in many processes will be changed • The joint changes in genes will be most distinctive for those genes active in pathways that are working differently
Critiques of Hotelling’s T² • Small samples: unreliable S estimates • N < p • Estimates of Dx and S not robust to outliers • Assumes same covariance in each sample • S1 = S2? Usually not in disease • Kong et al. propose an analog of the Welch t-test • Permutation of samples for significance
Making it Stable • Insufficient information to capture all relationships – too much correlation! • Power of Hotelling’s method comes from identifying directions of rare variation • Many (spurious) directions of 0 variation • Random variation in data leads to random variation in PCA • Regularization strategy: force covariance to be more like IID
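One common way to "force covariance to be more like IID" is to shrink the estimate toward a scaled identity matrix. The sketch below assumes that particular target and a hand-chosen weight `alpha`; the slides do not say which regularization was actually used.

```python
def shrink_covariance(cov, alpha):
    """Blend cov with (average variance) * identity; alpha in [0, 1]."""
    p = len(cov)
    avg_var = sum(cov[i][i] for i in range(p)) / p
    return [[(1 - alpha) * cov[i][j] + (alpha * avg_var if i == j else 0.0)
             for j in range(p)] for i in range(p)]

# A singular covariance (two perfectly correlated genes) becomes invertible.
singular = [[2.0, 2.0], [2.0, 2.0]]
print(shrink_covariance(singular, 0.5))
```

With `alpha = 0` the estimate is unchanged; with `alpha = 1` all genes are treated as independent with equal variance, so directions of zero variation disappear.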
Making it Robust • Microarray data has many outliers • Multivariate methods are very much distorted by outliers • Robust estimates of covariance could give robust PCA • Simple approach: trim outliers
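The "trim outliers" step could look like the following sketch. It assumes a median ± k·MAD rule with k = 3; both the rule and the cutoff are illustrative choices, not the ones from the talk.

```python
import statistics

def trim_outliers(values, k=3.0):
    """Drop values more than k scaled MADs away from the median."""
    center = statistics.median(values)
    mad = statistics.median(abs(v - center) for v in values)
    if mad == 0:
        return list(values)  # no spread estimate available: keep everything
    limit = k * 1.4826 * mad  # 1.4826 makes the MAD comparable to a normal SD
    return [v for v in values if abs(v - center) <= limit]

print(trim_outliers([1, 2, 3, 4, 5, 100]))
```

Means and covariances computed from the trimmed values are then less distorted by single extreme arrays.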
Handling Changes of Covariance • Power of Hotelling’s method comes from identifying directions of rare variation • If one group shows little variation in a direction where the other shows much more, how do we test for the change? • If one group is the control, then its directions of rare variation should be taken as the standard • Robust measure of means in both groups
Meaning of Covariance Change • Meaning of covariance across individuals • Homeostasis in face of individual variation • e.g. BAD pathway: largest loadings of PC1 on PRKARB & ADCY1 • PRKARB represses CREB1; ADCY activates CREB1 • Gene sets whose covariance diminishes may • be responding to different inputs • have escaped their usual regulatory control • Characteristic of cancers
Testing Covariance Changes • Idea: directions of small variation in one group should match directions of small variation in the other • Mathematical approach • Find the solutions λ of det(S1 – λS2) = 0 • Solutions should all be near 1 if there is no change • Test statistic: easily computed • Computational approach • Ratio of largest to smallest: λmax / λmin
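For two genes, the generalized eigenvalues can be written out by hand, which makes the test easy to follow. The sketch below assumes symmetric 2x2 covariance matrices with S2 positive definite and solves the quadratic det(S1 – λS2) = 0 directly; larger gene sets would need a numerical generalized eigen-solver.

```python
import math

def generalized_eigenvalues_2x2(s1, s2):
    """Roots of det(S1 - lambda * S2) = 0 for symmetric 2x2 matrices."""
    a, b, d = s1[0][0], s1[0][1], s1[1][1]
    e, f, g = s2[0][0], s2[0][1], s2[1][1]
    qa = e * g - f * f                   # det(S2), coefficient of lambda^2
    qb = -(a * g + d * e - 2 * b * f)    # coefficient of lambda
    qc = a * d - b * b                   # det(S1), constant term
    disc = max(qb * qb - 4 * qa * qc, 0.0)  # guard against rounding below zero
    root = math.sqrt(disc)
    return sorted([(-qb - root) / (2 * qa), (-qb + root) / (2 * qa)])

def eigenvalue_ratio(s1, s2):
    """lambda_max / lambda_min; equals 1 when S1 and S2 are proportional."""
    low, high = generalized_eigenvalues_2x2(s1, s2)
    return high / low

same = [[2.0, 1.0], [1.0, 2.0]]
print(eigenvalue_ratio(same, same))                                  # no change
print(eigenvalue_ratio([[4.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

The ratio is insensitive to an overall rescaling of one group, so it picks up changes in the shape of the covariance rather than in total variance.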
Network Topology • Connections represent interactions: • Regulatory (one-way) • Protein interaction (two-way) • Hubs are genes with many connections • Bottlenecks are single genes that connect two parts of a functional network
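Hubs and bottlenecks as defined above can be found with simple graph bookkeeping. This sketch treats every connection as two-way (so the direction of regulatory edges is ignored), uses a hypothetical degree cutoff for hubs, and flags a gene as a bottleneck if removing it splits the network; the gene labels are placeholders.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected adjacency sets from a list of (gene, gene) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def hubs(graph, min_degree):
    """Genes with at least min_degree connections."""
    return sorted(g for g, nbrs in graph.items() if len(nbrs) >= min_degree)

def count_components(graph, skip=None):
    """Number of connected components, optionally ignoring one gene."""
    seen, components = set(), 0
    for start in graph:
        if start == skip or start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nbr in graph[node]:
                if nbr != skip and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
    return components

def bottlenecks(graph):
    """Genes whose removal disconnects part of the network."""
    base = count_components(graph)
    return sorted(g for g in graph if count_components(graph, skip=g) > base)

# Two triangles (A, B, C) and (D, E, F) joined only through the C-D link.
net = build_graph([("A", "B"), ("A", "C"), ("B", "C"),
                   ("C", "D"), ("D", "E"), ("D", "F"), ("E", "F")])
print(hubs(net, 3), bottlenecks(net))
```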
Devising Tests Based on Topology • Issues: how to weight more heavily the genes that are hubs • How to assess directionality of change • How to measure co-operativity (activation or repression changes in appropriate ways)
Draghici et al. Approach • Overall measure • Effective contribution (perturbation factor)
Outliers: Clues to Disease Process? • Outliers usually reflect idiosyncratic events • Recurrent outliers reflect rare events that are selected • If a particular pathway is disrupted in disease, but by many different mechanisms, then the expression profiles should • Lose healthy covariance • Show recurrent outliers • How to test for ‘consistent’ outliers? • COPA: a method for flagging recurrent outliers in expression data • Finds consistent fusion gene
A Test Statistic for Consistent Outliers • Ratio of quantile differences to normal variation: (q0.90 – q0.10)tumor / max((q0.90 – q0.10)normal, 0.4) • Compare to null distribution by permutation • Many genes show much higher ratios
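The ratio is straightforward to compute for one gene. The sketch below assumes linear-interpolation quantiles (the slides do not specify the quantile rule) and keeps the 0.4 floor from the formula; function names are hypothetical.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])

def quantile_range_ratio(tumor, normal, floor=0.4):
    """(q90 - q10) in tumors divided by max((q90 - q10) in normals, floor)."""
    tumor_range = quantile(tumor, 0.9) - quantile(tumor, 0.1)
    normal_range = quantile(normal, 0.9) - quantile(normal, 0.1)
    return tumor_range / max(normal_range, floor)

tumor_values = list(range(11))      # toy gene: widely spread in tumors
normal_values = [0.0, 0.1, 0.2]     # tightly regulated in normals
print(quantile_range_ratio(tumor_values, normal_values))
```

The floor keeps genes with almost no normal variation from producing arbitrarily large ratios.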
Statistical Significance • Find confidence limits on the number of false positives by permutation • Several hundred genes appear significant at 10-20% FDR • Actual scores: 267 scores are greater than 5, whereas 90% of permutations have fewer than 34 scores over 5
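The comparison of an observed count with permutation counts can be sketched as follows. This assumes sample labels are shuffled and every gene is re-scored on each shuffle; the difference-of-means scorer, the "T"/"N" labels, and all parameter names are placeholders for whatever score is actually used.

```python
import random

def mean_difference(values, labels):
    """Placeholder score: mean of 'T' samples minus mean of 'N' samples."""
    tumor = [v for v, lab in zip(values, labels) if lab == "T"]
    normal = [v for v, lab in zip(values, labels) if lab == "N"]
    return sum(tumor) / len(tumor) - sum(normal) / len(normal)

def permutation_count_limit(data, labels, score_gene, threshold,
                            n_perm=200, level=0.9, seed=0):
    """Return (observed count over threshold, `level` quantile of permuted counts).

    data maps gene -> list of values, one per sample, in the order of labels.
    """
    rng = random.Random(seed)

    def count(current_labels):
        return sum(1 for values in data.values()
                   if score_gene(values, current_labels) > threshold)

    observed = count(labels)
    shuffled = list(labels)
    null_counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break the link between samples and groups
        null_counts.append(count(shuffled))
    null_counts.sort()
    limit = null_counts[min(int(level * n_perm), n_perm - 1)]
    return observed, limit

toy = {"g1": [9, 9, 9, 1, 1, 1], "g2": [8, 9, 8, 0, 1, 0], "g3": [1, 2, 1, 2, 1, 2]}
print(permutation_count_limit(toy, ["T", "T", "T", "N", "N", "N"],
                              mean_difference, threshold=5))
```

An observed count far above the permutation limit, as in the 267 versus 34 comparison on the slide, suggests that most of the high scores are not false positives.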
A Test for Functional Groups • For each group G of genes • sG <- sum(scores[G])/sqrt(length(G)) • Scores: t-scores or range ratios • PAGE (BMC Bioinformatics, 2005)
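For readers who prefer Python, the R expression above translates directly. The sketch assumes `scores` is a dictionary from gene to score and `group` a list of gene names; the gene labels are made up.

```python
import math

def group_score(scores, group):
    """Python rendering of sG <- sum(scores[G]) / sqrt(length(G))."""
    return sum(scores[g] for g in group) / math.sqrt(len(group))

gene_scores = {"a": 3.0, "b": 1.0, "c": 0.0, "d": 4.0}
print(group_score(gene_scores, ["a", "b", "c", "d"]))
```

Dividing by the square root of the group size keeps the group score on the same scale for small and large groups: if the gene scores were independent with unit variance under the null, the group score would also have unit variance.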
Do Genes Make Sense? • Quantile Ratio • [1] "DNA replication" • [2] "response to pathogenic fungi" • [6] "cleavage of lamin" • [7] "spindle organization and biogenesis" • [15] "response to osmotic stress" • [16] "nutrient import" • [22] "response to mercury ion" • T-test • [2] "sodium ion homeostasis" • [3] "leukocyte adhesive activation" • [4] "positive regulation of calcium-independent cell-cell adhesion" • [5] "oxytocin receptor activity" • [6] "ADP biosynthesis" • [7] "dADP biosynthesis" • [10] "regulation of muscle contraction" • [11] "caveolar membrane" • [12] "response to cold" • [16] "stress fiber formation" • [18] "positive regulation of complement activation" • [19] "astrocyte activation" • [22] "regulation of long-term neuronal synaptic plasticity" • [24] "positive regulation of endocytosis" • [25] "embryonic hemopoiesis"
Cancer Functional Groups • Do very probable cancer genes show high-discrepancy in few samples? • Program: identify genes that might contribute to cancer processes: growth signaling, loss of cell-matrix adhesion, apoptosis • Do most samples from these categories show at least one gross mis-regulation? • Are they the same genes in most samples?
Example: Cell Growth • Select genes in GO:0001558 ‘regulation of cell growth’ • Expect most samples to have at least one very seriously mis-regulated gene from this category • Compute maximum aberration score across category
Aberrations • Aberration score indicated by color: vanilla: 0; red: 4 • Nine normals at left • No gene misregulated in even 50% of samples • BUT: Only a few genes commonly misregulated
Simplest Summary • Maximum aberration score for samples
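The per-sample summary is a column-wise maximum over the genes in the category. The sketch assumes aberration scores are already available as a gene-by-sample table; how the aberration score itself is computed is not defined on these slides, so it is treated as given input.

```python
def max_aberration_per_sample(scores_by_gene):
    """scores_by_gene maps gene -> list of aberration scores, one per sample."""
    return [max(column) for column in zip(*scores_by_gene.values())]

category = {"g1": [0, 4, 1], "g2": [2, 0, 1]}   # two genes, three samples
print(max_aberration_per_sample(category))
```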
Testing the Pathway for Outliers • Many genes show aberrations in tumor group • Null distribution: medians of maxima from randomly selected gene groups of size 37 • P < .01 • NB: the results for cell-matrix interaction are very similar; those for angiogenesis are not as strong
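The null distribution described here can be simulated by drawing random gene groups of the same size as the pathway. This is a sketch under assumptions: scores are a gene-by-sample table, the statistic is the median over samples of the per-sample maximum, and the p-value uses the common add-one correction; names and toy data are hypothetical.

```python
import random
import statistics

def median_of_maxima(scores_by_gene, genes):
    """Median over samples of the largest score among the given genes."""
    columns = zip(*(scores_by_gene[g] for g in genes))
    return statistics.median(max(column) for column in columns)

def random_group_p_value(scores_by_gene, pathway_genes, n_draws=1000, seed=0):
    """Fraction of random same-size gene groups scoring at least as high."""
    rng = random.Random(seed)
    all_genes = sorted(scores_by_gene)
    observed = median_of_maxima(scores_by_gene, pathway_genes)
    size = len(pathway_genes)
    hits = sum(
        median_of_maxima(scores_by_gene, rng.sample(all_genes, size)) >= observed
        for _ in range(n_draws)
    )
    return (hits + 1) / (n_draws + 1)

# Toy table: 50 genes, 3 samples; only g0 and g1 are aberrant.
table = {"g%d" % i: [0, 0, 0] for i in range(50)}
table["g0"] = [4, 4, 4]
table["g1"] = [4, 4, 4]
print(random_group_p_value(table, ["g0", "g1"]))
```

For the slide's example the random groups would have size 37, matching the number of genes in the category being tested.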