Analysis of Multiple Experiments: TIGR Multiple Experiment Viewer (MeV). Joseph White, DFCI. January 24, 2008
MeV • Stand-alone Java application for analysis • New version: 4.1 • Not database-centric; uses TDMS files • Writes TDMS files • Primarily for normalized data • MeV does not currently write MAGE-TAB • Download MeV from: tm4.org
Outline • Description of MeV • How MeV treats expression • Some essential concepts • Demo: basic operations in MeV • New file loader • ANOVA example • Demo of MeV new features • Affymetrix file reader • Non-parametric tests • CGH • GCOD
The Expression Matrix • A representation of data from multiple microarray experiments: rows are genes (Gene 1 to Gene 6) and columns are experiments (Exp 1 to Exp 6) • Each element is a log ratio, usually log2(Cy5 / Cy3) • Red indicates a positive log ratio, i.e., Cy5 > Cy3 • Black indicates a log ratio of zero, i.e., Cy5 and Cy3 are very close in value • Green indicates a negative log ratio, i.e., Cy5 < Cy3 • Gray indicates missing data
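The colour convention above can be sketched in a few lines of Python. This is only an illustration, not MeV code: the function names `log_ratio` and `heatmap_color`, the `tolerance` parameter, and the intensity values are all hypothetical. It also simplifies the display to four discrete colours, whereas a heat map would normally shade red and green by magnitude.

```python
import math

def log_ratio(cy5, cy3):
    """Return log2(Cy5 / Cy3), or None if an intensity is missing or non-positive."""
    if cy5 is None or cy3 is None or cy5 <= 0 or cy3 <= 0:
        return None
    return math.log2(cy5 / cy3)

def heatmap_color(value, tolerance=1e-9):
    """Map a log ratio to the colour named on the slide (discrete simplification)."""
    if value is None:
        return "gray"    # missing data
    if abs(value) <= tolerance:
        return "black"   # Cy5 and Cy3 (nearly) equal
    return "red" if value > 0 else "green"

# Hypothetical (Cy5, Cy3) intensities for one gene across four experiments.
spots = [(2000.0, 1000.0), (500.0, 1000.0), (800.0, 800.0), (None, 1200.0)]
ratios = [log_ratio(cy5, cy3) for cy5, cy3 in spots]
colors = [heatmap_color(r) for r in ratios]

for r, c in zip(ratios, colors):
    print(r, c)
```

A two-fold increase gives a log ratio of 1 and a two-fold decrease gives -1, which is why log ratios treat up- and down-regulation symmetrically.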
Expression Vectors • Gene expression vectors encapsulate the expression of a gene over a set of experimental conditions or sample types • Example log2(Cy5/Cy3) values shown in the figure: 1.5, -0.8, 1.8, 0.5, -0.4, -1.3, 1.5, 0.8
Expression Vectors as Points in 'Expression Space'

      Exp 1   Exp 2   Exp 3
G1    -0.8    -0.3    -0.7
G2    -0.7    -0.8    -0.4
G3    -0.4    -0.6    -0.8
G4     0.9     1.2     1.3
G5     1.3     0.9    -0.6

[Figure: genes plotted as points on the axes Experiment 1, Experiment 2, and Experiment 3; a "Similar Expression" label marks genes that lie close together.]
Distance and Similarity • The ability to calculate a distance (or similarity, its inverse) between two expression vectors is fundamental to clustering algorithms • Distance between vectors is the basis upon which decisions are made when grouping similar patterns of expression • Selection of a distance metric defines the concept of distance
Distance: a measure of similarity between genes.

         Exp 1   Exp 2   Exp 3   Exp 4   Exp 5   Exp 6
Gene A   x1A     x2A     x3A     x4A     x5A     x6A
Gene B   x1B     x2B     x3B     x4B     x5B     x6B

Some distances (MeV provides 11 metrics):
1. Euclidean: $\sqrt{\sum_{i=1}^{6} (x_{iA} - x_{iB})^2}$
2. Manhattan: $\sum_{i=1}^{6} |x_{iA} - x_{iB}|$
3. Pearson correlation
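The three metrics named above can be written out as a minimal Python sketch. This is not MeV's implementation; the function names are illustrative. The test vectors are G1, G2, and G4 from the "Expression Space" slide, which have three experiments rather than the six shown here.

```python
import math

def euclidean(a, b):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson(a, b):
    """Pearson correlation coefficient r (undefined for a constant vector)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Log ratios for three genes, taken from the "Expression Space" slide.
g1 = [-0.8, -0.3, -0.7]
g2 = [-0.7, -0.8, -0.4]
g4 = [0.9, 1.2, 1.3]

for name, other in (("G2", g2), ("G4", g4)):
    print(name, euclidean(g1, other), manhattan(g1, other), pearson(g1, other))
```

Under both Euclidean and Manhattan distance, G1 is much closer to G2 than to G4, matching the intuition that G1 and G2 sit near each other in expression space.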
Distance is Defined by a Metric

Distance metric:   Euclidean   Pearson (r × -1)
D                  1.4         -0.90
D                  4.2         -1.00
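To see why the two metrics can disagree, consider two profiles with the same shape but different amplitude. The sketch below uses made-up values (`low` and `high` are hypothetical and are not the profiles behind the slide's numbers): the Euclidean distance is large, while Pearson (r × -1) takes its minimum value of -1.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical profiles: identical pattern, four-fold difference in amplitude.
low = [0.5, -0.5, 0.5, -0.5]
high = [2.0, -2.0, 2.0, -2.0]

d_euclidean = euclidean(low, high)   # sensitive to magnitude
d_pearson = -pearson(low, high)      # r * -1: sensitive to shape only

print(d_euclidean, d_pearson)
```

Which behaviour is preferable depends on the question: Euclidean distance separates genes by magnitude of change, while Pearson groups genes whose expression rises and falls together regardless of magnitude.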
Normal distribution • σ = standard deviation of the distribution • The curve is centered at X = μ (the mean of the distribution)
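For reference, the curve on this slide is the standard normal density written in terms of the two parameters labelled there. The formula is not on the slide; it is supplied here as the textbook definition, with x denoting a value on the horizontal axis:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```

About 68% of the area under this curve lies within one σ of μ, and about 95% within two σ.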
Current MeV Algorithms • Hierarchical Clustering • K-Means Clustering • Support Trees for HCL • EASE (annotation clustering) • Self-Organizing Maps • K-Nearest Neighbors • Support Vector Machines • Relevance Networks • Template Matching • PCA • CGH • Bayesian Networks • T-test • ANOVA (one- and two-factor) • SAM • Non-parametric tests: Wilcoxon, Fisher Exact Test, Mack-Skillings, Kruskal-Wallis • BRIDGE
Demos • File loaders • HTA data: ANOVA • Affymetrix data: SAM • Non-Parametric tests • CGH
GCOD statistics • Studies: 52 • Hybridizations: 4,591 • Analysis result sets: 12,637 • Signal values: 204,296,195 • Samples: 3,644 • Probesets: 160,817 (e.g., HG-U133A: 22,293; HG_U133_Plus_2: 54,684) • Array designs: 9 • Accessions: 54,414
MeV Team • Eleanor Howe • Sarita Nair • Raktim Sinha • mev@tigr.org