
Microarray analysis

Understand microarray data analysis through PCA and SVD techniques for gene expression patterns. Learn how to reduce dimensions efficiently and detect meaningful patterns in your data. Dive into algorithms for transforming high-dimensional datasets and explore their applications in computational biology.

mwillard

Presentation Transcript


  1. Microarray analysis Algorithms in Computational Biology Spring 2006 Written by Itai Sharon

  2. Microarrays • Measure the expression of genes in the cell • “Count” the number of mRNA molecules that attach to biological probes • Expression data is gathered for many (thousands of) genes at once • Data is gathered for several experiments • Either at several time points or under different conditions

  3. Relative expression of gene i in experiment j

  4. Detecting Patterns in Expression Data • Genes may have similar expression patterns because • They are part of the same complex (protein-protein interactions) • They are part of the same pathway • They have similar regulatory elements • They have similar functions (part of a fail-safe mechanism) • A popular solution: clustering (we saw already) • Hierarchical clustering, K-means, agglomerative,... • Today: dimensionality reduction • PCA • SVD

  5. Why Dimensionality Reduction • Using irrelevant data may harm accuracy • Clustering algorithms do not perform well in high dimensional data • Visualizing high dimensional data

  6. Principal Component Analysis (PCA) • PCA seeks a linear projection that best describes the data in a least-mean-squares sense • Finds a set of principal components (PCs) • A PC defines a projection that captures the maximum amount of variation in a dataset • Each PC is orthogonal to all other PCs • Reduce dimensionality by picking the most informative PCs • Namely, to reduce from dimension d to dimension d’, pick the d’ most informative PCs

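  7. PCA - Steps Input: a dataset S • Subtract the mean from each dimension • Compute the covariance matrix $\Sigma$ for the d dimensions • The covariance of two variables X and Y: $\operatorname{cov}(X, Y) = \frac{1}{n-1} \sum_{k=1}^{n} (X_k - \bar{X})(Y_k - \bar{Y})$ • The covariance matrix: $\Sigma_{ij} = \operatorname{cov}(\mathrm{Dim}_i, \mathrm{Dim}_j)$ for $1 \le i, j \le d$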
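The first two PCA steps (mean subtraction and the covariance matrix) can be sketched in NumPy as follows. This is a minimal illustration rather than the presentation's own code. The function name `covariance_matrix`, the toy data and the choice of the sample (n - 1) normalisation are assumptions made for the example.

```python
import numpy as np

def covariance_matrix(data):
    """Covariance matrix of an (n_samples, d) array, normalised by n - 1 (assumed)."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)      # subtract the mean from each dimension
    n = data.shape[0]
    return centered.T @ centered / (n - 1)   # entry (i, j) is cov(dim i, dim j)

# Hypothetical toy data: 5 samples, 3 dimensions.
toy = np.array([[2.5, 2.4, 0.5],
                [0.5, 0.7, 1.9],
                [2.2, 2.9, 0.8],
                [1.9, 2.2, 1.1],
                [3.1, 3.0, 0.3]])
sigma = covariance_matrix(toy)               # symmetric 3 x 3 matrix
```

The result should agree with `np.cov(toy, rowvar=False)`, which uses the same n - 1 normalisation by default.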

  8. PCA – Steps (cont.) • Compute the eigenvectors and eigenvalues of the covariance matrix • Choose the most informative PCs, construct a feature vector • Eigenvectors with the highest eigenvalues carry the most information • The feature vector is simply the combination of all eigenvectors chosen: FeatureVector = (eig1, eig2, …, eigd’) • Transform the dataset to the new axis system • For each mean-subtracted sample $s \in S$: $s' = \mathrm{FeatureVector}^T \, s$
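Put together, the PCA steps might look like the sketch below. It assumes samples are stored as rows of an (n, d) array and uses `np.linalg.eigh`, which is intended for symmetric matrices such as a covariance matrix. The function name `pca_reduce` and the random demo data are hypothetical.

```python
import numpy as np

def pca_reduce(samples, n_components):
    """Project (n, d) samples onto the n_components eigenvectors with largest eigenvalues."""
    samples = np.asarray(samples, dtype=float)
    centered = samples - samples.mean(axis=0)        # subtract the mean from each dimension
    cov = np.cov(centered, rowvar=False)             # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components] # indices of the largest eigenvalues
    feature_vector = eigvecs[:, order]               # d x d' matrix, one PC per column
    projected = centered @ feature_vector            # each row is FeatureVector^T s
    return projected, feature_vector, eigvals[order]

# Hypothetical demo data with two strongly correlated dimensions.
rng = np.random.default_rng(0)
demo = rng.normal(size=(50, 4))
demo[:, 1] = 3.0 * demo[:, 0] + 0.1 * demo[:, 1]
reduced, fv, top_vals = pca_reduce(demo, 2)
```

In this sketch the variance of each projected column equals the corresponding eigenvalue, which is one way to see why eigenvectors with the highest eigenvalues carry the most information.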

  9. When Things Get Messy… • PCA is fine when the initial dimension is not too big • Space and time complexity are at least O(d^2), the size of the covariance matrix • Otherwise – we have a problem… • E.g. when d = 10^4, the covariance matrix alone has 10^8 entries… • Luckily an alternative exists: SVD
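To illustrate why SVD helps when d is large, the sketch below takes a thin SVD of the centered data directly, so the d x d covariance matrix is never built. The sizes (14 samples, 2000 dimensions) and the random data are hypothetical, and the relation "covariance eigenvalue = squared singular value / (n - 1)" assumes the sample normalisation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, d = 14, 2000                  # hypothetical: few experiments, many dimensions
data = rng.normal(size=(n_samples, d))
centered = data - data.mean(axis=0)

# Thin SVD of the centered data; no d x d matrix is ever formed.
u, s, vt = np.linalg.svd(centered, full_matrices=False)
variances = s**2 / (n_samples - 1)       # eigenvalues of the covariance matrix
top_pc = vt[0]                           # first principal component, a vector of length d

cov_entries = d * d                            # entries a covariance matrix would need
svd_entries = u.size + s.size + vt.size        # entries the thin SVD actually stores
```

With these sizes the thin SVD stores roughly 28 thousand numbers, against 4 million for the covariance matrix.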

  10. Eigengenes, Eigenarrays and SVD • The idea: • Use the singular value decomposition (SVD) theorem to transform the dataset from the gene/array space to the eigengene/eigenarray space • Eigengenes, eigenarrays and eigenvalues: • Each dimension is represented by an eigengene/eigenarray/eigenvalue triplet • Eigenvalues are used for ranking dimensions • Paper: • Alter et al., 2000

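  11. Singular Value Decomposition (SVD) • Theorem: if E is a real M by N matrix, then there exist orthogonal matrices $U \in \mathbb{R}^{M \times M}$ and $V \in \mathbb{R}^{N \times N}$ s.t. $U^T E V = W = \operatorname{diag}(\sigma_1, \ldots, \sigma_p) \in \mathbb{R}^{M \times N}$, where $p = \min(M, N)$ and $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_p \ge 0$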

  12. SVD • $\sigma_i$ is the i-th singular value of E. $u_i$ and $v_i$ are the i-th left singular vector and right singular vector of E, respectively. • It holds that $E v_i = \sigma_i u_i$ and $E^T u_i = \sigma_i v_i$ • Efficient algorithms for calculating the SVD exist
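The SVD theorem and the singular-vector relations can be checked numerically with `np.linalg.svd`. The sketch below uses a small random matrix as a stand-in for E; the sizes and variable names are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
E = rng.normal(size=(6, 4))              # hypothetical M = 6 by N = 4 matrix

# Full SVD: U is 6 x 6, sigma has min(M, N) = 4 entries, Vt is V transposed (4 x 4).
U, sigma, Vt = np.linalg.svd(E, full_matrices=True)
V = Vt.T
p = min(E.shape)

# W is the M x N matrix with the singular values on its diagonal.
W = np.zeros(E.shape)
W[:p, :p] = np.diag(sigma)

reconstructed = U @ W @ Vt               # should reproduce E up to rounding

# Residuals of E v_i = sigma_i u_i and E^T u_i = sigma_i v_i for every i.
right_residual = E @ V[:, :p] - U[:, :p] * sigma
left_residual = E.T @ U[:, :p] - V[:, :p] * sigma
```

Both residual matrices should be zero up to floating-point error, and NumPy returns the singular values already sorted in descending order.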

  13. Orthogonality of Decomposition

  14. Orthogonality of Decomposition

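  15. SVD and Microarray analysis • Reduction from the N genes x M arrays space to the p eigengenes x p eigenarrays space • W is the eigenexpression matrix • U represents the expression of genes over eigenarrays • $V^T$ represents the expression of eigengenes over arrays • The “fraction of eigenexpression”: $p_l = \sigma_l^2 / \sum_{k=1}^{p} \sigma_k^2$ • “Shannon entropy” of the dataset: $0 \le d = -\frac{1}{\log p} \sum_{k=1}^{p} p_k \log p_k \le 1$, where p without a subscript is the number of eigengenes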
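As a rough sketch, the fraction of eigenexpression and the normalised Shannon entropy can be computed from the singular values as below, following the definitions in Alter et al. (2000): each fraction is a squared singular value divided by the sum of all squared singular values, and the entropy is divided by the log of the number of eigengenes. The function names are hypothetical, and treating 0 · log 0 as 0 is an assumption of this example.

```python
import numpy as np

def eigenexpression_fractions(singular_values):
    """Fraction of overall expression captured by each eigengene/eigenarray pair."""
    squared = np.asarray(singular_values, dtype=float) ** 2
    return squared / squared.sum()

def shannon_entropy(fractions):
    """Entropy in [0, 1]: 0 if one eigengene dominates, 1 if all contribute equally."""
    fractions = np.asarray(fractions, dtype=float)
    if fractions.size < 2:
        raise ValueError("need at least two eigengenes to normalise by log(p)")
    nonzero = fractions[fractions > 0]           # treat 0 * log(0) as 0
    return float(-(nonzero * np.log(nonzero)).sum() / np.log(fractions.size))

# Hypothetical expression matrix: 100 genes x 14 arrays of random values.
rng = np.random.default_rng(3)
expr = rng.normal(size=(100, 14))
sing = np.linalg.svd(expr, compute_uv=False)     # singular values only
frac = eigenexpression_fractions(sing)
entropy = shannon_entropy(frac)
```

An entropy near 0 suggests an ordered dataset dominated by a single eigengene, while a value near 1 suggests expression spread evenly over all eigengenes.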

  16. Example: Cell cycle of Saccharomyces cerevisiae • Data is available for 5981 genes over 14 time points (at ½ hour intervals) • 784 genes were classified as cell-cycle regulated (with no missing values)

  17. Data Sorting • For eigengenes 1 and 2, plot the correlation of each gene g with both on a 2-D plot • The X-axis represents the correlation with eigengene 2, the Y-axis the correlation with eigengene 1 • Sort the genes by angular distance
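One possible reading of this sorting step is sketched below: compute each gene's correlation with the two eigengenes, treat the pair as a point in the plane, and order genes by the angle of that point. Using Pearson correlation and `np.arctan2` is an assumption here, as are the function name and the synthetic sinusoidal data. The sketch also assumes no gene has a constant profile, since that would give a zero norm.

```python
import numpy as np

def sort_genes_by_angle(expression, eigengene_1, eigengene_2):
    """expression: (n_genes, n_arrays). Returns (gene order, angle per gene)."""
    centered = expression - expression.mean(axis=1, keepdims=True)
    gene_norms = np.linalg.norm(centered, axis=1)

    def correlations(profile):
        ref = profile - profile.mean()
        return centered @ ref / (gene_norms * np.linalg.norm(ref))

    x = correlations(eigengene_2)        # X-axis: correlation with eigengene 2
    y = correlations(eigengene_1)        # Y-axis: correlation with eigengene 1
    angles = np.arctan2(y, x)            # angle of each gene in the 2-D plot
    return np.argsort(angles), angles

# Synthetic example: 14 time points over one period, genes peaking at different phases.
t = np.linspace(0.0, 2.0 * np.pi, 14, endpoint=False)
phases = np.array([2.0, -1.0, 0.5, -2.5])
genes = np.cos(t[None, :] - phases[:, None])
order, angles = sort_genes_by_angle(genes, np.sin(t), np.cos(t))
```

For this synthetic data the recovered angles match the phases used to generate the genes, so sorting by angle orders the genes by when they peak.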

  18. Further Reading • PCA: • L. Smith: A Tutorial on Principal Components Analysis • Eigengenes, eigenvectors and SVD: • O. Alter, P. Brown & D. Botstein: Singular Value Decomposition for Genome-wide Expression Data Processing and Modeling, PNAS 97(18), 2000
