Literature Survey: Microarray Data Analysis. Ei-Ei Gaw, Arizona State University, CSE 591, April 24, 2003
cDNA Microarray Procedure http://www.anst.uu.se/frgra677/projekt_eng.html
Microarray Data • Expression patterns of thousands of genes measured simultaneously. • Usually the number of experiments is small compared to the number of genes. • Random and systematic variations. • Systematic variations arise from the complexity of the method. • Remove low-quality measurements.
Preprocessing • Transformation • Aim: Change the data to satisfy the assumptions (homogeneous variance and normal distribution) of statistical techniques. • Log and variance-stabilizing transformations. • Normalization • Aim: Account for random and systematic variations. • Global, lowess, location, and scale normalization methods. • Missing data • K Nearest Neighbors (KNN) algorithm, a Singular Value Decomposition (SVD) based method, and simple row (gene) average. • Reduce dimensionality
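The preprocessing steps above can be sketched in a few lines. This is a minimal illustration, not a method taken from any of the surveyed papers: it shows a log2 transformation of red/green ratios, a global (median-centering) normalization, and the simple row (gene) average for missing values. The function names, the use of `None` for a missing value, and the 0.0 fallback for a gene with no observed values are all assumptions made for the example.

```python
import math
from statistics import median


def log_ratios(red, green):
    """Log2 of the red/green intensity ratio, one value per gene."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def global_normalize(values):
    """Global normalization: shift one array so its median log ratio is 0.

    Missing values (None) are ignored when computing the median and kept as None.
    """
    center = median(v for v in values if v is not None)
    return [None if v is None else v - center for v in values]


def impute_row_average(matrix):
    """Replace each missing value (None) with the mean of the observed
    values in the same gene row (rows = genes, columns = experiments)."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        # Assumption: a gene with no observed values is filled with 0.0.
        row_mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([row_mean if v is None else v for v in row])
    return filled


ratios = log_ratios([200, 400, 800], [100, 100, 100])
centered = global_normalize(ratios)
completed = impute_row_average([[1.0, None, 3.0]])
```

KNN and SVD based imputation replace the row mean with an estimate drawn from similar genes or from dominant expression patterns, respectively; the row average is only the simplest of the three options listed.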
Classification • Hierarchical clustering • Classify tumors and find previously unrecognized tumor subtypes • Identify differentially expressed genes • Cluster co-expressed genes, but not suited to finding multiple ways in which expression patterns are similar • Self-organizing map • Suited to finding a small number of prominent classes • Class discovery • Support vector machine • Operates in an extremely high-dimensional feature space • Supervised learning – takes advantage of prior knowledge • Genetic Algorithm/KNN
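As an illustration of hierarchical clustering of co-expressed genes, the sketch below merges the two closest clusters repeatedly until a requested number remains. It assumes average linkage and a distance of 1 minus the Pearson correlation; those choices, the function names, and the toy profiles are assumptions for the example rather than details stated on the slide.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def hierarchical_clusters(profiles, n_clusters):
    """Average-linkage agglomerative clustering.

    profiles: dict mapping gene name -> list of expression values.
    Returns a list of clusters, each a list of gene names.
    """
    clusters = [[name] for name in profiles]

    def linkage(a, b):
        # Mean pairwise distance (1 - r) between members of two clusters.
        return mean(1 - pearson(profiles[g], profiles[h]) for g in a for h in b)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j) and merge j into i.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


toy_profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8.5],
    "g3": [4, 3, 2, 1],
    "g4": [8, 6, 4, 2.5],
}
groups = hierarchical_clusters(toy_profiles, 2)
```

Here the two rising profiles end up in one cluster and the two falling profiles in the other. Because each gene lands in exactly one branch, a gene that resembles different groups under different conditions is forced into a single group, which is the limitation noted above.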
Regulatory Networks • Two-stage approach • Find co-regulated genes using a clustering algorithm and then look for conserved motifs upstream • Unified approach – Joint likelihoods for sequence and expression • Find co-regulated genes and conserved upstream motifs together, rather than in two separate steps • Kolmogorov-Smirnov method • Does not require clustering • Sort red-green ratios • Minreg • Requires prior biological knowledge – candidate regulators • One advantage is speed • Identifies and characterizes both regulators and regulatees • Assigns biological function to regulators
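The Kolmogorov-Smirnov idea can be illustrated without any clustering step: sort the red-green log ratios and compare their distribution for genes whose upstream region contains a candidate motif against the genes that lack it. The sketch below computes only the two-sample K-S statistic (the largest gap between the two empirical distribution functions); it is not the full procedure of any surveyed paper, and the function names and toy data are assumptions.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past every value <= x in both sorted samples.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def motif_ks(log_ratios, has_motif):
    """Split genes by presence of a candidate motif and compare the
    two distributions of log ratios."""
    with_motif = [r for r, m in zip(log_ratios, has_motif) if m]
    without_motif = [r for r, m in zip(log_ratios, has_motif) if not m]
    return ks_statistic(with_motif, without_motif)


toy_ratios = [2.1, 1.8, 2.5, -0.2, 0.1, 0.0]
toy_motif = [True, True, True, False, False, False]
score = motif_ks(toy_ratios, toy_motif)
```

A statistic near 1 means the genes carrying the motif sit at one end of the sorted ratios, which marks the motif as a candidate regulatory element; a value near 0 means the motif tells us nothing about expression in that experiment.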
Genetic Networks • Association rules • Global gene expression profiling • Can reveal relationships between different genes and relationships between environment and expression • Bayesian Networks • Boolean Networks • REVEAL (REVerse Engineering Algorithm) • NetWork
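To make the Boolean network item concrete, the sketch below follows the information-theoretic idea commonly associated with REVEAL: a set of input genes is accepted for a target gene when knowing the inputs removes all uncertainty about the target's next state, that is, when H(inputs) equals H(inputs, output). This is a simplified illustration, not the published implementation; the function names, the `max_inputs` limit, and the three-gene toy network are assumptions.

```python
import math
from collections import Counter
from itertools import combinations, product


def entropy(observations):
    """Shannon entropy (in bits) of a list of hashable observations."""
    n = len(observations)
    return -sum(c / n * math.log2(c / n) for c in Counter(observations).values())


def find_inputs(transitions, target, max_inputs=2):
    """Return the smallest tuple of gene indices whose current values fully
    determine the next value of gene `target`, or None if no set is found.

    transitions: list of (state, next_state) pairs, each a tuple of 0/1.
    """
    outputs = [nxt[target] for _, nxt in transitions]
    if entropy(outputs) == 0:
        return ()  # a constant gene needs no inputs
    n_genes = len(transitions[0][0])
    for k in range(1, max_inputs + 1):
        for inputs in combinations(range(n_genes), k):
            ins = [tuple(state[i] for i in inputs) for state, _ in transitions]
            joint = list(zip(ins, outputs))
            # Inputs determine the output exactly when H(in) == H(in, out).
            if math.isclose(entropy(ins), entropy(joint), abs_tol=1e-9):
                return inputs
    return None


# Hypothetical 3-gene network: A' = B, B' = A AND C, C' = NOT A.
toy_transitions = [
    ((a, b, c), (b, a & c, 1 - a)) for a, b, c in product((0, 1), repeat=3)
]
wiring = [find_inputs(toy_transitions, gene) for gene in range(3)]
```

With the complete state transition table the wiring is recovered exactly; with the small number of expression patterns available in practice, several candidate input sets can pass the test, so the result should be read as a hypothesis.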
Bibliography • Durbin, B. P., Hardin, J. S., Hawkins, D. M., and Rocke, D. M. (2002) A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics, 18:S105-S110. • Kerr, M. Kathleen, Martin, Mitchell, and Churchill, Gary A. (2000) Analysis of Variance for Gene Expression Microarray Data. Journal of Computational Biology, 7:819-837. • Yang, Yee Hwa, Dudoit, Sandrine, Luu, Percy et al. (2002) Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research, 30:e15. • Quackenbush, John (2002) Microarray data normalization and transformation. Nature Genetics Supplement, 32:496-501. • Troyanskaya, Olga et al. (2001) Missing value estimation methods for DNA microarrays. Bioinformatics, 17:520-525. • Antoniadis, A., Lambert-Lacroix, S. and Leblanc, F. (2003) Effective dimension reduction methods for tumor classification using gene expression data. Bioinformatics, 19:563-570. • Golub, T. R. et al. (1999) Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, 286:531-537. • Rickman, David S. et al. (2001) Distinctive Molecular Profiles of High-Grade and Low-Grade Gliomas Based on Oligonucleotide Microarray Analysis. Cancer Research, 61:6885-6891. • Eisen, Michael B. et al. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, 95:14863-14868.
Bibliography • Brown, Michael P. S. et al. (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl. Acad. Sci. USA, 97:262-267. • Li, Leping et al. Gene Assessment and Sample Classification for Gene Expression Data Using a Genetic Algorithm/k-nearest Neighbor Method. • Holmes, Ian and Bruno, William J. (2000) Finding Regulatory Elements Using Joint Likelihoods for Sequence and Expression Profile Data. American Association for Artificial Intelligence (www.aaai.org). • Van Helden, J., Andre, B., and Collado-Vides, J. (1998) Extracting Regulatory Sites from the Upstream Region of Yeast Genes by Computational Analysis of Oligonucleotide Frequencies. J. Mol. Biol., 281:827-842. • Pe’er, Dana, Regev, Aviv, and Tanay, Amos (2002) Minreg: Inferring an active regulator set. Bioinformatics, 18:S258-S267. • Jensen, Lars and Knudsen, Steen (2000) Automatic discovery of regulatory patterns in promoter regions based on whole cell expression data and functional annotation. Bioinformatics, 16:326-333. • Creighton, Chad and Hanash, Samir (2003) Mining gene expression databases for association rules. Bioinformatics, 19:79-86. • Friedman, Nir et al. (2000) Using Bayesian Networks to Analyze Expression Data. J. Comp. Bio., 7:601-620.
Bibliography • Liang, S., Fuhrman, S. and Somogyi, R. (1998) REVEAL, A General Reverse Engineering Algorithm for Inference of Genetic Network Architectures. Pacific Symposium on Biocomputing, 3:18-29. • Akutsu, T., Miyano, S. and Kuhara, S. (1999) Identification of Genetic Networks from a Small Number of Gene Expression Patterns Under the Boolean Network Model. Pacific Symposium on Biocomputing, 4:17-28. • Samsonova, M. G. and Serov, V. N. (1999) NetWork: An Interactive Interface to the Tools for Analysis of Genetic Network Structure and Dynamics. Pacific Symposium on Biocomputing, 4:102-111.