Analyzing Microarray Data with Methods from Statistics and Machine Learning B-IT IPEC Winter School 2008 Prof. Dr. A. B. Cremers Jörg Zimmermann

DNA Microarray Data
• Genome chips containing a collection of microscopic DNA spots
• Simultaneous determination of > 10^5 gene expression levels
• Dramatic acceleration of data acquisition
• New possibilities for disease diagnosis, treatment studies, network analysis, …

DNA Microarray Data
The resulting data have the form:

\[
\begin{array}{cccc|c}
x_{11} & x_{12} & \cdots & x_{1n} & (L_1) \\
\vdots & \vdots &        & \vdots & \vdots \\
x_{p1} & x_{p2} & \cdots & x_{pn} & (L_p)
\end{array}
\]

$n$ = number of measured cell states (e.g. gene expression levels)
$p$ = number of samples
$x_{ij}$ = real number, e.g. representing the expression level of gene $j$ in sample $i$
$L_i$ = label of sample $i$
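
As a minimal sketch of this layout, the data can be held as a list of rows (one per sample) plus a parallel list of labels. The numbers, the label names, and the helper `gene_profile` are hypothetical and only illustrate the indexing; note that Python indices start at 0, whereas the slide counts from 1.

```python
# Toy stand-in for the p x n data matrix: one row per sample, one column per gene.
# All values and label names are made up for illustration.
expression = [
    [2.1, 0.3, 5.0, 1.2],  # sample 1
    [1.9, 0.4, 4.8, 1.1],  # sample 2
    [0.2, 3.1, 0.9, 4.0],  # sample 3
]
labels = ["class A", "class A", "class B"]  # L_1 ... L_p

p = len(expression)      # number of samples
n = len(expression[0])   # number of measured expression levels per sample


def gene_profile(matrix, j):
    """Return column j: the expression level of one gene across all samples."""
    return [row[j] for row in matrix]


assert len(labels) == p  # exactly one label per sample
print(p, n, gene_profile(expression, 1))
```

In real microarray data n is typically much larger than p, which is one reason the later slides list variable selection and dimensionality reduction as challenges.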

Challenges for Data Analysis
• Normalization (removing systematic measurement effects)
• Variable selection (identification of relevant variables)
• Large-sample effects: Type I and Type II errors (false positives / false negatives)
• Dimensionality reduction
• Identification of new disease classes
• Classification of data into known disease classes
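
To make the false-positive issue concrete: if every gene is tested separately at significance level alpha, the expected number of false positives among m truly unaffected genes is m times alpha. The sketch below computes that number and applies the Bonferroni correction, which is one standard remedy and is not named on the slides. The p-values are invented for illustration.

```python
def expected_false_positives(num_tests, alpha):
    """Expected number of Type I errors if every null hypothesis is true."""
    return num_tests * alpha


def bonferroni_reject(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Testing 100,000 genes at the 5% level yields about 5,000 expected false positives.
print(expected_false_positives(100_000, 0.05))

# Hypothetical p-values for five genes; the corrected threshold is 0.05 / 5 = 0.01.
p_values = [0.00001, 0.004, 0.03, 0.2, 0.6]
decisions = bonferroni_reject(p_values)
print(decisions)
```

Bonferroni controls false positives at the price of more false negatives, which is the Type I / Type II trade-off the slide refers to.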

Cluster Analysis
Finding structure in data without labels (unsupervised learning)
Does a cluster characterize a (new) disease type?
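
One way to look for such structure is k-means clustering. The following is a minimal sketch in plain Python, assuming Euclidean distance and hand-picked initial centers so that the run is reproducible; random initialization is more common in practice. The sample values are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    return [sum(column) / len(points) for column in zip(*points)]


def k_means(samples, initial_centers, max_iterations=100):
    """Alternate between assigning samples to the nearest center and
    moving each center to the mean of its assigned samples."""
    centers = [list(c) for c in initial_centers]
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(len(centers)), key=lambda c: squared_distance(s, centers[c]))
            for s in samples
        ]
        if new_assignment == assignment:  # nothing changed: converged
            break
        assignment = new_assignment
        for c in range(len(centers)):
            members = [s for s, a in zip(samples, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = mean_point(members)
    return assignment, centers


# Two hypothetical groups of samples, each described by two expression levels.
samples = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
           [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
assignment, centers = k_means(samples, initial_centers=[samples[0], samples[3]])
print(assignment)
```

The algorithm never sees labels; whether a recovered cluster corresponds to a disease type is a question for follow-up analysis.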

Prediction Problem
• Classify data into known disease classes: supervised learning
• Split the data into a training set and a test set
• Learn a model on the training set
• Evaluate the model on the test set
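
The split, learn, evaluate steps can be sketched as follows. The model here is a nearest-centroid classifier, chosen only because it is short; the slides do not prescribe it. The data, the class names, and all function names are assumptions made for illustration.

```python
import random


def train_test_split(samples, labels, test_fraction=0.25, seed=0):
    """Shuffle the sample indices and hold out a fraction as the test set."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(round(test_fraction * len(samples))))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [samples[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


def fit_centroids(samples, labels):
    """Learn one mean vector per class from the training data."""
    grouped = {}
    for sample, label in zip(samples, labels):
        grouped.setdefault(label, []).append(sample)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()}


def predict(centroids, sample):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: sum(
        (x - c) ** 2 for x, c in zip(sample, centroids[label])))


def accuracy(centroids, samples, labels):
    hits = sum(predict(centroids, s) == l for s, l in zip(samples, labels))
    return hits / len(samples)


# Hypothetical, clearly separated toy data with two expression levels per sample.
samples = [[1.0, 1.2], [0.8, 1.1], [1.3, 0.9], [1.1, 1.0], [0.9, 0.7], [1.2, 1.3],
           [5.0, 4.8], [5.2, 5.1], [4.7, 5.3], [5.1, 4.9], [4.9, 5.2], [5.3, 5.0]]
labels = ["class A"] * 6 + ["class B"] * 6

(train_x, train_y), (test_x, test_y) = train_test_split(samples, labels)
model = fit_centroids(train_x, train_y)          # learn on the training set only
test_accuracy = accuracy(model, test_x, test_y)  # evaluate on held-out samples
print(len(train_x), len(test_x), test_accuracy)
```

The point of the split is that the reported accuracy comes from samples the model never saw during learning.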

Prediction Problem
Under- and Overfitting
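
A small experiment can illustrate both failure modes. The sketch below uses a k-nearest-neighbor classifier on synthetic, overlapping classes, all of which is an assumption for illustration. With k = 1 the model memorizes the training set (perfect training accuracy, usually worse test accuracy), and with k equal to the training set size it ignores the input and always predicts the majority class.

```python
import random
from collections import Counter


def make_data(n_a, n_b, rng):
    """Two overlapping 2-D Gaussian classes (synthetic, for illustration only)."""
    data = []
    for label, center, count in (("A", 0.0, n_a), ("B", 1.5, n_b)):
        for _ in range(count):
            data.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return data


def knn_predict(train, point, k):
    """Majority vote among the k training samples closest to the point."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], point)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, data, k):
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


rng = random.Random(0)
train = make_data(41, 40, rng)  # slightly unbalanced so the majority class is unique
test = make_data(40, 40, rng)

for k in (1, 15, len(train)):
    print(k, accuracy(train, train, k), accuracy(train, test, k))
```

Comparing the training and test columns for each k shows why a model should be judged on held-out data rather than on the data it was fitted to.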

Data Analysis Methods
Dimension Reduction
• PCA (Principal Component Analysis)
• ICA (Independent Component Analysis)
• Multidimensional Scaling
Unsupervised Learning
• K-Means / K-Medoid
• Hierarchical Clustering Algorithms
Supervised Learning
• Linear Discriminant Analysis
• Maximum Likelihood Discrimination
• Nearest Neighbor Methods
• Decision Trees
• Random Forests
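
As an illustration of the first method in the list, the sketch below finds the first principal component by power iteration on the sample covariance matrix. This is one of several ways to compute PCA and is feasible only for a small number of variables; for full microarray data a singular value decomposition from a numerical library would be the usual choice. The toy data are hypothetical.

```python
import math


def center(samples):
    """Subtract the per-variable mean from every sample."""
    means = [sum(col) / len(samples) for col in zip(*samples)]
    return [[x - m for x, m in zip(row, means)] for row in samples]


def covariance(centered):
    """Sample covariance matrix of already centered data."""
    n_vars = len(centered[0])
    denom = len(centered) - 1
    return [[sum(row[i] * row[j] for row in centered) / denom
             for j in range(n_vars)] for i in range(n_vars)]


def first_principal_component(samples, iterations=100):
    """Power iteration: repeatedly multiply by the covariance matrix and
    renormalize, which converges to the direction of largest variance."""
    cov = covariance(center(samples))
    v = [1.0 / math.sqrt(len(cov))] * len(cov)  # unit-length start vector
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical samples lying close to the line y = 2x.
samples = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.0]]
pc = first_principal_component(samples)

# Project each centered sample onto the component: one number per sample.
scores = [sum(x * d for x, d in zip(row, pc)) for row in center(samples)]
print(pc, scores)
```

Replacing each sample by its score reduces the two variables to one while keeping most of the variance, which is the idea behind using PCA for dimension reduction.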

Organisation
Schedule: 31.3.2008 – 4.4.2008, B-IT Building
Language: German and English (slides in English)
Talk: 45 min + 15 min discussion
Documentation: 10 – 15 pages (German or English)
Area (DPO Bonn): B
Background Literature: Hastie, Tibshirani, Friedman: The Elements of Statistical Learning, Springer, 2001
Contact: jz@iai.uni-bonn.de
Summer Course: Gene Mining and Network Analysis
Summer School: Programming Data Analysis Algorithms with R