
Analyzing Microarray Data with Methods from Statistics and Machine Learning

B-IT IPEC Winter School 2008. Prof. Dr. A. B. Cremers, Jörg Zimmermann.


Presentation Transcript


  1. Analyzing Microarray Data with Methods from Statistics and Machine Learning
     B-IT IPEC Winter School 2008
     Prof. Dr. A. B. Cremers, Jörg Zimmermann

  2. DNA Microarray Data
     • Genome Chips containing a collection of microscopic DNA spots
     • Simultaneous determination of > 10^5 Gene Expression Levels
     • Dramatic acceleration of data acquisition
     • New possibilities for disease diagnosis, treatment studies, network analysis, …

  3. DNA Microarray Data
     The resulting data have the form:

     $$
     \begin{array}{cccc|c}
     x_{11} & x_{12} & \cdots & x_{1n} & (L_1) \\
     \vdots & \vdots &        & \vdots & \vdots \\
     x_{p1} & x_{p2} & \cdots & x_{pn} & (L_p)
     \end{array}
     $$

     n = number of measured cell states (e.g. gene expression levels)
     p = number of samples
     x_{ij} = real number, e.g. representing the expression level of gene j in sample i
     L_i = label of sample i
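
The layout above (one row per sample, one column per measured expression level, one label per row) can be sketched in plain Python as follows. All values, labels, and helper names (`expression`, `gene_profile`) are invented for illustration and do not come from the slides.

```python
# Toy data in the layout of slide 3: p samples (rows), n genes (columns).
# All numbers and labels are made up for illustration.
X = [
    [2.1, 0.3, 5.0, 1.2],   # sample 1
    [1.9, 0.4, 4.8, 1.1],   # sample 2
    [0.2, 3.1, 0.9, 4.0],   # sample 3
]
labels = ["healthy", "healthy", "tumor"]   # L_1 ... L_p

p = len(X)       # number of samples
n = len(X[0])    # number of measured expression levels per sample


def expression(i, j):
    """Return x_ij, the expression level of gene j in sample i (1-based, as on the slide)."""
    return X[i - 1][j - 1]


def gene_profile(j):
    """Return column j, i.e. the expression of gene j across all samples."""
    return [row[j - 1] for row in X]
```

In real microarray data n is typically far larger than p, which is what makes variable selection and dimensionality reduction necessary.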

  4. Challenges for Data Analysis
     • Normalization (removing systematic measurement effects)
     • Variable Selection (Identification of relevant Variables)
     • Large sample Effects: Type I and Type II errors (False positives / False negatives)
     • Dimensionality Reduction
     • Identification of new disease classes
     • Classification of data into known disease classes
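
As a rough illustration of the normalization step, the sketch below subtracts each sample's median so that a constant per-array offset disappears. Median centering is only one simple choice, assumed here for illustration; the slides do not say which normalization method the course uses, and the function name `median_center_rows` is hypothetical.

```python
from statistics import median


def median_center_rows(X):
    """Subtract each sample's (row's) median so every sample has median 0.

    Assumes the values are on a scale where a per-array offset is additive,
    e.g. log-transformed intensities.
    """
    centered = []
    for row in X:
        m = median(row)
        centered.append([x - m for x in row])
    return centered


# Two invented samples with the same pattern but a constant offset of 10.
X = [
    [1.0, 2.0, 3.0],
    [11.0, 12.0, 13.0],
]
X_centered = median_center_rows(X)
```

After centering, both toy samples show the same profile, so the offset no longer dominates any later comparison between them.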

  5. Cluster Analysis
     Finding structure in data without labels (unsupervised learning)
     Does a cluster characterize a (new) disease type?
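
A minimal sketch of one clustering method named later in the slides, K-Means (Lloyd's algorithm), written in plain Python. The data are invented, and taking the first k points as starting centroids is a simplification assumed here; with an unlucky start the algorithm can settle in a poor local optimum, so practical implementations use several random restarts.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100):
    """Lloyd's algorithm. Returns (centroids, assignment)."""
    centroids = [list(p) for p in points[:k]]   # naive deterministic start
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break                                # converged
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


# Invented samples with two measured values each, forming two obvious groups.
points = [[1.0, 1.0], [5.0, 5.0], [1.2, 0.8], [5.2, 4.8], [0.9, 1.1], [4.9, 5.1]]
centroids, assignment = k_means(points, k=2)
```

Whether a cluster found this way corresponds to a disease type is a biological question that the algorithm itself cannot answer.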

  6. Prediction Problem
     • Classify data into known disease classes: Supervised Learning
     • Split data into Training and Test set
     • Learn a model on the training set
     • Evaluate model on the test set
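
The split / learn / evaluate workflow can be sketched with a 1-nearest-neighbor classifier, whose "model" is simply the stored training set. The data, the 30% test fraction, the seed, and all function names are assumptions made for this illustration.

```python
import random


def train_test_split(X, y, test_fraction=0.3, seed=0):
    """Shuffle the sample indices and split them into a training and a test part."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(test_fraction * len(X)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [X[i] for i in train_idx], [y[i] for i in train_idx],
        [X[i] for i in test_idx], [y[i] for i in test_idx],
    )


def predict_1nn(X_train, y_train, x):
    """Return the label of the training sample closest to x."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X_train]
    return y_train[distances.index(min(distances))]


def accuracy(X_train, y_train, X_test, y_test):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(
        predict_1nn(X_train, y_train, x) == label
        for x, label in zip(X_test, y_test)
    )
    return hits / len(y_test)


# Invented data: two well-separated classes with ten samples each.
X = [[0.1 * i, 1.0] for i in range(10)] + [[5.0 + 0.1 * i, 4.0] for i in range(10)]
y = ["A"] * 10 + ["B"] * 10

X_train, y_train, X_test, y_test = train_test_split(X, y)
test_accuracy = accuracy(X_train, y_train, X_test, y_test)
```

The test samples play no part in building the classifier, so the accuracy measured on them estimates performance on unseen data rather than how well the training set was memorized.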

  7. Prediction Problem
     Under- and Overlearning: [figure not preserved]

  8. Data Analysis Methods
     Dimension Reduction
     • PCA (Principal Component Analysis)
     • ICA (Independent Component Analysis)
     • Multidimensional Scaling
     Unsupervised Learning
     • K-Means / K-Medoid
     • Hierarchical Clustering Algorithms
     Supervised Learning
     • Linear Discriminant Analysis
     • Maximum Likelihood Discrimination
     • Nearest Neighbor Methods
     • Decision Trees
     • Random Forests
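
To make the first dimension-reduction method concrete, here is a small sketch that approximates the first principal component by power iteration on the sample covariance matrix. It is written for tiny toy data only: forming the full n × n covariance matrix is an assumption that would not scale to real microarray data with many thousands of genes, and the function name is hypothetical.

```python
import math


def first_principal_component(X, iterations=100):
    """Approximate the leading eigenvector of the sample covariance matrix.

    X is a list of p samples with n values each. Uses power iteration, which
    assumes the start vector is not orthogonal to the leading eigenvector.
    """
    p, n = len(X), len(X[0])
    means = [sum(row[j] for row in X) / p for j in range(n)]
    centered = [[row[j] - means[j] for j in range(n)] for row in X]
    cov = [
        [sum(r[a] * r[b] for r in centered) / (p - 1) for b in range(n)]
        for a in range(n)
    ]
    v = [1.0] + [0.0] * (n - 1)          # arbitrary start vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                  # data have no variance at all
            break
        v = [x / norm for x in w]
    return v


# Invented samples lying close to the diagonal direction (1, 1).
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8]]
pc1 = first_principal_component(data)
```

For these points the returned unit vector points roughly along the diagonal, the direction in which the toy data vary most. Projecting each centered sample onto it yields a one-dimensional summary.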

  9. Organisation
     Schedule: 31.3.2008 – 4.4.2008, B-IT Building
     Language: German and English (slides in English)
     Talk: 45 min + 15 min discussion
     Documentation: 10 – 15 pages (German or English)
     Area (DPO Bonn): B
     Background Literature: Hastie, Tibshirani, Friedman: The Elements of Statistical Learning, Springer, 2001
     Contact: jz@iai.uni-bonn.de
     Summer Course: Gene Mining and Network Analysis
     Summer School: Programming Data Analysis Algorithms with R
