Statistical Techniques for Temporal Microarray Data Analysis. Ritesh Krishna, Department of Computer Science. WPCCS, July 1, 2008.
Why should you listen to my talk?
• Systems Biology is everybody's playground in this room: Image processing, Algorithms, Parallel processing etc.
• Importance of Systems Biology in today's context:
  • Agriculture
  • Energy sources (biofuels)
  • Gene therapy
  • Waste clean-up
Use of Computational Techniques
• Massive data generated by molecular biology experiments
• Need to analyse output files produced in various formats, facilitate storage of bulk data, allow quick and precise retrieval, and, most importantly, understand the behaviour and patterns in the data
How are these experiments performed?
• Major revolution in the world of molecular biology
• No longer limited to one gene per experiment
• Possible to monitor expression levels of thousands of genes simultaneously
An example - Arabidopsis thaliana
• Popular in plant biology as a model plant
• One of the smallest plant genomes
• First plant genome to be sequenced
Present Study
• The present study is about understanding the leaf senescence process in Arabidopsis.
• Senescence refers to the biological processes of a living organism approaching an advanced age; in plants it is triggered by age and stress.
• It is a programmed event that responds to a wide range of external and internal signals and is tightly regulated by different genes and proteins.
Experimental Design
[Figure: experimental design; labels: Dye, Laser, Quantitative Data; total 16 replicates]
Issues with data
• Biological variation vs. technical variation
• Technical variation: sample bias, dye bias, slide bias, variation in experimental conditions, scanning and imaging errors, human errors
• Massive dataset with ~31,000 genes
• Goal is to understand the functioning of certain sets of genes (needle in the haystack)
Step one – Clean the raw data using Normalization
• Assess the different sources of technical bias
• Remove the correlations between replicates so that they are independent of each other
• Fit a multivariate error model: the residuals associated with genes follow a Normal distribution with mean zero and constant variance
• Propose statistical tests for evaluating the effects of normalization
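The slides do not give the form of the error model, so the following is only a minimal sketch of the general idea, under stated assumptions: each replicate array carries an additive technical offset on the log scale, that offset is removed by median centring (a swapped-in, much simpler technique than a multivariate error model), and the per-gene residuals are then inspected. The function name `normalise` and the toy data are hypothetical.

```python
from statistics import mean, median

def normalise(log_intensity):
    """Median-centre each array, then compute per-gene residuals.

    log_intensity[a][g] is the log2 intensity of gene g on replicate array a.
    Assumption (not from the slides): technical bias is an additive
    per-array offset on the log scale.
    """
    # Remove the array-level offset by centring every array on its median.
    centred = []
    for row in log_intensity:
        m = median(row)
        centred.append([x - m for x in row])

    n_arrays = len(centred)
    n_genes = len(centred[0])

    # Per-gene mean across replicates: the estimated expression level.
    gene_means = [mean(centred[a][g] for a in range(n_arrays))
                  for g in range(n_genes)]

    # Residual = what is left once array offset and gene level are removed.
    residuals = [[centred[a][g] - gene_means[g] for g in range(n_genes)]
                 for a in range(n_arrays)]
    return centred, gene_means, residuals

# Hypothetical toy data: five genes on three replicate arrays that differ
# only by a constant array offset.
base = [1.0, 2.0, 3.0, 4.0, 5.0]
arrays = [[x + offset for x in base] for offset in (0.0, 0.5, -1.0)]
centred, gene_means, residuals = normalise(arrays)
```

With real data the residuals would not vanish; their mean and spread per gene are what one would compare against the zero-mean, constant-variance assumption named above.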
Step two – Clustering
• Reduce the dimensionality of the data
• Genes with similar expression profiles fall into the same cluster.
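The slides do not say which clustering method was used. As an illustration only, the sketch below groups temporal expression profiles with a plain k-means, assuming each profile is first standardised so that genes are compared by the shape of their time course rather than by absolute level. The gene names, the data, and the helper names (`standardise`, `sq_dist`, `kmeans`) are hypothetical.

```python
import math

def standardise(profile):
    """Scale a time course to mean 0 and standard deviation 1."""
    m = sum(profile) / len(profile)
    sd = math.sqrt(sum((x - m) ** 2 for x in profile) / len(profile))
    return [(x - m) / sd if sd > 0 else 0.0 for x in profile]

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(profiles, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centres = [profiles[0]]
    while len(centres) < k:
        # Next centre: the profile farthest from all centres chosen so far.
        centres.append(max(profiles,
                           key=lambda p: min(sq_dist(p, c) for c in centres)))
    labels = [0] * len(profiles)
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                  for p in profiles]
        # Update step: each centre becomes the mean of its members.
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                new_centres.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centres.append(centres[j])  # keep an empty cluster's centre
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres

# Hypothetical time courses: two genes rising over time, two falling.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [9.0, 7.0, 5.0, 3.0],
}
names = list(expression)
profiles = [standardise(expression[n]) for n in names]
labels, centres = kmeans(profiles, k=2)
clusters = dict(zip(names, labels))
```

Each cluster centre then stands in for all of its member genes, which is how clustering reduces thousands of profiles to a handful of representative patterns.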
Circadian Circuit
[Figure: genes shown: ELF4, TOC1, LFY, CCA1]
[Figure: genes shown: ERS2, ERS1, ETR2, ETR1, CTR1, EIN2, EIN6, EIN4, EIN3, EIL2, EIL1, EIL4, EIL3, EIL5, ERF1, PDF1.2]
More information
• Affymetrix Inc. (http://www.affymetrix.com/index.affx)
• Agilent Technologies (http://www.chem.agilent.com)
• Gibson G (2003) Microarray Analysis. PLoS Biol 1(1): e15