Statistical Design and Analysis of Microarray Experiments Peng Liu 6/15/2010


Presentation Transcript


  1. Statistical Design and Analysis of Microarray Experiments Peng Liu 6/15/2010

  2. Microarray Technology • Microarray technology allows measuring expression levels (abundance of mRNA transcripts) of thousands of genes simultaneously. • Two types of platforms: • Affymetrix (single-color) • Two-color microarray

  3. Wild-type vs. Myostatin Knockout Mice • Belgian Blue cattle have a mutation in the myostatin gene. • Design of Affymetrix experiment: one sample → one chip

  4. Designing 2-color microarray experiments (3 layers) (from Churchill, 2002, Nature Genetics)

  5. Example I: Sawers et al, 2007, BMC Bioinformatics [Figure labels: M, B, V; bundle sheath strands; mesophyll protoplasts]

  6. Example I: Sawers et al, 2007, BMC Bioinformatics • The establishment of C4 photosynthesis in maize is associated with differential accumulation of gene transcripts and proteins between bundle sheath and mesophyll photosynthetic cell types. • Goal: To detect genes that are differentially expressed in Bundle Sheath (B) and Mesophyll (M) cells.

  7. Example I: Sawers et al, 2007, BMC Bioinformatics • A simple method: Isolate the cells and perform a microarray experiment to compare gene expression between the two cell types (treatments).

  8. Example I: Sawers et al, 2007, BMC Bioinformatics • A complication: The procedures for extracting mRNA from the two cell types are different, and the one used for M cells introduces stress. • Solution: Add two more treatment groups: samples containing both M and B cells, with mRNA extracted with and without stress. This gives 4 treatment groups: B, M, Stress, and Total.

  9. Direct comparison vs indirect comparison • Direct: comparison within slide • Indirect: comparison between slides • Suppose we want to compare gene expression levels between treatment 1 and treatment 2. [Figure: direct comparison (1 and 2 on the same slide) vs. indirect comparison (1 and 2 each compared with a reference R)]

  10. Comments about 2-color Microarray Designs • A unique and powerful feature of the 2-color microarray is that two samples can be compared directly on the same slide. • Because the samples are paired on a slide, the variation due to slide can be accounted for. • When possible, it is more efficient to use direct comparison. • However, it is sometimes not practical to make direct comparisons of all possible pairs.

  11. Efficiency of comparison • The efficiency of comparisons between 2 samples is determined by the length and the number of paths connecting them. [Figure: direct comparison (dye-swap) vs. indirect comparison through a reference R]
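
A rough worked comparison makes the efficiency claim concrete. It is a sketch under simplifying assumptions that are not stated on the slide: each slide yields one log ratio with the same variance \sigma^2, slides are independent, and dye and biological effects are ignored. M_a and M_b denote the log ratios of treatment 1 to treatment 2 on the two dye-swapped slides (with the sign of the swapped slide already flipped), and M_{1R}, M_{2R} the log ratios of each treatment to the reference.

```latex
% Assumed setup: two slides in each design, Var(M) = \sigma^2 per slide, independent slides.
\begin{align*}
\text{Direct (dye-swap):}\quad
  \widehat{\Delta}_{12} &= \tfrac{1}{2}\,(M_a + M_b), &
  \operatorname{Var}\bigl(\widehat{\Delta}_{12}\bigr) &= \tfrac{1}{4}\,(\sigma^2 + \sigma^2) = \tfrac{\sigma^2}{2} \\
\text{Indirect (via } R\text{):}\quad
  \widehat{\Delta}_{12} &= M_{1R} - M_{2R}, &
  \operatorname{Var}\bigl(\widehat{\Delta}_{12}\bigr) &= \sigma^2 + \sigma^2 = 2\sigma^2
\end{align*}
```

Under these assumptions the direct design, whose path between the two treatments is shorter, gives a variance four times smaller for the same number of slides.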

  12. Reference vs Loop design [Figure: reference design (treatments 1, 2, 3 each compared with reference R) and loop design (treatments 1, 2, 3 compared in a loop)]

  13. Designing the experiment for Example I, with 6 biological replicates [Figure: design connecting the treatments B, Total, Stress, and M]

  14. Performing the experiment (Nature Cell Biol. 2001, 3:8)

  15. After the bench work… [Images: Affymetrix GeneChip image; 2-color microarray image]

  16. The data table looks like

  17. Pre-normalization analysis • Image processing: obtain the intensity measurement of the signal • Background correction: remove local background that might be due to non-specific binding and obtain the target sample intensity • Filtration: remove unreliable spots and reduce the dimension of the data • Transformation: convert the data into a format that makes data analysis valid or easier
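
The last three steps can be sketched for a single spot as below. This is only an illustration: the function name `preprocess_spot`, the subtraction-based background correction, the `min_signal` filtering threshold, and the log2 transformation are all assumptions chosen for the example, not methods named on the slide.

```python
import math

def preprocess_spot(foreground, background, min_signal=1.0):
    """Return the log2 background-corrected intensity, or None if the spot is filtered out.

    Hypothetical helper: subtraction, thresholding, and log2 are assumed choices.
    """
    signal = foreground - background      # background correction by subtraction
    if signal < min_signal:               # filtration: drop spots at or below background
        return None
    return math.log2(signal)              # transformation to the log2 scale

# Made-up (foreground, background) intensity pairs for three spots.
spots = [(1200.0, 176.0), (90.0, 95.0), (4200.0, 104.0)]
log_signals = [preprocess_spot(fg, bg) for fg, bg in spots]
print(log_signals)
```

The second spot has a foreground below its local background, so it is treated as unreliable and removed rather than transformed.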

  18. Normalization • Normalization describes the process of removing (or minimizing) non-biological variation in measured signal intensity levels so that biological differences in gene expression can be appropriately detected. • Aim: remove sources of systematic variation • Example of non-biological variation: dye difference for 2-color microarray

  19. Self-self experiment (figure from Dudoit et al, 2002, Statistica Sinica)

  20. Normalization: M vs. A Plot (45° rotation) • M = log Red - log Green • A = (log Green + log Red)/2
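
As a small illustration, M and A for one spot could be computed as follows. The slide does not state the base of the logarithm; base 2 is assumed here, and `ma_values` is a hypothetical helper name.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from its red and green intensities (log base 2 assumed)."""
    log_r = math.log2(red)
    log_g = math.log2(green)
    m = log_r - log_g          # M: log ratio of the two channels
    a = (log_g + log_r) / 2    # A: average log intensity
    return m, a

m, a = ma_values(1024.0, 256.0)
print(m, a)
```

With equal red and green intensities M is 0, which is why a self-self experiment is expected to scatter around the horizontal line M = 0.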

  21. LOWESS Fit [Plot: M = log Red - log Green against A = (log Green + log Red)/2, with the fitted LOWESS curve]
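
The idea of normalizing with a fitted curve can be sketched as below: fit a smooth curve of M against A, then subtract the fitted value from each M. This is a simplified stand-in, assuming a tricube-weighted local linear fit without the robustness iterations of full LOWESS; the function name `local_linear_fit` and the span `frac` are illustrative choices, and the data are made up.

```python
def local_linear_fit(a_vals, m_vals, frac=0.5):
    """Fitted M at each A from a tricube-weighted local linear regression.

    Simplified sketch of a LOWESS-style smoother (no robustness iterations).
    """
    n = len(a_vals)
    k = min(n, max(2, round(frac * n)))            # neighbours used in each local fit
    fitted = []
    for a0 in a_vals:
        dists = sorted(abs(a - a0) for a in a_vals)
        h = dists[k - 1] or 1e-12                  # bandwidth: distance to k-th nearest point
        w = [(1 - min(abs(a - a0) / h, 1.0) ** 3) ** 3 for a in a_vals]
        sw = sum(w)
        mean_a = sum(wi * a for wi, a in zip(w, a_vals)) / sw
        mean_m = sum(wi * m for wi, m in zip(w, m_vals)) / sw
        sxx = sum(wi * (a - mean_a) ** 2 for wi, a in zip(w, a_vals))
        sxy = sum(wi * (a - mean_a) * (m - mean_m)
                  for wi, a, m in zip(w, a_vals, m_vals))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_m + slope * (a0 - mean_a))
    return fitted

# Made-up data with an intensity-dependent dye bias and no true differential expression.
a_vals = [6 + 0.5 * i for i in range(20)]
m_vals = [0.5 - 0.1 * (a - 10) for a in a_vals]
fitted = local_linear_fit(a_vals, m_vals)
normalized_m = [m - f for m, f in zip(m_vals, fitted)]   # normalized M = M - fitted curve
```

In practice an established implementation would normally be preferred over hand-written smoothing code; the sketch only shows what "subtracting the fit" means.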

  22. After normalization [Plot: normalized M against A]

  23. Statistical Inference • Data notation for normalized signal intensities (NSI): Yijk for each gene (g), where i is the treatment index, j is the dye index, and k is the slide index (examples shown on the slide: Y224, Y114)

  24. Fitting linear models to microarray data • After the normalization, we have one observation (normalized signal intensity) for each gene on each channel (a combination of dye and array). • Together, the data is an array with each row for one gene and each column for one channel or one chip. • We will fit a statistical model for each gene separately.

  25. Mean expressions for 4 treatment groups • Treatment means: • M (M cells with stress): μ + v2 + [stress effect] • B (B cells without stress): μ + v1 • TO (both cells without stress): μ + c*v2 + (1-c)*v1 • ST (both cells with stress): μ + c*v2 + (1-c)*v1 + [stress effect] • Note that c is the proportion of M cells in the total leaf sample with both cells. • We are interested in testing H0: v1 = v2, i.e., whether a given gene is differentially expressed between M and B cells.
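
One way to see why the two extra treatment groups help is to write out the contrast they make available. In the sketch below, \delta is an assumed symbol for the stress effect (the transcript does not preserve the original symbol), and \mu_M, \mu_B, \mu_{TO}, \mu_{ST} are shorthand for the four treatment means listed above.

```latex
% Assumed notation: \delta = stress effect; \mu_M, \mu_B, \mu_{TO}, \mu_{ST} = treatment means.
\begin{align*}
\mu_M - \mu_B &= (\mu + v_2 + \delta) - (\mu + v_1) = v_2 - v_1 + \delta \\
\mu_{ST} - \mu_{TO} &= \delta \\
(\mu_M - \mu_B) - (\mu_{ST} - \mu_{TO}) &= v_2 - v_1
\end{align*}
```

The direct M versus B comparison is confounded with stress, but subtracting the Stress versus Total comparison removes it, so H0: v1 = v2 corresponds to this contrast being zero. The proportion c cancels and does not need to be known.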

  26. Fixed effects • The parameters on the previous slide (v1, v2, and the stress effect) specify fixed effects. • Fixed effects are used to specify the mean of the response variable. • A factor is fixed if the levels of the factor were selected by the investigator with the purpose of comparing the effects of the levels to one another. • The fixed effects included in the model depend on the experimental design.

  27. Random effects • There are some random effects that are unknown: • slide effects • other effects introduced in the experiment (such as biological replicate effects) • residual random effects that include any sources of variation unaccounted for by other terms

  28. Random effects • Random factors are used to specify the correlation structure among the response variable observations. • e.g., observations on the same slide are more correlated than observations from different slides. • The random effects included in the model also depend on the experimental design. • A model that has both fixed and random effects is called a mixed model.

  29. Detecting differentially expressed genes • Construct statistical tests for the parameters of interest, e.g., what is the difference in gene expression (v1 - v2)? v1 - v2 ≠ 0 means differential expression. • Model the random effects and perform tests or construct confidence intervals. • Perform a test for each gene and obtain a p-value. • An empirical Bayes test that borrows information across genes is often used because of its higher power.
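
To give a feel for a per-gene test, the sketch below computes an ordinary one-sample t statistic from one gene's normalized log ratios across slides. This is a deliberate simplification and not the mixed-model or empirical Bayes analysis described on the slide: it assumes one independent log ratio per slide and ignores dye effects. The function name and the data are illustrative.

```python
import math
import statistics

def t_statistic(log_ratios):
    """One-sample t statistic testing whether the mean log ratio is zero.

    Simplified illustration: assumes independent slides and no dye effect.
    """
    n = len(log_ratios)
    mean = statistics.mean(log_ratios)
    std_err = statistics.stdev(log_ratios) / math.sqrt(n)
    return mean / std_err

# Made-up log ratios (treatment 1 vs. treatment 2) for one gene on three slides.
gene_log_ratios = [1.0, 2.0, 3.0]
t_value = t_statistic(gene_log_ratios)
print(t_value)
```

The same calculation would be repeated for every gene, giving one test statistic and one p-value per gene.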

  30. Results from testing

  31. 2536 p-values are below 0.05. We would expect around 0.05*40000 = 2000 p-values to be less than 0.05 by chance if no genes were differentially expressed.
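
The arithmetic on this slide can be reproduced directly. The variable names are illustrative, and reading the excess as a rough indication of true signal is an informal interpretation rather than a formal estimate.

```python
n_genes = 40000        # number of tests, from the slide
alpha = 0.05           # p-value threshold
observed = 2536        # p-values below alpha, from the slide

# Under the complete null, p-values are uniform, so about alpha * n_genes fall below alpha.
expected_by_chance = round(n_genes * alpha)
excess = observed - expected_by_chance
print(expected_by_chance, excess)
```

Most of the 2536 small p-values are therefore compatible with chance alone, which motivates the multiple-testing error rates discussed next.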

  32. Possible Errors in Testing ONE gene • Type I Error: false positives • Type II Error: false negatives (1-power) • Power: true positives

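  33. Error Rate in Multiple Testing • Outcomes when testing m genes (Benjamini and Hochberg, 1995): V is the number of false positives (true null hypotheses rejected) and R is the total number of rejected hypotheses. • Family-wise error rate: FWER = Pr(V > 0) • False discovery rate: FDR = E(V/R | R > 0) * Pr(R > 0)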
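
A minimal sketch of the Benjamini-Hochberg step-up procedure for controlling the FDR is given below. The slide cites Benjamini and Hochberg (1995) for the outcome table and the FDR definition; whether this exact procedure was used in the analysis is an assumption. The function name and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by increasing p-value
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * fdr / m:                  # compare p_(rank) with rank * q / m
            cutoff_rank = rank                             # keep the largest rank that passes
    return sorted(order[:cutoff_rank])

# Made-up p-values for five genes.
p_values = [0.5, 0.038, 0.001, 0.035, 0.03]
rejected = benjamini_hochberg(p_values, fdr=0.05)
print(rejected)
```

In this example the second and third smallest p-values miss their own thresholds but are still rejected, because the fourth smallest passes its threshold; this is the "step-up" behaviour of the procedure.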

  34. Results from testing for example I

  35. Clustering • Grouping genes into different “clusters” based on their expression profiles → Clustering

  36. Other analyses • Relating gene expression to biological functional categories → Gene Enrichment Test • Connecting microarray data with other kinds of data, such as survival data. • More …

  37. Assigned References • Nettleton, D. (2006) A discussion of statistical methods for design and analysis of microarray experiments for plant scientists. The Plant Cell, 18, 2112–2121.
