Design of microarray gene expression profiling experiments


Presentation Transcript


  1. Design of microarray gene expression profiling experiments Peter-Bram ’t Hoen

  2. Lay-out • Practical considerations • Pooling • Randomization • One-color vs Two-colors • Two-color hybridization designs • Ratio-based vs Intensity-based analysis

  3. Think before you start • research question • choice of technology • controls and replicates Ref: Churchill. 2002. Nature Genetics Supplement 32: 490-495

  4. Research question • Limit your (initial) number of questions/conditions • choose the best timepoint for mRNA regulation • can be different from protein/activity • pilots using RT-qPCR • experimental follow-up • what will you do with the data? • verification of differential gene expression • in vitro experiments to study mechanism • "in vivo" verification in tissue sections

  5. Choice of technology • What is affordable? • Do a pilot to estimate the variance for your samples, experimental set-up and platform • Calculate your power: what is the smallest effect size that you can detect?
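
A minimal sketch of such a power calculation, assuming a two-group comparison of log2 expression values with equal group sizes, a per-gene standard deviation taken from the pilot, and a normal approximation in place of the t-distribution. The function name and the example numbers (sd = 0.4, alpha = 0.001) are illustrative assumptions, not values from the presentation.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_log2_fc(sd, n_per_group, alpha=0.001, power=0.8):
    """Smallest log2 fold change detectable in a two-group comparison.

    Normal approximation, two-sided test, equal group sizes; `sd` is the
    per-gene standard deviation of log2 expression estimated from a pilot.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value of the test
    z_power = z.inv_cdf(power)           # quantile for the desired power
    return (z_alpha + z_power) * sd * sqrt(2 / n_per_group)


# Hypothetical pilot estimate: sd = 0.4 on the log2 scale
for n in (3, 5, 10):
    delta = min_detectable_log2_fc(sd=0.4, n_per_group=n)
    print(f"n = {n:2d} per group: log2 FC >= {delta:.2f} "
          f"(fold change >= {2 ** delta:.2f})")
```

A strict alpha is used here because thousands of genes are tested at once; with few replicates a t-based calculation would give a somewhat larger detectable effect.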

  6. Controls • positive: genes whose regulation is known • check on biological experiment & data analysis • positive: spikes in mRNA and/or hyb mix • check labeling procedure and hybridization • detection range (sensitivity) and dynamic range • "landing lights" for gridding software • negative controls: non-specific binding • check cross-hybridization: buffer, non-homologous DNA

  7. Spike controls (figure): spikes are added to the reference RNA and the test RNA, followed by cDNA probe synthesis and hybridization to an array containing the DNA controls • Spikes: RCA, Cab, rbcL, LTP4, LTP6, XCP2, RPC1, NAC1, TIM, PRK • Spiked 2-fold changes (copies/cell): 2 vs 1, 100 vs 50, 10 vs 5, 60 vs 30, 300 vs 150 • Spiked 3-fold changes (copies/cell): 3 vs 1, 150 vs 50, 15 vs 5, 60 vs 20, 300 vs 100

  8. Spikes Van de Peppel et al. EMBO Reports 4, 387 (2003)

  9. Controls • positive: genes whose regulation is known • check on biological experiment & data analysis • positive: spikes in mRNA and/or hyb mix • check labeling procedure and hybridization • detection range (sensitivity) and dynamic range • "landing lights" for gridding software • negative controls: non-specific binding • check cross-hybridization: buffer, non-homologous DNA

  10. Replicates • Include sufficient replicates, based on pilot experiment • Biological replicates are preferred over technical replicates • Control experimental variables with possible unintended effects • genetic background • gender • age

  11. Randomization • Randomize samples with respect to experimental influences • experimenter • day of hybridization • batch of arrays • dye • etc
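
One way to put this into practice is a blocked randomization, sketched below under simplifying assumptions: samples are shuffled within each experimental group and then dealt out over hybridization days and dyes, so that no group is confined to one day or one dye. The function name, group labels and sample ids are hypothetical, and the sketch ignores which samples share an array.

```python
import random
from collections import Counter


def blocked_randomization(groups, days, dyes=("Cy3", "Cy5"), seed=None):
    """Assign each sample a hybridization day and a dye.

    groups: dict mapping group name -> list of sample ids.
    Samples are shuffled within each group and then spread over days and
    dyes, so day and dye are crossed instead of confounded with group.
    """
    rng = random.Random(seed)
    plan = []
    for group, samples in groups.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        day_offset = rng.randrange(len(days))
        dye_offset = rng.randrange(len(dyes))
        for i, sample in enumerate(shuffled):
            plan.append({
                "sample": sample,
                "group": group,
                "day": days[(i + day_offset) % len(days)],
                # the dye changes only after a full cycle of days
                "dye": dyes[(i // len(days) + dye_offset) % len(dyes)],
            })
    return plan


groups = {"control": ["c1", "c2", "c3", "c4"],
          "treated": ["t1", "t2", "t3", "t4"]}
plan = blocked_randomization(groups, days=["day 1", "day 2"], seed=7)
for row in plan:
    print(row)
print(Counter((r["group"], r["day"]) for r in plan))
```

The same pattern extends to experimenter or array batch by adding further keys; balance is exact only when the group size is a multiple of the number of days times the number of dyes.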

  12. Pooling • Often done because too little RNA is available, although good amplification protocols exist • Advantage: • dampens individual variation, which may increase statistical power • Generally not recommended: • outliers in the population may produce large and apparently significant effects • information on differences within the population is lost, and these differences are probably biologically relevant • in effect, pooling is an artificial way to increase the significance of your findings

  13. Hybridization design • One color: not many difficulties expected • Two color: what to hybridize with what in which color? • Reference design • Paired design • Loop design • Mixed design Read: Yang & Speed (2002). Design issues for cDNA microarray experiments. Nature Reviews Genetics 3, 579-588

  14. Hybridization design: general issues • Comparisons on the same array are more precise than comparisons on different arrays • Identify most important comparisons • Hybridize those on the same slide • Dye swap • A dye-effect is always there • Balance designs with respect to dye (exception: some common reference designs)
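
Why a dye swap helps can be shown with a small sketch. It assumes a simple additive model, which is an illustration rather than the presenter's stated model: the measured log ratio equals the true log ratio plus a constant dye bias, so the bias cancels in the difference of the two slides. Function and variable names are hypothetical.

```python
def dye_swap_estimate(m_forward, m_swapped):
    """Combine a dye-swap pair of log2(Cy5/Cy3) ratios.

    m_forward: slide with A in Cy5 and B in Cy3
    m_swapped: slide with B in Cy5 and A in Cy3
    Assumed model: m_forward = delta + dye, m_swapped = -delta + dye,
    where delta = log2(A/B) and dye is a constant dye bias.
    """
    delta = (m_forward - m_swapped) / 2   # dye bias cancels
    dye = (m_forward + m_swapped) / 2     # true effect cancels
    return delta, dye


# Hypothetical gene: true log2(A/B) = 1.0, dye bias = 0.3
true_delta, dye_bias = 1.0, 0.3
delta_hat, dye_hat = dye_swap_estimate(true_delta + dye_bias,
                                       -true_delta + dye_bias)
print(delta_hat, dye_hat)
```

Without the swapped slide, the single measurement would report the true effect and the dye bias added together, with no way to separate them.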

  15. Common reference vs direct hybridizations • Direct (A and B co-hybridized on two slides): if $\mathrm{Var}[\log(A/B)] = s^2$ for one slide, then the variance of the average of the two measurements is $s^2/2$ • Common reference (A vs R on one slide, B vs R on another): $\log(A/B) = \log(A/R) - \log(B/R)$, so $\mathrm{Var}[\log(A/B)] = \mathrm{Var}[\log(A/R)] + \mathrm{Var}[\log(B/R)] = s^2 + s^2 = 2s^2$
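
The two variances on this slide can be checked by simulation. The sketch below assumes independent, normally distributed slide errors with the same standard deviation s; the true log ratios (0.5 and 1.2) are arbitrary illustrative values.

```python
import random
from statistics import variance


def simulate(n_rep=20000, s=1.0, true_ab=0.5, seed=1):
    """Empirical variance of log(A/B) for two designs using two slides each."""
    rng = random.Random(seed)
    direct, reference = [], []
    for _ in range(n_rep):
        # Direct design: A vs B measured on two slides, then averaged
        d1 = true_ab + rng.gauss(0, s)
        d2 = true_ab + rng.gauss(0, s)
        direct.append((d1 + d2) / 2)
        # Common reference: one A-vs-R slide and one B-vs-R slide
        a_r = 1.2 + rng.gauss(0, s)               # log(A/R)
        b_r = (1.2 - true_ab) + rng.gauss(0, s)   # log(B/R)
        reference.append(a_r - b_r)
    return variance(direct), variance(reference)


v_direct, v_reference = simulate()
print(f"direct:    {v_direct:.3f}  (theory: s^2 / 2)")
print(f"reference: {v_reference:.3f}  (theory: 2 s^2)")
```

With the same number of slides, the direct design is four times as precise for this single comparison.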

  16. More samples (A, B and C on 6 arrays) • Loop: $\widehat{\log(A/B)} = \tfrac{2}{3}\log(A/B) + \tfrac{1}{3}\{\log(A/C) - \log(B/C)\}$. Assuming that all variances are equal, $\mathrm{Var}[\widehat{\log(A/B)}] = \tfrac{4}{9}(s^2/2) + \tfrac{1}{9}(s^2) = \tfrac{1}{3}s^2$ • Reference: $\mathrm{Var}[\log(A/B)] = \mathrm{Var}[\log(A/C)] = \mathrm{Var}[\log(B/C)] = 0.5s^2 + 0.5s^2 = s^2$
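
The same numbers follow from ordinary least squares on the design, which also works for designs that are harder to analyse by hand. The sketch below is an illustration under stated assumptions: independent slides with equal variance s^2, no separate dye term, and the first sample in the list used as baseline. The function name and the array lists are hypothetical encodings of the two 6-array designs.

```python
from fractions import Fraction


def contrast_variance(arrays, samples, num, den):
    """Variance of the least-squares estimate of log(num/den), in units of s^2.

    arrays: list of (cy5_sample, cy3_sample) pairs, one per slide.
    """
    params = samples[1:]                      # first sample is the baseline

    def row(plus, minus):
        return [Fraction((p == plus) - (p == minus)) for p in params]

    X = [row(a, b) for a, b in arrays]
    k = len(params)
    # Normal-equations matrix X^T X, augmented with the identity
    M = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [Fraction(int(i == j)) for j in range(k)] for i in range(k)]
    # Gauss-Jordan elimination to invert X^T X exactly
    for col in range(k):
        pivot = next(r for r in range(col, k) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(k):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    inv = [r[k:] for r in M]
    c = row(num, den)
    return sum(c[i] * inv[i][j] * c[j] for i in range(k) for j in range(k))


loop = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"), ("C", "A"), ("A", "C")]
ref = [("A", "R"), ("R", "A"), ("B", "R"), ("R", "B"), ("C", "R"), ("R", "C")]
print("loop:     ", contrast_variance(loop, ["A", "B", "C"], "A", "B"))
print("reference:", contrast_variance(ref, ["R", "A", "B", "C"], "A", "B"))
```

Exact fractions are used so that the result can be compared directly with the hand calculation.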

  17. Common reference vs direct hybridizations Theoretical Considerations • A design is optimal when it minimizes the variance of the effect of interest • Look for designs leading to small variance of log(A/B) Practical considerations • Common reference may be desired when experiment is extended in the future or when a lot of different conditions have to be compared • Choose a biologically relevant common reference (say: your control sample). In that case, your ratios are of interest and better interpretable

  18. Time-course designs Take 4 time points T1 T2 T3 T4 The best choice of design depends on the comparisons of interest and on the number of slides available

  19. Time-course designs Using 3 slides: T1 T2 T3 T4 which is the best to estimate changes relative to the initial time point: T2 / T1, T3 / T1, T4 / T1

  20. Time-course designs • Using 3 slides: T1 T2 T3 T4 which is the best to estimate relative changes between successive time points: T2 / T1, T3 / T2, T4 / T3

  21. Time course designs • Using 4 slides: T1 T2 T3 T4 R which is the reference design; All comparisons have equal precision

  22. Time course design • Using 4 slides: T1 T2 T3 T4 which is the loop design, balanced wrt dye Distant comparisons have lower precision
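
The loss of precision for distant comparisons can be quantified with a short sketch. It assumes a single closed loop with one slide per link (T1-T2, T2-T3, T3-T4, T4-T1), independent slides with equal variance s^2, and no separately modelled dye effect; the function name is hypothetical. Any two time points are connected by two independent routes round the loop, which are combined with inverse-variance weights.

```python
from fractions import Fraction


def loop_variance(n_points, i, j):
    """Variance (in units of s^2) of log(Ti/Tj) in a loop of n_points slides."""
    one_way = abs(i - j) % n_points      # slides along one route
    other_way = n_points - one_way       # slides along the opposite route
    # Each route has variance (number of slides) * s^2; combining two
    # independent estimates gives 1 / (1/one_way + 1/other_way).
    return Fraction(one_way * other_way, one_way + other_way)


for i, j in [(1, 2), (1, 3), (1, 4)]:
    print(f"T{i} vs T{j}: {loop_variance(4, i, j)} s^2")
```

In this loop T1 and T4 share a slide, so only the T1 vs T3 and T2 vs T4 comparisons are "distant".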

  23. Time course designs • Using 4 slides: T1 T2 T3 T4 also uses exactly 2 hybridizations per treatment, balanced wrt dye. Most precise estimates: T1/T2, T1/T3, T2/T4, T3/T4

  24. Factorial designs • Designs for studies which involve factors as explanatory variables • Age group • gender • Cell line • Tumor types

  25. Factorial designs Glonek & Solomon (2004) • Admissible design: using the same number of arrays, there are no other designs yielding smaller variances of all parameters Glonek et al. Biostatistics 5, 89-111 (2004)

  26. Factorial design; example • Time • 0h • 24h • Cell lines • I (non-leukaemic) • II (leukaemic) • Find genes diff. expressed at 24 but not at 0: interaction between time and cell line

  27. Factorial design; possible samples • All combinations of factor levels. In this case, 4 are possible: (I, 0h), (I, 24h), (II, 0h), (II, 24h)

  28. Factorial design: analysis model • (log-)linear model is used • experimental conditions correspond to parameter combinations as in:
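
The table of parameter combinations from the slide is not reproduced in this transcript. A common parameterization for a 2x2 design of this kind is sketched below as an assumption; the symbols $\mu$, $\alpha$, $\beta$ and $(\alpha\beta)$ are illustrative and may differ from the notation on the original slide. Each line gives the expected log intensity for one sample.

```latex
\begin{aligned}
\text{I, 0h:}   &\quad \mu \\
\text{I, 24h:}  &\quad \mu + \alpha \\
\text{II, 0h:}  &\quad \mu + \beta \\
\text{II, 24h:} &\quad \mu + \alpha + \beta + (\alpha\beta)
\end{aligned}
```

Under this assumed parameterization, $\alpha$ is the time effect in cell line I, $\beta$ is the cell line difference at 0h, and the interaction $(\alpha\beta)$ is the difference between the 24h vs 0h change in cell line II and the same change in cell line I.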

  29. Factorial design; possible arrays (figure: the four samples I,0, I,24, II,0 and II,24 drawn as nodes, with the six possible pairwise hybridizations numbered (1) to (6))

  30. Optimal admissible design • Designs that are not worse than others, and for which the variance of the parameter of interest is (one of the) smallest • In the example: wish to find admissible designs for which the interaction term has one of the smallest variances

  31. Glonek et al. Biostatistics 5, 89-111 (2004)

  32. Optimal admissible design Glonek et al. Biostatistics 5, 89-111 (2004)

  33. Factorial designs: conclusions • Design with all pairwise comparisons is not the best in this case • Best design can only be found with respect to a model • if model does not fit the data well, design choice may not be the best • make sure model chosen is adequate

  34. How to efficiently compare many different conditions? • Common reference: not efficient • Loop and mixed designs: not all comparisons have equal precision GA Churchill, Nat Genet. 2002 Dec;32 Suppl:490-5

  35. Possible solution • Randomized design • Intensity-based rather than ratio-based calculations • Requires: • Hybridization of the two samples is independent; no competition for binding sites • Absence of large spot and array effects • To be tested for each platform

  36. Our favourite platform • Spotted collection of 65-mer oligonucleotides (Sigma-Compugen collection) • 22K

  37. Design used to demonstrate independent hyb ’t Hoen et al. Nucleic Acids Res. 32:e41 (2004)

  38. Distribution of signal intensities is similar ’t Hoen et al. Nucleic Acids Res. 32:e41 (2004)

  39. Correlation of intensities is high (figure legend: R > 0.95; 0.90 < R < 0.95; R < 0.90) ’t Hoen et al. Nucleic Acids Res. 32:e41 (2004)

  40. Effect of addition of unlabelled target (figure panels: two targets on microarray; single target on microarray) ’t Hoen et al. Nucleic Acids Res. 32:e41 (2004)

  41. Correlation of ratios calculated from different hyb designs ’t Hoen et al. Nucleic Acids Res. 32:e41 (2004)

  42. Intensity-based analysis • Hybridizations of two targets on the array are independent • No saturation and no competition • Intensity readings show high inter-array correlation • Comparisons on the same array have highest precision and all other comparisons have equal precision ’t Hoen et al. Nucleic Acids Res. 32:e41 (2004)

  43. Example of randomized design • Mouse models for muscular dystrophy Turk et al. FASEB J 20, 127-129 (2006)

  44. Our design • Randomly assign samples to the arrays, avoiding co-hybridization of samples from the same group • 2 biological replicates • 4 technical replicates (dye-swap + replicate spotting) Turk et al. FASEB J 20, 127-129 (2006)
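
A random assignment with this constraint can be sketched with simple rejection sampling: shuffle the samples, pair them up, and reshuffle until no array holds two samples from the same group. This is an illustration and not the procedure used in the cited study; the function name, group labels and sample ids are hypothetical.

```python
import random


def random_pairing(sample_groups, seed=None, max_tries=1000):
    """Randomly pair samples on two-color arrays, never within a group.

    sample_groups: dict mapping sample id -> group label.
    Returns a list of (cy3_sample, cy5_sample) tuples, one per array.
    """
    rng = random.Random(seed)
    samples = list(sample_groups)
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    for _ in range(max_tries):
        rng.shuffle(samples)
        pairs = list(zip(samples[0::2], samples[1::2]))
        if all(sample_groups[a] != sample_groups[b] for a, b in pairs):
            return pairs
    raise RuntimeError("no valid pairing found; check group sizes")


# Hypothetical experiment: 4 groups with 2 biological replicates each
sample_groups = {f"{g}_{r}": g
                 for g in ("wt", "model1", "model2", "model3")
                 for r in (1, 2)}
for cy3, cy5 in random_pairing(sample_groups, seed=42):
    print(f"array: Cy3 = {cy3:9s} Cy5 = {cy5}")
```

Because the order within each pair is also random, the dye assignment is randomized at the same time; a dye-swapped technical replicate then reverses each pair.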

  45. Intensity-based analysis can go wrong Vinciotti et al. Bioinformatics 21:492-501 (2005)

  46. Intensity-based analysis can go wrong Vinciotti et al. Bioinformatics 21:492-501 (2005)

  47. Some guidelines • First determine the main question, pointing out the effect of interest • log[A/B] • Then choose analysis model, so that effect variance can be computed • VAR { log[A/B]} • Practical constraints: amount of RNA available, number of hybridizations, number of slides • A good design measures the effect of interest as accurately as possible • small VAR { log[A/B] }

  48. Some useful links • http://dial.liacs.nl/Courses/CMSB%20Courses.html • http://www.brc.dcs.gla.ac.uk/~rb106x/microarray_tips.htm • http://exgen.ma.umist.ac.uk/course/notes/WitDesignLecture.pdf • http://discover.nci.nih.gov/microarrayAnalysis/Experimental.Design.jsp

  49. Acknowledgements • Human and Clinical Genetics, LUMC: Judith Boer, Renée de Menezes, Rolf Turk, Ellen Sterrenburg, Johan den Dunnen, Gertjan van Ommen • Microarray facility: Leiden Genome Technology Center

  50. Case study • Two genetically-modified zebrafish strains and one wild-type • Defects mainly in muscle development • Apparent at 12-48 hours of development; early death • Question: which biological pathways are affected and responsible for defective myogenesis?
