
Statistical Analysis and Design of Experiments for Large Data Sets

Statistical Analysis and Design of Experiments for Large Data Sets. Steven Gilmour School of Mathematical Sciences Centre for Statistics. Introduction. I will discuss microarrays, but there are many other possible biological applications


Presentation Transcript


  1. Statistical Analysis and Design of Experiments for Large Data Sets • Steven Gilmour • School of Mathematical Sciences • Centre for Statistics

  2. Introduction • I will discuss microarrays, but there are many other possible biological applications • Microarray experiments provide a measure of gene activity • Used to compare expression levels of “treatment” groups • Single-channel (e.g. Affymetrix) arrays, or two-colour platforms

  3. False Discovery Rate • Hypothesis test procedures for a single response variable are unsuitable for screening thousands of genes • Testing each gene at the 5% level of significance would wrongly reject a very large number of null hypotheses (declaring inactive genes to be active) • Traditional corrections, such as controlling the familywise error rate, are too conservative • Controlling the false discovery rate (FDR) ensures that a suitably small proportion of the genes declared active are truly inactive.
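The slide does not say which FDR procedure is used. As a minimal sketch, assuming the Benjamini-Hochberg step-up rule (a common choice, not confirmed by the slides), the following compares it with a Bonferroni familywise correction. The p-values are hypothetical and the function name `benjamini_hochberg` is illustrative.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                       # indices that sort the p-values
    thresholds = q * np.arange(1, m + 1) / m    # i * q / m for rank i
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()          # largest rank meeting its threshold
        rejected[order[:k + 1]] = True          # reject everything up to that rank
    return rejected

# Hypothetical p-values for ten genes (illustrative only, not from the talk).
p_values = np.array([0.0001, 0.0004, 0.0019, 0.0095, 0.0201,
                     0.0278, 0.0298, 0.0344, 0.0459, 0.3240])

fdr_rejected = benjamini_hochberg(p_values, q=0.05)
bonferroni_rejected = p_values <= 0.05 / p_values.size   # familywise control

print(fdr_rejected.sum(), bonferroni_rejected.sum())
```

On these made-up values the FDR rule declares eight genes active while Bonferroni declares three, which illustrates the sense in which familywise control is conservative.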

  4. Sample size calculations • Many methods have been suggested for determining an appropriate number of slides • These assume fixed, unstructured treatments • Microarrays have recently been used in genetical genomics studies to understand the genetic mechanisms governing variation in complex traits • Treatments now have structure, e.g. family structure, multilocus genotypic groups • We have worked out better sample size methods for such treatments

  5. Design for Two-Colour Arrays • Slides are blocks of size two, so incomplete blocks are usually needed • Two colours imply a row-column structure • Designs suggested by several authors • Examples for 4 and 9 treatments
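To make the block structure concrete, here is a rough sketch of how a two-colour design can be evaluated when each slide is treated as a block of size two. It assumes additive slide effects and independent errors with common variance, and it ignores the dye (row-column) effect for simplicity. The loop design for 4 treatments and the function name `pairwise_variances` are illustrative assumptions, not necessarily the designs shown on the slide.

```python
import numpy as np

def pairwise_variances(n_treatments, slides):
    """Variance, in units of sigma^2, of every estimated pairwise treatment difference.

    slides: list of (i, j) pairs, the two treatments hybridised on one slide.
    Assumes additive slide (block) effects; dye effects are ignored.
    """
    laplacian = np.zeros((n_treatments, n_treatments))
    for i, j in slides:
        laplacian[i, i] += 1
        laplacian[j, j] += 1
        laplacian[i, j] -= 1
        laplacian[j, i] -= 1
    info = laplacian / 2.0              # treatment information matrix for blocks of size two
    info_pinv = np.linalg.pinv(info)    # generalised inverse (contrasts only are estimable)
    diag = np.diag(info_pinv)
    # Var(tau_a - tau_b) = g_aa + g_bb - 2 g_ab for generalised inverse G
    return diag[:, None] + diag[None, :] - 2.0 * info_pinv

# Loop design for 4 treatments on 4 slides: 0-1, 1-2, 2-3, 3-0.
loop = [(0, 1), (1, 2), (2, 3), (3, 0)]
var = pairwise_variances(4, loop)
print(round(var[0, 1], 3), round(var[0, 2], 3))
```

Treatments that share a slide are compared with variance 1.5σ², and treatments on opposite sides of the loop with variance 2σ², so the choice of which pairs share slides determines which comparisons are estimated best.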

  6. Structured Treatment Effects • Three possible genotypes, e.g. F2 populations and codominant markers • Modelled by additive-dominance model • Single locus, genotypes bb, Bb, BB • Plot variance vs. proportion of each homozygous group (r)
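The variance-versus-r relationship can be sketched for a simplified case. The code below assumes the usual parametrisation with genotypic values -a, d, +a for bb, Bb, BB, an unblocked design with n independent observations, and a proportion r in each homozygous group (so 1 - 2r in Bb). It ignores the slide blocking that the talk's designs account for, so it only shows the qualitative trade-off.

```python
import numpy as np

def effect_variances(r, n=1.0):
    """Variances (in units of sigma^2) of the additive and dominance estimators.

    a_hat = (mean_BB - mean_bb) / 2
    d_hat = mean_Bb - (mean_BB + mean_bb) / 2
    with n*r observations on each homozygote and n*(1 - 2r) on the heterozygote.
    """
    r = np.asarray(r, dtype=float)
    var_additive = 1.0 / (2.0 * n * r)
    var_dominance = (1.0 / (1.0 - 2.0 * r) + 1.0 / (2.0 * r)) / n
    return var_additive, var_dominance

r_grid = np.linspace(0.05, 0.45, 81)          # proportions strictly inside (0, 0.5)
var_a, var_d = effect_variances(r_grid)

best_r_dominance = r_grid[np.argmin(var_d)]   # allocation minimising Var(d_hat)
best_r_additive = r_grid[np.argmin(var_a)]    # allocation minimising Var(a_hat)
print(best_r_dominance, best_r_additive)
```

Under these assumptions the additive effect is estimated best by putting as many units as possible in the two homozygous groups, while the dominance effect is estimated best at r = 1/4, which is why the optimal design depends on which effect is of interest.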

  7. Optimal treatment design and blocking for 10 slides: (a) additive effect; (b) dominance effect; (c) both • [Figure: design diagram with nodes labelled bb and BB]

  8. [Figure: design diagrams with nodes labelled bb, Bb and BB]

  9. For multiple loci, factorial structures are used • Two-locus experiment in 10 slides • Optimal treatment design and blocking follow

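  10. [Figure: design diagram with nodes labelled AABB, aabb, AABB, aabb, AAbb, aaBB, AABb, aaBb, AAbb, aaBB, AaBB, Aabb, AaBb]

  11. [Figure: design diagram with nodes labelled AABB, aabb, AABb, aaBb, AAbb, aaBB, AaBB, Aabb, AaBb]

  12. Including epistatic effects • Same design problem • [Figure: design diagram with nodes labelled AABB, aabb, AABb, aaBb, AAbb, aaBB, AaBB, Aabb, AaBb]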
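As an illustration of the two-locus factorial structure, the sketch below enumerates the nine genotype combinations and builds a model matrix with additive and dominance columns for each locus plus their four products as epistatic terms. The coding (-1, 0, +1 for additive; a heterozygote indicator for dominance) is an assumed convention, and all names are hypothetical.

```python
from itertools import product
import numpy as np

# Assumed coding per locus, indexed by the number of upper-case alleles.
ADDITIVE = {0: -1.0, 1: 0.0, 2: 1.0}
DOMINANCE = {0: 0.0, 1: 1.0, 2: 0.0}

def locus_label(n_upper, letter):
    """Genotype label at one locus, e.g. 2 -> 'AA', 1 -> 'Aa', 0 -> 'aa'."""
    return letter.upper() * n_upper + letter.lower() * (2 - n_upper)

# 3 x 3 factorial: every combination of genotypes at loci A and B.
genotypes = list(product([2, 1, 0], repeat=2))
labels = [locus_label(a, "A") + locus_label(b, "B") for a, b in genotypes]

rows = []
for a, b in genotypes:
    aA, dA = ADDITIVE[a], DOMINANCE[a]
    aB, dB = ADDITIVE[b], DOMINANCE[b]
    rows.append([1.0, aA, dA, aB, dB,               # mean and main effects
                 aA * aB, aA * dB, dA * aB, dA * dB])  # epistatic interactions
model_matrix = np.array(rows)

print(labels)
print(np.linalg.matrix_rank(model_matrix))
```

With all four epistatic columns the model is saturated (nine parameters for nine genotype groups). Dropping those columns gives the main-effects model, which is the sense in which the design problem stays the same while the model changes.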

  13. Random Treatment Effects • Aim to get good estimates of genetic variances and heritabilities • Designs to find BLUPs of breeding values, given a known pedigree • Two simple pedigree structures:

  14. Optimal designs in 9 slides:
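The pedigree structures and designs from these slides are not reproduced in the transcript. As a rough illustration of the BLUP step mentioned under Random Treatment Effects, the sketch below builds an additive relationship matrix by the tabular method and solves Henderson's mixed model equations. The pedigree (two unrelated founders and two full-sib offspring), the data, and the variance ratio are all hypothetical.

```python
import numpy as np

def relationship_matrix(pedigree):
    """Additive relationship matrix A via the tabular method.

    pedigree: list of (sire, dam) index pairs, None for an unknown parent.
    Parents must appear before their offspring.
    """
    n = len(pedigree)
    A = np.zeros((n, n))
    for i, (sire, dam) in enumerate(pedigree):
        for j in range(i):
            a_sire = A[j, sire] if sire is not None else 0.0
            a_dam = A[j, dam] if dam is not None else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_sire + a_dam)
        both_known = sire is not None and dam is not None
        A[i, i] = 1.0 + (0.5 * A[sire, dam] if both_known else 0.0)
    return A

def blup(y, X, Z, A, lam):
    """Solve the mixed model equations; lam = sigma_e^2 / sigma_a^2 (assumed known)."""
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * np.linalg.inv(A)]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    solution = np.linalg.solve(lhs, rhs)
    n_fixed = X.shape[1]
    return solution[:n_fixed], solution[n_fixed:]

# Hypothetical pedigree: individuals 0 and 1 are founders, 2 and 3 are full sibs.
pedigree = [(None, None), (None, None), (0, 1), (0, 1)]
A = relationship_matrix(pedigree)

y = np.array([10.2, 9.1, 11.0, 10.4])   # made-up expression values, one per individual
X = np.ones((4, 1))                      # overall mean as the only fixed effect
Z = np.eye(4)                            # each record belongs to one individual
lam = 1.0                                # assumed variance ratio (heritability 0.5)

b_hat, u_hat = blup(y, X, Z, A, lam)
print(A)
print(b_hat, u_hat)
```

In practice the variance components would be estimated rather than assumed, and the design question in the talk is which individuals to place together on slides so that these predictions are as precise as possible.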

  15. Discussion • Consideration of different experimental objectives should lead to different types of design being used • Often a search algorithm is needed to find an optimal design – we have written an R function • There are still many open questions
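The authors' R function is not shown in the transcript. As a generic illustration of what such a search can look like, here is a minimal random exchange search in Python for a two-colour design that minimises the average variance of pairwise treatment differences. It assumes fixed, unstructured treatments, slide effects only (no dye effect), and hypothetical function names. It is a sketch of the idea, not a reconstruction of their algorithm.

```python
import itertools
import random
import numpy as np

def average_pairwise_variance(n_treatments, slides):
    """Average variance (units of sigma^2) of pairwise differences; inf if disconnected."""
    laplacian = np.zeros((n_treatments, n_treatments))
    for i, j in slides:
        laplacian[i, i] += 1
        laplacian[j, j] += 1
        laplacian[i, j] -= 1
        laplacian[j, i] -= 1
    eigenvalues = np.linalg.eigvalsh(laplacian / 2.0)   # ascending; first is ~0
    if eigenvalues[1] < 1e-9:                           # not all contrasts estimable
        return float("inf")
    return 2.0 * np.sum(1.0 / eigenvalues[1:]) / (n_treatments - 1)

def search_design(n_treatments, n_slides, n_iter=2000, seed=0):
    """Random exchange search: swap one slide at a time, keep non-worsening moves."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n_treatments), 2))
    design = [rng.choice(pairs) for _ in range(n_slides)]
    best = average_pairwise_variance(n_treatments, design)
    for _ in range(n_iter):
        candidate = list(design)
        candidate[rng.randrange(n_slides)] = rng.choice(pairs)
        value = average_pairwise_variance(n_treatments, candidate)
        if value <= best:
            design, best = candidate, value
    return design, best

design, best = search_design(n_treatments=4, n_slides=4)
print(sorted(design), best)
```

For a different experimental objective, such as estimating additive or dominance effects, only the criterion function would need to change, which mirrors the point that different objectives lead to different designs.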
