
SNP chips

SNP chips. Advanced Microarray Analysis Mark Reimers, Dept Biostatistics, VCU, Fall 2008. Affy SNP chips. SNP Chip Probe Design. 10 25-mers overlapping the SNP Alleles A & B Sense and Anti-sense or PM and MM (old). RMA for SNP chips. Initial Affy software wasn’t very accurate


Presentation Transcript


  1. SNP chips Advanced Microarray Analysis Mark Reimers, Dept Biostatistics, VCU, Fall 2008

  2. Affy SNP chips

  3. SNP Chip Probe Design • 10 25-mers overlapping the SNP • Alleles A & B • Sense and Anti-sense • or PM and MM (old)

  4. RMA for SNP chips • Initial Affy software wasn’t very accurate • Rabbee & Speed (2006) proposed RLMM, an RMA-like method using: • Quantile normalization • Two variables ( A & B signals) • Discriminant analysis • Much better than Affy software • Variant (BRLMM) adopted by Affy
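
The quantile normalization step named above can be sketched as follows. This is a minimal illustration, not the RLMM code: the function name `quantile_normalize`, the toy intensities, and the tie handling (ties broken by position) are all assumptions made for the example.

```python
def quantile_normalize(chips):
    """Give every chip the same intensity distribution.

    chips: list of equal-length lists, one list of probe intensities per chip.
    Hypothetical helper for illustration only.
    """
    n = len(chips[0])
    # Reference distribution: mean of the k-th smallest value across chips.
    sorted_chips = [sorted(chip) for chip in chips]
    reference = [sum(s[k] for s in sorted_chips) / len(chips) for k in range(n)]

    normalized = []
    for chip in chips:
        # Rank the probes within this chip (ties broken by position).
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            # Replace each intensity by the reference value of the same rank.
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data: two chips, three probes each (made-up numbers).
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(chips)
```

After normalization every chip contains exactly the same set of values; only the assignment of values to probes differs between chips.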

  5. Discriminating SNPs • Estimate a covariance common to all clusters on ‘training’ set (HapMap) data • Separate clusters by Mahalanobis metric • Use pre-defined clusters & metric to tell apart alleles on new data
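
A rough sketch of this discriminant step for a single SNP, assuming two-dimensional (A signal, B signal) points and three labelled training clusters. The helper names and all numbers are invented for illustration; real training data would come from HapMap calls.

```python
def mean2(points):
    """Centre of a list of (a, b) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def pooled_covariance(clusters):
    """Common 2x2 covariance: within-cluster scatter pooled over genotypes.

    Returns (var_a, cov_ab, var_b).
    """
    sxx = sxy = syy = 0.0
    dof = 0
    for pts in clusters.values():
        mx, my = mean2(pts)
        for x, y in pts:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        dof += len(pts) - 1
    return (sxx / dof, sxy / dof, syy / dof)


def mahalanobis_sq(p, centre, cov):
    """Squared Mahalanobis distance d^T S^-1 d, 2x2 inverse written out."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - centre[0], p[1] - centre[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def call_genotype(p, centres, cov):
    """Assign a new point to the nearest pre-defined cluster."""
    return min(centres, key=lambda g: mahalanobis_sq(p, centres[g], cov))


# Made-up training clusters of (A, B) log-intensities for one SNP.
training = {
    "AA": [(10.0, 6.0), (10.4, 6.3), (9.7, 5.8), (10.2, 6.1)],
    "AB": [(8.9, 9.0), (9.2, 9.3), (8.7, 8.8), (9.1, 9.1)],
    "BB": [(6.1, 10.1), (6.3, 10.4), (5.8, 9.8), (6.0, 10.2)],
}
centres = {g: mean2(pts) for g, pts in training.items()}
cov = pooled_covariance(training)

new_call = call_genotype((9.0, 9.1), centres, cov)
```

Pooling one covariance across the three clusters keeps the estimate stable when a genotype is rare in the training set, which is presumably why a common covariance is used.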

  6. Success Rate • 90% (MPAM) to 98% (CRLMM) called at comparable accuracy on HapMap data • Cross-validation estimate • BUT • New chips don’t have same distributions as ‘training’ set

  7. CRLMM - a heroic solution • RLMM couldn’t be extended across labs • Still problems with several hundred SNPs • CRLMM addresses both these issues by careful normalization • Achieves accuracy of 99.85% on hets; 99.95% on homozygotes • Most complicated statistical calculation in BioC!

  8. CRLMM Overview • Normalize intensity on each chip separately by • Summarize qA+, qB+, qA-, qB- by median polish: M+ = qA+ - qB+ ; M- = qA- - qB- • Model log ratio bias on each chip by • Estimate log ratio bias using E-M • Where Zi indexes which SNP state is likely • k = 1,2,3 for AA, AB, BB

  9. Normalization – Step 1 • Regress (PM) intensity on sequence predictors and fragment length • [Figure: hb(t) for all four bases on two chips] • [Figure: g(L) and 95% CI on one chip]

  10. Normalization – Step 1 • Too many hb(t)’s • Impose constraint: • hb(t) is a cubic spline with 5 df on [1,25] • Forces neighboring values of h to be close • Allows variation in smoothness (unlike loess) • Subtract fitted values from signal • BUT: bias still present

  11. Step 2 – Summarization • Median Polish • Tukey’s exploratory method for arrays of numbers • Iterative method • Subtract medians of each row and each column (and accumulate) until medians converge • Robust • Fast
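
Median polish as described above can be sketched in a few lines. This is a generic version of Tukey's procedure, not the Bioconductor implementation; the function name, the stopping tolerance, and the toy table are assumptions.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-6):
    """Fit table[i][j] ~ overall + row_eff[i] + col_eff[j] using medians.

    Returns (overall, row_eff, col_eff, residuals).
    """
    rows, cols = len(table), len(table[0])
    resid = [list(r) for r in table]
    row_eff = [0.0] * rows
    col_eff = [0.0] * cols
    for _ in range(max_iter):
        # Sweep out row medians and accumulate them.
        row_med = [median(r) for r in resid]
        for i in range(rows):
            row_eff[i] += row_med[i]
            resid[i] = [v - row_med[i] for v in resid[i]]
        # Sweep out column medians and accumulate them.
        col_med = [median(resid[i][j] for i in range(rows)) for j in range(cols)]
        for j in range(cols):
            col_eff[j] += col_med[j]
            for i in range(rows):
                resid[i][j] -= col_med[j]
        # Stop once the medians removed in this pass are negligible.
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    # Move the typical level of the effects into one overall term.
    overall = median(row_eff)
    row_eff = [r - overall for r in row_eff]
    shift = median(col_eff)
    overall += shift
    col_eff = [c - shift for c in col_eff]
    return overall, row_eff, col_eff, resid


# Additive toy table (10 + row effect + column effect) with one outlier.
table = [
    [60.0, 7.0, 8.0, 9.0, 10.0],   # 60.0 replaces the additive value 6.0
    [8.0, 9.0, 10.0, 11.0, 12.0],
    [11.0, 12.0, 13.0, 14.0, 15.0],
]
overall, row_eff, col_eff, resid = median_polish(table)
```

In this toy table the outlier is left entirely in the residuals (`resid[0][0]` is 54.0) while the row and column effects match the underlying additive structure, which is the robustness the slide refers to.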

  12. Step 3 – Ratio Normalization • Fit bias function: • of form: • m reflects allele bases • But what is k? • Estimate by E-M • [Figure: m fL(L) for one chip]

  13. E-M Algorithm • Systematic way to ‘guess and improve’ • Start with putative assignments to classes • i.e. guess k based on overall separations • Estimate bias for each k: fi,k • Use residuals from fit to classify again • Repeat until convergence!
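
The ‘guess and improve’ loop can be illustrated with a deliberately simplified model. The sketch below assumes the bias for each class k is a single constant (the class mean of the log ratio M) with a shared spread, whereas CRLMM fits a bias function per chip and class; the function name and the data are invented.

```python
import math

GENOTYPES = ["AA", "AB", "BB"]  # k = 1, 2, 3


def em_three_classes(m_values, n_iter=50):
    """E-M for a 3-class Gaussian mixture on log ratios M = qA - qB.

    Simplification (assumption): one constant centre per class and a
    shared standard deviation, instead of a fitted bias function.
    """
    n = len(m_values)
    lo, hi = min(m_values), max(m_values)
    # Initial guess from the overall separation: AA high, AB middle, BB low.
    mu = [hi, (lo + hi) / 2, lo]
    sigma = (hi - lo) / 6 or 1.0
    weights = [1 / 3] * 3
    resp = []
    for _ in range(n_iter):
        # E-step: probability of each class given the current centres.
        resp = []
        for m in m_values:
            logs = [math.log(w) - 0.5 * ((m - c) / sigma) ** 2
                    for w, c in zip(weights, mu)]
            top = max(logs)  # subtract the max to avoid underflow
            dens = [math.exp(v - top) for v in logs]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate centres, spread, and class proportions.
        nk = [sum(r[k] for r in resp) for k in range(3)]
        mu = [sum(r[k] * m for r, m in zip(resp, m_values)) / nk[k]
              if nk[k] > 1e-9 else mu[k] for k in range(3)]
        var = sum(r[k] * (m - mu[k]) ** 2
                  for r, m in zip(resp, m_values) for k in range(3)) / n
        sigma = max(math.sqrt(var), 1e-6)
        weights = [max(x / n, 1e-9) for x in nk]
    # Final classification: most probable class for each value.
    calls = [GENOTYPES[max(range(3), key=lambda k: r[k])] for r in resp]
    return mu, sigma, calls


# Made-up log ratios for twelve samples at one SNP.
m_values = [2.1, 1.9, 2.0, 2.2, 0.1, -0.1, 0.0, 0.05, -2.0, -1.9, -2.1, -2.2]
centres, spread, calls = em_three_classes(m_values)
```

A fixed number of iterations is used here for brevity; a convergence check on the change in the centres would match the slide's "repeat until convergence" more closely.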

  14. Final Step: Calling • Aim: separation in two-dimensional log-ratio space • Accuracy > 99.85% on all HapMap calls
