
Meiotic recombination mapping with tiling microarrays: genotype and rate inference

Meiotic recombination mapping with tiling microarrays: genotype and rate inference. Richard Bourgon 22 November 2007 bourgon@ebi.ac.uk. Overview. Meiotic recombination Genotyping with tiling microarrays Previous single-probe approaches Multivariate approaches Post-processing


Presentation Transcript


  1. Meiotic recombination mapping with tiling microarrays: genotype and rate inference. Richard Bourgon, 22 November 2007, bourgon@ebi.ac.uk

  2. Overview • Meiotic recombination • Genotyping with tiling microarrays • Previous single-probe approaches • Multivariate approaches • Post-processing • SSC vs. SNPscanner • Recombination event inference • Tetrad level results • Local rate inference • Biology

  3. — Meiotic recombination —

  4. Meiotic recombination. Meiosis: • Two divisions, yielding 4 haploid daughter cells. (From Molecular Biology of the Cell, Fourth Edition.)

  5. Meiotic recombination. Meiosis: • Two divisions, yielding 4 haploid daughter cells. • Double-strand breaks (DSBs) initiate recombination. • Both crossover and non-crossover resolutions are possible. [Figure labels: DSB, strand resection (3’ ends), single-end invasion, D-loop, second end capture, dHJ, crossover; pathway: DSBR.]

  6. Meiotic recombination. Meiosis: • Two divisions, yielding 4 haploid daughter cells. • Double-strand breaks (DSBs) initiate recombination. • Both crossover and non-crossover resolutions are possible. [Figure labels: DSB, strand resection (3’ ends), single-end invasion, D-loop, invading strand unwound, second end capture, nicked HJ, dHJ, crossover, noncrossover; pathways: SDSA, DSBR.]

  7. Research questions • In S. cerevisiae, where do hotspots occur and what are the local recombination rates? Do crossovers and non-crossovers exhibit the same pattern? • Do hotspots (binary) or recombination rates (quantitative) correlate with features of DNA sequence or chromatin structure? • How large are conversion regions? Can we identify the various resolution patterns? • What is the relative frequency of the various DSB-repair pathways? • Does the observed pattern among recombination events concur with current models for “interference”? Do mutations impact interference? • Are hotspots mutagenic or conservative? Are there biases in gene conversion?

  8. — Genotyping with microarrays —

  9. “Single feature polymorphisms” • Hybridization efficiency depends on the number and position of mismatches. • Differential hybridization provides a means of detecting polymorphisms, even when only the reference genome sequence is known: SFPs. • Winzeler et al., Science 281(5380), 1998. • Brem et al., Science 296(5568), 2002. • Steinmetz et al., Nature 416(6878), 2002. • Borevitz et al., Genome Research 13(3), 2003. • Given parental behavior, genotype segregants via supervised classification.

  10. Single-probe methods • Polymorphism detection: Winzeler et al. (and others) use ANOVA, testing for equal mean intensity in the two parental classes; Borevitz et al. use a moderated t-test with the SAM adjustment; Brem et al. use a moderated t-test, then cluster all data (parental and segregant) and discard SFPs for which the clusters don’t separate the parental data. • Segregant genotyping: ANOVA and t-test methods use the estimated posterior probability of class membership, with a uniform prior on the classes; Brem et al. augment this with quantities estimated from the clustered data.
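
The posterior-probability genotyping step for a single probe can be sketched as follows. This is a minimal illustration, assuming Gaussian class densities with a shared standard deviation; the function names and the Gaussian form are assumptions made for the example, not details taken from the cited papers.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_class1(x, mean1, mean2, sd):
    """P(class 1 | x) with a uniform prior on the two classes.

    mean1 and mean2 stand for class means estimated from the parental
    arrays; sd is a pooled standard deviation (an assumption of this sketch).
    With equal priors, the prior terms cancel out of Bayes' rule.
    """
    f1 = gaussian_pdf(x, mean1, sd)
    f2 = gaussian_pdf(x, mean2, sd)
    return f1 / (f1 + f2)
```

A segregant would be assigned to class 1 when this posterior exceeds 1/2. For intensities very far from both means the two densities can underflow to zero, so a log-space version would be more robust in practice.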

  11. Saccharomyces cerevisiae microarray data • Two strains: S96, isogenic to the common laboratory strain S288c; YJM789, isogenic to the clinical isolate YJM145. In alignable regions, ≈ 56,000 SNPs and thousands of insertions and deletions. • Tiling microarrays: ≈ 6.5 M 5 µm features, tiling non-repetitive S96 every 4 bases; ≈ 4% of probes are specific to YJM789 sequence. • Data: 25 parental genomic DNA hybridizations; 208 wildtype offspring hybridizations; 20 msh4 mutant offspring hybridizations; 20 mms4 mutant offspring hybridizations.

  12. Probe sets: SNP interrogation • Probe set: group of probes which each map exactly to a unique location, and which interrogate a common polymorphism. • Marker: one or more polymorphisms interrogated by the same probe set.
      S96:    CCTCCTGACCGGGATTGAAGTGATAAACATGTCTAGCGTTA
      YJM789: CCTCCTGACCGGGATTGAACTGATAAACATGTCTAGCGTTA
      1: GGAGGACTGGCCCTAACTTCACTAT
      2: GACTGGCCCTAACTTCACTATTTGT
      3: GACTGGCCCTAACTTGACTATTTGT
      4: GGCCCTAACTTCACTATTTGTACAG
      5: CTAACTTCACTATTTGTACAGATCG
      6: CTTCACTATTTGTACAGATCGCAAT
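
The probe set on this slide can be reproduced with a short sketch. It assumes, from the sequences shown, that probes are 25-mers written as the base-wise complement of the target in the displayed orientation and tiled every 4 bases. The function name `tile_probes` and the tiling offset of 0 are assumptions made for this example.

```python
# Base-wise complement table (no reversal, matching how the slide prints probes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tile_probes(sequence, snp_index, probe_len=25, step=4):
    """Return complement probes, tiled every `step` bases, that cover snp_index."""
    probes = []
    for start in range(0, len(sequence) - probe_len + 1, step):
        if start <= snp_index < start + probe_len:
            window = sequence[start:start + probe_len]
            probes.append(window.translate(COMPLEMENT))
    return probes


S96 = "CCTCCTGACCGGGATTGAAGTGATAAACATGTCTAGCGTTA"
YJM789 = "CCTCCTGACCGGGATTGAACTGATAAACATGTCTAGCGTTA"

# Locate the single mismatch between the two strains (0-based index).
snp_index = next(i for i, (a, b) in enumerate(zip(S96, YJM789)) if a != b)

s96_probes = tile_probes(S96, snp_index)      # S96-specific probes
yjm_probes = tile_probes(YJM789, snp_index)   # YJM789-specific candidates
```

Here `s96_probes` gives probes 1, 2, 4, 5 and 6 from the slide, and `yjm_probes[1]` gives probe 3, the one YJM789-specific probe in the set.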

  13. Marginal probe behavior

  14. A multi-probe method: SNPscanner. Gresham et al., Science 311(5769), 2006: • Model the decrease in a given probe’s intensity in the presence of a single SNP, as a function of position within the probe, probe response to reference sequence, probe GC content, and the nucleotides surrounding the SNP position. • Fit model parameters using two sequenced strains with known SNPs. • To genotype a segregant or new strain at a given base, assume probes in a probe set are independent and compute a likelihood ratio for the reference versus the variant genotype, with the variance assumed to be common for both genotypes.
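
A likelihood ratio of this kind can be sketched as below. This is an illustration under stated assumptions, not the SNPscanner implementation: probes are treated as independent Gaussians with a per-probe standard deviation shared by both genotypes, and `predicted_drop` stands in for the output of the fitted intensity-decrease model, which is not reproduced here.

```python
import numpy as np


def log_likelihood_ratio(x, ref_mean, predicted_drop, sd):
    """Log likelihood ratio (variant vs. reference) for one probe set.

    x              : observed log intensities, one per probe
    ref_mean       : expected log intensity under the reference genotype
    predicted_drop : modelled intensity decrease caused by the SNP (hypothetical input)
    sd             : per-probe standard deviation, common to both genotypes
    """
    x = np.asarray(x, dtype=float)
    ref_mean = np.asarray(ref_mean, dtype=float)
    sd = np.asarray(sd, dtype=float)
    var_mean = ref_mean - np.asarray(predicted_drop, dtype=float)
    # Independence lets the log likelihood be a sum over probes; the
    # normalising constants cancel because sd is shared by both genotypes.
    ll_ref = -0.5 * np.sum(((x - ref_mean) / sd) ** 2)
    ll_var = -0.5 * np.sum(((x - var_mean) / sd) ** 2)
    return ll_var - ll_ref
```

Positive values favour the variant genotype and negative values the reference.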

  15. An alternative multi-probe method • Residual correlation remains after centering log intensities for each probe within inferred genotype class: a multivariate approach is justified.

  16. An alternative multi-probe method • Residual correlation remains after centering log intensities for each probe within inferred genotype class: a multivariate approach is justified. • Parental arrays are informative, but do not always provide an ideal model. Supervised classification of offspring (i) wastes information, and (ii) may be misleading.

  17. An alternative multi-probe method • Residual correlation remains after centering log intensities for each probe within inferred genotype class: a multivariate approach is justified. • Parental arrays are informative, but do not always provide an ideal model. Supervised classification of offspring (i) wastes information, and (ii) may be misleading. • Clear division into two distributions is necessary but not sufficient. Quantitative aspects of the inferred clusters are useful.
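
The residual-correlation check can be sketched as follows: centre each probe's log intensity within its inferred genotype class, then look at the correlation between probes in what remains. The function name and array layout (arrays in rows, probes in columns) are assumptions made for this example.

```python
import numpy as np


def within_class_correlation(X, genotype):
    """Probe-by-probe correlation after centring within genotype class.

    X        : (n_arrays, n_probes) log intensities for one probe set
    genotype : length n_arrays vector of inferred class labels
    """
    X = np.asarray(X, dtype=float)
    genotype = np.asarray(genotype)
    resid = X.copy()
    for g in np.unique(genotype):
        mask = genotype == g
        # Remove the class-specific mean of every probe.
        resid[mask] -= X[mask].mean(axis=0)
    return np.corrcoef(resid, rowvar=False)
```

Off-diagonal entries well away from zero indicate that the probes are not independent given genotype, which is the motivation for a multivariate model.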

  18. Semi-supervised, model-based clustering (SSC). Semi-supervised clustering via the EM algorithm: • Assume a two-component mixture, with π1 = π2 = 1/2. • (Xi, Yi) with latent class variable Y. Y is known for parental arrays. • Assume X|Y multivariate normal. • Begin with the E-step: initialize the unknown Y with any simple clustering scheme (k-means, hierarchical agglomeration, etc.). • Iterate: estimate parameters, then E(Yi|Xi), then parameters again, and so on. • Classify segregant i based on the final estimated E(Yi|Xi). For diagnostic purposes only: • Multivariate Gaussian fit to dimension-reduced parental data. • Unsupervised clustering of offspring data, by EM algorithm, with k ∈ {2, 3}.
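
The EM scheme outlined on this slide can be sketched in a few lines of NumPy. This is a minimal illustration rather than the author's code: the function names, the ridge term added to each covariance, the fixed iteration count, and the nearest-parental-mean initialisation (standing in for k-means or hierarchical clustering) are all assumptions.

```python
import numpy as np


def log_gauss(X, mean, cov):
    """Row-wise log density of a multivariate normal."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def ssc_em(X, labels, n_iter=50, ridge=1e-6):
    """Semi-supervised two-component Gaussian mixture with fixed weights 1/2.

    labels: 0 or 1 for parental arrays (known class), -1 for segregants.
    Returns r, where r[i] estimates E(Y_i | X_i), and the class parameters.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    known = labels >= 0
    r = np.where(known, labels, 0.5).astype(float)
    # Initialise unknown labels by the nearest parental mean (an assumption).
    m0 = X[labels == 0].mean(axis=0)
    m1 = X[labels == 1].mean(axis=0)
    d0 = ((X - m0) ** 2).sum(axis=1)
    d1 = ((X - m1) ** 2).sum(axis=1)
    r[~known] = (d1[~known] < d0[~known]).astype(float)
    params = []
    for _ in range(n_iter):
        # M-step: weighted mean and class-specific covariance for each class.
        params = []
        for w in (1.0 - r, r):
            mean = (w[:, None] * X).sum(axis=0) / w.sum()
            diff = X - mean
            cov = (w[:, None] * diff).T @ diff / w.sum()
            params.append((mean, cov + ridge * np.eye(X.shape[1])))
        # E-step: the equal mixing proportions cancel from the posterior.
        delta = log_gauss(X, *params[0]) - log_gauss(X, *params[1])
        post = 1.0 / (1.0 + np.exp(np.clip(delta, -700.0, 700.0)))
        r[~known] = post[~known]  # parental labels stay fixed
    return r, params
```

A segregant would be called class 1 when its entry of `r` exceeds 1/2. Because the parental responsibilities are never updated, the labeled arrays anchor each component to a known genotype while the unlabeled arrays still contribute to the mean and covariance estimates.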

  19. Examples of SSC probe set results

  20. — Reducing genotyping error rate —

  21. Chromosome-level SSC results

  22. Filtering • Array level: excess “genotype switching”; large RMS residual (Mahalanobis) to assigned class. • Probe set level: high estimated misclassification rate; aberrant cluster behavior; very unusual genotype ratio. • Call level: intermediate posterior probability of class membership; large residual to assigned class.
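
The two call-level criteria can be sketched as a single filter. The thresholds `post_margin` and `max_maha`, like the function name, are hypothetical values chosen for illustration; the slide does not state the cut-offs used.

```python
import numpy as np


def filter_calls(X, posterior, means, covs, post_margin=0.05, max_maha=3.0):
    """Flag genotype calls to keep for one probe set.

    X         : (n_arrays, n_probes) log intensities
    posterior : estimated P(class 1 | X_i) for each array
    means     : sequence of two class mean vectors
    covs      : sequence of two class covariance matrices
    Returns (call, keep): the 0/1 call and a boolean mask of retained calls.
    """
    X = np.asarray(X, dtype=float)
    posterior = np.asarray(posterior, dtype=float)
    call = (posterior > 0.5).astype(int)
    maha = np.empty(len(X))
    for k in (0, 1):
        idx = call == k
        diff = X[idx] - np.asarray(means[k], dtype=float)
        prec = np.linalg.inv(np.asarray(covs[k], dtype=float))
        # Mahalanobis distance of each array to its assigned class.
        maha[idx] = np.sqrt(np.einsum("ij,jk,ik->i", diff, prec, diff))
    # Drop intermediate posteriors and large residuals to the assigned class.
    confident = (posterior <= post_margin) | (posterior >= 1.0 - post_margin)
    keep = confident & (maha <= max_maha)
    return call, keep
```

Array-level and probe-set-level filters could be built from the same quantities, for example by summarising `maha` across probe sets for each array.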

  23. Aberrant probe sets: non-response

  24. Aberrant probe sets: possible cross-hybridization?

  25. Chromosome-level SSC results (unfiltered)

  26. Chromosome-level SSC results (filtered)

  27. Chromosome-level SSC results (filtered)

  28. — Comparison with SNPscanner —

  29. SSC vs. SNPscanner • SSC: multivariate Gaussians; class-specific, non-diagonal covariance matrices; parameter estimates via EM, using labeled and unlabeled data; data-based estimates for both classes; both S288c- and YJM789-specific probes used. • SNPscanner: multivariate Gaussians; common, diagonal covariance matrices for the two classes; parameter estimation using parental data only; data-based parameter estimation for the reference class; a model-based shift gives the variant class mean estimate; only S288c-specific probes may be used.

  30. SNPscanner: approximately correct distributions

  31. SNPscanner: wrong distributions, right calls

  32. SNPscanner: inaccurate covariance estimation

  33. Genotype call comparison: SNPscanner vs. SSC • Filter for both methods: • Remove bad arrays and aberrant probe sets. • Remove probe sets with poorly separated clusters. • Drop calls falling between two observed clusters. • Only consider polymorphisms with at least one S288c-specific probe. • Compute concordance rate between the two methods.
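
The final step, computing concordance over calls that survive filtering in both methods, can be sketched as below. The encoding of a dropped call as `-1` and the function name are assumptions made for this example.

```python
import numpy as np


def concordance_rate(calls_a, calls_b, missing=-1):
    """Fraction of agreeing calls among positions called by both methods.

    calls_a, calls_b : arrays of genotype calls (0/1), with `missing`
                       marking calls removed by filtering.
    Returns NaN when no position is called by both methods.
    """
    a = np.asarray(calls_a)
    b = np.asarray(calls_b)
    both = (a != missing) & (b != missing)
    if not both.any():
        return float("nan")
    return float(np.mean(a[both] == b[both]))
```

Restricting the comparison to jointly called positions keeps the rate from being driven by differences in how many calls each method drops.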

  34. SSC calls

  35. SNPscanner calls

  36. Disagreement

  37. PC plots for probe sets with strong disagreement

  38. PC plots for probe sets with strong disagreement

  39. PC plots for probe sets with strong disagreement

  40. — Tetrad-level results —

  41. Tetrad-level results

  42. High resolution in crossover regions

  43. Crossovers accompanied by events on other strands

  44. Double crossovers

  45. — Rate inference —

  46. Definitions • Marker and inter-marker intervals (IMIs). • Recombination event intervals: • Conversions: midpoints of IMIs immediately beyond genotype change. • Crossovers: midpoints of IMIs immediately before return to 2:2 ratio.
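
As a simplified illustration of placing events at inter-marker interval midpoints, the sketch below works on a single spore rather than a full tetrad. It reports the midpoint of every IMI across which the genotype call changes. It does not implement the tetrad-based distinction between conversions and crossovers defined above, and the function name is hypothetical.

```python
def genotype_change_midpoints(positions, genotypes):
    """Midpoints of inter-marker intervals across which the genotype changes.

    positions : marker coordinates along one chromosome, in increasing order
    genotypes : genotype call (0 or 1) at each marker for one spore
    """
    midpoints = []
    pairs = list(zip(positions, genotypes))
    for (p0, g0), (p1, g1) in zip(pairs, pairs[1:]):
        if g0 != g1:
            # The change occurred somewhere in (p0, p1); use the midpoint.
            midpoints.append((p0 + p1) / 2)
    return midpoints
```

The width of each such interval, `p1 - p0`, bounds how precisely the event can be located, which is why marker density matters for resolution.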

  47. Inter-marker interval event rates

  48. Larger IMIs are involved in more recombination events

  49. High marker density adjustment (crossovers) [Figure: axis marked at -M, -w, 0, +w, +M.] • An inter-marker interval I at [-w, w], centered at 0. • Chromosome at [-M, M], with . • Yj: symmetric extension of recombination interval, given a DSB at j. • Assuming that two recombination intervals cannot overlap I,

  50. High marker density adjustment (crossovers) • An inter-marker interval I at [-w, w], centered at 0. • Chromosome at [-M, M], with . • Yj: symmetric extension of recombination interval, given a DSB at j. • Assuming that two recombination intervals cannot overlap I, • For crossovers, “involvement” is equivalent to detection.
