
Fine recombination mapping in S. cerevisiae using tiling microarrays

Fine recombination mapping in S. cerevisiae using tiling microarrays. Richard Bourgon 20 September 2007 bourgon@ebi.ac.uk. Meiotic recombination. Meiosis… Two divisions, yielding 4 haploid daughter cells. Double-stranded breaks (DSBs) initiate recombination.


Presentation Transcript


  1. Fine recombination mapping in S. cerevisiae using tiling microarrays Richard Bourgon 20 September 2007 bourgon@ebi.ac.uk

  2. Meiotic recombination Meiosis… • Two divisions, yielding 4 haploid daughter cells. • Double-stranded breaks (DSBs) initiate recombination. • Both crossover and non-crossover resolutions are possible.

  3. Meiotic recombination Meiosis… • Two divisions, yielding 4 haploid daughter cells. • Double-stranded breaks (DSBs) initiate recombination. • Both crossover and non-crossover resolutions are possible. Based on B de Massy, TRENDS in Genetics 19(9), 2003


  6. Research questions • In S. cerevisiae, where do hotspots occur and what are the local recombination rates? • Do hotspots (binary) or recombination rates (quantitative) correlate with features of DNA sequence or chromatin structure? • How large are conversion regions? Can we identify the various resolution patterns? • What is the relative frequency of the various DSB-repair pathways? • Does the observed pattern among recombination events concur with current models for “interference”? Do mutations impact interference?

  7. Saccharomyces cerevisiae microarray data • Two strains • S96, isogenic to the common laboratory strain S288c. • YJM789, isogenic to the clinical isolate YJM145. • In alignable regions: ≈ 56,000 SNPs and ≈ 30,000 insertions and deletions. • Tiling microarrays • ≈ 6.5 M 5 µm features, tiling non-repetitive S96 every 4 bases. • ≈ 4% of probes are specific to YJM789 sequence. • Data • 25 parental genomic DNA hybridizations. • 208 wildtype offspring hybridizations. • 20 msh4 mutant offspring hybridizations.

  8. “Single feature polymorphisms” • Hybridization efficiency depends on the number and position of mismatches. • Differential hybridization provides a means of detecting polymorphisms, even when only the reference genome sequence is known: SFPs. • Winzeler et al., Science 281(5380), 1998. • Brem et al., Science 296(5568), 2002. • Steinmetz et al., Nature 416(6878), 2002. • Borevitz et al., Genome Research 13(3), 2003. • Given parental behavior, genotype segregants via supervised classification.

  9. Single-probe methods • Polymorphism detection • Winzeler et al. (and others): ANOVA testing equality of the parental class means. • Borevitz et al.: moderated t-test using the SAM adjustment. • Brem et al.: moderated t-test. Then cluster all data (parental and segregant) and discard SFPs for which clusters don’t separate the parental data. • Segregant genotyping • ANOVA and t-test methods use the estimated posterior probability of class membership, with a uniform prior on the classes. • Brem et al. augment this with quantities estimated from clustered data.
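The posterior probability of class membership under a uniform prior can be written out explicitly. The notation below is an assumption (the slide's own formula is not reproduced in this transcript): $f_1$ and $f_2$ stand for the class-conditional densities of a probe's log intensity estimated from the two parental strains, and $\pi_1 = \pi_2 = 1/2$ are the uniform prior class probabilities.

```latex
P(Y = k \mid X = x)
  = \frac{\pi_k f_k(x)}{\pi_1 f_1(x) + \pi_2 f_2(x)}
  = \frac{f_k(x)}{f_1(x) + f_2(x)},
  \qquad \pi_1 = \pi_2 = \tfrac{1}{2}, \quad k \in \{1, 2\}.
```

With equal priors the prior terms cancel, so the call depends only on the ratio of the two class densities at the observed intensity.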

  10. Probe sets: SNP interrogation • Probe sets: groups of probes which each exactly map to a unique location, and which interrogate a common polymorphism. • S96: CCTCCTGACCGGGATTGAAGTGATAAACATGTCTAGCGTTA • YJM789: CCTCCTGACCGGGATTGAACTGATAAACATGTCTAGCGTTA • Probes: 1: GGAGGACTGGCCCTAACTTCACTAT; 2: GACTGGCCCTAACTTCACTATTTGT; 3: GACTGGCCCTAACTTGACTATTTGT; 4: GGCCCTAACTTCACTATTTGTACAG; 5: CTAACTTCACTATTTGTACAGATCG; 6: CTTCACTATTTGTACAGATCGCAAT
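The probe set on this slide can be checked mechanically. The following is a minimal sketch, assuming that the probes are displayed as the plain base complement of the target strand (an inference from the sequences themselves, not something the slide states); the helper name `locate_probes` is hypothetical.

```python
# Map each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

S96 = "CCTCCTGACCGGGATTGAAGTGATAAACATGTCTAGCGTTA"
YJM789 = "CCTCCTGACCGGGATTGAACTGATAAACATGTCTAGCGTTA"

# Probe sequences as numbered on the slide.
PROBES = {
    1: "GGAGGACTGGCCCTAACTTCACTAT",
    2: "GACTGGCCCTAACTTCACTATTTGT",
    3: "GACTGGCCCTAACTTGACTATTTGT",
    4: "GGCCCTAACTTCACTATTTGTACAG",
    5: "CTAACTTCACTATTTGTACAGATCG",
    6: "CTTCACTATTTGTACAGATCGCAAT",
}


def locate_probes(target, probes):
    """Return {probe id: 0-based offset} for probes that match the
    complement of `target` exactly; non-matching probes are omitted."""
    comp = target.translate(COMPLEMENT)
    hits = {}
    for probe_id, seq in probes.items():
        offset = comp.find(seq)  # -1 when there is no exact match
        if offset >= 0:
            hits[probe_id] = offset
    return hits


s96_hits = locate_probes(S96, PROBES)
yjm_hits = locate_probes(YJM789, PROBES)
```

Under that assumption, `s96_hits` lists the probes that perfectly match the S96 sequence, with offsets that step by the tiling interval, and `yjm_hits` lists any probe that perfectly matches only the YJM789 allele.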

  11. Marginal probe behavior

  12. A multi-probe method: SNPScanner Gresham et al., Science 311(5769), 2006: • Model the decrease in a given probe’s intensity in the presence of a single SNP, as a function of • Position within the probe, • Probe response to reference sequence, • Probe GC content, and • Nucleotides surrounding the SNP position. • Fit model parameters using two sequenced strains with known SNPs. • To genotype a segregant or new strain at a given base, assume probes in a probe set are independent and compute a likelihood ratio between the two genotype hypotheses [formulas not preserved], with one parameter [symbol not preserved] assumed to be common for both genotypes.
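The likelihood-ratio step can be illustrated with a small sketch. It is not the SNPScanner implementation: it assumes Gaussian errors on log intensities, takes the predicted per-probe means under each genotype as given inputs, and guesses that the quantity shared by both genotypes is the error standard deviation `sigma`. The function name and arguments are hypothetical.

```python
def log_likelihood_ratio(observed, pred_ref, pred_snp, sigma):
    """Log-likelihood ratio (SNP genotype vs. reference genotype) for one
    probe set, treating probes as independent.

    observed -- observed log intensities, one per probe
    pred_ref -- predicted log intensities if the reference base is present
    pred_snp -- predicted log intensities if the SNP is present
    sigma    -- error standard deviation, assumed common to both genotypes
    """
    llr = 0.0
    for y, m_ref, m_snp in zip(observed, pred_ref, pred_snp):
        # With a common sigma the Gaussian normalising constants cancel,
        # leaving a difference of squared residuals for each probe.
        llr += ((y - m_ref) ** 2 - (y - m_snp) ** 2) / (2.0 * sigma ** 2)
    return llr
```

A positive value favours the SNP genotype and a negative value favours the reference genotype; independence across probes is what lets the per-probe terms simply add.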

  13. An alternative multi-probe method • Residual correlation remains after centering log intensities for each probe within inferred genotype class: a multivariate approach is justified.

  14. An alternative multi-probe method • Residual correlation remains after centering log intensities for each probe within inferred genotype class: a multivariate approach is justified. • Parental arrays are informative, but do not always provide an ideal model. Supervised classification of offspring (i) wastes information, and (ii) may be misleading.

  15. An alternative multi-probe method • Residual correlation remains after centering log intensities for each probe within inferred genotype class: a multivariate approach is justified. • Parental arrays are informative, but do not always provide an ideal model. Supervised classification of offspring (i) wastes information, and (ii) may be misleading. • Clear division into two distributions is necessary but not sufficient. Quantitative aspects of the inferred clusters are useful.

  16. Semi-supervised, model-based clustering (SSC) Semi-supervised clustering via EM algorithm: • Assume a two-component mixture, with π1 = π2 = 1/2. • (Xi,Yi) with latent class variable Y. Y is known for parental arrays. • Assume X|Y multivariate normal. • Begin with E-step: initialize the unknown Y with any simple clustering scheme: k-means, hierarchical agglomeration, etc. • Iteratively estimate parameters, E(Yi|Xi), parameters, etc. • Classify segregant i based on final estimated E(Yi|Xi). For diagnostic purposes only: • Multivariate Gaussian fit to dimension-reduced parental data. • Unsupervised clustering of offspring data, by EM algorithm, with k ∈ {2, 3}.
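The EM scheme on this slide can be sketched in a few lines. The version below is a deliberate simplification and not the author's code: it uses a single probe (univariate normal) rather than the multivariate normal model, initialises the unknown labels by nearest parental mean as one possible "simple clustering scheme", and keeps the mixing proportions fixed at 1/2. The name `ssc_em_1d` and its parameters are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def ssc_em_1d(parental, offspring, n_iter=50, min_var=1e-6):
    """Semi-supervised two-component EM on one-dimensional data.

    parental  -- list of (x, y) pairs with known class y in {0, 1};
                 both classes must be present
    offspring -- list of x values with unknown class
    Returns the estimated E(Y | X) = P(Y = 1 | x) for each offspring value.
    """
    labels = [float(y) for _, y in parental]  # known, never updated
    xs = [x for x, _ in parental] + list(offspring)

    # Initialise unknown labels: assign each offspring to the nearer parental mean.
    x0 = [x for x, y in parental if y == 0]
    x1 = [x for x, y in parental if y == 1]
    mean0, mean1 = sum(x0) / len(x0), sum(x1) / len(x1)
    resp = [1.0 if abs(x - mean1) < abs(x - mean0) else 0.0 for x in offspring]

    for _ in range(n_iter):
        r = labels + resp
        # M-step: weighted means and variances, using known and inferred labels.
        w1 = sum(r)
        w0 = len(r) - w1
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / w1
        mu0 = sum((1.0 - ri) * x for ri, x in zip(r, xs)) / w0
        var1 = max(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / w1, min_var)
        var0 = max(sum((1.0 - ri) * (x - mu0) ** 2 for ri, x in zip(r, xs)) / w0, min_var)
        # E-step: equal mixing proportions cancel in the posterior.
        resp = []
        for x in offspring:
            f1 = normal_pdf(x, mu1, var1)
            f0 = normal_pdf(x, mu0, var0)
            total = f0 + f1
            resp.append(f1 / total if total > 0.0 else 0.5)
    return resp
```

Parental arrays contribute to every M-step with their labels held fixed, while offspring arrays contribute through their current posterior weights; this is what distinguishes the semi-supervised fit from purely supervised classification on the parents alone.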

  17. Examples of SSC probe set results

  18. Chromosome-level SSC results

  19. Filtering • Array level • Excess “genotype switching”. • Large RMS residual (Mahalanobis). • Probe set level • High estimated misclassification rate. • Aberrant cluster behavior. • Very unusual genotype ratio. • Call level • Intermediate posterior probability of class membership.
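The call-level filter can be sketched as a simple threshold on the posterior. The cut-offs `low` and `high` below are hypothetical values chosen for illustration; the slides do not give the thresholds actually used.

```python
def filter_calls(posteriors, low=0.05, high=0.95):
    """Turn posterior class probabilities into genotype calls.

    Returns 0 or 1 for confident calls and None where the posterior is
    intermediate, i.e. the call is dropped.
    """
    calls = []
    for p in posteriors:
        if p <= low:
            calls.append(0)
        elif p >= high:
            calls.append(1)
        else:
            calls.append(None)  # ambiguous: neither cluster is clearly favoured
    return calls
```

Dropping ambiguous calls trades marker density for fewer spurious genotype switches, which matters when small events are of interest.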

  20. Aberrant probe sets: non-response

  21. Aberrant probe sets: possible cross-hybridization?

  22. Filtering

  23. Chromosome-level SSC results (unfiltered)

  24. Chromosome-level SSC results (filtered)

  25. Chromosome-level SSC results (filtered)

  26. Genotyping accuracy • 82 usable forward sequencing runs. (Reverse similar.) • 16 spores sequenced. • Sequenced regions include 322 array-interrogated SNPs. • Sequenced samples had a range of array qualities. • Sequenced regions focused on single-marker conversions with a range of probe set quality scores.

  27. SNPScanner: approximately correct distributions

  28. SNPScanner: wrong distributions, right calls

  29. SNPScanner: inaccurate covariance estimation

  30. Genotype call comparison: SNPScanner vs. SSC • Filter for both methods: • Remove bad arrays and aberrant probe sets. • Remove probe sets with poorly separated clusters. • Drop calls falling between two observed clusters. • Only consider polymorphisms with at least one S288c-specific probe. • Compute concordance rate between the two methods.
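The concordance computation in the last bullet can be sketched as follows, assuming each method yields one call per polymorphism and uses None for calls removed by filtering. The function name and the None convention are assumptions made for this illustration.

```python
def concordance_rate(calls_a, calls_b):
    """Fraction of positions where both methods made a call and agree.

    Positions where either method has no call (None) are excluded.
    Returns None when the two methods share no called positions.
    """
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    if not pairs:
        return None
    return sum(a == b for a, b in pairs) / len(pairs)
```

Restricting the denominator to positions called by both methods keeps differences in filtering stringency from being counted as disagreement.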

  31. SSC calls

  32. SNPScanner calls

  33. Disagreement

  34. PC plots for probe sets with strong disagreement

  35. PC plots for probe sets with strong disagreement

  36. PC plots for probe sets with strong disagreement

  37. Usable probes per polymorphism

  38. Tetrad-level results

  39. High resolution in crossover regions

  40. Crossovers accompanied by events on other strands

  41. Double crossovers

  42. Msh4 mutant

  43. Recombination rates

  44. Recombination rates

  45. Summary and future work • Summary • Semi-supervised clustering outperforms supervised classification: • Parental data are often not a faithful indicator of offspring behavior. • Offspring clusters contain a lot of information. • Filtering is important for small event detection: • Aberrant or error-prone probe sets create spurious small “events”. • Correct distribution estimates are required to detect the latter. • Future work • Exploration and recovery of aberrant probe sets. • Unanticipated polymorphism detection. • Application to a single sequenced genome. • Rate/count adjustments given varying marker spacing. • Hotspots, conversion/crossover ratio, sizes, spacing and interference. • New mms4 mutant…

  46. Acknowledgements • EMBL Heidelberg • Eugenio Mancera Ramos • Lars Steinmetz • Julien Gagneur • Zhenyu Xu • EBI • Wolfgang Huber • EBI, and Istituto Europeo di Oncologia, Milan • Alessandro Brozzi • EBI, and Higgins Lab, University College, Dublin • Paul McGettigan
