Fine mapping of recombination in S. cerevisiae

Wolfgang Huber, EMBL-EBI

• The maths of marker genotyping: sensitivity, specificity, data QA/QC
• Event classification: cross-overs, conversions… and weirdness
• Event rates: biological significance

Single-reporter methods
• De novo polymorphism detection
  • Winzeler et al., Science 281, 1998 (and others): ANOVA, testing the null hypothesis μ1 = μ2 (equal mean intensity in the two strains).
  • Borevitz et al., Genome Research 13, 2003: moderated t-test (SAM).
  • Brem et al., Science 296, 2002: moderated t-test, then cluster all data (parental and segregant) and discard SFPs (single-feature polymorphisms) for which the clusters do not separate the parental data.
• Segregant genotyping (using polymorphisms)
  • Use the estimated posterior probability of class membership (uniform prior on the classes).
  • Brem et al. augment this: the class parameters are estimated from clustered data.
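The posterior probability of class membership under a uniform prior can be illustrated with a minimal sketch. It assumes one reporter, two genotype classes and a Gaussian intensity distribution per class; the function names and the Gaussian form are assumptions for illustration, not the formula from the slide.

```python
import math


def log_gaussian(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def posterior_class1(x, mean1, sd1, mean2, sd2):
    """P(class 1 | x) for two classes with equal (uniform) prior weight.

    With a uniform prior the posterior reduces to p1 / (p1 + p2); it is
    evaluated in log space so that extreme intensities do not underflow.
    """
    d = log_gaussian(x, mean1, sd1) - log_gaussian(x, mean2, sd2)
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


# Hypothetical log2 intensities: class 1 centred at 10, class 2 at 8.
print(posterior_class1(10.0, 10.0, 1.0, 8.0, 1.0))
```

An intensity halfway between two classes of equal spread gives a posterior of 0.5, which is the kind of ambiguous call that a downstream filter would discard.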

But we have multiple reporters per SNP: probe sets
Probe set: a set of reporters that exactly and uniquely map to a location and interrogate one polymorphism.
Reporters:
1: GGAGGACTGGCCCTAACTTCACTAT
2: GACTGGCCCTAACTTCACTATTTGT
3: GACTGGCCCTAACTTGACTATTTGT
4: GGCCCTAACTTCACTATTTGTACAG
5: CTAACTTCACTATTTGTACAGATCG
6: CTTCACTATTTGTACAGATCGCAAT
Genomic sequence:
S96: CCTCCTGACCGGGATTGAAGTGATAAACATGTCTAGCGTTA
YJM789: CCTCCTGACCGGGATTGAACTGATAAACATGTCTAGCGTTA

Multivariate analysis of probe set data: parallel coordinate plots (figure; axes: log2 intensity, reporters in probe set)

Multivariate analysis of probe set data: parallel coordinate plots

Multivariate methods
SNPScanner (Gresham et al., Science 311, 2006):
• Model probe intensity x_i with and without presence of a SNP as a function of
  • probe GC content
  • position of the SNP within the probe
  • nucleotides surrounding the SNP
• Fit model parameters using two sequenced strains with known SNPs.
• To genotype a segregant or new strain at a given base, compute a likelihood ratio. Assumption: the covariance matrix is diagonal and the same for both genotypes.
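Under a diagonal covariance matrix shared by both genotypes, the log likelihood ratio is a sum of independent per-probe terms. The sketch below shows that arithmetic only. The predicted means and variances are taken as given inputs, and the function name is hypothetical; this is not SNPScanner's code.

```python
def log_likelihood_ratio(x, mean_ref, mean_snp, var):
    """Log of L(SNP present) / L(SNP absent) for one probe set.

    x        : observed log2 intensities, one per probe
    mean_ref : model-predicted intensities without the SNP
    mean_snp : model-predicted intensities with the SNP
    var      : per-probe variances (diagonal covariance, same for both
               genotypes, so the normalising constants cancel)
    """
    total = 0.0
    for xi, m0, m1, v in zip(x, mean_ref, mean_snp, var):
        total += ((xi - m0) ** 2 - (xi - m1) ** 2) / (2.0 * v)
    return total


# Hypothetical two-probe example: observed data match the SNP prediction.
print(log_likelihood_ratio([8.0, 8.0], [10.0, 10.0], [8.0, 8.0], [1.0, 1.0]))
```

A positive value favours the presence of the SNP, a negative value its absence. Because the terms are simply added, correlated neighbouring probes are counted as if they were independent evidence.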

But
• neighbouring probes' data are not independent
• variances for the two genotypes are often quite different
• training data is often not representative
• the likelihood ratio test generates too many false positives (FPs)
Hence: a generalized multi-probe method is needed.

GTS (genotyping by semi-supervised clustering)
An instance of the EM algorithm applied to multivariate Gaussian mixture modeling: simultaneously estimate class shapes and object class membership.

GTS (genotyping by semi-supervised clustering): R package ss.genotyping
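The idea of semi-supervised EM on a two-component multivariate Gaussian mixture can be sketched as follows. This is an illustration written with NumPy, not the ss.genotyping implementation: the function names, the fixed equal class weights, the ridge term and the iteration count are all assumptions. Parental arrays keep their known class, while segregant memberships are re-estimated in every E-step.

```python
import numpy as np


def log_gauss(X, mean, cov):
    """Row-wise log density of a multivariate normal."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (maha + logdet + d * np.log(2.0 * np.pi))


def semi_supervised_em(X, labels, n_iter=50, ridge=1e-3):
    """Two-class Gaussian mixture fitted by EM with some labels known.

    X      : (n_arrays, n_reporters) log2 intensities for one probe set
    labels : 0 or 1 for labelled (parental) rows, -1 for unlabelled rows
    Returns responsibilities (n_arrays, 2), class means and covariances.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape
    known = labels >= 0
    R = np.full((n, 2), 0.5)               # unlabelled rows start undecided
    R[known] = np.eye(2)[labels[known]]    # labelled rows are one-hot
    for _ in range(n_iter):
        # M-step: weighted mean and full covariance per class.
        means, covs = [], []
        for k in range(2):
            w = R[:, k]
            mean = w @ X / w.sum()
            diff = X - mean
            cov = (diff * w[:, None]).T @ diff / w.sum()
            means.append(mean)
            covs.append(cov + ridge * np.eye(d))  # keep cov invertible
        # E-step with equal class weights; only unlabelled rows change.
        logp = np.column_stack(
            [log_gauss(X, means[k], covs[k]) for k in range(2)]
        )
        logp -= logp.max(axis=1, keepdims=True)
        post = np.exp(logp)
        post /= post.sum(axis=1, keepdims=True)
        R[~known] = post[~known]
    return R, means, covs
```

Each class gets its own full covariance matrix, so the sketch does not share the diagonal, equal-variance assumption of the likelihood ratio approach. The returned responsibilities for the unlabelled rows play the role of posterior genotype probabilities.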

Examples of probe set results

Aberrant probe sets (cross-hybridization?)

Filtering
• Probe sets: aberrant probe sets; weakly separating probe sets; imbalanced probe sets
• Genotype calls: ambiguous individual genotype calls (z)

Benchmark: SNPScanner vs GTS
• 233 Affymetrix yeast tiling arrays from the Steinmetz group:
  • 13 S288 and 12 YJM789 arrays: training data
  • 52 tetrads of crosses (208 arrays): to be genotyped
• Same post-processing/filter

GTS vs SNPScanner (figure; axes: arrays, genomic position (markers))

GTS vs SNPScanner

High resolution in crossover regions

Three adjacent cross-overs involving three chromosomes (chr 1, wt_47)

A cross-over plus two long conversions, involving all four chromosomes (chr 3, wt_19)

Three adjacent conversions involving three chromosomes (chr 3, wt_38)

Cross-over accompanied by multiple conversions (chr 4, wt_36)

Event classification
An automatic algorithm takes tetrad-level genotype traces and assigns them to events: cross-over, conversion, complex cross-over, complex conversion, ...
R package recombination.genotyping
Manual curation is still needed: we are just beginning to understand the spectrum of possible event types!
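How tetrad-level genotype traces might be turned into events can be sketched with a deliberately simple rule set. This is not the algorithm in recombination.genotyping. It assumes complete 0/1 genotype calls for four spores along one chromosome, treats a run of markers with non-2:2 segregation as a conversion tract, and treats a change in the 2:2 pattern as a cross-over. The function name and event labels are hypothetical.

```python
def classify_tetrad(spores):
    """Assign simple event labels along one chromosome of one tetrad.

    spores : four equal-length sequences of 0/1 genotype calls, ordered by
             marker position (0 and 1 denote the two parental alleles).
    Returns a list of (label, first_marker_index, last_marker_index).
    """
    events = []
    n_markers = len(spores[0])
    prev = None          # (index, pattern) of the last 2:2 marker seen
    tract_start = None   # start of the current non-2:2 run, if any
    for j in range(n_markers):
        pattern = tuple(s[j] for s in spores)
        if sum(pattern) != 2:
            # Non-Mendelian segregation (3:1, 4:0, ...): inside a tract.
            if tract_start is None:
                tract_start = j
            continue
        # Count the spores whose allele differs from the last 2:2 marker.
        switched = 0
        if prev is not None:
            switched = sum(a != b for a, b in zip(prev[1], pattern))
        if tract_start is not None:
            label = "conversion" if switched == 0 else "cross-over with conversion"
            events.append((label, tract_start, j - 1))
            tract_start = None
        elif switched:
            events.append(("cross-over", prev[0], j))
        prev = (j, pattern)
    if tract_start is not None:
        # Tract runs to the chromosome end; flanking pattern is unknown.
        events.append(("conversion", tract_start, n_markers - 1))
    return events


# Hypothetical tetrad: spores 2 and 3 exchange alleles between markers 2 and 3.
tetrad = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
print(classify_tetrad(tetrad))
```

The sketch does not separate complex events, handle missing calls or merge nearby events, which is where manual curation would come in.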

Genetic interactions
Genotypes at pairs of loci on different chromosomes are unlinked, but the population shows evidence of selection: over-representation and under-representation of particular genotype combinations.
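One way to look for over- or under-represented genotype combinations at a pair of unlinked loci is a chi-square test of independence on the 2×2 table of segregant counts. The slides do not say which test was used, so the sketch below is an assumption; the function name and the example counts are hypothetical.

```python
import math


def chi2_independence_2x2(n00, n01, n10, n11):
    """Pearson chi-square test of independence for a 2x2 count table.

    nAB is the number of segregants with allele A at the first locus and
    allele B at the second. All row and column totals must be positive.
    Returns (statistic, p_value) with one degree of freedom.
    """
    observed = ((n00, n01), (n10, n11))
    n = n00 + n01 + n10 + n11
    rows = (n00 + n01, n10 + n11)
    cols = (n00 + n10, n01 + n11)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: parental allele combinations are over-represented.
print(chi2_independence_2x2(40, 10, 10, 40))
```

Testing every pair of loci on different chromosomes involves a very large number of tests, so the p-values would need correction for multiple testing before any pair is reported as interacting.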

Genetic interaction network of S288c-YJM789 crosses

Acknowledgements
• EMBL HD: Lars Steinmetz, Julien Gagneur, Zhenyu Xu, Sandra Clauder-Münster, Fabiana Perocchi, Wu Wei
• EBI: Elin Axelsson, Ligia Bras, Alessandro Brozzi, Tony Chiang, Audrey Kauffmann, Paul McGettigan, Greg Pau, Oleg Sklyar, Mike Smith, Jörn Tödling, Jitao Zhang
• Richard Bourgon, Eugenio Mancera Ramos
• The contributors to the R and Bioconductor projects

Tetrad-level results

Crossovers accompanied by events on other strands

Double crossovers

Summary
• Semi-supervised clustering is natural given the experimental structure.
• Parental data are often not a faithful indicator of offspring behavior! Supervised classification may experience problems for some polymorphisms.
• A multivariate Gaussian model is adequate.
• EM works well when data behave as expected, but this is not always the case. Hence the importance of fit diagnostics, QA/QC and post-processing filters.
Outlook
• Hotspots, conversion/crossover ratio, sizes, spacing and interference.
• Msh4 mutant data (deficient in the putative interference-generating pathway): how do interference patterns change?
• Unanticipated polymorphism detection (de novo in segregants; in unsequenced strains)
