High-resolution mapping of meiotic crossovers and noncrossovers

High-resolution mapping of meiotic crossovers and noncrossovers. Wolfgang Huber, EMBL-EBI

Statistical and computational technology for the understanding of genotype-phenotype networks: HT screening (RNAi, drugs, combinations); automated phenotyping from image analysis; genotyping from microarrays & new sequencing; machine learning & data integration; Bioconductor project.

10 min: Visualisation of high-density along-genome data. 35 min: Recombination.

ChIP-Seq data and "pile-up": Solexa reads, aligned to genome → "pile-up" vector. Figure from Zhang et al., PLoS Comp. Biol. 2008
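As a minimal sketch of what a "pile-up" vector is, the following Python turns aligned read intervals into per-base coverage. The function name `pileup` and the half-open `(start, end)` read representation are assumptions made for this illustration; they are not taken from the talk or from any particular package.

```python
def pileup(reads, chrom_length):
    """Return per-base coverage for reads given as half-open intervals [start, end)."""
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (chrom_length + 1)
    for start, end in reads:
        start = max(start, 0)
        end = min(end, chrom_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # A running sum of the difference array gives the read depth at each base.
    coverage = []
    depth = 0
    for delta in diff[:chrom_length]:
        depth += delta
        coverage.append(depth)
    return coverage


# Three toy reads on a chromosome of length 8 (hypothetical data).
print(pileup([(0, 4), (2, 6), (3, 5)], 8))  # -> [1, 1, 2, 3, 2, 1, 0, 0]
```

The difference-array approach keeps the cost linear in the number of reads plus the chromosome length, which matters for whole-chromosome vectors.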

Pile-up plot for chromosome 10: H3K4me1 and H3K4me3 ChIP-Seq (Barski et al., Cell 2007)

Zoom-in: H3K4me1, H3K4me3

Hilbert curve

Hilbert curve, iteration 1

Hilbert plots of chromosome 10: H3K4me1, H3K4me3
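To illustrate how a long along-genome vector can be folded into a square image, here is a small Python sketch of the standard index-to-coordinate mapping for the Hilbert curve. It is not the HilbertVis implementation; the function names and the choice to summarise each pixel's bin by its maximum are assumptions made for this example.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve of the given order to (x, y)."""
    n = 1 << order          # the image is n x n pixels
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so the sub-curves join end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_image(values, order):
    """Bin a 1-D vector into 4**order pixels laid out along the Hilbert curve."""
    n = 1 << order
    cells = n * n
    image = [[0.0] * n for _ in range(n)]
    for d in range(cells):
        lo = d * len(values) // cells
        hi = max(lo + 1, (d + 1) * len(values) // cells)
        chunk = values[lo:hi]
        x, y = hilbert_d2xy(order, d)
        # Assumption: keep the maximum per bin so narrow peaks stay visible.
        image[y][x] = max(chunk) if chunk else 0.0
    return image


# Iteration 1 of the curve visits the four quadrants in this order:
print([hilbert_d2xy(1, d) for d in range(4)])  # -> [(0, 0), (0, 1), (1, 1), (1, 0)]
```

Because consecutive positions along the curve always land on adjacent pixels, features that are close on the chromosome stay close in the image.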

History: The concept of space-filling curves is due to Giuseppe Peano (1890). This specific curve was invented by David Hilbert (1891). The idea to use these curves for visualization was first published by Daniel Keim (1996) for economics data.

3-colour Hilbert plot. Red: H3K4me1; green: H3K4me3; blue: exons.

Graphical user interface

Availability: open source, released Oct 2008 under GPL v3. Bioconductor packages HilbertVis & HilbertVisGUI. Stand-alone application: reads GFF and wiggle track files (incl. BED). Simon Anders

Meiotic recombination: proper chromosome segregation; increase of genetic diversity. Figure: four chromatids carrying Gene A, Gene B, Gene C; Gene A, Gene b, Gene c; Gene a, Gene b, Gene c; Gene a, Gene B, Gene C.

Double-strand break repair. Figure panels: CO (crossover) and NCO (non-crossover). Recombination initiates with a double-strand break in one DNA molecule. Only two DNA molecules (homologs) are shown here.

Non-uniform distribution of recombination across the genome. Figure panels: human chr. 22q (female, average, male) and yeast chr. 3. Sources: Petes T.D., 2001; Baudat F. & Nicolas A., 1997. Recombination hotspots are small genomic regions where recombination events cluster, surrounded by stretches with little or no recombination activity.

Map all recombination events that occurred in 50 yeast meioses using high-density tiling microarrays

Eugenio Mancera Ramos • Richard Bourgon • Lars Steinmetz

Clinical isolates of S. cerevisiae
Laboratory strain (S288c): the common lab yeast. Isolated from rotten fig in California in the 1930s. Domesticated: related to baker's yeast, wine-making and beer-brewing yeasts. Genome sequence of S288c: A Goffeau et al., Science (1996).
Clinical strain (YJM789): isolated from immuno-compromised patients. Pathogenic in mouse model of systemic infection. Various fungal pathogenic characteristics: pseudohyphae, colony morphology switching. Ability to grow at >37˚C, a virulence trait. Genome sequence of YJM789: W Wei et al., PNAS (2007): 60k SNPs, 6k indels wrt S288c.

Experimental approach. Mancera*, Bourgon*, Brozzi, Huber, Steinmetz (2008)

1 tiling array for 2 yeast genomes (S288c and YJM789; Wei et al., PNAS 2007). 25mer probes on the Watson and Crick strands (figure labels: 8bp, 4bp). Probe classes: common, S-specific, Y-specific. Probe counts shown in the figure: 2,368k (86%), 291k (10%), 108k (4%). Custom design manufactured by Affymetrix.

Identification of previously unknown ncRNA and antisense transcripts and mapping of transcripts. Example: antisense transcript at CBF1. David*, Huber* et al. (2006)

The computational & statistical challenges
Genotyping: Microarray probes and polymorphisms are in a many-to-many relationship. The tiling array provides thorough coverage, but probes vary in sensitivity and specificity (e.g. cross-hybridisation). We need highly accurate individual genotype calls if we want to detect small events.
Event rate inference: Our data invert the traditional relationship between events and markers: instead of inferring crossovers between widely spaced markers, we have multiple markers over single events, both crossover and non-crossover. Marker spacing influences detection rate, but in complicated ways. Non-crossovers falling between markers are not observed.

Genotyping “single feature polymorphisms”. Hybridization efficiency depends on number and position of mismatches. Differential hybridization provides a means of detecting polymorphisms, even when only the reference genome sequence is known (Winzeler et al., Science 281(5380), 1998; Brem et al., Science 296(5568), 2002; Steinmetz et al., Nature 416(6878), 2002; Borevitz et al., Genome Research 13(3), 2003). Given parental behavior, segregants can be genotyped via supervised classification.

Tiling arrays, probe sets, markers
Probe set: group of probes which each exactly map to a unique locus and which interrogate a common polymorphism.
Marker: one or more polymorphisms interrogated by the same probe set.
S96:    CCTCCTGACCGGGATTGAAGTGATAAACATGTCTAGCGTTA
YJM789: CCTCCTGACCGGGATTGAACTGATAAACATGTCTAGCGTTA
1:      GGAGGACTGGCCCTAACTTCACTAT
2:          GACTGGCCCTAACTTCACTATTTGT
3:          GACTGGCCCTAACTTGACTATTTGT
4:              GGCCCTAACTTCACTATTTGTACAG
5:                  CTAACTTCACTATTTGTACAGATCG
6:                      CTTCACTATTTGTACAGATCGCAAT
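The relationship between tiled probes and a polymorphism can be sketched in a few lines of Python using the two sequences from this slide. The helper names (`tile_probes`, `snp_positions`, `probe_set`) and the 4 bp step are assumptions chosen so that the output reproduces the probe strings listed above; probes are written as the base-by-base complement of the target, which is how the strings on the slide read.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tile_probes(seq, length=25, step=4):
    """Return (start, probe) pairs; each probe is the complement of a window of seq."""
    return [(start, seq[start:start + length].translate(COMPLEMENT))
            for start in range(0, len(seq) - length + 1, step)]


def snp_positions(a, b):
    """Positions where two equal-length sequences differ (substitutions only)."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def probe_set(seq, snp, length=25, step=4):
    """Probes tiled on seq whose window covers the polymorphic position snp."""
    return [(start, probe) for start, probe in tile_probes(seq, length, step)
            if start <= snp < start + length]


s96 = "CCTCCTGACCGGGATTGAAGTGATAAACATGTCTAGCGTTA"
yjm = "CCTCCTGACCGGGATTGAACTGATAAACATGTCTAGCGTTA"

snp = snp_positions(s96, yjm)[0]          # single G/C difference at index 19
for start, probe in probe_set(s96, snp):  # S96-matching probes 1, 2, 4, 5, 6
    print(start, probe)
print(dict(tile_probes(yjm))[4])          # YJM789-matching probe 3
```

All five S96 windows cover the polymorphic base, so together with the YJM789-matching probe they form one probe set for one marker.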

Multivariate probe set data: parallel coordinate plots

A multivariate method: SNPScanner (Gresham et al., Science 311, 2006). Detailed parametric model of probe intensity x_i with and without presence of a SNP, as a function of: probe GC content; position of the SNP within the probe; nucleotides surrounding the SNP. Fit these model parameters using two sequenced strains with known SNPs. To genotype a segregant or new strain at a given base, compute a Bayes factor. Assumption: covariance matrices are diagonal and equal.
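Under the stated assumption of diagonal, equal covariance matrices, comparing "SNP present" against "SNP absent" for a vector of probe intensities reduces to a sum of per-probe terms. The Python below is only a sketch of that reduction with fixed, already-estimated means and variances; it is not the SNPScanner model itself (which predicts the means from GC content, SNP position and neighbouring bases), and the function and argument names are hypothetical.

```python
def log_bayes_factor(x, mean_snp, mean_ref, variances):
    """Log ratio of two Gaussian densities with shared diagonal covariance.

    Positive values favour "SNP present", negative values favour "SNP absent".
    With point estimates for the parameters this is a log likelihood ratio.
    """
    total = 0.0
    for xi, ms, mr, v in zip(x, mean_snp, mean_ref, variances):
        # The normalising constants cancel because the variances are shared.
        total += ((xi - mr) ** 2 - (xi - ms) ** 2) / (2.0 * v)
    return total


# Hypothetical log-intensities for three probes covering one base.
print(log_bayes_factor([7.1, 6.8, 7.4],
                       mean_snp=[7.0, 7.0, 7.5],
                       mean_ref=[9.0, 8.5, 9.2],
                       variances=[0.25, 0.25, 0.25]))
```

Because the covariance is diagonal, each probe contributes independently, so one badly behaved probe can dominate the sum.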

SNPScanner: ~97% correct calls, which is not enough for the reliable detection of conversion events. Parental arrays are informative, but alone often do not predict segregant behaviour. Purely supervised classification (i) wastes information and (ii) may be misleading. Binary classification boundary is necessary but not sufficient. Shapes of class distributions (→ confidence) are useful for QA/QC.

ssG: a semi-supervised, model-based genotyping algorithm. 2-component Normal mixture model: $p(x) = \pi_1\, p_N(x \mid \mu_1, \Sigma_1) + \pi_2\, p_N(x \mid \mu_2, \Sigma_2)$. For each array i and each probe set: (X_i, Y_i) with array data X_i and class variable Y_i. Y_i is known for parental arrays, unknown for segregants. Fit with the EM algorithm.

ssG An instance of the EM algorithm applied to multivariate Gaussian mixture modeling: iteratively estimate class shapes and object class membership probabilities
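A semi-supervised EM fit of a two-component multivariate Gaussian mixture can be sketched as follows. This is an illustration of the general technique rather than the ssG implementation: the function names, the nearest-labelled-mean initialisation, the small ridge added to the covariances and the use of NumPy are all assumptions made here. Labelled (parental) arrays keep their class fixed, while segregant memberships are re-estimated in every E-step.

```python
import numpy as np


def log_gauss(X, mean, cov):
    """Log density of a multivariate Normal evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def semi_supervised_em(X, labels, n_iter=50, ridge=1e-6):
    """labels: 0 or 1 for parental arrays, -1 for segregants (unknown class).

    Requires at least one labelled array per class.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape
    known = labels >= 0
    # Assumed initialisation: assign each array to the nearest labelled class mean.
    class_means = np.array([X[labels == k].mean(axis=0) for k in range(2)])
    dists = ((X[:, None, :] - class_means[None, :, :]) ** 2).sum(axis=2)
    resp = np.eye(2)[dists.argmin(axis=1)]
    resp[known] = np.eye(2)[labels[known]]
    for _ in range(n_iter):
        # M-step: mixing proportions, means and covariances ("class shapes").
        weights = resp.sum(axis=0)
        pis = weights / n
        means = (resp.T @ X) / weights[:, None]
        covs = []
        for k in range(2):
            diff = X - means[k]
            cov = (resp[:, k, None] * diff).T @ diff / weights[k]
            covs.append(cov + ridge * np.eye(d))
        # E-step: posterior class membership, updated for unlabelled arrays only.
        log_p = np.column_stack([np.log(pis[k]) + log_gauss(X, means[k], covs[k])
                                 for k in range(2)])
        log_p -= log_p.max(axis=1, keepdims=True)
        post = np.exp(log_p)
        post /= post.sum(axis=1, keepdims=True)
        resp[~known] = post[~known]
    return resp, means, covs, pis
```

The returned responsibilities double as per-call confidences, and the fitted covariances describe the class shapes that the slides suggest using for QA/QC.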

Examples of ssG probe set results (2d PCA view)

Aberrant probe sets: non-response

Aberrant probe sets: possible cross-hybridization?

Filtering. Probe sets: aberrant probe sets, weakly separating probe sets, imbalanced probe sets. Genotype calls: ambiguous individual genotype calls.
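One simple way to express the two filtering levels is sketched below in Python. The thresholds (`min_confidence`, `min_fraction`) and the function names are hypothetical and are not taken from the talk; the input is assumed to be the posterior probability of the S288c genotype for each segregant at one probe set.

```python
def call_genotypes(posteriors, min_confidence=0.99):
    """Turn posterior P(genotype = S) values into 'S', 'Y' or None (no call)."""
    calls = []
    for p_s in posteriors:
        if p_s >= min_confidence:
            calls.append("S")
        elif p_s <= 1.0 - min_confidence:
            calls.append("Y")
        else:
            calls.append(None)  # ambiguous individual call, filtered out
    return calls


def is_imbalanced(calls, min_fraction=0.25):
    """Flag a probe set whose called genotypes are far from the expected 50:50."""
    called = [c for c in calls if c is not None]
    if not called:
        return True
    fraction_s = called.count("S") / len(called)
    return fraction_s < min_fraction or fraction_s > 1.0 - min_fraction


calls = call_genotypes([0.999, 0.5, 0.001, 0.995])
print(calls)                 # -> ['S', None, 'Y', 'S']
print(is_imbalanced(calls))  # -> False
```

Filtering at the probe-set level removes markers that are unreliable for every array, while the per-call filter removes only the individual observations that fall between the two classes.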

SSC calls

SNPscanner calls

Disagreement

Recombination event inference for one tetrad. Median inter-marker distance: 78 bp
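The basic logic of reading events off a genotyped tetrad can be sketched as follows. This is a deliberately simplified illustration, not the procedure used in the study: it only compares consecutive markers that segregate 2:2, calls a crossover when exactly two spores switch parent between them, and calls a non-crossover when no spore switches but markers with 3:1 or 4:0 segregation lie in between. All names are hypothetical.

```python
def find_events(tetrad):
    """tetrad: four equal-length strings over {'S', 'Y'}, one per spore.

    Returns (kind, left_marker, right_marker) tuples, where the two marker
    indices are the flanking 2:2 markers of the event.
    """
    n_markers = len(tetrad[0])
    events = []
    prev = None  # index of the most recent marker with 2:2 segregation
    for j in range(n_markers):
        column = [spore[j] for spore in tetrad]
        if column.count("S") != 2:
            continue  # 3:1, 1:3, 4:0 or 0:4: part of a conversion tract
        if prev is not None:
            prev_column = [spore[prev] for spore in tetrad]
            switched = sum(a != b for a, b in zip(prev_column, column))
            if switched == 2:
                events.append(("CO", prev, j))
            elif switched == 0 and j - prev > 1:
                events.append(("NCO", prev, j))
            elif switched == 4:
                events.append(("complex", prev, j))
        prev = j
    return events


tetrad = [
    "SSSSSSSS",
    "SSSYYYYY",
    "YYYSSSSS",
    "YYYYYYSY",
]
print(find_events(tetrad))  # -> [('CO', 2, 3), ('NCO', 5, 7)]
```

Conversion tracts at the very ends of a chromosome, with no flanking 2:2 marker on one side, are ignored by this sketch.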

Event size and marker resolution. 4163 crossovers, 2126 non-crossovers across 46 meioses.

Complex events

Inferring event rates

Recombination event rates Traditional corrections (e.g., Haldane) use recombination fraction, and adjust for unseen crossovers which occur between widely-spaced markers. High-density marker data invert the traditional relationship, placing multiple markers within most recombination events — both crossover (CO) and non-crossover (NCO).
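For reference, the Haldane correction mentioned here relates an observed recombination fraction r between two markers to a map distance d in Morgans, assuming crossovers occur without interference: r = (1 - exp(-2d)) / 2. A short Python sketch follows; the function names are chosen for this example.

```python
import math


def haldane_distance(r):
    """Map distance in Morgans from a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * r)


def haldane_fraction(d):
    """Expected recombination fraction for a map distance d in Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


# For tightly linked markers the correction is negligible; it grows with distance.
for r in (0.01, 0.10, 0.30):
    print(r, round(haldane_distance(r), 4))
```

The correction exists because double crossovers between widely spaced markers go unseen; with multiple markers inside most events, that source of under-counting largely disappears and is replaced by the detection issues described above.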

Statistical model for event detection probabilities. Figure axis marks: -M, -w, 0, +w, +M.
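The model on this slide cannot be recovered from the transcript (the meaning of M and w is not given). As a much simpler illustration of why marker spacing affects detection, the Python sketch below computes the probability that a conversion tract of fixed length, placed uniformly at random in a region, covers at least one marker. The uniform-placement assumption and all names are hypothetical.

```python
def detection_probability(markers, tract_length, region_length):
    """P(tract [s, s + tract_length) contains a marker), s uniform on [0, region - tract]."""
    span = region_length - tract_length
    if span <= 0:
        raise ValueError("tract must be shorter than the region")
    covered = 0.0
    reach = 0.0  # right end of the start positions already counted as detected
    for m in sorted(markers):
        # A tract starting in (m - tract_length, m] covers marker m.
        lo = max(m - tract_length, reach, 0.0)
        hi = min(m, span)
        if hi > lo:
            covered += hi - lo
            reach = hi
    return covered / span


# Two markers 10 bp apart in a 30 bp region (toy numbers).
print(detection_probability([10, 20], tract_length=5, region_length=30))   # -> 0.4
print(detection_probability([10, 20], tract_length=15, region_length=30))  # -> 1.0
```

Short tracts are missed whenever they fall entirely between markers, so observed non-crossover counts underestimate the true rate by an amount that depends on local marker density.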

Hot spot identification

Hotspots. Identified 179 recombination hot spots, incl. all previously known except for HIS2:HIS4, ARG4, CYS3, DED81, ARE1/IMG1, CDC19, THR4, LEU2-CEN3. None overlapped a centromere. Hottest: 28% of spores (59% of meioses). 84% overlap a promoter. 25% of bases in hot spot intervals overlap promoters, while 68% overlap coding sequences.
