High resolution detection of IBD Sharon R Browning and Brian L Browning Supported by the Marsden Fund
Aim • Detect short segments of identity by descent (IBD) in “unrelated” individuals or distant relatives • <1 cM (or < 1 Mb) • Need dense SNP data • Account for linkage disequilibrium (LD) • Various applications • IBD mapping in humans – midway between linkage mapping and association mapping. • Could be useful for QTL mapping in cows and sheep?
What is IBD? • [Pedigree figure: founder (grandmother); half-cousins, who may share IBD through the grandmother] • In a pedigree, IBD is defined in terms of pedigree founders: • Two haplotypes are IBD if they are copies of the same founder haplotype. • IBD regions are typically large (10+ cM) for small pedigrees.
IBD without a pedigree is nebulous • Assuming no recurrent mutation, identical alleles are IBD • This definition leads to ordinary association tests • IBD that is useful for improving mapping: • Extends beyond background LD • Is due to non-ancient ancestry
What level of resolution is needed? • Very long IBD stretches (5+ Mb) • are easy to detect • but are too rare. • For IBD mapping • Expected size of IBD regions depends on when the mutation(s) entered the population. • Small IBD regions give better localization.
IBD Model Part I • Uses Beagle model previously applied to • haplotype phase inference • imputation • multilocus association testing. • No need to prune SNPs → greater power to detect short segments. • Beagle LD model is computationally efficient.
Beagle model • At each marker location, haplotypes are clustered. • Number of clusters can vary, depending on LD structure. • Approx. 100 clusters in a data set with 2000 individuals. • The model is constructed to be Markov (in the haplotype clusters).
IBD Model Part II • Markov model for IBD with two states • 0 or 1 pair of haplotypes shared IBD between a pair of individuals. • Need to check for homozygosity within individuals first. • Transition probabilities specified by the user based on population history.
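The two-state IBD chain above can be sketched as a transition matrix between adjacent markers. This is only an illustration: the slides say the transition probabilities are user-specified from population history but do not give a parametrization, so the continuous-time form below and the names `rate_into_ibd` and `rate_out_of_ibd` (per cM) are assumptions, not the authors' implementation.

```python
import math


def ibd_transition_matrix(distance_cm, rate_into_ibd, rate_out_of_ibd):
    """Transition matrix of a two-state (0 = not IBD, 1 = IBD) Markov chain.

    Assumption: the chain runs in continuous time along the chromosome,
    with hypothetical user-chosen rates (per cM) of entering and leaving IBD.
    Returns [[P00, P01], [P10, P11]] for markers distance_cm apart.
    """
    total_rate = rate_into_ibd + rate_out_of_ibd
    stationary_ibd = rate_into_ibd / total_rate      # long-run P(IBD)
    stationary_non = rate_out_of_ibd / total_rate    # long-run P(not IBD)
    decay = math.exp(-total_rate * distance_cm)      # memory left after distance_cm
    return [
        [stationary_non + stationary_ibd * decay, stationary_ibd * (1.0 - decay)],
        [stationary_non * (1.0 - decay), stationary_ibd + stationary_non * decay],
    ]


# Example: markers 0.01 cM apart, IBD rare and segments about 1 cM long on average.
matrix = ibd_transition_matrix(0.01, rate_into_ibd=0.001, rate_out_of_ibd=1.0)
```

With this form, closely spaced markers almost always stay in the same state, and widely spaced markers fall back to the long-run IBD proportion.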
IBD Model Part III • Allow for some genotyping error • Computationally prohibitive to sum over all possible miscalled genotypes. • Instead allow for IBD where the haplotypes are not identical by state (IBS), with a penalty. • P(haplotypes | IBD) multiplied by error rate if haplotypes are not IBS at the position. • Used error rate = 0.01 or 0.001 (depending on data quality). • Does not correct for the haplotype errors caused by genotype error.
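The error penalty described above amounts to a one-line rule. The sketch below is a minimal illustration; the function name `ibd_emission` and its arguments are hypothetical, and how P(haplotypes | IBD) itself is computed from the LD model is not shown.

```python
def ibd_emission(p_haplotypes_given_ibd, is_ibs, error_rate=0.01):
    """Apply the genotype-error penalty at one marker position.

    p_haplotypes_given_ibd: P(haplotypes | IBD) from the LD model (assumed given).
    is_ibs: True if the two haplotypes carry the same allele at this position.
    error_rate: 0.01 or 0.001 on the slide, depending on data quality.
    """
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    if is_ibs:
        return p_haplotypes_given_ibd
    # Not IBS: IBD is still allowed, but down-weighted by the error rate.
    return p_haplotypes_given_ibd * error_rate
```

A single discordant marker therefore lowers the IBD likelihood by a factor of 100 or 1000 rather than ruling IBD out.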
Estimation • Build LD model using 10 iterations of stochastic EM. • Simultaneous phasing and IBD detection. • Don’t have to worry about getting haplotypes wrong. • Calculate IBD probabilities using forward-backward algorithm for this model. • Repeat with 3 restarts of LD model building, then average the IBD probabilities. • Model can get caught in a local maximum, leading to false-positive IBD.
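The forward-backward step and the averaging over restarts can be sketched for a generic small hidden Markov model. This is a simplified illustration, not the Beagle implementation: the real model also sums over haplotype clusters and phase, whereas the sketch assumes the per-marker emission probabilities are already available. All function and argument names are hypothetical.

```python
def forward_backward(init, trans, emit):
    """Posterior state probabilities at each marker for a discrete HMM.

    init:  initial state probabilities, length S.
    trans: list of M-1 transition matrices; trans[m][r][s] is the probability
           of moving from state r at marker m to state s at marker m+1.
    emit:  list of M rows; emit[m][s] = P(data at marker m | state s).
    """
    n_markers, n_states = len(emit), len(init)
    states = range(n_states)

    def normalise(values):
        total = sum(values)
        return [v / total for v in values]

    # Forward pass, rescaled at each marker to avoid underflow.
    fwd = [normalise([init[s] * emit[0][s] for s in states])]
    for m in range(1, n_markers):
        fwd.append(normalise([
            emit[m][s] * sum(fwd[m - 1][r] * trans[m - 1][r][s] for r in states)
            for s in states
        ]))

    # Backward pass, also rescaled.
    bwd = [[1.0] * n_states for _ in range(n_markers)]
    for m in range(n_markers - 2, -1, -1):
        bwd[m] = normalise([
            sum(trans[m][s][r] * emit[m + 1][r] * bwd[m + 1][r] for r in states)
            for s in states
        ])

    # Combine and renormalise to get posteriors.
    return [normalise([fwd[m][s] * bwd[m][s] for s in states])
            for m in range(n_markers)]


def average_ibd_probabilities(runs):
    """Average per-marker IBD probabilities over several restarts."""
    return [sum(values) / len(values) for values in zip(*runs)]
```

Averaging over restarts damps a spuriously high IBD probability that appears in only one fitted LD model.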
Threshold for IBD • [Figure: IBD probability on a 0 to 1 axis, with 0.5 marked and the IBD region indicated] • We use a threshold of 0.99 on posterior IBD probability. • Define length of IBD region as distance over which IBD probability > 0.5 • but IBD probability must be ≥ 0.99 somewhere in the region.
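The thresholding rule can be sketched as a scan over per-marker posterior probabilities. One assumption is made here: region ends are taken at the first and last markers whose probability exceeds 0.5, with no interpolation between markers. The function name `call_ibd_regions` is hypothetical.

```python
def call_ibd_regions(positions_cm, ibd_probs, extend=0.5, call=0.99):
    """Return (start_cm, end_cm) for each called IBD region.

    A region is a maximal run of consecutive markers with probability > extend;
    it is kept only if the probability reaches at least `call` somewhere in it.
    """
    regions = []
    start = None   # index of the first marker in the current run
    peak = 0.0     # highest probability seen in the current run
    for i, prob in enumerate(ibd_probs):
        if prob > extend:
            if start is None:
                start, peak = i, prob
            else:
                peak = max(peak, prob)
        elif start is not None:
            if peak >= call:
                regions.append((positions_cm[start], positions_cm[i - 1]))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and peak >= call:
        regions.append((positions_cm[start], positions_cm[-1]))
    return regions


positions = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
probs = [0.1, 0.6, 0.995, 0.7, 0.2, 0.8, 0.9]
regions = call_ibd_regions(positions, probs)  # second run never reaches 0.99
```

In this toy input the first run is kept because it peaks at 0.995, while the second run is dropped because it stays below 0.99.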
Data • 1958 British Birth Cohort (1958BC) • Genotyped on Illumina 550K platform (Sanger) and Affymetrix 500K (WTCCC). • Genotypes re-called by Beagle (using LD) to improve accuracy. • 1400 individuals.
Detection of IBD – 1958BC • Chromosome 22, non-monomorphic markers • Illumina: 8407 SNPs • Affymetrix: 5098 SNPs • In 40,000 random pairs found • Illumina: • 54 IBD regions (lengths 0.52 – 12.5 cM) • Affymetrix: • 19 IBD regions (lengths 2.1 – 12.1 cM) • 58 regions total • For the 4 regions found by Affymetrix but not by Illumina, Illumina had IBD probability ≥0.92 • Various regions shown on next 3 slides.
0.5 cM region • [Figure] Illumina = solid black line; Affymetrix = dashed blue line
Conclusions • New, very dense genotype data provide new opportunity to detect small IBD regions. • Detection of short IBD regions will play an important role in various genetic analyses. • Computation is challenging • Need a pre-filter?