High resolution detection of IBD

High resolution detection of IBD. Sharon R Browning and Brian L Browning Supported by the Marsden Fund. Aim. Detect short segments of identity by descent (IBD) in “unrelated” individuals or distant relatives <1 cM (or < 1 Mb) Need dense SNP data Account for linkage disequilibrium (LD)


Presentation Transcript


  1. High resolution detection of IBD Sharon R Browning and Brian L Browning Supported by the Marsden Fund

  2. Aim • Detect short segments of identity by descent (IBD) in “unrelated” individuals or distant relatives • Segments of < 1 cM (or < 1 Mb) • Need dense SNP data • Account for linkage disequilibrium (LD) • Various applications • IBD mapping in humans – midway between linkage mapping and association mapping. • Could be useful for QTL mapping in cows and sheep?

  3. What is IBD? [Figure: pedigree with a founder (grandmother) and two half-cousins, who may share IBD through the grandmother] • In a pedigree, IBD is defined in terms of pedigree founders: • Two haplotypes are IBD if they are copies of the same founder haplotype. • IBD regions typically large (10+ cM) for small pedigrees.

  4. IBD without a pedigree is nebulous • Assuming no recurrent mutation, identical alleles are IBD • This definition leads to ordinary association tests • IBD that is useful for improving mapping • Extends beyond background LD • Is due to non-ancient ancestry

  5. What level of resolution is needed? • Very long IBD stretches (5+ Mb) • are easy to detect • but are too rare. • For IBD mapping • Expected size of IBD regions depends on when the mutation(s) entered the population. • Small IBD regions give better localization.

  6. IBD Model Part I • Uses Beagle model previously applied to • haplotype phase inference • imputation • multilocus association testing. • No need to prune SNPs → greater power to detect short segments. • Beagle LD model is computationally efficient.

  7. Beagle model • At each marker location, haplotypes are clustered. • Number of clusters can vary, depending on LD structure. • Approx. 100 clusters in a data set with 2000 individuals. • The model is constructed to be Markov (in the haplotype clusters).

  8. IBD Model Part II • Markov model for IBD with two states • 0 or 1 pair of haplotypes shared IBD between a pair of individuals. • Need to check for homozygosity within individuals first. • Transition probabilities specified by the user based on population history.
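
The two-state IBD chain on this slide can be sketched as follows. The slide only says that the transition probabilities are user-specified from population history; the continuous-time parameterization below (rates per cM into and out of IBD, the function name, and the example values) is an illustrative assumption, not the parameterization used in the talk.

```python
import math

def ibd_transition_matrix(d_cm, rate_into_ibd, rate_out_of_ibd):
    """Transition matrix between adjacent markers d_cm apart.

    States: 0 = no haplotype pair shared IBD, 1 = one pair shared IBD.
    Rates are hypothetical user-chosen values (per cM), standing in for
    the "specified by the user" transition probabilities on the slide.
    """
    total = rate_into_ibd + rate_out_of_ibd
    stationary_ibd = rate_into_ibd / total       # long-run P(IBD)
    decay = math.exp(-total * d_cm)              # memory left after d_cm
    p01 = stationary_ibd * (1.0 - decay)         # non-IBD -> IBD
    p10 = (1.0 - stationary_ibd) * (1.0 - decay) # IBD -> non-IBD
    return [[1.0 - p01, p01],
            [p10, 1.0 - p10]]

# Example with made-up rates: IBD is rare to enter, and segments end
# at a rate of 1 per cM once entered.
matrix = ibd_transition_matrix(0.01, rate_into_ibd=0.0001, rate_out_of_ibd=1.0)
```

With this form, closely spaced markers almost always stay in the same state, and widely spaced markers forget the previous state and fall back to the long-run IBD probability.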

  9. IBD Model Part III • Allow for some genotyping error • Computationally prohibitive to sum over all possible miscalled genotypes. • Instead allow for IBD when there is no identity by state (IBS), with a penalty. • P(haplotypes | IBD) multiplied by error rate if haplotypes are not IBS at the position. • Used error rate = 0.01 or 0.001 (depending on data quality). • Doesn’t correct for the haplotype errors caused by genotype error.
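
The error penalty described on this slide can be illustrated with a minimal sketch. The function name and the single-marker framing are assumptions made for illustration; only the rule itself (multiply P(haplotypes | IBD) by the error rate when the alleles are not IBS) comes from the slide.

```python
def penalized_ibd_likelihood(prob_given_ibd, allele_a, allele_b, error_rate=0.01):
    """Likelihood contribution of one marker under the IBD state.

    prob_given_ibd: P(haplotypes | IBD) at this marker from the LD model
                    (a hypothetical input here).
    allele_a, allele_b: alleles carried by the two haplotypes.
    error_rate: 0.01 or 0.001 on the slide, depending on data quality.
    """
    if allele_a == allele_b:
        # Haplotypes are IBS at this position: no penalty.
        return prob_given_ibd
    # Not IBS: IBD is still allowed, but is penalized by the error rate.
    return prob_given_ibd * error_rate

matching = penalized_ibd_likelihood(0.4, "A", "A")
mismatching = penalized_ibd_likelihood(0.4, "A", "G", error_rate=0.001)
```

A single mismatch therefore lowers the IBD likelihood sharply without setting it to zero, so one miscalled genotype does not necessarily break an otherwise long shared segment.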

  10. Estimation • Build LD model using 10 iterations of stochastic EM. • Simultaneous phasing and IBD detection. • Don’t have to worry about getting haplotypes wrong. • Calculate IBD probabilities using forward-backward algorithm for this model. • Repeat with 3 restarts of LD model building, then average the IBD probabilities. • Model can get caught in a local maximum, leading to false positive IBD.
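
The forward-backward and averaging steps can be sketched on a reduced problem. This is a simplified two-state version that takes per-marker likelihoods as given; in the talk the calculation runs jointly with the Beagle LD model and phasing, which is not reproduced here. All function and argument names are hypothetical.

```python
def ibd_posteriors(initial, transitions, emissions):
    """Posterior P(IBD) at each marker for a two-state HMM.

    initial:     (P(non-IBD), P(IBD)) at the first marker.
    transitions: transitions[m][r][s] = P(state s at marker m+1 | state r at m).
    emissions:   emissions[m] = (likelihood if non-IBD, likelihood if IBD).
    Assumes at least one state has non-zero likelihood at every marker.
    """
    n = len(emissions)
    states = (0, 1)

    # Forward pass, rescaled at each marker to avoid underflow.
    prev = [initial[s] * emissions[0][s] for s in states]
    prev = [x / sum(prev) for x in prev]
    fwd = [prev]
    for m in range(1, n):
        t = transitions[m - 1]
        cur = [emissions[m][s] * sum(prev[r] * t[r][s] for r in states)
               for s in states]
        prev = [x / sum(cur) for x in cur]
        fwd.append(prev)

    # Backward pass, also rescaled.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for m in range(n - 2, -1, -1):
        t = transitions[m]
        cur = [sum(t[s][r] * emissions[m + 1][r] * bwd[m + 1][r] for r in states)
               for s in states]
        bwd[m] = [x / sum(cur) for x in cur]

    # Combine and normalize per marker.
    posteriors = []
    for m in range(n):
        w0 = fwd[m][0] * bwd[m][0]
        w1 = fwd[m][1] * bwd[m][1]
        posteriors.append(w1 / (w0 + w1))
    return posteriors

def average_over_restarts(runs):
    """Average per-marker IBD probabilities across model-building restarts."""
    return [sum(values) / len(values) for values in zip(*runs)]
```

Averaging across restarts means that a spurious high probability from one run stuck in a local maximum is diluted by the other runs before any threshold is applied.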

  11. Threshold for IBD [Figure: posterior IBD probability (axis from 0 to 1, with 0.5 marked) along the chromosome, with the IBD region indicated] • We use a threshold of 0.99 on posterior IBD probability. • Define length of IBD region as distance over which IBD probability > 0.5 • but IBD probability must be ≥ 0.99 somewhere in the region.
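
The thresholding rule above can be sketched as a single pass over per-marker probabilities. Measuring region ends at marker positions (rather than interpolating between markers) and the function and argument names are assumptions made for this sketch.

```python
def call_ibd_regions(positions_cm, ibd_probs, extend=0.5, call=0.99):
    """Return (start_cm, end_cm) for each called IBD region.

    A region is a maximal run of markers with probability > extend;
    it is reported only if the probability reaches >= call somewhere
    inside the run.
    """
    regions = []
    start = None   # index of the first marker in the current run
    peak = 0.0     # highest probability seen in the current run
    for i, prob in enumerate(ibd_probs):
        if prob > extend:
            if start is None:
                start, peak = i, prob
            else:
                peak = max(peak, prob)
        elif start is not None:
            if peak >= call:
                regions.append((positions_cm[start], positions_cm[i - 1]))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and peak >= call:
        regions.append((positions_cm[start], positions_cm[-1]))
    return regions

positions = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
probs = [0.1, 0.6, 0.995, 0.7, 0.2, 0.8, 0.9]
regions = call_ibd_regions(positions, probs)
```

In this example the first run (markers at 1.0 to 3.0 cM) is kept because it reaches 0.995, while the second run is dropped because it never reaches 0.99.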

  12. Data • 1958 British Birth Cohort (1958BC) • Genotyped on Illumina 550K platform (Sanger) and Affymetrix 500K (WTCCC). • Genotypes re-called by Beagle (using LD) to improve accuracy. • 1400 individuals.

  13. Detection of IBD – 1958BC • Chromosome 22, non-monomorphic markers • Illumina: 8407 SNPs • Affymetrix: 5098 SNPs • In 40,000 random pairs, we found: • Illumina: 54 IBD regions (lengths 0.52 – 12.5 cM) • Affymetrix: 19 IBD regions (lengths 2.1 – 12.1 cM) • 58 regions total • For the 4 regions found by Affymetrix but not by Illumina, Illumina had IBD probability ≥ 0.92 • Various regions shown on next 3 slides.

  14. 0.5 cM region • Illumina = solid black line; Affymetrix = dashed blue line

  15. Illumina = solid black line; Affymetrix = dashed blue line

  16. Illumina = solid black line; Affymetrix = dashed blue line

  17. Conclusions • New, very dense genotype data provide a new opportunity to detect small IBD regions. • Detection of short IBD regions will play an important role in various genetic analyses. • Computation is challenging • Need a pre-filter?
