1 / 27

Haplotype inference and haplotype-based transmission disequilibrium test (Hap-TDT)

中国畜牧兽医学会信息技术分会 2009 年会 . 哈尔滨. Haplotype inference and haplotype-based transmission disequilibrium test (Hap-TDT). 丁向东 中国农业大学动物科技学院. Outline. Haplotype inference Introduction of important methods Parsimony (Clark,1990) EM (Excoffier and Slatkin, 1995 )

makala
Download Presentation

Haplotype inference and haplotype-based transmission disequilibrium test (Hap-TDT)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 中国畜牧兽医学会信息技术分会 2009年会. 哈尔滨 Haplotype inference and haplotype-based transmission disequilibrium test (Hap-TDT) 丁向东 中国农业大学动物科技学院

  2. Outline • Haplotype inference • Introduction of important methods • Parsimony (Clark,1990) • EM (Excoffier and Slatkin, 1995) • Bayesian (Stephens and Donnelly,2001) • Haplotype inference using family information • Transmission Disequilibrium Test (TDT) • Haplotype-based TDT (Hap-TDT)

  3. Genotype and Haplotype Haplotype: A collection of alleles from same chromosome Allele & Genotype Haplotype & diplotype Haplotype reconstruction Chromosome phase is kown Chromosome phase is unkown

  4. Algorithms for haplotype inference • Statistical methods • Parsimony (Clark,1990) • EM • Excoffier and Slatkin (1995); Qin et al. (2002) • Bayesian • Stephens and Donnelly (2001); Niu et al. (2002) • Rule-based methods • Minimum recombination principle • Qian and Beckmann (2002); Baruch, et al. (2006)

  5. A1A1 A1A2 A2A3 A6A7 A3A4 A3A5 Parsimony (Clark,1990) • Start from a homozygote • Determine any other ambiguous sequence using the definitive haplotype from 1 • Continue this procedure until all haplotypes are resolved or until no more new haplotypes can be found

  6. EM algorithm • Numerical method of finding maximum likelihood estimates for parameters given incomplete data. • Initial parameter values: haplotype frequencies • Expectation step: compute expected values of missing data based on initial data • Maximization step: compute MLE for parameters from the complete data • Repeat with updated set of parameters until changes in the parameter estimates are negligible.

  7. Heavy computational burden with large number of loci Partition-ligation algorithm (Niu et al., 2002) PL-EM (Qin et al., 2002) Accuracy and departures from HWE Assumption of HWE in most EM-based methods Robust to departure from HWE (Fallin and Schork, 2000) EM algorithm efficiency

  8. Bayesian method • PHASE (Stephens and Donnelly, 2001) • Based on coalescent model • Use Gibbs sampling • So far, very accurate, but also complicated. • fastPHASE (Scheet and Stephens,2006) • Revised version of PHASE • Much faster, missing genotype

  9. Comparison of Parsimony,EM and PHASE Parsimony EM PHASE • PHASE performs better than parsimony and EM (Stephen, 2001) • PHASE and EM-based methods exhibited similar performances (Zhang et al. 2001; Xu et al. 2002)

  10. Haplotype inference using family data (1) • Haplotype inference based on close relatives • Reduces haplotype ambiguity and improves the efficiency • Rohde and Fuerst (2001) – EM algorithm • Families with both parents and their children • The genotyped offspring reduce the number of potential haplotype pairs for both parents. • Ding et al. (2006) - EM algorithm • Families with only one parent available

  11. Ding et al. – EM algorithm Families with only sibs Mixed family data Complete families Incomplete families One parent Only sibs Haplotype inference using family data (2) genotypes of sibs Parental information Haplotype pairs of sibs

  12. Comparison of four different strategies (Ding et al.,2006)

  13. Incomplete-family-EM PHASE GENEHUNTER Result for incomplete families Running time of PHASE: • 3.5 hs for the whole 100 datasets of 30 trios,187 SNPs (Marchini et al.,2006) • Running time will become prohibitive for large SNPs

  14. Rule-based method • Minimum recombination principle • Qian and Beckmann (2002); Baruch, et al. (2006) • Genetic recombination is rare • Haplotype with fewer recombinants should be preferred in a haplotype reconstruction Rule 1 Rule 2 … Wang et al., 2006

  15. Joint EM and rule-based algorithm for (grand-) daughter design • Assumption of no recombination • EM algorithm to construct diplotype • Taking into account recombination • Minimum recombination principle • Find the diplotype of sire that minimizes the number of recombinations in the sire family

  16. Example: • Possible diplotypesrecom. events • ------------------------------------------------ • 1. 22222 11111 1 • 2. 11222 22111 51 • 3. 11122 22211 49 • 4. 22221 11112 51 • 12222 21111 51 • ------------------------------------------------- A sire family with 50 offspring phase-unknown genotype of sire:(12;12;12;12;12)

  17. Results • 10 sire families with 30 offspring each • 10 SNPs , 0.5 centiMorgan

  18. Results - incomplete data • 10 sire families with 20 offspring each • 5SNPs, 0.5 centiMorgan • 30% individuals with one random missing locus.

  19. Linkage Family-based Matching/ethnicity generally unimportant Few markers for genome coverage (300-400 STRs) Can be weak design Good for initial detection; poor for fine-mapping Powerful for rare variants Association Families or unrelateds Matching/ethnicity crucial Many markers req for genome coverage (105 – 106 SNPs) Powerful design Poor for initial detection; good for fine-mapping Powerful for common variants; rare variants generally impossible Linkage vs Association

  20. TDT (Transmission Disequilibrium Test) • Compares the distribution of transmitted and non-transmitted alleles by parents of affected offspring (Spielman et al. 1993) If the marker is unlinked to the causative locus then we expect b = c, else, one of the alleles will tend to be transmitted more often

  21. TDT • TDT (Transmission Disequilibrium Test) • Good for fine-mapping, poor for initial detection • Robust for population stratification/admixture • Widely used in genome-wide association study • Extended to all kinds of data structure • Multi allelic markers (Sham and Curtis, 1995) • Multiple siblings (Spielman,1998; Boehnke, 1998) • Missing parental data (Sun, 1999) • Extended pedigree (Martin et al., 2000) • Quantitative traits (Allison,1997; Ding et al,2004)

  22. Comparison of PDT,QTDT and PTDT Quantitative trait

  23. Hap-TDT • Hap-TDT (Haplotype-based TDT) • Haplotype is more informative than single marker • The original TDT and most of its extensions consider one marker at a time. • Two categories of haplotype-based TDT • Haplotype reconstruction first • Sethuraman (1997); Wilson (1997); Clayton and Jones (1999); Zhao et al. (2000); Zhang et al.(2003) • Implicit haplotype reconstruction • Dudbridge (2003)

  24. Hap-TDT vs TDT Zhang et al. (2003) Qualitative trait Quantitative trait (h2 = 0.06)

  25. Hap-TDT vs TDT • 5SNPs, Excluding causal variant SNP

  26. A large number of haplotypes A large number of degrees of freedom Limit the power of test More markers Hap- TDT • Problem of multiple comparisons • Increase in the degree of freedom

  27. Acknowledgement • Prof. Qin Zhang (China Agricultural University) • Prof. Henner Simianer (University of Goettingen)

More Related