Biostatistics-Lecture 19 Linkage Disequilibrium and SNP detection. Ruibin Xi Peking University School of Mathematical Sciences. Haplotype Frequencies. Linkage Equilibrium. Linkage Disequilibrium. Disequilibrium Coefficient D_AB. D_AB is hard to interpret. Sign is arbitrary …
Biostatistics-Lecture 19: Linkage Disequilibrium and SNP detection • Ruibin Xi • Peking University School of Mathematical Sciences
D_AB is hard to interpret • Sign is arbitrary … • A common convention is to let A, B denote the common alleles and a, b the rare alleles • Range depends on the allele frequencies • Hard to compare between markers
r² (also called Δ²) • Ranges between 0 and 1 • Equals 1 when the two markers provide identical information • Equals 0 when they are in perfect linkage equilibrium
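As a minimal sketch of how the two measures relate, the snippet below computes D_AB and r² from the four haplotype frequencies, using the standard definitions D_AB = p_AB - p_A p_B and r² = D_AB² / (p_A (1 - p_A) p_B (1 - p_B)). The function name `ld_stats` and its argument names are illustrative and do not come from the lecture.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D_AB, r2) from the four haplotype frequencies.

    Hypothetical helper for illustration; the frequencies must sum to 1.
    """
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at the first marker
    p_B = p_AB + p_aB  # frequency of allele B at the second marker
    D = p_AB - p_A * p_B
    denom = p_A * (1.0 - p_A) * p_B * (1.0 - p_B)
    if denom == 0.0:
        raise ValueError("r2 is undefined when a marker is monomorphic")
    return D, D * D / denom


# Only AB and ab haplotypes occur: the markers carry identical information.
print(ld_stats(0.7, 0.0, 0.0, 0.3))
# Haplotype frequencies are products of allele frequencies: equilibrium.
print(ld_stats(0.42, 0.28, 0.18, 0.12))
```

In the first call r² is 1 up to rounding, and in the second both D_AB and r² are 0 up to rounding, matching the two boundary cases on the slide.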
Comparing Populations CEPH: Utah residents with ancestry from northern and western Europe (CEU)
Use LD for SNP imputation and detection • fastPHASE
Model for haplotypes • Observe n haplotypes • Each with M markers • b_ij ∈ {0, 1} • Assume each haplotype originates from one of K clusters • z_i: unknown cluster of origin of b_i • Since the clusters of origin are unknown, each b_i follows a mixture distribution over the K clusters
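The formulas for this slide did not survive, so the following is only a sketch of the mixture likelihood under assumed notation: `alpha[k]` is the probability that a haplotype originates from cluster k, and `theta[k][m]` is the frequency of allele 1 at marker m within cluster k. Both names, and the function name, are assumptions made for illustration.

```python
def haplotype_mixture_likelihood(b, alpha, theta):
    """P(b) = sum_k alpha[k] * prod_m P(b[m] | cluster k).

    b     : list of 0/1 alleles, one per marker
    alpha : cluster weights (assumed notation), summing to 1
    theta : theta[k][m] = assumed frequency of allele 1 at marker m in cluster k
    """
    total = 0.0
    for a_k, theta_k in zip(alpha, theta):
        p = a_k
        for b_m, t in zip(b, theta_k):
            # Bernoulli emission for a single marker
            p *= t if b_m == 1 else 1.0 - t
        total += p
    return total


alpha = [0.5, 0.5]
theta = [[0.9, 0.9], [0.1, 0.1]]
print(haplotype_mixture_likelihood([1, 1], alpha, theta))
```

Summing the cluster-specific probabilities, weighted by `alpha`, is what replaces the unknown z_i.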
Local clustering of haplotypes • Assume z_i = (z_i1, …, z_iM) forms a Markov chain on {1, …, K} • z_im denotes the cluster of origin of b_im • Initial probabilities • Transition probabilities • Conditional on the cluster of origin • Marginal
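The marginal probability of a haplotype under such a hidden Markov chain can be computed with the forward algorithm. The sketch below assumes generic inputs, since the slide's formulas were lost: `init[k]` for the initial probabilities, one K x K row-stochastic matrix per marker interval in `trans`, and `theta[k][m]` for the allele-1 frequency in cluster k at marker m. It is not the specific parameterization used by any particular program.

```python
def haplotype_hmm_likelihood(b, init, trans, theta):
    """Forward algorithm: marginal probability of haplotype b.

    init[k]        : P(z_1 = k)                         (assumed notation)
    trans[m][j][k] : P(z_{m+2} = k | z_{m+1} = j), one matrix per interval
    theta[k][m]    : P(b_m = 1 | z_m = k)               (assumed notation)
    """
    K = len(init)

    def emit(k, m):
        return theta[k][m] if b[m] == 1 else 1.0 - theta[k][m]

    # fwd[k] = P(b_1..b_m, z_m = k), updated marker by marker
    fwd = [init[k] * emit(k, 0) for k in range(K)]
    for m in range(1, len(b)):
        T = trans[m - 1]
        fwd = [
            sum(fwd[j] * T[j][k] for j in range(K)) * emit(k, m)
            for k in range(K)
        ]
    return sum(fwd)


init = [0.5, 0.5]
theta = [[0.9, 0.9], [0.1, 0.1]]
no_switch = [[[1.0, 0.0], [0.0, 1.0]]]     # cluster never changes
free_switch = [[[0.5, 0.5], [0.5, 0.5]]]   # cluster redrawn at every marker
print(haplotype_hmm_likelihood([1, 1], init, no_switch, theta))
print(haplotype_hmm_likelihood([1, 1], init, free_switch, theta))
```

With identity transitions the chain reduces to the global mixture model of the previous slide; allowing switches lets cluster membership vary along the chromosome, which is the sense in which the clustering is local.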
Local clustering of genotype data • We have genotype data • g_im: genotype at marker m of individual i • Takes values 0, 1, 2 • Initial probabilities (unordered clusters of origin) • Transition probabilities
Local clustering of genotype data • Genotype probabilities conditional on the clusters of origin • Joint likelihood
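A sketch of the genotype probabilities conditional on an unordered pair of clusters, assuming the two alleles are drawn independently, one from each cluster, with allele-1 frequencies `theta1` and `theta2` at the marker. These names are illustrative; the slide's own formula was not preserved.

```python
def genotype_prob(g, theta1, theta2):
    """P(g | clusters {k1, k2}) for a genotype g in {0, 1, 2}.

    theta1, theta2: assumed allele-1 frequencies of the two clusters
    at this marker; the pair is unordered, so the result is symmetric.
    """
    if g == 0:
        return (1.0 - theta1) * (1.0 - theta2)
    if g == 1:
        # either haplotype may carry the single copy of allele 1
        return theta1 * (1.0 - theta2) + theta2 * (1.0 - theta1)
    if g == 2:
        return theta1 * theta2
    raise ValueError("genotype must be 0, 1 or 2")


print([genotype_prob(g, 0.9, 0.1) for g in (0, 1, 2)])
```

These terms play the role of emission probabilities: the joint likelihood multiplies them along the markers together with the initial and transition probabilities of the pair of cluster chains.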
Algorithms for genotype imputation • fastPHASE • BEAGLE • IMPUTE • PLINK • MaCH • Picture taken from IMPUTE v2
SNP detection with LD information • MaCH: (G: genotype, S: cluster)
SNP detection with LD information • For sequencing data, G is not observed • The coverages of bases A and B are observed, and we have the HMM
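One way to connect read coverage to the hidden cluster state is to sum over the unobserved genotype. The sketch below assumes a simple binomial read model with a symmetric per-base error rate `err`; this model, the function names, and the default error rate are assumptions for illustration, not the model on the slide, whose formulas were lost.

```python
from math import comb


def read_count_likelihood(n_A, n_B, g, err=0.01):
    """P(n_A, n_B | G = g), where g is the number of B alleles (0, 1 or 2).

    Assumed model: each read shows B with probability err, 0.5 or 1 - err.
    """
    p_B = {0: err, 1: 0.5, 2: 1.0 - err}[g]
    return comb(n_A + n_B, n_B) * p_B ** n_B * (1.0 - p_B) ** n_A


def emission_given_state(n_A, n_B, geno_prob_given_state, err=0.01):
    """P(n_A, n_B | S) = sum_g P(n_A, n_B | G = g) * P(G = g | S)."""
    return sum(
        read_count_likelihood(n_A, n_B, g, err) * geno_prob_given_state[g]
        for g in (0, 1, 2)
    )


# Five reads of each base favour the heterozygous genotype.
print([read_count_likelihood(5, 5, g) for g in (0, 1, 2)])
# Emission for a state whose genotype distribution is (0.01, 0.18, 0.81).
print(emission_given_state(5, 5, [0.01, 0.18, 0.81]))
```

The second function is the only change needed relative to the observed-genotype case: the HMM keeps the same hidden states, and the emission becomes a genotype-weighted average of read-count likelihoods.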
SNP detection with LD information • Nielsen et al. 2011, Nature Reviews Genetics