(1) Schedule
• Mar 15: Linkage disequilibrium (LD) mapping
• Mar 17: LD mapping
• Mar 22: Guest speaker, Dr Yang
• Mar 24: Overview
• Attend ENAR Biometrical meeting in Austin from Mar 20 to 23

(2) Projects
• Work on a problem learnt in the class
• Select a problem from your own projects

What I have learnt from my trip to Seattle
• Fred Hutchinson Cancer Research Center
• University of Washington

• Statistical Genetics of Complex Traits
• Single Nucleotide Polymorphisms (SNPs)
• Haplotype blocks
• HIV/AIDS dynamics
• Cancer progression

Statistical Genetics of Complex Traits
Rongling Wu, Chang-Xing Ma and George Casella
Springer-Verlag New York
Linkage, Disequilibrium and QTL

Linkage Disequilibrium
• Linkage analysis: controlled crosses (backcross or F2) and structured pedigrees (grandparent-parent-children generations)
• Linkage disequilibrium analysis: natural populations
• Linkage mapping is used in plant and animal genetics, as well as human genetics of diseases like cancers.
• LD mapping is used for human genetics of diseases like HIV/AIDS and SARS.

Linkage mapping - backcross
Mixture model-based likelihood without marker information

$$L(y \mid \Omega) = \prod_{i=1}^{n} \left[ \tfrac{1}{2} f_1(y_i) + \tfrac{1}{2} f_0(y_i) \right]$$

                          QTL genotype
Sample  Height (cm, y)    Qq    qq
1       184               ½     ½
2       185               ½     ½
3       180               ½     ½
4       182               ½     ½
5       167               ½     ½
6       169               ½     ½
7       165               ½     ½
8       166               ½     ½
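
The likelihood without marker information can be evaluated directly for the eight heights in the table. The following is a minimal sketch, not the course's own code: the function names, the trial means (the averages of the four tall and four short samples) and the trial variance of 4.0 are assumptions chosen for illustration.

```python
import math

def normal_pdf(y, mu, sigma2):
    """Normal density with mean mu and variance sigma2."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def backcross_loglik_no_markers(heights, mu1, mu0, sigma2):
    """log L(y | Omega) with mixture proportions fixed at 1/2 for Qq and qq."""
    return sum(
        math.log(0.5 * normal_pdf(y, mu1, sigma2) + 0.5 * normal_pdf(y, mu0, sigma2))
        for y in heights
    )

heights = [184, 185, 180, 182, 167, 169, 165, 166]

# Assumed trial values: two separated genotype means versus a single common mean.
ll_two_means = backcross_loglik_no_markers(heights, 182.75, 166.75, 4.0)
ll_one_mean = backcross_loglik_no_markers(heights, 174.75, 174.75, 4.0)
print(ll_two_means, ll_one_mean)
```

Under these assumed values the two-mean model gives the larger log-likelihood, which is what one would expect from two visibly separated clusters of heights.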
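
Linkage mapping - backcross
Mixture model-based likelihood with marker information

$$L(y, M \mid \Omega) = \prod_{i=1}^{n} \left[ \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i) \right]$$

                          Marker genotype     QTL genotype (prior prob.)
Sample  Height (cm, y)    M1       M2         Qq       qq
1       184               Mm (1)   Nn (1)     1        0
2       185               Mm (1)   Nn (1)     1        0
3       180               Mm (1)   Nn (1)     1        0
4       182               Mm (1)   nn (0)     1-θ      θ
5       167               mm (0)   Nn (1)     θ        1-θ
6       169               mm (0)   nn (0)     0        1
7       165               mm (0)   nn (0)     0        1
8       166               mm (0)   nn (0)     0        1

The last two columns are the prior probabilities $\omega_{1|i}$ and $\omega_{0|i}$ of the QTL genotypes given the marker genotype of sample i.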
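
Linkage mapping - backcross
Conditional probabilities of the QTL genotypes (missing) based on marker genotypes (observed)

$$
\begin{aligned}
L(y, M \mid \Omega) &= \prod_{i=1}^{n} \left[ \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i) \right] \\
&= \prod_{i=1}^{n_1} \left[ 1 \cdot f_1(y_i) + 0 \cdot f_0(y_i) \right] && \text{conditional on 11 } (n_1) \\
&\quad \times \prod_{i=1}^{n_2} \left[ (1-\theta) f_1(y_i) + \theta f_0(y_i) \right] && \text{conditional on 10 } (n_2) \\
&\quad \times \prod_{i=1}^{n_3} \left[ \theta f_1(y_i) + (1-\theta) f_0(y_i) \right] && \text{conditional on 01 } (n_3) \\
&\quad \times \prod_{i=1}^{n_4} \left[ 0 \cdot f_1(y_i) + 1 \cdot f_0(y_i) \right] && \text{conditional on 00 } (n_4)
\end{aligned}
$$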
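
Linkage mapping - backcross
Normal distributions of phenotypic values for each QTL genotype group

$$f_1(y_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[ -\frac{(y_i - \mu_1)^2}{2\sigma^2} \right], \qquad \mu_1 = \mu + a^*$$

$$f_0(y_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[ -\frac{(y_i - \mu_0)^2}{2\sigma^2} \right], \qquad \mu_0 = \mu$$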
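
Linkage mapping - backcross
Differentiate log L with respect to each unknown parameter, set the derivatives equal to zero, and solve the log-likelihood equations.

$$L(y, M \mid \Omega) = \prod_{i=1}^{n} \left[ \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i) \right]$$

$$\log L(y, M \mid \Omega) = \sum_{i=1}^{n} \log \left[ \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i) \right]$$

Define the posterior probabilities

$$\Pi_{1|i} = \frac{\omega_{1|i} f_1(y_i)}{\omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)} \qquad (1)$$

$$\Pi_{0|i} = \frac{\omega_{0|i} f_0(y_i)}{\omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)} \qquad (2)$$

The solutions are then

$$\mu_1 = \frac{\sum_{i=1}^{n} \Pi_{1|i}\, y_i}{\sum_{i=1}^{n} \Pi_{1|i}} \qquad (3)$$

$$\mu_0 = \frac{\sum_{i=1}^{n} \Pi_{0|i}\, y_i}{\sum_{i=1}^{n} \Pi_{0|i}} \qquad (4)$$

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} \left[ \Pi_{1|i} (y_i - \mu_1)^2 + \Pi_{0|i} (y_i - \mu_0)^2 \right] \qquad (5)$$

$$\theta = \frac{\sum_{i=1}^{n_2} \Pi_{0|i} + \sum_{i=1}^{n_3} \Pi_{1|i}}{n_2 + n_3} \qquad (6)$$

In Eq. (6) the first sum runs over the $n_2$ individuals with marker genotype 10, where the prior probability of qq is θ, and the second over the $n_3$ individuals with marker genotype 01, where the prior probability of Qq is θ.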
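
Because the posterior probabilities depend on the parameters and the parameters depend on the posterior probabilities, the equations are solved by iterating between the two. The sketch below is one possible illustration for the eight-sample backcross example, not a definitive implementation. The function names, the starting values and the fixed iteration count are assumptions; the marker classes are read from the marker codes of the example table as 11, 11, 11, 10, 01, 00, 00, 00; and the θ update credits the posterior of the genotype whose prior is θ in each recombinant class.

```python
import math

def normal_pdf(y, mu, sigma2):
    """Normal density with mean mu and variance sigma2."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def prior_het(marker_pair, theta):
    """Prior probability of QTL genotype Qq given the two flanking marker codes."""
    return {(1, 1): 1.0, (1, 0): 1.0 - theta, (0, 1): theta, (0, 0): 0.0}[marker_pair]

def em_backcross(y, markers, mu1, mu0, sigma2, theta, n_iter=50):
    """Iterate the E-step (posteriors) and M-step (means, variance, theta)."""
    n = len(y)
    for _ in range(n_iter):
        # E-step: posterior probability that each individual is Qq.
        post1 = []
        for yi, m in zip(y, markers):
            w1 = prior_het(m, theta)
            a = w1 * normal_pdf(yi, mu1, sigma2)
            b = (1.0 - w1) * normal_pdf(yi, mu0, sigma2)
            post1.append(a / (a + b))
        # M-step: posterior-weighted means and pooled variance.
        s1 = sum(post1)
        mu1 = sum(p * yi for p, yi in zip(post1, y)) / s1
        mu0 = sum((1.0 - p) * yi for p, yi in zip(post1, y)) / (n - s1)
        sigma2 = sum(
            p * (yi - mu1) ** 2 + (1.0 - p) * (yi - mu0) ** 2
            for p, yi in zip(post1, y)
        ) / n
        # theta: posterior of qq in class 10 and of Qq in class 01.
        rec = [
            (1.0 - p) if m == (1, 0) else p
            for p, m in zip(post1, markers)
            if m in ((1, 0), (0, 1))
        ]
        theta = sum(rec) / len(rec)
    return mu1, mu0, sigma2, theta

heights = [184, 185, 180, 182, 167, 169, 165, 166]
markers = [(1, 1), (1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]

# Assumed starting values.
mu1_hat, mu0_hat, sigma2_hat, theta_hat = em_backcross(
    heights, markers, mu1=180.0, mu0=170.0, sigma2=25.0, theta=0.2
)
print(mu1_hat, mu0_hat, sigma2_hat, theta_hat)
```

With only two recombinant individuals, each sitting close to one genotype mean, the θ estimate in this toy data set is driven toward zero; a realistic sample would be needed for an informative estimate.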

Linkage disequilibrium mapping – natural population
Mixture model-based likelihood without marker information

Suppose there is a natural population with a segregating QTL with two alternative alleles, Q and q, where Prob(Q) = q and Prob(q) = 1-q. Then Prob(QQ) = q², Prob(Qq) = 2q(1-q) and Prob(qq) = (1-q)².

$$L(y \mid \Omega) = \prod_{i=1}^{n} \left[ q^2 f_2(y_i) + 2q(1-q) f_1(y_i) + (1-q)^2 f_0(y_i) \right]$$

                          QTL genotype
Sample  Height (cm, y)    QQ     Qq          qq
1       184               q²     2q(1-q)     (1-q)²
2       185               q²     2q(1-q)     (1-q)²
3       180               q²     2q(1-q)     (1-q)²
4       182               q²     2q(1-q)     (1-q)²
5       167               q²     2q(1-q)     (1-q)²
6       169               q²     2q(1-q)     (1-q)²
7       165               q²     2q(1-q)     (1-q)²
8       166               q²     2q(1-q)     (1-q)²
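
Linkage disequilibrium mapping – natural population
Association between marker and QTL
• Marker: Prob(M) = p, Prob(m) = 1-p
• QTL: Prob(Q) = q, Prob(q) = 1-q

Four haplotypes:

$$
\begin{aligned}
\text{Prob}(MQ) &= p_{11} = pq + D \\
\text{Prob}(Mq) &= p_{10} = p(1-q) - D \\
\text{Prob}(mQ) &= p_{01} = (1-p)q - D \\
\text{Prob}(mq) &= p_{00} = (1-p)(1-q) + D
\end{aligned}
$$

Conversely,

$$p = p_{11} + p_{10}, \qquad q = p_{11} + p_{01}, \qquad D = p_{11} p_{00} - p_{10} p_{01}$$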
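
The mapping between (p, q, D) and the four haplotype frequencies can be checked numerically. This is a small sketch under assumed values (p = 0.6, q = 0.3, D = 0.05 are made up for illustration, and the function name is hypothetical): it builds the haplotype frequencies, then recovers the allele frequencies as marginal sums of the haplotype frequencies and D as p11·p00 - p10·p01.

```python
def haplotype_freqs(p, q, D):
    """Haplotype frequencies from allele frequencies p, q and disequilibrium D."""
    return {
        "MQ": p * q + D,
        "Mq": p * (1 - q) - D,
        "mQ": (1 - p) * q - D,
        "mq": (1 - p) * (1 - q) + D,
    }

# Assumed illustrative values, not taken from the slides.
h = haplotype_freqs(0.6, 0.3, 0.05)

p_back = h["MQ"] + h["Mq"]                         # frequency of marker allele M
q_back = h["MQ"] + h["mQ"]                         # frequency of QTL allele Q
D_back = h["MQ"] * h["mq"] - h["Mq"] * h["mQ"]     # linkage disequilibrium
print(h, p_back, q_back, D_back)
```

Note that D cannot be chosen freely: it has to keep all four haplotype frequencies between 0 and 1.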
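
Joint and conditional (j|i) genotype probabilities between marker and QTL

Joint probabilities:

        QQ              Qq                        qq              Obs
MM      p11²            2 p11 p10                 p10²            n2
Mm      2 p11 p01       2(p11 p00 + p10 p01)      2 p10 p00       n1
mm      p01²            2 p01 p00                 p00²            n0

Conditional probabilities of the QTL genotype given the marker genotype (each joint probability divided by the marker genotype frequency):

        QQ                      Qq                                    qq                      Obs
MM      p11² / p²               2 p11 p10 / p²                        p10² / p²               n2
Mm      2 p11 p01 / [2p(1-p)]   2(p11 p00 + p10 p01) / [2p(1-p)]      2 p10 p00 / [2p(1-p)]   n1
mm      p01² / (1-p)²           2 p01 p00 / (1-p)²                    p00² / (1-p)²           n0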
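
The joint and conditional probabilities can be computed from the four haplotype frequencies. The sketch below uses assumed haplotype frequencies (0.23, 0.37, 0.07, 0.33, which correspond to a marker allele frequency p = 0.6) and hypothetical function names. It relies on the fact that each row of the joint table sums to the marker genotype frequency, so dividing a row by its own sum gives the conditional probabilities.

```python
def joint_genotype_probs(p11, p10, p01, p00):
    """Joint marker-QTL genotype probabilities; columns are QQ, Qq, qq."""
    return {
        "MM": [p11 ** 2, 2 * p11 * p10, p10 ** 2],
        "Mm": [2 * p11 * p01, 2 * (p11 * p00 + p10 * p01), 2 * p10 * p00],
        "mm": [p01 ** 2, 2 * p01 * p00, p00 ** 2],
    }

def conditional_qtl_probs(p11, p10, p01, p00):
    """P(QTL genotype | marker genotype): each joint row divided by its sum."""
    joint = joint_genotype_probs(p11, p10, p01, p00)
    return {m: [x / sum(row) for x in row] for m, row in joint.items()}

# Assumed haplotype frequencies (p = 0.6).
joint = joint_genotype_probs(0.23, 0.37, 0.07, 0.33)
cond = conditional_qtl_probs(0.23, 0.37, 0.07, 0.33)

for marker_genotype, row in cond.items():
    print(marker_genotype, [round(x, 4) for x in row])
```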

Linkage disequilibrium mapping – natural population
Mixture model-based likelihood with marker information

$$L(y, M \mid \Omega) = \prod_{i=1}^{n} \left[ \omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i) \right]$$

                          Marker genotype    QTL genotype (prior prob.)
Sample  Height (cm, y)    M                  QQ        Qq        qq
1       184               MM (2)             ω_{2|i}   ω_{1|i}   ω_{0|i}
2       185               MM (2)             ω_{2|i}   ω_{1|i}   ω_{0|i}
3       180               Mm (1)             ω_{2|i}   ω_{1|i}   ω_{0|i}
4       182               Mm (1)             ω_{2|i}   ω_{1|i}   ω_{0|i}
5       167               Mm (1)             ω_{2|i}   ω_{1|i}   ω_{0|i}
6       169               Mm (1)             ω_{2|i}   ω_{1|i}   ω_{0|i}
7       165               mm (0)             ω_{2|i}   ω_{1|i}   ω_{0|i}
8       166               mm (0)             ω_{2|i}   ω_{1|i}   ω_{0|i}
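
Linkage disequilibrium mapping – natural population
Conditional probabilities of the QTL genotypes (missing) based on marker genotypes (observed)

$$
\begin{aligned}
L(y, M \mid \Omega) &= \prod_{i=1}^{n} \left[ \omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i) \right] \\
&= \prod_{i=1}^{n_2} \left[ \omega_{2|2i} f_2(y_i) + \omega_{1|2i} f_1(y_i) + \omega_{0|2i} f_0(y_i) \right] && \text{conditional on 2 } (n_2) \\
&\quad \times \prod_{i=1}^{n_1} \left[ \omega_{2|1i} f_2(y_i) + \omega_{1|1i} f_1(y_i) + \omega_{0|1i} f_0(y_i) \right] && \text{conditional on 1 } (n_1) \\
&\quad \times \prod_{i=1}^{n_0} \left[ \omega_{2|0i} f_2(y_i) + \omega_{1|0i} f_1(y_i) + \omega_{0|0i} f_0(y_i) \right] && \text{conditional on 0 } (n_0)
\end{aligned}
$$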
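
Linkage disequilibrium mapping – natural population
Normal distributions of phenotypic values for each QTL genotype group

$$f_2(y_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[ -\frac{(y_i - \mu_2)^2}{2\sigma^2} \right], \qquad \mu_2 = \mu + a$$

$$f_1(y_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[ -\frac{(y_i - \mu_1)^2}{2\sigma^2} \right], \qquad \mu_1 = \mu + d$$

$$f_0(y_i) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[ -\frac{(y_i - \mu_0)^2}{2\sigma^2} \right], \qquad \mu_0 = \mu - a$$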
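
Linkage disequilibrium mapping – natural population
Differentiate log L with respect to each unknown parameter, set the derivatives equal to zero, and solve the log-likelihood equations.

$$L(y, M \mid \Omega) = \prod_{i=1}^{n} \left[ \omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i) \right]$$

$$\log L(y, M \mid \Omega) = \sum_{i=1}^{n} \log \left[ \omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i) \right]$$

Define the posterior probabilities

$$\Pi_{2|i} = \frac{\omega_{2|i} f_2(y_i)}{\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)} \qquad (1)$$

$$\Pi_{1|i} = \frac{\omega_{1|i} f_1(y_i)}{\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)} \qquad (2)$$

$$\Pi_{0|i} = \frac{\omega_{0|i} f_0(y_i)}{\omega_{2|i} f_2(y_i) + \omega_{1|i} f_1(y_i) + \omega_{0|i} f_0(y_i)} \qquad (3)$$

The solutions are then

$$\mu_2 = \frac{\sum_{i=1}^{n} \Pi_{2|i}\, y_i}{\sum_{i=1}^{n} \Pi_{2|i}} \qquad (4)$$

$$\mu_k = \frac{\sum_{i=1}^{n} \Pi_{k|i}\, y_i}{\sum_{i=1}^{n} \Pi_{k|i}}, \quad k = 1, 0 \qquad (5)$$

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} \left[ \Pi_{2|i} (y_i - \mu_2)^2 + \Pi_{1|i} (y_i - \mu_1)^2 + \Pi_{0|i} (y_i - \mu_0)^2 \right] \qquad (6)$$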
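
Complete data

Prior probabilities:

        QQ              Qq                        qq              Obs
MM      p11²            2 p11 p10                 p10²            n2
Mm      2 p11 p01       2(p11 p00 + p10 p01)      2 p10 p00       n1
mm      p01²            2 p01 p00                 p00²            n0

Genotype counts:

        QQ      Qq      qq      Obs
MM      n22     n21     n20     n2
Mm      n12     n11     n10     n1
mm      n02     n01     n00     n0

$$
\begin{aligned}
p_{11} &= \frac{2n_{22} + (n_{21} + n_{12}) + \phi\, n_{11}}{2n} \\
p_{10} &= \frac{2n_{20} + (n_{21} + n_{10}) + (1-\phi)\, n_{11}}{2n} \\
p_{01} &= \frac{2n_{02} + (n_{12} + n_{01}) + (1-\phi)\, n_{11}}{2n} \\
p_{00} &= \frac{2n_{00} + (n_{10} + n_{01}) + \phi\, n_{11}}{2n}
\end{aligned}
$$

$$\phi = \frac{p_{11} p_{00}}{p_{11} p_{00} + p_{10} p_{01}}$$

Here φ is the probability that a double heterozygote (MmQq) carries the haplotype pair MQ/mq rather than Mq/mQ.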
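
Even with complete genotype data the haplotype frequencies have to be found iteratively, because φ depends on the frequencies being estimated. The sketch below illustrates this with hypothetical counts (the 3×3 table of counts is invented for the example, as are the function name and starting values). It follows the counting argument for double heterozygotes: a fraction φ of the n11 individuals contributes to the MQ and mq haplotypes and a fraction 1-φ to Mq and mQ.

```python
def update_haplotype_freqs(counts, hap):
    """One update of (p11, p10, p01, p00) from complete genotype counts.

    counts[j][k] is the number of individuals with marker genotype j
    (2 = MM, 1 = Mm, 0 = mm) and QTL genotype k (2 = QQ, 1 = Qq, 0 = qq).
    """
    p11, p10, p01, p00 = hap
    n = sum(sum(row.values()) for row in counts.values())
    # Probability that a double heterozygote has phase MQ/mq.
    phi = p11 * p00 / (p11 * p00 + p10 * p01)
    c = counts
    new_p11 = (2 * c[2][2] + c[2][1] + c[1][2] + phi * c[1][1]) / (2 * n)
    new_p10 = (2 * c[2][0] + c[2][1] + c[1][0] + (1 - phi) * c[1][1]) / (2 * n)
    new_p01 = (2 * c[0][2] + c[1][2] + c[0][1] + (1 - phi) * c[1][1]) / (2 * n)
    new_p00 = (2 * c[0][0] + c[1][0] + c[0][1] + phi * c[1][1]) / (2 * n)
    return (new_p11, new_p10, new_p01, new_p00)

# Hypothetical counts for illustration only.
counts = {
    2: {2: 10, 1: 5, 0: 1},
    1: {2: 6, 1: 20, 0: 6},
    0: {2: 1, 1: 5, 0: 10},
}

hap = (0.25, 0.25, 0.25, 0.25)  # assumed starting values
for _ in range(100):
    hap = update_haplotype_freqs(counts, hap)
print(hap)
```

The four updated frequencies always sum to one, and p11 + p10 always equals the observed frequency of marker allele M, since the φ terms cancel in that sum.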
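
Incomplete (observed) data

Posterior probabilities:

        QQ          Qq          qq          Obs
MM      Π_{2|2i}    Π_{1|2i}    Π_{0|2i}    n2
Mm      Π_{2|1i}    Π_{1|1i}    Π_{0|1i}    n1
mm      Π_{2|0i}    Π_{1|0i}    Π_{0|0i}    n0

$$p_{11} = \frac{1}{2n} \left\{ \sum_{i=1}^{n_2} \left[ 2\Pi_{2|2i} + \Pi_{1|2i} \right] + \sum_{i=1}^{n_1} \left[ \Pi_{2|1i} + \phi\, \Pi_{1|1i} \right] \right\} \qquad (7)$$

$$p_{10} = \frac{1}{2n} \left\{ \sum_{i=1}^{n_2} \left[ 2\Pi_{0|2i} + \Pi_{1|2i} \right] + \sum_{i=1}^{n_1} \left[ \Pi_{0|1i} + (1-\phi)\, \Pi_{1|1i} \right] \right\} \qquad (8)$$

$$p_{01} = \frac{1}{2n} \left\{ \sum_{i=1}^{n_0} \left[ 2\Pi_{2|0i} + \Pi_{1|0i} \right] + \sum_{i=1}^{n_1} \left[ \Pi_{2|1i} + (1-\phi)\, \Pi_{1|1i} \right] \right\} \qquad (9)$$

$$p_{00} = \frac{1}{2n} \left\{ \sum_{i=1}^{n_0} \left[ 2\Pi_{0|0i} + \Pi_{1|0i} \right] + \sum_{i=1}^{n_1} \left[ \Pi_{0|1i} + \phi\, \Pi_{1|1i} \right] \right\} \qquad (10)$$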
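
EM algorithm
(1) Give initial values $\Omega^{(0)} = (\mu_2, \mu_1, \mu_0, \sigma^2, p_{11}, p_{10}, p_{01}, p_{00})^{(0)}$.
(2) Calculate $\Pi_{2|i}^{(1)}$, $\Pi_{1|i}^{(1)}$ and $\Pi_{0|i}^{(1)}$ using Eqs. 1-3.
(3) Calculate $\Omega^{(1)}$ using $\Pi_{2|i}^{(1)}$, $\Pi_{1|i}^{(1)}$ and $\Pi_{0|i}^{(1)}$ based on Eqs. 4-10.
(4) Repeat (2) and (3) until convergence.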
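
The four steps can be sketched in code for the eight-sample example (heights 184 to 166 cm, marker genotypes MM, MM, Mm, Mm, Mm, Mm, mm, mm). This is a minimal illustration under stated assumptions rather than a reference implementation: the function names, starting values, fixed iteration count and the fallback φ = 0.5 when its denominator is zero are all choices made for the example, and the variance update weights the squared deviations of all three genotype groups.

```python
import math

def normal_pdf(y, mu, sigma2):
    """Normal density with mean mu and variance sigma2."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def conditional_qtl_probs(h):
    """Prior P(QTL genotype k | marker genotype m) from haplotype frequencies."""
    p11, p10, p01, p00 = h["11"], h["10"], h["01"], h["00"]
    joint = {
        2: {2: p11 ** 2, 1: 2 * p11 * p10, 0: p10 ** 2},
        1: {2: 2 * p11 * p01, 1: 2 * (p11 * p00 + p10 * p01), 0: 2 * p10 * p00},
        0: {2: p01 ** 2, 1: 2 * p01 * p00, 0: p00 ** 2},
    }
    return {m: {k: v / sum(row.values()) for k, v in row.items()}
            for m, row in joint.items()}

def em_ld(y, marker, mu, sigma2, hap, n_iter=50):
    """EM for LD mapping; marker codes are 2 = MM, 1 = Mm, 0 = mm."""
    n = len(y)
    mu, hap = dict(mu), dict(hap)  # work on copies of the starting values
    for _ in range(n_iter):
        omega = conditional_qtl_probs(hap)
        # Step (2): posterior probabilities of QQ, Qq, qq for each individual.
        post = []
        for yi, m in zip(y, marker):
            w = {k: omega[m][k] * normal_pdf(yi, mu[k], sigma2) for k in (2, 1, 0)}
            total = sum(w.values())
            post.append({k: w[k] / total for k in w})
        # Step (3a): posterior-weighted means and pooled variance.
        for k in (2, 1, 0):
            wk = sum(p[k] for p in post)
            if wk > 0:
                mu[k] = sum(p[k] * yi for p, yi in zip(post, y)) / wk
        sigma2 = sum(p[k] * (yi - mu[k]) ** 2
                     for p, yi in zip(post, y) for k in (2, 1, 0)) / n
        # Step (3b): haplotype frequencies from posterior haplotype counts.
        den = hap["11"] * hap["00"] + hap["10"] * hap["01"]
        phi = hap["11"] * hap["00"] / den if den > 0 else 0.5
        c = {"11": 0.0, "10": 0.0, "01": 0.0, "00": 0.0}
        for p, m in zip(post, marker):
            if m == 2:
                c["11"] += 2 * p[2] + p[1]
                c["10"] += 2 * p[0] + p[1]
            elif m == 1:
                c["11"] += p[2] + phi * p[1]
                c["10"] += p[0] + (1 - phi) * p[1]
                c["01"] += p[2] + (1 - phi) * p[1]
                c["00"] += p[0] + phi * p[1]
            else:
                c["01"] += 2 * p[2] + p[1]
                c["00"] += 2 * p[0] + p[1]
        hap = {k: v / (2 * n) for k, v in c.items()}
    return mu, sigma2, hap

heights = [184, 185, 180, 182, 167, 169, 165, 166]
marker = [2, 2, 1, 1, 1, 1, 0, 0]

# Step (1): assumed initial values.
mu_hat, sigma2_hat, hap_hat = em_ld(
    heights, marker,
    mu={2: 185.0, 1: 175.0, 0: 165.0},
    sigma2=25.0,
    hap={"11": 0.3, "10": 0.2, "01": 0.2, "00": 0.3},
)
print(mu_hat, sigma2_hat, hap_hat)
```

Eight observations are far too few to estimate eight parameters reliably, so the output only illustrates the mechanics. A practical version would also monitor the change in log-likelihood to decide convergence instead of using a fixed number of iterations.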