QTL Mapping Using Mx. Michael C Neale. Virginia Institute for Psychiatric and Behavioral Genetics Virginia Commonwealth University. Overview. Alternative approach Linkage as Mixture Univariate/Multivariate One/more loci Practical considerations Power Pihat vs covs Larger Sibships.
QTL Mapping Using Mx
Michael C Neale
Virginia Institute for Psychiatric and Behavioral Genetics
Virginia Commonwealth University
Overview
• Alternative approach
• Linkage as Mixture
• Univariate/Multivariate
• One/more loci
• Practical considerations
• Power
• Pihat vs covs
• Larger Sibships
Schematic of Genome
[Figure: a QTL located among Marker 1, Marker 2, Marker 3 and Marker 4, with distances d1, d2, d3 and d4]
Genetic Heterogeneity
Sib pairs IBD at a locus, parents AB and CD
      AC  AD  BC  BD
AC     2   1   1   0
AD     1   2   0   1
BC     1   0   2   1
BD     0   1   1   2
Pi hat approach
• 1 Pick a putative QTL location
• 2 Compute p(IBD0) p(IBD1) p(IBD2) given marker data [Mapmaker/sibs]
• 3 Compute π̂ = p(IBD2) + .5p(IBD1)
• 4 Fit model
• Repeat 1-4 as necessary for different locations
Elston & Stewart
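Step 3 can be sketched in a few lines of Python. This is only an illustration of the formula on the slide; the function name `pi_hat`, the rounding tolerance and the example probabilities are assumptions made for the sketch, not part of the original analysis.

```python
def pi_hat(p_ibd0, p_ibd1, p_ibd2):
    """Estimated proportion of alleles shared IBD: p(IBD2) + 0.5 * p(IBD1)."""
    total = p_ibd0 + p_ibd1 + p_ibd2
    # Assumed tolerance: IBD probabilities are often reported rounded to 2 decimals.
    if abs(total - 1.0) > 0.02:
        raise ValueError("IBD probabilities should sum to 1")
    return p_ibd2 + 0.5 * p_ibd1


# Illustrative probabilities for one sib pair at one location.
example = pi_hat(0.81, 0.13, 0.06)
```

The resulting value would then be used as the correlation between the siblings' QTL factors when the model is fitted in step 4.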
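Major QTL effects
DZ twins
[Path diagram: latent factors A1 C1 D1 E1 Q1 for twin 1 and Q2 E2 D2 C2 A2 for twin 2 load on phenotypes P1 and P2; cross-twin correlations are .5 (A1-A2), 1 (C1-C2), .25 (D1-D2) and π̂ (Q1-Q2)]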
Normal Theory Likelihood Function
For raw data in Mx
$$\ln L_i = f_i \ln\left[\sum_{j=1}^{m} w_j\, g(x_i, \mu_{ij}, \Sigma_{ij})\right]$$
• x_i: vector of observed scores on n subjects
• μ_ij: vector of predicted means
• Σ_ij: matrix of predicted covariances, functions of parameters
General Likelihood Function
Things that may differ over subjects
• Model for Means can differ
• Model for Covariances can differ
• Weights can differ
• Frequencies can differ
$$\ln L_i = f_i \ln\left[\sum_{j=1}^{m} w_{ij}\, g(x_i, \mu_{ij}, \Sigma_{ij})\right]$$
i = 1....n subjects (families)
Normal distribution N(μij, Σij)
Likelihood is height of the curve
[Figure: normal density curve with mean μ and spread Σ, plotted over -4 to 4; the likelihood of an observation xi is the height of the curve at xi]
Weighted mixture of models
Finite mixture distribution
$$\ln L_i = f_i \ln\left[\sum_{j=1}^{m} w_{ij}\, g(x_i, \mu_{ij}, \Sigma_{ij})\right]$$
j = 1....m models
w_ij: weight for subject i, model j
e.g., Segregation analysis
Mixture of Normal Distributions
Two normals, proportions w1 & w2, different means
[Figure: density g plotted against xi from -4 to 3; two normal components, w1 × l1 centred at μ1 and w2 × l2 centred at μ2]
But Likelihood Ratio not Chi-Squared - what is it?
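A minimal Python sketch of the finite mixture log-likelihood for a single univariate observation may make the weighting concrete. It assumes univariate normal components; the function names and example values are hypothetical, and Mx itself works with multivariate means and covariance matrices.

```python
import math


def normal_pdf(x, mean, sd):
    """Height of the normal curve at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_log_likelihood(x, weights, means, sds, freq=1.0):
    """ln L_i = f_i * ln( sum_j w_ij * g(x_i, mean_ij, sd_ij) )."""
    density = sum(w * normal_pdf(x, m, s)
                  for w, m, s in zip(weights, means, sds))
    return freq * math.log(density)


# Two normals with proportions w1 and w2 and different means (illustrative values).
ll = mixture_log_likelihood(0.5, weights=(0.7, 0.3), means=(0.0, 2.0), sds=(1.0, 1.0))
```

Setting one weight to 1 and the other to 0 recovers the ordinary single-normal log-likelihood, which is a quick way to check the function.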
Weighted Likelihood Method
• 1 Pick a putative QTL location
• 2 Compute p(IBD0) p(IBD1) p(IBD2) given marker data
• these are "WEIGHTS"
• 3 Compute likelihood of phenotype data under each of 3 IBD conditions
• 4 Maximize weighted likelihood of 3
• Repeat 1-4 as necessary for different locations
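Mixture method
Add them up
[Figure: three copies of the DZ twin path diagram (A1 C1 D1 E1 Q1 Q2 E2 D2 C2 A2 loading on P1 and P2, with cross-twin correlations .5, 1 and .25 for A, C and D). The Q1-Q2 correlation is fixed at 0, .5 and 1 in the three copies, which are multiplied by p(IBD0), p(IBD1) and p(IBD2) respectively and summed]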
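The "add them up" step can be sketched for one sib pair and one phenotype as follows. This is an illustration under stated assumptions, not the Mx implementation: both sibs share a mean and a total variance of F + Q + E, the sib covariance is F, F + 0.5Q and F + Q under IBD 0, 1 and 2, and all function and argument names are hypothetical.

```python
import math


def bivariate_normal_pdf(x1, x2, mean, var, cov):
    """Density for a pair with a common mean, common variance and covariance cov."""
    det = var * var - cov * cov
    d1, d2 = x1 - mean, x2 - mean
    quad = (var * d1 * d1 - 2.0 * cov * d1 * d2 + var * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def pair_log_likelihood(x1, x2, ibd_probs, mean, f, q, e):
    """Log of the IBD-weighted sum of the three model likelihoods."""
    var = f + q + e
    sib_covs = (f, f + 0.5 * q, f + q)  # covariance under IBD 0, 1, 2
    density = sum(p * bivariate_normal_pdf(x1, x2, mean, var, c)
                  for p, c in zip(ibd_probs, sib_covs))
    return math.log(density)


# Illustrative pair: phenotypes 4.2 and 5.1 with IBD weights (.23, .65, .11).
ll = pair_log_likelihood(4.2, 5.1, (0.23, 0.65, 0.11), mean=4.9, f=0.4, q=0.3, e=0.5)
```

When q is zero the three components coincide, so the weights no longer matter; that is the null model against which the QTL effect is tested.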
Dataset structure
Rectangular format
                                 Locus 1           Locus 2
Id    sex  age  P1     P2     IBD0 IBD1 IBD2   IBD0 IBD1 IBD2
1231  1    24   103.5  115.6  .81  .13  .06    .28  .51  .21
1781  0    29   127.4  145.6  .23  .65  .11    .08  .57  .35
1952  1    39    98.5  .      .81  .13  .06    .28  .51  .21
2056  1    19    93.5  100.3  .    .    .      .20  .40  .40
Missing data:
• Phenotypes: ML
• Markers: Listwise
Mx Script
Mixture method
!QTL analysis via Mixture Distribution method
!Using marker1
!Using DZ twins only
!Analysis of LDL
!Dutch Adults
#define nvar 1 !different for multivariate
#define nsib 2 !number of siblings
#NGroups=2
Mx Script
Mixture part 2
G1: Parameter Estimates
Calculation
Begin Matrices;
 X Lower nvar nvar Free !familial background
 Z Lower nvar nvar Free !unique environment
 L Full 1 1 Free !QTL effect
 M Full 1 nvar Free !means
 H Full 1 1
End Matrices;
Matrix H .5
Begin Algebra;
 F= X*X'; !familial variance
 E= Z*Z'; !unique environmental variance
 Q= L*L'; !variance due to QTL
 V= F+Q+E; !total variance
 T= F|Q|E; !parameters in one matrix for standardizing
 S= T@V~; !standardized variance component estimates
End Algebra;
Labels Row S standest
Labels Col S f^2 q^2 e^2
Labels Row T unstandest
Labels Col T f^2 q^2 e^2
End
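For the univariate case (nvar = 1) the algebra section reduces to scalar arithmetic, sketched here in Python. The function name and the path values are hypothetical; the multivariate case would need matrix products and an inverse of V instead of a simple division.

```python
def standardized_components(x, l, z):
    """Scalar version of F = X*X', Q = L*L', E = Z*Z', V = F+Q+E, S = T@V~."""
    f = x * x          # familial variance
    q = l * l          # variance due to QTL
    e = z * z          # unique environmental variance
    v = f + q + e      # total variance
    unstandardized = {"f^2": f, "q^2": q, "e^2": e}
    standardized = {name: value / v for name, value in unstandardized.items()}
    return unstandardized, standardized


# Hypothetical path estimates, for illustration only.
raw, std = standardized_components(x=0.8, l=0.5, z=0.6)
```

The standardized components sum to 1, so each can be read as a proportion of the total phenotypic variance.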
Mx Script
G2: Dizygotic twins
#include lipiddzmix.dat
Select ibd0m1 ibd1m1 ibd2m1 ldl1 ldl2;
Definition ibd0m1 ibd1m1 ibd2m1;
Begin Matrices = Group 1;
 K Full 3 1 !IBD probabilities (from Merlin)
 U Unit 3 2
End Matrices;
Specify K ibd0m1 ibd1m1 ibd2m1
Means U@M;
Covariance
 F+Q+E | F _
 F | F+Q+E _ ! IBD 0 Covariance matrix
 F+Q+E | F+h@Q _
 F+h@Q | F+Q+E _ ! IBD 1 Covariance matrix
 F+Q+E | F+Q _
 F+Q | F+Q+E; ! IBD 2 Covariance matrix
Weights K; ! IBD probabilities
Start 1 All
Start 2.8 M 1 1 1
Option NDecimals=3
Option Multiple Issat
End
Mx Script
Mixture part 4
! Test significance of QTL effect
Drop L 1 1 1
End
Output
Pihat Method
Summary of VL file data for group 1
Code       -3.000   -2.000   -1.000    1.000    2.000
Number    190.000  190.000  190.000  190.000  190.000
Mean        0.234    0.510    0.256    4.927    4.928
Variance    0.104    0.096    0.096    1.092    1.325
MATRIX F
This is a LOWER TRIANGULAR matrix of order 1 by 1
        1
 1  0.898
MATRIX Q
This is a FULL matrix of order 1 by 1
        1
 1  0.540
Output
QTL Effect Present
Your model has 4 estimated parameters and 950 Observed statistics
-2 times log-likelihood of data >>> 1057.064
Degrees of freedom >>>>>>>>>>>>>>>> 946
QTL Effect Absent
Your model has 3 estimated parameters and 950 Observed statistics
-2 times log-likelihood of data >>> 1059.025
Degrees of freedom >>>>>>>>>>>>>>>> 947
Difference chi-squared = 1.961 (1 df)
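The difference test can be reproduced from the two -2 log-likelihood values in this output. The sketch below also converts the statistic to a LOD score and to p-values; the helper names are hypothetical, and halving the 1 df p-value assumes the commonly used 50:50 mixture of a point mass at zero and a chi-squared with 1 df for a variance component tested at its lower bound, which is a convention not stated in the slides.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variate with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def lod_from_chi2(x):
    """Convert a likelihood-ratio chi-squared to a LOD score."""
    return x / (2.0 * math.log(10.0))


minus2ll_qtl = 1057.064    # QTL effect present (4 parameters)
minus2ll_null = 1059.025   # QTL effect absent (3 parameters)

chi2 = minus2ll_null - minus2ll_qtl   # difference chi-squared, 1 df
p_naive = chi2_sf_1df(chi2)           # plain 1 df chi-squared p-value
p_boundary = 0.5 * p_naive            # assumed 50:50 mixture convention
lod = lod_from_chi2(chi2)
```

With these values the QTL effect at marker 1 is not significant under either p-value.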
Output
Pihat Method
QTL Effect Present
Your model has 4 estimated parameters and 950 Observed statistics
-2 times log-likelihood of data >>> 1057.500
Degrees of freedom >>>>>>>>>>>>>>>> 946
QTL Effect Absent
Your model has 3 estimated parameters and 950 Observed statistics
-2 times log-likelihood of data >>> 1059.025
Degrees of freedom >>>>>>>>>>>>>>>> 947
Difference chi-squared = 1.525 (1 df)
Summary
• SEM - QTL direct relationship
• Mx graphical/script approaches
• Mixture vs Pihat
• Multivariate treatment
• Multilocus
• Missing Data
• Ascertainment
How much more power?
• Large sibships much more powerful
• Dolan et al 1999
• Pihat simple with large sibships
• Solar, Genehunter etc
• Pihat shows substantial bias with missing data
Expected IBD Frequencies Sibships of size 2
Expected IBD Frequencies Sibships of size 3
More power in large sibships
Dolan, Neale & Boomsma (2000)
[Figure legend: + Size 2, o Size 3, * Size 4]
Number of IBD Combinations
As a function of number of sibs in family
Sibship Size   Number of combinations
2              3
3              10
4              36
5              136
6              528
7              2080
8              7196
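The tabulated counts for sibships of size 2 to 7 follow the pattern k(k + 1)/2 with k = 2^(n-1). The Python sketch below encodes that pattern as an observation about the table, not as a formula given in the slides; the function name is hypothetical. For size 8 the same expression gives 8256 rather than the tabulated 7196, so either the pattern or that entry may need checking.

```python
def n_ibd_combinations(n_sibs):
    """Count implied by the pattern k * (k + 1) / 2 with k = 2 ** (n_sibs - 1)."""
    k = 2 ** (n_sibs - 1)
    return k * (k + 1) // 2


# Tabulated values for sibship sizes 2 to 7.
table = {2: 3, 3: 10, 4: 36, 5: 136, 6: 528, 7: 2080}
matches = all(n_ibd_combinations(n) == count for n, count in table.items())
```

Whatever the exact count, the growth is roughly fourfold per additional sib, which is why the mixture approach needs pruning or a different strategy for large sibships.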
Mixture Approach for Pedigrees
Some ideas
• Iterate configurations within families
• Only use non-zero IBD probabilities
• Set threshold?
• Improves with genotype data
• Allows moderated genotypes
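One way to read "only use non-zero IBD probabilities" and "set threshold?" is as a pruning step applied to each family's configuration weights before the mixture likelihood is evaluated. The sketch below is a hypothetical illustration of that idea; the function name, the dictionary layout and the renormalization of the surviving weights are assumptions, not something the slides specify.

```python
def prune_configurations(weights, threshold=0.0):
    """Keep IBD configurations whose probability exceeds threshold, then renormalize."""
    kept = {config: w for config, w in weights.items() if w > threshold}
    total = sum(kept.values())
    if total == 0.0:
        raise ValueError("no configuration exceeds the threshold")
    # Renormalizing is an assumption; it keeps the weights summing to 1.
    return {config: w / total for config, w in kept.items()}


# Illustrative weights for one family; configuration labels are arbitrary.
family = {"cfg_a": 0.70, "cfg_b": 0.295, "cfg_c": 0.005, "cfg_d": 0.0}
nonzero_only = prune_configurations(family)
thresholded = prune_configurations(family, threshold=0.01)
```

A higher threshold means fewer likelihood evaluations per family at the cost of discarding some probability mass.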
Strategy 2
• Families within combinations
• Limited # of IBD configurations
• Depends on max sibship size
• Usually Faster
• Can do missing data
• Cannot do moderator variables
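Multivariate QTL
Vectors of variables, Matrices of paths
Three component mixture
[Path diagram: latent factors A1 C1 D1 E1 Q1 and Q2 E2 D2 C2 A2 load on P1 and P2; cross-twin correlations .5, 1, .25 and π̂ as in the univariate diagram]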
Two locus model
[Path diagram: latent factors R1 C1 A1 E1 Q1 and Q2 E2 A2 C2 R2 load on P1 and P2; cross-twin correlations shown are π̂2, π̂1, .25 and 1]
Two locus model mixture
[Figure: 3 x 3 grid of two-locus path diagrams (R1 C1 A1 E1 Q1 Q2 E2 A2 C2 R2 loading on P1 and P2). Columns correspond to p(ibd0 R), p(ibd1 R), p(ibd2 R), with the R1-R2 correlation fixed at 0, .5 and 1. Rows correspond to p(ibd0 Q), p(ibd1 Q), p(ibd2 Q), with the Q1-Q2 correlation fixed at 0, .5 and 1. The remaining correlations shown in every panel are .25 and 1]
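The nine mixture weights can be sketched in Python as products of the single-locus IBD probabilities. Multiplying the two sets of probabilities assumes the IBD states at the two loci are independent given the marker data (for example, unlinked loci); that assumption, the function names and the example probabilities are not taken from the slides.

```python
IBD_CORRELATIONS = (0.0, 0.5, 1.0)  # sib correlation of a locus factor under IBD 0, 1, 2


def two_locus_weights(probs_q, probs_r):
    """3 x 3 grid of weights: rows index the IBD state at Q, columns the state at R."""
    # Assumption: IBD states at the two loci are independent.
    return [[pq * pr for pr in probs_r] for pq in probs_q]


def two_locus_correlations():
    """(Q1-Q2, R1-R2) correlation pair for each of the nine components."""
    return [[(cq, cr) for cr in IBD_CORRELATIONS] for cq in IBD_CORRELATIONS]


# Illustrative IBD probabilities at locus Q and locus R for one pair.
weights = two_locus_weights((0.81, 0.13, 0.06), (0.28, 0.51, 0.21))
total = sum(sum(row) for row in weights)
```

Each weight multiplies the likelihood of the model whose Q1-Q2 and R1-R2 correlations are fixed at the matching pair of values, exactly as in the three-component single-locus mixture.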
Multivariate multilocus multipoint
• Eaves, Neale & Maes 1996
• 10 minutes for 5 phenotypes
• Restart at previous solution
• Only fit null model (q=0) once
Not dead yet
• Latent variable QTLs
• Multiple rater
• Comorbidity
• Repeated measures