
Joint records & genomic & pedigree evaluation

Joint records & genomic & pedigree evaluation. Andrés Legarra#, Ignacio Aguilar*†. #UR 631, SAGA, Castanet-Tolosan 31326, France; *Animal and Dairy Science Department, University of Georgia, Athens 30602; †Instituto Nacional de Investigación Agropecuaria, Las Brujas 90200, Uruguay


Presentation Transcript


  1. Joint records & genomic & pedigree evaluation • Andrés Legarra#, Ignacio Aguilar*† • #UR 631, SAGA, Castanet-Tolosan 31326, France • *Animal and Dairy Science Department, University of Georgia, Athens 30602 • †Instituto Nacional de Investigación Agropecuaria, Las Brujas 90200, Uruguay • andres.legarra@toulouse.inra.fr • With help from I. Misztal, D.L. Johnson, T. Lawlor, M. Toro, JM Elsen, O. Christensen, L. Varona, P. Van Raden, G. de los Campos and others • Zaragoza, 10/11/2009

  2. Financing • Eadgene • AMASGEN • Holstein Association of America

  3. Plan • Short review of models for genomic selection • The two-(three) step procedure • The genomic relationship matrix • …and its extensions to include pedigree • Performance of one-step evaluation

  4. Plan • Short review of models for genomic selection • The two-(three) step procedure • The genomic relationship matrix • …and its extensions to include pedigree • Performance of one-step evaluation

  5. Single marker • Assume there is a marker in complete LD with a QTL • For example, the polymorphism in DGAT1 which increases fat yield • Use a linear model to estimate its effect • yi= marker effect in animal i + noise
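
A minimal sketch of the single-marker linear model, assuming the marker is coded as the number of copies of the favourable allele and that an overall mean is fitted alongside the marker effect (the slide only writes "marker effect + noise"). The data and the function name `fit_single_marker` are hypothetical.

```python
# Hypothetical data: copies of the favourable allele (0, 1 or 2)
# and a phenotype (e.g. fat yield) for six animals.
allele_copies = [0, 1, 2, 1, 0, 2]
phenotype = [10.0, 12.0, 14.0, 12.5, 9.5, 14.0]


def fit_single_marker(x, y):
    """Ordinary least squares for y_i = mean + effect * x_i + noise."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    effect = sxy / sxx              # estimated effect of one allele copy
    mean = y_bar - effect * x_bar   # intercept
    return mean, effect


mean, effect = fit_single_marker(allele_copies, phenotype)
print(f"mean = {mean:.3f}, marker effect = {effect:.3f}")
```

With a single marker and many records this is an ordinary fixed-effect regression; the later slides treat many markers at once, where the effects become random.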

  6. Base model • y = Za + e • Z = incidence matrix of marker effects • a = marker effects • e = residuals • Example on the slide: 3 individuals, 1 marker with 4 alleles • This can be solved, for example, by least squares

  7. Single marker • This is fine if we know which markers are good predictors of which genes • But this is rarely the case

  8. Whole genome • The simplest approach is to extend single-marker association analysis • Do multiple marker regression • Why multiple? To account for all genes (markers) simultaneously • Works well only with dense markers! • Because, to trace QTLs correctly, we need some markers in LD with them

  9. Multiple marker additive model • y = Za + e • Z = incidence matrix of marker effects • a = marker effects • e = residuals • Example on the slide: 4 individuals, 2 markers (2 alleles at the 1st marker, 4 alleles at the 2nd marker)

  10. A priori distributions for marker effects • Several distributions have been proposed • Normal (Meuwissen et al., Genetics 2001; Van Raden, JDS 2008) • Mixture of normals (Van Raden, JDS 2008) • BayesA, BayesB (Meuwissen et al., 2001) • Lasso (Yi and Xu, 2008, Genetics 179:1045; De los Campos, Genetics, 2009; Park and Casella, Journal of the American Statistical Association) • No clear proof (from data) that any one is superior • I will use normal:

  11. Useful parameterizations • value of « 1 » allele = 0 • value of « 2 » allele = ai, where ai is the effect of the SNP at that locus • « 11 » = 0 • « 12 » = ai • « 22 » = 2ai

  12. Useful parameterizations • value of « 1 » allele = -0.5 ai • value of « 2 » allele = +0.5 ai, where ai is the effect of the SNP at that locus • « 11 » = -ai • « 12 » = 0 • « 22 » = +ai

  13. Useful parameterizations • value of « 1 » allele = -pi ai • value of « 2 » allele = (1-pi) ai, where ai is the effect of the SNP at that locus, and pi is the frequency of allele 2 • This results in a centered Z matrix (E(Za)=0 for any a) • « 11 » = -2piai • « 12 » = ai-2piai • « 22 » = 2ai-2piai • How do we choose p?
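
The three parameterizations on slides 11 to 13 can be sketched as one coding function that returns the coefficient multiplying ai for a genotype. The function name `code_genotype`, the scheme labels and the five genotypes are hypothetical; the last lines illustrate that the centered coding sums to zero when p is taken as the allele frequency in the same animals, which is only one possible answer to "How do we choose p?".

```python
def code_genotype(genotype, scheme, p=None):
    """Return the covariate of one SNP genotype ('11', '12' or '22').

    scheme: '012'      -> 0, 1, 2
            '-101'     -> -1, 0, 1
            'centered' -> -2p, 1-2p, 2-2p  (p = frequency of allele 2)
    """
    copies = genotype.count("2")   # number of copies of allele 2
    if scheme == "012":
        return float(copies)
    if scheme == "-101":
        return copies - 1.0
    if scheme == "centered":
        if p is None:
            raise ValueError("the centered scheme needs the allele frequency p")
        return copies - 2.0 * p
    raise ValueError(f"unknown scheme: {scheme}")


# Hypothetical genotypes of five animals at one SNP.
genotypes = ["11", "12", "22", "12", "11"]

# Assumption: p is estimated from these same animals.
p_sample = sum(g.count("2") for g in genotypes) / (2 * len(genotypes))
centered = [code_genotype(g, "centered", p_sample) for g in genotypes]
print(p_sample, centered, sum(centered))
```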

  14. Useful parameterizations • Different parameterizations do not give the same result • This is different from quantitative genetics theory • « Old » Falconerian genes are fixed, and constant terms are absorbed by the mean • But now SNPs are random effects

  15. Plan • Short review of models for genomic selection • The two-(three) step procedure • The genomic relationship matrix • …and its extensions to include pedigree • Performance of one-step evaluation

  16. Why 2-step procedure • BayesB and A and mixtures and Lasso are fine, but only some animals are genotyped • Do they have data? • This limits practical applications • Need to get pseudo-data for genotyped animals

  17. Inferring genotypes • Genotypes in some individuals can be inferred, only to some extent • Peeling • Peeling unilocus • Pseudo-peeling multi-locus • Gengler’s gene content prediction • They work well only a few generations back (forward) unless we genotype more individuals with low-density SNPs and then use (2) (Habier et al., Genetics 182: 343)

  18. Pseudo-data • So we need pseudo-data • EBV’s • DYD’s

  19. Pseudo-data • EBV’s • The problem with EBV’s is that they already share information among individuals • e.g., a dam’s EBV = own yield + parent average + progeny contribution • But then we are including the genetic contribution of parents, and thus the SNP effects that we want to estimate

  20. Pseudo-data • Also, EBV’s are correlated • The correlation depends on the amount of data and distribution across fixed effects and families • EBVs of two cows are correlated, for example, if they belong to the same herd, even if they are not related

  21. Pseudo-data • DYD’s avoid part of these problems (Van Raden and Wiggans, 1991) • DYD = daughter yield deviation • Record of the daughter, corrected for environmental effects and the dam’s EBV • Thus DYD = 0.5 BV sire + mendelian sampling • E(DYD) = 0.5 BV sire • YD’s exist for cows • YD = record - environmental effects
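
A simplified sketch of the DYD idea on slide 21: each daughter record is corrected for environmental effects and for half of her dam's EBV, and the corrected records are averaged per sire. This is an unweighted average for illustration only; the weighting used in real evaluations is omitted, and the field names and numbers are hypothetical.

```python
def daughter_yield_deviation(daughters):
    """Unweighted DYD of one sire.

    Each daughter is a dict holding her record, the environmental effects
    estimated for that record, and her dam's EBV (hypothetical fields).
    """
    deviations = [
        d["record"] - d["environment"] - 0.5 * d["dam_ebv"]
        for d in daughters
    ]
    return sum(deviations) / len(deviations)


# Hypothetical daughters of one bull.
daughters = [
    {"record": 9200.0, "environment": 8800.0, "dam_ebv": 200.0},
    {"record": 8700.0, "environment": 8600.0, "dam_ebv": -100.0},
    {"record": 9500.0, "environment": 9000.0, "dam_ebv": 400.0},
]
dyd = daughter_yield_deviation(daughters)
print(dyd)
```

Each corrected record has expectation 0.5 BV of the sire, so the average does too, in line with E(DYD) = 0.5 BV sire.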

  22. Pseudo-data: problems of DYD’s / YD’s • YD’s are not very reliable and are subject to preferential treatment • YD’s and DYD’s are less correlated than EBV’s, yet still correlated, and their variances (=accuracies) are very hard to estimate. This leads to serious problems (Neuner et al., 2008, 2009) • Hard to define for some species/traits • We accept regular BLUP with pedigree that we don’t like

  23. 2-3-step procedure • Get pseudo-data from pedigree-BLUP • USA (Van Raden et al., JDS 92:16): run genomic evaluations with DYDs; combine with pedigree-BLUP • FR (Guillaume et al., JDS 91:2520; also http://www.inst-elevage.asso.fr/html1/IMG/pdf_CR_0972128-JT_13_oct_2009.pdf): run joint « QTLs – additive infinitesimal » BLUP evaluation with DYDs; needs variance component estimates (difficult to compute with the 20-35 QTLs they are using)

  24. Real problem of Pseudo-data • Extremely complex procedure • Loss of generality • We analyse a subset of the population. • Thus, ungenotyped dams (or daughters) of a bull do not benefit from its improved accuracy.

  25. Plan • Short review of models for genomic selection • The two-(three) step procedure • The genomic relationship matrix • …and its extensions to include pedigree • Performance of one-step evaluation

  26. The genomic relationship matrix • Remember y = Za + noise (phenotype = sum of SNP effects) • But we can say g = Za (genetic value = sum of SNP effects). This is a breeding value (= 2 EPD) in the literal sense • Then it follows that Var(g) = ZZ’ σ²_a • Standardizing: Var(g) = (ZZ’/k) σ²_u = G σ²_u, with G = ZZ’/k and σ²_u = k σ²_a • where σ²_u is « the » additive variance
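
A minimal sketch of G = ZZ'/k, assuming the centered coding (copies of allele 2 minus 2p) and the standardization k = 2 Σ p(1-p) described by Van Raden (JDS 2008). The function name, genotypes and allele frequencies are hypothetical.

```python
def genomic_relationship(genotypes, freqs):
    """G = ZZ'/k with Z centered as (copies of allele 2) - 2p."""
    z = [[g - 2.0 * p for g, p in zip(row, freqs)] for row in genotypes]
    k = sum(2.0 * p * (1.0 - p) for p in freqs)
    n = len(z)
    return [
        [sum(a * b for a, b in zip(z[i], z[j])) / k for j in range(n)]
        for i in range(n)
    ]


# Hypothetical data: 3 animals x 4 SNPs, coded as copies of allele 2.
genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]
freqs = [0.5, 0.5, 0.5, 0.5]   # assumed base-population frequencies

G = genomic_relationship(genotypes, freqs)
for row in G:
    print(row)
```

Negative off-diagonal elements are legitimate here: they mean two animals share fewer alleles than expected from the assumed frequencies.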

  27. The genomic relationship matrix • G : matrix of pseudo-relationship or « genomic relationships » • Also, a « molecular relationship matrix » • ZZ’ : « looks like » number of shared SNP alleles among two individuals

  28. Assume all possible genotype combinations at a locus and the 0, 1, 2 parameterization; the covariance (cross-product) at that locus is:
          11  12  22
      11   0   0   0
      12   0   1   2
      22   0   2   4

  29. The genomic relationship matrix • ZZ’ is different depending on how we parameterize Z • Parameterizations are • -1, 0, 1 • 0, 1, 2 • -2p, 1-2p, 2-2p

  30. The genomic relationship matrix • For example, assume two individuals are • (11,12,11,22,11) • (22,11,12,22,12) • zz’ (their covariance) is • 4 with 0, 1, 2 • 0 with -1, 0, 1 • -1.75 with -2p, 1-2p, 2-2p • All are correct (yet different), since each represents a valid linear model!
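
The three cross-products of slide 30 can be checked with a few lines. The slide does not say which p it uses; the sketch assumes p is the frequency of allele 2 at each locus within these two individuals, and under that assumption it reproduces the values 4, 0 and -1.75.

```python
ind1 = ["11", "12", "11", "22", "11"]
ind2 = ["22", "11", "12", "22", "12"]

# Copies of allele 2 at each locus.
copies1 = [g.count("2") for g in ind1]
copies2 = [g.count("2") for g in ind2]


def cross_product(offsets):
    """z1 z2' when each locus is coded as (copies of allele 2) - offset."""
    return sum((a - o) * (b - o) for a, b, o in zip(copies1, copies2, offsets))


n_loci = len(ind1)
zz_012 = cross_product([0.0] * n_loci)   # coding 0, 1, 2
zz_101 = cross_product([1.0] * n_loci)   # coding -1, 0, 1

# Assumption: p estimated from the two individuals themselves (4 alleles per locus).
p = [(a + b) / 4 for a, b in zip(copies1, copies2)]
zz_centered = cross_product([2 * pi for pi in p])   # coding -2p, 1-2p, 2-2p

print(zz_012, zz_101, zz_centered)
```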

  31. The genomic relationship matrix • How do we get the variance of SNP effects from a polygenic variance? • The formula assumes HW and linkage equilibrium of SNPs, which is false (Gianola et al., 2009) • This formula is (in HW) equal to trace(ZZ’) / number of individuals in data • k is not the number of SNPs
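
The formula itself did not survive in this transcript. The sketch below assumes it is the commonly cited one, σ²_a = σ²_u / k with k = 2 Σ p_i (1 - p_i) (Van Raden, JDS 2008; Gianola et al., 2009), and checks with exact fractions that, under Hardy-Weinberg and the centered coding, the expected contribution of each animal to trace(ZZ') equals k. The allele frequencies are hypothetical.

```python
from fractions import Fraction


def expected_diagonal(p):
    """E[z^2] at one SNP under Hardy-Weinberg, with z = copies - 2p."""
    q = 1 - p
    genotype_freqs = {0: q * q, 1: 2 * p * q, 2: p * p}
    return sum(f * (copies - 2 * p) ** 2 for copies, f in genotype_freqs.items())


# Hypothetical frequencies of allele 2 at three SNPs.
freqs = [Fraction(1, 10), Fraction(1, 4), Fraction(1, 2)]

k = sum(2 * p * (1 - p) for p in freqs)
expected_trace_per_animal = sum(expected_diagonal(p) for p in freqs)

print(k, expected_trace_per_animal)
```

Here k is about 1.06 for three SNPs, which illustrates the last bullet: k is not the number of SNPs.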

  32. The genomic relationship matrix • Elements in G are related to « true » (IBD) relationships • Why? • Two guys share the same allele at the marker because they have a common ancestor (perhaps beyond pedigree founders)

  33. The genomic relationship matrix • E(G) = A (relationship matrix) + a constant matrix (Habier et al., 2007, Genetics 177:1389) • If we use the parameterization -2p, 1-2p, 2-2p, then the constant is 0 (Van Raden, 2008, JDS 91:4414) • « If » p is the frequency in the founders • Otherwise the genotyped animals « are » the base population (Oliehoek et al. 173:483)

  34. The genomic relationship matrix • E(G) = A (relationship matrix) • A is an average relationship, and deviations from it do exist • G is more informative than A • Two full sibs might have a correlation of 0.6 or 0.4 • You need many markers to get these « fine relationships »

  35. Example • [Figure: the chromosome of a sire, and the chromosomes of his sons] • In the infinitesimal model, each son receives exactly half of the sire.

  36. Example • [Figure: the chromosome of a sire, and the chromosomes of FOUR sons] • In reality, two sons are identical, and the other two are very different from the first two but alike between themselves.

  37. The genomic relationship matrix • G can be used for evaluation: • Same results as fitting marker effects a • Some nice properties • Pseudo-reliabilities from inverse • Smaller set of equations • Can use old programs 
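
The claim "same results as fitting marker effects a" can be illustrated numerically. The sketch assumes normal SNP effects with a common variance, no fixed effects, and a known variance ratio `lam` (residual variance over SNP variance). It compares genetic values from the marker model, Z (Z'Z + lam I)^-1 Z'y, with those from the animal model built on ZZ', ZZ' (ZZ' + lam I)^-1 y. All data and helper names are hypothetical, and ZZ' is left unscaled, so the constant k is absorbed into `lam`.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def add_ridge(a, lam):
    """Return a + lam * I."""
    return [[x + (lam if i == j else 0.0) for j, x in enumerate(row)]
            for i, row in enumerate(a)]


def solve(a, b):
    """Solve a x = b (b a vector) by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Hypothetical centered genotypes (3 animals x 4 SNPs) and centered phenotypes.
Z = [[-1.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, -1.0, 1.0],
     [1.0, -1.0, 0.0, 0.0]]
y = [1.2, -0.4, 0.3]
lam = 2.0   # assumed ratio residual variance / SNP variance

Zt = transpose(Z)

# Marker model: estimate SNP effects, then sum them per animal.
Zty = [sum(z * yi for z, yi in zip(col, y)) for col in Zt]
a_hat = solve(add_ridge(matmul(Zt, Z), lam), Zty)
g_from_markers = [sum(z * a for z, a in zip(row, a_hat)) for row in Z]

# Animal model with ZZ' as (unscaled) relationship matrix.
ZZt = matmul(Z, Zt)
w = solve(add_ridge(ZZt, lam), y)
g_from_G = [sum(c * wi for c, wi in zip(row, w)) for row in ZZt]

print(g_from_markers)
print(g_from_G)
```

The animal-model system has one equation per animal instead of one per SNP, which is the "smaller set of equations" mentioned on the slide when SNPs outnumber genotyped animals.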

  38. Plan • Short review of models for genomic selection • The two-(three) step procedure • The genomic relationship matrix • …and its extensions to include pedigree • Performance of one-step evaluation

  39. Proposals for overall relationship matrix (Legarra et al., 2009, JDS 92:4656; Christensen & Lund, EAAP 2009) • Not a big loss in assuming normality for SNP effects (Cole et al., Van Raden et al.) • G is then easy to construct • Can we include G in the relationship matrix? • If we construct an overall relationship matrix with good properties, then we can just do BLUP with all data and animals

  40. Proposals for overall relationship matrix (Legarra et al., 2009, JDS 92:4656; Christensen & Lund, EAAP 2009; also Misztal et al., 92:4648) • Naive • Modification for progeny • Overall modification

  41. Naive proposal (Legarra et al., 2009; Gianola & De los Campos, 2008) • Let • 1 : ungenotyped animals • 2 : genotyped animals

  42. Naive proposal • Modification • 1 : do not touch • 2 : plug G (=K-1 in G&dlC) • May not be positive definite • Incoherent • Sons (parents) of two animals correlated in G might not be correlated themselves in A11

  43. Naive proposal • Does not work • Assume 2 are bulls and 1 are cows • Then bulls’ EBV can be computed as • and G serves no purpose… • This is because Ag is not a valid covariance matrix, as assumed by selection index or BLUP • Ignoring G to compute covariances among 1 and 2, or among individuals in 1, is wrong • On this slide, Z = incidence matrix of animals (not of SNPs!)

  44. Modification for progeny • Assume all parents (« 1 ») are genotyped and use G • Use Quaas’ 1988 tracing of BV’s • Use average transmissions of 0.5 • Each son is half parents + mendelian samplings

  45. Proposals for overall relationship matrix • Matrix becomes

  46. What are these T’s and P’s? • P: you are half your parents. One row of the pedigree file • T: you have ½ of the genes of your parents, ¼ of your grandparents, 1/8 of your great-grandparents and so on • Can be computed from the pedigree file through recursion • D: mendelian samplings
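
A small sketch of the recursion for T and of the decomposition A = T D T', assuming a pedigree sorted so that parents precede their progeny and ignoring inbreeding when filling D (valid for the hypothetical five-animal pedigree used here).

```python
# Hypothetical pedigree: (animal, sire, dam); 0 = unknown parent.
# Animals are numbered so that parents come before their progeny.
pedigree = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 1, 2), (5, 3, 0)]
n = len(pedigree)

T = []   # row i: fraction of genes of animal i traced back to each animal
D = []   # variance of the mendelian sampling term of each animal
for animal, sire, dam in pedigree:
    row = [0.0] * n
    row[animal - 1] = 1.0                  # own mendelian sampling
    for parent in (sire, dam):
        if parent:                         # add half of each known parent's row
            row = [x + 0.5 * t for x, t in zip(row, T[parent - 1])]
    T.append(row)
    known_parents = (sire != 0) + (dam != 0)
    D.append(1.0 - 0.25 * known_parents)   # 1, 0.75 or 0.5; inbreeding ignored

# Additive relationship matrix A = T D T'.
A = [
    [sum(T[i][k] * D[k] * T[j][k] for k in range(n)) for j in range(n)]
    for i in range(n)
]

for row in A:
    print(row)
```

Row 5 of T holds ½ for the parent (animal 3) and ¼ for each grandparent (animals 1 and 2), as described on the slide.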

  47. Proposals for overall relationship matrix • Matrix becomes • Correct, positive definite if all founders genotyped • Otherwise incoherent « backwards » • Animals correlated in G might have uncorrelated ancestors • Not very practical because it is complicated and does not account for ancestors

  48. Overall modification • What would we do « in the old times »? • Compute breeding values for whatever animals • Then use the « classical » selection index based on pedigree

  49. Overall modification • So then: [equation on slide, labelled « selection index » and « variance of the selection index (under normality) »] • and we can construct: [matrix on slide, with blocks labelled « genotyped » and « ungenotyped »]

  50. Overall modification • This leads to: • (Semi) positive definite (by construction) • No obvious incoherences • Identical to Ap if all founders genotyped
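
The matrix itself is missing from this transcript. The sketch below assumes it is the one published in Legarra et al. (2009, JDS 92:4656), with 1 = ungenotyped and 2 = genotyped animals: H11 = A11 + A12 A22^-1 (G - A22) A22^-1 A21, H12 = A12 A22^-1 G, H22 = G. The three-animal example (an ungenotyped sire with two genotyped half-sib sons) and the helper names are hypothetical.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col:
                m[r] = [x - m[r][col] * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def build_h(a11, a12, a22, g):
    """Overall relationship matrix, ungenotyped animals first."""
    b = matmul(a12, inverse(a22))                   # A12 A22^-1
    diff = [[x - y for x, y in zip(rg, ra)] for rg, ra in zip(g, a22)]
    corr = matmul(matmul(b, diff), transpose(b))    # A12 A22^-1 (G - A22) A22^-1 A21
    h11 = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(a11, corr)]
    h12 = matmul(b, g)                              # A12 A22^-1 G
    top = [r1 + r2 for r1, r2 in zip(h11, h12)]
    bottom = [r1 + r2 for r1, r2 in zip(transpose(h12), g)]
    return top + bottom


# Hypothetical example: animal 1 (ungenotyped sire), animals 2 and 3 (genotyped half sibs).
A11 = [[1.0]]
A12 = [[0.5, 0.5]]
A22 = [[1.0, 0.25], [0.25, 1.0]]
G = [[1.0, 0.4], [0.4, 1.0]]    # assumed genomic relationships of animals 2 and 3

H = build_h(A11, A12, A22, G)
H_same = build_h(A11, A12, A22, A22)   # with G = A22 the pedigree matrix is recovered

for row in H:
    print([round(x, 4) for x in row])
```

In this example the sons are more alike genomically (0.4) than the pedigree expects (0.25), and that extra information is propagated back to their ungenotyped sire through the first row of H.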
