Analysis of whole genome association studies in pedigreed populations. Goutam Sahana, Genetics and Biotechnology, Faculty of Agricultural Sciences, Aarhus University, 8830 Tjele, Denmark.
Concept of mapping • Identification of the genetic variant underlying disease susceptibility or a trait value • Evidence for the location of the gene = causal variant
Approaches to Mapping • Candidate gene studies: association; resequencing approaches • Genome-wide studies: linkage analysis; genome-wide association studies (linkage disequilibrium, LD mapping)
Linkage mapping • Look for marker alleles that are correlated with the phenotype within a pedigree • Different alleles can be connected with the trait in different pedigrees
Association mapping • Marker alleles are correlated with a trait on a population level • Can detect association by looking at unrelated individuals from a population • Does not necessarily imply that markers are linked to (are close to) genes influencing the trait.
Linkage vs. association [Figure: effect size plotted against frequency of the causal variant, with regions labelled "Linkage analysis", "Association study", "Unlikely to exist" and "Very difficult". Modified from D. Altshuler]
Linkage vs. association Hirschhorn & Daly, Nature Rev. Genet. 2005
Allelic Association • Direct association: the allele of interest is itself involved in the phenotype • Indirect association: the allele itself is not involved but is in LD with the functional variant • Spurious association: arises from confounding factors (e.g., population stratification)
Linkage disequilibrium • Non-random association between alleles at different loci. Loci are in LD if alleles are present on haplotypes in different proportions than expected based on allele frequencies • Two alleles that are in LD occur together more often than would be expected by chance
Linkage disequilibrium
Locus A: alleles A and a, with frequencies $p_A$ and $p_a$
Locus B: alleles B and b, with frequencies $p_B$ and $p_b$
Possible haplotypes:    AB          Ab          aB          ab
Expected frequencies:   $p_A p_B$   $p_A p_b$   $p_a p_B$   $p_a p_b$
Observed frequencies:   $p_{AB}$    $p_{Ab}$    $p_{aB}$    $p_{ab}$
$D = p_{AB} - p_A p_B \neq 0$
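As a rough illustration of the definition of D, the sketch below computes it from the four observed haplotype frequencies. The function name `ld_stats` and the example frequencies are illustrative assumptions, not values from the slides. The normalised measures |D′| and r² are added using their standard textbook definitions.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, |D'|, r^2) from the four observed haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    # Allele frequencies are the marginal sums of the haplotype frequencies.
    p_A = p_AB + p_Ab
    p_B = p_AB + p_aB
    p_a = 1.0 - p_A
    p_b = 1.0 - p_B

    # D: observed minus expected frequency of the AB haplotype.
    D = p_AB - p_A * p_B

    # The largest |D| attainable for these allele frequencies depends on the sign of D.
    if D >= 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    d_prime = abs(D) / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the alleles at the two loci.
    denom = p_A * p_a * p_B * p_b
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2


# Hypothetical frequencies: AB and ab are over-represented.
D, d_prime, r2 = ld_stats(0.4, 0.1, 0.1, 0.4)
print(round(D, 4), round(d_prime, 4), round(r2, 4))
```

With all four haplotypes at frequency 0.25 the same function returns D = 0, which is the linkage equilibrium case.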
LD variation across genome • The extent of LD is highly variable across the genome • The determinants of LD are not fully understood. • Factors that are believed to influence LD • Genetic drift • Population growth • Admixture or migration • Selection • Variable recombination rates
Haplotype
Genotypes (unphased):
Locus 1: 2 4
Locus 2: 1 3
Locus 3: 3 2
Locus 4: 4 1
Locus 5: 2 3
Locus 6: 1 2
Haplotypes (phased):
2 3 2 4 3 1
4 1 3 1 2 2
Identification of phase: PHASE, BEAGLE
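To see why phase has to be inferred, the sketch below enumerates every haplotype pair that is consistent with a set of unphased genotypes. It is a brute-force illustration only and is not how PHASE or BEAGLE work; the function name `possible_phasings` is a hypothetical helper.

```python
from itertools import product


def possible_phasings(genotypes):
    """Enumerate the distinct haplotype pairs consistent with unphased genotypes.

    genotypes: list of (allele1, allele2) tuples, one per locus.
    """
    # Only heterozygous loci create phase ambiguity.
    het = [i for i, (a, b) in enumerate(genotypes) if a != b]
    pairs = set()
    # Keep the first heterozygous locus fixed so mirror-image pairs are not counted twice.
    for flips in product([False, True], repeat=max(len(het) - 1, 0)):
        flip_at = dict(zip(het[1:], flips))
        h1, h2 = [], []
        for i, (a, b) in enumerate(genotypes):
            if flip_at.get(i, False):
                a, b = b, a
            h1.append(a)
            h2.append(b)
        pairs.add((tuple(h1), tuple(h2)))
    return sorted(pairs)


# The six heterozygous genotypes from the slide.
genotypes = [(2, 4), (1, 3), (3, 2), (4, 1), (2, 3), (1, 2)]
phasings = possible_phasings(genotypes)
print(len(phasings))
```

With k heterozygous loci there are 2^(k-1) candidate pairs, so six heterozygous loci already give 32 possibilities, one of which is the pair shown on the slide.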
Haplotype-based analysis • Increased ability to identify regions that are shared identical by descent among affected individuals • Haplotypes may be the causative ‘composite allele’ rather than a particular nucleotide at a particular SNP • Haplotype analysis is meaningful only if the SNPs are themselves in LD
Monogenic versus Complex traits
Monogenic trait • A mutation in a single gene is both necessary and sufficient to produce the phenotype or to cause the disease • The impact of the gene on genetic risk is the same in all families • Follows a clear segregation pattern in families • Typically rare in the population
Complex trait • Multiple genes lead to genetic predisposition to a phenotype • The pedigree reveals no Mendelian pattern • Any particular gene mutation is neither sufficient nor necessary to explain the phenotype • The environment makes a major contribution • We study the relative impact of individual genes on the phenotype
Quantitative Trait • A biological trait that shows continuous variation rather than falling into distinct categories • Quantitative trait locus (QTL): a genetic locus that is associated with variation in such a quantitative trait
Assessing genetic contributions to complex traits • Continuous characters (weight, blood pressure): heritability, the proportion of observed variance in phenotype explained by genetic factors • Discrete characters (disease): relative risk ratio, λ = (risk to a relative of an affected individual) / (risk in the general population) • λ encompasses all genetic and environmental effects, not just those due to any single locus
Factors that influence identification of allelic association • Effect size • Linkage disequilibrium • Disease and marker allele frequencies • Sample size • Reviewed by Zondervan & Cardon, Nature Rev. Genet. 2004
Sample size • No. of cases = no. of controls; D′ = 0.7; power 80%; α = 0.001 • Zondervan & Cardon (Nature Rev. Genet. 2004)
Population stratification • Consider two case/control samples, genotyped at a marker with alleles M and m • Sample A: $\chi^2$ not significant (NS) • Sample B: $\chi^2$ not significant (NS)
Population stratification • Sample A and Sample B combined: $\chi^2 = 14.8$, $P < 0.001$
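The allele counts behind this example did not survive in the transcript, so the sketch below uses hypothetical counts chosen only to reproduce the same pattern: no association within either sample, but a significant 1-df test once the samples are pooled. The helper `chi2_2x2` uses the usual shortcut formula for a 2x2 table.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical allele counts (M, m) in cases and controls; not the slide's data.
# Sample A: M is common (frequency 0.8) and the sample is mostly cases.
sample_a = {"cases": (80, 20), "controls": (40, 10)}
# Sample B: M is rare (frequency 0.2) and the sample is mostly controls.
sample_b = {"cases": (10, 40), "controls": (20, 80)}

chi2_a = chi2_2x2(*sample_a["cases"], *sample_a["controls"])
chi2_b = chi2_2x2(*sample_b["cases"], *sample_b["controls"])

# Pooling ignores the population label and adds the counts cell by cell.
pooled_cases = tuple(x + y for x, y in zip(sample_a["cases"], sample_b["cases"]))
pooled_controls = tuple(x + y for x, y in zip(sample_a["controls"], sample_b["controls"]))
chi2_pooled = chi2_2x2(*pooled_cases, *pooled_controls)

print(chi2_a, chi2_b, chi2_pooled)
```

Within each sample the allele frequency is identical in cases and controls, so the signal in the pooled table comes entirely from the different case/control proportions and allele frequencies of the two populations.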
Dealing with population structure • Genomic control (Devlin and Roeder, 1999) • Stratification inflates the distribution of the test statistic by a factor λ • λ is estimated from the data (unlinked ‘null’ markers) and used to adjust the test statistics [Figure: $\chi^2$ against $E(\chi^2)$ for unlinked ‘null’ markers and the test locus, with no stratification and with stratification]
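A minimal sketch of the genomic control adjustment, assuming 1-df chi-square statistics. It uses the common median-based estimator of λ, with the median of a 1-df chi-square distribution taken as approximately 0.455. Flooring λ at 1 is a widespread convention and an assumption here, as are the function name and the example numbers.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom (approximate).
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(null_stats, test_stat):
    """Estimate the inflation factor from null markers and deflate the test statistic.

    null_stats: 1-df chi-square statistics at unlinked 'null' markers.
    test_stat:  1-df chi-square statistic at the test locus.
    Returns (lambda, adjusted test statistic).
    """
    lam = median(null_stats) / CHI2_1DF_MEDIAN
    # Assumed convention: never inflate the test statistic, so lambda is at least 1.
    lam = max(lam, 1.0)
    return lam, test_stat / lam


# Hypothetical null-marker statistics whose median is about twice the expected value.
lam, adjusted = genomic_control([0.20, 0.91, 1.50], 14.8)
print(round(lam, 3), round(adjusted, 3))
```

In practice λ would be estimated from many null markers rather than three; the short list only keeps the arithmetic easy to follow.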
Dealing with population structure • Structured association (Pritchard et al., 2000) • Discover structure from a set of unlinked markers, i.e. assign probabilities of ancestry from k populations to each individual, and then control for it.
Association analysis approaches • Case-control studies: marker allele frequencies are determined in a group of affected individuals and compared with allele frequencies in a control population • Family-based methods: based on unequal transmission of alleles from parents to a single affected child in each family. Associations are summed over many unrelated families
Case-control studies: $\chi^2$ test • Genotypes: 2x3 contingency table • Alleles: 2x2 contingency table • Test of independence: $\chi^2 = \sum (O - E)^2 / E$, with 2 or 1 df
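The test of independence can be sketched directly from observed and expected counts. The genotype counts below are hypothetical, and the function names are illustrative. The p-values use the closed forms that exist for 1 and 2 degrees of freedom, so only the standard library is needed.

```python
import math


def chi_square_independence(table):
    """Return (chi-square, df) for a contingency table given as a list of rows."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df


def p_value(stat, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("only df = 1 or 2 are handled in this sketch")


def allele_table(genotype_table):
    """Collapse rows of (MM, Mm, mm) genotype counts into (M, m) allele counts."""
    return [[2 * mm_hom + het, het + 2 * m_hom] for mm_hom, het, m_hom in genotype_table]


# Hypothetical genotype counts (MM, Mm, mm): first row cases, second row controls.
genotypes = [[30, 50, 20], [20, 50, 30]]
geno_stat, geno_df = chi_square_independence(genotypes)          # 2x3 table, 2 df
alleles = allele_table(genotypes)
allele_stat, allele_df = chi_square_independence(alleles)        # 2x2 table, 1 df
print(geno_stat, geno_df, p_value(geno_stat, geno_df))
print(allele_stat, allele_df, p_value(allele_stat, allele_df))
```

The allelic test counts each individual twice, so it implicitly treats the two alleles of a person as independent observations.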
Family based tests • Genotypes from independent family trios where the child is affected • Use the non-transmitted genotypes or alleles as internal controls to the transmitted ones
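Family-based association studies [Figure: trio with parental genotypes 1/2 and 3/4 and an affected child with genotype 1/4. Transmitted alleles: 1 and 4. Non-transmitted alleles (control): 2 and 3.] Is an allele transmitted more often than it is not transmitted to affected offspring?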
TDT: Transmission Disequilibrium Test
Example trio: parents G/g and G/G, affected child G/g
                   Non-transmitted
                   G      g
Transmitted   G    a      b
              g    c      d
$\mathrm{TDT}_G = (T_G - NT_G)^2 / (T_G + NT_G) = (b - c)^2 / (b + c) \sim \chi^2_1$
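A small sketch of the TDT statistic, using only the two discordant cells b and c of the transmitted/non-transmitted table. The counts in the example are hypothetical, and the p-value uses the closed form for a 1-df chi-square distribution.

```python
import math


def tdt(b, c):
    """Transmission disequilibrium test from heterozygous-parent transmissions.

    b: heterozygous parents transmitting G (and not transmitting g)
    c: heterozygous parents transmitting g (and not transmitting G)
    Returns (chi-square statistic with 1 df, p-value).
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    stat = (b - c) ** 2 / (b + c)
    # Upper-tail probability of a 1-df chi-square variable.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Hypothetical counts: G transmitted 60 times and not transmitted 40 times.
stat, p = tdt(60, 40)
print(stat, round(p, 4))
```

Cells a and d do not enter the statistic because homozygous parents transmit the same allele regardless of association.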
TDT: Transmission Disequilibrium Test • Multiallelic markers: ETDT (Sham & Curtis, 1995) • Missing parent genotypes: TRANSMIT (Clayton, 1999) • Haplotypes: TDTHAP (Clayton & Jones, 1999) • Sibs: TDT/STDT (Spielman & Ewens, 1998) • Pedigrees: PBAT (Martin et al., 2000) • Quantitative traits: QTDT (Abecasis et al., 2000)
Some limitations • Subjects: random or structured families • Parents not available • Difficult when there are very many genes, each of small effect • Environmental influence may obscure genetic effects • Genetic heterogeneity underlying the disease phenotype • Hidden (unaccounted) relationships
Rare allele: a single family is segregating [Figure: alleles A/a and B/b; offspring group I and offspring group II]
Complex pedigree • Non-independence among pedigree members • The polygenic relationship alone is not sufficient • Association analysis should account for the point-wise relationship among individuals • Identical-by-descent probabilities
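For contrast with the point-wise (locus-specific) relationship, the sketch below computes the pedigree-wide expected relationship that underlies the polygenic term, using the standard tabular method for the additive relationship matrix. The function name and the toy half-sib pedigree are illustrative assumptions. Locus-specific identical-by-descent probabilities need marker data and are not attempted here.

```python
def relationship_matrix(pedigree):
    """Additive relationship matrix by the tabular method.

    pedigree: list of (animal, sire, dam) with parents listed before their
    offspring; use None for an unknown parent.
    Returns (index, A), where index maps animal id to row/column number.
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    index = {}
    for i, (animal, sire, dam) in enumerate(pedigree):
        # Parents not yet seen (or None) are treated as unknown.
        s = index.get(sire)
        d = index.get(dam)
        for j in range(i):
            # Relationship with an earlier animal j: average of j's relationship with the parents.
            a_js = A[j][s] if s is not None else 0.0
            a_jd = A[j][d] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_js + a_jd)
        # Diagonal: 1 plus the inbreeding coefficient (half the parents' relationship).
        inbreeding = 0.5 * A[s][d] if s is not None and d is not None else 0.0
        A[i][i] = 1.0 + inbreeding
        index[animal] = i
    return index, A


# Toy half-sib family: one sire mated to two unrelated dams.
pedigree = [
    ("S", None, None),
    ("D1", None, None),
    ("D2", None, None),
    ("O1", "S", "D1"),
    ("O2", "S", "D2"),
]
idx, A = relationship_matrix(pedigree)
print(A[idx["O1"]][idx["O2"]], A[idx["O1"]][idx["S"]])
```

The value for the two offspring is the expected half-sib relationship over the whole genome; at any single locus the actual sharing can be higher or lower, which is what the identical-by-descent probabilities capture.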
Methods • Combined linkage and LD • Generalized linear models • Mixed-model (Yu et al. 2006) • Bayesian approach
Combined linkage and LD • Phenotype = Fixed factors + Polygene + Haplotype • Polygene: the whole relationship in the pedigree is used • Identical-by-descent coefficients were estimated for the point-wise relationship • Phase determination: GDQTL • QTL mapping: DMU
QTL for Clinical Mastitis in cattle [Figure: panels labelled LD/LA, LA and LD]
Simulation • 100 half-sib families (dairy cattle pedigree) • 2000 progeny • 5 chromosomes, 100 cM each • 5000 SNPs • 15 QTL (1 QTL explaining 10%, 4 QTL explaining 5% each, 10 QTL explaining 2% each), together explaining 50% of the genetic variance • Heritability: 30%
Generalized linear models • Phenotype = Sire-family + genotype • Software: TASSEL http://www2.maizegenetics.net/index.php?page=bioinformatics/tassel
Mixed-model (Yu et al. 2006) • Phenotype = Fixed factors + SNP + Population + Polygene • SNP: genotype coded 0, 1, 2 • Population: from STRUCTURE • Polygene: relationship matrix • Software: SAS mixed model (Gael Pressoir)