Lecture 25: Association Genetics November 30, 2012
Announcements
• Final exam on Monday, Dec 10 at 11 am, in 3306 LSB
• 2010 exam and study sheets posted on website
• Exam is mostly non-cumulative
• Review session on Friday, Dec. 7
• Extra credit lab next Wednesday: up to 10 points
• Extra credit report due at final exam
Last Time
• Quantitative traits
• Genetic basis
• Heritability
• Linking phenotype to genotype
• QTL analysis introduction
• Limitations of QTL
Today
• Association genetics
• Effects of population structure
• Transmission Disequilibrium Tests
Quantitative Trait Locus Mapping
[Figure: cross of Parent 1 (a b c / a b c) × Parent 2 (A B C / A B C) giving F1 (A B C / a b c); F1 × F1 progeny carry recombinant chromosomes; plot of HEIGHT against GENOTYPE (bb, Bb, BB) at marker B. Modified from D. Neale]
QTL for aggressive behavior in mice (Brodkin et al. 2002)
[Figure: the same cross diagram (parents, F1, recombinant progeny), with Monoamine Oxidase A (MAOA) marked on the X chromosome. Image source: http://people.bu.edu/jcherry/webpage/pheromone.htm]
Monoamine Oxidase A (MAOA)
• Selectively degrades serotonin, norepinephrine, and dopamine
• Located near QTL for aggressive behavior on the X chromosome
• Levels of expression affected by a VNTR (minisatellite) locus in the promoter region
Sabol et al. 1998
Genotype-by-Environment Interaction
[Figure: MAOA and childhood maltreatment. Caspi et al. 2002]
QTL Limitations
• Biased toward detection of large-effect loci
• Need very large pedigrees to do this properly
• Limited genetic base: QTL may only apply to the two individuals in the cross!
• Genotype x Environment interactions are rampant: some QTL only appear in certain environments
• Huge regions of the genome underlie QTL, usually containing hundreds of genes
• How to distinguish among candidates?
Linkage Disequilibrium and Quantitative Trait Mapping
Linkage and quantitative trait locus (QTL) analysis
• Need a pedigree and a moderate number of molecular markers
• Very large regions of chromosomes represented by markers
Association studies with natural populations
• No pedigree required
• Need large numbers of genetic markers
• Small chromosomal segments can be localized
• Many more markers are required than in traditional QTL analysis
Cardon and Bell 2001, Nat. Rev. Genet. 2: 91-99
Association Mapping
[Figure: ancestral chromosomes carrying a marked site (*); recombination through evolutionary history; present-day chromosomes in a natural population; plot of HEIGHT against GENOTYPE (TT, TC, CC). Slide courtesy of Dave Neale]
Candidate Gene Identification
[Figure: linkage group I with marker names and map positions from 0.0 (P_204_C) to 262.9 (S17_26), showing a QTL for ABOVE:BELOW COARSE ROOT and the candidate region around it]
Candidate Gene Associations vs. Whole Genome Scans
• If LD is high and haplotype blocks are conserved, the entire genome can be efficiently scanned for associations with phenotypes
• Simplest for case-control studies (e.g., disease, gender)
• If LD is low, candidate genes are usually identified a priori, and a limited number are scanned for associations
• Biased by existing knowledge
• Use "Candidate Regions" from high LD populations, assess candidate genes in low LD populations
Human HapMap Project and Whole Genome Scans
• LD structure of human Chromosome 19 (www.hapmap.org)
• 1 common SNP genotyped every 5 kb for 269 individuals
• 9.2 million SNPs in total
• Take advantage of haplotype blocks to efficiently scan genome
Nature, Vol. 437, 27 October 2005
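Haplotype blocks are regions where nearby SNPs are in strong linkage disequilibrium. As a rough sketch of how LD between two biallelic SNPs is commonly summarized, the snippet below computes D and r² from haplotype counts. The function name `ld_from_haplotype_counts` and the counts in the example are hypothetical and are not taken from HapMap data.

```python
def ld_from_haplotype_counts(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic loci from haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total                 # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total         # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total         # frequency of allele B at locus 2
    d = p_AB - p_A * p_B                # deviation from independence
    r_squared = d ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, r_squared

# Hypothetical counts: 100 sampled chromosomes, alleles mostly co-occurring.
d, r_squared = ld_from_haplotype_counts(40, 10, 10, 40)
print(d, r_squared)
```

When r² between two SNPs is close to 1, genotyping one of them captures most of the information in the other, which is the idea behind scanning the genome with a reduced set of markers.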
Next-Generation Sequencing and Whole Genome Scans
• The $1000 genome is on the horizon
• Current cost with Illumina HiSeq 2000 is about $2000 for 10X depth
• The 1000 Genomes Project has sequenced thousands of human genomes at low depth
• Can detect most polymorphisms with frequency >0.01
• True whole genome association studies now possible at a very large scale
http://www.1000genomes.org/
Identifying genetic mechanisms of simple vs. complex diseases
Simple (Mendelian) diseases: caused by a single major gene
• High heritability; often can be recognized in pedigrees
• Examples: Huntington's, Achondroplasia, Cystic fibrosis, Sickle Cell Anemia
• Tools: linkage analysis, positional cloning
• Over 2900 disease-causing genes have been identified thus far (Human Gene Mutation Database: www.hgmd.cf.ac.uk)
Complex (non-Mendelian) diseases: caused by the interaction between environmental factors and multiple genes with minor effects
• Interactions between genes; low heritability
• Examples: Heart disease, Type II diabetes, Cancer, Asthma
• Tools: association mapping, SNPs
• Over 35,000 SNP associations have been identified thus far (http://www.snpedia.com)
Slide adapted from Kermit Ritland
Complicating factor: Trait Heterogeneity
• The same phenotype has multiple genetic mechanisms underlying it
Slide adapted from Kermit Ritland
Case-Control Example: Diabetes
• Knowler et al. (1988) collected data on 4920 individuals from Pima and Papago Native American populations in the Southwestern United States
• High rate of Type II diabetes in these populations
• Found significant associations with an Immunoglobulin G marker (Gm)
• Does this indicate underlying mechanisms of disease?
Knowler et al. (1988) Am. J. Hum. Genet. 43: 520
Case-control test for association (case = diabetic, control = not diabetic)
Question: Is the Gm haplotype associated with risk of Type 2 diabetes?
[Table: 2×2 counts of cases and controls by Gm haplotype; cell layout not preserved in this transcript]
(1) Test for an association:
$\chi^2_1 = \dfrac{(ad - bc)^2 N}{(a+c)(b+d)(a+b)(c+d)} = \dfrac{[(8 \times 71) - (29 \times 92)]^2 (200)}{(100)(100)(37)(163)} = 14.62$
(2) The chi-square is significant. Therefore presence of the Gm haplotype seems to confer reduced occurrence of diabetes.
Slide adapted from Kermit Ritland
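As a minimal sketch, the chi-square calculation on this slide can be checked in a few lines of Python. The function name `chi_square_2x2` is hypothetical, and the cell counts (a = 8, b = 29, c = 92, d = 71) are read off the worked formula rather than from the original table, which is not preserved here.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    numerator = (a * d - b * c) ** 2 * n
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    return numerator / denominator

# Cell counts taken from the worked formula on the slide.
chi2 = chi_square_2x2(8, 29, 92, 71)
print(round(chi2, 2))  # 14.62

# 3.841 is the 5% critical value of chi-square with 1 degree of freedom.
print(chi2 > 3.841)    # True
```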
Case-control test for association (continued)
Question: Is the Gm haplotype actually associated with risk of Type 2 diabetes?
The real story: stratify by American Indian heritage (0 = little or no Indian heritage; 8 = complete Indian heritage)
[Table: results stratified by heritage index; values not preserved in this transcript]
Conclusion: The Gm haplotype is NOT a risk factor for Type 2 diabetes, but is a marker of American Indian heritage
Slide adapted from Kermit Ritland
Population structure and spurious association
• Assume populations are historically isolated
• One has higher disease frequency by chance
• Unlinked loci are also differentiated between populations
• Unlinked loci show disease association when populations are lumped together
[Figure: a population with low disease frequency and a population with high disease frequency, separated by a gene flow barrier; symbols distinguish alleles at a neutral locus from alleles causing susceptibility to disease]
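The lumping effect can be illustrated with made-up numbers. In the sketch below, the marker is statistically independent of disease within each of two hypothetical populations, yet pooling them produces a large chi-square. All counts and the helper name `chi_square_2x2` are assumptions chosen for illustration; they are not data from the lecture.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df) for a 2x2 table.

    a = marker present, diseased    b = marker present, healthy
    c = marker absent, diseased     d = marker absent, healthy
    """
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + c) * (b + d) * (a + b) * (c + d))

# Hypothetical population A: disease frequency 0.4, marker frequency 0.2.
pop_a = (80, 120, 320, 480)
# Hypothetical population B: disease frequency 0.1, marker frequency 0.8.
pop_b = (80, 720, 20, 180)

chi2_a = chi_square_2x2(*pop_a)   # no association within A
chi2_b = chi_square_2x2(*pop_b)   # no association within B

# Lump the two populations together cell by cell.
pooled = tuple(x + y for x, y in zip(pop_a, pop_b))
chi2_pooled = chi_square_2x2(*pooled)

print(chi2_a, chi2_b, chi2_pooled)
```

The pooled association arises only because marker frequency and disease frequency both differ between the two populations, which is the pattern the Gm example shows.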
Association Study Limitations
• Population structure: differences between cases and controls
• Genetic heterogeneity underlying trait
• Random error/false positives
• Inadequate genome coverage
• Poorly-estimated linkage disequilibrium
Transmission Disequilibrium Test (TDT) (Spielman et al. 1993)
• Compare diseased offspring genotypes to parental genotypes to test if loci violate Mendelian expectations
• Controls for population structure
[Figure: example pedigrees with Mm and mm genotypes]
• a = number of times a heterozygous (Mm) parent transmits M to an affected offspring
• b = number of times M is not transmitted
• Test statistic: $(a-b)^2/(a+b)$, approximately distributed as $\chi^2$ with 1 degree of freedom
Slide adapted from Kermit Ritland
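A minimal sketch of the TDT statistic, assuming the transmission counts a and b have already been tallied from heterozygous parents of affected offspring. The function name `tdt_statistic` and the example counts are hypothetical.

```python
def tdt_statistic(transmitted, not_transmitted):
    """TDT statistic (a - b)^2 / (a + b), compared to chi-square with 1 df."""
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (transmitted - not_transmitted) ** 2 / total

# Hypothetical tallies: M transmitted 60 times, not transmitted 40 times.
stat = tdt_statistic(60, 40)
print(stat)          # 4.0

# 3.841 is the 5% critical value of chi-square with 1 degree of freedom.
print(stat > 3.841)  # True
```

Under Mendelian transmission a and b are expected to be equal, so the statistic measures how far the observed transmissions depart from 50:50.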
Transmission Disequilibrium Test (TDT)
Compared with "standard" association tests:
• Still need to have tight LD, so need many markers
• Is not affected by population stratification
• Uses only affected progeny (and parental genotypes), so method is efficient
• Only detects signal if there is both linkage and association
• Does not depend on mode of inheritance
Association Tests and Population Structure
• Transmission disequilibrium tests have limited power and range of application
• sample size limitations
• restricted allelic diversity
• "Genomic Control" uses random markers throughout genome to control for false associations
• "Mixed Model" approach allows incorporation of known relatedness and population structure simultaneously
Cardon and Bell 2001, Nature Reviews Genetics 2: 91
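One common form of genomic control rescales each test statistic by an inflation factor λ, estimated as the median chi-square across random markers divided by the median of a chi-square distribution with 1 degree of freedom (about 0.4549). The sketch below assumes that form. The function names and the marker statistics are hypothetical, and this is not necessarily the exact procedure covered in the lecture.

```python
from statistics import median

CHI2_1DF_MEDIAN = 0.4549  # median of a chi-square distribution with 1 df

def genomic_control_lambda(random_marker_stats):
    """Inflation factor from chi-square statistics at random (null) markers."""
    lam = median(random_marker_stats) / CHI2_1DF_MEDIAN
    return max(1.0, lam)  # never deflate: lambda is floored at 1

def corrected_statistic(stat, lam):
    """Rescale a candidate marker's chi-square by the inflation factor."""
    return stat / lam

# Hypothetical chi-square values at five random markers in a structured sample.
null_stats = [0.1, 0.4, 0.9, 1.5, 4.2]
lam = genomic_control_lambda(null_stats)
adjusted = corrected_statistic(14.62, lam)
print(lam, adjusted)
```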
ANOVA/Regression Model
[Equation figure with labeled terms: phenotype (response variable) of individual i, under a (monotonic) transformation; effect size (regression coefficient) β; coded genotype (feature) of individual i; error (residual); p(β=0)]
Goal: Find the effect size that best explains all (potentially transformed) phenotypes as a linear function of the genotypes, and estimate the probability (p-value) of the data being consistent with the null hypothesis (i.e. no effect)
http://www2.unil.ch/cbg/index.php?title=Genome_Wide_Association_Studies
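As a rough sketch of the regression step, the code below fits phenotype on an additively coded genotype (0, 1, or 2 copies of an allele) by ordinary least squares and returns the effect size β with its t statistic. The additive coding, the function name `additive_regression`, and the data are assumptions made for illustration. The p-value for β = 0 would come from a t distribution with n − 2 degrees of freedom, which is left out to keep the example in the standard library.

```python
import math

def additive_regression(genotypes, phenotypes):
    """Least-squares fit of phenotype = intercept + beta * genotype.

    Returns (beta, intercept, t_statistic) for the null hypothesis beta = 0.
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(genotypes, phenotypes))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # The residual variance uses n - 2 degrees of freedom.
    residual_ss = sum((y - (intercept + beta * x)) ** 2
                      for x, y in zip(genotypes, phenotypes))
    std_err = math.sqrt(residual_ss / (n - 2) / sxx)
    return beta, intercept, beta / std_err

# Hypothetical data: six individuals, genotype coded as allele count.
genotypes = [0, 1, 2, 0, 1, 2]
phenotypes = [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]
beta, intercept, t_stat = additive_regression(genotypes, phenotypes)
print(beta, intercept, t_stat)
```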
Mixed Model
[Equation figure with labeled terms: phenotype (response variable) of individual i; effects of background SNPs; effect of target SNP; family effect (kinship coefficient); population effect (e.g., admixture coefficient from Structure or values of principal components)]
Implemented in the TASSEL program (Wednesday in lab)
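The equation itself is not preserved in this transcript. One schematic way to write a model with the labeled terms is given below; every symbol except β is an assumption introduced here for illustration, and the exact parameterization used in the lecture or in TASSEL may differ.

```latex
y_i = \mu
    + \sum_{k} \gamma_k \, s_{ik}   % effects of background SNPs
    + \beta \, x_i                  % effect of target SNP
    + u_i                           % family effect (kinship)
    + \sum_{m} \alpha_m \, q_{im}   % population effect
    + \varepsilon_i                 % residual error
```

Here $y_i$ is the phenotype of individual $i$, $x_i$ the coded genotype at the target SNP, $s_{ik}$ the genotypes at background SNPs, $q_{im}$ the admixture coefficients or principal-component values, and $u_i$ a random effect whose covariance among individuals is taken to be proportional to the kinship matrix.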
Commercial Services for Human Genome-Wide SNP Characterization
• Assay 1.2 million "tag SNPs" scattered across genome using Illumina BeadArray technology
• Ancestry analyses and disease/behavioral susceptibility
Nature, Vol. 437, 27 October 2005