What is an association study? Define linkage disequilibrium. What are the different study designs and statistical approaches used to analyse such studies? Why was the HapMap project so vital for this work? Louise McClelland, FRCPath study course, 15th December 2010
Keywords / Phrases • Association • Haplotypes • SNP • Linkage disequilibrium • Founder effect • Admixture • Type 1 errors • Genome-wide association studies (GWAS) • Common disease – common variant hypothesis • Population stratification • Transmission disequilibrium test (TDT) • Case-control study • HapMap • SNP arrays • Odds ratio • Chi-squared test
What is an association study? • Used to identify variants associated with complex diseases • Compares the frequency of variant(s) in affected patients with that in a carefully matched control group • Assesses whether there is a statistically significant difference (association) between the frequency of a variant in the control and patient groups • Association is different from linkage: • Relates to the study of disease in populations, NOT families • Does not rely on multi-case families or special family structures • Has higher power to detect weak effects
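The frequency comparison described above can be sketched as a chi-squared test on a 2x2 table of allele counts. This is a minimal illustration, not the method of any particular study: the function name and the counts are hypothetical, and the p-value uses the closed form for a chi-squared distribution with 1 degree of freedom.

```python
import math

def allelic_chi_squared(case_counts, control_counts):
    """Pearson chi-squared test on a 2x2 table of allele counts.

    case_counts and control_counts are (variant_alleles, other_alleles).
    Returns (chi2, p_value) for 1 degree of freedom.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of chi-squared with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical data: 1000 cases and 1000 controls, so 2000 alleles per group.
chi2, p = allelic_chi_squared((600, 1400), (500, 1500))
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

A small p-value here only says the frequencies differ; the possible reasons for that difference are the subject of the next slide.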
Causes of Association • Direct causation • Having a particular allele makes an individual susceptible to disease • Epistatic effect • Variant gives a reproductive advantage to affected people • Population stratification • The population contains several genetically distinct subsets • Both the disease and the associated variant are frequent in one subset • e.g. Lander and Schork illustrated this with an association between the HLA*A1 allele and the ability to eat with chopsticks in the San Francisco Bay area population, which arises because HLA*A1 is more common among the Chinese population of that area • Type 1 error • Rejecting the null hypothesis (no association) when it is true, i.e. a false positive • Linkage disequilibrium (LD)
Linkage disequilibrium (LD) • “Association of certain alleles at two linked loci more frequently than would be expected by chance.” • Causes of LD • Founder effect – caused by rapid growth of a small, isolated population • Selective pressure – an allele combination causes e.g. enhanced or diminished reproductive fitness • Population admixture – interbreeding between two or more previously isolated populations within a species
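The definition above is usually quantified with the standard measures D, D' and r². The sketch below computes them from haplotype and allele frequencies; the function name and the example frequencies are illustrative assumptions, and both loci are assumed to be polymorphic (otherwise r² is undefined).

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, |D'|, r2) for alleles A and B at two loci.

    p_ab: frequency of the A-B haplotype
    p_a, p_b: frequencies of allele A and allele B (both strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # excess of the A-B haplotype over chance expectation
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical frequencies: A and B always travel together (complete LD).
print(ld_measures(0.5, 0.5, 0.5))
# Hypothetical frequencies: A and B are independent (no LD).
print(ld_measures(0.25, 0.5, 0.5))
```

r² is the measure most relevant to tag-SNP selection, because it reflects how well one SNP predicts the other.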
LD varies around the genome because of: • Recombination hotspots • Gene conversion, which may cause localised breakdown of LD • Population history: the more meioses that separate 2 individuals from their common ancestor, the shorter the chromosomal segment they share
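The effect of recombination and population history on LD can be sketched with the standard population-genetics decay relation D_t = D_0(1 − θ)^t, where θ is the recombination fraction between the two loci and t the number of generations. This assumes random mating and no selection or drift; the function name and the numbers are illustrative only.

```python
def ld_after_generations(d0, theta, generations):
    """Expected LD coefficient D after a number of generations.

    Assumes random mating with no selection or drift, so that
    D_t = D_0 * (1 - theta) ** t, with theta the recombination fraction.
    """
    return d0 * (1 - theta) ** generations

# Hypothetical starting LD of 0.25, followed for 100 generations.
tightly_linked = ld_after_generations(0.25, 0.0001, 100)
near_hotspot = ld_after_generations(0.25, 0.01, 100)
print(f"theta = 0.0001: D = {tightly_linked:.4f}")
print(f"theta = 0.01:   D = {near_hotspot:.4f}")
```

Markers separated by a hotspot lose LD far faster than tightly linked ones, which is consistent with the block-like LD structure that HapMap reported.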
Common disease – Common Variant hypothesis • Common diseases are caused by a combination of relatively common, low-penetrance variants • Disease-causing variants: • Arose in a common ancestor • Are still in LD with surrounding markers • Are still causative • Are relatively benign • Have survived for numerous generations • Alternative: mutation-selection hypothesis • Many or most susceptibility factors are: • Deleterious enough to be removed by natural selection • Replaced by de novo mutations as old ones are lost • Individually rare, but each confers high susceptibility • Association studies have little power to detect these rare variants (present in <5% of patients)
Types of association study • Candidate approach • Variant • Gene • Chromosome region • Variants across the genome (GWAS) • No prior assumptions about the variant involved are needed • Most disease-associated variants will not actually be the cause of disease susceptibility but will be in LD with the causal variant(s)
Candidate association studies (pre-GWAS) • Focused on small candidate chromosomal regions • 1950s – ABO and disease • 1960s – HLA loci and disease e.g. autoimmune disease • Limited success in diabetes (type 1 and 2), age-related macular degeneration, schizophrenia • But... • Associations were seldom replicated • Poor study design • Inadequately matched controls • Insufficient correction for multiple testing • When an association was confirmed, it was often down to luck in an underpowered study • Costly, in money and manpower
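As an illustration of the multiple-testing point above, the sketch below applies a Bonferroni correction, one common (and conservative) way to control type 1 errors when many variants are tested. The slides do not say which correction any given study used; the function names and p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which p-values remain significant after Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]

# Hypothetical p-values from five candidate variants.
p_values = [0.04, 0.001, 0.2, 0.012, 0.03]
print(significant_after_bonferroni(p_values))

# With 1 million SNPs tested, the per-test threshold becomes very strict.
print(bonferroni_threshold(0.05, 1_000_000))
```

Four of the five hypothetical variants would pass an uncorrected 0.05 threshold, but only one survives the correction.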
What allowed genome-wide association studies (GWAS) to develop? • ~2005 • New technical developments • SNP array technology • Creation of consortia to co-ordinate large-scale association studies (1000s of subjects) • Genetic Association Information Network (GAIN) in USA • Wellcome Trust Case Control Consortium (WTCCC) in UK • HapMap project
WWW.HAPMAP.ORG • Began in 2002 • Consortium of academic institutions and pharmaceutical companies • Aim • Genotype at least 1 common SNP every 5 kb in the euchromatic portion of the human genome • Map blocks of linkage disequilibrium (LD) • Identify SNPs that can be used as ‘tags’ for a region of LD • ‘Common’ SNPs • ~10 million • Defined as having a minor allele frequency (MAF) ≥0.05
Phase 1 reported 2005 • ~1 million SNPs • Phase 2 reported 2007 • 3.1 million SNPs • 25-30% of common variation • SNP density of 1 per kb • Haplotypes phased directly from trios (CEU, YRI) or by computer analysis (CHB, JPT)
Results • All 4 populations showed similar LD blocks • Little or no LD between markers in adjacent haplotype blocks • Each haplotype block contains a few common haplotypes • 4.0-5.6 haplotypes account for 93-95% of copies of any given block • Size of blocks similar in CEU, CHB and JPT (13.2-16.3 kb) but smaller in YRI (7.3 kb) • For each block, a small number of tag-SNPs can define which of the 4-6 common haplotypes a person carries • Tag-SNPs allow better GWAS design by capturing the most variation possible for the number of SNPs used
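The tag-SNP idea above can be sketched as a search for the smallest set of SNP positions whose alleles distinguish every common haplotype in a block. This brute-force version is a toy illustration, not the selection procedure used by HapMap; the function name and haplotypes are hypothetical, and alleles are coded as 0/1 characters.

```python
from itertools import combinations

def minimal_tag_snps(haplotypes):
    """Return the smallest tuple of SNP positions that tells all haplotypes apart.

    haplotypes: list of equal-length strings, one character per SNP.
    Returns None if the haplotypes cannot be distinguished (e.g. duplicates).
    """
    n_snps = len(haplotypes[0])
    for size in range(1, n_snps + 1):
        for positions in combinations(range(n_snps), size):
            # Alleles each haplotype carries at the candidate tag positions.
            signatures = {tuple(h[i] for i in positions) for h in haplotypes}
            if len(signatures) == len(haplotypes):
                return positions
    return None

# Hypothetical block with 6 SNPs and 4 common haplotypes.
block = ["000000", "110011", "001111", "111100"]
print(minimal_tag_snps(block))
```

In this toy block, genotyping 2 of the 6 SNPs is enough to identify which common haplotype is present, which is why tag-SNPs reduce genotyping cost.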
GWAS • Individuals from a chosen population are genotyped for a series of polymorphic markers • Allele frequencies in the disease cases are compared with those in the controls • Most common method currently used: SNP arrays • SNPs are good markers because they are: • Abundant • Less mutable than e.g. microsatellites • Easy to genotype on a large scale, up to 1 million at once • Results are more robust and replicable than in previous association studies • A well-designed study can detect disease-associated alleles with a relative risk as low as 1.2
Study design: Reduce chance of spurious observation • Large enough sample size e.g. a few thousand cases and controls • Well-matched cases and controls • Be aware of population subgroups • Campbell et al. found an association between tall stature and persistence of intestinal lactase among European Americans • Re-matching individuals on the basis of European ancestry greatly decreases the association (population stratification) • HLA*A1 and ability to eat with chopsticks (Lander and Schork) • [Figure: cases and controls drawn from two populations; blue is over-represented among cases, but only because it is more frequent in population 1]
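The stratification problem above can be shown numerically. In the sketch below, the allele has no effect within either population (odds ratio 1 in each), yet pooling the populations produces a strong spurious association; a Mantel-Haenszel odds ratio, which combines the per-population tables, removes it. All counts and function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b: cases with / without the allele
    c, d: controls with / without the allele
    """
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across strata of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical data. Population 1: disease common, allele frequency 0.5.
population_1 = (400, 400, 100, 100)
# Population 2: disease rare, allele frequency 0.1.
population_2 = (20, 180, 80, 720)

pooled = tuple(x + y for x, y in zip(population_1, population_2))
print("Pooled OR:", round(odds_ratio(*pooled), 2))
print("Stratified (Mantel-Haenszel) OR:", mantel_haenszel_or([population_1, population_2]))
```

The pooled odds ratio is well above 1 only because population 1 contributes most of the cases and also carries the allele more often, which is the situation the figure describes.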
Study design: Population choice • Good population choices: • Founder groups e.g. Iceland, Finland, Quebec • deCODE project (Iceland), similar projects in Quebec and elsewhere • Populations with recent admixture between groups with widely different incidences of common disease • e.g. Lemba, a Bantu-Semitic hybrid population in Africa • In reality, the availability of large numbers of potential subjects with good medical records is more important than population structure • UK Biobank • Collects medical and lifestyle data and DNA from 500 000 British people aged 45-69 years and follows their health prospectively • African populations require larger sample sizes and more elaborate statistics • A tag-SNP in a European population can give information on up to 10 SNPs • In African populations it may only give information on a few others
Study design: Case and Control choice • Carefully selecting patient phenotype will increase the power of the study • Carefully matched control group • How closely the controls need to match depends on the population • WTCCC showed that the British Caucasian population only shows regional differences at a small number of chromosome locations so can be treated as one population
Study Analysis: Transmission disequilibrium test (TDT) • Genotype trios: proband and both parents (regardless of the parents' affected status) • Select parents who are heterozygous at a locus • Count the number of times each allele is transmitted to their affected offspring • Assess over-transmission with a statistic based on the chi-squared test • $\chi^2 = (a-b)^2/(a+b)$, with 1 degree of freedom • a = no. of times a heterozygous parent transmits allele I to an affected offspring • b = no. of times a heterozygous parent does not transmit allele I to an affected offspring • Modifications to the TDT: • An extended TDT (ETDT) has been created to work with multiallelic markers e.g. microsatellites • TDT can be used with only 1 parent but this may bias the result • Advantages • Overcomes population stratification problems (the problem of matching controls) • Non-inherited parental alleles serve as an internal control • Can also be used to check for non-Mendelian segregation of alleles, which may reflect CNVs • Disadvantages • For late-onset diseases parents are often not available; a sib-TDT (which looks at differences between affected and unaffected sibs) can be used instead
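The TDT statistic above is simple to compute once the transmissions from heterozygous parents have been counted. The sketch below uses the slide's definitions of a and b; the function name and the counts are hypothetical, and the p-value assumes a chi-squared distribution with 1 degree of freedom.

```python
import math

def tdt(a, b):
    """Transmission disequilibrium test.

    a: times a heterozygous parent transmits allele I to an affected offspring
    b: times a heterozygous parent does not transmit allele I
    Returns (chi2, p_value), with chi2 = (a - b)^2 / (a + b) on 1 df.
    """
    chi2 = (a - b) ** 2 / (a + b)
    # Survival function of chi-squared with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts from 100 informative (heterozygous) parents.
chi2, p = tdt(60, 40)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Under the null hypothesis each allele is transmitted half the time, so a and b should be roughly equal; only heterozygous parents are informative.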
Study Analysis: Case-control • Uses a disease cohort and unrelated controls • Usual statistical analysis is the odds ratio • A measure that approximates relative risk • Can be calculated directly from the study results (no need for population frequencies) • Advantages • Needs 50% fewer samples than TDT • More feasible for late-onset disease • Disadvantages • Clinical validity depends on the control group used • Odds ratio must be interpreted with caution, especially for high-frequency risk alleles
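A minimal sketch of the odds ratio calculation, using only the counts observed in the study. The 95% confidence interval uses the common log-based (Woolf) approximation, which is an addition for illustration rather than something the slides specify; the function name and counts are hypothetical.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% confidence interval.

    a, b: cases with / without the risk allele
    c, d: controls with / without the risk allele
    All four counts must be non-zero.
    """
    estimate = (a * d) / (b * c)
    # Standard error of ln(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

# Hypothetical small study: 100 cases and 100 controls.
estimate, lower, upper = odds_ratio_with_ci(30, 70, 20, 80)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

In this example the interval spans 1, so a study of this size could not distinguish the apparent effect from no association, which illustrates why large sample sizes are needed.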
Limitations of GWAS • Cannot be used to detect rare susceptibility variants (<5% of patients) • Do not directly identify the causal variant; follow-up is needed: • Assess each variant in or near the haplotype block • Identify the factor(s) that give the strongest association • Functional assays • Inequality of population choice: studies tend to focus on populations of European origin because: • That is where the funding comes from • African populations are harder to study because of reduced LD • If GWAS yield clinically relevant results, this could cause a "health disequilibrium" (but has not as yet) • CNVs account for a greater number of variable nucleotides than SNPs • They often cause dosage effects of one or more genes e.g. salivary amylase
References • Amos, Hum Mol Genet, 16(R2), R220-R225, 2007 • Balding, Nat Rev Genet, 7, 781-790, 2006 • Bodmer and Bonilla, Nat Genet, 40(6), 695-701, 2008 • Campbell et al., Nat Genet, 37, 868-872 • International HapMap Consortium, Nature, 449, 851-862, 2007 • Mathew, Nat Rev Genet, 9, 9-14, 2008 • Risch and Merikangas, Science, 273(5281), 1516-1517, 1996 • Strachan and Read, Human Molecular Genetics, 4th Edition • Wellcome Trust Case Control Consortium, Nature, 447, 661-678, 2007