What is an association study? Define linkage disequilibrium. Roy Poh, St George's Hospital
Association studies • Association studies are carried out to provide a statistical statement about the co-occurrence of alleles and phenotypes. • For instance, allele A is associated with disease D if people who have D also have A significantly more often (or less often) than would be predicted from the individual frequencies of D and A in the population.
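The comparison described above (observed co-occurrence of D and A versus the co-occurrence predicted from their individual frequencies) can be sketched as a Pearson chi-square test on a 2x2 table. This is a minimal illustration, not a method taken from the slides: the function name `association_summary` and the example counts are hypothetical, and a real study would normally use an established statistics package.

```python
def association_summary(n_d_a, n_d_nota, n_nod_a, n_nod_nota):
    """Compare observed and expected co-occurrence of disease D and allele A.

    Arguments are counts of individuals in a 2x2 table (hypothetical layout):
    D with A, D without A, no D with A, no D without A.
    Returns (expected count of D with A, Pearson chi-square statistic).
    """
    n = n_d_a + n_d_nota + n_nod_a + n_nod_nota
    freq_d = (n_d_a + n_d_nota) / n   # frequency of disease D
    freq_a = (n_d_a + n_nod_a) / n    # frequency of allele A carriers

    observed = [n_d_a, n_d_nota, n_nod_a, n_nod_nota]
    # Expected counts if D and A were independent of each other.
    expected = [
        freq_d * freq_a * n,
        freq_d * (1 - freq_a) * n,
        (1 - freq_d) * freq_a * n,
        (1 - freq_d) * (1 - freq_a) * n,
    ]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected[0], chi2


# Made-up counts: 60 of 100 cases carry A, 40 of 100 controls carry A.
expected_d_a, chi2 = association_summary(60, 40, 40, 60)
print(expected_d_a, chi2)  # 50.0 expected versus 60 observed; chi-square = 8.0
```

With one degree of freedom, a chi-square statistic above roughly 3.84 corresponds to p < 0.05, so these made-up counts would count as a nominally significant association before any correction for multiple testing.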
Reasons for association • Direct causation: Having allele A makes the individual susceptible to disease D. (Allele A is neither necessary nor sufficient to cause disease D, but it increases the likelihood.) • Natural selection: People who have disease D might be more likely to survive and have children if they also have allele A. • Population stratification: The population contains several genetically distinct subsets, and both the disease and the allele happen to be more frequent in one subset. • Type 1 error: Even without any true effect, about 5% of tests will be significant at the p=0.05 level and about 1% at the p=0.01 level, so the raw p value needs correcting for the number of questions asked. • Linkage disequilibrium: The goal of association studies in complex disease is to discover associations caused by LD between the marker and the disease. Population association can therefore have many possible causes, not all of them genetic.
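One common way to correct a raw p value for the number of questions asked is the Bonferroni correction. The slides do not name a specific method, so this choice is an assumption, and the function names and p values below are hypothetical.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of significant results when no true effect exists."""
    return n_tests * alpha


def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction (assumed here; the slides name no method).

    Returns a list of (raw p, adjusted p, significant?) tuples, where a test
    is significant only if its raw p is at most alpha / number of tests.
    """
    m = len(p_values)
    threshold = alpha / m
    return [(p, min(1.0, p * m), p <= threshold) for p in p_values]


# Testing 20 unrelated markers at p=0.05 yields about one false positive.
print(expected_false_positives(20))

# Hypothetical raw p values from four markers.
for raw, adjusted, significant in bonferroni([0.001, 0.02, 0.04, 0.30]):
    print(raw, adjusted, significant)
```

Bonferroni is conservative when the tests are correlated, as nearby SNPs in LD usually are, so it is best read as a simple upper bound on the correction needed.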
Linkage disequilibrium • The occurrence together of two or more alleles at closely linked loci more frequently than would be expected if haplotypes formed at random from alleles according to their frequencies. • Non-random association between alleles at different loci is quantified by the degree of linkage disequilibrium (e.g. D’).
Measures of linkage disequilibrium • The two most common measures are D’ and r²
Measures of linkage disequilibrium • The maximum absolute value of D depends on the allele frequencies at the two loci as well as on the extent of disequilibrium. D’ is therefore preferred as a measure of LD because it takes the allele frequencies into account. • D’ is obtained by dividing D by its maximum possible value given the allele frequencies at the two loci: D’ = D/Dmax • D’ = 1 is complete LD and provides a useful indication of minimal historical recombination. • D’ < 1 indicates that the complete ancestral LD has been disrupted, but the meaning of intermediate values is unclear, so they should not be used to compare the strength of LD between studies or to measure the extent of LD. Estimates of D’ are strongly inflated in small samples. • As a rule of thumb, D’ > 0.33 is taken as the threshold level of LD above which associations will be apparent
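The slides do not spell out how D itself is calculated. The sketch below assumes the standard two-locus definition, D = P(AB) - P(A)P(B), and the usual sign-dependent form of Dmax; it reports the absolute value of D/Dmax, which is a common convention rather than something stated in the slides. The function name and haplotype frequencies are hypothetical.

```python
def d_and_d_prime(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, |D'|) from the four haplotype frequencies at two loci.

    Assumes the standard definition D = P(AB) - P(A) * P(B).
    """
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    p_A = p_AB + p_Ab  # frequency of allele A at locus 1
    p_B = p_AB + p_aB  # frequency of allele B at locus 2
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic")

    d = p_AB - p_A * p_B
    # The largest |D| the allele frequencies allow depends on the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return d, abs(d) / d_max


# Three of the four possible haplotypes present (Ab is absent): complete LD.
print(d_and_d_prime(0.5, 0.0, 0.25, 0.25))    # (0.125, 1.0)

# Haplotypes formed at random from alleles: no LD.
print(d_and_d_prime(0.25, 0.25, 0.25, 0.25))  # (0.0, 0.0)
```

The first call illustrates why D’ = 1 is read as minimal historical recombination: one of the four possible haplotypes has never been generated, even though D itself is only 0.125.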
Measures of linkage disequilibrium • r² = D² / (p1 p2 q1 q2), i.e. D² divided by the product of the allele frequencies at the two loci • r² measures statistical association, and there is a simple inverse relationship between this measure and the sample size required to detect association between susceptibility loci and SNPs • r² takes the value 1 if only two haplotypes are present.
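A short sketch of the r² formula above, together with the inverse relationship to sample size. It assumes that p1, q1 are the two allele frequencies at the first locus and p2, q2 those at the second, and that D = P(AB) - p1 p2; the sample-size rule (multiply by 1/r²) is the usual approximation rather than a formula given in the slides. Function names and numbers are hypothetical.

```python
import math


def r_squared(p_AB, p_Ab, p_aB, p_ab):
    """r^2 = D^2 / (p1 * q1 * p2 * q2) from the four haplotype frequencies."""
    p1 = p_AB + p_Ab   # allele A at locus 1
    q1 = 1 - p1        # allele a at locus 1
    p2 = p_AB + p_aB   # allele B at locus 2
    q2 = 1 - p2        # allele b at locus 2
    d = p_AB - p1 * p2
    return d * d / (p1 * q1 * p2 * q2)


def required_sample_size(n_direct, r2):
    """Approximate sample size needed when a marker SNP is typed instead of
    the susceptibility variant itself (assumed rule: n_direct / r^2)."""
    return math.ceil(n_direct / r2)


# Only two haplotypes (AB and ab) present: r^2 is 1 up to rounding.
print(r_squared(0.6, 0.0, 0.0, 0.4))

# Three haplotypes present: D' would be 1, but r^2 is only about 0.33.
print(r_squared(0.5, 0.0, 0.25, 0.25))

# A study needing 1000 samples at the causal variant needs about 2000
# at a marker with r^2 = 0.5.
print(required_sample_size(1000, 0.5))
```

The second call shows how the two measures can disagree: a pair of SNPs can have D’ = 1 while r² is well below 1, so a marker in complete LD by D’ may still be a weak proxy for the susceptibility variant.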
Linkage disequilibrium vs association • Linkage disequilibrium: a relation between loci, specifically a genetic relationship • Association: a relation between alleles or phenotypes, a statistical relationship with various causes (not necessarily genetic) • LD can be used to detect associations in a family or population, e.g. if a disease gene is linked to another locus, the haplotype at that locus shared within the family or population will be associated with the disease. (Outside the family, or in different populations, the haplotype at the linked locus might be different.)
More information… • LD does not decay smoothly with distance • Chromosomes contain a series of islands of long-range LD (up to 50 kb) that are bounded by recombination hotspots • If the LD structure of each chromosome, and thus of the genome, is elucidated and the haplotypes of each LD island are defined, these data could then be used to test for association with any disease (HapMap project)
Example: • SORL1 (neuronal sortilin-related receptor gene) is associated with Alzheimer disease (Nature Genetics 2007) • Accumulation of Aβ peptide (a derivative of APP) is a central event in the pathogenesis of Alzheimer disease • The principal site of Aβ generation is the re-entry and recycling of APP from the cell surface via the endocytic pathway • Inherited variants in these pathways may modulate APP processing and therefore affect the risk of Alzheimer disease • The theory is supported by reports that: 1) several candidate proteins from these pathways are reduced in brain tissue from AD patients 2) reduced expression of these proteins is associated with increased Aβ production • The study examined SNPs in VPS35, VPS26A, SORT1, SORCS1, SORCS2, SORCS3 and SORL1 in a discovery cohort and a replication cohort
Example (continued): • One SNP in SORCS1, one SNP in SORCS2 and two SNPs in SORL1 were associated in the discovery data set • Additional SNPs were investigated in these three genes; only SNPs in SORL1 showed further association in both the discovery and replication cohorts (the association remained after adjusting for age, sex and APOE genotype) • These SNPs were clustered at the 3’ and 5’ ends of the gene, which allowed haplotypes to be formed (D’ was used as the measure of LD to define haplotype blocks) • Different haplotypes (i.e. 5’ or 3’) showed association in different populations • These findings were then replicated in a third group of cases and controls for confirmation • Based on these findings, the cell biology of SORL1 was investigated. No pathogenic sequence variants were identified, and further experiments led to the conclusion that these SNPs are most likely associated with intronic variants that reduce cell-type-specific transcription of SORL1 (together with other genetic and non-genetic factors)
References: • Wall JD and Pritchard JK. 2003. Haplotype blocks and linkage disequilibrium in the human genome. Nature Reviews Genetics; 4:587-597 • Rogaeva et al. 2007. The neuronal sortilin-related receptor SORL1 is genetically associated with Alzheimer disease. Nat Genet. Advance online publication.