The Architecture of Linkage Disequilibrium Blocks Across Chromosomes 6, 21, & 22. Linkage Disequilibrium Mapping. Linkage disequilibrium mapping is based on the allelic association between a SNP surrogate marker and the phenotype-influencing mutation being sought.
The Architecture of Linkage Disequilibrium Blocks Across Chromosomes 6, 21, & 22
Linkage Disequilibrium Mapping • Linkage disequilibrium mapping is based on the allelic association between a SNP surrogate marker and the phenotype-influencing mutation being sought • For it to work, a predictable relationship between LD strength and distance is required
LD mapping: We rely on allelic association [Figure: 200 kb region of DNA showing sense genes, antisense genes, SNPs, the causative SNP, and the marker SNP; scale marked 100%, 50%, 0%]
LD-distance relationship (Stephens et al., 2001)
LD “Block” Structure [Figure: 200 kb region of DNA showing sense genes, antisense genes, SNPs, the causative SNP, and the marker SNP]
Haplotype Block Structure [Figure: 200 kb region of DNA showing sense genes, antisense genes, SNPs, and haplotype blocks 1 to 4]
Patterns of LD across chromosomes 6, 21, & 22 • Haplotypes as the basic elements of genetic variation • Consideration of LD patterns is necessary to design effective association studies • Understanding of LD patterns across the genome should be helpful for: • Selecting optimal SNP subsets for a given population • Supporting the data analysis of LD mapping studies • Estimating the necessary number of SNPs for whole-genome LD mapping studies
Patterns of LD across chromosomes 6, 21, & 22 • Find Linkage Disequilibrium “blocks” • Develop block descriptors • Block length • Haplotype diversity • Compare blocks across populations and chromosomes • Find correlations between block descriptors and other chromosomal features • Comparison with published data
Datasets used for this study • Chr 6 • SNPs: 7255 African-American; Caucasian (45 samples each, ~95% call rate); average spacing 24 Kb • 1281 gene regions targeted • Coverage: ~123 Mb out of 172.2 Mb (71%) • Chr 21 • SNPs: 7049 African-American; 7255 Caucasian; average spacing ~12 Kb • 264 gene regions targeted • Coverage: ~28.38 Mb out of 33.6 Mb (84%) • Chr 22 • SNPs: 3653 African-American; 4040 Caucasian (2334 common); average spacing ~10 Kb • 624 gene regions targeted • Coverage: ~28.6 Mb out of 36.6 Mb (78%) • Total coverage: 179.98 Mb – about 6.4% of genome
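The coverage figures above can be checked with a few lines of arithmetic. This is only a sketch: the dictionary layout and the function name `coverage_summary` are illustrative assumptions, and the numbers are copied from the slide.

```python
# Covered and total lengths in Mb, as quoted on the datasets slide.
COVERAGE_MB = {
    "chr6": (123.0, 172.2),
    "chr21": (28.38, 33.6),
    "chr22": (28.6, 36.6),
}


def coverage_summary(coverage):
    """Return (fraction covered per chromosome, total covered Mb)."""
    per_chrom = {
        chrom: covered / length for chrom, (covered, length) in coverage.items()
    }
    total_covered = sum(covered for covered, _ in coverage.values())
    return per_chrom, total_covered


if __name__ == "__main__":
    fractions, total = coverage_summary(COVERAGE_MB)
    for chrom, frac in fractions.items():
        print(f"{chrom}: {frac:.0%} covered")
    print(f"total covered: {total:.2f} Mb")
```

The per-chromosome fractions and the summed coverage agree with the percentages and the 179.98 Mb total quoted on the slide.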
LD “block” definitions • Chromosome segments where allelic association among SNPs indicates little historical recombination • Low haplotype “diversity” • Methods used for block definition • LD-based method • Typically using D’ as association statistic, plus some statistical significance test • Haplotype diversity-based method • Sweeping-window inference of haplotypes • Recombination evidence-based methods • Four-gamete rule • Note that in most situations one is dealing with unphased genotype data
LD “block” definitions (2) • Gabriel et al. block criteria • D’ 95% confidence interval: upper bound >0.98, lower bound >0.7 • Minor allele frequency > 10%, HWE test p<0.01 • Heuristic used for blocks of <4 SNPs • Previously they used D’>0.8, but D’ is biased towards high values with small sample size and low allele frequency • Block criteria • D’ >0.9, p-value < 0.001 Fisher exact test • Minor allele frequency > 10%, HWE test p<0.01 • Both definitions allow for small amounts of recombination, including gene conversion, and occasional genotyping errors • Notice that these thresholds are somewhat arbitrary
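To make the D’ threshold concrete, here is a minimal sketch of how D, D’ and r² can be computed for one pair of biallelic SNPs. It assumes phased haplotype counts are already available, and it applies only the D’ and minor-allele-frequency cut-offs. The Fisher exact test, the HWE test and the Gabriel et al. confidence-interval bounds are deliberately left out, and the function names are illustrative, not taken from the study.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) from the four haplotype counts of a SNP pair."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at the first SNP
    p_B = (n_AB + n_aB) / n  # frequency of allele B at the second SNP
    D = p_AB - p_A * p_B
    # D' rescales |D| by the largest value it could take given the allele frequencies.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2


def passes_simple_thresholds(n_AB, n_Ab, n_aB, n_ab, d_prime_min=0.9, maf_min=0.10):
    """Check only the D' and minor-allele-frequency parts of the block criteria."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    maf_ok = min(p_A, 1 - p_A) > maf_min and min(p_B, 1 - p_B) > maf_min
    _, d_prime, _ = ld_stats(n_AB, n_Ab, n_aB, n_ab)
    return maf_ok and d_prime > d_prime_min


if __name__ == "__main__":
    # One of the four gametes is absent, so D' is at its maximum while r^2 is not.
    print(ld_stats(60, 0, 20, 20))
    print(passes_simple_thresholds(60, 0, 20, 20))
```

With one gamete missing, D’ reaches its maximum even though r² stays well below 1, which shows why D’ is read as evidence of little historical recombination rather than as a measure of how well one SNP predicts the other.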
Example of LD blocks • 180 Kb region of Chromosome 6 [Figure: haplotypes in Caucasians and African-Americans; labels: 92 Kb, 75 Kb, Gap: 12 Kb]
LD block inference using common SNPs • All available SNPs: 401 blocks • Common SNPs: 276 blocks
LD block quality assessment • Tested the blocks found by the Block method for violations of the 4-gamete rule • Perform the test among all SNP pairs within a block • For each violation within a block, find the haplotypes that carry the pair causing the violation (i.e. lowest frequency) • Report the highest haplotype frequency found
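The 4-gamete check described above can be sketched as follows. This is one reading of the procedure, not the study's implementation. It assumes each block is given as a mapping from haplotype strings over biallelic SNPs (for example "0110") to estimated frequencies; it treats the rarest of the four gametes as the one causing the violation and reports the highest such frequency per block. The function names are hypothetical.

```python
from itertools import combinations


def four_gamete_violations(hap_freqs):
    """List (i, j, rarest_gamete, frequency) for SNP pairs showing all four gametes."""
    n_snps = len(next(iter(hap_freqs)))
    violations = []
    for i, j in combinations(range(n_snps), 2):
        gamete_freq = {}
        for hap, freq in hap_freqs.items():
            key = hap[i] + hap[j]  # two-SNP gamete carried by this haplotype
            gamete_freq[key] = gamete_freq.get(key, 0.0) + freq
        if len(gamete_freq) == 4:  # all four gametes seen: rule violated
            rarest = min(gamete_freq, key=gamete_freq.get)
            violations.append((i, j, rarest, gamete_freq[rarest]))
    return violations


def worst_violation_frequency(hap_freqs):
    """Highest frequency among the violating gametes in a block (0.0 if none)."""
    return max((freq for *_, freq in four_gamete_violations(hap_freqs)), default=0.0)


if __name__ == "__main__":
    block = {"00": 0.60, "01": 0.20, "10": 0.15, "11": 0.05}
    print(four_gamete_violations(block))
    print(worst_violation_frequency(block))
```

A low reported frequency suggests the violation could come from a rare recombinant, gene conversion or a genotyping error, which the block definitions explicitly tolerate.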
Comparison between different LD block definitions • D’ >0.9; p<0.001: 401 blocks • 95% C.I. 0.98-0.7: 417 blocks
Comparison of different block definitions • Gabriel et al.: 4.8 Mb in blocks • 3.91 Mb shared • 350 blocks • 40 independent blocks • 889 Kb unique • Block method: 4.55 Mb in blocks • 3.91 Mb shared • 335 blocks • 20 independent blocks • 649 Kb unique
Haplotype inferences • Infer common haplotypes (freq > 5%) within blocks • Haplotype population frequencies • No need to assign haplotypes to individuals for this study • Fast, scalable method (up to 10-15 SNPs) • Good accuracy
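The slides do not name the inference method used. As an illustration of how haplotype population frequencies can be estimated from unphased genotypes without assigning haplotypes to individuals, here is a minimal expectation-maximization (EM) sketch for two biallelic SNPs. It is a standard textbook approach substituted here for illustration, not the method used in the study. Genotypes are assumed to be coded as minor-allele counts (0, 1 or 2) per SNP, and the function name is hypothetical.

```python
def em_two_snp_haplotypes(genotypes, n_iter=50):
    """Estimate frequencies of haplotypes 00, 01, 10, 11 from unphased genotype pairs."""
    fixed = {"00": 0.0, "01": 0.0, "10": 0.0, "11": 0.0}
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            # Double heterozygote: phase is ambiguous (00/11 or 01/10).
            n_double_het += 1
            continue
        # Alleles on the two chromosomes at each SNP; at most one SNP is
        # heterozygous here, so the pairing below is unambiguous.
        a = [0, 0] if g1 == 0 else [1, 1] if g1 == 2 else [0, 1]
        b = [0, 0] if g2 == 0 else [1, 1] if g2 == 2 else [0, 1]
        for x, y in zip(a, b):
            fixed[f"{x}{y}"] += 1

    total = 2 * len(genotypes)
    freqs = {hap: 0.25 for hap in fixed}  # uninformative starting point
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is 00/11 rather than 01/10.
        cis = freqs["00"] * freqs["11"]
        trans = freqs["01"] * freqs["10"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: add expected counts from double heterozygotes and renormalize.
        counts = dict(fixed)
        counts["00"] += n_double_het * w
        counts["11"] += n_double_het * w
        counts["01"] += n_double_het * (1 - w)
        counts["10"] += n_double_het * (1 - w)
        freqs = {hap: c / total for hap, c in counts.items()}
    return freqs


if __name__ == "__main__":
    sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
    print(em_two_snp_haplotypes(sample))
```

Extending this idea to blocks of 10 to 15 SNPs requires enumerating all haplotype pairs compatible with each genotype, which is where the scalability limit mentioned on the slide comes from.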
Measuring haplotype block diversity • Number of haplotypes • Heterozygosity • Polymorphism Information Content (PIC) • Shannon entropy: $H = -\sum_i P_i \log P_i$ • Where $P_i$ is the frequency of the ith haplotype • H increases with the number of haplotypes and the evenness of their frequencies
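The diversity measures listed above can be computed directly from a block's haplotype frequencies. The sketch below uses the standard textbook definitions of heterozygosity, PIC and Shannon entropy. The base-2 logarithm is an assumption, since the slides do not state the base, and the function names are illustrative.

```python
import math
from itertools import combinations


def heterozygosity(freqs):
    """Expected heterozygosity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism Information Content: 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2."""
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return heterozygosity(freqs) - cross


def shannon_entropy(freqs):
    """Shannon entropy H = -sum(p_i log2 p_i), in bits (log base is an assumption)."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


if __name__ == "__main__":
    even = [0.25, 0.25, 0.25, 0.25]
    skewed = [0.85, 0.05, 0.05, 0.05]
    for name, f in (("even", even), ("skewed", skewed)):
        print(name, heterozygosity(f), pic(f), shannon_entropy(f))
```

Both example blocks have four haplotypes, but the evenly distributed one scores higher on every measure, which illustrates why H reflects evenness as well as haplotype number.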
Looking for correlations with chromosomal features • GC content, CpG islands, runs of bases • Repetitive element density (LINE & SINE) • Recombination rate (from deCode & TSC genetic maps) • Chromosomal location (telomere vs. centromere) • Transcriptional activity (MPSS tags) • Introns/exons, gene length • Segmental duplications
Visualization of LD blocks on a chromosome scale • Looking at the patterns of LD across the whole chromosome • Block size & diversity • Allows pinpointing of interesting regions • Clustering of blocks • Unusual block diversity • Allows finding correlations with other chromosomal descriptors
Chr 22: Different block definitions D’ >0.9; p<0.001 95% C.I. 0.98-0.7
Chr 22: Common SNPs All available SNPs Common SNPs
Conclusions • LD block distribution across chromosomes is not uniform • Hot and cold spots are evident when considering both block length & haplotype diversity (H) • Related to some extent to recombination rate • There are differences in block structure and distribution between the two populations studied • Caucasians have more blocks of greater average length • African-American blocks are usually nested within Caucasian blocks • There are private block spans in both populations, but Caucasians have more unique blocks • The different block definitions tested do not change the picture dramatically, but there are differences • Definitions are arbitrary and one needs to look at them from a practical point of view