170 likes | 175 Views
This lecture by Prof. Sorin Istrail discusses association studies, haplotype block structure, and tagging SNPs for evaluating genetic associations with phenotypes.
E N D
Association Studies, Haplotype Blocks and Tagging SNPs Prof. Sorin Istrail CSCI2950-L Tues. & Thur. 2:30-3:50p.m. CIT room 241
Control Non-responder Disease Responder Allele 0 Allele 1 Marker A: Allele 0 = Allele 1 = Marker A is associated with Phenotype Association studies
T T T T C C G G A G A G G A G A A G A A C A A A T T G G G T Association studies • Evaluate whether nucleotide polymorphisms associate with phenotype
T T T C C T G G A G A G G A G G A A A A C A A A G T T T G G Association studies
Hypothesis – Haplotype Blocks? • The genome consists largely of blocks of common SNPs with relatively little recombination within the blocks • Patil et al., Science, 2001; • Jeffreys et al., Nature Genetics, 2001; • Daly et al., Nature Genetics, 2001
Haplotype Block StructureLD-Blocks, and 4-Gamete Test Blocks 200 kb Sense genes DNA Antisense genes SNPs Haplotype blocks 1 2 3 4
One definition of block Based on the Four Gamete test. Intuition: when between two SNPs there are all four gametes, there is a recombination point somewhere inbetween the two sites
Four Gamete Block Test • Hudson and Kaplan 1985 A segment of SNPs is a block if between every pair of SNPs at most 3 out of the 4 gametes (00, 01,10,11) are observed. 0 0 1 0 1 1 1 1 0 1 1 1 0 0 1 0 1 1 1 1 0 1 0 1 BLOCK VIOLATES THE BLOCK DEFINITION
Finding Recombination Hotspots:Many Possible Partitions into Blocks A C T A G A T A G C C T G T T C G A C A A C A T A C T C T A T G A T C G G T T A T A C G A C A T A C T C T A T A G T A T A C T A G C T G G C A T All four gametes are present:
The final result is a minimum-size set of sites crossing all constraints. A C T A G A T A G C C T G T T C G A C A A C A T Find the left-most right endpoint of any constraint and mark the site before it a recombination site. A C T C T A T G A T C G Eliminate any constraints crossing that site. Repeat until all constraints are gone. G T T A T A C G A C A T A C T C T A T A G T A T A C T A G C T G G C A T
Only 4 SNPs are needed to tag all the different haplotypes Tagging SNPs A------A---TG-- G------G---CG-- A------G---TC-- A------G---CC-- G------A---TG-- ACGATCGATCATGAT GGTGATTGCATCGAT ACGATCGGGCTTCCG ACGATCGGCATCCCG GGTGATTATCATGAT An example of real data set and its haplotype block structure. Colors refer to the founding population, one color for each founding haplotype
0 0 1 1 1 0 0 0 0 1 Informativeness A measure for the “information” a SNP contains about about another SNP. Useful for designing SNPs Arrays and Tagging SNPs selection. s h1 h2
1 1 0 0 0 1 1 0 1 0 1 0 0 0 0 1 0 1 0 1 Informativeness s1 s2 s3 s4 s5 I(s1,s2) = 2/4 = 1/2
1 1 0 0 0 1 1 0 1 0 1 0 0 0 0 1 0 1 0 1 Informativeness s1 s2 s3 s4 s5 I({s1,s2}, s4) = 3/4
1 1 0 0 0 1 1 0 0 0 1 1 0 0 1 0 0 0 1 1 Informativeness s1 s2 s3 s4 s5 I({s3,s4},{s1,s2,s5}) = 3 S={s3,s4} is a Minimal Informative Subset
0 1 0 0 1 0 1 0 1 0 1 0 1 0 0 0 1 0 1 1 s1 s2 s3 s4 s5 e6 Informativeness e5 s5 Graph theory insight Minimum Set Cover= Minimum Informative Subset e4 s4 e3 s3 s2 e2 s1 e1 Edges SNPs
0 1 0 0 1 0 1 0 0 1 1 0 1 0 0 0 1 0 1 1 s1 s2 s3 s4 s5 e6 Informativeness e5 s5 Graph theory insight Minimum Set Cover {s3, s4}= Minimum Informative Subset e4 s4 e3 s3 s2 e2 s1 e1 SNPs Edges