What is Haplotyping?
Chromosome 1: T C A G
Chromosome 2: A C C G
• Diploid organisms may be heterozygous (carry two different alleles) at some loci, here the first and third positions
• SNP sequencing would show the genotype as A/T, C, A/C, G
• This still leaves two possibilities for the true haplotype: ACCG/TCAG or ACAG/TCCG
• Single-molecule haplotyping can distinguish between the chromosomes and tell us that the true haplotype is ACCG/TCAG
• Haplotypes give more information and have better predictive power than SNP genotypes alone
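The ambiguity described above can be enumerated directly. The following is a minimal sketch, not part of the original slides; the function name `possible_haplotype_pairs` and the tuple-based genotype encoding are assumptions made for illustration.

```python
from itertools import product

def possible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    genotype: one tuple of alleles per SNP, e.g. ("A", "T") for a
    heterozygous site or ("C",) for a homozygous site (hypothetical encoding).
    """
    pairs = set()
    # Choose which allele sits on the first chromosome at every site.
    for first in product(*genotype):
        # The second chromosome carries the other allele at heterozygous
        # sites and the same allele at homozygous sites.
        second = tuple(
            (g[0] if a == g[1] else g[1]) if len(g) == 2 else g[0]
            for a, g in zip(first, genotype)
        )
        # Sort so that X/Y and Y/X count as the same pair.
        pairs.add(tuple(sorted(("".join(first), "".join(second)))))
    return sorted(pairs)

# Genotype from the slide: A/T, C, A/C, G
genotype = [("A", "T"), ("C",), ("A", "C"), ("G",)]
for pair in possible_haplotype_pairs(genotype):
    print("/".join(pair))
```

With k heterozygous sites there are 2^(k-1) candidate pairs, which is why genotype data alone cannot settle the haplotype once two or more sites are heterozygous.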
Haplotypes: Confusion among heterozygotes
Assume we have a mother (M), father (F), and child (C), and we look at two SNPs in each. Then we may have, for instance:
          SNP 1   SNP 2
Mother:   A/C     G/G
Father:   A/C     G/T
Child:    A/C     G/T
The child can then have haplotype AG/CT or AT/CG. There is nothing to be done except collect more data, e.g. if there are two children or four grandparents. (Bringing in more data points, i.e. looking at more haplotypes, only hurts you.)
If one parent is homozygous:
          SNP 1        SNP 2
Mother:   A/C          G/G
Father:   C/C          G/G
Child:    A/C or C/C   G/G
Because everyone is G/G at SNP 2, the child's haplotype is then unambiguous: AG/CG or CG/CG.
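The trio reasoning above can be checked by brute force. This is a rough sketch under stated assumptions: genotypes are encoded as two-character strings per SNP (for example `("AC", "GT")` for A/C, G/T), and the helper names `phasings` and `child_phasings` are hypothetical, not from the slides.

```python
from itertools import product

def phasings(genotype):
    """All unordered haplotype pairs consistent with a two-SNP genotype."""
    snp1, snp2 = genotype
    result = set()
    for a, b in product(range(2), repeat=2):
        h1 = snp1[a] + snp2[b]
        h2 = snp1[1 - a] + snp2[1 - b]
        result.add(tuple(sorted((h1, h2))))
    return result

def child_phasings(mother, father, child):
    """Child haplotype pairs built from one maternal and one paternal
    haplotype that also match the child's observed genotype."""
    allowed = phasings(child)
    consistent = set()
    for m_pair, f_pair in product(phasings(mother), phasings(father)):
        for m_hap, f_hap in product(m_pair, f_pair):
            pair = tuple(sorted((m_hap, f_hap)))
            if pair in allowed:
                consistent.add(pair)
    return sorted(consistent)

# First family from the slide: the child stays ambiguous.
print(child_phasings(("AC", "GG"), ("AC", "GT"), ("AC", "GT")))
# Second family: a single phasing remains.
print(child_phasings(("AC", "GG"), ("CC", "GG"), ("AC", "GG")))
```

The first call returns two candidate pairs and the second only one, matching the slide's point that heterozygous parents leave the child's phase undetermined.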
Single Molecule Fluorescence Haplotyping
[Figure labels: 3 kbp, 17 kbp]
• In our approach, PCR products are site- and allele-specifically labeled with single dye molecules, and are imaged to establish a “barcode” which can be used to determine the haplotype at selected SNPs.
• Different alleles are labeled with different dyes (e.g. Cy3, Cy5), and can be distinguished by color
Three Underlying Technologies
[Figure labels: 3 kbp, 17 kbp; double-stranded sequence ACCTGTCAGGCGTACCA / TGGACAGTCCGCATGGT]
• Padlock probe labeling is used to allele-specifically label the SNPs of interest
• Molecular combing is used to stretch the DNA molecules on a surface prior to imaging
• FIONA is used to localize the labels
Combining all three gives barcoded DNA.
Haplotyping by single molecule “bar-coding”
Fig. 1 Unique "haplotype barcodes" for the two haplotypes where each allele of 5 SNPs is labeled with either a red or a green fluorescent probe. The distance and color combinations between labels along the blue-stained DNA backbone are determined by fluorescence single molecule detection with TIRF microscopy. The haplotype can then be inferred from the “barcode”.
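As a rough illustration of how a barcode could be read off one molecule, the sketch below orders localized labels along the backbone and reports their colors and spacings. The input format, the function name `read_barcode`, and the example positions are all hypothetical; the slides do not specify a data format, and a real analysis would also have to handle molecule orientation and missing labels.

```python
def read_barcode(labels):
    """Turn localized labels into a color string plus inter-label spacings.

    labels: iterable of (position_bp, color) tuples, where color is
    "R" (red probe) or "G" (green probe). Hypothetical input format.
    """
    # Order the labels along the stretched DNA backbone.
    ordered = sorted(labels, key=lambda label: label[0])
    colors = "".join(color for _, color in ordered)
    positions = [pos for pos, _ in ordered]
    # Distances between neighbouring labels, in base pairs.
    spacings = [b - a for a, b in zip(positions, positions[1:])]
    return colors, spacings

# Made-up positions for five labeled SNPs on one molecule.
example = [(9000, "G"), (3000, "R"), (17000, "R"), (12000, "G"), (6000, "G")]
colors, spacings = read_barcode(example)
print(colors, spacings)
```

The color string plays the role of the haplotype barcode, while the spacings can be compared with the known distances between the SNPs to confirm which label belongs to which site.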
Homozygous Person, at one position (rs12797)
Fig. 2A A composite image of all three channels. The alleles of the SNP rs12797 were labeled with green (Cy3) dye for the A allele and red (Cy5) dye for the G allele. The positions of labeled alleles are indicated with red arrows. Few red labels were observed, indicating that this sample is A/A homozygous.
Three Channels: Green = Cy3, allele A; Red = Cy5, allele G; Blue = non-specific YoYo.
Statistics of Homozygous Person
Fig. 2B Histogram of the distance distribution of the results from Figure 2A. Red bars indicate the G allele and green bars the A allele. The Gaussian curve fit shows a green peak at 3311 ± 161 bp from one end, which is consistent with the expected distance of 3291 bp.
Heterozygous Person
Fig. 2C A composite image of all three channels. The alleles of the SNP rs12797 were labeled with green (Cy3) dye for the A allele and red (Cy5) dye for the G allele. The positions of labeled alleles are indicated with red arrows. Both green and red labels were observed, indicating that this sample is G/A heterozygous.
Three Channels: Green = Cy3; Red = Cy5; Blue = non-specific YoYo.
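The calls in Figs. 2A and 2C rest on the relative number of green and red labels. A minimal sketch of that decision is given below; the function name `call_genotype` and the 20% cutoff `minor_fraction` are assumptions chosen for illustration, not values taken from the slides.

```python
def call_genotype(n_green, n_red, minor_fraction=0.2):
    """Call the rs12797 genotype from label counts on many molecules.

    n_green: number of Cy3 (allele A) labels counted.
    n_red:   number of Cy5 (allele G) labels counted.
    minor_fraction: hypothetical cutoff below which a color is treated
    as background rather than a real allele.
    """
    total = n_green + n_red
    if total == 0:
        raise ValueError("no labels counted")
    red_fraction = n_red / total
    if red_fraction < minor_fraction:
        return "A/A"   # few red labels: homozygous for the A allele
    if red_fraction > 1 - minor_fraction:
        return "G/G"   # few green labels: homozygous for the G allele
    return "A/G"       # both colors well represented: heterozygous

print(call_genotype(95, 5))    # mostly green
print(call_genotype(48, 52))   # roughly equal
```

A fixed cutoff is the simplest possible rule; in practice the threshold would need to reflect the measured rate of non-specific labeling.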
Heterozygous Person, at one position (rs12797)
Fig. 2D Histogram of the distance distribution of the results in Figure 2C. Red indicates the G allele and green the A allele. The Gaussian curve fit shows a green peak at 3459 ± 492 bp and a red peak at 3413 ± 372 bp from one end, both consistent with the expected distance of 3291 bp. As expected for a heterozygote, the labels are 50% allele A and 50% allele G.
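The "consistent with the expected distance" statements in Figs. 2B and 2D can be made explicit with a small check. This sketch assumes the ± values are one standard deviation of the fitted Gaussian, which the slides do not state; the function name `is_consistent` is also hypothetical.

```python
EXPECTED_BP = 3291  # expected label distance from one end, from the slides

# Fitted peak positions and widths reported in Figs. 2B and 2D.
fits = {
    "Fig. 2B green (A)": (3311, 161),
    "Fig. 2D green (A)": (3459, 492),
    "Fig. 2D red (G)": (3413, 372),
}

def is_consistent(peak_bp, width_bp, expected_bp=EXPECTED_BP, n_widths=1.0):
    """True if the expected position lies within n_widths of the peak.

    Assumes width_bp is one standard deviation (an assumption).
    """
    return abs(peak_bp - expected_bp) <= n_widths * width_bp

for name, (peak, width) in fits.items():
    offset = peak - EXPECTED_BP
    print(f"{name}: offset {offset:+d} bp, within 1 width: {is_consistent(peak, width)}")
```

All three reported peaks sit within one quoted width of 3291 bp, with offsets of 20, 168, and 122 bp respectively.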
Statistics on Heterozygote
Fig. 4 All eight possible heterozygous haplotypes with their scores. The arrow indicates the score of the highlighted haplotype, RGGR/GRRG. Inset: Scores for Cy3 and Cy5 at each individual locus, showing that all four loci are heterozygous.
RRRR = GGGG: the two are equivalent because we can show that all four positions are heterozygous, so the presence of one implies the presence of the other. Four two-color loci give 2^4 = 16 barcodes, which therefore collapse into eight complementary pairs, and each haplotype can be shown as a pair, for instance RGGR/GRRG.
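The count of eight follows from pairing each four-color barcode with its color-swapped partner. The short sketch below enumerates those pairs; the function names are hypothetical and the code only illustrates the counting argument, not the scoring used in Fig. 4.

```python
from itertools import product

def complement(barcode):
    """Swap red and green at every locus, giving the other chromosome's
    barcode when every locus is heterozygous."""
    swap = {"R": "G", "G": "R"}
    return "".join(swap[c] for c in barcode)

def heterozygous_pairs(n_loci):
    """All unordered barcode pairs for n_loci fully heterozygous loci."""
    pairs = set()
    for colors in product("RG", repeat=n_loci):
        barcode = "".join(colors)
        # A barcode and its complement describe the same diploid haplotype.
        pairs.add(tuple(sorted((barcode, complement(barcode)))))
    return sorted(pairs)

pairs = heterozygous_pairs(4)
print(len(pairs))
for a, b in pairs:
    print(f"{a}/{b}")
```

For four loci this lists eight pairs, including GGGG/RRRR and GRRG/RGGR, the highlighted haplotype.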
Molecular Haplotyping
• Why?
    • Diploid organisms (e.g. humans) may be heterozygous
    • Knowing the correlations of SNPs on each individual chromosome (the haplotype) confers more predictive power than SNPs alone
    • Most genotyping techniques use bulk PCR products, and cannot distinguish chromosomes
    • Single molecule analysis can allow us to distinguish discrete populations
• How?
    • First, DNA is allele-specifically labeled with single fluorescent probes using padlock-probe labeling
    • Second, DNA is stretched onto a surface using molecular combing
    • Third, the DNA backbone and fluorescent nucleotide labels are imaged using FIONA
Double-stranded DNA can be stretched onto a surface with molecular combing, as shown. Single-stranded DNA (ssDNA) can potentially be stretched if it is stabilized with RecA, a single-stranded DNA binding protein. Padlock-probe labeling can be more efficient with ssDNA.
Haplotyping and Genomics
• Current genotyping technologies work on bulk PCR products, and hence cannot distinguish heterozygous haplotypes, because they cannot distinguish products from different chromosomes
• Haplotypes have much more predictive power than simple SNP sequencing for determining disease susceptibility and drug interactions
• Once general patterns have been discerned using genome-wide SNP scans, haplotyping will be the dominant means of determining precise correlations between regions of genomic interest and disease
• We will focus on a 500 kb region containing the HOXA locus on human chromosome 7, as part of the ongoing work of the International HapMap Consortium