
Multiple-Locus Genome-Wide Association Testing

This presentation looks for an efficient method of genome-wide association testing that identifies multiple loci contributing to a disease phenotype. It reviews several strategies, including single-locus tests, an exhaustive two-locus search, and two-stage approaches that reduce the computational burden.



Presentation Transcript


  1. Multiple-Locus Genome-Wide Association Testing • David Dean • CSE280A

  2. Genome-wide Association Testing • Genome-wide association tests have used the concept of linkage disequilibrium (LD) to identify individual genes that correlate with disease phenotypes. • However, many human diseases arise out of the interaction of multiple genes, rather than just a single gene.

  3. Linkage Disequilibrium • SNPs that are close to each other on a chromosome tend to be more highly correlated than SNPs that are far apart. Recombination works to undo this correlation. • Notation: $P_{11}$ is the frequency of haplotypes carrying allele 1 at both loci; $P_{1*}$ and $P_{*1}$ are the frequencies of allele 1 at the first and second locus, respectively. • Without recombination: $P_{11} \neq P_{1*} P_{*1}$, and $D = |P_{11} - P_{1*} P_{*1}|$ • With recombination, LD decays with the distance between the two loci • Linkage equilibrium: $P_{11} = P_{1*} P_{*1}$ (loci are independent)
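The definition of D on slide 3 can be checked on toy data. The following is a minimal sketch, assuming each SNP is stored as a list of 0/1 alleles with one entry per haplotype; the function name `ld_d` and the example columns are illustrative and do not come from the slides.

```python
def ld_d(snp_a, snp_b):
    """Return D = |P11 - P1* P*1| for two equal-length 0/1 SNP columns."""
    n = len(snp_a)
    # P11: fraction of haplotypes carrying allele 1 at both loci.
    p11 = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    # Marginal frequencies of allele 1 at each locus.
    p1_star = sum(snp_a) / n
    p_star1 = sum(snp_b) / n
    return abs(p11 - p1_star * p_star1)


# Toy haplotype columns (hypothetical data).
snp_a = [1, 1, 0, 0]
snp_b = [1, 1, 0, 0]  # identical to snp_a: strongly correlated
snp_c = [1, 0, 1, 0]  # independent of snp_a in this sample

d_ab = ld_d(snp_a, snp_b)  # P11 = 0.5, P1* P*1 = 0.25
d_ac = ld_d(snp_a, snp_c)  # P11 = 0.25, P1* P*1 = 0.25
```

Here `d_ab` is nonzero (the loci are in LD), while `d_ac` is zero, which matches the linkage-equilibrium condition $P_{11} = P_{1*} P_{*1}$.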

  4. Disease Gene Mapping • The disease phenotypes of the individuals being studied can be treated as a column vector, similar to a column vector of SNPs. LD is used to find a locus that is close to the locus of interest. • If a locus (and a particular allele at that locus) correlates highly with a particular disease phenotype, one can infer that the allele “may play an important role” in the development of that disease.

  5. Epistasis • The interaction between genes, or epistasis, is an important area of genetics research, where much is still unknown. • For example, one gene may suppress the expression of another gene. • Gene-gene interactions can be synergistic (positive) or antagonistic (negative).

  6. The Problem • Testing multiple loci across the whole genome that interact and contribute to a particular phenotype can present a computational challenge. • Example: $10^4$ individuals × $10^6$ SNPs • # of SNP pairs = $10^6 \times 10^6 = 10^{12}$ • # of SNP trios = $10^6 \times 10^6 \times 10^6 = 10^{18}$
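The counts on slide 6 are order-of-magnitude figures: $10^6 \times 10^6$ counts ordered pairs, including a SNP paired with itself. As a small sketch, the number of distinct unordered pairs and trios can be computed with the standard-library `math.comb`; the variable names are illustrative.

```python
import math

n_snps = 10**6

# Slide 6 figures: ordered tuples, self-pairs included.
ordered_pairs = n_snps**2
ordered_trios = n_snps**3

# Distinct unordered combinations that an exhaustive search would test.
unordered_pairs = math.comb(n_snps, 2)
unordered_trios = math.comb(n_snps, 3)

print(f"pairs: {unordered_pairs:.3e} (slide estimate {ordered_pairs:.0e})")
print(f"trios: {unordered_trios:.3e} (slide estimate {ordered_trios:.0e})")
```

The unordered counts are smaller by constant factors of about 2 and 6, so the scaling argument on the slide is unchanged: exhaustive testing grows as $m^2$ for pairs and $m^3$ for trios.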

  7. Objective • The objective is to discover an efficient method for genome-wide association testing that identifies multiple loci that may be interacting and contributing to a disease phenotype.

  8. Evans et al 2006 • 4 strategies tested: • Single-locus tests of association • Exhaustive two-locus search: fits all possible two-locus models of association to all pairs of SNPs • “Both Significant” two-stage strategy: applies a single-locus test to determine which loci to include in the second stage of pairwise association testing; both SNPs in a pair must pass • “Either Significant” two-stage strategy: applies a single-locus test to select a set of loci for the second stage, but requires only one SNP of a pair to pass the initial stage • The two-stage strategies were less powerful than the exhaustive two-locus search, but significantly reduced the computational burden
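To make the difference between the two two-stage strategies on slide 8 concrete, here is a minimal sketch of the pair-selection step only. It assumes single-locus p-values have already been computed; the function name `two_stage_pairs`, the threshold `alpha`, and the p-values are hypothetical and are not taken from Evans et al.

```python
from itertools import combinations


def two_stage_pairs(p_values, alpha, mode):
    """Return the SNP index pairs passed on to second-stage pairwise testing.

    mode="both":   both SNPs must pass the single-locus threshold.
    mode="either": at least one SNP of the pair must pass.
    """
    passed = [p <= alpha for p in p_values]
    if mode == "both":
        keep = lambda i, j: passed[i] and passed[j]
    elif mode == "either":
        keep = lambda i, j: passed[i] or passed[j]
    else:
        raise ValueError("mode must be 'both' or 'either'")
    return [(i, j) for i, j in combinations(range(len(p_values)), 2)
            if keep(i, j)]


# Hypothetical single-locus p-values for five SNPs.
p_values = [0.001, 0.20, 0.03, 0.70, 0.04]
both = two_stage_pairs(p_values, alpha=0.05, mode="both")
either = two_stage_pairs(p_values, alpha=0.05, mode="either")
```

With three of five SNPs passing, "both" keeps 3 of the 10 possible pairs and "either" keeps 9. This shows why "either" retains more candidate pairs than "both" and therefore saves less computation.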

  9. Current Project • Start with an n × m SNP matrix (Rana et al 2007) • n = # of haplotypes (~$10^4$) • m = # of SNPs (~$10^6$) • For a pair of SNPs, $s_1$ and $s_2$ • Labeled Hamming distance: $H[s_1, s_2] = \min\{p_1 p_2 + q_1 q_2,\; p_1 q_2 + p_2 q_1\}$ • If H is low, then $s_1$ and $s_2$ are correlated; if H is high, then $s_1$ and $s_2$ are uncorrelated • Formalize and quantify an efficient filtering method • Identify a Hamming distance, $d_1$, to act as a threshold for selecting pairs that may be correlated • This small subset can then be exhaustively tested for epistatic interactions
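Slide 9 does not define $p$ and $q$, so the sketch below does not implement that formula directly. It shows one plausible reading of a "labeled" Hamming distance, suggested by the use of the bitwise complement on the next slide: the normalized Hamming distance between two 0/1 SNP columns, minimized over relabeling the alleles of one column. The function name `labeled_hamming` and this interpretation are assumptions.

```python
def labeled_hamming(s1, s2):
    """Normalized Hamming distance between two 0/1 SNP columns,
    taking the smaller of the distance to s2 and to its complement.

    Assumption: allele labels (0 vs 1) are arbitrary, so a column and
    its bitwise complement are treated as perfectly correlated.
    """
    n = len(s1)
    mismatches = sum(a != b for a, b in zip(s1, s2))
    # Distance to the complement of s2 is n - mismatches.
    return min(mismatches, n - mismatches) / n


s1 = [0, 1, 1, 0, 1]
same = [0, 1, 1, 0, 1]        # identical column
flipped = [1, 0, 0, 1, 0]     # bitwise complement of s1
near = [0, 1, 0, 0, 1]        # differs from s1 in one position

h_same = labeled_hamming(s1, same)
h_flipped = labeled_hamming(s1, flipped)
h_near = labeled_hamming(s1, near)
```

Under this reading the distance ranges from 0 (identical up to relabeling) to 0.5 (agreeing on exactly half the haplotypes), which is consistent with low H indicating correlated SNPs.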

  10. Current Project • PairedSNPs(δ, k) • Repeat for $l$ iterations: select k rows of haplotypes at random; for each SNP location j, hash the SNP vector $h_j$ and its bitwise complement $\hat{h}_j$; select pairs of SNPs that have a Hamming distance < $d_1 n$ • Identify all pairs of SNPs that are selected at least $(1 - \delta)\mu_1$ times • $\mu_1$ is the expected number of times that a SNP pair is selected if its Hamming distance is low (= $d_1$) • $\mu_1 = l\, e^{-k d_1}$
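The PairedSNPs procedure on slide 10 reads like a locality-sensitive hashing scheme. The following is a rough sketch under that assumption, not the author's implementation: in each iteration it projects every SNP column onto k randomly chosen haplotype rows, buckets SNPs by the projected vector and by its complement, and counts how often each pair shares a bucket. The names `paired_snps`, `expected_hits`, and `min_hits` are hypothetical; `min_hits` plays the role of $(1 - \delta)\mu_1$.

```python
import math
import random
from collections import Counter, defaultdict
from itertools import combinations


def expected_hits(iterations, k, d1):
    """mu_1 = l * exp(-k * d1), as given on slide 10."""
    return iterations * math.exp(-k * d1)


def paired_snps(haplotypes, k, iterations, min_hits, seed=0):
    """Return SNP index pairs that collide in at least min_hits iterations.

    haplotypes: list of n rows, each a list of m 0/1 alleles.
    """
    rng = random.Random(seed)
    n, m = len(haplotypes), len(haplotypes[0])
    hits = Counter()
    for _ in range(iterations):
        rows = rng.sample(range(n), k)
        buckets = defaultdict(set)
        for j in range(m):
            key = tuple(haplotypes[r][j] for r in rows)
            comp = tuple(1 - v for v in key)
            # Hash the projected vector and its bitwise complement.
            buckets[key].add(j)
            buckets[comp].add(j)
        # Count each colliding pair at most once per iteration.
        seen = set()
        for members in buckets.values():
            seen.update(combinations(sorted(members), 2))
        hits.update(seen)
    return {pair for pair, count in hits.items() if count >= min_hits}


# Toy data: 8 haplotypes x 4 SNPs. SNP 1 equals SNP 0, SNP 2 is its
# complement, and SNP 3 agrees with SNP 0 on only half of the rows.
cols = [[0, 0, 0, 0, 1, 1, 1, 1],
        [0, 0, 0, 0, 1, 1, 1, 1],
        [1, 1, 1, 1, 0, 0, 0, 0],
        [0, 1, 0, 1, 0, 1, 0, 1]]
haplotypes = [list(row) for row in zip(*cols)]
found = paired_snps(haplotypes, k=3, iterations=20, min_hits=20)
```

In this toy run the perfectly correlated pairs among SNPs 0, 1, and 2 collide in every iteration. SNP 3 collides with the others only occasionally, so it falls below the threshold. The pairs that survive would then go on to exhaustive testing for epistatic interactions.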

  11. Haploview • An open source application designed to analyze and visualize patterns of LD, and perform association testing on genetic data. • Haploview is developed and maintained by Dr. Mark Daly’s lab at MIT (Barrett et al 2005).

  12. References • Barrett, J.C., Fry, B., Maller, J., and Daly, M.J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics, 21:263-265, 2005. • Brinza, D., He, J., and Zelikovsky, A. Combinatorial search methods for multi-SNP disease association. Proc. of IEEE EMBS Annual International Conference, 2006. • Evans, D.M., Marchini, J., Morris, A.P., and Cardon, L.R. Two-stage two-locus models in genome-wide association. PLoS Genetics, 2:e157, Sep 2006. • Rana, B.K., Insel, P.A., Payne, S.H., Abel, K., Beutler, E., Ziegler, M.G., Schork, N.J., and O’Connor, D.T. Population-based sample reveals gene-gender interactions in blood pressure in White Americans. Hypertension, 49:96-106, Jan 2007.
