Linear Reduction Method for Tag SNPs Selection

Explore the linear reduction method for tag single nucleotide polymorphisms (SNPs) selection, maximizing tagging separability to reduce cost in haplotype tagging problems. Conclusions and future work outlined.

Presentation Transcript


  1. Linear Reduction Method for Tag SNPs Selection Jingwu He Alex Zelikovsky

  2. Outline • SNPs, haplotypes and genotypes • Haplotype tagging problem • Linear reduction method for tagging • Maximizing tagging separability • Conclusions & future work

  4. Human Genome and SNPs • Length of human genome ≈ 3 × 10^9 base pairs • Difference between any two people ≈ 0.1% of genome ≈ 3 × 10^6 SNPs • Total number of single nucleotide polymorphisms (SNPs) ≈ 1 × 10^7 • SNPs are mostly bi-allelic, e.g., alleles A and C • Minor allele frequency should be considerable, e.g., > 1% • Diploid = two different copies of each chromosome • Haplotype = description of a single copy (0, 1) • Genotype = description of the two copies mixed (0 = 00, 1 = 11, 2 = 01 or 10) • Example: the two haplotypes of an individual, 0 1 1 1 0 1 0 1 0 and 1 1 0 1 0 1 0 0 0, give the genotype 2 1 2 1 0 1 0 2 0
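
The genotype encoding on this slide can be illustrated with a minimal Python sketch. The function name `genotype` is an assumption made for illustration; the two haplotypes are the pair shown on the slide.

```python
def genotype(h1, h2):
    """Combine two 0/1 haplotypes into a 0/1/2 genotype.

    Sites where both copies agree keep that allele (0 = 00, 1 = 11);
    sites where they differ are coded 2 (heterozygous).
    """
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same SNP sites")
    return [a if a == b else 2 for a, b in zip(h1, h2)]


# The two haplotypes of the individual shown on the slide.
h1 = [0, 1, 1, 1, 0, 1, 0, 1, 0]
h2 = [1, 1, 0, 1, 0, 1, 0, 0, 0]
g = genotype(h1, h2)
print(g)  # [2, 1, 2, 1, 0, 1, 0, 2, 0]
```

The mapping is not invertible: a genotype with several 2s is consistent with more than one haplotype pair.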

  5. Haplotype and Disease Association • Haplotypes/genotypes define our individuality • Genetically engineered athletes might win at Beijing Olympics (Time, 07/2004) • Haplotypes contribute to risk factors of complex diseases (e.g., diabetes) • International HapMap project: http://www.hapmap.org • SNPs causing disease are hidden among 10 million SNPs • Too expensive to search • HapMap tries to identify 1 million tag SNPs providing almost as much mapping information as the entire 10 million SNPs

  6. Outline • SNPs, haplotypes and genotypes • Haplotype tagging problem • Linear reduction method for tagging • Maximizing tagging separability • Conclusions & future work

  7. Tagging Reduces Cost • Decrease SNP haplotyping cost: • sequence only a small number of SNPs (the tag SNPs) • infer rest of (certain) SNPs based on the sequenced tag SNPs • Cost-saving ratio = m / k (infinite population), where m = number of SNPs and k = number of tags • Traditional tagging based on linkage disequilibrium (LD) needs too many SNPs; its cost-saving ratio is too small (≈ 2) • Proposed linear reduction method: cost-saving ratio ≈ 20

  8. Haplotype Tagging Problem • Given the full pattern of all SNPs for a sample • Find the minimum number of tag SNPs that allows reconstructing the complete haplotype for each individual

  9. Outline • SNPs, haplotypes and genotypes • Haplotype tagging problem • Linear reduction method for tagging • Maximizing tagging separability • Conclusions & future work

  10. Linear Rank of Recombinations • Human haplotype evolution = • Mutations: introduce SNPs • Recombinations: propagate SNPs over the entire population • Replace notation (0, 1) with (-1, 1) • Theorem: Haplotype population generated from l haplotypes with recombinations at k spots has linear rank (l-1)(k+2) • It is much less than the number of all haplotypes = l^k • Conclusion: use only linearly independent SNPs as tags
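
The rank claim can be explored numerically. The sketch below is an illustration only, not the authors' experiment: the founders are random, the breakpoints are fixed by hand, the helper names `rank` and `recombinants` are hypothetical, and the slide's (l-1)(k+2) is treated as an upper bound, which is an assumption about how the theorem is meant. It enumerates every way of choosing a founder for each of the k+1 segments and compares the rank of the resulting -1/+1 matrix with the number of haplotypes.

```python
from fractions import Fraction
from itertools import product
import random


def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            if f:
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def recombinants(founders, breakpoints):
    """All haplotypes obtained by taking each segment from any founder."""
    n = len(founders[0])
    cuts = [0] + list(breakpoints) + [n]
    segments = list(zip(cuts, cuts[1:]))
    population = []
    for choice in product(range(len(founders)), repeat=len(segments)):
        h = []
        for f, (a, b) in zip(choice, segments):
            h.extend(founders[f][a:b])
        population.append(h)
    return population


random.seed(0)
l, k, n_sites = 3, 4, 40            # founders, recombination spots, SNP sites
founders = [[random.choice((-1, 1)) for _ in range(n_sites)] for _ in range(l)]
breakpoints = [8, 16, 24, 32]       # k hand-picked recombination spots
population = recombinants(founders, breakpoints)
pop_rank = rank(population)
print(len(population), "haplotypes, rank", pop_rank, "bound", (l - 1) * (k + 2))
```

Exact rational arithmetic is used so that the rank is not affected by floating-point tolerance.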

  11. Tag SNPs Selection • Tag selection algorithm • Using Gauss-Jordan elimination, find the Row Reduced Echelon Form (RREF) X of sample matrix S • Extract the basis T of sample S • Factorize sample S = T × X • Output set of tags T • Fact: In the sample, each SNP is a linear combination of tag SNPs • Conjecture: In the entire population, each SNP is the same linear combination of tags as in the sample • [Figure: sample S = tags T × RREF X]
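
The tag selection steps on this slide can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the sample matrix `S` is a made-up 4 × 6 example (rows are haplotypes, columns are SNPs, coded -1/+1), and the helper names `rref` and `matmul` are hypothetical.

```python
from fractions import Fraction


def rref(rows):
    """Gauss-Jordan elimination.

    Returns (nonzero rows of the RREF, pivot column indices).
    """
    m = [[Fraction(x) for x in r] for r in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                      # column c depends on earlier pivots
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [a / piv for a in m[r]]    # scale the pivot to 1
        for i in range(len(m)):           # clear column c in all other rows
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m[:r], pivots


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


# Hypothetical sample: 4 haplotypes (rows) x 6 SNPs (columns).
S = [
    [ 1,  1, -1,  1, -1, -1],
    [ 1, -1, -1, -1,  1, -1],
    [-1,  1,  1,  1, -1,  1],
    [-1, -1,  1, -1,  1,  1],
]
X, tags = rref(S)                             # RREF X and pivot (tag) columns
T = [[row[c] for c in tags] for row in S]     # basis: the tag SNP columns of S
assert matmul(T, X) == S                      # factorization S = T x X
print("tag SNP columns:", tags)               # tag SNP columns: [0, 1]
```

In this toy sample every SNP column is plus or minus one of the first two columns, so two tags suffice.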

  12. Haplotype Reconstruction • Given tags t of an unknown haplotype h and the RREF X of sample matrix S • Find the unknown haplotype h • Predict h' = t × X • Errors are possible, since the predicted h' may not equal the unknown haplotype h • Assign -1 if the predicted value is negative and +1 otherwise (RLRP) • Variant: randomly reshuffle SNPs before choosing tags (RLR) • [Figure: tags t × RREF X = predicted haplotype h', compared with unknown haplotype h]
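
The prediction step h' = t × X followed by sign rounding can be sketched as below. The matrix `X`, the tag columns, and both test haplotypes are hypothetical values chosen for illustration, and `predict` is an assumed helper name.

```python
def predict(tag_values, X):
    """Predict a full -1/+1 haplotype from its tag values.

    Computes h' = t x X, then assigns -1 to negative entries and +1 otherwise.
    """
    raw = [sum(t * x for t, x in zip(tag_values, col)) for col in zip(*X)]
    return [-1 if v < 0 else 1 for v in raw]


# Hypothetical RREF of a sample with 6 SNPs and tag SNPs in columns 0 and 1.
X = [
    [1, 0, -1, 0,  0, -1],
    [0, 1,  0, 1, -1,  0],
]
tags = [0, 1]

# A haplotype that follows the sample's linear structure is recovered exactly.
h_a = [-1, 1, 1, 1, -1, 1]
pred_a = predict([h_a[c] for c in tags], X)
errors_a = sum(a != b for a, b in zip(h_a, pred_a))

# A haplotype that departs from it is predicted with errors.
h_b = [1, 1, 1, 1, -1, -1]
pred_b = predict([h_b[c] for c in tags], X)
errors_b = sum(a != b for a, b in zip(h_b, pred_b))

print(errors_a, errors_b)  # 0 1
```

Only the tag positions of the unknown haplotype are read; every other position is inferred, which is where prediction errors come from.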

  13. Results for Simulated Data • Cost-saving ratio for 2% error is 3.9 for LR and 13 for RLRP • P = 1000 different haplotypes • m = 25000 sites • Sample size = k (number of tag SNPs) = 50, 100, …, 750

  14. Results for Real Data • Cost-saving ratio for 5% error is 2.1 for LR and 2.8 for RLRP • P = 158 different haplotypes (Daly et al.) • m = 103 sites • Sample size = k (number of tag SNPs) = 10, 15, 20, …, 90

  15. Outline • SNPs, haplotypes and genotypes • Haplotype tagging problem • Linear reduction method for tagging • Maximizing tagging separability • Conclusions & future work

  16. Tag Separability • The number of zeros for a SNP in the RREF X correlates with the number of errors in the predicted column for that SNP • A greedy heuristic gives a more separable basis: for 5% error, the cost-saving ratio is 3.3 versus 2.8 for RLRP

  17. Conclusions and Future work • Our contributions • new SNP tagging problem formulation • linear reduction method for SNP tagging • enhancement of linear reduction using separable basis • Future work • application of tagging for genotype and haplotype disease association

  18. Thank you