1 / 39

A New Model of Multi-Marker Correlation for Genome-Wide Tag SNP Selection

A New Model of Multi-Marker Correlation for Genome-Wide Tag SNP Selection. Wei-Bung Wang Tao Jiang. Introduction. Outline. Introduction Problem Related Work Our Approach Result. Introduction. C T T A G C T T. 94%. C T T A G T T T. 6%. SNP. Single Nucleotide Polymorphism.

mtritt
Download Presentation

A New Model of Multi-Marker Correlation for Genome-Wide Tag SNP Selection

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A New Model of Multi-Marker Correlation for Genome-Wide Tag SNP Selection Wei-Bung Wang Tao Jiang

  2. Introduction Outline • Introduction • Problem • Related Work • Our Approach • Result

  3. Introduction C T T A G C T T 94% C T T A G T T T 6% SNP Single Nucleotide Polymorphism • Single Nucleotide Polymorphism (SNP) • A genetic variation Modified from slide by Yao-Ting Huang, National Taiwan University Department of Computer Science and Information Engineering

  4. Introduction C T T A G C T T 94% C T T A G T T T 6% SNP SNPs • SNPs are usually bi-allelic • Major allele • Minor allele • Minor allele frequency > 1% (or 5%) • Tri-allelic: very rare

  5. Introduction CTC Haplotype 1 -A C T T T G C T C- -A C T T A G C T T- CAT Haplotype 2 ATC -A A T T T G C T C- Haplotype 3 SNP1 SNP2 SNP3 SNP1 SNP2 SNP3 Haplotype Modified from slide by Yao-Ting Huang, National Taiwan University Department of Computer Science and Information Engineering

  6. Introduction Tag SNP • What is a tag SNP? • Here I use some slides by Yao-Ting Huang and Kun-Mao Chao

  7. Examples of Tag SNPs Haplotype patterns An unknown haplotype sample P1 P2 P3 P4 S1 • Suppose we wish to distinguish an unknown haplotype sample. • We can genotype all SNPs to identify the haplotype sample. S2 S3 S4 S5 S6 SNP loci S7 S8 S9 : Major allele S10 S11 : Minor allele S12

  8. Examples of Tag SNPs Haplotype pattern P1 P2 P3 P4 S1 • In fact, it is not necessary to genotype all SNPs. • SNPs S3, S4, and S5 can form a set of tag SNPs. S2 S3 S4 S5 S6 SNP loci P1 P2 P3 P4 S7 S8 S3 S9 S4 S10 S5 S11 S12

  9. Examples of Wrong Tag SNPs Haplotype pattern P1 P2 P3 P4 S1 • SNPsS1, S2, and S3 can not form a set of tag SNPs because P1 and P4 will be ambiguous. S2 S3 S4 S5 S6 SNP loci P1 P2 P3 P4 S7 S1 S8 S2 S9 S3 S10 S11 S12

  10. Examples of Tag SNPs Haplotype pattern • SNPs S1 and S12 can form a set of tag SNPs. • This set of SNPs is the minimum solution in this example. P1 P2 P3 P4 S1 S2 S3 S4 S5 S6 SNP loci S7 S8 P1 P2 P3 P4 S9 S1 S10 S12 S11 S12

  11. Problem Problem • Tag SNP selection • How to select representatives? • Many different ways

  12. What we do Flowchart A group of individuals(all SNPs are known) Select A set of SNPs(tag SNPs) Relationships between tag SNPsand other SNPs ? Assay Haplotype: tag SNPs Haplotype: all SNPs Save money here

  13. Problem Problem • Perfect world • Minimum set of tag SNPs • Save most money • NP-hard • Real life • Relatively small set • Sufficient accuracy/confidence

  14. A very frequently used method is Linkage Disequilibrium (LD) A group of individuals(all SNPs are known) Select What we do A set of SNPs(tag SNPs) Relationships between tag SNPsand other SNPs ? Assay Haplotype: tag SNPs Haplotype: all SNPs Save money here

  15. Related Work 2 r Linkage Disequilibrium (LD) • Non-random association of alleles at two or more loci • Correlated coefficient: estimation of dependency • LD = correlated coefficient =

  16. Related Work 2 ( [ ] ) b A B 0 1 2 r a , , ; ; 2 P P P 1 r = , = = A B A B 2 ( ) P P P ¡ A B A B 2 r = P P P P b A B a Linkage Disequilibrium • r2 = 1: perfect correlation • r2 = 0.9: strong correlation (0.95, etc.) • r2 = 0: no correlation

  17. Related Work An Example A A T T C C

  18. Related Work Minimum Dominating Set Problem Highly correlated SNP

  19. Examples of Tag SNPs Haplotype patterns An unknown haplotype sample P1 P2 P3 P4 S1 • Suppose we wish to distinguish an unknown haplotype sample. S2 S3 S4 S5 S6 SNP loci S7 S8 S9 : Major allele S10 S11 : Minor allele S12

  20. Examples of Tag SNPs Haplotype patterns An unknown haplotype sample P1 P2 P3 P4 S1 • Suppose we wish to distinguish an unknown haplotype sample. S2 S3 S4 S5 S6 SNP loci S7 S8 S9 : Major allele S10 S11 : Minor allele S12

  21. Examples of Tag SNPs Haplotype pattern • SNPs S1 and S12 can form a set of tag SNPs. • This set of SNPs is the minimum solution in this example. P1 P2 P3 P4 S1 S2 S3 S4 S5 SNPs can work together and help each other S6 SNP loci S7 S8 P1 P2 P3 P4 S9 S1 S10 S12 S11 S12

  22. We introduce new allele AC and AC Only one mistake Our Approach A C A C : T 0 7 0 0 7 . . G 0 1 0 2 0 3 . . . 0 8 0 2 . . Our Approach A C G else T

  23. (snp1, snp2) vs. snp3 (snp1, snp2) vs. snp4 Our Approach ( ) A C A C C T G A _ , , ( ) A C A C C T T C _ : : , , Our Approach

  24. Our Approach In the Right Order A group of individuals(all SNPs are known) Select A set of SNPs(tag SNPs) Relationships between tag SNPsand other SNPs Second First

  25. Our Approach If SNP 1, 4, 10 are tag SNPsPredict SNP 17 with patterns …Accuracy / LD: 0.97 . If SNP 5, 8, 13 are tag SNPsPredict SNP 11 with patterns …Accuracy / LD: 0.62 . Our Approach • Generate relationships …………

  26. Our Approach ( ) b b b b A B C P A C A P C M D j _ _ _ _ > c a a m c a o r ) = ) d A B C D A B C ( ) d P A B P B C B i _ _ < c a a m c n o r m ) = ) d A B D A B c c ¢ ¢ ¢ How to Predict / Determine the Alleles? • LD: (tag) SNP 1, 2, 3 vs. SNP 4 • Allele A/a, B/b, C/c, D/d abc abC Abc aBc AbC aBC SNP[123] becomes bi-allelic ABC ABc majorbucket minorbucket

  27. Our Approach Similar Work • Ke Hao also did a similar work • The same LD model • Different way to determine alleles for composite SNPs • Less flexibility • A special case of our model • Related paper: “Genome-wide selection of tag SNPs using multiple-marker correlation,” Bioinformatics, 2007

  28. Our Approach Sketch • Get r2 value for all possible combinations • Find a small subset of SNPs according to LD

  29. Our Approach Sketch • Find a small subset of SNPs according to LD Tag SNPs Tag SNPs are also covered by themselves covered partialcovered

  30. Our Approach Sketch • Simple greedy algorithm (Ke Hao) • Cover more SNPs in each iteration • Modified greedy algorithm (my work) • A SNP that can’t be covered by others • High priority • A SNP that is not picked but covered • OK • Break tie: partial cover

  31. Our Approach Supersede No longer contributes

  32. Our Approach Supersede

  33. Our Approach l i h A T M M M T 1 t w o a r k e r a g g e r g o r m - i f l R i t t t e q u r e : s e o r p e s h i l h d d S N P t 1 w e o e r e a r e s u n c o v e r e : i f h h d h S N P i i i i t t t 2 e n e r e s a s w n o n c o m n g e g e s : ¤ 3 s s à : l 4 e s e : ¤ h h h d S N P S N P t t t t t 5 s e a c o v e r s e m o s u n c o v e r e s à : ( ) f h l f f d ¤ i t t t B 6 o r e a c o r p e s o o r m s s s : i j ; d d d i i t t 7 r e m o v e a n s c o r r e s p o n n g e g e s : / * * / ¤ ¤ \ k d " P S N P i i i t t t t 8 u s n o a g s e s s p c e : ( ) ( ) f h l f f d ¤ ¤ i t t t B B 9 o r e a c o r p e s o o r m s s s o r s s s : i j i j ; ; i f k d h i i t 1 0 e n s s p c e : i d S N P i t t t 1 1 p u s n o c o v e r e s e : j d d d i i t t 1 2 r e m o v e a n s c o r r e s p o n n g e g e s : l 1 3 e s e : ( 0 ) ( 0 ) l l l f f i t t B B 1 4 r e m o v e a r p e s o o r m s s s o r s s s : i j i j ; ; My Program: MMTagger Pick a SNP Data structure

  34. Our Approach ( ) ­ T Complexity • Computing r2 value • O(nk+1) for k-marker • Picking tag SNPs • where T is the number of relationships • O(T log T) time algorithm

  35. Result Result • Our program: MMTagger • Vs. Single-marker approach (LRTag) • A state-of-the-art program • Single-marker • Vs. Hao’s program (MultiTag) • Multi-marker

  36. Result Vs. Single-Marker Approach

  37. Result MMTagger Vs. MultiTag

  38. Conclusion • We provide a new multi-marker model • Size of tag SNP set • 2- vs. 1-marker: apparently better • 3- vs. 2-marker: slightly better • 4-marker or more: slow, unacceptable • Performance • Our program outperforms the only other program with similar model

  39. Thank you!

More Related