
Detecting Inversions in Human Genome

Detecting Inversions in Human Genome. Phillip Tao. Advisor: Eleazar Eskin. Polymorphism: structural abnormality in chromosome (deletion, duplication, translocation, inversion). Inversion: portion of chromosome is flipped; usually no major adverse effects.


Presentation Transcript


  1. Detecting Inversions in Human Genome • Phillip Tao • Advisor: Eleazar Eskin

  2. Polymorphism • Structural abnormality in chromosome • Deletion • Duplication • Translocation • Inversion

  3. Inversion • Portion of chromosome is flipped • Usually no major adverse effects • Inverted section tends to have strong linkage disequilibrium (LD) • Small inversions are very hard to detect

  4. Bafna’s Method • Define an inversion by its two breakpoints • Find two SNPs on each side of each breakpoint • If there is an inversion, the SNP on the outside of one breakpoint should correlate more strongly with the SNP on the inside of the other breakpoint than with the SNP on the inside of its own breakpoint
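The pattern on this slide can be read as a simple four-SNP test. The sketch below is one such reading, not Bafna's actual statistic: it uses a single SNP on each side of each breakpoint, and the helper names `corr` and `fits_inversion_pattern` are hypothetical. The correlation values are borrowed from the example r table later in the deck.

```python
def corr(r, a, b):
    """Look up a symmetric correlation stored under the key (smaller SNP, larger SNP)."""
    return r[(min(a, b), max(a, b))]

def fits_inversion_pattern(r, outer_left, inner_left, inner_right, outer_right):
    """True if each outer SNP correlates more strongly with the inner SNP at the
    opposite breakpoint than with the inner SNP at its own breakpoint."""
    left_ok = corr(r, outer_left, inner_right) > corr(r, outer_left, inner_left)
    right_ok = corr(r, outer_right, inner_left) > corr(r, outer_right, inner_right)
    return left_ok and right_ok

# Values taken from the deck's example r table (SNPs numbered 1-7).
EXAMPLE_R = {
    (1, 2): 0.2, (1, 3): 0.4, (1, 4): 1.0,
    (2, 6): 0.6, (3, 6): 1.0, (4, 6): 0.4,
}

print(fits_inversion_pattern(EXAMPLE_R, 1, 2, 4, 6))
print(fits_inversion_pattern(EXAMPLE_R, 1, 2, 3, 6))
```

With these values, SNPs 1, 2, 4, 6 fit the pattern, while 1, 2, 3, 6 do not because SNP 6 correlates more strongly with SNP 3 than with SNP 2.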

  5. [Figure: example sequences with SNP alleles marked: A T C C A G C G C T G C C G G G A G C G]

  6. My Goal • Simplify Bafna’s method • Use r-correlation • Use single SNPs instead of finding multi-SNP markers

  7. My Method • Calculate the correlation between all pairs of SNPs • For each SNP, calculate the difference between its correlations with every pair of other SNPs • Find sets of four SNPs which fit the pattern described earlier • Organize the sets into groups based on position

  8. Example
     SNP:  1  2  3  4  5  6  7
           A  T  C  A  G  C  G
           A  G  A  A  G  T  C
           T  G  C  G  G  C  C
           A  T  C  A  G  C  G
           T  T  C  G  A  C  G

  9. Example r table
            1    2    3    4    5    6    7
       1  1.0
       2  0.2  1.0
       3  0.4  0.6  1.0
       4  1.0  0.2  0.4  1.0
       5  0.6  0.4  0.3  0.6  1.0
       6  0.4  0.6  1.0  0.4  0.3  1.0
       7  0.2  1.0  0.6  0.2  0.4  0.6  1.0
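A table like this can be computed directly from the example haplotypes. The sketch below assumes the table holds the absolute value of the Pearson correlation between 0/1-coded alleles (the slides only say "r-correlation"); the function names are hypothetical.

```python
from math import sqrt

# Example haplotypes from the deck (one string per haplotype, one character per SNP).
HAPLOTYPES = ["ATCAGCG", "AGAAGTC", "TGCGGCC", "ATCAGCG", "TTCGACG"]

def encode_snps(haplotypes):
    """Return one 0/1 vector per SNP; 0 marks the allele carried by the first haplotype."""
    return [[0 if allele == column[0] else 1 for allele in column]
            for column in zip(*haplotypes)]

def pearson_r(x, y):
    """Plain Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def abs_r_table(haplotypes):
    """Symmetric table of |r| for every SNP pair (0-based indices)."""
    snps = encode_snps(haplotypes)
    return [[abs(pearson_r(a, b)) for b in snps] for a in snps]

R = abs_r_table(HAPLOTYPES)

# Print the lower triangle, one row per SNP, to one decimal place.
for row_number, row in enumerate(R, start=1):
    print(row_number, " ".join(f"{value:.1f}" for value in row[:row_number]))
```

Under this coding, SNP pairs 1 and 4, 2 and 7, and 3 and 6 come out perfectly correlated, because each pair splits the five haplotypes the same way.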

  10. Example diff table (SNP 1)
              1     2     3     4     5     6     7
        1
        2         0.0
        3         0.2   0.0
        4         0.8   0.6   0.0
        5         0.4   0.2  -0.4   0.0
        6         0.2   0.0  -0.6  -0.2   0.0
        7         0.0  -0.2  -0.8  -0.4  -0.2   0.0
      Sets: 1 2 4, 1 2 5, 1 3 4, 1 3 5, 1 2 3, 1 2 6

  11. Example diff table (SNP 6)
              1     2     3     4     5     6     7
        1   0.0
        2  -0.2   0.0
        3  -0.6  -0.4   0.0
        4   0.0   0.2   0.6   0.0
        5   0.1   0.3   0.7   0.1   0.0
        6
        7                                       0.0
      Sets: 2 4 6, 2 5 6, 3 4 6, 3 5 6
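The two diff tables follow one rule: for a reference SNP, take each pair of other SNPs on the same side and subtract the correlation of the nearer one from that of the farther one. A positive difference means the farther SNP is the more strongly correlated, which is the unusual pattern being searched for. The sketch below applies that rule to the example r values for SNPs 1 and 6. The cutoff of 0.15 is an assumption chosen so that the listed sets come out (the slides do not state a threshold), and `sets_for_reference` is a hypothetical name.

```python
# Correlations with the two reference SNPs, read off the example r table.
R_WITH = {
    1: {2: 0.2, 3: 0.4, 4: 1.0, 5: 0.6, 6: 0.4, 7: 0.2},
    6: {1: 0.4, 2: 0.6, 3: 1.0, 4: 0.4, 5: 0.3, 7: 0.6},
}

def sets_for_reference(ref, r_to_ref, cutoff=0.15):
    """Three-SNP sets in which the SNP farther from `ref` correlates with it
    more strongly than the nearer SNP by more than `cutoff` (assumed value)."""
    found = []
    # Pairs to the right of the reference: (ref, near, far).
    right = sorted(snp for snp in r_to_ref if snp > ref)
    for i, near in enumerate(right):
        for far in right[i + 1:]:
            if r_to_ref[far] - r_to_ref[near] > cutoff:
                found.append((ref, near, far))
    # Pairs to the left of the reference: (far, near, ref).
    left = sorted(snp for snp in r_to_ref if snp < ref)
    for i, far in enumerate(left):
        for near in left[i + 1:]:
            if r_to_ref[far] - r_to_ref[near] > cutoff:
                found.append((far, near, ref))
    return found

SETS_SNP1 = sets_for_reference(1, R_WITH[1])
SETS_SNP6 = sets_for_reference(6, R_WITH[6])
print(SETS_SNP1)
print(SETS_SNP6)
```

Each set is written in position order, so the reference SNP appears first when the pair lies to its right and last when the pair lies to its left.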

  12. Example cont.
      Three-SNP sets:
        1 2 4, 1 2 5, 1 3 4, 1 3 5
        2 4 6, 2 5 6, 3 4 6, 3 5 6
        2 4 7, 2 5 7, 3 4 7, 3 5 7
      Four-SNP sets:
        1 2 4 6, 1 2 5 6, 1 3 4 6, 1 3 5 6
        1 2 4 7, 1 2 5 7, 1 3 4 7, 1 3 5 7
      Left over (no matching set): 1 2 3, 1 2 6
      Group: [1 – 1] [2 – 3] [4 – 5] [6 – 7]
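The step from three-SNP sets to four-SNP sets can be sketched as a join on the two shared middle SNPs, followed by taking the minimum and maximum at each of the four positions. This is an illustration under assumptions: the deck does not spell out its grouping algorithm, so `position_ranges` treats all joined sets as one group, and both function names are hypothetical.

```python
# Three-SNP sets from the example: those anchored on a left reference SNP
# and those anchored on a right reference SNP.
LEFT_SETS = [(1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5), (1, 2, 3), (1, 2, 6)]
RIGHT_SETS = [(2, 4, 6), (2, 5, 6), (3, 4, 6), (3, 5, 6),
              (2, 4, 7), (2, 5, 7), (3, 4, 7), (3, 5, 7)]

def join_sets(left_sets, right_sets):
    """Combine (a, b, c) with (b, c, d) into (a, b, c, d) when the middle pair matches."""
    right_by_middle = {}
    for b, c, d in right_sets:
        right_by_middle.setdefault((b, c), []).append(d)
    return [(a, b, c, d)
            for a, b, c in left_sets
            for d in right_by_middle.get((b, c), [])]

def position_ranges(four_snp_sets):
    """Smallest and largest SNP seen at each of the four positions."""
    return [(min(column), max(column)) for column in zip(*four_snp_sets)]

FOUR_SNP_SETS = join_sets(LEFT_SETS, RIGHT_SETS)
GROUP = position_ranges(FOUR_SNP_SETS)
print(FOUR_SNP_SETS)
print(GROUP)
```

The sets 1 2 3 and 1 2 6 drop out of the join because no right-anchored set starts with 2 3 or 2 6.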

  13. Results • Results for 8 ENCODE regions • Each ENCODE region has about one “big” inversion candidate and 3 or 4 smaller possible inversions • Inversion candidates range from about 20 kb to 250 kb

  14. ENCODE 1 CEU
      length 138206: 26933775 26961947 27061501 27080620 (x1152)
        [26933311 - 26935400] [26935778 - 27001979] [27061501 - 27073984] [27074652 - 27115799]
      length 24723: 27229393 27243243 27265414 27269500 (x549)
        [27222615 - 27242896] [27243243 - 27247682] [27264662 - 27267966] [27269500 - 27290893]

  15. ENCODE 1 JPTCHB
      length 112765: 26925087 26961569 27038413 27095921 (x696)
        [26925087 - 26936161] [26936185 - 26984395] [27018432 - 27048950] [27053451 - 27098098]
      length 16797: 27286339 27297153 27308501 27317801 (x430)
        [27282442 - 27291838] [27292455 - 27297184] [27308501 - 27309252] [27309746 - 27318505]

  16. ENCODE 2 CEU
      length 146580: 89679961 89740881 89846316 89856918 (x10169)
        [89629528 - 89702509] [89703442 - 89751478] [89842982 - 89850022] [89851175 - 89971133]
      length 103202: 89984366 90038027 90141147 90162545 (x4464)
        [89960639 - 90037168] [90037945 - 90074697] [90125136 - 90141147] [90143267 - 90244055]

  17. ENCODE 2 JPTCHB
      length 61931: 89740469 89777036 89815696 89844587 (x7363)
        [89740469 - 89753274] [89754595 - 89783950] [89807767 - 89816526] [89817163 - 89869295]
      length 241177: 90147369 90237945 90461335 90485128 (x5137)
        [90071367 - 90186818] [90223524 - 90325391] [90457540 - 90464701] [90468056 - 90493804]

  18. ENCODE 3 CEU
      length 53311: 126434362 126444935 126484991 126520444 (x6392)
        [126430928 - 126434467] [126435292 - 126461428] [126483937 - 126488603] [126489707 - 126537051]
      length 79164: 126717787 126750681 126810226 126838912 (x4294)
        [126653273 - 126730160] [126731062 - 126753794] [126810226 - 126810226] [126811293 - 126868969]

  19. ENCODE 3 JPTCHB
      length 53311: 126434155 126435292 126484017 126489707 (x8664)
        [126434155 - 126434467] [126435292 - 126461428] [126483937 - 126488603] [126489707 - 126534298]
      length 56719: 126499913 126517706 126563455 126598442 (x2480)
        [126461428 - 126509693] [126510624 - 126536076] [126558033 - 126567343] [126567738 - 126622425]

  20. Problems • Grouping algorithm is not very good • Many redundant groups • Sets are not weighted • Some candidate inversions overlap others • Seems to be detecting too many inversions • Very slow and inefficient

  21. Extensions • Improve grouping algorithm • Add weighting of sets • Combine similar groups • Filter out sets which are likely outliers • Use other inversion detection techniques • Use length constraints to filter out sets and groups
