Detecting Inversions in Human Genome Phillip Tao Advisor: Eleazar Eskin
Polymorphism • Structural abnormality in chromosome • Deletion • Duplication • Translocation • Inversion
Inversion • A portion of the chromosome is flipped • Usually no major adverse effects • The inverted section tends to have strong linkage disequilibrium (LD) • Small inversions are very hard to detect
Bafna’s Method • Define an inversion by its two breakpoints • Find two SNPs on each side of each breakpoint • If there is an inversion, the SNP just outside one breakpoint should correlate more strongly with the SNP just inside the other breakpoint than with the nearer SNP just inside its own breakpoint
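The correlation pattern in the last bullet can be written as a simple check on four SNPs. This is only an illustration of the pattern as the slide describes it, not Bafna's actual statistic. The function name `fits_inversion_pattern`, the matrix layout, and the toy correlation values are all assumptions made for the example.

```python
def fits_inversion_pattern(r, a, b, c, d):
    """Check the four-SNP pattern for SNPs ordered a < b < c < d.

    Assumed layout: one breakpoint lies between a and b, the other
    between c and d, and r[i][j] is the correlation strength between
    SNPs i and j (a symmetric matrix).
    """
    # Outer-left SNP prefers the far inner SNP over the near one.
    left_ok = r[a][c] > r[a][b]
    # Outer-right SNP prefers the far inner SNP over the near one.
    right_ok = r[d][b] > r[d][c]
    return left_ok and right_ok


# Toy matrices (hypothetical values, not data from the slides).
INVERTED = [
    [1.0, 0.2, 0.9, 0.1],
    [0.2, 1.0, 0.5, 0.8],
    [0.9, 0.5, 1.0, 0.3],
    [0.1, 0.8, 0.3, 1.0],
]
NORMAL = [
    [1.0, 0.8, 0.4, 0.2],
    [0.8, 1.0, 0.6, 0.4],
    [0.4, 0.6, 1.0, 0.8],
    [0.2, 0.4, 0.8, 1.0],
]

inverted_hit = fits_inversion_pattern(INVERTED, 0, 1, 2, 3)
normal_hit = fits_inversion_pattern(NORMAL, 0, 1, 2, 3)
```

In the `NORMAL` matrix correlation simply decays with distance, so the check fails. In `INVERTED` each outer SNP is closer, in correlation terms, to the inner SNP at the opposite breakpoint.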
[Figure: SNP alleles, in extraction order: A T C C A G C G C T G C C G G G A G C G]
My Goal • Simplify Bafna’s method • Use the correlation coefficient r between SNPs • Use single SNPs instead of finding multi-SNP markers
My Method • Calculate the correlation between all pairs of SNPs • For each SNP, calculate the difference between its correlations with every pair of other SNPs • Find sets of four SNPs that fit the breakpoint pattern described earlier • Organize the sets into groups based on position
Example
SNP  1 2 3 4 5 6 7
     A T C A G C G
     A G A A G T C
     T G C G G C C
     A T C A G C G
     T T C G A C G
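The correlation step can be sketched on the five example haplotypes. The slides do not say how r is computed. The sketch below assumes each allele is coded 0/1 (0 for the allele seen on the first haplotype) and that the sign of the Pearson correlation is dropped. Under those assumptions SNP pairs (1, 4), (2, 7) and (3, 6) come out perfectly correlated. The names `encode_column`, `abs_r` and `r_matrix` are hypothetical.

```python
from math import sqrt

# The five example haplotypes, one string per haplotype, SNPs 1..7.
HAPLOTYPES = ["ATCAGCG", "AGAAGTC", "TGCGGCC", "ATCAGCG", "TTCGACG"]


def encode_column(haps, j):
    """Code SNP j as 0/1: 0 for the allele on the first haplotype."""
    ref = haps[0][j]
    return [0 if h[j] == ref else 1 for h in haps]


def abs_r(x, y):
    """Absolute Pearson correlation of two equal-length 0/1 vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a monomorphic SNP carries no correlation signal
    return abs(cov) / sqrt(vx * vy)


def r_matrix(haps):
    """Full symmetric matrix of |r| between all SNP columns (0-based)."""
    cols = [encode_column(haps, j) for j in range(len(haps[0]))]
    return [[abs_r(ci, cj) for cj in cols] for ci in cols]


R = r_matrix(HAPLOTYPES)
```

Because the sign is dropped, the result does not depend on which allele is coded as 0.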
Example r table
      1     2     3     4     5     6     7
1   1.0
2   0.2   1.0
3   0.4   0.6   1.0
4   1.0   0.2   0.4   1.0
5   0.6   0.4   0.3   0.4   1.0
6   0.4   0.6   1.0   0.4   0.3   1.0
7   0.2   1.0   0.6   0.2   0.4   0.6   1.0
Example diff table (SNP 1)
Entry in row i, column j is r(1, i) - r(1, j).
      1     2     3     4     5     6     7
1
2         0.0
3         0.2   0.0
4         0.8   0.6   0.0
5         0.4   0.2  -0.4   0.0
6         0.2   0.0  -0.6  -0.2   0.0
7         0.0  -0.2  -0.8  -0.4  -0.2   0.0
Triples: 1 2 4, 1 2 5, 1 3 4, 1 3 5, 1 2 3, 1 2 6
Example diff table (SNP 6)
Entry in row i, column j is r(6, j) - r(6, i).
      1     2     3     4     5     6     7
1   0.0
2  -0.2   0.0
3  -0.6  -0.4   0.0
4   0.0   0.2   0.6   0.0
5   0.1   0.3   0.7   0.1   0.0
6
7                                       0.0
Triples: 2 4 6, 2 5 6, 3 4 6, 3 5 6
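Both diff tables can be reproduced from the example r table: each entry is the focal SNP's correlation with the farther SNP minus its correlation with the nearer one. The sketch below is one way to write that down. The cutoff of 0.1 is an assumption, chosen only because it reproduces the triples listed for SNPs 1 and 6. The slides do not state a cutoff, and the helper names are hypothetical.

```python
# Lower triangle of the example r table; row k holds SNP k+1.
LOWER = [
    [1.0],
    [0.2, 1.0],
    [0.4, 0.6, 1.0],
    [1.0, 0.2, 0.4, 1.0],
    [0.6, 0.4, 0.3, 0.4, 1.0],
    [0.4, 0.6, 1.0, 0.4, 0.3, 1.0],
    [0.2, 1.0, 0.6, 0.2, 0.4, 0.6, 1.0],
]


def r(i, j):
    """Correlation between SNPs i and j (1-based), read from LOWER."""
    hi, lo = max(i, j), min(i, j)
    return LOWER[hi - 1][lo - 1]


def diffs(k):
    """Map (near, far) -> r(k, far) - r(k, near) for pairs on one side of k."""
    n = len(LOWER)
    right = list(range(k + 1, n + 1))  # ordered nearest to farthest
    left = list(range(k - 1, 0, -1))   # ordered nearest to farthest
    out = {}
    for side in (right, left):
        for pos, near in enumerate(side):
            for far in side[pos + 1:]:
                # Round to one decimal to match the precision of the table.
                out[(near, far)] = round(r(k, far) - r(k, near), 1)
    return out


def triples(k, cutoff=0.1):
    """Position-sorted triples where the farther SNP beats the nearer one.

    The cutoff is an assumed parameter, not taken from the slides.
    """
    return sorted(
        tuple(sorted((k, near, far)))
        for (near, far), d in diffs(k).items()
        if d > cutoff
    )
```

With these assumptions `triples(1)` yields the six triples shown for SNP 1 and `triples(6)` the four shown for SNP 6.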
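Example cont.
Triples from SNP 1: 1 2 4, 1 2 5, 1 3 4, 1 3 5
Triples from SNP 6: 2 4 6, 2 5 6, 3 4 6, 3 5 6
Triples from SNP 7: 2 4 7, 2 5 7, 3 4 7, 3 5 7
Sets of four: 1 2 4 6, 1 2 5 6, 1 3 4 6, 1 3 5 6, 1 2 4 7, 1 2 5 7, 1 3 4 7, 1 3 5 7
Triples left unmatched: 1 2 3, 1 2 6
Group: [1 – 1] [2 – 3] [4 – 5] [6 – 7]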
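The step from triples to sets of four can be sketched as a join on the two inner SNPs: a left-hand triple (a, b, c) and a right-hand triple (b, c, d) combine into (a, b, c, d). The slides do not describe the grouping rule. Here the group is summarized as the range of positions seen at each of the four slots, which is an assumption that happens to reproduce the intervals shown for this example. Function names are hypothetical.

```python
# Triples as listed in the example, split by which end the focal SNP is on.
LEFT = [(1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5), (1, 2, 3), (1, 2, 6)]
RIGHT = [(2, 4, 6), (2, 5, 6), (3, 4, 6), (3, 5, 6),
         (2, 4, 7), (2, 5, 7), (3, 4, 7), (3, 5, 7)]


def combine(left, right):
    """Join left triples (a, b, c) with right triples (b, c, d) on (b, c)."""
    outer_by_inner = {}
    for b, c, d in right:
        outer_by_inner.setdefault((b, c), []).append(d)
    quads, unmatched = [], []
    for a, b, c in left:
        outers = outer_by_inner.get((b, c), [])
        if not outers:
            unmatched.append((a, b, c))  # no right triple shares (b, c)
        quads.extend((a, b, c, d) for d in outers)
    return quads, unmatched


def position_ranges(quads):
    """Assumed group summary: (min, max) position at each of the four slots."""
    return [(min(slot), max(slot)) for slot in zip(*quads)]


QUADS, UNMATCHED = combine(LEFT, RIGHT)
RANGES = position_ranges(QUADS)
```

On real data the sets would first have to be split into separate groups by position before taking ranges. This sketch treats all eight sets as a single group.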
Results • Results for 8 ENCODE regions • Each ENCODE region has about one “big” inversion and 3 or 4 smaller possible inversions • Inversion candidates range from about 20 kb to 250 kb
ENCODE 1 CEU
length 138206: 26933775 26961947 27061501 27080620 (x1152)
[26933311 - 26935400] [26935778 - 27001979] [27061501 - 27073984] [27074652 - 27115799]
length 24723: 27229393 27243243 27265414 27269500 (x549)
[27222615 - 27242896] [27243243 - 27247682] [27264662 - 27267966] [27269500 - 27290893]
ENCODE 1 JPTCHB
length 112765: 26925087 26961569 27038413 27095921 (x696)
[26925087 - 26936161] [26936185 - 26984395] [27018432 - 27048950] [27053451 - 27098098]
length 16797: 27286339 27297153 27308501 27317801 (x430)
[27282442 - 27291838] [27292455 - 27297184] [27308501 - 27309252] [27309746 - 27318505]
ENCODE 2 CEU
length 146580: 89679961 89740881 89846316 89856918 (x10169)
[89629528 - 89702509] [89703442 - 89751478] [89842982 - 89850022] [89851175 - 89971133]
length 103202: 89984366 90038027 90141147 90162545 (x4464)
[89960639 - 90037168] [90037945 - 90074697] [90125136 - 90141147] [90143267 - 90244055]
ENCODE 2 JPTCHB
length 61931: 89740469 89777036 89815696 89844587 (x7363)
[89740469 - 89753274] [89754595 - 89783950] [89807767 - 89816526] [89817163 - 89869295]
length 241177: 90147369 90237945 90461335 90485128 (x5137)
[90071367 - 90186818] [90223524 - 90325391] [90457540 - 90464701] [90468056 - 90493804]
ENCODE 3 CEU
length 53311: 126434362 126444935 126484991 126520444 (x6392)
[126430928 - 126434467] [126435292 - 126461428] [126483937 - 126488603] [126489707 - 126537051]
length 79164: 126717787 126750681 126810226 126838912 (x4294)
[126653273 - 126730160] [126731062 - 126753794] [126810226 - 126810226] [126811293 - 126868969]
ENCODE 3 JPTCHB
length 53311: 126434155 126435292 126484017 126489707 (x8664)
[126434155 - 126434467] [126435292 - 126461428] [126483937 - 126488603] [126489707 - 126534298]
length 56719: 126499913 126517706 126563455 126598442 (x2480)
[126461428 - 126509693] [126510624 - 126536076] [126558033 - 126567343] [126567738 - 126622425]
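The slides do not define the reported "length", but the numbers are consistent with the span from the start of the second interval to the end of the third, that is, the region between the two inner SNP ranges. A small sketch of that reading, with the hypothetical helper `inner_length` and intervals copied from the ENCODE 1 and ENCODE 2 CEU results:

```python
def inner_length(intervals):
    """Span from the start of the second interval to the end of the third.

    `intervals` is a list of four (start, end) pairs in genomic order.
    This definition is inferred from the reported numbers, not stated
    in the slides.
    """
    second_start = intervals[1][0]
    third_end = intervals[2][1]
    return third_end - second_start


ENCODE1_CEU_A = [(26933311, 26935400), (26935778, 27001979),
                 (27061501, 27073984), (27074652, 27115799)]
ENCODE1_CEU_B = [(27222615, 27242896), (27243243, 27247682),
                 (27264662, 27267966), (27269500, 27290893)]
ENCODE2_CEU_A = [(89629528, 89702509), (89703442, 89751478),
                 (89842982, 89850022), (89851175, 89971133)]

LENGTHS = [inner_length(g) for g in (ENCODE1_CEU_A, ENCODE1_CEU_B, ENCODE2_CEU_A)]
```

The three values agree with the lengths reported for those candidates (138206, 24723 and 146580).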
Problems • Grouping algorithm is not very good • Produces many redundant groups • Does not weight sets • Some candidate inversions overlap others • Seems to detect too many candidates • Very slow and inefficient
Extensions • Improve grouping algorithm • Add weighting of sets • Combine similar groups • Filter out sets which are likely outliers • Use other inversion detection techniques • Use length constraints to filter out sets and groups