1 / 25

Optimizing pooling strategies for the massive next-generation sequencing of viral samples

Optimizing pooling strategies for the massive next-generation sequencing of viral samples. Pavel Skums 1 Joint work with Olga Glebova 2 , Alex Zelikovsky 2 , Ion Mandoiu 3 and Yury Khudyakov 1. 1 Centers for Disease Control and Prevention, Atlanta, GA

erol
Download Presentation

Optimizing pooling strategies for the massive next-generation sequencing of viral samples

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Optimizing pooling strategies for the massive next-generation sequencing of viral samples Pavel Skums1 Joint work with Olga Glebova2, Alex Zelikovsky2, Ion Mandoiu3 and Yury Khudyakov1 1Centers for Disease Control and Prevention, Atlanta, GA 2Georgia State University, Atlanta, GA 3University of Connecticut, Storrs, CT

  2. Outline 1. Massive NGS of viral samples 2. Optimal pooling design problem 3. Algorithm and results

  3. NGS in epidemiology Epidemiological parameters Transmission networks Prediction of the epidemics progress Molecular surveillance Vaccination strategies

  4. NGS in epidemiology A large-scale molecular surveillance requires sequencing of unprecedentedly large sets of viral samples. NGS of tens of thousands of samples is highly cost- and labor-intensive. Example: sequencing 100K samples using 454 senior system with 50 MIDs doing one sequencing run per day Cost:  5000*(100 000)/50 = 10 000 000$ Time: (100 000)/50 = 2000 days  5.5 years

  5. Optimal Pooling Design Problem Goal: a framework for identification of viral sequences from large number of samples using the smallest possible number of NGS runs. Idea: for n samples generate mpools (i.e. mixtures of samples) with m<< nin such a way that every sample is uniquely identified by the pools to which it belongs.

  6. Optimal Pooling Design Problem E Example. 8 samples: 1,2,3,4,5,6,7,8 Sequencing each sample separately: 8 runs 4 pools: M1= {1,2,3,4} M2= {5,6,7,8} M3= {1,2,5,6} M4= {1,3,5,7} Sequencing pools: 4 runs

  7. Optimal Pooling Design Problem 4 pools: M1= {1,2,3,4} M2= {5,6,7,8} M3= {1,2,5,6} M4= {1,3,5,7} {1} = M1M3M4 {2} = (M1M3) \ M4 …………………………………… {8} = (M2 \ M3) \ M4

  8. Optimal Pooling Design Problem Problem 1 (Optimal Pooling Design Problem). Given: a set of samples S= {S1,...,Sn} Find: a set of pools P = {P1,…,Pm} , Pk S for k=1,…,m such that 1)P1…Pm = S 2) for every Si,SjSthere exists Pk P such that |Pk{Si,Sj}| = 1 3) m is minimal (Pkseparates Si and Sj) Theorem1. There exists a solution of Problem 1 with m = log(n) + 1

  9. Optimal Pooling Design Problem Additional conditions for the problem

  10. Optimal Pooling Design Problem Additional conditions for the problem

  11. Optimal Pooling Design Problem Additional conditions for the problem

  12. Optimal Pooling Design Problem A graph G(S) = (V,E), where V = S SiSjE if and only if there is a confidence that the samples Si and Sjdo not intersect.

  13. Optimal Pooling Design Problem Problem 2 (Minimum Clique Test Set Problem). Given: a graph G=G(S), natural numbers k,l Find: a set of cliques P = {P1,…,Pm} of G such that 1) |Pi| ≤ k for every i=1,…,m 2) every vertex v V(G) belongs to at least l cliques from P 3) for every u,vV(G) there exists Pi P such that |Pi{u,v}| = 1 4) m is minimal

  14. Optimal Pooling Design Problem Minimum Test Set Problem(Garey, Johnson) Given: collection Q={Q1,…,Qn} of subsets of a finite set S Find: a subcollectionP = {P1,…,Pm}Qsuch that 1) for every si,sjSthere exists Pr P such that |Pr{si,sj}| = 1 2) m is minimal

  15. Problem reformulations Minimum Clique Test Set Problem Given: a graph G=G(S), natural numbers k,l Find: a set of cliques P = {P1,…,Pm} of G such that 1) |Pi| ≤ k for every i=1,…,m 2) every vertex v V(G) belongs to at least l cliques from P 3) for every u,vV(G) there exists Pi P such that |Pi{u,v}| = 1 4) m is minimal Only some pairs of vertices should be separated A graph H with V(H)=V(G), uvE(H) if and only if u and v should be separated

  16. Problem reformulations Minimum Clique Test Set Problem Given: a graphs G and H on the same set of vertices V, natural numbers k,l Find: a set of cliques P = {P1,…,Pm} of G such that 1) |Pi| ≤ k for every i=1,…,m 2) every vertex v V(G) belongs to at least l cliques from P 3) for every uvE(H) there exists Pi P such that |Pi{u,v}| = 1 4) m is minimal Only some pairs of vertices should be separated A graph H with V(H)=V(G), uvE(H) if and only if u and v should be separated

  17. Problem reformulations Minimum Clique Test Set Problem Given: a graphs G and H on the same set of vertices V, natural numbers k,l Find: a set of cliques P = {P1,…,Pm} of G such that 1) |Pi| ≤ k for every i=1,…,m 2) every vertex v V belongs to at least l cliques from P 3) for every uvE(H)there exists Pi P such that |Pi{u,v}| = 1 4) m is minimal Replace each vertex vV(G) with l pairwise non-adjacent copies

  18. Problem reformulations Minimum Clique Test Set Problem Given: a graphs G and H on the same set of vertices V, natural number k Find: a set of cliques P = {P1,…,Pm} of G such that 1) |Pi| ≤ k for every i=1,…,m 2) P1…Pm = V(G) 3) for every uvE(H) there exists Pi P such that |Pi{u,v}| = 1 4) m is minimal Replace each vertex vV(G) with l pairwise non-adjacent copies

  19. Problem reformulations Minimum Clique Test Set Problem Given: a graphs G and H on the same set of vertices V, natural number k Find: a set of cliques P = {P1,…,Pm} of G such that 1) |Pi| ≤ k for every i=1,…,m 2) P1…Pm = V 3) for every uvE(H) there exists Pi P such that |Pi{u,v}| = 1 4) m is minimal For every uV add a vertex xu and an edge uxuE(H) 1 2 3

  20. Problem reformulations Minimum Clique Test Set Problem Given: a graphs G and H on the same set of vertices V, natural number k Find: a set of cliques P = {P1,…,Pm} of G such that 1) |Pi| ≤ k for every i=1,…,m 2) P1…Pm = V 3) for every uvE(H) there exists Pi P such that |Pi{u,v}| = 1 4) m is minimal For every uV add a vertex xu and an edge uxuE(H) x2 x1 1 2 3 x3

  21. Problem reformulations Minimum Clique Test Set Problem Given: a graphs G and H on the same set of vertices, natural number k Find: a set of cliques P = {P1,…,Pm} of G such that 1) |Pi| ≤ k for every i=1,…,m 3) for every uvE(H) there exists Pi P such that |Pi{u,v}| = 1 4) m is minimal For every uV add a vertex xu and an edge uxuE(H) x2 x1 1 2 3 x3

  22. Heuristic algorithm Input: a graphs G and H on the same set of vertices V, natural number k P =  WHILECP C  V OR E(H)   find maximum cut (A,B) in H (using local search); for every aA put w(a) = # of neighbors of a from B in H; for every bBput w(b) = # of neighbors of b from A in H; find maximum clique C1 with |C1|≤k in a subgraph G[A] with weights w; find maximum clique C2 with |C2|≤kin a subgraphG[B] with weights w; C := argmax{w(C1),w(C2)} ; P:= P  {C}; E(H) := E(H) \ {uv : uC, vV\C} ENDWHILE

  23. Algorithm results 1. All samples are unrelated (i.e. G is complete graph)

  24. Algorithm results 2. Some samples are related (G is a random graph with the edge probability p)

  25. Thank you!

More Related