Flipping letters to minimize the support of a string

Explore minimizing the number of distinct k-mers in a string by flipping letters under a budget constraint. Discuss the related parameterized complexity and ILP formulations.

Presentation Transcript


  1. Flipping letters to minimize the support of a string. Giuseppe Lancia, Franca Rinaldi, Romeo Rizzi (University of Udine)

  2. Outline of talk: 1. Problem definition; 2. Parametrized complexity; 3. Polynomial cases; 4. NP-hardness; 5. ILP formulations

  3. 1. Problem definition

  4. We are given a string s and a parameter k (e.g., k = 3): s = 010010011. The string has a set of k-mers, its support, K(s).

  12. We are given a string s and a parameter k (e.g., k = 3): s = 010010011. The string has a set of k-mers, its support, K(s) = { 010, 100, 001, 011 }, so |K(s)| = 4.

  15. We are given a string s and a parameter k (e.g., k = 3). By flipping some bits, we could reduce the number of k-mers: s = 010010011 has K(s) = { 010, 100, 001, 011 }, |K(s)| = 4, while s’ = 010010010 has K(s’) = { 010, 100, 001 }, |K(s’)| = 3.
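
The support computation on these slides can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the function name `support` is hypothetical, and the sketch assumes the support is obtained by sliding a window of length k over the string, as the slide-by-slide build-up suggests.

```python
def support(s: str, k: int) -> set[str]:
    """Return the set of distinct k-mers (length-k substrings) of s."""
    # Slide a window of length k over s; the set keeps each k-mer once.
    return {s[i:i + k] for i in range(len(s) - k + 1)}


if __name__ == "__main__":
    s = "010010011"        # string from the slides
    s_prime = "010010010"  # same string with the last bit flipped
    print(sorted(support(s, 3)), len(support(s, 3)))
    print(sorted(support(s_prime, 3)), len(support(s_prime, 3)))
```

Running it on s and s’ from the slides lets one compare the two supports directly.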

  19. The Problem. Ingredients: a string s over an alphabet S; a parameter k (k-mer size); a budget B. Objective: change at most B letters in s so that the resulting s’ has as few distinct k-mers as possible.

  20. The Problem. Ingredients: a string s over an alphabet S; a parameter k (k-mer size); a budget B. Objective: find a string s’ with d(s, s’) <= B that has the smallest number of k-mers, where d is the Hamming distance.
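
To make the objective concrete, here is a brute-force sketch that tries every string within Hamming distance B of s and keeps one with the smallest support. It is exponential in B and only meant for tiny instances; it is not an algorithm from the talk. The name `min_support_bruteforce` is hypothetical, and the default alphabet (the letters occurring in s) is an assumption made for this illustration.

```python
from itertools import combinations, product


def min_support_bruteforce(s, k, budget, alphabet=None):
    """Return (size, s_prime): the smallest support size reachable by
    changing at most `budget` letters of s, and one string achieving it."""
    if alphabet is None:
        alphabet = sorted(set(s))  # assumption: only letters already in s

    def support_size(t):
        return len({t[i:i + k] for i in range(len(t) - k + 1)})

    best_size, best_string = support_size(s), s
    for r in range(1, budget + 1):
        # Choose which r positions to change ...
        for positions in combinations(range(len(s)), r):
            # ... and, for each, which different letter to write there.
            choices = [[c for c in alphabet if c != s[p]] for p in positions]
            for letters in product(*choices):
                t = list(s)
                for p, c in zip(positions, letters):
                    t[p] = c
                t = "".join(t)
                size = support_size(t)
                if size < best_size:
                    best_size, best_string = size, t
    return best_size, best_string


if __name__ == "__main__":
    print(min_support_bruteforce("010010011", k=3, budget=1))
```

The search visits on the order of (|s| choose B) * (|S| - 1)^B strings, so it is useful only as a reference check on short inputs.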

  22. Motivation. Real: curiosity-driven (it’s a cute combinatorial problem). Fictitious: analysis of DNA sequences, e.g. atcgattgatccttta has 3-mers atc, tcg, cga, gat, …. 3-mers are amino acid codons. Protein complexity relates to the number of codons. Mutations may reduce complexity….

  23. Our results: The problem has many parameters (|s|, |S|, k, B). We study all versions, in which some of the parameters may be bounded: polynomial special cases (e.g. for B fixed, or for both k and |S| fixed) and NP-hard special cases (even for k = 2 or |S| = 2).

  24. 2. Parametrized complexity

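  25. Table of cases (YES marks a parameter that is fixed). Columns: (|S| NO, k NO), (|S| YES, k NO), (|S| NO, k YES), (|S| YES, k YES). Rows: (|s| NO, B NO), (|s| YES, B NO), (|s| NO, B YES), (|s| YES, B YES).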

  26. We can assume: k <= |s|.

  28. We can assume: B <= |s|.

  29. We can assume: |S| <= |s| (we don’t need any symbol not already in s).

  31. Polynomial cases, shown on the table of cases from slide 25.

  32. NP-hard cases, shown on the table of cases from slide 25. Entries in row (|s| NO, B NO): NP-hard for |S| = 2; NP-hard for k = 2; NP-hard.

  33. 3. Polynomial cases

  34. The case |S| and k fixed:

  36. The case |S| and k fixed: We start with this subproblem, SUB(A): given a set of k-mers A, can we correct s within budget so that all of its k-mers are in A?

  37. The case |S| and k fixed: We start with this subproblem, SUB(A): given a set of k-mers A, can we correct s within budget so that all of its k-mers are in A? Example: s = 01000100101110, A = { 0100, 1001, 0010, 0001 }, B = 3.

  43. The case |S| and k fixed: SUB(A) on the example s = 01000100101110, A = { 0100, 1001, 0010, 0001 }, B = 3. [Figure: a graph with repeated columns of nodes, each column containing the k-mers of A (0100, 1001, 0010, 0001), and integer weights on the arcs.]

  44. The case |S| and k fixed: each path in the graph corresponds to a string s’ with all its k-mers in A.

  45. The case |S| and k fixed: the length of the path is the Hamming distance d(s’, s).

  46. The case |S| and k fixed: SUB(A) has a solution iff the shortest path has length <= B.
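
The shortest-path argument for SUB(A) can be sketched as a dynamic program over the positions of s. This is a minimal reconstruction under stated assumptions, not the authors' code: the name `min_flips_into` is hypothetical, the alphabet is taken to be the letters occurring in A, A is assumed non-empty, and the arc costs (Hamming distance to the first k letters for the first layer, then 0 or 1 per appended letter) are inferred from the statement that path length equals d(s’, s).

```python
def min_flips_into(s, allowed):
    """Fewest letter changes turning s into a string whose k-mers all
    belong to `allowed`; returns None if no such string exists."""
    allowed = set(allowed)
    k = len(next(iter(allowed)))
    if any(len(a) != k for a in allowed) or len(s) < k:
        raise ValueError("need equal-length k-mers and len(s) >= k")
    alphabet = sorted(set("".join(allowed)))  # assumption: letters of A

    # First layer: the corrected string starts with k-mer a; the cost is
    # the Hamming distance between a and the first k letters of s.
    dist = {a: sum(x != y for x, y in zip(a, s[:k])) for a in allowed}

    # Each later layer appends one letter c: k-mer a is followed by
    # a[1:] + c, paying 1 if c differs from the letter of s there.
    for i in range(k, len(s)):
        nxt = {}
        for a, d in dist.items():
            for c in alphabet:
                b = a[1:] + c
                if b in allowed:
                    cost = d + (c != s[i])
                    if cost < nxt.get(b, float("inf")):
                        nxt[b] = cost
        dist = nxt

    return min(dist.values()) if dist else None


if __name__ == "__main__":
    # SUB(A) is answered by comparing the result with the budget B.
    d = min_flips_into("010010011", {"010", "100", "001"})
    print(d, d is not None and d <= 1)
```

Each layer does |A| * |S| constant-time lookups, which matches the O(|A| |S| |s|) bound quoted on the following slides if k-mer hashing is treated as constant time for fixed k.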

  48. The case |S| and k fixed: we can solve SUB(A) in polytime, O(|A| |S| |s|) = O(|s|), since |S| and k are fixed and hence |A| <= |S|^k is bounded by a constant. There are “only” 2^(|S|^k) possible subsets A to try, so the problem is solved in polytime O(|s|).
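
The counting behind this bound can be written out as follows. This is a sketch of the arithmetic only, assuming each candidate set A is tested with one SUB(A) computation of cost O(|A| |S| |s|):

```latex
\begin{align*}
\text{number of $k$-mers over } S &= |S|^{k}, \\
\text{number of candidate sets } A &= 2^{|S|^{k}}, \\
\text{cost of one } \mathrm{SUB}(A) &= O(|A|\,|S|\,|s|) = O\!\left(|S|^{k+1}\,|s|\right)
  \quad \text{since } |A| \le |S|^{k}, \\
\text{total cost} &= O\!\left(2^{|S|^{k}}\,|S|^{k+1}\,|s|\right) = O(|s|)
  \quad \text{for fixed } |S| \text{ and } k.
\end{align*}
```

The constant hidden in the final O(|s|) is doubly exponential in k, so the bound is linear in |s| only in the sense of fixed |S| and k.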

  49. The case of B fixed:
