
An Algorithm for the Consecutive Ones Property

An Algorithm for the Consecutive Ones Property. Claudio Eccher. Outline. C1P definition. Biological background Hybridization mapping. An algorithm for the C1P problem Dividing in components Taking care of a component Joining the components together. The consecutive ones property.


Presentation Transcript


  1. An Algorithm for the Consecutive Ones Property. Claudio Eccher

  2. Outline • C1P definition • Biological background • Hybridization mapping • An algorithm for the C1P problem • Dividing in components • Taking care of a component • Joining the components together

  3. The consecutive ones property. Definition: a binary matrix is said to have the consecutive ones property (C1P) if a permutation of its columns can be found such that all 1s in each row are consecutive.
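
The definition can be checked directly on small matrices. The sketch below is a brute-force test, not the algorithm presented in these slides: it tries every column permutation and so takes exponential time. The function names and both example matrices are illustrative assumptions.

```python
from itertools import permutations

def rows_consecutive(matrix, order):
    """True if, with columns arranged as in `order`, the 1s of every row are contiguous."""
    for row in matrix:
        positions = [pos for pos, col in enumerate(order) if row[col] == 1]
        if positions and positions[-1] - positions[0] + 1 != len(positions):
            return False
    return True

def find_c1p_order_bruteforce(matrix):
    """Return a column order witnessing the C1P, or None (exponential time)."""
    n_cols = len(matrix[0]) if matrix else 0
    for order in permutations(range(n_cols)):
        if rows_consecutive(matrix, order):
            return order
    return None

# Hypothetical examples: `good` has the C1P, `bad` does not.
good = [[1, 0, 1],
        [0, 1, 1]]
bad = [[1, 1, 0],
       [0, 1, 1],
       [1, 0, 1]]
print(find_c1p_order_bruteforce(good))
print(find_c1p_order_bruteforce(bad))
```

In `bad`, every column is 0 in exactly one row whose other two entries are 1, so whichever column sits in the middle splits the 1s of that row.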

  4. The consecutive ones property. Observation: the C1P is closed under taking submatrices. A bad matrix [matrix figure not preserved]: whichever column x is put in the middle, there is a row in which x is 0. Hence, every matrix containing this submatrix is ‘bad’.

  5. Hybridization mapping (1) • Copies of a DNA molecule are broken into several fragments (~10^4 bases) and replicated by cloning (clones) • The possible binding of small sequences (probes) to a clone is checked; the subset of the probes bound (hybridized) to a clone becomes its fingerprint • Clones’ overlap, and thus their relative order, is determined by comparing fingerprints

  6. Hybridization mapping (2) Two clones sharing part of their respective fingerprints are likely to have come from overlapping DNA regions. [Figure: Clone 1 and Clone 2 overlapping, with probes A, B, D, C marked]

  7. Assumptions • Probes are unique • There are no errors • All “clones x probes” hybridization experiments have been done

  8. Model • n clones and m probes • n × m binary matrix M built from experimental data • M_ij = 1 if probe j hybridized to clone i • M_ij = 0 if probe j did not hybridize to clone i
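
As a minimal sketch of the model, the clone-by-probe matrix can be built from fingerprints as follows. The function name `build_matrix`, the probe names, and the two fingerprints are hypothetical; the slides do not specify any data format.

```python
def build_matrix(fingerprints, probes):
    """Build the n x m 0/1 matrix: row i is clone i, column j is probe j."""
    column_of = {probe: j for j, probe in enumerate(probes)}
    matrix = []
    for fingerprint in fingerprints:
        row = [0] * len(probes)
        for probe in fingerprint:
            row[column_of[probe]] = 1  # probe hybridized to this clone
        matrix.append(row)
    return matrix

# Hypothetical data: two clones, four probes.
probes = ["A", "B", "C", "D"]
fingerprints = [{"A", "B", "D"}, {"D", "C"}]
M = build_matrix(fingerprints, probes)
print(M)
```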

  9. Problem • Obtaining a physical map from M • Determining if M has the C1P for rows • Finding a permutation of the columns such that all 1s in each row are consecutive

  10. An algorithm for the C1P problem • The problem belongs to P • The algorithm is from Fulkerson and Gross (1965) Without loss of generality we can assume that: • All rows are different • No row is all zeros

  11. Algorithm sketch • Separation of the rows into components (subsets of rows) • Permutation of the columns of each component • Join of the components together

  12. Row relations. Definition: ∀ row i ∈ M, S_i = {columns k | M_i,k = 1} • Given two rows i and j, exactly one of the following holds: • S_i ∩ S_j = ∅, or • S_i ⊆ S_j or S_j ⊆ S_i, or • S_i ∩ S_j ≠ ∅ and neither of them is a subset of the other
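
The three cases can be sketched with Python sets. The names `column_set` and `relation` and the string labels are assumptions made for illustration.

```python
def column_set(row):
    """S_i: the set of column indices where the row holds a 1."""
    return frozenset(k for k, value in enumerate(row) if value == 1)

def relation(s_i, s_j):
    """Classify two non-empty column sets as 'disjoint', 'nested' or 'overlap'."""
    if not (s_i & s_j):
        return "disjoint"
    if s_i <= s_j or s_j <= s_i:
        return "nested"
    return "overlap"  # non-empty intersection, neither contains the other

print(relation(column_set([1, 1, 0, 0]), column_set([0, 0, 1, 1])))  # disjoint
print(relation(column_set([1, 1, 1, 0]), column_set([0, 1, 0, 0])))  # nested
print(relation(column_set([1, 1, 0, 0]), column_set([0, 1, 1, 0])))  # overlap
```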

  13. Dividing in components (1) Let’s initially lump together in the same component the rows with non-empty intersection. If ∃ a row k s.t. S_k ∩ S_i = ∅ or S_k ⊆ S_i ∀ i ≠ k in this component, then row k can be put in its own component.

  14. Dividing in components (2) A graph G_c = (V,E) is built from matrix M • Each vertex in V is a row of M • There is an undirected edge in E between V_i and V_j if S_i ∩ S_j ≠ ∅ and neither of them is a subset of the other. The components we want are the connected components of G_c.
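
A minimal sketch of this step, assuming each row is already given as its column set. The pairwise comparison used here is the simplest way to find the edges, not necessarily the fastest, and the names and example sets are hypothetical.

```python
def overlap_graph(sets):
    """Adjacency sets of G_c: rows i, j are joined if their column sets
    intersect and neither contains the other."""
    adj = {i: set() for i in range(len(sets))}
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            common = sets[i] & sets[j]
            if common and not sets[i] <= sets[j] and not sets[j] <= sets[i]:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def connected_components(adj):
    """Connected components of an undirected graph, found by depth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            v = stack.pop()
            component.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(sorted(component))
    return components

# Hypothetical column sets for five rows.
sets = [{1, 2}, {2, 3}, {5, 6, 7}, {6}, {7, 8}]
print(connected_components(overlap_graph(sets)))
```

Row 3 ends up alone: its set {6} is nested in {5, 6, 7} and disjoint from every other set, so it has no edges.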

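  15. Building G_c: an example. [Figure: graph G_c on rows l1 to l8, with components labeled α, β, γ, δ] Edge (l1, l2)

  16. Building G_c: an example. [Figure: graph G_c on rows l1 to l8, with components labeled α, β, γ, δ] Edge (l4, l5)

  17. Building G_c: an example. [Figure: graph G_c on rows l1 to l8, with components labeled α, β, γ, δ] Edge (l6, l7)

  18. Building G_c: an example. [Figure: graph G_c on rows l1 to l8, with components labeled α, β, γ, δ] Edge (l6, l8)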

  19. Taking care of a component (1) [Figure: matrix with rows l1, l2, l3] The 1s of the first row have to be placed consecutively. The possible solutions can be represented as follows: [figure not preserved] The second row is adjacent to the first one. Hence, for the second row (l2) there are 2 choices: its 1s can be placed to the left or to the right of those of row l1. In either case the direction does not really matter.

  20. Taking care of a component (2) [Figure: matrix with rows l1, l2, l3] For the third row (l3) we have to consider the relations with the rows connected by edges to l3. Let’s place l3 with respect to l2: we cannot freely choose the direction (left or right), because of the relation between l3 and l1. To take the relation between l1 and l3 into account, it is necessary to consider the number of elements in the intersections between S_1, S_2 and S_3.

  21. Taking care of a component (3) [Figure: matrix with rows l1, l2, l3] Definition: let x·y = |S_x ∩ S_y| be the internal product of rows x and y. • If l1·l3 < min(l1·l2, l2·l3) then l3 has to be placed in the same direction in which l2 was placed with respect to l1 • If l1·l3 > min(l1·l2, l2·l3) then l3 has to be placed in the opposite direction to that in which l2 was placed with respect to l1 • If we have equality, it isn’t possible to have the 1s of l3 consecutive
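
The three-way rule can be sketched as a small function. The name `direction_for_third`, the "left"/"right" strings, and the example sets are assumptions; returning `None` stands for the equality case in which no consistent placement exists.

```python
def inner(s_x, s_y):
    """Internal product x.y = |S_x ∩ S_y|."""
    return len(s_x & s_y)

def direction_for_third(s1, s2, s3, dir_2_wrt_1):
    """Direction ('left' or 'right') in which row 3 must go relative to row 2,
    given the direction in which row 2 was placed relative to row 1.
    Returns None when the rule reports that the 1s cannot be made consecutive."""
    a = inner(s1, s3)
    b = min(inner(s1, s2), inner(s2, s3))
    if a < b:
        return dir_2_wrt_1  # same direction
    if a > b:
        return "left" if dir_2_wrt_1 == "right" else "right"  # opposite direction
    return None  # equality: no valid placement

# Hypothetical column sets; row 2 is assumed to have been placed to the right of row 1.
print(direction_for_third({1, 2, 3}, {2, 3, 4}, {3, 4, 5}, "right"))  # same direction
print(direction_for_third({2, 3, 4}, {3, 4, 5}, {1, 2, 3}, "right"))  # opposite direction
print(direction_for_third({1, 2}, {2, 3}, {1, 3}, "right"))           # equality
```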

  22. Taking care of a component (4) [Figure: matrix with rows l1, l2, l3] For l3, S3 = {1,4,7,8}, l1·l3= 2, l1·l2= 2, l1·l3= 1, so l3 has to be put to the right of l2: [figure not preserved]

  23. Taking care of a component (5) We had no choice in placing l3. Therefore, if the component has the C1P, then l1 and l3 must turn out to be properly placed. If, on the contrary, l1 and l3 are not properly placed, then we conclude that the component (and hence the matrix) doesn’t have the C1P. The only choice made was in the placement of l2 with respect to l1, and both possibilities result in the same solutions up to reversal.

  24. String generator. We have seen the following examples of string generators: [figures not preserved] A permutation π of the probes is compatible with a string generator if, whenever A, B, C appear in this order in π and A and C are in a group G, then B is also included in G. An invariant of the algorithm is that, after considering rows 1..k, a permutation π certifies the C1P of the submatrix on rows 1..k iff either π or its reversal is compatible with the string generator.
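
The compatibility condition says that every group must occupy a contiguous block of the permutation. The sketch below assumes a string generator is represented simply as a list of groups (sets of probes). This representation is an illustration only, since the slides showed string generators as figures.

```python
def is_compatible(order, groups):
    """True if every group occupies a contiguous block of `order`,
    i.e. no probe outside a group lies between two probes of that group."""
    position = {probe: i for i, probe in enumerate(order)}
    for group in groups:
        indices = [position[probe] for probe in group]
        if indices and max(indices) - min(indices) + 1 != len(indices):
            return False
    return True

# Hypothetical groups over probes 1, 2, 7, 8.
groups = [{2, 7}, {7, 8}]
print(is_compatible([2, 7, 8, 1], groups))
print(is_compatible([2, 8, 7, 1], groups))
```

Because contiguity does not depend on reading direction, a permutation is compatible exactly when its reversal is.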

  25. Taking care of a component: a ‘bad’ component. The relations between the rows are the same as in the preceding component.

  26. Taking care of a component (6) For a new row k in the same component, find two previously placed rows i and j s.t. ∃ edges (k,i), (i,j) in G_c and proceed as for the three-row case. Check also the consistency with the string generator. The algorithm gives all possible permutations of a component having the C1P, up to reversal.

  27. Algorithm implementation. Construct G_c and traverse it using depth-first search. When visiting a vertex, invoke procedure Place.
      Algorithm Place
      input: u, v, w vertices of G_c = (V,E) s.t. (u,v) ∈ E and (v,w) ∈ E
      output: a placement for row u, if possible
        if v = nil and w = nil then
          Place all 1s of u consecutively
        else if w = nil then
          Left- or right-place the 1s of u with respect to the 1s of v
          Record direction used
        else if u · w < min(u · v, v · w) then
          Place u with respect to v in the same direction used in the v, w placement
          Record direction used
        else
          Place u with respect to v in the opposite direction used in the v, w placement
          Record direction used
        Check consistency of column sets
        If column sets are not consistent then the component doesn’t have the C1P
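
One way to obtain the arguments for Place is to pass each vertex's DFS parent and grandparent, which guarantees that (u,v) and (v,w) are edges and that v and w were placed earlier. The sketch below only lists the calls in order and does not place anything. The name `place_calls` and the example graph are assumptions.

```python
def place_calls(adj, root):
    """Depth-first traversal of one component of G_c.
    Returns the (u, v, w) triples in the order Place would be invoked:
    u is the vertex being visited, v its DFS parent, w its DFS grandparent
    (None where no such vertex exists)."""
    calls, seen = [], set()

    def visit(u, v, w):
        seen.add(u)
        calls.append((u, v, w))
        for nxt in sorted(adj[u]):
            if nxt not in seen:
                visit(nxt, u, v)

    visit(root, None, None)
    return calls

# Hypothetical component: rows 0-1-2 form a path, and row 3 is also adjacent to row 1.
adj = {0: {1}, 1: {0, 2, 3}, 2: {1}, 3: {1}}
for call in place_calls(adj, 0):
    print(call)
```

The first call has no neighbours placed yet, the second has only a parent, and all later calls have both, matching the three branches of the procedure.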

  28. Algorithm running time. For an n × m matrix, building graph G_c takes O(nm) time. Checking consistency of column sets requires O(m) time per row, and there are n rows to process. Total time is thus O(nm).

  29. Joining components together (1) Construct a new graph G_M = (V,E) in which: • Each component α_k of M is a vertex in G_M • For α, β ∈ V, there is a directed edge from α to β if, ∀ row i ∈ β, the set S_i is contained in at least one set S_j of α. G_M tells us how the components of M fit together.
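
A minimal sketch of the edge rule, read as "every set of β is contained in some set of α". The function name, the dictionary layout, and the example components are hypothetical, and the nested loops favour clarity over speed.

```python
def containment_edges(components, sets):
    """Directed edges of G_M. `components` maps a component name to its row
    indices; `sets[i]` is the column set S_i of row i. There is an edge
    (a, b) when every set of component b lies inside some set of component a."""
    edges = set()
    for a, rows_a in components.items():
        for b, rows_b in components.items():
            if a == b:
                continue
            if all(any(sets[i] <= sets[j] for j in rows_a) for i in rows_b):
                edges.add((a, b))
    return edges

# Hypothetical example: beta's only set {4} is contained in {3, 4, 5} of alpha.
sets = [{1, 2, 3}, {3, 4, 5}, {4}, {7}]
components = {"alpha": [0, 1], "beta": [2], "gamma": [3]}
print(containment_edges(components, sets))
```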

  30. G_M for the example matrix. [Figure: graph G_M on components α, β, γ, δ]

  31. Joining components together (2) For two sets S_i ∈ β, S_j ∈ α, if S_i ⊆ S_j then there is no row k ∈ α s.t. S_i ⊄ S_k and S_i ∩ S_k ≠ ∅. The exact same containment and disjointness relations hold for all other sets from β. G_M is acyclic.

  32. Joining components together (3) The joining of components depends on the way sets in one component contain or are contained in sets from other components. Components having sets not contained anywhere else should be processed first. Containment is specified by the directed edges in G_M.

  33. Joining components together (4) G_M has to be processed in topological order. Remove all sources from G_M (e.g. α) and make the union of their string generators. While G_M is not empty, take the next source β, remove β from G_M, and refine the current string generator with the string generator of β.
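
The processing order can be sketched with Kahn's algorithm, which repeatedly removes a source. Only the ordering is computed here; merging and refining string generators is left out. The edge list is a made-up example, since the slide's figure of G_M is not preserved.

```python
from collections import deque

def topological_order(vertices, edges):
    """Order the vertices of a directed acyclic graph by repeatedly removing a source."""
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for a, b in edges:
        successors[a].append(b)
        indegree[b] += 1
    queue = deque(v for v in vertices if indegree[v] == 0)  # initial sources
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:  # w has just become a source
                queue.append(w)
    if len(order) != len(vertices):
        raise ValueError("graph has a cycle")
    return order

# Hypothetical containment edges between four components.
vertices = ["alpha", "beta", "gamma", "delta"]
edges = [("alpha", "beta"), ("alpha", "gamma"), ("gamma", "delta")]
print(topological_order(vertices, edges))
```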

  34. Example (1) [Figure: graph G_M on components α, β, γ, δ] One topological order is α, β, γ, δ.

  35. Example (2) [Figure: components α, β, δ, γ]

  36. Example (3)

  37. Example (4)

  38. Example (5) In this particular case there are two solutions corresponding to the permutation of identical columns (5 and 9)

  39. Algorithm solution is not unique In general multiple solutions may exist because: • Each component may on its own have several solutions • Each solution can be used in two ways: the permutation and its reversal

  40. Algorithm running time. Topological sorting of G_M takes time O(n+m). If the entries of M are preprocessed, the queries needed for traversing G_M can take constant time. Preprocessing takes at most O(nm). Total time for processing each component c_i is O(n_i m). Algorithm running time is O(nm).

  41. Concluding remarks (1) Even if a C1P permutation exists, this is not necessarily the true permutation: • The solution is not unique • In general errors do exist, so the true permutation is not the C1P one

  42. Concluding remarks (2) Generalizations to account for errors yield NP-hard problems Also relaxing the assumption of unique probes yields NP-hard problems

  43. Related works. A considerably more complicated algorithm by Booth and Lueker (1976) exists that takes O(n+m+r) time (r is the total number of 1s). More recently, a simple O(n+m+r)-time algorithm has been presented by Hsu, J. Algorithms 43 (2002), no. 1, 1-16.
