A Simplified View of DCJ-Indel Distance Phillip Compeau University of California-San Diego Department of Mathematics
Abstract • Braga et al., 2010: solved the DCJ-indel sorting problem in linear time. • Goals: • “Hardwire” DCJ sorting into DCJ-indel sorting. • Characterize the solution space for DCJ-indel sorting. • The DCJ solution space is already known (Braga and Stoye, 2010).
Section 1: Preliminaries
Outline: Preliminaries • Encoding Indels as DCJs • DCJ-Indel Sorting • The Solution Space of DCJ-Indel Sorting • Conclusion
The Discrete Genome • Genome (Π): a graph formed by two matchings • Genes g(Π): each numbered gene is an edge joining its head and its tail • Adjacencies a(Π): a blue matching on V(g(Π))
The Discrete Genome • Chromosome: a connected component of Π (an alternating path or cycle) • Linear or circular according to whether the component is a path or a cycle • Telomere: a path endpoint of Π; it has the null adjacency {v, Ø}
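A minimal Python sketch of this representation, assuming a hypothetical encoding in which each extremity is a tuple `(gene, 't')` or `(gene, 'h')` and each adjacency is a two-element `frozenset`; the function name `chromosomes` is illustrative. It recovers chromosomes as connected components of the union of the two matchings and labels a component linear exactly when it contains a telomere.

```python
from collections import defaultdict

def chromosomes(genes, adjs):
    """Split a genome into chromosomes (components of the two matchings).

    genes: gene ids; gene g contributes the edge (g, 't') -- (g, 'h').
    adjs:  set of frozenset({u, v}) adjacencies (the blue matching).
    Unmatched extremities are telomeres, i.e. carry the null adjacency {v, Ø}.
    Returns (list of (vertex set, 'linear' | 'circular'), set of telomeres).
    """
    nbrs = defaultdict(list)
    for g in genes:  # gene edges: tail <-> head
        nbrs[(g, 't')].append((g, 'h'))
        nbrs[(g, 'h')].append((g, 't'))
    matched = set()
    for adj in adjs:  # adjacency edges
        u, v = tuple(adj)
        nbrs[u].append(v)
        nbrs[v].append(u)
        matched.update((u, v))
    telomeres = set(nbrs) - matched
    seen, result = set(), []
    for start in list(nbrs):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # depth-first search through both matchings
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(nbrs[x])
        seen |= comp
        # A path component contains telomeres; a cycle contains none.
        result.append((comp, 'linear' if comp & telomeres else 'circular'))
    return result, telomeres
```

For example, genes 1, 2, 3 with adjacencies {1h, 2t} and {3h, 3t} give one linear chromosome (telomeres 1t and 2h) and one circular chromosome consisting of gene 3 alone.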
The Double-Cut-and-Join Operation • Double-cut-and-join operation (DCJ; Yancopoulos et al., 2005): “cuts” the genome in two places and rejoins the resulting ends into new adjacencies. • DCJ distance ($d_{DCJ}(\Pi, \Gamma)$): minimum # of DCJs required to transform Π into Γ (genomes having the same genes).
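A minimal Python sketch of one DCJ acting on an adjacency set, using the same hypothetical `(gene, 't'|'h')` encoding; the name `dcj` and the use of `None` for the null end Ø are illustrative choices, not notation from the talk. Cutting the two adjacencies around gene 2 of a linear chromosome +1 +2 +3 and rejoining them inverts gene 2; joining the two telomeres circularizes the chromosome; cutting a single adjacency is a fission.

```python
def dcj(adjs, p, q, swap=False):
    """Apply one double-cut-and-join to a set of adjacencies.

    p = (a, b) and q = (c, d) are the two cuts.  Each is an existing
    adjacency {a, b}, or a telomere written with None for the null end Ø.
    The four free ends are rejoined as {a, c}, {b, d} (default) or as
    {a, d}, {b, c} (swap=True); a pair containing None becomes a telomere.
    """
    a, b = p
    c, d = q
    out = set(adjs)
    for x, y in (p, q):
        if x is not None and y is not None:
            out.remove(frozenset((x, y)))  # cut; raises KeyError if absent
    joins = [(a, d), (b, c)] if swap else [(a, c), (b, d)]
    for x, y in joins:
        if x is not None and y is not None:
            out.add(frozenset((x, y)))  # join two free ends
    return out
```

Representing telomeres with `None` lets the same function cover the special cases (fission, fusion, circularization) without separate code paths.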
The Breakpoint Graph • B(Π, Γ) is formed from the adjacencies of Π (blue) and Γ (red). • Like a genome, B(Π, Γ) decomposes into alternating red-blue paths and cycles.
DCJ Distance Formula • Bergeron et al., 2006: if Π and Γ share the same genes, then the DCJ distance is given by $d_{DCJ}(\Pi, \Gamma) = N - \left[c(\Pi, \Gamma) + \frac{p_{even}(\Pi, \Gamma)}{2}\right]$, where: • N = # of genes • c(Π, Γ) = # of cycles in B(Π, Γ) • $p_{even}(\Pi, \Gamma)$ = # of even paths in B(Π, Γ)
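The formula can be checked with a short Python sketch, assuming the same hypothetical encoding (extremities `(gene, 't')`/`(gene, 'h')`, adjacencies as `frozenset`s, Π's adjacencies blue and Γ's red). It walks every alternating component of B(Π, Γ), counting cycles and even paths; in this sketch a vertex that is a telomere in both genomes is an isolated vertex and is counted as a path of length 0 (even), which makes two identical genomes come out at distance 0.

```python
def dcj_distance(genes, adj_pi, adj_gamma):
    """d_DCJ(Π, Γ) = N - (c + p_even / 2), read off the breakpoint graph.

    genes: the common gene ids; adj_pi / adj_gamma: sets of frozenset({u, v})
    adjacencies of Π (blue edges) and Γ (red edges) on extremities (g, 't'|'h').
    """
    blue, red = {}, {}
    for adjs, match in ((adj_pi, blue), (adj_gamma, red)):
        for adj in adjs:
            u, v = tuple(adj)
            match[u], match[v] = v, u
    vertices = [(g, e) for g in genes for e in ('t', 'h')]
    seen = set()
    cycles = even_paths = 0
    # Paths start at vertices missing a blue or a red edge.
    for start in vertices:
        if start in seen or (start in blue and start in red):
            continue
        seen.add(start)
        match = blue if start in blue else red
        cur, length = start, 0
        while cur in match:  # alternate colors until the path ends
            cur = match[cur]
            seen.add(cur)
            length += 1
            match = red if match is blue else blue
        if length % 2 == 0:  # includes isolated vertices (length 0)
            even_paths += 1
    # Every vertex not on a path lies on an alternating cycle.
    for start in vertices:
        if start in seen:
            continue
        cur = start
        while True:
            nxt = blue[cur]
            seen.update((cur, nxt))
            cur = red[nxt]
            if cur == start:
                break
        cycles += 1
    # p_even is even for genomes on the same genes, so // is exact here.
    return len(genes) - (cycles + even_paths // 2)
```

For instance, a linear +1 +2 versus +1 −2 yields two even paths and no cycles, so the distance is 2 − (0 + 1) = 1 (one inversion).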
Indels and the DCJ-Indel Distance • Indel: the insertion or deletion of a chromosome or chromosomal interval (consecutive genes). • Assumption: we cannot remove a gene common to Π and Γ. • DCJ-indel distance ($d^{DCJ}_{ind}(\Pi, \Gamma)$): minimum # of DCJs and indels required to transform Π into Γ. • Braga et al., 2010: solved DCJ-indel sorting in linear time. • Lots of cases… can we simplify it?
Section 2: Encoding Indels as DCJs
Deletion as a DCJ Creating a Circular Chromosome • Ma et al., 2009: view a deletion as the formation and removal of a circular chromosome. • Idea: indel = DCJ creating a circular chromosome. • Wait… what about the deletion of circular chromosomes?
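The view of Ma et al. can be traced on a toy interval with a minimal Python sketch, using the hypothetical `(gene, 't'|'h')` encoding and illustrative helper names. One DCJ on the two adjacencies flanking the interval b c of a linear chromosome a b c d closes b c into a circular chromosome and joins a to d; discarding that circle then completes the deletion.

```python
def excise_as_circle(adjs, left, right):
    """One DCJ cutting the adjacencies left = (x, y) and right = (u, v),
    rejoined as {x, v} and {y, u}.  If y..u spans an interval of
    consecutive genes, that interval closes into a circular chromosome."""
    (x, y), (u, v) = left, right
    out = set(adjs)
    out.remove(frozenset(left))   # raises KeyError if not an adjacency
    out.remove(frozenset(right))
    out |= {frozenset((x, v)), frozenset((y, u))}
    return out

def drop_genes(adjs, genes):
    """Delete a chromosome by discarding every adjacency touching its genes."""
    return {a for a in adjs if all(g not in genes for g, _ in a)}

abcd = {frozenset({('a', 'h'), ('b', 't')}),
        frozenset({('b', 'h'), ('c', 't')}),
        frozenset({('c', 'h'), ('d', 't')})}
step1 = excise_as_circle(abcd, (('a', 'h'), ('b', 't')), (('c', 'h'), ('d', 't')))
# step1: the circle (b c) plus the new adjacency {a_h, d_t}
after = drop_genes(step1, {'b', 'c'})
```

Removing the circle is only safe here because, after the DCJ, every adjacency touching b or c lies on that circle.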
Apparent Exceptions • Apparent Exception #1: two deleted circular chromosomes are created by a single DCJ. • Picture: example on genes a, b, c, d; 1 operation vs. 3 operations.
Apparent Exceptions • Apparent Exception #2: a deleted circular chromosome that is never involved in a DCJ. • Circular singleton of Π: a circular chromosome of Π that shares no genes with Γ. • Question: can we delete all circular singletons first? YES!
Handling Circular Singletons • Proposition: when transforming Π into Γ via a minimum collection of DCJs and indels, no gene belonging to a circular singleton of Π can ever appear in the same chromosome as a gene of Γ. • Corollary 1: if Π* is formed from Π by removing a circular singleton, then $d^{DCJ}_{ind}(\Pi^*, \Gamma) = d^{DCJ}_{ind}(\Pi, \Gamma) - 1$. • Let sing(Π, Γ) = # of circular singletons of Π and Γ. • Corollary 2: if Π0 and Γ0 are formed by removing all circular singletons from Π and Γ, then $d^{DCJ}_{ind}(\Pi, \Gamma) = d^{DCJ}_{ind}(\Pi_0, \Gamma_0) + \mathrm{sing}(\Pi, \Gamma)$.
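Corollary 2 amounts to a preprocessing step. A minimal Python sketch, assuming a hypothetical chromosome-level encoding (each chromosome is a pair of a signed gene list and an `is_circular` flag): it strips the circular singletons from both genomes and returns their count, which is then added to whatever distance is computed for (Π0, Γ0).

```python
def remove_circular_singletons(pi, gamma):
    """Strip circular singletons from both genomes.

    pi, gamma: lists of chromosomes, each a pair (signed gene list, is_circular).
    Returns (pi0, gamma0, sing), where sing counts the circular chromosomes
    of either genome that share no gene with the other genome.
    """
    genes_pi = {abs(g) for chrom, _ in pi for g in chrom}
    genes_gamma = {abs(g) for chrom, _ in gamma for g in chrom}

    def strip(genome, other_genes):
        # Keep every chromosome except circular ones disjoint from the other genome.
        kept = [(chrom, circ) for chrom, circ in genome
                if not (circ and other_genes.isdisjoint(abs(g) for g in chrom))]
        return kept, len(genome) - len(kept)

    pi0, sing_pi = strip(pi, genes_gamma)
    gamma0, sing_gamma = strip(gamma, genes_pi)
    return pi0, gamma0, sing_pi + sing_gamma
```

Each removed singleton costs exactly one deletion (or insertion), which is why the count is simply added back.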
A Novel View of DCJ-Indel Distance • WLOG we may henceforth assume that sing(Π, Γ) = 0. • A completion of Π is a genome Π’ such that: • g(Π’) = g(Π) ∪ g(Γ) • a(Π’) = a(Π) ∪ a perfect matching on V(Π’) − V(Π) • The new chromosomes of Π’ are circular: these are the indels of Π’. • A completion Γ’ of Γ is defined symmetrically. • Theorem: $d^{DCJ}_{ind}(\Pi, \Gamma) = \min_{(\Pi', \Gamma')} d_{DCJ}(\Pi', \Gamma')$, where the minimum ranges over all completions Π’ of Π and Γ’ of Γ. • An optimal completion (Π*, Γ*) achieves this minimum.
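For very small genomes, the theorem can be turned into an exhaustive check. The Python sketch below uses hypothetical names and the `(gene, 't'|'h')` encoding, and assumes sing(Π, Γ) = 0. It enumerates every completion Π’ and Γ’ (every perfect matching on the vertices each genome is missing) and returns the minimum DCJ distance. It is exponential and only illustrates the definition; it is not the linear-time algorithm of Section 3.

```python
from itertools import product

def dcj_distance(genes, adj_pi, adj_gamma):
    """d_DCJ = N - (c + p_even / 2); isolated vertices count as even paths."""
    blue, red = {}, {}
    for adjs, match in ((adj_pi, blue), (adj_gamma, red)):
        for adj in adjs:
            u, v = tuple(adj)
            match[u], match[v] = v, u
    vertices = [(g, e) for g in genes for e in ('t', 'h')]
    seen, cycles, even_paths = set(), 0, 0
    for start in vertices:  # paths: start at a vertex missing a color
        if start in seen or (start in blue and start in red):
            continue
        seen.add(start)
        match, cur, length = (blue if start in blue else red), start, 0
        while cur in match:
            cur = match[cur]
            seen.add(cur)
            length += 1
            match = red if match is blue else blue
        even_paths += length % 2 == 0
    for start in vertices:  # remaining vertices lie on cycles
        if start in seen:
            continue
        cur = start
        while True:
            nxt = blue[cur]
            seen.update((cur, nxt))
            cur = red[nxt]
            if cur == start:
                break
        cycles += 1
    return len(genes) - (cycles + even_paths // 2)

def perfect_matchings(vs):
    """Yield every perfect matching of the list vs as a list of frozensets."""
    if not vs:
        yield []
        return
    first, rest = vs[0], vs[1:]
    for i, partner in enumerate(rest):
        for m in perfect_matchings(rest[:i] + rest[i + 1:]):
            yield [frozenset((first, partner))] + m

def dcj_indel_distance_bruteforce(genes_pi, adj_pi, genes_gamma, adj_gamma):
    """Minimum of d_DCJ(Π', Γ') over all completions (assumes no singletons)."""
    all_genes = sorted(set(genes_pi) | set(genes_gamma))
    # Vertices Π must gain (genes only in Γ), and vice versa.
    open_pi = [(g, e) for g in sorted(set(genes_gamma) - set(genes_pi)) for e in 'th']
    open_gamma = [(g, e) for g in sorted(set(genes_pi) - set(genes_gamma)) for e in 'th']
    return min(
        dcj_distance(all_genes, set(adj_pi) | set(m_pi), set(adj_gamma) | set(m_gamma))
        for m_pi, m_gamma in product(list(perfect_matchings(open_pi)),
                                     list(perfect_matchings(open_gamma))))
```

For example, deleting gene 2 from a linear +1 +2 +3 gives distance 1, and replacing gene 2 by a new gene 4 gives distance 2 (one deletion plus one insertion).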
Section 3: DCJ-Indel Sorting
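Open Vertices • π-open vertex: a vertex of B(Π, Γ) not found in Π, i.e., an extremity of a gene of Γ that is absent from Π; it must be matched in Π’ (γ-open vertices are defined symmetrically). • Every path endpoint in B(Π, Γ) is π-open, γ-open, or a telomere (or both). • Define {π, π}-paths, {π, γ}-paths, and π-paths (similarly γ-paths) in B(Π, Γ) according to which kinds of open vertex they end in. • Idea: construct B(Π*, Γ*) from B(Π, Γ) by matching open vertices.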
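A minimal Python sketch that labels the paths of B(Π, Γ) by their open endpoints, under the hypothetical `(gene, 't'|'h')` encoding. The label strings are illustrative, and the handling of a path with a single open endpoint (including an isolated vertex that is both π-open and a telomere of Γ, labeled a π-path here) reflects our reading of the definitions rather than a statement from the talk.

```python
def classify_paths(genes_pi, adj_pi, genes_gamma, adj_gamma):
    """Return (label, length) for every path of B(Π, Γ).

    A vertex is π-open if it belongs to a gene of Γ missing from Π, and
    γ-open if it belongs to a gene of Π missing from Γ.  Labels (hypothetical
    strings): 'pi,pi', 'pi,gamma', 'gamma,gamma', 'pi', 'gamma', or 'none'.
    """
    blue, red = {}, {}
    for adjs, match in ((adj_pi, blue), (adj_gamma, red)):
        for adj in adjs:
            u, v = tuple(adj)
            match[u], match[v] = v, u
    in_pi = {(g, e) for g in genes_pi for e in 'th'}
    in_gamma = {(g, e) for g in genes_gamma for e in 'th'}
    seen, labels = set(), []
    for start in sorted(in_pi | in_gamma):
        if start in seen or (start in blue and start in red):
            continue  # already walked, or not a path endpoint
        seen.add(start)
        match = blue if start in blue else red
        cur, length = start, 0
        while cur in match:  # walk the alternating path to its other end
            cur = match[cur]
            seen.add(cur)
            length += 1
            match = red if match is blue else blue
        # Open endpoints lie in exactly one genome; a length-0 path has one endpoint.
        tags = ['gamma' if x in in_pi else 'pi'
                for x in {start, cur} if (x in in_pi) != (x in in_gamma)]
        labels.append((','.join(sorted(tags, reverse=True)) or 'none', length))
    return labels
```

With Π = +1 +2 +3 and Γ = +1 +3 +4, the gene-2 interval shows up as a {γ, γ}-path of length 3 = 2k − 1 with k = 2, the case covered by Lemma 1.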
Necessary Conditions for B(Π*, Γ*) • Lemma 1: if (Π*, Γ*) is an optimal completion of (Π, Γ), then every {π, π}-path ({γ, γ}-path) of length 2k − 1 in B(Π, Γ) embeds into a cycle of length 2k in B(Π*, Γ*). • Picture: B(Π’, Γ’) vs. B(Π’’, Γ’); closing the {π, π}-path into a cycle gives $d_{DCJ}(\Pi'', \Gamma') < d_{DCJ}(\Pi', \Gamma')$. • Remaining components of B(Π*, Γ*): • bracelet: a cycle linking {π, γ}-paths • chain: a path linking π-paths/γ-paths via intermediate {π, γ}-paths • Picture: examples of a 2-bracelet, a 2-chain, and a 3-chain.
Necessary Conditions for B(Π*, Γ*) • Lemma 2: B(Π*, Γ*) can contain only 2-bracelets, 2-chains, and 3-chains. • Picture: paths P1 and P2 in B(Π’, Γ’) vs. B(Π’’, Γ’), where relinking them yields a cycle and $d_{DCJ}(\Pi'', \Gamma') < d_{DCJ}(\Pi', \Gamma')$.
Necessary Conditions for B(Π*, Γ*) • Lemma 3: B(Π*, Γ*) cannot have one 2-chain joining two odd π-paths and another 2-chain joining two even π-paths. The same holds for γ-paths. • Picture: 2-chains joining odd paths P1, P2 and even paths P3, P4 in B(Π’, Γ’) vs. B(Π’’, Γ’), where relinking them yields even paths and $d_{DCJ}(\Pi'', \Gamma') < d_{DCJ}(\Pi', \Gamma')$.
Sorting Algorithm • Remove all circular singletons of Π and Γ. • (Lemma 1) Close every {π, π}-path ({γ, γ}-path) into a cycle by adding a single new adjacency to Π* (Γ*). • Form a maximum set of 2-bracelets; only chains then remain to be formed. • (Lemma 3) Form a maximum set of even 2-chains by linking pairs of π-paths (γ-paths) having opposite parity. • If $p_{\pi,\gamma}$ is odd, then link the remaining {π, γ}-path with any remaining π-path and γ-path to form a 3-chain. • Arbitrarily link the remaining π-paths in pairs; these all have the same parity. Do the same for any remaining γ-paths.
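The linking steps can be traced at the level of path counts. The Python sketch below is an illustration under stated assumptions: circular singletons are already removed, every {π, π}- and {γ, γ}-path has already been closed into a cycle, and the inputs are hypothetical counts of {π, γ}-paths and of odd/even π- and γ-paths. It reports how many of each structure the steps build; it does not construct the adjacencies themselves.

```python
from dataclasses import dataclass

@dataclass
class Linking:
    bracelets: int = 0     # 2-bracelets: two {π,γ}-paths closed into a cycle
    even_pi: int = 0       # even 2-chains: odd π-path + even π-path
    even_gamma: int = 0    # even 2-chains: odd γ-path + even γ-path
    three_chains: int = 0  # π-path + leftover {π,γ}-path + γ-path
    rest_pi: int = 0       # remaining same-parity π-paths, linked in pairs
    rest_gamma: int = 0    # remaining same-parity γ-paths, linked in pairs

def link_paths(p_pg, pi_odd, pi_even, g_odd, g_even):
    """Count the structures produced by the greedy linking steps."""
    out = Linking(bracelets=p_pg // 2)
    # Lemma 3: pair opposite-parity paths into as many even 2-chains as possible.
    out.even_pi = min(pi_odd, pi_even)
    pi_odd, pi_even = pi_odd - out.even_pi, pi_even - out.even_pi
    out.even_gamma = min(g_odd, g_even)
    g_odd, g_even = g_odd - out.even_gamma, g_even - out.even_gamma
    if p_pg % 2 == 1:
        # The leftover {π,γ}-path needs one π-path and one γ-path.
        if pi_odd + pi_even == 0 or g_odd + g_even == 0:
            raise ValueError("no π-path or γ-path left for the 3-chain")
        out.three_chains = 1
        # After the previous step, the remaining π-paths share one parity.
        if pi_odd:
            pi_odd -= 1
        else:
            pi_even -= 1
        if g_odd:
            g_odd -= 1
        else:
            g_even -= 1
    out.rest_pi = (pi_odd + pi_even) // 2
    out.rest_gamma = (g_odd + g_even) // 2
    return out
```

Because the even 2-chains are formed first, whichever π-path (γ-path) joins the 3-chain has the same parity as all other leftovers, so the arbitrary choice in that step cannot matter.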
DCJ-Indel Distance • Theorem: the preceding algorithm solves DCJ-indel sorting in linear time, and it implies a formula for $d^{DCJ}_{ind}(\Pi, \Gamma)$ containing a correction term δ, where δ = 1 only if $p_{\pi,\gamma}$ is odd and either: • $p_\pi^{odd} > p_\pi^{even}$ and $p_\gamma^{odd} > p_\gamma^{even}$; or • $p_\pi^{odd} < p_\pi^{even}$ and $p_\gamma^{odd} < p_\gamma^{even}$. Otherwise, δ = 0.
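The case split defining δ translates directly into code. A minimal Python sketch, with hypothetical parameter names standing for $p_{\pi,\gamma}$, $p_\pi^{odd}$, $p_\pi^{even}$, $p_\gamma^{odd}$, and $p_\gamma^{even}$:

```python
def delta(p_pg, pi_odd, pi_even, g_odd, g_even):
    """Correction term δ: 1 only if p_{π,γ} is odd and the π-paths and the
    γ-paths have the same strict parity majority; 0 otherwise."""
    if p_pg % 2 == 0:
        return 0
    both_odd = pi_odd > pi_even and g_odd > g_even
    both_even = pi_odd < pi_even and g_odd < g_even
    return 1 if both_odd or both_even else 0
```

Ties between odd and even counts give δ = 0, since both conditions use strict inequalities.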
Section 4: The Solution Space of DCJ-Indel Sorting
Encompassing All Possible Cases • The solution space is known for DCJ sorting (Braga and Stoye, 2010). • Thus, we only need to find all optimal completions; the specific operations then follow from the known DCJ solution space.
Handling Circular Singletons • The circular singletons of Π must be removed in sing(Π) steps. We have two options: • Delete all the circular singletons of Π. • Perform k “fusion” DCJs followed by sing(Π) − k chromosome deletions. • This poses a straightforward (yet tedious) counting problem.
Adding Necessary Conditions on B(Π*, Γ*) • Proposition 1: every π-path embedding into a 3-chain of an optimal completion must have the same parity. • Proposition 2: if $p_{\pi,\gamma}$ is even, then B(Π*, Γ*) must contain a maximum collection of even 2-chains. • The proofs are slightly more involved…
Finishing the Job • Four cases, depending on path statistics: • $p_{\pi,\gamma}$ is odd: • $p_\pi^{odd} > p_\pi^{even}$, $p_\gamma^{odd} > p_\gamma^{even}$ (or vice versa); δ = 1 • $p_\pi^{odd} > p_\pi^{even}$, $p_\gamma^{odd} < p_\gamma^{even}$ (or vice versa); δ = 0 • $p_{\pi,\gamma}$ is even: • $p_\pi^{odd} > p_\pi^{even}$, $p_\gamma^{odd} > p_\gamma^{even}$ (or vice versa); δ = 0 • $p_\pi^{odd} > p_\pi^{even}$, $p_\gamma^{odd} < p_\gamma^{even}$ (or vice versa); δ = 0 • These cases are tedious but straightforward, and each is handled similarly.
Section 5: Conclusion
Future Work • Correspondence with Braga et al., 2010? • Varying the indel cost? • Charge an indel cost ≤ the DCJ cost and take the minimum total cost. • Most of the simplifying sorting lemmas still hold, but actually computing the minimum cost appears difficult in this model. • The problem is solved! (under the framework of Braga et al., 2010)
Shameless Plug • www.rosalind.info • A novel educational website that teaches bioinformatics through programming exercises. • Includes a “professor” environment for assigning programming exercises to your bioinformatics classes.