On the weight of indels in genomic distances. Marília D. V. Braga, Raphael Machado, Leonardo C. Ribeiro and Jens Stoye (Inmetro - Brazil / Bielefeld University - Germany). RECOMB-CG 2011.
Guidance
Background
• Hybrid models for genome rearrangements
• Triangle inequality disruption
Results
• General framework to establish the triangle inequality
• Tight bounds for DCJ-indel (and DCJ-substitution) distance
Definitions: genome, chromosome, marker, telomere. [Figure: genomes A and B drawn as chromosomes of oriented markers; each marker x has two extremities, a tail xt and a head xh, and the ends of a linear chromosome are telomeres.]
Genomic distance
Some models:
Classical genomic distances (organizational operations, e.g. inversion, translocation)
• Hannenhalli & Pevzner 1995 (inv.+transloc.)
• Yancopoulos et al. 2005 (DCJ)
• Bergeron et al. 2006 (DCJ)
Distances with indels (indel operations, e.g. insertion)
• El Mabrouk 2001 (inversion-indel distance)
• Yancopoulos et al. 2008 (“ghost-DCJ” distance)
• Braga et al. 2010 (DCJ-indel distance)
• Indels in these models are applied to blocks of markers
[Figure: examples of an inversion, a translocation and an insertion on small marker sequences.]
Triangle Inequality
When indel operations of multiple markers are allowed, the triangle inequality may be disrupted [Yancopoulos et al. 2008].
Example: A = a b c d e, B = a c d b e, C = a e
• dist(A, B) = 3 inversions
• dist(A, C) = 1 indel
• dist(C, B) = 1 indel
The triangle inequality dist(A, B) ≤ dist(A, C) + dist(C, B) would require 3 ≤ 2, so it fails here.
Is there a distance definition that does not disrupt the triangle inequality?
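The violation in this example can be checked mechanically. The following is a minimal sketch; the function name `satisfies_triangle` is an illustrative assumption, and the three numbers are simply the operation counts read off the example.

```python
def satisfies_triangle(d_ab, d_ac, d_cb):
    """Return True if dist(A, B) <= dist(A, C) + dist(C, B)."""
    return d_ab <= d_ac + d_cb


# Operation counts from the example: 3 inversions, 1 indel, 1 indel.
example = {"AB": 3, "AC": 1, "CB": 1}

holds = satisfies_triangle(example["AB"], example["AC"], example["CB"])
print(holds)  # False: going through C is "cheaper" than the direct path
```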
Double cut and join with indels
The adjacency graph AG(A, B): [Figure: adjacency graph of two genomes A and B; in one component P the unique markers form runs (A-runs and B-runs).]
• Sorting A into B
• Only common markers:
  • Minimum number of DCJs: $d_{\text{DCJ}}(A, B) = n_{AB} - (\#\text{cycles} + \#AB\text{-paths}/2)$ [Bergeron et al. 2006]
• Including unique markers:
  • DCJ + indel operations: $d_{\text{DCJ-id}}(A, B) \le d_{\text{DCJ}}(A, B) + \sum_{P} \lambda(P)$ [WABI 2010]
  • $\Lambda(P)$ = number of runs in component P
  • $\lambda(P) = \lceil (\Lambda(P) + 1)/2 \rceil$ is the term related to the number of markers added or removed
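For the common-marker case, the formula can be evaluated by building the adjacency graph and counting its cycles and AB-paths. The sketch below is not the authors' implementation. It assumes linear, non-empty chromosomes only, that both genomes contain exactly the same markers once each, and a hypothetical encoding in which a genome is a list of chromosomes, a chromosome is a list of marker names, and a leading "-" marks a reversed marker.

```python
def extremities(marker):
    """Return the two extremities of a signed marker in reading order."""
    name = marker.lstrip("-")
    tail, head = (name, "t"), (name, "h")
    return (head, tail) if marker.startswith("-") else (tail, head)


def adjacencies(genome):
    """List the adjacencies and telomeres of a genome with linear chromosomes."""
    result = []
    for chromosome in genome:
        ends = [e for marker in chromosome for e in extremities(marker)]
        result.append((ends[0],))                  # left telomere
        for i in range(1, len(ends) - 1, 2):
            result.append((ends[i], ends[i + 1]))  # inner adjacency
        result.append((ends[-1],))                 # right telomere
    return result


def dcj_distance(genome_a, genome_b):
    """n_AB - (#cycles + #AB-paths / 2), counted on the adjacency graph."""
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    # Map every extremity to the graph vertex of B that contains it.
    vertex_in_b = {e: len(adj_a) + j for j, adj in enumerate(adj_b) for e in adj}
    n_ab = len(vertex_in_b) // 2  # two extremities per marker

    # Union-find over all vertices; each shared extremity is one edge.
    parent = list(range(len(adj_a) + len(adj_b)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, adj in enumerate(adj_a):
        for e in adj:
            parent[find(i)] = find(vertex_in_b[e])

    # Per component, count telomeres of A and telomeres of B.
    telomeres = {}
    for idx, adj in enumerate(adj_a + adj_b):
        counts = telomeres.setdefault(find(idx), [0, 0])
        if len(adj) == 1:
            counts[0 if idx < len(adj_a) else 1] += 1

    cycles = sum(1 for c in telomeres.values() if c == [0, 0])
    ab_paths = sum(1 for c in telomeres.values() if c == [1, 1])
    return n_ab - (cycles + ab_paths // 2)


# A single inversion of marker b separates these two genomes.
print(dcj_distance([["a", "b", "c"]], [["a", "-b", "c"]]))  # 1
```

Components without telomeres are cycles, and components with one telomere from each genome are AB-paths; paths with both ends in the same genome do not enter the formula.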
A posteriori correction
Fixing the triangle inequality: prior work [JCB 2011]. Applying an a posteriori correction, the triangle inequality holds for the function $m_{\text{id}}(A, B) = d_{\text{DCJ-id}}(A, B) + k\,u(A, B)$ for any constant $k \ge 3/2$, where $u(A, B)$ is the number of unique markers in A and B. To improve the lower bound on k, we study the worst case for the inequality disruption.
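As a small illustration of the correction, the sketch below applies it to the example genomes A = a b c d e, B = a c d b e and C = a e. The function name `corrected_distance` is hypothetical, and the input distances 3, 1 and 1 are the illustrative operation counts from that example, used here as stand-ins for the DCJ-indel distances; u(A, B) = 0 because A and B share all markers, while b, c and d are unique when either genome is compared with C.

```python
from fractions import Fraction


def corrected_distance(distance, unique_markers, k=Fraction(3, 2)):
    """A posteriori correction: distance + k * (number of unique markers)."""
    return distance + k * unique_markers


m_ab = corrected_distance(3, 0)  # no unique markers between A and B
m_ac = corrected_distance(1, 3)  # b, c, d are unique to A
m_cb = corrected_distance(1, 3)  # b, c, d are unique to B

print(m_ab, m_ac + m_cb)     # 3 11
print(m_ab <= m_ac + m_cb)   # True
```

Exact fractions are used so that k = 3/2 does not introduce floating-point rounding.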
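Evaluation of k
• General case: three genomes A, B and C. [Figure: triangle with vertices A, B and C.]
• Worst case (suppose unichromosomal genomes): A and B are at maximum distance, $d_{\text{DCJ-id}}(A, B)$ = diameter, while C is at minimum distance from both, $d_{\text{DCJ-id}}(A, C) = 1$ and $d_{\text{DCJ-id}}(C, B) = 1$, with C = { } (the empty genome).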
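Finding the diameter / lowest k
1. The DCJ-indel distance is at most the number of vertices of AG(A,B). For each component P of AG(A,B), $d_{\text{DCJ-id}}(P) = d_{\text{DCJ}}(P) + \lambda(P) \le |P|$, hence $d_{\text{DCJ-id}}(A,B) \le \sum_{P} d_{\text{DCJ-id}}(P) \le \sum_{P} |P| = |AG(A,B)|$.
2. The number of vertices in the adjacency graph AG(A,B) is $2n_{AB} + 2$: (number of common markers + 1) in each genome. [Figure: example vertices of genome A.]
So, we have: $d_{\text{DCJ-id}}(A,B) \le |AG(A,B)| = 2n_{AB} + 2$.
3. The corrected distance $m_{\text{id}}$ satisfies the triangle inequality if $k \ge 1$. The inequality to satisfy is
$d_{\text{DCJ-id}}(A,C) + k\,u(A,C) + d_{\text{DCJ-id}}(B,C) + k\,u(B,C) \ge d_{\text{DCJ-id}}(A,B) + k\,u(A,B)$.
In the worst case (C empty, with $n_A$ and $n_B$ the numbers of markers unique to A and to B, so that $u(A,B) = n_A + n_B$) it becomes
$1 + 1 + k\,(2n_{AB} + n_A + n_B) \ge 2n_{AB} + 2 + k\,(n_A + n_B)$,
which simplifies to $2k\,n_{AB} \ge 2n_{AB}$, that is, $k \ge 1$.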
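The worst-case inequality of step 3 can be checked numerically for a few values of k. This is only a sanity check of the arithmetic, not a proof; the function name `worst_case_holds` and the tested range of n_AB are arbitrary choices made for the illustration.

```python
from fractions import Fraction


def worst_case_holds(k, n_ab, n_a=0, n_b=0):
    """Check 1 + 1 + k(2 n_AB + n_A + n_B) >= 2 n_AB + 2 + k(n_A + n_B)."""
    lhs = 1 + 1 + k * (2 * n_ab + n_a + n_b)
    rhs = 2 * n_ab + 2 + k * (n_a + n_b)
    return lhs >= rhs


results = {}
for k in (Fraction(1, 2), Fraction(1), Fraction(3, 2)):
    # Try every number of common markers from 1 to 50.
    results[k] = all(worst_case_holds(k, n_ab) for n_ab in range(1, 51))

for k, ok in results.items():
    print(f"k = {k}: inequality holds for all tested n_AB -> {ok}")
```

With k = 1 both sides are equal for every n_AB, which is why the bound k ≥ 1 cannot be lowered in this worst case.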
Framework to assign weights to indels
Let $w(\rho)$ be the weight of an operation $\rho$.
• For any organizational operation: $w(\rho) = 1$
• For indels: $w(\rho) = p + k\,m(\rho)$, where $m(\rho)$ is the number of markers inserted or deleted by $\rho$.
[Figure: an insertion of two markers, $m(\rho) = 2$.]
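The weight function above translates directly into code. In this sketch the `Operation` class, its `kind` labels and the sample scenario are all hypothetical, chosen only to show how weights add up over a sequence of operations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    kind: str         # "dcj", "insertion" or "deletion" (illustrative labels)
    markers: int = 0  # m(rho): markers inserted or deleted, 0 for a DCJ


def weight(op, p=1, k=1):
    """w(rho) = p + k * m(rho) for indels, 1 for organizational operations."""
    if op.kind in ("insertion", "deletion"):
        return p + k * op.markers
    return 1


def scenario_weight(ops, p=1, k=1):
    """Total weight of a sequence of operations."""
    return sum(weight(op, p, k) for op in ops)


scenario = [Operation("dcj"), Operation("insertion", 2), Operation("deletion", 3)]
total = scenario_weight(scenario, p=1, k=1)
print(total)  # 8 = 1 + (1 + 2) + (1 + 3)
```

With p = 1 the total splits into the number of operations (3) plus k times the number of markers moved by indels (5).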
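Distance on Hybrid Model
Assuming p = 1, the weight of a sorting sequence splits into two parts:
• the number of operations, whose minimum is $d_{\text{DCJ-indel}}$;
• the marker term $k\,(m(\rho_2) + m(\rho_3) + \dots + m(\rho_n)) = k\,u(A,B)$.
Hence $d^{H}_{p,k}(A,B) = d^{H}_{p,0}(A,B) + k\,u(A,B)$.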
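More plausible distances?
Example: A = a b c d e, B = a c d b e, C = a e
• Unweighted indels: dist(A, B) = 3 inversions, dist(A, C) = 1 indel, dist(C, B) = 1 indel
• “ghost-DCJ” model: dist(A, B) = 3, dist(A, C) = 2, dist(C, B) = 2
• DCJ-indel model (k = 1): dist(A, B) = 3, dist(A, C) = 4, dist(C, B) = 4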
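The value 4 on the two indel sides of the DCJ-indel triangle follows from the indel weight p + k·m with p = 1, k = 1 and m = 3 (the block b, c, d). The sketch below recomputes it and compares k = 0 with k = 1; the helper names are hypothetical, and the value 3 for the A to B side is taken from the example rather than computed.

```python
def indel_weight(markers, p=1, k=1):
    """Weight of one indel that inserts or deletes `markers` markers."""
    return p + k * markers


def triangle_gap(d_ab, d_ac, d_cb):
    """Slack of the triangle inequality; a negative value means a violation."""
    return d_ac + d_cb - d_ab


gaps = {}
for k in (0, 1):
    side = indel_weight(3, p=1, k=k)  # delete b c d from A, or insert c d b into C
    gaps[k] = triangle_gap(3, side, side)

print(gaps)  # {0: -1, 1: 5}
```

With k = 0 every indel costs 1 and the detour through C is shorter than the direct path; with k = 1 each side costs 4 and the inequality holds with slack 5.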
Conclusion
• DCJ-indel distance is a metric for k ≥ 1 (assuming p = 1)
• A posteriori distance correction is equivalent to the hybrid model
• Similar results for DCJ-substitution distance (see talk by Marília Braga, Sunday)
• Open:
  • p ≠ 1
  • Other weight functions
  • Inversion-indel distance