Faster 2-Dimensional Scaled Matching
Amihood Amir and Eran Chencinski

Real Scaling
• Given an n x n text T and an m x m pattern P, find all occurrences of P in T, scaled to any real scale
• Best known algorithm [Amir et al.]:
• Time: O(nm^3 + n^2 m log m) Space: O(nm^3 + n^2)
• Our algorithm:
• Time: O(n^2 m) Space: O(n^2)

Scaling – Algebraic Definition
• Rounding Function:

Scaling – Algebraic Definition
• Given a pattern P of size m x m and a scale r
• The first row would be scaled to ||1*r|| rows
• The first 2 rows would be scaled to ||2*r|| rows
• …
• The first m rows would be scaled to ||m*r|| rows
• Similarly on the columns

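The row rule above can be sketched in a few lines of Python. The rounding function itself is not preserved in this transcript, so the sketch assumes ||x|| rounds to the nearest integer with halves rounded up; the names `rnd` and `scale_pattern` are illustrative, and a scale r >= 1 is assumed. Since the first i rows occupy ||i*r|| scaled rows, row i on its own occupies ||i*r|| - ||(i-1)*r|| rows, and likewise for columns.

```python
import math


def rnd(x):
    """Assumed rounding ||x||: floor if the fractional part is < 0.5, else ceiling."""
    f = math.floor(x)
    return f if 2 * (x - f) < 1 else f + 1


def scale_pattern(P, r):
    """Scale an m x m pattern P (list of rows) by a real factor r >= 1."""
    m = len(P)
    # bounds[i] = ||i*r||: how many scaled rows/columns the first i originals occupy
    bounds = [rnd(i * r) for i in range(m + 1)]
    scaled = []
    for i in range(m):
        row = []
        for j in range(m):
            # column j is repeated ||(j+1)*r|| - ||j*r|| times
            row.extend([P[i][j]] * (bounds[j + 1] - bounds[j]))
        # row i is repeated ||(i+1)*r|| - ||i*r|| times
        for _ in range(bounds[i + 1] - bounds[i]):
            scaled.append(list(row))
    return scaled


if __name__ == "__main__":
    for line in scale_pattern([["a", "b"], ["c", "d"]], 1.5):
        print("".join(line))
```

With r = 1.5 and m = 2 the boundaries are ||1.5|| = 2 and ||3|| = 3, so the first row and column are doubled while the second are kept once, giving a 3 x 3 scaled pattern.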
Scaling – Algebraic Definition
• Rounding Function:
• Inverse Rounding Function: suppose we know that K rows were scaled to L rows:

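The inverse formula is missing from this transcript, but it can be derived if one assumes that ||x|| rounds to the nearest integer with halves rounded up. Under that assumption, ||K*r|| = L holds exactly when L - 0.5 <= K*r < L + 0.5, so the scales consistent with "K rows were scaled to L rows" form the half-open interval [(L - 0.5)/K, (L + 0.5)/K). A minimal sketch, with the hypothetical names `rnd` and `scale_interval` and exact rational arithmetic to avoid floating-point edge cases:

```python
import math
from fractions import Fraction


def rnd(x):
    """Assumed rounding ||x||: floor if the fractional part is < 0.5, else ceiling."""
    f = math.floor(x)
    return f if 2 * (x - f) < 1 else f + 1


def scale_interval(K, L):
    """Return (lo, hi) such that ||K*r|| == L exactly for lo <= r < hi."""
    half = Fraction(1, 2)
    return (L - half) / K, (L + half) / K


if __name__ == "__main__":
    lo, hi = scale_interval(2, 3)
    print(lo, hi)  # interval of scales that turn 2 rows into 3 rows
```

For K = 2 and L = 3 the interval is [5/4, 7/4): the lower end is included because 2 * 5/4 = 2.5 rounds up to 3, and the upper end is excluded because 2 * 7/4 = 3.5 rounds up to 4.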
Subrow/Subcolumn Repetition Query
Query time: O(1), preprocessing time: O(n^2)

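The slide does not spell out the query, so the following is only a sketch under one plausible reading: a subrow repetition query asks whether the subrow of a given length starting at position (i, j) consists of a single repeated symbol. Under that assumption, a table of run lengths computed once in time linear in the text size answers each query with one comparison. The names `build_run_table` and `is_subrow_repetition` are illustrative; the subcolumn version is the same idea applied to the transposed text.

```python
def build_run_table(T):
    """For each cell, the length of the run of equal symbols starting there, going right."""
    runs = []
    for row in T:
        n = len(row)
        run = [1] * n
        # scan right to left so run[j + 1] is already final when run[j] is computed
        for j in range(n - 2, -1, -1):
            if row[j] == row[j + 1]:
                run[j] = run[j + 1] + 1
        runs.append(run)
    return runs


def is_subrow_repetition(runs, i, j, length):
    """O(1) query: is T[i][j : j + length] one symbol repeated `length` times?"""
    return runs[i][j] >= length


if __name__ == "__main__":
    table = build_run_table(["aaab", "abbb"])
    print(is_subrow_repetition(table, 0, 0, 3))
    print(is_subrow_repetition(table, 0, 0, 4))
```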
Algorithm Layout
The algorithm consists of 4 stages:
1. Scale Elimination
2. Candidate Consistency
3. Candidate Verification
4. Occurrence Recognition
Each stage takes O(n^2 m) time and O(n^2) space

Scale Elimination Stage
[Figures: the pivot; text location (i,j)]
O(m) time for each location, O(n^2 m) total, O(n^2) space

Candidate Consistency Stage
[Figure: Case (a) and Case (b)]

Witness Table Construction
For each suffix: O(m^2) time and O(m) space

Pre-Dueling Step
For each candidate c in T:
    For each suffix s of P:
        Compare c’s borders with the witness table borders of suffix s
        If the borders are not the same, c is eliminated
Can be done in O(m) time for each candidate

The Dueling Order
Each candidate performs at most O(m) successful duels

Candidate Consistency Stage
Witness table construction:
• O(m^3) time, O(m^2) space
Pre-dueling step:
• O(n^2 m) time, O(m^2) space
Number of duels:
• At most O(n) unsuccessful and at most O(n^2 m) successful, where each duel takes O(1) time
Total: O(n^2 m) time, O(n^2) space

Candidate Verification Stage
For each location, find the maximal containing interval
Can be solved in O(n) time per row using a solution to the Maximal Interval Problem

Candidate Verification Stage
Once we find the largest interval we:
• Verify each row in O(m) time, using subcolumn repetition queries
• Save the longest matching length
• For each candidate, run a Range Minimum Query on the lengths
The pattern appears iff pattern size >= RMQ

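To make the Range Minimum Query step concrete, here is a minimal sketch of one standard RMQ structure, a sparse table. It is a swapped-in technique: it answers queries in O(1) but needs O(n log n) preprocessing per row, so it is not the linear-preprocessing structure the stated bounds rely on. The names `build_sparse_table` and `range_min` are illustrative, and the sample lengths are made up.

```python
def build_sparse_table(values):
    """table[k][i] holds min(values[i : i + 2**k])."""
    n = len(values)
    table = [list(values)]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        # a block of size 2**k is the union of two blocks of size 2**(k-1)
        table.append([min(prev[i], prev[i + half]) for i in range(n - (1 << k) + 1)])
        k += 1
    return table


def range_min(table, lo, hi):
    """Minimum of values[lo..hi] (inclusive), using two overlapping power-of-two blocks."""
    k = (hi - lo + 1).bit_length() - 1
    return min(table[k][lo], table[k][hi - (1 << k) + 1])


if __name__ == "__main__":
    lengths = [5, 3, 4, 6, 2, 7]  # hypothetical per-row matching lengths
    table = build_sparse_table(lengths)
    print(range_min(table, 0, 2))
```

A candidate that spans rows lo..hi would then compare `range_min(table, lo, hi)` against the pattern size in a single step.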
Candidate Verification Stage
Finding largest intervals:
• O(n) time per row, O(n^2) total
Verifying columns:
• O(nm) time per row, O(n^2 m) total
RMQ:
• Preprocessing: O(n) time per row, O(n^2) total
• Querying: O(1) time per candidate, O(n^2) total
Total: O(n^2 m) time, O(n^2) space

Occurrence Recognition Stage
Recall: the Scale Elimination stage returned [figure]
At most O(m) steps per candidate
Total: O(n^2 m) time

Conclusions
The algorithm consists of 4 stages:
1. Scale Elimination
2. Candidate Consistency
3. Candidate Verification
4. Occurrence Recognition
Each stage takes O(n^2 m) time and O(n^2) space