Less Than Matching Orgad Keller
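Less Than Matching • Input: A text $T = t_0 t_1 \cdots t_{n-1}$, a pattern $P = p_0 p_1 \cdots p_{m-1}$ over alphabet $\Sigma$ with order relation $\le$. • Output: All locations $i$ where $p_j \le t_{i+j}$ for every $j = 0, \ldots, m-1$. • Can we use the regular methods? Orgad Keller - Algorithms 2 - Recitation 12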
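As a baseline for the definition above, here is a minimal sketch of the naive check, assuming a match at location i means every pattern symbol is no greater than the text symbol aligned with it. The function name `less_than_match_naive` is illustrative and not from the slides.

```python
def less_than_match_naive(text, pattern):
    """Return every location i with pattern[j] <= text[i + j] for all j.

    Assumed reading of the problem: the pattern symbol must not exceed
    the text symbol it is aligned with. Runs in O(n * m) time.
    """
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if all(pattern[j] <= text[i + j] for j in range(m))
    ]


print(less_than_match_naive([3, 1, 4, 1, 5, 9, 2, 6], [1, 3]))
# prints [1, 3, 4, 6]
```

Any faster method can be checked against this function on small inputs.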
Transitivity • Less Than Matching is in fact transitive, but that is not enough for us: $a \le c$ and $b \le c$ does not imply anything about the relation between $a$ and $b$.
Approach • A good approach for solving Pattern Matching problems is sometimes solving: • The problem for a binary alphabet $\Sigma = \{0, 1\}$. • The problem for a bounded alphabet. • The problem for an unbounded alphabet. In that order.
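Binary Alphabet • The only case that prevents a match at location $i$ is the case where, for some $j$: $p_j = 1$ and $t_{i+j} = 0$. • This is equivalent to: $p_j \cdot \overline{t_{i+j}} = 1$, where $\overline{t} = 1 - t$. • So how can we solve this case?
Binary Alphabet • So if $\sum_{j=0}^{m-1} p_j \cdot \overline{t_{i+j}} > 0$, there is no match at $i$. • We can calculate $\overline{T}$, the bitwise complement of $T$. • Then we’ll calculate the correlation $P \otimes \overline{T}$ using FFT. • We’ll return all locations $i$ where $(P \otimes \overline{T})[i] = 0$. • Time: $O(n \log m)$.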
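The binary case can be sketched as follows, assuming a mismatch is a pattern 1 aligned with a text 0. The small recursive FFT and the names `fft`, `correlate` and `binary_less_than_match` are illustrative helpers, not part of the slides. This version transforms the whole text at once, so it runs in O(n log n); the O(n log m) bound additionally relies on splitting the text into segments.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def correlate(text, pattern):
    """Return c[i] = sum_j pattern[j] * text[i + j] for i = 0 .. n - m."""
    n, m = len(text), len(pattern)
    size = 1
    while size < n + m:
        size *= 2
    fa = fft(list(text) + [0] * (size - n))
    # Reversing the pattern turns the convolution into a correlation.
    fb = fft(list(reversed(pattern)) + [0] * (size - m))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    conv = [round(v.real / size) for v in prod]  # undo the FFT scaling
    return conv[m - 1 : n]


def binary_less_than_match(text, pattern):
    """Locations where no pattern 1 is aligned with a text 0."""
    complement = [1 - t for t in text]
    return [i for i, c in enumerate(correlate(complement, pattern)) if c == 0]


print(binary_less_than_match([1, 0, 1, 1, 0, 1], [0, 1, 1]))
# prints [1]
```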
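Bounded Alphabet • We need reductions to binary alphabet. • For each $\sigma \in \Sigma$ we’ll define: $T_\sigma[i] = 1$ if $t_i \ge \sigma$ and $0$ otherwise, and $P_\sigma[j] = 1$ if $p_j \ge \sigma$ and $0$ otherwise. • We notice $T_\sigma, P_\sigma$ are binary.
Bounded Alphabet • Theorem: $P$ (less than) matches $T$ at location $i$ if and only if, for every $\sigma \in \Sigma$, $P_\sigma$ (less than) matches $T_\sigma$ at location $i$. • Proof: $P$ does not match at $i$ iff there is a $j$ with $p_j > t_{i+j}$. That is true iff there is a $\sigma$ (take $\sigma = p_j$) with $P_\sigma[j] = 1$ and $T_\sigma[i+j] = 0$, meaning that $P_\sigma$ does not (less than) match $T_\sigma$ at location $i$.
Bounded Alphabet • So for each $\sigma \in \Sigma$, we’ll run the binary alphabet algorithm on $T_\sigma, P_\sigma$. • We’ll return only the locations that matched in all iterations. • Time: $O(|\Sigma| \, n \log m)$.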
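The reduction can be sketched like this, assuming the threshold encoding "symbol >= sigma maps to 1". To keep the example short, `binary_match` is a direct O(nm) stand-in for the FFT-based binary algorithm, and all names are illustrative. Only symbols that occur in the pattern are tried, since a violation at p_j is always witnessed by sigma = p_j.

```python
def binary_match(text, pattern):
    """Stand-in for the FFT step: locations with no (pattern 1, text 0) pair."""
    n, m = len(text), len(pattern)
    return {
        i
        for i in range(n - m + 1)
        if not any(pattern[j] == 1 and text[i + j] == 0 for j in range(m))
    }


def bounded_less_than_match(text, pattern):
    """Intersect the binary matches over every threshold sigma."""
    n, m = len(text), len(pattern)
    result = set(range(n - m + 1))
    for sigma in set(pattern):
        # Assumed encoding: 1 marks symbols that are at least sigma.
        t_sigma = [1 if t >= sigma else 0 for t in text]
        p_sigma = [1 if p >= sigma else 0 for p in pattern]
        result &= binary_match(t_sigma, p_sigma)
    return sorted(result)


print(bounded_less_than_match([3, 1, 4, 1, 5, 9, 2, 6], [1, 3]))
# prints [1, 3, 4, 6]
```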
Unbounded Alphabet • Running the bounded alphabet algorithm could result in an $O(m \cdot n \log m)$ time algorithm (we’ll run it only for alphabet symbols which are actually in the pattern). • That can be worse than the naïve $O(nm)$ algorithm. • We present an improvement on the next slides.
Abrahamson-Kosaraju Method • First, use the segment splitting trick. Therefore we can assume $n = 2m$. • For each location $i$ in the text, we’ll produce a triplet $(\sigma, i, \text{'t'})$, where $\sigma = t_i$. • For each location $j$ in the pattern, we’ll produce a triplet $(\sigma, j, \text{'p'})$, where $\sigma = p_j$. • We now have $3m$ triplets all together.
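The slides do not spell out the segment splitting trick. A common form, assumed here, cuts the text into overlapping windows of length 2m that start every m positions, so each candidate location falls entirely inside one window. The names `match_by_segments` and `naive` are hypothetical, and a non-empty pattern is assumed.

```python
def naive(text, pattern):
    """Reference matcher used per segment in this sketch."""
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if all(pattern[j] <= text[i + j] for j in range(m))
    ]


def match_by_segments(text, pattern, matcher):
    """Run `matcher` on overlapping segments of length 2m (assumes m >= 1)."""
    n, m = len(text), len(pattern)
    found = set()
    for start in range(0, max(n - m + 1, 0), m):
        segment = text[start : start + 2 * m]
        # Shift segment-local locations back to text coordinates.
        found.update(start + i for i in matcher(segment, pattern))
    return sorted(found)


print(match_by_segments([3, 1, 4, 1, 5, 9, 2, 6], [1, 3], naive))
# prints [1, 3, 4, 6]
```

With about n/m segments, a per-segment cost of f(m) gives a total of O((n/m) f(m)).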
Abrahamson-Kosaraju Method • We’ll hold all triplets together. • Sort all triplets according to symbol. • We’ll define a symbol that has more than $\sqrt{m}$ triplets as a “frequent symbol”. • There are $O(\sqrt{m})$ frequent symbols. • Put all frequent symbols’ triplets aside.
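A minimal sketch of these steps, assuming triplets of the form (symbol, location, 't' or 'p') and a frequency threshold of the integer square root of m. Both the triplet layout and the threshold are assumptions, and the function names are illustrative.

```python
from collections import Counter
from math import isqrt


def build_triplets(text, pattern):
    """One (symbol, location, origin) triplet per text and pattern position."""
    triplets = [(s, i, "t") for i, s in enumerate(text)]
    triplets += [(s, j, "p") for j, s in enumerate(pattern)]
    triplets.sort(key=lambda tr: tr[0])  # sort by symbol only
    return triplets


def split_frequent(triplets, m):
    """Separate frequent symbols from the rest (assumed threshold: sqrt(m))."""
    threshold = isqrt(m)
    counts = Counter(s for s, _, _ in triplets)
    frequent = {s for s, c in counts.items() if c > threshold}
    rare = [tr for tr in triplets if tr[0] not in frequent]
    return frequent, rare


text, pattern = [5, 1, 5, 2, 5, 3, 5, 4], [5, 1, 2, 5]
frequent, rare = split_frequent(build_triplets(text, pattern), len(pattern))
print(frequent)                    # prints {5}
print([s for s, _, _ in rare])     # prints [1, 1, 2, 2, 3, 4]
```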
Abrahamson-Kosaraju Method • Split non-frequent symbols’ triplets into groups of size $\Theta(\sqrt{m})$ in the following manner:
Abrahamson-Kosaraju Method • The rule is that there can’t be two triplets of the same symbol in different groups.
Abrahamson-Kosaraju Method • For each such group, choose the symbol of the first triplet in the group as the group’s representative. • For instance, in the previous example, groups 1 and 2 are each represented by the symbol of their first triplet. • There are $O(\sqrt{m})$ representatives all together.
Abrahamson-Kosaraju Method • To sum up: • $O(\sqrt{m})$ frequent symbols. • $O(\sqrt{m})$ representatives of non-frequent symbols. • We’ll swap each non-frequent symbol in pattern and text with its representative. • Now our text and pattern are over an $O(\sqrt{m})$-sized alphabet.
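The grouping, the choice of representatives and the relabeling can be sketched as below. It assumes the non-frequent triplets are already sorted by symbol and uses a target group size passed in by the caller; the function names are illustrative. A group is closed only at a symbol boundary, so a symbol never spans two groups.

```python
def group_rare_triplets(rare, group_size):
    """Cut symbol-sorted triplets into groups without splitting a symbol."""
    groups = []
    for tr in rare:
        last = groups[-1] if groups else None
        if last and (last[-1][0] == tr[0] or len(last) < group_size):
            last.append(tr)
        else:
            groups.append([tr])
    return groups


def representatives(groups):
    """Map each symbol to the symbol of the first triplet in its group."""
    return {s: group[0][0] for group in groups for s, _, _ in group}


def relabel(seq, rep):
    """Replace non-frequent symbols by representatives; others stay as-is."""
    return [rep.get(s, s) for s in seq]


rare = [(1, 1, "t"), (1, 1, "p"), (2, 3, "t"), (2, 2, "p"), (3, 5, "t"), (4, 7, "t")]
rep = representatives(group_rare_triplets(rare, 2))
print(rep)                                        # prints {1: 1, 2: 2, 3: 3, 4: 3}
print(relabel([5, 1, 5, 2, 5, 3, 5, 4], rep))     # prints [5, 1, 5, 2, 5, 3, 5, 3]
```

If every non-frequent symbol has at most `group_size` triplets, no group exceeds twice that size.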
Abrahamson-Kosaraju Method • We want to run our algorithm over the new text and pattern to count the mismatches between symbols of different groups. • But we have a problem: • Let’s say $f$ is a frequent symbol, but a group contains symbols on both sides of $f$:
Abrahamson-Kosaraju Method • The representative of group 2 is some symbol $a$, which is smaller than the frequent symbol $f$, but the group also contains a symbol $b$ which is greater than $f$.
Abrahamson-Kosaraju Method • In that case we’ll split group 2 into two groups with their own representatives. • Since we performed at most $O(\sqrt{m})$ such splits, we still have $O(\sqrt{m})$ representatives.
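A sketch of the extra split, assuming each group is a list of symbol-sorted triplets and `frequent` is the set of frequent symbols. The function name is hypothetical. Each frequent symbol can fall between two consecutive triplets only once, so the number of new groups is bounded by the number of frequent symbols.

```python
def split_at_frequent(groups, frequent):
    """Split any group whose symbols lie on both sides of a frequent symbol."""
    result = []
    for group in groups:
        current = [group[0]]
        for prev, tr in zip(group, group[1:]):
            # A frequent symbol strictly between two neighbours forces a cut.
            if any(prev[0] < f < tr[0] for f in frequent):
                result.append(current)
                current = []
            current.append(tr)
        result.append(current)
    return result


groups = [
    [(1, 0, "t"), (3, 1, "t"), (4, 0, "p")],
    [(6, 2, "t"), (9, 1, "p")],
]
for g in split_at_frequent(groups, {2, 7}):
    print([s for s, _, _ in g])
# prints [1], then [3, 4], then [6], then [9]
```

After this step, comparing a representative with a frequent symbol gives the same answer as comparing any symbol of that group with it.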
Abrahamson-Kosaraju Method • We can now run our algorithm over the new text and pattern in $O(\sqrt{m} \cdot m \log m)$. • But we still haven’t handled comparisons between two non-frequent symbols that are in the same group.
Abrahamson-Kosaraju Method • We’ll do so naively in each group: • For each triplet $(\sigma, i, \text{'t'})$ in the group • For each triplet of the form $(\sigma', j, \text{'p'})$ in the group, if $\sigma' > \sigma$, then add an error at location $i - j$. • Time: $O(m \sqrt{m})$.
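The in-group correction can be sketched as follows, assuming triplets of the form (symbol, location, 't' or 'p') and that an error means a pattern symbol greater than the text symbol aligned with it. The name `in_group_errors` and the explicit range check on the location are additions for this example.

```python
def in_group_errors(groups, n, m):
    """Count, per location, mismatches between symbols of the same group."""
    errors = [0] * (n - m + 1)
    for group in groups:
        text_side = [(s, i) for s, i, kind in group if kind == "t"]
        pattern_side = [(s, j) for s, j, kind in group if kind == "p"]
        for s_t, i in text_side:
            for s_p, j in pattern_side:
                loc = i - j  # alignment that places p_j under t_i
                if s_p > s_t and 0 <= loc <= n - m:
                    errors[loc] += 1
    return errors


# Text [3, 4] and pattern [4, 3] in a single group: one error at location 0.
group = [(3, 0, "t"), (3, 1, "p"), (4, 1, "t"), (4, 0, "p")]
print(in_group_errors([group], n=2, m=2))
# prints [1]
```

Each triplet is compared only with the other triplets of its own group, which is where the bound of group size times number of triplets comes from.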
Running Time • For one segment: • Sorting the triplets and representatives: $O(m \log m)$. • Running the algorithm: $O(m \sqrt{m} \log m)$. • Correcting results (adding in-group errors): $O(m \sqrt{m})$. • Overall for one segment: $O(m \sqrt{m} \log m)$. • Overall for all segments: $O(n \sqrt{m} \log m)$.
Running Time • We can improve to $O(n \sqrt{m \log m})$. • Left as an exercise.