Function Matching

Amihood Amir, Yonatan Aumann, Moshe Lewenstein, Ely Porat
Bar Ilan University

Baker’s Parameterized Matching
Prog.c:
    int a,b;
    a=1;
    a = g(a)*5+f(a);
    b=2;
    a = func(a,b);
    a = a*g(b);
    b=1;
    b = g(b)*5+f(b);
    ….

Baker’s Parameterized Matching
Pattern:
    c=1;
    c = g(c)*5+f(c);
Prog.c:
    int a,b;
    a=1;
    a = g(a)*5+f(a);
    b=2;
    a = func(a,b);
    a = a*g(b);
    b=1;
    b = g(b)*5+f(b);
    ….
Baker’s work: pdup, dupstat, psearch (SICOMP 1997, JCSS 1996)

Two dimensional parameterized matching
[Figure: pattern]
‘A horse is a horse, it ain’t make a difference what color it is’ (John Wayne)

Parameterized Matching
Input: P = p_1…p_m over alphabet Σ_P, T = t_1…t_n over alphabet Σ_T
Output: locations i of T for which a bijection π: Σ_P → Σ_T exists s.t. π(P) = π(p_1)π(p_2)…π(p_m) = t_i…t_{i+m-1}

Parameterized Matching
• One dimensional
  • Baker 1996, JCSS - Suffix Trees
  • Baker 1997, SICOMP - Boyer Moore
  • Amir, Farach, Muthu 1995, IPL - Knuth-Morris-Pratt
• Two dimensional
  Regular methods fail !!

Function Matching
Input: P = p_1…p_m over alphabet Σ_P, T = t_1…t_n over alphabet Σ_T
Output: locations i of T where a function f: Σ_P → Σ_T exists s.t. f(P) = f(p_1)f(p_2)…f(p_m) = t_i…t_{i+m-1}

Function Matching (example)
P = h e h a e h
T = a b c b a c b a d a b d a d d a d

Function Matching (example, continued)
P = hehaeh, T = a b c b a c b a d a b d a d d a d
Match at location 2 (b c b a c b): f(h) = b, f(e) = c, f(a) = a

Function Matching (example, continued)
P = hehaeh, T = a b c b a c b a d a b d a d d a d
Match at location 8 (a d a b d a): f(h) = a, f(e) = d, f(a) = b

Function Matching (example, continued)
P = hehaeh, T = a b c b a c b a d a b d a d d a d
Match at location 12 (d a d d a d): f(h) = d, f(e) = a, f(a) = d

Function Matching (example, continued)
P = h e h a e h, T = a b c b a c b a d a b d a d d a d
No match! Wherever the text symbols beneath the h’s differ (e.g. location 1, where they are a, c, c), no value can be chosen for f(h): f(h) = ??

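Function Matching vs. Parameterized Matching
P p-matches t_i…t_{i+m-1} iff
1. P f-matches t_i…t_{i+m-1}, and
2. # of distinct symbols in t_i…t_{i+m-1} = # of distinct symbols in P
Example with P = hehaeh and T = a b c b a c b a d a b d a d d a d:
• bcbacb (location 2): f(h) = b, f(e) = c, f(a) = a; three distinct symbols, so it is also a p-match
• daddad (location 12): f(h) = d, f(e) = a, f(a) = d; only two distinct symbols, so it is an f-match but not a p-match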
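
The two conditions above can be checked on a single window as in the following minimal Python sketch. The helper names `find_function` and `is_p_match` are hypothetical and introduced only for illustration; they are not from the slides.

```python
def find_function(pattern, window):
    """Return a dict f with f(pattern[j]) == window[j] for all j, or None if no such f exists."""
    f = {}
    for p, t in zip(pattern, window):
        # setdefault records the first text symbol seen under p; later ones must agree
        if f.setdefault(p, t) != t:
            return None
    return f


def is_p_match(pattern, window):
    """A p-match is an f-match whose window has as many distinct symbols as the pattern."""
    f = find_function(pattern, window)
    return f is not None and len(set(window)) == len(set(pattern))
```

For instance, `find_function("hehaeh", "daddad")` yields a function that sends both h and a to d, so `is_p_match` rejects that window.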

Naïve Algorithm
At each location i of text T, check if the pattern f-matches.
Check:
• For each letter ‘a’ in the pattern: are the text elements aligned with the pattern’s ‘a’s all the same? No? Declare ‘no match’.
• All letters “OK”: declare ‘match’.
Running time: O(nm), where m = |P| and n = |T|
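
A minimal Python sketch of the naïve check described above, assuming 1-based reported locations as in the slides. The function name `naive_function_matching` is hypothetical.

```python
from collections import defaultdict


def naive_function_matching(text, pattern):
    """Return the 1-based locations of text where pattern f-matches, in O(nm) time."""
    n, m = len(text), len(pattern)
    # Group pattern positions by letter once; the groups partition 0..m-1.
    groups = defaultdict(list)
    for j, a in enumerate(pattern):
        groups[a].append(j)
    matches = []
    for i in range(n - m + 1):
        # Every text element aligned with a given letter must equal the first such element.
        if all(text[i + j] == text[i + pos[0]]
               for pos in groups.values() for j in pos):
            matches.append(i + 1)
    return matches
```

On the running example, `naive_function_matching("abcbacbadabdaddad", "hehaeh")` reports the three locations shown on the earlier slides.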

Function Matching with Don’t Cares
Input: P = p_1…p_m over alphabet Σ_P ∪ {?}, T = t_1…t_n over alphabet Σ_T
Output: locations i of T where f: Σ_P → Σ_T exists s.t. f(P) = f(p_1)f(p_2)…f(p_m) = t_i…t_{i+m-1}, with ? acting as a wildcard
Example: P = h e ? ? e h, T = a b c b a c b c d b c d a d d a d; match at location 6 (c b c d b c) with f(h) = c, f(e) = b

Why do we need don’t cares?
[Figure: Pattern and Text]

Linearize Text and Pattern
[Figure: an n × n Text and an m × m Pattern, with rows labelled Line 1, Line 2, …]
T = Line 1, Line 2, …, Line 5, Line 6, … (the text rows written one after another)
P = Line 1 ? … ? Line 2 ? … ? … (each pattern row of m symbols followed by n-m don’t cares)
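
The linearization can be sketched in Python as follows. This is an illustration under assumptions: the text and pattern are given as lists of equal-length rows, the helper name `linearize` is hypothetical, and padding is inserted only between pattern rows (not after the last one).

```python
def linearize(text_rows, pattern_rows, wildcard="?"):
    """Flatten a 2-D text and a 2-D pattern into 1-D sequences.

    Each pattern row is followed by (n - m) wildcards, where n is the text
    width and m the pattern width, so that consecutive pattern rows line up
    with consecutive text rows.
    """
    n = len(text_rows[0])      # text width
    m = len(pattern_rows[0])   # pattern width
    text = [c for row in text_rows for c in row]
    pattern = []
    for r, row in enumerate(pattern_rows):
        pattern.extend(row)
        if r < len(pattern_rows) - 1:
            pattern.extend([wildcard] * (n - m))
    return text, pattern
```

A 1-D match at 0-based linear location r*n + c corresponds to the 2-D location (r, c); locations with c > n - m wrap around a row boundary and would have to be discarded.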

Polynomial Multiplication - Convolutions
Multiply t_1 t_2 … t_n by the reversed pattern p_m p_{m-1} … p_2 p_1, as in long multiplication. Each pattern symbol p_j contributes one row of products p_jt_1 p_jt_2 … p_jt_n, shifted one position relative to the previous row.
Running time: O(n log m)

Convolutions: Fischer-Paterson [1974]
Place p_1 p_2 p_3 … p_m beneath t_i t_{i+1} … t_{i+m-1}. The column that contains p_1t_i also contains p_2t_{i+1}, p_3t_{i+2}, …, p_mt_{i+m-1}, so its sum is Σ_{j=1..m} p_j·t_{i+j-1}: the value for location i.
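
As an illustration of how all the column sums can be computed at once, here is a small pure-Python sketch that uses a textbook recursive FFT. The names `fft`, `convolve` and `correlate` are hypothetical. The sketch runs one FFT over the whole text, i.e. in O(n log n); the O(n log m) bound quoted on the slide would additionally need the text to be cut into overlapping pieces of length about 2m, which is not done here.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def convolve(x, y):
    """Product of two integer polynomials (coefficient lists) via FFT."""
    size = 1
    while size < len(x) + len(y) - 1:
        size *= 2
    fx = fft(list(x) + [0] * (size - len(x)))
    fy = fft(list(y) + [0] * (size - len(y)))
    prod = fft([u * v for u, v in zip(fx, fy)], invert=True)
    # The unnormalised inverse transform is divided by size and rounded to integers.
    return [round((c / size).real) for c in prod[: len(x) + len(y) - 1]]


def correlate(text, pattern):
    """result[i] = sum_j pattern[j] * text[i + j] for every full alignment i (0-based)."""
    m = len(pattern)
    full = convolve(text, pattern[::-1])   # multiply by the reversed pattern
    return full[m - 1: len(text)]          # entry i + m - 1 belongs to alignment i
```

The slice in `correlate` reflects the index shift that appears later in the slides as [i+m-1].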

How does this help for Function Matching?
The property that needs to be checked is: beneath each symbol from the pattern alphabet, all text characters must be the same.

Example - T = a b c b a c b a c a b d a d d a d e a P = h e h a e h ? e PR = e ? h e a h e h

Example - h in P vs. a in T
T = a b c b a c b a c a b d a d d a d e a
P = h e h a e h ? e
PR = e ? h e a h e h
Ta = 1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 0 1 (1 where T has an a)
PRh = 0 0 1 0 0 1 0 1 (1 where PR has an h)
Long multiplication of Ta by PRh: each 1 in PRh contributes a shifted copy of Ta and each 0 contributes a row of zeros. Summing the columns gives
0 0 1 0 0 1 1 1 0 2 0 2 1 0 3 0 1 2 0 1 2 0 1 1 0 1
Entries 8 through 19 are the numbers of a’s beneath the h’s for locations 1 through 12: 1 0 2 0 2 1 0 3 0 1 2 0
=> in O(n log m) time!!

Example - T = a b c b a c b a c a b d a d d a d e a, P = h e h a e h ? e, PR = e ? h e a h e h
Number of occurrences of each text symbol beneath the h’s, for locations 1 through 12:
h - a      1 0 2 0 2 1 0 3 0 1 2 0
h - b      0 3 0 1 1 1 1 0 1 0 1 0
h - c      2 0 1 2 0 1 1 0 1 0 0 0
h - d      0 0 0 0 0 0 1 0 1 2 0 3
Match(h)   0 1 0 0 0 0 0 1 0 0 0 1
=> in O(|Σ_T| n log m) time!!

In general - the Algorithm
• For each character ‘a’ in Σ_P create P_a
• For each character ‘b’ in Σ_T create T_b
• For all P_a and T_b, multiply them and construct Match(a) for each ‘a’ in Σ_P
• Announce each location i of T as a ‘match’ if Match(a)[i] = 1 for all a’s in P
=> in O(|Σ_P||Σ_T| n log m) time.
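
The steps above can be sketched in Python as follows. To keep the block short and self-contained, `correlate` is a direct O(nm) stand-in for the FFT-based convolution, so this sketch does not achieve the stated running time; it only illustrates the indicator-vector logic. All function names are hypothetical.

```python
def correlate(text, pattern):
    """Direct stand-in for a convolution: result[i] = sum_j pattern[j] * text[i + j]."""
    n, m = len(text), len(pattern)
    return [sum(pattern[j] * text[i + j] for j in range(m))
            for i in range(n - m + 1)]


def indicator_function_matching(text, pattern, wildcard="?"):
    """Return the 1-based locations where pattern f-matches text, via indicator vectors."""
    n, m = len(text), len(pattern)
    ok = [True] * (n - m + 1)
    for a in set(pattern) - {wildcard}:
        p_a = [1 if c == a else 0 for c in pattern]      # P_a
        count_a = sum(p_a)
        match_a = [False] * (n - m + 1)                  # Match(a)
        for b in set(text):
            t_b = [1 if c == b else 0 for c in text]     # T_b
            for i, v in enumerate(correlate(t_b, p_a)):
                # All occurrences of a sit above the same text symbol b.
                if v == count_a:
                    match_a[i] = True
        ok = [x and y for x, y in zip(ok, match_a)]
    return [i + 1 for i, x in enumerate(ok) if x]
```

On the example text and P = h e h a e h ? e, Match(h) holds at locations 2, 8 and 12, but the e’s rule out location 8.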

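Improvement
Lemma: Let a_1, ..., a_k ∈ ℕ. Then k·(a_1² + … + a_k²) = (a_1 + … + a_k)² iff a_i = a_j for all i, j.
Idea: Encode the text symbols as numbers. For each pattern symbol, use its indicator vector to compute, at every location, the sum of the text numbers beneath it and, separately, the sum of their squares.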

Improvement
Example: Compute the sum of the text characters beneath “e”
T# = 1 2 3 2 1 3 2 1 3 1 2 4 1 4 4 1 4 5 1
T  = a b c b a c b a c a b d a d d a d e a
P  = h e h a e h ? e
Pe = 0 1 0 0 1 0 0 1

Improvement
Example: Compute the sum of squares beneath “e”
T#² = 1 4 9 4 1 9 4 1 9 1 4 16 1 16 16 1 16 25 1
T#  = 1 2 3 2 1 3 2 1 3 1 2 4 1 4 4 1 4 5 1
T   = a b c b a c b a c a b d a d d a d e a
P   = h e h a e h ? e
Pe  = 0 1 0 0 1 0 0 1

Improvement
Running Time: Two convolutions for each pattern character. O(|Σ_P| n log m)
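
A Python sketch of the sum and sum-of-squares test. It assumes the identity k·Σa_i² = (Σa_i)² as the equality criterion and numbers the text symbols 1, 2, 3, … in sorted order, which reproduces T# on the example. As before, `correlate` is a direct stand-in for an FFT-based convolution and all names are hypothetical.

```python
def correlate(text, pattern):
    """Direct stand-in for a convolution: result[i] = sum_j pattern[j] * text[i + j]."""
    n, m = len(text), len(pattern)
    return [sum(pattern[j] * text[i + j] for j in range(m))
            for i in range(n - m + 1)]


def sums_function_matching(text, pattern, wildcard="?"):
    """Return the 1-based f-match locations using two correlations per pattern symbol."""
    codes = {c: k + 1 for k, c in enumerate(sorted(set(text)))}
    t_num = [codes[c] for c in text]          # T#
    t_sq = [v * v for v in t_num]             # T# squared
    n, m = len(text), len(pattern)
    ok = [True] * (n - m + 1)
    for a in set(pattern) - {wildcard}:
        p_a = [1 if c == a else 0 for c in pattern]
        k = sum(p_a)                          # number of occurrences of a in the pattern
        sums = correlate(t_num, p_a)
        squares = correlate(t_sq, p_a)
        # k * (sum of squares) == (sum)^2 exactly when all k aligned numbers are equal
        ok = [x and k * q == s * s for x, s, q in zip(ok, sums, squares)]
    return [i + 1 for i, x in enumerate(ok) if x]
```

At location 2 of the example the numbers beneath “e” are 3, 3, 3: the sum is 9, the sum of squares is 27, and 3·27 = 9².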

We have seen - 2 algorithms for Function Matching
• O(nm) - naïve algorithm
• O(|Σ_P| n log m) - convolution based
Can we do better for big alphabets?
We will see:
• O(n log² m) - randomized convolutions based
• Lower bound of Ω(nm) for deterministic convolutions based methods

Def: A pattern is 2-charactered if every character appears at most twice in the pattern.
Lemma: Let P be a pattern and T a text. There exist 2-charactered patterns P1 and P2 s.t. at location i of T, P f-matches iff both P1 and P2 f-match.
Example:
P  = a  b  c  b  c  c  b  b
P1 = a1 b1 c1 b1 c1 c2 b2 b2 (even pairs)
P2 = a1 b1 c1 b2 c2 c2 b2 b3 (odd pairs)
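
The construction in the example can be sketched as follows. This is one way to produce P1 and P2 that reproduces the example above; the function name `split_two_charactered` and the representation of a relabelled symbol as a (character, index) tuple are assumptions made for illustration.

```python
def split_two_charactered(pattern):
    """Split a pattern into two 2-charactered patterns P1 and P2.

    The k-th occurrence (k = 0, 1, 2, ...) of a character c becomes
    (c, k // 2 + 1) in P1 and (c, (k + 1) // 2 + 1) in P2, so P1 pairs
    occurrences (0,1), (2,3), ... and P2 pairs occurrences (1,2), (3,4), ...
    """
    seen = {}
    p1, p2 = [], []
    for c in pattern:
        k = seen.get(c, 0)
        seen[c] = k + 1
        p1.append((c, k // 2 + 1))
        p2.append((c, (k + 1) // 2 + 1))
    return p1, p2
```

Every consecutive pair of occurrences of a character is tied together in P1 or in P2, so equality along the whole chain of occurrences follows from the two 2-charactered matches.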

Situation: An algorithm for Function Matching with 2-charactered patterns ⇒ a general algorithm for Function Matching.
So, all that needs to be checked is that each pair in P has equal text symbols beneath it.

New Randomized Algorithm
• For each character:
  - a in T: randomly choose r_a in {0, 1}; replace all a’s in T with r_a; get T’
  - b in P: randomly choose s_b in {1, 2}; set the first b to be s_b and the second b to be -s_b (characters that appear only once, and ?, are set to 0); get P’
• Convolve T’ and P’R
• For each location i for which T’*P’R[i] equals 0, declare a ‘match’

Example:
P = v q v u q u ? s
T = a b a a b a b a c a b d a b c b d b a
g(v) = 2, g(q) = 6, g(u) = 8
f(a) = 1, f(b) = 0, f(c) = 0, f(d) = 1
g(P) = 2 6 –2 8 –6 –8 0 0
f(T) = 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 0 1
At location 1: 2+0–2+8+0–8+0+0 = 0, and P does f-match there, with h(v) = a, h(q) = b, h(u) = a, h(s) = a.

Example (continued), with the same P, T, g and f:
g(P) = 2 6 –2 8 –6 –8 0 0
f(T) = 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 0 1
At location 2: 0+6–2+0–6+0+0+0 = –2 ≠ 0, so there is no match at this location.

Example (continued), with the same P, T, g and f:
g(P) = 2 6 –2 8 –6 –8 0 0
f(T) = 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 0 1
At location 12: 2+6+0+0+0–8+0+0 = 0, although P does not f-match there (the two v’s are aligned with d and b). A single convolution can therefore report a false match.

Correctness:
• If P f-matches at location i of T, then f(T)*g(P)R [i+m-1] is trivially always equal to 0.
• If P does not f-match at location i of T, then for each convolution <f,g>, f(T)*g(P)R [i+m-1] equals 0 with probability ½.
• With k rounds of amplification the probability is (½)^k.
Running Time:
• O(nk log m) with error probability 2^-k
• O(n log² m) with error probability 1/m
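
A Python sketch of the randomized algorithm with k rounds, under stated assumptions: the pattern is already 2-charactered, r_a is drawn from {0, 1} and s_b from {1, 2} as on the algorithm slide, and `correlate` is a direct stand-in for an FFT-based convolution. The function name, the `rounds` and `seed` parameters are hypothetical. The result may contain false matches, but a true match is never dropped.

```python
import random


def correlate(text, pattern):
    """Direct stand-in for a convolution: result[i] = sum_j pattern[j] * text[i + j]."""
    n, m = len(text), len(pattern)
    return [sum(pattern[j] * text[i + j] for j in range(m))
            for i in range(n - m + 1)]


def randomized_function_matching(text, pattern, rounds=10, wildcard="?", seed=None):
    """Return 1-based locations that survive `rounds` random convolutions."""
    rng = random.Random(seed)
    n, m = len(text), len(pattern)
    positions = {}
    for j, c in enumerate(pattern):
        if c != wildcard:
            positions.setdefault(c, []).append(j)
    if any(len(pos) > 2 for pos in positions.values()):
        raise ValueError("pattern must be 2-charactered")
    alive = [True] * (n - m + 1)
    for _ in range(rounds):
        r = {a: rng.randint(0, 1) for a in sorted(set(text))}
        t_num = [r[c] for c in text]                      # T'
        p_num = [0] * m                                   # P'; singletons and ? stay 0
        for pos in positions.values():
            if len(pos) == 2:
                s = rng.randint(1, 2)
                p_num[pos[0]], p_num[pos[1]] = s, -s
        # A location stays alive only if this round's convolution value is 0.
        alive = [x and v == 0 for x, v in zip(alive, correlate(t_num, p_num))]
    return [i + 1 for i, x in enumerate(alive) if x]
```

Each surviving location could be confirmed with the naïve O(m) check if a guaranteed answer is needed.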

Limitation of the Convolutions Model
Can we do the same deterministically? No!
To show this we use the model of communication complexity.
[Figure: Alice holds x, Bob holds y; together they compute f(x,y)]

Limitation of the Convolutions Model
Known: for x, y in {0,1}^k, the communication complexity of equals(x,y) is Ω(k).
Take pattern P = a_1 a_2 a_3 … a_m a_1 a_2 a_3 … a_m, where a_i ≠ a_j for all i ≠ j.
Given a collection of convolutions {<g(P), f(T)>}, the convolution at location i is
(g(P)*f(T))[i+m-1] = Σ_j g(a_j)·f(t_{i+j-1}) + Σ_j g(a_j)·f(t_{i+j+m-1}).
Since we are in essence comparing t_i…t_{i+m-1} to t_{i+m}…t_{i+2m-1}, the convolutions must supply the information needed to decide equals. This is lower bounded by Ω(m) for each location, and Ω(nm) in general.

Another Application for Function Matching
Protein Folding detection:
[Figure: folded chain with numbered positions]
P = 1 2 3 4 5 6 7 8 9 10 10 9 8 7 6 5 4 11 12 … 12 11 3 2 1

Questions
• Can Function Matching be solved deterministically in o(nm) time for big alphabets?
• Are there special cases of Function Matching that are easier (other than Parameterized Matching and other trivial ones)?
• Does 2-dimensional Parameterized Matching need to be solved with function matching?
