Length Reduction in Binary Transforms. Amihood Amir, Bar Ilan University and Johns Hopkins University. Oren Kapah, Ely Porat, Amir Rothschild.
Motivation. Error in Content: U. of Minnesota, Bar Ilan University. Error in Address: U. of Minnesota, Bar Ilan University.
Motivation: Architecture. Assume distributed memory. Our processor has the text and requests a pattern of length m. The pattern arrives in m asynchronous packets of the form <symbol, addr>. Example: <A, 3>, <B, 0>, <A, 4>, <C, 1>, <B, 2>. Pattern: BCBAA
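A minimal sketch of how the packets above determine the pattern, assuming the addresses form a permutation of 0,…,m−1 and no address is corrupted. The function name `assemble_pattern` is hypothetical.

```python
def assemble_pattern(packets):
    """Place each symbol at the address carried in its packet.

    Assumes the addresses are exactly 0..m-1, each appearing once.
    """
    pattern = [None] * len(packets)
    for symbol, addr in packets:
        pattern[addr] = symbol
    return "".join(pattern)


# Packets may arrive in any order; the address fixes the position.
packets = [("A", 3), ("B", 0), ("A", 4), ("C", 1), ("B", 2)]
print(assemble_pattern(packets))  # BCBAA
```

An error in an address field moves a symbol to the wrong position, which is the kind of error the rest of the talk deals with.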
Our Model… Text: T[0], T[1], …, T[n]. Pattern: P[0]=<C[0],A[0]>, P[1]=<C[1],A[1]>, …, P[m]=<C[m],A[m]>; C[i] ∈ ∑, A[i] ∈ {1,…,m}. Standard Pattern Matching: no error in A. Asynchronous Pattern Matching: no error in C. Eventually: error in both.
Address Register: log m bits, some of them “bad” bits. What does “bad” mean? 1. Bit “flips” its value. 2. Bit sometimes flips its value. 3. Transient error.
We will now concentrate on consistent bit flips. Example: Let ∑={a,b}. Text: T[0] T[1] T[2] T[3] = a a b b. Pattern: P[0] P[1] P[2] P[3] = b b a a.
Example: BAD. Pattern P[0] P[1] P[2] P[3] = b b a a. Indices P[00] P[01] P[10] P[11] map to P[00] P[01] P[10] P[11]; result: b b a a.
Example: GOOD. Pattern P[0] P[1] P[2] P[3] = b b a a. Indices P[00] P[01] P[10] P[11] map to P[00] P[01] P[10] P[11]; result: a a b b.
Example: BEST. Pattern P[0] P[1] P[2] P[3] = b b a a. Indices P[00] P[01] P[10] P[11] map to P[00] P[01] P[10] P[11]; result: a a b b.
Naïve Algorithm: For each of the 2^{log m} = m different bit combinations, try matching. Choose the match with minimum bits. Time: O(m^2).
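The naïve algorithm can be sketched as follows, assuming m is a power of two and that a consistent flip of the bits in `mask` sends the symbol with address i to address i XOR mask. The function name `min_bit_flip_naive` is hypothetical.

```python
def min_bit_flip_naive(text, pattern):
    """Try every bit combination; return (mask, flipped_bits) or None.

    Assumes len(text) == len(pattern) == m, with m a power of two.
    """
    m = len(pattern)
    best = None
    for mask in range(m):
        # The symbol sent with address i lands at address i ^ mask.
        moved = [None] * m
        for i in range(m):
            moved[i ^ mask] = pattern[i]
        if moved == list(text):
            bits = bin(mask).count("1")
            if best is None or bits < best[1]:
                best = (mask, bits)
    return best


# The example with T = a a b b and P = b b a a.
print(min_bit_flip_naive("aabb", "bbaa"))  # (2, 1): flip only the high bit
```

Masks 10 and 11 both produce a match here; the one with fewer flipped bits wins. Each of the m masks costs O(m) to test, which gives the O(m^2) bound.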
Approximate Pattern Matching. Hamming distance: For every location, write the number of mismatches. Text: A B B A B C B A A B C B A B B C. Pattern: A B C B A. Number of mismatches at text locations 0 to 4: 3, 3, 5, 0, 4. Naïve Algorithm Time: O(nm)
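A minimal sketch of the naïve O(nm) computation, using the text and pattern from the slide. The function name `hamming_distances` is hypothetical.

```python
def hamming_distances(text, pattern):
    """Number of mismatches of the pattern at every text location."""
    n, m = len(text), len(pattern)
    return [
        sum(text[i + k] != pattern[k] for k in range(m))
        for i in range(n - m + 1)
    ]


text = "ABBABCBAABCBABBC"
pattern = "ABCBA"
print(hamming_distances(text, pattern)[:5])  # [3, 3, 5, 0, 4]
```

Location 3 has distance 0, so the pattern occurs there exactly.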
In Pattern Matching: Polynomial Multiplication: the coefficients b0 b1 b2 slide along the other polynomial's coefficients (b0 b1 b2, b0 b1 b2, b0 b1 b2). Naïve Time: O(nm)
What do the Two Examples have in Common? What Really Happened? Pattern, padded with zeros: P[0] P[1] P[2] P[3] 0 0 0. Text, padded with zeros: T[0] T[1] T[2] T[3] 0 0 0. Dot products array: C[-3] C[-2] C[-1] C[0] C[1] C[2] C[3]
Another way of defining the transform: Where we define: P[x]=0 for x<0 and x>m.
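A minimal sketch of the dot products array, assuming the convention C[j] = Σ_i T[i]·P[i − j] with P[x] = 0 outside the pattern. The opposite sign convention, P[i + j], is equally plausible and only mirrors the array. Arrays are indexed 0..len−1 here, and the name `dot_products` is hypothetical.

```python
def dot_products(T, P):
    """Return {j: sum_i T[i] * P[i - j]} for every shift j with overlap."""
    m = len(P)

    def p(x):
        # P is treated as 0 outside its index range.
        return P[x] if 0 <= x < m else 0

    return {
        j: sum(T[i] * p(i - j) for i in range(len(T)))
        for j in range(-(m - 1), len(T))
    }


C = dot_products([1, 2, 3, 4], [1, 1, 0, 0])
print([C[j] for j in range(-3, 4)])  # [0, 0, 1, 3, 5, 7, 4]
```

With 0/1 indicator vectors per symbol, the same array counts matches at each alignment, which is how it yields Hamming distances; with coefficient vectors it yields polynomial products.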
FFT solution to the “shift” convolution: 1. Compute, in time O(m log m), the values of X at the roots of unity. 2. For polynomial multiplication, compute the values of the product polynomial at the roots of unity in time O(m log m). 3. Compute the coefficients of the product polynomial, again in time O(m log m).
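The three steps can be sketched with a textbook recursive radix-2 FFT. This is an illustration under stated assumptions (integer coefficients, length padded to a power of two), not necessarily the formulation used in the talk; the names `fft` and `poly_multiply` are hypothetical.

```python
import cmath


def fft(a, invert=False):
    """Evaluate the polynomial with coefficients a at the n-th roots of unity.

    len(a) must be a power of two. With invert=True the conjugate roots
    are used; the caller divides by n to finish the inverse transform.
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = -1 if invert else 1
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def poly_multiply(a, b):
    """Multiply two integer-coefficient polynomials via the FFT."""
    size = 1
    while size < len(a) + len(b) - 1:
        size *= 2
    fa = fft(list(a) + [0] * (size - len(a)))        # step 1
    fb = fft(list(b) + [0] * (size - len(b)))
    pointwise = [x * y for x, y in zip(fa, fb)]      # step 2
    coeffs = fft(pointwise, invert=True)             # step 3
    return [round((v / size).real) for v in coeffs[: len(a) + len(b) - 1]]


print(poly_multiply([1, 2, 3], [4, 5]))  # [4, 13, 22, 15]
```

Rounding to integers is only safe because the inputs are assumed to be small integers; real-valued inputs would keep the floating-point result.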
A General Convolution C_f: Bijections f_j; j = 1,…,O(m)
Consistent bit flip as a Convolution: Construct a mask of length log m that has 0 in every bit except for the bad bits, where it has a 1. Example: Assume the bad bits are at indices i, j, k ∈ {0,…,log m}. Then the mask is 000001000100001000, with the 1s at positions i, j, k. An exclusive OR between the mask and a pattern index gives the target index.
Example: Mask: 0010. Index 1010 maps to 1000; index 1000 maps to 1010.
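The mask example can be reproduced directly with the bitwise XOR operator. In this sketch, bit positions are counted from the least significant bit, which is an assumption; the helper names are hypothetical.

```python
def mask_from_bad_bits(bad_bits):
    """Build a mask with a 1 at every bad bit position (0 = lowest bit)."""
    mask = 0
    for b in bad_bits:
        mask |= 1 << b
    return mask


def target_index(index, mask):
    """A consistent flip of the masked bits sends index to index XOR mask."""
    return index ^ mask


mask = mask_from_bad_bits([1])  # 0b0010
for index in (0b1010, 0b1000):
    print(f"{index:04b} -> {target_index(index, mask):04b}")
# 1010 -> 1000
# 1000 -> 1010
```

Applying the same mask twice returns the original index, so every mask defines a bijection on the indices.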
Our Case: For each of the 2^{log m} = m masks, let j ∈ {0,1}^{log m}. Denote our convolution by C: C[j] = Σ_i T[i]·P[i ⊕ j], where ⊕ is the bitwise exclusive or of the index bits.
To compute min bit flip: Let T, P be over alphabet {0,1}. For each j, the array P[i ⊕ j], i = 0,…,m−1, is a permutation of P. Thus, only the j's for which Σ_i T[i]·P[i ⊕ j] = number of 1's in T are valid flips, since for them all 1's match 1's and all 0's match 0's. Choose a valid j with the minimum number of 1's.
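A minimal sketch of this criterion, computing every dot product directly in O(m^2) time. It assumes m is a power of two and that T and P contain the same number of 1's, as they must if P is a rearranged copy of T; the name `min_valid_flip` is hypothetical.

```python
def min_valid_flip(T, P):
    """Return (C, j): all XOR dot products and the cheapest valid mask j.

    j is None when no consistent bit flip maps P onto T.
    """
    m = len(P)
    C = [sum(T[i] * P[i ^ j] for i in range(m)) for j in range(m)]
    if sum(T) != sum(P):
        return C, None  # P cannot be a permutation of T
    ones_in_T = sum(T)
    valid = [j for j in range(m) if C[j] == ones_in_T]
    if not valid:
        return C, None
    return C, min(valid, key=lambda j: bin(j).count("1"))


# a -> 1, b -> 0: T = a a b b, P = b b a a.
C, j = min_valid_flip([1, 1, 0, 0], [0, 0, 1, 1])
print(C, j)  # [0, 0, 2, 2] 2
```

Masks 10 and 11 are both valid here; mask 10 flips a single bit and is therefore chosen.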
Time: All convolutions can be computed in time O(m^2) after preprocessing the permutation functions as tables. Can we do better? (As in the FFT, for example.)
Idea – Divide and Conquer – Walsh Transform • Split T and P into length-m/2 arrays. • Compute the transforms of the half-length arrays. • Use their values to compute the full transform in time O(m). • Time: Recurrence: t(m) = 2t(m/2) + m • Closed Form: t(m) = O(m log m)
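One standard way to realize this recurrence is the fast Walsh-Hadamard transform, sketched below in its iterative form. Whether the split matches the one on the slide is an assumption; the function names are hypothetical, and m is assumed to be a power of two.

```python
def walsh_hadamard(a):
    """Unnormalized Walsh-Hadamard transform; len(a) is a power of two."""
    a = list(a)
    n = len(a)
    h = 1
    while h < n:
        # Combine blocks of length h into blocks of length 2h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return a


def xor_convolution(T, P):
    """C[j] = sum_i T[i] * P[i ^ j] for all j, in O(m log m) time."""
    m = len(T)
    wt, wp = walsh_hadamard(T), walsh_hadamard(P)
    product = walsh_hadamard([x * y for x, y in zip(wt, wp)])
    # The transform is its own inverse up to a factor of m.
    return [v // m for v in product]


print(xor_convolution([1, 1, 0, 0], [0, 0, 1, 1]))  # [0, 0, 2, 2]
```

Each of the log m rounds does O(m) work, matching t(m) = 2t(m/2) + m. The integer division is exact for integer inputs.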
Sparse Transform: Applications where most of the input is 0. The locations of the “1”s are given as input. We are only interested in the transform results for the locations where all pattern “1”s match text “1”s.
Motivation – Point Set Matching 1-D Point Set Matching: T: (t1,t2,…,tn) P: (p1,p2,…,pm) 2-D Point Set Matching – Searching in Music:
Notations: Length of text: N Length of Pattern: M Number of “1”s in text: n Number of “1”s in pattern: m.
Idea: Map text and pattern to small text and pattern: Hash function h
Idea: Do fast transform on the small text and pattern
Idea: Map results onto transform result of original text and pattern, using h^{-1}.
Length Reduction in DFT. Goal: Given two vectors V1 and V2, obtain two vectors V’1 and V’2 of size O(n’) such that all non-zeros in V1 and in V2 appear as singletons in V’1 and V’2, respectively, while maintaining the distance property. The Distance Property: If V’2[h(0)] is aligned with V’1[h(i)], then V’2[h(j)] is aligned with V’1[h(f_i(j))] = V’1[h(i+j)]. Using the reduced-size vectors, matching can be done in time O(n’ log n’) using the FFT algorithm.
Example: Length Reduction. The vectors are given as sets of pairs (index, value). V1: (0, 5), (6, 2), (13, 3), (19, 1). V2: (0, 2), (7, 3). Length Reduction Hash Function: mod(5). V’1: (0, 5), (1, 2), (3, 3), (4, 1). V’2: (0, 2), (2, 3).
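The example can be checked with a short sketch. It assumes that values hashed to the same slot are summed (no two collide here) and that alignment in the reduced vectors is cyclic; the name `reduce_vector` is hypothetical.

```python
def reduce_vector(pairs, s):
    """Map (index, value) pairs into a length-s vector using index mod s."""
    reduced = [0] * s
    for index, value in pairs:
        reduced[index % s] += value  # colliding values would be summed
    return reduced


V1 = [(0, 5), (6, 2), (13, 3), (19, 1)]
V2 = [(0, 2), (7, 3)]
print(reduce_vector(V1, 5))  # [5, 2, 0, 3, 1]
print(reduce_vector(V2, 5))  # [2, 0, 3, 0, 0]

# Distance property: align V2[0] with V1[6]; then V2[7] meets V1[13],
# and in the reduced vectors slot h(7) meets slot h(13), cyclically.
shift = 6
for index, _ in V2:
    assert (index + shift) % 5 == ((index % 5) + (shift % 5)) % 5
```

Every non-zero of V1 and V2 lands in its own slot, so both reduced vectors consist of singletons for this choice of modulus.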
The Randomized Algorithm of Cole & Hariharan [STOC 02]. Idea: Find a set of log(n) short vectors in which, with high probability, each non-zero in V appears as a singleton in at least one of the vectors. Hash functions: (ax mod(q)) mod(s), where q is a large prime number and s is O(n). If s is c·n, then the probability of a non-zero appearing as a multiple is constant. Using log(n) different hash functions reduces the failure probability exponentially.
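A rough sketch of the singleton idea only, not of the full algorithm. It assumes q = 2^31 − 1 (a prime larger than every index) and s = 4n, and it keeps drawing multipliers until every index has been a singleton at least once, rather than fixing log(n) functions in advance. All names are hypothetical.

```python
import random


def singletons(indices, a, q, s):
    """Indices that are alone in their bucket under (a*x mod q) mod s."""
    buckets = {}
    for x in indices:
        buckets.setdefault((a * x % q) % s, []).append(x)
    return {bucket[0] for bucket in buckets.values() if len(bucket) == 1}


def cover_with_hashes(indices, q, s, rng):
    """Draw random multipliers until each index is a singleton under one."""
    uncovered = set(indices)
    multipliers = []
    while uncovered:
        a = rng.randrange(1, q)
        alone = singletons(indices, a, q, s)
        if alone & uncovered:  # keep only multipliers that make progress
            multipliers.append(a)
            uncovered -= alone
    return multipliers


indices = [0, 6, 13, 19, 1000, 4096, 77777]
q = 2**31 - 1  # assumed prime modulus, larger than every index
s = 4 * len(indices)
multipliers = cover_with_hashes(indices, q, s, random.Random(1))
print(len(multipliers), "hash function(s) were enough")
```

The number of multipliers drawn depends on the random choices, so the sketch reports the count without assuming a particular value.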
Problem For the Walsh Transform, the mod function is useless. The distance property has to do with exclusive or, not addition!
IDEA: Instead of the modulo function, do an exclusive or (⊕) of the index bits with a random bit string.
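One possible reading of this idea, stated as an assumption: each bit of the hashed index is the exclusive or of the index bits selected by a random bit string. Such a hash is linear over XOR, so h(x ⊕ y) = h(x) ⊕ h(y), which is the XOR analogue of the distance property. The name `make_xor_hash` is hypothetical.

```python
import random


def make_xor_hash(in_bits, out_bits, rng):
    """Build h mapping in_bits-bit indices to out_bits-bit indices.

    Output bit k is the parity (XOR) of the index bits selected by the
    k-th random bit string.
    """
    masks = [rng.getrandbits(in_bits) for _ in range(out_bits)]

    def h(x):
        value = 0
        for k, mask in enumerate(masks):
            parity = bin(x & mask).count("1") & 1
            value |= parity << k
        return value

    return h


h = make_xor_hash(in_bits=10, out_bits=4, rng=random.Random(0))
x, y = 0b1010011001, 0b0001110101
print(h(x ^ y) == h(x) ^ h(y))  # True
```

The identity holds for every choice of random strings, because the parity of (x ⊕ y) restricted to a mask is the XOR of the two individual parities.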
Example Let’s do the Walsh Transform