Efficient Equal Interval Neighborhood Ring (P-trees technology is patented by NDSU)
OUTLINE • Review: HOBBit Metric • Equal Interval Neighborhood Ring (EINring) • Prototype Problem • Definition of Range Mask • Propositions • Definition of EINring • Calculation of EINring Using P-trees • Summary
Review: HOBBit Similarity Metric • Let X and Y be two values. The HOBBit similarity between X and Y is defined bitwise, where xi and yi are the bits of X and Y respectively and ⊕ denotes XOR. In other words, it is the leftmost position at which X and Y differ.
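The "leftmost position at which X and Y differ" can be sketched with an XOR followed by a bit-length query. This is a minimal illustration, not the authors' implementation; the function name is hypothetical, and counting positions from 1 at the least significant bit (with 0 meaning "no difference") is an assumption, since the surviving text does not fix an indexing convention.

```python
def leftmost_differing_bit(x: int, y: int) -> int:
    """Position of the most significant bit at which x and y differ.

    Assumption: positions are counted from 1 at the least significant bit,
    and 0 is returned when x == y (no differing bit).
    """
    # x ^ y has a 1 exactly where the bits differ; bit_length() locates
    # the highest such 1.
    return (x ^ y).bit_length()


example = leftmost_differing_bit(0b1100, 0b1010)  # the two values differ at bits 3 and 2
```

Under this convention, two values that agree in all high-order bits and differ only near the least significant end get a small result.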
Review: HOBBit Distance & Ring • The HOBBit distance d(X, Y) between two tuples X and Y is defined from the HOBBit similarity of their attribute values. • HOBBit Ring: the HOBBit ring of radii r1 and r2, centered at c, is defined as R(c, r1, r2) = {x ∈ X | r1 ≤ d(c, x) < r2}, where d(c, x) is the HOBBit distance.
[Figure: Diagram of HOBBit Ring]
Summary of HOBBit Metric • The HOBBit metric is based on the most significant matching bit positions, starting from the left. • A HOBBit ring is a geometric ring whose diameter increases exponentially. • A HOBBit ring is an eccentric ring.
Outline • Prototype Problem • Definition of Range Mask • Propositions and Theorem • Definition of EINring • Calculation of EINring using P-trees
Prototype Problem • Problem: x > (4)10 = (100)2 • Conjecture: Px>(100)2 = P3 ∧ (P2 ∨ P1) • 8x8 data set:
7 7 7 7 5 5 1 1
7 7 7 7 1 1 1 1
5 5 7 7 4 4 1 1
5 7 7 7 4 5 5 1
6 6 6 6 3 3 0 0
6 6 6 6 0 0 0 0
2 2 6 6 3 3 0 0
2 6 6 6 3 3 3 0
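The conjecture can be checked against the 8x8 data set with flat bit vectors standing in for P-trees. This is a hedged sketch: real P-trees are compressed tree structures, and here each basic P-tree is assumed to be a plain bit vector (a Python int with one bit per data point), which yields the same truth values under AND and OR. The helper name `bit_slice` is hypothetical, and P3 is assumed to be the most significant of the three bits.

```python
DATA = [
    7, 7, 7, 7, 5, 5, 1, 1,
    7, 7, 7, 7, 1, 1, 1, 1,
    5, 5, 7, 7, 4, 4, 1, 1,
    5, 7, 7, 7, 4, 5, 5, 1,
    6, 6, 6, 6, 3, 3, 0, 0,
    6, 6, 6, 6, 0, 0, 0, 0,
    2, 2, 6, 6, 3, 3, 0, 0,
    2, 6, 6, 6, 3, 3, 3, 0,
]


def bit_slice(values, i):
    """Bit vector (Python int): bit t is 1 iff bit i of values[t] is 1."""
    mask = 0
    for t, v in enumerate(values):
        if (v >> i) & 1:
            mask |= 1 << t
    return mask


# Assumed numbering: P3 is the most significant bit, P1 the least significant.
P3, P2, P1 = bit_slice(DATA, 2), bit_slice(DATA, 1), bit_slice(DATA, 0)

conjectured = P3 & (P2 | P1)  # mask for x > (100)2

# Brute-force reference: test every data point directly.
brute_force = 0
for t, v in enumerate(DATA):
    if v > 4:
        brute_force |= 1 << t

root_count = bin(conjectured).count("1")  # number of points with x > 4
```

On this data set the conjectured mask and the brute-force mask coincide, and the count of 1 bits plays the role of the root count.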
Definition of Range Mask • Range Mask: the Range Mask is the P-tree mask that identifies every data point x satisfying a range inequality, i.e., x ≥ c1, x > c1, x ≤ c2, etc., where c1, c2 are integers. • Example: Px>100 is a P-tree mask that identifies every data point greater than 100.
Proposition 1 • Let the jth attribute xj of data point x be represented by the bits m, m-1, …, 0; let Pj,m, Pj,m-1, … Pj,0 be the basic P-trees of the jth attribute (Pj,i for its ith bit); and let the integer c = bm bm-1 … b0, where bi is the ith binary bit value of c. Let Pxj ≥ c be the Range Mask that satisfies the inequality xj ≥ c. Then Pxj ≥ c = Pj,m opm … Pj,i opi … Pj,0, s.t. 1) opi is ∧ if bi=1, 2) opi is ∨ if bi=0, 3) the operators are right binding within each attribute, 4) the terms for the trailing 0 bits of c (those below its rightmost 1 bit) are omitted, since those bits of xj may take any value. • Example: Pxj ≥ (70)10 = Pxj ≥ (01000110)2 = P7 ∨ (P6 ∧ (P5 ∨ (P4 ∨ (P3 ∨ (P2 ∧ P1)))))
Proof Sketch Without loss of generality, assume data point x has one attribute. Let c = bm…bi…b0, where bi is the ith bit value of c, and let Px≥c be the range mask that satisfies x ≥ c. If bi=1, the ith bit of x must be 1 whenever x and c have the same bit values from position m down to position i+1, i.e., Px≥c = Pm … ∧ Pi … P0. (Partially done!) If bi=0, there are two cases that satisfy x ≥ c: one sets the ith bit of x to xi=1, the other sets xi=0 and leaves the decision to the lower bits. Thus Px≥c = (Pm … Pi) ∨ (Pm … Pi’ ∧ (Pi-1 … P0)). By the complement rule, X ∨ (X’ ∧ Y) = X ∨ Y, so Px≥c = Pm … (Pi ∨ (Pi-1 … P0)). Done!
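The right-binding construction for x ≥ c can be sketched as a fold from the least significant bit upward. This is an illustrative sketch, not the authors' code: flat bit vectors (Python ints) are assumed in place of P-trees, the names `bit_slices` and `mask_ge` are hypothetical, and c is assumed to fit in the available bits. Starting the fold from an all-ones mask means the terms for trailing 0 bits of c drop out on their own.

```python
def bit_slices(values, nbits):
    """slices[i] is a bit vector (Python int): bit t is 1 iff bit i of values[t] is 1."""
    return [sum(1 << t for t, v in enumerate(values) if (v >> i) & 1)
            for i in range(nbits)]


def mask_ge(slices, n, c):
    """Range mask for x >= c over n data points, assuming 0 <= c < 2**len(slices)."""
    acc = (1 << n) - 1                 # empty suffix: every point qualifies
    for i, p in enumerate(slices):     # least significant bit first = right binding
        # bit of c is 1 -> AND with the slice; bit of c is 0 -> OR with the slice
        acc = (p & acc) if (c >> i) & 1 else (p | acc)
    return acc


values = list(range(256))              # hypothetical one-attribute data set
P = bit_slices(values, 8)

ge70 = mask_ge(P, len(values), 70)
# Hand-expanded mask for c = (01000110)2; no P0 term, since bit 0 of c is 0.
by_hand = P[7] | (P[6] & (P[5] | (P[4] | (P[3] | (P[2] & P[1])))))
brute_force = sum(1 << t for t, v in enumerate(values) if v >= 70)
```

The folded mask, the hand-expanded expression, and the brute-force comparison all select the same points for c = 70.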
Proposition 2 • Let the jth attribute xj of data point x be represented by the bits m, m-1, …, 0; let P’j,m, P’j,m-1, … P’j,0 be the complement P-trees of the jth attribute (P’j,i for its ith bit); and let the integer c = bm bm-1 … b0, where bi is the ith binary bit value of c. Let Pxj ≤ c be the Range Mask that satisfies xj ≤ c. Then Pxj ≤ c = P’j,m opm … P’j,i opi … P’j,0, s.t. 1) opi is ∧ if bi=0, 2) opi is ∨ if bi=1, 3) the operators are right binding within each attribute, 4) the terms for the trailing 1 bits of c (those below its rightmost 0 bit) are omitted, since those bits of xj may take any value. • Example: Pxj ≤ (197)10 = Pxj ≤ (11000101)2 = P7’ ∨ (P6’ ∨ (P5’ ∧ P4’ ∧ P3’ ∧ (P2’ ∨ P1’)))
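The x ≤ c mask mirrors the x ≥ c case with complemented bit slices and the operator roles swapped. The sketch below makes the same assumptions as before (flat bit vectors as stand-ins for P-trees, hypothetical function names, c fitting in the available bits) and uses c = (11000101)2 = 197.

```python
def bit_slices(values, nbits):
    """slices[i] is a bit vector (Python int): bit t is 1 iff bit i of values[t] is 1."""
    return [sum(1 << t for t, v in enumerate(values) if (v >> i) & 1)
            for i in range(nbits)]


def mask_le(slices, n, c):
    """Range mask for x <= c over n data points, assuming 0 <= c < 2**len(slices)."""
    full = (1 << n) - 1
    acc = full                          # empty suffix: every point qualifies
    for i, p in enumerate(slices):      # least significant bit first = right binding
        comp = full & ~p                # complement slice P'_i
        # bit of c is 1 -> OR with the complement; bit of c is 0 -> AND with it
        acc = (comp | acc) if (c >> i) & 1 else (comp & acc)
    return acc


values = list(range(256))               # hypothetical one-attribute data set
P = bit_slices(values, 8)
full = (1 << len(values)) - 1
N = [full & ~p for p in P]              # N[i] plays the role of P'_i

le197 = mask_le(P, len(values), 0b11000101)
# Hand-expanded mask; no P0' term, since bit 0 of c is a trailing 1.
by_hand = N[7] | (N[6] | (N[5] & N[4] & N[3] & (N[2] | N[1])))
brute_force = sum(1 << t for t, v in enumerate(values) if v <= 197)
```

As with the ≥ case, starting from an all-ones mask makes the trailing terms disappear without an explicit cut-off index.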
Proposition 3 • Let the jth attribute xj of data point x be represented by the bits m, m-1, …, 0; let Pj,m, Pj,m-1, … Pj,0 be the basic P-trees of the jth attribute (Pj,i for its ith bit); and let the integer c = bm bm-1 … b0, where bi is c’s ith binary bit value. Let Pxj>c be the Range Mask that satisfies the inequality xj > c. Then Pxj>c = Pj,m opm … Pj,i opi … Pj,k, s.t. 1) opi is ∧ if bi=1, 2) opi is ∨ if bi=0, 3) the operators are right binding within each attribute, 4) k is the rightmost position with bk=0, i.e., every bit of c below position k is 1. • Example: Pxj > (72)10 = Pxj > (01001000)2 = P7 ∨ (P6 ∧ (P5 ∨ (P4 ∨ (P3 ∧ (P2 ∨ (P1 ∨ P0)))))); here bit 0 of c is 0, so k=0 and no term is dropped.
Proposition 4 • Let the jth attribute xj of data point x be represented by the bits m, m-1, …, 0; let P’j,m, P’j,m-1, … P’j,0 be the complement P-trees of the jth attribute (P’j,i for its ith bit); and let the integer c = bm bm-1 … b0, where bi is c’s ith binary bit value. Let Pxj<c be the Range Mask that satisfies xj < c. Then Pxj<c = P’j,m opm … P’j,i opi … P’j,k, s.t. 1) opi is ∧ if bi=0, 2) opi is ∨ if bi=1, 3) the operators are right binding within each attribute, 4) k is the rightmost position with bk=1, i.e., every bit of c below position k is 0. • Example: Pxj < (72)10 = Pxj < (01001000)2 = P7’ ∧ (P6’ ∨ (P5’ ∧ P4’ ∧ P3’))
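The two strict masks can be sketched with the same fold as the non-strict ones, changing only the starting value: an empty suffix means "equal so far", which fails a strict inequality, so the fold starts from an all-zeros mask. This is a hedged sketch with flat bit vectors assumed in place of P-trees and hypothetical function names; the cut-off position k never has to be computed, because the terms below it vanish against the all-zeros start.

```python
def bit_slices(values, nbits):
    """slices[i] is a bit vector (Python int): bit t is 1 iff bit i of values[t] is 1."""
    return [sum(1 << t for t, v in enumerate(values) if (v >> i) & 1)
            for i in range(nbits)]


def mask_gt(slices, c):
    """Range mask for x > c, assuming 0 <= c < 2**len(slices)."""
    acc = 0                             # empty suffix: x equals c, so x > c fails
    for i, p in enumerate(slices):      # least significant bit first = right binding
        acc = (p & acc) if (c >> i) & 1 else (p | acc)
    return acc


def mask_lt(slices, n, c):
    """Range mask for x < c over n data points, assuming 0 <= c < 2**len(slices)."""
    full = (1 << n) - 1
    acc = 0                             # empty suffix: x equals c, so x < c fails
    for i, p in enumerate(slices):
        comp = full & ~p                # complement slice P'_i
        acc = (comp | acc) if (c >> i) & 1 else (comp & acc)
    return acc


values = list(range(256))               # hypothetical one-attribute data set
P = bit_slices(values, 8)
full = (1 << len(values)) - 1
N = [full & ~p for p in P]              # N[i] plays the role of P'_i

gt72 = mask_gt(P, 72)
lt72 = mask_lt(P, len(values), 72)
# Hand-expanded x < (01001000)2: terms below bit 3 are dropped.
lt72_by_hand = N[7] & (N[6] | (N[5] & N[4] & N[3]))
```

For c = 72 the `mask_lt` result matches the hand-expanded expression that stops at bit 3, while `mask_gt` needs every bit down to bit 0.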
More Examples • c=(72)10=(01001000)2: Px<c = P7’ ∧ (P6’ ∨ (P5’ ∧ P4’ ∧ P3’)) • c=(72)10=(01001000)2: Px>c = P7 ∨ (P6 ∧ (P5 ∨ (P4 ∨ (P3 ∧ (P2 ∨ (P1 ∨ P0)))))) • c=(197)10=(11000101)2: Px≤c = P7’ ∨ (P6’ ∨ (P5’ ∧ P4’ ∧ P3’ ∧ (P2’ ∨ P1’))) • c=(197)10=(11000101)2: Px≥c = P7 ∧ P6 ∧ (P5 ∨ (P4 ∨ (P3 ∨ (P2 ∧ (P1 ∨ P0)))))
Theorem – Range Mask Complement Rule • Let Pxj≥c, Pxj<c, Pxj≤c and Pxj>c be the Range Masks of the jth attribute of any data point x, where c is an integer. Then Pxj≥c = P’xj<c and Pxj≤c = P’xj>c hold.
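As a worked instance of the complement rule, the mask for x < 72 can be complemented step by step with De Morgan's laws. The derivation below assumes the mask P_{x<72} = P_7' ∧ (P_6' ∨ (P_5' ∧ P_4' ∧ P_3')) for c = (01001000)_2 and is only an illustration for this one constant, not a proof of the theorem.

```latex
\begin{aligned}
P'_{x<72}
  &= \bigl(P_7' \wedge (P_6' \vee (P_5' \wedge P_4' \wedge P_3'))\bigr)' \\
  &= P_7 \vee \bigl(P_6' \vee (P_5' \wedge P_4' \wedge P_3')\bigr)' \\
  &= P_7 \vee \bigl(P_6 \wedge (P_5' \wedge P_4' \wedge P_3')'\bigr) \\
  &= P_7 \vee \bigl(P_6 \wedge (P_5 \vee P_4 \vee P_3)\bigr) \\
  &= P_{x \ge 72}
\end{aligned}
```

The last line is the x ≥ c construction applied to c = 72: the operators follow the bits 0, 1, 0, 0 from position 7 down, and the expression stops at position 3, the rightmost 1 bit of c.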
Definition of Neighborhood Ring • Neighborhood Ring: the neighborhood ring of radii r1 and r2, centered at c, is defined as R(c, r1, r2) = {x ∈ X | r1 < abs(x-c) ≤ r2}, where abs(x-c) is the absolute distance between x and c.
Definition of Equal Interval Neighborhood Ring (EINring) • The Equal Interval Neighborhood Ring of radii r1 and r2, centered at c, is defined as R(c, r1, r2) = {x ∈ X | r1 < abs(x-c) ≤ r2}, and abs(r2-r1)=2k, where k=1,2,…, abs(x-c) is the absolute distance between x and c, and is interval
[Figure: Diagram of EINring]
Neighbor Count within EINring • For any data point x, let x = (x1, x2, …, xm), where xj is x’s jth attribute column. Let r = (r1, r2, …, rm) be a vector with m elements. We define the range mask Px>c-r as Px>c-r = Px1>c-r1 ∧ Px2>c-r2 ∧ … ∧ Pxm>c-rm and the range mask Px≤c+r as Px≤c+r = Px1≤c+r1 ∧ Px2≤c+r2 ∧ … ∧ Pxm≤c+rm, where c is a constant.
Neighbor Count within EINring: The range mask for the data points x within the neighborhood ring R(c, 0, r) is calculated by Pc,r = Px>c-r ∧ Px≤c+r. The neighbor count within the neighborhood ring R(c, 0, r) is calculated by NC(c, 0, r) = RC(Pc,r), where RC is the root count of a P-tree.
Neighbor Count within EINring The neighbor count NC(c, r1, r2) of c within the EINring R(c, r1, r2) is calculated as NC(c, r1, r2) = RC(Pc,r2) - RC(Pc,r1)
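The neighbor count formula can be sketched for a single attribute by combining a strict lower mask with a non-strict upper mask and subtracting root counts. As in any flat-bit-vector illustration, Python ints are assumed here in place of P-trees, the root count is taken to be the number of 1 bits, all function names are hypothetical, and c-r and c+r are assumed to stay within the representable range. Note that each mask Pc,r covers the half-open interval (c-r, c+r].

```python
def bit_slices(values, nbits):
    """slices[i] is a bit vector (Python int): bit t is 1 iff bit i of values[t] is 1."""
    return [sum(1 << t for t, v in enumerate(values) if (v >> i) & 1)
            for i in range(nbits)]


def mask_gt(slices, c):
    """Range mask for x > c (fold starts from all zeros)."""
    acc = 0
    for i, p in enumerate(slices):
        acc = (p & acc) if (c >> i) & 1 else (p | acc)
    return acc


def mask_le(slices, n, c):
    """Range mask for x <= c over n points (fold starts from all ones)."""
    full = (1 << n) - 1
    acc = full
    for i, p in enumerate(slices):
        comp = full & ~p
        acc = (comp | acc) if (c >> i) & 1 else (comp & acc)
    return acc


def ring_mask(slices, n, c, r):
    """P_{c,r} = P_{x > c-r} AND P_{x <= c+r}, i.e. the interval (c-r, c+r]."""
    return mask_gt(slices, c - r) & mask_le(slices, n, c + r)


def root_count(mask):
    """Stand-in for the P-tree root count: the number of 1 bits."""
    return bin(mask).count("1")


def neighbor_count(slices, n, c, r1, r2):
    """NC(c, r1, r2) = RC(P_{c,r2}) - RC(P_{c,r1})."""
    return root_count(ring_mask(slices, n, c, r2)) - root_count(ring_mask(slices, n, c, r1))


values = [3, 9, 10, 13, 14, 15, 16, 18, 21, 30]   # hypothetical 5-bit data
P = bit_slices(values, 5)
nc = neighbor_count(P, len(values), 14, 2, 6)      # ring 2 < abs(x - 14) <= 6
```

For this data the ring around c = 14 with r1 = 2 and r2 = 6 contains the points 9, 10 and 18, so the two root counts are 7 and 4.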
Summary • The Equal Interval Neighborhood Ring (EINring) is much finer than the HOBBit ring, whose diameter increases exponentially. • Calculation of the EINring using P-trees is efficient, comparable to that of the HOBBit ring.