Mining for Empty Rectangles in Large Data Sets
Jeff Edmonds, Jarek Gryz, Dongming Liang, Renee Miller
Matrix representation of π_{A,B}(R ⋈ S)
Tuples (A, B): (3, 6), (1, 7), (3, 8)
        A=1  A=2  A=3
B=6      0    0    1
B=7      1    0    0
B=8      0    0    1
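The matrix representation above can be sketched in a few lines of Python. This is only an illustration: the function name `to_matrix` and the explicit domain lists `x_values` and `y_values` are assumptions introduced here, not names from the talk.

```python
def to_matrix(tuples, x_values, y_values):
    """Build a 0-1 matrix with one column per A value and one row per B value.

    Entry [row][col] is 1 exactly when the pair (A, B) occurs among the tuples.
    """
    present = set(tuples)
    return [[1 if (a, b) in present else 0 for a in x_values] for b in y_values]


# The three tuples from the slide, over the domains A = 1..3 and B = 6..8.
matrix = to_matrix([(3, 6), (1, 7), (3, 8)], x_values=[1, 2, 3], y_values=[6, 7, 8])
for row in matrix:
    print(row)
```

The domains are passed explicitly because a value such as A = 2 appears in no tuple but still needs a column.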
Find All Maximal 0-Rectangles in π_{A,B}(R ⋈ S)
[Figure: the table and 0-1 matrix from the matrix-representation slide, with 0-rectangles marked.]
Example: π_{A,B}(R ⋈ S) over a relation with columns Car, Year, …
            95  96  97
BMW Z3       0   0   1
Honda L2     1   0   0
Toyota 6A    0   0   1
The empty rectangle BMW Z3 × {95, 96} says: first BMW Z3 series cars were made in 1997.
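To make "maximal 0-rectangle" concrete, here is a brute-force sketch run on the car example. It is deliberately slow and is not the algorithm of the talk; the function names and the 0-based `(x_left, y_top, x_right, y_bottom)` encoding are assumptions made for this illustration.

```python
from itertools import product


def is_empty(matrix, x1, y1, x2, y2):
    """True if every entry in columns x1..x2 and rows y1..y2 is 0."""
    return all(matrix[y][x] == 0
               for y in range(y1, y2 + 1)
               for x in range(x1, x2 + 1))


def brute_force_maximal(matrix):
    """Return all 0-rectangles that are not contained in a larger 0-rectangle."""
    height, width = len(matrix), len(matrix[0])
    empty = [(x1, y1, x2, y2)
             for x1, x2 in product(range(width), repeat=2) if x1 <= x2
             for y1, y2 in product(range(height), repeat=2) if y1 <= y2
             if is_empty(matrix, x1, y1, x2, y2)]

    def contains(big, small):
        return (big != small and big[0] <= small[0] and big[1] <= small[1]
                and big[2] >= small[2] and big[3] >= small[3])

    return [r for r in empty if not any(contains(other, r) for other in empty)]


cars = ["BMW Z3", "Honda L2", "Toyota 6A"]
years = [95, 96, 97]
car_matrix = [[0, 0, 1],   # BMW Z3
              [1, 0, 0],   # Honda L2
              [0, 0, 1]]   # Toyota 6A

for x1, y1, x2, y2 in brute_force_maximal(car_matrix):
    print(cars[y1:y2 + 1], "x", years[x1:x2 + 1])
```

One of the printed rectangles is BMW Z3 with the years 95 and 96, which is the observation the slide draws from the data.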
Relation to Previous Work
Previous work: [Namaad, Hsu, Lee]; [Lui, Ku, Hsu] & [Orlowski]
Problem: find all maximal empty rectangles
• Previous work: between points in the real plane
• Our work: within a 0-1 matrix
Purpose:
• Previous work: Machine Learning, Computational Geometry
• Our work: Query Optimization
# of maximal 0-rectangles:
• Previous work: O((# 1's)²)
• Our work: O(# 0's)
Time:
• Previous work: O(# 1's log(# 1's) + # rectangles) = O(|X||Y|)
• Our work: O(# 0's) = O(|X||Y|)
Space:
• Previous work: O(|X||Y|)
• Our work: O(min(|X|, |Y|)); only two rows of the matrix kept in memory
Practical implementation:
• Previous work: intensive random memory access
• Our work: requires a single scan of the sorted data
Scalable:
• Previous work: scales badly
• Our work: scales well with respect to the # of tuples in the join, the # of maximal rectangles, and the # of values |X| & |Y|
Practical?
• Our work: IBM paid us $25,000 to patent it!
Structure of Algorithm
loop y = 1..|Y|
    loop x = 1..|X|
        • Output all maximal 0-rectangles with <x,y> as bottom-right corner
        • Maintain the loop invariant
Timing: O(1) amortized time per <x,y>
[Figure: 0-1 matrix with axes X and Y; the current position <x,y> is marked with *.]
Designing an Algorithm
[Figure: a road to school with exits along the way; labels: 0 km, 75 km, 79 km, "79 km to school".]
Define the Loop Invariant
• We have read the matrix up to <x,y> and cannot reread the matrix.
• We must output all maximal 0-rectangles with <x,y> as bottom-right corner.
• What must we remember?
[Figure: 0-1 matrix with axes X and Y; the current position <x,y> is marked with *.]
Stack of steps
[Figure: 0-1 matrix with axes X and Y; a staircase of steps (x_1, y_1), ..., (x_5, y_5) ending at <x,y>, marked with *; also labelled: (x_r, y_r), x* and y*.]
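Constructing Maximal Rectangles
Each step of the staircase gives a candidate 0-rectangle with <x,y> as bottom-right corner. A candidate is one of:
• Too Narrow (it can still be extended to the right)
• Maximal
• Too short (it can still be extended downward)
[Figure: staircase ending at <x,y>, marked with *, with the three kinds of steps labelled.]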
Constructing staircase(x,y) from staircase(x-1,y)
Case 1
[Figure: 0-1 matrix showing the staircase at <x-1,y> and the new position <x,y>, marked with *.]
Case 2
[Figure: 0-1 matrix with axes X and Y showing the staircase at <x-1,y>, the point (x_r, y_r), and the new position (x, y).]
Delete / Keep
[Figure: the same staircase with its steps labelled Too Narrow, Maximal and Too short, and bracketed into "Delete" and "Keep".]
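One plausible reading of this update, written as a stack operation, is sketched here. It is an interpretation of the slides, not the authors' code. Assumptions: rows and columns are 0-based, a step is a pair `(x_left, y_top)`, the stack holds steps with `y_top` strictly decreasing toward the top, and `y_r` is the first row of the run of 0s in column `x` that ends at row `y` (so `y_r == y + 1` when the entry at `<x,y>` is 1). The name `extend_staircase` is hypothetical.

```python
def extend_staircase(stack, x, y, y_r):
    """Turn staircase(x-1, y) into staircase(x, y) in place.

    Returns the list of deleted steps, topmost first.
    """
    deleted = []
    x_left = x
    # Steps that reach higher than column x allows are cut off (deleted).
    while stack and stack[-1][1] < y_r:
        step = stack.pop()
        deleted.append(step)
        x_left = step[0]          # the new step inherits the leftmost deleted corner
    # Add a step for column x unless <x,y> is a 1 or an equal step already exists.
    if y_r <= y and (not stack or stack[-1][1] > y_r):
        stack.append((x_left, y_r))
    return deleted


stack = [(0, 3), (2, 1)]                     # a staircase on row y = 4
print(extend_staircase(stack, 4, 4, 0), stack)   # taller column: a step is added
print(extend_staircase(stack, 5, 4, 2), stack)   # shorter column: steps are cut
print(extend_staircase(stack, 6, 4, 5), stack)   # a 1 at <x,y>: staircase is emptied
```

Under this reading, the first call corresponds to a new step being appended and the second to existing steps being deleted and replaced by a lower one.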
Constructing x* & y*
[Figure: 0-1 matrix showing the staircase ending at <x,y>, marked with *, the point (x_r, y_r), and the positions x* and y*.]
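The slide gives no formulas for x* and y*, so the following sketch rests on an assumed definition: x* is the leftmost column of the run of 0s in row y+1 that ends at column x, and y* is the topmost row of the run of 0s in column x+1 that ends at row y. With 0-based indices, a step `(x_i, y_i)` is then too narrow when `y_i >= y*`, too short when `x_i >= x*`, and maximal otherwise. All function names are hypothetical.

```python
def x_star(matrix, x, y):
    """Leftmost column of the run of 0s in row y+1 ending at column x (x+1 if none)."""
    if y + 1 >= len(matrix):
        return x + 1
    c = x + 1
    while c > 0 and matrix[y + 1][c - 1] == 0:
        c -= 1
    return c


def y_star(matrix, x, y):
    """Topmost row of the run of 0s in column x+1 ending at row y (y+1 if none)."""
    if x + 1 >= len(matrix[0]):
        return y + 1
    r = y + 1
    while r > 0 and matrix[r - 1][x + 1] == 0:
        r -= 1
    return r


def classify(steps, xs, ys):
    """Label each step (x_i, y_i); 'too narrow' wins if both conditions hold."""
    labels = {}
    for x_i, y_i in steps:
        if y_i >= ys:
            labels[(x_i, y_i)] = "too narrow"    # extends right into column x+1
        elif x_i >= xs:
            labels[(x_i, y_i)] = "too short"     # extends down into row y+1
        else:
            labels[(x_i, y_i)] = "maximal"
    return labels


matrix = [[0, 1, 0, 0],
          [1, 0, 0, 1],
          [0, 0, 0, 0],
          [0, 1, 0, 0]]
steps = [(0, 2), (1, 1), (2, 0)]             # staircase at <x,y> = <2,2>
print(classify(steps, x_star(matrix, 2, 2), y_star(matrix, 2, 2)))
```

The walks to the left and upward are for clarity only; a single-scan implementation would track these positions incrementally.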
Location of last 1 seen in each column
[Figure: 0-1 matrix with axes X and Y; for each column the most recent 1 above the current row is marked; the current position <x,y> is marked with *.]
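Keeping the location of the last 1 per column needs one integer per column. A minimal sketch, assuming 0-based rows, `-1` for "no 1 seen yet", and a hypothetical generator name `last_one_per_column`:

```python
def last_one_per_column(rows):
    """For each row read, yield a copy of last_one, where last_one[x] is the
    index of the most recent row with a 1 in column x (-1 if none so far)."""
    last_one = None
    for y, row in enumerate(rows):
        if last_one is None:
            last_one = [-1] * len(row)
        for x, value in enumerate(row):
            if value == 1:
                last_one[x] = y
        yield list(last_one)


matrix = [[0, 0, 1],
          [1, 0, 0],
          [0, 0, 1]]
for y, last_one in enumerate(last_one_per_column(matrix)):
    print(y, last_one)
```

The run of 0s in column x that ends at the current row then starts at row `last_one[x] + 1`, which is the height information the staircase update needs.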
Third Structure of Algorithm
loop y = 1..|Y|
    loop x = 1..|X|
        • Construct staircase(x,y)
        • Output all maximal 0-rectangles with <x,y> as bottom-right corner
Timing: O(1) amortized time per <x,y>
[Figure: 0-1 matrix with axes X and Y; the current position <x,y> is marked with *.]
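The loop structure above can be turned into a compact, runnable sketch. This is a reconstruction under stated assumptions, not the authors' implementation: indices are 0-based, rectangles are reported as `(x_left, y_top, x_right, y_bottom)`, an extra sentinel column of 1s closes each row, and the whole matrix is passed as a list of rows for simplicity even though only rows y and y+1 are consulted at any time. The function name is hypothetical.

```python
def maximal_zero_rectangles(matrix):
    """Return every maximal all-zero rectangle of a 0-1 matrix."""
    height = len(matrix)
    width = len(matrix[0]) if height else 0
    top = [0] * width        # top[x]: first row of the run of 0s in column x
    found = []
    for y in range(height):
        row = matrix[y]
        below = matrix[y + 1] if y + 1 < height else [1] * width
        stack = []           # steps (x_left, y_top); y_top decreases toward the top
        x_star = 0           # start of the run of 0s in row y+1 ending at column x-1
        for x in range(width + 1):           # x == width is the sentinel column
            if x < width and row[x] == 1:
                top[x] = y + 1
            y_r = top[x] if x < width else y + 1
            x_left = x
            # Deleted steps cannot extend right; they end at column x-1.
            while stack and stack[-1][1] < y_r:
                x_left, y_top = stack.pop()
                if x_left < x_star:          # cannot extend down either: maximal
                    found.append((x_left, y_top, x - 1, y))
            if y_r <= y and (not stack or stack[-1][1] > y_r):
                stack.append((x_left, y_r))
            if x < width and below[x] == 1:
                x_star = x + 1
    return found


matrix = [[0, 0, 1],
          [1, 0, 0],
          [0, 0, 1]]
for rect in maximal_zero_rectangles(matrix):
    print(rect)
```

A design note: the output test is folded into the deletion loop, so no step is inspected except when it is deleted, which is what keeps the work per `<x,y>` constant in the amortized sense.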
Timing
The only work that is not constant time per <x,y> is deleting steps from the staircase.
[Figure: staircase ending at <x,y>, marked with *, with steps labelled Too Narrow, Maximal and Too short; the steps to delete are bracketed.]
Timing
Amortized # of steps deleted (per <x,y>) = # of steps created (per <x,y>) ≤ 1
[Figure: 0-1 matrix showing the staircase at <x-1,y>, marked with *.]
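The amortized bound can be checked empirically by counting stack operations. The sketch below is an assumption-laden illustration (0-based indices, a sentinel column of 1s at the end of each row, hypothetical function name); it tracks only the `y_top` of each step because that is all the counting needs.

```python
def count_step_operations(matrix):
    """Return (steps created, steps deleted) over a full scan of the matrix."""
    width = len(matrix[0]) if matrix else 0
    top = [0] * width            # first row of the run of 0s in each column
    created = deleted = 0
    for y, row in enumerate(matrix):
        stack = []               # y_top of each step, decreasing toward the top
        for x in range(width + 1):           # x == width is the sentinel column
            if x < width and row[x] == 1:
                top[x] = y + 1
            y_r = top[x] if x < width else y + 1
            while stack and stack[-1] < y_r:
                stack.pop()
                deleted += 1
            if y_r <= y and (not stack or stack[-1] > y_r):
                stack.append(y_r)            # at most one step created per <x,y>
                created += 1
    return created, deleted


matrix = [[0, 0, 1],
          [1, 0, 0],
          [0, 0, 1]]
created, deleted = count_step_operations(matrix)
zeros = sum(row.count(0) for row in matrix)
print(created, deleted, zeros)
```

Each `<x,y>` creates at most one step and a step can be deleted only once, so the total number of deletions never exceeds the number of 0 entries.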
Number of Maximal Rectangles
# of maximal 0-rectangles:
• ≤ O((# 1's)²) [Namaad, Hsu, Lee]
• ≤ running time of alg = O(# 0's)