DMiST - Data Mining in Spatio-Temporal sets www.dmist.net
Input
Number of time steps = T
Example: T = 9
Entity: (x1,y1), (x2,y2), … , (x9,y9)
[Figure: one entity's trajectory sampled at t=0, t=1, … , t=8]
Input
Number of entities/animals/items = n
Example: n = 4 and T = 11
I1: (x11,y11), … , (x1T,y1T)
I2: (x21,y21), … , (x2T,y2T)
…
In: (xn1,yn1), … , (xnT,ynT)
[Figure labels: convergence, encounter, flock]
Example: Caribou Satellite Collar Project, Canada. Number of caribou = 15. Time steps = once a week for 8 years.
Input size?
To obtain efficient solutions we need algorithms that scale well, i.e. algorithms with limited dependency on the input size.
n: number of entities (20 to millions)
T: number of time steps (10 to thousands)
m: size of a flock (2 to 200 entities)
k: flock duration (5 to 50 time steps)
Size of input = nT
Practical algorithms: O((nT)^2)
Fast algorithms: O(nT log nT)
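To make the gap between the two running-time classes concrete, here is a small sketch that evaluates nT, (nT)^2 and nT log nT for given n and T. The function names are hypothetical, the base-2 logarithm is an arbitrary choice, and constant factors are ignored, so the numbers are only orders of magnitude.

```python
import math

def input_size(n, T):
    """Input size nT: one (x, y) position per entity per time step."""
    return n * T

def cost_estimates(n, T):
    """Rough operation counts for the two complexity classes (constants ignored)."""
    N = input_size(n, T)
    return {
        "nT": N,
        "quadratic": N ** 2,           # O((nT)^2)
        "n_log_n": N * math.log2(N),   # O(nT log nT), log base 2 assumed
    }

if __name__ == "__main__":
    # Caribou example: 15 animals, weekly samples for 8 years,
    # assuming 52 samples per year (8 * 52 = 416 time steps).
    print(cost_estimates(15, 8 * 52))
    # A large instance: one million entities, ten thousand time steps.
    print(cost_estimates(10**6, 10**4))
```

For the large instance the quadratic count is about 10^20 while nT log nT stays near 3 x 10^11, which is why only near-linear algorithms are realistic at that scale.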
Six basic patterns
• Encounter
  At least m entities pass through a circular region of radius r.
• Convergence
  At least m entities are simultaneously within a circular region of radius r.
• Flock
  At least m entities move together during a time interval of length at least s; at every point in time there is a circular region of radius r that contains all the entities.
• Recurrences
  At least m entities visit a circular region of radius r at least k times.
• Regular recurrences
• Concurrent recurrences
Members
NICTA: Joachim Gudmundsson, Thomas Wolle, Ghazi Al-Naymat
DSTO: Brenton Williams, Matthew Lowry
Uni. of Queensland: Xiaofang Zhou, Heng Tao Shen, Hoyoung Jeung
Uni. of Sydney: Sanjay Chawla
Utrecht University: Marc van Kreveld
Members
NICTA: Algorithms (apx), Computational Geometry, Data mining
DSTO: Applications, Data mining
Uni. of Sydney: Data mining, Algorithms
Uni. of Queensland: Database systems, Data mining
Utrecht University: Algorithms, GIS
Approximations
Most problems cannot be solved fast! Instead we need to approximate the solution.
Example: Convergence (radius r is given)
Find all discs of radius r that contain at least m entities.
Two options: approximate the radius, or approximate the number of entities.
[Figure: convergence example with m = 10 and a disc of radius r]
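One simple way to see what "approximate the radius" can mean: any disc of radius r that contains an entity p lies inside the disc of radius 2r centred at p (triangle inequality). The sketch below uses that fact for a single time step. It is a brute-force O(n^2) illustration, not the algorithm behind the bounds in these slides, and the function name is hypothetical.

```python
from math import dist

def approx_dense_discs(points, r, m):
    """Return every entity p whose disc of radius 2r holds at least m entities.

    If some disc of radius r contains m entities, each of those entities is
    returned, because all m of them lie within distance 2r of one another.
    The reported discs have radius 2r, i.e. a factor-2 relaxation of r.
    """
    hits = []
    for p in points:
        # Count entities (including p itself) within distance 2r of p.
        count = sum(1 for q in points if dist(p, q) <= 2 * r)
        if count >= m:
            hits.append(p)
    return hits

if __name__ == "__main__":
    positions = [(0, 0), (1, 0), (0, 1), (10, 10)]
    print(approx_dense_discs(positions, r=1, m=3))
```

The trade-off is that a reported group is only guaranteed to fit in a disc of radius 2r, not r; in exchange no candidate disc centres other than the entities themselves need to be examined.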
Convergence Is there a disc of radius r that intersects at least m lines? Is there a point that is “covered” by at least m rectangles?
Convergence
Bad news: Cannot be solved exactly faster than ~Tn^2.
Good news: 2-approximation of the number of entities in O(Tn^2/m) time.
Encounter
Is there a disc of radius r that intersects at least m entities at some point in time?
[Figure: entity paths at times t1, t2, t3, t4; a distance of 2r is marked]
Encounter - detect
Idea:
• Consider one “cylinder” C with radius 2r.
• Compute the intersections between C and the n-1 paths.
• If more than 7m paths are inside C at any time, then report “Encounter” (presumably because a disc of radius 2r can be covered by 7 discs of radius r, so one of them must contain more than m paths).
  Total time: O(n log n) / cylinder
• If not, then solve exactly.
  Observation: The total size of all subsets within C is O(mn).
  Total time: O(n log n + nm) / cylinder
Time: O(Tn^2 (log n + m)).
Flock - definition
m: flock size
k: flock duration
r: radius of disc
[Figure: a flock at time steps t1 to t4]
Flock - Problem
Problem: Find a largest flock.
Problem is NP-hard.
Problem as hard as MaxClique!
[Figure: MaxClique instance on vertices a, b, c, d, e and the corresponding entities at time steps t1 to t5]
Flock - Hardness result
Cannot be approximated in polynomial time within a factor of n^(1-ε) of the optimal (even if we approximate the radius within a factor 2).
Hopeless?
Flock
Idea: An entity in the time interval [t1,td] → a point in 2d dimensions.
[Figure: an entity's positions at t1 to t7 become one point in 14-dimensional Euclidean space]
Flock
Intersection of k (2k-2)-dimensional “cylinders”
[Figure: time steps t1 to t7]
Flock
1. For each i=k to T do
2.   For every entity E in the time interval [ti,ti+k] do
3.     transform E to a point in 2k-dimensional space
4.   Build a “Skip Quadtree”
5.   For each point do
6.     perform a 2k-dimensional range counting query.
Approximation: 3-approximation of the radius
Total time: O(Tk (n log n + (1.5)^(2k)))
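The transform-and-count idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the method from the slides: a window is taken to be k consecutive time steps, the skip quadtree is replaced by a brute-force scan, and the query range is "within 2r of the query entity at every step in the window", which relaxes the radius by a factor 2 rather than 3. All function names are hypothetical.

```python
from math import dist

def to_point(traj, start, k):
    """Flatten k consecutive (x, y) positions into one 2k-dimensional point."""
    return tuple(c for xy in traj[start:start + k] for c in xy)

def close_over_window(p, q, radius):
    """True if p and q are within `radius` of each other at every time step,
    i.e. q lies in the intersection of k cylinders around p."""
    return all(dist(p[i:i + 2], q[i:i + 2]) <= radius
               for i in range(0, len(p), 2))

def approx_flocks(trajs, m, k, r):
    """Report (window start, centre entity, member indices) for every entity
    that has at least m entities within 2r of it throughout a window."""
    T = len(trajs[0])
    found = []
    for start in range(T - k + 1):
        # Steps 2-3: one 2k-dimensional point per entity for this window.
        pts = [to_point(tr, start, k) for tr in trajs]
        # Steps 5-6: brute-force range counting stands in for the skip quadtree.
        for i, p in enumerate(pts):
            members = [j for j, q in enumerate(pts)
                       if close_over_window(p, q, 2 * r)]
            if len(members) >= m:
                found.append((start, i, members))
    return found

if __name__ == "__main__":
    together = [[(t, dy) for t in range(4)] for dy in (0.0, 0.5, 1.0)]
    loner = [[(100 - t, 50) for t in range(4)]]
    for hit in approx_flocks(together + loner, m=3, k=3, r=1.0):
        print(hit)
```

The scan costs O(n^2 k) per window, so it only serves to show the structure of the computation; a spatial index over the 2k-dimensional points is what makes the range counting fast.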
What should be reported?
• Detect if a pattern exists, report.
• Report all patterns.
• Report “largest” pattern.
Current and future research
• Advanced patterns
  • Regular recurrences
  • Hierarchical patterns
  • …
• Implement practical algorithms
• Algorithms and association rule mining
• Input data with errors?
• External memory algorithms?
• Generate test data