Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques. Pradeep Mohan * Department of Computer Science University of Minnesota, Twin-Cities Advisor: Prof. Shashi Shekhar Thesis Committee: Prof. F. Harvey, Prof. G. Karypis, Prof. J. Srivastava.
Spatio-temporal frequent pattern mining for public safety: Concepts and Techniques Pradeep Mohan* Department of Computer Science University of Minnesota, Twin-Cities Advisor: Prof. Shashi Shekhar Thesis Committee: Prof. F. Harvey, Prof. G. Karypis, Prof. J. Srivastava *Contact: mohan@cs.umn.edu
Biography • Education • Ph.D. Student, Department of Computer Science and Engineering, University of Minnesota, MN, 2007 – Present. • B.E., Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India, 2003 – 2007. • Major Projects during PhD • US DoJ/NIJ – Mapping and Analysis for Public Safety • CrimeStat .NET Libraries 1.0: Modularization of CrimeStat, a tool for the analysis of crime incidents. • Performance tuning of spatial analysis routines in CrimeStat. • CrimeStat 3.2a – 3.3: Addition of new modules for spatial analysis. • US DoD/ERDC/TEC – Cascade models for multi-scale pattern discovery • Designed new interest measures and formulated pattern mining algorithms for identifying patterns from large crime report datasets.
Thesis Related Publications • Cascading spatio-temporal pattern discovery (Chapter 2) • P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers. Cascading spatio-temporal pattern discovery: A summary of results. In Proc. of 10th SIAM International Conference on Data Mining 2010 (SDM 2010, full paper acceptance rate 20%). • P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers. Cascading spatio-temporal pattern discovery. IEEE Transactions on Knowledge and Data Engineering (TKDE) (accepted regular paper, in press, ~20% acceptance rate). • Regional co-location pattern discovery (Chapter 3) • P. Mohan, S. Shekhar, J.A. Shine, J.P. Rogers, Z. Jiang, N. Wayant. A spatial neighborhood graph based approach to regional co-location pattern discovery: Summary of results. In Proc. of 19th ACM SIGSPATIAL International Conference on Advances in GIS 2011 (ACM SIGSPATIAL 2011, full paper acceptance rate 23%). • Crime Pattern Analysis Application (Chapter 4) • S. Shekhar, P. Mohan, D. Oliver, Z. Jiang, X. Zhou. Crime pattern analysis: A spatial frequent pattern mining approach. In M. Leitner (Ed.), Crime Modeling and Mapping Using Geospatial Technologies, Springer (accepted with revisions).
Outline • Introduction • Motivation • Problem Statement • Our Approach • Future Work
Motivation: Public Safety • Crime generators and attractors • Identifying events (e.g., bar closings, football games) that lead to increased crime. Question: What and where are the frequent crime generators? • Identifying frequent crime hotspots • Law enforcement planning. Question: Where are the crime hotspots? • Courtesy: www.startribune.com • [Figure: predicting the next location of burglary.] • Predicting crime events • Predictive policing (e.g., predicting the next location of an offense, forecasting crime levels around conventions). Question: What are the crime levels 1 hour after a football game within a radius of 1 mile? • Courtesy: https://www.llnl.gov/str/September02/Hall.html • Other applications: Epidemiology
Scientific Domain: Environmental Criminology • Routine activity theory and crime triangle (Courtesy: http://www.popcenter.org/learning/60steps/index.cfm?stepnum=8) • Crime pattern theory (Courtesy: http://www.popcenter.org/learning/60steps/index.cfm?stepNum=16) • Crime event: motivated offender, vulnerable victim (available at an appropriate location and time), absence of a capable guardian. • Crime generators: places where offenders and targets come together in time and place, such as large gatherings (e.g., bars, football games). • Crime attractors: places offering many criminal opportunities, to which offenders may relocate (e.g., drug areas).
Outline • Introduction • Problem Statement • Spatio-temporal frequent pattern mining problem • Challenges • Our Approach • Future Work
Spatio-temporal frequent pattern mining problem • Given: • Spatial / spatio-temporal framework. • Crime reports with type, location and / or time. • Spatial features of interest (e.g., bars). • Interest measure threshold (Pθ). • Spatial / spatio-temporal neighbor relation. • Find: • Frequent patterns with interestingness >= Pθ. • Objective: • Minimize computation costs. • Constraints: • Correctness and completeness. • Statistical interpretation (i.e., account for autocorrelation or heterogeneity).
Illustration: Output • Cascading ST patterns (inputs: spatial neighborhood 0.5 miles, temporal neighborhood 20 mins, threshold 0.5). [Figure: event maps at times T1, T2 > T1, T3 > T2 and their aggregate (T1, T2, T3), with the output CSTP P1 over event types Bar Closing (B), Assault (A), Drunk Driving (C).] • Regional co-location patterns (inputs: spatial neighborhood 1 mile, threshold 0.25). [Figure]
Challenges • Spatio-temporal semantics • Continuity of space / time • Partial order • Conflicting requirements • Statistical interpretation • Computational scalability • Computational cost • Exponential set of candidate patterns: # patterns = exponential(# event types) • [Figure: time partitioning into T1, T2 > T1, T3 > T2 misses relationships; space partitioning of Aggregate(T1, T2, T3) misses relationships; events A.1–A.5, B.1–B.2, C.1–C.4; lattice of candidate patterns over event types A, B, C rooted at {Null}.]
Our Contributions • New spatio-temporal frequent pattern families. • Ex: Cascading ST patterns and regional co-location patterns. • Novel interest measures that guarantee a statistical interpretation and are computable in polynomial time. • Scalable algorithms based on properties of spatio-temporal data and interest measures. • Experimental evaluation using synthetic and real crime datasets.
Outline • Introduction • Problem Statement • Our Approach • Big Picture • Cascading Spatio-temporal pattern discovery • Other Frequent Pattern Families • Future Work
Cascading ST pattern (CSTP) • Input: Crime reports with location and time. • Output: CSTP • Partially ordered subsets of ST event types. • Located together in space. • Occur in stages over time. • [Figure: events at times T1, T2 > T1, T3 > T2 and Aggregate(T1, T2, T3); CSTP P1 over Bar Closing (B), Assault (A), Drunk Driving (C).]
Related Pattern Semantics: ST Data Mining • Taxonomy of spatio-temporal frequent patterns: unordered (ST co-occurrence), totally ordered (ST sequences), partially ordered (our work: cascading ST patterns), others. • ST Co-occurrence [Celik et al. 2008, Cao et al. 2006] • Designed for moving object datasets by treating trajectories as location time series. • Performs partitioning over space and time. • ST Sequence [Huang et al. 2008, Cao et al. 2005] • Totally ordered patterns modeled as a chain. • Does not account for multiply connected (e.g., nonlinear) patterns. • Misses non-linear semantics. • No ST statistical interpretation.
Interpretation Model: Directed Neighbor Graph (DNG) • Nodes: individual events • Directed edge (N1 → N2) iff Neighbor(N1, N2) and After(N2, N1) • [Figure: DNG over events A.1–A.5, B.1–B.2, C.1–C.4 at times T1, T2, T3, with CSTP P1 over Bar Closing (B), Assault (A), Drunk Driving (C).]
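The edge rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the `Event` record, the planar coordinates, the Euclidean distance, and the reading of the temporal neighborhood as a bounded "after" window are all assumptions made for the example.

```python
from math import hypot
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    eid: str   # instance label, e.g. "B.1" (hypothetical naming)
    x: float   # planar coordinates in miles (assumed)
    y: float
    t: float   # time in minutes (assumed)


def build_dng(events: List[Event], space_radius: float,
              time_window: float) -> List[Tuple[str, str]]:
    """Return directed edges (n1, n2) with Neighbor(n1, n2) and After(n2, n1)."""
    edges = []
    for n1 in events:
        for n2 in events:
            # Neighbor: within the spatial radius (Euclidean distance assumed).
            is_neighbor = hypot(n1.x - n2.x, n1.y - n2.y) <= space_radius
            # After: n2 occurs strictly later than n1, within the time window.
            is_after = 0 < n2.t - n1.t <= time_window
            if is_neighbor and is_after:
                edges.append((n1.eid, n2.eid))
    return edges


# Toy data: a bar closing followed by a nearby assault and drunk-driving event,
# plus one far-away assault that should stay unconnected.
events = [
    Event("B.1", 0.0, 0.0, 0.0),
    Event("A.1", 0.3, 0.0, 10.0),
    Event("C.1", 0.2, 0.1, 15.0),
    Event("A.2", 5.0, 5.0, 12.0),
]
edges = build_dng(events, space_radius=0.5, time_window=20.0)
```

The quadratic double loop is only for clarity; it makes no attempt at the join strategies discussed later in the deck.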
Statistical Foundation: Interest Measures • Instances of CSTP P1: (B→A, B→C, A→C) are • (B1→A1, B1→C1, A1→C1) • (B1→A3, B1→C2, A3→C2) • (B1→A1; A1→C2; B1→C2)? • Cascade Participation Ratio, CPR(CSTP, M): • Conditional probability of an instance of the CSTP in the neighborhood, given an instance of event type M. • Cascade Participation Index, CPI(CSTP): • Min(CPR(CSTP, M)) over all M in the CSTP. • [Figure: worked examples on the DNG over events A.1–A.5, B.1–B.2, C.1–C.4.]
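As a rough sketch of the two measures defined here, the code below estimates CPR as the fraction of instances of event type M that take part in at least one instance of the pattern, and CPI as the minimum CPR over the pattern's event types. The function names and the dictionary layout are hypothetical. The data uses the two P1 instances listed on the slide and event counts read off the figure labels (five A, two B, four C), so treat the numbers as illustrative only.

```python
from typing import Dict, Iterable, List


def cascade_participation_ratio(instances: List[Dict[str, str]],
                                type_counts: Dict[str, int],
                                m: str) -> float:
    """Fraction of instances of event type m that participate in the pattern."""
    participating = {inst[m] for inst in instances}
    return len(participating) / type_counts[m]


def cascade_participation_index(instances: List[Dict[str, str]],
                                type_counts: Dict[str, int],
                                pattern_types: Iterable[str]) -> float:
    """Minimum participation ratio over all event types in the pattern."""
    return min(cascade_participation_ratio(instances, type_counts, m)
               for m in pattern_types)


# Two instances of P1 (B->A, B->C, A->C), keyed by event type.
p1_instances = [
    {"B": "B.1", "A": "A.1", "C": "C.1"},
    {"B": "B.1", "A": "A.3", "C": "C.2"},
]
# Number of events per type (assumed from the figure labels).
type_counts = {"A": 5, "B": 2, "C": 4}

cpr_a = cascade_participation_ratio(p1_instances, type_counts, "A")
cpi_p1 = cascade_participation_index(p1_instances, type_counts, ["A", "B", "C"])
```

With this toy input, two of five A events participate, one of two B events, and two of four C events, so the index is determined by event type A.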
Analytical Evaluation: Statistical Interpretation • Spatial statistics: ST K-Function (Diggle et al. 1995) • Cascade Participation Index (CPI) is an upper bound to the ST K-Function per unit volume. • [Figure: example with events A.1–A.3 and B.1–B.2.]
Comparison with Related Interest Measures • [Figure: CSTP P1 over event types A, B, C and the DNG over events A.1–A.5, B.1–B.2, C.1–C.4.]
Computational Structure: CSTP Miner Algorithm • Basic idea • Initialization • for k in (1, 2, 3, …, K-1) and prevalent CSTPs found do • Generate size k candidates. • Compute CSTP instances / materialize part of the DNG. • Calculate the interest measure and select prevalent CSTPs. • end • Candidate structures by setting: • Item sets in association rule mining. • Chemical compounds / subgraphs in graph mining. • Directed acyclic graphs in CSTP mining (not part of a conventional Apriori setting).
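The level-wise loop above can be sketched as a generic generate-and-test skeleton. This is only an illustration under several assumptions: patterns are modelled as sets of directed edges between event types, candidates grow by one edge from a single prevalent parent (a fuller miner would check every sub-pattern and connectivity), and `interest` is a caller-supplied function standing in for the CPI computation. The pruning is only safe if the interest measure never increases when a pattern grows. All names are hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Edge = Tuple[str, str]        # directed edge between event types, e.g. ("B", "A")
Pattern = FrozenSet[Edge]


def is_acyclic(pattern: Pattern) -> bool:
    """Check that the directed type edges contain no cycle (peel off sources)."""
    nodes = {n for edge in pattern for n in edge}
    edges = set(pattern)
    while nodes:
        sources = {n for n in nodes if not any(v == n for _, v in edges)}
        if not sources:
            return False      # every remaining node has an incoming edge
        nodes -= sources
        edges = {(u, v) for u, v in edges if u not in sources}
    return True


def mine_levelwise(event_types: Sequence[str],
                   interest: Callable[[Pattern], float],
                   threshold: float,
                   max_edges: int) -> List[Pattern]:
    """Grow patterns one edge per level, keeping those meeting the threshold."""
    all_edges = [(u, v) for u in event_types for v in event_types if u != v]
    level = [frozenset([e]) for e in all_edges]
    prevalent: List[Pattern] = []
    k = 1
    while level and k <= max_edges:
        kept = [p for p in level if interest(p) >= threshold]
        prevalent.extend(kept)
        # Candidate generation: extend each prevalent pattern by one edge,
        # discarding anything that is no longer a directed acyclic graph.
        candidates = set()
        for p in kept:
            for e in all_edges:
                if e not in p and is_acyclic(p | {e}):
                    candidates.add(p | {e})
        level = sorted(candidates, key=sorted)
        k += 1
    return prevalent


# Toy interest function: a lookup table per edge, combined with min so that
# the measure cannot increase as a pattern grows (values are made up).
edge_score = {("B", "A"): 0.5, ("B", "C"): 0.5, ("A", "C"): 0.4}


def toy_interest(pattern: Pattern) -> float:
    return min(edge_score.get(edge, 0.0) for edge in pattern)


found = mine_levelwise(["A", "B", "C"], toy_interest, threshold=0.33, max_edges=3)
```

In a real miner the interest function would compute instances from the DNG at each level, which is where the join and filtering strategies on the following slides come in.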
CSTP Miner Algorithm: Illustration • CPI threshold = 0.33 • Spatio-temporal join • [Figure: candidate lattice rooted at {Null} over event types A, B, C, annotated with CPI values (0, 0.2, 0.4, 0.75, 0.8), alongside the DNG over events A.1–A.5, B.1–B.2, C.1–C.4.]
Computational Structure: CSTP Miner Algorithm • Fixed parameters: spatial neighborhood = 0.62 miles, temporal neighborhood = 1 hr, CPI threshold = 0.0055 • Key bottlenecks • Interest measure evaluation • Exponential pattern space • Computational strategies • Reduce irrelevant interest measure evaluation • Filtering strategies • Compute interest measure efficiently • Time Ordered Nested Loop Strategy • Space-Time Partition Join Strategy
CSTP Miner Algorithm: Interest Measure Evaluation • ST join strategies: perform each interest measure computation efficiently • Time Ordered Nested Loop (TONL) Strategy • Space-Time Partitioning (STP) Strategy • [Figure: ST join by plane sweep over space and time for events A.1–A.5, B.1–B.2, C.1–C.4; annotations "= volume of ST neighborhood" and "# edges = 13".]
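The slide names the Time Ordered Nested Loop strategy without spelling it out. One plausible reading, sketched below, is to sort events by time so the inner loop can stop as soon as the temporal window is exceeded. This is an assumption about the strategy, not the thesis code. The tuple layout `(eid, x, y, t)` and the Euclidean distance are likewise chosen only for the example.

```python
from math import hypot
from typing import Iterable, List, Tuple

EventTuple = Tuple[str, float, float, float]   # (eid, x, y, t), hypothetical layout


def tonl_join(events: Iterable[EventTuple], space_radius: float,
              time_window: float) -> List[Tuple[str, str]]:
    """Find (earlier, later) event pairs within the space and time neighborhood."""
    ordered = sorted(events, key=lambda e: e[3])   # sort by time
    edges = []
    for i, (id1, x1, y1, t1) in enumerate(ordered):
        for id2, x2, y2, t2 in ordered[i + 1:]:
            if t2 - t1 > time_window:
                break   # all later events are even further away in time
            # Require strictly later events; simultaneous events are skipped.
            if t2 > t1 and hypot(x1 - x2, y1 - y2) <= space_radius:
                edges.append((id1, id2))
    return edges


# Toy data, deliberately given out of time order.
sample = [
    ("C.1", 0.2, 0.1, 15.0),
    ("A.2", 5.0, 5.0, 12.0),
    ("B.1", 0.0, 0.0, 0.0),
    ("A.1", 0.3, 0.0, 10.0),
]
pairs = tonl_join(sample, space_radius=0.5, time_window=20.0)
```

The early `break` is what distinguishes this from a plain nested loop: the cost of the inner loop depends on how many events fall inside the time window rather than on the dataset size.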
CSTP Miner Algorithm: Filtering Strategies • Multi-resolution ST filter: summarizing on a coarser neighborhood yields compression in most cases. • CPI threshold = 0.33 • [Figure: actual relation vs. coarse relation over space and time.]
Experimental Evaluation: Experiment Setup • Goals • 1. Compare different design decisions of the CSTPM algorithm (performance: run-time). • 2. Test the effect of parameters on performance: number of event types, dataset size, clumpiness degree. • Experiment platform: CPU 3.2 GHz, RAM 32 GB, OS Linux, Matlab 7.9.
Experimental Evaluation: Datasets • Real data (Lincoln, NE dataset) • Data size: 5 datasets • Drawn by increments of 2 months • 5000 – 33000 instances • Event types: • Drawn by increments of 5 event types • 5 – 25 event types. • Synthetic data • Data size: 5 datasets • 5000 – 26000 instances • Event types: • 5 – 25 event types. • Clumpiness degree: • 5 – 25 instances per event type per cell.
Experimental Evaluation: Join strategy performance • Question: What is the effect of dataset size on the performance of the join strategies? • Fixed parameters: real data (CPI = 0.15, 0.31 miles, 10 days); synthetic data (0.5, 25, 25). • Trends: ST partitioning improves performance by a factor of 5-10 on synthetic data and by a factor of 3 on real data.
Lincoln, NE crime dataset: Case study • Is bar closing a generator for crime-related CSTPs? • Questions • Is bar closing a crime generator? • Are there other generators (e.g., Saturday nights)? • Observation: Crime peaks around bar closing! • K-S test: Saturday night is significantly different from normal-day bar closing (p-value = 1.249 × 10^-7, K = 0.41). • [Figure: bar locations in Lincoln, NE.]
Outline • Introduction • Problem Statement • Our Approach • Big Picture • Cascading Spatio-temporal pattern discovery • Other Frequent Pattern Families • Future Work
Regional co-location patterns (RCP) • Input: Spatial features, crime reports. • Output: RCP (e.g., <(Bar, Assaults), Downtown>) • Subsets of spatial features. • Frequently located in certain regions of a study area.
Statistical Foundation: Accounting for Heterogeneity • Regional Participation Ratio: conditional probability of observing a pattern instance within a locality, given an instance of a feature within that locality. • Regional Participation Index. • Both quantify the local fraction participating in a relationship. • [Figures: examples of the regional participation ratio and the regional participation index.]
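Read literally, the conditional-probability wording above suggests the following form. This is a sketch of that reading, with the notation ($P$ for the set of features in the pattern, $R$ for the region, $f$ for a feature) introduced here for illustration. Taking the index as a minimum over features is an assumption made by analogy with the Cascade Participation Index; the published definition may differ in its details.

```latex
% Regional participation ratio of feature f for pattern P in region R
% (literal reading of the slide; notation is assumed):
\mathrm{RPR}(\langle P, R \rangle, f) =
  \frac{\left|\{\, \text{instances of } f \text{ in } R
        \text{ that participate in an instance of } P \text{ within } R \,\}\right|}
       {\left|\{\, \text{instances of } f \text{ in } R \,\}\right|}

% Regional participation index (assumed analogous to CPI):
\mathrm{RPI}(\langle P, R \rangle) = \min_{f \in P} \mathrm{RPR}(\langle P, R \rangle, f)
```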
Conclusions • Proposed SFPM techniques (e.g., cascading ST patterns and regional co-location patterns) honor ST semantics (e.g., partial order, continuity). • Interest measures achieve a balance between statistical interpretation and computational scalability. • Algorithmic strategies exploiting properties of ST data (e.g., multiresolution filter) and properties of interest measures enhance computational savings.
Future Work – Short and Medium Term • [Table; legend: X = unexplored.]
Future Work – Long Term • Exploring interpretation of discovered patterns by law enforcement. • ST predictive analytics, predictive models based on SFPM, and predictive policing. • Towards geo-social analytics for policing (e.g., criminal flash mobs, gangs, groups of offenders committing crimes). • New ST frequent pattern mining algorithms based on depth-first graph enumeration. • ST frequent pattern mining techniques that account for patron demographic levels. • Explore evaluation of choropleth maps via ST frequent pattern mining.
Acknowledgment • Members of the Spatial Database and Data Mining Research Group, University of Minnesota, Twin-Cities. • This work was supported by grants from the U.S. Army, NGA and U.S. DOJ. • Advisor: Prof. Shashi Shekhar, Computer Science, University of Minnesota. • Thesis committee. • U.S. DOJ – National Institute of Justice: Mr. Ronald E. Wilson (Program Manager, Mapping and Analysis for Public Safety), Dr. Ned Levine (Ned Levine and Associates, CrimeStat Program). • U.S. Army – Topographic Engineering Center: Dr. J.A. Shine (Mathematician and Statistician, Geospatial Research and Engineering Division) and Dr. J.P. Rogers (Additional Director, Topographic Engineering Center). • Mr. Tom Casady, Public Safety Director (formerly Lincoln Police Chief), Lincoln, NE, USA. • Thank you for your questions, comments and attention!