A K-Main Routes Approach to Spatial Network Activity Summarization
Authors: Dev Oliver, Shashi Shekhar, James M. Kang, Renee Bousselaire, Abdussalam Bannur
Outline
• Motivation
• Problem Statement
• Contributions
• Validation
  • Analytical
  • Experimental
  • Case Studies
• Summary and Future Work
Motivation: Crime Analysis (application domain)
• Crime hotspot
  • Area of concentrated crime
• “Most clustering algorithms will show areas of concentration even when a line is the most appropriate dimension.” (National Institute of Justice**)
[Figure: crime hotspot maps labeled Street, Place, and Neighborhood. Source: Star Tribune, January 26, 2011]
**J. E. Eck et al. Mapping Crime: Understanding Hot Spots. US National Inst. of Justice (http://www.ncjrs.gov/pdffiles1/nij/209393.pdf), 2005.
Examples of Linear Patterns
• [Figure: Linear patterns resulting from deforestation in Brazil. Source: http://en.wikipedia.org/wiki/Deforestation_in_Brazil]
• [Figure: Linear patterns of crime in a major US city]
Motivation: Environmental Criminology (scientific domain)
• Spatial theories in Environmental Criminology
  • Routine Activity Theory [1]
    • Crime location related to criminal’s frequently visited areas
  • Crime Pattern Theory [2]
    • Based on spatial model
      • Nodes (e.g., home, work, entertainment)
      • Paths (e.g., routes between nodes)
      • Edges
    • Crime locations close to edges
      • Near criminal’s activity boundaries, where residents may not recognize him/her
  Source: Rossmo, Kim (2000). Geographic Profiling. Boca Raton, FL: CRC Press. http://www.popcenter.org/learning/60steps/index.cfm?stepNum=16
• Network based summarization adds value to Environmental Criminology
  • Assist with large scale verification of real-world data matching theories
  • Opportunities to develop hypotheses for new theory formulation
[1] L. E. Cohen et al., Social change and crime rate trends: A routine activity approach, American Sociological Review, 1979.
[2] P. L. Brantingham et al., Environmental Criminology, Waveland Press, 1990.
Other Domains
• Disaster Relief
• Accident Analysis and Prevention
Key Concepts
• Activity
  • Object of interest located at a node or edge
• Summary path
  • A path chosen by KMR to summarize activities
• Activity coverage
  • Total number of activities on a path or set of paths
• Active node
  • A node having n ≥ 1 activities or joined by an edge having n ≥ 1 activities, e.g., A, B, C, D, E
• Inactive node
  • A node having n = 0 activities and joined by edges all having n = 0 activities, e.g., F
• Active node ratio
  • Total # active nodes / Total # nodes, e.g., 5/6
[Figure: example network with nodes A to F; each edge has a weight of 1]
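The definitions of active node and active node ratio can be checked with a few lines of Python. This is only a sketch: the six-node network, its edges, and the activity counts below are hypothetical and merely chosen so that A to E come out active and F inactive, as in the slide's 5/6 example.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def active_nodes(
    nodes: Iterable[str],
    edges: Iterable[Edge],
    node_activities: Dict[str, int],
    edge_activities: Dict[Edge, int],
) -> Set[str]:
    """A node is active if it has >= 1 activity or touches an edge with >= 1 activity."""
    active = {n for n in nodes if node_activities.get(n, 0) >= 1}
    for u, v in edges:
        if edge_activities.get((u, v), 0) >= 1:
            active.update((u, v))
    return active


def active_node_ratio(
    nodes: List[str],
    edges: List[Edge],
    node_activities: Dict[str, int],
    edge_activities: Dict[Edge, int],
) -> float:
    """Total number of active nodes divided by total number of nodes."""
    return len(active_nodes(nodes, edges, node_activities, edge_activities)) / len(nodes)


# Hypothetical network (not taken from the slides).
NODES = ["A", "B", "C", "D", "E", "F"]
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]
NODE_ACTIVITIES = {"C": 1}
EDGE_ACTIVITIES = {("A", "B"): 2, ("B", "C"): 1, ("D", "E"): 3}

if __name__ == "__main__":
    print(sorted(active_nodes(NODES, EDGES, NODE_ACTIVITIES, EDGE_ACTIVITIES)))
    print(active_node_ratio(NODES, EDGES, NODE_ACTIVITIES, EDGE_ACTIVITIES))
```

Node F is inactive here because it has no activities of its own and its only edge (E, F) has none either.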
Problem Statement
• Given
  • A spatial network G = (N, E)
  • A set of activities A and their locations (e.g., a node or edge)
  • A set of paths P
  • k (number of routes)
  • Edge weights
• Find
  • A cardinality-k subset P′ of P, i.e., a subset P′ ⊆ P with |P′| = k
• Objective
  • Maximize the activity coverage (AC) by P′
• Constraints
  • 1 ≤ k ≤ |P|
[Figure: example with k = 2, edge weights of 1, and P = the set of shortest paths]
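For very small inputs, the problem statement can be solved exactly by enumerating every k-subset of P. The sketch below does that under two simplifying assumptions that are not stated on the slide: activities sit on edges only, and an activity on an edge shared by two chosen paths is counted once. The paths and activity counts are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence, Set, Tuple

Path = Tuple[str, ...]
Edge = FrozenSet[str]


def edges_of(path: Path) -> Set[Edge]:
    """Undirected edges traversed by a path given as a node sequence."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def activity_coverage(paths: Sequence[Path], edge_activities: Dict[Edge, int]) -> int:
    """Number of activities on the union of the paths' edges (shared edges count once)."""
    covered: Set[Edge] = set()
    for path in paths:
        covered |= edges_of(path)
    return sum(edge_activities.get(edge, 0) for edge in covered)


def best_k_subset(
    paths: Sequence[Path], edge_activities: Dict[Edge, int], k: int
) -> Tuple[Path, ...]:
    """Exhaustive search over all k-subsets of the candidate paths."""
    if not 1 <= k <= len(paths):
        raise ValueError("k must satisfy 1 <= k <= |P|")
    return max(
        combinations(paths, k),
        key=lambda subset: activity_coverage(subset, edge_activities),
    )


# Hypothetical candidate paths and activity counts.
PATHS = [
    ("A", "B", "C", "D"),
    ("E", "F", "G", "H"),
    ("A", "E", "F"),
    ("C", "D", "H"),
]
EDGE_ACTIVITIES = {
    frozenset(("A", "B")): 2,
    frozenset(("B", "C")): 2,
    frozenset(("C", "D")): 2,
    frozenset(("E", "F")): 1,
    frozenset(("F", "G")): 1,
    frozenset(("G", "H")): 3,
    frozenset(("D", "H")): 1,
}

if __name__ == "__main__":
    chosen = best_k_subset(PATHS, EDGE_ACTIVITIES, k=2)
    print(chosen, activity_coverage(chosen, EDGE_ACTIVITIES))
```

The enumeration is only practical for toy inputs, since the number of k-subsets grows very quickly with |P|.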
Challenges
• Measures of interestingness
  • Activity coverage, average distance, etc.
• Computational complexity
  • Choose(N, 2) paths, given N nodes
  • Exponential number of k-subsets of paths
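To get a feel for these counts, the short sketch below evaluates them with Python's `math.comb`. It assumes one candidate path per unordered node pair, which is a simplification; the function names are illustrative only.

```python
from math import comb


def num_candidate_paths(num_nodes: int) -> int:
    """Choose(N, 2): one candidate path per unordered pair of nodes (assumption)."""
    return comb(num_nodes, 2)


def num_k_subsets(num_paths: int, k: int) -> int:
    """Number of ways to pick k summary paths out of the candidate paths."""
    return comb(num_paths, k)


if __name__ == "__main__":
    for n in (6, 100, 1000):
        paths = num_candidate_paths(n)
        print(n, paths, num_k_subsets(paths, 2), num_k_subsets(paths, 5))
```

Even with k = 2, a 1000-node network yields 499,500 candidate paths and on the order of 10^11 pairs of paths, which is why exhaustive search is impractical.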
Related Work
• Network summarization by grouping/clustering
  • Zero or one routes
    • Clumping (Okabe), e.g., NT-VCM (Shiode)
    • Max. subgraph, e.g., path, tree (Buchin)
  • Multiple routes
    • Our work
Contributions
• K-Main Routes (KMR) algorithm
  • Finds a set of k routes to group activities
  • New design decisions added
    • Network Voronoi activity assignment
    • Divide and Conquer summary path recomputation
• Spatial network activity summarization is shown to be NP-complete
• Analytically demonstrate correctness of design decisions and show cost analysis
• Experimental evaluation of the various algorithms
  • Performance evaluated using synthetic and real world datasets
  • Case study comparing KMR with geometry based summarization
K-Main Routes (KMR) Algorithm
• K-Main Routes Algorithm
  • Select k paths as initial summary paths
  • Repeat
    • Form k clusters by assigning each activity to its closest summary path
    • Recompute summary path of each cluster
  • Until summary paths do not change
• Design Decisions
  • Inactive node pruning
  • Network Voronoi activity assignment
  • Divide and Conquer summary path recomputation
[Figure: example with P = the set of shortest paths, k = 2]
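The loop above can be sketched in Python as follows. This is a naive illustration rather than the authors' implementation: it assumes unit edge weights, a connected network, and activities located on nodes only, and it uses brute-force assignment and recomputation without any of the three design decisions. The function names and the small tree-shaped network are hypothetical.

```python
from collections import deque
from itertools import combinations


def hop_distances(adj, sources):
    """Multi-source BFS: hop distance from every node to the nearest source node."""
    dist = {s: 0 for s in sources}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def shortest_path(adj, src, dst):
    """BFS shortest path between two nodes, returned as a tuple of nodes."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return tuple(reversed(path))


def kmr(adj, activities, initial_paths, max_iter=100):
    """Alternate activity assignment and summary path recomputation until stable."""
    summary = list(initial_paths)
    clusters = [[] for _ in summary]
    for _ in range(max_iter):
        # Step 1: assign each active node to its closest summary path.
        dists = [hop_distances(adj, path) for path in summary]
        clusters = [[] for _ in summary]
        for node, count in activities.items():
            if count > 0:
                nearest = min(range(len(summary)), key=lambda i: dists[i][node])
                clusters[nearest].append(node)
        # Step 2: per cluster, keep the shortest path covering the most activities.
        new_summary = []
        for old_path, members in zip(summary, clusters):
            def coverage(path, members=frozenset(members)):
                return sum(activities[n] for n in path if n in members)
            candidates = [
                shortest_path(adj, u, v) for u, v in combinations(sorted(members), 2)
            ]
            new_summary.append(max(candidates, key=coverage, default=old_path))
        if new_summary == summary:
            break
        summary = new_summary
    return summary, clusters


# Hypothetical network: main street A-B-C-D-E with a side street C-F-G.
ADJ = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D", "F"],
    "D": ["C", "E"], "E": ["D"], "F": ["C", "G"], "G": ["F"],
}
ACTIVITIES = {node: 1 for node in "ABCDEFG"}

if __name__ == "__main__":
    print(kmr(ADJ, ACTIVITIES, [("A", "B"), ("F", "G")]))
```

On this toy input the loop stabilizes with one summary path along the main street and one along the side street. Ties in distance go to the lower-indexed summary path, which is an arbitrary choice made for this sketch.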
Design Decision: Inactive Node Pruning
• Only consider paths between active nodes
  • Optimal solution will still be in this set
• Given the set of shortest paths
  • 20 shortest paths calculated and stored versus 30
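The 20 versus 30 figure is consistent with counting one shortest path per ordered pair of distinct nodes in a six-node network with five active nodes, as in the Key Concepts example. That reading is an assumption; the sketch below simply reproduces the arithmetic.

```python
from itertools import permutations


def candidate_pairs(nodes):
    """All ordered (source, destination) pairs of distinct nodes."""
    return list(permutations(sorted(nodes), 2))


# Assumed example: nodes A to F, of which only F is inactive.
ALL_NODES = set("ABCDEF")
ACTIVE_NODES = set("ABCDE")

if __name__ == "__main__":
    print(len(candidate_pairs(ALL_NODES)))     # pairs without pruning
    print(len(candidate_pairs(ACTIVE_NODES)))  # pairs after pruning inactive nodes
```

With n nodes there are n(n - 1) ordered pairs, so dropping a single inactive node from six removes 10 of the 30 paths.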
Design Decision: Network Voronoi (NV) Activity Assignment
• Goals
  • Form k clusters by assigning each activity to its closest summary path
  • Improve execution time of current assignment strategy
• Example (execution trace): see the next slides
K-Main Routes Algorithm (current strategy)
  Select k shortest paths as initial summary paths
  Repeat
    • Form k clusters by assigning each activity to its closest summary path
    • Recompute summary path of each cluster
  Until summary paths do not change
K-Main Routes Algorithm (with Network Voronoi activity assignment)
  Select k shortest paths as initial summary paths
  Repeat
    • Network Voronoi Activity Assignment
    • Recompute summary path of each cluster
  Until summary paths do not change
Design Decision: Network Voronoi (NV) Activity Assignment (execution trace)
[Figure sequence, eight slides: a network with nodes A to H, activities 1 to 10, and a virtual node X. Each slide shows the Open and Closed lists and the current distance label of every node (0 or ∞ at the start), with comparisons such as "1 < 0?" and "2 < 1?" marking attempted distance updates. Legend: Activity, Summary Path, Active Node, Inactive Node, Virtual Node, Closed Node, Edge weight = 1, Edge weight = 0.]
Design Decision: Network Voronoi (NV) Activity Assignment
• Network Voronoi Activity Assignment algorithm
Input: Graph G = (N, E), a set of activities A, a set of k summary paths S
Output: A set of k clusters formed by assigning every ai ∈ A to one si ∈ S such that dist(ai, si) ≤ dist(ai, sj) for all sj ∈ S, sj ≠ si
Open ← all nodes ∈ S, Closed ← Ø
Tnodes ← all nodes ∈ S
Tactivities ← activities on si ∈ S
repeat
  nc ← next node ∈ Open
  remove nc from Open
  Closed ← Closed ∪ {nc}
  X ← neighbors of nc
  foreach xi ∈ X
    if xi ∉ Tnodes and xi ∉ Closed
      Tnodes ← Tnodes ∪ {xi}
      xi.prev ← nc
      xi.dist ← dist(xi, nc) + nc.dist
      xi.sp ← nc.sp
    else if xi ∈ Tnodes
      update xi if new dist < xi.dist
    if xi ∉ Open
      Open ← Open ∪ {xi}
    Y ← activities on edge {nc, xi}
    foreach yi ∈ Y
      if yi ∉ Tactivities
        Tactivities ← Tactivities ∪ {yi}
        yi.prev ← nc
        yi.dist ← xi.dist
        yi.sp ← xi.sp
      else
        update yi if new dist < yi.dist
until all active nodes ∈ Closed
return currentClusters
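The idea behind this pseudocode, a single shortest-path expansion that starts from all summary path nodes at once, can be sketched with a multi-source Dijkstra search. The sketch is not the authors' code. It assumes a connected network with non-negative edge weights, activities located on edges, and an activity's distance taken as that of its nearer endpoint; `build_adj`, `nv_assign`, and the ladder-shaped network are hypothetical.

```python
import heapq


def build_adj(weighted_edges):
    """Undirected adjacency map {node: {neighbor: weight}}."""
    adj = {}
    for u, v, w in weighted_edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def nv_assign(adj, summary_paths, edge_activities):
    """Assign each activity to the summary path whose network Voronoi cell contains it.

    Returns {activity id: index of the assigned summary path}.
    """
    dist, owner, heap = {}, {}, []
    # Seeding every summary path node at distance 0 plays the role of the
    # virtual node joined to them by zero-weight edges.
    for idx, path in enumerate(summary_paths):
        for node in path:
            if node not in dist:
                dist[node], owner[node] = 0, idx
                heap.append((0, node))
    heapq.heapify(heap)
    closed = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in closed:
            continue  # stale heap entry
        closed.add(u)
        for v, w in adj[u].items():
            if v not in dist or d + w < dist[v]:
                dist[v], owner[v] = d + w, owner[u]
                heapq.heappush(heap, (d + w, v))
    assignment = {}
    for (u, v), activity_ids in edge_activities.items():
        # Assumption: an activity inherits the label of its nearer endpoint.
        nearer = min((u, v), key=lambda n: (dist[n], owner[n]))
        for activity in activity_ids:
            assignment[activity] = owner[nearer]
    return assignment


# Hypothetical ladder network with unit weights: A-B-C-D over E-F-G-H.
ADJ = build_adj([
    ("A", "B", 1), ("B", "C", 1), ("C", "D", 1),
    ("E", "F", 1), ("F", "G", 1), ("G", "H", 1),
    ("A", "E", 1), ("B", "F", 1), ("C", "G", 1), ("D", "H", 1),
])
SUMMARY_PATHS = [("A", "E"), ("D", "H")]
EDGE_ACTIVITIES = {
    ("A", "B"): [1, 2], ("B", "C"): [3], ("C", "D"): [4], ("E", "F"): [5],
    ("F", "G"): [6], ("G", "H"): [7], ("A", "E"): [8], ("D", "H"): [9],
}

if __name__ == "__main__":
    print(nv_assign(ADJ, SUMMARY_PATHS, EDGE_ACTIVITIES))
```

One expansion labels every node with its nearest summary path, so the cost no longer grows with a separate distance computation per activity and per summary path. Ties go to the lower-indexed summary path, an arbitrary choice for this sketch.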
Design Decision: Divide and Conquer Summary Path Recomputation
• Goals
  • Recompute the summary path of each cluster
  • Improve execution time of current recomputation strategy
• Example (execution trace): see the next slide
K-Main Routes Algorithm (current strategy)
  Select k shortest paths as initial summary paths
  Repeat
    • Network Voronoi Activity Assignment
    • Recompute summary path of each cluster
  Until summary paths do not change
K-Main Routes Algorithm (with Divide and Conquer summary path recomputation)
  Select k shortest paths as initial summary paths
  Repeat
    • Network Voronoi Activity Assignment
    • Divide and Conquer Summary Path Recomputation
  Until summary paths do not change
Design Decision: Divide and Conquer Summary Path Recomputation
• Summary Path Recomputation Algorithm
Input: Graph G = (N, E), a set of clusters C
Output: A set of summary paths S, where each si ∈ S has max coverage for ci ∈ C and si ∈ ci
nextClusters ← Ø
foreach ci ∈ C
  X ← active nodes of ci
  maxP ← Ø
  foreach xi ∈ X
    foreach xj ∈ X
      if (i ≠ j)
        cP ← getSP(xi, xj)
        if (maxP = Ø)
          maxP ← cP
        if (maxP.activities < cP.activities)
          maxP ← cP
  if (maxP ≠ ci.summaryPath)
    nextClusters ← nextClusters ∪ {maxP}
  else
    nextClusters ← nextClusters ∪ {ci.summaryPath}
return nextClusters
[Figure: example network with nodes A to H, activities 1 to 10, and one highlighted cluster; edge weights are 1. Legend: Activity, Active Node, Inactive Node, Summary Path, Cluster.]
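The recomputation step can be sketched in Python as below. It is an illustration under stated assumptions, not the authors' code: edge weights are 1, the network is connected and undirected (so unordered node pairs suffice, where the pseudocode loops over ordered pairs), activities are counted per edge within each cluster, and `get_sp` is a plain BFS standing in for the stored shortest-path lookup. The cluster dictionaries and the network are hypothetical.

```python
from collections import deque
from itertools import combinations


def get_sp(adj, src, dst):
    """BFS shortest path (unit weights), returned as a tuple of nodes."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return tuple(reversed(path))


def path_activities(path, edge_counts):
    """Number of the cluster's activities lying on the edges of the path."""
    return sum(edge_counts.get(frozenset(e), 0) for e in zip(path, path[1:]))


def recompute_summary_paths(adj, clusters):
    """For each cluster, pick the shortest path between two of its active
    nodes that covers the most of the cluster's activities."""
    next_paths = []
    for cluster in clusters:
        counts = cluster["edge_counts"]
        best = None
        for u, v in combinations(sorted(cluster["active_nodes"]), 2):
            candidate = get_sp(adj, u, v)
            if best is None or path_activities(best, counts) < path_activities(candidate, counts):
                best = candidate
        # Keep the old summary path if the cluster has fewer than two active nodes.
        next_paths.append(best if best is not None else cluster["summary_path"])
    return next_paths


# Hypothetical network: main street A-B-C-D-E with a side street C-F-G.
ADJ = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D", "F"],
    "D": ["C", "E"], "E": ["D"], "F": ["C", "G"], "G": ["F"],
}
CLUSTERS = [
    {
        "summary_path": ("A", "B"),
        "active_nodes": {"A", "B", "C", "D"},
        "edge_counts": {
            frozenset(("A", "B")): 2,
            frozenset(("B", "C")): 1,
            frozenset(("C", "D")): 3,
        },
    },
    {
        "summary_path": ("F", "G"),
        "active_nodes": {"C", "F", "G"},
        "edge_counts": {frozenset(("C", "F")): 1, frozenset(("F", "G")): 2},
    },
]

if __name__ == "__main__":
    print(recompute_summary_paths(ADJ, CLUSTERS))
```

Because each cluster only examines pairs of its own active nodes, the number of candidate paths per cluster is much smaller than the number of pairs over all active nodes in the network.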
Validation
• Analytical
  • Cost analysis explaining computational savings
• Experimental
  • Comparative analysis of KMR with various design decisions
  • Performed on real and synthetic data
  • Network Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
  • Savings increase with number of nodes, routes, activities, and active node ratio
• Case studies
  • Qualitatively show the usefulness of network based summarization on crime data
Analytical Evaluation: Computational Analysis
• KMR Execution Time = Number of Iterations × (Activity Assignment Cost + Summary Path Recomputation Cost)
• $T_{KMR} = I \times \left( [K \times |A| \times cost(a_i, c_i)] + [K \times d_c \times |N|^2] \right)$
• $T_{KMR\_I} = I \times \left( [K \times |A| \times cost(a_i, c_i)] + [K \times d_c \times (|N| \times r)^2] \right)$
• $T_{KMR\_IAS} = I \times \left( [|E| + |N| \times \log |N|] + [K \times d_c \times (|N|/K \times r)^2] \right)$
where
  I = number of iterations
  K = number of clusters
  A = set of activities
  cost(a_i, c_i) = cost of calculating the distance between activity a_i and cluster c_i
  d_c = cost of looking up a path
  N = set of nodes
  E = set of edges
  r = active node ratio, 0 ≤ r ≤ 1
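To see how the three cost expressions compare, they can be evaluated for concrete parameter values. The sketch below is illustrative only: the parameter values are hypothetical, costs are in abstract units, and the base-2 logarithm is an assumption because the slide does not state a base.

```python
from math import log2


def t_kmr(I, K, A, cost_ac, d_c, N):
    """Cost expression for plain KMR."""
    return I * ((K * A * cost_ac) + (K * d_c * N ** 2))


def t_kmr_i(I, K, A, cost_ac, d_c, N, r):
    """Cost expression with inactive node pruning (only r * N nodes remain)."""
    return I * ((K * A * cost_ac) + (K * d_c * (N * r) ** 2))


def t_kmr_ias(I, K, d_c, N, E, r):
    """Cost expression labeled KMR_IAS on the slide (log base 2 assumed)."""
    return I * ((E + N * log2(N)) + (K * d_c * (N / K * r) ** 2))


# Hypothetical parameter values, chosen only for illustration.
PARAMS = dict(I=10, K=2, A=1200, cost_ac=1, d_c=1, N=1000, E=2500, r=0.2)

if __name__ == "__main__":
    p = PARAMS
    print(t_kmr(p["I"], p["K"], p["A"], p["cost_ac"], p["d_c"], p["N"]))
    print(t_kmr_i(p["I"], p["K"], p["A"], p["cost_ac"], p["d_c"], p["N"], p["r"]))
    print(t_kmr_ias(p["I"], p["K"], p["d_c"], p["N"], p["E"], p["r"]))
```

For these values the recomputation term dominates plain KMR, and each refinement shrinks it: pruning scales it by r², and splitting the work across K clusters divides the node count inside the square by K.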
Experimental Evaluation
• Goal: Comparative analysis
• Candidates: KMR with various design decisions
  • KMR_I: KMR with inactive node pruning
  • KMR_IV: KMR with inactive node pruning and Network Voronoi activity assignment
  • KMR_ID: KMR with Divide and Conquer summary path recomputation
  • KMR_IVD: KMR with all three design decisions
• Measure: CPU time (Unix time command)
• Platform: Mac Pro, 2 x Xeon Quad Core 2.26 GHz, 16 GB RAM
• Variables: #Nodes, #Routes, #Activities, Active Node Ratio
• Fixed parameters: unit edge length
• Datasets: Synthetic and Real (Haiti Earthquake)
[Figure: experimental setup showing the variables, the synthetic and real datasets, the candidates, a Java-based simulator, the measures, and the analysis step]
Data Description and Characteristics
• Synthetic data
  • 2010 Census TIGER/Line® Shapefiles used for road network
  • Activities randomly assigned to each edge
• Real-world data: Haiti data set
  • Geospatial and temporal dataset describing recent events post-disaster
  • Dataset collected from Jan 12, 2010 to March 23, 2010
  • 1,677 records
• Characteristics
  • Attributes
    • Incident Title (e.g., “Food, Water, Tents needed…”)
    • Incident Date and Time
    • Location (City, port name)
    • Category (numeric category)
    • Latitude/Longitude
  • Sources
    • Crisis Map of Haiti: http://haiti.ushahidi.com/
    • OpenStreetMap: http://www.openstreetmap.org/
Effect of Number of Nodes
• Synthetic data set: Number of Activities = 1200, Active Node Ratio = 0.2, K = 2
• Real data set: Number of Activities = 1206, Active Node Ratio = 0.1998, K = 2
• Trends:
  • Network Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
  • Savings increase with number of nodes
Effect of Number of Routes, K
• Synthetic data set: Number of Nodes = 1000, Number of Activities = 1200, Active Node Ratio = 0.2
• Real data set: Number of Nodes = 1000, Number of Activities = 202, Active Node Ratio = 0.219
• Trends:
  • Network Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
  • Savings increase with number of routes
Effect of Number of Activities
• Synthetic data set: Number of Nodes = 1000, Active Node Ratio = 0.2, K = 2
• Trends:
  • Network Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
  • Savings increase with number of activities
Effect of Active Node Ratio
• Synthetic data set: Number of Nodes = 1000, Number of Activities = 1200, K = 2
• Trends:
  • Network Voronoi activity assignment and divide and conquer summary path recomputation save computational costs
  • Savings increase with active node ratio
Case Study: Crime Analysis
[Figure panels: Input (a set of crime incidents, k = 5); KMR Output; Crimestat K-Means (Euclidean distance); Crimestat K-Means (Network distance)]
Summary
• Spatial network activity summarization was shown to be NP-complete
• K-Main Routes (KMR) algorithm and its design decisions described
  • Inactive node pruning
  • Network Voronoi activity assignment
  • Divide and Conquer summary path recomputation
• Analytically demonstrated correctness of design decisions and showed a cost analysis
• Experimental evaluation
  • Performance evaluated using synthetic and real world datasets
  • Case study comparing KMR with geometry based summarization
Acknowledgements
• Members of the Spatial Database and Spatial Data Mining Research Group, University of Minnesota, Twin Cities
• This work was supported by grants from USARMY and USDOD
• Thank you for your time! Any questions or comments?