
A K-Main Routes Approach to Spatial Network Activity Summarization

Authors: Dev Oliver, Shashi Shekhar, James M. Kang, Renee Bousselaire, Abdussalam Bannur.


Presentation Transcript


  1. A K-Main Routes Approach to Spatial Network Activity Summarization • Authors: Dev Oliver, Shashi Shekhar, James M. Kang, Renee Bousselaire, Abdussalam Bannur

  2. Outline • Motivation • Problem Statement • Contributions • Validation • Analytical • Experimental • Case Studies • Summary and Future Work

  3. Motivation: Crime Analysis (application domain) • Crime hotspot • Area of concentrated crime (e.g., street, place, neighborhood) • “Most clustering algorithms will show areas of concentration even when a line is the most appropriate dimension.” – National Institute of Justice** • Figure: Star Tribune, January 26, 2011 **J. E. Eck et al. Mapping Crime: Understanding Hot Spots. US National Institute of Justice (http://www.ncjrs.gov/pdffiles1/nij/209393.pdf), 2005.

  4. Examples of Linear Patterns • Linear patterns resulting from deforestation in Brazil (http://en.wikipedia.org/wiki/Deforestation_in_Brazil) • Linear patterns of crime in a major US city

  5. Motivation: Environmental Criminology (scientific domain) • Spatial theories in Environmental Criminology • Routine Activity Theory [1] • Crime location is related to the criminal’s frequently visited areas • Crime Pattern Theory [2] • Based on a spatial model of • Nodes (e.g., home, work, entertainment) • Paths (e.g., routes between nodes) • Edges • Crime locations are close to edges, i.e., near the boundaries of the criminal’s activity space, where residents may not recognize him/her • Source: Rossmo, Kim (2000). Geographic Profiling. Boca Raton, FL: CRC Press. http://www.popcenter.org/learning/60steps/index.cfm?stepNum=16 • Network-based summarization adds value to Environmental Criminology • Assists with large-scale verification of whether real-world data match the theories • Offers opportunities to develop hypotheses for new theory formulation • [1] L. E. Cohen et al., Social change and crime rate trends: A routine activity approach, American Sociological Review, 1979. [2] P. L. Brantingham et al., Environmental Criminology, Waveland Press, 1990.

  6. Other Domains • Disaster Relief • Accident Analysis and Prevention

  7. Key Concepts • Activity • An object of interest located at a node or on an edge • Summary path • A path chosen by KMR to summarize activities • Activity coverage • Total number of activities on a path or set of paths • Active node • A node having n ≥ 1 activities or joined by an edge having n ≥ 1 activities, e.g., A, B, C, D, E • Inactive node • A node having n = 0 activities and joined only by edges having n = 0 activities, e.g., F • Active node ratio • Total # active nodes / total # nodes, e.g., 5/6 • (Example figure: each edge has a weight of 1)
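The definitions above can be made concrete with a small Python sketch. The network, the activity counts, and the function names are hypothetical; a location is either a node name or an edge stored as a frozenset, and we assume that a set of paths counts each activity location once.

```python
# Hypothetical toy network: nodes and undirected edges (unit weights assumed).
NODES = ["A", "B", "C", "D", "E", "F"]
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "F")]

# Hypothetical activity counts keyed by location: a node name or an edge (frozenset).
ACTIVITIES = {
    frozenset(("A", "B")): 2,
    frozenset(("C", "D")): 1,
    "E": 3,
}


def active_nodes(nodes, activities):
    """A node is active if it has >= 1 activity or touches an edge with >= 1 activity."""
    active = set()
    for loc, count in activities.items():
        if count < 1:
            continue
        if isinstance(loc, frozenset):
            active |= loc  # both endpoints of an active edge
        else:
            active.add(loc)
    return active & set(nodes)


def active_node_ratio(nodes, activities):
    return len(active_nodes(nodes, activities)) / len(nodes)


def activity_coverage(paths, activities):
    """Total activities on the nodes and edges of a set of paths (each location counted once)."""
    locations = set()
    for path in paths:
        locations.update(path)
        locations.update(frozenset(e) for e in zip(path, path[1:]))
    return sum(activities.get(loc, 0) for loc in locations)
```

Here A, B, C, D and E are active and F is inactive, so the active node ratio is 5/6, the same value as in the slide's example.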

  8. Problem Statement (example figure: P = the set of shortest paths, k = 2, edge weights are 1) • Given • A spatial network G = (N, E) with edge weights • A set of activities A and their locations (a node or an edge) • A set of paths P • k, the number of routes • Find • A subset P′ ⊆ P of cardinality k, i.e., |P′| = k • Objective • Maximize the activity coverage (AC) of P′ • Constraints • 1 ≤ k ≤ |P|
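For very small inputs, the problem statement can be checked directly by exhaustive search over all k-subsets of P. This brute-force sketch is only a reference for tiny examples, not the KMR heuristic; the candidate paths, activities and function names are hypothetical, and coverage counts each activity location once (our assumption).

```python
from itertools import combinations


def activity_coverage(paths, activities):
    """Activities on the nodes and edges of a set of paths, each location counted once."""
    locations = set()
    for path in paths:
        locations.update(path)
        locations.update(frozenset(e) for e in zip(path, path[1:]))
    return sum(activities.get(loc, 0) for loc in locations)


def best_k_subset(candidate_paths, k, activities):
    """Exhaustively search every k-subset P' of P; feasible only for tiny inputs."""
    if not 1 <= k <= len(candidate_paths):
        raise ValueError("require 1 <= k <= |P|")
    return max(combinations(candidate_paths, k),
               key=lambda subset: activity_coverage(subset, activities))


# Hypothetical activities (edges as frozensets) and candidate path set P.
ACTIVITIES = {frozenset(("A", "B")): 2, frozenset(("C", "D")): 1, "E": 3}
P = [("A", "B", "C"), ("C", "D"), ("E", "A", "B"), ("E", "F")]
best = best_k_subset(P, 2, ACTIVITIES)
```

With k = 2, the pair (C, D) and (E, A, B) covers all 6 activities, which no other pair of candidates does.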

  9. Challenges • Measures of interestingness • Activity coverage, average distance, etc. • Computational complexity • C(|N|, 2) shortest paths, given |N| nodes • Exponential number of k-subsets of paths
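The growth described in the Challenges slide can be quantified with binomial coefficients. The 1,000-node figure matches the node count used in several experiments; the calculation itself is our illustration.

```python
from math import comb


def num_node_pair_paths(num_nodes):
    """One shortest path per unordered pair of nodes: C(|N|, 2)."""
    return comb(num_nodes, 2)


def num_k_subsets(num_paths, k):
    """Number of candidate answers P' with |P'| = k: C(|P|, k)."""
    return comb(num_paths, k)


paths = num_node_pair_paths(1000)  # 499,500 candidate shortest paths
subsets = num_k_subsets(paths, 2)  # about 1.25e11 pairs of paths
```

Even for k = 2 the number of subsets exceeds 10^11, and it grows roughly as |P|^k for larger k.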

  10. Related Work • Network summarization by grouping/clustering • Zero or one route: clumping (Okabe), e.g., NT-VCM (Shiode); maximal subgraph, e.g., path, tree (Buchin) • Multiple routes: our work

  11. Contributions • K-Main Routes (KMR) algorithm • Finds a set of k routes to group activities • New design decisions • Network Voronoi activity assignment • Divide and conquer summary path recomputation • Spatial network activity summarization is shown to be NP-complete • Analytical demonstration of the correctness of the design decisions, with a cost analysis • Experimental evaluation of the various algorithms • Performance evaluated using synthetic and real-world datasets • Case study comparing KMR with geometry-based summarization

  12. K-Main Routes (KMR) Algorithm (example figure: P = the set of shortest paths, k = 2) • K-Main Routes Algorithm • Select k paths as initial summary paths • Repeat • Form k clusters by assigning each activity to its closest summary path • Recompute the summary path of each cluster • Until summary paths do not change • Design Decisions • Inactive node pruning • Network Voronoi activity assignment • Divide and conquer summary path recomputation
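The basic loop can be sketched in Python as follows. This is a minimal sketch, not the authors' Java simulator: it assumes unit edge weights (as in the experiments), activities located at nodes only, a naive nearest-path assignment by breadth-first search, and candidate summary paths drawn from shortest paths between a cluster's activity nodes. All function names and the example network are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs(graph, sources):
    """Multi-source BFS on an unweighted graph: hop distance and parent of each reachable node."""
    dist = {s: 0 for s in sources}
    parent = {s: None for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
    return dist, parent


def shortest_path(graph, src, dst):
    """One shortest src->dst path (assumes dst is reachable)."""
    _, parent = bfs(graph, [src])
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return tuple(reversed(path))


def assign(graph, activities, summary_paths):
    """Naive assignment: each activity node joins the summary path at the smallest hop distance."""
    dists = [bfs(graph, list(p))[0] for p in summary_paths]
    clusters = [[] for _ in summary_paths]
    for a in activities:
        best = min(range(len(summary_paths)), key=lambda i: dists[i][a])
        clusters[best].append(a)
    return clusters


def recompute(graph, cluster):
    """Shortest path between two cluster activity nodes that covers the most cluster activities."""
    nodes = sorted(set(cluster))
    if len(nodes) < 2:
        return tuple(nodes)
    candidates = (shortest_path(graph, u, v) for u, v in combinations(nodes, 2))
    return max(candidates, key=lambda p: sum(1 for a in cluster if a in p))


def k_main_routes(graph, activities, initial_paths, max_iter=100):
    paths = [tuple(p) for p in initial_paths]
    for _ in range(max_iter):
        clusters = assign(graph, activities, paths)            # form k clusters
        new_paths = [recompute(graph, c) if c else p           # an empty cluster keeps its path
                     for c, p in zip(clusters, paths)]
        if new_paths == paths:                                 # summary paths did not change
            break
        paths = new_paths
    return paths, clusters


# Hypothetical chain network A-B-C-D-E-F-G-H with activities at nodes (repeats = several activities).
GRAPH = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"],
         "E": ["D", "F"], "F": ["E", "G"], "G": ["F", "H"], "H": ["G"]}
ACTIVITIES = ["A", "B", "B", "C", "G", "H", "H"]
paths, clusters = k_main_routes(GRAPH, ACTIVITIES, [("C", "D"), ("D", "E")])
```

Starting from (C, D) and (D, E), the summary paths move to (A, B, C) and (G, H), after which the clusters no longer change and the loop stops.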

  13. Design Decision: Inactive Node Pruning • Only consider paths between active nodes • The optimal solution is still contained in this reduced set • Given the set of shortest paths, 20 shortest paths are calculated and stored in the example instead of 30
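Inactive node pruning amounts to restricting the endpoints of each candidate path to active nodes. The sketch below counts the stored paths for a hypothetical 6-node network with 5 active nodes, as in the Key Concepts example. It assumes, on our part, that the slide's 20 versus 30 counts ordered (source, destination) pairs (5 × 4 versus 6 × 5), which the arithmetic matches; `candidate_pairs` is a hypothetical helper.

```python
from itertools import permutations


def candidate_pairs(nodes, active, prune_inactive=True):
    """Ordered (source, destination) pairs whose shortest paths would be computed and stored."""
    pool = [n for n in nodes if n in active] if prune_inactive else list(nodes)
    return list(permutations(pool, 2))


NODES = ["A", "B", "C", "D", "E", "F"]
ACTIVE = {"A", "B", "C", "D", "E"}  # hypothetical: F is the only inactive node

all_pairs = candidate_pairs(NODES, ACTIVE, prune_inactive=False)  # 6 x 5 = 30 pairs
pruned = candidate_pairs(NODES, ACTIVE)                           # 5 x 4 = 20 pairs
```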

  14. Design Decision: Network Voronoi (NV) Activity Assignment • Goals • Form k clusters by assigning each activity to its closest summary path • Improve the execution time of the current assignment strategy • Example (execution trace): next slides • K-Main Routes Algorithm (before): Select k shortest paths as initial summary paths; Repeat • Form k clusters by assigning each activity to its closest summary path • Recompute summary path of each cluster; Until summary paths do not change • K-Main Routes Algorithm (with this design decision): Select k shortest paths as initial summary paths; Repeat • Network Voronoi Activity Assignment • Recompute summary path of each cluster; Until summary paths do not change

  15.–22. Design Decision: Network Voronoi (NV) Activity Assignment: execution trace [Figures: step-by-step trace on an example network with nodes A–H and activities 1–10. A virtual node X is linked to the summary path nodes; each step shows the Open and Closed node lists and each node's distance from the nearest summary path. Legend: activity, summary path, active node, inactive node, virtual node, closed node; edge weights 1 and 0.]

  23. Design Decision: Network Voronoi (NV) Activity Assignment
  • Network Voronoi Activity Assignment algorithm
  Input: Graph G = (N, E), a set of activities A, a set of k summary paths S
  Output: A set of k clusters formed by assigning each ai ∈ A to one si ∈ S such that dist(ai, si) ≤ dist(ai, sj) for every sj ∈ S with sj ≠ si
  Open ← all nodes ∈ S, Closed ← Ø
  Tnodes ← all nodes ∈ S
  Tactivities ← activities on si ∈ S
  repeat
    nc ← next node ∈ Open (the node with the smallest dist)
    remove nc from Open
    Closed ← nc
    X ← neighbors of nc
    foreach xi ∈ X
      if xi ∉ Tnodes and xi ∉ Closed
        Tnodes ← xi
        xi.prev ← nc
        xi.dist ← dist(xi, nc) + nc.dist
        xi.sp ← nc.sp
      else if xi ∈ Tnodes
        update xi if new dist < xi.dist
      if xi ∉ Open and xi ∉ Closed
        Open ← xi
      Y ← activities on edge {nc, xi}
      foreach yi ∈ Y
        if yi ∉ Tactivities
          Tactivities ← yi
          yi.prev ← nc
          yi.dist ← xi.dist
          yi.sp ← xi.sp
        else
          update yi if new dist < yi.dist
  until all active nodes ∈ Closed
  return currentClusters
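The assignment step behaves like a multi-source Dijkstra search that grows all summary paths outward at once, a common way to compute a network Voronoi partition. The sketch below seeds every summary-path node at distance 0 (the role we take the virtual node X to play in the trace) and labels each node with the path that reaches it first. It is a simplified, hypothetical rendering rather than the authors' code: it runs until the queue is empty instead of stopping once all active nodes are closed, and it assigns an edge activity to the summary path of the edge's closer endpoint.

```python
import heapq


def undirected(edges):
    """Build an adjacency dict {node: [(neighbor, weight), ...]} from (u, v, w) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


def network_voronoi_assignment(graph, activities, summary_paths):
    """Map each activity id to the index of its closest summary path.

    activities: {activity_id: location}, where location is a node or a (u, v) edge tuple.
    """
    dist, owner, heap = {}, {}, []
    for i, path in enumerate(summary_paths):
        for node in path:
            if node not in dist:          # a node shared by two paths keeps the first index
                dist[node], owner[node] = 0, i
                heapq.heappush(heap, (0, node))
    closed = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in closed:                   # stale heap entry
            continue
        closed.add(u)
        for v, w in graph[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v], owner[v] = nd, owner[u]  # v inherits u's summary path
                heapq.heappush(heap, (nd, v))
    assignment = {}
    for act, loc in activities.items():
        if isinstance(loc, tuple):        # edge activity: follow the closer endpoint
            u, v = loc
            loc = u if dist[u] <= dist[v] else v
        assignment[act] = owner[loc]
    return assignment


GRAPH = undirected([("A", "B", 1), ("B", "C", 1), ("C", "D", 1), ("D", "E", 1), ("E", "F", 1)])
ACTIVITIES = {1: ("B", "C"), 2: ("D", "E"), 3: "C", 4: "D", 5: ("C", "D")}
result = network_voronoi_assignment(GRAPH, ACTIVITIES, [("A", "B"), ("E", "F")])
```

Activity 5 lies on edge (C, D), whose endpoints are equidistant from the two paths; this sketch breaks the tie toward the first endpoint, a choice the slides do not specify. Because the search runs once over the whole network, its cost is that of a single Dijkstra run rather than one distance computation per activity and cluster.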

  24. Design Decision: Divide and Conquer Summary Path Recomputation • Goals • Recompute the summary path of each cluster • Improve the execution time of the current recomputation strategy • Example (execution trace): next slide • K-Main Routes Algorithm (with network Voronoi assignment): Select k shortest paths as initial summary paths; Repeat • Network Voronoi Activity Assignment • Recompute summary path of each cluster; Until summary paths do not change • K-Main Routes Algorithm (with this design decision): Select k shortest paths as initial summary paths; Repeat • Network Voronoi Activity Assignment • Divide and Conquer Summary Path Recomputation; Until summary paths do not change

  25. Design Decision: Divide and Conquer Summary Path Recomputation [Figure: example network with nodes A–H and activities 1–10, showing active and inactive nodes, a cluster and its summary path; edge weights are 1]
  • Summary Path Recomputation algorithm
  Input: Graph G = (N, E), a set of clusters C
  Output: A set of summary paths S, where si ∈ S has maximum activity coverage for ci ∈ C and si ∈ ci
  nextClusters ← Ø
  foreach ci ∈ C
    X ← active nodes of ci
    maxP ← Ø
    foreach xi ∈ X
      foreach xj ∈ X
        if i ≠ j
          cP ← getSP(xi, xj)
          if maxP = Ø
            maxP ← cP
          if maxP.activities < cP.activities
            maxP ← cP
    if maxP ≠ ci.summaryPath
      nextClusters ← maxP
    else
      nextClusters ← ci.summaryPath
  return nextClusters
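A Python sketch of the recomputation step, under stated assumptions: unit edge weights so that breadth-first search yields shortest paths (getSP is replaced by a hypothetical on-the-fly `bfs_path` rather than a stored-path lookup), activities located at nodes or on edges given as node pairs, and unordered pairs of active nodes (each pair is tried once rather than in both directions). Because both branches of the final if/else yield the cluster's best path, the sketch simply keeps the current path when a cluster has fewer than two active nodes.

```python
from collections import deque
from itertools import combinations


def bfs_path(graph, src, dst):
    """One shortest src->dst path on an unweighted graph (assumes dst is reachable)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return tuple(reversed(path))


def covers(path, loc):
    """True if a node activity lies on the path, or an edge activity's endpoints are consecutive on it."""
    if isinstance(loc, tuple):
        steps = set(zip(path, path[1:]))
        return loc in steps or loc[::-1] in steps
    return loc in path


def recompute_summary_paths(graph, clusters, current_paths):
    """For each cluster, the shortest path between two of its active nodes with maximum coverage."""
    next_paths = []
    for locs, current in zip(clusters, current_paths):
        # Active nodes of the cluster: activity nodes plus endpoints of activity edges.
        active = sorted({n for loc in locs for n in (loc if isinstance(loc, tuple) else (loc,))})
        best, best_count = None, -1
        for u, v in combinations(active, 2):
            candidate = bfs_path(graph, u, v)
            count = sum(covers(candidate, loc) for loc in locs)
            if count > best_count:        # strict: ties keep the earlier candidate
                best, best_count = candidate, count
        # Fewer than two active nodes: keep the current summary path.
        next_paths.append(best if best is not None else current)
    return next_paths


GRAPH = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D", "F"], "D": ["C", "E"], "E": ["D"], "F": ["C"]}
clusters = [[("A", "B"), "B", ("C", "F")], [("D", "E")]]
result = recompute_summary_paths(GRAPH, clusters, [("B", "C"), ("C", "D")])
```

The divide and conquer aspect is visible in the loop structure: candidate paths are enumerated only among the active nodes of each cluster, not among all active nodes of the network.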

  26. Validation • Analytical • Cost analysis explaining the computational savings • Experimental • Comparative analysis of KMR with various design decisions • Performed on real and synthetic data • Network Voronoi activity assignment and divide and conquer summary path recomputation save computational cost • Savings increase with the number of nodes, routes, and activities and with the active node ratio • Case studies • Qualitatively show the usefulness of network-based summarization on crime data

  27. Analytical Evaluation: Computational Analysis
  • KMR execution time = number of iterations × (activity assignment cost + summary path recomputation cost)
  • $T_{\mathrm{KMR}} = I \times \left( \left[ K \cdot |A| \cdot \mathrm{cost}(a_i, c_i) \right] + \left[ K \cdot d_c \cdot |N|^2 \right] \right)$
  • $T_{\mathrm{KMR\_I}} = I \times \left( \left[ K \cdot |A| \cdot \mathrm{cost}(a_i, c_i) \right] + \left[ K \cdot d_c \cdot (|N| \cdot r)^2 \right] \right)$
  • $T_{\mathrm{KMR\_IAS}} = I \times \left( \left[ |E| + |N| \log |N| \right] + \left[ K \cdot d_c \cdot \left( \frac{|N|}{K} \cdot r \right)^2 \right] \right)$
  • where I = number of iterations, K = number of clusters, A = set of activities, cost(ai, ci) = cost of calculating the distance between activity ai and cluster ci, dc = cost of looking up a path, N = set of nodes, E = set of edges, r = active node ratio (0 ≤ r ≤ 1)
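The cost expressions can be evaluated for concrete settings to see where the savings come from. This sketch assumes unit values for cost(ai, ci) and dc, a base-2 logarithm, and a hypothetical |E| = 1500; |N| = 1000, |A| = 1200, r = 0.2 and K = 2 mirror the synthetic experiments. These are illustrative assumptions, not measurements, and `kmr_cost_terms` is a hypothetical helper.

```python
from math import log2


def kmr_cost_terms(I, K, A, N, E, r, cost_assign=1.0, d_c=1.0):
    """(assignment term, recomputation term) of each cost model; the total is their sum."""
    return {
        "KMR": (I * K * A * cost_assign, I * K * d_c * N ** 2),
        "KMR_I": (I * K * A * cost_assign, I * K * d_c * (N * r) ** 2),
        "KMR_IAS": (I * (E + N * log2(N)), I * K * d_c * (N / K * r) ** 2),
    }


terms = kmr_cost_terms(I=1, K=2, A=1200, N=1000, E=1500, r=0.2)
recompute = {name: t[1] for name, t in terms.items()}
```

With these settings the recomputation term drops from 2,000,000 (KMR) to 80,000 (KMR_I) to 20,000 (KMR_IAS), a factor of (K / r)² = 100 between the first and the last.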

  28. Experimental Evaluation [Figure: experiment design: synthetic and real datasets, with variables #nodes, #routes, #activities and active node ratio, feed a Java-based simulator that runs the candidates KMR_I, KMR_IV, KMR_ID and KMR_IVD; the resulting measures are analyzed] • Goal: Comparative analysis • Candidates: KMR with various design decisions • KMR_I – KMR with inactive node pruning • KMR_IV – KMR with inactive node pruning and network Voronoi activity assignment • KMR_ID – KMR with inactive node pruning and divide and conquer summary path recomputation • KMR_IVD – KMR with all three design decisions • Measure: CPU time (Unix time command) • Platform: Mac Pro, 2 × Xeon quad-core 2.26 GHz, 16 GB RAM • Variables: #nodes, #routes, #activities, active node ratio • Fixed parameters: unit edge length • Datasets: Synthetic and real (Haiti earthquake)

  29. Data Description and Characteristics • Synthetic data • 2010 Census TIGER/Line® Shapefiles used for the road network • Activities randomly assigned to each edge • Real-world data: Haiti data set • Geospatial and temporal dataset describing post-disaster events • Collected from Jan 12, 2010 to March 23, 2010 • 1,677 records • Characteristics • Attributes • Incident title (e.g., “Food, Water, Tents needed…”) • Incident date and time • Location (city, port name) • Category (numeric) • Latitude/longitude • Sources • Crisis Map of Haiti - http://haiti.ushahidi.com/ • OpenStreetMap - http://www.openstreetmap.org/

  30. Effect of Number of Nodes • Synthetic data set: number of activities = 1200, active node ratio = 0.2, K = 2 • Real data set: number of activities = 1206, active node ratio = 0.1998, K = 2 • Trends: • Network Voronoi activity assignment and divide and conquer summary path recomputation save computational cost • Savings increase with the number of nodes

  31. Effect of Number of Routes, K • Synthetic data set: number of nodes = 1000, number of activities = 1200, active node ratio = 0.2 • Real data set: number of nodes = 1000, number of activities = 202, active node ratio = 0.219 • Trends: • Network Voronoi activity assignment and divide and conquer summary path recomputation save computational cost • Savings increase with the number of routes

  32. Effect of Number of Activities • Synthetic data set: number of nodes = 1000, active node ratio = 0.2, K = 2 • Trends: • Network Voronoi activity assignment and divide and conquer summary path recomputation save computational cost • Savings increase with the number of activities

  33. Effect of Active Node Ratio • Synthetic data set: number of nodes = 1000, number of activities = 1200, K = 2 • Trends: • Network Voronoi activity assignment and divide and conquer summary path recomputation save computational cost • Savings increase with the active node ratio

  34.–36. Case Study: Crime Analysis [Figures: input (a set of crime incidents, k = 5), KMR output, CrimeStat K-Means (Euclidean distance), CrimeStat K-Means (network distance)]

  37. Summary • Spatial network activity summarization was shown to be NP-complete • The K-Main Routes (KMR) algorithm and its design decisions were described • Inactive node pruning • Network Voronoi activity assignment • Divide and conquer summary path recomputation • The correctness of the design decisions was demonstrated analytically, and a cost analysis showed the computational savings • Experimental evaluation • Performance evaluated using synthetic and real-world datasets • Case study comparing KMR with geometry-based summarization

  38. Acknowledgements • Members of the Spatial Database and Spatial Data Mining Research Group, University of Minnesota, Twin Cities • This work was supported by grants from USARMY and USDOD. • Thank you for your time! Any questions or comments?
