
Hot Spot Detection in a Network Space: Geocomputational Approaches

Hot Spot Detection in a Network Space: Geocomputational Approaches. Ikuho Yamada, Ph.D., Department of Geography & School of Informatics, IUPUI. October 3, 2005. Fall 2005 Talk Series on Network and Complex Systems.


Presentation Transcript


  1. Hot Spot Detection in a Network Space: Geocomputational Approaches Ikuho Yamada, Ph.D. Department of Geography & School of Informatics IUPUI October 3, 2005 Fall 2005 Talk Series on Network and Complex Systems

  2. Introduction • Clusters in a spatial phenomenon = hot spots, where occurrence or level of the phenomenon is higher than expected. • Detecting hot spots is useful for • Understanding of the nature of the phenomenon itself: • Factors influencing the phenomenon; • Decision making in related policies/planning: • Remedial/preventive actions; • Regional development planning; • New facility design, etc…

  3. Introduction (cont.) • Potential problem: • Spatial distribution of the phenomenon may be affected by a transportation network; • E.g., vehicle crashes, retail facilities, crime locations, … • Analytical results derived without considering the network’s influence will be misleading, especially for • Detailed micro-scale data, and local scale analysis. → Analysis based on a network space, rather than the Euclidean space. [Figure annotations: “Cluster?” “No!!”]

  4. Objectives • (1) Is there any clustering tendency? • (2) Where are the clusters? • (3) How large are the clusters? • (4) What causes the clusters? [Diagram: Data (highway network, vehicle crash locations) → Stage 1: Detecting local clusters → black spots (clusters of crashes), answering Questions 1, 2, & 3 → Stage 2: Identifying influencing factors with a classifier to determine cluster or not (e.g., decision tree), answering Question 4]

  5. Objectives (cont.) • Stage 1: Cluster detection in the network space • To develop exploratory spatial data analysis methods for network-based local-cluster detection, named local indicators of network-constrained clusters (LINCS): • Event-based data: K-function; • Link-attribute-based data: Moran’s I and Getis & Ord’s G statistics.

  6. Objectives (cont.) • Stage 2: Influencing factor identification • To examine the applicability of inductive learning techniques for constructing models that explain the clusters in relation to the characteristics of the network space; • Decision tree induction algorithms; • Feedforward neural networks; • Discrete choice/regression models, as examples of traditional statistical methods.

  7. Outline • Constraints imposed by the network space • Stage 1: Development of LINCS • Network K-function for event-based data • Stage 2: Inductive learning • Decision tree induction to model relationships between the detected clusters and the network attributes • Case study: • 1997 vehicle crash data in Buffalo, NY • Conclusions

  8. Constraints imposed by the network space • Location constraint: • Some spatial phenomena occur only on the links of the network. • E.g., vehicle crashes, retail facilities, geocoded addresses (crime locations, patient residences, …); • Movement constraint: • Movement between locations is restricted to the network links; • E.g., one can get to a gas station only by driving along the streets; • → Distance between locations is more appropriately represented by the network (shortest-path) distance than by the Euclidean (straight-line) distance.
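To make the movement constraint concrete, the sketch below compares the shortest-path distance with the straight-line distance on a tiny, hypothetical L-shaped street (nodes `A`, `B`, `C` and their coordinates are invented for illustration). It uses a plain Dijkstra search and is not taken from the talk.

```python
import heapq
import math

def shortest_path_lengths(graph, source):
    """Dijkstra: network distance from source to every reachable node."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, math.inf):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, length in graph[node]:
            nd = d + length
            if nd < dist.get(neighbor, math.inf):
                dist[neighbor] = nd
                heapq.heappush(queue, (nd, neighbor))
    return dist

# Hypothetical L-shaped street: A - B - C, each link 1 unit long.
coords = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 1.0)}
graph = {
    "A": [("B", 1.0)],
    "B": [("A", 1.0), ("C", 1.0)],
    "C": [("B", 1.0)],
}

network_ac = shortest_path_lengths(graph, "A")["C"]
euclid_ac = math.hypot(coords["C"][0] - coords["A"][0],
                       coords["C"][1] - coords["A"][1])
print(network_ac, euclid_ac)
```

Here the two locations are about 1.41 units apart in a straight line but 2 units apart along the street, so a Euclidean search radius of 1.5 would wrongly treat them as neighbors.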

  9. Network constraints (cont.) [Figures: location constraint; movement constraint]

  10. Stage 1: Cluster detection in the network space

  11. Global Network K-function (Okabe & Yamada 2001) • Extension of Ripley’s K-function (1976) to determine • If a point pattern has a clustering/dispersal tendency significantly different from random with respect to the network; • For a set of network-constrained events P, K(h) is defined [equation not preserved in the transcript], where ρ is the intensity of points. [Figure: planar K-function vs. network K-function; events within distance h vs. not within distance h]
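The formula on this slide did not survive in the transcript. As a hedged illustration, the usual form of a K-function is K(h) = (1/ρ) × E[number of other events within distance h of an event], with distance measured along the network. The sketch below estimates it from pairwise shortest-path distances. The normalization by n(n − 1) and the names `network_k`, `dist`, and `total_length` are assumptions for this example, not necessarily the talk's exact estimator.

```python
def network_k(dist, total_length, h):
    """Estimate K(h) from pairwise network distances between n events.

    dist[i][j]   : shortest-path distance between events i and j
    total_length : total length of all network links
    Assumed estimator: total_length / (n * (n - 1)) * (ordered pairs within h).
    """
    n = len(dist)
    pairs_within_h = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and dist[i][j] <= h
    )
    return total_length * pairs_within_h / (n * (n - 1))

# Hypothetical network: one straight road of length 10 with four events.
positions = [1.0, 2.0, 3.0, 9.0]
dist = [[abs(a - b) for b in positions] for a in positions]

print(network_k(dist, total_length=10.0, h=2.0))
```

In a real analysis the distance matrix would come from a shortest-path computation over the street network, and the observed K(h) would be compared against simulated random patterns on the same network.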

  12. Global Net K-function (cont.) An example of random distribution in a network space

  13. Global Net K-function (cont.) • Weakness of the global K-function in determining the scale of clustering: • If there is a strong cluster with radius R, K(h) tends to exceed the upper significance envelope, indicating clustering, even for h≥R. • Incremental K-function: • Instead of examining the total number of events within distance h, examine the increment in the number of events per unit distance; • It can identify the clustering scale more accurately than the original K-function. [Figure: patterns with similar K(h) but different IncK(ht)]
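One way to read the incremental K-function is as the difference between consecutive cumulative values, IncK(h_t) = K(h_t) − K(h_{t−1}). That reading, and the K values used below, are assumptions made for illustration only.

```python
def incremental_k(k_values):
    """Return IncK(h_t) = K(h_t) - K(h_{t-1}), taking K(h_0) = 0.

    k_values holds cumulative K(h) at equally spaced distances h_1 < h_2 < ...
    """
    increments = []
    previous = 0.0
    for k in k_values:
        increments.append(k - previous)
        previous = k
    return increments

# Hypothetical cumulative K(h) at h = 1, 2, 3, 4 for a pattern with a tight cluster.
k_values = [4.0, 9.0, 10.0, 11.0]
print(incremental_k(k_values))
```

The cumulative values stay high at h = 3 and h = 4 only because they carry over the events counted at shorter distances. The increments drop sharply after h = 2, which points to the scale at which the clustering actually occurs.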

  14. Local Network K-function • Local indicator of clustering tendency: • Decomposition of the global K-function [equation not preserved in the transcript]; • This indicator is determined only for event locations; → only for limited locations in a network; • Introduction of reference points: • Points distributed over the network at a constant interval, for which indicator values are calculated; • cf. the regular grid used in planar space analysis such as the Geographical Analysis Machine (GAM).

  15. Local Net K-function (cont.) • Local network K-function LKj(h) [equation not preserved in the transcript], where j=1, …, m, and m is the number of reference points; • For an observed pattern, • Local K-function values are obtained for the reference points for a range of distances h. • This gives the LINCS for event-based data (KLINCS).
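The local formula is also missing from the transcript. A plausible form, assumed here, is LKj(h) = (1/ρ) × (number of events within network distance h of reference point j), with ρ = n / (total network length). The sketch evaluates it on a hypothetical straight road with reference points every 2 units.

```python
def local_k(ref_to_event_dist, total_length, h):
    """Local K value at each reference point (assumed form, see lead-in).

    ref_to_event_dist[j][i] : network distance from reference point j to event i
    Returns count_j(h) / rho with rho = n_events / total_length.
    """
    n_events = len(ref_to_event_dist[0])
    return [
        sum(1 for d in row if d <= h) * total_length / n_events
        for row in ref_to_event_dist
    ]

# Hypothetical road of length 10: events and evenly spaced reference points.
events = [1.0, 2.0, 3.0, 9.0]
refs = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
ref_to_event_dist = [[abs(r - e) for e in events] for r in refs]

print(local_k(ref_to_event_dist, total_length=10.0, h=1.0))
```

The reference point at position 2 gets the highest value because three events lie within one unit of it. The point at position 6 gets zero even though no event sits exactly there, which is the purpose of evaluating the indicator at reference points rather than only at event locations.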

  16. Example of the KLINCS analysis • The incremental K-function can be an indicator of the scale of clustering to help us determine which scale(s) of the local K-function should be closely examined; → Distance 2, in this case.

  17. KLINCS (cont.) • Results of the local network K-function: • Significance of individual reference points is determined by comparing with 1,000 simulations of random patterns on the network (0.1% significance level); • Obs. LKj(h) ≥ the largest simulated LKj(h) → clustering; • Obs. LKj(h) ≤ the smallest simulated LKj(h) → dispersal.
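The simulation test can be sketched as follows. This is a rank-based Monte Carlo variant, not necessarily the exact rule used in the talk: with 1,000 simulations the smallest attainable pseudo p-value is 1/1001, roughly the 0.1% level quoted on the slide. A single straight road stands in for the network, and all names and data are hypothetical.

```python
import random

def count_within(ref, events, h):
    """Number of events within distance h of a reference point on one road."""
    return sum(1 for e in events if abs(e - ref) <= h)

def monte_carlo_labels(refs, events, length, h, n_sims=1000, alpha=0.001, seed=0):
    """Label each reference point by comparing it with simulated random patterns."""
    rng = random.Random(seed)
    observed = [count_within(r, events, h) for r in refs]
    at_least = [0] * len(refs)  # simulations with count >= observed
    at_most = [0] * len(refs)   # simulations with count <= observed
    for _ in range(n_sims):
        # Null hypothesis: the same number of events, uniform along the road.
        simulated = [rng.uniform(0.0, length) for _ in events]
        for j, r in enumerate(refs):
            c = count_within(r, simulated, h)
            at_least[j] += c >= observed[j]
            at_most[j] += c <= observed[j]
    labels = []
    for j in range(len(refs)):
        p_upper = (1 + at_least[j]) / (n_sims + 1)
        p_lower = (1 + at_most[j]) / (n_sims + 1)
        if p_upper <= alpha:
            labels.append("cluster")
        elif p_lower <= alpha:
            labels.append("dispersal")
        else:
            labels.append("random")
    return labels

# Hypothetical data: 20 events piled up at position 5 on a road of length 100.
events = [5.0] * 20
labels = monte_carlo_labels([5.0, 50.0], events, length=100.0, h=1.0)
print(labels)
```

Counts are compared directly because dividing by the constant ρ does not change the ranking. The rank-based p-value treats ties conservatively, so a reference point with zero observed events is not flagged as dispersal merely because some simulations also produced zero.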

  18. LINCS for link-attribute-based data • Moran’s I statistic (1948): • A global measure of spatial autocorrelation; • Dependence of a variable value at a location on those at its nearby locations in a spatial context; • LISA (local indicators of spatial association) by Anselin (1995); • Network Moran’s I (Black 1992): • A measure of network autocorrelation; • Dependence between a variable value at a given link and those of other links that are connected to the link in a network context; • Local version → ILINCS. • Getis and Ord local G statistics (1992): • A local measure of concentration of variable values around a region; • Applicable to link-attribute-based data (Berglund and Karlström 1999) → GLINCS.
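As a rough illustration of the two link-based statistics, the sketch below computes Anselin's local Moran's I and the Getis and Ord G_i statistic on a hypothetical chain of five links. Binary connectivity weights (1 for links sharing a node, 0 otherwise) and the crash counts are assumptions for this example. The talk's ILINCS and GLINCS may use different weights and add a significance test.

```python
def local_moran(values, neighbors):
    """Local Moran's I_i = (z_i / m2) * sum of z_j over links connected to i."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # variance of the link values
    return [z[i] / m2 * sum(z[j] for j in neighbors[i]) for i in range(n)]

def local_g(values, neighbors):
    """Getis-Ord G_i = (sum of x_j over connected links) / (sum of x_j for all j != i).

    Assumes non-negative values with a positive total excluding link i.
    """
    total = sum(values)
    return [
        sum(values[j] for j in neighbors[i]) / (total - values[i])
        for i in range(len(values))
    ]

# Hypothetical chain of five links with crash counts as the link attribute.
values = [8.0, 8.0, 2.0, 1.0, 1.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

print(local_moran(values, neighbors))
print(local_g(values, neighbors))
```

Local I multiplies the target link's own deviation by those of its neighbors, so it is positive where similar values sit together, whether high or low. G_i uses only the neighbors' values, so it separates high-valued from low-valued concentrations.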

  19. Relationship between I and G statistics [Figure labels: value of the target link i; values of the links in the neighborhood of link i]

  20. From LINCS to inductive learning • Question: What causes the detected clusters? • LINCS gives a measure of clustering tendency for each spatial unit (ref. point or link segment). • Network data include attributes that may be related to the cause of the clusters. • E.g., travel speed, traffic volume, … • Spatial attributes can also be assigned to the spatial units. • E.g., distance from the closest intersection, travel time from the closest police station, average income of the area, …

  21. LINCS to IL (cont.) • The spatial units can be categorized based on their LINCS values. • E.g., cluster/random/dispersion; large cluster/medium cluster/small cluster/random; cluster center/cluster fringe/random. [Diagram: spatial units with LINCS results (clustering/random/dispersion), network attributes, and spatial attributes; causal relationships? → inductive learning (decision tree induction, feedforward neural network)]

  22. Stage 2: Influencing factor identification

  23. Inductive learning • A means to model relationships between input variables and outcome (classification) without relying on prior knowledge: (Gahegan 2000) • Learns from a set of instances for which desired outcome is known; • Predicts outcomes for new instances with known input variables.

  24. Decision tree • A way of representing rules for classification in a hierarchical manner; (Witten & Frank 2000; Thill & Wheeler 2000) • Node: test on an attribute; • Leaf node: specification of a class. • Decision tree induction: • Recursive process of splitting a set of instances with correct class information (training set) into subsets based on a particular attribute; • E.g., CHAID (Kass 1980), CART (Breiman et al. 1984), C4.5 (Quinlan 1993).
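The splitting step can be illustrated with the gain ratio, the criterion C4.5 uses to choose which attribute to test at a node. The sketch below handles categorical attributes only. The attribute names and the four training instances are hypothetical, not taken from the case study.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())

def gain_ratio(attribute_values, labels):
    """Information gain of splitting on an attribute, divided by its split information."""
    total = len(labels)
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical training set: class labels for four reference points.
labels = ["cluster", "cluster", "random", "random"]
near_intersection = ["yes", "yes", "no", "no"]  # separates the classes perfectly
side_of_road = ["a", "b", "a", "b"]             # carries no information

print(gain_ratio(near_intersection, labels), gain_ratio(side_of_road, labels))
```

An induction algorithm would pick the attribute with the highest score, split the training set on it, and repeat the process within each subset until the subsets are pure or too small to split.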

  25. Other techniques of modeling • Feedforward neural network with back-propagation: (Thill & Mozolin 2000, Demuth & Beale 2000) • A way of deriving a mapping of multiple input variables to classification from a training dataset. • Discrete choice model, as an example of traditional statistical modeling: • A way to analyze a relationship between a set of independent variables and a dependent variable of binary form or a discrete choice outcome among a small set of alternatives; • Probit model/logit model.

  26. Case study

  27. Data • 1997 vehicle crash data for the Buffalo, NY area (by New York State Department of Transportation): • NY State highways; • Milepost system with a resolution of 0.1 mile; • 1,658 crashes in the study region; • Mileposts are used as reference points; → Scale of analysis = 0.1 mile; • Monte Carlo simulation with 1,000 trials (0.1% significance level). [Figure: Crash distribution in the study region]

  28. Stage 1: Global scale results • Under the null hypothesis: • Crash probability = uniform over the network; → • Crash probability = proportional to traffic volume; • Annual Average Daily Traffic (AADT). [Figure annotations: 0.1~0.5 mile; 0.1 mile]

  29. Stage 1: Local scale results • KLINCS at 0.1 mile scale, not adjusted for AADT: Cluster: 125 ref. points; Random: 1,327 ref. points; Dispersion: 0 ref. points (Total: 1,452). • KLINCS at 0.1 mile scale, adjusted for AADT: Cluster: 110 ref. points; Random: 1,304 ref. points; Dispersion: 38 ref. points (Total: 1,452).

  30. Stage 1 local results (cont.) • ILINCS at 0.1 mile scale, adjusted for AADT: Positive autocorrelation: 23 links; Not significant: 1,462 links; Negative autocorrelation: 0 links (Total: 1,485). • GLINCS at 0.1 mile scale, adjusted for AADT: High-valued cluster: 19 links; Not significant: 1,438 links; Low-valued cluster: 28 links (Total: 1,485).

  31. Stage 1 local results (cont.) [Maps: Priority Investigation Locations (PILs) designated by NYSDOT; KLINCS at 0.1 mile scale, adjusted for AADT]

  32. Stage 2: Inductive learning results • AADT-adjusted KLINCS classification • Decision tree by the C4.5 induction algorithm with 24 attributes

  33. Stage 2 results (cont.) • AADT-adjusted GLINCS model • Dependent variable = degree of significant clustering (0~1000) • Model tree, where each leaf node represents a linear model

  34. Stage 2 results (cont.) • Accuracy for the test set: • Not much difference between the three models, especially in terms of all instances; • Because 90% of the instances are “random,” the modeling processes tried to fit the models more closely to the random instances to make fewer errors; → Weighting schemes to emphasize underrepresented classes.
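The slide does not say which weighting scheme was considered. One common heuristic, shown here purely as an assumption, is inverse-frequency weighting, w_c = N / (k × n_c), where N is the number of instances, k the number of classes, and n_c the size of class c. The counts below are the AADT-adjusted KLINCS results reported earlier in the talk (110 cluster, 1,304 random, 38 dispersion), used as a stand-in for the training set.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (number of classes * class count)."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}

# Class sizes from the AADT-adjusted KLINCS results (1,452 reference points).
counts = {"cluster": 110, "random": 1304, "dispersion": 38}
weights = balanced_class_weights(counts)

for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

With these weights each class contributes the same total weight to the training loss, so misclassifying one of the 38 dispersion points costs far more than misclassifying one of the 1,304 random points.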

  35. Conclusions • This research proposes a comprehensive framework for a network-based spatial cluster analysis when the phenomenon of interest is constrained by a network space; • Event-based data & link-attribute-based data; • Detection of local clusters (stage 1) • The LINCS methods can detect clusters without detecting spurious clusters caused merely by the network constraints; • Identification of influencing factors (stage 2) • Inductive learning techniques are useful to construct robust models to explain the detected clusters in relation to the network’s attributes.

  36. Conclusions (cont.) • Combination of exploratory spatial data analysis and inductive learning modeling is a powerful tool • to reveal latent relationships between distributions of spatial phenomena and characteristics of physical/social environments; and then • to assist spatial decision making processes by providing guidance on where/what to focus attention; • Stage 1 → Spatial focus; Stage 2 → Contextual focus. • The case study showed relatively good correspondence between the LINCS results and PILs, which verifies the effectiveness of the LINCS methods.

  37. Thank you! Any questions & suggestions?
