
Pradeep Mohan*, Shashi Shekhar, Zhe Jiang, University of Minnesota, Twin-Cities, MN

A spatial neighborhood graph approach to Regional Co-location pattern discovery: summary of results. Pradeep Mohan*, Shashi Shekhar, Zhe Jiang, University of Minnesota, Twin-Cities, MN. James A. Shine, James P. Rogers, Nicole Wayant





Presentation Transcript


  1. A spatial neighborhood graph approach to Regional Co-location pattern discovery: summary of results
     Pradeep Mohan*, Shashi Shekhar, Zhe Jiang, University of Minnesota, Twin-Cities, MN
     James A. Shine, James P. Rogers, Nicole Wayant, US Army-ERDC, Topographic Engineering Center, Alexandria, VA
     *Contact: mohan@cs.umn.edu

  2. Outline
     • Motivation
     • Problem Formulation
     • Computational Approach
     • Conclusions and Future work

  3. Motivation: Spatial Heterogeneity, the second law of Geography
     Spatial Heterogeneity (Goodchild, 2004; Goodchild, 2003)
     • Expectations vary across space.
     • Global models may not explain locally observed phenomena.
     • Need for place-based analysis.
     Spatial Heterogeneity in Retail
     • Traditional data mining: Which pairs of items sell together frequently?
     • Answer: Diaper in Transaction → Beer in Transaction.
     • Is this association true everywhere? Answer: Blue-collar neighborhoods.
     Global Spatial Data Mining – Global Co-location patterns
     • Which pairs of spatial features are located together frequently? Example: Gas stations and convenience stores.
     Our Focus:
     • Where do certain pairs of spatial features co-locate frequently? Example: Assaults happen frequently around downtown bars.

  4. Applications
     • Crime analysis
       • Localizing frequent crime patterns; opportunities for crime vary across space!
       • Predicting localities of the next crime.
       • Question: Do downtown bars lead to assaults more frequently?
     • Public Health
       • Localizing elevated disease risks around putative sources (e.g. mining areas).
       • Question: Where does high asbestos concentration often lead to lung cancer?
     • Ecology
       • Localizing symbiotic relationships between different species of plants / animals.
       • Question: Where are Plover birds frequently found in the vicinity of a crocodile?
     Images courtesy: www.amazon.com, www.startribune.com

  5. Regional co-location patterns (RCP)
     • Input: Spatial Features, Crime Reports.
     • Output: RCP (e.g. < (Bar, Assaults), Downtown >)
       • Subsets of spatial features.
       • Frequently located in certain regions of a study area.

  6. Outline
     • Motivation
     • Problem Formulation
       • Basic Concepts
       • Problem Statement
       • Challenges
       • Related Work
     • Computational Approach
     • Conclusions and Future work

  7. Basic Concepts: Neighborhoods
     Prevalence locality
     • Subsets of the spatial framework containing instances of a pattern.
     • Simple representation to visualize: Convex Hull
     • Other representations possible.
     Neighborhood Graph
     • Given: A spatial neighbor relation (spatial neighborhood size)
     • Nodes: Individual event instances
     • Edges: Present if the neighbor relation is satisfied
     • Based on the Event Centric Model (Huang, 2004)
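The neighborhood graph described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the neighbor relation is "Euclidean distance at most the spatial neighborhood size", and the function name `build_neighborhood_graph`, the instance identifiers, and the coordinates are all hypothetical.

```python
from itertools import combinations
from math import dist


def build_neighborhood_graph(instances, neighborhood_size):
    """Return an adjacency dict: instance id -> set of neighboring instance ids.

    instances maps an id to (feature_type, (x, y)).
    Assumption: two instances are neighbors when their Euclidean
    distance is at most neighborhood_size.
    """
    graph = {instance_id: set() for instance_id in instances}
    for (a, (_, pa)), (b, (_, pb)) in combinations(instances.items(), 2):
        if dist(pa, pb) <= neighborhood_size:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical event instances (coordinates in miles).
instances = {
    "bar1": ("Bar", (0.0, 0.0)),
    "assault1": ("Assault", (0.5, 0.0)),
    "assault2": ("Assault", (5.0, 5.0)),
}
graph = build_neighborhood_graph(instances, neighborhood_size=1.0)
print(graph)
```

The all-pairs loop is quadratic in the number of instances; a spatial index would be the usual replacement for larger datasets.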

  8. Basic Concepts: Quantifying regional interestingness
     Regional Participation Ratio
     • Conditional probability of observing a pattern instance within a locality given an instance of a feature within that locality.
     • Example (figure not included in transcript)
     Regional Participation Index
     • Quantifies the local fraction participating in a relationship.
     • Example (figure not included in transcript)
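The formulas for these two measures were on slide images and are not captured in the transcript, so the following is only a sketch under stated assumptions. It assumes the ratio is the number of instances of a feature that participate in the pattern inside a locality, divided by the total count of that feature (the concluding slide describes the index as "the local fraction of the global count"), and that the index is the minimum ratio over the pattern's features, as in the standard participation index for global co-location patterns. The function names and counts are hypothetical.

```python
def regional_participation_ratio(participating, total):
    """Fraction of a feature's instances that participate in the pattern
    within a locality. Assumption: total is the feature's overall count."""
    if total <= 0:
        raise ValueError("total must be positive")
    return participating / total


def regional_participation_index(participating_by_feature, total_by_feature):
    """Assumed definition: minimum regional participation ratio
    over all features of the pattern."""
    return min(
        regional_participation_ratio(count, total_by_feature[feature])
        for feature, count in participating_by_feature.items()
    )


# Hypothetical counts for the pattern {Bar, Assault} in one locality.
participating = {"Bar": 1, "Assault": 2}
totals = {"Bar": 4, "Assault": 6}
rpi = regional_participation_index(participating, totals)
print(rpi)
```

Taking the minimum means a pattern is only as prevalent as its least-participating feature.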

  9. Detailed Statement
     • Given:
       • A spatial framework,
       • A collection of boolean spatial event types and their instances.
       • A minimum interestingness threshold, Pθ*
       • A symmetric and transitive neighbor relation R (e.g. based on spatial neighborhood size*)
     • Find:
       • All RCPs with prevalence >= Pθ
     • Objective:
       • Minimize computational cost.
     • Constraints:
       • Spatial framework is heterogeneous.
       • Interest measure captures spatial heterogeneity.
       • Completeness: All prevalent RCPs are reported.
       • Correctness: Only prevalent RCPs are reported.
     *Example values: Prevalence Threshold = 0.25; Spatial Neighborhood Size = 1 mile

  10. Challenges
     • Conflicting Requirements
       • Interest measure captures spatial heterogeneity while supporting scalable algorithms.
     • Exponential search space.
       • Candidate pattern set cardinality is exponential in the number of event types.
     Illustration (figure): Spatial Data Mining (e.g. RCP); Statistical Rigor; Computational Scalability

  11. Challenges
     • Conflicting Requirements
       • Interest measure captures spatial heterogeneity while supporting scalable algorithms.
     • Exponential search space.
       • Candidate pattern set cardinality is exponential in the number of event types.
     Illustration (pattern lattice): {NULL}; A, B, C; AB, AC, BC; ABC
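The exponential growth of the candidate set can be seen by enumerating the lattice directly. This is an illustrative sketch only; the function name `enumerate_patterns` is hypothetical, and the enumeration includes the empty pattern and single features exactly as the lattice on the slide does.

```python
from itertools import combinations


def enumerate_patterns(event_types):
    """List every subset of the event types, smallest subsets first."""
    types = sorted(event_types)
    return [
        frozenset(subset)
        for size in range(len(types) + 1)
        for subset in combinations(types, size)
    ]


patterns = enumerate_patterns(["A", "B", "C"])
for pattern in patterns:
    print("".join(sorted(pattern)) or "{NULL}")
print(len(patterns), "patterns for 3 event types")
```

For n event types the lattice has 2^n nodes, so each added event type doubles the number of candidates.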

  12. Contributions
     • Regional Co-location Patterns
       • Neighborhood based formulation
     • Interest Measure
       • Captures the local fraction of events participating in patterns.
       • Shows attractive computational properties; honors spatial heterogeneity.
     • Computational Approach
       • Computational structure: Pattern Space Enumeration
       • Performance enhancement: Maximal locality based pruning strategies
     • Experimental Evaluation
       • Performance evaluation using real datasets, Lincoln, NE
       • Real world case study.

  13. Related Work
     Approaches for Regional Co-location Pattern discovery (taxonomy figure):
     • Spatial Neighborhood based: Our Work
     • Fitness function Clustering (Eick et al., 2008)
     • Zoning Based (Celik et al., 2007)
     Zoning Based / Fitness Function Clustering
     • Reports one pattern per interesting region based on a criterion (e.g. Max)
     • Computational structure and pruning strategies not explored.
     • Clustering is based on real valued attributes.

  14. Outline
     • Motivation
     • Problem Formulation
     • Computational Approach
       • Pattern Space Enumeration
       • Performance Tuning
       • Experimental Evaluation
     • Conclusions and Future work

  15. Computational Approach
     Prevalence Threshold = 0.25
     Key Idea
     • Enumerate the entire pattern space. Expensive!
     • Examine each pattern and prune.
     Steps shown: Compute neighborhoods; identify candidate RCP instances.
     (Figure: pattern lattice over {Null}, A, B, C, with each candidate RCP labeled by its interest value, e.g. 0.16, 0.25, 0.33. Legend: ✕ Pruned RCP, ✔ Accepted RCP.)

  16. Performance Tuning: Key Ideas
     Key Idea
     • Interest measure shows special pruning properties in certain subsets of the spatial framework.
     Maximal Locality: Key Properties
     • Collection of connected instances.
     • Maximal localities are mutually disjoint.
     • Contains several RCPs.
     Key Observations
     • RPI shows the anti-monotonicity property within maximal localities.
       • Pruning a co-location, {AB}, prunes all its supersets (e.g. {ABC}, {ABCD}, etc.).
     • RPI within a maximal locality is an upper bound on the RPI of its constituent prevalence localities.
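The two stated properties of maximal localities (connected instances, mutually disjoint) are what connected components of the neighborhood graph provide. The sketch below assumes that reading; the transcript does not give the exact definition, and the function name `maximal_localities` and the example graph are hypothetical.

```python
from collections import deque


def maximal_localities(graph):
    """Split an adjacency dict into connected components via breadth-first search.

    Assumption: a maximal locality is a connected component
    of the neighborhood graph.
    """
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical neighborhood graph over six event instances.
graph = {
    "a1": {"b1"}, "b1": {"a1", "c1"}, "c1": {"b1"},
    "a2": {"b2"}, "b2": {"a2"},
    "c2": set(),
}
localities = maximal_localities(graph)
print(localities)
```

Because every instance lands in exactly one component, patterns can be evaluated per component independently.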

  17. Performance Tuning
     Prevalence Threshold = 0.25
     Step shown: Compute Maximal Locality
     (Figure: pattern lattice over {Null}, A, B, C, evaluated within maximal localities ML1, ML2, ML3. Values shown, in transcript order: {AB}, 0.167; {AC}, 0.25; {BC}, 0.167; {AB}, 0.25; {AC}, 0.25; {BC}, 0.33. Outcomes shown: No RCP; No RCP; ✕ <{BC},PL3({BC})>, 0.167; <{AC},PL1({AC})>, 0.25; ✕ <{BC},PL4({BC})>, 0.167. Annotations: "{ABC}: Pruned Automatically"; "Due to upper bound property of RPI"; "Due to anti-monotonicity of RPI".)
     Completeness
     • Pruning a pattern within a maximal locality does not prune any valid RCPs.
     Correctness
     • Accepting a pattern involves additional checks so that only prevalent RCPs are reported.
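The pruning step can be sketched as a level-wise (Apriori-style) search inside one maximal locality. This is an illustration under assumptions, not the authors' algorithm: `prune_within_locality` is a hypothetical name, `rpi` stands for whatever function scores a pattern inside the locality, and the three pair scores reuse the first three values listed on the slide without claiming which maximal locality they belong to.

```python
from itertools import combinations


def prune_within_locality(event_types, rpi, threshold):
    """Return patterns (size >= 2) whose locality-level RPI meets the threshold.

    Relies on anti-monotonicity: a larger pattern is generated only
    if every subset one element smaller survived the previous level.
    """
    surviving = []
    frontier = [frozenset(pair) for pair in combinations(sorted(event_types), 2)]
    size = 2
    while frontier:
        kept = [p for p in frontier if rpi(p) >= threshold]
        surviving.extend(kept)
        kept_set = set(kept)
        size += 1
        # Join surviving patterns that differ by one element.
        candidates = {a | b for a in kept for b in kept if len(a | b) == size}
        frontier = [
            c for c in sorted(candidates, key=sorted)
            if all(frozenset(s) in kept_set for s in combinations(c, size - 1))
        ]
    return surviving


# Hypothetical locality; {ABC} has no score because it is never examined.
ml_rpi = {frozenset("AB"): 0.167, frozenset("AC"): 0.25, frozenset("BC"): 0.167}
result = prune_within_locality("ABC", ml_rpi.__getitem__, threshold=0.25)
print(result)
```

Patterns that survive here are still only candidates: as the slide notes, acceptance requires additional checks at the prevalence-locality level, since the maximal-locality RPI is only an upper bound.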

  18. Experimental Evaluation: Spatial Neighborhood Size
     • What is the effect of spatial neighborhood size on the performance of different algorithms?
     • Fixed Parameters: Dataset Size: 7498 instances; # Features: 5; Prevalence Threshold: 0.07
     (Plots: # of RCPs; Run Time)
     Trends
     • Run Time: ML Pruning outperforms PS Enumeration by a factor of 1.5 to 5
     • # of RCPs examined: ML Pruning outperforms PS Enumeration by a factor of 4.13 to 19

  19. Experimental Evaluation: Feature Types
     • What is the effect of the number of feature types on the performance of different algorithms?
     • Fixed Parameters: Dataset Size: 7498 instances; Spatial neighborhood size: 800 feet; Prevalence Threshold: 0.07
     (Plots: # of RCPs; Run Time)
     Trends
     • Run Time: ML Pruning outperforms PS Enumeration by a factor of 1.2
     • # of RCPs examined: ML Pruning outperforms PS Enumeration by a factor of 1.6 to 3.5

  20. Real Dataset Case study
     Q: Where do assaults frequently occur around bars? Are there other factors?
     Dataset: Lincoln, NE, crime data (Winter ‘07); Neighborhood Size = 0.25 miles; Prevalence Threshold = 0.07
     (Maps: RCP of Larceny, Bars and Assaults; RCP of Larceny and Assaults; RCP of Bar and Assaults)
     Observations
     • Assaults are more likely to be found in areas reporting larceny crimes (e.g. 47.6% vs 21.1%).
     • Bars in Downtown are more likely to be crime prone than bars in other areas (e.g. 21.1%, 20.1%).

  21. Conclusion and Future work
     • Conclusions
       • Neighborhood based formulation of Regional Spatial Patterns.
       • Regional Participation Index: Measures the local fraction of the global count.
       • Vector representation for Prevalence Localities (other representations possible, convex for simplicity)
     • Future Work
       • Other representations for prevalence localities.
       • Statistical interpretation: LISA statistics / variants of Local Ripley’s K, multiple hypothesis testing.
       • Interpretation using predictive methods (e.g. Geographically Weighted Regression)
     • Acknowledgement:
       • Reviewers of ACM GIS
       • Members of the Spatial database and spatial data mining group, UMN.
       • U.S. Department of Defense.
       • Mr. Tom Casady and Kim Koffolt.
