Pradeep Mohan*, Shashi Shekhar , Zhe Jiang University of Minnesota, Twin-Cities, MN

A spatial neighborhood graph approach to Regional Co-location pattern discovery: summary of results. Pradeep Mohan*, Shashi Shekhar, Zhe Jiang (University of Minnesota, Twin-Cities, MN); James A. Shine, James P. Rogers, Nicole Wayant (US Army ERDC, Topographic Engineering Center, Alexandria, VA). *Contact: mohan@cs.umn.edu

Outline • Motivation • Problem Formulation • Computational Approach • Conclusions and Future work

Motivation: Spatial Heterogeneity, the second law of Geography. Spatial Heterogeneity (Goodchild, 2003; Goodchild, 2004) • Expectations vary across space. • Global models may not explain locally observed phenomena. • Need for place-based analysis. Spatial Heterogeneity in Retail • Traditional Data Mining: Which pairs of items sell together frequently? • Answer: Diaper in Transaction ⇒ Beer in Transaction. • Is this association true everywhere? Answer: It holds in blue-collar neighborhoods. Global Spatial Data Mining (Global Co-location patterns) • Which pairs of spatial features are located together frequently? Example: Gas stations and convenience stores. Our Focus: • Where do certain pairs of spatial features co-locate frequently? Example: Assaults happen frequently around downtown bars.

Applications • Crime analysis • Localizing frequent crime patterns; opportunities for crime vary across space. • Predicting localities of the next crime. Question: Do downtown bars lead to assaults more frequently? • Public Health • Localizing elevated disease risks around putative sources (e.g. mining areas). Question: Where does high asbestos concentration often lead to lung cancer? • Ecology • Localizing symbiotic relationships between different species of plants / animals. Question: Where are Plover birds frequently found in the vicinity of a crocodile? Images courtesy: www.amazon.com, www.startribune.com

Regional co-location patterns (RCP) • Input: Spatial Features, Crime Reports. • Output: RCP (e.g. < (Bar, Assaults), Downtown >) • Subsets of spatial features. • Frequently located in certain regions of a study area.

Outline • Motivation • Problem Formulation • Basic Concepts • Problem Statement • Challenges • Related Work • Computational Approach • Conclusions and Future work

Basic Concepts: Neighborhoods. Prevalence locality: • Subsets of the spatial framework containing instances of a pattern. • Simple representation to visualize: Convex Hull • Other representations possible. Neighborhood Graph: • Given: A spatial neighbor relation (spatial neighborhood size) • Nodes: Individual event instances • Edges: Present if the neighbor relation is satisfied • Based on the Event Centric Model (Huang, 2004)
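The neighborhood graph described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes planar (x, y) coordinates, a Euclidean distance test as the neighbor relation, and a brute-force pairwise comparison. The function name `neighborhood_graph` and the instance ids are hypothetical.

```python
from itertools import combinations
from math import dist


def neighborhood_graph(instances, radius):
    """Build neighbor edges between event instances.

    instances: dict mapping instance id -> (feature type, (x, y)).
    radius: spatial neighborhood size, in the same units as the coordinates.
    Returns a set of frozenset({id_a, id_b}) edges (undirected).
    Assumption: two instances are neighbors when their Euclidean
    distance is at most `radius`.
    """
    edges = set()
    for (a, (_, pa)), (b, (_, pb)) in combinations(instances.items(), 2):
        if dist(pa, pb) <= radius:
            edges.add(frozenset((a, b)))
    return edges


# Hypothetical instances: one bar and two assault reports.
instances = {
    "bar1": ("Bar", (0.0, 0.0)),
    "assault1": ("Assault", (0.5, 0.0)),
    "assault2": ("Assault", (5.0, 5.0)),
}
edges = neighborhood_graph(instances, radius=1.0)
```

With a neighborhood size of 1.0, only `bar1` and `assault1` are joined by an edge. The pairwise loop is quadratic in the number of instances; a spatial index would be the usual replacement for larger datasets.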

Basic Concepts: Quantifying regional interestingness. Regional Participation Ratio • Conditional probability of observing a pattern instance within a locality given an instance of a feature within that locality. [Example figure not recoverable.] Regional Participation Index • Quantifies the local fraction participating in a relationship. [Example figure not recoverable.]
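The formulas for these two measures did not survive on the slide, so the following is only a hedged sketch. It assumes the ratio is the number of distinct instances of a feature that participate in the pattern within a locality, divided by a reference count of that feature supplied by the caller, and that the index is the minimum ratio over the pattern's features (as in the standard participation index of the event centric model). Function names and the example data are hypothetical.

```python
def regional_participation_ratio(pattern_instances, feature, feature_count):
    """Fraction of a feature's instances that participate in the pattern.

    pattern_instances: list of dicts (feature type -> instance id), one per
        pattern instance found inside the locality.
    feature_count: reference count of `feature` instances (assumption: the
        caller decides which count to use as the denominator).
    """
    participating = {inst[feature] for inst in pattern_instances}
    return len(participating) / feature_count


def regional_participation_index(pattern_instances, feature_counts):
    """Assumed definition: minimum participation ratio over all features."""
    return min(
        regional_participation_ratio(pattern_instances, f, n)
        for f, n in feature_counts.items()
    )


# Hypothetical locality: one bar neighbors two assaults.
pattern_instances = [
    {"Bar": "b1", "Assault": "a1"},
    {"Bar": "b1", "Assault": "a2"},
]
feature_counts = {"Bar": 4, "Assault": 4}
rpi = regional_participation_index(pattern_instances, feature_counts)
```

Here one of four bars and two of four assaults participate, so the ratios are 0.25 and 0.5 and the index is 0.25.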

Detailed Statement • Given: • A spatial framework, • A collection of boolean spatial event types and their instances. • A minimum interestingness threshold, Pθ • A symmetric and transitive neighbor relation R (e.g. based on spatial neighborhood size) • Find: • All RCPs with prevalence ≥ Pθ • Objective: • Minimize computational cost. • Constraints: • Spatial framework is heterogeneous. • Interest measure captures spatial heterogeneity. • Completeness: All prevalent RCPs are reported. • Correctness: Only prevalent RCPs are reported. Example parameters (from the slide figure): Prevalence Threshold = 0.25; Spatial Neighborhood Size = 1 mile.

Challenges • Conflicting Requirements • Interest measure captures spatial heterogeneity while supporting scalable algorithms. • Exponential search space. • Candidate pattern set cardinality is exponential in the number of event types. Illustration: Spatial Data Mining (e.g. RCP) between Statistical Rigor and Computational Scalability.

Challenges • Conflicting Requirements • Interest measure captures spatial heterogeneity while supporting scalable algorithms. • Exponential search space. • Candidate pattern set cardinality is exponential in the number of event types. Illustration (pattern lattice): {NULL}; A, B, C; AB, AC, BC; ABC
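The exponential growth of the candidate set can be made concrete by enumerating the lattice from the illustration. This is a small sketch using only the standard library; the function names are hypothetical, and treating only subsets of size two or more as co-location candidates is an assumption.

```python
from itertools import combinations


def pattern_lattice(event_types):
    """All subsets of the event types, from the empty set up to the full set."""
    types = sorted(event_types)
    return [
        frozenset(combo)
        for size in range(len(types) + 1)
        for combo in combinations(types, size)
    ]


def candidate_patterns(event_types):
    """Subsets with at least two event types (assumed co-location candidates)."""
    return [p for p in pattern_lattice(event_types) if len(p) >= 2]


lattice = pattern_lattice(["A", "B", "C"])        # 2**3 = 8 nodes
candidates = candidate_patterns(["A", "B", "C"])  # AB, AC, BC, ABC
```

For n event types the lattice has 2^n nodes, so ten event types already give 1024 nodes before localities are considered.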

Contributions • Regional Co-location Patterns • Neighborhood based Formulation • Interest Measure • Captures the local fraction of events participating in patterns. • Shows attractive computational properties, Honors spatial heterogeneity. • Computational Approach • Computational Structure – Pattern Space Enumeration • Performance Enhancement- Maximal locality based Pruning Strategies • Experimental Evaluation • Performance Evaluation using real datasets, Lincoln, NE • Real world case study.

Related Work. Approaches for Regional Co-location Pattern discovery: • Spatial Neighborhood based (Our Work) • Fitness Function Clustering (Eick et al., 2008) • Zoning Based (Celik et al., 2007). Zoning Based and Fitness Function Clustering: • Reports one pattern per interesting region based on a criterion (e.g. Max) • Computational structure and pruning strategies not explored. • Clustering is based on real-valued attributes.

Outline • Motivation • Problem Formulation • Computational Approach • Pattern Space Enumeration • Performance Tuning • Experimental Evaluation • Conclusions and Future work

Computational Approach (Prevalence Threshold = 0.25) Key Idea • Enumerate the entire pattern space. Expensive! • Examine each pattern and prune. Steps shown: Compute Neighborhoods; Identify candidate RCP instance. [Figure: pattern lattice over {Null}, A, B, C with candidate RCPs annotated by interest values (0.16, 0.25, 0.33); ✔ = Accepted RCP, ✕ = Pruned RCP.]

Performance Tuning: Key Ideas. Key Idea • Interest measure shows special pruning properties in certain subsets of the spatial framework. Maximal Locality: Key Properties • Collection of connected instances. • Maximal localities are mutually disjoint. • Contains several RCPs. Key Observations • RPI shows the anti-monotonicity property within maximal localities. • Pruning a co-location, {AB}, prunes all its supersets (e.g. {ABC}, {ABCD}, etc.). • RPI within a maximal locality is an upper bound on the RPI of its constituent prevalence localities.
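The two properties listed for maximal localities (connected instances, mutually disjoint) match connected components of the neighborhood graph. The sketch below reads them that way, which is an assumption rather than the authors' stated definition, and finds the components with a small union-find. The function name and instance ids are hypothetical.

```python
def maximal_localities(nodes, edges):
    """Group instances into connected components of the neighborhood graph.

    nodes: iterable of instance ids.
    edges: iterable of (id_a, id_b) neighbor pairs.
    Returns a list of disjoint sets of instance ids.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the root, halving the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in parent:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


# Hypothetical instances forming two separate groups.
nodes = ["a1", "b1", "c1", "a2", "b2"]
edges = [("a1", "b1"), ("b1", "c1"), ("a2", "b2")]
localities = maximal_localities(nodes, edges)
```

The result contains two disjoint groups, {a1, b1, c1} and {a2, b2}. Each group can then be mined independently, which is where the pruning properties above apply.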

Performance Tuning (Prevalence Threshold = 0.25) Completeness • Pruning a pattern within a maximal locality does not prune any valid RCPs. Correctness • Accepting a pattern involves additional checks so that only prevalent RCPs are reported. [Figure: pattern lattice ({Null}; A, B, C) evaluated within maximal localities ML1, ML2, ML3 after the step "Compute Maximal Locality". Size-2 interest values shown: {AB}, 0.167; {AC}, 0.25; {BC}, 0.167 and {AB}, 0.25; {AC}, 0.25; {BC}, 0.33. Candidate RCPs shown: <{BC}, PL3({BC})>, 0.167; <{AC}, PL1({AC})>, 0.25; <{BC}, PL4({BC})>, 0.167. Other labels: "No RCP"; "{ABC}: Pruned Automatically"; "Due to upper bound property of RPI"; "Due to anti-monotonicity of RPI"; ✕ = pruned.]
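The automatic pruning of {ABC} follows from anti-monotonicity: a larger pattern is worth examining only if every one of its sub-patterns met the threshold. The sketch below shows Apriori-style candidate generation inside a single maximal locality, using the size-2 interest values that appear on the slide. Which maximal locality each set of values belongs to is not recoverable, so the two dictionaries are labeled generically, and the function name is hypothetical.

```python
from itertools import combinations


def next_level_candidates(prevalent, k):
    """Size-(k+1) patterns whose size-k subsets are all prevalent."""
    prevalent = set(prevalent)
    features = sorted(set().union(*prevalent)) if prevalent else []
    return [
        frozenset(combo)
        for combo in combinations(features, k + 1)
        if all(frozenset(sub) in prevalent for sub in combinations(combo, k))
    ]


threshold = 0.25

# Values from the slide; assignment to a specific locality is an assumption.
locality_x = {frozenset("AB"): 0.167, frozenset("AC"): 0.25, frozenset("BC"): 0.167}
locality_y = {frozenset("AB"): 0.25, frozenset("AC"): 0.25, frozenset("BC"): 0.33}

kept_x = [p for p, rpi in locality_x.items() if rpi >= threshold]
kept_y = [p for p, rpi in locality_y.items() if rpi >= threshold]

candidates_x = next_level_candidates(kept_x, 2)  # {ABC} is never generated
candidates_y = next_level_candidates(kept_y, 2)  # {ABC} remains a candidate
```

In the first locality only {AC} survives, so {ABC} is never examined. In the second, all three pairs survive and {ABC} must still be checked.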

Experimental Evaluation: Spatial Neighborhood Size • What is the effect of spatial neighborhood size on the performance of different algorithms? • Fixed Parameters: Dataset Size: 7498 instances; # Features: 5; Prevalence Threshold: 0.07 [Figure panels: # of RCPs; Run Time.] Trends • Run Time: ML Pruning outperforms PS Enumeration by a factor of 1.5 - 5 • # of RCPs examined: ML Pruning outperforms PS Enumeration by a factor of 4.13 - 19

Experimental Evaluation: Feature Types • What is the effect of the number of feature types on the performance of different algorithms? • Fixed Parameters: Dataset Size: 7498 instances; Spatial neighborhood size: 800 feet; Prevalence Threshold: 0.07 [Figure panels: # of RCPs; Run Time.] Trends • Run Time: ML Pruning outperforms PS Enumeration by a factor of 1.2 • # of RCPs examined: ML Pruning outperforms PS Enumeration by a factor of 1.6 – 3.5

Real Dataset Case Study. Q: Where do assaults frequently occur around bars? Are there other factors? Dataset: Lincoln, NE, crime data (Winter ‘07), Neighborhood Size = 0.25 miles, Prevalence Threshold = 0.07 [Figure panels: RCP of Larceny, Bars and Assaults; RCP of Larceny and Assaults; RCP of Bar and Assaults.] Observations • Assaults are more likely to be found in areas reporting larceny crimes (e.g. 47.6% vs 21.1%). • Bars in Downtown are more likely to be crime prone than bars in other areas (e.g. 21.1%, 20.1%).

Conclusion and Future work • Conclusions • Neighborhood based formulation of Regional Spatial Patterns. • Regional Participation Index: Measures the local fraction of the global count. • Vector representation for Prevalence Localities (other representations possible, convex for simplicity) • Future Work • Other representations for prevalence localities. • Statistical interpretation: LISA statistics, variants of Local Ripley’s K, multiple hypothesis testing. • Interpretation using predictive methods (e.g. Geographically Weighted Regression) • Acknowledgement: • Reviewers of ACM GIS • Members of the Spatial database and spatial data mining group, UMN. • U.S. Department of Defense. • Mr. Tom Casady and Kim Koffolt.
