A Multi-Relational Approach to Spatial Classification

Martin Ester, School of Computing Science, Simon Fraser University, Burnaby BC, Canada, ester@cs.sfu.ca
Arno Knobbe, LIACS, Leiden University, Leiden, the Netherlands, knobbe@liacs.nl
Richard Frank, School of Computing Science, Simon Fraser University, Burnaby BC, Canada, rfrank@cs.sfu.ca

MOTIVATION
[Figure: Burgled houses in Burnaby, British Columbia, Canada]
• Why are some malls profitable?
• Why are some houses burgled?
  • Good location?
  • Expensive neighbourhood?
  • Close to major roads?
• Learn classifiers given
  • Location
  • Feature values
  • Neighbouring locations
  • Features of neighbours
• Use the classifier to predict the labels of unknown entities

INTRODUCTION
• Spatial data seems to have multi-relational (MR) aspects
• However, MR classification techniques cannot be applied directly to spatial data
  • With MR data, the relationships between entities are given explicitly
  • Spatial relationships are only implied by each entity's spatial location
• Non-spatial aggregation cannot capture spatial dependencies
• Many relationships → large search space

STEPS
Steps to apply multi-relational techniques:
• Select a multi-relational framework
• Determine neighbour relationships
• Establish which relationships and spatial features/literals can be extracted
• Apply the spatial classifier
  • Incorporate relationships and spatial features/literals
  • Perform the classification in parallel
• Analyze results

MULTI-RELATIONAL CLASSIFICATION
• Classification with Inductive Logic Programming (ILP)
• Find rules that can predict the labels of instances of a target entity
• Example: "If a mall has a neighbouring house with income > 50,000, then the mall is profitable"
• Sample classification rule (rule head ← rule body, with a threshold on income):
  profit(M,'Profitable') ← mall(M), neighbour(M,H), house(H), income(I,H), I>50,000
• Multi-feature aggregation literal (here the covariance of income and size over all neighbouring houses):
  profit(M,'Profitable') ← mall(M), covariance({income(I,H), size(S,H)}, {neighbour(M,H), house(H)}, C), C>0
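The difference between a plain threshold literal and an aggregation literal can be made concrete with a tiny in-memory example. This is a minimal sketch under stated assumptions: the `houses` and `neighbour` dictionaries and their values are hypothetical toy data, covariance is taken as the population covariance, and the literal is assumed to fail when fewer than two neighbouring houses exist.

```python
from statistics import mean

# Hypothetical toy database (not from the paper): house attributes and
# the houses that neighbour each mall.
houses = {
    "h1": {"income": 40_000, "size": 120},
    "h2": {"income": 65_000, "size": 180},
    "h3": {"income": 55_000, "size": 150},
    "h4": {"income": 30_000, "size": 200},
}
neighbour = {"m1": ["h1", "h2", "h3"], "m2": ["h4", "h1"]}


def threshold_rule(mall):
    """profit(M,'Profitable') <- mall(M), neighbour(M,H), house(H), income(I,H), I>50,000.

    H is existentially quantified: one qualifying neighbouring house suffices.
    """
    return any(houses[h]["income"] > 50_000 for h in neighbour.get(mall, []))


def covariance(xs, ys):
    """Population covariance of two equally long numeric sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def covariance_rule(mall):
    """profit(M,'Profitable') <- mall(M),
    covariance({income(I,H), size(S,H)}, {neighbour(M,H), house(H)}, C), C>0.

    The aggregate is computed over the whole set of neighbouring houses.
    """
    hs = neighbour.get(mall, [])
    if len(hs) < 2:  # assumption: the literal fails with fewer than two houses
        return False
    c = covariance([houses[h]["income"] for h in hs], [houses[h]["size"] for h in hs])
    return c > 0


for m in ("m1", "m2"):
    print(m, threshold_rule(m), covariance_rule(m))
```

With this toy data the loop prints `m1 True True` and `m2 False False`: mall m1 has a neighbouring house above the income threshold, and across its neighbours income and size rise together.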

NEIGHBOURHOOD DEFINITIONS
• Tobler's First Law of Geography (Waldo Tobler):
  "Everything is related to everything else, but near things are more related than distant things."
• A neighbourhood definition is required that
  • Mimics real life: people tend to go to the closest mall, food store, hospital or airport
  • Creates meaningful neighbourhood relationships between multiple types of entities in spatial data
• Most dominant neighbourhood definition: the buffer zone
  • The area within distance d of an entity
• Major drawbacks of buffer zones
  • A constant-sized buffer zone is inappropriate for all entity sizes
  • The distribution of entities can change significantly
  • An infinite number of buffer zone sizes could be selected
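The third drawback, that the result hinges on an arbitrary choice of d, shows up even on three points. A minimal sketch, assuming point entities and Euclidean distance; `buffer_neighbours` and the coordinates are hypothetical, and the brute-force pair check is only meant for illustration.

```python
from itertools import combinations
from math import dist


def buffer_neighbours(points, d):
    """Unordered pairs of point entities whose Euclidean distance is at most d."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(points), 2)
        if dist(points[a], points[b]) <= d
    }


# Hypothetical coordinates; the neighbour set depends entirely on the chosen d.
points = {"a": (0, 0), "b": (3, 4), "c": (10, 0)}
for d in (5, 9, 10):
    print(d, len(buffer_neighbours(points, d)))
```

The three buffer sizes yield 1, 2 and 3 neighbour pairs respectively, although the data never changed.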

VORONOI NEIGHBOURHOOD
• Voronoi diagrams
  • Defined by and named after Georgy Feodosevich Voronoy
  • Partition a plane into regions (Voronoi cells)
  • Each cell contains the area closer to its entity than to any other entity
• Naturally represent relationships between entities
• Completely data-driven: no user parameter
• Can be computed for
  • point data (e.g. houses)
  • segment data (e.g. roads)
  • areal data (e.g. lakes)
[Figure: Voronoi diagram for houses; Voronoi diagram for malls]
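For point entities, "the cell containing a location" is simply a nearest-site query. The sketch below rasterises a small bounding box and labels each grid point with its nearest site. It is a discrete approximation assumed for illustration, not an exact construction (for exact point diagrams a library such as `scipy.spatial.Voronoi` can be used), and the mall coordinates are hypothetical.

```python
from math import dist


def nearest_site(p, sites):
    """Index of the site whose Voronoi cell contains p (ties go to the lowest index)."""
    return min(range(len(sites)), key=lambda i: dist(p, sites[i]))


def raster_voronoi(sites, width, height):
    """Label every integer grid point in [0, width) x [0, height) with its nearest site.

    This approximates the Voronoi diagram on a grid; it is not an exact construction.
    """
    return [[nearest_site((x, y), sites) for x in range(width)] for y in range(height)]


# Hypothetical mall locations; print the grid with the top row first.
malls = [(1, 1), (8, 1), (4, 8)]
for row in reversed(raster_voronoi(malls, 10, 10)):
    print("".join(str(label) for label in row))
```

No distance parameter appears anywhere: the cells follow from the data alone, which is the property the slide emphasises.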

VORONOI NEIGHBOURHOOD
Voronoi Neighbourhood Definition: two entities A and B are neighbours iff
• A intersects the Voronoi cell of B, or B intersects the Voronoi cell of A, and A and B are of different types; or
• the Voronoi cells of A and B are adjacent, and A and B are of the same type
[Figure: Neighbourhood relationships]
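Restricted to point entities, both halves of the definition can be checked directly. This is a minimal sketch under assumptions: the Voronoi cell of B is taken among entities of B's own type, so a point A lies in B's cell exactly when B is the nearest entity of that type to A (ties aside), and same-type cell adjacency is approximated on a raster grid. All function names, ids and coordinates are hypothetical, and ids must be unique across types.

```python
from itertools import combinations
from math import dist


def nearest(p, sites):
    """Id of the site (id -> (x, y)) whose Voronoi cell contains p; ties go to the smallest id."""
    return min(sorted(sites), key=lambda s: dist(p, sites[s]))


def adjacent_cells(sites, width, height):
    """Pairs of same-type sites whose rasterised Voronoi cells share a grid edge."""
    label = {(x, y): nearest((x, y), sites) for x in range(width) for y in range(height)}
    pairs = set()
    for (x, y), a in label.items():
        for q in ((x + 1, y), (x, y + 1)):  # right and upper grid neighbours
            b = label.get(q)
            if b is not None and b != a:
                pairs.add(frozenset((a, b)))
    return pairs


def voronoi_neighbours(layers, width, height):
    """layers: entity type -> {id: (x, y)}; ids must be unique across types."""
    pairs = set()
    # Same type: the Voronoi cells of A and B are adjacent.
    for sites in layers.values():
        pairs |= adjacent_cells(sites, width, height)
    # Different types: A lies in the cell of B, or B lies in the cell of A.
    # For points, A lies in B's cell iff B is the entity of B's type nearest to A.
    for sa, sb in combinations(layers.values(), 2):
        for a, pa in sa.items():
            pairs.add(frozenset((a, nearest(pa, sb))))
        for b, pb in sb.items():
            pairs.add(frozenset((b, nearest(pb, sa))))
    return pairs


layers = {
    "mall": {"m1": (2, 2), "m2": (17, 17)},
    "house": {"h1": (1, 1), "h2": (3, 2), "h3": (18, 16)},
}
for pair in sorted(sorted(p) for p in voronoi_neighbours(layers, 20, 20)):
    print(pair)
```

Segment and areal entities (roads, lakes) would need a proper geometric Voronoi construction; the nearest-point shortcut above only holds for points.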

EXTRACT RELATIONSHIPS
• Initially only the entity types exist; there are no relationships
• Extract relationships between entities of different types
• Extract relationships between entities of the same type

EXTRACT SPATIAL FEATURES/LITERALS
• Spatial features (properties of entities)
  • Size, Location (X, Y), Area, Length, Start/End (X, Y)
  • Ex: length(R)>1km
• Spatial literals (properties of relationships)
  • Distance, Inside, Contains, Direction, Travel Time
  • Ex: distance(R,H)<50m
• Spatial aggregation literals (properties of neighbourhoods)
  • Spatial trends, spatial autocorrelation, areal adjusted mean
  • Ex: trend({distance(M,H),value(H)}, {house(H),neighbour(H,M)},S)
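A spatial aggregation literal combines a relationship property (distance) with an entity property (value) over a whole neighbourhood. The slide does not define how `trend` is computed; this sketch assumes S is the least-squares slope of house value against distance from the mall. The function names and the data are hypothetical.

```python
from math import dist
from statistics import mean


def trend(xs, ys):
    """Least-squares slope of ys against xs (an assumed reading of TREND)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0  # no spread in xs: treat as no trend (assumption)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def trend_literal(mall_xy, neighbour_houses, threshold):
    """trend({distance(M,H), value(H)}, {house(H), neighbour(H,M)}, S), S > threshold.

    neighbour_houses: list of ((x, y), value) for the houses neighbouring mall M.
    """
    distances = [dist(mall_xy, xy) for xy, _ in neighbour_houses]
    values = [v for _, v in neighbour_houses]
    return trend(distances, values) > threshold


# Hypothetical houses at distances 5, 10 and 15 from a mall at the origin.
houses = [((3, 4), 300_000), ((6, 8), 400_000), ((0, 15), 500_000)]
print(trend([5, 10, 15], [300_000, 400_000, 500_000]))  # slope: +20,000 per unit of distance
print(trend_literal((0, 0), houses, 0.1))
```

A positive S means house values rise with distance from the mall; a non-spatial aggregate such as a plain average over the same houses could not express this.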

RULE LEARNING – OVERVIEW
• Unified Multi-relational Aggregation-based Spatial Classifier (UnMASC)
  • Multi-relational spatial classification algorithm
  • Two-class problem
  • Based on the idea of the sequential covering algorithm
• Sequential covering algorithm
  • Generate one rule at a time
  • Refine the rule by adding literals
  • Start a new rule when the rule-termination condition applies
  • Once a rule is finalised
    • The entities it covers are removed
    • Another rule is started
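The sequential covering loop can be sketched in a few lines. This is a propositional simplification, not UnMASC itself: literals are boolean tests on a single entity, precision (ties broken by coverage) is the assumed quality measure, and reaching `min_precision` or `max_len` is the assumed rule-termination condition. All names and data are hypothetical.

```python
def learn_rules(examples, labels, candidates, max_len=3, min_precision=0.9):
    """Sequential covering for a two-class problem (propositional simplification).

    examples: list of entities (dicts); labels: list of bools (True = target class);
    candidates: dict literal name -> test function on one entity.
    Returns a list of rules, each a list of literal names (a conjunction).
    """
    remaining = list(range(len(examples)))
    rules = []
    while any(labels[i] for i in remaining):
        rule, covered = [], remaining
        # Refine the rule one literal at a time.
        while len(rule) < max_len:
            best = None
            for name, test in candidates.items():
                if name in rule:
                    continue
                cov = [i for i in covered if test(examples[i])]
                pos = sum(labels[i] for i in cov)
                if pos == 0:
                    continue
                score = (pos / len(cov), pos)  # precision, then coverage
                if best is None or score > best[0]:
                    best = (score, name, cov)
            if best is None:
                break
            rule.append(best[1])
            covered = best[2]
            if best[0][0] >= min_precision:  # rule-termination condition
                break
        if not rule:
            break  # no literal covers a remaining positive example
        rules.append(rule)
        # Remove the entities covered by the finalised rule, then start another rule.
        done = set(covered)
        remaining = [i for i in remaining if i not in done]
    return rules


# Hypothetical houses labelled burgled (True) or not (False).
examples = [
    {"near_road": True, "expensive": True, "corner": False},
    {"near_road": True, "expensive": False, "corner": False},
    {"near_road": False, "expensive": True, "corner": False},
    {"near_road": False, "expensive": False, "corner": False},
    {"near_road": True, "expensive": False, "corner": True},
]
labels = [True, False, True, False, True]
candidates = {f: (lambda e, f=f: e[f]) for f in ("near_road", "expensive", "corner")}
print(learn_rules(examples, labels, candidates))
```

Here the first rule covers the two expensive houses; once they are removed, a second rule is started for the remaining burgled corner house.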

RULE LEARNING – LITERAL SEARCH
• Entity types need to be searched
• Each search needs to identify the best
  • Feature(s), e.g. value & distance
  • Aggregation (possibly), e.g. trend
  • Threshold value, e.g. 0.1
  • Comparison operator, e.g. >
• Together these form the best candidate literal for that search
  • e.g. trend({distance(M,H),value(H)}, {house(H),neighbour(H,M)},S), S>0.1
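Once a feature (or aggregate) has been computed for every target entity, the threshold and comparison operator still have to be chosen. A minimal sketch, assuming candidate thresholds at midpoints between consecutive distinct values and accuracy as the score; the slide does not state UnMASC's actual scoring function, and `best_threshold` plus the sample values are hypothetical.

```python
def best_threshold(values, labels):
    """Return (accuracy, operator, threshold) for the best test 'feature <op> threshold'.

    Candidate thresholds are midpoints between consecutive distinct values;
    returns None if all values are equal (no split is possible).
    """
    distinct = sorted(set(values))
    cuts = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = None
    for t in cuts:
        for op in (">", "<"):
            preds = [(v > t) if op == ">" else (v < t) for v in values]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if best is None or acc > best[0]:
                best = (acc, op, t)
    return best


# Hypothetical trend values S for four malls and their "profitable" labels.
print(best_threshold([0.05, 0.2, 0.3, -0.1], [False, True, True, False]))
```

For this data the best literal is S > 0.125, which separates the two profitable malls from the others.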

RULE LEARNING – LITERAL SEARCH
Initialize:
• Pick the entity type for classification (the target entity type), e.g. mall
• Select the class label, e.g. 'yes'
• Start a blank rule
R1: profitable(M,'yes') ← mall(M)

RULE LEARNING – LITERAL SEARCH
Rule 1 – Iteration 1:
• Search the entity types referenced in the rule for the best feature
• Search the neighbours of the entity types referenced in the rule for the best feature
• Add the best feature to the rule
Before: R1: profitable(M,'yes') ← mall(M)
After: R1: profitable(M,'yes') ← mall(M), TREND({D, V}, {house(H), neighbour(M,H), value(H,V), distance(M,H,D)}, S), S > 0

RULE LEARNING – LITERAL SEARCH
Rule 1 – Iteration 2:
• Search the entity types referenced in the rule for the best feature
• Search the neighbours of the entity types referenced in the rule for the best feature
• Add the best feature to the rule
R1: profitable(M,'yes') ← mall(M), TREND({D, V}, {house(H), neighbour(M,H), value(H,V), distance(M,H,D)}, S), S > 0, neighbour(R,H), type(R)='highway'

RULE LEARNING – LITERAL SEARCH
• The number of searches increases as the rule grows: each iteration adds one search per entity type (three here)
• Iteration 1 (4 searches): {Malls}, {Malls, Malls}, {Malls, Houses}, {Malls, Roads}
• Iteration 2 (7 searches): the four above, plus {Malls, Houses, Malls}, {Malls, Houses, Roads}, {Malls, Houses, Houses}
• Iteration 3 (10 searches): the seven above, plus {Malls, Houses, Roads, Malls}, {Malls, Houses, Roads, Houses}, {Malls, Houses, Roads, Roads}
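The search lists above follow a simple pattern. This sketch assumes that the rule links entity types as a single chain (Malls → Houses → Roads), as in the example, and that every prefix of the chain is extended by one neighbour step to each entity type; the function name is hypothetical.

```python
def searches_per_iteration(path, entity_types):
    """List the searches for one refinement iteration.

    path: chain of entity types already linked by the rule, e.g. ["Malls", "Houses"].
    The target entity type is searched on its own, and every prefix of the
    chain is extended by one neighbour step to each entity type.
    """
    searches = [tuple(path[:1])]
    for k in range(1, len(path) + 1):
        for t in entity_types:
            searches.append(tuple(path[:k]) + (t,))
    return searches


types = ["Malls", "Houses", "Roads"]
for path in (["Malls"], ["Malls", "Houses"], ["Malls", "Houses", "Roads"]):
    print(len(path), len(searches_per_iteration(path, types)))
```

This reproduces the counts 4, 7 and 10, i.e. 1 + 3k searches in iteration k for three entity types, so the work per iteration grows linearly with rule length.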

PARALLEL SEARCH
[Figure: the RuleLearner dispatches searches to several LiteralEvaluators; each returns its best literal]
• UnMASC is split into two components
  • RuleLearner, the server component
    • Collects the searches that require evaluation
    • Maintains the rules and matching entities
    • Calls the LiteralEvaluators
  • LiteralEvaluator, which performs the searches
    • Extracts all spatial features
    • Performs all applicable aggregations
    • Finds the best feature and threshold
    • Returns the best literal
• RuleLearner picks the best literal among those returned
• The number of simultaneous searches is set a priori
• If fewer searches can run simultaneously than are required, the remaining searches are queued
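The RuleLearner/LiteralEvaluator split maps naturally onto a fixed-size worker pool. A minimal sketch, not the UnMASC implementation: `literal_evaluator` and `rule_learner_step` are hypothetical, each search carries pre-scored candidate literals instead of performing real feature extraction, and `max_workers` plays the role of the a priori number of simultaneous searches.

```python
from concurrent.futures import ThreadPoolExecutor


def literal_evaluator(search):
    """Hypothetical LiteralEvaluator: return (score, literal) for one search.

    A real evaluator would extract spatial features, aggregate them and search
    for the best threshold; here each search carries pre-scored candidates.
    """
    _path, scored_candidates = search
    literal, score = max(scored_candidates.items(), key=lambda kv: kv[1])
    return score, literal


def rule_learner_step(searches, max_workers):
    """Hypothetical RuleLearner step: evaluate all searches on a fixed-size pool
    and keep the best literal. Searches beyond max_workers wait in the executor's queue.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(literal_evaluator, searches))
    return max(results)[1]


searches = [
    (("Malls",), {"size(M)>1000": 0.60}),
    (("Malls", "Houses"), {"trend(...)>0": 0.80, "avg_value>300k": 0.70}),
    (("Malls", "Roads"), {"type(R)='highway'": 0.65}),
]
print(rule_learner_step(searches, max_workers=2))
```

With two workers and three searches, the third search waits until a worker is free. For CPU-bound evaluation in CPython, `ProcessPoolExecutor` offers the same interface and sidesteps the global interpreter lock.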

PARALLEL SEARCH – OPTIMIZATION
• Searches differ in size, for example:
  • {Malls}: expected to be small
    • Only a few malls in a city
    • No aggregations are involved
  • {Malls} < {Malls, Houses}
    • Many houses in a city
    • Houses must be aggregated over their neighbouring malls
  • {Malls} < {Malls, Roads} < {Malls, Houses}
    • Aggregation has to occur
    • |roads| < |houses|
• Problem: a very costly search may be executed last, leaving the other evaluators idle while it finishes
• Solution: estimate the cost of each search based on
  • Number of entities in the target table
  • Features of the entities
  • Number of relationships between the entities used in the rule
• Reprioritize the queue → execute costly searches first
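Reprioritising the queue amounts to sorting searches by an estimated cost in descending order, a longest-job-first heuristic. The slide names the inputs but not the formula, so the cost model here is an assumption: target-table size, times the average fan-out of each relationship hop, times the number of features of the last entity type. All counts are hypothetical.

```python
def estimated_cost(search, n_entities, n_features, n_relationships):
    """Hypothetical cost model for one search (a tuple of entity types)."""
    cost = n_entities[search[0]]  # entities in the target table
    for a, b in zip(search, search[1:]):
        cost *= n_relationships[(a, b)] / n_entities[a]  # average fan-out of each hop
    return cost * n_features[search[-1]]  # features to aggregate and threshold


def prioritise(searches, n_entities, n_features, n_relationships):
    """Order the queue so the most expensive searches are dispatched first."""
    return sorted(
        searches,
        key=lambda s: estimated_cost(s, n_entities, n_features, n_relationships),
        reverse=True,
    )


# Hypothetical counts for one city.
n_entities = {"Malls": 20, "Houses": 50_000, "Roads": 3_000}
n_features = {"Malls": 5, "Houses": 10, "Roads": 3}
n_relationships = {("Malls", "Malls"): 60, ("Malls", "Houses"): 50_000, ("Malls", "Roads"): 400}
queue = [("Malls",), ("Malls", "Malls"), ("Malls", "Houses"), ("Malls", "Roads")]
print(prioritise(queue, n_entities, n_features, n_relationships))
```

With these counts the order is {Malls, Houses}, {Malls, Roads}, {Malls, Malls}, {Malls}, matching the ranking above. Starting the largest search first lets the cheap ones fill the remaining evaluators instead of leaving a single long search running at the end.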

EXPERIMENTS
• Dataset
  • Real-world crime data
    • Collaboration with the Criminology Department at Simon Fraser University
    • For the Royal Canadian Mounted Police (RCMP) in British Columbia (BC)
    • Between August 1, 2001 and August 1, 2006
    • Location of crime
    • Type of crime
  • British Columbia Assessment Authority (BCAA) dataset
    • Contains the property values of all plots of land within BC
  • The city of Burnaby, BC was selected
    • 66,000 entities
    • [Table: types of entities and counts]
  • Each property was labelled by whether a burglary occurred or not

EXPERIMENTS
• UnMASC was evaluated in three experiments
  • Neighbours defined by buffer zones
    • 2.8 million spatial relationships between entities
  • Neighbours defined by Voronoi neighbourhoods
    • 3.8 million spatial relationships
  • Only the target entity type
    • No neighbouring entity types are evaluated
• Effectiveness of the parallelization of UnMASC
  • Parallel (6 threads)
  • Serial (1 thread)
• 5-fold cross-validation was performed
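5-fold cross-validation partitions the target entities into five disjoint folds and uses each fold once as the test set. A minimal sketch, assuming a seeded random shuffle and no stratification (the slides say neither way); `k_fold_indices` is hypothetical, and the size 2812 is taken from the commercial-property experiment.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation (not stratified)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


for train, test in k_fold_indices(2812, k=5):
    print(len(train), len(test))
```

Every entity appears in exactly one test fold, so each reported measure averages five models trained on about 80% of the data.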

EXPERIMENTS
• Burglaries of commercial properties
  • 2812 commercial properties were selected as the target entities
  • 33% of them were burglarized
• R7: burglarized(C,'yes') ← commercial(C),
  MEDIAN(B1, {industrial(I), neighbour(C,I), building_value(I,B1)}, M), M<445,000,
  industrial(I), neighbour(C,I),
  TREND({B2, X}, {duplex(D), neighbour(I,D), building_value(D,B2), x_coord(D,X)}, S), S>0.798,
  MAX(A, {park(P), neighbour(C,P), area(P,A)}, N), N<14,400
• A burglarized commercial property is
  • In a relatively inexpensive industrial neighbourhood
  • A neighbour of parks, which could be a source of people inclined to commit crimes

EXPERIMENTS
• Burglaries of high-rise properties
  • 1036 high-rise properties were selected
  • 44.3% of them were burglarized
• burglarized(H,'yes') ← highrise(H),
  building_value(H,V), V>2,160,000,
  AVG(D1, {commercial(C), neighbour(H,C), distance(H,C,D1)}, A), A<302,
  commercial(C), neighbour(H,C),
  MEDIAN(W, {park(P), neighbour(C,P), width(P,W)}, M1), M1<122,
  MEDIAN(X, {duplex(D), neighbour(C,D), land_value(D,X)}, M2), M2>351,500
• Burglarized high-rises are
  • Expensive
  • Near high-traffic areas with small parks

CONCLUSION
• A multi-relational approach to spatial classification was presented
• Proposed a Voronoi-diagram-based neighbourhood definition
• Introduced a formal set of additions to the first-order logic framework
• Presented a scalable parallel implementation
• Showed substantial gains in precision and accuracy
• Demonstrated the importance of selecting the proper neighbourhood definition

Thank you
