
A Multi-Relational Approach to Spatial Classification

Martin Ester, School of Computing Science, Simon Fraser University, Burnaby BC, Canada, ester@cs.sfu.ca. Arno Knobbe, LIACS, Leiden University, Leiden, the Netherlands, knobbe@liacs.nl. Richard Frank


Presentation Transcript


  1. A Multi-Relational Approach to Spatial Classification • Martin Ester, School of Computing Science, Simon Fraser University, Burnaby BC, Canada, ester@cs.sfu.ca • Arno Knobbe, LIACS, Leiden University, Leiden, the Netherlands, knobbe@liacs.nl • Richard Frank, School of Computing Science, Simon Fraser University, Burnaby BC, Canada, rfrank@cs.sfu.ca

  2. MOTIVATION • (Figure: burgled houses in Burnaby, British Columbia, Canada) • Why are some malls profitable? • Why are some houses burgled? • Good location? • Expensive neighbourhood? • Close to major roads? • Learn classifiers given • Location • Feature values • Neighbouring locations • Features of neighbours • Use classifier to predict label of unknown entities

  3. INTRODUCTION • Spatial data seems to have multi-relational (MR) aspects • MR classification techniques cannot be applied directly to spatial data • With MR data the relationships between the entities are explicitly given • Spatial relationships are only implied via the entity’s spatial location • Non-spatial aggregation cannot deal with spatial dependencies • Many relationships → large search space

  4. STEPS Steps to apply multi-relational techniques • Select multi-relational framework • Determine neighbour relationships • Establish relationships and spatial features/literals that can be extracted • Apply spatial classifier • Incorporate relationships and spatial features/literals • Perform the classification in parallel • Analyze results

  5. MULTI-RELATIONAL CLASSIFICATION • Classification with Inductive Logic Programming (ILP) • Find rules that can predict the labels of instances of a target entity • Example: “If a mall has a neighbouring house with income > 50,000 then the mall is profitable” • Sample rule (classification rule): profit(M,'Profitable') ← mall(M), neighbour(M,H), house(H), income(I,H)>50,000 • Rule head: profit(M,'Profitable') • Rule body: the literals to the right of ← • Threshold: 50,000 • Multi-feature aggregation literal: profit(M,'Profitable') ← mall(M), covariance({income(I,H), size(S,H)},{neighbour(M,H), house(H)},C), C>0 • Here covariance(...) is the aggregation literal
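
The two rules on this slide can be read as boolean tests over a mall and its neighbouring houses. The following is a minimal Python sketch of that reading, not the authors' ILP system; the toy data, the dictionary layout and the function names are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical toy data: identifiers and values are illustrative only.
houses = {
    "h1": {"income": 62_000, "size": 180},
    "h2": {"income": 41_000, "size": 120},
    "h3": {"income": 48_000, "size": 150},
}
neighbour = {"m1": ["h1", "h2"], "m2": ["h3"]}  # mall -> neighbouring houses


def has_rich_neighbour(mall, threshold=50_000):
    """Body of the sample rule: some neighbouring house has income > threshold."""
    return any(houses[h]["income"] > threshold for h in neighbour[mall])


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def income_size_covariance_positive(mall):
    """Body of the aggregation rule: covariance of income and size over the
    mall's neighbouring houses is greater than zero."""
    hs = neighbour[mall]
    incomes = [houses[h]["income"] for h in hs]
    sizes = [houses[h]["size"] for h in hs]
    return covariance(incomes, sizes) > 0
```

With this data, mall m1 satisfies both rule bodies, while m2 satisfies neither: its only neighbour earns less than the threshold, and a single house gives a covariance of zero.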

  6. STEPS Steps to apply multi-relational techniques • Select multi-relational framework • Determine neighbour relationships • Establish relationships and spatial features/literals that can be extracted • Apply spatial classifier • Incorporate relationships and spatial features/literals • Perform the classification in parallel • Analyze results

  7. NEIGHBOURHOOD DEFINITIONS • Tobler’s First Law of Geography (Waldo Tobler) • “Everything is related to everything else, but near things are more related than distant things.” • A neighbourhood definition is required that • Mimics real life • People tend to go to the closest mall, food-store, hospital or airport • Creates meaningful neighbourhood relationships between multiple types of entities in spatial data • Most dominant neighbourhood definition: Buffer Zone • The area that is within distance d of an entity • Major drawbacks • A constant-sized buffer zone is not appropriate for every entity size • The distribution of entities can change significantly • An infinite number of buffer zone sizes could be selected
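
To make the buffer-zone definition and its drawbacks concrete, here is a small Python sketch for point entities. The function name and the coordinates are hypothetical, and real entities with an extent (roads, parks) would need proper geometric distance rather than point-to-point distance.

```python
from math import hypot


def buffer_neighbours(entity, others, d):
    """Return the names of point entities within distance d of `entity`.

    `entity` is an (x, y) pair; `others` maps a name to an (x, y) pair.
    """
    x, y = entity
    return [name for name, (ox, oy) in others.items()
            if hypot(ox - x, oy - y) <= d]


# Hypothetical coordinates: a dense area and a sparse area.
dense = {"a": (1, 0), "b": (0, 2), "c": (3, 4)}
sparse = {"z": (100, 0)}
```

The same d behaves very differently depending on the local density: `buffer_neighbours((0, 0), dense, 2)` finds two neighbours, while `buffer_neighbours((0, 0), sparse, 2)` finds none, and every other choice of d gives yet another neighbourhood.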

  8. VORONOI NEIGHBOURHOOD • Voronoi Diagrams • Defined by and named after Georgy Feodosevich Voronoy • Partition a plane into regions • Each region (Voronoi cell) contains the area that is closer to its entity than to any other entity • Naturally represent relationships between entities • Completely data-driven, with no user parameter • Can be computed for • point data (e.g. houses) • segment data (e.g. roads) • areal data (e.g. lakes) • (Figures: Voronoi diagram for houses; Voronoi diagram for malls)

  9. VORONOI NEIGHBOURHOOD • Voronoi Neighbourhood Definition: two entities, A and B, are neighbours iff either • A and B are of different types, and A intersects the Voronoi cell of B or B intersects the Voronoi cell of A, or • A and B are of the same type, and the Voronoi cells of A and B are adjacent • (Figure: neighbourhood relationships)
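
For point data, the different-type half of this definition needs no explicit Voronoi diagram: a point lies in the Voronoi cell of B exactly when B is the nearest entity of B's type to that point (ignoring ties on cell boundaries). The Python sketch below uses that shortcut for malls and houses. It is an illustration under that assumption, with made-up coordinates and names, and it does not cover the same-type case, which requires cell adjacency.

```python
from math import dist


def nearest(point, sites):
    """Name of the site closest to `point` (ties resolved arbitrarily)."""
    return min(sites, key=lambda name: dist(point, sites[name]))


def cross_type_neighbours(malls, houses):
    """Set of (mall, house) pairs that are Voronoi neighbours, point data only."""
    pairs = set()
    for h, h_xy in houses.items():
        pairs.add((nearest(h_xy, malls), h))   # house lies in this mall's cell
    for m, m_xy in malls.items():
        pairs.add((m, nearest(m_xy, houses)))  # mall lies in this house's cell
    return pairs


# Hypothetical coordinates.
malls = {"m1": (0, 0), "m2": (10, 0)}
houses = {"h1": (1, 1), "h2": (9, 1), "h3": (4, 0)}
```

Here every house is paired with its closest mall, so h3 becomes a neighbour of m1 without any distance parameter being chosen.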

  10. STEPS Steps to apply multi-relational techniques • Select multi-relational framework • Determine neighbour relationships • Establish relationships and spatial features/literals that can be extracted • Apply spatial classifier • Incorporate relationships and spatial features/literals • Perform the classification in parallel • Analyze results

  11. EXTRACT RELATIONSHIPS • Initially only the entity types exist, with no relationships between them • Extract relationships between entities of different types • Extract relationships between entities of the same type

  12. EXTRACT SPATIAL FEATURES/LITERALS • Spatial Features (properties of entities): Size, Location (X, Y), Area, Length, Start/End (X, Y) • Ex: length(R)>1km • Spatial Literals (properties of relationships): Distance, Inside, Contains, Direction, Travel Time • Ex: distance(R,H)<50m • Spatial Aggregation Literals (properties of neighbourhoods): Spatial Trends, Spatial Autocorrelation, Areal Adjusted Mean • Ex: trend({distance(M,H),value(H)}, {house(H),neighbour(H,M)},S)
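
The slides do not define how the trend aggregation is computed. One plausible reading, assumed here purely for illustration, is the least-squares slope of house value against distance from the mall. The function name and the sample numbers below are hypothetical.

```python
def trend(pairs):
    """Least-squares slope of y on x for a list of (x, y) pairs.

    Assumed meaning of the 'trend' aggregation; returns 0.0 when all x are equal.
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx


# (distance to mall in metres, house value) for one mall's neighbouring houses.
neighbouring_houses = [(100, 300_000), (200, 350_000), (300, 400_000)]
```

Under this reading, a positive S means that house values rise with distance from the mall, so a literal such as S>0 selects malls whose more distant neighbours are the more valuable ones.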

  13. STEPS Steps to apply multi-relational techniques • Select multi-relational framework • Determine neighbour relationships • Establish relationships and spatial features/literals that can be extracted • Apply spatial classifier • Incorporate relationships and spatial features/literals • Perform the classification in parallel • Analyze results

  14. RULE LEARNING – OVERVIEW • Unified Multi-relational Aggregation-based Spatial Classifier (UnMASC) • Multi-relational based spatial classification algorithm • Two-class problem • Based on the idea of the sequential covering algorithm • Sequential covering algorithm • Generate one rule at a time • Refine rule by adding literals • Start new rule when rule-termination condition applies • Once a rule is finalised • Entities covered are removed • Another rule is started
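
The sequential covering loop described above can be sketched in a few lines of Python. This is a generic illustration rather than UnMASC itself: the precision-based scoring, the termination conditions, the parameter names and the toy literals are all assumptions.

```python
def precision(covered):
    """Fraction of covered examples that are positive (0.0 if none covered)."""
    return sum(label for _, label in covered) / len(covered) if covered else 0.0


def learn_rules(examples, literals, target_precision=1.0, max_literals=3):
    """examples: list of (features, label); literals: name -> predicate(features)."""
    rules, remaining = [], list(examples)
    while any(label for _, label in remaining):
        rule, covered = [], remaining
        # Refine the rule by adding the literal that most improves precision.
        while len(rule) < max_literals and precision(covered) < target_precision:
            candidates = []
            for name, pred in literals.items():
                if name not in rule:
                    subset = [e for e in covered if pred(e[0])]
                    candidates.append((precision(subset), len(subset), name, subset))
            if not candidates:
                break
            best = max(candidates, key=lambda c: (c[0], c[1]))
            if best[0] <= precision(covered):
                break  # assumed termination condition: no literal improves the rule
            rule.append(best[2])
            covered = best[3]
        if not rule:
            break  # nothing useful found; stop instead of looping forever
        rules.append(rule)
        covered_ids = {id(e) for e in covered}
        remaining = [e for e in remaining if id(e) not in covered_ids]  # remove covered
    return rules


# Hypothetical two-class toy problem.
examples = [
    ({"value": 3, "near_highway": True}, True),
    ({"value": 1, "near_highway": True}, False),
    ({"value": 3, "near_highway": False}, True),
    ({"value": 1, "near_highway": False}, False),
]
literals = {
    "value>2": lambda f: f["value"] > 2,
    "near_highway": lambda f: f["near_highway"],
}
```

On this toy data a single rule with the literal `value>2` covers both positive examples, after which no positives remain and learning stops.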

  15. RULE LEARNING – LITERAL SEARCH • Entity types need to be searched • Each search needs to identify the best • Feature(s) ex: value & distance • Aggregation (possibly) ex: trend • Threshold value ex: 0.1 • Comparison operator ex: > • This creates the best candidate literal for that search • Ex: trend({distance(M,H),value(H)}, {house(H),neighbour(H,M)},S), S>0.1
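
Choosing the threshold and comparison operator for one already-computed feature (for example the trend value S of each mall) can be done by exhaustive search over the observed values. The sketch below assumes precision, then coverage, as the quality measure; the slides do not state the measure actually used, and the function name and data are hypothetical.

```python
def best_literal(values, labels):
    """Return (operator, threshold, precision, coverage) of the best test.

    Tries `value > t` and `value <= t` for every observed value t and keeps
    the test with the highest precision, breaking ties by coverage.
    """
    best = None
    for t in sorted(set(values)):
        tests = ((">", lambda v, t=t: v > t), ("<=", lambda v, t=t: v <= t))
        for op, test in tests:
            covered = [lab for v, lab in zip(values, labels) if test(v)]
            if not covered:
                continue
            score = (sum(covered) / len(covered), len(covered))
            if best is None or score > best[0]:
                best = (score, op, t)
    if best is None:
        return None
    (prec, cov), op, t = best
    return op, t, prec, cov


# Hypothetical trend values for five malls and their class labels.
trend_values = [-0.3, 0.05, 0.1, 0.4, 0.7]
profitable = [False, False, False, True, True]
```

For this data the search returns the operator `>` with threshold 0.1, which corresponds to the literal S>0.1 in the example.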

  16. RULE LEARNING – LITERAL SEARCH • Initialize: • Pick entity type for classification (target entity type) • Select class label • Start a blank rule • R1: profitable(M,'yes') ← mall(M) • Here mall(M) gives the target entity type and 'yes' the class label

  17. RULE LEARNING – LITERAL SEARCH • Rule 1 – Iteration 1: • Search the entity-types referenced in rule for best feature • Search neighbours of entity-types referenced in rule for best feature • Add best feature to the rule • (Figure label: size) • Before: R1: profitable(M,'yes') ← mall(M) • After: R1: profitable(M,'yes') ← mall(M), TREND({D, V}, {house(H), neighbour(M,H), value(H, V), distance(M,H)}, S), S > 0

  18. RULE LEARNING – LITERAL SEARCH • Rule 1 – Iteration 2: • Search the entity-types referenced in rule for best feature • Search neighbours of entity-types referenced in rule for best feature • Add best feature to the rule • (Figure label: type) • R1: profitable(M,'yes') ← mall(M), TREND({D, V}, {house(H), neighbour(M,H), value(H, V), distance(M,H)}, S), S > 0, neighbour(R, H), type(R)='highway'

  19. RULE LEARNING – LITERAL SEARCH • Number of searches increases as rule length increases • Iteration 1 (4 searches): {Malls}, {Malls, Malls}, {Malls, Houses}, {Malls, Roads} • Iteration 2 (7 searches): {Malls}, {Malls, Malls}, {Malls, Houses}, {Malls, Roads}, {Malls, Houses, Malls}, {Malls, Houses, Roads}, {Malls, Houses, Houses} • Iteration 3 (10 searches): {Malls}, {Malls, Malls}, {Malls, Houses}, {Malls, Roads}, {Malls, Houses, Malls}, {Malls, Houses, Roads}, {Malls, Houses, Houses}, {Malls, Houses, Roads, Malls}, {Malls, Houses, Roads, Houses}, {Malls, Houses, Roads, Roads}
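
The counts 4, 7 and 10 follow a simple pattern: one search on the target type alone, plus one search per entity type for every prefix of the chain of entity types the rule already references. A short Python sketch of that enumeration, with a hypothetical function name, reproduces the lists on this slide.

```python
def enumerate_searches(chain, entity_types):
    """List the searches for a rule whose literals reference `chain` in order.

    Each prefix of the chain is extended by every entity type, and the
    target type on its own is searched as well.
    """
    result = [chain[:1]]
    for i in range(1, len(chain) + 1):
        for t in entity_types:
            result.append(chain[:i] + [t])
    return result


types = ["Malls", "Houses", "Roads"]
```

With three entity types, a chain of length k yields 1 + 3k searches, so the work per iteration grows linearly with rule length in this example.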

  20. PARALLEL SEARCH • (Figure: RuleLearner sends searches to several LiteralEvaluators, each of which returns its best literal) • UnMASC split into two methods • RuleLearner, the server component • Collects the searches that require evaluation • Maintains the rules and matching entities • Calls LiteralEvaluator • LiteralEvaluator, performs the searches • Extracts all spatial features • Performs all applicable aggregations • Finds best feature and threshold • Returns the best literal • RuleLearner picks best literal • The number of simultaneous searches possible is set a priori • If the number of searches possible < searches required, the remaining searches are queued
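
The division of labour between RuleLearner and LiteralEvaluator maps naturally onto a worker pool. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor`, which queues submitted tasks when there are more of them than workers. The function names and the placeholder scoring are assumptions and do not reflect the actual UnMASC implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def literal_evaluator(search):
    """Hypothetical stand-in for one search: returns (score, best literal).

    A real evaluator would extract features, aggregate and pick a threshold;
    here the score is a placeholder based on the search length.
    """
    score = len(search) / 10
    return score, "best literal for " + ", ".join(search)


def rule_learner_step(searches, max_parallel=6):
    """Run all searches with at most `max_parallel` at once and pick the best."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(literal_evaluator, searches))
    return max(results)  # highest score wins


pending = [["Malls"], ["Malls", "Houses"], ["Malls", "Roads"], ["Malls", "Malls"]]
```

Setting `max_parallel=1` gives the serial behaviour, which makes it easy to compare serial and parallel runs on the same list of searches.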

  21. PARALLEL SEARCH – OPTIMIZATION • Search sizes are different • For example • {Malls}: expected to be small • Only a few malls in a city • No aggregations are involved • {Malls} < {Malls, Houses} • Many houses in a city • Houses must be aggregated over their neighbouring malls • {Malls} < {Malls, Roads} < {Malls, Houses} • Aggregation has to occur • |roads| < |houses| • Problem: a very costly search may execute last • Estimate cost of each search based on • Number of entities in target table • Features of entities • Number of relationships between entities used in rule • Reprioritize queue → execute costly searches first
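
Reprioritizing the queue amounts to sorting the pending searches by estimated cost, largest first, so that the longest search starts early instead of running alone at the end. The cost model below is a deliberately crude assumption (target entity count plus the number of relationships along the search path, ignoring entity features), and all counts are invented for illustration.

```python
def estimated_cost(search, entity_counts, relationship_counts):
    """Crude cost estimate: target entities plus relationships along the path."""
    cost = entity_counts[search[0]]
    for a, b in zip(search, search[1:]):
        cost += relationship_counts.get((a, b), 0)
    return cost


def prioritise(searches, entity_counts, relationship_counts):
    """Order searches so that the most expensive one is executed first."""
    return sorted(
        searches,
        key=lambda s: estimated_cost(s, entity_counts, relationship_counts),
        reverse=True,
    )


# Hypothetical counts, not taken from the Burnaby dataset.
entity_counts = {"Malls": 20, "Houses": 50_000, "Roads": 3_000}
relationship_counts = {
    ("Malls", "Houses"): 40_000,
    ("Malls", "Roads"): 500,
    ("Malls", "Malls"): 60,
}
queue = [["Malls"], ["Malls", "Malls"], ["Malls", "Roads"], ["Malls", "Houses"]]
```

With these numbers the {Malls, Houses} search moves to the front of the queue and the cheap {Malls} search goes last, matching the ordering argued for on the slide.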

  22. STEPS Steps to apply multi-relational techniques • Select multi-relational framework • Determine neighbour relationships • Establish relationships and spatial features/literals that can be extracted • Apply spatial classifier • Incorporate relationships and spatial features/literals • Perform the classification in parallel • Analyze results

  23. EXPERIMENTS • Dataset • Real-world crime data • Collaboration with the Criminology Department at Simon Fraser University • For the Royal Canadian Mounted Police (RCMP) in British Columbia (BC) • Between August 1, 2001 and August 1, 2006 • Location of crime • Type of crime • British Columbia Assessment Authority (BCAA) dataset • Containing the property values of all plots of land within BC • The city of Burnaby, BC was selected • 66,000 entities • Types of entities & counters • Each property was labelled • Burglary exists or not

  24. EXPERIMENTS • UnMASC was evaluated using three experiments • Neighbour using Buffer zones • 2.8 million spatial relationships between entities • Neighbour using Voronoi neighbourhoods • 3.8 million spatial relationships • Use only the target entity type • no neighbouring entity types are evaluated • Effectiveness of the parallelization of UnMASC • Parallel (6 threads) • Serial (1 thread) • 5-fold cross-validation was performed
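
For readers unfamiliar with the evaluation protocol, 5-fold cross-validation splits the target entities into five parts and uses each part once for testing while training on the other four. The sketch below shows one simple way to generate such splits by index; the function name is hypothetical, and the slides do not say how the folds were actually drawn (for instance, whether they were shuffled or stratified).

```python
def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k-fold cross-validation over n items.

    Items are assigned to folds round-robin; no shuffling or stratification.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

Each entity appears in exactly one test fold, so the five test results together cover the whole dataset once.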

  25. EXPERIMENTS • Burglaries of Commercial Properties • 2812 commercial properties were selected as the target entities • Target: 33% were burglarized • R7: burglarized(C, ‘yes’) ←commercial(C), MEDIAN(B1, {industrial(I), neighbour(C,I), building_value(I,B1)}, M), M<445,000, industrial(I), neighbour(C,I), TREND({B2, X}, {duplex(D), neighbour(I,D), building_value(D, B2), x_coord(D,X)}, S), S>0.798, MAX(A, {park(P), neighbour(C,P), area(P,A)}, N), N<14,400 • Commercial property is • In a relatively inexpensive industrial neighbourhood • Neighbour to parks which could be a source of people inclined to commit crimes

  26. EXPERIMENTS • Burglaries of High Rise Properties • 1036 high-rise properties were selected • 44.3% of properties were burglarized • burglarized(H, ‘yes’) ←highrise(H), • building_value(H,V), V>2,160,000 • AVG(D1, {commercial(C), neighbour(H,C), distance(H,C,D1)}, A), A<302 • commercial(C), neighbour(H,C), • MEDIAN(W, {park(P), neighbour(C,P), width(P)}, M1), M1<122 • MEDIAN(X, {duplex(D), neighbour(C,D), land_value(D)}, M2), M2>351,500 • Burglarized high-rises are • Expensive • Near high-traffic areas with small parks

  27. CONCLUSION • Multi-relational approach to spatial classification was presented • Proposed a Voronoi-diagram based neighbourhood definition • Introduced a formal set of additions to the First Order Logic framework • Presented a scalable parallel implementation • Showed substantial gains in precision and accuracy • Demonstrated the importance of selecting the proper neighbourhood definition

  28. Thank you
