Identifying Patterns In Spatial Data

Understand spatial data mining, identify patterns in vast datasets like GPS traces, crime reports, and remote sensing images. Learn spatial statistics, relationship operations, and outlier detection.

mkristopher

Presentation Transcript


  1. Identifying Patterns In Spatial Data • Xun Zhou • University of Iowa • September 5, 2014

  2. Outline • Introduction • Spatial Data and Models • Statistical models • Spatial Pattern Families • Computational Challenges

  3. What is Spatial Data Mining (SDM)? • Identifying interesting, non-trivial, and useful patterns from large spatial datasets • “Spatial” is general – includes spatio-temporal • Examples of spatial/spatio-temporal datasets: • GPS traces • Facebook/Twitter check-ins • Climate observations (e.g., rainfall, temperature) • Remotely sensed images (e.g., NASA products) • Crime reports • Disease maps and records • Traffic statistics and road networks • Sales/market price data, supply maps

  4. Why is SDM important? • Location/time information brings rich context • Supports decision making • Understanding natural phenomena • Improves the quality of knowledge • London Cholera 1854 – John Snow • Modern examples • Predict land cover type with limited samples • Which animals often live in the same area? • Detect outbreaks of diseases/crimes • Find anomalous climate events (Picture courtesy: Prof. Shashi Shekhar @ UMN)

  5. What is “special” about “spatial”? (Picture source: [1])

  6. Spatial Data Mining Components • Input Data • Statistical Foundations • Output patterns • Computational Process

  7. Outline • Introduction • Spatial Data and Models • Statistical models • Spatial Pattern Families • Computational Challenges

  8. Spatial Data Types • Two data representation models Picture source: [2]

  9. Spatial Relationships and Operations • Between spatial objects: • Set-oriented: union, intersection, membership… • Topological: meet, within, overlap, connected… • Directional: north, east, left, above, below… • Metric: distance, area, perimeter • Spatial field operations: • Local: an individual location (elevation > 1000 ft.) • Focal: a small neighborhood (slope, gradient) • Zonal: part of a region (mountain peak) • Global: among all the locations (Mount Everest)
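
The four field operations on slide 9 differ only in how much of the raster each output value looks at. The following is a minimal sketch on a toy elevation grid; the grid values, the zone labels, and the function names are all assumptions made up for illustration, not material from the slides.

```python
# Toy elevation raster in feet (hypothetical values).
elevation = [
    [900, 1100, 1200],
    [950, 1500, 1300],
    [800, 1000, 1050],
]
# Hypothetical zone label for each cell (e.g., two regions "A" and "B").
zones = [
    ["A", "A", "B"],
    ["A", "B", "B"],
    ["A", "A", "B"],
]

def local_op(grid, threshold):
    """Local: each output cell depends only on the same input cell."""
    return [[value > threshold for value in row] for row in grid]

def focal_op(grid):
    """Focal: each output cell is the mean of its 3x3 window (clipped at edges)."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [grid[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out

def zonal_op(grid, zone_grid):
    """Zonal: one value per zone, here the highest elevation in each zone."""
    peaks = {}
    for row, zone_row in zip(grid, zone_grid):
        for value, zone in zip(row, zone_row):
            peaks[zone] = max(peaks.get(zone, value), value)
    return peaks

def global_op(grid):
    """Global: one answer over all cells, here the highest cell and its position."""
    return max((value, (r, c))
               for r, row in enumerate(grid)
               for c, value in enumerate(row))
```

Calling `local_op(elevation, 1000)` answers the "elevation > 1000 ft." question cell by cell, while `global_op(elevation)` plays the role of finding the single highest point.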

  10. Outline • Introduction • Spatial Data and Models • Statistical models • Spatial Pattern Families • Computational Challenges

  11. Two key features • Spatial autocorrelation • The first law of geography [*]: “Everything is related to everything else, but near things are more related than distant things.” • Spatial features are usually auto-correlated or clustered rather than randomly distributed • Spatial heterogeneity • Spatial patterns are not uniform globally – they vary from place to place. [*] Tobler, W. (1970). "A computer movie simulating urban growth in the Detroit region." Economic Geography, 46(2): 234-240.

  12. Statistical foundations • Spatial statistics – a branch of statistics * These are statistical models (like the normal distribution) and may not line up with data representation models.

  13. Spatial Neighborhood • A collection of nearby locations/spatial objects • Adjacent/connected objects/locations • Within a certain distance • The W-matrix (Figure: example locations A, B, C, D and a distance r)
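
One way to read slide 13 is that the W-matrix stores, for every pair of locations, whether they count as neighbors. The sketch below builds a distance-threshold W-matrix for four points named A to D and then uses it to compute Moran's I, a standard measure of the spatial autocorrelation introduced on slide 11. The coordinates, the threshold value, and the function names are assumptions for illustration; the slides do not give them, and Moran's I is brought in here as one common choice rather than the deck's stated method.

```python
import math

# Hypothetical coordinates for the four example locations.
points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 1.0), "D": (3.0, 3.0)}

def distance_w_matrix(coords, r):
    """Binary W-matrix: w[i][j] = 1 if i and j are distinct and within distance r."""
    names = list(coords)
    w = [[0.0] * len(names) for _ in names]
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i != j and math.dist(coords[a], coords[b]) <= r:
                w[i][j] = 1.0
    return names, w

def row_normalize(w):
    """Scale each row to sum to 1; rows with no neighbors are left as zeros."""
    normalized = []
    for row in w:
        total = sum(row)
        normalized.append([v / total for v in row] if total > 0 else list(row))
    return normalized

def morans_i(values, w):
    """Moran's I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2.

    Assumes the values are not all equal and that w has at least one nonzero entry.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * (cross / sum(d * d for d in dev))

names, w = distance_w_matrix(points, r=1.0)
```

With the threshold set to 1.0, A and C are each linked only to B, and D has no neighbors at all. Values that rise smoothly along a chain of neighbors give a positive Moran's I, while values that alternate between neighbors give a negative one.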

  14. Outline • Introduction • Spatial Data and Models • Statistical models • Spatial Pattern Families • Computational Challenges

  15. Spatial Pattern families • A comparison with traditional DM tasks

  16. Spatial Prediction • Traditional classifiers are based on the i.i.d. assumption and a global model • Linear regression, Decision Tree, SVM, CART, etc. • Spatial autocorrelation and variation are not modeled • Predicting land cover types, location-based recommendation • Regression • Spatial Decision Tree [5] • Information gain function: add a spatial autocorrelation measure • Decision rules: (Figures: C4.5 results on land cover data [5]; illustration of focal-test-based spatial decision tree [5])

  17. Spatial Outlier Detection • Traditional anomaly detection • Data is anomalous w.r.t. the global data distribution • Spatial outlier [6] • Data is anomalous w.r.t. its neighbors (discontinuity) • Finding suspicious buildings, broken sensors, or other points of interest… • Methods: • Variogram clouds • Moran scatterplot • Spatial statistic (S) (Figure: 1-D spatial data and distribution [1])
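
The difference between a global and a spatial outlier on slide 17 can be made concrete with a neighbor-difference test: compare each value with the average of its neighbors, then flag locations whose difference is unusually large. This is a minimal sketch in the spirit of the spatial statistic approach of [6]; the toy 1-D series, the threshold `theta`, and the function names are assumptions for illustration.

```python
import statistics

def line_neighbors(n):
    """Neighbors on a 1-D line: the previous and next positions, where they exist."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]

def spatial_outliers(values, neighbors, theta=2.0):
    """Flag indices whose value differs sharply from the mean of their neighbors.

    S(i) = value(i) - mean(values of i's neighbors); an index is reported when
    the standardized |S(i)| exceeds theta.
    """
    diffs = [values[i] - statistics.mean([values[j] for j in neighbors[i]])
             for i in range(len(values))]
    mu = statistics.mean(diffs)
    sigma = statistics.pstdev(diffs)
    if sigma == 0:
        return []  # every location agrees with its neighbors equally well
    return [i for i, d in enumerate(diffs) if abs((d - mu) / sigma) > theta]

# Hypothetical 1-D readings; position 4 breaks sharply from its surroundings.
readings = [1, 2, 1, 2, 9, 2, 1, 2, 1, 2]
flagged = spatial_outliers(readings, line_neighbors(len(readings)))
```

Note that the test looks at local discontinuity, not at whether a value is extreme overall, so the choice of neighborhood matters as much as the threshold.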

  18. Spatial Association • Spatial co-location pattern [7] • Given a number of spatial object types and instances • Find sets of types that are frequently located in proximity • Examples: {Fox, Rabbits}, {Nile Crocodiles, Egyptian Plover} (Picture source: [1]; co-located symbol sets in the picture: {‘+’, ‘x’}, {‘o’, ‘*’})
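
"Frequently located in proximity" needs a number behind it. A commonly used interest measure for co-location is the participation index from [7]: for each type, take the fraction of its instances that have a neighbor of the other type, then keep the smaller fraction. The sketch below handles only two-type patterns; the sample coordinates, the distance `d`, and the function name are assumptions for illustration.

```python
import math

def participation_index(instances, type_a, type_b, d):
    """Participation index of the two-type pattern {type_a, type_b}.

    instances: list of (type_name, (x, y)) pairs.
    Returns the minimum, over the two types, of the fraction of that type's
    instances lying within distance d of some instance of the other type.
    """
    a_points = [p for t, p in instances if t == type_a]
    b_points = [p for t, p in instances if t == type_b]
    if not a_points or not b_points:
        return 0.0
    a_involved = sum(any(math.dist(a, b) <= d for b in b_points) for a in a_points)
    b_involved = sum(any(math.dist(a, b) <= d for a in a_points) for b in b_points)
    return min(a_involved / len(a_points), b_involved / len(b_points))

# Hypothetical sightings: two of three foxes and two of four rabbits are paired up.
sightings = [
    ("fox", (0, 0)), ("fox", (5, 5)), ("fox", (10, 10)),
    ("rabbit", (0, 1)), ("rabbit", (5, 6)), ("rabbit", (20, 20)), ("rabbit", (30, 30)),
]
pi_fox_rabbit = participation_index(sightings, "fox", "rabbit", d=1.5)
```

Taking the minimum keeps a pattern from looking strong just because one abundant type happens to sit near a rare one.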

  19. Spatial Clustering • Grouping spatial objects into clusters such that • Intra-cluster similarity is maximized • Inter-cluster similarity is minimized • Detecting communities, crowds, building blocks, etc. • Is there a clustering tendency of data in space (point data)? • Approaches: 1. Hierarchical 2. Partitioning: k-means 3. Density-based: DBSCAN (Figure panels: Complete Spatial Randomness (CSR), Clustered, De-clustered. Picture courtesy: Prof. Shashi Shekhar @ UMN)

  20. Spatial Hotspot Detection • Special case of clustering • Identify regions with high density - not a complete partitioning of data • Ignore noise or sparse clusters • Crime/disease outbreaks, traffic jams, water pollution… • Statistical significance – avoid random clusters • Density-based approaches: DBSCAN [8] • Statistical tests – spatial scan statistics [9] (public health) (Figure panels: Spatial Scan Statistics vs. DBSCAN)
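
The density-based idea on slide 20, keeping dense regions and leaving sparse points unassigned, can be sketched with a simplified DBSCAN in the manner of [8]. This version uses a brute-force neighbor search, so it only suits small inputs; the parameter names `eps` and `min_pts`, the toy points, and the `NOISE` label are assumptions for illustration. It does not perform the statistical significance test that the spatial scan statistic provides.

```python
import math

NOISE = -1  # label for points that belong to no dense region

def dbscan(points, eps, min_pts):
    """Return one cluster label per point (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def region(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:  # core point: keep growing the cluster
                queue.extend(j_neighbors)
        cluster += 1
    return labels

# Hypothetical incident locations: two dense groups and one isolated report.
incidents = [(0, 0), (0, 1), (1, 0), (1, 1),
             (10, 10), (10, 11), (11, 10), (11, 11),
             (5, 5)]
hotspot_labels = dbscan(incidents, eps=1.5, min_pts=3)
```

The isolated report at (5, 5) ends up labeled as noise rather than being forced into a cluster, which is the "not a complete partitioning" behavior the slide describes.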

  21. New Dimensions of Spatial Patterns • Patterns on spatial networks • Hotspots (dangerous routes with high risk of accidents) [10] • Clusters (crimes along the streets, bus/bike route planning) • Predictions • Irregular/complex-shaped spatial patterns • Complex-shaped clusters (terrain constraints) • Irregular hotspots (gerrymander …) (Figure: results on pedestrian fatality data from Orlando, FL [10])

  22. Adding Time • Input data • Spatial data → spatio-temporal data • Time series • Vector: point sequences, polygon series… • Raster: image sequences, spatial time series (a time series at each grid cell) • Relationships: before, after, during, simultaneous, … • Statistical foundations • Markov Chain, Hidden Markov Model… • Spatiotemporal statistics
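
As a small illustration of the Markov chain entry on slide 22, the sketch below estimates first-order transition probabilities from the state sequence observed at a single location, for example one grid cell of a spatial time series. The state names, the sample sequence, and the function name are assumptions for illustration.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    """Estimate P(next state | current state) from consecutive pairs in a sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }

# Hypothetical yearly observations at one grid cell.
cell_history = ["dry", "dry", "wet", "dry", "wet", "wet"]
probs = transition_probabilities(cell_history)
```

A first-order chain like this assumes the next state depends only on the current one; it ignores what neighboring cells are doing, which is exactly the gap that spatiotemporal statistics aim to fill.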

  23. Adding Time - PATTERNS

  24. Adding Time – New PATTERNS • New dimensions of temporal information • Change • Repeating/periodicity (Figure: vegetation increase in Saudi Arabia due to irrigation [14], with panels for 2001, 2006, and 2012; an annual increase of 11.5%, 2001-2012)

  25. Change Footprint PATTERNS (Figure labels: Static; Local, Focal, Zonal; Between snapshots, Point in time series, Interval in time series; axis: Time)

  26. Outline • Introduction • Spatial Data and Models • Statistical models • Spatial Pattern Families • Computational Challenges

  27. Computational Challenges • Neighborhood graph generation • Parameter estimation • Better interpretability • Complex shapes of patterns • Filter-n-refine approach • Pattern completeness • High combinatorics of patterns • Enumeration and pruning strategies • Interest measure property • DP or greedy may not be used • HPC with Spatial Data Mining • Parallel/cloud computing • GIS on Hadoop (ESRI) (Figure labels: Pattern Interpretability, Conceptual Modeling, balance, Interest measure, Algorithm Design, Computational Scalability)
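
The filter-n-refine approach on slide 27 pairs a cheap test that discards most candidates with an exact test on the survivors. The sketch below applies it to neighborhood graph generation: a uniform grid acts as the filter, and the exact distance check acts as the refinement. The cell size choice, the sample points, and the function name are assumptions for illustration.

```python
import math
from collections import defaultdict

def pairs_within(points, d):
    """Return sorted index pairs (i, j), i < j, whose points lie within distance d."""
    # Filter step: hash points into square cells of side d. Two points within
    # distance d must fall in the same cell or in adjacent cells.
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(math.floor(x / d), math.floor(y / d))].append(idx)

    pairs = set()
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                candidates = grid.get((cx + dx, cy + dy), ())
                for i in members:
                    for j in candidates:
                        # Refine step: exact distance test on surviving candidates.
                        if i < j and math.dist(points[i], points[j]) <= d:
                            pairs.add((i, j))
    return sorted(pairs)

# Hypothetical locations: two close pairs and one isolated point.
sample = [(0.0, 0.0), (0.5, 0.5), (3.0, 3.0), (3.2, 3.1), (10.0, 10.0)]
edges = pairs_within(sample, d=1.0)
```

When points are spread out, most pairs are never compared at all, which is where the savings over the all-pairs scan come from; on heavily clustered data the benefit shrinks.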

  28. Summary • What is SDM and why it’s important • What’s special about spatial • Pattern families, potential directions and applications • Computational Challenges

  29. Acknowledgement • This presentation is prepared based on materials from Prof. Shashi Shekhar and the Spatial Database and Spatial Data Mining Group at the University of Minnesota (http://www.spatial.cs.umn.edu/).

  30. References and readings
    [1] Shekhar, Shashi, et al. "Identifying patterns in spatial information: A survey of methods." Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1.3 (2011): 193-214.
    [2] Xun Zhou, Shashi Shekhar, and Reem Y. Ali. "Spatiotemporal change footprint pattern discovery: an inter-disciplinary survey." Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4.1 (2014): 1-23.
    [3] Shashi Shekhar and Sanjay Chawla. Spatial Databases: A Tour. Prentice Hall, 2003.
    [4] Banerjee, Sudipto, Alan E. Gelfand, and Bradley P. Carlin. Hierarchical Modeling and Analysis for Spatial Data. CRC Press, 2004.
    [5] Jiang, Z., Shekhar, S., Zhou, X., Knight, J., and Corcoran, J. "Focal-test-based spatial decision tree learning: A summary of results." In Data Mining (ICDM), 2013 IEEE 13th International Conference on (pp. 320-329). IEEE, December 2013.
    [6] Shekhar, Shashi, Chang-Tien Lu, and Pusheng Zhang. "A unified approach to detecting spatial outliers." GeoInformatica 7, no. 2 (2003): 139-166.
    [7] Y. Huang, S. Shekhar, H. Xiong. "Discovering colocation patterns from spatial data sets: a general approach." Knowledge and Data Engineering, IEEE Transactions on 16 (12), 1472-1485.
    [8] Ester, Martin; Kriegel, Hans-Peter; Sander, Jörg; Xu, Xiaowei (1996). "A density-based algorithm for discovering clusters in large spatial databases with noise." In Simoudis, Evangelos; Han, Jiawei; Fayyad, Usama M. (eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96).
    [9] Kulldorff, Martin. "A spatial scan statistic." Communications in Statistics-Theory and Methods 26.6 (1997): 1481-1496.
    [10] Dev Oliver, Shashi Shekhar, Xun Zhou, Emre Eftelioglu, Michael Evans, Qiaodi Zhuang, James Kang, Renee Laubscher, and Christopher Farah. "Significant Route Discovery: A Summary of Results." In GIScience 2014 (to appear).
    [11] Celik, Mete, et al. "Mixed-drove spatiotemporal co-occurrence pattern mining." Knowledge and Data Engineering, IEEE Transactions on 20.10 (2008): 1322-1335.
    [12] Mohan, Pradeep, Shashi Shekhar, James A. Shine, and James P. Rogers. "Cascading spatio-temporal pattern discovery." Knowledge and Data Engineering, IEEE Transactions on 24, no. 11 (2012): 1977-1992.
    [13] Daniel B. Neill, Andrew W. Moore, Maheshkumar Sabhnani, and Kenny Daniel. "Detection of emerging space-time clusters." Proceedings of the 11th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 218-227, 2005.
    [14] Xun Zhou, Shashi Shekhar, Dev Oliver. "Discovering Persistent Change Windows in Spatiotemporal Datasets: A Summary of Results." In 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data (BigSpatial-2013), Nov 5, 2013, Orlando, Florida, USA.