Title: Spatial Data Mining in Geo-Business

harvey

Presentation Transcript


  1. Title: Spatial Data Mining in Geo-Business

  2. Overview. Paper available online at www.innovativegis.com/basis/present/GeoTec08/
     • Twisting the Perspective of Map Surfaces: describes the character of spatial distributions through the generation of a customer density surface
     • Linking Numeric and Geographic Distributions: investigates the link between numeric and geographic distributions of mapped data
     • Interpolating Spatial Distributions: discusses the basic concepts underlying spatial interpolation
     • Interpreting Interpolation Results: describes the use of “residual analysis” for evaluating spatial interpolation performance
     • Characterizing Data Groups: describes the use of “data distance” to derive similarity among the data patterns in a set of map layers
     • Identifying Data Zones: describes the use of “level-slicing” for classifying locations with a specified data pattern (data zones)
     • Mapping Data Clusters: describes the use of “clustering” to identify inherent groupings of similar data patterns
     • Mapping the Future: describes the use of “linear regression” to develop prediction equations relating dependent and independent map variables
     • Mapping Potential Sales: describes an extensive geo-business application that combines retail competition analysis and product sales prediction

  3. Density Surface Analysis
     • Processing flow: Customer Street Address → Geo-Coding → Customer GIS Location → Vector to Raster → Customer Counts (# per cell) → Roving Window → Density Surface Totals → Classify → Classified Density Levels
     • Vector to Raster counts the number of customers (points) within each grid cell
     • Roving Window calculates the total number of customers within a roving window (customer density)
     • Displays: 2D grid display of customer counts; 3D surface plot; 2D perspective display of density contours; Density Map
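The counting and roving-window steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's software: the function names, the square window shape, the window radius and the sample points are all assumptions made for the example.

```python
def count_per_cell(points, n_rows, n_cols):
    """Vector-to-raster step: count the points that fall in each grid cell."""
    grid = [[0] * n_cols for _ in range(n_rows)]
    for row, col in points:
        grid[row][col] += 1
    return grid


def roving_window_total(grid, radius=1):
    """Sum the counts inside a square window centered on each cell.

    Assumption: a square window clipped at the grid edge; the slide does
    not state the window shape or size.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    totals = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            for rr in range(max(0, r - radius), min(n_rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(n_cols, c + radius + 1)):
                    totals[r][c] += grid[rr][cc]
    return totals


# Hypothetical customer locations, already converted to (row, column) pairs.
customers = [(0, 0), (0, 1), (1, 1), (1, 1), (3, 3)]
counts = count_per_cell(customers, n_rows=4, n_cols=4)
density = roving_window_total(counts, radius=1)
```

Here `counts` plays the role of the 2D grid of customer counts and `density` the density surface totals that would then be classified into levels.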

  4. Identifying Pockets of High Density
     • Panels: Customer Density (Map Surface); Customer Density (Non-spatial Statistics)
     • Unusually High = Mean + 1 Standard Deviation
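The threshold rule on this slide is simple arithmetic on the cell values of the density surface. A minimal sketch, assuming a population standard deviation taken over all cells (the slide does not say which form is used) and a made-up 3 x 3 surface:

```python
from statistics import mean, pstdev


def unusually_high_cells(surface):
    """Flag cells whose value exceeds Mean + 1 Standard Deviation."""
    values = [v for row in surface for v in row]
    # Assumption: population standard deviation over every cell in the grid.
    cutoff = mean(values) + pstdev(values)
    flags = [[1 if v > cutoff else 0 for v in row] for row in surface]
    return cutoff, flags


# Hypothetical density totals (customers per roving window).
surface = [[0, 1, 2],
           [1, 2, 9],
           [0, 1, 2]]
cutoff, pockets = unusually_high_cells(surface)
```

Only the cell holding 9 lies above the cutoff, so it is the single "pocket" flagged in `pockets`.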

  5. Grid-based Analysis Frame (Keystone Concept)
     • Figure labels: Customer Database (non-spatial); Customer Database (spatial); Vector (point); Raster (cell) Analysis Frame; Latitude, Longitude, C, R
     • …GeoCoding plots each customer's address on the streets map
     • …V to R Conversion plots each customer's location in the analysis frame (grid)
     • …appends Lat, Lon, Column, Row location to customer records
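One way to picture the vector-to-raster step is as dividing a point's offset from the grid origin by the cell size and keeping the whole-number part. The sketch below assumes a grid aligned to latitude and longitude, a square cell measured in degrees, and row 0 at the northern edge; none of these conventions are stated on the slide, and `to_cell`, the field names and the coordinates are hypothetical.

```python
import math


def to_cell(lat, lon, north, west, cell_size_deg):
    """Return the (column, row) of the grid cell that contains a point.

    Assumptions: column 0 starts at the western edge, row 0 at the
    northern edge, and cells are cell_size_deg degrees on a side.
    """
    col = math.floor((lon - west) / cell_size_deg)
    row = math.floor((north - lat) / cell_size_deg)
    return col, row


# Hypothetical geocoded customer record (Lat, Lon come from geocoding).
record = {"customer_id": 1, "Lat": 39.975, "Lon": -104.965}
col, row = to_cell(record["Lat"], record["Lon"],
                   north=40.0, west=-105.0, cell_size_deg=0.01)
# Append the grid location to the customer record, as the slide describes.
record.update({"Column": col, "Row": row})
```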

  6. Surface Modeling (Spatial Interpolation)
     • Figure labels: Point Samples; Surface Map; “Spikes”; “Spikes ‘n Blanket”; Avg = 42.9; 66.3
     • …“maps the variance” by using geographic position to help explain the differences in the sample values

  7. IDW Interpolation (Inverse Distance Weighted)
     1) Identify data points in window: #11 value = 56.9; #14 value = 22.5; #15 value = 52.3; #16 value = 66.3
     2) Calculate distance from location to data points (Pythagorean Theorem): #11 distance = 22.80; #14 distance = 26.08; #15 distance = 6.32; #16 distance = 14.14
     3) Weight-average values in the window based on distance to grid location: (1/Distance)^2 * Value, “closer has more influence”
     4) Assign weight-averaged value: 53.35
     5) Move window to next grid location and repeat
     • Figure: Sampled Data, points numbered 1 to 16, with #11, #14, #15 and #16 falling in the window around grid location X
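The five steps above reduce to a weighted average with weights of (1/Distance)^2. The sketch below reuses the four distances and values listed on the slide; the function name `idw_estimate` and the `power` parameter are illustrative and not part of the original material.

```python
def idw_estimate(samples, power=2):
    """Inverse-distance-weighted average of (distance, value) pairs.

    Each sample is weighted by (1 / distance) ** power, so closer
    samples have more influence on the estimate.
    """
    weights = [(1.0 / distance) ** power for distance, _ in samples]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, samples))
    return weighted_sum / sum(weights)


# (distance, value) for samples #11, #14, #15 and #16, taken from the slide.
window = [(22.80, 56.9), (26.08, 22.5), (6.32, 52.3), (14.14, 66.3)]
estimate = idw_estimate(window)
print(round(estimate, 2))
```

With the slide's numbers the estimate agrees with the 53.35 shown in step 4 to two decimal places; sample #15, the closest at 6.32, carries roughly three quarters of the total weight.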

  8. Average vs. IDW Interpolated Surface
     • Panels: Average; IDW Surface; Difference Surface (IDW - Average)
     • Difference Surface: Min = -26.1, Max = 29.5
     • Legend: Reds, Avg > IDW; Greens, Avg < IDW

  9. IDW vs. Krig Interpolated Surfaces
     • Panels: IDW Surface; Krig Surface; Difference Surface (IDW - Krig)
     • Difference Surface: Min = -14.8, Max = 5.0
     • Legend: Reds, Krig > IDW; Greens, Krig < IDW

  10. Assessing Relationships Among Maps
     • Housing Density (Units/ac): South has Lower Density
     • Home Value ($K): South has Higher Values
     • Home Age (Years): South has Newer Homes

  11. Geographic Space and Data Space
     • Geographic Space: relative spatial position of measurements
     • Data Space: relative numerical magnitude of measurements (axes: Density, Value, Age)
     • Comparison Point #1: D = Low (2.4 units/ac), V = High ($407,000), A = Low (18.3 years)
     • Least Similar Point #2: D = High (4.8 units/ac), V = Low ($190,000), A = High (51.2 years)
     • Data Similarity is inversely proportional to Data Distance …as data distance increases, the map values for two locations are less similar

  12. Assessing Map Similarity
     • “Data Distance” determines similarity among data patterns
     • Comparison Point = 2.4, 407, 18.3 (shown in Geographic Space and Data Space)
     • Least Similar Point = 4.8, 190, 51.2
     • …the farthest away point in data space (least similar) is set to 0 and the comparison point is set to 100
     • …all other Data Distances are scaled in terms of their relative similarity as “percent similar” to the comparison point (0 to 100)
     • Legend: Percent Similar, 0 to 100 in steps of 5
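The scaling described on this slide can be written as percent similar = 100 * (1 - d / d_max), where d is a location's data distance to the comparison point and d_max is the largest such distance. A minimal sketch, assuming straight-line (Euclidean) distance on the raw map values; the slides do not say whether the layers are rescaled first, and with raw units the Home Value layer dominates the distance. The third pattern is a made-up midpoint added for illustration.

```python
import math


def percent_similar(comparison, patterns):
    """Scale data distances so the comparison point is 100 and the farthest is 0."""
    distances = [math.dist(comparison, p) for p in patterns]
    farthest = max(distances)
    if farthest == 0:
        # Every pattern is identical to the comparison point.
        return [100.0 for _ in patterns]
    return [100.0 * (1.0 - d / farthest) for d in distances]


# (density in units/ac, home value in $K, home age in years)
comparison_point = (2.4, 407.0, 18.3)       # from the slide
patterns = [
    (2.4, 407.0, 18.3),                     # the comparison point itself
    (4.8, 190.0, 51.2),                     # least similar point from the slide
    (3.6, 298.5, 34.75),                    # hypothetical midpoint
]
similarity = percent_similar(comparison_point, patterns)
```

Applied to every grid cell, the returned values would form the percent-similar map described on the slide.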

  13. Identifying Data Patterns of Interest
     • Unusually High Housing Density: Mean = 3.56, +StDev = 0.80, LevelMin = 4.36
     • Unusually Low Home Value: Mean = 257.0, -StDev = 67.2, LevelMax = 189.8
     • Panels: Geographic Space; Data Space; Geographic Space

  14. Level-Slicing Classifier (two variables)
     • Map panels: Unusually High Housing Density; Unusually Low Home Value; Unusually High Density and Low Value
     • Shown in Geographic Space and Data Space

  15. Level-Slicing Classifier (three variables)
     • Data Space …identifies combinations of selected measurements
     • Geographic Space …locates combinations of selected measurements
     • (high D, low V, high A): 1 + 2 + 4 = 7
     • (high D, low V but not high A): 1 + 2 + 0 = 3
     • …common “data zones” can be mapped by identifying specific levels of each mapped variable then adding the binary maps
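The binary-map addition can be sketched as follows. The density and value cutoffs (4.36 and 189.8) are the LevelMin and LevelMax figures from slide 13; the age cutoff of 40 years, the function name and the three sample cells are assumptions made only for this example.

```python
def level_slice(density, value, age, d_min, v_max, a_min):
    """Add three binary maps, coded 1, 2 and 4, into one zone-code map.

    Code 1 marks high density, 2 marks low value, 4 marks high age,
    so every combination of conditions gets a unique sum from 0 to 7.
    """
    codes = []
    for d_row, v_row, a_row in zip(density, value, age):
        codes.append([
            1 * (d >= d_min) + 2 * (v <= v_max) + 4 * (a >= a_min)
            for d, v, a in zip(d_row, v_row, a_row)
        ])
    return codes


# Hypothetical one-row map layers with three cells each.
density = [[4.8, 4.5, 2.4]]        # units/ac
value = [[180.0, 185.0, 407.0]]    # $K
age = [[51.2, 20.0, 18.3]]         # years

zones = level_slice(density, value, age,
                    d_min=4.36,    # LevelMin from slide 13
                    v_max=189.8,   # LevelMax from slide 13
                    a_min=40.0)    # assumed cutoff, not given in the slides
```

The first cell meets all three conditions (code 7), the second is high density and low value but not high age (code 3), and the third meets none (code 0).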

  16. Spatial Data Clustering
     • Data Space …plots and identifies groups of similar data values
     • Geographic Space …maps common data patterns (clusters)
     • Panels: Two Clusters; Three Clusters; Four Clusters
     • Cluster labels: Relatively high D, low V and high A; Relatively low D, high V and low A
     • …“data clusters” are identified as groups of neighboring data points in Data Space, and then mapped as corresponding grid cells in Geographic Space
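The slide does not name the clustering algorithm. As one common choice, the sketch below uses a bare-bones k-means: each data pattern is assigned to its nearest cluster center in data space, and each center is then moved to the mean of its members. Seeding the centers with the first k patterns, the fixed iteration count, the use of raw (unscaled) values and the two extra sample patterns are all assumptions.

```python
import math


def k_means(points, k, iterations=20):
    """Group data patterns by nearest cluster center in data space."""
    # Assumption: seed the centers with the first k patterns.
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each pattern to the closest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Move each center to the mean of the patterns assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centers


# (density in units/ac, home value in $K, home age in years)
cells = [
    (2.4, 407.0, 18.3),   # comparison point from slide 11
    (4.8, 190.0, 51.2),   # least similar point from slide 11
    (2.6, 395.0, 20.1),   # hypothetical
    (4.6, 201.0, 48.7),   # hypothetical
]
labels, centers = k_means(cells, k=2)
```

Writing each label back to its grid cell would give a two-cluster map; rerunning with k=3 or k=4 corresponds to the other panels on the slide.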

  17. Spatial Regression (prediction equation)
     • …relationship between Loan Concentration and independent variables Housing Density, Value and Age
     • Loan Concentration vs. Housing Density: Y = 26 - 5.7 * X_density [R^2 = 40%]
     • Loan Concentration vs. Home Value: Y = -13 + 0.074 * X_value [R^2 = 46%]
     • Loan Concentration vs. Home Age: Y = 17 - 0.074 * X_age [R^2 = 23%]
     • Map panels: Loan Concentration; Housing Density; Home Value; Home Age (legend: High to Low)
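Each equation on this slide has the form Y = a + b * X, which an ordinary least-squares fit of one map layer against another would produce. The sketch below is a generic single-variable fit, not the presenter's procedure. The sample data are made up: they are generated exactly from the slide's density equation, so the fit recovers it with an R^2 of 1 rather than the 40% reported for the real maps.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x, with R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Hypothetical cell values: housing density and loan concentration.
housing_density = [1.0, 2.0, 3.0, 4.0]
loan_concentration = [20.3, 14.6, 8.9, 3.2]   # generated from Y = 26 - 5.7 * X

intercept, slope, r_squared = fit_line(housing_density, loan_concentration)

# Apply the prediction equation cell by cell to a (hypothetical) density map.
density_map = [[2.0, 3.0]]
predicted_map = [[intercept + slope * d for d in row] for row in density_map]
```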

  18. Competition Analysis (Spatial Analysis Steps)
     Step 1
     • Build travel time maps for entire market area
     • Compute travel time from every location to our store
     • This requires grid-based map analysis software
     • Update customer record with travel time to our store
     • Add this to every non-customer record in trading area
     Step 2
     • Repeat for every competitor
     • Update every customer record with travel time to competitor store
     • Add to every non-customer record in trading area
     Step 3
     • Compute Travel Time Gain for travel to main store
     • Every customer and non-customer record is updated
     • The greater gain indicates lower travel effort to visit our store
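Step 3 can be sketched as a per-record subtraction. The slides do not give the formula for Travel Time Gain; the version below assumes gain = travel time to the competitor minus travel time to our store, which matches the statement that a greater gain means lower travel effort to visit our store. The record layout, field names and times are hypothetical.

```python
def travel_time_gains(our_time, competitor_times):
    """Gain relative to each competitor (assumed: competitor time - our time)."""
    return {name: t - our_time for name, t in competitor_times.items()}


# Hypothetical customer and non-customer records with travel times in minutes.
records = [
    {"id": 101, "our_time": 5.0, "competitor_times": {"A": 12.0, "B": 4.0}},
    {"id": 102, "our_time": 15.0, "competitor_times": {"A": 6.0, "B": 9.0}},
]

# Update every record with its gain against each competitor.
for record in records:
    record["gain"] = travel_time_gains(record["our_time"],
                                       record["competitor_times"])
```

A positive gain means our store is the quicker trip from that location; a negative gain means the competitor is closer in travel time.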

  19. Predictive Modeling (Spatial Statistics Steps)
     Step 4
     • Build analytic dataset from customer data
     • Geocoding information
     • Transactions, sales, product category purchases
     • Visitation frequency, recency, spend
     • Customer Segment, travel times, demographics
     Step 5
     • Build predictive models
     • Probability of Visitation (not possible for this demo)
     • Probability of Purchase by Product Category
     • Expected Sales and Transactions
     • Use store travel time and all competitive differences
     Step 6
     • Map the scores
     • The distribution of the scores provides visual evidence of the effects of travel time and competitive pressure
     • Spatial hypotheses can be tested and evaluated

  20. Mapping and Geo-query, Map Analysis Framework. While discrete sets of points, lines and polygons have served our mapping demands for over 8,000 years and keep us from getting lost, the expression of mapped data as continuous spatial distributions (surfaces) provides a new foothold for the contextual and numerical analysis of mapped data: “Thinking with Maps”

  21. References. Paper available online at www.innovativegis.com/basis/present/GeoTec08/
     • Twisting the Perspective of Map Surfaces: describes the character of spatial distributions through the generation of a customer density surface
     • Linking Numeric and Geographic Distributions: investigates the link between numeric and geographic distributions of mapped data
     • Interpolating Spatial Distributions: discusses the basic concepts underlying spatial interpolation
     • Interpreting Interpolation Results: describes the use of “residual analysis” for evaluating spatial interpolation performance
     • Characterizing Data Groups: describes the use of “data distance” to derive similarity among the data patterns in a set of map layers
     • Identifying Data Zones: describes the use of “level-slicing” for classifying locations with a specified data pattern (data zones)
     • Mapping Data Clusters: describes the use of “clustering” to identify inherent groupings of similar data patterns
     • Mapping the Future: describes the use of “linear regression” to develop prediction equations relating dependent and independent map variables
     • Mapping Potential Sales: describes an extensive geo-business application that combines retail competition analysis and product sales prediction

  22. www.innovativegis.com/basis/present/GeoTec08/ …to download this PowerPoint slide set

  23. Spatial Data Mining in Geo-Business Weighted Average Calculations for Inverse Distance Weighting (IDW) Spatial Interpolation Technique

  24. Evaluating Interpolation Performance
     • Techniques compared: Average; IDW; Krig
     • …Residual Analysis is used to evaluate interpolation performance (Krig at .03 Normalized Error is best)
