Metadata, Provenance and Web Service for Spatial Analysis --the case of spatial weights Luc Anselin, Sergio Rey, Wenwen Li GeoDa Center School of Geographical Sciences and Urban Planning Arizona State University
Some Specific Project Goals • Integrate and sustain a core set of composable, interoperable, manageable, and reusable CyberGIS software elements based on community-driven and open source strategies
Challenge • most current spatial analysis/spatial econometrics software is written for a single CPU • rethink and rewrite analytical, algorithmic and processing facilities to integrate them into a cyberinfrastructure • address the lack of interoperability
Spatial Econometrics Workbench • framework for supporting spatial econometric research in a cyberscience era (Anselin and Rey, IJGIS 2012) • leverage PySAL and CyberGIS • support scientific workflows
PySAL • open source library of Python routines for spatial analysis: geocomputation, spatial weights, spatial autocorrelation, spatial econometrics, regionalization • http://pysal.org • hosted on github
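For context, a minimal sketch of the PySAL 1.x weights API described above; the shapefile name is illustrative and stands in for any local polygon shapefile.

```python
# Minimal sketch using the PySAL 1.x API; NAT.shp stands in for any
# local polygon shapefile.
import pysal

# build rook contiguity weights directly from a polygon shapefile
w = pysal.rook_from_shapefile("NAT.shp")

print(w.n)            # number of observations
print(w.pct_nonzero)  # sparseness of the implied weights matrix
```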
PySAL Progress Report • current version is 1.6 (7th release) • 3.5 years of on-time bi-annual releases • 20,000+ downloads (10,000 in 2012) • recognized in the open source scientific community (included in the Anaconda distribution)
Migrating to CyberGIS • performance = need for parallelization + refined algorithms • interoperability = provide functionality as web services • replicability = need for metadata and provenance tracking
Example: Spatial Weights • includes spatial data source, type of weights (e.g., contiguity, distance), any standardization or manipulation (e.g., higher order)
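A hypothetical wmd fragment capturing these items; the field names are taken from the WPS request shown later in this deck.

```python
# Hypothetical wmd (weights metadata) record; the keys mirror the
# metadata passed to the web service in the Get Request slide.
import json

wmd = {
    "input1": {"type": "shp", "uri": "http://toae.org/pub/NAT.shp"},  # data source
    "weight_type": "rook",         # contiguity criterion
    "transform": "O",              # "O" = original scale, no standardization
    "parameters": {"p": 2, "k": 4},
}

with open("NAT_rook.wmd", "w") as f:
    json.dump(wmd, f, indent=2)
```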
Lack of Interoperability • different implementations • no standards • duplication of effort • hinders workflow chaining and reuse
Example: PySAL spreg • what do we know about south_k6.gwt and south_ep_k20.kwt?
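To make the gap concrete, a hedged sketch of the problem: spreg will happily consume a weights file, but nothing inside the file records how it was built. The variable names below come from PySAL's south example dataset; the weights file is the one named on the slide.

```python
# Provenance gap illustrated with PySAL 1.x spreg; HR90/RD90/UE90 are
# columns in the south example dataset, used here for illustration.
import numpy as np
import pysal

db = pysal.open("south.dbf")
y = np.array([db.by_col("HR90")]).T
X = np.array([db.by_col("RD90"), db.by_col("UE90")]).T

# k = 6 nearest neighbors? distance band? the file itself does not say
w = pysal.open("south_k6.gwt").read()

ols = pysal.spreg.OLS(y, X, w=w, spat_diag=True)
```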
Conceptual Framework • separate data source from operations • data source: polygon or coordinate files with standard metadata (projection, origin, etc.) • operations: weights metadata
Web service implementation (OGC WPS) • wraps the PySAL weights module • (re)creates the weights object from information in the wmd file • makes the weights object available as a file
Workflow • wmd file (json) → Weights Parser → PySAL Dispatcher → Weights → Output Metadata
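A sketch of that parse-dispatch-write pipeline, assuming the wmd fields shown earlier and a locally available shapefile (rook/queen only; error handling and shapefile sidecars omitted):

```python
# Sketch of the wmd -> parser -> PySAL dispatcher -> weights pipeline;
# assumes the wmd schema illustrated earlier in this deck.
import json
import pysal

def weights_from_wmd(wmd_path, out_path):
    with open(wmd_path) as f:
        meta = json.load(f)

    # dispatch on the weight type recorded in the metadata
    builders = {
        "rook": pysal.rook_from_shapefile,
        "queen": pysal.queen_from_shapefile,
    }
    w = builders[meta["weight_type"]](meta["input1"]["uri"])

    if "transform" in meta:
        w.transform = meta["transform"]  # e.g. "O" original, "R" row-standardized

    out = pysal.open(out_path, "w")      # a .gal extension selects the GAL writer
    out.write(w)
    out.close()
    return w
```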
Generate Weights from Shapefile • NAT.shp available on server • output format = gal
Get Request • http://spatial.gdta.asu.edu/cgi-bin/wps.cgi?request=Execute&service=WPS&version=1.0.0&identifier=weights_ws&status=false&datainputs=[outputformat=gal;metadata={"input1":{"type":"shp","uri":"http://toae.org/pub/NAT.shp"},"weight_type":"rook","transform":"O","parameters":{"p":2,"k":4}}] • the metadata input (the wmd content) travels inline in the datainputs parameter
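The same call issued programmatically; a sketch only, since whether the live service accepts a URL-encoded datainputs string in exactly this form depends on the WPS deployment.

```python
# Issuing the Execute request from Python; endpoint and parameter
# values are copied from the GET request above.
import json
import requests

metadata = {
    "input1": {"type": "shp", "uri": "http://toae.org/pub/NAT.shp"},
    "weight_type": "rook",
    "transform": "O",
    "parameters": {"p": 2, "k": 4},
}
params = {
    "request": "Execute",
    "service": "WPS",
    "version": "1.0.0",
    "identifier": "weights_ws",
    "status": "false",
    "datainputs": "[outputformat=gal;metadata=%s]" % json.dumps(metadata),
}

r = requests.get("http://spatial.gdta.asu.edu/cgi-bin/wps.cgi", params=params)
print(r.text)  # WPS response pointing at the generated .gal and .wmd files
```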
Sample gal output http://spatial.gdta.asu.edu/wpsoutput/e66df128-14ed-11e3-bde9-0050455c0671.gal
metadata (wmd) file http://spatial.gdta.asu.edu/wpsoutput/e66df128-14ed-11e3-bde9-0050455c0671.wmd
Performance Evaluation • How does PySAL scale when the amount of input data increases? • Is the overhead of web service framework acceptable? • How does the web service framework scale in handling massive concurrent requests?
Scale-up vs. Scale-out • scale-up: single high-end computer • configuration: 2 x 2.93 GHz quad-core Intel Xeon, 16 GB 1066 MHz DDR3 ECC memory, Mac OS X Lion 10.7.4 (11E53) • scale-out: web server cluster
Performance • experiments using a grid layout for N = 10,000 to N = 100,000 • rook contiguity and k nearest neighbors (k = 4) • input shapefiles on a server in Utah, web service on a server at ASU
Experiment 1 • timing: average over 5 runs • measures web server overhead, data transfer and computation • explores the effect of data size
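A rough local harness in the spirit of Experiment 1; it uses PySAL's regular-lattice generator instead of shapefiles, so it isolates the computation component only (no web server overhead or data transfer).

```python
# Timing rook contiguity construction on regular grids, averaged over
# 5 runs as in the slides; measures computation only.
import time
import pysal

for side in (100, 200, 300):          # N = side * side polygons
    runs = []
    for _ in range(5):
        t0 = time.time()
        w = pysal.lat2W(side, side)   # rook weights on a side x side lattice
        runs.append(time.time() - t0)
    print(side * side, sum(runs) / len(runs))
```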
Experiment 2 • scalability of the web service framework • high-end computer (8 cores) vs. cluster (4 computing nodes, 2 cores each) • metrics: total processing time and speedup
Experiment 3 • scalability of the cluster as computing nodes are added • metric: average response time for 128 concurrent requests • dataset: 10,000 polygons
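A sketch of a concurrent-load probe in the spirit of Experiment 3; the actual experiment exercised the weights service, whereas this simplified client fires a lightweight standard GetCapabilities call against the endpoint shown earlier.

```python
# Firing 128 simultaneous requests and averaging the response time;
# the endpoint is the one shown earlier, the payload is simplified.
import time
from multiprocessing.pool import ThreadPool
import requests

URL = ("http://spatial.gdta.asu.edu/cgi-bin/wps.cgi"
       "?request=GetCapabilities&service=WPS")

def timed_get(_):
    t0 = time.time()
    requests.get(URL)
    return time.time() - t0

latencies = ThreadPool(128).map(timed_get, range(128))  # 128 concurrent requests
print("average response time: %.3f s" % (sum(latencies) / len(latencies)))
```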
Towards a Standard • refine specification: flexible, expandable, deal with edge cases • improve performance (parallelization) • implement seek operations on distributed files • interoperability with other software