High-Performance Federated and Service-Oriented Geographic Information Systems. Ahmet Sayar Advisor: Prof. Geoffrey C. Fox. Outline. Background: Geographic Information Systems and Open Geographic Standards Motivations and Motivating Use Cases Research Issues
High-Performance Federated and Service-Oriented Geographic Information Systems Ahmet Sayar Advisor: Prof. Geoffrey C. Fox
Outline • Background: Geographic Information Systems and Open Geographic Standards • Motivations and Motivating Use Cases • Research Issues • Architecture: Federated Service-Oriented Geographic Information System • Performance enhancing designs -measurements and analysis • Contribution • Future Work
Geographic Information Systems (GIS) • GIS is a system for creating, storing, sharing, analyzing, manipulating and displaying spatial data and associated attributes. • GIS evolved from mainframe systems to desktop systems to distributed systems. • Modern GIS require: • Distributed data access for spatial databases • Use of remote analysis, simulation or visualization tools. • Problems with traditional distributed GIS approaches: • Distributed nature of the geo-data; various client-server models, databases, HTTP, FTP
Open Geographic Standards • The aim is to make geographic information and services neutral and available across any network, application, or platform. • Two major well-known standards bodies: Open Geospatial Consortium (OGC) and ISO/TC211. • OGC specifications define online services and data models: • Data format specs: Geography Markup Language (GML) • Service specs: Web Feature Service (WFS), Web Map Service (WMS) • OGC services are HTTP-based, which limits their data transport capabilities. • HTTP-based services are request-response type services; centralized and synchronous applications.
Motivations • Requirements for interoperable service-oriented Geographic Information Systems • Necessity for sharing and integrating heterogeneous data and computation resources • Uniform data access/query/display from a single access point • Responsive and interactive GIS • GIS applications require quick response • Emergency early warning systems • Homeland security and natural disasters.
Motivating Use Cases • Earthquake science applications • Pattern Informatics (PI) • Earthquake forecasting code developed by Prof. John Rundle (UC Davis) and collaborators; uses seismic archives. • Virtual California (VC) • Time series analysis code that can be applied to GPS and seismic archives, both real-time and archival data. • Interdependent Energy Infrastructure Simulation System (IEISS) – Los Alamos National Laboratory (LANL) • Models infrastructure networks (e.g. electric power systems and natural gas pipelines) and simulates their physical behavior and the interdependencies between systems.
Research Issues • Interoperability • Adoption of Open Geographic Standards • Applying Web Service principles to GIS data services • Flexibility and Extensibility • The system should bridge GIS and Web Service communities by adapting standards from both • Other GIS applications should be able to consume data without having to do costly format conversions • Federation • Federation of GIS systems • Unified data access/query/display from a single access point • Principles for generalizing the proposed federated GIS system • In terms of components, framework and requirements. • Addressing high-performance support for responsiveness
Interoperable Service-oriented GIS • Composed of two types of online services: Web Map Services (WMS) and Web Feature Services (WFS) • And two types of data: • Binary data: map images (provided by WMS) • Structured data: GML, with content (core data) and presentation (attribute and geometry elements) (provided by WFS) • WMS and WFS each have their own type of capability metadata defined by the Open Geographic specs. They exchange capabilities through the “getCapability” service interface in order to make valid requests and get valid responses • UDDI-based registry services • Components are Web Services and all control goes through SOAP messages • [Figure: relation of the components and data flow. WMS (GML rendering) exposes getCapability, getMap and getFeatureInfo and returns binary data; WFS (mediator) exposes getCapability, getFeature and DescribeFeatureType and returns GML; both are described by WSDL.]
Federated Interoperable GIS • Unified data access/query/display from a single access point • Providing application-based hierarchical data definitions • Layer-based data and service (WMS and WFS) compositions • Federation is done by aggregating the capabilities metadata of GIS Web Services • A capability is basically metadata about data + service: • The server’s information content and acceptable request parameter values • Application-based hierarchical data: • [Application] IEISS • [Layer-1] Gas-pipeline over Satellite • [Data-1] Gas-pipeline (WFS-1) • [Data-2] Satellite-Image (WMS-2) • [Layer-2] Google map (WMS-1) • [Layer-3] Electric-power • [Data-1] Electric-power (WFS-3) • Sample layers for IEISS: • Gas-pipeline • Electric-power • NASA satellite • State-boundaries • [Figure: the GIS browser in the user portal (interactive map tools) sends requests to the federator, which performs capability federation and map rendering over several WMS and WFS instances. Numbered calls: 1. GetCapability (data and operations available) 2. GetMap (get map data in a set of layers) 3. GetFeatureInfo (query the attributes of data)]
Why Capability metadata • Web Services provide key low-level capability but do not define an information or data architecture • These are left to domain-specific capabilities metadata and the data description language (GML). • Machine- and human-readable information • Enables easy integration and federation • Enables developing application-based, standard, interactive, re-usable tools • for data query, display and analysis • Seamless data access/query
Generalizing the Proposed Architecture - I • One can define a GIS-style information model in many application areas, such as Chemistry and Astronomy • Application Specific Information Systems (ASIS). • We have investigated the requirements and principles to generalize the proposed federated GIS approach. • From GML to ASL (Application Specific Language) • Data description language in the form of domain-specific features • From WFS to ASFS (Feature Services) • Provides data in ASL with standard service interfaces • From WMS to ASVS (Visualization Services) • Domain-specific display format definitions and standard services • Visualizes information and provides a way of navigating ASFS-compatible databases (cf. GetFeatureInfo for GIS) • Need to define application-specific capabilities metadata for ASVS and ASFS.
Generalization of the Proposed Architecture - II • Mediators: query and data format conversions • ASFS: provides ASL (structured data covering content and presentation tags). • ASVS: provides common data representations from ASL, as binary images • The federator federates the capabilities of distributed ASVS and ASFS to create an application-based hierarchy of distributed data and service resources • [Figure: ASIS architecture. A federator ASVS (capability federation, ASL rendering) offers unified data query/access/display through a standard service API. Mediators with standard service APIs wrap AS tools (ASVS and ASFS), an AS repository, AS sensors, and user-defined AS services such as filter, transformation, reasoning, data-mining and analysis. Messages use ASL.]
Performance Investigation • Interoperability requirements bring up some compliance costs: • Common data model (GML) • Web Services (SOAP protocol for communication) • Approaches: Enhancing the GIS systems responsiveness • Streaming GIS Web Services • Pre-fetching • Parallel processing with caching • Testing with large scale science applications using large scale data, and resource consuming processes • Earthquake forecasting (PI), • Virtual California (VC) • Turning compliance requirements into competitiveness
Limits of Conventional OGC-GIS systems • On-demand data access, single-threaded and no caching • Related projects: Deegree and UMN-Minnesota Map Servers • Baseline performance tests over the systems developed with Open Geographic Standards: • Local-area network, from database to user ends • For small data sets (less than 500 KB), response times are acceptable • For larger data sizes, performance is not sufficient.
Design & Measurement-1: Large-sized structured data transfer • XML representations of data tend to be significantly larger than binary representations • Larger data sizes consume more network bandwidth. • In the initial development of the proposed SOA-based GIS we used GIS Web Services with SOAP over HTTP as the transfer protocol. • BUT, this limited the performance. • We investigated “Streaming Data Transfer”: topic-based publish-subscribe messaging systems for exchanging SOAP messages and data payloads.
Streaming GIS Web-Services • Lines 1, 2 and 3 show the classic publish-find-bind triangle of Web Services • SOAP is used for negotiation (line 3): a standard getFeature request • Publisher information is returned as a (topic, IP, port) triple. • The publisher streams, the subscriber receives. • The performance gain is 40% on average • [Figure: the WMS (client, subscriber) and the WFS (server, publisher) are each described by WSDL and form the triangle of lines 1 to 3 with the UDDI registry. The getFeature request on line 3 returns (topic, IP, port); GML is then streamed through the NaradaBrokering server on topic “Topic-wfs”.]
Design & Measurement-2: GML Data Processing • Processing XML data: parsing and rendering to create map images. • Two well-known approaches are document models (DOM) and push models (SAX). • We use a pull approach for XML processing: • Parses only what is asked for • No support for document validation (a major performance gain) • Doesn’t build a complete object model in memory (unlike DOM) • Contents are returned directly to the application from calls to the parser (unlike SAX)
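As an illustration of the pull approach, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree.XMLPullParser`. The sample document, its tag names and the helper `iter_coordinates` are hypothetical simplifications, not real WFS output and not the parser used in the presented system.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified GML-like document (not a real WFS response).
SAMPLE_GML = (
    b"<FeatureCollection>"
    b"<featureMember><Fault name='A'><coordinates>-118.2,34.1</coordinates></Fault></featureMember>"
    b"<featureMember><Fault name='B'><coordinates>-117.5,33.9</coordinates></Fault></featureMember>"
    b"</FeatureCollection>"
)


def iter_coordinates(chunks):
    """Yield (x, y) pairs as soon as each <coordinates> element is complete."""
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)  # data may arrive in arbitrary pieces
        for _event, elem in parser.read_events():  # the application pulls events
            if elem.tag == "coordinates":
                x, y = (float(v) for v in elem.text.split(","))
                yield x, y
            elif elem.tag == "featureMember":
                elem.clear()  # drop the finished feature's subtree to save memory


# Feed the document in 64-byte pieces, as if it were streamed.
chunks = [SAMPLE_GML[i:i + 64] for i in range(0, len(SAMPLE_GML), 64)]
points = list(iter_coordinates(chunks))
```

The application decides when to ask for the next event and discards each feature once it has been used, so only a small part of the document needs to stay in memory at a time.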
Geo-Data Characteristic • A data item is described by a location attribute: (x, y) coordinates. • A set of data is described by a bounding box (bbox): (a, b, c, d) • Geo-data is unevenly distributed and variable-sized according to its location attributes. • Ex. human population • The workload cannot be shared evenly • Supporting alternative techniques based on data characteristics: • 1. Pre-fetching • 2. Parallel processing with caching through attribute-based query decomposition • [Figure: two panels (1) and (2) showing a bbox with corners (a, b) and (c, d), with the points ((a+c)/2, b) and (c, (b+d)/2) marked and regions R1 to R4.]
Design & Measurement-3: Pre-fetching (PM) • Getting the GML data before it is needed • Overcomes the network bandwidth problem and repeated data conversions. • Suited to infrequently changing archived data • Otherwise it may cause consistency problems • PM runs a pre-defined task at a pre-defined period, independent of the application • [Figure: the pre-fetching module (PM) fetches GML from the WFS instances through NaradaBrokering (NB) into temporary GML storage on the local file system; the user portal with interactive tools talks to the federator, which also reaches the WMS instances. Legend: PM: pre-fetching module; NB: NaradaBrokering; WMS: Web Map Service; WFS: Web Feature Service.] • [Plot: red curve: pre-fetching (data is brought to the federator, ready to use); black curve: on-demand fetching from remote heterogeneous resources.]
Pre-fetching vs. On-demand Fetching • For 10 MB, pre-fetching is about 200 times faster than conventional on-demand fetching. • The larger the data size, the higher the performance gain.
Design 4: Parallel Processing and Caching • Main query -> cached-data extraction -> rectangulation {Rectangles[Ri]} -> partitioning {sub-queries [ri]} -> assigning separate threads -> assembling the results • [Figure: for a successive request (cases 1 to 4 relative to the cached data), the main query goes through cached-data extraction and rectangulation into rectangles R1 to R4. Each rectangle of the critical data layer is partitioned into GetFeature requests r1, r2, r3, ..., rPn sent to the WFS that provides the critical data in GML; the returned GML1, GML2, ..., GMLPn are combined with the cached GML. Layers from other WFS and WMS are shown alongside.]
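The slide does not spell out how rectangulation is computed. As a minimal sketch, assuming axis-aligned bboxes in (minx, miny, maxx, maxy) form, one possible way to split the un-cached part of a request into rectangles is shown below. The function name `uncached_rectangles` and the strip-based splitting are illustrative assumptions, not the presented system's implementation.

```python
def uncached_rectangles(request, cached):
    """Return rectangles covering the part of `request` not covered by `cached`.

    Both arguments are bboxes in (minx, miny, maxx, maxy) form.
    """
    rminx, rminy, rmaxx, rmaxy = request
    cminx, cminy, cmaxx, cmaxy = cached

    # Clip the cached bbox to the request bbox.
    ix0, iy0 = max(rminx, cminx), max(rminy, cminy)
    ix1, iy1 = min(rmaxx, cmaxx), min(rmaxy, cmaxy)
    if ix0 >= ix1 or iy0 >= iy1:
        return [request]  # no overlap: the whole request is un-cached

    rects = []
    if rminx < ix0:  # strip to the left of the cached area, full height
        rects.append((rminx, rminy, ix0, rmaxy))
    if ix1 < rmaxx:  # strip to the right of the cached area, full height
        rects.append((ix1, rminy, rmaxx, rmaxy))
    if rminy < iy0:  # piece below the cached area, between the two strips
        rects.append((ix0, rminy, ix1, iy0))
    if iy1 < rmaxy:  # piece above the cached area, between the two strips
        rects.append((ix0, iy1, ix1, rmaxy))
    return rects


# A request whose lower-left quarter is already cached.
rects = uncached_rectangles((0, 0, 10, 10), (0, 0, 5, 5))
```

Each returned rectangle can then be partitioned and queried independently, while the overlapping area is served from the cache.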
Attribute-based Query Decomposition Over Un-cached Regions • Finding the number of partitions needed for each rectangle • Calculate the cached data density • Compare with the pre-defined threshold value, which defines a region’s maximum possible size • Then divide the region into equal-sized (in bbox) sub-regions whose size is less than or equal to the threshold value • Creating sub-queries and assembling the result sets • Sub-queries for the partitions inherit all the attributes from the main query. • The only difference is the bbox values: • GetFeature-1: -110,35,-100,36 • GetFeature-2: -110,36,-100,37 • GetFeature-3: -110,37,-100,38 • GetFeature-4: -110,38,-100,39 • GetFeature-5: -110,39,-100,40
Caching • Basically removes repeated jobs • One-time caching: recently fetched data is kept for successive requests • For each session (browser), separate short-term cache data • Session Tracking for Caching • How do servers know which request came from whom? • Mapping browser-based sessions to Web Services • Standard Web Service interfaces and message formats • Each request initiated from the same browser has the same sessionID. • A new “sessionID” entry is added to the header of the SOAP request • requestObj.setHeader(service_address, channel_name, sessionID)
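To make the per-session, short-term cache concrete, here is a minimal Python sketch keyed by sessionID. The class `SessionCache`, its time-to-live parameter and the explicit `now` argument are assumptions made for illustration; the slides do not describe how cached entries are stored or expired.

```python
class SessionCache:
    """Short-term cache holding the most recent result per browser session."""

    def __init__(self, ttl_seconds=300.0):
        self.ttl_seconds = ttl_seconds  # assumed lifetime of a cached entry
        self._entries = {}  # session_id -> (stored_at, bbox, gml)

    def put(self, session_id, bbox, gml, now):
        # Keep only the latest fetch for this session ("one-time caching").
        self._entries[session_id] = (now, bbox, gml)

    def get(self, session_id, now):
        entry = self._entries.get(session_id)
        if entry is None:
            return None
        stored_at, bbox, gml = entry
        if now - stored_at > self.ttl_seconds:
            del self._entries[session_id]  # entry is too old to reuse
            return None
        return bbox, gml


cache = SessionCache(ttl_seconds=60.0)
cache.put("browser-A", (-110, 35, -100, 40), "<gml-A/>", now=0.0)
hit = cache.get("browser-A", now=10.0)         # same session, still fresh
miss_other = cache.get("browser-B", now=10.0)  # different session
expired = cache.get("browser-A", now=100.0)    # same session, too old
```

Keying by sessionID keeps the caches of different browsers separate, which matches the idea that each request from the same browser carries the same sessionID.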
Measurement-4: Performance Tests – Parallel Processing and Caching • Comparing the bbox of the cached data with the bbox of the request gives 3 possible scenarios • Case 1: No usage of cached-data • Case 2: Complete usage of cached-data • Best case, looks like pre-fetching • Case 3: Partial usage of cached-data
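The three scenarios can be distinguished with a simple bbox comparison. The following Python sketch assumes bboxes in (minx, miny, maxx, maxy) form; the function name and the returned labels are hypothetical.

```python
def classify_cache_usage(request_bbox, cached_bbox):
    """Return 'none', 'complete' or 'partial' for a request against a cached bbox."""
    if cached_bbox is None:
        return "none"
    rminx, rminy, rmaxx, rmaxy = request_bbox
    cminx, cminy, cmaxx, cmaxy = cached_bbox

    # Case 1: the two boxes do not overlap at all.
    if cmaxx <= rminx or cminx >= rmaxx or cmaxy <= rminy or cminy >= rmaxy:
        return "none"
    # Case 2: the cached box fully contains the requested box.
    if cminx <= rminx and cminy <= rminy and cmaxx >= rmaxx and cmaxy >= rmaxy:
        return "complete"
    # Case 3: the boxes overlap, but part of the request is not cached.
    return "partial"


case1 = classify_cache_usage((0, 0, 10, 10), (20, 20, 30, 30))
case2 = classify_cache_usage((2, 2, 4, 4), (0, 0, 10, 10))
case3 = classify_cache_usage((0, 0, 10, 10), (0, 0, 5, 5))
```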
Data Access Timings (No Cached Data) • $T_{\text{data access}} = T_{\text{query conversion (getFeature to SQL)}} + T_{\text{GML conversion}} + T_{\text{streaming the data from WFS to federator}} + T_{\text{building GML at federator}}$ • [Figure: WFS and federator.]
Overhead and Response Timings (ex. case: 10-threaded parallel processing) • Performance does not increase at the same rate as the number of threads • Overheads: query partitioning, sub-query creation, map creation and map transfer. • There is no performance gain when the data size handled is below a threshold. • [Figure: WFS, federator, user portal with interactive map tools, and browser.]
Partial Usage of Cached Data (1/2 cached) • There is no performance gain for small data sizes due to the overheads. • For 10 MB, the proposed system is almost 8 times faster than the ordinary on-demand single-threaded system. • As the data size increases, the performance gain increases. • As the overlapped cached region increases, the performance gain increases • 100% overlap looks like the pre-fetching case
Contributions (Systems Research) • A framework for federated Service-Oriented GIS • Integrated Web Services with Open Geographic Standards for supporting interoperability at both data and application levels • Capability definitions and federation • Principles for Application Specific Information Systems • Conditions and requirements • Investigating performance efficient designs and detailed benchmarking • Streaming GIS Web Services and Pre-fetching • Attribute-based query partitioning and caching for parallel processing • Mapping browser-based session to Web Services • Forecasting workload from the cached-data
Contributions (Systems Software) • Developing Web Map Server (WMS) in Open Geographic Standards • Developing GIS Federator • Interactive map tools for data display, query and analysis. • Sci-Plot (Scientific data plotting) GIS Web Services • To integrate geo-science application data with Geo-data Grid
Future Research Directions • Developing generic framework for application specific information systems –ASIS • Considering semantics of data and services • Distributed capability federation • Capability files and application specific languages • Inter-service communications through capability exchange • Integrating ASIS with science applications • Science plotting services as a gateway between science data grid and applications • Handling processed data • Storage, overlay and association with raw(input) data
Acknowledgement • Galip Aydin: Web Feature Server (WFS) • Mehmet Aktas: Universal Description and Discovery Services (UDDI) • The work described in this presentation is part of the QuakeSim project which is supported by the Advanced Information Systems Technology Program of NASA's Earth-Sun System Technology Office. • This collaboration is part of the NASA ACCESS ROSES funded project, Modeling and On-the-fly Solutions in Solid Earth Science.
General Structure of AS-Tools: ASF(V)S-based mediation • To be concrete, let’s analyze WFS-based mediation • Query conversion • From “GetFeature” to a local query (ex. SQL for a database) • Data set conversion and composition • Local query result to GML • Common service API • GetCapability • GetFeature • DescribeFeatureType • [Figure: an AS tool (ASF(V)S) with a standard service API. Requests and responses pass through the ASF(V)S service layer (described by WSDL), the request handler, composition, mapping (query re-creation) and source connection/execution (heterogeneous sources), down to the data/information sources: databases, file systems or other remote/local sources.]
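A minimal Python sketch of WFS-style mediation follows: a GetFeature-like request (feature type plus bbox) is converted to SQL, and the rows are converted to a GML-like document. The table layout, the `ALLOWED_TYPES` whitelist, the function names and the simplified XML are all assumptions for illustration; a real WFS would emit schema-valid GML.

```python
import sqlite3
import xml.etree.ElementTree as ET

ALLOWED_TYPES = {"faults"}  # hypothetical feature types, each backed by a table


def getfeature_to_sql(type_name, bbox):
    """Query conversion: GetFeature parameters to a parameterized SQL query."""
    if type_name not in ALLOWED_TYPES:
        raise ValueError(f"unknown feature type: {type_name}")
    minx, miny, maxx, maxy = bbox
    sql = (f"SELECT name, x, y FROM {type_name} "
           "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ?")
    return sql, (minx, maxx, miny, maxy)


def rows_to_gml(type_name, rows):
    """Data set conversion: local query result to a simplified GML-like string."""
    root = ET.Element("FeatureCollection")
    for name, x, y in rows:
        member = ET.SubElement(root, "featureMember")
        feature = ET.SubElement(member, type_name, name=name)
        ET.SubElement(feature, "coordinates").text = f"{x},{y}"
    return ET.tostring(root, encoding="unicode")


def get_feature(conn, type_name, bbox):
    """Common service API entry point: run the query and return GML."""
    sql, params = getfeature_to_sql(type_name, bbox)
    rows = conn.execute(sql, params).fetchall()
    return rows_to_gml(type_name, rows)


# In-memory database standing in for a local data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faults (name TEXT, x REAL, y REAL)")
conn.executemany("INSERT INTO faults VALUES (?, ?, ?)",
                 [("A", -118.2, 34.1), ("B", -90.0, 45.0)])
gml = get_feature(conn, "faults", (-120, 30, -110, 40))
```

The client only sees the common service API and GML; swapping the database for a file system or a remote source would change `getfeature_to_sql` and the connection, not the interface.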
Capabilities Federation • Capability Files for Standard Services • Each file has three parts: general service metadata (<Service>), operations, i.e. Web Service interfaces (<Request>), and metadata about the provided data/information (<LayerList> or <DataList>)
WMS:
<Capabilities>
  <Service> <Name> <OnlineResource> <ContactInfo> </Service>
  <Capability>
    <Request> <GetCapability> <GetMap> <GetFeatureInfo> </Request>
    <LayerList> <Layer-1: Satellite img> <Layer-2: gas-pipeline> <Layer-3: Google-map> </LayerList>
  </Capability>
</Capabilities>
WFS:
<Capabilities>
  <Service> <Name> <OnlineResource> <ContactInfo> </Service>
  <Capability>
    <Request> <GetCapability> <GetFeature> <DescribeFeatureType> </Request>
    <DataList> <Data-1: gas-pipeline> <Data-2: electric-power> <Data-3: other-data> </DataList>
  </Capability>
</Capabilities>
Parallel processing with caching through attribute-based query decomposition - I • 1. Determining the number of partitions (Pn) • The attribute is the bounding box (bbox), defined as (minx, miny, maxx, maxy) • CD_size_br2 = (maxxc - minxc) * (maxyc - minyc) • R_size_br2 = (maxx - minx) * (maxy - miny) • A pre-defined thr (threshold) value determines whether partitioning is required for a rectangle (bbox) • Pn: the number of partitions calculated for a rectangle • [Figure: bboxes with corner labels (minx, miny), (minxc, minyc) and (maxxc, maxyc).]
Parallel processing with caching through attribute-based query decomposition - II • 2. How to partition a rectangle in bbox • We know the rectangle’s bbox and Pn. • Since we do not know in advance the workload that falls in that bbox, we partition the rectangle into equal-sized regions • There are two options here: vertical partitioning and horizontal partitioning. Let’s pick vertical and explain the algorithm • Partitioning the rectangle along the y coordinate, with sy = (maxy - miny) / Pn as the extent of each partition along y • Calculating the bboxes of the partitioned regions: for (i = 0; i < Pn*sy; i = i + sy) print(minx, miny + i, maxx, miny + i + sy); • [Figure: a rectangle from (minx, miny) to (maxx, maxy) divided along y into partitions 1, 2, ..., Pn, each of extent sy.]
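The equal-size partitioning along y can be sketched in Python as follows. This assumes that sy is the rectangle's extent in y divided by Pn, which is consistent with the example bbox “-110, 35, -100, 40” and Pn=5 used in this deck; the function name is hypothetical.

```python
def partition_bbox(bbox, pn):
    """Split a (minx, miny, maxx, maxy) bbox into `pn` equal strips along y."""
    minx, miny, maxx, maxy = bbox
    sy = (maxy - miny) / pn  # assumed extent of each partition along y
    partitions = []
    for i in range(pn):
        low = miny + i * sy
        # Use maxy for the last strip so rounding cannot leave a gap at the top.
        high = maxy if i == pn - 1 else miny + (i + 1) * sy
        partitions.append((minx, low, maxx, high))
    return partitions


strips = partition_bbox((-110, 35, -100, 40), 5)
```

For this input the strips are the five 1-degree bands from y=35 to y=40, each spanning the full x range of the rectangle.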
Parallel processing with caching through attribute-based query decomposition - III • 3. How to create sub-queries • After the partitioned regions’ bbox values are printed in the previous step, the corresponding sub-queries are created. • Each partition is differentiated by its bbox value calculated above. Other attributes are inherited from the main query. • Ex: the main query bbox (a rectangle from the rectangulation process) is “-110, 35, -100, 40” and let’s assume we found that Pn=5. Decomposing the rectangle according to Pn and sy and creating queries for these bbox values gives: • GetFeature-1: -110, 35, -100, 36 • GetFeature-2: -110, 36, -100, 37 • GetFeature-3: -110, 37, -100, 38 • GetFeature-4: -110, 38, -100, 39 • GetFeature-5: -110, 39, -100, 40
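Creating the sub-queries and running them on separate threads can be sketched in Python as below. The in-memory `FEATURES` list and `fetch_features` are hypothetical stand-ins for a WFS GetFeature call, and the query dictionary keys are assumptions; only the bbox values come from the example above.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the data behind a WFS: (name, x, y).
FEATURES = [("f1", -105.0, 35.5), ("f2", -105.0, 37.2), ("f3", -101.0, 39.9)]


def make_subqueries(main_query, bboxes):
    """Each sub-query inherits every attribute of the main query except bbox."""
    return [{**main_query, "bbox": bbox} for bbox in bboxes]


def fetch_features(query):
    """Stand-in for GetFeature: return the names of features inside the bbox."""
    minx, miny, maxx, maxy = query["bbox"]
    # Half-open intervals so a feature on a shared edge is returned only once
    # (a feature exactly on the outer max edge would be missed in this sketch).
    return [name for name, x, y in FEATURES
            if minx <= x < maxx and miny <= y < maxy]


def run_parallel(main_query, bboxes):
    """Run one sub-query per thread and assemble the results in order."""
    subqueries = make_subqueries(main_query, bboxes)
    with ThreadPoolExecutor(max_workers=len(subqueries)) as pool:
        parts = list(pool.map(fetch_features, subqueries))
    return [name for part in parts for name in part]


main_query = {"request": "GetFeature", "typename": "faults",
              "bbox": (-110, 35, -100, 40)}
bboxes = [(-110, 35 + i, -100, 36 + i) for i in range(5)]
subqueries = make_subqueries(main_query, bboxes)
result = run_parallel(main_query, bboxes)
```

`pool.map` returns results in the order of the sub-queries, so assembling the result set is a simple concatenation.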
Performance Tests – Based on Case Scenarios • As a result of comparing the bbox of the cached data and the request: • (1) No usage of cached-data • (2)-(3) Complete usage of cached-data • (4) Partial usage of cached-data • Main query >---rectangulation---> Rectangles[Rs] >---partition---> sub-queries [rs] • [Figure: the parallel processing and caching diagram from Design 4, with successive requests 1 to 4 shown relative to the cached data.]