Explore the design, measurements, and analysis of a Federated Service-Oriented GIS architecture for efficient data sharing and integration. This research addresses key issues like interoperability, federation, and high-performance support to enhance responsiveness and scalability. Utilizing components such as WMS and WFS, the study focuses on achieving seamless data access and query through innovative approaches like streaming services, pre-fetching, and client-based caching. Investigate the potential of capability-based federation to elevate GIS systems' performance.
High-Performance, Federated and Service-Oriented Geographic Information Systems Ahmet Sayar (asayar@cs.indiana.edu) Advisor: Prof. Geoffrey C. Fox
Outline • Motivations • Research Issues • Architecture: Federated Service-Oriented Geographic Information System • Performance enhancing designs - measurements and analysis • Conclusions
Geographic Information Systems (GIS) • GIS is a system for creating, storing, sharing, analyzing, manipulating and displaying geo-data and associated attributes. • Inherently requires federation (see the figure) • Autonomy for scalability, flexibility and extensibility • Distributed data access for geo-data resources (databases, digital libraries, etc.) • Utilizing remote analysis, simulation or visualization tools • Open Standards • OGC • ISO/TC-211
Motivations • Requirements for: • Interoperable Service-oriented Geographic Information Systems • Necessity for sharing and integrating heterogeneous data and computation resources to produce knowledge. • Uniform data access/query, display and analysis from a single access point • Responsive and interactive information systems • GIS applications require quick response • Emergency early warning systems • Homeland security and natural disasters.
Research Issues • Interoperability • Defining a component-based Service-oriented GIS data Grid framework • Adoption of Open Geographic Standards (data model and services) • Applying Web Service principles to GIS data services • Integrating Web Service and Open Geographic Standards • Federation • Capability-based federation of GIS Web Service components • Unified data access/query and display from a single access point through integrated data-views • Addressing high-performance support for responsiveness • Streaming GIS Web Services and Pre-fetching framework • Client-based caching • Parallel processing through attribute-based query decomposition
Service-oriented GIS: Web Service components and data flow • WMS are data rendering services providing human-comprehensible data (binary map images) • WFS are data services providing data in a common data model, GML (Geography Markup Language) • behaving as mediator and annotation services. • WMS and WFS have their own type of capability metadata defined by Open Geographic specs. • Inter-service communication is done through the “getCapability” service interface. • UDDI-based registry services. • Components are Web Services and all control goes through SOAP messages • XML-based query language (standard schema) • Built over: • Web Services standards (WS-I+) and • Open Geographic Standards (OGC and ISO/TC-211) • Consists of two types of online services • Web Map Services (WMS) and Web Feature Services (WFS) • And two types of data: • Binary data: map images (provided by WMS) • Structured data: GML, with content (core data) and presentation (attribute and geometry elements) (provided by WFS) [Figure: relation of the components and data flow. WMS (GML rendering) returns binary data through getCapability, getMap and getFeatureInfo; WFS (mediator) returns GML through getCapability, getFeature and DescribeFeatureType; both expose WSDL.]
Capability-based Federation of Standard GIS Web Service Components • Built over the proposed standard Web Service components and common data models • Federation is done by aggregating GIS Web Services’ capabilities metadata • Inspired by OGC’s cascading WMS • Unified data access/query/display from a single access point • Providing application-based hierarchical data definitions • layer-based data and service (WMS and WFS) compositions • Capability is basically metadata about data+service: • Server’s information content and acceptable request parameter values [Figure: Web Map Client (interactive map tools) connected via WSDL to the Aggregating WMS (Federator), which uses stubs (WSDL, SOAP, HTTP, “REST”) to reach WFS + Seismic Rec., WFS + State Bounds, …, WMS + OnEarth and Google Maps.]
Why Capability Metadata • Web Services provide key low-level capability but do not define an information or data architecture • These are left to domain-specific capabilities metadata and a data description language (GML). • Machine- and human-readable information • Enables easy integration and federation • Enables developing application-based, standard, interactive, re-usable tools • for data query, display and analysis • Seamless data access/query
High-performance Support for Responsive GIS Designs, measurements and analysis
Performance Investigation • Interoperability requirements bring up some compliance costs: • Common data model (GML) • Web Services (SOAP protocol for communication) • Approaches: Enhancing the GIS systems’ responsiveness • Data transfer and rendering • Streaming GIS Web Services (1) • Structured/annotated GML data rendering (2) • Federator-oriented approaches • Pre-fetching (3) • Client-based caching (4) • Query decomposition and parallel processing (5) • Testing with large scale Geo-science applications • Earthquake forecasting (PI), • Virtual California (VC) • Aim: Turning compliance requirements into competitiveness
Conventional OGC-GIS Systems: Baseline Performance Test • Naïve approach is characterized as • Stateless services • On-demand data access • Single-threaded and no caching • Systems developed with Open Geographic Standards have: • High degree of interoperability but poor performance results [Figure: test setup]
(1) Streaming GIS Web-Services • The concern is the transfer of large XML-structured data • XML representations of data tend to be significantly larger than binary representations • Larger data sizes consume more network bandwidth • We still need XML for interoperability reasons • In the initial development of the proposed Service-oriented GIS we used GIS Web Services with SOAP over HTTP as the transfer protocol. • BUT, this limited the performance. • We investigated “Streaming Data Transfer” • topic-based publish-subscribe messaging systems for exchanging SOAP messages and data payloads.
(1) Streaming GIS Web-Services (Cont) • Lines 1, 2 and 3 show the classic publish-find-bind triangle of Web Services • SOAP is used for negotiation (line 3): a standard getFeature request • Publisher information is returned as a (topic, IP, port) triple. • Publisher streams, subscriber receives. • The performance gain is 40% on average. [Figure: UDDI registry, (A) WMS client as subscriber and WFS server as publisher, each described by WSDL; lines 1, 2 and 3 form the publish-find-bind triangle, with getFeature on line 3 returning (topic, IP, port); GML flows from publisher to subscriber over the topic “Topic-wfs” on the NaradaBrokering server.]
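The publisher/subscriber hand-off described on this slide can be illustrated with a small in-process sketch. It is not NaradaBrokering and not the deck's code: a `LinkedBlockingQueue` stands in for the topic, and the class name `StreamingSketch` and the end-of-stream marker are assumptions made for illustration. The point it shows is that the subscriber can start working on GML chunks while the publisher is still sending.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class StreamingSketch {
    // Hypothetical end-of-stream marker; a real broker signals completion differently.
    private static final String END = "__END__";

    /** Publishes the chunks on one thread and receives them on the calling thread. */
    static List<String> stream(List<String> gmlChunks) {
        BlockingQueue<String> topic = new LinkedBlockingQueue<>(); // stand-in for a topic
        Thread publisher = new Thread(() -> {
            for (String chunk : gmlChunks) {
                topic.add(chunk);
            }
            topic.add(END);
        });
        publisher.start();

        List<String> received = new ArrayList<>();
        try {
            // The subscriber handles each chunk as soon as it arrives.
            for (String chunk = topic.take(); !END.equals(chunk); chunk = topic.take()) {
                received.add(chunk); // a renderer could draw this chunk immediately
            }
            publisher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }
}
```

Calling `StreamingSketch.stream(List.of("<gml:featureMember>a</gml:featureMember>", "<gml:featureMember>b</gml:featureMember>"))` returns the chunks in publication order.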
(2) GML Data Processing • Processing XML data: parsing and rendering to create map images. • Two well-known approaches are document models (DOM) and push models (SAX). • We use a pull approach for XML processing: • Parses only what is asked for • No support for document validation (a major performance gain) • Doesn’t build a complete object model in memory (unlike DOM) • Contents are returned directly to the application from calls to the parser (unlike SAX)
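A minimal sketch of the pull approach, using the JDK's StAX API (`javax.xml.stream`). The deck does not say which pull parser was used, so StAX is an assumption here, as are the class name `GmlPullSketch` and the choice to extract only `coordinates` elements. The loop asks the parser for the next event and ignores everything it does not need; no document tree is built and no callbacks are registered.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class GmlPullSketch {
    /** Returns the text of every coordinates element and skips all other content. */
    static List<String> coordinates(String gml) {
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(gml));
            while (reader.hasNext()) {
                // The application pulls the next event from the parser.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "coordinates".equals(reader.getLocalName())) {
                    result.add(reader.getElementText().trim());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed GML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String gml = "<gml:featureMember xmlns:gml=\"http://www.opengis.net/gml\">"
                + "<gml:Point><gml:coordinates>-118.2,34.1</gml:coordinates></gml:Point>"
                + "<gml:Point><gml:coordinates>-122.4,37.8</gml:coordinates></gml:Point>"
                + "</gml:featureMember>";
        System.out.println(coordinates(gml));
    }
}
```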
(3) Pre-fetching • Getting the GML data before it is needed • The pre-fetching module extension is shown in the grey region • Overcomes the network bandwidth problem and repeated data conversions. • This technique is good for infrequently changing archived data • Otherwise, it might cause consistency problems • PR runs a pre-defined task at a pre-defined periodicity • Red curve: map rendering over the pre-fetched data (ready-to-use GML data) • Black curve: map rendering through on-demand fetching [Figure: User Portal with interactive tools; Federator with processor, PR and GML temp storage on the local file system; NB; WMS and WFS services. PR: Pre-fetching runner, NB: NaradaBrokering, WMS: Web Map Service, WFS: Web Feature Service]
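A sketch of a pre-fetching runner that repeats a pre-defined task at a fixed period, assuming a `ScheduledExecutorService` as the timer and an in-memory reference in place of the temporary GML storage. The class name `PrefetchRunnerSketch` and the `Supplier<String>` fetch task are hypothetical; the deck does not show how PR is implemented.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

final class PrefetchRunnerSketch {
    // Stand-in for the temporary GML storage on the local file system.
    private final AtomicReference<String> store = new AtomicReference<>();
    private final Supplier<String> fetch; // hypothetical task, e.g. a getFeature call
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    PrefetchRunnerSketch(Supplier<String> fetch) {
        this.fetch = fetch;
    }

    /** Runs the pre-defined task once and replaces the stored GML. */
    void refresh() {
        store.set(fetch.get());
    }

    /** Repeats the task at the given period; an exception in the task stops the schedule. */
    void start(long periodSeconds) {
        timer.scheduleAtFixedRate(this::refresh, 0, periodSeconds, TimeUnit.SECONDS);
    }

    /** Returns the most recently pre-fetched GML, or null if nothing was fetched yet. */
    String current() {
        return store.get();
    }

    void stop() {
        timer.shutdownNow();
    }
}
```

The period is the knob that trades freshness against load: the longer it is, the staler the stored data can become relative to the source.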
(3) Pre-fetching vs. On-demand Fetching • For 100 MB, pre-fetching is about 30 times faster than conventional on-demand fetching. • The larger the data size, the higher the performance gain.
(4) Client-based Caching • Each client has a separate caching area allocated. • Applies working-window and locality principles to map image rendering • Clients are differentiated by the client-assigned session-id parameter in the header of queries. • Always keeps the most recently used data (the client’s last request) • Adds some overhead to maintain the working window for each client.
Brief Architecture
Server-side • Create identity card. Update at every request from the client • Register to client table • Set identity to message header
FormerRequest class:
  String uuid; /* unique user id */
  String bbox; /* bounding box of the user's last request */
  Double density; /* data size per unit square */
  Vector[] feature_data; /* geometry elements of the last request */
Client-side:
  ClientWSStub binding;
  binding = (ClientWSStub) new ServiceLocator().WMSServices(servaddress);
  String sessionID = session.getid(); // uuid-1
  String channel_name = "getMapChannel";
  /* Add SessionID to the SOAP message's header */
  binding.setHeader(service_address, channel_name, sessionID);
  Map mymap = binding.getMap(request);
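The server-side client table can be sketched as a map from session id to the client's last request. This is an illustration under assumptions, not the deck's implementation: the class name `ClientCacheSketch`, the use of a `ConcurrentHashMap`, and the simplified `FormerRequest` fields (strings instead of geometry vectors) are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ClientCacheSketch {
    /** Simplified stand-in for the FormerRequest "identity card". */
    static final class FormerRequest {
        final String bbox;              // bounding box of the last request
        final double density;           // data size per unit square
        final List<String> featureData; // geometry elements of the last request

        FormerRequest(String bbox, double density, List<String> featureData) {
            this.bbox = bbox;
            this.density = density;
            this.featureData = featureData;
        }
    }

    // One entry per client, keyed by the session id carried in the message header.
    private final Map<String, FormerRequest> clientTable = new ConcurrentHashMap<>();

    /** Called after every request: the entry for this client is replaced. */
    void update(String sessionId, FormerRequest latest) {
        clientTable.put(sessionId, latest);
    }

    /** Returns the client's previous request, or null for a first-time client. */
    FormerRequest lookup(String sessionId) {
        return clientTable.get(sessionId);
    }
}
```

A null result from `lookup` corresponds to a first-time query, for which no cached data or density estimate is available.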
Why Client-based Caching • Makes stateless GIS Web Services stateful • Allows sharing the workload as equally as possible for the most efficient parallel processing. Comparing with Google-like Map Servers: • In large-scale applications it is impossible to cache the whole data • Limited storage and computation capabilities • Google-like map servers are fast because • They replace computation with storage. • They pre-make all images and cut them up into tiles • They formalize the accepted requests in terms of parameters, and responses in terms of the tile compositions. • BUT, this is good only for client-server based applications • It can’t be applied to distributed dynamic data rendering and extensible applications. • They don’t deal with feature-enriched maps enabling attribute-based querying, • or with structured/annotated scientific data rendering.
(5) Parallel Processing over Client-based Caching • Main query • cached-data extraction • rectangulation – {Rectangles[Ri]} • partitioning – {sub-queries [ri]} • assigning separate threads • assembling the results [Figure: a successive request overlaps the cached data; the main query goes through cached-data extraction and rectangulation into R1 to R4 over the critical data layer; GetFeature requests r1, r2, r3, ..., rPn for the critical data falling into the partitioned regions are sent to the WFS that provides the critical data in GML; the cached GML and GML1, GML2, ..., GMLPn are assembled and combined with layers from other WFS and WMS.]
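The last two steps on this slide (assigning sub-queries to separate threads and assembling the results) can be sketched as follows. The class name `ParallelQuerySketch` and the `fetch` function, which stands in for a getFeature call returning features for one sub-query, are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelQuerySketch {
    /** Runs every sub-query on its own thread and merges the results in sub-query order. */
    static List<String> run(List<String> subQueries, Function<String, List<String>> fetch) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subQueries.size()));
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (String query : subQueries) {
                pending.add(pool.submit(() -> fetch.apply(query)));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> result : pending) {
                merged.addAll(result.get()); // blocks until that sub-query is done
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The total time is bounded below by the slowest sub-query, which is why evenly sized partitions matter for this design.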
Challenge: Geo-Data Characteristic • A point is described by a location attribute • (x, y) coordinates. • Linestrings, polylines, polygons, etc. are defined as sets of points. • The data set falling into a queried region is formulated with a bounding box (bbox) • Coordinates of a rectangle (a, b, c, d) • Geo-data is unevenly distributed and variable-sized according to its location attributes. • Ex. human population • Need for advanced techniques for workload sharing! [Figure: panels (1) and (2) show the bounding box from (a, b) to (c, d) cut at ((a+c)/2, b) and (c, (b+d)/2) into regions R1, R2, R3 and R4.]
Attribute-based Query Decomposition • Cached data extraction • Rectangulation over the remaining area: R1, R2, R3, R4 • Each rectangle goes through a partitioning process. • Blind partitioning • Such as first-time queries • Uses the default partitioning number • Smart partitioning • client-based caching • FormerRequest object • All partitions are assigned to separate threads, and the results are merged to create the final response [Figure: query bbox (minx, miny, maxx, maxy) overlapping the cached data; the remaining area is cut into rectangles R1 to R4, and a rectangle is then partitioned, e.g. R1 into 4.]
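One possible rectangulation of the remaining area is sketched below: the query bbox minus its overlap with the cached bbox is covered by at most four non-overlapping rectangles (two full-height side strips, plus a bottom and a top strip between them). The deck does not specify how R1 to R4 are cut, so this particular layout and the names `RectangulationSketch` and `BBox` are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class RectangulationSketch {
    static final class BBox {
        final double minx, miny, maxx, maxy;

        BBox(double minx, double miny, double maxx, double maxy) {
            this.minx = minx;
            this.miny = miny;
            this.maxx = maxx;
            this.maxy = maxy;
        }

        boolean isEmpty() {
            return minx >= maxx || miny >= maxy;
        }

        double area() {
            return isEmpty() ? 0.0 : (maxx - minx) * (maxy - miny);
        }
    }

    /** Returns the parts of the query that are not covered by the cached bbox. */
    static List<BBox> remainder(BBox query, BBox cache) {
        // Overlap between the query and the cached region.
        double ix0 = Math.max(query.minx, cache.minx);
        double iy0 = Math.max(query.miny, cache.miny);
        double ix1 = Math.min(query.maxx, cache.maxx);
        double iy1 = Math.min(query.maxy, cache.maxy);

        List<BBox> out = new ArrayList<>();
        if (ix0 >= ix1 || iy0 >= iy1) {
            out.add(query); // no overlap: the whole query must be fetched
            return out;
        }
        BBox[] candidates = {
            new BBox(query.minx, query.miny, ix0, query.maxy), // left strip
            new BBox(ix1, query.miny, query.maxx, query.maxy), // right strip
            new BBox(ix0, query.miny, ix1, iy0),               // bottom strip
            new BBox(ix0, iy1, ix1, query.maxy)                // top strip
        };
        for (BBox box : candidates) {
            if (!box.isEmpty()) {
                out.add(box);
            }
        }
        return out;
    }
}
```

When the cached region covers the whole query, the list is empty and the response can be built from cached data alone.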
Smart Partitioning through Client-based Caching • Determining the most efficient number of partitions (Pn) • Based on the locality principle. • Assumption: former and current requests have similar data density • Cached data area: CD_size_br2 = (maxxc - minxc)*(maxyc - minyc) • Main-query area: R_size_br2 = (maxx - minx)*(maxy - miny) • Thr: pre-defined threshold value that changes from data to data. • Pn: the number of partitions calculated for a rectangle • If Pn >= 2, cut the rectangle into Pn equal-sized regions. [Figure: cache bbox from (minxc, minyc) to (maxxc, maxyc) and query bbox from (minx, miny) to (maxx, maxy)]
Assigning Partitions to Workers • Partitions are assigned to the worker nodes in round-robin fashion. • We keep a pool of worker nodes for each feature layer to which parallel processing is applied. • According to the algorithm • PN: number of partitions • WN: number of worker nodes in the pool • share is the number of partitions each worker is supposed to get • Check whether there are still remaining partitions waiting (rmg) • Assignments: • The first rmg worker nodes are assigned share+1 partitions • The others (WN-rmg) are assigned share partitions
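A sketch of the round-robin assignment, assuming that share is the integer quotient PN / WN and rmg is the remainder PN % WN. The slide names these quantities but its algorithm listing did not survive, so those two formulas and the class name `PartitionAssignmentSketch` are assumptions. Dealing partitions out in round-robin order gives exactly the stated outcome: the first rmg workers receive share+1 partitions and the rest receive share.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionAssignmentSketch {
    /** Returns, for each worker, the indices of the partitions assigned to it. */
    static List<List<Integer>> assign(int pn, int wn) {
        if (wn <= 0) {
            throw new IllegalArgumentException("need at least one worker node");
        }
        List<List<Integer>> perWorker = new ArrayList<>();
        for (int w = 0; w < wn; w++) {
            perWorker.add(new ArrayList<>());
        }
        for (int p = 0; p < pn; p++) {
            perWorker.get(p % wn).add(p); // round-robin over the worker pool
        }
        return perWorker;
    }

    /** Returns how many partitions each worker gets, computed from share and rmg. */
    static int[] counts(int pn, int wn) {
        if (wn <= 0) {
            throw new IllegalArgumentException("need at least one worker node");
        }
        int share = pn / wn; // assumed: partitions every worker gets
        int rmg = pn % wn;   // assumed: partitions left over after equal sharing
        int[] counts = new int[wn];
        for (int w = 0; w < wn; w++) {
            counts[w] = share + (w < rmg ? 1 : 0);
        }
        return counts;
    }
}
```

For example, 7 partitions over 3 workers gives share = 2 and rmg = 1, so the workers receive 3, 2 and 2 partitions.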
Vertical partitioning in the case of 5 partitions (bbox: request) • -110,35,-100,36: GFeature-1 • -110,36,-100,37: GFeature-2 • -110,37,-100,38: GFeature-3 • -110,38,-100,39: GFeature-4 • -110,39,-100,40: GFeature-5
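The five bounding boxes above are what an equal split of the bbox -110,35,-100,40 along the y axis produces. A minimal sketch of that split follows; the class name `StripPartitionSketch` and the array layout {minx, miny, maxx, maxy} are assumptions for illustration.

```java
final class StripPartitionSketch {
    /** Cuts a bbox into pn equal strips along the y axis; each row is {minx, miny, maxx, maxy}. */
    static double[][] partition(double minx, double miny, double maxx, double maxy, int pn) {
        if (pn < 1) {
            throw new IllegalArgumentException("need at least one partition");
        }
        double step = (maxy - miny) / pn;
        double[][] strips = new double[pn][];
        for (int i = 0; i < pn; i++) {
            double low = miny + i * step;
            // Use the exact upper bound for the last strip to avoid rounding gaps.
            double high = (i == pn - 1) ? maxy : miny + (i + 1) * step;
            strips[i] = new double[] {minx, low, maxx, high};
        }
        return strips;
    }
}
```

Each strip then becomes the bbox of one getFeature sub-query.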
Data Access Timings (No Cached Data) • T(data access) = T(query conversion: getFeature to SQL) + T(GML conversion) + T(streaming the data from WFS to federator) + T(building GML at federator) [Figure: Federator, WFS, DB]
Overhead and Response Timings (example case: 10-threaded parallel processing) • The performance does not increase in the same ratio as the number of threads • Overheads: query partitioning, sub-query creation, map creation and map transfer. • There is no performance gain when the data size handled is less than a threshold. [Figure: Browser with event-based dynamic map tools, Federator, WFS services, DB]
Partial Usage of Cached Data (example case: 1/2 cached) • There is no performance gain for small data sizes due to the overheads. • For 10 MB, the proposed system is almost 4 times faster than the ordinary on-demand, single-threaded system. • The performance gain increases: • As the data size increases. • As the overlap with the cached region increases • 100% overlap behaves like the pre-fetching case [Figure: Federator with CT, WFS services, DB]
Conclusions • Streaming data transfer techniques allow data rendering even on partially returned data. • Pull parsing gives the best results for rendering XML-encoded GML data by eliminating the requirement of data validation. • The federator’s natural characteristics allowed us to develop advanced caching and parallel processing designs. • Pre-fetching and parallel-processing techniques are mutually exclusive. • The best performance is achieved through pre-fetching, but it can cause data inconsistency. • Triggering periodicity must be defined carefully. • The success of parallel-processing techniques depends on how well the workload is shared among worker nodes. • Geo-data is unevenly distributed and variable-sized. • We saw that • applying working-window and locality principles by means of client-based caching and • parallel processing through attribute-based query decomposition helped us increase the system responsiveness to a greater extent.
Conclusions – General Framework • Heterogeneous data sources are queried as a single resource • Heterogeneous: Autonomous local resources controlling definition of data • Single resource: Remove the burden of individually accessing each data source with ad-hoc query languages. • WFS-based mediation : • Data and query conversions • Easy extension with new data and service resources • Open Geographic and Web Service standards • No physical data integration • Data always at local source • Easy maintenance of data and high degree of autonomy • Seamless interaction with the system through integrated data views as multi-layered map images
Contributions • A federated Service-oriented Geographic Information Systems framework • Integrating Web Services with Open Geographic Standards to support interoperability at both data and service levels • Production of knowledge from distributed data sources in multi-layered map images. • Hierarchical data definitions through capability metadata federations • Enabling unified interactive data access/query and display. • Investigated performance efficient designs and did detailed benchmarking • Streaming GIS Web Services • Federator-oriented high-performance design techniques • Pre-fetching • Client-based caching : Working-window and locality principles • Parallel processing through attribute-based query decomposition
Acknowledgement • The work described in this presentation is part of the QuakeSim project, which is supported by the Advanced Information Systems Technology Program of NASA's Earth-Sun System Technology Office. • Galip Aydin: Web Feature Server (WFS)
Capability-based Federation of the Standard Web Service Components • Built over the proposed standard Web Service components and common data models • Unified data access/query/display from a single access point • Providing application-based hierarchical data definitions • layer-based data and service (WMS and WFS) compositions • Federation is done by aggregating GIS Web Services’ capabilities metadata • Capability is basically metadata about data+service: • Server’s information content and acceptable request parameter values • Service calls: • 1. GetCapability (metadata about data+service) • 2. GetMap (get map data as a set of layers) • 3. GetFeatureInfo (query the attributes of data) • Application-based hierarchical data: • [Application] Pattern Informatics • [Layer-1] State-boundary over Satellite • [Data-1] State-boundary (WFS-1) • [Data-2] Satellite-Image (WMS-2) • [Layer-2] Google map (WMS-1) • [Layer-3] Earthquake-Seismic • [Data-1] Earthquake-Seismic (WFS-3) • Sample layers for PI: • NASA satellite layer • Earthquake-seismic layer • Google Map layer • State-boundaries layer • Events: • Move • Zooming in/out • Panning (drag-drop) • Rectangular region • Distance calc. • Attribute querying [Figure: GIS Browser / User Portal with interactive map tools calling the Federator (capability federation, map rendering), which calls the WMS and WFS services with calls 1, 2 and 3; panels a, b, c and d.]
Hierarchical Data: Integrated Data-view • 1: Google map layer • 2: State boundary lines layer • 3: Seismic data layer • Event-based interactive tools: query and data analysis over integrated data views
Integrated views • Event-based querying through integrated views. • WFS-based mediators • XML-based query language • Federation-specific related work (might no longer be active) • MIX (mediation of information using XML) • SRB/MCAT (SDSC) • TSIMMIS (Stanford Univ.) • XML-based standard queries for the standard services. • Capability gives the list of data provided, the attributes by which they can be queried, and the constraints on queries needed to create valid requests such as getMap and getFeature. • We do syntactical and structural integration.
Hierarchical Data / Integrated Data-view for the IEISS Geo-science Application • Application-based hierarchical data: • [Application] IEISS • [Layer-1] Gas-pipeline over Satellite • [Data-1] Gas-pipeline (WFS-1) • [Data-2] Satellite-Image (WMS-2) • [Layer-2] Google map (WMS-1) • [Layer-3] Electric-power • [Data-1] Electric-power (WFS-3)
Event-based Interactive Map Tools • <event_controller> • <event name="init" class="Path.InitListener" next="map.jsp"/> • <event name="REFRESH" class="Path.InitListener" next="map.jsp"/> • <event name="ZOOMIN" class="Path.InitListener" next="map.jsp"/> • <event name="ZOOMOUT" class="Path.InitListener" next="map.jsp"/> • <event name="RECENTER" class="Path.InitListener" next="map.jsp"/> • <event name="RESET" class="Path.InitListener" next="map.jsp"/> • <event name="PAN" class="Path.InitListener" next="map.jsp"/> • <event name="INFO" class="Path.InitListener" next="map.jsp"/> • </event_controller>
Generalizing the Problem Domain • Query heterogeneous data sources as a single resource • Heterogeneous: local resource controls definition of the data • Single resource: remove the burden of individually accessing each data source • Easy extension with new data and service resources • No real integration of data • Data always at local source • Easy maintenance of data • Seamless interaction with the system • Collaborative decision making [Figure: client/user query over the WWW to an integrated view provided by federation services; mediators sit in front of the data sources: DB, files, data in files, HTML, XML/relational databases, spatial sources/sensors]
Generalization of the Proposed Architecture • GIS-style information model can be redefined in any application area, such as Chemistry and Astronomy • Application Specific Information Systems (ASIS). • We need to define Application Specific: • Language (ASL) -> GML: expressing domain-specific features and semantics of data • Feature Service (ASFS) -> WFS: serving data in a common language (ASL) • Visualization Services (ASVS) -> WMS: visualize information and provide a way of navigating ASFS-compatible/mediated data resources • Capabilities metadata for ASVS and ASFS. • Federator federating the capabilities of distributed ASVS and ASFS to create an application-based hierarchy of distributed data and service resources. • Mediators: query and data format conversions • Data sources maintain their internal structure • Large degree of autonomy • No actual physical data integration [Figure: AS Repository, AS Tools (ASVS, ASFS), AS Sensors and user-defined AS Services (such as filter, transformation, reasoning, data mining, analysis) exchange messages using ASL; a Federator (capability federation, ASL rendering) offers unified data query/access/display over ASVS and mediators through standard service APIs.]
Contributions (Systems Software) • Developing a Web Map Server (WMS) in Open Geographic Standards • Extended with Web Service Standards and • Streaming map creation capabilities • Developing a GIS Federator • Provides application-specific, layer-structured hierarchical data as a composition of distributed standard GIS Web Service components • Enables uniform data access and query • Interactive map tools for data display, query and analysis. • Browser- and event-based. • Extended with AJAX (Asynchronous JavaScript and XML)