High-Performance, Federated and Service-Oriented Geographic Information Systems. Ahmet Sayar ( asayar@cs.indiana.edu ) Advisor: Prof. Geoffrey C. Fox. Outline. Motivations Research Issues Architecture: Federated Service-Oriented Geographic Information System
High-Performance, Federated and Service-Oriented Geographic Information Systems Ahmet Sayar (asayar@cs.indiana.edu) Advisor: Prof. Geoffrey C. Fox
Outline
• Motivations
• Research Issues
• Architecture: Federated Service-Oriented Geographic Information System
• Performance-enhancing designs: measurements and analysis
• Conclusions
Geographic Information Systems (GIS)
• A GIS is a system for creating, storing, sharing, analyzing, manipulating and displaying geo-data and associated attributes.
• Inherently requires federation (see the figure)
  • Autonomy for scalability, flexibility and extensibility
  • Distributed data access for geo-data resources (databases, digital libraries, etc.)
  • Use of remote analysis, simulation or visualization tools
• Open standards
  • OGC
  • ISO/TC-211
Motivations
• Requirements for interoperable Service-oriented Geographic Information Systems
  • Necessity of sharing and integrating heterogeneous data and computation resources to produce knowledge
  • Uniform data access/query, display and analysis from a single access point
• Requirements for responsive and interactive information systems
  • GIS applications require quick response
  • Emergency early-warning systems
  • Homeland security and natural disasters
Research Issues
• Interoperability
  • Defining a component-based Service-oriented GIS data Grid framework
  • Adoption of Open Geographic Standards (data model and services)
  • Applying Web Service principles to GIS data services
  • Integrating Web Service and Open Geographic Standards
• Federation
  • Capability-based federation of GIS Web Service components
  • Unified data access/query and display from a single access point through integrated data-views
• Addressing high-performance support for responsiveness
  • Streaming GIS Web Services and pre-fetching framework
  • Client-based caching
  • Parallel processing through attribute-based query decomposition
Service-oriented GIS: Web Service components and data flow
• Built over:
  • Web Services standards (WS-I+)
  • Open Geographic Standards (OGC and ISO/TC-211)
• Consists of two types of online services: Web Map Services (WMS) and Web Feature Services (WFS)
  • WMS are data rendering services providing human-comprehensible data (binary map images)
  • WFS are data services providing data in the common data model GML (Geography Markup Language), behaving as mediator and annotation services
• And two types of data:
  • Binary data: map images (provided by WMS)
  • Structured data: GML, with content (core data) and presentation (attribute and geometry elements) (provided by WFS)
• WMS and WFS have their own types of capability metadata defined by Open Geographic specs.
• Inter-service communication is done through the “getCapability” service interface.
• UDDI-based registry services
• Components are Web Services and all control goes through SOAP messages
• XML-based query language (standard schema)
[Figure: relation of the components and data flow. The WMS (GML rendering) is described by WSDL, exposes getCapability, getMap and getFeatureInfo, and returns binary data; the WFS (mediator) is described by WSDL, exposes getCapability, getFeature and DescribeFeatureType, and returns GML.]
Capability-based Federation of Standard GIS Web Service Components
• Built over the proposed standard Web Service components and common data models
• Federation is done by aggregating GIS Web Services’ capabilities metadata
  • Inspired by OGC’s cascading WMS
• Unified data access/query/display from a single access point
• Provides application-based hierarchical data definitions
  • Layer-based data and service (WMS and WFS) compositions
• A capability is essentially metadata about data + service:
  • The server’s information content and acceptable request parameter values
[Figure: a Web Map Client with interactive map tools talks (WSDL) to the aggregating WMS (federator), whose stubs reach, over HTTP, SOAP/WSDL and “REST”, WFS + Seismic Rec., WFS + State Bounds, …, WMS + OnEarth and Google Maps.]
Why Capability Metadata
• Web Services provide key low-level capability but do not define an information or data architecture
  • These are left to domain-specific capabilities metadata and the data description language (GML).
• Machine- and human-readable information
• Enables easy integration and federation
• Enables development of application-based, standard, interactive, reusable tools for data query, display and analysis
• Seamless data access/query
High-performance Support for Responsive GIS: designs, measurements and analysis
Performance Investigation
• Interoperability requirements bring compliance costs:
  • Common data model (GML)
  • Web Services (SOAP protocol for communication)
• Approaches to enhancing the GIS systems’ responsiveness:
  • Data transfer and rendering
    • Streaming GIS Web Services (1)
    • Structured/annotated GML data rendering (2)
  • Federator-oriented approaches
    • Pre-fetching (3)
    • Client-based caching (4)
    • Query decomposition and parallel processing (5)
• Testing with large-scale geo-science applications
  • Earthquake forecasting: Pattern Informatics (PI)
  • Virtual California (VC)
• Aim: turning compliance requirements into competitiveness
Conventional OGC-GIS Systems: Baseline Performance Test
• The naïve approach is characterized by
  • Stateless services
  • On-demand data access
  • Single-threaded processing and no caching
• Systems developed with Open Geographic Standards have a high degree of interoperability but poor performance results
[Figure: test setup]
(1) Streaming GIS Web-Services
• The concern is the transfer of large XML-structured data
  • XML representations of data tend to be significantly larger than binary representations
  • Larger data sizes consume more network bandwidth
  • We still need XML for interoperability reasons
• In the initial development of the proposed Service-oriented GIS we used GIS Web Services with SOAP over HTTP as the transfer protocol.
  • BUT this limited performance.
• We investigated “Streaming Data Transfer”
  • Topic-based publish-subscribe messaging systems for exchanging SOAP messages and data payloads
(1) Streaming GIS Web-Services (Cont)
• Lines 1, 2 and 3 show the classic publish-find-bind triangle of Web Services
• SOAP is used for negotiation (line 3): a standard getFeature request
  • Publisher information is returned as a (topic, IP, port) triple.
• The publisher streams; the subscriber receives.
• The average performance gain is 40%
[Figure: WMS (client) and WFS (server) are connected to the UDDI registry through WSDL (lines 1 and 2). The WMS sends getFeature to the WFS (line 3) and gets back (topic, IP, port); the WFS publisher then streams GML on topic “Topic-wfs” through the NaradaBrokering server to the WMS subscriber.]
(2) GML Data Processing
• Processing XML data: parsing and rendering to create map images
• Two well-known approaches are document models (DOM) and push models (SAX).
• We use a pull approach for XML processing:
  • Parses only what is asked for
  • No support for document validation (a major performance gain)
  • Does not build a complete object model in memory (unlike DOM)
  • Contents are returned directly to the application from calls to the parser (unlike SAX)
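The pull approach described above can be illustrated with a minimal sketch. It uses the standard Java StAX API (`javax.xml.stream`) as a stand-in, since the slides do not name the pull parser actually used; the class name `GmlPullSketch` and the choice to extract only `coordinates` elements are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: not part of the presented system.
final class GmlPullSketch {
    /** Returns the text of every "coordinates" element and skips everything else. */
    static List<String> coordinates(String gml) {
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(gml));
            while (reader.hasNext()) {
                // The application pulls the next event; no callbacks, no in-memory tree.
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "coordinates".equals(reader.getLocalName())) {
                    result.add(reader.getElementText().trim());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed GML", e);
        }
        return result;
    }
}
```

Calling `GmlPullSketch.coordinates(gmlString)` walks the document once and keeps only the coordinate strings, which is the "parses only what is asked for" property listed on the slide.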
(3) Pre-fetching
• Getting the GML data before it is needed
• The extension for the pre-fetching module is shown in the grey region of the figure
  • The PR runs a pre-defined task at a pre-defined periodicity
• Overcomes the network bandwidth problem and repeated data conversions
• This technique is good for infrequently changing archived data
  • In other cases, it might cause consistency problems
• Red curve: map rendering over the pre-fetched data (ready-to-use GML data)
• Black curve: map rendering through on-demand fetching
[Figure: user portal with interactive tools, federator with processor, WMS and WFS instances, and the pre-fetching extension (PR, NB, GML temporary storage on the local file system). PR: pre-fetching runner; NB: NaradaBrokering; WMS: Web Map Service; WFS: Web Feature Service.]
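A pre-fetching runner that executes a pre-defined task at a pre-defined periodicity could look roughly like the following sketch. The class name `PrefetchRunner`, the `Supplier<String>` standing in for the GML fetch, and the in-memory store (instead of the local file system shown on the slide) are all assumptions for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch of a periodic pre-fetcher; not the presented implementation.
final class PrefetchRunner {
    private final AtomicReference<String> store = new AtomicReference<>();
    private final Supplier<String> fetcher; // assumed: returns GML fetched from a WFS

    PrefetchRunner(Supplier<String> fetcher) {
        this.fetcher = fetcher;
    }

    /** Runs the pre-defined task once and replaces the stored copy. */
    void runOnce() {
        store.set(fetcher.get());
    }

    /** Schedules the task at a fixed period; the caller shuts the executor down. */
    ScheduledExecutorService start(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::runOnce, 0, period, unit);
        return scheduler;
    }

    /** Ready-to-use GML for rendering, or null if nothing has been fetched yet. */
    String latest() {
        return store.get();
    }
}
```

The period is the knob behind the consistency concern: a long period keeps rendering fast but lets the stored copy drift from the source between runs.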
(3) Pre-fetching vs. On-demand Fetching
• For 100 MB, pre-fetching is about 30 times faster than conventional on-demand fetching.
• The larger the data size, the higher the performance gain.
(4) Client-based Caching
• Each client has a separate caching area allocated.
• Applies the working-window and locality principles to map image rendering
• Clients are differentiated by the session-id parameter assigned to the client and carried in the header of its queries.
• Always keeps the most recently used data (the data of the client's last request)
• Maintaining a working window for each client adds some overhead.
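A per-client cache keyed by session id can be sketched as follows. This is only an illustration: the class name `ClientCache`, storing the last GML response as a `String`, and bounding the number of tracked clients with least-recently-used eviction are assumptions, not details taken from the slides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one cache slot per client session.
final class ClientCache {
    private final Map<String, String> lastGmlBySession;

    ClientCache(final int maxClients) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.lastGmlBySession = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxClients; // assumed policy: drop the idlest client
            }
        };
    }

    /** Stores the data of the client's latest request, replacing the previous one. */
    synchronized void put(String sessionId, String gml) {
        lastGmlBySession.put(sessionId, gml);
    }

    /** Returns the cached data for this client, or null on a first-time query. */
    synchronized String get(String sessionId) {
        return lastGmlBySession.get(sessionId);
    }
}
```

A `null` result corresponds to a first-time query, for which no former request is available to guide partitioning.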
Brief Architecture
Server-side: create an identity card for the client and update it at every request from the client.
• FormerRequest class:
class FormerRequest {
    String uuid;            /* unique user id */
    String bbox;            /* bounding box of the user's last request */
    Double density;         /* data size per unit square */
    Vector[] feature_data;  /* geometry elements of the last request */
}
• Register to client table
• Set identity to message header
Client-side:
ClientWSStub binding;
binding = (ClientWSStub) new ServiceLocator().WMSServices(servaddress);
String sessionID = session.getid(); // uuid-1
String channel_name = "getMapChannel";
/* Add the session ID to the SOAP message's header */
binding.setHeader(service_address, channel_name, sessionID);
Map mymap = binding.getMap(request);
Why Client-based Caching
• Makes stateless GIS Web Services stateful
• Allows the workload to be shared as equally as possible for the most efficient parallel processing
Comparison with Google-like map servers:
• In large-scale applications it is impossible to cache the whole data set
  • Limited storage and computation capabilities
• Google-like map servers are fast because
  • They replace computation with storage
  • They pre-render all images and cut them into tiles
  • They formalize the accepted requests in terms of parameters, and the responses in terms of tile compositions
• BUT this is good only for client-server applications
  • It cannot be applied to distributed dynamic data rendering and extensible applications
  • They do not deal with feature-enriched maps enabling attribute-based querying, or with structured/annotated scientific data rendering
(5) Parallel Processing over Client-based Caching
• Processing of the main query (a successive request that overlaps the cached data):
  • Cached-data extraction
  • Rectangulation: {Rectangles [Ri]}
  • Partitioning: {sub-queries [ri]}
  • Assigning separate threads
  • Assembling the results
[Figure: the main query region overlaps the cached data; the remaining area of the critical data layer is divided into rectangles R1 to R4; the critical data falling into the partitioned regions is requested from the WFS (critical data provider in GML) as GetFeature requests r1, r2, r3, ..., rPn; the returned GML1, GML2, ..., GMLPn are combined with the cached GML and with layers from other WFS and WMS.]
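The "assigning separate threads" and "assembling the results" steps can be sketched with the standard `java.util.concurrent` executor API. The generic helper `ParallelSubQueries.runAll` and the `fetch` function standing in for a GetFeature call are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: one thread per sub-query, results merged in query order.
final class ParallelSubQueries {
    static <Q, R> List<R> runAll(List<Q> subQueries, Function<Q, R> fetch) {
        List<R> results = new ArrayList<>();
        if (subQueries.isEmpty()) {
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(subQueries.size());
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (Q query : subQueries) {
                Callable<R> task = () -> fetch.apply(query);
                futures.add(pool.submit(task));
            }
            for (Future<R> future : futures) {
                results.add(future.get()); // blocks until that sub-query is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the merge waits for every future, the response time is bounded by the slowest sub-query, which is why evenly sized partitions matter.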
Challenge: Geo-Data Characteristic
• Point data is described by a location attribute: (x, y) coordinates.
• Linestrings, polylines, polygons, etc. are defined as sets of points.
• The data set falling into a queried region is formulated with a bounding box (bbox)
  • Coordinates of a rectangle (a, b, c, d)
• Geo-data is unevenly distributed and variable-sized according to its location attributes.
  • Example: human population
• Advanced techniques are needed for workload sharing!
[Figure: (1) and (2) show a bounding box with corners (a, b) and (c, d), marked at ((a+c)/2, b) and (c, (b+d)/2) and divided into regions R1 to R4.]
Attribute-based Query Decomposition
• Cached data extraction
• Rectangulation over the remaining area: R1, R2, R3, R4
• Each rectangle goes through the partitioning process.
  • Blind partitioning
    • For cases such as first-time queries
    • Uses a default number of partitions
  • Smart partitioning
    • Uses client-based caching
    • Uses the FormerRequest object
• All partitions are assigned to separate threads and the results are merged to create the final response
[Figure: the query box (minx, miny)-(maxx, maxy) overlaps the cached data; the uncached remainder is split into rectangles R1 to R4, and a rectangle is further partitioned (example: partition into 4).]
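The rectangulation step can be illustrated by subtracting the cached bounding box from the query bounding box. The sketch below is one possible decomposition into at most four rectangles (left, right, bottom, top); the slides do not specify how R1 to R4 are laid out, so the layout, the class name `Rectangulation` and the `BBox` type are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of rectangulation over the uncached part of a query.
final class Rectangulation {
    /** Axis-aligned bounding box. */
    static final class BBox {
        final double minx, miny, maxx, maxy;

        BBox(double minx, double miny, double maxx, double maxy) {
            this.minx = minx; this.miny = miny; this.maxx = maxx; this.maxy = maxy;
        }

        double area() {
            return (maxx - minx) * (maxy - miny);
        }
    }

    /** Returns disjoint rectangles covering the part of query not covered by cache. */
    static List<BBox> remainder(BBox query, BBox cache) {
        List<BBox> out = new ArrayList<>();
        // Intersection of the cached area with the query.
        double ix0 = Math.max(query.minx, cache.minx);
        double iy0 = Math.max(query.miny, cache.miny);
        double ix1 = Math.min(query.maxx, cache.maxx);
        double iy1 = Math.min(query.maxy, cache.maxy);
        if (ix0 >= ix1 || iy0 >= iy1) {
            out.add(query); // no overlap: the whole query must be fetched
            return out;
        }
        if (query.minx < ix0) out.add(new BBox(query.minx, query.miny, ix0, query.maxy)); // left
        if (ix1 < query.maxx) out.add(new BBox(ix1, query.miny, query.maxx, query.maxy)); // right
        if (query.miny < iy0) out.add(new BBox(ix0, query.miny, ix1, iy0));               // bottom
        if (iy1 < query.maxy) out.add(new BBox(ix0, iy1, ix1, query.maxy));               // top
        return out;
    }
}
```

With a query of (0, 0, 10, 10) and a cache of (2, 2, 6, 6), this yields four rectangles whose areas add up to the query area minus the cached area; a cache that fully covers the query yields an empty list.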
Smart Partitioning through Client-based Caching
• Determining the most efficient number of partitions (Pn)
• Based on the locality principle
• Assumption: former and current requests have similar data density
• Cached data area: CD_size_br2 = (maxxc - minxc) * (maxyc - minyc)
• Main-query area: R_size_br2 = (maxx - minx) * (maxy - miny)
• Thr: pre-defined threshold value, which changes from data set to data set
• Pn: the number of partitions calculated for a rectangle
• If Pn >= 2, cut the rectangle into Pn equal-sized regions.
[Figure: the cache rectangle (minxc, minyc)-(maxxc, maxyc) and the query rectangle (minx, miny)-(maxx, maxy).]
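The text of the slide does not preserve the formula for Pn, so the following is only a guess at its shape, not the author's formula. It assumes that the density recorded for the former request (data size per unit area) is multiplied by the rectangle's area to estimate the data size, and that Pn is that estimate divided by the threshold Thr, rounded up. All names here are hypothetical.

```java
// Hypothetical sketch: estimating a partition count from cached density.
final class SmartPartitioning {
    /**
     * @param density data size per unit area, taken from the former request (assumed)
     * @param thr     pre-defined threshold: target data size per partition (assumed meaning)
     */
    static int partitions(double density, double minx, double miny,
                          double maxx, double maxy, double thr) {
        if (thr <= 0) {
            throw new IllegalArgumentException("thr must be positive");
        }
        double area = (maxx - minx) * (maxy - miny);
        double estimatedSize = density * area; // relies on the similar-density assumption
        return Math.max(1, (int) Math.ceil(estimatedSize / thr));
    }
}
```

A result of 1 means the rectangle is queried as a whole; a result of 2 or more means it is cut into that many equal-sized regions, as the slide states.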
Assigning Partitions to Workers
• Partitions are assigned to the worker nodes in round-robin fashion.
• We keep a pool of worker nodes for each feature layer to which parallel processing is applied.
• According to the algorithm:
  • PN: number of partitions
  • WN: number of worker nodes in the pool
  • share: the number of partitions each worker is supposed to get
  • rmg: the number of partitions still waiting after every worker has received its share
• Assignments:
  • The first rmg worker nodes are assigned share + 1 partitions
  • The others (WN - rmg) are assigned share partitions
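The assignment rule can be sketched as below. Taking share as the integer quotient PN / WN and rmg as the remainder PN mod WN is inferred from the rule that the first rmg workers get share + 1 and the rest get share; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of round-robin partition assignment.
final class PartitionAssigner {
    /** Number of partitions per worker: the first rmg workers get share + 1. */
    static int[] counts(int pn, int wn) {
        if (wn <= 0) {
            throw new IllegalArgumentException("wn must be positive");
        }
        int share = pn / wn; // partitions every worker gets
        int rmg = pn % wn;   // partitions still waiting afterwards
        int[] perWorker = new int[wn];
        for (int w = 0; w < wn; w++) {
            perWorker[w] = share + (w < rmg ? 1 : 0);
        }
        return perWorker;
    }

    /** Round-robin: partition p goes to worker p mod wn. */
    static List<List<Integer>> assign(int pn, int wn) {
        if (wn <= 0) {
            throw new IllegalArgumentException("wn must be positive");
        }
        List<List<Integer>> workers = new ArrayList<>();
        for (int w = 0; w < wn; w++) {
            workers.add(new ArrayList<>());
        }
        for (int p = 0; p < pn; p++) {
            workers.get(p % wn).add(p);
        }
        return workers;
    }
}
```

For PN = 10 and WN = 4, share is 2 and rmg is 2, so the workers receive 3, 3, 2 and 2 partitions; the round-robin loop produces the same counts.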
Vertical partitioning in the case of 5 partitions
• GFeature-1: -110,35,-100,36
• GFeature-2: -110,36,-100,37
• GFeature-3: -110,37,-100,38
• GFeature-4: -110,38,-100,39
• GFeature-5: -110,39,-100,40
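The five bounding boxes above are what results from cutting the box -110,35,-100,40 into equal strips along the y axis. A minimal sketch of that cut follows; the class name `StripPartitioner` and the `{minx, miny, maxx, maxy}` array layout are assumptions.

```java
// Hypothetical sketch: cut a bounding box into pn equal strips along the y axis.
final class StripPartitioner {
    /** Each result row is {minx, miny, maxx, maxy}. */
    static double[][] split(double minx, double miny, double maxx, double maxy, int pn) {
        if (pn < 1) {
            throw new IllegalArgumentException("pn must be at least 1");
        }
        double step = (maxy - miny) / pn;
        double[][] parts = new double[pn][];
        for (int i = 0; i < pn; i++) {
            double lo = miny + i * step;
            // Pin the last strip to maxy so rounding never leaves a gap.
            double hi = (i == pn - 1) ? maxy : miny + (i + 1) * step;
            parts[i] = new double[] {minx, lo, maxx, hi};
        }
        return parts;
    }
}
```

Each strip would then become the bbox of one GetFeature sub-query.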
Data Access Timings (No Cached Data)
• T(data access) = T(query conversion: getFeature to SQL) + T(GML conversion) + T(streaming the data from WFS to federator) + T(building GML at federator)
[Figure: Federator, WFS, DB]
Overhead and Response Timings (example case: 10-threaded parallel processing)
• Performance does not increase in proportion to the number of threads
• Overheads: query partitioning, sub-query creation, map creation and map transfer
• There is no performance gain when the data size handled is below a threshold.
[Figure: Browser with event-based dynamic map tools, Federator, WFS instances, DB]
Partial Usage of Cached Data (example case: 1/2 cached)
• There is no performance gain for small data sizes because of the overheads.
• For 10 MB, the proposed system is almost 4 times faster than the ordinary on-demand, single-threaded system.
• The performance gain increases:
  • As the data size increases
  • As the overlapped cached region increases
  • 100% overlap looks like the pre-fetching case
[Figure: Federator with CT, WFS instances, DB]
Conclusions
• Streaming data transfer techniques allow data rendering even on partially returned data.
• Pull parsing gives the best results for rendering XML-encoded GML data, since it eliminates the requirement of data validation.
• The federator’s natural characteristics allowed us to develop advanced caching and parallel processing designs.
• Pre-fetching and parallel-processing techniques are mutually exclusive.
  • The best performance is achieved through pre-fetching, but it can cause data inconsistency.
    • Triggering periodicity must be defined carefully.
  • The success of parallel-processing techniques depends on how well the workload is shared among worker nodes.
    • Geo-data is unevenly distributed and variable-sized.
• We saw that the following helped us increase system responsiveness to a greater extent:
  • Application of working-window and locality principles by means of client-based caching
  • Parallel processing through attribute-based query decomposition
Conclusions: General Framework
• Heterogeneous data sources are queried as a single resource
  • Heterogeneous: autonomous local resources control the definition of data
  • Single resource: removes the burden of individually accessing each data source with ad-hoc query languages
• WFS-based mediation:
  • Data and query conversions
• Easy extension with new data and service resources
  • Open Geographic and Web Service standards
• No physical data integration
  • Data always stays at the local source
  • Easy maintenance of data and a high degree of autonomy
• Seamless interaction with the system through integrated data views as multi-layered map images
Contributions
• A federated Service-oriented Geographic Information Systems framework
  • Integrating Web Services with Open Geographic Standards to support interoperability at both data and service levels
  • Production of knowledge from distributed data sources in multi-layered map images
• Hierarchical data definitions through capability metadata federations
  • Enabling unified interactive data access/query and display
• Investigated performance-efficient designs and carried out detailed benchmarking
  • Streaming GIS Web Services
  • Federator-oriented high-performance design techniques
    • Pre-fetching
    • Client-based caching: working-window and locality principles
    • Parallel processing through attribute-based query decomposition
Acknowledgement
• The work described in this presentation is part of the QuakeSim project, which is supported by the Advanced Information Systems Technology Program of NASA's Earth-Sun System Technology Office.
• Galip Aydin: Web Feature Server (WFS)
Capability-based Federation of the Standard Web Service Components
• Request flow in the figure:
  1. GetCapability (metadata about data + service)
  2. GetMap (get map data in a set of layers)
  3. GetFeatureInfo (query the attributes of data)
• Application-based hierarchical data:
  [Application] Pattern Informatics
    [Layer-1] State-boundary over Satellite
      [Data-1] State-boundary (WFS-1)
      [Data-2] Satellite-Image (WMS-2)
    [Layer-2] Google map (WMS-1)
    [Layer-3] Earthquake-Seismic
      [Data-1] Earthquake-Seismic (WFS-3)
• Built over the proposed standard Web Service components and common data models
• Unified data access/query/display from a single access point
• Provides application-based hierarchical data definitions
  • Layer-based data and service (WMS and WFS) compositions
• Federation is done by aggregating GIS Web Services’ capabilities metadata
• A capability is essentially metadata about data + service:
  • The server’s information content and acceptable request parameter values
• Sample layers for PI (labeled a, b, c and d in the figure):
  • NASA satellite layer
  • Earthquake-seismic layer
  • Google Map layer
  • State-boundaries layer
• Events: move, zooming in/out, panning (drag-drop), rectangular region, distance calculation, attribute querying
[Figure: a GIS browser (user portal with interactive map tools) sends requests 1, 2 and 3 to the federator, which performs capability federation and map rendering over several WMS and WFS instances.]
Hierarchical Data and Integrated Data-view
• Layers in the integrated view:
  1: Google map layer
  2: State boundary lines layer
  3: Seismic data layer
• Event-based interactive tools: query and data analysis over integrated data views
Integrated Views
• Event-based querying through integrated views
• WFS-based mediators
• XML-based query language
  • XML-based standard queries for the standard services
  • The capability gives the list of data provided, the attribute lists on which they can be queried, and the constraints on queries needed to create valid requests such as getMap and getFeature
• Related work specific to federation (might not be active)
  • MIX: Mediation of Information using XML
  • SRB/MCAT (SDSC)
  • TSIMMIS (Stanford Univ.)
• We do syntactical and structural integration.
Hierarchical Data / Integrated Data-view for the IEISS Geo-science Application
• Application-based hierarchical data:
  [Application] IEISS
    [Layer-1] Gas-pipeline over Satellite
      [Data-1] Gas-pipeline (WFS-1)
      [Data-2] Satellite-Image (WMS-2)
    [Layer-2] Google map (WMS-1)
    [Layer-3] Electric-power
      [Data-1] Electric-power (WFS-3)
Event-based Interactive Map Tools
<event_controller>
  <event name="init" class="Path.InitListener" next="map.jsp"/>
  <event name="REFRESH" class="Path.InitListener" next="map.jsp"/>
  <event name="ZOOMIN" class="Path.InitListener" next="map.jsp"/>
  <event name="ZOOMOUT" class="Path.InitListener" next="map.jsp"/>
  <event name="RECENTER" class="Path.InitListener" next="map.jsp"/>
  <event name="RESET" class="Path.InitListener" next="map.jsp"/>
  <event name="PAN" class="Path.InitListener" next="map.jsp"/>
  <event name="INFO" class="Path.InitListener" next="map.jsp"/>
</event_controller>
Generalizing the Problem Domain
• Query heterogeneous data sources as a single resource
  • Heterogeneous: the local resource controls the definition of the data
  • Single resource: removes the burden of individually accessing each data source
• Easy extension with new data and service resources
• No real integration of data
  • Data always stays at the local source
  • Easy maintenance of data
• Seamless interaction with the system
  • Collaborative decision making
[Figure: a client/user query arrives over the WWW at an integrated view provided by federation services, which sit on top of mediators for each source: databases, files (data in files, HTML, XML/relational databases) and spatial sources/sensors.]
Generalization of the Proposed Architecture
• The GIS-style information model can be redefined in any application area, such as chemistry and astronomy
  • Application Specific Information Systems (ASIS)
• We need to define an Application Specific:
  • Language (ASL) -> GML: expresses domain-specific features and the semantics of data
  • Feature Service (ASFS) -> WFS: serves data in the common language (ASL)
  • Visualization Service (ASVS) -> WMS: visualizes information and provides a way of navigating ASFS-compatible/mediated data resources
  • Capabilities metadata for ASVS and ASFS
  • Federator: federates the capabilities of distributed ASVS and ASFS to create an application-based hierarchy of distributed data and service resources
  • Mediators: query and data format conversions
• Data sources maintain their internal structure
  • Large degree of autonomy
  • No actual physical data integration
[Figure: generalized architecture with AS Sensors, an AS Repository, AS Tools (ASVS, ASFS), user-defined AS Services (such as filter, transformation, reasoning, data-mining, analysis), messages using ASL, mediators behind standard service APIs, and a federator providing capability federation, ASL rendering and unified data query/access/display.]
Contributions (Systems Software)
• Developing a Web Map Server (WMS) in Open Geographic Standards
  • Extended with Web Service standards and streaming map creation capabilities
• Developing a GIS Federator
  • Provides application-specific, layer-structured hierarchical data as a composition of distributed standard GIS Web Service components
  • Enables uniform data access and query
• Interactive map tools for data display, query and analysis
  • Browser- and event-based
  • Extended with AJAX (Asynchronous JavaScript and XML)