SensorGrid: High Performance Web Service Architecture for Geographic Information Systems. Thesis Proposal. Galip Aydin, gaydin@cs.indiana.edu
Outline • Introduction • Motivations • SensorGrid Architecture • Research Issues and Goals • Contributions
Geographic Information Systems • A geographic information system (GIS) is a system for creating and managing spatial data and associated attributes. • A computer system capable of • integrating, • storing, • editing, • analyzing, • and displaying geographically referenced information. • A "smart map" tool that allows users to • create interactive queries (user-created searches), • analyze the spatial information, • and edit data. • Maps are created by overlaying layers of geospatial features.
Traditional GIS approach • Mostly desktop applications that require expertise and substantial resources. • Centralized client-server models for web-based GIS environments. • Cross-vendor or cross-product interoperability is not possible without costly format conversions. • Most applications consume archived data, but with advances in sensor technology, new applications that consume real-time data are appearing in abundance.
Traditional GIS approach (contd.) • Limitations • Distributed nature of geospatial data. • Proprietary data formats and service methodologies. • Lack of interoperable services. • Problems • Assembling data from distributed sources • Format conversions • Amount of resources required for geoprocessing
Open GIS Standards • Several standards bodies have started developing data standards and implementation specifications for geospatial and location-based services. • The goal is to make geographic information and services neutral and available across any network, application, or platform. • The two major organizations are the Open Geospatial Consortium (OGC) and ISO/TC211.
OGC • Supports interoperable solutions that "geo-enable" the Web. Several specifications: • Geospatial Data: Geography Markup Language (GML) • Sensors: • Metadata – SensorML • Measurements – Observations & Measurements (GML extension) • Services: • Web Feature Service • Web Map Service • Web Coverage Service etc.
Issues with Open Standards • HTTP GET/POST based services with limited data transport capabilities (HTTP, FTP, e-mail, files etc.) • Not Web Services: tightly coupled, point-to-point communication results in centralized, synchronous applications. • High-end scientific and complex GIS applications require: • Asynchronous communication models to cope with large numbers of participants and long-running codes. • Transfer of large data sets between services. • Coupling of data sources and high performance tools. • Orchestration of multiple services for solving complex problems.
Motivation 1 • Complex problems require GIS applications and services to collaborate. • Lack of service orchestration capabilities • Without service-oriented practices, distributed applications become hard to manage, especially when a large number of participants is involved. • Coupling data sources to GIS applications • GIS applications use various types of distributed geospatial data sources, so we need a flexible computing environment for seamless integration.
Motivation 2 • Data transport requirements • GIS applications require large amounts of data to be transported between sources and consumers. Current approaches do not provide a scalable and flexible solution. • High performance • It is a must, not an option, for most scientific GIS applications. For instance, evaluating pre-seismic real-time messages may lead to early warnings. • Proliferation of Sensors • Sensors introduce new challenges to current GIS applications in terms of data collection, management and processing.
Motivating Examples • Pattern Informatics • Earthquake forecasting code developed by Prof. John Rundle (UC Davis) and collaborators. • Uses seismic archives. • Regularized Dynamic Annealing Hidden Markov Method (RDAHMM) • Time series analysis code by Dr. Robert Granat (JPL). • Can be applied to GPS and seismic archives. • Can be applied to real-time data. • Interdependent Energy Infrastructure Simulation System (IEISS) • Models infrastructure networks (e.g. electric power systems and natural gas pipelines) and simulates their physical behavior and the interdependencies between systems.
SOA for GIS • Use Web Services to realize a Service Oriented Architecture, and Open GIS standards for data formats and service interfaces, to achieve interoperability. • We have built WS versions of: • WFS – access to geospatial data in various databases • WMS (A. Sayar) – visualization of feature data • Extended UDDI and WS-Context (M. Aktas) – supporting dynamic service metadata and a services registry. • Problems with the simple WS version • Basic WFS: request-response, not asynchronous. • Performance: GI Services are not designed to handle non-trivial data transfers. • XML: XML encoding increases the size of geospatial data.
GIS Data Grids • Data is at the heart of every GIS. • Easy and fast access to distributed geospatial data is crucial, especially in times of crisis or disaster. • Points to consider: • High performance transport • Real-time observations from distributed sensors. • Unified access to geospatial data stored in relational DBs, XML DBs and ESRI Shapefiles. • Leverage the OGC Web Feature Service to provide standard access and query interfaces. • Develop a Web Service version of WFS and modify/extend it for high performance. • Fast population of GML Feature Collections from data in the various DBs.
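Populating a GML Feature Collection from database rows amounts to mapping each row onto a feature element. The sketch below is a minimal illustration using Python's standard ElementTree, not the actual WFS implementation. It assumes hypothetical row keys (`name`, `id`, `rate`, `x`, `y`, `z`) shaped like the CityGate feature in the sample GML document. For brevity, it omits the bounding box and the connections element.

```python
import xml.etree.ElementTree as ET

WFS_NS = "http://www.opengis.net/wfs"
GML_NS = "http://www.opengis.net/gml"
ET.register_namespace("wfs", WFS_NS)
ET.register_namespace("gml", GML_NS)

def q(ns, tag):
    # Clark notation "{namespace}tag" used by ElementTree for qualified names
    return f"{{{ns}}}{tag}"

def rows_to_feature_collection(rows):
    """Build a GML 2-style FeatureCollection of CityGate features.

    `rows` is any iterable of dicts with the hypothetical keys
    name, id, rate, x, y, z (e.g. the result of a database query)."""
    fc = ET.Element(q(WFS_NS, "FeatureCollection"))
    for row in rows:
        member = ET.SubElement(fc, q(GML_NS, "featureMember"))
        gate = ET.SubElement(ET.SubElement(member, "Entity"), "CityGate")
        ET.SubElement(gate, "name").text = row["name"]
        ET.SubElement(gate, "id").text = row["id"]
        ET.SubElement(gate, "consumptionRate").text = repr(row["rate"])
        location = ET.SubElement(gate, "location")
        point = ET.SubElement(location, q(GML_NS, "Point"))
        coord = ET.SubElement(point, q(GML_NS, "coord"))
        for axis in ("X", "Y", "Z"):
            ET.SubElement(coord, q(GML_NS, axis)).text = repr(row[axis.lower()])
    return ET.tostring(fc, encoding="unicode")

xml_text = rows_to_feature_collection([
    {"name": "City Gate #10", "id": "CG10", "rate": 8.5579e7,
     "x": -85.465, "y": 30.132, "z": 2.0},
])
```

A streaming WFS would more likely emit members incrementally rather than building the whole tree in memory. The tree form is used here only because it keeps the example short.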
GIS Data Services • WFS Specification: transporting high-volume geospatial data encoded in GML is not trivial with HTTP methods or pure Web Services. • Researching the use of a publish/subscribe messaging system for large data transport and fast response. • Issues: • Support for multiple clients, creating topics on the fly. • Dynamic session metadata: keeping session state and metadata for each client and request, using WS-Context. • Prioritizing client requests.
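The three issues listed above are creating per-request topics on the fly, keeping per-request session metadata, and prioritizing requests. The sketch below combines them in one minimal illustration. `TopicBroker` is a hypothetical in-process stand-in, not the NaradaBrokering API, and a plain dict stands in for WS-Context. Lower priority numbers are served first.

```python
import heapq
import itertools
import uuid
from collections import defaultdict

class TopicBroker:
    """Hypothetical in-process stand-in for a topic-based publish/subscribe broker."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)

class StreamingFeatureService:
    """Queues feature requests by priority and streams each result set on its own topic."""
    def __init__(self, broker):
        self.broker = broker
        self.context = {}                  # topic -> session metadata (stand-in for WS-Context)
        self.queue = []                    # heap of (priority, sequence, topic, query)
        self.sequence = itertools.count()  # FIFO tie-break among equal priorities

    def submit(self, client_id, query, priority=10):
        # A fresh result topic is created on the fly for every request.
        topic = f"wfs/results/{uuid.uuid4()}"
        self.context[topic] = {"client": client_id, "query": query, "status": "queued"}
        heapq.heappush(self.queue, (priority, next(self.sequence), topic, query))
        return topic

    def process_next(self, fetch_features, chunk_size=2):
        # Serve the highest-priority (lowest number) pending request.
        _, _, topic, query = heapq.heappop(self.queue)
        self.context[topic]["status"] = "streaming"
        features = fetch_features(query)
        for start in range(0, len(features), chunk_size):
            self.broker.publish(topic, features[start:start + chunk_size])
        self.broker.publish(topic, None)  # end-of-stream marker
        self.context[topic]["status"] = "done"
        return topic

broker = TopicBroker()
service = StreamingFeatureService(broker)
routine = service.submit("client-a", "all city gates", priority=5)
urgent = service.submit("client-b", "gates near fault", priority=1)
received = []
broker.subscribe(urgent, received.append)
served = service.process_next(lambda query: ["f1", "f2", "f3"])
```

Streaming results in chunks lets a client start consuming features before the whole GML document has been produced. The end-of-stream marker tells it when to stop listening on the topic.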
Real-Time Sensors • Sensors are everywhere; they are being deployed as sensor networks for more accurate measurements. • With the proliferation of sensors, data collection and processing paradigms are changing. • Most scientific geo-applications are designed to work with archived data. • Critical infrastructure systems and crisis management environments require fast and accurate access to real-time sources and a flexible, pluggable architecture for geoprocessing the data.
Use Case - GPS Sensors • GPS station networks are a good example of scientific sensors. • GPS measurements are used for determining seismic events, understanding long-term crustal movement, etc. • We have access to SOPAC GPS networks: • Currently only socket-based access in RYO format is available, but it is not utilized! • We provide real-time streaming access in multiple formats (RYO, ASCII, GML) by using NaradaBrokering topics. • OHIO and a chain of filters. • We are investigating the use of topic-based messaging systems for managing real-time data streams.
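A filter chain on a topic-based broker can be pictured as a set of filters. Each filter subscribes to one topic, transforms each message, and republishes the result on the next topic. The minimal sketch below follows that pattern. Several pieces are assumptions, not the SensorGrid code: the in-process `Broker` (not NaradaBrokering), the topic names, the `Observation` element, and an already-decoded record in place of the RYO binary layout, which is not reproduced here. Each filter also records how long its transform takes, which is one way to address the filter-chain latency question raised under Quality of Service.

```python
import time
from collections import defaultdict

class Broker:
    """Hypothetical in-process stand-in for a topic-based broker."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)

class Filter:
    """Consumes one topic, transforms each message and republishes it on another topic."""
    def __init__(self, broker, in_topic, out_topic, transform):
        self.broker = broker
        self.out_topic = out_topic
        self.transform = transform
        self.latencies = []  # seconds spent in transform(), one entry per message
        broker.subscribe(in_topic, self.on_message)

    def on_message(self, message):
        start = time.perf_counter()
        result = self.transform(message)
        self.latencies.append(time.perf_counter() - start)
        self.broker.publish(self.out_topic, result)

def to_ascii(record):
    # record: hypothetical already-decoded position {station, time, x, y, z}
    return (f"{record['station']} {record['time']} "
            f"{record['x']:.4f} {record['y']:.4f} {record['z']:.4f}")

def ascii_to_gml(line):
    station, stamp, x, y, z = line.split()
    return (f'<Observation xmlns:gml="http://www.opengis.net/gml" '
            f'station="{station}" time="{stamp}">'
            f"<gml:Point><gml:coordinates>{x},{y},{z}</gml:coordinates></gml:Point>"
            f"</Observation>")

broker = Broker()
ascii_filter = Filter(broker, "gps/raw", "gps/ascii", to_ascii)
gml_filter = Filter(broker, "gps/ascii", "gps/gml", ascii_to_gml)
ascii_out, gml_out = [], []
broker.subscribe("gps/ascii", ascii_out.append)
broker.subscribe("gps/gml", gml_out.append)
broker.publish("gps/raw", {"station": "HYPO", "time": "2006-01-01T00:00:00",
                           "x": 1.0, "y": 2.0, "z": 3.0})
```

Because each stage only knows its input and output topics, clients can subscribe to whichever format they need. A new filter can be attached to an existing topic without touching the others.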
SensorGrid Architecture • Support both archived and real-time geospatial data access. • Support alternate transport and representation schemes. Use a topic-based messaging infrastructure for large volume data transport. • WS-Context for managing dynamic service metadata. • UDDI-based FTHPIS as the services registry. • Streaming WFS for serving archived data. • Streaming SCS for serving sensor metadata and sensor measurements.
Framework for HP WS • Research improving Web Service performance by using better transport protocols and XML representation schemes. • Virtualize representation and protocol by binding SOAP to message-oriented middleware. • Handlers will negotiate the protocol and convert messages between different representations. • WS-Context for keeping session metadata related to the methodology and its specific parameters.
Negotiation Protocol • Design a negotiation protocol for web services to negotiate: • Transport protocol • HTTP over TCP, Parallel TCP, UDP … • Efficient representation of XML • BXSA, bnux, BXML, MTOM, Fast Infoset, Millau, XOP, DFDL, Fast Web Services, … • Other (Security etc.) • Try to develop strategies for determining • the best available protocol • the best representation for a given communication. • We will investigate using and extending WS-Policy to build a negotiation protocol. • We will not develop a binary representation method, but will build a framework that supports multiple binary formats.
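One simple negotiation strategy works as follows. Each end-point advertises ordered preference lists of transports and representations. The initiator's most preferred option that the responder also supports is chosen, with HTTP and plain XML as the fallback when nothing else is shared. The sketch below shows only that strategy. `EndpointPolicy` and the option labels are hypothetical and are not WS-Policy syntax.

```python
from dataclasses import dataclass, field

DEFAULT_TRANSPORT = "http"       # conventional transport every end-point supports
DEFAULT_REPRESENTATION = "xml"   # conventional textual XML representation

@dataclass
class EndpointPolicy:
    """Hypothetical capability advertisement, most preferred option first."""
    transports: list = field(default_factory=lambda: [DEFAULT_TRANSPORT])
    representations: list = field(default_factory=lambda: [DEFAULT_REPRESENTATION])

def pick(preferred, supported, default):
    # First option in the initiator's order that the responder also supports.
    for option in preferred:
        if option in supported:
            return option
    return default

def negotiate(initiator, responder):
    return {
        "transport": pick(initiator.transports, responder.transports,
                          DEFAULT_TRANSPORT),
        "representation": pick(initiator.representations, responder.representations,
                               DEFAULT_REPRESENTATION),
    }

client = EndpointPolicy(["udp", "parallel-tcp", "http"],
                        ["fast-infoset", "bxsa", "xml"])
server = EndpointPolicy(["parallel-tcp", "http"], ["bxsa", "xml"])
legacy = EndpointPolicy()  # understands only the defaults
agreed = negotiate(client, server)
fallback = negotiate(client, legacy)
```

Preference order is the simplest notion of "best". A fuller strategy could rank options by measured throughput or message size for the data at hand before choosing.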
Research Issues 1 • Applying Web Service principles to GIS data services • We have built a WS version of WFS • Not suitable for large data sets or where quick response is required • High Performance • Should support HP data transport for GIS services. • Interoperability • The system should bridge the GIS and Web Service communities by adopting standards from both. • Other GIS applications should be able to consume data without costly format conversions. • Security
Research Issues 2 • Scalability • The system should be able to handle high-volume, high-rate data transport and processing. • Plugging in new sensors, data sources or geoprocessing applications should not degrade the system’s overall performance. • Flexibility and extensibility • Setting architectural principles for real-time filters that process sensor data on the fly. • Ability to add new filters without system failures. • Quality of Service • Is the latency introduced by filter chains in processing real-time sensor data acceptable? • Is the system fault tolerant?
Research Goals • Design a High Performance Web Service architecture for distributed GIS services that supports archived and real-time geospatial data. • Build GIS Data Services for coupling scientific applications with various types of distributed geospatial databases. • Implement Web Service versions of • Web Feature Service for archived data • Sensor Collection Service for real-time geospatial data and sensor metadata. • Utilize a publish/subscribe messaging infrastructure to deploy distributed filters for processing real-time sensor data. • Develop a negotiation protocol for Web Services to support high performance data transport.
Contribution of This Thesis • Merges two important software worlds: GIS and Web Service Architectures. • Allows unified access to data by developing Web Services and Open GIS standards based services to access and manage archived and real-time geospatial data. • Develops a novel way of deploying filter chains on a topic based messaging system for processing real-time streaming sensor data. • Identifies a novel approach for negotiating various characteristics of communication between Web Services for High Performance messaging.
Sample GML Document
<wfs:FeatureCollection>
  <gml:boundedBy>
    <gml:Box>
      <gml:coordinates decimal="." cs="," ts=" ">-83,25 -80,31</gml:coordinates>
    </gml:Box>
  </gml:boundedBy>
  <gml:featureMember>
    <Entity>
      <CityGate>
        <name>City Gate #10</name>
        <id>CG10</id>
        <consumptionRate>8.5579E7</consumptionRate>
        <location>
          <gml:Point srsName="null">
            <gml:coord>
              <gml:X>-85.465</gml:X>
              <gml:Y>30.132</gml:Y>
              <gml:Z>2.0</gml:Z>
            </gml:coord>
          </gml:Point>
        </location>
        <connections>
          <id>J27</id>
        </connections>
      </CityGate>
    </Entity>
  </gml:featureMember>
  <gml:featureMember>
  . .
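A consumer of this collection needs to resolve the `wfs` and `gml` prefixes. It also needs to split the GML 2 coordinate string using the `cs` and `ts` separators. The sketch below does both with Python's ElementTree. Two details are assumptions: the sample omits namespace declarations, so the standard OGC namespace URIs are added here, and the truncated document is closed after its first feature member.

```python
import xml.etree.ElementTree as ET

NS = {"wfs": "http://www.opengis.net/wfs", "gml": "http://www.opengis.net/gml"}

# First feature of the sample, with namespace declarations added (assumption).
SAMPLE = """<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:gml="http://www.opengis.net/gml">
  <gml:boundedBy><gml:Box>
    <gml:coordinates decimal="." cs="," ts=" ">-83,25 -80,31</gml:coordinates>
  </gml:Box></gml:boundedBy>
  <gml:featureMember><Entity><CityGate>
    <name>City Gate #10</name><id>CG10</id>
    <consumptionRate>8.5579E7</consumptionRate>
    <location><gml:Point srsName="null"><gml:coord>
      <gml:X>-85.465</gml:X><gml:Y>30.132</gml:Y><gml:Z>2.0</gml:Z>
    </gml:coord></gml:Point></location>
    <connections><id>J27</id></connections>
  </CityGate></Entity></gml:featureMember>
</wfs:FeatureCollection>"""

def parse_coordinates(text, cs=",", ts=" "):
    # GML 2 coordinates: tuples separated by ts, values within a tuple by cs
    return [tuple(float(v) for v in t.split(cs)) for t in text.strip().split(ts)]

def city_gates(xml_text):
    root = ET.fromstring(xml_text)
    for gate in root.iterfind("gml:featureMember/Entity/CityGate", NS):
        coord = gate.find("location/gml:Point/gml:coord", NS)
        yield {
            "id": gate.findtext("id"),
            "name": gate.findtext("name"),
            "rate": float(gate.findtext("consumptionRate")),
            "xyz": tuple(float(coord.findtext(f"gml:{axis}", namespaces=NS))
                         for axis in "XYZ"),
            "connections": [c.text for c in gate.iterfind("connections/id")],
        }

gates = list(city_gates(SAMPLE))
box = parse_coordinates(ET.fromstring(SAMPLE).findtext(
    "gml:boundedBy/gml:Box/gml:coordinates", namespaces=NS))
```

The parser honours the default `cs=","` and `ts=" "` separators declared in the sample. A general reader should take those separators from the element's attributes instead.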
High Performance XML I (G. Fox) • There are many approaches to efficient “binary” representations of XML Infosets • MTOM, XOP, Attachments, Fast Web Services • DFDL is one approach to specifying a binary format • Assume URI-S labels a Scheme and URI-R labels the realization of that Scheme for a particular message, i.e. URI-R defines the specific layout of information in each message • DFDL from GGF is quite interesting for this • Assume we are interested in conversations where a stream of messages is exchanged between two services or between a client and a service, i.e. two end-points • Assume that we need to communicate fast between end-points that understand scheme URI-S, but must support the conventional representation if one end-point does not understand URI-S
High Performance XML II (G. Fox) • [Figure: container handlers F1, F2, F3, F4] • The first handler, Ft=F1, handles the transport protocol; it negotiates with the other end-point to establish a transport conversation that uses either HTTP (default) or a different transport such as UDP, with WSRM implementing reliability • URI-T specifies the transport choice • The second handler, Fr=F2, handles representation; it negotiates a representation conversation with scheme URI-S and realization URI-R • Negotiation identifies the parts of the SOAP header that are present in all messages in a stream; these are transmitted ONLY ONCE • Fr needs to negotiate with the Service and with the other handlers, illustrated by F3 and F4 in the figure, to decide what representation they will process
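The idea of sending stream-constant SOAP headers only once can be sketched as follows. The sender attaches the negotiated constant headers to the first message of the stream only. The receiver stores them in its conversation context and merges them back into every later message. This is a minimal illustration with hypothetical class names and plain dicts standing in for SOAP headers. The WS-Addressing header names are used only as plausible examples of per-stream and per-message headers.

```python
class StreamSender:
    """Attaches the stream-constant headers to the first message only."""
    def __init__(self, stream_id, constant_headers):
        self.stream_id = stream_id
        self.constant = dict(constant_headers)  # agreed during negotiation
        self.sent_constant = False

    def wrap(self, per_message_headers, body):
        message = {"stream": self.stream_id,
                   "headers": dict(per_message_headers),
                   "body": body}
        if not self.sent_constant:
            message["constant"] = self.constant
            self.sent_constant = True
        return message

class StreamReceiver:
    """Rebuilds full header sets from the conversation context."""
    def __init__(self):
        self.context = {}  # stream id -> constant headers received once

    def unwrap(self, message):
        if "constant" in message:
            self.context[message["stream"]] = message["constant"]
        # Per-message headers override constant ones if a name collides.
        headers = {**self.context[message["stream"]], **message["headers"]}
        return headers, message["body"]

sender = StreamSender("s1", {"wsa:To": "http://example.org/wfs",
                             "wsa:Action": "getFeature"})
first = sender.wrap({"wsa:MessageID": "1"}, "chunk-a")
second = sender.wrap({"wsa:MessageID": "2"}, "chunk-b")
receiver = StreamReceiver()
receiver.unwrap(first)
headers, body = receiver.unwrap(second)
```

This sketch assumes in-order, reliable delivery of the first message, for example via WSRM as mentioned above. Without it, a receiver could see later messages before it has the constant headers.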
High Performance XML III (G. Fox) • Filters controlled by the Conversation Context convert messages between representations, using a permanent context (metadata) catalog to hold the conversation context • There are different message views for each end-point, or even for individual handlers and the service within one end-point • The Conversation Context is a fast dynamic metadata service that enables the conversions • NaradaBrokering will implement Fr and Ft using its support for multiple transports, fast filters and message queuing • [Figure: message headers H1 to H4 and Body passing through container handlers Ft, Fr, F3, F4; Conversation Context holding URI-S, URI-R, URI-T; replicated message header; transported message; handler message view; service message view]
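Context-driven conversion can be illustrated with a catalog and a set of converters. The catalog maps each conversation to its negotiated URI-S, URI-R and URI-T. A handler looks up the transported representation there and converts the payload into the view that its consumer (another handler or the service) expects. The sketch below is hypothetical throughout. zlib-compressed UTF-8 XML stands in for a real binary representation such as BXSA or Fast Infoset. The labels `"xml"` and `"deflate-xml"` stand in for realization URIs.

```python
import zlib

# Converters keyed by (from representation, to representation).
# "deflate-xml" is a stand-in for a real binary XML format.
CONVERTERS = {
    ("xml", "deflate-xml"): lambda text: zlib.compress(text.encode("utf-8")),
    ("deflate-xml", "xml"): lambda data: zlib.decompress(data).decode("utf-8"),
}

def convert(payload, source, target):
    if source == target:
        return payload
    return CONVERTERS[(source, target)](payload)

class ConversationContext:
    """Hypothetical catalog: conversation id -> negotiated URIs."""
    def __init__(self):
        self.catalog = {}

    def register(self, conv_id, uri_s, uri_r, uri_t):
        self.catalog[conv_id] = {"URI-S": uri_s, "URI-R": uri_r, "URI-T": uri_t}

    def lookup(self, conv_id):
        return self.catalog[conv_id]

class RepresentationHandler:
    """Converts transported messages into the view its consumer processes."""
    def __init__(self, context, view):
        self.context = context
        self.view = view  # representation this handler or service expects

    def inbound(self, conv_id, payload):
        transported = self.context.lookup(conv_id)["URI-R"]
        return convert(payload, transported, self.view)

context = ConversationContext()
context.register("c1", "urn:example:scheme", "deflate-xml", "udp")
document = "<gml:featureMember>...</gml:featureMember>"
wire = convert(document, "xml", "deflate-xml")
service_view = RepresentationHandler(context, "xml").inbound("c1", wire)
handler_view = RepresentationHandler(context, "deflate-xml").inbound("c1", wire)
```

Giving each handler its own `view` mirrors the point that different handlers and the service within one end-point may each see a different message view.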
RDAHMM: GPS Time Series Segmentation (M. Pierce). Slide courtesy of Robert Granat, JPL. • GPS displacement (3D) over a two-year period, divided automatically by the HMM into 7 classes. • Features: • Dip due to aquifer drainage (days 120-250) • Hector Mine earthquake (day 626) • Noisy period at the end of the time series • Complex data with subtle signals is difficult for humans to analyze, leading to gaps in analysis • HMM segmentation provides an automatic way to focus attention on the most interesting parts of the time series
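HMM segmentation works by assigning each observation to the most likely hidden state, so that runs of the same state form the "classes" shown on the slide. The sketch below is a swapped-in simplification, not RDAHMM. It performs Viterbi decoding with fixed, hand-chosen Gaussian parameters on a 1-D toy series. RDAHMM itself learns the model parameters (via regularized deterministic annealing) and works on 3-D GPS displacements.

```python
import math

def gaussian_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi(obs, means, variances, log_trans, log_start):
    """Most likely state sequence for 1-D Gaussian-emission HMM (log domain)."""
    n = len(means)
    score = [log_start[s] + gaussian_logpdf(obs[0], means[s], variances[s])
             for s in range(n)]
    back = []  # back[t-1][s]: best predecessor of state s at time t
    for x in obs[1:]:
        prev, ptr, score = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s]
                         + gaussian_logpdf(x, means[s], variances[s]))
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # trace best predecessors back to time 0
        state = ptr[state]
        path.append(state)
    return path[::-1]

stay, switch = math.log(0.9), math.log(0.1)
series = [0.1, -0.2, 0.0, 5.1, 4.9, 5.2, 0.1]  # toy data, not GPS measurements
segments = viterbi(series, means=[0.0, 5.0], variances=[1.0, 1.0],
                   log_trans=[[stay, switch], [switch, stay]],
                   log_start=[math.log(0.5), math.log(0.5)])
```

The "sticky" transition matrix (0.9 probability of staying in a state) discourages spurious switches. A change of class is reported only when the data move decisively, which is what makes the segments useful for drawing attention to events.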