Explore Federated Service-Oriented Information Management Architecture for distributed data in scientific domains. Challenges, methodologies, and motivations explained using the GIS domain as an example.
Federated Service Oriented Information Management
Ahmet Sayar (asayar@cs.indiana.edu)

Introduction
• Aim: Develop a general, Grid-architecture-based approach to managing distributed, heterogeneous data, information and knowledge (provided by different repositories and producers) in an efficient and robust manner.
• Challenges in
  • representing,
  • transforming,
  • integrating and
  • displaying
  data and information/knowledge for decision makers in scientific application domains.
• Methodology:
  • Create a “Federated Service Oriented Information Management architecture” for the GIS domain based on OGC (Open Geospatial Consortium) specifications.
  • Determine the requirements for generalizing the architecture to other domains (e.g. Chemistry).

Motivation
• SOA based on Grid or Web Services
• We use DIKW to describe the hierarchy of Data-Information-Knowledge-Wisdom that we are attempting to support.
• “Filter Services” are information sources:
  • A service takes DIKW as input from other Grids or services and outputs DIKW, perhaps converting data to information, etc.
  • Web Services are easy to extend and federate.
  • Easy to publish, locate and bind.
  • Predictable input/output interfaces defined by metadata.
  • A repository or sensor has or gets DIKW from "outside the Grid" and outputs DIKW; both are “just” filters whose output is Grid-compatible DIKW as messages or message streams.
• Information management through the ASIS (Application Specific Information System) framework in science domains.
  • Data and metadata concepts and formats

GIS – OGC (Motivation Domain) (1)
• A Geographic Information System (GIS) is a system for creating and managing spatial data and associated attributes.
• OGC (Open Geospatial Consortium): its goal is to make geographic information and services neutral and available across any network, application or platform.
• Challenges (valid for any science domain)
  • Distributed nature of geospatial data.
  • Proprietary data formats and service methodologies.
  • Lack of interoperable services.
  • Assembling data from distributed sources.
  • Format conversions.
  • Amount of resources needed for geoprocessing.

GIS – OGC (Motivation Domain) (2)
• GML: Geography Markup Language
• WFS: Web Feature Server
  • Provides vector data, such as rivers and state and city boundaries, in GML.
• WCS: Web Coverage Server
  • Provides coverage (raster) data: gridded data, pixel info.
• WMS: Web Map Server
  • Provides data as images (jpeg, svg, png, etc.); the supported formats are defined in its capabilities file.
• WMS’: Cascading Web Map Server
  • Provides data in the form of layers in images. It is cascading because it provides other WMS layers as if they were its own.

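As a rough illustration of how a client might ask a WMS for one of these images, the sketch below assembles a GetMap URL using the request parameter names from the OGC WMS 1.1.1 specification. The endpoint `http://example.org/wms`, the helper name `build_getmap_url` and the bounding box are assumptions made for the example; the layer name is borrowed from the sample capabilities file shown later in the deck.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layers, bbox, width, height,
                     fmt="image/png", srs="EPSG:4326"):
    """Assemble a WMS 1.1.1 GetMap URL (hypothetical helper, not from the slides)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),                # comma-separated layer names
        "STYLES": "",                              # empty value = default styles
        "SRS": srs,                                # spatial reference system
        "BBOX": ",".join(str(v) for v in bbox),    # minx,miny,maxx,maxy
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": fmt,                             # must be a format the server advertises
    }
    return base_url + "?" + urlencode(params)


# Assumed endpoint and bounding box, for illustration only.
url = build_getmap_url("http://example.org/wms", ["California:Faults"],
                       (-125, 32, -114, 42), 800, 600)
```

The image format requested should be one of those listed in the server's capabilities file, which is why clients typically fetch the capabilities first.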
Information Management Arch. in GIS Domain (Sample Scenario)
• Query: no standard; Filter specification; queries on vector data by WFS using SQL
• Data encodings: GML, images
• Metadata: structured capability document in XML.
• No event notification; WS-Context is used for asynchronous runs.
• Registry: WRS (we call it MD).
[Figure: sample scenario diagram. Component labels: MD, WFS, WCS, WMS, WMS’, Filter Container, Interactive tools, Decision support. Site labels: Minnesota, Nasa, CGL. Data labels: Data:a, Data:b, Data:c; vector data, raster data, data capability, Interactive Decision Support.]

From Raw Data to Information / Knowledge
• Raw Data → GML (WFS in Filter - ASFS)
• GML → Map image (WMS in Filter - ASVS)
• Each filter provides data in a consistent format.
  • Formats should be consistent with the system's data model, GML.
• Any Data → Common Data Model
  • The data model is XML-based hierarchical data
  • Portable across
    • languages
    • operating systems
[Figure: labels: Database, Raw Data or Any Data, SS.]

Interactive Decision Support Tools
- Interactive query
- Interactive display, movie and animation
- Integration with application science simulations
http://virtualsky.org (R. Williams et al.)

Application Use Domains
• ServoGrid Projects (GIS)
  • Pattern Informatics (PI)
  • GeoFest
  • Virtual California (VC)
• Los Alamos National Labs (LANL)
  • IEISS (The Interdependent Energy Infrastructure Simulation System)
    • Models infrastructure networks (e.g. electric power systems and natural gas pipelines) and simulates their physical behavior and the interdependencies between systems.
• Chemistry and Astronomy (Future)
  • CML (Chemical Markup Language) representation of molecules. VOTable (Virtual Observatory Table format)

Problem Recognition
[Figure: diagram labels: DB (x8), SS (x5), Interactive Tools; data kinds and formats shown: coverage data, vector data, netCDF, bitmap data, jpeg image, binary data, HDF5, XML data, statistics data, bar graphs, plots, images.]

Problem Recognition (cont.)
• Services like discovery and notification do not need to be made application specific.
• BUT if the domain changes, then the
  • choices,
  • database requirements,
  • data format,
  • core service requirements,
  • attributes, and
  • metadata context
  all CHANGE!
• What are the common concepts and characteristics for
  • data,
  • metadata,
  • query language,
  • services, and
  • communication language
  needed to derive information/knowledge from heterogeneous data/information sources in any application domain?

Generalization of Service Oriented Information Management Architecture
• GIS has specifications based on standards such as OGC and ISO/TC 211, but many other domains do not.
• GIS → ASIS (Science Domain)
• GML → ASL (Representing)
• WFS → ASFS (Storing-Resource)
• WMS → ASVS (Displaying)
• Capa.xml → Metadata (Integrating)
• SOAP over HTTP (Communication Protocol)

Generalization - Overall Structure Solution
• ASL: Application Specific Language. XML-based hierarchical data representation format.
  • Cross-language, cross-platform and cross-operating-system
• ASVS: Application Specific Visualization System
  • Last filter before the decision maker.
  • Provides information/knowledge in human-readable formats
• ASFS: Application Specific Feature Service.
  • Stores and provides the common data model (ASL)
  • Treats binary and common data (in ASL) differently.
[Figure: labels: ASFS, ASVS, Display, AS Repository, AS Tool (generic), AS Service (user defined), AS Tool (generic), AS “Sensor”, Messages using ASL.]

ASFS and ASVS in SOA: Interfaces, querying, metadata and data model
• Each routine is published in the WSDL, invoked according to a predefined request schema, and carried in the SOAP body.
Request:
<request>
  <GetCapability/>
</request>
Request wrapped in a SOAP envelope:
<SOAP:Envelope>
  <SOAP:Body>
    <request>
      <GetCapability/>
    </request>
  </SOAP:Body>
</SOAP:Envelope>

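To make the wrapping step concrete, here is a minimal sketch that builds the request element and places it in a SOAP body using Python's standard `xml.etree.ElementTree`. The helper names `make_request` and `wrap_in_soap` are assumptions for illustration, and the SOAP 1.1 envelope namespace is used because the slide does not say which SOAP version the services rely on.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (an assumption; the slide does not name a version).
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def make_request(operation):
    """Build <request><operation/></request> (hypothetical helper)."""
    request = ET.Element("request")
    ET.SubElement(request, operation)
    return request


def wrap_in_soap(request_element):
    """Place a request element inside a SOAP Envelope/Body pair."""
    ET.register_namespace("SOAP", SOAP_NS)  # serialize with the SOAP: prefix
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(request_element)
    return envelope


envelope = wrap_in_soap(make_request("GetCapability"))
soap_xml = ET.tostring(envelope, encoding="unicode")
```

In a real deployment the message layout would come from the service's WSDL and request schema rather than being assembled by hand.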
Sample Capabilities File (heavily simplified) – GIS Domain
<?xml version='1.0' encoding="UTF-8" standalone="no" ?>
<!DOCTYPE WMT_MS_Capabilities SYSTEM "http://toro.ucs.indiana.edu:8086/xml/capabilities.dtd">
<Capabilities version="1.1.1" updateSequence="0">
  <Service>
    <Name>CGL_Mapping</Name>
    <Title>CGL_Mapping WMS</Title>
    <OnlineResource xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://toro.ucs.indiana.edu:8086/WMSServices.wsdl" />
    <ContactInformation> ….. </ContactInformation>
  </Service>
  <Capability>
    <Request>
      <GetCapabilities>
        <Format>WMS_XML</Format>
        <DCPType><HTTP><Get>
          <OnlineResource xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://toro.ucs.indiana.edu:8086/WMSServices.wsdl" />
        </Get></HTTP></DCPType>
      </GetCapabilities>
      <GetMap>
        <Format>image/GIF</Format>
        <Format>image/PNG</Format>
        <DCPType><HTTP><Get>
          <OnlineResource xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://toro.ucs.indiana.edu:8086/WMSServices.wsdl" />
        </Get></HTTP></DCPType>
      </GetMap>
    </Request>
    <Layer>
      <Name>California:Faults</Name>
      <Title>California:Faults</Title>
      <SRS>EPSG:4326</SRS>
      <LatLonBoundingBox minx="-180" miny="-82" maxx="180" maxy="82" />
    </Layer>
  </Capability>
</Capabilities>

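A client can read such a capabilities document to learn which layers and image formats a filter offers. The sketch below parses a trimmed copy of the sample (DOCTYPE and online-resource entries omitted for brevity) with the standard library; the function name `summarize_capabilities` and the shape of the returned dictionary are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample capabilities file, kept only to make the sketch runnable.
CAPABILITIES_XML = """<Capabilities version="1.1.1">
  <Service><Name>CGL_Mapping</Name></Service>
  <Capability>
    <Request>
      <GetMap>
        <Format>image/GIF</Format>
        <Format>image/PNG</Format>
      </GetMap>
    </Request>
    <Layer>
      <Name>California:Faults</Name>
      <SRS>EPSG:4326</SRS>
      <LatLonBoundingBox minx="-180" miny="-82" maxx="180" maxy="82"/>
    </Layer>
  </Capability>
</Capabilities>"""


def summarize_capabilities(xml_text):
    """Extract service name, GetMap formats and layers (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    formats = [f.text for f in root.findall("./Capability/Request/GetMap/Format")]
    layers = []
    for layer in root.iter("Layer"):
        box = layer.find("LatLonBoundingBox")
        bbox = None
        if box is not None:
            # Bounding box as (minx, miny, maxx, maxy) floats.
            bbox = tuple(float(box.get(k)) for k in ("minx", "miny", "maxx", "maxy"))
        layers.append({"name": layer.findtext("Name"),
                       "srs": layer.findtext("SRS"),
                       "bbox": bbox})
    return {"service": root.findtext("./Service/Name"),
            "formats": formats,
            "layers": layers}


summary = summarize_capabilities(CAPABILITIES_XML)
```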
Sample Scenario for ASIS
• Static linking of filters. Capabilities are aggregated by cycling through the “GetCapabilities” interfaces of the filters.
• Successive requests are made without involving the user. These request chains are created based on the filters' previously published capabilities.
• Client tools issue a GetCapability request at startup. Later requests are created based on the returned aggregated capabilities.
• The client needs to visualize Data A and E and makes a GetVis request to the ASVS with specific attributes for querying. GetVis is defined in a schema file.
• Each filter publishes its data through its capability file.
[Figure: filter chain diagram. Source labels: Data A; Data B,C; Data D; Data E; Data F. Aggregated capability labels: A,B,C; E,F; A,B,C,D,E,F. Request labels: GetVis(A,E), GetVis(E), GetData(A). Client: Interactive Tools.]

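The capability aggregation and request chaining described in this scenario can be sketched as follows. The `Filter` class, its method names and the exact wiring of the filters are assumptions made for illustration; the slide only shows that capabilities are aggregated upward (A,B,C and E,F into A,B,C,D,E,F) and that requests are routed back down to the filter holding the data.

```python
class Filter:
    """Hypothetical filter: publishes local data plus whatever its upstream filters publish."""

    def __init__(self, name, local_data=(), upstream=()):
        self.name = name
        self.local_data = set(local_data)
        self.upstream = list(upstream)  # statically linked filters

    def get_capabilities(self):
        """Aggregate data names by cycling through the upstream filters."""
        names = set(self.local_data)
        for upstream_filter in self.upstream:
            names.update(upstream_filter.get_capabilities())
        return sorted(names)

    def get_data(self, data_name):
        """Answer locally, or forward the request to the upstream filter that advertises it."""
        if data_name in self.local_data:
            return f"{data_name} from {self.name}"
        for upstream_filter in self.upstream:
            if data_name in upstream_filter.get_capabilities():
                return upstream_filter.get_data(data_name)
        raise KeyError(f"{data_name} is not advertised by {self.name}")


# Assumed topology, loosely following the labels in the figure.
fs_a = Filter("fs_a", {"A"})
fs_bc = Filter("fs_bc", {"B", "C"})
fs_d = Filter("fs_d", {"D"})
fs_e = Filter("fs_e", {"E"})
fs_f = Filter("fs_f", {"F"})
abc = Filter("abc", upstream=[fs_a, fs_bc])
ef = Filter("ef", upstream=[fs_e, fs_f])
top = Filter("top", upstream=[abc, fs_d, ef])
```

A client would call `top.get_capabilities()` once at startup and build later requests from the returned names, without needing to know which underlying filter holds each data set.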
Overall Structure Solution (cont.)
• Common data (ASL) is kept in ASFS with query capability.
  • In a given domain every filter speaks ASL.
• Filters (ASVS, ASFS) keep their metadata locally.
  • ASVS both visualize information and provide a way of navigating ASFS and their underlying DBs.
  • An ASVS can itself be federated and present an output interface.
  • Dynamic metadata update via MD services or P2P metadata exchange.
• Utilizing data/information at the application level via filters
  • ASFS provide ASL.
  • ASVS provide human-readable information such as text, graphs (scalable vector (svg) or portable (png)) and images.
• Filters have common ports and interfaces
  • Enable chaining for more complex data and information creation.
  • Filters are easily published, located and invoked over the Internet.

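The division of labour between the two filter types can be sketched as a two-stage chain: a feature service answers a query with XML (standing in for ASL), and a visualization service turns that XML into human-readable text. The class names, the element names `FeatureCollection`, `Feature` and `value`, and the sample records are all assumptions invented for this example, not part of any ASL definition.

```python
import xml.etree.ElementTree as ET


class FeatureService:
    """Hypothetical ASFS stand-in: stores records and answers queries in XML."""

    def __init__(self, features):
        self._features = features  # list of {"name": str, "value": float}

    def get_feature(self, min_value=0.0):
        root = ET.Element("FeatureCollection")
        for feature in self._features:
            if feature["value"] >= min_value:  # simple attribute query
                element = ET.SubElement(root, "Feature", name=feature["name"])
                ET.SubElement(element, "value").text = str(feature["value"])
        return ET.tostring(root, encoding="unicode")


class VisualizationService:
    """Hypothetical ASVS stand-in: last filter before the decision maker."""

    def __init__(self, feature_service):
        self._feature_service = feature_service

    def get_vis(self, min_value=0.0):
        # Chain: forward the query to the feature service, then render its XML as text.
        asl = self._feature_service.get_feature(min_value=min_value)
        root = ET.fromstring(asl)
        return "\n".join(f"{f.get('name')}: {f.findtext('value')}"
                         for f in root.findall("Feature"))


# Made-up sample records, for illustration only.
asfs = FeatureService([{"name": "a", "value": 1.5}, {"name": "b", "value": 4.0}])
asvs = VisualizationService(asfs)
report = asvs.get_vis(min_value=2.0)
```

Because both stages exchange the same XML format, another filter could be inserted between them without changing either side.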
Applicability to Different Science Domains
• How well do the service definitions in the proposed architecture match general science domains?

Research Issues (1)
• Requirements for the domain metadata in the capability document
  • What does a capabilities document need to do and contain to federate filters?
• Requirements for the ASL (such as CML, GML)
  • What does the ASL need to have to federate the filters?
• Concept of data (such as feature, coverage)
  • Is a common representation possible? To what extent?
• A common information management framework that can be applied to any domain.
  • Some instructions, for any field, on what needs to be done

Research Issues (2)
• Application-level data/information federation.
• Integrating the system with application science simulations.
• Creating interactive decision support tools utilizing integrated filter services.
  • Tools for map animation, map movies, images
  • Interactive query support to get further information on the image and/or animation.
• Enabling binding of services into pipelines, with or without human intervention, through metadata.
• Caching and load balancing to handle large scientific data in an efficient and robust manner (application based).

Related Work: SRB (Storage Resource Broker)
• SRB
  • Uniform access to distributed heterogeneous data resources by attributes.
  • Catalog service is MCAT (Metadata Catalog).
  • Resource and data location transparency.
  • Remote authentication and authorization; user groups.
  • Not just for access, transferring and replicating.
  • Sample projects using SRB: BIRN and IVOA.
• Summary
  • Other important digital library projects and the NGAS (Next Generation Archive System) from ESO.
  • We will study these important activities further, identify key architecture ideas and incorporate their lessons.
  • SRB can be leveraged in ASIS.

Related Work (cont.): OGSA-DAI
• OGSA-DAI
  • Open Grid Services Architecture – Data Access and Integration.
  • Access to heterogeneous data via common interfaces on the grid.
  • Catalog service is MCS (Metadata Catalog Service)
  • OGSI-compliant Grid.
  • Components are Grid services. Resources should be registered.
  • Sample projects using OGSA-DAI: LEAD, MyGrid.
• Summary
  • OGSA-DAI emphasizes the database layer, whereas we are tackling application-specific DIKW.
  • OGSA-DAI can be leveraged in ASIS.

Contributions
• Instructions on how to build the ASL and the capability metadata for the application sciences.
• Instructions on how to build an application specific information system (ASIS) federating multiple filters that speak ASL.
• Information grid (ASIS) formalization through capabilities metadata, defining all the data/information sources as interacting Web Service filters with standard metadata service ports.
• Optimizing and enhancing distributed heterogeneous information management.

THANKS asayar@cs.indiana.edu Ahmet Sayar
Literature Survey OGSA-DAI SRB
Discussions on SRB & OGSA-DAI
• SRB
  • Monolithic: does too much
  • MCAT dependent
    • MCAT has limited support for application-level metadata
    • Needs different metadata for different domains, and extensions for applications
  • Not standards based; not open source
  • Does not handle data based on the DIKW hierarchy
• OGSA-DAI
  • Works at the data and database level
  • MCS dependent
    • MCS has limited support for application-level metadata
    • Needs different metadata for different domains, and extensions for applications
  • For Grid applications (GGF standards)
  • Data only in relational and XML databases or ordinary files
  • Does not handle data based on the DIKW hierarchy

Our Work Compared to SRB & OGSA-DAI (1)
• Each filter has its own metadata
  • Distributed metadata handling
    • Peer to peer
    • Through MD services
• They provide heterogeneous data access and federation through central metadata services
  • SRB MCAT and OGSA-DAI MCS
• Our main motivation is sharing, interpreting and extracting knowledge from data and information.
• Their motivation is storing, accessing and updating heterogeneous data.
• We leverage their power and usability in our federated service oriented information management architecture.
  • They are complements, not competitors.

Our Work Compared to SRB & OGSA-DAI (2)
• Reusable components: filter services with specific ports and interfaces
• Distributed DIKW abstraction
• Metadata in capability document
• Metadata aggregators
• New metadata for different domains
• Smart data querying
• Web Services based SOA (advantages).
• Central data access abstraction. Uniform access to heterogeneous data sources
• Metadata: SRB/MCAT, OGSA-DAI/MCS
• Both provide an extensible metadata architecture for different domains
• SRB has a “zone” concept that addresses similar issues, but in a different way
[Figure: layered comparison diagram. Captions: "Wisdom: decisions, ready-to-use information and knowledge"; "Wisdom: decisions, knowledge and information extraction by the user"; Interactive Tools. Layer labels: wisdom decisions; information/knowledge; data access and query. Component labels: ASVS (x3), ASFS (x3), GDSReg, MCAT, Master SRB, Ogsa-GDSF, SRB Agents, Ogsa/GDS, R (x6).]

Why are we different? Federated Service Oriented Information Management
• SOA (Service Oriented Architecture)
  • Easy to extend
  • Reusable components
  • Cross-platform and cross-language.
• XML-based hierarchical data representation
  • Easy data integration
  • Easy querying
• Human-readable information
• Easy to access data, no command line
  • Interactive tools
  • On-the-fly query creation.
• Not only accessing data but also transforming it along its path to end users.
• Ports to integrate application simulations into the application specific information system (ASIS)
  • Integrating application simulation data/information with ASIS outputs

An Example of Other Domains: Astronomy Domain (IVOA Standards)
• FS-1: VOPlot
  • Integrating, interacting visualization tools
• FS-2: SkyNode
  • ADQL-based SOAP interface returning VOTable-based results
• FS-3: SIA
  • 2D sky projection, logically a grid of pixels encoded as a FITS image
• FS-4: SSA
  • URL-based, returning a dataset "document" (VOTable)
• Query: ADQL (extension of SQL)
• Data encoding: VOTable, FITS
• Metadata: UCD, VOResource
• Event notification: VOEvent
• Registry: VORegistry
• Queryable data in: SSAP and SIAP, VOStore
[Figure: labels: DB (x3), FS-1, FS-2, FS-3, FS-4, MD, PORTAL, Data capability, Interactive Decision Support.]