
Data and Metadata Architectures in a Robust Semantic Grid

This presentation discusses different architectures and standards for building robust semantic grids, including the use of web services, SOAP, GridFTP, and other technologies. It also covers the areas of controversy and technical evolution in grid development, such as security, workflow, service discovery, and data transport. The presentation explores the role of services and grids in managing collections of services, and the importance of mediation and transformation in grid of grids and simple services. Additionally, it addresses interoperability and the challenges in defining standards at different levels of grid systems.



  1. Data and Metadata Architectures in a Robust Semantic Grid Chinese Academy of Sciences July 28 2006 Geoffrey Fox Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington IN 47401 http://grids.ucs.indiana.edu/ptliupages/presentations/ gcf@indiana.edu http://www.infomall.org

  2. Status of Grids and Standards I • It is interesting to examine Grid architectures both to see how to build great new systems and to look at linking Grids together and making them (or parts of them) interoperable • There is agreement that one should use Web Services with WSDL and SOAP, and not so much agreement after that • Some systems use non-SOAP transports like GridFTP • Can divide service areas into • General Infrastructure • Compute Grids • Data and Information Grids • Other …..

  3. Status of Grids and Standards II • General Infrastructure covers the area where industry, OASIS and W3C are building the pervasive Web service environment • There are important areas of debate and vigorous technical evolution, but these are within confined areas • It is relatively clear how to adapt between different choices • Examples of areas of some controversy • Security is critical, but commercial, academic institution and Grid project solutions are still evolving • Workflow has many choices and BPEL is not clearly a consensus standard; there are differences between control and data flow • The architecture of service discovery is understood, but there is skepticism that UDDI is appropriate; it keeps getting improved • In Management, Notification and Reliable Messaging, there are multiple standards, but it is rather trivial to map between them • WSRF symbolizes disagreements on state (which is roughly the metadata area); roughly this is the question of whether metadata lives in the message, in a context service, or hidden in the application • The data transport model is unclear: GridFTP v. BitTorrent v. Fast XML

  4. The Ten areas covered by the 60 core WS-* Specifications

  5. Activities in Open Grid Forum Working Groups

  6. The NCES/WS-*/GS-* Features/Service Areas I

  7. Grids of Grids of Simple Services [Diagram: overlay and compose Grids of Grids — CPUs, clusters and MPPs form Compute Resource Grids; methods and services form Component Grids; databases and federated databases form Data Resource Grids; sensors and sensor nets form Sensor Grids] • Grids are managed collections of one or more services • A simple service is the smallest Grid • Services and Grids are linked by messages • Internally to a service, functionalities are linked by methods • Link services via methods → messages → streams • We are familiar with the method-linked hierarchy: Lines of Code → Methods → Objects → Programs → Packages

  8. Mediation and Transformation in a Grid of Grids and Simple Services [Diagram: each Grid or Service exposes external-facing ports over internal interfaces; Mediation and Transformation Services — distributed brokers between distributed ports that listen, queue, transform and send — link them with 1-10 ms overhead] • Use “OGSA” to federate?

  9. The NCES/WS-*/GS-* Features/Service Areas II

  10. Interoperability etc. for FS11-14 • The higher level services are harder, as the systems are more complicated and there is less agreement on where standards should be defined • OGF has JSDL and BES (Basic Execution Services), but it might be better to set standards at a different level • i.e. users might prefer to view Condor or GT4 as collections of services as the interface • The idea is that maybe we should consider high level capabilities as Grids (an EGEE or “Condor” compute Grid for example, whose internals are black boxes for users), and then you need two types of interfaces • Internal interfaces like JSDL defining how the Condor Grid interacts internally with a computer • External interfaces defining how one sets up a complex problem (maybe with lots of individual jobs as in SETI@Home) for a Compute Grid

  11. gLite Grid Middleware Services [Diagram: Access (API, CLI); Security Services (Authentication, Authorization, Auditing); Information & Monitoring Services (Information & Monitoring, Application Monitoring); Data Management (Metadata Catalog, File & Replica Catalog, Storage Element, Data Movement); Workload Mgmt Services (Accounting, Job Provenance, Package Manager, Computing Element, Workload Management); Connectivity]

  12. DIRAC Architecture [Diagram: user-facing components (GANGA UI, User CLI, Job monitor, Production manager, BK query webpage, FileCatalog browser) call DIRAC services (DIRAC Job Management Service, JobMonitorSvc, JobAccountingSvc with AccountingDB, FileCatalogSvc, BookkeepingSvc, ConfigurationSvc); Agents at DIRAC sites and on LCG (via a Resource Broker) drive DIRAC resources — DIRAC Storage, DIRAC CEs, CE 1-3, disk files — using gridftp]

  13. Old AliEn Framework [Diagram: user connects via SOAP to central services and local site elements; 100% perl5. Credit: David Evans]

  14. Grids of Grids Architecture • Raw Data → Data → Information → Knowledge → Wisdom [Diagram: Sensor Services (SS), Filter Services (FS), Other Services (OS) and MetaData services (MD), linked by SOAP messages into Grids of Grids; portals, databases, other Grids and other services feed a Decisions layer]

  15. Data-Information-Knowledge-Wisdom Pipeline [Diagram: DIKW1 →F→ DIKW2 →F→ DIKW3, each with a resource view and an access view] • DIKWi represent different forms of DIKW, with different terminology in different fields. • Each DIKWi has a resource view describing its physical instantiation (different distributed media with file, database, memory, stream etc.) and • An access view describing its query model (dir or ls, SQL, XPath, custom etc.). • The different forms DIKWi are linked by filtering steps F. This could be a simple format translation; a complex calculation, as in the running of an LHC event processing code; a proprietary analysis, as in a search engine's processing of harvested web pages; or the addition of a metadata catalog to a collection of files.
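The filter chain just described can be sketched in a few lines. Everything below (the stage names and toy filters) is illustrative, not part of any Grid standard:

```python
# Minimal sketch of the DIKW pipeline: each filtering step F maps one form
# of DIKW to the next. The filters below are toy stand-ins (a format
# translation, a simple analysis, a metadata-catalog step).

def compose(*filters):
    """Chain the filter steps F so each DIKW form feeds the next."""
    def pipeline(payload):
        for f in filters:
            payload = f(payload)
        return payload
    return pipeline

def raw_to_data(raw):            # simple format translation
    return [float(x) for x in raw.split(",")]

def data_to_information(xs):     # a (toy) analysis step
    return {"mean": sum(xs) / len(xs), "n": len(xs)}

def add_catalog(info):           # addition of a metadata catalog entry
    return {**info, "catalog": "provenance: sensor-feed-1"}

dikw = compose(raw_to_data, data_to_information, add_catalog)
print(dikw("1.0,2.0,3.0"))  # {'mean': 2.0, 'n': 3, 'catalog': 'provenance: sensor-feed-1'}
```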

  16. DIKW Pipeline II • Each DIKW can be a complete data grid • The resource view is typified by standards like ODBC, JDBC, OGSA-DAI and is internal to DIKW Grid • A name-value resource view is exemplified by Javaspaces (tuple model) and WS-Context • The access (user) view is external view of a data grid and does not have such a clear model but rather • Systems like SRB (Storage Resource Broker) that virtualize file collections • WebDAV supports the distributed file access view • VOSpace from astronomy community is viewed by some as an abstraction of SRB • WFS Web Feature Service from Open Geospatial Consortium is an important example

  17. WMS uses WFS that uses data sources
  <gml:featureMember>
    <fault>
      <name>Northridge2</name>
      <segment>Northridge2</segment>
      <author>Wald D. J.</author>
      <gml:lineStringProperty>
        <gml:LineString srsName="null">
          <gml:coordinates>-118.72,34.243 -118.591,34.176</gml:coordinates>
        </gml:LineString>
      </gml:lineStringProperty>
    </fault>
  </gml:featureMember>

  18. Managed Data • Most grids have a managed data component (which we call a “Managed Data Grid”) • Managed data can consist of the data and one or more metadata catalogs • Metadata catalogs can contain semantic information enabling more precise access to the “data” • Replica catalogs (managing multiple file copies) are another metadata catalog • SRB and Digital libraries have this architecture with mechanisms to keep multiple metadata copies coherent • RDF has clear relevance • However there is no clear consensus as to how to build a Managed Data Grid

  19. Resource and User Views [Diagram: user-level federation spans the access views of DIKW1-DIKW4; resource-level federation spans their resource views] • Federation implies we integrate (virtualize) N data systems, which could be heterogeneous • Sometimes you can choose where to federate, but sometimes you can only federate at the user view • In Astronomy Grids there are several (~20) different data sources (collections) corresponding to different telescopes. These are built on traditional databases but expose an astronomy query interface (VOQL etc.), and one cannot federate at the database level • Geographical Information Systems (GIS) are built on possibly spatially enhanced databases but expose WFS or WMS OGC interfaces • To make a map of Indiana you need to combine the GIS of 92 separate counties; this cannot be done at the database level • More generally, when we link black-box data repositories to the Grid, we can only federate at the interfaces exposed by the black box

  20. Metadata Systems I: Applications • Semantic description of data (for each application) • Replica Catalog • UDDI or other service registry • VOMS or equivalent (PERMIS) authorization catalog • Compute Grid static resource metadata • Compute Grid dynamic events • And implicitly metadata defining workflow, state etc. which can be stored in messages and/or catalogs (databases) • Why not unify the resource view of these?

  21. Metadata Systems II: Implementations • There are also many WS-* specifications addressing metadata defined broadly • WS-MetadataExchange • WS-RF • UDDI • WS-ManagementCatalog • WS-Context • ASAP • WBEM • WS-GAF • And many different implementations, from (extended) UDDI through MCAT of the Storage Resource Broker • And of course representations including RDF and OWL • Further there is system metadata (such as UDDI for core services) and metadata catalogs for each application domain • They have different scope and different QoS trade-offs • e.g. Distributed Hash Tables (Chord) to achieve scalability in large scale networks

  22. Different Trade-offs • It has never been clear to me how a poor lonely service is meant to know where to look up metadata, and whether that metadata is meant to be thought of as a database (UDDI, WS-Context) or as the contents of a message (WS-RF, WS-MetadataExchange) • We identified two very distinct QoS tradeoffs • 1) Large scale, relatively static metadata as in a (UDDI) catalog of all the world’s services • 2) Small scale, highly dynamic metadata as in dynamic workflows for sensor integration and collaboration • Fault-tolerance and ability to support dynamic changes with few-millisecond delay • But only a modest number of involved services (up to 1000’s in a session) • Need Session NOT Service/Resource metadata, so don’t use WS-RF

  23. XML Databases of Importance • We choose a message based interface to a backend database • We built two pieces of technology with different trade-offs; each could store any metadata but with different QoS • WS-Context designed for controlling a dynamic workflow • (Extended) UDDI exemplified by semantic service discovery • WFS provides a general application-specific XML data/metadata repository built on top of a hybrid system supported by UDDI and WS-Context • These have different performance, scalability and data unit size requirements • In our implementation, each is currently “just an Oracle/MySQL” database (with a Javaspaces cache in WS-Context) front-ended by filters that convert between XML (GML for WFS) and an object-relational schema • Example of semantics (XML) versus representation (SQL) • OGSA-DAI offers a Grid interface to databases – we could use this internally but don’t, as we only need to expose the external interface WFS and not MySQL to the Grid
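The "filter" idea above — an XML message interface front-ending an object-relational store — can be sketched as follows. The table layout and element names here are assumptions for illustration, not the actual opengrids.org schema:

```python
# Sketch of a filter that converts between XML metadata fragments and a
# relational (name, value) table, as in the WS-Context implementation
# described above. Table and element names are hypothetical.

import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE context (session TEXT, name TEXT, value TEXT)")

def put_context(session, xml_fragment):
    """Filter in: decompose an XML context fragment into relational rows."""
    for el in ET.fromstring(xml_fragment):
        conn.execute("INSERT INTO context VALUES (?, ?, ?)",
                     (session, el.tag, el.text))

def get_context(session):
    """Filter out: reassemble the rows into an XML context document."""
    root = ET.Element("context")
    for name, value in conn.execute(
            "SELECT name, value FROM context WHERE session = ?", (session,)):
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")

put_context("wf-42",
            "<context><coordinator>http://host/wf</coordinator></context>")
print(get_context("wf-42"))
```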

  24. WFS: Geographical Information System compatible XML Metadata Services • Extended UDDI XML Metadata Service (an alternative to OGC Web Registry Services) supports the WFS GIS Metadata Catalog (functional metadata), user-defined metadata ((name, value) pairs), up-to-date service information (leasing), and dynamically updated registry entries. • Our approach enables advanced query capabilities • geo-spatial and temporal queries, • metadata oriented queries, • domain independent queries such as XPath and XQuery on the metadata catalog. • http://www.opengrids.org/extendeduddi/index.html
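As an illustration of the domain-independent XPath queries mentioned above, here is a toy query on a metadata catalog. The catalog fragment and element names are invented, and Python's ElementTree supports only an XPath subset:

```python
# Hypothetical metadata catalog queried with an XPath predicate:
# find services whose "region" property equals "Indiana".

import xml.etree.ElementTree as ET

catalog = ET.fromstring("""
<catalog>
  <service><name>WFS-faults</name><property>California</property></service>
  <service><name>WFS-rivers</name><property>Indiana</property></service>
</catalog>""")

# ElementTree's XPath subset supports child-text predicates like this.
matches = catalog.findall(".//service[property='Indiana']/name")
print([m.text for m in matches])  # ['WFS-rivers']
```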

  25. Context as Service Metadata • We define all metadata (static, semi-static, dynamic) relevant to a service as “Context”. • Context can be associated with a single service, a session (service activity) or both. • Context can be independent of any interaction • slowly varying, quasi-static context • Ex: type or endpoint of a service, less likely to change • Context can be generated as a result of service interactions • dynamic, highly updated context • information associated with an activity or session • Ex: session-id, URI of the coordinator of a workflow session

  26. Hybrid XML Metadata Services → WS-Context + extendedUDDI • We combine the functionalities of these two services, WS-Context AND extendedUDDI, in one hybrid service to manage Context (service metadata). • WS-Context controlling a workflow • (Extended) UDDI supporting semantic service discovery • This approach enables uniform query capabilities on the service metadata catalog. • http://www.opengrids.org/wscontext/index.html

  27. Hybrid Information Service [Diagram: IS clients reach, via WSDL interfaces over HTTP(S), an Extended WS-Context Service (WS-Context Ver 1.0, ws-context.wsdl; optimized for performance; dynamic metadata) and an Extended UDDI Registry Service (UDDI Version 3.0, uddi_api_v3_portType.wsdl; optimized for scalability; interaction-independent, relatively static metadata), each backed by a database via JDBC]

  28. Generalizing a GIS • Geographical Information Systems (GIS) have been hugely successful in all fields that study the earth and related worlds • They define a geography syntax (GML) and ways to store, access, query, manipulate and display geographical features • In SOA, a GIS corresponds to a domain specific XML language and a suite of services for the different functions above • However such a universal information model has not been developed in other areas, even though there are many fields in which it appears possible • BIS Biological Information System • MIS Military Information System • IRIS Information Retrieval Information System • PAIS Physics Analysis Information System • SIIS Service Infrastructure Information System

  29. ASIS Application Specific Information System I • a) Discovery capabilities, which are best done using WS-* standards • b) Domain specific metadata and data, including a search/store/access interface (cf. WFS). Let's call the generalization ASFS (Application Specific Feature Service) • A language to express domain specific features (cf. GML). Let's call this ASL (Application Specific Language) • Tools to manipulate information expressed in the language and key data of the application (cf. coordinate transformations). Let's call these ASTT (Application Specific Tools and Transformations) • ASL must support data sources such as sensors (cf. OGC metadata and data sensor standards) and repositories. Sensors need (common across applications) support of streams of data • Queries need to support archived data (find all relevant data in the past) and streaming data (find all data in the future with given properties) • Note all AS Services behave like sensors, and all sensors are wrapped as services • Any domain will have “raw data” (binary) and data that has been filtered to ASL. Let's call the former ASBD (Application Specific Binary Data)

  30. ASIS Application Specific Information System II [Diagram: an AS “Sensor”, AS Repository, generic AS Tools and user defined AS Services exchange messages in ASL, feeding filter, transformation, reasoning, data-mining and analysis steps and an ASVS display] • c) Visualization: let's call this ASVS (Application Specific Visualization Services), generalizing WMS for GIS • The ASVS should both visualize information and provide a way of navigating (cf. GetFeatureInfo) the database (the ASFS) • The ASVS can itself be federated and presents an ASFS output interface • d) There should be an application service interface for ASIS from which all ASIS services inherit • e) There will be other user services interfacing to ASIS • All user and system services will input and output data in ASL, using filters to cope with ASBD

  31. Application – Context Store usage in communication of mobile Web Services • Handheld Flexible Representation (HHFR) is open source software for fast communication in mobile Web Services. HHFR supports: • streaming messages, separation of message contents, and usage of a context store. • http://www.opengrids.org/hhfr/index.html • We use the WS-Context service as a context-store for the redundant parts of SOAP messages. • The redundant data is static XML fragments encoded in every SOAP message • The redundant metadata is stored as context associated with the service conversation • The empirical results show that we gain 83% in message size and on average 41% in transit time by using the WS-Context service.
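The optimization described above can be sketched as follows: static SOAP fragments are stored once in a context store and replaced by a short reference in every subsequent message. The function names and the reference format are illustrative, not HHFR's actual protocol:

```python
# Toy sketch of the HHFR context-store idea: unchanging SOAP fragments are
# saved once and replaced by a short reference, shrinking each message.

context_store = {}   # stands in for the WS-Context service

def optimize(message, static_parts, session):
    """Store the unchanging fragments once; send only a reference afterwards."""
    context_store[session] = static_parts
    for part in static_parts:
        message = message.replace(part, f"<ctx ref='{session}'/>")
    return message

def restore(message, session):
    """Receiver side: re-expand the references from the context store."""
    for part in context_store[session]:
        message = message.replace(f"<ctx ref='{session}'/>", part, 1)
    return message

header = "<Header><Security>...long static security token...</Security></Header>"
msg = f"<Envelope>{header}<Body>reading=42</Body></Envelope>"
small = optimize(msg, [header], "s1")
assert restore(small, "s1") == msg   # lossless round trip
print(len(small), "<", len(msg))     # the optimized message is smaller
```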

  32. Optimizing Grid/Web Service Messaging Performance The performance and efficiency of Web Services can be greatly increased in conversational and streaming message exchanges by removing the redundant parts of the SOAP message.

  33. Performance with and without Context-store Summary of the Round Trip Time (TRTT) • Experiments ran over HHFR • Optimized message exchanged over HHFR after saving redundant/unchanging parts to the Context-store • Save on average 83% of message size, 41% of transit time

  34. System Parameters • Taccess: time to access a Context-store (i.e. save a context to, or retrieve a context from, the Context-store) from a mobile client • TRTT: Round Trip Time to exchange a message through a HHFR channel • N: number of simultaneous streams, summed over ALL mobile clients • Twsctx: time to process the setContext operation • Taxis: time consumed by Axis processing • Ttrans: transmission time through the network • Tstream: stream length

  35. Context-store: System Parameters

  36. Summary of Taxis and Twsctx measurements • Taccess = Twsctx + Taxis + Ttrans • Data binding overhead at the Web Service container is the dominant factor in message processing

  37. Performance Model and Measurements • Chhfr = n·thhfr + Oa + Ob • Csoap = n·tsoap • Breakeven point nbe: nbe·thhfr + Oa + Ob = nbe·tsoap, i.e. nbe = (Oa + Ob) / (tsoap − thhfr) • Oa: overhead for accessing the Context-store Service (roughly 20 milliseconds for the Web Service version) • Ob: overhead for negotiation
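Solving the breakeven equation for nbe gives nbe = (Oa + Ob) / (tsoap − thhfr). A quick sketch, where all timings are assumed values for illustration except the ~20 ms Oa quoted on the slide:

```python
# Breakeven point of the HHFR optimization: the message count n_be at which
# n_be*t_hhfr + Oa + Ob equals n_be*t_soap.

def breakeven(t_hhfr, t_soap, Oa, Ob):
    """Messages after which the optimized channel wins despite setup overhead."""
    assert t_soap > t_hhfr, "optimization must lower the per-message cost"
    return (Oa + Ob) / (t_soap - t_hhfr)

# Oa ~ 20 ms context-store access (from the slide); the other values are assumed.
print(breakeven(t_hhfr=5.0, t_soap=12.0, Oa=20.0, Ob=15.0))  # 5.0 messages
```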

  38. Core Features of Management Architecture • Remote Management • Allow management irrespective of the location of the resource (as long as that resource is reachable via some means) • Traverse firewalls and NATs • Firewalls complicate management by disabling access to some transports and access to internal resources • Utilize tunneling capabilities and multi-protocol support of the messaging infrastructure • Extensible • Management capabilities evolve with time. We use a service oriented architecture to provide extensibility and interoperability • Scalable • The management architecture should scale as the number of managees increases • Fault-tolerant • Management itself must be fault-tolerant. Failure of transports OR management components should not cause the management architecture to fail.

  39. Management System built in terms of • Bootstrap System – Robust itself by Replication • Registry for metadata (distributed database) – Robust by standard database techniques and our system itself for Service Interfaces • NaradaBrokering for robust tunneled messages – NB itself robust using our system • Managers – Easy to make robust using our system; these are essentially agents • Managees – what you are managing – Our system makes robust – There is NO assumption that Managed system uses NB

  40. Basic Management Architecture I [Diagram: components read/write the Registry via a pre-determined topic on a NaradaBrokering (NB) node] • Registry • Stores system state. • Fault-tolerant through replication • Could be a global registry OR separate registries for each domain (later slide) • The current implementation uses a simple in-memory system • Will use our WS-Context service as our registry (Service/Message Interface to an in-memory JavaSpaces cache and MySQL) • Note metadata is transported by messages, but we use a distributed database to implement it • Messaging Nodes • NaradaBrokering nodes that form a scalable messaging substrate • The main purpose is to serve as a message delivery mechanism between Managers and Service Adapters (Managees) in the presence of varying network conditions

  41. Basic Management Architecture II [Diagram: a Manager and a Service Adapter wrapping a Resource (the Managee) read/write the Registry via a pre-determined topic and exchange messages through NB brokers] • Resources to Manage (Managees) • If the resources DO NOT have a Web Service interface, we create a Service Adapter (a proxy that provides the Web Service interface as a wrapper over the basic management functionality of the resource). • The Service Adapters connect to existing messaging nodes. This mainly leverages the multi-protocol transport support in the messaging substrate. Thus, alternate protocols may be used when network policies cause connection failures • Managers • Active entities that manage the resources. • May be multi-threaded to improve scalability (currently under further investigation)

  42. Architecture: Use of Messaging Nodes • Service adapters and Managers communicate through messaging nodes • Direct connection is possible, however • This assumes that the service adapters are appropriately accessible from the machines where managers would run • May require special configuration in routers / firewalls • Typically managers, messaging nodes and registries are always in the same domain OR a higher level network domain with respect to service adapters • Messaging Nodes (NaradaBrokering Brokers) provide • A scalable messaging substrate • Robust delivery of messages • Secure end-to-end delivery

  43. Architecture: Bootstrapping Process [Diagram: hierarchical bootstrap nodes /ROOT, /ROOT/FSU and /ROOT/CGL, each domain with its own registry] • The architecture is arranged hierarchically. • Resources in different domains can be managed with separate policies for each domain • A bootstrapping service is run in every domain where the management architecture exists. • Serves to ensure that the child domain bootstrap processes are always up and running. • Periodic heartbeats convey the status of the bootstrap service • The bootstrap service periodically spawns a health-check manager that checks the health of the system (ensures that the registry and messaging nodes are up and running and that there are enough managers for managees) • Currently 1 manager per managee

  44. Architecture: User Component • Application-specific specification of the characteristics that the resources/services being managed should maintain. • Impacts the Managee interface, registry and Manager • Generic and application specific policies are written to the registry, where they will be picked up by a manager process. • Updates to the characteristics (WS-Policy in future) are determined by the user. • Events generated by the Managees are handled by the manager. • Event processing is determined by policy (future work), • E.g. wait for the user’s decision on handling specific conditions • The event can be processed locally, so execute the default policy, etc… • Note Managers will set up services if the registry indicates that is appropriate; so writing information to the registry can be used to start up a set of services

  45. Architecture: Structure of Managers [Diagram: a Manager contains a Heartbeat Generator thread and a SAM module that starts a Resource Manager] • The Manager process starts an appropriate manager thread for the manageable resource in question • The heartbeat thread periodically registers the Manager in the registry • The SAM (Service Adapter Manager) module thread starts a service/resource specific “Resource Manager” that handles the actual management task • The management system can be extended by writing ResourceManagers for each type of Managee

  46. Prototype • We illustrate the architecture by managing the distributed messaging middleware, NaradaBrokering • This example is motivated by the presence of a large number of dynamic peers (brokers) that need configuration and deployment in specific topologies • Use WS-Management (June 2005) parts (WS-Transfer [Sep 2004], WS-Enumeration [Sep 2004] and WS-Eventing) (could use WS-DM) • WS-Enumeration is implemented, but we do not foresee any immediate use in managing the brokering system • WS-Transfer provides verbs (GET / PUT / CREATE / DELETE) which allow us to model setting and querying broker configuration, instantiating brokers and creating links between them, and finally deleting brokers (tearing down the broker network) to re-deploy with possibly a different configuration and topology • WS-Eventing (will be leveraged from the WS-Eventing capability implemented in OMII) • WS-Addressing [Aug 2004] and SOAP v1.2 used (needed for WS-Management) • Used XmlBeans 2.0.0 for manipulating XML in a custom container. • WS-Context will replace the current registry

  47. Prototype Components • Broker Service Adapter • Note NB illustrates an electronic entity that didn’t start off with an administrative Service interface • So we add a wrapper over the basic NB BrokerNode object that provides a WS-Management front-end • Also provides a buffering service to buffer undeliverable responses • These will be retrieved later by a separate Request-Response message exchange • Broker Network Manager • WS-Management client component that is used to configure a broker object through the Broker Service Adapter • Contains Request-Response as well as asynchronous messaging style capabilities • Contains a topology generator component that determines the wiring between brokers (links that form a specific topology) • For the purpose of the prototype we simply create a CHAIN topology where each ith broker is connected to the (i-1)st broker

  48. Prototype Resources/Properties Modeled (very specific to NaradaBrokering)

  49. Response Time Handling Events (WS-Eventing) • Test resource which does not do any work other than responding to events • This base model shows that up to 200 resources can be managed per manager process, beyond which response time increases rapidly • This number is resource dependent and this result is illustrative. • Equally dividing management between 2 processes increases response time, although slowly.

  50. Amount of Management Infrastructure Required • N = Number of resources to manage • NMP = Number of Manager processes • If a manager process can manage 200 resources simultaneously, then NMP = N/200 • NMN = Number of Messaging Nodes • If a messaging node can support 800 simultaneous connections, then • NMN = (N + N/200 + 1) / 800 • The extra 1 connection is for the registry
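These sizing rules can be written down directly, with ceilings added since fractional manager processes or messaging nodes are not possible. The 200- and 800-connection capacities come from the slides:

```python
# Infrastructure sizing from the slide: NMP = N/200 manager processes,
# NMN = (N + NMP + 1)/800 messaging nodes, where the +1 connection is
# for the registry. Ceilings round up to whole processes/nodes.

from math import ceil

def infrastructure(n, per_manager=200, per_node=800):
    n_mp = ceil(n / per_manager)            # manager processes (NMP)
    n_mn = ceil((n + n_mp + 1) / per_node)  # messaging nodes (NMN)
    return n_mp, n_mn

print(infrastructure(10_000))  # (50, 13)
```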
