Explore the management of dynamic metadata and context in distributed systems for correlating activities, managing events, and enabling real-time capabilities. Discover the application use domains, motivation, and characteristics of different domains.
Managing Dynamic Metadata and Context • Mehmet S. Aktas <maktas@cs.indiana.edu>
Outline • Introduction • Problem Statement, Hypothesis, Design Goals • Literature Survey • Research Issues • Milestones • Contributions • Summary
Context • Def: "Context is any information that can be used to characterize the situation of an entity, where an entity can be a person, place, or computational object." (Dey A. et al., 1999) • Context is metadata associated with both services and their activities • Context can be • independent of any interaction • static context • Examples: type or endpoint of a service, less likely to change • dynamic context • Examples: throughput of a service, likely to change over time • generated as a result of interaction • information associated with an activity or session • Examples: session-id, URI of the coordinator of a session
Gaggle of Services • A Gaggle of Services • is a set of actively collaborating managed services put together for a particular functionality, such as collaboration, visualization, or a sensor Grid • collaborates toward a particular common goal • Example: emergency preparedness and response • actively generates events as a result of interactions • is a very small part of the whole Grid
Motivation • Current Grid Information Services provide information describing services independent of their interactions. • We need management of all information associated with services for: • correlating activities of widely distributed services • workflow-style, SOA-based applications • management of events, especially in multimedia collaboration • distributed session management • for instance: audio, video, and audio/video meetings in the Chinese Olympics
Motivation II • More reasons for management of Context • enabling uniform query capabilities over both dialog and monolog context information • "Give me the list of services satisfying QoS requirements C:{a,b,c..} and participating in sessions S:{x,y,z..}" • enabling real-time replay/playback capabilities in collaboration-based sessions • enabling session failure recovery
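The sample query on this slide combines interaction-independent constraints (QoS) with conversation-based ones (session membership). The following is a minimal sketch of such a uniform query, assuming a simple in-memory registry; `ServiceRecord`, `find_services`, and the field names are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    """Hypothetical record holding both kinds of context for one service."""
    uri: str
    qos: dict = field(default_factory=dict)     # interaction-independent context
    sessions: set = field(default_factory=set)  # conversation-based context


def find_services(records, min_qos, required_sessions):
    """Return URIs of services that meet every QoS minimum
    and participate in every listed session."""
    required = set(required_sessions)
    matches = []
    for rec in records:
        # A missing QoS attribute is treated as not satisfying the minimum.
        qos_ok = all(rec.qos.get(name, float("-inf")) >= minimum
                     for name, minimum in min_qos.items())
        if qos_ok and required <= rec.sessions:
            matches.append(rec.uri)
    return matches


# Illustrative data only.
records = [
    ServiceRecord("svc-A", {"throughput": 120}, {"x", "y"}),
    ServiceRecord("svc-B", {"throughput": 40}, {"x", "y"}),
    ServiceRecord("svc-C", {"throughput": 200}, {"x"}),
]
result = find_services(records, {"throughput": 100}, ["x", "y"])
```

Only `svc-A` satisfies both the QoS minimum and the session membership, so a single call answers a question that would otherwise need two separate lookups.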
Application Use Domain • Multimedia Collaboration domain: Global MMCS • multiple A/V services talk to various collaboration clients and services • defines a general session collaboration protocol (XGSP) • XGSP enables different collaboration tools to talk to each other, e.g. AccessGrid, H.323 • needs a distributed session management system • Characteristics of the domain • widely distributed services • metadata of events (archival data) • mostly read-only • persistent, but lifetime is bound to the lifetime of events
Application Use Domain - II • Workflow-style distributed application: Geographic Information System Grid • sensor grid data services generate events when an event of a certain magnitude occurs • firing off various codes, filtering, analyzing raw data, generating images, maps • needs distributed context management to correlate workflow activities • Characteristics of the domain • any number of widely distributed services can be involved • conversation metadata • transient • multiple writers
[Figure labels: 1, WMS GUI, WFS]
<context xsd:type="ContextType" timeout="100">
  <context-id>http://../abcdef:012345</context-id>
  <context-service>http://.../HPSearch</context-service>
  <content>http://danube.ucs.indiana.edu:8080\x.xml</content>
</context>
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <parent-context>http://../abcdef:012345</parent-context>
  <content> shared data for HPSearch activity </content>
  <activity-list mustUnderstand="true" mustPropagate="true">
    <service>http://.../DataFilter1</service>
    <service>http://.../PICode</service>
    <service>http://.../DataFilter2</service>
  </activity-list>
</context>
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <parent-context>http://../abcdef:012345</parent-context>
  <content> profile information related to WMS </content>
</context>
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../WMS</context-service>
  <activity-list mustUnderstand="true" mustPropagate="true">
    <service>http://.../WMS</service>
    <service>http://.../HPSearch</service>
  </activity-list>
</context>
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <content> HPSearch associated additional data generated during execution of workflow. </content>
</context>
[Figure labels: 2, http://..../..../..txt]
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www.w3...">
  <soap:Header encodingStyle="WSCTX URL" mustUnderstand="true">
    <context xmlns="ctxt schema" timeout="100">
      <context-id>http..</context-id>
      <context-service> http.. </context-service>
      <context-manager> http.. </context-manager>
      <activity-list mustUnderstand="true" mustPropagate="true">
        <p-service>http://../WMS</p-service>
        <p-service>http://../HPSearch</p-service>
      </activity-list>
    </context>
  </soap:Header>
  ...
[Figure labels: 4; 5,6,7; Data Filter; 3,9; HP Search; PI Code; 8; Data Filter; http://..../..../tmp.xml]
Context Information Service • session associated dynamic metadata • user profile • activity associated dynamic metadata • service associated dynamically generated metadata
What are examples of dynamically generated metadata in a real-life application? [Figure labels: session, shared state, service associated, user profile, SOAP header for Context, activity]
3,4: WMS starts a session and invokes HPSearch with a session id to run the workflow script for PI Code
5,6,7: HPSearch runs the workflow script and generates an output file in GML format (and PDF format) as the result
8: HPSearch writes the URI of the output file into Context
9: WMS polls the information from the Context Service
10: WMS retrieves the output file generated by the workflow script and generates a map
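Steps 8 and 9 of this walkthrough amount to one party writing into a shared context and another polling it. The sketch below illustrates that exchange under stated assumptions: `ContextService` is a hypothetical in-memory stand-in (not the WS-Context API), and the session id and file name are placeholders.

```python
import time


class ContextService:
    """Hypothetical in-memory stand-in for the Context Information Service."""

    def __init__(self):
        self._store = {}

    def set_content(self, context_id, content):
        self._store[context_id] = content

    def get_content(self, context_id):
        return self._store.get(context_id)


def poll_for_output(service, context_id, attempts=5, delay=0.0):
    """Poll the context until content appears or attempts run out."""
    for _ in range(attempts):
        uri = service.get_content(context_id)
        if uri is not None:
            return uri
        time.sleep(delay)
    return None


ctx = ContextService()
session_id = "session-1"                    # placeholder id

# Before the workflow finishes, polling finds nothing.
before = poll_for_output(ctx, session_id, attempts=2)

# Step 8: the workflow engine records where its output lives.
ctx.set_content(session_id, "output.gml")   # placeholder file name

# Step 9: the client polls and obtains the output location.
after = poll_for_output(ctx, session_id)
```

Polling keeps the two services decoupled: neither needs the other's endpoint, only the shared context id.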
Problem Statement • What is a novel process for building Information Services that maintain dynamic session-related metadata of widely distributed services and provide a uniform interface to both interaction-independent and conversation-based context?
Hypothesis • A fault-tolerant, high-performance, scalable information system • maintaining widely distributed, dynamically generated metadata for Gaggles of Services • providing a uniform interface to context information • utilizing existing Grid Information Services for interaction-independent context to improve search capabilities • enabling coordination of widely distributed services in Gaggles • workflow-style Grid applications • enabling distributed event management and various capabilities for A/V conferencing applications • discovery of entities in a session • enabling playback/replay capabilities • enabling session failure recovery
Architectural Design Goals • Key Design Goals • scalability • with respect to the number of widely distributed services • performance • high responsiveness, reduced access latency • fault tolerance • high availability of information • robustness to replica crashes • flexibility • accommodate a broad range of application domains • read-dominated, read/write-dominated
Literature Survey • Mainstream Grid Information Services • MDS, R-GMA, UDDI (Grimoires) • Specifications for stateful service interactions • WS-CAF, WSRF, WS-Metadata Exchange • Linda TupleSpaces coordination model
Limitations in Grid Information Services • Lack of support for session-related dynamic metadata • MDS4 adopts the WSRF approach, which does not scale when managing activities of multiple services sharing the same state • Lack of support for advanced query capabilities • ex: "Give me the list of WFS services participating in the 'fault displacement calculations' workflow session, where each service is connected by a network path with over 2MB/sec of bandwidth and at most 100 msec of latency."
WS-CAF WS-Context - Key Concepts • WS Composite Application Framework (WS-CAF) • WS-Context, WS-Coordination, WS-Transaction Management • WS-Context • defines context, context service and mapping on SOAP • shared data to correlate service activities • context information dependent on the type of the activity • transactional activity: the URI of the coordinator in a session • context service maintains associated context • participants of an activity register with context service for lifecycle of that activity
Web Service Resource Framework - Key Concepts • defines standard interfaces and behaviors for distributed system integration • standard XML-based information model • standard interfaces for push and pull mode access to service data • enables every service to expose state data for query, update • monitoring shared state • models resource state as private to a service • supports resource oriented approach for stateful interactions • requires the identity of the resource to be passed in the SOAP message
WS-Metadata Exchange - Key Concepts • WS metadata is key to interactions • WS-Policy: capabilities, requirements, general characteristics of services • WSDL: describes message operations and the network protocols supported by services • WS-Metadata Exchange • provides a mechanism for sharing information about the capabilities of individual Web services • allows querying a WS endpoint to retrieve the metadata needed to interact with it • defines request/response message pairs to retrieve WS metadata
Limitations in Specifications for Service Communication • WSRF does not actually accomplish state management by just enabling access and update rights • heterogeneous service environment • workflow-style applications • WSRF and WS-Metadata Exchange model service metadata as private to a service • does not scale in managing activities of multiple services • WS-Metadata Exchange defines only how to access interaction-independent metadata • WS-Context is promising, but it has limitations • simple framework for context management • limited query capability • does not address distributed management aspects of context metadata
TupleSpaces Paradigm • a communication paradigm • space-based asynchronous communication • first described in the Linda project in 1982 at Yale • pioneered by David Gelernter • Linda is a coordination language using primitive operations on shared data in a shared space • data-centric coordination model • communication units are tuples • a tuple is a data structure consisting of one or more typed fields • a TupleSpace is an intermediary container
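The primitive operations on a shared space can be illustrated with a toy, single-process tuple space. This is a minimal sketch with several simplifying assumptions: it implements only `out` and the non-blocking predicate forms `rdp` and `inp` (not the blocking `rd` and `in`), it matches on values with `None` as a wildcard rather than on field types, and the `TupleSpace` class and sample tuples are illustrative.

```python
class TupleSpace:
    """Toy single-process tuple space with Linda-style primitives."""

    def __init__(self):
        self._tuples = []

    def out(self, tup):
        """Write a tuple into the space."""
        self._tuples.append(tuple(tup))

    @staticmethod
    def _matches(template, tup):
        # None in the template acts as a wildcard for that field.
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup)
        )

    def rdp(self, template):
        """Return a matching tuple without removing it, or None."""
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def inp(self, template):
        """Remove and return a matching tuple, or None."""
        for i, tup in enumerate(self._tuples):
            if self._matches(template, tup):
                return self._tuples.pop(i)
        return None


space = TupleSpace()
space.out(("session", "s1", "coordinator-uri"))
space.out(("session", "s2", "other-uri"))

read = space.rdp(("session", "s1", None))    # associative lookup, tuple stays
taken = space.inp(("session", "s1", None))   # same lookup, tuple is removed
gone = space.rdp(("session", "s1", None))    # nothing left to match
```

Writers and readers never address each other directly; they only agree on the shape of the tuples, which is what makes the model data-centric.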
JavaSpaces [Sun Microsystems] • JavaSpaces is object-oriented • strongly influenced by the Linda model • Java-based, platform independent • spaces are transactionally secure • mutually exclusive access to objects • spaces are persistent • temporal, spatial uncoupling • spaces are associative • content-based search • limitations • centralized • inefficient reading/writing performance • dependent on a stack of different software layers
Research Issues • Recap on key design goals: • scalability, performance, fault tolerance • research issues related to replicating dynamic metadata • deployment (dynamic vs. static replication) • Where to place replicas of given context metadata? • What properties must a new location meet? • How do we know whether a replica location is stable? • How can we provide tailored replication based on R/W properties?
Research Issues II • consistency • What is the appropriate consistency model? • How, and in which direction, do replicas exchange updates? • How can we utilize an ordering capability based on NTP (Network Time Protocol) to provide consistency on the replicated context metadata? • performance • efficient metadata access • How to choose a replica server to best serve a client request? • How to avoid performance degradation due to repetitive queries?
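One possible reading of the NTP-based ordering question is a timestamp-ordered, last-writer-wins scheme. The sketch below illustrates that idea only; it is not presented as the approach chosen in this work. It assumes timestamps come from NTP-synchronized clocks and uses a replica id as a tie-breaker, and the names `Update` and `Replica` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    key: str
    value: str
    timestamp: float   # assumed to come from an NTP-synchronized clock
    replica_id: str    # tie-breaker when timestamps are equal


class Replica:
    """Keeps, per key, the update with the greatest (timestamp, replica_id)."""

    def __init__(self):
        self._latest = {}

    def apply(self, update):
        """Apply an update; return True if it replaced the stored one."""
        current = self._latest.get(update.key)
        if current is None or (
            (update.timestamp, update.replica_id)
            > (current.timestamp, current.replica_id)
        ):
            self._latest[update.key] = update
            return True
        return False

    def read(self, key):
        update = self._latest.get(key)
        return None if update is None else update.value


u1 = Update("ctx-1", "v1", 10.0, "A")
u2 = Update("ctx-1", "v2", 12.5, "B")

# The two replicas receive the same updates in opposite orders.
r1, r2 = Replica(), Replica()
for u in (u1, u2):
    r1.apply(u)
for u in (u2, u1):
    r2.apply(u)
```

Because each replica keeps the update with the greatest timestamp, both converge to the same value regardless of delivery order. The scheme is only as good as the clock synchronization it relies on.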
Research Issues III • scalability • load balancing strategies • How to manage load balancing? • other research issues • replay/playback capabilities • How to enable real-time replay/playback capabilities? • session recovery • How to enable session recovery? • uniform interface to context • How to provide a uniform interface to context?
Milestones • Implementation of TupleSpaces paradigm • Uniform Update and Query (search, discovery) Services • Sequencer Service • ensures that an order is imposed on actions/events that take place in a session
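A sequencer that imposes an order on session events can be sketched as a per-session counter plus an ordered log. This is a minimal single-process illustration; the `Sequencer` class and its method names are assumptions, not part of the described system.

```python
import itertools
from collections import defaultdict


class Sequencer:
    """Assigns increasing sequence numbers to events, per session."""

    def __init__(self):
        # Each session gets its own counter starting at 0.
        self._counters = defaultdict(itertools.count)
        self._log = defaultdict(list)

    def submit(self, session_id, event):
        """Stamp the event with the next sequence number and log it."""
        seq = next(self._counters[session_id])
        self._log[session_id].append((seq, event))
        return seq

    def replay(self, session_id):
        """Return the session's events in the order they were sequenced."""
        return [event for _, event in self._log[session_id]]


seq = Sequencer()
first = seq.submit("s1", "join:alice")
second = seq.submit("s1", "audio:start")
other = seq.submit("s2", "join:bob")
history = seq.replay("s1")
```

Keeping the ordered log is what makes replay/playback and session recovery possible later: the same events can be re-delivered in the same order.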
Milestones II • Storage (Replication) Service • decide the number and placement of replicas • enable autonomous behavior • support robust behavior for replica crashes • Access (Request Distribution) Service • distribute requests among object replicas • Expeditor Service • generalized caching mechanism • reduce storage access due to repetitive queries
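The Expeditor's goal of reducing storage access for repetitive queries can be illustrated with a small read-through cache. This is a sketch under assumptions: `Storage` is a hypothetical backing store that counts reads, and invalidating the cached entry on update is one simple policy among several.

```python
class Storage:
    """Hypothetical backing store that counts how often it is read."""

    def __init__(self):
        self.data = {}
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.data.get(key)

    def write(self, key, value):
        self.data[key] = value


class Expeditor:
    """Read-through cache in front of the storage service."""

    def __init__(self, storage):
        self._storage = storage
        self._cache = {}

    def query(self, key):
        # Only the first query for a key reaches the storage.
        if key not in self._cache:
            self._cache[key] = self._storage.read(key)
        return self._cache[key]

    def update(self, key, value):
        # Write through, then drop the stale cached entry.
        self._storage.write(key, value)
        self._cache.pop(key, None)


storage = Storage()
expeditor = Expeditor(storage)
expeditor.update("ctx-1", "v1")
a = expeditor.query("ctx-1")
b = expeditor.query("ctx-1")       # served from cache
reads_after_repeat = storage.reads
expeditor.update("ctx-1", "v2")
c = expeditor.query("ctx-1")       # cache was invalidated, storage read again
```

For read-dominated metadata this trades a little memory for far fewer storage accesses; for write-heavy, transient metadata the invalidations erode most of the benefit.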
Evaluation of Hypothesis • Qualitative evaluation • Does the system deliver what it promises in terms of functionality? • Example test domains: Geographical Information System Grid, Global MMCS • How does the system function in case of replica crashes? • Quantitative evaluation • How well does the system deliver what it promises in terms of performance? • What are the performance costs and gains that come with scalability and fault tolerance? • trade-offs between fault tolerance, scalability and performance • what limitations do the trade-offs impose on the practical use of the system? • what is the number of replicas needed for a certain availability? • what is the cost of fault tolerance? • what is the cost of scalability?
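For the question of how many replicas a given availability requires, a back-of-the-envelope model can serve as a starting point. The assumptions here are not from the slides: each replica is available independently with probability p, and the information is available as long as at least one replica is up.

```latex
A(n) = 1 - (1 - p)^{n}
\qquad\Longrightarrow\qquad
n \;\ge\; \frac{\ln\bigl(1 - A_{\text{target}}\bigr)}{\ln(1 - p)}
```

For example, with p = 0.9 and a target availability of 0.999, the bound gives n ≥ ln(0.001) / ln(0.1) = 3, so three replicas suffice under this model. Correlated failures, such as a shared network partition, violate the independence assumption and would require more replicas than the bound suggests.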
Contribution of this Thesis • Identifies a novel approach for building Information Services that manage session-related context. • Identifies a novel approach for providing fault tolerance and scalability while maintaining high performance when managing dynamic metadata. • Identifies a dynamic replication mechanism for widely distributed dynamic and transient metadata.
Summary • This thesis addresses the following problems • Lack of support in Grid Information Services for context (session-related dynamic metadata) management to correlate activities in workflow-style applications: • by providing a novel approach for management of widely distributed, shared session-related dynamic metadata • Lack of support in Grid Information Services for distributed session management: • by providing a distributed event management system enabling session failure recovery and replay/playback capabilities • Lack of search capabilities in Grid Information Services: • by providing a uniform search interface to both interaction-independent and conversation-based context, enabling service discovery through events