Adaptation of Legacy Software to Grid Services

Bartosz Baliś, Marian Bubak, and Michał Węgiel • Institute of Computer Science / ACC CYFRONET AGH, Cracow, Poland • bubak@uci.agh.edu.pl

Outline • Introduction - motivation & objectives • System architecture – static model (components and their relationships) • System operation – dynamic model (scenarios and activities) • System characteristics • Migration framework (implementation) • Performance evaluation • Use case & summary

Introduction • Legacy software: validated and optimized code; follows the traditional process-based model of computation (language- and system-dependent); scientific libraries (e.g. BLAS, LINPACK) • Service-oriented architecture (SOA): enhanced interoperability; language-independent interface (WSDL); execution within a system-neutral runtime environment (virtual machine)

Objectives • Originally: adaptation of the OCM-G to GT 3.0 • After generalization: design of a versatile architecture that bridges legacy software and SOA; implementation of a framework providing tools that facilitate migration to SOA [Figure: diagram with labels Site, Node, Tool, SM, LM, OMIS, Grid Service]

Related Work • Lack of comprehensive solutions • Existing approaches have numerous limitations and fail to meet grid requirements • Kuebler D., Einbach W.: Adapting Legacy Applications as Web Services (IBM). Main disadvantages: insecurity & inflexibility [Figure: diagram with labels Client, Web Service Container, Service, Adapter, Server]

Roadmap • Introduction - motivation & objectives • System architecture – static model (components and their relationships) • System operation – dynamic model (scenarios and activities) • System characteristics • Migration framework (implementation) • Performance evaluation • Use case & summary

General Architecture [Figure: architecture diagram. Service Requestor; Hosting Environment (Registry, Factory, Instance, Proxy Factory, Proxy Instance); Legacy System (Master, Monitor, Slave); links labelled SOAP; legend: Service, Process]

Service Requestor • From client’s perspective, cooperation with legacy systems is fully transparent • Only two services are accessible: factory and instance; the others are hidden • Standard interaction pattern is followed: • First, a new service instance is created • Next, method invocations are performed • Finally, the service instance is destroyed • We assume a thin client approach
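The create/invoke/destroy pattern above can be sketched as follows. This is a minimal illustration only: `Factory`, `Instance` and `run_client` are hypothetical in-process stand-ins, not the actual grid service stubs, and the `length` operation is borrowed from the benchmark interface mentioned later in the slides.

```python
class Instance:
    """Hypothetical stand-in for a transient service instance."""

    def __init__(self):
        self.alive = True

    def length(self, s):
        # Example operation; a real instance would forward the call to a legacy system.
        if not self.alive:
            raise RuntimeError("instance already destroyed")
        return len(s)

    def destroy(self):
        self.alive = False


class Factory:
    """Hypothetical stand-in for the permanent factory service."""

    def create_instance(self):
        return Instance()


def run_client(factory, text):
    instance = factory.create_instance()   # 1. create a new service instance
    try:
        return instance.length(text)       # 2. perform method invocations
    finally:
        instance.destroy()                 # 3. destroy the instance
```

The `try`/`finally` guarantees that the instance is destroyed even if an invocation fails, which matters when each instance ties up a backend slave process.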

Legacy System (1/4) • Constitutes an environment in which legacy software resides and is executed • Responsible for actual request processing • Hosts three types of processes: master, monitor and slave, which jointly provide a wrapper encapsulating the legacy code • Fulfills the role of network client when communicating with hosting environment (thus no open ports are introduced and process migration is possible)

Legacy System (2/4) • Master: permanent process, one per host; responsible for host registration and creation of monitor and slave processes [Figure: Legacy System with Master, Monitor and Slave; arrows labelled "creates" and "controls"]

Legacy System (3/4) • Monitor: transient process, one per client; responsible for reporting about and controlling the associated slave process [Figure: Legacy System with Master, Monitor and Slave; arrows labelled "creates" and "controls"]

Legacy System (4/4) • Slave: transient process, one per client; provides means of interface-based stateful conversation with legacy software [Figure: Legacy System with Master, Monitor and Slave; arrows labelled "creates" and "controls"]

Hosting Environment (1/5) • Maintains a collection of grid services which encapsulate interaction with legacy systems • Provides a layer of indirection shielding the service requestors from collaboration with backend hosts • Responsible for mapping between clients and slave processes (one-to-one relationship) • Mediates communication between service requestors and legacy systems

Hosting Environment (2/5) • Registry: one per service; keeps track of backend hosts which registered to participate in computations [Figure: Hosting Environment. Permanent services: Registry, Factory, Proxy Factory. Transient services: Instance, Proxy Instance]

Hosting Environment (3/5) • Factory and Proxy Factory: one per service; responsible for creation of the corresponding instances [Figure: Hosting Environment. Permanent services: Registry, Factory, Proxy Factory. Transient services: Instance, Proxy Instance]

Hosting Environment (4/5) • Instance: one per client; directly called by the client, provides externally visible functionality [Figure: Hosting Environment. Permanent services: Registry, Factory, Proxy Factory. Transient services: Instance, Proxy Instance]

Hosting Environment (5/5) • Proxy Instance: one per client; responsible for mediation between backend host and service client [Figure: Hosting Environment. Permanent services: Registry, Factory, Proxy Factory. Transient services: Instance, Proxy Instance]

Resource Management (1/2) • Resources = processes (master/monitor/slave) • Registry service maintains a pool of master processes which can be divided into: • static part – configured manually by site administrators (system boot scripts) • dynamic part – managed by means of job submission facility (GRAM) • Optimization: coarse-grained allocation and reclamation performed in advance in the background (efficiency, smooth operation)

Resource Management (2/2) • Coarse-grained resource = master process • Fine-grained resource = monitor & slave process [Figure: allocation diagram. Coarse-grained allocation (c), steps c.1 to c.5; fine-grained allocation (f), steps f.1 and f.2. Components: Registry, Resource Broker, Information Services, Data Management, Job Submission, Master, Monitor/Slave]
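One way to picture a pool with a static and a dynamic part is sketched below. It is an assumption-laden illustration: `MasterPool`, `min_idle` and the `submit_job` callable are hypothetical names, `submit_job` merely stands in for a job submission facility such as GRAM, and replenishment runs synchronously here rather than in the background.

```python
class MasterPool:
    """Hypothetical pool of idle master processes kept by the registry."""

    def __init__(self, static_masters, submit_job, min_idle):
        if min_idle < 1:
            raise ValueError("min_idle must be at least 1")
        self.idle = list(static_masters)   # static part: configured by administrators
        self._submit_job = submit_job      # dynamic part: stand-in for job submission
        self._min_idle = min_idle

    def replenish(self):
        # Coarse-grained allocation performed in advance: top the pool up
        # before clients actually need the masters.
        while len(self.idle) < self._min_idle:
            self.idle.append(self._submit_job())

    def assign(self):
        # Fine-grained allocation would follow on the chosen master
        # (creation of a monitor and a slave process).
        self.replenish()
        master = self.idle.pop(0)
        self.replenish()
        return master
```

A real registry would also reclaim surplus dynamic masters when demand drops; that half is omitted to keep the sketch short.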

Invocation patterns • Apart from the synchronous and sequential mode of method invocation, our solution supports: • Asynchronism – assumed to be embedded into legacy software; our approach: invocation returns immediately and a separate thread is blocked on a complementary call waiting for the output data to appear • Concurrency – slave processes handle each client request in a separate thread • Transactions – the most general model of concurrent nested transactions is assumed
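The asynchronous pattern (an invocation that returns immediately plus a complementary blocking call served by a separate thread) might look roughly like this. `AsyncLegacyCall`, `call_async` and the callback are hypothetical names chosen for the sketch; the legacy computation is simulated by an ordinary Python callable.

```python
import queue
import threading


class AsyncLegacyCall:
    """Hypothetical asynchronous operation backed by legacy code."""

    def __init__(self, work):
        self._work = work
        self._output = queue.Queue(maxsize=1)

    def invoke(self, arg):
        # Returns immediately; the result is produced in the background.
        worker = threading.Thread(
            target=lambda: self._output.put(self._work(arg)), daemon=True
        )
        worker.start()

    def wait_for_output(self):
        # Complementary call: blocks until the output data appears.
        return self._output.get()


def call_async(call, arg, on_result):
    """Start the call and dedicate a separate thread to waiting for its output."""
    call.invoke(arg)
    waiter = threading.Thread(target=lambda: on_result(call.wait_for_output()))
    waiter.start()
    return waiter
```

The caller stays responsive because only the dedicated waiter thread blocks; joining the returned thread is needed only when the caller wants to be sure the callback has run.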

Legacy Side Scenarios (1/2) • Client assignment - the master process repeatedly volunteers to participate in request processing (reporting host CPU load). When the registry service assigns a client before the timeout occurs, new monitor and slave processes are created. • Request processing – comprises input retrieval, request processing and output delivery. • System self-monitoring - the monitor process periodically reports to the proxy instance the status of the slave process and current CPU load statistics (both system- and slave-related).
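The client assignment scenario can be approximated by a polling loop in which the master is the network client. The sketch below is illustrative only: `Registry`, `volunteer`, `request_instance` and `master_loop` are hypothetical names, the registry is an in-process object rather than a remote service, and process creation is represented by plain strings.

```python
import queue


class Registry:
    """Hypothetical registry holding clients that await a backend host."""

    def __init__(self):
        self._pending = queue.Queue()
        self.loads = {}

    def request_instance(self, client_id):
        self._pending.put(client_id)

    def volunteer(self, host, cpu_load, timeout):
        # Record the reported load (a real registry could use it for load
        # balancing), then wait up to `timeout` seconds for a client.
        self.loads[host] = cpu_load
        try:
            return self._pending.get(timeout=timeout)
        except queue.Empty:
            return None   # timeout: the master simply volunteers again


def master_loop(registry, host, read_cpu_load, rounds, timeout=0.01):
    """Volunteer `rounds` times; spawn a monitor/slave pair per assigned client."""
    spawned = []
    for _ in range(rounds):
        client = registry.volunteer(host, read_cpu_load(), timeout)
        if client is not None:
            spawned.append((f"monitor:{client}", f"slave:{client}"))
    return spawned
```

Because the master always initiates the exchange, the backend host needs no open incoming port, which is the property the slides rely on for security and migration.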

Legacy Side Scenarios (2/2) [Figure: sequence diagram among Registry, Master, Monitor, Slave and Proxy Instance. Messages: Assign, Create (Monitor), Create (Slave), Request, Response, Heartbeat, Destroy. Guards: [success], [timeout], [continue], [migration]]

Client Side Scenarios (1/2) • Instance construction - involves two steps: • Creation of the associated proxy instance, • Assignment of one of the currently registered master processes. • Method invocation - client call is forwarded to the proxy instance, from where it is fetched by the associated slave process; the requestor is blocked until the response arrives. • Instance destruction - destruction request is forwarded to the associated proxy instance.
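The method invocation path (call parked at the proxy instance, fetched by the slave, requestor blocked until the response arrives) can be sketched with two queues. `ProxyInstance`, `fetch_request` and `slave_serve_one` are hypothetical names, and handing the reply queue to the slave is a simplification of whatever output delivery call the real system uses.

```python
import queue
import threading


class ProxyInstance:
    """Hypothetical proxy instance mediating between requestor and slave."""

    def __init__(self):
        self._requests = queue.Queue()

    def invoke(self, method, *args):
        # Called on behalf of the requestor: park the request and block
        # until the slave delivers the output.
        reply = queue.Queue(maxsize=1)
        self._requests.put((method, args, reply))
        return reply.get()

    def fetch_request(self):
        # Called by the slave, which acts as the network client (pull model).
        return self._requests.get()


def slave_serve_one(proxy, legacy_methods):
    """Input retrieval, request processing and output delivery for one request."""
    method, args, reply = proxy.fetch_request()
    reply.put(legacy_methods[method](*args))
```

For example, with a slave thread running `slave_serve_one(proxy, {"length": len})`, a requestor calling `proxy.invoke("length", "hello")` blocks until the slave has fetched and answered the request.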

Client Side Scenarios (2/2) [Figure: sequence diagram among Factory, Proxy Factory, Registry, Instance and Proxy Instance. Messages in order: Create, New (Instance), Create, New (Proxy Instance), Assign, Invoke, Invoke, Destroy, Destroy]

Process Migration (1/5) • Indispensable when we need to: • dynamically offload work onto idle machines (automatic load-balancing) • silently mask recovery from system failures (transparent fail-over) • Challenges: state extraction & reconstruction • Low-level approach • Suitable only for homogeneous environment (e.g. cluster of workstations) • Supported by our solution since legacy systems act as clients rather than servers

Process Migration (2/5) • High-level approach • Can be employed in heterogeneous environment • State restoration is based on the combination of checkpointing and repetition of the short-term method invocation history • Requires additional development effort (state serialization, snapshot dumping and loading) • Proxy instance initiates high-level recovery upon detection of failure (lost heartbeat) or overload • Only slave and monitor processes are transferred onto another computing node
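High-level state restoration (latest checkpoint plus replay of the short-term invocation history) can be illustrated as below. This is a sketch under assumptions: `RecoverableState` and `add` are hypothetical names, snapshots are taken with `copy.deepcopy` instead of real serialization, and invocations are assumed to be repeatable.

```python
import copy


class RecoverableState:
    """Hypothetical bookkeeping that a proxy instance could keep per slave."""

    def __init__(self, state):
        self.state = state
        self._checkpoint = copy.deepcopy(state)
        self._history = []

    def invoke(self, method, *args):
        # Record the call so it can be repeated after a failure.
        self._history.append((method, args))
        return method(self.state, *args)

    def checkpoint(self):
        # Snapshot dumping: the history before this point is no longer needed.
        self._checkpoint = copy.deepcopy(self.state)
        self._history.clear()

    def recover(self):
        # Snapshot loading on another node, then replay of the recorded calls.
        state = copy.deepcopy(self._checkpoint)
        for method, args in self._history:
            method(state, *args)
        return state


def add(state, amount):
    """Example repeatable operation on the legacy state."""
    state["total"] += amount
    return state["total"]
```

The more recent the checkpoint, the shorter the history that has to be replayed, which is the trade-off behind recording multiple snapshots.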

Process Migration (3/5) • Selection of the optimal state reconstruction scenario is based on transaction flow and checkpoint sequence (multiple state snapshots are recorded and the one enabling the fastest recovery is chosen) [Figure: timeline of committed, unfinished and aborted transactions with checkpoint and failure point markers; legend: transaction omitted, transaction repeated]
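A simplified cost model for choosing among several snapshots is sketched below. The model is an assumption, not the authors' algorithm: each checkpoint has a hypothetical `load_cost`, each invocation a hypothetical replay `cost`, invocations from aborted transactions are omitted from replay, and checkpoints are assumed not to contain partial effects of aborted transactions.

```python
def recovery_cost(checkpoint, invocations):
    """Cost of loading a snapshot plus replaying what happened after it."""
    replay = sum(
        inv["cost"]
        for inv in invocations
        if inv["time"] > checkpoint["time"] and inv["status"] != "aborted"
    )
    return checkpoint["load_cost"] + replay


def best_checkpoint(checkpoints, invocations):
    """Pick the snapshot that enables the fastest recovery."""
    return min(checkpoints, key=lambda cp: recovery_cost(cp, invocations))


checkpoints = [
    {"name": "A", "time": 0, "load_cost": 1},
    {"name": "B", "time": 5, "load_cost": 4},
]
invocations = [
    {"time": 2, "cost": 5, "status": "committed"},
    {"time": 6, "cost": 2, "status": "aborted"},
    {"time": 7, "cost": 1, "status": "unfinished"},
]
chosen = best_checkpoint(checkpoints, invocations)
```

With these made-up numbers, recovering from A costs 1 + 5 + 1 = 7 and from B costs 4 + 1 = 5, so B is chosen even though its snapshot is more expensive to load.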

Process Migration (4/5) • CPU load generated by the slave process (as reported by the monitor process) is approximated as a function of time and used to estimate the cost of invocations • Symbols: c = total cost, f = frequency, l = CPU load, t = time

Process Migration (5/5) • In case of concurrent method invocations, emulation of synchronization mechanisms employed on the client side is necessary • Timing data is gathered (method invocation start & end timestamps), • If two operations overlapped in time, they are executed concurrently (otherwise sequentially). • Prerequisite: repeatable invocations (unless system state was changed, in response to the same input data identical results are expected to be obtained).
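The rule "overlapping invocations are replayed concurrently, the rest sequentially" can be sketched as a grouping step over the recorded timestamps. `replay_batches` is a hypothetical helper; it groups transitively (an invocation joins a batch if it starts before the latest end time seen in that batch), which is a simplifying assumption.

```python
def replay_batches(invocations):
    """Group (name, start, end) records into batches to be replayed in order.

    Invocations inside one batch overlapped in time and would be executed
    concurrently; consecutive batches would be executed sequentially.
    """
    batches = []
    batch_end = None
    for name, start, end in sorted(invocations, key=lambda inv: inv[1]):
        if batches and start < batch_end:
            batches[-1].append(name)          # overlaps the current batch
            batch_end = max(batch_end, end)
        else:
            batches.append([name])            # starts after the batch finished
            batch_end = end
    return batches


history = [("a", 0, 5), ("b", 3, 8), ("c", 9, 10)]
batches = replay_batches(history)
```

Here `a` and `b` overlapped and land in the same batch, while `c` started after both had finished and is replayed on its own afterwards.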

System Features (1/3) • Non-functional requirements: • QoS-related (the manner in which the service is provided): performance & dependability, • TCO-related (expenses incurred by system maintenance): scalability & expandability. • Efficiency – coarse-grained resource allocation; pool of master processes always reflects actual needs; algorithms have linear time complexity; checkpointing and transactions jointly allow for selection of the optimal recovery scenario.

System Features (2/3) • Availability – fault-tolerance based on both low-level and high-level process migration; failure detection and self-healing; checkpointing allows for robust error recovery; in the worst case A = 50% (when the whole call history needs to be repeated we have MTTF = MTTR). • Security – no open incoming ports on backend hosts are introduced; authentication of legacy systems is possible; we rely upon the grid security infrastructure provided by the container.
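The worst-case figure is consistent with the standard steady-state availability definition. The slide does not spell the formula out, so reading it this way is an assumption:

```latex
A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}
\quad\Longrightarrow\quad
A\Big|_{\mathrm{MTTF} = \mathrm{MTTR}}
  = \frac{\mathrm{MTTF}}{2\,\mathrm{MTTF}}
  = \frac{1}{2} = 50\%
```

Intuitively, if the whole call history must be repeated, recovery takes as long as the work that preceded the failure, so the system is available half of the time.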

System Features (3/3) • Scalability - processing is highly distributed and parallelized (all tasks are always delegated to legacy systems); load balancing is guaranteed (by registry and proxy instance); job submission mechanism is exploited (resource brokering). • Versatility - no assumptions are made as regards programming language or run-time platform; portability; non-intrusiveness (no legacy code alteration needed); standards-compliance and interoperability.

Migration Framework (1/2) • Code-named L2G (Legacy To Grid) • Based on GT 3.2 (hosting environment) and gSOAP 2.6 (legacy system) • Objective: to facilitate the adaptation of legacy C/C++ software to GT 3.2 services by automatic code generation (with particular emphasis on ease of use and universality) • Structural and operational compliance with the proposed architecture • Served as a proof of concept of our solution

Migration Framework (2/2) • Most typical development cycle: • Programmer specifies the interface that will be exposed by the deployed service (Java) • Source code generation takes place (Java/C++/XML/shell scripts) • Programmer provides the implementation for the methods on legacy system side (C++) • Support for process migration, checkpointing, transactions, MPI (parallel machine consists of multiple slave processes one of which is in charge of communication with proxy instance)

Performance evaluation (1/5) • Benchmark: comparison of two functionally equivalent grid services (the same interface) one of which was dependent on legacy system • Both services were exposing a single operation: int length (String s); • Time measurement was performed on the client side; all components were located on a single machine; no security mechanism was employed; relative overhead was estimated

Performance evaluation (2/5) • Measurement results for method invocation [Figure: plot] • Model: time = length/bandwidth + latency

Performance evaluation (3/5) • Measurement results for instance construction [Figure: plot] • Model: time = iterations/throughput

Performance evaluation (4/5) • Measurement results for instance destruction [Figure: plot] • Model: time = iterations/throughput

Performance evaluation (5/5)
• Instance construction and destruction
  Scenario       Ordinary service     Legacy service       Relative change
  Construction   6.2 iterations/s     2.0 iterations/s     Reduced 3.1x
  Destruction    25.4 iterations/s    12.2 iterations/s    Reduced 2.1x
• Method invocation
  Quantity       Ordinary service     Legacy service       Relative change
  Bandwidth      909.1 kB/s           370.4 kB/s           Reduced 2.5x
  Latency        15.4 ms              37.8 ms              Increased 2.5x
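To see what the measured bandwidth and latency mean for a single call, the model time = length/bandwidth + latency can be evaluated directly. The 100 kB payload is a hypothetical value chosen for illustration; the bandwidth and latency figures are the ones reported on this slide.

```python
def invocation_time(length_kb, bandwidth_kb_s, latency_ms):
    """Estimated invocation time in seconds: length/bandwidth + latency."""
    return length_kb / bandwidth_kb_s + latency_ms / 1000.0


PAYLOAD_KB = 100  # hypothetical message size, not taken from the slides

ordinary = invocation_time(PAYLOAD_KB, 909.1, 15.4)   # roughly 0.125 s
legacy = invocation_time(PAYLOAD_KB, 370.4, 37.8)     # roughly 0.308 s
slowdown = legacy / ordinary                          # roughly 2.5
```

Since both bandwidth and latency degrade by about the same factor, the estimated slowdown stays close to 2.5x regardless of the payload size.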

Use Case: OCM-G • Grid application monitoring system composed of two components: Service Manager (SM) and Local Monitor (LM), compliant with the OMIS interface [Figure: deployment diagram with labels Site, Node, SM, LM, MCI, SOAP, Instance, Proxy Instance, Slave]

Summary • We elaborated a universal architecture for integrating legacy software into the grid services environment • We demonstrated how to implement our concept on top of existing middleware • We developed a framework (comprising a set of command-line tools) which automates the migration of C/C++ codes to GT 3.2 • Further work: WSRF, message-level security, optimizations, support for real-time applications

More info: www.icsr.agh.edu.pl/lgf/ • See also: www.eu-crossgrid.org and www.cyfronet.krakow.pl/ICCS2004/
