Construction & Integration of Distributed Systems

Jerry James, Oct. 30, 2000

Outline
• Constructing Distributed Systems
  • Maintainability
  • Simulation & debugging
• Integrating Distributed Systems
  • Predicting integration effects
  • Optimizing integrated systems

Constructing distributed systems
• Explosion of heterogeneous distributed systems:
  • E-commerce (e.g., web-based businesses)
  • Distributed databases (e.g., reservation systems)
  • Collaborative systems (e.g., shared calendars)
• Proposed solutions: CORBA, DCOM, Java RMI
  • Complex APIs, difficult interaction
• Need for fault tolerance
• Overlap communication and computation for performance reasons (e.g., with multithreading)
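The last bullet can be illustrated with a small Java sketch. It assumes a hypothetical `fetchRemote` method standing in for a remote call (in a real system this might be a CORBA, DCOM, or RMI invocation); here it only sleeps to simulate network latency. The request is started on a background thread with `CompletableFuture`, and local computation proceeds while the reply is in flight.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class OverlapDemo {
    // Hypothetical stand-in for a remote request; sleeps to simulate latency.
    static int fetchRemote(int key) {
        try {
            Thread.sleep(100); // simulated network round trip
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return key * 10;
    }

    // Purely local computation: sum of 0 .. n-1.
    static long localWork(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Start the request on a background thread, compute locally while it is
    // in flight, and wait for the reply only when the result is needed.
    static long overlapped(int key, int n) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<Integer> reply =
                    CompletableFuture.supplyAsync(() -> fetchRemote(key), pool);
            long local = localWork(n);   // overlaps with the outstanding request
            return local + reply.join(); // blocks only if the reply is not yet here
        } finally {
            pool.shutdown();
        }
    }
}
```

`OverlapDemo.overlapped(7, 1000)` returns 499570 (the local sum 499500 plus the simulated reply 70); its elapsed time is governed by the longer of the two activities rather than their sum.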

The Maintainability Problem
• Complex APIs make programming more difficult, and they increase software maintenance costs even more
• The need to conform to a programming model often precludes performance-enhancing optimizations
• Making software fault tolerant further increases its complexity
• Asynchrony (e.g., from multithreading) can be dealt with, but procedures that are always safe are rarely efficient
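The trade-off in the last bullet shows up even inside a single process. In this hypothetical Java sketch both counters are thread safe. `CoarseCounter` takes the always-safe route of guarding the entire map with one lock, so every update from every thread is serialized. `ConcurrentCounter` relies on the extra knowledge that each update touches only one key and uses `ConcurrentHashMap.merge`, which is atomic per call and usually lets updates to different keys proceed in parallel. No performance figures are claimed; the point is that the cheaper version depends on additional reasoning about the program.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Two thread-safe word counters that differ only in how much they serialize.
class Counters {
    // Always-safe approach: one lock guards the whole map, so every update
    // from every thread waits on the same monitor.
    static final class CoarseCounter {
        private final Map<String, Integer> counts = new HashMap<>();

        synchronized void add(String word) { counts.merge(word, 1, Integer::sum); }

        synchronized int get(String word) { return counts.getOrDefault(word, 0); }
    }

    // Finer-grained approach: each merge() call is atomic, and updates to
    // different keys can usually proceed in parallel.
    static final class ConcurrentCounter {
        private final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

        void add(String word) { counts.merge(word, 1, Integer::sum); }

        int get(String word) { return counts.getOrDefault(word, 0); }
    }

    // Runs `threads` threads that each add "x" `perThread` times to both
    // counters, then returns {coarse count, concurrent count}.
    static int[] runBoth(int threads, int perThread) {
        CoarseCounter coarse = new CoarseCounter();
        ConcurrentCounter fine = new ConcurrentCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    coarse.add("x");
                    fine.add("x");
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new int[] { coarse.get("x"), fine.get("x") };
    }
}
```

`Counters.runBoth(4, 1000)` reports 4000 for both counters: both are correct, and they differ only in how much concurrency they permit.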

Language-level solutions
• Use a language approach, rather than an API approach
• Hide complexity from the user:
  • object-oriented programming interface
  • automatically apply optimizations
  • hide failures
  • manage distribution
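Kan hides these concerns at the language level, through its compiler. As a rough library-level analogy only (this is not how Kan is implemented), Java's `java.lang.reflect.Proxy` can present the programmer with an ordinary object-oriented interface while a handler behind it forwards a "remote" call and retries failures. `SharedCalendar`, `RetryingHandler`, and `FlakyCalendar` are hypothetical names, and the network is simulated by a local object that fails its first few calls.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// What the programmer sees: an ordinary object-oriented interface.
interface SharedCalendar {
    String lookup(String day);
}

// Behind the interface, a proxy forwards each call to the (simulated) remote
// object and retries failures, so the caller never sees transient errors.
class RetryingHandler implements InvocationHandler {
    private final Object target;
    private final int maxAttempts;

    RetryingHandler(Object target, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.target = target;
        this.maxAttempts = maxAttempts;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Throwable last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                last = e.getCause(); // assume every failure is transient and retry
            }
        }
        throw last; // give up: surface the last failure to the caller
    }

    // Wrap `target` so that calls made through `iface` go via the handler.
    static <T> T wrap(Class<T> iface, T target, int maxAttempts) {
        Object p = Proxy.newProxyInstance(iface.getClassLoader(),
                new Class<?>[] { iface }, new RetryingHandler(target, maxAttempts));
        return iface.cast(p);
    }
}

// Simulated remote service that fails its first `failures` calls.
class FlakyCalendar implements SharedCalendar {
    private int failures;

    FlakyCalendar(int failures) { this.failures = failures; }

    @Override
    public String lookup(String day) {
        if (failures > 0) {
            failures--;
            throw new IllegalStateException("connection lost");
        }
        return "meeting on " + day;
    }
}
```

With `new FlakyCalendar(2)` and three attempts, `lookup("Monday")` succeeds on the third try and the caller never sees the failures. Blind retry is only safe for idempotent operations, which is one reason automatic failure hiding is harder than this sketch suggests.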

The Kan System (http://www.ittc.ukans.edu/kan/)
[Figure: Kan source → Kan compiler → Java bytecode + Kan run-time libraries, running on several JVMs that communicate via Java sockets]

High-level distributed programming
• Assembly language / high-level language analogy:
  • Code in a high-level language almost always
  • Code in assembly language only:
    • for highly performance-critical routines
    • when the higher-level language lacks expressiveness
• Likewise, use a high-level approach to distribution, dropping to the (current) low level only:
  • when the high-level approach performs unacceptably
  • when the high-level approach lacks expressiveness

Simulation of distributed systems
• Predict system performance before going to the expense of building the system
• For the results to be applicable, numerous factors must be modeled realistically, e.g.:
  • Execution time of program components
  • Network performance under program load, including the effects of network congestion
• This is a hard problem!
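As one deliberately simple example of a network model (an assumption for illustration, not a model from this work), a single shared link can be treated as an M/M/1 queue. With service rate μ = bandwidth / mean message size and message arrival rate λ < μ, the mean time a message spends at the link is 1/(μ - λ), so delay rises sharply as the offered load approaches capacity. `LinkModel` and its parameters are hypothetical.

```java
// Hypothetical analytic model of one shared network link, treated as an
// M/M/1 queue: Poisson message arrivals, exponentially distributed
// transmission times, and a single server (the link).
class LinkModel {
    final double latencySec;            // one-way propagation latency (s)
    final double bandwidthBytesPerSec;  // link capacity (bytes/s)

    LinkModel(double latencySec, double bandwidthBytesPerSec) {
        this.latencySec = latencySec;
        this.bandwidthBytesPerSec = bandwidthBytesPerSec;
    }

    // Mean one-way delay (s) for messages of mean size msgBytes arriving at
    // msgsPerSec. The M/M/1 mean time in system, 1/(mu - lambda), covers both
    // queueing and transmission. Returns +Infinity when the offered load
    // meets or exceeds capacity, since the queue then grows without bound.
    double meanDelay(double msgBytes, double msgsPerSec) {
        double mu = bandwidthBytesPerSec / msgBytes; // service rate (msgs/s)
        if (msgsPerSec >= mu) {
            return Double.POSITIVE_INFINITY;
        }
        return latencySec + 1.0 / (mu - msgsPerSec);
    }
}
```

For a link with 10 ms latency and 1,000,000 bytes/s of bandwidth carrying 1000-byte messages (μ = 1000 messages/s), `meanDelay(1000, 0)` is 0.011 s while `meanDelay(1000, 900)` is 0.020 s: at 90% load the mean delay has nearly doubled.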

Debugging distributed systems
• True asynchrony: multiple events happen in multiple places simultaneously
• The order of events is difficult to determine
• No analog of single-stepping with a debugger
• No analog of watch points
• No global breakpoints
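Vector clocks are a standard technique (not one proposed in these slides) for recovering at least the causal, happened-before order of events. Each process keeps one counter per process, ticks its own entry at each event, and merges the sender's vector on every receive. Comparing two timestamps then tells whether one event could have influenced the other; concurrent events stay unordered, which is exactly the ambiguity that makes distributed debugging hard. The class below is a minimal sketch.

```java
import java.util.Arrays;

// Minimal vector clock for n processes, numbered 0..n-1. Each process owns
// one mutable VectorClock; tick() and receive() return copies that serve as
// event timestamps.
class VectorClock {
    private final int[] v;

    VectorClock(int n) { this.v = new int[n]; }

    private VectorClock(int[] v) { this.v = v; }

    // Local event or send at process p: advance p's own entry.
    VectorClock tick(int p) {
        v[p]++;
        return new VectorClock(v.clone());
    }

    // Receive at process p of a message stamped msgStamp: take the
    // componentwise maximum, then count the receive itself as an event.
    VectorClock receive(int p, VectorClock msgStamp) {
        for (int i = 0; i < v.length; i++) {
            v[i] = Math.max(v[i], msgStamp.v[i]);
        }
        return tick(p);
    }

    // a happened before b iff a <= b in every component and a != b.
    static boolean happenedBefore(VectorClock a, VectorClock b) {
        boolean strictlySmaller = false;
        for (int i = 0; i < a.v.length; i++) {
            if (a.v[i] > b.v[i]) return false;
            if (a.v[i] < b.v[i]) strictlySmaller = true;
        }
        return strictlySmaller;
    }

    // Distinct events with neither ordering are concurrent.
    static boolean concurrent(VectorClock a, VectorClock b) {
        return !Arrays.equals(a.v, b.v) && !happenedBefore(a, b) && !happenedBefore(b, a);
    }

    @Override
    public String toString() { return Arrays.toString(v); }
}
```

If P0 sends a message (stamp [1, 0]) while P1 performs an unrelated local event ([0, 1]) and then receives the message ([1, 2]), both earlier events happened before the receive, but the send and the local event are concurrent: no debugger can legitimately say which came first.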

The Reactor: simulation & debugging
[Figure: the code for Processes 1 through N posts tickets to the REACTOR, which keeps them in a ticket queue and activates them; a network model is attached to the REACTOR]
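The diagram suggests a discrete-event structure, which can be sketched in Java under several assumptions not stated in the slides: a ticket is an action plus a virtual activation time, the ticket queue is a priority queue ordered by that time, and the network model is simply a fixed per-message latency. `SimReactor`, `post`, `send`, `run`, and `ReactorDemo` are hypothetical names, not the actual Reactor interface.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical simulating Reactor. Process code never blocks: it posts
// tickets, and the Reactor activates them one at a time in virtual-time order.
class SimReactor {
    // A ticket: an action to activate at a virtual time. seq breaks ties so
    // tickets with equal times are activated in the order they were posted.
    private static final class Ticket {
        final double time;
        final long seq;
        final Runnable action;

        Ticket(double time, long seq, Runnable action) {
            this.time = time;
            this.seq = seq;
            this.action = action;
        }
    }

    private final PriorityQueue<Ticket> queue = new PriorityQueue<>(
            Comparator.comparingDouble((Ticket t) -> t.time).thenComparingLong(t -> t.seq));
    private final double latency; // network model: fixed one-way delay (an assumption)
    private long nextSeq = 0;
    private double now = 0.0;     // virtual clock

    SimReactor(double latency) { this.latency = latency; }

    double now() { return now; }

    // Post a ticket to be activated `delay` virtual seconds from now.
    void post(double delay, Runnable action) {
        queue.add(new Ticket(now + delay, nextSeq++, action));
    }

    // A message send: the network model decides when delivery is activated.
    void send(Runnable deliver) { post(latency, deliver); }

    // Activate tickets in virtual-time order until the queue is empty.
    void run() {
        while (!queue.isEmpty()) {
            Ticket t = queue.poll();
            now = t.time;    // jump the clock straight to the next event
            t.action.run();  // the action may post further tickets
        }
    }
}

class ReactorDemo {
    // Process 1 sends a message to Process 2; Process 3 does local work.
    static List<String> run() {
        SimReactor r = new SimReactor(0.5);
        List<String> log = new ArrayList<>();
        r.post(0.0, () -> {
            log.add("P1 send @" + r.now());
            r.send(() -> log.add("P2 recv @" + r.now()));
        });
        r.post(0.2, () -> log.add("P3 work @" + r.now()));
        r.run();
        return log;
    }
}
```

`ReactorDemo.run()` yields `[P1 send @0.0, P3 work @0.2, P2 recv @0.5]`: the message posted at time 0 is delivered after the modeled latency, correctly interleaved with Process 3's work. In the deployment configuration the same process code would presumably run against a Reactor with no network model, activating tickets as real events arrive.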

The Reactor: deployment
[Figure: Processes 1 through N post tickets to the REACTOR, which keeps them in a ticket queue and activates them; unlike the simulation configuration, no network model is present]

The Reactor, continued
• The code for the processes is the same in simulation and deployment, so the simulation is more accurate
• Most programmers are not comfortable with reactive programming, so a little preprocessing converts programs written in a multithreaded style into reactive form
• This means that multithreaded programs can reap the same benefits
• Joint work with Dr. Niehaus of ITTC
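The slides do not describe the preprocessor's transformation. One common way to turn blocking, thread-style code into reactive code is continuation passing: everything after a blocking receive becomes a callback that runs when the message arrives, and loop variables travel as parameters of that callback. The hypothetical Java sketch below performs this rewrite by hand; `Mailbox`, `SumProcess`, `receive`, `deliver`, and `send` are illustrative names only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Thread-style original (what the programmer writes), shown for reference:
//
//     int total = 0;
//     for (int i = 0; i < 3; i++) {
//         int x = receive("value");   // blocks the thread
//         total += x;
//     }
//     send("result", total);
//
// Reactive form (what a preprocessor might emit): the rest of the loop after
// each blocking receive becomes a continuation, and the loop variables i and
// total become parameters of that continuation.
class Mailbox {
    private final Map<String, Deque<Consumer<Integer>>> waiting = new HashMap<>();
    final List<String> sent = new ArrayList<>();

    // Non-blocking receive: register what to do when a message arrives.
    void receive(String tag, Consumer<Integer> continuation) {
        waiting.computeIfAbsent(tag, k -> new ArrayDeque<>()).add(continuation);
    }

    // Called by the reactor when a message ticket is activated.
    void deliver(String tag, int value) {
        Deque<Consumer<Integer>> q = waiting.get(tag);
        if (q == null || q.isEmpty()) {
            throw new IllegalStateException("no receiver waiting for " + tag);
        }
        q.poll().accept(value);
    }

    void send(String tag, int value) { sent.add(tag + "=" + value); }
}

class SumProcess {
    static void start(Mailbox mb) { loop(mb, 0, 0); }

    // One iteration of the original loop; calling loop() again from the
    // continuation replaces the loop's back edge.
    private static void loop(Mailbox mb, int i, int total) {
        if (i < 3) {
            mb.receive("value", x -> loop(mb, i + 1, total + x));
        } else {
            mb.send("result", total);
        }
    }
}
```

After `SumProcess.start(mb)`, delivering the values 4, 5, and 6 leaves `mb.sent` containing `result=15`, just as the threaded loop would have sent, yet no thread ever blocks.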

Integration of distributed systems
• Relational databases (and transaction processing systems in general) have well-understood semantics, including failure semantics
• Virtually synchronous systems have well-understood semantics, including failure semantics
• Shared memory systems have well-understood semantics, including failure semantics
• So do other types of systems
• But the semantics are all different!

Integrating distributed systems
• What happens if:
  • a computation running on a shared memory multiprocessor
  • accesses a view serializable database
  • by sending requests across an ordered multicast system
  • which manages failures with a virtually synchronous controller?
• Nobody knows!
• Usual solution: throw in lots of synchronization, timeouts, and pessimistic failure handlers wherever a system border is crossed
• This is inefficient, and we still don't know whether it is safe!
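The pessimistic approach in the bullets above typically looks something like the hypothetical Java wrapper below (`BoundaryCall.cross` is an illustrative name). Every crossing is serialized by one global lock, bounded by a timeout, and mapped to a fallback value on any failure. The weakness the slide points out is visible in the code: after a timeout, the caller cannot tell whether the remote operation actually took effect.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical "pessimistic" wrapper for calls that cross a system border.
class BoundaryCall {
    // One global lock: every border crossing is serialized, needed or not.
    private static final Object BORDER_LOCK = new Object();

    // Daemon worker threads, so an abandoned call cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    static <T> T cross(Callable<T> remoteOp, long timeoutMs, T fallback) {
        synchronized (BORDER_LOCK) {
            Future<T> f = POOL.submit(remoteOp);
            try {
                return f.get(timeoutMs, TimeUnit.MILLISECONDS); // bounded wait
            } catch (TimeoutException | ExecutionException e) {
                // Pessimistic handler: assume the worst and fall back. After a
                // timeout we cannot tell whether the remote side committed.
                f.cancel(true);
                return fallback;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                f.cancel(true);
                return fallback;
            }
        }
    }
}
```

Here `cross(() -> 42, 1000, -1)` returns 42, while a call that sleeps past a 50 ms timeout, or one that throws, returns the fallback -1. The wrapper is simple, but every caller now waits on the same lock, and the fallback may be wrong if the remote operation did in fact complete.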

The history algebra
• Although the semantics are different, every kind of system shares the notion of a history
• Generalize this notion
• Find basic operations that manipulate histories
• Determine how to talk about consistency and failures in a general way
• Result: an algebra that computes the behavior of an integrated system
• Work in progress
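The history algebra itself is work in progress and is not specified here. To make the starting notion concrete, the hypothetical Java sketch below represents a history as a sequence of read and write operations tagged with transaction numbers, and defines one classic consistency predicate on it from database theory: conflict serializability, tested by building a precedence graph and checking it for cycles. Conflict serializability is stricter than the view serializability mentioned earlier (every conflict-serializable history is view serializable); it is used here because it can be checked efficiently, whereas testing view serializability is NP-complete.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical history representation: an ordered list of operations.
class History {
    static final class Op {
        final int txn;        // transaction that issued the operation
        final boolean write;  // true for a write, false for a read
        final String item;    // data item accessed

        Op(int txn, boolean write, String item) {
            this.txn = txn;
            this.write = write;
            this.item = item;
        }
    }

    static Op r(int txn, String item) { return new Op(txn, false, item); }

    static Op w(int txn, String item) { return new Op(txn, true, item); }

    // Edge Ti -> Tj whenever an operation of Ti precedes a conflicting
    // operation of Tj (same item, different transactions, at least one write).
    static Map<Integer, Set<Integer>> precedenceGraph(List<Op> h) {
        Map<Integer, Set<Integer>> g = new HashMap<>();
        for (Op op : h) g.computeIfAbsent(op.txn, k -> new HashSet<>());
        for (int i = 0; i < h.size(); i++) {
            for (int j = i + 1; j < h.size(); j++) {
                Op a = h.get(i), b = h.get(j);
                if (a.txn != b.txn && a.item.equals(b.item) && (a.write || b.write)) {
                    g.get(a.txn).add(b.txn);
                }
            }
        }
        return g;
    }

    // A history is conflict serializable iff its precedence graph is acyclic.
    static boolean conflictSerializable(List<Op> h) {
        Map<Integer, Set<Integer>> g = precedenceGraph(h);
        Map<Integer, Integer> state = new HashMap<>(); // absent = unvisited, 1 = on DFS path, 2 = finished
        for (Integer t : g.keySet()) {
            if (hasCycleFrom(t, g, state)) return false;
        }
        return true;
    }

    private static boolean hasCycleFrom(Integer t, Map<Integer, Set<Integer>> g,
                                        Map<Integer, Integer> state) {
        Integer s = state.get(t);
        if (s != null) return s == 1; // reaching a node on the current path means a cycle
        state.put(t, 1);
        for (Integer u : g.get(t)) {
            if (hasCycleFrom(u, g, state)) return true;
        }
        state.put(t, 2);
        return false;
    }
}
```

The interleaving r1(x) r2(x) w1(x) w2(x), a classic lost update, yields edges T1 → T2 and T2 → T1 and is rejected, while the serial history r1(x) w1(x) r2(x) w2(x) is accepted.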

Optimizing integrated systems
• With the history algebra, we can compute when optimizations (e.g., reducing synchronization) are safe for operations that cross system boundaries
• This is a big job; we need tool support
• Future investigation: which optimizations are desirable, and when can we prove that they are safe?

Conclusion
• Construction of distributed systems can be made easier by programming at a higher level, dropping down to the lower level to handle performance and expressiveness problems
• Simulation & debugging can be aided by the Reactor pattern
• Predicting the behavior of integrated systems is on the horizon
• Using such predictions to optimize performance remains to be addressed
