Thoughts on a Java Reference Implementation for MPJ Mark Baker*, Bryan Carpenter *University of Portsmouth Florida State University IPDPS, Cancun, Mexico – 5th May 2000 http://www.dcs.port.ac.uk/~mab/Talks/ Mark.Baker@Computer.Org
Contents • Introduction • Some design decisions • An overview of the architecture • Process creation and monitoring • The MPJ daemon • Handling aborts and failures • MPJ device • Conclusions and future work
Introduction • The Message-Passing Working Group of the Java Grande Forum was formed in late 1998 as a response to the appearance of several prototype Java bindings for MPI-like libraries. • An initial draft for a common API specification was distributed at Supercomputing '98. • Since then the working group has met in San Francisco and Syracuse. • The present API is now called MPJ.
Introduction • There is no complete implementation of the draft specification. • mpiJava is moving towards the “standard”. • The new version (1.2) of the software supports direct communication of objects via object serialization. • Version 1.3 of mpiJava will implement the new API. • The mpiJava wrappers rely on the availability of a platform-dependent native MPI implementation for the target computer.
Introduction • While this is a reasonable basis in many cases, the approach has some disadvantages. • The 2-stage installation procedure (get and build native MPI, then install and match the Java wrappers) is tedious and off-putting to new users. • On several occasions we saw conflicts between the JVM environment and the native MPI runtime behaviour. The situation has improved, and mpiJava now runs on various combinations of JVM and MPI implementation. • This strategy simply conflicts with the ethos of Java, where write-once-run-anywhere software is the order of the day.
MPJ – the Next Generation of Message Passing in Java • An MPJ reference implementation could be implemented as: • Java wrappers to a native MPI implementation, • Pure Java, • Principally in Java – with a few simple native methods to optimize operations (like marshalling arrays of primitive elements) that are difficult to do efficiently in Java. • We are aiming at pure Java to provide an implementation of MPJ that is maximally portable and that hopefully requires the minimum amount of support effort.
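As an illustration of what "marshalling arrays of primitive elements" involves, the following is a minimal sketch that packs a double[] into a byte vector and unpacks it again. It relies on java.nio, which was added to Java after this talk was given, so it shows the operation rather than the approach the authors had available. The class and method names (ArrayMarshal, pack, unpack) and the choice of big-endian byte order are assumptions made for the example and are not taken from the MPJ draft.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of the MPJ draft: pure-Java marshalling of a
// double[] into a byte vector suitable for writing to a socket, and back.
final class ArrayMarshal {
    private ArrayMarshal() {}

    // Pack the doubles into a freshly allocated byte[] (8 bytes per element).
    static byte[] pack(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES)
                                   .order(ByteOrder.BIG_ENDIAN); // assumed wire order
        buf.asDoubleBuffer().put(values); // bulk copy through a double view
        return buf.array();
    }

    // Reverse of pack: the byte length must be a whole number of doubles.
    static double[] unpack(byte[] bytes) {
        if (bytes.length % Double.BYTES != 0) {
            throw new IllegalArgumentException("length is not a multiple of 8");
        }
        double[] out = new double[bytes.length / Double.BYTES];
        ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).asDoubleBuffer().get(out);
        return out;
    }
}
```

A sender would call ArrayMarshal.pack(data) and write the resulting bytes to its socket; the receiver would call ArrayMarshal.unpack on the bytes it read.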
Benefits of a pure Java implementation of MPJ • Highly portable. Assumes only a Java development environment. • Performance: moderate. May need JNI inserts for marshalling arrays. Network speed limited by Java sockets. • Good for education/evaluation. • Vendors provide wrappers to native MPI for ultimate performance?
Design Criteria for the MPJ Environment • Need an infrastructure to support groups of distributed processes: • Resource discovery, • Communications, • Handle failure, • Spawn processes on hosts.
Resource discovery • Technically, Jini discovery and lookup seem an obvious choice. • Daemons register with lookup services. • A “hosts file” may still guide the search for hosts, if preferred.
Communication base • Maybe, some day, Java VIA? • For now sockets are the only portable option. • RMI is surely too slow.
Handling “Partial Failures” • Failures we need to cope with: • A network connection breaks, • The host system goes down, • The JVM running the remote MPJ task halts for some other reason (e.g., occurrence of a Java exception), • The program that initiated the MPJ job is killed. • On unexpected termination of any particular MPJ job: • Concurrent tasks associated with other MPJ jobs should be unaffected, even if they were initiated by the same daemon. • All processes associated with the particular job must shut down cleanly within some (preferably short) interval of time.
Handling “Partial Failures” • A useable MPJ implementation must deal with unexpected process termination or network failure, without leaving orphan processes, or leaking other resources. • Could reinvent protocols to deal with these situations, but Jini provides a ready-made framework (or, at least, a set of concepts).
Handling failures with Jini • If any slave dies, client generates a Jini distributed event, MPIAbort – all slaves are notified and all processes killed. • In case of other failures (network failure, death of client, death of controlling daemon, …) client leases on slaves expire in a fixed time, and processes are killed.
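The lease-expiry behaviour described on this slide can be sketched in plain Java. The example below does not use the Jini API; it only illustrates the bookkeeping: each slave holds a lease that the client must renew, and any slave whose lease lapses is reported so that its process can be killed. The class LeaseTable, its methods, the fixed lease duration and the caller-supplied clock are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for lease bookkeeping; this is not the Jini lease API.
final class LeaseTable {
    private final long durationMillis;                        // fixed lease length
    private final Map<String, Long> expiry = new HashMap<>(); // slave id -> expiry time

    LeaseTable(long durationMillis) {
        this.durationMillis = durationMillis;
    }

    // Grant a lease, or renew an existing one, as of the given time.
    void renew(String slaveId, long nowMillis) {
        expiry.put(slaveId, nowMillis + durationMillis);
    }

    // Remove and return every slave whose lease has lapsed by nowMillis.
    // A daemon would kill the processes belonging to the returned ids.
    List<String> reap(long nowMillis) {
        List<String> lapsed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = expiry.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() <= nowMillis) {
                lapsed.add(entry.getKey());
                it.remove();
            }
        }
        return lapsed;
    }
}
```

If the client dies or the network fails, renew is simply never called again, so every slave is reaped within one lease duration without any explicit failure message having to arrive.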
Integration of Jini and MPI • Provides a natural Java framework for parallel computing with the powerful fault tolerance and dynamic characteristics of Jini combined with proven parallel computing functionality and performance of MPI.
MPJ - Implementation • In the initial reference implementation we will use Jini technology to facilitate location of remote MPJ daemons and to provide a framework for the required fault-tolerance. • This choice rests on our guess that in the medium-to-long-term Jini will be a ubiquitous component in Java installations. • Hence using the Jini paradigms from the start should eventually help inter-working and compatibility between our software and other systems.
Acquiring compute slaves through Jini
MPJ • We envisage that a user will download a jar-file of MPJ library classes onto machines that may host parallel jobs, and install a daemon on those machines – technically by registering an activatable object with an rmid daemon. • Parallel Java codes are compiled on one host. • An mpjrun program invoked on that host transparently loads the user's class files into JVMs created on remote hosts by the MPJ daemons, and the parallel job starts.
MPJ - Implementation • In the short-to-medium-term – before Jini software is widely installed – we might have to provide a “lite” version of MPJ that is unbundled from Jini. • Designing for Jini protocols should, nevertheless, have a beneficial influence on overall robustness and maintainability. • Use of Jini implies use of RMI for various management functions.
[Figure: process creation. Labels: Host; mpjrun myproggy –np 4; http server; rmid; MPJ Daemon; Slave 1, Slave 2, Slave 3, Slave 4.]
MPJ – Implementation • Some assumptions that have a bearing on the organization of the MPJ daemon: • stdout (and stderr) streams from all tasks in an MPJ job are merged non-deterministically and copied to the stdout of the process that initiates the job. • No guarantees are made about other IO operations - these are system dependent. • Rudimentary support for global checkpointing and restarting of interrupted jobs may be quite useful, although checkpointing would not happen without explicit invocation in the user-level code, nor would restarting happen automatically.
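The non-deterministic merging of task output mentioned on this slide can be sketched as one pump thread per task, each copying complete lines to a shared sink. This is only an illustration under assumptions: the class OutputMerger and its merge method are hypothetical names, and in-memory streams stand in for the socket or pipe streams a real daemon would read from.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: merge the output streams of several tasks into one sink.
final class OutputMerger {
    private OutputMerger() {}

    // Starts one pump thread per task stream and waits until all have finished.
    // Lines from one task keep their order; interleaving across tasks is arbitrary.
    static void merge(List<InputStream> taskOutputs, PrintStream sink) {
        List<Thread> pumps = new ArrayList<>();
        for (InputStream in : taskOutputs) {
            Thread pump = new Thread(() -> {
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        synchronized (sink) { // keep each line intact in the merged output
                            sink.println(line);
                        }
                    }
                } catch (IOException e) {
                    // A broken stream simply ends that task's contribution.
                }
            });
            pump.start();
            pumps.add(pump);
        }
        for (Thread pump : pumps) {
            try {
                pump.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                return;
            }
        }
    }
}
```

Calling OutputMerger.merge(streams, System.out) in the process that started the job would give the behaviour the slide describes, with no ordering guarantee between tasks.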
MPJ – Implementation • The role of the MPJ daemons and their associated infrastructure is to provide an environment consisting of a group of processes with the user-code loaded and running in a reliable way. • The process group is reliable in the sense that no partial failures should be visible to higher levels of the MPJ implementation or the user code. • We will use Jini leasing to provide fault tolerance – clearly no software technology can guarantee the absence of total failures, where the whole MPJ job dies at essentially the same time.
MPJ - Implementation • Once a reliable cocoon of user processes has been created through negotiation with the daemons, we have to establish connectivity. • In the reference implementation this will be based on Java sockets. • Recently there has been interest in producing Java bindings to VIA - eventually this may provide a better platform on which to implement MPI, but for now sockets are the only realistic, portable option.
MPJ – Implementation • Between the socket API and the MPJ API there will be an intermediate “MPJ device” level – modelled on the Abstract Device Interface (ADI) of MPICH. • Although the role is slightly different here - we do not really anticipate a need for multiple platform-specific implementations - this still seems like a good layer of abstraction to have in our design. • The API is actually not modelled in detail on the MPICH device, but the level of operations is similar (based on isend/irecv/waitany calls).
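To give a feel for the level of operations at the device layer, here is a minimal single-process sketch of isend/irecv/waitany. It is not the MPJ device API, which these slides do not specify: the names DeviceRequest and LoopbackDevice, the method signatures and the tag-only matching (no ranks or contexts) are assumptions, and an in-memory queue stands in for the socket connections.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Hypothetical handle for a non-blocking operation; completes with the
// received bytes (or with null for a send).
final class DeviceRequest {
    final CompletableFuture<byte[]> done = new CompletableFuture<>();
}

// Hypothetical loopback device: messages are matched to receives by tag only.
final class LoopbackDevice {
    private final Map<Integer, Queue<byte[]>> arrived = new HashMap<>();
    private final Map<Integer, Queue<DeviceRequest>> posted = new HashMap<>();

    // Non-blocking send: hand the data to a waiting receive or buffer it.
    synchronized DeviceRequest isend(byte[] data, int tag) {
        Queue<DeviceRequest> waiting = posted.get(tag);
        if (waiting != null && !waiting.isEmpty()) {
            waiting.remove().done.complete(data.clone());
        } else {
            arrived.computeIfAbsent(tag, k -> new ArrayDeque<>()).add(data.clone());
        }
        DeviceRequest send = new DeviceRequest();
        send.done.complete(null); // the data was copied, so the send is already complete
        return send;
    }

    // Non-blocking receive: take a buffered message or post the request.
    synchronized DeviceRequest irecv(int tag) {
        DeviceRequest recv = new DeviceRequest();
        Queue<byte[]> buffered = arrived.get(tag);
        if (buffered != null && !buffered.isEmpty()) {
            recv.done.complete(buffered.remove());
        } else {
            posted.computeIfAbsent(tag, k -> new ArrayDeque<>()).add(recv);
        }
        return recv;
    }

    // Block until at least one request has completed; return its index.
    static int waitany(DeviceRequest[] requests) {
        if (requests.length == 0) {
            throw new IllegalArgumentException("no requests to wait on");
        }
        CompletableFuture<?>[] futures = new CompletableFuture<?>[requests.length];
        for (int i = 0; i < requests.length; i++) {
            futures[i] = requests[i].done;
        }
        CompletableFuture.anyOf(futures).join();
        for (int i = 0; i < requests.length; i++) {
            if (requests[i].done.isDone()) {
                return i;
            }
        }
        throw new IllegalStateException("anyOf completed but no request is done");
    }
}
```

Higher layers (point-to-point modes, communicators, collectives) would be built on calls of this kind, which is the separation the ADI-style design aims for.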
[Figure: Layers of an MPJ Reference Implementation] • High Level MPI: collective operations, process topologies. • Base Level MPI: all pt-to-pt modes, groups, communicators, datatypes. • MPJ Device Level: isend, irecv, waitany, …; physical PIDs; contexts & tags; byte vector data. • Java Socket and Thread API: all-to-all TCP connect; input handler threads; synchronised methods, wait, notify, … • Process Creation and Monitoring: MPJ daemon; lookup, leasing (Jini); exec java MPJLoader; serializable objects.
MPJ - Conclusions • On-going effort (NSF proposal + volunteer help). • Collaboration with other Java MP system developers to define the exact MPJ interface. • Work at the moment is based around the development of the low-level MPJ device and exploring the functionality of Jini.