
Wide-Area Parallel Computing in Java



  1. Wide-Area Parallel Computing in Java. Henri Bal, Vrije Universiteit Amsterdam, Faculty of Sciences

  2. Introduction • Distributed supercomputing • Parallel applications on geographically distributed computing system (computational grid) • Examples: SETI@home, RSA-155 • Programming support • Language-neutral systems: Legion, Globus • Language-centric: Java • Goal: study wide-area parallel computing in Java • Programming model: Remote Method Invocation

  3. Outline • Wide-area parallel computing • Java Remote Method Invocation (RMI) • Performance of JDK RMI • The Manta high-performance Java system • Wide-area parallel Java applications using RMI • Application performance

  4. Wide-area parallel computing • Challenge • Tolerating poor latency and bandwidth of WANs • Basic assumption: wide-area system is hierarchical • Connect clusters, not individual workstations • Most links are fast • General approach • Optimize applications to exploit the hierarchical structure → most communication is local

  5. Distributed ASCI Supercomputer • Four clusters: VU (128 nodes), UvA (24), Leiden (24), Delft (24), connected by 6 Mb/s ATM • Node configuration: 200 MHz Pentium Pro, 64-128 MB memory, 2.5 GB local disks, Myrinet LAN, RedHat Linux 2.0.36

  6. Java • Growing interest in Java for parallel applications • Java Grande Forum • Parallel programming support in Java • Shared memory: multithreading • Distributed memory: Remote Method Invocation • Study suitability of Java RMI for (wide-area) parallel programming • Optimizing performance of local RMI [PPoPP’99] • Wide-area parallel programming using RMI [JavaGrande’99]

  7. RMI (1) • Flexible object-oriented RPC-like primitive • Easy interoperability between Java Virtual Machines • Polymorphism → dynamic bytecode loading [class diagram: Animal with subclasses Orca, Panda, Manta]

void species(Animal x) throws … {
    System.out.println("Species " + x.name());
}

o.species(new Orca());   → "Species orca"
o.species(new Panda());  → "Species panda"
o.species(new Manta());  → "Species manta"
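
For context, a minimal sketch of the declarations this example assumes; the remote-interface name AnimalPrinter and the class bodies are guesses for illustration, since the slide only shows the call site:

import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical class hierarchy behind the slide's example.
abstract class Animal implements Serializable {
    abstract String name();
}
class Orca extends Animal { String name() { return "orca"; } }
class Panda extends Animal { String name() { return "panda"; } }
class Manta extends Animal { String name() { return "manta"; } }

// The remote object o implements an interface like this; when the server
// first receives an Orca, Panda, or Manta, standard RMI can download the
// subclass bytecode dynamically, which is the polymorphism the slide shows.
interface AnimalPrinter extends Remote {
    void species(Animal x) throws RemoteException;
}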

  8. RMI (2) • Designed for client-server applications • Automatic serialization (marshalling) • Normally used in a high-latency environment • E.g. the Internet • Is RMI fast enough for parallel programming?

  9. JDK RMI Performance (200 MHz Pentium Pro, JDK 1.1.4)

  10. Why is JDK RMI slow? • Serialization uses run-time type inspection • Protocol overhead (class information) • Thread creation for incoming calls • TCP/IP • Most code is written in Java

  11. The Manta system • Designed for high-performance computing • Native (static) compilation • Source → executable • New fast RMI protocol between Manta nodes • Support (polymorphic) RMIs with JVMs • Implemented on wide-area DAS system

  12. JDK versus Manta (200 MHz Pentium Pro, Myrinet, JDK 1.1.4 interpreter, 1 object as parameter)

  13. Manta serialization

Java source:

class Test implements Serializable {
    int i;
    double d;
    Object o;
}

Compiler-generated serialization routine (Manta):

void PackageClass__Test(…) {
    WRITE_INT( type_id );
    WRITE_INT( i );
    WRITE_DOUBLE( d );
    WRITE_OBJECT( o );
}
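
For contrast, a sketch of the run-time-inspection path the JDK takes for the same Test class (this uses the standard java.io API; the demo class itself is not from the slides):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

class JdkSerializationDemo {
    // ObjectOutputStream walks the fields of Test reflectively at run time
    // and writes a full class descriptor into the stream; Manta's
    // compiler-generated routine above avoids both costs.
    static byte[] serialize(Test t) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buf);
        out.writeObject(t);
        out.flush();
        return buf.toByteArray();
    }
}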

  14. RMI protocol • Light-weight RMI protocol • Send minimal type information • Avoid thread creation • Simple nonblocking methods executed directly • Avoid interrupts • Poll network when processor is idle • Everything is written in C

  15. Communication software • Panda user-space RPC protocol • LFC Myrinet control program • Similar to active messages • Implemented partly on Myrinet network interfaces • Myrinet network interfaces mapped in user space [layer diagram: Manta RMI over Panda RPC over LFC/Myrinet; UDP and TCP over Ethernet and ATM]

  16. Interoperability with JVMs • Manta RMI protocol incompatible with JDK • Use fast RMI between Manta nodes • Use JDK-compliant protocol with JVMs • Polymorphic RMI requires exchanging bytecodes • Also generate bytecodes when compiling a program • Dynamically compile and link bytecodes into running program

  17. Null-RMI latency

  18. RMI Throughput

  19. Outline • Wide-area parallel computing • Java Remote Method Invocation (RMI) • Performance of JDK RMI • The Manta high-performance Java system • Wide-area parallel Java applications using RMI • Application performance

  20. Manta on wide-area DAS • 2 orders of magnitude between intra-cluster (LAN) and inter-cluster (WAN) communication performance • Manta exposes the hierarchical structure to the application • Applications are optimized to reduce WAN overhead

  21. Wide-area programming • Problem: how to tolerate the difference between LAN and WAN performance • Wide-area system is structured hierarchically • Most links are fast • Approach: application-level optimizations that exploit the hierarchical structure • Reduce wide-area communication

  22. Application experience • Parallel applications • Successive overrelaxation (SOR) • All-pairs shortest paths problem (ASP) • Traveling salesperson problem (TSP) • Iterative Deepening A* (IDA*) • Measurements on wide-area DAS • 1-4 clusters with 16 nodes • Comparison with single 64-node cluster

  23. Successive Overrelaxation • Red/black SOR • Neighbor communication, using RMI • Problem: nodes at cluster boundaries • Overlap wide-area communication with computation • RMI is synchronous → use multithreading (see the sketch below) [diagram: CPUs 1-3 in Cluster 1, CPUs 4-6 in Cluster 2; 40 µs intra-cluster vs 5600 µs inter-cluster latency]
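
A minimal sketch of the multithreading trick, assuming a hypothetical remote interface SorNeighbor with an exchangeRow method (the slides name the technique but do not show SOR code):

import java.rmi.Remote;
import java.rmi.RemoteException;

interface SorNeighbor extends Remote {
    double[] exchangeRow(double[] boundaryRow) throws RemoteException;
}

class BoundaryExchange extends Thread {
    private final SorNeighbor neighbor;
    private final double[] localRow;
    private double[] remoteRow;

    BoundaryExchange(SorNeighbor neighbor, double[] localRow) {
        this.neighbor = neighbor;
        this.localRow = localRow;
    }

    public void run() {
        try {
            // The synchronous RMI blocks here, off the compute thread.
            remoteRow = neighbor.exchangeRow(localRow);
        } catch (RemoteException e) {
            throw new RuntimeException(e);
        }
    }

    double[] result() throws InterruptedException {
        join(); // block only when the remote row is actually needed
        return remoteRow;
    }
}

The compute thread starts the exchange, updates the interior of its partition, and calls result() only when the boundary points must be updated, hiding most of the wide-area latency.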

  24. All-pairs shortest paths • Broadcast at beginning of each iteration • Problem: broadcasting over wide-area links • Lack of broadcast in Java → use spanning tree • Use coordinator node per cluster • Do asynchronous send to all remote coordinators • Implemented using threads (see the sketch below) [diagram: spanning-tree broadcast across clusters 1-3]
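
A sketch of the asynchronous send, assuming a hypothetical Coordinator remote interface; one thread per remote cluster lets the slow WAN RMIs proceed in parallel instead of serializing them:

import java.rmi.Remote;
import java.rmi.RemoteException;

interface Coordinator extends Remote {
    void broadcastRow(int iteration, int[] row) throws RemoteException;
}

class AspBroadcast {
    static void sendToRemoteClusters(Coordinator[] remote,
                                     final int iter, final int[] row) {
        for (final Coordinator c : remote) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        c.broadcastRow(iter, row); // WAN RMI off the critical path
                    } catch (RemoteException e) {
                        e.printStackTrace();
                    }
                }
            }).start();
        }
    }
}

Each coordinator then rebroadcasts the row inside its own cluster over the fast LAN.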

  25. Traveling salesperson problem • Replicated-worker style parallel search algorithm • Problem: work distribution • Central job queue has high overhead • Statically distribute jobs over clusters • Use centralized job queue per cluster • Easy to express using RMI (see the sketch below)
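
A sketch of the per-cluster queue expressed with RMI; JobQueue, Job, and the constructor-based static partitioning are assumed names and details:

import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;
import java.util.LinkedList;
import java.util.List;

class Job implements Serializable {
    int[] partialTour; // placeholder for the real TSP job state
}

interface JobQueue extends Remote {
    Job getJob() throws RemoteException; // null when the queue is empty
}

class JobQueueImpl extends UnicastRemoteObject implements JobQueue {
    private final LinkedList<Job> jobs = new LinkedList<Job>();

    JobQueueImpl(List<Job> staticShare) throws RemoteException {
        jobs.addAll(staticShare); // the jobs statically assigned to this cluster
    }

    public synchronized Job getJob() {
        return jobs.isEmpty() ? null : jobs.removeFirst();
    }
}

Workers loop on getJob() against their cluster-local queue, so queue traffic never crosses the WAN.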

  26. Iterative Deepening A* • Parallel search algorithm using work stealing • Problem: inter-cluster work stealing • Optimization: first look for work in the local cluster • Easy to express using RMI (see the sketch below)
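
A sketch of the cluster-aware stealing order; the Worker interface and steal() method are assumptions, and Job is as in the TSP sketch above:

import java.rmi.Remote;
import java.rmi.RemoteException;

interface Worker extends Remote {
    Job steal() throws RemoteException; // null if this worker has no spare work
}

class StealPolicy {
    // Try cheap LAN victims first; pay a WAN round-trip only when the
    // whole local cluster has run out of work.
    static Job findWork(Worker[] localPeers, Worker[] remotePeers)
            throws RemoteException {
        for (Worker w : localPeers) {
            Job j = w.steal();
            if (j != null) return j;
        }
        for (Worker w : remotePeers) {
            Job j = w.steal();
            if (j != null) return j;
        }
        return null;
    }
}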

  27. Performance • Wide-area DAS system: 4 clusters of 16 CPUs • Comparison with single 16-node and 64-node cluster

  28. Conclusions • Fast RMI possible through compiler-generated serialization, light-weight communication and RMI protocols • Optimized wide-area applications are efficient • Reduce wide-area communication, or hide its latency • Java RMI is easy to use, but some optimizations are awkward to express • No asynchronous communication, no collective communication • Programming systems should take the hierarchical structure of wide-area systems into account http://www.cs.vu.nl/manta

  29. Performance breakdown: Manta (Fast Ethernet)
