
Parallel Virtual Machines in Kepler



  1. Parallel Virtual Machines in Kepler Daniel Zinn Xuan Li Bertram Ludaescher Eighth Biennial Ptolemy Mini-conference Thursday, April 16, 2009, Berkeley, California

  2. Outline • Motivation & Related Work • Lightweight Parallel PN Engine • Kepler PPN Director • Demo • Future Directions & Conclusion

  3. Motivation • Kepler is used to automate and build scientific applications • often compute-intensive • often data-intensive and long-running • Speed up executions to save scientists time • … by leveraging distributed resources

  4. Distribution Efforts in Kepler / PTII • Remote execution of a complete workflow • Hydrant (Tristan King) • Web service for remote execution (Jianwu Wang) • Parameter sweeps with Nimrod/K (Colin Enticott, David Abramson, Ilkay Altintas) • Distribution within actors • “Plumbing Workflows” with ad-hoc ssh control (Norbert Podhorszki) • Globus actors in Kepler: GlobusJob, GlobusProxy, GridFTP, GridJob • gLite actors available through ITER • Web-service executions by actors • Distribution of few or all actors • Distributed SDF Director (Daniel Cuadrado) • Pegasus Director (Daniel Cuadrado and Yang Zhao) • Master-Slave Distributed Execution (Chad Berkley and Lucas Gilbert) with DistributedCompositeActor • PPN Director (Daniel Zinn and Xuan Li) • Thanks to Jianwu for help with this overview

  5. Outline • Motivation & Related Work • Lightweight Parallel PN Engine • Kepler PPN Director • Demo • Future Directions & Conclusion

  6. Lightweight Parallel PN Engine (LPPN) • Motivation • PN as an inherently parallel MoC • Build a simple, efficient distributed PN engine • Design Requirements • KISS • Avoid centralization as much as possible • Provide Actor and Port abstractions • Allow actors to be written in different languages • “Experimentation platform” for scheduling, data routing, … • Design Principles • One actor = one process • Direct communication between actors • Central component only for setup, termination detection, …

  7. LPPN – Technology Choices • C++ for core libraries • Actor, Port, Token as C++ classes • Parallel Virtual Machine (PVM) for parallelization • Thin layer on top of machine clusters (pool of hosts) • Message passing • Simple RPC implemented on top of PVM messages • SWIG for exposing the core to higher-level languages • Perl/Python interfaces for writing actors • Perl interface for composing and starting workflows • Java interface for composing, starting, and monitoring workflows
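PVM itself is a real, standard library; as an illustration of the messaging layer named above, here is a minimal sketch of packing and sending one string token with standard PVM 3 calls. The message tag and the framing LPPN actually uses are not shown in the deck and are assumptions here.

```cpp
// Minimal sketch of sending one string token over PVM 3. The calls
// (pvm_initsend, pvm_pkstr, pvm_send) are standard PVM API; the tag
// value and LPPN's real message framing are assumptions.
#include <pvm3.h>

void send_string_token(int dest_tid, char* payload) {
  pvm_initsend(PvmDataDefault);            // fresh send buffer, default encoding
  pvm_pkstr(payload);                      // pack the token payload
  pvm_send(dest_tid, 1 /* hypothetical token msgtag */);
}
```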

  8. LPPN – C++ Core Library: Actor
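The slide showed the Actor class itself, which the transcript does not preserve. Below is a hedged sketch of what such a base class plausibly looks like; all member names are assumptions.

```cpp
// Hypothetical reconstruction of the LPPN Actor base class; the real
// code appeared only as an image on this slide.
#include <string>

class Actor {
public:
  explicit Actor(const std::string& name) : name_(name) {}
  virtual ~Actor() {}

  // Called once in the freshly spawned PVM process, before tokens flow.
  virtual void setup() {}

  // One firing: read from input ports, compute, write to output ports.
  // Returning false signals that the actor is done.
  virtual bool fire() = 0;

  const std::string& name() const { return name_; }

private:
  std::string name_;
};
```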

  9. LPPN – C++ Core Library: Ports • Ports are parameterized by the data type sent through the port. • Data types: • string • int, double, … • Custom structs • BLOB • Transfer via PVM messages
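A sketch of how such type-parameterized ports might look as C++ class templates; the real interface is not reproduced in the transcript, and the trivial bodies below are placeholders so the sketch compiles.

```cpp
// Hypothetical typed ports. In the real engine, read() would block on a
// PVM receive and write() would block when the fixed-size buffer is
// full; the bodies here are compile-only placeholders.
template <typename T>
class InPort {
public:
  T read() { /* pvm_recv + unpack in the real engine */ return T{}; }
};

template <typename T>
class OutPort {
public:
  void write(const T& /*value*/) { /* pack + pvm_send in the real engine */ }
};
```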

  10. LPPN – C++ Core Library: Tokens • BLOB_Token encapsulates BLOBs as files • Construct from files • LinkTo(path) • File data is known to the workflow system • Data is sent via rsync/scp by the LPPN system • Each actor has a private workspace in the file system • Generic Token • Polymorphic tokens for COMAD workflows • Open, Close, BLOB, …
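Only LinkTo(path) is named on the slide; the rest of this BLOB token sketch is an assumption about its shape.

```cpp
// Hypothetical BLOB_Token: it references a file rather than holding the
// bytes, so the LPPN runtime can move the data with rsync/scp between
// the actors' private workspaces.
#include <string>

class BLOB_Token {
public:
  // Register an existing file with the workflow system (named on the slide).
  void LinkTo(const std::string& path) { path_ = path; }

  // Location of the file inside the actor's private workspace.
  const std::string& path() const { return path_; }

private:
  std::string path_;
};
```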

  11. LPPN – Actor Communication (Ports) • During setup time • Connect OutPorts to InPorts (1:1 mapping) • During runtime • Port implementation sends data from port to port • BlobToken ports handle file movement • Block on read • Block on write when the (fixed-size) buffer is full • Token buffers in PVM, file buffers in the actor's directory • Gathering of statistics • Actor status, token counts, … (see the sketch below)
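Putting the hypothetical Actor and Port sketches from the previous slides together, one firing of a trivial actor might look like this; again an illustration, not LPPN's actual code.

```cpp
// Illustration only: an actor whose fire() blocks on read, transforms
// the token, and blocks on write if the downstream buffer is full.
// Actor, InPort, and OutPort are the sketches from the slides above.
#include <cctype>
#include <string>

class Uppercase : public Actor {
public:
  Uppercase() : Actor("Uppercase") {}

  InPort<std::string> in;
  OutPort<std::string> out;

  bool fire() override {
    std::string s = in.read();           // blocks until a token arrives
    for (char& c : s) c = std::toupper(static_cast<unsigned char>(c));
    out.write(s);                        // blocks while the buffer is full
    return true;
  }
};
```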

  12. Command-line Actor and Actors in Perl • Example: ConvertResize Actor • Automatically create input and output ports • Read from input ports, call command, write to output ports
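The deck shows this wrapper in Perl; as a rough C++ analogue of what the generated actor body does, assuming ImageMagick's convert as the wrapped command (the command line and helper name are illustrative, not from the deck):

```cpp
// Rough analogue of a command-line-wrapping actor: take a file token,
// shell out to the external tool, and emit the result file as a new
// token. Paths arrive and leave via the actor's BLOB ports.
#include <cstdlib>
#include <string>

bool convertResize(const std::string& in_path, const std::string& out_path) {
  std::string cmd = "convert -resize 800x600 '" + in_path + "' '" + out_path + "'";
  return std::system(cmd.c_str()) == 0;  // success iff the tool exits 0
}
```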

  13. Workflow Setup • Start actors • Connect all ports • Unleash actors

  14. Workflow-Script (sneak preview) • Simple DSL for defining workflows • Create composite actors • Specify partial parameterizations • Connect ports easily • Check sanity (all ports connected, types OK, …) • Give deployment directives (hostname, co-locations) • Run workflow

  15. Workflow-Script Example
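The example itself appeared only as an image, and the script language is a Perl DSL. The following self-contained C++ sketch walks through the same steps (create actors, parameterize, connect, check, deploy, run); all types and calls are stand-ins, not the real LPPN composition API.

```cpp
// Stand-in sketch of the workflow-script steps from the previous slide.
#include <iostream>
#include <string>
#include <vector>

struct ActorRef { std::string name; };

class Workflow {
public:
  ActorRef addActor(const std::string& name) {
    actors_.push_back(name);
    return {name};
  }
  void setParam(const ActorRef& a, const std::string& k, const std::string& v) {
    std::cout << a.name << "." << k << " = " << v << "\n";
  }
  void connect(const ActorRef& from, const std::string& out,
               const ActorRef& to, const std::string& in) {
    std::cout << from.name << "." << out << " -> " << to.name << "." << in << "\n";
  }
  void deployOn(const ActorRef& a, const std::string& host) {
    std::cout << a.name << " @ " << host << "\n";
  }
  bool check() const { return true; }    // all ports connected, types OK
  void run() { std::cout << "unleashing " << actors_.size() << " actors\n"; }

private:
  std::vector<std::string> actors_;
};

int main() {
  Workflow wf;
  ActorRef conv = wf.addActor("ConvertResize");
  wf.setParam(conv, "size", "800x600");  // partial parameterization
  ActorRef sink = wf.addActor("FileSink");
  wf.connect(conv, "out", sink, "in");   // 1:1 OutPort -> InPort
  wf.deployOn(conv, "node1");            // deployment directive
  if (wf.check()) wf.run();              // sanity check, then unleash
  return 0;
}
```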

  16. Outline • Motivation & Related Work • Lightweight Parallel PN Engine • Kepler PPN Director • Demo • Future Directions & Conclusion

  17. Kepler PPN Director • Idea: Use Kepler as sophisticated GUI • Create, run and monitor LPPN workflows • Marrying LPPN and Kepler – The PPN Director • Drag’n’drop workflow creation (1:1 mapping for actors) • Parameter support • Hints for deployment from user • Monitor token sending and receiving • Monitor actor status • …

  18. PPN Director – Architecture Overview [architecture diagram; labels: Kepler, Local Machine, LPPN]

  19. PPN Director – Design Decisions • Proxy-Actors in Kepler represent Actors in LPPN • Repository of available LPPN Actors in XML file (next slide) • Actor-name • Parameters and default values • Ports • Generic PPN-Actor is configured using this information • Monitor actor state • Send data from Kepler Actors to LPPN actors and vice versa • PPN Director • Start Actors with parameters, deployment info • Connect Actors according to Kepler workflow • Unleash and stop workflow execution

  20. LPPN Actor Repository
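The repository slide is not reproduced in the transcript. A hypothetical entry consistent with the fields listed on the previous slide (actor name, parameters with default values, ports) might look like:

```xml
<!-- Hypothetical repository entry: element and attribute names are
     assumptions; only the kinds of fields come from the deck. -->
<actor name="ConvertResize">
  <parameter name="size" default="800x600"/>
  <port name="in"  direction="input"  type="BLOB"/>
  <port name="out" direction="output" type="BLOB"/>
</actor>
```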

  21. Monitoring Support • PPN actors periodically probe LPPN actors for info • Number of tokens sent and received • Current actor state: • Working • Block on receive • Block on write • Sending BLOB tokens • Displayed on the actor while the workflow is running …

  22. Communication with Regular PN Actors? • Sending data from regular Kepler actors to LPPN and vice versa

  23. Communication with Regular PN Actors

  24. Communication with Regular PN Actors! • Sending data from regular Kepler actors to LPPN and vice versa

  25. Outline • Motivation & Related Work • Lightweight Parallel PN Engine • Kepler PPN Director • Demo • Future Directions & Conclusion

  26. Outline • Motivation & Related Work • Lightweight Parallel PN Engine • Kepler PPN Director • Demo • Future Directions & Conclusion

  27. Future Directions • Adding black-box (Java) actors as actors in LPPN • Detailed measurements of where actors spend their time • Automatic movement of actors under CPU congestion (using a spring/mass model) • Automatic data parallelism (actor cloning and scatter/gather) • Overhaul of LPPN, possibly in Java with RMI and JNI • Better resource management

  28. Conclusions • LPPN – a simple, fast, and extensible PN engine • Kepler successfully used as a front-end for LPPN • Kepler for staging & monitoring • Interoperability between Kepler and LPPN
