1 / 30

The Roadmap to New Releases

The Roadmap to New Releases. Todd Tannenbaum Department of Computer Sciences University of Wisconsin-Madison http://www.cs.wisc.edu/condor condor-admin@cs.wisc.edu. Stable vs. Development Series. Much like the Linux kernel, Condor provides two different releases at any time: Stable series

obelia
Download Presentation

The Roadmap to New Releases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The Roadmap to New Releases Todd Tannenbaum Department of Computer Sciences University of Wisconsin-Madison http://www.cs.wisc.edu/condor condor-admin@cs.wisc.edu

  2. Stable vs. Development Series • Much like the Linux kernel, Condor provides two different releases at any time: • Stable series • Development series • Allows Condor to be both a research project and a production-ready system

  3. Stable series • Series number in version is even (e.g. 6.2.0) • Releases are heavily tested • Only bug fixes and ports to new platforms are added on a stable series

  4. Stable series (cont.) • A given stable release is always compatible with other releases from the same series • Recommended for production pools

  5. Development Series • Series number in the version is odd (e.g. 6.1.17, 6.3.1) • New features and new technology are added frequently • Versions from the same development series are not always compatible with each other

  6. Development Series (cont.) • Releases are not as heavily tested • Generally not recommended for production pools • … unless new features are required • … unless we recommend otherwise :^)

  7. Where is Condor Today? • Version 6.3.2 being released asap – this is the v6.4.0 release candidate. • We expect version 6.4.0 released by the end of March.

  8. What’s new for Condor v6.4.0?

  9. New Ports in 6.4.0 • Full support (with checkpointing and remote system calls): • RedHat 7.x (Linux 2.4.x kernel + glibc 2.2.x)

  10. New Ports in 6.4.0 (cont.) • ”Clipped" support (no checkpointing, PVM, or remote system calls, but all other functionality is available) • Windows 2000 • Mac OS X

  11. Secure Communication • Secure network communication • Strong user authentication • Multiple methods supported: Kerberos, X509, NT LanMan, … • Encryption • Integrity • Authorization based on host or user

  12. New Job Universes • MPI Universe • Launch MPI jobs linked with MPICH library • Globus Universe • Faster, more reliable, better integrated • Java Universe

  13. Java Universe Job universe = java executable = Main.class jar_files = MyLibrary.jar input = infile output = outfile arguments = Main 1 2 3 queue condor_submit

  14. Why not use Vanilla Universe for Java jobs? • Java Universe provides more than just inserting “java” at the start of the execute line • Knows which machines have a JVM installed • Knows the location, version, and performance of JVM on each machine • Provides more information about Java job completion than just JVM exit code • Program runs in a Java wrapper, allowing Condor to report Java exceptions, etc.

  15. Java support, cont. condor_status -java Name JavaVendor Ver State Activity LoadAv Mem aish.cs.wisc. Sun Microsy 1.2.2 Owner Idle 0.000 249 anfrom.cs.wis Sun Microsy 1.2.2 Owner Idle 0.030 249 babe.cs.wisc. Sun Microsy 1.2.2 Claimed Busy 1.120 123 ...

  16. Condor File Transfer • Condor will transfer job files from the submit machine to the execute machine • Files to send and/or receive specified at submit time • Transfer is atomic • All files are transferred, or transfer fails • Appeared in v6.2 only in Condor for Windows

  17. File Transfer, cont. • Example: transfer_input_files = x, y, z … transfer_output_files = a, b, c …. transfer_files = [ ALWAYS | ONEXIT ] • Note: Condor can automatically figure out output files • Default: Send back any new/changed files

  18. Remote I/O Socket • Job can request that the condor_starter process on the execute machine create a Remote I/O Socket • Used for online access of file on submit machine – without Standard Universe. • Use in Vanilla, Java, … • Libraries provided for Java and for C, e.g. : Java: FileInputStream -> ChirpInputStream C : open() -> chirp_open()

  19. shadow starter Secure Remote I/O Local I/O (Chirp) I/O Server I/O Proxy Fork Local System Calls Job Home File System I/O Library Submission Site Execution Site

  20. Job Policy Expressions • User can supply job policy expressions in the submit file. • Can be used to describe a successful run. on_exit_remove = <expression> on_exit_hold = <expression> periodic_remove = <expression> periodic_hold = <expression>

  21. Job Policy Examples • Do not remove if exits with a signal: on_exit_remove = ExitBySignal == False • Place on hold if exits with nonzero status or ran for less than an hour: on_exit_hold = ((ExitBySignal==False) && (ExitSignal != 0)) || ((ServerStartTime – JobStartDate) < 3600) • Place on hold if job has spent more than 50% of its time suspended: periodic_hold = CumulativeSuspensionTime > (RemoteWallClockTime / 2.0)

  22. Firewall Support • Port Restrictions • In condor_config file can specify: LOWPORT = x HIGHPORT = y • All dynamic ports will be between x and y inclusive • Condor + Firewalls/Private Networks: • Who: Se-Chang Son • Time: 9am-12pm Weds • Where: rm 3387

  23. Condor on Windows • On both NT and Win2k • New universes added: MPI, Java, Scheduler (and Globus in the works!) • DAGMan ported • CondorView ported • Run shadow + DAGMan as the user • Allows submission from directories on shared filesystems

  24. And more… • Unix Man pages • Fetch/consolidate log files remotely • ClassAd chaining • Many DAGMan improvements • Bug fixes, etc…

  25. What’s Next?Future Directions • Increased focus on standalone tools built with Condor Technology • DAGMan • NeST • PFS • HawkEye • Condor-G …

  26. What’s Next? • Big Item: More focus on being a service provider than just an end-user tool: • Developer APIs / libraries • SOAP access to services • XML representations of user logs, ClassAds, accounting info, etc.

  27. More what’s next… • Condor on Windows • Increased support from Microsoft Research • Remote I/O • Complete Shared Filesystem support • Condor-G • MPI Scheduling Improvements

  28. More what’s next… • New version of ClassAds into Condor • Conditionals !! • if/then/else • Aggregates (lists, nested classads) • Built-in functions • String operations, pattern matching, time operators, unit conversions • Clean implementations in C++ and Java • ClassAd collections

  29. More what’s next… • Re-write of the condor_schedd • Performance enhancements and lowered resource requirements (particularly RAM) • Re-write of the checkpoint server • Add secure communication • NEST technology infusion • Enhanced support for multiple servers • Store meta-data along with checkpoint files

  30. Thank you for coming to Paradyn/Condor Week!

More Related