SSS Deployment using OSCAR
John Mugler, Thomas Naughton, Phil Pfeiffer & Stephen Scott
Aug 2004, Argonne, IL
SSS Face-to-face meeting
OSCAR: Cluster Toolkit
• Framework for cluster management
  • simplifies installation, configuration and operation
  • reduces time/learning curve for cluster build
  • requires: pre-installed headnode with a supported Linux distribution
  • thereafter: wizard guides user through setup/install of entire cluster
• Package-based framework
  • Content: Software + Configuration, Tests, Docs
  • Types:
    • Core: SIS, C3, Switcher, ODA, OPD, (Support Libs)
    • Non-core: selected & third-party
  • Access: repositories accessible via OPD/OPDer
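The package breakdown above (content plus a core/non-core type) can be pictured with a small data model. This is only an illustrative sketch: the `Package` class, its field names, and `split_by_kind` are hypothetical and do not reflect OSCAR's actual package layout or tooling. Only the core package names are taken from the slide.

```python
from dataclasses import dataclass, field

# Core package names as listed on the slide; anything else is treated as non-core.
CORE_PACKAGES = {"SIS", "C3", "Switcher", "ODA", "OPD"}


@dataclass
class Package:
    """Hypothetical model of an OSCAR package (not OSCAR's real format)."""
    name: str
    software: list = field(default_factory=list)  # assumed: software payload names
    config: list = field(default_factory=list)    # assumed: configuration files
    tests: list = field(default_factory=list)     # assumed: test scripts
    docs: list = field(default_factory=list)      # assumed: documentation files

    @property
    def kind(self) -> str:
        # Classify by name against the slide's list of core packages.
        return "core" if self.name in CORE_PACKAGES else "non-core"


def split_by_kind(packages):
    """Group package names into 'core' and 'non-core' lists, preserving order."""
    groups = {"core": [], "non-core": []}
    for pkg in packages:
        groups[pkg.kind].append(pkg.name)
    return groups


if __name__ == "__main__":
    print(split_by_kind([Package("C3"), Package("Gold"), Package("SIS")]))
```

In this sketch an SSS component such as Gold falls into the non-core group, matching the slide's "selected & third-party" category.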
OSCAR Wizard
* OSCAR-3.0 release
Using OSCAR for SSS
Problem: Helping users obtain and install SSS software.
Solution: Leverage the OSCAR framework to package and distribute the SSS suite, sss-oscar.
sss-oscar: A release of OSCAR containing all SSS software in a single downloadable bundle.
OSCAR-ized SSS Components
• Bamboo – Queue/Job Manager
• BLCR – Berkeley Checkpoint/Restart
• Gold – Accounting & Allocation Management System
• LAM/MPI (w/ BLCR) – Checkpoint/Restart enabled MPI
• MAUI-SSS – Job Scheduler
• SSSLib – SSS Communication library
  • Includes: SD, EM, PM, BCM, NSM, NWI
• Warehouse – Distributed System Monitor
• MPD2 – MPI Process Manager
* As of April 2004
Current Status
• Several 0.2 cuts (latest 0.2a8)
• 2 items remain in 0.2 Tracker, http://sf.net/projects/sss-oscar/
  • Both items have fixes in CVS, pending testing
• Added $OSCAR_PACKAGE_TEST_HOME and workarounds for current testing framework
• Ready for 0.2a9 & can test during meeting!
• After testing, post as pre-release on main WWW site
• Starting work on 0.3, etc.
  • New Gold pkg
  • OSCAR support for part of BCWG schema
TODO
• Integrate Gold into new releases
• Integrate APItest into OSCAR
  • SSS Component authors create their APItest cases
• Update individual SSS Pkgs as needed
• Update/Improve Documentation for v1.0
• Start weekly builds for testing (next slide)
• Improve testing/bug reporting (fixing)
Release Schedule
ADJUST - previous Sept. 3 freeze date for SC’04 release.
[Nov 8] SC’04 release sss-oscar-1.0
  • Whoo-hooo!
[Oct 4] SC’04 freeze
  • No changes except to fix approved bugs
[Sep *] weekly builds
  • Available first day of week by 12 noon
  • Untested “as-is” tarballs for tests/bug
  • Each developer to test their component for acceptance
    • Your pkg & any dependent pkgs install properly
    • Report/Respond to bugs in Tracker
    • Make appropriate fixes in CVS to remedy any errors
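As a rough illustration of the weekly-build cadence above, the build dates leading up to the Oct 4 freeze can be enumerated with a few lines of Python. This is a sketch under two assumptions that are not stated on the slide: that "first day of week" means Monday, and that builds start in the first week of September 2004 and stop before the freeze date. The function name `weekly_build_dates` is illustrative only.

```python
from datetime import date, timedelta

FREEZE = date(2004, 10, 4)   # [Oct 4] SC'04 freeze, from the slide
RELEASE = date(2004, 11, 8)  # [Nov 8] SC'04 release, from the slide


def weekly_build_dates(start, freeze, weekday=0):
    """Return each `weekday` (0 = Monday, an assumption) from `start`
    up to, but not including, `freeze`."""
    # Advance to the first matching weekday on or after `start`.
    current = start + timedelta(days=(weekday - start.weekday()) % 7)
    dates = []
    while current < freeze:
        dates.append(current)
        current += timedelta(weeks=1)
    return dates


if __name__ == "__main__":
    # Assumed start: 1 September 2004 (the slide only says "[Sep *]").
    for build_day in weekly_build_dates(date(2004, 9, 1), FREEZE):
        print(build_day.isoformat(), "weekly build due by 12 noon")
    print("days from freeze to release:", (RELEASE - FREEZE).days)
```

Under these assumptions there are four weekly builds before the freeze, which itself falls on a Monday.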
Resources
• ORNL “Test1” cluster
  • Full install tests & restore of headnode (2 compute victims)
  • Access via ORNL Login Server (examples/info pending)
  • Must do reservations/coordinate use
  • Restart nodes with care: no remote power mgmt
• Developer CVS repository
  • Hosted at http://sss-oscar.sf.net
  • Account requests: torc@msr.csm.ornl.gov
• OSCAR Homepage
  • http://oscar.OpenClusterGroup.org
  • Includes “HOWTO: Create an OSCAR Package” document