SSS Deployment using OSCAR
John Mugler, Thomas Naughton & Stephen Scott
May 2005, Argonne, IL
SSS Face-to-face meeting

OSCAR: Cluster Toolkit
• Framework for cluster management
  • Simplifies installation, configuration and operation
  • Reduces time/learning curve for cluster build
    • Requires: pre-installed headnode with a supported Linux distribution
    • Thereafter: wizard guides user through setup/install of entire cluster
• Package-based framework
  • Content: Software + Configuration, Tests, Docs
  • Types:
    • Core: SIS, C3, Switcher, ODA, OPD, (Support Libs)
    • Non-core: selected & third-party
  • Access: repositories accessible via OPD/OPDer

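
The package taxonomy on this slide can be pictured with a small data model. The sketch below is illustrative only: the `Package` class, its fields, and `split_by_type` are hypothetical names invented here and are not part of OSCAR, which has its own package format. Only the core package names are taken from the slide.

```python
from dataclasses import dataclass

# Core package names as listed on the slide (support libraries omitted).
CORE_PACKAGES = {"SIS", "C3", "Switcher", "ODA", "OPD"}


@dataclass
class Package:
    """Hypothetical model of a package: software plus config, tests, and docs."""
    name: str
    has_config: bool = True
    has_tests: bool = False
    has_docs: bool = False

    @property
    def is_core(self) -> bool:
        # A package counts as core only if the slide lists it as core.
        return self.name in CORE_PACKAGES


def split_by_type(packages):
    """Return (core, non_core) lists of package names, preserving input order."""
    core = [p.name for p in packages if p.is_core]
    non_core = [p.name for p in packages if not p.is_core]
    return core, non_core


if __name__ == "__main__":
    pkgs = [Package("SIS"), Package("BLCR"), Package("C3"), Package("Gold")]
    print(split_by_type(pkgs))
```

Under this model, SSS components such as BLCR and Gold fall on the non-core side, which matches how sss-oscar adds them on top of the core set.
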
OSCAR Wizard
* OSCAR-3.0 release

Using OSCAR for SSS
Problem: Helping users obtain and install SSS software.
Solution: Leverage the OSCAR framework to package and distribute the SSS suite, sss-oscar.
sss-oscar: A release of OSCAR containing all SSS software in a single downloadable bundle.

OSCAR-ized SSS Components
• Bamboo – Queue/Job Manager
• BLCR – Berkeley Checkpoint/Restart
• Gold – Accounting & Allocation Management System
• LAM/MPI (w/ BLCR) – Checkpoint/Restart enabled MPI
• MAUI-SSS – Job Scheduler
• SSSLib – SSS Communication library
  • Includes: SD, EM, PM, BCM, NSM, NWI
• Warehouse – Distributed System Monitor
• MPD2 – MPI Process Manager
* As of May 2005

Current Status
• Released v1.0 at SC’04
  • Based on oscar-3.0 (using Red Hat 9/x86)
  • All SSS components represented
• Testing for v1.1 release
  • Small update release
  • Still oscar-3.0 based
• Synchronize with OSCAR release schedule
  • oscar-4.1 released
  • Shift to oscar-4.1 in sss-oscar-1.2 release (2Q2005)

OSCAR v4.1 Highlights
• SSS’s APItest tool integrated into v4.1 release
• Improved use of DepMan/PackMan abstraction layer
• Distributions supported in v4.1
  • x86: RH 9, FC2, MDK 10.0
  • x86 & ia64: RH EL 3
• Initial work started for Debian
  • Not in v4.1 release but working with 4.x devel tree

TODO: SSS
• Short term
  • Complete testing for v1.1beta & release
  • Update SSS documentation
• Medium term
  • Migrate to new FRE testbed and repository (pending approval)
  • New/more Linux distribution/architecture/kernel support
• Longer term
  • Extend SSS component tests
    1) Installation, 2) Validation, 3) Durability/Stress, 4) Performance
  • Track oscar-4.x releases for v5.0 compatibility
  • Distribute as OSCAR “Package Set”
    • Pending feature support in OSCAR
  • OPKG ordering within a phase
    • Pending feature support in OSCAR

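
The four test levels named above (installation, validation, durability/stress, performance) suggest running component tests in stages. The sketch below is one possible arrangement, not the SSS test harness: `Tier`, `run_tiers`, and the stop-at-first-failing-tier policy are assumptions made for illustration.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Test levels in the order given on the slide."""
    INSTALLATION = 1
    VALIDATION = 2
    DURABILITY_STRESS = 3
    PERFORMANCE = 4


def run_tiers(checks):
    """Run checks tier by tier and stop at the first tier with a failure.

    checks maps a Tier to a list of zero-argument callables returning bool.
    A tier with no checks passes vacuously. Returns the tiers that passed.
    """
    passed = []
    for tier in sorted(Tier):
        results = [check() for check in checks.get(tier, [])]
        if not all(results):
            break  # assumed policy: later tiers are skipped once one fails
        passed.append(tier)
    return passed


if __name__ == "__main__":
    # Placeholder checks; real ones would exercise an installed component.
    demo = {
        Tier.INSTALLATION: [lambda: True],
        Tier.VALIDATION: [lambda: False],
        Tier.PERFORMANCE: [lambda: True],
    }
    print(run_tiers(demo))
```

Stopping early reflects the idea that stress or performance numbers mean little for a component that did not install or validate; a harness could just as reasonably run every tier and report all results.
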
SSS-OSCAR Release Schedule
Add features to Tracker @ http://sf.net/projects/sss-oscar/

Roadmap
• 1.2 (frz: jun, rel: jul)
  • Fedora Core 2 / Pkg rebuild
  • BLCR upgrade to linux-2.6
  • Improved install/validation tests
  • oscar-4.1 opkg modifications (updates)
  • Updates to HOWTO as needed
  • Simplify XML meta file
  • Close (most) open tracker issues
• 2.0 (frz: aug, rel: sep)
  • LRS change over
  • Fedora Core 4 / Pkg rebuild
  • Improved install/validation tests
  • Add performance/stress tests?
  • oscar-4.x opkg modifications (updates)
  • Updates to HOWTO as needed
  • Meta-scheduler (Silver)?
• 2.0.1 (frz: oct, rel: nov) [SC’05]
  • Any bugfixes/minor updates
• 2.0.2
  • SSS oscar-pkg set

Goals for sss-oscar-2.0
• Release v2.0 at SC’05
  • Compatible with oscar-5.0
  • Support current Linux distribution(s)
• Improve interoperability with standard OSCAR
  • Users obtain via “SSS OSCAR Pkg Repository”
  • Likely leverage “Package Sets” for logical grouping
  • Clarify SSS package dependencies
    • What about outside of SSS-OSCAR?
• Improved testing
  • Supply thorough installation/validation/performance tests
• Documentation
  • Specifications for component interfaces (schemas), etc.

Comments/Discussion
• Provide a lower cost of entry
  • Doc to help knit system together
  • Clarify dependencies/interactions
    • Intra-component and inter-component
• Feedback to help Ron O. for testing/validation
  • Tests to verify against component specs.
    • Ex. The PM specs state X capability & it works in this build
    • Effectively conformance tests to “optional” SSS specs.
• What do we need to help coming releases?
  • Louder drum for Thomas?
  • Dedicated integration periods (face-to-face and/or virtual)?

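
The conformance idea raised above ("the PM specs state X capability & it works in this build") amounts to comparing what a spec states against what a build demonstrates. A minimal sketch, assuming capabilities can be represented as plain strings; `conformance_report` and the capability names in the example are hypothetical and are not taken from any SSS specification.

```python
def conformance_report(spec_capabilities, build_capabilities):
    """Compare capabilities stated by a spec with those a build provides.

    Returns a dict with sorted lists under 'met' and 'missing'.
    """
    spec = set(spec_capabilities)
    build = set(build_capabilities)
    return {
        "met": sorted(spec & build),      # stated by the spec and present in the build
        "missing": sorted(spec - build),  # stated by the spec but absent from the build
    }


if __name__ == "__main__":
    # Hypothetical capability names, for illustration only.
    pm_spec = ["start-job", "signal-job", "kill-job"]
    pm_build = ["start-job", "kill-job", "debug-shell"]
    print(conformance_report(pm_spec, pm_build))
```

Capabilities the build offers beyond the spec are ignored here, since the slide frames the tests as conformance to the specs rather than as an inventory of the build.
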
Resources
• ORNL test clusters
  • Systems: sss-xtorc, test1, test2
  • Access via ORNL SSH Login Server
  • Must do reservations/coordinate use (Note, no remote power mgmt)
  • Investigating ORNL “FRE” (enclaves)
    • Add “testX” system to alleviate ORNL SSH Login Server
• SSS-OSCAR Project page
  • Hosted at http://sourceforge.net/projects/sss-oscar/
• OSCAR Homepage
  • http://www.OpenClusterGroup.org/OSCAR/
  • Includes “HOWTO: Create an OSCAR Package” document