
Experiences with SSS software Architecture in a “Production” Environment

Rick Bradshaw, Narayan Desai, Andrew Lusk, Rusty Lusk, Brian Pellin. Mathematics and Computer Science Division, Argonne National Laboratory. The “SSS on Chiba” Project.


Presentation Transcript


  1. Experiences with SSS software Architecture in a “Production” Environment
     Rick Bradshaw, Narayan Desai, Andrew Lusk, Rusty Lusk, Brian Pellin
     Mathematics and Computer Science Division, Argonne National Laboratory

  2. The “SSS on Chiba” Project
     This was a summer project launched shortly after the last face-to-face meeting in June.
     Outline:
     • Definition of Project
     • Motivation
     • Limitations
     • Approach
     • Experiences
     • Status and Plans
     • Distribution

  3. Project Definition
     • Chiba City consists of 256 dual-processor nodes running Linux, with Myrinet and Fast Ethernet
     • Scalability testbed
     • Project: determine whether the SSS component architecture could be used to replace the existing Chiba City system software, consisting of
       • PBS
       • Maui scheduler
       • Home-grown user software for distributing files and executables
         • No shared file system
       • Home-grown system software for managing nodes

  4. Motivation
     • Needed better systems software on Chiba City cluster
       • In general
       • For testing other SSS components (e.g. checkpointing)
       • For enabling Chiba as a testbed for scalable OS research
     • Needed to more thoroughly test existing ANL-written components
       • Stand-alone components
         • Build-and-Config Manager, Process Manager, Event Manager
       • Infrastructure components
         • Service Directory, Communication Library
     • Needed more experience with published XML interfaces
     • Had extra programming muscle available over the summer

  5. Limitations
     • Needed to do this very fast, before summer resources evaporated
     • Chiba is in constant use by research computer scientists (e.g., developing a parallel file system) and computational scientists (e.g., physics, biology)

  6. Approach
     • Utilize assets on hand
       • Some central components (SD, EM, PM, Comm Library)
       • Existing publicized XML interfaces for these
       • Python programmers
     • Write stubs for other essential components
       • Scheduler
         • Nothing fancy
         • Only does FIFO with reservations and backfill
       • QM
         • Interface among user, scheduler, process manager
         • But some extra capabilities
           • Multiple job steps, e.g., to distribute files
           • Specify OS image to be loaded, to support testbed function
           • “PBS compatibility mode”, to allow users to reuse their job submission scripts
       • Use restriction syntax for stubs for simplicity and speed
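
The slides do not show the stub scheduler itself. As a rough illustration of what "FIFO with reservations and backfill" can mean, the sketch below implements one common variant (EASY-style backfill) in Python. Everything in it is an assumption made for illustration: the `Job` class, the `pick_jobs` function, and the reading of "reservation" as a start-time reservation for the blocked job at the head of the queue. It is not the SSS scheduler's code or interface.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record, not an SSS data structure.
    name: str
    nodes: int      # number of nodes requested
    walltime: int   # requested run time, in arbitrary time units


def pick_jobs(queue, free_nodes, running, now=0):
    """Return the names of jobs to start at time `now`.

    queue:   list of Job in submission (FIFO) order
    running: list of (end_time, nodes) for jobs already executing
    """
    started = []
    queue = list(queue)

    # Plain FIFO: start jobs from the head of the queue while they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running = running + [(now + job.walltime, job.nodes)]
        started.append(job.name)
    if not queue:
        return started

    # The head job is blocked. Reserve the earliest time at which
    # enough nodes will have been released for it to start.
    head = queue[0]
    avail = free_nodes
    shadow_time = None
    for end_time, nodes in sorted(running):
        avail += nodes
        if avail >= head.nodes:
            shadow_time = end_time
            break
    if shadow_time is None:
        return started  # head job cannot fit even on an idle machine
    extra = avail - head.nodes  # nodes still free once the head job starts

    # Backfill: a later job may start now only if it cannot delay the
    # head job's reservation.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        if now + job.walltime <= shadow_time:
            # Finishes before the reserved start time.
            free_nodes -= job.nodes
            started.append(job.name)
        elif job.nodes <= extra:
            # Runs past the reservation, but only on nodes the head
            # job does not need.
            free_nodes -= job.nodes
            extra -= job.nodes
            started.append(job.name)
    return started


# Two nodes are free and six more are released at t=10. Job A (4 nodes)
# is blocked, so B is backfilled because it ends before A's reservation.
waiting = [Job("A", 4, 5), Job("B", 2, 8), Job("C", 2, 20)]
print(pick_jobs(waiting, free_nodes=2, running=[(10, 6)]))  # ['B']
```

The sketch only decides what to start at a single instant; a real scheduler would rerun this decision whenever a job ends or arrives, and would also have to handle any administrative reservations the slide may be referring to.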

  7. Experiences
     • At end of summer, after 2-week shakedown, we convinced Chiba management to go forward rather than reinstall old software. (No more PBS.)
     • Have been running user job mix for about three weeks, with no disasters.
     • Shook out some ambiguities in XML specification for component interfaces
     • Fixed bugs
     • Found and fixed scalability problems

  8. Status and Plans
     • Status
       • Working
       • Collecting user experiences
     • Plans
       • Short term
         • Incorporate other components from Process Management Working Group
           • Paul: kernel module, LAM support, and CP Manager
           • Craig: monitoring and data warehouse
       • Long term
         • Other components from rest of project, especially Resource Management Working Group components
         • Provide Chiba for OS experimentation as part of normal batch-scheduled jobs, e.g. Sandia group

  9. The End
