
A Litany on Usability: A mostly users view

David Evensky, Sandia National Laboratories, Livermore, CA


Presentation Transcript


  1. A Litany on Usability: A mostly users view
     David Evensky, Sandia National Laboratories, Livermore, CA

  2. Resource Allocation
     • Users can allocate CPUs, but we still have to share storage, networks, and the messaging fabric.
     • QoS for the heavily contended control fabric (messaging?)
     • Node allocation to prevent space-shared jobs from suffering crosstalk.
     • Fragmentation of this kind has been conjectured as a cause of intermittent slowdowns for some codes on space-shared machines.
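
The node-allocation point can be illustrated with a minimal sketch. It assumes a deliberately simplified machine in which nodes sit on a 1-D line and "adjacent" means consecutive indices; real machines use meshes, tori, or switched fabrics, and the slides do not describe any particular allocator. The name `allocate_contiguous` is hypothetical.

```python
from typing import List, Optional


def allocate_contiguous(free: List[bool], count: int) -> Optional[List[int]]:
    """Return the indices of the first run of `count` adjacent free nodes.

    Assumption: nodes are arranged on a 1-D line, so a job placed on
    consecutive indices does not interleave with another job's nodes.
    Returns None when no such run exists, rather than scattering the job
    across fragments.
    """
    if count <= 0:
        raise ValueError("count must be positive")

    run_start = 0  # index where the current run of free nodes began
    run_len = 0    # length of the current run of free nodes
    for i, is_free in enumerate(free):
        if is_free:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == count:
                return list(range(run_start, run_start + count))
        else:
            run_len = 0  # a busy node breaks the run
    return None


# Example: node 1 and node 5 are busy.
free_nodes = [True, False, True, True, True, False]
three = allocate_contiguous(free_nodes, 3)
four = allocate_contiguous(free_nodes, 4)
```

Refusing a fragmented placement trades utilization for isolation: the job may wait longer, but it does not share a stretch of the fabric with another job's traffic.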

  3. Architectural Assumptions: "What do you mean there is no default route?"
     • Tools available in the community at large (open source and commercial) make assumptions about disks per node, network topology, and direct network reachability (for example, no direct routes, only ssh tunnels).
     • Storage and I/O remain problems.

  4. Architectural Assumptions: "But my distributed objects code uses Java!"
     • Systems that are optimized for a single class of problems have difficulty running other classes of problems.
     • Systems must be designed in cooperation with the user community (language set, apparent disk per node).
     • Users should not have to see the node structure; resource management systems should take care of the basic functionality. There should be a clean way of exposing the node structure to those who wish to fully optimize their performance.
     • I should be able to log in to the systems that I am using, if I want or need to.

  5. Need standards and common practices
     • How can I get my data (big D) into the cluster for processing, and get it back out? How do I prevent any solution to this problem from being an ad hoc stovepipe solution?
     • Need standard tools: everyone who builds a cluster rewrites similar tools for administrators and users.

  6. Attic
     • Dusty decks: how can I quickly run old programs on modern machines without wasting large amounts of valuable programmer time?
     • Even not-so-dusty ones have to be rewritten at every purchase variation.
     • Special optimizations for specific processors and networks are still crucial to achieving performance.

  7. Open source: "Linux rules" / "I am elite, give me warz, dude"
     • Using an open source OS and home-grown systems, we need to provide users with:
       • understandable documentation
       • training
       • development assistance to run effectively on this class of machines.

  8. Fault Tolerance
     • Make it easy for users to do checkpoint/restart.
     • Integrate it with the resource management system.
     • When a hardware fault occurs (assuming the hardware gives us that information), report to the user that the job was killed because of the hardware, rather than leaving a mysterious 'Bus Error'.
     • Process migration (auto-magic migration to hot spares).
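
As a rough illustration of what "easy" checkpoint/restart could look like at the application level, here is a minimal sketch. It is not taken from the talk: the function names (`save_checkpoint`, `load_checkpoint`, `run`), the JSON file format, and the `fail_at` parameter used to simulate a node failure are all assumptions made for this example. A system-level facility integrated with the resource manager would look quite different.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write `state` atomically: write a temp file, then rename over `path`.

    A crash in the middle of the write leaves the previous checkpoint intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, checkpoint_every=10, fail_at=None):
    """Sum the integers 0..total_steps-1, resuming from `path` if present.

    `fail_at` is a hypothetical knob that simulates a fault at a given step.
    """
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

After a failure, calling `run` again with the same path resumes from the last saved step instead of step 0; only the work done since that checkpoint is repeated.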
