


  1. Metrics for the Office of Science HPC Centers Jonathan Carter User Services Group Lead jtcarter@lbl.gov NERSC User Group Meeting June 12, 2006

  2. Goals
  • Informational
    • Metrics Panel
    • Draft proposal
  • Solicit feedback
    • Are the proposed metrics reasonable?
    • Fine-tuning of the 'capability job' metrics

  3. Office of Science "Metrics Panel"
  • ASCR has asked a panel for recommendations about metrics
  • Panel is headed by Gordon Bell from Microsoft
  • Its goals:
    • performance measurement and assessment at Office of Science (SC) HPC facilities
    • appropriateness and comprehensiveness of the measures
    • science accomplishments and their effects on SC's science programs
    • provide input for the Office of Management and Budget (OMB)
    • evaluation of ASCR progress towards the long-term goals specified in the OMB Program Assessment Rating Tool (PART)
  • NERSC, ORNL, and ANL have provided input

  4. Current OMB PART Metrics
  • Acquisitions should exceed planned cost and schedule by no more than 10%. This metric is reasonable.
  • 40% of the computational time is used by jobs with a concurrency of 1/8 or more of the maximum usable compute CPUs. Meeting this metric has had both positive and negative effects: it motivated increased scaling of user codes, but it is not related to the quantity, quality, or productivity of the science.
  • Every year, several selected science applications are expected to increase efficiency by at least 50%. This metric was motivated by the desire to increase the percent of peak performance achieved by large science applications, a goal which now has less merit. It should be replaced by a scaling metric.
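
Under one reading of the efficiency metric (illustrative only: "efficiency" is taken here as the achieved fraction of peak, and every number below is made up, not a measured value), scoring it amounts to the following arithmetic:

    # One reading of the old efficiency metric: a selected application's achieved
    # fraction of peak must improve by at least 50% relative to the previous year.
    # All figures below are illustrative, not measured values.
    peak_gflops      = 9_100   # system peak (illustrative)
    last_year_gflops = 640     # sustained rate achieved last year (illustrative)
    this_year_gflops = 1_010   # sustained rate achieved this year (illustrative)

    last_eff = last_year_gflops / peak_gflops
    this_eff = this_year_gflops / peak_gflops
    improvement = this_eff / last_eff - 1.0

    print(f"Efficiency {last_eff:.1%} -> {this_eff:.1%}; improvement {improvement:.0%} (target >= 50%)")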

  5. Suggestions for PART Metrics
  • Three PART metrics are sufficient to demonstrate the DOE Office of Science's progress in advancing the state of high performance computing.
  • Cost-efficient and timely acquisitions are clearly important:
    • Metric #1 should be retained but slightly modified (scoring).
  • The primary interest of OMB is whether the computational resources in the Office of Science are facilitating scientific discovery; the PART metrics should reflect this interest:
    • Metrics #2 and #3 should be changed.

  6. Suggestions for PART Metrics
  • Scientific discovery is hard to measure in the near term.
  • We propose using the following sets of metrics to assess two factors that are highly influential on scientific discovery:
    • how well the computational facilities provision resources and services (Facility Metrics), and
    • how well computational scientists use these resources to produce science (Computational Science Metrics).
  • Some combination of these metrics should replace PART metrics #2 and #3.

  7. Metrics Terminology
  • Goal: the behavior being motivated
  • Metric: what is being measured
  • Value: the value for the metric that must be achieved
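
Purely as an illustration of this terminology (not anything the facilities actually implement), one can think of each metric as a small record tying the three pieces together:

    # Illustrative only: a minimal record for the Goal / Metric / Value terminology.
    from dataclasses import dataclass

    @dataclass
    class MetricRecord:
        goal: str    # the behavior being motivated
        metric: str  # what is being measured
        value: str   # the value for the metric that must be achieved

    example = MetricRecord(
        goal="User satisfaction",
        metric="Overall satisfaction score on the annual user survey",
        value=">= 5.25 out of 7",
    )
    print(example)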

  8. Facility Metrics
  • How well the computational facilities provision resources and services
  • The specifics of these goals and metrics impact your experience running at NERSC

  9. Facility Metrics: User Satisfaction
  Goal #1: User satisfaction. Meeting this metric means that users are satisfied with how well the facility provides resources and services.
  Metric #1.1: Users find the systems and services of a facility useful and helpful.
  Value #1.1: The overall satisfaction score on the annual user survey is 5.25 or better (out of 7).
  Metric #1.2: Facility responsiveness to user feedback.
  Value #1.2: User ratings improve in areas where previous ratings had fallen below 5.25 (out of 7).
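
As a rough sketch of how Metrics #1.1 and #1.2 could be scored (the survey areas and scores below are hypothetical, and the real survey's overall-satisfaction figure may be a single rating rather than an average over areas):

    # Hypothetical survey scores on the 1-7 scale used by the NERSC user survey.
    TARGET = 5.25
    this_year = {"consulting": 6.1, "queue turnaround": 4.9, "documentation": 5.6}
    last_year = {"consulting": 6.0, "queue turnaround": 4.7, "documentation": 5.5}

    # Metric #1.1: overall satisfaction (simplified here to a mean over areas).
    overall = sum(this_year.values()) / len(this_year)
    print(f"Overall satisfaction: {overall:.2f} (target {TARGET})")

    # Metric #1.2: areas previously below 5.25 should show an improved rating.
    for area, prev in last_year.items():
        if prev < TARGET:
            now = this_year[area]
            print(f"{area}: {prev} -> {now} ({'improved' if now > prev else 'not improved'})")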

  10. Facility Metrics: System Availability
  Goal #2: Office of Science systems are ready and able to process the user workload. Meeting this metric means the machines are up and available most of the time; availability has real meaning to users.
  Metric #2.1: Scheduled availability, the percentage of time a system is available for users, accounting for any scheduled downtime for maintenance and upgrades (see the sketch below).
  Value #2.1: Within 18 months of delivery and thereafter, scheduled availability is > 95%.
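
A minimal sketch of the availability arithmetic, under one common reading in which planned maintenance is excluded from the denominator and only unscheduled outages count against the system (all hours below are made up):

    # Scheduled availability under one common reading: planned maintenance is excluded
    # from the denominator; only unscheduled outages count against the system.
    hours_in_period      = 30 * 24   # a 30-day month
    scheduled_downtime   = 12        # planned maintenance hours (illustrative)
    unscheduled_downtime = 20        # crashes and emergency outages (illustrative)

    scheduled_window = hours_in_period - scheduled_downtime
    scheduled_availability = (scheduled_window - unscheduled_downtime) / scheduled_window

    print(f"Scheduled availability: {scheduled_availability:.2%} (target > 95%)")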

  11. Facility Metrics: Effective Assistance
  Goal #3: Facilities provide timely and effective assistance. Helping users make effective use of complex systems is a key service that leading computational facilities supply. Users want to know that their inquiries have been heard and are being worked on, and they need their problems answered properly and in a timely manner.
  Metric #3.1: Problems are recorded and acknowledged.
  Value #3.1: 99% of user problems are acknowledged within 4 working hours.
  Metric #3.2: Most problems are solved within a reasonable time.
  Value #3.2: 80% of user problems are addressed within 3 working days, either by resolving them or (for longer-term problems) by informing the user of a longer-term plan and providing periodic updates. (A scoring sketch follows below.)
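
A sketch of how the two response-time values could be checked against ticket data (the ticket records below are invented):

    # Hypothetical ticket data: hours to first acknowledgement, and working days until
    # the problem was resolved or a longer-term plan was communicated to the user.
    ack_hours    = [0.5, 1.2, 3.8, 2.0, 6.0, 1.1]
    address_days = [1, 2, 5, 0.5, 3, 2]

    ack_rate     = sum(h <= 4 for h in ack_hours) / len(ack_hours)        # Metric #3.1
    address_rate = sum(d <= 3 for d in address_days) / len(address_days)  # Metric #3.2

    print(f"Acknowledged within 4 working hours: {ack_rate:.0%} (target 99%)")
    print(f"Addressed within 3 working days:     {address_rate:.0%} (target 80%)")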

  12. Facility Metrics: Facilitating Capability Jobs
  Goal #4: Facilitate the running of capability jobs. Major computational facilities have to run capability jobs. What counts as a capability job needs to be defined by agreement between the Program Office and the facility: the number of processors that defines a capability job is a function of the number of available processors and of the number and kind of projects or users that the facility supports. This function has not yet been determined.
  Metric #4.1: The majority of computational time goes to capability jobs.
  Value #4.1: T% of all computational time will go to jobs that use more than N CPUs (or x% of the available processors). (A scoring sketch follows below.)
  Metric #4.2: Capability jobs are provided excellent turnaround.
  Value #4.2: For capability jobs, the expansion factor is X or less.
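
A sketch of how Value #4.1 could be scored once N and T are agreed; the threshold, target, and job records below are placeholders rather than values from the slides:

    # Placeholder values: N and T must be set by agreement between the Program Office
    # and the facility; the job records are invented.
    N_CPUS   = 760     # capability threshold in CPUs (placeholder)
    TARGET_T = 0.50    # required share of time for capability jobs (placeholder)

    jobs = [(6080, 4.0), (1024, 12.0), (64, 48.0), (512, 6.0)]  # (cpus, wallclock hours)

    total_time      = sum(cpus * hours for cpus, hours in jobs)
    capability_time = sum(cpus * hours for cpus, hours in jobs if cpus > N_CPUS)

    share = capability_time / total_time
    print(f"Capability share: {share:.1%} (target {TARGET_T:.0%})")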

  13. Discussion: What is a Capability Job?
  • A job using 1/8 of the processors?
  • A job using 1/10 of the processors?
  • A project that received ≥ 3% of the DOE allocation (3 such projects at NERSC)?
  • A project that received ≥ 2% of the total allocation (12 projects)?
  • A project that received ≥ 1% of the total allocation (25 projects)?
  • A function of both the number of processors and the number of projects at a facility, e.g. 10 * max_procs / num_projects (reproduced in the sketch below)?
    • NERSC: 10 * 6080 procs / 300 projects = 202 processors
    • Leadership: 10 * 10,000 procs / 20 projects = 5,000 processors
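
The candidate formula in the last bullet is simple enough to write down directly; the sketch below just reproduces the slide's two worked examples (truncating to whole processors, which gives the 202 figure):

    # Capability-job threshold from the slide's example formula: 10 * max_procs / num_projects.
    def capability_threshold(max_procs: int, num_projects: int) -> int:
        return int(10 * max_procs / num_projects)   # truncate to whole processors

    print(capability_threshold(6_080, 300))    # NERSC example      -> 202 processors
    print(capability_threshold(10_000, 20))    # Leadership example -> 5000 processors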

  14. Discussion: Should We Have a Target Expansion Factor?
  • Relationship between expansion factor and allocations:
    • There is an inverse relationship between the expected expansion factor and the percentage of the resource that is allocated: the more that gets allocated, the longer the wait times and the higher the expansion factor.
  • To which class of jobs should an expansion factor metric apply?
    • Capability jobs only?
    • All regular-charge jobs?
    • Other?
  • To which machines should an expansion factor metric apply?
    • Only the largest machine at a facility?
    • All machines, each weighted by its contribution to the total allocation?

  15. Discussion: What Should the Target Expansion Factor Be?
  • Traditional expansion factor: E(job) = (wait_time + run_time) / run_time
  • Proposed formula (only the requested time can influence scheduling decisions): E(job) = (wait_time + request_time) / request_time
  • Weight to use in computing the expansion factor for a class of jobs (see the sketch below):
    • Simple average
    • Request time
    • Request time * number of processors (this gives more weight to capability jobs)
  • When to start counting wait time?
    • On Seaborg and Bassi: when the job enters the Idle state
    • On Jacquard: when the job was submitted (this will change with the Maui scheduler)
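
A sketch combining the slide's two per-job formulas with one of the candidate weightings (request time * number of processors); the job records below are invented:

    # Per-job expansion factors, following the two formulas on this slide.
    def expansion_traditional(wait_time, run_time):
        return (wait_time + run_time) / run_time

    def expansion_proposed(wait_time, request_time):
        # Only the requested time is known when the scheduler makes its decision.
        return (wait_time + request_time) / request_time

    # Aggregate over a class of jobs using the request_time * processors weighting,
    # which gives more weight to capability jobs. The records below are invented.
    jobs = [
        # (wait hours, requested hours, processors)
        (10.0, 6.0, 2048),
        (2.0,  8.0, 256),
        (30.0, 12.0, 4096),
    ]
    weights = [req * procs for _, req, procs in jobs]
    factors = [expansion_proposed(wait, req) for wait, req, _ in jobs]
    weighted = sum(w * e for w, e in zip(weights, factors)) / sum(weights)
    print(f"Weighted expansion factor: {weighted:.2f}")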

  16. Past NERSC Expansion Factors for Regular Charge Class

  17. Past Seaborg Expansion Factors for Regular Charge Class

  18. Computational Science Metrics • Ability of projects to use facility resources for science

  19. Computational Science Metrics: Science Progress
  CS Goal #1: Science progress. While there are many laudable science goals, it is vital that significant computational progress is made against the Nation's science challenges and questions.
  Metric #CS1.1: Progress is demonstrated toward the scientific milestones of the top 20 projects at each facility, based on the simulation results planned and promised in their project proposals.
  Value #CS1.1: For the top 20 projects at each facility, the related program office assesses how well scientific milestones were met or exceeded relative to the plans determined during the review period.

  20. Computational Science Metrics: Code Scalability
  CS Goal #2: Scalability of computational science applications. The major challenge facing computational science over the next five to ten years is the increased parallelism needed to use more computational resources; multi-core chips accelerate the need to respond to this challenge.
  Metric #CS2.1: Science applications should increase in scalability.
  Value #CS2.1: The scalability of selected applications increases by a factor of 2 every three years. The definition of scalability (strong, weak, etc.) might be domain- and/or code-specific. (A sketch of this target follows below.)
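
Read as a compound target, Value #CS2.1 implies a growth factor of 2^(years/3); the sketch below checks a measured improvement against that curve. The baseline and current figures are invented, and the choice of scalability measure is left code-specific, as on the slide:

    # "Factor of 2 every three years" as a compound target: 2 ** (years / 3).
    # How "scalability" is measured (strong/weak scaling, usable concurrency, ...)
    # remains domain- and code-specific; the numbers below are invented.
    def target_factor(years):
        return 2.0 ** (years / 3.0)

    baseline_scalability = 2_048   # e.g. processors used productively at baseline
    current_scalability  = 6_144   # measured three years later
    years_elapsed        = 3.0

    measured = current_scalability / baseline_scalability
    print(f"Measured {measured:.1f}x vs target {target_factor(years_elapsed):.1f}x over {years_elapsed:.0f} years")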
