
Predictive Application-Performance Modeling in a Computational Grid Environment (HPDC ‘99)

This paper presents the use of locally-weighted memory-based learning to predict the resource usage of applications in a computational grid environment. A surprising result of the study is that the simplest prediction approach is often the best. The approach is implemented in PUNCH, a web-based, batch-oriented system for accessing non-interactive tools. The paper uses both synthetic and real datasets to argue for the effectiveness of the approach, and describes the optimizations and algorithm improvements made in PUNCH.





Presentation Transcript


  1. Predictive Application-Performance Modeling in a Computational Grid Environment (HPDC ‘99) Nirav Kapadia, José Fortes, Carla Brodley ECE, Purdue Presented by Peter Dinda, CMU

  2. Summary • Use locally-weighted memory-based learning (instance-based learning) to predict each application run’s resource usage based on parameters specified by an application expert and measurements of previous application runs. • Surprising result: simplest is best • Implemented in the PUNCH system

  3. Outline • PUNCH • Resource usage and application parameters • Locally-weighted, memory-based learning • Synthetic datasets argue for a sophisticated approach • Algorithm optimizations in PUNCH • Datasets from a real application argue for a mind-numbingly simple approach

  4. PUNCH • “Purdue University Network Computing Hub” • Web-based batch-oriented system for accessing non-interactive tools • Tool-specific forms guide user in setting up a run • command-line parameters, input and output files • PUNCH schedules run on shared resources • Extensively used: 500 users, 135K runs • Mostly students taking ECE classes • Wide range of tools (over 40) • Paper focuses on T-Supreme3 • Simulates silicon fabrication • Really bad ideas: batch-oriented matlab

  5. Resource Usage • PUNCH needs to know resource usage (CPU time) to schedule run • Resource usage depends on application-specific parameters • command-line and input file parameters • Which ones? Specified by app expert • 7 parameters for T-Supreme3 • What is the relationship? Learn it on-line using locally-weighted memory-based learning

  6. Locally-weighted Memory-based Learning • Each time you run the application, record the parameter values and the resource usage in a database • Parameter values x -> resource usage y is the function to be learned • Parameter values x define a point in domain • Predict resource usage yq of a new run whose parameters are xq based on database records xi -> yi where the xi are “close” to xq

  7. Answering a Query • Compute distance d from query point xq to all points xi in database • Select subset of points within some distance (the neighborhood kw) • Transform distances to neighborhood points into weights using a kernel function K (Gaussian, say) • Fit a local model that tries to minimize the weighted sum of squared errors for the neighborhood • linear regression, ad hoc, mind-numbingly simple, ... • Apply the model to the query
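The query steps above can be sketched as one small routine. This is an illustrative reconstruction, not the PUNCH implementation: the slides leave the exact distance metric and kernel underspecified, so Euclidean distance, a plain Gaussian kernel, and a weighted linear local model are assumptions here.

```python
import numpy as np

def lw_predict(xq, X, y, kw=1.0):
    """Locally-weighted prediction of resource usage at query point xq.

    X:  (n, d) array of parameter vectors from previous application runs
    y:  (n,) array of measured resource usages (e.g. CPU seconds)
    kw: neighborhood radius / bandwidth for the Gaussian kernel (assumed)
    """
    d = np.linalg.norm(X - xq, axis=1)      # distance to every stored run
    near = d <= kw                          # keep points inside the neighborhood
    if not near.any():                      # empty neighborhood: fall back to 1-NN
        return float(y[np.argmin(d)])
    w = np.exp(-(d[near] / kw) ** 2)        # Gaussian kernel weights
    # Local model: weighted linear regression minimizing sum_i w_i (a.x_i + b - y_i)^2
    A = np.hstack([X[near], np.ones((int(near.sum()), 1))])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(sw[:, None] * A, sw * y[near], rcond=None)
    return float(np.append(xq, 1.0) @ coef) # apply the local model to the query
```

With exactly linear data the weighted regression recovers the line, so a query at 1.5 on y = 2x returns 3.0; a query far from all stored runs falls back to the nearest neighbor.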

  8. PUNCH Approaches • I don’t understand their distance metric • Kernel is 1.0 to nearest neighbor and then Gaussian • 1-Nearest-Neighbor • Return the nearest neighbor • 3-Point Weighted Average • Return weighted average of 3 nearest points • Linear regression • 16 nearest points for T-Supreme3 • Theoretically much better than the others
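The two simplest PUNCH predictors can be sketched as follows. The slides do not define the weighting used in the 3-point average (and the presenter notes the distance metric is unclear), so inverse-distance weighting and Euclidean distance are assumptions made for illustration.

```python
import numpy as np

def nn1(xq, X, y):
    """1-Nearest-Neighbor: return the usage of the single closest stored run."""
    d = np.linalg.norm(X - xq, axis=1)
    return float(y[np.argmin(d)])

def weighted_avg_3(xq, X, y):
    """3-Point Weighted Average: mean of the 3 nearest runs' usages,
    weighted (here, by assumption) inversely by distance."""
    d = np.linalg.norm(X - xq, axis=1)
    idx = np.argsort(d)[:3]                 # indices of the 3 closest runs
    w = 1.0 / (d[idx] + 1e-9)               # guard against a zero distance
    return float(np.sum(w * y[idx]) / np.sum(w))
```

Both predictors answer a query in a single pass over the database, which is part of why "mind-numbingly simple" approaches are cheap enough to run on-line for every scheduled job.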

  9. Optimizations • 2-level database • Recent runs are preferred • Not clear how • May help when function is time dependent • when all students are doing the same homework • Significantly reduces query time • Instance editing • Add new runs only if incorrectly predicted • Remove runs that produce incorrect predictions • Shrink database without losing information
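The instance-editing idea (add a run only if it was mispredicted) can be sketched as below. The acceptance test is an assumption: the slides do not say how PUNCH decides a prediction is "incorrect," so a hypothetical 10% relative-error tolerance is used here, with a 1-NN predictor over a flat list standing in for the real two-level database.

```python
def predict_nn(db, xq):
    """1-NN prediction over a list of (parameter, usage) records; None if empty."""
    if not db:
        return None
    return min(db, key=lambda rec: abs(rec[0] - xq))[1]

def maybe_record(db, xq, actual, tol=0.10):
    """Instance editing: store the new run only if the current database
    mispredicted it by more than tol relative error (tol is an assumed
    threshold). Returns True if the run was added."""
    pred = predict_nn(db, xq)
    if pred is None or abs(pred - actual) > tol * actual:
        db.append((xq, actual))
        return True   # database learned something new
    return False      # prediction was already good enough; discard the run
```

Runs the database already predicts well are discarded, which is how the database shrinks without losing predictive information.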

  10. Conclusions • LWMBL looks like a promising approach to resource usage prediction in some cases • Needs a much more thorough study, though, even for this batch-oriented use • Simplest is best is difficult to believe • Paper is a reasonable introduction to LWMBL for the grid community
