Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications Paper by - Piyush Shivam, Shivnath Babu, Jeffrey S. Chase (VLDB 2006) Presented by - Rahul Singh CS691D: Hot-OS
Previous Work • The problem of generating a cost model for a task G reduces to learning a regression model that fits a set of m sample points collected by running G on different resource assignments • Each run of G on a resource assignment R produces a vector ⟨a1, …, ak, T⟩, where each ai is a hardware attribute of R and T is the total execution time
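To make the regression-model idea concrete, here is a minimal sketch in Python. The sample matrix, attribute choices, and values are hypothetical, and a plain linear least-squares fit stands in for whatever regression family the learner actually uses.

```python
import numpy as np

# Hypothetical training data: each row is one run of task G on a
# resource assignment -- hardware attributes (CPU GHz, memory GB,
# network latency ms) plus the measured execution time T (hours).
samples = np.array([
    [1.0, 2.0, 5.0, 10.2],
    [2.0, 4.0, 1.0,  4.1],
    [3.0, 2.0, 0.5,  3.0],
    [1.5, 8.0, 2.0,  6.8],
])
X, T = samples[:, :-1], samples[:, -1]

# Fit a linear cost model T ~ w . attrs + b via least squares.
A = np.hstack([X, np.ones((len(X), 1))])        # add intercept column
coef, *_ = np.linalg.lstsq(A, T, rcond=None)

def predict_cost(attrs):
    """Predict execution time of G on a new resource assignment."""
    return np.append(attrs, 1.0) @ coef

print(predict_cost([2.5, 4.0, 1.0]))
```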
Challenges • Taking each sample takes time T, which may be hours or days for long-running scientific applications • As the dimensionality of the data increases, we need more samples to get an accurate model • We also need more samples to cover the operating range
How big is the problem? • Consider a 5-dimensional space with 10 distinct values per dimension and an average sample-acquisition time of 10 minutes. To cover just 1% of the total sample space we need (1/100) × 10^5 × 10 min ÷ 60 ÷ 24 ≈ 6.9 days
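The same back-of-the-envelope calculation, spelled out:

```python
# Reproducing the estimate above.
levels, dims = 10, 5
total_points = levels ** dims        # 10^5 points in the sample space
sampled = total_points // 100        # 1% of the space = 1000 samples
minutes = sampled * 10               # 10 minutes per sample
print(minutes / 60 / 24)             # ~6.9 days
```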
What NIMO does • NIMO (NonInvasive Modeling for Optimization) • Is active: deploys and monitors applications on heterogeneous resource assignments to gather training data • Is noninvasive: requires no changes to the OS or application s/w; all data is collected through monitoring tools • Uses active sampling of resource assignments to accelerate convergence to an accurate cost model
Components • Scheduler • Enumerates, selects, and executes plans for workflows • Modeling engine • Resource profiler • Data profiler • Application profiler • Workbench • Proactively runs plans to collect samples
Scheduler • Given a workflow G • Enumerate candidate resource assignments • Estimate the cost of each using the cost function • Choose the execution plan with the least estimated completion time • Sometimes there may be a staging task between tasks
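A minimal sketch of the plan-selection step; the plan names, staging costs, and execution costs below are hypothetical stand-ins for what the cost model would estimate.

```python
# Pick the execution plan with the least estimated completion time,
# counting any staging step toward the total.
candidates = [
    {"plan": "fast-cpu/slow-net", "staging_hours": 0.5, "exec_hours": 4.1},
    {"plan": "slow-cpu/fast-net", "staging_hours": 0.1, "exec_hours": 3.4},
    {"plan": "balanced",          "staging_hours": 0.2, "exec_hours": 3.5},
]
best = min(candidates, key=lambda c: c["staging_hours"] + c["exec_hours"])
print(best["plan"])   # slow-cpu/fast-net
```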
Modeling Engine • Generates the cost model for G automatically
Workbench • Heterogeneous pool of compute, network, and storage resources • The modeling engine initiates new runs of G on selected resource assignments in the workbench to obtain sufficient training data • Instrumentation data is collected during each run and then aggregated to compute occupancies
Total picture
Assumptions • Leave out the dataset as a variable; hence we find cost models for a specific task and a specific dataset, i.e., the predictor functions take the form f(resource profile) instead of f(resource profile, data profile) • Does not handle shared access to a resource by multiple tasks • Resources assigned to a task must remain constant throughout the execution of the task
Active and accelerated learning of predictor functions
Running G on a resource assignment
Finding G’s occupancies on a resource assignment
Initialization • Select a reference resource assignment • 3 ways to select the reference assignment • Random assignment (Rand) • High-capacity assignment (Max): pick the fastest processor speed, lowest network latency, and highest-transfer-rate storage • Low-capacity assignment (Min) • Initially the predictor functions are constant functions that use the occupancy values observed while running on the reference resource assignment
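A sketch of the initialization step, assuming attribute tuples of the form (cpu_ghz, net_latency_ms, disk_mb_per_s); the workbench entries and the measured occupancies are made up for illustration.

```python
import random

# Hypothetical workbench: (cpu_ghz, net_latency_ms, disk_mb_per_s).
workbench = [(1.0, 5.0, 40), (2.0, 1.0, 80), (3.0, 0.5, 120), (1.5, 2.0, 60)]

def reference_assignment(policy):
    if policy == "rand":    # Random assignment
        return random.choice(workbench)
    if policy == "max":     # High capacity: fast CPU, low latency, fast disk
        return max(workbench, key=lambda a: (a[0], -a[1], a[2]))
    if policy == "min":     # Low-capacity assignment
        return min(workbench, key=lambda a: (a[0], -a[1], a[2]))
    raise ValueError(policy)

ref = reference_assignment("max")

# Occupancies measured during the single reference run (values made up)
# become the initial, constant predictor functions.
measured = {"cpu": 0.7, "net": 0.2, "io": 0.1}
predictors = {res: (lambda attrs, v=v: v) for res, v in measured.items()}
print(ref, predictors["cpu"](None))   # constant prediction: 0.7
```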
Predictor function to refine? • Static schemes • Dynamic schemes
Static schemes • Decide a total ordering of the predictor functions • Decide a traversal plan for picking a predictor function to refine in each iteration
Static schemes - Ordering techniques • Domain-knowledge-based • An expert may know that G is likely to be CPU-intensive for most resource assignments, so its predictor should come first in the ordering and be refined first • Relevance-based • Estimate the relevance of each predictor function to G(I) using the classical Plackett-Burman design with foldover (PBDF) statistical technique, and sort in decreasing order of effect • To order the 4 functions, perform 8 runs of G(I) on predefined resource assignments
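A hedged sketch of relevance-based ordering: main effects are estimated from a two-level ±1 design plus its foldover (the negated matrix), and factors are sorted by decreasing absolute effect. The 4-factor base matrix and the run times below are illustrative, not the exact design NIMO uses.

```python
import numpy as np

base = np.array([
    [+1, +1, -1, -1],
    [+1, -1, +1, -1],
    [-1, +1, +1, -1],
    [-1, -1, -1, +1],
])
design = np.vstack([base, -base])   # foldover: 8 runs for 4 factors
times = np.array([4.1, 6.3, 5.2, 9.8, 8.7, 6.0, 7.1, 3.9])  # hypothetical

# Main effect of factor j: mean response at its high level minus
# mean response at its low level.
effects = np.array([
    times[design[:, j] > 0].mean() - times[design[:, j] < 0].mean()
    for j in range(design.shape[1])
])
order = np.argsort(-np.abs(effects))  # most relevant factor first
print(order, effects)
```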
Static schemes - Traversal plan • Round-robin • Choose the predictor function in a round-robin fashion from the total order • Improvement-based • Traverse the total order from beginning to end • Keep refining the current predictor function until the reduction in prediction error falls below a threshold • When done with all predictor functions, start again from the beginning of the total order
Dynamic schemes • The predictor function to refine is chosen based on the training samples collected so far • Refine the predictor function with the maximum current prediction error
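The three selection policies side by side, as a sketch; the predictor names, error values, and threshold are illustrative.

```python
errors = {"cpu": 0.12, "net": 0.30, "io": 0.05}  # current prediction errors
order = ["cpu", "net", "io"]                     # static total order

def next_round_robin(step):
    # Static: cycle through the total order one function per iteration.
    return order[step % len(order)]

def next_improvement(current, improvement, threshold=0.01):
    # Static: stay on the current function until its error reduction
    # stalls, then move to the next one in the total order.
    if improvement >= threshold:
        return current
    return order[(order.index(current) + 1) % len(order)]

def next_dynamic():
    # Dynamic: refine whichever predictor currently has the worst error.
    return max(errors, key=errors.get)

print(next_round_robin(4), next_improvement("cpu", 0.002), next_dynamic())
```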
Adding a new attribute to a predictor function • When to add a new resource-profile attribute to a predictor function • Which of the k attributes from the resource profile to add for the maximum reduction in prediction error • First define a total order over the attributes, then define a traversal order
Total ordering over attributes • Domain-knowledge-based • A domain expert specifies a total order • For instance, the expert may know that the task has a purely sequential I/O pattern, and hence the memory-size attribute may have minimal effect on the compute occupancy; this attribute then goes towards the end of the total order • Relevance-based • Estimate the effect of each resource-profile attribute on the occupancy predicted by the predictor function using PBDF • Order the attributes in decreasing order of effect
Traversal order over attributes • Improvement-based • Add the next attribute when the reduction in prediction error achieved during an iteration with the current predictor function falls below a threshold • Resume from the beginning when all attributes are exhausted
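A small sketch of improvement-based attribute addition; the attribute names and the threshold are illustrative.

```python
attribute_order = ["cpu_speed", "memory_size", "cache_size", "net_latency"]

def maybe_add_attribute(active, last_improvement, threshold=0.01):
    """Grow the predictor's attribute set once refinement stalls."""
    if last_improvement < threshold and len(active) < len(attribute_order):
        return active + [attribute_order[len(active)]]
    return active

attrs = ["cpu_speed"]
attrs = maybe_add_attribute(attrs, last_improvement=0.003)
print(attrs)   # ['cpu_speed', 'memory_size']
```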
Selection of a new resource assignment • We need to collect new samples for learning by choosing different resource assignments • We need to keep in mind: • Covering the full operating range, to avoid learning functions that are valid only for specific regions • For example, if the processor speed is kept in low ranges, prefetching may hide the I/O latency • Capturing interactions among attributes • For example, the effect of changing processor speed on the compute occupancy may depend on the value of network latency • Such an interaction may or may not have an impact on the overall execution time
Techniques for selecting a new resource assignment
The notation Ll-Ii • Ll denotes the number of distinct values or levels in each attribute’s operating range covered by the technique • Ii denotes the largest degree of interactions among the attributes guaranteed to be captured by the technique
Lmax-I1 • Covers the maximum number of levels per attribute • Captures only individual (first-order) attribute effects
L2-I2 • 2 levels per attribute • Pairwise interactions among attributes
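As a simple stand-in for an L2-I2-style design, the sketch below enumerates a full two-level factorial (2^k runs), which covers two levels per attribute and captures pairwise interactions; a true fractional design would achieve I2 with fewer runs. The attribute levels are illustrative.

```python
from itertools import product

levels = {
    "cpu_ghz":        (1.0, 3.0),
    "memory_gb":      (2, 16),
    "net_latency_ms": (0.5, 5.0),
}
names = list(levels)
design = [dict(zip(names, combo)) for combo in product(*levels.values())]
print(len(design))   # 2^3 = 8 candidate assignments
```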
Current Prediction Error • Cross-validation • For each sample s out of the m samples collected, learn predictor functions using all samples except s, then find the mean absolute percentage error (MAPE) when predicting s with those functions • Fixed test set • Designate a small subset of resource assignments in the workbench as an internal test set • The test-set assignments can be selected randomly • In the initialization step we run the task on each of the assignments in the test set • The current prediction error is the MAPE in predicting the occupancies for each element of this set
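A sketch of the cross-validation variant: leave-one-out MAPE, with a linear least-squares model standing in for the learned predictor functions and made-up samples.

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 2.0], [1.5, 8.0], [2.5, 1.0]])
T = np.array([10.2, 4.1, 3.0, 6.8, 3.6])

errors = []
for s in range(len(X)):
    train = np.arange(len(X)) != s                      # hold out sample s
    A = np.hstack([X[train], np.ones((train.sum(), 1))])
    coef, *_ = np.linalg.lstsq(A, T[train], rcond=None)
    pred = np.append(X[s], 1.0) @ coef                  # predict held-out s
    errors.append(abs(pred - T[s]) / T[s])

print(100 * np.mean(errors))   # mean absolute percentage error
```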
Test Bed • 4 scientific tasks are used • The workbench has • 5 CPU speeds • 5 memory sizes • 6 network latencies • Hence we have a total of 5 × 5 × 6 = 150 candidate resource assignments
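Enumerating the workbench shows where the 150 comes from; the concrete speed, size, and latency values below are illustrative, only the counts match the slide.

```python
from itertools import product

cpu_speeds = [1.0, 1.4, 1.8, 2.2, 2.6]   # GHz
memories   = [1, 2, 4, 8, 16]            # GB
latencies  = [0.5, 1, 2, 4, 8, 16]       # ms
assignments = list(product(cpu_speeds, memories, latencies))
print(len(assignments))                  # 150 candidate assignments
```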
Impact of reference assignments
Observations • Max has the highest resource capacity, so it takes the shortest time to finish the first run • Prediction errors may drop sharply with the addition of a relevant attribute • Min and Rand yield training samples that are more representative of the sample space
Impact of choosing predictor to refine
Observations • Improvement-based is sensitive to the ordering of the predictor functions: you may keep improving a predictor that does not have much effect • Round-robin is less sensitive to the order • Dynamic can get stuck in a local minimum, where it keeps refining one function only
Impact of choice of adding new attributes
Impact of choice of selecting new sample assignments
Impact of choice of computing current prediction error
Design of Experiments • A branch of statistics that studies the planned investigation of factors affecting system performance
Designed experiments In a designed experiment, the data-producing process is actively manipulated to improve the quality of information and to eliminate redundant data. A common goal of all experimental designs is to collect data as parsimoniously as possible while providing sufficient information to accurately estimate model parameters.
Various ways to design experiments • Full factorial designs • With k factors, all 2^k runs have to be performed • Not feasible when there are many factors • Fractional factorial designs • Take only a fraction of the 2^k combinations • Plackett-Burman designs • Economical: the number of runs need only be a multiple of 4 • Efficient when only main effects are of interest
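For a feel of the run counts involved, a small helper comparing the three families for k two-level factors; the Plackett-Burman rule encodes that an N-run design handles up to N-1 factors, with N a multiple of 4.

```python
def full_factorial_runs(k):
    return 2 ** k

def fractional_factorial_runs(k, p):
    return 2 ** (k - p)        # a 1/2^p fraction of the full design

def plackett_burman_runs(k):
    # Smallest multiple of 4 that is strictly greater than k.
    return ((k // 4) + 1) * 4

for k in (4, 7, 10):
    print(k, full_factorial_runs(k), fractional_factorial_runs(k, 1),
          plackett_burman_runs(k))
```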
Guide to designing experiments
References: • http://www.itl.nist.gov/div898/handbook/pri/section3/pri3.htm • http://www.mathworks.com/access/helpdesk/help/toolbox/stats/index.html?/access/helpdesk/help/toolbox/stats/f7630.html • http://www.google.com/search?hl=en&q=fractional+factorial+design&btnG=Google+Search