Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications
Paper by Piyush Shivam, Shivnath Babu, Jeffrey S. Chase (VLDB 2006)
Presented by Rahul Singh
CS691D: Hot-OS

Previous Work
• The problem of generating a cost model for a task G is reduced to learning a regression model that fits a set of m sample points collected by running G on different resource assignments
• Each run of G on a resource assignment produces one sample point: a vector of the assignment's hardware attribute values together with T, the total execution time

Challenges
• Collecting each sample takes time T, which may be hours or days for long-running scientific applications
• As the dimensionality of the data increases, more samples are needed to get an accurate model
• More samples are also needed to cover the full operating range

How big is the problem?
• Consider a 5-dimensional space with 10 distinct values per dimension and an average sample acquisition time of 10 minutes
• Collecting just 1% of the total sample space takes (1/100) * 10^5 * 10 / 60 / 24 ≈ 6.9 days
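
The estimate on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function and parameter names are illustrative and do not come from the paper.

```python
def sampling_days(dimensions, levels_per_dimension, fraction, minutes_per_sample):
    """Days needed to collect `fraction` of the full sample space, one run at a time."""
    total_points = levels_per_dimension ** dimensions   # 10^5 points on the slide
    samples = fraction * total_points                   # 1% of them
    return samples * minutes_per_sample / 60 / 24       # minutes -> hours -> days

days = sampling_days(dimensions=5, levels_per_dimension=10,
                     fraction=0.01, minutes_per_sample=10)
print(f"{days:.1f} days")
```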

What does NIMO do?
• NIMO (NonInvasive Modeling for Optimization)
• Active: deploys and monitors applications on heterogeneous resource assignments to gather training data
• Noninvasive: requires no changes to the OS or application software; all data is collected through monitoring tools
• Uses active sampling of resource assignments to accelerate convergence to an accurate cost model

Components
• Scheduler
  • Enumerates, selects, and executes plans for workflows
• Modeling engine
  • Resource profiler
  • Data profiler
  • Application profiler
• Workbench
  • Proactively runs plans to collect samples

Scheduler
• Given a workflow G:
  • Enumerate candidate resource assignments
  • Estimate the cost of each using the cost function
  • Choose the execution plan with the least estimated completion time
• Sometimes a staging task is needed between tasks
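
The enumerate, estimate, choose loop above can be sketched as follows. The attribute names, their values, and `toy_cost` are assumptions made up for illustration; in NIMO the cost function would be the learned cost model, and staging tasks are not modeled here.

```python
from itertools import product

def choose_plan(cpu_speeds, memory_sizes, net_latencies, cost_fn):
    """Enumerate every candidate assignment and return the one with the lowest estimated cost."""
    candidates = product(cpu_speeds, memory_sizes, net_latencies)
    return min(candidates, key=lambda assignment: cost_fn(*assignment))

def toy_cost(cpu_ghz, mem_mb, latency_ms):
    # Hypothetical stand-in for a learned cost model: faster CPU and more
    # memory reduce the estimate, higher network latency increases it.
    return 100.0 / cpu_ghz + 500.0 / mem_mb + 2.0 * latency_ms

best = choose_plan([1.0, 2.0], [256, 512], [0, 10], toy_cost)
print(best)
```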

Modeling Engine
• Automatically generates a cost model for G

Workbench
• Heterogeneous pool of compute, network, and storage resources
• The modeling engine initiates new runs of G on selected resource assignments in the workbench to obtain sufficient training data
• Instrumentation data is collected during each run, then aggregated to compute occupancies

Total picture

Assumptions
• The dataset is left out as a variable, so cost models are learned for a specific task and a specific dataset, i.e. predictor functions take only the resource profile as input instead of both the resource profile and the data profile
• Shared access to a resource by multiple tasks is not handled
• The resources assigned to a task must remain constant throughout the execution of the task

Active accelerated learning of predictor functions

Running G on a resource assignment

Finding G’s occupancies on a resource assignment

Initialization
• Select a reference resource assignment
• 3 ways to select the reference assignment
  • Random assignment (Rand)
  • High-capacity assignment (Max): pick the fastest processor speed, the lowest network latency, and the storage with the highest transfer rate
  • Low-capacity assignment (Min)
• Initially the predictor functions are constant functions that return the occupancies observed while running on the reference resource assignment
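
A minimal sketch of the initialization step, under stated assumptions: the workbench contents and attribute names are hypothetical, and Min is assumed to be the mirror image of Max (the slide does not spell it out).

```python
import random

# Hypothetical workbench: attribute name -> available levels.
WORKBENCH = {
    "cpu_mhz": [450, 930, 1400],
    "net_latency_ms": [0, 6, 18],
    "disk_mb_per_s": [20, 40, 80],
}
# For latency, a lower value means more capacity.
LOWER_IS_BETTER = {"net_latency_ms"}

def reference_assignment(workbench, strategy, rng=random):
    """Pick one level per attribute according to the Rand, Max, or Min strategy."""
    chosen = {}
    for attr, levels in workbench.items():
        best, worst = max(levels), min(levels)
        if attr in LOWER_IS_BETTER:
            best, worst = worst, best
        if strategy == "max":
            chosen[attr] = best
        elif strategy == "min":
            chosen[attr] = worst
        elif strategy == "rand":
            chosen[attr] = rng.choice(levels)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return chosen

def constant_predictors(reference_occupancies):
    """Initial predictors ignore the profile and return the occupancy seen on the reference run."""
    return {name: (lambda profile, value=value: value)
            for name, value in reference_occupancies.items()}

max_ref = reference_assignment(WORKBENCH, "max")
min_ref = reference_assignment(WORKBENCH, "min")
# Made-up occupancies measured on the reference run.
predictors = constant_predictors({"compute": 0.7, "network": 0.1, "storage": 0.2})
```

Binding `value=value` in the lambda keeps each predictor tied to its own occupancy rather than to the last one seen in the loop.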

Which predictor function to refine?
• Static schemes
• Dynamic schemes

Static schemes
• Decide a total ordering of the predictor functions
• Decide a traversal plan for picking a predictor function to refine in each iteration

Static schemes - Ordering techniques
• Domain-knowledge-based
  • An expert may know that G is likely to be CPU-intensive for most resource assignments, so the compute-occupancy predictor function should come first in the ordering and be refined first
• Relevance-based
  • Estimate the relevance of each predictor function for G(I) using the classical Plackett-Burman design with foldover (PBDF) statistical technique, then sort in decreasing order of effect
  • To order the 4 functions, perform 8 runs of G(I) on predefined resource assignments

Static schemes - Traversal plan
• Round-robin
  • Choose the predictor function in round-robin fashion from the total order
• Improvement-based
  • Traverse the total order from beginning to end
  • Keep refining the current predictor function until the reduction in prediction error falls below a threshold
  • After the last predictor function, start again from the beginning of the total order
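
The two traversal plans can be sketched as below. The predictor names, the `refine` callback, and the error reductions are all hypothetical; `max_steps` is added only so the example terminates.

```python
from itertools import cycle, islice

def round_robin(order):
    """Yield predictor names in the total order, wrapping around forever."""
    return cycle(order)

def improvement_based(order, refine, threshold, max_steps):
    """Stay on one predictor until a refinement step reduces the error by less
    than `threshold`, then move to the next; wrap around at the end of the order.

    `refine(name)` is a hypothetical callback that performs one refinement
    step and returns the reduction in prediction error it achieved.
    """
    picked, position = [], 0
    while len(picked) < max_steps:
        name = order[position % len(order)]
        picked.append(name)
        if refine(name) < threshold:
            position += 1
    return picked

order = ["compute", "network", "storage"]  # hypothetical predictor names

# Made-up error reductions returned by successive refinement steps.
gains = {
    "compute": iter([5.0, 3.0, 0.5]),
    "network": iter([0.2]),
    "storage": iter([2.0, 0.4]),
}

rr_picks = list(islice(round_robin(order), 5))
ib_picks = improvement_based(order, lambda name: next(gains[name]),
                             threshold=1.0, max_steps=6)
```

With these made-up gains, the improvement-based plan spends three iterations on the first predictor before moving on, while round-robin gives each predictor one iteration per pass.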

Dynamic schemes
• The predictor function to refine is chosen based on the training samples collected so far
• Refine the predictor function with the maximum current prediction error
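
In code, the dynamic choice reduces to an arg-max over the current errors. The names and error values below are made up for illustration.

```python
def pick_predictor_to_refine(current_errors):
    """current_errors: predictor name -> current prediction error (e.g. MAPE in percent)."""
    return max(current_errors, key=current_errors.get)

errors = {"compute": 12.0, "network": 31.5, "storage": 8.2}  # made-up values
choice = pick_predictor_to_refine(errors)
print(choice)  # network
```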

Adding a new attribute to a predictor function
• When should a new resource-profile attribute be added to a predictor function?
• Which of the k attributes in the resource profile should be added for the maximum reduction in prediction error?
• First define a total order over the attributes, then define a traversal order

Total ordering over attributes
• Domain-knowledge-based
  • A domain expert specifies a total order
  • For instance, the expert may know that the task has a purely sequential I/O pattern, so the memory-size attribute may have minimal effect on the compute occupancy and is placed towards the end of the total order
• Relevance-based
  • Estimate the effect of each resource-profile attribute on the occupancy predicted by the predictor function using PBDF
  • Order the attributes in decreasing order of effect
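
A rough sketch of relevance-based ordering, under stated assumptions: the design matrix is a standard 4-run, 3-factor two-level orthogonal design plus its sign-reversed foldover (8 runs), chosen for illustration and not necessarily the matrix used in the paper. The attribute names and the synthetic `toy_occupancy` response are made up; in practice each response would come from running the task on the workbench.

```python
# Two-level design: +1 = high level, -1 = low level, one column per attribute.
BASE = [(+1, +1, -1), (-1, +1, +1), (+1, -1, +1), (-1, -1, -1)]
DESIGN = BASE + [tuple(-x for x in row) for row in BASE]  # foldover rows

ATTRIBUTES = ["cpu_speed", "memory_size", "net_latency"]  # hypothetical names

def toy_occupancy(levels):
    # Synthetic response standing in for a measured occupancy.
    cpu, mem, lat = levels
    return 10.0 + 4.0 * cpu + 1.0 * mem + 0.0 * lat

def main_effects(design, responses):
    """Effect of each factor: mean response at the high level minus mean at the low level."""
    effects = []
    for j in range(len(design[0])):
        high = [y for row, y in zip(design, responses) if row[j] > 0]
        low = [y for row, y in zip(design, responses) if row[j] < 0]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects

def order_by_relevance(names, effects):
    """Sort names by decreasing magnitude of effect."""
    ranked = sorted(zip(names, effects), key=lambda pair: abs(pair[1]), reverse=True)
    return [name for name, _ in ranked]

responses = [toy_occupancy(row) for row in DESIGN]
effects = main_effects(DESIGN, responses)
ranking = order_by_relevance(ATTRIBUTES, effects)
```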

Traversal order over attributes
• Improvement-based
  • Add the next attribute when the reduction in prediction error achieved during an iteration with the current predictor function falls below a threshold
  • Resume from the beginning once all attributes are exhausted

Selection of a new resource assignment
• New samples for learning are collected by choosing different resource assignments
• Two things to keep in mind:
  • Covering the full operating range, to avoid learning functions that are valid only for specific regions
    • If processor speed is sampled only in the low range, prefetching may hide the I/O latency
  • Capturing interactions among attributes
    • The effect of changing processor speed on the compute occupancy may depend on the value of network latency
    • This interaction may or may not have an impact on the overall execution time

Techniques for selecting a new resource assignment

The notation
• Techniques are named in the form L<levels>-I<interactions>, as in Lmax-I1 and L2-I2
• The L part denotes the number of distinct values, or levels, in each attribute’s operating range covered by the technique
• The I part denotes the largest degree of interaction among the attributes guaranteed to be captured by the technique

Lmax-I1

L2-I2
• 2 levels per attribute
• Pairwise interactions among attributes

Current Prediction Error
• Cross-validation
  • For each sample s of the m samples collected, learn the predictor function from all samples except s, then use it to predict s
  • The current prediction error is the mean absolute percentage error (MAPE) over these predictions
• Fixed test set
  • Designate a small subset of the resource assignments in the workbench as an internal test set
  • The test-set assignments can be selected randomly
  • In the initialization step, run the task on each assignment in the test set
  • The current prediction error is the MAPE in predicting the occupancy for each element of this set
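
The cross-validation estimate can be sketched as leave-one-out MAPE. This is an illustration under assumptions: `fit_line` is a one-attribute least-squares fit swapped in as a stand-in learner, not the regression used by NIMO, and the samples are made up.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def loo_cv_error(samples, fit):
    """Leave-one-out error: predict each sample from a model trained on all the others.

    samples: list of (x, y) pairs; fit(train) must return a predict(x) function.
    """
    actual, predicted = [], []
    for i, (x, y) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        predict = fit(train)
        actual.append(y)
        predicted.append(predict(x))
    return mape(actual, predicted)

def fit_line(train):
    # Ordinary least squares on a single attribute (stand-in learner).
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in train) / sxx
    return lambda x: mean_y + slope * (x - mean_x)

# Made-up (attribute value, occupancy) samples that lie exactly on a line.
samples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
cv_error = loo_cv_error(samples, fit_line)
```

The fixed-test-set variant would call `mape` directly on the measured and predicted occupancies of the test-set assignments, with no retraining per sample.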

Test Bed
• 4 scientific tasks
• The workbench has
  • 5 CPU speeds
  • 5 memory sizes
  • 6 network latencies
• This gives 5 * 5 * 6 = 150 candidate resource assignments in total

Impact of reference assignments

Observations
• Max has the highest resource capacity, so it takes the shortest time to finish the first run
• Prediction errors may drop sharply with the addition of a relevant attribute
• Training samples collected with Min and Rand are more representative of the sample space

Impact of choosing predictor to refine

Observations
• Improvement-based is sensitive to the ordering of the predictor functions: it may keep improving a predictor that has little effect
• Round-robin is less sensitive to the order
• Dynamic gets stuck in a local minimum: it keeps refining only one function

Impact of choice of adding new attributes

Impact of choice of selecting new sample assignments

Impact of choice of computing current prediction error

Design of Experiments
• A branch of statistics that studies the planned investigation of factors affecting system performance

Designed experiments
In a designed experiment, the data-producing process is actively manipulated to improve the quality of information and to eliminate redundant data. A common goal of all experimental designs is to collect data as parsimoniously as possible while providing sufficient information to accurately estimate model parameters.

Various ways to design experiments
• Full factorial designs
  • With k two-level factors, all 2^k runs have to be performed
  • Not feasible when the number of factors is large
• Fractional factorial designs
  • Run only a fraction of the 2^k combinations
• Plackett-Burman designs
  • Economical: the number of runs is only a multiple of 4
  • Efficient when only main effects are of interest
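
To make the difference in run counts concrete, the sketch below enumerates a two-level full factorial design and computes the run count of the smallest Plackett-Burman design for k factors, using the standard property that an N-run Plackett-Burman design (N a multiple of 4) accommodates up to N - 1 factors. The function names are illustrative, and the sketch does not construct the Plackett-Burman matrix itself.

```python
from itertools import product

def full_factorial(k):
    """All 2^k combinations of low (-1) and high (+1) levels for k factors."""
    return list(product((-1, +1), repeat=k))

def plackett_burman_runs(k):
    """Smallest multiple of 4 that is strictly greater than k."""
    return 4 * (k // 4 + 1)

for k in (3, 7, 11):
    print(k, len(full_factorial(k)), plackett_burman_runs(k))
```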

Guide to designing experiments