HELIX: Holistic Optimization for Accelerating Machine Learning Iterations

HELIX: Holistic Optimization for Accelerating Iterative Machine Learning Doris Xin, Stephen Macke, Litian Ma, Jialin Liu, Shuchen Song, AdityaParameswaran University of Illinois (UIUC)

Machine Learning is Everywhere! Machine learning development is inherently iterative

Application: Census Income

W0 data.csv age edu. gender occ. Data Preprocessing Logistic Regression Modeling Predictions Evaluation Accuracy

W0 W1 W2 SVM 2hrs 2hrs 2hrs AUC

Improvement via Iteration • Knobs to tune • Data preprocessing: e.g., data cleaning, adding features • Modeling: e.g., hyperparameters, model type • Post processing: e.g., evaluation metrics • Can take hundreds of iterations! • Currently rerun workflow end-to-end in every iteration • 1stiter: 2 hrs 2nditer: 2 hrs  3rditer: 2 hrs • Materializeintermediate results to speed up subsequent executions through reuse W A S T E F U L

30min 5min 2hrs

What to materialize?

What to reuse? • Suppose everything was materialized • Reload input to changed nodes?

Recap . • ML is iterative • Data preprocessing (e.g. add features) • Modeling (e.g. hyperparameter) • Post processing (e.g. evaluation metric) • Materialize and reuse! Need to • Materialize to handle future changes • Reuseoptimally to minimize overall runtime more reuse

VLDB ‘18 Demo

The Workflow Lifecycle Compile Time Run Time Reuse Optimizer Materialization Optimizer

Reuse: Optimal Execution Plan n2 n1 n3 n4 n7 n6 n5 n8 Compute s Load Prune s.t. or

Reuse: Optimal Execution Plan n2 n1 n3 n4 n7 n6 n5 n8 Bad! Compute s Load Prune s.t. or

Reuse: Optimal Execution Plan n2 n1 n3 n4 n7 n6 n5 n8 Compute s Load Prune s.t. or Total cost = load time for ’s + compute time for ’s

Reuse: Optimal Execution Plan n2 n1 n3 n4 n7 n6 n5 n8 MinCut

Reuse: Optimal Execution Plan n2 n1 n3 n4 n7 n6 n5 n8 MinCut Guaranteed to have the lowest cost Proof in Technical Report at https://arxiv.org/pdf/1812.05762.pdf

Optimal Materialization Plan n’6 n3 n1 n2 n5 n8 n2 n7 n4 n5 n6 n7 n4 n3 n1 n8 n0 n0 Total cost = n5 n3 n4 Mat. cost + Wt+1 Wt

Optimal Materialization Plan n8 n1 n3 n4 n6 n5 n2 n7 n2 n1 n4 n7 n6 n5 n8 n3 n0 n0 Total cost = n4 n3 n5 Mat. cost + Wt+1 Wt

Optimal Materialization Plan n8 n1 n3 n7 n6 n5 n2 n4 n2 n1 n3 n7 n6 n5 n8 n4 n0 n0 Total cost = n4 n3 n5 Mat. cost + Optimal runtime from reuse Wt+1 Wt

Optimal Materialization Plan Total cost = mat. cost + reduced runtime from reuse at time t at time t + 1 penalty benefit Redundancy Mat. nothing Mat. everything Cost Mat. NP-hard even if Wt =Wt+1 Proof in Technical Report at https://arxiv.org/pdf/1812.05762.pdf

Streaming Heuristic n2 n1 n3 n4 n7 n6 n5 n8 Make mat. decisions immediately upon completion  no need to cache Intuition: larger ancestor graph  more valuable to materialize • time to materialize • time to get inputs If this is larger, materialize

Empirical Evaluation Data Preprocessing Modeling Evaluation Materialization 19x Helix: per iteration runtime breakdown and much more in the paper!

Takeaways • ML dev. is iterative. Opportunities for for reuse • Helix automates this  ~20x speedup • Mat. & reuse generalizes to other settings • Parallel executions for hyperparameter search • Sharing across verticals in enterprise • Agile development for ML/AI

dorx@berkeley.edu Questions?

HELIX: Holistic Optimization for Accelerating Machine Learning Iterations