MLflow: A Platform for the Complete Machine Learning Lifecycle
Corey Zumar
March 27th, 2019
Outline
• Overview of ML development challenges
• How MLflow tackles these challenges
• MLflow components
• Demo
• How to get started
Machine Learning Development is Complex

ML Lifecycle (diagram): Raw Data (e.g. in Delta) → Data Prep → Training & Tuning → Model Exchange → Deploy, with Scale and Governance concerns at every stage.
Custom ML Platforms
Facebook FBLearner, Uber Michelangelo, Google TFX
• Standardize the data prep / training / deploy loop: if you work with the platform, you get these benefits
• But they are limited to a few algorithms or frameworks
• And tied to one company's infrastructure
Can we provide similar benefits in an open manner?
Introducing MLflow
An open machine learning platform:
• Works with any ML library & language
• Runs the same way anywhere (e.g. any cloud)
• Designed to be useful for 1 or 1000+ person orgs
MLflow Components
• Tracking: record and query experiments: code, configs, results, etc.
• Projects: packaging format for reproducible runs on any platform (a folder of code + data files with an "MLproject" description file)
• Models: general model format that supports diverse deployment tools
Key Concepts in Tracking
• Parameters: key-value inputs to your code
• Metrics: numeric values (can update over time)
• Artifacts: arbitrary files, including data and models
• Source: the training code that ran
• Version: the version of the training code
• Tags and notes: any additional information
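A minimal sketch (not taken from the talk) of how these concepts map onto the Python tracking API; the parameter names, values, and file name are illustrative:

    import mlflow

    # A small local file to log as an artifact (illustrative only).
    with open("notes.txt", "w") as f:
        f.write("baseline experiment")

    with mlflow.start_run():
        mlflow.log_param("alpha", 0.5)          # parameter: key-value input
        mlflow.log_metric("mse", 2.1)           # metric: numeric value...
        mlflow.log_metric("mse", 1.7)           # ...that can be updated over time
        mlflow.log_artifact("notes.txt")        # artifact: arbitrary file
        mlflow.set_tag("purpose", "baseline")   # tag/note: additional information
        # Source and code version (e.g. the git commit) are recorded automatically.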
MLflow Tracking (architecture diagram): tracking APIs (REST, Python, Java, R) and the web UI talk to a tracking server through its API.
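For example, a client can be pointed at a shared tracking server; the host, port, and experiment name below are assumptions, not from the talk:

    import mlflow

    # Point the client (and its REST calls) at a running tracking server.
    mlflow.set_tracking_uri("http://localhost:5000")
    mlflow.set_experiment("digit-classification")   # hypothetical experiment name

    with mlflow.start_run():
        mlflow.log_param("layers", 4)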
MLflow Tracking
Record and query experiments: code, configs, results, etc.

    import mlflow

    with mlflow.start_run():
        mlflow.log_param("layers", layers)
        mlflow.log_param("alpha", alpha)
        # train model
        mlflow.log_metric("mse", model.mse())
        mlflow.log_artifact("plot", model.plot(test_df))
        mlflow.tensorflow.log_model(model)
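The "query" half of tracking can also be scripted. A sketch assuming a newer MLflow release than the one discussed here, where the fluent search API returns runs as a pandas DataFrame; the filter assumes the "mse" metric and "alpha" parameter logged above:

    import mlflow

    # Find runs whose logged "mse" metric is below a threshold.
    runs = mlflow.search_runs(filter_string="metrics.mse < 2.5")
    print(runs[["run_id", "params.alpha", "metrics.mse"]])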
MLflow backend stores
• Entity store
  • FileStore (local filesystem)
  • SQLStore (via SQLAlchemy)
  • REST store
• Artifact repository
  • S3-backed store
  • Azure Blob Storage
  • Google Cloud Storage
  • DBFS artifact repo
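The entity store is selected through the tracking URI; a sketch with assumed URIs (adjust paths, databases, and buckets to your environment):

    import mlflow

    mlflow.set_tracking_uri("./mlruns")                      # FileStore on the local filesystem
    # mlflow.set_tracking_uri("sqlite:///mlflow.db")         # SQLStore via SQLAlchemy
    # mlflow.set_tracking_uri("http://tracking-host:5000")   # REST store (remote tracking server)

    # Artifact repositories are configured when the server is launched, e.g. an S3 bucket:
    #   mlflow server --backend-store-uri sqlite:///mlflow.db \
    #                 --default-artifact-root s3://my-bucket/mlflow-artifacts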
Demo
Goal: classify hand-drawn digits
• Instrument Keras training code with MLflow tracking APIs
• Run training code as an MLflow Project
• Deploy an MLflow Model for real-time serving
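A hypothetical sketch of the first demo step, instrumenting Keras digit-classification training with the tracking APIs. This is not the presenter's demo code; the architecture and parameter names are placeholders, and it uses the mlflow.keras integration current at the time of the talk (newer releases route Keras models through mlflow.tensorflow.log_model):

    import mlflow
    import mlflow.keras
    from tensorflow import keras

    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
    x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

    model = keras.Sequential([
        keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    with mlflow.start_run():
        mlflow.log_param("hidden_units", 128)
        model.fit(x_train, y_train, epochs=2, batch_size=128)
        test_loss, test_acc = model.evaluate(x_test, y_test)
        mlflow.log_metric("test_accuracy", float(test_acc))
        mlflow.keras.log_model(model, "model")   # save the trained model as an MLflow Model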
MLflow Projects: Motivation (diagram): a diverse set of training tools meets a diverse set of environments; the result is that ML code is difficult to productionize.
MLflow Projects (diagram): a project spec (code, config, dependencies, data) can be executed either locally or remotely.
MLflow Projects
Packaging format for reproducible ML runs
• Any code folder or GitHub repository
• Optional MLproject file with project configuration
Defines dependencies for reproducibility
• Conda (plus R, Docker, …) dependencies can be specified in MLproject
• Reproducible in (almost) any environment
Execution API for running projects
• CLI / Python / R / Java
• Supports local and remote execution
Example MLflow Project

    my_project/
    ├── MLproject
    ├── conda.yaml
    ├── main.py
    └── model.py

MLproject file:

    conda_env: conda.yaml
    entry_points:
      main:
        parameters:
          training_data: path
          lambda: {type: float, default: 0.1}
        command: python main.py {training_data} {lambda}

Run it with:

    $ mlflow run git://<my_project>
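The same project could also be launched through the Python execution API; a sketch in which the repository URL, data path, and parameter values are illustrative:

    import mlflow

    submitted = mlflow.projects.run(
        uri="https://github.com/<org>/my_project",   # or a local path such as "./my_project"
        entry_point="main",
        parameters={"training_data": "data/train.csv", "lambda": 0.05},
    )
    print(submitted.run_id)   # the MLflow run created for this project execution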
Demo
Goal: classify hand-drawn digits
• Instrument Keras training code with MLflow tracking APIs
• Run training code as an MLflow Project
• Deploy an MLflow Model for real-time serving
MLflow Models: Motivation (diagram): each ML framework must separately target every deployment path: inference code, batch & stream scoring, and serving tools.
MLflow Models (diagram): a standard model format with multiple "flavors" sits between the ML frameworks and the deployment targets (inference code, batch & stream scoring, serving tools).
MLflow Models
Packaging format for ML models
• Any directory with an MLmodel file
Defines dependencies for reproducibility
• Conda environment can be specified in the MLmodel configuration
Model creation utilities
• Save models from any framework in MLflow format
Deployment APIs
• CLI / Python / R / Java
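A sketch of that save → load → deploy path using an assumed scikit-learn model; the paths and port are illustrative, and older MLflow releases name the pyfunc loader load_pyfunc, as in the flavor example later in the deck:

    import mlflow.pyfunc
    import mlflow.sklearn
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression

    X, y = load_digits(return_X_y=True)
    model = LogisticRegression(max_iter=1000).fit(X, y)

    # Save the model in MLflow format: a directory with an MLmodel file.
    mlflow.sklearn.save_model(model, "digits_model")

    # Load it back through the framework-agnostic python_function flavor.
    loaded = mlflow.pyfunc.load_model("digits_model")
    print(loaded.predict(X[:5]))

    # Real-time REST serving from the CLI (exact command varies by MLflow version):
    #   mlflow models serve -m digits_model -p 5001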
Example MLflow Model

Created with mlflow.tensorflow.log_model(...):

    my_model/
    ├── MLmodel
    └── estimator/
        ├── saved_model.pb
        └── variables/
            ...

MLmodel file:

    run_id: 769915006efd4c4bbd662461
    time_created: 2018-06-28T12:34
    flavors:
      tensorflow:                      # usable with TensorFlow tools / APIs
        saved_model_dir: estimator
        signature_def_key: predict
      python_function:                 # usable with any Python tool
        loader_module: mlflow.tensorflow
Model Flavors Example (PyTorch)

Train a model, then log it:

    mlflow.pytorch.log_model()

Flavor 1: pyfunc

    predict = mlflow.pyfunc.load_pyfunc(…)
    predict(input_dataframe)

Flavor 2: PyTorch

    model = mlflow.pytorch.load_model(…)
    with torch.no_grad():
        model(input_tensor)
Demo
Goal: classify hand-drawn digits
• Instrument Keras training code with MLflow tracking APIs
• Run training code as an MLflow Project
• Deploy an MLflow Model for real-time serving
Get started with MLflow
• pip install mlflow to get started
• Find docs & examples at mlflow.org
• Join the community Slack: tinyurl.com/mlflow-slack
0.9.0 Release
MLflow 0.9.0 was released this week! Major features:
• Tracking server supports SQL backends via SQLAlchemy
• Pluggable tracking server backends
• Docker environments for Projects
• Custom Python models
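For example, the "custom Python models" feature lets arbitrary Python logic be packaged as a pyfunc model; a minimal sketch in which the class name and save path are illustrative:

    import mlflow.pyfunc

    class AddN(mlflow.pyfunc.PythonModel):
        """A trivial custom model that adds `n` to every value of its input."""

        def __init__(self, n):
            self.n = n

        def predict(self, context, model_input):
            # model_input is typically a pandas DataFrame
            return model_input.apply(lambda column: column + self.n)

    mlflow.pyfunc.save_model(path="add_n_model", python_model=AddN(n=5))
    loaded = mlflow.pyfunc.load_model("add_n_model")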
Ongoing MLflow Roadmap
• UI scalability improvements (1.0)
• X-coordinate logging for metrics & batched logging (1.0)
• Fluent API for Java and Scala (1.0)
• Packaging projects with build steps (1.0+)
• Better environment isolation when loading models (1.0)
• Improved model schemas (1.0)