Explore the concept of serverless architecture for deploying R models in a scalable and flexible manner on Azure. Learn how serverless computing can help data scientists put models into production easily, scale efficiently, and require minimal maintenance. Dive into the benefits of serverless computing, focusing on coding over maintenance, cost-effectiveness, fast scaling, and on-demand resources. Discover the components of a serverless architecture and ways to interact with cloud providers. Examine cost comparisons and use cases for model training and scoring within a serverless setup.
serveRless computing for R EARL 2019 – London by Thomas Laber
Agenda
01 The Problem: building a scalable and flexible pipeline to deploy R models
02 Serverless Architecture: what does this buzzword actually mean?
03 A solution architecture for Azure
The Problem: How can we build a cost-effective data science pipeline that allows data scientists using R to easily put their models into production, scales well and is easy to maintain? (Rare picture of the fabled "eierlegende Wollmilchsau", the egg-laying, wool-and-milk-giving pig that does everything at once.)
What we want: a serverless data science architecture. [Architecture diagram: per project (Project 1, Project 2, … Project n), an auto-scaling Batch Training stage (Pool 1, Pool 2, … Pool n) does a docker pull, gets training data and stores models in a model storage; Batch Scoring and Realtime Scoring, kicked off by a trigger or a REST call, get scoring data, read the stored models and write results.]
Agenda, part 02 – Serverless Architecture: what does this buzzword actually mean?
The Solution: "Just like wireless internet has wires somewhere, serverless architectures still have servers somewhere. What 'serverless' really means is that, as a developer, you don't have to think about those servers. You just focus on code." (AWS) [Diagram: components of a serverless architecture, from serverless.com]
Why serverless? The promise: focus on coding, not maintenance.
NO ADMINISTRATION: no server provisioning and maintenance is necessary; hardware and OS are abstracted away.
PAY-PER-USE: billing is based on actual compute resources used; no compute used, no costs.
FASTER TURNAROUND: spinning up new environments is quick and allows for faster experimentation.
SCALE ON DEMAND: scaling is automatic and part of the service.
The Evolution of the Cloud: cloud provider versus customer roles for managing cloud services. [Diagram: the stack (Applications, Scalability, Security, OS, Virtualization, Servers, Storage, Networking, Data Centers) shown across four models: Enterprise IT (legacy IT), Infrastructure as a Service (Virtual Machines), Platform as a Service (Containers) and Functions as a Service (Functions). Moving from legacy IT towards FaaS, more and more layers shift from customer managed to provider managed, until with FaaS the customer essentially only manages the application code.]
What now? VMs vs containers vs functions
Three ways to interact with cloud providers (terminology):
1. Browser (GUI)
2. Command Line Interface (CLI) plus templates (Azure: ARM templates, AWS: CloudFormation)
3. REST calls (an excellent tool for this: Postman)
Way 1: the browser (GUI).
Way 2: the Command Line Interface (CLI) and Infrastructure as Code (IaC) templates; Azure: ARM templates, AWS: CloudFormation.
Way 3: REST calls; an excellent tool for exploring these is Postman. (A small example in R follows below.)
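To make the REST option concrete, here is a minimal, hedged sketch in R using httr: it lists the container groups in a resource group via the Azure Resource Manager API. The subscription ID, resource group and bearer token are placeholders you supply yourself (e.g. a token obtained with the Azure CLI command az account get-access-token), and the api-version shown is one that was current around 2019.

library(httr)

# Placeholders: supply your own subscription, resource group and token.
subscription   <- "<subscription-id>"
resource_group <- "<resource-group>"
token          <- "<bearer-token>"

url <- sprintf(
  paste0("https://management.azure.com/subscriptions/%s/resourceGroups/%s",
         "/providers/Microsoft.ContainerInstance/containerGroups"),
  subscription, resource_group
)

resp <- GET(
  url,
  add_headers(Authorization = paste("Bearer", token)),
  query = list(`api-version` = "2018-10-01")
)
stop_for_status(resp)
str(content(resp, as = "parsed"), max.level = 2)   # inspect the returned container groups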
Cost Comparison: serverless can be cheap, but it depends on the workload. Big lock-in potential!
Cost comparison, example 1: you create a Linux container group with a 1 vCPU, 1 GB configuration once daily for a month (30 days); each container group runs for 5 minutes (300 seconds). Memory duration: €0.013; vCPU duration: €0.116; total ≈ €0.13.
Cost comparison, example 2: you create a Linux container group with a 1 vCPU, 2 GB configuration 50 times daily for a month (30 days); each container group runs for 150 seconds. Memory duration: €0.63; vCPU duration: €2.903; total ≈ €3.53.
Cost comparison, example 3: you create a Linux container group with a 4 vCPU, 8 GB configuration twice daily for a month (30 days); each container group runs for 1 hour (3,600 seconds). Memory duration: €2.42; vCPU duration: €11.146; total ≈ €13.56. (This calculation is reproduced in R below.)
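The arithmetic behind these examples can be reproduced with a few lines of R. The per-second rates below are assumptions back-calculated from the slide's totals (roughly the West Europe ACI prices at the time of the talk); check the current Azure Container Instances pricing page before relying on them.

# Reproducing the ACI cost examples above. The rates are assumptions.
memory_rate <- 0.0000014   # EUR per GB-second   (assumed)
vcpu_rate   <- 0.0000129   # EUR per vCPU-second (assumed)

aci_cost <- function(vcpu, memory_gb, runs_per_day, seconds_per_run, days = 30) {
  total_seconds <- runs_per_day * days * seconds_per_run
  memory_cost   <- total_seconds * memory_gb * memory_rate
  vcpu_cost     <- total_seconds * vcpu * vcpu_rate
  c(memory = memory_cost, vcpu = vcpu_cost, total = memory_cost + vcpu_cost)
}

round(aci_cost(1, 1,  1, 300),  3)  # example 1: roughly 0.013 + 0.116 = 0.13
round(aci_cost(1, 2, 50, 150),  3)  # example 2: roughly 0.63  + 2.90  = 3.53
round(aci_cost(4, 8,  2, 3600), 3)  # example 3: roughly 2.42  + 11.15 = 13.56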
Agenda, part 03 – A solution architecture for Azure.
Two use cases: model training and scoring have different architecture requirements.
• SCORING (our focus today): mostly short-running tasks; low resource usage; either ad hoc or on a schedule.
• TRAINING: usually long-running tasks; resource-intensive; mostly in batch mode.
Serverless options: we primarily looked at AWS Lambda, Azure Functions and Azure Container Instances.
Requirements: there are many ways to realize a serverless scoring architecture, with different pros and cons. The pipeline is: Trigger (must support at least a time or HTTP trigger) → Deploy Resources (custom runtime support is necessary!) → Load Model (loading from blob storage) → Serve Model (optionally serving scores over HTTP) → Score. (A minimal scoring-service sketch follows below.)
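As an illustration of the "load model, score, optionally serve over HTTP" steps, here is a minimal scoring-service sketch with plumber. It is not the talk's code: the model path, the model object and the endpoint names are hypothetical, and the model is assumed to have been pulled from blob storage onto the container's filesystem beforehand.

# plumber.R -- minimal scoring service sketch (hypothetical paths and model).
library(plumber)

# Assumed: the trained model was already downloaded from blob storage
# to /models/model.rds by the container's start-up script.
model <- readRDS("/models/model.rds")

#* Health check
#* @get /ping
function() list(status = "ok")

#* Score a batch of observations posted as JSON
#* @post /score
function(req) {
  newdata <- jsonlite::fromJSON(req$postBody)
  list(predictions = as.vector(predict(model, newdata = newdata)))
}

Started inside the container with plumber::plumb("plumber.R")$run(host = "0.0.0.0", port = 8000), this covers the HTTP-trigger case; a time trigger would instead run a plain Rscript that scores and writes results.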
Function as a Service: AWS Lambda (custom R runtime by Philipp Schirmer). The base layer (runtime.zip) contains the compiled R distribution (R/bin, R/lib, R/library, …) plus a bootstrap script and runtime.r; additional layers hold extra packages under R/library/ (package 1, package 2, … package n). A function can use up to 5 layers at a time, and the total unzipped size of the function and all layers can't exceed the unzipped deployment package size limit of 250 MB. Compiled packages can be a headache…
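For a feel of what the bootstrap / runtime.r pair does, here is a heavily simplified event-loop sketch against the documented AWS Lambda Runtime API. This is not the runtime referenced on the slide, and the handler is a made-up echo function.

# runtime.r -- minimal custom-runtime event loop sketch (illustration only).
library(httr)
library(jsonlite)

api <- Sys.getenv("AWS_LAMBDA_RUNTIME_API")

# Hypothetical handler: echo the event back.
handler <- function(event) list(echo = event)

repeat {
  # 1. Fetch the next invocation (long-polls until an event arrives).
  nxt <- GET(sprintf("http://%s/2018-06-01/runtime/invocation/next", api))
  request_id <- headers(nxt)[["lambda-runtime-aws-request-id"]]
  event <- fromJSON(content(nxt, as = "text", encoding = "UTF-8"))

  # 2. Run the handler and post the result back to the Runtime API.
  result <- handler(event)
  POST(
    sprintf("http://%s/2018-06-01/runtime/invocation/%s/response", api, request_id),
    body = toJSON(result, auto_unbox = TRUE)
  )
}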
Function as a Service: Azure Functions. [Diagram of the Functions V2 runtime: a .NET / .NET Core host process talks to separate language worker processes (Java, C#, …) that run the function code, communicating over gRPC ("a modern open source high performance RPC framework") using Protocol Buffers ("Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data"). Credits: Neal Fultz, Dirk Eddelbuettel.]
Why Azure Container Instances? Containers give us maximum flexibility regarding the runtime and reduce vendor lock-in.
PROS: supports arbitrary runtimes; no problems with compiled libraries; lots of supported triggers in combination with Logic Apps; low vendor lock-in; pay-as-you-go.
CONS: more setup involved compared to FaaS such as AWS Lambda; higher startup times compared to FaaS, depending on the image.
Docker basics (terminology): a Dockerfile is turned into a Docker image with docker build; running that image with docker run creates a Docker container. Example Dockerfile:
FROM rocker/rstudio:3.5.3
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY * ./TravisR/
WORKDIR /usr/src/app/TravisR
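The terminology maps onto two commands. A hedged way to drive them from R via system2 (the image name and RStudio password are made up; rocker/rstudio serves RStudio on port 8787):

# Build the image from the Dockerfile above, then run it as a container.
system2("docker", c("build", "-t", "travisr-demo:latest", "."))
system2("docker", c("run", "--rm", "-p", "8787:8787",
                    "-e", "PASSWORD=changeme", "travisr-demo:latest"))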
Architecture with ACI + Logic Apps (overview). Azure Container Instances + Logic App: our setup currently looks like this.
01 Write code: use RStudio or the IDE of your choice to write some arbitrary R code, ideally as a package.
02 Create Dockerfile: use serveRless to create a Dockerfile, or do it yourself.
03 Git repo: push your package and the Dockerfile to GitHub.
04 Create container registry: create a resource group and, more importantly, an Azure Container Registry (ACR).
05 Create ACR task: create a registry task which builds a Docker image based on your GitHub repo.
06 Create container instance: create an Azure Container Instance (ACI) pulling the Docker image from the ACR.
07 Manage it: use a Logic App or single REST calls to start and stop it.
(A sketch of steps 04 and 06 in R follows below.)
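A hedged sketch of steps 04 and 06 from R, using Hong Ooi's AzureRMR and AzureContainers packages (credited later in this deck). All names are made up, and the argument names follow the package documentation as I recall it, so treat them as assumptions and check ?create_acr and ?create_aci; step 05, the ACR task watching the GitHub repo, is set up separately in the portal or CLI.

library(AzureRMR)
library(AzureContainers)

az  <- get_azure_login()                          # interactive Azure login
sub <- az$get_subscription("<subscription-id>")
rg  <- sub$create_resource_group("earl-demo", location = "westeurope")

# 04: create an Azure Container Registry
acr <- rg$create_acr("earldemoacr")

# 06: create an Azure Container Instance pulling the image built by the
#     ACR task (argument names per the AzureContainers docs; assumption)
aci <- rg$create_aci(
  "earl-scoring",
  image = "earldemoacr.azurecr.io/travisr:latest",
  registry_creds = acr,
  cores = 1, memory = 2
)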
Logic App: a serverless workflow orchestration tool with a GUI for prototyping. [Screenshots: the Logic App designer and Logic App templating.] Infrastructure as Code + serverless: so many buzzwords.
Scoring workflow: a template workflow for a wide range of scenarios. Specify the trigger type (time, HTTP, email, etc.); create container resources based on a spec (CPU + RAM + nodes); check if the container group has spawned successfully; delete the container after the work is done. (A sketch of the start/delete REST calls follows below.)
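To illustrate the "manage it with single REST calls" idea, here is a hedged sketch that starts and then deletes an ACI container group via the Azure management API using httr. Subscription, resource group, container-group name and token are placeholders, and the api-version is one that was current around 2019.

library(httr)

base <- sprintf(
  paste0("https://management.azure.com/subscriptions/%s/resourceGroups/%s",
         "/providers/Microsoft.ContainerInstance/containerGroups/%s"),
  "<subscription-id>", "<resource-group>", "<container-group>"
)
auth <- add_headers(Authorization = paste("Bearer", "<bearer-token>"))
api  <- list(`api-version` = "2018-10-01")

# Kick off a scoring run by starting the (stopped) container group.
POST(paste0(base, "/start"), auth, query = api)

# ... wait for the work to finish, then clean up.
DELETE(base, auth, query = api)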
serveRless package: we want to build a package to help automate this setup. Idea: build an R package (serveRless) that allows R users to deploy their code in a serverless setup, together with an RStudio addin and automatic deployment of the infrastructure, so data scientists can deploy to the cloud easily (disclaimer: security…). Status: a prototype has been built to test the setup. How does this work? Code snippets; credits to Hong for the cloudyr packages and to httr for REST. Many thanks to Hong Ooi for his awesome work supporting R in Azure! And of course to Christoph Bodner and Florian Schwendinger, who are not here today.
Thank you for your attention! Feel free to reach out: linkedin.com/in/thomas-laber | data-science-austria.at | data-analytics.netlify.com