1 / 35

computing for R

Explore the concept of serverless architecture for deploying R models in a scalable and flexible manner on Azure. Learn how serverless computing can help data scientists put models into production easily, scale efficiently, and require minimal maintenance. Dive into the benefits of serverless computing, focusing on coding over maintenance, cost-effectiveness, fast scaling, and on-demand resources. Discover the components of a serverless architecture and ways to interact with cloud providers. Examine cost comparisons and use cases for model training and scoring within a serverless setup.

reneec
Download Presentation

computing for R

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. serveRless computing for R EARL 2019 – London by Thomas Laber

  2. Agenda 01 02 03 Serverless Architecture The Problem Building a scalable and flexible pipeline to deploy R models What does this buzzword actually mean? A solution architecture for Azure

  3. Agenda 01 02 03 The Problem Serverless Architecture Building a scalable and flexible pipeline to deploy R models What does this buzzword actually mean? A solution architecture for Azure

  4. The Problem How can we build a cost-effective data science pipeline that allows data scientists using R to easily put their models into production, that scales well and is easy to maintain? Rare pictureofthefabled „eierlegende Wollmilch“

  5. What we want a serverless data science architecture get training data docker pull Batch Training Pool 1 Pool 2 … Pool n store models Model storage … Project 1 Project 2 Project n get scoring data read models write results Auto-Scaling Trigger REST Batch Scoring Realtime Scoring

  6. Agenda 01 02 03 The Problem Serverless Architecture Building a scalable and flexible pipeline to deploy R models What does this buzzword actually mean? A solution architecture for Azure

  7. The Solution Just like wireless internet has wires somewhere, serverless architectures still have servers somewhere.What ‘serverless’ really means is that, as a developer you don’t have to think about those servers. You just focus on code. (AWS) Components of a serverlessarchitecture serverless.com

  8. Why serverless? The promise: Focus on coding, not maintenance NO ADMINISTRATION PAY-PER-USE FASTER TURNAROUND SCALE ON DEMAND Scaling is automatic and part of the service. Billing is based on actual compute resources used. No compute used, no costs. No server provisioning and maintenance is necessary. Hardware and OS are abstracted away. Spinning up new environments is quick and allows for faster experimentation.

  9. The Evolution of the Cloud Cloud provider versus customerrolesformanagingcloudservices Entreprise IT Infrastructure as a Service Platform as a Service Functions as a Service legacy IT Virtual Machines Containers Functions Applications Applications Applications Applications Scalability Scalability Scalability Scalability Customer Managed Security Security Security Security OS OS OS OS Virtualization Virtualization Virtualization Virtualization Customer Managed Provider Managed Servers Servers Servers Servers Provider Managed Storage Storage Storage Storage Provider Managed Networking Networking Networking Networking Data Centers Data Centers Data Centers Data Centers

  10. What now? VMs vs containers vs functions

  11. Three ways to interact with Cloud Providers Terminology Browser (GUI) Command Line Interface (CLI) Azure: ARM-Templates AWS: CloudFormation REST calls excellent tool: Postman

  12. Three ways to interact with Cloud Providers Browser Browser (GUI)

  13. Three ways to interact with Cloud Providers Terminology Command Line Interface (CLI) Azure: ARM-Templates AWS: CloudFormation IaC (Infrastructure as Code)

  14. Three ways to interact with Cloud Providers Terminology REST calls excellent tool: Postman

  15. Cost Comparison Serverlesscanbecheap, but depends on workload Big lock-in potential!

  16. Cost Comparison Serverlesscanbecheap, but depends on workload You create a Linux container group with a 1 vCPU, 1 GB configuration once daily during a month (30 days). The duration of each container group is 5 minutes (300 seconds). Memory duration vCPUduration TOTAL = € 0.013 + € 0.116 = € 0.13

  17. Cost Comparison Serverlesscanbecheap, but depends on workload You create a Linux container group with a 1 vCPU, 2 GB configuration 50 times daily during a month (30 days). The container group duration is 150 seconds. Memory duration vCPUduration TOTAL = € 0.63 + € 2.903 = € 3.53

  18. Cost Comparison Serverlesscanbecheap, but depends on workload You create a Linux container group with a 4 vCPU, 8 GB configuration 2 times daily during a month (30 days). The container group duration is 1 hour (= 3600 seconds). Memory duration vCPUduration TOTAL = € 2.42 + € 11.146 = € 13.56

  19. Agenda 01 02 03 The Problem Serverless Architecture Building a scalable and flexible pipeline to deploy R models What does this buzzword actually mean? A solution architecture for Azure

  20. Two Use Cases Model training and scoring have different architecture requirements • SCORING • Mostly short running tasks • Resource usage low • Either adhoc or on schedule • TRAINING • Usually long running tasks • Resource intensive • Mostly in batch mode OUR FOCUS TODAY

  21. Serverless Options We primarily looked at the following options: AWS Lambda AzureFunctions Azure Container Instances

  22. Requirements Many ways to realize serverless scoring architecture with different pros and cons Loadingfromblobs Must support at least time/http trigger Trigger Deploy Resources Load model Serve Model Score Optionallyservingscoreswith HTTP Custom runtimesupportnecessary !

  23. Function as a Service AWS Lambda additional layers R/ └──library/ ├── package 1/ ├── package 2/ ├── package …/ └── package n/ • A function can use up to 5 layers at a time. The total unzipped size of the function and all layers can't exceed the unzipped deployment package size limit of 250MB. runtime.zip Philipp Schirmer R/ ├──bin/ ├──lib/ ├──library/ └──… bootstrap runtime.r runtime.zip Compiledpackagescanbe a headache… baselayer

  24. Function as a Service Azure Functions Neal Fultz Java C# … function code modern open source high performance RPC framework languageworkerprocess Protocol Buffers .NET .NET Core host Dirk Eddelbuettel Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data host process functions V2 runtime

  25. Why Azure Container? Container give us maximum flexibility regarding runtime and reduce vendor lock-in PROS CONS More setup involved compared to FaaS such as AWS Lambda Supports arbitrary runtimes Higher startup times compared to FaaS depending on Image No problems with compiled libraries Lots of supported triggers in combination with logic apps Low vendor lock-in Pay-as-you-go

  26. Docker Basics Terminology docker build docker run Docker Container Dockerfile Docker Image FROMrocker/rstudio:3.5.3 RUNmkdir -p /usr/src/app WORKDIR /usr/src/app COPY * ./TravisR/ WORKDIR /usr/src/app/TravisR

  27. Christoph Azure Container + Logic App Our setup currently looks like this 01 Write Code 01 User Rstudio or the IDE of your choice to write some arbitrary R code, ideally as a package 02 02 Create Dockerfile Dockerfile Use serverless to create a Dockerfileor do it yourself 03 03 git Repo Push yourpackage and thedockerfiletogithub Architecturewith ACI + Logic Apps: Overview

  28. Christoph Azure Container + Logic App Our setup currently looks like this 04 Create Container Registry 04 05 ACR ACR Task Create a resourcegroup and moreimportantly an Azure Container Registry (ACR) 05 Create ACR Task Create a registrytaskwhichbuilds a dockerimage based on yourgithubrepo 07 06 ACI 06 Create Container Instance Create a Azure Container Instance (ACI) pullingthedockerimagefromthe ACR 07 Manage it Use a Logic App orsingle REST callstostart and stopit Architecturewith ACI + Logic Apps: Overview

  29. Logic App A serverless workflow orchestration tool with GUI for prototyping LOGIC APP DESIGNER LOGIC APP TEMPLATING Infrastructure as Code + Serverless : so manybuzzwords

  30. Scoring Workflow Template workflow for a wide range of scenarios Specifytrigger type (time, HTTP, email, etc) Create containerresourcesbased on spec (cpu + RAM + nodes) Check ifcontainergroupisspawnedsuccessfully Delete container after workisdone

  31. serveRless Package We want to build a package to help automate this setup IDEA STATUS Build prototype to test setup Build an R packagethatallows R usersto deploy their code in a serverless setup Build RstudioAddin R Package serveRless Many thanksto Hong Ooiforhisawesomeworksupporting R in Azure! And ofcourse Christoph Bodner und Florian Schwendinger whoare not heretoday. Howdoesthiswork? -> Code snippets, creditsto Hong forcloudyrpackages + httrforrest • Idea: buildserverlesspackagetohelpdatascientistsdeployeasilytocloud (disclaimersecurity…) • RStudio Addin • Automaticdeploymentofinfra

  32. Short Demo

  33. Questions?

  34. Thank you for your attention! Feel free to reach : linkedin.com/in/thomas-laber data-science-austria.at data-analytics.netlify.com

  35. What now? VMs vs containers vs functions

More Related