Distributed Systems CS 15-440: Introduction to Cloud Computing
Lecture 25, Dec 1st, 2014
Mohammad Hammoud
Today…
• Last Session:
  • Distributed File Systems
• Today’s Session:
  • Cloud Computing
• Announcements:
  • Prof. Andy Pavlo will be delivering the next lecture
  • P3 grades will be out by tomorrow
  • Project 4 is due on Dec 3rd by midnight
  • PS5 is due on Dec 4th by midnight
  • The Final Exam is on Monday, Dec 8th at 9:00 AM in Room 1031. It will be comprehensive, but open book and open notes
We Live in a World of Data…
• 72.9 items ordered per second at Amazon
• 24 PB per day at Google
• 50 million tweets per day
• 2.9 million emails per second
What Do We Do With Data? We want to do these seamlessly...
Using Diverse Interfaces & Devices
• Computers, mobile devices, consumer electronics, personal monitors and sensors, …and even appliances
We also want to access, share and process our data from all of our devices, anytime, anywhere!
Data Becoming Critical to Our Lives
• Domains of data: health, science, education, work, environment, finance, … and more
Think of it this Way…
• The evolution of water into a utility
How About Electricity? • Transformation from a Product to a Service
… and Cloud Computing?
Cloud Computing is a vital and incremental step in this direction…
Can We Define Cloud Computing? “Cloud Computing is the transformation of IT from a product to a service”
A More Formal Definition
• Cloud Computing is a model for enabling on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned in the form of services.
• Cloud Computing can be characterized in terms of:
  • Six qualities
  • Three service models
  • Three deployment models
[Figure: the IT model (servers, storage, networks, virtualization, orchestration, apps) exposed through the service model as IaaS, PaaS, and SaaS services]
Three Cloud Service Models
• Software-as-a-Service (SaaS)
  • Provides applications as services
  • Users cannot manage or control the cloud infrastructure
  • Users can, though, define some user-specific application configuration settings
  • Applications are accessible from various client devices (e.g., mobile phones, laptops, and PDAs) through a thin client interface (e.g., a Web browser)
  • E.g., Google Apps, Microsoft SharePoint, etc.
[Figure: the SaaS stack: Application, Middleware, Guest OS, Hypervisor, Servers, Storage, Network]
Three Cloud Service Models
• Platform-as-a-Service (PaaS)
  • Provides middleware that allows users to create applications using programming languages and tools supported by the cloud service provider (CSP)
  • Users do not manage or control the underlying cloud infrastructure, but have control over the deployed applications
  • E.g., Google App Engine
[Figure: the PaaS stack: Application, Middleware, Guest OS, Hypervisor, Servers, Storage, Network]
Three Cloud Service Models
• Infrastructure-as-a-Service (IaaS)
  • This is the foundation of all cloud services
  • It allows provisioning of fundamental computing resources
  • Users can run arbitrary software (which can include OSs and apps)
  • Users do not manage or control the underlying infrastructure, but have control over OSs, storage, and apps
  • E.g., Amazon EC2, Rackspace Cloud Offerings, and IBM Blue Cloud
[Figure: the IaaS stack: Application, Middleware, Guest OS, Hypervisor, Servers, Storage, Network]
Three Deployment Models
• Public cloud
  • Exists externally to its end users
  • Accessed via the Internet
  • Data of different users are commingled
  • Resources are shared
• Private cloud
  • Usually dedicated to an organization
  • Accessed via a LAN
  • Data of different users are commingled
  • Resources are shared
• Hybrid cloud
  • Leverages a public cloud to expand the capabilities of a private cloud
Requirements to Transform IT to a Service
• Connectivity
  • For moving data around
• Reliability
  • Failure will affect many people, not just one
• Pay-as-you-Go
  • Users should not pay an upfront fee for the service
• Scalability and Elasticity
  • Flexible and rapid response to changing user needs
• Efficient Storage of Large Amounts of Data
  • Big Data and Big Graphs
• Ease of Programmability
  • Ease of development of complex services and programs
• Efficient Processing of Big Data/Graphs
• Efficiency
  • Performance
  • Cost
  • Power
Requirements to Transform IT to a Service (revisited)
• Enabling technologies for the requirements above: the Internet, cloud storage, fault tolerance, utility computing, virtualization and resource management, and cloud analytics engines
Where to Store Analytics Data?
• The underlying cloud storage layer is a key component for enabling cloud analytics engines
• Typically, the cloud storage layer “divides” and “distributes” Big Data/Graphs using striping and placement techniques, which:
  • Allow concurrent accesses to data
  • Improve fault-tolerance
[Figure: a logical file of 16 striping units (0 to 15) striped round-robin across four servers. Server 1 holds units 0, 4, 8, 12; Server 2 holds 1, 5, 9, 13; Server 3 holds 2, 6, 10, 14; Server 4 holds 3, 7, 11, 15. Each stripe consists of one striping unit on each server.]
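The round-robin layout in the figure can be sketched as a simple mapping from a logical striping unit to a server and a slot on that server. The function names `place_unit` and `layout` are hypothetical, and the sketch assumes equal-sized units and servers numbered from 1, as in the figure:

```python
def place_unit(unit: int, num_servers: int) -> tuple[int, int]:
    """Map a logical striping unit to (server number, slot on that server).

    Round-robin striping: unit u goes to server (u mod N) + 1 (servers are
    numbered from 1, as in the figure) at slot u // N on that server.
    """
    return unit % num_servers + 1, unit // num_servers


def layout(num_units: int, num_servers: int) -> dict[int, list[int]]:
    """Return, for each server, the logical units it stores in slot order."""
    servers: dict[int, list[int]] = {s: [] for s in range(1, num_servers + 1)}
    for unit in range(num_units):
        server, _slot = place_unit(unit, num_servers)
        servers[server].append(unit)
    return servers


if __name__ == "__main__":
    for server, units in layout(16, 4).items():
        print(f"Server {server}: {units}")  # Server 1: [0, 4, 8, 12], ...
```

Consecutive units land on different servers, so a large sequential read can be served by all four servers in parallel. This is the "concurrent accesses" benefit the slide lists.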
Example: The Google File System
• The Google File System (GFS) is a scalable distributed file system for storing and managing Big Data/Graphs
• GFS adopts a master-slave architecture
[Figure: the GFS client sends (file name, chunk index) to the master, which replies with the chunk ID and the contact addresses of the chunk servers holding that chunk. The client then sends (chunk ID, byte range) to a chunk server, which returns the chunk data. Each chunk server stores its chunks on a local Linux file system.]
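The read path in the figure can be sketched as follows. The classes `ToyMaster` and `ToyChunkServer` and the function `gfs_read` are hypothetical stand-ins, not the real GFS interfaces. The sketch assumes GFS's 64 MB chunk size and handles only reads that fall inside a single chunk:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks


class ToyChunkServer:
    """Hypothetical chunk server: stores chunk bytes keyed by chunk ID."""

    def __init__(self) -> None:
        self.chunks: dict[int, bytes] = {}

    def read(self, chunk_id: int, start: int, length: int) -> bytes:
        return self.chunks[chunk_id][start:start + length]


class ToyMaster:
    """Hypothetical master: holds only metadata, never file data."""

    def __init__(self) -> None:
        # (file name, chunk index) -> (chunk ID, chunk servers holding it)
        self.table: dict[tuple[str, int], tuple[int, list[ToyChunkServer]]] = {}

    def lookup(self, name: str, chunk_index: int) -> tuple[int, list[ToyChunkServer]]:
        return self.table[(name, chunk_index)]


def gfs_read(master: ToyMaster, name: str, offset: int, length: int) -> bytes:
    """Read a byte range that lies within a single chunk."""
    chunk_index = offset // CHUNK_SIZE               # 1. client computes chunk index
    start = offset % CHUNK_SIZE                      # offset inside that chunk
    if start + length > CHUNK_SIZE:
        raise ValueError("sketch does not handle reads spanning chunks")
    chunk_id, servers = master.lookup(name, chunk_index)  # 2. ask the master
    return servers[0].read(chunk_id, start, length)  # 3. data comes from a chunk server


if __name__ == "__main__":
    cs = ToyChunkServer()
    cs.chunks[42] = b"hello world"
    master = ToyMaster()
    master.table[("/logs/a", 1)] = (42, [cs])
    print(gfs_read(master, "/logs/a", CHUNK_SIZE + 6, 5))  # b'world'
```

The master answers only the small metadata query in step 2. The bulk data flows directly between the client and a chunk server, which keeps the single master from becoming a bandwidth bottleneck.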
The Striping and Placement Policies of GFS
• GFS stripes large files into fixed-size blocks and distributes them randomly across cluster machines
[Figure: a large file divided into 64 MB blocks (Blk 0 to Blk 6, at offsets 0M, 64M, 128M, …, 384M), with replicas of each block spread across Server 0 (the writer), Server 1, Server 2, and Server 3]
Requirements to Transform IT to a Service (revisited)
• The same requirements as before, plus one more:
  • Interactivity: seamless interfaces
• Enabling technologies: the Internet, Web 2.0, cloud storage, fault tolerance, utility computing, virtualization and resource management, and cloud analytics engines
Developing Cloud Programs
• The effectiveness of cloud programs hinges on the manner in which they are constructed and deployed
• This entails specifying and addressing:
  • The programming model
  • The computation model
  • The architectural model
  • Several challenges (e.g., scalability and heterogeneity)
How much time, effort, and money will be needed to develop ONE cloud program?
Cloud Analytics Engines
• Recently, cloud analytics engines were developed to:
  • Relieve programmers from concerns with many of the difficult aspects of developing cloud programs
  • Allow programmers to focus on ONLY the sequential portions of their applications’ algorithms
• Examples of cloud analytics engines:
  • Hadoop MapReduce
  • Google’s Pregel
  • CMU’s GraphLab
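The classic word-count example shows the idea that programmers write only the sequential parts. This is a single-machine sketch in the MapReduce style, not Hadoop's actual API. The `run_job` function is a hypothetical stand-in for the engine's grouping ("shuffle") and task scheduling:

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


# --- What the programmer writes: plain sequential functions ---
def map_fn(line: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one line of input."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Combine all counts emitted for one word."""
    return word, sum(counts)


# --- What an engine would handle (simulated sequentially here) ---
def run_job(lines: Iterable[str],
            mapper: Callable[[str], Iterator[tuple[str, int]]],
            reducer: Callable[[str, Iterable[int]], tuple[str, int]]) -> dict[str, int]:
    groups: dict[str, list[int]] = defaultdict(list)
    for line in lines:                 # a real engine runs map tasks in parallel
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: group values by key
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


if __name__ == "__main__":
    print(run_job(["the cloud", "The data"], map_fn, reduce_fn))
    # {'cloud': 1, 'data': 1, 'the': 2}
```

Neither `map_fn` nor `reduce_fn` mentions machines, partitioning, or failures. In a real engine, distribution, data movement, and fault tolerance all live in the part that `run_job` only simulates.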
Objectives: Discussion on Virtualization
• Why virtualization, and virtualization properties (current topic)
• Hypervisors, full virtualization, and para-virtualization
• Virtual machine types
Benefits of Virtualization
• Here are some of the benefits that are typically provided by a virtualized system:
  • Multiple Secure Environments: a system VM provides a sandbox that isolates one system environment from other environments
  • Failure Isolation: virtualization helps isolate the effects of a failure to the VM where the failure occurred
  • Mixed-OS Environment: a single hardware platform can support multiple operating systems concurrently
  • Better System Utilization: a virtualized system can be (dynamically or statically) re-configured for changing needs
Operating Systems Limitations
• OSs provide a way of virtualizing hardware resources among processes
• This may help isolate processes from one another
• However, this does not provide a virtual machine to a user who may wish to run a different OS
• Having hardware resources managed by a single OS limits the flexibility of the system in terms of available software, security, and failure isolation
• Virtualization typically provides a way of relaxing these constraints and increasing flexibility
Virtualization Properties
1. Isolation
  • Fault isolation
  • Software isolation
  • Performance isolation (accomplished through scheduling and resource allocation)
2. Encapsulation
  • All VM state can be captured into a file (i.e., you can operate on a VM by operating on its file, e.g., with cp or rm)
  • Complexity is proportional to the virtual hardware model and independent of the guest software configuration
3. Interposition
  • All guest actions go through the virtualizing software, which can inspect, modify, and deny operations
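Encapsulation and interposition can be illustrated with a toy model. Everything here is hypothetical (`ToyVM`, `ToyHypervisor`, a VM state small enough to fit in a dict); a real hypervisor captures memory, device, and CPU state. The point is only that every guest write passes through one layer that can inspect or deny it, and that the entire VM state round-trips through a single file:

```python
import json
import os
import tempfile


class ToyVM:
    """Hypothetical guest whose entire state is a small dict."""

    def __init__(self, state=None):
        if state is None:
            state = {"regs": {"pc": 0}, "disk": {}}
        self.state = state


class ToyHypervisor:
    """Hypothetical virtualizing layer: every guest write goes through it."""

    def __init__(self, protected_keys):
        self.protected_keys = set(protected_keys)
        self.log = []  # record of inspected operations

    def guest_write(self, vm, key, value):
        """Interposition: inspect, then deny or perform the guest's request."""
        self.log.append(("write", key))   # inspect
        if key in self.protected_keys:    # deny
            return False
        vm.state["disk"][key] = value     # allow (a VMM could also modify it)
        return True

    def save(self, vm, path):
        """Encapsulation: the whole VM state is written to a single file."""
        with open(path, "w") as f:
            json.dump(vm.state, f)

    def load(self, path):
        """Recreate a VM from its file; copying the file clones the VM."""
        with open(path) as f:
            return ToyVM(json.load(f))


if __name__ == "__main__":
    hv = ToyHypervisor(protected_keys=["boot_sector"])
    vm = ToyVM()
    print(hv.guest_write(vm, "notes.txt", "hello"))   # True (allowed)
    print(hv.guest_write(vm, "boot_sector", "evil"))  # False (denied)
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "vm.json")
        hv.save(vm, path)
        print(hv.load(path).state == vm.state)        # True
```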
What is Virtualization?
• Informally, a virtualized system (or subsystem) is a mapping of its interface, and all resources visible through that interface, to the interface and resources of a real system
• Formally, virtualization involves the construction of an isomorphism that maps a virtual guest system to a real host system (Popek and Goldberg 1974)
  • Function V maps the guest state to the host state
  • For a sequence of operations, e, that modifies a guest state, there is a corresponding sequence e’ in the host that performs an equivalent modification
• How can this be managed?
[Figure: in the guest, e takes state Si to Sj = e(Si). V maps Si to the host state Si’ = V(Si) and Sj to Sj’ = V(Sj). In the host, e’ takes Si’ to Sj’ = e’(Si’), so V(e(Si)) = e’(V(Si)).]
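The isomorphism condition V(e(Si)) = e’(V(Si)) can be checked mechanically on a toy system. In this hypothetical example, the guest has two 8-bit registers; the host keeps them in memory at an arbitrary address `BASE`; the guest operation e adds one register into the other; and e’ is the host operation meant to mirror it:

```python
from itertools import product

# Hypothetical guest: two 8-bit registers, state = (a, b).
# Hypothetical host: the guest registers live in host memory at BASE, BASE + 1.
BASE = 0x1000


def V(guest_state):
    """Map a guest state to the corresponding host state."""
    a, b = guest_state
    return {BASE: a, BASE + 1: b}


def e(guest_state):
    """Guest operation: a <- (a + b) mod 256."""
    a, b = guest_state
    return ((a + b) % 256, b)


def e_prime(host_state):
    """Host operation that performs the equivalent modification in memory."""
    new = dict(host_state)
    new[BASE] = (new[BASE] + new[BASE + 1]) % 256
    return new


def is_faithful(states):
    """Check V(e(S)) == e'(V(S)) for every sampled guest state S."""
    return all(V(e(s)) == e_prime(V(s)) for s in states)


if __name__ == "__main__":
    sample = list(product(range(0, 256, 17), repeat=2))
    print(is_faithful(sample))  # True
```

If e’ forgot the `% 256` wrap-around, the check would fail for states such as (200, 100). In that case the host would no longer faithfully mirror the guest.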
Abstraction
• The key to managing complexity in computer systems is their division into levels of abstraction separated by well-defined interfaces
• Levels of abstraction allow implementation details at lower levels of a design to be ignored or simplified
  • Files are an abstraction of a disk
  • A level of abstraction provides a simplified interface to underlying resources
[Figure: files layered on top of a disk]
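The file-over-disk layering can be sketched in a few lines. `Disk` exposes the low-level interface of numbered fixed-size blocks; `File` exposes a simpler byte-stream interface and hides block numbers. Both classes are hypothetical, and the 4-byte block size is chosen only to keep the example easy to trace:

```python
BLOCK_SIZE = 4  # tiny blocks so the example is easy to trace


class Disk:
    """Low-level interface: numbered, fixed-size blocks."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.next_free = 0

    def allocate(self) -> int:
        if self.next_free >= len(self.blocks):
            raise OSError("disk full")
        n = self.next_free
        self.next_free += 1
        return n

    def read_block(self, n: int) -> bytes:
        return self.blocks[n]

    def write_block(self, n: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("must write exactly one block")
        self.blocks[n] = data


class File:
    """Higher-level interface: a byte stream; block numbers stay hidden."""

    def __init__(self, disk: Disk) -> None:
        self.disk = disk
        self.block_list: list[int] = []
        self.size = 0

    def write_all(self, data: bytes) -> None:
        """Store data, allocating as many blocks as needed (last one padded)."""
        self.size = len(data)
        self.block_list = []
        for i in range(0, len(data), BLOCK_SIZE):
            chunk = data[i:i + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
            n = self.disk.allocate()
            self.disk.write_block(n, chunk)
            self.block_list.append(n)

    def read_all(self) -> bytes:
        raw = b"".join(self.disk.read_block(n) for n in self.block_list)
        return raw[:self.size]  # strip the padding


if __name__ == "__main__":
    f = File(Disk(8))
    f.write_all(b"hello, disk")
    print(f.read_all(), f.block_list)  # b'hello, disk' [0, 1, 2]
```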
Virtualization and Abstraction
• Virtualization uses abstraction but is different in that it does not necessarily hide details; the level of detail in a virtual system is often the same as that in the underlying real system
• Virtualization provides a different interface and/or resources at the same level of abstraction
[Figure: virtual disks implemented as files on a real disk]
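A virtual disk differs from the file abstraction in that it keeps the disk's own interface of numbered fixed-size blocks and simply backs those blocks with an ordinary host file. The sketch below is hypothetical (`VirtualDisk`, 512-byte blocks). It does not reflect any real virtual-disk format such as VMDK or qcow2.

```python
import os
import tempfile


class VirtualDisk:
    """Same interface as a real disk (numbered fixed-size blocks), but the
    blocks live inside an ordinary host file."""

    def __init__(self, path: str, num_blocks: int, block_size: int = 512) -> None:
        self.path = path
        self.num_blocks = num_blocks
        self.block_size = block_size
        if not os.path.exists(path):
            with open(path, "wb") as f:
                # Extend to full size; new bytes are zero-filled on most platforms.
                f.truncate(num_blocks * block_size)

    def _check(self, n: int) -> None:
        if not 0 <= n < self.num_blocks:
            raise IndexError(f"block {n} out of range")

    def read_block(self, n: int) -> bytes:
        self._check(n)
        with open(self.path, "rb") as f:
            f.seek(n * self.block_size)
            return f.read(self.block_size)

    def write_block(self, n: int, data: bytes) -> None:
        self._check(n)
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        with open(self.path, "r+b") as f:
            f.seek(n * self.block_size)
            f.write(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        vd = VirtualDisk(os.path.join(d, "guest.img"), num_blocks=4)
        vd.write_block(2, b"x" * 512)
        print(vd.read_block(2)[:4])  # b'xxxx'
```

The guest sees the same level of detail as a physical disk (block numbers, fixed block size). To the host, the whole disk is one file that can be copied or deleted like any other, which ties back to the encapsulation property.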
Objectives: Discussion on Virtualization
• Why virtualization, and virtualization properties
• Hypervisors, full virtualization, and para-virtualization (current topic)
• Virtual machine types
Virtual Machines and Hypervisors
• The concept of virtualization can be applied not only to subsystems such as disks, but to an entire machine, denoted as a virtual machine (VM)
• A VM is implemented by adding a layer of software to a real machine so as to support the desired VM’s architecture
• This layer of software is often referred to as a virtual machine monitor (VMM)
• Early VMMs were implemented in firmware
• Today, VMMs are often implemented as a co-designed firmware-software layer, referred to as the hypervisor
A Mixed OS Environment
• Multiple VMs can be implemented on a single hardware platform to provide individuals or user groups with their own OS environments
[Figure: five VMs (Linux Red Hat, Solaris 10, XP, Vista, Mac) running on a single virtual machine monitor over one hardware platform]
Full Virtualization
• Traditional VMMs provide full virtualization:
  • The functionality provided is identical to that of the underlying physical hardware
  • The functionality is exposed to the VMs
  • They allow unmodified guest OSs to execute on the VMs
  • This might result in some performance degradation
• E.g., VMware provides full virtualization
Para-Virtualization
• Other types of VMMs provide para-virtualization:
  • They provide a virtual hardware abstraction that is similar, but not identical, to the real hardware
  • They modify the guest OS to cooperate with the VMM
  • They result in lower overhead, leading to better performance
• E.g., Xen provides both para-virtualization and full virtualization
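The difference between full virtualization and para-virtualization can be sketched as two guests setting a timer. All names here are hypothetical, and a Python exception stands in for a hardware trap:

* The unmodified guest issues a privileged instruction as if it ran on bare metal. The "hardware" traps, and the VMM catches the trap and emulates the instruction.
* The para-virtualized guest has been changed to ask the VMM directly through a hypercall.

```python
class PrivilegedInstruction(Exception):
    """Stands in for the trap raised when a deprivileged guest runs a
    privileged instruction."""


class VMM:
    def __init__(self) -> None:
        self.virtual_timer = 0
        self.traps = 0
        self.hypercalls = 0

    def on_trap(self, instr: str, arg: int) -> None:
        """Full virtualization: decode the trapped instruction and emulate it."""
        self.traps += 1
        if instr == "SET_TIMER":
            self.virtual_timer = arg

    def hypercall(self, name: str, arg: int) -> None:
        """Para-virtualization: the guest states its intent explicitly."""
        self.hypercalls += 1
        if name == "set_timer":
            self.virtual_timer = arg


def cpu_execute(instr: str, arg: int) -> None:
    """Hypothetical hardware: privileged instructions trap in guest mode."""
    raise PrivilegedInstruction(instr, arg)


def unmodified_guest_set_timer(vmm: VMM, value: int) -> None:
    try:
        cpu_execute("SET_TIMER", value)  # code written for bare metal
    except PrivilegedInstruction as trap:
        vmm.on_trap(*trap.args)          # VMM catches the trap and emulates


def paravirt_guest_set_timer(vmm: VMM, value: int) -> None:
    vmm.hypercall("set_timer", value)    # guest source modified to cooperate


if __name__ == "__main__":
    vmm = VMM()
    unmodified_guest_set_timer(vmm, 10)
    paravirt_guest_set_timer(vmm, 20)
    print(vmm.virtual_timer, vmm.traps, vmm.hypercalls)  # 20 1 1
```

Both guests end up with the same virtual timer value. The para-virtualized path tells the VMM directly what it wants, so the VMM does not have to catch and decode a faulting instruction. This is one reason para-virtualization can have lower overhead, at the cost of modifying the guest OS.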
Virtualization and Emulation
• VMs can employ emulation techniques to support cross-platform software compatibility
• Compatibility can be provided either at the system level (e.g., to run a Windows OS on a Macintosh) or at the program or process level (e.g., to run Excel on a Sun Solaris/SPARC platform)
• Emulation is the process of implementing the interface and functionality of one system on a system having a different interface and functionality
• It can be argued that virtualization itself is simply a form of emulation
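The simplest form of emulation is an interpreter that fetches, decodes, and executes guest instructions one at a time on the host. The four-instruction guest ISA below is entirely hypothetical. Real emulators typically add binary translation and caching of translated code for speed.

```python
def emulate(program):
    """Interpret a hypothetical guest ISA on the host, one instruction at a time."""
    regs = {"r0": 0, "r1": 0, "r2": 0}
    pc = 0
    output = []
    while pc < len(program):
        op, *args = program[pc]          # fetch and decode
        if op == "LOADI":                # LOADI rd, imm
            regs[args[0]] = args[1]
        elif op == "ADD":                # ADD rd, rs1, rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "DECJNZ":             # DECJNZ rd, target: rd -= 1; jump if rd != 0
            regs[args[0]] -= 1
            if regs[args[0]] != 0:
                pc = args[1]
                continue
        elif op == "OUT":                # OUT rs: emit a register's value
            output.append(regs[args[0]])
        else:
            raise ValueError(f"illegal instruction {op!r}")
        pc += 1
    return output


if __name__ == "__main__":
    # Guest program: r0 = 5 + 4 + 3 + 2 + 1
    prog = [
        ("LOADI", "r0", 0),
        ("LOADI", "r1", 5),
        ("ADD", "r0", "r0", "r1"),
        ("DECJNZ", "r1", 2),
        ("OUT", "r0"),
    ]
    print(emulate(prog))  # [15]
```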
Objectives: Discussion on Virtualization
• Why virtualization, and virtualization properties
• Hypervisors, full virtualization, and para-virtualization
• Virtual machine types (current topic)
Background: Computer System Architectures
[Figure: the layers of a computer system (application programs, libraries, an OS with memory manager, scheduler, and drivers, execution hardware, memory translation, the system interconnect (bus), controllers, I/O devices & networking, and main memory) with three key interfaces marked: the Instruction Set Architecture (ISA) at interfaces 7 & 8, the Application Binary Interface (ABI) at interfaces 3 & 7, and the Application Programming Interface (API) at interfaces 2 & 7]
Types of Virtual Machines
• As there is a process perspective and a system perspective of machines, there are also process-level and system-level VMs
• Virtual machines can be of two types:
1. Process VM
  • Capable of supporting an individual process
2. System VM
  • Provides a complete system environment
  • Supports an OS with potentially many types of processes
Process Virtual Machine
• The runtime is placed at the ABI interface
• The runtime emulates both user-level instructions and OS system calls
[Figure: guest application processes run on the virtualizing software (the runtime), which in turn runs on the host OS and hardware]
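A process VM's runtime has two jobs. It executes the guest's user-level instructions itself, and it maps each guest system call onto the host OS. The sketch below is hypothetical throughout: the guest syscall numbers, the `Runtime` class, and the two-instruction guest program are invented for illustration. It shows only the shape of that ABI-level translation.

```python
import io
import os

# Hypothetical guest ABI: system-call numbers chosen for illustration only.
GUEST_SYS_WRITE = 4
GUEST_SYS_GETPID = 20


class Runtime:
    """Process-VM runtime at the ABI: emulates user-level instructions and
    translates guest system calls into host operations."""

    def __init__(self, stdout) -> None:
        self.fd_table = {1: stdout}  # guest fd -> host file object

    def syscall(self, number: int, *args):
        if number == GUEST_SYS_WRITE:
            fd, data = args
            return self.fd_table[fd].write(data)  # host file write
        if number == GUEST_SYS_GETPID:
            return os.getpid()                    # host OS call
        raise NotImplementedError(f"guest syscall {number}")

    def run(self, guest_program) -> int:
        acc = 0
        for instr in guest_program:
            if instr[0] == "ADDI":       # user-level instruction: emulated here
                acc += instr[1]
            elif instr[0] == "SYSCALL":  # OS service: translated to the host
                self.syscall(*instr[1:])
            else:
                raise ValueError(f"illegal instruction {instr[0]!r}")
        return acc


if __name__ == "__main__":
    out = io.BytesIO()
    rt = Runtime(out)
    result = rt.run([("ADDI", 2), ("SYSCALL", GUEST_SYS_WRITE, 1, b"hi"), ("ADDI", 3)])
    print(result, out.getvalue())  # 5 b'hi'
```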
System Virtual Machine
• The VMM emulates the ISA of one hardware platform on another, forming a system VM
• A system VM is capable of executing a system software environment developed for a different set of hardware
[Figure: guest OSs and their applications run on the virtualizing software (the VMM), which runs on the host hardware]
Native and Hosted VM Systems
[Figure: four configurations, shaded by which layers run in privileged and non-privileged modes: a traditional uniprocessor system (applications on an OS on hardware); a native VM system (guest applications and guest OS on a VMM running directly on the hardware); a user-mode hosted VM system (the VMM runs in user mode on top of a host OS); and a dual-mode hosted VM system (the VMM runs partly in user mode and partly in privileged mode alongside the host OS)]