RESOURCE MANAGEMENT FOR ISOLATION ENHANCED CLOUD SERVICES

Himanshu Raj, Ripal Nathuji, Abhishek Singh, Paul England (Microsoft Corporation) • ACM Workshop on Cloud Computing Security 2009 • Presented by: Yun Liaw

Outline • Introduction • Example Scenario for Isolation Attributes • Enforcing Cache Isolation in Multicore Systems • Cache Hierarchy Aware Core Assignment • Page-Coloring Based Cache Partitioning • Experimental Evaluation • An SLA Driven Approach to Resource Management in the Cloud Infrastructure • Related Work • Conclusions and Future Work • Comments

Introduction • In the IaaS model, cloud computing separates the service provider from the infrastructure owner • The service provider (SP) has less control over the service deployment and must trust the cloud infrastructure provider (CIP) to uphold the guarantees in the service level agreement (SLA) • In particular, an SP must trust the CIP’s ability to properly isolate services from each other • This matters for both performance and security • Traditionally: physical isolation • Good isolation but costly • In the cloud: virtualization encapsulates each service inside a VM • Flexible but weaker isolation

Introduction – Last Level Cache • Some resources are implicitly shared among VMs • e.g., the last level cache (LLC) on multicore processors and memory bandwidth • This sharing creates opportunities for security and performance interference • Compromising process confidentiality • DoS attacks launched by malicious VMs • Isolation attributes for a service, defined as part of the SLA between SP and CIP, serve two purposes • To capture the degree of isolation demanded by a service (this paper’s focus) • To allow a service to authoritatively report its isolation characteristics to the service user • Isolation attestation

Introduction • This paper’s focus: • Presenting mechanisms to enforce some isolation constraints, focusing on the last level cache (LLC) • Cache hierarchy aware core assignment • Page-coloring based cache partitioning • Providing an example formulation of a constraint satisfaction problem (CSP) for the CIP’s VM placement

Example Scenario for Isolation Attributes • Several VMs belonging to various independent SPs are deployed on a CIP’s infrastructure • Example scenario: Virtual Desktop Experience (VDE) • The SP adds value by allowing roaming access to the VDE and by providing management capabilities • Service VM: provides services that can be accessed from the VDE • Session VM: specific to a client; works as her personal computer

Example Scenario for Isolation Attributes • Service clients’ concerns about the service (which may be addressed in the SLA between client and SP) create isolation and resource management concerns for the SP • Example: can an adversary VM impact the performance of a session VM? • These isolation and resource management concerns in turn pass into the SLA between SP and CIP • The CIP must manage its resources to meet the SLA between SP and CIP • The resource assignment problem can be posed as a constraint satisfaction problem (CSP)

Example Scenario for Isolation Attributes

Enforcing Cache Isolation in Multicore Systems • Shared caches are common in the multicore systems that are prevalent in today’s large-scale data centers • It is difficult to guarantee performance to a thread whose active working set spills out of its local caches into the LLC • A thread’s confidentiality can be compromised through cache-based side-channel attacks • Two techniques for cache isolation • Cache hierarchy aware core assignment • Page-coloring based cache partitioning

Cache Hierarchy Aware Core Assignment • Group the cores on a machine by their LLC organization • All cores sharing an LLC are put in a single group • If a VM V’s SLA defines an isolation attribute related to the cache: • Choose a group that is currently not assigned to any other VM • Assign the cores in this group to V as V’s virtual processors • Depending on the number of virtual processors, one or more groups may be used • Drawback: underutilization of cores within a group
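The core assignment steps above can be sketched in Python. This is a minimal illustration, not the paper's Hyper-V implementation: the core-to-LLC map, the `owner` table, and both function names are hypothetical, and the usage in the comments assumes 4 cores per LLC group.

```python
from collections import defaultdict

def group_cores_by_llc(core_to_llc):
    """Group core IDs by the LLC they share: {llc_id: [core, ...]}."""
    groups = defaultdict(list)
    for core, llc in sorted(core_to_llc.items()):
        groups[llc].append(core)
    return dict(groups)

def assign_isolated_cores(groups, owner, vm, num_vcpus):
    """Give `vm` whole, currently unassigned LLC groups until it has at
    least `num_vcpus` cores. `owner` maps llc_id -> VM that holds it.
    Returns the cores used as the VM's virtual processors, or None if
    there are not enough free groups (owner is left unchanged)."""
    chosen, cores = [], []
    for llc, members in sorted(groups.items()):
        if len(cores) >= num_vcpus:
            break
        if owner.get(llc) is None:
            chosen.append(llc)
            cores.extend(members)
    if len(cores) < num_vcpus:
        return None
    for llc in chosen:
        owner[llc] = vm
    # Cores beyond num_vcpus in the chosen groups stay idle:
    # this is the underutilization drawback.
    return cores[:num_vcpus]

# Example: 8 cores, 4 per LLC (hypothetical layout).
# groups = group_cores_by_llc({c: c // 4 for c in range(8)})
# assign_isolated_cores(groups, {}, "target", 1)  -> [0]
```

With one VM per group, a second single-vCPU VM that also needs cache isolation must wait for a whole free group, even though three cores in the target's group sit idle.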

Page-coloring Based Cache Partitioning – Cache • Cache line: the smallest unit of memory that can be transferred between RAM and the cache • N-way set-associative cache • A hybrid between a fully associative cache (which requires a parallel search of all slots) and a direct-mapped cache (where addresses that map to the same slot collide)
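To make set associativity concrete, the sketch below splits a physical address into tag, set index, and line offset; the line may then be stored in any of the N ways of that set. The function name is hypothetical, and the defaults assume the 8 MB, 16-way cache with 64-byte lines used in the page-coloring example (8K sets). Setting `num_sets` to the total number of lines gives a direct-mapped cache, and `num_sets=1` a fully associative one.

```python
def split_address(addr, line_size=64, num_sets=8192):
    """Split a physical address into (tag, set_index, line_offset)
    for a set-associative cache with the given geometry."""
    offset = addr % line_size            # byte within the cache line
    line_number = addr // line_size      # which line-sized block of memory
    set_index = line_number % num_sets   # the set this line must go into
    tag = line_number // num_sets        # tells apart lines sharing a set
    return tag, set_index, offset

# Addresses exactly num_sets * line_size (512 KB) apart land in the same set;
# with 16 ways, up to 16 such lines can be cached at once.
print(split_address(74565))              # (0, 1165, 5)
print(split_address(74565 + 8192 * 64))  # (1, 1165, 5)
```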

Page-coloring Based Cache Partitioning – Page • Page: a fixed-length block of memory that is contiguous in memory addressing • A page is usually the smallest unit of data for the following: • memory allocation for a program • transfer between main memory and any other auxiliary store

Page-coloring Based Cache Partitioning – Page Coloring • Page coloring • A software technique that controls how physical memory pages map to a processor’s cache sets • Memory pages that map to the same cache sets are assigned the same color • The page color is the granularity at which cache space can be allocated to an application (VM)

Page-coloring Based Cache Partitioning – Page Coloring • Example: 6 GB memory, 4 KB pages, 8 MB 16-way set-associative cache with 64-byte cache lines • 1 page = 4 KB / 64 B = 64 cache lines • Number of cache lines in this cache = 8 MB / 64 B = 128K • Number of associative sets in this cache = 128K / 16 = 8K • Maximum number of colors this cache can support = number of sets / number of cache lines per page = 8K / 64 = 128 • Each color therefore corresponds to 8 MB / 128 = 64 KB of cache • By controlling the colors of the pages assigned to an application, the OS can partition the cache at the granularity of one color
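The arithmetic above can be checked with a short Python sketch. It assumes the LLC set index is taken directly from physical address bits (no address hashing), so consecutive physical page frames cycle through the colors; the function names are hypothetical.

```python
def num_page_colors(cache_size, associativity, line_size, page_size):
    """Number of colors = number of sets / number of cache lines per page."""
    num_sets = cache_size // (associativity * line_size)
    lines_per_page = page_size // line_size
    return num_sets // lines_per_page

def page_color(phys_addr, page_size, colors):
    """Color of the page frame containing phys_addr (frame number mod colors)."""
    return (phys_addr // page_size) % colors

KB, MB = 1024, 1024 * 1024
colors = num_page_colors(8 * MB, 16, 64, 4 * KB)
print(colors)                                    # 128
print(8 * MB // colors // KB)                    # 64 (KB of LLC per color)
print(page_color(129 * 4 * KB, 4 * KB, colors))  # 1
```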

Page-coloring Based Cache Partitioning • How the hypervisor allocates the memory pages backing a VM influences the cache usage of the threads in that VM • Page coloring provides cache isolation by giving disjoint color sets to the VMs that run on CPU cores sharing the LLC • Drawback: underutilization of memory
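One way a hypervisor could enforce disjoint color sets is to keep a free list of physical frames per color and serve each VM only from its own colors. The class below is a hypothetical sketch of that idea, not Hyper-V's memory manager. It also shows the memory underutilization drawback: a VM can run out of frames while other colors still have free ones.

```python
from collections import defaultdict

class ColoredFrameAllocator:
    """Hypothetical hypervisor-side allocator with one free list per color."""

    def __init__(self, num_frames, num_colors):
        self.free = defaultdict(list)
        for frame in range(num_frames):
            # Frame number modulo the color count gives the frame's color.
            self.free[frame % num_colors].append(frame)
        self.vm_colors = {}

    def assign_colors(self, vm, colors):
        """Give `vm` a color set disjoint from every other VM's."""
        taken = {c for owned in self.vm_colors.values() for c in owned}
        if taken & set(colors):
            raise ValueError("color already assigned to another VM")
        self.vm_colors[vm] = list(colors)

    def alloc(self, vm):
        """Return a free frame of one of vm's colors, or None when they are
        all used up (even if frames of other colors are still free)."""
        for color in self.vm_colors[vm]:
            if self.free[color]:
                return self.free[color].pop()
        return None

# A 50/50 color split between two VMs sharing an LLC:
alloc = ColoredFrameAllocator(num_frames=16, num_colors=4)
alloc.assign_colors("target", [0, 1])
alloc.assign_colors("perturbing", [2, 3])
```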

Experiment – Implementation Detail and Methodology • Based on Microsoft Hyper-V • The memory management component in Hyper-V is replaced by the Windows NT kernel’s memory allocation API • The configuration of each physical machine is extended with 2 pieces of information • The group information for cores • The number of page colors and their current sizes

Experiment – Implementation Detail and Methodology • Experimental platform: • 8-core machine based on Intel Nehalem processors • 6 GB RAM • 8 MB shared LLC • Hardware prefetching on the Nehalem processors is disabled • Cache hierarchy: • 2 groups of cores

Experiment – Implementation Detail and Methodology • Target VM: • 1 virtual processor • Running program: allocates an array of a specific working set size and then accesses it in a regular pattern • Perturbing VM: • 3 virtual processors • Running program: a memory-intensive application that repeatedly accesses memory and causes cache thrashing • Cache hierarchy aware core assignment (CHACA) experiment • The target VM and the perturbing VM are placed on different groups of cores • Page-coloring based cache partitioning (PCBCP) experiment • The target VM and the perturbing VM are placed on the same group of cores • The target VM is assigned 50% of the available colors, and the perturbing VM the other 50%
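The target VM's workload (allocate an array of a given working set size, then access it in a regular pattern) might look like the sketch below. The 64-byte stride (one access per cache line), the pass count, and the function name are assumptions. In CPython, interpreter overhead would dominate the timing, so this only illustrates the access pattern; an actual measurement would be written in a compiled language.

```python
import time

def run_target_workload(working_set_bytes, passes=10, stride=64):
    """Hypothetical target-VM workload: allocate a buffer of the given
    working set size, then sweep it repeatedly with a fixed stride.
    Returns (number_of_accesses, elapsed_seconds)."""
    buf = bytearray(working_set_bytes)
    accesses = 0
    start = time.perf_counter()
    for _ in range(passes):
        for i in range(0, working_set_bytes, stride):
            buf[i] = (buf[i] + 1) & 0xFF  # read-modify-write one byte per line
            accesses += 1
    return accesses, time.perf_counter() - start

if __name__ == "__main__":
    # Sweep working sets below and above the 8 MB LLC.
    for size in (1 << 20, 4 << 20, 16 << 20):
        n, secs = run_target_workload(size, passes=2)
        print(f"{size >> 20} MB: {secs / n * 1e9:.1f} ns per access")
```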

Experiment Result - No Isolation and CHACA • With CHACA, the perturbing VM is placed on a different group of cores, so it has no influence on the target VM • The execution time drops to the baseline when the working set is smaller than the LLC

Experiment Result - PCBCP • Additional threads do not impact the performance

Experiment Result - PCBCP (log axis) • Coloring itself causes a performance penalty, since the target VM can use only its share of the colors • With the perturbing VM included, coloring cuts the target VM’s execution time

An SLA Driven Approach to RM in the Cloud Infrastructure • The SLA between SP and CIP can be converted into a set of CIP-specific constraints • The constraints are defined in terms of the resources available at the CIP → a Constraint Satisfaction Problem (CSP)! • Example scenario: the SLA between SP and CIP defines • Number of processors = 2 • Replication factor (r) = 5 • H/W fault domains (n) = 5 • Cache-based DoS attack avoidance = True • Cache-based side-channel attack avoidance = True → Goal: place 5 VMs (based on r) on physical machines in the cloud such that the SLA is satisfied

An SLA Driven Approach to RM in the Cloud Infrastructure • Example Scenario (Cont’d) • physical node: Blade object

Blade Attributes

An SLA Driven Approach to RM in the Cloud Infrastructure • Let VMs be the set of virtual machines vm1, vm2, …, vm5 that need to be placed on the set Blades • Decision variables of each VM • Blade • ProcessorDomain • PageColorDomain

Pseudo code of a greedy algorithm for the CSP formulation • DecisionVariables

Constraints
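The slides' greedy pseudo code is not reproduced in this transcript, so what follows is an independent, hypothetical sketch of a greedy placement for the example SLA (2 processors per VM, r = 5 replicas in distinct fault domains, cache isolation required). The `Blade` and `ProcessorDomain` fields, the fault-domain rule, and folding both cache attributes into one `cache_isolation` flag are all assumptions. A processor domain (cores sharing one LLC) is acceptable if no other SP uses it; otherwise the sketch falls back to an exclusive page-color domain, assuming VMs already on that LLC are confined to their own colors.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessorDomain:
    """Hypothetical: a group of cores sharing one LLC."""
    free_cores: int
    owners: set = field(default_factory=set)  # SPs with VMs on these cores

@dataclass
class Blade:
    """Hypothetical physical node (the slides' Blade object)."""
    name: str
    fault_domain: int
    processor_domains: list
    free_color_domains: list  # page-color domain ids not yet assigned

def _pick_domain(sp, vcpus, blade, cache_isolation):
    """Return (processor_domain_index, color_domain_or_None), or None."""
    for i, pd in enumerate(blade.processor_domains):
        if pd.free_cores >= vcpus and (not cache_isolation or pd.owners <= {sp}):
            return i, None  # LLC not shared with any other SP
    if cache_isolation and blade.free_color_domains:
        for i, pd in enumerate(blade.processor_domains):
            if pd.free_cores >= vcpus:
                # Share the LLC, but give this VM an exclusive color domain.
                return i, blade.free_color_domains[0]
    return None

def place_replicas(sp, replicas, vcpus, blades, cache_isolation=True):
    """Greedily place `replicas` VMs of `sp`, each in a distinct fault domain.
    Returns [(vm, blade_name, pd_index, color_domain)] or None on failure.
    No rollback: a failed call leaves earlier assignments in place."""
    placement, used_fault_domains = [], set()
    for r in range(replicas):
        vm = f"vm{r + 1}"
        for blade in blades:
            if blade.fault_domain in used_fault_domains:
                continue
            choice = _pick_domain(sp, vcpus, blade, cache_isolation)
            if choice is None:
                continue
            pd_index, color = choice
            pd = blade.processor_domains[pd_index]
            pd.free_cores -= vcpus
            pd.owners.add(sp)
            if color is not None:
                blade.free_color_domains.remove(color)
            used_fault_domains.add(blade.fault_domain)
            placement.append((vm, blade.name, pd_index, color))
            break
        else:
            return None  # no blade satisfies the constraints for this VM
    return placement
```

A greedy pass like this can fail on inputs where a complete CSP solver would still find a valid placement.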

Related Work • There is little prior work on security- and isolation-specific SLA constraints • This work is the first attempt at characterizing specific isolation-related attributes for the SLA between SP and CIP • Monahan et al. define security-related SLA constraints that are applicable in cloud computing scenarios [10] • Research on cache-based interference

Conclusions and Future Work • Conclusions: • This paper envisions that SPs in cloud computing environments will also specify security and performance isolation constraints as part of their SLAs • One such set of constraints advocated in this paper is based on cache sharing in contemporary multicore systems • This paper presents 2 approaches to provide security and performance isolation • This paper provides a generic CSP formulation • Future Work • Using other CSP solvers to formulate and solve the CSP • Evaluating the impact of SLA isolation attributes on the overall cost of VM placement • Isolation attestation

Comments • The paper does not describe the cache isolation approaches in much detail • CSP might be a good topic to study further • Braised egg = fennel-, nutmeg-, licorice- and thyme-flavored plain egg (?!)
