Architecture of Cloud Computing and Distributed Database Systems
Didem Gündoğdu, 23.12.2011, CMPE 422
What is Cloud Computing?
• Internet computing
– Computation done through the Internet
– No concern about maintenance or management of the actual resources
• Shared computing resources
– As opposed to local servers and devices
• Comparable to grid infrastructure
• Web applications
• Specialized raw computing services
Why Cloud Computing?
• Costs
• Independence
• Scalability
• Redundancy and disaster recovery
Cloud Service Taxonomy
• Layer
– Software-as-a-Service (SaaS)
– Platform-as-a-Service (PaaS)
– Infrastructure-as-a-Service (IaaS)
– Data Storage-as-a-Service (DaaS)
– Communication-as-a-Service (CaaS)
– Hardware-as-a-Service (HaaS)
• Type
– Public cloud
– Private cloud
– Inter-cloud
The Software Stack and the Cloud
Layers of the stack: Application, Application Platform, Infrastructure
• "Old-style" applications: everything local
• IaaS (Infrastructure as a Service): infrastructure in the cloud, everything else local
• PaaS (Platform as a Service): application development and runtime in the cloud
• SaaS (Software as a Service): software "as is" from the cloud
Service Value to Users
Value visibility to end users decreases from SaaS to IaaS:
• SaaS (users): Salesforce.com, Google Docs, email services
• PaaS (application developers): Microsoft Azure, Google App Engine
• IaaS (network admins): Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3)
Vision: use applications just as we use electrical power
SaaS: Software as a Service
• Service provider hosts an application that multiple customers access over the Internet
– Sales, marketing, support, payroll, …
• Goal: reduce total cost of ownership (TCO)
– Hardware and software ("capital expenditures")
– Bandwidth, personnel, electricity ("operational expenditures")
• Most useful for small/medium businesses without their own data center (and without expertise in this area)
Variants of Scalability
• Handle large data sets and many data sets
• Scale up: big iron
• Scale out: commodity hardware
• Scale in: multiple apps/tenants/VMs on a single machine
Scale Out: Pros and Cons
• Pros
– More performance for the same cost (hardware, energy)
– Simple failure handling (replace broken machines; the impact of a failure is small)
– Easy to adjust capacity incrementally
– Easy to upgrade incrementally
• Cons
– Frequent load balancing and capacity adjustment necessary (over-provisioning, moving distributed data)
– Large data sets distributed over many machines (reads require a good distribution of work or many network requests; writes require transactions)
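To make the data-distribution cons concrete, here is a minimal sketch assuming simple hash partitioning with hash(key) mod N (one common scheme; the slide does not prescribe one). The helpers node_for, partition and fraction_moved are hypothetical. The sketch spreads keys over N machines and measures how many keys have to move when one machine is added.

```python
import hashlib


def node_for(key: str, num_nodes: int) -> int:
    """Map a key to a node with a stable hash (Python's built-in hash() of str is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(keys, num_nodes):
    """Group keys by the node that stores them."""
    layout = {n: [] for n in range(num_nodes)}
    for key in keys:
        layout[node_for(key, num_nodes)].append(key)
    return layout


def fraction_moved(keys, old_nodes, new_nodes):
    """Share of keys whose home node changes when the cluster is resized."""
    moved = sum(1 for k in keys if node_for(k, old_nodes) != node_for(k, new_nodes))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"account-{i}" for i in range(10_000)]
    print({node: len(ks) for node, ks in partition(keys, 4).items()})
    print(f"keys moved when growing from 4 to 5 nodes: {fraction_moved(keys, 4, 5):.0%}")
```

With modulo placement, a key keeps its node only if hash mod 4 equals hash mod 5, which holds for 4 of every 20 residues, so roughly 80% of the keys move when growing from 4 to 5 machines. This is why capacity adjustment in a scaled-out system implies moving distributed data.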
Scale In: Multi-Tenancy
• Run the application for multiple clients ("tenants") on the same machine/in the same process
• Only makes sense if enough tenants can be served from one machine
• Reduces CapEx: resource utilization increases, fewer/smaller machines required
• Reduces OpEx: fewer processes/machines to manage
• Where is multi-tenancy feasible?
Shared Machine
• One dedicated database process for each tenant
• Does not scale beyond a few tenants per server (overhead!)
• Suited to applications with a small number of tenants
Shared Process
• A single database process manages several tenants (in separate table spaces)
• Scales to many tenants (in terms of overhead)
• Migrating a tenant to a different process is simple (move all of its data files)
Shared Table
• Data from many tenants in the same table
– Add a tenant_id column that must be set on every access (by the application or by the DBMS)
• But: some tenants require additional columns
– Extend the schema with generic columns; the database needs to support rows with many NULL values efficiently
• Advantage: everything is pooled
– Processes, memory, connections, prepared statements
– New tenants can be created by DML (not DDL) operations
• Disadvantage: isolation is very weak
– Query optimization, statistics and cost estimation are more difficult (data from other tenants can influence estimates)
– Table scans are more expensive; caching etc. is less effective
– Tenant migration requires extracting data from the operational system
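A minimal sketch of the shared-table approach using Python's built-in sqlite3 module, assuming the application (not the DBMS) sets the tenant filter. The table layout (Account with tenant_id, aid, name) and the helper functions are hypothetical.

```python
import sqlite3


def make_shared_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # One physical table for all tenants; tenant_id is part of the primary key.
    conn.execute(
        "CREATE TABLE Account ("
        " tenant_id INTEGER NOT NULL, aid INTEGER NOT NULL, name TEXT,"
        " PRIMARY KEY (tenant_id, aid))"
    )
    return conn


def add_account(conn: sqlite3.Connection, tenant_id: int, aid: int, name: str) -> None:
    # Onboarding a tenant is just an INSERT (DML); no CREATE TABLE (DDL) is needed.
    conn.execute("INSERT INTO Account VALUES (?, ?, ?)", (tenant_id, aid, name))


def accounts_of(conn: sqlite3.Connection, tenant_id: int):
    # The access layer always adds the tenant filter; callers never query Account directly.
    cur = conn.execute(
        "SELECT aid, name FROM Account WHERE tenant_id = ? ORDER BY aid", (tenant_id,)
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = make_shared_db()
    add_account(db, 1, 1, "St. Mary")
    add_account(db, 2, 1, "Big Auto")
    print(accounts_of(db, 1))
```

Isolation here rests entirely on accounts_of always adding the tenant_id predicate, which is exactly the weak point named on the slide.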
Schema Flexibility in SaaS
• Each tenant uses a common base schema plus optional (private or shared) extensions
• All are mapped into one (multi-tenant) physical schema by a dedicated Mapper in the database
• Schema evolution must be possible during operation, without intervention by a DBA
Schema Management
Two alternatives:
• Database manages the schema (evolution through DDL operations)
– Private tables
– Extension tables
– Sparse columns
• Application manages the schema (evolution through DML operations)
– XML columns
– Universal tables
– Pivot tables
Running example: a simple account table, Account
Private Tables
• Each tenant uses a private set of tables
• Works well for a small number of tables and tenants
• Constant overhead per table
• Each table is small, so storage space is not used effectively
[Figure: private tables Account_1, Account_2 and Account_3 for tenants from the healthcare and automotive domains]
Private Tables – Query Transformation
• Tenant 1: SELECT Beds FROM Account WHERE Hospital='State'
– rewritten to: SELECT Beds FROM Account_1 WHERE Hospital='State'
• Tenant 2: SELECT Name FROM Account WHERE Dealers>50
– rewritten to: SELECT Name FROM Account_2 WHERE Dealers>50
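The rewriting step on this slide can be sketched in Python with sqlite3. This is a deliberately naive sketch: the function to_private and the column Aid are hypothetical, the sample rows are made up, and it substitutes table names textually rather than parsing SQL the way a real Mapper would.

```python
import re
import sqlite3


def to_private(sql: str, tenant_id: int, logical_tables=("Account",)) -> str:
    """Rewrite logical table names to the tenant's private tables, e.g. Account -> Account_1.

    Naive word-level substitution: it would also rewrite a column or string literal
    spelled exactly like a table name, so a real Mapper would parse the SQL instead.
    """
    for table in logical_tables:
        sql = re.sub(rf"\b{re.escape(table)}\b", f"{table}_{tenant_id}", sql)
    return sql


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Account_1 (Aid INTEGER, Name TEXT, Hospital TEXT, Beds INTEGER)")
    conn.execute("CREATE TABLE Account_2 (Aid INTEGER, Name TEXT, Dealers INTEGER)")
    conn.execute("INSERT INTO Account_1 VALUES (1, 'St. Mary', 'State', 1042)")
    conn.execute("INSERT INTO Account_2 VALUES (1, 'Big Auto', 65)")

    q1 = to_private("SELECT Beds FROM Account WHERE Hospital='State'", 1)
    q2 = to_private("SELECT Name FROM Account WHERE Dealers>50", 2)
    print(q1, conn.execute(q1).fetchall())
    print(q2, conn.execute(q2).fetchall())
```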
Extension Tables
• Keep common data in a base table with explicit tenant ID and row ID
• Each tenant's extension gets its own table; join on tenant ID and row ID at query time
• Number of tables still proportional to the number of tenants (but the additional tables are rather small)
[Figure: base table Account with extension tables Account_Healthcare and Account_Automotive]
Extension Tables – Query Transformation
• Tenant 1: SELECT Name, Beds FROM Account WHERE Hospital='State'
– rewritten to: SELECT A.Name, H.Beds FROM Account A, Account_Healthcare H WHERE A.Tenant=1 AND H.Tenant=1 AND A.Row=H.Row AND H.Hospital='State'
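The rewritten query can be run as-is in Python's built-in sqlite3 module. This is a minimal sketch with hypothetical sample rows. The slide's Row column is renamed row_id here to avoid any clash with SQL keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Base table with explicit tenant and row IDs
    CREATE TABLE Account (Tenant INTEGER, row_id INTEGER, Aid INTEGER, Name TEXT);
    -- Extension tables, joined back to the base table on (Tenant, row_id)
    CREATE TABLE Account_Healthcare (Tenant INTEGER, row_id INTEGER, Hospital TEXT, Beds INTEGER);
    CREATE TABLE Account_Automotive (Tenant INTEGER, row_id INTEGER, Dealers INTEGER);

    INSERT INTO Account VALUES (1, 0, 1, 'St. Mary'), (1, 1, 2, 'Lakeside'), (2, 0, 1, 'Big Auto');
    INSERT INTO Account_Healthcare VALUES (1, 0, 'State', 1042), (1, 1, 'Private', 120);
    INSERT INTO Account_Automotive VALUES (2, 0, 65);
    """
)

# Physical form of tenant 1's query: SELECT Name, Beds FROM Account WHERE Hospital='State'
PHYSICAL_QUERY = """
    SELECT A.Name, H.Beds
    FROM Account A, Account_Healthcare H
    WHERE A.Tenant = 1 AND H.Tenant = 1
      AND A.row_id = H.row_id
      AND H.Hospital = 'State'
"""

print(conn.execute(PHYSICAL_QUERY).fetchall())
```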
Sparse Columns
• Each tuple has only a few out of many possible attributes (e.g., catalogues), which leads to many NULL values
• Store only attributes that have values, to avoid NULLs
• Store each value together with its attribute (identifier)
• Not widely supported by database systems (example: Microsoft SQL Server)
• Add extensions as "sparse columns" to tables
• Schema evolution requires DDL
Sparse Columns – Query Transformation
• The system retrieves the attribute ID and extracts the value from the SPARSE column
• CREATE TABLE Account (Tenant INT, Account INT, Name VARCHAR(100), Hospital VARCHAR(100) SPARSE, Beds INT SPARSE, Dealer INT SPARSE)
• Tenant 1: SELECT Name, Beds FROM Account WHERE Tenant=1 AND Hospital='State'
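The "retrieve attribute ID, then extract the value" step can be modeled in a few lines of Python. This is a conceptual sketch only and not SQL Server's actual storage format. The attribute-ID catalog SPARSE_ATTRS and the functions encode_sparse and read_sparse are hypothetical.

```python
from typing import Any, Dict, Optional

# Hypothetical catalog assigning an attribute ID to each sparse column of Account.
SPARSE_ATTRS: Dict[str, int] = {"Hospital": 1, "Beds": 2, "Dealer": 3}


def encode_sparse(values: Dict[str, Any]) -> Dict[int, Any]:
    """Keep only attributes that have a value, each stored under its attribute ID."""
    return {SPARSE_ATTRS[name]: v for name, v in values.items() if v is not None}


def read_sparse(encoded: Dict[int, Any], name: str) -> Optional[Any]:
    """Look up the attribute ID, then extract the value; a missing ID means NULL (None)."""
    return encoded.get(SPARSE_ATTRS[name])


if __name__ == "__main__":
    healthcare_row = encode_sparse({"Hospital": "State", "Beds": 1042, "Dealer": None})
    print(healthcare_row)  # only the two non-NULL attributes are stored
    print(read_sparse(healthcare_row, "Dealer"))
```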
Universal Table
• Use a wide table with generic VARCHAR columns
• Application-specific mapping
• Requires casting of non-textual values
• Many NULL values; usually no indexes possible
• Used with a huge number of extensions and many tenants
Universal Table – Query Transformation
• Tenant 1: SELECT Name, Beds FROM Account WHERE Hospital='State'
– rewritten to: SELECT Col2, TO_INTEGER(Col4) FROM Universe WHERE Tenant=1 AND Table=1 AND Col3='State'
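A minimal sqlite3 sketch of the universal table. The sample rows are hypothetical, the slide's Table column is renamed Tbl, and because SQLite has no TO_INTEGER function the cast is written as CAST(Col4 AS INTEGER).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Generic wide table: every logical column is stored as text in Col1..Col4.
conn.execute(
    "CREATE TABLE Universe (Tenant INTEGER, Tbl INTEGER,"
    " Col1 TEXT, Col2 TEXT, Col3 TEXT, Col4 TEXT)"
)
conn.executemany(
    "INSERT INTO Universe VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Tenant 1, logical table 1 (Account): Col1=Aid, Col2=Name, Col3=Hospital, Col4=Beds
        (1, 1, "1", "St. Mary", "State", "1042"),
        (1, 1, "2", "Lakeside", "Private", "120"),
        # Tenant 2 reuses the same physical columns for its own Account: Col3=Dealers
        (2, 1, "1", "Big Auto", "65", None),
    ],
)

# Physical form of tenant 1's query; the integer column must be cast back from text.
QUERY = (
    "SELECT Col2, CAST(Col4 AS INTEGER) FROM Universe"
    " WHERE Tenant = 1 AND Tbl = 1 AND Col3 = 'State'"
)
print(conn.execute(QUERY).fetchall())
```

The same physical column Col3 holds a hospital name for tenant 1 and a dealer count for tenant 2, which is why indexing such columns is usually impractical.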
Pivot Tables
• Store data in 3-ary tables with column IDs and values
• One tuple for each non-NULL attribute of the original table
• One pivot table for each type (int, string, …)
• Eliminates the problem of many NULL values
• No casts necessary, indexing possible
• Similar approach: Google BigTable
Pivot Tables – Example
[Figure: private tables Account_1, Account_2 and Account_3 (healthcare and automotive domains) stored in the pivot tables Pivot_Int and Pivot_String]
Pivot Tables – Query Transformation
• Tenant 1: SELECT Beds FROM Account WHERE Hospital='State'
– rewritten to: SELECT I.Int FROM Pivot_Int I, Pivot_String S WHERE I.Tenant=1 AND S.Tenant=1 AND S.Table=0 AND S.Col=3 AND I.Table=0 AND I.Col=4 AND I.Row=S.Row AND S.String='State'
Pivot Tables – Query Transformation
• Tenant 1: SELECT Name, Beds FROM Account WHERE Hospital='State'
– rewritten to: SELECT S1.String, I.Int FROM Pivot_Int I, Pivot_String S1, Pivot_String S2 WHERE I.Tenant=1 AND S1.Tenant=1 AND S2.Tenant=1 AND S1.Table=0 AND S1.Col=2 AND S2.Table=0 AND S2.Col=3 AND I.Table=0 AND I.Col=4 AND I.Row=S2.Row AND S1.Row=S2.Row AND S2.String='State'
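A minimal sqlite3 sketch that pivots logical Account rows into the two typed tables and then runs the query above. The helper pivot_row and the sample rows are hypothetical. The slide's Table, Row, Int and String columns are renamed Tbl, row_id, IntVal and StrVal to stay clear of SQL keywords. Column IDs follow the slides: 2 = Name, 3 = Hospital, 4 = Beds (1 is assumed to be Aid).

```python
import sqlite3


def pivot_row(tenant, tbl, row_id, values):
    """Split one logical row {col_id: value} into per-type pivot tuples, skipping NULLs."""
    ints, strings = [], []
    for col, value in values.items():
        if value is None:
            continue  # a NULL attribute produces no tuple at all
        target = ints if isinstance(value, int) else strings
        target.append((tenant, tbl, col, row_id, value))
    return ints, strings


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Pivot_Int (Tenant INTEGER, Tbl INTEGER, Col INTEGER, row_id INTEGER, IntVal INTEGER);
    CREATE TABLE Pivot_String (Tenant INTEGER, Tbl INTEGER, Col INTEGER, row_id INTEGER, StrVal TEXT);
    """
)

ACCOUNT_ROWS = {  # tenant 1, logical table 0 (Account)
    0: {1: 1, 2: "St. Mary", 3: "State", 4: 1042},
    1: {1: 2, 2: "Lakeside", 3: "Private", 4: None},  # Beds unknown: no Pivot_Int tuple
}
for row_id, values in ACCOUNT_ROWS.items():
    ints, strings = pivot_row(1, 0, row_id, values)
    conn.executemany("INSERT INTO Pivot_Int VALUES (?, ?, ?, ?, ?)", ints)
    conn.executemany("INSERT INTO Pivot_String VALUES (?, ?, ?, ?, ?)", strings)

# One self-join per logical column that the query touches (Name, Hospital, Beds).
QUERY = """
    SELECT S1.StrVal, I.IntVal
    FROM Pivot_Int I, Pivot_String S1, Pivot_String S2
    WHERE I.Tenant = 1 AND S1.Tenant = 1 AND S2.Tenant = 1
      AND S1.Tbl = 0 AND S1.Col = 2
      AND S2.Tbl = 0 AND S2.Col = 3
      AND I.Tbl = 0 AND I.Col = 4
      AND I.row_id = S2.row_id AND S1.row_id = S2.row_id
      AND S2.StrVal = 'State'
"""
print(conn.execute(QUERY).fetchall())
```

Every additional logical column in a query adds another join, which is the cost side of eliminating NULLs and casts.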
Google BigTable
• Group columns into column families
– Explicitly defined
– Not too many (hundreds), should be stable
– Expected type, but internally stored as strings
– Roughly correspond to a pivot table
– Data in a family is compressed and stored together
• Columns
– Created on the fly, unbounded number
Google BigTable
[Figure: BigTable layout with one column family for the logical table "Account"]
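The data model on these two slides (declared column families, columns created on the fly, values kept as uninterpreted strings) can be illustrated with a toy in-memory model. This is not the BigTable API or its storage engine. The class MiniBigTable, its methods and the row key format are hypothetical; only the "family:qualifier" column naming follows BigTable's convention.

```python
from collections import defaultdict


class MiniBigTable:
    """Toy model of BigTable's data layout (not its API): families fixed, columns free."""

    def __init__(self, families):
        self.families = set(families)  # declared up front, few and stable
        self.rows = defaultdict(dict)  # row key -> {"family:qualifier": bytes}

    def put(self, row_key: str, column: str, value) -> None:
        family, _, qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        # Columns inside a family are created on the fly; values are kept as raw bytes.
        self.rows[row_key][f"{family}:{qualifier}"] = str(value).encode("utf-8")

    def family(self, row_key: str, family: str) -> dict:
        # All columns of one family for a row (in BigTable these are stored together).
        prefix = family + ":"
        return {c: v for c, v in self.rows.get(row_key, {}).items() if c.startswith(prefix)}


if __name__ == "__main__":
    table = MiniBigTable(["Account"])  # one column family for the logical table Account
    table.put("tenant1/acct1", "Account:Name", "St. Mary")
    table.put("tenant1/acct1", "Account:Beds", 1042)
    print(table.family("tenant1/acct1", "Account"))
```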
Potential Problems with Multi-Tenancy
• Resource contention among tenants
– Main problem: malicious and bad requests
– Impose limits on the resource consumption of requests
– SLAs
• Access control among tenants
– Dangerous to rely only on the application (bugs!)
– Carefully implement Mappers or use tuple-level ACLs (expensive and not always available)
• Moving data for individual tenants
– Load balancing, archiving (and backup)
– Requires expensive queries at runtime
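The slide only says to impose limits on resource consumption. One common mechanism, swapped in here and not taken from the source, is a per-tenant token bucket. This is a minimal sketch: the class TenantQuota, its rate/burst parameters and the injectable clock are assumptions for illustration.

```python
import time


class TenantQuota:
    """Per-tenant token bucket: `rate` units per second, bursting up to `burst` units."""

    def __init__(self, rate: float, burst: float, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.state = {}  # tenant_id -> (tokens_left, time_of_last_update)

    def allow(self, tenant_id, cost: float = 1.0) -> bool:
        """Return True and charge the tenant if the request fits its budget, else False."""
        now = self.clock()
        tokens, last = self.state.get(tenant_id, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)  # refill since last call
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self.state[tenant_id] = (tokens, now)
        return allowed


if __name__ == "__main__":
    quota = TenantQuota(rate=5.0, burst=10.0)
    accepted = sum(quota.allow("tenant-42") for _ in range(100))
    print(f"accepted {accepted} of 100 back-to-back requests")
```

Because each tenant has its own bucket, a flood of bad requests from one tenant is rejected without consuming another tenant's budget.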
Different Cloud Computing Layers
• Application Service (SaaS): MS Live/ExchangeLabs, IBM, Google Apps; Salesforce.com, Quicken Online, Zoho, Cisco
• Application Platform: Google App Engine, Mosso, Force.com, Engine Yard, Facebook, Heroku, AWS
• Server Platform: 3Tera, EC2, SliceHost, GoGrid, RightScale, Linode
• Storage Platform: Amazon S3, Dell, Apple, ...
Technical Issues
• Virtualization security
• Reliability
• Monitoring
• Manageability
Virtualization Security (1/2)
• Virtualization
– Abstracting the underlying resources so that multiple operating systems can run simultaneously on a single physical machine
– Improving resource utilization by sharing the available resources among multiple on-demand needs
Virtualization Security (2/2)
• Virtualization security
– Includes the standard enterprise security policies on access control, activity monitoring and patch management
• Many IT people still believe that the hypervisor and virtual machines are safe
– Virtualization security becomes one of the important factors when virtualization technologies move into the cloud
• Access control and monitoring of the virtual infrastructure will be at the top of providers' minds
Reliability
• One of the biggest factors in enterprise adoption
– Almost no SLAs are provided by the cloud providers today
• Enterprises cannot reasonably rely on cloud infrastructures/platforms to run their business
• Amazon said that "AWS (Amazon Web Services) only provides an SLA for its S3 service"
– Hard to imagine enterprises signing cloud computing contracts without clearly defined SLAs
• Some startups are coming up with clever ideas to provide SLAs as third-party vendors
• Cloud providers need to grow up/wake up and actually do something to encourage enterprise adoption
Monitoring
• Critical to any IT operation for performance, availability and payment
• Monitoring in cloud computing
– How much CPU or memory the machines are using
• CPU and memory usage are misleading most of the time in virtual environments
– The only real measurements: how long your transactions take and how much latency there is
• High Availability's article on latency
– Amazon found that every 100 ms of latency cost them 1% in sales
– Google found that an extra 0.5 seconds in search page generation time dropped traffic by 20%
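Following the slide's advice to measure transaction time rather than CPU or memory, here is a minimal sketch that records per-transaction latency and reports a nearest-rank percentile. The names timed_transaction, latencies_ms and percentile are hypothetical, and time.sleep stands in for a real database transaction.

```python
import math
import time
from contextlib import contextmanager

latencies_ms = []  # hypothetical in-process store; one sample per transaction


@contextmanager
def timed_transaction():
    """Record the wall-clock duration of the enclosed transaction in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        latencies_ms.append((time.perf_counter() - start) * 1000.0)


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    for _ in range(20):
        with timed_transaction():
            time.sleep(0.005)  # stand-in for a real database transaction
    print(f"p50={percentile(latencies_ms, 50):.1f} ms, p99={percentile(latencies_ms, 99):.1f} ms")
```

Reporting a high percentile rather than an average captures the slow transactions that the latency figures above are about.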
Manageability
• Most IaaS/PaaS providers offer raw infrastructures and platforms that do not have great management capabilities
• Auto-scaling: an example of missing management capabilities for cloud infrastructures
– Amazon EC2 claims to be elastic; however, this really means that it has the potential to be elastic
– Amazon EC2 does not automatically scale your application as your server becomes heavily loaded
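Since the infrastructure does not scale the application by itself, the customer has to supply a scaling rule. This is a minimal sketch of a threshold policy driven by observed latency. The function desired_instances, the thresholds (20% above or 50% below the target) and the instance limits are illustrative assumptions. Actually starting or stopping instances is left out.

```python
def desired_instances(current: int, avg_latency_ms: float, target_ms: float,
                      min_n: int = 1, max_n: int = 20) -> int:
    """Hypothetical threshold policy deciding how many servers should run next."""
    if avg_latency_ms > 1.2 * target_ms:
        proposal = current + 1  # overloaded: scale out by one server
    elif avg_latency_ms < 0.5 * target_ms:
        proposal = current - 1  # clearly underloaded: scale in by one server
    else:
        proposal = current  # within the band: keep the current size
    return max(min_n, min(max_n, proposal))


if __name__ == "__main__":
    print(desired_instances(current=3, avg_latency_ms=300.0, target_ms=200.0))
```

The gap between the two thresholds keeps the rule from adding and removing a server on every small fluctuation.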
Summary: Cloud Computing
• New paradigm to provide services (SaaS, PaaS, IaaS)
• SaaS is an important part of cloud computing
• Different forms of scaling: up, out, in
• Re-use processes: multiple tenants on a single database instance (but: isolation?)
• Map many logical schemas to a single physical schema; many mapping techniques (but: query transformation?)