SMILE: A Data Sharing Platform for Mobile Apps in the Cloud
Mohamed Sarwat (UMN), Haopeng Zhang (UMass, Amherst), Jagan Sankaranarayanan and Hakan Hacıgümüş (NEC Labs America)
Motivation For Sharing in Cloud
• Mobile apps run their databases in the cloud
• Often small databases
• Often hosted in the same cloud infrastructure
• Often need “fresh” data from other apps
  • e.g., a calendar app wants the itinerary from an airline booking app
• Need a declarative way for apps to share data
[Figure: Sharing MiddLewarE (SMILE); Database as a Service; multitenant database with App 1 DB, App 2 DB, App n DB]
Declarative Sharing
• Sharing (S1): Datasets; Transformation: (SPJ); Staleness SLA
• Sharing (S2): …
• Sharing (Sn): …
[Figure: CloudDB holding datasets D1, D2, D3, with a Transform step]
Three Ways of Enabling Sharing
• Sharing via API
• Sharing using a materialized shared space (i.e., view)
• Direct sharing
• What requirements does materialization satisfy?
• Service provider’s cost in keeping the shared space consistent
[Figure: three panels, each showing App Alice Data and App Bob Data; labels: Web Service, API, SMILE, Materialized Shared Space, SQL]
Sharing Example: Simple Sharing Scenario
• Sharing (S1):
  • Sources: SP, UP
  • Transformation: πσ(SP ✖ UP)
  • Staleness: <= 5 seconds
• SP = Stock Price, UP = User Portfolio
[Figure: SP and UP feed SP ✖ UP, then πσ(SP ✖ UP), within < 5 seconds]
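A sharing such as S1 can be written down as a small record. The Python sketch below is illustrative only: the `Sharing` class and its field names are assumptions made for this example, not SMILE's actual interface, and the transformation is kept as opaque text rather than parsed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Sharing:
    """Hypothetical record for one declarative sharing."""
    name: str
    sources: Tuple[str, ...]        # names of the source datasets
    transformation: str             # SPJ expression, stored as plain text
    staleness_sla_seconds: float    # maximum tolerated staleness

    def __post_init__(self):
        if not self.sources:
            raise ValueError("a sharing needs at least one source dataset")
        if self.staleness_sla_seconds <= 0:
            raise ValueError("the staleness SLA must be positive")


# The simple scenario from the slide: stock prices combined with user portfolios.
s1 = Sharing(
    name="S1",
    sources=("SP", "UP"),
    transformation="select-project over the join of SP and UP",
    staleness_sla_seconds=5.0,
)
```

Keeping the three parts (sources, transformation, SLA) in one immutable record mirrors the declarative style: the app states what it wants and leaves the plan to the platform.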
Sharing Example (Contd.)
[Figure: three alternative plans for SP ✖ UP, built from COPY, JOIN, and DISTRIBUTED JOIN operators over copies of SP and UP. Plan labels: $$$, 1 second staleness; $, 10 second staleness; $$, 3 second staleness]
Problem Formulation
• Given n sharings S = {S1, …, Sn}
• Each sharing specifies a staleness requirement in seconds (e.g., 5 seconds)
• Datasets are relations in an RDBMS
  • Updated asynchronously (i.e., independently)
• Goal: enable all sharings such that
  • The MVs (materialized views) used are always consistent
  • All MVs stay under their staleness SLA
  • The cost to the service provider is as low as possible
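SMILE System Architecture
[Figure: input sharings go to the sharing plan optimizer in SMILE, which produces a sharing plan; a PostgreSQL database gateway and machines Postgres 1 to Postgres 4; updates to a relation R are recorded in a LOG, captured as ΔR (Capture Delta), and moved between machines (Copy Delta)]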
Sharing Plan Optimizer
• R*-style optimizer
  • Varies join ordering and operator placement
  • Uses a dynamic programming formulation
• Uses four operators to express SPJ transformations in sharings
  • DeltaToRel, Join, Union, CopyDelta
• Two cost models:
  • Dollar cost of a plan
  • Time cost of a plan
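Sharing Plan
[Figure: sharing plan that maintains A⋈B across three machines. On machines m1 and m2, the relations A and B and their deltas ΔA and ΔB are processed by DELTATOREL, COPYDELTA, and JOIN operators, producing Δ(A⋈ΔB) and Δ(ΔA⋈B). COPYDELTA ships both results to machine m3, where UNION combines them into Δ(A⋈B) and DELTATOREL applies it to A⋈B]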
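The delta-based join in the plan above can be illustrated with a small Python sketch. It is a simplification under stated assumptions: relations are in-memory sets of `(key, value)` rows, deltas contain inserts only, everything runs on one machine (so CopyDelta is not modeled), and the choice that the ΔA branch joins against the already-updated B is an assumption made here so that no result row is missed. The function names are hypothetical.

```python
def join(left, right):
    """Equi-join two sets of (key, value) rows on the key."""
    index = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    return {(k, lv, rv) for k, lv in left for rv in index.get(k, [])}


def delta_join(a_old, delta_a, b_old, delta_b):
    """Compute the delta of A join B from the deltas of A and B (inserts only)."""
    b_new = b_old | delta_b
    from_delta_a = join(delta_a, b_new)   # new A rows against all of B
    from_delta_b = join(a_old, delta_b)   # old A rows against new B rows
    return from_delta_a | from_delta_b    # Union of the two branches


a_old = {(1, "a1"), (2, "a2")}
b_old = {(1, "b1")}
delta_a = {(3, "a3")}
delta_b = {(2, "b2"), (3, "b3")}

mv = join(a_old, b_old)                            # materialized view before updates
mv |= delta_join(a_old, delta_a, b_old, delta_b)   # DeltaToRel: apply delta to the MV

# The incrementally maintained view matches a full recomputation.
recomputed = join(a_old | delta_a, b_old | delta_b)
```

Only the deltas and one side of each join are touched per refresh, which is what makes it worthwhile to place the two join branches on different machines.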
Cost Models: Dollar and Time
• Dollar cost is the expense to the provider of executing the sharing plan, in $/second
  • Uses Amazon EC2 pricing
• Time cost is the critical data path time, in seconds
  • Uses a synthetic time model for each operator type
[Figure: axes labeled $ and staleness]
Time Cost Model
• We use a simple linear cost model to estimate the time taken by each operator: CopyDelta, DeltaToRel, Join, Union
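As a rough illustration of how a linear per-operator model leads to a critical time path, the Python sketch below sums operator times along the slowest input chain of a plan. The coefficients, the plan encoding, and the function names are all made up for this example; the slides do not give the actual model parameters.

```python
# Hypothetical coefficients: (fixed seconds, seconds per tuple) for each operator.
LINEAR_COEFFS = {
    "CopyDelta": (0.1, 0.001),
    "DeltaToRel": (0.1, 0.001),
    "Join": (0.2, 0.002),
    "Union": (0.1, 0.001),
}


def operator_time(op, n_tuples):
    """Linear estimate: fixed overhead plus a per-tuple term."""
    fixed, per_tuple = LINEAR_COEFFS[op]
    return fixed + per_tuple * n_tuples


def critical_path_time(plan, sink):
    """Time of the slowest chain of operators that ends at `sink`.

    `plan` maps a vertex name to (operator, n_tuples, list of input vertices).
    """
    memo = {}

    def finish(vertex):
        if vertex not in memo:
            op, n_tuples, inputs = plan[vertex]
            slowest_input = max((finish(u) for u in inputs), default=0.0)
            memo[vertex] = slowest_input + operator_time(op, n_tuples)
        return memo[vertex]

    return finish(sink)


plan = {
    "copy_a": ("CopyDelta", 1000, []),
    "copy_b": ("CopyDelta", 2000, []),
    "join": ("Join", 1000, ["copy_a", "copy_b"]),
    "apply": ("DeltaToRel", 500, ["join"]),
}
total = critical_path_time(plan, "apply")   # slowest chain: copy_b, join, apply
```

Inputs that run in parallel contribute only their maximum, so the estimate reflects the critical path rather than the total work in the plan.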
Generating Global Sharing Plan
• Input: set of n sharings
• Step 1: For each sharing, generate a sharing plan that is admissible
  • Admissible means that its critical time path is less than the staleness SLA
  • Generate two plans
    • DPD: cheapest dollar cost plan
    • DPT: smallest critical time path plan
  • Discard a plan if it is not admissible; if both are admissible, choose DPD
• Step 2: Make the plans cheaper by merging commonalities with other sharing plans, in the style of multi-query optimization
  • We call this merging operation “plumbing”
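The selection rule in Step 1 can be sketched in a few lines of Python. The `Plan` record and the function names are hypothetical; only the rule itself (admissible means critical time path below the SLA, prefer DPD when both plans qualify) comes from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    label: str             # "DPD" or "DPT"
    dollar_cost: float     # $/second
    critical_time: float   # critical time path in seconds


def is_admissible(plan: Plan, sla_seconds: float) -> bool:
    """A plan is admissible if its critical time path is below the SLA."""
    return plan.critical_time < sla_seconds


def choose_plan(dpd: Plan, dpt: Plan, sla_seconds: float) -> Optional[Plan]:
    """Prefer the cheapest plan; fall back to the fastest; else report failure."""
    if is_admissible(dpd, sla_seconds):
        return dpd
    if is_admissible(dpt, sla_seconds):
        return dpt
    return None   # neither plan can meet the SLA


dpd = Plan("DPD", dollar_cost=1.0, critical_time=4.0)
dpt = Plan("DPT", dollar_cost=3.0, critical_time=1.0)
chosen = choose_plan(dpd, dpt, sla_seconds=5.0)
```

Because DPT has the smallest critical time path, an admissible DPD implies that DPT is admissible too, so checking DPD first covers the "both admissible" case.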
Plumbing Operation
[Figure: plumbing at a vertex pi between two plans, showing SRC(pi) and DST(pi), COPY DELTA and JOIN operators, and the parts marked Remove]
• Plumbing increases the critical time path of the left plan, so it is valid only as long as the left plan is still under its staleness SLA
• Perform plumbing greedily, one operation at a time, starting with the one that yields the most cost savings
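The greedy loop can be sketched as follows in Python. This is a deliberate simplification, not the procedure from the paper: each candidate merge is assumed to come with a fixed dollar saving and a fixed increase in the critical time path of one affected plan, whereas a real implementation would likely re-evaluate the candidates after every merge. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlumbingCandidate:
    name: str
    affected_plan: str      # plan whose critical time path grows
    dollar_savings: float   # assumed fixed saving in $/second
    added_time: float       # assumed fixed increase in seconds


def greedy_plumbing(candidates, critical_time, sla):
    """Apply candidates in order of savings, skipping any that would break an SLA."""
    current = dict(critical_time)
    applied = []
    for cand in sorted(candidates, key=lambda c: c.dollar_savings, reverse=True):
        new_time = current[cand.affected_plan] + cand.added_time
        if new_time < sla[cand.affected_plan]:   # plan stays admissible
            current[cand.affected_plan] = new_time
            applied.append(cand.name)
    return applied, current


candidates = [
    PlumbingCandidate("c1", "P1", dollar_savings=5.0, added_time=2.0),
    PlumbingCandidate("c2", "P1", dollar_savings=3.0, added_time=2.0),
    PlumbingCandidate("c3", "P2", dollar_savings=1.0, added_time=1.0),
]
applied, times = greedy_plumbing(
    candidates,
    critical_time={"P1": 2.0, "P2": 1.0},
    sla={"P1": 5.0, "P2": 10.0},
)
```

In this run `c2` is skipped because applying it after `c1` would push plan `P1` past its 5 second SLA.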
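SMILE System Architecture
[Figure: the architecture with runtime components added. SMILE contains the sharing plan optimizer and the sharing executor; agents, heartbeat and push messages, and a pub/sub channel connect it to the PostgreSQL database gateway and machines Postgres 1 to Postgres 4; updates to a relation R are recorded in a LOG, captured as ΔR (Capture Delta), and moved between machines (Copy Delta)]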
Sharing Executor
• Accounts for runtime variations in the system
  • Change in the input update rate
  • Machine or resource contention or unavailability
• Obtains the current timestamp of each vertex and issues “push” operations
• A push operation specifies how much to “synchronously” advance the timestamp of each vertex in the sharing plan
• Tries to combine work as much as possible
• Uses a feedback loop to automatically account for runtime variations
Staleness and Push
• Current STALENESS = MAX_TS(SRCS) - TS(DEST)
• PUSH: how much to advance TS(DEST)?
  • Cannot be more than MIN_TS(SRCS) - TS(DEST)
• See the paper for a sharing executor that is lazy by design and refreshes MVs just as they are about to miss the staleness SLA
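The two quantities on this slide, staleness as MAX_TS(SRCS) - TS(DEST) and the push bound MIN_TS(SRCS) - TS(DEST), translate directly into code. The Python sketch below assumes timestamps are plain numbers in seconds; the function names are hypothetical, and clamping the push bound at zero is an assumption added here for the case where some source is behind the destination.

```python
from typing import Sequence


def current_staleness(src_ts: Sequence[float], dest_ts: float) -> float:
    """Staleness of the destination: newest source timestamp minus TS(DEST)."""
    return max(src_ts) - dest_ts


def max_push(src_ts: Sequence[float], dest_ts: float) -> float:
    """Largest allowed advance of TS(DEST): bounded by the slowest source."""
    return max(0.0, min(src_ts) - dest_ts)


src_ts = [105.0, 102.0]   # timestamps of the two sources
dest_ts = 100.0           # timestamp of the destination MV

staleness = current_staleness(src_ts, dest_ts)   # 5.0 seconds behind the newest source
push_limit = max_push(src_ts, dest_ts)           # can advance by at most 2.0 seconds
```

The bound uses the minimum because the destination can only move to a timestamp that every source has already reached.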
Experiments
• Twitter GardenHose stream
• 6 machines
  • One machine generates updates and hosts base relations
  • 5 machines host sharing plan operators
• Rate: 50 to 10k tweets/sec
• Sharings: 5 to 50 sharings
• SLA: 10 to 60 seconds
Base relations
• Unpack incoming tweets into 9 base relations
Sharing Arrangements
• 25 sharing arrangements as SPJ transformations on base relations
Sharing Plan
• 25 sharings, 6 machines
For Varying Update Rates
• SLA violation is low even for large update rates
Related Work
• View maintenance
• View selection
• Cache placement
• Data quality/staleness
• Data integration
• Distributed databases
• Multi-query optimization
• Other data sharing efforts
View Maintenance
• When sources are not always at a consistent snapshot
  • Need to use compensation [Zhuge et al., SIGMOD 1995]
• Rolling join [Salem et al., SIGMOD 2000]
  • Shows how to compose n-way asynchronous propagation queries
  • Sharing plan is based on this work
• How to reduce maintenance cost?
  • Merge common sub-expressions in the update mechanisms of different MVs to reduce cost [Ross et al., SIGMOD 1996] [Mistry et al., SIGMOD 2001]
• Staleness in a data warehouse setup [Labrinidis et al., UMD CS TR, 1998]
Summary
• SMILE is a declarative data sharing platform in the cloud
• Sharings can specify a transformation and a staleness SLA
• SMILE uses both static and runtime optimizations
• Experimental results show that it can handle high update rates and a large number of sharings