Price Optimal Querying with Data APIs. Prasang Upadhyaya, Magdalena Balazinska, Dan Suciu. VLDB 2017. Presenter: Shunit Agmon. Outline: Motivation, New solution: Refunds, Extensions and Optimizations, Experimental Results.
Price Optimal Querying with Data APIs • Prasang Upadhyaya, Magdalena Balazinska, Dan Suciu • VLDB 2017 • Presenter: Shunit Agmon
Outline • Motivation • New solution: Refunds • Extensions and Optimizations • Experimental Results
Motivational Example • Bob sells data about users’ check-ins to businesses. • API: (lat, long, r, time) -> list of users • Alice makes an API call, then another call, and then another. [Figure: map of check-in areas; some people visited area 3.] • Impossible to avoid overpaying!
Objective • Define a new pricing method s.t. • Clients pay for data they buy • seller is happy • Clients don’t pay too much for the data • clients are happy, more clients come, seller is happy again • We will formalize this later.
Problem Setting • Trusted seller, untrusted buyer • Alice runs an app that acquires data from Bob • Alice is charged separately for each output tuple • Database D with schemas of the form (tid, ver, ) • Pricing by full selection queries: • Assume D has a single relation • SELECT * FROM D WHERE [condition] • All tuples have the same price (but the solution can be generalized)
Existing Solutions • Count (no history) • Bob charges Alice for each tuple she gets • Alice will pay for some tuples more than once • Block/Stream • History • Bob tracks Alice’s purchases • Bob needs to store all the purchases, and buyer purchases can’t be anonymous. • Block/Stream
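The difference between the two existing schemes can be sketched in a few lines. This is an illustration only: the function names, the unit price, and the sample queries are assumptions, not taken from the slides.

```python
def count_price(queries, price=1):
    """Count pricing: every returned tuple is charged, no history is kept."""
    return price * sum(len(q) for q in queries)


def history_price(queries, price=1):
    """History pricing: Bob remembers what Alice bought and charges only new tuples."""
    seen = set()
    total = 0
    for q in queries:
        new = set(q) - seen          # tuples Alice has not paid for yet
        total += price * len(new)
        seen |= new                  # Bob must store this per buyer
    return total


# Three overlapping query answers (hypothetical tuple ids).
queries = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
print(count_price(queries), history_price(queries))
```

The gap between the two totals is the overpayment under Count; the `seen` set is the per-buyer state that History forces Bob to keep.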
Outline • Motivation • New solution: Refunds • Extensions and Optimizations • Experimental Results
Solution: Refunds • Alice makes multiple API calls, as before • Bob computes refund coupons and returns them with the answers • If Alice notices a repeated purchase of the same tuple, she sends the coupons back to Bob • Bob refunds Alice for the tuples she bought twice.
Protocol: BasicRefunds • Alice sends: Query Q • Bob replies: Result Q(D), Refunds(Q,D): • Alice sends: Refund please, • Bob checks: ? ? ? Then give Alice her money back.
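A minimal sketch of how such a stateless refund check could work. The coupon layout (tid, qid, MAC), the HMAC-SHA1 construction, the key, and all function names are assumptions made for illustration; the slides only state that coupons are returned with the answers, that query ids in a refund request must differ, and that SHA1 is used for hashing in the experiments.

```python
import hashlib
import hmac

KEY = b"bob-secret"  # hypothetical secret known only to Bob


def make_coupon(tid, qid, key=KEY):
    """Coupon for tuple `tid` returned by query `qid`, authenticated by Bob."""
    mac = hmac.new(key, f"{tid}|{qid}".encode(), hashlib.sha1).hexdigest()
    return (tid, qid, mac)


def is_valid(coupon, key=KEY):
    """Bob recomputes the MAC; he does not need to remember issued coupons."""
    tid, qid, mac = coupon
    return hmac.compare_digest(make_coupon(tid, qid, key)[2], mac)


def basic_refund(c1, c2, price=1):
    """Refund one purchase if both coupons are genuine, name the same tuple,
    and come from different queries."""
    ok = is_valid(c1) and is_valid(c2) and c1[0] == c2[0] and c1[1] != c2[1]
    return price if ok else 0


c1, c2 = make_coupon(7, 1), make_coupon(7, 2)
print(basic_refund(c1, c2))
```

Because Bob keeps no state in this sketch, nothing stops the same pair of coupons from being presented again.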
Annotations • Let W be a sequence of messages from Alice to Bob. For W, define: • the set of all tuples Alice purchased and their counts • the amount Alice pays for queries in W • the amount Bob refunds to Alice after processing W • the net payment by Alice with message sequence W (payment minus refunds)
Properties of a Refunds Protocol • Safety – A refund protocol is safe if Alice must pay at least once for each tuple she purchased. • Optimality – A refund protocol is optimal if there is a way to ask for refunds so that Alice never pays more than once for each tuple she purchased. • Is BasicRefunds optimal? Is it safe?
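The two properties can be checked on a concrete outcome with a small sketch. It assumes a uniform price of one unit per tuple and counts payments per tuple; the function names and the `refunded` mapping (tuple id to number of refunds granted) are hypothetical.

```python
from collections import Counter


def bought(queries):
    """How many times each tuple was purchased across all queries."""
    return Counter(t for q in queries for t in q)


def net_per_tuple(queries, refunded):
    """Net number of payments per tuple: purchases minus refunds granted."""
    b = bought(queries)
    return {t: b[t] - refunded.get(t, 0) for t in b}


def is_safe_outcome(queries, refunded):
    """Safety: Alice still pays at least once for every tuple she purchased."""
    return all(n >= 1 for n in net_per_tuple(queries, refunded).values())


def pays_exactly_once(queries, refunded):
    """The outcome an optimal protocol must make reachable."""
    return all(n == 1 for n in net_per_tuple(queries, refunded).values())


queries = [{1, 2}, {2, 3}, {2}]          # tuple 2 is bought three times
print(net_per_tuple(queries, {2: 2}))
```

With two refunds for tuple 2 every tuple is paid for exactly once; a third refund would violate safety.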
BasicRefunds is Optimal • Proof by induction on the number of queries in a sequence of messages W. • Base case: with no queries, Alice pays nothing. • I.A: Assume that for n-1 queries there was an optimal sequence of queries and refund messages. • Step: Append to it the n’th query, and a sequence of refund messages, one for each tuple from that query that Alice has seen before. • Then show that with the new sequence Alice still pays at most once for each tuple she purchased.
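BasicRefunds is not Safe • Alice sends: Query Q • Bob replies: Result Q(D), Refunds(Q,D): • Alice sends several refund requests for the same tuple: Refund please, Refund please, Refund please, • Bob checks: ? ? ? Then give Alice her money back. • Each request passes Bob's checks, so Alice can end up paying for a tuple less than once.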
Protocol: MonotoneRefunds • Alice sends: Query Q • Bob replies: Result Q(D), Refunds(Q,D): • Alice sends: BEGIN REFUND, then all refund requests for tuples in query : then END REFUND • Bob checks: ? ? ? • Only one refund request for each tuple • qid of the second coupon is the same as in BEGIN REFUND • And • Then give Alice her money back and update • Otherwise, reject all coupons between BEGIN, END.
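A rough sketch of a refund round in this style. Several details are assumptions rather than the slides' definitions: Bob stores a single number (the query id of the last accepted refund round), rounds must arrive with strictly increasing query ids, each request pairs an older coupon with a coupon from the round's query, and coupons are authenticated with HMAC-SHA1. All names are hypothetical.

```python
import hashlib
import hmac

KEY = b"bob-secret"  # hypothetical secret known only to Bob


def coupon(tid, qid):
    mac = hmac.new(KEY, f"{tid}|{qid}".encode(), hashlib.sha1).hexdigest()
    return (tid, qid, mac)


def is_valid(c):
    return hmac.compare_digest(coupon(c[0], c[1])[2], c[2])


class Bob:
    def __init__(self):
        self.last_refund_qid = 0  # assumed: the only state Bob keeps

    def refund_round(self, qid, requests, price=1):
        """`requests` holds the (older_coupon, newer_coupon) pairs sent between
        BEGIN REFUND qid and END REFUND. The round is accepted or rejected as a whole."""
        tids = [newer[0] for _, newer in requests]
        ok = (
            qid > self.last_refund_qid                 # rounds move forward only
            and len(tids) == len(set(tids))            # one request per tuple
            and all(
                is_valid(older) and is_valid(newer)
                and older[0] == newer[0]               # same tuple
                and older[1] < newer[1] == qid         # second coupon is from this round's query
                for older, newer in requests
            )
        )
        if not ok:
            return 0                                   # reject every coupon in the round
        self.last_refund_qid = qid
        return price * len(requests)


bob = Bob()
round_2 = [(coupon(7, 1), coupon(7, 2))]
print(bob.refund_round(2, round_2), bob.refund_round(2, round_2))
```

Under these assumptions a replayed round is rejected because its query id is no longer larger than the stored one, which is the property BasicRefunds lacks.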
Monotone Refunds is Safe and Optimal • Optimality: as in BasicRefunds (same construction of refunds). • Safety: we prove a stronger claim. For each tuple t that Alice has purchased, define: • k – number of queries by Alice that contain t • r – number of valid refund requests for tuple t • Then in MonotoneRefunds, k - r ≥ 1 at all times. • Proof by induction on the length of the sequence of messages (W).
Monotone Refunds Safety Proof • Base case: A single query for tuple t was executed, so no valid refund messages can be constructed (query ids in the request must be different). Hence k = 1 and r = 0. • Step: Given a message sequence of length n-1 such that t appears in k queries and r refund messages, and k - r ≥ 1, observe the n’th message: • If it’s a query, it can only increase k. • If it’s a BEGIN/END REFUND message, or a refund message for another tuple, k and r stay the same. • If it’s a valid refund message for t: • If k - r > 1, then k - (r+1) ≥ 1 (we’re OK). • Otherwise, k - r = 1. The r refund messages had to use r+1 distinct query ids. Then the query id needed for the new refund is at least one more than the query id of the k’th query with t in it. Then it is not a valid refund message, a contradiction.
Outline • Motivation • New solution: Refunds • Extensions and Optimizations • Experimental Results
Extension: Multiple Buyers • Observation: the safety and optimality of the protocol rely on tuple ids being different iff the tuples are different • To enable multiple buyers, the tuple id will include the user id. • Coupons will look like: • Different users will be assigned different tuple ids for the same tuple, so a buyer can’t use another buyer’s coupon.
Extension: Updates • Assumption: an update of a tuple has the same price as a brand new tuple. • Bob maintains a version number for each tuple, incremented when the tuple is updated. • Coupons with version numbers will look like: • Version numbers add storage overhead, but some systems already maintain them (SDSS, SciDB).
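One way to picture both extensions is a coupon whose authenticated fields include the buyer and the tuple version. The field order, the HMAC-SHA1 construction, and the names below are assumptions for illustration, not the slides' coupon format.

```python
import hashlib
import hmac

KEY = b"bob-secret"  # hypothetical secret known only to Bob


def coupon(uid, tid, ver, qid):
    """Coupon bound to a buyer (uid), a tuple (tid), and that tuple's version (ver)."""
    msg = f"{uid}|{tid}|{ver}|{qid}".encode()
    return (uid, tid, ver, qid, hmac.new(KEY, msg, hashlib.sha1).hexdigest())


def same_item(c1, c2):
    """Two coupons refer to the same purchased item only if the buyer,
    the tuple id, and the version all match."""
    return c1[:3] == c2[:3]


a1 = coupon("alice", 7, 1, qid=1)
a2 = coupon("alice", 7, 1, qid=2)
print(same_item(a1, a2))
```

In this sketch an updated tuple (new version) or a coupon issued to another buyer never pairs up with Alice's earlier coupon, so neither can be used to claim a refund.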
Optimization: Group Coupons • Computing one coupon per tuple causes a lot of refund messages (API calls) • Solution: Bob can compute a coupon for a group of tuples • A group coupon can only be used to ask for a refund on all the tuples in the group • In a refund round, each tuple can only be included in one coupon • A coupon now contains a group id and group version number: • Bob has to give Alice a way to check if a tuple belongs to a group. • Contains(tid, gid) -> {True, False} • How should Bob group the tuples?
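Tree Structured Group Coupons • [Figure: binary tree of group ids with leaves (0,0) to (0,7), internal nodes (1,0) to (1,3), (2,0) and (2,1), and root (3,0); node (h+1,n) has children (h,2n) and (h,2n+1).] • Leaves are tuple ids, padded to a power of 2 • Group id (h,n) represents the tuple group with ids n·2^h through (n+1)·2^h - 1 • Possibility: Larger fan outs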
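The tree numbering makes the membership test a bit shift. The sketch below assumes tuple ids are the integers 0, 1, 2, ... at the leaves and a fan-out of 2; the function names, and the idea of covering a contiguous id range with aligned blocks, are illustrative assumptions.

```python
def contains(tid, gid):
    """True if leaf `tid` lies under group (h, n): dropping the h low bits of tid gives n."""
    h, n = gid
    return tid >> h == n


def cover(lo, hi):
    """Cover the tuple ids lo..hi (inclusive) with aligned tree groups,
    taking the largest block that starts at the current id and fits in the range."""
    groups = []
    while lo <= hi:
        h = 0
        # Grow the block while it stays aligned at `lo` and inside [lo, hi].
        while lo % (1 << (h + 1)) == 0 and lo + (1 << (h + 1)) - 1 <= hi:
            h += 1
        groups.append((h, lo >> h))
        lo += 1 << h
    return groups


print(contains(5, (1, 2)), cover(0, 7), cover(1, 6))
```

A range that lines up with a subtree collapses to a single group id, while a misaligned range such as 1..6 needs one group per aligned block.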
Outline • Motivation • New solution: Refunds • Extensions and Optimizations • Experimental Results
Experimental Evaluation Setup • Single server running PostgreSQL 9.4 over OS X 10.11.5 • 2.7 GHz Intel Core i7, 16 GB DDR3 RAM • Hashing - SHA1, pgcrypto module implementation • Client is on the same machine as the server
Experimental Evaluation Setup • Data: one table (test) with two integer columns (tid, val) and rows • tid is a primary key starting with • val is a permutation of where N=|test| • Queries: • pkey.simple: SELECT * FROM test WHERE tid>=l AND tid<=u • other.simple: SELECT * FROM test WHERE val>=l AND val<=u • join: SELECT * FROM test a, test b WHERE a.val=b.tid AND a.tid>=l AND a.tid<=u
Cost Savings (for pkey.simple) • [Figure: two panels, "Query Answer Cardinalities" and "Query Parameter Distribution"; annotations: x1.4, 100 times cheaper, 10 times cheaper.]
Single Coupons vs. Group Coupons (pkey.simple) • [Figure: two panels, "Refunds time" and "Query, pricing and coupons time".]
Summary • The paper shows a method to support history-aware pricing for data APIs. • A buyer is only charged once for each data item she purchases. • The buyer is responsible for tracking the data items and asking for refunds. • The paper shows a concise and tamper-proof protocol that is both optimal and safe. • Experimental evaluation shows that the method has a reasonably low time overhead while enabling significant cost savings for clients.