A Model and Algorithms for Pricing Queries
Tang Ruiming, Wu Huayu, Bao Zhifeng, Stephane Bressan, Patrick Valduriez
Overview Aggdata
Overview Windows Azure Marketplace
Motivation and existing work
• People may want to buy data by asking queries.
• As Koutris et al. point out [Koutris et al., 2012], current pricing schemes have limitations:
  • They assign prices to entire datasets.
  • They assign prices to predefined views, and consumers are restricted to these views.
  • They may lead to arbitrage. E.g., ten accounts that each offer 10 free applications can be combined to get 100 applications.
• The frameworks of [Koutris et al., 2012], [Koutris et al., 2013] and [Li et al., 2012]:
  • Assign prices to predefined views.
  • Define the price of a query as the price of the cheapest set of predefined views that determines the query (NP-hard).
Framework: provenance
• In our framework:
  • Prices are assigned to individual tuples.
  • For a query, we track the source tuples contributing to the query result.
  • Each contributing source tuple is charged only once, no matter how many times it contributes. This reflects the nature of information goods [Balazinska et al., 2011].
Minimal provenance
• (Provenance) Let Q be a query and D a database, and let Q(D) be the query result. A provenance of Q(D) is a set of tuples L in D such that [formula not preserved].
• (Minimal provenance) A minimal provenance of Q(D) is a provenance L of Q(D) such that no proper subset L' of L is itself a provenance of Q(D).
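
To make the minimality condition concrete, here is a minimal Python sketch. It assumes the candidate provenances are already known and given as sets of tuple identifiers, and it reads "minimal" as "no proper subset is also a provenance". The function name `minimal_provenances` and the identifiers `t1` to `t4` are hypothetical, chosen only for illustration.

```python
from typing import FrozenSet, Iterable, List


def minimal_provenances(provenances: Iterable[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Keep only the provenances that have no proper subset among the candidates.

    Assumption: `provenances` lists every provenance we care about, so
    checking against the other candidates is enough to decide minimality.
    """
    candidates = set(provenances)
    minimal = [L for L in candidates if not any(other < L for other in candidates)]
    # Sort by the sorted tuple identifiers to get a deterministic order.
    return sorted(minimal, key=sorted)


# Hypothetical candidates: {t1, t2, t3} is not minimal because {t1, t2} suffices.
provs = [
    frozenset({"t1", "t2"}),
    frozenset({"t1", "t2", "t3"}),
    frozenset({"t4"}),
]
result = minimal_provenances(provs)
print(result)
```

The `<` operator on Python sets tests for a proper subset, which matches the condition on L' above.
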
Pricing function
• A price setting function maps each tuple in the database to its price.
• A pricing function takes a query as input and returns its price.
• Properties of the pricing function:
  • Contribution monotonicity: if a query uses fewer source tuples than another query, the price of the first query should be lower.
  • Contribution arbitrage-freedom: if a query uses fewer source tuples than a set of queries, the price of the first query should be lower than the total price of the set of queries.
  • Bounded price: the price of a query is never higher than the total price of the source tuples in the relations involved in the query.
Pricing function
• The price of a query Q in a database D is defined as the price of the cheapest minimal provenance of Q(D), where the price of a provenance L is its p-norm (taken over the prices of the tuples in L).
• Increasing p decreases the p-norm value, so the data seller can use p to adjust prices for different categories of data consumers.
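
One way to write this definition out is sketched below. The symbols are assumptions, not the slide's own notation: s(t) stands for the price that the price setting function assigns to tuple t, M(Q, D) for the set of minimal provenances of Q(D), and pr_D(Q) for the price of Q in D.

```latex
\[
  \mathrm{pr}_D(Q) \;=\; \min_{L \in \mathcal{M}(Q, D)} \lVert L \rVert_p ,
  \qquad
  \lVert L \rVert_p \;=\; \Bigl( \sum_{t \in L} s(t)^p \Bigr)^{1/p},
  \quad p \ge 1 .
\]
% Worked example with two tuples priced 3 and 4:
%   p = 1:  3 + 4 = 7
%   p = 2:  \sqrt{3^2 + 4^2} = 5
```

In the worked example the same provenance costs 7 under p = 1 and 5 under p = 2, which illustrates how a larger p lowers the price.
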
Algorithms for price computation
• We assume that, for each result tuple, its set of minimal provenances is available.
• We aim to find the cheapest minimal provenance of the set of result tuples.
• We prove that this problem is NP-hard.
• Exact algorithm:
  • Enumerate all provenances of the query result (there are exponentially many).
  • Choose the cheapest one.
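
The exact algorithm can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: a provenance of the whole result is built by picking one minimal provenance per result tuple and taking the union, prices are combined with a p-norm, and the names `p_norm`, `exact_price`, `price` and `per_result` are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

Provenance = FrozenSet[str]


def p_norm(tuples: Provenance, price: Dict[str, float], p: float = 1.0) -> float:
    """p-norm of the prices of the given source tuples (each tuple counted once)."""
    return sum(price[t] ** p for t in tuples) ** (1.0 / p)


def exact_price(
    per_result: List[List[Provenance]], price: Dict[str, float], p: float = 1.0
) -> Tuple[float, Provenance]:
    """Brute force: try every combination of one minimal provenance per result tuple.

    The number of combinations is the product of the list lengths, so the
    running time is exponential in the number of result tuples.
    """
    best_cost, best_set = float("inf"), frozenset()
    for choice in product(*per_result):
        bought = frozenset().union(*choice)  # shared tuples are charged once
        cost = p_norm(bought, price, p)
        if cost < best_cost:
            best_cost, best_set = cost, bought
    return best_cost, best_set


# Hypothetical instance: two result tuples, each with two minimal provenances.
price = {"a": 4, "b": 3, "c": 3}
per_result = [
    [frozenset({"a"}), frozenset({"b"})],
    [frozenset({"a"}), frozenset({"c"})],
]
cost, chosen = exact_price(per_result, price)
print(cost, sorted(chosen))
```

Here buying tuple `a` alone covers both result tuples for a price of 4, whereas picking the individually cheapest provenances (`b` and `c`) would cost 6.
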
Approximation algorithms
• We devise several approximation algorithms.
• Worst case: Khanna et al. prove that the approximability of this problem is a factor polynomial in the size of the input [Khanna et al., 2000].
Approximation algorithms
• Heuristic 1: for each result tuple independently, choose its cheapest minimal provenance (greedy).
• Heuristic 2: for each result tuple independently, choose its minimal provenance with the lowest average price (greedy).
• Heuristic 3: as Heuristic 1, but taking previous choices into account (semi-greedy).
• Heuristic 4: as Heuristic 2, but taking previous choices into account (semi-greedy).
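
The four heuristics can be sketched with a single Python function and two switches. Several details here are assumptions made for illustration: "average price" is read as the price of a provenance divided by its number of tuples, "consider previous choices" is read as not charging again for tuples that an earlier choice already bought, and the names `heuristic_price`, `average` and `reuse` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Provenance = FrozenSet[str]


def p_norm(tuples, price: Dict[str, float], p: float = 1.0) -> float:
    """p-norm of the prices of the given source tuples."""
    return sum(price[t] ** p for t in tuples) ** (1.0 / p)


def heuristic_price(
    per_result: List[List[Provenance]],
    price: Dict[str, float],
    p: float = 1.0,
    average: bool = False,  # False: Heuristics 1/3, True: Heuristics 2/4
    reuse: bool = False,    # False: Heuristics 1/2, True: Heuristics 3/4
) -> Tuple[float, Provenance]:
    """Pick one minimal provenance per result tuple, in order, by a local score."""
    bought: Set[str] = set()
    for candidates in per_result:
        def score(L: Provenance) -> float:
            # With reuse, tuples bought by earlier choices cost nothing more.
            to_pay = L - bought if reuse else L
            cost = p_norm(to_pay, price, p)
            return cost / len(L) if average else cost

        bought |= min(candidates, key=score)
    return p_norm(bought, price, p), frozenset(bought)


# Hypothetical instance: the second result tuple can reuse {a, b}.
price = {"a": 2, "b": 2, "c": 3}
per_result = [
    [frozenset({"a", "b"})],
    [frozenset({"a", "b"}), frozenset({"c"})],
]
h1 = heuristic_price(per_result, price)
h2 = heuristic_price(per_result, price, average=True)
h3 = heuristic_price(per_result, price, reuse=True)
h4 = heuristic_price(per_result, price, average=True, reuse=True)
print(h1[0], h2[0], h3[0], h4[0])
```

On this instance Heuristic 1 buys `c` in addition to `a` and `b` (total 7), while the other three variants reuse `{a, b}` and pay 4, which is also the exact price.
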
Experiments
• Effectiveness: the ratio between the approximate price and the exact price.
• Efficiency: the running time of the approximation algorithms.
• Setup:
  • The number of result tuples is 10 for measuring effectiveness (the ratio in the worst case is 10).
  • The number of result tuples varies from 1,000 to 5,000 for measuring efficiency.
  • For each result tuple, the number of minimal provenances and the size of each minimal provenance are sampled uniformly from [1,5].
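
A small Python sketch of the effectiveness measurement is given below. It is not the authors' experimental code. The assumptions are: tuple prices are random integers from 1 to 100, there is a pool of 20 source tuples, prices are summed (p = 1), only the first greedy heuristic is measured, and 5 result tuples are used instead of 10 so that the brute-force exact price stays fast. The random sets are not checked for mutual minimality. All function names are hypothetical.

```python
import random
from itertools import product


def random_instance(n_results, n_source=20, rng=None):
    """Sample provenance counts and sizes uniformly from [1, 5] per result tuple."""
    rng = rng or random.Random()
    source = [f"t{i}" for i in range(n_source)]
    price = {t: rng.randint(1, 100) for t in source}  # assumed price range
    per_result = []
    for _ in range(n_results):
        n_provenances = rng.randint(1, 5)
        per_result.append(
            [frozenset(rng.sample(source, rng.randint(1, 5))) for _ in range(n_provenances)]
        )
    return per_result, price


def total(tuples, price):
    """Price of a set of source tuples with p = 1 (plain sum)."""
    return sum(price[t] for t in tuples)


def exact(per_result, price):
    """Exact price by enumerating one provenance per result tuple."""
    return min(total(frozenset().union(*c), price) for c in product(*per_result))


def heuristic1(per_result, price):
    """Greedy: cheapest provenance for each result tuple, chosen independently."""
    bought = set()
    for candidates in per_result:
        bought |= min(candidates, key=lambda L: total(L, price))
    return total(bought, price)


rng = random.Random(0)
n_results = 5
ratios = []
for _ in range(200):
    per_result, price = random_instance(n_results, rng=rng)
    ratios.append(heuristic1(per_result, price) / exact(per_result, price))
print(f"worst ratio over {len(ratios)} runs: {max(ratios):.3f}")
```

With p = 1 the ratio of this greedy heuristic lies between 1 and the number of result tuples, which matches the worst-case ratio of 10 quoted for 10 result tuples.
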
Effectiveness (50,000 runs) [figure not preserved]
Conclusion
• We propose a framework for pricing queries based on the source tuples contributing to the query result.
• The price of a query is the price of the cheapest minimal provenance of the query result.
• We propose a baseline algorithm to compute the exact price of a query and four heuristics to compute an approximate price.
• We conduct experiments to show the effectiveness and efficiency of the heuristics.