Corelab Seminar

Paris Koutris (University of Washington)
Query-Based Data Pricing

Data is increasingly being bought and sold online, and Web-based marketplace services have emerged to facilitate these activities.However, current mechanisms for pricing data are very simple: buyers can choose only from a set of explicit views, each with a specific price. In this paper, we propose a framework for pricing data on the Internet that, given the price of a few views, allows the price of any query to be derived automatically. We call this capability ``query-based pricing.'' We first identify two important properties that the pricing function must satisfy, called arbitrage-free and discount-free. Then, we prove that there exists a unique function that satisfies these properties and extends the seller's explicit prices to all queries. When both the views and the query are Unions of Conjunctive Queries, the complexity of computing the price is high. To ensure tractability, we restrict the explicit prices to be defined only on selection views (which is the common practice today). We give an algorithm with polynomial time data complexity for computing the price of any chain query by reducing the problem to network flow. Furthermore, we completely characterize the class of Conjunctive Queries without self-joins that have PTIME data complexity, and prove that pricing all other queries is NP-complete, thus establishing a dichotomy on the complexity of the pricing problem when all views are selection queries.

joint work with Prasang Upadhyaya, Magda Balazinska, Bill Howe and Dan Suciu