Query Processing Techniques for Compliance with Data Confidence Policies • Chenyun Dai (1), Dan Lin (2), Murat Kantarcioglu (3), Elisa Bertino (1), Ebru Celikel (3), and Bhavani Thuraisingham (3) • (1) Department of Computer Science, Purdue University • (2) Department of Computer Science, Missouri University of Science and Technology • (3) Department of Computer Science, The University of Texas at Dallas
Outline • Motivation • Policy Compliant Query Evaluation • Related Work • Algorithms • Performance Study • Conclusion and Future Work
Motivation • Improving data quality incurs costs • Verify a customer address • Verify the financial status • Different types of medical data • Obtaining accurate data is expensive • Required data quality depends on the purpose • Not critical: statistical summary • Critical: investment, evaluating effectiveness of treatment
Challenges • How to specify which tasks require high-confidence data? • How can we improve the confidence of the data to the desired level with minimum cost? • Which portion of the data should be selected for quality improvement?
System Framework • Four components • (1) Associate confidence values with data tuples [SDM’08] • (2) Compute the confidence of query results based on lineage [VLDB’04] • (3) Confidence policy* • (4) Find the optimal strategy for increasing the confidence level* • * proposed in this paper
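Components (2) and (3) can be illustrated with a minimal sketch. The slides do not spell out the policy format or the confidence formulas, so everything here is an assumption for illustration: the `ConfidencePolicy` fields, the function names, and the treatment of tuple confidences as independent probabilities (a result derived from all of its inputs multiplies their confidences; a result derivable from any one of several inputs combines them as a probabilistic OR).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidencePolicy:
    """Hypothetical policy shape: who may use data, for what, at what confidence."""
    role: str
    purpose: str
    min_confidence: float


def join_confidence(confidences):
    """Confidence of a result that needs ALL inputs (independence assumed)."""
    result = 1.0
    for c in confidences:
        result *= c
    return result


def union_confidence(confidences):
    """Confidence of a result derivable from ANY input (independence assumed)."""
    miss = 1.0
    for c in confidences:
        miss *= 1.0 - c
    return 1.0 - miss


def compliant_results(results, policy):
    """Keep only (row, confidence) pairs that meet the policy threshold."""
    return [(row, conf) for row, conf in results if conf >= policy.min_confidence]


if __name__ == "__main__":
    policy = ConfidencePolicy(role="analyst", purpose="investment", min_confidence=0.7)
    results = [
        ("row_a", join_confidence([0.9, 0.8])),   # derived from two joined tuples
        ("row_b", join_confidence([0.6, 0.9])),   # falls below the threshold
        ("row_c", union_confidence([0.5, 0.5])),  # two alternative derivations
    ]
    for row, conf in compliant_results(results, policy):
        print(row, round(conf, 2))
```

Results that fall below the threshold are the ones component (4) would target: rather than simply withholding them, the system looks for a cheap way to raise the confidence of the underlying tuples.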
Contributions • Propose the first systematic approach to data use based on confidence values of data items • Introduce the notion of confidence policy and confidence policy compliant query evaluation • Propose three algorithms to minimize the cost of adjusting confidence values of data • Carry out performance studies that demonstrate the efficiency of our system
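The cost-minimization problem named above can be made concrete with a small sketch. This is not one of the three algorithms from the paper, which these slides have not yet described; it is a plain greedy heuristic under stated assumptions: a result's confidence is the product of its base tuples' confidences, raising a tuple's confidence costs a fixed per-unit amount, and confidences are raised in fixed steps. The function names, the `step` parameter, and the cost model are all hypothetical.

```python
def product(values):
    """Multiply a sequence of floats."""
    out = 1.0
    for v in values:
        out *= v
    return out


def greedy_raise(confidences, unit_costs, threshold, step=0.1):
    """Raise tuple confidences until their product reaches the threshold.

    Each round bumps the tuple with the best confidence gain per unit cost.
    Returns (new_confidences, total_cost), or None if the threshold cannot be
    reached even with every tuple at confidence 1.0. Assumes unit_costs > 0.
    """
    conf = list(confidences)
    total_cost = 0.0
    while product(conf) < threshold:
        current = product(conf)
        best, best_ratio = None, -1.0
        for i, c in enumerate(conf):
            if c >= 1.0:
                continue  # this tuple cannot be improved further
            new_c = min(1.0, c + step)
            trial = conf[:i] + [new_c] + conf[i + 1:]
            gain = product(trial) - current
            cost = unit_costs[i] * (new_c - c)
            ratio = gain / cost
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            return None  # every tuple is already at 1.0
        new_c = min(1.0, conf[best] + step)
        total_cost += unit_costs[best] * (new_c - conf[best])
        conf[best] = new_c
    return conf, total_cost


if __name__ == "__main__":
    # Tuple 0 is cheap to verify, tuple 1 is expensive.
    print(greedy_raise([0.5, 0.9], unit_costs=[1.0, 10.0], threshold=0.6))
```

A greedy choice like this is easy to follow but is not guaranteed to find the minimum-cost solution, which is why the selection of tuples to improve is treated as an optimization problem in its own right.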
Related Work • Access control policies • RBAC • Lineage calculation • Trio [VLDB’06] • Provenance in e-science [SIGMOD Record’05] • Probabilistic data [TKDE’92] • Quality view [VLDB’06] • Specifies users’ quality requirements using views • Does not include a quality increment component