School of Information Technology and Electrical Engineering. Data Quality Aware Query Systems. Naiem Khodabandehloo Yeganeh. Supervised by: Dr. Shazia Sadiq; Co-supervisor: Prof. Xiaofang Zhue. At VLDB 2010 PhD Workshop.
Data Quality Aware Query Systems
My goal is to answer a query like the following and maximize user satisfaction:
SELECT TOP 10 Name, Job Title, Phone No, Address, Last Tax Paid, COUNT(Publications) FROM (Anywhere) WHERE Name = 'Naiem' AND School = 'UQ' ORDER BY DataQuality (as defined visually)
[Figure labels: Network; quality dimensions Accurate, Current, Consistent, Complete]
Framework & Assumptions • The database schemas of all data sources (metadata M) are known, and a federated view over all of them exists. • Data sources generate their own DQ profiles, because they know their own data best (e.g., an England-based data source and an Australian-based data source have different rules for measuring the accuracy of Address).
[Figure: Quality Aware Queries reach organizations Org 1, Org 2, ..., Org n over a Communication Network; each organization has a database (DB), metadata (M), and DQ Profiling]
Challenges
• Capturing user preferences on data quality: Data Quality Aware Query Language
• (Pre-)estimating the quality of a query result: Data Quality Profiling
• Responding to the (TOP k) query efficiently: Data Quality Aware Query Planning
Data Quality Aware Query Language • Goal: capture user preferences • Preferences are modelled as partial orders, e.g. "I like tea more than coffee" • User: "I have the following preference matrix about data quality" [Figure: preference matrix]
Data Quality Aware Query Language • We defined an extension to SQL that captures user preferences on data quality • We developed a visual user interface for capturing these preferences • We developed methods that detect inconsistencies in user preferences and report them through visual feedback
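The slides do not say how inconsistencies are detected. As a minimal sketch, assume each preference is a pair (better, worse) over quality dimensions and that a set of preferences is inconsistent when it contains a cycle, since a cycle cannot be part of a partial order. The function name and the sample dimensions are hypothetical.

```python
from collections import defaultdict


def find_preference_cycle(preferences):
    """Return one cycle as a list of items, or None if no cycle exists.

    preferences: iterable of (better, worse) pairs (assumed representation).
    """
    graph = defaultdict(list)
    for better, worse in preferences:
        graph[better].append(worse)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    colour = defaultdict(int)      # every node starts as WHITE
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                # nxt is already on the current path, so we closed a cycle
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical preferences: the third pair contradicts the first two.
prefs = [
    ("accuracy", "currency"),
    ("currency", "completeness"),
    ("completeness", "accuracy"),
]
print(find_preference_cycle(prefs))
```

Returning the cycle itself, rather than a yes/no answer, gives a user interface something concrete to highlight when asking the user to revise a preference.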
Data Quality Aware Profiling • Traditional DQ profiling: - DQ scores are assigned to a source or a schema object. - It cannot estimate the quality of query results. Example: the quality of information about Apple products on a Microsoft website may not be good, even if the website has high-quality data in general.
Data Quality Aware Profiling • We developed a new profiling method, Conditional DQ Profiling, to estimate the quality of the results of a query. • The profile should cover any possible WHERE clause, e.g. WHERE Name = 'Naiem' AND School = 'UQ'
Data Quality Aware Profiling • Example: a table with data about digital cameras. Brand: C = Canon, S = Sony. Model: S = SLR, N = Normal. Price: H = High, L = Low.
Data Quality Aware Profiling
[Figure: Conditional DQ Profile]
[Figure: Reduced Conditional DQ Profile with two thresholds (minimum set = 2 and accuracy = 20%)]
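The profile tables themselves did not survive in this transcript, so the following is only a sketch of the idea under stated assumptions. It assumes each row carries a boolean accuracy flag, that the profile stores the row count and accuracy for every combination of attribute = value conditions, that the "minimum set" threshold drops conditions matched by too few rows, and that the accuracy threshold drops conditions whose accuracy differs from the table-wide accuracy by less than the threshold. All function names and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def conditional_profile(rows, attributes, flag="accurate"):
    """Map each tuple of (attribute, value) conditions to (row count, accuracy)."""
    counts = defaultdict(lambda: [0, 0])  # conditions -> [rows, accurate rows]
    for row in rows:
        for size in range(len(attributes) + 1):
            for attrs in combinations(attributes, size):
                key = tuple((a, row[a]) for a in attrs)
                counts[key][0] += 1
                counts[key][1] += 1 if row[flag] else 0
    return {key: (n, good / n) for key, (n, good) in counts.items()}


def reduce_profile(profile, min_set=2, min_diff=0.20):
    """Keep the table-wide entry plus the conditions that pass both thresholds."""
    overall = profile[()][1]
    reduced = {(): profile[()]}
    for key, (n, acc) in profile.items():
        if key and n >= min_set and abs(acc - overall) >= min_diff:
            reduced[key] = (n, acc)
    return reduced


def estimate_accuracy(profile, conditions, attributes):
    """Estimate accuracy for a WHERE clause from the most specific stored entry."""
    items = [(a, conditions[a]) for a in attributes if a in conditions]
    for size in range(len(items), -1, -1):
        for subset in combinations(items, size):
            if subset in profile:
                return profile[subset][1]
    return None


# Made-up rows using the camera encoding from the example slide.
ATTRS = ["brand", "model", "price"]
rows = [
    {"brand": "C", "model": "S", "price": "H", "accurate": True},
    {"brand": "C", "model": "N", "price": "L", "accurate": True},
    {"brand": "S", "model": "S", "price": "H", "accurate": False},
    {"brand": "S", "model": "N", "price": "L", "accurate": False},
]
full = conditional_profile(rows, ATTRS)
small = reduce_profile(full, min_set=2, min_diff=0.20)
print(len(full), len(small))
print(estimate_accuracy(small, {"brand": "C", "model": "S"}, ATTRS))
```

In this toy data only the brand conditions say anything beyond the table-wide accuracy, so the reduced profile keeps them and falls back to a less specific entry for everything else.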
Data Quality Aware Profiling • Effect of thresholds on the size of the Conditional DQ Profile
[Figure: results for two databases: PPM (Power Plant Meters database) and DBLP (DBLP publications database)]
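Data Quality Aware Query Planning • Possible join plans for: SELECT * FROM A JOIN B JOIN C JOIN D ON ...
[Figure: a Querying Interface over tables A, B, C, D, each with several candidate sources (S1, S3, S4, S5, S9, ..., Sn), connected through a Communication Infrastructure to sources S1, S2, S3, ..., Sn]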
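The slides do not describe the planner, so this is only an illustration of the search space. It assumes that each table can be fetched from several sources, that a plan picks one source per table, and that a plan's quality is the product of per-source quality estimates (an independence assumption). The function name, source names, and scores are hypothetical.

```python
from heapq import nlargest
from itertools import product
from math import prod


def top_k_plans(candidates, k):
    """Return the k best (quality, plan) pairs, best first.

    candidates: {table: {source: estimated quality in [0, 1]}} (assumed shape).
    """
    tables = sorted(candidates)
    plans = []
    # One plan per way of choosing a single source for every table.
    for choice in product(*(candidates[t].items() for t in tables)):
        quality = prod(q for _, q in choice)
        plan = {t: source for t, (source, _) in zip(tables, choice)}
        plans.append((quality, plan))
    return nlargest(k, plans, key=lambda p: p[0])


# Made-up quality estimates for two of the joined tables.
candidates = {
    "A": {"S1": 0.9, "S3": 0.6},
    "B": {"S2": 0.8, "S5": 0.7},
}
for quality, plan in top_k_plans(candidates, 2):
    print(round(quality, 2), plan)
```

Exhaustive enumeration grows exponentially with the number of tables, which is why answering a TOP k query efficiently is listed as a challenge; a real planner would need to prune this space.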
I would love to get your feedback • Questions?