290 likes | 461 Views
Introduction to Data quality services. Presentation by Tim Mitchell (Artis Consulting) www.TimMitchell.net. Today’s Agenda. Overview of DQS Structure Knowledge Base DQS Project Operations Matching Cleansing Administration SSIS Component Shortcomings. About the Presenter.
E N D
Introduction to Data quality services Presentation by Tim Mitchell (Artis Consulting) www.TimMitchell.net
Today’s Agenda • Overview of DQS • Structure • Knowledge Base • DQS Project • Operations • Matching • Cleansing • Administration • SSIS Component • Shortcomings
About the Presenter Tim Mitchell • BI Consultant, Artis Consulting • North Texas SQL Server User Group • SQL Server MVP • Contributing author, MVP Deep Dives Vol 2 • Coauthor, SSIS Design Patterns • TimMitchell.net | twitter.com/Tim_Mitchell
Housekeeping • Questions • Surveys
What is DQS? • DQS is a knowledge driven data cleansing and matching services • Built on top of SQL Server 2012 • Simple yet powerful interface
What is DQS? • Replaces manual data quality work you’re already doing • Stored procedures • Triggers • Custom applications
DQS Structure and Flow Knowledge Base Cleansing Project Matching Project Domains Matching Policies Cleansing Project Matching Project Composite Domains Cleansing Project
Knowledge Base • Starting point for data quality provisioning • Uses locally customized data stores or marketplace data sources • Highly reusable and evolutionary • Key elements: • Domains • Matching policies
Knowledge Base • Create by: • Knowledge discovery • Domain management • Matching rule
Domains • Domain = data field • Domain rules • Composite domains • Allows greater flexibility in domain rules
Data Quality Project • Create interactive projects for data matching and cleansing • Leverage one or more domains in an existing knowledge base • Somewhat reusable
Data Quality Project • Nondestructive – no changes to source of data to be cleansed • No changes to the KB either • Separately, DQS project data can be used to improve the knowledge base
DQS Operations • Cleansing • Process data against known entities and domain rules • Similar to Fuzzy Lookup transform in SSIS • Matching • Group data together • Similar to Fuzzy Grouping transform in SSIS
DQS Administration • Monitor past activity • Set logging options • Set confidence thresholds
DQS and SSIS • SQL Server Integration Services has integrated hook into DQS • DQS Cleansing Component • Provide automated, noninteractive data cleansingoperations
Shortcomings • V1 product • No API – must use DQS client interactively • SSIS component only does cleansing
Final Thoughts • CU1 performance improvements • http://bit.ly/IKmMow • DQS videos/ blogs • http://technet.microsoft.com/en-us/sqlserver/hh780961 • My blog (www.TimMitchell.net) • DQS/MDS virtual chapter • masterdata.sqlpass.org