350 likes | 500 Views
Tame Your Unruly Data… With Constraints!. Rob Volk. Speaker Bio / Contact. SQL Server DBA/Developer since 1998 / 6.5 Started with dBase III & Clipper And of course MS Access Wireless telecom, utility billing, credit cards SQLTeam moderator & contributor
E N D
Speaker Bio / Contact • SQL Server DBA/Developer since 1998 / 6.5 • Started with dBase III & Clipper • And of course MS Access • Wireless telecom, utility billing, credit cards • SQLTeam moderator & contributor • Blog: http://weblogs.sqlteam.com/robv/ • Twitter: @sql_r • Email: sql_r@outlook.com • I do deranged things in SQL Server, so you don’t have to
What are Constraints? constraint |kənˈstrānt| noun a limitation or restriction : the availability of water is the main constraint on food production | time constraints make it impossible to do everything. • Constraints maintain the integrity of your database • Domain – what kind of data types/attributes • Entity – what kind of entities/objects/rows • Relational – control relations between entities • Constraints don’t allow good data, they prevent “bad” data • But be mindful of what your constraints allow • Let me tell you a story... (SalesOrderDetails)
Why Don’t We Use Constraints? • Developer ignorance • Not necessarily stupid, just didn’t know • Cross-database integrity not supported • Multi-platform support (or lack thereof) • Data cleansing as a career (e.g. Data Quality Services) • “The app does all the checking, we don’t need them” • Yeah, right. We’ll get to that later
What aren’t Constraints? • Identity columns • Application Logic • Triggers • Security • Development environment • Execution restrictions • Server Policies (PBM etc.) • XML Schema Collections (these are a good thing though) Still important, just not managed using Constraints Still important, just not managed using Constraints
Examples • February 31 • Incomplete phone numbers • Still invalid if padded or trimmed • Duplicate rows* • Sales to non-existent customer accounts • Did they get the merchandise? Did they pay us? WHO ARE THESE PEOPLE?
But really, how bad could it be? • Ariane Flight 501 (16 bit integer overflow) • Mars Climate Orbiter (metric vs. US) • USS Yorktown (1997, divide by zero) • F-22 (crashes, Intl. Date Line) & F-35 (entire fleet grounded) • Qantas Flight 72 (didn’t crash, nose dived twice, injuries) • Patriot Missile (clock drift, missed target, soldiers died) • Therac-25 (deaths) • Not data but process, lack of safeguards at every level • More: http://goo.gl/4Akqm
Domain Constraints • Domain – fancy name for type or attribute (column in SQL) • Data type - what kind of data you’re storing (number, date, etc.) • Nullability- whether it’s required or not • Default • Check - what values are valid and invalid • Demo (discovery script)
Domain Constraints - Types • First constraint: Data type • Choose the right type • VARCHAR(max) is a bad sign • Float or real -> Bad! • Money -> Better (but still problematic) • Dates (use proper date/time types) • Good data type = storage and performance efficiency • But don’t be TOO efficient Ariane 501 NVARCHAR(max) is overkill for a phone # Ariane 501
Domain Constraints – NOT NULL • Limit or eliminate nullable columns • Nulls are not comparable, even to themselves • Not applicable vs. unknown • Which one does Null represent? • Nulls cannot be typed properly (or at all) • Read Chris Date • In SQL Server, nulls have storage overhead
Domain Constraints - Default • Use this default when value is unknown or unsupplied • N/A vs. Unknown can be addressed with defined values • Can be useful in cascading foreign keys • SET DEFAULT supported in SQL Server 2005+
Domain Constraints - CHECK • Restrict values further than what data type allows • Dates • Numeric/Money • Character • Read Alex Kuznetsov
Entity Constraints • Entity – fancy name for row or record (tuple – even fancier) • Primary Key - value(s) that identify the entity • Unique - value(s) that must be unique • Table-level check - values that are valid or invalid • One or more domains/attributes/columns • Demo (discovery script)
Domain Constraints - External • Very often ignored, not all values are valid (special meaning) • Doesn’t matter if external system can’t provide invalid value • Read Joe Celko • Read Karen López (@DataChick) Data, Measurements and Standards in SQL
Domain Constraints - Internal • Very often ignored (and rarely documented) • $10,000 utility bills for single (unoccupied) apartment • Percentage values (tax rate, discount) • See AdventureWorks Sales tables (smallmoney) • App could handle, but then must also fix if wrong • Date range overlaps (Alex Kuznetsov)
Entity Constraints – Primary Key • Uniquely identifies the row/record/tuple/entity • Must provide a value (NOT NULL by definition) • Logical construct, NOT physical • Surrogate vs. natural key • Celko: if you don’t have a key, you don’t have a table • Demo (discovery script)
Entity Constraints - Unique • Candidate keys (natural keys) • Phone number is unique, but not good candidate for PK • Screen Actors Guild (billed/credited name) • SSN, Credit card # (unique but can’t be used for ID) • Multiple columns are unique • Constraint vs. index (logical vs. physical)
Entity Constraints - CHECK • Multi-column conditions • Pregnancy status (N/A for males or young girls) • Unless you're British...http://goo.gl/msSeP • ANSI/ISO supports subqueries • Single table only • e.g. Force date- or time-only values in datetime columns (< SQL 2008) • Demo (discovery script)
Relational Constraints • Foreign keys • Requires data in one table to reference another table’s data • Referenced table (parent) • Referencing table (child) • Can be one OR MORE columns • Referential actions on UPDATE/DELETE in referenced • Demo (discovery script)
Demos • AdventureWorks • Good (Birth and Hire dates) • Could be better (Salesperson commission pct.) • SalesOrderDetail
Performance • UNIQUE/PRIMARY keys can improve performance • Demo • Can also save space in index structures • CHECK and Foreign Key • Eliminate data access if WHERE clause violates CHECK constraint • Must be trusted (next slides)
Trusted Constraints • ALTER TABLE…CHECK/NOCHECK CONSTRAINT • CHECK WITH CHECK… • Performance improvement • Demo • Partitioned views
Untrusted Constraints • Constraint could be enabled but data could be invalid • Performance hit • Demo • How to detect: • sys.check_constraints.is_not_trusted = 1 • sys.foreign_keys.is_not_trusted= 1 • Primary key and Unique are always trusted
Considerations • Be wary of “meets our current needs” • Be wary of “will also meet all possible future needs” • Aaron Bertrand “Bad Habits to Kick” blog • If there is a standard defined for your industry, use it (Celko – ISO) • Sanity check, make sure you covered everything • If you must deviate from the standard, document why • Multi-platform support • Doomed • Least-feature support = baseline • Constraints (if any) go into app logic • Use DBCC CHECKCONSTRAINTS! (thanks to Neil Hambly for reminding me!) • PASS Data Architecture Virtual Chapter! June 21 2012: http://dataarch.sqlpass.org/
DOMAIN Support • ANSI/ISO • Combines data type, nullability, default and checks in one object • Not available in SQL Server • Please vote on Connect: http://goo.gl/HbRo2 • PostgreSQL support! Example: CREATE DOMAINus_postal_codeAS varchar(10) NOT NULL DEFAULT('00000') CHECK( VALUE ~ '^\\d{5}$' ORVALUE ~ '^\\d{5}-\\d{4}$' ); • Can be simulated with user-defined types and rules…
Rules • Deprecated vendor extension (Celko gripe) • Since 7.0, still supported in SQL 2012 • Not part of formal definition of object • Must reference data value and have condition on it • No “anonymous” rules like with constraints • Only one rule can bind to an object • But that rule can have multiple conditions • And rule can bind to multiple objects • Which is too bad, because…
Unusual Constraint Use • Security • Prevent sa or sysadmin updates/inserts • Execution • Ad-hoc updates/inserts • Cursors • Connection settings like ANSI_PADDING • Time-of-day • Production vs. Dev/QA/UAT environments (thanks Argenis!) • See blog post: http://goo.gl/p23NL • Unusual data restrictions
Can you have too many? • IMHO, no • What about performance? • Layers? • App logic + stored procedure + triggers + constraints + rules • Don’t assume a single layer can do all the work • What’s the cost of fixing bad data? • And will it actually get fixed? • Importing/migrating bad data or design • Cruft from old systems • ETL – transform bad data, but don’t constrain it in destination Therac-25 Never time to do it right, Always time to do it over -Ron Soukup, SQL Server PM Ariane 501
But… • IF: • Customers don’t care • Boss doesn’t care • Devs/$$$ Contractor say app will cover • Make them personally fix any bad data • Yes, that includes customers • Have contractor refund fee AND fix bad data at no cost • If they don’t agree or can’t be convinced: • Add constraintsanyway • Unit testing An ounce of prevention is worth a metric ton of cure
But…but… • Now I’m just having fun… • Devs say Application does everything… • And DB constraints will make sure it works • Then why use SQL Server? Write your own storage engine. Or use NoSQL. The data will be right…eventually. • In 5 years we’ll rewrite your app in Ocaml, or Scala, or Blub • Moral: you’re using SQL Server anyway, use what it offers • If app actually works correctly, constraints won’t interfere • Even if you switch RDBMS, USE CONSTRAINTS! Credit: Andy Leonard & Buck Woody
And if you think I’m strict… • SQLite testing procedure: http://goo.gl/GfvIH • 1177x as much test code as program code • Space Shuttle Software Development: http://goo.gl/1k2kt • 11 versions * 400K+ lines each = 17 errors total • F-35 (JSF) Software coding standard: http://goo.gl/4rPxu • 141 pages on C++ programming guidelines • Which is really ironic… • Mars Spacecraft (Curiosity, Opportunity/Spirit, Phoenix) • Extremely limited resources (20 MHz/10 MB RAM)
Conclusion • Types of constraints • Domain, Entity, Relational • Why you need them • Process, procedure and environmental • Apps and triggers can’t reliably protect or enforce • New apps/processes must duplicate all logic • Assume layers/components/processes will fail or change • Protect data and DB integrity with constraints • Constraints document integrity rules • Use them! And be sure to check on them (DBCC)
References • Books Online! • MSDN • Books • Beginning SQL Server 2008 for Developers: From Novice to Professional by Robin Dewson (Apress) • Pro SQL Server 2008 Relational Database Design and Implementationby Louis Davidson, with Kevin Kline, Scott Klein, Kurt Windisch(Apress) • A Developer’s Guide to Data Modeling for SQL Server, by Eric Johnson and Joshua Jones (Addison-Wesley) • Defensive Database Programming by Alex Kuznetsov (Red Gate Books, PDF link below) • SQL Antipatterns: Avoiding the Pitfalls of Database Programming by Bill Karwin (Pragmatic Bookshelf) • SQL and Relational Theory: How to Write Accurate SQL Code by C. J. Date(O’Reilly) • Blogs/Sites • http://weblogs.sqlteam.com/ • http://sqlblog.com/ (lots of great people) • http://sqlskills.com/ (Paul Randal, Kimberly Tripp, Jonathan Kehayias, Bob Beuchemin, Erin Stellato, Joe Sack, Glenn Berry) • http://tsql.solidq.com/ (Itzik Ben-Gan) • http://www.simple-talk.com/sql/database-administration/constraints-and-the-test-driven-database/ • http://www.simple-talk.com/books/sql-books/defensive-database-programming/ • http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/04/28/using-rowversion-to-enforce-business-rules.aspx • http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/03/08/storing-intervals-of-time-with-no-overlaps.aspx • http://sqlblog.com/blogs/hugo_kornelis/archive/2007/03/29/can-you-trust-your-constraints.aspx • http://goo.gl/xCwdK (Datachix - Trusted Constraints) • More Books • Microsoft SQL Server 2008 Internals by Kalen Delaney, Paul S. Randal, Kimberly L. Tripp, Conor Cunningham, Adam Machanic (MSPress) • SQL Server MVP Deep Dives by SQL Server MVPs for War Child International (Manning) • SQL Server MVP Deep Dives 2 by SQL Server MVPs for Operation Smile (Manning) • Microsoft SQL Server 2008: T-SQL Fundamentalsby Itzik Ben-Gan (MSPress) • SQL for Smartiesby Joe Celko (Morgan Kauffman)
Coming up… • #SQLBITS
Thanks! SQLBitsSponsors, Organizers & Volunteers!