270 likes | 521 Views
A special thanks to all our sponsors!. PLATINUM SPONSORS. GOLD SPONSORS. SILVER SPONSORS. SPONSORS. PRIZE & GIVEAWAY SPONSORS. SQL & RDBMS Antipatterns. Nate West nwest@cardinalsolutions.com.
E N D
A special thanks to all our sponsors! PLATINUM SPONSORS GOLD SPONSORS SILVER SPONSORS SPONSORS PRIZE & GIVEAWAY SPONSORS
SQL & RDBMSAntipatterns Nate West nwest@cardinalsolutions.com
Antipatterns are techniques used to solve problems that end up leading to other, sometimes worse, problems. Please don’t take offense – many of these are learned “the hard way.” These are general rules of thumb, not “never do this ever” rules.
Query Antipatterns We made too many wrong mistakes. –Yogi Berra
SELECT * Demo • Network (and maybe Disk) I/O • App only needs 3 columns. Select * returns all of them. • Select only what you need – maybe there’s a covering index that already has the data. • Change Management • INSERT INTO table SELECT * FROM … • Views, Stored Procs, and App code don’t update on their own! • Always be specific!
NULL makes everything unclear • Adding or Concatenating vales to NULL • SELECT NULL + 1 -> NULL • SELECT ‘Hello ‘ + NULL -> NULL • 3-Valued Logic: • WHERE column = NULL -> unknown • WHERE column <> NULL -> unknown • WHERE NULL = NULL -> unknown • WHERE NOT (column = NULL) -> unknown
Demo • Don’t pick a value to represent NULL • UPDATE tran_discount SET amount = -9999 WHERE amount IS NULL • SELECT sum(*) FROM tran_discountWHERE amount <> -9999 • Use the COALESCE() function to turn NULLS into meaningful values • Learn how NULL works, use it sparingly. • If you can, stop using NULLs completely, and instead decompose your tables. • See “SQL and Relational Theory, 2nd Edition (Appendix C) by C.J. Date
Non-Sargable Predicates Demo Sargable: Search ARGument-able – predicate can be used by indexes. • Date Functions on a small subset • WHERE YEAR(date) = 2012 AND MONTH(date) = 3 • ISNULL/NVL or COALESCE • SELECT .. WHERE ISNULL(product_code, ‘U’) = ‘U’ OR COALESCE(product_code, ‘U’) = ‘U’ • Substring when a Leading LIKE will do • SELECT … WHERE SUBSTRING(ZipCode, 1, 3) = ‘280’
WHERE are your JOINs? • Deprecated Outer Join syntax • SQL Server *= & =*, Oracle (+) • Comma joins: • SELECT a, b FROM t1, t2 • Easy to accidently cause a Cartesian product • Use ANSI JOIN syntax • Use JOIN for join criteria. • Use WHERE for filter criteria.
Date/Time Acrobatics Demo • Don’t do complex computations on dates • What happens when the company holidays change? • Complexity grows very quickly • Example: Get records that fall on the first Friday of each month for this year.WHERE DATEPART(‘dw’, dt) = 6 AND DATEPART(‘yyyy’, dt) = @my_yearAND …figure out the first fridays of each month…. • Instead • Build a Calendar Table • SELECT * from T JOIN CALENDAR C ON (T.dt = CALENDAR.dt) WHERE C.DayName = ‘Friday’ AND C.DayInMonthRank = 1 AND C.Y = @my_year
Database DESIGN Antipatterns Experience isn’t hereditary. It isn’t even contagious. - Dr. John Gall, The Systems Bible
Most of the following antipatterns start out when we think: “The only thing that will interact with this database is our application!” Web Client Web Client TODAY A YEAR FROM NOW… Application Server Application Server File export to Vendor Custom Reports (SSRS, etc) Database Database Master Data / Upstream Applications Enterprise Data Warehouse Bob from Accounting’s MS Excel Connection Downstream Applications
Untyped & Incorrectly Typed Data Demo • Strings (varchar) everywhere! • DATETIME when DATE is what you want • Oracle – Create a Function-Based Index on TRUNC(date_col) • MONEY is a shifted integer, not a decimal • FLOAT is *approximate* • Use BIT carefully • Doesn’t Aggregate • Don’t use it for properties that don’t truly represent a True/False (Don’t make it NULL) • VARCHAR for names (use NVARCHAR – Unicode!) • Using TIME as an interval… • Use ANSI INTERVAL data types when you can (Oracle) • Otherwise use two columns, StartDateTime and EndDateTime.
Unconstrained Data • “We don’t use keys, the application maintains everything.” • No Primary Keys! • Well maybe, but its an ever-increasing IDENTITY so we have a bunch of duplicate records from when our vendor interface accidently ran twice. • No Foreign Keys • No CHECK Constraints • No UNIQUE Constraints • Leads to: • Mistakes • That one-time update by IT Support? • Race Conditions • That custom code you wrote? Did you remember to lock everywhere you had to? • Bad query plans (the optimizer uses constraints!) • Bad data quality • Angry business users • “My error message says “Null Reference Exception…”
Multi-Valued Attributes Demo • Delimiter(comma)-Separated Lists, stored in a VARCHAR Field • Can’t Index individual items • Limited Length to the VARCHAR • Can’t add constraints for validation. • What if the delimiter shows up in your data? • Difficult to group / order by • Multiple columns that relate to the same thing • How do you aggregate these? • Searching across multiple columns is tough • Create a lookup / relationship tableinstead
Database Independence “We want our application to be database-independent.” Translation: • We only want to use a tiny subset of database features • We want to show off our skills, built in functionality is boring • We want to write a bunch of transaction locking code in our application • We want to replace our database-specific code with ORM-specific code • We want to make adding data to the database as hard as possible • Nested service/web service calls and context switches • Pass through several layers of object abstraction • 40,000 rows/sec data load times? Good luck getting 5 rows/sec without scaling hardware up to ridiculous proportions • We want to ensure that there is a bunch of tribal knowledge in our developers’ heads (job security?) “The truth is that, with the exception of trivial applications, achieving database independence is not only extremely hard, but is also extremely costly and consumes a huge amount of resources … your application would not use SQL beyond simple "keyed reads" … in short, you would end up writing your own database.“ – Tom Kyte, “Effective Oracle by Design”
Polymorphic Associations 1 INDIVIDUALS M ACCOUNTS ORGANIZATIONS 1 A polymorphic association when a field in a table can reference a key from *multiple* tables. Polymorphic associations are *backward*!
1 0..M 0..M INDIVIDUAL ACCOUNTS INDIVIDUALS 1 ACCOUNTS 1 0..M 1 ORGANIZATION ACCOUNTS ORGANIZATION 0..M • Create a Common Super-Table 0..1 INDIVIDUALS 1 ACCOUNTS ACCOUNT HOLDERS 0..M 1 1 0..1 ORGANIZATION Demo Reverse the relationship
XML & Blobbed Data “XML is like Violence, if it doesn't solve the problem, use some more.” • One table to rule them all: • CREATE TABLE objects (object_idbigint primary key, data xml); • Or even worse – where data is a serialized object… • CREATE TABLE objects (object_idbigint primary key, data varbinary(max)); • Sure, you can write queries with XML columns: • SELECT ObjectId, data.value('(/Payment/RequestedAmount)[1]', 'decimal(8,2)') , data.value('(/Payment/ApprovedAmount)[1]', 'decimal(8,2)‘), data.value('(/Payment/PaymentDate)[1]', 'date') FROM dbo.T1 WHERE data.exist('/Provider/Type[text() = “Ophthalmologist"]') = 1 • But what if I want to do more than a simple query? • Summarizing and grouping become incredibly difficult. • Especially when the data has multiple elements of the same type
Entity-Attribute-Value “Rather, it refers to a mistake commonly made in database practice of focusing upfront exclusively on structure--here, ability to change the schema frequently--but at the cost of prohibitive data integrity and manipulation burden.” – F. Pascal “You do not need to actually design anything!” – Tom Kyte
OBJECTS object_id(PK) INT object_typeVARCHAR OBJECT_ATTRIBUTE_VALUES object_id (FK) INT attribute_id INT attribute_value VARCHAR ATTRIBUTES attribute_id INT attribute_name VARCHAR attribute_type VARCHAR
But it makes CRUD easy! • And it makes *everything* else much more difficult • Find me all people not named Tom Jones who live in Las Vegas. SELECT pan.attribute_nameas person_name,FROM OBJECTS p LEFT JOIN OBJECT_ATTRIBUTE_VALUES poa ON (p.object_id = poa.object_id) JOIN ATTRIBUTES pan ON (pan.attribute_id = poa.attribute_idAND pa.attribute_name = ‘Name’)JOIN ATTRIBUTES pal ON (pal.attribute_id= poa.attribute_id AND pa.attribute_name= ‘Location Object ID’JOIN OBJECTS l ON (l.object_id = cast(pal.attribute_name as int))JOIN OBJECT_ATTRIBUTE_VALUES loa ON (l.object_id = loa.object_id)JOIN ATTRIBUTES lac ON (lac.attribute_id= loa.attribute_idAND pa.attribute_name = ‘City’)WHERE p.object_type = ‘PERSON’ and l.object_type = ‘LOCATION’ AND loa.attribute_value <> ‘Las Vegas’ and pan.attribute_value <> ‘Tom Jones’ OR SELECT name FROM PERSON p JOIN LOCATION l ON (p.location_id = l.location_id) WHERE l.city <> ‘Las Vegas’ and p.name <> ‘Tom Jones’;
What if I need “flexibility”? • Determine how flexible various parts of the model are. • At least 95% of the model is likely static. If not, get better requirements. • If you must have them, model the “flexible” parts: • With “extension” tables • generic_date_1, generic_date_2, etc. • Has a data type • Can be indexed • Can add constraints • Can actually use WHERE column = X • Add a few metadata tables so the application can get what it needs • Look into Anchor Modeling, or a highly normalized schema • 5NF/6NF • Evolving Schema
Fix WHAT you can… But Remember In the struggle between yourself and the world, side with the world. - Franz Kafka
Questions? Nate West nwest@cardinalsolutions.com
References & More • Bill Karwin, SQL Antipatterns • http://karwin.blogspot.com/ • Aaron Bertrand • http://sqlblog.com/blogs/aaron_bertrand/ • Tom Kyte • http://tkyte.blogspot.com/, http://asktom.oracle.com • C.J. Date, SQL and Relational Theory • Fabian Pascal • http://www.dbdebunk.com/