510 likes | 522 Views
Learn essential SQL optimization strategies to boost database performance. Uncover key techniques for index usage, query rewrite, and Sargable queries. Enhance your SQL skills for faster data retrieval and improved database efficiency.
E N D
SQL Performance Tips and Techniques Eric Peterson Director Database Administration PayGo Electric
Background • 30 years of database experience • Mainframe DB2 • Former Sybase Architect • MS SQL server 4.2+ • Director Database Administration • PayGo Electric • epeterson@PayGoUtilities.com
Agenda • Entry level Presentation • Setup a Lunch and Learn with your developers • Background on Physical Storage • Tables • Indexes • Tips and Techniques • Minor changes can make a huge differance • High level Overview of Tuning tools
Data is stored in Tables • A heap table has no index • The data written sequentially to the end of the table Table Fast Write
Heap Tables • Reads are not sorted and SLOW • The entire table may have to be read to find a row READ Table
Indexes • The key to reading your data fast READ Table Scan Table Index(s)
Two Common Types of Indexes • Clustered • Sorted B-Tree Index is built into the table Cluster Index Non ClusteredIndex(s)
Two Common Types of Indexes • Clustered • Sorted B-Tree Index is built into the table Cluster Index Table Scan can still Occur with BAD SQL Non ClusteredIndex(s) Table Data is stored on the Leaf Page of a Clustered Index READ READ
Sargable Queries BAD SQL Does not use INDEXES • The term “Sargable“ (coined by Kalen Delaney) created from the pseudo-acronym "SARG", which stands for "Search ARGument," which refers to a WHERE clause that compares a column to a constant value. • If a WHERE clause is sargable, this means that it can take advantage of an index (assuming one is available) to speed completion of the query. • If a WHERE clause is non-sargable, this means that the WHERE clause (or at least part of it) cannot take advantage of an index, instead performing a table/index scan, which may cause the query's performance to suffer.
Non Sargable Query Operators • Non-sargable search arguments in the WHERE clause • "IS NULL", "<>", "!=", "!>", "!<", "NOT", "NOT EXISTS", "NOT IN", "NOT LIKE", and "LIKE '%500'" • These arguments generally prevents (but not always) the query optimizer from using an index to perform a search. • Left or right outer joins while not in the where clause non-sargable results
Operators • Here are the key operators used in the WHERE clause, ordered by their performance. Those operators at the top will produce results faster than those listed at the bottom. • = • >, >=, <, <= • LIKE • <>, not =, != • This lesson here is to use = as much as possible, and <>, not =, != as least as possible
Where Operands • Here are the key operands used in the WHERE clause, ordered by their performance. Those operands at the top will produce results faster than those listed at the bottom. • A single literal used by itself on one side of an operator • A single column name used by itself on one side of an operator, a single parameter used by itself on one side of an operator • A multi-operand expression on one side of an operator • A single exact number on one side of an operator • Other numeric number (other than exact), date and time • Character data, NULLs • The simpler the operand, and using exact numbers, provides the best overall performance.
Benchmark, your results may… • MS keeps improving the Optimizer. • The original presentation is 5+ years old • Things improve every version, and your results may be better. • Everything below really needs to be benchmarked • As we can identify issues • MS should be able to fix them.
Non Sargable Columns • Expressions that include a function or change of data on a column • where amount +1 = 23 • expressions that have the same column on both sides of the operator • where amount = amount - credit
Rewrite Queries for Sargablity • In some cases, it may be possible to rewrite a non-sargable WHERE clause into one that is sargable. For example: • WHERE SUBSTRING(firstname,1,1) = 'm' • can be rewritten like this: • WHERE firstname like 'm%'
Change Functions for Sargablity • Function Acts Directly on Column, and Index Cannot Be Used: ( exception: Min/Max ) • SELECT member_number, first_name, last_nameFROM membersWHERE DATEDIFF(yy,datofbirth,GETDATE()) > 21 • Function Separated From Column, and an Index Can Be Used: • SELECT member_number, first_name, last_nameFROM membersWHERE dateofbirth < DATEADD(yy,-21,GETDATE())
Non Like Data Types • SQL Server will allow you to use non like data types in your compare • They will not use an index SELECT E.LastName, E.SSN FROM dbo.EmployeeAS E WHERE E.NationalIDNumber= 112457891; SELECT E.LastName, E.SSN FROM dbo.Employee AS E WHERE E.NationalIDNumber = ‘112457891’ ;
Rewrite Queries for Sargablity • WHERE clauses that use NOT are not sargable, but can often be rewritten to remove the NOT from the WHERE clause, for example: • WHERE NOT column_name > 5 • WHERE column_name !> 5 • to make it sargable • WHERE column_name <= 5
Avoid Not In • If you currently have a query that uses NOT IN, which offers poor performance because the SQL Server optimizer has to use a nested table scan to perform this activity, instead try to use one of the following options instead, all of which offer better performance: • Use EXISTS or NOT EXISTS • Use IN • Perform a LEFT OUTER JOIN and check for a NULL condition
Like in Where Clause • If you use LIKE in your WHERE clause, try to use one or more leading character in the clause, if at all possible. For example, use: • LIKE 'm%' • not: • LIKE '%m' • If you use a leading character in your LIKE clause, then the Query Optimizer has the ability to potentially use an index to perform the query, speeding performance and reducing the load on SQL Server.
Where Clause Order • If a WHERE clause includes multiple expressions, there is generally no performance benefit gained by ordering the various expressions in any particular order. • Where AcctId = 1234 and CustId = 4321 • Where CustId = 4321 and AcctId = 1234 • This is because the SQL Server Query Optimizer does this for you, saving you the effort.
In Vs. Between • When you have a choice of using the IN or the BETWEEN clauses in your Transact-SQL, you will generally want to use the BETWEEN clause, as it is much more efficient. For example: • SELECT customer_number, customer_nameFROM customerWHERE customer_number in (1000, 1001, 1002, 1003, 1004) • is much less efficient than this: • SELECT customer_number, customer_nameFROM customerWHERE customer_number BETWEEN 1000 and 1004 • Assuming there is a useful index on customer_number, the Query Optimizer can locate a range of numbers much faster (using BETWEEN) than it can find a series of numbers using the IN clause (which is really just another form of the OR clause).
Many with an OR • The Query Optimizer will perform a table scan or a clustered index scan on a table if the WHERE clause in the query contains an OR operator and • if any of the referenced columns in the OR clause are not indexed • Or does not have a useful index • Because of this, if you use many queries with OR clauses, you will want to ensure that each referenced column in the WHERE clause has a useful index.
OR Vs. UNION • It may be possible to use a union over an or • SELECT employeeID, firstname, lastnameFROM namesWHERE dept = 'prod' or city = 'Orlando' or division = 'food' • This query has three separate conditions in the WHERE clause. In order for this query to use an index, then there must be an index on all three columns found in the WHERE clause. • This same query can be written using UNION ALL instead of OR, like this example: • SELECT employeeID, firstname, lastname FROM names • WHERE dept = 'prod‘ • UNION ALL • SELECT employeeID, firstname, lastname FROM names • WHERE city = 'Orlando'UNION ALL • SELECT employeeID, firstname, lastname FROM names • WHERE division = 'food'
Avoid Unnecessary Sorts • Whenever SQL Server has to perform a sorting operation, additional resources have to be used to perform this task. Sorting often occurs when any of the following Transact-SQL statements are executed: • ORDER BY • GROUP BY • SELECT DISTINCT • UNION • CREATE INDEX • Note: use of these will sort before using TOP
Speeding up Sorts • In many cases, the sort commands cannot be avoided, but there are few ways that sorting overhead can be reduced. These include: • Keep the number of rows to be sorted to a minimum. Do this by only returning those rows that absolutely need to be sorted. • Keep the number of columns to be sorted to the minimum. In other words, don't sort more columns that required. • Keep the width (physical size) of the columns to be sorted to a minimum. • Sort column with number datatypes instead of character datatypes.
Improving Having • write your query so that the WHERE clause does most of the work (removing undesired rows) instead of the HAVING clause do the work of removing undesired rows. • Using the WHERE clause appropriately can eliminate unnecessary rows before they get to the GROUP BY and HAVING clause, saving some unnecessary work, and boosting performance. • For example, in a SELECT statement with WHERE, GROUP BY, and HAVING clauses, here's what happens. • First, the WHERE clause is used to select the appropriate rows that need to be grouped. • Next, the GROUP BY clause divides the rows into sets of grouped rows, and then aggregates their values. • And last, the HAVING clause then eliminates undesired aggregated groups. If the WHERE clause is used to eliminate as many of the undesired rows as possible, this means the GROUP BY and the HAVING clauses will have less work to do, boosting the overall performance of the query.
Which is more efficient? • Using a NOT EXISTS • SELECT a.hdr_keyFROM hdr_tbl aWHERE NOT EXISTS (SELECT * FROM dtl_tbl b WHERE a.hdr_key = b.hdr_key) • Using a LEFT JOIN • SELECT a.hdr_keyFROM hdr_tbl aLEFT JOIN dtl_tbl b ON a.hdr_key = b.hdr_keyWHERE b.hdr_key IS NULL • Using a NOT IN • SELECT hdr_keyFROM hdr_tblWHERE hdr_key NOT IN (SELECT hdr_key FROM dtl_tbl) • the best performing version through the worst performing version will be from top to bottom, as displayed above.
Avoid Using Cursors • If possible, avoid using SQL Server cursors. They generally use a lot of SQL SERVER resources and reduce the performance and scalability of your applications. • If you need to perform row-by-row operations, try to find another method to perform the task. • Some options are to perform the task at the client, use tempdb tables, use derived tables, use a correlated sub-query, or use the CASE statement. • More often than not, there are non-cursor techniques that can be used to perform the same tasks as a SQL Server cursor.
Top and Rowcount • You may have heard of a SET command called SET ROWCOUNT. Like the TOP operator, it is designed to limit how many rows are returned from a SELECT statement. In effect, the SET ROWCOUNT and the TOP operator perform the same function. • While is most cases, using either option works equally efficiently, there are some instances (such as rows returned from an unsorted heap) where the TOP operator is more efficient than using SET ROWCOUNT. • Because of this, using the TOP operator is preferable to using SET ROWCOUNT to limit the number of rows returned by a query.
Datetime Issues • Datetime columns contain both date and time: ‘2006-03-31 13:03:59:9999’ • Inserting a date only ( 2006-03-31’) into a datetime field will insert the date and at midnight IE: ‘2006-03-31 00:00:00:0000’ • Getdate() returns the current date and time • Using a query with a date only may cause ranges to fail…..especially with an starting and ending date to the range. • Valid range of values for today will equal • ‘2006-03-31 00:00:00:0000’ • ‘2006-03-31 23:59:59:9999’ • Either the data or the query will need to be changed for valid range searches.
SQL Server Date function • SQL Server 2008 introduced a new data type Date • A datetime without the timestamp • Old way to strip off the time was: • convert(datetime(convert(char(11), getdate()))) • New 2008 • convert(date,getdate())
Datetime Range Resolution • The Following actual example shows how not to efficiently query date ranges set @DateFrom ='2004-02-02 00:00:00.000' set @DateTo = '2004-03-11 00:00:00.000' select invoiceId, invDate from invoice i where Cast(Convert(VarChar(20), i.InvDate, 101) as Datetime) Between @DateFrom and @DateTo • SQL:StmtCompleted Duration: 203 CPU: 203 Reads: 148 Writes: 0 • The following is a better (and sargable) approach in your code declare @DateFrom datetime , @DateTo datetime set @DateFrom ='2004-02-02 00:00:00.000' set @DateTo = dateadd( ms, -1, dateadd(day, +1, '2004-03-11 00:00:00.000') ) select invoiceId, invDate from invoice i where i.InvDate Between @DateFrom and @DateTo • SQL:StmtCompleted Duration: 15 CPU: 0 Reads: 148 Writes: 0
Constraint vs. Trigger • When you have a choice of using a constraint or a trigger to perform the same task, always choose the constraint. • The same goes if you have the option of using either a constraint or a rule, or a constraint or a default. • Constraints require less overhead than triggers, rules, and defaults, reducing the load on SQL Server, and boosting overall performance.
Joins • Very efficient if using indexes • Use full joins • Right and left joins are not as efficient and often not needed, and cause table/index scans • Use Join table syntax instead of where clause join (old style) • join sys.foreign_key_columns KC • on K.object_id = KC.constraint_object_id • Do not use • sys.foreign_key_columns KC • Where K.object_id = KC.constraint_object_id
Join vs. Subquery • If you have the choice of using a join or a subquery to perform the same task within a query, generally the join is faster. • This usually depends on the number of rows returned by the sub query….the more, the less efficient it is. • But this is not always the case, and you can may want to test the query using both methods to determine which is faster for your particular application.
Procedures vs. Dynamic SQL • A stored procedure is more efficient than dynamic SQL • Query plan stored • Use the OPTION (Optimize for ( @HostVar = 5)) • Will optimize the query plan
Index Temp Tables • While most temporary tables probably won't need, or even can use an index, some larger temporary tables can benefit from them. • A properly designed index on a temporary table can be as great a benefit as a properly designed index on a standard database table. • In order to determine if indexes will help the performance of your applications using temporary tables, you will probably have to perform some testing.
Select Into • If you need to use the SELECT INTO option, keep in mind that it can lock system tables, preventing others users from accessing the data they need while the data is being inserted. • In order to prevent or minimize the problems caused by locked tables, try to schedule the use of SELECT INTO when your SQL Server is less busy. • In addition, try to keep the amount of data inserted to a minimum. In some cases, it may be better to perform several, smaller SELECT INTOs instead of performing one large SELECT INTO
Using Indexes • Adding an index may improve performance on a Query • Will also slow up the following • Inserts • Updates to the column • deletes
Covering indexes • An Index is covered when • The lead columns are the lead columns in another index. • Example • Create index testA on test (TestId, CustId) • Covers • Create index testB on test (TestId) • This index is a duplicate, and will get just as good as performance from the A index • But dosent cover • Create index testC on test (CustId)
Index Hints • Generally, avoid using optimizer hints in your queries, as it is hard to out guess the optimizer but… • If you find that SQL Server uses a TABLE SCAN instead of an INDEX SEEK when you use an IN or OR clause as part of your WHERE clause, even when those columns are covered by an index, consider using an index hint to force the Query Optimizer to use the index. • For example: (takes about 3 seconds) • SELECT * FROM tblTaskProcesses WHERE nextprocess = 1 AND processid IN (8,32,45) • With hint: (takes about 1 second ) • SELECT * FROM tblTaskProcesses (INDEX = IX_ProcessID) WHERE nextprocess = 1 AND processid IN (8,32,45)
Tools • How do you know if your query is using an index • How long will a query take to run? • Let the SQL Server tell you….
SQL Server Management Studio • Display estimated Execution plan • Located under menu Query ( or CNTL L ) • Tells lots of information • Number of sorts • Join information • Index usage • Rows processed at each stage
Textual plans • These are a bit harder to read, but more information is immediately available. There are three text plan formats: • SHOWPLAN_ALL : a reasonably complete set of data showing the Estimated execution plan for the query • SHOWPLAN_TEXT : provides a very limited set of data for use with tools like osql.exe. It too only shows the Estimated execution plan • STATISTICS PROFILE: similar to SHOWPLAN_ALL except it represents the data for the Actual execution plan
Time and Statistics • SET STATISTICS IO { ON | OFF } • SCANS • Logical and Physical Reads • SET STATISTICS TIME { ON | OFF } • SQL Server parse and compile time: • CPU time = 0 ms, elapsed time = 1 ms. • Note: running a query twice will be cached and faster. • DBCC DROPCLEANBUFFERS;
XML Plans • XML plans present the most complete set of data available on a plan, all on display in the structured XML format. There are two varieties of XML plan: • SHOWPLAN_XML : The plan generated by the optimizer prior to execution. • STATISTICS_XML: The XML format of the Actual execution plan.
DB Optimizer - Idera • Graphical tool combining waits and cpu • Identifies bottlenecks in SQL • Breaks down SQL to key properties
SQL Sentury – Plan Explorer • Free tool • Examines plan • Index what if analisys