Module 5 Planning for SQL Server ® 2008 R2 Indexing

Module 5 Planning for SQL Server® 2008 R2 Indexing

Module Overview • Core Indexing Concepts • Data Types and Indexes • Single Column and Composite Indexes

Lesson 1: Core Indexing Concepts • How SQL Server Accesses Data • The Need for Indexes • Index Structures • Selectivity, Density and Index Depth • Index Fragmentation • Demonstration 1A: Viewing Index Fragmentation

How SQL Server Accesses Data Table Scan SQL Server reads all table pages Index SQL Server uses index pages to find rows

The Need for Indexes • ANSI SQL does not mention indexes • Generally considered to be external to the logical data model • All queries can be executed without indexes • Primary reason for indexes is performance • Some constraints are implemented via indexes • Indexes are used to make constraints efficient but in theory could be implemented in other ways • Analogy: Physical library holding books • Index by author is useful • Additional indexes would also be useful

Index Structures • Indexes are commonly based on tree structures • Not just a binary tree as nodes can have more than two children • Top node is called the root node • Bottom level nodes are called leaf nodes

Selectivity, Density and Index Depth • Three core concepts when working with indexes • Selectivity • A measure of how many rows are returned compared to the total number of rows • High selectivity means a small number of rows when related to the total number of rows • Density • A measure of the lack of uniqueness of data in the table • High density indicates a large number of duplicates • Index Depth • Number of levels within the index • Common misconception that indexes are deep

Index Fragmentation How does fragmentation occur? • SQL Server reorganizes index pages when data modifications cause index pages to split Types of fragmentation: • Internal – pages are not full • External – pages are out of logical sequence Detecting fragmentation • SQL Server Management Studio – Index Properties • System function - sys.dm_db_index_physical_stats

Demonstration 1A: Viewing Index Fragmentation In this demonstration, you will see how to identify fragmented indexes

Lesson 2: Data Types and Indexes • Numeric Index Data • Character Index Data • Date-Related Index Data • GUID Index Data • BIT Index Data • Indexing Computed Columns

Numeric Index Data • Indexes with numeric keys work efficiently • Many values fit in a small number of index pages • Sorts and comparisons are quick • Exact numeric types are most efficient • Integer types are the most efficient • INT and BIGINT commonly used • Approximate data types (float and real) much less efficient

Character Index Data • Character data types are much less efficient when used in index keys • Character values tend to be much larger than numeric values • Even short character values are slow to compare unless binary comparisons are being made • Most SQL Server applications use collations other than binary • Rules for collations need to be applied whenever comparisons are made

Date-Related Index Data • Date data types are generally good candidates for index keys • Very commonly used in business applications • Only slightly less efficient than integer data • Size of the data will be important • date more efficient than datetime

GUID Index Data • Becoming very common in new business applications • Moderate efficiency • Size is 128 bits or 16 bytes • Comparison performance is reasonable • Problems arise • Randomness of generation causes fragmentation problems • Very common problem in many current applications

BIT Index Data • BIT columns have only two possible values • BIT columns are efficient as index keys • Common misconception that BIT columns are not useful in indexes • Many valid usage scenarios exist • Filtered indexes often useful with BIT column indexes

Indexing Computed Columns • Computed columns are based on expressions • Values are typically derived from other columns • Indexing the computed value rather than any underlying values can be useful • Can assist with improving performance on poorly-designed databases • Example: a column that is used to hold values that should have been held in separate columns • Persisted computed columns • Avoid the overhead of computing values each time they are selected • Are computed during INSERT and UPDATE statements

Lesson 3: Single Column and Composite Indexes • Single Column vs. Composite Indexes • Ascending vs. Descending Indexes • Index Statistics • Demonstration 3A: Viewing Index Statistics

Single Column vs. Composite Indexes • Indexes are not always constructed on a single column • Multi-column indexes are called "composite" indexes • Composite indexes are often useful • Tend to be more useful than single column indexes in most typical business applications • Having an index sorted first by customer then by order date makes it very easy to find orders for a particular customer on a particular date. • A query might involve multiple search predicates. • Two columns together might be selective while neither is selective on its own. • Index on A,B is not the same as an index on B,A • Typically index the most restrictive column first

Ascending vs. Descending Indexes • Indexes could be constructed in ascending or descending order • In general, for single column indexes, both are equally useful • Each layer of a SQL Server index is double-linked (ie: linked in both directions) • SQL Server can start at either end and work towards the other end • Each component of a composite index can be ascending or descending • Might be very useful for avoiding sort operations

Index Statistics • SQL Server needs to have knowledge of the layout of the data in a table or index before it optimizes and executes queries • Needs to create a reasonable plan for executing the query • Important to know the usefulness of each index • Selectivity is the most important metric • By default, SQL Server automatically creates statistics on indexes • Can be disabled • Recommendation is to leave auto-creation and auto-update enabled

Demonstration 3A: Viewing Index Statistics • In this demonstration, you will see how to work with index statistics

Lab 5: Planning for SQL Server Indexing • Exercise 1: Explore existing index statistics • Challenge Exercise 2: Design column orders for indexes (Only if time permits) Logon information Estimated time: 45minutes

Lab Scenario You have been asked to explain the concept of index statistics and selectivity to a new developer. You will explore the statistics available on an existing index and determine how selective some sample queries would be. One of the company developers has provided you with a list of the most important queries that will be executed by the new marketing management system. Depending upon how much time you have available, you need to determine the best column orders for indexes to support each query. Complete as many as possible within the allocated time. In later modules, you will consider how these indexes would be implemented. Each query is to be considered in isolation in this exercise.

Lab Review • Which types of queries would most likely lead to widely-differing query plans? • If you have an equality predicate and a LIKE predicate in your most important query, which predicate would you try to satisfy as the first column of a composite index?

Module Review and Takeaways • Review Questions • Best Practices

Module 5 Planning for SQL Server ® 2008 R2 Indexing