240 likes | 457 Views
Overview of Indexing. Chapter 8 – Part II. Introduction to indexing First glimpse at indices and workloads. Understanding the Workload. For each query in workload: Which relations does it access? Which attributes are retrieved?
E N D
Overview of Indexing Chapter 8 – Part II. Introduction to indexing First glimpse at indices and workloads
Understanding the Workload • For each query in workload: • Which relations does it access? • Which attributes are retrieved? • Which attributes are involved in selection/join conditions? • How selective are these conditions likely to be? • For each update in workload: • Which attributes are involved in selection/join conditions? • How selective are these conditions likely to be? • The type of update (INSERT/DELETE/UPDATE), and the attributes that are affected.
Choice of Indexes • What indexes should we create? • Which relations should have indexes? • What field(s) should be the search key? • Should we build several indexes? • For each index, what kind of an index should it be? • Clustered vs. unclustered? Hash vs. tree? • Clustering must be used sparingly and only when justified by frequent queries that benefit from clustering. • At most one index can be clustered. (Why?) • Consider utilizing index-only evaluation. (e.g., avg(age))
Choice of Indexes: One Approach • Consider most important queries in turn. • Consider best plan using current indexes, and see if a better plan possible with additional index. If so, create it. • Consider impact on updates in workload! • Trade-off: Indexes can make queries go faster, updates slower. Require disk space, too. • Obviously, we must understand how DBMS evaluates queries and creates query evaluation plans
Choice of Indexes: Simple Approach • For now, we discuss simple 1-table queries.
Index Selection Guidelines • Attributes in WHERE clause are candidates for index keys. • Exact match condition suggests hash index. • Range query suggests tree index. • Clustering is especially useful for range queries • Clustering can also help equality queries if there are many duplicates.
Index Selection Guidelines • Multi-attribute search keys considered when WHERE clause contains several conditions. • Order of attributes is important for range queries. • Such indexes can sometimes enable index-only strategies for important queries. • Question : For index-only strategies, is clustering important ?
Index Selection Guidelines • Try to choose indexes that benefit as many queries as possible. • Since only one index can be clustered per relation, choose it based on important queries that would benefit the most from clustering.
Examples of Clustered Indexes SELECT E.dno FROM Emp E WHERE E.age>40 • B+ tree index on E.dno? • B+ tree index on E.age ? • Trade-offs : • How selective is the condition? (all > 40?) or (only some > 40) • Is the index clustered?
Examples of Clustered Indexes SELECT E.dno, COUNT (*) FROM Emp E WHERE E.age>10 GROUP BY E.dno • Consider the GROUP BYquery. • Index on E.age ? E.dno ? • Issues : • Use Index on E.age ? • If many tuples have E.age > 10, using E.age index and sorting the retrieved tuples may be costly. • Use Index on E.dno ? • Clustered E.dno index may be better! • What about without WHERE condition?
Examples of Clustered Indexes SELECT E.dno FROM Emp E WHERE E.hobby=Stamps • B+ tree index on E.hobby? • NOTE: is equality query. • NOTE : may contain many duplicates. • Clustered or Unclustered index ? • CONCLUDE : Clustering on E.hobby helps! • QUESTION: what if index is unclustered ? • CONCLUDE: may prefer to do a full scan.
Indexes with Composite Search Keys Composite Search Keys: Search on combination of fields (sal and age). 11,80 11 12 12,10 name age sal 12,20 12 13,75 bob 12 10 13 <age, sal> cal 11 80 <age> joe 12 20 10,12 sue 13 75 10 20 20,12 Data records sorted by name 75,13 75 80,11 80 <sal, age> <sal> Data entries in index sorted by <sal,age> Data entries sorted by <sal>
Equality and Composite Search Keys • Equality query: Every field value is equal to a constant value. • Examples : • age=20 • sal =75 • age=20 and sal =75 • sal =75 and age=20 11,80 11 12 12,10 name age sal 12,20 12 13,75 bob 12 10 13 <age, sal> cal 11 80 <age> joe 12 20 10,12 sue 13 75 10 20 20,12 Data records sorted by name 75,13 75 80,11 80 <sal, age> <sal> Data entries in index sorted by <sal,age> Data entries sorted by <sal>
Composite Search Keys • If retrieve Emp records with age=30 ANDsal=4000 • Index on <age,sal> would be better than an index on age or an index on sal. • Choice of index key orthogonal to clustering
Ranges and Composite Search Keys Examples of composite key indexes using lexicographic order. • Range query: Some field value is not a constant but a range. • Examples : • age=12 and sal > 10 • age =12 11,80 11 12 12,10 name age sal 12,20 12 13,75 bob 12 10 13 <age, sal> cal 11 80 <age> joe 12 20 10,12 sue 13 75 10 20 20,12 Data records sorted by name 75,13 75 80,11 80 <sal, age> <sal> Data entries in index sorted by <sal,age> Data entries sorted by <sal>
Composite Search Keys • If condition is: 20<age<30AND3000<sal<5000: • Clustered tree index on <age,sal> or <sal,age> is best. • If condition is: age=30AND3000<sal<5000: • Clustered <age,sal> index much better than <sal,age> index! • Composite indexes are larger, updated more often.
Index-Only Plans • Answer a query without retrieving actual tuples … • Is that possible ? • If index with suitable information is available. • Why is it a good idea ?
Index-Only Plans + Does index-only evaluation make sense? SELECT E.dno, COUNT(*) FROM Emp E GROUP BY E.dno <E.dno> ? <E.dno> ? <E.sal> ? <E.dno,E.sal> ? SELECT E.dno, MIN(E.sal) FROM Emp E GROUP BY E.dno <E. age,E.sal> or <E.sal, E.age>? SELECTAVG(E.sal) FROM Emp E WHERE E.age=25 AND E.sal BETWEEN 3000 AND 5000 Tree index!
Index-Only Plans : Multi-Key Index PROS: + The chance for index-only evaluation is increased. CONS: - Index size larger. - Update response for any field.
Index-Only Plans • Tree index on <dno,age>, or on : <age,dno> • Which is better? SELECT E.dno, COUNT (*) FROM Emp E WHERE E.age=30 GROUP BY E.dno
Index-Only Plans • Tree index on <dno,age>, or on : <age,dno> • Which is better? SELECT E.dno, COUNT (*) FROM Emp E WHERE E.age=30 GROUP BY E.dno SELECT E.dno, COUNT (*) FROM Emp E WHERE E.age>30 GROUP BY E.dno What if we consider the second query?
Index-Only Plans : Multiple Relations <E.dno> Or <D.dno> Or <D.mgr> SELECT DISTINCT ( D.mgr ) FROM Dept D, Emp E WHERE D.dno=E.dno
Summary • Understanding nature of workload for application, and the performance goals is essential to developing a good design. • What are the important queries and updates? • What attributes/relations are involved?
More Summary • Indexes must be chosen to speed up important queries • Index maintenance overhead on updates to key fields. • Choose indexes that can help many queries, if possible. • Build indexes to support index-only strategies. • Clustering is an important decision; only one index on a given relation can be clustered! • Order of fields in composite index key can be important.