Optimize Database Performance with Indexing and Joins

Indexing and Joins

Indexes • Increase database performance • must be explicitly defined • once defined, are transparent to the user • once created, the system maintains it • more than one can exist on a given table

Creating an Index • Syntax CREATE [UNIQUE] INDEX index_name ON table_name (column_name) • Example create index auind on authors (au_id)

Composite Index • Used when columns have a logical relationship and would be searched as a unit • Example create index au_name_ind on authors (au_lname, au_fname) • order not important, but performance is better when primary search col is first

2 Kinds of Indexes • Unique Index • Clustered Index

Unique Index • No 2 rows are permitted to have the same value • system checks data upon creation and data addition • rejects duplicates and returns an error • should only be created on a column that requires uniqueness eg. ssn, acct code • can be created as a composite or single column • helps in maintaining data integrity • boosts search performance

Clustered Index System sorts rows on an ongoing basis so that the physical order is the same as the indexed order only 1 can exist per table should only be created for a column that is most often retrieved in order greatly increases performance when searching for contiguous key values… especially a range slows down data updates due to the sorting involved

Things to Consider • Indexes greatly increase query response time • every index requires system resources to store and maintain • indexes can actually slow down the performance of UPDATES, INSERTS, and DELETES due to index maintenance So… don’t over index

What Should We Index? • Any column frequently used in retrieval • primary key columns • columns that are often queried in a sorted order • columns that are used in joins • columns that are often searched for ranges

We Should NOT Index… • Columns rarely used in queries • columns with 2 or 3 possible values eg. Male or Female • small tables

SQL-92 Create Table Constraints • PRIMARY KEY • rejects duplicates and nulls • UNIQUE • rejects duplicates, allows nulls • DEFAULT • inserts the default value when no value is entered • CHECK • validates data format • FOREIGN KEY and REFERENCES • ties foreign key to the primary key it references

Put it on paper!

Then write your SQL create table title(title_id char(6) not nullconstraint tididx primary keyconstraint tidcheck check(title_id like ‘[A-Z] [A-Z] [0-9] [0-9]…’),title varchar(80) not nullconstraint titleidx unique,type char(12)default ‘unclassified’ nullconstraint typechk check(type in(‘business’, ‘mod_cook’, ‘trad_cook’)),pub_id char(4) nullreference publishers (pub_id),price money null,advance money null)

Changing a Table • Syntax • ALTER table table_nameadd column_name datatype null|not null

Removing Objects • Database • DROP DATABASE db_name • deletes ALL tables and data within it!! • Table • DROP TABLE table_name • deletes table and its contents • Index • DROP INDEX table_name.index_name • deletes named index on named table

Joins In order to maintain normalization in the database design it is necessary to break up data into separate tables. The data can then be re-associated through the use of a join.

Joins • What columns do I need? • What tables have these columns? • Are all the tables related in some way? • If not, are there other tables that can relate them? • How are they all related? • Link them together by setting their common fields equal in the WHERE clause. • Restrict the WHERE clause to the record(s) of interest.

What to join? • Key columns are the best since these were created for the purpose of existing as a reference. • Should have similar data • Should be the same datatype • nulls will not join since their value is not known.

Syntax • Usually best to put the join conditions first in the WHERE clause • Use of aliases greatly simplifies the statement. • Any logical operator can be used. • A self-join can be performed on the same table by qualifying it twice.

Self Join Which authors in Oakland have the same zip code? Select distinct au1.au_fname , au1.au_lname, au1.zip from authors au1, authors au2 where au1.city = “Oakland” and au1.zip = au2.zip and au1.au_id != au2.au_id

How a Join is Processed • First the system obtains the Cartesian Product of all tables in joinCartesian Product - the matrix of all possible combinations that could satisfy the join • The select list is used to restrict the columns returned • The WHERE clause is then used to restrict the rows return that satisfy the query

2 ways of looking at a Join • Looking at all the tables, linking them together and treating them like one big table. • Setting the main search criteria and then linking the common fields to the data that is of interest.

Optimize Database Performance with Indexing and Joins

Optimize Database Performance with Indexing and Joins

Presentation Transcript

Indexing and Hashing

Indexing and Searching

Journals and Indexing

Indexing and Hashing

Joins and Cardinality Demystified

SQL Joins

JOINs

Sequences and Indexing

Indexing and Retrieval

Joins And Subqueries

JOINS

Joins

JOINS

Indexing and Execution

Special Joins

Storage, Indexing and Joins Slides taken from Database Management Systems, R. Ramakrishnan

Joins

Searching and Indexing

Indexing and Complexity

Vectors and indexing

Indexing and Searching