580 likes | 718 Views
Data Partitioning in VLDB. Tal Olier Tal.olier@hp.com. Why am I here?. Tal Olier – tal.olier@hp.com ~15 years in various software development positions. All of them involved database practice.
E N D
Data Partitioning in VLDB Tal Olier Tal.olier@hp.com
Why am I here? Tal Olier – tal.olier@hp.com ~15 years in various software development positions. All of them involved database practice. I work in HP Software, I love working there and I came to tell you this; the lecture is just an excuse getting me into the building :)
Agenda • RDBMS in short (basic terms) • SQL reminder • A bit about (RDBMS) architecture • Performance - access paths • What is table join • VLDB - the size factor • VLDB - industry practice • How joins are executed • Summary
A little history • Was invented in 1970 • By Edgar Frank "Ted" Codd • In IBM labs • Oracle emerged first to the market
Basics – a table • Rows • Columns • Primary key
Basics – a relation • A foreign key (constraint) • A reference • Source table • Source column/s • Target table • Target column/s
People example • People: name, height, smoking, father • Books read: title, author • Schedule details: from, to, activity • Resume details: from, to, salary
Query language • SQL – Structured Query Language • Declarative (vs. procedural) • Requires Internal optimization
SELECT query structure • SELECT • FROM… JOIN • WHERE • GROUP BY • HAVING • ORDER BY
SQL modules • DML (+Select) – Data manipulation language • DDL – Data definition language • TC – Transaction controls (commit/rollback) • DCL – Data control language (grant/revoke) • PE – Procedural extensions
Database server Process Memory Other cache Log cache Buffer cache Server Process Everything is blocks I/O System Data Files Log Files Client Process
IO bound vs. CPU bound • CPU – what is it consumed for? • IO – what is it consumed for?
FTS – full table scan • Scan the whole table – from top to bottom
B Tree index • B tree – allows great spanning that derives small tree height
B+ tree • The leaves are organized in a doubly linked list • B+ tree – allows searching through all values by searching the leaf level only
Database index • Data is sorted according to the index columns • The leaf contain pointers to rows in the table • Search of 1 value in a tree - o (log n) • Smaller index height in B+ trees • Index (database) operations: • Add/remove values • Index seek • Index scan
Inner join • Use join predicate to match rows from 2 table: A and B • Each row in table A is compared to each row in table B to find the pairs of rows that satisfy the join predicate • Than column values for each matched pairs are combined into a result row
department Cartesian product employee
Equi join • A inner join that uses equality comparison in the join predicate • Example:select * from employee emp join department dept on emp.dept_id = dept.dept_id
Equi join OK OK OK
RDBMS – summary in a nutshell • Tables • References • Joins • Indexes • Blocks • I/O
RDBMS – summary in a nutshell • Tables • References • Indexes • Blocks • I/O
Use case: Sales Information • Table: • Customer name • Order number • Order date and time • List of items, amount and prices
Union view Select * from t2007 Union all Select * from t2008 Union all Select * from t2009 Union all Select * from t2010
Local indexes • Index is bound to it’s partition • Drop partition derives drop index • Smaller index heights • Index is always usable • Harder to maintain uniqueness with it
Partitioned table - concepts • Partition column is the key for dividing the data • Performance – only relevant partitions used • Add/drop partition – DDL • Local index – index is bound to a partition
Data tables block block block block block block block T a b l e - A block block block block block block block T a b l e - C block block block block block block block block block block block block block block block block block block block block block T a b l e - B block block block block block block block block block block block block block block block block block block block block block