1 / 57

Data Partitioning in VLDB

Data Partitioning in VLDB. Tal Olier Tal.olier@hp.com. Why am I here?. Tal Olier – tal.olier@hp.com ~15 years in various software development positions. All of them involved database practice.

mikasi
Download Presentation

Data Partitioning in VLDB

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Partitioning in VLDB Tal Olier Tal.olier@hp.com

  2. Why am I here? Tal Olier – tal.olier@hp.com ~15 years in various software development positions. All of them involved database practice. I work in HP Software, I love working there and I came to tell you this; the lecture is just an excuse getting me into the building :)

  3. Agenda • RDBMS in short (basic terms) • SQL reminder • A bit about (RDBMS) architecture • Performance - access paths • What is table join • VLDB - the size factor • VLDB - industry practice • How joins are executed • Summary

  4. Relational Database Management System

  5. A little history • Was invented in 1970 • By Edgar Frank "Ted" Codd • In IBM labs • Oracle emerged first to the market

  6. Basics – a table • Rows • Columns • Primary key

  7. Basics – a relation • A foreign key (constraint) • A reference • Source table • Source column/s • Target table • Target column/s

  8. People example • People: name, height, smoking, father • Books read: title, author • Schedule details: from, to, activity • Resume details: from, to, salary

  9. People example

  10. Structured Query Language

  11. Query language • SQL – Structured Query Language • Declarative (vs. procedural) • Requires Internal optimization

  12. SELECT query structure • SELECT • FROM… JOIN • WHERE • GROUP BY • HAVING • ORDER BY

  13. SQL modules • DML (+Select) – Data manipulation language • DDL – Data definition language • TC – Transaction controls (commit/rollback) • DCL – Data control language (grant/revoke) • PE – Procedural extensions

  14. A bit about architecture

  15. Database server Process Memory Other cache Log cache Buffer cache Server Process Everything is blocks I/O System Data Files Log Files Client Process

  16. IO bound vs. CPU bound • CPU – what is it consumed for? • IO – what is it consumed for?

  17. Performance?

  18. FTS – full table scan • Scan the whole table – from top to bottom

  19. B Tree index • B tree – allows great spanning that derives small tree height

  20. B+ tree • The leaves are organized in a doubly linked list • B+ tree – allows searching through all values by searching the leaf level only

  21. Database index • Data is sorted according to the index columns • The leaf contain pointers to rows in the table • Search of 1 value in a tree - o (log n) • Smaller index height in B+ trees • Index (database) operations: • Add/remove values • Index seek • Index scan

  22. Index seek/scan

  23. Join (logical)

  24. Inner join • Use join predicate to match rows from 2 table: A and B • Each row in table A is compared to each row in table B to find the pairs of rows that satisfy the join predicate • Than column values for each matched pairs are combined into a result row

  25. department Cartesian product employee

  26. Equi join • A inner join that uses equality comparison in the join predicate • Example:select * from employee emp join department dept on emp.dept_id = dept.dept_id

  27. Equi join OK OK OK

  28. RDBMS – summary in a nutshell • Tables • References • Joins • Indexes • Blocks • I/O

  29. Very Large Data Base

  30. RDBMS – summary in a nutshell • Tables • References • Indexes • Blocks • I/O

  31. VLDB – a table – size factor

  32. Use case: Sales Information • Table: • Customer name • Order number • Order date and time • List of items, amount and prices

  33. Order details (2007-2010)

  34. Remove 2007’s orders

  35. Order details kept in 4 tables

  36. … 4 tables – remove 2007’s data

  37. Union view Select * from t2007 Union all Select * from t2008 Union all Select * from t2009 Union all Select * from t2010

  38. Order details kept in 4 tables and a view

  39. Partitioned table

  40. Get back to: Remove 2007’s data?

  41. Impact on index behavior

  42. Partitioned index (local index)

  43. Local indexes • Index is bound to it’s partition • Drop partition derives drop index • Smaller index heights • Index is always usable • Harder to maintain uniqueness with it

  44. Partitioned table - concepts • Partition column is the key for dividing the data • Performance – only relevant partitions used • Add/drop partition – DDL • Local index – index is bound to a partition

  45. Star schema

  46. Data tables block block block block block block block T a b l e - A block block block block block block block T a b l e - C block block block block block block block block block block block block block block block block block block block block block T a b l e - B block block block block block block block block block block block block block block block block block block block block block

  47. Let’s get back to our partitioned table

  48. Dimension referencing

  49. Making fact tables thin

  50. Join (physical)

More Related