Lecture 17: Data Storage

Lecture 17: Data Storage Friday, November 7, 2003

Outline • External Sorting • Representing data elements (chapter 12) • Index structures (13.1, 13.2)

Sorting • Problem: sort 1 GB of data with 1MB of RAM. • Where we need this: • Data requested in sorted order (ORDER BY) • Needed for grouping operations • First step in sort-merge join algorithm • Duplicate removal • Bulk loading of B+-tree indexes.

2-Way Merge-sort:Requires 3 Buffers in RAM • Pass 1: Read a page, sort it, write it. • Pass 2, 3, …, etc.: merge two runs, write them Runs of length 2L Runs of length L INPUT 1 OUTPUT INPUT 2 Main memory buffers Disk Disk

Two-Way External Merge Sort • Assume block size is B = 4Kb • Step 1  runs of length L = 1MB • Step 2  runs of length L = 2MB • Step 3  runs of length L = 4MB • . . . . . . • Step 10  runs of length L = 1GB (why ?) Need 10 iterations over the disk data to sort 1GB

Can We Do Better ? • Hint:We have 1MB of main memory, but only used 12KB

Cost Model for Our Analysis • B: Block size ( = 4KB) • M: Size of main memory ( = 1MB) • N: Number of records in the file • R: Size of one record

External Merge-Sort • Phase one: load M bytes in memory, sort • Result: runs of length M bytes ( 1MB ) M/R records . . . . . . Disk Disk M bytes of main memory

Phase Two • Merge M/B – 1 runs into a new run (250 runs ) • Result: runs of length M (M/B – 1) bytes (250MB) Input 1 . . . . . . Input 2 Output . . . . Input M/B Disk Disk M bytes of main memory

Phase Three • Merge M/B – 1 runs into a new run • Result: runs of length M (M/B – 1)2 records (250*250MB = 625GB – larger than the file) Input 1 . . . . . . Input 2 Output . . . . Input M/B Disk Disk M bytes of main memory Need 3 iterations over the disk data to sort 1GB

Cost of External Merge Sort • Number of passes: • How much data can we sort with 10MB RAM? • 1 pass  10MB data • 2 passes  25GB data (M/B = 2500) • Can sort everything in 2 or 3 passes !

External Merge Sort • The xsort tool in the XML toolkit sorts using this algorithm • Can sort 1GB of XML data in about 8 minutes

Representing Data Elements • Relational database elements: • A tuple is represented as a record • CREATE TABLE Product ( • pid INT PRIMARY KEY, • name CHAR(20), • description VARCHAR(200), • maker CHAR(10) REFERENCES Company(name) • )

Issues • Represent attributes inside the records • Represent the records inside the blocs

Record Formats: Fixed Length • Information about field types same for all records in a file; stored in systemcatalogs. • Finding i’th field requires scan of record. • Note the importance of schema information! F3 F4 F1 F2 L3 L4 L1 L2 Address = B+L1+L2 Base address (B)

Record Header To schema length F3 F4 F1 F2 L3 L4 L1 L2 header timestamp • Need the header because: • The schema may change • for a while new+old may coexist • Records from different relations may coexist

Variable Length Records Other header information header F3 F4 F1 F2 L3 L4 L1 L2 length Place the fixed fields first: F1 Then the variable length fields: F2, F3, F4 Null values take 2 bytes only Sometimes they take 0 bytes (when at the end)

Records With Repeating Fields Other header information header F3 F1 F2 L3 L1 L2 length Needed e.g. in Object Relational systems,or fancy representations of many-many relationships

Storing Records in Blocks • Blocks have fixed size (typically 4k – 8k) BLOCK R4 R3 R2 R1

Spanning Records Across Blocks • When records are very large • Or even medium size: saves space in blocks block header block header R1 R2 R3 R2

BLOB • Binary large objects • Supported by modern database systems • E.g. images, sounds, etc. • Storage: attempt to cluster blocks together CLOB = character large objec • Supports only restricted operations

Modifications: Insertion • File is unsorted: add it to the end (easy ) • File is sorted: • Is there space in the right block ? • Yes: we are lucky, store it there • Is there space in a neighboring block ? • Look 1-2 blocks to the left/right, shift records • If anything else fails, create overflow block

Overflow Blocks • After a while the file starts being dominated by overflow blocks: time to reorganize Blockn-1 Blockn Blockn+1 Overflow

Modifications: Deletions • Free space in block, shift records • Maybe be able to eliminate an overflow block • Can never really eliminate the record, because others may point to it • Place a tombstone instead (a NULL record)

Modifications: Updates • If new record is shorter than previous, easy  • If it is longer, need to shift records, create overflow blocks

Physical Addresses • Each block and each record have a physical address that consists of: • The host • The disk • The cylinder number • The track number • The block within the track • For records: an offset in the block • sometimes this is in the block’s header

Logical Addresses • Logical address: a string of bytes (10-16) • More flexible: can blocks/records around • But need translation table:

Main Memory Address • When the block is read in main memory, it receives a main memory address • Need another translation table

Optimization: Pointer Swizzling • = the process of replacing a physical/logical pointer with a main memory pointer • Still need translation table, but subsequent references are faster

Pointer Swizzling Block 2 Block 1 Disk read in memory swizzled Memory unswizzled

Pointer Swizzling • Automatic: when block is read in main memory, swizzle all pointers in the block • On demand: swizzle only when user requests • No swizzling: always use translation table

Pointer Swizzling • When blocks return to disk: pointers need unswizzled • Danger: someone else may point to this block • Pinned blocks: we don’t allow it to return to disk • Keep a list of references to this block

Indexes • An index on a file speeds up selections on the search key fields for the index. • Any subset of the fields of a relation can be the search key for an index on the relation. • Search key is not the same as key(minimal set of fields that uniquely identify a record in a relation). • An index contains a collection of data entries, and supports efficient retrieval of all data entries with a given key value k.

Index Classification • Primary/secondary • Clustered/unclustered • Dense/sparse • B+ tree / Hash table / …

Primary Index • File is sorted on the index attribute • Dense index: sequence of (key,pointer) pairs

Primary Index • Sparse index

Primary Index with Duplicate Keys • Dense index:

Primary Index with Duplicate Keys • Sparse index: pointer to lowest search key in each block: • Search for 20 ...but need to search here too 20 is here...

Primary Index with Duplicate Keys • Better: pointer to lowest new search key in each block: • Search for 20 • Search for 15 ? 35 ? 20 is here... ...ok to search from here

Secondary Indexes • To index other attributes than primary key • Always dense (why ?)

Clustered/Unclustered • Primary indexes = usually clustered • Secondary indexes = usually unclustered

Clustered vs. Unclustered Index Data entries Dataentries (Index File) (Data file) DataRecords Data Records CLUSTERED UNCLUSTERED

Secondary Indexes • Applications: • index other attributes than primary key • index unsorted files (heap files) • index clustered data

Applications of Secondary Indexes • Clustered data Company(name, city), Product(pid, maker) Select pid From Company, Product Where name=maker and city=“Seattle” Select city From Company, Product Where name=maker and pid=“p045” Products of company 2 Products of company 3 Products of company 1 Company 1 Company 2 Company 3

Composite Search Keys Examples of composite key indexes using lexicographic order. • Composite Search Keys: Search on a combination of fields. • Equality query: Every field value is equal to a constant value. E.g. wrt <sal,age> index: • age=20 and sal =75 • Range query: Some field value is not a constant. E.g.: • age =20; or age=20 and sal > 10 11,80 11 12 12,10 name age sal 12,20 12 13,75 bob 12 10 13 <age, sal> cal 11 80 <age> joe 12 20 10,12 sue 13 75 10 20 20,12 Data records sorted by name 75,13 75 80,11 80 <sal, age> <sal> Data entries in index sorted by <sal,age> Data entries sorted by <sal>

Lecture 17: Data Storage