File Structures

File Structures Dale-Marie Wilson, Ph.D.

Basic Concepts • Primary storage • Main memory • Inappropriate for storing database • Volatile • Secondary storage • Physical storage e.g. magnetic disks • Nonvolatile • Cheaper

Basic Concepts • 2° storage organized into files • Each file has one or more records • Each record has one or more fields Process • User requests tuple e.g. SG37 • DBMS maps logical record to physical record • Physical record moved to DBMS buffers • N.B. Physical record is unit of transfer between disk and primary storage

Basic Concepts • Physical record typically consists of more than 1 logical record • Logical record can correspond to more than 1 physical record • Refer to physical record as blocks and pages Page 1 2

Basic Concepts • File organization • Physical arrangement of data in file into records and pages in 2° storage • Determines order records stored and accessed • Types • Heap (unordered) • Records place on disk in no specific order • Sequential (ordered) • Records ordered by value of specific field • Hash • Records placement determined by hash function

Basic Concepts • Access method • Steps involved in storing and retrieving records from file

Heap Files • Unordered files • Aka heap files • Simplest organization • Records placed in same order inserted • Linear search for retrieval • Insertion efficient; retrieval not efficient • Deletion process • Relevant page identified • Record marked as deleted • Page rewritten to disk • N.B. deleted record space not reused → performance deterioration • Best suited for bulk loading data

Ordered Files • Ordered files • Aka sequential files • Sorted on field – ordering field • If ordering field = key → ordering key • Binary search for retrieval • Insertion and deletion problematic • Need to maintain order of records • Rarely used unless 1° index exists

Hash Files • Hash files • Aka random/direct files • Hash function used to det. page address for storing record • Chosen to provide most even distribution of records – min. collisions • Examples: • Folding – applying arithmetic function to hash field e.g. + 7 • Division-remainder – uses mod function to det. field value • Each address corresponds to a page/bucket • Each bucket has slots for multiple records – placed in order of arrival • Base field – hash field • If hash field = key → hash key • Collision • Hash function does not calculate unique address for 2 or more records

Hash Files • Collision management techniques • Open addressing • Unchained overflow • Chained overflow • Multiple hashing

Collision Management • Open addressing • Linear search performed to locate 1st available slot • Same procedure for searching for record • Record doesn’t exist if empty slot found before record located

Collision Management • Unchained overflow • Overflow area maintained for collisions • Improves over open addressing by minimizing collisions Bucket Bucket Staff SA9 record Staff SL21 record 0 Staff SL41 record 3 4 Staff SG37 record 1 Staff SG5 record Staff SG14 record 2

Collision Management • Chained overflow • Overflow area maintained for collisions • Uses synonym pointer • Additional field that indicates whether collision occurred • If collision, contains bucket address of overflow area Bucket Bucket Staff SA9 record Staff SL21 record 0 Staff SG7 record 3 0 4 Staff SG37 record 1 0 Staff SG5 record Staff SG14 record 3 2

Collision Management • Multiple hashing • If collision occurs, new hash function performed • 2nd hash function typically used to place record in overflow area

Indexes • Index • Data structure that allows DBMS to locate particular records in file more quickly • Similar to index in book • Main types of indices: • Primary index • Index a key field • Clustering index • File sequentially ordered on non-key field i.e. more than record can correspond with index • Secondary index • Index defined on non-ordering field of data file

Indexes • File can have: • At most 1 primary or 1 clustering index • Several secondary indices • Index may be: • Dense • Index record for every search key value • Sparse • Index record for some key search values

Indexed Sequential Files • Indexed sequential file • Sorted data file with primary index • Has: • Primary storage area • Separate index • Overflow area

Multilevel Index • Multilevel index • Index treated as file and split into smaller indices • Overcomes problems with large indices that span several pages

B+ Trees • Search Tree • Used to guide search for a record, given the value of one of its fields • Two types of Nodes • Internal Nodes contain Key values and node pointers • Leaf Nodes containKey, Record-Pointer pairs • Degree/order • Max # children allowed • B-tree – balanced tree • Depth from root to leaf same for every leaf

B+ Trees • The structure of internal nodes in a B+ tree of order p: • Each internal node is of the form <P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq > , where q <= p , each Pi is a tree pointer • Within each internal node, K1 < K2 < ... < Kq-1 • For all values of X in the subtree pointed at by Pi , we have Ki-1 < X < Ki for 1 < i < q , X < Ki for i=q, and Ki-1 < X for i=q • Each internal node has at most p tree pointers • Each internal node, except the root, has at least (p/2) tree pointers. The root node has at least two tree pointers if it is an internal node. • An internal node with q pointers, q <= p, has q-1 search field values.

B+ Trees

B+ Trees • The structure of leaf nodes in a B+ tree of order p: • Each leaf node is of the form < <K1,Pr1>, <K2,Pr2>, ..., <Kq-1,Prq-1>, Pnext > , where q <= p , each Pri is a data pointer that points to a record or block of records • Within each internal node, K1 < K2 < ... < Kq-1 • Each leaf node, has at least (p/2) values • All leaf nodes are at the same level • The Pnext pointer points to the next leaf node in the tree • This give efficient sequential access to data

B+ Trees

B+ Trees • Insertion example for B+ Tree: • When you insert into a leaf node that is full, you split and pass the rightmost value up to the parent • When you insert into a full root, the root splits and a new root is created with the middle value from the child nodes • Otherwise, values are inserted into openings at the lowest level

Appendix F • Assignment #7

File Structures

File Structures

Presentation Transcript

File Structures CIS 256

Data File Structures

MATLAB: Structures and File I

Chap1 . Introduction to File Structures

File Structures

File System and Structures

Understanding the File Structures

Indexing and Searching (File Structures)

Comp 335 – File Structures

CS 231 File Structures

Comp 335 File Structures

Unix File System Internal Structures

Comp 335 File Structures

Introduction to File Structures

File Systems Control Structures

Comp 335 – File Structures

Basic File Structures and Hashing

Comp 335 File Structures

The Development of File Structures

Advance Data Structures FILE STRUCTURE AND FILE ORGANIZATION

Indexing and Searching (File Structures)