290 likes | 548 Views
File Organization and Storage Structures . Chapter 5. Basic Concepts. The database on secondary storage is organized into one or more files, where each file consists of a number of records. Each record consists of one or more fields.
E N D
File Organization andStorage Structures Chapter 5
Basic Concepts The database on secondary storage is organized into one or more files, where each file consists of a number of records. Each record consists of one or more fields. Typically, a record corresponds to an entity and a field to an attribute. The physical record is the unit of transfer between disk and primary storage, and vice versa. A physical record , sometimes called block or page, contains mostly several logical records, depending on the size of the records.
List structures • Elementary list • Singular list • Circular list Symmetric list Symmetric circular list
Sequential insertion X(1) X’(1)=X(1) X(2) X’(2)=Y X(3) X’(3)=X(2) X(4) X’(4)=X(3) X’(5)=X(4) Free Zone free Zone
Insertion with pointer technique X(1) X’(1)=X(1) X(3) X’(4)=X(3) X(2) X’(3)=X(2) X(4) X’(5)=X(4) Y X’(2)=Y
Multi-list structure 2000 A 2020 list1 K 2000 2030 list2 B 2010 -1 list empty places 2040 L -1 2050 2060 . . . record with pointer record length 10 address 3000 3000 -1
Insertion at beginning of list 2 2000 A 2020 list1 K 2000 2030 list2 B 2040 -1 2050 L -1 M 2010 2060 . . . List1: A B List2: M K L 3000 3000 -1
General tree structure A C B D E K L F H J M N P Q R
Equivalent binary tree structure A B C J K L H F D E R Q N P M
Pointer Implementation A -1 B C -1 -1 -1 -1 J K L H -1 D E -1 F -1 -1 Q -1 R -1 -1 M N -1 P -1 -1 -1
Bi-directional tree X - first lower - higher - next Entry -1 X Y R S Y -1 S -1 R Z U T -1 T -1 Z -1 U
Ring structure Entry X X Y Z U V T R Y Z U V T R
File Organization File Organization • The physical arrangement of data into records and pages on secondary storage • Main types • Heap or unordered • Sorted • Hash Access method • The steps involved in storing and retrieving records from a file
Sample Data SUPPLIER file SNUM SNAME STATUS CITY S1 De Smet 20 London S2 Janssens 10 Paris S3 Blanchart 30 Paris S4 Clark 20 London S5 Adams 30 Athens
Hash Files Hashing techniques 0 1 • Duplicate handling • - open addressing • - unchained overflow • - Chained overflow • - Multiple hashing • Hashing algorithms • - folding • - mid-square • - division by • prime number S300 Blanchart 30 Paris 2 3 4 5 S200 Janssens 10 Paris 6 7 S500 Adams 30 Athens 8 9 S100 De Smet 20 London 10 11 S400 Clark 20 London 12 Limitations: - inappropriate for value ranges - retrieval on the non-hash fields
An Index An index provides an ACCESS PATH to the file it is indexing • a file may have several associated indexes • the sequential access path is always available • an index imposes an ordering on the file it is indexing • it can be used for direct access • it speeds up retrieval and slows down updating • it is not the same thing as a key • can be build on combinations of fields • can be SRA or symbolic
Sample Data SUPPLIER file SNUM SNAME STATUS CITY S1 De Smet 20 London S2 Janssens 10 Paris S3 Blanchart 30 Paris S4 Clark 20 London S5 Adams 30 Athens
Supplier file with index on city SNUM SNAME STATUS CITY S1 De Smet 20 London S2 Janssens 10 Paris S3 Blanchart 30 Paris S4 Clark 20 London S5 Adams 30 Athens Supplier file City-index Athens . London . London . Paris . Paris .
Supplier file with two indexes SNUM SNAME STATUS CITY S1 De Smet 20 London S2 Janssens 10 Paris S3 Blanchart 30 Paris S4 Clark 20 London S5 Adams 30 Athens Supplier file City-index 10 20 20 30 30 Athens . London . London . Paris . Paris .
Non-dense index SNUM SNAME STATUS CITY S1 De Smet 20 London S2 Janssens 10 Paris S3 Blanchart 30 Paris S4 Clark 20 London S5 Adams 30 Athens SNUM-index S2 . S4 . S5 . block 1 block 2 block 3
Factoring out a field Supplier file SNUM SNAME STATUS CITY-pointer S1 De Smet 20 S2 Janssens 10 S3 Blanchart 30 S4 Clark 20 S5 Adams 30 CITY-file CITY Athens London Paris
Combining Indexing and factoring out Athens London Paris S1 De Smet 20 S2 Janssens 10 S3 Blanchart 30 S4 Clark 20 S5 Adams 30
Parent - Child structure CITY file Athens London Paris S1 De Smet 20 S2 Janssens 10 SUPPLIER file S3 Blanchart 30 S4 Clark 20 S5 Adams 30
Fully inverted file SNAME-index STATUS-index CITY-index Supplier- file De Smet S1-> 10 S1-> Athens S5-> S1 Janssens S2-> 20 S1->,S4-> London S1->,S4-> S2 Blanchart S3-> 30 S3->,S5-> Paris S2->,S3-> S3 Clark S4-> S4 Adams S5-> S5
File organization: Indexed-sequential Behr Dooms Fagin parameters - index block size - data block size multi-level index blocks Ernest Fagin Adams Albert Behr Bodoo Claes Codd Dooms Ace Adamo Adams Alois Ball Behr Ademar Aerts Alan Albert Bens Bodoo data blocks
B-tree concept BALANCED tree 25 144 non-dense index 9 - 64 100 196 - 1 4 - 9 16 - 25 36 49 64 81 - 100 121 - 144 169 - 196225250 dense index
B-tree insertion same B-tree after insertion of record 32 64 - 144 - 25 - non-dense index 100 - 196 - 9 - 36 - 1 4 - 9 16 - 25 32 - 36 49 - 64 81 - 100121 - 144 169 - 196225256 dense index
B-tree deletion Deletion of 64 25 81 non-dense index 9 - 36 - 144 196 1 4 -- 9 16 - 25 32 - 36 49 - 81 100 121 144169 - 196225 256