1 / 32

Database Implementation Issues

Explore specialized algorithms for storing data records efficiently and reliably. Learn about tuple storage, variable length fields, record headers, managing blocks, and techniques for adding, deleting, and modifying records in a database system.

Download Presentation

Database Implementation Issues

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Database ImplementationIssues CSCE 315 – Programming Studio, Spring 2019 Robert Lightfoot Slides adapted from those used by Jennifer Welch and John Keyser

  2. Database Implementation • Typically, we assume databases are very large, used by many people, etc. • So, specialized algorithms are usually used for databases • Efficiency • Reliability

  3. Storing Data • Other terminology for implementation • Relation is a table • Tuple is a record • Attribute is a field

  4. Storing a Record (Tuple) • Often can assume all the fields are fixed (maximum) length. • For efficiency, usually concatenate all fields in each tuple. • Variable length: store max length possible, plus one bit for termination • Store the offsets for concatenation in a schema

  5. Example: tuple storage • Senator • Name – variable character (100 + 1 bytes) • State – fixed character (2 bytes) • YearsInSenate – integer (1 byte) • Party – variable character (11 + 1 bytes) 116 0 101 103 104

  6. More on tuples/records • So, schema would store: • Name: 0 • State: 101 • YearsInSenate: 103 • Party: 104 • Note that HW/efficiency considerations might give minimum sizes for each field • e.g. multiple of 4 or 8 bytes

  7. Variable Length Fields • Storing max size may be problematic • Usually nowhere close – waste space • Could make record too large for a “unit” of storage (i.e., disk block) • Storing pointers may be problematic • Record is not known size, thus variable length records • Not much locality in memory • Extra storage for pointer • Dynamic memory allocation is slow

  8. Variable Length Fields • Compromise: “local” pointers • Store fixed-length records, followed by variable-length • Variable data still kept locally • Header stores info about variable fields • Pointer to start of each • Not great if data in variable field will change

  9. Record Headers • Might want to store additional key information in header of each record • Schema information (or pointer to schema) • Record size (if variable length) • Timestamp of last modification

  10. Record Headers and Blocks • Records grouped into blocks • Correspond with a “unit” of disk/storage • Header information with record positions • Also might list which relation it is part of. • Concatenate records Header Record 1 Record 2 … Record n

  11. Addresses • Addresses of (pointers to) data often represented • Two types of address • Location in database (on disk) • Location in memory • Translation table usually kept to map items currently in virtual memory to the overall database. • Pointer swizzling: updating pointers to refer to disk vs. memory locations

  12. Records and Blocks • Sometimes want records to span blocks • Generally try to keep related records in the same block, but not always possible • Record too large for one block • Too much wasted space • Split parts are called fragments • Header information of record • Is it a fragment • Store pointers to previous/next fragments

  13. Adding, Deleting, Modifying Records • Insertion • If order doesn’t matter, just find a block with enough free space • Later come back to storing tables • If want to keep in order: • If room in block, just do insertion sort • If need new block, go to overflow block • Might rearrange records between blocks • Other variations

  14. Adding, Deleting, Modifying Records • Deletion • If want to keep space, may need to shift records around in block to fill gap created • Can use “tombstone” to mark deleted records • Modifying • For fixed-length, straightforward • For variable-length, like adding (if length increases) or deleting (if length decreases)

  15. Keeping Track of Tables • We have a bunch of records stored (somehow). • We need to query them (SELECT * FROM table WHERE condition) • Scanning every block/record is far too slow • Could store each table in a subset of blocks • Saves time, but still slow • Use an index

  16. Index • A general term – applies to other settings (e.g., index of a file, index of a dictionary, index of search engine) • It is a separate file describing the main data file • Describe here means giving random access to the data file by some search keys • Example: Apple Apricot Ball Banana Blazer Cannot Cat Code Coil …. A B C Index file Data file

  17. Index in Database Terminology • Special data structures to find all records that satisfy some condition • Possible indexes • Simple index on sorted data • Secondary index on unsorted file • Trees (B-trees) • Hash Tables

  18. Sorted files • Sort records of the relation according to field (attribute) of interest. • Attribute of interest is search key • Might not be a “true” key • Index stores (K,a) values • K = search key • a = address of record with K

  19. Dense Index • One index entry per record • Useful if records are huge, and index can be small enough to fit in memory • Can search efficiently and then examine/retrieve single record only 1 5 7 7 10 12 18 18 18 27 30 35 43 44 65 73 1 5 7 7 10 12 18 18 18 27 30 35 43 44 65 73

  20. Sparse Index(on sequential file) • Store an index for only every n records • Use that to find the one before, then search sequentially. 1 7 12 27 44 1 5 7 7 10 12 18 18 18 27 30 35 43 44 65 73

  21. Multiple Indices • Indices in hierarchy • B-trees are an example 1 27 1 7 12 27 44 1 5 7 7 10 12 18 18 18 27 30 35 43 44 65 73

  22. Duplicate Keys • Can cause issues, in both dense and sparse indexes, need to account for 1 7 12 27 44 1 5 7 7 10 12 18 18 18 27 30 35 43 44 65 73

  23. What if not sorted? • Can be the case when we want two or more indices on the same data • e.g. Senator.name, Senator.party • Must be dense (sparse would make no sense) • Can sort the index by the search key • This second level index can be sparse

  24. Example – Secondary Index 1 27 1 7 12 27 44 1 5 7 7 10 12 18 18 18 27 30 35 43 44 65 73 5 35 18 43 12 44 73 1 65 10 7 18 30 27 7 18

  25. Buckets • If there are lots of repeated keys, can use buckets • Buckets are in between the secondary index and the data file • One entry in index per key – points to bucket file • Bucket file lists all records with that key

  26. Example – Buckets 1 7 12 27 44 1 5 7 10 12 18 27 30 35 43 44 65 73 7 18 18

  27. Putting It All Together • To handle variable length records (e.g. VARCHAR(x)), we now have a header for every record • If everything were fixed length and we did not care about wasted space, only per-table header would suffice • To deal with long disk latency, group as many records in a disk block as possible • The rest of the block is empty • We need a block header to keep track header Record 3 Record 2 Record 1 A disk block containing records from a table

  28. Putting It All Together (contd.) • Insert/Delete/Update operations put this design into test • Delete is easier, just mark as deleted • Insert/Update might require a lot of “move-to-right” operations for making room for new/increased data • But empty spaces in most blocks, the “move-to-right” operation does not propagate too much • Usually an empty spot is found in the current or surrounding blocks • If no surrounding blocks found, overflow blocks can be used • Thus, insert/update operations does not undo sorted files header Record 3 Record 2 Record 1

  29. Putting It All Together - Indices • Index is a separate file along with the data file • Often small enough to fit in the RAM • The main purpose of indices is to avoid scanning all disk blocks • Each index entry contains a key and a pointer to the record containing the key • Index often resides in the RAM, while the record in the disk • Two types of indices: • Dense Index (applies to sorted or unsorted files) • Sparse (only for sorted files)

  30. Storage Considerations • Memory Hierarchy • Cache • Main Memory • Secondary storage (disk) • Tertiary storage (e.g. tape) • Smaller amounts but faster access • Need to organize information to minimize “cache misses”

  31. Storage Considerations:Making things efficient • Placing records together in blocks for group fetch • Prefetching • Prediction algorithm • Parallelism/Striping • placing across multiple disks to read/write faster • Example: RAID • Mirroring • double the reliability

  32. Storage ConsiderationsMaking it reliable • Checksums • Mirroring disks • Parity bits • RAID levels • Mostly, out of the scope of this course

More Related