Efficient Data Layout Techniques in Database Systems

Chapter 13Second half 遠山研- 教育輪講 Brice Pesci Database systems 教育輪講　Brice Pesci

References • The book itself • Database Systems • Second edition • Complete book • Pearson International Edition • This presentation • Chapter 13 - Secondary Storage Management • Second part • p590 – p617 教育輪講　Brice Pesci

Content of the presensation • Records • Blocks • Spanned records • BLOBs • Databases addresses • Pointer Swizzling • Tombstones • Pinned blocks 教育輪講　Brice Pesci

Arranging data on disk • Record • Represent data • Consecutive bytes in some disk block • Collections • Ex : relations • Records in one or more blocks • We shall overview the basic layout techniques 教育輪講　Brice Pesci

Fixed-length records • Simplest sort of record • Only fixed-length fields • One for each attribute (see tuple) • Very common to begin all fields at a multiple of 4 or 8 • More efficient reading/writing is some cases • Space not used by the previous field is wasted • The records are manipulated by main memory yet there are kept in secondary memory • Need to lay out the record efficiently 教育輪講　Brice Pesci

Fixed-length records • Header • Schema for the relation • Length of record • Timestamp • Pointers to the fields • Fields • Data itself 教育輪講　Brice Pesci

Fixed-length records • Example CREATE TABLE MovieStar( name CHAR(30) PRIMARY KEY, address VARCHAR(255), gender CHAR(1) birthdate DATE ); to schema length timestamp gender name address birthdate 0 12 44 300 304 316 header 教育輪講　Brice Pesci

Packing into blocks • Records representing tuples of a relation • Fixed length records are stored in blocks of the disk • Moved into main memory when there is a need • Access • Update header Record 1 Record 2 ... Record n 教育輪講　Brice Pesci

Packing into blocks • The block header • Links to one or more other blocks • For creating indexes to the tuples of a relation • Information about the role played by this block • Information about which relation the tuples belong to • A ‘directory’ with the offset of each record • Timestamps • Last modification/access 教育輪講　Brice Pesci

Representing blocks and record addresses • In main memory • The address of a block is the virtual-memory address of its first byte • The adress of a record within that block is the virtual memory • In secondary storage • Block : not part of the application’s virtual memory address space • Sequence of bytes : overall position (device ID disk, the cylinder number, ...) • Record : block address and offset of first byte 教育輪講　Brice Pesci

Addresses in client-server arch. • Processes • A server process • Provides data from secondary memory • One or more client processes • Applications using the data • Can be distributed over manys machines • Client applications uses ‘virtual address’ space (32 bits..) • OS or DBMS decides which part is in main memory • Hardware maps virtual addresses to physical addresses 教育輪講　Brice Pesci

Addresses in client-server arch. • The server’s data is in a database address space • Refers to blocks and offsets within blocks • Physical addresses • Byte strings that determines where in the secondary storage system the block can be found (quite long) • The host to which the storage is attached • The identifier for the disk or device on which the block is located • The number of the cylinder of the disk • The number of blocks within the track • (optional) The offset of the beginning of the record within block 教育輪講　Brice Pesci

Addresses in client-server arch. • Logical addresses • One for each block • Arbitrary string of bytes of fixed length • A ‘map table’ relates logical to physical addresses logical physical logical address physical address 教育輪講　Brice Pesci

Logical & structured addresses • Flexibility thanks to the level of indirection • If we move records around, all we need is to change the entries in the map table since the pointers to the record refers to this table • Many possibilities : an exemple • Physical address for the block (but not the offset) • Key value address for the record being referred to • To find a record • We use physical part to reach the block with the record • We examine the records of the block to find the one with the key 教育輪講　Brice Pesci

Logical & structured addresses • Another exemple • Keeping in each block an offset table which stores the offset of the records within the block • Record are placed starting at the end of the block • Useful when record need not to be of equal length header unused offset table Record3 Record2 Record1 教育輪講　Brice Pesci

Logical & structured addresses • The address of a record is now the physical address of its block plus the offset of the entry is the block’s offset table for this record • Advantages of this indirection • We can move the record around the block and we only have to change the record’s entry • We can move the record to another block if the offset table hold a forwarding address giving the new location • We can leave ‘tombstones’ when we delete records • Prevents pointer from leading to some new records 教育輪講　Brice Pesci

Management of pointers • Often, pointer or addresses are part of records • Common for tuples that represents objects • Attributes of type pointers in obj-rel database systems • Index structures have blocks with pointers within them • For every block, record, object, referenceable data • Database address in server’s address space • Memory address if the item if copied in virtual memory • On secondary storage : must use database address • On main memory : more efficient to use memory address when item has pointer (single instruction) 教育輪講　Brice Pesci

Translation map • We need a ‘translation map’ • Translates from all those database addresses that are currently in virtual memory to their current memory address DBaddr mem-adr databaseaddress memory address 教育輪講　Brice Pesci

Translation table & map table • The ‘translation table’ is different from the ‘map table' • Logical and physical addresses represent both database addresses • Memory addresses in the translation table are copies of the corresponding object in memory • All addressable items in the database have entries in the map table • Only items currently in memory are in the translation table 教育輪講　Brice Pesci

Pointer swizzling • Cost from translating repeatedly from database addresses to memory addresses • Ppointer swizzling’ • When we move a block from secondary to main memory, pointers within the block may be translated from the database address to the virtual address space • A pointer consists of : • A bit indicating the nature of the pointer (DB or mem) • The DB or mem pointer as appropriate 教育輪講　Brice Pesci

Pointer swizzling • Exemple read into memory Disk Memory swizzled Block 1 unswizzled Block 2 教育輪講　Brice Pesci

Automatic swizzling • As soon as a block is brought into memory • We locate all its pointers and addresses • We enter them into the translation table (if not there) • They include both pointer from records in the block to elsewhere and the adresses of the block itself and/or its records • We need a mechanism to locate the pointer from within a block • List of where the pointers are in the block header? • If records with know schemas, using the schema? 教育輪講　Brice Pesci

Automatic swizzling • When we enter into the translation table the addresses for the block just moved in memory • We may create the translation table entry for the databases addresses straightforwardly • Since we know where the block has been buffered in memory 教育輪講　Brice Pesci

Automatic swizzling • When we insert one of the database addresses into the translation table • If it is already in the table (because the block is in mem) • We replace it in the block just moved to memory by the corresponding memory address • We set the swizzled bit to ‘true’ • If not • The block has not been copied into main memory • We cannot swizzled this pointer and leave it as DB pointer 教育輪講　Brice Pesci

Automatic swizzling • During the use of this data, we follow P, a pointer • P is still unswizzled, that is in the form of DB pointer • We consult the translation table to see if P has a memory equivalent • If not, block B must be copied into a memory buffer • Once B is in memory, we can swizzle P by replacing its database form by the equivalent memory form 教育輪講　Brice Pesci

Swizzling on demand • Leave all pointers unswizzled when the block is first brought into memory • We enter its address and addresses of its pointers into the translation table along with memory equivalents • If we follow a pointer P that is inside some block of memory, we swizzle it, using the strategy described earlier • Might save time if the possibility that a swizzled pointer will never be followed 教育輪講　Brice Pesci

Swizzling on demand • Interresting option • Arrange so that database pointers looks like invalid memory addresses • If so, we can akways follow any pointer as if it were in memory form • If the pointer is unswizzled : hardware trap • If this causes the DBMS to call a ‘swizzle’ function, then we can follow pointer in single instruction • We only need to do something time consuming when the pointer is unwizzled 教育輪講　Brice Pesci

Programmer control of swizzling • We know whether the pointer are likely to be followed • The programmer specify explicitely that a block loaded in memory is to have pointers swizzled • The programmer may swizzle the addresses only as needed • Exemple • Block is the root block of a B-tree -> the pointers will be swizzled • If the blocks are loaded then used only once and then dropped, no need to do so 教育輪講　Brice Pesci

Returning blocks to disk • When a block is moved from memory back to disk, we need to unswizzle the pointers within that block • In principle, it is possible to find, given a memory address, the database address to which the memory address is assigned • We use the translation table • However, we do not want this operation to require a search through the entire translation table 教育輪講　Brice Pesci

Returning blocks to disk • If we consider the translation table as a relation • Then the problem of finding the memory address associated with a database address x is the query : • The reverse query would be SELECT memAddr FROM TranslationTable WHERE dbAddr = x; dbAddr memAddr SELECT dbAddr FROM TranslationTable WHERE memAddr = y; Translation table 教育輪講　Brice Pesci

No swizzling • We never swizzle pointers • We still need the translation table in order the pointers to be followed in their unswizzled form • Records cannot be pinned in memory • If it cannot be written back to disk safely at the moment • No need to make decisions about which form of pointer is present 教育輪講　Brice Pesci

Pinned records and blocks • In the header of a block, a bit tells whether it is pinned • Beacause of pointer swizzling blocks may be pinned • If a block B1 has within it a swizzled pointer to same data item in block B2 • When moving back B2 back to disk and reusing its main memory buffer • Should we follow the pointer in B1, it will lead us to the buffer, which no loner holds B2! • B2, referred by a swizzled pointer from somewhere else is pinned 教育輪講　Brice Pesci

Pinned records and blocks • The same exemple Disk Memory swizzled swizzled Block 1 Block 1 when bringing back B2 Block 2 Block 2 教育輪講　Brice Pesci

Pinned records and blocks • This is why we need to unswizzle any pointer in this kind of block • We also need to make sure it’s not pinned • If it is pinned, we must unpin it or let the block remain in memory, occupying extra space • To unpin a block pinned because of swizzled pointer from outside, we must unswizzle any pointer to it • Thus the translation table need to store for each DB address whose data item is is memory the places in memory where swizzled pointer to that item exists 教育輪講　Brice Pesci

Pinned records and blocks • Possibles approaches : • Keep the list of references to a memory address as a linked attached to the entry for that address in the translation table • If memory addresses are shorter than DB addresses, we create the linked list in the space used for the pointer themselves, each space used for a DB pointer is replaced • By the swizzled pointer • And another pointer that forms part of a linked list of all occurences of this pointer 教育輪講　Brice Pesci

Pinned records and blocks • Illustration of a linked list of occurences of a swizzled pointer dbAddr memAddr y x y y Translation table Swizzled pointer 教育輪講　Brice Pesci

Variable-length data and records • We may have to represent : • Data items whose size varies • Ex : Size of strings • Repeating fields • Ex : With many-many relationship, we have to store references to as many objects as related to the given object, ... • Variable-format data • Ex : Record representing XML element • Enormous fields • Ex : Encoding of movies, ... 教育輪講　Brice Pesci

Records with variable-length fields • If one or more fields of a record have variable length, then the record must contain enough information to let us find any fields of the record • An effective scheme is to put all fixed-length fields ahead of the variable-length fields • We place in the record header : • The length of the record • Pointers to (offsets of) the beginnings of all variable-length fields other than the first (because it follows immediately the fixed-length fields) 教育輪講　Brice Pesci

Records with variable-length fields • Example : • The same MovieStars record but with name and address implemented as variable-length characters strings other header info record length There is no need for apointer to the beginningof the name to address gender birthdate address name 教育輪講　Brice Pesci

Representing Null values • Tuples often have fields that may be NULL • The previous record format has a conveniant way to represent NULL • If address is NULL, then we put a NULL pointer in the place for the pointer • This way we can save space • And this, even if address is a fixed-length fields and has often the value NULL 教育輪講　Brice Pesci

Records with repeating fields • It is sufficient to groupe all the occurences of field F together and put in the record header a pointer to the first • Then we add to the offset for the field F all integer multiple of the number of bytes for one instance of field F to reach them • Until we reach the offset of the next field or the end of the record 教育輪講　Brice Pesci

Records with repeating fields • Exemple : • Now out MovieStars only hold the name and address (variable-length strings) and pointers to all the movies of the star other header info record length pointers to movies to address to movie pointers name address 教育輪講　Brice Pesci

Records with repeating fields • Another representation is to keep the record of fixed length • And put the variable-length portion on a separate block • We keep in the record • Pointer to the place where each repeating field begins • And either how many repetitions there are, or where they end 教育輪講　Brice Pesci

Records with repeating fields • Exemple : • Variables-length fields name and address • Repeating field starredIn (a set of movies references) headerinfo to address length address to name length name tomov ref nbrref Record Additionalspace address name 教育輪講　Brice Pesci

Records with repeating fields • Advantages to this indirection • The record itself is fixed-length • More efficient search • Minimizes overhead in block headers • Allows records to be moved within or among blocks easily • Backdraws • Storing variable-length components on another block • Increases the number of disk I/O’s needed to examine all components of a record 教育輪講　Brice Pesci

Records with repeating fields • Compromise strategy • Keep in the fixed-length portion of the record • Some reasonnable number of occurences of repeating fields • A pointer to a place where additional occurences can be found • A count of how many occurences there are • If there are fewer than this number, some of the space would be unused • If there are more, then the pointer additional space is non NULL and we can find the occurences using this pointer 教育輪講　Brice Pesci

Variable-format records • Records may not have a fixed schema • The simplest representation of variable-format records is a sequence of tagged fields • Each of which consists of the value of the fields preceded b information about the role of the field • The attribute or field name • The type of the field and some readily available schema info • The length of the field 教育輪講　Brice Pesci

Variable-format records • Exemple : • MovieStars may have additional attributes • Movie directed, former spouses, ... code for name code for restaurant owned code for string type code for string type length length N S 14 Clint Eastwood R S 16 Hog’s Breath Inn ... 教育輪講　Brice Pesci

Records that do not fit in a block • DBMS often manage datatypes with large values • Often, values do not fit in one block • May fit but we want to save space • Records fragment • The portion of a record that appears in one block • Spanned record • A record with two or more fragments • Unspanned record • Record that does not cross a block boundary 教育輪講　Brice Pesci

Spanned records • If records can be spanned, every record fragment need extra header information • Each record or fragment must contain a bit telling whether it is a fragment • If it is a fragment, then it needs bits telling if it is the first or last fragment for its record • If there is a next and/or previous fragment for the same record, then the fragment needs pointer to these other fragments 教育輪講　Brice Pesci

Efficient Data Layout Techniques in Database Systems