CS 255: Database System Principles slides: Variable length data and record. By: Arunesh Joshi (107), Id: 006538558, Cs257_107_ch13_13.7
Agenda • Records With Variable-Length Fields • Records With Repeating Fields • Variable-Format Records • Records That Do Not Fit in a Block • BLOBs • Column Stores
Records With Variable-Length Fields A simple but effective scheme is to put all fixed-length fields ahead of the variable-length fields. We then place in the record header: 1. The length of the record. 2. Pointers to (i.e., offsets of) the beginnings of all the variable-length fields. However, if the variable-length fields always appear in the same order, then the first of them needs no pointer; we know it immediately follows the fixed-length fields.
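As an illustration of this layout, here is a minimal Python sketch. The function names, the 4-byte little-endian integers, and the sample field values are assumptions made for the example, not part of the slides. The header holds the record length followed by one offset for every variable-length field except the first.

```python
import struct

def pack_record(fixed: bytes, var_fields: list[bytes]) -> bytes:
    """Lay out: header | fixed-length part | variable-length fields."""
    n_ptrs = max(len(var_fields) - 1, 0)   # the first variable field needs no pointer
    header_size = 4 * (1 + n_ptrs)         # assumed: 4-byte unsigned integers
    pos = header_size + len(fixed)         # where the first variable field starts
    offsets = []
    for i, field in enumerate(var_fields):
        if i > 0:
            offsets.append(pos)            # offset of this field from record start
        pos += len(field)
    header = struct.pack(f"<{1 + n_ptrs}I", pos, *offsets)  # pos == record length
    return header + fixed + b"".join(var_fields)

def unpack_var_fields(record: bytes, fixed_len: int, n_var: int) -> list[bytes]:
    """Recover the variable-length fields using only the header."""
    n_ptrs = max(n_var - 1, 0)
    header = struct.unpack_from(f"<{1 + n_ptrs}I", record, 0)
    total, offsets = header[0], list(header[1:])
    first = 4 * (1 + n_ptrs) + fixed_len   # implied start of the first variable field
    starts = [first] + offsets if n_var else []
    ends = starts[1:] + [total]            # each field ends where the next begins
    return [record[s:e] for s, e in zip(starts, ends)]

rec = pack_record(b"F1970", [b"Carrie Fisher", b"123 Maple St."])
fields = unpack_var_fields(rec, fixed_len=5, n_var=2)
```

Because the record length is stored in the header, the end of the last variable-length field is known without a terminator byte.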
Records With Repeating Fields • A similar situation occurs if a record contains a variable number of occurrences of a field F, but the field itself is of fixed length. It is sufficient to group all occurrences of field F together and put in the record header a pointer to the first. • We can locate all the occurrences of the field F as follows. Let the number of bytes devoted to one instance of field F be L. We then add to the offset for the field F all integer multiples of L, starting at 0, then L, 2L, 3L, and so on. • Eventually, we reach the offset of the field following F, whereupon we stop.
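The offset arithmetic above can be sketched in a few lines of Python. The function name, parameter names, and the sample record are hypothetical; the sketch assumes the header has already given us the offset of F and the offset of the field that follows it.

```python
def occurrences(record: bytes, f_offset: int, next_offset: int, L: int) -> list[bytes]:
    """Return every L-byte occurrence of field F stored in [f_offset, next_offset)."""
    out = []
    k = 0
    # Visit f_offset + 0, f_offset + L, f_offset + 2L, ... until we reach
    # the offset of the field following F.
    while f_offset + k * L < next_offset:
        start = f_offset + k * L
        out.append(record[start:start + L])
        k += 1
    return out

# Hypothetical record: 3-byte header, three 4-byte occurrences of F, then another field.
record = b"HDR" + b"AAAABBBBCCCC" + b"tail"
values = occurrences(record, f_offset=3, next_offset=15, L=4)
```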
An alternative representation is to keep the record of fixed length, and put the variable-length portion - be it fields of variable length or fields that repeat an indefinite number of times - on a separate block. In the record itself we keep: 1. Pointers to the place where each repeating field begins, and 2. Either how many repetitions there are, or where the repetitions end.
Variable-Format Records • The simplest representation of variable-format records is a sequence of tagged fields, each of which consists of: 1. Information about the role of this field, such as: (a) The attribute or field name, (b) The type of the field, if it is not apparent from the field name and some readily available schema information, and (c) The length of the field, if it is not apparent from the type. 2. The value of the field.
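A tagged-field sequence can be sketched as follows. The byte layout here (1-byte name length, the name, a 1-character type code, a 2-byte value length, then the value) is an assumption chosen for the example, and `encode_tagged` and `decode_tagged` are hypothetical names.

```python
import struct

def encode_tagged(fields: list[tuple[str, str, bytes]]) -> bytes:
    """Each field is (name, one-character type code, raw value bytes)."""
    out = bytearray()
    for name, type_code, value in fields:
        name_b = name.encode("utf-8")
        out += struct.pack("<B", len(name_b)) + name_b   # role: field name
        out += type_code.encode("ascii")                 # role: field type
        out += struct.pack("<H", len(value))             # role: field length
        out += value                                     # the value itself
    return bytes(out)

def decode_tagged(buf: bytes) -> list[tuple[str, str, bytes]]:
    fields, pos = [], 0
    while pos < len(buf):
        (nlen,) = struct.unpack_from("<B", buf, pos)
        pos += 1
        name = buf[pos:pos + nlen].decode("utf-8")
        pos += nlen
        type_code = chr(buf[pos])
        pos += 1
        (vlen,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        fields.append((name, type_code, buf[pos:pos + vlen]))
        pos += vlen
    return fields

star = [("name", "S", b"Clint Eastwood"), ("restaurant", "S", b"Hog's Breath Inn")]
decoded = decode_tagged(encode_tagged(star))
```

Since every field carries its own tag, a record may omit fields or list them in any order, at the cost of storing the tags in every record.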
There are at least two reasons why tagged fields would make sense. 1. Information integration applications - Sometimes, a relation has been constructed from several earlier sources, and these sources have different kinds of information. For instance, our movie star information may have come from several sources, one of which records birthdates, while some give addresses and others do not, and so on. If there are not too many fields, we are probably best off leaving the values we do not know as NULL. 2. Records with a very flexible schema - If many fields of a record can repeat and/or not appear at all, then even if we know the schema, tagged fields may be useful. For instance, medical records may contain information about many tests, but there are thousands of possible tests, and each patient has results for relatively few of them.
Records That Do Not Fit in a Block • Some values are too large to fit in a single block. These large values often have a variable length, but even if the length is fixed for all values of the type, we need to use some special techniques to represent these values. In this section we shall consider a technique called "spanned records" that can be used to manage records that are larger than blocks. • Spanned records are also useful in situations where records are smaller than blocks, but packing whole records into blocks wastes significant amounts of space. For both these reasons, it is sometimes desirable to allow records to be split across two or more blocks. The portion of a record that appears in one block is called a record fragment.
If records can be spanned, then every record and record fragment requires some extra header information: 1. Each record or fragment header must contain a bit telling whether or not it is a fragment. 2. If it is a fragment, then it needs bits telling whether it is the first or last fragment for its record. 3. If there is a next and/or previous fragment for the same record, then the fragment needs pointers to these other fragments. [Figure: Storing spanned records across blocks]
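The header information listed above can be sketched in Python. The `Fragment` class, its field names, and the use of block numbers as "pointers" are assumptions for illustration; a real system would pack these flags into a few header bits.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fragment:
    is_fragment: bool           # item 1: is this a fragment or a whole record?
    is_first: bool              # item 2: first fragment of its record?
    is_last: bool               # item 2: last fragment of its record?
    prev_block: Optional[int]   # item 3: pointer to the previous fragment
    next_block: Optional[int]   # item 3: pointer to the next fragment
    payload: bytes

def split_record(record: bytes, capacities: list[int]) -> dict[int, Fragment]:
    """capacities[i] is the free payload space (in bytes) assumed for block i."""
    pieces, pos = [], 0
    for block_no, cap in enumerate(capacities):
        if pos >= len(record):
            break
        if cap <= 0:
            continue                       # a full block holds no fragment
        pieces.append((block_no, record[pos:pos + cap]))
        pos += cap
    if pos < len(record):
        raise ValueError("not enough free space for the record")
    spanned = len(pieces) > 1
    frags = {}
    for i, (block_no, payload) in enumerate(pieces):
        frags[block_no] = Fragment(
            is_fragment=spanned,
            is_first=spanned and i == 0,
            is_last=spanned and i == len(pieces) - 1,
            prev_block=pieces[i - 1][0] if i > 0 else None,
            next_block=pieces[i + 1][0] if i < len(pieces) - 1 else None,
            payload=payload,
        )
    return frags

def reassemble(frags: dict[int, Fragment], start_block: int) -> bytes:
    """Follow next-fragment pointers from the first fragment."""
    out, block = bytearray(), start_block
    while block is not None:
        out += frags[block].payload
        block = frags[block].next_block
    return bytes(out)

record = bytes(range(10))
frags = split_record(record, [4, 0, 4, 8])   # block 1 is full, so it is skipped
```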
BLOBs • Binary Large OBjectS = BLOBs • BLOBs can be images, movies, audio files and other very large values that can be stored in files. • Storing BLOBs • Stored in several blocks. • Preferable to store them consecutively on a cylinder, or striped across multiple disks, for efficient retrieval. • Retrieving BLOBs • A client retrieving a 2-hour movie may not want it all at the same time. • Retrieving a specific part of the large data requires an index structure to make it efficient. (Example: An index by seconds on a movie BLOB.)
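One way to picture the index-by-seconds example is the Python sketch below. It assumes a hypothetical index in which entry s is the byte offset where second s of the movie begins (with a final entry equal to the BLOB's total length), and a fixed block size; neither detail comes from the slides.

```python
def blocks_for_interval(index_offsets: list[int], block_size: int,
                        t_start: int, t_end: int) -> range:
    """Block numbers of the BLOB that hold seconds [t_start, t_end)."""
    start_byte = index_offsets[t_start]
    end_byte = index_offsets[t_end]        # exclusive upper bound
    if end_byte <= start_byte:
        return range(0)                    # empty interval, nothing to fetch
    first_block = start_byte // block_size
    last_block = (end_byte - 1) // block_size
    return range(first_block, last_block + 1)

# Hypothetical index for a 4-second clip stored in 4096-byte blocks.
offsets = [0, 1000, 2500, 4096, 6000]
needed = list(blocks_for_interval(offsets, 4096, 3, 4))
```

With such an index, a client that wants only one scene reads just the blocks returned, rather than the whole BLOB.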
Column Stores An alternative to storing tuples as records is to store each column as a record. Since an entire column of a relation may occupy far more than a single block, these records may span many blocks, much as long files do. If we keep the values in each column in the same order, then we can reconstruct the relation from the column records.
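The reconstruction idea can be sketched in Python, using in-memory lists as stand-ins for column records. The function names and the sample relation are hypothetical.

```python
def to_columns(rows: list[tuple], attrs: list[str]) -> dict[str, list]:
    """Store each column separately, keeping values in tuple order."""
    return {a: [row[i] for row in rows] for i, a in enumerate(attrs)}

def to_rows(columns: dict[str, list], attrs: list[str]) -> list[tuple]:
    """Rebuild tuples: the i-th value of every column belongs to the i-th tuple."""
    return list(zip(*(columns[a] for a in attrs)))

attrs = ["name", "birth_year"]
rows = [("Carrie Fisher", 1956), ("Mark Hamill", 1951)]
columns = to_columns(rows, attrs)
rebuilt = to_rows(columns, attrs)
```

The round trip works only because every column keeps its values in the same order; reordering one column independently would silently pair values with the wrong tuples.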