1 / 20

Storage and File Structure

Storage and File Structure. DBMS and Storage/File Structure. Why do we need to know about storage/file structure Many database technologies are developed to utilize the storage architecture/hierarchy Data in the database needs to be organized and stored/retrieved efficiently.

Download Presentation

Storage and File Structure

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Storage and File Structure Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  2. DBMS and Storage/File Structure • Why do we need to know about storage/file structure • Many database technologies are developed to utilize the storage architecture/hierarchy • Data in the database needs to be organized and stored/retrieved efficiently Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  3. Storage Hierarchy Volatile primary storage Cache unit price Memory Flash Memory Secondary storage Magnetic Disk Non-volatile speed Tertiary storage Optical Disk Magnetic Tape Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  4. Primary Storage (Volatile) • Cache • Speed: 7 to 20 ns (1 nanosecond = 10–9 seconds) • Capacity: • A typical PC level 2 cache 64KB-2 MB. • Within processors, level 1 cache usually ranges in size from 8 KB to 64 KB. • Main memory: • Speed: 10s to 100s of nanoseconds; • Capacity: Up to a few Gigabytes widely used currently • per-byte costs have decreased roughly factor of 2 every 2 to 3 years) Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  5. Secondary Storage (Non-volatile) • Flash memory • Speed: • read speed similar to main memory, write is much slower • Capacity: 32M to 512M currently • Forms: SmartMedia, memory stick, secure digital, BIOS • Cost: roughly same as main memory • Magnetic-disk • Capacities: up to roughly 100 GB currently • Growing constantly and rapidly with technology improvements (factor of 2 to 3 every 2 years) Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  6. Tertiary Storage (Non-volatile) • Optical storage • CD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular forms • CD-RW, DVD-RW, and DVD-RAM • Reads and writes are slower than with magnetic disk • Juke-box systems, with large numbers of removable disks, a few drives, and a mechanism for automatic loading/unloading of disks available for storing large volumes of data Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  7. Tertiary Storage (Non-volatile) • Tape storage • non-volatile, used primarily for backup (to recover from disk failure), and for archival data • sequential-access– much slower than disk • very high capacity (40 to 300 GB tapes available) • Tape jukeboxes available for storing massive amounts of data • hundreds of terabytes (1 terabyte = 109 bytes) to even a petabyte (1 petabyte = 1012 bytes) Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  8. Between Memory and Disk • The permanent residency of database is mostly on disk • In database, cost is usually measured by the number of disk I/O • But disks are too slow and we need memory to be the buffers … but memory is volatile • this introduced a number of issues Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  9. Lets talk about disk Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  10. Disk Subsystem Disk interface standards families • ATA (AT adaptor) range of standards • SCSI (Small Computer System Interconnect) range of standards • Several variants of each standard (different speeds and capabilities) Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  11. Disk Speed Discuss ways to improve disk reading speed Rotation time/latency milliseconds Data-transfer rate (4-8MB/sec) • Typical numbers: • 16,000 tracks per platter • sectors per track: 200 – 400 • 512 bytes per sector • 4-16KB per block • 5,400 - 15,000 r p m Seek time (milliseconds) Access time = seek time + latency Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  12. RAID • RAID: Redundant Arrays of Independent Disks • high capacity and high speed by using multiple disks in parallel, and • high reliability by storing data redundantly Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  13. Mean time to failure (MTTF) • Average time the disk is expected to run continuously without any failure. • Typically 3 to 5 years (1 year = 8,760 hours) • MTTF = 30,000 to 1,200,000 hours for a new disk • an MTTF of 1,200,000 hours for a new disk means that given 1000 relatively new disks, on an average one will fail every 1200 hours • When number of disks increase, the chance of some disk failure increase proportionally Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  14. Parallelism • Two main goals of parallelism in a disk system: 1. Load balance multiple small accesses to increase throughput 2. Parallelize large accesses to reduce response time. • Basic strategy: Stripping • Compare and contrast bit stripping and byte stripping Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  15. Redundancy • store extra information that can be used to rebuild information lost in a disk failure • Basic strategy: mirroring, parity • Mean time to data loss depends on mean time to failure, and mean time to repair • E.g. MTTF of 100,000 hours, mean time to repair of 10 hours gives mean time to data loss of 500*106 hours (or 57,000 years) for a mirrored pair of disks (ignoring dependent failure modes) Data Parity 10010010 1 Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  16. RAID levels Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  17. Choice of RAID Levels • Level 1 provides much better write performance than level 5 • Level 5 requires at least 2 block reads and 2 block writes to write a single block, whereas Level 1 only requires 2 block writes • Level 1 preferred for high update environments such as log disks • Level 1 had higher storage cost than level 5 • disk drive capacities increasing rapidly (50%/year) whereas disk access times have decreased much less (x 3 in 10 years) • I/O requirements have increased greatly, e.g. for Web servers • When enough disks have been bought to satisfy required rate of I/O, they often have spare storage capacity • so there is often no extra monetary cost for Level 1! • Level 5 is preferred for applications with low update rate,and large amounts of data • Level 1 is preferred for all other applications Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  18. Buffer Management • Database can not fit entirely in memory, needs memory as a buffer for speed reasons • LRU is used in many OS • Spatial and temporal locality due to loops • Database has a more predictable behavior • Example: join Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  19. DBMS Buffer Management Strategies • Pinned block – memory block that is not allowed to be written back to disk. • Toss-immediate strategy – frees the space occupied by a block as soon as the final tuple of that block has been processed • Most recently used (MRU) strategy – system must pin the block currently being processed. After the final tuple of that block has been processed, the block is unpinned, and it becomes the most recently used block. • Statistical information based • E.g., the data dictionary is frequently accessed. Heuristic: keep data-dictionary blocks in main memory buffer • Buffer managers also support forced output of blocks for the purpose of recovery Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

  20. Exercises Yan Huang - CSCI5330 Database Implementation –Storage and File Structure

More Related