Object Based Disk: the key to intelligent I/O George Gorbatenko Data Machine International St Paul, MN 55115 gorby@ece.umn.edu
Why are we interested? • faster • transportable • more accessible • cheaper • facilitates holistic design • improves reliability DMI
I/O is considered the weak link in systems architecture • the I/O problem • the memory wall • the bottleneck
Issues • randomness is painful • mechanical time vs electronic time • ratio of times is about 200:1 • operating system obscures the disk
Operating System • seamless view of space • legacy of data storage goes back to the punched card • accommodates all applications
Data evolution • tape reflected an 80-column card image • disk reflected tape
In short… • nothing much has changed data-format-wise since the 1930s • we are pretty much dealing with records in a linear format, one record after the next
The advantages of object-based design • encapsulates the data • defines the application subset • keeps the operating system out of the way
SQL object is a good choice • broad user base • de facto standard for databases • high enough level to exploit the power in the I/O • yesterday's CPU in today's disk (controller) • aggregate compute power exceeds the host's
Researchers in Intelligent Disks are motivated by… • exploiting the latent processing potential • filtering data in place
But where do we place the intelligence? • host • I/O controller • disk
Disk basics • many platters (each with a fixed head): 10 • many concentric tracks per platter: 10k • each track holds many sectors: 100 • total number of 512-byte sectors: 10M • disk capacity: 5 GB
To access a random block • seek to track: 10–15 ms • wait for the block to roll around: 4–5 ms • read the block: 80 µs • hence… roughly 200:1
Design Goals • synchronous operation • the next data you want is beneath the head • process data in place (filter) • touch the minimum amount of data • for what you touch you pay in time and space • exploit locality • amortize a random-access read over a large data block
Access strategies… • amortize the (inefficient) access over a large block of data • make sure the data has utility
Data Utility…
Consider the travels of an inchworm… A1 B1 C1 D1 E1 A2 B2 C2 D2 E2 A3 B3 C3 D3 E3 A4 B4 C4 D4 E4 A5 B5 C5 D5 E5
Travels of an inchworm… A1 B1 C1 D1 E1 A2 B2 C2 D2 E2 A3 B3 C3 D3 E3 A4 B4 C4 D4 E4 A5 B5 C5 D5 E5
Locality of Reference
(a) Logical view of a two-dimensional table: A1 B1 C1 D1 E1 / A2 B2 C2 D2 E2 / A3 B3 C3 D3 E3 / A4 B4 C4 D4 E4 / A5 B5 C5 D5 E5
(b) Row-ordered mapping (physical): A1 B1 C1 D1 E1 A2 B2 C2 D2 E2 A3 B3 C3 D3……
(c) Column-ordered mapping (physical): A1 A2 A3 A4 A5 B1 B2 B3 B4 B5 C1 C2 C3 C4……
Preservation of Logical Topology • to preserve the logical topology of an n-dimensional logical data space, the physical space must be of at least like dimension • for a 2D table (rows and columns) we need to view the disk as two-dimensional
Observations: • SQL can be decomposed into two operations • select, favored by column order • extract, favored by row order • granular access permits touching minimal data • map data so as to preserve topology when going from logical to physical medium • reading a track's worth of data appears reasonable
Treating disk as 2D space • data objects are 2D spaces • solves "design boundaries" • the disk is basically a 3D medium (cylinder, track, sector)
track read…
Physical Sector Block Organization… [diagram; labels: physical sector (512), logical sector size (lss)]
Logical Sector Block Organization… [diagram; labels: physical sector (512), logical sector size (lss)]
record structure…
typedef struct _record {
    char employee_no [8];  // employee number; field A
    char name [12];        // name; field B
    char address [24];     // address; field C
    char zip [5];          // zip code; field D
    char salary [6];       // salary; field E
    char doh [6];          // date of hire; field F
    char dept [3];         // department; field G
    char tbd [16];         // reserved for future use; field H
} Record;
modified best fit algorithm (LSS = 8 bytes)
LSS = ceil(rec_len / num_hds) = ceil(64 / 10) = 7, rounded up to a multiple of 4 (4n) = 8
rec_space = LSS × num_hds = 8 × 10 = 80 bytes
modified best fit algorithm (LSS = 8 bytes) [diagram: the Record fields A–H from the previous slide distributed across the heads in LSS-sized pieces]
SQL Decomposition… • Select records • scan the salary field • store ordinal positions in a bit vector • Extract records • optimizer decides strategy (track or sector-block read)
Prototype • two 4 GB Seagate Barracudas • 21 heads (29 zones) • 40 KLOC • skew = 5 sectors • Solaris 2.5.1 OS • emulated intelligence in the IOP • context switch every 60 ms
Data particulars… • 168-byte records • LSS = 8 bytes • 63 records per Sector Block • 7,749 records per cylinder • 3 fields (2 heads) involved in the query • 2 records extracted from disjoint blocks
Test Runs • (a) write a cylinder's worth of data w/o the optimizer • (b) write the same with the optimizer enabled • (c) scan the cylinder involving 3 columns; extract 2 blocks • (d) repeat operation (c)
Results…
           Observed   Calculated
case (a)   2.5 s      2.427 s
case (b)   196 ms     216 ± 4 ms
case (c)   51 ms      54.5 ± 4 ms
case (d)   42 ms      37.6 ± 4 ms
Benchmark Analysis • 3 benchmarks selected: Wisconsin, Set Query, TPC-D/H • selected non-join cases • reverse-engineered the I/O detail
Wisconsin results… [log-scale chart: Wisconsin Benchmark, time in seconds (0.1 to 1000) for queries Q1–Q5, series WIS and 2D]