External Memory Data Structures
Srinivasa Rao Satti
Workshop on Recent Advances in Data Structures, December 20, 2011
Fundamental Algorithmic Problems
• Searching: Given a list (sequence) L of elements x1, x2, ..., xn and a query element x, check whether x is present in L.
  • When L is not sorted, use linear search: scan the list to check whether x is present.
  • When L is sorted, use binary search: each comparison halves the part of the list that remains to be searched.
  • We may also need to insert elements into and delete elements from L.
• Sorting: Given a sequence of elements, sort them in increasing (or decreasing) order.
  • Examples: insertion sort, bubble sort, quick sort, merge sort.
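
The two search strategies on this slide can be sketched in Python as follows. This is a minimal illustration, not code from the talk; the function names are hypothetical, and the binary search relies on the standard-library `bisect_left`.

```python
from bisect import bisect_left


def linear_search(items, x):
    """Scan an unsorted list; up to len(items) comparisons."""
    for item in items:
        if item == x:
            return True
    return False


def binary_search(sorted_items, x):
    """Search a sorted list; each step halves the remaining range."""
    i = bisect_left(sorted_items, x)  # leftmost position where x could be inserted
    return i < len(sorted_items) and sorted_items[i] == x
```

Linear search works on any list, while binary search is only correct when the input is already sorted.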
Random Access Machine (RAM) Model
• Standard theoretical model of computation:
  • Infinite memory
  • Uniform access cost
• Unit-cost RAM model: all the basic operations (reading/writing a memory location, standard arithmetic and Boolean operations) take one unit of time.
• Simple model crucial for the success of the computer industry.
Hierarchical Memory
• Modern machines have a complicated memory hierarchy
• Levels get larger and slower further away from the CPU
• Data is moved between levels in large blocks
[Figure: memory hierarchy with L1, L2 and RAM]
Slow I/O
• Disk access is 10^6 times slower than main memory access
  “The difference in speed between modern CPU and disk technologies is analogous to the difference in speed in sharpening a pencil using a sharpener on one’s desk or by taking an airplane to the other side of the world and using a sharpener on someone else’s desk.” (D. Comer)
• Disk systems try to amortize the large access time by transferring large contiguous blocks of data (8-16 KB)
• Important to store/access data so as to take advantage of blocks (locality)
[Figure: disk with read/write head, read/write arm, track and magnetic surface]
External Memory Model [Aggarwal-Vitter 1988]
• N = # of items in the problem instance
• B = # of items per disk block
• M = # of items that fit in main memory
• T = # of items in output
• I/O: move one block between memory and disk
• Performance measures:
  • Space: # of disk blocks used by the structure
  • Time: # of I/Os performed by the algorithm (CPU time is “free”)
[Figure: processor P and main memory M connected to disk D by block I/O]
Scalability Problems: Block Access Matters
• Example: Traversing linked list
  • Array size N = 10 elements
  • Disk block size B = 2 elements
  • Main memory size M = 4 elements (2 blocks)
• Large difference between N and N/B since block size is large
  • Example: N = 256 × 10^6, B = 8000, 1 ms disk access time
  • N I/Os take 256 × 10^3 s = 4266 min = 71 hr
  • N/B I/Os take 256/8 s = 32 s
[Figure: two disk layouts of the same list. Algorithm 1: N = 10 I/Os. Algorithm 2: N/B = 5 I/Os.]
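
The arithmetic in the second example can be checked with a few lines of Python. The helper `io_time_seconds` is a hypothetical name introduced only for this illustration; the values of N, B and the 1 ms access time are the ones from the slide.

```python
def io_time_seconds(num_ios, access_ms=1.0):
    """Total time for num_ios disk accesses at access_ms milliseconds each."""
    return num_ios * access_ms / 1000.0


N = 256 * 10**6   # number of elements
B = 8000          # elements per disk block

unblocked = io_time_seconds(N)       # one I/O per element
blocked = io_time_seconds(N // B)    # one I/O per block

print(f"N I/Os:   {unblocked:.0f} s = {unblocked / 3600:.1f} hr")
print(f"N/B I/Os: {blocked:.0f} s")
```

The gap between the two running times is exactly the factor B.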
Queues and Stacks
• Queue:
  • Maintain push and pop blocks in main memory ⇒ O(1/B) amortized I/Os per push/pop operation
• Stack:
  • Maintain push/pop blocks in main memory ⇒ O(1/B) amortized I/Os per push/pop operation
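
One way to obtain the O(1/B) amortized bound for a stack is sketched below. This is an assumption-laden illustration rather than the construction from the talk: the class name `ExternalStack`, the in-memory buffer of up to 2B items, and the Python list standing in for the disk are all hypothetical.

```python
class ExternalStack:
    """Stack that keeps at most 2*B items in memory and counts block I/Os."""

    def __init__(self, block_size):
        self.B = block_size
        self.buffer = []     # in-memory top of the stack (at most 2*B items)
        self.disk = []       # simulated disk: list of blocks of B items each
        self.io_count = 0    # number of block reads and writes so far

    def push(self, x):
        if len(self.buffer) == 2 * self.B:
            # Buffer full: write the oldest B items to disk as one block.
            self.disk.append(self.buffer[:self.B])
            self.buffer = self.buffer[self.B:]
            self.io_count += 1
        self.buffer.append(x)

    def pop(self):
        if not self.buffer:
            if not self.disk:
                raise IndexError("pop from empty stack")
            # Buffer empty: read the most recently written block back.
            self.buffer = self.disk.pop()
            self.io_count += 1
        return self.buffer.pop()
```

After every I/O the buffer holds exactly B items, so at least B further operations are needed before the next I/O, which gives the O(1/B) amortized cost.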
Fundamental Bounds
                 Internal      External
• Scanning:      N             N/B
• Sorting:       N log N       (N/B) log_{M/B} (N/B)
• Searching:     log_2 N       log_B N
• Note:
  • Linear I/O: O(N/B)
  • B factor VERY important: N/B < (N/B) log_{M/B} (N/B) << N
  • Cannot sort optimally with search tree
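
To get a feel for the sizes involved, the standard external-memory bounds (N/B for scanning, (N/B) log_{M/B} (N/B) for sorting, log_B N for searching) can be evaluated numerically. This is only a sketch that ignores constant factors; the function names and the example value of M are assumptions made for the illustration.

```python
import math


def scan_ios(N, B):
    """I/Os to scan N items stored contiguously in blocks of B items."""
    return math.ceil(N / B)


def sort_ios(N, M, B):
    """External sorting bound (N/B) log_{M/B} (N/B), up to constant factors."""
    n = N / B
    return n * math.log(n, M / B)


def search_ios(N, B):
    """External searching bound log_B N, up to constant factors."""
    return math.log(N, B)


N, M, B = 256 * 10**6, 8 * 10**6, 8000   # M is an assumed example value
print(scan_ios(N, B), round(sort_ios(N, M, B)), N)
print(round(search_ios(N, B), 2), round(math.log2(N), 2))
```

For these parameters the sorting bound is only a small factor above the scanning bound, and both are far below N.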
Search trees: API
• Given a set S of keys, support the operations:
  • search(x): return TRUE if x is in S, and FALSE otherwise
  • insert(x): insert x into S (error if x is already in S)
  • delete(x): delete x from S (error if x is not in S)
  • rangesearch(x,y): return all the keys z such that x ≤ z ≤ y
Binary Search Trees
• Binary search tree:
  • Standard method for search among N elements
  • We assume elements in leaves
  • Search traces a root-to-leaf path
• If nodes are stored arbitrarily on disk:
  • Search in O(log_2 N) I/Os
  • Rangesearch in O(log_2 N + T) I/Os
External Search Trees
• BFS blocking:
  • Block height O(log_2 N) / O(log_2 B) = O(log_B N)
  • Output elements blocked ⇒ Rangesearch in O(log_B N + T/B) I/Os
• Optimal: O(N/B) space and O(log_B N + T/B) query
External Search Trees
• Maintaining BFS blocking during updates?
  • Balance is normally maintained in search trees using rotations
  • Seems very difficult to maintain BFS blocking during rotation
  • Also need to make sure output (leaves) is blocked!
[Figure: rotation exchanging the positions of nodes x and y]
B-trees
• BFS-blocking naturally corresponds to a tree with fan-out Θ(B)
• B-trees are balanced by allowing the node degree to vary
  • Rebalancing is performed by splitting and merging nodes
(a,b)-tree
• T is an (a,b)-tree (a ≥ 2 and b ≥ 2a-1) if:
  • All leaves are on the same level and contain between a and b elements
  • Except for the root, all nodes have degree between a and b
  • Root has degree between 2 and b
• (a,b)-tree uses linear space and has height O(log_a N)
• Choosing a,b = Θ(B) ⇒ each node/leaf stored in one disk block ⇒ O(N/B) space and O(log_B N + T/B) query
[Figure: example (2,4)-tree]
(a,b)-Tree Insert
• Insert: Search and insert element in leaf v
    DO {
      if v has b+1 elements/children
        Split v: make nodes v’ and v’’ with ⌊(b+1)/2⌋ and ⌈(b+1)/2⌉ elements;
        insert element (ref) in parent(v) (make new root if necessary)
      v = parent(v)
    }
• Insert touches O(log_a N) nodes
[Figure: node v split into v’ and v’’]
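
The split step can be sketched for a single leaf as follows. This is a minimal illustration under stated assumptions: the leaf is a sorted Python list, the two halves get ⌊(b+1)/2⌋ and ⌈(b+1)/2⌉ elements (the usual choice), and the largest key of the left half is used as the routing key passed to the parent. The function name `split_leaf` is hypothetical.

```python
def split_leaf(elements, b):
    """Split an overfull leaf holding b+1 sorted elements into two leaves.

    Returns (left, separator, right), where separator is the routing key
    that would be inserted into the parent.
    """
    if len(elements) != b + 1:
        raise ValueError("split is only applied to a leaf with b+1 elements")
    mid = (b + 1) // 2                       # floor((b+1)/2) elements go left
    left, right = elements[:mid], elements[mid:]
    separator = left[-1]                     # largest key in the left leaf
    return left, separator, right
```

Because b ≥ 2a-1, both halves contain at least a elements, so the (a,b)-tree invariant holds for the two new leaves.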
(a,b)-Tree Delete
• Delete: Search and delete element from leaf v
    DO {
      if v has a-1 elements/children
        Fuse v with sibling v’: move children of v’ to v;
        delete element (ref) from parent(v) (delete root if necessary)
        if v has > b (and ≤ a+b-1 < 2b) children, split v
      v = parent(v)
    }
• Delete touches O(log_a N) nodes
[Figure: nodes v and v’ fused into v]
Summary/Conclusion: B-tree
• B-trees: (a,b)-trees with a,b = Θ(B)
  • O(N/B) space
  • O(log_B N + T/B) I/Os for search and rangesearch
  • O(log_B N) I/Os for insert and delete
• B-trees with elements in the leaves are sometimes called B+-trees
• Construction in O((N/B) log_{M/B} (N/B)) I/Os
  • Sort elements and construct leaves
  • Build tree level-by-level bottom-up
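
The bottom-up construction can be sketched in memory as below. This is a simplified illustration, not the construction from the talk: nodes are plain lists of keys, each internal node stores the largest key of each child as a routing key, and the last node on a level may be underfull (a real implementation would rebalance it). The name `build_bottom_up` and the `fanout` parameter are assumptions.

```python
def build_bottom_up(sorted_keys, fanout):
    """Build tree levels from the leaves up to the root.

    Returns a list of levels; levels[0] holds the leaves and levels[-1]
    holds the root. Each node is a list of at most `fanout` keys.
    """
    # Leaves: consecutive runs of the sorted input.
    level = [sorted_keys[i:i + fanout] for i in range(0, len(sorted_keys), fanout)]
    levels = [level]
    while len(level) > 1:
        # One routing key (the largest key) per child, grouped fanout at a time.
        routing = [node[-1] for node in level]
        level = [routing[i:i + fanout] for i in range(0, len(routing), fanout)]
        levels.append(level)
    return levels
```

Each level is produced by a single scan of the level below it, so once the elements are sorted the tree itself is built with a linear number of block transfers.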
B-tree Construction
• In internal memory we can sort N elements in O(N log N) time using a balanced search tree:
  • Insert all elements one-by-one (construct tree)
  • Output in sorted order using in-order traversal
• The same algorithm using a B-tree uses O(N log_B N) I/Os
  • A factor of O(B log_B N / log_{M/B} (N/B)) from optimal
• As discussed, we could build the B-tree bottom-up in O((N/B) log_{M/B} (N/B)) I/Os
  • In general we would like a dynamic data structure to use in algorithms, with O((1/B) log_{M/B} (N/B)) amortized I/Os per operation
Flash memory
• Non-volatile memory which can be erased and programmed
• Characteristics, compared to magnetic disks:
  • Lighter
  • Provides better shock resistance
  • Provides more throughput
  • Consumes less power
  • Denser (uses less space)
• Commonly used in digital cameras, handheld computers, mobile phones, portable music players, etc.
• Also used in embedded systems and sensor networks, and is even replacing magnetic disks in PCs.
HDD vs SSD
[Figure: the disassembled components of a hard disk drive (left) and the PCB and components of a solid-state drive (right)]
Limitations of flash memory
• Memory cells in a flash memory device can be written only a limited number of times
  • Between 10,000 and 1,000,000, after which they wear out and become unreliable.
• The only way to set bits (change their value from 0 to 1) is to erase an entire region of memory. These regions have a fixed size in a given device, typically ranging from several kilobytes to hundreds of kilobytes, and are called erase units.
• Two different types of flash memory: NOR and NAND
  • They have slightly different characteristics
Flash memory
• The memory space of the chip is partitioned into blocks called erase blocks. The only way to change a bit from 0 to 1 is to erase the entire erase block containing the bit.
• Each block is further partitioned into pages, which usually store 2048 bytes of data and 64 bytes of meta-data. Erase blocks typically contain 32 or 64 pages.
• Bits are changed from 1 to 0 by programming (writing) data onto a page. An erased page can be programmed only a small number of times (1 to 3) before it must be erased again.
Flash memory
• Reading data takes tens of microseconds for the first access to a page, plus tens of nanoseconds per byte.
• Writing a page takes hundreds of microseconds, plus tens of nanoseconds per byte.
• Erasing a block takes several milliseconds.
• Each block can sustain only a limited number of erasures.
Algorithms/data structures designed for the I/O model do not always work well when implemented on flash memory.
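Flash memory models (I)
• General flash model:
  • Data is read in blocks of size BR and written in blocks of size BW.
  • The complexity of an algorithm is x + c · y, where x and y are the number of read and write I/Os respectively, and c is a penalty factor for writing.
  • Typically, we assume that BR < BW << M, and c ≥ 1.
[Figure: main memory M connected to flash, with read blocks of size BR and write blocks of size BW at cost c]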
Flash memory models (II)
• Unit-cost flash model:
  • General flash model augmented with the assumption of an equal access time per element for reading and writing.
  • The cost of an algorithm performing x read I/Os and y write I/Os is given by x · BR + y · BW.
  • This simplifies the model considerably, as it becomes easier to adapt external-memory results.
[Figure: main memory M connected to flash, with read blocks of size BR and write blocks of size BW]
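
The two cost measures can be written down directly as functions. The function and parameter names below are hypothetical; the formulas x + c · y and x · BR + y · BW are the ones stated on the slides.

```python
def general_flash_cost(read_ios, write_ios, c):
    """General flash model: x + c * y, with write penalty factor c >= 1."""
    return read_ios + c * write_ios


def unit_cost_flash_cost(read_ios, write_ios, BR, BW):
    """Unit-cost flash model: x * BR + y * BW (cost per element transferred)."""
    return read_ios * BR + write_ios * BW


# Example: 100 read I/Os and 10 write I/Os.
print(general_flash_cost(100, 10, c=8))
print(unit_cost_flash_cost(100, 10, BR=4, BW=32))
```

In the unit-cost model the ratio BW/BR plays the role of the write penalty c of the general model.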
B-trees on flash memory
• An insertion in a B-tree updates a single leaf (unless the leaf splits).
• Since we cannot perform an in-place update in flash memory, we need to create a new copy of the leaf, with the new element inserted.
• Since the parent of this leaf has to update its pointer to the leaf, we need to create a new copy of the parent, and so on up to the root.
• Thus the write performance of the naïve implementation is quite bad: every update rewrites an entire root-to-leaf path.
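
The cascading rewrite can be illustrated by counting the nodes that must be copied when one leaf changes. This is a toy sketch: the tree is represented only by a hypothetical `parent` dictionary mapping each node name to its parent, with the root absent from the dictionary.

```python
def nodes_rewritten(parent, leaf):
    """Count nodes copied when `leaf` is updated out of place.

    Every ancestor stores a physical pointer to its child, so each node on
    the leaf-to-root path has to be written to a new location.
    """
    count, node = 0, leaf
    while node is not None:
        count += 1
        node = parent.get(node)   # None once we step past the root
    return count


# A three-level tree: root -> inner -> leaf.
parent = {"leaf": "inner", "inner": "root"}
print(nodes_rewritten(parent, "leaf"))
```

The number of writes per update therefore equals the height of the tree, rather than one.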
Flash Translation Layer (FTL)
• Software layer on the flash disk which performs logical-to-physical block mapping.
• Distributes writes uniformly across blocks.
• B-tree with FTL:
  • All nodes contain just the logical addresses of other nodes
  • Allows any update to write just the target node
  • Achieves one erase per update (amortized)
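
The logical-to-physical mapping idea can be sketched as below. This is a deliberately simplified model and not a description of any real FTL: the class name `SimpleFTL` is hypothetical, flash is modelled as an append-only Python list, and garbage collection, erasure and wear levelling are ignored.

```python
class SimpleFTL:
    """Toy translation layer: logical block numbers map to physical pages."""

    def __init__(self):
        self.mapping = {}   # logical address -> index of the current physical page
        self.flash = []     # append-only list of physical pages (no in-place update)

    def write(self, logical, data):
        # Out-of-place update: write a fresh page and redirect the mapping.
        self.flash.append(data)
        self.mapping[logical] = len(self.flash) - 1

    def read(self, logical):
        return self.flash[self.mapping[logical]]


ftl = SimpleFTL()
ftl.write(7, "leaf v1")
ftl.write(7, "leaf v2")   # the parent still refers to logical block 7
print(ftl.read(7), len(ftl.flash))
```

Because a parent node stores only the logical address of its child, rewriting the child does not force the parent to be rewritten.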
μ-tree [Kang, Jung, Kang, Kim, 2007]
• Minimally Updated tree
• Achieves performance similar to ‘B-tree with FTL’ on raw flash
• Node sizes decrease exponentially from the leaf to the root
• Each block corresponds to a leaf-to-root path, and stores the nodes on a prefix of this path
• Works only when log_2 B ≥ log_B N
FD-tree [Li, He, Yang, Luo, Yi, 2010]
• Flash Disk aware tree index
• Transforms random writes into sequential writes
• Limits random writes to within a small region
FD-tree
• Flash Disk aware tree index
• Transforms random writes into sequential writes
• Contains a head tree and a few levels of sorted runs of increasing sizes
• O(log_k N) levels, where k is the size ratio between levels
Other B-tree indexes for flash memory
• BFTL [Wu, Luo, Chang, 2007]
• Lazy Adaptive tree [Agrawal, Ganesan, Sitaraman, Diao, Singh, 2009]
• Lazy Update tree [On, Hu, Li, Xu, 2009]
• In-page Logging approach [Lee, Moon, 2007]
• …
All of these are designed to get better practical performance, and take different aspects of flash characteristics into consideration, so they are not easy to compare with each other.
Comparison of tree indexes on flash
Notation:
• N: number of elements
• BR: read block size
• BW: write block size
• BU: size of buffer
• h: height of the tree
• k: parameter
Directions for further research
• The area is still in its infancy.
• Not much work has been done apart from the development of some file systems and tree indexing structures.
• Open problems:
  • Efficient tree indexes for flash memory
  • Tons of other (practically significant) algorithmic problems
  • Better memory model