Understand the basics of hashing and open addressing, hashing for Web server load balancing, and 2-3/(a,b)/B-trees. Learn hash table terminology, different hashing techniques, and the principles behind 2-3 trees, including how to insert and delete nodes efficiently and why (a,b)-trees keep keys at the leaf nodes.
CSE 326 Nov 18, 1999 (Title pages make Powerpoint happy)
Menu • Hash tables • Hashing summary • Hashing for Web server load balancing • 2-3, (a,b), B-trees • 2-3 Tree: invariant, and insert/delete • (a,b)-Tree: why many children, why nodes at root • Zasha’s Up-tree mistake • k-d trees • Assignment 7 group discussion
“Hash Table” terminology confusion • Regular, vanilla Arrays sometimes called “Hash Table” • Don’t get confused • Laugh quietly • Ask “where’s the hashing function?”
Hashing in Greg Badros’s childhood • Call customer service for status of your order. • Notes on status stored in one of 100 boxes. • Which box? Last 2 digits of phone number. • Hash(Customer)=PhoneNumber mod 100 • External hashing with list stored in array cell (papers in box)
External/Open Hashing • Collision: Array indexes point to buckets • Buckets = linked lists • Buckets = AVL trees O(log n) worst case • Insert/Delete easy
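A minimal sketch of the external/open hashing on this slide, reusing the phone-number hash from the previous slide; a Python list stands in for the linked list (or AVL tree) bucket, and all class and method names here are my own, not from the lecture.

class ChainedHashTable:
    def __init__(self, size=100):
        self.buckets = [[] for _ in range(size)]    # one bucket per array cell

    def _hash(self, phone_number):
        return phone_number % len(self.buckets)     # Hash(Customer) = phone mod 100

    def insert(self, phone_number, status):
        bucket = self.buckets[self._hash(phone_number)]
        for i, (k, _) in enumerate(bucket):
            if k == phone_number:                   # already there: update the note
                bucket[i] = (phone_number, status)
                return
        bucket.append((phone_number, status))       # collision: the bucket just grows

    def find(self, phone_number):
        for k, v in self.buckets[self._hash(phone_number)]:
            if k == phone_number:
                return v
        return None

    def delete(self, phone_number):
        i = self._hash(phone_number)
        self.buckets[i] = [(k, v) for (k, v) in self.buckets[i] if k != phone_number]

Insert and delete only touch one bucket, which is why the slide calls them easy; swapping the list for an AVL tree is what buys the O(log n) worst case.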
Coalesced Chaining • Store the linked list inside the Array (instead of an external list) • On collision: look for the first free Array element, starting from the front, and link it into the chain • Reserve a “Cellar” for hotspots at the beginning of the Array
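A hedged sketch of coalesced chaining: keys live in the array itself, and collision chains are linked together by array indexes rather than external pointers. Per the slide, a free slot is found by scanning from the front; the cellar reservation is omitted, and all names are my own.

EMPTY = None

class CoalescedHashTable:
    def __init__(self, size=100):
        self.keys = [EMPTY] * size
        self.link = [-1] * size              # -1 marks the end of a chain

    def _free_slot(self):
        for i, k in enumerate(self.keys):    # 'starting at first'
            if k is EMPTY:
                return i
        raise RuntimeError("table full")

    def insert(self, key):
        i = hash(key) % len(self.keys)
        if self.keys[i] is EMPTY:
            self.keys[i] = key
            return
        while self.keys[i] != key and self.link[i] != -1:
            i = self.link[i]                 # walk the chain inside the array
        if self.keys[i] == key:
            return                           # already present
        j = self._free_slot()
        self.keys[j] = key
        self.link[i] = j                     # splice the new slot onto the chain

    def contains(self, key):
        i = hash(key) % len(self.keys)
        while i != -1 and self.keys[i] is not EMPTY:
            if self.keys[i] == key:
                return True
            i = self.link[i]
        return False

Chains that start at different home slots can grow together ("coalesce"), which is the price paid for keeping everything inside the array.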
Open Addressing • Collision: implicit list based on a 2nd hashing function • On collision: add the 2nd hash function’s value to the last index tried • Types: • “Linear probing”: 2nd function is H2(K)=c for some constant (can form “clusters”) • “Double Hashing”: pseudo-random 2nd function (clusters not likely)
Open Addressing cont’d • Deletes are tricky: mark the slot with a “Deleted flag” instead of emptying it • “Ordered hashing” • keep each collision (probe) list in key order • reduces time for unsuccessful searches
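A minimal sketch of open addressing with linear probing and the “Deleted flag” from this slide; switching to double hashing would only change how step is computed. Class, method, and sentinel names are my own.

EMPTY, DELETED = object(), object()          # sentinels: never-used vs. deleted slot

class ProbingHashTable:
    def __init__(self, size=101):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        # Probe sequence H(K), H(K)+c, H(K)+2c, ...; c = 1 is linear probing.
        home, step = hash(key) % len(self.slots), 1
        for i in range(len(self.slots)):
            yield (home + i * step) % len(self.slots)

    def insert(self, key):
        for i in self._probe(key):
            if self.slots[i] is EMPTY or self.slots[i] is DELETED or self.slots[i] == key:
                self.slots[i] = key
                return
        raise RuntimeError("table full")

    def contains(self, key):
        for i in self._probe(key):
            if self.slots[i] is EMPTY:       # a never-used slot ends the search
                return False
            if self.slots[i] == key:
                return True
        return False

    def delete(self, key):
        for i in self._probe(key):
            if self.slots[i] is EMPTY:
                return
            if self.slots[i] == key:
                self.slots[i] = DELETED      # emptying it would break later probes
                return

Note the asymmetry the slide hints at: insert may reuse a DELETED slot, but a search must probe past it, which is exactly why deletes are tricky here.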
2-3 Tree: Invariants • All leaves are same depth • log time • All internal nodes have 1 or 2 keys (2 or 3 children) • All leaves have 1 or 2 keys • In-order property • like BST property • so we can find things efficiently
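A minimal sketch (my own representation, not the lecture’s) of these invariants as a checker over a small Node class: 1 or 2 keys everywhere, all leaves at the same depth, children count = keys + 1. The full in-order check between subtrees is omitted for brevity; only the key order inside each node is asserted.

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                      # 1 or 2 keys
        self.children = children or []        # [] for a leaf, else len(keys)+1 nodes

def check_23(node, depth=0, leaf_depths=None):
    if leaf_depths is None:
        leaf_depths = set()
    assert 1 <= len(node.keys) <= 2, "every node has 1 or 2 keys"
    assert node.keys == sorted(node.keys), "keys inside a node are in order"
    if not node.children:                     # leaf
        leaf_depths.add(depth)
        assert len(leaf_depths) == 1, "all leaves are at the same depth"
    else:                                     # internal: 2 or 3 children
        assert len(node.children) == len(node.keys) + 1
        for child in node.children:
            check_23(child, depth + 1, leaf_depths)
    return True

check_23(Node([20], [Node([5, 10]), Node([30, 40])]))   # a legal 2-node root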
2-3 Tree: Zasha’s flawed intuition • BST is like a 1-2 tree • 2-child nodes: can be balanced • 1-child nodes: unbalanced linked list • 2-3 Tree • 2-child nodes are the worst case. • (FLAW: we also need “all leaves at the same depth”, or else we can get a lopsided, degenerate-Huffman-tree shape)
2-3 Tree: Insert (p. 231)
Find the leaf node that gets K
nodeWithInsert = the node that gets K
Loop
  If nodeWithInsert has 2 Keys (3 children) Then
    { Fine }
    Exit Loop
  If nodeWithInsert has 3 Keys (4 children) Then
    { Oops – too many }
    Split nodeWithInsert into 2 nodes
    { the 2 new nodes share a middle key – throw that key up to nodeWithInsert’s Parent }
    If nodeWithInsert is Root Then
      Create a new root with the 2 new nodes as children
      Exit Loop
    nodeWithInsert = nodeWithInsert.Parent
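A hedged sketch of just the Split step above, reusing the Node class from the invariants sketch; the caller is responsible for putting the returned middle key and the two new nodes into the parent (or into a brand-new root). The function name and return convention are my own.

def split_overflow(node):
    # node has 3 keys (and, if internal, 4 children).
    # Returns (middle_key, left, right) for the parent to absorb.
    left  = Node(node.keys[:1], node.children[:2])
    right = Node(node.keys[2:], node.children[2:])
    return node.keys[1], left, right

mid, left, right = split_overflow(Node([10, 20, 30]))
print(mid, left.keys, right.keys)      # 20 [10] [30]

Throwing the middle key upward is the only way a key ever moves toward the root, and creating a brand-new root is the only way the tree gains a level, which is why all leaves stay at the same depth.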
2-3 Tree: Why insert works • All nodes end up with 1 or 2 Keys (2 or 3 children) • Keeps the in-order property • splitting a 3-Key (4-child) node yields two in-order 2-child nodes • Keeps the same levels • the only place we ever add a level is at the Root • adding a new root adds a level symmetrically, so all leaves stay at the same depth
2-3 Tree: Delete (p. 231)
Find nodeWithDelete based on the Key
If nodeWithDelete is not a leaf Then
  nodeWithDelete = In-Order Successor(nodeWithDelete)
  { copy the successor’s Key up to replace the deleted one }
{ nodeWithDelete is a leaf; remove the Key from it }
Loop
  If nodeWithDelete still has at least 1 Key (2 children if internal) Then
    { Fine }
    Exit Loop
  { cont’d }
2-3 Tree: Delete cont’d
  If nodeWithDelete has 0 Keys (1 child) Then
    { Oops }
    If nodeWithDelete is Root Then
      Remove Root { its only child becomes the new Root }
      Exit Loop
    Set nodeSibling = an adjacent sibling { the parent has 2/3 children, so one exists }
    Set S = the parent Key that lies between them: Key(nodeWithDelete), S, Key(nodeSibling)
    Set siblingKey = the Key of nodeSibling next to S
    If nodeSibling has 2 Keys (3 Children) Then { borrow }
      Move S from the parent down into nodeWithDelete
      Replace S in the parent with siblingKey from nodeSibling
      Move the child that went with siblingKey over to nodeWithDelete
      Exit Loop
    Else { nodeSibling has only 1 Key: merge }
      Parent = parent(nodeWithDelete, nodeSibling)
      { give nodeWithDelete 2 keys (3 children) }
      nodeWithDelete gets keys S and siblingKey { nodeSibling’s children come along; nodeSibling goes away }
      nodeWithDelete = Parent { we took a key from Parent }
      { loop again with Parent }
2-3 Tree: Why Delete works • All nodes end up with 1 or 2 Keys (2 or 3 children) • Keeps the in-order property • careful about which keys we take from the parent/sibling • Keeps the same levels • the only place we ever remove a level is at the Root • removing the root removes a level symmetrically, so all leaves stay at the same depth
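A hedged sketch of the two repairs above for the simplest case, an underflowing leaf, using the Node class from earlier; the child-pointer bookkeeping and the loop up the tree are omitted, and the function name and parameters are my own, not the lecture’s.

def fix_leaf_underflow(parent, i):
    # parent.children[i] is a leaf with 0 keys; restore the invariant.
    node = parent.children[i]
    if i > 0:
        sib, sep = parent.children[i - 1], i - 1   # left sibling, separator key index
    else:
        sib, sep = parent.children[i + 1], i       # right sibling
    if len(sib.keys) == 2:
        # Borrow: the separator comes down, one sibling key goes up.
        node.keys.append(parent.keys[sep])
        parent.keys[sep] = sib.keys.pop() if i > 0 else sib.keys.pop(0)
    else:
        # Merge: the separator comes down and joins the sibling's single key.
        if i > 0:
            sib.keys = sib.keys + [parent.keys[sep]]
        else:
            sib.keys = [parent.keys[sep]] + sib.keys
        del parent.keys[sep]
        parent.children.remove(node)
        # If the parent itself drops to 0 keys, the same repair repeats one level up.

The borrow case ends the repair immediately; only the merge case can push the problem upward, which is why the slide’s loop climbs toward the Root.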
(a,b) Tree: Why such big nodes? • Hard Drive / CD-ROM • Read sector at a time • Sector: 256 bytes – 4 KB (typical) • AVL/2-3 node likely <20 bytes • You read 256 bytes to get 20 useful bytes???
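A quick back-of-the-envelope version of this point; the 4 KB sector and the roughly 20-byte entry are my illustrative numbers taken from the ranges on the slide, not exact figures.

sector_bytes = 4096                    # one disk read, from the 256 B to 4 KB range above
entry_bytes  = 20                      # roughly what one AVL/2-3 node carries
print(sector_bytes // entry_bytes)     # about 200 entries, so make one tree node fill a sector

So instead of reading a sector to get one 20-byte node, an (a,b)-tree node packs on the order of a couple hundred keys and child pointers into the same read.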
(a,b) Tree: Why keys at leaf? • Database: nodes on disk. • Every access to tree asks for Root • Keep Root in RAM • Keys in root helps rarely • Put Keys in leaves, Cache higher nodes
(a,b) Tree: Why keys at leaf: 2 • No in-order successor business • Can fiddle with Keys at internal nodes • e.g. Chop off unnecessary suffixes
(a,b) Tree: Invariants • a >= 2, b >= 2a-1 • All leaves are at the same depth • All internal nodes have a..b children (the root is allowed as few as 2) • All leaves have (a-1)..(b-1) keys • In-order property, so we can search efficiently
(a,b) Tree: Insert idea • Same idea as the 2-3 Tree
Vague code snippet:
If nodeWithInsert has too many children (more than b) Then
  Split it: get 2 nodes, each with about b/2 children
  { throw the middle key up }
  Now nodeWithInsert is the Parent
(a,b) Tree: Delete idea • Same idea as the 2-3 Tree
Vague code snippet:
If nodeWithDelete has too few children (fewer than a) Then
  Get nodeSibling
  If nodeSibling has > a children Then
    Borrow a key from the parent
    Replace the borrowed key in the parent with one from nodeSibling
    Take the nodeSibling child that went with its just-removed key
  Else { nodeSibling has exactly a children }
    Borrow a key from the parent { merge with nodeSibling }
    Oops, the parent lost a key
    nodeWithDelete = Parent
B-Tree: Summary • B-Tree is (a,b) tree with b=2a-1 • Why not get the biggest value of a we can? • Who wants to bother with two different parameters???
B-tree for full text index (aka “inverted index”) • One possibility • Key: word • Info: list of documents (or document IDs) • Jim & Zasha’s idea for phrases • Key: word • Info: list of (document,list of occurrences as word #n)
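A minimal sketch of the “list of (document, list of occurrences as word #n)” payload, using a plain in-memory dict where the real thing would keep keys and info in a B-tree on disk; all names are my own.

from collections import defaultdict

def build_index(documents):
    # documents: dict of doc_id -> text.
    # Returns: word -> list of (doc_id, [positions of the word in that doc]).
    index = defaultdict(list)
    for doc_id, text in documents.items():
        positions = defaultdict(list)
        for n, word in enumerate(text.lower().split()):
            positions[word].append(n)            # "occurrences as word #n"
        for word, occ in positions.items():
            index[word].append((doc_id, occ))    # Key: word, Info: postings list
    return index

docs = {1: "the quick brown fox", 2: "the lazy dog saw the fox"}
index = build_index(docs)
print(index["the"])   # [(1, [0]), (2, [0, 4])]
print(index["fox"])   # [(1, [3]), (2, [5])]

Storing word positions (not just document IDs) is what makes phrase queries possible: a phrase like "the fox" is a hit wherever "fox" appears at position n+1 right after "the" at position n in the same document.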
Depth first search • Avoid loops: don’t repeat nodes. • Use some kind of Set ADT
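A minimal sketch of depth-first search with the “Set ADT” of visited nodes this slide calls for, so a cycle in the graph cannot loop forever; the adjacency-dict graph representation and all names are my own.

def dfs(graph, start, visited=None):
    # graph: dict of node -> list of neighbours.
    if visited is None:
        visited = set()                  # the Set ADT: nodes already expanded
    visited.add(start)
    for neighbour in graph.get(start, []):
        if neighbour not in visited:     # avoid loops: don't repeat nodes
            dfs(graph, neighbour, visited)
    return visited

g = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": []}   # a <-> b is a cycle
print(sorted(dfs(g, "a")))               # ['a', 'b', 'c', 'd']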
2-d Trees: points • Like binary search trees, but each node splits the plane with a line (compare x at one level, y at the next) • Not always balanced • R-trees are • balanced • big nodes, like B-trees • Usually faster than a plain list of points… • source: Zasha
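A hedged sketch of the 2-d tree idea just described: a BST whose comparison alternates between the x and y coordinates level by level, with no rebalancing (matching the “not always balanced” point). Class and function names are my own.

class KDNode:
    def __init__(self, point):
        self.point = point                     # (x, y)
        self.left = self.right = None

def insert(node, point, depth=0):
    if node is None:
        return KDNode(point)
    axis = depth % 2                           # 0: split on x, 1: split on y
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, point, depth + 1)
    else:
        node.right = insert(node.right, point, depth + 1)
    return node

def contains(node, point, depth=0):
    if node is None:
        return False
    if node.point == point:
        return True
    axis = depth % 2
    child = node.left if point[axis] < node.point[axis] else node.right
    return contains(child, point, depth + 1)

root = None
for p in [(3, 6), (17, 15), (13, 15), (6, 12)]:
    root = insert(root, p)
print(contains(root, (13, 15)), contains(root, (9, 9)))   # True False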
Assignment 7 Discussion Points • What task will your program accomplish? • What sub-problems will it need to solve, to accomplish this task? • How do the sub-problems relate to each other? What does one sub-problem need from another in order to work? • What seems tricky?
Discussion example: Huffman • Purpose: • Program will encode text files using Static Huffman trees, and then decode the encoded files. This will compress files. • Sub-problems: • Reading/writing files. • Reading/writing files with bits • Heap with DeleteMin/Insert • Building Static Huffman tree using Heap • Encoding/Decoding scheme for Static Huffman tree with bit-oriented files
Discussion example: Huffman cont’d • Sub-problem relationships: • Reading/writing files is the lowest level • R/W files with bits needs R/W files • Heap is also a lowest-level piece (depends on nothing else) • Building the Static Huffman tree needs reading files and the Heap • Encoding/Decoding needs the Static Huffman tree and the bit-oriented files
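A hedged sketch of just the “Building Static Huffman tree using Heap” sub-problem above, using Python’s heapq as the Heap with Insert/DeleteMin; the file I/O and bit-level encoder/decoder sub-problems are not shown, and all names are my own.

import heapq, itertools
from collections import Counter

def build_huffman_tree(text):
    # Tree as nested tuples: leaf = (char,), internal node = (left, right).
    tiebreak = itertools.count()         # keeps the heap from ever comparing trees
    heap = [(freq, next(tiebreak), (ch,)) for ch, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)                           # DeleteMin twice,
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))  # Insert the merge
    return heap[0][2]

def code_table(tree, prefix=""):
    if len(tree) == 1:                   # leaf
        return {tree[0]: prefix or "0"}  # lone-symbol corner case
    codes = {}
    codes.update(code_table(tree[0], prefix + "0"))      # left edge = 0
    codes.update(code_table(tree[1], prefix + "1"))      # right edge = 1
    return codes

print(code_table(build_huffman_tree("abracadabra")))
# frequent letters (like 'a') come out with the shortest codes

This is the piece that sits between “Heap with DeleteMin/Insert” below it and the “Encoding/Decoding scheme” above it in the sub-problem relationships.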