CS2 in C++ Peer Instruction Materials by Cynthia Bailey Lee is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at http://peerinstruction4cs.org.
CS106X – Programming Abstractions in C++ Cynthia Bailey Lee
Today’s Topics: Map Interface, continued • Binary Search Tree math • (continued from Friday) • How to maintain balance • Introduction to Hashing • C++ template details (if we have time)
Bad BSTs • One way to create a bad BST is to insert the elements in sorted order (here, descending): 34, 22, 18, 9, 3 • That’s not the only way… • How many distinctly structured BSTs are there that exhibit the worst-case height (height equals number of nodes) for a tree with 5 nodes? • 2-3 • 4-5 • 6-7 • 8-9 • More than 9
BALANCE!!! • The #1 issue to remember with BSTs is that they are great when balanced (O(log n) operations), and horrible when unbalanced (O(n) operations) • Balance depends on the order in which elements are inserted • That order is not under the implementor’s control • Over the years, implementors have devised ways of making sure BSTs stay balanced no matter what order the elements are inserted in
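To make the effect of insertion order concrete, here is a minimal sketch of a plain BST with no rebalancing. The names `Node`, `insert`, `height`, and `heightAfterInserting` are illustrative assumptions and are not taken from the course libraries.

```cpp
#include <algorithm>
#include <vector>

// Minimal unbalanced BST node (illustrative; not the course's Map implementation).
struct Node {
    int key;
    Node* left;
    Node* right;
    explicit Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// Plain BST insert with no rebalancing; duplicate keys are ignored.
Node* insert(Node* root, int key) {
    if (root == nullptr) return new Node(key);
    if (key < root->key) {
        root->left = insert(root->left, key);
    } else if (key > root->key) {
        root->right = insert(root->right, key);
    }
    return root;
}

// Height counted in nodes: an empty tree is 0, a single node is 1.
int height(const Node* root) {
    if (root == nullptr) return 0;
    return 1 + std::max(height(root->left), height(root->right));
}

// Releases every node in the tree.
void freeTree(Node* root) {
    if (root == nullptr) return;
    freeTree(root->left);
    freeTree(root->right);
    delete root;
}

// Builds a BST by inserting keys in the given order and reports its height.
int heightAfterInserting(const std::vector<int>& keys) {
    Node* root = nullptr;
    for (int k : keys) root = insert(root, k);
    int h = height(root);
    freeTree(root);
    return h;
}
```

With the slide's keys, `heightAfterInserting({34, 22, 18, 9, 3})` gives 5 (a single chain), while the same keys inserted as `{18, 9, 34, 3, 22}` give a height of 3.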
Red-Black trees One of the most famous (and most tricky) strategies for keeping a BST balanced
Red-Black trees • In addition to the requirements of binary search trees, red–black trees must meet these: • A node is either red or black. • The root is black. • All leaves (null children) are black. • Both children of every red node are black. • Every simple path from a given node to any of its descendant leaves contains the same number of black nodes.
Red-Black trees • According to the rules above, how much longer than the shortest root-to-leaf path can the longest root-to-leaf path be? • All root-to-leaf paths are equal length • 25% longer • 50% longer • 100% longer • Other/none/more than one
Red-Black trees • Every simple path from a given node to any of its descendant leaves contains the same number of black nodes. • This is what guarantees the tree stays “close” to balanced • Figure: http://commons.wikimedia.org/wiki/File:Red-black_tree_example.svg (this file is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license)
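The coloring rules can be checked mechanically. The sketch below is one possible checker, assuming a hypothetical `RBNode` struct; it verifies only the color rules (black root, no red node with a red child, equal black counts on every path) and does not check BST ordering or perform insertion.

```cpp
// Hypothetical node type for illustration only.
enum class Color { Red, Black };

struct RBNode {
    int key;
    Color color;
    RBNode* left;
    RBNode* right;
};

// Null children count as black leaves.
bool isBlack(const RBNode* n) {
    return n == nullptr || n->color == Color::Black;
}

// Returns the number of black nodes on every path from n down to a null leaf
// (counting n itself if black, and the null leaf), or -1 if the subtree
// violates the red-child rule or the equal-black-count rule.
int blackHeight(const RBNode* n) {
    if (n == nullptr) return 1;  // a null leaf is black
    if (n->color == Color::Red && (!isBlack(n->left) || !isBlack(n->right))) {
        return -1;  // a red node has a red child
    }
    int lh = blackHeight(n->left);
    int rh = blackHeight(n->right);
    if (lh == -1 || rh == -1 || lh != rh) return -1;
    return lh + (n->color == Color::Black ? 1 : 0);
}

// True if the root is black and every subtree passes the checks above.
bool satisfiesColorRules(const RBNode* root) {
    return isBlack(root) && blackHeight(root) != -1;
}
```

A black root with two red children passes; the same tree fails if one of those red nodes gains a red child, or if it gains a black child on only one side.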
Insert procedure must maintain the invariants (this gets tricky) • Video: http://www.youtube.com/watch?v=vDHFF4wjWYU
Other BST balance strategies • Red-Black tree • AVL tree • Treap (BST + heap in one tree! What could be cooler than that, amirite? ♥ ♥ ♥) Other fun types of BST: • Splay tree • Rather than only worrying about balance, Splay Tree dynamically readjusts based on how often users search for an item. Commonly-searched items move to the top, saving time • B-Tree • Like BST, but a node can have many children, not just 2
Hashing Implementing the Map interface with Hashing/Hash Tables
Imagine you want to look up your neighbors’ names by house number (all on your own street) • House numbers: 10565 through 90600 • (roughly 1000 houses; the gaps between consecutive house numbers vary) • Names: one last name per house
Options • BST (balanced): • Add/remove: O(log n) • Find: O(log n) • Linked list: • Add/remove: O(n) • Find: O(n) • Array: • Add/remove: • Find:
Hash Table is just a modified, more flexible array • Keys don’t have to be integers 0-(size-1) • (Ideally) avoids big gaps like our gap from 0 to 10565 in the house numbers and between numbers • Hash function is what makes this possible!
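As a rough illustration of what a hash function does for the house-number example, the sketch below folds a key into a small range of array indices. The modulo scheme, the function names, and the table size of 1009 used in the note afterwards are assumptions for illustration, not the hash function used in the course libraries.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical hash for integer keys such as house numbers: folds the key
// into the index range [0, tableSize). Assumes houseNumber >= 0 and
// tableSize > 0.
std::size_t hashHouseNumber(int houseNumber, std::size_t tableSize) {
    return static_cast<std::size_t>(houseNumber) % tableSize;
}

// Keys need not be integers: std::hash turns a string into an integer,
// which is then folded into the same index range.
std::size_t hashName(const std::string& name, std::size_t tableSize) {
    return std::hash<std::string>{}(name) % tableSize;
}
```

With a table of 1009 slots, house 10565 lands at index 475 and house 90600 at index 799, so about 1000 houses fit in about 1000 slots rather than the 90601 slots a directly indexed array would need. Two different house numbers can still land on the same index.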
Closed Addressing • Where does key=“Annie” value=10 go if hash(“Annie”) = 3? • Where does key=“Solange” value=12 go if hash(“Solange”) = 5?
Hash collisions • We may need to worry about hash collisions • Hash collision: • Two keys a, b, a≠b, have the same hash code index (i.e. hash(a) = hash(b))
Closed Addressing • Where does key=“Annie” value=55 go if hash(“Annie”) = 3? • 55 overwrites 10 at 3 • A linked list node is added at 3 • Other/none/more than one
Hash key collisions • We can NOT overwrite the value the way we would if it really were the same key • Need a way of storing multiple values in a given “place” in the hash table
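One common way to store multiple entries in one “place” is to keep a linked list (a chain) in each array slot. The following is a minimal sketch under that assumption; the class name `ChainedMap`, its methods, and the pluggable hash function are hypothetical and are not the course's Map implementation.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Illustrative string-to-int map using one linked list per bucket.
class ChainedMap {
public:
    using HashFn = std::function<std::size_t(const std::string&)>;

    // Assumes bucketCount > 0.
    ChainedMap(std::size_t bucketCount, HashFn hashFn)
        : buckets_(bucketCount), hash_(std::move(hashFn)) {}

    // Same key: overwrite its value. Different key in the same bucket:
    // append a new node to that bucket's chain.
    void put(const std::string& key, int value) {
        auto& chain = buckets_[hash_(key) % buckets_.size()];
        for (auto& entry : chain) {
            if (entry.first == key) {
                entry.second = value;
                return;
            }
        }
        chain.emplace_back(key, value);
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* get(const std::string& key) const {
        const auto& chain = buckets_[hash_(key) % buckets_.size()];
        for (const auto& entry : chain) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

    // Number of entries stored in the bucket at the given index.
    std::size_t chainLength(std::size_t index) const {
        return buckets_[index].size();
    }

private:
    std::vector<std::list<std::pair<std::string, int>>> buckets_;
    HashFn hash_;
};
```

If a toy hash function sends both “Annie” and some other key to index 3, the chain at index 3 holds two nodes and both values stay retrievable. Calling `put` again with “Annie” overwrites only Annie's value, because the keys really are the same.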