Sets & Maps
Overview • The Collections Hierarchy • Sequence vs. Associative Containers • Sets • Maps • Implementing Sets and Maps • Trees • Hash tables
The Collection interface • List interface extends Collection interface • Collection is an object that holds other objects • Collections have base types (<String>, ...) • Sets are another kind of Collection • as are Queues • Hierarchy: Collection<E> is extended by List<E>, Set<E> and Queue<E>; List<E> is implemented by ArrayList<E> and LinkedList<E> • Collection, List, Set & Queue are interfaces; ArrayList & LinkedList are classes
Collection Methods • Common to Lists, Sets and Queues • add(T), remove(T), clear() • contains, equals, isEmpty, size, toArray • addAll, containsAll, removeAll, retainAll • hashCode, iterator • List and Deque add several methods • different methods for each • Set adds no more methods
the Collections class • The Collections (note the s) class • static methods for Collection objects • like ArrayLists • e.g. Collections.sort(…) • Collection is an interface • set of method headers for a “skill set” • Collections is a class • (static) methods with definitions
Other Collections Methods • Collections.methodName(args..); • starting with list = [10, 20, 10, 5]: • frequency(list, 10) → 2 (list unchanged) • max(list) → 20 • min(list) → 5 • reverse(list) → list becomes [5, 10, 20, 10] • replaceAll(list, 10, 7) → list becomes [5, 7, 20, 7] • swap(list, 0, 2) → list becomes [20, 7, 5, 7] • shuffle(list) → list becomes, maybe, [7, 20, 5, 7] • max & min can also take comparators as 2nd argument
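The calls on this slide can be tried directly. The following is a minimal sketch that replays them in the same order on the same starting list; the class name CollectionsDemo and the method run are illustrative names, not part of the slides. shuffle is left out because its result is random.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CollectionsDemo {
    /** Replays the slide's calls on [10, 20, 10, 5] and returns the final list. */
    public static List<Integer> run() {
        List<Integer> list = new ArrayList<>(Arrays.asList(10, 20, 10, 5));
        System.out.println(Collections.frequency(list, 10)); // 2
        System.out.println(Collections.max(list));           // 20
        System.out.println(Collections.min(list));           // 5
        Collections.reverse(list);                           // list is now [5, 10, 20, 10]
        Collections.replaceAll(list, 10, 7);                 // list is now [5, 7, 20, 7]
        Collections.swap(list, 0, 2);                        // list is now [20, 7, 5, 7]
        return list;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [20, 7, 5, 7]
    }
}
```

Note that reverse, replaceAll and swap change the list in place, while frequency, max and min only read it.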
Exercise • Write a method that shows how many of each element is in a List<Integer> – but only in the range of the elements in the list • for example: [10, 5, 10, 3, 5, 10, 22, 19, 10, 5] • 3 appears 1 time(s) • 4 appears 0 time(s) • 5 appears 3 time(s) • ... • 22 appears 1 time(s)
Lists vs. Sets • List elements allow duplicates; Sets do not • list1 [a, b, c, a, b, d, a, a, a, a, b, z] • set1 [a, b, c, d, z] • Client puts List elements in order; computer chooses order for Set elements • list1.add("e"); [a, b, c, a, b, d, a, a, a, a, b, z, e] • set1.add("e"); [a, b, c, d, e, z] • Set interface implemented by (e.g.) TreeSet
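A short sketch of the list/set difference, using TreeSet as the Set implementation as the slide suggests. The class name ListVsSetDemo and the helper toSortedSet are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ListVsSetDemo {
    /** Copies the items into a TreeSet, which drops duplicates and keeps sorted order. */
    public static Set<String> toSortedSet(List<String> items) {
        return new TreeSet<>(items);
    }

    public static void main(String[] args) {
        List<String> list1 = new ArrayList<>(
                Arrays.asList("a", "b", "c", "a", "b", "d", "a", "a", "a", "a", "b", "z"));
        Set<String> set1 = toSortedSet(list1);

        list1.add("e"); // the client decides the position: appended at the end
        set1.add("e");  // the TreeSet decides the position: between d and z

        System.out.println(list1);         // [a, b, c, a, b, d, a, a, a, a, b, z, e]
        System.out.println(set1);          // [a, b, c, d, e, z]
        System.out.println(set1.add("a")); // false: "a" is already in the set
    }
}
```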
Container Types • Sequence containers (List) • accessed by position • first element, second, third, …, last • duplicates allowed • Associative containers (Set, Map) • accessed by key (= part of its value) • position is incidental ( doesn’t matter) • no duplicates allowed
Container Types • Sequence (List) • Associative (Set, Map) [Figure: a sequence container holding 10, 20, 30, 15, 7, 19, 40, 21, 10, 8, 42, 90, 54 at positions 0..12, beside an associative set {31, 99, 4, 8, 15, 18, 55, 12, 11, 42} and an associative map {lion=6, cat=3, elephant=6, dog=7, elf=0, walrus=11}]
Associative Containers • Is 4 in the set? yes • Is 7 in the set? no • How many things are in the set? 10 • What is the value of “cat”? 3 • Change the value of “elephant” to 7 • Is the map empty? no [Figure: the set {31, 99, 4, 8, 15, 18, 55, 12, 11, 42} and the map {lion=6, cat=3, elephant=6, dog=7, elf=0, walrus=11}, with elephant updated to 7]
Java Set and Map • Set = collection of unique values • is <key> there or not? • Map = function from key to value • what is the value of <key>? • Set extends Collection; Map does not • Sets can be implemented with Maps • TreeSet implemented using a TreeMap • HashSet implemented using a HashMap
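The questions asked of the set and map on the Associative Containers slide correspond to ordinary Set and Map calls. This is a small sketch using HashSet and HashMap; the class name AssociativeDemo and the helpers sampleSet and sampleMap are illustrative, and the data is copied from the slide's figure.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class AssociativeDemo {
    public static Set<Integer> sampleSet() {
        return new HashSet<>(Arrays.asList(31, 99, 4, 8, 15, 18, 55, 12, 11, 42));
    }

    public static Map<String, Integer> sampleMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("lion", 6);
        map.put("cat", 3);
        map.put("elephant", 6);
        map.put("dog", 7);
        map.put("elf", 0);
        map.put("walrus", 11);
        return map;
    }

    public static void main(String[] args) {
        Set<Integer> set = sampleSet();
        System.out.println(set.contains(4)); // true:  is 4 in the set?
        System.out.println(set.contains(7)); // false: is 7 in the set?
        System.out.println(set.size());      // 10:    how many things are in the set?

        Map<String, Integer> map = sampleMap();
        System.out.println(map.get("cat"));  // 3: what is the value of "cat"?
        map.put("elephant", 7);              // change the value of "elephant" to 7
        System.out.println(map.isEmpty());   // false: is the map empty?
    }
}
```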
Set and Map Interfaces • Set contains elements of a single type • Set<String>, Set<Integer>, Set<Student> • public interface Set<E> extends Collection<E> • Map needs two types: key and value • Map<String, Integer>, Map<String, Student> • public interface Map<K, V>
Map and Set Application • Simple web page search engine • maintains a Map<String, Set<URL> > • key string = search word (“nonmonotonic”) • Set<URL> = web pages containing that word • Multi-term search uses set intersection • set of pages with “nonmonotonic” = s1 • set of pages with “reasoning” = s2 • s1 intersect s2 = set of pages with both
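The search-engine idea can be sketched in a few lines. This is only an illustration under stated assumptions: pages are identified by plain String addresses instead of URL objects, the class name SearchIndex and its methods addPage and search are invented here, and the set intersection is done with retainAll from the Collection interface.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SearchIndex {
    // search word -> set of pages containing that word
    private final Map<String, Set<String>> index = new HashMap<>();

    /** Records that the given page contains each of the given words. */
    public void addPage(String page, String... words) {
        for (String word : words) {
            index.computeIfAbsent(word, k -> new TreeSet<>()).add(page);
        }
    }

    /** Returns the pages that contain every search term (set intersection). */
    public Set<String> search(String... terms) {
        Set<String> result = null;
        for (String term : terms) {
            Set<String> pages = index.getOrDefault(term, Collections.<String>emptySet());
            if (result == null) {
                result = new TreeSet<>(pages); // copy, so the index itself is not modified
            } else {
                result.retainAll(pages);       // keep only pages also in this term's set
            }
        }
        return result == null ? new TreeSet<String>() : result;
    }

    public static void main(String[] args) {
        SearchIndex idx = new SearchIndex();
        idx.addPage("page1", "nonmonotonic", "reasoning");
        idx.addPage("page2", "reasoning");
        System.out.println(idx.search("nonmonotonic", "reasoning")); // [page1]
    }
}
```

Copying the first term's set before calling retainAll matters: retainAll modifies the set it is called on, so intersecting in place would damage the index.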
Set Element Order • Depends on the implementation
    Set<String> treeSet = new TreeSet<>();
    Set<String> hashSet = new HashSet<>();
    treeSet.add("words");  hashSet.add("words");
    treeSet.add("in");     hashSet.add("in");
    treeSet.add("this");   hashSet.add("this");
    treeSet.add("set");    hashSet.add("set");
    System.out.println(treeSet);   // [in, set, this, words]
    System.out.println(hashSet);   // e.g. [set, in, words, this]
Implementing Sets and Maps • Binary Search Trees • linked structure • Hash Tables • array structure [Figure: the values 31, 99, 4, 8, 15, 18, 55, 12, 11, 42 stored once as a binary search tree and once in a 13-cell array indexed 0..12]
Binary Search Trees • Tree: root and children • each child has one parent above it in the tree • Binary Tree: at most two children • a left child and a right child • Binary Search Tree: left < root < right • everything to the left of the root is smaller • everything to the right of root is larger
BST Example • Root is 31 • its children are 8 and 99: 8 < 31 < 99 • Root of the left sub-tree is 8 • its children are 4 and 12: 4 < 8 < 12 • Everything under 8 is < 31, too • Everything under 99 is > 31, too [Figure: BST containing 4, 8, 11, 12, 15, 18, 31, 42, 55, 99, with root 31]
BST Nodes • BST contains data and two links
    // for a Set<E>
    private class BSTNode {
        E data;
        BSTNode left;
        BSTNode right;
    }
    // for a Map<K, V>
    private class BSTNode {
        K key;
        V value;
        BSTNode left;
        BSTNode right;
    }
Set and Map Implementations • Root is a BSTNode • also need a comparator • and maybe a count
    // for Set<E>
    private BSTNode root;
    private Comparator<E> comp;
    private int numInTree;
• root starts as null, numInTree as 0 • comp may be the natural order comparator (o1, o2) -> o1.compareTo(o2)
Finding an Element in a BST • Want to know if 6 is in the tree • How to see? • 6 < 7, so if it’s there, it must be in the left sub-tree • 6 > 2, so look right • 6 > 3, look right again • There it is [Figure: BST containing 1, 2, 3, 4, 5, 6, 7, 12, with root 7]
Finding an Element in a BST • Want to know if 8 is in the tree • 8 > 7, so look right • 8 < 12, so look left • Nothing there • 8 must not be in the tree [Figure: BST containing 1, 2, 3, 4, 5, 6, 7, 12, with root 7]
BST contains Method • Like a binary search for item
    public boolean contains(E item) {
        BSTNode cur = root;
        while (cur != null) {
            int c = comp.compare(item, cur.data);
            if (c < 0) {
                cur = cur.left;       // item < root
            } else if (c > 0) {
                cur = cur.right;      // item > root
            } else {
                return true;          // item == root
            }
        }
        return false;                 // item not found
    }
Inserting Into a BST • Newly inserted node must appear in exactly the right position • If it is bigger than the root: • it must go in the right sub-tree • If smaller, into the left sub-tree • Where in the sub-tree? Recur. • for example, insert 9 [Figure: BST containing 1, 2, 3, 4, 5, 6, 7, 12, with root 7, and the new value 9 waiting to be inserted]
To Insert into a BST • If the root is null • create the new node here • If the item to insert is less than the root • insert into the left sub-tree • If the item to insert is more than the root • insert into the right sub-tree • Otherwise – ignore duplicate item • return false Sets do not allow duplicate elements
BST insert Method • Returns true if inserted, false otherwise • recall: no duplicates allowed
    public boolean insert(E anItem) {
        int oldCount = numInTree;
        root = insertNode(anItem, root);
        return numInTree > oldCount;
    }
• insertNode creates the new Node (if necessary), places it in the tree, and updates numInTree
BST insert Method • Insert new node, return the root of subtree
    private BSTNode insertNode(E anItem, BSTNode cur) {
        if (cur == null) {
            ++numInTree;
            return new BSTNode(anItem);
        }
        int c = comp.compare(anItem, cur.data);
        if (c < 0) {
            cur.left = insertNode(anItem, cur.left);
        } else if (c > 0) {
            cur.right = insertNode(anItem, cur.right);
        }
        return cur;
    }
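The node, field, contains and insert fragments from the preceding slides can be assembled into one small runnable class. This is a sketch, not a full Set implementation: the class name BstSet, the size method and the one-argument BSTNode constructor (which insertNode calls but the BST Nodes slide does not show) are additions made for this example.

```java
import java.util.Comparator;

public class BstSet<E> {
    // Node type from the slides, plus the constructor that insertNode relies on.
    private class BSTNode {
        E data;
        BSTNode left;
        BSTNode right;

        BSTNode(E data) {
            this.data = data;
        }
    }

    private BSTNode root;             // null while the tree is empty
    private final Comparator<E> comp; // decides the ordering of elements
    private int numInTree;            // starts at 0

    public BstSet(Comparator<E> comp) {
        this.comp = comp;
    }

    public int size() {
        return numInTree;
    }

    public boolean contains(E item) {
        BSTNode cur = root;
        while (cur != null) {
            int c = comp.compare(item, cur.data);
            if (c < 0) {
                cur = cur.left;
            } else if (c > 0) {
                cur = cur.right;
            } else {
                return true;
            }
        }
        return false;
    }

    /** Returns true if the item was added, false if it was already present. */
    public boolean insert(E anItem) {
        int oldCount = numInTree;
        root = insertNode(anItem, root);
        return numInTree > oldCount;
    }

    private BSTNode insertNode(E anItem, BSTNode cur) {
        if (cur == null) {
            ++numInTree;
            return new BSTNode(anItem);
        }
        int c = comp.compare(anItem, cur.data);
        if (c < 0) {
            cur.left = insertNode(anItem, cur.left);
        } else if (c > 0) {
            cur.right = insertNode(anItem, cur.right);
        }
        return cur; // equal: duplicate ignored, tree unchanged
    }

    public static void main(String[] args) {
        BstSet<Integer> set = new BstSet<Integer>(Comparator.<Integer>naturalOrder());
        for (int v : new int[] {7, 2, 12, 1, 3, 6, 5, 4}) {
            set.insert(v);
        }
        System.out.println(set.insert(9));   // true:  9 was not in the tree
        System.out.println(set.insert(9));   // false: duplicate ignored
        System.out.println(set.contains(6)); // true
        System.out.println(set.contains(8)); // false
    }
}
```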
Deleting from a BST • Easy if the node is a leaf: • just delete it (for example, 4) [Figure: BST containing 1, 2, 3, 4, 5, 6, 7, 9, 12, with root 7]
Deleting from a BST • Easy if the node is a leaf: • just delete it • If it has only one child • we can re-attach the child to the deleted node’s parent (for example, 3) [Figure: the tree after deleting 4, containing 1, 2, 3, 5, 6, 7, 9, 12]
Deleting from a BST • Easy if the node is a leaf: • just delete it • If it has only one child • we can re-attach the child to the deleted node’s parent • But what if it has two children? • delete the 2, for example • now what? [Figure: the tree after deleting 3, containing 1, 2, 5, 6, 7, 9, 12]
Deleting from a BST • Need to keep the BST property • After 2 is deleted there will just be the 1, 5 and 6 [Figure: the tree containing 1, 2, 5, 6, 7, 9, 12, alongside the five possible BST arrangements of 1, 5 and 6]
Deleting from a BST • Find the minimum value in the right sub-tree • or the maximum in the left • Copy its value into the root • root was going to be deleted anyway • Delete the node you copied from • it’ll have at most one child [Figure: in the tree containing 1, 2, 5, 6, 7, 9, 12, the value 5 is copied over the 2]
Exercise • Show the BST that results when you delete 5 from this tree • (use “min on right” rule) • Show the BST that results when you delete 7 from this tree • use “min on right” rule [Figure: BST containing 1, 5, 6, 7, 9, 11, 12, with root 7]
BST delete Method • Returns true if deleted, false otherwise
    public boolean delete(E anItem) {
        int oldCount = numInTree;
        root = deleteNode(anItem, root);
        return numInTree < oldCount;
    }
• Exercise: write deleteNode • hint: it’s very similar to insertNode
Complexity • Average complexity: • on average tree is pretty balanced • contains, insert and delete all O(log N) • Worst case complexity: • insert data in order → one long list • contains, insert and delete all O(N) • There are ways to balance trees • Java’s TreeSet and TreeMap use Red-Black trees
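The gap between the average and worst case shows up in the height of the tree. The sketch below inserts the same keys into a plain, unbalanced BST twice, once in sorted order and once shuffled, and reports the height each time. The class BstHeightDemo and its methods are invented for this experiment, and the shuffled height depends on the random seed, so no exact value is claimed for it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class BstHeightDemo {
    private static final class Node {
        final int key;
        Node left;
        Node right;

        Node(int key) {
            this.key = key;
        }
    }

    /** Inserts the keys in the given order into an unbalanced BST; returns its height in nodes. */
    public static int heightAfterInserting(List<Integer> keys) {
        Node root = null;
        for (int key : keys) {
            root = insert(root, key);
        }
        return height(root);
    }

    private static Node insert(Node cur, int key) {
        if (cur == null) {
            return new Node(key);
        }
        if (key < cur.key) {
            cur.left = insert(cur.left, key);
        } else if (key > cur.key) {
            cur.right = insert(cur.right, key);
        }
        return cur;
    }

    private static int height(Node cur) {
        return cur == null ? 0 : 1 + Math.max(height(cur.left), height(cur.right));
    }

    public static void main(String[] args) {
        List<Integer> sorted = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            sorted.add(i);
        }
        List<Integer> shuffled = new ArrayList<>(sorted);
        Collections.shuffle(shuffled, new Random(42));

        // Sorted input degenerates into one long chain: height equals N.
        System.out.println("sorted:   " + heightAfterInserting(sorted));
        // Shuffled input usually gives a height that is a small multiple of log N.
        System.out.println("shuffled: " + heightAfterInserting(shuffled));
    }
}
```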
Hash Table Operations • Hash tables are designed to make insertion and finding particular elements fast • both in O(1) average time • deletion also supported in O(1) average time • Other operations expensive or impossible • no way to find the minimum/maximum except by looking at every element • no information on order used at all
Simple-Minded Structure • Simplest way to get O(1) is to have an array with one cell per possible data element • Example: keys drawn from 0..19 • A is an array indexed by 0..19 • A[i] is null if the item with key i is not present • A[i] is the data element if it is present
Simple-Minded Structure • Insertion = create new item & set A[i] • Find = look at A[i] • Example array, cells 00..19, all null except: • A[01] = Smith Brian • A[07] = Wilson Debra • A[14] = Chan Paulina • A[17] = Lafleur Denis • A[18] = Burns Monty
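The one-cell-per-key structure is small enough to write out in full. A minimal sketch, assuming integer keys in 0..size-1 and String data; the class name DirectAddressTable and its method names are illustrative.

```java
public class DirectAddressTable {
    private final String[] cells; // cells[i] is null when key i is not present

    public DirectAddressTable(int keySpaceSize) {
        cells = new String[keySpaceSize];
    }

    /** Insertion: set A[i]. O(1). */
    public void insert(int key, String data) {
        cells[key] = data;
    }

    /** Find: look at A[i]. O(1). Returns null if the key is not present. */
    public String find(int key) {
        return cells[key];
    }

    /** Deletion: clear A[i]. O(1). */
    public void delete(int key) {
        cells[key] = null;
    }

    public static void main(String[] args) {
        DirectAddressTable table = new DirectAddressTable(20);
        table.insert(1, "Smith Brian");
        table.insert(7, "Wilson Debra");
        table.insert(14, "Chan Paulina");
        System.out.println(table.find(7)); // Wilson Debra
        System.out.println(table.find(8)); // null
    }
}
```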
Large Key Spaces • Key spaces can be very large • key is a 20 character string • coded as ASCII: 128^20 = 2^140 ≈ 10^42 possible keys • Impossible to get that much space • Generally only have a few keys • don’t need that much space • few thousands or millions (up to about 10^7)
Hash Table Idea • Make the array much smaller • maybe about twice the number of items you’re expecting • Divvy up the keys between the array cells • function from keys to array indices: “hashing” • try to keep them spread out = avoid “collisions”
Hashing • Keys in range 1..100 • take key mod 20 as index into the array • 41 Smith Brian: 41 mod 20 = 1 • 94 Chan Paulina: 94 mod 20 = 14 • 78 Burns Monty: 78 mod 20 = 18 • 27 Wilson Debra: 27 mod 20 = 7 • 37 Lafleur Denis: 37 mod 20 = 17 • cells 01, 07, 14, 17 and 18 of the 20-cell array (00..19) are occupied; all others are null
Note on Hash Table Size • Want HT size to be a prime number • will explain why later • 20 not a good size for a hash table • 19 or 23 would be better • Will use 20 anyway in these notes • easier to do the math! • computer doesn’t have to worry about that….
Note on Hashing Function • Usually broken into two parts • Hash function proper translates key to integer • string → number • number → other number • Result mapped into table using mod (%) • For now just keep it really simple • just using positive integer values and mod
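The two-part scheme can be sketched with Java's built-in hashCode as the "hash function proper" and a mod step to map the result into the table. The class and method names (HashIndexDemo, indexFor) are made up for this example. Math.floorMod is used instead of % because hashCode can be negative for some keys, and % would then give a negative index.

```java
public class HashIndexDemo {
    /** Part 1: key -> integer via hashCode(); part 2: integer -> table index via mod. */
    public static int indexFor(Object key, int tableSize) {
        int hash = key.hashCode();              // for an Integer key this is the int value itself
        return Math.floorMod(hash, tableSize);  // always in 0..tableSize-1
    }

    public static void main(String[] args) {
        System.out.println(indexFor(41, 20)); // 1
        System.out.println(indexFor(94, 20)); // 14
        System.out.println(indexFor(78, 20)); // 18
        // A string key goes through the same two steps; the index depends on String.hashCode().
        System.out.println(indexFor("nonmonotonic", 20));
    }
}
```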
Hashing Exercise • Place the following keys into the hash table • 42, 17, 81, 20, 73 • Hash function = key mod 20 • h(42) = 42 % 20 = 2 • the table has 20 cells (00..19), all initially empty
Collisions • Collision happens when two items hash to same array location • table holds 41, 27, 94, 37 and 78 in cells 01, 07, 14, 17 and 18 • item 58 wants to go in position 18 • item 78 is already using it: “collision”
Collision Frequency • Take a 1000 cell table & good hash function • each cell is equally likely to be hashed to • Chance of collision: • 1st item: 0/1000 • 2nd item: 1/1000 • 3rd item: 2/1000 • … • Nth item: (N–1)/1000
Collision Frequency • What’s the chance of at least one collision? • 1 entry = 1 – (1000/1000) = 0 • 2 entries = 1 – (1000/1000)*(999/1000) = 1/1000 • 3 entries = 1 – (1000/1000)*(999/1000)*(998/1000) ≈ 3/1000 • 4 entries ≈ 6/1000 • 5 entries (table 0.5% full) ≈ 10/1000 = 1% • 100 entries (table 10% full)? about 99.4% (!)
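The figures on this slide follow from multiplying the "no collision" chances for each successive item and subtracting from 1. A short sketch that does the arithmetic, assuming every cell is equally likely; the names CollisionOdds and atLeastOneCollision are illustrative.

```java
public class CollisionOdds {
    /**
     * Chance that at least two of the given number of entries land in the same cell,
     * assuming each cell of the table is equally likely for every entry.
     */
    public static double atLeastOneCollision(int entries, int cells) {
        double noCollision = 1.0;
        for (int i = 0; i < entries; i++) {
            // The (i+1)-th item must avoid the i cells that are already occupied.
            noCollision *= (double) (cells - i) / cells;
        }
        return 1.0 - noCollision;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 3, 4, 5, 100}) {
            System.out.printf("%3d entries: %.4f%n", n, atLeastOneCollision(n, 1000));
        }
    }
}
```

For 100 entries in 1000 cells the result is about 0.994, matching the slide: even a table that is only 10% full almost certainly has at least one collision.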
Dealing with Collisions • Two ways of dealing with the problem • Open addressing • put the new item somewhere else • need to figure out where • Separate chaining • make a chain of all the elements that hash to a given location
Open Addressing • Put item somewhere else in the array • Need to be able to find it again • need some rules that can be followed • Stupid rules make things very bad! • Clever rules perform very well • clever rules are hard to describe/understand • Java’s IdentityHashMap uses open addressing (linear probing); HashMap and HashSet use separate chaining
Chained Hash Tables • Array elements are linked lists instead of data element pointers • data stored in nodes [Figure: 20-cell array (00..19); cells 01, 07, 14, 17 and 18 point to chains holding 41, 27, 94, 37 and, in cell 18, both 58 and 78]
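A chained table for integer keys can be sketched as follows. This is an illustration under stated assumptions, not how any particular library does it: the class ChainedHashSet and its methods are invented here, the hash function is the slides' simple key mod table size, and every cell starts with an empty list instead of null to keep the code short.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ChainedHashSet {
    private final List<LinkedList<Integer>> table = new ArrayList<>();

    public ChainedHashSet(int size) {
        for (int i = 0; i < size; i++) {
            table.add(new LinkedList<>()); // one (initially empty) chain per cell
        }
    }

    private LinkedList<Integer> chainFor(int key) {
        return table.get(Math.floorMod(key, table.size()));
    }

    /** Adds the key to its chain; returns false if it was already there. */
    public boolean insert(int key) {
        LinkedList<Integer> chain = chainFor(key);
        if (chain.contains(key)) {
            return false;
        }
        chain.add(key);
        return true;
    }

    public boolean contains(int key) {
        return chainFor(key).contains(key);
    }

    public boolean remove(int key) {
        // Integer.valueOf forces remove(Object); remove(int) would treat key as a list position.
        return chainFor(key).remove(Integer.valueOf(key));
    }

    /** Number of keys stored in the given cell's chain. */
    public int chainLength(int cell) {
        return table.get(cell).size();
    }

    public static void main(String[] args) {
        ChainedHashSet set = new ChainedHashSet(20);
        for (int key : new int[] {41, 27, 94, 37, 78, 58}) {
            set.insert(key);
        }
        System.out.println(set.chainLength(18)); // 2: both 78 and 58 hash to cell 18
        System.out.println(set.contains(58));    // true
    }
}
```

A collision no longer needs a rule for finding another cell: the colliding key simply joins the chain, and lookups scan only that one chain.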