Understand linear & binary search, hash functions, probing, collisions, load factor, chaining. Learn to implement hashing schemes effectively.
CS 240: Data Structures • Tuesday, July 24th • Searching, Hashing, Graphs
Assignments • Self-Revisions for Lab 5 and Lab 6 are due next Tuesday (the writeup is pivotal in terms of a grade) – July 31st at 4pm. • Project 1 revisions need to be submitted by next Thursday, August 2nd at 4pm. • New defenses can be scheduled
The Test • The code for Lab 7 will be released early (before the lab is due). • Make sure you understand the questions. • Your lab 7 grade will be based on your answer to the Lab 7 questions on the test. • Know the first three sorts and either quicksort or mergesort. • You need to understand how to do a radix sort. • Know two of the lists from the presentation today – one should be what you actually did today.
Puppets. Now, dance!
Searches • Linear Search • Binary Search • Not too good with linked list… • This is why we add more links (trees!) • Hashing, an insertion/searching combo • See Chapter 9.3, Hash Tables (p. 530)
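As a refresher on the two searches named above, here is a minimal sketch in C++. The function names linearSearch and binarySearch are made up for this example, and it assumes a vector of ints rather than whatever container your lab uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Linear search: check every element in order. Works on unsorted data.
// Returns the index of target, or -1 if it is absent.
int linearSearch(const std::vector<int>& data, int target)
{
  for (std::size_t i = 0; i < data.size(); ++i) {
    if (data[i] == target) return static_cast<int>(i);
  }
  return -1;
}

// Binary search: halve the search range on every step.
// Requires sorted data; the assert documents that precondition.
int binarySearch(const std::vector<int>& data, int target)
{
  assert(std::is_sorted(data.begin(), data.end()));
  std::size_t low = 0;
  std::size_t high = data.size(); // search the half-open range [low, high)
  while (low < high) {
    std::size_t mid = low + (high - low) / 2;
    if (data[mid] == target) return static_cast<int>(mid);
    if (data[mid] < target) low = mid + 1;
    else high = mid;
  }
  return -1;
}
```

Binary search jumps straight to data[mid], which is cheap in a vector but means walking node by node in a linked list. That is the reason it is "not too good with linked list".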
Hashing • What is hashing? • Corned beef hash: Hash(Corn(Beef)): [image]
Hashing • Hash Browns: Hash([image]): [image]
Hashing • So, what does this have to do with anything? Well…. Maybe we should look at real hash browns… Much better!
Hashing • The point is: We have no idea what is in corned beef hash. We have no idea what is in hash browns. However, Hash(Corn(Beef)) comes out the same no matter what Beef is. Beef is generic! The same goes for hash browns too…
Hashing • Hashing lets us represent some “data” with some smaller “data”. The representation is not perfect! Look at the corned beef hash! But it is consistent. That makes it useful!
Hashing • Ok, back to seriousness for a moment: • Remember the algorithmic complexity of our various searches? • Linear Search = O(n) • Binary Search = O(log n) • Balanced Binary Search Tree = O(log n) • Why do we care if it is balanced? Because an unbalanced tree can be as bad as a linear search! We’ll leave fixing this for another time.
Hashing • Other than making corned beef, there are other, more useful, hashing schemes. Consider this: Instead of putting all the records on computers, Binghamton University decides to keep only paper records of grades, due to malicious CS students changing their grades each morning. Now, you need some money. You get this cushy work-study job pulling up folders to answer grade requests. Sounds good, right?
Hashing • So, if I ask you for the grades for “El Zilcho” (first name “El”, last name “Zilcho”) how do you find them? Linear search, right? We start from Alan Aardvark! No: you start by going to “Z”. But how did you know to do that (if nobody suggested this, stop lesson, go home and cry)? You are a born bureaucrat!
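Hashing • Hashing by first letter is a common hash.
#include <string>
using std::string;
//firstletterhash represents h(x)
//tohash represents x (assumed non-empty; at(0) throws on an empty string)
int firstletterhash(const string& tohash)
{
  //go through unsigned char so the result always lands in 0..25
  return static_cast<unsigned char>(tohash.at(0)) % 26;
}
With a small enough list we can search pretty quickly!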
Hashing • The first letter implementation requires that we have 26 entries. • If we only have a few entries we are wasting space! • A tradeoff decision must be made! What are the tradeoffs?
Hashing • Ok, we are done. • You know all there is to know about hashing. • Cool. • A winner is you. Alright, quick quiz. Let us make a first-letter hash table. Add the following: Apple, Alabama. Uh oh. Now what?
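Hashing • We have a collision! One solution is linear probing: if the home slot is already taken, step forward one slot at a time (wrapping around at the end of the table) until an empty slot turns up. Finding stuff isn’t too much harder. What about deleting stuff?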
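Here is one way the Apple/Alabama collision could play out in code. This is a sketch under assumptions, not the lab solution: the class name ProbingTable and its members are invented for illustration, duplicates are not checked, and deletion uses "tombstone" markers, which is one common answer to the deleting question rather than the only one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical first-letter hash table with linear probing.
class ProbingTable
{
private:
  enum class State { Empty, Full, Deleted };
  struct Slot { std::string key; State state = State::Empty; };
  std::vector<Slot> slots;

  // Same idea as the first-letter hash: first character mod table size.
  std::size_t hash(const std::string& key) const
  {
    assert(!key.empty());
    return static_cast<unsigned char>(key[0]) % slots.size();
  }

public:
  explicit ProbingTable(std::size_t size = 26) : slots(size) { assert(size > 0); }

  // Returns the slot index used, or -1 if the table is full.
  int insert(const std::string& key)
  {
    std::size_t start = hash(key);
    for (std::size_t step = 0; step < slots.size(); ++step) {
      std::size_t i = (start + step) % slots.size(); // wrap around the end
      if (slots[i].state != State::Full) {           // empty or tombstone: reuse it
        slots[i].key = key;
        slots[i].state = State::Full;
        return static_cast<int>(i);
      }
    }
    return -1;
  }

  // Returns the slot index holding key, or -1 if it is not in the table.
  int find(const std::string& key) const
  {
    std::size_t start = hash(key);
    for (std::size_t step = 0; step < slots.size(); ++step) {
      std::size_t i = (start + step) % slots.size();
      if (slots[i].state == State::Empty) return -1; // never-used slot ends the probe
      if (slots[i].state == State::Full && slots[i].key == key) return static_cast<int>(i);
    }
    return -1;
  }

  // Marks the slot as Deleted instead of Empty so later probes keep going.
  bool remove(const std::string& key)
  {
    int i = find(key);
    if (i < 0) return false;
    slots[i].state = State::Deleted;
    slots[i].key.clear();
    return true;
  }
};
```

With the default 26 slots, 'A' is character code 65 and 65 % 26 is 13, so Apple lands in slot 13 and Alabama probes forward to slot 14. If removing Apple simply emptied slot 13, a later search for Alabama would stop at the empty slot and wrongly report "not found"; the Deleted marker avoids that.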
Hashing • Some options: • Larger table • Different collision scheme • Better hash function (MD5?) Protip: Hash tables should be about 1.5-2 times as large as the number of items to store to keep collisions low.
But… I really like linear probing The point is: “too bad” You have more to learn! There is always more. Look how long I’ve been here…. No, don’t. It makes me feel old.
But… I really like linear probing • Linear probing can cluster data! You can probe quadratically: i – 1, i + 1, i – 4, i + 4, i – 9, i + 9, …, i – n², i + n² • Better… but… How about a secondary hash? These can be really useful! Casting, Mapping, Folding, Shifting
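The two probing alternatives can be sketched as small index calculators. Both function names are hypothetical, and the secondary hash used here (1 + key % (tableSize - 1)) is just one textbook choice picked for illustration, not something the slides specify.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Quadratic probing in the order i-1, i+1, i-4, i+4, i-9, i+9, ...
// Every index is wrapped back into the range [0, tableSize).
std::vector<std::size_t> quadraticProbes(std::size_t home, std::size_t tableSize,
                                         std::size_t rounds)
{
  assert(tableSize > 0 && home < tableSize);
  std::vector<std::size_t> order;
  for (std::size_t n = 1; n <= rounds; ++n) {
    std::size_t offset = (n * n) % tableSize;
    order.push_back((home + tableSize - offset) % tableSize); // i - n^2
    order.push_back((home + offset) % tableSize);             // i + n^2
  }
  return order;
}

// Secondary hash (double hashing): the step size depends on the key,
// so keys that share a home slot usually follow different probe paths.
std::size_t doubleHashProbe(std::size_t home, std::size_t key,
                            std::size_t attempt, std::size_t tableSize)
{
  assert(tableSize > 1);
  std::size_t step = 1 + key % (tableSize - 1); // assumed second hash; never 0
  return (home + attempt * step) % tableSize;
}
```

For a home slot of 13 in a 26-slot table, the first three quadratic rounds visit 12, 14, 9, 17, 4, 22. With double hashing, a prime table size is the usual recommendation so that every step size can eventually reach every slot.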
More hashes? Are you sold? • Well, some of you may have thought of this: Isn’t this similar to the example we started with?
Hashes • How long should the hash function take? • Moreover, why does it matter? No matter what the data is (as long as it is the correct type) the hash function needs to be able to evaluate it!
Hashes • Some theory: • Load Factor = Num Elements in Table / Table Size • Open addressing/Closed Hashing: Use probing • When we don’t use a linked list (we use probing) our load factor should be < 0.5 • Closed addressing/Open Hashing: Use chaining (linked list) • But, if we do use a linked list then we want the load factor to be closer to 1. Why?
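A chained table makes the load factor easy to see. The sketch below is illustrative only: ChainedTable and its member names are invented, and it reuses the first-letter idea as its hash.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical chained hash table: each bucket is a linked list of keys.
class ChainedTable
{
private:
  std::vector<std::list<std::string>> buckets;
  std::size_t count;

  // First character mod number of buckets.
  std::size_t hash(const std::string& key) const
  {
    assert(!key.empty());
    return static_cast<unsigned char>(key[0]) % buckets.size();
  }

public:
  explicit ChainedTable(std::size_t size = 26) : buckets(size), count(0)
  {
    assert(size > 0);
  }

  // A collision just makes the bucket's list one node longer.
  void insert(const std::string& key)
  {
    buckets[hash(key)].push_back(key);
    ++count;
  }

  // Only the one bucket the key hashes to needs a linear search.
  bool contains(const std::string& key) const
  {
    for (const std::string& stored : buckets[hash(key)]) {
      if (stored == key) return true;
    }
    return false;
  }

  // Load Factor = Num Elements in Table / Table Size
  double loadFactor() const
  {
    return static_cast<double>(count) / static_cast<double>(buckets.size());
  }
};
```

With chaining, the load factor is also the average chain length. A value near 1 means about one item per bucket on average, so little space is wasted and searches stay short. Unlike a probing table, nothing breaks if the load factor goes above 1; the chains just get longer.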
Uh oh. • New topic. • You will miss the hash. • Maybe not.
Graphs • So far, all of our Nodes only point to one other node • But, they can point to multiple nodes! • This changed today with the linked list presentations: • Next and previous pointer • Multiple pointers based on pieces of data
Trees • First, we generally don’t count a previous pointer as a pointer. • Our linked lists point to 1 other node (not counting special lists) hence a unary list. • However, we can point to two different nodes. A path “next” and “othernext”. • For a tree: “left” and “right”
Graphs… • We will talk more about trees next week. • A graph has an unlimited number of pointers that can point anywhere.
Representation • Now, our Node needs new data: • A list of Node* instead of just “next” • Some way to select a “next” • Graphs will often take distance between Nodes into account (so far, our distances have been irrelevant) • Hence each Node* is associated with a distance • We can store this as a “pair<Node*,int>” • Requires #include <utility>
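A minimal sketch of such a Node follows. The member names (data, edges, connect, nearest) are hypothetical, and "pick the closest neighbor" is only one possible way to select a "next".

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical graph node: a list of (neighbor, distance) pairs
// takes the place of a single "next" pointer.
struct Node
{
  std::string data;
  std::vector<std::pair<Node*, int>> edges;

  // Adds a one-way link from this node to another node.
  void connect(Node* to, int distance)
  {
    assert(to != nullptr);
    edges.push_back(std::make_pair(to, distance));
  }

  // One way to select a "next": the neighbor with the smallest distance.
  // Returns nullptr when this node has no outgoing links.
  Node* nearest() const
  {
    Node* best = nullptr;
    int bestDistance = 0;
    for (const std::pair<Node*, int>& edge : edges) {
      if (best == nullptr || edge.second < bestDistance) {
        best = edge.first;
        bestDistance = edge.second;
      }
    }
    return best;
  }
};
```

A singly linked list falls out of this as the case where every node has exactly one entry in edges and the distance is ignored.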
Linked List • A Linked List is a special case of a Graph. • It has nodes with only 1 Node* (list size == 1) • And the distance between each Node is the same (no value is needed, but we might as well say 0).