Data Structures Test: Searching, Hashing, Graphs

CS 240: Data Structures Tuesday, July 15th Searching, Hashing, Graphs

The Test • Make sure you know the sorts. • There will be more coding on this test than the previous.

Sort Analysis • Well, first we need to explain more structurally how these work: • Mergesort: • Split a list into smaller parts (half size) until each list is of size 1. • Put the lists back together by “merging”: repeatedly take the smaller front value of two sorted lists (each value is placed in order, much like insertion sort). • Quicksort: • Select a value (the pivot) and ensure that all values to its left are smaller and all values to its right are equal or larger. • Repeat with the left and right sides until they are of size 1 or 2 (and sorted).
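The two descriptions above can be sketched roughly as follows. This is a minimal illustration on vectors of int, not the course's reference code; the names mergeSorted, mergeSort and quickSort are hypothetical, and the quicksort picks the last value as its pivot, which is only one of several possible choices.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Merge two already-sorted lists by repeatedly taking the smaller front value.
std::vector<int> mergeSorted(const std::vector<int>& left,
                             const std::vector<int>& right)
{
    std::vector<int> merged;
    merged.reserve(left.size() + right.size());
    std::size_t i = 0, j = 0;
    while (i < left.size() && j < right.size()) {
        if (right[j] < left[i]) merged.push_back(right[j++]);
        else                    merged.push_back(left[i++]);
    }
    while (i < left.size())  merged.push_back(left[i++]);   // leftovers
    while (j < right.size()) merged.push_back(right[j++]);
    return merged;
}

// Split in half until a list has 0 or 1 values, then merge on the way back.
std::vector<int> mergeSort(const std::vector<int>& values)
{
    if (values.size() <= 1) return values;
    const auto mid = values.begin() +
                     static_cast<std::ptrdiff_t>(values.size() / 2);
    const std::vector<int> left(values.begin(), mid);
    const std::vector<int> right(mid, values.end());
    return mergeSorted(mergeSort(left), mergeSort(right));
}

// Quicksort on values[low..high]; the pivot is the last value (an assumption).
void quickSort(std::vector<int>& values, int low, int high)
{
    if (low >= high) return;
    const int pivot = values[high];
    int boundary = low;                      // everything left of boundary < pivot
    for (int k = low; k < high; ++k) {
        if (values[k] < pivot) std::swap(values[k], values[boundary++]);
    }
    std::swap(values[boundary], values[high]);   // pivot lands between the halves
    quickSort(values, low, boundary - 1);        // smaller values
    quickSort(values, boundary + 1, high);       // equal or larger values
}
```

Note that mergeSort returns a new sorted vector, while quickSort rearranges the vector it is given.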

Indirect Sorting • Well, remember the difference between pass by value and pass by reference in terms of speed? • Sometimes you need to sort large objects! • You can use pointers! Fast access! Avoid copying data around!
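As a sketch of the idea, assuming a made-up StudentRecord type standing in for a "large object": sort a vector of pointers to the records instead of the records themselves, so only pointers get moved around. The struct and the function name sortedByName are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical "large object"; imagine many more fields.
struct StudentRecord {
    std::string name;
    int id;
    std::vector<int> grades;
};

// Returns pointers to the records in name order.
// The records themselves are never copied or moved.
std::vector<const StudentRecord*> sortedByName(
    const std::vector<StudentRecord>& records)
{
    std::vector<const StudentRecord*> order;
    order.reserve(records.size());
    for (const StudentRecord& record : records) order.push_back(&record);

    std::sort(order.begin(), order.end(),
              [](const StudentRecord* a, const StudentRecord* b) {
                  return a->name < b->name;
              });
    return order;
}
```

The pointers stay valid only as long as the original vector is not resized or destroyed.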

Next week… • Second exam next Monday! Yay! + 1

Searches! • Linear search • You know this, you know you do. • Binary search • Why is this a problem with linked list?
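For comparison, here is a minimal sketch of both searches on a vector of int (function names are hypothetical). Binary search jumps straight to the middle element with sorted[mid]; a linked list has no such jump, since reaching the middle means walking the links one at a time, which is where the trouble starts.

```cpp
#include <cstddef>
#include <vector>

// Linear search: check every value in order. Returns the index, or -1.
int linearSearch(const std::vector<int>& values, int target)
{
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] == target) return static_cast<int>(i);
    }
    return -1;
}

// Binary search: the input must already be sorted in ascending order.
// Each step throws away half of the remaining range. Returns the index, or -1.
int binarySearch(const std::vector<int>& sorted, int target)
{
    int low = 0;
    int high = static_cast<int>(sorted.size()) - 1;
    while (low <= high) {
        const int mid = low + (high - low) / 2;
        if (sorted[mid] == target) return mid;
        if (sorted[mid] < target) low = mid + 1;   // target is in the upper half
        else                      high = mid - 1;  // target is in the lower half
    }
    return -1;
}
```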

Searches • Linear Search • Binary Search • Not too good with linked list… • This is why we add more links (trees!) • Hashing, an insertion/searching combo • See Chapter 9.3, Hash Tables (p. 530)

Hashing • What is hashing? • Corned beef hash: Hash(Corn(Beef)): [image]

Hashing • Hash Browns: Hash( ): [image]

Hashing • So, what does this have to do with anything? Well…. Maybe we should look at real hash browns… Much better!

Hashing • The point is: We have no idea what is in corned beef hash, no matter what Beef is. Beef is generic! We have no idea what is in hash browns, either. However, Hash(Corn(Beef)) always gives the same result: [image]. The same goes for hash browns…

Hashing • Hashing lets us represent some “data” with some smaller “data”. The representation is not perfect! Look at the corned beef hash! But it is consistent. That makes it useful!

Hashing • Ok, back to seriousness for a moment: • Remember the algorithmic complexity of our various searches? • Linear Search = O(n) • Binary Search = O(log n) • Balanced Binary Search Tree = O(log n) • Why do we care if it is balanced? Because this tree is as bad as a linear search! We’ll leave fixing this for another time.

Hashing • Other than making corned beef, there are other, more useful, hashing schemes: Consider this: Instead of putting all the records on computers, Binghamton University decides to keep only paper records of grades due to malicious CS students changing their grades each morning. Now, you need some money. You get this cushy work-study job pulling up folders to answer grade requests. Sounds good, right?

Hashing • So, if I ask you for the grades for “El Zilcho” (first name “El”, last name “Zilcho”) how do you find them? Linear search right? We start from Alan Aardvark! You start by going to “Z”. But, how did you know to do that (if nobody suggested this, stop lesson, go home and cry)? You are a born bureaucrat!

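Hashing • Hashing by first letter is a common hash.
    #include <string>
    using std::string;

    // firstletterhash represents h(x)
    // tohash represents x (must not be empty, or at(0) throws)
    int firstletterhash(string tohash)
    {
        // Character code modulo 26: always 0..25, but 'A' (code 65) lands in slot 13, not slot 0
        return int(tohash.at(0)) % 26;
    }
With a small enough list we can search pretty quickly!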

Hashing • The first letter implementation requires that we have 26 entries. • If we only have a few entries we are wasting space! • A tradeoff decision must be made! What are the tradeoffs?

Hashing • Ok, we are done. • You know all there is to know about hashing. • Cool. • A winner is you. Alright, quick quiz. Let us make a first-letter hash table. Add the following: Apple, Alabama. Uh oh. Now what?
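To see the "uh oh" concretely, here is a small sketch of a 26-slot first-letter table with no collision handling at all. The names firstLetterSlot and FirstLetterTable are made up for this example, and the slot function is a variant that maps 'A'/'a' to 0 through 'Z'/'z' to 25; it assumes every key is non-empty and starts with an ASCII letter.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <string>

// Assumption: key is non-empty and starts with an ASCII letter.
// 'A' or 'a' -> 0, 'B' or 'b' -> 1, ..., 'Z' or 'z' -> 25.
int firstLetterSlot(const std::string& key)
{
    return std::toupper(static_cast<unsigned char>(key.at(0))) - 'A';
}

struct FirstLetterTable {
    std::array<std::string, 26> slots;   // an empty string means "unused"

    // Returns false when the slot is already taken: a collision.
    bool insert(const std::string& key)
    {
        // at() throws if the first character was not a letter.
        std::string& slot =
            slots.at(static_cast<std::size_t>(firstLetterSlot(key)));
        if (!slot.empty()) return false;
        slot = key;
        return true;
    }
};
```

Inserting "Apple" succeeds, but "Alabama" wants the same slot and insert reports the collision instead of storing it.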

Hashing • We have a collision! One solution is linear probing: Finding stuff isn’t too much harder. What about deleting stuff?
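One possible sketch of linear probing, including one common answer to the deletion question: mark a removed slot as "deleted" (a tombstone) instead of emptying it, so later searches keep probing past it. The class name ProbingTable and its layout are assumptions made for this example, and keys must be non-empty because the home slot comes from the first character.

```cpp
#include <cstddef>
#include <string>
#include <vector>

class ProbingTable {
public:
    explicit ProbingTable(std::size_t capacity) : slots_(capacity) {}

    // Returns false only when the table has no free slot left.
    bool insert(const std::string& key)
    {
        if (slots_.empty()) return false;
        if (contains(key)) return true;              // already stored
        const std::size_t home = homeSlot(key);
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            Slot& slot = slots_[(home + step) % slots_.size()];
            if (slot.state != State::Occupied) {     // empty or tombstone
                slot.key = key;
                slot.state = State::Occupied;
                return true;
            }
        }
        return false;
    }

    bool contains(const std::string& key) const
    {
        return findIndex(key) != slots_.size();
    }

    // Deleting leaves a tombstone so probe chains are not cut short.
    bool erase(const std::string& key)
    {
        const std::size_t index = findIndex(key);
        if (index == slots_.size()) return false;
        slots_[index].state = State::Deleted;
        return true;
    }

private:
    enum class State { Empty, Occupied, Deleted };
    struct Slot {
        std::string key;
        State state = State::Empty;
    };

    // First-letter style hash; key.at(0) throws on an empty key.
    std::size_t homeSlot(const std::string& key) const
    {
        return static_cast<unsigned char>(key.at(0)) % slots_.size();
    }

    // Returns the slot index of key, or slots_.size() when it is absent.
    std::size_t findIndex(const std::string& key) const
    {
        if (slots_.empty()) return slots_.size();
        const std::size_t home = homeSlot(key);
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            const std::size_t index = (home + step) % slots_.size();
            const Slot& slot = slots_[index];
            if (slot.state == State::Empty) return slots_.size();  // chain ends
            if (slot.state == State::Occupied && slot.key == key) return index;
        }
        return slots_.size();
    }

    std::vector<Slot> slots_;
};
```

If erase simply emptied the slot, removing "Apple" would hide "Alabama", which sits one slot further along the same probe chain.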

Hashing • Some options: • Larger table • Different collision scheme • Better hash function (MD5?) Protip: Hash tables should be about 1.5-2 times as large as the number of items to store to keep collisions low.

But… I really like linear probing The point is: “too bad” You have more to learn! There is always more. Look how long I’ve been here…. No, don’t. It makes me feel old.

But… I really like linear probing Linear probing can cluster data! You can probe quadratically: i – 1, i + 1, i – 4, i + 4, i – 9, i + 9, …, i – n², i + n². Better… but… How about a secondary hash? These can be really useful! Casting, Mapping, Folding, Shifting
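The quadratic probe order listed above can be sketched as a function that lists which slots would be tried, wrapping around the ends of the table. The function name and the maxN cutoff are assumptions for this example; a real table would stop at the first usable slot rather than build the whole list.

```cpp
#include <cstddef>
#include <vector>

// Slots tried after a collision at `home`, in the order
// i - 1, i + 1, i - 4, i + 4, ..., i - n*n, i + n*n (for n = 1..maxN),
// wrapped into the range 0 .. tableSize - 1.
std::vector<std::size_t> quadraticProbeOrder(std::size_t home,
                                             std::size_t tableSize,
                                             std::size_t maxN)
{
    std::vector<std::size_t> order;
    if (tableSize == 0) return order;
    const std::size_t start = home % tableSize;
    for (std::size_t n = 1; n <= maxN; ++n) {
        const std::size_t offset = (n * n) % tableSize;
        // Adding tableSize before subtracting keeps the unsigned math from wrapping below 0.
        order.push_back((start + tableSize - offset) % tableSize);  // i - n*n
        order.push_back((start + offset) % tableSize);              // i + n*n
    }
    return order;
}
```

For a collision at slot 10 in a 26-slot table, the first three rounds try slots 9, 11, 6, 14, 1, 19.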

More hashes? Are you sold? • Well, some of you may have thought of this: Isn’t this similar to the example we started with?

Hashes • How long should the hash function take? • Moreover, why does it matter? No matter what the data is (as long as it is the correct type) the hash function needs to be able to evaluate it!

Hashes • Some theory: • Load Factor = Num Elements in Table / Table Size • Open addressing/Closed Hashing: Use probing • When we don’t use a linked list (we use probing) our load factor should be < 0.5 • Closed addressing/Open Hashing: Use chaining (linked list) • But, if we do use a linked list then we want the load factor to be closer to 1. Why?
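A rough sketch of chaining with a load factor calculation, assuming std::hash as the hash function and std::list as the linked list. The class name ChainedTable is made up for this example. With chaining the load factor is also the average chain length, so a value near 1 means about one item per chain.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <vector>

class ChainedTable {
public:
    // A table needs at least one bucket, so 0 is bumped up to 1.
    explicit ChainedTable(std::size_t bucketCount)
        : buckets_(bucketCount == 0 ? 1 : bucketCount) {}

    // Colliding keys simply share a bucket's linked list.
    void insert(const std::string& key)
    {
        std::list<std::string>& chain = buckets_[bucketOf(key)];
        for (const std::string& existing : chain) {
            if (existing == key) return;    // no duplicates
        }
        chain.push_back(key);
        ++count_;
    }

    bool contains(const std::string& key) const
    {
        for (const std::string& existing : buckets_[bucketOf(key)]) {
            if (existing == key) return true;
        }
        return false;
    }

    // Load Factor = Num Elements in Table / Table Size
    double loadFactor() const
    {
        return static_cast<double>(count_) /
               static_cast<double>(buckets_.size());
    }

private:
    std::size_t bucketOf(const std::string& key) const
    {
        return std::hash<std::string>{}(key) % buckets_.size();
    }

    std::vector<std::list<std::string>> buckets_;
    std::size_t count_ = 0;
};
```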

Yes, you can sort the chaining hash table.

Returning Data • Before we start the final leg of the class: • Let’s talk about returning values • Returning by value • Returning by reference • Can we return more than one thing?
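A small sketch of the three questions above, with hypothetical function names. Returning by value hands back a copy; returning by reference hands back an alias to something that must outlive the call (here, an element of the caller's vector); and one way to return more than one thing is to bundle the values in a std::pair. All three assume a non-empty vector and throw otherwise.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Return by value: the caller gets its own copy of the largest value.
int largestByValue(const std::vector<int>& values)
{
    int best = values.at(0);                 // throws if values is empty
    for (int v : values) {
        if (v > best) best = v;
    }
    return best;
}

// Return by reference: the caller gets an alias to the element itself,
// so assigning through it changes the vector.
int& largestByReference(std::vector<int>& values)
{
    std::size_t best = 0;
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (values[i] > values[best]) best = i;
    }
    return values.at(best);                  // throws if values is empty
}

// Return more than one thing: (smallest, largest) bundled in a pair.
std::pair<int, int> minAndMax(const std::vector<int>& values)
{
    int smallest = values.at(0);
    int largest = values.at(0);
    for (int v : values) {
        if (v < smallest) smallest = v;
        if (v > largest)  largest = v;
    }
    return std::make_pair(smallest, largest);
}
```

Never return a reference to a local variable: the local is gone once the function returns.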

Linked List • A linked list is a special case of a graph. • It has nodes with only 1 Node* • And the distance between each pair of linked Nodes is the same (no value is needed, but we might as well say 0).

Graphs • So far, all of our Nodes only point to one other node • But, they can point to multiple nodes!

Trees • First, we generally don’t count a previous pointer as a pointer. • Our linked lists point to 1 other node (not counting special lists) hence a unary list. • However, we can point to two different nodes. A path “next” and “othernext”. • For a tree: “left” and “right”
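Side by side, the difference is just the number of outgoing pointers per node. The struct names and the countNodes helper are assumptions made for this sketch.

```cpp
// Linked list node: one outgoing pointer.
struct ListNode {
    int value;
    ListNode* next;
};

// Tree node: two outgoing pointers, "left" and "right".
struct TreeNode {
    int value;
    TreeNode* left;
    TreeNode* right;
};

// Visiting a tree means following both pointers, not just one.
int countNodes(const TreeNode* root)
{
    if (root == nullptr) return 0;
    return 1 + countNodes(root->left) + countNodes(root->right);
}
```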

Graphs… • We will talk more about trees after the test. • A graph node can have an unlimited number of pointers, and they can point anywhere.

Representation • Now, our Node needs new data: • A list of Node* instead of just “next” • Some way to select a “next” • Graphs will often take distance between Nodes into account (so far, our distances have been irrelevant) • Hence each Node* is associated with a distance • We can store this as a “pair<Node*,int>” • Requires #include <utility>
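A minimal sketch of such a Node, assuming a std::vector holds the list of pair<Node*,int>. The helper names connect and nearestNeighbor are hypothetical, and "pick the closest neighbor" is just one possible way to select a "next".

```cpp
#include <utility>
#include <vector>

struct Node {
    int value;
    // Each entry is (neighbor, distance to that neighbor).
    std::vector<std::pair<Node*, int>> edges;
};

// Adds a one-way link from `from` to `to` with the given distance.
void connect(Node& from, Node& to, int distance)
{
    from.edges.push_back(std::make_pair(&to, distance));
}

// One way to select a "next": the neighbor with the smallest distance.
// Returns nullptr when the node has no outgoing links.
Node* nearestNeighbor(const Node& node)
{
    Node* best = nullptr;
    int bestDistance = 0;
    for (const std::pair<Node*, int>& edge : node.edges) {
        if (best == nullptr || edge.second < bestDistance) {
            best = edge.first;
            bestDistance = edge.second;
        }
    }
    return best;
}
```

The nodes are owned by whoever created them; the pointers stored in edges are only valid while those nodes are alive.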
