CS503: Fifth Lecture, Fall 2008: Recursion and Linked Lists. Michael Barnathan
Here’s what we’ll be learning: • Theory: • Recursion. • Recursive data structures. • The “Divide and Conquer” paradigm. • Memoization (as in “memo” + “ization”). • Data Structures: • Linked Lists. • We are going to keep coming back to recursion throughout the semester. • But it should be easier for you each time we cover it. • We’ll stop covering it when you’re sufficiently familiar with it.
Recursion: Definition • A function is recursive if it calls itself as a step in solving a problem. • Why would we want to do this? • A data structure is recursive if it can be defined in terms of a smaller version of itself. • Recursion is used when the problem can be broken down into smaller instances of the same problem. • Or “subproblems”. • These subproblems are more easily solved. • Often by breaking them into even smaller subproblems. • Of course, at some point, we have to stop. • The solutions to these subproblems are then merged into a solution for the whole problem.
Recursive Structures: Example. • Before we discussed sorting, I asked you how you would sort a 3 element array. • We couldn’t figure that out immediately, so I asked how to do it on a 2-element array. • Compare the elements and swap. • Then I asked you how to extend that to a 3-element array. • Do two comparisons. • A size-n array can be recursively defined as a single element and an array of size n-1. • The sorting problem was easier to solve for small arrays. • By extending the (easier) problem from small to large, we came up with a general sorting algorithm (bubble sort).
Recursion: Why? • Oftentimes, a problem can be solved by reducing it to a smaller version of itself. • If your solution is a function, you can call that function inside of itself to solve the smaller problems. • There are two components of a recursive solution: • The reduction: this generates a solution to the larger problem through solution of the smaller problems. • The base case: this solves the problem outright when it’s “small enough”. • The key: • You won’t be able to follow what is going on in every recursive call at once; don’t think of it this way. • Instead, think of a recursive function as a reduction procedure for reducing a problem to a smaller instance of itself. Only when the problem is very small do you attempt a direct solution.
Example: • Using a loop, print all of the numbers from 1 to n. void printTo(int n) { for (int i = 1; i <= n; i++) System.out.println(i); } • Now do it using recursion. • Stop and think before coding. Never rush into a recursive algorithm. • Problem: Print every number from 1 to n. • How can we break this problem up? • Print every number from 1 to n-1, then print n. • “Print n” is easy: System.out.println(n); • How can we print every number from 1 to n-1? • How about calling our solution with n-1? • This is going to call it with n-2… • This is going to call it with n-3... • … • And print n-3. • And print n-2. • And print n-1. • And print n. • And then we’re done! Right? Where does it end!?
When does it end!? • This is a fundamental question when writing any recursive algorithm: where do we stop? • The easiest place to stop is when n < 1. • What do we do when n < 1? • Well, we have already output all of the numbers from 1 to n. That was the goal. • So we do nothing. We “return;”.
So here it is. void printTo(int n) { if (n < 1) //Base case. return; else { //Recursive step: 1 to n is 1 to n-1 and n. printTo(n-1); //Print 1 to n-1. System.out.println(n); //Print n. } } • Question: What if we printed n before calling printTo? • Don’t try to trace each call. Think about what this is doing.
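To see what changes if n is printed before the recursive call, here is a small sketch (the class and method names are my own, and each version builds its output as a string rather than printing, so the two orders are easy to compare side by side):

```java
// PrintOrder: compare printing after vs. before the recursive call.
// Illustrative sketch; the class and method names are not from the lecture.
public class PrintOrder {

    // Recursive call first, then append n: yields 1 2 ... n (as in the slide).
    static String upTo(int n) {
        if (n < 1)
            return "";                 // Base case: nothing left to print.
        return upTo(n - 1) + n + " ";  // "Print" 1..n-1, then n.
    }

    // Append n first, then recurse: yields n ... 2 1 instead.
    static String downFrom(int n) {
        if (n < 1)
            return "";
        return n + " " + downFrom(n - 1);
    }

    public static void main(String[] args) {
        System.out.println(upTo(4));     // 1 2 3 4
        System.out.println(downFrom(4)); // 4 3 2 1
    }
}
```

Moving the print before the recursive call reverses the output order, because the recursion now reaches the base case before any smaller number is printed.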
The Reduction: • We reduced the problem of printing 1 .. n to the problem of printing 1 .. n-1 and printing n. • A smaller instance of the same problem. • So we solved it using the same function. • Peeling one element off the end at a time like this is often loosely called tail recursion. (Strictly speaking, a call is tail-recursive only when the recursive call is the function’s very last action; here the println runs after the call returns.) • When it became “small enough” (n < 1), we solved it directly. • By stopping, since the output was already correct. • If we kept going, we’d print 0, -1, …, which would be incorrect. • That’s how recursion works. • But this is a simple example. • What’s the complexity of this algorithm?
Splitting into pieces. • What we just saw was a “one piece” problem. • We reduced the problem to one smaller problem. • n -> (n-1). • These are the easiest to solve. • These solutions are usually linear. • What if we split the problem into two smaller problems at each step? • Say we wanted to find the nth number in the Fibonacci series.
Recursive Fibonacci: • The Fibonacci series is a series in which each term is the sum of the two previous terms. • Recursive definition. • The first two terms are 1. • Base case (without this, we’d have a problem). • It looks like this: • 1, 1, 2, 3, 5, 8, 13, 21, 34, … • And here’s its function: • F(n) = F(n-1) + F(n-2) • F(1) = F(2) = 1
Fibonacci: two-piece recursion. • In order to find the nth Fibonacci number, we simply add the (n-1)th and (n-2)th Fibonacci numbers. • Ok, so here’s a Java function fib(int n): int fib(int n) { } • What would we write for the base case?
Fibonacci base case int fib(int n) { if (n <= 2) return 1; } • That was simple. The recursive part?
Fibonacci recursive step int fib(int n) { if (n <= 2) return 1; return fib(n-1) + fib(n-2); } • Ok, that wasn’t too bad.
Solving multi-piece recursion. • Often you get a direct solution from the recursive call in one-piece recursion. • But when you split into more than one piece, you must often merge the solutions. • In the case of Fibonacci, we did this by adding. • Sometimes it will be more complex than this. • Recursion usually looks like this: • Call with smaller problems. • Stop at the base case. • Merge the subproblems as we go back up.
Divide and Conquer • The practice of splitting algorithms up into manageable pieces, solving the pieces, and merging the solutions is called the “divide and conquer” algorithm paradigm. • It is a “top down” approach, in that you start with something big and split it into smaller pieces. • Recursion isn’t necessary for these algorithms, but it is often useful.
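As a tiny illustration of the paradigm (my own example, not from the lecture), here is a divide-and-conquer sum of an array: split the range in half, solve each half recursively, and merge the two partial solutions with addition:

```java
// DivideSum: a divide-and-conquer sum over a half-open range [lo, hi).
// Illustrative sketch; names are mine.
public class DivideSum {

    static int sum(int[] a, int lo, int hi) {
        if (hi - lo == 0)
            return 0;          // Base case: empty range.
        if (hi - lo == 1)
            return a[lo];      // Base case: a single element.
        int mid = (lo + hi) / 2;                   // Divide...
        return sum(a, lo, mid) + sum(a, mid, hi);  // ...solve both halves, merge with +.
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 4, 1, 5};
        System.out.println(sum(a, 0, a.length)); // 14
    }
}
```

Unlike naive Fibonacci, the two halves here never overlap, so no work is repeated and the algorithm stays linear.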
Memoization • What if we wanted to analyze fib? • Well, there’s one call to fib(n)… • Which makes two calls: fib(n-1) and fib(n-2)… • Which makes four calls: fib(n-2), fib(n-3), fib(n-3), and fib(n-4)… • Which makes eight… • … • Uh oh.
Why is it exponential? • So this is O(2^n). Not good. • And yet if you were to do it in a loop, you could do it in linear time. • But there’s something wrong with the way we’re calling that causes an exponential result. • There’s a lot of work being repeated. • fib(n-2) is computed twice, fib(n-3) three times, fib(n-4) five times: the repetition counts themselves grow like the Fibonacci numbers. • In fact, this is the reason why it’s exponential! • Fortunately, we can reduce this.
Memoization • “Memoization” (no “r”) is the practice of caching subproblem solutions in a table. • (Come to think of it, they could have left the “r” in and it would still be an accurate term). • So when we find fib(5), we store the result in a table. The next time it gets called, we just return the table value instead of recomputing. • So we save one call. What’s the big deal? • fib(5) calls fib(4) and fib(3), fib(4) calls fib(3) and fib(2), fib(3) calls… • You actually just saved an exponential number of calls by preventing fib(5) from running again.
Implementing Memoization • Use a member array for the table and wrap the actual work in a private function. • The fib() function looks up the answer in the table and calls that function if it’s not found. class Fibonacci { private int[] fibresults; //Or use a Vector for dynamic sizing. public Fibonacci(int maxN) { fibresults = new int[maxN + 1]; } public int fib(int n) { if (fibresults[n] <= 0) //Not in table. fibresults[n] = fib_r(n); //Fill it in. return fibresults[n]; } private int fib_r(int n) { //This does the real work. if (n <= 2) return 1; //Base case. return fib(n-1) + fib(n-2); //Note that we call “fib”, not “fib_r”. } }
The Gain • When we store the results of that extra work, this algorithm becomes linear. • Finding F(50) without memoization takes 1 minute and 15 seconds on rockhopper. • Finding F(50) with memoization takes 0.137s. • The space cost of storing the table is also linear. • Because the table is indexed by a single variable, n.
Optimal Solution to Fibonacci • The Fibonacci series has a closed form. • That means we can find F(n) in constant time: • F(n) = (phi^n - (-1/phi)^n) / sqrt(5). • Phi is the Golden Ratio, approx. 1.618. • It pays to research the problem you’re solving.
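The closed form translates directly to Java; here is a minimal sketch (class and method names are mine). One caveat my sketch assumes away: double-precision arithmetic makes this exact only for modest n (roughly up to F(70)), after which rounding error creeps in:

```java
// Binet: closed-form Fibonacci via the golden ratio (Binet's formula).
// Illustrative sketch; exact only while double precision suffices.
public class Binet {

    static long fib(int n) {
        double phi = (1 + Math.sqrt(5)) / 2;  // The Golden Ratio, approx. 1.618.
        // F(n) = (phi^n - (-1/phi)^n) / sqrt(5); round away floating-point fuzz.
        return Math.round((Math.pow(phi, n) - Math.pow(-1 / phi, n)) / Math.sqrt(5));
    }

    public static void main(String[] args) {
        System.out.println(fib(10)); // 55
        System.out.println(fib(2));  // 1
    }
}
```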
Linked Lists • We said that arrays were: • Contiguous. • Homogenous. • Random access. • What if we drop the contiguousness? • That is, adjacent elements in the list are no longer adjacent in memory. • It turns out that you lose random access, but gain some other properties in return.
Linked Lists • A linked list is simply a collection of elements in which each points to the next element. • For example: 1 → 2 → 3 • This is accomplished by storing a reference to the next node in each node: class Node<DataType> { public DataType data; public Node<DataType> next; }
Variations • Doubly linked lists contain pointers to the next and previous nodes (1 ⇄ 2 ⇄ 3). The Java “LinkedList” class is doubly-linked. • This class has a similar interface to Vector. • Circularly linked lists are linked lists in which the last element points back to the first: 0 → 1 → 2 → 3 → back to 0. • Seldom used, usually for “one every x” problems. • To traverse one of these, stop when the next element is equal to where you started.
CRUD: Linked Lists. • Insertion: ? • Access: ? • Updating an element: ? • Deleting an element: ? • Search: ? • Merge: ? • Let’s start with access and insertion.
Node Access • Elements are no longer contiguous in memory. • We can no longer jump to the ith element. • Now we have to start at the beginning and follow the reference to the next node i times. • Therefore, access is linear. • This is called sequential access. • Because every node must be visited in sequence. • Example: to access element 3 in the list 1 → 2 → 3, start at 1 and follow two next references.
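A minimal sketch of sequential access (the Node constructor and the get helper are my additions; the slide’s Node class only declares the fields, and I use int data for brevity):

```java
// AccessDemo: sequential access in a singly linked list.
// The Node constructor and get() are illustrative additions.
public class AccessDemo {

    static class Node {
        int data;
        Node next;
        Node(int data, Node next) { this.data = data; this.next = next; }
    }

    // Build the sample list 1 -> 2 -> 3.
    static Node sample() {
        return new Node(1, new Node(2, new Node(3, null)));
    }

    // Follow the next reference i times: O(i), hence O(n) in the worst case.
    static int get(Node head, int i) {
        Node cur = head;
        for (int k = 0; k < i; k++)
            cur = cur.next;
        return cur.data;
    }

    public static void main(String[] args) {
        System.out.println(get(sample(), 2)); // the element at index 2 is 3
    }
}
```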
Node Insertion • To insert into a list, all we need to do is change the “next” pointer of the node before it and point the new node to the one after. • This is a constant-time operation (provided we’re already in position to insert). • Example: Node1.next = new Node(5, Node1.next); • Insert “5” after “1”: 1 → 5 → 2 → 3
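Here is a self-contained version of that insertion (the two-argument Node constructor is an assumption of mine; the slide’s Node class declares only the fields, and I use int data for brevity):

```java
// InsertDemo: constant-time insertion after a known node.
// The Node constructor and helper names are illustrative additions.
public class InsertDemo {

    static class Node {
        int data;
        Node next;
        Node(int data, Node next) { this.data = data; this.next = next; }
    }

    // Build 1 -> 2 -> 3, then insert 5 after the first node.
    static Node buildAndInsert() {
        Node head = new Node(1, new Node(2, new Node(3, null)));
        head.next = new Node(5, head.next); // the slide's one-liner
        return head;                        // now 1 -> 5 -> 2 -> 3
    }

    // Render the list as a space-separated string.
    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node cur = head; cur != null; cur = cur.next) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(cur.data);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(buildAndInsert())); // 1 5 2 3
    }
}
```

Note that the new node is pointed at the old successor before the predecessor’s pointer is redirected, so no node is ever lost.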
Merging Two Lists • Linked lists have the unique property of permitting merges to be carried out in constant time (if you store the last node). • In an array, you’d need to allocate a large array and copy each of the two arrays to it. • In a list, you simply change the pointer before the target and the last pointer in the 2nd list. • Example: merge 6 → 5 → 4 into 1 → 2 → 3 at “2”, giving 1 → 6 → 5 → 4 → 2 → 3.
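One way to sketch that splice in code, under my reading of the slide (the node before the target is pointed at the second list’s head, and the second list’s tail is pointed at the target); the Node constructor and helper names are my additions:

```java
// MergeDemo: O(1) splice of one list into another, given the node
// before the splice point and the second list's tail.
public class MergeDemo {

    static class Node {
        int data;
        Node next;
        Node(int data, Node next) { this.data = data; this.next = next; }
    }

    // Two pointer changes, independent of either list's length.
    static void splice(Node before, Node head2, Node tail2) {
        tail2.next = before.next;  // 2nd list's tail points at the target.
        before.next = head2;       // node before the target points at 2nd list's head.
    }

    static Node demo() {
        Node head = new Node(1, new Node(2, new Node(3, null)));
        Node tail2 = new Node(4, null);
        Node head2 = new Node(6, new Node(5, tail2));
        splice(head, head2, tail2); // splice 6 -> 5 -> 4 in before "2"
        return head;                // 1 -> 6 -> 5 -> 4 -> 2 -> 3
    }

    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node cur = head; cur != null; cur = cur.next) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(cur.data);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(demo())); // 1 6 5 4 2 3
    }
}
```

Contrast this with the array case: there, the same merge forces an allocation and two full copies, which is O(n + m).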
CRUD: Linked Lists. • Insertion: O(1) • Access: O(N) • Updating an element: O(1) • Deleting an element: ? • Search: O(N). • Merge: O(1). • Binary search will not work on sequential access data structures. • Moving around the list to find the middle is O(n), so we may as well use linear search. • Updating a node is as simple as changing its value. • That leaves deletion.
Deletion • Insertion in reverse. • Easier to do in a doubly-linked list. • Store the next node from the target. • Remove the target node. • Set previous node’s next to the stored next. • Delete “5” from 1 → 5 → 2 → 3, giving 1 → 2 → 3.
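The three deletion steps above can be sketched as follows (a minimal sketch assuming we already hold the node before the target; the Node constructor and helper names are mine):

```java
// DeleteDemo: constant-time deletion given the node before the target.
public class DeleteDemo {

    static class Node {
        int data;
        Node next;
        Node(int data, Node next) { this.data = data; this.next = next; }
    }

    // Store the target's next, then bypass the target in one assignment.
    static void deleteAfter(Node prev) {
        if (prev.next != null)
            prev.next = prev.next.next;
    }

    static Node demo() {
        Node head = new Node(1, new Node(5, new Node(2, new Node(3, null))));
        deleteAfter(head);  // delete "5" from 1 -> 5 -> 2 -> 3
        return head;        // 1 -> 2 -> 3
    }

    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node cur = head; cur != null; cur = cur.next) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(cur.data);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(demo())); // 1 2 3
    }
}
```

In Java the unlinked node is simply garbage-collected once nothing references it; in a doubly linked list the same operation also updates the successor’s previous pointer.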
CRUD: Linked Lists. • Insertion: O(1) • Access: O(N) • Updating an element: O(1) • Deleting an element: O(1) • Search: O(N). • Merge: O(1). • Dynamically sized by nature. • Just stick a new node at the end. • Modifications are fast, but node access is the killer. • And you need to access the nodes before performing other operations on them. • Three main uses: • When search/access is not very important (e.g. logs, backups). • When you’re merging and deleting a lot. • When you need to iterate through the list sequentially anyway.
Les Adieux, L’Absence, Le Retour • That was our first lecture on recursion. • There will be others - it’s an important topic. • The lesson: • Self-similarity is found everywhere in nature: trees, landscapes, rivers, and even organs exhibit it. Recursion is not a primary construct for arriving at solutions, but a method for analyzing these natural patterns. • Next class: Linked Lists 2, Stacks, and Queues. • Begin thinking about project topics.