
CS 130 A: Data Structures and Algorithms

This course provides in-depth coverage of data structures and algorithms, with a focus on problem-solving and coding skills. Prerequisites include knowledge of stacks, queues, binary search trees, functions, recurrence equations, and programming competence in C, C++, and UNIX.


Presentation Transcript


  1. CS 130 A: Data Structures and Algorithms • Course webpage: www.cs.ucsb.edu/~suri/cs130a/cs130a • Email: suri@cs.ucsb.edu • Office Hours: 11-12 Wed

  2. CS 130A: Prerequisites • First upper division course • More in-depth coverage of data structures and algorithms • Prerequisites • CS 16: stacks, queues, lists, binary search trees, … • CS 40: functions, recurrence equations, proofs, … • Programming competence assumed • C, C++, and UNIX • Refresh your coding and debugging skills • Use TAs

  3. Text Book • Data Structures & Algorithm Analysis in C++ by Mark Allen Weiss • Supplemental material from Introduction to Algorithms, by Cormen, Leiserson, Rivest, Stein [MIT book] • Lecture material primarily based on my notes • Lecture notes available on my webpage • See web page for lectures updates, assignments.

  4. CS 130 A: Grade Composition • 2 Midterm exams (30% total) • 2 Programming assignments (30% total) • 1 Final exam (40%) • Homework assignments • They will not be graded: they are to help you practice problem solving and prepare for exams • Solving homework problems key to understanding. • Solutions will be made available, so you can self-assess your understanding and work with TAs to correct your mistakes. • Attend all lectures! • Schedule is tentative. • Unexpected changes in midterm/exam dates

  5. Some Advice and Caution • Posted schedule of lectures, assignments, exams is tentative • Reviews unplanned • Unexpected events may change dates of midterms • No makeup exams, no extensions. • Attend all lectures. • Read lecture notes (material) before coming to class.

  6. Teaching Assistants • Teaching Assistants: • Semih Yavuz (syavuz@cs.ucsb.edu) • Discussion: Wed 6:30-7:20 (GIRV 1119) • TA hours: TBA (Trailer 936) • Bay-Yuan Hsu (soulhsu@cs.ucsb.edu) • Discussion: Tues 6:30-7:20 (GIRV 1119) • TA hours: TBA (Trailer 936)

  7. Discussion Sections • No discussion section this week • Discussion Format • No new material discussed • It is meant as a help session • Use them to go over homework assignments • Programming pointers • But TAs are not there to help you write or debug code

  8. What the course is about • The course is primarily about Data Structures • Algorithms covered in small part (20%) • CS 130B is the main algorithms course • Data structures will be motivated by applications although we won’t discuss them in any detail

  9. What the course is about • This is a Theory course, not programming/systems • Primary focus on concepts, design, analysis, proofs • Includes 2 coding assignments, but no programming taught • C++, Unix competence expected • My teaching philosophy for 130A • Discovery and insights. Big picture. • Best understood in abstract form, with pen-paper • Alternative Style: learn by coding. (If coding is your thing, feel free to program the data structures.) • Exams on conceptual understanding, not coding details. • Homework exercises model for exam questions.

  10. Course Outline • Introduction and Algorithm Analysis (Ch. 2) • Hash Tables: dictionary data structure (Ch. 5, CLRS) • Heaps: priority queue data structures (Ch. 6) • Balanced Search Trees: general search structures (Ch. 4.1-4.5) • Union-Find data structure (Ch. 8.1–8.5, Notes) • Graphs: Representations and basic algorithms • Topological Sort (Ch. 9.1-9.2) • Minimum spanning trees (Ch. 9.5) • Shortest-path algorithms (Ch. 9.3.2) • B-Trees: External-Memory data structures (CLRS, Ch. 4.7) • kD-Trees: Multi-Dimensional data structures (Notes, Ch. 12.6) • Misc.: Streaming data, randomization (Notes)

  11. What are your goals? • A step towards the BS degree • Just a required CS course • Becoming a well-rounded computer scientist • Intellectual (theory) aspects of CS • Clever ideas • Interview questions at elite software companies

  12. My goals • Algorithms is my research expertise • A lively and enormously active area of research • Broad impact on almost every area of CS • My personal mission: • transmit some of the knowledge and enthusiasm • Win the best teacher award • Weekly Jokes • Send me your jokes!

  13. Why Study Algorithms and Data Structures? • Intellectual Pursuit

  14. Why Study Algorithms and Data Structures? • To become a better computer scientist

  15. Why Study Algorithms and Data Structures? • World domination

  16. Algorithms are Everywhere • Search Engines • GPS navigation • Self-Driving Cars • E-commerce • Banking • Medical diagnosis • Robotics • Algorithmic trading • and so on …

  17. Emergence of Computational Thinking • Computational X • Physics: simulate big bang, analyze LHC data, quantum computing • Biology: model life, brain, design drugs • Chemistry: simulate complex chemical reactions • Mathematics: non-linear systems, dynamics • Engineering: nano materials, communication systems, robotics • Economics: macro-economics, banking networks, auctions • Aeronautics: new designs, structural integrity • Social Sciences, Political Science, Law ….

  18. Emergence of Computational Thinking

  19. Modern World of Computing • Age of Big Data, birth of Data Science • Digitization, communication, sensing, imaging… • Entertainment, science, maps, health, environmental, banking… • Volume, variety, velocity, variability • What all happens in 1 Internet minute?

  20. Intelligent Computational Systems

  21. Why Data Structures? • Data is just the raw material for information, analytics, business intelligence, advertising, etc. • Computationally efficient ways of analyzing, storing, searching, modeling data • For the purpose of this course, the need for efficient data structures comes down to: • Linear search does not scale for querying large databases • N^2 processing or N^2 storage is infeasible • Smart data structures offer an intelligent tradeoff: • Perform near-linear preprocessing so that queries can be answered in much better than linear time
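
One familiar instance of this tradeoff, sketched under the assumption that the data is a set of plain integer keys: sort once in O(N log N) time, then answer each membership query in O(log N) by binary search instead of scanning all N keys. The function names `buildIndex` and `contains` are hypothetical, chosen only for this illustration.

```cpp
#include <algorithm>
#include <vector>

// Preprocessing step: sort the keys once, O(N log N).
// (buildIndex is a hypothetical name used only in this sketch.)
std::vector<int> buildIndex(std::vector<int> keys) {
    std::sort(keys.begin(), keys.end());
    return keys;
}

// Query step: binary search over the sorted keys, O(log N) per query.
bool contains(const std::vector<int>& sortedKeys, int key) {
    return std::binary_search(sortedKeys.begin(), sortedKeys.end(), key);
}
```

The sorted array only supports fast lookups on static data; handling insertions and deletions efficiently is what motivates the dynamic structures in the course outline.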

  22. 2 Motivating Applications • Imagine you are in charge of maintaining a corporate network (or a major website such as Amazon) • High speed, high traffic volume, lots of users. • Expected to perform with near perfect reliability, but is also under constant attack from malicious hackers • Monitoring what is going through the network is complex: • Why is it slow? • Which machines have become compromised? • Which applications are eating up too much bandwidth etc.

  23. IP Network Monitoring • Any monitoring software/engine must be extremely light weight and not add to the network load • These algorithms need smart data structures to track important statistics in real time

  24. IP Network Monitoring • Consider a simple (toy) example • Is some IP address sending a lot of data to my network? • Which IP address sent the most data in last 1 minute? • How many different IP addresses in last 5 minutes? • Have I seen this IP address in the last 5 minutes? • IP address format: 192.168.0.0 • IPv4 has 32 bits, IPv6 has 128 bits • You wouldn’t want to maintain a table of all IP addresses to see how much traffic each is sending. • These are data structure problems, where obvious/naïve solutions are no good, and require creative/clever ideas.

  25. Microprocessor Profiling • Modern microprocessors run at GHz or higher speeds • Yet they do an incredible amount of optimization for instruction scheduling, branch prediction etc • Profiling or monitoring code tracks performance bottlenecks, and looks for anomalies. • Compute memory access statistics • Correlations across resources etc • Toy examples: • Which memory locations used the most in the last 1 sec? • Usage map over sliding time window • Need for highly efficient dynamic data structures

  26. A Puzzle • Most Frequent Item • You are shown a sequence of N positive integers • Identify the one that occurs most frequently • Example: 4, 1, 3, 3, 2, 6, 3, 9, 3, 4, 1, 12, 19, 3, 1, 9 • However, your algorithm has access to only O(1) memory • “Streaming data” • Not stored, just seen once in the order it arrives • The order of arrival is arbitrary, with no pattern • What data structure will solve this problem?

  27. A Puzzle: Most Frequent Item • Items can be source IP addresses at a router • The most frequent IP address can be useful to monitor suspicious traffic source • More generally, find the top K frequent items • Targeted advertising • Amazon, Google, eBay, Alibaba may track items bought most frequently by various demographics

  28. Another Puzzle • The Majority Item • You are shown a sequence of N positive integers • Identify the one that occurs more than N/2 times • A: 4, 1, 3, 3, 2, 6, 3, 9, 3, 4, 1, 12, 19, 3, 1, 9, 1 • B: 4, 1, 3, 3, 2, 3, 3, 9, 3, 4, 1, 3, 19, 3, 3, 9, 3 • Sequence A has no majority, but B has one (item 3 occurs 9 times out of 17) • Again, your algorithm has access to only O(1) memory • What data structure will solve this problem?

  29. Solving the Majority Puzzle • Use two variables C (candidate) and M (multiplicity). • When next item, say, X arrives • if C undefined (null), set C = X and M = 1; • else if X = C, set M = M+1; • else set M = M-1, and if M reaches 0, reset C to undefined; • Claim: At the end of sequence, C is the only possible candidate for majority. • Note that sequence may not have any majority. • But if you know there is a majority, C must be it.
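
The update rule above can be sketched in C++ as follows. Two things here are assumptions of this sketch rather than part of the slide: the function name `majorityCandidate`, and the convention that C becomes undefined again as soon as M drops to 0 (the usual way this vote is stated). The input is passed as a vector only for convenience; the loop itself reads each item once and keeps just C and M.

```cpp
#include <optional>
#include <vector>

// One pass over the stream with O(1) state: candidate C and multiplicity M.
// An empty optional models "C undefined".
// Assumption: C is reset to undefined whenever M drops to 0.
std::optional<int> majorityCandidate(const std::vector<int>& stream) {
    std::optional<int> c;  // candidate C
    long m = 0;            // multiplicity M
    for (int x : stream) {
        if (!c) {                 // C undefined: adopt X
            c = x;
            m = 1;
        } else if (x == *c) {     // X agrees with the candidate
            ++m;
        } else if (--m == 0) {    // X cancels one vote; candidate may be dropped
            c.reset();
        }
    }
    return c;  // only a candidate: it still has to be confirmed
}
```

On sequence B from the previous slide this returns 3. An empty result means no item can be a majority.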

  30. Solving the Majority Puzzle • Proof of Correctness. • Suppose item Z is the majority item. • Whenever C = Z, counter M is incremented. • Whenever Z occurs but C has a different item, Z causes M to decrement. • Each decrement is “charged” to that non-Z item • Each non-Z item can only counteract one occurrence of Z • Since there are fewer than N/2 non-Z items, they cannot cancel all occurrences of Z. • So, in the end, Z must be stored as C, with a non-zero M value.

  31. Solving the Majority Puzzle • False Positives in Majority Puzzle. • What happens if the sequence does not have a majority? • C may contain an arbitrary item, with non-zero M. • Strictly, a second pass through the sequence is necessary to “confirm” that C is the majority. • But in our application, it suffices to just “tag” a malicious IP address, and to monitor it for the next few minutes.
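
The confirming second pass is just a count. A minimal sketch, assuming the sequence can be replayed (which a true stream may not allow) and using the hypothetical name `isMajority`:

```cpp
#include <cstddef>
#include <vector>

// Second pass: count how often the candidate left in C actually occurs
// and check that it exceeds half of the sequence length.
bool isMajority(const std::vector<int>& seq, int candidate) {
    std::size_t count = 0;
    for (int x : seq) {
        if (x == candidate) ++count;
    }
    return 2 * count > seq.size();  // strictly more than N/2
}
```

For sequence A of the earlier slide, no candidate survives this check: item 1 occurs 4 times and item 3 occurs 5 times out of 17. For sequence B, item 3 occurs 9 times out of 17 and is confirmed.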

  32. Generalizing the Majority Problem • Identify k items, each appearing more than N/(k+1) times. • Note that simple majority is the case of k = 1.

  33. Generalizing the Majority Problem • Find k items, each appearing more than N/(k+1) times. • Use k candidate-multiplicity tuples (C1, M1), …, (Ck, Mk); a tuple is free when its Mi = 0. • When next item, say, X arrives • if X = Cj for some j, set Mj = Mj+1 • if X different from all Cj, but some tuple i free, then set Ci = X and Mi = 1 • else decrement all counters Mj = Mj-1; • Verify for yourselves this algorithm is correct.
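
A sketch of the k-tuple scheme above, under the assumption that a tuple counts as free exactly when its multiplicity is 0. The names `Slot` and `frequentCandidates` are hypothetical. As with the simple majority, the survivors are only candidates: every item that occurs more than N/(k+1) times is among them, but some survivors may occur less often.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One candidate-multiplicity tuple (Cj, Mj); count == 0 means the tuple is free.
struct Slot {
    int item;
    long count;
};

// Returns the items left in the k tuples after one pass, in increasing order.
std::vector<int> frequentCandidates(const std::vector<int>& stream, std::size_t k) {
    std::vector<Slot> slots(k, Slot{0, 0});
    for (int x : stream) {
        // Case 1: X equals some stored candidate Cj.
        auto hit = std::find_if(slots.begin(), slots.end(),
            [x](const Slot& s) { return s.count > 0 && s.item == x; });
        if (hit != slots.end()) {
            ++hit->count;
            continue;
        }
        // Case 2: X is new and some tuple is free.
        auto spare = std::find_if(slots.begin(), slots.end(),
            [](const Slot& s) { return s.count == 0; });
        if (spare != slots.end()) {
            spare->item = x;
            spare->count = 1;
            continue;
        }
        // Case 3: X is new and all k tuples are occupied: decrement all.
        for (Slot& s : slots) --s.count;
    }
    std::vector<int> result;
    for (const Slot& s : slots) {
        if (s.count > 0) result.push_back(s.item);
    }
    std::sort(result.begin(), result.end());
    return result;
}
```

With k = 1 this reduces to the single-candidate majority vote.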

  34. Back to the Most Frequent Item Puzzle • You are shown a sequence of N positive integers • Identify most frequently occurring item • Example: 4, 1, 3, 3, 2, 6, 3, 9, 3, 4, 1, 12, 19, 3, 1, 9 • What algorithm and data structure will help?

  35. An Impossibility Result • Cannot be done! • Computing the MFI requires Θ(N) space. • An adversary-based argument: • The first half of the sequence has all distinct items • If the algorithm uses less than linear space, at least one item, say, X is not remembered by the algorithm. • In the second half, all items will be new and distinct, except that X will occur once more; X has then been seen twice and becomes the MFI.
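
For contrast, once linear memory is allowed the problem becomes routine. A minimal sketch using a hash table of counts (the function name `mostFrequentItem` is hypothetical; the table can grow to one entry per distinct item, which is exactly the space the impossibility argument says cannot be avoided in the worst case):

```cpp
#include <unordered_map>
#include <vector>

// Most frequent item using one counter per distinct value.
// Precondition (assumed by this sketch): seq is non-empty.
// Ties are broken in favor of the item that reaches the top count first.
int mostFrequentItem(const std::vector<int>& seq) {
    std::unordered_map<int, long> freq;
    int best = seq.front();
    long bestCount = 0;
    for (int x : seq) {
        long c = ++freq[x];  // operator[] inserts a zero count on first sight
        if (c > bestCount) {
            best = x;
            bestCount = c;
        }
    }
    return best;
}
```

On the example sequence 4, 1, 3, 3, 2, 6, 3, 9, 3, 4, 1, 12, 19, 3, 1, 9 this returns 3, which occurs five times.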

  36. Lessons for Data Structure Design • Puzzles such as Majority and Most Frequent Items teach us two important lessons: • To solve a problem, we should understand its structure • Correctness is intertwined with design/efficiency • Problems with superficial resemblance can have very different complexity • Do not blindly apply a data structure or algorithm without understanding the nature of the problem

  37. Performance Bottleneck: algorithm or data structure?

  38. Course Objectives • Focus: systematic design and analysis of data structures (and some algorithms) • Algorithm: method for solving a problem. • Data structure: method to store information. • Guiding principles: abstraction and formal analysis • Abstraction: Formulate fundamental problem in a general form so it applies to a variety of applications • Analysis: A (mathematically) rigorous methodology to compare two objects (data structures or algorithms) • In particular, we will worry about "always correct"-ness, and worst-case bounds on time and memory (space).

  39. 130a: Design and Analysis • Foundations of Algorithm Analysis and Data Structures. • Data Structures • How to efficiently store, access, manage data • Data structures affect an algorithm’s performance • Algorithm Design and Analysis: • How to predict an algorithm’s performance • How well an algorithm scales up • How to compare different algorithms for a problem

  40. Asymptotic Complexity Analysis

  41. Complexity and Tractability Assume the computer does 1 billion ops per sec.
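
As a worked illustration of the 1-billion-ops-per-second assumption, here is the arithmetic for a few growth rates. The input sizes (n = 10^6 for the polynomial cases, n = 100 for the exponential case) are chosen here only for illustration, and one year is taken as roughly 3.16 × 10^7 seconds.

```latex
\begin{align*}
n \log_2 n &\approx 10^{6} \times 20 = 2 \times 10^{7} \text{ ops}
  &&\Rightarrow\; 2 \times 10^{7} / 10^{9} = 0.02 \text{ s} \\
n^{2} &= 10^{12} \text{ ops}
  &&\Rightarrow\; 10^{3} \text{ s} \approx 17 \text{ minutes} \\
n^{3} &= 10^{18} \text{ ops}
  &&\Rightarrow\; 10^{9} \text{ s} \approx 32 \text{ years} \\
2^{n}\big|_{n=100} &\approx 1.27 \times 10^{30} \text{ ops}
  &&\Rightarrow\; 1.27 \times 10^{21} \text{ s} \approx 4 \times 10^{13} \text{ years}
\end{align*}
```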

  42. N^2 is bad, Exponential is horrible

  43. Graph Problems Often face Combinatorial Explosion

  44. Quick Review of Algorithm Analysis • Two algorithms for computing the Factorial • Which one is better? • int factorial (int n) { if (n <= 1) return 1; else return n * factorial(n-1); } • int factorial (int n) { if (n <= 1) return 1; else { int fact = 1; for (int k = 2; k <= n; k++) fact *= k; return fact; } }
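
One way to compare the two versions, sketched under the assumption that each multiplication and each function call costs a constant amount of time (the constants c and d are placeholders):

```latex
\begin{align*}
\text{Recursive:}\quad & T(n) = T(n-1) + c \ \ (n > 1), \qquad T(1) = d \\
& \Rightarrow\; T(n) = c\,(n-1) + d = \Theta(n) \\
\text{Iterative:}\quad & \text{the loop runs for } k = 2, \dots, n, \text{ i.e. } n-1 \text{ multiplications} \\
& \Rightarrow\; T(n) = \Theta(n)
\end{align*}
```

Under this model both take linear time. The difference is in space: the recursive version keeps Θ(n) stack frames alive at its deepest point, while the loop uses O(1) extra memory. Both also overflow a 32-bit int for n > 12, since 13! = 6,227,020,800.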

  45. A More Challenging Algorithm to Analyze int main () { int x = 3; for ( ; ; ) { for (int a = 1; a <= x; a++) for (int b = 1; b <= x; b++) for (int c = 1; c <= x; c++) for (int i = 3; i <= x; i++) if (pow(a,i) + pow(b,i) == pow(c,i)) exit(0); x++; } }

  46. Max Subsequence Problem • Given a sequence of integers A_1, A_2, …, A_n, find the maximum possible sum of a contiguous subsequence A_i, …, A_j. • Numbers can be negative. • You want a contiguous chunk with largest sum. • Example: 4, 3, -8, 2, 6, -4, 2, 8, 6, -5, 8, -2, 7, -9, 4, -1, 5 • While not a data structure problem, it is an excellent pedagogical exercise for design, correctness proof, and runtime analysis of algorithms

  47. Max Subsequence Problem • Given a sequence of integers A_1, A_2, …, A_n, find the maximum possible sum of a contiguous subsequence A_i, …, A_j. • Example: 4, 3, -8, 2, 6, -4, 2, 8, 6, -5, 8, -2, 7, -9, 4, -1, 5 • We will discuss 4 different algorithms, of time complexity O(n^3), O(n^2), O(n log n), and O(n). • With n = 10^6, Algorithm 1 may take > 10 years; Algorithm 4 will take a fraction of a second!

  48. Algorithm 1 for Max Subsequence Sum • Given A_1, …, A_n, find the maximum value of A_i + A_{i+1} + ··· + A_j • Return 0 if the max value is negative

  49. Algorithm 1 for Max Subsequence Sum • Given A_1, …, A_n, find the maximum value of A_i + A_{i+1} + ··· + A_j • Return 0 if the max value is negative • int maxSum = 0; for( int i = 0; i < a.size( ); i++ ) for( int j = i; j < a.size( ); j++ ) { int thisSum = 0; for( int k = i; k <= j; k++ ) thisSum += a[ k ]; if( thisSum > maxSum ) maxSum = thisSum; } return maxSum; • Time complexity: O(n^3)
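
A self-contained sketch that wraps the triple loop above in a function and cross-checks it against a single-pass scan (commonly known as Kadane's algorithm). The single-pass version is included only to illustrate what an O(n) method can look like; that it matches the course's Algorithm 4 is an assumption. The function names are hypothetical, and the container `a` is assumed to be a `std::vector<int>`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Algorithm 1 as a function: try every pair (i, j) and sum a[i..j], O(n^3).
// Starting maxSum at 0 gives the "return 0 if the max is negative" rule.
int maxSubSumCubic(const std::vector<int>& a) {
    int maxSum = 0;
    for (std::size_t i = 0; i < a.size(); i++) {
        for (std::size_t j = i; j < a.size(); j++) {
            int thisSum = 0;
            for (std::size_t k = i; k <= j; k++) {
                thisSum += a[k];
            }
            maxSum = std::max(maxSum, thisSum);
        }
    }
    return maxSum;
}

// Single pass, O(n): keep the best sum of a chunk ending at the current
// position, and drop the running chunk as soon as its sum goes negative.
int maxSubSumLinear(const std::vector<int>& a) {
    int maxSum = 0;
    int thisSum = 0;
    for (int x : a) {
        thisSum = std::max(0, thisSum + x);
        maxSum = std::max(maxSum, thisSum);
    }
    return maxSum;
}
```

On the example sequence both functions return 28, the sum of the chunk 2, 6, -4, 2, 8, 6, -5, 8, -2, 7.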
