Dive into data structures, algorithms, hashing, graphs, and more with Dr. Yen's course. Learn essential programming concepts, analysis techniques, and advanced data structures. Understand the significance of computational complexity, abstract data types, and academic integrity.
Data Structures and Programming
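Instructor: 顏嗣鈞
E-mail: hcyen@ntu.edu.tw
Web: http://www.ee.ntu.edu.tw/~yen
Time: 9:10 AM-12:00 PM, Tuesday
Place: BL 113
Office hours: by appointment
Class web page: http://www.ee.ntu.edu.tw/~yen/courses/ds17.html
顏嗣鈞
• Education
  • Ph.D., Univ. of Texas at Austin (Computer Science), 1986
  • M.S., Institute of Computer Engineering, National Chiao Tung University, 1982
  • B.S., Department of Electrical Engineering, National Taiwan University, 1980
• Experience
  • Professor, Department of Electrical Engineering, National Taiwan University, 1991-present
  • Director, Computer and Information Networking Center, National Taiwan University, 2014-present
  • Chair, Department of Electrical Engineering, National Taiwan University, 2010-2013
  • Associate Professor, Department of Electrical Engineering, National Taiwan University, 1990-1991
  • Assistant Professor, Department of Computer Science, Iowa State Univ., USA, 1986-1990
• Expertise
  • Design and analysis of algorithms, information visualization, theory of computation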
PREREQUISITES Familiarity with PASCAL, C, C++, or JAVA.
Textbook: Data Structures & Algorithm Analysis in C++ (3rd or 4th Ed.), Mark Weiss, Addison Wesley.
TOPICS
• PRELIMINARIES: Introduction. Algorithm analysis.
• ABSTRACT DATA TYPES: Stacks. Queues. Lists. List operations. List representations. List traversals. Doubly linked lists.
• TREES: Tree operations. Tree representations. Tree traversals. Threaded trees. Binary trees. AVL trees. 2-3 trees. B-trees. Red-black trees. Binomial trees. Splay trees, and more.
• HASHING: Chaining. Open addressing. Collision handling.
• PRIORITY QUEUES: Binary heaps. Binomial heaps. Fibonacci heaps. Min-max heaps. Leftist heaps. Skew heaps.
• SORTING: Insertion sort. Selection sort. Quicksort. Heapsort. Mergesort. Shellsort. Lower bound of sorting.
• DISJOINT SETS: Set operations. Set representations. Union-find. Path compression.
• GRAPHS: Graph operations. Graph representations. Basic graph algorithms.
• AMORTIZED ANALYSIS: Binomial heaps. Skew heaps. Fibonacci heaps.
• ADVANCED DATA STRUCTURES: Tries. Top-down splay trees, and more.
Grading
• Homework + Prog. Assignments: 25-30%
• Midterm exam: 35-40%
• Final exam: 35-40%
Academic Integrity With the exception of group assignments, the work (including homework, programming assignments, tests) must be the result of your individual effort. This implies that one student should never have in his/her possession a copy of all or part of another student's homework. It is your responsibility to protect your work from unauthorized access. Academic dishonesty has no place in a university, in particular, in NTUEE. It wastes our time and yours, and it is unfair to the majority of students. Any form of cheating will automatically result in a failing grade in the course.
Data Structures vs. Programming
Algorithms + Data Structures = Programs
C++ Data Structures
One of the all-time great books in computer science: The Art of Computer Programming (1968-1973) by Donald Knuth
Examples in assembly language (and English)!
American Scientist says: in top 12 books of the CENTURY!
Abstract Data Types
• Data Types
  • integer, array, pointers, …
• Abstract Data Type (ADT)
  • Mathematical description of an object and the set of operations on the object
• Algorithms
  • binary search, quicksort, …
Tradeoffs!
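To make the ADT idea concrete, here is a minimal C++ sketch, not taken from the course material. The names IntStack and VectorStack are hypothetical. The abstract class states only the operations; the derived class is one possible representation, and a linked list could be swapped in without changing any code that uses the interface.

```cpp
#include <cassert>
#include <vector>

// The ADT: a stack of ints described only by its operations.
class IntStack {
public:
    virtual ~IntStack() = default;
    virtual void push(int x) = 0;
    virtual int pop() = 0;           // precondition: !empty()
    virtual bool empty() const = 0;
};

// One possible representation (an assumption for illustration): a dynamic array.
class VectorStack : public IntStack {
    std::vector<int> items_;
public:
    void push(int x) override { items_.push_back(x); }
    int pop() override {
        assert(!items_.empty());     // enforce the precondition
        int top = items_.back();
        items_.pop_back();
        return top;
    }
    bool empty() const override { return items_.empty(); }
};
```

Code written against IntStack does not depend on how the items are stored, which is where the tradeoffs between representations come in.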
Advanced Data Structures
“Why not just use a big array?”
Example problem: search for a number k in a set of N numbers.
Solution #1: Linear Search
• Store the numbers in an array of size N
• Iterate through the array until k is found
Number of checks
• Best case: 1 (k=15)
• Worst case: N (k=27)
• Average case: N/2
Example array: 15 10 22 3 12 19 27
Sorted array? 3 10 12 15 19 22 27
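The linear search above can be sketched in C++ as follows. The function name linearSearch and the checks counter are illustrative assumptions; the counter exists only to make the best and worst cases visible on the slide's example array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the first element equal to k, or -1 if k is absent.
// `checks` (an illustrative out-parameter) reports how many elements were inspected.
int linearSearch(const std::vector<int>& a, int k, std::size_t& checks) {
    checks = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        ++checks;
        if (a[i] == k) return static_cast<int>(i);
    }
    return -1;
}

// Usage on the example array: k=15 is the best case, k=27 the worst.
void demoLinearSearch() {
    const std::vector<int> a{15, 10, 22, 3, 12, 19, 27};
    std::size_t checks = 0;
    assert(linearSearch(a, 15, checks) == 0 && checks == 1);
    assert(linearSearch(a, 27, checks) == 6 && checks == 7);
}
```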
Advanced Data Structures
Solution #2: Binary Search Tree (BST)
• Store the numbers in a binary search tree
• Requires: elements that can be ordered (comparable keys)
Properties
• The left subtree of a node contains only nodes with keys less than the node's key
• The right subtree of a node contains only nodes with keys greater than the node's key
• Both the left and right subtrees must also be binary search trees
Search the tree until k is found
Number of checks
• Best case: 1 (k=15)
• Worst case: about log2 N (k=27) when the tree is balanced; up to N if the tree degenerates into a chain
• Average case: roughly (log2 N) - 1 for a balanced tree
Example tree (level order): 15; 10 22; 3 12 19 27
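A minimal C++ sketch of BST insertion and search, assuming integer keys and no balancing (names such as Node, insertKey and searchChecks are hypothetical). Inserting 15, 10, 22, 3, 12, 19, 27 in that order produces the example tree, so the check counts can be compared with the slide.

```cpp
#include <cassert>
#include <memory>

struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Inserts k while preserving the BST property; duplicates are ignored.
void insertKey(std::unique_ptr<Node>& root, int k) {
    if (!root) { root = std::make_unique<Node>(k); return; }
    if (k < root->key) insertKey(root->left, k);
    else if (root->key < k) insertKey(root->right, k);
}

// Returns the number of nodes inspected; `found` reports whether k is present.
int searchChecks(const Node* node, int k, bool& found) {
    int checks = 0;
    found = false;
    while (node) {
        ++checks;
        if (k < node->key) node = node->left.get();
        else if (node->key < k) node = node->right.get();
        else { found = true; break; }
    }
    return checks;
}

// Usage: build the example tree and count checks for k=15 and k=27.
void demoBst() {
    std::unique_ptr<Node> root;
    for (int k : {15, 10, 22, 3, 12, 19, 27}) insertKey(root, k);
    bool found = false;
    assert(searchChecks(root.get(), 15, found) == 1 && found);
    assert(searchChecks(root.get(), 27, found) == 3 && found);
}
```

No rebalancing is done here, so inserting already-sorted keys would build a chain and searches would degrade to N checks.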
Example
Does it matter?
Problem artifacts
• N = 1,000,000,000 (1 billion; Walmart transactions in 100 days)
• 1 GHz processor = 10^9 cycles per second
Solution #1 (assume 10 cycles per check)
• Worst case: 1 billion checks = 10^10 cycles = 10 seconds
Solution #2 (assume 10 cycles per check)
• Worst case: 30 checks = 300 cycles = 0.0000003 seconds
Analysis Does it matter? N vs. (log2 N)
Computational Complexity
• Computational complexity: an abstract measure of the time and space necessary to execute an algorithm as functions of its “input size”.
• Input size: size of encoded “binary” strings.
  • sort n words of bounded length - input size: n
  • the input is the integer n - input size: lg n
  • the input is the graph G(V, E) - input size: |V| and |E|
• Runtime comparison: assume 1 BIPS, 1 instruction/op.
Can’t Finish the Assigned Task “I can’t find an efficient algorithm, I guess I’m just too dumb.”
Mission Impossible “I can’t find an efficient algorithm, because no such algorithm is possible.”
“I can’t find an efficient algorithm, but neither can all these famous people.”
Motivating Example: Minimum Spanning Tree
Given an undirected graph G = (V, E) with weights on the edges, a minimum spanning tree (MST) of G is a subset T ⊆ E such that
• T is connected and has no cycles,
• T covers (spans) all vertices in V, and
• the sum of the weights of all edges in T is minimum.
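One way to compute an MST matching the definition above is Kruskal's algorithm, sketched below. This is an illustration and not necessarily the method used in the course. The names Edge, DisjointSets and mstWeight are hypothetical, and the graph is assumed to be connected with vertices numbered 0..n-1 and n >= 1.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct Edge { int u, v, w; };

// Union-find with path halving, used to detect whether an edge would close a cycle.
struct DisjointSets {
    std::vector<int> parent;
    explicit DisjointSets(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    }
    // Merges the sets of a and b; returns false if they were already joined.
    bool unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return false;
        parent[a] = b;
        return true;
    }
};

// Kruskal: scan edges by increasing weight, keep each edge that joins two components.
int mstWeight(int n, std::vector<Edge> edges) {
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.w < b.w; });
    DisjointSets sets(n);
    int total = 0, used = 0;
    for (const Edge& e : edges) {
        if (sets.unite(e.u, e.v)) { total += e.w; ++used; }
    }
    assert(used == n - 1);  // a spanning tree has |V| - 1 edges (graph assumed connected)
    return total;
}
```

The union-find structure used here is the disjoint-sets topic listed in the course outline.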
Another Example: Shortest Path
Given a weighted graph and two vertices u and v, we want to find a path of minimum total weight between u and v.
Length of a path is the sum of the weights of its edges.
Applications: Internet packet routing, flight reservations, driving directions
[Figure: weighted graph whose vertices are the airports PVD, ORD, SFO, LGA, HNL, LAX, DFW, and MIA, with numeric edge weights]
Dijkstra’s algorithm
[Figure: four snapshots of Dijkstra’s algorithm on a weighted graph with vertices A-F, starting from A with distance 0]
Example (cont.)
[Figure: two further snapshots of Dijkstra’s algorithm on the same graph]
Questions
• Operations performed?
  • Find neighbors
  • Find/remove minimum
  • Update neighbors
[Figure: graph snapshots illustrating each operation]
Key steps
• Find/remove minimum
• Update
[Figure: the array of tentative distances for B, C, D, E, F before and after one find/remove-minimum and update step]
Straightforward approach
• Find min: scan through the array; remove the item
• Update: a left-to-right scan again
Questions:
• Is the above efficient?
• Can we do better?
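The straightforward approach can be sketched as below, assuming an adjacency-matrix representation with non-negative weights. The names INF and dijkstraArray are hypothetical. Each round scans the distance array once to find the minimum and once more to update neighbors, which gives O(|V|^2) time overall.

```cpp
#include <cassert>
#include <limits>
#include <vector>

const int INF = std::numeric_limits<int>::max();  // marks "no edge" / "unreached"

// w[u][v] is the weight of edge (u, v), or INF if there is no such edge.
std::vector<int> dijkstraArray(const std::vector<std::vector<int>>& w, int src) {
    const int n = static_cast<int>(w.size());
    assert(0 <= src && src < n);
    std::vector<int> dist(n, INF);
    std::vector<bool> done(n, false);
    dist[src] = 0;
    for (int round = 0; round < n; ++round) {
        // Find min: scan the whole array for the closest unfinished vertex.
        int u = -1;
        for (int v = 0; v < n; ++v) {
            if (!done[v] && dist[v] != INF && (u == -1 || dist[v] < dist[u])) u = v;
        }
        if (u == -1) break;  // every remaining vertex is unreachable
        done[u] = true;      // remove the item
        // Update: scan again, relaxing each edge leaving u.
        for (int v = 0; v < n; ++v) {
            if (!done[v] && w[u][v] != INF && dist[u] + w[u][v] < dist[v]) {
                dist[v] = dist[u] + w[u][v];
            }
        }
    }
    return dist;
}
```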
Priority Queues
(the Binary, Binomial, Fibonacci, and Relaxed columns are heaps)

Operation      Linked List   Binary   Binomial   Fibonacci *   Relaxed
make-heap      1             1        1          1             1
insert         1             log N    log N      1             1
find-min       N             1        log N      1             1
delete-min     N             log N    log N      log N         log N
union          1             N        log N      1             1
decrease-key   1             log N    log N      1             1
delete         N             log N    log N      log N         log N
is-empty       1             1        1          1             1

Dijkstra/Prim: 1 make-heap, |V| insert, |V| delete-min, |E| decrease-key
• Linked list: O(|V|^2)
• Binary heap: O(|E| log |V|)
• Fibonacci heap: O(|E| + |V| log |V|)
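For comparison, here is a hedged sketch of Dijkstra's algorithm on a binary heap using std::priority_queue. Because std::priority_queue has no decrease-key operation, this version pushes a fresh entry instead and skips stale entries when they are popped. That substitution is an assumption of this sketch, not the decrease-key formulation counted above, though the running time is still O(|E| log |V|). The names Adj and dijkstraHeap are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// adj[u] lists (neighbor, weight) pairs; weights are assumed non-negative.
using Adj = std::vector<std::vector<std::pair<int, int>>>;

std::vector<int> dijkstraHeap(const Adj& adj, int src) {
    const int INF = std::numeric_limits<int>::max();
    assert(0 <= src && src < static_cast<int>(adj.size()));
    std::vector<int> dist(adj.size(), INF);
    using Item = std::pair<int, int>;  // (tentative distance, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    dist[src] = 0;
    heap.push({0, src});
    while (!heap.empty()) {
        const Item top = heap.top();   // find-min
        heap.pop();                    // delete-min
        const int d = top.first;
        const int u = top.second;
        if (d > dist[u]) continue;     // stale entry: u was already settled with a smaller distance
        for (const auto& edge : adj[u]) {
            const int v = edge.first;
            const int w = edge.second;
            if (d + w < dist[v]) {
                dist[v] = d + w;
                heap.push({dist[v], v});  // stands in for decrease-key
            }
        }
    }
    return dist;  // unreachable vertices keep the value INF
}
```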
Is this program correct? How do we know?
    int Find(float[] a, int m, int n, float x) {
        while (m < n) {
            int j = (m+n) / 2;
            if (a[j] < x) {
                m = j+1;
            } else if (x < a[j]) {
                n = j-1;
            } else {
                return j;
            }
        }
        return -1;
    }
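One possible answer, offered as a sketch rather than the instructor's intended solution: the loop condition m < n suggests that the search range is the half-open interval a[m..n), but the update n = j-1 then drops a[j-1] without ever comparing it. For example, with a = {1, 2, 3}, m = 0, n = 3, x = 1, the first probe is j = 1, n becomes 0, the loop exits, and the routine returns -1 even though x is present. A C++ version under the half-open assumption (the name findSorted is hypothetical) could look like this:

```cpp
#include <cassert>
#include <vector>

// Searches the sorted half-open range a[m..n) for x.
// Returns an index j with a[j] == x, or -1 if x is not in the range.
int findSorted(const std::vector<float>& a, int m, int n, float x) {
    assert(0 <= m && m <= n && n <= static_cast<int>(a.size()));
    while (m < n) {
        const int j = m + (n - m) / 2;  // also avoids overflow of m + n
        if (a[j] < x) {
            m = j + 1;                  // x can only be in a[j+1..n)
        } else if (x < a[j]) {
            n = j;                      // x can only be in a[m..j); j itself is excluded
        } else {
            return j;
        }
    }
    return -1;
}
```

The loop invariant is that if x occurs in the original range, it occurs in a[m..n). Each branch preserves it and shrinks the range, which is the kind of argument the question "How do we know?" is asking for.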
Making sense of programs • Program semantics defines programming language • e.g., Hoare logic, Dijkstra's weakest preconditions • Specifications record design decisions • bridge intent and code • Tools amplify human effort • manage details • find inconsistencies • ensure quality
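Program Correctness (Example)
Each assertion in braces is the condition that must hold at that point for the statements after it to be safe:
    { (x != null ==> x != null && x.f >= 0) && (x == null ==> z-1 >= 0) }
    if (x != null) {
        { x != null && x.f >= 0 }
        n = x.f;
    } else {
        { z-1 >= 0 }
        n = z-1; z++;
    }
    { n >= 0 }
    a = new char[n];
    { true }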
Data “Structures” General areas include: ● Sequential storage ● Hierarchical storage ● Adjacency storage
Goal • Learn to write efficient and elegant software • How to choose between two algorithms • Which to use? bubble-sort, insertion-sort, merge-sort • How to choose appropriate data structures • Which to use? array, vector, linked list, binary tree
Why should you care?
• Complex data structures and algorithms are used in every real program
  • Data compression uses trees: MP3, GIF, etc.
  • Networking uses graphs: routers and telephone networks
  • Security uses complex math algorithms: GCD and large decimals
  • Operating systems use queues and stacks: scheduling and recursion
• Many problems can only be solved using complex data structures and algorithms
In this course, we will look at: • different techniques for storing, accessing, and modifying information on a computer • algorithms which can efficiently solve problems • We will see that all data structures have trade-offs – there is no ultimate data structure... • The choice depends on our requirements
What this course is NOT about
• This course is not about C++
  • Although we will use C++ to implement some of the concepts
• This course is not about MATH
  • Although we will use math to formalize many of the concepts
• Competency in both math and C++ is therefore welcomed.
  • C++: inheritance, overloading, overriding, files, linked lists, multi-dimensional arrays
  • Math: polynomials, logarithms, inductive proofs, logic
The Big Idea • Definition of Abstract Data Type • A collection of data along with specific operations that manipulate that data • Has nothing to do with a programming language! • Two fundamental goals of algorithm analysis • Correctness: Prove that a program works as expected • Efficiency: Characterize the run-time of an algorithm
The Big Idea
• Alternative goals of algorithm analysis
  • Characterize the amount of memory required
  • Characterize the size of a program's code
  • Characterize the readability of a program
  • Characterize the robustness of a program
Data Structures (clever?): Lists, Stacks, Queues; Heaps; Binary Search Trees; AVL Trees; Hash Tables; Graphs; Disjoint Sets
Algorithms (efficient?): Insert, Delete, Find, Merge, Shortest Paths, Union
Why study data structures?
Clever ways to organize information in order to enable efficient computation
Applications built on data structures: Databases, AI, Theory, Graphics, Networking, Games, Systems