Data Structures Alan, Tam Siu Lung 96397999 Tam@SiuLung.com 99967891
Prerequisite • Familiarity with Pascal/C/C++ • Asymptotic Complexity • Techniques learnt • Recursion • Divide and Conquer • Exhaustion • Greedy • [Dynamic Programming exempted] • Algorithms learnt • Bubble / Insertion / Selection / Shell / Merge / Quick / Bucket / Radix Sorting • Linear / Binary / Interpolation Searching
What does our Programming Language provide? • Built-in Data Types • Character/String (length limit?) • Integral (signed/unsigned 8 [?], 16, 32, 64 [?] bit) • Floating Point (signed 32, 64, 80 [?] bit) • Fixed Point [?] • Complex [?] • Pointer/Reference • Function Pointer/Reference
What does our Programming Language provide? • Aggregate Data Types • Array [base-definable?] • Multiple values of the same type • Access by numeric index • Record/Struct/Class • Multiple values of different types • Function Aggregation + Inheritance + Polymorphism [?] • Unions [?]
What does our Programming Language provide? • Built-in Language Constructs • Branching (If, Else) • Loops (For, While, Until) • Function/Procedure Calling • In C++’s view, operators are functions as well • a = b int &operator=(int &a, const int &b) • a > b bool operator>(const int &a, const int &b) • *a int &operator*(int *a) • a[b] string &operator[](string a[], int b) • Recursion • Even more for more sophisticated languages!
For most of the remaining time • We concentrate on • Pointer • Array • Record • and how they interact • We will use a C++-like notation • array<int> means an array of integers • int* is shorthand for pointer<int> • Records are written as: struct<int, int, string> • Capitalized types (such as Type) are type “variables”: they can be replaced by any type
Formal Definition: Pointer • Concept: • pointer<Type> p; (Type *p) [p : ^Type in Pascal] • Operations: • *p Type &operator*(Type *p) [p^ in Pascal] • Returns the pointed-to value • Error if p is null/nil • &y Type *operator&(Type &y) [@y in Pascal] • Returns the address of a variable • p = x Type *operator=(Type *p, Type *x) • Pointer assignment
Formal Definition: Pointer • More Operators • p < q bool operator<(Type *p, Type *q) • Returns whether p points to an earlier element than q (both within the same array) • ++p Type *operator++(Type *p) [inc(p) in Turbo Pascal] • Point to the next element (in an array) • --p Type *operator--(Type *p) [dec(p) in Turbo Pascal] • Point to the previous element (in an array) • p + n Type *operator+(Type *p, int n) [not in Turbo Pascal] • Point to the nth next element (in an array)
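To see the pointer operators working together, here is a small sketch that walks an array using only `*p`, `++p` and `p < q`, with no indexing. The helper name `SumRange` is hypothetical and does not come from the slides.

```cpp
#include <cassert>

// Hypothetical helper: sums the elements in [first, last) by walking a pointer.
// Requires first and last to point into (or one past the end of) the same array.
int SumRange(const int *first, const int *last) {
  assert(first <= last);
  int sum = 0;
  for (const int *p = first; p < last; ++p)  // p < q and ++p
    sum += *p;                               // *p reads the pointed-to value
  return sum;
}
```

For `int a[5] = {1, 2, 3, 4, 5};`, `SumRange(a, a + 5)` is 15. `SumRange(a + 1, a + 3)` sums only `a[1]` and `a[2]`, giving 5. The `p + n` operator is used to form the range ends.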
Programming Syntax: Pointer
int main() {
  int a[10];
  int *b = &a[1];   // b points to the second element of a
  *b = 1;
  b = new int(2);   // b now points to a new integer on the heap
  delete b;
  b = 0;            // null pointer
}

var
  a : array[1..10] of integer;
  b : ^integer;
begin
  b := @a[2];
  b^ := 1;
  new(b);
  b^ := 2;
  dispose(b);
  b := nil;
end.
Array • Concept • array<Type, Size : int> • array<Type, Lower : int, Upper : int> • Operations • Type &operator[](Type a[], int index) • Requires 0 <= index < Size (first form) • Requires Lower <= index <= Upper (second form) • Analysis • a[x] is equivalent to *(a + x) • which is equivalent to *(Type *)((char *)a + x * sizeof(Type)) • The hidden multiplication in every a[x] can make indexed access slower than necessary, compared with stepping a pointer through the array with ++p!
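The address calculation above can be written out by hand. The following sketch assumes `Type` is `int`, and `At` is a hypothetical name. It computes the byte address of `a[x]` explicitly and returns a reference to the same element that `a[x]` and `*(a + x)` denote.

```cpp
#include <cassert>
#include <cstddef>

// Computes a[x] "by hand", the way *(a + x) does it:
// start at the first byte of a, advance x * sizeof(int) bytes, then dereference.
int &At(int *a, std::size_t x) {
  assert(a != nullptr);
  char *base = reinterpret_cast<char *>(a);
  return *reinterpret_cast<int *>(base + x * sizeof(int));
}
```

Writing through `At(a, 2)` changes `a[2]`, because all three forms name the same memory.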
Example: Prime Finding
• primes[] stores all primes found so far
primes[0] = 2;
for each i from 3 upward
  for each v in primes[]
    if (v * v > i) then begin primes.add(i); break; end;
    if (i mod v = 0) then break;
Solution
#include <iostream>
using namespace std;
int main() {
  int primes[100], *last = primes;   // last points one past the final prime found
  cout << (*last++ = 2) << endl;
  for (int i = 3; i < 100; ++i) {
    int *j = primes;
    do {
      if (*j * *j > i) {             // no prime up to sqrt(i) divides i
        cout << (*last++ = i) << endl;
        break;
      }
      if (i % *j == 0) break;        // i is composite
    } while (++j < last);
  }
}

var
  primes : array[1..100] of integer;
  i : integer;
  last, j : ^integer;
begin
  last := @primes; last^ := 2; inc(last);
  writeln(2);
  for i := 3 to 100 do
  begin
    j := @primes;
    repeat
      if j^ * j^ > i then
      begin
        last^ := i; inc(last);
        writeln(i);
        break;
      end;
      if i mod j^ = 0 then break;
      inc(j);
    until j >= last;
  end;
end.
Record • Like arrays, but fields are identified by names instead of indices • Each name is associated with a type • A pair is a special record with 2 elements, Key and Value • Keys are unique (i.e. keys identify records) • Keys are comparable (i.e. sortable) [sometimes] • Since Value can itself be a record, any record with a unique portion can be represented as a pair
Programming Syntax: Record
struct Point { double x, y; };
struct Rect { Point tl, br; int color; };
int main() {
  Rect rect;
  rect.color = 255;
  rect.tl.x = 0.0;
}

type
  Point = record
    x, y : real;
  end;
  Rect = record
    tl, br : Point;
    color : integer;
  end;
var rect : Rect;
begin
  rect.color := 255;
  rect.tl.x := 0.0;
  with rect do
  begin
    color := 255;
    tl.x := 0.0;
  end;
end.
Linked List • Combining Pointer and Record • linkedlist<string>:
type
  pNode = ^Node;
  Node = record
    value : string;
    next : pNode;
  end;
var head : pNode;
Linked List • Operations • void Add(linkedlist<Type> &p, Type &v) • Add an element to the linked list • Node *Search(linkedlist<Type> &p, Type &v) • Returns null/nil if not found • void InsertAfter(Node *node, Type &v) • Insert an element after another • void Remove(Node *node) • How to implement? • C++: x->y == (*x).y
Linked List Implementation
struct Node { int value; Node *next; };
Node *list = 0;                  // head of the list
void Add(int v) {                // insert at the front: O(1)
  Node *old = list;
  list = new Node();
  list->next = old;
  list->value = v;
}
Node *Search(int v) {            // linear scan: O(n)
  for (Node *p = list; p; p = p->next)
    if (p->value == v) return p;
  return 0;
}
void InsertAfter(Node *n, int v) {   // O(1)
  Node *old = n->next;
  n->next = new Node();
  n->next->next = old;
  n->next->value = v;
}

var list : pNode;   { assumes Node.value : integer here }
procedure Add(v : integer);
var old : pNode;
begin
  old := list;
  new(list);
  list^.next := old;
  list^.value := v;
end;
function Search(v : integer) : pNode;
var n : pNode;
begin
  n := list;
  while (n <> nil) and (n^.value <> v) do
    n := n^.next;
  Search := n;
end;
{ InsertAfter is similar to Add }
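The operation list names Remove but does not implement it. In a singly linked list a node cannot unlink itself without its predecessor. One common approach, sketched here, is to remove the node *after* a given node, or to remove the head. Both take O(1). The names `RemoveAfter` and `RemoveFront` are hypothetical.

```cpp
#include <cassert>

struct Node {
  int value;
  Node *next;
};

// Removes the node after n, in O(1). Requires n and n->next to exist.
void RemoveAfter(Node *n) {
  assert(n != nullptr && n->next != nullptr);
  Node *victim = n->next;
  n->next = victim->next;  // bypass the victim
  delete victim;
}

// Removes the first node; the head pointer itself changes, so it is passed by reference.
void RemoveFront(Node *&head) {
  assert(head != nullptr);
  Node *victim = head;
  head = head->next;
  delete victim;
}
```

Removing an arbitrary node given only a pointer to it would need a doubly linked list, or a walk from the head to find the predecessor (O(n)).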
Array Implementation (figures: the list after Add 3 and Add 2, then after Remove 2 and Remove 3)
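The figures for this slide did not survive. This sketch assumes they show the usual idea of replacing pointers with array indices, often called a "cursor" list, with -1 playing the role of nil. `CursorList` and its members are hypothetical names. Freed slots are not recycled, to keep the sketch short.

```cpp
#include <cassert>

// A linked list stored in fixed-size arrays: next[i] is the index of the node
// after node i, and -1 plays the role of a null pointer.
struct CursorList {
  static constexpr int kCapacity = 100;
  int value[kCapacity];
  int next[kCapacity];
  int head = -1;    // index of the first node, -1 if the list is empty
  int unused = 0;   // next never-used slot (slots are handed out in order)

  // Add v at the front, O(1); returns the slot used.
  int Add(int v) {
    assert(unused < kCapacity);
    int slot = unused++;
    value[slot] = v;
    next[slot] = head;
    head = slot;
    return slot;
  }

  // Linear search, O(n); returns the slot index or -1.
  int Search(int v) const {
    for (int i = head; i != -1; i = next[i])
      if (value[i] == v) return i;
    return -1;
  }

  // Remove the front node, O(1). Its slot is not reused in this sketch.
  void RemoveFront() {
    assert(head != -1);
    head = next[head];
  }
};
```

Following the slide's sequence, `Add(3)` then `Add(2)` gives the list 2, 3. Two calls to `RemoveFront()` remove 2 and then 3.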
Abstraction • Both implementations have the same complexity • O(1) Addition • O(n) Searching • O(1) Insertion • O(1) Removal • Sometimes we don’t care how it is implemented • We only want a data structure that provides the operations we need. • We define an Abstract Data Type (ADT) as the collection of all data structures providing certain operations • Examples: Plane • Polynomial • Graph • We don’t even care how fast the operations in an ADT are, though in practice we do
Dictionary (Map, Associative Array) • A dictionary is an unordered container of key-value pairs • map<Key, Value> • void Insert(map<Key, Value> &c, Key &key, Value &value) • int Size(map<Key, Value> &c) • Value &Search(map<Key, Value> &c, Key &key) • void Delete(map<Key, Value> &c, Key &key)
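As one concrete realization, the four Dictionary operations map directly onto the C++ standard `std::map`. The wrapper names and the `Dict` alias below are hypothetical. `Search` returns a pointer (null if absent) instead of the slide's `Value &`, because a reference cannot signal "not found". `std::map` keeps keys ordered, which the ADT does not require; `std::unordered_map` would also work.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical wrappers mapping the Dictionary ADT onto std::map.
using Dict = std::map<std::string, int>;

void Insert(Dict &c, const std::string &key, int value) { c[key] = value; }  // overwrites if present
int Size(const Dict &c) { return static_cast<int>(c.size()); }

// Unlike the slide's signature, returns nullptr when the key is absent.
int *Search(Dict &c, const std::string &key) {
  auto it = c.find(key);
  return it == c.end() ? nullptr : &it->second;
}

void Delete(Dict &c, const std::string &key) { c.erase(key); }
```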
List ADT • The List ADT is an ordered container of key-value pairs • list<Key, Value> • void Insert(list<Key, Value> &c, int pos, Value &value) • Value &Find-ith(list<Key, Value> &c, int pos) • void Delete-ith(list<Key, Value> &c, int pos) • int Size(list<Key, Value> &c) • Value &Search(list<Key, Value> &c, Key &key) • void Delete(list<Key, Value> &c, Key &key) • … • A List can be implemented by an array (Vector/Table), a linked list (LinkedList), etc. • A List is also a Dictionary
Time Complexity • We seldom remove anyway • With an array or a linked list, there is no way to make both Add and Search fast • In general, it is difficult unless we exploit features of the Key
Direct Addressing Implementation • Use the Vector ADT • The key is the location (index) • Efficient: O(1) for all operations • Infeasible if the key can range from 1 to 20000000000, or if the key is not numeric ...
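Here is a minimal sketch of direct addressing, assuming integer keys in the range 0 to m - 1 and `int` values. `DirectTable` is a hypothetical name. The key is used directly as the vector index, so every operation is O(1), but memory is O(m) no matter how few keys are stored. That cost is why a huge key range makes the method infeasible.

```cpp
#include <cassert>
#include <vector>

// Direct addressing: the key itself is the array index.
class DirectTable {
 public:
  explicit DirectTable(int m) : present_(m, false), value_(m) {}

  void Insert(int key, int v) { Check(key); present_[key] = true; value_[key] = v; }
  bool Contains(int key) const { Check(key); return present_[key]; }
  int Search(int key) const { Check(key); assert(present_[key]); return value_[key]; }
  void Delete(int key) { Check(key); present_[key] = false; }

 private:
  // Keys must lie in [0, m).
  void Check(int key) const {
    assert(key >= 0 && key < static_cast<int>(present_.size()));
  }
  std::vector<bool> present_;  // which slots hold a value
  std::vector<int> value_;
};
```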
Hash Function • Hash Function: h_m(k) • Maps every key “by calculation” into an integer domain, e.g. 0 to m − 1 • E.g. CRC32 hashes strings into 32-bit integers (i.e. m = 2^32) • Alan: 1598313570 • Max: 3452409927 • Man: 943766770 • On: 2246271074
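This sketch implements CRC-32, assuming the common reflected variant with polynomial 0xEDB88320 and an initial value and final XOR of 0xFFFFFFFF. Its standard check value for the string "123456789" is 0xCBF43926. It processes one bit at a time for brevity, whereas production code usually uses a 256-entry lookup table. `Crc32` and `HashToBucket` are hypothetical names, and the reduction `% m` is one simple way to obtain h_m(k) for a table of size m.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bitwise CRC-32 (reflected, polynomial 0xEDB88320).
std::uint32_t Crc32(const std::string &s) {
  std::uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char c : s) {
    crc ^= c;
    for (int k = 0; k < 8; ++k)
      // If the low bit is set, shift and XOR in the polynomial; otherwise just shift.
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return ~crc;
}

// h_m(k): reduce the 32-bit hash into the domain 0 .. m - 1.
std::uint32_t HashToBucket(const std::string &key, std::uint32_t m) {
  assert(m > 0);
  return Crc32(key) % m;
}
```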
Hash Table Implementation • Use a Table<int, Value> ADT of size m • Use h_m(Key) as the key • All operations can then be done as with a Table • Solved, except • Collisions: what to do if two different keys k have the same h_m(k) • How to find a suitable hash function • If good hash functions are used, hash tables provide near O(1) insertion, searching and removal on average • But it is difficult to get right • And it is not easy to code • C++: hash_map<Key, Value, hash_func> (a pre-standard extension; standard C++11 provides unordered_map<Key, Value, Hash>) • Read 2003 Advanced Notes on Hash Table if you are motivated enough
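One common answer to the collision question is separate chaining. Each of the m buckets holds a list of all key-value pairs whose keys hash there, so colliding keys simply share a bucket. This sketch uses `std::hash<std::string>` as a stand-in hash function rather than CRC32. `ChainedHashTable` is a hypothetical name. Open addressing is the other main strategy and is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hash table with separate chaining: bucket (hash(key) % m) stores a chain of pairs.
class ChainedHashTable {
 public:
  explicit ChainedHashTable(std::size_t m) : buckets_(m) { assert(m > 0); }

  void Insert(const std::string &key, int value) {
    auto &bucket = Bucket(key);
    for (auto &kv : bucket)
      if (kv.first == key) { kv.second = value; return; }  // key exists: overwrite
    bucket.emplace_back(key, value);
  }

  // Returns nullptr if key is absent.
  const int *Search(const std::string &key) const {
    for (const auto &kv : Bucket(key))
      if (kv.first == key) return &kv.second;
    return nullptr;
  }

  void Delete(const std::string &key) {
    auto &bucket = Bucket(key);
    for (auto it = bucket.begin(); it != bucket.end(); ++it)
      if (it->first == key) { bucket.erase(it); return; }
  }

 private:
  using Chain = std::list<std::pair<std::string, int>>;
  Chain &Bucket(const std::string &key) {
    return buckets_[std::hash<std::string>{}(key) % buckets_.size()];
  }
  const Chain &Bucket(const std::string &key) const {
    return buckets_[std::hash<std::string>{}(key) % buckets_.size()];
  }
  std::vector<Chain> buckets_;
};
```

With m = 1 every key collides and the table degrades to a linked list (O(n) per operation). That is why m and the hash function must spread keys well for the near O(1) behaviour.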
Binary Search Tree Implementation • A sorted array is fast for searching • But it is slow when inserting at the front • Idea: store two separate arrays, split by a value v • If value < v, insert into the left array • If value >= v, insert into the right array • Now we have a data structure with (if v splits the elements evenly) • N / 2 + 1 insertion (N in the past) • lg(N) + 1 searching
Binary Search Tree Implementation • Now we have a data structure with • N / 2 + 1 insertion (N in the past) • lg(N) + 1 searching • If we store “N / 2” elements in this DS • N / 4 + 1 insertion • lg(N) searching • If both the left and right arrays use this DS [Recursion] • N / 4 + 2 insertion • lg(N) + 1 searching • Continue this process lg(N) times • lg(N) + 2 insertion • lg(N) + 1 searching • What will it look like?
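Binary Search Tree Implementation
struct Node {
  Node *left, *right;
  int value;
};

type
  pNode = ^Node;
  Node = record
    left, right : pNode;
    value : integer;
  end;
(Figure: an example tree with root 6, children 3 and 8, leaves 1, 4, 7 and 9, and 7.5 attached as the right child of 7.)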
Introduction to Tree • Node • Root • Leaf / Internal • Parent / Children • [Proper] Ancestors / Descendants • Siblings
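Binary Search Tree Implementation • Operations • Searching • If target < current, go to left • If target > current, go to right • Insertion • Search • Insert it where the search ends • Removal • If it is a leaf, just remove it. • If it has one child, replace it with that child. • Otherwise, the smallest value in its right subtree sits in a node with no left child. Copy that value in, then remove that node instead! • Worst Case • If input is sorted, the tree will become … • What can we do? • C++: map<Key, Value, comparator> (usually a balanced BST, e.g. a red-black tree)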
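Here is a minimal sketch of these operations on `int` values, assuming no duplicate keys. It uses the removal rule above: a node with at most one child is replaced by that child, and otherwise the node takes the smallest value of its right subtree. The functions `Search`, `Insert` and `Remove` are hypothetical names. The tree is not balanced, so sorted input still degenerates it.

```cpp
#include <cassert>

struct Node {
  int value;
  Node *left = nullptr, *right = nullptr;
  explicit Node(int v) : value(v) {}
};

// Search: go left if target < current, right if target > current.
Node *Search(Node *root, int target) {
  while (root && root->value != target)
    root = target < root->value ? root->left : root->right;
  return root;
}

// Insertion: search for v, then attach a new leaf where the search fell off the tree.
void Insert(Node *&root, int v) {
  if (!root) root = new Node(v);
  else if (v < root->value) Insert(root->left, v);
  else if (v > root->value) Insert(root->right, v);
  // v already present: nothing to do
}

// Removal: splice out a node with at most one child; otherwise copy in the
// smallest value of the right subtree and remove that value from the right subtree.
void Remove(Node *&root, int v) {
  if (!root) return;
  if (v < root->value) { Remove(root->left, v); return; }
  if (v > root->value) { Remove(root->right, v); return; }
  if (root->left && root->right) {
    Node *succ = root->right;
    while (succ->left) succ = succ->left;  // smallest value larger than root's
    root->value = succ->value;
    Remove(root->right, succ->value);
  } else {
    Node *old = root;
    root = root->left ? root->left : root->right;
    delete old;
  }
}
```

Inserting 6, 3, 8, 1, 4, 7, 9 in that order produces the example tree (without 7.5). Removing 6 then moves 7 up to the root.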
Recess Have a break!
Stack ADT • Something your compiler has already implemented for you: the call stack
int pow(int x, int n) {
  if (n == 0) return 1;
  int v = pow(x, n / 2);
  if (n % 2 == 0) return v * v;
  return x * v * v;
}
• pow(3, 5) → pow(3, 2) → pow(3, 1) → pow(3, 0)
Stack ADT • But • It dictates what is put on the stack • It couples control flow with data flow • So we will still implement our own stack • Last-in-first-out • When do we need this behavior? • Array? • Fast, but fixed size • C++: stack<Type>
Array Implementation of Stack
int stack[100];
int top = 0;
void push(int v) { stack[top++] = v; }
int pop() { return stack[--top]; }

var
  stack : array[1..100] of integer;
  top : integer;
procedure push(v : integer);
begin
  inc(top);
  stack[top] := v;
end;
function pop : integer;
begin
  pop := stack[top];
  dec(top);
end;
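The array stack above has a fixed capacity and no checks for popping an empty stack. One alternative reuses the front insertion from the linked list: a stack whose size is limited only by memory, with push and pop both at the head in O(1). `ListStack` is a hypothetical name, and copying is disabled so that two stacks never share nodes.

```cpp
#include <cassert>

// A stack built on a singly linked list: push and pop both work at the head.
class ListStack {
 public:
  ListStack() = default;
  ListStack(const ListStack &) = delete;
  ListStack &operator=(const ListStack &) = delete;
  ~ListStack() { while (!Empty()) Pop(); }

  void Push(int v) { top_ = new Node{v, top_}; }

  int Pop() {
    assert(!Empty());  // popping an empty stack is an error
    Node *old = top_;
    int v = old->value;
    top_ = old->next;
    delete old;
    return v;
  }

  bool Empty() const { return top_ == nullptr; }

 private:
  struct Node {
    int value;
    Node *next;
  };
  Node *top_ = nullptr;
};
```

The trade-off is one heap allocation per push, which is usually slower than the array version.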
Queue ADT • First-in-first-out • When do we need this behavior? • A major use is Breadth First Search in graphs • Array? • Fast, but fixed size • Circular? • C++: queue<Type>
Array Implementation of Queue
int queue[100];
int head = 0, tail = 0;
void enqueue(int v) { queue[tail++] = v; }
int dequeue() { return queue[head++]; }

var
  queue : array[1..100] of integer;
  head, tail : integer;
procedure enqueue(v : integer);
begin
  inc(tail);
  queue[tail] := v;
end;
function dequeue : integer;
begin
  inc(head);
  dequeue := queue[head];
end;
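In the array queue above, head and tail only move forward, so after 100 enqueues the array is exhausted even if most elements were dequeued. A circular queue answers the "Circular?" question by wrapping both indices modulo the capacity, so freed slots are reused. This sketch keeps an explicit count to tell "full" apart from "empty". `CircularQueue` is a hypothetical name, and the tiny capacity of 4 is chosen only to make wrap-around easy to see.

```cpp
#include <cassert>

// Circular array queue: head_ and tail_ wrap around modulo kCapacity.
class CircularQueue {
 public:
  static constexpr int kCapacity = 4;

  void Enqueue(int v) {
    assert(count_ < kCapacity);  // queue is full
    data_[tail_] = v;
    tail_ = (tail_ + 1) % kCapacity;
    ++count_;
  }

  int Dequeue() {
    assert(count_ > 0);  // queue is empty
    int v = data_[head_];
    head_ = (head_ + 1) % kCapacity;
    --count_;
    return v;
  }

  int Size() const { return count_; }

 private:
  int data_[kCapacity];
  int head_ = 0, tail_ = 0, count_ = 0;
};
```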
Priority Queue ADT • PriorityQueue<Priority, Value> • void Push(Priority &p, Value &v) • Add an element • Value &Top() • Returns the element with maximum priority • void Pop() • Remove the element with maximum priority • Again, both an array and a linked list can do it, but suboptimally: at least one operation costs O(n). A maximum heap can do Push and Pop in O(lg n) and Top in O(1). • C++: priority_queue<Type, Container, Compare>
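With the C++ standard `std::priority_queue`, the (priority, value) pair can be stored as a `std::pair`. Pairs compare by priority first and then by value, and the default comparator makes it a max-heap. Passing `std::greater` turns it into a min-priority queue. The aliases `Item`, `MaxPQ`, `MinPQ` and the function `InPriorityOrder` are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

using Item = std::pair<int, std::string>;  // (priority, value)
using MaxPQ = std::priority_queue<Item>;   // top() has the largest priority
using MinPQ = std::priority_queue<Item, std::vector<Item>, std::greater<Item>>;

// Returns the values ordered from highest to lowest priority.
std::vector<std::string> InPriorityOrder(const std::vector<Item> &items) {
  MaxPQ pq;
  for (const Item &it : items) pq.push(it);  // Push: O(lg n) each
  std::vector<std::string> out;
  while (!pq.empty()) {
    out.push_back(pq.top().second);          // Top: O(1)
    pq.pop();                                // Pop: O(lg n)
  }
  return out;
}
```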
Heap • In an array with N elements • We can obtain the maximum value in O(1) time if every Add() updates it. • But removing it destroys all knowledge and requires about N - 1 operations to recalculate. • If we have 2 arrays of N / 2 elements, each tracking its own maximum • We only need about N / 2 operations, because only the array from which the maximum was extracted is recalculated. (Figure: two arrays with maxima 8 and 6)
Heap (figure sequence: a max-heap with root 8, drawn as a binary tree and redrawn step by step while its maximum is removed)
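Heap
• Left Complete Binary Tree, stored in an array with indices 1 to 14 (the children of node i are nodes 2i and 2i + 1)
• [8, 7, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1, 4] (initial heap)
• [4, 7, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1] 8 (move the last element to the root; 8 removed)
• [7, 4, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1] 8 (sift 4 down: swap with the larger child 7)
• [7, 5, 6, 4, 4, 5, 5, 2, 3, 2, 3, 3, 1] 8 (swap with the larger child 5; done)
• [1, 5, 6, 4, 4, 5, 5, 2, 3, 2, 3, 3] 7, 8 (remove 7: move the last element 1 to the root)
• [6, 5, 1, 4, 4, 5, 5, 2, 3, 2, 3, 3] 7, 8 (swap with the larger child 6)
• [6, 5, 5, 4, 4, 1, 5, 2, 3, 2, 3, 3] 7, 8 (swap with a child 5)
• [6, 5, 5, 4, 4, 3, 5, 2, 3, 2, 3, 1] 7, 8 (swap with the child 3; done)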
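The trace above can be reproduced with a short array-based max-heap. This sketch uses 0-based indices, so the children of i are 2i + 1 and 2i + 2. `PopMax` moves the last element to the root and sifts it down, preferring the left child on ties as in the trace. `Push` appends and sifts up. `SiftDown`, `PopMax` and `Push` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Move h[i] down until both children are no larger (0-based: children 2i+1, 2i+2).
void SiftDown(std::vector<int> &h, std::size_t i) {
  const std::size_t n = h.size();
  while (true) {
    std::size_t largest = i;
    std::size_t l = 2 * i + 1, r = 2 * i + 2;
    if (l < n && h[l] > h[largest]) largest = l;
    if (r < n && h[r] > h[largest]) largest = r;  // strict: ties go to the left child
    if (largest == i) return;
    std::swap(h[i], h[largest]);
    i = largest;
  }
}

// Remove and return the maximum: O(lg n).
int PopMax(std::vector<int> &h) {
  assert(!h.empty());
  int top = h.front();
  h.front() = h.back();  // move the last element to the root
  h.pop_back();
  if (!h.empty()) SiftDown(h, 0);
  return top;
}

// Add v: append it, then swap it up while it beats its parent. O(lg n).
void Push(std::vector<int> &h, int v) {
  h.push_back(v);
  std::size_t i = h.size() - 1;
  while (i > 0 && h[(i - 1) / 2] < h[i]) {
    std::swap(h[(i - 1) / 2], h[i]);
    i = (i - 1) / 2;
  }
}
```

Starting from `{8, 7, 6, 4, 5, 5, 5, 2, 3, 2, 3, 3, 1, 4}`, two calls to `PopMax` return 8 and then 7. They leave the same arrays as the last line of each removal in the trace.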