Explore dynamic sets, basic techniques, and diverse data structures in C# for efficient algorithm design and performance evaluation. Learn about arrays, queues, trees, and more.
CSE 5350/7350 Introduction to Algorithms • Data Structures: Specification and Implementation • Textbook readings: Cormen: Part III, Chapters 10-14 • Mihaela Iridon, Ph.D. mihaela@engr.smu.edu Data Structures
Objectives • Understand what dynamic sets are • Learn basic techniques for • Representing & • Manipulating finite dynamic sets • Elementary Data Structures • Stacks, queues, heaps, linked lists • More Complex Data Structures • Hash tables, binary search trees • Data Structures in C#.NET 2.0
High-Level Structure (1) • Arrays • System.Collections.ArrayList • System.Collections.Generic.List • Queue • System.Collections.Generic.Queue • Stack • System.Collections.Generic.Stack
High-Level Structure (2) • Hashtable • System.Collections.Hashtable • System.Collections.Generic.Dictionary • Trees • Binary Trees, BST, Self-Balancing BST • Linked Lists • System.Collections.Generic.LinkedList • Graphs
Dynamic Data Sets • Definition • Why dynamic • General examples • Data structures and the .NET framework • “An Extensive Examination of Data Structures Using C# 2.0” – Scott Mitchell http://msdn2.microsoft.com/en-us/library/ms364091(VS.80).aspx
Data Structure Design • Impact on efficiency/running time • The data structure used by an algorithm can greatly affect the algorithm's performance • It is important to have a rigorous method by which to compare the efficiency of various data structures
Example: file extension search
    // Requires: using System; using System.IO;
    // Returns true if any file name has the given extension (case-insensitive).
    public bool DoesExtensionExist(string[] fileNames, string extension)
    {
        for (int i = 0; i < fileNames.Length; i++)
            if (String.Compare(Path.GetExtension(fileNames[i]), extension, true) == 0)
                return true;
        return false; // If we reach here, we didn't find the extension
    }
The search is O(n): in the worst case every file name is examined.
The Array • Linear • Simple • Direct Access • Homogeneous • Most widely used
The Array (2) • The contents of an array are stored in contiguous memory. • All of the elements of an array must be of the same type or of a derived type; hence arrays are referred to as homogeneous data structures. • Array elements can be directly accessed. With arrays, if you know you want to access the ith element, you can simply use one line of code: arrayName[i].
Array Operations • Allocation • Accessing • Declaring an array in C#: string[] myArray; (initially myArray reference is null) • Creating an array in C#: myArray = new string[5];
Array Allocation • string[] myArray = new string[someIntegerSize]; • this allocates a contiguous block of memory on the heap (CLR-managed)
Array Accessing • Accessing an element at index i: O(1) • Searching through an array • Unsorted: O(n) • Sorted: O(log n) • Array class: static method: • Array.BinarySearch(Array input, object val)
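As a hedged illustration of the two search costs on this slide, the sketch below contrasts a linear scan with Array.BinarySearch on a sorted array. The class name SearchDemo and its method names are assumptions made for this example only.

```csharp
using System;

public static class SearchDemo
{
    // Linear scan: examines up to n elements, so O(n). Works on unsorted data.
    public static int LinearSearch(int[] data, int target)
    {
        for (int i = 0; i < data.Length; i++)
            if (data[i] == target)
                return i;
        return -1;
    }

    // Binary search: requires a sorted array; O(log n).
    // Array.BinarySearch returns a negative number when the value is absent.
    public static int SortedSearch(int[] sortedData, int target)
    {
        int index = Array.BinarySearch(sortedData, target);
        return index >= 0 ? index : -1;
    }
}
```

Calling SearchDemo.SortedSearch on an unsorted array gives unreliable results, which is why the O(log n) bound applies only to sorted data.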
Array Resizing • When the size needs to change: • Must create a new array instance • Copy old array into new array: Array1.CopyTo(Array2, 0) • Time consuming • Also, inserting into an array is problematic
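The resizing steps above can be sketched as follows. This is a minimal illustration, assuming a hypothetical helper named ArrayResizeDemo.Grow; it is not part of the .NET library.

```csharp
using System;

public static class ArrayResizeDemo
{
    // Returns a new, larger array holding the old contents.
    // Every element is copied, so the cost is O(n).
    public static string[] Grow(string[] oldArray, int newSize)
    {
        if (newSize < oldArray.Length)
            throw new ArgumentException("newSize must not be smaller than the current length.");

        string[] newArray = new string[newSize];  // step 1: new array instance
        oldArray.CopyTo(newArray, 0);             // step 2: copy old into new
        return newArray;
    }
}
```

The caller has to replace its reference with the returned array, for example names = ArrayResizeDemo.Grow(names, names.Length * 2).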
Multi-Dimensional Arrays • Rectangular • n x n • n x n x n x … • Accessing: O(1) • Searching: O(n^k) for k dimensions • Jagged/Ragged • n1 x n2 x n3 x …
Goals • Type-safe • Performant • Reusable • Example: payroll application
System.Collections.ArrayList • Can hold any data type (hybrid) • Internally: an object array • Automatic resizing • Not type safe: casting errors detected only at runtime • Boxing/unboxing: extra level of indirection affects performance • Loose homogeneity
Generics • Remedy for Typing and Performance • Type-safe collections • Reusability • Example: public class MyTypeSafeList<T> { T[] innerArray = new T[0]; }
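To make the contrast concrete, here is a small sketch comparing ArrayList with the generic List<T>. The class and method names (TypeSafetyDemo, ArrayListCastFails, SumTyped) are assumptions for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class TypeSafetyDemo
{
    // ArrayList stores object references, so the int is boxed and mixed
    // types compile. The bad cast is detected only at run time.
    public static bool ArrayListCastFails()
    {
        ArrayList items = new ArrayList();
        items.Add(42);        // boxed int
        items.Add("hello");   // a string in the same list
        try
        {
            int n = (int)items[1];   // compiles, but throws InvalidCastException
            return n < 0 && false;   // never reached
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // List<int> rejects a string at compile time and stores ints without boxing.
    public static int SumTyped()
    {
        List<int> numbers = new List<int>();
        numbers.Add(42);
        numbers.Add(8);
        // numbers.Add("hello");  // would not compile
        int sum = 0;
        foreach (int n in numbers) sum += n;
        return sum;
    }
}
```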
List • Homogeneous • Self-re-dimensioning array • System.Collections.Generic.List
    List<string> studentNames = new List<string>();
    studentNames.Add("John");
    …
    string name = studentNames[3];
    studentNames[2] = "Mike";
List Methods • Contains() • IndexOf() • BinarySearch() • Find() • FindAll() • Sort() • Asymptotic Running Time: same as array but with extra overhead
Ordered Requests Processing • First-come, first-served (FIFO) • Priority-based processing • Inefficient to use List<T> • List will continue to grow (internally, the capacity is doubled each time it is exceeded) • Solution: circular list/array • Problem: initial size??
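The circular-array idea can be sketched as below. This is a minimal fixed-capacity version, assuming a hypothetical class named CircularQueue<T>; it does not grow, which is exactly where the initial-size question comes from.

```csharp
using System;

// Hypothetical fixed-capacity FIFO queue backed by a circular array.
public class CircularQueue<T>
{
    private readonly T[] items;
    private int head;   // index of the oldest element
    private int count;  // number of stored elements

    public CircularQueue(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException("capacity");
        items = new T[capacity];
    }

    public int Count { get { return count; } }

    public void Enqueue(T item)
    {
        if (count == items.Length) throw new InvalidOperationException("Queue is full.");
        int tail = (head + count) % items.Length;  // wrap around the end of the array
        items[tail] = item;
        count++;
    }

    public T Dequeue()
    {
        if (count == 0) throw new InvalidOperationException("Queue is empty.");
        T item = items[head];
        items[head] = default(T);                  // release the stored reference
        head = (head + 1) % items.Length;          // advance, wrapping if needed
        count--;
        return item;
    }
}
```

Because head and tail wrap around, slots freed by Dequeue are reused and the array never has to shift its contents.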
Queue • System.Collections.Generic.Queue • Operations: • Enqueue() • Dequeue() • Contains() • ToArray() • Peek() • Does not allow random access • Type-safe; maximizes space utilization
Queue (continued) • Applications: • Web servers • Print queues • Rate of growth: • A growth factor can be specified in the constructor of the non-generic System.Collections.Queue • Default: capacity doubles when more space is needed
Stack • LIFO • System.Collections.Generic.Stack • Operations: • Push() • Pop() • Doubles in size when more space is needed • Applications: • CLR call stack (function invocation)
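A short sketch of the FIFO and LIFO operations listed for Queue and Stack follows. The class QueueStackDemo and the sample strings are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class QueueStackDemo
{
    // FIFO: items leave in the order they arrived.
    public static string ServeInOrder()
    {
        Queue<string> jobs = new Queue<string>();
        jobs.Enqueue("job1");
        jobs.Enqueue("job2");
        jobs.Enqueue("job3");
        string first = jobs.Peek();      // looks at the front without removing it
        string served = jobs.Dequeue();  // removes the front item
        return first + "," + served + "," + jobs.Count;
    }

    // LIFO: the last item pushed is the first one popped.
    public static string UnwindCalls()
    {
        Stack<string> calls = new Stack<string>();
        calls.Push("Main");
        calls.Push("Helper");
        string top = calls.Pop();        // removes the most recent item
        return top + "," + calls.Peek();
    }
}
```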
Limitations of Ordinal Indexing • Ideal access time: O(1) • If index is unknown • O(n) if not sorted • O(log n) if sorted • Example: SSN: 10^9 possible combinations • Solution: compress the ordinal indexing domain with a hash function; e.g. use only 4 digits
Hash Table • Hashing: • Mathematical transformation of one representation into another representation • Hash table: • The array that uses hashing to compress the indexer space • Cryptography (information security) • Hash function: • Non-injective (not a one-to-one function) • “Fingerprint” of initial data
Goals • Fast access of items in large amounts of data • As few collisions as possible • Collision avoidance • Avalanche effect: • Minor changes to input → major changes to output
Collision Resolution (1) • Probability to map to a given location: 1/k (k = size = number of slots) • (1) Linear Probing • Is H[i] empty? • YES: place item at location i • NO: i = i + 1; repeat • Deficiency: clustering • Access and insertion: no longer O(1)
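The linear probing steps above can be sketched as follows. This is a simplified illustration, assuming a hypothetical LinearProbingSet class that stores non-negative int keys, uses key % size as the hash, and does not support removal.

```csharp
using System;

// Hypothetical open-addressing table of non-negative int keys.
public class LinearProbingSet
{
    private readonly int?[] slots;   // null marks an empty slot

    public LinearProbingSet(int size) { slots = new int?[size]; }

    // Assumed hash function for this sketch: key modulo table size.
    private int Home(int key)
    {
        if (key < 0) throw new ArgumentOutOfRangeException("key");
        return key % slots.Length;
    }

    // Returns the slot index used, or -1 if the table is full.
    public int Add(int key)
    {
        int start = Home(key);
        for (int step = 0; step < slots.Length; step++)
        {
            int i = (start + step) % slots.Length;   // i = i + 1, wrapping around
            if (slots[i] == null || slots[i] == key)
            {
                slots[i] = key;
                return i;
            }
        }
        return -1;
    }

    public bool Contains(int key)
    {
        int start = Home(key);
        for (int step = 0; step < slots.Length; step++)
        {
            int i = (start + step) % slots.Length;
            if (slots[i] == null) return false;  // an empty slot ends the probe sequence
            if (slots[i] == key) return true;
        }
        return false;
    }
}
```

With 7 slots, the keys 10, 17 and 24 all hash to slot 3 and end up in slots 3, 4 and 5. That run of occupied neighbours is the clustering the slide warns about.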
Collision Resolution (2) • (2) Quadratic Probing • Check s + 1^2 • Check s – 1^2 • Check s + 2^2 • Check s – 2^2 • … • Check s ± i^2 • Clustering is a problem as well
Collision Resolution (3) • (3) Rehashing – used by Hashtable (C#) • System.Collections.Hashtable • Operations: • Add(key, item) • ContainsKey() • Keys (property) • ContainsValue() • Values (property) • Key, Value: any type → not type safe
Hashtable Data Type – Example
    using System;
    using System.Collections;

    public class HashtableDemo
    {
        private static Hashtable employees = new Hashtable();

        public static void Main()
        {
            // Add some values to the Hashtable, indexed by a string key
            employees.Add("111-22-3333", "Scott");
            employees.Add("222-33-4444", "Sam");
            employees.Add("333-44-5555", "Jisun");

            // Access a particular key; the cast is needed because values are stored as object
            if (employees.ContainsKey("111-22-3333"))
            {
                string empName = (string) employees["111-22-3333"];
                Console.WriteLine("Employee 111-22-3333's name is: " + empName);
            }
            else
                Console.WriteLine("Employee 111-22-3333 is not in the hash table...");
        }
    }
Hashtable • Key = any type • Key is transformed into an index via the GetHashCode() function • Object class defines GetHashCode() • H(key) = [GetHash(key) + 1 + (((GetHash(key) >> 5) + 1) % (hashsize – 1))] % hashsize • Values = 0 .. hashsize-1
Collision Resolution (3 – cont’d) • Rehashing = double hashing • Set of hash functions: H1, H2, …, Hn • Hk(key) = [GetHash(key) + k * (1 + (((GetHash(key) >> 5) + 1) % (hashsize – 1)))] % hashsize • Hashsize must be PRIME
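The Hk formula above can be transcribed directly into code. The sketch below is only a transcription of the slide's formula, not the Hashtable source; the names DoubleHashingDemo and Probe are assumptions, and the hash value is assumed to be non-negative.

```csharp
using System;

public static class DoubleHashingDemo
{
    // Computes Hk(key) from the slide, given hash = GetHash(key).
    // Assumes hash >= 0, k >= 0, and hashSize is a prime greater than 1.
    public static int Probe(int hash, int k, int hashSize)
    {
        // The increment is always in the range 1 .. hashSize - 1.
        long increment = 1 + (((hash >> 5) + 1) % (hashSize - 1));
        // long arithmetic avoids overflow for large hash values.
        return (int)((hash + k * increment) % hashSize);
    }
}
```

Because the increment is never a multiple of the prime table size, the probes for k = 0 .. hashsize-1 visit every slot exactly once, which is the reason for the PRIME requirement.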
Hashtable • Load factor = maximum allowed ratio (# items / # slots) • Optimal: 0.72 • Expanding the hashtable takes 2 costly steps: • Grow the number of slots (current prime → next prime that is about twice as large) • Rehash • High load factor → dense hashtable • Less space • More probes on collision: 1/(1 − LF) • If LF = 0.72 → expected # probes = 1/(1 − 0.72) ≈ 3.57 → O(1)
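The probe estimate above can be worked out directly. The notation E[probes] is introduced here for illustration, and the formula is the approximation quoted on the slide.

```latex
E[\text{probes}] \approx \frac{1}{1 - LF}
\qquad\Longrightarrow\qquad
LF = 0.72:\quad \frac{1}{1 - 0.72} = \frac{1}{0.28} \approx 3.57
```

For comparison, the same formula gives 2 probes at LF = 0.5 and 10 probes at LF = 0.9, which shows the trade-off between space and probe count.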
Hashtable • Costly to expand • Set the size in constructor if size is known • Asymptotic running times: • Access: O(1) • Add, Remove: O(1) • Search: O(1)
System.Collections.Generic.Dictionary • Type-safe • Strongly typed KEYS + VALUES • Operations: • Add(key, value) • ContainsKey(key) • Collision Resolution: CHAINING • Uses linked lists from an entry where collision occurs
Chaining in Dictionary Data Type
Dictionary Example
    // General form:
    // Dictionary<keyType, valueType> variableName = new Dictionary<keyType, valueType>();
    Dictionary<int, Employee> employeeData = new Dictionary<int, Employee>();

    // Add some employees (Add takes the key and the value as two arguments)
    employeeData.Add(455110189, new Employee("Scott Mitchell"));
    employeeData.Add(455110191, new Employee("Jisun Lee"));
    ...
    // See if employee with SSN 123-45-6789 works here
    if (employeeData.ContainsKey(123456789)) ...
Chaining in the Dictionary type • Efficiency: • Add: O(1) • Remove: O(n/m) • Search: O(n/m) • Where: n = number of elements in the hash table, m = number of buckets/slots • Implemented such that n = m at ALL times, so O(n/m) reduces to O(1) • The total # of chained elements can never exceed the number of buckets
Trees • = set of linked nodes where no cycle exists • (GT) a connected acyclic graph • Nodes: • Root • Leaf • Internal • |E| = ? • Forest = { trees }
Popular Tree-Type Data Structures • BST: Binary Search Tree • Heap • Self-balancing binary search trees • AVL • Red-black • Radix tree • …
Binary Trees • Code example for defining a tree data object • Tree Traversal (L = left subtree, Ro = root, R = right subtree) • In-order: L Ro R • Pre-order: Ro L R • Post-order: L R Ro • Θ(n)
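The slide's code example is not included in this transcript, so the following is a stand-in sketch rather than the author's code. It assumes a hypothetical TreeNode<T> class and shows the three traversal orders; each visits every node once, hence Θ(n).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal binary tree node.
public class TreeNode<T>
{
    public T Value;
    public TreeNode<T> Left;
    public TreeNode<T> Right;

    public TreeNode(T value) { Value = value; }
}

public static class Traversal
{
    // In-order: left subtree, root, right subtree.
    public static void InOrder<T>(TreeNode<T> node, List<T> output)
    {
        if (node == null) return;
        InOrder(node.Left, output);
        output.Add(node.Value);
        InOrder(node.Right, output);
    }

    // Pre-order: root, left subtree, right subtree.
    public static void PreOrder<T>(TreeNode<T> node, List<T> output)
    {
        if (node == null) return;
        output.Add(node.Value);
        PreOrder(node.Left, output);
        PreOrder(node.Right, output);
    }

    // Post-order: left subtree, right subtree, root.
    public static void PostOrder<T>(TreeNode<T> node, List<T> output)
    {
        if (node == null) return;
        PostOrder(node.Left, output);
        PostOrder(node.Right, output);
        output.Add(node.Value);
    }
}
```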
Binary Tree Data Structure
Tree Operations • Search: Recursive: O(h) • h = height of the tree • Max & Min Search: search right/left • Successor & Predecessor Search • Insertion (easy: always add a new leaf) & Deletion (more complicated as it may cause the tree structure to change) • Running time: • function of the tree topology
Binary Search Tree • Improves the search time (and lookup time) over the binary tree in general • BST property: • for any node n, every descendant node's value in the left subtree of n is less than the value of n, and every descendant node's value in the right subtree is greater than the value of n
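A minimal sketch of insertion and search under the BST property follows. The names BstNode and Bst are assumptions, keys are plain ints, and duplicates are simply ignored; deletion and balancing are left out.

```csharp
using System;

// Hypothetical BST node holding an int key.
public class BstNode
{
    public int Value;
    public BstNode Left;
    public BstNode Right;

    public BstNode(int value) { Value = value; }
}

public static class Bst
{
    // Inserts value as a new leaf and returns the subtree root.
    // Smaller values go left, larger values go right; duplicates are ignored.
    public static BstNode Insert(BstNode root, int value)
    {
        if (root == null) return new BstNode(value);
        if (value < root.Value) root.Left = Insert(root.Left, value);
        else if (value > root.Value) root.Right = Insert(root.Right, value);
        return root;
    }

    // Follows a single root-to-leaf path, so the cost is O(h).
    public static bool Contains(BstNode root, int value)
    {
        BstNode current = root;
        while (current != null)
        {
            if (value == current.Value) return true;
            current = value < current.Value ? current.Left : current.Right;
        }
        return false;
    }

    // Height in edges; an empty tree has height -1.
    public static int Height(BstNode root)
    {
        if (root == null) return -1;
        return 1 + Math.Max(Height(root.Left), Height(root.Right));
    }
}
```

Inserting 50, 30, 70, 20, 40 gives a tree of height 2, while inserting 1, 2, 3, 4, 5 in sorted order gives a chain of height 4, so search time depends on the tree's shape.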
Non-BST vs BST • Non-BST • BST
Linear Search Time in BST The search time for a BST depends upon its topology.
BST continued • Perfectly balanced BST: • Search: O(log n) [height = log n] • Sub-linear search running time • Balanced Binary Tree: • Exhibits a good ratio: breadth/depth • Self-balancing trees
The Heap • Specialized tree-based data structure that satisfies the heap property: if B is a child node of A, then key(A) ≥ key(B). [max-heap] • Operations: • delete-max or delete-min: removing the root node of a max- or min-heap, respectively • increase-key or decrease-key: updating a key within a max- or min-heap, respectively • insert: adding a new key to the heap • merge: joining two heaps to form a valid new heap containing all the elements of both
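The insert and delete-max operations can be sketched with an array-backed binary heap. This is one common way to implement the heap property, not necessarily the one the course uses; the class name MaxHeap is an assumption, and increase-key and merge are omitted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical array-backed binary max-heap of int keys.
// The children of index i live at 2i + 1 and 2i + 2.
public class MaxHeap
{
    private readonly List<int> keys = new List<int>();

    public int Count { get { return keys.Count; } }

    // insert: append at the end, then sift up until key(parent) >= key(child).
    public void Insert(int key)
    {
        keys.Add(key);
        int i = keys.Count - 1;
        while (i > 0)
        {
            int parent = (i - 1) / 2;
            if (keys[parent] >= keys[i]) break;
            Swap(parent, i);
            i = parent;
        }
    }

    // delete-max: remove the root, move the last key to the root, then sift down.
    public int DeleteMax()
    {
        if (keys.Count == 0) throw new InvalidOperationException("Heap is empty.");
        int max = keys[0];
        int last = keys.Count - 1;
        keys[0] = keys[last];
        keys.RemoveAt(last);

        int i = 0;
        while (true)
        {
            int left = 2 * i + 1, right = left + 1, largest = i;
            if (left < keys.Count && keys[left] > keys[largest]) largest = left;
            if (right < keys.Count && keys[right] > keys[largest]) largest = right;
            if (largest == i) break;
            Swap(i, largest);
            i = largest;
        }
        return max;
    }

    private void Swap(int a, int b)
    {
        int t = keys[a]; keys[a] = keys[b]; keys[b] = t;
    }
}
```

Both operations follow one root-to-leaf path, so each costs O(log n) on a heap of n keys.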
Max Heap Example [figure: example of a max-heap]
Linked Lists • No resizing necessary • Search: O(n) • Insertion • O(1) if unsorted • O(n) if sorted • Access: O(n) • System.Collections.Generic.LinkedList • Doubly-linked; type safe (via generics) • Element: LinkedListNode
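A brief usage sketch of LinkedList<T> and LinkedListNode<T> follows. The class name LinkedListDemo and the sample names are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LinkedListDemo
{
    public static string Build()
    {
        LinkedList<string> names = new LinkedList<string>();
        names.AddLast("Ann");
        names.AddLast("Cid");
        names.AddFirst("Zed");                           // O(1) insertion at the head

        LinkedListNode<string> cid = names.Find("Cid");  // O(n) search for the node
        names.AddBefore(cid, "Bob");                     // O(1) once the node is known

        return string.Join(",", new List<string>(names).ToArray());
    }
}
```

No element is shifted or copied during these insertions, which is why no resizing is needed. The cost moves to Find, which has to walk the list.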