
Session 11: Data Structures and Collections


tien

Presentation Transcript


  1. Session 11: Data Structures and Collections
     • Lists (array-based, linked)
     • Sorting and Searching
     • Hashing
     • Trees
     • System.Collections.Generic
     UCN T&B: IT Technology

  2. Lists
     • A data structure where elements are organised by position (index).
     • ArrayList (List) and LinkedList
     • Sometimes lists are called sequences.
     • Linked list: each element has a reference to the next element, so elements may be allocated at different memory locations.
     • Array-based list: one fixed-size segment in memory.
     [Figure label: numList]

  3. ArrayList
     • Array-based:
       • The underlying array has a fixed size (statically allocated).
       • Always occupies its full capacity in memory, even when only partly filled.
       • May grow or shrink dynamically, but that requires allocating a new array and copying the elements into it.
       • Direct access to elements by position (index); otherwise searching is required.
       • Inserting and deleting in the middle of the list requires moving (many) elements.
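The trade-offs on this slide can be made concrete with a small array-based list. This is only a sketch: the class name `GrowableArrayList`, the initial capacity of 4, and the doubling strategy are assumptions for illustration, not how the framework's `ArrayList` or `List<T>` is necessarily implemented.

```csharp
using System;

// Hypothetical illustration class; not the framework's ArrayList or List<T>.
public class GrowableArrayList
{
    private object[] items = new object[4]; // fixed-size backing array (capacity 4 is an assumption)
    private int count;                      // number of slots actually in use

    public int Count { get { return count; } }
    public int Capacity { get { return items.Length; } }

    // Direct access by position: O(1).
    public object this[int index]
    {
        get
        {
            if (index < 0 || index >= count) throw new ArgumentOutOfRangeException("index");
            return items[index];
        }
    }

    // Inserting in the middle shifts every later element one slot to the right: O(n).
    public void InsertAt(int index, object element)
    {
        if (index < 0 || index > count) throw new ArgumentOutOfRangeException("index");
        if (count == items.Length) Grow();
        for (int i = count; i > index; i--)
            items[i] = items[i - 1];
        items[index] = element;
        count++;
    }

    // Appending is an insert at the end, so nothing has to be shifted.
    public void Add(object element) { InsertAt(count, element); }

    // Growing means allocating a new, larger array and copying every element: O(n).
    private void Grow()
    {
        object[] bigger = new object[items.Length * 2];
        for (int i = 0; i < count; i++)
            bigger[i] = items[i];
        items = bigger;
    }
}
```

Adding a fifth element to a list with capacity 4 triggers `Grow`, which is the expensive step the slide refers to; the list keeps running while this happens, it just pays for the copy.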

  4. Linked Lists (LinkedList)
     • A linked list consists of nodes representing elements.
     • Each node contains a value (or value reference) and a reference (pointer) to the next element.

  5. Linked Lists (LinkedList)
     • The list itself is represented by a reference to the first element, often called head.
     • The next-reference of the last element is usually null.
     • The linked list is dynamic in size: it grows and shrinks as needed.
     • Access by position is slow (may require traversing the whole list).

  6. Figure 4.1 a) A linked list of integers; b) insertion; c) deletion

  7. Implementation: Class Node

        private class Node
        {
            private object val;  // the stored value
            private Node next;   // reference to the next node (null at the end of the list)

            public Node(object v, Node n)
            {
                val = v;
                next = n;
            }

            public object Val
            {
                get { return val; }
                set { val = value; }
            }

            public Node Next
            {
                get { return next; }
                set { next = value; }
            }
        }

  8. Linked Implementation of ADT list

        class LinkedList
        {
            private class Node //…

            Node head, tail;
            int n; // number of elements

            public LinkedList()
            {
                head = null;
                tail = null;
                n = 0;
            }

            public int Count
            {
                get { return n; }
            }

            public void AddFront(object o)
            {
                Node tmp = new Node(o, null);
                if (Count == 0) // list is empty
                    tail = tmp;
                else
                    tmp.Next = head;
                head = tmp;
                n++;
            }
            // the class continues with the methods on the following slides

  9. Traversing a Linked List

        public void Print()
        {
            // for debugging...
            Node p = head;                  // start of list
            while (p != null)               // while not end of list
            {
                Console.WriteLine(p.Val);   // print current value
                p = p.Next;                 // set p to next element of the list
            }
        }

     [Figure labels: head, p, tail]

  10. Finding a Position in a Linked List

        public int FindPos(object o)
        {
            // Returns the position of o in the list (counting from 0).
            // If o is not contained, -1 is returned.
            bool found = false;
            int i = 0;
            Node p = head;
            while (!found && p != null)
            {
                if (object.Equals(p.Val, o)) // null-safe comparison
                    found = true;
                else
                {
                    p = p.Next;
                    i++;
                }
            }
            if (found) return i;
            else return -1;
        }
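The pieces from slides 7 to 10 can be assembled into one compilable class along the following lines. This is a sketch under assumptions: the name `SimpleLinkedList` is chosen only to avoid clashing with `System.Collections.Generic.LinkedList<T>`, the node uses public fields for brevity, and `AddBack` does not appear on the slides; it is included to show what the `tail` reference is good for.

```csharp
using System;

// Hypothetical class name; a condensed version of the list built up on the slides.
public class SimpleLinkedList
{
    private class Node
    {
        public object Val;
        public Node Next;
        public Node(object v, Node n) { Val = v; Next = n; }
    }

    private Node head, tail;
    private int n; // number of elements

    public int Count { get { return n; } }

    // O(1): the new node becomes the head.
    public void AddFront(object o)
    {
        head = new Node(o, head);
        if (tail == null) tail = head; // list was empty
        n++;
    }

    // O(1) thanks to the tail reference (AddBack is an assumption, not on the slides).
    public void AddBack(object o)
    {
        Node tmp = new Node(o, null);
        if (tail == null) head = tmp; // list was empty
        else tail.Next = tmp;
        tail = tmp;
        n++;
    }

    // O(n): walk from head until the value is found; -1 if it is absent.
    public int FindPos(object o)
    {
        int i = 0;
        for (Node p = head; p != null; p = p.Next, i++)
            if (object.Equals(p.Val, o)) return i;
        return -1;
    }
}
```

For example, after `AddFront("b")`, `AddFront("a")` and `AddBack("c")` the list holds a, b, c in that order, so `FindPos("c")` walks past two nodes before it succeeds.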

  11. Dynamic vs. Static Data Structures
      • Array-Based Lists:
        • Fixed (static) size (waste of memory).
        • May be able to grow and shrink (ArrayList), but this is very expensive in running time (O(n)).
        • Provide direct access to elements by index (O(1)).
        • May be sorted, in which case binary search gives fast access (O(log n)).
      • Linked List Implementations:
        • Use only the necessary space (grow and shrink as needed).
        • Overhead for references and memory allocation.
        • Only sequential access: access by index requires searching (expensive: O(n)).
      [Figure label: numList]
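The O(log n) access on a sorted array comes from binary search, which halves the remaining range at every step. A minimal sketch, assuming an `int[]` sorted in ascending order (the class and method names are illustrative):

```csharp
public static class SortedArraySearch
{
    // Returns the index of key in the ascending-sorted array a, or -1 if it is absent.
    public static int BinarySearch(int[] a, int key)
    {
        int low = 0, high = a.Length - 1;
        while (low <= high)
        {
            int mid = low + (high - low) / 2; // written this way to avoid overflow of low + high
            if (a[mid] == key) return mid;
            if (a[mid] < key) low = mid + 1;  // key can only be in the upper half
            else high = mid - 1;              // key can only be in the lower half
        }
        return -1;
    }
}
```

The same idea does not carry over to a linked list, because finding the middle element there already costs O(n).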

  12. Linked List - Variants: Using a tail reference

  13. Using a dummy head node

  14. Circular

  15. Doubly Linked List

  16. …operations become more complicated …

  17. The Full Monty…. (LinkedList)

  18. Search Trees: Dynamic Data Structures with Fast Search
      • Binary Trees
      • Binary Search Trees
      • General Trees (Composite Pattern)
      • Balanced Search Trees (2-3 Trees etc.)
      • B-Trees (external, database index)

  19. Terminology
      • General trees:
        • leaf/external node/terminal
        • root
        • internal node
        • siblings, children, parents, ancestors, descendants
        • sub trees
        • the depth of a node = its number of ancestors
        • the height of a tree = the maximum depth of any leaf

  20. Binary Trees
      • A binary tree can be defined recursively:
        • Either the tree is empty,
        • Or the tree is composed of a root with left and right sub trees, which are binary trees themselves.
      • Note: unlike general trees, binary trees
        • have ordered sub trees (left and right)
        • may be empty

  21. Reference Based Implementation

  22. Figure 10.9 Traversals of a binary tree: a) preorder; b) inorder; c) postorder
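A reference-based binary tree and its three traversals might look as follows. The names `TreeNode` and `Traversals` are assumptions for illustration, and the visit step simply appends the value to a list so that the visiting order can be inspected.

```csharp
using System.Collections.Generic;

// Hypothetical node type: a value plus references to the left and right sub trees.
public class TreeNode
{
    public int Val;
    public TreeNode Left, Right;

    public TreeNode(int val, TreeNode left, TreeNode right)
    {
        Val = val;
        Left = left;
        Right = right;
    }
}

public static class Traversals
{
    // Preorder: root, then left sub tree, then right sub tree.
    public static void PreOrder(TreeNode t, List<int> result)
    {
        if (t == null) return; // empty tree: nothing to visit
        result.Add(t.Val);
        PreOrder(t.Left, result);
        PreOrder(t.Right, result);
    }

    // Inorder: left sub tree, then root, then right sub tree.
    public static void InOrder(TreeNode t, List<int> result)
    {
        if (t == null) return;
        InOrder(t.Left, result);
        result.Add(t.Val);
        InOrder(t.Right, result);
    }

    // Postorder: left sub tree, then right sub tree, then root.
    public static void PostOrder(TreeNode t, List<int> result)
    {
        if (t == null) return;
        PostOrder(t.Left, result);
        PostOrder(t.Right, result);
        result.Add(t.Val);
    }
}
```

The three methods differ only in where the visit of the root is placed relative to the two recursive calls, which mirrors the recursive definition of a binary tree.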

  23. Binary Search Trees
      • Value based container:
      • The search tree property:
        • For any internal node: the value in the node is greater than every value in its left sub tree.
        • For any internal node: the value in the node is less than every value in its right sub tree.
      • Note the recursive nature of this definition:
        • It implies that all sub trees themselves are search trees.
      • Every operation must ensure that the search tree property is maintained (invariant).
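The invariant can be seen at work in insert and lookup, which both follow a single path from the root and go left or right by comparing values. The sketch below assumes integer values and rejects duplicates; the class name `BinarySearchTree` is illustrative, and no balancing is done.

```csharp
// Hypothetical, unbalanced binary search tree holding distinct integers.
public class BinarySearchTree
{
    private class Node
    {
        public int Val;
        public Node Left, Right;
        public Node(int v) { Val = v; }
    }

    private Node root;

    // Inserts v where the search for it ends; returns false if v is already present.
    public bool Insert(int v)
    {
        if (root == null) { root = new Node(v); return true; }
        Node p = root;
        while (true)
        {
            if (v == p.Val) return false;
            if (v < p.Val)
            {
                // smaller values belong in the left sub tree
                if (p.Left == null) { p.Left = new Node(v); return true; }
                p = p.Left;
            }
            else
            {
                // larger values belong in the right sub tree
                if (p.Right == null) { p.Right = new Node(v); return true; }
                p = p.Right;
            }
        }
    }

    // Follows one root-to-leaf path, so the cost is proportional to the tree's depth.
    public bool Contains(int v)
    {
        Node p = root;
        while (p != null)
        {
            if (v == p.Val) return true;
            p = v < p.Val ? p.Left : p.Right;
        }
        return false;
    }
}
```

Because a new value is always attached exactly where the search for it fell off the tree, the search tree property holds again after every insert.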

  24. Example: A Binary Search Tree Holding Names

  25. Balance Problems (skewed tree): values are inserted in sorted order

  26. InOrder: Traversal Visits Nodes in Sorted Order

  27. Efficiency
      • insert: O(log n) / O(n)
      • retrieve: O(log n) / O(n)
      • delete: O(log n) / O(n)
      • All depends on the depth of the tree: O(log n) when the tree is balanced, O(n) when it is skewed.
      • If insertions and deletions are uniformly distributed, then the tree will eventually grow skewed.
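How much the insertion order matters can be checked by measuring the height of an unbalanced search tree. The following sketch uses hypothetical names (`BstHeightDemo`, `HeightAfterInserting`) and counts height in levels, so an empty tree has height 0 and a single node has height 1.

```csharp
using System;

public static class BstHeightDemo
{
    private class Node
    {
        public int Val;
        public Node Left, Right;
        public Node(int v) { Val = v; }
    }

    // Plain unbalanced insert; duplicates are ignored.
    private static Node Insert(Node t, int v)
    {
        if (t == null) return new Node(v);
        if (v < t.Val) t.Left = Insert(t.Left, v);
        else if (v > t.Val) t.Right = Insert(t.Right, v);
        return t;
    }

    // Height in levels: 0 for the empty tree, 1 for a single node.
    private static int Height(Node t)
    {
        return t == null ? 0 : 1 + Math.Max(Height(t.Left), Height(t.Right));
    }

    // Builds a tree by inserting the values in the given order and returns its height.
    public static int HeightAfterInserting(params int[] values)
    {
        Node root = null;
        foreach (int v in values) root = Insert(root, v);
        return Height(root);
    }
}
```

Inserting 1, 2, 3, 4, 5, 6, 7 in sorted order gives a tree of height 7 (effectively a linked list), whereas inserting the same values in the order 4, 2, 6, 1, 3, 5, 7 gives height 3.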

  28. Solution: Balanced Search Trees
      • Trading space for time:
        • In the worst case, additional space in O(n) is required; but:
        • retrieve, insert and delete are in O(log n), also in the worst case.
      • Principle (here n is the maximum number of keys per node, not the number of elements):
        • A node may hold several keys (n) and has several children (n+1).
        • A node must be at least half filled (n/2 keys).
        • Insert and delete can be performed in O(log n) while keeping the tree balanced.
      • 2-3-tree: n = 2

  29. 2-3-Trees (n=2)

  30. Retrieve
      • Search using the same principle as in binary search trees:
        • Search the root.
        • If not found, then search recursively in the appropriate sub tree.
      • Performance is proportional to the height of the tree.
      • Since the tree is balanced: O(log n)
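Retrieval in a 2-3-tree can be sketched as below. The node layout is an assumption made for illustration: each node stores one or two keys in ascending order, and an internal node stores one more child than it has keys. The types `TwoThreeNode` and `TwoThreeSearch` are hypothetical, and only searching is shown, not insertion or deletion.

```csharp
// Hypothetical 2-3-tree node: 1 or 2 ascending keys; Children is null for a leaf,
// otherwise it holds Keys.Length + 1 sub trees.
public class TwoThreeNode
{
    public int[] Keys;
    public TwoThreeNode[] Children;

    public TwoThreeNode(int[] keys, TwoThreeNode[] children)
    {
        Keys = keys;
        Children = children;
    }
}

public static class TwoThreeSearch
{
    public static bool Contains(TwoThreeNode t, int key)
    {
        while (t != null)
        {
            // Find the first key that is not smaller than the search key.
            int i = 0;
            while (i < t.Keys.Length && key > t.Keys[i]) i++;

            if (i < t.Keys.Length && key == t.Keys[i]) return true; // found in this node
            if (t.Children == null) return false;                   // leaf reached: not present

            t = t.Children[i]; // descend into the sub tree between the neighbouring keys
        }
        return false;
    }
}
```

Each loop iteration moves one level down, so the number of iterations is bounded by the height of the tree.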

  31. Insertion
      • The insert algorithm must ensure that the 2-3-tree properties are preserved. It goes like this:
        • Search down through the tree to the appropriate leaf node and insert.
        • If there is room in the leaf, then we are done.
        • Otherwise split the leaf node into two new leaves and move the middle value up into the parent node.
        • If there is no room in the parent, then continue recursively until a node with room is reached, or
        • Eventually the root is reached. If there is no room in the root, then a new root is created, and the height of the tree is increased.
      • Performance depends on the height of the tree (searching down through the tree + in the worst case a trip from the leaf to the root, rebalancing on the way up).
      • That is: O(log n)

  32. Inserting 39 (there is room)

  33. Inserting 38 (there is no room in the leaf)
      • Insert anyway,
      • Split the leaf and
      • Move the middle value up

  34. Inserting 37 (there is room)

  35. Inserting 36 (there is no room)
      • Split and move up
      • Split and move up

  36. Inserting 35, 34 and 33 (there is room)

  37. Deletion
      • Like insertion, just the other way around :-)
        • Find the node with the value to be deleted.
        • If this is not a leaf, then swap the value with its inorder successor (which is always in a leaf - why?), and remove the value.
        • If there are now too few values (< n/2) in the leaf, then merge the node with a sibling and pull down a value from the parent node.
        • If there are now too few values in the parent, then continue recursively until there are enough values or the root is reached.
        • If the root becomes empty, then remove it, and the height of the tree is decreased.
      • Performance: once again, down and up through the tree: O(log n)

  38. Balanced Search Trees
      • Variants:
        • 2-3-trees
        • 2-3-4-trees
        • Red-Black-trees
        • AVL-trees
        • Splay-trees…
      • Are used, among other things, to implement the map/dictionary/table ADT.
      • In the Java Collections Framework (java.util): TreeMap and TreeSet

  39. An Alternative to Sorting and Searching: Hashing
      • Keys are converted to indices in an array.
      • A hash function, h, maps a key to an integer, the hash code.
      • The hash code is divided by the array size, and the remainder is used as the index.
      • If two or more keys give the same index, we have a collision.
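The two steps, key to hash code and hash code to index, can be sketched as follows. The polynomial string hash with multiplier 31 is only an assumption chosen for illustration; it is not claimed to be the hash function .NET uses for strings, and the names `HashIndexing`, `Hash` and `IndexFor` are hypothetical.

```csharp
public static class HashIndexing
{
    // Step 1: map the key to an integer hash code (illustrative polynomial hash).
    public static int Hash(string key)
    {
        int h = 0;
        foreach (char c in key)
            h = unchecked(31 * h + c); // overflow simply wraps around
        return h;
    }

    // Step 2: the remainder after division by the table size is the array index.
    public static int IndexFor(string key, int tableSize)
    {
        // Mask off the sign bit so that the remainder is never negative.
        return (Hash(key) & 0x7FFFFFFF) % tableSize;
    }
}
```

With this hash, the key "ab" gets the hash code 31 * 97 + 98 = 3105, and in a table of size 101 it lands at index 3105 mod 101 = 75. Any other key whose hash code leaves the remainder 75 collides with it.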

  40. Collision Handling
      • Avoiding collisions:
        • Use a prime as the size of the array:
          • Trying to store keys with hash codes 200, 205, 210, 215, 220, …, 595 in an array of size 100 sends four keys to each used index, i.e. three collisions per index.
          • But an array of size 101 results in no collisions.
        • Choose a good hash function:
          • this is a (mathematical) discipline of its own
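The effect of a prime table size on the slide's example can be checked by counting how many hash codes land on an index that is already taken. This is a small sketch with hypothetical names (`PrimeSizeDemo`, `SlideKeys`, `CountCollisions`); it assumes non-negative hash codes.

```csharp
using System.Collections.Generic;

public static class PrimeSizeDemo
{
    // The hash codes from the slide: 200, 205, 210, ..., 595 (80 values).
    public static List<int> SlideKeys()
    {
        var keys = new List<int>();
        for (int k = 200; k <= 595; k += 5) keys.Add(k);
        return keys;
    }

    // Counts the hash codes whose index was already used by an earlier one.
    public static int CountCollisions(IEnumerable<int> hashCodes, int tableSize)
    {
        bool[] used = new bool[tableSize];
        int collisions = 0;
        foreach (int code in hashCodes)
        {
            int index = code % tableSize; // assumes code >= 0
            if (used[index]) collisions++;
            else used[index] = true;
        }
        return collisions;
    }
}
```

With table size 100 the 80 codes hit only the 20 indices 0, 5, 10, …, 95, so 60 of them collide (three per used index). With table size 101 every code gets its own index, because two codes could only share an index if their difference, a multiple of 5 below 400, were divisible by 101.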

  41. Collision Handling
      • Probing is searching for a nearby free slot in the array.
      • Probing may be:
        • Linear (h(x)+1, +2, +3, +4, …)
        • Quadratic (h(x)+1, +4, +9, +16, …)
        • Double hashing
        • …
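Linear probing can be sketched as a small set of integers stored directly in the array. The class name `LinearProbingSet` is hypothetical, the key itself serves as hash code, and deletion is left out on purpose, since removing an entry from a probe sequence needs extra care (for instance a "deleted" marker).

```csharp
// Hypothetical open-addressing set of ints using linear probing; no deletion, no resizing.
public class LinearProbingSet
{
    private readonly int?[] slots; // null marks a free slot

    public LinearProbingSet(int size) { slots = new int?[size]; }

    // Returns the index where key ends up, or -1 if the table is full.
    public int Insert(int key)
    {
        int start = (key & 0x7FFFFFFF) % slots.Length; // h(x)
        for (int step = 0; step < slots.Length; step++)
        {
            int i = (start + step) % slots.Length;      // h(x), h(x)+1, h(x)+2, ... with wrap-around
            if (slots[i] == null || slots[i] == key)
            {
                slots[i] = key;
                return i;
            }
        }
        return -1;
    }

    // Follows the same probe sequence; a free slot means the key was never inserted.
    public bool Contains(int key)
    {
        int start = (key & 0x7FFFFFFF) % slots.Length;
        for (int step = 0; step < slots.Length; step++)
        {
            int i = (start + step) % slots.Length;
            if (slots[i] == null) return false;
            if (slots[i] == key) return true;
        }
        return false;
    }
}
```

In a table of size 7, the keys 10, 17 and 24 all hash to index 3, so they end up in slots 3, 4 and 5. Such runs of occupied slots are what make linear probing slow down as the table fills.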

  42. Chaining
      • The array doesn’t hold the element itself, but a reference to a collection (a linked list, for instance) of all colliding elements.
      • On search, that list must be traversed.
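Chaining might be sketched like this, using the framework's `LinkedList<T>` for the chains. The class `ChainedHashSet` and its members are hypothetical, the key itself serves as hash code, and the table is never resized.

```csharp
using System.Collections.Generic;

// Hypothetical hash set of ints with chaining: each array slot refers to a list of colliding keys.
public class ChainedHashSet
{
    private readonly LinkedList<int>[] buckets;

    public ChainedHashSet(int size) { buckets = new LinkedList<int>[size]; }

    private int IndexFor(int key) { return (key & 0x7FFFFFFF) % buckets.Length; }

    // Returns false if the key is already present.
    public bool Add(int key)
    {
        int i = IndexFor(key);
        if (buckets[i] == null) buckets[i] = new LinkedList<int>(); // create the chain on first use
        if (buckets[i].Contains(key)) return false;
        buckets[i].AddLast(key);
        return true;
    }

    // Only the chain at the key's index has to be traversed.
    public bool Contains(int key)
    {
        int i = IndexFor(key);
        return buckets[i] != null && buckets[i].Contains(key);
    }

    public int ChainLength(int index)
    {
        return buckets[index] == null ? 0 : buckets[index].Count;
    }
}
```

Unlike probing, a collision here never spills into neighbouring slots; the cost of a lookup is the length of one chain.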

  43. Efficiency of Hashing
      • Worst case (maximum collisions):
        • retrieve, insert, delete all O(n)
      • The average number of collisions depends on the load factor, λ, and not on the table size or on n:
        λ = (number of used entries) / (table size)
      • A common estimate (open addressing such as linear probing): numberOfCollisions_avg ≈ 1/(1 - λ)
      • Example: 75% of the table entries in use:
        • λ = 0.75: 1/(1 - 0.75) = 4 collisions on average (independent of the table size).

  44. When Hashing Is Inefficient
      • Traversing in key order.
      • Finding the smallest/largest key.
      • Range search (find all keys between high and low).
      • Searching on something other than the designated primary key.

  45. .NET 2: System.Collections.Generic
      • ICollection<T>
        • IList<T> (indexable)
          • List<T> (array-based)
        • LinkedList<T>
        • IDictionary<TKey, TValue> ((key, value) pairs)
          • SortedDictionary<TKey, TValue> (balanced search tree)
          • Dictionary<TKey, TValue> (hash table)
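A brief tour of the four concrete classes, as a sketch. The wrapper class `CollectionsTour` and its method names are hypothetical; the collection types and the members used on them (`List<T>` indexer, `LinkedList<T>.AddFirst`/`AddLast`/`First`, `Dictionary<TKey, TValue>.TryGetValue`, `SortedDictionary<TKey, TValue>.Keys`) come from `System.Collections.Generic`.

```csharp
using System.Collections.Generic;

public static class CollectionsTour
{
    // List<T>: array-based, so access by index is direct.
    public static int ThirdElement()
    {
        var list = new List<int> { 10, 20, 30, 40 };
        return list[2];
    }

    // LinkedList<T>: cheap insertion at either end, no access by index.
    public static int FirstAfterAddFirst()
    {
        var linked = new LinkedList<int>();
        linked.AddLast(2);
        linked.AddFirst(1);
        return linked.First.Value;
    }

    // Dictionary<TKey, TValue>: hash table; fast lookup by key, no particular key order.
    public static int LookUp(string name)
    {
        var ages = new Dictionary<string, int>();
        ages["Ann"] = 31;
        ages["Bob"] = 27;
        int age;
        return ages.TryGetValue(name, out age) ? age : -1;
    }

    // SortedDictionary<TKey, TValue>: search tree; enumerating the keys yields them in sorted order.
    public static string SortedKeys()
    {
        var stock = new SortedDictionary<string, int>();
        stock["pear"] = 3;
        stock["apple"] = 1;
        stock["fig"] = 2;
        return string.Join(",", stock.Keys);
    }
}
```

The last method shows the main reason to prefer a tree-based dictionary over a hash table: the keys come back in order without a separate sorting step.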

  46. Learning Goals
      • Read and write (use) specifications.
      • Select and use ADT, i.e. Dictionary.
      • Knowledge of data structures and algorithms: select and use data structure, i.e. SortedDictionary.
      [Figure: Application (class Appl { IDictionary d; … m= new XXXDictionary(); }) uses the interface/specification (i.e. Dictionary), which is realised by an ADT class: Dictionary or SortedDictionary]
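The point of coding the application against the interface can be illustrated with a method that only knows `IDictionary<TKey, TValue>`. The class `WordCounter` and the method `CountInto` are hypothetical names; the caller decides which data structure is used.

```csharp
using System.Collections.Generic;

public static class WordCounter
{
    // Depends only on the IDictionary interface, so any implementation can be passed in.
    public static void CountInto(IDictionary<string, int> counts, IEnumerable<string> words)
    {
        foreach (string w in words)
        {
            int seen;
            counts.TryGetValue(w, out seen); // seen is 0 if w is not in the dictionary yet
            counts[w] = seen + 1;
        }
    }
}
```

Calling `CountInto(new Dictionary<string, int>(), words)` uses a hash table, while `CountInto(new SortedDictionary<string, int>(), words)` uses a search tree; the counting code itself stays the same, and only the choice of data structure (and with it the performance and key order) changes.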

  47. Exercises
      • Consider some of our programmes (Banking, Forest, AndersenAndAsp, for instance).
        • Would it be better to use some other collection instead of List?
        • Try to change the implementation in one or more of your programs so that, for instance, a hash table is used.
      • Implement InsertAt(int index, object element) and RemoveAt(int index) on the linked list.

  48. Time Complexity: Big-"O"
      • Investigation of the use of time and/or space of an algorithm.
      • Normally one looks at
        • the worst case (easier to determine)
        • only growth rates, not exact measures
      • Counts the number of some "basic operations" (a computation, a comparison of two elements etc.).

  49. Big-O notation:
      • The complexity of an algorithm is denoted with "Big-O".
      • O(f(n)): n is the size of the problem (the number of input elements, for instance); f is a function that indicates the efficiency of the algorithm, for instance n (the running time is linear in the problem size).
      • Big-O is asymptotic (only holds for large values of n).
      • Big-O only regards the most significant term.
      • Big-O ignores constants.

  50. Examples

        public int sum(int a, int b)
        {
            int sum;
            sum = a + b;
            return sum;
        }

      What is the basic operation? O(1)

        public int sum(int[] a)
        {
            int sum = 0;
            for (int i = 0; i < a.Length; i++)
                sum = sum + a[i];
            return sum;
        }

      What is the basic operation? O(n)
