
Algorithms and Data Structures



Presentation Transcript


  1. Algorithms and Data Structures Lecture 6

  2. Agenda: • Hash tables • Collisions • Hash functions • Binary heap

  3. Data Structures: hash tables • U – the set of possible keys • K – the set of keys actually used; K is a subset of U • T – the hash table • For a Direct Address Table (DAT), |U|=|T| (|U|=|DAT|) • Building a DAT becomes memory-consuming or even impossible if |U| is very large • If |K| is much smaller than |U|, a significant part of the DAT is unused

  4. Data Structures: hash tables - sample • E.g. U is the set of 16-bit unsigned integers and K = { x | x ∊ U and x < 1024 } • |U|=2^16, |K|=2^10, so 2^16 - 2^10 = 64512 slots are unused; assuming a slot occupies 4 bytes, that is 258048 bytes = 252K of allocated but unused memory
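The arithmetic on this slide can be reproduced with a minimal direct-address table sketch. The class name `DirectAddressTable` and the helper `unused_bytes` are illustrative, not from the lecture; the 4-byte slot size is the slide's own assumption.

```python
# Minimal direct-address table: the key itself is the position in the table.
class DirectAddressTable:
    def __init__(self, universe_size):
        # One slot per possible key, whether or not the key is ever used.
        self.slots = [None] * universe_size

    def insert(self, key, value):
        self.slots[key] = value

    def search(self, key):
        return self.slots[key]

    def delete(self, key):
        self.slots[key] = None


def unused_bytes(universe_size, used_keys, slot_bytes):
    """Memory allocated for slots that never hold a key."""
    return (universe_size - used_keys) * slot_bytes


UNIVERSE_SIZE = 2 ** 16  # |U|: all 16-bit unsigned integers
USED_KEYS = 2 ** 10      # |K|: keys below 1024
SLOT_BYTES = 4           # slot size assumed on the slide

dat = DirectAddressTable(UNIVERSE_SIZE)
for k in range(USED_KEYS):
    dat.insert(k, k * k)  # arbitrary payload, for illustration only

wasted = unused_bytes(UNIVERSE_SIZE, USED_KEYS, SLOT_BYTES)
print(wasted, "bytes =", wasted // 1024, "K unused")
```

Every operation is a single array access, which is why a DAT is fast but only practical when |U| is small.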

  5. Data Structures: hash tables • It is reasonable to constrain the cardinality of T (call it m) to be close to |K|; that is the idea behind a hash table • In a DAT, a key k from U directly determines its position; from the DAT's point of view, k is the position • In a hash table T, the position is determined by a function h(k) • h is a hash function, which computes a position in T from the value of the key k

  6. Data Structures: hash tables • h : U -> { 0, 1, …, m-1 }, so the domain of h is the set of possible keys and its codomain is the set of positions in T • The value h(k) is also called a hash value • A function f : X -> Y is single-valued (one-to-one, injective) if for any x1 ∊ X and x2 ∊ X with x1 ≠ x2 the values f(x1) and f(x2) are not equal; otherwise f is nonsingle-valued

  7. Data Structures: hash tables • If m (the cardinality of T) is less than |U|, h cannot be single-valued (one-to-one): there must exist two different keys k1, k2 with h(k1)=h(k2) • If the hash values computed for two different keys are equal, we say there is a collision in the hash table • In a DAT, |U|=m and h(k)=k is single-valued, so there are no collisions
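A small sketch of how collisions appear as soon as m is smaller than the number of distinct keys. The helper `find_collisions` and the choice h(k) = k mod 8 are assumptions made for illustration only.

```python
from collections import defaultdict


def find_collisions(keys, h):
    """Group keys by hash value and keep only positions hit by 2+ keys."""
    buckets = defaultdict(list)
    for k in keys:
        buckets[h(k)].append(k)
    return {pos: ks for pos, ks in buckets.items() if len(ks) > 1}


m = 8  # table size, smaller than the 10 keys used below


def h(k):
    return k % m  # illustrative hash function


# Ten keys into eight positions: at least two keys must share a position.
collisions = find_collisions(range(10), h)
print(collisions)

# The DAT case: h(k) = k is one-to-one, so no position is shared.
print(find_collisions(range(10), lambda k: k))
```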

  8. Data Structures: hash tables

  9. Data Structures: hash tables

  10. Data Structures: collisions • It is desirable to construct the hash function so that collisions are less probable • A hash function must be deterministic: it must produce the same value on every call with the same input • There are a number of collision resolution methods: chaining, open addressing and others

  11. Data Structures: collisions – chaining • Each slot of the hash table has an associated linked list holding the elements whose keys have the same hash value

  12. Data Structures: collisions – chaining • Let n be the number of elements stored in a hash table T (with chains) and m the cardinality of T (the number of positions in the table) • α = n/m is the load factor of the hash table; for a non-empty table α ∊ [1/m; n] • E.g. the load factor of any DAT is always between 0 and 1, as n cannot exceed m • The load factor of an arbitrary hash table T with chains can take any value between 0 and n

  13. Data Structures: collisions – chaining • Let’s consider the time characteristics of a hash table (with chains) with n elements and m slots • In the worst case (the hash function is constructed improperly) all the keys have the same hash value, so all n elements end up in a single list • The time characteristics of the operations then match those of list operations: search is Θ(n), add is O(1), and delete is O(1) given a reference to the element in a doubly linked list
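Chaining and the worst case can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the names `ChainedHashTable` and `_Node` are hypothetical, the chains are singly linked, and `delete` takes a key (so it has to search the chain first, unlike the O(1) delete-by-reference mentioned above).

```python
class _Node:
    """One element of a chain (singly linked for brevity)."""
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, m, h=None):
        self.m = m                       # number of slots
        self.n = 0                       # number of stored elements
        self.h = h if h is not None else (lambda k: k % m)
        self.slots = [None] * m          # head of each chain

    def insert(self, key, value):
        # Prepend to the chain: O(1). Duplicate keys are not checked.
        pos = self.h(key)
        self.slots[pos] = _Node(key, value, self.slots[pos])
        self.n += 1

    def search(self, key):
        # Walk one chain; cost is proportional to its length.
        node = self.slots[self.h(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None

    def delete(self, key):
        # Delete by key: find the node, then unlink it.
        pos = self.h(key)
        prev, node = None, self.slots[pos]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.slots[pos] = node.next
                else:
                    prev.next = node.next
                self.n -= 1
                return True
            prev, node = node, node.next
        return False

    def load_factor(self):
        return self.n / self.m

    def chain_length(self, pos):
        length, node = 0, self.slots[pos]
        while node is not None:
            length, node = length + 1, node.next
        return length


# Worst case: a badly chosen hash function sends every key to slot 0.
bad = ChainedHashTable(4, h=lambda k: 0)
for k in range(10):
    bad.insert(k, str(k))
print(bad.load_factor(), bad.chain_length(0))
```

With the degenerate hash function the table behaves like a single linked list of all n elements, which is the Θ(n) search described on the slide.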

  14. Data Structures: collisions – chaining • In the average case (the hash function is constructed properly) we assume that hash values are distributed uniformly: each key is equally likely to hash to any of the m slots, independently of the other keys. This is the hypothesis of simple uniform hashing • Let’s consider the search operation; it is useful to distinguish two cases: (a) unsuccessful search (no element with the given key is in the table T) and (b) successful search

  15. Data Structures: collisions – chaining • Theorem 1: Let T be a hash table (with chains) with load factor α, and let the hypothesis of simple uniform hashing hold. Then during an unsuccessful search (1) α elements are visited on average and (2) the average time (including computing the hash value) is Θ(1+α). • (1) By the hypothesis, the given key is equally likely to hash to any position of T, so an unsuccessful search scans one of the m lists to its end. The average length of a list is n/m=α. This proves statement (1).

  16. Data Structures: collisions – chaining • (2) From (1), the average time needed to scan a list of α elements is Θ(α); the time needed to compute the hash value is Θ(1). Thus the average time of an unsuccessful search is Θ(1+α). This proves statement (2). • Theorem 2 (complements Theorem 1): The average time of a successful search in table T is Θ(1+α). • The average search time is the sum of the times needed to find each element of T, divided by the number of elements.

  17. Data Structures: collisions – chaining • Assume new elements are appended to the end of their list, and consider the i-th element added to the table: when it was added the table held i-1 elements, so the expected length of its list was (i-1)/m and the time needed to find it is Θ(1+(i-1)/m) • The average time is the sum of these times divided by the number of elements: • (1/n) ∑ [1+(i-1)/m], i=1, …, n • = (1/n) [n + (1/m) ∑(i-1)] • = 1 + (1/(nm)) ∑(i-1) = 1 + (1/(nm)) · (n-1)n/2 • = 1 + α(n-1)/(2n) = 1 + α/2 – 1/(2m) • = Θ(1+α). This proves the theorem.
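The closed form at the end of this derivation can be checked against the original sum with exact rational arithmetic. The function names below are illustrative; the check only confirms the algebra, not the probabilistic assumptions behind it.

```python
from fractions import Fraction


def average_by_sum(n, m):
    """(1/n) * sum over i = 1..n of [1 + (i - 1)/m], computed exactly."""
    total = sum(1 + Fraction(i - 1, m) for i in range(1, n + 1))
    return Fraction(1, n) * total


def average_closed_form(n, m):
    """1 + alpha/2 - 1/(2m), with alpha = n/m."""
    alpha = Fraction(n, m)
    return 1 + alpha / 2 - Fraction(1, 2 * m)


# Compare both expressions on a few (n, m) pairs.
for n, m in [(1, 1), (8, 4), (10, 3), (100, 17)]:
    print(n, m, average_by_sum(n, m), average_closed_form(n, m))
```

For n = 8 and m = 4 (α = 2) both expressions give 15/8, i.e. 1 + 1 - 1/8.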

  18. Data Structures: collisions – chaining • Assume that m is proportional to n, i.e. n=cm for some constant c. Then n = O(m) (by the definition of O), so α=n/m=O(m)/m=O(1) and O(1+α)=O(1) • Statement: if m grows proportionally to n and the hypothesis of simple uniform hashing holds, the average search time does not depend on n and is O(1). • Other operations: add is O(1), delete is O(1)

  19. Data Structures: hash functions • A good hash function should satisfy the assumption of uniform hashing: for a key k, all m hash values are equally probable, where m is the cardinality of the hash table T • It is usually assumed that the domain of a hash function (the set of keys) is the set of natural numbers • If the keys are not natural numbers, they can usually be converted to that form, even if the keys are strings, etc.

  20. Data Structures: hash functions • E.g. if the keys are two-letter strings, they can be converted either to a natural number or to a pair of natural numbers • “pt” -> pair <112, 116>, where 112 and 116 are the ASCII codes of “p” and “t” respectively • “pt” -> <14452>, 14452 = 112*128 + 116 in the base-128 system (the ASCII code of a standard character is always less than 128) • If a string may contain non-standard characters, we have to use the base-256 system: 112*256+116=28788
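The conversion on this slide generalizes to strings of any length by treating the character codes as digits in the chosen base. The function name `string_to_key` is an assumption for illustration.

```python
def string_to_key(s, radix=128):
    """Interpret the character codes of s as digits of a base-`radix` number."""
    key = 0
    for ch in s:
        code = ord(ch)
        if code >= radix:
            # The character does not fit into one digit of this base.
            raise ValueError(f"character {ch!r} needs a radix above {code}")
        key = key * radix + code
    return key


print(string_to_key("pt"))             # base 128, standard ASCII only
print(string_to_key("pt", radix=256))  # base 256, extended characters allowed
```

For “pt” this reproduces the slide's values, 14452 in base 128 and 28788 in base 256.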

  21. Data Structures: hash functions - construction • Division method: the hash value of a key k is the remainder of dividing k by m (the cardinality of T); h(k) = k mod m • E.g. m=12, k=100, h(100)=4 • To get a good hash function (satisfying the assumption of uniform hashing), m must be chosen carefully depending on the set of keys • Counter-example: U={ x | x = 2^n, n – natural number }, U={1, 2, 4, 8, 16, 32, …}; let m = 32 = 2^5 • The function h(k) = k mod 32 does not provide uniform hashing: for every key k >= 32, h(k)=0
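The division method and the counter-example can be tried directly. The comparison value m = 37 is an illustrative choice that is not in the slides; it only shows that a different m spreads the same keys out.

```python
def h_division(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


print(h_division(100, 12))

# Keys that are powers of two: 1, 2, 4, ..., 512.
powers_of_two = [2 ** n for n in range(10)]

# With m = 32 = 2^5 every key >= 32 lands in slot 0.
hashes_m32 = [h_division(k, 32) for k in powers_of_two]
print(hashes_m32)

# With m = 37 (illustrative, not from the slides) the same keys do not collide.
hashes_m37 = [h_division(k, 37) for k in powers_of_two]
print(hashes_m37)
```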

  22. Data Structures: hash functions - construction • Multiplication method: the hash value of a key k is h(k) = lmax[ m ( kA mod 1 ) ], where A is a constant, 0<A<1, and kA mod 1 is the fractional part of kA • lmax(x) is the floor function: it returns the largest integer that is less than or equal to x (x may be any non-negative number) • The method is less dependent on the chosen value of m
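A minimal sketch of the multiplication method, with `math.floor` playing the role of lmax. The default A = (√5 - 1)/2 ≈ 0.618 is a commonly suggested constant and an assumption here; the slide only requires 0 < A < 1. Floating-point arithmetic is used for brevity, so very large keys may lose precision.

```python
import math

# Assumed default constant; any value with 0 < A < 1 satisfies the slide.
A_DEFAULT = (math.sqrt(5) - 1) / 2


def h_multiplication(k, m, a=A_DEFAULT):
    """Multiplication method: floor(m * frac(k * a))."""
    fractional = (k * a) % 1.0       # k*A mod 1, a value in [0, 1)
    return math.floor(m * fractional)


m = 2 ** 14  # with this method m may be a power of two
print([h_multiplication(k, m) for k in (1, 2, 4, 8, 16, 32, 64)])
```

Unlike k mod 32 in the division-method counter-example, power-of-two keys are not all mapped to one slot here, which illustrates why the choice of m matters less.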

  23. Data Structures: hash table - sample

  24. Data Structures: hash table - sample

  25. Data Structures: hash table - sample

  26. Data Structures: hash table - sample

  27. Data Structures: hash table - sample

  28. Data Structures: hash table - sample

  29. Data Structures: hash table - sample

  30. Data Structures: binary heap • A binary heap is an array of elements that can be viewed as a binary tree by the following rules: • The 1st element of the array is the root of the tree • If a node has index j, its left and right children (if any) have indexes 2j and 2j+1 respectively, and its parent (if any) has index equal to the integer part of j/2 • The heap need not occupy the whole array • The size of the heap is less than or equal to the size of the array it occupies

  31. Data Structures: binary heap • The main property of the heap: every child element is less than or equal to its parent
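The index rules and the heap property can be sketched with 1-based indexes, as on the slides. Since Python lists are 0-based, this sketch leaves position 0 unused; that convention and the function names are assumptions for illustration.

```python
def parent(j):
    return j // 2      # integer part of j/2


def left(j):
    return 2 * j


def right(j):
    return 2 * j + 1


def is_max_heap(a, heap_size):
    """Check the heap property on a[1..heap_size]; a[0] is unused padding."""
    return all(a[parent(j)] >= a[j] for j in range(2, heap_size + 1))


# a[0] is padding so that the root sits at index 1.
a = [None, 16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
print(is_max_heap(a, 10))

# The heap may occupy only a prefix of the array: here the first 3 elements.
b = [None, 5, 3, 4, 99, 100]
print(is_max_heap(b, 3), is_max_heap(b, 5))
```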

  32. Data Structures: binary heap-sample

  33. Data Structures: binary heap-sample

  34. Data Structures: binary heap-sample

  35. Data Structures: binary heap-sample

  36. Data Structures: binary heap-sample

  37. Data Structures: binary heap-sample

  38. Data Structures: binary heap-sample

  39. Data Structures: binary heap-sample

  40. Data Structures: binary heap-sample

  41. Data Structures: binary heap • Heapify is O(log n) • Left, Right and Parent are O(1) • Buildheap is O(n) • Heapsort is O(n log n)
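The operations named on this slide can be sketched as follows, assuming a max-heap stored with 1-based indexes (position 0 of the Python list is unused). The slide gives only the names and complexities, so the bodies below are one conventional way to write them, not necessarily the lecture's version.

```python
def heapify(a, j, heap_size):
    """Sift a[j] down until the subtree rooted at j is a max-heap: O(log n)."""
    while True:
        l, r = 2 * j, 2 * j + 1
        largest = j
        if l <= heap_size and a[l] > a[largest]:
            largest = l
        if r <= heap_size and a[r] > a[largest]:
            largest = r
        if largest == j:
            return
        a[j], a[largest] = a[largest], a[j]
        j = largest


def build_heap(a, heap_size):
    """Turn a[1..heap_size] into a max-heap: O(n) in total."""
    for j in range(heap_size // 2, 0, -1):   # leaves are already heaps
        heapify(a, j, heap_size)


def heapsort(values):
    """Return the values in ascending order: O(n log n)."""
    a = [None] + list(values)                # pad so the root is a[1]
    n = len(values)
    build_heap(a, n)
    for end in range(n, 1, -1):
        a[1], a[end] = a[end], a[1]          # move the current maximum to its place
        heapify(a, 1, end - 1)               # restore the heap on the shorter prefix
    return a[1:]


print(heapsort([4, 1, 3, 2, 16, 9, 10, 14, 8, 7]))
```

During the sort the heap shrinks while the array stays the same size, which is the situation described earlier where the heap does not occupy the whole array.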

  42. Q&A
