210 likes | 371 Views
The Set ADT List and Tree Implementations. CSCI 2720 Spring 2007 Eileen Kraemer. The Set ADT. We assume: Members drawn from a single universe Sets are finite; universe may be infinite No duplicate members
E N D
The Set ADTList and Tree Implementations CSCI 2720 Spring 2007 Eileen Kraemer
The Set ADT • We assume: • Members drawn from a single universe • Sets are finite; universe may be infinite • No duplicate members • But some representations can be used to represent multisets (which may contain duplicate members) • Note: ADT = abstract data type • Interesting because “is x in S?” is asked in many applications.
The Set ADT • Member(x,S) • If x in S return true else return false • Union(S,T) • return S T, the set consisting of every x in S or T or both • Intersection(S,T) • return S T, the set consisting of every x in both S and T • Difference (S,T) • return S – T, the set of all x in S that are not in T • MakeEmptySet() • Return the empty set
The Set ADT • IsEmptySet(S) • return true if S is the empty set, else return false • Size(S) • return |S| (“cardinality of S”), the number of elements in the set S • Insert(x,S) • set S to S {x} ; no effect if x is already in S • Delete(x,S) • set S to S – {x}; no effect if x not in S • Equal(S,T) • Return true if S == T (if S and T have the same members) • Iterate(S,F) • perform operation F on each member of set S (in no particular order)
Additional set operations …. • Apply only if members are linearly ordered • Min(S) – return smallest member of set S • Max(S) – return largest member of set S • Applies only if members are of form <K,I> • Where K is a key drawn from some key space and I is some associated info • Lookup(K,S) • Given a key value K, return an Info I such that <K,I> is a member of S; if no such member exists return • If a <K,I> is found : successful search • If is returned : unsuccessful search
The Dictionary ADT • A set ADT with the operations: • MakeEmptySet • IsEmptySet • Insert • Delete • Lookup
Unordered Lists • Simplest representation of sets • May use contiguous memory, singly linked, doubly linked • Applies to sets drawn from any universe • Keys may be ordered or unordered
Unordered Lists • Lookup • Simple sequential search; • At each step, compare two keys for equality • If dictionary contains n elements, then Lookup is Θ(n) in the worst case if the lists are unordered • Insert = Lookup + insertion cost • Delete = Lookup + deletion cost
Time to perform Lookup • Count the number of comparisons: • How many (K == K’) operations are performed? • Linked list: • Worst case for successful search: • K is last key in list: n comparisons • Worst case for unsuccessful search: • Compare to all keys in list: n comparisons • Seems “okay” only if list is short
Expected-cost analysis • Here, we consider the frequency and cost of each lookup and compute a weighted sum: the expected cost
Assume:Uniform distribution of access to keys • Expected cost for linked lists: • = average number of comparisons to lookup a key • = (1+2+3+…+n)/n • = (n+1)/2 • Thus: • Expected cost for a lookup in linked list is Θ(n)
Assume:non-uniform access to keys • Closer to reality: relatively few keys account for most of the LookUps • Let the keys in the dictionary be K1, …, Kn, • ordered by frequency of access • (K1 accessed most frequently) • Let pi be frequency of access to key Ki • p1 + p2 + … + pn = 1.0 • For successful searches: • Expected cost = (p1 *1)+ (p2 *2)+ … + (pn *n) • For unsuccessful searches: • Cost is always θ(n)
What is the optimal ordering? • A list kept in frequency order: • p1 >= p2 >= … >= pn • If the frequencies are sufficiently skewed, this can be much less than (n+1)/2 • Example: • p1 = ½, p2 = ¼, p3= ⅛, …etc. • Expected cost is : • (1* ½ + 2* ¼ + 3* ⅛+ …..n* 2-n+1)/n • less than 2 • If p=4, then cost = (1* ½ + 2* ¼ + 3* ⅛+4*⅛) = 15/32
But you usually don’t know the frequency of access in advance … • However, you can adjust the order of the list as the Lookups occur: • Move-to-front Heuristic: • Transpose Heuristic:
Move-to-front Heuristic: • After each successful search, move the item that was sought to the front of the list • Cost is no more than twice the optimum • (proof in text) • Example: {A, B, C, D} • Search on C: cost=3, adjust to {C, A, B, D} • Search on D: cost=4, adjust to {D, C, A, B} • Search on C: cost=2, adjust to {C, D, A, B} • Search on D: cost=1, adjust to {D, C, A, B}
Transpose Heuristic: • If the item sought is not the first in the list, move it one position forward by exchanging it with the item just before it • Performance (after steady state reached) is even better than Move-to-Front (but harder to prove) • Example: {A, B, C, D} • Search on C: cost=3, adjust to {A, C, B, D} • Search on D: cost=4, adjust to {A, C, D, B} • Search on C: cost=2, adjust to {C, A, D, B} • Search on A, cost=2, adjust to {A, C, D, B}
Ordered Lists • If the keys have some linear order, can maintain list in order by key value • Benefits: • Unsuccessful search can stop when key of greater value is encountered: expected number of comparisons reduced to n/2
Ordered Lists • If representation is tabular (contigous memory, “array-based”), can employ binary search • Binary search (worst case) • between 1 and floor(lg n) + 1 comparisons in a successful search of a table of size n, • either floor(lg n) or floor(lg n) + 1 comparisons in an unsuccessful search • Floor(lg n) + 1 exactly, if n is one less than a power of 2
BinarySearchLookup Function BinarySearchLookUp(key K, table T[0..n-1]):info {return information stored with key K in T, or if K is not in T} Left = 0 Right = n-1 Repeat forever if Right < Left then return else Middle = floor((Left+Right)/2) if (K == Key(T[Middle])) then return Info(T[Middle]) else if K < Key(T[Middle])then Right <= Middle -1 else Left = Middle + 1
Binary Search Example • Choose an array of 16 values (have class provide values) • Work through example on board.
Interpolation Search function InterpolationSearchLookup(key K, table T[0..n-1]): info //return info stored with K in ordered table T, or if K not present //table positions T[-1] and T[n] are assumed to be available to the alg Key(T[-1]) = -1 Key(T[n]) = N Left = 0 Right = n-1 repeat forever if Right < Left return else p = (K – Key(T[Left-1])) / (Key(T[Right+1]) – Key(T[Left-1])) middle = floor(p * (right-left+1)) + left if K == Key(T[middle]) return Info(T[middle]) else if K < Key(T[Middle]) then Right = Middle -1 else Left = Middle + 1