Efficient and Effective Practical Algorithms for the Set-Covering Problem
Qi Yang, Jamie McPeek, Adam Nofsinger
Department of Computer Science and Software Engineering
University of Wisconsin at Platteville

The Set-Covering Problem
• Given N sets, let X be the union of all the sets. A cover of X is a group of sets from the N sets such that every element of X belongs to a set in the group.
• The set-covering problem is to find a cover of X of the minimum size.

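The definition of a cover can be checked mechanically. The following is a minimal sketch, assuming the sets are stored in a Python dict from set name to a set of elements; the function name `is_cover` and the data in `EXAMPLE_SETS` are hypothetical and are not taken from the slides.

```python
from typing import Dict, Iterable, Set

# Hypothetical data for illustration only (N = 4 sets, M = 6 elements).
EXAMPLE_SETS: Dict[str, Set[str]] = {
    "S1": {"b", "c", "e"},
    "S2": {"b", "d"},
    "S3": {"a", "d", "f"},
    "S4": {"c", "f"},
}


def is_cover(sets: Dict[str, Set[str]], group: Iterable[str]) -> bool:
    """Return True if the sets named in `group` cover X, the union of all sets."""
    universe = set().union(*sets.values())  # X
    covered: Set[str] = set()
    for name in group:
        covered |= sets[name]
    return covered == universe
```

With this assumed data, `is_cover(EXAMPLE_SETS, ["S1", "S3"])` holds, while `["S1", "S2", "S4"]` fails because element `a` is left uncovered.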
Matrix Representation of the Set-Covering Problem
Number of sets: N = 4
Number of elements: M = 6
One cover: S1, S3, S4
One minimal cover: S1, S3
Not a cover: S1, S2, S4 (a is not covered)

NP-Hard Problem
• Introduction to Algorithms by T. H. Cormen, C. E. Leiserson, and R. L. Rivest
• The set-covering problem has been proved to be NP-hard
• A greedy algorithm

Algorithm Greedy
ResultCover: the cover to be found (an approximation of the minimum cover).
Uncovered: the set of elements not covered yet.
1. Set ResultCover to the empty set
2. Set Uncovered to the union of all sets
3. While Uncovered is not empty
• select a set S that is not in ResultCover and covers the most elements of Uncovered
• add S to ResultCover
• remove all elements of S from Uncovered

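The three steps of Algorithm Greedy can be sketched directly with Python sets. This is an illustrative version that favors readability over the bitmap implementation discussed later; the function name `greedy_cover` and the dict-of-sets input format are assumptions.

```python
from typing import Dict, List, Set


def greedy_cover(sets: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover: repeatedly pick the set covering the most uncovered elements."""
    result_cover: List[str] = []                # step 1
    uncovered = set().union(*sets.values())     # step 2
    while uncovered:                            # step 3
        # Among sets not yet chosen, take the one with the largest overlap
        # with Uncovered (ties go to the first set in dict order).
        best = max(
            (name for name in sets if name not in result_cover),
            key=lambda name: len(sets[name] & uncovered),
        )
        result_cover.append(best)
        uncovered -= sets[best]
    return result_cover
```

Because every element of the union belongs to at least one set, each iteration removes at least one element from `uncovered`, so the loop terminates.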
Algorithm Check And Remove (CAR)
• Identifying Redundant Search Engines in a Very Large Scale Metasearch Engine Context
• 8th ACM International Workshop on Web Information and Data Management
• The set-covering problem is equivalent to the problem of identifying redundant search engines on the Web
• Algorithm CAR is much faster than Algorithm Greedy

Algorithm CAR (Check And Remove)
1. Set ResultCover to the empty set
2. For each set S
• determine if S has an element that is not covered by ResultCover
• add S to ResultCover if S has such an element
• exit the for loop if ResultCover is a cover of X
3. For each set S in ResultCover
• determine if S has an element that is not covered by any other set of ResultCover
• remove S from ResultCover if S has no such element

Example
Set           ResultCover     UnCovered
(initial)     {}              {a, b, c, d, e, f}
S1            {S1}            {a, d, f}
S2            {S1, S2}        {a, f}
S3            {S1, S2, S3}    {}
Removing S2   {S1, S3}        {}

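The check phase and the remove phase of Algorithm CAR can be sketched as follows. This is a plain-set illustration, not the bitmap implementation, and it does not attempt to match the stated O(M * N) bound. The function name `car_cover` is an assumption, and the contents of S1 to S4 in the usage note are hypothetical values chosen to be consistent with the trace above, since the original matrix is not reproduced here.

```python
from typing import Dict, List, Set


def car_cover(sets: Dict[str, Set[str]]) -> List[str]:
    """Check And Remove: add every set that contributes something, then drop redundant ones."""
    universe = set().union(*sets.values())
    result_cover: List[str] = []
    covered: Set[str] = set()

    # Check phase: keep a set only if it has an element not yet covered.
    for name, members in sets.items():
        if not members <= covered:
            result_cover.append(name)
            covered |= members
        if covered == universe:
            break

    # Remove phase: drop a set if the other sets in the cover already
    # contain all of its elements.
    for name in list(result_cover):
        others: Set[str] = set()
        for other in result_cover:
            if other != name:
                others |= sets[other]
        if sets[name] <= others:
            result_cover.remove(name)
    return result_cover
```

With the hypothetical sets S1 = {b, c, e}, S2 = {b, d}, S3 = {a, d, f}, S4 = {c, f}, the check phase yields S1, S2, S3 and the remove phase drops S2, matching the trace.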
Time Complexity
Algorithm Greedy: O(M * N * min(M, N))
Algorithm CAR: O(M * N)
N: number of sets
M: number of elements of the union X

CPU Time
[Figure: CPU time (sec) versus actual cover size for Algorithm Greedy and Algorithm CAR]

Implementation Details
• Read data
  Binary search tree
  BitMap indicating which sets cover an element
• Convert the tree to an array of BitMaps
  Matrix representation of the set-cover problem
• Find a cover

Binary Search Tree and BitMap
Number of sets (N) is known
Number of elements of each set is known
The total number of elements is unknown
Reading elements of one set at a time
BitMap (one per element)
  size N
  which sets cover the element
  a column of the matrix

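The reading step can be sketched as below. Two substitutions are made for brevity and should be read as assumptions: a Python dict stands in for the binary search tree, and a Python integer stands in for the BitMap of size N (bit i is set when set i covers the element). The function name `build_column_bitmaps` is hypothetical.

```python
from typing import Dict, Hashable, Iterable, Sequence


def build_column_bitmaps(sets_in_order: Sequence[Iterable[Hashable]]) -> Dict[Hashable, int]:
    """Read one set at a time and record, per element, which sets cover it."""
    columns: Dict[Hashable, int] = {}  # element -> bitmap over set indices
    for i, members in enumerate(sets_in_order):
        for element in members:
            # Look the element up (inserting it if new) and set bit i.
            columns[element] = columns.get(element, 0) | (1 << i)
    return columns
```

The total number of elements does not need to be known in advance: it is simply the number of keys once all sets have been read.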
Array of Column BitMaps
[Figure: array of column BitMaps, one per element e1, e2, e3, e4, ..., em-1, em]
• Row Operations
• Find the number of elements in a set that are not covered by the result cover
• Determine if a set contains an element that is not covered by the result cover
• Determine if a set in the result cover has an element that is not covered by any other set in the result cover
• …

Array of Row BitMaps
It takes some time to convert column BitMaps to row BitMaps, but all row operations are performed within a row BitMap.

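The conversion and two of the row operations can be sketched with Python integers as bitmaps. The names `columns_to_rows`, `uncovered_count`, and `has_uncovered` are hypothetical, and sorting the elements to fix a bit order is an assumption made only for reproducibility.

```python
from typing import Dict, Hashable, List, Tuple


def columns_to_rows(columns: Dict[Hashable, int], n_sets: int) -> Tuple[list, List[int]]:
    """Transpose column bitmaps (one per element) into row bitmaps (one per set)."""
    elements = sorted(columns)      # element j corresponds to bit j of every row
    rows = [0] * n_sets
    for j, element in enumerate(elements):
        column = columns[element]
        for i in range(n_sets):
            if (column >> i) & 1:   # set i covers element j
                rows[i] |= 1 << j
    return elements, rows


def uncovered_count(row: int, covered: int) -> int:
    """Number of elements of a set (row) not covered by the result cover."""
    return bin(row & ~covered).count("1")


def has_uncovered(row: int, covered: int) -> bool:
    """True if the set (row) contains an element not covered by the result cover."""
    return (row & ~covered) != 0
```

Here `covered` is the bitwise OR of the row bitmaps of the sets already in the result cover, so each row operation reduces to a few bitwise operations on one row.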
CPU Time The CPU time includes the time to convert column BitMaps to row BitMaps, but not the time to build the tree.
Algorithm Greedy
1. Set ResultCover to the empty set
2. Set Uncovered to the union of all sets
3. While Uncovered is not empty
• select a set S that is not in ResultCover and covers the most elements of Uncovered
• add S to ResultCover
• remove all elements of S from Uncovered

Algorithm Greedy Update
UncoveredCount: number of elements of a set not covered by ResultCover
1. Set ResultCover to the empty set
2. Set Uncovered to the union of all sets
3. For each set, set the UncoveredCount to the size of the set
4. While Uncovered is not empty
• select a set that has the largest value of UncoveredCount among all sets not in ResultCover
• add the set to ResultCover
• remove all elements of the set from Uncovered
• update the value of UncoveredCount for each set not in ResultCover

Update Uncovered Count
For each element in the set to be added to the ResultCover
    If the result cover does not cover it
        For each set not in the result cover
            If the set contains the element
                uncovered count is decremented by one

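Algorithm Greedy Update, including the count update above, can be sketched as follows. This illustration uses Python sets and a per-element list of containing sets rather than BitMaps; the function name `greedy_update_cover` and the helper mapping `containing` are assumptions.

```python
from typing import Dict, List, Set


def greedy_update_cover(sets: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover that maintains UncoveredCount instead of recomputing it."""
    uncovered = set().union(*sets.values())
    # Column view: for each element, the names of the sets that contain it.
    containing: Dict[str, List[str]] = {}
    for name, members in sets.items():
        for element in members:
            containing.setdefault(element, []).append(name)
    # UncoveredCount starts as the size of each set.
    uncovered_count = {name: len(members) for name, members in sets.items()}

    result_cover: List[str] = []
    chosen: Set[str] = set()
    while uncovered:
        best = max(
            (name for name in sets if name not in chosen),
            key=lambda name: uncovered_count[name],
        )
        result_cover.append(best)
        chosen.add(best)
        for element in sets[best]:
            if element in uncovered:          # not covered before adding `best`
                uncovered.discard(element)
                for other in containing[element]:
                    if other not in chosen:   # only sets outside the result cover
                        uncovered_count[other] -= 1
    return result_cover
```

Each element is removed from `uncovered` exactly once, so the total work spent on count updates is bounded by the number of (set, element) memberships.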
Time Complexity
Algorithm Greedy: O(M * N * min(M, N))
Algorithm CAR: O(M * N)
Algorithm Greedy Update: O(M * N)

Algorithm List And Remove (LAR)
• Implements the matrix using linked lists instead of an array of BitMaps
• Algorithm Greedy Update plus the remove phase from Algorithm CAR

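The slides do not give the linked-list details, so the following only sketches the remove phase that LAR borrows from CAR, applied to a cover produced by any earlier phase. It uses a per-element counter instead of linked lists, which is a substitution made here for brevity; the function name `remove_redundant` is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Set


def remove_redundant(sets: Dict[str, Set[Hashable]], cover: List[str]) -> List[str]:
    """Drop every set whose elements are all covered by other sets in the cover."""
    # times_covered[e] = number of sets currently in the cover that contain e.
    times_covered: Counter = Counter()
    for name in cover:
        times_covered.update(sets[name])

    kept: List[str] = []
    for name in cover:
        if all(times_covered[e] >= 2 for e in sets[name]):
            # Redundant: every element is also covered by another set.
            for e in sets[name]:
                times_covered[e] -= 1
        else:
            kept.append(name)
    return kept
```

Under this sketch, LAR would correspond to running the Greedy Update selection first and then passing its result through `remove_redundant`.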
Summary
• Algorithm LAR runs faster than Algorithm CAR
• Algorithm LAR generates smaller cover sets than Algorithm CAR
• Algorithm: updating vs. searching every time
• Data structure: linked list vs. array of BitMaps