Connecting Discrete Structures to the “Real World”: Using Market Basket Analysis (and Gray Codes) to Integrate and Motivate Topics in Discrete Structures. Michael R. Wick and Paul J. Wagner, Department of Computer Science, University of Wisconsin - Eau Claire, Eau Claire, WI 54701
Road Map • Introduction • Our Discrete Structures Course • Application: Market Basket Analysis • The Apriori Algorithm • Set Theory • Dynamic Programming • Algorithm Analysis • Application: Binary Reflected Gray Codes • Applications • Recursion • Algorithm Analysis • Divide-and-Conquer • Dynamic Programming • Summary • Contact Information
Introduction • Perceived disconnect with Discrete Structures • Rest of curriculum • Application to “real world” • Particularly problematic in applied programs • We claim this course for our own • Replaced similar course in Mathematics • Retained rigor • Infused applications and algorithmics
Our Discrete Structures Course • Topics • Logic • Expert Systems, Algorithm Correctness Proof • Proof Techniques • Recursion • Gray Codes • Divide and Conquer • Dynamic Programming • Sets & Relations • Market-Basket Analysis • compareTo and equals implementations • Functions • Algorithm Analysis • Combinatorics/Probability • Expert Systems • Matrices • Graphics/Transmission Errors • Graphs and Trees • Shortest Path, Iterative Deepening, Huffman Coding
Application: Market-Basket Analysis • Sets are a powerful way to describe the application • Market Basket Analysis: the use of association techniques to find groups of items that tend to occur together in transactions • frequent item sets • sets of items whose number of occurrences meets some minimum threshold (called the minimum support) • example: {a,b,c,d} occurs 12 times (min. support = 10) • association rules • a,b,c → d iff support({a,b,c,d}) / support({a,b,c}) ≥ r (called minimum confidence) • a,b → c,d iff support({a,b,c,d}) / support({a,b}) ≥ r • how many such rules are there? • Suggestive Sell • When the client selects the antecedent items, suggest that they select the consequent items
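The support and confidence definitions on this slide, and the "how many such rules are there?" question, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the four toy baskets are assumptions made up for the example. Splitting an item set of size n into a non-empty antecedent and a non-empty consequent gives 2^n − 2 candidate rules.

```python
from itertools import combinations

def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if set(items) <= t)

def confidence(transactions, antecedent, consequent):
    """support(antecedent ∪ consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def candidate_rules(itemset):
    """Every split of `itemset` into a non-empty antecedent and consequent."""
    items = sorted(itemset)
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules

# Hypothetical toy baskets, invented for this illustration only.
baskets = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b"}, {"c", "d"}]

conf_abc_d = confidence(baskets, "abc", "d")   # support(abcd) / support(abc)
rule_count = len(candidate_rules("abcd"))      # 2**4 - 2 splits
```

With these baskets, support({a,b,c}) is 2 and support({a,b,c,d}) is 1, so the rule a,b,c → d has confidence 0.5 and would pass only if r ≤ 0.5.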
Application: Market-Basket Analysis • Apriori Algorithm (1997) • Principles • Every subset of a frequent item set must be frequent • Every frequent item set of cardinality n+1 must have at least two frequent item sets of cardinality n as subsets • The intersection of these two subsets must have a cardinality of n-1 • We can build every possible frequent item set of size n+1 from the union of frequent item sets of size n.
Application: Market-Basket Analysis • Apriori Algorithm (1997) • Example: minSupport = 2
I = {Table Saw, Router, Kreg Jig, Sander, Drill Press}
T = { {Table Saw, Router, Drill Press}, {Router, Sander}, {Router, Kreg Jig}, {Table Saw, Router, Sander}, {Table Saw, Kreg Jig}, {Router, Kreg Jig}, {Table Saw, Kreg Jig}, {Table Saw, Router, Kreg Jig, Drill Press}, {Table Saw, Router, Kreg Jig} }
L1 = { {T}, {R}, {K}, {S}, {D} }
L2 = { {R,T}, {K,T}, {D,T}, {K,R}, {R,S}, {D,R} }
L3 = { {K,R,T}, {D,R,T} }
L4 = ∅ (the only candidate, {D,K,R,T}, has the infrequent subset {D,K,R})
Rules = ????
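Application: Market-Basket Analysis • Apriori Algorithm (1997)
Let $I = \{a, b, c, \ldots\}$ be the set of all items in the domain.
Let $T = \{\, S \mid S \subseteq I \,\}$ be a bag of all transaction records of item sets.
Let $\mathrm{support}(S) = |\{\, A \mid A \in T \wedge S \subseteq A \,\}|$.
Let $L_1 = \{\, \{a\} \mid a \in I \wedge \mathrm{support}(\{a\}) \geq \mathit{minSupport} \,\}$.
$\forall k\, (k > 1 \wedge L_{k-1} \neq \emptyset)$, let $L_k = \{\, S_i \cup S_j \mid (S_i \in L_{k-1}) \wedge (S_j \in L_{k-1}) \wedge (|S_i - S_j| = 1) \wedge (|S_j - S_i| = 1) \wedge (\forall S\,[((S \subset S_i \cup S_j) \wedge (|S| = k-1)) \rightarrow S \in L_{k-1}]) \wedge (\mathrm{support}(S_i \cup S_j) \geq \mathit{minSupport}) \,\}$.
The set of all frequent item sets is given by $L = \bigcup_k L_k$, and the set of all association rules is given by $R = \{\, A \rightarrow C \mid (F \in L) \wedge (A \subset F) \wedge (C = F - A) \wedge (A \neq \emptyset) \wedge (C \neq \emptyset) \wedge \mathrm{support}(F) / \mathrm{support}(A) \geq \mathit{minConfidence} \,\}$.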
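The level-wise definition of L_k translates fairly directly into code. The sketch below is one possible reading of it rather than the authors' implementation: the names `apriori`, `support`, and `TRANSACTIONS` are assumptions, and the data are the nine woodworking transactions from the example slide, abbreviated to T, R, K, S, D as that slide does.

```python
from itertools import combinations

def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)

def apriori(transactions, min_support):
    """Return [L1, L2, ...], the frequent item sets grouped by cardinality."""
    all_items = set().union(*transactions)
    level = {frozenset({a}) for a in all_items
             if support(transactions, frozenset({a})) >= min_support}
    levels = []
    k = 1
    while level:
        levels.append(level)
        k += 1
        # Join step: union two (k-1)-sets that differ in exactly one item each.
        candidates = {si | sj for si, sj in combinations(level, 2)
                      if len(si - sj) == 1 and len(sj - si) == 1}
        # Prune step: every (k-1)-subset must be frequent, then check support.
        level = {c for c in candidates
                 if all(frozenset(s) in level for s in combinations(c, k - 1))
                 and support(transactions, c) >= min_support}
    return levels

# T = Table Saw, R = Router, K = Kreg Jig, S = Sander, D = Drill Press
TRANSACTIONS = [frozenset(t) for t in
                ("TRD", "RS", "RK", "TRS", "TK", "RK", "TK", "TRKD", "TRK")]

LEVELS = apriori(TRANSACTIONS, min_support=2)
```

The subset check runs before the support count so that candidates with an infrequent subset are discarded without scanning the transactions, which is where the algorithm saves work.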
Application: Market-Basket Analysis • Dynamic Programming Approach • Want proof of principle of optimality and overlapping subproblems • Principle of Optimality • The optimal solution to L_k includes the optimal solution of L_{k-1} • Proof by contradiction • Overlapping Subproblems • Lemma: every subset of a frequent item set is a frequent item set • Proof by contradiction
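Application: Market-Basket Analysis • Rule Generation Algorithm
Let $L = \bigcup_k L_k$.
Let $T = \{\, S \mid S \subseteq I \,\}$ be the set of all transactions.
Let $\langle A, C \rangle$ be an association rule with antecedent $A$ and consequent $C$.
Let $\mathrm{confid}(\langle A, C \rangle) = |\{\, B \mid B \in T \wedge (A \cup C) \subseteq B \,\}| \,/\, |\{\, B \mid B \in T \wedge A \subseteq B \,\}|$.
Let $R_1 = \{\, \langle F - \{a\}, \{a\} \rangle \mid F \in L \wedge a \in F \wedge \mathrm{confid}(\langle F - \{a\}, \{a\} \rangle) \geq \mathit{min\_confid} \,\}$
and $\forall k\, [\, (k > 1) \wedge (R_{k-1} \neq \emptyset) \rightarrow R_k = \{\, \langle A, C_i \cup C_j \rangle \mid (\langle A, C_i \rangle \in R_{k-1}) \wedge (\langle A, C_j \rangle \in R_{k-1}) \wedge (|C_i - C_j| = 1 \wedge |C_j - C_i| = 1) \wedge (\forall S\,[((S \subset C_i \cup C_j) \wedge (|S| = k-1)) \rightarrow \langle A, S \rangle \in R_{k-1}]) \wedge (\mathrm{confid}(\langle A, C_i \cup C_j \rangle) \geq \mathit{min\_confid}) \,\} \,]$;
then $R = \bigcup_k R_k$ is the set of all confident association rules.
Given as a homework problem on sets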
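One way to read the R_1 / R_k construction is as the same level-wise pattern used for frequent item sets, applied to consequents while the antecedent stays fixed. The sketch below follows that reading under stated assumptions: the function names are hypothetical, the frequent item sets are the L2 and L3 sets from the woodworking example, and rules with an empty antecedent are skipped by starting from item sets of at least two items.

```python
from itertools import combinations

def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)

def confid(transactions, antecedent, consequent):
    """Fraction of transactions containing A that also contain C."""
    return (support(transactions, antecedent | consequent)
            / support(transactions, antecedent))

def generate_rules(transactions, frequent, min_confid):
    """Return all confident rules as (antecedent, consequent) pairs."""
    # R_1: single-item consequents taken from each frequent item set.
    level = {(f - {a}, frozenset({a}))
             for f in frequent if len(f) >= 2 for a in f
             if confid(transactions, f - {a}, frozenset({a})) >= min_confid}
    rules = set(level)
    k = 1
    while level:
        k += 1
        nxt = set()
        for (a1, c1), (a2, c2) in combinations(level, 2):
            if a1 != a2 or len(c1 - c2) != 1 or len(c2 - c1) != 1:
                continue
            merged = c1 | c2
            # Every (k-1)-subset of the merged consequent must already be
            # a confident rule with the same antecedent.
            if (all((a1, frozenset(s)) in level
                    for s in combinations(merged, k - 1))
                    and confid(transactions, a1, merged) >= min_confid):
                nxt.add((a1, merged))
        rules |= nxt
        level = nxt
    return rules

# T = Table Saw, R = Router, K = Kreg Jig, S = Sander, D = Drill Press
TRANSACTIONS = [frozenset(t) for t in
                ("TRD", "RS", "RK", "TRS", "TK", "RK", "TK", "TRKD", "TRK")]
FREQUENT = [frozenset(f) for f in
            ("RT", "KT", "DT", "KR", "RS", "DR", "KRT", "DRT")]

RULES = generate_rules(TRANSACTIONS, FREQUENT, min_confid=1.0)
```

With a minimum confidence of 1.0, the single-item rules D → T and D → R merge into D → {R, T}, because both transactions that contain the drill press also contain the router and the table saw.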
Application: Binary Reflected Gray Codes • Formal Definition: • A binary reflected Gray code is a one-to-one function mapping the integers 0 ≤ i ≤ 2^n − 1 to n-bit binary numbers so that every two consecutive binary numbers differ in exactly one bit. • Origin • Used by Émile Baudot in telegraphy in 1878. • Used by Frank Gray in a 1953 patent for a pulse-code modulation tube • Prevented large noise spikes when vacuum tube counters incremented • Example (n = 3): 000, 001, 011, 010, 110, 111, 101, 100
Application: Binary Reflected Gray Codes • Appears in a curiously large number of applications • Towers of Hanoi • Robotic Arm Angle measurement • Hamiltonian Circuits • …
Application: Binary Reflected Gray Codes • Why is it called “Binary Reflected”? • Binary is obvious • Strings are drawn from the alphabet of 0s and 1s • Reflected is less obvious • Each half of the code sequence is built from a reflected copy of the other half • [Figure: visual representation]
Application: Binary Reflected Gray Codes • A Simple Recursive Definition • Let G(k,n) represent the k-th code in the n-bit binary reflected Gray code sequence • Computed in Θ(n) time (for n bits) • For a single Gray code value, this is optimal • Typically, however, we desire the entire code sequence
Application: Binary Reflected Gray Codes • A Naïve Implementation • To generate the entire sequence, call G(i,n) with i going from 0 to k-1. • A priori Analysis • Each invocation of G requires Θ(n) time • G is invoked k times • k is equal to 2^n • Therefore, Θ(n·2^n) time and Θ(2^n) space • Optimal is Θ(2^n) time and space
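The formula for G(k,n) did not survive on the slide, so the sketch below uses the standard reflected construction as an assumption: codes in the first half take a leading "0", and codes in the second half take a leading "1" followed by the mirrored position in the (n-1)-bit sequence. The function names are hypothetical.

```python
def G(k, n):
    """k-th code (as a bit string) of the n-bit binary reflected Gray code."""
    if n == 1:
        return str(k)                      # k is 0 or 1 here
    half = 2 ** (n - 1)
    if k < half:
        return "0" + G(k, n - 1)           # first half: prefix 0
    return "1" + G(2 ** n - 1 - k, n - 1)  # second half: prefix 1, mirrored

def naive_sequence(n):
    """Entire n-bit sequence, built by calling G once per position."""
    return [G(i, n) for i in range(2 ** n)]

three_bit = naive_sequence(3)
```

Each call to G recurses n levels deep, and `naive_sequence` makes 2^n such calls, which is where the n·2^n term in the analysis comes from.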
Application: Binary Reflected Gray Codes • What is the source of the inefficiency? • Repeated work.
Application: Binary Reflected Gray Codes • A Dynamic Programming Approach
Application: Binary Reflected Gray Codes • Naïve Dynamic Programming Implementation • Requirement • We must generate and store the entire (n-1)-bit Gray code sequence prior to starting the n-bit Gray code sequence • Approach • Use two-dimensional matrix to store previously calculated Gray code sequences
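A minimal sketch of the two-dimensional table described above, assuming string codes and one row per bit width (the function name and row layout are illustrative choices, not taken from the slides). Each row is derived entirely from the row before it, so no code is ever recomputed.

```python
def gray_table(n):
    """Rows 1..n of Gray code sequences; row b holds the b-bit sequence."""
    table = [["0", "1"]]                   # the 1-bit sequence
    for bits in range(2, n + 1):
        prev = table[-1]
        # First half: previous sequence with a leading 0.
        # Second half: previous sequence reversed, with a leading 1.
        row = ["0" + c for c in prev] + ["1" + c for c in reversed(prev)]
        table.append(row)
    return table

table3 = gray_table(3)
stored = sum(len(row) for row in table3)   # 2 + 4 + 8 entries
```

The table keeps every intermediate sequence, so the number of stored entries is 2 + 4 + … + 2^n = 2^(n+1) − 2.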
Application: Binary Reflected Gray Codes • Analysis • Time: Θ(2^(n+1)) • Space: Θ(2^(n+1))
Application: Binary Reflected Gray Codes • Notice the classic time/space trade-off • Naïve Iterative • Time: Θ(n·2^n) • Space: Θ(2^n) • Naïve Dynamic Programming • Time: Θ(2^(n+1)) • Space: Θ(2^(n+1)) • What are the sources of the remaining inefficiencies? • Time: Spends too much time copying values • First half of the n-bit sequence is a copy (plus a leading “0”) of the (n-1)-bit sequence • Space: Only require the previous Gray code sequence, not all previous sequences • Time/Space trade-off is just a rule of thumb
Application: Binary Reflected Gray Codes • Improved Approach • Use integers rather than strings to represent codes • Binary representation of the integer is equivalent to the string version • Requires only 1 bit of storage per bit of code • Reuse the (n-1)-bit sequence directly as the first half of the n-bit sequence • Most-significant bit is already correct, since the integer representation implicitly contains leading zeros • To set the leading one of the second half, just add 2^(n-1)
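The improved approach might look like the sketch below, which assumes a single preallocated array of integers (the function name is hypothetical). The first half of each sequence is already in place from the previous pass; only the second half is written, by walking the first half backwards and adding 2^(bits-1).

```python
def gray_codes(n):
    """n-bit binary reflected Gray code sequence as a list of integers."""
    codes = [0] * (2 ** n)
    size = 1                               # the 0-bit sequence is just [0]
    for bits in range(1, n + 1):
        high = 1 << (bits - 1)             # 2^(bits-1): the new leading one
        for i in range(size):
            # Mirror the first half into the second half, setting the top bit.
            codes[size + i] = codes[size - 1 - i] + high
        size *= 2
    return codes

codes3 = gray_codes(3)
as_strings = [format(c, "03b") for c in codes3]
```

Each array slot is written once after initialization and nothing is copied, so the work and storage are both proportional to the 2^n codes produced. As a cross-check, the result agrees with the well-known closed form i XOR (i >> 1), which is not part of the slides.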
Application: Binary Reflected Gray Codes • Analysis • Produces and stores • Time and Space
Summary • Revised Discrete Structures Course • Explicit connection to curriculum • Infusion of “real-world” applications • Applications allow infusion of • Dynamic Programming • Divide-and-Conquer • Set Theory • Algorithm Analysis • Recursion • Proof Techniques • Logic
Contact Information Michael R. Wick (wickmr@uwec.edu) Paul J. Wagner (wagnerpj@uwec.edu) Department of Computer Science University of Wisconsin – Eau Claire Eau Claire, WI 54701 www.cs.uwec.edu