High Frequent Value Reduct in Very Large Databases Tsau Young Lin San Jose State University, USA Jianchao Han California State University Dominguez Hills, USA RSFDGrC-2007
Agenda
• Introduction
• Decision table reduction review
• Our method
• An example
• Conclusion
Introduction
• Rough Set
  • Finding decision rules in decision tables
• Reduction
  • Row (Horizontal) reduction
    • merging duplicate rows
  • Column (Vertical, or Attribute) reduction
    • finding important attributes
  • Value reduction
    • simplifying decision rules
Finding Value Reduct
Row Reduction: Step 1
• An equivalence relation can be defined by RESULT: ID-i ~ ID-j iff ID-i.RESULT = ID-j.RESULT
• It partitions the transactions into three decision classes:
  • DECISION1 = {ID-1, ID-2, ID-3, ID-4, ID-5, ID-6, ID-7, ID-8, ID-9} (RESULT = 1)
  • DECISION2 = {ID-10, ID-11, ID-12, ID-13, ID-14} (RESULT = 2)
  • DECISION3 = {ID-15, ID-16, ID-17, ID-18} (RESULT = 3)
Row Reduction: Step 2
• For the conditional attributes {TEST, LOW, HIGH, CASE, NEW}, we have the following condition classes:
  • CONDITION1 = {ID-1, ID-2};
  • CONDITION2 = {ID-3};
  • CONDITION3 = {ID-4, …, ID-9};
  • CONDITION4 = {ID-10};
  • CONDITION5 = {ID-11, …, ID-14};
  • CONDITION6 = {ID-15};
  • CONDITION7 = {ID-16, ID-17, ID-18}.
Row Reduction: Step 3
• Decision rules
  • R1: CONDITION1 → DECISION1;
  • R2: CONDITION2 → DECISION1;
  • R3: CONDITION3 → DECISION1;
  • R4: CONDITION4 → DECISION2;
  • R5: CONDITION5 → DECISION2;
  • R6: CONDITION6 → DECISION3;
  • R7: CONDITION7 → DECISION3.
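The three row-reduction steps can be sketched in a few lines of Python. The table below is a hypothetical toy table, not the 18-row table from the slides, and the names `ROWS`, `partition`, and `decision_rules` are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy decision table (NOT the table used in the slides).
CONDITIONS = ("TEST", "LOW")
ROWS = [
    {"ID": 1, "TEST": 1, "LOW": 0, "RESULT": 1},
    {"ID": 2, "TEST": 1, "LOW": 0, "RESULT": 1},
    {"ID": 3, "TEST": 0, "LOW": 1, "RESULT": 1},
    {"ID": 4, "TEST": 0, "LOW": 0, "RESULT": 2},
    {"ID": 5, "TEST": 0, "LOW": 0, "RESULT": 2},
]


def partition(rows, attributes):
    """Group row IDs into equivalence classes by their values on `attributes`."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in attributes)
        classes[key].append(row["ID"])
    return dict(classes)


def decision_rules(rows, conditions, decision):
    """Emit one rule per condition class; assumes the table is consistent."""
    rules = {}
    for row in rows:
        key = tuple(row[a] for a in conditions)
        rules.setdefault(key, row[decision])
    return rules


# Step 1: decision classes; Step 2: condition classes; Step 3: rules.
decision_classes = partition(ROWS, ("RESULT",))
condition_classes = partition(ROWS, CONDITIONS)
rules = decision_rules(ROWS, CONDITIONS, "RESULT")
```

Duplicate rows (IDs 1 and 2, IDs 4 and 5) collapse into a single condition class, which is the merging that row reduction refers to.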
Attribute Reduction
• Finding attribute reducts
• Two minimal attribute reducts:
  • {TEST, LOW, HIGH, CASE}
  • {TEST, LOW, HIGH, NEW}.
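A brute-force way to find minimal attribute reducts is to test attribute subsets in order of increasing size and keep those that still determine the decision. The sketch below assumes the full table is consistent and uses a hypothetical four-row table in which column C duplicates column B. It is not the slides' table, and the function names are illustrative.

```python
from itertools import combinations

# Hypothetical toy table: C carries the same information as B.
CONDITIONS = ("A", "B", "C")
ROWS = [
    {"A": 1, "B": 0, "C": 0, "D": 1},
    {"A": 1, "B": 1, "C": 1, "D": 2},
    {"A": 0, "B": 0, "C": 0, "D": 2},
    {"A": 0, "B": 1, "C": 1, "D": 3},
]


def is_consistent(rows, attrs, decision):
    """True if equal values on `attrs` always imply an equal decision."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, row[decision]) != row[decision]:
            return False
    return True


def attribute_reducts(rows, conditions, decision):
    """Return all minimal attribute subsets that keep the table consistent."""
    reducts = []
    for k in range(1, len(conditions) + 1):
        for attrs in combinations(conditions, k):
            # Skip supersets of an already found reduct: they are not minimal.
            if any(set(r) <= set(attrs) for r in reducts):
                continue
            if is_consistent(rows, attrs, decision):
                reducts.append(attrs)
    return reducts


reducts = attribute_reducts(ROWS, CONDITIONS, "D")
```

As in the slides, the toy table has two minimal reducts that differ only in their last attribute. The search is exponential in the number of attributes, which is the cost the proposed method aims to avoid.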
Value Reduction
• Finding a value reduct for each rule
• [Rule]attribute denotes the equivalence class of the rule with respect to that attribute, i.e. the set of rules that share its value on the attribute
• Consider Rule 1
  • [R1]TEST = {R1, R2, R5, R7};
  • [R1]LOW = {R1, R7};
  • [R1]HIGH = {R1, R5, R6};
  • [R1]CASE = {R1, R2, R4, R5, R6, R7}
Value Reduct
• Find the family
  • F = {[R1]TEST, [R1]LOW, [R1]HIGH, [R1]CASE}, ∩F = {R1}
• A value reduct is a minimal subfamily of F whose intersection equals ∩F; here it is {[R1]LOW, [R1]HIGH}, since [R1]LOW ∩ [R1]HIGH = ∩F = {R1}
• Choosing the other attribute reduct, {TEST, LOW, HIGH, NEW}, yields different rules
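The value reduct of R1 can be checked mechanically. The sketch below copies the four equivalence classes of R1 from the slides and searches for the minimal subfamilies whose intersection equals the intersection of the whole family. The function names are illustrative assumptions.

```python
from itertools import combinations

# Equivalence classes of rule R1 per attribute, as listed in the slides.
CLASSES = {
    "TEST": {"R1", "R2", "R5", "R7"},
    "LOW": {"R1", "R7"},
    "HIGH": {"R1", "R5", "R6"},
    "CASE": {"R1", "R2", "R4", "R5", "R6", "R7"},
}


def intersect(classes, attrs):
    """Intersect the equivalence classes of the given attributes."""
    return set.intersection(*(classes[a] for a in attrs))


def value_reducts(classes):
    """Return (intersection of F, minimal subfamilies with that intersection)."""
    target = intersect(classes, list(classes))
    reducts = []
    for k in range(1, len(classes) + 1):
        for attrs in combinations(classes, k):
            # A superset of a known reduct cannot be minimal.
            if any(set(r) <= set(attrs) for r in reducts):
                continue
            if intersect(classes, attrs) == target:
                reducts.append(attrs)
    return target, reducts


target, reducts = value_reducts(CLASSES)
```

For R1 the only minimal subfamily is the pair LOW and HIGH, so the simplified rule needs only the LOW and HIGH values of R1.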
Our Method
• Finds value reducts without first finding an attribute reduct
  • Avoids the cost of computing and selecting an attribute reduct
  • Does not miss the rules that are lost when a single attribute reduct is selected
• Forms rules from frequent rows
  • Originates from association rule mining
  • Easy to implement in a DBMS
Algorithm: Finding all decision rules
Input: a decision (relational) table T with condition attribute set C and decision attribute d; a minimum support threshold s
Output: RB, a set of decision rules
  RB ← ∅
  For k = 1 to |C| Do
    RBk ← ∅
    For each subset A of C of size k Do
      TA ← the subset of T restricted to the columns in A (together with d)
      Remove all inconsistent and insufficiently supported tuples from TA
      For each remaining tuple r in TA Do
        If r is not covered by RB Then RBk ← RBk ∪ {r}
    If RBk = ∅ Then Return RB
    Else RB ← RB ∪ RBk
  Return RB
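The level-wise search can be sketched in Python as follows. This is one reading of the pseudocode, not the authors' implementation: "covered" is taken to mean that some rule already in the rule base has conditions that are a subset of the candidate's conditions, and the four-row table is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy table (NOT the table used in the slides).
ROWS = [
    {"A": 1, "B": 0, "D": 1},
    {"A": 1, "B": 1, "D": 1},
    {"A": 0, "B": 0, "D": 2},
    {"A": 0, "B": 1, "D": 3},
]


def covered(candidate, rule_base):
    """Assumed meaning: an existing rule's conditions all hold in `candidate`."""
    return any(
        all(candidate.get(a) == v for a, v in cond.items())
        for cond, _, _ in rule_base
    )


def find_rules(rows, conditions, decision, min_support):
    """Return rules as (conditions dict, decision value, support) triples."""
    rule_base = []
    for k in range(1, len(conditions) + 1):
        level = []
        for attrs in combinations(conditions, k):
            # Project onto attrs and count support per decision value.
            groups = defaultdict(lambda: defaultdict(int))
            for row in rows:
                key = tuple(row[a] for a in attrs)
                groups[key][row[decision]] += 1
            for key, decisions in groups.items():
                if len(decisions) != 1:
                    continue  # inconsistent: same conditions, different decisions
                d, support = next(iter(decisions.items()))
                if support < min_support:
                    continue  # insufficient support
                candidate = dict(zip(attrs, key))
                if not covered(candidate, rule_base):
                    level.append((candidate, d, support))
        if not level:
            return rule_base  # early exit, as in the pseudocode
        rule_base.extend(level)
    return rule_base


rules_s1 = find_rules(ROWS, ("A", "B"), "D", min_support=1)
rules_s2 = find_rules(ROWS, ("A", "B"), "D", min_support=2)
```

With s = 1 the one-attribute rule A = 1 → D = 1 is found first and then suppresses the two longer candidates it covers. With s = 2 the second level is empty, so the search stops early.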
Implementation in SQL
• Create a subset of T from a given subset A of C and remove all inconsistent and insufficiently supported tuples
• Assume A = {A1, A2, …, Ap}; then the following SQL statement works:
  CREATE VIEW TA AS
  SELECT A1, A2, …, Ap, MIN(d) AS d, SUM(support) AS support
  FROM (SELECT A1, A2, …, Ap, d, COUNT(*) AS support
        FROM T
        GROUP BY A1, A2, …, Ap, d) AS g
  GROUP BY A1, A2, …, Ap
  HAVING COUNT(*) = 1 AND SUM(support) >= s
• HAVING COUNT(*) = 1 keeps only the value combinations that map to a single decision value, so MIN(d) is that value
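The query can be tried out with Python's built-in sqlite3 module. The sketch below runs it as a plain SELECT rather than a view, on a hypothetical table T with two condition columns. The helper name `consistent_frequent` is an assumption, and the column names are interpolated into the SQL text, so they must come from a trusted source.

```python
import sqlite3

# Hypothetical toy table T (NOT the table used in the slides).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A1 INTEGER, A2 INTEGER, d INTEGER)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, 0, 1), (1, 1, 1), (0, 0, 2), (0, 1, 3)],
)


def consistent_frequent(conn, attrs, s):
    """Return (attrs..., d, support) for consistent tuples with support >= s."""
    cols = ", ".join(attrs)  # trusted column names only
    sql = f"""
        SELECT {cols}, MIN(d) AS d, SUM(support) AS support
        FROM (SELECT {cols}, d, COUNT(*) AS support
              FROM T
              GROUP BY {cols}, d) AS g
        GROUP BY {cols}
        HAVING COUNT(*) = 1 AND SUM(support) >= ?
        ORDER BY {cols}
    """
    return conn.execute(sql, (s,)).fetchall()


by_a1 = consistent_frequent(conn, ["A1"], 1)
by_a2 = consistent_frequent(conn, ["A2"], 1)
by_both = consistent_frequent(conn, ["A1", "A2"], 1)
```

Here A1 = 0 and both values of A2 map to more than one decision, so they are filtered out by the HAVING clause. The ORDER BY is added only to make the output order predictable.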
An Example
• The previous decision table, with support threshold s = 1
• Loop 1: finding frequent 2-itemsets
  • Two consistent rules
An Example (continued)
• Finding frequent 3-itemsets: three consistent rules
• Finding frequent 4-itemsets: three consistent rules
An Example (continued)
• No 5-itemset is consistent and not already covered by RB
• The output RB is the union of the above tables:
2-item rules
  • R1'': CASE = 3 → RESULT = 1 with support = 6
  • R2'': NEW = 2 → RESULT = 1 with support = 6
3-item rules
  • R3'': TEST = 0, HIGH = 0 → RESULT = 3 with support = 1
  • R4'': LOW = 0, HIGH = 0 → RESULT = 1 with support = 2
  • R5'': LOW = 0, HIGH = 1 → RESULT = 3 with support = 3
4-item rules
  • R6'': TEST = 1, LOW = 1, HIGH = 1 → RESULT = 1 with support = 1
  • R7'': TEST = 1, LOW = 1, HIGH = 0 → RESULT = 2 with support = 4
  • R8'': TEST = 0, HIGH = 1, CASE = 2 → RESULT = 2 with support = 1
  • R9'': TEST = 0, HIGH = 1, NEW = 1 → RESULT = 2 with support = 1
Conclusion
• Reviewed various approaches to reducing decision tables to form decision rules
• Presented a new method that finds decision rules directly through value reduction
• Discussed the implementation of the algorithm in SQL
• Demonstrated the method with an example