Extracting Decisional Correlation Rules Alain Casali Christian Ernst
Industrial Problem • Given a supply chain (in micro-electronics), we want to find links between the values of some parameters and the values of a specific attribute of the supply chain (the yield). • Positive (and/or negative) association rules are not suitable in our context. • We use correlation tests because: • correlation is a statistically more significant measure; • the measure takes into account not only the presence but also the absence of the items; • the measure is non-directional, and can thus highlight more complex links than a “simple” implication. Dexa'09 - Extracting Decision Correlation Rules
Outline • Preliminaries • Decision Correlation Rules • Contingency Vectors • LHS-χ2 algorithm • Experimental Analysis • Conclusion
Literal Set • A literal set $X\overline{Y}$ is composed of: • a positive part ($X$); • a negative part ($\overline{Y}$). • The variation of a literal set $XY$ encompasses all the combinations of positive and negative items that can be obtained from $XY$. Ex: $Var(AB) = \{AB, A\overline{B}, \overline{A}B, \overline{A}\,\overline{B}\}$ • The support of a literal set is the number of transactions that contain its positive part and contain no 1-item of its negative part.
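The definitions of variation and support above can be illustrated with a minimal Python sketch. The transaction list and the helper names `support` and `variation` are hypothetical and chosen only for illustration; they do not come from the talk.

```python
from itertools import combinations

def support(transactions, positive, negative):
    """Count transactions containing every positive item and no negative item."""
    return sum(1 for t in transactions
               if positive <= t and not (negative & t))

def variation(items):
    """All (positive part, negative part) splits of an itemset."""
    items = frozenset(items)
    result = []
    for size in range(len(items), -1, -1):
        for pos in combinations(sorted(items), size):
            result.append((frozenset(pos), items - frozenset(pos)))
    return result

# Hypothetical toy relation: each transaction is a set of items.
transactions = [frozenset(t) for t in ("AB", "A", "B", "AB", "C", "ABC")]

for pos, neg in variation("AB"):
    print(sorted(pos), sorted(neg), support(transactions, pos, neg))
```

Because every transaction matches exactly one literal set of the variation, the supports printed for Var(AB) add up to the number of transactions.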
Correlation rule and χ2 (1) • Contingency table: each cell of the contingency table (CT) of a pattern X contains the support of one literal set $Y\overline{Z}$ belonging to the variation of X • Expected value
Correlation rule and χ2 (2) • Computation of χ2 (Brin’97): makes the link between the real support and the theoretical support (expected value) • Correlation rate: use of a table giving the percentile values with a single degree of freedom (existence of a bijection). Ex: Correlation(BF) ≈ 85% ⇒ χ2(BF) ≈ 1.67
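As a hedged illustration of the χ2 computation, the sketch below assumes the usual independence model: the expected value of a cell is the number of transactions multiplied by the marginal presence or absence frequency of each item. The function name `chi_squared` and the two tables are hypothetical and are not the example from the slides.

```python
from itertools import product

def chi_squared(table):
    """table maps a tuple of 0/1 flags (one flag per item) to an observed support."""
    n = sum(table.values())
    k = len(next(iter(table)))
    # Marginal frequency of each item being present.
    p = [sum(c for cell, c in table.items() if cell[i]) / n for i in range(k)]
    chi2 = 0.0
    for cell in product((0, 1), repeat=k):
        expected = n
        for i, bit in enumerate(cell):
            expected *= p[i] if bit else 1 - p[i]
        observed = table.get(cell, 0)
        # Assumes expected > 0, i.e. no item is present in all or in no transaction.
        chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical 2-item tables: (1, 1) means both items present, (0, 0) neither.
correlated = {(1, 1): 20, (1, 0): 10, (0, 1): 10, (0, 0): 20}
independent = {(1, 1): 25, (1, 0): 25, (0, 1): 25, (0, 0): 25}
print(chi_squared(correlated), chi_squared(independent))
```

In the first table every expected value is 15, so the statistic is 4 × 25 / 15; in the second the observed and expected supports coincide and the statistic is 0.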
Related Constraints • Anti-monotone constraint (Cochran criteria): • no cell of the CT may have a null value; • at least p% of the CT’s cells must have a support greater than or equal to MinSup. • Monotone constraint: • X symbolizes a valid correlation rule: χ2(X) ≥ MinCor
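The two constraints can be written as simple predicates over the cells of a contingency table. This is a sketch under the assumption that p is given as a percentage between 0 and 100; the function names are hypothetical.

```python
def satisfies_cochran(cells, min_sup, p):
    """Anti-monotone test: no null cell, and at least p% of cells reach min_sup."""
    if any(c == 0 for c in cells):
        return False
    large_enough = sum(1 for c in cells if c >= min_sup)
    return large_enough >= p / 100 * len(cells)

def is_valid_rule(chi2, min_cor):
    """Monotone test: the pattern is a valid correlation rule."""
    return chi2 >= min_cor

print(satisfies_cochran([5, 1, 1, 5], min_sup=2, p=50))  # half of the cells reach 2
print(satisfies_cochran([5, 0, 1, 5], min_sup=1, p=50))  # one null cell
```

Since the first test is anti-monotone, a pattern that fails it can be discarded together with all of its supersets.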
Browsing the search space • Levelwise algorithms are used to browse the search space; • Levelwise algorithms are well suited when: • the relation is on disk; • we have anti-monotone constraints. • Problem: memory requirement for the contingency tables. Example with |I| = 1000
Lectic Order & Lectic Search (LS) • Goal: enumerate the combinations (powerset lattice) with a balanced tree • Start point: 2 vectors; the 1st one is empty, the 2nd one contains the list of the items • Create 2 branches: • left: prune the last element of the 2nd vector (recursive call); • right: add the last element of the 2nd vector to the 1st one (recursive call) • Stop: when the 2nd vector is empty, output the 1st vector • Example tree nodes, written (1st vector, 2nd vector): (∅, ABC), (C, AB), (∅, AB), (∅, A), (B, A), (∅, ∅), (A, ∅), (B, ∅), (AB, ∅) DEXA - Sept. 2006
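The recursion described on this slide can be sketched in a few lines of Python. The function name `lectic_search` is hypothetical, and the sketch assumes that the right branch also removes the moved element from the 2nd vector, as the node (C, AB) suggests.

```python
def lectic_search(items):
    """Enumerate all subsets of `items` with the two-vector recursion."""
    output = []

    def recurse(first, second):
        if not second:                      # stop: the 2nd vector is empty
            output.append("".join(first))
            return
        last, rest = second[-1], second[:-1]
        recurse(first, rest)                # left: prune the last element
        recurse((last,) + first, rest)      # right: move it to the 1st vector

    recurse((), tuple(items))
    return output

print(lectic_search("ABC"))
```

For the items A, B, C this visits the nodes (∅, ABC), (∅, AB), (∅, A), and so on, and emits each of the 8 subsets exactly once.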
Decision Correlation Rules • We are interested in rules satisfying both of the following constraints: • χ2(X) ≥ MinCor • X contains 1 value of the target attribute • Problem: there is no function f such that χ2(X ∪ A) = f(χ2(X), supp(A))
Contingency Vector (1) • Equivalence class associated with a literal set: $[Y\overline{Z}] = \{i \in Tid(r) \mid Y \subseteq Tid(i) \text{ and } Z \cap Tid(i) = \emptyset\}$. Ex: $[B\overline{F}] = \{3\}$ • Contingency vector of a pattern X: the set of equivalence classes of the literal sets in the variation of X. Ex: $CV(BF) = \{[\overline{B}\,\overline{F}], [\overline{B}F], [B\overline{F}], [BF]\} = \{\{8\}, \{4\}, \{3\}, \{1,2,5,6,7,9,10\}\}$
Contingency Vector (2) • The contingency vector is a partition of the Tids • Recurrence relation: $CV(X \cup A) = (CV(X) \cap [\overline{A}]) \cup (CV(X) \cap [A])$ • In practice: additions in binary logic
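The recurrence can be sketched with Python sets: every class of CV(X) is split into the Tids that do not contain the new item and those that do. The Tid lists for B and F below are assumptions, chosen so that the result matches the CV(BF) example with its classes ordered from "neither item" to "both items"; the helper name `extend` is hypothetical.

```python
def extend(cv, tids_with_item):
    """Split each class of a contingency vector on the presence of one item."""
    new_cv = []
    for cls in cv:
        new_cv.append(cls - tids_with_item)   # class ∩ [not A]
        new_cv.append(cls & tids_with_item)   # class ∩ [A]
    return new_cv

all_tids = set(range(1, 11))
tids_B = {1, 2, 3, 5, 6, 7, 9, 10}   # assumed Tids of the transactions containing B
tids_F = {1, 2, 4, 5, 6, 7, 9, 10}   # assumed Tids of the transactions containing F

cv_B = extend([all_tids], tids_B)
cv_BF = extend(cv_B, tids_F)
contingency_table = [len(cls) for cls in cv_BF]
print(cv_BF)
print(contingency_table)
```

The cardinalities of the classes are the cells of the contingency table, so no additional pass over the relation is needed. The bitwise "in practice" variant mentioned on the slide is not reproduced here.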
Contingency Vector (3) • Computation of the contingency table
LHS-χ2 Algorithm • Modification of LS in order to include the contingency vectors • At each node: • Call to the left branch: nothing to do; • Before calling the right branch: • Computation of the new contingency vector; • Test of the anti-monotone constraints; • [Add the current pattern to the positive border] • Test of the monotone constraints; • Computation of the χ2 • If all tests are OK, then output the pattern and its χ2
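The steps above can be combined into a compact sketch. It is a simplified reading of the slide and not the authors' implementation: contingency vectors are lists of Python sets, the positive border is omitted, the expected values assume independence of the items, and all names and the toy Tid lists are hypothetical.

```python
def extend(cv, tids_with_item):
    """Split each class on the presence of the new item."""
    out = []
    for cls in cv:
        out.append(cls - tids_with_item)
        out.append(cls & tids_with_item)
    return out

def cochran_ok(counts, min_sup, p):
    """Anti-monotone test: no null cell, at least p% of cells reach min_sup."""
    return (all(c > 0 for c in counts)
            and sum(c >= min_sup for c in counts) >= p / 100 * len(counts))

def chi2_of(counts):
    """Chi-squared of a table with 2**k cells, indexed by presence bits."""
    n = sum(counts)
    k = len(counts).bit_length() - 1
    probs = [sum(c for i, c in enumerate(counts) if (i >> b) & 1) / n
             for b in range(k)]
    total = 0.0
    for i, observed in enumerate(counts):
        expected = n
        for b in range(k):
            expected *= probs[b] if (i >> b) & 1 else 1 - probs[b]
        total += (observed - expected) ** 2 / expected
    return total

def lhs_chi2(tidlists, all_tids, targets, min_sup, p, min_cor):
    rules = []

    def search(pattern, cv, remaining):
        if not remaining:
            return
        last, rest = remaining[-1], remaining[:-1]
        search(pattern, cv, rest)                  # left branch: nothing to compute
        new_cv = extend(cv, tidlists[last])        # new contingency vector
        counts = [len(cls) for cls in new_cv]
        if not cochran_ok(counts, min_sup, p):     # prune this whole subtree
            return
        new_pattern = (last,) + pattern
        has_one_target = sum(i in targets for i in new_pattern) == 1
        if len(new_pattern) > 1 and has_one_target:
            chi2 = chi2_of(counts)
            if chi2 >= min_cor:                    # monotone constraint
                rules.append((new_pattern, chi2))
        search(new_pattern, new_cv, rest)          # right branch

    search((), [set(all_tids)], tuple(sorted(tidlists)))
    return rules

# Hypothetical relation with 12 transactions; "T" plays the role of a target value.
tidlists = {"A": {1, 2, 3, 4, 5, 7}, "B": {1, 3, 5, 7, 9, 11}, "T": {1, 2, 3, 4, 5, 6}}
rules = lhs_chi2(tidlists, range(1, 13), targets={"T"}, min_sup=1, p=100, min_cor=3.84)
print(rules)
```

On this toy relation only the pattern AT is reported: BT is uncorrelated, AB contains no target value, and ABT is pruned because two cells of its contingency table are empty.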
Memory Requirements • What is the storage requirement? • Contingency vectors of the 1-items: |I|*|r| bits • Current contingency vectors (including the previous ones, kept because of the recursive calls): |I|*|I|*|r| bits in theory; |I|*|r| bytes in practice, since patterns never exceed a length of 8 • Finally we need: |r|*(|I|+|I|/8) bytes; this result has to be compared with
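The final bound is simple arithmetic and can be checked with a short sketch. The function name and the relation size of 10,000 transactions are assumptions made for illustration; only |I| = 1000 appears in the talk.

```python
def lhs_memory_bytes(n_items, n_transactions):
    """Evaluate |r| * (|I| + |I| / 8) bytes."""
    # 1-item contingency vectors: |I| * |r| bits, rounded up to whole bytes.
    one_item_vectors = -(-(n_items * n_transactions) // 8)
    # Current vectors kept by the recursion: |I| * |r| bytes in practice.
    current_vectors = n_items * n_transactions
    return one_item_vectors + current_vectors

# |I| = 1000 as in the talk's example; |r| = 10,000 is a hypothetical size.
print(lhs_memory_bytes(1000, 10_000))
```

With these values the two terms are 1,250,000 and 10,000,000 bytes, about 11 MB in total.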
Experimental Analysis (1) • Experiments were run on a PC with a 1.8 GHz processor and 2 GB of RAM • Files were provided by 2 manufacturers (STMicroelectronics and ATMEL)
Experimental Analysis (2)
Conclusion • We have discovered new parameters having an influence on the yield (more than 25% of them were not previously known); • Response times are 30 to 70% better with LHS-χ2 than with a levelwise algorithm; • Perspectives: • Use of a “divide and conquer” strategy for better performance; • “Cleaning” / transformation of the original data; • Generalization of the rules by integrating literal sets.