A. Casali (1), C. Ernst (2), F. Gasnier (3), J. Stephan (2)
(1) Université de la Méditerranée / LIF
(2) École des Mines de St Étienne / CMP-GC
(3) STMicroElectronics Rousset

A complete data transformation, mining and interpretation model for correlation detection within data measurements
Extracting correlated sets using the chi-squared measure within n-ary relations: an implementation

Motivations
The field of APC (Advanced Process Control) aims at highlighting correlations between production parameters. This study focuses on the device analysis of the principal trajectories impacting the yield. The goal is to detect correlations between data measurements structured as n-ary relations and involving at least one target attribute. The method uses a levelwise data mining algorithm based on both the chi-squared and the support measures.

Conclusions
This approach makes it possible for STMicroElectronics Rousset to highlight unknown correlations between various parameters, validated by electrical and/or physical analysis. While the proposed mining method confirmed that levelwise algorithms do not provide results beyond four search levels, it proved its value for n-ary relations with a very large number of numerical attributes. The study aims at supporting the development of effective R2R (run-to-run) control loops.

Methodology: a KDD approach
Raw (Excel) data measurement files go through five steps, SELECTION, PREPROCESSING, TRANSFORMATION, DATA MINING and INTERPRETATION, ending in knowledge generation. The last step, INTERPRETATION, turns the retrieved patterns into a report:
• Item decoding
• Presentation (processing) of correlations

The first two steps are:

SELECTION: Raw (Excel) Data Measurement Files → Selected File
• The raw files have a vast number of numerical attributes (and often incomplete data)

PREPROCESSING: Selected File → Preprocessed File
• Attribute removal. Criteria: attributes
  - with too few distinct values
  - having too many null values
  - duplicating another attribute (one copy is kept)
  - with too small a standard deviation
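The four attribute-removal criteria of the PREPROCESSING step can be sketched as follows. This is only an illustration, not the authors' implementation: the function name `select_attributes`, the column-dictionary layout and all three thresholds are assumptions, since the poster gives no concrete values.

```python
from statistics import pstdev


def select_attributes(columns, min_distinct=3, max_null_ratio=0.5, min_std=1e-6):
    """Return the names of the attributes that survive the four removal criteria.

    columns: dict mapping attribute name -> list of numbers, None = missing value.
    All thresholds are hypothetical defaults chosen for the example.
    """
    kept = []
    seen = set()  # value tuples of the attributes already kept
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if not present:
            continue  # nothing measured at all
        if 1 - len(present) / len(values) > max_null_ratio:
            continue  # too many null values
        if len(set(present)) < min_distinct:
            continue  # too few distinct values
        if pstdev(present) < min_std:
            continue  # standard deviation too small
        key = tuple(values)
        if key in seen:
            continue  # duplicate of a kept attribute (the first copy is kept)
        seen.add(key)
        kept.append(name)
    return kept


example = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.0, 2.0, 3.0, 4.0],      # duplicate of "a"
    "c": [5.0, 5.0, 5.0, 5.0],      # a single distinct value
    "d": [None, None, None, 1.0],   # mostly null
    "e": [1.0, 2.0, None, 7.0],
}
kept = select_attributes(example)
```

On this toy input only `a` and `e` are kept. Checking the criteria in a fixed order is a design choice of the sketch; the poster does not say in which order they are applied.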
TRANSFORMATION: Preprocessed File → Transformed File
• Normalization
• Interval discretization / item encoding
• Elimination of attributes with no item reaching the support threshold

DATA MINING (applied to the Transformed File)
IN : ItemSet I, Fraction p%, Threshold mc (chi2), Threshold s (support), Target Attribute ta, Relation r
OUT : Set of minimal correlated patterns

    C_2 := APrioriGen(I)    // (2-pattern) candidates generation
    i := 2
    while C_i <> ∅ do
        L_i := ∅
        for each X ∈ C_i do
            Build the contingency table of X
            if p% of the table's cells have a support ≥ s then
                if chi2(X) ≥ mc then L_i := L_i ∪ {X}
            endif
        end for
        C_{i+1} := APrioriGen(C_i – L_i)
        i := i + 1
    end while
    return ∪_i L_i    // limited to the patterns including one item of ta

Future Work
Current developments are focused on:
• the optimization of the procedure,
• the implementation of other search methods.
We plan to initiate a background procedure integrating different sets of methods, measurements and results.
• Automatic generation of the most suitable result for each new analysis.

Acknowledgments
This work was initiated while the fourth author was at École des Mines de Saint-Étienne / CMP-GC, and was supported by Research Project “Rousset 2003-2008”, financed by the Communauté du Pays d'Aix, Conseil Général des Bouches du Rhône and Conseil Régional Provence Alpes Côte d'Azur.
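The levelwise DATA MINING procedure given on the poster can be sketched in Python as below. This is a minimal sketch under several assumptions that the poster does not state: each row of the relation is a set of encoded items, the contingency table of a k-pattern has 2^k presence/absence cells, the support threshold `s` is an absolute row count, `p` is a fraction between 0 and 1, and the relation is not empty. The function names are hypothetical.

```python
from itertools import combinations, product


def apriori_gen(patterns):
    """Join k-patterns into (k+1)-candidates whose k-subsets are all in `patterns`."""
    patterns = set(patterns)
    candidates = set()
    for a in patterns:
        for b in patterns:
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(sub) in patterns for sub in combinations(union, len(a))
            ):
                candidates.add(union)
    return candidates


def contingency_table(pattern, rows):
    """Count the rows falling in each presence/absence cell of the pattern's items."""
    items = sorted(pattern)
    counts = {cell: 0 for cell in product((True, False), repeat=len(items))}
    for row in rows:
        counts[tuple(item in row for item in items)] += 1
    return items, counts


def chi_squared(items, counts, rows):
    """Chi-squared statistic of the table against the independence hypothesis."""
    n = len(rows)
    freq = {item: sum(item in row for row in rows) / n for item in items}
    total = 0.0
    for cell, observed in counts.items():
        expected = n
        for item, present in zip(items, cell):
            expected *= freq[item] if present else 1 - freq[item]
        if expected > 0:
            total += (observed - expected) ** 2 / expected
    return total


def correlated_patterns(items, rows, p, mc, s, target_items):
    """Levelwise search for minimal correlated patterns touching the target attribute."""
    found = []
    candidates = apriori_gen({frozenset([item]) for item in items})
    while candidates:
        level = set()
        for x in candidates:
            cell_items, counts = contingency_table(x, rows)
            supported = sum(c >= s for c in counts.values())
            if supported >= p * len(counts) and chi_squared(cell_items, counts, rows) >= mc:
                level.add(x)
        found.extend(level)
        # Only patterns that are not yet correlated are extended (minimality).
        candidates = apriori_gen(candidates - level)
    return [x for x in found if x & target_items]


# Toy relation: "a" and "y" always occur together, "b" is independent of both.
rows = [{"a", "y", "b"}, {"a", "y", "b"}, {"a", "y"}, {"a", "y"},
        {"b"}, {"b"}, set(), set()]
result = correlated_patterns({"a", "b", "y"}, rows, p=0.5, mc=3.84, s=1,
                             target_items={"y"})
```

On the toy relation the only pattern returned is {a, y}, whose chi-squared value is 8; the pairs involving `b` score 0 and cannot be joined into a 3-candidate because {a, y} was removed from the candidate pool. The threshold 3.84 is the usual 95% cutoff for one degree of freedom, used here purely as an example value for `mc`.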