Djamel A. Zighed and Nicolas Nicoloyannis
ERIC Laboratory, University of Lyon 2 (France)
zighed@univ-lyon2.fr
Prague, Sept. 04
About the Computer Science Department
• In Lyon, there are 3 universities with 100000 students
• Lumière University Lyon 2 has 22000 students
• Lyon 2 is mainly a liberal arts university
• The Faculty of Economics has three departments, among them the computer science one
• We belong to this department
• We have Bachelor, Master and PhD programs for 300 students
ERIC Lab at the University
Faculties of the University of Lyon 2: Economics, Sociology, Linguistics, Law
Research centers of the university: ERIC (Knowledge Engineering Research Center)
- The budget of ERIC does not depend on the university; it is provided by the national Ministry of Education
- We have a large autonomy in decision making
ERIC Lab
• Founded in 1995
• 11 professors (N. Nicoloyannis, director)
• 15 PhD students
• Grants + contracts + WK + … = 200K€/year
• Research topics
  • Data mining (theory, tools and applications)
  • Data warehouse management (T,T,A)
Data Mining (T,T,A)
Theory
• Induction graphs
• Learning and classification
Tools
• SIPINA: platform for data mining
Applications
• Medical fields
• Chemical applications
• Human science
• …
Data mining TTA for complex data
Data mining on complex data
• An example: breast cancer diagnosis
Motivations
Association measure: it measures the strength of the relationship between X and Y.
According to a specific association measure, can we improve the strength of the relationship by merging some rows and/or some columns?
[Contingency table]
Goal: Find the groupings that maximize the association between attributes
For the preceding example, maximizing Tschuprow's t gives:
[Reduced contingency table]
Yes, we can improve the association by reducing the size of the contingency table.
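To make the criterion concrete, here is a minimal Python sketch of Tschuprow's t and of a row merge. It assumes the usual definition t = sqrt(chi² / (n · sqrt((r − 1)(c − 1)))); the slides do not spell out the formula, and the function names `tschuprow_t` and `merge_rows` are illustrative, not taken from SIPINA.

```python
import math

def tschuprow_t(table):
    """Tschuprow's t for a contingency table given as a list of rows of counts.

    Assumed definition: sqrt(chi2 / (n * sqrt((r - 1) * (c - 1)))).
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    dof = (len(row_sums) - 1) * (len(col_sums) - 1)
    if n == 0 or dof == 0:
        return 0.0  # convention assumed here for degenerate tables
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * math.sqrt(dof)))

def merge_rows(table, i, j):
    """Return a new table in which rows i and j are summed into a single row."""
    merged = [a + b for a, b in zip(table[i], table[j])]
    return [row for k, row in enumerate(table) if k not in (i, j)] + [merged]
```

For the hypothetical table `[[10, 0], [10, 0], [0, 20]]`, `tschuprow_t` returns about 0.841; after `merge_rows(table, 0, 1)` the table is 2 × 2 and the value rises to 1.0, which is the kind of improvement the slide refers to.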
Extension
According to a specific association measure, can we find the optimal reduced contingency table?
[Contingency table]
Optimal solution (exhaustive search)
Goal: Find the best cross partition on T
$\mathcal{P}_X(T)$: the set of all partitions brought about over $\Omega(X)$
$\mathcal{P}_Y(T)$: the set of all partitions brought about over $\Omega(Y)$
$\#\mathcal{P}_X(T)$: the size of the set $\mathcal{P}_X(T)$
$\#\mathcal{P}_Y(T)$: the size of the set $\mathcal{P}_Y(T)$
The number of cases we have to check is $\lambda = \#\mathcal{P}_X(T) \times \#\mathcal{P}_Y(T)$
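The size of this search space can be illustrated with a short Python sketch. It assumes nominal attributes, so that any subset of values may be grouped and the number of partitions of k values is the Bell number B(k); for ordinal attributes, where only adjacent values can be merged, the count would be smaller. The names `bell_number` and `search_space_size` are illustrative.

```python
def bell_number(k):
    """Number of partitions of a set of k elements, computed with the Bell triangle."""
    row = [1]
    for _ in range(k):
        new_row = [row[-1]]  # each row starts with the last entry of the previous row
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]

def search_space_size(n_rows, n_cols):
    """Number of cross partitions to check for an n_rows x n_cols table.

    Assumes nominal row and column attributes (unrestricted groupings).
    """
    return bell_number(n_rows) * bell_number(n_cols)
```

Under this assumption a 6 × 6 table gives 203 × 203 = 41209 candidates, while a 10 × 10 table already gives more than 13 billion.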
Optimal solution (exhaustive search)
According to a specific association measure, can we find the optimal reduced contingency table?
Yes, but the solution is intractable in the real world because of its high time complexity.
Heuristic
Proceed successively to the grouping of the 2 (row or column) values that maximizes the increase in the association criterion.
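The greedy procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the criterion is Tschuprow's t with its usual definition, any two rows or any two columns may be merged (nominal attributes), and the loop stops as soon as no merge increases the criterion (the slide does not state the stopping rule). All function names are hypothetical.

```python
import math
from itertools import combinations

def tschuprow_t(table):
    """Tschuprow's t (assumed definition) for a table given as a list of rows."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    dof = (len(row_sums) - 1) * (len(col_sums) - 1)
    if n == 0 or dof == 0:
        return 0.0  # convention assumed here for degenerate tables
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * math.sqrt(dof)))

def merge_rows(table, i, j):
    """Return a new table in which rows i and j are summed into a single row."""
    merged = [a + b for a, b in zip(table[i], table[j])]
    return [row for k, row in enumerate(table) if k not in (i, j)] + [merged]

def transpose(table):
    return [list(col) for col in zip(*table)]

def greedy_grouping(table, criterion=tschuprow_t):
    """Repeatedly apply the single row or column merge that most increases the criterion.

    Returns the reduced table and its criterion value (a quasi-optimum,
    not necessarily the true optimum).
    """
    current = [list(row) for row in table]
    best = criterion(current)
    while True:
        # Every pairwise row merge, then every pairwise column merge.
        candidates = [merge_rows(current, i, j)
                      for i, j in combinations(range(len(current)), 2)]
        cols = transpose(current)
        candidates += [transpose(merge_rows(cols, i, j))
                       for i, j in combinations(range(len(cols)), 2)]
        if not candidates:
            return current, best
        challenger = max(candidates, key=criterion)
        score = criterion(challenger)
        if score <= best:  # assumed stopping rule: no merge improves the criterion
            return current, best
        current, best = challenger, score
```

Each pass evaluates on the order of r² + c² candidate merges, in contrast with the λ cross partitions of the exhaustive search.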
Simulation
Goal: How far is the quasi-optimal solution from the true optimum?
The comparison is tractable only for tables no larger than 6 × 6.
Simulation design
• Randomly generate 200 tables
• Analyze the distribution of the deviations between optima and quasi-optima
Generating the tables
• 10000 cases distributed over the c × r cells of the table with a uniform distribution (worst case)
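The table-generation step can be sketched in Python as below. The sketch assumes that each of the 10000 cases is assigned independently to one of the cells with equal probability; the slides do not say how the table dimensions were chosen, so they are left as parameters. The names `random_table` and `generate_tables` are illustrative.

```python
import random

def random_table(n_rows, n_cols, n_cases=10000, rng=None):
    """Distribute n_cases uniformly at random over the cells of an n_rows x n_cols table."""
    rng = rng or random.Random()
    table = [[0] * n_cols for _ in range(n_rows)]
    for _ in range(n_cases):
        cell = rng.randrange(n_rows * n_cols)  # every cell is equally likely
        table[cell // n_cols][cell % n_cols] += 1
    return table

def generate_tables(n_tables, n_rows, n_cols, seed=0):
    """Generate n_tables independent random tables of a fixed size (size choice is an assumption)."""
    rng = random.Random(seed)
    return [random_table(n_rows, n_cols, rng=rng) for _ in range(n_tables)]
```

With uniform cell probabilities the rows and columns are nearly independent, so no grouping stands out clearly, which is presumably why the slide calls this the worst case for the heuristic.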
Conclusion
• Implementation in a new approach to decision tree induction
  • Zighed, D. A., Ritschard, G., Erray, W. and Scuturici, V.-M. (2003), Arbogodaï, a New Approach for Decision Trees, in Lavrac, N., Gamberger, D., Todorovski, L. and Blockeel, H. (eds), Knowledge Discovery in Databases: PKDD 2003, LNAI 2838, Berlin: Springer, 495-506.
  • Zighed, D. A., Ritschard, G., Erray, W. and Scuturici, V.-M. (2003), Decision tree with optimal join partitioning, to appear in Journal of Information Intelligent Systems, Kluwer (2004).
• Divisive top-down approach
• Extension to the multidimensional case