Feature Selection Based on Relative Attribute Dependency: An Experimental Study
Jianchao Han¹, Ricardo Sanchez¹, Xiaohua Hu²
¹Computer Science Dept., California State University Dominguez Hills
²College of Information Science and Technology, Drexel University
RSFDGrC 2005, Regina, Canada
Agenda
• Introduction
• Rough Set Approach
• Relative Attribute Dependency Based on Rough Set Theory
• A Heuristic Algorithm for Finding Optimal Reducts
• Experiment Results
• Related Work
• Summary and Future Work
Introduction
• Data Reduction
  • Horizontal reduction – sampling
  • Vertical reduction – feature selection
• Feature Selection
  • Statistical feature selection
  • Significant feature selection
  • Rough set feature selection
• Search Process
  • Top-down search
  • Bottom-up search
  • Exhaustive search
  • Heuristic search
Rough Set Based Feature Selection
• Bottom-up search
• Brute-force search
• Heuristic search
• Rough Set Theory
  • Introduced by Pawlak in the 1980s
  • An efficient tool for data mining, concept generation, induction, and classification
Rough Set Theory -- Information System
• An information system is IS = ⟨U, C, D, {V_a}_{a ∈ C∪D}, f⟩, where
  • U = {u1, u2, ..., un} is a non-empty set of tuples, called the data table
  • C is a non-empty set of condition attributes
  • D is a non-empty set of decision attributes
  • C ∩ D = ∅
  • V_a is the domain of attribute a, with at least two elements
  • f is a function U × (C ∪ D) → V, where V = ∪_{a ∈ C∪D} V_a
Approximation
• Let A ⊆ C ∪ D, t_i, t_j ∈ U, and let X be a subset of U
• Define R_A = {⟨t_i, t_j⟩ ∈ U × U : ∀a ∈ A, t_i[a] = t_j[a]}
• R_A is an equivalence relation on U, called the indiscernibility relation, denoted IND
• The approximation space (U, IND) partitions U into the equivalence classes [A] = {A1, A2, ..., Am} induced by R_A
• Lower approximation, or positive region, of X: Low_A(X) = ∪{A_i ∈ [A] | A_i ⊆ X, 1 ≤ i ≤ m}
• Upper approximation of X based on A: Upp_A(X) = ∪{A_i ∈ [A] | A_i ∩ X ≠ ∅, 1 ≤ i ≤ m}
• Boundary area of X: Boundary_A(X) = Upp_A(X) − Low_A(X)
• Negative region of X: Neg_A(X) = ∪{A_i ∈ [A] | A_i ⊆ U − X, 1 ≤ i ≤ m}
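The lower and upper approximations above are straightforward to prototype. The following minimal Python sketch (the dict-per-tuple table representation and the function names are illustrative choices, not from the paper) partitions a table into the equivalence classes of IND(A) and collects the two approximations of a target set X:

```python
from collections import defaultdict

def partition(U, A):
    """Indices of U grouped into the equivalence classes of IND(A):
    tuples are equivalent iff they agree on every attribute in A."""
    classes = defaultdict(list)
    for i, t in enumerate(U):
        classes[tuple(t[a] for a in A)].append(i)
    return list(classes.values())

def approximations(U, A, X):
    """Lower and upper approximations of X (a set of tuple indices)
    with respect to the attribute set A."""
    lower, upper = set(), set()
    for cls in partition(U, A):
        block = set(cls)
        if block <= X:    # block entirely inside X -> positive region
            lower |= block
        if block & X:     # block overlaps X -> upper approximation
            upper |= block
    return lower, upper

U = [
    {"a": 0, "d": "yes"},
    {"a": 0, "d": "no"},   # indiscernible from tuple 0 on {a}
    {"a": 1, "d": "yes"},
]
X = {0, 2}  # tuples whose decision is "yes"
low, upp = approximations(U, ["a"], X)
print(sorted(low), sorted(upp))  # [2] [0, 1, 2]; boundary area = {0, 1}
```

The boundary area falls out for free as `upp - low`, and the negative region as all indices outside `upp`.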
Core Attributes and Reduct
• Let [D] = {D1, D2, ..., Dk} be the set of elementary sets partitioned by R_D
• Approximation aggregation:
  • Low_A([D]) = ∪_{j=1..k} Low_A(D_j)
  • Upp_A([D]) = ∪_{j=1..k} Upp_A(D_j)
• a ∈ C is a core attribute of C if Low_C([D]) ≠ Low_{C−{a}}([D]), and a dispensable attribute otherwise
• R ⊆ C is a reduct of C in U w.r.t. D if Low_R([D]) = Low_C([D]) and ∀B ⊂ R, Low_B([D]) ≠ Low_C([D])
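The core-attribute test can be phrased directly as "does dropping the attribute shrink the positive region?". A small illustrative Python sketch (function names and the toy table are mine, not the paper's):

```python
from collections import defaultdict

def positive_region(U, A, D):
    """Indices of tuples in the positive region Low_A([D]): union of the
    lower approximations of the decision classes U/IND(D) w.r.t. A."""
    blocks = defaultdict(list)
    for i, t in enumerate(U):
        blocks[tuple(t[a] for a in A)].append(i)
    pos = set()
    for idxs in blocks.values():
        decisions = {tuple(U[i][d] for d in D) for i in idxs}
        if len(decisions) == 1:   # block lies inside a single decision class
            pos |= set(idxs)
    return pos

def is_core(U, C, D, a):
    """a is a core attribute iff removing it changes the positive region."""
    rest = [c for c in C if c != a]
    return positive_region(U, rest, D) != positive_region(U, C, D)

U = [
    {"a": 0, "b": 0, "d": 0},
    {"a": 1, "b": 0, "d": 1},
]
print(is_core(U, ["a", "b"], ["d"], "a"))  # True: without a, d is ambiguous
print(is_core(U, ["a", "b"], ["d"], "b"))  # False: b is dispensable
```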
Calculation of Reducts
• Finding all reducts is NP-hard
• Traditional method: decision matrix
• Some newer methods still suffer from intensive computation of
  • either discernibility functions
  • or positive regions
• Our method:
  • A new, equivalent definition of reducts
  • Counts distinct tuples (rows) in the IS table
  • An efficient algorithm for finding reducts
Relative Attribute Dependency
• Let P ⊆ C ∪ D; Π_P(U) denotes the projection of U on P
• Let Q ⊆ C; the degree of relative dependency of Q on D over U is
  δ_Q(D) = |Π_Q(U)| / |Π_{Q∪D}(U)|
• |Π_X(U)| is the number of equivalence classes in U/IND(X)
• Theorem. Assume U is consistent. Q ⊆ C is a reduct of C with respect to D if and only if
  1) δ_Q(D) = δ_C(D) = 1, and
  2) for every q ∈ Q, δ_{Q−{q}}(D) < δ_Q(D)
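Since δ_Q(D) is a ratio of counts of distinct projected rows, it reduces to hashing tuples, which is the efficiency argument of the slide before. A minimal Python sketch (representation and names are illustrative):

```python
def num_distinct(U, attrs):
    """|Pi_attrs(U)|: number of distinct rows after projecting U onto
    attrs, i.e. the number of equivalence classes in U/IND(attrs)."""
    return len({tuple(t[a] for a in attrs) for t in U})

def rel_dependency(U, Q, D):
    """Degree of relative dependency of Q on D over U."""
    return num_distinct(U, Q) / num_distinct(U, list(Q) + list(D))

# Consistent toy decision table: C = {a, b}, D = {d}
U = [
    {"a": 0, "b": 0, "d": 0},
    {"a": 0, "b": 1, "d": 1},
    {"a": 1, "b": 0, "d": 1},
]
print(rel_dependency(U, ["a", "b"], ["d"]))  # 1.0: {a, b} determines d
print(rel_dependency(U, ["a"], ["d"]))       # < 1: a alone does not
```

No discernibility function or lower/upper approximation is computed here; one pass over the rows per candidate subset suffices.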
Computation Model (RAD)
• Input: a decision table U, condition attribute set C, and decision attribute set D
• Output: a minimum reduct R of the condition attribute set C with respect to D in U
• Computation: find a subset R of C such that δ_R(D) = δ_C(D) and no proper subset of R satisfies this equality
A Heuristic Algorithm for Finding Optimal Reducts
• Given the partition U/IND(D) = {D1, ..., Dk} of U by D, the entropy, or expected information, based on the partition {X1, ..., Xm} = U/IND({q}) of U by an attribute q is
  E(q) = Σ_{j=1..m} (|X_j| / |U|) × I(X_j)
  where I(X_j) = −Σ_{i=1..k} p_ij log2 p_ij and p_ij = |X_j ∩ D_i| / |X_j|
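The measure above is the standard expected-information (conditional entropy) of the decision partition given q; the exact formula on the original slide was lost in extraction, so this sketch assumes that standard form. Illustrative Python:

```python
import math
from collections import Counter, defaultdict

def entropy(U, q, D):
    """E(q): expected information of the decision partition U/IND(D)
    within each block of the partition induced by attribute q."""
    blocks = defaultdict(list)
    for t in U:
        blocks[t[q]].append(tuple(t[d] for d in D))
    n, e = len(U), 0.0
    for decisions in blocks.values():
        w = len(decisions) / n                 # weight |Xj| / |U|
        counts = Counter(decisions)
        e += w * -sum(c / len(decisions) * math.log2(c / len(decisions))
                      for c in counts.values())
    return e

U = [
    {"a": 0, "b": 0, "d": 0},
    {"a": 1, "b": 0, "d": 1},
]
print(entropy(U, "a", ["d"]))  # 0.0: a fully determines d
print(entropy(U, "b", ["d"]))  # 1.0: b carries no information about d
```

A high E(q) means q tells us little about the decision, which is why the algorithm tries to discard high-entropy attributes first.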
Algorithm Design
• R ← C, Q ← ∅
• For each attribute q ∈ C do
  • Compute the entropy E(q) of q
  • Q ← Q ∪ {⟨q, E(q)⟩}
• While Q ≠ ∅ do
  • q ← the attribute in Q with the maximum entropy E(q)
  • Q ← Q − {⟨q, E(q)⟩}
  • If δ_{R−{q}}(D) = δ_C(D) then
    • R ← R − {q}
• Return R
• Algorithm complexity: polynomial in |C| and |U|
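The backward-elimination loop above can be sketched end to end in Python. This is a minimal illustration, not the authors' implementation: the table representation, function names, and the use of conditional entropy as E(q) are my assumptions.

```python
import math
from collections import Counter, defaultdict

def num_distinct(U, attrs):
    """Number of distinct rows when U is projected onto attrs."""
    return len({tuple(t[a] for a in attrs) for t in U})

def rel_dependency(U, Q, D):
    """Degree of relative dependency of Q on D over U."""
    return num_distinct(U, Q) / num_distinct(U, list(Q) + list(D))

def entropy(U, q, D):
    """Expected information of the decision partition within each block
    of the partition induced by attribute q (assumed form of E(q))."""
    blocks = defaultdict(list)
    for t in U:
        blocks[t[q]].append(tuple(t[d] for d in D))
    e = 0.0
    for ds in blocks.values():
        counts = Counter(ds)
        e += (len(ds) / len(U)) * -sum(
            c / len(ds) * math.log2(c / len(ds)) for c in counts.values())
    return e

def find_reduct(U, C, D):
    """Backward elimination: attributes with the highest entropy w.r.t.
    the decision are tried for removal first; a removal is kept only if
    the relative dependency on D is unchanged."""
    R = list(C)
    target = rel_dependency(U, C, D)
    for q in sorted(C, key=lambda a: entropy(U, a, D), reverse=True):
        trial = [a for a in R if a != q]
        if trial and rel_dependency(U, trial, D) == target:
            R = trial
    return R

# b is uninformative, so the reduct keeps only a
U = [
    {"a": 0, "b": 0, "d": 0},
    {"a": 1, "b": 0, "d": 1},
]
print(find_reduct(U, ["a", "b"], ["d"]))  # ['a']
```

Each dependency check is one hashing pass over the rows, and at most |C| removals are attempted, which is what keeps the algorithm polynomial.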
Experiments
• Data sets – UCI repository
  • 10 data sets
  • Various sizes (number of tuples and attributes)
  • Categorical attributes
• Preprocessing – remove all inconsistent tuples
Experiment Results
• [Bar charts comparing the original data with the reduct on each of the 10 data sets (AS, BCW, DER, HV, LC, SPE, YS, ZOO, AUD, SOY): a) number of rows, b) number of condition attributes]
Classification Accuracy
Related Work
• Grzymala-Busse
  • LERS
  • Rough measure of a rule vs. our relative dependency measure
• Nguyen et al.
  • Similar: also uses radix sorting
  • Ours: no discernibility relation, and no lower or upper approximations need to be maintained
• Others
Conclusion
• Summary
  • Relative attribute dependency
  • Computation model
  • Algorithm implementation
  • Experiments
• Future work
  • Refinement
  • Application
  • Extension to numerical attributes