Incorporating Game Theory in Feature Selection for Text Categorization. Nouman Azam and JingTao Yao, Department of Computer Science, University of Regina, CANADA S4S 0A2. azam200n@cs.uregina.ca, jtyao@cs.uregina.ca. http://www.cs.uregina.ca/~azam200n http://www.cs.uregina.ca/~jtyao
Acknowledgement
• Thanks to Dr. Dominik Slezak for presenting this work on our behalf.
Incorporating Game Theory in Feature Selection for TC
Introduction
• Feature selection.
• Selecting a subset of important features.
• Text categorization.
• Assigning textual documents to predefined categories.
• Text categorization and high imbalance.
• The number of instances in each category varies significantly.
• The importance of features varies accordingly.
• It is hard to apply feature selection techniques directly.
Feature Selection in Text Categorization
• Assigning positive or negative values to features.
• The values indicate the importance of features.
• Positive values indicate importance for the positive category.
• Negative values indicate importance for the negative category.
Existing Feature Selection Approaches
• One sided approaches.
• Selecting features with high positive values.
• Two sided approaches.
• Selecting features with high absolute values.
• Explicit combinational approach.
• Selecting features with high positive or negative values generated by a one sided method.
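The difference between the first two strategies can be sketched in a few lines. This is only an illustration: the function names and the signed scores below are hypothetical and do not come from the slides.

```python
def one_sided(scores, k):
    """Keep the k features with the largest signed scores."""
    return sorted(scores, key=lambda w: scores[w], reverse=True)[:k]


def two_sided(scores, k):
    """Keep the k features with the largest absolute scores."""
    return sorted(scores, key=lambda w: abs(scores[w]), reverse=True)[:k]


# Hypothetical signed scores: positive values favour the positive category,
# negative values favour the negative category.
scores = {"w1": 0.9, "w2": 0.4, "w3": -0.1, "w4": -0.8, "w5": 0.05}

one_sided_pick = one_sided(scores, 2)  # only strongly positive features
two_sided_pick = two_sided(scores, 2)  # strongest features of either sign
```

With these made-up scores the one sided pick keeps w1 and w2, while the two sided pick replaces w2 with the strongly negative w4.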
Limitations of Existing Approaches
• They favour features indicative of either the positive or the negative category.
• There may be features that indicate both categories.
• It is reasonable to include such features in some applications.
• Dilemma: positive features vs. negative features.
• We need a way to select features that indicate both categories.
• Incorporating game theory in feature selection can deal with this issue.
Inadequacy of Existing Approaches
• An example.
• Consider an imbalanced data set with 10 documents in the positive category and 100 in the negative category.
• There are eight words in these documents.
• Consider four methods.
• One sided approaches: correlation coefficient and GSS coefficient.
• Two sided approaches: chi square and Gini index.
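As a rough sketch, three of the four named metrics are commonly computed from a 2x2 table of document counts. The count names (a, b, c, d) and the example numbers are assumptions made for illustration, not values from the slides. The Gini index is left out because several variants exist and the slide does not say which one was used.

```python
import math

# Assumed count layout for a word w:
#   a = positive documents containing w     b = negative documents containing w
#   c = positive documents without w        d = negative documents without w
# All four marginal totals must be non-zero, otherwise the ratios are undefined.


def chi_square(a, b, c, d):
    """Two sided: large whenever w is associated with either category."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom


def correlation_coefficient(a, b, c, d):
    """One sided: the signed square root of chi square."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return math.sqrt(n) * (a * d - c * b) / math.sqrt(denom)


def gss(a, b, c, d):
    """One sided: P(w, pos) * P(not w, neg) - P(w, neg) * P(not w, pos)."""
    n = a + b + c + d
    return (a * d - c * b) / n ** 2


# Hypothetical counts for a data set with 10 positive and 100 negative documents.
pos_word = (8, 5, 2, 95)   # mostly found in positive documents
neg_word = (1, 90, 9, 10)  # mostly found in negative documents
```

The one sided scores are positive for `pos_word` and negative for `neg_word`, whereas chi square is positive for both, which is why a two sided method ranks by magnitude only.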
Probabilities of Words in Categories
• Meaning of probabilities.
• Referring to the fraction of documents from a category containing the word.
Scores of Words
Rankings of Words
• Observations.
• w7 and w8 are not considered important by any method.
• They will be ignored if we select three features.
A Simple Solution
• Using an explicit combinational approach.
• Probabilities in the respective categories are used for rankings.
• The new rankings.
• Considering the positive category twice as important as the negative category.
• We may select w1, w8 and w4.
• We note that w8, which indicates both categories, is selected.
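One way to read this simple solution is as a quota scheme: rank the words once per category and take twice as many from the positive ranking. The sketch below follows that reading. The function name and the probability values are hypothetical, since the slide's table is not preserved; the values were chosen only so that the outcome matches the selection quoted above.

```python
def combine_rankings(p_pos, p_neg, n_pos, n_neg):
    """Take n_pos words ranked by P(w | positive), then n_neg more by P(w | negative)."""
    by_pos = sorted(p_pos, key=p_pos.get, reverse=True)
    by_neg = sorted(p_neg, key=p_neg.get, reverse=True)
    selected = []
    for ranking, quota in ((by_pos, n_pos), (by_neg, n_neg)):
        taken = 0
        for word in ranking:
            if taken == quota:
                break
            if word not in selected:  # skip words already chosen
                selected.append(word)
                taken += 1
    return selected


# Hypothetical probabilities of four words in each category.
p_pos = {"w1": 0.9, "w2": 0.3, "w4": 0.1, "w8": 0.8}
p_neg = {"w1": 0.1, "w2": 0.2, "w4": 0.9, "w8": 0.8}

# Positive category twice as important: two picks versus one.
selection = combine_rankings(p_pos, p_neg, n_pos=2, n_neg=1)
```

The quota has to be fixed by hand here, which is the kind of ad hoc decision the game-theoretic formulation aims to replace.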
Conclusion from the Simple Solution
• A feature may be considered good for:
• The positive category,
• The negative category,
• Both of them, or
• Neither of them.
• We seek a systematic method that finds the best choice among these.
• Game theory may be useful for formulating such a method.
Game Theory
• Game theory is a core subject in decision sciences.
• Prisoner's Dilemma.
• A classical example in game theory.
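The Prisoner's Dilemma can be written down as a small payoff table. The slide's own table is not preserved, so the numbers below are conventional textbook values (negative years in prison) used as an assumption. The sketch checks that defecting strictly dominates cooperating for both players.

```python
ACTIONS = ("cooperate", "defect")

# PAYOFFS[(row_action, column_action)] = (row player's payoff, column player's payoff)
# Illustrative values only: payoffs are negative years in prison.
PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-3, 0),
    ("defect", "cooperate"): (0, -3),
    ("defect", "defect"): (-2, -2),
}


def strictly_dominates(action, other, player):
    """True if `action` pays `player` more than `other` against every opponent move."""
    for opponent_move in ACTIONS:
        if player == 0:  # row player
            better = (PAYOFFS[(action, opponent_move)][0]
                      > PAYOFFS[(other, opponent_move)][0])
        else:  # column player
            better = (PAYOFFS[(opponent_move, action)][1]
                      > PAYOFFS[(opponent_move, other)][1])
        if not better:
            return False
    return True


row_defects = strictly_dominates("defect", "cooperate", player=0)
column_defects = strictly_dominates("defect", "cooperate", player=1)
```

Both players therefore defect, even though mutual cooperation would leave each of them better off than mutual defection.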
Feature Selection with Game Theory
• Formulating a problem with game theory requires us to:
• Identify the player set.
• Identify the strategy set.
• Determine the payoff functions.
• Implement a competition.
The Player Set
• Two players were selected.
• The players represent the positive and negative categories.
• The player C+ represents the positive category.
• The player C- represents the negative category.
• Each player determines the features' utility for its respective category.
The Strategy Set
• Two actions were formulated for each player.
• Action a1 for keeping a feature.
• Action a2 for discarding a feature.
• For differentiating the actions of the two players:
• [symbols not preserved] denote the actions of C+.
• [symbols not preserved] denote the actions of C-.
The Payoff Functions
• Notation for a payoff function.
• The payoff of player i performing action j, given action k of the opponent, is denoted as [symbol not preserved].
• The payoff sets.
Defining the Payoff Functions
• A and B are the numbers of documents from the positive and negative categories, respectively, that contain word w.
• C and D are the numbers of documents from the positive and negative categories, respectively, that do not contain w.
• The conditional probabilities of w in the positive and negative categories are P(w | positive) = A / (A + C) and P(w | negative) = B / (B + D).
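A minimal sketch of this computation, assuming each conditional probability is the fraction of a category's documents that contain w (as stated on the earlier "Probabilities of Words in Categories" slide). The function name and the counts are hypothetical.

```python
def conditional_probabilities(a, b, c, d):
    """Return (P(w | positive), P(w | negative)) from document counts.

    a, b: positive / negative documents containing w
    c, d: positive / negative documents not containing w
    """
    p_pos = a / (a + c)  # share of positive documents that contain w
    p_neg = b / (b + d)  # share of negative documents that contain w
    return p_pos, p_neg


# Hypothetical word in a data set with 10 positive and 100 negative documents:
# it appears in 9 positive and 20 negative documents.
p_pos, p_neg = conditional_probabilities(a=9, b=20, c=1, d=80)
```

Because each probability is normalised by its own category size, the imbalance between 10 and 100 documents does not by itself push the value up or down.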
Payoff Functions for Players
• Both players deciding to keep a feature.
• The payoffs of the players are calculated as an average: [formula not preserved].
• Both players deciding to discard a feature.
• The payoffs are calculated as [formula not preserved].
• C+ deciding to keep while C- discards.
• The payoffs are [formula not preserved] and [formula not preserved], respectively.
• C+ deciding to discard while C- keeps.
• The payoffs are [formula not preserved] and [formula not preserved], respectively.
Action Scenarios for Players
Implementing Competition
• Representing the game in a payoff table.
• Determining Nash equilibrium for finding the actions of players.
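For a two-player game with two actions each, pure-strategy Nash equilibria can be found by checking every cell of the payoff table for profitable unilateral deviations. The sketch below does that. The payoff numbers are hypothetical, because the slides' payoff formulas are not preserved, and the action labels "keep" and "discard" stand in for a1 and a2.

```python
from itertools import product

ACTIONS = ("keep", "discard")


def pure_nash_equilibria(payoffs):
    """Return all action pairs from which neither player gains by deviating alone.

    payoffs[(action of C+, action of C-)] = (payoff of C+, payoff of C-)
    """
    equilibria = []
    for a_pos, a_neg in product(ACTIONS, repeat=2):
        u_pos, u_neg = payoffs[(a_pos, a_neg)]
        # C+ cannot do better by switching while C- stays put.
        pos_ok = all(u_pos >= payoffs[(alt, a_neg)][0] for alt in ACTIONS)
        # C- cannot do better by switching while C+ stays put.
        neg_ok = all(u_neg >= payoffs[(a_pos, alt)][1] for alt in ACTIONS)
        if pos_ok and neg_ok:
            equilibria.append((a_pos, a_neg))
    return equilibria


# Hypothetical payoff table for one word that mainly indicates the positive category.
payoffs = {
    ("keep", "keep"): (0.6, 0.2),
    ("keep", "discard"): (0.9, 0.4),
    ("discard", "keep"): (0.1, 0.3),
    ("discard", "discard"): (0.2, 0.5),
}

equilibria = pure_nash_equilibria(payoffs)
```

A table can have no pure equilibrium or more than one, which is why the function returns a list. The slides do not say how such cases are resolved.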
Selected Feature Set
• Defining two feature sets.
• FS+ as the set of features representing the positive category.
• FS- as the set of features representing the negative category.
• The game will determine the inclusion or exclusion of features in these sets.
• The final selected feature set is the union of FS+ and FS-.
A Demonstrative Example
• Considering the earlier example.
Payoff Tables for Words
• The bold cells represent Nash equilibrium.
• Considering w1.
• The actions of the players in equilibrium are [symbol not preserved] for C+ and [symbol not preserved] for C-.
• These actions lead to including w1 in FS+.
Payoff Tables for Words
Selected Features
• Result of implementing the game for the features.
• FS+ = {w1, w7, w8} and FS- = {w4, w7, w8}.
• FS = {w1, w4, w7, w8}.
• Observation.
• The words w7 and w8 are selected.
• The suggested approach selects features that indicate both categories.
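The step from equilibrium actions to the sets above can be sketched as follows. Two assumptions are made here: a word enters FS+ when C+ keeps it and FS- when C- keeps it, and the per-word actions listed in the code are inferred from the reported sets rather than read from the payoff tables, which are not preserved.

```python
def build_feature_sets(equilibrium_actions):
    """Map each word's equilibrium actions to FS+, FS- and their union.

    equilibrium_actions[word] = (action of C+, action of C-)
    """
    fs_pos = {w for w, (a_pos, _) in equilibrium_actions.items() if a_pos == "keep"}
    fs_neg = {w for w, (_, a_neg) in equilibrium_actions.items() if a_neg == "keep"}
    return fs_pos, fs_neg, fs_pos | fs_neg


# Equilibrium actions inferred from the reported result (an assumption).
equilibrium_actions = {
    "w1": ("keep", "discard"),
    "w2": ("discard", "discard"),
    "w3": ("discard", "discard"),
    "w4": ("discard", "keep"),
    "w5": ("discard", "discard"),
    "w6": ("discard", "discard"),
    "w7": ("keep", "keep"),
    "w8": ("keep", "keep"),
}

fs_pos, fs_neg, fs = build_feature_sets(equilibrium_actions)
```

Words kept by both players, such as w7 and w8, land in both sets, so the union retains them without any hand-set quota.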
Conclusion
• Limitations of existing approaches.
• Preference is given to features indicating the positive or the negative category.
• They may not be suitable for selecting features indicating both categories.
• Game theory based method.
• Implements a game between categories.
• Importance of the method.
• Useful in selecting features indicating the positive category, the negative category or both of them.