
Dual Active Feature and Sample Selection for Graph Classification


Presentation Transcript


  1. Dual Active Feature and Sample Selection for Graph Classification. Xiangnan Kong¹, Wei Fan², Philip S. Yu¹. ¹Department of Computer Science, University of Illinois at Chicago; ²IBM T. J. Watson Research. KDD 2011

  2. Graph Classification • Traditional classification: the input is a feature vector x, the output is a class label • Graph classification: the input is a graph object, the output is a class label

  3. Cheminformatics: Drug Discovery • Graph object: a chemical compound (atoms as nodes, bonds as edges) • Label: anti-cancer activity (+ / -) • Training data: compounds with known activity; testing data: compounds whose activity must be predicted [Figure: example chemical compounds with + and - labels]

  4. Applications • System call graphs; label: normal software or virus? • Program flows; label: error? • XML documents; label: category

  5. Graph Classification • Given a set of graph objects with class labels, how do we predict the labels of unlabeled graphs? • Challenges: complex structure, lack of explicit features • Approach: subgraph feature mining

  6. Subgraph Features • How do we extract a set of subgraph features for graph classification? • Each graph object Gi is represented by a binary feature vector xi whose k-th entry indicates whether subgraph feature Fk occurs in Gi (e.g., x1 = (1, 0, 1), x2 = (0, 1, 1) over features F1, F2, F3) • Pipeline: graph objects → feature vectors → classifier (a code sketch follows below) [Figure: graphs G1, G2 with subgraph features F1, F2, F3 and the resulting feature vectors]
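A minimal sketch of this graph-to-vector step, assuming networkx graphs whose nodes carry an "element" attribute; the helper name and the toy molecules are illustrative assumptions, not the paper's code:

```python
# Sketch: map a graph to a binary subgraph-feature vector (slide 6).
import networkx as nx
from networkx.algorithms import isomorphism

def subgraph_feature_vector(graph, subgraph_features):
    """x[k] = 1 iff feature F_k occurs as a subgraph of `graph`."""
    node_match = isomorphism.categorical_node_match("element", None)
    vec = []
    for F in subgraph_features:
        gm = isomorphism.GraphMatcher(graph, F, node_match=node_match)
        vec.append(1 if gm.subgraph_is_isomorphic() else 0)
    return vec

# Toy example: a C-C-C chain feature tested against a C-C-C-O compound.
F1 = nx.Graph([(0, 1), (1, 2)])
nx.set_node_attributes(F1, {0: "C", 1: "C", 2: "C"}, "element")

G1 = nx.Graph([(0, 1), (1, 2), (2, 3)])
nx.set_node_attributes(G1, {0: "C", 1: "C", 2: "C", 3: "O"}, "element")

print(subgraph_feature_vector(G1, [F1]))  # [1]
```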

  7. Subgraph Feature Selection • Existing methods mine discriminative subgraph features for a graph classification task • They focus on supervised settings [Figure: discriminative subgraph features F1, F2 separating + and - graphs]
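For context, one standard supervised score for a candidate subgraph feature is the information gain between its presence/absence and the class labels (this is the "IG" criterion among the compared methods on slide 22). A minimal sketch over binary feature columns; the function names are mine:

```python
# Sketch: information gain of a binary subgraph feature with respect to labels.
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(f, y):
    """f: 0/1 presence of one subgraph feature per graph; y: class labels (arrays)."""
    gain = entropy(y)
    for v in (0, 1):
        mask = (f == v)
        if mask.any():
            gain -= mask.mean() * entropy(y[mask])
    return gain

# Rank candidate features by score, e.g.:
#   scores = [information_gain(X[:, k], y) for k in range(X.shape[1])]
```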

  8. Labeling Cost • Supervised settings require a large number of labeled graphs • Labeling cost is high: we can only afford to label a few graph objects • The small label budget affects both feature selection and classification accuracy

  9. Active Sample Selection • Given a set of candidate graph samples • We want to select the most important graph and query its label [Figure: a pool of unlabeled graphs (?) alongside a few labeled graphs (+ / -)]

  10. Active Sample Selection • Given a set of candidate graph samples • We want to select the most important graph and query its label [Figure: one candidate graph in the pool is chosen to be labeled]
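One simple query rule (it appears later as a compared baseline on slide 22, not as the proposed method) is to pick the unlabeled graph closest to the current classifier's decision boundary. A minimal sketch, assuming graphs are already mapped to subgraph-feature vectors and using scikit-learn's LinearSVC as the classifier:

```python
# Sketch: margin-based active query selection (a baseline, not gActive).
import numpy as np
from sklearn.svm import LinearSVC

def margin_query(X_labeled, y_labeled, X_unlabeled):
    """Return the index of the unlabeled graph closest to the decision boundary."""
    clf = LinearSVC().fit(X_labeled, y_labeled)
    margins = np.abs(clf.decision_function(X_unlabeled))
    return int(np.argmin(margins))
```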

  11. Two Parts of the Problem • Active sample selection: select the most important graph in the pool to query its label • Subgraph feature selection: select the subgraph features relevant to the classification task • The two parts are correlated! [Figure: unlabeled graphs in the pool]

  12. Active Sample Selection • Challenges: no feature representation is available in advance, and subgraph enumeration is NP-hard • The queried graph should be both representative and informative

  13. Active Sample Selection • The view of the graph samples depends on which subgraph features are used

  14. Example • Under subgraph features F1 and F2, graphs G1 and G2 look very similar [Figure: G1 and G2 shown with subgraph features F1 and F2]

  15. Example • Under a different pair of subgraph features, the same graphs G1 and G2 look very different (a toy sketch follows below) [Figure: G1 and G2 shown with a different pair of subgraph features]
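To make slides 14-15 concrete, here is a toy sketch (the 0/1 feature values below are made up, not the paper's graphs): the same two graphs can look nearly identical under one set of subgraph features and completely different under another, so the similarity view used for sample selection depends on which features are selected.

```python
# Toy illustration: graph similarity depends on the chosen subgraph features.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up feature indicators for two graphs over four candidate subgraphs.
g1 = {"F1": 1, "F2": 1, "F3": 0, "F4": 1}
g2 = {"F1": 1, "F2": 1, "F3": 1, "F4": 0}

view_a = ["F1", "F2"]   # under these features the graphs look very similar
view_b = ["F3", "F4"]   # under these features the same graphs look very different

for view in (view_a, view_b):
    x1 = np.array([g1[f] for f in view], dtype=float)
    x2 = np.array([g2[f] for f in view], dtype=float)
    print(view, cosine(x1, x2))   # 1.0 for view_a, 0.0 for view_b
```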

  16. Subgraph Feature Selection and Active Sample Selection • The data can be viewed along two dimensions: the subgraph features (feature-selection view) and the graph objects (active-sample-selection view)

  17. Dual Active Feature and Sample Selection • Perform active sample selection and subgraph feature selection simultaneously • Loop: select a graph from the unlabeled pool, query and label it, add it to the labeled graphs, and update the selected subgraph features (a loop sketch follows below) [Figure: labeled graphs, unlabeled graphs, and the query-and-label cycle]
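A minimal sketch of the alternation this slide describes, with simple stand-in scoring rules (the selection criteria here are illustrative placeholders, not the gActive objective):

```python
# Sketch: alternate feature selection and active sample selection.
import numpy as np

def select_features(X, y, k):
    # Stand-in supervised scorer: rank binary features by |correlation| with labels.
    scores = np.nan_to_num(np.abs(np.corrcoef(X.T, y)[:-1, -1]))
    return np.argsort(-scores)[:k]

def query_index(X_unl, X_lab, feats):
    # Stand-in query rule: the unlabeled graph farthest (in the selected feature
    # space) from its nearest labeled graph.
    d = np.linalg.norm(X_unl[:, feats][:, None, :] - X_lab[:, feats][None, :, :], axis=2)
    return int(np.argmax(d.min(axis=1)))

def dual_active_loop(X, y_oracle, init_labeled, n_queries, k_features):
    labeled = list(init_labeled)
    unlabeled = [i for i in range(len(X)) if i not in labeled]
    feats = np.arange(min(k_features, X.shape[1]))   # initial feature set
    for _ in range(n_queries):
        y_lab = np.array([y_oracle[i] for i in labeled])
        feats = select_features(X[labeled], y_lab, k_features)   # feature selection
        q = unlabeled[query_index(X[unlabeled], X[labeled], feats)]
        labeled.append(q)        # query the annotator for graph q's label
        unlabeled.remove(q)
    return labeled, feats
```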

  18. gActive Method • Max-min active sample selection: maximize the reward for querying a graph under the worst case over its unknown label (take the minimum reward over the possible labels + / -, then the maximum over candidate graphs)

  19. gActive Method • Dependence maximization: graphs' features should match their labels • Informative: the queried graph should be far away from the labeled graphs • Representative: the queried graph should be close to the unlabeled graphs • Max-min active sample selection: maximize the worst-case reward • Feature selection: maximize a utility function (a rough sketch follows below)
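A rough sketch of a max-min query rule built from the three ingredients on this slide (informative, representative, and a dependence-style term), using plain Euclidean distances as stand-ins; the reward used in the paper differs, so this only illustrates the max-min structure:

```python
# Sketch: worst-case (max-min) query selection over the unknown label.
import numpy as np

def maxmin_query(X_unl, X_lab, y_lab):
    """Return the index of the unlabeled graph with the best worst-case reward."""
    y_lab = np.asarray(y_lab)
    best_i, best_score = None, -np.inf
    for i, x in enumerate(X_unl):
        d_lab = np.linalg.norm(X_lab - x, axis=1)
        d_unl = np.linalg.norm(X_unl - x, axis=1)
        informative = d_lab.min()          # far away from the labeled graphs
        representative = -d_unl.mean()     # close to the unlabeled graphs
        rewards = []
        for y in (+1, -1):                 # the queried graph's label is unknown
            same = d_lab[y_lab == y]
            # dependence-style stand-in: features should agree with the guessed
            # label, i.e. x should lie near labeled graphs carrying that label
            agree = -same.min() if same.size else 0.0
            rewards.append(agree + informative + representative)
        worst = min(rewards)               # worst case over the two possible labels
        if worst > best_score:
            best_i, best_score = i, worst
    return best_i
```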

  20. Example • Worked example with + and - graphs • More details in the paper: branch-and-bound subgraph mining (to speed up the search)

  21. Experiments: Data Sets • Anti-cancer activity datasets (NCI and AIDS) • Graph: chemical compounds • Label: anti-cancer activity • Each dataset is balanced with 500 positive and 500 negative samples

  22. Experiments: Compared Methods
  • Freq. + Random (unsupervised feature selection + random sampling): frequent subgraphs + random query
  • Freq. + Margin (unsupervised feature selection + margin-based): frequent subgraphs + query graphs close to the margin
  • Freq. + TED (unsupervised feature selection + TED): frequent subgraphs + transductive experimental design
  • IG + Random (supervised feature selection + random sampling): information gain + random query
  • IG + Margin (supervised feature selection + margin-based): information gain + query graphs close to the margin
  • gActive (dual active feature and sample selection): the proposed method in this paper

  23. Experiment Results

  24. Experiment Results (NCI-47) [Figure: accuracy (higher is better) vs. number of queried graphs on NCI-47 with #features = 200, comparing gActive, IG + Random, IG + Margin, Freq. + TED, Freq. + Margin, and Freq. + Random; with few queried graphs supervised feature selection underperforms unsupervised, while with more queries supervised outperforms unsupervised]

  25. Experiment Results

  26. Experiment Results • gActive wins consistently

  27. Conclusions • Dual active feature and sample selection for graph classification • Perform subgraph feature selection and active sample selection simultaneously • Future work: other data and applications; itemset and sequence data Thank you!
