Ming Ji Department of Computer Science University of Illinois at Urbana-Champaign

Active Learning for Information Networks: A Variance Minimization Criterion

Information Networks: the Data
• Information networks
  • Abstraction: graphs
  • Data instances connected by edges representing certain relationships
• Examples
  • Telephone account networks linked by calls
  • Email user networks linked by emails
  • Social networks linked by friendship relations
  • Twitter users linked by the "follow" relation
  • Webpage networks interconnected by hyperlinks in the World Wide Web
  • …

Active Learning: the Problem
• Classical task: classification of the nodes in a graph
  • Applications: terrorist email detection, fraud detection, …
• Why active learning
  • Training classification models requires labels that are often very expensive to obtain
  • Different labeled data will train different learners
  • Example: given an email network containing millions of users, we can only sample a few users and ask experts to investigate whether they are suspicious, and then use the labeled data to predict which users are suspicious among all the users

Active Learning: the Problem
• Problem definition of active learning
  • Input: data and a classification model
  • Output: the data examples (e.g., which users) that should be labeled, chosen so that the classifier achieves higher prediction accuracy on the unlabeled data than it would with random label selection
  • Goal: maximize the learner's ability given a fixed budget of labeling effort

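Notations
• $V = \{v_1, \dots, v_n\}$: the set of nodes
• $\mathbf{y} = (y_1, \dots, y_n)^\top$: the labels of the nodes
• $W = [w_{ij}]$, where $w_{ij}$ is the weight on the edge between two nodes $v_i$ and $v_j$
• Goal: find a subset of nodes $\mathcal{L} \subset V$ such that the classifier learned from the labels of $\mathcal{L}$ achieves the smallest expected prediction error on the unlabeled data $\mathcal{U} = V \setminus \mathcal{L}$, measured by $\sum_{v_i \in \mathcal{U}} (y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the label prediction for $v_i$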
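As a small illustration of the notation, the edge weights can be collected into a matrix built from a weighted edge list. This is only a sketch: the function name `weight_matrix`, the `(i, j, weight)` edge format, and the assumption that the graph is undirected (so the matrix is symmetric) are choices made here, not details given in the slides.

```python
import numpy as np

def weight_matrix(n_nodes, edges):
    """Build a symmetric weight matrix from (i, j, weight) triples.

    Assumes an undirected graph: each edge is written in both directions.
    """
    W = np.zeros((n_nodes, n_nodes))
    for i, j, w in edges:
        W[i, j] = w
        W[j, i] = w  # mirror the entry because the graph is assumed undirected
    return W

# Hypothetical 4-node network: a chain 0 - 1 - 2 - 3 with one heavier edge.
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0)]
W = weight_matrix(4, edges)
```

Nodes that share no edge keep a weight of zero, so the matrix alone encodes the whole graph structure.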

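Classification Model
• Gaussian random field: $P(\mathbf{y}) \propto \exp(-E(\mathbf{y}))$
  • $E(\mathbf{y})$: energy function measuring the smoothness of a label assignment
• Label prediction
  • Without loss of generality, we can arrange the data points chosen to be labeled to be the first $l$ instances, i.e., $\mathcal{L} = \{v_1, \dots, v_l\}$
  • Given the constraint that the labeled nodes take their observed labels $\mathbf{y}_l$, we want to predict the $\mathbf{y}_u$ with the highest probability
  • Let $L$ be the graph Laplacian, split as: $L = \begin{pmatrix} L_{ll} & L_{lu} \\ L_{ul} & L_{uu} \end{pmatrix}$
  • Prediction: $\hat{\mathbf{y}}_u = -L_{uu}^{-1} L_{ul} \mathbf{y}_l$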
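The label prediction step can be sketched in a few lines of NumPy. The sketch assumes the combinatorial Laplacian (degree matrix minus weight matrix) and the standard harmonic solution of a Gaussian random field, in which the unlabeled predictions solve a linear system in the unlabeled block of the Laplacian. The function name `harmonic_predict` and the toy graph are hypothetical.

```python
import numpy as np

def harmonic_predict(W, labeled_idx, y_labeled):
    """Predict unlabeled nodes by solving L_uu y_u = -L_ul y_l."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    # Assumption of this sketch: combinatorial Laplacian = degrees - weights.
    L = np.diag(W.sum(axis=1)) - W
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]
    # Solve the linear system instead of forming the inverse explicitly.
    y_u = np.linalg.solve(L_uu, -L_ul @ np.asarray(y_labeled, dtype=float))
    return unlabeled_idx, y_u

# Path graph 0 - 1 - 2 - 3 with unit weights; the end nodes are labeled +1 and -1.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
unlabeled_idx, y_u = harmonic_predict(W, [0, 3], [1.0, -1.0])
```

On this path graph the two interior nodes receive values that interpolate linearly between the labeled ends, which is the smoothness behavior the energy function rewards. The unlabeled block is only invertible when every connected component contains at least one labeled node.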

The Variance Minimization Criterion
• Recall the goal of active learning
  • Maximize the learner's ability ⇔ minimize the error
• Analyze the distribution of the Gaussian field conditioned on the labeled data
• Compute the expected prediction error on the unlabeled nodes
• Choose the nodes to label such that the expected error (= total variance) is minimized
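One way to make the criterion concrete is a naive greedy search. For a Gaussian field whose precision matrix is proportional to the graph Laplacian, the covariance of the unlabeled nodes conditioned on the labeled ones is proportional to the inverse of the unlabeled Laplacian block, so the total variance is proportional to the trace of that inverse. The sketch below is not necessarily the authors' optimization procedure: the greedy strategy, the function names, the combinatorial Laplacian, and the small `ridge` term added for numerical safety are all assumptions made here.

```python
import numpy as np

def total_variance(L, unlabeled_idx):
    """Trace of the inverse of the unlabeled block of the Laplacian."""
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    return float(np.trace(np.linalg.inv(L_uu)))

def greedy_variance_minimization(W, budget, ridge=1e-8):
    """Pick `budget` nodes one at a time, each minimizing the total variance.

    Assumes budget <= number of nodes. Only the graph structure is used;
    no labels or node features are needed.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    # Hypothetical ridge keeps the blocks invertible on disconnected graphs.
    L = np.diag(W.sum(axis=1)) - W + ridge * np.eye(n)
    selected, remaining = [], list(range(n))
    for _ in range(budget):
        best_node, best_score = None, np.inf
        for v in remaining:
            rest = [u for u in remaining if u != v]
            # If labeling v leaves nothing unlabeled, the variance is zero.
            score = total_variance(L, rest) if rest else 0.0
            if score < best_score:
                best_node, best_score = v, score
        selected.append(best_node)
        remaining.remove(best_node)
    return selected

# Star graph: node 0 is the hub connected to four leaves with unit weights.
W_star = np.zeros((5, 5))
W_star[0, 1:] = 1.0
W_star[1:, 0] = 1.0
first_pick = greedy_variance_minimization(W_star, budget=1)
```

On the star graph the hub is selected first: labeling it leaves four leaves with total variance 4 (up to the ridge), whereas labeling any single leaf leaves a total variance of 7. Each greedy step inverts one matrix per candidate, so this brute-force version is only practical for small graphs.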

Experimental Results on the Co-author Network
[Table: classification accuracy (%) comparison]

Experimental Results on the Isolet Data Set
[Figure: classification accuracy vs. the number of labels used]

Conclusions
• Publication: Ming Ji and Jiawei Han, "A Variance Minimization Criterion to Active Learning on Graphs", Proc. 2012 Int. Conf. on Artificial Intelligence and Statistics (AISTATS'12), La Palma, Canary Islands, April 2012.
• Main advantages of the novel criterion proposed
  • The first work to theoretically minimize the expected prediction error of a classification model on networks/graphs
  • The only information used: the graph structure
    • Does not need any label information
    • The data points do not need to have a feature representation
• Future work
  • Test the assumptions and applicability of the criterion on real data
  • Study the expected error of other classification models
