Studying the Impact of Learning in Distributed Document Classification: Collaborative Classifier Agents. By Weimao Ke, Javed Mostafa, and Yueyu Fu, {wke, jm, yufu}@indiana.edu. Laboratory of Applied Informatics Research (LAIR), Indiana University, Bloomington.
Outline • Introduction • Motivation and research questions • Research model and methodology • Experimental design • Experimental results • Summary and future work
Intro: Centralized classification • [Diagram: a single centralized classifier holding all the classification knowledge]
Intro: Why not centralized? • A global repository is rarely realistic • Scalability • Intellectual property restrictions • …?
Intro: Knowledge distribution • [Diagram: knowledge spread over distributed repositories, e.g., Arts, Sciences, History, Politics]
Intro: Distributed classification • [Diagram: knowledge split across multiple classifiers; each document is assigned to one classifier, which produces the classification]
Intro: Collaborative classification • [Diagram: classifiers collaborate on assignments, forwarding documents toward the classifier that holds the relevant knowledge]
Motivation & Research Questions • Why distributed classification? • The nature of knowledge distribution • Advantages of distributed classification: fault tolerance, adaptability, flexibility, privacy, etc. • Problems/Questions • Classifier agents have to learn and collaborate. But how? • Effectiveness and efficiency of agent collaboration? • The dynamics of agent collaboration?
Methodology: Agent simulation • To compare • Centralized approach (upper-bound) • Distributed approach without collaboration (lower-bound) • Distributed approach with collaboration • Two learning/collaboration algorithms: • Algorithm 1: Pursuit Learning • Algorithm 2: Nearest Centroid Learning • Two parameters • r: Exploration Rate • g: Maximum Collaboration Range • Evaluation Measures • Effectiveness: precision, recall, F measure (see the sketch below) • Efficiency: time for classification • Precision = a / (a + b) • Recall = a / (a + c) • F1 = 2 · P · R / (P + R), where a = correct assignments, b = incorrect assignments, and c = missed assignments
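As a concrete reference for the effectiveness measures above, here is a minimal Python sketch computing precision, recall, and F1 from the binary contingency counts; the a/b/c names follow the slide's convention.

```python
# Minimal sketch of the slide's effectiveness measures.
# a = true positives, b = false positives, c = false negatives.
def precision(a: int, b: int) -> float:
    return a / (a + b) if (a + b) else 0.0

def recall(a: int, c: int) -> float:
    return a / (a + c) if (a + c) else 0.0

def f1(a: int, b: int, c: int) -> float:
    p, r = precision(a, b), recall(a, c)
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Example: 80 correct, 20 incorrect, 10 missed assignments.
print(round(f1(80, 20, 10), 3))  # 0.842
```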
Exp design: Data collection • Reuters Corpus Volume 1 (RCV1) • 37 classes / categories • “Knowledge” represented by the training class vectors • Training set: 6,394 documents • Test set: 2,500 documents • Feature selection: 4,084 unique terms
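The “knowledge” here is a term-weight vector per training class. A toy sketch of that representation, with plain term frequencies and made-up terms standing in for the actual RCV1 feature set and weighting scheme:

```python
import math
from collections import Counter

def class_vector(docs):
    """Aggregate term-frequency vector over a class's training docs, unit-normalized."""
    counts = Counter(term for doc in docs for term in doc)
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {t: v / norm for t, v in counts.items()}

def cosine(u, v):
    """Dot product; equals cosine similarity because both vectors are unit-normalized."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

# Toy example (not actual RCV1 terms).
arts = class_vector([["painting", "gallery"], ["gallery", "museum"]])
doc = class_vector([["museum", "painting"]])
print(round(cosine(doc, arts), 3))  # 0.577
```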
Exp design: baselines • Without collaboration • Knowledge division: 1 agent (upper-bound baseline), 2 agents, 4 agents, …, 37 agents (lower-bound baseline); a division sketch follows below
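An illustrative sketch of the knowledge division: the 37 classes are split among k agents, where k = 1 reproduces the centralized upper bound and k = 37 the fully distributed lower bound. The round-robin split is an assumption; the slides do not specify the partitioning rule.

```python
# Hypothetical knowledge division: round-robin split of class labels over k agents.
def divide_classes(classes, k):
    return [classes[i::k] for i in range(k)]

classes = [f"class_{i}" for i in range(37)]
for k in (1, 2, 4, 37):
    sizes = [len(part) for part in divide_classes(classes, k)]
    print(k, sizes[:4])  # 1 agent holds everything; 37 agents hold one class each
```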
Exp design: agent collaboration • With collaboration • Knowledge division as in the baselines: 1 agent (upper-bound) through 37 agents (lower-bound) • Algorithms • Pursuit learning • Nearest centroid learning • Parameters • Exploration rate • Max collaboration range
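For orientation, a minimal pursuit-learning sketch in the classic learning-automata style: the agent keeps a probability vector over neighbors plus per-neighbor reward estimates, and after each collaboration shifts probability mass toward the neighbor with the best estimated reward. The update rule, constants, and names are illustrative assumptions, not the paper's exact formulation.

```python
import random

class PursuitLearner:
    """Illustrative pursuit learner; not the paper's implementation."""
    def __init__(self, n_neighbors, rate=0.05, explore=0.1):
        self.p = [1.0 / n_neighbors] * n_neighbors  # action probabilities
        self.d = [0.0] * n_neighbors                # estimated rewards
        self.n = [0] * n_neighbors                  # selection counts
        self.rate, self.explore = rate, explore

    def choose(self):
        """Pick a neighbor to forward a document to, with some random exploration."""
        if random.random() < self.explore:
            return random.randrange(len(self.p))
        return random.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, i, reward):
        """Refine neighbor i's reward estimate, then pursue the current best."""
        self.n[i] += 1
        self.d[i] += (reward - self.d[i]) / self.n[i]          # running mean
        best = max(range(len(self.d)), key=self.d.__getitem__)
        self.p = [(1 - self.rate) * q + (self.rate if j == best else 0.0)
                  for j, q in enumerate(self.p)]

learner = PursuitLearner(n_neighbors=4)
for _ in range(500):
    i = learner.choose()
    learner.update(i, reward=1.0 if i == 2 else 0.0)  # pretend neighbor 2 is best
print(max(range(4), key=learner.p.__getitem__))       # usually 2
```

Note the property the summary slide emphasizes: the update uses only collaboration feedback, never document content, which is what makes the approach cheap.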
Exp design: Agent simulation model • Parameters for learning/prediction: • 1. Exploration rate (r) • 2. Max collaboration range (g) • [Diagram: a document distributor sends documents to classifier agents, which exchange collaboration requests and responses]
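The model above implies a collaboration loop: a document hops between classifier agents until one accepts it or the maximum collaboration range g is exhausted. A hedged sketch of that loop; the Agent class and its term-overlap acceptance test are invented for illustration:

```python
import random

class Agent:
    """Toy classifier agent; the acceptance test is an illustrative stand-in."""
    def __init__(self, agent_id, known_terms, neighbors):
        self.id, self.known_terms, self.neighbors = agent_id, known_terms, neighbors

    def try_classify(self, doc):
        """Accept the document only if it overlaps this agent's local knowledge."""
        return self.id if doc & self.known_terms else None

    def pick_neighbor(self, doc):
        return random.choice(self.neighbors)  # stand-in for a learned policy

def classify(doc, agents, start, g):
    """Forward doc between agents for at most g collaboration hops."""
    current, hops = start, 0
    while hops <= g:
        label = agents[current].try_classify(doc)
        if label is not None:
            return label, hops                    # classified within range
        current = agents[current].pick_neighbor(doc)
        hops += 1
    return None, hops                             # range exhausted, unresolved

agents = {0: Agent(0, {"gallery"}, [1]), 1: Agent(1, {"senate"}, [0])}
print(classify({"senate", "vote"}, agents, start=0, g=3))  # (1, 1)
```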
Exp results: Collaboration effectiveness • [Plot: effectiveness of Pursuit Learning (PL) vs. a random baseline, with an optimal zone of parameter settings]
Exp results: Learning dynamics • [Plot: learning curves over time; has learning converged?]
Summary • Classification effectiveness decreases dramatically as knowledge becomes increasingly distributed. • Pursuit Learning • Efficient: does not analyze document content • Effective, although not content sensitive • “The Pursuit Learning approach did not depend on document content. By acquiring knowledge through reinforcements based on collaborations, this algorithm was able to construct paths for documents to find relevant classifiers effectively and efficiently.” • Nearest Centroid Learning • Inefficient: analyzes document content • Effective • Future work • Other text collections • Knowledge overlap among the agents • Local neighborhoods