Personalized Presentation: Pick Your Choice

Personalized Presentation: Pick Your Choice • My PhD Theme • Research that contributes to making data mining widely applicable • Your choice of data mining topics: • To Protect and Serve: Automated Construction of Classifiers for Scene Classification and Porn Filtering • Viral Mining: Benchmarking Artificial Immune Systems for Classification Tasks • Keep it Simple and Be Self Confident: a Bias Variance Analysis of the CoIL Challenge 2000 Data Mining Competition

Viral Mining: Benchmarking Artificial Immune Systems for Classification Tasks Ling Jun Meng & Peter van der Putten

Problem Statement • Artificial Immune Systems – the newest biologically inspired computing paradigm • Question • ‘Old wine in new bags’? Added value for real world data mining? • Approach: • Benchmark AIRS for classification by end user data mining (real world conditions) • Characterize AIRS relative to other algorithms and data sets

Background • ‘The Second Brain’ • Immune response • Primary: in response to an intruder • Secondary: remember the intrusion • Immune System Entities • Antigens: intruders • B-cells, antibodies, T-cells: cells / proteins produced in response • Memory cells: memorize the intrusion

Artificial Immune Recognition System (AIRS) • Figure labels: memory cell, ARB • 1. Present a training instance • 2. Generate a candidate memory cell • 3. Add the candidate to the memory cell pool, or replace an existing memory cell • 4. Repeat until all the training instances are represented • 5. Classification

Overview of the AIRS algorithm • Seed the memory cell pool (MC) • For each training instance (agi), do: • If MC is empty, add agi to MC • Select the memory cell (mc) in MC of the same class that has the highest affinity to agi • Clone mc in proportion to its affinity to agi • Mutate each clone and add it to the ARB pool (AB) • Allocate resources to AB and remove the weak cells (limited resource mechanism) • Calculate the average stimulation of AB to agi to check for termination • If the termination condition is not met, clone and mutate a random selection of ARB cells, then check termination again. Repeat until termination. • Select the ARB cell with the highest affinity as mc_candidate. If mc_candidate has a higher affinity than mc, add mc_candidate to MC. If mc and mc_candidate are sufficiently similar, remove mc from MC. • Perform kNN classification using MC.
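The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the benchmarked AIRS implementation: the affinity function (1 / (1 + Euclidean distance)), the Gaussian mutation, the "keep the strongest cells" stand-in for the limited resource mechanism, and every parameter name and default value are assumptions made for the example.

```python
import random

def affinity(a, b):
    """Assumed affinity: 1 / (1 + Euclidean distance), so closer means higher."""
    dist = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return 1.0 / (1.0 + dist)

def mutate(vec, rate, rng):
    """Assumed mutation: add Gaussian noise of scale `rate` to each feature."""
    return [x + rng.gauss(0.0, rate) for x in vec]

def train_airs(data, n_clones=10, max_arbs=20, stim_threshold=0.8,
               replace_threshold=0.95, mutation_rate=0.1, max_rounds=20, seed=0):
    """data: list of (feature_vector, label). Returns the memory cell pool."""
    rng = random.Random(seed)
    memory = []  # memory cell pool MC, as (vector, label) pairs
    for features, label in data:
        same_class = [m for m in memory if m[1] == label]
        if not same_class:
            # Simplified seeding: the first instance of a class becomes a memory cell.
            memory.append((list(features), label))
            continue
        # Best-matching memory cell of the same class.
        mc = max(same_class, key=lambda m: affinity(m[0], features))
        # Clone in proportion to affinity, mutating each clone into the ARB pool.
        count = max(1, round(n_clones * affinity(mc[0], features)))
        arbs = [mutate(mc[0], mutation_rate, rng) for _ in range(count)]
        for _ in range(max_rounds):
            # Stand-in for limited resources: keep only the strongest ARBs.
            arbs.sort(key=lambda v: affinity(v, features), reverse=True)
            arbs = arbs[:max_arbs]
            avg = sum(affinity(v, features) for v in arbs) / len(arbs)
            if avg >= stim_threshold:
                break  # average stimulation is high enough
            # Otherwise clone and mutate a random selection of ARBs.
            parents = rng.sample(arbs, max(1, len(arbs) // 2))
            arbs += [mutate(p, mutation_rate, rng) for p in parents]
        candidate = max(arbs, key=lambda v: affinity(v, features))
        if affinity(candidate, features) > affinity(mc[0], features):
            memory.append((candidate, label))
            if affinity(candidate, mc[0]) >= replace_threshold:
                memory.remove(mc)  # candidate is similar enough to replace mc
    return memory

def classify(memory, features, k=1):
    """kNN vote over the memory cells, using affinity as the similarity."""
    nearest = sorted(memory, key=lambda m: affinity(m[0], features), reverse=True)[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

Training returns only the memory cells, so classification cost depends on the size of MC rather than on the size of the training set.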

Experiments: Data sets

Experiments: Algorithms

Experiments: Results

Algorithm similarity methods • Correlation measure • Prediction better or worse than average • Correlation on standardized accuracy
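One way to read the two similarity measures is sketched below. It assumes that accuracies are standardized per data set (z-scores across algorithms), that "better or worse than average" means the sign of that z-score, and that similarity between two algorithms is the Pearson correlation of their standardized accuracies across data sets. The function names are hypothetical, and the slides do not spell out the exact procedure.

```python
from statistics import mean, pstdev

def standardize_per_dataset(acc):
    """acc: {algorithm: [accuracy per data set]} -> z-scores within each data set."""
    algs = list(acc)
    n_sets = len(acc[algs[0]])
    z = {a: [] for a in algs}
    for j in range(n_sets):
        column = [acc[a][j] for a in algs]
        mu, sigma = mean(column), pstdev(column)
        for a in algs:
            # If all algorithms tie on a data set, every z-score is 0.
            z[a].append((acc[a][j] - mu) / sigma if sigma else 0.0)
    return z

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def better_than_average(z):
    """1 where an algorithm beats the data set average, else 0."""
    return {a: [1 if v > 0 else 0 for v in vals] for a, vals in z.items()}
```

Standardizing within each data set first removes the effect of some data sets simply being easier than others, so the correlation reflects where two algorithms do relatively well or badly together.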

Algorithm similarity results

The influence of data set size • Experimental design • Ten artificial data sets, from Diabetes-10% and Diabetes-20% through Diabetes-90% to Diabetes-100% • A log trend curve is fitted to the results to make the pattern easier to see.
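A minimal sketch of this design, assuming the subsets are simple random samples and that the log trend curve has the form accuracy = a + b · ln(size), fitted by ordinary least squares. The helper names and the sampling scheme are assumptions; the slides do not say how the subsets were drawn.

```python
import math
import random

def size_fractions(steps=10):
    """Fractions 0.1, 0.2, ..., 1.0 for the ten data set sizes."""
    return [i / steps for i in range(1, steps + 1)]

def subsample(rows, fraction, seed=0):
    """Random subset holding roughly `fraction` of the rows (assumed scheme)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return shuffled[:max(1, round(len(rows) * fraction))]

def fit_log_trend(sizes, scores):
    """Least-squares fit of score = a + b * ln(size); returns (a, b)."""
    xs = [math.log(s) for s in sizes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(scores) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, scores)) / sxx
    a = my - b * mx
    return a, b
```

The slope b gives a single number per algorithm for how quickly accuracy grows with data set size, which is what makes the fitted curves easy to compare.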

Results of data size influence

Pattern 1

Pattern 2

Pattern 3

Discussion • AIRS can be regarded as a reasonable classifier. However, contrary to earlier claims that AIRS performs far better than average, its performance is very close to the average of the benchmarked algorithms. • On these benchmark data sets, AIRS behaves most like IBk and MLP. • As data set size increases, AIRS improves faster than MLP but more slowly than IB1. Its learning curve is similar to those of the other algorithms.

Questions? • www.liacs.nl/~putten • putten@liacs.nl
