
Personalized Presentation: Pick Your Choice

Presentation Transcript


  1. Personalized Presentation: Pick Your Choice
     • My PhD Theme
       • Research that contributes to making data mining widely applicable
     • Your choice of data mining topics:
       • To Protect and Serve: Automated Construction of Classifiers for Scene Classification and Porn Filtering
       • Viral Mining: Benchmarking Artificial Immune Systems for Classification Tasks
       • Keep it Simple and Be Self Confident: a Bias Variance Analysis of the CoIL Challenge 2000 Data Mining Competition

  2. Viral Mining: Benchmarking Artificial Immune Systems for Classification Tasks
     Ling Jun Meng & Peter van der Putten

  3. Problem Statement
     • Artificial Immune Systems: the newest biologically inspired computing paradigm
     • Question:
       • ‘Old wine in new bags’? Do they add value for real-world data mining?
     • Approach:
       • Benchmark AIRS (Artificial Immune Recognition System) for classification as an end user would apply it (real-world conditions)
       • Characterize AIRS relative to other algorithms and data sets

  4. Background
     • ‘The Second Brain’
     • Immune response
       • Primary: in response to an intruder
       • Secondary: remembers the intrusion
     • Immune system entities
       • Antigens: the intruder
       • B-cells, antibodies, T-cells: cells / proteins produced in response
       • Memory cells: memorize the intrusion

  5. Artificial Immune Recognition System (AIRS)
     [Diagram: memory cell pool and ARB (artificial recognition ball) pool]
     1. Present a training instance
     2. Generate a candidate memory cell
     3. Add it to the memory cell pool or replace an existing memory cell
     4. Repeat until all the training instances are represented
     5. Classification

  6. Overview of the AIRS algorithm
     • Seed the memory cell pool (MC)
     • For each training instance (ag_i), do:
       • If MC is empty, add ag_i to MC
       • Select the memory cell (mc) in MC of the same class that has the highest affinity to ag_i
       • Clone mc in proportion to its affinity to ag_i
       • Mutate each clone and add it to the ARB pool (AB)
       • Allocate resources to AB and remove the weakest cells (limited resource mechanism)
       • Calculate the average stimulation of AB by ag_i to check for termination
       • If the termination criterion is not met, clone and mutate a random selection of ARB cells and check termination again; repeat until it is met
       • Select the ARB cell with the highest affinity as mc_candidate; if mc_candidate has a higher affinity than mc, add mc_candidate to MC, and if mc and mc_candidate are sufficiently similar, remove mc from MC
     • Perform kNN classification using MC
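The steps above can be sketched in Python roughly as follows. This is a minimal, simplified sketch, not the implementation benchmarked in the presentation. Assumptions: features are scaled to [0, 1], affinity is Euclidean distance, clones always keep their parent's class, and all parameter values (`CLONAL_RATE`, `HYPER_CLONAL_RATE`, `MUTATION_RATE`, `TOTAL_RESOURCES`, `STIM_THRESHOLD`, `AFFINITY_THRESHOLD_SCALAR`, `K`, `MAX_ROUNDS`) and helper names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations

# Hypothetical parameter values; the slides do not specify any.
CLONAL_RATE = 10
HYPER_CLONAL_RATE = 2.0
MUTATION_RATE = 0.1
TOTAL_RESOURCES = 150.0
STIM_THRESHOLD = 0.9
AFFINITY_THRESHOLD_SCALAR = 0.2
K = 3
MAX_ROUNDS = 50  # safety cap on the ARB refinement loop


def affinity(a, b):
    """Euclidean distance (smaller means more similar)."""
    return math.dist(a, b)


def stimulation(a, b):
    """Similarity in [0, 1], assuming features scaled to [0, 1]."""
    return 1.0 - affinity(a, b) / math.sqrt(len(a))


def mutate(x, rng):
    """Replace each feature by a random value with probability MUTATION_RATE."""
    return [rng.random() if rng.random() < MUTATION_RATE else v for v in x]


def allocate_resources(arbs, ag):
    """Limited-resource mechanism: keep the most stimulated ARBs whose
    resource claims (stimulation * CLONAL_RATE) fit in TOTAL_RESOURCES."""
    ranked = sorted(arbs, key=lambda a: stimulation(a, ag), reverse=True)
    kept, used = [], 0.0
    for a in ranked:
        cost = stimulation(a, ag) * CLONAL_RATE
        if kept and used + cost > TOTAL_RESOURCES:
            break
        kept.append(a)
        used += cost
    return kept


def train(data, rng):
    """data: list of (features, label); returns the memory cell pool MC."""
    dists = [affinity(p, q) for (p, _), (q, _) in combinations(data, 2)]
    avg_aff = sum(dists) / max(len(dists), 1)
    mc_pool = []
    for ag, label in data:
        same = [c for c in mc_pool if c[1] == label]
        if not same:  # seed MC with ag (also covers an empty pool)
            mc_pool.append((list(ag), label))
            continue
        mc = max(same, key=lambda c: stimulation(c[0], ag))
        mc_stim = stimulation(mc[0], ag)
        # Clone mc in proportion to its stimulation, mutate clones into AB.
        n_clones = int(HYPER_CLONAL_RATE * CLONAL_RATE * mc_stim)
        arbs = [list(mc[0])] + [mutate(mc[0], rng) for _ in range(n_clones)]
        for _ in range(MAX_ROUNDS):
            arbs = allocate_resources(arbs, ag)
            avg_stim = sum(stimulation(a, ag) for a in arbs) / len(arbs)
            if avg_stim >= STIM_THRESHOLD:
                break
            # Not terminated: clone and mutate a random selection of ARBs.
            for parent in rng.sample(arbs, min(5, len(arbs))):
                n = max(1, int(CLONAL_RATE * stimulation(parent, ag)))
                arbs.extend(mutate(parent, rng) for _ in range(n))
        candidate = max(arbs, key=lambda a: stimulation(a, ag))
        if stimulation(candidate, ag) > mc_stim:
            mc_pool.append((candidate, label))
            # Remove mc if the candidate is sufficiently similar to it.
            if affinity(candidate, mc[0]) < AFFINITY_THRESHOLD_SCALAR * avg_aff:
                mc_pool.remove(mc)
    return mc_pool


def classify(mc_pool, x, k=K):
    """k-nearest-neighbour majority vote over the memory cells."""
    nearest = sorted(mc_pool, key=lambda c: affinity(c[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

For example, `pool = train(data, random.Random(0))` followed by `classify(pool, [0.12, 0.12])`. Passing a seeded `random.Random` keeps runs reproducible; the final kNN step is why the memory cell pool, not the ARB pool, is what the trained model keeps.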

  7. Experiments: Data sets

  8. Experiments: Algorithms

  9. Experiments: Results

  10. Algorithm similarity methods
      • Correlation measure
        • Prediction better or worse than average
        • Correlation on standardized accuracy
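One plausible reading of these two measures is sketched below. Accuracies are first standardized per data set across algorithms (z-scores, which removes differences in data set difficulty). The standardized accuracy vectors of two algorithms can then be correlated directly, or each score can first be reduced to +1/-1 for better/worse than average before correlating. The function names and the toy accuracy table are hypothetical and purely illustrative; they are not the results reported in the presentation.

```python
from statistics import mean, pstdev


def standardize(acc):
    """acc maps algorithm -> accuracies, one per data set (same order).
    Returns z-scores computed per data set across all algorithms."""
    algos = list(acc)
    n_sets = len(acc[algos[0]])
    z = {a: [] for a in algos}
    for d in range(n_sets):
        column = [acc[a][d] for a in algos]
        mu, sd = mean(column), pstdev(column)
        for a in algos:
            z[a].append((acc[a][d] - mu) / sd if sd > 0 else 0.0)
    return z


def better_or_worse(z):
    """+1 where an algorithm beats the per-data-set average, else -1."""
    return {a: [1.0 if v > 0 else -1.0 for v in zs] for a, zs in z.items()}


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


# Toy accuracies for three hypothetical algorithms on three data sets.
acc = {"A": [0.80, 0.70, 0.90], "B": [0.82, 0.72, 0.91], "C": [0.70, 0.75, 0.60]}
z = standardize(acc)
signs = better_or_worse(z)
print(pearson(z["A"], z["B"]), pearson(signs["A"], signs["C"]))
```

With these toy numbers, A and B rise and fall together relative to the average and come out strongly positively correlated, while A and C move in opposite directions and come out negatively correlated.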

  11. Algorithm similarity results

  12. The influence of data set size
      • Experimental design
        • Ten artificial data sets derived from Diabetes: Diabetes-10%, Diabetes-20%, through Diabetes-90%, and Diabetes-100%
        • Fit a logarithmic trend curve to make the pattern easier to see
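A minimal sketch of this design follows. It assumes that each Diabetes-N% set is an independent simple random sample of the full data (the slides do not say whether sampling was stratified or nested) and that the trend curve is an ordinary least-squares fit of accuracy = a + b * ln(size). The function names are hypothetical.

```python
import math
import random


def make_subsets(data, rng, steps=10):
    """Return {percent: subset} for 10%, 20%, ..., 100% of `data`.
    Assumption: each subset is an independent simple random sample."""
    subsets = {}
    for i in range(1, steps + 1):
        size = max(1, round(len(data) * i / steps))
        subsets[100 * i // steps] = rng.sample(data, size)
    return subsets


def fit_log_trend(sizes, accuracies):
    """Least-squares fit of accuracy = a + b * ln(size); returns (a, b)."""
    xs = [math.log(s) for s in sizes]
    mx = sum(xs) / len(xs)
    my = sum(accuracies) / len(accuracies)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, accuracies))
    b = sxy / sxx
    return my - b * mx, b
```

In use, one would train and evaluate each algorithm on every subset, then call `fit_log_trend` on the subset sizes and the resulting accuracies. Fitting against ln(size) turns the diminishing-returns shape of a learning curve into a straight line, so the slope `b` gives one number for how quickly an algorithm gains from additional data.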

  13. Results of data size influence

  14. Pattern 1

  15. Pattern 2

  16. Pattern 3

  17. Discussion
      • AIRS can be considered a reasonable classifier. However, contrary to earlier claims that AIRS performs far better than the average algorithm, its performance is very close to the average of the benchmarked algorithms.
      • Across the benchmark data sets, AIRS behaves most like IBk and MLP.
      • As data set size increases, AIRS performance improves faster than that of MLP but slower than that of IB1. Its learning curve is similar to those of the other algorithms.

  18. Questions?
      www.liacs.nl/~putten
      putten@liacs.nl
