Abstract • This study presents an analysis of two modified fuzzy ARTMAP neural networks. The modifications are first introduced mathematically. Then the performance of the systems is studied on benchmark examples with noiseless data. It is shown that each modified ARTMAP system achieves classification accuracy superior to that of standard fuzzy ARTMAP, while retaining comparable complexity of the internal code.

1. In the first modified ARTMAP system, a graded choice-by-difference (CBD) signal function makes the choice signal Tj depend on the position of the input, even when the input lies within the category box Rj: an input near the center of the box Rj generates a larger signal Tj than an input near the boundary of the box. To ensure that the same input would choose the same category if it were immediately re-presented (direct access), the ART match rule was also modified to correspond to the new choice rule. The resulting graded signal function system creates more accurate decision boundaries, especially when these boundaries are not parallel to the axes of the input space.
2. In the second modified ARTMAP system, all category boxes Rj are point boxes. This simplified network learns with a fast-commit/no-recode rule, which does not allow any learning at node j once category j has been established. In addition, vigilance (ρ) is set to zero, which eliminates the matching system. Each input that makes a predictive error creates a new category, which is encoded as the input itself. The classification accuracy obtained with this point-box system is better than that of the other studied systems. However, the point-box system has a potential drawback: its memory requirements may be high for large databases. To alleviate this problem, a strategy for on-line elimination of redundant categories is proposed and evaluated. This strategy can be interpreted as a rule for on-line forgetting of certain stored memories. Its application leads to a significant reduction in memory requirements while retaining classification accuracy.
ARTMAP: a neural network for supervised learning. [Diagram: input → signal function → category choice → output class b]
1. Signal Functions
• ART1: Weber law (1987)
• Fuzzy ARTMAP: choice-by-difference (CBD, 1994)
• NEW: graded signal function that improves behavior for certain types of data
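For reference, the two earlier choice functions named above have standard forms in the fuzzy ART literature (they are not restated on this poster). With input I, category weights wj, fuzzy AND ∧ (component-wise minimum), city-block norm |·|, choice parameter α > 0, and M = |I| under complement coding:

```latex
T_j = \frac{|I \wedge w_j|}{\alpha + |w_j|}
      \qquad \text{(Weber law, ART 1 / fuzzy ART)}

T_j = |I \wedge w_j| + (1-\alpha)\,(M - |w_j|)
      \qquad \text{(choice-by-difference, CBD)}
```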
SIMULATIONS: averages over 10 runs, 1 training epoch. Training points: 1,000 (DIAGONAL) or 10,000 (CIS); testing points: 10,000.

CBD signal function (1994):
• DIAGONAL: 91.70 % correct, 15.2 coding nodes
• Circle-in-Square (CIS): 94.41 % correct, 40.7 coding nodes
NEW graded signal function:
• DIAGONAL: 95.26 % correct (h = 1), 17.8 coding nodes
• Circle-in-Square (CIS): 96.73 % correct (h = 0.5), 46.4 coding nodes
• Smoother boundaries
• Improved % correct, but slightly more nodes
Choice Signal with CBD and Graded Signal Function in Two-Dimensional Input Space
• CBD signal function (1994): signal constant for inputs within Rj; flat top (h = 0)
• NEW graded signal function: signal position-dependent for inputs within Rj; peak height determined by parameter h
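The poster gives only the qualitative shape of the graded signal: flat over Rj for CBD, and peaked at the box center with height h for the graded variant. Below is a minimal Python sketch with this behavior; the function names and the linear "tent" profile inside Rj are our assumptions, not the authors' exact formula.

```python
import numpy as np

def cbd_signal(a, u, v, alpha=0.01):
    """Standard CBD choice signal.
    a : input point in [0,1]^M; u, v : corners of category box Rj (u <= v).
    Complement coding: I = (a, 1-a), wj = (u, 1-v); for a inside Rj,
    |I ^ wj| = |wj|, so the signal is constant over the box (flat top)."""
    a, u, v = (np.asarray(x, dtype=float) for x in (a, u, v))
    M = a.size
    I = np.concatenate([a, 1.0 - a])
    w = np.concatenate([u, 1.0 - v])
    return np.minimum(I, w).sum() + (1.0 - alpha) * (M - w.sum())

def graded_signal(a, u, v, h=1.0, alpha=0.01):
    """Hypothetical graded CBD signal: equals the CBD value outside Rj,
    but rises above the flat CBD top inside Rj, peaking at the box
    center with height h (h = 0 recovers plain CBD)."""
    a, u, v = (np.asarray(x, dtype=float) for x in (a, u, v))
    base = cbd_signal(a, u, v, alpha)
    if h == 0.0 or not (np.all(a >= u) and np.all(a <= v)):
        return base
    center = (u + v) / 2.0
    half = np.maximum((v - u) / 2.0, 1e-12)
    # normalized distance to the center: 0 at the center, 1 on the boundary
    d = np.max(np.abs(a - center) / half)
    return base + h * (1.0 - d)
```

With h = 0 all inputs inside Rj produce the same Tj; with h > 0 an input near the center of Rj generates a larger Tj than one near the boundary, which is what tilts the decision boundaries between overlapping boxes.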
Decision Boundaries Between Overlapping Category Boxes with CBD and Graded Signal Function
[Figure: the boundary between overlapping boxes R1 and R2 under the CBD and the graded signal functions]
ARTMAP design principle: repeated presentation of the same input should lead to choice of the correct node without search (direct access). The graded signal function therefore requires a modified match function.
Graded signal function implementation: [Diagram: input → signal Tj → match function → reset or resonance? → output class b]
2. Point ARTMAP
A minimum algorithm with the critical properties of Adaptive Resonance Theory. [Diagram: ARTMAP (input a → output class b) reduced to Point ARTMAP (input a → labeled coding points → output class b)]
Point ARTMAP - Learning Cycle
1. New input presentation leads to a choice of the closest coding point.
2a. If the chosen coding node matches the output class ... do nothing.
2b. If it does not ... store the current input into memory as a new labeled coding point.
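Because vigilance is zero and committed nodes never change, the whole cycle reduces to a few lines. A minimal Python sketch, assuming "closest" means Euclidean distance (the metric is not specified above) and using class and method names of our choosing:

```python
import numpy as np

class PointARTMAP:
    """Sketch of the Point ARTMAP learning cycle described above:
    each coding node is a stored (point, label) pair; fast-commit /
    no-recode, vigilance = 0, so there is no matching system."""

    def __init__(self):
        self.points = []   # coding points (inputs stored verbatim)
        self.labels = []   # their output classes

    def predict(self, a):
        # Step 1: choice of the closest coding point (assumes >= 1 node).
        d = [np.linalg.norm(a - p) for p in self.points]
        return self.labels[int(np.argmin(d))]

    def train_one(self, a, label):
        a = np.asarray(a, dtype=float)
        if not self.points or self.predict(a) != label:
            # Step 2b: predictive error -> store the input itself.
            self.points.append(a)
            self.labels.append(label)
        # Step 2a: correct prediction -> do nothing (no recoding).
```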
Point ARTMAP:
• DIAGONAL: 97.8 % correct, 59.2 coding nodes
• Circle-in-Square (CIS): 98.72 % correct, 275 coding nodes
Point ARTMAP - Results
• Best performance
• Fastest learning
• When training stops at a given fuzzy ARTMAP accuracy, memory sizes are comparable
• Potential danger of many coding nodes

HOW to restrict network size while assuring improvement of performance as more patterns are presented?
Possible solution: continually compute for each node a measure of its usefulness/criticality, and eliminate the least useful nodes as needed.
Usefulness Rule
When to eliminate a node?
• Many possible choices; in this study, a "hard limit":
• Define the maximum network size N.
• Once network size N is reached, one of the existing nodes (the least useful one) is eliminated every time a new node is created.

How to compute the usefulness of nodes?
• Required general properties:
• local, fast, simple computation
• computed only for one (or a few) nodes per input presentation

Criterion for acceptability of any usefulness rule: let M_N be the number of training patterns from a given training set in response to which the network reaches size N. The usefulness rule must ensure that the network code learned in response to M > M_N inputs is better than that for M_N inputs.
Usefulness Rule - Definition
General definition: the usefulness of a node is the error the network would make if that node were eliminated.
Implemented rule:
• Usefulness is updated every time a node wins the competition and gives a correct prediction (step 2a).
• Usefulness increases if the node is critical, i.e., if elimination of the current winner would lead to a predictive error.
• Usefulness decreases if the node is non-critical, i.e., the network would give a correct prediction even without it.
• Initial usefulness is zero.
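Combining the hard limit from the previous panel with this update gives a small extension of the Point ARTMAP sketch above. The ±1 step size and the use of the second-closest node to test criticality are our reading of "elimination of the current winner would lead to a predictive error"; the poster does not give the exact increments.

```python
import numpy as np

class PointARTMAPWithElimination(PointARTMAP):
    """Sketch of Point ARTMAP with the hard-limit usefulness rule."""

    def __init__(self, max_nodes):
        super().__init__()
        self.N = max_nodes
        self.usefulness = []   # one value per coding node, initially zero

    def train_one(self, a, label):
        a = np.asarray(a, dtype=float)
        if self.points:
            d = [np.linalg.norm(a - p) for p in self.points]
            order = np.argsort(d)
            winner = int(order[0])
            if self.labels[winner] == label:
                # Step 2a: winner predicts correctly -> update its usefulness
                # by checking what the runner-up would have predicted.
                runner_up = int(order[1]) if len(order) > 1 else winner
                if self.labels[runner_up] != label:
                    self.usefulness[winner] += 1.0   # critical node
                else:
                    self.usefulness[winner] -= 1.0   # non-critical node
                return
        # Step 2b: predictive error -> create a new node; at the hard
        # limit N, first eliminate the least useful existing node.
        if len(self.points) >= self.N:
            worst = int(np.argmin(self.usefulness))
            for lst in (self.points, self.labels, self.usefulness):
                del lst[worst]
        self.points.append(a)
        self.labels.append(label)
        self.usefulness.append(0.0)
```

The update is local and cheap: only the current winner's usefulness changes, and only one extra distance comparison (the runner-up) is needed per presentation, matching the required properties listed above.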
Point ARTMAP with On-line Elimination
[Figure: Circle-in-Square (CIS) results with network size frozen after 10,000 iterations (275 nodes)]
Development of Internal Code in Point ARTMAP with On-line Elimination
[Panels: learned code after presentation of training sets of 10,000; 100,000; 200,000; 300,000; 400,000; and 500,000 points]
Point ARTMAP with On-line Elimination - Discussion
Properties:
• ability to improve coding with time without growing in size
• ability to correct a learned error
• ability to adapt in a non-stationary environment
• can be understood as a rule for optimal forgetting

In a noisy environment:
• without on-line elimination, the system will grow without limit
• the current elimination rule can lead to incorrect behavior
• goal: find a rule that will secure optimum performance
Summary
• The graded signal function is an extension of the CBD signal function that distinguishes between points within category boxes.
• Point ARTMAP is a minimum version of a system for supervised learning based on Adaptive Resonance Theory. It is fast and very simple, but with a tendency to proliferate stored categories. This deficiency can be alleviated by several local pruning rules.
• Both systems, especially Point ARTMAP, performed very well on benchmark problems with noiseless data.
• Additional testing on noisy data is needed.
Appendix 1: Simulations with Diagonal Data
[Plots: % correct predictions and number of coding nodes vs. steepness parameter h, for 100, 1,000, and 10,000 training points]
Simulations of fuzzy ARTMAP with the standard CBD function (h = 0) and with the graded CBD function (h > 0), and of Point ARTMAP, on the diagonal benchmark problem.
Appendix 2: Simulations with Circle-in-the-Square Data
[Plots: % correct predictions and number of coding nodes vs. steepness parameter h, for 100, 1,000, and 10,000 training points]
Simulations of fuzzy ARTMAP with the standard CBD function (h = 0) and with the graded CBD function (h > 0), and of Point ARTMAP, on the circle-in-the-square benchmark problem.