PROBABILISTIC DISTANCE MEASURES FOR PROTOTYPE-BASED RULES

Marcin Blachnik, Tadeusz Wieczorek
Department of Electrotechnology, Faculty of Materials Engineering & Metallurgy, The Silesian University of Technology, Poland
Włodzisław Duch
Department of Informatics, Nicolaus Copernicus University, Poland; School of Computer Engineering, Nanyang Technological University, Singapore

Outline
• Types of rules
• What are prototype rules?
• Heterogeneous distance function
• Probability density function (PDF) estimation
• Results
• Conclusions
Taiwan

Types of rules
• Crisp logical rules.
• Rough sets and logic.
• Fuzzy rules (F-rules).
• Prototype rules (P-rules): the most general?
P-rules with additive similarity functions may be converted into neurofuzzy rules with "natural" membership functions, including for nominal features.
P-rules do not need a feature space, only a similarity measure between objects.
There are many neurofuzzy programs, but no P-rule programs so far.

Motivation
• When understanding data and situations, recognizing objects or making diagnoses, people frequently use similarity to known cases and rarely use logical reasoning, yet soft computing experts use logic instead of similarity.
• Relations between similarity and logic are not clear.
• Q1: How can the same decision borders be obtained in fuzzy logic systems and prototype rule-based systems?
• Q2: What type of similarity measure corresponds to typical fuzzy membership functions, and vice versa?
• Q3: How can one type of system be transformed into the other while preserving its decision borders?
• Q4: Are there any advantages to such transformations?
• Q5: Can we understand data better using prototypes instead of logical rules?

Example

Prototype rules - advantages
• Inspired by cognitive psychology: when understanding data and situations, recognizing objects or making diagnoses, people frequently use similarity to known cases and rarely use logical reasoning.
• With heterogeneous distance functions, P-rules support all types of attributes: continuous, discrete, symbolic and nominal, while F-rules require numerical inputs.
• Locally linear decision borders help to avoid overfitting.
• Many algorithms for prototype selection and optimization exist, but they have not been applied to understanding data.
• Applications of P-rules to real datasets give excellent results with a small number of prototypes.

Prototype rules - learning
The learning process involves:
• selecting a similarity or dissimilarity (distance) function
• model optimization: the number and positions of prototypes
The decision-making task consists of:
• calculating the distance (similarity) to each prototype
• applying a P-rule to calculate the output class
Nearest-neighbour rule: If P = argmin_{P'} D(X, P') then Class(X) = Class(P)
Threshold rule: If D(X, P) ≤ d_P then Class(X) = Class(P)
Taking D(X, P) to be the Chebyshev distance yields crisp logical rules.
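The two rule types can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy prototypes and the thresholds are all hypothetical, and the threshold rule here simply returns the first prototype whose condition fires.

```python
import numpy as np

def chebyshev(x, p):
    """Chebyshev (L-infinity) distance: the largest per-feature difference."""
    return float(np.max(np.abs(np.asarray(x, dtype=float) - np.asarray(p, dtype=float))))

def nearest_prototype_rule(x, prototypes, labels, dist=chebyshev):
    """Nearest-neighbour rule: return the class of the closest prototype."""
    distances = [dist(x, p) for p in prototypes]
    return labels[int(np.argmin(distances))]

def threshold_rule(x, prototypes, labels, thresholds, dist=chebyshev):
    """Threshold rule: return the class of the first prototype P with
    D(x, P) <= d_P, or None if no rule fires (an assumed convention)."""
    for p, label, d_p in zip(prototypes, labels, thresholds):
        if dist(x, p) <= d_p:
            return label
    return None

# Hypothetical toy prototypes, one per class.
prototypes = [(0.0, 0.0), (4.0, 4.0)]
labels = ["A", "B"]
thresholds = [1.0, 1.5]

print(nearest_prototype_rule((1.0, 0.5), prototypes, labels))        # A
print(threshold_rule((3.0, 5.0), prototypes, labels, thresholds))    # B
print(threshold_rule((2.0, 2.0), prototypes, labels, thresholds))    # None
```

With the Chebyshev distance, the condition D(X, P) ≤ d_P holds exactly when every feature satisfies |x_i - p_i| ≤ d_P, so each threshold rule is a conjunction of interval conditions, which is the form of a crisp logical rule.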

Applications to real data (ICONIP’2004)
Gene expression data for 2 types of leukaemia (Golub et al., Science 286 (1999) 531-537)
• Description: 2 classes, 1100 features, 3 most relevant selected.
• Methods used: 1 prototype/class LVQ, DVDM similarity measure.
• Results (number of misclassified vectors):
Searching for promoters in DNA strings
• Description: 2 classes, 57 features, all symbolic.
• Methods used: 9 prototypes for promoters, 12 for non-promoters, generated using C-means + LVQ, with the VDM similarity measure.
• Results: 5 misclassified vectors in a leave-one-out test.

Distance (similarity) functions
• Continuous attributes
• Probabilistic metrics
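For reference, the two families named on this slide are usually written as follows. This is a sketch in common textbook notation and may differ from the formulas the authors used: the exponent $\alpha$, the feature index $i$ over $n$ features, and the class index $j$ over $C$ classes are assumptions.

```latex
% Minkowski distance for continuous attributes (alpha = 2 gives Euclidean)
D_{\mathrm{Mink}}(X, Y)^{\alpha} = \sum_{i=1}^{n} |x_i - y_i|^{\alpha}

% Chebyshev distance (the limit alpha -> infinity)
D_{\mathrm{Ch}}(X, Y) = \max_{i} |x_i - y_i|

% Value Difference Metric (VDM), a probabilistic metric
D_{\mathrm{VDM}}(X, Y)^{\alpha} = \sum_{i=1}^{n} \sum_{j=1}^{C}
    \left| p(C_j \mid x_i) - p(C_j \mid y_i) \right|^{\alpha}
```

The VDM compares two feature values through the class probabilities they induce, so it applies to symbolic values that have no natural ordering.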

Heterogeneous distance function
Combine contributions from symbolic and real-valued features to get the distance, or use only probabilistic measures.
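One way to combine the two kinds of contribution is sketched below in Python. It is an illustration under stated assumptions, not the authors' formula: symbolic features contribute a VDM term computed from counted class probabilities, real-valued features contribute a range-normalised difference, and the helper names, the range normalisation and the exponent `alpha` are all hypothetical choices.

```python
from collections import Counter, defaultdict
import math

def class_conditional_probs(values, classes):
    """Estimate p(class | value) for one symbolic feature by counting."""
    class_names = sorted(set(classes))
    counts = defaultdict(Counter)
    for v, c in zip(values, classes):
        counts[v][c] += 1
    return {
        v: [cnt[c] / sum(cnt.values()) for c in class_names]
        for v, cnt in counts.items()
    }

def vdm_term(a, b, probs, alpha=2):
    """VDM contribution of one symbolic feature:
    sum over classes of |p(C_j|a) - p(C_j|b)|^alpha."""
    return sum(abs(pa - pb) ** alpha for pa, pb in zip(probs[a], probs[b]))

def heterogeneous_distance(x, y, symbolic_probs, ranges, alpha=2):
    """Add VDM terms (symbolic features) and range-normalised
    differences (real-valued features), then take the alpha-th root.
    `symbolic_probs` maps feature index -> probability table,
    `ranges` maps feature index -> value range (assumed known)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(x, y)):
        if i in symbolic_probs:
            total += vdm_term(xi, yi, symbolic_probs[i], alpha)
        else:
            total += (abs(xi - yi) / ranges[i]) ** alpha
    return total ** (1.0 / alpha)

# Hypothetical toy data: feature 0 is symbolic, feature 1 is real-valued.
values = ["red", "red", "blue", "blue"]
classes = ["A", "A", "B", "B"]
symbolic_probs = {0: class_conditional_probs(values, classes)}
ranges = {1: 4.0}

d = heterogeneous_distance(("red", 1.0), ("blue", 5.0), symbolic_probs, ranges)
print(round(d, 3))  # 1.732
```

In the toy data "red" occurs only in class A and "blue" only in class B, so the VDM term is 2, the normalised real-valued term is 1, and the distance is the square root of 3.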

Probability density function estimation
Problem: how to combine the influence of continuous and nominal/symbolic attributes?
1. Normalization: make continuous and symbolic contributions comparable.
2. Estimation: convert continuous attributes to probabilities.
If estimation is used, there are several options for obtaining the probabilities:
• Discretization (DVDM)
• Discretization + interpolation (IVDM)
• Gaussian kernel estimation (GVDM)
• Rectangular Parzen window (LVDM)
• Rectangular moving Parzen window (PVDM)
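Two of the listed options, discretization and Gaussian kernel estimation, can be sketched as follows for a single continuous attribute. This is a minimal illustration and not necessarily the estimators used by the authors: the function names, equal-width binning, the uniform fallback for empty bins and the toy data are assumptions.

```python
import numpy as np

def discretized_posteriors(x, y, n_classes, n_bins):
    """Discretization-style estimate: p(class | bin) from equal-width bin counts.
    Returns the bin edges and an (n_bins, n_classes) probability table."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Use interior edges only, so every value falls in a bin 0..n_bins-1.
    bins = np.digitize(x, edges[1:-1])
    counts = np.zeros((n_bins, n_classes))
    np.add.at(counts, (bins, y), 1)
    totals = counts.sum(axis=1, keepdims=True)
    # Empty bins get a uniform posterior instead of a division by zero.
    table = np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_classes)
    return edges, table

def gaussian_posteriors(query, x, y, n_classes, sigma):
    """Gaussian-kernel estimate: p(class | query) as the kernel-weighted
    share of training points belonging to each class."""
    weights = np.exp(-0.5 * ((query - x) / sigma) ** 2)
    per_class = np.array([weights[y == c].sum() for c in range(n_classes)])
    return per_class / per_class.sum()

# Hypothetical toy attribute: class 0 has small values, class 1 large values.
x = np.array([0.0, 1.0, 2.0, 8.0, 9.0, 10.0])
y = np.array([0, 0, 0, 1, 1, 1])

edges, table = discretized_posteriors(x, y, n_classes=2, n_bins=2)
print(table)
print(gaussian_posteriors(1.0, x, y, n_classes=2, sigma=1.0))
print(gaussian_posteriors(5.0, x, y, n_classes=2, sigma=1.0))
```

Once every continuous value is mapped to a vector of class probabilities, the same probabilistic metric can be applied to continuous and symbolic attributes alike. The `n_bins` and `sigma` arguments control how smooth the estimate is.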

Figure: 3 overlapping Gaussians in 4D, good parameters for estimation. Panels: Discretization; Discretization & Interpolation; Gaussian kernel; Rectangular Parzen window; Moving Parzen window.

Figure: 3 overlapping Gaussians in 4D, bad parameters for estimation. Panels: Discretization; Discretization & Interpolation; Gaussian kernel; Moving Parzen window; Rectangular Parzen window.

Testing and comparison procedure
Two artificial 2D datasets for testing:
• 200 vectors/class, uniform distribution
• 200 vectors/class, normal distribution
6 real datasets with mixed symbolic/real features:
• Flags (UCI repository)
• Glass (UCI repository)
• Promoters (UCI repository)
• Wisconsin Breast Cancer, WBC (UCI repository)
• Pima Indians diabetes (UCI repository)
• Lancet (from A.J. Walker, S.S. Cross, R.F. Harrison, Visualization of biomedical datasets by use of growing cell structure networks: a novel diagnostic classification technique. Lancet Vol. 354, pp. 1518-1522, 1999.)
A 10-fold cross-validation (CV) test procedure is used for all tasks.
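A 10-fold cross-validation loop of the kind mentioned here can be sketched as below. This is a generic illustration, not the authors' test harness: the function names, the unstratified random split, the 1-nearest-neighbour stand-in classifier and the toy data are all assumptions.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(X, y, fit_predict, k=10, seed=0):
    """Average accuracy over k folds. `fit_predict(X_tr, y_tr, X_te)` is a
    user-supplied (hypothetical) callable returning predicted labels."""
    folds = k_fold_indices(len(X), k, seed)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        predictions = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        accuracies.append(np.mean(predictions == y[test_idx]))
    return float(np.mean(accuracies))

def one_nn(X_tr, y_tr, X_te):
    """Stand-in classifier: 1-nearest neighbour with Euclidean distance."""
    d = np.linalg.norm(X_te[:, None, :] - X_tr[None, :, :], axis=2)
    return y_tr[np.argmin(d, axis=1)]

# Hypothetical toy data: two well-separated 2D clusters, 20 vectors per class.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
y = np.repeat([0, 1], 20)

accuracy = cross_validate(X, y, one_nn, k=10)
print(accuracy)
```

Each vector is used for testing exactly once, and any prototype-based classifier can be plugged in through `fit_predict`.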

Classification results
Results on artificial datasets. Left: Gaussian distributed. Right: uniformly distributed. Similar results, except for convergence problems.
Datasets with all symbolic or discrete values: leave-one-out results.

Real datasets

Results & discussion
• Selection of appropriate parameters is very important.
• Parameter values that increase the sensitivity of the estimation methods lead to overfitting: sigma too small (Gaussian estimation), window too narrow (rectangular Parzen window estimation), too many bins in discretization.
• Parameter values that decrease the sensitivity of the estimation methods lead to over-generalization: sigma too large (Gaussian estimation), window too wide (rectangular Parzen window estimation), too few bins in discretization.
• Middle values of the parameters are the best starting points and lead to good results (sigma 0.5, Parzen width 0.5, Parzen step 0.01).

Some conclusions
• This is a first step in understanding the relations between fuzzy and similarity-based systems.
• Prototype rules can be expressed using fuzzy rules and vice versa, leading to new possibilities in both fields: new types of membership functions and new types of distance functions.
• Expert knowledge can be captured in any kind of rules, but sometimes it may be more natural to express knowledge as P-rules (similarity) or as F-rules (logical conditions).
• The VDM measure used in P-rules leads to a natural shape of membership functions in fuzzy logic for symbolic data.
• There is no single best choice of heterogeneous distance function type, PDF estimation method or probabilistic metric.
• The simplest methods may lead to good results.
• Selection of appropriate parameters is very important.
• P-systems should be as popular as neurofuzzy systems, although many open problems remain, both theoretical and practical.

Thank you for lending your ears ...
