Data mining II: The fuzzy way. Włodzisław Duch, Dept. of Informatics, Nicholas Copernicus University, Toruń, Poland, http://www.phys.uni.torun.pl/~duch. ISEP Porto, 8-12 July 2002.
Basic ideas
• Complex problems cannot be analyzed precisely.
• Expert knowledge may be approximated using imprecise concepts. For example: "If the weather is nice and the place is attractive, then not many participants stay at the school."
Fuzzy logic/systems include:
• Mathematics of fuzzy sets/systems, fuzzy logics.
• Fuzzy knowledge representation for clustering.
• Classification and regression.
• Extraction of fuzzy concepts and rules from data.
• Fuzzy control theory.
Types of uncertainty
• Stochastic uncertainty: rolling dice, accidents, insurance risk… – probability theory.
• Measurement uncertainty: about 3 cm; 20 degrees – statistics.
• Information uncertainty: trustworthy client, known constraints – data mining.
• Linguistic uncertainty: small, fast, low price – fuzzy logic.
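Crisp sets
young = $\{x \in M \mid \mathrm{age}(x) \le 20\}$
$$\mu_{young}(x) = \begin{cases} 1 & \mathrm{age}(x) \le 20 \\ 0 & \mathrm{age}(x) > 20 \end{cases}$$
[Figure: crisp membership function A = "young" over x [years]: 1 up to 20 years, 0 afterwards.]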
Fuzzy sets
X: universe (of discourse), space; $x \in X$.
A: linguistic variable, concept, fuzzy set.
$\mu_A$: a membership function (MF) determining the degree to which x belongs to A.
Linguistic variables, concepts: sums (unions) of fuzzy sets.
Logical predicate functions with continuous values.
A membership value is different from a probability: μ(bald) = 0.8 does not mean "bald in 4 out of 5 cases".
Probabilities are normalized to 1, MFs are not.
Fuzzy concepts are subjective and context-dependent.
Fuzzy examples
Crisp and fuzzy concept "young men".
[Figure: crisp A = "young" drops from 1 to 0 at x = 20 years; fuzzy A = "young" still has μ = 0.8 at x = 23 years.]
"Boiling temperature" has a value around 100 degrees (depending on pressure and chemistry).
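The crisp and fuzzy versions of "young" can be compared in a few lines of Python. This is only a sketch. The crisp cutoff at 20 years comes from the slide. The linear fuzzy slope from 20 to 35 years is a hypothetical choice, picked because it gives μ = 0.8 at x = 23 as in the figure.

```python
def mu_young_crisp(age: float) -> float:
    """Crisp set: full membership up to 20 years, none afterwards."""
    return 1.0 if age <= 20 else 0.0


def mu_young_fuzzy(age: float, full: float = 20.0, zero: float = 35.0) -> float:
    """Fuzzy set: 1 up to `full`, then a linear decrease reaching 0 at `zero`.

    The breakpoints 20 and 35 years are hypothetical, not taken from the slides.
    """
    if age <= full:
        return 1.0
    if age >= zero:
        return 0.0
    return (zero - age) / (zero - full)


for age in (18, 20, 23, 30, 40):
    print(age, mu_young_crisp(age), round(mu_young_fuzzy(age), 2))
```

At 23 years the crisp set gives 0 and the fuzzy set still gives 0.8, so a small change in age no longer flips the concept abruptly.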
A few definitions
Support of a fuzzy set A: $\mathrm{supp}(A) = \{x \in X : A(x) > 0\}$
Core of a fuzzy set A: $\mathrm{core}(A) = \{x \in X : A(x) = 1\}$
α-cut of a fuzzy set A: $A_\alpha = \{x \in X : A(x) > \alpha\}$
Height: $\max_x A(x) \le 1$
Normal fuzzy set: $\sup_{x \in X} A(x) = 1$
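These definitions translate directly into set comprehensions over a discrete fuzzy set. The sketch below assumes a finite universe stored as a dict that maps each element to its membership degree. The element names and degrees are made-up example values. The α-cut uses the strict inequality A(x) > α, matching the definition above.

```python
from typing import Dict, Set

# Hypothetical discrete fuzzy set: element -> membership degree.
A: Dict[str, float] = {"a": 0.0, "b": 0.3, "c": 0.6, "d": 1.0, "e": 0.8, "f": 0.0}


def support(fs: Dict[str, float]) -> Set[str]:
    """Elements with non-zero membership."""
    return {x for x, m in fs.items() if m > 0}


def core(fs: Dict[str, float]) -> Set[str]:
    """Elements with full membership."""
    return {x for x, m in fs.items() if m == 1}


def alpha_cut(fs: Dict[str, float], alpha: float) -> Set[str]:
    """Elements with membership strictly above alpha."""
    return {x for x, m in fs.items() if m > alpha}


def height(fs: Dict[str, float]) -> float:
    """For a finite universe the supremum is simply the maximum."""
    return max(fs.values())


def is_normal(fs: Dict[str, float]) -> bool:
    return height(fs) == 1


print(sorted(support(A)), sorted(core(A)), sorted(alpha_cut(A, 0.6)), height(A), is_normal(A))
```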
Definitions illustrated
[Figure: an MF over X with levels 1, 0.5 and α marked, showing the core, the crossover points (μ = 0.5), an α-cut and the support.]
Types of MF
Trapezoid: <a,b,c,d>; Gaussian/bell: $N(m, \sigma)$.
[Figure: trapezoidal MF μ(x) with corners a, b, c, d; bell-shaped Gaussian MF μ(x) of width σ.]
MF examples
Singleton: (a, 1) and (b, 0.5). Triangular: <a,b,c>.
[Figure: singletons at a (height 1) and b (height 0.5); triangular MF μ(x) with corners a, b, c.]
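The MF shapes above can be written as plain functions. This is a minimal sketch. The parameter names follow the slides (<a,b,c,d> for the trapezoid, <a,b,c> for the triangle). For the Gaussian, the centre c and width σ use the common $e^{-(x-c)^2/2\sigma^2}$ form, which is an assumption about the exact formula. The trapezoid assumes a < b ≤ c < d.

```python
import math
from typing import Dict


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """<a,b,c,d>: rises on [a,b], equals 1 on [b,c], falls on [c,d]; assumes a < b <= c < d."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def triangular(x: float, a: float, b: float, c: float) -> float:
    """<a,b,c>: a trapezoid whose core shrinks to the single point b."""
    return trapezoid(x, a, b, b, c)


def gaussian(x: float, c: float, sigma: float) -> float:
    """Bell-shaped MF centred at c with width sigma."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def singletons(x: float, points: Dict[float, float]) -> float:
    """Non-zero only at listed points, e.g. {a: 1.0, b: 0.5}."""
    return points.get(x, 0.0)


print(trapezoid(0.5, 0, 1, 3, 4), triangular(1.5, 0, 1, 2), gaussian(5, 5, 2))
```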
Linguistic variables
Age = 20 ⇒ Age = young (the linguistic variable takes a linguistic value).
Linguistic variable: temperature; terms (fuzzy sets): {cold, warm, hot}.
[Figure: MFs μ(x) for cold, warm and hot over x [°C], with ticks at 0, 20 and 40.]
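A linguistic variable can be modelled as a mapping from term names to MFs. Fuzzifying a measurement then means evaluating every term. The sketch below reads the breakpoints 0, 20 and 40 °C off the figure axis. The exact shapes are assumptions: a falling ramp for cold, a triangle for warm and a rising ramp for hot.

```python
from typing import Callable, Dict


def ramp_down(x: float, start: float, end: float) -> float:
    """1 below `start`, linear decrease to 0 at `end`."""
    if x <= start:
        return 1.0
    if x >= end:
        return 0.0
    return (end - x) / (end - start)


def ramp_up(x: float, start: float, end: float) -> float:
    """0 below `start`, linear increase to 1 at `end`."""
    return 1.0 - ramp_down(x, start, end)


def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular MF <a,b,c>."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical term definitions for the linguistic variable "temperature" in °C.
TEMPERATURE: Dict[str, Callable[[float], float]] = {
    "cold": lambda t: ramp_down(t, 0.0, 20.0),
    "warm": lambda t: tri(t, 0.0, 20.0, 40.0),
    "hot": lambda t: ramp_up(t, 20.0, 40.0),
}


def fuzzify(value: float, variable: Dict[str, Callable[[float], float]]) -> Dict[str, float]:
    """Membership degree of `value` in every term of the linguistic variable."""
    return {term: mf(value) for term, mf in variable.items()}


print(fuzzify(15.0, TEMPERATURE))  # {'cold': 0.25, 'warm': 0.75, 'hot': 0.0}
```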
Fuzzy numbers
MFs are usually convex, with a single maximum. MFs for similar numbers overlap.
Numbers: the core is a single point x with μ(x) = 1, and the MF decreases monotonically on both sides of the core.
Typically triangular functions (a, b, c) or singletons are used.
Fuzzy rules
Commonsense knowledge may sometimes be captured in a natural way using fuzzy rules:
IF L-variable-1 = term-1 AND L-variable-2 = term-2 THEN L-variable-3 = term-3
For example: IF Temperature = hot AND air-conditioning price = low THEN cooling = strong
What does a fuzzy rule "IF x is A THEN y is B" mean?
Fuzzy implication
If ⇒ means correlation, a T-norm T(A,B) is sufficient.
A ⇒ B has many realizations.
Interpretation of implication
IF x is A THEN y is B: correlation or implication.
[Figure: two panels over the (x, y) plane with fuzzy set A on x and B on y. Left: A ⇒ B as "not A or B" (A entails B). Right: A ⇒ B as "A and B".]
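The two readings of A ⇒ B can be compared numerically on membership degrees a = μ_A(x) and b = μ_B(y). The operators below are standard ones from the fuzzy-logic literature. They are examples of the "many realizations", not necessarily the ones drawn in the original figure: MIN and product (correlation, "A and B"), Kleene-Dienes max(1 − a, b) ("not A or B"), and Łukasiewicz min(1, 1 − a + b).

```python
from typing import Callable, Dict

Implication = Callable[[float, float], float]

IMPLICATIONS: Dict[str, Implication] = {
    # Correlation readings ("A and B"): T-norms.
    "min (Mamdani)": lambda a, b: min(a, b),
    "product (Larsen)": lambda a, b: a * b,
    # Implication readings ("not A or B" and a bounded variant).
    "Kleene-Dienes": lambda a, b: max(1.0 - a, b),
    "Lukasiewicz": lambda a, b: min(1.0, 1.0 - a + b),
}

for name, imp in IMPLICATIONS.items():
    # Premise partly true (0.7), conclusion weakly true (0.4), then premise false (0.0).
    print(f"{name:18s} {imp(0.7, 0.4):.2f} {imp(0.0, 0.4):.2f}")
```

The difference shows when the premise is false (a = 0). A T-norm gives 0, so the rule says nothing about y. The true implications give 1, because "not A" makes the rule vacuously true.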
Types of rules
FIR, Fuzzy Implication Rules: logic of implications between fuzzy facts.
FMR, Fuzzy Mapping Rules: functional dependencies, fuzzy graphs, approximation problems.
Mamdani type: IF MF_A(x) = high THEN MF_B(y) = medium.
Takagi-Sugeno type: IF MF_A(x) = high THEN y = f_A(x); a linear f_A(x) gives the first-order Sugeno type.
FIS, Fuzzy Inference Systems: combine fuzzy rules to calculate final decisions.
Fuzzy approximation
• Fuzzy systems $F: \mathbb{R}^n \to \mathbb{R}^p$ use m rules to map a vector x onto the output F(x), a vector or a scalar.
Singleton model: $R_i$: IF x is $A_i$ THEN y is $b_i$.
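A common way to evaluate the singleton model is a membership-weighted average of the rule outputs, $F(x) = \sum_i \mu_{A_i}(x)\, b_i / \sum_i \mu_{A_i}(x)$. The slide does not spell out the combination formula, so treat this as an assumption. The sketch uses a scalar input (n = p = 1) and hypothetical Gaussian premises.

```python
import math
from typing import Callable, List, Sequence

MF = Callable[[float], float]


def gaussian_mf(center: float, width: float) -> MF:
    """Hypothetical Gaussian MF for the premise 'x is A_i'."""
    return lambda x: math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def singleton_model(x: float, premises: Sequence[MF], outputs: Sequence[float]) -> float:
    """Weighted average of singleton conclusions b_i for rules R_i: IF x is A_i THEN y is b_i."""
    weights: List[float] = [mu(x) for mu in premises]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return sum(w * b for w, b in zip(weights, outputs)) / total


premises = [gaussian_mf(0.0, 0.5), gaussian_mf(2.0, 0.5)]
outputs = [1.0, 3.0]
print(singleton_model(0.0, premises, outputs), singleton_model(1.0, premises, outputs))
```

Near a rule centre, that rule dominates and F(x) ≈ b_i. Halfway between the two centres both rules fire equally, so F(x) is the average of b_1 and b_2. The model is therefore a smooth interpolation between the singletons.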
Rule base
IF Temperature = chilly AND Heating-price = expensive THEN heating = no

Heating price \ Temperature | freezing | cold | chilly
cheap | full | full | medium
so-so | full | medium | weak
expensive | medium | weak | no

IF Temperature = freezing AND Heating-price = cheap THEN heating = full
1. Fuzzification
Fuzzification goes from measured values to MF values: determine the membership degrees for all fuzzy sets (terms of the linguistic variables).
Temperature: T = 15 °C gives μ_chilly(T) = 0.5.
Heating price: p = 48 Euro/MBtu gives μ_cheap(p) = 0.3.
[Figure: MF chilly over t with μ = 0.5 at 15 °C; MF cheap over p with μ = 0.3 at 48 Euro/MBtu.]
IF Temperature = chilly AND Heating-price = cheap ...
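The fuzzification step can be reproduced with any MFs that pass through the two values on the slide. The breakpoints below are hypothetical, chosen only so that μ_chilly(15 °C) = 0.5 and μ_cheap(48 Euro/MBtu) = 0.3. They are a triangle 10/20/30 °C for chilly and a ramp from 41 to 51 Euro/MBtu for cheap.

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular MF <a,b,c>."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def decreasing(x: float, full: float, zero: float) -> float:
    """1 up to `full`, then linear decrease to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def mu_chilly(t_celsius: float) -> float:
    # Hypothetical breakpoints: 10, 20, 30 °C.
    return triangular(t_celsius, 10.0, 20.0, 30.0)


def mu_cheap(price: float) -> float:
    # Hypothetical breakpoints: fully cheap below 41, not cheap above 51 Euro/MBtu.
    return decreasing(price, 41.0, 51.0)


T, p = 15.0, 48.0
print(mu_chilly(T), mu_cheap(p))  # 0.5 0.3
```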
2. Term composition
IF Temperature = chilly AND Heating-price = cheap ...
Calculate the degree of rule fulfillment for all conditions by combining the terms with a fuzzy AND, e.g. the MIN operator:
$\mu_A(X) = \mu_{A_1}(X_1) \wedge \mu_{A_2}(X_2) \wedge \dots \wedge \mu_{A_N}(X_N)$ for rules $R_A$
$\mu_{all}(X) = \min\{\mu_{chilly}(t), \mu_{cheap}(p)\} = \min\{0.5, 0.3\} = 0.3$
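Term composition is just a fuzzy AND over the condition degrees of one rule. The sketch shows the MIN operator used on the slide. For comparison it also shows the algebraic product, another common T-norm, which the slide does not use here.

```python
import math


def fuzzy_and_min(*degrees: float) -> float:
    """Degree of rule fulfilment with the MIN T-norm."""
    return min(degrees)


def fuzzy_and_product(*degrees: float) -> float:
    """Degree of rule fulfilment with the product T-norm (alternative)."""
    return math.prod(degrees)


conditions = {"chilly": 0.5, "cheap": 0.3}
print(fuzzy_and_min(*conditions.values()))      # 0.3
print(fuzzy_and_product(*conditions.values()))  # 0.15
```

MIN lets the weakest condition decide, while the product also penalizes the stronger conditions being less than fully true.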
3. Inference
Calculate the degree of truth of the rule conclusion: use T-norms such as MIN or product to combine the degree of fulfillment of the conditions with the MF of the conclusion.
[Figure: two panels for THEN Heating = full with cond = 0.3. MIN inference: $\mu_{concl}(h) = \min\{cond, \mu_{full}(h)\}$, the MF clipped at 0.3. Product inference: $\mu_{concl}(h) = cond \cdot \mu_{full}(h)$, the MF scaled by 0.3.]
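The two inference variants either clip or scale the conclusion MF by the rule's degree of fulfilment. The sketch assumes a hypothetical MF for Heating = full on a 0-100 scale (a ramp from 50 to 100) and uses cond = 0.3 from the slide.

```python
from typing import Callable

MF = Callable[[float], float]


def mu_full(h: float) -> float:
    """Hypothetical MF for 'Heating = full' on a 0..100 scale."""
    if h <= 50.0:
        return 0.0
    if h >= 100.0:
        return 1.0
    return (h - 50.0) / 50.0


def infer_min(cond: float, mf: MF) -> MF:
    """MIN inference: the conclusion MF is clipped at the rule's degree of fulfilment."""
    return lambda h: min(cond, mf(h))


def infer_product(cond: float, mf: MF) -> MF:
    """Product inference: the conclusion MF is scaled by the degree of fulfilment."""
    return lambda h: cond * mf(h)


cond = 0.3
clipped = infer_min(cond, mu_full)
scaled = infer_product(cond, mu_full)
for h in (40, 60, 75, 100):
    print(h, clipped(h), round(scaled(h), 3))
```

Clipping keeps the flat top of the conclusion, while scaling preserves its shape. Both reach the same maximum, cond.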
4. Aggregation
Aggregate the conclusions of all rules using the MAX operator to obtain their sum (union).
[Figure: conclusions of THEN Heating = full, THEN Heating = medium and THEN Heating = no, combined over h.]
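Aggregation takes the pointwise MAX of the clipped rule conclusions. In the sketch, the three output terms for Heating on a 0-100 scale and the firing strengths of the three rules are hypothetical values, chosen only to illustrate the operation.

```python
from typing import Callable, Dict


def triangular(h: float, a: float, b: float, c: float) -> float:
    """Triangular MF <a,b,c>."""
    if h <= a or h >= c:
        return 0.0
    if h <= b:
        return (h - a) / (b - a)
    return (c - h) / (c - b)


# Hypothetical output terms for 'Heating' on a 0..100 scale.
HEATING: Dict[str, Callable[[float], float]] = {
    "no": lambda h: triangular(h, -50.0, 0.0, 50.0),
    "medium": lambda h: triangular(h, 0.0, 50.0, 100.0),
    "full": lambda h: triangular(h, 50.0, 100.0, 150.0),
}

# Hypothetical degrees of fulfilment of the rules concluding each term.
firing = {"full": 0.3, "medium": 0.5, "no": 0.1}


def aggregate(h: float) -> float:
    """Pointwise MAX over the MIN-clipped conclusions of all rules."""
    return max(min(w, HEATING[term](h)) for term, w in firing.items())


samples = [aggregate(h) for h in range(0, 101, 25)]
print(samples)  # [0.1, 0.5, 0.5, 0.5, 0.3]
```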
5. Defuzzification
Calculate a crisp value/decision using, for example, the "Center of Gravity" (COG) method.
[Figure: aggregated conclusion μ_concl(h) with its COG at h = 73.]
For discrete sets use the "center of singletons"; for continuous sets:
$$h = \frac{\sum_i \mu_i A_i c_i}{\sum_i \mu_i A_i}$$
where $\mu_i$ is the degree of membership in set i, $A_i$ is the area under the MF of set i, and $c_i$ is the center of gravity of set i.
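Both defuzzification routes can be sketched briefly. `cog` approximates $\int h\,\mu(h)\,dh / \int \mu(h)\,dh$ with a midpoint Riemann sum over a sampled aggregated MF. `center_of_areas` implements the area-weighted formula above. The sample sets are hypothetical, and the grid size n is an arbitrary accuracy/speed trade-off.

```python
from typing import Callable, Sequence


def cog(mf: Callable[[float], float], lo: float, hi: float, n: int = 1000) -> float:
    """Centre of gravity of a continuous MF on [lo, hi], midpoint Riemann sum."""
    step = (hi - lo) / n
    num = den = 0.0
    for i in range(n):
        h = lo + (i + 0.5) * step
        m = mf(h)
        num += h * m
        den += m
    if den == 0.0:
        raise ValueError("empty fuzzy set: nothing to defuzzify")
    return num / den


def center_of_areas(mus: Sequence[float], areas: Sequence[float], centers: Sequence[float]) -> float:
    """h = sum(mu_i * A_i * c_i) / sum(mu_i * A_i)."""
    num = sum(m * a * c for m, a, c in zip(mus, areas, centers))
    den = sum(m * a for m, a in zip(mus, areas))
    return num / den


def symmetric_triangle(h: float) -> float:
    """Hypothetical aggregated MF: triangle peaking at 50 on [0, 100]."""
    return max(0.0, 1.0 - abs(h - 50.0) / 50.0)


print(round(cog(symmetric_triangle, 0.0, 100.0), 3))
print(center_of_areas([0.3, 0.5], [20.0, 20.0], [50.0, 80.0]))
```

A symmetric set defuzzifies to its axis of symmetry. With the area formula, the set with the larger degree (0.5 at c = 80) pulls the result towards itself.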
FIS for heating
Stages: fuzzification, rule base, inference, defuzzification.
Rule base: if temp = freezing then valve = open; if temp = cold then valve = half open; if temp = warm then valve = closed.
[Figure: the measured temperature T is fuzzified into freeze = 0.7, cold = 0.2, warm = 0.0; the output MFs (open, half, closed) are cut at 0.7 and 0.2 and defuzzified into the valve position v, the output that controls the valve.]
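With singleton outputs, this valve controller reduces to a weighted average of the output positions. The degrees 0.7 / 0.2 / 0.0 and the three rules come from the slide. The numeric valve positions (100 %, 50 % and 0 % open) are hypothetical.

```python
from typing import Dict

# Degrees of the input terms after fuzzification (from the slide).
degrees: Dict[str, float] = {"freeze": 0.7, "cold": 0.2, "warm": 0.0}

# Rule base: input term -> output term.
RULES: Dict[str, str] = {"freeze": "open", "cold": "half open", "warm": "closed"}

# Hypothetical singleton positions of the valve, in percent open.
POSITION: Dict[str, float] = {"open": 100.0, "half open": 50.0, "closed": 0.0}


def valve_position(deg: Dict[str, float]) -> float:
    """Center of singletons: sum(w_i * v_i) / sum(w_i) over all fired rules."""
    num = sum(w * POSITION[RULES[term]] for term, w in deg.items())
    den = sum(deg.values())
    if den == 0.0:
        raise ValueError("no rule fires")
    return num / den


print(round(valve_position(degrees), 1))  # 88.9
```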
Takagi-Sugeno rules
Mamdani rules conclude: IF X1 = A1 and X2 = A2 … and Xn = An THEN Y = B.
TS rules conclude some functional dependence f(x_i): IF X1 = A1 and X2 = A2 … and Xn = An THEN Y = f(x1, x2, …, xn).
TS rules are usually based on piecewise linear functions (equivalent to a linear-spline approximation): IF X1 = A1 and X2 = A2 … and Xn = An THEN Y = a0 + a1 x1 + … + an xn.
Fuzzy system in Matlab
Each row of rulelist is one rule. The columns are: first input, second input, output, rule weight, operator (1=AND, 2=OR). The code assumes that fis already defines the inputs temperature (cold, warm, hot) and oilprice (cheap, normal, expensive) and the output heating (low, medium, high).

    % Columns: first input, second input, output, rule weight, operator (1=AND, 2=OR)
    rulelist = [1 1 3 1 1
                1 2 3 1 1
                1 3 2 1 1
                2 1 3 1 1
                2 2 2 1 1
                2 3 1 1 1
                3 1 2 1 1
                3 2 1 1 1
                3 3 1 1 1];
    fis = addrule(fis, rulelist);  % add the rules to an existing FIS
    showrule(fis)                  % print the rules in verbal form
    gensurf(fis);                  % plot the output surface
    surfview(fis);                 % open the interactive surface viewer

showrule(fis) prints:

    1. If (temperature is cold) and (oilprice is cheap) then (heating is high) (1)
    2. If (temperature is cold) and (oilprice is normal) then (heating is high) (1)
    3. If (temperature is cold) and (oilprice is expensive) then (heating is medium) (1)
    4. If (temperature is warm) and (oilprice is cheap) then (heating is high) (1)
    5. If (temperature is warm) and (oilprice is normal) then (heating is medium) (1)
    6. If (temperature is warm) and (oilprice is expensive) then (heating is low) (1)
    7. If (temperature is hot) and (oilprice is cheap) then (heating is medium) (1)
    8. If (temperature is hot) and (oilprice is normal) then (heating is low) (1)
    9. If (temperature is hot) and (oilprice is expensive) then (heating is low) (1)
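Fuzzy Inference System (FIS)
IF speed is slow THEN brake = 2
IF speed is medium THEN brake = 4 * speed
IF speed is high THEN brake = 8 * speed
[Figure: MFs slow, medium and high of speed; at speed = 2 the memberships are slow = 0.3, medium = 0.8, high = 0.1.]
R1: w1 = 0.3; r1 = 2
R2: w2 = 0.8; r2 = 4 * 2
R3: w3 = 0.1; r3 = 8 * 2
$\text{Brake} = \frac{\sum_i w_i r_i}{\sum_i w_i} = \frac{0.3 \cdot 2 + 0.8 \cdot 8 + 0.1 \cdot 16}{0.3 + 0.8 + 0.1} = \frac{8.6}{1.2} \approx 7.17$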
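The brake computation is a weighted average of the rule outputs. The firing strengths are read off the slide's figure at speed = 2, and the rule outputs follow the three rules above.

```python
speed = 2.0

# Firing strengths at speed = 2 (slow, medium, high), as given on the slide.
w = [0.3, 0.8, 0.1]

# Rule outputs: brake = 2, 4 * speed, 8 * speed.
r = [2.0, 4.0 * speed, 8.0 * speed]

# Weighted average of the rule outputs: sum(w_i * r_i) / sum(w_i).
brake = sum(wi * ri for wi, ri in zip(w, r)) / sum(w)
print(round(brake, 2))  # 7.17
```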
First-order TS FIS
• Rules:
• IF X is A1 and Y is B1 THEN Z = p1*x + q1*y + r1
• IF X is A2 and Y is B2 THEN Z = p2*x + q2*y + r2
• Fuzzy inference: [Figure: for inputs x = 3, y = 2, rule i fires with strength w_i obtained from A_i(x) and B_i(y) and produces z_i = p_i*x + q_i*y + r_i.]
$z = \frac{w_1 z_1 + w_2 z_2}{w_1 + w_2}$
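A first-order TS system evaluates a linear function per rule and blends the results with the rule weights. Everything numeric in the sketch is hypothetical: the Gaussian antecedents A_i, B_i and the coefficients (p_i, q_i, r_i). The rule weight w_i is taken as the product A_i(x)·B_i(y). MIN is an equally common choice for the AND.

```python
import math
from typing import Callable, List, Tuple


def gauss(center: float, width: float) -> Callable[[float], float]:
    """Hypothetical Gaussian MF."""
    return lambda v: math.exp(-((v - center) ** 2) / (2.0 * width * width))


# Hypothetical antecedents and linear consequents for two rules.
A = [gauss(2.0, 1.0), gauss(5.0, 1.0)]
B = [gauss(1.0, 1.0), gauss(4.0, 1.0)]
PQR: List[Tuple[float, float, float]] = [(1.0, 1.0, 0.0), (2.0, 1.0, 1.0)]


def ts_first_order(x: float, y: float) -> float:
    """z = sum(w_i * z_i) / sum(w_i), with z_i = p_i*x + q_i*y + r_i."""
    weights, outputs = [], []
    for a_i, b_i, (p, q, r) in zip(A, B, PQR):
        weights.append(a_i(x) * b_i(y))  # product T-norm for the AND of the premise
        outputs.append(p * x + q * y + r)
    return sum(w * z for w, z in zip(weights, outputs)) / sum(weights)


print(round(ts_first_order(3.0, 2.0), 3))
```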
Induction of fuzzy rules
All this may be presented in the form of networks. Choices/adaptive parameters in fuzzy rules:
• The number of rules (nodes).
• The number of terms for each attribute.
• Position of the membership function (MF).
• MF shape for each attribute/term.
• Type of rules (conclusions).
• Type of inference and composition operators.
• Induction algorithms: incremental or refinement.
• Type of learning procedure.
Feature space partition
[Figure: two partitions of the feature space: a regular grid and independent functions.]
MFs on a grid
• Advantage: simplest approach.
• Regular grid: divide each dimension into a fixed number of MFs and assign to each region the average value of all samples that belong to it.
• Irregular grid: find the largest error and divide the grid there into two parts, adding a new MF.
• Mixed method: start from a regular grid and adapt the parameters later.
• Disadvantages: for k dimensions and N MFs in each, $N^k$ areas are created! Poor quality of approximation.
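The $N^k$ growth is easy to see by enumerating the cells of a regular grid, one rule premise per combination of MF indices. The sketch only counts premises. The choice of N = 3 terms per dimension is illustrative.

```python
from itertools import product
from typing import List, Tuple


def grid_rules(n_mfs: int, k_dims: int) -> List[Tuple[int, ...]]:
    """All combinations of MF indices: one rule premise per grid cell."""
    return list(product(range(n_mfs), repeat=k_dims))


for k in (1, 2, 3, 4):
    print(k, len(grid_rules(3, k)))  # 3, 9, 27, 81 cells for N = 3
```

With 10 inputs and 3 terms each, the grid already has 3^10 = 59049 cells. Most of them contain no training data, which is why grid partitions approximate poorly in high dimensions.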
Optimized MFs
• Advantages: higher accuracy, better approximation, fewer functions, context-dependent MFs.
• Optimized MFs may come from:
• Neurofuzzy systems: equivalent to RBF networks with Gaussian functions (several proofs); FSM models with triangular or trapezoidal functions; modified MLP networks with bicentral functions, etc.
• Decision trees, fuzzy decision trees.
• Fuzzy machine learning inductive systems.
• Disadvantages: extraction of rules is hard, and optimized MFs are more difficult to create.
Improving sets of rules
• How can known sets of rules be improved?
• Use minimization methods to improve the parameters of fuzzy rules: usually non-gradient methods are used, most often genetic algorithms.
• Convert the rules into a neural network, train the network and convert it back into rules.
• Use heuristic methods for local adaptation of the parameters of individual rules.
• Fuzzy logic is good for modeling imprecise knowledge, but ...
• What do the decision borders of an FIS look like? Is it worthwhile to make the input fuzzy and the output crisp?
• Is it the best approximation method?
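Fuzzy rules and data uncertainty
Data has been measured with an unknown error. Assume a Gaussian distribution: x is a fuzzy number $G_x$ with the Gaussian membership function $G(y; x, s_x)$.
A set of logical rules R is applied to fuzzy input vectors:
Monte Carlo simulations for an arbitrary system ⇒ $p(C_i|X)$.
Analytical evaluation of $p(C|X)$ is based on the cumulant:
$$\mathcal{C}(a) = \int_{-\infty}^{a} G(y; x, s_x)\,dy = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{a - x}{s_x\sqrt{2}}\right)\right] \approx \sigma\big(\beta(a - x)\big), \quad \beta \approx 1.7/s_x$$
The error function agrees with the logistic function to within 0.02.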
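The Monte Carlo route can be checked against the analytical cumulant for the simplest crisp rule, "x > a". The sketch samples the measured value with Gaussian noise and counts how often the rule fires. The values x = 1.2, s_x = 0.5, a = 1.0, the sample size and the fixed seed are all hypothetical choices.

```python
import math
import random


def mc_rule_probability(x: float, s_x: float, a: float, n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P('x > a' is true) when x carries Gaussian noise with std s_x."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.gauss(x, s_x) > a)
    return hits / n


def analytic_rule_probability(x: float, s_x: float, a: float) -> float:
    """Same probability from the Gaussian cumulant, 1 - C(a), via the error function."""
    return 0.5 * (1.0 + math.erf((x - a) / (s_x * math.sqrt(2.0))))


x, s_x, a = 1.2, 0.5, 1.0
print(mc_rule_probability(x, s_x, a), analytic_rule_probability(x, s_x, a))
```

The Monte Carlo estimate works for any rule set, but it is noisy and slow. The closed form is exact and cheap, and it is differentiable in the rule parameters.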
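Fuzzification of crisp rules
The rule $R_a(x) = \{x > a\}$ is fulfilled by $G_x$ with probability
$$p(R_a(G_x) = T) = \int_a^{\infty} G(y; x, s_x)\,dy = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - a}{s_x\sqrt{2}}\right)\right] \approx \sigma\big(\beta(x - a)\big)$$
with a logistic slope β inversely proportional to $s_x$. Approximating the error function by the logistic function amounts to assuming the error distribution σ(x)(1 − σ(x)), which for s² = 1.7 approximates a Gaussian to within 3.5%.
The rule $R_{ab}(x) = \{b > x > a\}$ is fulfilled by $G_x$ with probability
$$p(R_{ab}(G_x) = T) \approx \sigma\big(\beta(x - a)\big) - \sigma\big(\beta(x - b)\big)$$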
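The quality of the logistic approximation can be checked numerically. The sketch compares the erf-based probability for $R_a$ with $\sigma(\beta(x - a))$. It uses β = 1.7/s_x, the common logistic approximation of the Gaussian cumulative distribution; treat this slope as an assumption. The threshold a = 1.0 and s_x = 0.5 are arbitrary.

```python
import math


def p_exact(x: float, a: float, s_x: float) -> float:
    """P(x > a) for a Gaussian-blurred measurement, via the error function."""
    return 0.5 * (1.0 + math.erf((x - a) / (s_x * math.sqrt(2.0))))


def p_logistic(x: float, a: float, s_x: float, k: float = 1.7) -> float:
    """Logistic approximation sigma(beta * (x - a)) with beta = k / s_x."""
    beta = k / s_x
    return 1.0 / (1.0 + math.exp(-beta * (x - a)))


s_x, a = 0.5, 1.0
grid = [a + 0.01 * i for i in range(-400, 401)]
max_err = max(abs(p_exact(x, a, s_x) - p_logistic(x, a, s_x)) for x in grid)
print(f"max |erf-based - logistic| = {max_err:.4f}")
```

The largest deviation occurs near the threshold's shoulders and stays well below the 0.02 bound quoted above.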
Soft trapezoids and NN
The difference between two sigmoids makes a soft trapezoidal membership function.
Conclusion: fuzzy logic with σ(x) − σ(x − b) MFs is equivalent to crisp logic plus Gaussian uncertainty.
Gaussian classifiers (RBF) are equivalent to fuzzy systems with Gaussian membership functions.
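This equivalence can be illustrated directly. The soft trapezoid $\sigma(\beta(x-a)) - \sigma(\beta(x-b))$ is compared with the exact probability that a Gaussian-blurred value falls in (a, b). The interval (0, 4), s_x = 0.5 and β = 1.7/s_x are illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def soft_trapezoid(x: float, a: float, b: float, beta: float) -> float:
    """Difference of two sigmoids: a smoothed indicator of the interval (a, b)."""
    return sigmoid(beta * (x - a)) - sigmoid(beta * (x - b))


def p_interval(x: float, a: float, b: float, s_x: float) -> float:
    """Exact probability that a value measured as x with Gaussian error s_x lies in (a, b)."""
    def phi(z: float) -> float:
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return phi((b - x) / s_x) - phi((a - x) / s_x)


a, b, s_x = 0.0, 4.0, 0.5
beta = 1.7 / s_x
for x in (-1.0, 0.0, 2.0, 4.0, 5.0):
    print(x, round(soft_trapezoid(x, a, b, beta), 3), round(p_interval(x, a, b, s_x), 3))
```

Both curves take the value 1/2 near the interval ends and are close to 1 inside. So a crisp interval rule applied to uncertain inputs behaves like a fuzzy rule with a soft trapezoidal MF.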
Optimization of rules
Fuzzy: large receptive fields, rough estimations.
$G_x$: uncertainty of inputs, small receptive fields.
Minimization of the number of errors is difficult and non-gradient, but Monte Carlo or analytical $p(C|X; M)$ can now be used instead.
• Gradient optimization works for a large number of parameters.
• The parameters $s_x$ are known for some features; use them as optimization parameters for the others!
• Probabilities instead of 0/1 rule outcomes.
• Vectors that were not classified by crisp rules now have non-zero probabilities.
Summary
• Fuzzy sets/logic is a useful form of knowledge representation, allowing for an approximate but natural expression of some types of knowledge.
• An alternative is to include the uncertainty of input data while using crisp logic rules.
• Adaptation of fuzzy rule parameters leads to neurofuzzy systems; the simplest are RBF networks and Separable Function Networks (SFN), equivalent to any fuzzy inference system.
• Results may sometimes be better than with other systems, since it is easier to include a priori knowledge in fuzzy systems.
Disclaimer
A few slides/figures were taken from various presentations found on the Internet. Unfortunately, I cannot identify the original authors at the moment, since these slides went through different iterations; one source seems to be J.-S. Roger Jang from NTHU, Taiwan. I apologize for that.