Knowledge Integration by Genetic Algorithms. Prof. Tzung-Pei Hong Department of Electrical Engineering National University of Kaohsiung. Outline. Introduction Review GAs Fuzzy Sets Related Studies Knowledge Integration Strategies Classification Rules Association Rules
Knowledge Integration by Genetic Algorithms Prof. Tzung-Pei Hong Department of Electrical Engineering National University of Kaohsiung
Outline • Introduction • Review • GAs • Fuzzy Sets • Related Studies • Knowledge Integration Strategies • Classification Rules • Association Rules • Conclusions
Why Knowledge Integration • Four Reasons • 1. Knowledge is distributed among sources • 2. It increases the reliability of knowledge-based systems • 3. Knowledge can be reused • 4. It reduces the effort of developing an expert system or decision support system • [Figure: rule bases RB1, …, RBi, …, RBn are integrated into a GRB, which serves an expert system with a user interface]
Why Use GAs? • Integrating rule bases RB1, …, RBi, …, RBn must satisfy four criteria: 1. Completeness 2. Correctness 3. Consistency 4. Conciseness • Integration is therefore a multi-objective optimization problem • GAs can find optimal or nearly optimal solutions
Vague Knowledge • In real-world applications, the knowledge sources or data (RB1, …, RBi, …, RBn) often contain linguistic or ambiguous information • Vagueness greatly influences the resulting knowledge base
Benefits • Medsker [95] • Knowledge integrated from different sources has good validity • Integrated knowledge can deal with more complex problems • Knowledge integration may improve the performance of the knowledge base • Integrating would facilitate building bigger and better systems cheaply
Traditional Knowledge Integration • Problems • When a conflict occurs, domain experts must intervene in the integration process • Subjective • Time-consuming • Limited integration • Only a small number of knowledge sources can be handled • The more knowledge sources there are, the more difficult and complex integration becomes
Our Goals • Resolve potential conflicts and contradictions • Integrate knowledge without intervention by human experts • Improve the integration speed • Enlarge the scale of knowledge sources that can be integrated
History of GAs • GA: Genetic Algorithm • History • John Holland (1975) • K. A. De Jong • D. E. Goldberg
Idea of GA • Survival of the fittest • Iterative Procedure • Genetic operators • Reproduction • Crossover • Mutation • Near optimal solution
Simple Genetic Algorithms • Start • Initialize a population of individuals • Evaluate each individual's fitness value • Quit? Quit if: 1) the maximum number of generations is reached, 2) the time limit is reached, or 3) the population has converged • If yes: stop • If no: select the superior individuals for reproduction, apply crossover and perhaps mutation, evaluate the new individuals' fitness values, and return to the quit test
An Example • A Function • Find the max
Step 1 • Define a suitable representation • Each chromosome • 12 bits • e.g. t = 0 → 000000000000; t = 1 → 111111111111; t = 0.680 → 101011100001
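The three sample encodings are consistent with scaling t in [0, 1] linearly onto the integers 0 to 4095 and writing the result in 12-bit binary. A minimal sketch under that assumption (the function names `encode` and `decode` are illustrative, not from the slides):

```python
BITS = 12
MAX_INT = 2**BITS - 1  # 4095, the largest 12-bit integer


def encode(t: float) -> str:
    """Map t in [0, 1] to a 12-bit chromosome (assumed linear scaling)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return format(round(t * MAX_INT), f"0{BITS}b")


def decode(chromosome: str) -> float:
    """Map a 12-bit chromosome back to a value in [0, 1]."""
    return int(chromosome, 2) / MAX_INT


print(encode(0.680))            # 101011100001
print(decode("101011100001"))   # about 0.6801
```

With 12 bits the resolution is 1/4095, so decoding returns the nearest representable value rather than exactly 0.680.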
Step 2 • Create an initial population of N individuals • N: population size • Assume N = 40
Step3 • Define a suitable fitness function f to evaluate the individuals • Fitness function f(t) • e.g. The first six individuals
Step 4 • Perform the crossover and mutation operations to generate possible offspring
Crossover • Offspring • Inherit some characteristics of their parents • e.g. Parent 1: 00011 0000001 • Parent 2: 01001 1001101 • Child 1: 000111001101 • Child 2: 010010000001
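The parent and child strings above correspond to a one-point crossover with the cut after the fifth bit: each child keeps the head of one parent and takes the tail of the other. A short sketch (the function name is illustrative):

```python
def one_point_crossover(parent1: str, parent2: str, cut: int) -> tuple[str, str]:
    """Swap the tails of two equal-length bit strings after position `cut`."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    if not 0 < cut < len(parent1):
        raise ValueError("cut must fall strictly inside the string")
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2


child1, child2 = one_point_crossover("000110000001", "010011001101", cut=5)
print(child1)  # 000111001101
print(child2)  # 010010000001
```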
Mutation • Offspring • Possess characteristics different from those of their parents • Preserve a reasonable level of population diversity • e.g. Bit change: 011100000100 → 111100000100 • e.g. Inversion: 111011000100 → 111101000100
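Reading the slide's digit rows as two before/after pairs, the bit-change example flips the first bit of 011100000100, and the inversion example reverses the two-bit segment at positions 4 and 5 of 111011000100. A sketch of both operators under that reading (names and 0-based indexing are illustrative choices):

```python
def flip_bit(chromosome: str, index: int) -> str:
    """Bit-change mutation: invert the bit at a 0-based index."""
    flipped = "1" if chromosome[index] == "0" else "0"
    return chromosome[:index] + flipped + chromosome[index + 1:]


def invert_segment(chromosome: str, start: int, end: int) -> str:
    """Inversion mutation: reverse the order of the bits in [start, end)."""
    return chromosome[:start] + chromosome[start:end][::-1] + chromosome[end:]


print(flip_bit("011100000100", 0))           # 111100000100
print(invert_segment("111011000100", 3, 5))  # 111101000100
```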
New Offspring • The new offspring produced by the operators
Step 5 • Replace the individual • e.g. The first six individuals NEW
Step 6 • If the termination criteria are not satisfied, go to Step 4; otherwise, stop the genetic algorithm • The termination criteria • The maximum number of generations • The time limit • The population converged
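Steps 1 to 6 can be put together as a small loop. The sketch below rests on several assumptions: the objective function did not survive in the slide text, so `fitness` uses a hypothetical stand-in; selection is a two-way tournament; replacement is generational with one elite individual carried over; and only the maximum-generations criterion is implemented.

```python
import math
import random

BITS = 12
POP_SIZE = 40          # N = 40, as assumed in Step 2
MAX_GENERATIONS = 50   # assumed value for the generation limit


def decode(chromosome: str) -> float:
    return int(chromosome, 2) / (2**BITS - 1)


def fitness(chromosome: str) -> float:
    # Hypothetical objective; the slide's own function is not recoverable.
    t = decode(chromosome)
    return t * math.sin(10 * math.pi * t) + 1.0


def tournament(population: list[str], rng: random.Random) -> str:
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def crossover(p1: str, p2: str, rng: random.Random) -> tuple[str, str]:
    cut = rng.randint(1, BITS - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(chromosome: str, rng: random.Random, rate: float = 0.02) -> str:
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )


def run_ga(seed: int = 0) -> tuple[str, list[float]]:
    rng = random.Random(seed)
    population = ["".join(rng.choice("01") for _ in range(BITS))
                  for _ in range(POP_SIZE)]
    history = []
    for _ in range(MAX_GENERATIONS):
        best = max(population, key=fitness)
        history.append(fitness(best))
        children = [best]  # elitism: the best individual always survives
        while len(children) < POP_SIZE:
            c1, c2 = crossover(tournament(population, rng),
                               tournament(population, rng), rng)
            children.extend([mutate(c1, rng), mutate(c2, rng)])
        population = children[:POP_SIZE]
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = run_ga()
print(best, decode(best), history[-1])
```

Because the elite individual is copied unchanged, the best fitness recorded per generation never decreases; the time-limit and convergence criteria from Step 6 could be added as extra loop conditions.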
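Fuzzy Sets • Traditional computer decisions • Either true (1) or false (0) • For example: if 25 is the cutoff for being young, is a 26-year-old already middle-aged? A score of 60 or above is a pass, so anything below 60 is a fail • What is fuzziness? • Add several more grades between true (1) and false (0) • Almost true (0.8) • Probably true (0.6) • Probably false (0.4) • Almost false (0.2)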
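Fuzzy Sets • Divide into even more grades, up to a continuum • Question: does a height of 168 cm count as tall or not? • [Figure: membership functions for short, medium, and tall; vertical axis: membership degree; horizontal axis: height (cm), marked at 160, 170, 180]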
Example: “Close to 0” • Define a membership function: μA(x) = 1 / (1 + 10x²) • e.g. • μA(0) = 1 • μA(0.25) = 0.62 • μA(1) = 0.09 • μA(3) = 0.01
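The four listed membership values are consistent with μA(x) = 1 / (1 + 10x²). Assuming that form, the values can be checked directly (the function name is illustrative):

```python
def close_to_zero(x: float) -> float:
    """Membership degree of x in the fuzzy set "close to 0" (assumed form)."""
    return 1.0 / (1.0 + 10.0 * x * x)


for x in (0, 0.25, 1, 3):
    print(x, round(close_to_zero(x), 2))
```

Rounded to two decimals this gives 1.0, 0.62, 0.09, and 0.01, matching the slide.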
Example:“Close to 0” • Very Close to 0: μA(x) =
Fuzzy Set (Cont.) • Membership function • Takes values in [0, 1] • e.g. sunny : x → [0, 1] • [Figure: example elements x labeled 0.8 sunny, 0.6 sunny, and 0.1 sunny]
Fuzzy Set • Simple • Intuitively pleasing • A generalization of the crisp set • Crisp: member or non-member, 0 or 1 • Vague: the transition from member to non-member is gradual • [Figure: membership scale from sunny to not sunny, marked 1, 0.8, 0.6, 0.4, 0.2, 0]
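Fuzzy Operations • Intersection (AND) • Take the smaller degree • Ex: a student is smart (0.8) AND hardworking (0.6), so the student is a model student (0.6) • Union (OR) • Take the larger degree • Ex: a student is smart (0.8) OR hardworking (0.6), so the student is a model student (0.8) • Complement (NOT) • Take the difference from 1 • Ex: if "the student is smart" is 0.8, then "the student is not smart" is 0.2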
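The three operations map directly onto min, max, and subtraction from 1. A minimal sketch reproducing the student examples (function names are illustrative):

```python
def fuzzy_and(a: float, b: float) -> float:
    """Intersection: the smaller of the two membership degrees."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Union: the larger of the two membership degrees."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """Complement: the difference from 1."""
    return 1.0 - a


smart, hardworking = 0.8, 0.6
print(fuzzy_and(smart, hardworking))   # 0.6
print(fuzzy_or(smart, hardworking))    # 0.8
print(round(fuzzy_not(smart), 2))      # 0.2
```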
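Fuzzy Inference Example
Name      Big eyes   Small mouth   Good figure
陶晶瑩    0          0.8           0.3
張惠妹    1          0.6           0.8
李 玟     0          0.3           0.9
李心潔    0.7        0.1           0.5
蔡依林    0.8        0.5           0.3
• Prof. Hong's criteria for choosing a mistress • (Big eyes AND small mouth) OR good figure
Question: who is the best leading lady?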
Answer • 陶晶瑩 = (0 AND 0.8) OR 0.3 = 0 OR 0.3 = 0.3 • 張惠妹 = (1 AND 0.6) OR 0.8 = 0.8 • 李 玟 = (0 AND 0.3) OR 0.9 = 0.9 • 李心潔 = (0.7 AND 0.1) OR 0.5 = 0.5 • 蔡依林 = (0.8 AND 0.5) OR 0.3 = 0.5 • 李 玟 is the best choice! Thank you!
Fuzzy Decision • A = {A1, A2, A3, A4, A5} • A set of alternatives • C = {C1, C2, C3} • A set of criteria
Example (Cont.) • Assume: (C1 and C2) or C3 • E(Ai): evaluation function • E(A1) = (0 ∧ 0.8) ∨ 0.3 = 0 ∨ 0.3 = 0.3 • E(A2) = (1 ∧ 0.6) ∨ 0.8 = 0.6 ∨ 0.8 = 0.8 • E(A3) = (0 ∧ 0.3) ∨ 0.9 = 0 ∨ 0.9 = 0.9 (the best choice) • E(A4) = (0.7 ∧ 0.1) ∨ 0.5 = 0.1 ∨ 0.5 = 0.5 • E(A5) = (0.8 ∧ 0.5) ∨ 0.3 = 0.5 ∨ 0.3 = 0.5
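The evaluation above can be reproduced by taking min for "and" and max for "or" over each alternative's three criterion ratings. A short sketch using the ratings that appear in the E(Ai) lines (the dictionary layout and names are illustrative):

```python
# Ratings (C1, C2, C3) for each alternative, taken from the E(Ai) lines.
ratings = {
    "A1": (0.0, 0.8, 0.3),
    "A2": (1.0, 0.6, 0.8),
    "A3": (0.0, 0.3, 0.9),
    "A4": (0.7, 0.1, 0.5),
    "A5": (0.8, 0.5, 0.3),
}


def evaluate(c1: float, c2: float, c3: float) -> float:
    """E(Ai) for the decision rule (C1 and C2) or C3."""
    return max(min(c1, c2), c3)


scores = {name: evaluate(*values) for name, values in ratings.items()}
best = max(scores, key=scores.get)
print(scores)
print(best)  # A3
```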
Review of Knowledge Integration • [Figure: taxonomy of knowledge integration] • Two branches: Cooperative Approach, Centralized Approach • Methods shown: Blackboard, LPC Model, Integrity Constraints, Repertory Grid, Genetic Algorithm, Decision Table
GA-Based Classifier Systems • Michigan Approach: each individual is a single rule (rule 1, rule 2, …, rule n) • Pittsburgh Approach: each individual is a whole rule set (rule set 1, rule set 2, …, rule set m)
Genetic Knowledge Integration • [Figure: taxonomy of the genetic knowledge-integration approaches] • Branches: Michigan Approach, Pittsburgh Approach • Approaches: GKIDSO, TPGKI, MGKI • With vague knowledge: GFKILM, GFKIGM, TPGFKI, MGFKI
Integration of Classification Rules • Four Methods • GKIDSO • Genetic Knowledge-Integration approach with Domain-Specific Operators • TPGKI • Two-Phase Genetic Knowledge Integration • GFKILM • Genetic-Fuzzy Knowledge-Integration with several sets of Local Membership functions • GFKIGM • Genetic-Fuzzy Knowledge-Integration with a set of Global Membership functions
Genetic Knowledge-Integration Framework • Knowledge sources: Expert Groups 1 to n, each using a K.A. Tool, and Training Data Sets 1 to m, each using an M.L. Method • Each source yields a rule set with its own dictionary • Encoding: with a global feature set and class set, each rule set is converted to an intermediary representation • GA-based knowledge integration, using an integrating case set, produces the knowledge base and its dictionary
Knowledge Integration • [Figure: process diagram with the components Rule Sets, Knowledge Input, Knowledge Encoding, Genetic Knowledge Integration, Knowledge Verification, Knowledge Decoding, Integration Data Set, and Knowledge Base]
GKIDSO Approach • Genetic Knowledge-Integration approach with Domain-Specific Operators • Consists of two parts • Encoding • Integration • [Figure: knowledge encoding turns rule sets RS 1, RS 2, RS 3, …, RS m into chromosomes 1 to m of the initial population (generation 0); during knowledge integration, genetic operators evolve the chromosomes up to generation k]
Knowledge Encoding • Rule set → intermediary rules → fixed-length rule strings → variable-length rule-set string
Example: Brain Tumor • Two classes: {Adenoma, Meningioma} • Three features: • {Location, Calcification, Edema} • Feature values for Location • {brain surface, sellar, brain stem} • Feature values for Calcification • {no, marginal, vascular-like, lumpy} • Feature values for Edema • {no, < 2 cm, < 0.5 hemisphere}
Intermediary Rules • Two rules • R1: IF (Location = sellar) and (Calcification = no) then Adenoma • R2: IF (Location = brain surface) and (Edema < 2 cm) then Meningioma • With a dummy test added for each feature a rule does not mention: • R1: IF (Location = sellar) and (Calcification = no) and (Edema = no, or < 2 cm, or < 0.5 hemisphere) then Adenoma • R2: IF (Location = brain surface) and (Calcification = no or marginal or vascular-like or lumpy) and (Edema < 2 cm) then Meningioma
Fixed-Length Rule String
       Location   Calcification   Edema   Classes
R1 :   010        1000            111     10
R2 :   100        1111            010     01
• R1: IF (Location = sellar) and (Calcification = no) and (Edema = no, or < 2 cm, or < 0.5 hemisphere) then Adenoma • R2: IF (Location = brain surface) and (Calcification = no or marginal or vascular-like or lumpy) and (Edema < 2 cm) then Meningioma
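The two strings follow from giving every feature value and every class one bit, in the order listed in the brain-tumor example, and setting all bits of a feature that the rule does not test (the dummy test). A sketch of that encoding (the `encode_rule` name and dictionary layout are illustrative):

```python
# Feature values and classes in the order given in the brain-tumor example.
FEATURES = {
    "Location": ["brain surface", "sellar", "brain stem"],
    "Calcification": ["no", "marginal", "vascular-like", "lumpy"],
    "Edema": ["no", "< 2 cm", "< 0.5 hemisphere"],
}
CLASSES = ["Adenoma", "Meningioma"]


def encode_rule(conditions: dict[str, list[str]], cls: str) -> str:
    """Encode a rule as one bit per feature value plus one bit per class."""
    parts = []
    for feature, values in FEATURES.items():
        # A feature missing from the rule gets a dummy test: all values allowed.
        allowed = conditions.get(feature, values)
        parts.append("".join("1" if v in allowed else "0" for v in values))
    parts.append("".join("1" if c == cls else "0" for c in CLASSES))
    return " ".join(parts)


r1 = encode_rule({"Location": ["sellar"], "Calcification": ["no"]}, "Adenoma")
r2 = encode_rule({"Location": ["brain surface"], "Edema": ["< 2 cm"]}, "Meningioma")
print(r1)  # 010 1000 111 10
print(r2)  # 100 1111 010 01
```

Every rule comes out 12 bits long regardless of how many features it tests, which is what lets a rule set be concatenated into a single variable-length chromosome.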
Knowledge Integration • Initial population: Rule Set 1, Rule Set 2, …, Rule Set n • Genetic operations: Crossover, Mutation, Fusion, Fission • Fitness function • Generation 1: Rule Set 1, Rule Set 2, …, Rule Set n
Fitness Function - - • Formally • where - is a control parameter
Crossover
RS1 (rules r11 … r1i … r1n; crossover point cp1 inside r1i, with 7 bits after it):
  1001101100 01 … 01001 | 0101010 … 0010101011 00
RS2 (rules r21 … r2j … r2m; crossover point cp2 inside r2j, with 7 bits after it):
  0100110011 00 … 11011 | 1010101 … 1000110011 01
After crossover:
  O1: 1001101100 01 … 01001 1010101 … 1000110011 01
  O2: 0100110011 00 … 11011 0101010 … 0010101011 00
Fusion • Eliminate redundancy and subsumption • Redundancy • R1: if A then B • R2: if A then B • Subsumption • R1: if A and C then B • R2: if A then B
Fusion (Cont.) • Eliminate redundancy • Eliminate subsumption
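The two clean-up cases can be illustrated on a symbolic rule representation. The sketch below is an assumption-laden simplification, not the bit-string operator the slides depict: each rule is a pair (set of conditions, conclusion), and a rule counts as subsumed when another rule with the same conclusion needs only a proper subset of its conditions.

```python
Rule = tuple[frozenset[str], str]


def fuse(rules: list[Rule]) -> list[Rule]:
    """Drop redundant (duplicate) and subsumed rules, keeping input order."""
    unique = list(dict.fromkeys(rules))  # removes exact duplicates
    kept = []
    for conditions, conclusion in unique:
        subsumed = any(
            other_conclusion == conclusion and other_conditions < conditions
            for other_conditions, other_conclusion in unique
        )
        if not subsumed:
            kept.append((conditions, conclusion))
    return kept


rules = [
    (frozenset({"A"}), "B"),        # if A then B
    (frozenset({"A"}), "B"),        # redundant copy
    (frozenset({"A", "C"}), "B"),   # subsumed by "if A then B"
]
print(fuse(rules))  # only "if A then B" remains
```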
Fission • Eliminate misclassification and contradiction • Misclassification • e: (A, C) • R: if A then B • Contradiction • R: if A then B or C • R1: if A then B • R2: if A then C
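Both situations can be sketched on a symbolic rule representation. This is an illustration under stated assumptions rather than the slides' bit-string operator: e: (A, C) is read as a case with feature A and class C, a rule is a pair (set of conditions, conclusion), and only the contradiction split is implemented, since the slide text does not say how a misclassifying rule is repaired.

```python
def misclassifies(rule: tuple[frozenset[str], str],
                  case: tuple[frozenset[str], str]) -> bool:
    """True if the rule fires on the case but predicts a different class."""
    (conditions, conclusion), (facts, label) = rule, case
    return conditions <= facts and conclusion != label


def split_contradiction(conditions: frozenset[str],
                        conclusions: set[str]) -> list[tuple[frozenset[str], str]]:
    """Split "if A then B or C" into one single-conclusion rule per class."""
    return [(conditions, conclusion) for conclusion in sorted(conclusions)]


rule = (frozenset({"A"}), "B")   # R: if A then B
case = (frozenset({"A"}), "C")   # e: (A, C)
print(misclassifies(rule, case))                          # True
print(split_contradiction(frozenset({"A"}), {"B", "C"}))  # R1 and R2
```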