A Generalized Version Space Learning Algorithm for Noisy and Uncertain Data T.-P. Hong, S.-S. Tseng IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 2, 1997 Presented 2002. 11. 14 by 임희웅
Introduction • A generalized learning strategy for version spaces (VS) • Handles noisy & uncertain training data • Two phases: searching & pruning • Trade-off between including positive training instances and excluding negative ones • Trade-off between computation time and accuracy, controlled by pruning factors
New Definition of S/G • Additional information: count • The sum of positive/negative evidence implicit in the training instances presented so far • S/G boundaries • S • The set of the first i maximally consistent hypotheses • No hypothesis in S is both more general than another member and has an equal or smaller count • G • The set of the first j maximally consistent hypotheses • No hypothesis in G is both more specific than another member and has an equal or smaller count
FIPI • FIPI • Factor of Including Positive Instances • Trades off including positive training instances vs. excluding negative ones • Real number in [0, 1] • 1: only including positive training instances matters • 0: only excluding negative training instances matters • 0.5: equal importance
Certainty Factor (CF) • A measure of how positive a training instance is • Real number in [-1, 1] • -1: certainly a negative example • 1: certainly a positive example • A new training instance with certainty factor CF contributes • to S: weight (1+CF)/2 as a positive example • to G: weight (1-CF)/2 as a negative example
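The CF-to-weight split on this slide can be sketched as a small helper (the function name `evidence_weights` is illustrative, not from the paper):

```python
def evidence_weights(cf):
    """Split a training instance with certainty factor cf in [-1, 1]
    into a positive weight (used by S) and a negative weight (used by G)."""
    positive = (1 + cf) / 2  # contribution as a positive example
    negative = (1 - cf) / 2  # contribution as a negative example
    return positive, negative
```

For example, a certainly positive instance (CF = 1) yields weights (1.0, 0.0), while a fully uncertain one (CF = 0) contributes 0.5 to each side.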
Learning Process • Searching & pruning • Searching • Generates and collects possible candidate hypotheses into a large set • Pruning • Prunes this set according to the degree of consistency of the hypotheses
Input & Output • Input • A set of n training instances, each with a CF • FIPI • i: the maximum number of hypotheses kept in S • j: the maximum number of hypotheses kept in G • Output • The hypotheses in sets S and G that are maximally consistent with the training instances
Step 1 & 2 • Step 1 • Initialize S = {φ} (the maximally specific hypothesis) and G = {<?, ?, …, ?>} (the maximally general hypothesis), each with count 0 • Step 2 • For each training instance with uncertainty CF, do Step 3 to Step 7
Step 3 – Search 1 • Generalize each hypothesis in S minimally to cover the new instance; specialize each hypothesis in G minimally to exclude it • ck: count of the original hypothesis in S/G • Attach a new count to each result • ck + (1+CF)/2 for hypotheses derived from S; ck + (1-CF)/2 for hypotheses derived from G • The resulting sets are S′/G′
Step 4 – Search 2 • Find the sets S″/G″ • S″: hypotheses that include only the new training instance itself; G″: hypotheses that exclude only it • Set the count of each hypothesis in S″ to (1+CF)/2 and in G″ to (1-CF)/2
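Steps 3–4 on the S side can be sketched for the classic conjunctive attribute-vector representation with a `'?'` wildcard. This representation and both function names are assumptions for illustration; the slides do not fix a hypothesis language:

```python
def minimally_generalize(hyp, instance):
    """Step 3 (S side): minimally generalize a conjunctive hypothesis to
    cover an instance by replacing each mismatching attribute with '?'.
    Assumed attribute-vector representation, not specified by the slides."""
    return tuple(h if h == x else '?' for h, x in zip(hyp, instance))

def s_double_prime(instance, cf):
    """Step 4: S'' holds the hypothesis covering only the new instance
    itself, initialized with count (1+cf)/2."""
    return {instance: (1 + cf) / 2}
```

For instance, generalizing `('red', 'round')` to cover `('blue', 'round')` yields `('?', 'round')`.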
Step 5 – Pruning 1 • Combine S, S′, and S″ (respectively G, G′, and G″) • Among identical hypotheses, retain only the one with the maximum count • If a hypothesis in S is both more general than another member and has an equal or smaller count, discard it; symmetrically, discard a hypothesis in G that is both more specific than another member and has an equal or smaller count
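A minimal sketch of the dominance pruning for the S side, again assuming the attribute-vector representation with a `'?'` wildcard (the names `more_general` and `prune_s` are illustrative):

```python
def more_general(h1, h2):
    """h1 is more general than (or equal to) h2 if every non-wildcard
    attribute of h1 agrees with h2 ('?' is the wildcard)."""
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def prune_s(counts):
    """Step 5 (S side): given {hypothesis: count} after identical
    hypotheses were merged by keeping the max count, discard any
    hypothesis that is more general than another member yet has an
    equal or smaller count."""
    kept = {}
    for h, c in counts.items():
        dominated = any(
            h != other and more_general(h, other) and c <= counts[other]
            for other in counts)
        if not dominated:
            kept[h] = c
    return kept
```

The G side is symmetric: there, the more specific of two members is discarded when its count is equal or smaller.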
Step 6 – Confidence Calc. • Compute a confidence for each new hypothesis • For each hypothesis s with count cs in the new S • Find the hypothesis g in the new G that is more general than s and has the maximum count cg • Confidence(s) = FIPI · cs + (1-FIPI) · cg • For each hypothesis g with count cg in the new G, do the same symmetrically: find the hypothesis s in the new S that is more specific than g and has the maximum count cs • Confidence(g) = FIPI · cs + (1-FIPI) · cg
[Diagram: the S boundary (specific) above the G boundary (general). For s in S with count cs: Confidence(s) = FIPI · cs + (1-FIPI) · max{cg : g in G more general than s}. For g in G with count cg: Confidence(g) = FIPI · max{cs : s in S more specific than g} + (1-FIPI) · cg.]
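The confidence of a hypothesis in S can be sketched as follows, assuming the same attribute-vector representation as above; `confidence_of_s` is an illustrative name, and the G side is the mirror image:

```python
def more_general(h1, h2):
    """h1 is more general than (or equal to) h2 if every non-wildcard
    attribute of h1 agrees with h2 ('?' is the wildcard)."""
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def confidence_of_s(s, cs, G, fipi):
    """Step 6 for a hypothesis s in S with count cs: combine cs with the
    largest count among hypotheses in G that are more general than s.
    G is assumed to be a {hypothesis: count} dict; a default of 0.0 is
    used when no such g exists (an assumption, not from the slides)."""
    cg = max((count for g, count in G.items() if more_general(g, s)),
             default=0.0)
    return fipi * cs + (1 - fipi) * cg
```

With FIPI near 1 the positive count cs dominates; with FIPI near 0 the negative-exclusion count cg dominates, which is exactly the trade-off FIPI controls.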
Step 7 – Pruning 2 • Retain only the i (resp. j) hypotheses with the highest confidence in the new S (resp. G)
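This final selection is a plain top-k cut by confidence; a minimal sketch (the name `keep_top` is illustrative):

```python
import heapq

def keep_top(confidences, k):
    """Step 7: retain only the k hypotheses with the highest confidence.
    confidences is assumed to be a {hypothesis: confidence} dict."""
    return dict(heapq.nlargest(k, confidences.items(), key=lambda kv: kv[1]))
```

Calling it with k = i on the new S and k = j on the new G bounds the boundary sets, which is where the time/accuracy trade-off mentioned in the introduction comes from: smaller i and j prune harder.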
Related Papers • GA • L. De Raedt et al., "A Unifying Framework for Concept-Learning Algorithms," Knowledge Engineering Review, vol. 7, no. 3, 1989 • R. G. Reynolds et al., "The Use of Version Space Controlled Genetic Algorithms to Solve the Boole Problem," Int'l J. Artificial Intelligence Tools, vol. 2, no. 2, 1993 • Fuzzy • C. C. Lee, "Fuzzy Logic in Control Systems: Fuzzy Logic Controller, Parts I and II," IEEE Trans. Systems, Man, and Cybernetics, vol. 20, no. 2, 1990 • L. X. Wang et al., "Generating Fuzzy Rules by Learning from Examples," Proc. IEEE Conf. Fuzzy Systems, 1992