
A Generalized Version Space Learning Algorithm for Noisy and Uncertain Data

T.-P. Hong and S.-S. Tseng, IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 2, 1997. Presented by 임희웅, 2002. 11. 14.


Presentation Transcript


  1. A Generalized Version Space Learning Algorithm for Noisy and Uncertain Data
     • T.-P. Hong, S.-S. Tseng
     • IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 2, 1997
     • 2002. 11. 14, 임희웅

  2. Introduction
     • A generalized learning strategy for version spaces (VS)
     • Handles noisy and uncertain training data
     • Works by searching and pruning
     • Trade-off between including positive training instances and excluding negative ones
     • Trade-off between computation time and accuracy, controlled by pruning factors

  3. New Definition of S/G
     • Additional information: count
       • The sum of the positive/negative information implicit in the training instances presented so far.
     • S/G boundary
       • S: the set of the first i maximally consistent hypotheses.
         For each hypothesis in S, no other hypothesis in S is both more specific and has an equal or larger count.
       • G: the set of the first j maximally consistent hypotheses.
         For each hypothesis in G, no other hypothesis in G is both more general and has an equal or larger count.

  4. FIPI
     • Factor of Including Positive Instances
     • Sets the trade-off between including positive training instances and excluding negative ones
     • A real number between 0 and 1
       • 1: only include positive training examples
       • 0: only exclude negative training examples
       • 0.5: both are equally important

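  5. Certainty Factor (CF)
     • A measure of how positive a training example is
     • A real number between -1 and 1
       • -1: a definitely negative example
       • 1: a definitely positive example
     • A new training example with certainty CF counts as
       • (1+CF)/2 of a positive example when updating S
       • (1-CF)/2 of a negative example when updating G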
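
The CF split on this slide can be written as a small helper. This is only a sketch in Python; the function name `cf_weights` is hypothetical and does not come from the paper.

```python
from typing import Tuple


def cf_weights(cf: float) -> Tuple[float, float]:
    """Split a certainty factor in [-1, 1] into (positive, negative) weights.

    The positive weight (1+CF)/2 is added to counts in S,
    the negative weight (1-CF)/2 is added to counts in G.
    """
    if not -1.0 <= cf <= 1.0:
        raise ValueError("CF must lie in [-1, 1]")
    return (1.0 + cf) / 2.0, (1.0 - cf) / 2.0
```

The two weights always sum to 1, so a CF of 0 treats the instance as half positive and half negative.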

  6. Learning Process
     • Searching and pruning
       • Searching: generate possible candidate hypotheses and collect them into a large set
       • Pruning: prune that set according to how consistent each hypothesis is with the training instances

  7. Learning Process

  8. Input & Output
     • Input
       • A set of n training instances, each with a CF
       • FIPI
       • i: the maximum number of hypotheses in S
       • j: the maximum number of hypotheses in G
     • Output
       • The hypotheses in S and G that are maximally consistent with the training instances

  9. Step 1 & 2
     • Step 1
       • Initialize S to contain only the most specific (null) hypothesis and G to contain only the most general hypothesis <?>, each with count 0
     • Step 2
       • For each training instance with certainty CF, do Step 3 to Step 7

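  10. Step 3 – Search 1
     • Generalize each hypothesis in S / specialize each hypothesis in G so that it includes / excludes the new training instance
     • ck: the count of the original hypothesis in S/G
     • Attach the new count: ck + (1+CF)/2 for S, ck + (1-CF)/2 for G
     • The results form the sets S’/G’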
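
A minimal sketch of the S side of Step 3, under assumptions the slides do not state: hypotheses are tuples of attribute values with "?" as a wildcard, the null hypothesis is represented by `None`, and S is a dict from hypothesis to count. The names `generalize` and `search1_s` are hypothetical.

```python
WILDCARD = "?"


def generalize(hypothesis, instance):
    """Minimally generalize a conjunctive hypothesis so it covers the instance."""
    if hypothesis is None:  # assumed encoding of the null (most specific) hypothesis
        return tuple(instance)
    # Keep attributes that already match; relax mismatches to the wildcard.
    return tuple(h if h == x else WILDCARD for h, x in zip(hypothesis, instance))


def search1_s(S, instance, cf):
    """Build S' from S: generalized hypotheses with count ck + (1+CF)/2."""
    gain = (1.0 + cf) / 2.0
    s_prime = {}
    for h, count in S.items():
        g = generalize(h, instance)
        # If two hypotheses collapse to the same one, keep the larger count.
        s_prime[g] = max(s_prime.get(g, 0.0), count + gain)
    return s_prime
```

The G side would specialize instead of generalize and add (1-CF)/2; it needs the attribute domains, so it is left out of this sketch.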

  11. Step 4 – Search 2
     • Find the sets S”/G”
       • S”: hypotheses that include only the new training instance itself
       • G”: hypotheses that exclude only the new training instance itself
     • Set the count of each hypothesis in S” to (1+CF)/2 and in G” to (1-CF)/2
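
One way Step 4 might look for conjunctive attribute hypotheses is sketched below. It assumes tuple hypotheses with "?" as a wildcard and a known finite domain per attribute (the `domains` argument); neither detail is given on the slides, and `search2` is a hypothetical name. S” is taken to be the instance itself, and G” the most general hypotheses that differ from the instance in exactly one attribute.

```python
WILDCARD = "?"


def search2(instance, cf, domains):
    """Return (S'', G'') for a new instance with certainty factor cf."""
    pos = (1.0 + cf) / 2.0
    neg = (1.0 - cf) / 2.0
    # S'': the single hypothesis covering only the instance itself.
    s2 = {tuple(instance): pos}
    # G'': fix one attribute to a value the instance does not have,
    # leave every other attribute as a wildcard.
    g2 = {}
    for i, domain in enumerate(domains):
        for value in domain:
            if value != instance[i]:
                h = tuple(value if j == i else WILDCARD
                          for j in range(len(instance)))
                g2[h] = neg
    return s2, g2
```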

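  12. Step 5 – Pruning 1
     • Combine S, S’, and S” into the new S; combine G, G’, and G” into the new G
     • Among identical hypotheses, retain only the one with the maximum count
     • In S, discard a hypothesis that is more general than another and has an equal or smaller count; in G, discard a hypothesis that is more specific than another and has an equal or smaller count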
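
The first pruning step for S can be sketched as follows, again assuming tuple hypotheses with "?" as a wildcard and dicts from hypothesis to count. The helper names are hypothetical.

```python
WILDCARD = "?"


def more_general(a, b):
    """True if hypothesis a is strictly more general than hypothesis b."""
    return a != b and all(x == WILDCARD or x == y for x, y in zip(a, b))


def combine(*candidate_sets):
    """Merge candidate sets; identical hypotheses keep their maximum count."""
    merged = {}
    for candidates in candidate_sets:
        for h, count in candidates.items():
            merged[h] = max(count, merged.get(h, count))
    return merged


def prune_s(candidates):
    """Drop hypotheses that are more general than another one
    while having an equal or smaller count."""
    return {
        h: c
        for h, c in candidates.items()
        if not any(more_general(h, k) and c <= ck
                   for k, ck in candidates.items())
    }
```

For G the same filter applies with the direction reversed, that is, `more_general(k, h)` in place of `more_general(h, k)`.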

  13. Step 6 – Confidence Calc.
     • Compute the confidence of each new hypothesis
     • For each hypothesis s with count cs in the new S
       • Find the hypothesis g in the new G that is more general than s and has the maximum count cg
       • Confidence = FIPI × cs + (1-FIPI) × cg
     • For each hypothesis g with count cg in the new G
       • Do the same with the roles exchanged: find the hypothesis s in the new S that is more specific than g and has the maximum count cs

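  14. [Diagram: confidence calculation between S (specific end) and G (general end)]
     • For s in S with count cs: Confidence of s = FIPI × cs + (1-FIPI) × max(cg), where the maximum is over the hypotheses g in G that are more general than s
     • For g in G with count cg: Confidence of g = FIPI × max(cs) + (1-FIPI) × cg, where the maximum is over the hypotheses s in S that are more specific than g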
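
The confidence of a hypothesis s in S could be computed along these lines. The sketch assumes tuple hypotheses with "?" as a wildcard and G stored as a dict from hypothesis to count. Falling back to a count of 0 when no hypothesis in G is more general than s is an assumption of this sketch, not something the slides specify.

```python
WILDCARD = "?"


def covers(g, s):
    """True if g is more general than (or equal to) s."""
    return all(x == WILDCARD or x == y for x, y in zip(g, s))


def confidence_of_s(s, cs, G, fipi):
    """Confidence = FIPI * cs + (1 - FIPI) * max cg over g in G covering s."""
    # default=0.0 is an assumed fallback for the case where nothing in G covers s.
    cg = max((count for g, count in G.items() if covers(g, s)), default=0.0)
    return fipi * cs + (1.0 - fipi) * cg
```

With FIPI = 1 the result depends only on cs, and with FIPI = 0 only on cg, which matches the role of FIPI as a weighting between the two counts.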

  15. Step 7 – Pruning 2
     • Keep only the i hypotheses with the highest confidence in the new S, and only the j hypotheses with the highest confidence in the new G
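
The second pruning step is a top-k selection by confidence. A short sketch, assuming the confidences are held in a dict from hypothesis to confidence value (`select_top` is a hypothetical name):

```python
import heapq


def select_top(confidences, k):
    """Keep the k hypotheses with the highest confidence.

    Call with k = i for S and k = j for G.
    """
    best = heapq.nlargest(k, confidences.items(), key=lambda item: item[1])
    return dict(best)
```

Smaller values of i and j mean less work in later iterations but a higher chance of discarding a hypothesis that would have turned out to be good, which is the time/accuracy trade-off mentioned in the introduction.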

  16. Other Papers
     • GA
       • L. De Raedt, et al., “A Unifying Framework for Concept-Learning Algorithms”, Knowledge Engineering Rev., vol. 7, no. 3, 1989
       • R. G. Reynolds, et al., “The Use of Version Space Controlled Genetic Algorithms to Solve the Boole Problem”, Int’l J. Artificial Intelligence Tools, vol. 2, no. 2, 1993
     • Fuzzy
       • C. C. Lee, “Fuzzy Logic in Control Systems: Fuzzy Logic Controller Part 1 & 2”, IEEE Trans. Systems, Man, and Cybernetics, vol. 20, no. 2, 1990
       • L. X. Wang, et al., “Generating Fuzzy Rules by Learning from Examples”, Proc. IEEE Conf. Fuzzy Systems, 1992
