Can Inductive Learning Work?

[Figure: a learning algorithm L produces an inductive hypothesis h from the hypothesis space H, given a training set D picked from the example set X; examples are labeled + or -]
• X: example set
• p(x): probability that example x is picked from X
• D: training set, of size m
• H: hypothesis space, of size |H|
• h: inductive hypothesis, i.e., a hypothesis that agrees with all examples in D

Approximately Correct Hypothesis
• A hypothesis $h \in H$ is approximately correct (AC) with accuracy $\epsilon$ iff: $\Pr[h(x) \text{ correct}] > 1 - \epsilon$, where x is an example picked with probability distribution p from X
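
As a minimal sketch of this definition, the accuracy Pr[h(x) correct] can be estimated by picking examples according to p and counting how often h gives the correct answer. Everything named below is a hypothetical illustration that does not come from the slides: the helper `estimate_accuracy`, the toy target "x is even", the hypothesis that errs only on x = 0, and the uniform distribution over 0..99. The result is a sampling estimate, so it only approximates the true probability.

```python
import random

def estimate_accuracy(h, target, pick_example, n_samples=10_000, seed=0):
    """Monte Carlo estimate of Pr[h(x) correct], with x picked by pick_example."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x = pick_example(rng)
        hits += h(x) == target(x)   # True counts as 1, False as 0
    return hits / n_samples

def is_approximately_correct(accuracy, epsilon):
    """AC test from the definition: accuracy must exceed 1 - epsilon."""
    return accuracy > 1 - epsilon

# Hypothetical toy setting (an assumption for illustration only).
def target(x):
    return x % 2 == 0               # the "true" concept: x is even

def h(x):
    return x % 2 == 0 and x != 0    # wrong on x = 0 only, so true accuracy is 0.99

def uniform_pick(rng):
    return rng.randrange(100)       # p is uniform over X = {0, ..., 99}

acc = estimate_accuracy(h, target, uniform_pick)
print(acc, is_approximately_correct(acc, epsilon=0.05))
```

In this toy setting h is AC for epsilon = 0.05 but not for a much tighter epsilon such as 0.001, since its true accuracy is 0.99.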

PAC Learning Algorithm
• A learning algorithm L is Probably Approximately Correct (PAC) with confidence $1 - \gamma$ iff the probability that it generates a non-AC hypothesis h is at most $\gamma$: $\Pr[h \text{ is non-AC}] \le \gamma$
• Can L be PAC if the size m of the training set D is large enough?
• If yes, how big should m be?

Intuition
• If m is large enough and a hypothesis $g \in H$ is not AC, it is unlikely that g agrees with all examples in the training set D
• So, if m is large enough, there should be few non-AC hypotheses that agree with all examples in D
• Hence, it is unlikely that L will pick one
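
The intuition can be checked on a small simulation. The sketch below is a hypothetical toy setting, not taken from the slides: X = {0, ..., 99} with uniform p, a target concept "x >= 50", and a hypothesis space of 101 threshold rules "x >= t". It counts how many non-AC hypotheses still agree with every example in a randomly picked training set of size m.

```python
import random

def count_bad_consistent(m, epsilon=0.1, seed=0):
    """Count non-AC threshold hypotheses that agree with all m training examples."""
    rng = random.Random(seed)
    target_t = 50                                  # assumed target: x >= 50
    xs = [rng.randrange(100) for _ in range(m)]    # examples picked uniformly from X
    training_set = [(x, x >= target_t) for x in xs]

    bad = 0
    for t in range(101):                           # hypothesis h_t(x) = (x >= t)
        error = abs(t - target_t) / 100            # exact error of h_t under uniform p
        if error < epsilon:
            continue                               # h_t is AC, not of interest here
        if all((x >= t) == label for x, label in training_set):
            bad += 1                               # non-AC yet consistent with D
    return bad

for m in (0, 10, 50, 200):
    print(m, count_bad_consistent(m))
```

With m = 0 every non-AC hypothesis is trivially consistent; as m grows, the count can only shrink for a fixed seed, because each larger training set extends the smaller one.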

Can L Be PAC?
Reminder: $h \in H$ is AC iff $\Pr[h(x) \text{ correct}] > 1 - \epsilon$, and L is PAC if $\Pr[h \text{ is non-AC}] \le \gamma$
• Let g be an arbitrary hypothesis in H that is not approximately correct
• Since g is not AC, we have: $\Pr[g(x) \text{ correct}] \le 1 - \epsilon$
• The probability that g is consistent with all m examples in D, each picked independently according to p, is at most $(1 - \epsilon)^m$
• The probability that there exists a non-AC hypothesis matching all examples in D is at most $|H| (1 - \epsilon)^m$
• Therefore, L is PAC if m satisfies: $|H| (1 - \epsilon)^m \le \gamma$

Calculus
• $H = \{h_1, h_2, \ldots, h_{|H|}\}$
• $\Pr(h_i \text{ is not AC and agrees with } D) \le (1 - \epsilon)^m$
• $\Pr(h_1 \text{, or } h_2 \text{, ..., is not AC and agrees with } D) \le \sum_{i=1}^{|H|} \Pr(h_i \text{ is not AC and agrees with } D) \le |H| (1 - \epsilon)^m$
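
To get a feel for the bound, here is a small sketch that evaluates |H| (1 - epsilon)^m, where epsilon is the accuracy parameter, for a few training set sizes. The function name `union_bound` and the sample values are assumptions chosen for illustration. Note that the bound exceeds 1 for small m, in which case it says nothing useful.

```python
def union_bound(h_size, epsilon, m):
    """Upper bound |H| * (1 - epsilon)^m on the probability that some
    non-AC hypothesis agrees with all m training examples."""
    return h_size * (1 - epsilon) ** m

# Hypothetical values: |H| = 1000 hypotheses, accuracy parameter epsilon = 0.1.
for m in (10, 50, 100, 200):
    print(m, union_bound(1000, 0.1, m))
```

The bound falls geometrically in m, which is why a moderately larger training set is enough to compensate for a much larger hypothesis space.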

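Size of Training Set
• From $|H| (1 - \epsilon)^m \le \gamma$ we derive: $m \ge \ln(\gamma / |H|) / \ln(1 - \epsilon)$
• Since $\epsilon < -\ln(1 - \epsilon)$ for $0 < \epsilon < 1$, it suffices that: $m \ge \ln(\gamma / |H|) / (-\epsilon)$, i.e., $m \ge \ln(|H| / \gamma) / \epsilon$
• So, m increases logarithmically with the size of the hypothesis space
But how big is |H|?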
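
The two conditions on m can be turned into a short calculation. The sketch below, with hypothetical function names and sample values, computes the smallest integer m allowed by the exact condition (dividing by ln(1 - epsilon)) and by the simpler sufficient condition (dividing by epsilon), where epsilon is the accuracy parameter and gamma the allowed failure probability.

```python
import math

def sample_size_exact(h_size, epsilon, gamma):
    """Smallest integer m with m >= ln(gamma/|H|) / ln(1 - epsilon)."""
    return math.ceil(math.log(gamma / h_size) / math.log(1 - epsilon))

def sample_size_simple(h_size, epsilon, gamma):
    """Smallest integer m with m >= ln(|H|/gamma) / epsilon (sufficient, slightly larger)."""
    return math.ceil(math.log(h_size / gamma) / epsilon)

# Hypothetical values: accuracy parameter 0.1, confidence 1 - gamma = 0.95.
for h_size in (10**3, 10**6, 10**9):
    print(h_size,
          sample_size_exact(h_size, 0.1, 0.05),
          sample_size_simple(h_size, 0.1, 0.05))
```

Multiplying |H| by a thousand adds only a fixed number of examples to m, which is the logarithmic growth stated above.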

Importance of KIS Bias
• If H is the set of all logical sentences with n observable predicates, then $|H| = 2^{2^n}$, and m is exponential in n
• If H is the set of all conjunctions of $k \ll n$ observable predicates picked among n predicates, then $|H| = O(n^k)$ and m is logarithmic in n
• Hence the importance of choosing a “good” KIS bias
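
The contrast between the two hypothesis spaces can be made concrete with the sufficient condition m >= ln(|H| / gamma) / epsilon, with epsilon the accuracy parameter and gamma the allowed failure probability. In the sketch below the function names and sample values are hypothetical, and n^k is used as a stand-in for the O(n^k) count of conjunctions, which is an assumption: the exact count depends on details such as whether negated predicates are allowed. Logarithms of |H| are computed directly so that the doubly exponential |H| never has to be formed.

```python
import math

def m_all_sentences(n, epsilon, gamma):
    """Training set size when |H| = 2^(2^n): ln|H| = 2^n * ln 2."""
    ln_h = (2 ** n) * math.log(2)
    return math.ceil((ln_h + math.log(1 / gamma)) / epsilon)

def m_k_conjunctions(n, k, epsilon, gamma):
    """Training set size when |H| is taken to be n^k (stand-in for O(n^k))."""
    ln_h = k * math.log(n)
    return math.ceil((ln_h + math.log(1 / gamma)) / epsilon)

# Hypothetical values: epsilon = 0.1, gamma = 0.05, conjunctions of k = 3 predicates.
for n in (10, 20, 30):
    print(n, m_all_sentences(n, 0.1, 0.05), m_k_conjunctions(n, 3, 0.1, 0.05))
```

Under these assumptions, each additional predicate roughly doubles m for the unrestricted space, while for the conjunctions m grows only with ln n.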
