PAC Learning

PAC Learning 8/5/2005

Purpose • An effort to understand the negative selection algorithm from entirely different perspectives: • Statistics • Machine learning • What is machine learning, informally? • Looking for a mathematical tool to describe, analyze, and evaluate either a learning algorithm or a learning problem.

Background • The PAC learning framework is a branch of computational learning theory. • Computational learning theory is a mathematical field concerned with the analysis of machine learning algorithms. It is sometimes considered a field of statistics. • Machine learning algorithms take a training set, form hypotheses or models, and make predictions about the future. Because the training set is finite and the future is uncertain, learning theory usually does not yield absolute guarantees on the performance of the algorithms. Instead, probabilistic bounds on the performance of machine learning algorithms are common.

More about computational learning theory • In addition to performance bounds, computational learning theorists study the time complexity and feasibility of learning. • In computational learning theory, a computation is considered feasible if it can be done in polynomial time.

More about computational learning theory • There are several different approaches to computational learning theory, which are often mathematically incompatible. • This incompatibility arises from • using different inference principles: principles that tell you how to generalize from limited data. • differing definitions of probability (frequentist probability, Bayesian probability).

More about computational learning theory • The different approaches include: • Probably approximately correct learning (PAC learning), proposed by Leslie Valiant; • VC theory, proposed by Vladimir Vapnik; • Bayesian inference, arising from work first done by Thomas Bayes; • Algorithmic learning theory, from the work of E. M. Gold. • Computational learning theory has led to practical algorithms. For example, PAC theory inspired boosting.

What is this for? • The PAC framework enables rigorous mathematical analysis of learning.

Basic facts of PAC learning • Probably approximately correct learning (PAC learning) is a framework of learning that was proposed by Leslie Valiant in his 1984 paper “A theory of the learnable”. • In this framework the learner receives samples that are classified according to a function from a certain class. The aim of the learner is to find, with high probability, an approximation of the function. We require the learner to be able to learn the concept for any approximation accuracy, any probability of success, and any distribution of the samples. • How does negative selection fit in? We only deal with a very special distribution of the samples: samples from one class only. Is it a PAC learning algorithm?

“The intent of the PAC model is that successful learning of an unknown target concept should entail obtaining, with high probability, a hypothesis that is a good approximation of it.” • We can consider this target concept as an unknown function, e.g. $f:\{0,1\}^n \to \{0,1\}$; the result we pursue is an approximation of f, called a hypothesis here. • The purpose of discussing PAC is to decide whether an algorithm that finds the approximation is (1) good enough or not, and (2) feasible or not. • “If we wish to define a model of learning from (random) samples, a crucial point is to formulate ‘correctly’ the notion of success.” (quoted but corrected and highlighted)

To keep the discussion simple, let us use the setup $f:\{0,1\}^n \to \{0,1\}$ • Instance space: $\{0,1\}^n$

Given a probability distribution D defined on $\{0,1\}^n$, the error of a hypothesis h with respect to a fixed target concept c is defined as $\mathrm{error}(h) = D(h \,\Delta\, c)$, where $\Delta$ denotes the symmetric difference (viewing h and c as the sets of instances they label 1). In other words, error(h) is the probability that h and c disagree on an instance drawn according to D. The hypothesis h is a good approximation of the target concept c if error(h) is small. (Note that error(h) depends on D.)
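As a small illustration of this definition, the sketch below computes error(h) exactly by summing D over the points of $\{0,1\}^n$ where h and c disagree. The concepts c and h, the helper names, and the two distributions are hypothetical examples, chosen only to show that the same h can have a different error under a different D.

```python
from itertools import product

def error(h, c, D, n):
    """error(h) = D(h Δ c): total D-mass of the points of {0,1}^n where h and c disagree."""
    return sum(D(x) for x in product((0, 1), repeat=n) if h(x) != c(x))

def uniform(n):
    """Uniform distribution on {0,1}^n: every point has mass 2^-n."""
    return lambda x: 1.0 / 2 ** n

def skewed(x):
    """Hypothetical D on {0,1}^3 that puts all its mass on the 4 points with x[1] == 1."""
    return 0.25 if x[1] == 1 else 0.0

n = 3
c = lambda x: x[0] & x[1]   # hypothetical target concept: x[0] AND x[1]
h = lambda x: x[0]          # hypothetical hypothesis: x[0]

# h and c disagree exactly when x[0] == 1 and x[1] == 0 (2 of the 8 points).
print(error(h, c, uniform(n), n))  # 0.25
print(error(h, c, skewed, n))      # 0.0: this D gives those points no mass
```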

Definition of “PAC Learnability” • This definition is the centerpiece of the PAC learning model. • Defining when the concept class C is: • PAC learnable by the hypothesis space H • Properly PAC learnable • PAC learnable

What is the concept class C? $C = \{C_n\}_{n \ge 1}$, where $C_n$ is a set of target concepts over $\{0,1\}^n$ • What is the hypothesis space H? $H = \{H_n\}_{n \ge 1}$, where $H_n$ is a set of hypotheses over $\{0,1\}^n$

Definition of “PAC learnable by the hypothesis space H”: • The concept class C is PAC learnable by the hypothesis space H if there exist a polynomial time algorithm A and a polynomial $p(\cdot,\cdot,\cdot)$ such that for all $n \ge 1$, all target concepts $c \in C_n$, all probability distributions D on the instance space $\{0,1\}^n$, and all $\varepsilon$ and $\delta$ with $0 < \varepsilon, \delta < 1$: if the algorithm A is given at least $p(n, 1/\varepsilon, 1/\delta)$ independent random examples of c drawn according to D, then with probability at least $1-\delta$, A returns a hypothesis $h \in H_n$ with $\mathrm{error}(h) \le \varepsilon$. • Note: this talks about the existence of A, not what exactly A is. • The smallest such polynomial p is called the sample complexity of the learning algorithm A. • Sample complexity is as essential to a learning algorithm as time complexity is to a general algorithm.
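To make “polynomial sample size” concrete: a standard result, not stated on these slides, says that when $H_n$ is finite, any algorithm that outputs a hypothesis in $H_n$ consistent with all of its examples achieves $\mathrm{error}(h) \le \varepsilon$ with probability at least $1-\delta$ once it has $m \ge \frac{1}{\varepsilon}\left(\ln|H_n| + \ln\frac{1}{\delta}\right)$ examples. The helper name below is hypothetical, and the example plugs in conjunctions of literals, for which $|H_n| \le 3^n + 1$.

```python
import math

def consistent_learner_sample_size(ln_H, eps, delta):
    """Smallest integer m with m >= (1/eps) * (ln|H_n| + ln(1/delta)).

    ln_H is ln|H_n|, passed as a logarithm so that huge hypothesis spaces do not overflow.
    """
    return math.ceil((ln_H + math.log(1 / delta)) / eps)

# Conjunctions of literals over n variables: each variable appears positively,
# negatively or not at all (3^n choices), plus one always-false conjunction.
n = 10
print(consistent_learner_sample_size(math.log(3 ** n + 1), eps=0.1, delta=0.05))  # 140
```

Because $\ln|H_n|$ grows only linearly in n here, the resulting m is polynomial in n, $1/\varepsilon$ and $1/\delta$, as the definition requires.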

Definition of “properly PAC learnable” • If C is PAC learnable by H with H = C • Definition of “PAC learnable” • If C is a concept class and there exists some hypothesis space H such that hypotheses in H can be evaluated on given instances in polynomial time and such that C is PAC learnable by H • This extends the definition from “for a given H” to “there exists an H” • If C is properly PAC learnable, it is obviously PAC learnable (assuming hypotheses in C can be evaluated on given instances in polynomial time)
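A classic example of proper learning is the elimination algorithm for conjunctions of literals (usually credited to Valiant's 1984 paper): the learner outputs a conjunction, so H = C. The sketch below is a minimal version under stated assumptions: a conjunction is represented as a set of hypothetical (index, required_bit) pairs, negative examples are ignored, every literal falsified by a positive example is dropped, and examples are drawn from the uniform distribution.

```python
import random

def learn_conjunction(n, sample):
    """Elimination algorithm: start from all 2n literals, drop each one a positive example falsifies."""
    literals = {(i, b) for i in range(n) for b in (0, 1)}  # (i, b) means "x[i] == b"
    for x, label in sample:
        if label == 1:
            literals = {(i, b) for (i, b) in literals if x[i] == b}
    return literals  # negative examples are never used

def evaluate(literals, x):
    """A conjunction labels x with 1 iff x satisfies every literal in it."""
    return int(all(x[i] == b for (i, b) in literals))

rng = random.Random(0)
n = 5
target = {(0, 1), (3, 0)}  # hypothetical target concept: x[0] == 1 AND x[3] == 0
sample = []
for _ in range(200):
    x = tuple(rng.randint(0, 1) for _ in range(n))  # uniform D, as an assumption
    sample.append((x, evaluate(target, x)))

h = learn_conjunction(n, sample)
print(target <= h)                                    # True
print(all(evaluate(h, x) == y for x, y in sample))    # True
```

Since the target's own literals are never falsified by a positive example, the hypothesis always contains them, so it never labels a negative example 1 and is consistent with the whole sample.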

There are many variants of the basic definition. • Many of them can be shown to be equivalent. • The model can be extended in various directions.

We ask for a single algorithm A that works for all distributions • Not that for every distribution D there exists an algorithm designed for that specific distribution D • That means: algorithm A does not know the distribution.

A key part of PAC learning, and the potential link to the negative selection algorithm (NSA) we are trying to make (if one exists at all): the probability distribution D • “The error probability is measured with respect to the same distribution according to which the random examples are chosen.” “If the learning algorithm will get random examples from a distribution which provides only samples with first bits 0 and the error will be measured with respect to distribution on strings whose first bit is 1 then clearly the learning algorithm has no chance to do much.” • NSA, at least my method, seems to be doing something of the “no chance to do much” kind described above, with a little help from the magic “self threshold (or self radius)” • Is NSA’s notion of success well defined at all?
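The quoted scenario can be reproduced in a few lines. In this hypothetical setup the target is the first bit itself, the learner is a simple elimination learner for conjunctions of literals, the training examples are all strings whose first bit is 0, and the error is then measured under the uniform distribution on strings whose first bit is 1. It is a sketch of the distribution mismatch, not a model of any particular NSA; all helper names are illustrative.

```python
from itertools import product

def learn_conjunction(n, sample):
    """Elimination learner: drop every literal (i, b) falsified by a positive example."""
    literals = {(i, b) for i in range(n) for b in (0, 1)}
    for x, label in sample:
        if label == 1:
            literals = {(i, b) for (i, b) in literals if x[i] == b}
    return literals

def evaluate(literals, x):
    return int(all(x[i] == b for (i, b) in literals))

def error_uniform_on(points, h, c):
    """Exact error of h w.r.t. c under the uniform distribution on the given points."""
    return sum(h(x) != c(x) for x in points) / len(points)

n = 4
cube = list(product((0, 1), repeat=n))
train_pts = [x for x in cube if x[0] == 0]  # training distribution: first bit 0
test_pts = [x for x in cube if x[0] == 1]   # error measured here: first bit 1

c = lambda x: x[0]                          # hypothetical target: "the first bit is 1"
sample = [(x, c(x)) for x in train_pts]     # every training label is 0
lits = learn_conjunction(n, sample)         # no positives, so all 2n literals survive
h = lambda x: evaluate(lits, x)             # a contradictory conjunction: h is always 0

print(error_uniform_on(train_pts, h, c))  # 0.0: perfect on the training distribution
print(error_uniform_on(test_pts, h, c))   # 1.0: wrong on every test point
```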

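What does it mean to say “L is a PAC learning algorithm”? • For any given $\delta, \varepsilon > 0$, there is a sample size $m_0$ such that for all computable target functions t and all probability distributions P, we have $m \ge m_0 \implies P^m(\mathrm{error}(L(s), t) > \varepsilon) < \delta$, where s is a sample of m examples drawn independently according to P and labeled by t.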
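The quantity $P^m(\mathrm{error}(L(s), t) > \varepsilon)$ can also be estimated by simulation. The sketch below assumes L is an elimination learner for conjunctions, P is the uniform distribution on $\{0,1\}^n$ with n small enough to compute the error exactly by enumeration, and t is a hypothetical two-literal target; `failure_rate` and the other names are illustrative. It reports the fraction of trials in which the learned hypothesis has error above ε.

```python
import random
from itertools import product

def learn_conjunction(n, sample):
    """Elimination learner L: drop every literal falsified by a positive example."""
    literals = {(i, b) for i in range(n) for b in (0, 1)}
    for x, label in sample:
        if label == 1:
            literals = {(i, b) for (i, b) in literals if x[i] == b}
    return literals

def evaluate(literals, x):
    return int(all(x[i] == b for (i, b) in literals))

def failure_rate(n, target, m, eps, trials, seed=0):
    """Monte Carlo estimate of P^m(error(L(s), t) > eps) for uniform P on {0,1}^n."""
    rng = random.Random(seed)
    cube = list(product((0, 1), repeat=n))
    failures = 0
    for _ in range(trials):
        # Draw s ~ P^m: m independent uniform points, labelled by the target t.
        s = []
        for _ in range(m):
            x = tuple(rng.randint(0, 1) for _ in range(n))
            s.append((x, evaluate(target, x)))
        h = learn_conjunction(n, s)
        # Exact error(L(s), t) under the uniform P, by enumerating all 2^n points.
        err = sum(evaluate(h, x) != evaluate(target, x) for x in cube) / len(cube)
        failures += err > eps
    return failures / trials

t = {(0, 1), (1, 1)}  # hypothetical target: x[0] == 1 AND x[1] == 1
for m in (5, 50, 200):
    print(m, failure_rate(n=6, target=t, m=m, eps=0.1, trials=200))
```

With these settings the estimated failure rate should be high for a tiny sample and close to 0 for a large one, which is the behaviour the definition asks for once $m \ge m_0$.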

Unanswered questions • How does negative selection algorithm fit into the model of PAC learning? • Does NSA count as a learning process or algorithm at all?

References • D. Haussler. Probably approximately correct learning. In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90), Boston, MA, pages 1101–1108. American Association for Artificial Intelligence, 1990. http://citeseer.ist.psu.edu/haussler90probably.html • http://en.wikipedia.org/wiki/Probably_approximately_correct_learning • http://en.wikipedia.org/wiki/Computational_learning_theory • …
