Statistical Learning Methods Marco Loog
Introduction • Agents can handle uncertainty by using the methods of probability and decision theory • But first they must learn their probabilistic theories of the world from experience...
Key Concepts • Data : evidence, i.e., instantiation of one or more random variables describing the domain • Hypotheses : probabilistic theories of how the domain works
Outline • Bayesian learning • Maximum a posteriori and maximum likelihood learning • Instance-based learning • Neural networks
Bayesian Learning • Let D be all data, with observed value d, then the probability of a hypothesis hi, using Bayes' rule : P(hi|d) = α P(d|hi) P(hi) • For prediction about quantity X : P(X|d) = ∑i P(X|d,hi) P(hi|d) = ∑i P(X|hi) P(hi|d)
Bayesian Learning • For prediction about quantity X : P(X|d) = ∑i P(X|d,hi) P(hi|d) = ∑i P(X|hi) P(hi|d) • No single best-guess hypothesis
Bayesian Learning • Simply calculate the probability of each hypothesis, given the data, and make predictions based on this • I.e., predictions are based on all hypotheses, weighted by their probabilities, rather than on only a ‘single best’ hypothesis
Candy • Suppose five kinds of bags of candies • 10% are h1 : 100% cherry candies • 20% are h2 : 75% cherry candies + 25% lime candies • 40% are h3 : 50% cherry candies + 50% lime candies • 20% are h4 : 25% cherry candies + 75% lime candies • 10% are h5 : 100% lime candies • We observe candies drawn from some bag
Mo’ Candy • We observe candies drawn from some bag • Assume observations are i.i.d., e.g. because there are many candies in the bag • Assume we don’t like the green lime candy • Important questions • What kind of bag is it? h1, h2, ..., h5? • What flavor will the next candy be?
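To make the candy example concrete, here is a minimal Python sketch of full Bayesian updating; the priors and per-hypothesis lime probabilities come from the slides, while the function and variable names are our own illustrative choices.

```python
# Full Bayesian learning for the candy example: update P(hi | d) after each
# observed candy and predict the flavor of the next one.

priors = [0.1, 0.2, 0.4, 0.2, 0.1]            # P(h1), ..., P(h5)
p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]          # P(lime | hi) for each bag type

def posterior(observations):
    """P(hi | d) for a sequence of observations ('lime' or 'cherry')."""
    post = list(priors)
    for obs in observations:
        likelihoods = [p if obs == 'lime' else 1 - p for p in p_lime]
        post = [lk * p for lk, p in zip(likelihoods, post)]
        norm = sum(post)                       # the normalization constant alpha
        post = [p / norm for p in post]
    return post

def predict_lime(observations):
    """P(next candy is lime | d) = sum_i P(lime | hi) P(hi | d)."""
    post = posterior(observations)
    return sum(p * q for p, q in zip(p_lime, post))

if __name__ == '__main__':
    d = ['lime'] * 5                           # we unwrapped five limes in a row
    print(posterior(d))                        # h5 (all lime) dominates
    print(predict_lime(d))                     # close to 1
```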
Posterior Probability of Hypotheses • True hypothesis will eventually dominate the Bayesian prediction [prior is of no influence in the long run] • More importantly [maybe not for us?] : Bayesian prediction is optimal
The Price for Being Optimal • For real learning problems the hypothesis space is large, possibly infinite • Summation / integration over hypotheses cannot be carried out • Resort to approximate or simplified methods
Maximum A Posteriori • Common approximation method : make predictions based on the single most probable hypothesis • I.e. take the hi that maximizes P(hi|d) • Predictions from such a MAP hypothesis are approximately Bayesian, i.e., P(X|d) ≈ P(X|hi) [the more evidence the better the approximation]
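A small sketch of the difference for the candy example, using the same prior and lime probabilities as above; the names are illustrative only.

```python
# MAP prediction for the candy example: pick the single hypothesis with the
# highest posterior and predict using it alone.

p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]          # P(lime | hi) for each bag type

def map_predict_lime(posteriors):
    """P(lime | h_MAP): prediction from the most probable hypothesis only."""
    i_map = max(range(len(posteriors)), key=lambda i: posteriors[i])
    return p_lime[i_map]

# With the posteriors from the full Bayesian sketch above, MAP commits to h5
# after a few limes and predicts P(lime) = 1, whereas the full Bayesian
# prediction is a weighted average slightly below 1.
```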
Hypothesis Prior • Both in Bayesian learning and in MAP learning, hypothesis prior plays an important role • If hypothesis space is too expressive overfitting can occur [see also Chapter 18] • Prior is used to penalize complexity [instead of explicitly limiting the space] : the more complex the hypothesis the lower the prior probability • If enough evidence available, eventually complex hypothesis chosen [if necessary]
Maximum Likelihood Approximation • For enough data, prior becomes irrelevant • Maximum likelihood [ML] learning : choose the hi that maximizes P(d|hi) • I.e., simply get the best fit to the data • Identical to MAP for uniform prior P(hi) • Also reasonable if all hypotheses are of the same complexity • ML is the ‘standard’ [non-Bayesian / ‘classical’] statistical learning method
E.g. • Bag from a new manufacturer; fraction θ of red cherry candies; any θ between 0 and 1 is possible • Suppose we unwrap N candies, c cherries and l = N - c limes • Likelihood : P(d|hθ) = θ^c (1-θ)^l • Maximize for θ using the log likelihood L(θ) = c log θ + l log(1-θ)
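A minimal sketch of this maximum likelihood fit in Python; it uses the closed-form answer θ = c/N and checks it against a brute-force search over the log likelihood (the data values are made up).

```python
import math

# Maximum likelihood estimate of the cherry fraction theta from N unwrapped
# candies of which c were cherries and l were limes.

def log_likelihood(theta, c, l):
    """log P(d | h_theta) = c log(theta) + l log(1 - theta)."""
    return c * math.log(theta) + l * math.log(1 - theta)

def ml_theta(c, l):
    """Closed-form maximizer of the log likelihood: theta = c / N."""
    return c / (c + l)

if __name__ == '__main__':
    c, l = 7, 3                                 # 7 cherries, 3 limes
    print(ml_theta(c, l))                       # 0.7
    # brute-force check that 0.7 indeed maximizes the log likelihood
    grid = [i / 1000 for i in range(1, 1000)]
    print(max(grid, key=lambda t: log_likelihood(t, c, l)))   # ~0.7
```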
E.g. 2 • Gaussian model [often denoted by N(µ,σ²)] • Log likelihood is given by L = ∑j [ -log(σ√(2π)) - (xj - µ)² / (2σ²) ] • If σ is known, find the maximum likelihood estimate for µ • If µ is known, find the maximum likelihood estimate for σ
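A short sketch of the resulting maximum likelihood estimates, the sample mean and the mean squared deviation; the data values are made up for illustration.

```python
import math

# Maximum likelihood estimates for a Gaussian N(mu, sigma^2) from data x.

def ml_gaussian(x):
    """Return (mu_ML, sigma2_ML): sample mean and mean squared deviation."""
    n = len(x)
    mu = sum(x) / n
    sigma2 = sum((xi - mu) ** 2 for xi in x) / n   # note: divides by n, not n - 1
    return mu, sigma2

def log_likelihood(x, mu, sigma2):
    """log P(x | mu, sigma^2), summed over the (i.i.d.) data points."""
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (xi - mu) ** 2 / (2 * sigma2) for xi in x)

if __name__ == '__main__':
    x = [4.9, 5.1, 5.0, 4.8, 5.2]
    mu, sigma2 = ml_gaussian(x)
    print(mu, sigma2, log_likelihood(x, mu, sigma2))
```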
Halfway Summary and Additional Remarks • Full Bayesian learning gives best possible predictions but is intractable • MAP selects single best hypothesis; prior is still used • Maximum likelihood assumes uniform prior, OK for large data sets • Choose parameterized family of models to describe the data • Write down likelihood of data as function of parameters • Write down derivative of log likelihood w.r.t. each parameter • Find parameter values such that the derivatives are zero • ML estimation may be hard / impossible; modern optimization techniques help • In games, data often becomes available sequentially; not necessary to train in one go
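The last point above, that data often arrives sequentially, can be handled by updating the estimate incrementally instead of refitting from scratch; a minimal sketch for the Gaussian mean (the running-average recurrence is standard, the class name is ours).

```python
# Incremental (online) maximum likelihood estimate of a Gaussian mean:
# each new observation nudges the running estimate, no need to store all data.

class RunningMean:
    def __init__(self):
        self.n = 0
        self.mu = 0.0

    def update(self, x):
        """mu_n = mu_{n-1} + (x - mu_{n-1}) / n : the ML estimate after n points."""
        self.n += 1
        self.mu += (x - self.mu) / self.n
        return self.mu

if __name__ == '__main__':
    rm = RunningMean()
    for x in [4.9, 5.1, 5.0, 4.8, 5.2]:
        print(rm.update(x))                     # converges to the batch mean 5.0
```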
Outline • Bayesian learning √ • Maximum a posteriori and maximum likelihood learning √ • Instance-based learning • Neural networks
Instance-Based Learning • So far we saw statistical learning as parameter learning, i.e., given a specific parameter-dependent family of probability models, fit it to the data by tweaking the parameters • Often simple and effective • Fixed complexity • Maybe good when there is very little data
Instance-Based Learning • So far we saw statistical learning as parameter learning • Nonparametric learning methods allow hypothesis complexity to grow with the data • “The more data we have, the ‘wigglier’ the hypothesis can be”
Nearest-Neighbor Method • Key idea : properties of an input point x are likely to be similar to points in the neighborhood of x • E.g. classification : estimate unknown class of x using classes of neighboring points • Simple, but how does one define what a neighborhood is? • One solution : find the k nearest neighbors • But now the problem is how to decide what nearest is...
k Nearest-Neighbor Classification • Check the class / output label of your k neighbors and simply take [for example] the # of neighbors having class label x divided by k as the posterior probability of having class label x • When assigning a single label : take MAP!
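A minimal sketch of k-nearest-neighbor classification, using Euclidean distance as the notion of ‘nearest’; all names and the toy data are illustrative.

```python
from collections import Counter
import math

# k-nearest-neighbor classification: estimate class probabilities of a query
# point from the labels of its k nearest training points.

def knn_posteriors(query, data, labels, k=3):
    """Return {label: fraction of the k nearest neighbors with that label}."""
    dist = lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    nearest = sorted(range(len(data)), key=lambda i: dist(query, data[i]))[:k]
    counts = Counter(labels[i] for i in nearest)
    return {lab: n / k for lab, n in counts.items()}

def knn_classify(query, data, labels, k=3):
    """Single label: the MAP choice, i.e. the most common label among the neighbors."""
    post = knn_posteriors(query, data, labels, k)
    return max(post, key=post.get)

if __name__ == '__main__':
    data = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = ['a', 'a', 'a', 'b', 'b', 'b']
    print(knn_posteriors((0.5, 0.5), data, labels))   # {'a': 1.0}
    print(knn_classify((4.5, 5.0), data, labels))     # 'b'
```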
Kernel Models • Idea : Put a little density function [a kernel] on every data point and take the [normalized] sum of these • Somehow similar to kNN • Often provides comparable performance
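A minimal kernel density sketch with a Gaussian kernel; the bandwidth and data values are made up for illustration.

```python
import math

# Kernel density estimation: place a Gaussian kernel on every data point and
# average them to obtain a density estimate whose complexity grows with the data.

def kde(x, data, bandwidth=0.5):
    """Estimated density at x: mean of Gaussian kernels centered on the data."""
    k = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return sum(k((x - xi) / bandwidth) for xi in data) / (len(data) * bandwidth)

if __name__ == '__main__':
    data = [4.9, 5.1, 5.0, 4.8, 5.2]
    print(kde(5.0, data))    # high density near the data
    print(kde(8.0, data))    # close to zero far away from it
```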
Outline • Bayesian learning √ • Maximum a posteriori and maximum likelihood learning √ • Instance-based learning √ • Neural networks
So First... Neural Networks • According to Robert Hecht-Nielsen, a neural network is simply “a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs” • Simply... • We skip the biology for now • And provide the bare basics
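As a first taste of those bare basics, a sketch of a single processing element of the kind the quote describes: a weighted sum of inputs passed through an activation function. The weights, bias, and inputs here are arbitrary illustrative values.

```python
import math

# One "simple processing element": output = g(w . x + b) with a sigmoid g.

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs passed through a sigmoid activation."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-activation))

if __name__ == '__main__':
    print(neuron([1.0, 0.0], weights=[2.0, -1.0], bias=-1.0))   # ~0.73
```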