Empirical Research Methods in Computer Science Lecture 7 November 30, 2005 Noah Smith
Using Data [Diagram: Data → Model → Action. The Data-to-Model arrow is labeled "estimation; regression; learning; training"; the Model-to-Action arrow is labeled "classification; decision". This pipeline goes by many names: pattern classification, machine learning, statistical inference, ...]
Probabilistic Models • Let X and Y be random variables. (continuous, discrete, structured, ...) • Goal: predict Y from X. • A model defines P(Y = y | X = x). • Where do models come from? • If we have a model, how do we use it?
Using a Model • We want to classify a message, x, as spam or mail: y ∈ {spam, mail}. [Diagram: x enters the Model, which outputs P(spam | x) and P(mail | x).]
Bayes’ Rule • P(y | x) = P(x | y) P(y) / P(x), where P(x) = Σ_y′ P(x | y′) P(y′) • P(x | y) is the likelihood: one distribution over complex observations per y • P(y) is the prior • P(y | x) is what we said the model must define • the denominator P(x) normalizes into a distribution
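A minimal sketch of Bayes' rule as code; the prior values are the ones from the multinomial example later in the lecture, and the likelihood values are invented for illustration:

```python
# Posterior via Bayes' rule: P(y | x) = P(x | y) P(y) / P(x).
priors = {"spam": 0.455, "mail": 0.545}      # P(y)
likelihoods = {"spam": 3e-4, "mail": 1e-5}   # P(x | y) for one message x (invented)

unnormalized = {y: likelihoods[y] * priors[y] for y in priors}
z = sum(unnormalized.values())               # P(x), the normalizer
posterior = {y: p / z for y, p in unnormalized.items()}
print(posterior)                             # sums to 1: a distribution over y
```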
Naive Bayes Models • Suppose X = (X1, X2, X3, ..., Xm). • Let P(x | y) = P(x1 | y) P(x2 | y) ... P(xm | y); that is, assume the Xj are conditionally independent given Y.
Naive Bayes: Graphical Model [Diagram: a single class node Y with an arrow to each of X1, X2, X3, ..., Xm; every feature depends only on Y.]
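A sketch of how the factored model is used to score and pick a class, assuming the conditional-independence factorization above; the function names and the log-space trick are my choices, not from the slides:

```python
import math

def nb_log_joint(x, y, prior, cond):
    # log P(y) + sum_j log P(x_j | y): the Naive Bayes factorization,
    # computed in log space to avoid underflow on long feature vectors
    return math.log(prior[y]) + sum(math.log(cond[y][xj]) for xj in x)

def classify(x, prior, cond):
    # argmax_y P(y | x); the normalizer P(x) is the same for every y,
    # so comparing log joints is enough
    return max(prior, key=lambda y: nb_log_joint(x, y, prior, cond))
```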
Part II Where do the model parameters come from?
Using Data [Diagram repeated: Data → Model → Action, with the Data-to-Model arrow labeled "estimation; regression; learning; training".]
Warning • This is a HUGE topic. • We will barely scratch the surface.
Forms of Models • Recall that a model defines P(x | y) and P(y). • These can have a simple multinomial form, like P(mail) = 0.545, P(spam) = 0.455 • Or they can take on some other form, like a binomial, Gaussian, etc.
Example: Gaussian • Suppose y ∈ {male, female}, and one observed variable is H, height. • P(H | male) ~ N(μ_m, σ_m²) • P(H | female) ~ N(μ_f, σ_f²) • How do we estimate μ_m, σ_m², μ_f, σ_f²?
Maximum Likelihood • Pick the model that makes the data as likely as possible: max_model P(data | model)
Maximum Likelihood (Gaussian) • Estimating the parameters μ_m, σ_m², μ_f, σ_f² can be seen as • fitting the data • estimating an underlying statistic (a point estimate)
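A sketch of the closed-form maximum-likelihood estimates for the two Gaussians; the height samples are invented:

```python
heights_male = [178.0, 183.5, 171.2, 190.1, 176.8]    # hypothetical sample (cm)
heights_female = [162.3, 158.9, 170.4, 165.0, 161.7]  # hypothetical sample (cm)

def gaussian_mle(sample):
    # MLE for a Gaussian: the sample mean and the (biased,
    # divide-by-n) sample variance maximize P(data | mu, var)
    n = len(sample)
    mu = sum(sample) / n
    var = sum((h - mu) ** 2 for h in sample) / n
    return mu, var

mu_m, var_m = gaussian_mle(heights_male)
mu_f, var_f = gaussian_mle(heights_female)
```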
Example: Regression • Suppose y is actual runtime, and x is input length. • Regression tries to predict some continuous variables from others.
Regression • Linear: assume a linear relationship and fit a line. • We can turn this into a model!
Linear Model • Given x, predict y: y = β1x + β0 + ε, where ε ~ N(0, σ²) is a random deviation around the true regression line y = β1x + β0.
Principle of Least Squares • Minimize the sum of squared vertical deviations from the line. • Unique, closed-form solution!
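A sketch of the closed-form least-squares solution for the simple linear case; the data points are invented:

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # e.g. input lengths (invented)
ys = [2.1, 3.9, 6.2, 8.1, 9.8]   # e.g. observed runtimes (invented)

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
# Least squares: beta1 = sum (x - x_bar)(y - y_bar) / sum (x - x_bar)^2,
#                beta0 = y_bar - beta1 * x_bar
beta1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
beta0 = y_bar - beta1 * x_bar
```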
Other kinds of regression • transform one or both variables (e.g., take a log) • polynomial regression • (least squares → linear system) • multivariate regression • logistic regression
Example: text categorization • Bag-of-words model: • x is a histogram of counts for all words • y is a topic
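A sketch of the bag-of-words representation; the example message is invented:

```python
from collections import Counter

def bag_of_words(text):
    # histogram of word counts: order and grammar are thrown away
    return Counter(text.lower().split())

x = bag_of_words("buy now buy cheap meds now")
# Counter({'buy': 2, 'now': 2, 'cheap': 1, 'meds': 1})
```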
MLE for Multinomials • “Count and Normalize”: the MLE of P(w | y) is the count of word w in training documents labeled y, divided by the total word count of documents labeled y.
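A sketch of count-and-normalize for one class; the tiny corpus is invented:

```python
from collections import Counter

def multinomial_mle(docs):
    # docs: token lists that all share the same label y;
    # the MLE of P(w | y) is just relative frequency
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

p_word_given_spam = multinomial_mle([["buy", "now", "buy"], ["cheap", "meds"]])
# {'buy': 0.4, 'now': 0.2, 'cheap': 0.2, 'meds': 0.2}
```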
The Truth about MLE • You will never see all the words. • For many models, MLE isn’t safe. • To understand why, consider a typical evaluation scenario.
Evaluation • Train your model on some data. • How good is the model? • Test on different data that the system never saw before. • Why?
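A sketch of held-out evaluation; the helper names and the 80/20 split are my own illustrative choices:

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    # hold out data the model never sees during training
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(classifier, test_set):
    # fraction of held-out (x, y) pairs labeled correctly
    return sum(classifier(x) == y for x, y in test_set) / len(test_set)
```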
Tradeoff [Diagram: a spectrum of model complexity. At one end, low variance but low accuracy even on the training data; at the other, a model that overfits the training data and doesn't generalize.]
Text categorization again • Suppose ‘v1@gra’ never appeared in any document in training, ever. • What is P(x | y) for a new document containing ‘v1@gra’ at test time? • Zero: the MLE gives the unseen word probability 0, so the whole product collapses to 0 for every class.
Solutions • Regularization • Prefer less extreme parameters • Smoothing • “Flatten out” the distribution • Bayesian Estimation • Construct a prior over model parameters, then train to maximize P(data | model) × P(model)
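A sketch of one simple smoothing method, add-one (Laplace) smoothing; using a fixed vocabulary here is a simplifying assumption:

```python
from collections import Counter

def multinomial_add_one(docs, vocab):
    # pretend every vocabulary word was seen once more than it was,
    # so no word gets probability zero
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

vocab = {"buy", "now", "cheap", "meds", "v1@gra"}
p = multinomial_add_one([["buy", "now", "buy"], ["cheap", "meds"]], vocab)
# 'v1@gra' now gets probability 1/10 instead of zero
```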
One More Point • Building models is not the only way to be empirical. • Neural networks, SVMs, instance-based learning • MLE and smoothed/Bayesian estimation are not the only ways to estimate. • Minimize error, for example (“discriminative” estimation)
Assignment 3 • Spam detection • We provide a few thousand examples • Perform EDA and pick features • Estimate probabilities • Build a Naive-Bayes classifier