Linear Models (I)
Rong Jin
Review of Information Theory
• What is information?
• What is entropy?
  • Average information
  • Minimum coding length
  • Important inequality
[Figure: distribution for generating symbols vs. distribution for coding symbols]
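A minimal numeric sketch of these quantities, with made-up symbol distributions: the entropy H(p) is the average information and the minimum achievable coding length, and the important inequality is that coding with a mismatched distribution q can only do worse.

```python
import math

# Generating distribution p and coding distribution q over four symbols
# (made-up numbers, for illustration only).
p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]

# Entropy H(p): average information, the minimum achievable coding
# length (bits per symbol) when code lengths are matched to p.
H_p = -sum(pi * math.log2(pi) for pi in p)

# Cross-entropy H(p, q): expected coding length when symbols generated
# from p are coded with lengths -log2 q(x).
H_pq = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))

print(f"H(p)   = {H_p:.3f} bits")   # 1.750
print(f"H(p,q) = {H_pq:.3f} bits")  # 2.000
# The important inequality: H(p, q) >= H(p), with equality iff q = p.
```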
Review of Information Theory (cont’d)
• Mutual information
  • Measures the correlation between two random variables
  • Symmetric
• Kullback-Leibler distance
  • Measures the difference between two distributions
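For reference, the standard definitions behind these two bullets (textbook information theory, not copied from the slides):

```latex
I(X;Y) = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} = I(Y;X),
\qquad
D_{\mathrm{KL}}(p\,\|\,q) = \sum_{x} p(x)\,\log\frac{p(x)}{q(x)} \;\ge\; 0
```

Mutual information is symmetric in X and Y, while the KL distance in general is not: D_KL(p||q) ≠ D_KL(q||p).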
Outline
• Classification problems
• Information theory for text classification
• Gaussian generative models
• Naïve Bayes
• Logistic regression
Classification Problems
[Figure: input X → f(?) → output Y]
• Given input X = {x1, x2, …, xm}
• Predict the class label y
  • y ∈ {-1, 1}: binary classification problems
  • y ∈ {1, 2, 3, …, c}: multi-class classification problems
• Goal: need to learn the function f: X → y
Examples of Classification Problems
• Text categorization:
  • Example doc: "Months of campaigning and weeks of round-the-clock efforts in Iowa all came down to a final push Sunday, …" → topic: politics
  • Input features: words ‘campaigning’, ‘efforts’, ‘Iowa’, ‘Democrats’, …
  • Class label: ‘politics’ and ‘non-politics’
• Image classification ("Which is a bird image?"):
  • Input features: color histogram, texture distribution, edge distribution, …
  • Class label: ‘bird image’ and ‘non-bird image’
Learning Setup for Classification Problems
• Training examples: pairs (x1, y1), …, (xn, yn)
  • Drawn i.i.d. (independent and identically distributed)
  • Training examples are similar to testing examples
• Goal:
  • Find a model or a function that is consistent with the training data
Information Theory for Text Classification
• If the coding distribution is similar to the generating distribution → short coding length → good compression rate
[Figure: distribution for generating symbols vs. distribution for coding symbols]
Compression Algorithm for TC
[Figure: a new document (true topic: sports) is fed to two compression models:
  • Compression model M1 (politics) → 16K bits
  • Compression model M2 (sports) → 10K bits
  → predict the class whose model yields the shorter code: sports]
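A minimal sketch of this idea, using zlib as a stand-in compressor and two tiny made-up word lists in place of the real class corpora; actual compression models M1/M2 would be trained on full politics/sports collections.

```python
import zlib

def extra_bits(corpus: str, doc: str) -> int:
    """Extra compressed size (in bits) needed to encode doc after corpus."""
    base = len(zlib.compress(corpus.encode()))
    combined = len(zlib.compress((corpus + " " + doc).encode()))
    return 8 * (combined - base)

# Tiny stand-in class corpora (illustrative only).
politics = "campaign election senate vote policy congress debate"
sports = "game season team score playoff coach touchdown inning"

doc = "the team won the game in the final season playoff"

bits = {"politics": extra_bits(politics, doc),
        "sports": extra_bits(sports, doc)}
# Predict the class whose compression model codes the document most cheaply.
print(min(bits, key=bits.get))
```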
Probabilistic Models for Classification Problems
[Figure: training examples → learning a statistical model p(y|x; θ) → prediction]
• Apply statistical inference methods
  • Key: finding the best parameters θ
• Maximum likelihood (MLE) approach (see the sketch below)
  • Log-likelihood of the data
  • Find the parameters θ that maximize the log-likelihood
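Written out, the MLE criterion from the last two bullets, for training data {(xi, yi)}:

```latex
\ell(\theta) = \sum_{i=1}^{n} \log p(y_i \mid x_i;\, \theta),
\qquad
\theta^{*} = \arg\max_{\theta}\; \ell(\theta)
```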
Generative Models
• Do not directly estimate p(y|x; θ)
  • Use Bayes rule instead (see the sketch below)
• Estimate p(x|y; θ) instead of p(y|x; θ)
• Why p(x|y; θ)?
  • Most well-known distributions are of the form p(x|θ)
• Allocate a separate set of parameters for each class
  • θ = {θ1, θ2, …, θc}
  • p(x|y; θ) → p(x|y; θy)
  • θy describes the special input patterns of each class y
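Bayes rule recovers the posterior needed for prediction from the per-class models (standard form):

```latex
p(y \mid x;\, \theta)
= \frac{p(x \mid y;\, \theta_y)\; p(y)}
       {\sum_{y'=1}^{c} p(x \mid y';\, \theta_{y'})\; p(y')}
```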
Gaussian Generative Model (I)
• Assume a Gaussian model for each class
• One-dimensional case
• Results for MLE (see the sketch below)
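The one-dimensional model and the MLE results the slide points to are presumably the standard ones: each class gets the sample mean and variance of its own examples (ny = number of class-y training examples):

```latex
p(x \mid y) = \mathcal{N}(x;\, \mu_y, \sigma_y^2)
            = \frac{1}{\sqrt{2\pi}\,\sigma_y}
              \exp\!\left(-\frac{(x-\mu_y)^2}{2\sigma_y^2}\right),
\qquad
\hat{\mu}_y = \frac{1}{n_y}\sum_{i:\,y_i=y} x_i,
\quad
\hat{\sigma}_y^2 = \frac{1}{n_y}\sum_{i:\,y_i=y} (x_i-\hat{\mu}_y)^2
```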
Example
• Height histogram for males and females
• Using the Gaussian generative model:
  • P(male | height = 1.8) = ?
  • P(female | height = 1.4) = ?
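A worked version of the height example. The class priors, means, and standard deviations below are invented for illustration; the lecture's histogram would supply the real estimates.

```python
import math

def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Hypothetical per-class parameters (heights in meters, made up).
params = {"male": (1.75, 0.07), "female": (1.62, 0.06)}
prior = {"male": 0.5, "female": 0.5}

def posterior(label: str, height: float) -> float:
    """Bayes rule: p(y|x) = p(x|y) p(y) / sum_y' p(x|y') p(y')."""
    joint = {y: prior[y] * gaussian_pdf(height, mu, s) for y, (mu, s) in params.items()}
    return joint[label] / sum(joint.values())

print(f"P(male   | 1.8) = {posterior('male', 1.8):.3f}")
print(f"P(female | 1.4) = {posterior('female', 1.4):.3f}")
```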
Gaussian Generative Model (II)
• Consider multiple input features: X = {x1, x2, …, xm}
• Multivariate Gaussian distribution (see the sketch below)
  • Σy is an m×m covariance matrix
• Results for MLE
• Problem:
  • Singularity of Σy: too many parameters
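The multivariate density referenced above, in standard form. A full Σy has on the order of m² free parameters, and with few class-y examples the estimated Σy becomes singular (not invertible), which is the problem flagged on the slide:

```latex
p(\mathbf{x} \mid y)
= \frac{1}{(2\pi)^{m/2}\,|\Sigma_y|^{1/2}}
  \exp\!\left(-\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu}_y)^{\top}
  \Sigma_y^{-1}\,(\mathbf{x}-\boldsymbol{\mu}_y)\right)
```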
Overfitting Issue
• Complex model + insufficient training data
• Consider a classification problem with multiple inputs:
  • 100 input features, 5 classes, 1000 training examples
• Total number of parameters for a full Gaussian model:
  • 5 mean vectors → 5 × 100 = 500 parameters
  • 5 covariance matrices → 5 × 100 × 100 = 50,000 parameters
  • 50,500 parameters vs. 1000 training examples → insufficient training data
Naïve Bayes
• Simplify the model complexity
  • Diagonalize the covariance matrix Σy
  • Simplified Gaussian distribution
• Feature independence assumption
  • Naïve Bayes assumption (see the sketch below)
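With a diagonal Σy, the class-conditional density factorizes as p(x|y) = ∏j p(xj|y), one 1-D Gaussian per feature. A minimal from-scratch sketch of the resulting classifier (function and variable names are mine, not from the lecture):

```python
import numpy as np

def fit_gaussian_nb(X: np.ndarray, y: np.ndarray) -> dict:
    """Per class: prior, per-feature means and variances (diagonal covariance)."""
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        model[c] = (len(Xc) / len(X),        # prior p(y=c)
                    Xc.mean(axis=0),         # per-feature means
                    Xc.var(axis=0) + 1e-9)   # per-feature variances (smoothed)
    return model

def predict(model: dict, x: np.ndarray):
    """Pick the class maximizing log p(y) + sum_j log N(x_j; mu_yj, var_yj)."""
    def log_post(c):
        prior, mu, var = model[c]
        return np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
    return max(model, key=log_post)

# Toy usage with made-up 2-feature data.
X = np.array([[1.0, 2.0], [1.2, 1.9], [3.0, 0.5], [3.2, 0.4]])
y = np.array([0, 0, 1, 1])
model = fit_gaussian_nb(X, y)
print(predict(model, np.array([1.1, 2.1])))  # close to class 0's examples
```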
Naïve Bayes
• A terrible estimator for the class-conditional density p(x|y)
• But a very reasonable estimator for the posterior p(y|x). Why?
  • The ratio of likelihoods is what matters
  • Naïve Bayes does a reasonable job of estimating this ratio
The Ratio of Likelihood
• Binary class case
• Both classes share the same variance
• The result is a linear model! (see the sketch below)
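Filling in the algebra the slide alludes to: with a shared covariance Σ, the quadratic terms in x cancel and the log-posterior-odds is linear in x,

```latex
\log\frac{p(y=+1 \mid x)}{p(y=-1 \mid x)}
= \log\frac{p(x \mid +1)\,p(+1)}{p(x \mid -1)\,p(-1)}
= \underbrace{(\mu_{+}-\mu_{-})^{\top}\Sigma^{-1}}_{\mathbf{w}^{\top}}\, x
  \;+\; \underbrace{\tfrac{1}{2}\!\left(\mu_{-}^{\top}\Sigma^{-1}\mu_{-}
  - \mu_{+}^{\top}\Sigma^{-1}\mu_{+}\right)
  + \log\frac{p(+1)}{p(-1)}}_{b}
```

so classifying by the sign of this ratio is exactly a linear decision rule sign(wᵀx + b).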
Decision Boundary
• Gaussian generative models == finding a linear decision boundary
• Why not find it directly?
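This question motivates logistic regression, the last item in the outline: model the posterior directly with a linear function passed through a sigmoid (standard formulation, for y ∈ {-1, 1}):

```latex
p(y \mid x;\, \mathbf{w}, b)
= \sigma\!\left(y\,(\mathbf{w}^{\top}\mathbf{x} + b)\right),
\qquad
\sigma(z) = \frac{1}{1+e^{-z}}
```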