Bayes Classifier, Linear Regression 10701/15781 Recitation, January 29, 2008. Parts of the slides are from previous years’ recitation and lecture notes, and from Prof. Andrew Moore’s data mining tutorials.
Classification and Regression • Classification • Goal: learn the underlying function f: X (features) → Y (class, or category), e.g. words → “spam” or “not spam” • Regression • f: X (features) → Y (continuous values), e.g. GPA → salary
Supervised Classification • How to find an unknown function f: X → Y (features → class), or equivalently P(Y|X) • Classifier: • Find P(X|Y) and P(Y), and use Bayes rule (“generative”) • Find P(Y|X) directly (“discriminative”)
Classification Learn P(Y|X) 1. Bayes rule: P(Y|X) = P(X|Y)P(Y) / P(X) ∝ P(X|Y)P(Y) • Learn P(X|Y), P(Y) • “Generative” classifier 2. Learn P(Y|X) directly • “Discriminative” (to be covered later in class) • e.g. logistic regression
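A minimal sketch of the generative route in Python; the spam-filter numbers below are made up purely for illustration:

```python
# Bayes rule with hypothetical numbers: P(Y|X) is proportional to
# P(X|Y) P(Y); normalizing over classes recovers the posterior.
p_y = {"spam": 0.3, "not spam": 0.7}            # class priors P(Y)
p_x_given_y = {"spam": 0.8, "not spam": 0.05}   # P(word "Cash" appears | Y)

unnorm = {y: p_x_given_y[y] * p_y[y] for y in p_y}  # P(X|Y) P(Y)
p_x = sum(unnorm.values())                          # P(X) by total probability

posterior = {y: unnorm[y] / p_x for y in unnorm}
print(posterior)  # P(spam | "Cash" appears) ~ 0.873
```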
Generative Classifier: Bayes Classifier Learn P(X|Y), P(Y) • e.g. email classification problem • 3 classes for Y = {spam, not spam, maybe} • 10,000 binary features for X = {“Cash”, “Rolex”, …} • How many parameters do we have? • P(Y): 3 − 1 = 2 • P(X|Y): 3 × (2^10000 − 1), since each class needs a probability for every joint configuration of the 10,000 binary features
Generative learning: Naïve Bayes • Introduce conditional independence: P(X1, X2 | Y) = P(X1|Y) P(X2|Y) • P(Y|X) = P(X|Y)P(Y) / P(X) for X = (X1, …, Xn) = P(X1|Y) ⋯ P(Xn|Y) P(Y) / P(X) = ∏i P(Xi|Y) P(Y) / P(X) • Learn P(X1|Y), …, P(Xn|Y), P(Y) instead of learning P(X1, …, Xn | Y) directly
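A minimal numpy sketch of the resulting learner, assuming binary features and plain MLE counts (in practice Laplace smoothing keeps the per-feature estimates away from exactly 0 or 1, where the logs below blow up):

```python
import numpy as np

def fit_nb(X, y):
    """MLE for Bernoulli Naive Bayes on binary X (n, d) and labels y (n,)."""
    classes = np.unique(y)
    prior = np.array([np.mean(y == c) for c in classes])   # P(Y = c)
    # One Bernoulli parameter P(X_j = 1 | Y = c) per feature per class.
    theta = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, prior, theta

def predict_nb(x, classes, prior, theta):
    """Pick argmax_c of log P(Y=c) + sum_j log P(X_j = x_j | Y = c)."""
    log_post = np.log(prior) + x @ np.log(theta).T + (1 - x) @ np.log(1 - theta).T
    return classes[np.argmax(log_post)]
```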
Naïve Bayes • 3 classes for Y = {spam, not spam, maybe} • 10,000 binary features for X = {“Cash”, “Rolex”, …} • Now, how many parameters? • P(Y): still 2 • P(X|Y): 3 × 10,000, one Bernoulli parameter P(Xi = 1 | Y) per feature per class • Far fewer parameters • “Simpler” model: less likely to overfit
Full Bayes vs. Naïve Bayes • XOR example: Y = X1 XOR X2, with all four points (X1, X2) equally likely • P(Y=1 | (X1,X2) = (0,1)) = ? • Full Bayes: P(Y=1) = ? P((X1,X2) = (0,1) | Y=1) = ? • Naïve Bayes: P(Y=1) = ? P(X1=0 | Y=1) P(X2=1 | Y=1) = ?
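A small numpy sketch of the comparison, assuming the four equally weighted XOR points as the training set:

```python
import numpy as np

# XOR data: Y = X1 xor X2, all four points equally likely.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

def posterior_full(x):
    """Full Bayes: estimate the joint P(X1,X2|Y) by counting."""
    post = []
    for c in (0, 1):
        prior = np.mean(y == c)
        lik = np.mean((X[y == c] == x).all(axis=1))
        post.append(prior * lik)
    return post[1] / sum(post)

def posterior_nb(x):
    """Naive Bayes: P(X1|Y) P(X2|Y), ignoring the dependence between features."""
    post = []
    for c in (0, 1):
        prior = np.mean(y == c)
        lik = np.prod([np.mean(X[y == c][:, j] == x[j]) for j in (0, 1)])
        post.append(prior * lik)
    return post[1] / sum(post)

print(posterior_full(np.array([0, 1])))  # 1.0 -- full Bayes gets XOR right
print(posterior_nb(np.array([0, 1])))    # 0.5 -- naive Bayes cannot separate
```

The conditional-independence assumption fails here: given Y, X1 completely determines X2, so the factored likelihood is identical for both classes.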
Regression • Prediction of continuous variables • e.g. I want to predict salaries from GPA. “I can regress that…” • Learn the mapping f: X → Y • Model is linear in the parameters (+ some noise) → linear regression • Assume Gaussian noise • Learn MLE Θ
1-parameter linear regression • Normal linear regression: yi = θxi + εi, εi ~ N(0, σ²), or equivalently, yi ~ N(θxi, σ²) • MLE θ? • MLE σ²?
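The standard closed-form answers, sketched in numpy; maximizing the Gaussian likelihood is the same as minimizing squared error:

```python
import numpy as np

def mle_one_param(x, y):
    """MLE for y_i = theta * x_i + eps_i with eps_i ~ N(0, sigma^2)."""
    # Setting d/dtheta of sum_i (y_i - theta * x_i)^2 to zero gives:
    theta = np.sum(x * y) / np.sum(x ** 2)
    # MLE of the noise variance: mean squared residual (divides by n, not n-1).
    sigma2 = np.mean((y - theta * x) ** 2)
    return theta, sigma2
```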
Multivariate linear regression • What if the inputs are vectors? • Stack the data into matrices X (n × k) and Y (n × 1): n data points, k features each • MLE Θ = (XᵀX)⁻¹ XᵀY (the normal equations)
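A sketch of the normal equations in numpy; solving the linear system is preferred to forming the inverse explicitly, and np.linalg.lstsq is the safer choice when XᵀX is singular:

```python
import numpy as np

def mle_linear(X, Y):
    """Theta = (X^T X)^{-1} X^T Y for X of shape (n, k) and Y of shape (n,)."""
    return np.linalg.solve(X.T @ X, X.T @ Y)
```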
Constant term? • We may expect linear data that does not go through the origin • Trick: append a constant feature x0 = 1 to every input, so its learned weight acts as the intercept
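The trick in code, reusing mle_linear from above; the GPA/salary numbers are made up for illustration:

```python
import numpy as np

X = np.array([[3.0], [3.5], [3.9]])            # one feature: GPA
Y = np.array([50.0, 62.0, 71.0])               # salary in $1000s, hypothetical

X1 = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend the constant feature
theta = mle_linear(X1, Y)                      # theta[0] is now the intercept
```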
Regression: another example • Assume the following model to fit the data. The model has one unknown parameter θ to be learned from the data. • What is the maximum likelihood estimate of θ?