Review of Terminology
• Hypothesis or Model: A particular classifier, e.g., a decision tree, a neural network, etc.
• Hypothesis or Model Space: All possible hypotheses of a particular type (e.g., decision trees; polynomial functions; neural networks)
• Learning algorithm: A method for choosing or constructing a hypothesis (or model) from a given hypothesis (or model) space
• Hypothesis or Model Parameters: E.g., size of a decision tree; degree of a polynomial; number of weights in a neural network [these constrain the hypothesis space]
• Learning algorithm parameters: E.g., "information gain" vs. "gain ratio"; or the value of the learning rate for perceptron learning
Cross-Validation
Two uses:
• To obtain a better estimate of a model's accuracy when data is limited.
• For model selection.
k-fold Cross-Validation for Estimating Accuracy
• Each example is used both as a training instance and as a test instance.
• Split the data into k disjoint parts: S1, S2, ..., Sk.
• For i = 1 to k: select Si to be the test set; train on the remaining data and test on Si to obtain accuracy Ai.
• Report the average, (A1 + A2 + ... + Ak) / k, as the final accuracy of the learning algorithm.
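A minimal sketch of this procedure, assuming numpy arrays X, y and a scikit-learn-style classifier with fit/predict methods (these names are illustrative, not from the slides):

```python
import numpy as np

def k_fold_accuracy(model, X, y, k=10, seed=0):
    indices = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(indices, k)                 # disjoint parts S_1, ..., S_k
    accuracies = []
    for i in range(k):
        test_idx = folds[i]                            # S_i is the test set
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model.fit(X[train_idx], y[train_idx])          # train on the remaining data
        accuracies.append(np.mean(model.predict(X[test_idx]) == y[test_idx]))  # A_i
    return float(np.mean(accuracies))                  # average of A_1, ..., A_k
```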
k-fold Cross-Validation for Model Selection
• Run k-fold cross-validation with parameter value i to produce k models.
• Compute the average test accuracy of these k models.
• Repeat for each candidate parameter value; choose the parameter value with the best average test accuracy.
• Use all training data to learn a model with this parameter value.
• Test the resulting model on separate, unseen test data.
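A sketch of these steps, assuming scikit-learn-style estimators and using cross_val_score to get the k accuracies; make_model(p) is a hypothetical factory that builds a classifier for a given parameter value p (e.g., maximum tree depth):

```python
import numpy as np
from sklearn.model_selection import cross_val_score

def select_parameter(make_model, X_train, y_train, param_values, k=10):
    best_param, best_acc = None, -np.inf
    for p in param_values:
        scores = cross_val_score(make_model(p), X_train, y_train, cv=k)  # k test accuracies
        if scores.mean() > best_acc:
            best_param, best_acc = p, scores.mean()
    final_model = make_model(best_param).fit(X_train, y_train)  # retrain on all training data
    return final_model, best_param   # then evaluate on separate, unseen test data
```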
Evaluating Hypotheses, Continued
• Precision: Fraction of true positives out of all predicted positives: Precision = TP / (TP + FP)
• Recall: Fraction of true positives out of all actual positives: Recall = TP / (TP + FN)
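A small sketch of these two definitions for binary labels, assuming 1 means "positive" (the toy arrays below are made up for illustration):

```python
import numpy as np

def precision_recall(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp)   # correct among predicted positives
    recall = tp / (tp + fn)      # correct among actual positives
    return precision, recall

y_true = np.array([1, 1, 1, 0, 0, 0, 1])
y_pred = np.array([1, 1, 0, 1, 0, 0, 1])
print(precision_recall(y_true, y_pred))   # TP=3, FP=1, FN=1 -> (0.75, 0.75)
```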
[Confusion matrix for digit classification, rows = actual class, columns = predicted class; the matrix itself is not reproduced here.]
• What is Precision(9)? What is Recall(9)?
  From the matrix: 75% of instances classified as "9" actually are "9" (precision); 86% of all "9"s were classified as "9" (recall).
• What is Precision(8)? What is Recall(8)?
Error vs. Loss
• Error rate: Fraction of incorrect answers given by a classifier h.
• Loss L(y, ŷ): Amount of utility lost by predicting ŷ when the correct answer is y.
• Note that L(y, y) = 0: a correct prediction incurs no loss.
• Loss depends on the user and the task. E.g., for one user, we might have: L(spam, nospam) = 1, L(nospam, spam) = 10
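A minimal sketch of this asymmetric loss, using the user-specific costs from the example above (misclassifying spam as nospam costs 1, nospam as spam costs 10, correct answers cost 0):

```python
# L(y, ŷ) as a lookup table over (actual, predicted) pairs.
LOSS = {("spam", "spam"): 0, ("spam", "nospam"): 1,
        ("nospam", "spam"): 10, ("nospam", "nospam"): 0}

def total_loss(y_true, y_pred):
    # Sum L(y, ŷ) over all (actual, predicted) pairs.
    return sum(LOSS[(y, yhat)] for y, yhat in zip(y_true, y_pred))

print(total_loss(["spam", "nospam"], ["nospam", "spam"]))   # 1 + 10 = 11
```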
Goal of Machine Learning: Minimize expected loss over all input-output pairs (x, y) in the data space.
• Need to define a prior probability distribution P(X, Y) over input-output pairs.
• Let ξ be the set of all possible input-output pairs. Then the expected generalization loss for hypothesis h with respect to loss function L is:
  GenLoss_L(h) = Σ over (x, y) ∈ ξ of L(y, h(x)) · P(x, y)
• The best hypothesis, h*, is:
  h* = argmin over h ∈ H of GenLoss_L(h)
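A toy sketch of expected generalization loss when P(x, y) is known, using 0/1 loss; the distribution values and the hypothesis below are made up purely for illustration:

```python
# Assumed joint distribution over binary (x, y) pairs; probabilities sum to 1.
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def gen_loss(h, P, loss=lambda y, yhat: int(y != yhat)):
    # GenLoss_L(h) = sum over (x, y) of L(y, h(x)) * P(x, y)
    return sum(loss(y, h(x)) * p for (x, y), p in P.items())

h = lambda x: x          # hypothesis: predict y = x
print(gen_loss(h, P))    # 0.1 + 0.2 = 0.3 (probability mass where h is wrong)
```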
Empirical Loss
• Typically P(X, Y) is not known. The learning method can only estimate GenLoss by observing the empirical loss on a set of examples E, where N = |E|:
  EmpLoss_{L,E}(h) = (1/N) Σ over (x, y) ∈ E of L(y, h(x))
• The estimated best hypothesis, ĥ*, is:
  ĥ* = argmin over h ∈ H of EmpLoss_{L,E}(h)
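A minimal sketch of choosing ĥ* by empirical loss, assuming 0/1 loss and examples given as (x, y) pairs; the candidate threshold rules and data points are illustrative, not from the slides:

```python
def empirical_loss(h, examples):
    # Average 0/1 loss of hypothesis h over the observed examples E.
    return sum(1 for x, y in examples if h(x) != y) / len(examples)

def best_hypothesis(hypothesis_space, examples):
    # ĥ* = argmin over h in H of EmpLoss_{L,E}(h)
    return min(hypothesis_space, key=lambda h: empirical_loss(h, examples))

examples = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
H = [lambda x, t=t: int(x > t) for t in (0.1, 0.5, 0.8)]   # simple threshold rules
h_star = best_hypothesis(H, examples)
print(empirical_loss(h_star, examples))   # 0.0 for the threshold-0.5 rule
```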
Sources of Loss
What are the possible reasons why ĥ* would differ from the target function f?
• Unrealizability: f may not be in the hypothesis space H.
• Variance: Different training sets return different h's, especially when training sets are small.
• Noise: f is nondeterministic: it returns different values of f(x) for the same x. (Sometimes this is a result of not having all necessary attributes in x.)
• Computational complexity: It may be intractable to search H.
Regularization for Model Selection
• Instead of doing cross-validation for model selection, put a penalty (or, more generally, "regularization") term directly into the "Cost" function to be minimized:
  Cost(h) = EmpLoss(h) + λ · Complexity(h)
  ĥ* = argmin over h ∈ H of Cost(h)
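A minimal sketch of this idea: choose the hypothesis minimizing EmpLoss(h) + λ·Complexity(h). The complexity(h) measure (e.g., tree size or polynomial degree), the value of λ, and the candidate list are assumptions for illustration:

```python
def regularized_cost(h, examples, complexity, lam=0.1):
    emp_loss = sum(1 for x, y in examples if h(x) != y) / len(examples)  # 0/1 empirical loss
    return emp_loss + lam * complexity(h)                                # penalize complex h

def select_by_regularization(hypothesis_space, examples, complexity, lam=0.1):
    # ĥ* = argmin over h in H of Cost(h)
    return min(hypothesis_space, key=lambda h: regularized_cost(h, examples, complexity, lam))
```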