
Goal of Learning Algorithms


Presentation Transcript


  1. Goal of Learning Algorithms • The early learning algorithms were designed to find an accurate fit to the training data. • A classifier is said to be consistent if it correctly classifies all of the training data. • The ability of a classifier to correctly classify data not in the training set is known as its generalization. • Bible code? 1994 Taipei mayoral election? • The goal is to predict the real future, NOT to fit the data already in hand or to reproduce the desired results.

  2. Probably Approximately Correct (PAC) Learning Model • Key assumption: training and testing data are generated i.i.d. according to a fixed but unknown distribution D. • When we evaluate the "quality" of a hypothesis (classification function) h, we should take the unknown distribution into account (i.e., the "average error" or "expected error" made by h). • We call such a measure the risk functional and denote it as R(h).
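
To make the risk functional concrete, here is a minimal Python sketch (my illustration, not from the slides) that estimates R(h) by Monte Carlo for a toy hypothesis. It pretends the distribution D is known so that we can sample from it; in the PAC setting D is unknown, which is the whole point.

    import numpy as np

    rng = np.random.default_rng(0)

    def h(x):
        """A fixed hypothesis: threshold on the first coordinate."""
        return np.where(x[:, 0] > 0.0, 1, -1)

    def sample_D(n):
        """Toy distribution D, assumed known here purely for illustration."""
        x = rng.normal(size=(n, 2))
        y = np.sign(x[:, 0] + 0.5 * rng.normal(size=n)).astype(int)
        return x, y

    # Monte Carlo estimate of the risk functional R(h) = E_D[h(x) != y].
    x, y = sample_D(1_000_000)
    print("estimated R(h) =", np.mean(h(x) != y))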

  3. Generalization Error of the PAC Model • Let S = {(x_1, y_1), ..., (x_l, y_l)} be a set of l training examples chosen i.i.d. according to D. • Treat the generalization error err_D(h) as a random variable depending on the random selection of S. • Find a bound on the tail of its distribution in the form ε = ε(l, δ), where ε is a function of l and δ, and 1 − δ is the confidence level of the error bound, which is given by the learner.
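
The slides leave the exact form of ε(l, δ) abstract. One concrete instance, valid for a single fixed hypothesis, comes from Hoeffding's inequality; the sketch below (my addition, a simpler bound than the VC one the slides build toward) evaluates it for a few sample sizes.

    import math

    def hoeffding_epsilon(l, delta):
        """Hoeffding tail bound for one fixed hypothesis h:
        Pr(err_D(h) - R_emp(h) > eps) < delta  with
        eps = sqrt(ln(1/delta) / (2 l))."""
        return math.sqrt(math.log(1.0 / delta) / (2 * l))

    for l in (100, 1_000, 10_000):
        print(l, hoeffding_epsilon(l, delta=0.05))

Note that for a fixed confidence level 1 − δ, the bound ε shrinks at rate O(1/sqrt(l)) as the number of training examples grows.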

  4. Probably Approximately Correct • The error made by the hypothesis h will be less than the error bound ε(l, δ), which does not depend on the unknown distribution D. • We assert: Pr(err_D(h) ≤ ε(l, δ)) ≥ 1 − δ, or equivalently Pr(err_D(h) > ε(l, δ)) < δ.

  5. Find the Hypothesis with Minimum Expected Risk? • Let S = {(x_1, y_1), ..., (x_l, y_l)} be the training examples chosen i.i.d. according to D with probability density p(x, y). • The expected misclassification error made by h is R(h) = ∫ (1/2)|y − h(x)| dp(x, y). • The ideal hypothesis h* should have the smallest expected risk: R(h*) ≤ R(h) for all h ∈ H. Unrealistic!!! (p(x, y) is unknown.)
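
As a worked illustration (mine, with a made-up finite distribution), the integral defining R(h) reduces to a weighted sum over a four-point density; it is computable only because the toy p(x, y) is assumed known.

    import numpy as np

    # Hypothetical finite distribution p(x, y); in reality it is unknown,
    # which is exactly what makes minimizing R(h) directly unrealistic.
    points = np.array([-2.0, -1.0, 1.0, 2.0])
    labels = np.array([-1, -1, 1, 1])
    probs  = np.array([0.2, 0.3, 0.3, 0.2])   # sums to 1

    def h(x, threshold=1.5):
        return np.where(x > threshold, 1, -1)

    # R(h) = sum over (x, y) of (1/2) |y - h(x)| p(x, y)
    R = np.sum(0.5 * np.abs(labels - h(points)) * probs)
    print("expected risk R(h) =", R)   # h misclassifies only x = 1, so R = 0.3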

  6. Empirical Risk Minimization (ERM) • Replace the expected risk over p(x, y) by an average over the training examples (D and p(x, y) are not needed). • The empirical risk: R_emp(h) = (1/l) Σ_{i=1..l} (1/2)|y_i − h(x_i)|. • Find the hypothesis h*_emp with the smallest empirical risk: R_emp(h*_emp) ≤ R_emp(h) for all h ∈ H. • Only focusing on empirical risk will cause overfitting.
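
A small sketch of the empirical risk (my toy training set, labels in {−1, +1}) also exhibits the overfitting trap: a "hypothesis" that merely memorizes the training labels drives R_emp to zero with no guarantee off the training set.

    import numpy as np

    rng = np.random.default_rng(1)

    def empirical_risk(h, X, y):
        """R_emp(h) = (1/l) sum_i (1/2)|y_i - h(x_i)|, i.e. the fraction
        of training examples that h misclassifies."""
        return np.mean(0.5 * np.abs(y - h(X)))

    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

    stump = lambda X: np.where(X[:, 0] > 0, 1, -1)      # a simple hypothesis
    lookup = {tuple(x): yi for x, yi in zip(X, y)}      # pure memorization
    memorizer = lambda Xq: np.array([lookup[tuple(x)] for x in Xq])

    print("R_emp(stump)     =", empirical_risk(stump, X, y))
    print("R_emp(memorizer) =", empirical_risk(memorizer, X, y))   # 0.0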

  7. VC Confidence (The Bound between R_emp(h) and R(h)) • The following inequality holds with probability 1 − δ: R(h) ≤ R_emp(h) + sqrt((v(log(2l/v) + 1) − log(δ/4))/l), where v is the VC dimension of the hypothesis space. C. J. C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery 2 (2) (1998), pp. 121–167.
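
The VC confidence term in the inequality is directly computable. This sketch (hypothetical values of v, l, and δ) shows how the bound loosens as the capacity of the hypothesis space grows.

    import math

    def vc_confidence(l, v, delta):
        """VC confidence term from the bound above (Burges, 1998):
        sqrt((v (log(2l/v) + 1) - log(delta/4)) / l)."""
        return math.sqrt((v * (math.log(2 * l / v) + 1)
                          - math.log(delta / 4)) / l)

    l, delta = 10_000, 0.05
    for v in (10, 100, 1_000):
        print(f"v = {v:5d}: R(h) <= R_emp(h) + {vc_confidence(l, v, delta):.3f}")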

  8. Why We Maximize the Margin? (Based on Statistical Learning Theory) • The Structural Risk Minimization (SRM) principle: minimize the bound on the expected risk, not the training error alone. • The expected risk will be less than or equal to the empirical risk (training error) + the VC confidence (error bound). • Maximizing the margin lowers the effective VC dimension of the hypothesis space, and hence the VC confidence term of the bound.
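
A minimal SRM sketch, assuming a hypothetical nested structure of hypothesis classes with made-up (VC dimension, training error) pairs: richer classes fit the training data better but pay a larger VC confidence, and SRM picks the class minimizing the sum of the two terms.

    import math

    def vc_confidence(l, v, delta):
        return math.sqrt((v * (math.log(2 * l / v) + 1)
                          - math.log(delta / 4)) / l)

    # Hypothetical structure H_1 in H_2 in ... : (VC dim, empirical risk).
    structure = [(5, 0.30), (20, 0.10), (100, 0.05), (500, 0.04)]
    l, delta = 2_000, 0.05

    for v, r_emp in structure:
        print(f"v = {v:4d}: risk bound = {r_emp + vc_confidence(l, v, delta):.3f}")
    best_v, _ = min(structure,
                    key=lambda s: s[1] + vc_confidence(l, s[0], delta))
    print("SRM picks the class with v =", best_v)   # v = 20 on these numbers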

  9. Capacity (Complexity) of Hypothesis Space H: VC-dimension • A given training set S is shattered by H if and only if for every labeling of S there exists some h ∈ H consistent with this labeling. • Three (linearly independent) points in R^2 are shattered by hyperplanes (lines).
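
Shattering can be checked by brute force for small point sets. The sketch below is my construction (not from the slides): it tests linear separability of each labeling as a linear-programming feasibility problem via scipy.

    import itertools
    import numpy as np
    from scipy.optimize import linprog

    def linearly_separable(X, y):
        """Is there (w, b) with y_i (w . x_i + b) >= 1 for all i?
        Encoded as an LP with zero objective; feasibility = separable."""
        l, n = X.shape
        A_ub = -y[:, None] * np.hstack([X, np.ones((l, 1))])
        res = linprog(c=np.zeros(n + 1), A_ub=A_ub, b_ub=-np.ones(l),
                      bounds=[(None, None)] * (n + 1))
        return res.status == 0

    def shattered_by_hyperplanes(X):
        """Brute force: every labeling in {-1,+1}^m must be separable."""
        return all(linearly_separable(X, np.array(lab))
                   for lab in itertools.product((-1, 1), repeat=len(X)))

    # Three points in general position in R^2 are shattered by lines.
    pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    print(shattered_by_hyperplanes(pts))   # True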

  10. Shattering Points with Hyperplanes in R^n • Theorem: Consider some set of m points in R^n. Choose any one of the points as origin. Then the m points can be shattered by oriented hyperplanes if and only if the position vectors of the remaining m − 1 points are linearly independent. • Can you always shatter three points with a line in R^2?
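
The answer to the slide's question is no, and a short worked argument (my addition, in LaTeX) shows why three collinear points cannot be shattered:

    % Collinear points defeat shattering by lines in R^2:
    If $x_2 = t\,x_1 + (1-t)\,x_3$ with $t \in (0,1)$, then for any $(w, b)$
    \[
      w^\top x_2 + b = t\,(w^\top x_1 + b) + (1-t)\,(w^\top x_3 + b),
    \]
    so $w^\top x_1 + b > 0$ and $w^\top x_3 + b > 0$ force $w^\top x_2 + b > 0$:
    the labeling $(+1, -1, +1)$ is unrealizable.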

  11. Definition of VC-dimension (A Capacity Measure of Hypothesis Space H) • The Vapnik–Chervonenkis dimension, VC(H), of hypothesis space H defined over the input space X is the size of the largest finite subset of X shattered by H (when such a largest subset exists). • If arbitrarily large finite sets of X can be shattered by H, then VC(H) ≡ ∞. • Example: let H = {all oriented hyperplanes in R^n}; then VC(H) = n + 1.
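
Combining the shatter check sketched after slide 9 with random trials gives a quick empirical sanity check, not a proof, that VC of lines in R^2 is 3 = n + 1 (my sketch; the random point sets are assumed to land in general position).

    import itertools
    import numpy as np
    from scipy.optimize import linprog

    def linearly_separable(X, y):
        l, n = X.shape
        A_ub = -y[:, None] * np.hstack([X, np.ones((l, 1))])
        res = linprog(c=np.zeros(n + 1), A_ub=A_ub, b_ub=-np.ones(l),
                      bounds=[(None, None)] * (n + 1))
        return res.status == 0

    def shattered(X):
        return all(linearly_separable(X, np.array(lab))
                   for lab in itertools.product((-1, 1), repeat=len(X)))

    rng = np.random.default_rng(0)
    # Some 3-point set in R^2 is shattered ...
    print(shattered(rng.normal(size=(3, 2))))                          # True
    # ... but no 4-point set is; trials agree with VC = n + 1 = 3.
    print(any(shattered(rng.normal(size=(4, 2))) for _ in range(20)))  # False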
