
Pattern Recognition and Machine Learning


Presentation Transcript


  1. Pattern Recognition and Machine Learning, Chapter 1: Introduction. Source: Bishop's book, Chapter 1, with modifications by Christoph F. Eick.

  2. Polynomial Curve Fitting. Experiment: given a function, create N training examples. What order M should we choose? (Model selection.) Given M, what weights w should we choose? (Parameter selection.)
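
A minimal sketch of this experiment in Python with NumPy (the target function sin(2πx), the noise level, and N are illustrative assumptions, not taken from the slides):

import numpy as np

# Illustrative setup: N noisy samples of sin(2*pi*x), as in Bishop Ch. 1.
rng = np.random.default_rng(0)
N = 10
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

# Fit polynomials of increasing order M and report the training error E(w*).
for M in (0, 1, 3, 9):
    w = np.polyfit(x, t, deg=M)        # least-squares parameter selection for fixed M
    y = np.polyval(w, x)               # predictions on the training inputs
    E = 0.5 * np.sum((y - t) ** 2)     # sum-of-squares error
    print(f"M={M}: E(w*)={E:.4f}")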

  3. Sum-of-Squares Error Function
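
For reference, the error function this slide refers to is Bishop's Eq. (1.2), the sum-of-squares error of the polynomial y(x, \mathbf{w}) = \sum_{j=0}^{M} w_j x^j:

E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{\, y(x_n, \mathbf{w}) - t_n \,\}^2

where t_n is the target value for the training input x_n.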

  4. 0th Order Polynomial • As N increases, E decreases. • As c(H), the capacity of the hypothesis space, increases, E first decreases and then increases. • As c(H) increases, the training error decreases for some time and then stays constant (frequently at 0). How do M, the quality of the fit, and the ability to generalize relate to each other?

  5. 1st Order Polynomial

  6. 3rd Order Polynomial

  7. 9th Order Polynomial

  8. Over-fitting Root-Mean-Square (RMS) Error:
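
The RMS error referenced here is Bishop's Eq. (1.3),

E_{\mathrm{RMS}} = \sqrt{\, 2 E(\mathbf{w}^{*}) / N \,},

which divides by N so that data sets of different sizes can be compared on an equal footing, and takes the square root so the error is measured on the same scale as the targets t.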

  9. Polynomial Coefficients

  10. Data Set Size: 9th Order Polynomial

  11. Data Set Size: 9th Order Polynomial. Increasing the size of the data set alleviates the over-fitting problem.

  12. Regularization. Penalize large coefficient values. Idea: penalize high weights, which contribute to high variance and sensitivity to outliers.
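
The regularized error function that penalizes large coefficients is Bishop's Eq. (1.4):

\tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{\, y(x_n, \mathbf{w}) - t_n \,\}^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2,

where \|\mathbf{w}\|^2 = w_0^2 + w_1^2 + \dots + w_M^2 and the coefficient \lambda controls the strength of the penalty relative to the data-fit term.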

  13. Regularization: 9th Order Polynomial

  14. Regularization:

  15. Regularization: E_RMS vs. ln λ

  16. The example demonstrated: As N increases, E decreases. As c(H) increases, E first decreases and then increases. As c(H) increases, the training error decreases for some time and then stays constant (frequently at 0).

  17. Polynomial Coefficients. The weight of the regularization increases.

  18. Probability Theory Apples and Oranges

  19. Probability Theory Marginal Probability Conditional Probability Joint Probability

  20. Probability Theory Sum Rule Product Rule

  21. The Rules of Probability Sum Rule Product Rule
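
In symbols, the two rules are

Sum rule: \; p(X) = \sum_{Y} p(X, Y)
Product rule: \; p(X, Y) = p(Y \mid X)\, p(X).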

  22. Bayes’ Theorem: posterior ∝ likelihood × prior
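
Combining the product rule with the symmetry p(X, Y) = p(Y, X) gives Bayes' theorem,

p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}, \qquad p(X) = \sum_{Y} p(X \mid Y)\, p(Y),

i.e. the posterior is proportional to the likelihood times the prior, with p(X) acting as the normalizing constant.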

  23. Probability Densities Cumulative Distribution Function Usually in ML!

  24. Transformed Densities

  25. Expectations (f under p(x)) Conditional Expectation (discrete) Approximate Expectation (discrete and continuous)
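
For reference, the expectations listed on this slide are (in Bishop's notation)

\mathbb{E}[f] = \sum_{x} p(x) f(x) \;\; \text{(discrete)}, \qquad \mathbb{E}[f] = \int p(x) f(x)\, dx \;\; \text{(continuous)},
\mathbb{E}_{x}[f \mid y] = \sum_{x} p(x \mid y) f(x), \qquad \mathbb{E}[f] \simeq \frac{1}{N} \sum_{n=1}^{N} f(x_n),

where the last (approximate) form uses N samples x_n drawn from p(x).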

  26. Variances and Covariances
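
The corresponding definitions are

\operatorname{var}[f] = \mathbb{E}\big[(f(x) - \mathbb{E}[f(x)])^2\big] = \mathbb{E}[f(x)^2] - \mathbb{E}[f(x)]^2,
\operatorname{cov}[x, y] = \mathbb{E}_{x,y}\big[\{x - \mathbb{E}[x]\}\{y - \mathbb{E}[y]\}\big] = \mathbb{E}_{x,y}[x y] - \mathbb{E}[x]\, \mathbb{E}[y].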

  27. The Gaussian Distribution
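
The univariate Gaussian shown on this slide is

\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\},

with mean \mu and variance \sigma^2 (standard deviation \sigma, precision \beta = 1/\sigma^2).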

  28. Gaussian Mean and Variance

  29. The Multivariate Gaussian
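
For a D-dimensional vector \mathbf{x}, the multivariate Gaussian is

\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\},

with mean vector \boldsymbol{\mu} and covariance matrix \boldsymbol{\Sigma}.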

  30. Gaussian Parameter Estimation. Likelihood function. Compare: for the data 2, 2.1, 1.9, 2.05, 1.99, the likelihood under N(2,1) versus N(3,1).

  31. Maximum (Log) Likelihood
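
Maximizing the log likelihood \ln p(\mathbf{x} \mid \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2 - \frac{N}{2} \ln \sigma^2 - \frac{N}{2} \ln(2\pi) with respect to \mu and \sigma^2 gives the maximum-likelihood estimates

\mu_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad \sigma^2_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{\mathrm{ML}})^2.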

  32. Properties of μ_ML and σ²_ML

  33. Curve Fitting Re-visited

  34. Maximum Likelihood. Determine w_ML by minimizing the sum-of-squares error, E(w).

  35. Predictive Distribution (skip initially)

  36. Model Selection Cross-Validation
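
A minimal sketch of S-fold cross-validation for choosing the polynomial order M, in Python with NumPy (the data generation, fold count, and candidate orders are illustrative assumptions, not taken from the slides):

import numpy as np

def cross_val_rms(x, t, M, n_folds=5):
    """Average held-out RMS error of an order-M polynomial over n_folds folds."""
    idx = np.arange(len(x))
    errors = []
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)              # indices outside the held-out fold
        w = np.polyfit(x[train], t[train], deg=M)    # fit on the training folds
        resid = np.polyval(w, x[fold]) - t[fold]     # residuals on the held-out fold
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return np.mean(errors)

# Illustrative data: noisy samples of sin(2*pi*x).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 30))
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=30)

best_M = min(range(10), key=lambda M: cross_val_rms(x, t, M))
print("order selected by cross-validation:", best_M)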

  37. Entropy: an important quantity in coding theory, statistical physics, and machine learning.

  38. Entropy. Coding theory: x is discrete with 8 possible states; how many bits are needed to transmit the state of x? All states are equally likely.
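
The entropy of a discrete variable is H[x] = -\sum_{x} p(x) \log_2 p(x). With 8 equally likely states, H[x] = -8 \times \frac{1}{8} \log_2 \frac{1}{8} = 3 bits, so 3 bits per symbol suffice (and are necessary) on average.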

  39. Entropy

  40. Entropy. In how many ways can N identical objects be allocated to M bins? Entropy is maximized when all bins are equally probable, p_i = 1/M.

  41. Entropy

  42. Differential Entropy. Put bins of width Δ along the real line. Differential entropy is maximized (for a fixed variance σ²) when p(x) is a Gaussian, in which case H[x] = ½(1 + ln(2πσ²)).

  43. Conditional Entropy

  44. The Kullback-Leibler Divergence
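
For reference, the Kullback-Leibler divergence between distributions p and q is

\mathrm{KL}(p \,\|\, q) = -\int p(\mathbf{x}) \ln \frac{q(\mathbf{x})}{p(\mathbf{x})}\, d\mathbf{x},

which satisfies \mathrm{KL}(p \,\|\, q) \ge 0 with equality if and only if p = q, and is not symmetric in p and q.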

  45. Mutual Information
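
Mutual information is the KL divergence between the joint distribution and the product of the marginals,

I[\mathbf{x}, \mathbf{y}] = \mathrm{KL}\big( p(\mathbf{x}, \mathbf{y}) \,\|\, p(\mathbf{x})\, p(\mathbf{y}) \big) = H[\mathbf{x}] - H[\mathbf{x} \mid \mathbf{y}] = H[\mathbf{y}] - H[\mathbf{y} \mid \mathbf{x}],

so it measures the reduction in uncertainty about x obtained by observing y (and vice versa).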
