
Regression


Presentation Transcript


  1. Regression Disclaimer: This PPT is adapted from Dr. Hung-yi Lee's course: http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML17.html

  2. Regression: Output a scalar • Stock Market Forecast: the Dow Jones Industrial Average tomorrow • Self-driving Car: the steering wheel angle • Product Recommendation (e.g., Amazon): the probability that user A buys product B

  3. Example Application • Estimating the Combat Power (CP) of a Pokémon after evolution • Input: a Pokémon; output: its CP after evolution

  4. Step 1: Model • A set of functions (the model): y = b + w·x_cp, where w and b are parameters that can take any value • f1: y = 10.0 + 9.0·x_cp • f2: y = 9.8 + 9.2·x_cp • f3: y = −0.8 − 1.2·x_cp • … infinitely many functions • This is a linear model: y = b + Σᵢ wᵢ·xᵢ, where xᵢ is a feature, wᵢ a weight, and b the bias
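
A minimal sketch of Step 1 in Python: each choice of (w, b) picks one function out of the model. The helper name and the three parameter settings simply mirror the slide.

```python
def make_linear_model(w, b):
    """Return one member function y = b + w * x_cp of the model."""
    def f(x_cp):
        return b + w * x_cp
    return f

# Three example functions from the (infinite) set, as on the slide
f1 = make_linear_model(w=9.0, b=10.0)
f2 = make_linear_model(w=9.2, b=9.8)
f3 = make_linear_model(w=-1.2, b=-0.8)
```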

  5. Step 2: Goodness of Function • Training data: pairs of a function input xⁿ and a function output ŷⁿ (a scalar) • ŷⁿ is the true value; y = b + w·x_cp gives the predicted value

  6. Step 2: Goodness of Function • Training data: 10 Pokémon, each with its CP before evolution (x_cpⁿ) and its true CP after evolution (ŷⁿ) • This is real data. Source: https://www.openintro.org/stat/data/?data=pokemon

  7. Step 2: Goodness of Function • Loss function L: input is a function, output is how bad that function is • L(f) = L(w, b) = Σₙ (ŷⁿ − (b + w·x_cpⁿ))² • Each term is the estimation error: the squared difference between the true value ŷⁿ and the y estimated from the input function, summed over the training examples
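
A sketch of the same loss with NumPy; the tiny example arrays are placeholders, not the lecture's real training set.

```python
import numpy as np

def loss(w, b, x_cp, y_true):
    """L(w, b): sum of squared estimation errors over the training examples."""
    y_pred = b + w * x_cp                  # y estimated by the candidate function
    return np.sum((y_true - y_pred) ** 2)

# Placeholder data standing in for (x_cp^n, yhat^n) pairs
x_cp = np.array([100.0, 200.0, 300.0])
y_true = np.array([90.0, 350.0, 600.0])
print(loss(w=2.7, b=-188.4, x_cp=x_cp, y_true=y_true))
```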

  8. Step 2: Goodness of Function • Loss function L(w, b) • [Figure: each point in the figure is a function, i.e. one (w, b) pair; the color represents the value of L(w, b); the loss is smallest near the marked point, and very large for functions such as y = −180 − 2·x_cp]

  9. Step 3: Best Function • Pick the “best” function from the model: f* = arg min_f L(f), i.e. w*, b* = arg min_{w,b} Σₙ (ŷⁿ − (b + w·x_cpⁿ))² • Found with gradient descent on the training data

  10. Step 3: Gradient Descent (image: http://chico386.pixnet.net/album/photo/171572850) • Consider a loss function 𝐿(𝑤) with one parameter w: w* = arg min_w L(w) • (Randomly) pick an initial value w⁰ • Compute dL/dw at w = w⁰: if the slope is negative, increase w; if it is positive, decrease w

  11. Step 3: Gradient Descent (image: http://chico386.pixnet.net/album/photo/171572850) • Compute dL/dw at w = w⁰, then update: w¹ ← w⁰ − η·(dL/dw)|_{w=w⁰} • η > 0 is called the “learning rate”

  12. Step 3: Gradient Descent • Compute dL/dw at w = w¹ and update w² ← w¹ − η·(dL/dw)|_{w=w¹} • … many iterations: w⁰ → w¹ → w² → … → w^T • The iteration may stop at a local minimum instead of the global minimum
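
A minimal sketch of this one-parameter loop; the quadratic example loss is illustrative, not from the lecture.

```python
def gradient_descent_1d(dL_dw, w0, eta=0.01, iterations=1000):
    """Minimize L(w) given its derivative dL_dw, starting from w0."""
    w = w0
    for _ in range(iterations):
        w -= eta * dL_dw(w)   # w^{t+1} = w^t - eta * (dL/dw at w^t)
    return w

# Example: L(w) = (w - 3)^2 has dL/dw = 2(w - 3); the loop converges near w = 3
print(gradient_descent_1d(lambda w: 2 * (w - 3), w0=0.0))
```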

  13. Step 3: Gradient Descent • How about two parameters? w*, b* = arg min_{w,b} L(w, b) • (Randomly) pick initial values w⁰, b⁰ • Compute ∂L/∂w and ∂L/∂b at (w⁰, b⁰) and update: w¹ ← w⁰ − η·(∂L/∂w), b¹ ← b⁰ − η·(∂L/∂b) • The vector of partial derivatives, ∇L = [∂L/∂w, ∂L/∂b]ᵀ, is the gradient
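
The two-parameter version, as a sketch with the partial derivatives passed in as functions (their concrete form for this model appears on slides 17 and 18):

```python
def gradient_descent_2d(dL_dw, dL_db, w0, b0, eta=0.01, iterations=1000):
    """Minimize L(w, b) by stepping along the negative gradient."""
    w, b = w0, b0
    for _ in range(iterations):
        gw, gb = dL_dw(w, b), dL_db(w, b)  # gradient at the current point
        w, b = w - eta * gw, b - eta * gb  # update both parameters together
    return w, b
```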

  14. Step 3: Gradient Descent • [Figure: descent trajectory over the (w, b) plane; the color is the value of the loss L(w, b); at each step, compute ∂L/∂w and ∂L/∂b and move against the gradient]

  15. Step 3: Gradient Descent • When solving θ* = arg min_θ L(θ) by gradient descent: each time we update the parameters, we obtain a θ that makes L(θ) smaller. Is this statement correct? (Not necessarily, as the next slide shows.)

  16. Step 3: Gradient Descent • [Figure: loss versus the value of the parameter w] • Very slow at a plateau • Stuck at a saddle point • Stuck at a local minimum

  17. Step 3: Gradient Descent • Formulation of ∂L/∂w: with L(w, b) = Σₙ (ŷⁿ − (b + w·x_cpⁿ))², we get ∂L/∂w = Σₙ 2(ŷⁿ − (b + w·x_cpⁿ))·(−x_cpⁿ)

  18. Step 3: Gradient Descent • Formulation of ∂L/∂b: likewise, ∂L/∂b = Σₙ 2(ŷⁿ − (b + w·x_cpⁿ))·(−1)
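
These two partial derivatives plug straight into the loop above; a self-contained sketch (the learning rate and iteration count are illustrative and may need tuning):

```python
import numpy as np

def fit_linear(x_cp, y_true, eta=1e-7, iterations=100000):
    """Fit y = b + w * x_cp by gradient descent, using the slide-17/18 partials."""
    w, b = 0.0, 0.0
    for _ in range(iterations):
        err = y_true - (b + w * x_cp)       # yhat^n - (b + w * x_cp^n)
        dL_dw = np.sum(2 * err * (-x_cp))   # dL/dw from slide 17
        dL_db = np.sum(2 * err * (-1.0))    # dL/db from slide 18
        w, b = w - eta * dL_dw, b - eta * dL_db
    return w, b
```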

  19. How are the results? • Fitted on the training data: y = b + w·x_cp with b = −188.4, w = 2.7 • Average error on training data = 31.9

  20. How are the results? – Generalization • What we really care about is the error on new data (testing data) • Another 10 Pokémon serve as testing data for the same function y = b + w·x_cp (b = −188.4, w = 2.7) • Average error on testing data = 35.0 > average error on training data (31.9) • How can we do better?

  21. Selecting another Model • y = b + w1·x_cp + w2·(x_cp)² • Best function: b = −10.3, w1 = 1.0, w2 = 2.7 × 10⁻³ • Average training error = 15.4 • Average testing error = 18.4 • Better! Could it be even better?

  22. Selecting another Model • y = b + w1·x_cp + w2·(x_cp)² + w3·(x_cp)³ • Best function: b = 6.4, w1 = 0.66, w2 = 4.3 × 10⁻³, w3 = −1.8 × 10⁻⁶ • Average training error = 15.3 • Average testing error = 18.1 • Slightly better. How about a more complex model?

  23. Selecting another Model • y = b + w1·x_cp + w2·(x_cp)² + w3·(x_cp)³ + w4·(x_cp)⁴ • Average training error = 14.9 • Average testing error = 28.8 • The results become worse…

  24. Selecting another Model • y = b + w1·x_cp + w2·(x_cp)² + w3·(x_cp)³ + w4·(x_cp)⁴ + w5·(x_cp)⁵ • Average training error = 12.8 • Average testing error = 232.1 • The results are very bad.

  25. Model Selection • The five models on the training data: 1. y = b + w·x_cp 2. y = b + w1·x_cp + w2·(x_cp)² 3. y = b + w1·x_cp + w2·(x_cp)² + w3·(x_cp)³ 4. y = b + w1·x_cp + w2·(x_cp)² + w3·(x_cp)³ + w4·(x_cp)⁴ 5. y = b + w1·x_cp + w2·(x_cp)² + w3·(x_cp)³ + w4·(x_cp)⁴ + w5·(x_cp)⁵ • If we can truly find the best function of each model, a more complex model yields lower error on the training data (see the sketch below)
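
A sketch of this degree sweep on synthetic placeholder data (not the lecture's Pokémon), using NumPy's least-squares polynomial fit in place of gradient descent; "average error" is taken here as the mean absolute error:

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder data: roughly linear with noise, 10 train / 10 test points
x_train = rng.uniform(1, 10, 10); y_train = 2.7 * x_train - 3 + rng.normal(0, 1, 10)
x_test  = rng.uniform(1, 10, 10); y_test  = 2.7 * x_test  - 3 + rng.normal(0, 1, 10)

for degree in range(1, 6):                          # models 1 through 5
    coeffs = np.polyfit(x_train, y_train, degree)   # best function of this model
    train_err = np.mean(np.abs(y_train - np.polyval(coeffs, x_train)))
    test_err  = np.mean(np.abs(y_test  - np.polyval(coeffs, x_test)))
    print(degree, train_err, test_err)  # training error keeps falling; testing error eventually rises
```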

  26. Model Selection • A more complex model does not always lead to better performance on the testing data. This is overfitting. • Select a suitable model.

  27. Let's collect more data • There are hidden factors not considered in the previous model…

  28. What are the hidden factors? • [Figure: the data split into clusters by species: Eevee, Pidgey, Weedle, Caterpie]

  29. Back to step 1: Re-design the Model • Let x_s be the species of Pokémon x • If x_s = Pidgey: y = b1 + w1·x_cp • If x_s = Weedle: y = b2 + w2·x_cp • If x_s = Caterpie: y = b3 + w3·x_cp • If x_s = Eevee: y = b4 + w4·x_cp • Is this still a linear model?

  30. Back to step 1: Re-design the Model • Rewrite the cases with indicator functions: y = b1·δ(x_s = Pidgey) + w1·δ(x_s = Pidgey)·x_cp + b2·δ(x_s = Weedle) + w2·δ(x_s = Weedle)·x_cp + … • δ(x_s = Pidgey) = 1 if x_s = Pidgey, and = 0 otherwise • The function is linear in the parameters, so it is a linear model
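
A sketch of the indicator form; the species names are from the slides, while the parameter values are purely illustrative:

```python
def species_model(params, x_cp, x_s):
    """y = sum over species s of delta(x_s = s) * (b_s + w_s * x_cp)."""
    b, w = params[x_s]     # the indicator selects exactly one species' line
    return b + w * x_cp

# Illustrative (b, w) per species; a real fit would learn these by gradient descent
params = {"Pidgey": (10.0, 2.0), "Weedle": (5.0, 1.5),
          "Caterpie": (4.0, 1.2), "Eevee": (-50.0, 6.0)}
print(species_model(params, x_cp=100.0, x_s="Eevee"))
```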

  31. How are the results? • Training data: average error = 3.8 • Testing data: average error = 14.3

  32. Are there any other hidden factors? • Finding candidates needs domain knowledge • [Figure: CP after evolution plotted against weight, against height, and against HP]

  33. Back to step 1: Re-design the Model Again • A complex model: for each species, a polynomial in x_cp, plus terms for HP, height, and weight • Training error = 1.9, but testing error = 102.3 • Overfitting!

  34. Back to step 2: Regularization • Add a penalty to the loss: L = Σₙ (ŷⁿ − (b + Σᵢ wᵢ·xᵢ))² + λ·Σᵢ (wᵢ)² • The functions with smaller wᵢ are better • Smaller wᵢ means smoother: if the input changes by Δxᵢ, the output changes only by wᵢ·Δxᵢ • If noise corrupts the input xᵢ at testing time, a smoother function is less influenced: no large changes in the response • Do you have to apply regularization to the bias b? No: the bias does not affect smoothness.
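
The penalized loss as a sketch (NumPy assumed; X holds one example per row, and the bias is deliberately left out of the penalty):

```python
import numpy as np

def regularized_loss(w, b, X, y_true, lam):
    """Squared-error loss plus lam * sum_i w_i^2; the bias b is not penalized."""
    y_pred = b + X @ w                     # b + sum_i w_i * x_i for each example
    return np.sum((y_true - y_pred) ** 2) + lam * np.sum(w ** 2)
```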

  35. Regularization • Larger λ makes the fitted function smoother • Larger λ also makes the training error larger, because we weight the training error less • How smooth? We prefer smooth functions, but not too smooth: select the λ obtaining the best model (the lowest testing error)
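
A sketch of that selection, sweeping λ with a closed-form ridge solver on placeholder data (the data and the λ grid are illustrative, not the lecture's):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize the regularized loss in closed form; the bias column is not penalized."""
    Xb = np.hstack([np.ones((len(X), 1)), X])                  # prepend a bias column
    penalty = lam * np.eye(Xb.shape[1]); penalty[0, 0] = 0.0   # no penalty on b
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(10, 3));      y = X @ w_true + 3 + rng.normal(0, 0.1, 10)
X_test = rng.normal(size=(10, 3)); y_test = X_test @ w_true + 3

for lam in [0.0, 0.1, 1.0, 10.0, 100.0]:
    theta = ridge_fit(X, y, lam)
    pred = np.hstack([np.ones((len(X_test), 1)), X_test]) @ theta
    print(lam, np.mean(np.abs(y_test - pred)))  # pick the lambda with the lowest testing error
```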

  36. Conclusion • Pokémon: the original CP and the species almost determine the CP after evolution • There are probably other hidden factors • Gradient descent: more theory and tips in the following lectures • Following lectures: Where does the error come from? • More theory about overfitting and regularization • The concept of validation

  37. Reference • Bishop: Chapter 1.1
