Feedback from last week
• Too many slides / too much information in too little time; AI is a complex topic with many different subjects; overall the lecture time (1.5 h + practice) is not enough.
• Use the summary and the learning results: each slide is there so that you can understand everything in total, but the focus of what you need to learn is clear from the learning results.
• Trust us: this is the first lecture of its kind – we will align the learning results with the exam questions.
• Live coding / practice coding: the code could not be read on the beamer.
• We will upload the practice code and live code before the lecture and use Jupyter notebooks for better explanation.
Objectives for Lecture 3: Regression Depth of understanding • After the lecture, you will be able to…
Chapter: Motivation • Chapter: Linear Models • Chapter: Loss functions • Chapter: Regularization & Validation • Chapter: Practical considerations • Chapter: Summary
Motivation – Regression Example [Figure: data points (input variable: size in m; output variable / label: weight in kg) together with the regression result.]
Motivation – Algorithms in Machine Learning • House pricing • Sales • Person's weight • Object detection • Spam detection • Cancer detection • Genome patterns • Google News • Point cloud (lidar) processing
Motivation – Regression in Automotive Technology Sensor calibration • Usually, electric quantities are measured • Necessary to convert them to physical quantities • Examples: • Accelerometers • Gyroscopes • Displacement sensors
Motivation – Regression in Automotive Technology Parameter estimation • Vehicle parameters are often only roughly known • Estimation via regression techniques
Motivation – Regression in Automotive Technology Vehicle pricing • Regression is widely used for financial relations • Allows compressing data into a simple model and evaluating derivatives
Motivation – Why should you use Regression? [Diagram: training data + model structure → predictive model → predictions about output variables for previously unseen sets of input variables.] • Based on the combination of data and model structure, it is possible to predict the outcome of a process or system • The training dataset is usually only a representation at sparse points and contains lots of noise • Allows usage of the information in simulation, optimization, etc.
Relation of statistics and machine learning • How can we extract information from data and use it to reason and predict in previously unseen cases? (learning) • Nearly all classic machine learning methods can be reinterpreted in terms of statistics • The focus in machine learning is mainly on prediction • Statistics often focuses on relation analysis • Lots of advanced regression techniques build upon a statistical interpretation of regression
Chapter: Motivation • Chapter: Linear Models • Chapter: Loss functions • Chapter: Regularization & Validation • Chapter: Practical considerations • Chapter: Summary
Linear Basis Function Model y(x, w) = w_0 + Σ_{j=1}^{M−1} w_j φ_j(x), with input variables x, output variable y, bias term w_0, weight parameters w_j, and basis functions φ_j(x). With the convention φ_0(x) = 1 this can be written compactly as y(x, w) = Σ_{j=0}^{M−1} w_j φ_j(x).
Representing the dataset as a matrix: collect the targets in the output vector t = (t_1, …, t_N)^T, the weights in the weight vector w = (w_0, …, w_{M−1})^T, and the basis function evaluations in the design matrix Φ with entries Φ_{nj} = φ_j(x_n); the model predictions for the whole dataset are then y = Φ w.
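To make the design matrix concrete, here is a minimal NumPy sketch (not part of the lecture code) that builds Φ for a polynomial basis on 1-D inputs; the function name and the toy data are illustrative.

```python
import numpy as np

def polynomial_design_matrix(x, degree):
    """Build the design matrix Phi for a 1-D input vector x.

    Column j holds the basis function phi_j(x) = x**j, so column 0
    is the constant (bias) term and the weight vector w has
    degree + 1 entries.
    """
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=degree + 1, increasing=True)

# Example: 5 data points, cubic polynomial basis
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
Phi = polynomial_design_matrix(x, degree=3)
print(Phi.shape)   # (5, 4)
```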
Basis functions – examples • Linear function • Polynomial function • Sinusoidal function • Gaussian basis function
Basis functions – Polynomials • Globally defined on the independent variable domain • Design matrix becomes ill-conditioned for large input domain variables for standard polynomials • Hyperparameter: • Polynomial degree
Basis functions – Gaussians • Locally defined on the independent variable domain • Sparse design matrix • Infinitely differentiable • Hyperparameters: • Number of Gaussian functions • Width of each basis function • Mean of each basis function
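A minimal sketch of how such a Gaussian design matrix could be built in NumPy; the number of Gaussians, their means and the width are exactly the hyperparameters listed above, and all names and values here are illustrative assumptions.

```python
import numpy as np

def gaussian_design_matrix(x, means, width):
    """Design matrix with one Gaussian bump per entry of `means`.

    phi_j(x) = exp(-(x - mu_j)**2 / (2 * width**2)); a constant
    column is prepended for the bias term w_0.
    """
    x = np.asarray(x, dtype=float)[:, None]          # shape (N, 1)
    means = np.asarray(means, dtype=float)[None, :]  # shape (1, M)
    phi = np.exp(-(x - means) ** 2 / (2.0 * width ** 2))
    return np.hstack([np.ones((x.shape[0], 1)), phi])

# Example: 10 Gaussians spread over [0, 1], width 0.1
x = np.linspace(0.0, 1.0, 25)
Phi = gaussian_design_matrix(x, means=np.linspace(0, 1, 10), width=0.1)
print(Phi.shape)   # (25, 11)
```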
Basis functions – comparison of local and global [Figure: a global basis function (polynomial) compared with a local basis function (Gaussian, spread parameter 0.3).]
Chapter: Motivation • Chapter: Linear Models • Chapter: Loss functions • Chapter: Regularization & Validation • Chapter: Practical considerations • Chapter: Summary
Loss functions • The loss function measures the accuracy of the model based on the training dataset • The best model we can obtain is the minimum-loss model • The choice of a loss function is fundamental in the regression problem • Minimize the loss function for the training dataset, consisting of independent variables and target variables, by variation of the basis function weights.
Loss functions – Mean Squared Error (MSE or L2) Pros: • Very important in practical applications • Solution can be easily obtained analytically Cons: • Not robust to outliers Examples: • Basic regression • Energy optimization • Control applications
Loss functions – Mean Absolute Error (MAE or L1) Pros: • Robust to outliers Cons: • No analytical solution • Non-differentiable at the origin Examples: • Financial applications
Loss functions – Huber Loss Pros: • Combines the strengths of the L1 and L2 loss functions • Robust + differentiable Cons: • More hyperparameters • No analytical solution
Loss functions – Comparison • L2 loss is differentiable • L1 loss is more intuitive • Huber loss combines theoretical strengths of both Practical hints: • Start with the L2 loss whenever possible • Think about physical insights and your intent!
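As an illustration of how the three losses weight residuals differently, the following NumPy sketch evaluates per-sample L2, L1 and Huber losses on a few example residuals; the delta value and the residuals are arbitrary choices for demonstration, not lecture values.

```python
import numpy as np

def l2_loss(residual):
    """Squared error contribution per residual (1/2 factor is conventional)."""
    return 0.5 * residual ** 2

def l1_loss(residual):
    """Absolute error contribution per residual."""
    return np.abs(residual)

def huber_loss(residual, delta=1.0):
    """Quadratic near zero, linear in the tails (robust to outliers)."""
    quadratic = 0.5 * residual ** 2
    linear = delta * (np.abs(residual) - 0.5 * delta)
    return np.where(np.abs(residual) <= delta, quadratic, linear)

residuals = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(l2_loss(residuals))     # outliers dominate
print(l1_loss(residuals))     # grows only linearly
print(huber_loss(residuals))  # quadratic inside |r| <= delta, linear outside
```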
Analytic Solution – Low-dimensional example Solve the optimization problem with the model: insert the model and the data points. In general, optimal solutions are obtained at the points where the gradient vanishes.
Analytic Solution – Low-dimensional example Calculate the gradient and set it equal to zero.
Analytic Solution – Low-dimensional example Solve the resulting equation (also called the normal equation).
Analytic Solution – General form Minimizing the MSE loss function can be rewritten in matrix form as L(w) = ½ ‖t − Φw‖². Setting the gradient to zero and solving for w yields the optimum w* = (ΦᵀΦ)⁻¹ Φᵀ t. The importance of this loss function is tightly related to the fact that this analytical solution is available and can be calculated explicitly for low- to medium-sized datasets!
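A minimal NumPy sketch of this closed-form solution, assuming the design matrix Φ and target vector t from the previous slides; the toy data and the function name are illustrative.

```python
import numpy as np

def fit_least_squares(Phi, t):
    """Solve the normal equation Phi^T Phi w = Phi^T t for the weights.

    np.linalg.solve is preferred over an explicit matrix inverse for
    numerical stability; for ill-conditioned design matrices,
    np.linalg.lstsq (SVD based) is the more robust choice.
    """
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# Example: noisy line t = 2 + 3x fitted with a linear basis
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
t = 2.0 + 3.0 * x + 0.05 * rng.standard_normal(x.shape)
Phi = np.column_stack([np.ones_like(x), x])
w = fit_least_squares(Phi, t)
print(w)   # approximately [2, 3]
```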
Sequential Analytic Solution – Motivation [Diagram: actual best estimate + new data point → RLS update rule → updated estimate.] • Consider the following cases: • Apply regression during operation of the product • There is not enough memory to store all data points • A possible solution is given by Recursive Least Squares (RLS)
Sequential Analytic Solution – The algorithm New data point (x_N, t_N) with basis vector φ_N = φ(x_N): • Prediction based on the old parameter estimate: ŷ_N = φ_Nᵀ w_{N−1} • Residual: e_N = t_N − ŷ_N • Correction gain: k_N = P_{N−1} φ_N / (1 + φ_Nᵀ P_{N−1} φ_N) • Update the parameters: w_N = w_{N−1} + k_N e_N • And the memory matrix: P_N = P_{N−1} − k_N φ_Nᵀ P_{N−1}, initialized as P_0 = δ I with a large scalar δ and I being the identity matrix of appropriate dimension
Sequential Analytic Solution – Forgetting factor • Some applications show slowly varying conditions in the long term, but can be considered stationary on short to medium time periods • Aging of products leads to slight parameter changes • Vehicle mass is usually constant over a significant period of time • The RLS algorithm can deal with this by introduction of a forgetting factor. This leads to a reduction of weight for old samples.
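Below is a small, illustrative NumPy implementation of an RLS update including a forgetting factor; the class name, initialization constants and toy data are assumptions for demonstration, not the lecture's reference code.

```python
import numpy as np

class RecursiveLeastSquares:
    """Sequential least squares with an optional forgetting factor.

    forgetting = 1.0 reproduces ordinary least squares in the limit;
    values slightly below 1.0 (e.g. 0.99) down-weight old samples
    and let the estimate track slowly varying parameters.
    """

    def __init__(self, n_weights, forgetting=1.0, init_cov=1e6):
        self.w = np.zeros(n_weights)            # parameter estimate
        self.P = init_cov * np.eye(n_weights)   # "memory" matrix
        self.lam = forgetting

    def update(self, phi, t):
        phi = np.asarray(phi, dtype=float)
        residual = t - phi @ self.w             # prediction error
        gain = self.P @ phi / (self.lam + phi @ self.P @ phi)
        self.w = self.w + gain * residual       # correct the weights
        self.P = (self.P - np.outer(gain, phi) @ self.P) / self.lam
        return self.w

# Example: stream of samples from t = 2 + 3x
rls = RecursiveLeastSquares(n_weights=2, forgetting=0.99)
rng = np.random.default_rng(1)
for x in rng.uniform(0, 1, 200):
    rls.update(np.array([1.0, x]), 2.0 + 3.0 * x + 0.05 * rng.standard_normal())
print(rls.w)   # approximately [2, 3]
```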
Numerical Iterative Solutions [Figure: cost function over the parameter with the optimum marked.] • Regression can be solved numerically • Important for large-scale problems and for non-quadratic loss functions • Popular methods: • Gradient descent • Gauss-Newton • Levenberg-Marquardt Pros: • Very generic Cons: • Knowledge about numeric optimization necessary
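For orientation, a minimal gradient descent sketch for the MSE loss, the simplest of the listed methods; the learning rate, iteration count and data are illustrative choices.

```python
import numpy as np

def gradient_descent_mse(Phi, t, lr=0.1, n_iter=5000):
    """Minimize the MSE loss by plain gradient descent.

    The gradient of 0.5 * ||Phi w - t||^2 / N with respect to w is
    Phi^T (Phi w - t) / N; more advanced schemes (Gauss-Newton,
    Levenberg-Marquardt) mainly differ in how this step is scaled.
    """
    w = np.zeros(Phi.shape[1])
    n = len(t)
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ w - t) / n
        w -= lr * grad
    return w

# Same toy line as before: t = 2 + 3x
x = np.linspace(0, 1, 50)
t = 2.0 + 3.0 * x
Phi = np.column_stack([np.ones_like(x), x])
print(gradient_descent_mse(Phi, t))   # converges toward [2, 3]
```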
Constraints on the weights • Weights can be interpreted as physical quantities • Temperature (non-negative) • Spring constants (non-negative) • Mass (non-negative) • A valid range is known for the weights • Tire and other friction models • Efficiency (0 – 100 %) • Improves robustness • More difficult to solve
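One common way to handle such bound constraints is a bound-constrained least squares solver; the following sketch uses SciPy's lsq_linear as an example (the choice of SciPy and the toy problem are assumptions, not prescribed by the lecture).

```python
import numpy as np
from scipy.optimize import lsq_linear

# Example: two physically motivated weights, both constrained to be
# non-negative (e.g. an offset and a spring constant). The problem
# data are illustrative placeholders.
x = np.linspace(0, 1, 30)
t = 1.5 + 0.8 * x            # "measured" targets
Phi = np.column_stack([np.ones_like(x), x])

# Solve min ||Phi w - t||^2 subject to 0 <= w_j < inf
result = lsq_linear(Phi, t, bounds=(0.0, np.inf))
print(result.x)              # non-negative weight estimates
```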
How to solve the regression problem? • Is the cost function quadratic? If no → Numeric Iterative Solution • If yes: are there parameter constraints? If yes → Numeric Iterative Solution • If no: is the dataset very large? If yes → Numeric Iterative Solution • If no: is all data available instantaneously? If no → Sequential Analytic Solution, if yes → Analytic Solution
Chapter: Motivation • Chapter: Linear Models • Chapter: Loss functions • Chapter: Regularization & Validation • Chapter: Practical considerations • Chapter: Summary
How to choose the model? [Figure: three fits labeled underfitted, well done, overfitted.] • Overfitted: too many features, irrelevant features • Underfitted: not enough features, wrong structure
Overfitting – Choice of hyperparameters (Figure source: Bishop – Pattern Recognition and Machine Learning) • Overfitting is the failure to generalize properly between the data points • The cost function decreases with increased model complexity • Noise and irrelevant effects become too important
Overfitting – Curse of dimensionality [Figure: 16 samples in one-, two- and three-dimensional space.] • Overfitting occurs if • data points are sparse • model complexity is high • Sparsity of data points is difficult to grasp • Sparsity increases fast with increased input dimension
Validation datasets [Diagram: the available data is split into training data and validation data; models A, B and C are trained on the training data and evaluated on the validation data; the best model is finally trained on the complete dataset.] • Difficult to judge overfitting in high-dimensional domains and autonomous systems • A standard technique is to separate the data into training and validation data
Validation datasets (Figure source: Bishop – Pattern Recognition and Machine Learning; error over increased model complexity) • Different hyperparameters can be used to tune the model • The validation technique works for all of them
Common pitfalls with validation datasets • Be aware that your validation dataset must reflect the future properties of the underlying physical relationship. • Do not reuse validation datasets. If the same validation set is used again and again for testing the model performance, it is implicitly incorporated into the modelling process and no longer gives the expected results! • Splitting the data before fitting the model is therefore essential. Taking 2/3 of the data as training data is a good starting value. Visualize your data as much as possible!
k-Fold Cross-validation [Diagram: the training data is split into folds; in each round a different fold serves as validation data.] • In case of limited dataset sizes, one may not want to remove a substantial part of the data from the fitting process • One can use smaller validation sets to estimate the true prediction error by splitting the data into multiple 'folds' • The variance of the estimation error is an indicator for model stability
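A compact NumPy sketch of k-fold cross-validation for a linear basis function model; the fold count, random seed and toy data are illustrative assumptions.

```python
import numpy as np

def k_fold_mse(Phi, t, k=5, seed=0):
    """Estimate the prediction error by k-fold cross-validation.

    The data are shuffled once, split into k folds, and each fold is
    used exactly once as validation data while the model is fitted on
    the remaining folds. The spread of the fold errors indicates how
    stable the model is.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(t))
    folds = np.array_split(indices, k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = np.linalg.lstsq(Phi[train], t[train], rcond=None)[0]
        errors.append(np.mean((Phi[val] @ w - t[val]) ** 2))
    return np.mean(errors), np.std(errors)

# Example with a cubic polynomial design matrix
x = np.linspace(0, 1, 40)
t = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(2).standard_normal(40)
Phi = np.vander(x, N=4, increasing=True)
print(k_fold_mse(Phi, t))   # (mean validation MSE, std over folds)
```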
Regularization • From a design point of view, we want to choose the model structure based on underlying physical principles and not on the characteristics of the dataset • Polynomial basis functions tend to have large coefficients for sparse datasets • Gaussian basis functions tend to overfit locally, which leads to single, large coefficients • A technique to circumvent this is regularization • Penalizing high coefficients in the optimization prevents these effects • The weighting of the penalty term gives an intuitive hyperparameter to control model complexity
Typical Regularization – Ridge Regression Regularization term: λ‖w‖²₂ added to the loss • Other names: L2 regularization, Tikhonov regularization • Prevents overfitting well • An analytic solution is available as an extension to the MSE problem • Difficult to apply and tune in high-dimensional feature spaces
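A minimal sketch of the ridge closed-form solution, assuming the L2 penalty from above; regularizing the bias column and the chosen alpha are simplifications for illustration.

```python
import numpy as np

def fit_ridge(Phi, t, alpha=1e-2):
    """Closed-form ridge (L2 regularized) least squares.

    Minimizes ||Phi w - t||^2 + alpha * ||w||^2, which leads to the
    modified normal equation (Phi^T Phi + alpha I) w = Phi^T t.
    The bias column is regularized here as well for brevity; in
    practice it is often excluded from the penalty.
    """
    n_features = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + alpha * np.eye(n_features), Phi.T @ t)

# Example: high-degree polynomial on few samples, tamed by the penalty
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x)
Phi = np.vander(x, N=9, increasing=True)
print(np.abs(fit_ridge(Phi, t, alpha=1e-3)).max())   # coefficients stay moderate
```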
Typical Regularization – Lasso Regression Regularization term: λ‖w‖₁ added to the loss • Other names: L1 regularization • Tends to produce sparse solutions and can therefore be applied for feature selection • A sparse solution means that several coefficients go exactly to zero
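To illustrate the sparsity effect, the following sketch fits a lasso model with scikit-learn; the library choice, the alpha value and the toy data are assumptions made for this example, not lecture material.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Example: many polynomial features, but only a few are actually needed;
# the L1 penalty drives most coefficients exactly to zero.
rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 50)
t = 1.0 + 2.0 * x + 0.05 * rng.standard_normal(50)   # truly only degree 1
Phi = np.vander(x, N=10, increasing=True)

model = Lasso(alpha=1e-3, max_iter=50000).fit(Phi, t)
n_active = np.sum(np.abs(model.coef_) > 1e-6)
print(n_active, "of", Phi.shape[1], "coefficients are non-zero")
```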