Tópicos Especiais em Aprendizagem (Special Topics in Learning) Reinaldo Bianchi Centro Universitário da FEI 2012
2nd Lecture, Part A
Objectives of this lecture • Present the concepts of Statistical Machine Learning • Continuation of Regression. • Validation and Selection methods. • Today's lecture: • Chapters 3 and 7 of Hastie. • Wikipedia and the Matlab Help
Previous lecture • We saw: • Concepts of Machine Learning. • Statistical Machine Learning: • Prediction, Regression and Classification. • The Least Mean Squares and Nearest Neighbour methods.
Variable Types and Terminology • In the statistical literature the inputs are often called the predictors, inputs, and, more classically, the independent variables. • In the pattern recognition literature the term features is preferred, which we use as well. • The outputs are called the responses, or classically the dependent variables.
Naming convention for the prediction task • The distinction in output type has led to a naming convention for the prediction tasks: • Regression when we predict quantitative outputs. • Classification when we predict qualitative outputs. • Both can be viewed as a task in function approximation.
Examples of SML problems • Prostate Cancer: study by Stamey et al. (1989) that examined the correlation between the level of prostate-specific antigen (PSA) and a number of clinical measures. The goal is to predict the log of PSA (lpsa) from a number of measurements. • Regression problem
Examples of supervised learning problems • Classification problem
Linear Models and Least Squares • The linear model has been a mainstay of statistics for the past 30 years and remains one of its most important tools. • Given a vector of inputs X^T = (X_1, X_2, …, X_p), we predict the output Y via the model: Ŷ = β̂_0 + Σ_j X_j β̂_j
Linear Models • The term β̂_0 is the intercept, also known as the bias in machine learning. • Often it is convenient to include the constant variable 1 in X, include β̂_0 in the vector of coefficients β̂, and then write the linear model in vector form as an inner product: Ŷ = X^T β̂
Fitting the data: Least Squares • How do we fit the linear model to a set of training data? • By far the most popular approach is the method of least squares. • Pick the coefficients β to minimize the Residual Sum of Squares: RSS(β) = Σ_i (y_i - x_i^T β)^2
Fitting the data: Least Squares • Assuming that X has full column rank, we set the first derivative to zero: X^T (y - Xβ) = 0 • If X^T X is nonsingular (invertible), then the unique solution is given by: β̂ = (X^T X)^(-1) X^T y
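As a quick sanity check on this closed form, the sketch below (in Python rather than the Matlab used later in the slides) solves the 2x2 normal equations by hand for a one-predictor model with intercept. The data points are invented and lie exactly on y = 1 + 2x, so the solution must recover β = (1, 2).

```python
# Minimal check of beta = (X'X)^(-1) X'y for a model y = b0 + b1*x.
# Invented data lying exactly on y = 1 + 2x.

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]

n = len(x)
# Entries of the 2x2 matrix X'X and of the vector X'y, with X = [1 x]
sx, sxx = sum(x), sum(xi * xi for xi in x)
sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))

det = n * sxx - sx * sx            # X'X is nonsingular iff det != 0
b0 = (sxx * sy - sx * sxy) / det   # intercept
b1 = (n * sxy - sx * sy) / det     # slope
print(b0, b1)  # 1.0 2.0
```

With noisy data the same formulas give the least-squares fit rather than an exact interpolation.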
Example: height x shoe size • We wanted to explore the relationship between a person's height and their shoe size. • We asked ten individuals for their height and corresponding shoe size. • We believe that a person's shoe size depends on their height. • Height is the independent variable, x. • Shoe size is the dependent variable, y.
Linear Models and Least Squares: Regression • Using the learned parameters β̂, one can compute new outputs via regression. • At an arbitrary input x_0, the prediction is: ŷ_0 = x_0^T β̂ • Intuitively, it seems that we do not need a very large data set to fit such a model.
Example Height x Shoe Size • Thus if a person is 5 feet tall (i.e. x=60 inches), then I would estimate their shoe size to be:
Other Linear methods • From the Matlab Help: http://www.mathworks.com/help/toolbox/curvefit/bq_5ka6-1.html
Linear methods can approximate polynomials • Linear least squares can also fit polynomial curves, since the model remains linear in the coefficients: y = β_0 + β_1 x + β_2 x^2 • β_0 = intercept • β_1 = linear coefficient • β_2 = quadratic coefficient
Dataset: US Census (population in millions)
x (year): 1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000
y (population): 75.995 91.972 105.711 123.203 131.669 150.697 179.323 203.212 226.505 249.633 281.422
Dataset: US Census • Try to predict the US population in the year 2010.
Linear
x = (1900:10:2000)'
y = [75.995 91.972 105.711 123.203 131.669 150.697 179.323 203.212 226.505 249.633 281.422]'
one = ones(11,1)
X = [one, x]
v = (X'*X)\(X'*y)
Linear
v =
  -3783.9
   2.0253
plot(x, y, 'x', x, v(1)+x*v(2))
Quadratic (.* multiplies the elements one by one)
x = (1900:10:2000)'
y = [75.995 91.972 105.711 123.203 131.669 150.697 179.323 203.212 226.505 249.633 281.422]'
one = ones(11,1)
X = [one, x, x.*x]
w = (X'*X)\(X'*y)
Dataset: US Census x = 1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 y = 75.9950 91.9720 105.7110 123.2030 131.6690 150.6970 179.3230 203.2120 226.5050 249.6330 281.4220 x.*x = 3610000 3648100 3686400 3724900 3763600 3802500 3841600 3880900 3920400 3960100 4000000
Quadratic
w =
  3.2294E4
  -34.985
   0.0095
plot(x, y, 'x', x, v(1)+x*v(2), x, w(1)+w(2)*x+w(3)*x.^2)
Quadratic • Predictions for 2010: • Linear: 286.9 million • Quadratic: 311.6 million • Real result: 308,745,538
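The same normal-equations computation can be reproduced outside Matlab. The Python sketch below refits the census data from the slides; as an extra numerical precaution (not taken on the slides) the year is centered at 1950 and scaled by 10 before forming X'X. It reproduces the 2010 predictions quoted above.

```python
# Linear vs. quadratic least-squares fit of the US census data,
# via the normal equations beta = (X'X)^(-1) X'y, in pure Python.

years = list(range(1900, 2001, 10))
pop = [75.995, 91.972, 105.711, 123.203, 131.669, 150.697,
       179.323, 203.212, 226.505, 249.633, 281.422]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lstsq(X, y):
    """beta = (X'X)^(-1) X'y via the normal equations."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve(XtX, Xty)

t = [(yr - 1950) / 10.0 for yr in years]       # centered/scaled year
lin = lstsq([[1.0, ti] for ti in t], pop)
quad = lstsq([[1.0, ti, ti * ti] for ti in t], pop)

t2010 = (2010 - 1950) / 10.0
pred_lin = lin[0] + lin[1] * t2010
pred_quad = quad[0] + quad[1] * t2010 + quad[2] * t2010 ** 2
print(round(pred_lin, 1), round(pred_quad, 1))  # 286.9 311.6
```

The centered coefficients differ from the Matlab `v` and `w` above because the predictor has been shifted and scaled, but the fitted curves and predictions are the same.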
Other Linear methods • There are other linear methods based on LMS: • Weighted linear least squares • Robust least squares: • Least absolute residuals (LAR). • Bisquare weights. • And there are also nonlinear methods…
Weighted linear least squares • It is usually assumed that the response data is of equal quality and, therefore, has constant variance. • If this assumption is violated, your fit might be influenced by data of poor quality. • To improve the fit, you can include an additional scale factor (the weight) in the fitting process.
Weighted linear least squares • Weighted least-squares regression minimizes the error estimate: S = Σ_i w_i (y_i - ŷ_i)^2 where the w_i are the weights. • The weights determine how much each response value influences the final parameter estimates.
Weighted linear least squares • Weighting your data is recommended if the weights are known, or if there is justification that they follow a particular form. • The weights modify the expression for the parameter estimates b: b = (X^T W X)^(-1) X^T W y where W is the diagonal matrix of weights.
Weighted linear least squares • The weights you supply should transform the response variances to a constant value. • If you know the variances σ_i^2 of the measurement errors in your data, then the weights are given by: w_i = 1 / σ_i^2
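A minimal sketch of the idea for a single predictor with intercept: the closed-form weighted estimates simply use weighted means in place of plain means. The data and weights below are hypothetical; since the points lie exactly on y = 2x + 1, any choice of positive weights must recover that line.

```python
# Weighted linear least squares for one predictor, minimizing
# sum_i w_i * (y_i - b0 - b1*x_i)^2.  Data and weights are hypothetical.

def weighted_fit(x, y, w):
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    b1 = (sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
          / sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x)))
    b0 = ybar - b1 * xbar
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2 * xi + 1 for xi in x]      # points exactly on the line y = 2x + 1
w = [1.0, 0.5, 2.0, 1.0, 0.25]    # arbitrary positive weights
b0, b1 = weighted_fit(x, y, w)
print(round(b0, 6), round(b1, 6))  # 1.0 2.0
```

With noisy data the weights matter: observations with larger w_i pull the fitted line toward themselves, which is exactly what the variance rule w_i = 1/σ_i^2 exploits.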
Example: height x shoe size • The equation v = (X'*X)\(X'*y) becomes: v = (X'*([W.*X(:,1), W.*X(:,2)])) \ (X'*(W.*y)) • W is the weight vector: W = whatever you desire…
Example weights • Weights inversely proportional to the absolute residuals: W = abs(1./(y - X*v))
All together now • wi = ones(10,1) (uniform weights) • wi = selective (hand-chosen weights)
Robust Least Squares • It is usually assumed that the response errors follow a normal distribution, and that extreme values are rare. Still, extreme values called outliers do occur. • The main disadvantage of least-squares fitting is its sensitivity to outliers. • Outliers have a large influence on the fit because squaring the residuals magnifies the effects of these extreme data points.
Outliers (wikipedia) • In statistics, an outlier is an observation that is numerically distant from the rest of the data. • Grubbs defined an outlier as: • An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs.
Outliers – Causes (wikipedia) • Outliers arise due to changes in system behaviour, fraudulent behaviour, human error, instrument error, or simply through natural deviations in populations. • A physical apparatus for taking measurements may suffer a transient malfunction. • Errors in data transmission or transcription. • A sample may have been contaminated with elements from outside the population being examined. • Alternatively, an outlier could be the result of a flaw in the assumed theory, calling for further investigation by the researcher.
Outliers - CAUTION • Unless it can be ascertained that the deviation is not significant, it is ill-advised to ignore the presence of outliers. • Outliers that cannot be readily explained demand special attention. • NEVER DISCARD A DATA POINT!
Robust Least Squares • To minimize the influence of outliers, you can fit your data using two robust regression methods: • Least absolute residuals (LAR): finds a curve that minimizes the absolute difference of the residuals. • Bisquare weights: minimizes a weighted sum of squares, where the weight given to each data point depends on how far the point is from the fitted line.
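Bisquare weighting is commonly implemented as iteratively reweighted least squares (IRLS): fit, compute residuals, derive weights that shrink as residuals grow, and refit. The sketch below illustrates this on invented data with one outlier; the tuning constant 4.685 and the MAD-based scale estimate are common defaults, not values taken from the slides.

```python
# Robust fit via iteratively reweighted least squares with bisquare
# weights.  The data are invented: five points near y = 2x plus one
# gross outlier at x = 3, which the weights progressively suppress.

def weighted_fit(x, y, w):
    """Weighted simple linear regression (closed form)."""
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b1 = (sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
          / sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x)))
    return yb - b1 * xb, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 20.0, 8.1, 9.9, 12.1]   # y[2] is the outlier

w = [1.0] * len(x)                      # start from ordinary least squares
for _ in range(20):
    b0, b1 = weighted_fit(x, y, w)
    r = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = sorted(abs(ri) for ri in r)[len(r) // 2] / 0.6745  # MAD scale
    k = 4.685 * s                        # bisquare tuning constant
    w = [(1 - (ri / k) ** 2) ** 2 if abs(ri) < k else 0.0 for ri in r]

print(round(b0, 2), round(b1, 2))  # close to the slope 2 of the clean points
```

After a few iterations the outlier's residual exceeds the cutoff k, its weight drops to zero, and the fit settles on the five clean points; plain least squares on the same data gives a visibly tilted line.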
Non Linear Least Squares • The nonlinear least-squares formulation can be used to fit a nonlinear model to data. • A nonlinear model is defined as an equation that is nonlinear in the coefficients, or a combination of linear and nonlinear in the coefficients. • For example, Gaussians, ratios of polynomials, and power functions are all nonlinear.
Non Linear Least Squares • The basis of the method is to approximate the model by a linear one and to refine the parameters by successive iterations. • The derivatives are functions of both the independent variable and the parameters, so these gradient equations do not have a closed-form solution.
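As an illustration of these successive linearizations, the sketch below applies the Gauss-Newton idea to the nonlinear model y = a*exp(b*x): at each step the model is linearized via its Jacobian and a small linear least-squares problem is solved for the parameter update. The model, data, and starting guess are all invented for the example.

```python
# Gauss-Newton for the nonlinear model y = a * exp(b * x).
# Invented data generated from a = 2, b = 0.5 with no noise.
import math

x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

a, b = 1.5, 0.3                 # rough starting guess
for _ in range(50):
    r = [yi - a * math.exp(b * xi) for xi, yi in zip(x, y)]  # residuals
    # Jacobian columns: d(model)/da = exp(bx), d(model)/db = a*x*exp(bx)
    J = [[math.exp(b * xi), a * xi * math.exp(b * xi)] for xi in x]
    # Solve the 2x2 normal equations J'J * delta = J'r for the update
    a11 = sum(j[0] * j[0] for j in J)
    a12 = sum(j[0] * j[1] for j in J)
    a22 = sum(j[1] * j[1] for j in J)
    g1 = sum(j[0] * ri for j, ri in zip(J, r))
    g2 = sum(j[1] * ri for j, ri in zip(J, r))
    det = a11 * a22 - a12 * a12
    da = (a22 * g1 - a12 * g2) / det
    db = (a11 * g2 - a12 * g1) / det
    a, b = a + da, b + db        # refine the parameters

print(round(a, 3), round(b, 3))  # should converge to a = 2, b = 0.5
```

Unlike the linear case, the result depends on the starting guess and the iteration may fail to converge for a poor one, which is why nonlinear fitting tools ask for initial parameter values.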