Tópicos Especiais em Aprendizagem (Special Topics in Learning) Reinaldo Bianchi Centro Universitário da FEI 2012
2nd Lecture, Part A
Objectives of this lecture • Present the concepts of Statistical Machine Learning • Continuation of Regression. • Validation and Selection methods. • Today's lecture: • Chapters 3 and 7 of Hastie. • Wikipedia and the Matlab Help
Previous lecture • We saw: • Concepts of Machine Learning. • Statistical Machine Learning: • Prediction, Regression and Classification. • The Least Mean Squares and Nearest Neighbour methods.
Variable Types and Terminology • In the statistical literature the inputs are often called the predictors, inputs, and, more classically, the independent variables. • In the pattern recognition literature the term features is preferred, which we use as well. • The outputs are called the responses, or classically the dependent variables.
Naming convention for the prediction task • The distinction in output type has led to a naming convention for the prediction tasks: • Regression when we predict quantitative outputs. • Classification when we predict qualitative outputs. • Both can be viewed as a task in function approximation.
Examples of SML problems • Prostate Cancer: study by Stamey et al. (1989) that examined the correlation between the level of prostate-specific antigen (PSA) and a number of clinical measures. The goal is to predict the log of PSA (lpsa) from a number of measurements. • Regression problem
Examples of supervised learning problems • Classification problem
Linear Models and Least Squares • The linear model has been a mainstay of statistics for the past 30 years and remains one of its most important tools. • Given a vector of inputs X^T = (X_1, X_2, …, X_p), we predict the output Y via the model: Ŷ = β̂_0 + Σ_j X_j β̂_j
Linear Models • The term β̂_0 is the intercept, also known as the bias in machine learning. • Often it is convenient to include the constant variable 1 in X, include β̂_0 in the vector of coefficients β̂, and then write the linear model in vector form as an inner product: Ŷ = X^T β̂
Fitting the data: Least Squares • How do we fit the linear model to a set of training data? • By far the most popular approach is the method of least squares. • Pick the coefficients β to minimize the Residual Sum of Squares: RSS(β) = Σ_i (y_i - x_i^T β)^2
Fitting the data: Least Squares • Assuming that X has full column rank, we set the first derivative to zero: X^T (y - Xβ) = 0 • If X^T X is nonsingular (invertible), then the unique solution is given by: β̂ = (X^T X)^(-1) X^T y
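As a quick sanity check on this closed form, the sketch below (in Python rather than the Matlab used later in the slides) solves the 2x2 normal equations by hand for a one-predictor model with intercept. The data points are invented and lie exactly on y = 1 + 2x, so the solution must recover β = (1, 2).

```python
# Minimal check of beta = (X'X)^(-1) X'y for a model y = b0 + b1*x.
# Invented data lying exactly on y = 1 + 2x.

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]

n = len(x)
# Entries of the 2x2 matrix X'X and of the vector X'y, with X = [1 x]
sx, sxx = sum(x), sum(xi * xi for xi in x)
sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))

det = n * sxx - sx * sx            # X'X is nonsingular iff det != 0
b0 = (sxx * sy - sx * sxy) / det   # intercept
b1 = (n * sxy - sx * sy) / det     # slope
print(b0, b1)  # 1.0 2.0
```

With noisy data the same formulas give the least-squares fit rather than an exact interpolation.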
Example: height x shoe size • We wanted to explore the relationship between a person's height and their shoe size. • We asked ten individuals for their height and corresponding shoe size. • We believe that a person's shoe size depends on their height. • Height is the independent variable, x. • Shoe size is the dependent variable, y.
Linear Models and Least Squares: Regression • Using the learned parameters β̂, one can compute new outputs via regression. • At an arbitrary input x_0, the prediction is: ŷ_0 = x_0^T β̂ • Intuitively, it seems that we do not need a very large data set to fit such a model.
Example Height x Shoe Size • Thus if a person is 5 feet tall (i.e. x=60 inches), then I would estimate their shoe size to be:
Other Linear methods • From the Matlab Help: http://www.mathworks.com/help/toolbox/curvefit/bq_5ka6-1.html
Linear methods can approximate polynomials • Linear least squares can also fit polynomial curves, since the model remains linear in the coefficients: y = β_0 + β_1 x + β_2 x^2 • β_0 = intercept • β_1 = linear coefficient • β_2 = quadratic coefficient
Dataset: US Census (population in millions)
x (year): 1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000
y (population): 75.995 91.972 105.711 123.203 131.669 150.697 179.323 203.212 226.505 249.633 281.422
Dataset: US Census • Try to predict the US population in the year 2010.
Linear
x = (1900:10:2000)'
y = [75.995 91.972 105.711 123.203 131.669 150.697 179.323 203.212 226.505 249.633 281.422]'
one = ones(11,1)
X = [one, x]
v = (X'*X)\(X'*y)
Linear
v =
  -3783.9
   2.0253
plot(x, y, 'x', x, v(1)+x*v(2))
Quadratic (.* multiplies the elements one by one)
x = (1900:10:2000)'
y = [75.995 91.972 105.711 123.203 131.669 150.697 179.323 203.212 226.505 249.633 281.422]'
one = ones(11,1)
X = [one, x, x.*x]
w = (X'*X)\(X'*y)
Dataset: US Census x = 1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 y = 75.9950 91.9720 105.7110 123.2030 131.6690 150.6970 179.3230 203.2120 226.5050 249.6330 281.4220 x.*x = 3610000 3648100 3686400 3724900 3763600 3802500 3841600 3880900 3920400 3960100 4000000
Quadratic
w =
  3.2294E4
  -34.985
   0.0095
plot(x, y, 'x', x, v(1)+x*v(2), x, w(1)+w(2)*x+w(3)*x.^2)
Quadratic • Predictions for 2010: • Linear: 286.9 million • Quadratic: 311.6 million • Real result: 308,745,538
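The same normal-equations computation can be reproduced outside Matlab. The Python sketch below refits the census data from the slides; as an extra numerical precaution (not taken on the slides) the year is centered at 1950 and scaled by 10 before forming X'X. It reproduces the 2010 predictions quoted above.

```python
# Linear vs. quadratic least-squares fit of the US census data,
# via the normal equations beta = (X'X)^(-1) X'y, in pure Python.

years = list(range(1900, 2001, 10))
pop = [75.995, 91.972, 105.711, 123.203, 131.669, 150.697,
       179.323, 203.212, 226.505, 249.633, 281.422]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lstsq(X, y):
    """beta = (X'X)^(-1) X'y via the normal equations."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve(XtX, Xty)

t = [(yr - 1950) / 10.0 for yr in years]       # centered/scaled year
lin = lstsq([[1.0, ti] for ti in t], pop)
quad = lstsq([[1.0, ti, ti * ti] for ti in t], pop)

t2010 = (2010 - 1950) / 10.0
pred_lin = lin[0] + lin[1] * t2010
pred_quad = quad[0] + quad[1] * t2010 + quad[2] * t2010 ** 2
print(round(pred_lin, 1), round(pred_quad, 1))  # 286.9 311.6
```

The centered coefficients differ from the Matlab `v` and `w` above because the predictor has been shifted and scaled, but the fitted curves and predictions are the same.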
Other Linear methods • There are other linear methods based on LMS: • Weighted linear least squares • Robust least squares: • Least absolute residuals (LAR). • Bisquare weights. • And there are also nonlinear methods…
Weighted linear least squares • It is usually assumed that the response data is of equal quality and, therefore, has constant variance. • If this assumption is violated, your fit might be influenced by data of poor quality. • To improve the fit, you can include an additional scale factor (the weight) in the fitting process.
Weighted linear least squares • Weighted least-squares regression minimizes the error estimate: S = Σ_i w_i (y_i - ŷ_i)^2 where the w_i are the weights. • The weights determine how much each response value influences the final parameter estimates.
Weighted linear least squares • Weighting your data is recommended if the weights are known, or if there is justification that they follow a particular form. • The weights modify the expression for the parameter estimates b: b = (X^T W X)^(-1) X^T W y where W is the diagonal matrix of weights.
Weighted linear least squares • The weights you supply should transform the response variances to a constant value. • If you know the variances σ_i^2 of the measurement errors in your data, then the weights are given by: w_i = 1 / σ_i^2
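A minimal sketch of the idea for a single predictor with intercept: the closed-form weighted estimates simply use weighted means in place of plain means. The data and weights below are hypothetical; since the points lie exactly on y = 2x + 1, any choice of positive weights must recover that line.

```python
# Weighted linear least squares for one predictor, minimizing
# sum_i w_i * (y_i - b0 - b1*x_i)^2.  Data and weights are hypothetical.

def weighted_fit(x, y, w):
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    b1 = (sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
          / sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x)))
    b0 = ybar - b1 * xbar
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2 * xi + 1 for xi in x]      # points exactly on the line y = 2x + 1
w = [1.0, 0.5, 2.0, 1.0, 0.25]    # arbitrary positive weights
b0, b1 = weighted_fit(x, y, w)
print(round(b0, 6), round(b1, 6))  # 1.0 2.0
```

With noisy data the weights matter: observations with larger w_i pull the fitted line toward themselves, which is exactly what the variance rule w_i = 1/σ_i^2 exploits.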
Example: height x shoe size • The equation v = (X'*X)\(X'*y) becomes: v = (X'*([W.*X(:,1), W.*X(:,2)])) \ (X'*(W.*y)) • W is the weight vector: W = whatever you desire…
Example weights • Weights inversely proportional to the absolute residuals: W = abs(1./(y - X*v))
All together now • wi = ones(10,1) (uniform weights) • wi = selective (hand-chosen weights)
Robust Least Squares • It is usually assumed that the response errors follow a normal distribution, and that extreme values are rare. Still, extreme values called outliers do occur. • The main disadvantage of least-squares fitting is its sensitivity to outliers. • Outliers have a large influence on the fit because squaring the residuals magnifies the effects of these extreme data points.
Outliers (wikipedia) • In statistics, an outlier is an observation that is numerically distant from the rest of the data. • Grubbs defined an outlier as: • An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs.
Outliers – Causes (wikipedia) • Outliers arise due to changes in system behaviour, fraudulent behaviour, human error, instrument error, or simply through natural deviations in populations. • A physical apparatus for taking measurements may suffer a transient malfunction. • Errors in data transmission or transcription. • A sample may have been contaminated with elements from outside the population being examined. • Alternatively, an outlier could be the result of a flaw in the assumed theory, calling for further investigation by the researcher.
Outliers - CAUTION • Unless it can be ascertained that the deviation is not significant, it is ill-advised to ignore the presence of outliers. • Outliers that cannot be readily explained demand special attention. • NEVER DISCARD A DATA POINT!
Robust Least Squares • To minimize the influence of outliers, you can fit your data using two robust regression methods: • Least absolute residuals (LAR): finds a curve that minimizes the absolute difference of the residuals. • Bisquare weights: minimizes a weighted sum of squares, where the weight given to each data point depends on how far the point is from the fitted line.
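Bisquare weighting is commonly implemented as iteratively reweighted least squares (IRLS): fit, compute residuals, derive weights that shrink as residuals grow, and refit. The sketch below illustrates this on invented data with one outlier; the tuning constant 4.685 and the MAD-based scale estimate are common defaults, not values taken from the slides.

```python
# Robust fit via iteratively reweighted least squares with bisquare
# weights.  The data are invented: five points near y = 2x plus one
# gross outlier at x = 3, which the weights progressively suppress.

def weighted_fit(x, y, w):
    """Weighted simple linear regression (closed form)."""
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b1 = (sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
          / sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x)))
    return yb - b1 * xb, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 20.0, 8.1, 9.9, 12.1]   # y[2] is the outlier

w = [1.0] * len(x)                      # start from ordinary least squares
for _ in range(20):
    b0, b1 = weighted_fit(x, y, w)
    r = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = sorted(abs(ri) for ri in r)[len(r) // 2] / 0.6745  # MAD scale
    k = 4.685 * s                        # bisquare tuning constant
    w = [(1 - (ri / k) ** 2) ** 2 if abs(ri) < k else 0.0 for ri in r]

print(round(b0, 2), round(b1, 2))  # close to the slope 2 of the clean points
```

After a few iterations the outlier's residual exceeds the cutoff k, its weight drops to zero, and the fit settles on the five clean points; plain least squares on the same data gives a visibly tilted line.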
Non Linear Least Squares • The nonlinear least-squares formulation can be used to fit a nonlinear model to data. • A nonlinear model is defined as an equation that is nonlinear in the coefficients, or a combination of linear and nonlinear in the coefficients. • For example, Gaussians, ratios of polynomials, and power functions are all nonlinear.
Non Linear Least Squares • The basis of the method is to approximate the model by a linear one and to refine the parameters by successive iterations. • The derivatives are functions of both the independent variable and the parameters, so these gradient equations do not have a closed-form solution.
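As an illustration of these successive linearizations, the sketch below applies the Gauss-Newton idea to the nonlinear model y = a*exp(b*x): at each step the model is linearized via its Jacobian and a small linear least-squares problem is solved for the parameter update. The model, data, and starting guess are all invented for the example.

```python
# Gauss-Newton for the nonlinear model y = a * exp(b * x).
# Invented data generated from a = 2, b = 0.5 with no noise.
import math

x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

a, b = 1.5, 0.3                 # rough starting guess
for _ in range(50):
    r = [yi - a * math.exp(b * xi) for xi, yi in zip(x, y)]  # residuals
    # Jacobian columns: d(model)/da = exp(bx), d(model)/db = a*x*exp(bx)
    J = [[math.exp(b * xi), a * xi * math.exp(b * xi)] for xi in x]
    # Solve the 2x2 normal equations J'J * delta = J'r for the update
    a11 = sum(j[0] * j[0] for j in J)
    a12 = sum(j[0] * j[1] for j in J)
    a22 = sum(j[1] * j[1] for j in J)
    g1 = sum(j[0] * ri for j, ri in zip(J, r))
    g2 = sum(j[1] * ri for j, ri in zip(J, r))
    det = a11 * a22 - a12 * a12
    da = (a22 * g1 - a12 * g2) / det
    db = (a11 * g2 - a12 * g1) / det
    a, b = a + da, b + db        # refine the parameters

print(round(a, 3), round(b, 3))  # should converge to a = 2, b = 0.5
```

Unlike the linear case, the result depends on the starting guess and the iteration may fail to converge for a poor one, which is why nonlinear fitting tools ask for initial parameter values.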