Tópicos Especiais em Aprendizagem Reinaldo Bianchi Centro Universitário da FEI 2012
Lecture 1, Part B
Objectives of this lecture • Present the basic concepts of Machine Learning: • Introduction. • Basic definitions. • Application areas. • Statistical Machine Learning. • Today's lecture: Chapter 1 of Mitchell, Chapter 1 of Nilsson, and Chapters 1 and 2 of Hastie, plus Wikipedia.
Main Approaches according to Statistics [Diagram: machine learning approaches grouped by tradition (AI, Statistics, Neural Networks): Explanation-Based Learning, Decision Trees, Case-Based Learning, Inductive Learning, Bayesian Learning, Nearest Neighbors, Neural Networks, Support Vector Machines, Genetic Algorithms, Regression, Clustering, Reinforcement Learning, Classification. Successive slides highlight the approaches with statistical roots: Nearest Neighbors, Support Vector Machines, Regression, Clustering, Classification.]
First lecture, part B • Introduction to Statistical Machine Learning: • Basic definitions. • Regression. • Classification.
Textbook • The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Hastie, Tibshirani & Friedman)
Why Statistical Learning? • "Statistical learning plays a key role in many areas of science, finance and industry." • "The science of learning plays a key role in the fields of statistics, data mining and artificial intelligence, intersecting with areas of engineering and other disciplines."
SML problems Predict whether a patient, hospitalized due to a heart attack, will have a second heart attack. The prediction is to be based on demographic, diet and clinical measurements for that patient. Predict the price of a stock in 6 months from now, on the basis of company performance measures and economic data.
SML problems Identify the numbers in a handwritten ZIP code, from a digitized image. Estimate the amount of glucose in the blood of a diabetic person, from the infrared absorption spectrum of that person's blood. Identify the risk factors for prostate cancer, based on clinical and demographic variables.
Examples of SML problems Prostate Cancer Study by Stamey et al. (1989) that examined the correlation between the level of prostate specific antigen (PSA) and a number of clinical measures. The goal is to predict the log of PSA (lpsa) from a number of measurements.
Other examples of learning problems DNA Microarrays Expression matrix of 6830 genes (rows, only 100 shown) and 64 samples (columns) for the human tumor data. The display is a heat map, ranging from bright green (negative, under expressed) to bright red (positive, over expressed). Missing values are grey. • Task: describe how the data are organised or clustered. • (unsupervised learning)
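As an illustration of this kind of unsupervised task, here is a minimal sketch that groups samples by their expression profiles. It assumes NumPy and scikit-learn are available and uses a synthetic stand-in for a real expression matrix; it is not the analysis performed in the original study.

```python
import numpy as np
from sklearn.cluster import KMeans  # assumed available; any clustering routine would do

rng = np.random.default_rng(0)

# Synthetic stand-in for an expression matrix: one row per sample, one column per gene
# (clustering the 64 samples of the real data set would use the same orientation).
n_samples, n_genes = 64, 100
expression = rng.normal(size=(n_samples, n_genes))

# Group the samples into k clusters according to their expression profiles.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(expression)
print(labels)  # cluster assignment for each of the 64 samples
```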
Overview of Supervised Learning Chapter 2 of Hastie
Variable Types and Terminology • In the statistical literature the inputs are often called the predictors or, more classically, the independent variables. • In the pattern recognition literature the term features is preferred, which we use as well. • The outputs are called the responses, or classically the dependent variables.
Variable Types and Terminology • The outputs vary in nature among the examples: • Prostate cancer prediction example: • The output is a quantitative measurement. • Handwritten digit example: • The output is one of 10 different digit classes: G = {0, 1, ..., 9}.
Naming convention for the prediction task • The distinction in output type has led to a naming convention for the prediction tasks: • Regression when we predict quantitative outputs. • Classification when we predict qualitative outputs. • Both can be viewed as a task in function approximation.
Examples of SML problems • The prostate cancer study (predicting the log of PSA from clinical measures) is a regression problem.
Examples of supervised learning problems • The handwritten ZIP-code digit example is a classification problem.
Qualitative variables representation • Qualitative variables are represented numerically by codes: • Binary case: when there are only two classes or categories, such as "success" or "failure", "survived" or "died". • These are often represented by a single binary digit or bit as 0 or 1, or else by −1 and 1.
Qualitative variables representation • When there are more than two categories, the most commonly used coding is via dummy variables: • a K-level qualitative variable is represented by a vector of K binary variables or bits, only one of which is "on" at a time. • These numeric codes are sometimes referred to as targets.
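A minimal sketch of this dummy-variable (one-hot) coding, assuming NumPy and a hypothetical 3-level variable:

```python
import numpy as np

# Hypothetical K-level qualitative variable (K = 3) and a few observations of it.
levels = ["low", "medium", "high"]
observations = ["medium", "low", "high", "medium"]

K = len(levels)
index = {level: j for j, level in enumerate(levels)}

# Each observation becomes a vector of K bits, only one of which is "on".
targets = np.zeros((len(observations), K), dtype=int)
for i, obs in enumerate(observations):
    targets[i, index[obs]] = 1

print(targets)
# [[0 1 0]
#  [1 0 0]
#  [0 0 1]
#  [0 1 0]]
```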
Variables • We will typically denote an input variable by the symbol X. • If X is a vector, its components can be accessed by subscripts X_j. • Observed values are written in lowercase: hence the ith observed value of X is written as x_i. • Quantitative outputs will be denoted by Y, and qualitative outputs will be denoted by G (for group).
Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors
Linear Methods for Regression • “Linear models were largely developed in the pre-computer age of statistics, but even in today’s computer era there are still good reasons to study and use them.” (Hastie et al.)
Linear Methods for Regression • For prediction purposes they can sometimes outperform non-linear models, especially in situations with: • small sample size • low signal-to-noise ratio • sparse data • They can also be applied to transformations of the inputs, which considerably expands their scope.
Linear Models and Least Squares The linear model has been a mainstay of statistics for the past 30 years and remains one of its most important tools. Given a vector of inputs X^T = (X_1, X_2, ..., X_p), we predict the output Y via the model: \hat{Y} = \hat{\beta}_0 + \sum_{j=1}^{p} X_j \hat{\beta}_j
Linear Models The term \hat{\beta}_0 is the intercept, also known as the bias in machine learning. Often it is convenient to include the constant variable 1 in X, include \hat{\beta}_0 in the vector of coefficients \hat{\beta}, and then write the linear model in vector form as an inner product: \hat{Y} = X^T \hat{\beta}
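A minimal sketch of this inner-product form, using hypothetical coefficients for p = 2 inputs and prepending the constant 1 to the input vector:

```python
import numpy as np

# Hypothetical fitted coefficients: [intercept beta_0, beta_1, beta_2].
beta_hat = np.array([1.0, 0.5, -2.0])

# One input vector with p = 2 components.
x = np.array([3.0, 4.0])

# Prepend the constant 1 so the intercept is absorbed into beta_hat,
# then predict via the inner product Y_hat = x^T beta_hat.
x_aug = np.concatenate(([1.0], x))
y_hat = x_aug @ beta_hat
print(y_hat)  # 1.0 + 0.5*3.0 + (-2.0)*4.0 = -5.5
```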
[Figures: regression lines E(y) = \beta_0 + \beta_1 x illustrating a positive linear relationship (slope \beta_1 > 0), a negative linear relationship (slope \beta_1 < 0), and no relationship (slope \beta_1 = 0); \beta_0 is the intercept.]
Fitting the data: Least Squares • How do we fit the linear model to a set of training data? • By far the most popular method is least squares. • Pick the coefficients β to minimize the Residual Sum of Squares: RSS(\beta) = \sum_{i=1}^{N} (y_i - x_i^T \beta)^2
Least Squares Method • Least Squares Criterion: \min \sum_{i} (y_i - \hat{y}_i)^2 • where: • y_i = observed value of the dependent variable for the ith observation • \hat{y}_i = estimated value of the dependent variable for the ith observation
Fitting the data: Least Squares • RSS(β) is a quadratic function of the parameters, and hence its minimum always exists, but may not be unique. • The solution is easiest to characterize in matrix notation: RSS(\beta) = (y - X\beta)^T (y - X\beta) • where X is an N × p matrix with each row an input vector • y is an N-vector of the outputs
Fitting the data: Least Squares • Differentiating with respect to β we get: \frac{\partial RSS}{\partial \beta} = -2 X^T (y - X\beta)
Fitting the data: Least Squares • Assuming that X has full column rank, we set the first derivative to zero: X^T (y - X\beta) = 0 • If X^T X is nonsingular, then the unique solution is given by: \hat{\beta} = (X^T X)^{-1} X^T y
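A minimal sketch of this least-squares solution on synthetic data, assuming NumPy; solving the normal equations with np.linalg.solve avoids explicitly inverting X^T X:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training data: N observations, a constant column plus p = 3 inputs.
N = 50
X = np.column_stack([np.ones(N), rng.normal(size=(N, 3))])
true_beta = np.array([2.0, 1.0, -0.5, 3.0])
y = X @ true_beta + 0.1 * rng.normal(size=N)

# Unique least-squares solution beta_hat = (X^T X)^{-1} X^T y,
# obtained by solving the normal equations X^T X beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # should be close to true_beta
```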
Example: height × shoe size • We wanted to explore the relationship between a person's height and their shoe size. • We asked ten individuals their height and corresponding shoe size. • We believe that a person's shoe size depends upon their height. • Height is the independent variable, x. • Shoe size is the dependent variable, y.
Example: height × shoe size The following data were collected:

Person      Height, x (inches)   Shoe size, y
Person 1    69                   9.5
Person 2    67                   8.5
Person 3    71                   11.5
Person 4    65                   10.5
Person 5    72                   11
Person 6    68                   7.5
Person 7    74                   12
Person 8    65                   7
Person 9    66                   7.5
Person 10   72                   13
Least Squares Method (matrix form) The unique solution is given by: \hat{\beta} = (X^T X)^{-1} X^T y Often it is convenient to include the constant variable 1 in X, and include \beta_0 in the vector of coefficients \beta.
With this convention, for the height × shoe size example X^T X is the 2 × 2 matrix \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}.
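Applying the same formula to the height × shoe size data above (a minimal sketch, assuming NumPy):

```python
import numpy as np

# Heights (inches) and shoe sizes of the ten people in the table above.
x = np.array([69, 67, 71, 65, 72, 68, 74, 65, 66, 72], dtype=float)
y = np.array([9.5, 8.5, 11.5, 10.5, 11, 7.5, 12, 7, 7.5, 13])

# Include the constant variable 1 in X, so X is n x 2 and
# X^T X is the 2 x 2 matrix [[n, sum(x)], [sum(x), sum(x^2)]].
X = np.column_stack([np.ones_like(x), x])
print(X.T @ X)

# Unique least-squares solution beta_hat = (X^T X)^{-1} X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # roughly [-25.7, 0.51]: intercept and (positive) slope
```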