Machine learning for economists
ME! • Hannes Rosenbusch • Social Psychology • PhD project on data science methods for psychology
Agenda • What is machine learning? • Important concepts • Classic prediction models • Coding
Actually it is this… • y = m * x + b • Prediction models (supervised machine learning) • Pattern recognition (unsupervised)
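The y = m * x + b on the slide is exactly what a supervised model fits. A minimal sketch with scikit-learn (the package named at the end of the deck), using made-up toy data where y = 2 * x + 1:

```python
# Minimal supervised-learning sketch: fit y = m * x + b (toy data assumed).
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]   # predictor x, one column per variable
y = [3, 5, 7, 9]           # outcome y, generated here as y = 2 * x + 1

model = LinearRegression().fit(X, y)
print(model.coef_[0])      # slope m (here 2.0)
print(model.intercept_)    # intercept b (here 1.0)
```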
What is the difference from my work? • Slight difference in focus
Am I already machine learning? • Machine learning = regression models + focus on prediction? • Kind of, yes. • I thought I could ask for a bigger computer?!
Benefits of ML • Remind us what is useful (prediction) • Unbiased quantifications of prediction accuracy • Better, cooler, more accurate prediction models
How to quantify accuracy • We need to quantify accuracy → many metrics available • … and make predictions for "new samples" • Fit model on sample A + evaluate model on sample A → BIAS • Evaluate model on sample B instead
Hold-out method • Training set & test set • Build model with training set (50–80% of data) • Evaluate accuracy with the remaining data • How much data do I need?
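The hold-out split above can be sketched with scikit-learn's `train_test_split`; the toy data are invented for illustration:

```python
# Hold-out method: fit on the training set, evaluate on the untouched test set.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X = [[i] for i in range(20)]
y = [2 * i + 1 for i in range(20)]

# 70% training / 30% test, within the 50-80% training share mentioned above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
accuracy = r2_score(y_test, model.predict(X_test))  # accuracy on "new" data
```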
Cross-validation • Split data into k sets • Build model with all but one set • Evaluate accuracy on the left-out set • Rotate the left-out set
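The rotate-the-left-out-set procedure is one line in scikit-learn; again the data here are a noise-free toy example:

```python
# k-fold cross-validation (k = 5): each fold is left out once,
# and the model is refit on the remaining four folds.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X = [[i] for i in range(20)]
y = [2 * i + 1 for i in range(20)]

scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores)  # one accuracy estimate per left-out fold
```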
Let's get to the models • Linear and logistic regression are machine learning! • But there are others! • Today we look at three • regularization-focused, tree-based, similarity-based
Cool models 1: Penalized regression • Normal regression: shrink residuals • Resulting model potentially unstable (multicollinearity) • Solution: minimize residuals AND coefficients (draw) • Better transferability to new samples
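A sketch of the "minimize residuals AND coefficients" idea with scikit-learn's `Ridge`; the two nearly collinear predictors are invented to mimic the instability the slide mentions:

```python
# Ridge regression: the alpha penalty shrinks coefficients toward zero.
from sklearn.linear_model import LinearRegression, Ridge

# Two nearly collinear predictors (multicollinearity destabilizes plain OLS).
X = [[1, 1.01], [2, 2.02], [3, 2.95], [4, 4.04], [5, 4.99]]
y = [2, 4, 6, 8, 10]

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # larger alpha = stronger penalization

# The ridge coefficient vector is smaller than the OLS one.
print(sum(c * c for c in ols.coef_), sum(c * c for c in ridge.coef_))
```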
Cool models 2: decision trees/forests • Yes/no rules instead of beta weights • Where do rules come from? • Optimization in training sample • Minimize heterogeneity in leaves
Cool models 2: decision trees/forests • Random forest • Combination of different decision trees • Take a random sample of data • Take a random sample of variables at the splits • Build tree • Repeat ~500 times
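The bootstrap-plus-random-variables recipe above is what `RandomForestRegressor` does internally; the toy data and the choice of 100 trees (the slide suggests ~500) are illustrative:

```python
# Random forest: many trees, each grown on a bootstrap sample of the data
# with a random subset of variables considered at each split.
from sklearn.ensemble import RandomForestRegressor

X = [[i, i % 3] for i in range(30)]
y = [2 * i for i in range(30)]

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
pred = forest.predict([[10, 1]])[0]  # average of the individual trees
```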
Cool models 3: nearest neighbors models • Predict from the training cases most similar to the new observation
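A similarity-based sketch with `KNeighborsRegressor`, on made-up data with two obvious clusters:

```python
# k-nearest neighbors: predict the average outcome of the k most similar cases.
from sklearn.neighbors import KNeighborsRegressor

X = [[1], [2], [3], [10], [11], [12]]
y = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]

knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
# The three neighbors of 2.5 are 1, 2, 3 -> mean of 1.0, 1.1, 0.9
print(knn.predict([[2.5]])[0])
```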
Hyperparameters • You need to tell these models how to act • Ridge regression → how much penalization? • Trees → how many branches? • Nearest neighbors → how many neighbors? • Tuning means adjusting hyperparameters to make the model accurate • Choose the hyperparameter value that gives the highest accuracy on new data
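"Choose the value with the highest accuracy on new data" is exactly what `GridSearchCV` automates, combining the cross-validation idea from earlier with a grid of candidate values (the grid and toy data here are made up):

```python
# Hyperparameter tuning: try several alpha values for ridge regression
# and keep the one with the best cross-validated accuracy.
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X = [[i, (i * 7) % 5] for i in range(30)]
y = [3 * i + 1 for i in range(30)]

search = GridSearchCV(
    Ridge(), param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5
)
search.fit(X, y)
print(search.best_params_)  # the winning amount of penalization
```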
In sum • Machine learning is not magic… it is prediction • Social scientists are very well prepared to acquire ML skills • Extensions to classic methods are: • Focus on prediction • Out-of-sample evaluation • Cool new models
Making it happen • Python offers simple implementations of: • many ML models • splitting into training and test sets • quantifications of accuracy • #1 ML package: sklearn (scikit-learn)
Practice challenge • Predict the median income of US counties • You get a dataset with plenty of predictors • (census, election results, psych. tests, Twitter) • Together we implement: • Regression model • Cross-validation • Evaluation
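The three pieces of the challenge (regression model, cross-validation, evaluation) fit together as below. The county dataset is not included here, so a synthetic stand-in is generated; with the real data you would load the census/election/Twitter predictors instead:

```python
# Practice-challenge skeleton: regression + cross-validation + evaluation.
import random

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

random.seed(0)
X = [[random.random(), random.random()] for _ in range(100)]  # stand-in predictors
y = [40000 + 20000 * a + 5000 * b for a, b in X]              # stand-in "median income"

scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores.mean())  # average out-of-sample accuracy across folds
```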