Machine learning for economists
ME! • Hannes Rosenbusch • Social Psychology • PhD project on data science methods for psychology
Agenda • What is machine learning? • Important concepts • Classic prediction models • Coding
Actually it is this… • y = m * x + b • Prediction models (supervised machine learning) • Pattern recognition (unsupervised)
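The y = m * x + b on the slide is exactly what a supervised model fits. A minimal sketch with scikit-learn (the package named at the end of the deck), using made-up toy data where y = 2 * x + 1:

```python
# Minimal supervised-learning sketch: fit y = m * x + b (toy data assumed).
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]   # predictor x, one column per variable
y = [3, 5, 7, 9]           # outcome y, generated here as y = 2 * x + 1

model = LinearRegression().fit(X, y)
print(model.coef_[0])      # slope m (here 2.0)
print(model.intercept_)    # intercept b (here 1.0)
```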
What is the difference from my work? • Slight difference in focus
Am I already machine learning? • Machine learning = regression models + focus on prediction? • Kind of, yes. • I thought I could ask for a bigger computer?!
Benefits of ML • Remind us what is useful (prediction) • Unbiased quantifications of prediction accuracy • Better, cooler, more accurate prediction models
How to quantify accuracy • We need to quantify accuracy → many metrics available • … and make predictions for "new samples" • Fit model on sample A + evaluate model on sample A → BIAS • Evaluate model on sample B instead
Hold-out method • Training set & test set • Build model with training set (50–80% of data) • Evaluate accuracy with the remaining data • How much data do I need?
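The hold-out split above can be sketched with scikit-learn's `train_test_split`; the toy data are invented for illustration:

```python
# Hold-out method: fit on the training set, evaluate on the untouched test set.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X = [[i] for i in range(20)]
y = [2 * i + 1 for i in range(20)]

# 70% training / 30% test, within the 50-80% training share mentioned above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
accuracy = r2_score(y_test, model.predict(X_test))  # accuracy on "new" data
```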
Cross-validation • Split data into k sets • Build model with all but one set • Evaluate accuracy on the left-out set • Rotate the left-out set
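The rotate-the-left-out-set procedure is one line in scikit-learn; again the data here are a noise-free toy example:

```python
# k-fold cross-validation (k = 5): each fold is left out once,
# and the model is refit on the remaining four folds.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X = [[i] for i in range(20)]
y = [2 * i + 1 for i in range(20)]

scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores)  # one accuracy estimate per left-out fold
```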
Let's get to the models • Linear and logistic regression are machine learning! • But there are others! • Today we look at three • regularization-focused, tree-based, similarity-based
Cool models 1: Penalized regression • Normal regression: shrink residuals • Resulting model potentially unstable (multicollinearity) • Solution: minimize residuals AND coefficients (draw) • Better transferability to new samples
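A sketch of the "minimize residuals AND coefficients" idea with scikit-learn's `Ridge`; the two nearly collinear predictors are invented to mimic the instability the slide mentions:

```python
# Ridge regression: the alpha penalty shrinks coefficients toward zero.
from sklearn.linear_model import LinearRegression, Ridge

# Two nearly collinear predictors (multicollinearity destabilizes plain OLS).
X = [[1, 1.01], [2, 2.02], [3, 2.95], [4, 4.04], [5, 4.99]]
y = [2, 4, 6, 8, 10]

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # larger alpha = stronger penalization

# The ridge coefficient vector is smaller than the OLS one.
print(sum(c * c for c in ols.coef_), sum(c * c for c in ridge.coef_))
```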
Cool models 2: decision trees/forests • Yes/no rules instead of beta weights • Where do rules come from? • Optimization in training sample • Minimize heterogeneity in leaves
Cool models 2: decision trees/forests • Random forest • Combination of different decision trees • Take a random sample of data • Take a random sample of variables at the splits • Build tree • Repeat ~500 times
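The bootstrap-plus-random-variables recipe above is what `RandomForestRegressor` does internally; the toy data and the choice of 100 trees (the slide suggests ~500) are illustrative:

```python
# Random forest: many trees, each grown on a bootstrap sample of the data
# with a random subset of variables considered at each split.
from sklearn.ensemble import RandomForestRegressor

X = [[i, i % 3] for i in range(30)]
y = [2 * i for i in range(30)]

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
pred = forest.predict([[10, 1]])[0]  # average of the individual trees
```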
Cool models 3: nearest neighbors models • Predict from the training cases most similar to the new observation
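A similarity-based sketch with `KNeighborsRegressor`, on made-up data with two obvious clusters:

```python
# k-nearest neighbors: predict the average outcome of the k most similar cases.
from sklearn.neighbors import KNeighborsRegressor

X = [[1], [2], [3], [10], [11], [12]]
y = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]

knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
# The three neighbors of 2.5 are 1, 2, 3 -> mean of 1.0, 1.1, 0.9
print(knn.predict([[2.5]])[0])
```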
Hyperparameters • You need to tell these models how to act • Ridge regression → how much penalization? • Trees → how many branches? • Nearest neighbors → how many neighbors? • Tuning means adjusting hyperparameters to make the model accurate • Choose the hyperparameter value that gives the highest accuracy on new data
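"Choose the value with the highest accuracy on new data" is exactly what `GridSearchCV` automates, combining the cross-validation idea from earlier with a grid of candidate values (the grid and toy data here are made up):

```python
# Hyperparameter tuning: try several alpha values for ridge regression
# and keep the one with the best cross-validated accuracy.
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X = [[i, (i * 7) % 5] for i in range(30)]
y = [3 * i + 1 for i in range(30)]

search = GridSearchCV(
    Ridge(), param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5
)
search.fit(X, y)
print(search.best_params_)  # the winning amount of penalization
```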
In sum • Machine learning is not magic… it is prediction • Social scientists are very well prepared to acquire ML skills • Extensions to classic methods are: • Focus on prediction • Out-of-sample evaluation • Cool new models
Making it happen • Python offers simple implementations of: • many ML models • splitting into training and test sets • quantifications of accuracy • #1 ML package: sklearn (scikit-learn)
Practice challenge • Predict the median income of US counties • You get a dataset with plenty of predictors • (census, election results, psych. tests, Twitter) • Together we implement: • Regression model • Cross-validation • Evaluation
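The three pieces of the challenge (regression model, cross-validation, evaluation) fit together as below. The county dataset is not included here, so a synthetic stand-in is generated; with the real data you would load the census/election/Twitter predictors instead:

```python
# Practice-challenge skeleton: regression + cross-validation + evaluation.
import random

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

random.seed(0)
X = [[random.random(), random.random()] for _ in range(100)]  # stand-in predictors
y = [40000 + 20000 * a + 5000 * b for a, b in X]              # stand-in "median income"

scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores.mean())  # average out-of-sample accuracy across folds
```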