Quantitative Methods in Social Sciences (E774)

Sudip Ranjan BASU, Ph.D.
27 November 2009

Part 1

Model Selection Procedures
• Selecting explanatory variables for a model
• Maximum R²
• Backward elimination: remove variables until all remaining coefficients are significant
• Forward selection: add variables one at a time
• Stepwise regression: drop variables if they lose their significance as other variables are added
• Exploratory vs. explanatory research

Lecture 11-Sudip R. Basu

Regression Diagnostics
• The regression function follows a linear relationship
• The conditional distribution of Y (dependent variable) follows a normal distribution
• Homoscedasticity: the conditional distribution of Y has a constant standard deviation throughout the range of values of the explanatory variables
• The sample is randomly selected

Checking Residuals
• Examine the residuals
• Plot the residuals against the explanatory variables

Heteroskedasticity
• Estimate the regression model
• Obtain the residuals
• Plot the residuals against the fitted values or one or more explanatory variables
• Breusch-Pagan/Cook-Weisberg test, H0: constant variance
• Example: P-value (chi-square test) = 0.00, so reject the constant-variance hypothesis
• Significant heteroskedasticity implies that the standard errors and the tests of H0 might be invalid
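The steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it handles a single explanatory variable, it uses the studentized (Koenker) n·R² form of the Breusch-Pagan statistic because the slide does not say which variant was run, and the data and function names are invented for the example.

```python
import math

def ols_simple(x, y):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(x, y):
    """R-squared of the simple regression of y on x."""
    a, b = ols_simple(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def breusch_pagan_simple(x, y):
    """Assumed variant: LM = n * R^2 from regressing squared residuals on x."""
    a, b = ols_simple(x, y)
    sq_resid = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    lm = len(x) * r_squared(x, sq_resid)
    # Upper-tail probability of a chi-square variable with df = 1
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

# Hypothetical data: the error spread grows with x (heteroskedastic by design)
x = list(range(1, 41))
y = [2 * xi + (0.5 * xi if i % 2 == 0 else -0.5 * xi) for i, xi in enumerate(x)]
lm, p_value = breusch_pagan_simple(x, y)
print(f"LM = {lm:.2f}, p-value = {p_value:.4g}")
```

A small p-value leads to rejecting H0 of constant variance, as on the slide.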

Outliers: Influential Observations?
• Remove outliers
• Leverage: a nonnegative statistic such that the larger its value, the greater the weight that observation receives in determining the fitted values (ŷ)
• DFFIT: effect on the fit of deleting the observation. The larger its absolute value, the greater the influence that observation has on the fitted values.
• DFBETA: effect on the model parameter estimates of removing the observation from the dataset. The larger its absolute value, the greater the influence of the observation on the parameter estimates.
• Cook’s distance: effect that observation i has on all the predicted values
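As a rough illustration of two of these diagnostics, the sketch below computes leverage and Cook's distance for a regression with one explanatory variable, using the standard textbook formulas. The data are invented, and DFFIT and DFBETA are left out to keep the example short.

```python
def influence_simple(x, y):
    """Leverage and Cook's distance for simple linear regression (p = 2 parameters)."""
    n, p = len(x), 2
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - p)           # residual variance
    leverage = [1.0 / n + (xi - mx) ** 2 / sxx for xi in x]
    cooks = [(e ** 2 / (p * s2)) * h / (1.0 - h) ** 2   # Cook's distance
             for e, h in zip(resid, leverage)]
    return leverage, cooks

# Hypothetical data: the last observation sits far from the others in x
x = [1, 2, 3, 4, 5, 20]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 8.0]
leverage, cooks = influence_simple(x, y)
for i, (h, d) in enumerate(zip(leverage, cooks), start=1):
    print(f"obs {i}: leverage = {h:.3f}, Cook's D = {d:.3f}")
```

In this made-up dataset the last observation has both the largest leverage and the largest Cook's distance, so it is the one to inspect first.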

Multicollinearity
• Explanatory variables ‘overlap’ considerably, with high R² values
• Multicollinearity inflates standard errors
• Variance inflation factor: the multiplicative increase in the variance (squared se) of the estimator due to xj being correlated with the other predictors

Part 2: NOT PART OF FINAL EXAM on 7th December

Generalised Linear Model
• The response variable (y) is non-normal
• y is a discrete binary variable (success or failure): logistic regression
• y is a discrete count variable (number of children): Poisson or negative binomial distribution
• y is a continuous non-normal variable: gamma distribution

Link function
• The link function is g(μ)
• Identity link
• Log link
• Logistic (logit) link
• Use of explanatory variables
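The formulas on this slide did not survive in the transcript. The standard textbook forms of the three links, with μ = E(y) and k explanatory variables, are given below as a reference and are not copied from the slide itself.

```latex
\begin{align*}
g(\mu) &= \alpha + \beta_1 x_1 + \dots + \beta_k x_k
  && \text{(link function applied to the mean)} \\
g(\mu) &= \mu
  && \text{(identity link, ordinary regression)} \\
g(\mu) &= \log(\mu)
  && \text{(log link, e.g. for counts)} \\
g(\mu) &= \log\!\left(\frac{\mu}{1-\mu}\right)
  && \text{(logit link, for a probability } 0 < \mu < 1\text{)}
\end{align*}
```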

OLS as special case of GLM
• Y can have a distribution other than the normal
• A GLM can model a function of the mean
• A GLM does not need the data to be transformed
• Maximum likelihood is used to fit a GLM
• Choose the most appropriate probability distribution for the y variable

Nonlinear relationship
• If the relationship is nonlinear, a straight-line OLS fit may understate the association
• Predictions may poorly approximate the true regression curve
• Two approaches to handle this: polynomial regression and loglinear regression

Quadratic regression models
• A polynomial regression function, with y the response and x the explanatory variable
• Quadratic regression model
• Cubic regression model
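The model equations are missing from the transcript. In the usual notation, which is assumed here rather than taken from the slide, they read:

```latex
\begin{align*}
E(y) &= \alpha + \beta_1 x + \beta_2 x^2 + \dots + \beta_k x^k
  && \text{(polynomial of degree } k\text{)} \\
E(y) &= \alpha + \beta_1 x + \beta_2 x^2
  && \text{(quadratic)} \\
E(y) &= \alpha + \beta_1 x + \beta_2 x^2 + \beta_3 x^3
  && \text{(cubic)}
\end{align*}
```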

Nonparametric regression
• Fit a model without assuming a particular functional form
• No (or fewer) assumptions about the functional form and the distribution of y
• Plot a fitted nonparametric regression model to learn about (overall) trends in the data
• Generalised additive model: the GLM is the special case in which these functions are linear

Exponential regression
• Y is an exponential function of x
• Taking the logarithm of the exponential function gives a linear form
• Coefficients are interpreted as ‘multiplicative’, not linear
• E(y) changes by the same percentage for each unit increase in x
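A short numerical sketch of the multiplicative interpretation follows, assuming the usual form E(y) = α·βˣ. The values of alpha and beta are made up for the illustration.

```python
import math

def exp_mean(alpha, beta, x):
    """Mean of y under the assumed exponential regression E(y) = alpha * beta**x."""
    return alpha * beta ** x

alpha, beta = 100.0, 1.05   # hypothetical values: 5% growth per unit of x
means = [exp_mean(alpha, beta, x) for x in range(4)]

# Each unit increase in x multiplies the mean by beta (same percentage change)
ratios = [means[i + 1] / means[i] for i in range(3)]

# On the log scale the model is linear: log E(y) = log(alpha) + log(beta) * x
log_steps = [math.log(means[i + 1]) - math.log(means[i]) for i in range(3)]

print(ratios)
print(log_steps)
```

The ratios are all equal to beta, and the steps on the log scale are all equal to log(beta), which is why the model can be fitted as a straight line after taking logarithms.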

Logistic Regression
• Y is a categorical variable with two possible outcomes
• Binary response variable (1 or 0), P(y=1)
• Binomial distribution
• Linear probability model
• Single explanatory variable

Binary response y variable
• Curvilinear relationship
• Odds ratio
• Logistic transformation
• Logistic regression model
• For β>0, P(y=1) increases as X increases
• For β<0, P(y=1) decreases as X increases
• For β=0, P(y=1) does not change as X increases
• Logistic regression for probabilities
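The three cases for β can be checked numerically. The sketch below assumes the usual logistic regression form, log[P/(1−P)] = α + βx, and uses invented values for α and β.

```python
import math

def logistic_prob(alpha, beta, x):
    """P(y=1) when the log odds equal alpha + beta * x."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

def odds(p):
    """Odds of success corresponding to probability p."""
    return p / (1.0 - p)

alpha = -2.0                      # hypothetical intercept
xs = [0, 1, 2, 3, 4]
rising = [logistic_prob(alpha, 0.8, x) for x in xs]    # beta > 0
falling = [logistic_prob(alpha, -0.8, x) for x in xs]  # beta < 0
flat = [logistic_prob(alpha, 0.0, x) for x in xs]      # beta = 0

# A one-unit increase in x multiplies the odds by exp(beta)
odds_ratio = odds(rising[1]) / odds(rising[0])
print(rising, falling, flat, odds_ratio, sep="\n")
```

The probabilities follow an S-shaped curve rather than a straight line, while the odds change by the constant factor exp(β) per unit of x.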

Binomial probability distribution
• Categorical data, discrete variable
• Each observation falls into one of two categories
• The probabilities for the two categories are the same for each observation
• Category 1: π, Category 2: 1-π
• Outcomes of successive observations are independent

Properties of binomial distributions
• The binomial distribution is perfectly symmetric if π=0.50, otherwise skewed
• Skewness increases as π gets closer to 0 or 1
• Sample proportion: mean and standard deviation
• Binomial test
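The properties listed here can be illustrated with the standard binomial formulas, which are not shown in the transcript: the sample proportion has mean π and standard deviation √(π(1−π)/n), and the skewness of the count is (1−2π)/√(nπ(1−π)). The sample sizes and probabilities below are arbitrary.

```python
import math

def binom_pmf(k, n, pi):
    """Probability of k outcomes in category 1 out of n independent observations."""
    return math.comb(n, k) * pi ** k * (1 - pi) ** (n - k)

def proportion_mean_sd(n, pi):
    """Mean and standard deviation of the sample proportion."""
    return pi, math.sqrt(pi * (1 - pi) / n)

def skewness(n, pi):
    """Skewness of the binomial count; zero when pi = 0.5."""
    return (1 - 2 * pi) / math.sqrt(n * pi * (1 - pi))

n = 10
symmetric = [binom_pmf(k, n, 0.5) for k in range(n + 1)]
skewed = [binom_pmf(k, n, 0.1) for k in range(n + 1)]
mean, sd = proportion_mean_sd(100, 0.5)
print(symmetric, skewed, (mean, sd), sep="\n")
```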

Multiple Logistic Regression
• Logistic regression model (LRM) with multiple predictors
• LRM probabilities
• For 2 predictors
• Odds ratio: effects are additive on the log-odds scale and multiplicative on the odds scale
• Impact of the predictors on the probabilities

Inference for LRM
• Bivariate logistic regression
• H0: β=0, x has no effect on P(y=1)
• Use the z-distribution, except for small samples
• Wald statistic: chi-square distribution, df=1
• Likelihood-ratio test: H0 is that the extra parameters in the full model are equal to 0
• LR test statistic
• For large samples, the Wald and LR tests give similar results
• For small samples, use the LR test results
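The test statistics themselves are missing from the transcript. Their standard forms are given below, assuming b is the estimate of β, se its standard error, and ℓ0 and ℓ1 the maximised likelihoods under H0 and under the full model.

```latex
\begin{align*}
z &= \frac{b}{se}
  && \text{(approximately standard normal under } H_0\text{)} \\
W &= z^2 = \left(\frac{b}{se}\right)^2
  && \text{(Wald statistic, chi-square with } df = 1\text{)} \\
LR &= -2\,(\log \ell_0 - \log \ell_1)
  && \text{(chi-square, } df = \text{number of extra parameters)}
\end{align*}
```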

Probit Regression
• Y is a binary categorical variable; P(y=1) is modelled with the standard normal probability distribution
• Probit score/index
• A one-unit increase in x increases the probit score by b standard deviations
• Probit model
• Probabilities
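A minimal sketch of how a probit score turns into a probability, assuming the usual form P(y=1) = Φ(a + b·x) with Φ the standard normal cumulative distribution function. The coefficients a and b are invented for the example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def probit_score(a, b, x):
    """Probit index: a linear function of x, measured in standard deviations."""
    return a + b * x

def probit_prob(a, b, x):
    """P(y=1) as the standard normal CDF evaluated at the probit score."""
    return STD_NORMAL.cdf(probit_score(a, b, x))

a, b = -1.0, 0.5            # hypothetical coefficients
scores = [probit_score(a, b, x) for x in range(5)]
probs = [probit_prob(a, b, x) for x in range(5)]
print(scores)
print(probs)
```

Each unit increase in x raises the score by b, but the change in probability depends on where on the normal curve the score lies.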

Wrap-up: Week 8 to Week 12

Correlation Theory
• Correlation shows the association between two or more variables: the degree of relationship, the degree of covariability
• Three key issues: whether a relationship exists and how to measure it; testing its significance; exploring the cause-and-effect relation
• Positive or negative: depends on the direction of change of the variables. If both variables vary in the same direction, the correlation is positive.
• Simple, partial and multiple: simple if two variables are studied; partial or multiple if three or more variables are studied
• Linear and non-linear: depends on the constancy of the ratio of change between the variables. If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the relationship is linear.
• Scatter diagram: a graphic method (dot chart) for seeing whether two variables are related. The observations on the two variables are plotted and judged by their direction and closeness.

Correlation coefficient
• Correlation coefficient (Pearson’s r): describes the degree of correlation between two variables
• Interpreting r (-1 ≤ r ≤ +1)
• If r=+1: perfect positive relationship
• If r=-1: perfect negative relationship
• If r=0: no linear relationship
• Coefficient of determination: explained variation / total variation (a low value indicates a poor fit)
• Rank correlation: uses the orders/ranks of the observations on the two variables rather than the actual observations
• Steps to compute r:
1. Compute the deviations of the observations of the two variables from their respective mean values
2. Square these deviations and obtain the sums of squared deviations
3. Multiply the deviations of the observations of the two variables and obtain the total
4. Substitute these values in the formula: $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$
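The four steps for computing r translate directly into code. The function name and the small datasets below are made up for the illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed by the deviation steps listed on the slide."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    # Step 1: deviations from the respective means
    dx = [xi - mx for xi in x]
    dy = [yi - my for yi in y]
    # Step 2: sums of squared deviations
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    # Step 3: sum of the products of the deviations
    sxy = sum(a * b for a, b in zip(dx, dy))
    # Step 4: substitute into the formula for r
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])    # exact increasing line
r_neg = pearson_r(x, [10, 8, 6, 4, 2])    # exact decreasing line
r_mid = pearson_r(x, [2, 1, 4, 3, 5])     # noisy increasing pattern
print(r_pos, r_neg, r_mid, r_mid ** 2)
```

For two variables, squaring r gives the coefficient of determination.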

Linear Regression Model
• The regression function is a mathematical function that describes how the mean of Y changes according to the value of X
• β is the regression coefficient
• σ is the conditional standard deviation
• Estimate R² of the prediction equation
• H0: β=0, the variables are statistically independent
• Test statistic: t = b/se, where se is the standard error of b
• Multiple regression function: a slope describes the effect of an explanatory variable while controlling for the effects of the other explanatory variables in the model
• β1 and β2 are partial regression coefficients
• R², the coefficient of multiple determination, lies between 0 and 1
• If R²=1, the explanatory variables account for all the variation in y; if R²=0, they account for none
• Testing the collective influence of the Xi: the alternative hypothesis is that at least one βi is nonzero
• Test statistic: F distribution
• A small P-value for H0: β=0 suggests that the regression line has a nonzero slope
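The equations on this slide were lost in the transcript. The standard textbook versions are collected below, assuming n observations, k explanatory variables, b the estimate of β and s the estimated conditional standard deviation. They are a reference, not a copy of the slide.

```latex
\begin{align*}
E(y) &= \alpha + \beta x
  && \text{(linear regression function)} \\
t &= \frac{b}{se}, \qquad se = \frac{s}{\sqrt{\sum (x - \bar{x})^2}}
  && \text{(test of } H_0\colon \beta = 0,\ df = n - 2\text{)} \\
E(y) &= \alpha + \beta_1 x_1 + \beta_2 x_2
  && \text{(multiple regression function, two predictors)} \\
F &= \frac{R^2 / k}{(1 - R^2) / [\,n - (k + 1)\,]}
  && \text{(test of } H_0\colon \beta_1 = \dots = \beta_k = 0\text{)}
\end{align*}
```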

Regression Diagnostics
• Model selection procedures
• Selecting explanatory variables for a model: maximum R²
• Backward elimination: remove variables until all remaining coefficients are significant
• Forward selection: add variables one at a time
• Stepwise regression: drop variables if they lose their significance as other variables are added
• Exploratory vs. explanatory research
• Examine the residuals
• Plot the residuals against the explanatory variables
• Heteroskedasticity

Detecting Influential (outlier) Observations
• Remove outliers
• Leverage: a nonnegative statistic such that the larger its value, the greater the weight that observation receives in determining the fitted values (ŷ)
• DFFIT: effect on the fit of deleting the observation. The larger its absolute value, the greater the influence that observation has on the fitted values.
• DFBETA: effect on the model parameter estimates of removing the observation from the dataset. The larger its absolute value, the greater the influence of the observation on the parameter estimates.
• Cook’s distance: effect that observation i has on all the predicted values

Effects of multicollinearity
• Multicollinearity: explanatory variables ‘overlap’ considerably, with high R² values
• Multicollinearity inflates standard errors
• Variance inflation factor (VIF): the multiplicative increase in the variance (squared se) of the estimator due to xj being correlated with the other predictors
• The VIF ranges from 1.0 to infinity. VIFs greater than 10.0 are generally seen as indicative of severe multicollinearity.
• Tolerance (1/VIF) ranges from 0.0 to 1.0, with 1.0 being the absence of multicollinearity
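For a model with exactly two predictors, the VIF reduces to 1/(1−r²), where r is the correlation between the two predictors. The sketch below relies on that special case, which is an assumption made to keep the example short, and uses invented data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two lists of equal length."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF and tolerance when the model has only the predictors x1 and x2."""
    r2 = pearson_r(x1, x2) ** 2   # R-squared of x1 regressed on x2
    vif = 1.0 / (1.0 - r2)
    return vif, 1.0 / vif          # tolerance = 1 / VIF

x1 = [1, 2, 3, 4]
unrelated = [1, -1, -1, 1]          # uncorrelated with x1
overlapping = [1.1, 1.9, 3.1, 3.9]  # nearly a copy of x1

vif_low, tol_low = vif_two_predictors(x1, unrelated)
vif_high, tol_high = vif_two_predictors(x1, overlapping)
print(vif_low, tol_low)
print(vif_high, tol_high)
```

With more than two predictors, r² is replaced by the R² from regressing xj on all the other predictors.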

Wish you all the very best!!
• Presentation 1: 4 December, 8.15am-10am
• Group # 1-8
• 15 minutes maximum per group @ AJF-Villa Barton
• Presentation 2: 4 December, 4.15pm-6pm
• Group # 9-16
• 15 minutes maximum per group @ AJF-Villa Barton
• MDEV Exam: 7 December, 4.30pm-5.30pm
• @ E1+E2/Bungener-Rothschild
Be happy and enjoy numbers in life….
