Quantitative Methods in Social Sciences (E774). Sudip Ranjan BASU, Ph.D. 27 November 2009. Model Selection Procedures. Selecting explanatory variables for a model: maximum R²; backward elimination (all significant coefficients); forward selection (adding variables); stepwise regression.
Quantitative Methods in Social Sciences (E774) Sudip Ranjan BASU, Ph.D. 27 November 2009
Part 1
Model Selection Procedures
• Selecting explanatory variables for a model
• Maximum R²
• Backward elimination: remove variables until all remaining coefficients are significant
• Forward selection: add variables one at a time
• Stepwise regression: drop variables if they lose their significance as other variables are added
• Exploratory vs. explanatory research
Lecture 11-Sudip R. Basu
Regression Diagnostics
• The regression function follows a linear relationship
• The conditional distribution of Y (dependent variable) follows a normal distribution
• Homoscedasticity: the conditional distribution of Y has a constant standard deviation throughout the range of values of the explanatory variables
• The sample is randomly selected
Checking Residuals
• Examine the residuals
• Plot the residuals against the explanatory variables
Heteroskedasticity
• Estimate the regression model
• Obtain the residuals
• Plot the residuals against the fitted values or one or more explanatory variables
• Breusch-Pagan/Cook-Weisberg test
  • H0: constant variance
  • P-value (chi-square test) = 0.00: reject the constant-variance hypothesis
• Significant heteroskedasticity implies that standard errors and hypothesis tests might be invalid
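The test above can be sketched for one explanatory variable. This is a minimal illustration, not the exact routine behind the slide's output: it assumes the studentized (Koenker) form of the Breusch-Pagan statistic, computed as n·R² from an auxiliary regression of the squared residuals on x and compared with a chi-square distribution with df = 1. The data and the function names are made up for the example.

```python
import math

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(x, y):
    """R-squared of the simple regression of y on x."""
    a, b = ols_simple(x, y)
    my = sum(y) / len(y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / tss

def breusch_pagan(x, y):
    """Return (LM statistic, p-value) for H0: constant variance."""
    a, b = ols_simple(x, y)
    sq_resid = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    lm = len(x) * r_squared(x, sq_resid)   # n * R^2 of the auxiliary regression
    p_value = math.erfc(math.sqrt(lm / 2))  # upper tail of chi-square, df = 1
    return lm, p_value

# Made-up data: the error spread grows with x, so H0 should be rejected.
x = list(range(1, 21))
y = [2 + 0.5 * xi + (-1) ** xi * 0.2 * xi for xi in x]
lm, p_value = breusch_pagan(x, y)
print(f"LM = {lm:.2f}, p-value = {p_value:.5f}")
```

A small p-value here leads to the same decision as on the slide: reject constant variance.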
Outliers: Influential Observations?
• Remove outliers
• Leverage: a nonnegative statistic such that the larger its value, the greater the weight that observation receives in determining the predicted values
• DFFIT: effect on the fit of deleting the observation
  • The larger its absolute value, the greater the influence that observation has on the fitted values
• DFBETA: effect on the model parameter estimates of removing the observation from the dataset
  • The larger the absolute value, the greater the influence of the observation on the parameter estimates
• Cook’s distance: effect that observation i has on all the predicted values
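Two of these diagnostics can be sketched for a regression with a single explanatory variable. The sketch assumes the usual textbook formulas: leverage h_i = 1/n + (x_i − x̄)² / Σ(x − x̄)², and Cook's distance D_i = [e_i² / (p·s²)] · h_i / (1 − h_i)² with p = 2 parameters. The data and the function name are hypothetical; DFFIT and DFBETA are not computed here.

```python
def influence_measures(x, y):
    """Return (leverage, cooks_distance) lists for the line y = a + b*x."""
    n = len(x)
    p = 2  # number of estimated parameters: intercept and slope
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - p)  # residual variance estimate
    leverage = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    cooks = [(e ** 2 / (p * s2)) * h / (1 - h) ** 2
             for e, h in zip(resid, leverage)]
    return leverage, cooks

# Made-up data: the last observation is far from the others in x
# and does not follow the trend of the first nine points.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2, 8.8, 12.0]
leverage, cooks = influence_measures(x, y)
most_influential = max(range(len(x)), key=lambda i: cooks[i])
print("highest Cook's distance at observation", most_influential + 1)
```

The leverages always sum to the number of parameters, which gives a quick sanity check on the calculation.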
Multicollinearity
• Multicollinearity: explanatory variables ‘overlap’ considerably and have high R² values with one another
• Multicollinearity inflates standard errors
• Variance inflation factor: multiplicative increase in the variance (squared se) of the estimator due to xj being correlated with the other predictors
Part 2: NOT PART OF FINAL EXAM on 7th December
Generalised Linear Model
• The response variable (y) is non-normal
• y is a discrete binary variable (success or failure): logistic regression
• y is a discrete count variable (number of children): Poisson and negative binomial distributions
• y is a continuous non-normal variable: gamma distribution
Link function
• The link function is g(μ)
• Identity link
• Log link
• Logistic (logit) link
• Use of explanatory variables
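The formulas for these links did not survive in the transcript. The standard definitions are given below as a reference, with α and the β's as the usual intercept and coefficients (notation assumed, following the rest of the lecture):

```latex
\begin{align*}
  g(\mu) &= \alpha + \beta_1 x_1 + \cdots + \beta_k x_k
      && \text{(link applied to the mean, linear in the explanatory variables)} \\
  g(\mu) &= \mu
      && \text{(identity link)} \\
  g(\mu) &= \log(\mu)
      && \text{(log link)} \\
  g(\mu) &= \log\!\left(\frac{\mu}{1-\mu}\right)
      && \text{(logit link)}
\end{align*}
```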
OLS as special case of GLM
• Y can have a distribution other than the normal
• A GLM can model a function of the mean
• A GLM does not need the data to be transformed
• Maximum likelihood is applied to fit a GLM
• Choose the most appropriate probability distribution for the y variable
Nonlinear relationship
• If the relationship is nonlinear, a linear OLS fit may underestimate the strength of the association
• Predictions may poorly approximate the true regression curve
• Two approaches to handle this:
  • Polynomial regression
  • Loglinear regression
Quadratic regression models
• A polynomial regression function, with y the response and x the explanatory variable
• Quadratic regression model
• Cubic regression model
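The model equations are missing from the transcript. In the standard notation (assumed here, with α the intercept and the β's the coefficients) the two models are:

```latex
\begin{align*}
  E(y) &= \alpha + \beta_1 x + \beta_2 x^2
      && \text{(quadratic regression model)} \\
  E(y) &= \alpha + \beta_1 x + \beta_2 x^2 + \beta_3 x^3
      && \text{(cubic regression model)}
\end{align*}
```

Both remain linear in the parameters, so they can still be fitted by least squares with x² and x³ treated as additional explanatory variables.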
Nonparametric regression
• Fitting a model without assuming a particular functional form
• No (or fewer) assumptions about the functional form and the distribution of y
• Plot a fitted nonparametric regression model to learn about (overall) trends in the data
• Generalised additive model: the GLM is the special case in which these functions are linear
Exponential regression
• Y is an exponential function of x
• Take the logarithm of the exponential function
• Coefficients are interpreted as multiplicative, not linear (additive), effects
• E(y) changes by the same percentage for each unit increase in x
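A minimal sketch of this idea, assuming the model E(y) = α·β^x and fitting it by ordinary least squares on log(y) (which requires y > 0). The data are made up to grow by exactly 10% per unit of x, and the function name is hypothetical.

```python
import math

def fit_exponential(x, y):
    """Fit y = alpha * beta**x by least squares on log(y); return (alpha, beta)."""
    n = len(x)
    log_y = [math.log(v) for v in y]
    mx, ml = sum(x) / n, sum(log_y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (li - ml) for xi, li in zip(x, log_y)) / sxx
    intercept = ml - slope * mx
    # Back-transform: log(alpha) is the intercept, log(beta) is the slope.
    return math.exp(intercept), math.exp(slope)

# Made-up data that grow by 10% for each unit increase in x.
x = [0, 1, 2, 3, 4, 5]
y = [100 * 1.10 ** xi for xi in x]
alpha, beta = fit_exponential(x, y)
percent_change = (beta - 1) * 100
print(f"alpha = {alpha:.2f}, beta = {beta:.3f}, change per unit = {percent_change:.1f}%")
```

Here β is the multiplicative effect: each unit increase in x multiplies the fitted value by β, which is the same percentage change at every level of x.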
Logistic Regression
• Y: categorical variable with two possible outcomes
• Binary response variable (1 or 0), P(y=1)
• Binomial distribution
• Linear probability model
• Single explanatory variable
Binary response y variable
• Curvilinear relationship
• Odds ratio
• Logistic transformation
• Logistic regression model
  • For β>0, P(y=1) increases as X increases
  • For β<0, P(y=1) decreases as X increases
  • For β=0, P(y=1) does not change as X increases
• Logistic regression for probabilities
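The behaviour described above can be checked numerically. The sketch assumes the standard single-predictor model logit[P(y=1)] = α + βx, that is, P(y=1) = 1 / (1 + e^−(α+βx)). The coefficient values are hypothetical and chosen only for illustration.

```python
import math

def logistic_prob(alpha, beta, x):
    """P(y=1) under the model logit[P(y=1)] = alpha + beta*x."""
    return 1 / (1 + math.exp(-(alpha + beta * x)))

def odds(p):
    """Odds of success for a probability p."""
    return p / (1 - p)

alpha, beta = -3.0, 0.5  # hypothetical coefficients
p_at_4 = logistic_prob(alpha, beta, 4)
p_at_5 = logistic_prob(alpha, beta, 5)

# A one-unit increase in x multiplies the odds by exp(beta).
odds_ratio = odds(p_at_5) / odds(p_at_4)
print(f"P(y=1|x=4) = {p_at_4:.3f}, P(y=1|x=5) = {p_at_5:.3f}")
print(f"odds ratio = {odds_ratio:.4f}, exp(beta) = {math.exp(beta):.4f}")
```

With β > 0 the probability rises with x, while the odds ratio stays constant at e^β for every one-unit step.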
Binomial probability distribution
• Categorical data, discrete variable
• Each observation falls into one of 2 categories
• The probabilities for the two categories are the same for each observation
  • Category 1: π, Category 2: 1 − π
• Outcomes of successive observations are independent
Properties of binomial distributions
• The binomial distribution is perfectly symmetric if π=0.50, otherwise skewed
• Skewness increases as π gets closer to 0 or 1
• Sample proportion: mean, standard deviation
• Binomial test
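These properties can be verified with a short calculation. The sketch assumes the standard results that the binomial probability of k successes in n trials is C(n, k)·π^k·(1 − π)^(n−k), and that the sample proportion has mean π and standard deviation √(π(1 − π)/n). The function names and the values of n and π are chosen only for illustration.

```python
import math

def binomial_pmf(k, n, pi):
    """Probability of k successes in n independent trials with P(success) = pi."""
    return math.comb(n, k) * pi ** k * (1 - pi) ** (n - k)

def proportion_mean_sd(n, pi):
    """Mean and standard deviation of the sample proportion."""
    return pi, math.sqrt(pi * (1 - pi) / n)

n = 10
symmetric = [binomial_pmf(k, n, 0.5) for k in range(n + 1)]  # pi = 0.50
skewed = [binomial_pmf(k, n, 0.1) for k in range(n + 1)]     # pi close to 0

mean, sd = proportion_mean_sd(100, 0.5)
print("P(0 successes) vs P(10 successes), pi=0.5:", symmetric[0], symmetric[n])
print("P(0 successes) vs P(10 successes), pi=0.1:", skewed[0], skewed[n])
print(f"sample proportion with n=100, pi=0.5: mean = {mean}, sd = {sd}")
```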
Multiple Logistic Regression
• Logistic regression model (LRM) with multiple predictors
• LRM probabilities
• For 2 predictors
• Odds ratio: effects are linear on the log of the odds and multiplicative on the odds
• Impact of the predictors on the probabilities
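The slide's formulas are not in the transcript. For two predictors, the standard form of the model (notation assumed) is:

```latex
\begin{align*}
  \operatorname{logit}\,[P(y=1)]
      &= \log\!\left(\frac{P(y=1)}{1-P(y=1)}\right)
       = \alpha + \beta_1 x_1 + \beta_2 x_2 \\
  P(y=1)
      &= \frac{e^{\alpha + \beta_1 x_1 + \beta_2 x_2}}
              {1 + e^{\alpha + \beta_1 x_1 + \beta_2 x_2}} \\
  \frac{\text{odds at } x_1 + 1}{\text{odds at } x_1}
      &= e^{\beta_1} \qquad \text{(holding } x_2 \text{ fixed)}
\end{align*}
```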
Inference for LRM
• Bivariate logistic regression
• H0: β = 0, x has no effect on P(y=1)
• Use the z-distribution, except for small samples
• Wald statistic: chi-square distribution, df=1
• Likelihood-ratio test: tests whether the extra parameters in the full model are equal to 0
• LR test statistic
• For large samples, the Wald and LR tests give similar results
• For small samples, use the LR test results
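The test statistics themselves are missing from the transcript. Their standard forms are sketched below, where b is the estimate of β, se its standard error, ℓ0 the maximised likelihood under H0 and ℓ1 the maximised likelihood of the full model (symbols assumed):

```latex
\begin{align*}
  z &= \frac{b}{se}
      && \text{(approximately standard normal under } H_0\text{)} \\
  W &= z^2 = \left(\frac{b}{se}\right)^{2}
      && \text{(Wald statistic, chi-square with } df = 1\text{)} \\
  LR &= -2\,(\log \ell_0 - \log \ell_1)
      && \text{(chi-square, } df = \text{number of extra parameters in the full model)}
\end{align*}
```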
Probit Regression
• Y: categorical (binary) variable, with probabilities taken from the standard normal probability distribution
• Probit score/index
• A one-unit increase in x increases the probit score by b standard deviations
• Probit model
• Probabilities
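A minimal sketch of how a probit score turns into a probability, assuming the standard model P(y=1) = Φ(a + b·x), where Φ is the standard normal cumulative distribution function. The coefficients a and b are hypothetical.

```python
from statistics import NormalDist

def probit_score(a, b, x):
    """Probit score (index) a + b*x, measured in standard deviation units."""
    return a + b * x

def probit_prob(a, b, x):
    """P(y=1) = Phi(a + b*x), with Phi the standard normal CDF."""
    return NormalDist().cdf(probit_score(a, b, x))

a, b = -1.0, 0.5  # hypothetical coefficients
score_at_2 = probit_score(a, b, 2)
score_at_3 = probit_score(a, b, 3)
p_at_2 = probit_prob(a, b, 2)
p_at_3 = probit_prob(a, b, 3)
print(f"scores: {score_at_2}, {score_at_3}; probabilities: {p_at_2:.4f}, {p_at_3:.4f}")
```

The score rises by b for each unit of x, but the change in probability depends on where the score sits on the normal curve.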
Wrap-up: Week 8 to Week 12
Correlation Theory
• Correlation shows the association between two or more variables
  • Degree of relationship
  • Degree of covariability
• 3 key issues
  • Does a relationship exist, and how can it be measured?
  • Testing the significance
  • Exploring the cause-and-effect relation
• Correlation: positive or negative
  • Depends on the direction of change of the variables. If both variables vary in the same direction, the correlation is positive.
• Simple, partial and multiple
  • Simple if two variables are studied
  • Partial/multiple if three or more variables are studied
• Linear and non-linear
  • Depends on the constancy of the ratio of change between the variables. If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the relationship is linear.
• Scatter diagram
  • A graphic method to see whether two variables are related, represented by a dot chart
  • The observations of the two variables are plotted to show their direction and closeness
Correlation coefficient
• Correlation coefficient (Pearson’s r): describes the degree of correlation between two variables
• Interpreting r (−1 ≤ r ≤ +1)
  • If r = +1: perfect positive relationship
  • If r = −1: perfect negative relationship
  • If r = 0: no linear relationship
• Coefficient of determination: explained variation / total variation (a low value is not good!)
• Rank correlation: uses the orders/ranks of the observations rather than the actual observations of the two variables
• Steps to compute r:
  • Compute the deviations of the observations of the two variables from their respective mean values
  • Square these deviations and obtain the sums of squared deviations
  • Multiply the deviations of the observations of the two variables and obtain the total
  • Substitute these values in the formula: r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²]
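The steps for computing r translate directly into code. This is a small sketch with made-up data and a hypothetical function name; it follows the deviation-based calculation described above.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Step 1: deviations of each observation from its variable's mean.
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Step 2: sums of squared deviations.
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    # Step 3: sum of the products of the deviations.
    sxy = sum(a * b for a, b in zip(dx, dy))
    # Step 4: substitute into the formula for r.
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])   # perfect positive relationship
r_negative = pearson_r(x, [10, 8, 6, 4, 2])   # perfect negative relationship
r_mixed = pearson_r(x, [2, 1, 4, 3, 5])       # positive but imperfect
print(r_positive, r_negative, r_mixed, "r squared:", r_mixed ** 2)
```

For a single explanatory variable, squaring r gives the coefficient of determination.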
Linear Regression Model
• The regression function is a mathematical function that describes how the mean of Y changes according to the value of X
  • β is the regression coefficient
  • σ is the conditional standard deviation
• Estimate R² of the prediction equation
• H0: β = 0 (the variables are statistically independent)
  • Test statistic
  • Standard error of b
• Multiple regression function
  • A slope in the multiple regression function describes the effect of an explanatory variable while controlling for the effects of the other explanatory variables in the model
  • β1 and β2 are partial regression coefficients
• R-squared (between 0 and 1): coefficient of multiple determination
  • If R² = 1
  • If R² = 0
• Testing the collective influence of the Xi
  • Alternative hypothesis
  • Test statistic: F distribution
• A small P-value for H0: β = 0 suggests that the regression line has a nonzero slope
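Several formulas on this slide are missing from the transcript. The standard textbook versions are sketched here for reference; the symbols are assumptions (b is the estimate of β, SSE the residual sum of squares, TSS the total sum of squares, n the sample size, k the number of explanatory variables):

```latex
\begin{align*}
  E(y) &= \alpha + \beta x
      && \text{(linear regression function)} \\
  t &= \frac{b}{se}, \qquad
  se = \frac{s}{\sqrt{\sum (x - \bar{x})^2}}, \qquad
  s = \sqrt{\frac{SSE}{n-2}}
      && \text{(test of } H_0\colon \beta = 0,\ df = n - 2\text{)} \\
  E(y) &= \alpha + \beta_1 x_1 + \beta_2 x_2
      && \text{(multiple regression function)} \\
  R^2 &= \frac{TSS - SSE}{TSS}
      && \text{(coefficient of multiple determination)} \\
  F &= \frac{R^2 / k}{(1 - R^2) / [\,n - (k+1)\,]}
      && \text{(test of } H_0\colon \beta_1 = \cdots = \beta_k = 0\text{)}
\end{align*}
```

The F statistic has df1 = k and df2 = n − (k + 1), and its alternative hypothesis is that at least one βi differs from 0.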
Regression Diagnostics
• Model selection procedures
  • Selecting explanatory variables for a model: maximum R²
  • Backward elimination: all significant coefficients
  • Forward selection: adding variables
  • Stepwise regression: drop variables if they lose their significance as other variables are added
• Exploratory vs. explanatory research
• Examine the residuals
• Plot the residuals against the explanatory variables
• Heteroskedasticity
Detecting Influential (outlier) Observations
• Remove outliers
• Leverage: a nonnegative statistic such that the larger its value, the greater the weight that observation receives in determining the predicted values
• DFFIT: effect on the fit of deleting the observation
  • The larger its absolute value, the greater the influence that observation has on the fitted values
• DFBETA: effect on the model parameter estimates of removing the observation from the dataset
  • The larger the absolute value, the greater the influence of the observation on the parameter estimates
• Cook’s distance: effect that observation i has on all the predicted values
Effects of multicollinearity
• Multicollinearity: explanatory variables ‘overlap’ considerably and have high R² values with one another
• Multicollinearity inflates standard errors
• Variance inflation factor (VIF): multiplicative increase in the variance (squared se) of the estimator due to xj being correlated with the other predictors
  • The VIF ranges from 1.0 to infinity
  • VIFs greater than 10.0 are generally seen as indicative of severe multicollinearity
• Tolerance (1/VIF) ranges from 0.0 to 1.0, with 1.0 being the absence of multicollinearity
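A small sketch of the VIF and tolerance calculation, assuming the usual definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing x_j on the other predictors. With only two predictors, R_j² is simply the squared correlation between them, which keeps the example short. The data and function names are made up.

```python
def squared_correlation(x1, x2):
    """Squared Pearson correlation between two predictors."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    return s12 ** 2 / (s11 * s22)

def vif_and_tolerance(x1, x2):
    """Return (VIF, tolerance) for a model with exactly two predictors."""
    r2 = squared_correlation(x1, x2)
    vif = 1 / (1 - r2)
    return vif, 1 / vif

# Made-up predictors: nearly collinear, then exactly uncorrelated.
vif_high, tol_high = vif_and_tolerance([1, 2, 3, 4, 5], [1, 2, 3, 4, 6])
vif_none, tol_none = vif_and_tolerance([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print(f"overlapping predictors: VIF = {vif_high:.1f}, tolerance = {tol_high:.3f}")
print(f"uncorrelated predictors: VIF = {vif_none:.1f}, tolerance = {tol_none:.1f}")
```

The first pair exceeds the rule-of-thumb threshold of 10, while the second pair sits at the minimum VIF of 1.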
Wish you all the very best!!
• Presentation 1: 4 December, 8.15am-10am
  • Group # 1-8
  • 15 minutes maximum per group @ AJF-Villa Barton
• Presentation 2: 4 December, 4.15pm-6pm
  • Group # 9-16
  • 15 minutes maximum per group @ AJF-Villa Barton
• MDEV Exam: 7 December, 4.30pm-5.30pm
  • @ E1+E2/Bungener-Rothschild
Be happy and enjoy numbers in life….