Regression Analysis: Outline • Review of Regression Analysis • Regression with Categorical explanatory variables • Pooled Regression: Fixed Effect and Random Effect models
Regression Analysis in the overall context of Research • Research Purpose ~ research questions, objectives, hypotheses • Methodology ~ type of study, sampling plan and sample size determination, data collection methods, data analysis plan • Execution ~ data collection and data analysis • Discussion and Conclusion • Research Evaluations
Regression Analysis: Review • What is Regression? • A dependence technique ~ estimates the overall relationship between a dependent variable and one or more independent variables • Examples of dependent and independent variables? • Regression and Causality (~ causal claims rest on experiments or theory, not on regression alone) • Regression (~ predicts the dependent variable) versus Correlation (~ measures linear association) • Uses of Regression • Descriptive ~ describe the relationship and how strong it is • Inference ~ which variables are most important / significant? • Predictive ~ forecasting • Hypothesis Testing • Sample Size
Types of Variables in Regression Analysis • Independent • Dependent • Moderating • Mediating • Moderation-mediation
Moderating Variables • Testing Moderation • $Y = b_0 + b_1 X + b_2 Z + b_3 XZ + e$, which can be rearranged as $Y = (b_1 + b_3 Z)X + (b_0 + b_2 Z) + e$, so the slope of $X$ depends on the moderator $Z$
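A minimal Python sketch of the moderation test above, assuming simulated data with made-up coefficients ($b_0 = 1$, $b_1 = 2$, $b_2 = 0.5$, $b_3 = 1.5$). It fits the model with the product term $XZ$ by ordinary least squares via `numpy.linalg.lstsq` and shows how the slope of $X$, $b_1 + b_3 Z$, shifts with the moderator. In practice moderation is judged by the significance of $b_3$, which this sketch does not compute.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for design matrix X (intercept column included)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated data (assumed values, for illustration only):
# Y = 1 + 2*X + 0.5*Z + 1.5*X*Z + e
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 1 + 2 * x + 0.5 * z + 1.5 * x * z + rng.normal(scale=0.5, size=n)

# Design matrix with the product term XZ that carries the moderation effect
X = np.column_stack([np.ones(n), x, z, x * z])
b0, b1, b2, b3 = ols(X, y)

# The slope of X depends on the moderator: b1 + b3*Z
for z_val in (-1.0, 0.0, 1.0):
    print(f"Z = {z_val:+.0f}: slope of X = {b1 + b3 * z_val:.2f}")
```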
Mediator Variables • [Figure: mediation path diagram linking Attitude, BI and B through paths a, b and c]
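A sketch of the classic regression-based approach to mediation, assuming the conventional labelling (a: Attitude to BI, b: BI to B, c: total effect of Attitude on B, c': direct effect) and simulated data. The slide's figure did not survive, so this mapping of the paths is an assumption. With OLS on the same sample, the total effect decomposes exactly as c = c' + a·b.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for design matrix X (intercept column included)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated data with assumed paths: a = 0.8, b = 0.6, direct c' = 0.2
rng = np.random.default_rng(1)
n = 5000
attitude = rng.normal(size=n)                                # X
bi = 0.8 * attitude + rng.normal(size=n)                     # mediator M
behavior = 0.6 * bi + 0.2 * attitude + rng.normal(size=n)    # outcome Y

ones = np.ones(n)
# Total effect c: regress Y on X alone
c_total = ols(np.column_stack([ones, attitude]), behavior)[1]
# Path a: regress M on X
a = ols(np.column_stack([ones, attitude]), bi)[1]
# Path b and direct effect c': regress Y on M and X together
_, b, c_direct = ols(np.column_stack([ones, bi, attitude]), behavior)

indirect = a * b
print(f"a={a:.2f} b={b:.2f} c'={c_direct:.2f} indirect={indirect:.2f} total={c_total:.2f}")
```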
Multivariate Research Methods: Regression Analysis: Review • How does it work? • Formalization of the regression model: • $Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + e$, where $b_0 + b_1 X_1 + \dots + b_k X_k$ is the systematic part and the error $e$ is the unsystematic part • intercept, slopes, error • Examples? • What do we observe? $Y$ and the $X$'s; we estimate the $b$'s • Which variables to include? • Theory, prior research, common sense • If you have no prior idea? • Statistical criteria: stepwise, forward and backward selection (only with metric data?) • Moderator effects ~ interaction variables • How to obtain estimates? • Least squares method of regression • Any straight line you fit will have some error • The objective is to minimize those errors, i.e. the sum of squared differences between $Y$ and predicted $Y$ (the sum of squared errors) • $Y = a + bX + e$ leads to $e = Y - a - bX$ • $e^2 = (Y - a - bX)^2$ ~ minimize $\sum e^2$
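The least-squares recipe can be written out directly for one predictor. This sketch uses the standard closed-form solution $b = \sum(X - \bar X)(Y - \bar Y) / \sum(X - \bar X)^2$, $a = \bar Y - b\bar X$ on five made-up data points, and then shows that nudging the line increases the sum of squared errors.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Closed-form least-squares intercept a and slope b for Y = a + b*X + e."""
    x_mean, y_mean = x.mean(), y.mean()
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    a = y_mean - b * x_mean
    return a, b

def sse(a, b, x, y):
    """Sum of squared errors: sum of e^2 where e = Y - a - b*X."""
    e = y - a - b * x
    return np.sum(e ** 2)

# Small illustrative data set (made up for this example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

a, b = fit_simple_ols(x, y)
print(a, b, sse(a, b, x, y))   # approximately 2.2, 0.6 and 2.4

# Any other line has a larger sum of squared errors
print(sse(a + 0.1, b, x, y) > sse(a, b, x, y))   # True
print(sse(a, b - 0.05, x, y) > sse(a, b, x, y))  # True
```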
Regression Analysis: Review • Interpretation of parameter estimates? • Intercept • mean of the dependent variable when all independent variables are zero • equals the mean of the dependent variable when all slopes are zero • not always meaningful • Slopes: • change in $Y$ for a one-unit change in $X$ • a zero slope means $X$ does not affect $Y$ • $b_1, b_2, \dots, b_k$: partial regression coefficients • e.g. $b_1$ = change in the value of $Y$ when $X_1$ changes by one unit while all other explanatory variables ($X_2, \dots, X_k$) are held constant
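To illustrate "holding the other variables constant", the sketch below (simulated data, assumed coefficients 1, 2 and 3, with $X_2$ deliberately correlated with $X_1$) raises $X_1$ by one unit in a fitted multiple regression and recovers exactly $b_1$. A simple regression on $X_1$ alone instead absorbs part of $X_2$'s effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)          # X2 correlated with X1 (assumed)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Prediction for one profile, then raise X1 by one unit holding X2 constant
profile = np.array([1.0, 0.5, -1.0])        # [intercept, X1, X2]
shifted = profile + np.array([0.0, 1.0, 0.0])
change = shifted @ b - profile @ b
print(change, b[1])                          # the change in prediction equals b1

# Regressing Y on X1 alone mixes in X2's effect (roughly 2 + 3*0.7 = 4.1)
b_simple, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x1]), y, rcond=None)
print(b_simple[1])
```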
Regression Analysis: Review • Interpretation of parameter estimates? • Size of the regression coefficient • depends on the scale (units) of the explanatory variable • so the size of a coefficient is not a good guide to which explanatory variable matters most • keep the scales of the independent variables within about a factor of 10 of each other • Beta coefficients, or standardized coefficients, • indicate relative importance • Elasticity: the percentage change in the dependent variable for a 1% change in an independent variable
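A sketch of standardized coefficients and elasticities, assuming hypothetical spending, income and age data. It uses the common definitions $\beta_j = b_j \cdot sd(X_j)/sd(Y)$ and elasticity at the means $b_j \cdot \bar X_j / \bar Y$ (other elasticity definitions exist), and shows that rescaling income changes the raw slope but not the standardized coefficient or the elasticity.

```python
import numpy as np

def fit(X, y):
    """OLS slopes (an intercept is added internally and then dropped)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return coef[1:]

def standardized(b, X, y):
    # beta_j = b_j * sd(X_j) / sd(Y): unit-free, so comparable across predictors
    return b * X.std(axis=0, ddof=1) / y.std(ddof=1)

def elasticity_at_means(b, X, y):
    # % change in Y for a 1 % change in X_j, evaluated at the sample means
    return b * X.mean(axis=0) / y.mean()

# Hypothetical data: spending driven by income (dollars) and age (years)
rng = np.random.default_rng(3)
n = 1000
income = rng.normal(50_000, 10_000, size=n)
age = rng.normal(40, 10, size=n)
spend = 200 + 0.02 * income + 5.0 * age + rng.normal(0, 100, size=n)

X = np.column_stack([income, age])
b = fit(X, spend)
print("raw slopes:", b)
print("standardized:", standardized(b, X, spend))
print("elasticities at means:", elasticity_at_means(b, X, spend))

# Re-express income in thousands of dollars: the raw slope grows 1000-fold,
# but the standardized coefficient and the elasticity do not change
X_k = np.column_stack([income / 1000, age])
b_k = fit(X_k, spend)
print("raw slopes (income in $000):", b_k)
```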
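Regression Analysis: Review • Is a regression coefficient significant? (t-test) • Is the regression significant? (F-test) • Overall goodness of fit? • $R^2 = ESS/TSS = 1 - RSS/TSS$ • $R$ ~ coefficient of multiple correlation • adjusted $R^2$ • [Figure: total sum of squares (TSS) split into the explained (ESS) and residual (RSS, error) parts around the fitted line $Y = b_0 + bX$]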
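The fit statistics above can be computed from the sums of squares. This sketch (the helper name `regression_summary` is hypothetical) returns $R^2 = ESS/TSS$, adjusted $R^2$, the t-statistic of each coefficient and the overall F statistic; p-values are omitted. The usage lines reuse five made-up data points.

```python
import numpy as np

def regression_summary(X, y):
    """OLS fit with R^2, adjusted R^2, coefficient t-statistics and overall F.
    X must not include the intercept column; one is added here."""
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    fitted = Xd @ b
    rss = np.sum((y - fitted) ** 2)          # residual (error) sum of squares
    tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
    ess = tss - rss                          # explained sum of squares
    r2 = ess / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    sigma2 = rss / (n - k - 1)               # estimate of the error variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))
    t = b / se                               # t-test of each coefficient (H0: b_j = 0)
    f = (ess / k) / (rss / (n - k - 1))      # F-test of the whole regression
    return {"b": b, "r2": r2, "adj_r2": adj_r2, "R": np.sqrt(r2), "t": t, "F": f}

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
s = regression_summary(x.reshape(-1, 1), y)
print(s["r2"], s["adj_r2"], s["F"])   # 0.6, about 0.467, 4.5
print(s["t"][1] ** 2)                 # equals F in a one-predictor regression
```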
Regression Analysis: Review • Detecting problems with the assumptions? • Heteroscedasticity • the error variance is not the same across observations • e.g. when the size of the errors is related to the dependent or an independent variable • e.g. savings (or consumption) are more stable for lower-income families; variances are larger for brand switchers than for brand-loyal customers • [Figure: variance of saving increasing with income] • Remedy? If we know the nature of the heteroscedasticity, we can use weighted least squares (WLS) • Volatility ~ Finance?
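A sketch of WLS under an assumed form of heteroscedasticity: the error standard deviation is taken to be proportional to income, so each observation is weighted by $1/\text{income}^2$. The data are simulated; if the assumed variance form is wrong, the weights will be too.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
income = rng.uniform(1, 10, size=n)
# Heteroscedastic errors: standard deviation grows with income (assumed form)
saving = 0.5 + 0.2 * income + rng.normal(scale=0.1 * income)

X = np.column_stack([np.ones(n), income])

# OLS ignores the changing variance
b_ols, *_ = np.linalg.lstsq(X, saving, rcond=None)

# WLS: with Var(e_i) proportional to income_i^2, weight observation i by 1/income_i^2.
# This equals OLS after multiplying each row of X and y by sqrt(weight).
w = 1.0 / income ** 2
sw = np.sqrt(w)
b_wls, *_ = np.linalg.lstsq(X * sw[:, None], saving * sw, rcond=None)
print(b_ols, b_wls)
```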
Regression Analysis: Review • Detecting problems with the assumptions? • Autocorrelation ~ mainly a time-series problem • errors of consecutive observations are correlated • Reasons? • Omitted variables • Model misspecification • Detection • Graphical methods • Durbin-Watson ~ $DW \approx 2(1 - r)$, where $r$ is the first-order autocorrelation of the residuals; DW ranges from 0 to 4 • the ideal value is 2 (no autocorrelation) • [Figure: plots of $e_t$ against $e_{t-1}$ for positive and negative autocorrelation] • Problem? • with positive autocorrelation, OLS overestimates the coefficient of determination and • underestimates the standard errors
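Computing the Durbin-Watson statistic from residuals takes one line. The sketch also checks the approximation $DW \approx 2(1 - r)$ on simulated AR(1) errors with an assumed autocorrelation of 0.7.

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2; lies between 0 and 4."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def lag1_autocorr(resid):
    """First-order autocorrelation r of the residuals."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(resid[1:] * resid[:-1]) / np.sum(resid ** 2)

print(durbin_watson([1, -1, 1, -1]))    # 3.0: alternating signs, negative autocorrelation

# Positively autocorrelated AR(1) errors: DW well below 2 and close to 2*(1 - r)
rng = np.random.default_rng(5)
e = np.zeros(500)
for t in range(1, 500):
    e[t] = 0.7 * e[t - 1] + rng.normal()
dw = durbin_watson(e)
r = lag1_autocorr(e)
print(dw, 2 * (1 - r))
```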
Regression Analysis: Review • Detecting problems with the assumptions? • Multicollinearity • very high correlations among the explanatory variables (this does not violate any assumption unless it is perfect) • Symptoms: the standard errors are likely to be high, so the estimates are not reliable • Detection • bivariate correlations • Variance Inflation Factor (VIF) ~ values above 10 signal a problem • Tolerance = 1/VIF • Remedies • drop variables • composite variables, e.g. family life cycle, social status • factor analysis • [Figure: overlap diagrams of Y, X1 and X2 with low and high collinearity between X1 and X2]
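A sketch of the VIF and tolerance computation: each explanatory variable is regressed on the others, and $VIF_j = 1/(1 - R_j^2)$. The data are simulated, with X2 built as a near copy of X1; the cut-off of 10 is the rule of thumb from the slide.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the others."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(6)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)      # nearly a copy of x1
x3 = rng.normal(size=n)                 # unrelated to the others
v = vif(np.column_stack([x1, x2, x3]))
tolerance = 1 / v
print(v, tolerance)                     # x1 and x2 far above 10; x3 close to 1
```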
Regression Analysis: Review • Detecting problems with the assumptions? • Linear in parameters • $Y = a + bX^2 + e$ ~ linear in parameters but non-linear in variables • $Y = a + b^2 X_1 + b X_2 + e$ ~ non-linear in parameters: non-linear regression • The regression model is correctly specified • Functional form, e.g. sales of a new consumer durable • Influential observations • outliers • is the result driven by one or a few observations?
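Because $Y = a + bX^2 + e$ is linear in $a$ and $b$, ordinary least squares still applies once $X^2$ is computed as a new variable. The sketch uses simulated data with assumed values $a = 1$ and $b = 0.8$. A model such as $Y = a + b^2 X_1 + b X_2 + e$ cannot be handled this way and needs non-linear least squares.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0, 5, size=300)
y = 1.0 + 0.8 * x ** 2 + rng.normal(scale=0.5, size=300)

# Y = a + b*X^2 + e is linear in a and b: regress Y on the constructed variable X^2
Z = np.column_stack([np.ones_like(x), x ** 2])
(a, b), *_ = np.linalg.lstsq(Z, y, rcond=None)
print(a, b)   # close to the assumed 1.0 and 0.8
```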
Regression Analysis: Review • Outliers: in linear regression, an outlier is an observation with a large residual, i.e. an unusual value of the dependent variable given the independent variables. • Leverage: an observation with an extreme value on an independent variable is called a high-leverage point. Leverage measures how far an independent variable deviates from its mean. High-leverage points can have an unusually large effect on the estimated regression coefficients. • Influence: an observation is influential if removing it substantially changes the coefficient estimates. • Detection • Residual check • Standardized residuals • Studentized residuals • Rule of thumb: an absolute value > 2 signals a potential problem
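A sketch of the detection step, assuming an OLS fit: leverage is the diagonal of the hat matrix $H = X(X'X)^{-1}X'$, and two common residual scalings are computed. Naming differs between textbooks and packages (some call $e_i/s$ the standardized residual), so the comments state exactly what each quantity is. The data are simulated, with one high-leverage point planted on the true line and one outlier in Y.

```python
import numpy as np

def diagnostics(X, y):
    """Leverage and residual diagnostics for an OLS fit (X includes the intercept column)."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
    h = np.diag(H)                                # leverage of each observation
    e = y - H @ y                                 # OLS residuals
    s2 = e @ e / (n - p)
    # internally studentized: e_i / (s * sqrt(1 - h_i))
    standardized = e / np.sqrt(s2 * (1 - h))
    # externally studentized: error variance re-estimated without observation i
    s2_i = (e @ e - e ** 2 / (1 - h)) / (n - p - 1)
    studentized = e / np.sqrt(s2_i * (1 - h))
    return h, standardized, studentized

rng = np.random.default_rng(8)
x = rng.normal(size=50)
y = 2 + x + rng.normal(scale=0.5, size=50)
x[0], y[0] = 6.0, 8.0      # extreme X on the true line: high leverage, small residual
y[1] += 4.0                # large residual: outlier in Y
X = np.column_stack([np.ones(50), x])

h, r_std, r_stud = diagnostics(X, y)
print(np.argmax(h))                          # 0: the high-leverage point
print(np.where(np.abs(r_stud) > 2)[0])       # flagged residuals; includes 1
```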
Regression Analysis: Review • Transformation of variables • so that the dependent variable is closer to normally distributed, with constant variance, etc. • e.g. GNP per capita, log(price) • Retransformation? • Forecasting • model fit versus forecasting accuracy • the independent variables must themselves be forecast • Model selection / comparing models • adjusted R-squared • Model validation • Cross-validation • Jackknife validation
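Jackknife (leave-one-out) validation can be sketched by refitting the model n times, each time predicting the omitted observation. For OLS the same prediction errors follow without refitting from $e_i/(1 - h_{ii})$, which the sketch uses as a check. The data are simulated; the sum of squared leave-one-out errors is often called PRESS.

```python
import numpy as np

def loo_prediction_errors(X, y):
    """Leave-one-out prediction errors: refit without obs i, then predict obs i."""
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errs[i] = y[i] - X[i] @ b
    return errs

rng = np.random.default_rng(9)
n = 40
x = rng.normal(size=n)
y = 1 + 0.5 * x + rng.normal(scale=0.3, size=n)
X = np.column_stack([np.ones(n), x])

errs = loo_prediction_errors(X, y)
press = np.sum(errs ** 2)      # out-of-sample sum of squared prediction errors

# OLS shortcut: the leave-one-out error equals e_i / (1 - h_ii)
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
print(press, np.sum(e ** 2))   # PRESS exceeds the in-sample residual sum of squares
```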
Regression Analysis: Limitations • Nominal independent variables ~ dummy variable regression • gender, income groups, ethnicity, region, race etc. • Measurement error ~ structural equation models • $X_{true} = X_{obs} + e_x$ • $Y = b_0 + b_1 X_{true} + e_Y$ • $Y = b_0 + b_1 (X_{obs} + e_x) + e_Y$ • $Y = b_0 + b_1 X_{obs} + (b_1 e_x + e_Y)$ • the error term $b_1 e_x + e_Y$ is correlated with the observed x-variable ~ this violates a regression assumption
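A small simulation of the problem, assuming classical measurement error: the observed X is the true X plus independent noise, so $e_x$ above equals minus that noise and is correlated with $X_{obs}$. The OLS slope then shrinks toward zero by the reliability ratio $Var(X_{true})/Var(X_{obs})$. All values are made up.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 20000
b0, b1 = 1.0, 2.0
x_true = rng.normal(size=n)                       # Var(x_true) = 1
x_obs = x_true + rng.normal(scale=0.5, size=n)    # classical error, variance 0.25
y = b0 + b1 * x_true + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x_obs])
slope = np.linalg.lstsq(X, y, rcond=None)[0][1]

reliability = 1.0 / (1.0 + 0.25)                  # Var(x_true) / Var(x_obs)
print(slope, b1 * reliability)                    # both near 1.6: slope biased toward zero
```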
Regression Analysis: Limitations • Limited dependent variable • Censored dependent variable ~ lots of zeros ~ Tobit regression • expenditures on home buying • demand in a supply-restricted situation • vacation expenditures • Truncated dependent variable ~ duration analysis, available in LIMDEP • interpurchase times • duration of unemployment • [Figure: Y (e.g. housing expenditure) plotted against X (e.g. income)]
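A sketch of Tobit estimation by maximum likelihood for a dependent variable censored at zero, using `scipy.optimize.minimize` and `scipy.stats.norm`. The simulated data and the censoring point of zero are assumptions; dedicated software such as LIMDEP also reports standard errors, which this sketch does not. OLS on the same censored data is shown for comparison.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_negloglik(params, X, y):
    """Negative log-likelihood of a Tobit model left-censored at zero.
    params = [coefficients..., log(sigma)]."""
    beta, sigma = params[:-1], np.exp(params[-1])
    xb = X @ beta
    censored = y <= 0
    ll = np.where(
        censored,
        norm.logcdf(-xb / sigma),                       # P(latent y* <= 0)
        norm.logpdf((y - xb) / sigma) - np.log(sigma),  # density of an observed y > 0
    )
    return -np.sum(ll)

# Simulated data: latent spending y* = -2 + 1*income + e, observed y = max(y*, 0)
rng = np.random.default_rng(11)
n = 3000
income = rng.uniform(0, 4, size=n)
y_star = -2.0 + 1.0 * income + rng.normal(size=n)
y = np.maximum(y_star, 0.0)                      # many zeros
X = np.column_stack([np.ones(n), income])

res = minimize(tobit_negloglik, np.zeros(3), args=(X, y), method="BFGS")
b_tobit = res.x[:2]
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(b_tobit, np.exp(res.x[2]), b_ols)          # OLS slope is pulled toward zero
```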
Regression with Categorical Explanatory Variables • Some modeling problems • Is gender important in determining the level of expenditure on medical expenses? • Do Nescafe’s supermarket coffee sales vary by state? • How would you model the impact of local crime on housing prices if the crime rate were rated none, moderate or high? • How do I include income as a determinant of cigarette demand when data have only been collected by income class? • Examples • Medical expenditure = intercept + b1 * Gender + b2 * Age group + error • Sales = intercept + b1 * Province + error
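For the crime example, a categorical variable with k levels enters the regression as k - 1 dummy variables. The sketch below uses a hypothetical helper `dummy_code` and made-up house prices; the intercept becomes the mean price of the reference level ("none") and each dummy coefficient is a difference from that mean.

```python
import numpy as np

def dummy_code(values, levels, reference):
    """k-1 dummy (0/1) columns for a categorical variable with k levels.
    The reference level is left out: all k dummies plus an intercept would be
    perfectly collinear (the 'dummy variable trap')."""
    values = np.asarray(values)
    kept = [lvl for lvl in levels if lvl != reference]
    D = np.column_stack([(values == lvl).astype(float) for lvl in kept])
    return D, kept

# Hypothetical data: crime rating and house price (in $000) for six houses
crime = np.array(["none", "moderate", "high", "none", "high", "moderate"])
price = np.array([300.0, 270.0, 220.0, 310.0, 230.0, 260.0])

D, names = dummy_code(crime, levels=["none", "moderate", "high"], reference="none")
X = np.column_stack([np.ones(len(price)), D])
b, *_ = np.linalg.lstsq(X, price, rcond=None)
print(names)  # ['moderate', 'high']
print(b)      # approximately [305, -40, -80]: the 'none' mean, then differences from it
```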
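Midterm exam scores by sex • Average score of female and male students • Interpretation of regression coefficients: binary (0/1) coding • the intercept is the mean score of the group coded 0 and the slope is the difference between the two group means • [Figure: midterm scores of male and female students]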
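Midterm exam scores by sex • Average score of female and male students • Interpretation of regression coefficients: effect (-1/+1) coding • the intercept is the average of the two group means and the slope is the deviation of the group coded +1 from that average • [Figure: midterm scores of male and female students] • Note: we are not estimating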
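The two codings can be compared on made-up scores (not the values from the slide, which did not survive extraction). With binary coding the intercept is the mean of the group coded 0; with effect coding it is the average of the two group means, which differs from the overall sample mean when the groups are unbalanced. Which sex is coded 1 is an arbitrary choice here.

```python
import numpy as np

# Hypothetical midterm scores
female = np.array([70.0, 74.0, 78.0])   # mean 74
male = np.array([60.0, 66.0])           # mean 63
score = np.concatenate([female, male])
is_male = np.array([0, 0, 0, 1, 1], dtype=float)

def fit(code):
    """OLS of score on an intercept and one coded group variable."""
    X = np.column_stack([np.ones(len(score)), code])
    coef, *_ = np.linalg.lstsq(X, score, rcond=None)
    return coef

b_binary = fit(is_male)          # binary coding: male = 1, female = 0
b_effect = fit(2 * is_male - 1)  # effect coding: male = +1, female = -1

print(b_binary)      # approx [74, -11]: female mean, male minus female
print(b_effect)      # approx [68.5, -5.5]: average of group means, male deviation from it
print(score.mean())  # 69.6: the overall mean, not the effect-coding intercept
```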
Regression Analysis: Non-Linear Regression • Example: sales and price dynamics of a new product • [Figure: first-purchase sales and price of a new product plotted over time]
Pooled Regression: Fixed Effect and Random Effect models • Panel data ~ cross-sectional time-series data • observations on n individuals (or countries, firms etc.), each measured at T points in time (T can differ across units) • observations are not independent • use the panel structure to get better parameter estimates • control for fixed or random individual differences • Example of data setup… • Software: LIMDEP (also SAS…) • Example: cross-sectional survey ~ 50% female participation in the labor force?
Pooled Regression: Fixed Effect and Random Effect models • Fixed Effect ~ individual intercepts differ: each unit's regression line is shifted by a “fixed” amount, $Y_{it} = \alpha_i + \beta X_{it} + e_{it}$ • Random Effect ~ individual differences are random rather than fixed: each unit's intercept is the mean intercept plus a random error, $Y_{it} = \alpha + \beta X_{it} + u_i + e_{it}$ • $u_i$ ~ unobserved heterogeneity that is stable over time • in the random effect model, $u_i$ is assumed to be uncorrelated with the X's
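A sketch of the fixed-effect (within) estimator on a simulated long-format panel, assuming the individual effect $u_i$ is correlated with X. Subtracting each unit's time-average removes $u_i$, whereas pooled OLS is biased. The random-effect (GLS) estimator is not shown.

```python
import numpy as np

rng = np.random.default_rng(12)
n_units, T = 200, 5
unit = np.repeat(np.arange(n_units), T)           # long format: one row per unit-period
u = rng.normal(size=n_units)                      # unobserved, time-invariant heterogeneity
x = 0.8 * u[unit] + rng.normal(size=n_units * T)  # X correlated with u_i (assumed)
y = 1.0 + 2.0 * x + u[unit] + rng.normal(size=n_units * T)

# Pooled OLS ignores u_i and is biased because u_i is correlated with X
Xp = np.column_stack([np.ones_like(x), x])
b_pooled = np.linalg.lstsq(Xp, y, rcond=None)[0][1]

def demean_by(v, groups):
    """Subtract each unit's own time-average (the 'within' transformation)."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]

# Fixed-effect (within) estimator: the individual intercepts drop out after demeaning
xd, yd = demean_by(x, unit), demean_by(y, unit)
b_fe = (xd @ yd) / (xd @ xd)
print(b_pooled, b_fe)   # pooled slope noticeably above 2; fixed-effect slope near 2
```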
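Pooled Regression: Fixed Effect and Random Effect models • The Hausman test: • model selection ~ fixed effect vs random effect • H0: random effects estimates are consistent and efficient, versus • H1: random effects estimates are inconsistent (fixed effects estimates remain consistent) • Chi-square test statistic: $H = (b_{FE} - b_{RE})' [\widehat{Var}(b_{FE}) - \widehat{Var}(b_{RE})]^{-1} (b_{FE} - b_{RE})$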
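A sketch of the Hausman statistic given fixed- and random-effect estimates and their covariance matrices, using `scipy.stats.chi2` for the p-value. The numbers in the usage line are hypothetical. In finite samples the difference of covariance matrices may fail to be positive definite, in which case this simple version is unreliable.

```python
import numpy as np
from scipy.stats import chi2

def hausman(b_fe, b_re, cov_fe, cov_re):
    """H = (b_FE - b_RE)' [Var(b_FE) - Var(b_RE)]^(-1) (b_FE - b_RE), chi-square with k df."""
    d = np.atleast_1d(np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float))
    V = np.atleast_2d(np.asarray(cov_fe, dtype=float) - np.asarray(cov_re, dtype=float))
    stat = float(d @ np.linalg.solve(V, d))
    df = d.size
    return stat, df, chi2.sf(stat, df)

# Hypothetical estimates for a single slope coefficient (not from any real data)
stat, df, p = hausman(0.50, 0.40, 0.004, 0.003)
print(stat, df, p)   # stat about 10 with 1 df, p below 0.01: reject H0, prefer fixed effects
```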