Lecture #4 Applied Economics for business management
Lecture outline: • Review • Go over Homework Set #4 • Introduction to empirical analysis
4-Step general plan for empirical analysis: 1. Define or specify the problem. 2. Formulate a model that is oriented toward solving the stated problem. This involves 2 parts: (i) specify the economic model (according to economic theory and industry/firm/household characteristics); (ii) define the empirical model by specifying the equation(s) to be estimated.
4-Step general plan for empirical analysis (continued): 3. Obtain data for the variables specified in the model and estimate the equations of the stated model. 4. Evaluate and use the estimated results.
Empirical analysis Why do we need a model? The purpose of having a model is to explain how a relationship or system works.
Empirical Analysis A model can be simple or complex. A simple model might contain a single relationship, for example, an estimated demand function for bananas, i.e., a single-equation model. More complex models often contain multiple relationships, for example, demand functions for several fruits (bananas, apples, pears, etc.), i.e., a simultaneous-equations model.
Data Collection Estimation of empirical models requires data. There are 2 types of data: (i) primary data (ii) secondary data
Secondary data Secondary data refer to already published data which are available from different sources. Examples of secondary data sources include the FAO, U.S. Census of Agriculture, Statistics of Hawaii Agriculture, etc. With secondary data, someone else has performed the task of collecting and verifying the data.
Primary Data However, with many research problems, readily available secondary data may not exist. In this case, the student or researcher must collect primary data. A popular means of collecting primary data is through survey questionnaires (personal interviews, mail surveys, and telephone surveys).
Primary Data - Surveys An important advantage of surveys is that the researcher can develop them to collect data that are specific to the research question. However, collection of research data using surveys is expensive and necessitates careful attention to question design and presentation to avoid biasing responses. There are a number of textbooks on surveying, e.g., J. Converse and S. Presser (1986), Survey Questions: Handcrafting the Standardized Questionnaire (Newbury Park, CA: Sage), and R. Singleton and B. Straits (1999), Approaches to Social Research (New York: Oxford University Press).
Sampling Working with data will most likely involve sampling (method of drawing samples). A sample is a set of observations chosen from the population for acquiring information about the population. The word population has a very specific meaning in statistics: it is the total collection of observations or objects to be studied.
Sampling There are several types of sampling procedures, e.g., random sampling, stratified random sampling, cluster sampling, etc. The goal of sampling is to obtain sample results which are representative of the population.
Example Suppose that you’re interested in estimating the demand for some food item in Albania. Would a sample from people or households in Tirana be a representative sample? Probably not … since you’re not capturing the rural demand response.
Sampling How large a sample should you use?
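The deck leaves the question open at this point. As a rough illustration (my own sketch, not from the lecture), a common rule of thumb for estimating a population mean to within a margin of error E at a given confidence level is n = (zσ/E)², assuming the population standard deviation σ is approximately known:

```python
# Sketch (illustration only): sample size needed to estimate a population
# mean within margin of error E at a given confidence level,
# n = (z * sigma / E)^2, rounded up.  sigma is an assumed value.
import math
from scipy.stats import norm

def sample_size(sigma, margin, confidence=0.95):
    z = norm.ppf(1 - (1 - confidence) / 2)    # two-tailed critical value
    return math.ceil((z * sigma / margin) ** 2)

# e.g., assumed sd of weekly food spending = $40, want +/- $5 at 95%
print(sample_size(sigma=40, margin=5))        # -> 246
```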
Fitting Lines To obtain a specific relationship between 2 variables, we often plot data on a graph. Suppose that we have the following function relating the two variables: y = f(x), observed as a scatter of data points.
Fitting Lines We could derive a more specific relationship by drawing a line through these points that shows a more explicit relationship between y and x.
Fitting Lines How do we draw this line? What is the criterion for choosing the best line?
Regression The criterion often used is to estimate a line that minimizes the sum of squared dispersions or squared errors of points on this line from the actual data points. The tool to do this is called regression. In the case of two variables (one dependent variable and one independent (or explanatory) variable), the line fitting procedure is called simple regression analysis.
Regression Multiple regression is the term used for describing regression analysis with 2 or more explanatory variables. What is a dispersion or error?
Regression Use the following graph: [scatter of data points with a fitted regression line; point A is an observed data point lying off the line]
So the dispersion or error of the regression line from the actual value for point A is: $e_A = y_A - \hat{y}_A$; likewise, $e_i = y_i - \hat{y}_i$, where “^” implies estimated value.
Squared Errors Why do we square these errors or residuals? Because positive errors can cancel negative errors, so just adding up the errors or dispersions is not an adequate method. Rather, regression minimizes the sum of squared errors: $SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
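As a sketch of how this works in practice (my own example with made-up numbers, not from the lecture), the simple-regression line $\hat{y} = a + bx$ that minimizes SSE has a closed-form solution:

```python
# Simple regression by the closed-form least squares formulas:
# b = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),  a = ybar - b * xbar
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # hypothetical data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

residuals = y - (a + b * x)                 # e_i = y_i - y_hat_i
sse = np.sum(residuals ** 2)                # sum of squared errors
print(f"a = {a:.3f}, b = {b:.3f}, SSE = {sse:.4f}")
```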
OLS Models Assumptions for least squares regression models (or OLS models): Given $y = Xb + \varepsilon$, where $y$ is an $n \times 1$ vector of values for the dependent variable, $X$ is an $n \times (m+1)$ matrix for the $m$ explanatory variables plus a constant term, $b$ is an $(m+1) \times 1$ vector of coefficient values for the explanatory variables and constant, and $\varepsilon$ is an $n \times 1$ vector of random errors.
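The slide’s worked example is not recoverable here; as a minimal sketch of the matrix setup, assuming two explanatory variables and five made-up observations:

```python
# Matrix form y = Xb + e: X has a column of ones (the constant term)
# plus one column per explanatory variable.  All numbers are made up.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([3.1, 4.0, 7.9, 8.2, 11.1])

X = np.column_stack([np.ones_like(x1), x1, x2])
print(X.shape)    # (5, 3): n = 5 observations, m + 1 = 3 columns
```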
Assumptions for OLS: 1. The $y$ observations are linear functions of the $X$ observations: $y = Xb + \varepsilon$. 2. $E(\varepsilon) = 0$, or the expected value of the error term is zero. 3. $Var(\varepsilon) = \sigma^2 I$, or the variance of the error term is a constant. Note: $I$ is the identity matrix.
Assumptions for OLS (continued): For constant variance of the error terms (assumption 3), we say that the residuals or error terms are homoscedastic. For varying variances of error terms, we say the residuals are heteroscedastic.
Assumptions for OLS (continued): Consider the covariances of the error terms: $Cov(\varepsilon_i, \varepsilon_j) = E(\varepsilon_i \varepsilon_j) = 0$ for $i \neq j$. Here we say the residuals or error terms are pairwise uncorrelated in the particular sample. If pairwise correlated, then we have the problem of autocorrelation or serial correlation.
Assumptions for OLS (continued): 4. $X$ is a matrix of fixed numbers. This assumption states that the exogenous variables are nonstochastic or fixed in repeated samples. 5. No exact linear relationship exists among the exogenous variables.
Assumptions for OLS: Assumption 5 concerns linear dependence or independence among the explanatory variables. An important implication of this assumption is that there may be some correlation (but not an exact linear relationship) among explanatory variables. Strong correlation among explanatory variables is the problem called multicollinearity. According to the Gauss-Markov Theorem, within the class of linear, unbiased estimators, the least squares estimator has minimum variance.
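A common diagnostic for multicollinearity (my addition, not covered in the slides) is the variance inflation factor, $VIF_k = 1/(1 - R_k^2)$, where $R_k^2$ comes from regressing explanatory variable k on the other explanatory variables:

```python
# Variance inflation factors: VIF_k = 1 / (1 - R_k^2), where R_k^2 is the
# R-squared from regressing column k of X on the remaining columns.
import numpy as np

def vif(X):
    """X: n x m array of explanatory variables (no constant column)."""
    n, m = X.shape
    factors = []
    for k in range(m):
        A = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        coef, *_ = np.linalg.lstsq(A, X[:, k], rcond=None)
        resid = X[:, k] - A @ coef
        r2 = 1 - resid @ resid / np.sum((X[:, k] - X[:, k].mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return factors    # values well above ~10 suggest multicollinearity

# usage with the made-up data above: print(vif(np.column_stack([x1, x2])))
```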
Venn Diagram [figure omitted] (See Wonnacott and Wonnacott, Econometrics.)
Venn Diagram Therefore, the least squares estimator is often referred to as BLUE or best, linear, unbiased estimator.
Least Squares Estimator Properties of Least Squares Estimators: BLUE involves 3 concepts: best, linear and unbiased. (i) linear: the estimator $\hat{b} = (X'X)^{-1}X'y$ is linear in terms of the $y$ observations; (ii) unbiasedness: $E(\hat{b}) = b$, where $b$ is the true population parameter; (iii) best: there is no other linear and unbiased estimator that has a smaller variance than the least squares estimator.
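A sketch of computing the estimator, reusing the made-up X and y from the matrix-form sketch above:

```python
# Least squares estimator b_hat = (X'X)^{-1} X'y.  Solving the normal
# equations (X'X) b = X'y is numerically safer than explicit inversion.
import numpy as np

b_hat = np.linalg.solve(X.T @ X, X.T @ y)   # X, y as defined above
print(b_hat)                                # [constant, b1, b2]
```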
Statistics Statistics often used to help evaluate explanatory power and hypothesis testing: 1. $R^2$ (coefficient of determination): Total variation (TV) = Explained variation (EV) + Unexplained variation (UV) (in $y$), and $R^2 = EV/TV$. Note: EV is the variation explained by the estimated equation or model.
Statistics For example, an $R^2$ of 0.70 means the regression equation explains 70% of the variation in the dependent variable (y). Adjusted $R^2$ ($\bar{R}^2$) corrects $R^2$ for the number of explanatory variables and is a more reliable measure of fit.
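Continuing the same made-up example, the variation decomposition and both fit measures can be computed directly:

```python
# TV = EV + UV decomposition, R^2 = EV/TV, and adjusted R^2
# (continues the sketch above: uses X, y, b_hat; m = 2 regressors).
import numpy as np

y_hat = X @ b_hat
TV = np.sum((y - y.mean()) ** 2)      # total variation in y
UV = np.sum((y - y_hat) ** 2)         # unexplained variation (SSE)
EV = TV - UV                          # explained variation
n, m = len(y), 2
r2 = EV / TV
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - m - 1)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```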
Statistics Statistics often used to help evaluate explanatory power and hypothesis testing: 2. t-statistic (Student’s t-statistic): This statistic measures the statistical significance of individual independent or explanatory variables.
Statistics The t-statistic for a coefficient can be calculated as the estimated coefficient divided by its standard error: $t = \hat{b}_k / se(\hat{b}_k)$.
Statistics So if t is statistically significant, reject the null hypothesis $H_0: b_k = 0$ and accept the alternative hypothesis that $b_k \neq 0$. Most statistics and econometrics textbooks have t-tables to determine critical t-values depending on n and m.
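Continuing the sketch, the standard errors come from the diagonal of $s^2(X'X)^{-1}$ with $s^2 = UV/(n-m-1)$, and each t-statistic is the coefficient over its standard error:

```python
# t-statistics for each coefficient (continues the sketch above:
# uses X, b_hat, UV, n, m).  Compare |t| with the critical value.
import numpy as np
from scipy import stats

dof = n - m - 1                              # degrees of freedom
s2 = UV / dof                                # estimated error variance
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
t = b_hat / se                               # coefficient / standard error
t_crit = stats.t.ppf(0.975, dof)             # two-tailed 5% critical value
print(t, t_crit)                             # |t| > t_crit => significant
```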
Statistics Statistics often used to help evaluate explanatory power and hypothesis testing: 3. F-statistic: measures the statistical significance of the entire regression equation. Most statistics and econometrics books provide an F-statistic table so that one can find critical F-values for a given confidence level.
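In the same sketch, the overall F-statistic is the ratio of explained to unexplained variation, each divided by its degrees of freedom:

```python
# F-statistic for overall significance: F = (EV/m) / (UV/(n-m-1))
# (continues the sketch above: uses EV, UV, m, dof).
from scipy import stats

F = (EV / m) / (UV / dof)
F_crit = stats.f.ppf(0.95, m, dof)    # 5% critical value, (m, n-m-1) df
print(F, F_crit)                      # F > F_crit => equation significant
```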
Statistics Statistics often used to help evaluate explanatory power and hypothesis testing: 4. Durbin-Watson (DW) statistic.
Statistics Durbin-Watson (DW) statistic: Often cited in time series studies to test for autocorrelation; it is computed from the residuals as $DW = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. A DW statistic of ≈ 2.0 indicates no autocorrelation. In journals, you may find estimated equations reported with figures in parentheses beneath the coefficient estimates (omitted here), such as: (5.44)¹ (0.53) (27.88) ¹Figures in parentheses are standard errors.
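A short sketch of the computation, using a made-up residual series:

```python
# Durbin-Watson statistic: DW = sum((e_t - e_{t-1})^2) / sum(e_t^2).
# A value near 2.0 indicates no (first-order) autocorrelation.
import numpy as np

def durbin_watson(residuals):
    e = np.asarray(residuals)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

print(durbin_watson([0.5, -0.3, 0.2, -0.4, 0.1]))   # made-up residuals
```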
Statistics Are the variables statistically significant at the 95% confidence level? Dividing each coefficient estimate (omitted above) by its standard error, (5.44)¹ (0.53) (27.88), gives the t-statistics; one of the variables turns out to be statistically insignificant at the 95% level of confidence.
Common Regression Problems • Autocorrelation or serial correlation • Heteroscedasticity • Multicollinearity