Econ 488 Lecture 2 Cameron Kaplan
Hypothesis Testing • Suppose you want to test whether the average person receives a B or higher (3.0) in econometrics. • The Null Hypothesis (H0): Usually trying to reject this: • H0: µ =3.0
Hypothesis Testing • Alternative Hypothesis (HA or H1): The null hypothesis is not true • HA: µ ≠ 3.0 (two-sided) • Or HA: µ > 3.0 (one-sided) • Usually we pick the two-sided test unless we can rule out the possibility that µ < 3.0
Hypothesis Testing • Suppose we take a sample of 20 former econometrics students and find: • Sample Mean = 3.30 • Standard Deviation = 0.25 • How likely is it that a sample of 20 would give a sample average of 3.30 if the population average was really 3.0?
Hypothesis Testing • Because the standard error of x-bar is estimated from the sample, we need to use the t-distribution
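Hypothesis Testing • Test Statistic: t = (x-bar − µ0) / (s / √N), where µ0 is the value of µ under the null hypothesis, s is the sample standard deviation, and N is the sample size • Significance Level - Most common is 5% or 1%.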
5% significance level • If µ really was 3.0, what values of t would give us a test that would reject the null when it’s correct only 5% of the time?
Hypothesis Testing • We have a sample size of 20 • Thus we have N-1 = 20-1 = 19 degrees of freedom. • Look in t-table • t* = 2.093 • So if our value of t is greater than 2.093 OR less than -2.093, we should reject the null hypothesis
Hypothesis Testing • t = (3.30 − 3.0) / (0.25 / √20) = 5.366 • 5.366 > 2.093 • So, we should reject the null
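The test above can be checked with a few lines of arithmetic. This is a minimal sketch using the sample figures from the slides; the critical value 2.093 is copied from the t-table lookup rather than computed, and the variable names are illustrative.

```python
import math

# Sample figures from the example above.
sample_mean = 3.30
sample_sd = 0.25
n = 20
mu_null = 3.0  # value of mu under H0

# Critical value for a two-sided 5% test with 19 degrees of freedom,
# copied from the t-table value quoted in the slides (not computed here).
t_critical = 2.093

standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - mu_null) / standard_error
reject_null = abs(t_stat) > t_critical

print(f"standard error = {standard_error:.4f}")
print(f"t = {t_stat:.3f}, reject H0: {reject_null}")
```

Comparing abs(t_stat) with the critical value covers both rejection regions of the two-sided test (t greater than 2.093 or less than -2.093).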
P-value • Suppose we want to know: if the average student really got a 3.0, how likely would it be for us to observe a value at least as far from 3.0 as we did in our sample? • In other words, if µ = 3.0, how likely is it that a sample of 20 would give a sample mean of 3.3 or greater (or 2.7 or less)?
P-value • We want to know the probability that |t| > 5.366 (t > 5.366 or t < -5.366) • Can’t look up in most tables, but most stats software gives it to you. • In this case, p = 0.000035 • In other words, if the null were true, we would only get a value that extreme 0.0035% of the time (about 1 out of 29,000 times) • This is strong evidence that we should reject the null.
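Statistical software normally reports the p-value directly. As a rough sketch of what it computes, the two-sided p-value can be approximated with only the Python standard library by numerically integrating the t density. The helper names t_pdf and two_sided_p_value are hypothetical, and Simpson's rule is just one convenient way to do the integration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=20000):
    """P(|T| >= |t|) via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        weight = 4 if k % 2 == 1 else 2
        total += weight * t_pdf(k * h, df)
    central_half = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * central_half   # both tails, by symmetry

t_stat = (3.30 - 3.0) / (0.25 / math.sqrt(20))
p_value = two_sided_p_value(t_stat, df=19)
print(f"p = {p_value:.6f}")
```

As a sanity check, the same function evaluated at the table value t* = 2.093 with 19 degrees of freedom should return roughly 0.05.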
P-value • If the p-value is smaller than the significance level, reject the null. • The p-value is nice, because if you are given a p-value, you don’t have to look anything else up in a table. • Smaller p-values mean the data are less consistent with the null hypothesis, so the evidence against it is stronger.
Bias • A biased sample is a sample that differs systematically from the population.
Common Types of Bias • Selection Bias • Sample systematically excludes or underrepresents certain groups. • e.g. calculating the average height of US men using data from Medicare records • We are systematically excluding the young, who may be different for many reasons.
Common Types of Bias • Self-Selection Bias/Non-Response Bias • Bias that occurs when people choose whether to provide information. • e.g. ads to participate in medical studies • e.g. calculating average CSUCI GPA by asking students to volunteer to let us look at their transcripts.
Common Types of Bias • Survivor Bias • Suppose we are looking at the historical average performance of companies on the NYSE, and wanted to know how that was related to CEO pay. • One problem that we might have is that we might only look at companies that are still around. • We are excluding companies that went out of business.
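A small simulation can make survivor bias concrete. Everything in this sketch is hypothetical: the returns are randomly generated, and the rule that a company with a return below -15% goes out of business is an assumption made only for illustration.

```python
import random

random.seed(488)  # fixed seed so the sketch is reproducible

# Hypothetical annual returns (in percent) for 1,000 made-up companies.
all_returns = [random.gauss(5.0, 20.0) for _ in range(1000)]

# Assumption for illustration: a company with a return below -15% goes out
# of business and therefore drops out of any data set collected today.
survivor_returns = [r for r in all_returns if r >= -15.0]

def mean(values):
    return sum(values) / len(values)

mean_all = mean(all_returns)
mean_survivors = mean(survivor_returns)

print(f"all companies:  {mean_all:.2f}%")
print(f"survivors only: {mean_survivors:.2f}%")
```

Because the excluded companies are exactly the worst performers, the average computed from survivors alone overstates the average for all companies.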
Review of Regression • Regression - Attempt to explain movement in one variable as a function of a set of other variables • Example: Are higher campaign expenditures related to more votes in an election?
Review of Regression • Dependent Variable - Variable that is observed to change in response to the independent variable • e.g. share of votes in the election • Independent Variable(s) (AKA explanatory variable) - variables that are used to explain variation in dependent variable. • e.g. campaign expenditures.
Review of Regression • Example: Demand • Quantity is the dependent variable • Price, Income, Price of Complements, and Price of Substitutes are all independent variables.
Simple Regression • Y = β0 + β1X • Y: Dependent Variable • X: Independent Variable • β0: Intercept (or Constant) • β1: Slope Coefficient
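Simple Regression • [Figure: graph of the line with intercept β0 and slope β1; Y on the vertical axis, X on the horizontal axis]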
Simple Regression • β1 is the response of Y to a one unit increase in X • β1 = ΔY/ΔX • When we look at real data, the points aren’t all on the line
Simple Regression • How do we deal with this? • By adding a stochastic error term to the equation. • Y = β0 + β1X + ε • Deterministic Component: β0 + β1X • Stochastic Component: ε
Simple Regression • [Figure: the line β0 + β1X; Y on the vertical axis, X on the horizontal axis]
Why do we need ε? • Omitted Variables • Measurement Error • The underlying relationship may have a different functional form • Human behavior is random
Notation • There are really N equations because there are N observations. • Yi = β0 + β1Xi + εi (i=1,2,…,N) • E.g. • Y1 = β0 + β1X1 + ε1 • Y2 = β0 + β1X2 + ε2 • … • YN = β0 + β1XN + εN
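The N equations can be written out in code to show how each observation splits into a deterministic part and a stochastic part. This is a sketch with made-up parameter values (beta0, beta1, the error spread, and N = 5 are all assumptions for illustration, not estimates from any data).

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical parameter values, chosen only for illustration.
beta0 = 1.0  # intercept
beta1 = 0.5  # slope coefficient
N = 5        # number of observations

X = [float(i) for i in range(1, N + 1)]
epsilon = [random.gauss(0.0, 0.3) for _ in range(N)]  # stochastic error terms

deterministic = [beta0 + beta1 * x for x in X]
Y = [d + e for d, e in zip(deterministic, epsilon)]

# One equation per observation: Yi = beta0 + beta1*Xi + epsilon_i
for i in range(N):
    print(f"Y{i+1} = {beta0} + {beta1}*{X[i]} + ({epsilon[i]:+.3f}) = {Y[i]:.3f}")

# On the line itself, a one-unit increase in X changes Y by exactly beta1.
slope_on_line = (deterministic[1] - deterministic[0]) / (X[1] - X[0])
```

The observed Y values scatter around the line because of epsilon, while slope_on_line recovers beta1 from the deterministic component alone.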
Multiple Regression • We can have more than one independent variable • Yi = β0 + β1X1i + β2X2i + β3X3i + εi • What does β1 mean? • It is the impact of a one unit increase in X1 on the dependent variable (Y), holding X2 and X3 constant.
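The "holding X2 and X3 constant" interpretation can be checked numerically. In this sketch the coefficient values and the function name predicted_y are hypothetical; only the deterministic part of the equation is evaluated.

```python
def predicted_y(x1, x2, x3, betas):
    """Deterministic part of Y = b0 + b1*X1 + b2*X2 + b3*X3 (error term omitted)."""
    b0, b1, b2, b3 = betas
    return b0 + b1 * x1 + b2 * x2 + b3 * x3

# Hypothetical coefficients (b0, b1, b2, b3), for illustration only.
betas = (2.0, 0.75, -1.5, 0.25)

base = predicted_y(x1=4.0, x2=10.0, x3=3.0, betas=betas)
# Raise X1 by one unit while X2 and X3 stay at the same values.
bumped = predicted_y(x1=5.0, x2=10.0, x3=3.0, betas=betas)

effect_of_x1 = bumped - base
print(effect_of_x1)
```

The difference equals the coefficient on X1 regardless of the values at which X2 and X3 are held.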
Steps in Empirical Economic Analysis • Specify an economic model. • Specify an econometric model. • Gather data. • Analyze data according to econometric model. • Draw conclusions about your economic model.
Step 1: Specify an Economic Model • Example: An Economic Model of Crime • Gary Becker • Crimes have clear economic rewards (think of a thief), but most criminal behavior also has economic costs. • Opportunity cost: time spent on crime cannot be spent on other activities such as legal employment. • In addition, there are costs associated with the possibility of being caught, and then, if convicted, there are costs associated with being incarcerated.
Economic Model of Crime • y=f(x1, x2, x3, x4, x5, x6, x7) • y=hours spent in criminal activity • x1=“wage” for an hour spent in criminal activity • x2=hourly wage in legal employment • x3=income from sources other than crime/employment • x4=probability of getting caught • x5=probability of being convicted if caught • x6=expected sentence if convicted • x7=age
Economic Model of Education • What is the effect of education on wages? • wage=f(educ,exper,tenure) • educ=years of education • exper=years of workforce experience • tenure=years at current job
Step 2: Specify an econometric model • In the crime example, we can’t reasonably observe all of the variables • e.g. the “wage” someone gets as a criminal, or even the probability of being arrested • We need to specify an econometric model based on observable factors.
Econometric Model of Crime • crimei = β0 + β1wagei + β2othinci + β3freqarri + β4freqconvi + β5avgseni + β6agei + εi • crime = some measure of frequency of criminal activity • wage = wage earned in legal employment • othinc = income earned from other sources • freqarr = frequency of arrests for prior infractions
Econometric Model of Crime • crimei = β0 + β1wagei + β2othinci + β3freqarri + β4freqconvi + β5avgseni + β6agei + εi • freqconv = frequency of convictions • avgsen = average length of sentence • age = age in years • ε = stochastic error term
Econometric Model of Crime • The stochastic error term contains all of the unobserved factors, e.g. wage for criminal activity, probability of arrest, etc. • We could add variables for family background, parental education, etc., but we will never get rid of ε
Wage and Education • wagei = β0 + β1educi + β2experi + β3tenurei + εi • What are the signs of the betas? • Run Regression in Gretl! (wage1.gdt)
Step 3: Gathering Data Types of Data: • Cross-Sectional Data • Time Series Data • Pooled Cross Sections • Panel/Longitudinal Data
Cross-Sectional Data • A sample of individuals, households, firms, cities, states, or other units, taken at a given point in time • Random Sampling • Mostly used in applied microeconomics • Examples • General Social Survey • US Census • Most other surveys
Time Series Data • Observations on a variable or several variables over time • E.g. stock prices, money supply, CPI, GDP, annual homicide rates, etc. • Because past events can influence future events, and lags in behavior are common in economics, time is an important dimension of time-series data
Time Series Data • More difficult to analyze than cross-sectional data • Observations across time are not independent • May also have to control for seasonality
Pooled Cross-Sections • Both time series and cross-sectional features • Suppose we collect data on households in 1985 and 1990 • We can combine both of these into one data set by creating a pooled cross-section • Good if there is a policy change between years • Need to control for time in analysis
Panel/Longitudinal Data • A panel data set consists of a time series for each cross-sectional member • E.g. select a random sample of 500 people, and follow each for 10 years.
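The difference between the data types is easiest to see in how the rows are indexed. This sketch builds only the index of the panel described above (500 people, 10 years); the person IDs and the calendar years 2001 to 2010 are made up for illustration.

```python
from itertools import product

# Hypothetical identifiers: 500 people followed over 10 years.
person_ids = range(1, 501)
years = range(2001, 2011)

# A panel has one row per (person, year) pair: the same people in every year.
panel_index = list(product(person_ids, years))

# Slicing one year gives a cross-section; slicing one person gives a time series.
cross_section_2005 = [(p, y) for p, y in panel_index if y == 2005]
time_series_person_1 = [(p, y) for p, y in panel_index if p == 1]

print(len(panel_index), len(cross_section_2005), len(time_series_person_1))
# prints: 5000 500 10
```

A pooled cross-section would also carry a year label on each row, but the people sampled in each year would generally be different, so a given person ID would not repeat across years.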
Causality & Ceteris Paribus • What we really want to know is: does the independent variable have a causal effect on the dependent variable? • But: Correlation does not imply causation • Suppose we want to know if higher education leads to higher worker productivity
Causality and Ceteris Paribus • If we find a relationship between education and wages, we don’t know much • Why? What if highly educated people have higher IQs, and it’s really high IQ that leads to higher wages? • If you give a random person more education, will they get higher wages?
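The IQ story can be simulated to show how a relationship between education and wages can appear even when education has no causal effect. Every number in this sketch is an assumption: the data are generated so that IQ raises both education and wages while the true effect of education on wages is set to zero. The helper simple_slope uses the standard least-squares slope, cov(x, y) / var(x), which these slides have not derived.

```python
import random

random.seed(47)  # fixed seed so the sketch is reproducible
N = 5000

# Assumed data-generating process, for illustration only:
# IQ raises both education and wages; education itself has NO effect on wages.
iq = [random.gauss(0.0, 1.0) for _ in range(N)]
educ = [12.0 + 2.0 * q + random.gauss(0.0, 1.0) for q in iq]
wage = [10.0 + 0.0 * e + 3.0 * q + random.gauss(0.0, 1.0) for e, q in zip(educ, iq)]

def mean(values):
    return sum(values) / len(values)

def simple_slope(x, y):
    """Slope of the least-squares line of y on x: cov(x, y) / var(x)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

naive_slope = simple_slope(educ, wage)
print(f"slope of wage on educ, ignoring IQ: {naive_slope:.2f}")
```

The slope is clearly positive even though giving a random person in this simulated world more education would not change their wage, which is the gap between correlation and a ceteris paribus effect.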