1 / 72

Financial Econometrics

Financial Econometrics. Dr. Kashif Saleem Associate Professor (Finance) Lappeenranta School of Business. Lecture 1 & 2: Agenda. Introduction What is econometrics? Types of data Steps involved in formulating an econometric model

palmer-rios
Download Presentation

Financial Econometrics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Financial Econometrics Dr. Kashif Saleem Associate Professor (Finance) Lappeenranta School of Business

  2. Lecture 1 & 2: Agenda Introduction What is econometrics? Types of data Steps involved in formulating an econometric model Points to consider when reading articles in empirical finance A brief overview of the classical linear regression model What is a regression model? Regression versus correlation Simple regression The assumptions underlying the classical linear regression model Properties of the OLS estimator Precision and standard errors An introduction to statistical inference A special type of hypothesis test: the t-ratio Introduction and Empirical data exercises with Eviews

  3. The Nature and Purpose of Econometrics • What is Econometrics? • Literal meaning is “measurement in economics”. • Definition of financial econometrics: The application of statistical and mathematical techniques to problems in finance.

  4. Examples of the kind of problems that may be solved by an Econometrician Financial econometrics can be useful for: • testing theories in finance • determining asset prices or returns • testing hypotheses concerning the relationships between variables • examining the effect on financial markets of changes in economic conditions • forecasting future values of financial variables • financial decision-making

  5. Types of Data and Notation • There are 3 types of data which econometricians might use for analysis: 1. Time series data 2. Cross-sectional data 3. Panel data, a combination of 1. & 2. • The data may be quantitative (e.g. exchange rates, stock prices, number of shares outstanding), or qualitative (e.g. day of the week). • Time series data: Data that have been collected over a period of time on one or more variables

  6. Types of Data and Notation • Examples of time series data SeriesFrequency GNP or unemployment monthly, or quarterly government budget deficit annually money supply weekly value of a stock market index as transactions occur • Examples of Problems that Could be Tackled Using a Time Series Regression - Relationship between country’s stock index macroeconomic variables - Relationship between company’s stock price its dividend payment. - Relationship between different macroeconomic and financial variables

  7. Types of Data and Notation (cont’d) • Cross-sectional data: are data on one or more variables collected at a single point in time, e.g. - gross annual income for each of 1000 randomly chosen households in Helsinki City for the year 2000. - A poll of usage of internet stock broking services - Cross-section of stock returns on the New York Stock Exchange - A sample of bond credit ratings for UK banks

  8. Types of Data and Notation (cont’d) • Panel Data has the dimensions of both time series and cross-sections, it is mult- dimensional data. • Panel data contains observations on multiple phenomena observed over multiple time periods for the same firms or individuals • e.g. the daily prices of a number of blue chip stocks over two years. • Examples: • Annual unemployment rates of each state over several years • Quarterly sales of individual stores over several quarters • Wages for the same worker, working at several different jobs

  9. Cross-sectional, Time series and panel data (example): • Consider, for example, a set of 1000 households randomly chosen from all households of Lappeenranta • The observed variable is the gross annual income. • A set of 1000 annual income values for year 1995 for each household is an example of cross-sectional data. • From such data one could derive information on how income was distributed among households in Lappeenranta in 1995. • A series of 10 values of the average annual income of ALL 1000 households for each year since 1991 to 2000 is an example of Time series data. • From such data one could derive information on how average income changed during the decade from 1991-2000 • A set of 1000x10 = 10,000 values of the annual income for each household for each year from 1991 to 2000 is an example of Panel data.

  10. Returns in Financial Modelling • It is preferable not to work directly with asset prices, so we usually convert the raw prices into a series of returns. There are two ways to do this: Simple returns or log returns Formulas on Board

  11. Steps involved in the formulation of econometric models Economic or Financial Theory (Previous Studies) Formulation of an Estimable Theoretical Model Collection of Data Model Estimation Is the Model Statistically Adequate? No Yes Reformulate Model Interpret Model Use for Analysis

  12. Some Points to Consider when reading papers in the academic finance literature 1. Does the paper involve the development of a theoretical model or is it merely a technique looking for an application, or an exercise in data mining? 2. Is the data of “good quality”? Is it from a reliable source? Is the size of the sample sufficiently large for asymptotic theory to be invoked? 3. Have the techniques been validly applied? Have diagnostic tests for violations of been conducted for any assumptions made in the estimation of the model?

  13. Some Points to Consider when reading papers in the academic finance literature (cont’d) 4. Have the results been interpreted sensibly? Is the strength of the results exaggerated? Do the results actually address the questions posed by the authors? 5. Are the conclusions drawn appropriate given the results, or has the importance of the results of the paper been overstated?

  14. A brief overview of the classical linear regression model

  15. Regression • Regression is probably the single most important tool at the econometrician’s disposal. But what is regression analysis? • It is concerned with describing and evaluating the relationship between a given variable (usually called the dependent variable) and one or more other variables (usually known as the independent variable(s)). • More specifically, regression is an attempt to explain movements in a variable by reference to the movements in one or more other variables.

  16. Some Notation • Denote the dependent variable by y andthe independent variable(s) by x1, x2, ..., xkwhere there are k independent variables. y (whose movements the regression seeks to explain) x ( the variables which are used to explain those variations) • Some alternative names for the y and x variables: dependent variable independent variables regressand regressors effect variable causal variables explained variable explanatory variable • Note that there can be many x variables but we will limit ourselves to the case where there is only one x variable to start with. In our set-up, there is only one y variable.

  17. Regression is different from Correlation • The correlation between two variables measures the degree of linear associationbetween them. • Thus, it is not impliedthat changes in x cause changes in y • Rather, it is simply stated that there is evidence for a linear relationship between the two variables, and that movements in the two are on average related to an extent given by the correlation coefficient. • In regression, the dependent variable (y) and the independent variable(s) (xs) are treated very differently. • The yvariable is assumed to be random or ‘stochastic’ in some way, i.e. to have a probability distribution. • The xvariables are, however, assumed to have fixed (‘non-stochastic’) values in repeated samples. • Regression as a tool is more flexible and more powerful than correlation.

  18. Simple Regression • For simplicity, say k=1. This is the situation where y depends on only one xvariable. • Examples of the kind of relationship that may be of interest include: • How asset returns vary with their level of market risk • Measuring the long-term relationship between stock prices and dividends.

  19. Simple Regression: An Example • Suppose that we have the following data on the excess returns on a fund manager’s portfolio (“fund XXX”) together with the excess returns on a market index: • We have some intuition that the beta on this fund is positive, and we therefore want to find whether there appears to be a relationship between x and y given the data that we have. The first stage would be to form a scatter plot of the two variables.

  20. Finding a Line of Best Fit • We can use the general equation for a straight line, y=a+bx to get the line that best “fits” the data. • However, this equation (y=a+bx) is completely deterministic. • Is this realistic? No. So what we do is to add a random disturbance term, u into the equation. yt = +xt+ ut where t = 1,2,3,4,5

  21. Why do we include a Disturbance term? • The disturbance term can capture a number of features: - We always leave out some determinants of yt (effect of other variables on Y which are not in the model) - There may be errors in the measurement of yt that cannot be modelled. - Random outside influences on yt which we cannot model (unexpected events)

  22. Determining the Regression Coefficients • So how do we determine what and are? • Choose andso that the (vertical) distances from the data points to the fitted lines are minimised (so that the line fits the data as closely as possible):

  23. Ordinary Least Squares • The most common method used to fit a line to the data is known as OLS (ordinary least squares). • What we actually do is take each distance and square it (i.e. take the area of each of the squares in the diagram) and minimise the total sum of the squares (hence least squares). • Tightening up the notation, let yt denote the actual data point t denote the fitted value from the regression line denote the residual, yt -

  24. OLS

  25. Actual and Fitted Value

  26. Deriving the OLS Estimator ON BOARD 

  27. What do We Use and For? • If “x” increases by by 1 unit, “y” will be expected, every thing else being equal, to increase by the value of . of course, if beta had been negative, a rise in “x” would on average cause a fall in “y” • Alpha is known as intercept coefficient estimate, can be interpreted as the value that would be taken by the dependent variable “y” if the independent variable “x” took a value of zero

  28. What do We Use and For? • In the CAPM example, plugging the 5 observations in to the formulae derived would lead to the estimates = xxx and = xxx. -- on board • Question: If an analyst tells you that she expects the market to yield a return 20% higher than the risk-free rate next year, what would you expect the return on fund XXX to be? • Solution: on board

  29. The Population and the Sample • The population is the total collection of all objects or people to be studied, for example, • Interested inPopulation of interest predicting outcome the entire electorate of an election • A sample is a selection of just some items from the population. • A random sample is a sample in which each individual item in the population is equally likely to be drawn.

  30. Linearity • In order to use OLS, we need a model which is linear in the parameters (and ). It does not necessarily have to be linear in the variables (y and x). • Linear in the parameters means that the parameters are not multiplied together, divided, squared or cubed etc. • Some models can be transformed to linear ones by a suitable substitution or manipulation, e.g. the exponential regression model • Then let yt=ln Ytand xt=ln Xt

  31. Estimator or Estimate? • Estimators are the formulae used to calculate the coefficients • Estimates are the actual numerical values for the coefficients.

  32. The Assumptions Underlying the Classical Linear Regression Model (CLRM) • The model which we have used is known as the classical linear regression model. • We observe data for xt, but since yt also depends on ut, we must be specific about how the ut are generated. • We usually make the following set of assumptions about the ut’s (the unobservable error terms): • On Board 

  33. BLUE • If assumptions 1. through 4. hold, then the estimators and determined by OLS are known as Best Linear Unbiased Estimators (BLUE). What does the acronym stand for? • “Estimator” - is an estimator of the true value of . • “Linear” - is a linear estimator • “Unbiased” - On average, the actual value of the and ’s will be equal to the true values. • “Best” - means that the OLS estimator has minimum variance among the class of linear unbiased estimators. The Gauss-Markov theorem proves that the OLS estimator is best.

  34. Properties of the OLS Estimator • Consistent The least squares estimators and are consistent. That is, the estimates will converge to their true values as the sample size increases to infinity. Need the assumptions E(xtut)=0 and Var(ut)=2 <  to prove this. • Unbiased The least squares estimates of and are unbiased. That is E( )= and E( )= Thus on average the estimated value will be equal to the true values. To prove this also requires the assumption that E(ut)=0. Unbiasedness is a stronger condition than consistency. • Efficiency An estimator of parameter  is said to be efficient if it is unbiased and no other unbiased estimator has a smaller variance. If the estimator is efficient, we are minimising the probability that it is a long way off from the true value of .

  35. Precision and Standard Errors • Any set of regression estimates of and are specific to the sample used in their estimation. • Recall that the estimators of and from the sample parameters ( and ) are given by • What we need is some measure of the reliability or precision of the estimators ( and ). The precision of the estimate is given by its standard error. Given assumptions 1 - 4 above, then the standard errors can be shown to be given by • Formulas on Board

  36. Estimating the Variance of the Disturbance Term (cont’d) Some Comments on the Standard Error Estimators 1. Both SE( ) and SE( ) depend on s2(or s). The greater the variance s2, then the more dispersed the errors are about their mean value and therefore the more dispersed y will be about its mean value. 2. The sum of the squares of x about their mean appears in both formulae. The larger the sum of squares, the smaller the coefficient variances. 3. The larger the sample size, T, the smaller will be the coefficient variances.

  37. Example: How to Calculate the Parameters and Standard Errors • Assume we have the following data calculated from a regression of y on a single variable x and a constant over 22 observations. • Data: • Calculations: on board 

  38. Hypothesis testing • Hypothesis testing is the use of statistics to determine the probability that a given hypothesis is true. The usual process of hypothesis testing consists of four steps. • 1. Formulate the Null hypothesis (commonly, that the observations are the result of pure chance) and the Alternative hypothesis (commonly, that the observations show a real effect combined with a component of chance variation). • 2. Identify a test statistic that can be used to assess the truth of the Null hypothesis. • 3. Compute the P- Values which is the probability that a test statistic at least as significant as the one observed would be obtained assuming that the Null hypothesis were true. The smaller the -value, the stronger the evidence against the null hypothesis. • 4. Compare the p-value to an acceptable significance value (sometimes called an alpha value). If , that the observed effect is statistically significant, the null hypothesis is ruled out, and the alternative hypothesis is valid.

  39. Hypothesis Testing Example: Suppose we have the following regression results: • is a single (point) estimate of the unknown population parameter, . How “reliable” is this estimate? • The reliability of the point estimate is measured by the coefficient’s standard error.

  40. Hypothesis Testing: Some Concepts • We can use the information in the sample to make inferences about the population. • We will always have two hypotheses that go together, the null hypothesis (denoted H0) and the alternative hypothesis (denoted H1). • The null hypothesis is the statement or the statistical hypothesis that is actually being tested. The alternative hypothesis represents the remaining outcomes of interest. • For example, suppose given the regression results, we are interested in the hypothesis that the true value of  is in fact 0.5. We would use the notation H0 :  = 0.5 H1 :  0.5 This would be known as a two sided test.

  41. One-Sided Hypothesis Tests • Sometimes we may have some prior information that, for example, we would expect  > 0.5 rather than  < 0.5. In this case, we would do a one-sided test: H0 :  = 0.5 H1 :  > 0.5 or we could have had H0 :  = 0.5 H1 :  < 0.5 • There are two ways to conduct a hypothesis test: via the test of significance approach or via the confidence interval approach.

  42. Testing Hypotheses: The Test of Significance Approach • Assume the regression equation is given by , for t=1,2,...,T • The steps involved in doing a test of significance are: 1. Estimate , and , in the usual way 2. Calculate the test statistic. This is given by the formula where is the value of  under the null hypothesis.

  43. The Test of Significance Approach • lets suppose Our calculated value is 5.93 • All we have to do now is compare this with the critical value which we get from a table of critical values of t • First work out how many degrees of freedom (for a t test this is the total number of pieces of data minus 2). • let say we have 50 observations , In our case this is (50-2) = 48. • Enter the table at the nearest number of degrees of freedom to yours. • If you're between two values always take the lower one. • Our appropriate entry point is therefore 40 degrees of freedom. • critical value is 2.021

  44. The Test of Significance Approach • 5.93 is bigger than 2.021. • In a t test if our calculated value is bigger than the critical value we reject our null hypothesis. • In rejecting our hypothesis of no difference we are saying that there is indeed a significant difference between the means of the two sets of data. • In choosing the 5% significance level we are saying that we would expect to be correct in accepting or rejecting our null hypothesis 95% of the time.

  45. Determining the Rejection Region for a Test of Significance 5. Given a significance level, we can determine a rejection region and non-rejection region. For a 2-sided test:

  46. The Rejection Region for a 1-Sided Test (Upper Tail)

  47. The Rejection Region for a 1-Sided Test (Lower Tail)

  48. How to Carry out a Hypothesis Test Using Confidence Intervals 1. Calculate , and , as before. 2. Choose a significance level, , (again the convention is 5%). This is equivalent to choosing a (1-)100% confidence interval, i.e. 5% significance level = 95% confidence interval 3. Use the t-tables to find the appropriate critical value, which will again have T-2 degrees of freedom. 4. The confidence interval is given by 5. Perform the test: If the hypothesised value of  (*) lies outside the confidence interval, then reject the null hypothesis that  = *, otherwise do not reject the null.

  49. Constructing Tests of Significance and Confidence Intervals: An Example • Using the regression results above, , T=22 • Using both the test of significance and confidence interval approaches, test the hypothesis that  =1 against a two-sided alternative. • The first step is to obtain the critical value. We want tcrit = t20;5%

More Related