Ka-fu Wong, University of Hong Kong
A Brief Review of Probability, Statistics, and Regression for Forecasting
Random variable
• A random variable is a mapping from the set of all possible outcomes to the real numbers.
• Today's Hang Seng Index can go up, go down, or stay the same as yesterday's. Consider the movement of the Hang Seng Index over a month of 22 trading days. We can define a random variable Y as the number of days on which the Hang Seng Index goes up. In this case, Y can assume 23 values: y = 0, 1, 2, …, 22 (see the sketch below).
• Discrete random variables can assume only a countable number of values. A discrete probability distribution describes the probability of occurrence of each possible event; for instance, $p_i$ is the probability that event i occurs.
• Continuous random variables can assume a continuum of values. A probability density function, f(y), is a nonnegative continuous function such that the area under f(y) between any two points a and b is the probability that Y assumes a value between a and b.
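As an illustration, here is a minimal Python sketch of the Hang Seng example, under the simplifying assumption (not made on the slide) that each of the 22 trading days is an independent Bernoulli trial with the same probability p of an up day, so that Y follows a Binomial(22, p) distribution:

```python
# Minimal sketch: Y = number of up days in 22 trading days,
# assuming independent days with up-probability p (an assumption
# added for illustration), so Y ~ Binomial(22, p).
from scipy.stats import binom

n, p = 22, 0.5           # assumed number of trading days and up-probability
Y = binom(n, p)          # frozen binomial distribution

print(Y.pmf(11))         # P(Y = 11): probability of exactly 11 up days
print(Y.cdf(10))         # P(Y <= 10)
print(Y.mean(), Y.var()) # E(Y) = np = 11, var(Y) = np(1-p) = 5.5
```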
Moments
• Mean (measures central tendency): $\mu = E(y)$
• Variance (measures dispersion around the mean): $\sigma^2 = E[(y - \mu)^2]$
• Standard deviation: $\sigma = \sqrt{\sigma^2}$
• Skewness (measures the amount of asymmetry in a distribution): $S = \dfrac{E[(y - \mu)^3]}{\sigma^3}$
• Kurtosis (measures the thickness of the tails of a distribution): $K = \dfrac{E[(y - \mu)^4]}{\sigma^4}$
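A short sketch of these definitions applied to a toy discrete distribution; the values and probabilities are assumed purely for illustration:

```python
# Sketch: the four moments computed directly from a discrete pmf,
# following the definitions on this slide.
import numpy as np

y = np.array([-1.0, 0.0, 2.0])    # possible values of Y (assumed)
p = np.array([0.25, 0.50, 0.25])  # their probabilities (sum to 1)

mu    = np.sum(p * y)                           # mean E(y)
var   = np.sum(p * (y - mu) ** 2)               # variance E[(y - mu)^2]
sigma = np.sqrt(var)                            # standard deviation
skew  = np.sum(p * (y - mu) ** 3) / sigma ** 3  # skewness
kurt  = np.sum(p * (y - mu) ** 4) / sigma ** 4  # kurtosis (normal = 3)

print(mu, var, sigma, skew, kurt)
```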
Multivariate Random Variables
• Joint distribution: $f(x, y)$
• Covariance (measures linear dependence between two variables): $\mathrm{cov}(x, y) = E[(x - \mu_x)(y - \mu_y)]$
• Correlation: $\mathrm{corr}(x, y) = \dfrac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}$
• Conditional distribution: $f(y \mid x) = \dfrac{f(x, y)}{f(x)}$
• Conditional mean: $E(y \mid x)$
• Conditional variance: $\mathrm{var}(y \mid x) = E\{[y - E(y \mid x)]^2 \mid x\}$
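A sketch of the covariance and correlation computed from simulated data; the bivariate-normal population and its parameters are assumptions chosen for illustration:

```python
# Sketch: sample covariance and correlation recover the population
# values from draws of an assumed bivariate normal distribution.
import numpy as np

rng = np.random.default_rng(0)
mean = [0.0, 0.0]
cov  = [[1.0, 0.6],
        [0.6, 1.0]]                  # true cov(x, y) = corr(x, y) = 0.6
x, y = rng.multivariate_normal(mean, cov, size=10_000).T

print(np.cov(x, y)[0, 1])            # sample covariance, close to 0.6
print(np.corrcoef(x, y)[0, 1])       # sample correlation, close to 0.6
```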
Statistics
• Sample mean: $\bar{y} = \dfrac{1}{T} \sum_{t=1}^{T} y_t$
• Sample variance: $\hat{\sigma}^2 = \dfrac{1}{T} \sum_{t=1}^{T} (y_t - \bar{y})^2$ or $s^2 = \dfrac{1}{T-1} \sum_{t=1}^{T} (y_t - \bar{y})^2$
• Sample standard deviation: $\hat{\sigma} = \sqrt{\hat{\sigma}^2}$ or $s = \sqrt{s^2}$
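A sketch contrasting the two variance divisors, T versus T − 1, via NumPy's ddof argument; the data values are made up for illustration:

```python
# Sketch: the two sample-variance formulas differ only in the divisor,
# T (ddof=0, NumPy's default) versus T - 1 (ddof=1).
import numpy as np

y = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # toy sample

print(y.mean())       # sample mean
print(y.var(ddof=0))  # divides by T
print(y.var(ddof=1))  # divides by T - 1
print(y.std(ddof=1))  # corresponding sample standard deviation
```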
Statistics
• Sample skewness: $\hat{S} = \dfrac{\frac{1}{T} \sum_{t=1}^{T} (y_t - \bar{y})^3}{\hat{\sigma}^3}$
• Sample kurtosis: $\hat{K} = \dfrac{\frac{1}{T} \sum_{t=1}^{T} (y_t - \bar{y})^4}{\hat{\sigma}^4}$
• Jarque-Bera test statistic: $JB = \dfrac{T}{6} \left( \hat{S}^2 + \dfrac{(\hat{K} - 3)^2}{4} \right)$
• Under the null of independent, normally distributed observations, JB is distributed in large samples as a chi-square random variable with two degrees of freedom.
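A sketch computing the JB statistic from its definition and checking it against SciPy's implementation; the simulated normal sample is an assumption for illustration:

```python
# Sketch: Jarque-Bera statistic built from the sample skewness and
# kurtosis above, then compared with scipy.stats.jarque_bera.
import numpy as np
from scipy.stats import jarque_bera

rng = np.random.default_rng(1)
y = rng.normal(size=1_000)         # simulated sample under the null
T = len(y)

s = (y - y.mean()) / y.std()       # standardize (divisor T, i.e. ddof=0)
S = np.mean(s ** 3)                # sample skewness
K = np.mean(s ** 4)                # sample kurtosis
JB = T / 6 * (S ** 2 + (K - 3) ** 2 / 4)

print(JB)
print(jarque_bera(y).statistic)    # should match the manual computation
```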
Example
• What is our expectation of y given x = 0?
Forecast
• Suppose we want to forecast the value of a variable y, given the value of a variable x.
• Denote that forecast $y_{f|x}$.
Conditional expectation as a forecast
• Think of y and x as random variables jointly drawn from some underlying population.
• It seems reasonable to construct the forecast of y based on x as the expected value of y conditional on x, i.e., $y_{f|x} = E(y \mid x)$, the average population value of y given that value of x.
• $E(y \mid x)$ is also called the population regression of y (on x).
Conditional expectation as a forecast
• The expected value of y conditional on x: $y_{f|x} = E(y \mid x)$.
• It turns out that in many reasonable forecasting settings:
 – this forecast has optimal properties (e.g., it minimizes expected squared-error loss), and
 – (approximating) this forecast guides our choice of forecast method.
Unbiasedness of the conditional expectation as a forecast
• The forecast error will be $y - E(y \mid x)$.
• Expected forecast error: $E[y - E(y \mid x)] = E(y) - E[E(y \mid x)] = E(y) - E(y) = 0$, where the second equality follows from the law of iterated expectations.
• Thus the conditional expectation is an unbiased forecast, as the simulation below illustrates.
• Note that another name for $E(y \mid x)$ is the population regression of y (on x).
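A simulation sketch of this unbiasedness result, under an assumed linear population y = 1 + 2x + ε with ε ~ N(0, 1); the numbers are illustrative, not from the slides:

```python
# Sketch: forecast errors from the conditional-mean forecast E(y|x)
# average to (approximately) zero in a large simulated sample.
import numpy as np

rng = np.random.default_rng(2)
x   = rng.uniform(-1, 1, size=100_000)
eps = rng.normal(0, 1, size=100_000)
y   = 1 + 2 * x + eps          # assumed population relationship

forecast = 1 + 2 * x           # E(y | x), using the true parameters
errors   = y - forecast

print(errors.mean())           # close to 0: the forecast is unbiased
```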
Some operational assumptions about E(y | x)
• In order to proceed in this direction, we need to make some additional assumptions about the underlying population and, in particular, about the form of $E(y \mid x)$.
• The simplest assumption is that the conditional expectation is a linear function of x, i.e., $E(y \mid x) = \beta_0 + \beta_1 x$.
• If $\beta_0$ and $\beta_1$ are known, then the forecast problem is completed by setting $y_{f|x} = \beta_0 + \beta_1 x$.
When parameters are unknown
• Even if the conditional expectation is linear in x, the parameters $\beta_0$ and $\beta_1$ will be unknown.
• The next best thing is to estimate $\beta_0$ and $\beta_1$ and use the estimated β's in place of the true values to form the forecast.
• This substitution will not provide as accurate a forecast, since it introduces a new source of forecast error: "estimation error," or "sampling error." However, under certain conditions the resulting forecast will still be unbiased and retain certain optimality properties.
When parameters are unknown
• Suppose we have access to a sample of T pairs of (x, y) drawn from the population from which the relevant value of y will be drawn: $(x_1, y_1), (x_2, y_2), \ldots, (x_T, y_T)$.
• In this case, a natural estimator of $\beta_0$ and $\beta_1$ is the ordinary least squares (OLS) estimator, obtained by minimizing the sum of squared residuals $\sum_{t=1}^{T} (y_t - \beta_0 - \beta_1 x_t)^2$ with respect to $\beta_0$ and $\beta_1$. The solutions are the OLS estimates $\hat{\beta}_0$ and $\hat{\beta}_1$.
• Then, for a given value of x, we can forecast y according to $y_{f|x} = \hat{\beta}_0 + \hat{\beta}_1 x$ (see the sketch below).
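A sketch of the OLS estimates and the plug-in forecast, using the standard closed-form solutions; the simulated data and true parameter values are assumptions for illustration:

```python
# Sketch: closed-form OLS estimates for the simple regression,
# then the plug-in forecast for a new value of x.
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, size=200)
y = 1 + 2 * x + rng.normal(0, 1, size=200)    # assumed population model

b1 = np.cov(x, y, ddof=0)[0, 1] / np.var(x)   # beta1-hat = cov(x,y)/var(x)
b0 = y.mean() - b1 * x.mean()                 # beta0-hat

x_new = 0.5
print(b0 + b1 * x_new)   # forecast of y given x = 0.5, near 1 + 2(0.5) = 2
```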
When parameters are unknown
• This estimation procedure, also called the sample regression of y on x, will provide us with a "good" estimate of the conditional expectation of y given x (i.e., of the population regression of y on x) and, therefore, a "good" forecast of y given x, provided that certain additional assumptions apply to the relationship between y and x.
• Let ε denote the difference between y and $E(y \mid x)$. That is, $\varepsilon = y - E(y \mid x)$, i.e., $y = E(y \mid x) + \varepsilon$, and $y = \beta_0 + \beta_1 x + \varepsilon$ if $E(y \mid x) = \beta_0 + \beta_1 x$.
When parameters are unknown
• The assumptions that we need pertain to these ε's (the "other factors" that determine y) and their relationship to the x's.
• For instance, so long as $E(\varepsilon_t \mid x_1, \ldots, x_T) = 0$ for $t = 1, \ldots, T$, the OLS estimators of $\beta_0$ and $\beta_1$ based on the data $(x_1, y_1), \ldots, (x_T, y_T)$ will be unbiased and, as a result, the forecast constructed by replacing these population parameters with the OLS estimates will be unbiased.
• A standard set of assumptions that provides us with a lot of value: given $x_1, \ldots, x_T$, the errors $\varepsilon_1, \ldots, \varepsilon_T$ are i.i.d. $N(0, \sigma^2)$ random variables.
When parameters are unknown
• These ideas and procedures extend naturally to the setting where we want to forecast the value of y based on the values of k other variables, say $x_1, \ldots, x_k$.
• We begin by considering the conditional expectation, or population regression, of y on $x_1, \ldots, x_k$ to make our forecast. That is, $y_{f|x_1,\ldots,x_k} = E(y \mid x_1, \ldots, x_k)$.
• To operationalize this forecast, we first assume that the conditional expectation is linear, i.e., $E(y \mid x_1, \ldots, x_k) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$.
When parameters are unknown
• The unknown β's are generally replaced by the estimates from a sample OLS regression.
• Suppose we have the data set $(y_1, x_{11}, \ldots, x_{k1}), (y_2, x_{12}, \ldots, x_{k2}), \ldots, (y_T, x_{1T}, \ldots, x_{kT})$.
• The OLS estimates of the unknown parameters are obtained by minimizing the sum of squared residuals $\sum_{t=1}^{T} (y_t - \beta_0 - \beta_1 x_{1t} - \cdots - \beta_k x_{kt})^2$.
• As in the case of the simple regression model, this procedure for estimating the population regression function will have good properties provided that the regression errors $\varepsilon_t = y_t - E(y_t \mid x_{1t}, \ldots, x_{kt})$, $t = 1, \ldots, T$, have appropriate properties (a sketch follows below).
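A sketch of the multiple-regression case with k = 2 regressors, using np.linalg.lstsq to minimize the sum of squared residuals; the data-generating process and coefficient values are assumed for illustration:

```python
# Sketch: OLS with two regressors via least squares on the design
# matrix [1, x1, x2], then a forecast at new regressor values.
import numpy as np

rng = np.random.default_rng(4)
T = 500
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
y  = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=T)  # assumed model

X = np.column_stack([np.ones(T), x1, x2])      # design matrix with intercept
betas, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes sum of sq. residuals

print(betas)                     # close to [1.0, 2.0, -0.5]
x_new = np.array([1.0, 0.3, -1.2])
print(x_new @ betas)             # forecast of y at (x1, x2) = (0.3, -1.2)
```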
Density Forecasts and Interval Forecasts
• The procedures we described above produce point forecasts of y. They can also be used to produce density and interval forecasts of y, provided that the x's and the regression errors, i.e., the ε's, meet certain conditions.
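As one concrete possibility, under the i.i.d. $N(0, \sigma^2)$ error assumption from the earlier slide, the density forecast is $N(\hat{\beta}_0 + \hat{\beta}_1 x, \hat{\sigma}^2)$ and a 95% interval forecast is the point forecast ± 1.96$\hat{\sigma}$. Here is a sketch, which for simplicity ignores the extra uncertainty from parameter estimation:

```python
# Sketch: a 95% interval forecast under assumed normal errors,
# treating the estimated parameters as if they were the true ones.
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, size=200)
y = 1 + 2 * x + rng.normal(0, 1, size=200)    # assumed population model

b1 = np.cov(x, y, ddof=0)[0, 1] / np.var(x)   # OLS slope
b0 = y.mean() - b1 * x.mean()                 # OLS intercept
resid = y - (b0 + b1 * x)
sigma_hat = resid.std(ddof=2)                 # ddof=2: two estimated params

x_new = 0.5
point = b0 + b1 * x_new                       # point forecast
print(point - 1.96 * sigma_hat, point + 1.96 * sigma_hat)  # 95% interval
```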