Asset Return Predictability II Tests of RW2: Technical Trading Rules
RW2 • As CLM point out, RW2 is difficult to test directly. • If we don’t know how the marginal distributions vary through time we can’t do statistical tests. • The way the literature has approached tests of RW2 is to see if particular strategies make money. • The two most popular strategies are using filter rules and technical analysis. • The “economic” test is to see if these simple strategies make money.
Filter Rules • The forefathers of the modern momentum strategies. • Purchase an asset when its price has appreciated by some specific amount (X%). • Short-sell (or sell) when the reverse occurs. • There is an old literature (Alexander (1961, 1964), Fama (1965), Fama and Blume (1966)) that shows profits can be made in this way but that the profits are smaller than any reasonable level of transactions costs the strategies would generate.
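To make the mechanics concrete, here is a minimal sketch of an X% filter rule in Python. Nothing here comes from the papers above; the function name, the 5% default threshold, and the long/short/flat coding are illustrative only.

```python
import numpy as np

def filter_rule_positions(prices, x=0.05):
    """Toy X% filter rule: go long after an x rise from the last trough,
    go short after an x fall from the last peak; hold otherwise."""
    positions = np.zeros(len(prices))   # +1 long, -1 short, 0 flat
    peak = trough = prices[0]
    pos = 0
    for t, p in enumerate(prices):
        peak, trough = max(peak, p), min(trough, p)
        if pos <= 0 and p >= trough * (1 + x):      # x% rise from the trough: buy
            pos, peak = 1, p
        elif pos >= 0 and p <= peak * (1 - x):      # x% fall from the peak: sell/short
            pos, trough = -1, p
        positions[t] = pos
    return positions
```

Trading costs are exactly what this sketch ignores; the Fama-Blume point is that the switching such a rule generates is frequent enough that reasonable costs wipe out the gross profits.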
Technical Analysis • Finding complicated but predictable patterns. • The goal is to identify regularities in the time series of prices by extracting nonlinear patterns from noisy data. • This suggests that some price movements are significant and some should be ignored. • The question is: what’s a pattern and what is noise? • CLM don’t discuss this much so we’ll look at Lo, Mamaysky, and Wang (2000) as an illustration.
LMW (2000) • What do they do? • Use smoothing estimators to extract nonlinear relations between past and future prices by averaging out noise. They: • Discuss smoothing estimators – in particular kernel regression. • Apply it to a sample of US stocks from 1962 – 1996. • Check the accuracy by performing Monte Carlo simulations.
Smoothing Estimators • The basics: • Technical analysis begins with the idea that prices evolve in some nonlinear fashion over time and that there is predictability in the evolution. • Let prices {Pt} follow an unknown law of motion: Pt = m(Xt) + εt, t = 1,…,T
Smoothing Estimators • m(Xt) is an arbitrary fixed but unknown nonlinear function of some state variable Xt, and εt is white noise. • Goal is to construct a smooth function to approximate the time series of prices. • LMW use time as the state variable but retain the notation of Xt.
Smoothing Estimators • Intuition • Suppose you want to estimate m at t0 when Xt0 = x0. • If you have repeated observations on Pt0 it's easy: just average them, m̂(x0) = (1/n) Σi Pt0,i, • which converges to m(x0) under suitable regularity conditions.
Smoothing Estimators • {Pt} is a time series so of course we don’t have repeated observations of the value at a given time. • If we assume that m() is sufficiently smooth then for time series observations Xt near x0 the values of Pt should be “close” to m(x0). • The closer the Xts are to x0 (closer in time) the closer an average of the Pts will be to m(x0). • What this really suggests is a weighted average, with high weights for observations close to x0 and lower weights for those farther away.
Smoothing Estimators • For any arbitrary x (think time) a smoothing estimator may be expressed as a weighted average: m̂(x) = (1/T) Σt wt(x) Pt, where the weights wt(x) are large for those Pt paired with Xt near x (i.e. t near x) and small for those far from x.
Smoothing Estimators • We must define what we mean by near and far. • Too large a neighborhood and the weighted average will be too smooth and hide the nonlinearities of interest, too small and the average will be too variable and include lots of noise. • We also have to define how the weights change as we move from near to far.
Kernel Regression • In this case wt(x) is constructed from a pdf K(x) called a kernel: K(x) ≥ 0, ∫ K(u) du = 1. • Rescaling the kernel w.r.t. the parameter h > 0 changes its spread: Kh(u) ≡ (1/h) K(u/h), so ∫ Kh(u) du = 1.
Kernel Regression • Define the weight function as: wt,h(x) ≡ Kh(x − Xt) / gh(x), where gh(x) ≡ (1/T) Σt Kh(x − Xt). • gh(x) is the total probability mass in the sample used to compute the weights, and wt,h(x) is the pdf at (x − Xt) divided by the total mass, so the weights in the kernel estimator sum to one. • Controlling h controls the bandwidth. Small h and you look at a very small neighborhood.
Nadaraya-Watson Kernel Estimator • Substituting the weights into the expression for m̂(x) yields the NW kernel estimator: m̂h(x) = (1/T) Σt wt,h(x) Pt = Σt Kh(x − Xt) Pt / Σt Kh(x − Xt). • Under certain regularity conditions that depend upon the shape of the kernel and the behavior of the weights as a function of sample size, it can be shown that this estimator converges to m(x).
Normal Kernel • Throughout their paper LMW use the normal or Gaussian kernel: Kh(x) = (1/(h√(2π))) exp(−x²/(2h²)). • Note that the bandwidth, h, is equivalent to the standard deviation of the normal kernel. A larger bandwidth means a higher standard deviation and more smoothing.
Example: Fitting A Sine Wave • They generate data using Yt = sin(Xt) + ½ εt, where εt ~ N(0,1) and the Xt's are evenly spaced over the interval [0, 2π]. • They then estimate the sine function using the kernel regression approach. • They use the example to illustrate the importance of the right choice of bandwidth.
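A minimal sketch, not LMW's code, of the Nadaraya-Watson estimator with a Gaussian kernel applied to simulated sine-wave data of this kind; the sample size, the evaluation grid, and the three bandwidths compared are illustrative choices.

```python
import numpy as np

def nw_estimate(x_grid, X, P, h):
    """Nadaraya-Watson estimate of m(x) on x_grid using a Gaussian kernel
    of bandwidth h: a kernel-weighted average of the observed P values."""
    u = (x_grid[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u**2) / (h * np.sqrt(2 * np.pi))   # K_h(x - X_t)
    return (K @ P) / K.sum(axis=1)

# Simulated data: Y_t = sin(X_t) + 0.5 * eps_t, eps_t ~ N(0,1)
rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 500)
Y = np.sin(X) + 0.5 * rng.standard_normal(X.size)

grid = np.linspace(0.0, 2.0 * np.pi, 200)
for h in (0.05, 0.3, 1.0):                               # under-, moderate, over-smoothing
    rmse = np.sqrt(np.mean((nw_estimate(grid, X, Y, h) - np.sin(grid))**2))
    print(f"h = {h:4.2f}   RMSE against the true sine: {rmse:.3f}")
```

Too small an h chases the noise, too large an h flattens the sine; that trade-off is what the bandwidth choice controls.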
Selecting Bandwidth • A popular way to select the bandwidth is to minimize the sum of squared deviations of the fitted values from the observations, dropping each observation when fitting it: CV(h) = (1/T) Σt [Pt − m̂h,t(Xt)]², where m̂h,t is the estimator computed without observation t; call the minimizer h*. • Minimizing the cross-validation function minimizes MSE asymptotically. It has problems if the data are not evenly spaced.
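A sketch of leave-one-out cross-validation for h under the same Gaussian-kernel setup; the candidate grid of bandwidths is an arbitrary illustrative choice, and the 0.3 multiplier at the end simply mimics LMW's ad hoc adjustment described next.

```python
import numpy as np

def cv_score(X, P, h):
    """Leave-one-out CV: mean squared error of the NW fit at each X_t,
    where the fit at X_t excludes observation t."""
    u = (X[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u**2) / (h * np.sqrt(2 * np.pi))
    np.fill_diagonal(K, 0.0)                     # drop the own observation
    m_loo = (K @ P) / K.sum(axis=1)
    return np.mean((P - m_loo) ** 2)

rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 500)
Y = np.sin(X) + 0.5 * rng.standard_normal(X.size)

h_grid = np.linspace(0.05, 1.0, 20)
h_star = h_grid[int(np.argmin([cv_score(X, Y, h) for h in h_grid]))]
print(f"cross-validated bandwidth h* = {h_star:.3f}, LMW-style 0.3h* = {0.3 * h_star:.3f}")
```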
LMW’s Choice For Analysis • LMW don't choose h*. • After consulting with technical analysts, they decided it was too big. • They picked 0.3h* before performing any statistical tests and used it for the balance of the paper. (This was to avoid data-mining charges.) • They suggest that the reason h* didn't work is that the function they were trying to fit was a lot choppier than a sine wave, and it is difficult to distinguish signal from noise with a non-smooth function. • No statistics for kernel estimators have (as yet) been worked out for non-smooth functions. This is a basic problem for this paper.
Automating Technical Analysis • LMW use an algorithm with the following steps: • Define each technical pattern. • Construct the kernel estimator for the stock price series. • Analyze the fitted m̂(·) for occurrences of each pattern. • The question was whether this could replicate human visual pattern recognition. • Consider the standard patterns and their meanings.
Definitions • Consider m(.), the systematic component of a price history {Pt} and identify n local extrema, local maxima and minima. • Denote by E1, E2, …, En the n extrema. • Denote by t*1, t*2, …, t*n the dates of these extrema.
Identification Algorithm • Prices {P1, …, PT} • Time: window from t to t + l + d – 1, where • t (start of window) varies from 1 to T – l – d + 1 • Let l = 35 days and d = 3 days. There are 38 days in a window. • Fit a kernel regression to the rolling windows of data.
l and d • Want short windows to better distinguish signal from noise. Short windows limit the “number” of patterns that might emerge. • This means that you can only observe short horizon patterns. • d controls for the fact that it takes time to detect the pattern after it is completed. • The extrema in the pattern must occur before t+l-1 • Conditional returns are then computed after the detection of the pattern at t+l+d, not earlier.
Note on Implementation • First, compute m̂(·) over the rolling windows. • Then, identify the local extrema of m̂(·) within each window. • Then, using the definitions, search for the identified patterns. • The pictures use the daily returns of a single security, CTX, from 1992-1996.
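A rough sketch of the rolling-window step: fit the Gaussian-kernel regression of prices on time within each l + d day window and record the local extrema of the fit. The bandwidth of 2.5 days, the toy price path, and the simple sign-change rule for extrema are illustrative stand-ins, not LMW's exact procedure.

```python
import numpy as np

def nw_fit(t_grid, t_obs, prices, h):
    """Gaussian-kernel Nadaraya-Watson fit of prices against time."""
    u = (t_grid[:, None] - t_obs[None, :]) / h
    K = np.exp(-0.5 * u**2)
    return (K @ prices) / K.sum(axis=1)

def local_extrema(m_hat):
    """Interior indices where the fitted curve switches from rising to
    falling or vice versa (local maxima and minima)."""
    s = np.sign(np.diff(m_hat))
    return [i + 1 for i in range(len(s) - 1) if s[i] != 0 and s[i] != s[i + 1]]

def rolling_extrema(prices, l=35, d=3, h=2.5):
    """For each window of l + d days, return the window start and the
    extrema that complete before the last d days."""
    out = []
    for t in range(len(prices) - l - d + 1):
        window = prices[t:t + l + d]
        days = np.arange(len(window), dtype=float)
        ext = local_extrema(nw_fit(days, days, window, h))
        out.append((t, [e for e in ext if e <= l - 1]))
    return out

rng = np.random.default_rng(1)
toy_prices = 100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal(300)))
print("extrema in the first window:", rolling_extrema(toy_prices)[0][1])
```

The pattern definitions themselves (head-and-shoulders and the rest) would then be predicates on the sequence of these extrema, which is the part this sketch leaves out.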
Basic Question • LMW don't answer the question: "Is technical analysis profitable?" • They don't ask this question because specifying the appropriate benchmark is difficult. • Instead they ask: "Is the conditional distribution of returns different from the unconditional distribution?" • They use a goodness-of-fit test and the Kolmogorov-Smirnov test to see if the empirical distributions of returns following a pattern are different from the unconditional empirical distribution.
Quantile Tests • Compute the deciles of unconditional returns and tabulate the relative frequency of conditional returns falling into decile j of the unconditional returns, j = 1,…,10. • Under an IID null and if the conditional and unconditional distributions are identical we can compute a goodness-of-fit Q statistic.
Quantile Tests • Under the null, the asymptotic distributions of the decile frequencies δ̂j and the corresponding Q statistic are given by: √n (δ̂j − 0.10) is asymptotically N(0, 0.10(1 − 0.10)), and Q = Σj (nj − 0.10n)² / (0.10n) is asymptotically χ² with 9 degrees of freedom, where nj is the number of observations in decile j and n is the total number of observations.
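A sketch of the decile-based goodness-of-fit calculation: form decile boundaries from the unconditional sample, count how the conditional returns fall across them, and compare the counts to the 10% expected in each cell. The two simulated samples are placeholders for the pattern-conditional and unconditional returns.

```python
import numpy as np
from scipy.stats import chi2

def decile_q(uncond, cond):
    """Pearson Q statistic over the deciles of the unconditional returns."""
    edges = np.quantile(uncond, np.linspace(0.1, 0.9, 9))   # 9 decile boundaries
    counts = np.bincount(np.digitize(cond, edges), minlength=10)
    n = len(cond)
    q = np.sum((counts - 0.1 * n) ** 2 / (0.1 * n))
    return q, chi2.sf(q, df=9)              # asymptotic chi-square with 9 df

rng = np.random.default_rng(2)
uncond = rng.standard_normal(10_000)                     # stand-in: unconditional returns
cond = 0.1 + 0.9 * rng.standard_normal(500)              # stand-in: pattern-conditional returns
q, p = decile_q(uncond, cond)
print(f"Q = {q:.2f}, p-value = {p:.4f}")
```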
Kolmogorov-Smirnov Test • Let {Z1t} and {Z2t} be two samples that are each IID with CDFs F1(z) and F2(z). The K-S statistic tests the null F1 = F2 based on the empirical distributions of the two samples. • Empirically, pick the point at which the two empirical distributions are most different and "normalize": γ = √(n1 n2 / (n1 + n2)) × max over z of |F̂1(z) − F̂2(z)|, where F̂1 and F̂2 are the empirical CDFs. • Smirnov (1939) derives the asymptotic distribution of this statistic and you can look up the values in a table. Small values of γ are consistent with the null; large values reject it.
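A sketch of the two-sample K-S statistic in the normalized form above; scipy's ks_2samp returns the unnormalized sup-distance and a p-value, and the explicit γ is computed alongside for comparison. The two simulated samples are placeholders.

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_gamma(z1, z2):
    """sqrt(n1*n2/(n1+n2)) times the largest gap between the two empirical CDFs."""
    n1, n2 = len(z1), len(z2)
    grid = np.sort(np.concatenate([z1, z2]))
    F1 = np.searchsorted(np.sort(z1), grid, side="right") / n1
    F2 = np.searchsorted(np.sort(z2), grid, side="right") / n2
    return np.sqrt(n1 * n2 / (n1 + n2)) * np.max(np.abs(F1 - F2))

rng = np.random.default_rng(3)
z1 = rng.standard_normal(2_000)             # stand-in: unconditional returns
z2 = 0.05 + rng.standard_normal(400)        # stand-in: conditional returns
print("gamma:", round(ks_gamma(z1, z2), 3))
print(ks_2samp(z1, z2))                      # library sup-distance and p-value
```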
A Problem • Stock returns are not IID. • LMW do some repair work to try to get empirical returns back to an identical distribution by subtracting the mean and dividing by the standard deviation, but it is not enough. • They do not address the independence part.
Data And Sampling • LMW apply their tests to daily returns from NYSE/AMEX and Nasdaq for 1962-1996. • There are seven non-overlapping 5-year subperiods. • Select 10 stocks at random from each of 5 market-cap quintiles in each subperiod. • Make sure that most observations are present. • This yields 50 stocks for each subperiod.
Computing Conditional Returns • For each pattern, compute the continuously compounded one-day return d days after the pattern has completed. • Then, find unconditional continuously compounded one-day returns as well. • Then, standardize by subtracting the (subperiod) mean and dividing by the standard deviation. • Lastly, combine the standardized returns of all stocks to increase the power of the tests. • This gives one unconditional and two conditional distributions.
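A sketch of the standardize-and-pool step; the dictionary layout, stock labels, and toy pattern dates are illustrative, and the only substantive pieces are the subperiod standardization and the d-day offset.

```python
import numpy as np

def standardize(returns):
    """Subtract the subperiod mean and divide by the subperiod std deviation."""
    r = np.asarray(returns, dtype=float)
    return (r - r.mean()) / r.std(ddof=1)

def pooled_conditional_returns(returns_by_stock, pattern_days_by_stock, d=3):
    """Pool standardized one-day returns observed d days after each pattern completes."""
    pooled = []
    for stock, r in returns_by_stock.items():
        z = standardize(r)
        for t in pattern_days_by_stock.get(stock, []):
            if t + d < len(z):
                pooled.append(z[t + d])
    return np.array(pooled)

rng = np.random.default_rng(4)
rets = {"A": 0.01 * rng.standard_normal(250), "B": 0.01 * rng.standard_normal(250)}
patterns = {"A": [40, 120], "B": [75]}       # toy pattern-completion days
print(pooled_conditional_returns(rets, patterns))
```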
Conditioning On Volume • Construct returns conditioning on increasing or decreasing (or neither) volume (because that's something technical analysts talk a lot about). • Look at the 1st and 2nd halves of each subperiod. • If volume in the 1st half is 20% higher than that in the 2nd, this is a decreasing-volume subperiod. • You can guess what increasing means.
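A small sketch of this volume classification rule; the function name is made up and the 20% threshold follows the slide.

```python
import numpy as np

def volume_trend(volume, threshold=0.20):
    """Compare average volume in the first and second halves of the period."""
    v = np.asarray(volume, dtype=float)
    first, second = v[: len(v) // 2].mean(), v[len(v) // 2:].mean()
    if first > (1.0 + threshold) * second:
        return "decreasing"                  # volume fell from the 1st half to the 2nd
    if second > (1.0 + threshold) * first:
        return "increasing"
    return "neither"
```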
Summary Statistics • Lots of double tops and bottoms (over 2000 of each). • Less frequent (1600 each) head and shoulders and inverted head and shoulders. • That is 4 – 6 occurrences of each pattern for each stock in 5 years. • The patterns are asymmetrical with respect to volume. For example, 409 broadening tops with increasing volume but only 143 with decreasing volume. (Volume and volatility are correlated.)
Summary Statistics • Frequency counts for geometric Brownian motion included in table 1. Only 577 and 578 HS and IHS patterns in a Brownian motion compared to 1611 and 1654 in the data. • Tables 3 and 4 report summary statistics for the returns distributions for NYSE/AMEX and Nasdaq stocks. • Note: patterns change the distributions • e.g., the 1st 4 moments of normalized unconditional returns are 0.000, 1.000, 0.345, 8.122 • Conditional on a BTOP they are 0.017, 0.910, 0.206, 3.386
Test Results • Goodness-of-fit • Table 5: conditional return dist’n’s different for NYSE/AMEX for 7 of 10 patterns. • Table 6: conditional return dist’n’s wildly different for Nasdaq stocks. • Kolmogorov-Smirnov • Table 7: for NYSE/AMEX conditional and unconditional dist’n’s different half the time. • Table 8: for Nasdaq conditional and unconditional dist’n’s different in all cases.
Volume • Conditioning on volume adds information only in a very few cases.
Conclusion • After all these years of scoffing, there may be something there. • However, there is little evidence that this method of identifying patterns yields any profits. • See the discussion by Jegadeesh in the JF.