Break-points detection with atheoretical regression trees

Explore the significance of detecting structural breaks in time series data using regression trees. Learn about key methodologies like Chow test, CUSUM, and Fisher's algorithm for pinpointing break-points. Discover the application of regression trees in analyzing tree ring data from Campito Mountain and mean water levels of Michigan-Huron lakes.

Presentation Transcript


  1. Break-points detection with atheoretical regression trees. Marco Reale, University of Canterbury. Universidade Federal do Paraná, 27th November 2006.

  2. Acknowledgements The results presented are the outcome of joint work with: Carmela Cappelli and William Rea

  3. Structural Breaks • A structural break is a statement about parameters in the context of a specific model. • A structural break has occurred if at least one of the model parameters has changed value at some point (break-point). • We consider time series data.

  4. Relevance The detection of structural breaks is important for: • forecasting (the latest update of the data generating process, DGP); • analysis. On the latter point, a recently debated issue is fractional integration vs structural breaks.

  5. Milestones: Chow 1960 • Test for an a priori candidate break-point. • Splits the sample period into two subperiods and tests the equality of the two parameter sets with an F statistic. • It cannot be used when the break date is unknown: an arbitrarily chosen date can make the test uninformative, and a date chosen after looking at the data biases it.
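
The Chow statistic described above can be sketched in a few lines. This is a minimal illustration, assuming a simple linear model y = a + b*x with k = 2 parameters; the function names `rss_linear` and `chow_f` and the toy data are hypothetical and do not come from the presentation. Under the null of no break, the statistic is compared with an F(k, n - 2k) distribution.

```python
def rss_linear(x, y):
    """Residual sum of squares of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def chow_f(x, y, split, k=2):
    """Chow F statistic for a candidate break after the first `split` points.

    k is the number of regression parameters (intercept and slope here).
    """
    n = len(y)
    rss_pooled = rss_linear(x, y)
    rss_parts = rss_linear(x[:split], y[:split]) + rss_linear(x[split:], y[split:])
    return ((rss_pooled - rss_parts) / k) / (rss_parts / (n - 2 * k))


# Toy data (made up): a linear trend with small deterministic noise,
# with and without a level shift of 5 from observation 10 onwards.
x = list(range(20))
noise = [0.1 if i % 2 == 0 else -0.1 for i in x]
y_stable = [1 + 0.5 * xi + e for xi, e in zip(x, noise)]
y_broken = [v + (5 if i >= 10 else 0) for i, v in enumerate(y_stable)]

print(chow_f(x, y_stable, split=10))  # small: no evidence of a break
print(chow_f(x, y_broken, split=10))  # large: the parameter sets differ
```

The split point has to be supplied by the user, which is exactly the limitation the slide points out.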

  6. Milestones: Quandt 1960 • We can compute Chow statistics for all possible break-points and take the largest. • If the candidate break-point is known a priori, then a chi-square statistic can be used.
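
Quandt's idea of scanning every candidate break-point can be sketched as follows. This is an illustration only, assuming the simplest possible model (a single shift in the mean) rather than a general regression; `sup_f_mean_shift` and the `trim` parameter (the minimum number of observations kept on each side) are hypothetical names. The maximum of the scanned statistics does not follow the usual F or chi-square distribution, which is the gap the Andrews (1993) results address.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def sup_f_mean_shift(y, trim=2):
    """Scan all admissible split points and return (best_split, largest_F).

    For each split, F compares a single mean against two means
    (one restriction, n - 2 residual degrees of freedom).
    """
    n = len(y)
    total = sse(y)
    best_split, best_f = None, float("-inf")
    for split in range(trim, n - trim + 1):
        within = sse(y[:split]) + sse(y[split:])
        f_stat = (total - within) / (within / (n - 2))
        if f_stat > best_f:
            best_split, best_f = split, f_stat
    return best_split, best_f


# Toy series (made up): mean 0 for 8 points, then mean 3 for 8 points.
y = [0.1, -0.1] * 4 + [3.1, 2.9] * 4
split, f_max = sup_f_mean_shift(y)
print(split, f_max)
```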

  7. Milestones: CUSUM 1974 • Proposed by Brown, Durbin and Evans. • It checks the cumulative sum of the recursive residuals. • It tests the null of no break-points against one or more break-points.
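
A rough sketch of a CUSUM check follows, assuming a constant-mean model (k = 1 parameter) and one-step-ahead recursive residuals. The function name `cusum_mean` is hypothetical; the boundary constant a = 0.948 is the commonly tabulated 5% value for this test, and the straight-line boundaries are the usual ones through ±a√(n−k) at the start and ±3a√(n−k) at the end of the sample. Treat it as an illustration under these assumptions, not as the exact procedure used in the talk.

```python
import math


def cusum_mean(y, a=0.948):
    """CUSUM of recursive residuals for a constant-mean model.

    Returns (path, crossed): the scaled cumulative sums and whether
    they ever leave the boundary lines.
    """
    n, k = len(y), 1
    residuals = []
    running_sum = y[0]
    for t in range(1, n):  # t = number of observations seen before y[t]
        prev_mean = running_sum / t
        # One-step-ahead forecast error, standardized by its variance factor.
        residuals.append((y[t] - prev_mean) / math.sqrt(1 + 1 / t))
        running_sum += y[t]
    sigma = math.sqrt(sum(w * w for w in residuals) / (n - k))

    path, crossed, cum = [], False, 0.0
    for j, w in enumerate(residuals, start=1):  # j = t - k
        cum += w / sigma
        bound = a * math.sqrt(n - k) + 2 * a * j / math.sqrt(n - k)
        path.append(cum)
        if abs(cum) > bound:
            crossed = True
    return path, crossed


# Toy series (made up): small alternating noise, with and without a shift.
noise = [0.1 if i % 2 == 0 else -0.1 for i in range(40)]
flat = list(noise)
shifted = [e + (5 if i >= 20 else 0) for i, e in enumerate(noise)]

print(cusum_mean(flat)[1])     # stays inside the boundaries
print(cusum_mean(shifted)[1])  # leaves the boundaries after the shift
```

Note that the test only signals that some instability exists; it does not say how many breaks there are or where they lie.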

  8. Milestones: Andrews 1993 It exploits the Quandt statistics for a priori unknown break-points.

  9. Bai and Perron 1998, 2003 • It finds multiple breaks at unknown times. • Application of Fisher's algorithm (1958) to find optimal exhaustive partitions. • It requires a prior indication of the number of breaks. • Applied recursively after a positive indication provided by CUSUM. • Use of AIC to decide the number of breaks.

  10. Fisher’s algorithm

  11. Examples with G=2,3 and m=1

  12. Example with G=3 and m=2

  13. Bai, Perron and Fisher • In the end, Fisher's algorithm selects the partition with the minimum deviance. • It is a global optimizer, but in its original form it is computationally feasible only for very small n and G (even with today's computers). • Using later results in dynamic programming, Bai and Perron can run the Fisher algorithm reasonably fast for n=1000 and any G and m. • Fisher's algorithm is related to regression trees.
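
The dynamic-programming idea mentioned above can be illustrated with a small sketch. It assumes the simplest case, a piecewise-constant mean, and finds the partition of an ordered series into G contiguous groups with minimum total within-group sum of squares. The name `optimal_partition` is hypothetical, and this is not the Bai and Perron implementation; it only shows why the search becomes feasible (roughly G·n² segment evaluations instead of enumerating every partition).

```python
def optimal_partition(y, G):
    """Globally optimal split of y into G contiguous groups (least squares).

    Returns (breaks, deviance), where breaks are the start indices of
    groups 2..G and deviance is the total within-group sum of squares.
    """
    n = len(y)
    # Prefix sums give the sum of squares of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def seg_sse(i, j):  # deviance of y[i:j]
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(G + 1)]
    back = [[0] * (n + 1) for _ in range(G + 1)]
    cost[0][0] = 0.0
    for g in range(1, G + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):  # last group is y[i:j]
                c = cost[g - 1][i] + seg_sse(i, j)
                if c < cost[g][j]:
                    cost[g][j], back[g][j] = c, i

    # Walk the back-pointers to recover the group boundaries.
    breaks, j = [], n
    for g in range(G, 0, -1):
        j = back[g][j]
        if j > 0:
            breaks.append(j)
    breaks.reverse()
    return breaks, cost[G][n]


print(optimal_partition([1, 1, 1, 5, 5, 5, 9, 9, 9], G=3))
```

The number of groups G must be supplied, which matches the slide's remark that the approach needs a prior indication of the number of breaks.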

  14. Trees (1) • Trees are particular kinds of directed acyclic graphs. • In particular we consider binary trees. • Splits to reduce heterogeneity.

  15. Trees (2) Node 1 is called root. Node 5 is called leaf. The other nodes are called branches.

  16. Regression Trees (1) • Regression trees are sequences of hierarchical dichotomous partitions of the explanatory variables, each chosen to maximize the homogeneity of y within the resulting groups. • y is a control or response variable.

  17. Regression trees (2)

  18. Regression trees optimality Regression trees do not necessarily provide optimal partitions.

  19. Atheoretical Regression Trees • With any artificial strictly ascending or descending sequence as the covariate, e.g. {1,2,3,4,...}, each split of the tree is the optimal dichotomous partition of a contiguous stretch of the series. • The covariate also works as a counter. • It is not a theory-based covariate, hence the name atheoretical regression trees ... yes, it's ART. • ART is not a global optimizer.
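
The ART idea can be sketched without any tree library: because the only covariate is the time index, each node is a contiguous stretch of the series and each split is a candidate break-point. The code below is a minimal illustration under that reading; `art_breaks`, `min_size` and `min_gain` are hypothetical names, and the fixed `min_gain` stopping rule is a stand-in for the pruning step, not the authors' procedure.

```python
def segment_sse(y, lo, hi):
    """Deviance (sum of squares about the mean) of y[lo:hi]."""
    seg = y[lo:hi]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)


def grow(y, lo, hi, min_size, min_gain, breaks):
    """Recursively split y[lo:hi] at the cut with the largest deviance drop."""
    parent = segment_sse(y, lo, hi)
    best_gain, best_cut = 0.0, None
    for cut in range(lo + min_size, hi - min_size + 1):
        gain = parent - segment_sse(y, lo, cut) - segment_sse(y, cut, hi)
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    if best_cut is None or best_gain < min_gain:
        return  # this node becomes a leaf (one regime)
    breaks.append(best_cut)
    grow(y, lo, best_cut, min_size, min_gain, breaks)
    grow(y, best_cut, hi, min_size, min_gain, breaks)


def art_breaks(y, min_size=3, min_gain=1.0):
    """Break-points found by greedy binary splitting on the time index."""
    breaks = []
    grow(y, 0, len(y), min_size, min_gain, breaks)
    return sorted(breaks)


# Toy series (made up): three regimes with means 0, 3 and 8.
y = [0.1, -0.1] * 5 + [3.1, 2.9] * 5 + [8.1, 7.9] * 5
print(art_breaks(y))
```

Each split is optimal only for its own node, so the final partition is greedy rather than globally optimal, in line with the last bullet above. The naive deviance recomputation here is quadratic per node; prefix sums would reduce each node to a single pass.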

  20. Pruning the tree Trees tend to oversplit, so the overgrown tree needs a pruning procedure: • cross-validation, the usual procedure for regression trees, is not ideal in general for time series; • AIC (Akaike, 1973) tends to oversplit; • BIC (Schwarz, 1978) works very well. All the information criteria are robust to non-normality, especially BIC.
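
To make the role of the information criteria concrete, the sketch below scores candidate sets of break-points for a piecewise-constant mean. It assumes the common Gaussian form n·log(RSS/n) + penalty × (number of fitted means), with penalty log(n) for BIC and 2 for AIC; counting only the segment means as parameters is an assumption made here, and all function names are hypothetical.

```python
import math


def piecewise_rss(y, breaks):
    """Residual sum of squares when each segment is fitted by its own mean."""
    edges = [0] + sorted(breaks) + [len(y)]
    rss = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = y[lo:hi]
        mean = sum(seg) / len(seg)
        rss += sum((v - mean) ** 2 for v in seg)
    return rss


def information_criterion(y, breaks, penalty):
    """n*log(RSS/n) + penalty * (number of segment means)."""
    n = len(y)
    return n * math.log(piecewise_rss(y, breaks) / n) + penalty * (len(breaks) + 1)


def bic(y, breaks):
    return information_criterion(y, breaks, math.log(len(y)))


def aic(y, breaks):
    return information_criterion(y, breaks, 2.0)


# Toy series (made up) with true breaks at 10 and 20, and a nested
# sequence of candidate partitions such as a pruning sequence might give.
y = [0.1, -0.1] * 5 + [3.1, 2.9] * 5 + [8.1, 7.9] * 5
candidates = [[], [20], [10, 20], [5, 10, 20]]
best = min(candidates, key=lambda b: bic(y, b))
print(best)
```

Because log(n) exceeds 2 once n is larger than about 7, BIC charges more per extra segment than AIC, which is one way to see why AIC is the more likely of the two to keep spurious splits.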

  21. Single break simulations

  22. Noisy square simulations

  23. CUSUM on noisy square

  24. ART on noisy square

  25. Some comments • The simulations show an excellent performance. • However, ART performs better with long regimes. • With short regimes it tends to find spurious breaks, but the performance can be appreciably improved with an enhanced pruning technique (ETP).

  26. Bai and Perron on noisy square

  27. Some comments • BP tends to find breaks any time the CUSUM rejects the null. • It is unlikely to find spurious breaks, but • it tends to underestimate the number of breaks.

  28. Application to Michigan-Huron • The Michigan-Huron lakes play a very important role in the U.S. economy and hence they are regularly monitored. • In particular we consider the mean water level (over one year) time series from 1860 to 2000.

  29. Michigan-Huron (2)

  30. Michigan-Huron (3)

  31. Michigan-Huron (4)

  32. Campito Mountain • We applied ART to the Campito Mountain Bristlecone Pine data, an unbroken set of tree ring widths covering the period 3435 BC to 1969 AD. Tree ring data are used as proxies for past climatic conditions. A series of this length can be analyzed by ART in a few seconds. BPP was applied to the series and took more than 200 hours of CPU time to complete.

  33. Campito Mountain (2)

  34. Campito Mountain (3)

  35. The four most recent periods… …are: • 1863-1969: Industrialization and global warming. • 1333-1862: The Little Ice Age. • 1018-1332: The Medieval Climate Optimum. • 862-1017: Extreme drought in the Sierra Nevadas.

  36. Niceties of ART • Speed: ART is O(n(t)) while BP is O(nng). • Simplicity: it can be easily implemented or run with packages implementing regression trees. • Feasibility: it can be used with almost no limitation on either the number of observations or the number of segments. • Visualization: it results in a hierarchical tree diagram that allows a priori knowledge to be incorporated.

  37. …and ... and of course you can say you're doing ART

  38. Dedicated to Paulo
