Time Series, Nonsense Correlations and the PCC

Time Series, Nonsense Correlations and the PCC Julian Reiss Complutense University, Madrid and Centre for Philosophy, LSE

Overview • The PCC in the context of time series • A Well-Known Problem: British Bread Prices and Venetian Sea Levels • Two Attempts to Fix it: • Defusing the problem • Arguing that the problem does not impugn the usefulness of the principle • I am going to argue that neither strategy will work • Finally, I am going to present a reformulated version of the principle, which avoids the problems here discussed but at the expense of virtual uselessness • Lesson: Don’t use the PCC in time-series analysis

The Principle of the Common Cause • Here’s a version of it: PCC. If two variables X and Y are correlated then either • X causes Y • Y causes X or • Z, a common cause, causes both X and Y • Versions of this principle are at the heart of all probabilistic methods of causal inference (most prominently, of course, Bayes’ Nets) • The focus here is on applications to time series, i.e. time-ordered measurements: $X = \{X_t, X_{t+1}, X_{t+2}, \ldots, X_{t+n}\}$ • Important in fields as diverse as neurophysiology, climatology, epidemiology, astro- and geophysics and many of the social sciences

Bread Prices and Sea Levels • Years ago, Elliott Sober introduced the following counterexample to the literature: “Consider the fact that the sea level in Venice and the cost of bread in Britain have both been on the rise in the past two centuries. Both, let us suppose, have monotonically increased. Imagine that we put this data in the form of a chronological list; for each date, we list the Venetian sea level and the going price of British bread. Because both quantities have increased steadily with time, it is true that higher than average sea levels tend to be associated with higher than average bread prices. The two quantities are very strongly positively correlated.” • And yet, ex hypothesi, not causally connected
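The arithmetic behind Sober's point can be sketched in a few lines. The data below are made up for illustration (they are not Sober's figures): one series rises linearly, the other by compound growth, and nothing links them except that both increase with time. The function name `pearson` and both growth rules are assumptions of this sketch.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, NOT Sober's figures: two unrelated rules that both rise every year.
years = list(range(200))
sea_level = [0.3 * t for t in years]        # steady linear rise (arbitrary units)
bread_price = [1.02 ** t for t in years]    # 2% compound growth (arbitrary units)

r = pearson(sea_level, bread_price)
print(f"Pearson r = {r:.2f}")
```

The coefficient comes out strongly positive even though the two generating rules share nothing but the time index.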

Defusing the Counterexample • In a recent article, Kevin Hoover has tried to show that the counterexample is only apparent • Distinguish two stages of inference from observations: • Statistical inference: from frequencies to probabilities • Causal inference: from probabilities to causal relations • Sober committed a fallacy at the first stage of the inference: the two series are associated (at the level of frequencies) but not correlated (at the level of probabilities) • One can see this readily when one considers that statistical inference is always conducted against a probability model; regarding Sober’s series as correlated means that the probability model one thinks most likely to be true of the series is one with stable moments (e.g. an IN process)—but simple data analysis shows that this is probably not the case

Defusing the Counterexample • For a series whose moments change (in this case, the mean increases monotonically in time), Pearson’s correlation coefficient is not the right measure of correlation; we need a different measure that is adequate to the situation

Defusing the Counterexample • Hoover’s use of terms is highly non-standard: • The correlation coefficient is defined for any two variables, not only variables whose moments are stable over time • Alternative, non-parametric measures yield the same verdict (e.g., Spearman’s rank correlation coefficient is unity) • If the problem were one of statistical inference, we would expect the association to disappear with a larger sample, but this isn’t the case here
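The remark about Spearman's coefficient can be checked directly: for any two strictly increasing series the ranks coincide, so the rank correlation is exactly one whatever the sample size. The sketch below uses hypothetical series (not Sober's data); the helper names `ranks` and `spearman` are assumptions of the example.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n; assumes no ties, which holds for strictly increasing data."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman's rank correlation: Pearson's coefficient applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical strictly increasing series of growing length.
results = {}
for n in (20, 200, 2000):
    sea_level = [0.3 * t for t in range(n)]
    bread_price = [1.002 ** t for t in range(n)]
    results[n] = spearman(sea_level, bread_price)

print(results)
```

Lengthening the sample leaves the rank correlation at unity, which is the sense in which the association does not wash out with more data.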

Defusing the Counterexample • But never mind standard usage; what if Hoover means to say: Correlation is a theoretical concept that needs to be operationalised differently in different contexts • Pearson’s coefficient is an inappropriate measure of correlation when time series are non-stationary; two integrated series are correlated if and only if they are co-integrated

Defusing the Counterexample • Stationary: A time series is weakly (or covariance) stationary if, and only if, its mean and variance are both finite and independent of time, and the covariance between the values of the series at different times depends only on the temporal distance between them • Integrated: Let $d$ be the minimum integer such that $\{\Delta^d X_t\}$ is weakly stationary. Then $\{X_t\}$ is said to be integrated of order $d$, which is notated $I(d)$. (By convention, a stationary time series is notated as $I(0)$.) • Co-integrated: Two time series $\{X_t\}$ and $\{Y_t\}$ are cointegrated if, and only if, each is $I(1)$ and a linear combination $\{X_t - \beta_0 - \beta_1 Y_t\}$, where $\beta_1 \neq 0$, is $I(0)$.
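The definition of order of integration can be made concrete with a small sketch. It builds white noise (stationary by construction), cumulates it once to get a random walk and twice to get a series that needs two differences, then checks that differencing undoes the cumulation step by step. The helper names `difference` and `cumulate`, the seed, and the sample size are assumptions of the example.

```python
import random

def difference(xs, d=1):
    """Apply the first-difference operator d times: (delta x)_t = x_t - x_{t-1}."""
    for _ in range(d):
        xs = [b - a for a, b in zip(xs, xs[1:])]
    return xs

def cumulate(xs):
    """Running sum, the inverse of differencing up to the starting value."""
    total, out = 0.0, []
    for x in xs:
        total += x
        out.append(total)
    return out

random.seed(0)  # arbitrary seed, for reproducibility only
noise = [random.gauss(0.0, 1.0) for _ in range(500)]  # white noise, I(0)
walk = cumulate(noise)                                # random walk, I(1)
double_walk = cumulate(walk)                          # cumulated random walk, I(2)

once = difference(double_walk, 1)   # recovers walk[1:], still a random walk
twice = difference(double_walk, 2)  # recovers noise[2:], stationary again

max_gap_once = max(abs(a - b) for a, b in zip(once, walk[1:]))
max_gap_twice = max(abs(a - b) for a, b in zip(twice, noise[2:]))
print(max_gap_once, max_gap_twice)
```

Both gaps are zero up to floating-point rounding: one difference of the doubly cumulated series is still a random walk, and only the second difference returns the stationary noise, so $d = 2$ is the minimum here.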

Defusing the Counterexample • $Y_t = Y_{t-1} + \varepsilon_{Y,t}$, $X_t = .5X_{t-1} + \varepsilon_{X,t}$ • $Y_t = Y_{t-1} + \varepsilon_{Y,t}$, $X_t = .5Y_{t-1} + \varepsilon_{X,t}$
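The two pairs of equations on this slide can be simulated under one reading of them, which is an assumption of this sketch: in both pairs Y is a random walk with its own disturbance; in the first pair X follows its own lag with coefficient .5, in the second X follows lagged Y with coefficient .5. Disturbances are taken to be independent standard normal draws, and the function name `simulate` is hypothetical.

```python
import random

random.seed(1)  # arbitrary seed, for reproducibility only
n = 500
eps_y = [random.gauss(0.0, 1.0) for _ in range(n)]  # disturbances of Y (assumed N(0, 1))
eps_x = [random.gauss(0.0, 1.0) for _ in range(n)]  # disturbances of X (assumed N(0, 1))

def simulate(x_depends_on_y):
    """Return (x, y) where y is a random walk and x follows either its own lag or y's lag."""
    y, x = [0.0], [0.0]
    for t in range(1, n):
        y.append(y[t - 1] + eps_y[t])
        lagged = y[t - 1] if x_depends_on_y else x[t - 1]
        x.append(0.5 * lagged + eps_x[t])
    return x, y

x1, y1 = simulate(False)  # first pair: X ignores Y entirely
x2, y2 = simulate(True)   # second pair: X is driven by lagged Y

# In the second pair, X_t - .5 * Y_t collapses to eps_x[t] - .5 * eps_y[t] for t >= 1.
combo = [x2[t] - 0.5 * y2[t] for t in range(1, n)]
expected = [eps_x[t] - 0.5 * eps_y[t] for t in range(1, n)]
max_gap = max(abs(a - b) for a, b in zip(combo, expected))
print(max_gap)
```

On this reading the second pair has a linear combination (intercept 0, slope .5) that is pure noise and hence stationary, while in the first pair X is generated without any reference to Y.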

Defusing the Counterexample • Distinguish: • “Spurious” correlation: correlation is only apparent (e.g. because the wrong measure has been used) • “Nonsense” correlation: correlation is real but it does not have a causal explanation • Hoover effectively denies that there are any nonsense correlations • The problem is that integratedness is only one source of non-stationarity, and non-stationarity is only one source of nonsense correlation

Defusing the Counterexample • Integratedness (or unit roots) is sometimes called a stochastic source of non-stationarity; there are also deterministic sources: • A deterministic trend: $X_t = \alpha t + \theta_X X_{t-1} + \varepsilon_{X,t}$ • Breaks in deterministic parameters (e.g., the mean of a series or its trend) • (to be fair, Hoover considers so-called trend-stationary series, but we’ll see below that prior detrending isn’t always a good idea) • Correlations can be nonsense even in stationary series: • $X_t = \theta_X X_{t-1} + \varepsilon_{X,t}$, $Y_t = \theta_Y Y_{t-1} + \varepsilon_{Y,t}$ • Moving averages
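The claim about stationary series can be illustrated with a small Monte Carlo sketch. It draws many pairs of independent first-order autoregressive series and counts how often the sample correlation exceeds 2/sqrt(n), roughly the 5% bound that would apply to white noise. The autoregressive coefficient 0.9, the sample size, the number of replications and the function names are all assumptions of the example.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ar1(theta, n, rng, burn_in=100):
    """Simulate x_t = theta * x_{t-1} + e_t with standard normal e_t, dropping a burn-in."""
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = theta * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out

def share_large(theta, n=100, reps=300, seed=0):
    """Share of independent pairs whose |r| exceeds 2/sqrt(n)."""
    rng = random.Random(seed)
    bound = 2.0 / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        x = ar1(theta, n, rng)
        y = ar1(theta, n, rng)  # drawn independently of x
        if abs(pearson(x, y)) > bound:
            hits += 1
    return hits / reps

share_white_noise = share_large(theta=0.0)
share_persistent = share_large(theta=0.9)
print(share_white_noise, share_persistent)
```

With no autocorrelation the bound is exceeded only occasionally; with strongly autocorrelated but still stationary and causally unrelated series it is exceeded far more often.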

Defusing the Counterexample • Furthermore, there are various sources of nonsense correlation that have nothing to do with time series as such • Population heterogeneity • Selection/sampling bias • Mathematical/conceptual/logical links • Etc.

Defusing the Counterexample • Though I do not have a proof for this claim, I doubt that case-by-case measures of correlation can be found that do not beg the question • Furthermore, Hoover’s recipe just shifts the problem up one level: cointegration, too, can be spurious • Ironically, Sober’s series appear cointegrated ($\beta_0 = 20.25$; $\beta_1 = .54$)

Insulating the PCC • Hence, I think we can justly conclude that Sober’s counterexample is genuine • But maybe, after all, it doesn’t matter so much because (a) Sober-like scenarios are rare; (b) data can be prepared prior to analysis and thus we can insulate the PCC from failures • Quickly on (a): it would be dumb to use an inference method we know sometimes fails even if failures are rare; but, importantly, failures aren’t rare—they’re ubiquitous

Insulating the PCC • But, perhaps, we can do something with the data before applying the PCC to it (Steel 2003): “[T]he above discussion illustrates how researchers interested in drawing conclusions from statistical data can design their investigation so that counter-examples like Sober’s are not a concern. For instance, if the series is non-stationary but transformable into a stationary one via differentiating with respect to time, then differentiate. Then PCC can be invoked without concern for the difficulty illustrated by the Venice-Britain example.” • Unfortunately, this, too, doesn’t work

Insulating the PCC • Differencing is only effective when series are integrated of order 1—many series are not: • Differencing won’t be effective in stationary time series (cf. discussion above) • Nor in fractionally integrated series • You’ll have to difference several times if series are integrated of an order > 1 • Moreover, we can lose important information through differencing: • Information on long-run behaviour • Information about co-breaking
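One way to see how differencing can discard long-run information is a sketch with a pair of series that are tied together in levels. The data-generating process is an assumption of the example (Y a random walk, X equal to half of lagged Y plus noise), not something taken from the cases discussed above.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(2)  # arbitrary seed, for reproducibility only
n = 2000
y, x = [0.0], [0.0]
for t in range(1, n):
    y.append(y[t - 1] + random.gauss(0.0, 1.0))        # random walk
    x.append(0.5 * y[t - 1] + random.gauss(0.0, 1.0))  # tracks half of lagged Y

dx = [b - a for a, b in zip(x, x[1:])]  # first differences of X
dy = [b - a for a, b in zip(y, y[1:])]  # first differences of Y

r_levels = pearson(x, y)
r_diffs = pearson(dx, dy)
print(f"levels: {r_levels:.2f}, differences: {r_diffs:.2f}")
```

In levels the two series move together closely; the same-period differences are close to uncorrelated. In this construction the dependence survives only between the change in X and the previous period's change in Y, and the levels relation itself is no longer visible in the differenced data.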

Insulating the PCC • Other off-the-shelf correction methods don’t fare better • Detrending can lead to spurious results when series are integrated (e.g., detrending can lead to spurious co-integration) • Detrending isn’t always effective: Sober’s series remain highly correlated after detrending • Compare: “Applying the program [that incorporates the PCC] to real data requires a lot of adaptation to particular circumstances: […] data must be differenced to remove auto-correlation…” (Clark Glymour, philosopher) “A Simple Message to Autocorrelation Correctors: Don’t.” (Grayham Mizon, econometrician)
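That detrending need not remove the association can be sketched with made-up series (again, not Sober's data). Both rise monotonically but with curvature, so removing a fitted linear trend leaves similarly shaped residuals. The choice of a linear least-squares trend, the two growth rules and the function name `detrend` are assumptions of the example; a different trend model would give different residuals.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def detrend(values):
    """Residuals from a least-squares fit of values on a constant and a linear time trend."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
             / sum((t - mean_t) ** 2 for t in range(n)))
    return [v - mean_v - slope * (t - mean_t) for t, v in enumerate(values)]

# Hypothetical data: both series increase every period, neither exactly linearly.
sea_level = [0.001 * t ** 2 for t in range(200)]  # accelerating rise
bread_price = [1.02 ** t for t in range(200)]     # compound growth

r_before = pearson(sea_level, bread_price)
r_after = pearson(detrend(sea_level), detrend(bread_price))
print(f"before: {r_before:.2f}, after linear detrending: {r_after:.2f}")
```

The residuals stay strongly correlated, because a straight line cannot absorb the shared curvature of the two series.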

Eliminative Induction • I think the best we can do is along the following lines: PCC*: A correlation between two variables X and Y is explanation-seeking. If all kinds of non-causal (e.g., statistical, logical, mathematical, conceptual, nomological) explanation can be ruled out, then either X causes Y, Y causes X or X and Y are the joint effects of a common cause Z, which screens off X and Y • But of course, this is of little help since other explanations can almost never be eliminated when time series are concerned

PCC—What’s it good for? • The PCC seems applicable AT BEST to systems that • are shielded from outside influences (in order to avoid deterministic disturbances) • have no internal dynamics (in order to avoid stochastic disturbances) • are run over and over again (in order to get correlations to begin with) • Aren’t these conditions typically met in experimental set-ups? • But if we can experiment, why use the PCC?
