This article explores the application of copulas in measuring dependence between random variables. It discusses the limitations of linear correlation and introduces alternative measures such as Spearman's rank correlation and Kendall's rank correlation. The role of copulas in calculating these measures is also explained.
MEASURES OF DEPENDENCE: Application of Copulas
1. Introduction • Correlation: one of the most widely used, and perhaps most misunderstood, concepts in statistics. • Advantages of linear correlation: • Straightforward to calculate. • Easy to manipulate under linear operations. • A natural measure of dependence in multivariate elliptical distributions.
1. Introduction (cont.) • Linear correlation: ρ(X, Y) = Cov(X, Y) / (σX σY) • -1 ≤ ρ(X, Y) ≤ 1 • X and Y independent ⟹ ρ(X, Y) = 0. • ρ(aX + b, cY + d) = sgn(ac) ρ(X, Y)
1. Introduction (cont.) • Fallacies about linear correlation: • The marginal distributions and the correlation coefficient determine the joint distribution. • Given marginals F1 and F2, all linear correlations between -1 and 1 are attainable.
1. Introduction (cont.) • Shortcomings: • The variances of X and Y must be finite, so ρ is not ideal for heavy-tailed distributions. • Zero correlation does not imply stochastic independence (the multivariate normal being the standard exception). • Not invariant under nonlinear strictly monotonic transformations.
1. Introduction (cont.) • Given two distributions F1 and F2, the attainable correlations are described by the following theorem: • Let (X, Y) be a random vector with marginals F1 and F2, unspecified dependence structure, and finite variances. Then: • The set of all attainable correlations is a closed interval [ρmin, ρmax], and for the extremal correlations ρmin < 0 < ρmax. • ρmin is attained iff X and Y are perfectly negatively dependent (countermonotonic); ρmax is attained iff X and Y are perfectly positively dependent (comonotonic). • ρmin = -1 iff X and -Y are of the same type; ρmax = 1 iff X and Y are of the same type.
1. Introduction (cont.) • Remark: two random variables are said to be of the same type if there exist a > 0 and b ∈ R such that Y =d aX + b. Conclusion: we need alternative measures of dependence.
2. Measures of dependence. • Definition 2.1 (concordant, discordant). Observations (x1, y1) and (x2, y2) are concordant if (x1 - x2)(y1 - y2) > 0. They are discordant if (x1 - x2)(y1 - y2) < 0. • Definition 2.2 (Spearman's rank correlation). Let X and Y be random variables with distributions F1 and F2 and joint distribution F. The population Spearman's rank correlation is given by ρs = ρ(F1(X), F2(Y)), where ρ is the usual linear correlation.
2. Measures of dependence. • The sample estimator of Spearman's rank correlation (for tie-free data) is given by rs = 1 - 6 Σi di² / (n(n² - 1)), where di = rank(xi) - rank(yi).
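The estimator is easy to check numerically. Below is a minimal Python sketch (the simulated data and variable names are ours, not from the slides) comparing the rank-difference formula with scipy.stats.spearmanr on tie-free data.

```python
# Minimal check of the rank-difference estimator against SciPy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x**3 + 0.5 * rng.normal(size=200)      # monotone signal plus noise

n = len(x)
d = stats.rankdata(x) - stats.rankdata(y)  # rank differences d_i
r_s = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

print(r_s, stats.spearmanr(x, y)[0])       # the two values should agree
```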
2. Measures of dependence. • Example 2.1.
2. Measures of dependence. • Example 2.2.
2. Measures of dependence. • Definition 2.3. Let {(Xi, Yi)}, 1 ≤ i ≤ n, be a random sample from a population (X, Y) with cdf F(x, y). There are C(n, 2) distinct pairs of observations in the sample, and each pair is either concordant or discordant. Let c be the number of concordant pairs and d the number of discordant pairs. Then an estimate of Kendall's rank correlation for the sample is given by t = (c - d) / (c + d). The population version of Kendall's rank correlation is given by τ = P[(X1 - X2)(Y1 - Y2) > 0] - P[(X1 - X2)(Y1 - Y2) < 0], where (X1, Y1) and (X2, Y2) are independent copies of (X, Y).
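A brute-force sketch of Definition 2.3 (illustrative data; O(n²) pairwise counting, assuming no ties so that c + d = C(n, 2)), checked against scipy.stats.kendalltau:

```python
# Brute-force concordant/discordant counting over all C(n, 2) pairs.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 0.8 * x + 0.6 * rng.normal(size=100)

c = d = 0
n = len(x)
for i in range(n):
    for j in range(i + 1, n):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            c += 1   # concordant pair
        elif s < 0:
            d += 1   # discordant pair

print((c - d) / (c + d), stats.kendalltau(x, y)[0])  # should agree
```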
2. Measures of dependence. • Note: the generalization of ρs and τ to dimensions n > 2 is similar to the procedure for linear correlation. • Theorem 2.1. Let X and Y be random variables with cdf's F1 and F2, and joint cdf F. Let κ denote either ρs or τ. The following hold: • κ is symmetric: κ(X, Y) = κ(Y, X). • If X and Y are independent then κ = 0. • -1 ≤ κ ≤ 1. • If T is strictly monotonic on the range of X, then κ(T(X), Y) = κ(X, Y) if T is increasing, and κ(T(X), Y) = -κ(X, Y) if T is decreasing (a numerical illustration follows).
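A short sketch of the last property with simulated data: applying the strictly increasing transform exp to X leaves the rank correlations unchanged, while the linear correlation changes.

```python
# Rank correlations are unchanged by the strictly increasing map exp;
# the linear correlation is not.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = x + rng.normal(size=500)

print(stats.kendalltau(x, y)[0], stats.kendalltau(np.exp(x), y)[0])  # equal
print(stats.spearmanr(x, y)[0], stats.spearmanr(np.exp(x), y)[0])    # equal
print(np.corrcoef(x, y)[0, 1], np.corrcoef(np.exp(x), y)[0, 1])      # differ
```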
2. Measures of dependence. • Advantages of these measures: • Invariant under strictly increasing transformations. • Perfect dependence corresponds to the extreme values ±1; κ = 1 iff Y = T(X) for some monotone increasing T. • Robust against outliers. • Disadvantages: • Not easy to manipulate; they are not moment-based correlations.
3. The role of copulas. • Theorem 3.1. Let X and Y be random variables with cdf's F1(x) and F2(y) and joint cdf F(x, y). Let C(u, v) be the copula of X and Y. Then we have the following alternative formulas for Spearman's and Kendall's correlations respectively: ρs = 12 ∫∫_{[0,1]²} C(u, v) du dv - 3 = 12 ∫∫_{[0,1]²} [C(u, v) - uv] du dv and τ = 4 ∫∫_{[0,1]²} C(u, v) dC(u, v) - 1.
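Both formulas can be sanity-checked numerically. The sketch below evaluates ρs = 12 ∫∫ C du dv - 3 for two copulas whose Spearman values are known in closed form (the independence copula gives 0, the comonotonicity copula M gives 1); the quadrature is approximate, especially for the kinked copula M.

```python
# rho_s from the copula: 12 * double integral of C over the unit square, minus 3.
from scipy import integrate

def rho_s(C):
    val, _ = integrate.dblquad(C, 0, 1, lambda u: 0, lambda u: 1)
    return 12 * val - 3

print(rho_s(lambda u, v: u * v))      # independence copula: ~0
print(rho_s(lambda u, v: min(u, v)))  # comonotonicity copula M: ~1
```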
3. The role of copulas. • Proof: we will use the following identity due to Hoeffding: Cov(X, Y) = ∫∫_{R²} [F(x, y) - F1(x) F2(y)] dx dy. Applying it to the uniform pair (F1(X), F2(Y)), whose joint cdf is C and whose marginal variances are 1/12, gives ρs = Cov(F1(X), F2(Y)) / (1/12) = 12 ∫∫_{[0,1]²} [C(u, v) - uv] du dv.
3. The role of copulas. • Proof (cont.): the proof of Kendall's identity will be presented in a more general setting (Theorem 3.2 below). • Note: the proof of the last part of Theorem 2.1 now follows easily from this theorem, since a strictly increasing transformation of X leaves the copula unchanged.
3. The role of copulas. • Example 3.1. Farlie-Gumbel-Morgenstern family of copulas: Cθ(u, v) = uv + θuv(1 - u)(1 - v), θ ∈ [-1, 1]. A direct computation gives τ = 2θ/9; note that τ ∈ [-2/9, 2/9] (likewise ρs = θ/3 ∈ [-1/3, 1/3]). Limited range of dependence!
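Since the FGM copula is absolutely continuous with density c(u, v) = 1 + θ(1 - 2u)(1 - 2v), τ = 4 ∫∫ C dC - 1 can be checked by ordinary double quadrature. A sketch (θ = 0.7 is an arbitrary choice):

```python
# tau for the FGM copula by quadrature: 4 * double integral of C*c, minus 1.
from scipy import integrate

theta = 0.7
C = lambda u, v: u * v + theta * u * v * (1 - u) * (1 - v)
c = lambda u, v: 1 + theta * (1 - 2 * u) * (1 - 2 * v)  # FGM density

val, _ = integrate.dblquad(lambda u, v: C(u, v) * c(u, v),
                           0, 1, lambda u: 0, lambda u: 1)
print(4 * val - 1, 2 * theta / 9)  # should agree
```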
3. The role of copulas. • Example 3.2. The Clayton-Cook-Johnson family: Cθ(u, v) = (u^(-θ) + v^(-θ) - 1)^(-1/θ), θ > 0. Here τ = θ/(θ + 2), which sweeps out the whole range (0, 1) as θ varies.
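The value τ = θ/(θ + 2) can also be confirmed by simulation. The sketch below samples from the Clayton copula via the standard gamma-frailty (Marshall-Olkin) construction, an algorithm not covered in the slides, and compares the empirical Kendall tau with the closed form.

```python
# Clayton sampling via a gamma frailty, then empirical Kendall tau.
import numpy as np
from scipy import stats

theta, n = 2.0, 20000
rng = np.random.default_rng(3)

w = rng.gamma(shape=1 / theta, scale=1.0, size=n)  # frailty variable
e1 = rng.exponential(size=n)
e2 = rng.exponential(size=n)
u = (1 + e1 / w) ** (-1 / theta)                   # (U, V) has the Clayton copula
v = (1 + e2 / w) ** (-1 / theta)

print(stats.kendalltau(u, v)[0], theta / (theta + 2))  # ~0.5 for theta = 2
```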
3. The role of copulas. • Theorem 3.2. Let (X1, Y1), (X2, Y2) be independent vectors of continuous random variables with joint cdf's H1 and H2 and common marginals F (of X1 and X2) and G (of Y1 and Y2); let C1 and C2 denote the copulas of (X1, Y1) and (X2, Y2) respectively. Let K be defined by K = P[(X1 - X2)(Y1 - Y2) > 0] - P[(X1 - X2)(Y1 - Y2) < 0]. Then K(C1, C2) = 4 ∫∫_{[0,1]²} C2(u, v) dC1(u, v) - 1.
3. The role of copulas. Proof: since the random variables are continuous, P[(X1 - X2)(Y1 - Y2) < 0] = 1 - P[(X1 - X2)(Y1 - Y2) > 0], hence K = 2 P[(X1 - X2)(Y1 - Y2) > 0] - 1. Decompose P[(X1 - X2)(Y1 - Y2) > 0] = P[X1 > X2, Y1 > Y2] + P[X1 < X2, Y1 < Y2].
3. The role of copulas. Proof (cont.) Conditioning on (X1, Y1) and integrating over its copula gives P[X1 < X2, Y1 < Y2] = ∫∫ C2(u, v) dC1(u, v) and P[X1 > X2, Y1 > Y2] = ∫∫ [1 - u - v + C2(u, v)] dC1(u, v) = ∫∫ C2(u, v) dC1(u, v), since the margins of C1 are uniform and so ∫∫ u dC1 = ∫∫ v dC1 = 1/2. Adding and substituting, K = 4 ∫∫ C2 dC1 - 1. The result follows.
3. The role of copulas. • Corollary 3.2. Under the hypotheses of Theorem 3.2 the following hold: • K is symmetric in its arguments: K(C1, C2) = K(C2, C1). • K is non-decreasing in each argument: if C1(u, v) ≤ C1'(u, v) for all (u, v) ∈ [0,1]², then K(C1, C2) ≤ K(C1', C2). • Copulas can be replaced by survival copulas in K, i.e. K(Ĉ1, Ĉ2) = K(C1, C2).
3. The role of copulas. • Remark 3.2.1. If C has a singular component, the integral in the formula for τ cannot be evaluated via a density. For such copulas the following expression is used: τ = 1 - 4 ∫∫_{[0,1]²} (∂C/∂u)(∂C/∂v) du dv. This is a consequence of the following theorem (Li et al., 2002): • Theorem 3.3. Let C1 and C2 be copulas. Then ∫∫_{[0,1]²} C1(u, v) dC2(u, v) = 1/2 - ∫∫_{[0,1]²} (∂C1/∂u)(∂C2/∂v) du dv.
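The partial-derivative formula is convenient to verify numerically, since ∂C/∂u is bounded (it is a conditional distribution function). A sketch using the Clayton copula, where τ = θ/(θ + 2) is known and the partials are available in closed form:

```python
# tau = 1 - 4 * double integral of (dC/du)(dC/dv), checked on the Clayton copula.
from scipy import integrate

theta = 2.0

def dCdu(u, v):
    # dC/du for Clayton; by symmetry, dC/dv at (u, v) equals dCdu(v, u).
    s = u ** (-theta) + v ** (-theta) - 1
    return u ** (-theta - 1) * s ** (-1 / theta - 1)

val, _ = integrate.dblquad(lambda u, v: dCdu(u, v) * dCdu(v, u),
                           0, 1, lambda u: 0, lambda u: 1)
print(1 - 4 * val, theta / (theta + 2))  # should agree
```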
3. The role of copulas. • Example 3.3. Recall M(u, v) = min(u, v), W(u, v) = max(u + v - 1, 0), Π(u, v) = uv. The supports of M and W are the main diagonal v = u and the secondary diagonal v = 1 - u of I², respectively. If g(u, v) is an integrable function with domain I², then ∫∫ g dM = ∫₀¹ g(t, t) dt and ∫∫ g dW = ∫₀¹ g(t, 1 - t) dt.
3. The role of copulas. • Example 3.3 (cont.) It follows that: τ(M) = 4 ∫₀¹ M(t, t) dt - 1 = 4 ∫₀¹ t dt - 1 = 1, τ(W) = 4 ∫₀¹ W(t, 1 - t) dt - 1 = -1, and τ(Π) = 4 ∫∫ uv du dv - 1 = 0.
3. The role of copulas. • If C is an Archimedean copula, then Kendall's tau can be evaluated directly using the generator. But first recall the theorem from last time: let C be an Archimedean copula generated by φ. Then the distribution function of the random variable C(U, V) is given by KC(t) = t - φ(t)/φ'(t⁺), 0 < t ≤ 1.
3. The role of copulas. • Theorem 3.4. Let X and Y be random variables with an Archimedean copula C generated by φ. Then the Kendall's tau of X and Y is given by τ = 1 + 4 ∫₀¹ φ(t)/φ'(t) dt. Proof: see Nelsen.
3. The role of copulas. • Example 3.4. For the Gumbel family, φ(t) = (-ln t)^θ, θ ≥ 1. Therefore φ(t)/φ'(t) = (t ln t)/θ, and τ = 1 + 4 ∫₀¹ (t ln t)/θ dt = 1 - 1/θ.
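A quick quadrature check of Theorem 3.4 for the Gumbel generator (θ = 3 is an arbitrary choice; the quadrature nodes are interior points, so the 0/0 limits of φ/φ' at t = 0 and t = 1 cause no trouble):

```python
# tau = 1 + 4 * integral of phi/phi' over (0,1) for Gumbel, vs. 1 - 1/theta.
import numpy as np
from scipy import integrate

theta = 3.0
phi = lambda t: (-np.log(t)) ** theta
dphi = lambda t: -theta * (-np.log(t)) ** (theta - 1) / t  # phi'(t)

val, _ = integrate.quad(lambda t: phi(t) / dphi(t), 0, 1)
print(1 + 4 * val, 1 - 1 / theta)  # should agree
```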
4. Tail dependence (Dr. Lin's request). • Definition 4.1. Let X and Y be random variables with cdf's F1 and F2. The coefficient of upper (lower) tail dependence of X and Y is λu = lim t→1⁻ P[Y > F2⁻¹(t) | X > F1⁻¹(t)] (λl = lim t→0⁺ P[Y ≤ F2⁻¹(t) | X ≤ F1⁻¹(t)]), provided the limit exists.
4. Tail dependence • Can we use copulas to understand tail dependence? Recall the definition of the survivor copula: C*(u, v) = u + v - 1 + C(1 - u, 1 - v). We now have: P[X > F1⁻¹(t), Y > F2⁻¹(t)] = 1 - 2t + C(t, t) = C*(1 - t, 1 - t), while P[X > F1⁻¹(t)] = 1 - t.
4. Tail dependence Therefore we get for the upper tail dependence: λu = lim t→1⁻ C*(1 - t, 1 - t)/(1 - t) = lim t→1⁻ (1 - 2t + C(t, t))/(1 - t). Likewise for the lower tail dependence we get: λl = lim t→0⁺ C(t, t)/t.
4. Tail dependence • Example 4.1. Gumbel copula: C(u, v) = exp[-{(-log u)^θ + (-log v)^θ}^(1/θ)]. Here C(t, t) = t^(2^(1/θ)), so λu = 2 - 2^(1/θ) and λl = 0. We see that for θ > 1, C has upper tail dependence.
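The limit can be watched converging numerically. A sketch evaluating (1 - 2t + C(t, t))/(1 - t) for the Gumbel copula as t → 1 (θ = 2, so the limit should be 2 - √2 ≈ 0.586):

```python
# Watch (1 - 2t + C(t,t))/(1 - t) approach 2 - 2**(1/theta) for Gumbel.
import numpy as np

theta = 2.0
C = lambda u, v: np.exp(-((-np.log(u)) ** theta
                          + (-np.log(v)) ** theta) ** (1 / theta))

for t in (0.9, 0.99, 0.999, 0.9999):
    print(t, (1 - 2 * t + C(t, t)) / (1 - t))
print("limit:", 2 - 2 ** (1 / theta))  # ~0.586
```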
4. Tail dependence • Example 4.2. Gaussian family of copulas. We need a different formula for λu. The following can be derived from the definition of conditional probability (for exchangeable copulas), and therefore the proof is not presented: λu = 2 lim u→1⁻ P[V > u | U = u].
4. Tail dependence • Example 4.2 (cont.) It can now be shown that, if X and Y have a bivariate normal distribution with correlation ρ < 1, then λu = 2 lim x→∞ (1 - Φ(x √((1 - ρ)/(1 + ρ)))) = 0. So the Gaussian copula does not have upper tail dependence.
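For the Gaussian copula, P[V > u | U = u] is available in closed form because V | U = u is Φ of a normal variable with mean ρΦ⁻¹(u) and variance 1 - ρ². A sketch showing the conditional exceedance probability vanishing even for fairly strong correlation:

```python
# 2 * P[V > u | U = u] for the Gaussian copula tends to 0 as u -> 1.
import numpy as np
from scipy.stats import norm

rho = 0.8
for u in (0.9, 0.99, 0.999, 0.9999):
    x = norm.ppf(u)
    # V | U = u is Phi of a Normal(rho * x, 1 - rho**2) variable, so:
    p = norm.sf(x * np.sqrt((1 - rho) / (1 + rho)))
    print(u, 2 * p)  # shrinks toward 0: no upper tail dependence
```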
4. Tail dependence • Theorem 4.1. Let C be a strict Archimedean copula with generator φ. If φ⁻¹'(0) is finite, then C(u, v) does not have upper tail dependence. If C has upper tail dependence, then φ⁻¹'(0) = -∞, and the coefficient of upper tail dependence is given by λu = 2 - 2 lim s→0⁺ φ⁻¹'(2s)/φ⁻¹'(s). • Theorem 4.2. Let φ be a strict generator. The coefficient of lower tail dependence for the copula C is given by λl = 2 lim s→∞ φ⁻¹'(2s)/φ⁻¹'(s).
4. Tail dependence • Example 4.3. Consider the Clayton family: C(u, v) = (u^(-θ) + v^(-θ) - 1)^(-1/θ). Then λl = 2^(-1/θ) for θ > 0 (and λu = 0). • Example 4.4. It can be shown that the Frank family does not have any tail dependence.
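Both examples can be checked from the limit λl = lim t→0⁺ C(t, t)/t. A sketch comparing the Clayton copula (limit 2^(-1/θ)) with the Frank copula (limit 0):

```python
# Lower-tail limit C(t,t)/t: Clayton tends to 2**(-1/theta), Frank to 0.
import numpy as np

theta = 2.0
clayton = lambda u, v: (u ** -theta + v ** -theta - 1) ** (-1 / theta)
frank = lambda u, v: -np.log1p(np.expm1(-theta * u) * np.expm1(-theta * v)
                               / np.expm1(-theta)) / theta

for t in (1e-2, 1e-4, 1e-6):
    print(t, clayton(t, t) / t, frank(t, t) / t)
print("Clayton limit:", 2 ** (-1 / theta))
```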