Correlation, dependence and copulas • Multivariate distributions • Covariance and correlation • Dependence vs. correlation • Copulas Most slides borrowed from Chanyoung Park
Joint probability density function • With two random variables X and Y, the joint PDF f(x, y) describes how probability is distributed over pairs of values • The definition means that the integral of the joint PDF from minus infinity to infinity over both variables is 1.
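The formulas on this slide are not present in the text. In standard notation (the symbol $f_{X,Y}$ and the region $A$ are assumptions, not taken from the slide), the joint PDF and its normalization can be written as:

```latex
P\big((X,Y)\in A\big) = \iint_{A} f_{X,Y}(x,y)\,dx\,dy,
\qquad
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx\,dy = 1 .
```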
Marginal PDF • Bivariate RVs • 6 randomly generated samples of (X1, X2): {(1,2) (2,4) (4,5) (2,2) (3,4) (3,2)} • Keeping only X1: {1,2,4,2,3,3} → a PDF for X1 alone? • Keeping only X2: {2,4,5,2,4,2} → a PDF for X2 alone? • Marginal PDF • A PDF for only one RV • A marginal distribution does not depend on whether the RVs are correlated.
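As a minimal sketch of the idea, the six pairs above can be split into their marginal samples in MATLAB. Because the values are discrete, the sketch tabulates relative frequencies rather than a continuous PDF; the variable names are illustrative, not from the slides.

```matlab
% Six (X1, X2) samples from the slide, one pair per row
samples = [1 2; 2 4; 4 5; 2 2; 3 4; 3 2];

x1 = samples(:, 1);   % marginal samples of X1: X2 is simply ignored
x2 = samples(:, 2);   % marginal samples of X2: X1 is simply ignored

% Empirical marginal probabilities: relative frequency of each distinct value
[v1, ~, idx1] = unique(x1);
p1 = accumarray(idx1, 1) / numel(x1);
[v2, ~, idx2] = unique(x2);
p2 = accumarray(idx2, 1) / numel(x2);

disp([v1 p1]);   % distinct values of X1 with their relative frequencies
disp([v2 p2]);   % distinct values of X2 with their relative frequencies
```

Shuffling the rows of one column independently of the other would leave both tables unchanged while destroying any pairing between X1 and X2.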
Marginal PDF does not reflect correlation • Two cases with the same marginal distributions • Two RVs X1 and X2 and their marginal distributions • [Figure: two panels, correlated RVs (bivariate RVs) and uncorrelated RVs]
Is there any correlation between economic freedom and income? • Note: I (Chanyoung) am not a supporter of free markets
Correlated variables • For a normal distribution, one can use MATLAB's mvnrnd • R = MVNRND(MU,SIGMA,N) returns an N-by-D matrix R of random vectors chosen from the multivariate normal distribution with 1-by-D mean vector MU and D-by-D covariance matrix SIGMA.
Example
mu = [2 3];
sigma = [1 1.5; 1.5 3];
r = mvnrnd(mu,sigma,20);
plot(r(:,1),r(:,2),'+')
What is the correlation coefficient?
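One way to approach the question: the population correlation coefficient follows directly from the covariance matrix, and a sample estimate can be compared against it. The sketch below assumes a larger sample size than the 20 points on the slide and a fixed random seed, both chosen only for illustration.

```matlab
mu = [2 3];
sigma = [1 1.5; 1.5 3];

% Population correlation: covariance divided by the product of standard deviations
rhoTrue = sigma(1,2) / sqrt(sigma(1,1) * sigma(2,2));

% Sample estimate from a larger draw than on the slide (sample size is an assumption)
rng(0);                               % fixed seed for repeatability
r = mvnrnd(mu, sigma, 5000);
R = corrcoef(r(:,1), r(:,2));         % 2-by-2 matrix of correlation coefficients
rhoSample = R(1,2);

fprintf('population rho = %.4f, sample rho = %.4f\n', rhoTrue, rhoSample);
```

With these numbers the population value is 1.5 / sqrt(1 × 3), which is about 0.866.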
Correlation images • Two correlated random variables • Two RVs X1 and X2 • Plotting 1000 pairs of X1 and X2 • Correlation is not the slope of the linear regression of two RVs. Why? • [Figure: scatter plots of correlated RVs (bivariate RVs) and uncorrelated RVs]
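A short numerical check may help with the "Why?": for least-squares regression of one variable on the other, the slope equals the correlation coefficient scaled by the ratio of standard deviations, so the two agree only when the standard deviations are equal. The sketch below reuses the mean and covariance from the mvnrnd example; the seed is an assumption.

```matlab
rng(0);                               % fixed seed for repeatability
sigma = [1 1.5; 1.5 3];               % covariance from the mvnrnd example
r = mvnrnd([2 3], sigma, 1000);
x = r(:,1);  y = r(:,2);

p = polyfit(x, y, 1);                 % p(1) is the least-squares slope of y on x
slope = p(1);
R = corrcoef(x, y);
rho = R(1,2);

% Identity for least-squares regression: slope = rho * std(y) / std(x)
slopeFromRho = rho * std(y) / std(x);
fprintf('slope = %.3f, rho = %.3f, rho*sy/sx = %.3f\n', slope, rho, slopeFromRho);
```

Rescaling y (for example, changing its units) changes the slope but leaves the correlation coefficient untouched.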
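Probability with bivariate RVs • Probability • How to calculate the probabilities P(X1<x1), P(X2<x2) and P(X1<x1, X2<x2) • Comparing correlated RVs (bivariate RVs) with independent RVs that have the same marginals: P(X1<x1) is the same in both cases and P(X2<x2) is the same in both cases, but the joint probability P(X1<x1, X2<x2) is not • [Figure: the regions X1<x1 and X2<x2 in the (x1, x2) plane for correlated and for independent RVs]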
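The comparison can be reproduced numerically for a bivariate standard normal, a case chosen here only because MATLAB's mvncdf handles it directly. The thresholds and the correlation value 0.8 are assumptions for illustration.

```matlab
x   = [0 0];                          % thresholds (x1, x2); arbitrary choice
rho = 0.8;                            % assumed correlation for the correlated case

pMarg1 = normcdf(x(1));               % P(X1 < x1), same in both cases
pMarg2 = normcdf(x(2));               % P(X2 < x2), same in both cases

pIndep = mvncdf(x, [0 0], eye(2));            % joint probability, independent case
pCorr  = mvncdf(x, [0 0], [1 rho; rho 1]);    % joint probability, correlated case

fprintf('product of marginals = %.4f\n', pMarg1 * pMarg2);
fprintf('joint, independent   = %.4f\n', pIndep);
fprintf('joint, correlated    = %.4f\n', pCorr);
```

For zero thresholds the correlated value has the closed form 1/4 + asin(rho)/(2π), which gives a quick sanity check on the output.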
Statistical independence • Two events A and B are independent if the occurrence of one event does not change the probability of the other event. • The two events are independent if and only if P(A and B) = P(A) P(B) • Similarly, two random variables X, Y are independent if and only if their joint PDF factors into the marginal PDFs: f(x, y) = fX(x) fY(y)
Conditional probability • The PDF of X for a specified y is the conditional PDF of X given y: f(x | y) = f(x, y) / fY(y) • If X and Y are independent, f(x | y) = fX(x)
Example of dependent but uncorrelated variables • Let X be N(0,1) and let • Then • Can you show with a simple example that these two variables are not independent?
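The slide's formula for the second variable is not present in the text. A common textbook choice, assumed here purely for illustration, is Y = X^2: the covariance is E[X^3] = 0, yet Y is completely determined by X. A minimal numerical sketch under that assumption:

```matlab
rng(0);                               % fixed seed for repeatability
n = 1e5;
x = randn(n, 1);                      % X ~ N(0,1)
y = x.^2;                             % assumed choice of Y (not taken from the slide)

R = corrcoef(x, y);
rho = R(1,2);                         % close to zero, since cov(X,Y) = E[X^3] = 0

% Dependence check: knowing |X| > 1 forces Y > 1
pY       = mean(y > 1);               % unconditional estimate of P(Y > 1)
pYgivenX = mean(y(abs(x) > 1) > 1);   % conditional estimate of P(Y > 1 | |X| > 1)

fprintf('rho = %.4f, P(Y>1) = %.3f, P(Y>1 | |X|>1) = %.3f\n', rho, pY, pYgivenX);
```

If the variables were independent, the conditional and unconditional probabilities would match; here the conditional one is exactly 1.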
Describing joint distributions efficiently • When the joint distribution is normal, the means and covariance matrix do the job. • When the joint distribution is not normal, we look for other devices, and a copula function is the current favorite. • Note that the marginal distributions can be normal while the joint distribution is not normal. How can that happen?
How to calculate probability of bivariate RVs • Copula • For independent RVs, P(X1<x1, X2<x2) = P(X1<x1) P(X2<x2) • For correlated RVs, P(X1<x1, X2<x2) = C(P(X1<x1), P(X2<x2), θ) • C is a copula function and θ is a parameter, tied to a correlation coefficient, that is needed to get P(X1<x1, X2<x2) • Correlation coefficient • Measures of correlation for bivariate RVs: linear correlation coefficient (Pearson's rho), Kendall's tau and Spearman's rho • Linear correlation coefficient: only capable of measuring a linear relationship, and unduly influenced by outliers • Kendall's tau: based on the probability of concordance and the probability of discordance • Spearman's rho: same concept as Pearson's rho, but computed from the ranks of the data rather than the data values.
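To make the copula formula concrete, the sketch below evaluates a joint probability with a Frank copula and compares it with the independent case. The family, the Kendall's tau of 0.5, the standard normal marginals and the thresholds are all assumptions for illustration; copulaparam and copulacdf are Statistics and Machine Learning Toolbox functions.

```matlab
tau   = 0.5;                                  % assumed Kendall's tau
alpha = copulaparam('Frank', tau);            % copula parameter matching that tau

x1 = 0.3;  x2 = -0.2;                         % arbitrary thresholds
u  = normcdf(x1);                             % P(X1 < x1) for a standard normal marginal
v  = normcdf(x2);                             % P(X2 < x2) for a standard normal marginal

pIndep  = u * v;                              % independent RVs: product of marginals
pCopula = copulacdf('Frank', [u v], alpha);   % dependent RVs: C(u, v; alpha)

fprintf('independent: %.4f, Frank copula: %.4f\n', pIndep, pCopula);
```

The marginal probabilities u and v enter both results unchanged; only the function that combines them differs.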
Kendall's tau coefficient • Let (x1, y1), (x2, y2), …, (xn, yn) be a set of joint observations from two random variables X and Y respectively. • A pair of observations (xi, yi) and (xj, yj) is concordant if the ranks for both elements agree: that is, if both xi > xj and yi > yj or if both xi < xj and yi < yj, e.g., (1,2) and (3,19). • The pair is discordant if xi > xj and yi < yj or if xi < xj and yi > yj, e.g., (1,2) and (2,1). • If xi = xj or yi = yj, the pair is neither concordant nor discordant. • The Kendall τ coefficient is defined as: τ = (number of concordant pairs - number of discordant pairs) / (n(n-1)/2)
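The concordant/discordant counting can be sketched directly, using the usual no-ties form tau = (nc - nd) / (n(n-1)/2). The five data points are made up for illustration and contain no ties.

```matlab
x = [1 2 3 4 5];
y = [2 1 4 3 5];                      % small made-up data set without ties
n = numel(x);

nc = 0;  nd = 0;                      % concordant / discordant counters
for i = 1:n-1
    for j = i+1:n
        s = sign(x(i) - x(j)) * sign(y(i) - y(j));
        if s > 0
            nc = nc + 1;              % both differences have the same sign
        elseif s < 0
            nd = nd + 1;              % the differences have opposite signs
        end                           % s == 0: a tie, counted as neither
    end
end

tau = (nc - nd) / (n*(n-1)/2);
fprintf('concordant = %d, discordant = %d, tau = %.2f\n', nc, nd, tau);
```

Of the 10 pairs, only the two adjacent swaps, (1,2) with (2,1) and (3,4) with (4,3), are discordant, so tau = (8 - 2)/10 = 0.6.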
Example • Consider X=U[0,1] and • What is Kendall’s tau? • What is the linear correlation coefficient?
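The definition of the second variable is missing from the text. As a hypothetical stand-in, the sketch below takes a strictly increasing nonlinear transform, Y = X^10, which separates the rank-based and linear measures clearly; the exponent, sample size and seed are all assumptions. The corr function with the 'Type' option belongs to the Statistics and Machine Learning Toolbox.

```matlab
rng(0);                               % fixed seed for repeatability
n = 2000;
x = rand(n, 1);                       % X ~ U[0,1]
y = x.^10;                            % assumed monotone transform (not from the slide)

tau = corr(x, y, 'Type', 'Kendall');  % rank-based: depends only on the ordering
rs  = corr(x, y, 'Type', 'Spearman'); % Pearson's rho applied to the ranks
r   = corr(x, y, 'Type', 'Pearson');  % linear: reduced by the curvature

fprintf('Kendall tau = %.3f, Spearman rho = %.3f, Pearson r = %.3f\n', tau, rs, r);
```

Any strictly increasing transform makes every pair concordant, so the rank-based coefficients equal 1 while the linear coefficient stays below 1.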
Copula models • Elliptical copulas • Student-t copula • Gaussian copula (linear correlation coefficient); copula function is implicit • Archimedean copulas • Explicit copula function • Clayton, Gumbel, Frank copulas (Kendall's tau) • Joint PDFs of copulas • Standard normal distributions are used as marginal PDFs • Kendall's tau = 0.5 • [Figure: joint PDFs for the Gaussian, Clayton, Gumbel and Frank copulas]
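A rough way to compare the four families is to draw samples from each at the same Kendall's tau. The sketch below shows scatter plots of samples rather than the joint PDFs themselves; the sample size, seed and 2-by-2 layout are assumptions.

```matlab
rng(0);                               % fixed seed for repeatability
tau = 0.5;  N = 1000;
families = {'Gaussian', 'Clayton', 'Gumbel', 'Frank'};
params = zeros(1, numel(families));

figure;
for k = 1:numel(families)
    params(k) = copulaparam(families{k}, tau);   % parameter giving Kendall's tau = 0.5
    U = copularnd(families{k}, params(k), N);    % N-by-2 samples with Uniform(0,1) marginals
    X = norminv(U, 0, 1);                        % transform to standard normal marginals
    subplot(2, 2, k);
    plot(X(:,1), X(:,2), '.');
    title(sprintf('%s (parameter %.3f)', families{k}, params(k)));
    xlabel('X1');  ylabel('X2');
end
```

All four panels share the same marginals and the same Kendall's tau; the differences lie in where the dependence concentrates, for instance in the lower tail for Clayton and the upper tail for Gumbel.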
How to fit a copula? • From samples • Find marginal distributions: find the best-fit PDF for each RV using a goodness-of-fit (GOF) test such as the Kolmogorov-Smirnov (K-S) test • Calculate a correlation coefficient • Find the best-fit copula for the samples using a GOF test • When would you want to fit a copula?
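The steps above can be sketched in MATLAB under some simplifying assumptions: the data are synthetic (drawn from a Frank copula so the answer is known), only a normal marginal is tried for each variable, and candidate copulas are ranked by log-likelihood, which is a simple stand-in for a formal copula GOF test rather than the test itself.

```matlab
rng(1);                                          % fixed seed for repeatability
N = 2000;
alphaTrue = copulaparam('Frank', 0.5);
X = norminv(copularnd('Frank', alphaTrue, N));   % synthetic data standing in for real samples

% Step 1: marginals. Fit a normal to each column and check it with a K-S test.
U = zeros(N, 2);
for k = 1:2
    [m, s] = normfit(X(:,k));
    h = kstest((X(:,k) - m) / s);                % h = 0: standard normal not rejected at 5%
    fprintf('column %d: mean %.2f, std %.2f, K-S reject = %d\n', k, m, s, h);
    U(:,k) = normcdf(X(:,k), m, s);              % map to (0,1) with the fitted CDF
end

% Step 2: rank correlation of the samples.
tauHat = corr(X(:,1), X(:,2), 'Type', 'Kendall');

% Step 3: fit candidate copulas and compare log-likelihoods.
families = {'Clayton', 'Frank', 'Gumbel'};
logL = zeros(1, 3);  alphaHat = zeros(1, 3);
for k = 1:3
    alphaHat(k) = copulafit(families{k}, U);
    logL(k) = sum(log(copulapdf(families{k}, U, alphaHat(k))));
end
[~, best] = max(logL);
fprintf('tau = %.3f, best family: %s (alpha = %.3f)\n', tauHat, families{best}, alphaHat(best));
```

Note that the K-S test is only approximate when the tested distribution's parameters were estimated from the same data, so its outcome here is indicative rather than exact.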
How to generate bivariate RVs? • Generate bivariate RVs using MATLAB • Generate N samples • Frank copula with Kendall's tau of 0.5 and two standard normal distributions as marginal PDFs
N = 5;
family = 'Frank';
Ktau = 0.5;
alpha = copulaparam(family,Ktau);
U = copularnd(family,alpha,N);
X = norminv(U,0,1)
• U = copularnd(FAMILY,ALPHA,N) returns N random vectors generated from the bivariate Archimedean copula determined by FAMILY, with scalar parameter ALPHA. FAMILY is 'Clayton', 'Frank', or 'Gumbel'. U is an N-by-2 matrix. Each column of U is a sample from a Uniform(0,1) marginal distribution.
X =
0.8954 -0.1639
1.3153 0.5446
-1.1408 -0.7823
1.3618 2.2497
0.3381 1.6897