FIN 685: Risk Management Topic 4: Dependencies Larry Schrenk, Instructor
Topics • Dependency • Rank Order Statistics • Spearman’s rho • Kendall’s tau • Correlation • Copulae
Bivariate Dependencies • Require that you ask three questions • Does a dependency exist? • If a dependency exists, then how strong is it? • What is the pattern or direction of the dependency? • When we consider these questions, we begin to explain the nature (if there is one) of the relationship between our variables • When we do this, we cross a threshold into a higher form of scientific inquiry
Does a Dependency Exist? • In our example we looked at the independence of two variables • When the test rejected the null of independence it revealed that there was evidence of some association between our variables • It is important to recognize that the statistics do not prove a causal relationship, but they do give evidence that an association is likely to exist • This can have far-reaching implications because it opens up the possibility for modeling and prediction
How Strong Is the Dependency? • A huge Chi-Square value is some indication that there is a strong association, but this “impressionistic” approach is somewhat limited • As we progress in this subject, we will investigate specific indices for describing the strength of an association • These indices are generally scaled from 0 to 1 or from –1 to +1 • A value of zero generally means no association, while a value of 1 (or –1) means a perfect association
What Is the Pattern or Direction of the Dependency? • Because of the pseudo-ordinal nature of the data in our example, attending lecture is clearly associated with higher exam scores • In the simplest sense, more lectures attended translates into a higher score • This means that the dependency is not only present, but also positive in its effect (big yields big and small yields small)
What Is the Pattern or Direction of the Dependency? • A silly example: • Note that the dependency here is negative • It appears that getting so drunk that you forget where you are on Saturday night can have an adverse effect on how you feel on Sunday • This is negative association: big boozing provides little contentment on Sunday morning, while little alcohol will yield big contentment on Sunday morning
Indices of Dependency • If a dependency exists, then we should measure the strength of the relationship in a standard manner via an index • We do this so that we can compare associations between many variables and thereby determine which have the strongest influence over others
Scatter plot • The relationship between any two variables can be portrayed graphically on an x- and y-axis. • Each subject i has a pair of scores (xi, yi). When the scores for an entire sample are plotted, the result is called a scatter plot.
Direction of the relationship Variables can be positively or negatively correlated. • Positive correlation: As the value of one variable increases, the value of the other variable increases. • Negative correlation: As the value of one variable increases, the value of the other variable decreases.
Strength of the relationship The magnitude of correlation: • Indicated by its numerical value • ignoring the sign • expresses the strength of the linear relationship between the variables.
[Figure: four scatter plot panels, labeled r = 1.00, r = .42, r = .17, r = .85]
Spearman’s rho • Non-Parametric • Range:–1.0 to zero to 1.0 • Like correlation between ranked variables • Ordinal
Ordinal By Degrees • Ordinal data is defined as data that has a clear hierarchy • This form of data can often appear similar to nominal data (in categories) or interval data (ranked from 1 to N) • However there is more information to ordinal categories than nominal categories • And there is less to ranks than there is to real data at an interval level of measure
Interpreting Spearman’s rs • 1.00 means that the rankings are in perfect agreement • -1.00 is if they are in perfect disagreement • 0 signifies that there is no relationship
Calculation • Convert data to ranks, xi, yi • Excel: RANK function • Assuming no tied ranks
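The calculation above can be sketched in Python. The slide's own equation did not survive, so this sketch assumes the standard no-ties shortcut, r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the difference between the two ranks of observation i. The function names and the data are hypothetical and for illustration only.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_no_ties(x, y):
    """Spearman's rank correlation via the no-ties shortcut formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("the shortcut formula assumes no tied values")
    rx, ry = ranks(x), ranks(y)
    # Sum of squared differences between paired ranks
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical data: hours studied and exam score for five students
hours = [2, 9, 4, 7, 5]
score = [50, 95, 70, 65, 80]
rs = spearman_no_ties(hours, score)
print(round(rs, 4))
```

Excel's RANK function ranks in descending order by default; the direction does not change the result as long as both variables are ranked the same way.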
Kendall’s Tau • Non-Parametric • Range:–1.0 to zero to 1.0 • ‘Pairs’ Oriented • Ordinal
Introduction to Kendall’s tau • The basic premise behind Kendall’s tau is that for observations with two pieces of information (two variables) you can rank the value of each and treat it as a pair to be compared to all other pairs • Each pair will have an X (independent variable) value and a Y (dependent variable) value • If we order the X values, we would expect the Y values to have a similar order if there is a strong positive correlation between X and Y • Kendall’s tau has a range from –1 to +1, with large positive values denoting positive associations and large negative values denoting negative associations; a 0 denotes no association
Kendall’s tau • This series of tests works off of the comparison of pairs to all other pairs • Any comparison of pairs can have only three possible results • Concordant – Ordinally correct • Discordant – Ordinally incorrect • Tied – Exactly the same • Note that for n pairs there are n(n-1)/2 comparisons, hence the equation from your book
Calculation • This series of tests works off of the comparison of pairs to all other pairs • Any comparison of pairs can have only three possible results • Concordant (Nc) – Ordinally correct • Discordant (Nd) – Ordinally incorrect • Tied – Exactly the same
Calculation • Note that for n pairs there are n(n-1)/2 comparisons, hence the equation
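A minimal sketch of the pair-counting idea follows. It assumes the simplest version of the statistic (often called tau-a), in which the difference between concordant and discordant counts is divided by the n(n-1)/2 comparisons; the equation in the original slide or textbook may handle ties differently. The function name and data are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Return (tau, Nc, Nd, tied) by comparing every pair of observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    nc = nd = tied = 0
    for i, j in combinations(range(n), 2):
        # Same sign of differences: concordant; opposite sign: discordant
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            nc += 1
        elif product < 0:
            nd += 1
        else:
            tied += 1
    comparisons = n * (n - 1) // 2
    return (nc - nd) / comparisons, nc, nd, tied


# Hypothetical ranks for five observations
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
tau, nc, nd, tied = kendall_tau_a(x, y)
print(tau, nc, nd, tied)
```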
Pearson’s rho • Pearson’s Product Moment Correlation • Developed by Karl Pearson, building on ideas from Francis Galton • The coefficient is essentially the sum of the products of the z-scores for each variable divided by the degrees of freedom (n – 1) • Its computation can take on a number of forms depending on your resources
Pearson’s rho • Parametric: Elliptical • Linear • Range:–1.0 to zero to 1.0 • Cardinal
Equations and Covariation • The sample covariance is the upper center equation without the sample standard deviations in the denominator • Covariance measures how two variables covary and it is this measure that serves as the numerator in Pearson’s r • [Equations not reproduced; forms labeled: Mathematically, Simplified, Computationally Easier]
Covariation • r = 0.89, cov = 788.6944 • How it works graphically: [Figure: scatter plot with reference lines at x̄ and ȳ; quadrants labeled (+,+) and (–,–)]
Correlation via rho • So we now understand covariance • Standard deviation should also be a familiar term by now • So we can calculate Pearson’s r, but what does it mean? • r is scaled from –1 to +1; its magnitude gives the strength of association, while its sign shows the direction in which the variables covary
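The relationship between covariance, standard deviation, and Pearson's r can be sketched as follows. The slide equations are not reproduced here, so this assumes the usual sample definitions with n - 1 in the denominator; the function names and the data are hypothetical. The second function computes r as the sum of z-score products divided by n - 1, which should agree with the covariance form.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_covariance(x, y):
    """Sum of cross-products of deviations from the means, divided by n - 1."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def sample_std(values):
    """Sample standard deviation (square root of the sample variance)."""
    return sqrt(sample_covariance(values, values))


def pearson_r(x, y):
    """Covariance scaled by both sample standard deviations."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))


def pearson_r_zscores(x, y):
    """Same coefficient as the sum of z-score products over n - 1."""
    mx, my = mean(x), mean(y)
    sx, sy = sample_std(x), sample_std(y)
    products = (((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y))
    return sum(products) / (len(x) - 1)


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
cov_xy = sample_covariance(x, y)
r = pearson_r(x, y)
print(cov_xy, round(r, 4), round(pearson_r_zscores(x, y), 4))
```

Covariance depends on the units of x and y, which is why dividing by the two standard deviations is needed to get an index bounded by -1 and +1.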
Example • Correlation = 0.58
Correlation Works When • Multivariate Normal • Elliptical • Linear Relationships
Some Concerns 1. Correlation represents a linear relationship. • Correlation tells you how much two variables are linearly related, not necessarily how much they are related in general. • In some cases two variables may have a strong, even perfect, relationship that is not linear. For example, there can be a curvilinear relationship.
NonLinearity • An Extreme Example
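One extreme case can be sketched in Python (the original figure is not reproduced, so this example is an assumption about the kind of case meant). Here y is determined exactly by x through y = x^2, yet Pearson's r over a range symmetric around zero is zero. The function name and data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from sums of squared deviations and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# y depends perfectly on x, but the dependence is not linear
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_full = pearson_r(x, y)                # whole parabola
r_right_half = pearson_r(x[3:], y[3:])  # only x >= 0, where y rises with x
print(r_full, round(r_right_half, 4))
```

On the right half alone the relationship is monotonic, so r is high there; over the full range the falling and rising halves cancel.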
Some Concerns 2. Restricted range • Correlation can be deceiving if the full information about each of the variables is not available. A correlation between two variables is typically smaller if the range of one or both variables is truncated. • Because the full variation of one variable is not available, there is not enough information to see how the two variables covary.
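The effect of a truncated range can be illustrated with a small constructed data set (hypothetical, chosen only so the arithmetic is easy to follow): y equals x plus a fixed pattern of disturbances, and the correlation is computed once on all ten points and once on the four points with x between 4 and 7.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from sums of squared deviations and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: y = x plus a disturbance of +2 or -2
x = list(range(1, 11))
disturbance = [-2, -2, 2, 2, -2, -2, 2, 2, -2, -2]
y = [a + e for a, e in zip(x, disturbance)]

# Keep only observations whose x value lies in the truncated range 4..7
restricted = [(a, b) for a, b in zip(x, y) if 4 <= a <= 7]
x_restricted = [a for a, _ in restricted]
y_restricted = [b for _, b in restricted]

r_full = pearson_r(x, y)
r_restricted = pearson_r(x_restricted, y_restricted)
print(round(r_full, 3), round(r_restricted, 3))
```

The disturbances are the same size in both samples, but in the truncated sample x varies much less, so the disturbances account for a larger share of the variation in y.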
Some Concerns 3. Outliers • Outliers are scores that deviate markedly from the remainder of the data. • On-line outliers artificially inflate the correlation coefficient. • Off-line outliers artificially deflate the correlation coefficient.
On-line outlier • An outlier which falls near where the regression line would normally fall would necessarily increase the size of the correlation coefficient, as seen below. • r = .457
Off-line outliers • An outlier that falls some distance away from the original regression line would decrease the size of the correlation coefficient, as seen below: • r = .336
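Both outlier effects can be sketched with a small hypothetical data set (not the data behind the r = .457 and r = .336 figures, which are not available). One extra point is added either along the trend of the base data or well away from it, and Pearson's r is recomputed each time.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from sums of squared deviations and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical base data with a moderately strong positive relationship
base_x = [1, 2, 3, 4, 5]
base_y = [2, 1, 4, 3, 5]

# On-line outlier: far from the other points but along their trend
on_x, on_y = base_x + [15], base_y + [15]

# Off-line outlier: large x paired with a small y, away from the trend
off_x, off_y = base_x + [6], base_y + [1]

r_base = pearson_r(base_x, base_y)
r_on_line = pearson_r(on_x, on_y)
r_off_line = pearson_r(off_x, off_y)
print(round(r_base, 3), round(r_on_line, 3), round(r_off_line, 3))
```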
Some Concerns 4. Distributional Assumptions • Multivariate Normal • Assets not Normal • Combining Distributions
Some Concerns 5. Time Stability • Higher Correlation in Bad Markets
Correlation and Causation • Two things that go together do not necessarily have a causal relationship. • One variable can be strongly related to another, yet not cause it. Correlation does not imply causality. • When there is a correlation between X and Y: • Does X cause Y or Y cause X, or both? • Or is there a third variable Z causing both X and Y, and therefore X and Y are correlated?