Statistics • Branch of Mathematics that deals with the collection and analysis of data • Descriptive Statistics: used to analyze and describe data • Inferential Statistics: uses that information to make statements about the relationships between variables or expectations about future events.
Measures of Central Tendency • Arithmetic Mean • Median • Mode • Geometric Mean
Arithmetic Mean • Other names • Average • Mean
Arithmetic Mean • The calculation is identical, just the notation varies slightly
Summation Notation • Notice that the first form uses less vertical space on the page • This makes accountants very happy • The first can also be easier to fit into a line of text
Example • Ten second year BBA students wrote the CSC exam last month • Their scores were: 71, 72, 88, 69, 77, 63, 91, 81, 83, 75
Calculating the Mean • Arithmetic mean • sum the observations and divide by the number of observations • Example: 5%, 7%, -2%, 12%, 8%
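As a quick check of the calculation, the mean of the ten CSC scores and of the five example returns can be worked out in a few lines of Python. The function and variable names are illustrative only.

```python
# CSC exam scores from the example slide
scores = [71, 72, 88, 69, 77, 63, 91, 81, 83, 75]
# Example returns, in percent
returns_pct = [5, 7, -2, 12, 8]

def arithmetic_mean(values):
    """Sum the observations and divide by the number of observations."""
    return sum(values) / len(values)

mean_score = arithmetic_mean(scores)        # 770 / 10
mean_return = arithmetic_mean(returns_pct)  # 30 / 5
print(mean_score, mean_return)  # 77.0 6.0
```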
Problem with the Arithmetic Mean • Arithmetic mean is incorrect for variables that are related multiplicatively, like rates of growth, rates of return and rates of change • $1,000 at 6% for 5 years should be $1,338.23
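A small sketch of why this matters. It assumes the five yearly returns are those from the previous example (5%, 7%, -2%, 12%, 8%), whose arithmetic mean is 6%; that pairing is an assumption, not something the slide states.

```python
# Assumed yearly returns (from the earlier example), as decimals
returns = [0.05, 0.07, -0.02, 0.12, 0.08]

# Value implied by compounding the 6% arithmetic mean for 5 years
implied = 1000 * (1 + 0.06) ** 5

# Value actually reached by compounding each year's return in turn
actual = 1000.0
for r in returns:
    actual *= 1 + r

print(round(implied, 2))  # 1338.23
print(round(actual, 2))   # 1331.81
```

The arithmetic mean overstates the ending value, because the returns compound multiplicatively rather than add.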
Geometric Mean • The Geometric Mean should be used for rates of change, like rates of return • Π (the product symbol) means: the product of these factors from 1 to N
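A minimal sketch of the geometric mean, assuming the usual form for rates of return: multiply the growth factors (1 + R), take the N-th root, then subtract 1. The returns are the five example values used earlier.

```python
import math

# Example returns as decimals
returns = [0.05, 0.07, -0.02, 0.12, 0.08]

def geometric_mean_return(rs):
    """N-th root of the product of (1 + r), minus 1."""
    product = math.prod(1 + r for r in rs)
    return product ** (1 / len(rs)) - 1

g = geometric_mean_return(returns)
print(f"{g:.3%}")  # 5.898%
```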
Geometric vs. Arithmetic Mean • The more variable the underlying data, the greater the error using the Arithmetic mean • The Geometric Mean is often easier to calculate: • Stock prices: 1992: $20; 1999: $40, R = 10.41%
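The stock price example needs only the first and last prices. A short sketch, assuming the seven annual periods from 1992 to 1999:

```python
start_price, end_price = 20.0, 40.0
years = 1999 - 1992  # 7 annual periods

# Geometric mean annual rate of return between the two prices
annual_rate = (end_price / start_price) ** (1 / years) - 1
print(f"{annual_rate:.2%}")  # 10.41%
```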
Geometric vs. Arithmetic Mean • For analysis of past performance, use the Geometric mean • The past returns have averaged 5.898% • To use the past returns to estimate the future expected return, use the Arithmetic mean • The expected return is 6%
Median and Mode • Median: Midpoint • If odd number of observations: Middle observation • If even number of observations: Average of middle 2 observations • Mode: Most frequent
Example • Our CSC mark data was (sorted): 63, 69, 71, 72, 75, 77, 81, 83, 88, 91 • The median is 76 • There is no mode
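The same result can be checked with Python's standard library. This is a sketch; the "repeated values" list is just one simple way to show that no score occurs more than once.

```python
import statistics
from collections import Counter

scores = [63, 69, 71, 72, 75, 77, 81, 83, 88, 91]

# Even number of observations: average of the middle two (75 and 77)
median = statistics.median(scores)

# Values that occur more than once; an empty list means there is no mode
counts = Counter(scores)
repeated = [value for value, count in counts.items() if count > 1]

print(median, repeated)  # 76.0 []
```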
Example • The Deviation is the difference between each observation and the mean • The sign indicates whether the observation is above (+) or below (-) the mean
Example • The average deviation is always zero • If it isn’t, you must have made a mistake!
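A short check on the CSC scores that the deviations from the mean sum to zero, so their average is zero as well:

```python
scores = [63, 69, 71, 72, 75, 77, 81, 83, 88, 91]
mean = sum(scores) / len(scores)  # 77.0

# Deviation = observation minus the mean; the sign shows above (+) or below (-)
deviations = [x - mean for x in scores]

print(deviations)
print(sum(deviations))  # 0.0
```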
Measures of Dispersion • So far, we have looked at measures of central tendency • What about measuring the tendency of the data to vary from the centre?
Measures of Dispersion • Range • Highest - Lowest • Variance • Standard Deviation
Example • The range is 91-63=28 • The range can be extremely sensitive to outlier observations • Suppose one of these students had a very bad day and scored 8. • The range would now be 91-8=83
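A sketch of the outlier effect. Which student has the bad day is not stated, so replacing the score of 63 with 8 is an assumption made only for illustration.

```python
scores = [63, 69, 71, 72, 75, 77, 81, 83, 88, 91]
data_range = max(scores) - min(scores)

# Hypothetical: the student who scored 63 scores 8 instead
with_outlier = [8 if x == 63 else x for x in scores]
outlier_range = max(with_outlier) - min(with_outlier)

print(data_range, outlier_range)  # 28 83
```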
Mean Absolute Deviation • The Mean Absolute Deviation is a measure of average dispersion that is not used very much • It has some undesirable mathematical properties beyond the level of this course
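For reference, a sketch of the Mean Absolute Deviation on the CSC scores, taking it as the average of the absolute deviations from the mean:

```python
scores = [63, 69, 71, 72, 75, 77, 81, 83, 88, 91]
mean = sum(scores) / len(scores)  # 77.0

# Average distance from the mean, ignoring the sign of each deviation
mad = sum(abs(x - mean) for x in scores) / len(scores)
print(mad)  # 7.0
```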
Mean Squared Deviation • The Mean Squared Deviation is very commonly used • The MSD in this example is 694/10=69.4 • The more common name of the MSD is the VARIANCE
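The 694/10 figure can be reproduced from the CSC scores. A minimal sketch:

```python
scores = [63, 69, 71, 72, 75, 77, 81, 83, 88, 91]
mean = sum(scores) / len(scores)  # 77.0

# Square each deviation so positive and negative deviations do not cancel
squared_deviations = [(x - mean) ** 2 for x in scores]
sum_of_squares = sum(squared_deviations)

# Mean Squared Deviation: divide by the number of observations
msd = sum_of_squares / len(scores)
print(sum_of_squares, msd)  # 694.0 69.4
```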
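Variance • Variance measures the amount of dispersion from the mean, in squared units. • For Populations: $\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}$ • For Samples: $s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$
Standard Deviation • Standard Deviation measures the amount of dispersion from the mean, in the same units as the data; it is the square root of the variance. • For Populations: $\sigma = \sqrt{\sigma^2}$ • For Samples: $s = \sqrt{s^2}$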
Standard Deviation Example • Using the previous example • The data is sample data
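A sketch of the calculation using Python's `statistics` module, treating the CSC scores as sample data (divisor n - 1). The population version is shown only for comparison.

```python
import statistics

scores = [63, 69, 71, 72, 75, 77, 81, 83, 88, 91]

sample_var = statistics.variance(scores)  # 694 / 9, divides by n - 1
sample_sd = statistics.stdev(scores)      # square root of the sample variance
pop_sd = statistics.pstdev(scores)        # divides by N, for comparison

print(round(sample_var, 2), round(sample_sd, 2), round(pop_sd, 2))
# 77.11 8.78 8.33
```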
Interpreting the Std. Dev. • You have heard of the Bell Shaped or Normal Distribution • The properties of the Normal Distribution are well known and give us the EMPIRICAL RULE
Empirical Rule • For approximately Normally Distributed data: • Within 1 standard deviation of the mean: approx. 2/3 (68%) • Within 2 standard deviations of the mean: approx. 95% (19/20) • Within 3 standard deviations of the mean: virtually all
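As an illustration only, the rule can be compared against the ten CSC scores. Ten observations are far too few to test normality, so this just shows how the shares are counted.

```python
import statistics

scores = [63, 69, 71, 72, 75, 77, 81, 83, 88, 91]
mean = statistics.mean(scores)
s = statistics.stdev(scores)  # sample standard deviation

def share_within(k):
    """Fraction of observations within k standard deviations of the mean."""
    inside = [x for x in scores if abs(x - mean) <= k * s]
    return len(inside) / len(scores)

for k in (1, 2, 3):
    print(k, share_within(k))
# 1 0.7
# 2 1.0
# 3 1.0
```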
Quartiles, Percentiles, etc. • The Median splits the data in half • Quartiles split the data into quarters • Deciles split the data into tenths • Percentiles split the data into one-hundredths
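A sketch of the quartile cut points for the CSC scores using `statistics.quantiles`. Textbooks and software use slightly different interpolation conventions, so other tools may report somewhat different first and third quartiles; the values here follow Python's default ("exclusive") method.

```python
import statistics

scores = [63, 69, 71, 72, 75, 77, 81, 83, 88, 91]

# n=4 returns the three cut points that split the data into quarters
quartiles = statistics.quantiles(scores, n=4)
print(quartiles)  # [70.5, 76.0, 84.25]
```

The middle cut point is the median, 76.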
Rank Measures • “That was a top-half performance” • “WTG Special fund has been a top quartile performer for the past 5 years” • “Our programme accepts only students proven to be top decile performers” • “I was in the 92nd percentile on the GMAT”
Using Excel • Full Descriptive Statistics: Tools > Data Analysis > Descriptive Statistics
Bivariate Statistics • So far, we have been dealing with statistics of individual variables • We also have statistics that relate pairs of variables
Interactions Sometimes two variables appear related: • smoking and lung cancers • height and weight • years of education and income • engine size and gas mileage • GMAT scores and MBA GPA • house size and price
Interactions • Some of these variables would appear to be positively related and others negatively • If these were related, we would expect to be able to derive a linear relationship: y = a + bx • where b is the slope, and • a is the intercept
Linear Relationships • We will be deriving linear relationships from bivariate (two-variable) data • Our symbols will be:
Example • Consider the following example comparing the returns of Consolidated Moose Pasture stock (CMP) and the TSE 300 Index • The next slide shows 25 monthly returns
Example • From the data, it appears that a positive relationship may exist • Most of the time when the TSE is up, CMP is up • Likewise, when the TSE is down, CMP is down most of the time • Sometimes, they move in opposite directions • Let’s graph this data
Example Summary Statistics • The data do appear to be positively related • Let’s derive some summary statistics about these data:
Observations • Both have means of zero and standard deviations just under 3 • However, each data point does not have just one deviation from the mean; it deviates from both means • Consider Points A, B, C and D on the next graph
Implications • When points in the upper right and lower left quadrants dominate, then the sums of the products of the deviations will be positive • When points in the lower right and upper left quadrants dominate, then the sums of the products of the deviations will be negative
An Important Observation • The sums of the products of the deviations will give us the appropriate sign of the slope of our relationship
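A sketch of the idea with made-up numbers. The five paired returns below are hypothetical and are not the CMP and TSE 300 data from the slides.

```python
# Hypothetical paired monthly returns (percent); not the slide data
tse = [2.0, -1.0, 3.0, -2.0, -2.0]
cmp_stock = [3.0, -2.0, 1.0, -1.0, -1.0]

mean_x = sum(tse) / len(tse)
mean_y = sum(cmp_stock) / len(cmp_stock)

# Product of the two deviations for each point: positive when the point lies
# in the upper right or lower left quadrant, negative in the other two
products = [(x - mean_x) * (y - mean_y) for x, y in zip(tse, cmp_stock)]
sum_of_products = sum(products)

print(products)         # [6.0, 2.0, 3.0, 2.0, 2.0]
print(sum_of_products)  # 15.0
```

A positive sum indicates a positive slope; a negative sum would indicate a negative slope.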
Covariance (showing the formula only to demonstrate a concept)
Covariance • In the same units as Variance (if both variables are in the same unit), i.e. units squared • Very important element of measuring portfolio risk in finance