
BCOR 1020 Business Statistics

BCOR 1020 Business Statistics. Lecture 5 – January 31, 2008. Overview. Chapter 4 – Descriptive Statistics… Standardized Data Percentiles and Quartiles Boxplots. Chapter 4 – Standardized Data. Chebyshev’s Theorem – Developed by mathematicians Jules Bienaymé (1796-1878)


Presentation Transcript


  1. BCOR 1020 Business Statistics Lecture 5 – January 31, 2008

  2. Overview • Chapter 4 – Descriptive Statistics… • Standardized Data • Percentiles and Quartiles • Boxplots

  3. Chapter 4 – Standardized Data Chebyshev’s Theorem – Developed by mathematicians Jules Bienaymé (1796-1878) and Pafnuty Chebyshev (1821-1894). • For any population with mean μ and standard deviation σ, the percentage of observations that lie within k standard deviations of the mean must be at least 100[1 – 1/k²] (for k > 1). • For k = 2 standard deviations, 100[1 – 1/2²] = 75% (So, at least 75.0% will lie within μ ± 2σ.) • For k = 3 standard deviations, 100[1 – 1/3²] = 88.9% (So, at least 88.9% will lie within μ ± 3σ.) • Although applicable to any data set, these limits tend to be too wide to be useful.
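The bound on slide 3 is easy to evaluate for any k. The following is a minimal Python sketch; the function name `chebyshev_min_percent` is made up for illustration and is not part of the lecture.

```python
def chebyshev_min_percent(k: float) -> float:
    """Minimum percentage of observations within k standard deviations of the mean.

    Implements 100 * (1 - 1/k^2); the bound says nothing useful for k <= 1.
    """
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 100 * (1 - 1 / k**2)


# Reproduce the two cases worked on the slide.
for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_min_percent(k):.1f}%")
```

The same call with any other k > 1 gives the corresponding guaranteed minimum, regardless of the shape of the distribution.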

  4. Clickers Using Chebyshev’s Theorem, determine the minimum percentage of observations that lie within 4 standard deviations of the mean. 100[1 – 1/k²] A = 75.0% B = 88.9% C = 93.8% D = 96.0%

  5. Chapter 4 – Standardized Data The Empirical Rule: • The normal or Gaussian distribution was named for Karl Gauss (1777-1855). • The normal distribution is symmetric and is also known as the bell-shaped curve. • The Empirical Rule states that given data from a normal distribution, we expect that for… k = 1: About 68.26% will lie within μ ± 1σ. k = 2: About 95.44% will lie within μ ± 2σ. k = 3: About 99.73% will lie within μ ± 3σ.
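The Empirical Rule percentages can be checked directly from the normal distribution, since the probability of falling within k standard deviations of the mean is erf(k/√2). This is a small sketch using only Python's standard library; `normal_within` is an illustrative name. The slide's 68.26% and 95.44% come from four-decimal table values, so the computed figures differ slightly in the last digit.

```python
import math


def normal_within(k: float) -> float:
    """Percentage of a normal distribution lying within k standard deviations of the mean."""
    return 100 * math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"k = {k}: about {normal_within(k):.2f}%")
```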

  6. Chapter 4 – Standardized Data The Empirical Rule: • Distance from the mean is measured in terms of the number of standard deviations. • Unusual Observations: Unusual observations are those that lie beyond μ ± 2σ. Outliers are observations that lie beyond μ ± 3σ. Note: no upper bound is given. Data values outside μ ± 3σ are rare.

  7. Clickers Suppose 80 students take an exam. Assuming exam scores follow a normal distribution, approximately how many students would you expect to have scores within 2 standard deviations of the mean? A = 55 B = 76 C = 79 D = 80

  8. Chapter 4 – Standardized Data Defining a Standardized Variable: • A standardized variable (Z) redefines each observation in terms of the number of standard deviations from the mean. Standardization formula for a population: zi = (xi – μ)/σ. Standardization formula for a sample: zi = (xi – x̄)/s. • zi tells how far away the observation is from the mean (in units of the standard deviation). • A negative z value means the observation is below the mean. • Positive z means the observation is above the mean.
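As a rough illustration of standardization, the sketch below computes z-scores with Python's `statistics` module and labels each one using the thresholds from slide 6 (beyond 2 standard deviations is unusual, beyond 3 is an outlier). The function names and the sample values are hypothetical, not the lecture's P/E data.

```python
import statistics


def standardize(data, population=False):
    """Return z-scores: (x - mean) / standard deviation.

    Uses the population standard deviation if population=True,
    otherwise the sample standard deviation (n - 1 divisor).
    """
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data) if population else statistics.stdev(data)
    return [(x - mean) / sd for x in data]


def classify(z):
    """Label a z-score using the 2- and 3-standard-deviation thresholds."""
    if abs(z) > 3:
        return "outlier"
    if abs(z) > 2:
        return "unusual"
    return "typical"


# Hypothetical sample, for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]
for x, z in zip(values, standardize(values)):
    print(f"x = {x}: z = {z:+.2f} ({classify(z)})")
```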

  9. Chapter 4 – Standardized Data Defining a Standardized Variable: • MegaStat calculates standardized values as well as checks for outliers. • In Excel, use =STANDARDIZE(x, Mean, StDev) to calculate the standardized z value of an observation x.

  10. Chapter 4 – Standardized Data Example: Unusual Observations in the P/E Data • The P/E ratio data contains several large data values. Are they unusual or outliers? Raw Data: Standardized Data:

  11. Chapter 4 – Standardized Data Outliers: What do we do with outliers in a data set? • If due to erroneous data, then discard. • An outrageous observation (one completely outside of an expected range) is certainly invalid. • Recognize unusual data points and outliers and their potential impact on your study. • Research books and articles on how to handle outliers.

  12. Chapter 4 – Standardized Data Estimating Sigma: • It is common to use the sample standard deviation (s) as an estimate of σ. • We can also use the empirical rule to define a simple (quick-and-dirty) estimate: • For a normal distribution, the range of 99.73% of the values is 6σ (from μ – 3σ to μ + 3σ). • If you know the range R (high – low), you can estimate the standard deviation as σ ≈ R/6. • Useful for approximating the standard deviation when only R is known. • This estimate depends on the assumption of normality.
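The range-based estimate amounts to a single division. A minimal sketch, with a hypothetical function name and made-up numbers, and valid only under the normality assumption stated on the slide:

```python
def estimate_sigma_from_range(low: float, high: float) -> float:
    """Quick estimate of the standard deviation as R / 6, where R = high - low.

    Assumes roughly normal data, so that nearly all values span about
    six standard deviations.
    """
    if high < low:
        raise ValueError("high must be at least as large as low")
    return (high - low) / 6


# Hypothetical example: values known to run from 40 to 100.
print(estimate_sigma_from_range(40, 100))
```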

  13. Chapter 4 – Percentiles & Quartiles Percentiles: • Percentiles are data that have been divided into 100 groups. • For example, you score in the 83rd percentile on a standardized test. That means that 83% of the test-takers scored below you. • Deciles are data that have been divided into 10 groups (i.e. 10th, 20th, 30th, etc. percentiles). • Quintiles are data that have been divided into 5 groups (i.e. 20th, 40th, 60th, 80th, 100th percentiles). • Quartiles are data that have been divided into 4 groups (i.e. 25th, 50th, 75th, 100th percentiles).

  14. Chapter 4 – Percentiles & Quartiles Percentiles: • Percentiles are used to establish benchmarks for comparison purposes… • (e.g., health care, manufacturing and banking industries use 5, 25, 50, 75 and 90 percentiles). • Percentiles are used in employee merit evaluation and salary benchmarking.

  15. Chapter 4 – Percentiles & Quartiles Quartiles: • Quartiles are scale points that divide the sorted data into four groups of approximately equal size. • The three values that separate the four groups are called Q1, Q2, and Q3, respectively. • Quartiles (25, 50, and 75 percent) are commonly used to assess financial performance and stock portfolios.

  16. Chapter 4 – Percentiles & Quartiles Quartiles: • The second quartile Q2 is the median, an important indicator of central tendency. • Q1 and Q3 measure dispersion since the interquartile range Q3 – Q1 measures the degree of spread in the middle 50 percent of data values.

  17. Chapter 4 – Percentiles & Quartiles Method of Medians: • For small data sets, find quartiles using method of medians: Step 1. Sort the observations. Step 2. Find the median Q2. Step 3. Find the median of the data values that lie below Q2. This is Q1. Step 4. Find the median of the data values that lie above Q2. This is Q3.
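The four steps of the method of medians can be sketched in Python as follows. One assumption is made here that the slide does not spell out: the data are split by position, and when the number of observations is odd, the middle observation is left out of both halves. The function name is illustrative.

```python
import statistics


def quartiles_by_medians(data):
    """Return (Q1, Q2, Q3) using the method of medians.

    Assumption: the sorted data are split by position; for an odd count
    the middle observation belongs to neither half.
    """
    xs = sorted(data)                      # Step 1: sort
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    q2 = statistics.median(xs)             # Step 2: median
    half = n // 2
    lower = xs[:half]                      # values below the median position
    upper = xs[half + 1:] if n % 2 else xs[half:]
    q1 = statistics.median(lower)          # Step 3: median of lower half
    q3 = statistics.median(upper)          # Step 4: median of upper half
    return q1, q2, q3


print(quartiles_by_medians([1, 2, 3, 4, 5, 6, 7, 8]))
print(quartiles_by_medians([7, 1, 5, 3, 6, 2, 4]))
```

Other conventions for the odd-count case exist, so results on small samples may differ slightly from software output.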

  18. Clickers Recall the following P/E ratios for 68 stocks in a portfolio. First find Q1, Q2 and Q3. We can use quartiles to define benchmarks for stocks that are low-priced (bottom quartile or Q1) or high-priced (top quartile or Q3). What is the P/E ratio benchmark for high-priced stocks in this portfolio? A = 14 B = 19 C = 26 D = 36

  19. Chapter 4 – Percentiles & Quartiles Example: P/E Ratios and Quartiles: • Recall from the previous question: Q1 = 14, Q2 = 19, Q3 = 26. • These quartiles express central tendency (M = Q2) and dispersion (the interquartile range IQR). • Because of clustering of identical data values, these quartiles do not provide clean cut points between groups of observations.

  20. Chapter 4 – Percentiles & Quartiles Excel Quartiles: • Use Excel function =QUARTILE(Array, k) to return the kth quartile. • Excel treats quartiles as a special case of percentiles. For example, to calculate Q3… • We can use either =QUARTILE(Array, 3) or =PERCENTILE(Array, 0.75) • Excel calculates the quartile positions as: Q1 at 0.25n + 0.75, Q2 at 0.50n + 0.50, Q3 at 0.75n + 0.25 (interpolating between adjacent sorted values when the position is not a whole number).
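To make the position-based approach concrete, here is a sketch of the interpolation rule that Excel's QUARTILE is generally understood to use: the kth quartile sits at 1-based position (k/4)·n + (1 – k/4) in the sorted data, with linear interpolation between neighbors. The function name is hypothetical, and the claim that this matches Excel exactly should be treated as an assumption to verify against a spreadsheet.

```python
def excel_style_quartile(data, k):
    """Return the kth quartile (k = 0..4) by position and linear interpolation.

    Position (1-based) in the sorted data: (k/4) * n + (1 - k/4).
    """
    if k not in (0, 1, 2, 3, 4):
        raise ValueError("k must be 0, 1, 2, 3 or 4")
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")
    pos = (k / 4) * n + (1 - k / 4)   # 1-based position, possibly fractional
    lo = int(pos)                     # whole part of the position
    frac = pos - lo                   # fractional part used for interpolation
    if lo >= n:                       # position is the last observation
        return xs[-1]
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])


sample = [1, 2, 3, 4, 5, 6, 7, 8]     # hypothetical data
print([excel_style_quartile(sample, k) for k in (1, 2, 3)])
```

For comparison, Python's `statistics.quantiles(data, n=4, method="inclusive")` applies the same inclusive interpolation and can serve as a cross-check.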

  21. Chapter 4 – Percentiles & Quartiles Caution: • Quartiles generally resist outliers. • However, quartiles do not provide clean cut points in the sorted data, especially in small samples with repeating data values. • Although they have identical quartiles, these two data sets are not similar. The quartiles do not represent either data set well.

  22. Chapter 4 – Percentiles & Quartiles Central Tendency & Dispersion Using Quartiles: Some robust measures of central tendency using quartiles are: • Median (M = Q2) – we’ve already discussed. • Midhinge – The mean of the 1st and 3rd quartiles: Midhinge = (Q1 + Q3)/2. Both are robust measures of central tendency since they ignore extreme values (outliers).

  23. Chapter 4 – Percentiles & Quartiles Central Tendency & Dispersion Using Quartiles: Some robust measures of dispersion using quartiles are: • Midspread (Interquartile Range, IQR) – A robust measure of dispersion: Midspread = Q3 – Q1 • Coefficient of Quartile Variation (CQV) – Measures relative dispersion, expresses the midspread as a percent of the midhinge: • Similar to the CV, CQV can be used to compare data sets measured in different units or with different means.
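The quartile-based measures can be sketched in a few lines of Python. The midhinge and midspread follow the slide definitions directly. The CQV formula is not preserved in this transcript, so the sketch assumes the commonly cited form 100 × (Q3 – Q1)/(Q3 + Q1); check it against the course text. Function names and quartile values are hypothetical.

```python
def midhinge(q1, q3):
    """Mean of the first and third quartiles."""
    return (q1 + q3) / 2


def midspread(q1, q3):
    """Interquartile range: Q3 - Q1."""
    return q3 - q1


def cqv(q1, q3):
    """Coefficient of quartile variation, in percent.

    Assumed form: 100 * (Q3 - Q1) / (Q3 + Q1).
    """
    return 100 * (q3 - q1) / (q3 + q1)


# Hypothetical quartiles, for illustration only.
q1, q3 = 10, 30
print(midhinge(q1, q3), midspread(q1, q3), cqv(q1, q3))
```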

  24. Clickers Recall from the data set of 68 P/E ratios: Min = 7, Q1 = 14, Q2 = 19, Q3 = 26, Max = 91 What is the Midspread (Interquartile Range)? A) 12 B) 19 C) 77 D) 84

  25. Chapter 4 – Boxplots Boxplots – A useful tool of exploratory data analysis (EDA). • Also called a box-and-whisker plot. • Based on a five-number summary: Xmin, Q1, Q2, Q3, Xmax Example: Consider the five-number summary for the 68 P/E ratios… Xmin = 7, Q1 = 14, Q2 = 19, Q3 = 26, Xmax = 91

  26. Chapter 4 – Boxplots • The Boxplot for the P/E ratio data is … [Figure: boxplot with labels Minimum, Q1, Median (Q2), Q3, Maximum, Box, Whiskers; annotated as right-skewed]

  27. Chapter 4 – Boxplots Fences and Unusual Data Values – Use quartiles to detect unusual data points. • These points are called fences and can be found using the following formulas: • Values outside the inner fences are unusual while those outside the outer fences are outliers.
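The fence formulas themselves did not survive in this transcript. The sketch below assumes the conventional boxplot fences: inner fences at 1.5 × IQR beyond the quartiles and outer fences at 3 × IQR beyond them. Those multipliers and the function names are assumptions, not taken from the slide. The quartiles and extremes used in the example are the P/E values given on slide 25.

```python
def fences(q1, q3):
    """Return inner and outer fences as (lower, upper) pairs.

    Assumed multipliers: 1.5 * IQR for inner fences, 3.0 * IQR for outer fences.
    """
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }


def classify_value(x, q1, q3):
    """Label x as 'outlier' (outside outer fences), 'unusual' (outside inner), or 'typical'."""
    f = fences(q1, q3)
    if x < f["outer"][0] or x > f["outer"][1]:
        return "outlier"
    if x < f["inner"][0] or x > f["inner"][1]:
        return "unusual"
    return "typical"


# Quartiles from the P/E five-number summary: Q1 = 14, Q3 = 26.
print(fences(14, 26))
print(classify_value(91, 14, 26))   # the maximum P/E value
print(classify_value(7, 14, 26))    # the minimum P/E value
```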

  28. Chapter 4 – Boxplots Fences and Unusual Data Values: • Truncate the whisker at the fences and display unusual values and outliers as dots. Example: Boxplot of P/E ratios with fences… [Figure: boxplot with labels Inner Fence, Outer Fence, Unusual, Outliers] Based on these fences, there are three unusual P/E values and two outliers.

  29. Chapter 4 – Standardized Data Example: Unusual Observations in the P/E Data • The P/E ratio data contains several large data values. Are they unusual or outliers? Compare the boxplot to standardized data analysis… Standardized Data:
