BCOR 1020 Business Statistics

BCOR 1020 Business Statistics. Lecture 4 – January 29, 2008. Overview. Chapter 4 – Descriptive Statistics… Numerical Description Central Tendency Dispersion. Chapter 4 – Numerical Description. Sample ( Size = n ): Statistics are computed and estimate parameters

kavindra


  1. BCOR 1020 Business Statistics Lecture 4 – January 29, 2008

  2. Overview • Chapter 4 – Descriptive Statistics… • Numerical Description • Central Tendency • Dispersion

  3. Chapter 4 – Numerical Description Sample (Size = n): Statistics are computed and estimate parameters, e.g., $\bar{x}$ = sample mean, S = sample std. dev. Population (Size = N): Characterized by parameters, e.g., $\mu$ = pop. mean, $\sigma$ = pop. std. dev. • Recall: • Statistics are descriptive measures derived from a sample (n items). • Parameters are descriptive measures derived from a population (N items).

  4. Chapter 4 – Numerical Description There are three key characteristics of numerical data:

  5. Chapter 4 – Numerical Description Example: Vehicle Quality • Consider the data set of vehicle defect rates from J. D. Power and Associates. • Defect rate = (total no. of defects / no. inspected) × 100 • Numerical statistics can be used to summarize this random sample of brands. • Must allow for sampling error since the analysis is based on sampling.

  6. Chapter 4 – Numerical Description • Number of defects per 100 vehicles, 2004 models.

  7. Chapter 4 – Numerical Description • Sorted data provides insight into central tendency and dispersion.

  8. Chapter 4 – Numerical Description Visual Displays: • The dot plot offers a visual impression of the data. • Histograms with 5 bins (suggested by Sturges’ Rule) and 10 bins are shown below. • Both are symmetric with no extreme values and show a modal class toward the low end.

  9. Chapter 4 – Numerical Description • We can compute descriptive statistics using Excel and discuss measures of central tendency and dispersion… • Figures 4.4 and 4.5 in your text detail the Excel menus for computing descriptive statistics. • Figure 4.7 in your text details the MegaStat menus for computing descriptive statistics.

  10. Chapter 4 – Numerical Description MegaStat output…

  11. Chapter 4 – Central Tendency • Central tendency is the middle or typical value of a distribution. • Central tendency can be assessed using a dot plot, a histogram, or, more precisely, numerical statistics. • The text presents six measures of central tendency: • Mean • Median • Mode • Midrange • Geometric Mean (G) • Trimmed Mean • The mean and median are the most frequently used, but we will discuss the merits of all six.

  12. Chapter 4 – Central Tendency Mean – $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ • A familiar measure of central tendency. • In Excel, use function =AVERAGE(Data) where Data is an array of data values. • For the sample of n = 37 car brands:

  13. Chapter 4 – Central Tendency Characteristics of the Mean: • Arithmetic mean is the most familiar average. • Affected by every sample item. • The balancing point or fulcrum for the data. • Regardless of the shape of the distribution, the signed deviations of the data points from the mean always sum to zero.
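
The balancing-point property can be checked numerically. What follows is a minimal Python sketch, not part of the lecture: the data values are illustrative (they are not the J. D. Power sample) and the helper name `arithmetic_mean` is an assumption. Exact fractions are used so that the deviations sum to exactly zero instead of a tiny rounding error.

```python
from fractions import Fraction


def arithmetic_mean(values):
    """Return the arithmetic mean of integer data as an exact fraction."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return Fraction(sum(values), len(values))


# Illustrative data only (hypothetical, not the vehicle quality sample).
data = [11, 12, 15, 17, 21, 32]

xbar = arithmetic_mean(data)
# Signed deviation of each observation from the mean.
deviations = [x - xbar for x in data]

print("mean =", xbar)
print("sum of signed deviations =", sum(deviations))
```

Changing any single value in `data` changes the mean, which illustrates that the mean is affected by every sample item.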

  14. Chapter 4 – Central Tendency Median (M) – the 50th percentile or midpoint of the sorted sample data. • Use Excel’s function =MEDIAN(Data) where Data is an array of data values. • M separates the upper and lower half of the sorted observations. • If n is even, the median is the average of the middle two observations in the data array. • If n is odd, the median is the middle observation in the data array.

  15. Chapter 4 – Central Tendency Median: • To compute the median by hand, sort the n observations in the data: $x_1 \le x_2 \le \dots \le x_n$ • For odd n, Median = $x_{(n+1)/2}$ • For even n, Median = $\frac{x_{n/2} + x_{n/2+1}}{2}$

  16. Chapter 4 – Central Tendency Example: • Consider the following n = 6 data values: 11 12 15 17 21 32 • What is the median? For even n, Median = $\frac{x_{n/2} + x_{n/2+1}}{2}$. Here n/2 = 6/2 = 3 and n/2 + 1 = 6/2 + 1 = 4, so M = $(x_3 + x_4)/2$ = (15 + 17)/2 = 16.
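
The by-hand rule for odd and even n can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's own code: the function name `median_by_hand` is hypothetical, and the slides' 1-based positions are shifted by one to match Python's 0-based indexing.

```python
def median_by_hand(values):
    """Median via the sort-then-pick rule for odd and even n."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: the observation at 1-based position (n + 1) / 2.
        return xs[(n + 1) // 2 - 1]
    # Even n: average the observations at 1-based positions n/2 and n/2 + 1.
    lower = xs[n // 2 - 1]
    upper = xs[n // 2]
    return (lower + upper) / 2


# The n = 6 example from the slide: positions 3 and 4 hold 15 and 17.
example = [11, 12, 15, 17, 21, 32]
print(median_by_hand(example))
```

For the slide's six values this reproduces M = 16.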

  17. Clickers Consider the following n = 7 data values: 12 23 23 25 27 34 41 What is the median? A = 24 B = 25 C = 26 D = 27

  18. Chapter 4 – Central Tendency Median • For the 37 vehicle quality ratings (odd n) the position of the median is (n+1)/2 = (37+1)/2 = 19. • So, the median is $x_{19}$ = 121. • When there are several duplicate data values, the median does not provide a clean “50-50” split in the data.

  19. Chapter 4 – Central Tendency Characteristics of the Median • The median is insensitive to extreme data values. • For example, consider the following quiz scores for 3 students: • Tom’s scores: 20, 40, 70, 75, 80 (Mean = 57, Median = 70, Total = 285) • Jake’s scores: 60, 65, 70, 90, 95 (Mean = 76, Median = 70, Total = 380) • Mary’s scores: 50, 65, 70, 75, 90 (Mean = 70, Median = 70, Total = 350) • What does the median for each student tell you?
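
The quiz-score comparison can be reproduced with Python's standard `statistics` module. This is a small sketch that uses the three students' scores from the slide; the dictionary layout and variable names are assumptions made for illustration.

```python
from statistics import mean, median

# Quiz scores as listed on the slide.
scores = {
    "Tom": [20, 40, 70, 75, 80],
    "Jake": [60, 65, 70, 90, 95],
    "Mary": [50, 65, 70, 75, 90],
}

# (mean, median, total) for each student.
summary = {name: (mean(v), median(v), sum(v)) for name, v in scores.items()}

for name, (avg, med, total) in summary.items():
    print(f"{name}: mean = {avg}, median = {med}, total = {total}")

# Insensitivity to extremes: lowering Tom's worst score to 0
# changes his mean but leaves his median at 70.
tom_worse = [0, 40, 70, 75, 80]
print(mean(tom_worse), median(tom_worse))
```

All three students share the same median even though their means and totals differ, which is the point of the slide's question.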

  20. Chapter 4 – Central Tendency Mode – The most frequently occurring data value. • Similar to mean and median if data values occur often near the center of sorted data. • May have multiple modes or no mode. • Easy to define, not easy to calculate in large samples. • Use Excel’s function =MODE(Array) • will return #N/A if there is no mode. • will return first mode found if multimodal. • May be far from the middle of the distribution and not at all typical. • Generally isn’t useful for continuous data since data values rarely repeat. • Best for attribute data or a discrete variable with a small range (e.g., Likert scale).
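
To make the "multiple modes or no mode" cases concrete, here is a hedged Python sketch. The function name `modes` and the Likert-style responses are hypothetical. Unlike the Excel behaviour described above, this sketch returns every mode and returns an empty list when no value repeats.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # Every value occurs once, so there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == top)


# Hypothetical responses on a 1-5 Likert scale.
likert = [4, 5, 3, 4, 2, 4, 5, 5, 1]
print(modes(likert))             # 4 and 5 each occur three times
print(modes([1.2, 3.4, 5.6]))    # continuous-looking data rarely repeats
```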

  21. Chapter 4 – Central Tendency Mode: • A bimodal distribution refers to the shape of the histogram rather than the mode of the raw data. • Occurs when dissimilar populations are combined in one sample. For example,

  22. Chapter 4 – Central Tendency Skewness: • Compare mean and median or look at histogram to determine degree of skewness. • Mean, Median & Skewness: • If median > mean, skewed left. • If median = mean, symmetric. • If median < mean, skewed right. • Mean, Mode & Skewness: • If mode > mean, skewed left. • If mode = mean, symmetric. • If mode < mean, skewed right.
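
The mean-versus-median comparison can be expressed as a short Python sketch. The function name `skew_direction` is an assumption, and the rule is only a rough guide: with real data the mean and median are rarely exactly equal, so a histogram should still be consulted. The sample inputs are the quiz scores from the earlier slide.

```python
from statistics import mean, median


def skew_direction(values):
    """Classify skewness by comparing the median with the mean."""
    avg, med = mean(values), median(values)
    if med > avg:
        return "skewed left"
    if med < avg:
        return "skewed right"
    return "symmetric"


print(skew_direction([20, 40, 70, 75, 80]))  # mean 57 < median 70
print(skew_direction([60, 65, 70, 90, 95]))  # mean 76 > median 70
print(skew_direction([50, 65, 70, 75, 90]))  # mean 70 = median 70
```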

  23. Chapter 4 – Central Tendency Midrange – the point halfway between the lowest and highest values of X. • Midrange = $\frac{x_{\min} + x_{\max}}{2}$ • Easy to use but sensitive to extreme data values.
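
A brief Python sketch shows why the midrange is sensitive to extreme values. The data and the function name `midrange` are illustrative assumptions, not taken from the J. D. Power sample.

```python
def midrange(values):
    """Point halfway between the smallest and largest observation."""
    if not values:
        raise ValueError("the midrange of an empty data set is undefined")
    return (min(values) + max(values)) / 2


# Illustrative data only.
data = [11, 12, 15, 17, 21, 32]
print(midrange(data))           # (11 + 32) / 2

# A single extreme value drags the midrange far from the bulk of the data.
with_outlier = data + [100]
print(midrange(with_outlier))   # (11 + 100) / 2
```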

  24. Clickers Consider the J. D. Power quality data (n=37): What is the midrange? A = 121 B = 122 C = 130 D = 173

  25. Chapter 4 – Central Tendency Trimmed Mean: • To calculate the trimmed mean, first remove the highest and lowest k percent of the observations. • To determine how many observations to trim from each end, express k as a proportion and multiply k × n. • Remove the (k × n) highest and the (k × n) lowest observations. • Mitigates the effects of extreme values. • May exclude relevant data values.
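
The trimming steps can be sketched in Python as below. Several choices here are assumptions not stated on the slide: the function name `trimmed_mean`, passing k as a percentage, and rounding the trim count down when k × n is not a whole number. Note that Excel's =TRIMMEAN takes the total fraction to exclude across both ends, so its argument is not directly comparable to k as used here.

```python
def trimmed_mean(values, k_percent):
    """Mean after dropping the lowest and highest k percent of observations."""
    xs = sorted(values)
    n = len(xs)
    # Observations to drop from EACH end, rounded down (an assumption).
    trim = int(k_percent * n // 100)
    if 2 * trim >= n:
        raise ValueError("trimming would remove every observation")
    kept = xs[trim:n - trim]
    return sum(kept) / len(kept)


# Illustrative data with one very low and one very high value.
data = [1, 12, 13, 14, 15, 16, 17, 18, 19, 100]
print(sum(data) / len(data))     # ordinary mean, pulled by the extremes
print(trimmed_mean(data, 10))    # drops 1 and 100, averages the rest
```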

  26. Chapter 4 – Dispersion • Variation is the “spread” of data points about the center of the distribution in a sample. The text considers the following measures of dispersion: • Range • Variance ($S^2$) • Standard Deviation (S) • Coefficient of Variation (CV) • Mean Absolute Deviation (MAD) • The variance and standard deviation are the most frequently used, but we will briefly discuss the merits of all five.

  27. Chapter 4 – Dispersion Range – The difference between the largest and smallest observation. • Range = $x_{\max} - x_{\min}$ • Easy to calculate, but sensitive to extreme data values.

  28. Chapter 4 – Dispersion Variance: • The population variance ($\sigma^2$) is defined as the sum of squared deviations around the mean $\mu$ divided by the population size. • For the sample variance ($s^2$), we divide by n – 1 instead of n; otherwise $s^2$ would tend to underestimate the unknown population variance $\sigma^2$.
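
Written out, the two variance definitions take the following standard form. The index notation (observations $x_i$, summed over the N population items or the n sample items) is an assumed convention; $\mu$ denotes the population mean and $\bar{x}$ the sample mean.

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2
\qquad\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
```

The only differences are the reference point (the population mean versus the sample mean) and the divisor (N versus n – 1).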

  29. Chapter 4 – Dispersion Standard Deviation – The square root of the variance. • Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$ • Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$ • Explains how individual values in a data set vary from the mean. • Units of measure are the same as X. • For the 37 vehicle quality ratings …

  30. Chapter 4 – Dispersion

  31. Chapter 4 – Dispersion Calculating Standard Deviation: • Excel’s built-in functions are… • The standard deviation is nonnegative because deviations around the mean are squared. • When every observation is exactly equal to the mean, the standard deviation is zero. • Standard deviations can be large or small, depending on the units of measure. • Compare standard deviations only for data sets measured in the same units and only if the means do not differ substantially.
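
The properties listed above can be checked with a short Python sketch. This is offered as an illustration only: the data are hypothetical, `sample_std_dev` is an assumed helper name, and the standard library's `statistics.stdev` (divisor n – 1) and `statistics.pstdev` (divisor n) are used for comparison.

```python
from math import sqrt
from statistics import pstdev, stdev


def sample_std_dev(values):
    """Sample standard deviation: sqrt of squared deviations over n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    xbar = sum(values) / n
    sum_sq = sum((x - xbar) ** 2 for x in values)
    return sqrt(sum_sq / (n - 1))


# Illustrative data: mean 18, squared deviations sum to 300.
data = [11, 12, 15, 17, 21, 32]

s = sample_std_dev(data)     # sqrt(300 / 5)
sigma = pstdev(data)         # sqrt(300 / 6), treats data as a population
print(s, stdev(data), sigma)

# Every observation equal to the mean gives a standard deviation of zero.
constant = sample_std_dev([5, 5, 5, 5])

# Changing the units (here, multiplying by 10) rescales the result.
scaled = sample_std_dev([10 * x for x in data])
print(constant, scaled)
```

Dividing by n – 1 makes the sample figure slightly larger than the population figure computed from the same numbers.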

  32. Chapter 4 – Dispersion Coefficient of Variation – A unit-free measure of dispersion. • Expressed as a percent of the mean. • Useful for comparing variables measured in different units or with different means. • Only appropriate for nonnegative data. It is undefined if the mean is zero or negative.
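
A minimal Python sketch of the coefficient of variation follows, assuming the usual definition CV = 100 × (sample standard deviation / mean). The function name and the data are hypothetical, and the check on the mean reflects the restriction noted above.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percent of the mean."""
    xbar = mean(values)
    if xbar <= 0:
        raise ValueError("CV is undefined when the mean is zero or negative")
    return 100 * stdev(values) / xbar


# Illustrative measurements in one unit, then the same data in another unit.
kilograms = [11, 12, 15, 17, 21, 32]
grams = [1000 * x for x in kilograms]

cv_kg = coefficient_of_variation(kilograms)
cv_g = coefficient_of_variation(grams)
print(round(cv_kg, 2), round(cv_g, 2))
```

Both calls give the same percentage, which is what makes the CV usable for comparing variables measured in different units.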

  33. Clickers Recall from the J. D. Power quality data (n=37): What is the Coefficient of Variation? A = 5.48% B = 18.26% C = 22.89% D = 125.38%

  34. Chapter 4 – Dispersion Mean Absolute Deviation (MAD) – reveals the average distance from an individual data point to the mean (center of the distribution). • Uses absolute values of the deviations around the mean. • Excel’s function is =AVEDEV(Array).
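
The MAD calculation can be sketched in a few lines of Python. The function name and the data are illustrative assumptions; the sum of absolute deviations is divided by n, which is how this sketch interprets "average distance".

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the observations from their mean."""
    n = len(values)
    if n == 0:
        raise ValueError("MAD of an empty data set is undefined")
    xbar = sum(values) / n
    return sum(abs(x - xbar) for x in values) / n


# Illustrative data: mean 18, absolute deviations 7, 6, 3, 1, 3, 14.
data = [11, 12, 15, 17, 21, 32]
mad = mean_absolute_deviation(data)
print(round(mad, 4))   # 34 / 6
```

Taking absolute values keeps positive and negative deviations from cancelling, which is what would happen if the signed deviations were averaged.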

  35. Chapter 4 – Dispersion Central Tendency vs. Dispersion: Manufacturing • Consider the histograms of hole diameters drilled in a steel plate during manufacturing. • Machine B: Acceptable variation but mean is less than 5 mm. • Machine A: Desired mean (5 mm) but too much variation. • The desired distribution is outlined in red.
