Intro to Numerical Descriptions of Quantitative Data 8/31/11
Different Quantitative Data Types • Discrete data result when the number of possible values is either finite or countable. • Continuous data result from infinitely many possible values that correspond to a continuous scale covering a range of values without gaps or interruptions. • Example: A restaurant counts the number of patrons it receives for lunch on weekdays. It also tracks total receipts for the lunch crowd. The number of patrons is discrete data (it must be an integer), while the revenue is treated as continuous.
Exploration – Centers of Data • The following terms need to be defined. Use the AP books to read about and define these terms for your group: • Median • Mean • Midrange • Also, prepare some thoughts about the merits of each as a true measure of the average. Consider what an extreme value might do to a small sample of data points, and then consider what would happen if the set were larger.
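A small numeric check can make this comparison concrete. The sketch below assumes the usual definitions (mean: sum divided by count; median: middle value of the sorted data; midrange: average of the minimum and maximum). The sample values and the `midrange` helper are made up for illustration and do not come from the slides.

```python
from statistics import mean, median

def midrange(values):
    """Average of the smallest and largest values (assumed standard definition)."""
    return (min(values) + max(values)) / 2

# Hypothetical sample (not from the slides), with and without one extreme value.
sample = [4, 5, 6, 7, 8]
with_extreme = sample + [60]

for label, data in (("original", sample), ("with extreme value", with_extreme)):
    print(label, "| mean:", mean(data), "| median:", median(data),
          "| midrange:", midrange(data))
```

In this sample the single extreme value moves the midrange the most, the mean less, and the median only slightly.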
5-number summary • The 5-number summary of a set of data is: {Min, Q1, Median (also called Q2), Q3, Max} • Consider the sorted set: {20, 23, 24, 27, 27, 28, 30, 30, 31, 32, 34, 36, 37, 40, 40, 51} • How would we determine the 5-number summary? • Min and Max come first. • Then divide the set in half to find the median. • Then find the medians of the two half-sets (Q1, Q3). • Try it at your table.
5-number summary • Read p. 76, “So what is a quartile anyway.” • How we’ll calculate a quartile (usually): • Order the data. • Split the data into two halves. If the number of values is odd, the middle value is the median, or second quartile; if it is even, the median is the average of the two middle values. • When the number of values is odd, leave the median out of both halves, then calculate the median of each half-set. • These submedians are the 1st and 3rd quartiles, or Q1 and Q3.
Calculating Q1 and Q3 • When we calculated the median, we had to divide the data into upper and lower halves. • We cut these halves into new halves and apply the same process to the smaller subsets. • Q1: Lower half is {20, 23, 24, 27, 27, 28, 30, 30} • Median of this set is (27+27)/2 = 27. • Q3: Upper half is {31, 32, 34, 36, 37, 40, 40, 51} • Median of this set is (36+37)/2 = 36.5.
Putting it together • So our Five Numbers are • Min = 20 • Q1 = 27 • Q2 = 30.5 • Q3 = 36.5 • Max = 51 • The five numbers can be put into a graphical representation called a box-and-whisker plot. • The box is created by the range between Q1 and Q3 with a vertical line representing Q2 • The whiskers extend to the min and max values.
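The median-of-halves procedure from these slides can be sketched in Python as a check on the hand calculation. The function name `five_number_summary` is an illustrative choice, and the sketch assumes at least two data values. Other software may use a different quartile convention and return slightly different Q1 and Q3.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]        # values below the median position
    upper = data[(n + 1) // 2 :]  # values above it; skips the middle value when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Sorted set from the slides.
data = [20, 23, 24, 27, 27, 28, 30, 30, 31, 32, 34, 36, 37, 40, 40, 51]
print(five_number_summary(data))
```

For the set above this reproduces Min = 20, Q1 = 27, Q2 = 30.5, Q3 = 36.5, Max = 51.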
Box and Whisker Plot [Figure: box-and-whisker plot with Min, Q1, Median, Q3, and Max labeled]
Interquartile Range (IQR) and the Midhinge • The difference between Q3 and Q1 is called the Interquartile Range, or IQR. • It is a measure of spread that gives us a better understanding than the range (Max minus Min) because the range is strongly influenced by extreme values, called outliers. • An outlier is any data point that is highly unusual. Defining highly unusual often depends on the circumstances, but a good rule is anything that is in the top or bottom 0.1% of a distribution.
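As a quick arithmetic check, the sketch below computes the range and the IQR from the five numbers found earlier for the 16-value set. It also computes the midhinge, which the slide title mentions but does not define; the sketch assumes the standard definition, the average of Q1 and Q3.

```python
# Extremes and quartiles taken from the worked example in these slides.
minimum, q1, q3, maximum = 20, 27, 36.5, 51

data_range = maximum - minimum  # spread of the whole data set
iqr = q3 - q1                   # spread of the middle half of the data
midhinge = (q1 + q3) / 2        # assumed standard definition: average of Q1 and Q3

print("range:", data_range, "IQR:", iqr, "midhinge:", midhinge)
```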
Box-plots and the 5-number Summary • Box plots can be used to understand the distribution of data, but a traditional box plot has some limitations. • The range of data from the minimum to the maximum is portrayed, but no allowance is made for outliers. • Use the Interquartile Range (IQR) and the quartiles to create a modified box-plot. Identify outliers by using the 1.5×IQR rule. • 1.5×IQR Rule: In a modified box-plot, a data point is an outlier if it is more than 1.5×IQR above Q3 or more than 1.5×IQR below Q1.
Using Box-Plots to determine outliers • The following data are scores from a test. Are there any outliers in the data? • Sort the data to find min, Q1, Q2, Q3 and max. • Then find the IQR (Q3 – Q1) and apply the 1.5×IQR Rule.
Using Box-Plots to determine outliers • Data sorted: 31 54 57 58 59 60 65 66 66 68 70 70 71 71 74 77 78 80 81 84 • X10 = 68 and X11 = 70 are the two middle values. Q2 is their average, so Q2 = 69. • X5 = 59 and X6 = 60 are the middle values of the lower half, so Q1 = 59.5. • X15 = 74 and X16 = 77 are the middle values of the upper half, so Q3 = 75.5. • The IQR = Q3 – Q1 = 75.5 – 59.5 = 16. • 1.5×IQR = 1.5·16 = 24.
Using Box-Plots to determine outliers • Therefore, our outliers would be any values more than 24 above Q3 and any values more than 24 below Q1. • Q3 + 24 = 99.5 (This is called the upper fence.) • Q1 – 24 = 35.5 (This is called the lower fence.) • Are there outliers? • Yes. The low score was 31, which is below the lower fence. • No outliers above. • How could we modify the box plot to indicate an outlier?
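The fence calculation can be checked with a short Python sketch. It assumes the same median-of-halves quartiles used in these slides. The helper name `iqr_fences` and its parameter `k` are illustrative, not part of any library.

```python
from statistics import median

def iqr_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) based on median-of-halves quartiles."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])        # median of the lower half
    q3 = median(data[(n + 1) // 2 :])  # median of the upper half
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Test scores from the slides.
scores = [31, 54, 57, 58, 59, 60, 65, 66, 66, 68,
          70, 70, 71, 71, 74, 77, 78, 80, 81, 84]

low, high = iqr_fences(scores)
outliers = [x for x in scores if x < low or x > high]
print("lower fence:", low, "upper fence:", high, "outliers:", outliers)
```

For these scores the fences come out to 35.5 and 99.5, and 31 is the only value outside them.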
Modified Box Plot [Figure: modified box plot of the test scores, with callouts] • The upper whisker extends to the max value (84) because no values lie more than 1.5×IQR above Q3. • The box is the same as before: the left side is Q1, the right side is Q3, and the middle bar is Q2. • The dot marks the outlier, at the value 31. • The low whisker extends to the least value that is not an outlier (54).
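The whisker endpoints of the modified box plot can be found by discarding everything outside the fences and taking the smallest and largest remaining values. This sketch takes the fence values 35.5 and 99.5 from the worked example rather than recomputing them; the variable names are illustrative.

```python
# Test scores from the slides.
scores = [31, 54, 57, 58, 59, 60, 65, 66, 66, 68,
          70, 70, 71, 71, 74, 77, 78, 80, 81, 84]

# Fences taken from the worked example (Q1 - 24 and Q3 + 24).
lower_fence, upper_fence = 35.5, 99.5

inside = [x for x in scores if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)  # ends of the two whiskers
dots = [x for x in scores if x < lower_fence or x > upper_fence]  # drawn as separate points

print("whiskers:", whisker_low, "to", whisker_high, "| outlier dots:", dots)
```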
Homework • Question 8 on page 91. Copy it down, please. Due on Friday.