Announcements • Exams returned at end of class • Average = 78, Standard Dev = 12 • Key with explanations will be posted • Don’t be discouraged: first test is often hardest • Have been focused on categorical data & proportions • Next segment of course will focus on numerical data & means • Today we discuss summary stats and graphs for numerical data
Numerical Data • Numerical data can be continuous or discrete • Discrete data is restricted by its nature to certain values, usually counts • Continuous data could conceptually be measured to more and more decimal places
Examples of Discrete Data • Number of people • Litter size for animal births • Number of days with rain
Examples of Continuous Data • Temperature (not just 87, but 87.3 degrees) • Time (not just 10 seconds, but 10.58 sec) • Weight (not just 5 lbs, but 5.3 lbs)
Summaries of Numerical Data • Numerical data is summarized by a measure of “center” and a measure of “spread” • There are two pairs of these measures • Mean (center) and standard deviation (spread) • Median (center) and SIQR (spread) • SIQR = Semi-interquartile range
Mean and Standard Deviation • The mean is the average. To compute the mean, add up all the values and divide by the number of observations. • The standard deviation is a measure of spread. To compute it, subtract the mean from each value (called deviations). Square the deviations, total them, divide by n-1 and take the square root.
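In symbols, the two recipes above can be written as follows. The notation is introduced here and does not appear on the slides: x_i are the observations, n is the number of observations, x-bar is the mean, and s is the standard deviation.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
```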
Example 1 • Observations: 50, 63, 72, 84, 91 • Mean = (50+63+72+84+91)/5 = 72 • Deviations = (50-72) = -22, … , (91-72) = 19 • Deviations Squared = 484, … , 361 • Total of above = 484 + … + 361 = 1070 • Total/(5-1) = Total/4 = 267.5 • Square root of 267.5 ≈ 16.36 • Standard Deviation ≈ 16.36
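The arithmetic in this example can be checked with a short Python sketch. It follows the slide's recipe step by step; the function name `sample_sd_steps` and the dictionary keys are illustrative choices, not part of the course material.

```python
import math


def sample_sd_steps(values):
    """Follow the slide's recipe and return every intermediate result."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]   # subtract the mean from each value
    squared = [d ** 2 for d in deviations]    # square the deviations
    total = sum(squared)                      # total them
    variance = total / (n - 1)                # divide by n - 1
    return {
        "mean": mean,
        "deviations": deviations,
        "squared": squared,
        "total": total,
        "variance": variance,
        "sd": math.sqrt(variance),            # take the square root
    }


steps = sample_sd_steps([50, 63, 72, 84, 91])
for name, value in steps.items():
    print(f"{name}: {value}")
```

The printed standard deviation is about 16.355 before rounding.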
Example 2 • Observations: 69, 71, 72, 72, 76 • Mean = (69+71+72+72+76)/5 = 72 • Deviations Squared = 9, 1, 0, 0, 16 • Total of above = 26 • Total/4 = 6.5 • Standard Deviation = square root of 6.5 ≈ 2.55 • This data set has the same mean as Example 1 but less variability, so it has a smaller “spread” (standard deviation).
Median and SIQR • The median is the middle of the sorted data. One half of the data is higher than the median and one half is below. • SIQR = (upper quartile - lower quartile)/2 • The lower quartile is the value so that one fourth of the data is below it and three fourths of the data is above it. • The upper quartile is the value so that three fourths of the data is below it and one fourth of the data is above it.
Example 1 revisited • Observations: 50, 63, 72, 84, 91 • Median = 72 • Lower Quartile = 63 • Upper Quartile = 84 • SIQR = (84 - 63)/2 = 10.5
Example 2 revisited • Observations: 69, 71, 72, 72, 76 • Median = 72 • Lower Quartile = 71 • Upper Quartile = 72 • SIQR = (72 - 71)/2 = 0.5 • Again, the two data sets have the same “center” but different “spreads”
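The median and SIQR for both examples can be reproduced with Python's standard `statistics` module. This is a sketch under one assumption: quartiles have several competing definitions, and the slides do not say which one is used. The `method="inclusive"` convention is chosen here because it matches the quartiles shown for these two data sets. The helper name `siqr` is illustrative.

```python
import statistics


def siqr(values):
    """Semi-interquartile range: (upper quartile - lower quartile) / 2."""
    # Assumption: the "inclusive" quartile convention; other conventions
    # can give different quartiles for the same data.
    lower, _, upper = statistics.quantiles(values, n=4, method="inclusive")
    return (upper - lower) / 2


example_1 = [50, 63, 72, 84, 91]
example_2 = [69, 71, 72, 72, 76]

for data in (example_1, example_2):
    print("median:", statistics.median(data), "SIQR:", siqr(data))
```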
Making the comparison • Mean and SD: sensitive to outliers; sampling distributions are easily found • Median and SIQR: robust to outliers; sampling distributions are difficult to find • Therefore, we will use the mean and standard deviation for “well behaved” data and we will use the median and SIQR when we have outliers.
Sensitivity vs. Robustness • Observations: 50, 63, 72, 84, 91 • Mean = 72, SD ≈ 16.36 • Median = 72, SIQR = 10.5 • New Observation = 24 • New Mean = 64, New SD ≈ 24.45 • New Median = 67.5, New SIQR = 13.875 • The mean and SD were more heavily affected by the outlier than the median and SIQR.
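The before-and-after comparison can be sketched in Python as below. The helper `summarize` is a hypothetical name, and the quartiles again assume the `method="inclusive"` convention of `statistics.quantiles`, which reproduces the SIQR values on this slide.

```python
import statistics


def summarize(values):
    """Return both center/spread pairs for one data set."""
    # Assumption: "inclusive" quartile convention (see lead-in).
    lower, _, upper = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),      # divides by n - 1
        "median": statistics.median(values),
        "siqr": (upper - lower) / 2,
    }


original = [50, 63, 72, 84, 91]
with_outlier = original + [24]

before = summarize(original)
after = summarize(with_outlier)
for name in before:
    print(f"{name}: {before[name]:.3f} -> {after[name]:.3f}")
```

Adding the single low value moves the mean by 8 points but the median by only 4.5, and it raises the SD by about 8.1 but the SIQR by only 3.375.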
Sampling Distributions • As we move forward, we will see that the sample mean is approximately normally distributed when the sample is large enough, and that the t-distribution describes the sample mean once it is standardized using the sample standard deviation • Finding the sampling distributions for the sample median and SIQR is more involved and will not be covered in this course.
Summary Graphs • Stem-and-leaf chart • Histogram • Box plot
Exam Grades: Stem-and-leaf plot
9 | 5677
9 | 001123444
8 | 566666778889
8 | 00001112222233444
7 | 55668999999
7 | 0011223344
6 | 567777777788899
6 | 000123334
5 | 6799
5 | 14
4 | 8
• The stems are the first digit of the grade and are placed to the left of the line • The leaves are the second digit of the grade and are placed to the right of the line • Each grade is represented • Example: There are three 81’s
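A stem-and-leaf plot like this one can be built in a few lines of Python. The sketch below assumes two-digit grades and splits each stem into two rows (leaves 5-9 above leaves 0-4), as the plot does. The function name `stem_and_leaf` is illustrative, and the input is only a handful of grades read off the top and bottom rows of the plot, not the full class.

```python
from collections import defaultdict


def stem_and_leaf(grades):
    """Return stem-and-leaf rows for two-digit grades, highest row first."""
    rows = defaultdict(list)
    for grade in sorted(grades):
        stem, leaf = divmod(grade, 10)
        # Key on (stem, upper-half flag) so each stem gets up to two rows.
        rows[(stem, leaf >= 5)].append(str(leaf))
    return [
        f"{stem} | {''.join(leaves)}"
        for (stem, _), leaves in sorted(rows.items(), reverse=True)
    ]


# Grades taken from the "9 | 5677", "5 | 6799", "5 | 14" and "4 | 8" rows.
grades = [97, 95, 96, 97, 54, 51, 48, 59, 56, 57, 59]
print("\n".join(stem_and_leaf(grades)))
```

Unlike a hand-drawn plot, this simplified version omits rows that contain no grades.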
Histogram: Section 506 • Histogram is a bar chart • More aesthetic than a stem-and-leaf • Cannot reconstruct the data set from a histogram
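To illustrate why the data cannot be recovered from a histogram, here is a minimal Python sketch that counts values per interval. The bin width of 10 and the function name `histogram_counts` are assumptions made for illustration; the grades are a small subset read off the stem-and-leaf plot, not the Section 506 data.

```python
from collections import Counter


def histogram_counts(values, width=10):
    """Count how many values fall in each interval [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


bin_width = 10
grades = [97, 95, 96, 97, 54, 51, 48, 59, 56, 57, 59]
for start, count in histogram_counts(grades, bin_width).items():
    print(f"{start}-{start + bin_width - 1}: {'#' * count}")

# Two different data sets can produce identical bars.
print(histogram_counts([50, 50]) == histogram_counts([51, 59]))
```

Only the counts per interval survive, so the individual grades are lost once the bars are drawn.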
Box-plots • Useful for comparing groups • Center line is median • Top of box is upper quartile • Bottom of box is lower quartile
[Figure: two side-by-side box plots, each labeled Max, Upper Q., Median, Lower Q., Min]
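The five values a box plot draws can be computed directly. The sketch below uses the Example 1 and Example 2 observations from earlier in the deck as the two groups, which is an assumption since the slide does not say which data the figure shows. It also assumes the `method="inclusive"` quartile convention, and `five_number_summary` is an illustrative name.

```python
import statistics


def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max)."""
    # Assumption: "inclusive" quartile convention.
    lower, median, upper = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), lower, median, upper, max(values)


groups = {
    "Example 1": [50, 63, 72, 84, 91],
    "Example 2": [69, 71, 72, 72, 76],
}
for name, data in groups.items():
    print(name, five_number_summary(data))
```

Putting the two summaries side by side shows equal medians but a much taller box for Example 1.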
More On Boxplots • Same data sets as before, but a zero was added to each • Outliers are represented as points • Definition of outlier is based on the quartiles and the SIQR
[Figure: two side-by-side box plots, each labeled Max, Upper Q., Median, Lower Q., Lowest Non-outlier, Min]
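The slide does not state the exact outlier rule. One widely used convention flags any value more than 1.5 interquartile ranges (that is, 3 SIQRs) beyond the nearest quartile, and the Python sketch below assumes that rule. The multiplier `k=3.0`, the `method="inclusive"` quartile convention, the name `split_outliers`, and the choice of Example 1 plus a zero as input are all assumptions for illustration.

```python
import statistics


def split_outliers(values, k=3.0):
    """Split values into (non-outliers, outliers) using quartile fences.

    Assumed rule: a value is an outlier if it lies more than k * SIQR
    below the lower quartile or above the upper quartile.
    """
    lower, _, upper = statistics.quantiles(values, n=4, method="inclusive")
    siqr = (upper - lower) / 2
    low_fence, high_fence = lower - k * siqr, upper + k * siqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return inside, outliers


inside, outliers = split_outliers([0, 50, 63, 72, 84, 91])
print("outliers:", outliers)
print("lowest non-outlier:", min(inside))
```

Under this rule the added zero is drawn as a separate point, and the lower whisker stops at the lowest non-outlier.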
Grades and Curving • Why I don’t curve • Low scores indicate a problem to be addressed: learning is not happening • Curving does not encourage learning; it is a cheap fix for low grades • What I do instead • Sometimes I offer exam corrections • Other times I offer additional bonus assignments • This time: a bonus assignment will be offered
A Tale of Two Aggies • John (in my Fall 99 class): First Exam: D; Good HW & Quiz; Made office visits; Grades improved; Class grade: A; I did not curve • Sarah (in my Fall 99 class): First Exam: D; Skipped HW & Quiz; Never came by office; Class grade: F; Whined; I did not curve
Bonus E: Election Coverage • Give a statistical critique of election coverage of next week’s debate • If you can’t watch debate, you may use a magazine or newspaper (include copy) • Clarity: 2 points • Validity: 2 points • Brevity: 2 points • Typed on paper: due Oct. 24
How to make a stem-and-leaf • Click the Editor button • Enter data in columns • Click Close button • Go to Graphs: One Variable: Stem-and-Leaf • Select the variable of interest • Click OK
How to make a histogram • After entering data, go to Graphs: One Variable: Histogram: Continuous Variable • Select variable of interest • Set desired options • Click OK
How to make box-plots • Go to Graphs: Comparison of Variables: Box Plot Comparison • Select all variables of interest (makes side-by-side box plots) • Click OK.