Statistics Chapter 10
10.1 Organizing and Picturing Information Line Plot: A line plot is a basic and intuitive visual representation of data.
[Figure: line plot of Student Test Scores, with scores from 10 to 100 on the horizontal axis and Frequency on the vertical axis]
Example: 30 fourth graders took a science test and made the following scores. What can we conclude about the students' performance?
22, 23, 14, 45, 39, 11, 9, 46, 22, 25, 6, 28, 33, 36, 16, 39, 49, 17, 22, 32, 34, 22, 18, 21, 27, 34, 26, 41, 28, 25
Step 1: Put the data in ascending order.
6, 9, 11, 14, 16, 17, 18, 21, 22, 22, 22, 22, 23, 25, 25, 26, 27, 28, 28, 32, 33, 34, 34, 36, 39, 39, 41, 45, 46, 49
Step 2: Place one dot for each score.
[Figure: line plot of Science Test Scores, with scores from 5 to 50 on the horizontal axis and Frequency on the vertical axis]
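The two steps can be sketched in Python. This is a minimal text version, assuming a sideways layout (one row per distinct score, one `*` per student) instead of a drawn graph; the helper name `line_plot_rows` is our own.

```python
from collections import Counter

# The 30 science test scores, in the order they were given.
scores = [22, 23, 14, 45, 39, 11, 9, 46, 22, 25, 6, 28, 33, 36, 16,
          39, 49, 17, 22, 32, 34, 22, 18, 21, 27, 34, 26, 41, 28, 25]

# Step 1: put the data in ascending order.
ordered = sorted(scores)

# Step 2: one dot per score; Counter gives the height of each stack of dots.
counts = Counter(ordered)


def line_plot_rows(counts):
    """Return one text row per distinct score, with one '*' for each occurrence."""
    return [f"{value:>2} | " + "*" * counts[value] for value in sorted(counts)]


for row in line_plot_rows(counts):
    print(row)
```

The longest row is 22, with four students, and just over half of the scores (16 of 30) lie between 20 and 35.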
Stem and Leaf Plots A stem and leaf plot is an effective way to present a set of data, and two sets can be placed side by side (a back-to-back stem and leaf plot) for comparison. For a two-digit number the "stem" is the tens place, and the "leaf" is the ones place. Stems are listed once, vertically, in ascending order. Leaves are placed in increasing order moving away from the stem, and may be repeated if necessary.
Example: Make a stem and leaf plot for the following children’s heights, in centimeters. 94, 105, 107, 108, 108, 120, 121, 122, 123 For three digit numbers, the stem is the hundreds and tens positions, and the leaf is the ones position.
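A stem and leaf plot can be built by splitting each number into everything except its last digit (the stem, `value // 10`) and its ones digit (the leaf, `value % 10`), which covers both the two-digit and three-digit cases. This sketch assumes nonnegative whole numbers and, as our own choice, still lists a stem with no leaves (11 here) so the gap in the heights stays visible.

```python
from collections import defaultdict

heights = [94, 105, 107, 108, 108, 120, 121, 122, 123]  # centimeters


def stem_and_leaf(data):
    """Map each stem (all digits but the last) to its sorted leaves (the ones digit)."""
    leaves = defaultdict(list)
    for value in sorted(data):
        leaves[value // 10].append(value % 10)
    # List every stem from smallest to largest, including stems with no leaves.
    return {stem: leaves.get(stem, []) for stem in range(min(leaves), max(leaves) + 1)}


for stem, leaf_list in stem_and_leaf(heights).items():
    print(f"{stem:>2} | " + " ".join(str(leaf) for leaf in leaf_list))
```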
Histograms A bar graph used to graph frequency distributions of continuous variables is called a histogram. It looks like an ordinary bar graph, except that no spaces are left between the bars.
Grouping Data Values into Classes When there are many different values in a data set, we may group the data values into classes to better understand the information. Typically we use between 8 and 12 classes, but there is no rule that dictates the number we must use. Choose wisely.
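As an illustration, the 30 science scores from the line plot example can be grouped into classes of equal width. The starting value 5 and width 5 below are our own choices; they give 9 classes, inside the usual 8 to 12. The function name `group_into_classes` is hypothetical and assumes whole-number data.

```python
scores = [22, 23, 14, 45, 39, 11, 9, 46, 22, 25, 6, 28, 33, 36, 16,
          39, 49, 17, 22, 32, 34, 22, 18, 21, 27, 34, 26, 41, 28, 25]


def group_into_classes(data, start, width):
    """Count the values in each class low..high, where high = low + width - 1."""
    table = {}
    low = start
    while low <= max(data):
        high = low + width - 1
        table[(low, high)] = sum(1 for x in data if low <= x <= high)
        low += width
    return table


for (low, high), freq in group_into_classes(scores, start=5, width=5).items():
    print(f"{low}-{high}: {freq}")
```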
Histograms • Histograms can also be drawn with percentages, like the bar graph.
Bar Graphs Bar graph: Specify the classes on the horizontal axis and the frequencies on the vertical axis.
Pictographs A pictograph uses a picture or icon to symbolize the quantities being represented. Pictorial embellishments are used to make the graph more visually appealing.
Scatterplots Paired data, such as dates and temperatures, or the selling price of a home and its appraised value, can be plotted on a set of axes similar to the xy-plane. Such a plot is called a scatterplot.
10.2 Analyzing Data One way to summarize data numerically is to calculate measures of center. The measures of center we will use are the mean, median, and mode. We will also examine quartiles and the Five Number Summary.
Mean The arithmetic mean is what we usually refer to as the “average.” To calculate the mean, we add up all the data points and divide by the number of data points.
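Applied to the 30 science scores, the definition gives a mean of 810 / 30 = 27. A minimal sketch (Python's built-in `statistics.mean` would give the same value):

```python
scores = [22, 23, 14, 45, 39, 11, 9, 46, 22, 25, 6, 28, 33, 36, 16,
          39, 49, 17, 22, 32, 34, 22, 18, 21, 27, 34, 26, 41, 28, 25]


def mean(data):
    """Arithmetic mean: add up all the data points, then divide by how many there are."""
    return sum(data) / len(data)


print(sum(scores), len(scores))   # 810 30
print(mean(scores))               # 27.0
```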
Median If we arrange a set of numbers in order, the median is the middle value in the list. Case 1: Odd number of data points: The median is the data point in the middle position. Case 2: Even number of data points: The median is the average of the two middle numbers, so it need not be one of the data points.
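Both cases can be checked with a short function; the sample lists are made up for illustration. The third call shows that an even-sized set can still have a median equal to a data point when its two middle values match.

```python
def median(data):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # Case 1: odd count
    return (ordered[mid - 1] + ordered[mid]) / 2  # Case 2: even count


print(median([7, 3, 9, 1, 5]))   # sorted 1 3 5 7 9, so 5
print(median([4, 8, 1, 6]))      # sorted 1 4 6 8, so (4 + 6) / 2 = 5.0
print(median([2, 5, 5, 9]))      # middle values are both 5, so 5.0
```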
Mode The mode is the most frequent data point in the set. There can be more than one mode. If there are two modes, the data set is “bimodal”.
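Because a set can have more than one mode, this sketch returns a list of all of them; the example lists are made up. One convention choice to note: if every value occurs equally often, this function returns all of them, while some texts would say such a set has no mode.

```python
from collections import Counter


def modes(data):
    """Return every value that occurs most often, in ascending order."""
    counts = Counter(data)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([3, 7, 7, 2, 9]))       # [7]
print(modes([1, 4, 4, 6, 8, 8]))    # [4, 8]: two modes, so the set is bimodal
```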
The Five Number Summary The median divides the data set into two halves. The set below the median is the lower half, and the set above the median is the upper half. The median of the lower half is the first quartile, Q1. The median of the upper half is the third quartile, Q3. The low data point, Q1, the median, Q3, and the high data point form the five number summary.
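A sketch of the five number summary that follows the halves method described here, assuming that when the count is odd the median itself is left out of both halves. Statistical software often uses other quartile rules (for example, interpolation), so its Q1 and Q3 can differ slightly. For the science scores the summary is 6, 21, 25.5, 34, 49.

```python
def median(values):
    """Middle value of the sorted values, or the average of the two middle values."""
    ordered = sorted(values)
    n, mid = len(ordered), len(ordered) // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(data):
    """Return (low, Q1, median, Q3, high) using the lower and upper halves."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # values below the median
    upper = ordered[half + n % 2:]      # values above it (skip the middle value if n is odd)
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


scores = [22, 23, 14, 45, 39, 11, 9, 46, 22, 25, 6, 28, 33, 36, 16,
          39, 49, 17, 22, 32, 34, 22, 18, 21, 27, 34, 26, 41, 28, 25]
print(five_number_summary(scores))   # (6, 21, 25.5, 34, 49)
```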
Box and Whisker Plots The graph of the five number summary is called a box and whisker plot.
[Figure: box and whisker plot with points labeled min, Q1, med, Q3, max]
Percentiles If we were to divide the data into 100 equal parts, percentiles could be used to mark the dividing points in the data. A number is in the nth percentile of some data if it is greater than or equal to n% of the data.
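Under this definition, a value's percentile comes from the percent of the data it is greater than or equal to. The sketch below, with hypothetical helper names, takes the largest whole n that qualifies. For the science scores, 23 of the 30 scores are at most 34, so 34 is at or above about 76.7% of the data and lands in the 76th percentile.

```python
scores = [22, 23, 14, 45, 39, 11, 9, 46, 22, 25, 6, 28, 33, 36, 16,
          39, 49, 17, 22, 32, 34, 22, 18, 21, 27, 34, 26, 41, 28, 25]


def percent_at_or_below(data, x):
    """Percent of the data values that x is greater than or equal to."""
    return 100 * sum(1 for value in data if value <= x) / len(data)


def percentile_of(data, x):
    """Largest whole n such that x is >= n% of the data (int() rounds down here)."""
    return int(percent_at_or_below(data, x))


print(percent_at_or_below(scores, 34))   # about 76.67
print(percentile_of(scores, 34))         # 76
```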
Measures of Dispersion Definitions: The range is the difference between the largest and the smallest data values in the set. If $x$ is a data value in a set whose mean is $\bar{x}$, then $x - \bar{x}$ is called $x$'s deviation from the mean.
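A small made-up data set shows both definitions. The deviations always add up to 0, since values above the mean balance those below it, which is one reason measures of spread work with squared deviations instead.

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]          # made-up data set

mean = sum(data) / len(data)              # 40 / 8 = 5.0
data_range = max(data) - min(data)        # largest minus smallest: 9 - 2 = 7
deviations = [x - mean for x in data]     # each value's deviation from the mean

print(data_range)        # 7
print(deviations)        # [-3.0, -1.0, -1.0, -1.0, 0.0, 0.0, 2.0, 4.0]
print(sum(deviations))   # 0.0
```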
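Standard Deviation The standard deviation measures how far the data points lie from the mean "on average"; think of it as the "average deviation" of a data set. Formula: for $n$ data values $x_1, x_2, \ldots, x_n$ with mean $\bar{x}$, $s = \sqrt{\dfrac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n}}$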
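A minimal sketch, assuming the divide-by-n form of the standard deviation (the same as Python's `statistics.pstdev`); many texts divide by n - 1 for a sample, which is what `statistics.stdev` does. The made-up data set has mean 5 and squared deviations summing to 32, so the result is the square root of 32 / 8 = 4.

```python
import math


def standard_deviation(data):
    """Square root of the average squared deviation from the mean (divides by n)."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return math.sqrt(sum(squared_deviations) / n)


data = [2, 4, 4, 4, 5, 5, 7, 9]    # made-up values with mean 5
print(standard_deviation(data))     # 2.0
```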
Definition: The z-score, $z$, for a particular score, $x$, is $z = \dfrac{x - \bar{x}}{s}$. The z-score indicates how many standard deviations the number is away from the mean. Numbers above the mean have positive z-scores; numbers below the mean have negative z-scores.
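A minimal sketch of the z-score formula; the mean 70 and standard deviation 8 are hypothetical values chosen so the results come out whole.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (positive) or below (negative) the mean."""
    return (x - mean) / sd


# Hypothetical data set: mean 70, standard deviation 8.
print(z_score(86, 70, 8))   # 2.0: two standard deviations above the mean
print(z_score(62, 70, 8))   # -1.0: one standard deviation below the mean
print(z_score(70, 70, 8))   # 0.0: exactly at the mean
```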
Distributions Definitions: A collection of numerical information is called data or a distribution. A set of data listed with their frequencies is called a frequency distribution. When each item in a frequency distribution is listed with the percent of the time it occurs, we call the distribution a relative frequency distribution.
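Both kinds of distribution can be tabulated with `collections.Counter`; this sketch reuses the children's heights from the stem and leaf example. The helper names are our own, and the percents are left unrounded, so they add up to 100 apart from floating-point error.

```python
from collections import Counter


def frequency_distribution(data):
    """Each distinct value with how many times it occurs, in ascending order."""
    return dict(sorted(Counter(data).items()))


def relative_frequency_distribution(data):
    """Each distinct value with the percent of the time it occurs."""
    n = len(data)
    return {value: 100 * count / n for value, count in frequency_distribution(data).items()}


heights = [94, 105, 107, 108, 108, 120, 121, 122, 123]
print(frequency_distribution(heights)[108])                      # 2
print(round(relative_frequency_distribution(heights)[108], 1))   # 22.2
```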
Bar graphs can be drawn using frequency distributions or relative frequency distributions.
[Figures: bar graph using a relative frequency distribution; bar graph using a frequency distribution]
The Normal Distribution When describing a set of data, statisticians often look to the shape of the data. One special shape that occurs frequently is a bell curve. The bell curve indicates that a distribution is “normal”.
Characteristics of Normal Distribution
• Bell-shaped curve.
• Highest point of curve is at the mean.
• Mean = median = mode.
• Curve is symmetric about the mean.
• Total area under the curve is 1.
• Points of inflection lie 1 standard deviation away from the mean.
• About 68% of the data lies within one standard deviation of the mean, 95% within two, and 99.7% within three.
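The 68%, 95%, and 99.7% figures are rounded. A sketch using Python's `statistics.NormalDist` (Python 3.8 or later) recovers them; the mean 100 and standard deviation 15 are arbitrary, since the shares are the same for every normal distribution.

```python
from statistics import NormalDist

# Hypothetical normal distribution: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)


def share_within(dist, k):
    """Area under the curve within k standard deviations of the mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)


for k in (1, 2, 3):
    print(k, round(share_within(dist, k), 4))   # 0.6827, 0.9545, 0.9973

print(dist.cdf(100))   # 0.5: half the area lies below the mean, by symmetry
```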
Normal Distribution When discussing normal distributions, we assume we are dealing with an entire population rather than a sample. To indicate this, we change the symbols representing the mean and standard deviation. Mean: before $\bar{x}$, now $\mu$. Standard deviation: before $s$, now $\sigma$.
Area Under the Normal Curve Areas under the curve represent percentages (or probabilities) of values in a distribution. To address this idea properly and generally, we need something called the standard normal distribution. This distribution is also called a “z distribution.”
Standard Normal Distribution 68% of the data lies between z = -1 and z = 1; 95% of the data lies between z = -2 and z = 2; 99.7% of the data lies between z = -3 and z = 3.
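To find an area for any normal population, convert each value to a z-score and read the area from the standard normal distribution. A minimal sketch using `statistics.NormalDist()` (mean 0, standard deviation 1); the population mean 70 and standard deviation 10 are hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1: the z distribution


def area_between(x_low, x_high, mean, sd):
    """Share of a normal population between x_low and x_high, via z-scores."""
    z_low = (x_low - mean) / sd
    z_high = (x_high - mean) / sd
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)


# Hypothetical population: mean 70, standard deviation 10.
print(round(area_between(60, 90, 70, 10), 4))   # 0.8186: z runs from -1 to 2
```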
10.3 Misleading Graphs and Statistics Scaling and Axis Manipulation To make the differences among bars of a histogram or bar chart more dramatic, the axes are often manipulated, either by changing the scales or omitting the scale values.
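A quick calculation shows how much a cropped axis can exaggerate. The bar values 50 and 55 and the axis starting point 45 are made up: the true ratio is 1.1, but the drawn bars differ by a factor of 2.

```python
def apparent_ratio(a, b, axis_start):
    """Ratio of the drawn bar heights when the vertical axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)


# Hypothetical bars of 50 and 55 units.
print(apparent_ratio(50, 55, 0))    # 1.1: axis starts at 0, so this is the true ratio
print(apparent_ratio(50, 55, 45))   # 2.0: axis starts at 45, so one bar looks twice as tall
```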
Line Graphs and Cropping To manipulate a line graph, one can either compress the vertical scale, which makes changes look smaller, or stretch it, which makes changes look larger, whichever fits the desired effect.
Circle Graphs A circle graph can be misleading if it omits the percent amounts, draws sectors whose central angles do not match their percents, or "explodes" sectors by pulling them away from the rest of the circle.
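In a correctly drawn circle graph, each sector's central angle is its percent of 360°. This sketch computes the angles for a hypothetical survey; multiplying before dividing keeps whole-number answers exact in floating point.

```python
def central_angles(percents):
    """Central angle, in degrees, for each sector: its percent of 360."""
    return {label: pct * 360 / 100 for label, pct in percents.items()}


# Hypothetical survey results; the percents must total 100.
survey = {"Walk": 25, "Bus": 40, "Car": 35}
print(central_angles(survey))   # {'Walk': 90.0, 'Bus': 144.0, 'Car': 126.0}
```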
Sampling The entire group in question is called the population. The subset of the population that is actually questioned is called a sample. Bias A bias is a flaw in the sampling procedure that makes it more likely the sample will not represent the entire population.