Chapter 2 Descriptive Statistics
I. Section 2-1
  A. Steps to Constructing Frequency Distributions
    1. Determine the number of classes (may be given to you).
      a. Should be between 5 and 15 classes.
    2. Find the Range.
      a. The Maximum minus the Minimum.
        1) Use the TI-83 to sort the data.
          a) STAT – Edit – enter the numbers into L1.
          b) STAT – SortA(L1) will put the numbers into ascending order.
          c) STAT – SortD(L1) will put the numbers into descending order.
    3. Find the Class Width.
      a. Range divided by the number of classes.
        1) Always round UP!!
          a) Even if the class width comes out to a whole number, go up one.
    4. Find the Lower Limits.
      a. Begin with the minimum value in your data set, and then add the class width to it to get the next Lower Limit.
        1) Repeat as many times as needed to get the required number of classes.
    5. Find the Upper Limits.
      a. The Upper Limit of the first class is one less than the Lower Limit of the second class.
        1) Add the class width to each Upper Limit until you have the necessary number of classes.
    6. Find the Lower Boundaries.
      a. Subtract one-half unit from each Lower Limit. (Do NOT round these!)
    7. Find the Upper Boundaries.
      a. Add one-half unit to each Upper Limit.
    8. Find the Midpoint of each class.
      a. The mean of the Lower and Upper Limits. (Do NOT round.)
        1) Could also use the mean of the boundaries for this.
    9. Frequency Distribution
      a. Place a tally mark in each class for every piece of data that fits there.
      b. Add up the tally marks – these are your frequencies for each class.
    10. Relative Frequencies
      a. Divide each class frequency by the total number of data points to find the percentage of the total represented by each class.
    11. Cumulative Frequencies
      a. The total number of tallies for each class, plus all those that came before.
        1) The cumulative frequency of the last class must equal the number of data points used.
  B. Steps to Constructing a Frequency Histogram
    1. Label the horizontal axis with the class boundaries.
    2. Label the vertical axis with the frequencies.
    3. Draw a bar graph with bars that touch, using the frequencies from your frequency distribution.
  C. Steps to Constructing a Relative Frequency Histogram
    1. Label the horizontal axis with the class boundaries.
    2. Label the vertical axis with the relative frequencies (percentages).
    3. Draw a bar graph with bars that touch, using the relative frequencies from your frequency distribution.
  D. Steps to Constructing an Ogive
    1. Label the horizontal axis with the upper class boundaries.
    2. Label the vertical axis with the cumulative frequencies.
    3. Place a dot above each upper class boundary at the height of that class’s cumulative frequency, and connect the dots.
      a. This chart will always end at the total number of data points.
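The steps in Part A can be sketched in Python. This is a minimal sketch, assuming whole-number data (so "one unit" is 1 and "one-half unit" is 0.5). The function name `frequency_distribution` and the sample data are hypothetical and are not from these notes.

```python
import math


def frequency_distribution(data, num_classes):
    """Build a frequency table for whole-number data (assumed unit = 1)."""
    lowest, highest = min(data), max(data)
    data_range = highest - lowest  # step 2: Maximum minus Minimum

    # Step 3: always round UP; if the quotient is already whole, go up one.
    width = math.floor(data_range / num_classes) + 1

    total = len(data)
    cumulative = 0
    rows = []
    for i in range(num_classes):
        lower = lowest + i * width   # step 4: lower limits
        upper = lower + width - 1    # step 5: one less than the next lower limit
        freq = sum(1 for x in data if lower <= x <= upper)  # step 9: tally
        cumulative += freq           # step 11: running total
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),  # steps 6 and 7
            "midpoint": (lower + upper) / 2,           # step 8
            "frequency": freq,
            "relative": freq / total,                  # step 10
            "cumulative": cumulative,
        })
    return rows


# Hypothetical data set, for illustration only.
sample = [7, 12, 15, 18, 21, 21, 24, 27, 30, 33, 36, 41]
for row in frequency_distribution(sample, 5):
    print(row)
```

The `boundaries` values are what go on the horizontal axis of a histogram, and the `cumulative` column is what an ogive plots.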
II. Section 2-2
  A. Stem and Leaf Plot
    1. Use the extreme values (lowest and highest) as your starting point for the stems.
    2. Go through the data points, placing the leaves beside the appropriate stems.
    3. If you have too many data points, you can use two lines per stem, with leaves 0-4 on the first line and leaves 5-9 on the second line.
  B. Dot Plot
    1. Use a horizontal line, numbered from lowest data value to highest.
      a. Place a dot above the line at each data point.
        1) This allows you to see visually whether you have a tight grouping of data points, and where it is, if it exists.
  C. Pie Chart
    1. Used to describe parts of a whole.
      a. Multiply the relative frequency you calculated earlier by 360 (the number of degrees in a circle) to find the number of degrees for each class.
        1) The calculated number of degrees is the central angle of that class’s slice of the circle.
          a) Use a protractor to draw your angles.
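Two of the displays above reduce to short computations. The sketch below assumes non-negative whole-number data for the stem and leaf plot (the stem is everything but the last digit). The function names and sample values are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Group whole numbers by stem (all but the last digit) and leaf (last digit)."""
    stems = defaultdict(list)
    for x in sorted(data):
        stems[x // 10].append(x % 10)
    # Keep every stem between the extremes, even if it has no leaves.
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}


def pie_angles(frequencies):
    """Central angle in degrees for each class: relative frequency times 360."""
    total = sum(frequencies.values())
    return {label: 360 * f / total for label, f in frequencies.items()}


# Hypothetical values, for illustration only.
for stem, leaves in stem_and_leaf([12, 15, 21, 21, 24, 30, 47]).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))

print(pie_angles({"A": 10, "B": 20, "C": 10}))
```

The angles always add up to 360, which is a quick check before reaching for the protractor.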
  D. Scatter Plot
    1. Used to visually examine the possible relationship between two different variables.
      a. Place one variable on the vertical axis, and the other on the horizontal.
        1) Graph them as if one were the x-value of an ordered pair and the other were the y-value.
    2. The closer the dots are to falling along a straight line, the stronger the linear relationship.
      a. If the dots slope upward from left to right, the relationship has a positive correlation.
      b. If the dots slope downward from left to right, the relationship has a negative correlation.
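One common way to put a number on "how close to linear" is the Pearson correlation coefficient r. It is not covered in this section's notes, so treat the sketch below as an outside illustration: r near +1 means a strong upward pattern, r near -1 a strong downward pattern. The function name and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data of equal length."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ordered pairs, for illustration only.
xs = [1, 2, 3, 4]
print(pearson_r(xs, [2, 4, 6, 8]))  # dots rise along a straight line
print(pearson_r(xs, [8, 6, 4, 2]))  # dots fall along a straight line
```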
III. Section 2-3
  A. Measures of Central Tendency
    1. Mean – The sum of all data points divided by the number of values.
      a. This is the one that we most often think of when we say “average”.
        1) It’s also the one most affected by an extreme value (either high or low).
    2. Median – The middle number (or mean of the two middle numbers) when the data points are put into order.
      a. The point which has as many data values above it as there are below it.
    3. Mode – The value that happens the most often (highest frequency).
  B. Shapes of Distributions
    1. Symmetric – Data bunched in the middle, with equal distribution on either side.
    2. Uniform – Data is spread evenly across the whole spectrum.
    3. Skewed Data – Named by the “tail”.
      a. Skewed right means most of the data values are at the left (low) end of the range.
      b. Skewed left means most of the data values are at the right (high) end of the range.
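The three measures, and the way one extreme value pulls the mean, can be checked with Python's standard `statistics` module. The data values and the helper name `summarize` are hypothetical.

```python
import statistics


def summarize(values):
    """Return (mean, median, modes) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.multimode(values),  # list of every most-frequent value
    )


# Hypothetical data, for illustration only.
data = [2, 3, 3, 5, 7]
with_outlier = data + [100]

print(summarize(data))
print(summarize(with_outlier))
```

Adding the single value 100 moves the mean from 4 to 20, while the median only moves from 3 to 4 and the mode stays at 3.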
IV. Section 2-4
  A. Measures of Variation
    1. Range – The difference between the highest value and the lowest value. (Maximum minus Minimum)
      a. Easy to compute, but only uses two numbers from a data set.
    2. Deviation – The difference between the value of a data point and the mean of the data set.
      a. In a population, the deviation of x is x – μ. (μ is the Greek letter “mu”, pronounced “moo”)
      b. In a sample, the deviation of x is x – x̄. (x̄ is pronounced “x bar”)
      c. The sum of the deviations of a set of data will always be zero.
    3. Population Measures of Variance
      a. Population Variance – The sum of the squares of the deviations, divided by N (the number of data points in the population).
        1) Find the deviations, and then square them (this makes them all non-negative, so they don’t cancel each other out).
          a) Add up the squared deviations, and then divide by the number of data points.
      b. Population Standard Deviation – The square root of the population variance.
    4. Sample Measures of Variance
      a. Sample Variance – The sum of the squares of the deviations, divided by n – 1 (one less than the number of data points in the sample).
      b. Sample Standard Deviation – The square root of the sample variance.
  B. Empirical Rule
    1. All symmetric bell-shaped distributions have the following characteristics:
      a. About 68% of data points will occur within one standard deviation of the mean.
      b. About 95% of data points will occur within two standard deviations of the mean.
      c. About 99.7% of data points will occur within three standard deviations of the mean.
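The variance steps in Part A can be worked through by hand in a few lines. This is a minimal sketch with a hypothetical data set; the only difference between the population and sample versions is the divisor (N versus n – 1).

```python
import math

# Hypothetical data, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Deviation of each value from the mean; these always sum to zero.
deviations = [x - mean for x in data]
sum_of_squares = sum(d ** 2 for d in deviations)

population_variance = sum_of_squares / n    # divide by N
population_sd = math.sqrt(population_variance)

sample_variance = sum_of_squares / (n - 1)  # divide by n - 1
sample_sd = math.sqrt(sample_variance)

print("sum of deviations:", sum(deviations))
print("population:", population_variance, population_sd)
print("sample:", sample_variance, sample_sd)
```

For this data the mean is 5 and the squared deviations add up to 32, so the population variance is 32/8 = 4 and the population standard deviation is 2. The sample variance, 32/7, is slightly larger.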
  C. Chebychev’s Theorem
    1. This applies to ANY distribution, regardless of its shape.
      a. The portion of data lying within k standard deviations (k > 1) of the mean is at least 1 – 1/k².
        1) For k = 2, at least 1 – ¼ = ¾ or 75% of the data will be within 2 standard deviations of the mean.
        2) For k = 3, at least 1 – 1/9 = 8/9 or 88.9% of the data will be within 3 standard deviations of the mean.
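The bound 1 – 1/k² is easy to compute and to compare against an actual data set. The sketch below uses a deliberately lopsided, hypothetical data set to show that the guarantee holds even when the distribution is far from bell-shaped. The function names are assumptions for illustration.

```python
import statistics


def chebyshev_bound(k):
    """Minimum fraction of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def fraction_within(data, k):
    """Observed fraction of data within k population standard deviations of the mean."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    return sum(1 for x in data if abs(x - mu) <= k * sigma) / len(data)


# Hypothetical, strongly skewed data, for illustration only.
skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]

for k in (2, 3):
    print(k, chebyshev_bound(k), fraction_within(skewed, k))
```

The theorem gives a floor, not an estimate: the observed fraction is usually well above the bound.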
V. Section 2-5 – Measures of Position
  A. Quartiles
    1. Q1, Q2 and Q3 divide the ordered data into 4 equal parts.
      a. Q2 is the same as the median, or the middle value.
      b. Q1 is the median of the data below Q2.
      c. Q3 is the median of the data above Q2.
    2. Box and Whisker Plot
      a. Left whisker runs from the lowest data value to Q1.
      b. Box runs from Q1 to Q3, with a line through it at Q2.
        1) The distance from Q1 to Q3 is called the interquartile range.
      c. Right whisker runs from Q3 to the highest data value.
      d. To draw a box-and-whisker plot on the TI-83, follow these steps.
        1) Enter the data values into L1 in STAT Edit.
        2) Turn on your Stat Plots (2nd Y=), and select the plot type with the box-and-whisker shown.
        3) Set your window to match the data.
          a) Xmin should be less than your lowest data point.
          b) Xmax should be more than your highest data point.
        4) Press GRAPH. The box-and-whisker plot should appear.
          a) Press the TRACE button and you can see exactly which values make up the Min, Q1, Median, Q3, and the Max.
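The quartile rule above (Q1 is the median of the data below Q2, Q3 the median of the data above it) can be sketched as follows. This sketch assumes that when the number of data points is odd, the middle value itself is left out of both halves; other software may use a different convention and report slightly different quartiles. The function name and data are hypothetical.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves rule."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, skip the middle value so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )


# Hypothetical data, for illustration only.
low, q1, q2, q3, high = five_number_summary([7, 15, 36, 39, 40, 41, 42, 43, 47, 49])
print(low, q1, q2, q3, high)
print("interquartile range:", q3 - q1)
```

These five values are the same ones the TRACE button steps through on the box-and-whisker plot.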
  B. Percentiles
    1. Divide the data into 100 parts. There are 99 percentiles (P1, P2, P3, … P99).
      a. P50 = Q2 = the median.
      b. P25 = Q1
      c. P75 = Q3
    2. A 63rd percentile score means that this person did as well as or better than 63% of the people who took that test.
    3. The cumulative frequency from Section 2-1 can help us find the percentile.
  C. Z-Scores
    1. Also called the “standard score”, it represents the number of standard deviations that a data value is away from the mean.
      a. z = (value – mean) / standard deviation = (x – μ) / σ
    2. A z-score of less than -2 or greater than 2 is considered to be unusual.
      a. Remember that about 95% of data points should be within 2 standard deviations of the mean.
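Both ideas come down to one-line formulas. The sketch below assumes the "as well as or better than" reading of a percentile, so a value's percentile is the share of data points at or below it (its cumulative frequency divided by the total). The function names, the mean of 70, and the standard deviation of 8 are hypothetical.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


def is_unusual(z):
    """A z-score below -2 or above 2 is considered unusual."""
    return abs(z) > 2


def percentile_rank(data, x):
    """Percentage of data points that are less than or equal to x."""
    at_or_below = sum(1 for value in data if value <= x)
    return 100 * at_or_below / len(data)


# Hypothetical test scores with an assumed mean of 70 and standard deviation of 8.
for score in (62, 90):
    z = z_score(score, 70, 8)
    print(score, z, is_unusual(z))

print(percentile_rank([50, 60, 70, 80, 90], 70))
```

A score of 90 here is 2.5 standard deviations above the mean, so it counts as unusual; a score of 62 is one standard deviation below and does not.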