Chapter 2

wesley

Presentation Transcript


  1. Chapter 2 Frequency Distributions, Stem-and-leaf displays, and Histograms

  2. Where have we been?

  3. To calculate SS, the variance, and the standard deviation: find the deviations from $\mu$, square and sum them ($SS$), divide by $N$ ($\sigma^2$), and take the square root ($\sigma$). Example: scores on a Psychology quiz

     Student  | $X$ | $X - \mu$ | $(X - \mu)^2$
     John     |  7  |   +1.00   |     1.00
     Jennifer |  8  |   +2.00   |     4.00
     Arthur   |  3  |   -3.00   |     9.00
     Patrick  |  5  |   -1.00   |     1.00
     Marie    |  7  |   +1.00   |     1.00

     $\Sigma X = 30$, $N = 5$, $\mu = \Sigma X / N = 6.00$
     $\Sigma(X - \mu) = 0.00$
     $SS = \Sigma(X - \mu)^2 = 16.00$
     $\sigma^2 = SS/N = 3.20$
     $\sigma = \sqrt{\sigma^2} = 1.79$
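The same steps can be sketched in Python using the quiz scores from the slide. The function name `describe` is hypothetical; like the slide, it divides by N (the population formula), not N - 1.

```python
# Quiz scores from the slide (N = 5)
scores = {"John": 7, "Jennifer": 8, "Arthur": 3, "Patrick": 5, "Marie": 7}

def describe(values):
    """Return the mean, sum of deviations, SS, variance, and SD of a list of scores."""
    n = len(values)
    mu = sum(values) / n
    deviations = [x - mu for x in values]   # X - mu for each score
    ss = sum(d ** 2 for d in deviations)    # square and sum the deviations
    variance = ss / n                       # divide by N (population variance)
    sd = variance ** 0.5                    # take the square root
    return mu, sum(deviations), ss, variance, sd

mu, dev_sum, ss, var, sd = describe(list(scores.values()))
print(f"mu = {mu:.2f}, sum of deviations = {dev_sum:.2f}, SS = {ss:.2f}")
print(f"variance = {var:.2f}, SD = {sd:.2f}")
```

The sum of the deviations is always zero, which is why the deviations must be squared before they can summarize spread.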

  4. Ways of showing how scores are distributed around the mean • Frequency Distributions, • Stem-and-leaf displays • Histograms

  5. Some definitions • Frequency Distribution - a tabular display of the way scores are distributed across all the possible values of a variable. • Absolute Frequency Distribution - displays the count of each score. • Cumulative Frequency Distribution - displays the total number of scores at and below each score. • Relative Frequency Distribution - displays the proportion of each score. • Relative Cumulative Frequency Distribution - displays the proportion of scores at and below each score.
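A minimal sketch that builds all four distributions at once, using the quiz scores from slide 3 as toy input. It assumes integer-valued scores and treats every integer between the lowest and highest observed score as a possible value (so values nobody scored appear with a count of 0). The function name `frequency_tables` is hypothetical.

```python
from collections import Counter

def frequency_tables(scores):
    """Return rows of (value, absolute, cumulative, relative, relative cumulative)."""
    n = len(scores)
    counts = Counter(scores)
    rows = []
    running = 0
    # Walk the possible values from lowest to highest, including zero-count values
    for value in range(min(scores), max(scores) + 1):
        f = counts[value]          # Counter returns 0 for values that never occur
        running += f               # scores at and below this value
        rows.append((value, f, running, f / n, running / n))
    return rows

quiz = [7, 8, 3, 5, 7]
for value, f, cf, rf, crf in frequency_tables(quiz):
    print(f"{value:>2} {f:>2} {cf:>2} {rf:.3f} {crf:.3f}")
```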

  6. Example Data • Traffic accidents by bus drivers • Studied 708 bus drivers. • Recorded all accidents for a period of 4 years. • Data looks like: 3, 0, 6, 0, 0, 2, 1, 4, 1, … 6, 0, 2

  7. Frequency Distributions
     # of accidents | Absolute Freq. | Relative Frequency
     0              |      117       |       .165
     1              |      157       |       .222
     2              |      158       |       .223
     3              |      115       |       .162
     4              |       78       |       .110
     5              |       44       |       .062
     6              |       21       |       .030
     7              |        7       |       .010
     8              |        6       |       .008
     9              |        1       |       .001
     10             |        3       |       .004
     11             |        1       |       .001
     Total          |      708       |       .998 (notice the rounding error)
     Calculate relative frequency by dividing each absolute frequency by N. For example, 117/708 = .165.
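A short sketch of the relative-frequency calculation using the accident counts from the table. It also shows where the .998 total comes from: each proportion is rounded to three places before summing, while the unrounded proportions sum to 1.

```python
# Absolute frequencies for 0 through 11 accidents (708 bus drivers)
accident_counts = [117, 157, 158, 115, 78, 44, 21, 7, 6, 1, 3, 1]
n = sum(accident_counts)  # 708

relative = [f / n for f in accident_counts]      # divide each frequency by N
rounded = [round(p, 3) for p in relative]        # three decimals, as in the table

print(n)                        # 708
print(rounded)
print(round(sum(rounded), 3))   # total of the rounded proportions
print(sum(relative))            # unrounded total: 1, up to floating-point error
```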

  8. What can you answer?
     # of accidents | Relative Freq.
     0              |     .165
     1              |     .222
     2              |     .223
     3              |     .162
     4              |     .110
     5              |     .062
     6              |     .030
     7              |     .010
     8              |     .008
     9              |     .001
     10             |     .004
     11             |     .001
     Total          |     .998
     • Proportion with at most 1 accident? = .165 + .222 = .387; .387 * 100 = 38.7%
     • Proportion with 8 or more accidents? = .008 + .001 + .004 + .001 = .014 = 1.4% (computed directly from the counts, 11/708 ≈ .016; summing rounded proportions accumulates rounding error)
     • Proportion with between 4 and 7 accidents? = .110 + .062 + .030 + .010 = .212 = 21.2%
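One way to avoid accumulated rounding error is to add the counts first and divide by N once. This sketch uses exact fractions; the helper name `proportion` is hypothetical. It reproduces .387 and .212, and gives 11/708 ≈ .016 for "8 or more", where summing the rounded proportions gives .014.

```python
from fractions import Fraction

# Index = number of accidents (0 through 11)
accident_counts = [117, 157, 158, 115, 78, 44, 21, 7, 6, 1, 3, 1]
n = sum(accident_counts)

def proportion(low, high):
    """Exact proportion of drivers with between low and high accidents, inclusive."""
    return Fraction(sum(accident_counts[low:high + 1]), n)

at_most_1 = proportion(0, 1)       # 274/708
eight_or_more = proportion(8, 11)  # 11/708
four_to_seven = proportion(4, 7)   # 150/708

for label, p in [("at most 1", at_most_1), ("8 or more", eight_or_more),
                 ("4 to 7", four_to_seven)]:
    print(f"{label}: {float(p):.3f}")
```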

  9. Cumulative Frequencies
     # of accidents | Absolute Frequency | Cumulative Frequency | Cumulative Relative Frequency
     0              |        117         |         117          |             .165
     1              |        157         |         274          |             .387
     2              |        158         |         432          |             .610
     3              |        115         |         547          |             .773
     4              |         78         |         625          |             .883
     5              |         44         |         669          |             .945
     6              |         21         |         690          |             .975
     7              |          7         |         697          |             .984
     8              |          6         |         703          |             .993
     9              |          1         |         704          |             .994
     10             |          3         |         707          |             .999
     11             |          1         |         708          |            1.000
     Total          |        708         |                      |
     • Cumulative frequencies show the number of scores at or below each point. Calculate them by adding the frequency at each point to the frequencies of all points below it.
     • Cumulative relative frequencies show the proportion of scores at or below each point. Calculate them by dividing the cumulative frequency at each point by N.
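The two calculations above can be sketched with a running total (`itertools.accumulate`) followed by a division by N. Computing from the counts gives 697/708 ≈ .984 for 7 accidents.

```python
from itertools import accumulate

accident_counts = [117, 157, 158, 115, 78, 44, 21, 7, 6, 1, 3, 1]
n = sum(accident_counts)

# Running total: the frequency at each point plus all frequencies below it
cumulative = list(accumulate(accident_counts))
# Divide each cumulative frequency by N, rounded to three places
cumulative_relative = [round(cf / n, 3) for cf in cumulative]

for accidents, (cf, crf) in enumerate(zip(cumulative, cumulative_relative)):
    print(f"{accidents:>2} {cf:>4} {crf:.3f}")
```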

  10. Grouped Frequency Example: the average time (in seconds) that each of 100 high school students took to read ambiguous sentences. Values range between 2.50 seconds and 2.99 seconds.

  11. Grouped Frequencies • Needed when • the number of values is large OR • the values are continuous. • To calculate group intervals: • First find the range. • Determine a "good" interval based on • the number of resulting intervals, • the meaning of the data, and • common, regular numbers. • List intervals from largest to smallest.

  12. Grouped Frequencies
     Range = 2.99 - 2.50 = .49 ≈ .50
     i = .1 gives #i = 5; i = .05 gives #i = 10

     Reading Time (i = .05) | Frequency
     2.95-2.99              |     9
     2.90-2.94              |     7
     2.85-2.89              |    20
     2.80-2.84              |    11
     2.75-2.79              |    10
     2.70-2.74              |    10
     2.65-2.69              |     4
     2.60-2.64              |     8
     2.55-2.59              |    10
     2.50-2.54              |    11

     Reading Time (i = .1)  | Frequency
     2.90-2.99              |    16
     2.80-2.89              |    31
     2.70-2.79              |    20
     2.60-2.69              |    12
     2.50-2.59              |    21
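Because each i = .1 interval covers exactly two adjacent i = .05 intervals, the coarser table can be derived from the finer one by adding consecutive pairs. A small sketch using the slide's frequencies:

```python
# Frequencies for i = .05, listed from 2.95-2.99 down to 2.50-2.54 (from the slide)
freq_05 = [9, 7, 20, 11, 10, 10, 4, 8, 10, 11]

# e.g. 2.90-2.99 = (2.95-2.99) + (2.90-2.94), so add consecutive pairs
freq_10 = [freq_05[k] + freq_05[k + 1] for k in range(0, len(freq_05), 2)]

print(sum(freq_05), sum(freq_10))  # both 100
print(freq_10)                     # [16, 31, 20, 12, 21]
```

The reverse is not possible: the five-interval table alone cannot tell you how each count splits between the two halves.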

  13. Either is acceptable. • Use whichever display seems most informative. • In this case, the table with smaller intervals and 10 categories seems more informative. • Sometimes it goes the other way, and a less detailed presentation is needed to keep the reader from missing the forest for the trees.

  14. Stem and Leaf Displays • Used when seeing all of the values is important. • Shows • data grouped • all values • visual summary

  15. Stem and Leaf Display • Reading time data (i = .05, #i = 10)
     Reading Time | Leaves
     2.9          | 5,5,6,6,6,6,8,8,9
     2.9          | 0,0,1,2,3,3,3
     2.8          | 5,5,5,5,5,6,6,6,7,7,7,7,7,7,7,8,9,9,9,9
     2.8          | 0,0,1,2,3,3,3,3,4,4,4
     2.7          | 5,5,5,5,6,6,6,8,9,9
     2.7          | 0,0,0,1,2,3,3,3,4,4
     2.6          | 5,6,6,6
     2.6          | 0,1,1,1,2,3,3,4
     2.5          | 6,6,8,8,8,8,8,9,9,9
     2.5          | 0,1,1,1,2,2,2,4,4,4,4

  16. Stem and Leaf Display • Reading time data (i = .1, #i = 5)
     Reading Time | Leaves
     2.9          | 0,0,1,2,3,3,3,5,5,6,6,6,6,8,8,9
     2.8          | 0,0,1,2,3,3,3,3,4,4,4,5,5,5,5,5,6,6,6,7,7,7,7,7,7,7,8,9,9,9,9
     2.7          | 0,0,0,1,2,3,3,3,4,4,5,5,5,5,6,6,6,8,9,9
     2.6          | 0,1,1,1,2,3,3,4,5,6,6,6
     2.5          | 0,1,1,1,2,2,2,4,4,4,4,6,6,8,8,8,8,8,9,9,9
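Since a stem-and-leaf display keeps every value, the 100 reading times can be recovered from it. This sketch rebuilds them from the i = .1 display (each value = stem + leaf/100), then checks the count, the mean of 2.755 given for the reading data, and the i = .05 grouped frequencies. Values are stored as integer hundredths to sidestep floating-point representation issues; the variable names are illustrative.

```python
from collections import Counter

# Leaves from the i = .1 display, one character per leaf
display = {
    "2.9": "0012333556666889",
    "2.8": "0012333344455555666" + "7777777" + "89999",
    "2.7": "00012333445555666899",
    "2.6": "011123345666",
    "2.5": "011122244446688888999",
}

# Value in hundredths of a second: stem "2.9" -> 290, plus the leaf
hundredths = [int(stem.replace(".", "")) * 10 + int(leaf)
              for stem, leaves in display.items() for leaf in leaves]

n = len(hundredths)
mean = sum(hundredths) / n / 100
print(n, round(mean, 3))

# Regroup with i = .05: lower limit = value rounded down to a multiple of 5 hundredths
groups = Counter(h // 5 * 5 for h in hundredths)
for lower in sorted(groups, reverse=True):
    print(f"{lower / 100:.2f}-{(lower + 4) / 100:.2f}: {groups[lower]}")
```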

  17. Transition to Histograms [Figure: the stem-and-leaf leaves rearranged into vertical columns]

  18. Histogram of reading times [Figure: histogram; x-axis: Reading Time (seconds); y-axis: Frequency, 0 to 20]
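A rough text rendering of this histogram, drawn sideways so each line is one interval and the bar length equals its frequency, using the i = .05 table. The function name `text_histogram` is hypothetical. Unlike a real histogram, a text version cannot show bars meeting at the real limits, so treat it only as a quick check of shape.

```python
# Grouped frequencies (i = .05) from the reading-time table, lowest interval first
intervals = ["2.50-2.54", "2.55-2.59", "2.60-2.64", "2.65-2.69", "2.70-2.74",
             "2.75-2.79", "2.80-2.84", "2.85-2.89", "2.90-2.94", "2.95-2.99"]
frequencies = [11, 10, 8, 4, 10, 10, 11, 20, 7, 9]

def text_histogram(labels, counts, symbol="#"):
    """Return one line per interval with a bar whose length equals its frequency."""
    return [f"{label} | {symbol * count} {count}" for label, count in zip(labels, counts)]

for line in text_histogram(intervals, frequencies):
    print(line)
```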

  19. Histogram concepts - 1 • Used to display continuous data. • Discrete data are shown on a bar graph. • But most psychology data are continuous, even if they are measured with integers.

  20. Histogram concepts - 2 • Use bar graphs, not histograms, for discrete data. • You rarely see data that is really discrete. • Discrete data are categories or rankings. • If you have continuous data, you can use histograms, but remember real class limits. • Histograms can be used for relative frequencies as well.

  21. What are the real limits of each class? [Figure: histogram of reading times; y-axis: Frequency] • Real limits of the fifth class are ???? - ????. • Real limits of the highest class are ???? - ????.

  22. What are the real limits of each class? [Figure: histogram of reading times; y-axis: Frequency] • Real limits of the fifth class are 2.695 - 2.745. • Real limits of the highest class are 2.945 - 2.995.
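The answers follow from extending each apparent limit by half the unit of measurement. This sketch assumes, as the two-decimal data suggest, that reading times were recorded to 0.01 s, so the adjustment is 0.005. It uses `Decimal` so the limits print exactly; the function name `real_limits` is hypothetical.

```python
from decimal import Decimal

def real_limits(lower, upper, unit="0.01"):
    """Extend apparent class limits by half the unit of measurement on each side."""
    half = Decimal(unit) / 2
    return Decimal(lower) - half, Decimal(upper) + half

print(real_limits("2.70", "2.74"))  # fifth class from the bottom
print(real_limits("2.95", "2.99"))  # highest class
```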

  23. Predicting from Theoretical Distributions • Theoretical distributions show how scores can be expected to be distributed around the mean. (Mean = 2.755 for reading data). • Distributions are named after the shapes of their histograms: • Rectangular • J-shaped • Bell (Normal) • many others

  24. Rectangular Distribution of scores

  25. Flipping a coin: 100 flips - how many heads and tails do you expect? [Figure: blank bar chart; x-axis: Heads, Tails; y-axis: 0 to 100]

  26. Rolling a die: 120 rolls - how many of each number do you expect? [Figure: blank bar chart; x-axis: 1 to 6; y-axis: 0 to 100]
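For a rectangular distribution every outcome has the same theoretical relative frequency, 1/k for k outcomes, so each expected frequency is N/k. A minimal sketch for the coin and die questions; the function name `expected_uniform` is hypothetical.

```python
from fractions import Fraction

def expected_uniform(outcomes, n):
    """Expected frequency of each outcome in a rectangular distribution: (1/k) * N."""
    p = Fraction(1, len(outcomes))
    return {outcome: p * n for outcome in outcomes}

coin = expected_uniform(["Heads", "Tails"], 100)
die = expected_uniform([1, 2, 3, 4, 5, 6], 120)
print({k: int(v) for k, v in coin.items()})  # 50 heads, 50 tails
print({k: int(v) for k, v in die.items()})   # 20 of each number
```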

  27. Rolling 2 dice
     Dice Total | Absolute Freq. | Relative Frequency
     1          |       0        |       .000
     2          |       1        |       .028
     3          |       2        |       .056
     4          |       3        |       .083
     5          |       4        |       .111
     6          |       5        |       .139
     7          |       6        |       .167
     8          |       5        |       .139
     9          |       4        |       .111
     10         |       3        |       .083
     11         |       2        |       .056
     12         |       1        |       .028
     Total      |      36        |      1.001
     How many combinations are possible?
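The absolute frequencies in this table can be checked by listing every ordered pair of faces with `itertools.product` and counting the totals. A total of 1 is impossible, which is why it has a frequency of 0.

```python
from collections import Counter
from itertools import product

# Every ordered pair (first die, second die)
combinations = list(product(range(1, 7), repeat=2))
totals = Counter(a + b for a, b in combinations)

print(len(combinations))   # 36
for total in range(1, 13):
    f = totals[total]      # Counter returns 0 for a total of 1
    print(f"{total:>2} {f} {f / len(combinations):.3f}")
```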

  28. Rolling 2 dice: 360 rolls - how many of each number do you expect? [Figure: blank bar chart; x-axis: Dice Total 1 to 12; y-axis: 0 to 100]
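Applying expected frequency = theoretical relative frequency * N to 360 rolls gives 10 expected rolls per combination. This sketch recomputes the 36 combinations and uses exact fractions so the expected counts come out as whole numbers.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))
n_rolls = 360

# Expected frequency = theoretical relative frequency * N
expected = {t: Fraction(totals[t], 36) * n_rolls for t in range(2, 13)}
print({t: int(e) for t, e in expected.items()})
```

Totals of 2 and 12 are expected about 10 times each, and a total of 7 about 60 times.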

  29. Normal Curve

  30. J Curve • Occurs when socially normative behaviors are measured. Most people follow the norm, but there are always a few outliers.

  31. Principles of Theoretical Curves • Expected frequency = Theoretical relative frequency * N • Expected frequencies are your best estimates because, on average, they are closer to the observed frequencies than any other estimate when the error is squared. • Law of Large Numbers - The more observations we have, the closer the relative frequencies should come to the theoretical distribution.
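A small simulation can illustrate the Law of Large Numbers for the two-dice distribution. It uses Python's pseudo-random generator with an arbitrary fixed seed, and measures the largest gap between observed and theoretical relative frequencies; the function name `max_gap` is hypothetical. The gap tends to shrink as N grows, though it is not guaranteed to fall at every step.

```python
import random
from collections import Counter
from itertools import product

# Theoretical relative frequencies of two-dice totals (2 through 12)
theory = {t: c / 36 for t, c in
          Counter(a + b for a, b in product(range(1, 7), repeat=2)).items()}

def max_gap(n_rolls, rng):
    """Largest |observed - theoretical| relative frequency after n_rolls simulated rolls."""
    observed = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls))
    return max(abs(observed[t] / n_rolls - p) for t, p in theory.items())

rng = random.Random(2024)  # fixed seed so the run is repeatable
for n in (100, 1_000, 10_000, 100_000):
    print(f"N = {n:>7}: largest gap = {max_gap(n, rng):.4f}")
```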

  32. Q & A: Continuous data • HOW IS THE FACT THAT WE ARE DISPLAYING CONTINUOUS DATA SHOWN ON A HISTOGRAM AS OPPOSED TO A BAR GRAPH? • The bars of a histogram meet at the real limits of each interval. • IF DATA CAN ONLY BE INTEGERS (SUCH AS THE NUMBER OF TRUE/FALSE QUESTIONS ANSWERED CORRECTLY ON A PSYCH QUIZ), HOW COME IT IS CALLED CONTINUOUS DATA? • Whether data is continuous or discrete depends on what you're measuring, not on the accuracy of your measuring instrument. For example, distance is continuous whether you measure it with a yardstick or a micrometer. Knowledge, like self-confidence and other psychological variables, is probably best thought of as a continuous variable.

  33. Determining "i" (the size of the interval) • WHAT IS THE RULE FOR DETERMINING THE SIZE OF THE INTERVALS IN WHICH TO GROUP DATA? • Whatever intervals seem to present the data most informatively. It is a matter of judgement. Usually we use 6 to 12 equal-size intervals, each with intuitively obvious endpoints (e.g., ending in 5s and 0s).
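Judgement still decides the final choice, but the 6 to 12 guideline can be applied as a quick filter over "round" candidate widths. This sketch assumes the candidate list below (an illustrative choice, not from the slides) and uses the reading-time range of .50; the function name `candidate_widths` is hypothetical. Exact fractions avoid floating-point surprises when dividing the range by the width.

```python
import math
from fractions import Fraction

def candidate_widths(data_range, widths, min_intervals=6, max_intervals=12):
    """Return (width, number of intervals) for widths giving 6 to 12 intervals."""
    result = []
    for w in widths:
        k = math.ceil(Fraction(data_range) / Fraction(w))  # intervals needed to cover the range
        if min_intervals <= k <= max_intervals:
            result.append((w, k))
    return result

# Reading-time data: range .49, rounded to .50 as on the slide
round_widths = ["0.01", "0.02", "0.05", "0.1", "0.2", "0.25", "0.5"]
print(candidate_widths("0.50", round_widths))  # [('0.05', 10)]
```

For this data only i = .05 lands in the 6 to 12 range, which agrees with the slides' preference for the 10-interval table.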
