
S1: Chapter 4 Representation of Data

S1: Chapter 4 Representation of Data. Dr J Frost (jfrost@tiffin.kingston.sch.uk). Last modified: 9th September 2013. Stem and Leaf recap. Put the following measurements into a stem and leaf diagram:





Presentation Transcript


  1. S1: Chapter 4 Representation of Data Dr J Frost (jfrost@tiffin.kingston.sch.uk) Last modified: 9th September 2013

  2. Stem and Leaf recap Put the following measurements into a stem and leaf diagram: 4.7 3.6 3.8 4.7 4.1 2.2 3.6 4.0 4.4 5.0 3.7 4.6 4.8 3.7 3.2 2.5 3.6 4.5 4.7 5.2 4.7 4.2 3.8 5.1 1.4 2.1 3.5 4.2 2.4 5.1
     1 | 4                          (1)
     2 | 1 2 4 5                    (4)
     3 | 2 5 6 6 6 7 7 8 8          (9)
     4 | 0 1 2 2 4 5 6 7 7 7 7 8    (12)
     5 | 0 1 1 2                    (4)
     Key: 2 | 1 means 2.1
     Now find: ? ? ? ?
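The same construction can be sketched in Python. This is a minimal sketch assuming every value is recorded to exactly one decimal place, so the units digit is the stem and the tenths digit is the leaf; the function name `stem_and_leaf` is our own.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group one-decimal-place values into {stem: sorted list of leaves}."""
    rows = defaultdict(list)
    for v in values:
        tenths = round(v * 10)                  # e.g. 4.7 -> 47, rounding away float noise
        rows[tenths // 10].append(tenths % 10)  # stem = units digit, leaf = tenths digit
    return {stem: sorted(leaves) for stem, leaves in sorted(rows.items())}


data = [4.7, 3.6, 3.8, 4.7, 4.1, 2.2, 3.6, 4.0, 4.4, 5.0,
        3.7, 4.6, 4.8, 3.7, 3.2, 2.5, 3.6, 4.5, 4.7, 5.2,
        4.7, 4.2, 3.8, 5.1, 1.4, 2.1, 3.5, 4.2, 2.4, 5.1]

diagram = stem_and_leaf(data)
for stem, leaves in diagram.items():
    # print each row with its leaf count in brackets, as on the slide
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}  ({len(leaves)})")
print("Key: 2 | 1 means 2.1")
```

Working in whole tenths (via `round`) sidesteps floating-point noise in products like `v * 10`.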

  3. Back-to-Back Stem and Leaf recap Girls: 80 84 91 80 98 40 60 64 72 96 85 88 76 54 58 92 80 79 Boys: 60 91 65 67 75 46 72 71 57 64 60 50 68 [Back-to-back stem and leaf diagram: girls' leaves on the left, stems 4 to 9 in the centre, boys' leaves on the right.] Key: 0 | 4 | 6 means 40 for girls and 46 for boys. The data above shows the pulse rates of boys and girls in a school. Comment on the results. The back-to-back stem and leaf diagram shows that boys' pulse rates tend to be lower than girls'.
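A short Python sketch that builds the back-to-back rows from the girls' and boys' pulse rates exactly as listed on this slide, and compares their medians. The helper names are our own; the left-hand leaves are written outwards from the stem, which is the usual layout for a back-to-back diagram.

```python
from collections import defaultdict
from statistics import median


def leaves_by_stem(values):
    """Map each tens digit (stem) to the list of units digits (leaves)."""
    rows = defaultdict(list)
    for v in values:
        rows[v // 10].append(v % 10)
    return rows


def back_to_back(left, right):
    """Text rows: left leaves read outwards (largest nearest the stem), stem, right leaves."""
    lrows, rrows = leaves_by_stem(left), leaves_by_stem(right)
    stems = range(min(min(lrows), min(rrows)), max(max(lrows), max(rrows)) + 1)
    lines = []
    for s in stems:
        # .get avoids adding empty stems to the dictionaries
        left_leaves = " ".join(str(x) for x in sorted(lrows.get(s, []), reverse=True))
        right_leaves = " ".join(str(x) for x in sorted(rrows.get(s, [])))
        lines.append(f"{left_leaves:>12} | {s} | {right_leaves}")
    return lines


girls = [80, 84, 91, 80, 98, 40, 60, 64, 72, 96, 85, 88, 76, 54, 58, 92, 80, 79]
boys = [60, 91, 65, 67, 75, 46, 72, 71, 57, 64, 60, 50, 68]

for row in back_to_back(girls, boys):
    print(row)
print("Key: 0 | 4 | 6 means 40 for girls and 46 for boys")
print("Median girls:", median(girls), " Median boys:", median(boys))
```

With these lists the medians are 80 for the girls and 65 for the boys, which supports the comment that boys' pulse rates tend to be lower.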

  4. Box Plot recap Box plots allow us to visually represent the distribution of the data. [Diagram: a box plot on a scale from 0 to 30, with the range and the IQR marked.] How is the IQR represented in this diagram? How is the range represented in this diagram?

  5. Box Plots recap Sketch a box plot to represent the given weights of cats: 5lb, 6lb, 7.5lb, 8lb, 8lb, 9lb, 12lb, 14lb, 20lb. [Axis for the sketch: 0 to 24.]
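To place the five marks of the box plot we need the minimum, lower quartile, median, upper quartile and maximum. The sketch below assumes the quartile convention commonly taught for S1 (find k = n/4, n/2 or 3n/4; if k is a whole number, average the k-th and (k+1)-th values, otherwise round k up); other conventions, including the default in Python's `statistics.quantiles`, can give slightly different quartiles. The function names are our own.

```python
import math


def quantile_position_rule(sorted_data, p):
    """Value at fraction p (0 < p < 1) by the assumed 'k = n*p' rule:
    whole k -> average of the k-th and (k+1)-th values,
    otherwise -> the value at position ceil(k) (positions are 1-based)."""
    n = len(sorted_data)
    k = n * p
    if float(k).is_integer():
        k = int(k)
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    return sorted_data[math.ceil(k) - 1]


def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum) under the rule above."""
    data = sorted(values)
    return (data[0],
            quantile_position_rule(data, 0.25),
            quantile_position_rule(data, 0.5),
            quantile_position_rule(data, 0.75),
            data[-1])


cats = [5, 6, 7.5, 8, 8, 9, 12, 14, 20]  # weights in lb
print(five_number_summary(cats))
```

For the cat weights this gives 5, 7.5, 8, 12 and 20 (lb).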

  6. Outliers An outlier is an extreme value. [Diagram: a box plot on a scale from 0 to 30, showing the point beyond which values are outliers.] More specifically, a value is generally treated as an outlier when it lies more than 1.5 IQRs below the lower quartile or above the upper quartile. (But you will be told in the exam if the rule differs from this.)
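The 1.5 IQR rule translates directly into two boundaries. In this sketch (function names our own), the quartiles Q1 = 7.5 lb and Q3 = 12 lb for the cat weights on slide 5 come from the n/4 convention, and a value strictly beyond a boundary is flagged.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) boundaries: k IQRs below Q1 and k IQRs above Q3."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3, k=1.5):
    """List the values lying strictly outside the boundaries."""
    low, high = outlier_fences(q1, q3, k)
    return [v for v in values if v < low or v > high]


cats = [5, 6, 7.5, 8, 8, 9, 12, 14, 20]
print(outlier_fences(7.5, 12))       # (0.75, 18.75)
print(find_outliers(cats, 7.5, 12))  # [20]
```

The `k` parameter is there because, as the slide says, an exam may state a different multiple of the IQR.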

  7. Outliers We can display outliers as crosses on a box plot. But if we have one, how do we display the marks for the minimum/maximum? [First box plot, on a scale from 0 to 30:] The maximum point is not an outlier, so it remains unchanged. [Second box plot, on a scale from 0 to 30:] But we have points that are outliers here. This mark becomes the ‘outlier boundary’, rather than the minimum.
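A sketch of where the whiskers end once outliers are drawn as crosses. `to_boundary=True` follows this slide (the whisker stops at the outlier boundary); `to_boundary=False` stops at the most extreme value that is not an outlier, another convention you may meet. The boundaries 0.75 lb and 18.75 lb used below are the 1.5 IQR limits for the cat weights, from Q1 = 7.5 lb and Q3 = 12 lb; the function name is our own.

```python
def whisker_ends(values, low_fence, high_fence, to_boundary=True):
    """Return (left, right) whisker ends when outliers are plotted separately."""
    inside = [v for v in values if low_fence <= v <= high_fence]
    lo, hi = min(inside), max(inside)  # most extreme non-outliers
    if to_boundary:
        # on a side that has outliers, end the whisker at the boundary instead
        if min(values) < low_fence:
            lo = low_fence
        if max(values) > high_fence:
            hi = high_fence
    return lo, hi


cats = [5, 6, 7.5, 8, 8, 9, 12, 14, 20]
print(whisker_ends(cats, 0.75, 18.75))                     # (5, 18.75)
print(whisker_ends(cats, 0.75, 18.75, to_boundary=False))  # (5, 14)
```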

  8. Examples [Two box plot examples, each on a scale from 0 to 30.]

  9. Exercises Page 58 Exercise 4B Q2; Page 59 Exercise 4C Q1, 2

  10. Comparing Box Plots [Box plots comparing house prices in Croydon and Kingston-upon-Thames, on a scale from £100k to £450k.] “Compare the prices of houses in Croydon with those in Kingston.” (2 marks) • For 1 mark, one of: • The interquartile range of house prices in Kingston is greater than in Croydon. • The range of house prices in Kingston is greater than in Croydon. • i.e. something spread-related. • For 1 mark: • The median house price in Kingston was greater than that in Croydon. • i.e. compare some measure of location (could be minimum, lower quartile, etc.)

  11. Bar Charts vs Histograms • Histograms • For continuous data. (Use this as a reason whenever you’re asked to justify use of a histogram.) • Data divided into (potentially uneven) intervals. • [GCSE definition] Frequency given by area of bars.* • No gaps between bars. • Bar Charts • For discrete data. • Frequency given by height of bars. [Diagrams: a histogram of height (1.0m to 1.8m) with frequency density on the vertical axis, and a bar chart of shoe size (6 to 9) with frequency on the vertical axis.] * Not actually true. We’ll correct this in a sec.

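  12. Bar Charts vs Histograms Still using the ‘incorrect’ GCSE formula: $\text{frequency} = \text{frequency density} \times \text{class width}$, i.e. $\text{frequency density} = \frac{\text{frequency}}{\text{class width}}$. [Histogram of height (m), horizontal axis 10 to 50, frequency density axis 1 to 5, with bars of frequency 40, 15, 25 and 30.]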

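  13. Area = frequency? The area of each bar in fact isn’t necessarily equal to the frequency. Actually: $\text{frequency} \propto \text{area}$, i.e. $\text{frequency} = k \times \text{area}$ for some constant $k$. Similarly: $\text{frequency density} \propto \frac{\text{frequency}}{\text{class width}}$. However, we often let $k = 1$, so that the $\propto$ becomes an $=$, as we were allowed to assume at GCSE.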

  14. The key to almost every histogram question… …This diagram! [Diagram: Area → Frequency.] For a given histogram, there’s some scaling to get from an area (whether the total area or the area of a particular bar) to the corresponding frequency. Once you’ve worked out this scaling, any subsequent areas you calculate can be converted to frequencies.

  15. Area = frequency? There were 60 runners in a 100m race. The following histogram represents their times. Determine the number of runners with times above 14s. [Histogram of time (s), with class boundaries at 9, 12 and 18 and a frequency density axis from 0 to 5.] We first find what area represents the total frequency: total area = 15 + 9 = 24, so each unit of area represents 60/24 = 2.5 runners. Then use this scaling along with the desired area: (4 × 1.5) × 2.5 = 15 runners.
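The scaling idea can be sketched in Python. The bar heights here (5 on 9 to 12 s, 1.5 on 12 to 18 s) are our assumption, inferred from the areas 15 and 9 and the 4 × 1.5 calculation above. Counting only part of a bar assumes values are spread evenly across it (linear interpolation); the function names are our own.

```python
def bar_areas(bars):
    """bars: list of (lower, upper, frequency_density) tuples."""
    return [(upper - lower) * fd for lower, upper, fd in bars]


def frequency_between(bars, total_frequency, a, b):
    """Estimate how many values lie between a and b, assuming values are
    spread uniformly within each bar."""
    k = total_frequency / sum(bar_areas(bars))  # frequency per unit of area
    area = 0.0
    for lower, upper, fd in bars:
        overlap = min(b, upper) - max(a, lower)  # width of this bar inside [a, b]
        if overlap > 0:
            area += overlap * fd
    return area * k


# Assumed bars that reproduce the slide's areas 3 x 5 = 15 and 6 x 1.5 = 9.
runners = [(9, 12, 5), (12, 18, 1.5)]
print(sum(bar_areas(runners)))                 # 24.0
print(frequency_between(runners, 60, 14, 18))  # 15.0
```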

  16. $\text{Frequency density} = \frac{\text{frequency}}{\text{class width}}$? Note the gaps! We can use the complete set of information in the first row combined with the bar to again work out the correct ‘scaling’. [Histogram of time (s), horizontal axis 1 to 10, frequency density axis 0 to 5.]

  17. May 2012 A policeman records the speed of the traffic on a busy road with a 30 mph speed limit. He records the speeds of a sample of 450 cars. The histogram in Figure 2 represents the results. (a) Calculate the number of cars that were exceeding the speed limit by at least 5 mph in the sample. (4 marks) M1 A1: Determine what one small square or one large square is worth. Number of small squares = 562.5, so one small square = 450 ÷ 562.5 = 0.8 cars. M1 A1: Use this to find the number of cars travelling at more than 35 mph.

  18. May 2012 A policeman records the speed of the traffic on a busy road with a 30 mph speed limit. He records the speeds of a sample of 450 cars. The histogram in Figure 2 represents the results. (b) Estimate the value of the mean speed of the cars in the sample. (3 marks) (Hint: imagine the implied grouped frequency table.) M1 M1: Use the histogram to construct the sum of speeds. A1: Correct value. Notice that we don’t have a scale on the frequency density axis. But we can put anything we like (e.g. 10, 20, 30, …), because we’ll likely have to scale the area of the bars to get the frequency anyway.
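A sketch of why the missing frequency density scale does not matter for the mean: each frequency is k × area, and k cancels in Σ(midpoint × frequency) ÷ Σ frequency. The bars below are hypothetical, not read from Figure 2, and the function name is our own.

```python
def mean_from_histogram(bars):
    """Estimate the mean from (lower, upper, height) bars using class midpoints.

    Frequencies are proportional to bar areas, so any common scale factor on
    the heights cancels between numerator and denominator."""
    areas = [(upper - lower) * h for lower, upper, h in bars]
    midpoints = [(lower + upper) / 2 for lower, upper, _ in bars]
    return sum(m * a for m, a in zip(midpoints, areas)) / sum(areas)


# Hypothetical bars, with heights on an arbitrary scale.
bars = [(20, 30, 1), (30, 40, 3)]
rescaled = [(lo, hi, 10 * h) for lo, hi, h in bars]  # same shape, axis labelled 10, 20, 30, ...
print(mean_from_histogram(bars), mean_from_histogram(rescaled))  # 32.5 32.5
```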

  19. Jan 2012 Bro Tip: Be careful that you use the correct class widths! 14 ? 5 ? ? 21 + 45 + 3 = 69

  20. Jan 2008 M1 ? ? A1 ? B1 M1 ? ? A1 = 12 runners

  21. Answer: Distance is continuous. Note the gaps in the class intervals! Frequency densities: 4 / 5 = 0.8, 19 / 5 = 3.8, 53 / 10 = 5.3, ...
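A small sketch of the two calculations behind this answer, using the frequencies and widths given above. The `class_width` helper is our own and assumes the data were recorded to the nearest whole unit, so a class written as 10 to 14 (a hypothetical example) really runs from 9.5 to 14.5.

```python
def class_width(lower, upper, gap=1):
    """Width of a class written with gaps: the true class boundaries lie
    gap/2 beyond each stated endpoint."""
    return (upper + gap / 2) - (lower - gap / 2)


def frequency_density(frequency, width):
    """Height of the histogram bar for this class."""
    return frequency / width


# Frequencies and class widths from the answer above.
for f, w in [(4, 5), (19, 5), (53, 10)]:
    print(f, "/", w, "=", frequency_density(f, w))

print(class_width(10, 14))  # 5.0 for the hypothetical class 10 to 14
```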

  22. Jun 2007 35 ? 15 ? (5 x 5) + 15 = 40 ?

  23. Skew Skew gives a measure of whether the values are more spread out above the median or below the median. [Two frequency curves, one for height and one for weight, each with its mode, median and mean marked.] We say this distribution has positive skew. We say this distribution has negative skew. (To remember, think that the ‘tail’ points in the positive direction.)

  24. Skew Remember, think what direction the ‘tail’ is likely to point. • Salaries in the UK: high salaries drag the mean up, so positive skew (mean > median). • IQ: a symmetrical distribution, i.e. no skew (mean = median). • Heights of people in the UK: will probably be a nice ‘bell curve’, i.e. no skew (mean = median). • Age of retirement: there are likely to be people who retire significantly before the median age, but not many who retire significantly after, so negative skew (mean < median).

  25. Exam Question In the previous parts of a question, you’ve calculated the mean and the median of the marks of students in a test. (d) Describe the skewness of the marks of the students, giving a reason for your answer. (2) Answer: negative skew (1st mark), because mean < median (2nd mark).

  26. Skew • Positive skew: $Q_3 - Q_2 > Q_2 - Q_1$ • Negative skew: $Q_3 - Q_2 < Q_2 - Q_1$ • No skew: $Q_3 - Q_2 = Q_2 - Q_1$ Given the quartiles and median, how would you work out whether the distribution had positive or negative skew?
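The quartile comparison can be written as a tiny Python function (the name is our own). As an example, the cat weights from slide 5 have Q1 = 7.5, Q2 = 8 and Q3 = 12 under the n/4 convention.

```python
def skew_from_quartiles(q1, q2, q3):
    """Compare the two halves of the box: Q3 - Q2 against Q2 - Q1."""
    upper, lower = q3 - q2, q2 - q1
    if upper > lower:
        return "positive skew"
    if upper < lower:
        return "negative skew"
    return "no skew (symmetric)"


# Cat weights: Q3 - Q2 = 4 is larger than Q2 - Q1 = 0.5
print(skew_from_quartiles(7.5, 8, 12))  # positive skew
```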

  27. Exam Question 1st mark ? Therefore positive skew. 2nd mark ?

  28. Calculating Skew One measure of skew can be calculated using the following formula (important note: this will be given to you in the exam if required): $\frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$. When mean > median, mean < median, and mean = median, we can see this gives us a positive value, a negative value, and 0 respectively, as expected. Find the skew of the following teachers’ annual salaries: £3 £3.50 £4 £7 £100. Mean = £23.50, Median = £4, Standard deviation = £38.28, Skew = 1.53.
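A check of the worked example with Python's `statistics` module. It assumes the standard deviation is the population version (dividing by n, `statistics.pstdev`), which is the one that reproduces £38.28; the sample version `statistics.stdev` would give a larger value.

```python
from statistics import mean, median, pstdev

salaries = [3, 3.50, 4, 7, 100]

m = mean(salaries)      # 23.5
med = median(salaries)  # 4
sd = pstdev(salaries)   # population standard deviation (divide by n)
skew = 3 * (m - med) / sd
print(round(m, 2), med, round(sd, 2), round(skew, 2))  # 23.5 4 38.28 1.53
```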

  29. S1: Chapter 4 Revision!

  30. Revision • Stem and leaf diagrams: • Can you construct one, and write the appropriate key? • Can you calculate mode, mean, median and quartiles? • Can you assess skewness by using these values? • Back-to-back stem and leaf diagrams: • Can you construct one with an appropriate key? • Can you compare the data on each side? [The stem and leaf diagram from slide 2, with key 2 | 1 means 2.1.] Type of skew: ? Reason: ?

  31. Revision [Back-to-back stem and leaf diagram of the girls' and boys' pulse rates: girls on the left, stems 4 to 9 in the centre, boys on the right.] Notice the values go outwards from the centre. Key: 0 | 4 | 6 means 40 for girls and 46 for boys. The data above shows the pulse rates of boys and girls in a school. Comment on the results. Boys' pulse rates tend to be lower than girls'.

  32. Revision Histograms • Can you: • Appreciate that the frequency density scale doesn’t matter. This is why frequency is only proportional to area, and not equal to it. • You often need to identify the scaling. You might only be given the total frequency (in which case you need to find the total area of the histogram to find the scaling). But if you know the frequency associated with a particular bar, just find the area of that single bar. • If you don’t care about the scaling, then take frequency density = frequency ÷ class width (so area = frequency). • Be incredibly careful about class widths (i.e. widths of boxes). If a class interval in the frequency table was written with gaps, then you’d draw it between its class boundaries on the histogram (in the example on the slide, this gave a box of width 6). • If you want to find the quartiles/median/mean, you need to first construct a grouped frequency table using the histogram. • When asked to find the number of people with values in a certain range (e.g. with times between 10 and 15s) and it crosses multiple ranges/bars, it’s easier to use the frequency table you’ve constructed from the histogram. Use linear interpolation where necessary.
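Linear interpolation in a grouped frequency table can be sketched as follows. The table is hypothetical, the median is taken at position n/2 (the convention usually used for grouped data in S1), and the function name is our own.

```python
def interpolate_value(classes, position):
    """Estimate the value at a (possibly fractional) position in the data,
    e.g. n/2 for the median, by linear interpolation within its class.

    classes: list of (lower_boundary, upper_boundary, frequency), in order."""
    cumulative = 0
    for lower, upper, freq in classes:
        if freq and cumulative + freq >= position:
            # fraction of the way through this class, scaled to its width
            return lower + (position - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("position exceeds total frequency")


# Hypothetical grouped frequency table read off a histogram.
table = [(10, 15, 5), (15, 20, 12), (20, 30, 8)]
n = sum(f for _, _, f in table)         # 25
print(interpolate_value(table, n / 2))  # 18.125
```

The same function gives interpolated quartile estimates with positions n/4 and 3n/4.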

  33. Revision M1 ? ? A1 ? B1 M1 ? ? A1 = 12 runners

  34. Revision Given that an outlier is a value outside the lower and upper quartiles… [Two box plots, each on a scale from 0 to 30.]

  35. Revision Skewness • You can determine skewness in two ways: • Comparing quartiles: When $Q_3 - Q_2 > Q_2 - Q_1$, the right-hand box in the box plot is wider, so it’s positive skew. If a box plot is drawn, it should be immediately obvious! • Comparing mean/median: When mean > median, large values have dragged up the mean, so there’s a tail in the positive direction, and thus the skew is positive. • When asked to justify your answer for skewness, you’re expected to put either something like “$Q_3 - Q_2 > Q_2 - Q_1$” or “mean > median”. • You will always be given a formula if you have to calculate a value for skew. But for all formulae, 0 means no skew (i.e. a “symmetric distribution”), >0 means positive skew and <0 means negative skew. Find the skew of the following teachers’ annual salaries: £3 £3.50 £4 £7 £100. Mean = £23.50, Median = £4, Standard deviation = £38.28, Skew = 1.53.
