Statistics

Statistics. Types of Data. Categorical data is data which has worded categories, eg ‘Ways of getting to school’ might have the categories bus, car, walk, bike. Quantitative data is numerical data. It will either be discrete or continuous.

blake-yates

Presentation Transcript


  1. Statistics

  2. Types of Data • Categorical data is data which has worded categories, eg ‘Ways of getting to school’ might have the categories bus, car, walk, bike. • Quantitative data is numerical data. It will either be discrete or continuous. • Discrete data is countable, eg number of peas in a pod, number of pupils in a class. • Continuous data is measurable, eg lengths of leaves, heights of people, weights of animals etc.

  3. Definitions • The population is the group about which you wish to collect information. • A census involves collecting data from every person in the population. • A sample involves collecting data from those in a part of the population. • A sample should be unbiased and therefore representative of the entire population. • A biased sample is one which has been unfairly influenced by the collection process.

  4. Ways of Displaying Data 1. Bar Graph (for discrete data) 2. Histogram (for continuous data) 3. Dot Plot 4. Strip Graph 5. Pictogram 6. Pie Graph 7. Stem and Leaf 8. Box and Whisker 9. Line Graph * = most important ones (six of the nine are starred on the slide)

  5. Frequency Tables Here is the number of questions each of the 27 students in 9HRT got correct out of the 8 ‘Do Now’ revision questions at the start of a lesson: 5, 5, 4, 6, 4, 4, 5, 5, 6, 4, 4, 3, 7, 4, 6, 5, 5, 5, 4, 6, 5, 6, 4, 5, 4, 5, 5 We illustrate this in a frequency table as follows:
     Number correct | Frequency
     3              | 1
     4              | 9
     5              | 11
     6              | 5
     7              | 1
     Total          | 27
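
The tally behind this frequency table can be checked with a short sketch. This is only an illustration, assuming Python and its standard `collections.Counter`; the variable names and the column headings are made up here and are not part of the slide.

```python
from collections import Counter

# Scores of the 27 students in 9HRT, copied from the slide.
scores = [5, 5, 4, 6, 4, 4, 5, 5, 6, 4, 4, 3, 7, 4, 6,
          5, 5, 5, 4, 6, 5, 6, 4, 5, 4, 5, 5]

# Count how many times each score occurs.
frequency = Counter(scores)

# Print the table in order of score, then the total.
print("Number correct  Frequency")
for value in sorted(frequency):
    print(f"{value:>14}  {frequency[value]:>9}")
print(f"{'Total':>14}  {sum(frequency.values()):>9}")
```

The total row is a useful check: it should equal the number of students.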

  6. Discrete Data For discrete data, we graph the information in a bar graph. • A bar graph must have: • a title • labels on both axes • scale on both axes • separate bars • x-axis numbers in centre of the bar • y-axis scale starting at 0 with evenly spaced numbers • a good size – at least ⅓ page

  7. Outliers Outliers are data values that are either much larger or much smaller than the general body of data. Outliers appear separated from the body of data on a graph. [Bar graph: Number of peas in pods from my garden; vertical axis: Frequency]

  8. Grouped Discrete Data A kindergarten was concerned about the number of cars passing by between 8.45 am and 9.00 am. Over 30 week days they recorded data. The results were: 27, 30, 17, 13, 46, 23, 40, 28, 38, 24, 23, 22, 18, 29, 16, 35, 24, 8, 24, 44, 32, 52, 31, 39, 32, 19, 41, 38, 24, 32 In situations like this, it is necessary to group the data into class intervals.

  9. A histogram is then drawn as shown below. Note the difference in the scale on the horizontal axis. [Frequency table and histogram: Cars passing by kindergarten; horizontal axis: Number of cars, 0 to 60 in steps of 10; vertical axis: Frequency]
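
Grouping the kindergarten data into class intervals can be sketched as follows. The interval width of 10 (0–9, 10–19, and so on) is an assumption based on the histogram's horizontal axis running from 0 to 60 in tens; the slide's own frequency table did not survive, so treat the output as a sketch under that assumption. The function name `class_interval` is hypothetical.

```python
from collections import Counter

# Cars passing the kindergarten on 30 week days, copied from the slide.
cars = [27, 30, 17, 13, 46, 23, 40, 28, 38, 24, 23, 22, 18, 29, 16,
        35, 24, 8, 24, 44, 32, 52, 31, 39, 32, 19, 41, 38, 24, 32]

WIDTH = 10  # assumed class interval width


def class_interval(value, width=WIDTH):
    """Return the (lowest, highest) whole numbers of the interval containing value."""
    lower = (value // width) * width
    return (lower, lower + width - 1)


# Tally how many data values fall in each class interval.
grouped = Counter(class_interval(c) for c in cars)

for low, high in sorted(grouped):
    print(f"{low:>2} - {high:<2}  {grouped[(low, high)]}")
```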

  10. Continuous Data For continuous data, we graph the information in a histogram. Example: Weights of boys in the rugby squad. 2 students lie in the interval 60 kg up to but not including 70 kg, 7 students lie in 70 kg up to but not including 80 kg, 9 students lie in 80 kg up to but not including 90 kg, 5 students lie in 90 kg up to but not including 100 kg, 3 students lie in 100 kg up to but not including 110 kg. The frequency table would be:
     Weight (kg)                      | Frequency
     60 up to but not including 70   | 2
     70 up to but not including 80   | 7
     80 up to but not including 90   | 9
     90 up to but not including 100  | 5
     100 up to but not including 110 | 3
     The graph would be: [Histogram: Rugby Squad; vertical axis: Frequency] Note: The bars are JOINED

  11. • A histogram must have: • a title • labels on both axes • scale on both axes • bars joined • x-axis numbers at the join of the bars • y-axis scale starting at 0 with evenly spaced numbers • a good size – at least ⅓ page

  12. Averages An average is a number that is typical of the data. In maths we use three different types of average: 1. mean = sum of all the values ÷ number of values 2. median = middle value 3. mode = most common value Example: the mean of 5, 0, 8, 1, 0, 4, 3, 0, 2, 2 is (5 + 0 + 8 + 1 + 0 + 4 + 3 + 0 + 2 + 2) ÷ 10 = 25 ÷ 10 = 2.5

  13. The median is the middle value when they are all placed in order. For an odd number of data, the median is the one in the middle. For an even number of data, the median is halfway between the two middle values. Example: The median of 4, 6, 3, 2, 7, 8, 3, 5, 5, 7, 6, 6, 4 In order, the data is: 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8 There are 13 bits of data so the median is the 7th bit: median = 5 Example: The median of 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8 There are 14 bits of data so the median is between the 7th and 8th bit: median is (5+6) ÷ 2 = 5.5
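
The odd and even cases above can be written as one small function. This is a minimal sketch in Python; the function name `median` is chosen here for illustration (Python's standard `statistics.median` does the same job).

```python
def median(values):
    """Middle value of the sorted data; for an even count, halfway between the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_example = [4, 6, 3, 2, 7, 8, 3, 5, 5, 7, 6, 6, 4]
even_example = [2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8]

print(median(odd_example))   # 13 values, so the 7th value in order
print(median(even_example))  # 14 values, so halfway between the 7th and 8th
```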

  14. The mode is the value that occurs most often. Example: The mode of 4, 6, 3, 2, 7, 8, 3, 5, 5, 7, 6, 6, 4 The mode = 6 Example: The mode of 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8 This data set has two modes. The modes are 6 and 7. We say that the data is bimodal. If a data set has more than 2 modes, we do not use the mode as a measure of the middle. Pg 308 Ex 10F.1
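
Because a data set can have more than one mode, a sketch of the calculation needs to return a list. The following assumes Python's standard `collections.Counter`; the function name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs the greatest number of times, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


one_mode = [4, 6, 3, 2, 7, 8, 3, 5, 5, 7, 6, 6, 4]
two_modes = [2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8]

print(modes(one_mode))   # a single mode
print(modes(two_modes))  # bimodal data gives two values
```

If the returned list has more than two entries, the slide's advice applies: the mode is not used as a measure of the middle.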

  15. Measures of Spread • The range is the difference between the largest and the smallest value, ie range = highest value – lowest value. • The inter-quartile range is the difference between the upper quartile and lower quartile, ie I.Q.R. = upper quartile – lower quartile. eg: 2, 5, 8, 3, 5, 7, 1, 9, 4, 8, 4, 6, 7, 8, 4, 9 The range is 9 – 1 = 8 What’s a quartile? That’s easy. It’s just the median of each half of the data.

  16. Quartiles eg 1: 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5, 6, 6, 6, 7, 8, 9, 9 There are 18 values, which puts the median between the 9th and 10th values, ie median = 5. The lower quartile is ¼ of the way along the data, which is the middle of the left hand half, ie LQ = 2. The upper quartile is ¾ of the way along the data, which is the middle of the right hand half, ie UQ = 6. Inter-quartile range = UQ – LQ = 6 – 2 = 4

  17. Quartiles eg 2: 1, 1, 1, 2, 3, 3, 4, 5, 6, 7, 7, 8, 8, 9, 9, 10, 12 There are 17 bits of data, so the median is the 9th bit of data, ie median = 6. Each half (not counting the median) has 8 bits of data, so the quartiles are between the 4th and 5th bit of data in each half. LQ = 2.5 UQ = 8.5 IQR = 6
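
The "median of each half" rule used in the two quartile examples can be sketched like this. It follows the slides' convention of leaving the median out of both halves when the number of data values is odd; other textbooks and software use slightly different quartile rules, so results may differ elsewhere. The function names are chosen for illustration and the sketch assumes at least two data values.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (LQ, median, UQ), taking each quartile as the median of one half of the data."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                  # left hand half
    upper = ordered[len(ordered) - half:]   # right hand half (median excluded if the count is odd)
    return median(lower), median(ordered), median(upper)


eg1 = [1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5, 6, 6, 6, 7, 8, 9, 9]
eg2 = [1, 1, 1, 2, 3, 3, 4, 5, 6, 7, 7, 8, 8, 9, 9, 10, 12]

for data in (eg1, eg2):
    lq, med, uq = quartiles(data)
    print(f"LQ = {lq}, median = {med}, UQ = {uq}, IQR = {uq - lq}")
```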

  18. Stem & Leaf Graph The time spent (in minutes) by 20 people in a queue at a bank has been recorded as follows: [Stem and leaf graphs, unsorted then sorted: Time spent queueing at the bank] Key: 2 | 7 = 2.7 minutes median = (2.2+2.7) ÷ 2 = 2.45 mins LQ = 1.45 mins UQ = 3.7 mins

  19. Discus results for 10SKD • This is an example of a back to back stem and leaf plot. • The numbers are sorted from smallest to largest, from the centre out. [Back-to-back stem and leaf plot, girls on the left and boys on the right; stems and leaves in slide order: 14 0 2 5 13 6 7 3 0 12 2 4 5 3 11 9 3 3 10 3 5 4 9 4 5 5 8 5 7] Key: 9 | 4 = 9.4 m • A stem and leaf graph must have: • a title • labels (if back-to-back) • a key • the leaf numbers in columns • numerical order (sorted) • no commas between numbers

  20. Box and Whisker plot • The box and whisker plot is a visual display of the five statistics: Minimum, Lower Quartile, Median, Upper Quartile and Maximum. [Diagram: box and whisker plot labelled Min, LQ, Median, UQ, Max]

  21. • A box and whisker graph must have: • an axis • a scale on the axis • a label on the axis (including the units) • a title • headings, if side by side eg: [Side-by-side box and whisker graphs: Maths exam results, boys and girls; axis: mark (%), 0 to 100]

  22. Comparing data Often two (or more) box and whisker plots are put on the same set of axes, vertically or horizontally (‘side-by-side’ box plots). We can then compare the data, commenting on: 1) ‘on average’ (using the median) 2) ‘spread’ (using the range or IQR) 3) ‘shape’ (symmetrical or skewed) 4) and a general statement

  23. Statistical Investigation 1) Pose the question. 2) Collect the data. This may involve preparing a questionnaire, then deciding on whether the data should be collected from the whole population or from just a sample. If from a sample, how do you choose your sample and how many should be in it? 3) Organise the data collected. Maybe in a frequency table or a stem and leaf plot. Calculate the relevant statistics (eg mean, median, LQ, UQ, range, IQR) 4) Illustrate the data collected. Maybe in a box and whisker plot, a pie graph, a bar graph etc 5) Write an analysis, ending with a conclusion.

  24. Analysis When writing a report to compare two (or more) data sets, there are four things we need to mention: 1) On average which is heavier, longer, better etc and quote the values of the medians or means for each set of data. 2) The spread of the data by quoting the values of the range or IQR for each set of data. 3) The shape of the data (by looking at the bar graph, stem & leaf graph or box & whisker plot) 4) A conclusion We can also mention if there is an outlier ie a value that is significantly bigger or smaller than the rest of the data.

  25. Shape: [Diagrams of data shapes: symmetrical, uni-modal, bi-modal, skewed; box plots on a scale of 1 to 7 showing symmetrical, skewed to the lower values, skewed to the higher values]

  26. Example: Here is a back-to-back stem and leaf graph comparing the lengths of leaves of sprayed fern plants with those that had not been sprayed. [Back-to-back stem and leaf graph: sprayed on the left, unsprayed on the right; stems 0 to 6] Key: 5 | 1 = 51 cm
     Sprayed: minimum = 20 cm, LQ = 29 cm, median = 43 cm, UQ = 54 cm, maximum = 65 cm, IQR = 25 cm, range = 45 cm
     Unsprayed: minimum = 5 cm, LQ = 17.5 cm, median = 31 cm, UQ = 39.5 cm, maximum = 59 cm, IQR = 22 cm, range = 54 cm

  27. [Side-by-side box and whisker plots: Lengths of leaves of fern plants, Sprayed and Unsprayed; axis: Length (cm), 0 to 70]
     Sprayed: minimum = 20 cm, LQ = 29 cm, median = 43 cm, UQ = 54 cm, maximum = 65 cm, IQR = 25 cm, range = 45 cm
     Unsprayed: minimum = 5 cm, LQ = 17.5 cm, median = 31 cm, UQ = 39.5 cm, maximum = 59 cm, IQR = 22 cm, range = 54 cm

  28. Analysis: [Side-by-side box and whisker plots: Lengths of leaves of fern plants, Sprayed and Unsprayed; axis: Length (cm), 0 to 70] On average, the sprayed leaves have grown longer than the unsprayed leaves as their median is 43 cm compared with the unsprayed leaves' median of 31 cm. The unsprayed leaves' lengths have a greater spread as their range is 54 cm compared with the sprayed leaves' range of 45 cm. The shape of the unsprayed leaves data is uni-modal whereas for the sprayed leaves it is bi-modal (as seen from the stem & leaf). In conclusion, the lengths of sprayed leaves are longer than the lengths of the non-sprayed leaves.

  29. 1) The following is a back-to-back stem and leaf graph of the heights of boys and girls in a year 9 class. Work out the relevant statistics, draw the box plot, then write an analysis by filling in the gaps.
                 girls | stem | boys
                 5 2 1 |  14  | 7 9
           9 8 8 6 3 2 |  15  | 3 6 7 7 9
     8 7 5 3 1 1 1 0 0 |  16  | 1 3 4 5 6 6 7 9
           7 6 6 4 3 3 |  17  | 1 2 2 3 3 4 5 6 8 9
                 9 7 4 |  18  | 1 1 2
     Key: 16 | 3 = 163 cm
     Girls: minimum = 141 cm, LQ = 158 cm, median = 161 cm, UQ = 174 cm, maximum = 189 cm, IQR = 16 cm, range = 48 cm
     Boys: minimum = 147 cm, LQ = 160 cm, median = 168 cm, UQ = 174.5 cm, maximum = 182 cm, IQR = 14.5 cm, range = 35 cm
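
The statistics for this exercise can be checked by turning each stem and its leaves back into heights and then taking the five statistics. The sketch below is an illustration in Python: the dictionaries hold the leaves for each stem as read from the graph (girls' leaves listed in ascending order), the quartiles use the "median of each half" rule from the earlier slides, and all function names are hypothetical.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def expand(stem_leaves):
    """Turn {stem: [leaves]} into a sorted list of values, eg stem 16 with leaf 3 becomes 163."""
    return sorted(10 * stem + leaf
                  for stem, leaves in stem_leaves.items()
                  for leaf in leaves)


def five_number_summary(values):
    """Return (minimum, LQ, median, UQ, maximum), with quartiles as medians of each half."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


girls = {14: [1, 2, 5], 15: [2, 3, 6, 8, 8, 9], 16: [0, 0, 1, 1, 1, 3, 5, 7, 8],
         17: [3, 3, 4, 6, 6, 7], 18: [4, 7, 9]}
boys = {14: [7, 9], 15: [3, 6, 7, 7, 9], 16: [1, 3, 4, 5, 6, 6, 7, 9],
        17: [1, 2, 2, 3, 3, 4, 5, 6, 8, 9], 18: [1, 1, 2]}

for name, data in (("Girls", girls), ("Boys", boys)):
    low, lq, med, uq, high = five_number_summary(expand(data))
    print(f"{name}: min {low}, LQ {lq}, median {med}, UQ {uq}, max {high}, "
          f"IQR {uq - lq}, range {high - low}")
```

The five statistics returned are exactly the values needed to draw each box and whisker plot.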

  30. [Side-by-side box and whisker plots: Heights of year 9 students, girls and boys; axis: Height (cm), 140 to 190] On average, the boys are taller than the girls as the median for the boys is 168cm compared with the girls median of 161cm. The spread of heights is greater for the girls as their range is 48cm compared to the boys range of 35cm. The shape of the data for the girls is skewed and for the boys is fairly symmetrical (as seen from the box plot). In conclusion, the boys are generally taller than the girls.

  31. 2) The following is a back-to-back stem and leaf graph comparing the weights of the students in 2 classes. Calculate the relevant statistics, draw the box plot, then write an analysis by filling in the gaps.
             Class J | stem | Class K
                 1 0 |  7   | 0 5
             6 4 2 1 |  6   | 0 1 3 4 6
     9 9 7 6 5 1 0 0 |  5   | 3 6 6 7 7 8 9
         7 5 4 2 2 0 |  4   | 2 2 3 4 7 7 8 9 9
             8 7 7 6 |  3   | 5 8
     Key: 6 | 3 = 63 kg
     Class J: minimum = 36 kg, LQ = 42 kg, median = 50.5 kg, UQ = 60 kg, maximum = 71 kg, IQR = 18 kg, range = 35 kg
     Class K: minimum = 35 kg, LQ = 45.5 kg, median = 56 kg, UQ = 60.5 kg, maximum = 75 kg, IQR = 15 kg, range = 40 kg

  32. [Side-by-side box and whisker plots: Weights of students, Class J and Class K; axis: Weight (kg), 30 to 80] On average class K is heavier than class J because the median for class K is 56 kg compared with the median for class J of 50.5 kg. The spread of data for class K is greater as the range for class K is 40 kg compared with the range for class J of 35 kg. The data shape for class J is fairly symmetrical. The data shape for class K is skewed. In conclusion, class K is generally heavier than class J.

  33. 3) Here is a back-to-back stem and leaf graph showing the time in minutes for competitors to complete a cross country race. It compares the time of those competitors shorter than 165 cm with those 165 cm or taller. It is using a split stem. Calculate the relevant statistics, draw a box plot and then write a report.
     Shorter than 165cm | stem | 165cm or taller
                    4 0 |  7   | 1 1
                  9 8 7 |  6   | 5 7 8 9
                4 3 3 2 |  6   | 0 1 1 1 2 2 2 2 3 3
            9 8 7 7 6 5 |  5   | 8 8 9
        3 3 2 2 2 1 1 0 |  5   | 0 0 1 1 1 2 2 2 3 3 4
                9 8 6 5 |  4   | 7 8 8 9
                  3 2 0 |  4   | 3 4 4
     Key: 5 | 2 = 52 mins

  34. [Side-by-side box and whisker plots: Times to run a race, Shorter students and Taller students; axis: Time (mins), 30 to 80]
     Shorter: minimum = 40 mins, LQ = 50 mins, median = 54 mins, UQ = 63 mins, maximum = 74 mins, IQR = 13 mins, range = 34 mins
     Taller: minimum = 43 mins, LQ = 50.5 mins, median = 58 mins, UQ = 62 mins, maximum = 71 mins, IQR = 11.5 mins, range = 28 mins

  35. Report: On average the taller students were slower than the shorter students as the median for the taller students was 58 mins compared to the shorter students' median of 54 mins. The spread of results was greater for the shorter students as they had a range of 34 mins whereas the taller students had a range of 28 mins. The data for the shorter students is uni-modal and slightly skewed. The data for the taller students is bi-modal (as shown by the stem & leaf graph). In conclusion, the height of the students does not greatly affect the running speeds. Pg 326 Problems 1 & 2 Pg 297 Opening Problem B Pg 324 # 4

  36. Misleading Graphs [Two misleading graphs, each shown next to what it should be] Pg 329 Ex 10L

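  37. Pie Graphs [Pie graph: How students of Sancta Maria College travel to school, with a sector labelled ‘by bicycle’; sector angles 80°, 120°, 96°, 64°] Because there is 360° around a circle, each sector's angle is its fraction of the total × 360. On Calculator: 70 ÷ 315 × 360 = 80°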
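
The calculator step for a sector angle can be sketched as a one-line function. The figures 70 and 315 and the four angles come from the slide; the function name `sector_angle` is made up for this illustration.

```python
def sector_angle(count, total):
    """Angle in degrees of the pie graph sector for `count` items out of `total`."""
    return count * 360 / total


# The slide's calculator example: 70 students out of 315.
print(sector_angle(70, 315))

# The four sector angles shown on the slide should use up the whole circle.
slide_angles = [80, 120, 96, 64]
print(sum(slide_angles))
```

Checking that the angles add to 360 is a quick way to catch an arithmetic slip before drawing the graph.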

  38. Time Series Data Data that is collected over time, at regular intervals, is often called ‘time series data’. The data is usually presented on a line graph, with time on the horizontal axis. eg. the following graph shows the number of visitors staying at a motel over a 5 year period. [Line graph: horizontal axis 1998 to 2002] A time series is used to identify trends and patterns in data over a period of time so as to predict future movements.

  39. Long Term Trend: Whether the measurements are increasing, decreasing or staying fairly constant overall. Seasonal Variations: These are the up and down patterns which recur over a year, month, week or day. Short Term Features: These are irregular fluctuations, unexpected results, outliers.

  40. 1) Here is a back-to-back stem and leaf graph of the heights of boys and girls in a year 9 class. Work out the relevant statistics, draw the box plot, then write an analysis.
                 girls | stem | boys
                 5 2 1 |  14  | 7 9
           9 8 8 6 3 2 |  15  | 3 6 7 7 9
     8 7 5 3 1 1 1 0 0 |  16  | 1 3 4 5 6 6 7 9
           7 6 6 4 3 3 |  17  | 1 2 2 3 3 4 5 6 8 9
                 9 7 4 |  18  | 1 1 2
     Key: 16 | 3 = 163 cm
     2) The following is a back-to-back stem and leaf graph comparing the weights of the students in 2 classes. Calculate the relevant statistics, draw the box plot, then write an analysis.
             Class J | stem | Class K
                 1 0 |  7   | 0 5
             6 4 2 1 |  6   | 0 1 3 4 6
     9 9 7 6 5 1 0 0 |  5   | 3 6 6 7 7 8 9
         7 5 4 2 2 0 |  4   | 2 2 3 4 7 7 8 9 9
             8 7 7 6 |  3   | 5 8
     Key: 6 | 3 = 63 kg

  41. 3) Here is a back-to-back stem and leaf graph showing the time in minutes for competitors to complete a cross country race. It compares the time of those competitors shorter than 165 cm with those 165 cm or taller. It is using a split stem. Calculate the relevant statistics, draw a box plot and then write a report.
     Shorter than 165cm | stem | 165cm or taller
                    4 0 |  7   | 1 1
                  9 8 7 |  6   | 5 7 8 9
                4 3 3 2 |  6   | 0 1 1 1 2 2 2 2 3 3
            9 8 7 7 6 5 |  5   | 8 8 9
        3 3 2 2 2 1 1 0 |  5   | 0 0 1 1 1 2 2 2 3 3 4
                9 8 6 5 |  4   | 7 8 8 9
                  3 2 0 |  4   | 3 4 4
     Key: 5 | 2 = 52 mins
