
Lecture 3 Summarising Data

For use in fall semester 2015. Lecture notes were originally designed by Nigel Halpern. This lecture set may be modified during the semester. Last modified: 4-8-2015.


Presentation Transcript


  1. Lecture 3: Summarising Data • For use in fall semester 2015 • Lecture notes were originally designed by Nigel Halpern • This lecture set may be modified during the semester • Last modified: 4-8-2015

  2. Lecture Aim & Objectives Aim • To investigate pictorial & statistical methods of analysing quantitative data Objectives • Pictorial representation of quantitative data • Statistical representation of quantitative data

  3. Pictorial Representation • Levels of measurement • Tables & frequency distributions • Charts, plots, graphs & pie-charts

  4. 3 (4) Levels of Measurement • Nominal variables • Ordinal variables • Interval (& ratio) variables

  5. Nominal • Categories • e.g. gender (m/f), responses (y/n), class of travel (b/l) • Usually presented as frequencies & categories or %’s • e.g. 45% male, 55% female • Measure the existence (or not) of a characteristic • But contain limited information

  6. Ordinal • Ordered categories or preferences • e.g. ranked responses from a Likert scale • e.g. finishers in a race (1st, 2nd, 3rd, etc) • e.g. preferred aircraft • Measure intensity, order or degree • But still limited as they don’t imply distances • i.e. distance between 1st & 2nd

  7. Interval & Ratio • Ordered & scaled (on equal intervals) • e.g. age in years, temperature • Measures differences between values • Interval: arbitrary zero • e.g. temperature (+/-) • Ratio: absolute zero indicating absence of that variable • e.g. age, income • High analytical capabilities • e.g. can compare means unlike for nominal or ordinal data

  8. Levels of Measurement Summary

  9. Your turn….. • What levels of measurement would be derived from each of the following questions? • Gender (male/female) • Age in years and months (state years/months) • Do you smoke (yes/no) • How many cigarettes, on average, do you smoke a day (state no.) • Number of full years you’ve been smoking (state no.) • How many minutes of exercise do you do, on average, each day (less than 30mins / 30-59mins / 60+mins) • To what extent do you think that smoking is bad for your health (Strongly agree / tend to agree / neither / tend to disagree / strongly disagree) • Rank the cigarette brands in order of quality (B&H, Silk Cut, Marlboro)

  10. Tables • Most straightforward pictorial representation • Good method of storing information • Summarises &/or shows patterns in data • Easily made using word-processing or spreadsheets • Confusing if constructed poorly • Confusing if they try to show too much

  11. Table Considerations • Should be clear & appropriate • Should be chosen with a purpose in mind • Not just for the sake of it • Must include a title & a source of data • Must be referenced & discussed in the text • Don’t assume that everyone will understand them

  12. Table Clarity • Use a common system of data presentation • Use percentages rather than raw scores for clarity & comparative capabilities • The above points are particularly relevant if the table includes more than one variable calculated using different units of measurement (AKA ‘cross-tabulation’)

  13. Data from a survey of pax at LGW, LHR & MAN (CAA, 2000): - 34,650 Business Pax: A/B=18,607; C1=14,345; C2=1,386; D/E=312 - 130,350 Leisure Pax: A/B=43,407; C1=52,400; C2=21,508; D/E=13,035 Use percentages instead?

  14. Data from a survey of pax at LGW, LHR & MAN (CAA, 2000): - 34,650 Business Pax: A/B=18,607; C1=14,345; C2=1,386; D/E=312 - 130,350 Leisure Pax: A/B=43,407; C1=52,400; C2=21,508; D/E=13,035 Easier to interpret?
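
The percentage version of this table is not preserved in the transcript. As a minimal sketch, assuming the percentages are meant to be taken within each travel purpose (row percentages), the raw counts above can be converted like this. The names `counts` and `row_percentages` are illustrative only.

```python
# Passenger counts by socio-economic group, copied from the slide (CAA, 2000).
counts = {
    "Business": {"A/B": 18607, "C1": 14345, "C2": 1386, "D/E": 312},
    "Leisure": {"A/B": 43407, "C1": 52400, "C2": 21508, "D/E": 13035},
}


def row_percentages(row):
    """Convert one row of raw counts into percentages of that row's total."""
    total = sum(row.values())
    return {group: round(100 * n / total, 1) for group, n in row.items()}


# Assumption: percentages are computed per travel purpose, not per column.
percentages = {purpose: row_percentages(row) for purpose, row in counts.items()}

for purpose, row in percentages.items():
    cells = "  ".join(f"{group}: {pct:5.1f}%" for group, pct in row.items())
    print(f"{purpose:<9}{cells}")
```

With percentages, the two passenger groups can be compared directly even though the leisure sample is almost four times larger than the business sample.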

  15. Frequency Distributions • Standard frequency distribution • Univariate frequency distribution • Grouped frequency distribution • Relative & cumulative frequency distribution

  16. Standard Frequency • Standard frequency distribution • Presents data • e.g. “How many return flights did you take last year?” • Answers from 50 pax as a standard frequency distribution: Number of return flights taken last year: 7 3 10 3 2 4 3 3 6 3 5 2 3 4 2 5 4 3 6 8 4 12 1 3 4 15 5 1 3 1 4 2 3 5 2 3 8 3 4 4 6 3 5 2 4 2 3 2 5 1

  17. Univariate Frequency • Univariate frequency distribution • Lists data more clearly & with their frequency • Important for large sample sizes
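
The table for this slide is not in the transcript. A minimal sketch of how the 50 answers from slide 16 could be turned into a univariate frequency distribution, using Python's standard `collections.Counter` (the variable names are illustrative):

```python
from collections import Counter

# The 50 answers from slide 16: number of return flights taken last year.
flights = [
    7, 3, 10, 3, 2, 4, 3, 3, 6, 3,
    5, 2, 3, 4, 2, 5, 4, 3, 6, 8,
    4, 12, 1, 3, 4, 15, 5, 1, 3, 1,
    4, 2, 3, 5, 2, 3, 8, 3, 4, 4,
    6, 3, 5, 2, 4, 2, 3, 2, 5, 1,
]

# Count how often each distinct value occurs.
frequency = Counter(flights)

# List each value once, in ascending order, with its frequency.
for value in sorted(frequency):
    print(f"{value:>2} flights: {frequency[value]:>2} pax")
```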

  18. Grouped Frequency • Grouped frequency distribution • Groups all data according to categories • Further improves clarity
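
The categories used on the original slide are not preserved in the transcript. The sketch below therefore assumes hypothetical groups (1-3, 4-6, 7-9 and 10+ flights) purely to illustrate the idea, again using the answers from slide 16.

```python
# The 50 answers from slide 16: number of return flights taken last year.
flights = [
    7, 3, 10, 3, 2, 4, 3, 3, 6, 3,
    5, 2, 3, 4, 2, 5, 4, 3, 6, 8,
    4, 12, 1, 3, 4, 15, 5, 1, 3, 1,
    4, 2, 3, 5, 2, 3, 8, 3, 4, 4,
    6, 3, 5, 2, 4, 2, 3, 2, 5, 1,
]

# Hypothetical groups (label, lowest value, highest value); None means open-ended.
# These boundaries are an assumption, not the ones used in the lecture.
groups = [("1-3", 1, 3), ("4-6", 4, 6), ("7-9", 7, 9), ("10+", 10, None)]


def grouped_frequency(values, groups):
    """Count how many values fall inside each inclusive group."""
    table = {}
    for label, low, high in groups:
        table[label] = sum(
            1 for v in values if v >= low and (high is None or v <= high)
        )
    return table


grouped = grouped_frequency(flights, groups)
for label, count in grouped.items():
    print(f"{label:>4}: {count}")
```

Wider groups give a shorter table but hide detail, so the choice of boundaries is a trade-off between clarity and information.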

  19. Relative & Cumulative Frequency • Relative & cumulative frequency distributions • Relative: each category as a % of the total • Cumulative: add each relative frequency to the preceding ones
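
As a minimal sketch of both calculations, assuming the ungrouped answers from slide 16 are used as the categories (the slide's own table is not in the transcript):

```python
from collections import Counter

# The 50 answers from slide 16: number of return flights taken last year.
flights = [
    7, 3, 10, 3, 2, 4, 3, 3, 6, 3,
    5, 2, 3, 4, 2, 5, 4, 3, 6, 8,
    4, 12, 1, 3, 4, 15, 5, 1, 3, 1,
    4, 2, 3, 5, 2, 3, 8, 3, 4, 4,
    6, 3, 5, 2, 4, 2, 3, 2, 5, 1,
]

frequency = Counter(flights)
total = len(flights)

rows = []      # each row: (value, count, relative %, cumulative %)
running = 0    # running total of counts seen so far
for value in sorted(frequency):
    count = frequency[value]
    running += count
    relative = 100 * count / total      # this category as a % of the total
    cumulative = 100 * running / total  # this category plus all preceding ones
    rows.append((value, count, relative, cumulative))

for value, count, relative, cumulative in rows:
    print(f"{value:>2}  {count:>2}  {relative:5.1f}%  {cumulative:5.1f}%")
```

The last cumulative figure should always be 100%, which is a quick check that no answers were lost.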

  20. Too many numbers…?

  21. Charts, Plots, Graphs & Pie-charts • Simple bar charts • Compound bar charts • Histograms • Scatter or dot plots • Line graphs • Pie-charts

  22. Charts, Plots, Graphs & Pie-charts: Pros & Cons • Easily made using word-processing or spreadsheets • Ease of creation can lead to over-elaborate charts at the expense of clarity

  23. Charts, Plots, Graphs & Pie-charts: Considerations • Should be clear & appropriate • Should be chosen with a purpose in mind • Not just for the sake of it • Typically include • Title • Labelled axes • Key that explains the different segments • Source of data • Must be referenced & discussed in the text • Do not assume that everyone will understand them • Data type will restrict which method is chosen

  24. Simple Bar Charts • Simple bar charts • Horizontal or vertical charts of separate bars that represent size of data

  25. Simple Bar Charts • Figure 1. Student results for SCM300 in 2007

  26. Compound Bar Charts • Compound bar charts • Show proportions/relative size of groups • Bars will always have same height when % are used but not when figures are used • For 3+ components, pie-charts may be better

  27. Compound Bar Charts • Figure 1. Student results for SCM300 in 2007

  28. Histograms • Histograms • Similar to bar charts but a better indication of variation & distribution • Bars are connected instead of separate

  29. Histograms • Figure 1. Student results for SCM300 in 2007

  30. This figure indicates repeat visits to Norway & tourists’ interest in returning, but is it easy to understand…..?

  31. Scatter or Dot Plots • Scatter or dot plots • Illustrate the exact distribution of data • Can be used to illustrate continuous data • BUT a line graph may be better • Effective for 2 related variables

  32. Scatter or Dot Plots • Figure 1. Passengers & Aircraft Movements at HiMolde Airport

  33. Line Graphs • Line graphs • Show trends over time • e.g. patterns, peaks & troughs, rates of incline/decline • Can show more than 1 variable at a time • This can indicate possible relationships • e.g. see next slide

  34. Pie Charts • Pie-charts • Segments represent cases in each category • Best for 3-6 categories (no more, no less) • Labelling & shading sometimes difficult • Combining categories may improve clarity but loses detail

  35. Pie Charts

  36. Pie Charts: Too many pies……..?

  37. Charts, Plots, Graphs & Pie-charts: Summary

  38. Statistical Representation • Measures of central tendency • Measures of dispersion • Normal distribution & skew

  39. Measures of Central Tendency • Raw data can be confusing & meaningless • Measures of central tendency • AKA measures of location or average • Present the data in 1 single number • 3 different measures depend on intention or data • See next slide

  40. Measures of Central Tendency

  41. Example Age of students
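
The tables for these two slides are not in the transcript, and the student age data is missing. Assuming the three measures referred to are the usual mean, median and mode, a minimal sketch using the flight answers from slide 16 instead:

```python
from statistics import mean, median, mode

# The 50 answers from slide 16 (used here because the age data is not available).
flights = [
    7, 3, 10, 3, 2, 4, 3, 3, 6, 3,
    5, 2, 3, 4, 2, 5, 4, 3, 6, 8,
    4, 12, 1, 3, 4, 15, 5, 1, 3, 1,
    4, 2, 3, 5, 2, 3, 8, 3, 4, 4,
    6, 3, 5, 2, 4, 2, 3, 2, 5, 1,
]

flights_mean = mean(flights)      # sum of all values divided by their number
flights_median = median(flights)  # middle value once the data are ordered
flights_mode = mode(flights)      # most frequently occurring value

print(f"mean={flights_mean:.2f} median={flights_median} mode={flights_mode}")
```

Here the mean is higher than the median and mode because a few passengers took 10 or more flights, which pulls the mean upwards.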

  42. Measures of Dispersion • Measures of central tendency don’t show: • How closely related values are (i.e. clustered) • How representative they are of the data set • The range of values • The degree of distortion by extreme values • Salaries of office staff at HiMolde Airways: £11k, £15k, £15k, £18k, £25k, £30k, £32k, £38k • Salaries of office staff at HiMolde Airport: £20k, £21k, £22k, £23k, £23k, £24k, £25k, £26k • Mean salary at HiMolde Airways = £23k (£184k/8) • Mean salary at HiMolde Airport = £23k (£184k/8)

  43. Measures of Dispersion • Range • Inter-quartile range • Standard deviation

  44. Range • Simplest & crudest measure of dispersion • Indicates spread of data • Places values in ascending order • Then subtracts smallest from the largest value • Extreme values affect (determine) the outcome • Range gives a greater insight into a data set • But gives no indication of the clustering of individual values

  45. Range • Salaries of office staff at HiMolde Airways: £11k, £15k, £15k, £18k, £25k, £30k, £32k, £38k • Salaries of office staff at HiMolde Airport: £20k, £21k, £22k, £23k, £23k, £24k, £25k, £26k • Range of salaries at HiMolde Airways = £38k - £11k = £27k • Range of salaries at HiMolde Airport = £26k - £20k = £6k
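
The calculation on this slide can be sketched as follows (salaries in £k; the function name `value_range` is illustrative):

```python
# Salaries in thousands of pounds, copied from the slide.
airways = [11, 15, 15, 18, 25, 30, 32, 38]
airport = [20, 21, 22, 23, 23, 24, 25, 26]


def value_range(values):
    """Order the values, then subtract the smallest from the largest."""
    ordered = sorted(values)
    return ordered[-1] - ordered[0]


for name, salaries in (("Airways", airways), ("Airport", airport)):
    average = sum(salaries) / len(salaries)
    print(f"{name}: mean = £{average:.0f}k, range = £{value_range(salaries)}k")
```

Both organisations have the same mean salary, but the ranges show that salaries at HiMolde Airways are far more spread out.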

  46. Inter-Quartile Range • Most appropriate when using ordinal data • Divides values into 4 equal parts (quartiles) • Is an extension of the idea of the median • Represents the middle 50% of the values that fall between the 1st & 3rd quartiles • Not affected by extremes • BUT doesn’t utilise all values • It discards 50% of the values & therefore provides a limited picture of the degree of clustering

  47. Inter-Quartile Range (diagram) • Min. value to Q1: 1st 25% of cases • Q1 to Q2 (median value): 2nd 25% of cases • Q2 to Q3: 3rd 25% of cases • Q3 to max. value: 4th 25% of cases • The inter-quartile range runs from Q1 to Q3

  48. Inter-Quartile Range • Salaries of office staff at HiMolde Airways: £11k, £15k, £15k, £18k, £25k, £30k, £32k, £38k • Salaries of office staff at HiMolde Airport: £20k, £21k, £22k, £23k, £23k, £24k, £25k, £26k • IQ Range of salaries at HiMolde Airways = £15k-£31k • IQ Range of salaries at HiMolde Airport = £22k-£24k
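
Several conventions exist for calculating quartiles, and the slides do not say which one was used. The sketch below assumes the "median of each half" convention, which extends the idea of the median as described on slide 46. It gives £15k and £31k for HiMolde Airways, matching the slide. For HiMolde Airport it gives £21.5k and £24.5k, close to the £22k-£24k on the slide, which may reflect rounding or a different convention.

```python
from statistics import median

# Salaries in thousands of pounds, copied from the slide.
airways = [11, 15, 15, 18, 25, 30, 32, 38]
airport = [20, 21, 22, 23, 23, 24, 25, 26]


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for an odd number of values the middle value is left
    out of both halves. Other quartile conventions give other results.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # smallest half of the values
    upper = ordered[-half:]   # largest half of the values
    return median(lower), median(upper)


for name, salaries in (("Airways", airways), ("Airport", airport)):
    q1, q3 = quartiles(salaries)
    print(f"{name}: Q1 = £{q1}k, Q3 = £{q3}k, IQR = £{q3 - q1}k")
```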

  49. Standard Deviation • Widely used in quantitative research • Most useful measure of dispersion • Utilises all data in the distribution • Compares each value in the distribution with the mean • It examines the variance of the data around the mean • Therefore saying something about how representative the mean is for the data set
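
The transcript stops before any worked example. As a minimal sketch, assuming the sample form of the standard deviation (dividing by n - 1; the slides do not say whether the sample or population form is intended), applied to the salary data from slide 42:

```python
from math import sqrt

# Salaries in thousands of pounds, copied from slide 42.
airways = [11, 15, 15, 18, 25, 30, 32, 38]
airport = [20, 21, 22, 23, 23, 24, 25, 26]


def sample_std_dev(values):
    """Standard deviation using the n - 1 (sample) divisor."""
    mean = sum(values) / len(values)
    # Compare each value with the mean and square the difference.
    squared_deviations = [(v - mean) ** 2 for v in values]
    # Variance is the averaged squared deviation; its root is the std dev.
    variance = sum(squared_deviations) / (len(values) - 1)
    return sqrt(variance)


airways_sd = sample_std_dev(airways)
airport_sd = sample_std_dev(airport)
print(f"Airways: £{airways_sd:.2f}k  Airport: £{airport_sd:.2f}k")
```

Both data sets have a mean of £23k, but the much smaller standard deviation for HiMolde Airport indicates that the mean is more representative of its salaries.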
