Lecture 3: Summarising Data. For use in fall semester 2015. Lecture notes were originally designed by Nigel Halpern. This lecture set may be modified during the semester. Last modified: 4-8-2015.
Lecture Aim & Objectives Aim • To investigate pictorial & statistical methods of analysing quantitative data Objectives • Pictorial representation of quantitative data • Statistical representation of quantitative data
Pictorial Representation • Levels of measurement • Tables & frequency distributions • Charts, plots, graphs & pie-charts
3 (4) Levels of Measurement • Nominal variables • Ordinal variables • Interval (& ratio) variables
Nominal • Categories • e.g. gender (m/f), responses (y/n), class of travel (b/l) • Usually presented as frequencies per category or as percentages • e.g. 45% male, 55% female • Measure the existence (or not) of a characteristic • But contain limited information
Ordinal • Ordered categories or preferences • e.g. ranked responses from a Likert scale • e.g. finishers in a race (1st, 2nd, 3rd, etc) • e.g. preferred aircraft • Measure intensity, order or degree • But still limited as they don’t imply distances • i.e. distance between 1st & 2nd
Interval & Ratio • Ordered & scaled (on equal intervals) • e.g. age in years, temperature • Measures differences between values • Interval: arbitrary zero • e.g. temperature (+/-) • Ratio: absolute zero indicating absence of that variable • e.g. age, income • High analytical capabilities • e.g. can compare means unlike for nominal or ordinal data
Your turn... • What level of measurement would be derived from each of the following questions? • Gender (male/female) • Age in years and months (state years/months) • Do you smoke? (yes/no) • How many cigarettes, on average, do you smoke a day? (state no.) • Number of full years you’ve been smoking (state no.) • How many minutes of exercise do you do, on average, each day? (less than 30 mins / 30-59 mins / 60+ mins) • To what extent do you agree that smoking is bad for your health? (strongly agree / tend to agree / neither / tend to disagree / strongly disagree) • Rank the cigarette brands in order of quality (B&H, Silk Cut, Marlboro)
Tables • Most straightforward pictorial representation • Good method of storing information • Summarise &/or show patterns in data • Easily made using word-processing or spreadsheets • Confusing if constructed poorly • Confusing if they try to show too much
Table Considerations • Should be clear & appropriate • Should be chosen with a purpose in mind • Not just for the sake of it • Must include a title & a source of data • Must be referenced & discussed in the text • Don’t assume that everyone will understand them
Table Clarity • Use a common system of data presentation • Use percentages rather than raw scores for clarity & comparative capabilities • These points are particularly relevant if the table combines more than one variable (AKA ‘cross-tabulation’), especially when the groups being compared differ in size or the variables use different units of measurement
Data from a survey of pax at LGW, LHR & MAN (CAA, 2000):
Pax type   Total     A/B      C1       C2       D/E
Business   34,650    18,607   14,345   1,386    312
Leisure    130,350   43,407   52,400   21,508   13,035
Use percentages instead?
Data from a survey of pax at LGW, LHR & MAN (CAA, 2000), shown as a percentage of each pax type:
Pax type   Total (n)   A/B     C1      C2      D/E
Business   34,650      53.7%   41.4%   4.0%    0.9%
Leisure    130,350     33.3%   40.2%   16.5%   10.0%
Easier to interpret?
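The conversion from raw counts to percentages can be sketched in a few lines of Python. This is only an illustration using the survey counts quoted above; the names `counts` and `row_percentages` are made up for this example, and each cell is divided by the total for its own pax type.

```python
# Counts copied from the survey figures above (CAA, 2000).
counts = {
    "Business": {"A/B": 18607, "C1": 14345, "C2": 1386, "D/E": 312},
    "Leisure": {"A/B": 43407, "C1": 52400, "C2": 21508, "D/E": 13035},
}


def row_percentages(row):
    """Express each cell of one row as a percentage of that row's total."""
    total = sum(row.values())
    return {group: round(100 * n / total, 1) for group, n in row.items()}


# Hypothetical helper applied to each pax type in turn.
percentages = {pax: row_percentages(row) for pax, row in counts.items()}

for pax, row in percentages.items():
    print(pax, row)
```

Once each row is expressed as percentages of its own total, the two pax types can be compared directly even though the leisure sample is almost four times larger.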
Frequency Distributions • Standard frequency distribution • Univariate frequency distribution • Grouped frequency distribution • Relative & cumulative frequency distribution
Standard Frequency • Standard frequency distribution • Presents the raw data as collected • e.g. “How many return flights did you take last year?” • Answers from 50 pax as a standard frequency distribution:
Number of return flights taken last year:
7 3 10 3 2 4 3 3 6 3
5 2 3 4 2 5 4 3 6 8
4 12 1 3 4 15 5 1 3 1
4 2 3 5 2 3 8 3 4 4
6 3 5 2 4 2 3 2 5 1
Univariate Frequency • Univariate frequency distribution • Lists data more clearly & with their frequency • Important for large sample sizes
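As a rough sketch, the univariate frequency distribution for the 50 responses listed on the Standard Frequency slide could be tallied as below. The variable names are illustrative, and `collections.Counter` is just one convenient way to count values.

```python
from collections import Counter

# The 50 answers to "How many return flights did you take last year?"
flights = [
    7, 3, 10, 3, 2, 4, 3, 3, 6, 3,
    5, 2, 3, 4, 2, 5, 4, 3, 6, 8,
    4, 12, 1, 3, 4, 15, 5, 1, 3, 1,
    4, 2, 3, 5, 2, 3, 8, 3, 4, 4,
    6, 3, 5, 2, 4, 2, 3, 2, 5, 1,
]

# Count how many pax gave each answer.
frequency = Counter(flights)

# List each distinct value once, in ascending order, with its frequency.
for value in sorted(frequency):
    print(f"{value:>2} flights: {frequency[value]:>2} pax")
```

The most common answer is 3 return flights (14 of the 50 pax), which is hard to see in the raw list.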
Grouped Frequency • Grouped frequency distribution • Groups all data according to categories • Further improves clarity
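A grouped frequency distribution for the same 50 responses might be sketched as follows. The class intervals used here (1-3, 4-6, 7-9, 10+) are an assumption made for illustration, since the groups on the original slide are not reproduced in this text.

```python
# The 50 answers to "How many return flights did you take last year?"
flights = [
    7, 3, 10, 3, 2, 4, 3, 3, 6, 3,
    5, 2, 3, 4, 2, 5, 4, 3, 6, 8,
    4, 12, 1, 3, 4, 15, 5, 1, 3, 1,
    4, 2, 3, 5, 2, 3, 8, 3, 4, 4,
    6, 3, 5, 2, 4, 2, 3, 2, 5, 1,
]

# Hypothetical class intervals: (label, lowest value, highest value).
# None as the upper bound means the group is open-ended.
bins = [("1-3", 1, 3), ("4-6", 4, 6), ("7-9", 7, 9), ("10+", 10, None)]


def grouped_frequency(values, bins):
    """Count how many values fall into each class interval."""
    result = {}
    for label, low, high in bins:
        result[label] = sum(
            1 for v in values if v >= low and (high is None or v <= high)
        )
    return result


grouped = grouped_frequency(flights, bins)
print(grouped)  # {'1-3': 26, '4-6': 18, '7-9': 3, '10+': 3}
```

Different interval choices give a different picture, so the groups should be chosen with the purpose of the table in mind.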
Relative & Cumulative Frequency • Relative & cumulative frequency distributions • Relative: each category as a % of the total • Cumulative: running total of the relative frequencies (each one added to the sum of the preceding ones)
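Relative and cumulative frequencies for the return-flight responses can be sketched like this. It is a minimal illustration with made-up variable names; the cumulative column is simply a running total of the relative column.

```python
from collections import Counter
from itertools import accumulate

# The 50 answers to "How many return flights did you take last year?"
flights = [
    7, 3, 10, 3, 2, 4, 3, 3, 6, 3,
    5, 2, 3, 4, 2, 5, 4, 3, 6, 8,
    4, 12, 1, 3, 4, 15, 5, 1, 3, 1,
    4, 2, 3, 5, 2, 3, 8, 3, 4, 4,
    6, 3, 5, 2, 4, 2, 3, 2, 5, 1,
]

frequency = Counter(flights)
total = len(flights)
values = sorted(frequency)

# Relative: each category as a percentage of the total.
relative = [100 * frequency[v] / total for v in values]

# Cumulative: running total of the relative frequencies.
cumulative = list(accumulate(relative))

for v, rel, cum in zip(values, relative, cumulative):
    print(f"{v:>2} flights: {rel:5.1f}% (cumulative {cum:5.1f}%)")
```

Reading down the cumulative column shows, for example, that 52% of pax took 3 or fewer return flights.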
Charts, Plots, Graphs & Pie-charts • Simple bar charts • Compound bar charts • Histograms • Scatter or dot plots • Line graphs • Pie-charts
Charts, Plots, Graphs & Pie-charts: Pros & Cons • Easily made using word-processing or spreadsheets • Ease of creation can lead to over-elaborate charts at the expense of clarity
Charts, Plots, Graphs & Pie-charts: Considerations • Should be clear & appropriate • Should be chosen with a purpose in mind • Not just for the sake of it • Typically include • Title • Labelled axes • Key that explains the different segments • Source of data • Must be referenced & discussed in the text • Do not assume that everyone will understand them • Data type will restrict which method is chosen
Simple Bar Charts • Simple bar charts • Horizontal or vertical charts of separate bars that represent size of data
Simple Bar Charts • Figure 1. Student results for SCM300 in 2007
Compound Bar Charts • Compound bar charts • Show proportions/relative size of groups • Bars will always have same height when % are used but not when figures are used • For 3+ components, pie-charts may be better
Compound Bar Charts • Figure 1. Student results for SCM300 in 2007
Histograms • Histograms • Similar to bar charts but a better indication of variation & distribution • Bars are connected instead of separate
Histograms • Figure 1. Student results for SCM300 in 2007
This figure indicates repeat visits to Norway & tourists’ interest in returning, but is it easy to understand?
Scatter or Dot Plots • Scatter or dot plots • Illustrate the exact distribution of data • Can be used to illustrate continuous data • BUT a line graph may be better • Effective for 2 related variables
Scatter or Dot Plots • Figure 1. Passengers & Aircraft Movements at HiMolde Airport
Line Graphs • Line graphs • Show trends over time • e.g. patterns, peaks & troughs, rates of incline/decline • Can show more than 1 variable at a time • This can indicate possible relationships • e.g. see next slide
Pie Charts • Pie-charts • Segments represent cases in each category • Best for 3-6 categories (no more, no less) • Labelling & shading sometimes difficult • Combining categories may improve clarity but loses detail
Statistical Representation • Measures of central tendency • Measures of dispersion • Normal distribution & skew
Measures of Central Tendency • Raw data can be confusing & meaningless • Measures of central tendency • AKA measures of location or average • Summarise the data in a single number • 3 different measures (mean, median & mode); which to use depends on intention or data • See next slide
Example Age of students
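The student age data from this example are not reproduced in the text, so the sketch below reuses the 50 return-flight responses from the Standard Frequency slide instead. It is only an illustration, using Python's standard `statistics` module, of how the three measures of central tendency can differ for the same data.

```python
import statistics

# The 50 answers to "How many return flights did you take last year?"
flights = [
    7, 3, 10, 3, 2, 4, 3, 3, 6, 3,
    5, 2, 3, 4, 2, 5, 4, 3, 6, 8,
    4, 12, 1, 3, 4, 15, 5, 1, 3, 1,
    4, 2, 3, 5, 2, 3, 8, 3, 4, 4,
    6, 3, 5, 2, 4, 2, 3, 2, 5, 1,
]

mean = statistics.mean(flights)      # sum of values / number of values
median = statistics.median(flights)  # middle value once sorted
mode = statistics.mode(flights)      # most frequent value

print(f"mean={mean:.2f}, median={median}, mode={mode}")
```

Here the mean (4.12) is pulled above the median and mode (both 3) by a few pax who took 10 or more flights, which is one reason the choice of measure depends on the data.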
Measures of Dispersion • Measures of central tendency don’t show: • How closely related values are (i.e. clustered) • How representative they are of the data set • The range of values • The degree of distortion by extreme values
Salaries of office staff at HiMolde Airways: £11k, £15k, £15k, £18k, £25k, £30k, £32k, £38k
Salaries of office staff at HiMolde Airport: £20k, £21k, £22k, £23k, £23k, £24k, £25k, £26k
Mean salary at HiMolde Airways = £184k / 8 = £23k
Mean salary at HiMolde Airport = £184k / 8 = £23k
Measures of Dispersion • Range • Inter-quartile range • Standard deviation
Range • Simplest & crudest measure of dispersion • Indicates spread of data • Place values in ascending order • Then subtract the smallest from the largest value • Extreme values affect (indeed determine) the outcome • Gives more insight into a data set than a measure of central tendency alone • But gives no indication of the clustering of individual values
Range
Salaries of office staff at HiMolde Airways: £11k, £15k, £15k, £18k, £25k, £30k, £32k, £38k
Salaries of office staff at HiMolde Airport: £20k, £21k, £22k, £23k, £23k, £24k, £25k, £26k
Range of salaries at HiMolde Airways = £38k - £11k = £27k
Range of salaries at HiMolde Airport = £26k - £20k = £6k
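The range calculation for the two salary lists can be sketched as below. The function name `value_range` is made up for this illustration, and salaries are entered in thousands of pounds.

```python
# Salaries in £k, copied from the two lists above.
airways = [11, 15, 15, 18, 25, 30, 32, 38]
airport = [20, 21, 22, 23, 23, 24, 25, 26]


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


print("Airways range (£k):", value_range(airways))  # 27
print("Airport range (£k):", value_range(airport))  # 6
```

Both groups have the same mean salary of £23k, but the ranges show that salaries at the airline are far more spread out.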
Inter-Quartile Range • Most appropriate when using ordinal data • Divides values into 4 equal parts (quartiles) • Is an extension of the idea of the median • Represents the middle 50% of the values that fall between the 1st & 3rd quartiles • Not affected by extremes • BUT doesn’t utilise all values • It discards 50% of the values & therefore provides a limited picture of the degree of clustering
Inter-Quartile Range [Diagram: values ordered from min. value to max. value and split by Q1, Q2 (the median value) & Q3 into four groups, each holding 25% of cases; the inter-quartile range spans Q1 to Q3]
Inter-Quartile Range
Salaries of office staff at HiMolde Airways: £11k, £15k, £15k, £18k, £25k, £30k, £32k, £38k
Salaries of office staff at HiMolde Airport: £20k, £21k, £22k, £23k, £23k, £24k, £25k, £26k
IQ Range of salaries at HiMolde Airways = £15k to £31k
IQ Range of salaries at HiMolde Airport = £22k to £24k
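Several conventions exist for locating quartiles, and the lecture does not say which one it uses. The sketch below assumes one common textbook approach: split the ordered values into a lower and an upper half and take the median of each half. The function name `quartiles` is illustrative only.

```python
import statistics

# Salaries in £k, copied from the two lists above.
airways = [11, 15, 15, 18, 25, 30, 32, 38]
airport = [20, 21, 22, 23, 23, 24, 25, 26]


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumed convention: for an odd number of values the middle value
    is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[-half:])
    return q1, q3


for name, data in (("Airways", airways), ("Airport", airport)):
    q1, q3 = quartiles(data)
    print(f"{name}: Q1={q1}, Q3={q3}, IQR={q3 - q1}")
```

Under this assumption the airline quartiles come out at £15k and £31k, matching the figures above. The airport quartiles come out at £21.5k and £24.5k, so the £22k and £24k quoted above may reflect rounding or a different quartile convention.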
Standard Deviation • Widely used in quantitative research • Most useful measure of dispersion • Utilises all data in the distribution • Compares each value in the distribution with the mean • It examines the variance of the data around the mean • Therefore saying something about how representative the mean is for the data set
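As an illustration of how standard deviation compares each value with the mean, the sketch below works through the calculation for the two salary lists used earlier in this lecture. The function name is made up, and the choice between dividing by n - 1 (sample) and n (population) is left as a parameter because the lecture does not specify it.

```python
import math

# Salaries in £k from the HiMolde Airways and HiMolde Airport examples.
airways = [11, 15, 15, 18, 25, 30, 32, 38]
airport = [20, 21, 22, 23, 23, 24, 25, 26]


def standard_deviation(values, sample=True):
    """Square root of the average squared deviation from the mean."""
    mean = sum(values) / len(values)
    # Compare each value with the mean and square the difference.
    squared_deviations = [(v - mean) ** 2 for v in values]
    # Divide by n - 1 for a sample, or by n for a whole population.
    divisor = len(values) - 1 if sample else len(values)
    variance = sum(squared_deviations) / divisor
    return math.sqrt(variance)


print(f"Airways SD (£k): {standard_deviation(airways):.2f}")  # 9.68
print(f"Airport SD (£k): {standard_deviation(airport):.2f}")  # 2.00
```

Both groups share a mean of £23k, but the much smaller standard deviation at the airport suggests that its mean is far more representative of individual salaries.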