N318b Winter 2002 Nursing Statistics. Lecture 2 : Measures of Central Tendency and Variability. Today’s Class(es). mean, median, mode range, standard deviation, variance << 10 min break >> Some examples Applying knowledge to assigned readings (Arathuzik; Hayman et al.).
N318b Winter 2002 Nursing Statistics Lecture 2: Measures of Central Tendency and Variability
Today’s Class(es) • mean, median, mode • range, standard deviation, variance << 10 min break >> • Some examples • Applying knowledge to assigned readings (Arathuzik; Hayman et al.) Followed by small groups from 12-2 PM, focusing on determining and interpreting measures of central tendency and dispersion
A Quick Review from Last Week - 1 Measurement Scales Nominal data Ordinal data Interval data Ratio data Variable Types Dependent Independent
Measures of Central Tendency A basic cornerstone of most research statistics is that numeric data points tend to group together, usually in identifiable (predictable) ways – i.e. they tend to congregate around a common value • mean • median • mode You should know what these three things are and how they differ from each other
Mean Most appropriate for ratio or interval data (i.e. continuous numeric data) but not if strongly skewed $\bar{x} = (x_1 + x_2 + x_3 + \dots + x_n) / N$ Where $x_1, x_2, x_3, \dots, x_n$ are independent data points and N is the total number of data points Note: $x_1 + x_2 + x_3 + \dots + x_n$ is also written as $\sum X$
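A minimal sketch of the mean formula in Python, using the Group 1 ages from the age-group example later in this lecture. The function name `mean` and the variable names are chosen here for illustration only.

```python
# Group 1 ages from the lecture's age-group example.
ages = [11, 12, 13, 13, 14, 15]

def mean(values):
    """Arithmetic mean: the sum of all data points divided by N."""
    return sum(values) / len(values)

group_mean = mean(ages)
print(group_mean)  # 78 / 6 = 13.0
```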
Some Properties of the Mean • All data points contribute to its value • Sensitive to extreme values • Sum of deviations always zero, i.e. $\sum (x - \bar{x}) = 0$ • Sum of squared deviations at a minimum, i.e. $\sum (x - \bar{x})^2$ is lower for the mean than for any other value • Mean is algebraic, thus it can be manipulated, making it more useful statistically • When the sample is large enough (e.g. >25) it does a good job of estimating the true population mean
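Two of these properties can be checked numerically. The sketch below uses the Group 2 ages from the lecture's age-group example; the helper name `sum_sq_dev` is hypothetical and introduced only for this illustration.

```python
# Group 2 ages from the lecture's age-group example (includes one extreme value).
data = [11, 12, 13, 13, 14, 25]
x_bar = sum(data) / len(data)

def sum_sq_dev(values, centre):
    """Sum of squared deviations of the values from a chosen centre."""
    return sum((x - centre) ** 2 for x in values)

# Property 1: deviations from the mean sum to zero (up to floating-point rounding).
sum_dev = sum(x - x_bar for x in data)

# Property 2: squared deviations are smallest when measured from the mean.
ss_mean = sum_sq_dev(data, x_bar)    # measured from the mean
ss_median = sum_sq_dev(data, 13)     # measured from the median instead

print(round(sum_dev, 10), ss_mean < ss_median)
```

Any centre other than the mean, not only the median, gives a larger sum of squared deviations.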
Median Most appropriate for ratio data (i.e. continuously scaled) even if skewed median = mid-point of distribution (i.e. the 50th percentile) Divides the data into two equally sized groups (i.e. same frequency or count in each)
Some Properties of the Median • Typically not calculated as it is simply the mid-point (but data must be sorted/ordered) • Median not sensitive to extreme values thus useful if data skewed • Not used with nominal data since it requires data to have an order • Does not have to actually exist as a data point (e.g. mid-point between adjacent data points)
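A short sketch of these median properties using Python's standard `statistics` module. The three data sets are made-up illustrative values, not data from the lecture.

```python
import statistics

# Hypothetical values chosen for illustration only.
odd_sample = [14, 11, 25, 12, 13]       # unsorted; median() sorts internally
even_sample = [11, 12, 14, 15]          # even count: no single middle value
with_outlier = [11, 12, 13, 14, 250]    # one extreme value

median_odd = statistics.median(odd_sample)          # middle value after sorting
median_even = statistics.median(even_sample)        # mid-point of 12 and 14
median_outlier = statistics.median(with_outlier)    # unaffected by the 250

print(median_odd, median_even, median_outlier)
```

In the even-sized sample the median (13.0) does not exist as a data point, and in the last sample the extreme value moves the mean a long way but leaves the median at 13.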
Mode Typically more useful for grouped data (i.e. ordinal or re-scaled continuous data) mode = most common value Has descriptive value but it is not a widely used statistic
Some Properties of the Mode • Not calculated (but observed) • If all values unique then no mode • May be more than one mode (e.g. bimodal, trimodal, etc.) • Only measure of central tendency for strictly nominal data
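The mode is found by counting rather than by formula. The sketch below follows the slide's convention that a data set has no mode when every value is unique; the function name `modes` and the example values (other than the Group 1 ages) are assumptions made for illustration.

```python
from collections import Counter

def modes(values):
    """Return all most-common values; an empty list if every value is unique.

    Assumes `values` is non-empty.
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # all values unique, so no mode
    return sorted(v for v, c in counts.items() if c == top)

one_mode = modes([11, 12, 13, 13, 14, 15])   # Group 1 ages from the lecture
two_modes = modes([1, 1, 2, 2, 3])           # bimodal (hypothetical values)
no_mode = modes([1, 2, 3])                   # all unique (hypothetical values)
nominal = modes(["A", "O", "O", "B"])        # works for nominal categories too

print(one_mode, two_modes, no_mode, nominal)
```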
Mean, Median and Mode When the distribution of data points is very even (i.e. normally distributed), the three converge centrally [Figure: mean, median and mode all in the same position in a perfect distribution]
Mean, Median and Mode “Real” data points are rarely (never!) perfectly normally distributed, thus some differences typically do exist [Figure: a “left” skewed sample, in which the mean is less than the median; mean, median and mode marked]
Mean, Median and Mode Age groups Group 1 = (11, 12, 13, 13, 14, 15) $\bar{x}_1 = 13$ Group 2 = (11, 12, 13, 13, 14, 25) $\bar{x}_2 \approx 14.7$ Mean affected by extreme value Median is 13 in both groups – divides data in half Mode is 13 in both groups – most common value
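The two age groups can be compared directly with Python's standard `statistics` module. This is a sketch for checking the arithmetic; the variable names are chosen here for illustration.

```python
import statistics

group_1 = [11, 12, 13, 13, 14, 15]
group_2 = [11, 12, 13, 13, 14, 25]   # same as Group 1 except for one extreme value

summary = {}
for name, ages in (("Group 1", group_1), ("Group 2", group_2)):
    summary[name] = {
        "mean": statistics.mean(ages),
        "median": statistics.median(ages),
        "mode": statistics.mode(ages),
    }
    print(name, summary[name])
```

Only the mean changes between the groups (78/6 = 13 versus 88/6, about 14.7); the median and mode stay at 13.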
Measures of Dispersion The “Flip-side”: Viewed another way, numeric data points in most research also tend to vary from each other, usually in identifiable (predictable) ways – i.e. they tend to be spread out • Standard deviation • Variance • Percentiles • Range You should know what these four things are and how they differ from each other
Dispersion (or spread) Two samples with the same mean can have very different dispersion [Figure: two distributions centred on the same mean; Sample A: less spread, Sample B: more dispersed, so SD of A < SD of B] Sample A measured more precisely?
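Standard deviation [Figure: normal curve centred on the mean, with -2 SD, -1 SD, +1 SD and +2 SD marked] 1 SD either side of the mean includes about 68% of the sample 2 SD either side of the mean includes about 95% of the sample (for approximately normally distributed data)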
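The 68% and 95% figures can be checked against a theoretical normal curve. The sketch below assumes the data follow a normal distribution, which real samples only approximate; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, SD 1. The result holds for any mean and SD.
std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within_1 = share_within(1)
within_2 = share_within(2)

print(f"{within_1:.1%} within 1 SD, {within_2:.1%} within 2 SD")
```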
Standard deviation Key indicator of the average point deviation from the sample mean $SD = \sqrt{\sum (x - \bar{x})^2 / (N - 1)}$ SD is the most important dispersion measure If SD is low relative to the mean then the measure is more precise (see coefficient of variation in textbook)
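A minimal sketch of the sample standard deviation (dividing by N - 1), worked on the two age groups from earlier in the lecture. The function name `sample_sd` is chosen for illustration, and the coefficient of variation is computed here as SD divided by the mean, which is the usual definition but should be checked against the textbook's.

```python
import math
import statistics

def sample_sd(values):
    """Sample SD: square root of the summed squared deviations over N - 1."""
    n = len(values)
    x_bar = sum(values) / n
    sum_sq = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(sum_sq / (n - 1))

group_1 = [11, 12, 13, 13, 14, 15]
group_2 = [11, 12, 13, 13, 14, 25]

sd_1 = sample_sd(group_1)   # squared deviations sum to 10, so sqrt(10 / 5)
sd_2 = sample_sd(group_2)   # the extreme value of 25 inflates the SD

# Coefficient of variation: SD relative to the mean (assumed definition).
cv_1 = sd_1 / statistics.mean(group_1)
cv_2 = sd_2 / statistics.mean(group_2)

print(round(sd_1, 2), round(sd_2, 2), round(cv_1, 2), round(cv_2, 2))
```

The hand-written function agrees with `statistics.stdev`, which also uses the N - 1 denominator.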
Other measures of deviation Variance: average of the squared deviations from the mean (i.e. SD squared); important for later methods Range: maximum value - minimum value; useful for describing sample Percentiles: Value above which and below which a certain proportion of the sample falls
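These three measures can be sketched with Python's standard `statistics` module, using the Group 2 ages from earlier in the lecture. Note that textbooks and software packages use slightly different rules for interpolating percentiles, so quartile values from `statistics.quantiles` may differ a little from hand calculations.

```python
import statistics

data = [11, 12, 13, 13, 14, 25]   # Group 2 ages from the lecture

# Variance: sample variance with the N - 1 denominator (equals SD squared).
variance = statistics.variance(data)

# Range: maximum value minus minimum value.
data_range = max(data) - min(data)

# Percentiles: the three cut points splitting the data into four equal parts
# (25th, 50th and 75th percentiles); the middle one is the median.
quartiles = statistics.quantiles(data, n=4)

print(round(variance, 2), data_range, quartiles)
```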
Example 1 Assignment #1 Marks
What happens if we remove the zeros – i.e. the most influential (outlying) observations?
Example 2 Assign #1 – Zeros dropped
Arathuzik (1994) Quick summary of the paper: – a pilot study examining the effects of a combination of interventions on pain perception, pain control and mood in metastatic breast cancer patients – pre-test / post-test experimental design – 3 groups enrolled with 24 (convenience sample) subjects randomly allocated to the intervention groups
A few questions … Q1. What do you think of the sample size? Only 8 per group gives little chance to accurately address hypotheses What happens if you change age categories of only 2 subjects in Table 1? What about education level? Small samples are unstable!
A few questions … Q2. How are the pain scales expressed? Visual analogue scales with 0 being no pain and 10 being extreme pain [Figure: visual analogue scale running from 0 (no pain) to 10 (extreme pain)] How are they treated in the analysis? Table 2 - as continuous data - this may make it even harder to see an effect since they are not very precise
Hayman et al. (1995) Quick summary of the paper: – matched pair analysis of twins to examine nongenetic influences of obesity on lipid profile and blood pressure both cross-sectionally (Phase 1, N=73 pairs) and longitudinally (Phase 2 , N=56 pairs)
A few questions … Q1. Describe the sample population in terms of race, age and sex? Did it change much over time? Race – all white, both Phases Age: at Phase 1: M=8.5 yrs, SD = 1.8 yrs at Phase 2: M=12.5 yrs, SD = 1.8 yrs Sex: at Phase 1: 43.8% male, 56.2% female at Phase 2: 44.6% male, 55.4% female
A few questions … Q2. How long was the follow-up period? p278 - “median interval between measurements was 40 months” What does this mean? Roughly half the time periods were longer than 40 months and half were less than 40 months (i.e. it was the “dividing line”)
Next Week - Lecture 3: Graphs, Normal Curve and Central Limit Theorem • For next week’s class please review: • Page 13 in syllabus • Textbook Chapter 2, pages 46-57 • Textbook Chapter 3, pages 65-70 • Syllabus papers: • i) Kilpack (1991) • ii) Paulson & Altmaier (1995)
Workshop Rooms: H018, H19 and H9 MS016, MS017, MS018, MS022 MS023, MS027, MS028, and MS029 All rooms are now confirmed for rest of the year so please go to the same room with your group as last time
“In Group” Session – Q#1: 3rd column not necessary – i.e. no missing data!
A Quick Review from Last Week - 2 Summarizing Hypotheses • Null or Research? • Directional or Non-directional? • Causal or Associative? • Simple or Complex?