Clinical Research Management 512 Leslie McIntosh lmcintosh at path.wustl.edu
Lecture 8 • Overview of Statistics • Part 1 • Vocabulary • Levels of Measurement • Measures of Central Tendency • Part 2 • Confidence Intervals • Part 3 • Dataset Introduction • Homework
Statistics • The study of inference – taking information from a small amount of data (sample) and making generalizations to a larger setting (population) • Extrapolating from a sample to a population
Epidemiology • The study of the distribution and determinants of disease frequency in human populations and the application of this study to control health problems. • Keywords: • Population • Disease Frequency • Disease Distribution • Disease Determinants • Disease Control • Health Promotion Aschengrau & Seage: Essentials of Epidemiology in Public Health
Part 1 • Vocabulary • Levels of Measurement • Measures of Central Tendency
Vocabulary • Measures of Central Tendency • Mean • Median • Mode • Measure of Spread • Range • Levels of Measurement • Continuous • Discrete (aka: Categorical) • Dichotomous • Nominal • Ordinal
Levels of Measurement (Variable Types) • Continuous: variables that can take on any value in a given range • Discrete (aka Categorical): variables that have a limited number of values • Dichotomous: two values • Nominal: multiple values where order does not matter • Ordinal: multiple values where order does matter
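As a rough illustration of these levels, the sketch below shows one possible way (an assumption, not a convention from the lecture) to store each variable type in Python. The variables `systolic_bp_mmhg` and `is_smoker` and the classes `BloodType` and `PainSeverity` are hypothetical clinical examples. Dichotomous, nominal, and ordinal are all discrete types; the practical difference between the last two is whether ranking the categories is meaningful.

```python
from enum import Enum, IntEnum

# Hypothetical clinical variables, one per level of measurement.

class BloodType(Enum):
    """Nominal: several categories, order does not matter."""
    A = "A"
    B = "B"
    AB = "AB"
    O = "O"

class PainSeverity(IntEnum):
    """Ordinal: several categories, order does matter."""
    MILD = 1
    MODERATE = 2
    SEVERE = 3

systolic_bp_mmhg: float = 121.5   # continuous: any value in a range
is_smoker: bool = False           # dichotomous: exactly two values

# Ordinal categories can be ranked...
print(PainSeverity.SEVERE > PainSeverity.MILD)   # True

# ...but nominal categories cannot: plain Enum members do not support "<".
try:
    BloodType.A < BloodType.O
except TypeError:
    print("Blood types have no natural order")
```

Using `IntEnum` for the ordinal variable lets Python compare categories by rank, while the plain `Enum` refuses to order nominal categories, mirroring the "order matters / does not matter" distinction on the slide.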
Activity: Variable Types • What is your gender? • What is your age? • What is the highest level of education you have completed? • What is your own yearly income? • What is your total household income, including all earners in your household? • What is your current marital status? • What is your religious affiliation? • What is your race?
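Measures of Central Tendency • Mean • Sum of all values divided by the number of values in the distribution • Median • Midpoint of the distribution: the middle value once the values are sorted (the average of the two middle values when the count is even) • Mode • Most frequent value in the distribution • Range (a measure of spread rather than of central tendency) • The difference between the smallest and the largest value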
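A minimal sketch of the four quantities on this slide, using Python's standard `statistics` module. The helper name `describe` and the data list are hypothetical; the lecture does not prescribe any software.

```python
import statistics

def describe(values):
    """Mean, median, and mode (central tendency) plus the range (spread)."""
    return {
        "mean": statistics.mean(values),      # sum of values / number of values
        "median": statistics.median(values),  # middle value once sorted
        "mode": statistics.mode(values),      # most frequent value
        "range": max(values) - min(values),   # largest minus smallest
    }

# Hypothetical data, not taken from the lecture
print(describe([2, 3, 3, 5, 7, 10]))
# {'mean': 5, 'median': 4.0, 'mode': 3, 'range': 8}
```

With six values the median is the average of the third and fourth sorted values, (3 + 5) / 2 = 4.0. If several values tie for most frequent, `statistics.mode` (Python 3.8+) returns the first one it encounters, so `statistics.multimode` is the safer choice for data with more than one mode.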
Part 2 Confidence Intervals
Confidence Interval (CI) • Closely related to the margin of error, which is the half-width of the interval. • A 95% CI comes from a method that, over many repeated samples, captures the true population value about 95% of the time and misses it about 5% of the time. • The scientist/researcher can therefore be 95% confident that a given 95% CI includes the true population value.
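The repeated-sampling meaning of "95%" can be checked with a small simulation: draw many samples from a population whose mean is known, build a 95% CI from each, and count how often the interval contains the true mean. This sketch assumes a normal population with a known SD, so each interval is simply the sample mean ± 1.96 × SD/√N; the population values, `N`, and `TRIALS` are hypothetical.

```python
import random
from statistics import NormalDist, fmean

random.seed(2024)
TRUE_MEAN, TRUE_SD = 100.0, 15.0   # hypothetical normal population (SD assumed known)
N, TRIALS = 25, 4_000              # sample size and number of repeated studies
z = NormalDist().inv_cdf(0.975)    # about 1.96 for a 95% interval
margin = z * TRUE_SD / N ** 0.5    # margin of error (half-width of the CI)

hits = 0
for _ in range(TRIALS):
    sample_mean = fmean(random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N))
    if sample_mean - margin <= TRUE_MEAN <= sample_mean + margin:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of the 95% CIs contained the true mean")
```

The printed share should land close to 95%. Any single interval either contains the true mean or does not, which is why the 95% describes the method rather than one particular interval.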
Confidence Interval • Provides a range of values • Based on observations from one sample • Gives information about closeness to the unknown population parameter • Stated in terms of probability • Never 100% sure
Confidence Interval Assumptions • Sample was randomly selected • Observations (values collected) were independent • Assessment was accurate
Confidence Interval of a Proportion • Provides a range of values • Based on observations from one sample • Gives information about closeness to the unknown population parameter • Stated in terms of probability • Never 100% sure • Assumptions: • Sample was randomly selected • Observations (values collected) were independent • Assessment was accurate
Approximate 95% CI of Proportion • p = proportion (e.g. 6/10 = 0.60) • N = sample size • $p - 1.96\sqrt{p(1-p)/N}$ to $p + 1.96\sqrt{p(1-p)/N}$
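The approximate CI of a proportion can be written as a small function. This is a sketch of the simple normal-approximation interval (often called the Wald interval); the function name `approx_ci_proportion` is hypothetical, and clipping the limits to the 0 to 1 range is an added choice rather than something stated on the slide. The usage line reuses the slide's example, p = 6/10.

```python
import math
from statistics import NormalDist

def approx_ci_proportion(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation CI for a proportion: p ± z * sqrt(p(1 - p) / n)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% CI
    margin = z * math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1] because a proportion cannot fall outside that range
    return max(0.0, p - margin), min(1.0, p + margin)

low, high = approx_ci_proportion(6, 10)
print(f"95% CI: {low:.3f} to {high:.3f}")   # 95% CI: 0.296 to 0.904
```

The approximation is usually considered reasonable only when N·p and N·(1 − p) are both at least about 5. With N = 10 and p = 0.60 the second product is 4, so this interval should be read loosely; adjusted or exact methods (for example Wilson or Clopper-Pearson) behave better for small samples.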
Confidence Interval Practice • 12 out of 45 statistics students succumbed to the flu. What is the 95% CI for the proportion of students who got the flu? • What assumptions are being made? • http://www.mccallum-layton.co.uk/stats/ConfidenceIntervalCalcProportions.aspx
Confidence Interval Practice • 4% of my students e-mailed me stating they did not like the last lecture. What is the 95% CI? • What assumptions are being made?
What affects the Confidence Interval of a Proportion? • Sample size • Confidence level (e.g. 90%, 95%, 99%)
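To see both effects numerically, the sketch below tabulates the half-width (margin of error) of the approximate CI of a proportion for several sample sizes and confidence levels. The observed proportion 0.30 and the helper `margin_of_error` are hypothetical. Because the width depends on √(p(1 − p)/N), quadrupling N halves the margin, and moving from 90% to 99% confidence widens it. The observed proportion also matters: p(1 − p) is largest at p = 0.5.

```python
import math
from statistics import NormalDist

def margin_of_error(p: float, n: int, level: float) -> float:
    """Half-width of the approximate (normal-approximation) CI of a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * math.sqrt(p * (1 - p) / n)

p = 0.30  # hypothetical observed proportion
for n in (25, 100, 400):
    cells = ", ".join(
        f"{level:.0%}: ±{margin_of_error(p, n, level):.3f}" for level in (0.90, 0.95, 0.99)
    )
    print(f"N = {n:>3} | {cells}")
```

Reading down a column shows the sample-size effect; reading across a row shows the confidence-level effect.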
Demonstration • Schultz, Eric. "Confidence Intervals: Confidence Level, Sample Size, and Margin of Error" from the Wolfram Demonstrations Project. http://demonstrations.wolfram.com/ConfidenceIntervalsConfidenceLevelSampleSizeAndMarginOfError/
Confidence Interval of the Mean • Assumptions: • Sample is randomly selected from the population. • The population is distributed in a Gaussian manner. • All subjects are from the same population. • Subjects are selected independently of one another.
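Normal Curve • [Figure: bell-shaped normal curve centered on the mean, with about 68% of values within ±1 SD (σ), 95% within ±2 SD, and 99.7% within ±3 SD]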
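The percentages on the normal curve can be reproduced from the standard normal distribution with Python's `statistics.NormalDist` (standard library, Python 3.8+). The sketch also shows that the exact multiplier for 95% is 1.96 SD, the usual multiplier in approximate 95% confidence intervals.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of a normal distribution lying within ±k SD of the mean
share_within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in share_within.items():
    print(f"within ±{k} SD of the mean: {share:.1%}")
# within ±1 SD of the mean: 68.3%
# within ±2 SD of the mean: 95.4%
# within ±3 SD of the mean: 99.7%

# The exact multiplier that captures 95% is a little under 2
print(round(standard_normal.inv_cdf(0.975), 2))   # 1.96
```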
Calculating the CI of the Mean • The sample mean: It is our best guess of the _________ mean. • Variability: This is the standard deviation (SD). When data have a lot of variability, the mean of the sample will more likely be ________ from the population mean.
Calculating the CI of the Mean • Sample size: When the sample size is large, you expect the sample mean to be ________ to the population mean and the CI to be ________. • Confidence level: Typically 99%, 95%, or 90%. If you want more confidence, you must generate a _______ CI.
Approximate 95% CI of Mean • m = sample mean • s = standard deviation of the sample • N = sample size • $m - 1.96\,s/\sqrt{N}$ to $m + 1.96\,s/\sqrt{N}$
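Approximate 90% & 99% CI of Mean • 90% CI: $m - 1.645\,s/\sqrt{N}$ to $m + 1.645\,s/\sqrt{N}$ • 99% CI: $m - 2.576\,s/\sqrt{N}$ to $m + 2.576\,s/\sqrt{N}$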
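The 90%, 95%, and 99% intervals of the mean differ only in the multiplier, so one function can cover all three. In this sketch (the function name `approx_ci_mean` and the blood pressure readings are hypothetical) the multiplier comes from the normal curve: about 1.645, 1.96, and 2.576. For small samples, statisticians usually replace it with a slightly larger value from the t distribution with N − 1 degrees of freedom, which gives a wider and more accurate interval; the normal version is kept here to match the approximate formulas.

```python
import math
import statistics
from statistics import NormalDist

def approx_ci_mean(data, level=0.95):
    """Approximate CI of the mean: m ± z * s / sqrt(N), with z from the normal curve."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations to estimate the SD")
    m = statistics.fmean(data)                  # sample mean
    s = statistics.stdev(data)                  # sample SD (divides by N - 1)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.645, 1.960, 2.576 for 90/95/99%
    half_width = z * s / math.sqrt(n)
    return m - half_width, m + half_width

# Hypothetical systolic blood pressure readings in mmHg (not lecture data)
readings = [118, 125, 131, 122, 140, 115, 128, 135, 120, 126]
for level in (0.90, 0.95, 0.99):
    low, high = approx_ci_mean(readings, level)
    print(f"{level:.0%} CI: {low:.1f} to {high:.1f} mmHg")
```

All three intervals are centered on the same sample mean (126.0 mmHg here); only their width grows with the confidence level.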
Central Limit Theorem • Whatever the shape of the original population (provided its variance is finite), the distribution of sample means approaches a Gaussian (normal) distribution as the samples become large enough. • http://www.socr.ucla.edu/htmls/SOCR_Experiments.html
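A quick simulation makes the theorem concrete. The sketch assumes an exponential population with mean 1 and SD 1, which is strongly right-skewed, and compares single observations with means of samples of size N = 50 (the sizes, seed, and helper `share_within` are hypothetical). For a normal distribution about 68.3% of values fall within 1 SD of the mean. Single exponential values fall there about 86.5% of the time (exactly 1 − e⁻², a sign of the skew), while the sample means land close to the normal 68%.

```python
import random
from statistics import fmean

def share_within(values, center, spread):
    """Fraction of values lying within center ± spread."""
    return sum(abs(v - center) <= spread for v in values) / len(values)

random.seed(7)
N, REPS = 50, 10_000   # hypothetical sample size and number of samples
# Exponential population with rate 1: mean 1, SD 1, strongly right-skewed
POP_MEAN = POP_SD = 1.0

raw_values = [random.expovariate(1.0) for _ in range(REPS)]
sample_means = [fmean(random.expovariate(1.0) for _ in range(N)) for _ in range(REPS)]
standard_error = POP_SD / N ** 0.5   # SD of the sample means

raw_share = share_within(raw_values, POP_MEAN, POP_SD)
means_share = share_within(sample_means, POP_MEAN, standard_error)
print(f"raw values within 1 SD:   {raw_share:.3f}")
print(f"sample means within 1 SE: {means_share:.3f}   (normal curve: 0.683)")
```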
Part 3 • Dataset Introduction • Homework
Activity • Create a dataset • Central tendency exercises
Central Tendency Homework • From 1996 to 2004, the ages of the Best Actor Oscar winners at the time of winning the award were 45, 59, 45, 42, 35, 46, 28, 42, and 36. The ages of the Best Actress Oscar winners at the time of winning the award for the same period were 39, 33, 25, 24, 32, 32, 34, 27, and 30. • These are two populations, one for best actors and one for best actresses. Why are these populations and not samples? • Find the mean age of women winning the Oscar for best actress from 1996 to 2004. • Find the median age of men winning the Oscar for best actor from 1996 to 2004. • Find the median age of women winning the Oscar for best actress from 1996 to 2004. • Is the mode a useful measure of central tendency for either population? Explain. • Which of the measures of central tendency provides the best measure of the middle of these two population distributions? Explain.