Chapter 1 Why Statistics?
Learning can result from: • Critical thinking • Asking an authority • Religious experience However, collecting DATA is the surest way to learn about the world
Data in the Sciences are messy • At first glance, data often look like an incoherent jumble of numbers • How do we make sense of data? Statistical procedures are tools for learning about the world by Learning from Data.
Real Data! • To help you understand the power and usefulness of statistics, we will explore two real and interesting data sets • “The Smoking Study” • “The Maternity Study”
The Smoking Study • From the University of Wisconsin Center for Tobacco Research and Intervention • 608 participants provided data on smoking, addiction, withdrawal, and how best to quit smoking • The full data set is provided on the CD; a description of the data collected is provided in the appendices of the book
The Maternity Study • From the Wisconsin Maternity Leave and Health Project • 244 families provided data on marital satisfaction, child-rearing styles, and other household events • The full data set is provided on the CD; a description of the data collected is provided in the appendices of the book
Variability • Why are data messy? • Consider a concrete example: Depression scores (“CESD”) for participants in the Smoking Study • Some participants (each has a different ID number) have CESD scores of 0, while others have scores of 2, 11 or 7, or some other value • These data are messy in that the scores are different from one another • Variability is the statistical term for the degree to which scores (such as the depression scores) differ from one another.
Sources of Variability • It is easy to see that depression scores are variable, but why? • Individual differences • Some people are more depressed than others • Some people have difficulty reading and understanding the questions on the test • Some people answer the questions more honestly than others • Procedure • Differences in the ways the data were collected • Conditions or Treatments • The conditions that are imposed on the participants of the study
Populations and Samples • Statistical Population – a collection or set of measurements of a variable that share some common characteristic • Sample – a subset of measurements from a population • Random sample – a sample selected such that every score in the population has an equal chance of being included
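The definition of a random sample can be sketched in a few lines of Python. This is only an illustration: the `population` list holds made-up scores (not Smoking Study data), and `draw_random_sample` is a hypothetical helper name.

```python
import random

# Hypothetical population of scores, invented for illustration only.
population = [0, 2, 11, 7, 4, 9, 15, 3, 6, 1]

def draw_random_sample(scores, n, seed=None):
    """Draw n scores without replacement so that every score in the
    population has an equal chance of being included."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(scores, n)

sample = draw_random_sample(population, 4, seed=1)
print(sample)
```

The sample is a subset of the population, so every value it contains also appears in `population`.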
Chapter 2 Frequency Distributions and Percentiles
Variability (revisited) • Collecting Data means measuring a variable • Those measurements differ (vary) from one another • One way to organize and summarize a set of measurements is to construct a frequency distribution • These methods can be applied to both populations and samples
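As a rough sketch of how a frequency distribution is built, the Python below counts how often each score value occurs and reports the relative frequency as well. The `years` list is made up for illustration (it is not the YRSMK data), and `frequency_distribution` is a hypothetical helper name.

```python
from collections import Counter

def frequency_distribution(scores):
    """Return (value, frequency, relative frequency) rows, sorted by value."""
    counts = Counter(scores)
    n = len(scores)
    return [(value, counts[value], counts[value] / n) for value in sorted(counts)]

# Hypothetical measurements, invented for illustration only.
years = [5, 9, 9, 12, 21, 21, 21, 30]

table = frequency_distribution(years)
for value, freq, rel_freq in table:
    print(f"{value:>5} {freq:>5} {rel_freq:>8.3f}")
```

The same function works whether `scores` holds a whole population or only a sample.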
Example YRSMK – Number of Years Smoking Daily From the First 60 Participants in the Smoking Study
A Better Summary? YRSMK – Number of Years Smoking Daily From the First 60 Participants in the Smoking Study
Percentiles • We have been focusing on distributions rather than individual scores • Sometimes, individual scores are of great importance • Computing percentiles when n = 608: • The 50th percentile is the “middle” score. Because 608 × 0.50 = 304, it is the 304th sorted score. • For the 32nd percentile, 608 × 0.32 = 194.56, which rounds up to 195, so it is the 195th sorted score.
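The percentile arithmetic on this slide can be sketched in Python. The sketch assumes the rule implied by the slide's examples (multiply n by the proportion and round up to the next whole position); other textbooks and software use slightly different conventions. The function names are hypothetical.

```python
import math

def percentile_position(n, p):
    """1-based position, among n sorted scores, of the p-th percentile.

    Assumes the slide's rule: n * (p / 100), rounded up.
    """
    if not 0 < p <= 100:
        raise ValueError("p must be greater than 0 and at most 100")
    return math.ceil(n * p / 100)

def percentile_score(scores, p):
    """Return the score at the p-th percentile of the given measurements."""
    ordered = sorted(scores)
    return ordered[percentile_position(len(ordered), p) - 1]

print(percentile_position(608, 50))  # 304
print(percentile_position(608, 32))  # 195
```

With n = 608 this reproduces the slide's positions: 304 for the 50th percentile and 195 for the 32nd.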
Percentile Rank • The percentile rank of a score is the percent (the proportion times 100) of the measurements in the distribution below that score value • Computing percentile rank for YRSMK: • Sort the variable and call the result YRSMK_sorted • The proportion of scores below 9 is 50/608 = 0.082, so the percentile rank of 9 is about 8 (the 8th percentile) • The proportion of scores below 21 is 246/608 = 0.405, so the percentile rank of 21 is about 40 (the 40th percentile)
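The definition of percentile rank translates directly into code. In this minimal Python sketch, `percentile_rank` is a hypothetical helper, and the `constructed` list is built artificially so that exactly 50 of its 608 values fall below 9; it mirrors the slide's counts but is not the real YRSMK data.

```python
def percentile_rank(scores, value):
    """Percent of the measurements that fall strictly below `value`."""
    below = sum(1 for s in scores if s < value)
    return 100 * below / len(scores)

# Constructed data: 50 values below 9 and 558 values at or above 9,
# chosen only to mirror the counts in the slide's first example.
constructed = [1] * 50 + [9] * 558

rank_of_9 = percentile_rank(constructed, 9)
print(round(rank_of_9, 1))  # 8.2
```

Counting by comparison means the scores do not need to be sorted first; sorting simply makes the count easy to do by hand.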
Graphing Distributions • Graphing distributions is a very valuable tool for highlighting features of the data • Shape • Range • Central Tendency • Variability
Shape • We classify the shape of distributions in three ways: • Symmetry – is one half a mirror image of the other half? • Skew – are there high/low frequencies of low/high scores? • Modality – how many humps or modes?
Symmetry • Is one half of the distribution a mirror image of the other (along a vertical axis)? • Three examples of symmetrical distributions:
Skew • Negative – low frequencies of low values and high frequencies of high values • Positive – high frequencies of low values and low frequencies of high values
Modality • How many humps (or modes)? • Unimodal (one hump) • Bimodal (two humps)
Characterizing Shape • Asymmetric, negatively skewed, bimodal • Asymmetric, positively skewed, unimodal
Central Tendency and Variability • In addition to shape, distributions differ in terms of: • Central Tendency - scores near the center of the distributions; where the scores “tend” to be • Variability – the degree to which scores differ from one another; the “spread” of the scores
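Central tendency and variability can each be summarized with a single number. The Python sketch below uses the standard library's `statistics` module; the choice of mean and median for central tendency, and range and standard deviation for variability, is an assumption made here for illustration, and the `scores` list is invented.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency: where the scores "tend" to be.
mean = statistics.mean(scores)
median = statistics.median(scores)

# Variability: how much the scores differ from one another.
score_range = max(scores) - min(scores)
std_dev = statistics.pstdev(scores)  # treats the scores as a whole population

print(mean, median, score_range, std_dev)
```

Two distributions can share the same center and still differ in spread, which is why both kinds of summary are reported.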
Comparing Distributions • It is very useful to be able to compare and contrast distributions (name their similarities and differences) • Distributions can differ in terms of shape, central tendency, and variability
Comparing Distributions How do these distributions differ?