Descriptive Statistics: Becoming Familiar with the Data
The Strategies • Initial Screening • Levels of Measurement • Five Descriptive Questions • Graphical Presentations • Search for Outliers
Initial Screening • Missing values • Defining labels • Key punch errors • Valid values • Understanding what you have • Understanding the population, sampling frame, and sample
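The screening checks above can be sketched in a few lines of Python. This is only an illustration: the records, the gender codebook, and the 0 to 100 score range are all made-up assumptions, not part of the original slides.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "gender": 1, "score": 82},
    {"id": 2, "gender": 2, "score": None},
    {"id": 3, "gender": 7, "score": 91},   # 7 is not a defined code
    {"id": 4, "gender": 1, "score": 820},  # plausible key punch error
]

# Assumed codebook: value labels and the valid range for scores.
GENDER_LABELS = {1: "female", 2: "male"}
SCORE_RANGE = (0, 100)

def screen(rows):
    """Return (id, field, problem) tuples for missing or invalid values."""
    problems = []
    for row in rows:
        # Missing values
        for field in ("gender", "score"):
            if row[field] is None:
                problems.append((row["id"], field, "missing"))
        # Valid values: codes must have a defined label
        if row["gender"] is not None and row["gender"] not in GENDER_LABELS:
            problems.append((row["id"], "gender", "invalid code"))
        # Valid values: scores must fall inside the allowed range
        score = row["score"]
        if score is not None and not (SCORE_RANGE[0] <= score <= SCORE_RANGE[1]):
            problems.append((row["id"], "score", "out of range"))
    return problems

issues = screen(records)
for issue in issues:
    print(issue)
```

Each flagged record still needs a human decision (correct it, recode it as missing, or drop it); the code only finds candidates.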
Levels of Measurement • Nominal • Ordinal • Interval • Ratio • Determining what statistics are appropriate
Nominal • Naming things. • Creating groups that are qualitatively different or unique… • But not necessarily quantitatively different.
Nominal • Placing individuals or objects into categories. • Making mutually exclusive categories. • Numbers assigned to categories are arbitrary.
Nominal • Sample variables: • Gender • Race • Ethnicity • Geographic location • Hair or eye color
Ordinal • Rank ordering things. • Creating groups or categories when only rank order is known. • Numbers imply order but not exact quantity of anything.
Ordinal • The difference between individuals with adjacent ranks, on relevant quantitative variables, is not necessarily the same across the distribution.
Ordinal • Sample variables: • Class rank • Place of finish in a race (1st, 2nd, etc.) • Judges' ratings • Responses to Likert scale items (for example: SD, D, N, A, SA)
Interval • Orders observations according to the quantity of some attribute. • Arbitrary origin. • Equal intervals. • Equal differences expressed as equal distances.
Interval • Sample variables: • Test Scores • SAT • GRE • IQ tests • Temperature • Celsius • Fahrenheit
Ratio • Quantitative measurement. • Equal intervals. • True zero point. • Ratios between values are useful.
Ratio • Sample variables: • Financial variables • Finish times in a race • Number of units sold • Test scores scaled as percent correct or number correct
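A small sketch of why ratios are only meaningful with a true zero point. The temperatures and finish times are made-up values chosen for easy arithmetic.

```python
# Interval example: the same two temperatures in Celsius and Fahrenheit.
def c_to_f(celsius):
    return celsius * 9 / 5 + 32

cold_c, warm_c = 10.0, 20.0
ratio_celsius = warm_c / cold_c                      # 2.0
ratio_fahrenheit = c_to_f(warm_c) / c_to_f(cold_c)   # 68 / 50 = 1.36

# Ratio example: the same two finish times in seconds and minutes.
fast_s, slow_s = 60.0, 120.0
ratio_seconds = slow_s / fast_s                      # 2.0
ratio_minutes = (slow_s / 60) / (fast_s / 60)        # 2.0

print(ratio_celsius, ratio_fahrenheit)  # ratio changes with the scale
print(ratio_seconds, ratio_minutes)     # ratio survives a change of units
```

Because the zero point of Celsius and Fahrenheit is arbitrary, "twice as hot" depends on the scale used. A finish time of zero really means no elapsed time, so "twice as long" holds in any unit.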
Levels of Measurement Review • What level of measurement? • Today is a fall day. • Today is the third hottest day of the month. • The high today was 70° Fahrenheit. • The high today was 20° Celsius. • The high today was 294 Kelvin.
Levels of Measurement Review • What level of measurement? • Student #1256 is: • a male • from Lawrenceville, GA. • He came in third place in the race today. • He scored 550 on the SAT verbal section. • He has turned in 8 out of the 10 homework assignments.
Levels of Measurement Review • What level of measurement? • Student #3654 is: • in the third reading group. • Nominal? • Ordinal? • Interval? • Ratio?
Five Descriptive Questions • What is the middle of the set of scores? • How spread out are the scores? • Where do specific scores fall in the distribution of scores? • What is the shape of the distribution? • How do different variables relate to each other?
Five Descriptive Questions • Middle • Spread • Rank or Relative Position • Shape • Correlation • Resources: Descriptive Statistics Answer Sheet; Descriptive Questions in Excel, SPSS, and TI-83
Middle • Mean • Median • Mode
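As a quick illustration, the three measures of the middle can be computed with Python's standard `statistics` module. The scores are made up.

```python
import statistics

scores = [70, 85, 85, 90, 100]  # made-up scores

mean_score = statistics.mean(scores)      # sum of scores / number of scores
median_score = statistics.median(scores)  # middle value of the sorted scores
mode_score = statistics.mode(scores)      # most frequent value

print(mean_score, median_score, mode_score)
```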
Spread • Standard Deviation • Variance • Range • IQR
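A minimal sketch of the four spread measures, again with made-up scores. Two assumptions are worth flagging: the sample (n - 1) formulas are used for variance and standard deviation, and quartiles use the "inclusive" method of `statistics.quantiles`. Excel, SPSS, and the TI-83 may compute quartiles slightly differently.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up scores, mean = 5

# Sample variance and standard deviation (divide by n - 1).
sample_variance = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

# Range: largest score minus smallest score.
score_range = max(scores) - min(scores)

# IQR: distance between the 75th and 25th percentiles.
q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

print(sample_variance, sample_sd, score_range, iqr)
```

Use `statistics.pvariance` and `statistics.pstdev` instead if the scores are the entire population rather than a sample.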
Rank or Relative Position • Five number summary • Min, 25th, 50th, 75th, Max • Identifying specific values that have interpretive meaning • Identifying where they fall in the set of scores • Box plots • Outliers
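The five number summary and the position of a specific score can be sketched as follows. The data are made up, the quartile method ("inclusive") is an assumption, and `percent_at_or_below` is a hypothetical helper written for this example.

```python
import statistics

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up scores

# Five number summary: Min, 25th, 50th, 75th, Max.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
five_number = (min(scores), q1, q2, q3, max(scores))

def percent_at_or_below(value, data):
    """Percent of scores less than or equal to the given value."""
    return 100 * sum(1 for x in data if x <= value) / len(data)

rank_of_7 = percent_at_or_below(7, scores)

print(five_number)
print(round(rank_of_7, 1))
```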
Shape • Positive Skewness • Negative Skewness • Normality • Histograms
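Skewness can also be checked numerically. The sketch below uses one common moment-based coefficient (third central moment divided by the 1.5 power of the second); statistical packages often apply a sample-size adjustment, so their values may differ slightly. The data sets are made up.

```python
def skewness(data):
    """Moment-based skewness: m3 / m2**1.5 (no sample-size adjustment)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

right_tail = [1, 2, 2, 3, 3, 3, 4, 10]   # long tail toward high scores
left_tail = [-x for x in right_tail]      # mirror image of the above
symmetric = [1, 2, 3, 4, 5]

skew_right = skewness(right_tail)  # positive
skew_left = skewness(left_tail)    # negative
skew_sym = skewness(symmetric)     # zero

print(skew_right, skew_left, skew_sym)
```

A histogram remains the quickest check; the coefficient only summarizes the direction and rough size of the asymmetry.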
Correlation • Direction of Relationships • Positive or Negative • Magnitude of Relationships • Weak, Moderate, Strong • Scatterplots • Outliers
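Direction and magnitude are usually summarized with a correlation coefficient. The sketch below assumes Pearson's r, which the slides do not name explicitly, and uses made-up paired scores. The sign gives the direction and the absolute value gives the magnitude; cutoffs for weak, moderate, and strong vary by field, so none are hard-coded here.

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / (sum_xx * sum_yy) ** 0.5

x = [1, 2, 3, 4, 5]
y_up = [1, 3, 2, 5, 4]      # tends to rise with x
y_down = [10, 8, 6, 4, 2]   # falls exactly in step with x

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)

print(r_up, r_down)
```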
Outliers • The box in a boxplot shows the middle 50% of scores. • IQR = Q3 (75th percentile) - Q1 (25th percentile) • 1.5 IQR rule: values beyond either fence are outliers • Lower fence: Q1 - (1.5*IQR) • Upper fence: Q3 + (1.5*IQR)
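The 1.5 IQR rule can be sketched as follows. The data are made up and the quartile method ("inclusive") is an assumption; other software may place Q1 and Q3 slightly differently, which shifts the fences.

```python
import statistics

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up scores, 30 looks suspicious

q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any score beyond either fence is flagged as an outlier.
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(lower_fence, upper_fence, outliers)
```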
Outliers • If normality of the population can be assumed, other rules can be used. • Mean +/- 2 SDs or Mean +/- 3 SDs • Empirical Rule • Approximately 68% within +/- 1 SD • Approximately 95% within +/- 2 SD • Approximately 99.7% within +/- 3 SD
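A short sketch of both ideas: checking the empirical rule percentages against a standard normal distribution, and flagging scores outside mean +/- k SDs. The scores are made up, the sample standard deviation is an assumption, and `outside_k_sd` is a hypothetical helper written for this example.

```python
import statistics

# Empirical rule: share of a normal population within k SDs of the mean.
standard_normal = statistics.NormalDist(mu=0, sigma=1)

def share_within(k):
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1 = share_within(1)
within_2 = share_within(2)
within_3 = share_within(3)
print(round(within_1, 3), round(within_2, 3), round(within_3, 3))

# SD rule: flag scores farther than k sample SDs from the mean.
def outside_k_sd(data, k):
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) > k * sd]

scores = [10, 12, 11, 13, 12, 11, 12, 50]  # made-up scores
flagged_2sd = outside_k_sd(scores, 2)
flagged_3sd = outside_k_sd(scores, 3)
print(flagged_2sd, flagged_3sd)
```

In this small sample the extreme score is flagged at 2 SDs but not at 3 SDs, because the outlier itself inflates the mean and the SD. That is one reason the IQR-based rule is often preferred for small data sets.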
Outliers • You can also look at outliers in the bivariate case. • Examine the scatterplots for values out of the pattern.