Descriptive Data Analysis Hey, no-one said I’d have to do math in here!!
Sayings about statistics • Figures lie and liars figure. • There are lies, damned lies, and statistics. • Figures don’t lie, statisticians do. • Statistics always remind me of the fellow who drowned in a river whose average depth was only 3 feet. - Woody Hayes • Say you were standing with one foot in the oven and one foot in an ice bucket. According to the statisticians, you should be about perfectly comfortable. - Bobby Bragan, Milwaukee Braves
Population and Sample • Population: The entire group of individuals or objects that you wish to study • Sample: A subset or portion of the population that is taken to be representative of the population • Representative: To support valid conclusions about the population, the sample must be representative of (similar to) the population
Parameter and Statistic • Parameter: A value that is computed based on all the elements in the population (usually unknown) • Statistic: A value that is computed based upon the sample observations • We use statistics to estimate unknown population parameters
Descriptive vs. Inferential Statistics • Descriptive statistics: Used to describe the characteristics of a group. • Inferential statistics: Used to estimate something about a population based upon a sample drawn from that population.
Types of Scores • Continuous Scores: Have a potentially infinite number of values, so a variable can be measured to varying degrees of precision. • Discrete Scores: Limited to a specific number of values and not usually expressed as decimals or fractions.
Types of Scores • Nominal Scores: A set of mutually exclusive categories. No order to the categories. • Ordinal Scores: Ranking of a set of objects with regard to some characteristic. No meaningful distance between the ranks. • Interval Scores: The scores have a meaningful order and the units of measurement are equal distance apart on the scale. No true zero (total absence of the characteristic).
Types of Scores (cont’d) • Ratio Scores: Have all the properties of interval measurement but also have a true zero. Zero scores represent the total absence of the characteristic being measured.
Frequency Distribution • A simple way of making sense of a set of scores [Figure: frequency (vertical axis) plotted against scores (horizontal axis)]
Frequency Distribution Terms • Tally: Recording a score in the proper interval • Frequency: The number of times a particular score appears • Cumulative Frequency: The number of observations at or below a given score. • Percent: Frequency expressed as a percent of the total. • Cumulative Percent: The percentage of observations occurring at or below a given score.
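The terms above can be sketched in a few lines of Python. This is only an illustration: the scores are made up, and the name `frequency_table` is a hypothetical helper, not something from the slides.

```python
from collections import Counter

# Hypothetical scores, invented for illustration only.
scores = [12, 15, 12, 18, 15, 15, 20, 12, 18, 15]

def frequency_table(values):
    """Return rows of (score, frequency, cumulative frequency,
    percent, cumulative percent), lowest score first."""
    counts = Counter(values)            # tally: how often each score appears
    total = len(values)
    rows = []
    cumulative = 0
    for score in sorted(counts):
        freq = counts[score]
        cumulative += freq              # observations at or below this score
        rows.append((score, freq, cumulative,
                     100 * freq / total, 100 * cumulative / total))
    return rows

table = frequency_table(scores)
print("score  freq  cum.freq  percent  cum.percent")
for row in table:
    print("%5d %5d %9d %8.1f %12.1f" % row)
```

Each row lines up with one term on the slide: the tally gives the frequency, the running total gives the cumulative frequency, and dividing by the number of observations gives the two percent columns.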
Measures of Central Tendency • Mean: The arithmetic average of a set of scores • Median (50th percentile): Divides the distribution in half • Mode: The most frequently occurring observation
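As a quick sketch, the three measures can be computed with Python's standard `statistics` module. The scores below are hypothetical and chosen only so the results are easy to check by hand.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [12, 15, 12, 18, 15, 15, 20, 12, 18, 15]

mean_score = statistics.mean(scores)      # arithmetic average: sum / count
median_score = statistics.median(scores)  # middle of the sorted scores
mode_score = statistics.mode(scores)      # most frequently occurring score

print("mean:", mean_score)
print("median:", median_score)
print("mode:", mode_score)
```

With an even number of scores, as here, `statistics.median` averages the two middle values.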
Measures of Variability • Range: The difference between the largest and the smallest score • Sum of Squares: The sum of the squared deviations from the mean • Variance: The average squared deviation from the mean • Standard Deviation: The square root of the variance; roughly the typical distance of a score from the mean
Standard Scores • Permit comparison of scores obtained using different units of measurement (e.g. sit-ups vs. sit-and-reach scores) • z-scores: Scores expressed in standard deviation units (can have negative numbers and decimal values) • T-scores: Derived from z-scores. Positive, whole numbers (mean=50, sd=10)
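A minimal worked example of these four measures, using made-up scores. It follows the slide's definition of variance as the average squared deviation, which means dividing by the number of scores N; when the scores are a sample used to estimate a population value, the usual convention is to divide by N - 1 instead.

```python
import math

# Hypothetical scores, invented so the arithmetic comes out in whole numbers.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean = sum(scores) / n                               # 40 / 8 = 5.0

score_range = max(scores) - min(scores)              # largest minus smallest
sum_of_squares = sum((x - mean) ** 2 for x in scores)
variance = sum_of_squares / n                        # average squared deviation
std_dev = math.sqrt(variance)                        # back in the original units

# Sample version (divide by n - 1), shown for comparison only.
sample_variance = sum_of_squares / (n - 1)

print("range:", score_range)
print("sum of squares:", sum_of_squares)
print("variance:", variance)
print("standard deviation:", std_dev)
```

Taking the square root matters because the variance is in squared units, while the standard deviation is in the same units as the scores themselves.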
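The conversion can be sketched as follows, assuming the usual formulas z = (score - mean) / sd and T = 50 + 10z, with T rounded to a whole number. The group means and standard deviations below are hypothetical, as are the function names.

```python
def z_score(score, mean, sd):
    """Distance of a score from the mean, in standard deviation units."""
    return (score - mean) / sd

def t_score(z):
    """Rescale a z-score to mean 50 and sd 10, rounded to a whole number."""
    return round(50 + 10 * z)

# Hypothetical results for one person on two tests with different units.
situps_z = z_score(45, mean=40, sd=5)   # 45 sit-ups; group mean 40, sd 5
reach_z = z_score(30, mean=33, sd=6)    # 30 cm reach; group mean 33, sd 6

print("sit-ups:       z =", situps_z, " T =", t_score(situps_z))
print("sit-and-reach: z =", reach_z, " T =", t_score(reach_z))
```

On the raw scores, 45 repetitions and 30 cm cannot be compared. On the standard scale, this person is one standard deviation above the group on sit-ups and half a standard deviation below it on sit-and-reach.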
Correlation and Regression Are we having fun yet?
Correlation • Correlation is a statistical procedure used to determine the strength and nature of the linear relationship between 2 sets of scores obtained on the same group of individuals
Correlation Coefficient (r) • The coefficient of correlation can range from -1.00 to +1.00. Its size (0 to 1.00) indicates the strength of the relationship, and its sign indicates the direction.
Positive or Direct Relationship • When high scores on one variable are associated with high scores on another variable. The coefficient of correlation will be positive.
Negative, Inverse or Indirect Relationship • When high scores on one variable are associated with low scores on another variable. The coefficient of correlation will be negative.
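To see the sign of the coefficient change with the direction of the relationship, here is a small sketch that computes r with the standard Pearson product-moment formula (the slides do not spell the formula out). The data and the function name `pearson_r` are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: the sum of cross-products of deviations divided
    by the square root of the product of the two sums of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical paired scores for five individuals.
x = [1, 2, 3, 4, 5]
y_direct = [2, 4, 5, 4, 5]     # high x tends to go with high y
y_inverse = [5, 4, 5, 4, 2]    # high x tends to go with low y

r_pos = pearson_r(x, y_direct)
r_neg = pearson_r(x, y_inverse)
print("direct relationship:  r = %.2f" % r_pos)
print("inverse relationship: r = %.2f" % r_neg)
```

The two coefficients have the same size and opposite signs because the second set of y scores is the first set in reverse order.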
Interpreting Correlation Coefficients • ±.80 – 1.00: High • ±.60 – .79: Moderately High • ±.40 – .59: Moderate • ±.20 – .39: Low • ±.00 – .19: No linear relationship
Correlation Facts • The closer the coefficient of correlation is to ±1.00, the stronger the relationship between the two variables. • The coefficient of determination (r²) tells you how much common variation there is between the two variables. • A correlation coefficient of .80 between years of education and healthcare expenditures translates to an r² of .64. • This means that 64% of the variability in healthcare expenditures can be accounted for by years of education
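The bands above can be written as a small lookup. This sketch assumes the bands apply to the size of the coefficient regardless of sign, and that coefficients are rounded to two decimals before being classified; `describe_correlation` is a hypothetical name.

```python
def describe_correlation(r):
    """Return the verbal label for a correlation coefficient,
    using the bands from the slide applied to the absolute value."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    size = round(abs(r), 2)    # the sign gives direction, not strength
    if size >= 0.80:
        return "High"
    if size >= 0.60:
        return "Moderately High"
    if size >= 0.40:
        return "Moderate"
    if size >= 0.20:
        return "Low"
    return "No linear relationship"

for r in (0.85, -0.65, 0.45, -0.25, 0.10):
    print("%+.2f -> %s" % (r, describe_correlation(r)))
```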
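The arithmetic behind the coefficient of determination is just squaring r. A short sketch, with the .80 taken from the slide and the other coefficients added only for comparison:

```python
def percent_common_variation(r):
    """Coefficient of determination (r squared) as a whole-number percent."""
    return round(100 * r ** 2)

percent_shared = percent_common_variation(0.80)
print("r = .80 -> r squared = %.2f -> %d%%" % (0.80 ** 2, percent_shared))

# Common variation falls off quickly as r shrinks.
for r in (0.20, 0.40, 0.60, 0.80, 1.00):
    print("r = %.2f -> %3d%% common variation" % (r, percent_common_variation(r)))
```

Because r is squared, a correlation of .40 corresponds to only 16% common variation, a quarter of the 64% for .80 rather than half.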