Univariate Descriptive Statistics And Basics of Normal Distributions
Probability and Statistics
• Statistics deals with what we observe and how it compares to what might be expected by chance.
• For now, we especially care about the normal (Gaussian) distribution.
Describing Simple Distributions of Data
• Central Tendency
  • Some way of “typifying” a distribution of values, scores, etc.
  • Mean: sum of scores divided by number of scores
  • Median: middle score, as found by rank
  • Mode: most common value from a set of values
• In a normal distribution, all 3 measures are equal.
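
The three measures of central tendency can be sketched in a few lines of Python. This is only an illustration: the scores are made up, and the standard-library `statistics` module is used for the median and mode. Because the made-up scores are skewed to the right, the three measures come out different, unlike in a normal distribution.

```python
import statistics

# Hypothetical scores, invented for illustration (not from the course data).
scores = [2, 3, 3, 4, 5, 7, 11]

# Mean: sum of scores divided by number of scores.
mean = sum(scores) / len(scores)

# Median: middle score once the scores are ranked.
median = statistics.median(scores)

# Mode: most common value.
mode = statistics.mode(scores)

print("mean:", mean, "median:", median, "mode:", mode)
```
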
Special Features of the Mean
• Sum of the deviations from the mean of all scores = zero.
• It is the point in a distribution where the two halves are balanced.
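
A quick numerical check of both features, using made-up scores (an assumption for illustration): the deviations from the mean sum to zero, and the deviations below the mean exactly offset those above it.

```python
# Hypothetical scores, invented for illustration.
scores = [2, 3, 3, 4, 5, 7, 11]
mean = sum(scores) / len(scores)

# Deviation of each score from the mean.
deviations = [x - mean for x in scores]

# The deviations cancel out: their sum is zero (up to rounding error).
total = sum(deviations)

# "Balance point": total distance below the mean equals total distance above.
below = sum(-d for d in deviations if d < 0)
above = sum(d for d in deviations if d > 0)

print("sum of deviations:", total)
print("below:", below, "above:", above)
```
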
Using Central Tendencies in Recoding
• “Splitting” metrics into binary variables
  • High/Low (mean or median)
  • Most common, least common (mode)
• “Collapsing” variables (fewer categories from more)
  • Groups of scores in different ranges above and below the mean (e.g., age in years recoded as teenager, young adult, adult, elder adult, etc.).
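
Both kinds of recoding can be sketched as follows. The ages, the function name `age_group`, and the cut points (20, 30, 65) are assumptions chosen for illustration; the slides do not specify where the age groups begin or end.

```python
import statistics

# Hypothetical ages in years, invented for illustration.
ages = [15, 19, 23, 31, 42, 58, 67, 74]

# "Splitting": recode a metric into a binary High/Low variable at the median.
median_age = statistics.median(ages)
high_low = ["High" if age > median_age else "Low" for age in ages]

# "Collapsing": recode many values into a few ordered groups.
# The cut points below are assumptions, not taken from the slides.
def age_group(age):
    if age < 20:
        return "teenager"
    if age < 30:
        return "young adult"
    if age < 65:
        return "adult"
    return "elder adult"

groups = [age_group(age) for age in ages]

for age, split, group in zip(ages, high_low, groups):
    print(age, split, group)
```
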
Dispersion
• Range
  • Overall measure of the distance spanned by the values of a variable.
  • Difference between the highest value and the lowest value.
• Standard Deviation
  • A measure of spread
  • A statistic that describes how tightly the values are clustered around the mean.
• Variance
  • A measure of spread
  • Computed as the average squared deviation of each value from the mean
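
A minimal sketch of the three dispersion measures on made-up scores. It follows the "average squared deviation" wording literally and divides by n (the population form); many packages instead divide by n - 1 for sample data, so the sample versions from the `statistics` module are shown alongside for comparison. Which denominator the lecture intends is not stated in the text, so treat the choice here as an assumption.

```python
import statistics

# Hypothetical scores, invented for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
mean = sum(scores) / n

# Range: highest value minus lowest value.
value_range = max(scores) - min(scores)

# Variance as the average squared deviation from the mean (divides by n).
variance = sum((x - mean) ** 2 for x in scores) / n

# Standard deviation: square root of the variance.
sd = variance ** 0.5

# Sample versions (divide by n - 1), for comparison.
sample_variance = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

print("range:", value_range)
print("variance (n):", variance, "S.D. (n):", sd)
print("variance (n - 1):", sample_variance, "S.D. (n - 1):", sample_sd)
```
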
Properties of Standard Deviation (S.D.)
• If a constant is added to all scores, the S.D. does not change.
• If all scores are multiplied by a constant, the dispersion changes: the S.D. is multiplied by the absolute value of the constant, and the variance by its square.
• Remember, variance is just the square of the S.D. (or, the S.D. is the square root of the variance).
• Notation for the S.D. formula: S = standard deviation; X = individual score; M = mean of all scores; n = sample size (number of scores)
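
These properties are easy to verify numerically. The sketch below uses made-up scores and arbitrary constants (10 and 3, both assumptions), with the population functions `pstdev` and `pvariance` from the standard library; the same relationships hold for the sample versions.

```python
import statistics

# Hypothetical scores, invented for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

sd = statistics.pstdev(scores)
variance = statistics.pvariance(scores)

# Adding a constant shifts every score but leaves the spread unchanged.
shifted = [x + 10 for x in scores]
sd_shifted = statistics.pstdev(shifted)

# Multiplying by a constant stretches the spread:
# the S.D. is multiplied by 3 and the variance by 3 squared.
scaled = [x * 3 for x in scores]
sd_scaled = statistics.pstdev(scaled)
variance_scaled = statistics.pvariance(scaled)

print("original S.D.:", sd, "variance:", variance)
print("after adding 10, S.D.:", sd_shifted)
print("after multiplying by 3, S.D.:", sd_scaled, "variance:", variance_scaled)
```
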
Common Data Representations
• Histograms (hist command in Stata)
  • Simple graphs of the density or frequency
  • With density, each bar's area is the share of observations in that bin, and the total area = 100%
• Box Plots (graph box command in Stata)
  • Yet another way of displaying dispersion.
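
To see why the total area of a density histogram is 100%, the bar heights can be computed by hand. This is a plain-Python sketch rather than Stata output: the values, the bin width of 2, and the bin edges are all assumptions. It also computes the five numbers a box plot is usually drawn from (minimum, quartiles, maximum); quartile conventions differ between packages, so the exact quartile values may not match other software.

```python
import statistics

# Hypothetical values, invented for illustration.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 8]

# Assumed bins of width 2: [0, 2), [2, 4), [4, 6), [6, 8), [8, 10].
bin_width = 2
bin_start = 0
n_bins = 5

# Frequency histogram: count the values falling in each bin.
counts = [0] * n_bins
for v in values:
    index = min(int((v - bin_start) // bin_width), n_bins - 1)
    counts[index] += 1

# Density histogram: bar height = count / (n * bin width),
# so each bar's area (height * width) is the share of values in that bin.
densities = [c / (len(values) * bin_width) for c in counts]
total_area = sum(d * bin_width for d in densities)

# Five-number summary behind a box plot (requires Python 3.8+ for quantiles).
q1, q2, q3 = statistics.quantiles(values, n=4)
five_numbers = (min(values), q1, q2, q3, max(values))

print("counts:", counts)
print("densities:", densities)
print("total area:", total_area)
print("five-number summary:", five_numbers)
```
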
Issues with Normal Distributions
• Skewness
• Kurtosis
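
As a rough illustration of how these two departures from normality can be quantified, the sketch below uses the simple moment-based definitions (third and fourth standardized moments). The function name `skewness_and_kurtosis` and the data are hypothetical, and statistical packages may apply small-sample corrections that give slightly different numbers. Under this definition a normal distribution has skewness 0 and kurtosis 3.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness and kurtosis (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments: average of the deviations raised to powers 2, 3, 4.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness, kurtosis

# Hypothetical scores with a long right tail (positive skew).
skewed_scores = [2, 3, 3, 4, 5, 7, 11]
# Hypothetical symmetric scores (zero skew).
symmetric_scores = [1, 2, 3, 4, 5]

skew_right, kurt_right = skewness_and_kurtosis(skewed_scores)
skew_sym, kurt_sym = skewness_and_kurtosis(symmetric_scores)

print("right-skewed data:", skew_right, kurt_right)
print("symmetric data:", skew_sym, kurt_sym)
```
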
In-Class Examples in Stata
• Using GSS93_data.dta from the Resources page on the course website
• Also, look at the notes on the syllabus for today’s lecture.