Learn about measures of central tendency, variation, and creating charts in sociology research using STATA. Understand how to interpret data distributions and calculate statistical indicators.
Sociology 601 (Martin) Lecture for week 2: September 9 - 11 • Chapter 3.1: • Making Charts • Chapter 3.2 – 3.5 (if time permits) • Measures of central tendency • Measures of variation • Walk-through of the STATA graphical user interface.
Definitions for charts • frequency distribution: a graph listing intervals of possible values for a variable (on the x-axis), and the number of observations in each interval (on the y-axis). • relative frequency distribution: as above, but the y-axis shows the percent or proportion of observations in each interval. • bar graph: used when the variable is measured on a nominal or ordinal scale. • The bars should not touch. • histogram: used when the variable is measured on an interval scale. • The bars should touch.
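As a rough illustration of the two chart types, the sketch below draws a histogram and a bar graph in Stata. The data, the variable names (id, region, rate), and the graph names are all made up for this example and are not the textbook's values.

```stata
* Hypothetical data for illustration only (not the textbook's murder rates).
clear
input id str9 region rate
1 "South"    11.3
2 "South"     9.8
3 "West"      6.6
4 "West"      3.9
5 "Midwest"   3.4
6 "Midwest"   5.0
7 "Northeast" 2.9
8 "Northeast" 1.6
end

* Interval-scale variable: histogram (bars touch).
* frequency puts counts on the y-axis; percent gives a relative frequency distribution.
histogram rate, width(2) start(0) frequency name(hist_freq, replace)
histogram rate, width(2) start(0) percent   name(hist_pct, replace)

* Nominal variable: bar graph (bars are separated by gaps).
* (count) id counts the nonmissing observations within each region.
graph bar (count) id, over(region) name(bar_region, replace)
```

The width() and start() options set the interval width and the lower edge of the first interval, so every observation falls in exactly one interval.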
General Rules for Relative Frequency Distributions • Whether you are making a bar graph or histogram: • Make sure each observation is in one and only one category. • Use categories of equal width. • Choose an appealing number of categories. • Decide whether to provide labels. • Double-check your graph. • If you use fewer bars to describe the distribution of a variable, you lose information but gain clarity.
Example from Text, p. 36 • Murders per 100,000 population, by State for 1993
Frequency Distribution • Murders per 100,000 population for 1993, by State • What have we lost? What have we gained?
Relative Frequency Distribution • Murders per 100,000 population, by State
Collapsed Relative Frequency Distribution • Murders per 100,000 population, by State • What have we lost? What have we gained?
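The trade-off between detail and clarity can be tried out with a small sketch like the one below. It groups the same values into narrow and then wide intervals and tabulates each grouping. The rates are invented for illustration, and the variable names (rate, narrow, wide) are assumptions, not part of the course data.

```stata
* Hypothetical rates for illustration only.
clear
input rate
1.6
2.9
3.4
3.9
5.0
6.6
7.8
9.8
11.3
13.0
end

* Each new variable holds the lower bound of the interval containing rate.
generate narrow = 2 * floor(rate / 2)   // intervals of width 2
generate wide   = 6 * floor(rate / 6)   // intervals of width 6 (collapsed)

* tabulate reports both the frequency and the percent in each interval.
tabulate narrow
tabulate wide
```

The wide grouping is easier to read but no longer shows where observations sit within each interval.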
3.2: Measuring central tendency - mean • Mean: sum of measurements divided by number of measurements. • Equation for the mean of a sample: $\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$ • or, if you don't have an equation editor, Ybar = SUM(Yi) / n, where: • Ybar is the sample mean • Yi is a measurement of Y for case i • n is the number of cases in the sample
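A minimal sketch of the formula Ybar = SUM(Yi) / n in Stata, computed by hand and then checked against summarize. The four values are made up for the example.

```stata
* Hypothetical sample: 2, 4, 6, 8.
clear
input y
2
4
6
8
end

* By hand: SUM(Yi) / n. _N is the number of observations in memory.
egen sum_y = total(y)
display "mean by hand = " sum_y[1] / _N     // 20 / 4 = 5

* The same quantity from summarize, which stores it in r(mean).
summarize y
display "mean from summarize = " r(mean)    // 5
```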
Weighted means • Weighted sample mean: the sum of each measurement multiplied by its weight (the number of cases that measurement represents), divided by the sum of the weights. • Example: we could weight the state murder rates by the number of persons in each state in 1993 to get the mean murder rate for persons in the US. • If n = 2 the equation for the weighted mean is $\bar{Y}_w = \frac{w_1 Y_1 + w_2 Y_2}{w_1 + w_2}$, where w1 and w2 are the weights for cases 1 and 2.
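The n = 2 case can be checked with a small sketch. The two "states", their rates, and their populations are invented, and the use of aweight is an assumption made for this illustration.

```stata
* Hypothetical data: two states, a rate for each, and a population weight.
clear
input str1 state rate pop
"A"  2 1
"B" 10 3
end

* Unweighted mean of the two rates: (2 + 10) / 2 = 6.
summarize rate

* Weighted mean: (1*2 + 3*10) / (1 + 3) = 8.
summarize rate [aweight=pop]

* The same weighted mean computed by hand.
generate wy = rate * pop
egen sum_wy = total(wy)
egen sum_w  = total(pop)
display "weighted mean = " sum_wy[1] / sum_w[1]   // 32 / 4 = 8
```

State B counts three times as much as state A, so the weighted mean is pulled toward B's rate.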
3.3 Other measures of central tendency • Median: the measurement that falls in the middle of an ordered sample • the median is the value of the 50th percentile • Percentile: the pth percentile is the number such that p% of scores fall below it and (100-p)% of scores fall above it • Mode: the value that occurs most frequently
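One way to get these measures in Stata is sketched below. The five values are made up, and the variable name mode_y is an assumption for this example.

```stata
* Hypothetical sample with a clear middle value and a single mode.
clear
input y
1
2
2
3
7
end

* summarize with the detail option reports percentiles; the median is r(p50).
summarize y, detail
display "median = " r(p50)        // middle of 1 2 2 3 7 is 2

* centile reports any requested percentiles.
centile y, centile(25 50 75)

* egen's mode() function stores the most frequent value in every row.
egen mode_y = mode(y)
display "mode = " mode_y[1]       // 2 occurs twice

* A frequency table also shows the mode directly.
tabulate y
```

Here the mean is 3 while the median and mode are both 2, because the single large value pulls the mean upward.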
3.4: Measures of variation • range: the difference between the largest and smallest observations • interquartile range: the difference between the 75th and 25th percentiles • deviation: for any observation, the difference between that observation and the sample mean: Di = Yi - Ybar (One averaged measure of variation for a sample is the mean of the absolute values of all the deviations in the sample.)
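The sketch below computes the range, the interquartile range, the deviations, and the mean absolute deviation for a made-up sample. Variable and scalar names (dev, absdev, ybar) are assumptions for this example.

```stata
* Hypothetical sample: 2, 4, 6, 8.
clear
input y
2
4
6
8
end

* summarize, detail stores the min, max, and quartiles in r().
quietly summarize y, detail
scalar ybar = r(mean)
display "range = " r(max) - r(min)     // 8 - 2 = 6
display "IQR   = " r(p75) - r(p25)

* Deviations: Di = Yi - Ybar.
generate dev = y - scalar(ybar)
generate absdev = abs(dev)
list y dev absdev

* Mean absolute deviation: (3 + 1 + 1 + 3) / 4 = 2.
summarize absdev
```

The deviations themselves always sum to zero, which is why the absolute values (or squares) are averaged instead.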
Variance and standard deviation: the most common measures of variation • variance: the mean of the squared deviations for a sample, labeled s². • standard deviation: the square root of the variance, or the root mean squared deviation, labeled s.
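A sketch of the variance and standard deviation for a made-up sample follows. One caveat: Stata's summarize divides the sum of squared deviations by n - 1 rather than n, so its results are slightly larger than a literal mean of squared deviations. The by-hand steps below use n - 1 to match.

```stata
* Hypothetical sample: 2, 4, 6, 8 (mean 5).
clear
input y
2
4
6
8
end

* Built-in results: r(Var) and r(sd) use the n - 1 divisor.
quietly summarize y
scalar ybar = r(mean)
display "variance = " r(Var)      // 20 / 3, about 6.67
display "s        = " r(sd)       // sqrt(20 / 3), about 2.58

* By hand: squared deviations 9, 1, 1, 9 sum to 20.
generate sqdev = (y - scalar(ybar))^2
egen ss = total(sqdev)
display "variance by hand = " ss[1] / (_N - 1)
display "s by hand        = " sqrt(ss[1] / (_N - 1))
```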
Practice: Calculate the mean, variance, and standard deviation.
Interpreting the standard deviation. • s is (formally) the root mean squared deviation. • s is one version of the typical distance of an observation from the sample mean. • Because s is based on squared deviations, it is strongly affected by extreme scores. • Is this a desirable property? • Compare these samples: (-3,-3,+3,+3) vs (-2,-2,-2,+6) • Generally, for a continuous quantitative variable Y with a roughly bell-shaped distribution, about 68% of scores fall between Ybar - s and Ybar + s.
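The comparison of the two samples above can be worked through in Stata. This sketch enters them as variables a and b (names chosen for this example) and reports the mean absolute deviation next to s for each.

```stata
* The two samples from the slide, entered side by side.
clear
input a b
-3 -2
-3 -2
 3 -2
 3  6
end

foreach v of varlist a b {
    * Mean and standard deviation of the sample.
    quietly summarize `v'
    scalar ybar = r(mean)
    scalar sd_v = r(sd)

    * Mean of the absolute deviations from the sample mean.
    generate absdev_`v' = abs(`v' - scalar(ybar))
    quietly summarize absdev_`v'

    display "`v': mean absolute deviation = " r(mean) ", s = " scalar(sd_v)
}
* a: mean absolute deviation = 3, s is about 3.46
* b: mean absolute deviation = 3, s = 4
```

Both samples have a mean of 0 and the same mean absolute deviation, but s is larger for the second sample because the single extreme score (+6) is squared.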
Interpreting sample statistics. • Recall that… • A statistic is a single number calculated from a sample. • A parameter is a single number that summarizes some quality of a variable in a population. • For means: • The population mean is μ (mu). • The sample mean Ybar is an estimator of μ. • For standard deviations: • The population standard deviation is σ (sigma). • The sample standard deviation s is an estimator of σ.
The STATA windows environment - icons • Open (use) • Save • Print Results • Begin Log • Start viewer • Bring results window to front • Bring graph window to front • Do-file editor • Data editor • Data browser • Clear • Break
The .do file: interface of choice for social research • Icons within the .do file: • New • Open • Save • Print • Find • Cut • Copy • Paste • Undo • Do current file • Run current file
Sample commands in a .do file
    use "I:\601Fall08\socy601data.dta", clear
    summarize AGE
    summarize AGE [weight=ADULTS]
    tabulate AGE
    tabulate AGE [weight=ADULTS]
    clear
How to create a log file • One approach is to use the log icon to start and stop a log. • Another approach is to type the log-starting command into a .do file:
    log using I:\601Fall08\week01hmwk.txt, replace
    *. . . (your work here) . . .
    log close
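Putting the pieces together, a complete .do file might look like the sketch below. It is only an illustration: the log file name week02_example.log is hypothetical, and it uses the auto dataset that ships with Stata rather than the course data.

```stata
* Close any log left open by an earlier run; capture ignores the error if none is open.
capture log close

* Start a plain-text log in the current working directory (hypothetical file name).
log using "week02_example.log", text replace

* Load an example dataset shipped with Stata.
sysuse auto, clear

* Unweighted and weighted summaries, and a frequency table.
summarize mpg
summarize mpg [aweight=weight]
tabulate foreign

* Stop logging.
log close
```

Starting with capture log close lets the file be rerun without stopping on an "already open" log error.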