1. Homework #2 2. Inferential Statistics 3. Review for Exam.
HOMEWORK #2: Part A • Sanitation Eng.: Z = .53; area = .2019 + .5000 = .7019 • F.C.: Z = .67; area = .2486 + .5000 = .7486 • 5 GPAs: which are in the top 10%? • GPAs of 3.0 and 3.20 are not: • Z = (3.0-2.78)/.33 = .67 • Area beyond = .2514 (25.14%) • Z = (3.20-2.78)/.33 = 1.27 • Corresponds to .8980 (.3980 + .5000) • Area beyond = .1020 (10.2%) • By contrast, for 3.21: • Z = (3.21-2.78)/.33 = 1.30 • Corresponds to .9032 (.4032 + .5000) • Area beyond = .0968 (9.68%), so 3.21 is in the top 10%
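The GPA check above can be reproduced with a short script. This is a minimal sketch, not the course's method: it assumes Python's standard-library `NormalDist` as a stand-in for the printed Z table, and it rounds Z to two decimals first so the areas match table lookups. The names `z_score` and `area_beyond` are made up for illustration.

```python
from statistics import NormalDist

MEAN_GPA = 2.78  # mean used in the homework solution
SD_GPA = 0.33    # standard deviation used in the homework solution


def z_score(x, mean=MEAN_GPA, sd=SD_GPA):
    """Z = (score - mean) / sd, rounded to 2 decimals like a Z table lookup."""
    return round((x - mean) / sd, 2)


def area_beyond(x):
    """Area under the standard normal curve above the score's Z value."""
    return round(1 - NormalDist().cdf(z_score(x)), 4)


for gpa in (3.0, 3.20, 3.21):
    beyond = area_beyond(gpa)
    # A score is in the top 10% only if less than 10% of the area lies beyond it.
    print(gpa, z_score(gpa), beyond, "top 10%" if beyond < 0.10 else "not top 10%")
```

Rounding Z before the lookup is a deliberate choice: without it, the areas differ slightly in the fourth decimal from the table-based answers.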
HOMEWORK #2: Part B • Question 1 • a. Mean=18.87; median=15; mode=4 • b. The mean is higher because the distribution is positively skewed (several large cities with high percentages) • c. When you remove NYC, the mean=16.43 & the median goes from 15 to 14.5. Removing NYC’s high value from the distribution reduces the skew. • The mean decreases more than the median because the value of the mean is influenced by outlying values; the median is not, since it only moves one case over.
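The point about outliers can be seen with a few made-up numbers. The values below are hypothetical and are NOT the homework's city data; they only illustrate that dropping one extreme case moves the mean much more than the median.

```python
from statistics import mean, median

# Hypothetical scores with one high outlier (not the homework data set).
scores = [4, 4, 10, 15, 20, 25, 60]
without_outlier = [s for s in scores if s != 60]

mean_drop = mean(scores) - mean(without_outlier)
median_drop = median(scores) - median(without_outlier)

print("mean:  ", round(mean(scores), 2), "->", round(mean(without_outlier), 2))
print("median:", median(scores), "->", median(without_outlier))
print("mean drops by", round(mean_drop, 2), "; median drops by", median_drop)
```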
HOMEWORK #2: Part B • Question 2 • For this problem, there are two measures of central tendency (indicating the “typical” score). • The mean per-student expenditure was almost $2,000 higher in 2003 ($9,009) than in 1993 ($7,050). • The median also increased, but not nearly as much (from $7,215 to $7,516). • The spread of the scores, as indicated by the standard deviation, was more than twice as large in 2003 ($1,960) as it was in 1993 ($804). • Shape • For 1993, the distribution of scores has a slight negative skew; this distribution is essentially normal (bell-shaped) as the mean ($7,050) and median ($7,215) are similar. By contrast, for 2003, the mean is much greater than the median; this distribution has a strong positive skew.
HOMEWORK #2: Part B • Q3 • a. 53.28% • Opposite sides of mean, add 2 areas together • b. 6.38% • Both scores on right side of mean, subtract areas • c. 10.56% • “Column C” area for Z=1.25 is .1056 • d. 69.15% • “Column B” area for Z= -0.5 is .1915 + .5000 (for other half of normal curve) • e. 99.38% • Z=2.5; Column B (for area between 2.5 & 0) = .4938 + .5000 (for other half of normal curve) • f. 6.68% • Z = -1.5; Column C for area beyond -1.5 =.0668
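The “Column B” and “Column C” lookups used in parts c through f can be sketched in code. This assumes the usual Z table layout (Column B = area between the mean and Z, Column C = area beyond Z) and uses the standard-library `NormalDist` in place of the printed table; the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def column_b(z):
    """Area between the mean and Z (same for +Z and -Z by symmetry)."""
    return STD_NORMAL.cdf(abs(z)) - 0.5


def column_c(z):
    """Area beyond Z, in the tail away from the mean."""
    return 1 - STD_NORMAL.cdf(abs(z))


print(round(column_c(1.25), 4))        # part c: area beyond Z = 1.25
print(round(column_b(-0.5) + 0.5, 4))  # part d: Column B plus the other half
print(round(column_b(2.5) + 0.5, 4))   # part e: Column B plus the other half
print(round(column_c(-1.5), 4))        # part f: area beyond Z = -1.5
```

Because the curve is symmetric, both helpers take the absolute value of Z; whether to add .5000 or combine two areas still depends on which side of the mean each score falls.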
HOMEWORK #2: Part B • Q4 • a. .9953 • Column B area (.4953) + .5000 (for other half of normal curve) • b. .5000 • 50% of area on either side of mean (47) • c. .6826 • “Column B” for both – .3413 + .3413 • d. .9997 • Column B area (.4997) + .5000 (for other half of normal curve) • e. .0548 • “Column C” area for Z=1.6 • f. .3811 • Scores on opposite sides of mean add “Col. B” areas
HOMEWORK #2: Part C • SPSS: • All the info needed to answer these questions is contained in this output
Statistics: HOURS PER DAY WATCHING TV
N Valid            1426
N Missing          618
Mean               3.03
Median             2.00
Mode               2
Std. Deviation     2.766
Percentiles  10    1.00
             20    1.00
             25    1.00
             30    2.00
             40    2.00
             50    2.00
             60    3.00
             70    3.00
             75    4.00
             80    4.00
             90    6.00
Sampling Terminology • Element: the unit of which a population is comprised and which is selected in the sample • Population: the theoretically specified aggregation of the elements in the study (e.g., all elements) • Parameter: Description of a variable in the population • σ = standard deviation, µ = mean • Sample: The aggregate of all elements taken from the population • Statistic: Description of a variable in the sample (estimate of parameter) • X̄ = mean, s = standard deviation
Non-probability Sampling • Elements have unknown odds of selection • Examples • Snowballing, available subjects… • Limits/problems • Cannot generalize to population of interest (the sample doesn’t adequately represent the population, i.e., bias) • Have no idea how biased your sample is, or how close you are to the population of interest
Probability Sampling • Definition: • Elements in the population have a known (usually equal) probability of selection • Benefits of Probability Sampling • Avoid bias • Both conscious and unconscious • More representative of population • Use probability theory to: • Estimate sampling error • Calculate confidence intervals
Sampling Distributions • Link between sample and population • DEFINITION 1 • IF a large (infinite) number of independent, random samples are drawn from a population, and a statistic is plotted from each sample…. • DEFINITION 2 • The theoretical, probabilistic distribution of a statistic for all possible samples of a certain size (N)
The Central Limit Theorem I • IF REPEATED random samples are drawn from the population, the sampling distribution of sample means will be approximately normal • As long as N is sufficiently large (>100) • The mean of the sampling distribution will equal the mean of the population • WHY? Because the most common sample mean will be the population mean • Other common sample means will cluster around the population mean (near misses) and so forth • Some “weird” sample findings will occur, though rarely
The Central Limit Theorem II • Again, WITH REPEATED RANDOM SAMPLES, the standard deviation of the sampling distribution = σ / √N • This Critter (the population standard deviation divided by the square root of N) is “The Standard Error” • How far the “typical” sample statistic falls from the true population parameter
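The standard error described above (the population standard deviation divided by the square root of N) is a one-line calculation. The sketch below uses hypothetical values for σ and N, chosen only to show how the standard error shrinks as the sample grows; `standard_error` is an illustrative name.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: population SD divided by sqrt(N)."""
    return sigma / math.sqrt(n)


# Hypothetical population standard deviation of 10.
for n in (25, 100, 400):
    print(n, standard_error(10, n))
```

Quadrupling the sample size only halves the standard error, because N enters through its square root.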
The KICKER • Because the sampling distribution is normally distributed, probability theory dictates the percentage of sample statistics that will fall within a given number of standard errors of the population value • 1 standard error on one side = 34%, or +/- 1 standard error = 68% • +/- 1.96 standard errors = 95% • +/- 2.58 standard errors = 99%
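These percentages come straight from the normal curve and can be checked numerically. A minimal sketch, assuming the standard-library `NormalDist` as the source of the areas; `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist


def share_within(k):
    """Share of a normal distribution within +/- k standard errors of its center."""
    return 2 * NormalDist().cdf(k) - 1


for k in (1, 1.96, 2.58):
    print(k, round(share_within(k), 4))
```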
The REAL KICKER • From what happens (probability theory) with an infinite # of samples… • To making a judgment about the accuracy of statistics generated from a single sample • Any statistic generated from a single random sample has a 68% chance of falling within one standard error of the population parameter • OR roughly a 95% CHANCE OF FALLING WITHIN 2 STANDARD ERRORS
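The jump from many samples to a single sample can be illustrated with a small simulation. Everything below is an assumption made for the demonstration: a skewed (exponential) population with mean 1 and standard deviation 1, samples of N = 100, 2,000 repeated samples, and a fixed random seed.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

MU = 1.0       # mean of the assumed exponential population (rate = 1)
SIGMA = 1.0    # its standard deviation
N = 100        # size of each sample
REPS = 2000    # number of repeated random samples

se = SIGMA / N ** 0.5  # standard error = sigma / sqrt(N)

# Draw REPS samples and record each sample mean.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(N)) / N
    for _ in range(REPS)
]

within_1 = sum(abs(m - MU) <= se for m in sample_means) / REPS
within_196 = sum(abs(m - MU) <= 1.96 * se for m in sample_means) / REPS

print("within 1 SE:   ", within_1)
print("within 1.96 SE:", within_196)
```

Even though the population itself is strongly skewed, the shares of sample means landing within 1 and 1.96 standard errors should come out close to 68% and 95%, which is what licenses the statement about any single random sample.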
EXAM • Closed book • BRING CALCULATOR • You will have full class to complete • Format: • Output interpretation • Z-score calculation problems • Memorize Z formula • Z-score area table provided • Short Answer/Scenarios • Multiple choice
Review for Exam • Variables vs. values/attributes/scores • variable – trait that can change values from case to case • example: GPA • score (attribute) – an individual case’s value for a given variable • Concepts → Operationalize → Variables
Review for Exam • Short-answer questions, examples: • What is a strength of the standard deviation over other measures of dispersion? • Multiple choice question examples: • Professor Pinhead has an ordinal measure of a variable called “religiousness.” He wants to describe how the typical survey respondent scored on this variable. He should report the ____. • a. median • b. mean • c. mode • d. standard deviation • On all normal curves the area between the mean and +/- 2 standard deviations will be • a. about 50% of the total area • b. about 68% of the total area • c. about 95% of the total area • d. more than 99% of the total area
EXAM • Covers chapters 1 through (part of) 6: • Chapter 1 • Levels of measurement (nominal, ordinal, I-R) • Any I-R variable could be transformed into an ordinal or nominal-level variable • Don’t worry about discrete-continuous distinction • Chapter 2 • Percentages, proportions, rates & ratios • Review HW’s to make sure you’re comfortable interpreting tables
EXAM • Chapter 3: Central tendency • ID-ing the “typical” case in a distribution • Mean, median, mode • Appropriate for which levels of measurement? • Identifying skew/direction of skew • Skew vs. outliers • Chapter 4: Spread of a distribution • R & Q • s² – variance (mean of squared deviations) • s • Uses every score in the distribution • Gives the typical deviation of the scores • DON’T need to know IQV (section 4.2)
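The variance and standard deviation listed for Chapter 4 can be worked through step by step. The scores below are hypothetical, and the sketch divides by N because the slide defines the variance as the mean of squared deviations; some texts and software divide by N - 1 instead.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical scores chosen so the arithmetic is easy to follow.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

m = mean(scores)
squared_deviations = [(x - m) ** 2 for x in scores]

variance = sum(squared_deviations) / len(scores)  # mean of squared deviations
s = variance ** 0.5                               # standard deviation

print("mean:", m)
print("variance:", variance)
print("s:", s)

# The standard library's population versions use the same N denominator.
print(pvariance(scores), pstdev(scores))
```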
Keep in mind… • All measures of central tendency try to describe the “typical case” • Preference is given to statistics that use the most information • For interval-ratio variables, unless you have a highly skewed distribution, the mean is the most appropriate • For ordinal, the median is preferred • If the mean is not appropriate, neither is “s” • s = how far cases typically fall from the mean
EXAM • Chapter 5 • Characteristics of the normal curve • Know areas under the curve (Figure 5.3) • KNOW Z score formula • Be able to apply Z scores • Finding areas under curve • Z scores & probability • Frequency tables & probability
EXAM • Chapter 6 • Reasons for sampling • Advantages of probability sampling • What does it mean for a sample to be representative? • Definition of probability (random) sampling • Sampling error • Plus… • Types of nonprobability sampling
Interpret • Number of cases used to calculate mean? • Most common IQ score? • Distribution skewed? Direction? • Q? • Range? • Is standard deviation appropriate to use here?
Scenario • Professor Scully believes income is a good predictor of the size of a person’s house • IV? • DV? • Operationalize DV so that it is measured at all three levels (nominal, ordinal, IR) • Repeat for IV
Express the answer in the proper format • Percent • Proportion • Ratio • Probability