Random sampling and probability Introduction to inferential statistics
Inferential statistics • Review definition: Making a decision about what you do not know -- the population characteristics -- based on what you do know -- the sample data. • Two purposes of inferential statistics: • Parameter estimation • Hypothesis testing
The grape Kool-Aid example • Parameter estimation: What is the mean IQ of the population of Grape Kool-Aid drinkers? • Hypothesis testing: Is the mean IQ of the Grape Kool-Aid drinkers different from the mean IQ of the water drinkers?
Which is it? • We want to find the mean amount of pizza eaten by Houghton students. • We want to see if there is a difference in pizza consumption between seniors and sophomores. • We want to see if there is a difference in pizza consumption with vs. without having Pepsi freely available.
Making sense of hypothesis testing: finding your marbles. • Two opaque jars contain marbles. • Jar one contains 50 red marbles and 50 blue marbles. • Jar two contains 90 red marbles and 10 blue marbles. • Drawing one marble at a time from a jar, can you tell which jar it is? • Draws shown on the slide: Blue, Blue, Blue, Blue, Blue, Red, Red, Red, Red, Red
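The marble question can be made concrete by asking how likely a given run of draws is under each jar. The sketch below is illustrative only: it assumes each marble is put back before the next draw, and the run of five blue marbles is a hypothetical outcome chosen for the example. The names `P_BLUE` and `sequence_likelihood` are not from the slides.

```python
from fractions import Fraction

# Proportion of blue marbles in each jar, taken from the slide.
P_BLUE = {"jar one": Fraction(50, 100), "jar two": Fraction(10, 100)}


def sequence_likelihood(draws, p_blue):
    """Probability of an exact sequence of draws, assuming replacement."""
    likelihood = Fraction(1)
    for colour in draws:
        likelihood *= p_blue if colour == "blue" else 1 - p_blue
    return likelihood


draws = ["blue"] * 5  # hypothetical outcome: five blue marbles in a row
likelihoods = {jar: sequence_likelihood(draws, p) for jar, p in P_BLUE.items()}

for jar, value in likelihoods.items():
    print(f"{jar}: {float(value):.5f}")
```

Five blue marbles in a row has probability 1/32 under jar one and 1/100000 under jar two, so that outcome would point strongly toward jar one.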
Random sampling and probability • Two applications of randomization: • Random selection, to produce representative samples • Random assignment, to produce equivalent groups • Random assignment of participants to groups • Random assignment of treatments to groups • Random assignment of sequences of treatments, or levels of the independent variable
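The two applications of randomization can be sketched with Python's standard `random` module. The roster of 200 students, the sample size of 20, and the two group names are assumptions made up for this illustration.

```python
import random

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Hypothetical population roster.
population = [f"student_{i:03d}" for i in range(1, 201)]

# Random selection: every student has the same chance of entering the sample.
sample = rng.sample(population, k=20)

# Random assignment: shuffle the sample, then split it into two equal groups.
shuffled = sample[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:10], shuffled[10:]

print("treatment:", treatment)
print("control:  ", control)
```

Selection decides who is studied; assignment decides which condition each selected participant receives.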
Techniques for random sampling • Dichotomous techniques: The coin toss • Multi-group techniques: Random numbers • Computer generated random numbers • Table of random digits • As computer generated random numbers are based on a non-random seed, 1997 saw the development of the lava lamp technique.
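The point about the non-random seed can be seen directly: two pseudo-random generators started from the same seed produce the same digits. The sketch below uses Python's `random` module; the seed value 1997 is arbitrary, and `random.SystemRandom` (which draws on the operating system's entropy source) is shown only as one alternative, not as the lava lamp technique itself.

```python
import random

# Two generators with the same seed yield identical "random" digits.
gen_a = random.Random(1997)
gen_b = random.Random(1997)
digits_a = [gen_a.randint(0, 9) for _ in range(10)]
digits_b = [gen_b.randint(0, 9) for _ in range(10)]
print(digits_a == digits_b)  # the sequences match

# A dichotomous "coin toss" drawn from the operating system's entropy source,
# which does not depend on a user-supplied seed.
coin = random.SystemRandom().choice(["heads", "tails"])
print(coin)
```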
A random digits table
    12345 67890 12345 67890 12345
1 | 22864 59302 31334 37506 38477
2 | 29476 49068 67381 11834 05934
3 | 39678 89970 09674 83495 99377
4 | 38476 16459 00794 38457 98032
5 | 48572 49583 50286 66739 39567
6 | 68395 58296 96708 92663 49210
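One common way to use such a table is to read the digits in pairs and keep the numbers that fall within the population's numbering. The sketch below applies that procedure to row 1 of the table; the population of 50 numbered participants, the sample size of 5, and the function name `select_from_digits` are assumptions for illustration.

```python
ROW_1 = "22864 59302 31334 37506 38477"  # row 1 of the random digits table


def select_from_digits(row, population_size, k):
    """Read two-digit numbers left to right, skipping out-of-range and repeats."""
    digits = row.replace(" ", "")
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == k:
            break
    return chosen


selected = select_from_digits(ROW_1, population_size=50, k=5)
print(selected)  # [22, 45, 2, 31, 33]
```

The pairs 86 and 93 are skipped because no participant carries those numbers.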
Two sampling strategies • Sampling with replacement • Purest form of random selection • Most beneficial for small populations • Sampling without replacement • Less pure random selection • No practical disadvantage for large populations
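The two strategies map onto two standard-library calls, and a short calculation shows why the difference fades for large populations. The population sizes and the helper `repeat_probability` are assumptions for this sketch.

```python
import random

rng = random.Random(7)  # fixed seed only so the sketch is repeatable
population = list(range(1, 11))

# With replacement: the same unit can be drawn more than once.
with_replacement = rng.choices(population, k=10)

# Without replacement: each unit can be drawn at most once.
without_replacement = rng.sample(population, k=10)


def repeat_probability(population_size, n_draws):
    """Chance that sampling WITH replacement draws some unit twice."""
    p_all_distinct = 1.0
    for i in range(n_draws):
        p_all_distinct *= (population_size - i) / population_size
    return 1 - p_all_distinct


print(with_replacement)
print(without_replacement)
print(repeat_probability(10, 5))      # small population: repeats are likely
print(repeat_probability(10_000, 5))  # large population: repeats are rare
```

When repeats are rare, the two strategies give nearly the same samples, which is why sampling without replacement carries no practical disadvantage for large populations.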
Probability Theory • A priori probability • p(A) = (Number of events classified as A) / (Total number of possible events) • For example, the a priori probability of a head on one toss of an unbiased coin is 1 (the number of events classified as heads) divided by 2 (the total number of possible events, heads plus tails) = 1 / 2 = .5
Probability theory... • A posteriori probability • p(A) = (Number of times A has occurred) / (Total number of occurrences) • For example, if I toss any coin ten times, and I get 4 heads, the a posteriori probability of heads is 4 / 10 = .4 • Hypothesis testing compares a posteriori probability with a priori probability.
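The a priori and a posteriori definitions can be compared with a small simulation. This is a sketch under the assumption of an unbiased simulated coin; the function name `a_posteriori` and the toss counts are illustrative choices.

```python
import random
from fractions import Fraction

# A priori: one event classified as heads out of two possible events.
a_priori = Fraction(1, 2)

rng = random.Random(0)  # fixed seed only so the sketch is repeatable


def a_posteriori(n_tosses):
    """Observed proportion of heads in n simulated tosses of a fair coin."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return Fraction(heads, n_tosses)


print("a priori:", float(a_priori))
for n in (10, 100, 100_000):
    print(f"a posteriori after {n} tosses:", float(a_posteriori(n)))
```

With few tosses the observed proportion can sit well away from .5; with many tosses it tends to settle close to the a priori value.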
Probability theory... • Mutual exclusivity and the addition rule • p(A or B) = p(A) + p(B) - p(A and B) • For example, the probability of drawing a heart or a 3 from a deck of playing cards = 13/52 + 4/52 - 1/52 = 16/52 = .308 (the 3 of hearts is subtracted once so it is not counted twice) • The probability of drawing either a heart or a spade from a deck of playing cards = 13/52 + 13/52 - 0/52 = 26/52 = .500 (the events are mutually exclusive, so p(A and B) = 0)
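The addition rule can be checked by counting cards directly. The sketch below builds a standard 52-card deck and compares the counted probability with the value given by the rule; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs


def p(event):
    """Probability that one card drawn at random satisfies `event`."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))


def is_heart(card):
    return card[1] == "hearts"


def is_spade(card):
    return card[1] == "spades"


def is_three(card):
    return card[0] == "3"


# Heart or 3: the events overlap in the 3 of hearts.
counted = p(lambda c: is_heart(c) or is_three(c))
by_rule = p(is_heart) + p(is_three) - p(lambda c: is_heart(c) and is_three(c))
print(counted, by_rule)  # both reduce to 4/13, i.e. 16/52

# Heart or spade: mutually exclusive, so nothing is subtracted.
heart_or_spade = p(lambda c: is_heart(c) or is_spade(c))
print(heart_or_spade)  # 1/2
```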
Probability theory... • Combination and the multiplication rule • p(A and B) = p(A) * p(B|A) • For independent events p(B|A) = p(B), so p(A and B) = p(A) * p(B) • For example, the probability of drawing a heart and then a spade on two successive draws (with replacement) is (13/52)*(13/52) = .0625 • The probability of getting three heads on three tosses of an unbiased coin, simultaneously or sequentially, is .5*.5*.5 = .125
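The three-heads figure can be confirmed by listing every possible outcome of three tosses and comparing the count with the multiplication rule. A minimal sketch, assuming an unbiased coin:

```python
from fractions import Fraction
from itertools import product

# All 2 * 2 * 2 = 8 equally likely outcomes of three tosses.
outcomes = list(product("HT", repeat=3))
three_heads = [o for o in outcomes if o == ("H", "H", "H")]

p_enumerated = Fraction(len(three_heads), len(outcomes))
p_by_rule = Fraction(1, 2) ** 3  # independent tosses: .5 * .5 * .5

print(p_enumerated, p_by_rule, float(p_by_rule))  # 1/8 1/8 0.125
```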
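Independent vs dependent events • p(A and B) = p(A) * p(B|A) • For dependent events, p(B|A) differs from p(B) • For example, the probability of drawing a heart and then a spade on two successive draws (without replacement) is (13/52)*(13/51) = .0637, because only 51 cards remain for the second draw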
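The only thing that changes between the independent and dependent cases is the denominator of p(B|A). The sketch below computes the probability of a heart followed by a spade both ways; the function name `p_heart_then_spade` is illustrative.

```python
from fractions import Fraction


def p_heart_then_spade(replace):
    """p(heart on draw 1) * p(spade on draw 2 | heart on draw 1)."""
    p_heart = Fraction(13, 52)
    cards_left = 52 if replace else 51  # the heart stays out if not replaced
    p_spade_given_heart = Fraction(13, cards_left)
    return p_heart * p_spade_given_heart


independent = p_heart_then_spade(replace=True)
dependent = p_heart_then_spade(replace=False)

print(f"with replacement:    {float(independent):.4f}")  # 0.0625
print(f"without replacement: {float(dependent):.4f}")    # 0.0637
```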
Resampling • Resampling is the basis for determining the variability of sample statistics. • Form an initial sample from the population without replacement. • Resample with replacement, drawing samples of the same size as the initial sample and drawing from the initial sample.
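A minimal sketch of resampling in its usual bootstrap form, where each resample is drawn with replacement from the initial sample. The simulated population (mean 100, standard deviation 15), the sample size of 50, and the 1000 resamples are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed only so the sketch is repeatable

# Hypothetical population of 10,000 scores.
population = [rng.gauss(100, 15) for _ in range(10_000)]

# Step 1: initial sample, drawn without replacement.
initial_sample = rng.sample(population, k=50)


def bootstrap_means(sample, n_resamples, rng):
    """Step 2: resample with replacement, same size as the initial sample."""
    return [
        statistics.fmean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    ]


means = bootstrap_means(initial_sample, 1000, rng)

# The spread of the resampled means estimates the variability of the mean.
standard_error = statistics.stdev(means)
print(f"sample mean: {statistics.fmean(initial_sample):.2f}")
print(f"bootstrap standard error: {standard_error:.2f}")
```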
Probability problems as Z-scores • For many probability problems, Z-score analysis is just the ticket.
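As a hedged illustration of a probability problem solved with a Z-score: the sketch below assumes IQ scores are normally distributed with mean 100 and standard deviation 15 (values assumed for the example, not given in the slides) and asks how likely a score above 130 is.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd


# Assumed values for illustration: mean 100, standard deviation 15.
z = z_score(130, mean=100, sd=15)

# Area under the standard normal curve above z.
p_above = 1 - NormalDist().cdf(z)

print(f"z = {z:.1f}, p(score > 130) = {p_above:.4f}")
```

A score of 130 is two standard deviations above the mean, and roughly 2.3% of a normal distribution lies beyond that point.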