Inferential Statistics: Probability Distributions, Hypothesis Testing, and Statistical Inference


Key Lessons from the Shark/Spider/Candybug/Fiddler Labs
• Natural populations and their traits are inherently variable. Their members belong to some sort of distribution. Some distributions are uniform, some are random, some are patchy, some are normal (Gaussian), some are bimodal (two peaks), some are multimodal (several peaks), some are skewed (the peak sits far over toward one extreme), and so on.
• When taking a subsample from a natural population in scientific research, you usually must take a random sample. This guards against taking a sample that is biased toward one particular segment of the population (e.g., the biggest, most noticeable crabs). Random sampling is the safest way to ensure that you get an unbiased estimate.
• Unfortunately, random samples are not always representative of the total population (e.g., due to “flukes” or “Poisson clusters”). They sometimes yield estimates that lie far from the true mean (which, remember, we can never know for certain…).

(Key Lessons cont’d)
• Every random sample has a probability (“likelihood,” “chance,” or “odds”) of being representative of the total population and therefore of yielding a good, accurate estimate of the true (but ever unknowable) mean. This probability increases with a larger sample size (N). Increasing the sample size improves one’s statistical confidence that the estimate is reasonably accurate.
• Just as the individual members of the natural population belong to some natural (but hidden) distribution, so one’s estimate (from a SINGLE random sample) belongs to a Distribution of the Means. This is the set of all the estimated means you could ever possibly get with a given sample size.
• However, you are more likely to get some estimates than others. Generally speaking, you are more likely to get an estimate close to the true mean than you are to get a fluke. Thus a Distribution of the Means is usually bell-shaped (even if the natural population itself is NOT bell-shaped!). And a bell-shaped curve reflects a set of probabilities…

A Human Histogram! (Diagram from Freeman & Herron, Evolutionary Analysis.) University of Connecticut students organized by height, from 5’0” to 6’5”. For the trait of height, humans belong (approximately) to a bell-shaped “Normal” distribution. (Technically it’s a bimodal distribution, with two peaks, since the mean height of males differs from that of females.)

“Normal” (or Gaussian) Distribution: The mean, median, and mode all coincide. Most individuals are clustered near the mean, with fewer individuals at the extremes (or “tails”). The curve is symmetrical (balanced to the left and right). All “Normal” curves are bell-shaped, but not all “Bell” curves are Normal!!! A true Normal Distribution has certain specific proportions…

[Figure: Normal curve with shaded bands labeled “68% of Data,” “95% of Data,” and “99% of Data”]
In any Normal Distribution:
• about 68% of the data fall within one “standard deviation” of the mean
• about 95% of the data fall within two “standard deviations” of the mean
• about 99.7% of the data (rounded to 99% in the “68-95-99” rule) fall within three “standard deviations” of the mean
All of these percents can be understood as PROBABILITIES!!!
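The percentages above can be checked with a few lines of Python. This is a minimal sketch using only the standard library’s math.erf; the helper name fraction_within is made up for illustration. It shows the unrounded values behind the “68-95-99” rule.

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of a Normal distribution lying within k standard deviations of its mean."""
    # For any Normal curve, P(|X - mean| < k * sd) = erf(k / sqrt(2)),
    # no matter what the mean and standard deviation are.
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")
```

Running it prints roughly 0.6827, 0.9545, and 0.9973, and those same numbers hold for every Normal curve, whatever its mean and standard deviation.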

Questions
• Suppose the shoe sizes of all the high school girls in Caroline County belong to a Normal Distribution. If we were to check Kelli’s shoe size, what is the probability that it would fall within one standard deviation of the mean shoe size among girls at CHS?
• 68% (about a two-thirds likelihood)
• If we were to check Amanda, Clancy, & Emily’s shoe sizes, how many of those three girls would we expect (probabilistically speaking) to fall within one standard deviation of the mean?
• Two (since an average of 68% of all CHS girls fall within this interval, and 3 × 0.68 ≈ 2)
• What are the odds that Clancy’s shoe size would fall beyond (outside of) two standard deviations of the mean?
• 5% (100 – 95 = 5) (a one in twenty chance)
• Suppose Brittany has really gigantic feet, just beyond three standard deviations above the mean shoe size. How many other CHS girls would you have to randomly sample before you would probably find someone with feet just as unbelievably humongous as Brittany’s?
• You may have predicted 100, but it’s actually 200! See why??? Only 1% of girls fall beyond three standard deviations, and half of those are on the small-foot side, so only 0.5% (1 in 200) have feet as big as Brittany’s. (Using the more precise 99.7% figure, the odds are closer to 1 in 740.)
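The arithmetic behind these answers can be written out in Python. This sketch uses the rounded “68-95-99” figures the questions rely on, plus the exact Normal upper tail from math.erfc for comparison; the helper name upper_tail is illustrative only.

```python
import math

def upper_tail(k: float) -> float:
    """Probability that a Normal value lies MORE than k SDs ABOVE the mean (one tail only)."""
    return 0.5 * math.erfc(k / math.sqrt(2))

# Expected number of 3 girls within one SD, using the rounded 68% figure
expected_within_1sd = 3 * 0.68            # about 2 girls

# Chance of falling outside two SDs, using the rounded 95% figure
outside_2sd = 1 - 0.95                    # 0.05, i.e. 1 in 20

# Brittany: beyond three SDs on the BIG-feet side only.
# With the rounded 99% rule, 1% lies outside, split evenly between the two tails.
rounded_big_tail = (1 - 0.99) / 2         # 0.005
girls_needed_rounded = 1 / rounded_big_tail

# With the exact Normal curve, the upper tail beyond 3 SD is about 0.00135.
girls_needed_exact = 1 / upper_tail(3)

print(round(expected_within_1sd, 2), round(outside_2sd, 2),
      round(girls_needed_rounded), round(girls_needed_exact))
```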

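Hey, check out the Gaussian simulation in Excel… All Normal Distributions have the same “68-95-99” proportions, but they can be short and fat or tall and thin, depending on the variance (and standard deviation) of the population. How does increasing the variance affect the shape of a normal distribution? How does decreasing it affect it? Answer: Increasing the variance spreads the curve out, making it shorter and fatter (the total area under the curve always stays 100%); decreasing the variance makes it taller and thinner.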
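The short-and-fat versus tall-and-thin effect shows up in the height of the Normal curve at its peak, which is 1/(SD × √(2π)). This is a minimal sketch with made-up SD values; normal_pdf is an illustrative helper written here, not a library function.

```python
import math

def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Height of the Normal curve with the given mean and SD at the point x."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

# Same mean (0), three different standard deviations (illustrative values).
for sd in (0.5, 1.0, 2.0):
    peak = normal_pdf(0.0, 0.0, sd)
    print(f"sd={sd}: peak height={peak:.3f}, about 95% of values within +/-{2 * sd}")
```

Doubling the SD halves the peak height (about 0.798, 0.399, and 0.199 here) while doubling the width of the ±2 SD band, so the curve gets shorter and fatter.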

Results of Hypersampling from a Single Population of 100 Fiddler Crabs: These are distributions of our estimated means using two different sample sizes, NOT of the individual crabs themselves. Because the crabs themselves belong to a Normal distribution, the Distributions of the Means are also (approximately) Normal.
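A hypersampling run like this one can be sketched in Python. This assumes a made-up Normal population of 100 claw lengths (mean 20 mm, SD 4 mm) and draws 1000 repeated samples per sample size, rather than the lab’s actual numbers; hypersample is a hypothetical helper name.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: 100 fiddler-crab claw lengths (mm) drawn from a Normal curve.
# The mean of 20 mm and SD of 4 mm are made-up illustration values.
population = [random.gauss(20, 4) for _ in range(100)]

def hypersample(pop, n, repeats=1000):
    """Draw `repeats` random samples of size n (without replacement) and return their means."""
    return [statistics.mean(random.sample(pop, n)) for _ in range(repeats)]

for n in (3, 10):
    means = hypersample(population, n)
    print(f"N={n}: mean of estimates={statistics.mean(means):.2f}, "
          f"SD of estimates={statistics.stdev(means):.2f}")
```

Both Distributions of the Means center on the population mean, but the one built from the larger sample size is noticeably narrower.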

The Central Limit Theorem, Part 1
Principle #1: If samples are taken from a natural population that is Normally Distributed, then the “Distribution of the Means” will ALSO be Normally Distributed …no matter what the sample size (N). (Sample size only affects the range and variance of the Distribution of the Means …how wide and tall it is…) This is an extremely useful fact!!! It tells us that a single estimate from a single sample (of size N) is 68% likely to fall within one “standard error” of the true mean, 95% likely to fall within two standard errors of the true mean, etc. (The standard error is the standard deviation of the Distribution of the Means; it equals the population’s standard deviation divided by √N, so it shrinks as N grows.) It helps us to know how confident we can be in our estimated mean!!!
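To see Principle #1 at work, this sketch assumes a Normal population with a made-up mean of 20 and SD of 4, draws thousands of samples, and counts how often the sample mean lands within one and two standard errors (SD/√N) of the true mean. The coverage helper and the constants are illustrative assumptions.

```python
import math
import random
import statistics

random.seed(2)

# Illustrative Normal population (assumed values).
TRUE_MEAN, TRUE_SD = 20.0, 4.0

def coverage(n: int, k: float, trials: int = 10_000) -> float:
    """Fraction of sample means (sample size n) landing within k standard errors of the true mean."""
    se = TRUE_SD / math.sqrt(n)  # standard error = SD of the Distribution of the Means
    hits = 0
    for _ in range(trials):
        sample_mean = statistics.fmean(random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n))
        if abs(sample_mean - TRUE_MEAN) < k * se:
            hits += 1
    return hits / trials

for n in (2, 5, 25):
    print(f"N={n}: within 1 SE {coverage(n, 1):.3f}, within 2 SE {coverage(n, 2):.3f}")
```

For every N the fractions should come out near 0.68 and 0.95; what changes with N is only the width of the interval (the standard error).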

But what if the crabs we measured had not come from a Normal Distribution? What if instead, most of the crabs had small fiddles, while only a few had medium and large fiddles? How would that have affected our Distribution of the Means? Would IT still be Normal, even though the real crab population wasn’t??? Hmmm. Let’s find out…

Here’s a population of 100 fiddlers whose claws are not normally distributed. The distribution is “skewed” toward one side: most crabs have small claws and only a few have medium or large claws. Now here’s a Distribution of the Means, based on 50 estimates with N=3. It is not Normal, although it does have a bit of a bell shape, somewhat skewed to one side. And here’s a Distribution of the Means, based on 50 estimates with N=6. It’s still not quite Normal, but it’s closer to Normal than the N=3 Distribution.

The Central Limit Theorem, Part 2
Principle #2: If samples are taken from a natural population that is not Normally Distributed, then the “Distribution of the Means” will be more or less bell-shaped, but it will not be exactly Normal. However, it will become closer and closer to Normal as sample size (N) increases. This means that the bigger your sample size, the more closely the Distribution of the Means will follow the known “68-95-99” proportions and probabilities of a Normal curve. Again, this is useful information…
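Here’s a minimal Python sketch of the skewed-population experiment, assuming a made-up population of 100 claw lengths with a long tail toward large claws (an exponential shape plus a 5 mm minimum, chosen only for illustration). Instead of 50 estimates per sample size it uses 5000, so the shapes are easier to measure. It reports skewness (0 for a symmetric curve) for the population and for the Distribution of the Means at several sample sizes; skewness and means_of_samples are hypothetical helper names.

```python
import random
import statistics

random.seed(3)

# Hypothetical skewed population: most crabs small, a few large (assumed shape).
population = [5 + random.expovariate(1 / 4) for _ in range(100)]

def skewness(values):
    """Sample skewness: 0 for a symmetric curve, positive when the long tail points to large values."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)

def means_of_samples(pop, n, repeats=5000):
    """Means of `repeats` random samples of size n, drawn without replacement."""
    return [statistics.fmean(random.sample(pop, n)) for _ in range(repeats)]

print(f"population skewness: {skewness(population):.2f}")
for n in (3, 6, 30):
    print(f"N={n}: skewness of the Distribution of the Means = "
          f"{skewness(means_of_samples(population, n)):.2f}")
```

The skewness of the Distribution of the Means should shrink toward 0 as N grows, which is Principle #2 expressed as a number.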

“Hypothesis Testing”
The Central Limit Theorem is the basis for most of the major Statistical Tests on null hypotheses. These mathematical tests tell you how “safe” it is to reject your null hypothesis. Recall that in most biological experiments and field studies, the scientist’s primary goal is to DETECT A DIFFERENCE between two or more MEANS (for example, between a control group and an experimental group, or between several experimental groups). In all cases, there is a corresponding null hypothesis:
$H_0: M_C = M_E$ (control mean vs. experimental mean)
$H_0: M_F = M_B = M_S$ (means at 3 different salinities: fresh vs. brackish vs. saltwater)
$H_0: M_{10} = M_{15} = M_{20} = M_{25}$ (means at 4 different temperatures: 10° vs. 15° vs. 20° vs. 25°)

The problem: When two estimated means from two different groups come up different, how do we know that they are REALLY different? Do they come from two different distributions, or are they just two different estimates from the same distribution? How LIKELY is it that they belong to the same Distribution? How CONFIDENT can we be that they belong to two different distributions? Suppose an experiment turns up an estimated control mean and experimental mean of 5 and 7 respectively. How would you know they stem from two different Distributions of the Means and not the same Distribution of the Means???
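One way to see the problem is to simulate it. This sketch assumes a single Normal population (a made-up mean of 6 and SD of 3) and asks how often two random samples from that SAME population would produce means 2 or more apart, like the 5 and 7 above, by chance alone. chance_of_gap and the constants are hypothetical.

```python
import random
import statistics

random.seed(4)

POP_MEAN = 6.0  # assumed population mean (illustrative)

def chance_of_gap(gap, n, sd, trials=10_000):
    """How often two samples of size n, both drawn from the SAME Normal population,
    produce means at least `gap` apart."""
    count = 0
    for _ in range(trials):
        a = statistics.fmean(random.gauss(POP_MEAN, sd) for _ in range(n))
        b = statistics.fmean(random.gauss(POP_MEAN, sd) for _ in range(n))
        if abs(a - b) >= gap:
            count += 1
    return count / trials

for n in (3, 10, 30):
    print(f"N={n}: chance of a gap of 2 or more = {chance_of_gap(2, n, sd=3):.3f}")
```

With small samples a gap of 2 happens often just by luck; with larger samples it becomes rare. That is why both the sample size and the variance matter to the answer.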

Statistical Tests
• “Statistical Tests” are mathematical tests that compare one or more estimated means and, based on the sample size and variance of the data sets, calculate how likely it would be to get means this different if they all belonged to the same Distribution of the Means.
• Examples: t-test, ANOVA, and regression
• The math involved gets cumbersome, so it’s best to let a computer run the operations and do all the work.
• In all cases, the hidden mathematics are based on:
  • The known “68-95-99” proportions of a Normal Distribution
  • The Variance that shows up in the data set (which controls the width and spread of the Distribution of the Means)
  • The fact that a larger Sample Size (N) accomplishes two things: (a) it reduces the variance of the Distribution of the Means (DM), thereby improving the probability (hence confidence) of being close to the true mean, and (b) it makes the DM closer to Normal (= the Central Limit Theorem)
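To show how the mean difference, the variance, and the sample size feed into a test, here is a hand-computed Welch’s t statistic (one common form of the two-sample t-test) in Python, using made-up data with means of 5 and 7. The data values and the welch_t name are illustrative assumptions. A statistics package would go one step further and convert t into a p-value using the t distribution; that step is not shown here.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic: the gap between two means divided by its estimated standard error.
    A bigger gap, smaller variances, or larger samples all make |t| bigger."""
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)  # N-1 denominator
    se_diff = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / se_diff

# Made-up example data: control mean 5, experimental mean 7.
control = [4, 5, 6, 5, 4, 6]
experimental = [6, 7, 8, 7, 6, 8]
print(f"t = {welch_t(experimental, control):.2f}")
```

For this data t comes out near 3.87. Doubling each group’s size (repeating the same values) makes t larger, because the standard error in the denominator shrinks.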

p-value
At the end, the computerized statistical test will kick out a “p-value,” a number between 0 and 1.0 (or 0% and 100%). This is the probability of getting estimated means at least as different as yours IF they really came from the same distribution (that is, if the null hypothesis were true). A small p-value means your results would be an unlikely fluke under the null hypothesis. (Strictly speaking, it is not the probability that the null hypothesis is true.) The conventional criterion for rejecting the null hypothesis is a p-value of 0.05 or less. This would mean that if the null hypothesis were true, a difference this large would turn up less than 5% of the time. That is what people mean when they say you can be at least 95% confident that your two (or more) estimated means really do belong to different distributions: if you reject the null hypothesis only when p ≤ 0.05, there is at most a 1 in 20 chance of wrongly rejecting a null hypothesis that is actually true.
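The definition of a p-value can be made concrete with a permutation test. This is a different technique from the t-test: it computes a p-value directly by simulation instead of from Normal-curve math. The sketch pretends the null hypothesis is true (so the group labels don’t matter), reshuffles the labels many times, and reports the fraction of shuffles that produce a gap in means at least as large as the real one. The data and the permutation_p_value name are made up for illustration.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, shuffles=10_000, seed=0):
    """Two-sided permutation-test p-value for a difference in means.
    Assumes the null hypothesis (one shared population), shuffles the group labels,
    and counts how often a gap at least as large as the observed one appears."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(shuffles):
        rng.shuffle(pooled)
        gap = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if gap >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    return extreme / shuffles

control = [4, 5, 6, 5, 4, 6]
experimental = [6, 7, 8, 7, 6, 8]
print(f"p = {permutation_p_value(control, experimental):.4f}")
```

Counting all 924 ways to split these 12 values into two groups of 6, exactly 12 give a gap of 2 or more, so the exact p-value is 12/924 ≈ 0.013; the shuffled estimate should land close to that, well under the 0.05 cutoff.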

“Statistical Significance”
When a statistical test gives us a p-value of 0.05 or less and thereby tells us we can reject the null hypothesis with at least 95% confidence, then we can accept the alternate hypothesis (Ha …our actual hypothesis) as supported. In that case we say that the difference between our estimated means is “statistically significant.” We have detected a “statistically significant” difference between the two (or more) groups.
Several expressions that all mean basically the same thing:
• “We can reject the null hypothesis with 95% confidence.”
• “We can accept the alternate hypothesis with 95% confidence.”
• “There’s a statistically significant difference between the means.”

Questions
• A fisheries biologist wants to know if striped bass (“rockfish”) in the lower Rappahannock are bigger, on average, than striped bass in the more polluted James River. Using an otter trawl and a random sampling scheme, he captures 25 adult rockfish from each river. He estimates the mean fork length of Rappahannock rock to be 49 cm and the mean fork length of James River rock to be 43 cm. It looks like the Rappahannock fish are indeed bigger, on average, but he runs a statistical t-test to be safe. The t-test generates a p-value of 0.08. Can he reject the null hypothesis of “no difference”?
• NO! Because his p-value is not 0.05 or less, he must retain the null hypothesis! He cannot be 95% sure that he has detected a real difference between the two populations, and so he cannot safely accept the alternate hypothesis that Rappahannock rock are bigger.
• Even so, his p-value was pretty low. So it may be that there is a real difference here, but that he just failed to detect it. Therefore he might want to repeat his experiment using a “more sensitive” sampling scheme. What could he do to enhance the odds of detecting a size difference between the two populations (if it in fact exists)?
• Increase his sample size! Instead of randomly selecting N=25 fish from each river, he might try N=50 fish. That would reduce the variance in the Distribution of the Means and also bring its shape closer to a true Normal Distribution …both of which would increase the “sensitivity” of the t-test.
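The “increase the sample size” advice can be checked with a quick power simulation. This is a sketch under loudly stated assumptions: the true size difference is taken to be 6 cm (the biologist’s estimate), the fish-to-fish SD is a made-up 12 cm, and the cutoff of 2.0 is a rounded stand-in for the exact two-sided 5% t-test critical value. It counts how often a Welch t-test would detect the difference with N=25 versus N=50 fish per river; welch_t and detection_rate are hypothetical helper names.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se

def detection_rate(n, true_gap=6.0, sd=12.0, trials=2000, seed=5):
    """Fraction of simulated studies (n fish per river) whose t-test rejects 'no difference'.
    true_gap and sd are ASSUMED values, not measured ones."""
    rng = random.Random(seed)
    critical = 2.0  # rounded approximation of the 5% two-sided t cutoff (assumption)
    hits = 0
    for _ in range(trials):
        rappahannock = [rng.gauss(43 + true_gap, sd) for _ in range(n)]
        james = [rng.gauss(43, sd) for _ in range(n)]
        if abs(welch_t(rappahannock, james)) > critical:
            hits += 1
    return hits / trials

for n in (25, 50):
    print(f"N={n} per river: difference detected in {detection_rate(n):.0%} of studies")
```

Under these assumptions, N=50 detects the difference noticeably more often than N=25, even though the true difference is the same in both cases.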
