
eatworms.swmed/~leon leon@eatworms.swmed

hu-alvarado

Presentation Transcript


  1. eatworms.swmed.edu/~leon • leon@eatworms.swmed.edu

  2. Basic Statistics • Combining probabilities • Samples and populations • Four useful statistics: the mean, or average; the median, or 50% value; standard deviation; standard error of the mean (SEM). • Three distributions: the binomial distribution; the Poisson distribution; the normal distribution. • Four tests: the chi-squared goodness-of-fit test; the chi-squared test of independence; Student’s t-test; the Mann-Whitney U-test.

  3. Combining probabilities • When you throw a pair of dice, what is the probability of getting 11?

  4. Combining probabilities • The probability that all of several independent events occur is the product of the individual event probabilities. • The probability that one of several mutually exclusive events occurs is the sum of the individual event probabilities.

  5. Combining probabilities • When you throw a pair of dice, what is the probability of getting 11? • When you throw five dice, what is the probability that at least one shows a 6?

  6. Combining probabilities • When you throw a pair of dice, what is the probability of getting 11? • When you throw five dice, what is the probability that at least one shows a 6?
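
The two dice questions can be checked with a short Python sketch. This is an illustration rather than part of the original slides: it assumes fair six-sided dice, and the variable names are made up for the example. It counts outcomes directly, then applies the product and sum rules from slide 4.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair dice (assumed fair).
outcomes = list(product(range(1, 7), repeat=2))
p_eleven = Fraction(sum(1 for a, b in outcomes if a + b == 11), len(outcomes))

# Same answer from the rules: (5,6) and (6,5) are mutually exclusive, so add;
# within each outcome the two dice are independent, so multiply.
p_eleven_rules = Fraction(1, 6) * Fraction(1, 6) + Fraction(1, 6) * Fraction(1, 6)

# "At least one 6" is the complement of "no 6 on any of the five dice",
# and the five dice are independent, so P(no 6) is a product.
p_at_least_one_six = 1 - Fraction(5, 6) ** 5

print(p_eleven, p_eleven_rules, p_at_least_one_six, float(p_at_least_one_six))
```

Working through the complement is what makes the five-dice question easy: "at least one" covers many overlapping cases, while "none" is a single product.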

  7. Populations and samples • What proportion of the population is female?

  8. Populations and samples • What proportion of the population is female? • Abstract populations: what does a mouse weigh?

  9. Populations and samples • What proportion of the population is female? • Abstract populations: what does a mouse weigh? • Population characteristics: • Central tendency: mean, median • Dispersion: standard deviation

  10. Four sample statistics

  11. Standard deviation and SEM • Use standard deviation to describe how much variation there is in a population. • Example: income, if you’re interested in how much income varies within the US population. • Use SEM to say how accurate your estimate of a population mean is. • Example: measurement of β-gal activity from a 2-hybrid test.

  12. Sample stats: recommendations • When you report an average, report it as mean ± SEM. • Same for error bars in graphs. • In the figure caption or the table heading or somewhere, say explicitly that that’s what you’re reporting. • Use the median for highly skewed data.
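
As a rough illustration of the four sample statistics, the sketch below uses Python's standard `statistics` module. The measurements are made-up numbers, not data from the talk, and the SEM is computed as the sample standard deviation divided by the square root of the sample size.

```python
import math
import statistics

# Hypothetical measurements (invented for illustration only).
sample = [4.1, 5.3, 4.8, 6.0, 5.1, 4.7, 12.9]

mean = statistics.mean(sample)
median = statistics.median(sample)
sd = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
sem = sd / math.sqrt(len(sample))    # standard error of the mean

print(f"mean ± SEM = {mean:.2f} ± {sem:.2f} (n = {len(sample)})")
print(f"median = {median:.2f}, SD = {sd:.2f}")
```

The single large value pulls the mean well above the median, which is the situation where the slide recommends reporting the median.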

  13. Three distributions • The binomial distribution • When you count how many of a sample of fixed size have a certain characteristic. • The Poisson distribution • When you count how many times something happens, and there is no upper limit. • The normal distribution • When you measure something that doesn’t have to be an integer or when you average several continuous measurements.
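
For reference, the three distributions can be written down in a few lines of Python. This is a minimal sketch using the standard textbook formulas. The function names are chosen for this example and do not come from the slides.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k of n independent trials succeed), success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mean):
    """P(exactly k events) for a count with no upper limit and the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Example calls with illustrative parameters.
print(binomial_pmf(10, 40, 0.25))   # 10 of 40 with p = 1/4
print(poisson_pmf(2, 3.0))          # 2 events when 3 are expected on average
print(normal_pdf(0.0, 0.0, 1.0))    # peak of the standard normal density
```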

  14. The binomial distribution

  15. The Poisson distribution

  16. The normal distribution

  17. Hypothesis testing

  18. A genetic mapping problem

  19. A genetic mapping problem

  20. A genetic mapping problem

  21. The experiment • Look at the SSR genotype of 40 e/e kids. • If about 1/4 are /, the SSR is probably unlinked. • If the number of / is much less than 1/4, the SSR is probably linked. • We’re going to figure out how to make the decision in advance, before we see the results.

  22. Expected results if unlinked

  23. Is the SSR linked? • We want to know if the SSR is linked to the epilepsy gene. • What would your answer be if: • 10/40 kids were /? • 0/40 kids were /? • 5/40 kids were /? • Need a way to set the cut-off.

  24. Type I errors • Suppose that in reality, the SSR and the epilepsy gene are unlinked. • Still, by chance, the number of / in our sample may be <cut-off. • We would decide incorrectly that the genes were linked. • This is a type I error.

  25. What’s the probability of a type I error (α) if we cut off at 5?

  26. Probability of a type I error
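
The question on slide 25 can be answered numerically with a binomial tail sum. The sketch below assumes that "cut off at 5" means "decide the SSR is linked if 5 or fewer of the 40 kids have the genotype in question". That reading is an assumption, so the next lower cut-off is computed as well for comparison.

```python
import math

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n_kids = 40          # sample size from the experiment
p_unlinked = 0.25    # expected fraction under the null hypothesis (unlinked)

# Assumption: the decision rule is "count <= cut-off", so alpha is a lower tail.
alpha_at_5 = binomial_cdf(5, n_kids, p_unlinked)
alpha_at_4 = binomial_cdf(4, n_kids, p_unlinked)

print(f"cut-off 5: alpha = {alpha_at_5:.4f}")
print(f"cut-off 4: alpha = {alpha_at_4:.4f}")
```

Under that reading, a cut-off of 5 keeps the type I error rate below 5%.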

  27. Some terminology • The hypothesis that nothing special is going on is the null hypothesis, H0. • A type I error is the rejection of a true null hypothesis. • The probability of a type I error is called α, or the level of significance.

  28. Levels of significance • “Statistically significant,” if nothing more precise is added, means significant at P ≤ 5%. • “Highly significant” is less universal, but typically means P ≤ 1%. • The other level worth distinguishing is P ≤ 0.1%. • Recommendation: stick with these levels, don’t report ridiculously low probabilities.

  29. How many tails? • The test I have just described is a one-tailed test, because we were only interested in the possibility that the frequency of / was less than ¼. • More commonly, you want to test whether an observation is either less than or greater than a predicted value. • In that case you need two cutoffs, a lower one and an upper one. • The probability of a type I error will then be the sum of the probability of too low a number and the probability of too high a number.

  30. Two tails of the binomial

  31. The two-tailed test • Typically we put half of the probability (2.5%) in each tail. • Our decision rule will be to reject if n ≤ 4 or if n ≥ 16. • This is called a two-tailed test. • Recommendation: if you are at all uncertain, do a two-tailed test.
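
The decision rule on slide 31 can be checked the same way, by adding the probability in each tail. This sketch assumes the same setting as the mapping example (40 kids, null probability 1/4). The variable names are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n trials) for success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p0 = 40, 0.25

lower_tail = sum(binomial_pmf(k, n, p0) for k in range(0, 5))        # n <= 4
upper_tail = sum(binomial_pmf(k, n, p0) for k in range(16, n + 1))   # n >= 16
alpha_two_tailed = lower_tail + upper_tail   # the two tails are mutually exclusive

print(f"lower tail = {lower_tail:.4f}, upper tail = {upper_tail:.4f}")
print(f"two-tailed alpha = {alpha_two_tailed:.4f}")
```

Because the counts are integers, neither tail can be set to exactly 2.5%. The cut-offs are the ones that bring each tail as close to that target as the discrete distribution allows.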

  32. Statistical tests • Chi-squared goodness-of-fit test: • Test whether a single measurement from a binomial matches a theoretical value. • Test whether two Poisson distributions have equal means (by testing whether one measurement is 50% of the sum). • Chi-squared test of independence: • Test whether two binomial distributions have equal means. • Student’s t test: • Test whether two normal distributions have equal means. • Mann-Whitney U test: • Test whether two samples come from distributions with the same location. Can be used with any continuous distribution.

  33. Test on the probability of a binomial variable • You looked at N things (people in the room for instance), and counted the number n who matched some criterion (female, for instance). • The null hypothesis is that this is a binomial with probability p0 (some definite value that you predict based on theory). • Chi-squared goodness-of-fit test. • Example: progeny classes from genetic cross.
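
A goodness-of-fit test for a single binomial count can be sketched in plain Python. The counts below are hypothetical (22 mutants among 120 progeny, tested against an expected 1/4) and do not come from the talk. For one degree of freedom the chi-squared tail probability equals `erfc(sqrt(chi2 / 2))`, which avoids needing a statistics library.

```python
import math

def chi2_goodness_of_fit_binomial(n_match, n_total, p0):
    """Pearson chi-squared test of an observed count against probability p0 (1 df)."""
    observed = [n_match, n_total - n_match]
    expected = [n_total * p0, n_total * (1 - p0)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))   # exact tail for 1 degree of freedom
    return chi2, p_value

# Hypothetical cross: 22 of 120 progeny are mutant; theory predicts 1/4.
chi2, p_value = chi2_goodness_of_fit_binomial(22, 120, 0.25)
print(f"chi-squared = {chi2:.3f}, P = {p_value:.3f}")
```

With these invented counts the deviation from 1/4 would not reach the 5% level. The chi-squared approximation is usually considered reasonable only when the expected counts are not too small.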

  34. Tests of independence • When you have measured two binomial variates to test if the p of the two distributions is the same. • Chi-squared test of independence. • For instance, suppose we want to know if the proportion of biologists who are women is different from the proportion of doctors who are women. So we count some biologists and some doctors and we find that 24/61 biologists are women (39%), but 36/72 doctors are women (50%). We could use a chi-squared test to find out if this difference is significant. (Turns out it isn’t even close.)
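
The biologists-versus-doctors example can be reproduced with a small 2×2 chi-squared calculation. The sketch below uses the counts from the slide (24/61 and 36/72) and the plain Pearson statistic without a continuity correction. That choice is an assumption, since the slide does not say which variant was used.

```python
import math

def chi2_independence_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))   # exact tail for 1 degree of freedom
    return chi2, p_value

# Rows: biologists, doctors. Columns: women, men.
chi2, p_value = chi2_independence_2x2(24, 61 - 24, 36, 72 - 36)
print(f"chi-squared = {chi2:.2f}, P = {p_value:.2f}")
```

The resulting P value is well above 5%, in line with the slide's remark that the difference is not close to significant.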

  35. Student’s t test on the means of normal variables • This is when you have two sample averages and you want to know if they’re different. • For instance, maybe you have weighed mice that are homozygous for a gene knockout and their heterozygous siblings. The homozygotes weigh less, a common sign that they’re unhealthy in some way, and you want to know if the difference is significant. • This test assumes that weight (or at least the average of several weights) is normally distributed.
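
The t statistic for the mouse-weight example can be computed by hand as below. The weights are invented for illustration, and the sketch uses the classic equal-variance (pooled) form of Student's test. It stops at the statistic and degrees of freedom, which would then be compared against a t table or passed to a statistics package for a P value.

```python
import math
import statistics

def pooled_t(xs, ys):
    """Student's two-sample t statistic with pooled variance, plus degrees of freedom."""
    n1, n2 = len(xs), len(ys)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(xs)
                  + (n2 - 1) * statistics.variance(ys)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))   # SE of the difference in means
    return (statistics.mean(xs) - statistics.mean(ys)) / se, df

# Hypothetical weights in grams (made-up data).
homozygotes = [18.2, 19.1, 17.5, 18.8, 17.9]
heterozygotes = [20.4, 21.0, 19.6, 20.9, 20.1]

t, df = pooled_t(homozygotes, heterozygotes)
# The tabulated two-tailed 5% critical value for 8 degrees of freedom is 2.306.
print(f"t = {t:.3f} with {df} degrees of freedom")
```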

  36. The Mann-Whitney U test • Used under almost exactly the same circumstances as the t-test. For instance, you could use it to compare mouse weights. • Doesn’t compare averages; compares the positions of the entire distributions. • This test makes NO ASSUMPTIONS about the shape of the underlying distributions (the data need not be normal). • Probably the most useful of all statistical tests.
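
For small samples the Mann-Whitney U test can be done exactly by enumeration, as in the following sketch. The mouse weights are invented for illustration. The code counts how often a value from one group falls below a value from the other, then asks how many of all possible regroupings of the pooled data give a U at least as far from its expected value.

```python
from itertools import combinations

def u_statistic(xs, ys):
    """Count pairs (x, y) with x < y; ties count one half."""
    return sum(1.0 if x < y else 0.5 if x == y else 0.0 for x in xs for y in ys)

def mann_whitney_exact(xs, ys):
    """U and its two-sided exact P value; practical only for small samples."""
    pooled = list(xs) + list(ys)
    n1 = len(xs)
    centre = n1 * len(ys) / 2                    # expected U under the null hypothesis
    u_obs = u_statistic(xs, ys)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        g1 = [pooled[i] for i in chosen]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(u_statistic(g1, g2) - centre) >= abs(u_obs - centre):
            hits += 1
    return u_obs, hits / total

# Hypothetical weights in grams (made-up data).
homozygotes = [18.2, 19.1, 17.5, 18.8, 17.9]
heterozygotes = [20.4, 21.0, 19.6, 20.9, 20.1]

u, p_value = mann_whitney_exact(homozygotes, heterozygotes)
print(f"U = {u}, exact two-sided P = {p_value:.4f}")
```

Only the ordering of the values matters here, which is why the test does not depend on the data being normally distributed. For larger samples a library routine or a normal approximation would replace the enumeration.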

  37. THINK
