Some Common Probability Distributions • Gaussian: sum of many independent random numbers. • Uniform: e.g. a single throw of a fair die. • Rayleigh: square root of the sum of the squares of two independent zero-mean Gaussians with equal variance.
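Relationships Among Probability Distributions Assume that N independent values are uniformly distributed. Then: • Their sum is Gaussian (normal) distributed for N sufficiently large. • [expression missing in the source] is [distribution name missing in the source] distributed. • The square root of the sum of the squares of two such Gaussian sums (each with its mean subtracted) is Rayleigh distributed.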
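The relationships above can be checked numerically. The following is a minimal simulation sketch, not part of the original slides: the values of `N`, `TRIALS`, and `g_sigma` are arbitrary assumptions, and uniform values on [0, 1) are assumed.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

N = 12          # uniform values summed per trial (assumed)
TRIALS = 20000  # number of simulated trials (assumed)

# Sum of N uniform [0, 1) values: theoretical mean N/2, variance N/12.
sums = [sum(random.random() for _ in range(N)) for _ in range(TRIALS)]
mu, sigma = N / 2, math.sqrt(N / 12)
print("mean of sums:", statistics.fmean(sums), "theory:", mu)
print("std of sums: ", statistics.pstdev(sums), "theory:", sigma)

# For a Gaussian, about 68.3% of values lie within one sigma of the mean.
within_one_sigma = sum(abs(s - mu) <= sigma for s in sums) / TRIALS
print("fraction within 1 sigma:", within_one_sigma)

# Rayleigh: sqrt(g1^2 + g2^2) for two independent zero-mean Gaussians
# with the same sigma. Its theoretical mean is sigma * sqrt(pi / 2).
g_sigma = 1.0
radii = [
    math.hypot(random.gauss(0, g_sigma), random.gauss(0, g_sigma))
    for _ in range(TRIALS)
]
rayleigh_mean = statistics.fmean(radii)
print("mean radius:", rayleigh_mean, "theory:", g_sigma * math.sqrt(math.pi / 2))
```

Increasing `N` makes the distribution of the sums look more Gaussian; with `N = 1` the sums are simply uniform.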
What could you conclude if you made a single measurement and the value fell as follows on the expected distribution?
What if three values fell as follows on the expected distribution?
If two data sets gave the following measurements, would you conclude that they came from different distributions? What if the data looked like this?
How confident are you that the data sets in each plot below come from different distributions? [Plot panels: lower standard deviation; smaller difference in means; fewer data points]
Student's t-test measures the confidence you can have that two means are inherently different, based on three parameters: • Difference of the means • Standard deviations • Number of data points obtained • Particularly useful when there are multiple confounding variables. • E.g. blood pressure drugs: are we, on average, lowering blood pressure?
Student's t-test is used to answer the following question: • Given: • Difference of the means • Standard deviations • Number of data points obtained • That these data come from normal distributions • How likely is a difference in means at least this large if both data sets came from the same underlying distribution? This probability is p; a small p is evidence that the underlying distributions differ.
Example: Given the mean and standard deviation of blood pressure, along with the number of points measured in a clinical drug trial, what is p, and does it indicate that the drug had an effect on the distribution (i.e. that it changed the blood pressure of these individuals on average)? • Sample mean: the mean of the sample that was taken (the 2000 people in the drug trial). • Distribution mean: the mean that would occur if you could give the drug to everyone in the world and do the measurement.
Statistical Tests You Should Know • T-test: Are the means of two data sets the same? • F-test: Are the variances (and hence standard deviations) of two data sets the same? • Chi-Squared Test: Does the distribution of a data set match a proposed distribution? • ANOVA: Like a t-test for more than two groups; it uses an F statistic to compare several means at once. • Pearson's Correlation Coefficient: Is one variable linearly related to another?
To Run a T-Test • Calculate the mean of each data set. • Calculate the standard deviation of each data set. • Determine the t statistic from the means, standard deviations, and numbers of points. • From t determine p. • p is "the probability that you would get a difference in means this large or larger, given that the two measurement sets come from the same distribution."
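The steps above can be sketched in code. The slides do not say which form of the t statistic they use, so this sketch assumes Welch's unequal-variance version, t = (mean_a - mean_b) / sqrt(s_a^2/n_a + s_b^2/n_b), and a two-tailed p. The helper names and the blood pressure readings are hypothetical, made up for illustration.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Probability of |T| >= |t| under the null hypothesis.

    Integrates the density from 0 to |t| with Simpson's rule
    (steps must be even) and subtracts the central area from 1.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3
    return max(0.0, 1.0 - 2.0 * central_area)


def welch_t_test(a, b):
    """Return (t, degrees of freedom, two-tailed p) for two samples."""
    # Steps 1 and 2: mean and sample variance of each data set.
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    # Step 3: the t statistic (Welch's form, an assumption of this sketch).
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    # Step 4: from t determine p.
    return t, df, two_tailed_p(t, df)


# Hypothetical systolic blood pressure readings (mmHg).
treated = [128, 131, 125, 130, 127, 129, 126, 132]
control = [135, 138, 133, 136, 139, 134, 137, 140]
t_stat, dof, p_value = welch_t_test(treated, control)
print(f"t = {t_stat:.3f}, df = {dof:.1f}, p = {p_value:.2g}")
```

In practice a statistics library would supply the t distribution directly; the numerical integration here only keeps the sketch free of third-party dependencies.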
Interpretation of T test • You set the value of p that you consider significant. • Medical applications: p < 0.05 is "significant." • Since p < 0.05 is a 1/20 probability, when there is truly no difference you will still declare a significant result about once in every 20 t-tests.
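The 1-in-20 rate compounds when several tests are run. A short sketch of the arithmetic, assuming the tests are independent and that the null hypothesis is true in every one of them (the function name is hypothetical):

```python
ALPHA = 0.05  # significance threshold from the slide


def chance_of_any_false_positive(num_tests, alpha=ALPHA):
    """Probability that at least one of num_tests independent tests
    gives p < alpha even though the null hypothesis is true in all."""
    return 1.0 - (1.0 - alpha) ** num_tests


for k in (1, 5, 20):
    print(k, "tests:", round(chance_of_any_false_positive(k), 3))
```

With 20 such tests the expected number of false positives is 20 x 0.05 = 1, and the chance of seeing at least one is roughly 64%.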
Hypothesis • Null hypothesis: statement that the two distributions are the same. i.e. “Altase causes no change in blood pressure.” • Alternative hypothesis: Can vary. • Altase reduces the mean blood pressure. • Altase changes the mean blood pressure.
One-Tailed vs Two-Tailed • Depends on the "alternative hypothesis." • One-tailed: if the alternative hypothesis is that one particular mean is greater than the other. • Two-tailed: if the alternative hypothesis is that the means are different. • Saying that one of the means is greater is more restrictive. • The confidence you have in your result depends on the prediction (1st law of the frisbee).
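To make the difference concrete, here is a minimal sketch that computes both p-values for the same standardized difference. It assumes a large-sample normal approximation instead of the t distribution, and it assumes the one-tailed alternative predicted a positive difference; the function name and the value 1.8 are illustrative only.

```python
from statistics import NormalDist


def tail_p_values(z):
    """Return (one_tailed, two_tailed) p-values for a standardized
    difference z, using a standard normal approximation.

    The one-tailed value assumes the alternative hypothesis
    predicted a positive difference.
    """
    standard_normal = NormalDist()
    one_tailed = 1.0 - standard_normal.cdf(z)
    two_tailed = 2.0 * (1.0 - standard_normal.cdf(abs(z)))
    return one_tailed, two_tailed


one, two = tail_p_values(1.8)
print(f"one-tailed p = {one:.3f}, two-tailed p = {two:.3f}")
```

For z = 1.8 the one-tailed p is about 0.036 and the two-tailed p about 0.072, so the same data are significant at 0.05 only if the direction was predicted in advance.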
Example • A friend throws a frisbee. It bounces off a pole, goes to the roof of the house, rolls along an arc, flips off the gutter, and then lands in the fountain. Are you impressed? • A friend predicts that the frisbee will do the above, and then it happens. Are you impressed? • As with the frisbee, statistical analysis depends on how far you are willing to stick your neck out.
Confidence Interval • States the range of values that contains the true value within a given percent confidence. • Depends on number of samples, and desired confidence. • Not a statistical test of significance, but related to the t-test. • The more samples we have, the narrower the confidence interval.
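The dependence on sample count can be sketched as follows. This assumes a large-sample normal approximation (mean ± z·s/√n) instead of the t-based interval, and the mean of 130 and standard deviation of 10 are hypothetical numbers.

```python
import math
from statistics import NormalDist


def confidence_interval(mean, stdev, n, confidence=0.95):
    """Approximate confidence interval for a mean.

    Uses the normal approximation, which is reasonable for large n;
    small samples would call for the t distribution instead.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev / math.sqrt(n)
    return mean - half_width, mean + half_width


# Same hypothetical mean and standard deviation, increasing sample counts.
for n in (25, 100, 400):
    low, high = confidence_interval(130.0, 10.0, n)
    print(f"n = {n:3d}: {low:.2f} to {high:.2f} (width {high - low:.2f})")
```

Because the half-width scales as 1/√n, quadrupling the number of samples halves the width of the interval.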