Probability
Principles of probability calculations • Probability values range from 0 to 1. • The probabilities of all outcomes in the sample space add up to 1. • The probability that an event A will not occur is 1 minus the probability of A. • If two events are mutually exclusive (they cannot occur together), the probability that one or the other occurs is the sum of their individual probabilities.
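The four principles can be checked on a single roll of a fair die. The sketch below assumes all six outcomes are equally likely and uses exact fractions so that the sums come out exactly; the events chosen ("roll a 6", "roll a 1", "roll a 2") are illustrative.

```python
from fractions import Fraction

# Sample space of one fair die; each outcome is assumed to be equally likely.
sample_space = [1, 2, 3, 4, 5, 6]
p = {outcome: Fraction(1, 6) for outcome in sample_space}

# Principle 1: every probability lies between 0 and 1.
assert all(0 <= prob <= 1 for prob in p.values())

# Principle 2: the probabilities of all outcomes in the sample space sum to 1.
total = sum(p.values())

# Principle 3: complement rule, P(not A) = 1 - P(A), with A = "roll a 6".
p_not_six = 1 - p[6]

# Principle 4: for mutually exclusive events, P(A or B) = P(A) + P(B).
# "Roll a 1" and "roll a 2" cannot both happen on the same roll.
p_one_or_two = p[1] + p[2]

print(total, p_not_six, p_one_or_two)  # 1 5/6 1/3
```

Note that the addition rule in the last step only holds because the two events exclude each other; for overlapping events the probability of their intersection would have to be subtracted.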
Simple probability Sample space (one roll of a fair die): 1, 2, 3, 4, 5, 6. For any single outcome A, e.g. rolling a 5: P(A) = 1/6 ≈ 0.1667
Joint probability For independent events A and B: P(A,B) = P(A) × P(B). Rolling a 5 and then a 6: P(5,6) = P(5) × P(6) = 0.1667 × 0.1667 ≈ 0.0278 (exactly 1/36)
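The product rule can be cross-checked by brute force: enumerate all ordered outcomes of two fair dice, count the favourable ones, and compare with P(5) × P(6). This is a small sketch that assumes fair, independent dice.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of rolling two fair dice: 36 equally likely pairs.
outcomes = list(product(range(1, 7), repeat=2))

# Joint probability by counting: first die shows 5 AND second die shows 6.
favourable = [pair for pair in outcomes if pair == (5, 6)]
p_joint_counted = Fraction(len(favourable), len(outcomes))

# Joint probability by the product rule, valid because the dice are independent.
p_joint_product = Fraction(1, 6) * Fraction(1, 6)

print(p_joint_counted, round(float(p_joint_product), 4))  # 1/36 0.0278
```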
Joint probability (1) keep the dogs on the beach. Two possible structures: V NP PP (the PP attaches to the verb) or V [NP PP] (the PP attaches to the noun phrase).
Conditional probability Parse 1, PP attached to the verb: [VP keep [NP the dogs] [PP on the beach]]. Rule VP → V NP XP [.15]; subcategorization keep: V NP XP [.81]. P = .15 × .81 ≈ .12
Conditional probability Parse 2, PP attached to the noun phrase: [VP keep [NP [NP the dogs] [PP on the beach]]]. Rules VP → V NP [.39] and NP → NP PP [.14]; subcategorization keep: V NP [.19]. P = .19 × .39 × .14 ≈ .01
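The two parse probabilities for "keep the dogs on the beach" can be reproduced by multiplying the rule and subcategorization probabilities of each parse. This sketch assumes, as the slides imply, that the choices are independent so that a parse's probability is the product of its parts; it also assumes the .39 belongs to the rule VP → V NP. The dictionary keys are just descriptive labels.

```python
from math import prod

# Parse 1: "on the beach" attaches to the verb.
parse_vp_attachment = {
    "VP -> V NP XP": 0.15,
    "keep: V NP XP": 0.81,
}
# Parse 2: "on the beach" attaches to the noun phrase.
parse_np_attachment = {
    "VP -> V NP": 0.39,   # assumed label for the .39 on the slide
    "NP -> NP PP": 0.14,
    "keep: V NP": 0.19,
}

# Under the independence assumption, each parse's probability is a product.
p1 = prod(parse_vp_attachment.values())
p2 = prod(parse_np_attachment.values())
preferred = "VP attachment" if p1 > p2 else "NP attachment"

print(round(p1, 2), round(p2, 2), preferred)  # 0.12 0.01 VP attachment
```

The verb's strong preference for the V NP XP frame (.81) outweighs everything else, so the verb-attachment reading wins by roughly a factor of ten.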
Conditional probability In a corpus containing 12,000 nouns and 3,500 adjectives, 2,000 adjectives are immediately followed by a noun. Given a noun, what is the probability that it is preceded by an adjective? P(ADJ|N) = 2,000 / 12,000 ≈ 0.1667
Conditional probability Given an adjective, what is the probability that it is followed by a noun? P(N|ADJ) = 2,000 / 3,500 ≈ 0.5714
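Both conditional probabilities come from the same count of adjective-noun sequences, divided by the count of whichever category is being conditioned on. A minimal sketch with the corpus figures from the example:

```python
# Counts from the corpus example.
count_nouns = 12_000
count_adjectives = 3_500
count_adj_followed_by_noun = 2_000  # sequences "ADJ N"

# Given a noun, probability that it is preceded by an adjective.
p_adj_given_n = count_adj_followed_by_noun / count_nouns
# Given an adjective, probability that it is followed by a noun.
p_n_given_adj = count_adj_followed_by_noun / count_adjectives

print(round(p_adj_given_n, 4), round(p_n_given_adj, 4))  # 0.1667 0.5714
```

The numerator is identical in both cases; only the conditioning event (the denominator) changes, which is why P(ADJ|N) and P(N|ADJ) differ so much.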
Types of probability distributions • Discrete probability distribution • Continuous probability distribution
Binomial distribution • two possible outcomes on each trial • the outcomes are independent of each other • the probability of each outcome is constant across trials A single trial with these properties is a Bernoulli trial.
Binomial distribution Tossing a coin: one toss gives H or T; two tosses give HH, HT, TH, TT.
Binomial distribution 0 heads = TT 1 head = HT + TH 2 heads = HH
Binomial distribution The random variable (number of heads) maps each outcome in the sample space onto a number: HH → 2, HT → 1, TH → 1, TT → 0.
Three tosses: H, T → HH, HT, TH, TT → HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
Sample space: HHH, HHT, HTH, HTT, THH, THT, TTH, TTT (8 outcomes)
Random variable (number of heads):
0 heads: 1 outcome, 1/8 = 0.125
1 head: 3 outcomes, 3/8 = 0.375
2 heads: 3 outcomes, 3/8 = 0.375
3 heads: 1 outcome, 1/8 = 0.125
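The counts for three coin tosses can be obtained either by enumerating the sample space or directly from the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below does both for n = 3 and a fair coin (p = 0.5); `binomial_pmf` is a helper name introduced here for illustration.

```python
from itertools import product
from math import comb

n, p = 3, 0.5  # three tosses of a fair coin

# Enumerate the sample space and count how many outcomes have k heads.
sample_space = ["".join(t) for t in product("HT", repeat=n)]
counts = {k: sum(o.count("H") == k for o in sample_space) for k in range(n + 1)}

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for k successes in n independent Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

for k in range(n + 1):
    # k, number of outcomes, relative frequency, binomial formula
    print(k, counts[k], counts[k] / len(sample_space), binomial_pmf(k, n, p))
```

Enumeration only works for small n; the formula gives the same values without listing all 2^n outcomes.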
Normal distribution • The mean, median, and mode coincide at the center of the curve. • The curve is symmetrical around the mean. • The tails approach the x-axis but never touch it (they extend to infinity in both directions). • The curve is bell-shaped. • The total area under the curve is equal to 1 (by definition).
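Several of these properties can be checked numerically with Python's `statistics.NormalDist`. The sketch uses the standard normal (mean 0, SD 1) as an illustrative choice and approximates the total area with the trapezoidal rule on [-8, 8], assuming the tails beyond that range are negligible.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)  # standard normal, chosen for illustration

# Symmetry: the density is the same at equal distances on either side of the mean.
symmetric = abs(dist.pdf(1.5) - dist.pdf(-1.5)) < 1e-12

# Mean = median: half of the area lies below the mean.
area_below_mean = dist.cdf(0.0)

# Total area approximated with the trapezoidal rule on [-8, 8].
step = 0.001
xs = [-8 + i * step for i in range(16_001)]
ys = [dist.pdf(x) for x in xs]
area = step * (sum(ys) - (ys[0] + ys[-1]) / 2)

print(symmetric, area_below_mean, round(area, 6))  # True 0.5 1.0
```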
z-scores $z = \frac{x_i - \bar{x}}{SD}$
z-scores Two candidates took two different language tests. Candidate A scored 121 points, candidate B scored 177 points. In the first test (taken by candidate A) the mean was 92 and the standard deviation 14; in the second test (taken by candidate B) the mean was 143 and the standard deviation 21. Which of the two candidates performed better (relative to all the other candidates)?
z_A = (121 - 92) / 14 ≈ 2.07
z_B = (177 - 143) / 21 ≈ 1.62
Candidate A performed better: A's score lies 2.07 standard deviations above the mean of A's test, B's only 1.62 above the mean of B's test.
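The comparison amounts to converting each raw score into a z-score for its own test and comparing the z-scores. A minimal sketch; `z_score` is a helper name introduced here for illustration.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mean) / sd

z_a = z_score(121, mean=92, sd=14)   # candidate A, first test
z_b = z_score(177, mean=143, sd=21)  # candidate B, second test
better = "A" if z_a > z_b else "B"

print(round(z_a, 2), round(z_b, 2), better)  # 2.07 1.62 A
```

The parentheses matter: without them, `121 - 92 / 14` would divide only the mean by the SD.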
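Central limit theorem Twenty rolls of a die: 6, 2, 5, 6, 2, 3, 1, 6, 1, 1, 4, 6, 6, 2, 2, 1, 1, 5, 1, 3. Mean of all 20 rolls: 64 / 20 = 3.2. Split into five samples of four rolls: (6, 2, 5, 6), (2, 3, 1, 6), (1, 1, 4, 6), (6, 2, 2, 1), (1, 5, 1, 3).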
Mean of sample means (4.75 + 3.0 + 3.0 + 2.75 + 2.5) / 5 = 3.2
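The five sample means (4.75, 3.0, 3.0, 2.75, 2.5) match splitting the 20 die rolls into consecutive samples of four; that grouping is an inference, since the slides do not state it explicitly. The sketch reproduces the sample means and shows that their mean equals the mean of all rolls when the samples are of equal size.

```python
from statistics import fmean

rolls = [6, 2, 5, 6, 2, 3, 1, 6, 1, 1, 4, 6, 6, 2, 2, 1, 1, 5, 1, 3]

# Split the 20 rolls into five consecutive samples of four rolls each.
samples = [rolls[i:i + 4] for i in range(0, len(rolls), 4)]
sample_means = [fmean(s) for s in samples]

print(sample_means)         # [4.75, 3.0, 3.0, 2.75, 2.5]
print(fmean(sample_means))  # 3.2
print(fmean(rolls))         # 3.2
```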
Central limit theorem The sample means are approximately normally distributed, even if the phenomenon in the parent population is not normally distributed.
Central limit theorem • The mean of the individual sample means approaches the mean of the true population. • The sample means are (approximately) normally distributed, even if the phenomenon we are studying is not normally distributed in the true population. • All parametric tests rely on the fact that the sample means are normally distributed (given a sufficient number of samples).
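A small simulation illustrates the first two points. The parent population here is an assumption chosen for illustration: an exponential distribution with mean 1 and SD 1, which is strongly right-skewed. The sample size (30) and number of samples (2,000) are likewise illustrative; the code draws many samples, records their means, and checks that the means cluster around the population mean with spread close to σ/√n.

```python
import random
from statistics import fmean, stdev

random.seed(42)  # fixed seed so that repeated runs give the same numbers

# Assumed parent population: exponential with mean 1 and SD 1 (right-skewed).
POP_MEAN = 1.0
POP_SD = 1.0
SAMPLE_SIZE = 30     # observations per sample (illustrative choice)
NUM_SAMPLES = 2_000  # number of samples drawn (illustrative choice)

sample_means = [
    fmean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

# Point 1: the mean of the sample means lies close to the population mean.
mean_of_means = fmean(sample_means)

# The SD of the sample means (standard error) should be close to sigma / sqrt(n).
se_observed = stdev(sample_means)
se_theory = POP_SD / SAMPLE_SIZE ** 0.5

# Point 2: if the sample means are roughly normal, about 95 % of them
# fall within 1.96 standard errors of the population mean.
share_within = sum(
    abs(m - POP_MEAN) <= 1.96 * se_theory for m in sample_means
) / NUM_SAMPLES

print(round(mean_of_means, 3), round(se_observed, 3), round(share_within, 3))
```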
[Figure: population → sample → mean of this sample → distribution of many sample means]
Are your data normally distributed? How many samples do you need before you can assume that the sample means are normally distributed?
Are your data normally distributed? This depends on: • the distribution in the parent population (normal, slightly skewed, heavily skewed) • the number of observations in each individual sample • the total number of individual samples
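The interplay between a skewed parent population and the number of observations per sample can be made visible by measuring the skewness of the sample means. In this sketch the parent is assumed to be exponential (heavily skewed, skewness 2); for means of n observations, theory predicts a skewness of 2/√n, so the distribution of means becomes more symmetric, i.e. closer to normal, as n grows. The sample sizes and the number of samples are illustrative choices.

```python
import random
from statistics import fmean, pstdev

def skewness(values):
    """Third standardised moment: 0 for a symmetric distribution, > 0 for a right skew."""
    m = fmean(values)
    s = pstdev(values, m)
    return fmean([((v - m) / s) ** 3 for v in values])

random.seed(7)
NUM_SAMPLES = 5_000  # number of samples per setting (illustrative choice)

# Assumed heavily skewed parent population: exponential with mean 1.
skew_by_n = {}
for n in (2, 5, 30):  # observations per sample
    means = [
        fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(NUM_SAMPLES)
    ]
    skew_by_n[n] = skewness(means)
    print(n, round(skew_by_n[n], 2))
```

A less skewed parent population would need fewer observations per sample to reach the same degree of symmetry.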
Confidence intervals Confidence intervals indicate the range within which the mean (or another parameter) of the true population lies, given the values of your sample and a chosen degree of certainty (e.g. 95%).
Confidence intervals A confidence interval is computed from: • the mean of the sample means • the SD of the sample means, i.e. the standard error • the degree of certainty with which you want to state the estimate
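In practice there is usually only one sample, so a common approach (assumed here) estimates the standard error as SD/√N and multiplies it by the normal quantile for the chosen degree of certainty. The sketch applies this to the 20 die rolls at 95% confidence. It uses the normal approximation; for a sample this small a t-distribution quantile would be the more usual choice.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

rolls = [6, 2, 5, 6, 2, 3, 1, 6, 1, 1, 4, 6, 6, 2, 2, 1, 1, 5, 1, 3]
confidence = 0.95  # degree of certainty (assumed for the example)

m = fmean(rolls)                                # estimate of the population mean
se = stdev(rolls) / sqrt(len(rolls))            # standard error of the mean
z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 %
lower, upper = m - z * se, m + z * se

print(round(lower, 2), round(upper, 2))
```

Raising the degree of certainty (e.g. to 99%) increases z and therefore widens the interval; a larger sample shrinks the standard error and narrows it.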
Standard deviation $SD = \sqrt{\frac{\sum_{n=1}^{N} (x_n - \bar{x})^2}{N - 1}}$
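The formula translates directly into code: subtract the mean from each value, square, sum, divide by N - 1, and take the square root. The sketch compares a hand-written version (`sample_sd`, a name introduced here) with `statistics.stdev`, which uses the same N - 1 denominator, on the 20 die rolls.

```python
from math import sqrt
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation with the N - 1 denominator."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((x - m) ** 2 for x in values) / (n - 1))

rolls = [6, 2, 5, 6, 2, 3, 1, 6, 1, 1, 4, 6, 6, 2, 2, 1, 1, 5, 1, 3]
print(round(sample_sd(rolls), 3), round(stdev(rolls), 3))
```

Dividing by N instead of N - 1 gives the population SD (`statistics.pstdev`), which slightly underestimates the spread when computed from a sample.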