580.691 Learning Theory
Reza Shadmehr
Review of Probability Theory
Suggested reading: Hoel PG, Port SC, and Stone CJ (1971) Introduction to Probability Theory, Houghton-Mifflin publishers, Boston, pp. 14-128.
A Philosophical Essay on Probabilities (1814)
Pierre Simon de Laplace (1749-1827)
• “Probability is the ratio of the number of favorable cases to that of all cases possible.”
• Suppose we throw a coin twice. What is the probability that we will throw exactly one head?
• There are four equally possible cases that might arise:
    • One head and one tail.
    • One tail and one head.
    • Two tails.
    • Two heads.
• So there are 2 cases that give us exactly one head. The probability that we seek is 2/4.

Laplace firmly believed that, in reality, every event is fully determined by general laws of the universe. But nature is complex and we are woefully ignorant of her ways; we must therefore calculate probabilities to compensate for our limitations. Events, in other words, are probable only relative to our meager knowledge. In an epigram that has defined strict determinism ever since, Laplace boasted that if anyone could provide a complete account of the position and motion of every particle in the universe at any single moment, then total knowledge of nature’s laws would permit a full determination of all future history.

Laplace directly links the need for a theory of probability to human ignorance of nature’s deterministic ways. He writes: “So it is that we owe to the weakness of the human mind one of the most delicate and ingenious of mathematical theories, the science of chance or probability.” (Analytical Theory of Probabilities, as cited by Stephen J. Gould, Dinosaurs in a Haystack, p. 27.)
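Laplace's counting definition in the coin example can be checked by listing every equally possible case and counting the favorable ones. The following is a minimal sketch; the function name `prob_exactly_k_heads` is illustrative and not part of the lecture.

```python
from fractions import Fraction
from itertools import product


def prob_exactly_k_heads(n_tosses: int, k: int) -> Fraction:
    """Laplace's ratio: favorable cases / all equally possible cases."""
    # Every sequence of heads (H) and tails (T), e.g. ('H', 'T').
    outcomes = list(product("HT", repeat=n_tosses))
    # Keep only the sequences that contain exactly k heads.
    favorable = [o for o in outcomes if o.count("H") == k]
    return Fraction(len(favorable), len(outcomes))


# Two tosses, exactly one head: 2 favorable cases out of 4.
print(prob_exactly_k_heads(2, 1))
```

`Fraction` reduces 2/4 to lowest terms, so the printed value is the same probability written as 1/2.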
“If events are independent of one another, the probability of their combined existence is the product of their respective probabilities.”
Suppose we throw two dice at once. The probability of getting “snake eyes” (two ones) is 1/36.

“The probability that a simple event in the same circumstances will occur consecutively a given number of times is equal to the probability of this simple event raised to the power indicated by this number.”

“Suppose that an incident be transmitted to us by twenty witnesses in such a manner that the first has transmitted it to the second, the second to the third, and so on. Suppose again that the probability of each testimony be equal to the fraction 9/10. That of the incident resulting from the testimonies will be less than 1/8. Many historical events reputed as certain would be at least doubtful if they were submitted to this test.”
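Both numbers quoted here follow from the product rule for independent events. A short sketch that reproduces them with exact fractions (the helper name `prob_all_occur` is an illustrative choice):

```python
from fractions import Fraction


def prob_all_occur(p: Fraction, n: int) -> Fraction:
    """Probability that n independent events, each with probability p, all occur."""
    return p ** n


# Snake eyes: each die shows a one with probability 1/6.
snake_eyes = prob_all_occur(Fraction(1, 6), 2)

# Chain of twenty testimonies, each reliable with probability 9/10.
chain = prob_all_occur(Fraction(9, 10), 20)

print(snake_eyes)
print(float(chain), chain < Fraction(1, 8))
```

The chain probability is (9/10)^20, roughly 0.12, which is indeed below Laplace's bound of 1/8 = 0.125.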
“When two events depend upon each other, the probability of the compound event is the product of the probability of the first event and the probability that, this event having occurred, the second will occur.”

Suppose we have three urns, labeled A, B, and C. Two of the urns have only white balls, and one urn has only black balls. We take one ball from urn C. What is the probability that it is white?
Writing 1 for an urn of white balls and 0 for the urn of black balls, the equally possible arrangements are (A,B,C) = (1,1,0) or (0,1,1) or (1,0,1).
The probability of picking a white ball from urn C is 2/3.
When a white ball has been drawn from urn C, the probability of drawing a white ball from urn B is 1/2.
Therefore, the probability of drawing two white balls from urns B and C is 1/3.

Bayes’ rule
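The formula that accompanied the "Bayes' rule" heading did not survive in this transcript. As a hedged reconstruction in standard notation, here is the product rule Laplace states, its application to the urn example, and the usual form of Bayes' rule that follows from it. The symbols $W_B$ and $W_C$ (a white ball drawn from urn B or C) are introduced here for illustration.

```latex
\begin{align}
P(A, B) &= P(A)\,P(B \mid A) \\
P(W_C, W_B) &= P(W_C)\,P(W_B \mid W_C) = \tfrac{2}{3}\cdot\tfrac{1}{2} = \tfrac{1}{3} \\
P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)}
\end{align}
```

The third line comes from writing the product rule both ways, $P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$, and dividing by $P(B)$.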
Example: Suppose that in a group of people, 40% are male and 60% are female, and that 50% of the males and 30% of the females smoke. Find the probability that a smoker is male.
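The smoker example can be worked through numerically with Bayes' rule. This is a minimal sketch using the percentages given in the example; the function and dictionary names are illustrative.

```python
from fractions import Fraction


def posterior(prior: dict, likelihood: dict, hypothesis: str) -> Fraction:
    """Bayes' rule: P(h | data) = P(data | h) P(h) / sum over h' of P(data | h') P(h')."""
    evidence = sum(prior[h] * likelihood[h] for h in prior)
    return prior[hypothesis] * likelihood[hypothesis] / evidence


# P(male), P(female) in the group.
prior = {"male": Fraction(40, 100), "female": Fraction(60, 100)}
# P(smoker | male), P(smoker | female).
p_smoke = {"male": Fraction(50, 100), "female": Fraction(30, 100)}

p_male_given_smoker = posterior(prior, p_smoke, "male")
print(p_male_given_smoker, float(p_male_given_smoker))
```

The overall probability of being a smoker is 0.4 × 0.5 + 0.6 × 0.3 = 0.38, so the answer is 0.20 / 0.38 = 10/19, a little over one half.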
Binomial distribution and discrete random variables
Suppose a random variable can take only one of two values (e.g., 0 and 1, success and failure, etc.). Trials with such outcomes are termed Bernoulli trials.
Probability density or distribution
Probability of a specific sequence of successes and failures
Why?
If we just do one trial:
[Figure: two panels, each with axis ticks at 0 and 1]
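The equations for this slide are missing from the transcript. The standard binomial results that the labels above appear to refer to are sketched below, assuming $p$ is the probability of success on one trial, $N$ the number of independent trials, and $n$ the number of successes (this notation is an assumption, not taken from the slides).

```latex
\begin{align}
P(\text{one specific sequence with } n \text{ successes}) &= p^{n}(1-p)^{N-n} \\
P(n \mid N, p) &= \binom{N}{n}\, p^{n}(1-p)^{N-n},
\qquad \binom{N}{n} = \frac{N!}{n!\,(N-n)!} \\
E[n] &= Np, \qquad \operatorname{var}[n] = Np(1-p)
\end{align}
```

The binomial coefficient counts how many distinct sequences contain exactly $n$ successes, which is one way to answer the "Why?" above. For a single trial ($N = 1$) the mean is $p$ and the variance is $p(1-p)$.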
Example of Binomial distribution
Suppose a machine produces light bulbs with 0.1% probability that a bulb is defective. In a box of 200 bulbs, what is the probability that no bulbs are defective?
In the same box, what is the probability that no more than two bulbs are defective?
We notice that because a defect is a rare event, the distribution of n (i.e., number of defects) has its peak at zero and then declines very rapidly.
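The numerical answers to the light-bulb questions are not preserved in this transcript. They can be computed from the binomial distribution, as in the following sketch, which assumes defects are independent across bulbs (the function name `binomial_pmf` is illustrative).

```python
from math import comb


def binomial_pmf(n: int, N: int, p: float) -> float:
    """Probability of exactly n successes in N independent trials."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)


N, p = 200, 0.001  # 200 bulbs, 0.1% defect probability per bulb

p_none = binomial_pmf(0, N, p)
p_at_most_two = sum(binomial_pmf(n, N, p) for n in range(3))

print(round(p_none, 4), round(p_at_most_two, 4))
```

The probability of no defects is about 0.82 and the probability of at most two defects is about 0.999, which matches the remark that the distribution peaks at zero and falls off quickly.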
Example of Binomial distribution: Blindsight
John C. Marshall & Peter W. Halligan, Nature 336, 766-767, 1988
The patient, P.S., had sustained right cerebral damage and failed overtly to process information in the hemispace contralateral to the lesion. In common with most patients who manifest left-sided neglect, P.S. has a left homonymous hemianopia. Nonetheless, her neglect persists despite free movement of the head and eyes and is thus not a direct consequence of sensory loss in the left visual field.
P.S. was presented simultaneously with two line drawings of a house, in one of which the left side was on fire. She judged that the drawings were identical; yet when asked to select which house she would prefer to live in, she reliably chose the house that was not burning.
She was shown 17 examples of the house, and on 14 trials she picked the one that was not on fire. How do we know if this is “reliably” different from chance?
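[Figure: distribution of the number of times she picked the house that was not on fire, with chance performance marked]
Under chance (p = 0.5, 17 trials) the expected count is 8.5 with an SD of about 2.06, so her 14 choices were about 2.7 SD away from chance. Is that significant?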
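One way to answer the significance question is to compute, under the chance hypothesis p = 0.5, how far 14 out of 17 lies from the expected count and how likely a result at least that extreme is. The sketch below uses a one-sided binomial tail; the choice of a one-sided test is an assumption made here, not something stated in the slides.

```python
from math import comb, sqrt


def binomial_upper_tail(k: int, N: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(N, p)."""
    return sum(comb(N, n) * p**n * (1 - p) ** (N - n) for n in range(k, N + 1))


N, k, p = 17, 14, 0.5

mean = N * p                 # expected count under chance: 8.5
sd = sqrt(N * p * (1 - p))   # standard deviation under chance
z = (k - mean) / sd          # distance from chance in SD units
p_tail = binomial_upper_tail(k, N, p)

print(round(z, 2), round(p_tail, 4))
```

With these numbers the observed count lies roughly 2.7 SD above chance, and the tail probability is 834/131072, well under 1%.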
Poisson distribution and its relation to the binomial distribution
If n is a random variable distributed as a Poisson with parameter λ, then:
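The equations that followed "then:" are missing from the transcript. The standard Poisson results, and the limit that relates them to the binomial distribution, are given below as a hedged reconstruction; $N$ and $p$ denote the binomial number of trials and success probability, as assumed earlier.

```latex
\begin{align}
P(n \mid \lambda) &= \frac{\lambda^{n} e^{-\lambda}}{n!}, \qquad n = 0, 1, 2, \dots \\
E[n] &= \lambda, \qquad \operatorname{var}[n] = \lambda \\
\lim_{\substack{N \to \infty \\ Np = \lambda}} \binom{N}{n} p^{n}(1-p)^{N-n}
  &= \frac{\lambda^{n} e^{-\lambda}}{n!}
\end{align}
```

For the light-bulb example, $\lambda = Np = 200 \times 0.001 = 0.2$, so the Poisson approximation gives $P(0) = e^{-0.2} \approx 0.8187$, very close to the exact binomial value of about 0.8186.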
Continuous random variables: Normal distribution
Scalar random variable
A normal distribution has 95% of its area in the range μ ± 1.96σ.
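The density formula for this slide is not preserved in the transcript. For reference, the standard scalar normal density with mean $\mu$ and variance $\sigma^{2}$, together with the 95% interval, can be written as:

```latex
\begin{align}
p(x) &= \frac{1}{\sqrt{2\pi\sigma^{2}}}
        \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) \\
\int_{\mu - 1.96\sigma}^{\mu + 1.96\sigma} p(x)\,dx &\approx 0.95
\end{align}
```

The factor 1.96 is often rounded to 2, which gives an interval containing slightly more than 95% of the area.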
Continuous random variables: Normal distribution
Vector random variable
To take the expected value of a vector or a matrix, we take the expected value of the individual elements.
When x is a vector, the variance is expressed in terms of a covariance matrix C, where ρ_ij corresponds to the degree of correlation between variables x_i and x_j.
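The matrix expressions for this slide were lost in extraction. A hedged reconstruction in standard notation follows, assuming $\mathbf{x}$ has $n$ components with standard deviations $\sigma_i$ and mean vector $\boldsymbol{\mu}$ (these symbols are assumptions; only $C$ and $\rho_{ij}$ appear in the text).

```latex
\begin{align}
E[\mathbf{x}] &= \begin{bmatrix} E[x_1] \\ \vdots \\ E[x_n] \end{bmatrix}, \qquad
C = \begin{bmatrix}
\sigma_1^{2} & \rho_{12}\sigma_1\sigma_2 & \cdots & \rho_{1n}\sigma_1\sigma_n \\
\rho_{21}\sigma_2\sigma_1 & \sigma_2^{2} & \cdots & \rho_{2n}\sigma_2\sigma_n \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{n1}\sigma_n\sigma_1 & \rho_{n2}\sigma_n\sigma_2 & \cdots & \sigma_n^{2}
\end{bmatrix} \\
p(\mathbf{x}) &= \frac{1}{(2\pi)^{n/2}\,|C|^{1/2}}
\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T} C^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)
\end{align}
```

Because $\rho_{ij} = \rho_{ji}$, the matrix $C$ is symmetric, and its diagonal holds the variances of the individual components.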
[Figure: scatter of data over axes from -6 to 6, with ellipses representing regions of constant probability density. Data fall inside the ellipses with 25%, 50%, and 75% probability.]
Observations about the data:
1. Variance of x2 is greater than that of x1.
2. x1 and x2 have a negative correlation.
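The probability ellipses in the figure can be illustrated by simulation. For a two-dimensional normal distribution, the squared Mahalanobis distance of a sample follows a chi-square distribution with 2 degrees of freedom, so the ellipse containing a fraction q of the data is the set where that distance is at most -2 ln(1 - q). The sketch below uses hypothetical parameters (standard deviations 1 and 2, correlation -0.5) chosen only to match the two qualitative observations above; they are not taken from the slide.

```python
import math
import random


def sample_point(s1, s2, rho, rng):
    """Draw (x1, x2) from a zero-mean bivariate normal with SDs s1, s2 and correlation rho."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    x1 = s1 * z1
    x2 = s2 * (rho * z1 + math.sqrt(1 - rho**2) * z2)
    return x1, x2


def mahalanobis_sq(x1, x2, s1, s2, rho):
    """Squared Mahalanobis distance of (x1, x2) from the origin under the same covariance."""
    a, b = x1 / s1, x2 / s2
    return (a * a - 2 * rho * a * b + b * b) / (1 - rho**2)


def fraction_inside(q, points, s1, s2, rho):
    """Fraction of points falling inside the ellipse that should contain probability q."""
    threshold = -2.0 * math.log(1.0 - q)  # chi-square (2 dof) quantile at q
    inside = sum(mahalanobis_sq(x1, x2, s1, s2, rho) <= threshold for x1, x2 in points)
    return inside / len(points)


# Hypothetical parameters: var(x2) > var(x1) and a negative correlation.
S1, S2, RHO = 1.0, 2.0, -0.5
rng = random.Random(0)
points = [sample_point(S1, S2, RHO, rng) for _ in range(20000)]

for q in (0.25, 0.50, 0.75):
    print(q, round(fraction_inside(q, points, S1, S2, RHO), 3))
```

Each printed fraction should land close to its target q, with small deviations due to sampling noise.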
Variance of scalar and vector random variables
The variance of a vector random variable is a symmetric positive semidefinite matrix; it is positive definite unless some linear combination of the components has zero variance.
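The definitions behind this slide are not preserved in the transcript. In standard notation, with $\mathbf{a}$ an arbitrary constant vector introduced here for the argument, they can be sketched as:

```latex
\begin{align}
\operatorname{var}[x] &= E\!\left[(x - E[x])^{2}\right] = E[x^{2}] - E[x]^{2} \\
\operatorname{var}[\mathbf{x}] &= E\!\left[(\mathbf{x} - E[\mathbf{x}])(\mathbf{x} - E[\mathbf{x}])^{T}\right]
  = E[\mathbf{x}\mathbf{x}^{T}] - E[\mathbf{x}]\,E[\mathbf{x}]^{T} \\
\operatorname{cov}[\mathbf{x}, \mathbf{y}] &= E\!\left[(\mathbf{x} - E[\mathbf{x}])(\mathbf{y} - E[\mathbf{y}])^{T}\right] \\
\mathbf{a}^{T} \operatorname{var}[\mathbf{x}]\, \mathbf{a} &= \operatorname{var}[\mathbf{a}^{T}\mathbf{x}] \;\ge\; 0
\end{align}
```

The last line shows why the variance matrix cannot have negative eigenvalues: every quadratic form in it is the variance of a scalar, which is never negative. Symmetry follows directly from the outer-product form in the second line.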