
Intro to Probability


Presentation Transcript


  1. Intro to Probability Slides from Professor Pan, Yan, SYSU

  2. Probability Theory Example of a random experiment • We poll 60 users who are using one of two search engines and record the following: [scatter plot: each point corresponds to one of 60 users; vertical axis: which of the two search engines was used; horizontal axis X = 0,…,8: number of “good hits” returned by the search engine]

  3. Probability Theory Random variables • X and Y are called random variables • Each has its own sample space: • S_X = {0,1,2,3,4,5,6,7,8} • S_Y = {1,2}

  4. Probability Theory Probability • P(X=i,Y=j) is the probability (relative frequency) of observing X=i and Y=j • P(X,Y) refers to the whole table of probabilities • Properties: 0 ≤ P ≤ 1, Σ P = 1 • The table of P(X=i,Y=j):
      X:    0     1     2     3     4     5     6     7     8
      Y=1:  0/60  0/60  0/60  1/60  4/60  5/60  8/60  6/60  2/60
      Y=2:  3/60  6/60  8/60  8/60  5/60  3/60  1/60  0/60  0/60
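
A minimal NumPy sketch (mine, not from the slides) that builds this joint probability table from the user counts; the variable names, and the assignment of the two count rows to Y=1 and Y=2, are my assumptions.

```python
import numpy as np

# Counts of users for each (X = number of good hits, Y = search engine) pair,
# out of 60 polled users, as read off the table above.
counts = np.array([
    [0, 0, 0, 1, 4, 5, 8, 6, 2],   # Y = 1 (assumed row label)
    [3, 6, 8, 8, 5, 3, 1, 0, 0],   # Y = 2 (assumed row label)
])

# Joint probability P(X=i, Y=j) as a relative frequency.
P_XY = counts / counts.sum()

assert np.all((P_XY >= 0) & (P_XY <= 1))   # property: 0 <= P <= 1
assert np.isclose(P_XY.sum(), 1.0)         # property: probabilities sum to 1
print(P_XY[1, 3])                          # P(X=3, Y=2) = 8/60
```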

  5. Probability Theory Marginal probability • P(X=i) is the marginal probability that X=i, i.e., the probability that X=i, ignoring Y • [plots of the marginals P(X) and P(Y)]

  6. SUM RULE Probability Theory Marginal probability • P(X=i) is the marginal probability that X=i, i.e., the probability that X=i, ignoring Y • From the table: P(X=i) = Σ_j P(X=i,Y=j) • Summing the columns gives P(X) = (3/60, 6/60, 8/60, 9/60, 9/60, 8/60, 9/60, 6/60, 2/60) for X = 0,…,8; summing the rows gives P(Y=1) = 26/60 and P(Y=2) = 34/60 • Note that Σ_i P(X=i) = 1 and Σ_j P(Y=j) = 1
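
Continuing the sketch above (P_XY as defined there), the sum rule is a one-line reduction over the table:

```python
# Sum rule applied to the joint table from the previous sketch.
P_X = P_XY.sum(axis=0)   # P(X=i) = sum_j P(X=i, Y=j)
P_Y = P_XY.sum(axis=1)   # P(Y=j) = sum_i P(X=i, Y=j)

print(P_X * 60)          # counts 3, 6, 8, 9, 9, 8, 9, 6, 2
print(P_Y * 60)          # counts 26 (Y=1) and 34 (Y=2)
assert np.isclose(P_X.sum(), 1.0) and np.isclose(P_Y.sum(), 1.0)
```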

  7. Probability Theory Conditional probability • P(X=i|Y=j) is the probability that X=i, given that Y=j • From the table: P(X=i|Y=j) = P(X=i,Y=j) / P(Y=j) • [plots of P(X|Y=1) and P(Y=1)]

  8. Probability Theory Conditional probability • How about the opposite conditional probability, P(Y=j|X=i)? • P(Y=j|X=i) = P(X=i,Y=j) / P(X=i) • For example, P(Y=2|X=3) = (8/60) / (9/60) = 8/9 • Note that Σ_j P(Y=j|X=i) = 1
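
Both conditional tables follow by dividing the joint table by the appropriate marginal (continuing the sketch above):

```python
# Conditional probabilities from the joint and marginal tables above.
P_X_given_Y = P_XY / P_Y[:, None]   # P(X=i | Y=j); each row sums to 1
P_Y_given_X = P_XY / P_X[None, :]   # P(Y=j | X=i); each column sums to 1

print(P_Y_given_X[1, 3])            # P(Y=2 | X=3) = (8/60)/(9/60) = 8/9
assert np.allclose(P_X_given_Y.sum(axis=1), 1.0)
assert np.allclose(P_Y_given_X.sum(axis=0), 1.0)
```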

  9. Summary of types of probability • Joint probability: P(X,Y) • Marginal probability (ignore other variable): P(X) and P(Y) • Conditional probability (condition on the other variable having a certain value): P(X|Y) and P(Y|X)

  10. PRODUCT RULE Probability Theory Constructing joint probability • Suppose we know • the probability that the user will pick each search engine, P(Y=j), and • for each search engine, the probability of each number of good hits, P(X=i|Y=j) • Can we construct the joint probability, P(X=i,Y=j)? • Yes. Rearranging P(X=i|Y=j) = P(X=i,Y=j) / P(Y=j), we get P(X=i,Y=j) = P(X=i|Y=j) P(Y=j)
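
The product rule then reconstructs the joint table exactly (continuing the sketch above):

```python
# Product rule: P(X=i, Y=j) = P(X=i | Y=j) * P(Y=j), rebuilt from the pieces.
P_XY_rebuilt = P_X_given_Y * P_Y[:, None]
assert np.allclose(P_XY_rebuilt, P_XY)   # matches the original joint table
```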

  11. Summary of computational rules • SUM RULE: P(X) = Σ_Y P(X,Y) and P(Y) = Σ_X P(X,Y) • Notation: we write P(X,Y) as shorthand for P(X=i,Y=j) • PRODUCT RULE: P(X,Y) = P(X|Y) P(Y) and P(X,Y) = P(Y|X) P(X)

  12. Ordinal variables • In our example, X has a natural order 0…8 • X is a number of hits, and for the ordering of the columns in the table, nearby X’s have similar probabilities • Y does not have a natural order

  13. Probabilities for real numbers • Can’t we treat real numbers as IEEE doubles with 2^64 possible values? • Hah, hah. No! • How about quantizing real variables to a reasonable number of values? • Sometimes works, but… • We need to carefully account for ordinality • Doing so can lead to cumbersome mathematics

  14. Probability theory for real numbers • Quantize X using bins of width Δ • Then, X ∈ {…, −2Δ, −Δ, 0, Δ, 2Δ, …} • Define P_Δ(X=x) = probability that x < X ≤ x+Δ • Problem: P_Δ(X=x) depends on the choice of Δ • Solution: let Δ → 0 • Problem: in that case, P_Δ(X=x) → 0 • Solution: define a probability density P(x) = lim_{Δ→0} P_Δ(X=x)/Δ = lim_{Δ→0} (probability that x < X ≤ x+Δ)/Δ
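
A small self-contained simulation (my own illustration, not from the slides) shows P_Δ(X=x)/Δ settling toward the density as Δ shrinks:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=1_000_000)  # draws from a known density

# Estimate P_delta(X = x) / delta for shrinking bin widths delta.
x = 0.5
for delta in (1.0, 0.1, 0.01):
    in_bin = (samples > x) & (samples <= x + delta)
    print(delta, in_bin.mean() / delta)  # approaches N(0.5 | 0, 1) ~ 0.352
```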

  15. Probability theory for real numbers Probability density • Suppose P(x) is a probability density • Properties: • P(x) ≥ 0 • It is NOT necessary that P(x) ≤ 1 • ∫_x P(x) dx = 1 • Probabilities of intervals: P(a < X ≤ b) = ∫_{x=a}^{b} P(x) dx
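
The two key properties (a density may exceed 1 pointwise, but must integrate to 1) can be checked numerically; a sketch using an assumed narrow Gaussian:

```python
import numpy as np

def p(x, mu=0.0, sigma=0.2):
    # Gaussian density; with sigma = 0.2 the peak is ~2.0 > 1, which is
    # allowed for a density (only the total integral must equal 1).
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

xs = np.linspace(-5, 5, 100_001)
dx = xs[1] - xs[0]
print(p(0.0))                    # peak ~ 1.995, fine even though > 1
print((p(xs) * dx).sum())        # total mass ~ 1.0
a, b = -0.2, 0.2
mask = (xs > a) & (xs <= b)
print((p(xs[mask]) * dx).sum())  # P(a < X <= b) ~ 0.683 (one sigma)
```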

  16. Probability theory for real numbers Joint, marginal and conditional densities • Suppose P(x,y) is a joint probability density • ∫_x ∫_y P(x,y) dx dy = 1 • For a region R of the (x,y) plane, P((X,Y) ∈ R) = ∫∫_R P(x,y) dx dy • Marginal density: P(x) = ∫_y P(x,y) dy • Conditional density: P(x|y) = P(x,y) / P(y)
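
Marginal and conditional densities can likewise be approximated on a grid; a sketch using an assumed standard 2-D Gaussian as P(x,y):

```python
import numpy as np

def p_xy(x, y):
    # An example joint density: independent standard Gaussians in x and y.
    return np.exp(-(x**2 + y**2) / 2) / (2 * np.pi)

xs = np.linspace(-6, 6, 1201)          # grid over a region of the (x, y) plane
dx = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs, indexing="ij")
P = p_xy(X, Y)

print((P * dx * dx).sum())             # double integral ~ 1
p_x = (P * dx).sum(axis=1)             # marginal: p(x) = integral of p(x,y) dy
p_y = (P * dx).sum(axis=0)             # marginal: p(y)
j = 600                                # grid index where y = 0
p_x_given_y = P[:, j] / p_y[j]         # conditional: p(x | y=0) = p(x,y)/p(y)
print((p_x_given_y * dx).sum())        # integrates to ~ 1
```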

  17. The Gaussian distribution • N(x|μ,σ²) = (1/√(2πσ²)) exp(−(x−μ)²/(2σ²)) • μ is the mean and σ is the standard deviation

  18. Mean and variance • The mean of X is E[X] = Σ_X X P(X) or E[X] = ∫_x x P(x) dx • The variance of X is VAR(X) = Σ_X (X − E[X])² P(X) or VAR(X) = ∫_x (x − E[X])² P(x) dx • The std dev of X is STD(X) = √VAR(X) • The covariance of X and Y is COV(X,Y) = Σ_X Σ_Y (X−E[X])(Y−E[Y]) P(X,Y) or COV(X,Y) = ∫_x ∫_y (x−E[X])(y−E[Y]) P(x,y) dx dy
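
Applied to the discrete search-engine table from the earlier sketch (P_XY, P_X, P_Y as defined there), these definitions become weighted sums:

```python
# Mean, variance, std and covariance for the discrete table defined earlier.
x_vals = np.arange(9)                 # X takes values 0..8
y_vals = np.array([1, 2])             # Y takes values 1, 2

E_X = (x_vals * P_X).sum()
VAR_X = ((x_vals - E_X) ** 2 * P_X).sum()
STD_X = np.sqrt(VAR_X)
E_Y = (y_vals * P_Y).sum()
COV_XY = ((y_vals[:, None] - E_Y) * (x_vals[None, :] - E_X) * P_XY).sum()
print(E_X, VAR_X, STD_X, COV_XY)
```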

  19. Mean and variance of the Gaussian • E[X] = μ • VAR(X) = σ² • STD(X) = σ

  20. How can we use probability as a framework for machine learning?

  21. Maximum likelihood estimation • Say we have a density P(x|θ) with parameter θ • The likelihood of a set of independent and identically distributed (i.i.d.) data x = (x1,…,xN) is P(x|θ) = Π_{n=1}^N P(x_n|θ) • The log-likelihood is L = ln P(x|θ) = Σ_{n=1}^N ln P(x_n|θ) • The maximum likelihood (ML) estimate of θ is θ_ML = argmax_θ L = argmax_θ Σ_{n=1}^N ln P(x_n|θ) • Example: For Gaussian likelihood P(x|θ) = N(x|μ,σ²), the ML estimates are μ_ML = (1/N) Σ_n x_n and σ²_ML = (1/N) Σ_n (x_n − μ_ML)²
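
A minimal sketch of Gaussian ML fitting on synthetic data (names and data mine); the closed-form estimates below are the standard maximizers of the Gaussian log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(3.0, 2.0, size=10_000)  # i.i.d. draws; true mu=3, sigma=2

# Closed-form ML estimates for N(x | mu, sigma^2):
mu_ml = data.mean()                       # mu_ML = (1/N) sum_n x_n
var_ml = ((data - mu_ml) ** 2).mean()     # sigma^2_ML = (1/N) sum_n (x_n - mu_ML)^2
print(mu_ml, np.sqrt(var_ml))             # close to 3.0 and 2.0
```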

  22. Comments on notation from now on • Instead of Σ_j P(X=i,Y=j), we write Σ_Y P(X,Y) • P() and p() are used interchangeably • Discrete and continuous variables are treated the same, so Σ_X, ∫_X, Σ_x and ∫_x are interchangeable • θML and θ_ML are interchangeable • argmax_θ f(θ) is the value of θ that maximizes f(θ) • In the context of data x1,…,xN, the symbols x and X (in any typeface) refer to the entire set of data • N(x|μ,σ²) = (1/√(2πσ²)) exp(−(x−μ)²/(2σ²)) • log() = ln() and exp(x) = e^x • p_context(x) and p(x|context) are interchangeable

  23. Maximum likelihood estimation • Say we have a density P(x|θ) with parameter θ • The likelihood of a set of independent and identically distributed (i.i.d.) data x = (x1,…,xN) is P(x|θ) = Π_{n=1}^N P(x_n|θ) • The log-likelihood is L = ln P(x|θ) = Σ_{n=1}^N ln P(x_n|θ) • The maximum likelihood (ML) estimate of θ is θ_ML = argmax_θ L = argmax_θ Σ_{n=1}^N ln P(x_n|θ) • Example: For Gaussian likelihood P(x|θ) = N(x|μ,σ²), the ML estimates are μ_ML = (1/N) Σ_n x_n and σ²_ML = (1/N) Σ_n (x_n − μ_ML)²

  24. Questions?

  25. Maximum likelihood estimation • Say we have a density P(x|θ) with parameter θ • The likelihood of a set of independent and identically distributed (i.i.d.) data x = (x1,…,xN) is P(x|θ) = Π_{n=1}^N P(x_n|θ) • The log-likelihood is L = ln P(x|θ) = Σ_{n=1}^N ln P(x_n|θ) • The maximum likelihood (ML) estimate of θ is θ_ML = argmax_θ L = argmax_θ Σ_{n=1}^N ln P(x_n|θ) • Example: For Gaussian likelihood P(x|θ) = N(x|μ,σ²), the ML estimates are μ_ML = (1/N) Σ_n x_n and σ²_ML = (1/N) Σ_n (x_n − μ_ML)²

  26. Maximum likelihood estimation • Example: For Gaussian likelihood P(t_n|θ) = N(t_n | y(x_n,w), σ²), maximizing the log-likelihood of the targets t_n over w is equivalent to minimizing a sum-of-squares error • Objective of regression: Minimize the error E(w) = ½ Σ_n (t_n − y(x_n,w))²
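
A short sketch of the least-squares objective in action (synthetic data and a polynomial model y(x,w) of my choosing):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 50)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)  # noisy targets

# Minimizing E(w) = 0.5 * sum_n (t_n - y(x_n, w))^2 for a polynomial model
# y(x, w) is ordinary least squares, solved here via lstsq.
degree = 3
Phi = np.vander(x, degree + 1)        # design matrix: powers of x
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
E = 0.5 * ((t - Phi @ w) ** 2).sum()
print(w, E)
```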
