
Probability and Statistics for Data Mining


etenia

Presentation Transcript


  1. Probability and Statistics for Data Mining COMP5318

  2. Question 1 • Question: Suppose you randomly select a credit card holder and the person has defaulted on their credit card. What is the probability that the person selected is a ‘Female’?

  3. Probability • Probability is the mathematical language for reasoning about uncertainty. • We need to make decisions in the presence of uncertainty, which is ever present. • Example: The Earth is warming, a phenomenon known as Global Warming (GW). Is modern human activity the cause of GW? • Physics-driven approach • Data-driven approach

  4. Experiments and Observation • When an experiment is carried out we observe the outcome, which is often uncertain. • If not uncertain then why carry out the experiment? • We look into a random shopping basket. Does it contain a packet of “Tofu”? • We toss a coin. Does it land on “Heads”? • We ask a question: “Is it raining in Broome, WA, right now?”

  5. Building Blocks of Probability • The space of all possible outcomes is called the sample space. • Non-trivial to decide. • Single Coin Toss. The space is {H,T}. • Shopping Basket. The space of all possible combinations of all items sold in the store. • Shopping Basket: {Tofu, Not-Tofu}.

  6. Events • Events are subsets of the sample space. Events are often defined in familiar terms. • In the shopping basket scenario • A vegetarian shopping basket is an event. • It is the set of all possible vegetarian item combinations. • Throw of a die. The event we are looking for could be: Even Number = {2,4,6}, where the sample space = {1,2,3,4,5,6}

  7. Events • Let G be the set of all galaxies. Characterize each galaxy by three numbers • d: distance from Earth • a: major axis • b: minor axis • Elliptic Galaxies (EG) • EG = {(a,b,d) | a/b > 1.5} • Distant Spiral Galaxies (DSG) • DSG = {(a,b,d) | a/b <= 1.5 and d > 10}

  8. Events • Let G be the set of all genes. Each gene can be “on” or “off”. Let E correspond to the event: all genes which are “on” when the skin cells are “starved”.

  9. Events are Sets • At the most basic level events are sets. Therefore we can carry out set union, difference and intersection on events. • For example: • E1: shopping baskets which contain Tofu • E2: shopping baskets which contain Milk • E1 ∪ E2: shopping baskets which contain Tofu or Milk (or both)

  10. Probability • Let S be the space of all possible elementary outcomes and let Power(S) be the power set of S. Then the probability P is a function P : Power(S) → [0,1] that satisfies the following properties (axioms):
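
The axioms themselves did not survive in this transcript. As an assumption about what the slide showed, the standard Kolmogorov axioms for events in S can be written as:

```latex
\begin{align*}
&\text{(1) } P(A) \ge 0 \text{ for every event } A \subseteq S, \\
&\text{(2) } P(S) = 1, \\
&\text{(3) } P\Big(\bigcup_{i} A_i\Big) = \sum_{i} P(A_i)
  \text{ for pairwise disjoint events } A_1, A_2, \dots
\end{align*}
```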

  11. Interpretation of Probability • Physical or Ontological: Long-term frequency • 50% chance that a coin will land on heads. • 20% of all Woolworth shopping baskets are vegetarian. • 22% of all Woolworth shopping baskets in Northbridge Plaza are vegetarian. • Epistemological: Degree of Belief • 20% chance that my neighbours are watering their lawn on “dry” days. • 99% chance that the green immovable object outside my house is a tree. • 90% chance that Australia will win the cricket world cup.

  12. Consequences of Axioms
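
The body of this slide is missing from the transcript. A likely list, stated here as an assumption, consists of the usual consequences of the standard probability axioms (non-negativity, P(S) = 1, additivity over disjoint events); the last identity is the one the two-coin example relies on:

```latex
\begin{align*}
P(\emptyset) &= 0, \\
P(A^{c}) &= 1 - P(A), \\
A \subseteq B &\implies P(A) \le P(B), \\
P(A \cup B) &= P(A) + P(B) - P(A \cap B).
\end{align*}
```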

  13. Example • Two coin tosses. Let H1 be the event that a heads occurs on toss 1 and H2 a heads on toss 2. All outcomes are equally likely. • Sample space = {HH, HT, TH, TT} • H1 = {HH, HT} • H2 = {HH, TH} • P(H1 ∪ H2) = P(H1) + P(H2) - P(H1 ∩ H2) = ½ + ½ - ¼ = 3/4
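
The two-coin calculation can be checked by enumerating the sample space. This is a minimal sketch; the helper name `prob` is illustrative and assumes equally likely outcomes, as the slide does.

```python
from fractions import Fraction
from itertools import product

# Sample space for two coin tosses: {HH, HT, TH, TT}
sample_space = {"".join(t) for t in product("HT", repeat=2)}


def prob(event):
    """Probability of an event (a subset of the sample space),
    assuming all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


H1 = {s for s in sample_space if s[0] == "H"}  # heads on toss 1
H2 = {s for s in sample_space if s[1] == "H"}  # heads on toss 2

# Compute P(H1 ∪ H2) directly and via inclusion-exclusion.
p_union_direct = prob(H1 | H2)
p_union_formula = prob(H1) + prob(H2) - prob(H1 & H2)

print(p_union_direct, p_union_formula)
```

Both routes give the same value because the outcome HH is counted once in the union but twice in P(H1) + P(H2).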

  14. Example • Two events A and B are independent if • P(A ∩ B) = P(A)P(B) • P(A ∩ B) is also written as P(AB) and P(A,B). • If A and B are disjoint events with P(A) > 0 and P(B) > 0, then A and B cannot be independent • P(A ∩ B) = 0, yet P(A)P(B) > 0 • Except for this case, you cannot determine independence by looking at a Venn diagram
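
The definition can be tested mechanically on the two-coin sample space. In this sketch the event `T1` (tails on toss 1) and the helper names are illustrative additions, not part of the slides.

```python
from fractions import Fraction

sample_space = {"HH", "HT", "TH", "TT"}  # equally likely outcomes


def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


def independent(a, b):
    """True when P(A ∩ B) == P(A) P(B)."""
    return prob(a & b) == prob(a) * prob(b)


H1 = {"HH", "HT"}  # heads on toss 1
H2 = {"HH", "TH"}  # heads on toss 2
T1 = {"TH", "TT"}  # tails on toss 1 (disjoint from H1)

h1_h2_independent = independent(H1, H2)  # 1/4 == 1/2 * 1/2
h1_t1_independent = independent(H1, T1)  # 0 != 1/2 * 1/2

print(h1_h2_independent, h1_t1_independent)
```

H1 and T1 are disjoint with positive probability, so they fail the product test, which is the case described on the slide.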

  15. Question • A shopping basket can either be kosher or not. The probability that it will be kosher is 3/4. Examine 10 baskets at a checkout counter. What is the probability that there will be at least one kosher basket?

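  16. Answer • Let E be the event “At least one kosher basket.” Let NKi be the event that the i-th basket is non-kosher. • E is the complement of NK1 ∩ … ∩ NK10, so, assuming the baskets are independent, P(E) = 1 - P(NK1)P(NK2)…P(NK10) = 1 - (1/4)^10 ≈ 0.999999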

  17. Example • For an Online Book Seller (OBS) the conversion rate is 1/100, i.e., on average one in every 100 visitors ends up making a purchase. What is the probability that at least one purchase will be made in 10 consecutive visits (by distinct customers)?
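
Both the kosher-basket question and this example follow the same pattern: take the complement of "no success in n independent trials". A minimal sketch, assuming the visits (or baskets) are independent; the function name `prob_at_least_one` is illustrative.

```python
from fractions import Fraction


def prob_at_least_one(p, n):
    """P(at least one success in n independent trials),
    where each trial succeeds with probability p."""
    return 1 - (1 - p) ** n


# Kosher baskets: p = 3/4, n = 10
kosher = prob_at_least_one(Fraction(3, 4), 10)

# Online Book Seller: p = 1/100, n = 10
purchase = prob_at_least_one(Fraction(1, 100), 10)

print(kosher, float(kosher))
print(purchase, float(purchase))
```

Using `Fraction` keeps the results exact; the kosher case comes out extremely close to 1, while the bookseller case is a little under 10%.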

  18. Example • Two people take turns to sink a basketball. P1 succeeds with probability 1/3 and P2 with ¼. What is the probability that P1 succeeds before P2? • Requires clever setting up of the events. • Let E be the event that P1 succeeds before P2. • Let Ai be the event that P1 succeeds before P2 on the i-th trial. • Ai ∩ Aj = Ø for i ≠ j, and $E = \bigcup_{i=1}^{\infty} A_i$
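
One way to finish the calculation, sketched under two assumptions the transcript does not state explicitly: P1 shoots first in each round, and all shots are independent. For the event A_i to occur, both players must miss in each of the first i - 1 rounds, and then P1 must score:

```latex
\begin{align*}
P(A_i) &= \left(\tfrac{2}{3}\cdot\tfrac{3}{4}\right)^{i-1}\cdot\tfrac{1}{3}
        = \tfrac{1}{3}\left(\tfrac{1}{2}\right)^{i-1}, \\
P(E) &= \sum_{i=1}^{\infty} P(A_i)
      = \tfrac{1}{3}\sum_{i=1}^{\infty}\left(\tfrac{1}{2}\right)^{i-1}
      = \tfrac{1}{3}\cdot\frac{1}{1-\tfrac{1}{2}}
      = \tfrac{2}{3}.
\end{align*}
```

The sum over i is legitimate because the A_i are pairwise disjoint, so additivity applies.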

  19. Conditional Probability • Very Important Concept • P(A|B) is “fraction of occurrences of B in which A also occurs” • P(A|B) = P(A ∩ B)/P(B); P(B) > 0 • For a fixed B, P(.|B) is a probability • Therefore if A1 and A2 are disjoint then • P(A1 ∪ A2 | B) = P(A1|B) + P(A2|B) • Note, in general P(A | B ∪ C) ≠ P(A|B) + P(A|C) • Also, in general P(A|B) ≠ P(B|A)
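
These properties can be illustrated on the single-die sample space used earlier. This is a small sketch; the event `high` (a roll of 5 or 6) and the helper names are hypothetical choices made for the illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one fair die


def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


def cond_prob(a, b):
    """P(A|B) = P(A ∩ B) / P(B); only defined when P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)


even = {2, 4, 6}
high = {5, 6}  # hypothetical event: the roll is 5 or 6

# P(A|B) and P(B|A) generally differ.
p_even_given_high = cond_prob(even, high)
p_high_given_even = cond_prob(high, even)

# For fixed B, P(.|B) is additive over disjoint events.
lhs = cond_prob({2} | {4}, even)
rhs = cond_prob({2}, even) + cond_prob({4}, even)

print(p_even_given_high, p_high_given_even, lhs, rhs)
```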

  20. Standard Example • Suppose a test is positive. What is the probability of disease? • D is the disease status (+/-); the test is either positive or negative.

  21. Standard Data Mining Example • Suppose the data above closely resembles the behaviour of the population at large. What is the chance that those who buy a Diaper will also buy Beer? • P(Beer | Diaper) = P(Diaper ∩ Beer)/P(Diaper) = 0.6/0.8 = 0.75 • Is Diaper an Event?
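
The data table for this slide is not present in the transcript. The sketch below uses a hypothetical five-basket dataset, chosen only because it reproduces the 0.6 and 0.8 figures quoted on the slide; the basket contents and the helper name `prob_contains` are assumptions.

```python
from fractions import Fraction

# Hypothetical transactions (NOT the slide's original table).
baskets = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def prob_contains(items):
    """Fraction of baskets that contain every item in `items`."""
    hits = sum(1 for basket in baskets if items <= basket)
    return Fraction(hits, len(baskets))


p_diaper = prob_contains({"Diaper"})
p_diaper_and_beer = prob_contains({"Diaper", "Beer"})
p_beer_given_diaper = p_diaper_and_beer / p_diaper

print(p_diaper, p_diaper_and_beer, p_beer_given_diaper)
```

Read this way, "Diaper" is shorthand for an event: the set of all baskets that contain a diaper.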

  22. Conditional Independence • If A and B are independent then P(A|B) = P(A) • P(AB) = P(A|B)P(B) • Law of Total Probability: if B1, …, Bn partition the sample space, then P(A) = P(A|B1)P(B1) + … + P(A|Bn)P(Bn).

  23. Bayes Theorem
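
The slide body is missing from the transcript. In its standard form, Bayes' theorem states P(B|A) = P(A|B)P(B)/P(A), where P(A) can be expanded with the law of total probability as P(A|B)P(B) + P(A|not B)P(not B). The sketch below applies it to a disease-test setting; the prevalence, sensitivity and false-positive rate are hypothetical numbers, not taken from the slides, and `bayes_posterior` is an illustrative name.

```python
from fractions import Fraction


def bayes_posterior(prior, likelihood, likelihood_given_not):
    """Return P(B|A) given P(B), P(A|B) and P(A|not B).

    The denominator P(A) comes from the law of total probability.
    """
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical figures, for illustration only:
prevalence = Fraction(1, 100)           # P(disease)
sensitivity = Fraction(9, 10)           # P(positive | disease)
false_positive_rate = Fraction(1, 20)   # P(positive | no disease)

p_disease_given_positive = bayes_posterior(
    prevalence, sensitivity, false_positive_rate
)

print(p_disease_given_positive, float(p_disease_given_positive))
```

With these assumed numbers, a positive test still leaves the probability of disease well below one half, because the disease is rare and false positives outnumber true positives.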

  24. Question 1 • Question: Suppose you randomly select a credit card holder and the person has defaulted on their credit card. What is the probability that the person selected is a ‘Female’?

  25. Answer to Question 1 • But what do G=F and D=Y mean? We have not even formally defined them.
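
One possible reading, offered as a sketch rather than the slide's own answer: G and D are functions on the sample space of card holders, and "G=F" and "D=Y" are the events (sets of holders) on which those functions take the values F and Y. The population below is entirely hypothetical; the original data is not in the transcript.

```python
from fractions import Fraction

# Hypothetical card holders as (gender, defaulted) pairs.
holders = [
    ("F", "Y"), ("F", "Y"), ("F", "N"), ("F", "N"),
    ("M", "Y"), ("M", "N"), ("M", "N"), ("M", "N"),
]

# The sample space: each outcome is the index of one holder,
# and every holder is equally likely to be selected.
outcomes = set(range(len(holders)))


def G(w):
    """Gender of the holder selected by outcome w."""
    return holders[w][0]


def D(w):
    """Default status of the holder selected by outcome w."""
    return holders[w][1]


# "G=F" and "D=Y" are events, i.e. subsets of the sample space.
female = {w for w in outcomes if G(w) == "F"}
defaulted = {w for w in outcomes if D(w) == "Y"}

# P(G=F | D=Y) = P(G=F, D=Y) / P(D=Y)
p_female_given_default = Fraction(len(female & defaulted), len(defaulted))

print(p_female_given_default)
```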
