CS110: Introduction to Computer Science: Lab12. Birthdays Parties and Counting Blind. Using Probabilities to Build Estimates.
Suppose you are throwing a surprise party for your friends. However, it is essential that no two people at the party share a birthday. How many people can you invite and still be confident that no birthday is shared? Interestingly, solutions to this problem are useful for counting things that are too big to count directly.
Core Quantitative Issue
• Computation
• Probabilities
• Estimation
Supporting Computational Issues
• Random Numbers
• Combinations
• Experimental Science
Prepared by Fred Annexstein, University of Cincinnati. CC Some rights reserved. 2007
The Birthday Paradox
A simple problem in elementary probability and statistics is the Birthday Problem: What is the probability that at least two of N randomly selected people have the same birthday? (Same month and day, but not necessarily the same year.)
The problem is usually simplified by assuming two things:
• Nobody was born on February 29.
• People's birthdays are uniformly distributed over the other 365 days of the year.
Build a table in Excel that gives a random listing of birthdays, starting with any day (we chose 01/01). Use Excel's "conditional formatting" to highlight the rows whose day duplicates a day appearing earlier in the list. Is it surprising how fast duplicates begin to appear?
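The same experiment can be tried outside Excel. The following is a minimal Python sketch, not the lab's required method: the function names `random_birthdays` and `flag_duplicates` are made up for this illustration, and the year 2007 is used only because it is not a leap year, so February 29 never appears.

```python
import random
from datetime import date, timedelta


def random_birthdays(count, seed=None):
    """Return `count` random birthdays drawn uniformly from a 365-day year."""
    rng = random.Random(seed)
    start = date(2007, 1, 1)  # assumption: any non-leap year works here
    return [start + timedelta(days=rng.randrange(365)) for _ in range(count)]


def flag_duplicates(birthdays):
    """Pair each birthday with True if the same day appeared earlier in the list."""
    seen = set()
    flagged = []
    for day in birthdays:
        flagged.append((day, day in seen))
        seen.add(day)
    return flagged


if __name__ == "__main__":
    # Print 30 random birthdays and mark the ones that repeat an earlier row,
    # which plays the role of the conditional formatting in the spreadsheet.
    for day, is_duplicate in flag_duplicates(random_birthdays(30, seed=110)):
        marker = "  <-- duplicate" if is_duplicate else ""
        print(day.strftime("%m/%d") + marker)
```

Changing the seed or the list length gives a feel for how early the first duplicate tends to show up.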
The Birthday Problem in terms of Probabilities and Recursion
This problem, like others we have seen before, is easier to solve by considering the complementary problem: What is the probability that N randomly selected people all have different birthdays?
Suppose for the moment that we knew the probability that N-1 people all had different birthdays. Call that value All_Different(N-1), and put it in your pocket.
Now focus on the probability that the Nth person coming in to the party also has a birthday different from all the others. That is the chance of hitting one of the days not seen so far. The first N-1 people have used N-1 days, so 365-(N-1) days remain:
Diff_Person(N) = (365 - (N-1)) / 365
Hence we know that:
All_Different(N) = All_Different(N-1) * Diff_Person(N)
This is called a Recursive Function, an extremely important concept in Computer Science. To solve it, all you need is an initial value and then you iterate. For one person, N=1, nobody else can share the birthday, so All_Different(1) = 1.
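The recursion can be iterated in a few lines of Python. This is a sketch under the slide's two simplifying assumptions (no February 29, uniform birthdays); the names `all_different` and `smallest_party` and the `days` parameter are introduced here for illustration and are not part of the original lab.

```python
def all_different(n, days=365):
    """Probability that n people all have different birthdays.

    Iterates All_Different(k) = All_Different(k-1) * Diff_Person(k),
    starting from All_Different(1) = 1.
    """
    prob = 1.0  # one person cannot clash with anyone
    for k in range(2, n + 1):
        diff_person = (days - (k - 1)) / days  # days not yet taken, out of all days
        prob *= diff_person
    return prob


def smallest_party(threshold, days=365):
    """Smallest n for which P(at least two share a birthday) exceeds threshold."""
    n = 1
    while 1.0 - all_different(n, days) <= threshold:
        n += 1
    return n


if __name__ == "__main__":
    # Show the first few values of the recursion next to the complement.
    for n in range(1, 6):
        p = all_different(n)
        print(f"N={n}: all different = {p:.4f}, some shared = {1 - p:.4f}")
```

Because the year length is a parameter, the same functions cover other calendars, for example `smallest_party(0.5, days=669)` for a year of 669 days.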
Questions:
1. When does the probability become greater than 50-50 that two people at the party share a birthday?
2. When does the probability become greater than 75% that two people at the party share a birthday?
3. When does the probability become greater than 99.9% that two people at the party share a birthday?
4. On the planet Mars, there are 669 Martian days in a Martian year. For a party on Mars, when does the probability become greater than 50-50 that two Martians at a party share a birthday?
BUT WHAT ABOUT LEAP YEARS?
What happens to the probabilities if we add February 29 to the mix? Do you think it will increase or decrease the chance that 2 people share a birthday?
Modeling the leap year is somewhat more complicated, but we can make some additional assumptions:
• Equal numbers of people are born on each day other than February 29.
• The number of people born on February 29 is one-fourth of the number of people born on any other day. Why?
Hence the probability that a randomly selected person was born on February 29 is 0.25/365.25 = 1/1461, and the probability that a randomly selected person was born on any other specified day is 1/365.25.
The probability that N persons, possibly including one born on February 29, have all different birthdays is the sum of two probabilities:
• That the N persons were born on N different days other than February 29.
• That the N persons were born on N different days, and include one person born on February 29.
We can add the probabilities because the two cases are mutually exclusive. Now each probability can be expressed recursively:
Leap Year Recursion
First compute the case that excludes birthdays on February 29. The probability that N persons were born on N different days other than February 29 is:
All_Diff_no_Feb29(N) = All_Diff_no_Feb29(N-1) * Diff_Person_no_Feb29(N)
where
Diff_Person_no_Feb29(N) = (365 - (N-1)) / 365.25
is the probability that the Nth person has a birthday that is not February 29 and differs from N-1 ordinary days already taken.
For the second case, which includes one birthday on February 29:
All_Diff_yes_Feb29(N) = All_Diff_yes_Feb29(N-1) * Diff_Person_no_Feb29(N-1) + All_Diff_no_Feb29(N-1) * 0.25 / 365.25
The first term covers a group whose first N-1 people already include the February 29 birthday, so only N-2 ordinary days are taken and the Nth person must land on an unused ordinary day. The second term covers a group whose first N-1 people all avoid February 29 and whose Nth person is born on it.
Redo the calculations with the new "leap year" formulas.
Questions:
5. Does the new model increase or decrease the chance that 2 people share a birthday?
6. Do any of your previous answers (#1-3) change with the new calculations?
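The two leap-year recursions can be iterated together. The Python sketch below is one possible reading of the formulas, not a reference solution: the function name `leap_all_different` is made up here, and the starting values for N=1 (365/365.25 for the no-February-29 case, 0.25/365.25 for the February-29 case) are an assumption derived from the single-person probabilities stated earlier, since the slides do not spell them out.

```python
YEAR = 365.25  # average year length under the leap-year assumptions


def leap_all_different(n):
    """Probability that n people all have different birthdays, Feb 29 included."""
    # Assumed starting values for one person (N = 1):
    no_feb29 = 365 / YEAR    # the person was born on an ordinary day
    yes_feb29 = 0.25 / YEAR  # the person was born on February 29
    for k in range(2, n + 1):
        diff_k = (365 - (k - 1)) / YEAR          # Diff_Person_no_Feb29(k)
        diff_k_minus_1 = (365 - (k - 2)) / YEAR  # Diff_Person_no_Feb29(k-1)
        # Update the Feb-29 case first: it needs the previous no-Feb-29 value.
        yes_feb29 = yes_feb29 * diff_k_minus_1 + no_feb29 * 0.25 / YEAR
        no_feb29 = no_feb29 * diff_k
    # The two cases are mutually exclusive, so their probabilities add.
    return no_feb29 + yes_feb29


if __name__ == "__main__":
    for n in (1, 2, 10, 23):
        p = leap_all_different(n)
        print(f"N={n}: all different = {p:.5f}, some shared = {1 - p:.5f}")
```

The order of the two updates inside the loop matters: the February-29 case must read the N-1 value of the other case before that value is overwritten.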
Estimating the Size of Big
We can apply the logic we just used in reverse to estimate the size of something that might be very difficult to count directly, e.g., the number of web pages on the WWW, the number of deer in Mt. Airy, or the number of ants in the Amazon rainforest.
Blind Counting Theorem: Given a large set of N objects, if we select objects at random and with replacement, then the expected number of draws before a repetition occurs is approximately:
=SQRT(N * pi() / 2)
Run 6 experiments using N = 10, 100, 1000, 10000, 10^5, 10^6. In each one, use your method from Slide 2 to determine when a repetition occurs. Compute the absolute and relative errors of your results with respect to this Theorem on each experiment, and record the results in a table with one row per experiment.
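One way to run these experiments outside Excel is sketched below in Python. It is an illustration under stated assumptions rather than the lab's method: the names `draws_until_repeat` and `theorem_estimate` are made up here, the draw that produces the repetition is counted as part of the total, and a single run per N is shown, so individual rows can differ a lot from the Theorem's average value.

```python
import math
import random


def draws_until_repeat(n, rng):
    """Draw from n objects with replacement; return how many draws
    were made when the first repetition appeared (that draw included)."""
    seen = set()
    draws = 0
    while True:
        item = rng.randrange(n)
        draws += 1
        if item in seen:
            return draws
        seen.add(item)


def theorem_estimate(n):
    """Approximate expected number of draws, same as =SQRT(N * pi() / 2)."""
    return math.sqrt(n * math.pi / 2)


if __name__ == "__main__":
    rng = random.Random(110)  # fixed seed so a run can be repeated
    print(f"{'N':>8} {'observed':>9} {'theorem':>10} {'abs err':>10} {'rel err':>8}")
    for n in [10, 100, 1000, 10_000, 10**5, 10**6]:
        observed = draws_until_repeat(n, rng)
        expected = theorem_estimate(n)
        abs_err = abs(observed - expected)
        rel_err = abs_err / expected
        print(f"{n:>8} {observed:>9} {expected:>10.1f} {abs_err:>10.1f} {rel_err:>8.1%}")
```

Averaging `draws_until_repeat` over many runs for a fixed N should bring the observed value closer to the Theorem's estimate than any single run does.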