Biomedical Statistics Final Report

Presentation Transcript


  1. Biomedical Statistics Final Report. Distributions. Student: 劉俊成, Student ID: 993002014, Instructor: 蔡章仁

  2. Negative Binomial and Geometric Distributions Under the same assumptions as for the binomial distribution, let x and y be discrete random variables. The pdf for the negative binomial distribution is the probability of getting x failures before y successes, where p = the probability of success on any single trial; i.e.
  f(x) = C(x + y – 1, x) ∙ p^y ∙ (1 – p)^x
  The geometric distribution is the special case of the negative binomial distribution where y = 1, namely
  f(x) = p ∙ (1 – p)^x
  This represents the probability of getting x failures before the first success.

  3. Negative Binomial and Geometric Distributions Excel Functions: Excel provides the following function for the negative binomial distribution: NEGBINOMDIST(x, y, p) = the probability of getting x failures before y successes, where p = the probability of success on any single trial; i.e. the pdf of the negative binomial distribution. Excel 2010/2013 provide the following additional function: NEGBINOM.DIST(x, y, p, cum), where cum takes the values TRUE or FALSE. In particular, NEGBINOM.DIST(x, y, p, FALSE) = NEGBINOMDIST(x, y, p), while NEGBINOM.DIST(x, y, p, TRUE) = the probability of getting at most x failures before y successes, where p = the probability of success on any single trial; i.e. the cumulative distribution function.
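As a cross-check (not part of the original slides), the pdf formula and the two Excel functions can be reproduced in Python with scipy; the values x = 5, y = 3 and p = 0.4 below are purely illustrative.

  from math import comb
  from scipy import stats

  x, y, p = 5, 3, 0.4   # illustrative values: 5 failures, 3 successes, p = 0.4

  # pdf from the slide: C(x + y - 1, x) * p^y * (1 - p)^x
  pdf_formula = comb(x + y - 1, x) * p**y * (1 - p)**x

  # scipy's nbinom also counts failures before the y-th success, so these
  # correspond to NEGBINOMDIST(x, y, p) and NEGBINOM.DIST(x, y, p, TRUE)
  pdf_scipy = stats.nbinom.pmf(x, y, p)
  cdf_scipy = stats.nbinom.cdf(x, y, p)

  # geometric special case (y = 1): x failures before the first success
  geom_pdf = p * (1 - p)**x

  print(pdf_formula, pdf_scipy, cdf_scipy, geom_pdf)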

  4. Negative Binomial and Geometric Distributions Observation: The geometric distribution is memoryless, which means that if you intend to repeat an experiment until the first success, then, given that the first success has not yet occurred, the conditional probability distribution of the number of additional trials required until the first success does not depend on how many failures have already occurred. The die one throws or the coin one tosses does not have a “memory” of these failures. The geometric distribution is in fact the only memoryless discrete distribution. The cumulative distribution function of the geometric distribution is
  F(x) = 1 – (1 – p)^(x + 1)
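Both the CDF above and the memoryless property can be checked numerically; this is a minimal Python sketch with an assumed p = 0.3 and arbitrary values of x, s and t, none of which come from the slides.

  from scipy import stats

  p = 0.3                              # assumed success probability
  X = stats.nbinom(1, p)               # failures before the first success (geometric starting at 0)

  # CDF from the slide: F(x) = 1 - (1 - p)^(x + 1)
  x = 4
  print(X.cdf(x), 1 - (1 - p)**(x + 1))

  # memorylessness: P(X >= s + t | X >= s) should equal P(X >= t)
  s, t = 3, 2
  print(X.sf(s + t - 1) / X.sf(s - 1), X.sf(t - 1))   # sf(k - 1) = P(X >= k)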

  5. Negative Binomial and Geometric Distributions Other key statistical properties of the geometric distribution are: Mean = (1 – p) ⁄ p; Mode = 0; Range = [0, ∞); Variance = (1 – p) ⁄ p^2; Skewness = (2 – p) ⁄ (1 – p)^0.5; Kurtosis = 6 + p^2 ⁄ (1 – p)
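These moments can be verified against scipy's geometric distribution; scipy counts trials rather than failures, so loc = -1 is used to shift the support to 0, 1, 2, …. The value p = 0.25 is only an example.

  from math import sqrt
  from scipy import stats

  p = 0.25                                   # assumed value
  mean, var, skew, kurt = stats.geom.stats(p, loc=-1, moments='mvsk')

  print(mean, (1 - p) / p)                   # mean
  print(var, (1 - p) / p**2)                 # variance
  print(skew, (2 - p) / sqrt(1 - p))         # skewness
  print(kurt, 6 + p**2 / (1 - p))            # (excess) kurtosis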

  6. Hypergeometric Distribution Under the same assumptions as for the binomial distribution, from a population of size m of which k are successes, a sample of size n is drawn. Let x be a random variable whose value is the number of successes in the sample. The pdf for x, called the hypergeometric distribution, is given by
  f(x) = C(k, x) ∙ C(m – k, n – x) / C(m, n)
  Observations: Let p = k/m. Then the situation is the same as for the binomial distribution B(n, p), except that in the binomial case each selection (success or failure) is put back in the population after the trial, while in the hypergeometric case the selection is not put back and so cannot be drawn again. When the population size m is large relative to the sample size n, the hypergeometric and binomial distributions yield more or less the same result, but this is not necessarily true when the sample is a large fraction of the population.

  7. Hypergeometric Distribution Excel Functions: Excel provides the following function: HYPGEOMDIST(x, n, k, m) = the probability of getting x successes from a sample of size n, where the population has size m of which k are successes; i.e. the pdf of the hypergeometric distribution. Excel 2010/2013 provide the following additional function: HYPGEOM.DIST(x, n, k, m, cum), where cum takes the values TRUE or FALSE. HYPGEOM.DIST(x, n, k, m, FALSE) = HYPGEOMDIST(x, n, k, m), while HYPGEOM.DIST(x, n, k, m, TRUE) = the probability of getting at most x successes from a sample of size n, where the population has size m of which k are successes; i.e. the cumulative distribution function.
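A rough Python equivalent of the slide's formula and of HYPGEOMDIST / HYPGEOM.DIST, using made-up values m = 50, k = 20, n = 10, x = 4, together with the binomial approximation with p = k/m:

  from math import comb
  from scipy import stats

  m, k, n, x = 50, 20, 10, 4        # assumed: population 50, 20 successes, sample 10, want P(4 successes)

  # pdf from the slide's formula: C(k, x) * C(m - k, n - x) / C(m, n)
  pdf_formula = comb(k, x) * comb(m - k, n - x) / comb(m, n)

  # scipy's hypergeom takes (population size, successes in population, sample size)
  pdf_scipy = stats.hypergeom.pmf(x, m, k, n)     # analogue of HYPGEOMDIST(x, n, k, m)
  cdf_scipy = stats.hypergeom.cdf(x, m, k, n)     # analogue of HYPGEOM.DIST(x, n, k, m, TRUE)

  # binomial approximation with p = k/m; reasonable when m is much larger than n
  pdf_binom = stats.binom.pmf(x, n, k / m)

  print(pdf_formula, pdf_scipy, cdf_scipy, pdf_binom)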

  8. Example Mary and Jane both attend the same university, but don’t know each other. Each has about 200 friends at the university. Assuming that each of these groups of friends represents a random sample from the 50,000 students who attend the university, what is the probability that Mary and Jane will have at least one friend in common? It turns out that this problem is equivalent to picking 200 balls at random (representing Mary’s friends) from a bag containing 49,998 balls (representing the 50,000 students less Mary and Jane), 200 of which are blue (representing Jane’s friends), and getting at least one blue ball. We first calculate the probability that none of the balls will be blue as follows: HYPGEOMDIST(0, 200, 200, 49998) = .448 Thus the answer is 1 – .448 = .552 = 55.2%.
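The same calculation in Python; scipy's hypergeom takes the population size, the number of successes in the population, and then the sample size, and the exact decimals may differ slightly from the rounded figures above.

  from scipy import stats

  # 49,998 other students, 200 of whom are Jane's friends; Mary's 200 friends are the sample
  p_no_common = stats.hypergeom.pmf(0, 49998, 200, 200)   # ~ 0.449
  print(1 - p_no_common)                                   # ~ 0.55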

  9. Beta Distribution For the binomial distribution the number of successes x is the random variable, while the number of trials n and the probability of success p on any single trial are parameters (i.e. constants). Instead we would now like to view the probability of success on any single trial as the random variable, with the number of trials n and the total number of successes in n trials as constants. Let α = # of successes in n trials and β = # of failures in n trials (and so α + β = n). The pdf for x = the probability of success on any single trial is given by
  f(x) = x^(α – 1) ∙ (1 – x)^(β – 1) / B(α, β)

  10. Beta Distribution Here B is the beta function, which can be expressed as B(α, β) = Γ(α) Γ(β) / Γ(α + β), where Γ is the gamma function.

  11. Beta Distribution Excel Functions: Excel provides the following functions: BETADIST(x, α, β) = the cumulative distribution function F(x) at x for the pdf given above; BETAINV(p, α, β) = x such that BETADIST(x, α, β) = p, so BETAINV is the inverse of BETADIST. Excel 2010/2013 provide two additional functions: BETA.INV, which is equivalent to BETAINV, and BETA.DIST(x, α, β, cum), where cum takes the values TRUE or FALSE. BETA.DIST(x, α, β, TRUE) = BETADIST(x, α, β), while BETA.DIST(x, α, β, FALSE) is the pdf of the beta distribution at x (as described above).
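If Excel is not available, scipy's beta distribution offers the same pdf, cdf and inverse cdf; the values x = 0.3, α = 4, β = 6 below are arbitrary, and the pdf is also recomputed from the formula on slide 9 as a check.

  from math import gamma
  from scipy import stats

  x, a, b = 0.3, 4, 6                            # assumed values for x, alpha, beta

  # pdf from the slide: x^(a-1) * (1-x)^(b-1) / B(a, b), with B(a, b) = gamma(a)*gamma(b)/gamma(a+b)
  B = gamma(a) * gamma(b) / gamma(a + b)
  pdf_formula = x**(a - 1) * (1 - x)**(b - 1) / B

  print(pdf_formula, stats.beta.pdf(x, a, b))    # analogue of BETA.DIST(x, a, b, FALSE)
  print(stats.beta.cdf(x, a, b))                 # analogue of BETADIST / BETA.DIST(..., TRUE)
  print(stats.beta.ppf(0.975, a, b))             # analogue of BETAINV(0.975, a, b)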

  12. Example 1 A lottery organization claims that at least one out of every ten people wins. Of the last 500 lottery tickets sold, 37 were winners. Based on this sample, what is the probability that the lottery organization’s claim is true, namely that players have at least a 10% probability of buying a winning ticket? What is the 95% confidence interval? To answer the first question we use the cumulative beta distribution function as follows: BETADIST(.1, 37, 463) = 98.1% This is the probability that the organization’s claim is false (i.e. that the probability of success is less than 10%). Thus the probability that the organization’s claim is true is 100% – 98.1% = 1.9%.

  13. Example 1(cont.) The lower bound of the 95% confidence interval is BETAINV(.025, 37, 463) = 5.3% The upper bound of the 95% confidence interval is BETAINV(.975, 37, 463) = 9.8% Since 10% is not in the 95% confidence interval (5.3%, 9.8%), we conclude (with 95% confidence) that the lottery’s claim is not accurate.
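The same three numbers can be reproduced with scipy (a sketch, not part of the original slides); the outputs should be close to the 98.1%, 5.3% and 9.8% quoted above.

  from scipy import stats

  wins, losses = 37, 463                        # 37 winners out of 500 tickets

  # probability that the true success rate is below 10%, analogue of BETADIST(0.1, 37, 463)
  print(stats.beta.cdf(0.1, wins, losses))      # ~ 0.98, so P(claim true) ~ 0.02

  # 95% interval, analogues of BETAINV(0.025, ...) and BETAINV(0.975, ...)
  print(stats.beta.ppf(0.025, wins, losses))    # ~ 0.053
  print(stats.beta.ppf(0.975, wins, losses))    # ~ 0.098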

  14. Multinomial Distribution Given an experiment with the following characteristics: the experiment consists of n independent trials, each with k mutually exclusive outcomes Ei, and for each trial the probability of outcome Ei is pi. Let x1, …, xk be discrete random variables whose values are the number of times each outcome Ei occurs in the n trials. Then the probability distribution function for x1, …, xk is called the multinomial distribution and is defined as follows:
  f(x1, …, xk) = n! / (x1! ∙ … ∙ xk!) ∙ p1^x1 ∙ … ∙ pk^xk
  Here x1 + … + xk = n and p1 + … + pk = 1. The case where k = 2 is equivalent to the binomial distribution.

  15. Example 1 Suppose that a bag contains 8 balls: 3 red, 1 green and 4 blue. You reach into the bag, pull out a ball at random, put the ball back in the bag and then pull out another ball. This experiment is repeated a total of 10 times. What is the probability that the outcome will result in exactly 4 reds and 6 blues? The possible outcomes for each trial are E1 = a red ball is drawn, E2 = a green ball is drawn and E3 = a blue ball is drawn. Thus p1 = 3/8, p2 = 1/8 and p3 = 4/8, and x1 = 4, x2 = 0 and x3 = 6. Substituting into the multinomial pdf gives 10! / (4! ∙ 0! ∙ 6!) ∙ (3/8)^4 ∙ (1/8)^0 ∙ (4/8)^6 = .064888.
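A quick arithmetic check of this example in Python (a sketch, not from the slides), using the multinomial pdf directly:

  from math import factorial

  # 4 red, 0 green, 6 blue in 10 trials with probabilities 3/8, 1/8, 4/8
  coef = factorial(10) // (factorial(4) * factorial(0) * factorial(6))
  prob = coef * (3/8)**4 * (1/8)**0 * (4/8)**6
  print(prob)   # ~ 0.064888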

  16. Multinomial Distribution Excel Function: While Excel does not provide a function for the multinomial distribution, it does provide the following function: MULTINOMIAL(x1, …, xk) = n! / (x1! ∙ … ∙ xk!) Thus we could also calculate the answer to Example 1 above by using the formula MULTINOMIAL(4, 0, 6) * (3/8)^4 * (1/8)^0 * (4/8)^6 = .064888 We can also use a range as the argument of MULTINOMIAL, as in Figure 1.

  17. Multinomial Distribution Figure 1: Excel worksheet used to calculate the multinomial distribution (image not reproduced in this transcript).

  18. Multinomial Distribution Actually, we can use the following more complicated Excel formula to calculate the same result: =B9*EXP(SUMPRODUCT(B3:B5,LN(B6:B8))) Real Statistics Excel Function: The following supplemental function in the Real Statistics Resource Pack can be used to calculate the multinomial distribution: MULTINOMDIST(R1, R2) = the value of the multinomial pdf, where R1 is a range containing the values x1, …, xk and R2 is a range containing the values p1, …, pk. Referring to Figure 1, we have MULTINOMDIST(B3:B5,B6:B8) = 0.064888.
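Outside Excel, scipy gives the same value directly; scipy.stats.multinomial.pmf plays the role of MULTINOMDIST(R1, R2) in this analogy (a sketch, not a Real Statistics function).

  from scipy import stats

  x = [4, 0, 6]             # the counts from Example 1 / Figure 1
  p = [3/8, 1/8, 4/8]
  print(stats.multinomial.pmf(x, n=sum(x), p=p))   # ~ 0.064888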

  19. Poisson Distribution The Poisson distribution has pdf given by
  f(x) = e^(–µ) µ^x / x!   for x = 0, 1, 2, …
  The parameter µ is often replaced by λ. Observation: Some key statistical properties of the Poisson distribution are: Mean = µ; Variance = µ; Median ≈ µ; Skewness = 1 / µ^0.5; Kurtosis = 1 / µ

  20. Poisson Distribution Excel Function: Excel provides the following function for the Poisson distribution: POISSON(x, μ, cum) where μ = the mean of the distribution and cum takes the values TRUE and FALSE POISSON(x, μ, FALSE) = probability density function value f(x) at the value x for the Poisson distribution with mean μ. POISSON(x, μ, TRUE) = cumulative probability distribution function F(x) at the value x for the Poisson distribution with mean μ. Excel 2010/2013 provide the additional function POISSON.DIST which is equivalent to POISSON.
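Equivalent calls in Python/scipy, with the pdf also recomputed from the formula on slide 19; the values x = 3 and µ = 2 are illustrative only.

  from math import exp, factorial
  from scipy import stats

  x, mu = 3, 2                                     # assumed values

  # pdf from the slide: e^(-mu) * mu^x / x!
  pdf_formula = exp(-mu) * mu**x / factorial(x)

  print(pdf_formula, stats.poisson.pmf(x, mu))     # analogue of POISSON(x, mu, FALSE)
  print(stats.poisson.cdf(x, mu))                  # analogue of POISSON(x, mu, TRUE)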

  21. Theorem 1 If the probability p of success of a single trial approaches 0 while the number of trials n approaches infinity and the value μ = np stays fixed, then the binomial distribution B(n, p) approaches the Poisson distribution with mean μ. Observation: Based on Theorem 1 the Poisson distribution can be used to estimate the binomial distribution when n ≥ 50 and p ≤ .01, preferably with np ≤ 5.

  22. Example 1 A company produces high precision bolts so that the probability of a defect is .05%. In a sample of 4,000 units, what is the probability of having more than 3 defects? We can solve this problem using the distribution B(4000, .0005); namely, the desired probability is 1 – BINOMDIST(3, 4000, .0005, TRUE) = 1 – 0.857169 = 0.142831 We can also use the Poisson approximation as follows: μ = np = 4000(.0005) = 2 1 – POISSON(3, 2, TRUE) = 1 – 0.857123 = 0.142877 As you can see, the approximation is quite accurate.
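The same comparison in Python (a sketch; the printed values should match the figures above up to rounding):

  from scipy import stats

  # exact binomial: P(more than 3 defects in 4,000 units with p = 0.0005)
  print(1 - stats.binom.cdf(3, 4000, 0.0005))     # ~ 0.1428

  # Poisson approximation with mu = n * p = 2
  print(1 - stats.poisson.cdf(3, 2))              # ~ 0.1429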

  23. Example 1(cont.) Observation: If the average number of occurrences of a particular event in an hour (or some other unit of time) is μ, and the arrival times are random without any tendency to bunch up, then the probability of x events occurring in an hour is given by the Poisson pdf with mean μ, i.e. f(x) = e^(–μ) μ^x / x!

  24. Example 2 A large department store sells on average 100 MP3 players a week. Assuming that purchases are as described in the above observation, what is the probability that the store will not run out of MP3 players in a week if they stock 120 players? How many MP3 players should the store stock in order to make sure that it has a 99% probability of being able to supply a week’s demand? The probability that they will sell ≤ 120 MP3 players in a week is POISSON(120, 100, TRUE) = 0.977331

  25. Example 2(cont.) Thus, the answer to the first question is 97.7%. We can answer the second question by using successive approximations until we arrive at the correct answer. E.g. we could try x = 130, which is higher than 120. The cumulative Poisson probability is 0.998293, which is too high. We then pick x = 125 (halfway between 120 and 130). This yields 0.993202, which is still a little too high, so we try 123. This yields 0.988756, which is a little too low, and so we finally arrive at 124, which has a cumulative Poisson probability of 0.991226. Observation: We have observed that under the appropriate conditions the binomial distribution can be approximated by either the Poisson or the normal distribution. We conclude this section by stating that the Poisson distribution can also be approximated by the normal distribution.
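Rather than successive approximations, the 99% stock level can be read off directly from the inverse Poisson CDF; a short Python sketch (not part of the slides):

  from scipy import stats

  # probability that weekly demand is at most 120 when the mean is 100
  print(stats.poisson.cdf(120, 100))     # ~ 0.9773

  # smallest stock level that covers a week's demand with 99% probability
  print(stats.poisson.ppf(0.99, 100))    # 124.0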

  26. Theorem 2 For μ sufficiently large (usually μ ≥ 20), if x has a Poisson distribution with mean μ, then x is approximately distributed as N(μ, μ).
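A quick numerical illustration of Theorem 2; the values μ = 100 and x = 120 are assumed, and the half-unit continuity correction is my addition, not from the slides.

  from scipy import stats

  mu, x = 100, 120
  print(stats.poisson.cdf(x, mu))                          # exact Poisson P(X <= x)
  print(stats.norm.cdf(x + 0.5, loc=mu, scale=mu**0.5))    # normal approximation N(mu, mu)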

  27. The End Thanks for your attention!
