BSc/HND IETM Week 9/10 - Some Probability Distributions

Presentation Transcript


  1. BSc/HND IETM Week 9/10 - Some Probability Distributions

  2. When we looked at the histogram a few weeks ago, we were looking at frequency distributions.

  3. It is possible to convert such frequency distributions into probability distributions, such that the probability of encountering some particular value (or range of values) of x is plotted on the vertical axis, rather than the number of occurrences of that value of x.

  4. There are a few standard forms of such distributions, which make analysis rather easy - so long as the data really do fit the chosen form.

  5. We shall look at two of these standard forms, the normal and the negative exponential distributions.

  6. Probability distributions from frequency distributions

  7. Suppose that our previously-mentioned (and, sadly, hypothetical) optional unit for your course, ‘Flower Arranging for Engineers’, becomes extremely popular.

  8. In fact, it becomes so popular that it is studied by 208 students, from all the various BSc courses in the School.

  9. In an effort to analyse the performance of the students, so as to determine if any improvements to the unit are required, we might decide to plot a histogram of the final marks obtained.

  10. As we know, this is a frequency distribution, and might be obtained from the following summary of the students’ scores, as shown:

  11. Frequency polygons. The first step in the conversion is to change from the histogram to what is called a frequency polygon. This is simply a line graph, joining the centres of each of the chosen data intervals.

  12. At the ends, our frequency polygon reaches the zero axis as shown, since no student can obtain less than zero or more than 100 per cent. In situations when this doesn’t apply, it is conventional to terminate the polygon on the zero axis, half way through the next interval.

  13. It is very easy to obtain probability distributions from diagrams such as those above. All that is necessary is to divide each frequency by the total number of (in this case) students, to obtain the probability of any individual student, selected at random, obtaining a mark in a particular range.

  14. For example, to convert the histogram on page 1, or the frequency polygon on page 2, into probability distributions, simply divide every number on the vertical axis (and therefore also the numbers written on the plots) by 208.

  15. Thus, the vertical axes would now be calibrated in probabilities from zero to 53/208 = 0.255.

  16. The probability of any given student obtaining a mark in the range 40 to 49.9 per cent will be 47/208 = 0.226. The probability of a student scoring 90 per cent or more will be 3/208 = 0.0144, etc.
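The division described above can be sketched in a few lines of Python. This is only an illustration: it uses the three counts quoted in the text (53, 47 and 3) and the total of 208, because the full marks table is not reproduced here. The labels and the function name `to_probabilities` are hypothetical.

```python
TOTAL_STUDENTS = 208

# Only the three counts quoted in the text; labels are illustrative.
quoted_frequencies = {
    "largest interval count": 53,
    "40 to 49.9 per cent": 47,
    "90 per cent or more": 3,
}


def to_probabilities(frequencies, total):
    """Divide each frequency by the total to turn it into a probability."""
    return {label: count / total for label, count in frequencies.items()}


probabilities = to_probabilities(quoted_frequencies, TOTAL_STUDENTS)
for label, probability in probabilities.items():
    print(f"{label}: {probability:.4f}")
```

With the complete table of frequencies, the resulting probabilities would sum to 1, which is a useful check on the conversion.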

  17. The normal distribution. It is not very surprising that the marks distribution (frequency or probability) looks like the diagrams above.

  18. In a fair examination, taken by a large number of students, we would expect that only a few students would obtain either abysmally low marks or astronomically high marks.

  19. We would expect the majority of marks to be ‘somewhere in the middle’, with a ‘tail’ at both the low and the high ends of the range.

  20. This is what we see above.

  21. Several real-life situations fit this general form of distribution, where it is most likely that results will be clustered around the centre of some range, with outlying values tailing off towards the ends of the range.

  22. Wisniewski, in his ‘Foundation’ text, uses an example based on the distributions of the weights of breakfast cereal packed by machines into boxes.

  23. Ideally, every box should contain exactly the stated amount but, inevitably, some boxes will be lighter, and some heavier. There will be the odd ‘rogue’ box a long way from the mean.

  24. To make it easier to cope with such situations, they are often assumed to fit a standardised probability distribution, called the normal distribution.

  25. By doing this, it is possible to use standard printed tables to make predictions such as (for example) how many students would be expected to score less than 40 per cent.

  26. To allow standard tables to be used, we need to assume a certain fixed shape of probability distribution, and we also need to define it in terms of mean and standard deviation.

  27. We cannot define it in terms of actual data values (e.g. examination marks, or weight of cereal in a box), otherwise we would need a different set of tables for every new problem.

  28. The normal distribution curve is actually defined by a rather unpleasant formula (but we don’t need to use it, as we are going to use tables which have been derived from it by someone else).

  29. If the variable in which we are interested is x (e.g. a mark in per cent, or the weight of cereal in a box in kg), the mean value of x is $\bar{x}$ and the standard deviation of the data set is $\sigma_x$,

  30. then the normal distribution curve is defined by the probability that x will take a particular value (P(x)) obeying the following relationship (I believe there is an error in Wisniewski’s version):
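For reference, the standard textbook form of the normal curve can be written as follows. The symbols $\bar{x}$ (mean) and $\sigma_x$ (standard deviation) are assumed here, and this is the usual general form rather than a transcription of the slide.

```latex
P(x) = \frac{1}{\sigma_x \sqrt{2\pi}}
       \exp\!\left( -\frac{(x - \bar{x})^2}{2\sigma_x^2} \right)
```

Strictly, $P(x)$ in this form is a probability density: actual probabilities come from areas under the curve between two values of x.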

  31. The resulting plot of P(x) as x varies is a ‘bell-shaped’ curve, as shown in the next slide.

  32. Notes: (1) The “x axis” is in standard deviations. (2) The total area under the graph is 1 unit. (3) The area under the graph between two values of x gives the probability that the quantity will be between those values.
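The two area properties in these notes can be checked numerically. The sketch below is only an illustration: it assumes the standard normal density (horizontal axis in standard deviations) and estimates areas with a simple trapezium rule. The function names and the cut-off of 8 standard deviations are arbitrary choices, not something taken from the slides.

```python
import math


def normal_density(z):
    """Standard normal density, with z measured in standard deviations."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def area_under_curve(lower, upper, steps=10_000):
    """Trapezium-rule estimate of the area under the density from lower to upper."""
    width = (upper - lower) / steps
    total = 0.5 * (normal_density(lower) + normal_density(upper))
    for i in range(1, steps):
        total += normal_density(lower + i * width)
    return total * width


# Beyond 8 standard deviations the tails are negligible, so this is
# effectively the whole area under the curve.
total_area = area_under_curve(-8.0, 8.0)

# Probability of lying within one standard deviation of the mean.
within_one_sd = area_under_curve(-1.0, 1.0)

print(f"total area: {total_area:.4f}")
print(f"area from -1 SD to +1 SD: {within_one_sd:.4f}")
```

The first figure should come out very close to 1, and the second at roughly 0.68, which is the familiar result that about two thirds of values lie within one standard deviation of the mean.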

  33. Example. Say that a large set of examination results has a mean of 55 per cent, and a standard deviation of 15 per cent.

  34. How many students would we expect to fail the examination (if we define a failure as obtaining less than 40 per cent), and how many students would we expect to get a first-class result (defined as obtaining 70 per cent or more)?

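  35. Both boundaries lie 1.0 standard deviation from the mean: 70 per cent is 1 SD above it (70 - 55 = 15) and 40 per cent is 1 SD below it (55 - 40 = 15). First: probability 0.1587. Fail: also 0.1587!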
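The table look-up in this example can be reproduced with Python's standard library, as a rough cross-check rather than a replacement for the tables. The sketch assumes the marks really are normally distributed with the stated mean and standard deviation. The constant and function names are illustrative.

```python
from statistics import NormalDist

MEAN_MARK = 55.0    # per cent
SD_MARK = 15.0      # per cent
FAIL_BELOW = 40.0   # a fail is a mark below this
FIRST_FROM = 70.0   # a first is this mark or above


def z_score(value, mean, sd):
    """Distance of a value from the mean, measured in standard deviations."""
    return (value - mean) / sd


marks = NormalDist(mu=MEAN_MARK, sigma=SD_MARK)

z_fail = z_score(FAIL_BELOW, MEAN_MARK, SD_MARK)
z_first = z_score(FIRST_FROM, MEAN_MARK, SD_MARK)

p_fail = marks.cdf(FAIL_BELOW)          # area to the left of 40 per cent
p_first = 1.0 - marks.cdf(FIRST_FROM)   # area to the right of 70 per cent

print(f"fail:  z = {z_fail:+.1f}, probability = {p_fail:.4f}")
print(f"first: z = {z_first:+.1f}, probability = {p_first:.4f}")
```

To turn either probability into an expected number of students, multiply it by the size of the class. The example does not state that number, so it is left out here.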

  36. The negative exponential distribution. To cover a wider range of real-world situations, more ‘standardised’ probability distributions are required.

  37. The other one we shall briefly look at is the negative-exponential distribution. This is also sometimes called a ‘failure-rate’ curve, because it tends to describe how components fail with time.

  38. If a certain number of components is manufactured and put into service, it is reasonable to assume that they will all eventually fail.

  39. The probability of any one of the components failing during a given time period might well depend on how many components are left in service.

  40. Choose to measure time t in the best units for the problem (seconds, months, years, etc.). Technically, the unit chosen should be short compared with the expected lifetime of a component, so that any given component is expected to last for many time units.

  41. Let $\lambda$ be the failure rate, that is, the proportion of components expected to fail in one time unit. This means that $\lambda$ must have ‘dimensions’ of (1/time). In the example above, we said that 1 per cent of components might fail in three years so, in that case, the failure rate $\lambda$ = 0.01/3 (proportion per year).

  42. This can also be viewed as a probability - there is a probability of 0.01/3 that any given component will fail in a given period of one year.

  43. Therefore, to find the proportion of components expected to fail over a time t (measured in our chosen units), we need the quantity $\lambda t$. This is now dimensionless - it is actually the probability that any given component will fail over the stated time period.
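The arithmetic can be sketched as follows, using the failure rate of 0.01/3 per year quoted above. The second function is an assumption added for comparison only: under a constant failure rate, the fraction failed by time t is usually modelled as 1 - exp(-rate × t), which stays close to rate × t while that product is small. Neither function name comes from the slides.

```python
import math

FAILURE_RATE = 0.01 / 3   # proportion failing per year, from the text


def proportion_failed_simple(rate, time):
    """Rate multiplied by time, as described in the text."""
    return rate * time


def proportion_failed_exponential(rate, time):
    """Assumed constant-failure-rate model, shown only for comparison."""
    return 1.0 - math.exp(-rate * time)


for years in (1, 3, 30):
    simple = proportion_failed_simple(FAILURE_RATE, years)
    exponential = proportion_failed_exponential(FAILURE_RATE, years)
    print(f"{years:>2} years: rate*t = {simple:.5f}, 1 - exp(-rate*t) = {exponential:.5f}")
```

Over three years the simple product recovers the 1 per cent figure, and the two estimates differ only slightly. The gap widens as the period becomes long compared with the expected lifetime of a component.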
