Probability and Probability Distributions

norris

Presentation Transcript


  1. Probability and Probability Distributions Concepts in Statistics

  2. The Big Picture Probability: the underlying foundation for the methods of statistical inference

  3. Probability The probability that an event will occur is a number between (and including) 0 and 1. We write this idea in mathematical notation as $0 \le P(A) \le 1$, where $A$ is the event. What is the probability that when you flip a coin you get heads? There are two equally likely outcomes, heads or tails, so $P(\text{heads}) = 1/2 = 0.5$. This is the theoretical probability of getting a head when you toss a coin. We determine the number of ways an event can occur and divide by the total number of possible outcomes. No experiments or data collection are necessary.
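
The counting rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `theoretical_probability` and the list `coin` are made up for the example, and the sketch assumes every outcome in the sample space is equally likely.

```python
from fractions import Fraction

def theoretical_probability(event, sample_space):
    """Number of outcomes in the event divided by the total number of outcomes.

    Assumes all outcomes in sample_space are equally likely.
    """
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

coin = ["heads", "tails"]  # two equally likely outcomes
p_heads = theoretical_probability({"heads"}, coin)
print(p_heads)  # 1/2
```

Using `Fraction` keeps the answer exact, which matches the idea that no data collection is involved.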

  4. Theoretical and Empirical Probability The empirical probability of an event is an estimate, using data, of the likelihood that the event will happen. The empirical probability approaches the theoretical probability as the number of repetitions grows, because the relative frequency varies less over a large number of repetitions. This means that in the long run, we will see a pattern, so we are more confident about estimating the probability of an event using empirical probability with a large number of repetitions.
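
A small simulation can make the long-run pattern visible. The sketch below is an assumption-laden illustration, not part of the original slides: it models a fair coin with Python's pseudo-random generator, and the function name `empirical_probability` and the fixed seed are choices made here so the run is repeatable.

```python
import random

def empirical_probability(n_flips, seed=0):
    """Relative frequency of heads in n_flips simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The estimate tends to settle near the theoretical value 0.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, empirical_probability(n))
```

With only 10 flips the estimate can be far from 0.5; with 100,000 flips it is typically within a fraction of a percentage point.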

  5. Random or due to chance? When we say that an event is random or due to chance, we mean that the event is unpredictable in the short run but has a regular and predictable behavior in the long run. The coin-tossing activity illustrates this. We cannot predict whether an individual toss will be heads, but in the long run, the outcomes have a predictable pattern: the relative frequency of heads is very close to 0.5 for a fair coin. • We can make probability statements only about random events. • The probability of an event A is the relative frequency with which that event occurs in a long series of repetitions.

  6. Probability Distribution We think of all possible outcomes as variable values. Each variable value has a probability. The variable values together with their probabilities are a probability distribution. For example, Stanford University’s Blood Center gives the probabilities of human blood types in the United States as follows:

  7. Probability Rules • The outcomes are random events. When we randomly choose a person, we do not know their blood type. But there is a predictable pattern in the outcomes that is described by the relative frequencies. • All outcomes are assigned a probability. • The probabilities are numbers between 0 and 1. This makes sense because each probability is a relative frequency. • The sum of all of the probabilities is 1. This makes sense because we have listed all the outcomes. Since each probability is a relative frequency, these outcomes make up 100% of the observations.
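
The two numeric rules above (each probability lies between 0 and 1, and the probabilities sum to 1) can be checked mechanically. The sketch below uses a hypothetical three-color spinner, not the blood-type table from the slide, and the helper name `is_valid_distribution` is invented for this example.

```python
import math

def is_valid_distribution(distribution):
    """True if every probability is in [0, 1] and the probabilities sum to 1."""
    probabilities = list(distribution.values())
    in_range = all(0 <= p <= 1 for p in probabilities)
    sums_to_one = math.isclose(sum(probabilities), 1.0)  # tolerate float rounding
    return in_range and sums_to_one

# Hypothetical values chosen for illustration only.
spinner = {"red": 0.5, "blue": 0.3, "green": 0.2}
print(is_valid_distribution(spinner))                     # all outcomes listed
print(is_valid_distribution({"red": 0.5, "blue": 0.3}))   # an outcome is missing
```

The second call fails the check because the listed outcomes account for only 80% of the observations.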

  8. Probability Rules When two events have no outcomes in common, they are disjoint. If two events are disjoint, then we can add their individual probabilities. We write this fact as a rule: $P(A \text{ or } B) = P(A) + P(B)$ for disjoint events $A$ and $B$. For example, the probability that a randomly selected person from the United States is type A or type O is $P(\text{A or O}) = P(\text{A}) + P(\text{O})$. The complement of event $A$ is the event composed of outcomes that are “not $A$”. The complement rule is written as: $P(\text{not } A) = 1 - P(A)$. For example, the probability that a random person in the United States is not a universal donor (type O) is $P(\text{not O}) = 1 - P(\text{O})$.
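
Both rules can be verified by direct counting on a small example. The sketch below assumes a fair six-sided die rather than the blood-type data, and the names `die`, `prob`, `low`, and `high` are illustrative only.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """Probability of an event, given as a set of faces."""
    return sum(die[face] for face in event)

low = {1, 2}
high = {5, 6}
assert low.isdisjoint(high)  # the addition rule below requires disjoint events

p_low_or_high = prob(low) + prob(high)  # addition rule for disjoint events
p_not_low = 1 - prob(low)               # complement rule

print(p_low_or_high)  # 2/3
print(p_not_low)      # 2/3
```

Counting the union `low | high` directly gives the same 2/3, which is what the addition rule promises when the events share no outcomes.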

  9. Independence and Conditional Probability If $P(A \mid B) = P(A)$, then the two events $A$ and $B$ are independent. To say two events are independent means that the occurrence of one event makes it neither more nor less probable that the other occurs. The general rule that relates joint, conditional, and marginal probabilities is: $P(A \text{ and } B) = P(A) \cdot P(B \mid A)$. When $A$ and $B$ are independent events, then $P(B \mid A) = P(B)$, so our rule becomes: $P(A \text{ and } B) = P(A) \cdot P(B)$.
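
As a worked check of these rules, the sketch below enumerates all 36 rolls of two fair dice. The events chosen here ("first die is even" and "the sum is 7") are an assumed example that happens to be independent; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
rolls = list(product(range(1, 7), repeat=2))

def prob(event):
    """Fraction of rolls for which the predicate `event` is true."""
    return Fraction(sum(1 for roll in rolls if event(roll)), len(rolls))

def first_is_even(roll):   # event A
    return roll[0] % 2 == 0

def sum_is_seven(roll):    # event B
    return sum(roll) == 7

p_a = prob(first_is_even)
p_b = prob(sum_is_seven)
p_a_and_b = prob(lambda roll: first_is_even(roll) and sum_is_seven(roll))
p_b_given_a = p_a_and_b / p_a  # general rule rearranged: P(B | A) = P(A and B) / P(A)

print(p_b_given_a == p_b)      # True, so A and B are independent
print(p_a_and_b == p_a * p_b)  # True, the multiplication rule for independent events
```

Replacing event B with "the sum is 8" would break the equality, since that sum is no longer equally likely for every value of the first die.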

  10. Discrete Random Variables Discrete random variables have numeric values that can be listed and often can be counted. For example, the variable number of boreal owl eggs in a nest is a discrete random variable. Blood type is not a discrete random variable because it is categorical. Continuous random variables have numeric values that can be any number in an interval. For example, the (exact) weight of a person is a continuous random variable. Foot length is also a continuous random variable. Continuous random variables are often measurements, such as weight or length. With a discrete variable, you can count the possible values for the variable without rounding off. With a continuous variable, you cannot.

  11. Probability Distribution for Discrete Random Variables The probability distribution of a discrete random variable can be represented with a probability histogram. The horizontal axis accounts for the range of all possible values of the random variable, and the vertical axis represents the probabilities of those values. The heights of the bars add to 1, which is not surprising since the heights represent probabilities.

  12. The Mean of a Discrete Random Variable The mean of a discrete random variable $X$ is a weighted average of all the possible values of $X$, where each value is weighted by its probability: $\mu_X = \sum x \, P(x)$. Another term used to describe the mean is expected value. It is a useful term because it reminds us that the mean of a random variable is not calculated on a fixed data set. Rather, the mean (expected value) is a measure of the expected long-term behavior of the random variable.
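
The weighted average can be computed directly from a table of values and probabilities. The sketch below assumes a fair six-sided die as the random variable; `expected_value` is a name chosen for this example.

```python
from fractions import Fraction

def expected_value(distribution):
    """Weighted average of the values, each weighted by its probability."""
    return sum(x * p for x, p in distribution.items())

# Assumed example: a fair die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
mean = expected_value(die)
print(mean)  # 7/2
```

The result 7/2 = 3.5 is not a face of the die, which illustrates that the expected value describes long-term behavior rather than any single outcome.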

  13. The Standard Deviation for a Discrete Random Variable Here is the formula for the standard deviation of a discrete random variable: $\sigma_X = \sqrt{\sum (x - \mu_X)^2 \, P(x)}$. $P(x)$ represents the probability of $x$, where $x$ is a value of the random variable $X$, and $\mu_X$ is the mean of $X$. We focus on the expression inside the square root. The term $(x - \mu_X)$ here represents the deviation of each value of the random variable from the mean $\mu_X$, just as the term $(x - \bar{x})$ represents the deviation of each observation of the data set from the mean $\bar{x}$.
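
The formula translates almost term by term into code. The sketch below again assumes a fair six-sided die; the helper names `mean` and `standard_deviation` are illustrative.

```python
import math

def mean(distribution):
    """Expected value: sum of x * P(x)."""
    return sum(x * p for x, p in distribution.items())

def standard_deviation(distribution):
    """Square root of the probability-weighted squared deviations from the mean."""
    mu = mean(distribution)
    variance = sum((x - mu) ** 2 * p for x, p in distribution.items())
    return math.sqrt(variance)

# Assumed example: a fair die, each face with probability 1/6.
die = {face: 1 / 6 for face in range(1, 7)}
print(round(standard_deviation(die), 4))
```

Each squared deviation is weighted by `p`, the probability of that value, in place of the 1/n weighting used for a data set.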

  14. Continuous Probability Distribution For continuous random variables, the probability distribution can be approximated by a smooth curve called a probability density curve. As in a probability histogram, the total area under the density curve equals 1, and the curve represents probabilities by area. To find the probability that $X$ is in an interval, find the area above the interval and below the density curve.

  15. Continuous Probability Distribution The probability distribution of a continuous random variable is represented by a probability density curve. The probability that $X$ has a value in any interval of interest is the area above this interval and below the density curve.
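
To make "probability is area" concrete, the sketch below approximates the area under a density curve with the trapezoid rule. The density $f(x) = x/2$ on $[0, 2]$ is a hypothetical example chosen because its areas are easy to check by hand; `density` and `area_under_curve` are names invented here.

```python
def density(x):
    """Hypothetical density: f(x) = x / 2 on [0, 2], and 0 elsewhere."""
    return x / 2 if 0 <= x <= 2 else 0.0

def area_under_curve(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the trapezoid rule."""
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    total += sum(f(a + i * width) for i in range(1, steps))
    return total * width

total_area = area_under_curve(density, 0, 2)     # should be 1 for any density
p_interval = area_under_curve(density, 0.5, 1.5) # P(0.5 <= X <= 1.5)
print(round(total_area, 6), round(p_interval, 6))
```

By hand, the area between 0.5 and 1.5 is $(1.5^2 - 0.5^2)/4 = 0.5$, which the numerical result should match closely.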

  16. Observations of Normal Distributions Variables such as weight, shoe size, and other human physical characteristics exhibit distributions that are fairly symmetric with a central peak. We call these bell-shaped. Symmetry indicates that the variable is just as likely to take a value a certain distance below its mean as it is to take a value that same distance above its mean. The bell shape indicates that values closer to the mean are more likely, and that the variable is increasingly unlikely to take values far from the mean in either direction. We use a mathematical model with a smooth bell-shaped curve to describe these bell-shaped data distributions. These models are called normal curves or normal distributions.

  17. Observations of Normal Distributions There are many normal curves. Even though all normal curves have the same bell shape, they vary in their center and spread. • The black and red normal curves both have their mean, or center, at $\mu = 10$. However, the red curve is more spread out and thus has a larger standard deviation. • The black and the green normal curves have the same standard deviation, or spread.

  18. Observations of Normal Distributions • We use $\bar{x}$ to represent the mean of data in a sample. We use $\mu$ to represent the mean of a density curve defined by a mathematical model. • We use SD or $s$ to represent the standard deviation of data in a sample. We use $\sigma$ to represent the standard deviation of a density curve defined by a mathematical model. All normal curves share a basic geometry. While the mean locates the center of a normal curve, it is the standard deviation that controls the geometry.

  19. Observations of Normal Distributions Consider a random variable $X$ that has a normal distribution with mean $\mu = 10$ and standard deviation $\sigma = 2$. • The point 1 SD less than the mean is represented by $\mu - \sigma$. Since $\mu = 10$ and $\sigma = 2$, this point is located at 10 − 2 = 8, as shown. • The point 1 SD more than the mean is represented by $\mu + \sigma$. Since $\mu = 10$ and $\sigma = 2$, this point is located at 10 + 2 = 12, as shown. The area of the green region between these two points is indicated as 0.68, so we can say that the probability of $X$ being between 8 and 12 is approximately 0.68.

  20. Empirical Rule for Normal Curves • The probability that $X$ is within 1 SD of the mean is approximately 0.68. • The probability that $X$ is within 2 SD of the mean is approximately 0.95. • The probability that $X$ is within 3 SD of the mean is approximately 0.997.
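
The three values in the empirical rule can be reproduced with `statistics.NormalDist` from the Python standard library (available since Python 3.8). The sketch below reuses the example with mean 10 and standard deviation 2; the helper `prob_within` is a name introduced here.

```python
from statistics import NormalDist

X = NormalDist(mu=10, sigma=2)  # the normal curve with mean 10 and SD 2

def prob_within(dist, k):
    """Probability that a value falls within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    print(k, round(prob_within(X, k), 3))
```

The same three probabilities come out for any choice of mean and standard deviation, which is why the rule applies to all normal curves.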

  21. Normal Random Variables To compare $x$-values from different distributions, we standardize the values by finding a $z$-score: $z = \frac{x - \mu}{\sigma}$. A $z$-score measures how far $x$ is from the mean in standard deviations. In other words, the $z$-score is the number of standard deviations $x$ is from the mean of the distribution. For example, $z = 1$ means the $x$-value is 1 standard deviation above the mean.
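
A short sketch shows how standardizing lets us compare values from different distributions. The two score distributions below are hypothetical numbers made up for the example, and `z_score` is an illustrative name.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical example: two tests with different means and spreads.
z_test_a = z_score(85, mu=70, sigma=10)
z_test_b = z_score(60, mu=50, sigma=5)

print(z_test_a)  # 1.5
print(z_test_b)  # 2.0
```

Although 85 is the larger raw score, 60 on the second test is further above its own mean when measured in standard deviations.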

  22. Normal Random Variables To compare $x$-values from different distributions, we standardize the values by finding a $z$-score. • If we convert the $x$-values into $z$-scores, the distribution of $z$-scores is also a normal density curve. This curve is called the standard normal distribution. We use a simulation with the standard normal curve to find probabilities for any normal distribution. • We can also work backwards and find the $x$-value for a given probability. We used a different simulation to work backwards from probabilities to $x$-values. With this simulation, we found $x$-values corresponding to quartiles and percentiles. The $z$-score of an $x$-value is the number of standard deviations $x$ is away from the mean. As a formula, this is $z = \frac{x - \mu}{\sigma}$.
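
Working backwards from a probability to an $x$-value can also be done in code instead of with the course simulation. The sketch below swaps in `NormalDist.inv_cdf` from the Python standard library, and assumes a normal variable with mean 10 and standard deviation 2 purely as an example.

```python
from statistics import NormalDist

X = NormalDist(mu=10, sigma=2)  # assumed example distribution

# inv_cdf(p) returns the x-value with probability p to its left.
first_quartile = X.inv_cdf(0.25)
median = X.inv_cdf(0.50)
third_quartile = X.inv_cdf(0.75)
percentile_90 = X.inv_cdf(0.90)

print(round(first_quartile, 2), round(median, 2), round(third_quartile, 2))
print(round(percentile_90, 2))
```

Because the curve is symmetric, the two quartiles sit the same distance on either side of the mean.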

  23. Normal Random Variables Given values of a normal random variable, find an associated probability. The two basic steps in the solution process were as follows: • Convert the $x$-value to a $z$-score. • Use the simulation to find the associated probability. The $z$-score of an $x$-value is the number of standard deviations $x$ is away from the mean. As a formula, this is $z = \frac{x - \mu}{\sigma}$.
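
The two steps can be sketched as follows. In place of the simulation mentioned on the slide, this example uses the standard normal curve from `statistics.NormalDist()`; the function name `probability_below` and the numbers in the usage line are assumptions for illustration.

```python
from statistics import NormalDist

def probability_below(x, mu, sigma):
    """P(X < x) for a normal variable with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma        # step 1: convert the x-value to a z-score
    return NormalDist().cdf(z)  # step 2: area to the left of z under the standard normal curve

# Assumed example: mean 10, SD 2, and the value x = 13 (so z = 1.5).
p = probability_below(13, mu=10, sigma=2)
print(round(p, 4))
```

Computing `NormalDist(10, 2).cdf(13)` directly gives the same probability, which confirms that standardizing does not change the area.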

  24. Quick Review • What do we call a measure of the likelihood that an event occurs? • What do theoretical methods use to determine probabilities? • What are discrete random variables? • What do you use to model the probability distribution for many variables? • What is the empirical rule for normal curves? • What are two ways of determining probabilities? • How do you compare x-values from different distributions? • What does a z-score measure? • The probability of an event is a measure of what?
