Probability and Statistics in the Life Sciences Winter 2011 AMS 110.01 Lecture Note 2

1. Probability and Statistics in the Life Sciences (Winter 2011) AMS 110.01 Lecture Note 2 Donghyung Lee

2. Chapter 5 Sampling Distributions Read Sections 5.1, 5.2, 5.3, 5.4, 5.5, 5.6

3. Section 5.3 Quantitative Observations Statistic: a numerical value calculated using information gathered from a sample. Ex) Average of weights in our class. A statistic is any quantity whose value can be calculated from sample data. Prior to obtaining data, there is uncertainty as to what value of any particular statistic will result. Therefore, a statistic is a random variable and will be denoted by an uppercase letter; a lowercase letter is used to represent the calculated or observed value of the statistic.

4. Section 5.3 Quantitative Observations Example 1) The sample mean, regarded as a statistic (before a sample has been selected or an experiment carried out), is denoted by $\bar{X}$; the calculated value of this statistic is $\bar{x}$. 2) $S$ represents the sample standard deviation thought of as a statistic, and its computed value is $s$.

5. Section 5.3 Quantitative Observations Sampling distribution: a sampling distribution is the probability distribution of a given statistic based on a random sample of a certain size n. It may be considered as the distribution of the statistic for all possible samples of a given size. The sampling distribution depends on the underlying distribution of the population, the statistic being considered, and the sample size used. (from wikipedia.com)

6. Section 5.3 Quantitative Observations Sampling distribution of the sample mean 1) A sampling distribution of the sample mean is the probability distribution of the sample mean based on a random sample of a certain size n. 2) It is the probability distribution that describes sampling variability in the sample mean.

7. Section 5.3 Quantitative Observations Sampling distribution of the sample mean 1. MEAN: The mean of the sampling distribution of the sample mean is equal to the population mean: $\mu_{\bar{X}} = \mu$. 2. STANDARD DEVIATION: The standard deviation of the sampling distribution of the sample mean is equal to the population standard deviation divided by the square root of the sample size: $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

8. Section 5.3 Quantitative Observations Sampling distribution of the sample mean 3. Shape (a) If the population distribution of X is normal, then the sampling distribution of $\bar{X}$ is normal, regardless of the sample size n. (b) Central Limit Theorem: If n is large, then the sampling distribution of $\bar{X}$ is approximately normal, even if the population distribution of X is not normal.

9. Section 5.3 Quantitative Observations Proposition Let $X_1, \dots, X_n$ be a random sample from a normal distribution with mean $\mu$ and standard deviation $\sigma$. Then for any n, $\bar{X}$ is normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The Central Limit Theorem (CLT) Let $X_1, \dots, X_n$ be a random sample from a distribution with mean $\mu$ and standard deviation $\sigma$. Then if n is sufficiently large, $\bar{X}$ has approximately a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The larger the value of n, the better the approximation.
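The mean and standard deviation claims of the CLT can be checked by simulation. The sketch below is only an illustration: the exponential population (mean 1, SD 1), the sample size 36, and the helper name `simulate_sample_means` are all assumptions chosen for the demonstration, not part of the lecture.

```python
import random
import statistics

def simulate_sample_means(draw, n, reps=5000, seed=0):
    """Draw `reps` random samples of size n and return their sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Hypothetical non-normal population: exponential with mean 1 and SD 1.
means = simulate_sample_means(lambda rng: rng.expovariate(1.0), n=36)

print(statistics.fmean(means))  # should be close to mu = 1
print(statistics.stdev(means))  # should be close to sigma / sqrt(n) = 1/6
```

A histogram of `means` would also look roughly bell-shaped even though the population itself is strongly skewed.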

10. Section 5.3 Quantitative Observations Example 1 A large population of seeds of the princess bean Phaseolus vulgaris is to be sampled. The weights of the seeds in the population follow a normal distribution with mean $\mu$ and standard deviation $\sigma$. Suppose now that a random sample of four seeds is to be weighed, and let $\bar{X}$ represent the mean weight of the four seeds. 1. Find the distribution of $\bar{X}$. 2. Find .

11. Section 5.3 Quantitative Observations Example 1 1. Find the distribution of $\bar{X}$. According to the Proposition, the sampling distribution of $\bar{X}$ will be a normal distribution with mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{4} = \sigma/2$.

12. Section 5.3 Quantitative Observations Example 1 2. Find .

13. Section 5.3 Quantitative Observations Example 2 (Exercise 5.22, page 165) Professor Smith conducted a class exercise in which students ran a computer program to generate random samples from a population that had a mean of 50 and a standard deviation of 9 mm. Each of Smith's students took a random sample of size n and calculated the sample mean. Smith found that about 68% of the students had sample means between 48.5 and 51.5 mm. What was n? (Assume that n is large enough that the Central Limit Theorem is applicable.)

14. Section 5.3 Quantitative Observations Example 2 (Exercise 5.22, page 165)
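One way to approach Example 2, sketched here under the usual reading that "about 68%" corresponds to one standard deviation on either side of the mean: the half-width 1.5 mm must equal $\sigma/\sqrt{n}$, so $n = (9/1.5)^2$. The variable names are illustrative, and the second calculation uses the exact normal quantile for comparison.

```python
from statistics import NormalDist

sigma = 9.0        # population SD (mm), from the exercise
half_width = 1.5   # 51.5 - 50 = 50 - 48.5
coverage = 0.68    # "about 68%" of sample means fall in the interval

# Rule of thumb: about 68% of a normal distribution lies within 1 SD.
n_rule = (1.0 * sigma / half_width) ** 2

# Same calculation with the exact z for a central 68% region.
z = NormalDist().inv_cdf(0.5 + coverage / 2)
n_exact = (z * sigma / half_width) ** 2

print(n_rule, round(n_exact, 1))
```

Both versions point to a sample size of about 36.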

15. Section 5.3 Quantitative Observations Example 3 (Page 165, Exercise 5.18) The heights of a certain population of corn plants follow a normal distribution with mean 145 cm and standard deviation 22 cm. 1) What percentage of the plants are between 135 and 155 cm tall? 2) Suppose we were to choose at random from the population a large number of samples of 16 plants each. In what percentage of the samples would the sample mean height be between 135 and 155 cm? 3) If $\bar{X}$ represents the mean height of a random sample of 16 plants from the population, what is 4) If $\bar{X}$ represents the mean height of a random sample of 36 plants from the population, what is

16. Section 5.3 Quantitative Observations Example 3 (Page 165, Exercise 5.18)
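A sketch of the calculations for Example 3. Parts 1 and 2 follow directly from the stated numbers. The probabilities asked for in parts 3 and 4 are not legible in this transcript, so the last line assumes, purely for illustration, that the same interval (135 to 155 cm) is intended with n = 36.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 145.0, 22.0  # population mean and SD of plant height (cm)

def prob_between(lo, hi, sd):
    """P(lo <= value <= hi) for a normal variable with mean mu and the given SD."""
    dist = NormalDist(mu, sd)
    return dist.cdf(hi) - dist.cdf(lo)

p_single = prob_between(135, 155, sigma)             # part 1: one plant
p_mean16 = prob_between(135, 155, sigma / sqrt(16))  # part 2: mean of 16 plants
p_mean36 = prob_between(135, 155, sigma / sqrt(36))  # assumed interval, n = 36

print(round(p_single, 3), round(p_mean16, 3), round(p_mean36, 3))
```

The sample mean is far more likely than a single plant to fall in the interval, because its standard deviation shrinks by a factor of $\sqrt{n}$.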

17. Section 5.3 Quantitative Observations Example 4 Professor Mendell conducted a class exercise in which students ran a computer program to generate random samples from a population that had a mean of 60 and a standard deviation of 12 mm. Each of Mendell's students took a random sample of size n and calculated the sample mean. Mendell found that about 98% of the students had sample means less than or equal to 62.5 mm. What was n? (Assume that n is large enough that the Central Limit Theorem is applicable.)

18. Section 5.3 Quantitative Observations Example 4
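Example 4 can be set up as $P(\bar{X} \le 62.5) = .98$, which means $(62.5 - 60)/(12/\sqrt{n})$ equals the 98th percentile of the standard normal distribution. The sketch below solves for n; the variable names are illustrative, and a table value such as z = 2.05 gives nearly the same answer.

```python
from statistics import NormalDist

mu, sigma = 60.0, 12.0  # population mean and SD (mm)
cutoff = 62.5           # 98% of sample means are at or below this value
z = NormalDist().inv_cdf(0.98)  # 98th percentile of the standard normal

# (cutoff - mu) / (sigma / sqrt(n)) = z  =>  n = (z * sigma / (cutoff - mu))^2
n = (z * sigma / (cutoff - mu)) ** 2
print(round(n, 1))
```

The result is roughly 97, so n was about 97.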

19. Section 5.3 Quantitative Observations Example 5 The time taken by a randomly selected applicant for a mortgage to fill out a certain form has a normal distribution with mean value 10 min and standard deviation 2 min. If five individuals fill out a form on one day and six on another, what is the probability that the sample average amount of time taken on each day is at most 11 min?

20. Section 5.3 Quantitative Observations Example 5
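A sketch for Example 5. It assumes the two days are independent, so the probability that both daily averages are at most 11 min is the product of the two individual probabilities; that independence assumption is not stated explicitly in the slide.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 10.0, 2.0  # mean and SD of the time to fill out the form (min)

def prob_mean_at_most(limit, n):
    """P(sample mean <= limit) for a random sample of size n."""
    return NormalDist(mu, sigma / sqrt(n)).cdf(limit)

p_day1 = prob_mean_at_most(11, 5)  # five applicants
p_day2 = prob_mean_at_most(11, 6)  # six applicants
p_both = p_day1 * p_day2           # assumes the two days are independent

print(round(p_day1, 4), round(p_day2, 4), round(p_both, 4))
```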

21. Section 5.2 Dichotomous Observations Population proportion and sample proportion

22. Section 5.2 Dichotomous Observations Example (Superior Vision, p. 152)




26. Section 5.2 Dichotomous Observations Dependence on Sample Size

27. Section 4.5 The Continuity Correction The Continuity Correction





32. Section 5.5 The Normal Approximation to the Binomial Distribution Normal approximation to binomial distribution (Section 5.5) (a) If n is large, then the binomial distribution can be approximated by a normal distribution with mean $np$ and standard deviation $\sqrt{np(1-p)}$, where n is the sample size (number of independent trials) and p is the population proportion (the probability of success in each independent trial). (b) If n is large, then the sampling distribution of $\hat{p}$ can be approximated by a normal distribution with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$.

33. Section 5.5 The Normal Approximation to the Binomial Distribution Example (Superior Vision, p. 152) In a previous example with n=20 and p=.3, we found a binomial probability exactly. Here we can apply the normal approximation to this probability.
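The specific probability from the earlier example is not legible in this transcript, so the sketch below uses a hypothetical event, $P(Y \le 5)$ with n = 20 and p = .3, to show how the normal approximation compares with the exact binomial value, with and without the continuity correction. The cutoff 5 is an assumption chosen only for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.3
k = 5  # hypothetical cutoff, not taken from the lecture

# Exact binomial probability P(Y <= k).
exact = sum(comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(k + 1))

# Normal approximation with mean np and SD sqrt(np(1-p)).
approx_dist = NormalDist(n * p, sqrt(n * p * (1 - p)))
approx_plain = approx_dist.cdf(k)     # no continuity correction
approx_cc = approx_dist.cdf(k + 0.5)  # with continuity correction

print(round(exact, 4), round(approx_plain, 4), round(approx_cc, 4))
```

For this event the corrected approximation lands noticeably closer to the exact value than the uncorrected one.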


35. Chapter 6 Confidence Intervals Read Sections 6.1, 6.2, 6.3, 6.4, 6.5, 6.6

36. Section 6.1 Statistical Estimation Statistical Estimation 1. determining an estimate of some feature of the population 2. assessing the precision of the estimate

37. Section 6.1 Statistical Estimation Example (Soybean Growth, page 179) As part of a study on plant growth, a plant physiologist grew 13 individually potted soybean seedlings of the type called Wells II. She raised the plants in a greenhouse under identical environmental conditions (light, temperature, soil, and so on). She measured the total stem length (cm) for each plant after 16 days of growth. The data are given as follows: Stem Length (cm): 20.2, 22.9, 23.3, 20.0, 19.4, 22.0, 22.1, 22.0, 21.9, 21.5, 19.7, 21.5, 20.9

38. Section 6.1 Statistical Estimation Example (Soybean Growth, page 179) For the data, the mean is $\bar{x} = 21.34$ cm and the standard deviation is $s = 1.22$ cm. Assume that the 13 observations are a random sample from a population. The population could be described by its mean, $\mu$, and its standard deviation, $\sigma$. $\mu$ = the (population) mean stem length of Wells II soybean plants grown under the specified conditions. $\sigma$ = the (population) SD of stem lengths of Wells II soybean plants grown under the specified conditions.
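The summary statistics for the soybean data can be reproduced directly from the 13 stem lengths. This is a minimal sketch using Python's standard library; the standard error line anticipates the formula $s/\sqrt{n}$ from Section 6.2.

```python
from math import sqrt
from statistics import mean, stdev

stem_lengths = [20.2, 22.9, 23.3, 20.0, 19.4, 22.0, 22.1,
                22.0, 21.9, 21.5, 19.7, 21.5, 20.9]  # cm

x_bar = mean(stem_lengths)          # sample mean, estimates mu
s = stdev(stem_lengths)             # sample SD (divisor n - 1), estimates sigma
se = s / sqrt(len(stem_lengths))    # standard error of the mean

print(round(x_bar, 2), round(s, 2), round(se, 2))
```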

39. Section 6.1 Statistical Estimation Example (Soybean Growth, page 179) $\bar{x} = 21.34$ is an estimate of $\mu$. $s = 1.22$ is an estimate of $\sigma$. In general, $\bar{x}$ is an estimate of $\mu$ and $s$ is an estimate of $\sigma$. CAN WE BELIEVE IN THESE ESTIMATES? We should assess the reliability or precision of these estimates. To quantify our confidence in these estimates we use confidence intervals.

40. Section 6.2 Standard Error of the Mean What is the standard error of the mean? The standard error of the mean, $SE_{\bar{x}} = s/\sqrt{n}$, is an estimate of the standard deviation of the sampling distribution of $\bar{X}$. The SE can be interpreted in terms of the expected sampling error. The SE is a measure of the reliability or precision of $\bar{x}$ as an estimate of $\mu$.

41. Section 6.2 Standard Error of the Mean SE (standard error) Versus SD (standard deviation) - SE: describes the uncertainty (due to sampling error) in the mean of the data. - SD: describes the dispersion of the data. Ex) Lamb Birthweights (page 181) A geneticist weighed 28 female lambs at birth. The lambs were all born in April, were all the same breed (Rambouillet), and were all single births (no twins). The diet and other environmental conditions were the same for all parents. The birthweights (kg) are as follows: DATA: 4.3, 5.2, 6.2, 6.7, 5.3, 4.9, 4.7, 5.5, 5.3, 4.0, 4.9, 5.2, 4.9, 5.3, 5.4, 5.5, 3.6, 5.8, 5.6, 5.0, 5.2, 5.8, 6.1, 4.9, 4.5, 4.8, 5.4, 4.7

42. Section 6.2 Standard Error of the Mean SE (standard error) Versus SD (standard deviation) Ex) Lamb Birthweights (page 181) DATA: 4.3, 5.2, 6.2, 6.7, 5.3, 4.9, 4.7, 5.5, 5.3, 4.0, 4.9, 5.2, 4.9, 5.3, 5.4, 5.5, 3.6, 5.8, 5.6, 5.0, 5.2, 5.8, 6.1, 4.9, 4.5, 4.8, 5.4, 4.7 The mean is 5.17 kg, the standard deviation SD is .65 kg, and the standard error SE is .12 kg. SD: describes the variability from one lamb to the next. SE: describes the variability associated with the sample mean (5.17 kg), viewed as an estimate of the population mean birthweight. Question) What if the sample size n increases? (the behavior of the sample mean, sample SD, and SE)

43. Section 6.2 Standard Error of the Mean Example (Exercise 6.2, page 184) An agronomist measured the heights of n corn plants. The mean height was 220 cm and the standard deviation was 15 cm. Calculate the standard error of the mean if (a) n=25 (b) n=100
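A short sketch of Exercise 6.2 using $SE = s/\sqrt{n}$. The helper name `standard_error` is illustrative.

```python
from math import sqrt

def standard_error(s, n):
    """Standard error of the mean: sample SD divided by sqrt(sample size)."""
    return s / sqrt(n)

s = 15.0  # sample SD of corn plant heights (cm)
se_25 = standard_error(s, 25)    # part (a)
se_100 = standard_error(s, 100)  # part (b)

print(se_25, se_100)
```

Quadrupling the sample size from 25 to 100 halves the standard error, while the SD itself stays near 15 cm.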

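44. Section 6.3 Confidence Interval for $\mu$ Derivation of Confidence Interval for $\mu$ Since $\bar{X}$ is (approximately) normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, $P\left(-1.96 < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < 1.96\right) = .95$, which rearranges to $P\left(\bar{X} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{X} + 1.96\,\sigma/\sqrt{n}\right) = .95$. Thus, the interval $(\bar{X} - 1.96\,\sigma/\sqrt{n},\ \bar{X} + 1.96\,\sigma/\sqrt{n})$ will contain $\mu$ for 95% of all samples.

45. Section 6.3 Confidence Interval for $\mu$ Definition (when $\sigma$ is known) If after observing sample data we compute the observed sample mean $\bar{x}$ and then substitute $\bar{x}$ in place of $\bar{X}$, the resulting fixed interval is called a 95% confidence interval for $\mu$. This CI can be expressed either as: $(\bar{x} - 1.96\,\sigma/\sqrt{n},\ \bar{x} + 1.96\,\sigma/\sqrt{n})$ is a 95% CI for $\mu$; or as: $\bar{x} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{x} + 1.96\,\sigma/\sqrt{n}$ with 95% confidence.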

46. Section 6.3 Confidence Interval for $\mu$ Example (when $\sigma$ is known) Industrial engineers who specialize in ergonomics are concerned with designing workspace and devices operated by workers so as to achieve high productivity and comfort. The article "Studies on Ergonomically Designed Alphanumeric Keyboards" (Human Factors, 1985: 175-187) reports on a study of preferred height for an experimental keyboard with large forearm-wrist support. A sample of n=31 trained typists was selected, and the preferred keyboard height was determined for each typist. The resulting sample average preferred height was $\bar{x} = 80.0$. Assuming that preferred height is normally distributed with $\sigma = 2.0$ (a value suggested by data in the article), obtain the 95% CI for the true average preferred height $\mu$.

47. Section 6.3 Confidence Interval for $\mu$ Example (when $\sigma$ is known) $80.0 \pm 1.96 \cdot 2.0/\sqrt{31} = 80.0 \pm 0.7$. We can be highly confident that 79.3 < $\mu$ < 80.7. This interval is relatively narrow, indicating that $\mu$ has been rather precisely estimated.

48. Section 6.3 Confidence Interval for $\mu$ Interpreting a Confidence Interval A correct interpretation of "95% confidence" relies on the long-run frequency interpretation of probability: To say that an event A has probability .95 is to say that if the experiment on which A is defined is performed over and over again, in the long run A will occur 95% of the time. Suppose we obtain a sample from a population and compute a 95% interval. Then suppose we consider repeating this for a second sample, a third sample, a fourth sample, and so on. Let A be the event that $\bar{X} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{X} + 1.96\,\sigma/\sqrt{n}$. Since P(A)=.95, in the long run 95% of our computed CIs will contain $\mu$.

49. Section 6.3 Confidence Interval for $\mu$ Definition 1 (when $\sigma$ is known) A $100(1-\alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma$ is known is given by $\left(\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}\right)$ or, equivalently, by $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.

50. Section 6.3 Confidence Interval for $\mu$ Example (when $\sigma$ is known) The production process for engine control housing units of a particular type has recently been modified. Prior to this modification, historical data had suggested that the distribution of hole diameters for bushings on the housings was normal with a standard deviation of .100 mm. It is believed that the modification has not affected the shape of the distribution or the standard deviation, but that the value of the mean diameter may have changed. A sample of 40 housing units is selected and hole diameter is determined for each one, resulting in a sample mean diameter of 5.426 mm. Find a confidence interval for the true average hole diameter using a confidence level of 90%.

51. Section 6.3 Confidence Interval for $\mu$ Example With a reasonably high degree of confidence, we can say that 5.400 < $\mu$ < 5.452.
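The 90% interval for the hole diameter can be reproduced with a small helper that implements $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. This is a sketch; the function name `z_interval` is illustrative, and the critical value comes from Python's `statistics.NormalDist` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, conf):
    """Confidence interval for mu when the population SD sigma is known."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)  # z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

lo, hi = z_interval(x_bar=5.426, sigma=0.100, n=40, conf=0.90)
print(round(lo, 3), round(hi, 3))
```

The rounded endpoints agree with the interval 5.400 < $\mu$ < 5.452 stated in the slide.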

52. Section 6.3 Confidence Interval for $\mu$ Definition 2 (when $\sigma$ is unknown and $n \ge 30$) A $100(1-\alpha)\%$ confidence interval for the mean $\mu$ of a population when the value of $\sigma$ is unknown and the sample size is $n \ge 30$ is given by $\left(\bar{x} - z_{\alpha/2}\,s/\sqrt{n},\ \bar{x} + z_{\alpha/2}\,s/\sqrt{n}\right)$ or, equivalently, by $\bar{x} \pm z_{\alpha/2}\,s/\sqrt{n}$.

53. Section 6.3 Confidence Interval for $\mu$ Example (when $\sigma$ is unknown and $n \ge 30$) As part of a study of the treatment of anemia in cattle, researchers measured the concentration of selenium in the blood of 56 cows that had been given a dietary supplement of selenium (2 mg/day). The cows were all the same breed (Santa Gertrudis) and had borne their first calf during the year. The sample mean selenium concentration was 6.21 μg/dLi and the sample standard deviation was 1.84 μg/dLi. Construct a 95% confidence interval for the population mean.

54. Section 6.3 Confidence Interval for $\mu$ Example (when $\sigma$ is unknown and $n \ge 30$)
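A sketch of the selenium calculation using the large-sample rule above, $\bar{x} \pm z_{\alpha/2}\,s/\sqrt{n}$, with the sample SD standing in for $\sigma$. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, s, n = 6.21, 1.84, 56         # sample mean, sample SD, sample size
z = NormalDist().inv_cdf(0.975)      # z_{.025}, about 1.96, for 95% confidence

margin = z * s / sqrt(n)
lo, hi = x_bar - margin, x_bar + margin
print(round(lo, 2), round(hi, 2))
```

Under these assumptions the interval is roughly 5.73 to 6.69 μg/dLi.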

55. Section 6.3 Confidence Interval for $\mu$ t distribution (1) When $\sigma$ is known and the variable is normally distributed, or when $\sigma$ is unknown and $n \ge 30$, the standard normal distribution is used to find confidence intervals for the mean. However, in many situations, the population standard deviation is not known and the sample size is less than 30. In such situations, the standard deviation from the sample can be used in place of the population standard deviation for confidence intervals. But a somewhat different distribution, called the t distribution, must be used when the sample size is less than 30 and the variable is normally or approximately normally distributed.

56. Section 6.3 Confidence Interval for $\mu$ t distribution (2) - Similar to the standard normal distribution in these ways: 1. It is bell-shaped. 2. It is symmetric about the mean. 3. The mean, median, and mode are equal to 0 and are located at the center of the distribution. 4. The curve never touches the x-axis. - Differs from the standard normal distribution in the following ways: 1. The variance is greater than 1. 2. The t distribution is actually a family of curves based on the concept of degrees of freedom (df), which is related to sample size (df = n-1). 3. As the sample size increases, the t distribution approaches the standard normal distribution.

57. Section 6.3 Confidence Interval for $\mu$ t distribution (3)

58. Section 6.3 Confidence Interval for $\mu$ t distribution (4): t distribution table (page 677, Table 4)

59. Section 6.3 Confidence Interval for $\mu$ t distribution (5) When sample size = 10 Find

60. Section 6.3 Confidence Interval for $\mu$ Definition 3 (when $\sigma$ is unknown and n < 30) A $100(1-\alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma$ is unknown and the sample size is n < 30 is given by $\left(\bar{x} - t_{\alpha/2}\,s/\sqrt{n},\ \bar{x} + t_{\alpha/2}\,s/\sqrt{n}\right)$ or, equivalently, by $\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$. The degrees of freedom are n-1.

61. Section 6.3 Confidence Interval for $\mu$ Example (when $\sigma$ is unknown and n < 30) As part of a study of the development of the thymus gland, researchers weighed the glands of five chick embryos after 14 days of incubation. The thymus weights (mg) were as follows: 29.6 21.5 28.0 34.6 44.9 For these data, the mean is 31.72 and the standard deviation is 8.729. (We assume that the population distribution is normal.) Construct a 90% confidence interval for the population mean.

62. Section 6.3 Confidence Interval for $\mu$ Example (when $\sigma$ is unknown and n < 30)
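A sketch of the thymus calculation. Python's standard library has no t quantile function, so the critical value $t_{.05} = 2.132$ for df = 4 is typed in from a standard t table; treat that constant as an assumption to verify against Table 4.

```python
from math import sqrt
from statistics import mean, stdev

weights = [29.6, 21.5, 28.0, 34.6, 44.9]  # thymus weights (mg)
n = len(weights)
x_bar = mean(weights)   # 31.72
s = stdev(weights)      # about 8.729

t_crit = 2.132          # t_{.05} with df = n - 1 = 4, from a t table (assumed)
margin = t_crit * s / sqrt(n)
lo, hi = x_bar - margin, x_bar + margin
print(round(lo, 1), round(hi, 1))
```

Under these assumptions the 90% interval runs from about 23.4 mg to about 40.0 mg.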

63. Section 6.3 Confidence Interval for $\mu$ When to use the z or t distribution

64. Section 6.3 Confidence Interval for $\mu$ Example (Example 6.9, page 192) Low bone mineral density often leads to hip fractures in the elderly. In an experiment to assess the effectiveness of hormone replacement therapy, researchers gave conjugated equine estrogen (CEE) to a sample of 94 women between the ages of 45 and 64. After taking the medication for 36 months, the bone mineral density was measured for each of the 94 women. The average density was .878 g/cm², with a standard deviation of .126 g/cm². Find a 95% confidence interval.

65. Section 6.3 Confidence Interval for $\mu$ Example (Example 6.9, page 192): This is a solution from our textbook. (The degrees of freedom are 93, but we use 80 degrees of freedom since Table 4 doesn't list 93 degrees of freedom.)

66. Section 6.3 Confidence Interval for $\mu$ Example (Example 6.9, page 192): This is my solution. Thus, we are 95% confident that the average hip bone mineral density of all women aged 45 to 64 who take CEE for 36 months is between .852 g/cm² and .904 g/cm².

67. Section 6.3 Confidence Interval for $\mu$ Example A massive multistate outbreak of food-borne illness was attributed to Salmonella enteritidis. Epidemiologists determined that the source of the illness was ice cream. They sampled nine production runs from the company that had produced the ice cream to determine the level of Salmonella enteritidis in the ice cream. These levels (MPN/g) are as follows: .593 .142 .329 .691 .231 .793 .519 .392 .418 Find a 99% confidence interval for the average level of Salmonella enteritidis in the ice cream.

68. Section 6.3 Confidence Interval for $\mu$ Example
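A sketch of the Salmonella calculation with n = 9, so the t interval with df = 8 applies (assuming the levels are approximately normally distributed). The critical value $t_{.005} = 3.355$ is typed in from a standard t table and should be checked against Table 4.

```python
from math import sqrt
from statistics import mean, stdev

levels = [0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418]  # MPN/g
n = len(levels)
x_bar = mean(levels)
s = stdev(levels)

t_crit = 3.355  # t_{.005} with df = 8, from a t table (assumed)
margin = t_crit * s / sqrt(n)
lo, hi = x_bar - margin, x_bar + margin
print(round(x_bar, 3), round(s, 3), round(lo, 3), round(hi, 3))
```

Under these assumptions the 99% interval is roughly .218 to .694 MPN/g.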

69. Section 6.4 Choosing the sample size for estimating $\mu$ Sample Size (1) How can we determine the number of observations to include in the sample? Data collection costs money. If the sample is too large, time and talent are wasted. Conversely, it is wasteful if the sample is too small. Hence, the number of observations to be included in the sample will be a compromise between the desired accuracy of the sample statistic as an estimate of the population parameter and the required time and cost to achieve this degree of accuracy. There are two considerations in determining the appropriate sample size for estimating $\mu$ using a confidence interval. First, the tolerable error establishes the desired width of the interval. The second consideration is the level of confidence. (From An Introduction to Statistical Methods and Data Analysis, 5th edition)

70. Section 6.4 Choosing the sample size for estimating $\mu$ Sample Size (2) Suppose we want to estimate $\mu$ using a $100(1-\alpha)\%$ confidence interval having tolerable error W. The margin of error E is defined to be the half width of a $100(1-\alpha)\%$ CI: $E = W/2 = z_{\alpha/2}\,\sigma/\sqrt{n}$, so the required sample size is $n = \left(z_{\alpha/2}\,\sigma/E\right)^2$, rounded up. Note that determining a sample size to estimate $\mu$ requires knowledge of the population standard deviation $\sigma$.

71. Section 6.4 Choosing the sample size for estimating $\mu$ Sample Size (Example 1) AMS 110 students wanted to estimate the mean height of Stony Brook students with a 95% confidence interval having a tolerable error of 4 cm. The population standard deviation is known to be 7.5 cm. How many students must be included in the sample to achieve their specifications?

72. Section 6.4 Choosing the sample size for estimating $\mu$ Sample Size (Example 1) Thus, 55 students should be included in the sample to achieve their specifications.
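The answer of 55 can be reproduced if the tolerable error W = 4 is read as the full width of the interval, so the margin of error is E = W/2 = 2. The sketch below makes that reading explicit; the helper name `required_sample_size` is illustrative.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, width, conf):
    """Smallest n whose z interval for mu has total width at most `width`."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)  # z_{alpha/2}
    margin = width / 2                        # E = W / 2
    return ceil((z * sigma / margin) ** 2)

n = required_sample_size(sigma=7.5, width=4.0, conf=0.95)
print(n)
```

Rounding up rather than to the nearest integer guarantees the interval is no wider than requested.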

73. Section 6.4 Choosing the sample size for estimating $\mu$ Sample Size (Example 2) An insurance company is concerned about the number of worker compensation claims based on back injuries by baggers in grocery stores. They want to evaluate the fitness of baggers at the many grocery stores they insure. The workers selected for the study will be evaluated to determine the amount of weight that they can lift without undue back stress. From studies by other insurance companies, pounds. How many baggers must be included in the study to be 99% confident that the average weight lifted is estimated to within 8 pounds?

74. Section 6.4 Choosing the sample size for estimating $\mu$ Sample Size (Example 2)

75. Section 6.6 Confidence Interval for a Population Proportion Confidence interval for a population proportion Up to this point in Chapter 6, we have described confidence intervals when the observed variable is quantitative. Now we will turn our attention to situations in which the variable is categorical and the parameter of interest is a population proportion. Suppose a geneticist observes n guinea pigs whose coat color can be either black or white; let us fix attention on the category "black." Let p denote the population proportion of the category, and let $\hat{p}$ denote the corresponding sample proportion: $\hat{p} = y/n$, where y is the number of "black" out of n. A natural estimate of the population proportion, p, is the sample proportion, $\hat{p}$.

76. Section 6.6 Confidence Interval for a Population Proportion Normal approximation to binomial distribution (Section 5.5) (a) If n is large, then the binomial distribution can be approximated by a normal distribution with mean $np$ and standard deviation $\sqrt{np(1-p)}$, where n is the sample size (number of independent trials) and p is the population proportion (the probability of success in each independent trial). (b) If n is large, then the sampling distribution of $\hat{p}$ can be approximated by a normal distribution with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$.


78. Section 6.6 Confidence Interval for a Population Proportion Wald confidence interval (1) If n is large, $\hat{p}$ is approximately normally distributed with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$.

79. Section 6.6 Confidence Interval for a Population Proportion Wald confidence interval (2)

80. Section 6.6 Confidence Interval for a Population Proportion Wald confidence interval (3) Most books present the Wald confidence interval, since it is much simpler in form. However, the Wald confidence interval has poor coverage properties: a nominal 95% Wald confidence interval might actually cover p only 80% of the time, rather than 95% of the time. In addition, the Wald confidence interval doesn't work properly when the sample proportion is exactly one or exactly zero, when the sample size is small, or when the probability p is extreme.

81. Section 6.6 Confidence Interval for a Population Proportion Wald confidence interval (4) $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, where $\hat{p} = y/n$ (the sample proportion)

82. Section 6.6 Confidence Interval for a Population Proportion Wilson confidence interval: This interval has good properties even for a small number of trials (small sample size) and/or an extreme probability. $\tilde{p} \pm z_{\alpha/2}\sqrt{\tilde{p}(1-\tilde{p})/(n + z_{\alpha/2}^2)}$, where $\tilde{p} = (y + 0.5\,z_{\alpha/2}^2)/(n + z_{\alpha/2}^2)$ (the modified sample proportion)

83. Section 6.6 Confidence Interval for a Population Proportion Example 1 (Ex. 6.16, page 208) BRCA1 is a gene that has been linked to breast cancer. Researchers used DNA analysis to search for BRCA1 mutations in 169 women with family histories of breast cancer. Of the 169 women tested, 27 (16%) had BRCA1 mutations. Let p denote the probability that a woman with a family history of breast cancer will have a BRCA1 mutation. 1) Find 95% and 99% Wald confidence intervals for p. 2) Find 95% and 99% Wilson confidence intervals for p.

84. Section 6.6 Confidence Interval for a Population Proportion Example 1 (Ex. 6.16, page 208) 1) 95% and 99% Wald confidence intervals

85. Section 6.6 Confidence Interval for a Population Proportion Example 1 (Ex. 6.16, page 208) 2) 95% and 99% Wilson confidence intervals
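A sketch of both intervals for the BRCA1 data (y = 27, n = 169). The Wald interval uses $\hat{p} = y/n$. For the Wilson interval, the slide's formula is not legible in this transcript, so the code assumes the adjusted form $\tilde{p} = (y + 0.5z^2)/(n + z^2)$ with standard error $\sqrt{\tilde{p}(1-\tilde{p})/(n+z^2)}$; the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(y, n, conf):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p_hat = y / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def adjusted_interval(y, n, conf):
    """Wilson-adjusted interval (assumed form) built around p_tilde."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p_tilde = (y + 0.5 * z**2) / (n + z**2)
    margin = z * sqrt(p_tilde * (1 - p_tilde) / (n + z**2))
    return p_tilde - margin, p_tilde + margin

for conf in (0.95, 0.99):
    print(conf, wald_interval(27, 169, conf), adjusted_interval(27, 169, conf))
```

With a sample this large and a proportion well away from 0 and 1, the two methods give similar intervals.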


87. Section 6.6 Confidence Interval for a Population Proportion Example 2 (Ex. 6.16, page 209) Extracorporeal membrane oxygenation (ECMO) is a potentially life-saving procedure that is used to treat newborn babies who suffer from severe respiratory failure. An experiment was conducted in which 11 babies were treated with ECMO; none of the 11 babies died. Let p denote the probability of death for a baby treated with ECMO. 1) Find 95% and 99% Wald confidence intervals for p. 2) Find 95% and 99% Wilson confidence intervals for p.

88. Section 6.6 Confidence Interval for a Population Proportion Example 2 (Ex. 6.16, page 209) 1) 95% and 99% Wald confidence intervals


90. Section 6.6 Confidence Interval for a Population Proportion Example 2 (Ex. 6.16, page 209) 2) 95% and 99% Wilson confidence intervals We know that p cannot be negative, so we state the confidence interval as (0, .299). Thus, we are 95% confident that the probability of death in a newborn with severe respiratory failure who is treated with ECMO is between 0 and .299.


92. Section 6.6 Confidence Interval for a Population Proportion Example 2 (Ex. 6.16, page 209) 2) 95% and 99% Wilson confidence intervals We know that p cannot be negative, so we state the confidence interval as (0, .429). Thus, we are 99% confident that the probability of death in a newborn with severe respiratory failure who is treated with ECMO is between 0 and .429.
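The ECMO data (y = 0, n = 11) show why the Wald interval breaks down: with $\hat{p} = 0$ its margin is zero. The sketch below assumes the adjusted form $\tilde{p} = (y + 0.5z^2)/(n + z^2)$ for the Wilson interval and clips the lower limit at 0. It uses unrounded z values, so the upper limits can differ from the slide's .299 and .429 in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(y, n, conf):
    """Wald interval; collapses to a single point when y = 0 or y = n."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p_hat = y / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def adjusted_interval(y, n, conf):
    """Wilson-adjusted interval (assumed form), clipped to the range [0, 1]."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    p_tilde = (y + 0.5 * z**2) / (n + z**2)
    margin = z * sqrt(p_tilde * (1 - p_tilde) / (n + z**2))
    return max(0.0, p_tilde - margin), min(1.0, p_tilde + margin)

print(wald_interval(0, 11, 0.95))      # zero-width interval at 0
print(adjusted_interval(0, 11, 0.95))  # upper limit near .30
print(adjusted_interval(0, 11, 0.99))  # upper limit near .43
```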
