
Lecture 5 probability model normal distribution & binomial distribution

Lecture 5 probability model normal distribution & binomial distribution. xiaojinyu@seu.edu.cn. Contents. Normal distribution for continuous data Binomial distribution for binary categorical data. The Normal Distribution. The most important distribution in statistics. Normal distribution.


Presentation Transcript


  1. Lecture 5: Probability models: normal distribution & binomial distribution. xiaojinyu@seu.edu.cn

  2. Contents • Normal distribution for continuous data • Binomial distribution for binary categorical data

  3. The Normal Distribution The most important distribution in statistics.

  4. Normal distribution • Introduction to normal distribution • History • Parameters and shape • Standard normal distribution and Z score • Area under the curve • Application • Estimate of frequency distribution • Reference interval (range) in health-related fields.

  5. History of the Normal Distribution • Johann Carl Friedrich Gauss (1777~1855) • Germany • One of the greatest mathematicians • Applied the distribution in physics and astronomy • Hence also called the Gaussian distribution [Figure: German mark banknote and stamp issued in memory of Gauss.]

  6. The Most Important Distribution • Many real-life distributions are approximately normal, such as height, FEV1, weight, IQ, and so on. • Many other distributions can be almost normalized by an appropriate data transformation (e.g. taking the log). When log X has a normal distribution, X is said to have a lognormal distribution.

  7. [Figure, panels (a) to (d): Frequency distributions of heights of adult men.]

  8. Sample & Population • Sample (histogram): the area of the bars is the cumulative relative frequency, i.e. the proportion of 12-year-old boys in the sample who are shorter than a specified height. • Population (normal distribution curve): the area under the curve is the cumulative probability, i.e. the chance that a normally growing 12-year-old boy is shorter than a specified height.

  9. Definition of Normal distribution • X ~ N(μ, σ²): X follows a normal distribution with mean μ and variance σ². • The probability density function (PDF) f(x) of a normal distribution is given by $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, $-\infty < x < +\infty$, where e = 2.7182818285 is the base of the natural logarithm and π = 3.1415926536 is the ratio of the circumference of a circle to its diameter.
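
The density formula on this slide can be evaluated directly. The following is a minimal sketch, not part of the original lecture; the function name `normal_pdf` is an illustrative choice, and the result is cross-checked against `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) at x, written out term by term from the formula."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


# Height of the standard normal curve at its peak, x = mu = 0
peak = normal_pdf(0.0)

# Cross-check against the standard library implementation
reference = NormalDist(mu=0.0, sigma=1.0).pdf(0.0)

print(f"{peak:.4f} {reference:.4f}")
```

The peak value of about 0.4 agrees with the top of the vertical axis in the shape figure for the standard normal curve.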

  10. The shape of a normal distribution [Figure: bell-shaped density curve; vertical axis f(x) from 0 to .4, horizontal axis x.]

  11. The normal distributions with equal variance but different means [Figure: three normal curves of identical shape centred at μ1, μ2, μ3.]

  12. The normal distributions with the same mean but different variances [Figure: three normal curves centred at the same mean μ with different spreads σ1, σ2, σ3.]

  13. Properties Of Normal Distribution • μ and σ completely determine the normal distribution. • Mean, median, and mode are equal. • The curve is symmetric about the mean. • The relationship between σ and the area under the normal curve provides another main characteristic of the normal distribution.

  14. Areas under the Standard Normal Curve • A variable that has a normal distribution with mean 0 and variance 1 is called the standard normal variate and is commonly designated by the letter Z. • N(0,1) • As with any continuous variable, probability calculations here are always concerned with finding the probability that the variable assumes any value in an interval between two specific points a and b.

  15. Cumulative distribution function Φ(x) • The area under the curve from -∞ to x, i.e. the cumulative probability • Example: What is the probability of obtaining a z value of 0.5 or less? • We have S(-∞, +∞) = 1
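
The example question on this slide can be answered numerically. The sketch below is an addition, not the lecture's own method: it assumes the standard identity Φ(z) = ½(1 + erf(z/√2)) and uses a helper named `phi`, which is an illustrative name.

```python
import math


def phi(z: float) -> float:
    """Cumulative probability P(Z <= z) for the standard normal variate Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Area under the standard normal curve to the left of z = 0.5
p_half = phi(0.5)
print(round(p_half, 4))  # 0.6915
```

By symmetry, `phi(-0.5)` is 1 minus this value, which matches the 0.3085 entry in the printed table for z = -0.5.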

  16. Area under standard normal distribution (Z). Each entry is the area to the left of z, where z is the row value plus the column offset.

        Z      0.00    -0.02   -0.04   -0.06   -0.08
      -3.0    0.0013  0.0013  0.0012  0.0011  0.0010
      -2.5    0.0062  0.0059  0.0055  0.0052  0.0049
      -2.0    0.0228  0.0217  0.0207  0.0197  0.0188
      -1.9    0.0287  0.0274  0.0262  0.0250  0.0239
      -1.6    0.0548  0.0526  0.0505  0.0485  0.0465
      -1.0    0.1587  0.1539  0.1492  0.1446  0.1401
      -0.5    0.3085  0.3015  0.2946  0.2877  0.2810
       0      0.5000  0.4920  0.4840  0.4761  0.4681

      Z is the standard score, i.e. the value expressed in units of standard deviation.

  17. Figure Standard normal curve and some important divisions. • P(-1 < z < 1)=0.6826 • P(-2 < z < 2)=0.9545 • P(-3 < z < 3)=0.9974
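
The three central areas on this slide can be recomputed as differences of cumulative probabilities. This is a sketch using `statistics.NormalDist` from the Python standard library; the helper name `central_area` is an illustrative choice. Full-precision results can differ from the slide in the last digit, because the slide values are derived from a table rounded to four decimals.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, N(0, 1)


def central_area(k: float) -> float:
    """P(-k < Z < k): area within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)


areas = {k: central_area(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"P(-{k} < Z < {k}) = {area:.4f}")
```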

  18. Find probability in Excel • Using a spreadsheet, find the area under the standard normal density to the left of 2.824. • We use the Excel 2007 function NORMSDIST evaluated at 2.824, i.e. NORMSDIST(2.824), which gives approximately 0.9976.

  19. EXAMPLE • What is the probability of obtaining a z value between 1.0 and 1.58? • We have P(1.0 < Z < 1.58) = Φ(1.58) - Φ(1.0) = 0.9429 - 0.8413 = 0.1016
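
The lookups in slides 18 and 19 can also be done outside Excel. The sketch below uses Python's `statistics.NormalDist` as a stand-in for NORMSDIST; the variable names are illustrative, and nothing here is taken from the original slides beyond the z values.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, N(0, 1)

# Slide 18: area to the left of 2.824 (the quantity NORMSDIST(2.824) returns)
left_of_2824 = Z.cdf(2.824)

# Slide 19: the area between two points is the difference of two cumulative areas
between = Z.cdf(1.58) - Z.cdf(1.0)

print(f"{left_of_2824:.4f}")
print(f"{between:.4f}")
```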

  20. Cumulative probability for X ~ N(μ, σ²) • Z = (X - μ)/σ, X = μ + Zσ [Figure: normal curve with the x axis marked at μ-3σ, μ-2σ, μ-σ, μ, μ+σ, μ+2σ, μ+3σ.]

  21. Areas under the Normal Curve • S(-∞, μ+3σ) = 0.9987 • S(-∞, μ+2σ) = 0.9772 • S(-∞, μ+1σ) = 0.8413 • S(-∞, +∞) = 1 • S(-∞, μ-3σ) = 0.0013 • S(-∞, μ-2σ) = 0.0228 • S(-∞, μ-1σ) = 0.1587 • S(-∞, μ) = 0.5 [Figure: normal curve with the x axis marked from μ-3σ to μ+3σ and the matching Z axis from -4 to 4.]

  22. Area Under Normal Curve • S(-∞, μ-3σ) = 0.0013 • S(-∞, μ-2σ) = 0.0228 • S(-∞, μ-1σ) = 0.1587 • S(-∞, μ) = 0.5 • S(μ-3σ, μ-2σ) = 0.0215 • S(μ-2σ, μ-1σ) = 0.1359 • S(μ-1σ, μ) = 0.3413 [Figure: normal curve with the x axis marked from μ-3σ to μ+3σ and the Z axis from -3 to 3.]

  23. Area Under Normal Curve • 95% of the area lies between μ-1.96σ and μ+1.96σ, with 2.5% in each tail.

  24. Area Under Normal Curve • 90% of the area lies between μ-1.64σ and μ+1.64σ, with 5% in each tail.

  25. Area Under Normal Curve • 99% of the area lies between μ-2.58σ and μ+2.58σ, with 0.5% in each tail.

  26. Area Under Normal Curve • 95% of the area lies between μ-1.96σ and μ+1.96σ, with 2.5% in each tail.
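
The cut-off values on these slides are quantiles of the standard normal distribution. A minimal sketch, assuming Python's `statistics.NormalDist.inv_cdf` as the inverse of the cumulative distribution function (the helper name `two_sided_cutoff` is illustrative):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, N(0, 1)


def two_sided_cutoff(central: float) -> float:
    """Return z such that P(-z < Z < z) equals the given central proportion."""
    tail = (1.0 - central) / 2.0      # area left in each tail
    return Z.inv_cdf(1.0 - tail)      # quantile that leaves `tail` to its right


cutoffs = {c: two_sided_cutoff(c) for c in (0.90, 0.95, 0.99)}
for central, z in cutoffs.items():
    print(f"{central:.0%} central area: z = {z:.3f}")
```

The slides round these quantiles to two decimals (1.64, 1.96, 2.58); at three decimals the 90% and 99% values are 1.645 and 2.576.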

  27. 95% of the heights of females fall in the range between mean - 1.96 SD and mean + 1.96 SD.

  28. Z score, Standard Score • Transform N(μ, σ²) to N(0, 1): z = (x - μ)/σ; z is referred to as the standard normal score • How many SDs is the observation from the mean? • A transformation of a normal distribution such that the units are SDs (z score, standard score) • In units of SD, we can compare observations from different populations, e.g. a female with height 172 cm and a male with height 172 cm.
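
The comparison of the two 172 cm heights can be sketched as follows. The population means and SDs below are made-up illustration values, not figures from the lecture; only the transformation z = (x - μ)/σ comes from the slide.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Hypothetical population parameters in cm (assumptions for illustration only)
FEMALE_MEAN, FEMALE_SD = 160.0, 6.0
MALE_MEAN, MALE_SD = 169.0, 6.0

female_z = z_score(172.0, FEMALE_MEAN, FEMALE_SD)
male_z = z_score(172.0, MALE_MEAN, MALE_SD)

print(f"female: z = {female_z:.2f}, male: z = {male_z:.2f}")
```

Under these assumed parameters the same height is two SDs above the female mean but only half an SD above the male mean, so it is far more unusual for a female.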

  29. Values of variable & area under curve • The area over an interval under a nonstandard normal curve is the same as the area under the standard normal curve between the corresponding z-boundaries (also written u-boundaries).

  30. The Most Important Distribution • In practice: many real-life distributions are approximately normal, such as height, weight, IQ, GB, and so on. • In theory: many other distributions can be almost normalized by an appropriate data transformation (e.g. taking the log).

  31. Summarizing • The fundamental probability distribution of statistics. • A very important distribution both in theory and in practice. • The normal distribution is a family of infinitely many curves, each defined by its mean and SD. • N(0,1) is unique. • The areas under normal curves are equal when measured in units of standard deviation.

  32. Applications of Normal distribution • Estimate frequency distribution • Estimate Reference Range

  33. Estimate frequency distribution Example: • Suppose birth weights follow a normal distribution with mean 3150 g and standard deviation 350 g. • Estimate the proportion of infants whose birth weight is less than 2500 g.

  34. Solution for the example: • The standard normal deviate at x = 2500: Z = (x - 3150)/350 = (2500 - 3150)/350 = -1.86 • The probability that Z < -1.86 under the standard normal distribution: Φ(-1.86) = P(Z < -1.86) = 0.0314 • Result: about 3.14% of infants have a birth weight less than 2500 g.
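
The birth-weight calculation can be reproduced in a few lines. This is a sketch using Python's `statistics.NormalDist`; it shows both the table-style answer (z rounded to two decimals, as on the slide) and the value obtained without rounding z.

```python
from statistics import NormalDist

# Birth weight in grams, with the mean and SD given in the example
birth_weight = NormalDist(mu=3150, sigma=350)

# Standardise the threshold of 2500 g
z = (2500 - birth_weight.mean) / birth_weight.stdev

# Table-style answer: round z to two decimals before looking up the area
p_table = NormalDist().cdf(round(z, 2))

# Answer without rounding z
p_exact = birth_weight.cdf(2500)

print(f"z = {z:.2f}, table value = {p_table:.4f}, unrounded = {p_exact:.4f}")
```

The small gap between the two probabilities comes only from rounding z to -1.86 before the lookup.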

  35. Estimate Frequency Distribution [Figure: normal curve with mean 3150 g; the shaded area to the left of 2500 g is 0.0314.]

  36. Using Normal Distribution • For any normally distributed variable, 95% of individuals take values between μ-1.96σ and μ+1.96σ; • 99% between μ-2.58σ and μ+2.58σ; • And so on.

  37. Reference Interval (Range) • In health-related fields, a reference range or reference interval usually describes the variations of a measurement or value in healthy individuals. • It is a basis for a physician or other health professional to interpret a set of results for a particular patient. • The standard definition of a reference range (usually referred to if not otherwise specified) basically originates in what is most prevalent in a reference group taken from the population. However, there are also optimal health ranges, which are those that appear to have the optimal health impact on people.

  38. Reference Interval (Range) • What is it? • A range of values within which the majority of measurements from “normal” subjects will lie. • Majority: 90%, 95%, 99%, etc. • Usage: • Used as the basis for assessing the results of diagnostic tests in the clinic (normal or abnormal?). • Definition of a “normal subject”: • Normal ≠ healthy • A normal subject may suffer from other diseases, as long as they do not influence the variable being studied.

  39. How to estimate a reference interval? • Homogeneity of normal subjects. 100 • Measurement errors are controlled • One-sided or two-sided? • Majority: 90% or 95%? • Is it necessary to estimate the RI in subgroups? (Consider partitioning by age, sex, etc.) • Determine the suspect range if necessary

  40. Two-sided or One-sided • Determined by medical professionals. • Two-sided: • WBC, BP, serum total cholesterol, … • One-sided: • Upper limit: urine Ld, hair Hg, … (normal as long as the value is lower than the limit) • Lower limit: vital capacity, IQ, FEV1 (forced expiratory volume in one second) (normal as long as the value is greater than the limit)

  41. Overlapping distributions of observations for normal and abnormal subjects (one-sided) [Figure labels: normal subjects, abnormal subjects, false-negative rate, false-positive rate, cut-off value.]

  42. Overlapping distributions of observations for normal and abnormal subjects (one-sided) [Figure labels: normal subjects, abnormal subjects, false-negative rate, false-positive rate.]

  43. Overlapping distributions of observations for normal and abnormal subjects (two-sided) [Figure labels: normal subjects in the centre, abnormal subjects on both sides, false-negative rate, false-positive rate.]

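  44. Normal approximation method • For normally distributed data • A 95% reference interval (with sample mean $\bar{X}$ and sample standard deviation $S$) • Two-sided: $\bar{X} \pm 1.96S$ • One-sided: for the upper limit: $< \bar{X} + 1.64S$; for the lower limit: $> \bar{X} - 1.64S$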

  45. Percentile Method • For non-normally distributed data • A 95% reference interval • Two-side: P2.5 ~ P97.5 • One-side: For upper limit: <P95 For low limit: >P5
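
The percentile method can be sketched with `statistics.quantiles` from the Python standard library. The data below are simulated right-skewed values generated purely for illustration (they are not from the lecture), and the variable names are assumptions.

```python
import random
from statistics import quantiles

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical right-skewed measurements (simulated, not lecture data)
values = [rng.lognormvariate(0.0, 0.5) for _ in range(1000)]

# n=40 gives 39 cut points at 2.5%, 5%, ..., 97.5%
cuts = quantiles(values, n=40)
p2_5, p5, p95, p97_5 = cuts[0], cuts[1], cuts[37], cuts[38]

# Two-sided 95% reference interval: P2.5 ~ P97.5
print(f"two-sided: {p2_5:.3f} ~ {p97_5:.3f}")
# One-sided 95% limits: below P95 (upper limit) or above P5 (lower limit)
print(f"upper limit: < {p95:.3f}")
print(f"lower limit: > {p5:.3f}")

# Share of the sample that falls inside the two-sided interval
inside = sum(p2_5 <= v <= p97_5 for v in values) / len(values)
print(f"proportion inside: {inside:.3f}")
```

Because the limits are read off the sorted data, no assumption about the shape of the distribution is needed, which is why this method suits non-normal data.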

  46. Example • Hb (hemoglobin) for 360 normal males. • The mean is 13.45 g/100ml; • The standard deviation is 0.71 g/100ml; • Hb is normally distributed. • Estimate the 95% reference range and the 90% reference range.

  47. Example (cont.) • Two-sided • The 95% reference range is 13.45 ± 1.96 × 0.71, i.e. 12.06~14.84 (g/100ml)

  48. Example (cont.) • Two-sided • The 90% reference range is 13.45 ± 1.64 × 0.71, i.e. 12.29~14.61 (g/100ml) • The 95% reference range is 12.06~14.84 (g/100ml)
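
The hemoglobin reference ranges can be checked with a short calculation. This is a sketch under the assumption that the slides use the multipliers 1.96 (95%) and 1.64 (90%), which reproduces the printed limits; the function name `reference_range` is illustrative.

```python
def reference_range(mean: float, sd: float, z: float) -> tuple[float, float]:
    """Two-sided reference range: mean - z*SD to mean + z*SD."""
    return (mean - z * sd, mean + z * sd)


MEAN_HB, SD_HB = 13.45, 0.71  # g/100ml, from the example

low95, high95 = reference_range(MEAN_HB, SD_HB, 1.96)
low90, high90 = reference_range(MEAN_HB, SD_HB, 1.64)

print(f"95%: {low95:.2f} ~ {high95:.2f} g/100ml")
print(f"90%: {low90:.2f} ~ {high90:.2f} g/100ml")
```

Using the more precise 90% multiplier 1.645 instead of 1.64 would shift the 90% limits by about 0.01 g/100ml.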

  49. Two methods for reference intervals (95%)

      Method       Two-sided         One-sided (lower)   One-sided (upper)
      Normal       mean ± 1.96 SD    > mean - 1.64 SD    < mean + 1.64 SD
      Percentile   P2.5~P97.5        > P5                < P95

  50. Central Limit Theorem • As the sample size increases, the distribution of the means of samples drawn from a population with any distribution approaches the normal distribution. This is known as the central limit theorem (CLT). • That is: sampling distributions • Probability and the central limit theorem
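
The theorem can be illustrated by simulation. The sketch below is an addition, not part of the lecture: it assumes an exponential population with mean 1 and SD 1 (clearly non-normal), draws repeated samples, and compares the spread of the sample means with the theoretical value σ/√n. The names `sample_means`, `REPS`, and the seed are illustrative choices.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the sketch is repeatable
REPS = 2000              # number of samples drawn for each sample size


def sample_means(n: int) -> list[float]:
    """Means of REPS samples of size n from an exponential population (mean 1, SD 1)."""
    return [mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(REPS)]


results = {}
for n in (2, 30):
    means = sample_means(n)
    results[n] = (mean(means), stdev(means))
    print(f"n={n:2d}: mean of means={results[n][0]:.3f}, "
          f"SD of means={results[n][1]:.3f}, sigma/sqrt(n)={1 / math.sqrt(n):.3f}")
```

Plotting a histogram of the means for n = 2 and n = 30 would show the skewed shape at n = 2 giving way to a roughly symmetric bell shape at n = 30.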
