Lecture 5 Probability model: normal distribution & binomial distribution • xiaojinyu@seu.edu.cn
Contents • Normal distribution for continuous data • Binomial distribution for binary categorical data
The Normal Distribution The most important distribution in statistics.
Normal distribution • Introduction to normal distribution • History • Parameters and shape • Standard normal distribution and Z score • Area under the curve • Application • Estimate of frequency distribution • Reference interval (range) in health-related fields.
History of the Normal Distribution • Johann Carl Friedrich Gauss (1777~1855) • Germany • One of the greatest mathematicians • Applied in physics and astronomy • Also called the Gaussian distribution • Figure: Mark banknote and stamp in memory of Gauss.
The Most Important Distribution • Many real-life distributions are approximately normal, such as height, FEV1, weight, IQ, and so on. • Many other distributions can be almost normalized by an appropriate data transformation (e.g. taking the log). When log X has a normal distribution, X is said to have a lognormal distribution.
Figure: Frequency distributions of heights of adult men, panels (a) to (d).
Sample & Population • Sample (histogram): the area of the bars is the cumulative relative frequency in the sample, i.e. the proportion of boys aged 12 who are shorter than a specified height. • Population (normal distribution curve): the area under the curve is the cumulative probability, i.e. the chance that a boy aged 12 is shorter than a specified height if he grows normally.
Definition of Normal distribution • $X \sim N(\mu, \sigma^2)$: X is distributed as a normal distribution with mean $\mu$ and variance $\sigma^2$. • The probability density function (PDF) f(x) for a normal distribution is given by $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, $-\infty < x < +\infty$. • Where: e = 2.7182818285, the base of the natural logarithm; $\pi$ = 3.1415926536, the ratio of the circumference of a circle to its diameter.
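The density of N(μ, σ²) can be evaluated directly from its standard formula. The following is a minimal Python sketch; the function name `normal_pdf` and the evaluation points are illustrative choices, not part of the lecture.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) at x, using the standard normal PDF formula."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Height of the standard normal curve at its mean (the peak of the bell).
peak = normal_pdf(0.0)
# The curve is symmetric about the mean, so these two values coincide.
left = normal_pdf(-1.0)
right = normal_pdf(1.0)

print(round(peak, 4))  # 0.3989
```

The peak of about 0.4 matches the top of the vertical axis in the plot of the standard normal curve.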
Figure: The shape of a normal distribution (density f(x), from 0 to 0.4, plotted against x).
Figure: Normal distributions with equal variance but different means (curves 1, 2, 3).
Figure: Normal distributions with the same mean but different variances (curves 1, 2, 3).
Properties of Normal Distribution • μ and σ completely determine the normal distribution. • Mean, median, and mode are equal. • The curve is symmetric about the mean. • The relationship between σ and the area under the normal curve provides another main characteristic of the normal distribution.
Areas under the Standard Normal Curve • A variable that has a normal distribution with mean 0 and variance 1 is called the standard normal variate and is commonly designated by the letter Z. • N(0,1) • As with any continuous variable, probability calculations here are always concerned with finding the probability that the variable assumes any value in an interval between two specific points a and b.
Cumulative distribution function • The area under the curve from -∞ to x is the cumulative probability. • The total area is S(-∞, +∞)=1. • Example: What is the probability of obtaining a z value of 0.5 or less? • We have P(Z ≤ 0.5) = 1 - P(Z ≤ -0.5) = 1 - 0.3085 = 0.6915.
Area under standard normal distribution (Z): cumulative area to the left of Z
Z       0.00     -0.02    -0.04    -0.06    -0.08
-3.0    0.0013   0.0013   0.0012   0.0011   0.0010
-2.5    0.0062   0.0059   0.0055   0.0052   0.0049
-2.0    0.0228   0.0217   0.0207   0.0197   0.0188
-1.9    0.0287   0.0274   0.0262   0.0250   0.0239
-1.6    0.0548   0.0526   0.0505   0.0485   0.0465
-1.0    0.1587   0.1539   0.1492   0.1446   0.1401
-0.5    0.3085   0.3015   0.2946   0.2877   0.2810
0       0.5000   0.4920   0.4840   0.4761   0.4681
Z is the standard score, that is, the value expressed in units of standard deviation.
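Entries of the Z table can be reproduced in software instead of being looked up. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper `table_entry` is a hypothetical name introduced here to mimic the row-plus-column lookup.

```python
from statistics import NormalDist

# Standard normal distribution N(0, 1).
Z = NormalDist(mu=0.0, sigma=1.0)

def table_entry(row: float, col: float) -> float:
    """Cumulative area to the left of z = row + col, rounded like the table."""
    return round(Z.cdf(row + col), 4)

# Row -1.0, column -0.06 corresponds to z = -1.06.
entry = table_entry(-1.0, -0.06)
# Probability of a z value of 0.5 or less.
p_half = round(Z.cdf(0.5), 4)

print(entry)   # 0.1446
print(p_half)  # 0.6915
```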
Figure Standard normal curve and some important divisions. • P(-1 < z < 1)=0.6826 • P(-2 < z < 2)=0.9545 • P(-3 < z < 3)=0.9974
Find probability in Excel • Using a spreadsheet, find the area under the standard normal density to the left of 2.824. • We use the Excel 2007 function NORMSDIST evaluated at 2.824, i.e. NORMSDIST(2.824), which returns approximately 0.9976.
EXAMPLE • What is the probability of obtaining a z value between 1.0 and 1.58? • We have P(1.0 < Z < 1.58) = P(Z < 1.58) - P(Z < 1.0) = 0.9429 - 0.8413 = 0.1016.
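The two lookups above (the area to the left of 2.824 and the area between 1.0 and 1.58) can also be done in Python. This is a sketch using the standard-library `statistics.NormalDist`, offered as an alternative to the spreadsheet function; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, mean 0 and standard deviation 1

# Area to the left of z = 2.824 (the spreadsheet example).
left_of_2824 = Z.cdf(2.824)

# Area between two z values is the difference of two cumulative areas.
between = Z.cdf(1.58) - Z.cdf(1.0)

print(round(left_of_2824, 4))  # 0.9976
print(round(between, 4))       # 0.1016
```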
Cumulative probability for X~N(μ,σ²) • Z=(X-μ)/σ, X=μ+Zσ • Figure: normal curve with the x-axis marked at μ-3σ, μ-2σ, μ-σ, μ, μ+σ, μ+2σ, μ+3σ.
Areas under the Normal Curve • S(-∞, μ-3σ)=0.0013 • S(-∞, μ-2σ)=0.0228 • S(-∞, μ-1σ)=0.1587 • S(-∞, μ)=0.5 • S(-∞, μ+1σ)=0.8413 • S(-∞, μ+2σ)=0.9772 • S(-∞, μ+3σ)=0.9987 • S(-∞, +∞)=1 • Figure: x-axis marked from μ-3σ to μ+3σ, with the matching Z axis from -4 to 4.
Area Under Normal Curve • S(-∞, μ-3σ)=0.0013 • S(-∞, μ-2σ)=0.0228 • S(-∞, μ-1σ)=0.1587 • S(-∞, μ)=0.5 • S(μ-3σ, μ-2σ)=0.0215 • S(μ-2σ, μ-1σ)=0.1359 • S(μ-1σ, μ)=0.3413 • Figure: x-axis marked from μ-3σ to μ+3σ, with the matching Z axis from -3 to 3.
Area Under Normal Curve • Figure: the central 95% of the area lies between Z=-1.96 and Z=+1.96, with 2.5% in each tail.
Area Under Normal Curve • Figure: the central 90% of the area lies between Z=-1.64 and Z=+1.64, with 5% in each tail.
Area Under Normal Curve • Figure: the central 99% of the area lies between Z=-2.58 and Z=+2.58, with 0.5% in each tail.
95% of the heights of females will fall in the range between mean - 1.96 SD and mean + 1.96 SD.
Z score, Standard Score • Transforming N(μ,σ²) to N(0,1): z is referred to as the standard normal score. • How many SDs is the observation from the mean? • A transformation of a normal distribution such that the units are SDs (z score, standard score). • In units of SD, we can compare observations from different populations, e.g. a female with height 172 cm and a male with height 172 cm.
Values of variable & area under curve • The area that falls in an interval under a nonstandard normal curve is the same as that under the standard normal curve within the corresponding Z boundaries.
The Most Important Distribution • In practice: many real-life distributions are approximately normal, such as height, weight, IQ, GB, and so on. • In theory: many other distributions can be almost normalized by an appropriate data transformation (e.g. taking the log).
Summarizing • The fundamental probability distribution of statistics. • A very important distribution both in theory and in practice. • The normal distribution is a family of infinitely many curves, each defined by its mean and SD. • N(0,1) is unique. • The areas under any normal curve are equal when measured in units of standard deviation.
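The equality of areas can be checked numerically. In the sketch below the mean of 170 and SD of 6 are hypothetical values chosen only for illustration; any other mean and SD would give the same agreement.

```python
from statistics import NormalDist

# Hypothetical nonstandard normal distribution (illustrative values only).
mu, sigma = 170.0, 6.0
X = NormalDist(mu, sigma)
Z = NormalDist()  # standard normal N(0, 1)

# An interval on the original scale and its boundaries on the Z scale.
a, b = 164.0, 182.0
za, zb = (a - mu) / sigma, (b - mu) / sigma  # -1.0 and 2.0

area_x = X.cdf(b) - X.cdf(a)    # area under the nonstandard curve
area_z = Z.cdf(zb) - Z.cdf(za)  # area under the standard curve

print(round(area_x, 4), round(area_z, 4))
```

Both numbers agree, which is why a single Z table is enough for every normal distribution.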
Applications of Normal distribution • Estimate frequency distribution • Estimate Reference Range
Estimate frequency distribution • Example: if birth weights follow a normal distribution with mean 3150 g and standard deviation 350 g, estimate the proportion of infants whose birth weight is less than 2500 g.
Solution for the example: • The standard normal deviate at x=2500: Z=(x-3150)/350=-1.86 • The probability that Z<-1.86 under the standard normal distribution: Φ(-1.86)=P(Z<-1.86)=0.0314 • Result: about 3.14% of infants have a birth weight below 2500 g.
Figure: Estimate frequency distribution. The area to the left of 2500 g is 0.0314; the curve is centered at the mean, 3150 g.
Using Normal Distribution • For any normally distributed variable, 95% of individuals have values between μ-1.96σ and μ+1.96σ; • 99% have values between μ-2.58σ and μ+2.58σ; • And so on.
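The birth-weight calculation can be reproduced as follows. This is a sketch with the standard-library `statistics.NormalDist`; it shows both the table-style route (round z to two decimals first, as the slide does) and the unrounded result, which differs slightly in the fourth decimal.

```python
from statistics import NormalDist

mean_g, sd_g, cutoff_g = 3150.0, 350.0, 2500.0

# Table-style calculation: standardize, round z to 2 decimals, look up the area.
z = (cutoff_g - mean_g) / sd_g
z_rounded = round(z, 2)                    # -1.86
p_table = NormalDist().cdf(z_rounded)

# Direct calculation on the original scale, without rounding z.
p_exact = NormalDist(mean_g, sd_g).cdf(cutoff_g)

print(z_rounded, round(p_table, 4))  # -1.86 0.0314
```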
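The multipliers 1.96, 2.58 and 1.64 are quantiles of the standard normal distribution. A short sketch of how they can be recovered with `NormalDist.inv_cdf` from the Python standard library (the helper name `central_multiplier` is illustrative):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def central_multiplier(coverage: float) -> float:
    """z such that the central `coverage` share of N(0,1) lies in (-z, +z)."""
    tail = (1.0 - coverage) / 2.0      # area left in each tail
    return Z.inv_cdf(1.0 - tail)

z95 = round(central_multiplier(0.95), 2)
z99 = round(central_multiplier(0.99), 2)
z90 = round(central_multiplier(0.90), 2)

print(z95, z99, z90)  # 1.96 2.58 1.64
```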
Reference Interval (Range) • In health-related fields, a reference range or reference interval usually describes the variation of a measurement or value in healthy individuals. • It is a basis for a physician or other health professional to interpret a set of results for a particular patient. • The standard definition of a reference range (usually the one meant if not otherwise specified) basically originates in what is most prevalent in a reference group taken from the population. However, there are also optimal health ranges, which are those that appear to have the optimal health impact on people.
Reference Interval (Range) • What is it? • A range of values within which the majority of measurements from “normal” subjects will lie. • Majority: 90%, 95%, 99%, etc. • Usage: • Used as the basis for assessing the result of diagnostic tests in the clinic (normal or abnormal?). • Definition of “normal subject”: • Normal does not necessarily mean healthy. • The subject may suffer from other diseases, as long as they do not influence the variable being studied.
How to estimate a reference interval? • Homogeneity of normal subjects. 100 • Measurement errors are controlled • One side? Two sides? • Majority? 90%, 95%? • Is it necessary to estimate the RI in subgroups? (considerations of partitioning based on age, sex, etc.) • Determine the suspect range if necessary
Two-sided or One-sided • Determined by medical professionals. • Two-sided: • WBC, BP, serum total cholesterol, ... • One-sided: • Upper limit: urine Ld, hair Hg, ... Normal as long as the value is lower than the limit. • Lower limit: vital capacity, IQ, FEV1 (forced expiratory volume in one second). Normal as long as the value is greater than the limit.
Figure: Overlapping distributions of observations for normal and abnormal subjects (one-sided), showing the cut-off value, the false-positive rate and the false-negative rate.
Figure: Overlapping distributions of observations for normal and abnormal subjects (two-sided), with abnormal subjects on both sides of the normal subjects, showing the false-positive rate and the false-negative rate.
Normal approximation method • For normally distributed data • A 95% reference interval • Two-sided: $\bar{X} - 1.96S$ ~ $\bar{X} + 1.96S$ • One-sided: For upper limit: $< \bar{X} + 1.64S$ For lower limit: $> \bar{X} - 1.64S$
Percentile Method • For non-normally distributed data • A 95% reference interval • Two-sided: P2.5 ~ P97.5 • One-sided: For upper limit: <P95 For lower limit: >P5
Example • Hb (hemoglobin) for 360 normal males. • The mean is 13.45 g/100ml; • The standard deviation is 0.71 g/100ml; • Hb is normally distributed. • Estimate the 95% reference range and the 90% reference range.
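A sketch of the percentile method in Python. The data here are simulated, right-skewed values generated only for illustration (they are not from the lecture), and `statistics.quantiles` with its default interpolation rule is one of several accepted ways to compute percentiles.

```python
import random
from statistics import quantiles

rng = random.Random(0)
# Hypothetical right-skewed measurements (lognormal), for illustration only.
values = [rng.lognormvariate(0.0, 0.5) for _ in range(1000)]

# n=40 splits the data into 40 equal-probability groups, so the 39 cut
# points are the percentiles P2.5, P5, ..., P97.5.
cuts = quantiles(values, n=40)
p2_5, p5, p95, p97_5 = cuts[0], cuts[1], cuts[37], cuts[38]

# Share of observations inside the two-sided 95% reference interval.
inside = sum(p2_5 <= v <= p97_5 for v in values) / len(values)

print(f"two-sided: {p2_5:.3f} ~ {p97_5:.3f}")
print(f"one-sided upper: < {p95:.3f}, one-sided lower: > {p5:.3f}")
```

Because no distributional shape is assumed, this approach needs a reasonably large sample so that the extreme percentiles are stable.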
Example (cont.) • Two-sided • The 95% reference range is 13.45 ± 1.96 × 0.71, i.e. 12.06~14.84 (g/100ml) • The 90% reference range is 13.45 ± 1.64 × 0.71, i.e. 12.29~14.61 (g/100ml)
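The Hb reference ranges follow from the mean and SD with the usual normal multipliers (1.96 for 95%, 1.64 for 90%). A small sketch that reproduces the arithmetic; the function name `two_sided_range` is an illustrative choice.

```python
def two_sided_range(mean: float, sd: float, multiplier: float) -> tuple[float, float]:
    """Two-sided reference range mean +/- multiplier * sd, rounded to 2 decimals."""
    half_width = multiplier * sd
    return round(mean - half_width, 2), round(mean + half_width, 2)

hb_mean, hb_sd = 13.45, 0.71  # g/100ml, from the example

range_95 = two_sided_range(hb_mean, hb_sd, 1.96)
range_90 = two_sided_range(hb_mean, hb_sd, 1.64)

print(range_95)  # (12.06, 14.84)
print(range_90)  # (12.29, 14.61)
```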
Two methods for 95% reference intervals.
Method       Two-sided                   One-sided (lower limit)   One-sided (upper limit)
Normal       mean-1.96S ~ mean+1.96S     > mean-1.64S              < mean+1.64S
Percentile   P2.5 ~ P97.5                > P5                      < P95
Central Limit Theorem • As the sample size increases, the distribution of the means of samples drawn from a population of any distribution (with finite variance) approaches the normal distribution. This theorem is known as the central limit theorem (CLT). • Related topics: sampling distributions • Probability and the central limit theorem
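The theorem can be illustrated by simulation. The sketch below draws repeated samples from an exponential population (strongly skewed, mean 1 and SD 1); the sample size of 50, the 2000 replicates and the seed are arbitrary choices made for this illustration.

```python
import random
from math import sqrt
from statistics import mean

rng = random.Random(1)
n, reps = 50, 2000  # sample size and number of repeated samples (arbitrary)

# Each entry is the mean of one sample of size n from an exponential population.
sample_means = [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

# If the sample means are roughly normal, about 95% of them should fall
# within 1.96 standard errors of the population mean (1.0).
se = 1.0 / sqrt(n)
within = sum(abs(m - 1.0) <= 1.96 * se for m in sample_means) / reps

print(round(mean(sample_means), 3), round(within, 3))
```

Even though individual exponential values are far from normal, the share of sample means inside the 1.96 standard error band comes out close to 0.95.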