Biometrics 2007 Lecture 8 László Pótó
From the sample to the population…
• Remember: biometrics is about drawing conclusions about the unknown population based on the collected data (the sample).
• Typical questions:
- Is a given lab value (of a group of patients) different from the "healthy" value? (What is the expected value for healthy people?)
- Is a measuring tool or process precise enough (pipette, drug content of pills, box of sugar, and so on)?
- Does a complete series of measurements prove that the values are over a certain limit (air or water pollution, …)?
• The problem: how to draw a conclusion from x̄ and s_x about µ and σ. x̄ and s_x (and n, i.e. the measures of the sample) are known, but what about µ and σ? In other words: which population does the sample come from?
• Two methods:
- estimation
- hypothesis testing
The confidence interval for µ

t values (p = 95%):
n-1     t
2       4.30
5       2.57
8       2.31
10      2.23
15      2.13
20      2.09
50      2.01
1000    1.96
(Z = 1.96)

• For 100 intervals created from n-data samples by x̄ ± t·s_x/√n: in the case of 16 data, the interval mean ± 2.13·s_x/√n contains the expected value of the population with 95% probability.
• Summary: because the intervals are widened (by the t value, which depends on n), 95 out of the 100 intervals again contain µ, even though the intervals have different centers (the means differ) and different lengths (the standard deviations differ).
• x̄ ± t·s_x/√n is the p% confidence interval of µ (for n = 16 and p = 95%, t = 2.13).
• A value inside the interval is a possible µ; a value outside is not (5% error risk).
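The "95 out of 100 intervals" statement can be tried out with a small simulation. The following is only a sketch under assumptions that are not part of the lecture: the population is taken to be normal with µ = 100 and σ = 4, and the function name `coverage` is made up for illustration. The t value 2.13 (n-1 = 15) is the one from the table above.

```python
import math
import random
import statistics

T_95_DF15 = 2.13  # t value for n-1 = 15 at p = 95%, taken from the table above


def coverage(n_intervals, mu=100.0, sigma=4.0, n=16, t_crit=T_95_DF15, seed=1):
    """Count how many simulated intervals mean ± t*s/sqrt(n) contain mu.

    mu and sigma describe an assumed (hypothetical) normal population.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        sd = statistics.stdev(sample)  # sample S.D. with n-1 in the denominator
        half_width = t_crit * sd / math.sqrt(n)
        if mean - half_width <= mu <= mean + half_width:
            hits += 1
    return hits


print(coverage(100), "of 100 intervals contain mu")
print(coverage(10_000) / 10_000, "share of 10000 intervals containing mu")
```

With only 100 intervals the count fluctuates around 95; the share over many intervals should settle close to 0.95.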
Calculation of the confidence interval
• The drug content of pills at a pharmaceutical factory was checked by measuring a sample of 16 pills.
• The measures are: n = 16, mean = 102.1 mg, S.D. = 4 mg. Can the expected value be 100 mg?
• The 95% conf. intv. (in mg): 102.1 ± 2.13·4/√16 = 102.1 ± 2.13 = (99.97, 104.23) mg
• The 100 mg is inside of it, so 100 mg is a possible µ! (with 95% confidence, or 5% error risk)
• Interpretation: when repeating the experiment 100 times (having 100 datasets, so 100 different means and S.D.s) and calculating the 95% CI from each in the above way, 95 out of the 100 different CIs would contain the real expected value (the µ) and only 5 CIs would not.
• But note, please, that we cannot know which of the above 100 our single C.I. is: one of the 95 (that "contain" µ) or one of the 5 (that do not)!
Let's see the second method for answering this kind of question: the hypothesis testing method!
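The arithmetic of the pill example can be checked with a few lines of Python. This is a minimal sketch; the function name `confidence_interval` is an illustrative choice, and the t value 2.13 is read from the lecture's table rather than computed.

```python
import math


def confidence_interval(mean, sd, n, t_crit):
    """Return (low, high) of the interval mean ± t_crit * sd / sqrt(n)."""
    se = sd / math.sqrt(n)  # standard error of the mean
    half_width = t_crit * se
    return mean - half_width, mean + half_width


# Pill example: n = 16, mean = 102.1 mg, S.D. = 4 mg, t = 2.13 for n-1 = 15
low, high = confidence_interval(102.1, 4.0, 16, 2.13)
print(f"95% CI: ({low:.2f}, {high:.2f}) mg")
print("100 mg is inside the interval:", low <= 100.0 <= high)
```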
The hypothesis testing – 1
• An "everyday life" model
• I remember hearing the noise of heavy rain during the night. How can I decide in the morning whether it was rain or just a dream?
1, Suppose it did not rain… (it was just a dream…)
2, Decide what I mean by "probable" and by "not probable"… (this is more or less obvious here!)
3, Estimate how probable the observed fact would be if the hypothesis of point 1 were true (suppose it IS true now!)
4, Decide about the hypothesis ("no rain" in this case)
a, When the result of point 3 is "not probable": reject…
b, When the result of point 3 is "probable": do not reject…
5, Conclusion
• Checking the method: try the opposite hypothesis at point 1
The hypothesis testing – 2
• Hypothesis testing in biometrics
• "The drug content of 16 pills…" example. Mean: 102.1 mg, S.D.: 4 mg. Can the expected value be 100 mg?
1, Suppose that µ = 100 mg is true! There is no significant difference, the difference is just by chance: this is the "null" hypothesis, H0.
2, Let the lower end of "probable" be 5%. This "border for decision" is α, so let α = 0.05 now.
3, If µ = 100 mg, how probable is it that the mean of 16 data differs from it by at least 2.1 mg?
- As last week: the difference between the mean and µ is t·S.E. (here S.E. = 4 mg/√16 = 1 mg), where t follows the t distribution with df = n-1 (here 15).
- In our case t = 2.1/1 = 2.1 (times the S.E.). On the t15 curve, 2.13 would cut off a 5% area (probability), so the probability of a difference of at least 2.1 times the S.E. is > 5%. So p > 0.05 (= "probable"). (figure)
4, Decide about the hypothesis ("µ = 100 mg"): because p > α at point 3 ("probable"), do not reject!
5, Conclusion: the mean is not significantly different from the hypothetical expected value. So µ can be 100 mg!
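Steps 3 and 4 of the pill example can be sketched as follows. The function name `one_sample_t` and the constant `T_CRIT` are illustrative; the critical value 2.13 is again the table value for df = 15 and α = 0.05, not something the code computes.

```python
import math


def one_sample_t(mean, sd, n, mu0):
    """Difference between the sample mean and mu0, expressed in S.E. units."""
    se = sd / math.sqrt(n)
    return (mean - mu0) / se


T_CRIT = 2.13  # t(95%) for df = 15, from the lecture's t table

t = one_sample_t(102.1, 4.0, 16, 100.0)
reject_h0 = abs(t) > T_CRIT

print(f"t = {t:.2f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Comparing |t| with the table value gives only the decision (p < 0.05 or p > 0.05), not the exact p.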
The one sample t test
• We checked how different the mean is from a hypothetical (H0) expected value (in S.E. units: t times).
• When the difference t is big (= the area under the t curve outside of ±t is small, meaning that a difference of at least this size has a small probability if H0 is true), our sample (the fact) speaks against our hypothesis (the null hypothesis).
• See the everyday-life model of hypothesis testing: reject the null hypothesis!
• When the difference t is small (= the area under the t curve outside of ±t is big, meaning that a difference of at least this size has a large probability if H0 is true), our sample (the fact) does not speak against our hypothesis (the null hypothesis).
• See the everyday-life model of hypothesis testing: do not reject the null hypothesis!
• The probability (area) can be calculated from t (and n) using the probability density function. By computer: "p = …" (exact); from a table: "p < …". This is the one sample t test.
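To illustrate how the area can be calculated from t and n, here is a rough sketch that integrates the t density numerically with Simpson's rule, using only the Python standard library. The function names and the number of integration steps are arbitrary choices for illustration; statistical software uses more refined routines for the same quantity.

```python
import math


def t_density(x, df):
    """Probability density function of the t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Area under the t curve outside (-|t|, |t|), by Simpson's rule on [0, |t|]."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t  # the density is symmetric around 0


# Pill example: t = 2.1 with df = 15
print(f"p = {two_sided_p(2.1, 15):.4f}")
```

For the pill example the result is slightly above 0.05, in line with the "p > 0.05" read from the table.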
The Conf. Intv. and the t test – compare - 1
• The confidence interval: the x̄ ± t·s_x/√n interval contains the expected value of the population with a probability that depends on the t value. (In the case of 16 data, the mean ± 2.13·s_x/√n interval contains the expected value with 95% probability.)
- the "inside" of the interval means: confidence (95 cases out of 100)
- the "outside" of the interval means: error risk (5 cases out of 100)
• The hypothesis testing (one sample t test): for any hypothetical µ (H0), a difference of at least t between the mean of our actual sample and µ (so: x̄ - µ) could happen with probability p.
- the probability is the area outside of the (-t, t) interval (t density function)
- When t is big (the probability is small, p < α), reject the null hypothesis
- When t is not big (the probability is not small, p > α), do not reject H0
• What is the case when the µ examined by hypothesis testing
- is inside of the confidence interval, and what if it
- is outside of the confidence interval?
The Conf. Intv. and the t test – compare - 2
• The meaning of the x̄ ± t·s_x/√n interval (right at the border)
• An equivalent meaning is the µ ± t·s_x/√n interval: the probability of "at least this difference" is just 5%
(figure: t curves with the -t(95%) and t(95%) borders, x̄ and µ marked)
The Conf. Intv. and the t test – compare - 3
• For an "outside" hypothetical µ (= "big" distance): t > t(95%), p < α: reject H0
• For an "inside" hypothetical µ (= "small" distance): t < t(95%), p > α: do not reject H0
(figures: positions of x̄ and µ)
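The correspondence between "inside the interval" and "do not reject H0" can be checked numerically for the pill example (mean 102.1 mg, S.D. 4 mg, n = 16, t(95%) = 2.13). This is a sketch only; the function names and the list of hypothetical µ values are chosen for illustration.

```python
import math


def inside_ci(mu0, mean, sd, n, t_crit):
    """True if the hypothetical mu0 lies inside mean ± t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width <= mu0 <= mean + half_width


def keeps_h0(mu0, mean, sd, n, t_crit):
    """True if the one sample t test does not reject H0: mu = mu0."""
    t = (mean - mu0) / (sd / math.sqrt(n))
    return abs(t) <= t_crit


# Hypothetical expected values (in mg) to try against the pill sample
for mu0 in (98.0, 99.0, 100.0, 102.1, 104.0, 105.0):
    by_ci = inside_ci(mu0, 102.1, 4.0, 16, 2.13)
    by_test = keeps_h0(mu0, 102.1, 4.0, 16, 2.13)
    print(f"mu0 = {mu0:5.1f}: inside CI = {by_ci}, do not reject H0 = {by_test}")
```

Both columns agree for every µ tried, because the two conditions are the same inequality written in two ways.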
The error risk of a decision
• For an "outside" hypothetical µ: t > t(95%), p < α: reject H0. The sample has probability p < α if H0 is true; this is "unlikely", so reject the null hypothesis. The decision is wrong with probability p: the risk of the Type 1 error.
• For an "inside" hypothetical µ: t < t(95%) and p > α: do not reject H0. This can also be wrong: Type 2 error.
How to decrease the error risk(s)?
• For the Type 1 error: decrease α
- the "do not reject H0" interval increases
- we would reject fewer and fewer H0, because the area for "small" p values decreases
- the Type 1 error risk decreases: it is the p itself (p < α).
• But meanwhile
- more and more H0 would be accepted, so
- the risk of the Type 2 error increases
• For the Type 2 error: increase α
- the "do not reject H0" interval decreases
- we would accept fewer and fewer H0, because the area for "large" p values decreases
- the Type 2 error risk decreases.
• But meanwhile
- more and more H0 would be rejected, so
- the risk of the Type 1 error increases: it is the p itself (p < α).
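The trade-off between the two error risks can be made visible with a simulation. Everything specific here is an assumption for illustration, not part of the lecture: a normal population with σ = 4, n = 16, H0: µ = 100, and an alternative "true" µ of 103 for the Type 2 case. The critical values 2.947 (α = 0.01) and 1.753 (α = 0.10) for df = 15 are standard t table values that do not appear in the lecture's table; 2.13 is the lecture's value for α = 0.05.

```python
import math
import random
import statistics

# Two-sided critical t values for df = 15 (2.13 from the lecture's table,
# 2.947 and 1.753 from a standard t table)
T_CRIT_DF15 = {0.01: 2.947, 0.05: 2.13, 0.10: 1.753}


def abs_t_values(true_mu, mu0=100.0, sigma=4.0, n=16, runs=5000, seed=7):
    """Simulate samples from N(true_mu, sigma) and return |t| against H0: mu = mu0."""
    rng = random.Random(seed)
    values = []
    for _ in range(runs):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        se = statistics.stdev(sample) / math.sqrt(n)
        values.append(abs(statistics.fmean(sample) - mu0) / se)
    return values


def error_rates(alt_mu=103.0):
    """Return {alpha: (type 1 rate, type 2 rate)} for the assumed scenario."""
    under_h0 = abs_t_values(true_mu=100.0)    # H0 is true: rejecting is a Type 1 error
    under_alt = abs_t_values(true_mu=alt_mu)  # H0 is false: not rejecting is a Type 2 error
    rates = {}
    for alpha, t_crit in sorted(T_CRIT_DF15.items()):
        type1 = sum(t > t_crit for t in under_h0) / len(under_h0)
        type2 = sum(t <= t_crit for t in under_alt) / len(under_alt)
        rates[alpha] = (type1, type2)
    return rates


for alpha, (type1, type2) in error_rates().items():
    print(f"alpha = {alpha:.2f}: type 1 rate = {type1:.3f}, type 2 rate = {type2:.3f}")
```

As α grows, the simulated Type 1 rate rises (it stays close to α itself) while the Type 2 rate falls, which is the trade-off described above.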
What is the best α?
• Example: the case of a murder trial. H0: innocent
- "just not to imprison an innocent!": prefer H0 (decrease α), increasing the risk of the Type 2 error (accepting a wrong null hypothesis)
- "just not to let a murderer go free!": prefer rejecting H0 (increase α), increasing the risk of the Type 1 error (rejecting a good null hypothesis)
• Example: is the new (say the 35th of this kind) drug effective? H0: not
- "just not to produce an ineffective product!" (money!): prefer H0 (decrease α: α = 0.01), increasing the risk of the Type 2 error (accepting a wrong null hypothesis)
• Example: is there some side effect of the new drug? H0: no
- "just not to have a hidden side effect!": prefer rejecting H0 (increase α: α = 0.1), increasing the risk of the Type 1 error (rejecting a good null hypothesis)
• α = 0.05 is a "golden middle" when there are no special preferences
What was it today?
• The confidence interval and the hypothesis testing (for the µ)
• The hypothesis testing
1. H0 is the "no effect" case (only "blind chance" is acting): reject every hypothetical "outside" µ, but not the "inside" ones
2. both decisions have an error risk: the risks of the Type 1 and Type 2 errors
3. when one is decreased, the other will increase (border: α)
4. the choice of this border is not free: increase or decrease it depending on the problem; 5% is a good "middle"
• Coming next: comparing the means of two independent samples (which one is the more effective drug, …?)
From the textbooks:
• Belágyi: pp. 50-58
• Moore: pp. 365-407
Thank you for your attention!
What have we learned so far?
• Why biometrics (importance): decisions based on probability
• The interpretation of probability: variables
• The data: the population and the sample
• Basic terms
• Description and measures of the sample: histogram, mean, S.D.
• The population: distributions, density function, parameters: µ, σ
• The normal distribution
• Applications: is there a difference between a sample mean and an expected value? (Which population does the sample come from?) Two methods:
- The confidence interval for the µ (estimation)
- The one sample t test (hypothesis testing)