
P Values

Robin Beaumont, 8/2/2012, with much help from Professor Chris Wild's material, University of Auckland.


Presentation Transcript


  1. P Values. Robin Beaumont, 8/2/2012. With much help from Professor Chris Wild's material, University of Auckland

  2. Where do they fit in?

  3. Putting it all together [diagram: probability, sampling, statistic, rule, P value]

  4. A P value is a special type of probability:
     • It considers more than one outcome (one event can have more than one outcome)
     • It is a conditional probability
     Probability values:
     • A typical probability value: 0.25
     • A probability must be between 0 and 1, e.g. probability of winning the lottery: yes = 0.0000001, no = 0.9999999
     • All possible outcomes at any one time must add up to 1

  5. Probabilities are relative frequencies

  6. Probability density function
     [Figure: probability density histogram of 48 scores, x-axis scores from 33 to 87, total area = 1; area A = scores below 45, area B = scores above 50]
     p(score < 45) = area A; p(score > 50) = area B
     Multiple outcomes at any one time: P(score < 45 or score > 50) = area A + area B, i.e. just add up the individual outcomes
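Slides 5 and 6 treat a probability as a relative frequency, that is, an area of the histogram. A minimal sketch, assuming 48 *hypothetical* simulated scores (the slide's real data are not given), computes areas A and B as relative frequencies and shows that non-overlapping sets of outcomes simply add. `rel_freq` is an illustrative helper, not something from the slides.

```python
import random
from fractions import Fraction

# Hypothetical data: 48 simulated scores, NOT the data behind the slide.
rng = random.Random(1)
scores = [round(rng.gauss(60, 12)) for _ in range(48)]

def rel_freq(condition, data):
    """Relative frequency: outcomes meeting the condition / all outcomes."""
    return Fraction(sum(1 for s in data if condition(s)), len(data))

p_a = rel_freq(lambda s: s < 45, scores)                 # 'area A'
p_b = rel_freq(lambda s: s > 50, scores)                 # 'area B'
p_a_or_b = rel_freq(lambda s: s < 45 or s > 50, scores)  # A or B

# No score can be both < 45 and > 50, so the two areas just add up.
print(float(p_a), float(p_b), float(p_a_or_b))

# All outcomes together add up to 1: below 45 plus 45 and above.
print(p_a + rel_freq(lambda s: s >= 45, scores))
```

Exact `Fraction` arithmetic is used so the "adds up to 1" check is not blurred by floating-point rounding.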

  7. The 'more extreme' idea: the probability of a value more extreme?
     [Figure: normal distribution (x from −3 to 3) and chi-squared distribution, df = 9 (x from 0 to 30), density curves]

  8. What happens if events affect each other? = Conditional probability
     Example from Taylor, From Patient Data to Medical Knowledge, p160:
     20 people in a room: 8 female + 12 male, 4 of whom have a beard
     P(bearded) = 4/20 = 0.2; P(male) = 12/20 = 0.6
     So is the probability of being a bearded male = 0.2 × 0.6 = 0.12? NO
     Multiply along each branch of the tree to get the end value:
     • P(male) = 12/20 = 0.6; P(female) = 8/20 = 0.4
     • P(bearded | male) = 4/12 = 0.3333; the other male branch is P(clear | male)
     • P(male AND bearded) = P(male) × P(bearded | male) = 0.6 × 0.3333 = 0.2
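The tree calculation on slide 8 can be checked with exact fractions. This sketch reads the slide's numbers as implying that all 4 bearded people are male (since P(bearded) = 4/20 and P(bearded | male) = 4/12); the variable names are illustrative.

```python
from fractions import Fraction

# Counts from the slide's example (Taylor, p160): 20 people, 12 male,
# 4 bearded, all of whom are male.
total, males, bearded_males = 20, 12, 4

p_male = Fraction(males, total)                        # 12/20 = 0.6
p_bearded = Fraction(bearded_males, total)             # 4/20  = 0.2
p_bearded_given_male = Fraction(bearded_males, males)  # 4/12  = 1/3

naive = p_bearded * p_male               # wrong: treats the events as independent
joint = p_male * p_bearded_given_male    # multiply along the tree branch

print(float(naive), float(joint))  # 0.12 0.2
```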

  9. Screening example
     0.1% of the population (i.e. 1 in a thousand) carry a particular faulty gene. A test exists for detecting whether an individual is a carrier of the gene.
     • In people who actually carry the gene, the test provides a positive result with probability 0.9: 90% of the time we get the correct result
     • In people who don't carry the gene, the test provides a positive result with probability 0.01: 1% of the time we get an incorrect positive result
     Let G = person carries the gene, P = test is positive for the gene, N = test is negative for the gene.
     Given that someone has a positive result, find the probability that they actually are a carrier of the gene.
     We want to find P(G | P) = P(G and P) / P(P)
     Need P(P), looking at the two branches ending in P:
     P(P) = P(G and P) + P(G' and P) = 0.0009 + 0.00999 = 0.01089
     So P(G | P) = 0.0009 / 0.01089 ≈ 0.083
     Errors: P(P | G) ≠ P(G | P). ORDER MATTERS
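The screening arithmetic is a direct application of Bayes' rule. This sketch uses only the three probabilities given on the slide and shows how different P(G | P) is from P(P | G).

```python
# Screening example from the slide: 1 in 1000 carry the gene,
# P(positive | carrier) = 0.9, P(positive | non-carrier) = 0.01.
p_g = 0.001
p_pos_given_g = 0.9
p_pos_given_not_g = 0.01

p_g_and_pos = p_g * p_pos_given_g                # 0.0009
p_notg_and_pos = (1 - p_g) * p_pos_given_not_g   # 0.00999
p_pos = p_g_and_pos + p_notg_and_pos             # 0.01089

p_g_given_pos = p_g_and_pos / p_pos              # Bayes: P(G | P)
print(round(p_pos, 5), round(p_g_given_pos, 4))
```

A positive result raises the chance of being a carrier from 0.001 to roughly 0.08, far below the 0.9 of P(P | G).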

  10. Disease / test = conditional probability: P(disease AND test+) = P(disease) × P(test+ | disease)

  11. Observed | hypothesised
     • P(hypothesised value | summary value = x): the probability of obtaining the hypothesised value GIVEN THAT we obtained the summary value x
     • P(summary value = x | hypothesised value): the probability of obtaining summary value x GIVEN THAT I have this hypothesised value

  12. Combining conditional probability + multiple outcomes = P value
     [Figure: chi-squared distribution, df = 9, x from 0 to 30; area above 15 shaded blue]
     Here we have a probability distribution of possible observed values for the chi-square summary statistic GIVEN THAT the hypothesised value is ZERO.
     A P value is a conditional probability considering a range of outcomes. The blue area represents all those values greater than 15: area = 0.0909. This is the P value.
     P value = P(observed chi-square value or one more extreme | value = 0)
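The blue area on slide 12 can be reproduced without a statistics package. This is a minimal sketch assuming an odd number of degrees of freedom, for which the chi-squared upper tail has a closed form built from `math.erfc` and a finite sum; `chi2_upper_tail_odd_df` is a hypothetical helper name, and a library routine (e.g. a chi-squared survival function) would normally be used instead.

```python
import math

def chi2_upper_tail_odd_df(x: float, df: int) -> float:
    """P(X >= x) for a chi-squared variable with an odd number of df.

    Closed form for odd df:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{r=1}^{(df-1)/2} x^(r-1) / (1*3*...*(2r-1))
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("this sketch only handles odd df")
    tail = math.erfc(math.sqrt(x / 2))
    term_sum, term = 0.0, 1.0        # the r = 1 term is x^0 / 1 = 1
    for r in range(1, (df - 1) // 2 + 1):
        term_sum += term
        term *= x / (2 * r + 1)      # next term: multiply by x / (2r + 1)
    return tail + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * term_sum

# Slide 12: area above 15 under a chi-squared distribution with df = 9.
print(round(chi2_upper_tail_odd_df(15, 9), 4))
```

With df = 1 the sum is empty and the formula reduces to the two-tailed normal area, which is a handy sanity check.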

  13. Probability summary
     • All outcomes at any one time add up to 1
     • Probability histogram: area under the curve = 1, so specific areas = sets of outcomes
     • “More extreme than x”
     • Conditional probability: ORDER MATTERS
     • A P value is a conditional probability which considers a range of outcomes

  14. Putting it all together [diagram: probability, sampling, statistic, rule, P value]

  15. Populations and samples
     • Population value = parameter: ever constant, at least for your study!
     • Sample estimate = statistic

  16. One sample

  17. Size matters – single samples

  18. Size matters – multiple samples

  19. We only have a rippled mirror

  20. Standard deviation: individual level
     • Standard deviation = measure of variability
     • 'Standard normal distribution': total area = 1; about 68% of the area lies within ±1 SD of the mean and about 95% within ±2 SD
       [Figure: standard normal curve with SD values 0, 1, 2 and the 68% and 95% areas marked]
     • Between −3 and +3 standard deviations from the mean = 99.7% of the area; therefore only 0.3% of the area (scores) is more than 3 standard deviations ('units') away
     • But this does not take into account small sample size: use the t distribution, whose shape is defined by a sample-size aspect (df). Area! Wait and see
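The 68%, 95% and 99.7% areas quoted on slide 20 follow from the standard normal curve. A short sketch using `math.erf`; `area_within` is an illustrative helper name.

```python
import math

def area_within(k: float) -> float:
    """Area under the standard normal curve between -k and +k SDs."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = area_within(k)
    print(f"within ±{k} SD: {inside:.4f}   beyond: {1 - inside:.4f}")
```

The "beyond" column for k = 3 is the 0.3% of scores more than three SD units away.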

  21. Sampling level: 'accuracy' of the estimate (talking about means here)
     We can predict the accuracy of your estimate (the mean) from a single sample just by using the SEM formula, SEM = SD / √n:
     • SD = 5, n = 5: SEM = 5/√5 = 2.236
     • SD = 5, n = 25: SEM = 5/√25 = 1
     From: http://onlinestatbook.com/stat_sim/sampling_dist/index.html
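The SEM formula predicts how much sample means vary from sample to sample. This sketch compares SD/√n with the spread of means from many simulated samples; the population (normal, mean 100, SD 5) is a hypothetical choice made to match the SD of 5 on the slide, and the helper names are illustrative.

```python
import math
import random
import statistics

def sem(sd: float, n: int) -> float:
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

def simulated_sd_of_means(pop_mean, pop_sd, n, reps=5000, seed=0):
    """Draw many random samples of size n and return the SD of their means."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(pop_mean, pop_sd) for _ in range(n))
             for _ in range(reps)]
    return statistics.stdev(means)

# Hypothetical population: normal, mean 100, SD 5.
for n in (5, 25):
    print(n, round(sem(5, n), 3), round(simulated_sd_of_means(100, 5, n), 3))
```

The simulated spread should land close to 2.236 for n = 5 and close to 1 for n = 25, which is the point of the formula: one sample is enough to predict it.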

  22. Example: Bradford Hill (Bradford Hill, 1950 p.92)
     • Mean systolic blood pressure for 566 males around Glasgow = 128.8 mm; standard deviation = 13.05
     • Determine the 'precision' of this mean.
     • "We may conclude that our observed mean may differ from the true mean by as much as ± 2.194 (.5485 x 4) but not more than that in around 95% of observations." (page 93) [edited]
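The 0.5485 in the quotation is the SEM for this sample. A sketch of the arithmetic using the figures on the slide: 0.5485 × 4 = 2.194 is also the total width of the band running from −2 SEM to +2 SEM (that is, ±1.097).

```python
import math

# Figures from Bradford Hill (1950) as given on the slide.
n, mean, sd = 566, 128.8, 13.05

sem = sd / math.sqrt(n)
print(round(sem, 4))        # standard error of the mean
print(round(2 * sem, 3))    # half-width of a ±2 SEM band
print(round(4 * sem, 3))    # 0.5485 x 4, the figure in the quotation
print(round(mean - 2 * sem, 1), round(mean + 2 * sem, 1))  # ±2 SEM band
```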

  23. Sampling summary
     The SEM formula allows us to:
     • predict the accuracy of your estimate (i.e. the mean value of our sample)
     • from a single sample
     • assuming a random sample

  24. Variation: what have we ignored? Onto probability now

  25. Putting it all together [diagram: sampling, probability, statistic, rule, P value]

  26. Statistics
     • Summary measure: SEM, average, etc.
     • t statistic: there are different types; the simplest is t = (estimated mean − hypothesised population mean) / SEM
       – When t = 0: 0/anything = 0, so the estimated and hypothesised population means are equal
       – When t = 1: the observed difference is the same as the SEM
       – When t = 10: the observed difference is much greater than the SEM

  27. t statistic example
     Serum amylase values from a random sample of 15 apparently healthy subjects: mean = 96, SD = 35 units/100 ml. How likely would such a sample be obtained from a population of serum amylase determinations with a mean of 120? (taken from Daniel 1991 p.202, adapted)
     A population value = the null hypothesis.
     This looks like a rare occurrence? But for what?
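Plugging the slide 27 figures into the simple t statistic of slide 26 gives the 9.037 and −2.656 that appear on the next slide. A minimal sketch:

```python
import math

# Daniel (1991) example as given on the slide.
n, sample_mean, sample_sd = 15, 96, 35
hypothesised_mean = 120   # the null hypothesis value

sem = sample_sd / math.sqrt(n)               # estimated SEM, about 9.037
t = (sample_mean - hypothesised_mean) / sem  # about -2.656
print(round(sem, 3), round(t, 3))
```

The observed difference (−24) is about 2.7 SEMs, between the "t = 1" and "t = 10" cases on slide 26.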

  28. What does the shaded area mean?
     [Figure: t density for n = 15, drawn in original units (centred on 120, sample mean 96, SEM = 9.037) and in t units (centred on 0, ±2.656 marked); shaded area = 0.0188]
     Serum amylase values from a random sample of 15 apparently healthy subjects: mean = 96, SD = 35 units/100 ml. How likely would such a sample be obtained from a population of serum amylase determinations with a mean of 120? (taken from Daniel 1991 p.202, adapted)
     Given that the sample was obtained from a population with a mean of 120, a sample with a t(n=15) statistic of −2.656 or 2.656, or one more extreme, will occur 1.8% of the time, i.e. just under two samples per hundred on average.
     Given that the sample was obtained from a population with a mean of 120, a sample of 15 producing a mean of 96 (120 − x, where x = 24) or 144 (120 + x, where x = 24), or one more extreme, will occur 1.8% of the time, that is, just under two samples per hundred on average.
     But is this not a P value? p = 2 · P(t(n−1) < t | H0 is true) = 2 · [area to the left of t under a t distribution with df = n − 1], for a negative t such as −2.656

  29. P value and probability for the t statistic
     p value = 2 × P(t(n−1) value more extreme than the observed t | H0 is true) = 2 · [area to the left of t under a t distribution with df = n − 1], for negative t as here
     A p value is a special type of probability with: multiple outcomes + conditional upon the specified parameter value
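The shaded area of 0.0188 can be recovered from the t density directly. This is a sketch only: instead of a statistics library it integrates the Student t density with Simpson's rule, using the symmetry of the curve (total area 1, half on each side of 0). `t_density` and `two_sided_p` are hypothetical helper names.

```python
import math

def t_density(x: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """2 x area beyond |t|, computed as 1 - 2 x (area from 0 to |t|).

    The area from 0 to |t| uses Simpson's rule (steps must be even).
    """
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = t_density(a, df) + t_density(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(a + i * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t

t = (96 - 120) / (35 / math.sqrt(15))   # slide example, df = n - 1 = 14
print(round(two_sided_p(t, df=14), 4))
```

Doubling one tail is what makes this a two-sided P value: values as far above 120 as 96 is below it count as "more extreme" too.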

  30. Putting it all together [diagram: sampling, probability, statistic, rule, P value]. Do we need it?

  31. Rules
     [Figure: the same t density for n = 15 (SEM = 9.037, ±2.656 marked); shaded area = 0.0188]
     Set a level of acceptability = critical value (CV)! Say one in twenty (1/20 = 0.05), or 1/100, or 1/1000, or . . .
     If our result has a P value less than our level of acceptability, reject the parameter value.
     Say 1 in 20 (i.e. CV = 0.05): given that the sample was obtained from a population with a mean (parameter value) of 120, a sample with a t(n=15) statistic of −2.656 or 2.656, or one more extreme, will occur 1.8% of the time. This is less than one in twenty, therefore we dismiss the possibility that our sample came from a population with a mean of 120.
     What do we replace it with?
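The rule on slide 31 is a single comparison, and the decision depends on the level chosen beforehand. A sketch applying it to the slide's P value of 0.0188 at the three levels mentioned; `decide` is an illustrative name.

```python
def decide(p_value: float, level: float) -> str:
    """Rule from the slide: reject the parameter value if p < the chosen level."""
    return "reject" if p_value < level else "do not reject"

p_value = 0.0188   # P value from the serum amylase example
for level in (1 / 20, 1 / 100, 1 / 1000):
    print(f"level {level:g}: {decide(p_value, level)}")
```

The same data lead to rejection at 1 in 20 but not at 1 in 100, so the level has to be fixed before looking at the result.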

  32. Fisher: we only know, and only consider, the model we have, i.e. the parameter we have used in our model; when we reject it, we accept that any value but that one can replace it.
     Neyman and Pearson + Gossling: must have an alternative specified value for the parameter.

  33. If there is an alternative, what is it? Another distribution!
     • Power: sample size
     • Effect size: indication of clinical importance
     Serum amylase values from a random sample of 15 apparently healthy subjects: mean = 96, SD = 35 units/100 ml. How likely would such a sample be obtained from a population of serum amylase determinations with a mean of 120? (taken from Daniel 1991 p.202, adapted)

  34. α = the rejection region
     [Figure: two distributions, centred on 96 and on 120, showing correct decisions and incorrect decisions]

  35. • Insufficient power: may never get a significant result, even when the effect size is large
     • Too much power: get a significant result with a trivial effect size
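Slides 33 to 35 link power, sample size and effect size. The sketch below is a normal (z) approximation, an assumption swapped in for the exact t-based power calculation: it treats the SD as known, so it somewhat overstates power for small samples. The alternative mean of 96 against 120 with SD 35 is taken from the slide's example, and `approx_power` is a hypothetical helper.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def approx_power(diff: float, sd: float, n: int, level: float = 0.05) -> float:
    """Normal-approximation power of a two-sided one-sample test.

    diff: true mean minus hypothesised mean; sd: population SD (assumed known).
    """
    z_crit = Z.inv_cdf(1 - level / 2)     # about 1.96 for level = 0.05
    shift = abs(diff) * n ** 0.5 / sd     # expected size of the z statistic
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

# Alternative: true mean 96 vs hypothesised 120, SD 35 (slide values).
for n in (5, 15, 50):
    print(n, round(approx_power(96 - 120, 35, n), 3))

# Too much power: a trivial 1-unit difference with a huge sample.
print(round(approx_power(1, 35, 10_000), 3))
```

Power for the same 24-unit difference climbs from about a third at n = 5 to near certainty at n = 50, while a 1-unit difference is detected most of the time once n reaches 10,000.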

  36. Life after P values
     • Confidence intervals
     • Effect size
     • Description / analysis
     • Bayesian statistics: qualitative approach by the back door!
     Planning to do statistics for your dissertation? See my medical statistics courses:
     • Course 1: www.robin-beaumont.co.uk/virtualclassroom/stats/course1.html
     • YouTube videos to accompany course 1: http://www.youtube.com/playlist?list=PL9F0EBD42C0AB37D0
     • Course 2: www.robin-beaumont.co.uk/virtualclassroom/stats/course2.html
     • YouTube videos to accompany course 2: http://www.youtube.com/playlist?list=PL05FC4785D24C6E68
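Confidence intervals head the list of alternatives. As a sketch, assuming the usual t-based 95% interval for a mean, the serum amylase example gives an interval that excludes 120, matching the P value of 0.0188 being below 0.05. The t quantile is found here by bisection on a Simpson's-rule t CDF rather than from a table or library; all helper names are hypothetical.

```python
import math

def t_density(x: float, df: int) -> float:
    """Probability density of Student's t distribution."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): 0.5 plus the Simpson's-rule area from 0 to x."""
    h = x / steps
    total = t_density(0.0, df) + t_density(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p: float, df: int, lo: float = 0.0, hi: float = 50.0) -> float:
    """Bisection for the x with t_cdf(x) = p (for 0.5 < p < 1)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Serum amylase example: n = 15, mean = 96, SD = 35.
n, mean, sd = 15, 96, 35
sem = sd / math.sqrt(n)
t_crit = t_quantile(0.975, n - 1)        # two-sided 95%, df = 14
lower, upper = mean - t_crit * sem, mean + t_crit * sem
print(round(t_crit, 3), round(lower, 1), round(upper, 1))
```

Unlike the P value, the interval reports the range of plausible population means in the original units, which is closer to the effect-size view on the same slide.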

  37. Your attitude to your data

  38. Where do they fit in?
