
CPH Exam Review Biostatistics

CPH Exam Review: Biostatistics. Lisa Sullivan, PhD, Associate Dean for Education; Professor and Chair, Department of Biostatistics, Boston University School of Public Health. Outline and goals: overview of biostatistics (core area); terminology and definitions; practice questions.


Presentation Transcript


  1. CPH Exam Review: Biostatistics. Lisa Sullivan, PhD, Associate Dean for Education; Professor and Chair, Department of Biostatistics, Boston University School of Public Health

  2. Outline and Goals • Overview of Biostatistics (Core Area) • Terminology and Definitions • Practice Questions. An archived version of this review, along with the PPT file, will be available on the NBPHE website (www.nbphe.org) under Study Resources.

  3. Biostatistics: Two Areas of Applied Biostatistics • Descriptive statistics: summarize a sample selected from a population • Inferential statistics: make inferences about population parameters based on sample statistics

  4. Variable Types • Dichotomous variables have 2 possible responses (e.g., Yes/No) • Ordinal and categorical variables have more than two responses; the responses are ordered (ordinal) or unordered (categorical) • Continuous (or measurement) variables can, in theory, assume any value between a theoretical minimum and maximum

  5. We want to study whether individuals over 45 years are at greater risk of diabetes than those younger than 45. What kind of variable is age? • Dichotomous • Ordinal • Categorical • Continuous

  6. We are interested in assessing disparities in infant morbidity by race/ethnicity. What kind of variable is race/ethnicity? • Dichotomous • Ordinal • Categorical • Continuous

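  7. Numerical Summaries of Dichotomous, Categorical and Ordinal Variables • Frequency distribution table: frequencies and relative frequencies, plus cumulative frequencies and cumulative relative frequencies (ordinal variables only)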
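
A frequency distribution table can be built directly from raw responses. The sketch below is a minimal Python illustration under stated assumptions: the disease-stage responses are hypothetical (the slide's own table is not reproduced), and `frequency_table` is a hypothetical helper name. The cumulative columns are meaningful only because the stage levels are passed in their natural order.

```python
from collections import Counter
from itertools import accumulate

def frequency_table(responses, levels):
    """Rows of (level, frequency, relative freq., cumulative freq., cumulative relative freq.)."""
    counts = Counter(responses)
    n = len(responses)
    freqs = [counts.get(level, 0) for level in levels]
    # Cumulative columns assume `levels` is listed in its natural order (ordinal data).
    cum_freqs = list(accumulate(freqs))
    return [(level, f, f / n, cf, cf / n) for level, f, cf in zip(levels, freqs, cum_freqs)]

# Hypothetical ordinal data: disease stage for 40 patients (not from the slides).
stages = ["I", "II", "III", "IV"]
responses = ["I"] * 12 + ["II"] * 18 + ["III"] * 6 + ["IV"] * 4

table = frequency_table(responses, stages)
for level, f, rel, cf, cum_rel in table:
    print(f"Stage {level:<4}{f:>4}{rel:>7.2f}{cf:>5}{cum_rel:>7.2f}")
```

For a dichotomous or unordered categorical variable, only the first two numeric columns would be reported.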

  8. Frequency Bar Chart

  9. Relative Frequency Histogram

  10. Continuous Variables • Assume, in theory, any value between a theoretical minimum and maximum • Quantitative, measurement variables • Example: systolic blood pressure. Standard summary: n = 75, x̄ = 123.6, s = 19.4; second sample: n = 75, x̄ = 128.1, s = 6.4
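
The standard summary on this slide (n, x̄, s) can be computed with Python's built-in `statistics` module. This is a sketch on a small hypothetical SBP sample, not the slide's n = 75 data; `statistics.stdev` uses the n − 1 denominator of the sample standard deviation.

```python
import statistics

# Hypothetical systolic blood pressures (mm Hg); not the slide's data.
sbp = [108, 112, 118, 120, 124, 126, 130, 134, 142]

n = len(sbp)
x_bar = statistics.mean(sbp)  # sample mean
s = statistics.stdev(sbp)     # sample standard deviation, divisor n - 1
print(f"n = {n}, mean = {x_bar:.1f}, s = {s:.1f}")
```

As with the two slide samples, two data sets can have similar means but very different values of s.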

  11. Summarizing Location and Variability • When there are no outliers, the sample mean and standard deviation summarize location and variability • When there are outliers, the median and interquartile range (IQR) summarize location and variability, where IQR = Q3 − Q1 • Outliers are values < Q1 − 1.5 × IQR or > Q3 + 1.5 × IQR
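
The 1.5 × IQR outlier rule can be sketched in Python as follows. Assumptions: the data are hypothetical, `iqr_summary` is a hypothetical helper, and quartiles use `statistics.quantiles(..., method="inclusive")`, which is only one of several quartile conventions, so other software may report slightly different Q1 and Q3. The returned minimum, Q1, median, Q3 and maximum are also the values drawn in a box-and-whisker plot.

```python
import statistics

def iqr_summary(data):
    """Five-number summary, IQR, outlier fences, and values flagged by the 1.5 x IQR rule."""
    # "inclusive" is one quartile convention among several; results can differ by software.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower or x > upper]
    return {
        "min": min(data), "Q1": q1, "median": median, "Q3": q3, "max": max(data),
        "IQR": iqr, "fences": (lower, upper), "outliers": outliers,
    }

# Hypothetical SBP values with one extreme observation.
summary = iqr_summary([110, 114, 118, 120, 122, 125, 128, 131, 190])
print(summary)
```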

  12. Mean Vs. Median

  13. Box and Whisker Plot [Figure: box-and-whisker plot with Min, Q1, Median, Q3 and Max marked]

  14. Comparing Samples with Box and Whisker Plots [Figure: box plots of systolic blood pressure for samples 1 and 2]

  15. What type of display is shown below? [Figure: Percent Patients by Disease Stage] • Frequency bar chart • Relative frequency bar chart • Frequency histogram • Relative frequency histogram

  16. The distribution of SBP in men aged 20-29 years is shown below. What is the best summary of a typical value? • Mean • Median • Interquartile range • Standard deviation

  17. When data are skewed, the mean is higher than the median. • True • False

  18. The best summary of variability for the following continuous variable is • Mean • Median • Interquartile range • Standard Deviation

  19. Numerical and Graphical Summaries • Dichotomous and categorical • Frequencies and relative frequencies • Bar charts (freq. or relative freq.) • Ordinal • Frequencies, relative frequencies, cumulative frequencies and cumulative relative frequencies • Histograms (freq. or relative freq.) • Continuous • n, x̄ and s, or median and IQR (if outliers) • Box-and-whisker plot

  20. What is the probability of selecting a male with optimal blood pressure? • 20/25 • 20/80 • 20/150

      Blood Pressure Category
              Optimal  Normal  Pre-Htn  Htn  Total
      Male         20      15       15   30     80
      Female        5      15       25   25     70
      Total        25      30       40   55    150

  21. What is the probability of selecting a patient with Pre-Htn or Htn? • 95/150 • 45/80 • 55/150

      Blood Pressure Category
              Optimal  Normal  Pre-Htn  Htn  Total
      Male         20      15       15   30     80
      Female        5      15       25   25     70
      Total        25      30       40   55    150

  22. What proportion of men have prevalent CVD? • 35/80 • 35/265 • 35/300

            CVD  Free of CVD
      Men    35          265
      Women  45          355

  23. What proportion of patients with CVD are men? • 35/700 • 35/80 • 80/300

            CVD  Free of CVD
      Men    35          265
      Women  45          355

  24. Are Family History and Current CVD Status Independent? Example. Consider a table that cross-classifies subjects by their family history of CVD and their current (prevalent) CVD status (table not reproduced). P(Current CVD | Family Hx) = 15/105 = 0.143; P(Current CVD | No Family Hx) = 25/240 = 0.104. Because these two conditional probabilities differ, family history and current CVD status are not independent.
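
Two events A and B are independent when P(A | B) = P(A | not B) = P(A). A minimal Python sketch, using only the counts quoted on this slide (15 of 105 subjects with a family history and 25 of 240 without have current CVD); `conditional_risks` is a hypothetical helper. This compares sample probabilities only; judging whether a difference could be due to chance would need a test such as the χ² test of independence, which is not shown here.

```python
def conditional_risks(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Return P(outcome | exposed), P(outcome | unexposed) and the overall P(outcome)."""
    p_exposed = cases_exposed / n_exposed
    p_unexposed = cases_unexposed / n_unexposed
    p_overall = (cases_exposed + cases_unexposed) / (n_exposed + n_unexposed)
    return p_exposed, p_unexposed, p_overall

# Counts quoted on the slide: family history (exposed) vs no family history.
p_fh, p_no_fh, p_cvd = conditional_risks(15, 105, 25, 240)
print(round(p_fh, 3), round(p_no_fh, 3), round(p_cvd, 3))

# Under independence, both conditional probabilities would equal the overall probability.
print("equal conditional probabilities:", abs(p_fh - p_no_fh) < 1e-12)
```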

  25. Are symptoms independent of disease? • No • Yes

                         Disease  No Disease  Total
      Symptoms                25         225    250
      No Symptoms             50         450    500

  26. Probability Models – Binomial Distribution • Two possible outcomes: success and failure • n replications of the process, which are independent • P(success) = p is constant for each replication • Mean = np, variance = np(1 − p)
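
The binomial probability of x successes in n independent replications is P(X = x) = C(n, x) p^x (1 − p)^(n − x). The Python sketch below, assuming hypothetical values n = 10 and p = 0.3, evaluates every probability and checks that the mean and variance match np and np(1 − p). `binomial_pmf` is a hypothetical helper; `math.comb` requires Python 3.8 or later.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical example: 10 patients, each with P(success) = 0.3.
n, p = 10, 0.3
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

mean = sum(x * px for x, px in enumerate(probs))
var = sum((x - mean) ** 2 * px for x, px in enumerate(probs))
print(round(mean, 6), round(var, 6))  # should agree with np and np(1 - p)
```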

  27. Probability Models – Poisson Distribution • Two possible outcomes: success and failure • Replications of the process are independent • Often used to model counts, especially counts of rare events • Mean = μ, variance = μ
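
For a Poisson count with mean μ, P(X = x) = e^(−μ) μ^x / x!. A minimal Python sketch, assuming a hypothetical rare event that averages μ = 2 per month; `poisson_pmf` is a hypothetical helper. The truncated sums confirm that the mean and the variance both equal μ.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson count with mean mu."""
    return exp(-mu) * mu**x / factorial(x)

# Hypothetical example: an event occurring on average mu = 2 times per month.
mu = 2.0
p_zero = poisson_pmf(0, mu)
p_at_most_2 = sum(poisson_pmf(x, mu) for x in range(3))

# Truncate at x = 59; the remaining tail is negligible for mu = 2.
xs = range(60)
mean = sum(x * poisson_pmf(x, mu) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, mu) for x in xs)
print(round(p_zero, 4), round(p_at_most_2, 4), round(mean, 6), round(var, 6))
```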

  28. Probability Models – Normal Distribution • Model for continuous outcome • Mean=median=mode

  29. Normal Distribution: Properties i) The normal distribution is symmetric about the mean (i.e., P(X > μ) = P(X < μ) = 0.5). ii) The mean and variance (μ and σ²) completely characterize the normal distribution. iii) The mean = the median = the mode. iv) Approximately 68% of observations fall between mean ± 1 sd, 95% between mean ± 2 sd, and more than 99% between mean ± 3 sd.
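
Property iv) can be checked numerically with `statistics.NormalDist` from Python's standard library (Python 3.8+). Because every normal distribution is a shifted and rescaled standard normal, the probability of falling within k standard deviations of the mean is the same for all of them; this sketch computes it for k = 1, 2, 3.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Probability within k standard deviations of the mean (same for any normal distribution).
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, prob in coverage.items():
    print(f"mean ± {k} sd: {prob:.4f}")
```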

  30. Normal Distribution Body mass index (BMI) for men age 60 is normally distributed with a mean of 29 and standard deviation of 6. What is the probability that a male has BMI < 29? P(X < 29) = 0.5 [Figure: normal curve of BMI centered at 29]

  31. Normal Distribution What is the probability that a male has BMI less than 30? P(X < 30) = ? [Figure: normal curve of BMI centered at 29]

  32. Standard Normal Distribution: Z is normally distributed with μ = 0 and σ = 1 [Figure: standard normal curve, Z from −3 to 3]

  33. Normal Distribution P(X < 30) = P(Z < (30 − 29)/6) = P(Z < 0.17) = 0.5675, from a table of standard normal probabilities or a statistical computing package.
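
`statistics.NormalDist` (Python 3.8+) can stand in for the printed table. This sketch standardizes X = 30 for the slide's BMI distribution (μ = 29, σ = 6). Rounding Z to 0.17, as a table lookup does, reproduces 0.5675; using the unrounded Z = 0.1667 gives a slightly smaller probability, about 0.566.

```python
from statistics import NormalDist

bmi = NormalDist(mu=29, sigma=6)  # BMI for men age 60, as given on the slide

z = (30 - bmi.mean) / bmi.stdev             # standardize: Z = (X - mu) / sigma
p_exact = bmi.cdf(30)                       # uses the unrounded z
p_table = NormalDist().cdf(round(z, 2))     # z rounded to two decimals, as in a table
print(round(z, 2), round(p_exact, 4), round(p_table, 4))
```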

  34. Comparing Systolic Blood Pressure (SBP) • Suppose that for males age 50, SBP is approximately normally distributed with a mean of 108 and a standard deviation of 14 • Suppose that for females age 50, SBP is approximately normally distributed with a mean of 100 and a standard deviation of 8. If a male age 50 has SBP = 140 and a female age 50 has SBP = 120, who has the “relatively” higher SBP?

  35. Normal Distribution Z(male) = (140 − 108)/14 = 2.29; Z(female) = (120 − 100)/8 = 2.50. Which is more extreme?
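
The comparison rests on expressing each value as a number of standard deviations from its own group mean, Z = (X − μ)/σ. A minimal Python sketch using the slide's means and standard deviations; `z_score` is a hypothetical helper.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (positive) or below (negative) the mean."""
    return (x - mean) / sd

z_male = z_score(140, mean=108, sd=14)
z_female = z_score(120, mean=100, sd=8)
print(f"Z(male) = {z_male:.2f}, Z(female) = {z_female:.2f}")
```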

  36. Percentiles of the Normal Distribution The kth percentile is the score that has k percent of the scores below it; e.g., the 90th percentile is the score that has 90% of the scores below it. Q1 = 25th percentile, median = 50th percentile, Q3 = 75th percentile

  37. Percentiles For the normal distribution, percentiles are computed as X = μ + Zσ, where μ = mean of the random variable X, σ = standard deviation, and Z = the value from the standard normal distribution for the desired percentile (e.g., for the 95th percentile, Z = 1.645). 95th percentile of BMI for men: 29 + 1.645(6) = 38.9
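
The formula X = μ + Zσ can be sketched with `NormalDist().inv_cdf`, which returns the standard normal Z for a given cumulative probability. `normal_percentile` is a hypothetical helper; the inputs are the slide's BMI values (μ = 29, σ = 6).

```python
from statistics import NormalDist

def normal_percentile(k, mu, sigma):
    """kth percentile of a normal variable: X = mu + Z * sigma."""
    z = NormalDist().inv_cdf(k / 100)  # for k = 95, z is about 1.645
    return mu + z * sigma

p95 = normal_percentile(95, mu=29, sigma=6)
print(round(p95, 1))
```

Calling `NormalDist(29, 6).inv_cdf(0.95)` directly gives the same value; the helper only makes the μ + Zσ step explicit.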

  38. Central Limit Theorem • (Non-normal) population with mean μ and standard deviation σ • Take samples of size n, with n sufficiently large (usually n > 30 suffices) • The distribution of the sample mean is then approximately normal, with mean μ and standard error σ/√n, so Z can be used to compute probabilities
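
The theorem can be illustrated by simulation. This Python sketch assumes a hypothetical right-skewed population (exponential with μ = σ = 1), draws 5,000 samples of size n = 36, and checks that the sample means center on μ, have a standard deviation close to σ/√n = 1/6, and fall within μ ± 1.96 SE about 95% of the time, as a normal distribution would. The seed and sizes are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(2024)  # fixed seed so the simulation is reproducible

# Hypothetical non-normal population: exponential with mean 1 and SD 1 (right-skewed).
mu, sigma = 1.0, 1.0
n, reps = 36, 5000

# Draw `reps` samples of size n and record each sample mean.
sample_means = [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

se = sigma / n ** 0.5  # standard error of the mean: sigma / sqrt(n)
mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
# Under the normal approximation, about 95% of sample means lie within mu ± 1.96 SE.
share_within = sum(abs(m - mu) < 1.96 * se for m in sample_means) / reps

print(f"mean of sample means = {mean_of_means:.3f} (mu = {mu})")
print(f"SD of sample means = {sd_of_means:.3f} (SE = {se:.3f})")
print(f"share within mu ± 1.96 SE = {share_within:.3f}")
```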

  39. Statistical Inference • There are two broad areas of statistical inference: estimation and hypothesis testing. • Estimation: the population parameter is unknown, and sample statistics are used to generate estimates. • Hypothesis testing: a statement is made about a parameter, and sample statistics support or refute the statement.

  40. What Analysis To Do When • Nature of primary outcome variable • Continuous, dichotomous, categorical, time to event • Number of comparison groups • One, 2 independent, 2 matched or paired, > 2 • Associations between variables • Regression analysis

  41. Estimation • Process of determining likely values for an unknown population parameter • A point estimate is the best single-valued estimate of the parameter • A confidence interval is a range of values for the parameter: point estimate ± margin of error, i.e., point estimate ± t × SE(point estimate)

  42. Hypothesis Testing Procedures 1. Set up null and research hypotheses; select α 2. Select test statistic 3. Set up decision rule 4. Compute test statistic 5. Draw conclusion and summarize significance (p-value)

  43. P-values • P-values represent the exact significance of the data • Estimate p-values when rejecting H0 to summarize the significance of the data (approximate with statistical tables; exact value with a computing package) • If p < α, then reject H0

  44. Errors in Hypothesis Tests

                Conclusion of Statistical Test
                Do Not Reject H0  Reject H0
      H0 true   Correct           Type I error
      H0 false  Type II error     Correct

  45. Continuous Outcome: Confidence Interval for μ • Continuous outcome, 1 sample • n > 30: x̄ ± Z s/√n • n < 30: x̄ ± t s/√n (df = n − 1) Example. 95% CI for mean waiting time at the ED. Data: n = 100, x̄ = 37.85 and s = 9.5 min. 37.85 ± 1.96(9.5/√100) = 37.85 ± 1.86, i.e., (35.99 to 39.71). Statistical computing packages use t throughout.
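
The large-sample interval x̄ ± Z s/√n for the waiting-time example can be sketched as below. `z_ci_mean` is a hypothetical helper; the Z value comes from `NormalDist().inv_cdf`. For n < 30 the Z value would be replaced by a t value with n − 1 degrees of freedom, which Python's standard library does not provide (a package such as SciPy, via `scipy.stats.t.ppf`, would be needed), so that case is not shown.

```python
from math import sqrt
from statistics import NormalDist

def z_ci_mean(x_bar, s, n, level=0.95):
    """Large-sample CI for a mean: x_bar ± z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# ED waiting-time data from the slide: n = 100, mean = 37.85, s = 9.5 minutes.
lo, hi = z_ci_mean(37.85, 9.5, 100)
print(f"({lo:.2f}, {hi:.2f})")
```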

  46. New Scenario • Outcome is dichotomous • Result of surgery (success, failure) • Cancer remission (yes/no) • One study sample • Data • On each participant, measure outcome (yes/no) • n, x = number of positive responses, p̂ = x/n

  47. Dichotomous Outcome: Confidence Interval for p • Dichotomous outcome, 1 sample: p̂ ± Z√(p̂(1 − p̂)/n) Example. In the Framingham Offspring Study (n = 3532), 1219 patients were on antihypertensive medications. Generate a 95% CI. p̂ = 1219/3532 = 0.345; 0.345 ± 0.016, i.e., (0.329, 0.361)
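
A sketch of the large-sample interval p̂ ± Z√(p̂(1 − p̂)/n) applied to the Framingham figures on this slide. `z_ci_proportion` is a hypothetical helper. The normal approximation behind it is usually considered adequate when there are at least about 5 successes and 5 failures in the sample (a common rule of thumb, easily met here).

```python
from math import sqrt
from statistics import NormalDist

def z_ci_proportion(x, n, level=0.95):
    """Large-sample CI for a proportion: p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)

# Framingham Offspring Study figures quoted on the slide.
p_hat, margin, (lo, hi) = z_ci_proportion(1219, 3532)
print(f"p_hat = {p_hat:.3f}, margin = {margin:.3f}, CI = ({lo:.3f}, {hi:.3f})")
```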

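  48. One Sample Procedures – Comparisons with Historical/External Control • Continuous outcome: H0: μ = μ0; H1: μ > μ0, μ < μ0, or μ ≠ μ0; test statistic Z = (x̄ − μ0)/(s/√n) if n > 30, or t = (x̄ − μ0)/(s/√n) if n < 30 • Dichotomous outcome: H0: p = p0; H1: p > p0, p < p0, or p ≠ p0; test statistic Z = (p̂ − p0)/√(p0(1 − p0)/n)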
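
The five-step procedure can be sketched for the dichotomous case, Z = (p̂ − p0)/√(p0(1 − p0)/n), with a two-sided p-value and the decision rule "reject H0 if p < α". All numbers below are hypothetical (70 of 200 participants respond; historical control p0 = 0.30; α = 0.05), and `one_sample_z_proportion` is a hypothetical helper, not a library function.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_proportion(x, n, p0):
    """Z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n) and its two-sided p-value."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided: H1 is p != p0
    return z, p_value

# Hypothetical data: 70 of 200 participants respond; historical control p0 = 0.30.
alpha = 0.05
z, p_value = one_sample_z_proportion(70, 200, 0.30)
print(f"Z = {z:.2f}, p = {p_value:.3f}, reject H0: {p_value < alpha}")
```

For a one-sided H1 (p > p0), the p-value would be `1 - NormalDist().cdf(z)` instead.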

  49. One Sample Procedures – Comparisons with Historical/External Control • Categorical or ordinal outcome: χ² goodness-of-fit test • H0: p1 = p10, p2 = p20, …, pk = pk0 • H1: H0 is false
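
The χ² goodness-of-fit statistic compares observed counts O with the counts E expected under H0, χ² = Σ(O − E)²/E, with k − 1 degrees of freedom. The Python sketch below uses hypothetical counts in k = 3 categories and a hypothetical helper `chi_square_gof`. For df = 2 only, the upper-tail χ² probability has the closed form exp(−x/2); other degrees of freedom need a χ² table or a package routine such as `scipy.stats.chi2.sf`.

```python
from math import exp

def chi_square_gof(observed, null_props):
    """Chi-square goodness-of-fit statistic, sum of (O - E)^2 / E, and df = k - 1."""
    n = sum(observed)
    expected = [n * p for p in null_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# Hypothetical counts in 3 categories; H0 proportions 0.25, 0.50, 0.25.
stat, df = chi_square_gof([30, 50, 20], [0.25, 0.50, 0.25])

assert df == 2, "the closed-form p-value below holds only for df = 2"
p_value = exp(-stat / 2)  # P(chi-square with 2 df > stat)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.4f}")
```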

  50. New Scenario • Outcome is continuous • SBP, Weight, cholesterol • Two independent study samples • Data • On each participant, identify group and measure outcome
