
Formulating a Sound Research Question and Study Hypotheses: Hypothesis Testing

Learn how to formulate a sound research question and study hypotheses for hypothesis testing. This lunch and learn series by the Southern California Clinical and Translational Science Institute will cover topics such as study design, data collection, sample size, and statistical analysis.


Presentation Transcript


  1. Biostatistics Lunch & Learn Series: Formulating a sound research question and study hypotheses: hypothesis testing • Southern California Clinical and Translational Science Institute: Research Development and Team Science • Biostatistics, Epidemiology and Research Design (BERD) • October 22, 2018

  2. Biostatistics, Epidemiology and Research Design (BERD) • Faculty: Wendy Mack, BERD Director; Christianne Lane, USC; Melissa Wilson, USC; Cheryl Vigen, USC; Ji Hoon Ryoo, CHLA • Staff: Choo Phei Wei, CHLA; Caron Park and Melissa Koc, USC

  3. Objectives • These lectures are designed to prepare you to communicate with biostatisticians and others to design and conduct your study, as well as interpret your study results • Know your limits and when to consult a biostatistician or other person with domain expertise • It is best to do so at the planning stage of your research! • Is my research question appropriately specified? • What is an appropriate and feasible study design to address the research question? • Can I collect the appropriate data to test the research question? • How should the data be analyzed and interpreted?

  4. Objectives • Today: Formulating a sound research question and study hypotheses: hypothesis testing • Part 2: Study designs and data collection strategies: scientific and logistical considerations in selecting the design to address your research question • Part 3: Sample size and study power: Why do I need so many subjects? What will my biostatistician need to know and how can I get that information? • Part 4: Statistical analysis: What statistical methods are appropriate for my study design and data collected?

  5. Caveats • These lectures WILL NOT teach you how to use statistical software to analyze your study data! • If you want to learn how to competently analyze and interpret your study data, there are: • Courses in the Biostatistics and Epidemiology graduate programs (Dept of Preventive Medicine) • MS in Clinical and Biomedical Investigation

  6. Defining the Research Question and Hypothesis Testing • What are the components of a good research question? • How do I translate my research question to a statistical question (and hypothesis) that I can test? • What is statistical hypothesis testing? What does a p-value mean? • How does the research question relate to study design? What alternative designs might be used to address my research question? (Next workshop)

  7. Research Question (general concepts) → Statistical Question (statistical concepts) → Design a study and collect data to test the statistical question and answer the research question.

  8. Research Question: Does control of systolic BP reduce dementia risk in the elderly? • 1st statistical question (case-control study; statistical concept: compare proportions with systolic BP>120): Among persons age>70, do those who develop dementia have a higher proportion of systolic BP>120 than those who do not develop dementia? • 2nd statistical question (cohort study; statistical concept: compare incidence rates of developing dementia): Among persons age>70, is the incidence rate of dementia greater in those who have systolic BP>120 compared to those who do not? • Other options: randomized trial?

  9. The Research Question • States a relationship between two or more variables, phrased as a question • Why is this research important? What is the gap in our scientific understanding? • What is the past research in this area? • What areas need further exploration? • Can my study help fill in these gaps or lead to greater understanding?

  10. Refining the Research Question • Identify the main concepts or keywords of your research question • Topic too broad? Add more concrete or specific terms to your question • Topic too narrow? Broaden content of terms

  11. FINER Criteria to Develop the Research Question • F Feasible: adequate number of subjects and technical expertise; affordable in time and money; manageable in scope • I Interesting • N Novel: confirms, refutes or extends previous findings • E Ethical • R Relevant: to scientific knowledge, clinical and health policy, future research • Hulley S, et al. Designing clinical research. Philadelphia (PA): Lippincott Williams and Wilkins; 2007.

  12. PICOT Criteria to Develop the Research Question • P Population: What specific population will you test the intervention in? • I Intervention (or Exposure): What is the intervention/exposure to be investigated? • C Comparison Group: What is the main comparator to judge the effect of the intervention? • O Outcome: What will you measure, improve, affect? • T Time: Over what time period will the outcome be assessed?

  13. Defining your Population: Population to Sample • Choosing a representative & feasible sample • Balancing science & practicality • Who you want to generalize to → Who you have access to → Who you plan to sample → Analysis sample

  14. Defining your Population and Sample • Target populations: What populations are relevant to the research question? Demographics (e.g., age>70); clinical characteristics (e.g., not demented) • Source populations: What population with the characteristics of the target population is available to you? Geography (e.g., residents of Los Angeles County); temporal (e.g., recruitment period 1/1/2017 – 12/31/2020); availability of the study population may affect generalizability and reproducibility • Intended sample: The part of the accessible population you will attempt to recruit, ideally representative of the accessible population; prevalence of target characteristics

  15. Research to Scientific Question • Operationalize the general concepts in your research question • Research Question: Does elevated systolic BP increase dementia in elderly persons? • Population: Elderly persons. Operationalize (who we will sample from): persons aged>70 • Intervention/exposure: Systolic BP level. Operationalize: Average systolic BP>120 in prior 1 year • Outcome: Dementia. Operationalize: New diagnosis of dementia (i.e., among non-demented persons) over 5 years of follow-up (time)

  16. Research to Scientific Question • Groups (exposed and comparator): Non-demented persons age>70, with and without average systolic BP>120 in the prior year • Statistical outcome: Incidence rate of new dementia diagnosis • Statistical question: Among non-demented persons aged>70 (population), does the 5-year (time) incidence rate of dementia (outcome: new dementia diagnosis) differ in those who did and did not have an average systolic BP>120 in the prior year (exposure/intervention and comparator)?
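As a minimal sketch of the statistical outcome on this slide, the snippet below computes an incidence rate (new dementia diagnoses per person-year of follow-up) in each exposure group and compares the two. The case counts and person-years are hypothetical, invented purely for illustration, and the helper name `incidence_rate` is an assumption rather than anything from the lecture.

```python
def incidence_rate(new_cases: int, person_years: float) -> float:
    """Return new diagnoses per person-year of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return new_cases / person_years


# Hypothetical data (not from any real study):
# exposed   = average systolic BP>120 in the prior year
# unexposed = average systolic BP<=120 in the prior year
rate_exposed = incidence_rate(new_cases=60, person_years=2000.0)
rate_unexposed = incidence_rate(new_cases=40, person_years=2000.0)

rate_difference = rate_exposed - rate_unexposed  # absolute comparison
rate_ratio = rate_exposed / rate_unexposed       # relative comparison

print(f"Exposed:   {rate_exposed * 1000:.1f} per 1000 person-years")
print(f"Unexposed: {rate_unexposed * 1000:.1f} per 1000 person-years")
print(f"Rate difference: {rate_difference * 1000:.1f} per 1000 person-years")
print(f"Rate ratio: {rate_ratio:.2f}")
```

Whether the difference or the ratio is the better summary depends on the study; the statistical question only asks whether the two rates differ.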

  17. Research Question and Hypothesis Testing • An objective framework for making scientific (statistical) conclusions about a sample of data • Research question: A question focused on what is to be described by a given research project and what relationships may be established. Does elevated systolic BP increase dementia in elderly persons? • Hypothesis: A re-statement of the research question in a testable format, stating the relationship between measured variables – stated in “null” (H0: no relationship) and “alternative” (H1: the relationship we are interested in) formats Example: Non-demented persons aged>70 with average systolic BP>120 in the prior year will have a higher 5-year incidence of dementia than non-demented persons aged>70 who did not have average systolic BP>120 in the prior year.

  18. Hypothesis Testing • Elements of a hypothesis (like PICOT criteria) • Exposure (independent variable) • Outcome (dependent variable) • Direction of effect (of exposure on outcome) • Population under study • Comparison group

  19. Why Hypothesis Testing? • Based on a research question, we draw conclusions about a population from data we collect from a sample • If chosen well, samples will be REPRESENTATIVE of the population and the information will therefore be GENERALIZABLE • We use probability (i.e., p-values!) to judge whether the data we have collected from our study sample are compatible with the null hypothesis or are better explained by the alternative hypothesis

  20. Hypothesis Testing Steps • Ask research question • State the null (H0) and alternative (H1) hypotheses • Pick significance level (α) • Collect data • Decide what statistical test to perform; find the value of the test statistic • Convert test statistic to p-value (probability): The probability of obtaining that value of the test statistic or anything more extreme IF THE NULL HYPOTHESIS IS TRUE • Make statistical decision: if p≤α, Reject H0 (statistically significant); if p>α, Do not reject H0 (not significant) • State conclusion
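The steps above can be sketched end to end with a simple large-sample test. The example below is an illustration under stated assumptions, not the lecture's own analysis: it compares the 5-year proportion developing dementia in two groups with a pooled two-proportion z-test (a normal approximation, chosen here only because it needs no third-party library), and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test of H0: the two population proportions are equal."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Under H0 both groups share one proportion, so pool the data.
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Probability of a statistic at least this extreme if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Steps 1-2: H0: dementia incidence is the same with and without SBP>120;
#            H1: the incidence differs (two-sided).
alpha = 0.05  # Step 3: significance level chosen before seeing the data

# Step 4: hypothetical data (not from any real study)
z, p_value = two_proportion_z_test(events_a=60, n_a=400, events_b=40, n_b=400)

# Steps 5-7: test statistic, p-value, decision
decision = "reject H0" if p_value <= alpha else "do not reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f}, decision: {decision}")
```

In a real study the choice of test (for example a rate-based or time-to-event method) would follow from the design, which is a good question to bring to a biostatistician.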

  21. Hypothesis Testing • Objective: make an inference about a population, based on information contained in a sample

  22. Hypothesis Testing • Idea: Assume null hypothesis is true and find the probability of getting the observed data or anything more extreme • Example: Given that statins really do not influence incidence of dementia, what is the probability of observing a difference in dementia incidence rates at least as large as the one seen between those who received a statin prescription and those who did not?

  23. Hypothesis Testing: p-values • p-value - probability of observing a result (your data) as extreme as (or more extreme than) the one observed, assuming that H0 is true • Small p-value gives evidence for H1 (against H0): It is not very likely that we would obtain the data we did if H0 is true • Large p-value means the data are compatible with H0 (we cannot rule out that H0 is true): It is pretty likely that we would obtain the data we did if H0 is true

  24. Hypothesis Testing: Type I and Type II Errors • Type I, or α error: rejecting H0 when H0 is actually true (false positive); α is the probability of this error. Example: concluded the incidence rate of dementia is different in persons with vs. without average systolic BP>120, when it truly is not. Standard α = 0.05 (we will wrongly reject H0 5% of the time when H0 is true). • Type II, or β error: not rejecting H0 when H0 is actually false (false negative); β is the probability of this error. Example: concluded the incidence of dementia is not different in persons with vs. without average systolic BP>120, when it truly is

  25. Hypothesis Testing: Type I and Type II Errors • [Figure: distributions of the test statistic under H0 and under H1, with the critical value of the test statistic marking P(Type I error) and P(Type II error)] • What happens to test errors when we change critical values?
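One way to explore the question on this slide is to compute both error probabilities for several critical values. The sketch below assumes a one-sided test that rejects H0 when the test statistic exceeds the critical value, a standard normal distribution under H0, and a normal distribution shifted to a mean of 2.5 under H1; the shift and the list of critical values are hypothetical choices made only for illustration.

```python
from statistics import NormalDist

null_dist = NormalDist(mu=0.0, sigma=1.0)  # test statistic under H0
alt_dist = NormalDist(mu=2.5, sigma=1.0)   # under H1 (hypothetical shift)


def error_rates(critical_value):
    """Return (alpha, beta) for a test that rejects H0 above critical_value."""
    alpha = 1 - null_dist.cdf(critical_value)  # reject although H0 is true
    beta = alt_dist.cdf(critical_value)        # do not reject although H1 is true
    return alpha, beta


critical_values = [1.0, 1.645, 2.0, 2.5]
results = [error_rates(c) for c in critical_values]

for c, (alpha, beta) in zip(critical_values, results):
    print(f"critical value {c:5.3f}: P(Type I) = {alpha:.3f}, "
          f"P(Type II) = {beta:.3f}")
```

Under these assumptions, moving the critical value to the right lowers the Type I error probability and raises the Type II error probability, so the two cannot both be reduced by shifting the threshold alone.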

  26. Hypothesis Testing: p-values • Calculate sample estimates of population parameter (e.g., group mean, or dementia incidence rate) from your collected data and translate this to a test statistic (using statistical formula) • Many test statistics have the form: test statistic = (observed value – null hypothesis value) / (standard error of observed value) • Using the null hypothesis distribution of the test statistic, find the probability (p-value) of your calculated test statistic or anything more extreme
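The general form of the test statistic on this slide can be written as a small function. This is a minimal sketch that assumes the statistic follows a standard normal distribution when H0 is true (reasonable for large samples; a t distribution would be the usual choice for small ones). The function names and the numbers are hypothetical.

```python
from statistics import NormalDist


def z_statistic(observed, null_value, standard_error):
    """(observed value - null hypothesis value) / standard error."""
    return (observed - null_value) / standard_error


def two_sided_p_value(z):
    """Probability of a statistic at least as extreme as z if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical example: observed difference of 4.0 between two group means,
# null hypothesis value of 0 (no difference), standard error of 1.6.
z = z_statistic(observed=4.0, null_value=0.0, standard_error=1.6)
p_value = two_sided_p_value(z)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```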

  27. Hypothesis Testing: p-values • [Figure: the p-value shown in relation to the calculated t and the critical t]

  28. Hypothesis Testing: Interpreting p-values • How do the data compare to the variation expected in the sample due to chance when H0 is true? • Large p-value (p>α): data like these are plausible when the null is true, so we can’t rule out the possibility that H0 is true. Conclusion: Do not reject H0 • Small p-value (p≤α): H0 doesn’t seem realistic since data like these would rarely occur by chance when H0 is true. Conclusion: Reject H0

  29. Hypothesis Testing: 1-sided vs. 2-sided test • Two-sided: used when we are interested in any deviation from the null hypothesis. The alternative hypothesis does not state a direction. H1: The incidence rate of dementia is different in persons with vs. without average systolic BP>120. • One-sided: used when there is strong prior evidence for deviation from the null hypothesis in a specific direction. H1: The incidence rate of dementia is higher in persons with vs. without average systolic BP>120. • To be conservative, one usually conducts a 2-sided hypothesis test, even if one is anticipating that there will be a reduction (or increase) in the outcome
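To illustrate why the 2-sided test is the more conservative choice, the sketch below computes both p-values for the same test statistic. It assumes a standard normal null distribution and a hypothetical statistic of z = 1.8 in the direction named by the one-sided H1 (higher incidence with systolic BP>120).

```python
from statistics import NormalDist

std_normal = NormalDist()  # assumed null distribution of the test statistic
alpha = 0.05
z = 1.8  # hypothetical test statistic, positive = higher rate with SBP>120

# One-sided H1 (rate is higher): only the upper tail counts as "extreme".
p_one_sided = 1 - std_normal.cdf(z)

# Two-sided H1 (rate is different): both tails count as "extreme".
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))

print(f"one-sided p = {p_one_sided:.4f} "
      f"({'reject' if p_one_sided <= alpha else 'do not reject'} H0)")
print(f"two-sided p = {p_two_sided:.4f} "
      f"({'reject' if p_two_sided <= alpha else 'do not reject'} H0)")
```

With these made-up numbers the same data lead to rejecting H0 in the one-sided test but not in the two-sided test, which is why the direction of H1 should be fixed before looking at the data.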

  30. Hypothesis Testing: 1-sided vs. 2-sided test • [Figure: 2-sided and 1-sided tests]

  31. Statistical Significance and P-values (American Statistical Association) • P-values can indicate how incompatible the observed data are with a statistical model (e.g., evidence against the null hypothesis) • P-values do NOT measure the probability that the study hypothesis is true • Scientific conclusions should NOT be based only on whether a p-value passes a specific threshold (nothing magic about 0.05); consider also study design (strengths, limitations), data quality, possible study biases, external evidence, etc. • Proper interpretation of a p-value requires full reporting of the hypothesis testing conducted • Conducting multiple analyses and then selectively reporting the “interesting” results (“cherry-picking”, “data dredging”) is a pervasive problem in the medical literature, leading to an invalid excess of “significant” findings and a lack of reproducibility • Report all study hypotheses, data collection decisions, all statistical tests (p-values) computed, and how those reported were selected for presentation • Wasserstein RL, Lazar NA. The ASA’s statement on p-values: context, process, and purpose. The American Statistician, 2016;70(2):129-133

  32. Statistical Significance and P-values (American Statistical Association) • P-value does NOT measure the size of the study effect or importance of the finding • Small study effects can have small p-values with large sample size, and large study effects may have large p-values with small sample size • Studies with the SAME effect size will have different p-values, depending on sample size and measurement precision • Don’t report p-values alone (for many of the reasons above) • Provide measures of effect size, along with estimate of precision (confidence intervals) • Wasserstein RL, Lazar NA. The ASA’s statement on p-values: context, process, and purpose. The American Statistician, 2016;70(2):129-133
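The point that the same effect size can produce very different p-values is easy to check numerically. The sketch below is an illustration under assumptions, not part of the ASA statement: it uses hypothetical counts giving 15% vs. 10% in both a small and a large study, a pooled two-proportion z-test for the p-value, and a simple Wald confidence interval for the difference.

```python
from math import sqrt
from statistics import NormalDist


def compare_proportions(events_a, n_a, events_b, n_b, confidence=0.95):
    """Return (difference, confidence interval, two-sided p-value)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    diff = p_a - p_b  # effect size: difference in proportions

    # p-value from a pooled z-test (normal approximation) of H0: no difference
    pooled = (events_a + events_b) / (n_a + n_b)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se_null)))

    # Wald confidence interval for the difference (unpooled standard error)
    se_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff, (diff - z_crit * se_diff, diff + z_crit * se_diff), p_value


# Hypothetical studies with the SAME effect size (15% vs. 10%)
small = compare_proportions(15, 100, 10, 100)
large = compare_proportions(150, 1000, 100, 1000)

for label, (diff, (low, high), p_value) in (("n=100 per group", small),
                                            ("n=1000 per group", large)):
    print(f"{label}: difference = {diff:.3f}, "
          f"95% CI ({low:.3f}, {high:.3f}), p = {p_value:.4f}")
```

Reporting the difference with its confidence interval shows that both studies estimate the same effect and that only the precision differs, which the p-value alone would hide.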

  33. Next workshop on Nov 12: Study designs and data collection strategies: scientific and logistical considerations in selecting the design to address your research question. SC CTSI | www.sc-ctsi.org

  34. CTSI Biostatistics (BERD): a resource for you at USC • Biostatisticians to help you with study design, sample size estimation, data management plan, statistical analyses, and summarizations of your methods and results • Recharge center • To request a consult: https://sc-ctsi.org/bbr-consult
