Medical Biometry I




  1. Medical Biometry I (Biostatistics 511) Discussion Section Week 3 Phillip Keung Biostat 511

  2. Discussion Outline • Key Concepts/Topics from Weeks 2 & 3 • Illustration: Cardiovascular Health Study • Describing associations • Categorical vs. continuous characteristics • Continuous vs. continuous characteristics • Categorical vs. categorical characteristics • Graphical summaries • Numerical summaries

  3. Example: Cardiovascular Health Study • Study Population • The Cardiovascular Health Study (CHS) is a cohort of men and women, aged 65 years and older, drawn from four U.S. communities. • Details of the CHS study design have been published elsewhere (Fried et al., 1991). Demographic information, laboratory tests, physical measurements, ultrasound, and measures of cognitive and functional status were collected at baseline and at annual visits thereafter.

  4. Set up your Stata session • Start .log file • log using "D:/pkeung/My Documents/week3disc.log" • Replace "pkeung" above with your username • Load CHS data: • use https://courses.washington.edu/b511/Data/chs.dta

  5. Describing Associations • Association (definition): • The distribution of one variable varies by values of the other variable. • Examples: • Height and sex are associated: the distribution of height varies between men and women. • Blood pressures (SBP & DBP): see definition above. • Associations are statistical relationships, not (necessarily) causal relationships. • How might we assess causal relationships?

  6. Approaches for describing 2-way relationships (not exhaustive)

  7. Categorical vs. Quantitative • Stratify on categorical variable • Summarize distribution of quantitative variable in each stratum. • If the distributions differ, then the variables are associated. • Example: Are height and sex associated? Does the distribution of height vary between men and women? • What kinds of variables (data types) are these?

  8. Categorical vs. Quantitative • Example: Are height and sex associated? Does the distribution of height vary between men and women? • Visualization using stratified box plots: • . graph box height, by(gender)

  9. Categorical vs. Quantitative Descriptive statistics: What conclusions can we draw from the visualization and descriptive statistics? Are height and gender associated?
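
The stratify-then-summarize idea from slides 7 to 9 can be sketched in a few lines of Python. This is only an illustrative sketch: the heights below are invented for the example and are not CHS data, and the helper name `summarize_by_group` is hypothetical.

```python
from statistics import mean, median, stdev

# Hypothetical (height in cm, sex) records, invented for illustration; NOT CHS data.
records = [
    (178.0, "male"), (172.5, "male"), (181.0, "male"), (169.5, "male"),
    (160.0, "female"), (165.5, "female"), (158.0, "female"), (162.5, "female"),
]

def summarize_by_group(rows):
    """Stratify on the categorical variable, then summarize the quantitative one."""
    groups = {}
    for value, label in rows:
        groups.setdefault(label, []).append(value)
    return {
        label: {
            "n": len(values),
            "mean": mean(values),
            "sd": stdev(values),      # sample standard deviation
            "median": median(values),
        }
        for label, values in groups.items()
    }

summary = summarize_by_group(records)
for label, stats in summary.items():
    print(label, stats)
```

If the per-stratum summaries differ noticeably (as they do for these made-up numbers), that is descriptive evidence of an association between the two variables.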

  10. Quantitative vs. Quantitative • Option 1 (grouping): • Group one variable into categories • Compare the distribution of the other variable by categories of the first (e.g. by using the techniques described in the categorical vs. quantitative section). • Option 2 (multivariate): • Scatter plots and lowess curves to show association. • Assess strength of relationship with correlation (Pearson) • Example: Is there an association between systolic blood pressure and diastolic blood pressure?

  11. Quantitative vs. Quantitative Example: Is there an association between systolic and diastolic blood pressure? We can use scatterplots and/or lowess smoothers to visualize. Method 1: . graph twoway scatter sbp dbp Method 2: . lowess sbp dbp

  12. Correlation review • Definition: • Correlation coefficient is used to summarize strength of association between two quantitative variables. • Range is [-1, 1] • -1 = perfect negative • 1 = perfect positive • 0 = uncorrelated • Pearson statistic • Measures strength of linear association • Spearman statistic • Measures strength of monotone association

  13. Correlation review, cont’d More on the difference between Spearman correlation and Pearson correlation. Here is a visual example of an association that is monotone but not perfectly linear: Note that the relationship is monotone (always increasing/decreasing) but clearly not linear. This is reflected in the different correlation values.
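
To make the Pearson/Spearman contrast concrete, here is a small Python sketch on a made-up monotone but non-linear relationship (y = exp(x)). The data and the helper names `pearson`, `ranks`, and `spearman` are assumptions for illustration; the slide's own figure and values are not reproduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation: strength of the linear association."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n of the values; this simple version assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = float(rank)
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented example: y increases with x everywhere, but not along a straight line.
x = [float(i) for i in range(1, 11)]
y = [math.exp(v) for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print("Pearson:", r_pearson)
print("Spearman:", r_spearman)
```

Because the ranks of x and y agree exactly, the Spearman value is 1, while the Pearson value falls noticeably below 1 since the points do not lie on a line.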

  14. Quantitative vs. Quantitative Descriptive statistics. Pearson correlation on the left, Spearman correlation on the right • Why does the Pearson correlation function output 3 different numbers? • What conclusions can we draw? • Should we be concerned about outlying values?

  15. Categorical vs. Categorical • Pick one categorical variable to stratify on • Summarize distribution of other categorical variable in each stratum. • If the distributions differ, then the variables are associated. • Example: Are myocardial infarction and sex associated?

  16. Categorical vs. Categorical Example: Are myocardial infarction and sex associated? We can summarize the data in a 2x2 table. Why is gender labeled 0 and 1? What do the labels refer to?

  17. Categorical vs. Categorical We can further numerically summarize the 2x2 data by calculating the risks for males and females, then calculating the risk difference and the risk ratio. How were the circled numbers calculated? How do we interpret them?
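
As a rough sketch of the arithmetic behind a risk difference and a risk ratio, the Python below works through a 2x2 table. The counts are hypothetical and chosen only to keep the arithmetic simple; they are not the CHS counts from the slide, and `risk_summary` is an illustrative helper name.

```python
def risk_summary(cases_a, total_a, cases_b, total_b):
    """Risks in groups A and B, plus the risk difference and risk ratio (A vs. B)."""
    risk_a = cases_a / total_a
    risk_b = cases_b / total_b
    return {
        "risk_a": risk_a,
        "risk_b": risk_b,
        "risk_difference": risk_a - risk_b,
        "risk_ratio": risk_a / risk_b,
    }

# Hypothetical counts, NOT CHS data:
# group A (say, males):   30 events among 200 people
# group B (say, females): 20 events among 250 people
result = risk_summary(30, 200, 20, 250)
for name, value in result.items():
    print(name, round(value, 3))
```

With these invented counts the risks are 0.15 and 0.08, so the risk difference is 0.07 and the risk ratio is 1.875. A risk difference of 0 or a risk ratio of 1 would indicate no association in the sample.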

  18. Questions: For categorical vs. quantitative data, we used stratified box plots as a visualization technique. Can we do the same with quantitative vs. quantitative data? Do we want to? For quantitative vs. quantitative data, we used a scatter plot for visualization. Can we do the same with categorical vs. categorical data? Do we want to?

  19. Summary • Quantitative vs. categorical variables • Visualization: box plots • Numeric summary: mean, standard deviation, median, other percentiles • Quantitative vs. quantitative variables • Visualization: scatterplot and fitted line (e.g. lowess curve) • Numeric summary: correlation • Categorical vs. categorical variables • Visualization: contingency table (e.g. 2x2 table when both variables are binary) • Numeric summary: risk ratio, risk difference • The above choices are popular visualization and numerical summaries for their corresponding data types, but by no means the only choices.

  20. Review of previous week’s concepts • For the CHS • What is the population? • What is the sample? • What are some population parameters of interest? • What are the corresponding statistics?
