Medical Biometry I (Biostatistics 511) Discussion Section Week 3 Phillip Keung Biostat 511
Discussion Outline
• Key Concepts/Topics from Weeks 2 & 3
• Illustration: Cardiovascular Health Study
• Describing associations
  • Categorical vs. continuous characteristics
  • Continuous vs. continuous characteristics
  • Categorical vs. categorical characteristics
• Graphical summaries
• Numerical summaries
Example: Cardiovascular Health Study
• Study Population
  • The Cardiovascular Health Study (CHS) is a cohort of men and women, aged 65 years and older, drawn from four U.S. communities.
  • Details of the CHS study design have been published elsewhere (Fried et al., 1991). Demographic information, laboratory tests, physical measurements, ultrasound, and measures of cognitive and functional status were collected at baseline and at annual visits thereafter.
Set up your Stata session
• Start .log file
  • log using "D:/pkeung/My Documents/week3disc.log"
  • Replace "pkeung" above with your username
• Load CHS data:
  • use https://courses.washington.edu/b511/Data/chs.dta
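The setup steps can be collected into a short do-file sketch. The `replace` and `clear` options and the `describe` and `log close` commands are additions that do not appear on the slide; the log path is the slide's example and should be changed to a folder that exists on your machine.

```stata
* Start a text log. "replace" (an addition here) overwrites an existing log
* of the same name; without it, Stata stops if the file already exists.
log using "D:/pkeung/My Documents/week3disc.log", replace

* Load the CHS data. "clear" (an addition here) drops any data already in memory.
use https://courses.washington.edu/b511/Data/chs.dta, clear

* List the variables and their labels before analysing anything
describe

* ... analysis commands go here ...

* Close the log at the end of the session
log close
```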
Describing Associations
• Association (definition):
  • The distribution of one variable varies by values of the other variable.
• Examples:
  • Height and sex are associated: the distribution of height varies between men and women.
  • Blood pressures (SBP & DBP): see definition above.
• Associations are statistical relationships, not (necessarily) causal relationships.
  • How might we assess causal relationships?
Approaches for describing 2-way relationships (not exhaustive)
Categorical vs. Quantitative
• Stratify on the categorical variable
  • Summarize the distribution of the quantitative variable in each stratum.
  • If the distributions differ, then the variables are associated.
• Example: Are height and sex associated? Does the distribution of height vary between men and women?
  • What kinds of variables (data types) are these?
Categorical vs. Quantitative
• Example: Are height and sex associated? Does the distribution of height vary between men and women?
• Visualization using stratified box plots:
  • . graph box height, by(gender)
Categorical vs. Quantitative
Descriptive statistics:
• What conclusions can we draw from the visualization and descriptive statistics?
• Are height and gender associated?
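The slide does not say which command produced its descriptive statistics. One way to get stratified summaries, offered here as a sketch, is `tabstat` with the `by()` option; the choice of statistics is illustrative.

```stata
* Assumes the CHS data are loaded and contain height and gender,
* the variable names used in the box plot command.
* One row per gender: count, mean, SD, and quartiles of height.
tabstat height, by(gender) statistics(n mean sd p25 median p75)
```

If the rows show clearly different centers or spreads, that matches the definition of association used here: the distribution of height differs across the strata.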
Quantitative vs. Quantitative
• Option 1 (grouping):
  • Group one variable into categories.
  • Compare the distribution of the other variable across categories of the first (e.g. using the techniques described in the categorical vs. quantitative section).
• Option 2 (multivariate):
  • Scatter plots and lowess curves to show association.
  • Assess strength of relationship with correlation (Pearson).
• Example: Is there an association between systolic and diastolic blood pressure?
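A minimal sketch of Option 1 (grouping), assuming the CHS data are loaded with variables `sbp` and `dbp`. The new variable name `dbp_q` and the choice of four groups are hypothetical and used only for illustration.

```stata
* Cut dbp into 4 groups of roughly equal size (quartiles); 1 = lowest
xtile dbp_q = dbp, nquantiles(4)

* Compare the distribution of sbp across the dbp groups
graph box sbp, over(dbp_q)
tabstat sbp, by(dbp_q) statistics(n mean sd median)
```

Grouping discards information within each category, so the picture depends on where the cut points fall.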
Quantitative vs. Quantitative
Example: Is there an association between systolic and diastolic blood pressure? We can use scatterplots and/or lowess smoothers to visualize.
Method 1: . graph twoway scatter sbp dbp
Method 2: . lowess sbp dbp
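The two methods can also be combined in a single graph. The overlay below is a sketch rather than a command from the slides; it assumes the same `sbp` and `dbp` variables.

```stata
* Scatter plot of sbp (y-axis) against dbp (x-axis) with a lowess curve on top
twoway (scatter sbp dbp) (lowess sbp dbp)
```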
Correlation review
• Definition:
  • The correlation coefficient summarizes the strength of association between two quantitative variables.
  • Range is [-1, 1]
    • -1 = perfect negative
    • 1 = perfect positive
    • 0 = uncorrelated
• Pearson statistic
  • Measures strength of linear association
• Spearman statistic
  • Measures strength of the monotone association
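For reference, the standard sample formulas are written out below. The notation is an assumption, since the slides define no symbols: $(x_i, y_i)$ are the $n$ paired observations, $\bar{x}$ and $\bar{y}$ are their sample means, and $R_i$, $S_i$ are the ranks of $x_i$ and $y_i$.

```latex
% Pearson correlation: covariance scaled by the two standard deviations
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

% Spearman correlation: the same formula applied to the ranks
r_s = \frac{\sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})}
           {\sqrt{\sum_{i=1}^{n} (R_i - \bar{R})^2}\,\sqrt{\sum_{i=1}^{n} (S_i - \bar{S})^2}}
```

Because $r_s$ depends only on ranks, it is unchanged by any strictly increasing transformation of either variable, which is why it captures monotone rather than strictly linear association.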
Correlation review, cont’d
More on the difference between Spearman correlation and Pearson correlation.
Here is a visual example of an association that is monotone but not perfectly linear:
Note that the relationship is monotone (always increasing/decreasing) but clearly not linear. This is reflected in the different correlation values.
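The same point can be reproduced with simulated data. The sketch below does not use the CHS data or the slide's own figure; the variables `x` and `y`, the seed, and the exponential curve are all illustrative assumptions.

```stata
* Simulated illustration: y is a strictly increasing, nonlinear function of x
clear
set seed 511
set obs 200
generate double x = 5*runiform()
generate double y = exp(x)

* Pearson correlation: positive but below 1, because the relationship is curved
correlate x y

* Spearman correlation: the ranks of x and y agree exactly, so rho should be 1
spearman x y

* Look at the curve
twoway scatter y x
```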
Quantitative vs. Quantitative
Descriptive statistics. Pearson correlation on the left, Spearman correlation on the right.
• Why does the Pearson correlation function output 3 different numbers?
• What conclusions can we draw?
• Should we be concerned about outlying values?
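The slide's output is not reproduced in this text, and the commands behind it are not stated. As a sketch, the following standard Stata commands produce Pearson and Spearman correlations for `sbp` and `dbp`. That the slide used `pwcorr` with these options is an assumption.

```stata
* Pearson correlation. With the obs and sig options, each cell reports
* the correlation, its significance level, and the number of observations.
pwcorr sbp dbp, obs sig

* Spearman rank correlation
spearman sbp dbp
```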
Categorical vs. Categorical
• Pick one categorical variable to stratify on
  • Summarize the distribution of the other categorical variable in each stratum.
  • If the distributions differ, then the variables are associated.
• Example: Are myocardial infarction and sex associated?
Categorical vs. Categorical
Example: Are myocardial infarction and sex associated? We can summarize the data in a 2x2 table.
• Why is gender labeled 0 and 1? What do they refer to?
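A sketch of how such a table can be produced. The variable name `mi` for myocardial infarction is hypothetical, because the slides do not give the name used in chs.dta; `lookfor` can help locate the real one.

```stata
* Find the myocardial infarction variable (the name "mi" below is a placeholder)
lookfor infarction

* Check how gender is coded (which sex is 0 and which is 1)
codebook gender

* 2x2 table with column percentages: the share with and without MI within each gender
tabulate mi gender, column
```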
Categorical vs. Categorical
We can further summarize the 2x2 data numerically by calculating the risks for males and females, then calculating the risk difference and the risk ratio.
• How were the circled numbers calculated?
• How do we interpret them?
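In a 2x2 table, the risk in each group is the number with the outcome divided by the group total. The risk difference is one risk minus the other, and the risk ratio is one risk divided by the other. Stata's `cs` command reports all of these; the sketch below again uses the hypothetical name `mi` and assumes both variables are coded 0/1. Whether the slide's circled numbers came from `cs` is an assumption.

```stata
* cs takes the outcome (case) variable first and the exposure variable second.
* The group with gender == 1 is treated as "exposed", so the risk difference is
* risk(gender == 1) - risk(gender == 0) and the risk ratio is
* risk(gender == 1) / risk(gender == 0).
cs mi gender
```

Check which sex is coded 1 before interpreting the direction of the difference and ratio.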
Questions:
• For categorical vs. quantitative data, we used stratified box plots as a visualization technique. Can we do the same with quantitative vs. quantitative data? Do we want to?
• For quantitative vs. quantitative data, we used a scatter plot for visualization. Can we do the same with categorical vs. categorical data? Do we want to?
Summary
• Quantitative vs. categorical variables
  • Visualization: box plots
  • Numeric summary: mean, standard deviation, median, other percentiles
• Quantitative vs. quantitative variables
  • Visualization: scatterplot and fitted line (e.g. lowess curve)
  • Numeric summary: correlation
• Categorical vs. categorical variables
  • Visualization: contingency table (e.g. 2x2 table when both variables are binary)
  • Numeric summary: risk ratio, risk difference
• The above are popular visualizations and numerical summaries for their corresponding data types, but by no means the only choices.
Review of previous week’s concepts
• For the CHS:
  • What is the population?
  • What is the sample?
  • What are some population parameters of interest?
  • What are the corresponding statistics?