
Statistics for Medical Researchers

Hongshik Ahn, Professor, Department of Applied Math and Statistics, Stony Brook University; Biostatistician, Stony Brook GCRC. Topics: Experimental Design; Descriptive Statistics and Distributions; Comparison of Means; Comparison of Proportions.


Presentation Transcript


  1. Statistics for Medical Researchers. Hongshik Ahn, Professor, Department of Applied Math and Statistics, Stony Brook University; Biostatistician, Stony Brook GCRC

  2. Contents: Experimental Design; Descriptive Statistics and Distributions; Comparison of Means; Comparison of Proportions; Power Analysis/Sample Size Calculation; Correlation and Regression

  3. Experiment. Treatment: something that researchers administer to experimental units. Factor: a controlled independent variable whose levels are set by the experimenter. Experimental design: control, treatment, placebo effect. Blinding: single blind, double blind, triple blind. (1. Experimental Design)

  4. Randomization. Completely randomized design. Randomized block design: used if there are specific differences among groups of subjects. Permuted block randomization: used in small studies to maintain reasonably good balance among groups. Stratified block randomization: matching. (1. Experimental Design)

  5. Completely randomized design. Computer-generated sequence: 4, 8, 3, 2, 7, 2, 6, 6, 3, 4, 2, 1, 6, 2, 0, … Two groups (criterion: even~A, odd~B): AABABAAABAABAAA… Three groups (criterion: {1,2,3}~A, {4,5,6}~B, {7,8,9}~C; ignore 0s): BCAACABBABAABA… Two groups with a different randomization ratio, e.g., 2:3 (criterion: {0,1,2,3}~A, {4,5,6,7,8,9}~B): BBAABABBABAABAA… (1. Experimental Design)
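
The three allocation rules above can be sketched in a few lines of Python. The function names are hypothetical, and the digit list is the sequence from the slide; in practice the digits would come from a random number generator such as `random.Random(seed).randrange(10)`.

```python
# Minimal sketch (hypothetical helper names): turn a stream of random digits
# into group assignments using the three allocation rules on slide 5.


def two_groups_even_odd(digits):
    """Even digit (0 included) -> A, odd digit -> B."""
    return "".join("A" if d % 2 == 0 else "B" for d in digits)


def three_groups(digits):
    """{1,2,3} -> A, {4,5,6} -> B, {7,8,9} -> C; zeros are skipped."""
    return "".join("ABC"[(d - 1) // 3] for d in digits if d != 0)


def two_groups_ratio_2_3(digits):
    """{0,1,2,3} -> A (4 of 10 digits), {4,...,9} -> B (6 of 10), i.e. a 2:3 ratio."""
    return "".join("A" if d <= 3 else "B" for d in digits)


if __name__ == "__main__":
    seq = [4, 8, 3, 2, 7, 2, 6, 6, 3, 4, 2, 1, 6, 2, 0]  # sequence from the slide
    print(two_groups_even_odd(seq))   # AABABAAABAABAAA
    print(three_groups(seq))          # BCAACABBABAABA
    print(two_groups_ratio_2_3(seq))  # BBAABABBABAABAA
```

The printed strings reproduce the three allocation sequences listed on the slide.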

  6. Permuted block randomization. With a block size of 4 for two groups (A, B), there are 6 possible permutations, which can be coded as 1=AABB, 2=ABAB, 3=ABBA, 4=BAAB, 5=BABA, 6=BBAA. Each number in the random number sequence in turn selects the next block, determining the next four participant allocations (numbers 0, 7, 8 and 9 are ignored). For example, the sequence 67126814… produces BBAA AABB ABAB BBAA AABB BAAB. In practice, a block size of four is too small, since researchers may crack the code, risking selection bias. Mixing block sizes of 4 and 6 is better, with the block sizes kept unknown to the investigator. (1. Experimental Design)
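
A minimal Python sketch of both ideas on this slide: decoding a digit sequence into blocks of four, and the recommended mix of block sizes 4 and 6. Function names, the seed and the choice to truncate the last block are assumptions for illustration.

```python
import random

# Codes 1-6 for the six permutations of AABB (numbering from the slide).
BLOCKS_OF_4 = {1: "AABB", 2: "ABAB", 3: "ABBA", 4: "BAAB", 5: "BABA", 6: "BBAA"}


def blocks_from_digits(digits):
    """Each digit 1-6 selects the next block of four; 0, 7, 8 and 9 are ignored."""
    return [BLOCKS_OF_4[d] for d in digits if d in BLOCKS_OF_4]


def mixed_block_allocation(n_subjects, block_sizes=(4, 6), rng=None):
    """Balanced blocks whose size is drawn at random from block_sizes.

    Each block holds equal numbers of A and B in shuffled order. The list is
    cut at n_subjects, so the final block may be incomplete.
    """
    if rng is None:
        rng = random.Random()
    allocation = []
    while len(allocation) < n_subjects:
        size = rng.choice(block_sizes)
        block = ["A"] * (size // 2) + ["B"] * (size // 2)
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_subjects]


if __name__ == "__main__":
    print(" ".join(blocks_from_digits([int(c) for c in "67126814"])))
    # BBAA AABB ABAB BBAA AABB BAAB
    print("".join(mixed_block_allocation(20, rng=random.Random(2024))))
```

Because every complete block is balanced, the A/B counts can never drift apart by more than half of one block, which is the point of blocking in small studies.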

  7. Methods of sampling: random sampling, systematic sampling, convenience sampling, stratified sampling. (1. Experimental Design)

  8. Random sampling: selection such that each individual member has an equal chance of being selected. Systematic sampling: select some starting point and then select every kth element in the population. (1. Experimental Design)
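
Both schemes can be sketched with Python's standard `random` module. The sampling frame of 100 patient IDs and the helper names are hypothetical; the random start for systematic sampling is drawn from the first k members.

```python
import random


def simple_random_sample(population, n, rng):
    """Every member has the same chance of selection (sampling without replacement)."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Random starting point among the first k members, then every kth member."""
    start = rng.randrange(k)
    return population[start::k]


if __name__ == "__main__":
    rng = random.Random(7)
    patient_ids = list(range(1, 101))  # hypothetical sampling frame of 100 IDs
    print(simple_random_sample(patient_ids, 10, rng))
    print(systematic_sample(patient_ids, 10, rng))
```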

  9. Convenience sampling: use results that are easy to get. (1. Experimental Design)

  10. Stratified sampling: divide the population into strata and draw a sample from each stratum. (1. Experimental Design)
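
A small sketch of stratified sampling: units are grouped by stratum and a simple random sample is drawn inside each one. The patient records, the age-group strata and the equal per-stratum sample size are hypothetical choices (proportional allocation is another common option).

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Group units by stratum, then draw a simple random sample within each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }


if __name__ == "__main__":
    # Hypothetical patients as (id, age_group) pairs; the age group is the stratum.
    patients = [(i, "<50" if i % 3 else ">=50") for i in range(1, 61)]
    chosen = stratified_sample(patients, lambda p: p[1], 5, random.Random(3))
    for group, members in chosen.items():
        print(group, [pid for pid, _ in members])
```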

  11. Parameter: a population quantity. Statistic: a summary of the sample. Inference for parameters uses the sample. Central tendency: mean (average), median (middle value). Variability: variance (a measure of variation); standard deviation (sd), the square root of the variance; standard error (se), the sd of an estimate. Other summaries: median, quartiles, min., max., range, boxplot. Proportion. (2. Descriptive Statistics & Distributions)
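
The summaries listed above can be computed with Python's standard `statistics` module. The data are hypothetical; variance and sd use the n - 1 divisor, and quartile conventions differ between packages (this uses the module's default "exclusive" method).

```python
import math
import statistics


def describe(data):
    """Sample summaries from slide 11 (variance and sd use the n - 1 divisor)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    return {
        "n": n,
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "variance": statistics.variance(data),
        "sd": sd,
        "se_of_mean": sd / math.sqrt(n),  # standard error of the sample mean
        "min": min(data),
        "q1": q1,
        "q3": q3,
        "max": max(data),
        "range": max(data) - min(data),
    }


if __name__ == "__main__":
    values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
    for name, value in describe(values).items():
        print(name, value)
```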

  12. Normal distribution (2. Descriptive Statistics & Distributions)

  13. Standard normal distribution: mean 0, variance 1. (2. Descriptive Statistics & Distributions)
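
For reference, the normal density behind slides 12 and 13, and the standardization that maps any normal variable onto the standard normal:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
Z = \frac{X - \mu}{\sigma} \sim N(0, 1) \quad \text{if } X \sim N(\mu, \sigma^2).
```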

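  14. Z-test for means (population sd σ known): $z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$. T-test for means if the sd is unknown: $t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}$ with df = n - 1. (2. Descriptive Statistics & Distributions)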
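
A minimal sketch of the two one-sample statistics, assuming hypothetical data and a hypothesized mean mu0. The z-test p-value uses the normal CDF from the standard library; the t statistic is returned with its degrees of freedom, and its p-value would come from a t table or statistical software.

```python
import math
from statistics import NormalDist, mean, stdev


def z_test_mean(data, mu0, sigma):
    """One-sample z-test of H0: mu = mu0 when the population sd sigma is known."""
    z = (mean(data) - mu0) / (sigma / math.sqrt(len(data)))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


def t_statistic_mean(data, mu0):
    """One-sample t statistic when sigma is unknown: the sample sd replaces sigma.

    Compare the result with a t distribution on n - 1 degrees of freedom.
    """
    n = len(data)
    t = (mean(data) - mu0) / (stdev(data) / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
    print(z_test_mean(sample, mu0=4, sigma=2))
    print(t_statistic_mean(sample, mu0=4))
```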

  15. Two-sample t-test. Two independent groups (control and treatment); continuous variables. Assumption: the populations are normally distributed. Checking normality: histogram; normal probability plot (Q-Q plot): do the points lie on a straight line?; Shapiro-Wilk, Kolmogorov-Smirnov and Anderson-Darling tests. If the normality assumption is violated, the t-test may not be appropriate: try a transformation, or use the nonparametric alternative, the Mann-Whitney U-test (Wilcoxon rank-sum test). (3. Inference for Means)
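
The Q-Q plot check above pairs each ordered observation with a standard normal quantile; roughly straight points suggest normality. This sketch uses plotting positions (i - 0.5)/n, one common convention (an assumption; packages differ slightly), and hypothetical data.

```python
from statistics import NormalDist


def normal_qq_points(data):
    """(theoretical N(0,1) quantile, ordered observation) pairs for a normal Q-Q plot.

    Uses plotting positions (i - 0.5) / n; software packages differ slightly
    in this choice.
    """
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(ordered, start=1)]


if __name__ == "__main__":
    weights = [6.1, 7.4, 6.8, 5.9, 7.0, 6.5, 7.9]  # hypothetical values
    for q, x in normal_qq_points(weights):
        print(f"{q:+.3f}  {x}")
```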

  16. A clinical trial on the effectiveness of drug A in preventing premature birth: 30 pregnant women are randomly assigned to control and treatment groups of 15 each. Primary endpoint: weight of the babies at birth.

              Treatment   Control
      n           15         15
      mean      7.08       6.26
      sd        0.90       0.96

  (3. Inference for Means)

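  17. Hypothesis: the group means are different. Null hypothesis (Ho): $\mu_1 = \mu_2$. Alternative hypothesis (H1): $\mu_1 \neq \mu_2$. Significance level: $\alpha = 0.05$, the Type I error rate (false positive rate). Assumption: equal variances. Degrees of freedom: df = $n_1 + n_2 - 2$. Calculate the t-value (test statistic) $t = \frac{\bar{x}_1-\bar{x}_2}{s_p\sqrt{1/n_1+1/n_2}}$, where $s_p^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}$. P-value: the probability, if Ho is true, of a test statistic at least as extreme as the one observed. Reject Ho if p-value < α; do not reject Ho if p-value > α. (3. Inference for Means)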

  18. Previous example: test at α = 0.05 with df = 15 + 15 - 2 = 28. P-value: 0.026 < 0.05, so reject the null hypothesis that there is no drug effect. (3. Inference for Means)
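
A sketch of the pooled two-sample t-test computed from the summary statistics of slide 16. Since the standard library has no t distribution, the two-sided p-value is obtained by Simpson's-rule integration of the t density, a stand-in for statistical software. From the rounded summary statistics alone, p comes out between 0.02 and 0.025, a little below the 0.026 reported on slide 18, which was presumably computed from the raw data.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_p(t, df, steps=2000):
    """2 * P(T > |t|), using Simpson's rule for P(0 < T < |t|); steps must be even."""
    b = abs(t)
    h = b / steps
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * (0.5 - s * h / 3)


def pooled_t_test(m1, s1, n1, m2, s2, n2):
    """Two-sample t-test assuming equal variances, from summary statistics."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled variance
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df, t_two_sided_p(t, df)


if __name__ == "__main__":
    # Summary statistics from slide 16: treatment vs. control birth weight.
    t, df, p = pooled_t_test(7.08, 0.90, 15, 6.26, 0.96, 15)
    print(f"t = {t:.3f}, df = {df}, two-sided p = {p:.3f}")
```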

  19. Confidence interval (CI): an interval of values used to estimate the true value of a population parameter. The confidence level $1-\alpha$ is the proportion of times that the CI actually contains the population parameter, assuming that the estimation process is repeated a large number of times. Common choices: 90% CI (α = 10%), 95% CI (α = 5%), 99% CI (α = 1%). (3. Inference for Means)
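
The repeated-sampling interpretation above can be checked by simulation. This sketch assumes a hypothetical normal population with known σ (so the simple z-interval applies), draws many samples, and counts how often the interval covers the true mean.

```python
import math
import random
from statistics import NormalDist


def z_interval(sample, sigma, conf=0.95):
    """CI for the mean when the population sd sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sigma / math.sqrt(len(sample))
    m = math.fsum(sample) / len(sample)
    return m - half_width, m + half_width


def coverage(mu=10.0, sigma=2.0, n=25, conf=0.95, reps=10_000, seed=1):
    """Fraction of simulated samples whose interval contains the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, conf)
        hits += low <= mu <= high
    return hits / reps


if __name__ == "__main__":
    for level in (0.90, 0.95, 0.99):
        print(level, coverage(conf=level))  # each should be close to its nominal level
```

Any single interval either contains μ or not; the confidence level describes the long-run hit rate of the procedure.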

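  20. CI for a comparison of two means: $(\bar{x}_1-\bar{x}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\, s_p\sqrt{1/n_1+1/n_2}$, where $s_p^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}$ is the pooled variance. A 95% CI for the previous example follows by substituting its summary statistics (n = 15 per group; means 7.08 and 6.26; sds 0.90 and 0.96). (3. Inference for Means)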
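
A sketch of this interval for the birth-weight example, using only the summary statistics. The t quantile is found by bisection on a numerically integrated t CDF, a stand-in for a t table (adequate for moderate df and confidence levels up to 99%). The resulting interval excludes 0, in agreement with rejecting Ho at α = 0.05.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x]."""
    h = x / steps
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + s * h / 3


def t_quantile(p, df, hi=50.0):
    """q with P(T <= q) = p for 0.5 < p < 1, by bisection on [0, hi]."""
    lo = 0.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def pooled_diff_ci(m1, s1, n1, m2, s2, n2, conf=0.95):
    """CI for mu1 - mu2 assuming equal variances (pooled sd, df = n1 + n2 - 2)."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    half_width = t_quantile(1 - (1 - conf) / 2, df) * sp * math.sqrt(1 / n1 + 1 / n2)
    diff = m1 - m2
    return diff - half_width, diff + half_width


if __name__ == "__main__":
    print(pooled_diff_ci(7.08, 0.90, 15, 6.26, 0.96, 15))
```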

  21. SAS programming for the two-sample t-test. Data steps: click ‘File’; click ‘Import Data’; select a data source; click ‘Browse’ and find the path of the data file; click ‘Next’; fill in the ‘Member’ field with the name of the SAS data set; click ‘Finish’. Procedure steps: click ‘Solutions’ > ‘Analysis’ > ‘Analyst’; click ‘File’ > ‘Open By SAS Name’; select the SAS data set and click ‘OK’; click ‘Statistics’ > ‘Hypothesis Tests’ > ‘Two-Sample T-test for Means’; select ‘Group’ as the independent variable and ‘Dependent’ as the dependent variable; choose the hypothesis of interest and click ‘OK’. (3. Inference for Means)

  22. Click ‘File’ to import data and create the SAS data set. Click ‘Solutions’ to create a project to run the statistical test. Click ‘File’ to open the SAS data set. Click ‘Statistics’ to select the statistical procedure. (3. Inference for Means)

  23. Mann-Whitney U-test (Wilcoxon rank-sum test): the nonparametric alternative to the two-sample t-test. The populations do not need to be normal. Ho: the two samples come from populations with equal medians. H1: the two samples come from populations with different medians. (Stating the hypotheses in terms of medians assumes the two distributions have the same shape.) (3. Inference for Means)

  24. Mann-Whitney U-test procedure: (1) temporarily combine the two samples into one big sample, then replace each sample value with its rank; (2) find the sum of the ranks for either one of the two samples, say $R_1$ for the sample of size $n_1$; (3) calculate the z test statistic $z = \frac{R_1 - n_1(n_1+n_2+1)/2}{\sqrt{n_1 n_2 (n_1+n_2+1)/12}}$. (3. Inference for Means)
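
The three steps above can be sketched directly. This uses the large-sample normal approximation without a tie correction to the variance (an assumption; software usually adjusts for ties and offers an exact test for small samples). The BMI values are hypothetical, not the data of slide 25.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank 1 = smallest value; tied values share the average of their positions."""
    first_pos, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first_pos.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return [first_pos[v] + (count[v] - 1) / 2 for v in values]


def rank_sum_z_test(x, y):
    """Wilcoxon rank-sum (Mann-Whitney) test with the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))  # step 1: rank the combined sample
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])  # step 2: rank sums
    mean_r1 = n1 * (n1 + n2 + 1) / 2
    sd_r1 = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (r1 - mean_r1) / sd_r1  # step 3: z statistic
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return r1, r2, z, p


if __name__ == "__main__":
    group1 = [19.6, 23.8, 19.6, 29.1, 25.2, 21.4]  # hypothetical BMI values
    group2 = [23.8, 22.1, 31.0, 27.5, 24.9, 26.3, 28.0]
    print(rank_sum_z_test(group1, group2))
```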

  25. Mann-Whitney U-test, example. Numbers in parentheses are the ranks, beginning with a rank of 1 assigned to the lowest value, 17.7. R1 and R2: sums of ranks. (3. Inference for Means)

  26. Hypotheses: Ho: men and women have the same median BMI. H1: men and women have different median BMIs. The p-value is 0.33, so we do not reject Ho at α = 0.05. There is no significant difference in median BMI between men and women. (3. Inference for Means)

  27. SAS programming for the Mann-Whitney U-test. Data steps: the same as slide 21. Procedure steps: click ‘Solutions’ > ‘Analysis’ > ‘Analyst’; click ‘File’ > ‘Open By SAS Name’; select the SAS data set and click ‘OK’; click ‘Statistics’ > ‘ANOVA’ > ‘Nonparametric One-Way ANOVA’; select the ‘Dependent’ and ‘Independent’ variables and choose the test of interest; click ‘OK’. (3. Inference for Means)

  28. Click ‘File’ to open the SAS data set. Click ‘Statistics’ to select the statistical procedure. Select the dependent and independent variables. (3. Inference for Means)

  29. Paired t-test: tests the mean difference of matched pairs, e.g., changes before and after treatment. The measurements in each pair are correlated. Assumption: the population of differences is normally distributed. Take the difference in each pair and perform a one-sample t-test on the differences. Check normality; if the normality assumption is violated, the t-test is not appropriate, and the nonparametric alternative, the Wilcoxon signed-rank test, can be used. (3. Inference for Means)

  30. Notation for the paired t-test: d = individual difference between the two values of a single matched pair; $\mu_d$ = mean value of the differences d for the population of paired data; $\bar{d}$ = mean value of the differences d for the paired sample data; $s_d$ = standard deviation of the differences d for the paired sample data; n = number of pairs. (3. Inference for Means)
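
Using the notation above, the paired t statistic is a one-sample t-test on the differences. The blood-pressure values below are hypothetical (not the slide 31 data), and the direction d = second - first is an assumption; the p-value and CI need t quantiles on n - 1 df from a table or software.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic for H0: mu_d = 0, with d = second - first for each pair.

    Returns (d_bar, s_d, t, df); df = n - 1 where n is the number of pairs.
    """
    d = [b - a for a, b in zip(first, second)]
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)
    t = d_bar / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1


if __name__ == "__main__":
    # Hypothetical systolic BP (mm Hg) for the same five women on two occasions.
    first_visit = [120, 118, 125, 130, 122]
    second_visit = [124, 121, 124, 136, 127]
    print(paired_t(first_visit, second_visit))
```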

  31. Example: systolic blood pressure (OC: oral contraceptive). (3. Inference for Means)

  32. Hypotheses: Ho: $\mu_d = 0$ vs. H1: $\mu_d \neq 0$. Significance level: α = 0.05. Degrees of freedom: df = n - 1. Test statistic: $t = \frac{\bar{d}}{s_d/\sqrt{n}}$. P-value: 0.009, so reject Ho at α = 0.05. The data support the claim that oral contraceptives affect systolic blood pressure. (3. Inference for Means)

  33. Confidence interval for matched pairs. 100(1 - α)% CI: $\bar{d} \pm t_{\alpha/2,\,n-1}\, \frac{s_d}{\sqrt{n}}$. 95% CI for the mean difference $\mu_d$ in systolic blood pressure: (1.53, 8.07). (3. Inference for Means)

  34. SAS programming for the paired t-test. Data steps: the same as slide 21. Procedure steps: click ‘Solutions’ > ‘Analysis’ > ‘Analyst’; click ‘File’ > ‘Open By SAS Name’; select the SAS data set and click ‘OK’; click ‘Statistics’ > ‘Hypothesis Tests’ > ‘Two-Sample Paired T-test for Means’; select the ‘Group1’ and ‘Group2’ variables; click ‘OK’. (Note: you can also calculate the differences and use them as the dependent variable in a one-sample t-test.) (3. Inference for Means)

  35. Click ‘File’ to open the SAS data set. Click ‘Statistics’ to select the statistical procedure. Put the two group variables into ‘Group 1’ and ‘Group 2’. (3. Inference for Means)

  36. Comparison of more than two means: ANOVA (analysis of variance). One-way ANOVA: one factor, e.g., control, drug 1, drug 2. Two-way ANOVA: two factors, e.g., drug and age group. Repeated measures: used when measurements are repeated within subjects, e.g., at several time points. (3. Inference for Means)

  37. Example: pulmonary disease. Endpoint: mid-expiratory flow (FEF) in L/s. Six groups: nonsmokers (NS), passive smokers (PS), noninhaling smokers (NI), light smokers (LS), moderate smokers (MS) and heavy smokers (HS). (3. Inference for Means)

  38. Example: pulmonary disease. Ho: the group means are all the same. H1: not all the group means are the same. P-value < 0.001: there is a significant difference in mean FEF among the groups. Comparison of specific groups: linear contrasts. Multiple comparisons: Bonferroni adjustment (test each of n comparisons at level α/n). (3. Inference for Means)
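
A minimal sketch of the one-way ANOVA F statistic and the Bonferroni level. The group values are hypothetical, not the FEF data of slide 37. Turning F into a p-value needs the F distribution with (k - 1, N - k) degrees of freedom, left to a table or software here. With 6 groups there are C(6, 2) = 15 pairwise comparisons, so each would be tested at 0.05/15 ≈ 0.0033.

```python
import math
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for Ho: all group means are equal. Returns (F, df_between, df_within)."""
    group_means = [mean(g) for g in groups]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def bonferroni_level(alpha, n_comparisons):
    """Per-comparison significance level alpha / n."""
    return alpha / n_comparisons


if __name__ == "__main__":
    groups = [[3.1, 3.4, 2.9], [2.8, 3.0, 2.7], [2.5, 2.6, 2.2]]  # hypothetical FEF (L/s)
    print(one_way_anova_f(groups))
    # All pairwise comparisons among 6 groups: C(6, 2) = 15.
    print(bonferroni_level(0.05, math.comb(6, 2)))
```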

  39. SAS programming for one-way ANOVA. Data steps: the same as slide 21. Procedure steps: click ‘Solutions’ > ‘Analysis’ > ‘Analyst’; click ‘File’ > ‘Open By SAS Name’; select the SAS data set and click ‘OK’; click ‘Statistics’ > ‘ANOVA’ > ‘One-Way ANOVA’; select the ‘Independent’ and ‘Dependent’ variables; click ‘OK’. (3. Inference for Means)

  40. Click ‘File’ to open the SAS data set. Click ‘Statistics’ to select the statistical procedure. Select the dependent and independent variables. (3. Inference for Means)

  41. Chi-square test: testing the difference of two proportions. n: sample size, p: success rate. Requirement: $n_i p_i \ge 5$ and $n_i(1-p_i) \ge 5$ in each group i. Ho: $p_1 = p_2$; H1: $p_1 \neq p_2$ (for a two-sided test). If the requirement is not satisfied, use Fisher's exact test. (4. Inference for Proportions)
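
A sketch of the Pearson chi-square test for two proportions, built from the 2x2 table of successes and failures without a continuity correction. It also reports the smallest expected cell count; checking that count against 5 is a common form of the requirement above. The counts in the usage are hypothetical.

```python
import math


def chi_square_two_proportions(x1, n1, x2, n2):
    """Pearson chi-square test of Ho: p1 = p2 from a 2x2 table, no continuity correction.

    x = number of successes, n = group size.
    Returns (chi2, p_value, smallest expected count).
    """
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    chi2, min_expected = 0.0, float("inf")
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            min_expected = min(min_expected, expected)
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(X > c) = P(|Z| > sqrt(c)) = erfc(sqrt(c / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, min_expected


if __name__ == "__main__":
    chi2, p, min_exp = chi_square_two_proportions(x1=10, n1=30, x2=20, n2=30)  # hypothetical
    print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
    if min_exp < 5:
        print("Some expected count is below 5: consider Fisher's exact test instead.")
```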

  42. Power/sample size calculation. Decide the significance level (e.g., 0.05), the desired power (e.g., 80%), and whether the test is one-sided or two-sided. Comparison of means (two-sample t-test): need the anticipated means and sds in each group; calculate with software (nQuery, power, etc.). Comparison of proportions (chi-square test): need the anticipated proportions in each group; consider a continuity correction; for small sample sizes, use Fisher's exact test; calculate with software. (5. Power/Sample Size Calculation)
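
For a rough hand check of software output, the common normal-approximation formulas give the sample size per group: $n = 2(z_{1-\alpha/2}+z_{1-\beta})^2\sigma^2/\Delta^2$ for two means, and the pooled/unpooled variance formula for two proportions. This is a sketch under those assumptions; packages that use the t distribution or a continuity correction will report somewhat larger n. The planning values are hypothetical.

```python
import math
from statistics import NormalDist


def _z_values(alpha, power, two_sided):
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    return z_alpha, z(power)


def n_per_group_means(delta, sd, alpha=0.05, power=0.80, two_sided=True):
    """Normal-approximation n per group to detect a mean difference delta (common sd)."""
    z_alpha, z_beta = _z_values(alpha, power, two_sided)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80, two_sided=True):
    """Normal-approximation n per group for p1 vs. p2, without continuity correction."""
    z_alpha, z_beta = _z_values(alpha, power, two_sided)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


if __name__ == "__main__":
    # Hypothetical planning values.
    print(n_per_group_means(delta=0.5, sd=1.0))       # difference of half an sd
    print(n_per_group_proportions(p1=0.30, p2=0.50))  # 30% vs. 50% success rate
```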

  43. Correlation. Pearson correlation for continuous variables; Spearman correlation for ranked variables; chi-square test for association between categorical variables. Pearson correlation coefficient r: $-1 \le r \le 1$. Test for the coefficient: t-test. For the same value of r, a larger sample gives a more significant result, so the magnitude of r alone does not tell whether a correlation is statistically significant; judge significance by the p-value. (6. Correlation and Regression)
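
A sketch of Pearson and Spearman correlation and of the t statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ used to test Ho: ρ = 0 (compared with t on n - 2 df). The usage loop shows the slide's point: the same r = 0.3 gives a larger t as n grows. Function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of paired continuous data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank 1 = smallest; ties share the average of their positions."""
    first_pos, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first_pos.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return [first_pos[v] + (count[v] - 1) / 2 for v in values]


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def t_for_r(r, n):
    """t statistic for Ho: rho = 0; compare with a t distribution on n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


if __name__ == "__main__":
    for n in (20, 50, 200):
        print(n, round(t_for_r(0.3, n), 2))
```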

  44. Regression. Objectives: find out whether a significant linear relationship exists between the response and the independent variables, and use it to predict future values. Notation: X: independent (predictor) variable; Y: dependent (response) variable. Multiple linear regression model: $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \varepsilon$, where $\varepsilon$ is the random error. Checking the model assumptions: normality (Q-Q plot, histogram, Shapiro-Wilk test); equal variance (residuals plotted against predicted y should form a horizontal band); linear relationship (plot predicted y vs. each x). (6. Correlation and Regression)

  45. (6. Correlation and Regression)

  46. In the fitted regression equation, the mean blood pressure increases by 1.08 for each one-pound increase in weight (x1) with age (x2) held fixed. Similarly, a 1-year increase in age with weight held fixed increases the mean blood pressure by 0.425. s = 2.509 and R² = 95.8%: the error sd σ is estimated as 2.509 with df = 13 - 3 = 10, and 95.8% of the variation in y is explained by the regression. (6. Correlation and Regression)
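
The quantities on this slide (coefficients, s with df = n - (k + 1), and R²) can be reproduced with a small least-squares sketch. It solves the normal equations by Gaussian elimination, which is fine for illustration but less numerically stable than the QR methods statistical software uses. The data in the usage are hypothetical, not the 13 observations of slide 45.

```python
import math


def solve_linear(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta


def fit_ols(predictors, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk + error.

    predictors: list of predictor columns. Returns (coefficients, s, R^2),
    where s is the residual sd with df = n - (k + 1).
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]  # intercept first
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    beta = solve_linear(xtx, xty)  # normal equations (X'X) beta = X'y
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    s = math.sqrt(sse / (n - p))
    return beta, s, 1 - sse / sst


if __name__ == "__main__":
    weight = [150, 160, 170, 180, 190, 200, 210]  # hypothetical, pounds
    age = [40, 52, 45, 60, 48, 65, 55]            # hypothetical, years
    bp = [120, 128, 127, 137, 133, 145, 141]      # hypothetical systolic BP
    coefs, s, r2 = fit_ols([weight, age], bp)
    print("coefficients:", [round(c, 3) for c in coefs])
    print("s =", round(s, 3), " R^2 =", round(r2, 3))
```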

  47. SAS programming for linear regression. Data steps: the same as slide 21. Procedure steps: click ‘Solutions’ > ‘Analysis’ > ‘Analyst’; click ‘File’ > ‘Open By SAS Name’; select the SAS data set and click ‘OK’; click ‘Statistics’ > ‘Regression’ > ‘Linear’; select the ‘Dependent’ (response) variable and the ‘Explanatory’ (predictor) variables; click ‘OK’. (6. Correlation and Regression)

  48. Click ‘File’ to open the SAS data set. Click ‘Statistics’ to select the statistical procedure. Select the dependent and explanatory variables. (6. Correlation and Regression)

  49. Other regression models: polynomial regression, transformations, logistic regression. (6. Correlation and Regression)
