1 / 34

Statistics

Statistics. April 16, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility. Experimental Design Terminology.

sachi
Download Presentation

Statistics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Statistics April 16, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility Nemours Biomedical Research

  2. Experimental Design Terminology • An Experimental Unit is the entity on which measurement or an observation is made. For example, subjects are experimental units in most clinical studies. • Homogeneous Experimental Units: Units that are as uniform as possible on all characteristics that could affect the response. • Randomization is the process of assigning experimental units randomly to different experimental groups. • Replication is the repetition of an entire experiment or portion of an experiment under two or more sets of conditions. Nemours Biomedical Research

  3. Experimental Design Terminology • A Factor is a controllable independent variable that is being investigated to determine its effect on a response. E.g. treatment group is a factor. • Factors can be fixed or random • Fixed -- the factor can take on a discrete number of values and these are the only values of interest. • Random -- the factor can take on a wide range of values and one wants to generalize from specific values to all possible values. • Each specific value of a factor is called a level. E.g. treatment group: A, B, and placebo. Then all these are three levels. Nemours Biomedical Research

  4. Experimental Design Terminology • Effect is the change in the average response between two factor levels. • That is, factor effect = average response at one level – average response at a second level. Nemours Biomedical Research

  5. Experimental Design Terminology • Interaction is the joint factor effects in which the effect of one factor depends on the levels of the other factors. No interaction effect of factor A and B Interaction effect of factor A and B Nemours Biomedical Research

  6. Analysis of Variance (ANOVA) • The analysis of variance (ANOVA) is a technique of decomposing the total variability of a response variable into: • Variability due to the experimental factor(s) and… • Variability due to error (i.e., factors that are not accounted for in the experimental design). • The basic purpose of ANOVA is to test the equality of several means. • A fixed effect model includes only fixed factors in the model. • A random effect model includes only random factors in the model. • A mixed effect model includes both fixed and random factors in the model. Nemours Biomedical Research

  7. The basic ANOVA situation • Type of variables: Quantitative response and Categorical (factor) predictors (independent variable). • Main Question: Are mean response measures of different groups are equal? • One categorical variable with only 2 levels (groups): • 2-sample t-test • One categorical variable with more than two levels (groups): • One way ANOVA • Two or more categorical variable, each with at least two or more levels (groups) of each: • Factorial ANOVA Nemours Biomedical Research

  8. Graphical Investigation • Graphical investigation: • side-by-side box plots • multiple histograms Side by Side Boxplots Nemours Biomedical Research

  9. One-way analysis of Variance • One factor of k levels or groups. E.g., 3 treatment groups in our default data • Total variation of observations (SST) can be split in two components: variation between groups (SSG) and variation within groups (SSE). • Variation between groups is due to the difference in different groups. E.g. different treatment groups or different doses of the same treatment. • Variation within groups is the inherent variation among the observations within each group. • Completely randomized design (CRD) is an example of one-way analysis of variance. Nemours Biomedical Research

  10. One-way analysis of Variance • Model: • yij = µ + ai + eij • Where yij is the ith observation of the jth group • ai is the effect of the ith group • µ is the grand mean and eij is the error. • Assumptions: • Observations yij are independent. • eij are normally distributed with mean zero and constant standard deviation. • The second assumption implies that response variable for each group is normal (Check using q-q plot, histogram, or test for normality) and standard deviations for all groups are equal (rule of thumb: ratio of largest to smallest are approximately 2:1). Nemours Biomedical Research

  11. One-way analysis of Variance • Hypothesis: Ho: Means of all groups are equal. Ha: At least one of them is not equal to other. • doesn’t say how or which ones differ. • Can follow up with “multiple comparisons” • ANOVA Table for one way classified data Note: Large F means that MSG is large compared to MSE Nemours Biomedical Research

  12. Summary One-way ANOVA • Significance of the differences between the groups depends on • the difference in the means • the standard deviations of each group • the sample sizes • A useful web (thanks Bette for sending this website for the class): www.psych.utah.edu/stat/introstats/anovaflash.html • ANOVA determines P-value from the F statistic Nemours Biomedical Research

  13. Multiple comparisons • If the F test is significant in ANOVA table, then we intend to find the pairs of groups are significantly different. Following are the commonly used procedures: • Fisher’s Least Significant Difference (LSD) • Tukey’s HSD method • Bonferroni’s method • Scheffe’s method • Dunn’s multiple-comparison procedure • Dunnett’s Procedure Nemours Biomedical Research

  14. One-way ANOVA – Rcmdr Demo • Make sure that group variables are in factor form. • Data->Manage variables in active data set -> Convert numeric variables to factor -> pick variables, select supply level names or Use numbers for Factor levels, Give a new name or leave as the same name. • To run one-way ANOVA: • Statistics -> Means-> One-way ANOVA -> Model Name, pick response variable (e.g, PLUC.post), pick group variable (e.g. grp), select pairwise comparisons of means Nemours Biomedical Research

  15. One-way ANOVA output: Pluc.pre on treatment groups > Model1 <- aov(PLUC.post ~ grp, data=data) > summary(Model1) Df Sum Sq Mean Sq F value Pr(>F) grp 2 157.282 78.641 18.844 5.225e-07 *** Residuals 57 237.880 4.173 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > numSummary(data$PLUC.post , groups=data$grp, statistics=c("mean", "sd")) mean sd n 1 10.15657 2.227334 20 2 12.15339 1.874268 20 3 14.12242 2.011497 20 Nemours Biomedical Research

  16. One-way ANOVA output: Pluc.pre on treatment groups > .Pairs <- glht(Model1, linfct = mcp(grp = "Tukey")) > confint(.Pairs) Simultaneous Confidence Intervals Multiple Comparisons of Means: Tukey Contrasts Fit: aov(formula = PLUC.post ~ grp, data = data) Estimated Quantile = 2.4064 95% family-wise confidence level Linear Hypotheses: Estimate lwr upr 2 - 1 == 0 1.9968 0.4422 3.5514 3 - 1 == 0 3.9658 2.4113 5.5204 3 - 2 == 0 1.9690 0.4144 3.5236 Nemours Biomedical Research

  17. One-way ANOVA output: Pluc.pre on treatment groups Nemours Biomedical Research

  18. Analysis of variance of factorial experiment (Two or more factors) • Factorial experiment: • The effects of the two or more factors including their interactions are investigated simultaneously. • For example, consider two factors A and B. Then total variation of the response will be split into variation for A, variation for B, variation for their interaction AB, and variation due to error. Nemours Biomedical Research

  19. Analysis of variance of factorial experiment (Two or more factors) • Model with two factors (A, B) and their interactions: • Assumptions: The same as in One-way ANOVA. Nemours Biomedical Research

  20. Analysis of variance of factorial experiment (Two or more factors) • Null Hypotheses: • Hoa: Means of all groups of the factor A are equal. • Hob: Means of all groups of the factor B are equal. • Hoab:(αβ)ij = 0, i. e. two factors A and B are independent Nemours Biomedical Research

  21. Two Factor ANOVA • ANOVA for two factors A and B with their interaction AB. Nemours Biomedical Research

  22. Two-factor ANOVA - Rcmdr Demo Statistics -> Means-> One-way ANOVA -> Model Name, pick response variable (e.g, PLUC.post), pick group variable (e.g. grp), select pairwise comparisons of means Nemours Biomedical Research

  23. Two- way ANOVA output: Pluc.pre on treatment groups and ped Model2 <- (lm(PLUC.post ~ grp*Ped, data=data)) > Anova(Model2) Anova Table (Type II tests) Response: PLUC.post Sum Sq Df F value Pr(>F) grp 157.282 2 19.2016 5.024e-07 *** Ped 0.842 1 0.2055 0.6521 grp:Ped 15.880 2 1.9387 0.1538 Residuals 221.159 54 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > tapply(data$PLUC.post, list(grp=data$grp, Ped=data$Ped), mean, na.rm=TRUE) + # means Ped grp 1 2 1 9.591563 10.72158 2 12.397521 11.90926 3 14.798619 13.44621 Nemours Biomedical Research

  24. Two- way ANOVA output: Pluc.pre on treatment groups and ped > tapply(data$PLUC.post, list(grp=data$grp, Ped=data$Ped), sd, na.rm=TRUE) + # std. deviations Ped grp 1 2 1 2.656100 1.645896 2 1.850990 1.964046 3 2.034403 1.840354 > tapply(data$PLUC.post, list(grp=data$grp, Ped=data$Ped), function(x) + sum(!is.na(x))) # counts Ped grp 1 2 1 10 10 2 10 10 3 10 10 Nemours Biomedical Research

  25. Repeated Measures • The term repeated measures refers to data sets with multiple measurements of a response variable on the same experimental unit or subject. • In repeated measurements designs, we are often concerned with two types of variability: • Between-subjects - Variability associated with different groups of subjects who are treated differently (equivalent to between groups effects in one-way ANOVA) • Within-subjects - Variability associated with measurements made on an individual subject. Nemours Biomedical Research

  26. Repeated Measures • Examples of Repeated Measures designs: • Two groups of subjects treated with different drugs for whom responses are measured at six-hour increments for 24 hours. Here, DRUG treatment is the between-subjects factor and TIME is the within-subjects factor. • Students in three different statistics classes (taught by different instructors) are given a test with five problems and scores on each problem are recorded separately. Here CLASS is a between-subjects factor and PROBLEM is a within-subjects factor. Nemours Biomedical Research

  27. Repeated Measures • When measures are made over time as in example A we want to assess: • how the dependent measure changes over time independent of treatment (i.e. the main effectof time) • how treatments differ independent of time (i.e., the main effect of treatment) • how treatment effects differ at different times (i.e. the treatment by time interaction). • Repeated measures require special treatment because: • Observations made on the same subject are not independent of each other. • Adjacent observations in time are likely to be more correlated than non-adjacent observations Nemours Biomedical Research

  28. Repeated Measures Response Time Nemours Biomedical Research

  29. Repeated Measures • Methods of repeated measures ANOVA • Univariate - Uses a single outcome measure. • Multivariate - Uses multiple outcome measures. • Mixed Model Analysis - One or more factors (other than subject) are random effects. • We will discuss only univariate approach Nemours Biomedical Research

  30. Repeated Measures • Assumptions: • Subjects are independent. • The repeated observations for each subject follows a multivariate normal distribution • The correlation between any pair of within subjects levels are equal. This assumption is known as sphericity. Nemours Biomedical Research

  31. Repeated Measures • Test for Sphericity: • Mauchley’s test • Violation of sphericity assumption leads to inflated F statistics and hence inflated type I error. • Three common corrections for violation of sphericity: • Greenhouse-Geisser correction • Huynh-Feldt correction • Lower Bound correction • All these three methods adjust the degrees of freedom using a correction factor called Epsilon. • Epsilon lies between 1/k-1 to 1, where k is the number of levels in the within subject factor. Nemours Biomedical Research

  32. Repeated Measures - R Demo • The following R commands will arrange the data in default.csv for the ANOVA of repeated measures (you can also use R functions gl() for this manipulation), - attach(x) sid1<- c(sid,sid) grp1<-c(grp,grp) plc<- c(PLUC.pre, Pluc.post) time<-c(rep(1,60),rep(2,60)) • The following code is for repeated measures ANOVA summary(aov(plc~factor(grp1)*factor(time) + Error(factor(sid1)))) Nemours Biomedical Research

  33. Repeated Measures R output: Error: factor(sid1) Df Sum Sq Mean Sq F value Pr(>F) factor(grp1) 2 121.640 60.820 16.328 2.476e-06 *** Residuals 57 212.313 3.725 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘’ 1 Error: Within Df Sum Sq Mean Sq F value Pr(>F) factor(time) 1 172.952 172.952 33.8676 2.835e-07 *** factor(grp1):factor(time) 2 46.818 23.409 4.5839 0.01426 * Residuals 57 291.083 5.107 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘’ 1 Nemours Biomedical Research

  34. Thank you Nemours Biomedical Research

More Related