Defining and Evaluating ‘Study Quality’
Luke Plonsky
Current Developments in Quantitative Research Methods
LOT Winter School, 2014
Study Quality Matters? YES!
• Building theory (or a house): studies = the 2x4s, bricks, etc.
• Self-evident?
• Rarely discussed in linguistics research
• But lack of attention to quality ≠ low quality
• Implication: study quality needs to be examined, not assumed
Defining ‘Study Quality’
• How was SQ defined in Plonsky & Gass (2011) and Plonsky (2013)?
• How was SQ operationalized?
• Do you agree with this definition & operationalization?
• Now consider your (sub-)domain of interest:
  • How would you operationalize SQ?
  • How would you weight or prioritize different features?
Consequences: inefficient research; limited influence on L2 theory, practice, and future research
Sources/Considerations for an Instrument to Measure Study Quality
1. (Over 400) existing measures of study quality from the meta-analysis literature, usually used for weighting effect sizes (sample: Valentine & Cooper, 2008, Table 2)
2. Professional society guidelines (e.g., APA, APS; sample: JARS Working Group, 2008, Table 1; AERA 2006 reporting standards; LSA??; AAAL/AILA??)
3. Journal guidelines (e.g., Chapelle & Duff, 2003)
4. Methodological syntheses from other social sciences (e.g., Skidmore & Thompson, 2010)
5. Previous reviews / meta-analyses (e.g., Chaudron, 2001; Norris & Ortega, 2000; Plonsky, 2011)
6. Methods/statistics textbooks (e.g., Larson-Hall, 2010; Porte, 2010)
7. Others?
(Only) two studies in this area have addressed study quality empirically: Plonsky & Gass (2011); Plonsky (2013, in press)
Rationale & motivations:
• Study quality needs to be measured, not assumed
• Concerns have been expressed about research and reporting practices
• “Respect for the field of SLA can come only through sound scientific progress” (Gass, Fleck, Leder, & Svetics, 1998)
• No previous reviews of this nature
Plonsky & Gass (2011) & Plonsky (2013) Two common goals: • Describe and evaluate quantitative research practices • Inform future research practices
Methods (very meta-analytic, but the focus is on methods rather than substance/effects/outcomes)
Plonsky (2013)
• Domain: all areas of L2 research; quantitative only
• Two journals: LL & SSLA (all published, 1990-2010)
• K = 606
• Coded for: designs, analyses, reporting practices (sample scheme)
• Analyses: frequencies/%s
Plonsky & Gass (2011)
• Domain: interactionist L2 research; quantitative only
• Across 16 journals & 2 books (all published, 1980-2009)
• K = 174
• Coded for: designs, analyses, reporting practices
• Analyses: frequencies/%s
How would you define your domain? Where would you search for primary studies?
Results: Analyses (Plonsky, 2013) [figures: Number of Unique Statistical Analyses Used in L2 Research; Tests of Statistical Significance in L2 Research]
Results: Descriptive Statistics Plonsky (2013)
Results: Other Reporting Practices (Plonsky, 2013)
Studies excluded due to missing data (usually SDs), as % of meta-analyzed sample; median k = 16 (Plonsky & Oswald, under review)
Data missing in meta-analyzed studies (as % of total sample)
Reporting of reliability coefficients (as % of meta-analyzed sample)
Reporting of effect sizes & CIs (as % of meta-analyzed sample)
Other data associated with quality/transparency and recommended or required by APA (as % of meta-analyzed sample)
Results: Changes over time
Meara (1995): “[When I was in graduate school], anyone who could explain the difference between a one-tailed and two-tailed test of significance was regarded as a dangerous intellectual; admitting to a knowledge of one-way analyses of variance was practically the same as admitting to witchcraft in 18th century Massachusetts” (p. 341).
Changes Over Time: Designs Plonsky (in press) Plonsky & Gass (2011)
Changes Over Time: Designs Plonsky (in press)
Changes Over Time: Analyses Plonsky & Gass (2011)
Changes Over Time: Analyses Plonsky (in press)
Changes Over Time: Reporting Practices Plonsky & Gass (2011)
Changes Over Time: Reporting Practices Plonsky (in press)
Relationship between quality and outcomes? Plonsky (2011) Plonsky & Gass (2011): larger effects for studies that include delayed posttests
Discussion (Or: So what?)
General:
• Few strengths and numerous methodological weaknesses are present (common, even) in quantitative L2 research
• Quality (and certainly methodological features) varies across subdomains AND over time
• Possible relationship between methodological practices and the outcomes they produce
Three common themes:
• Means-based analyses
• Missing data, NHST, and the ‘Power Problem’
• Design preferences
Discussion: Means-based analyses
• ANOVAs and t tests dominate, increasingly
• Not problematic as long as:
  • Assumptions are checked (done in only 17% of studies in Plonsky, 2013)
  • Data are reported thoroughly
  • The tests are the most appropriate for the RQs (i.e., not used by default)
• Benefits of increased use of regression analyses (see Cohen, 1968):
  • Less categorization of continuous variables (e.g., proficiency, working memory) in order to use ANOVA (categorization → loss of variance!)
  • More precise results (R² is more informative than an overall p or eta²)
  • Fewer tests → preservation of experiment-wise power
(A brief simulation contrasting the two approaches follows below.)
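A minimal sketch, not from the original slides, of the "loss of variance" point above: simulated (hypothetical) proficiency scores are analyzed once as a continuous predictor in a regression and once after a median split into "low"/"high" groups for an ANOVA-style comparison. The variable names and numbers are illustrative assumptions, not data from Plonsky (2013).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 60
proficiency = rng.normal(50, 10, n)              # hypothetical continuous predictor
gain = 0.4 * proficiency + rng.normal(0, 8, n)   # hypothetical outcome with a moderate linear relation

# Regression keeps proficiency continuous: R-squared uses all of its variance
r, _ = stats.pearsonr(proficiency, gain)
print(f"Regression R^2 = {r**2:.3f}")

# Median split into 'low' vs. 'high' proficiency, then a one-way ANOVA
high = proficiency > np.median(proficiency)
g_low, g_high = gain[~high], gain[high]
f_stat, p_val = stats.f_oneway(g_low, g_high)

# Eta-squared for the two-group comparison
grand_mean = gain.mean()
ss_between = (len(g_low) * (g_low.mean() - grand_mean) ** 2
              + len(g_high) * (g_high.mean() - grand_mean) ** 2)
ss_total = ((gain - grand_mean) ** 2).sum()
print(f"Median-split ANOVA: eta^2 = {ss_between / ss_total:.3f}, p = {p_val:.3f}")
```

On most simulated samples the regression R² exceeds the eta² from the dichotomized comparison, which is the loss of variance the slide refers to.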
Discussion: Missing data, NHST, & power
• In general: lots of missing and inconsistently reported data!
• BUT we’re getting better!
• The “Power Problem”:
  • Small samples
  • Heavy reliance on NHST
  • Effects not generally very large
  • Omission of non-significant results → inflated summary results
  • Assumptions rarely checked
  • Multivariate statistics rarely used
  • Power rarely analyzed
(A power calculation sketch follows below.)
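As one concrete illustration of the "Power Problem" (a sketch added here, not part of the original slides), the calculation below uses statsmodels to show (a) the per-group sample size needed to detect a medium effect (d = 0.5) with .80 power in a two-group design, and (b) the power actually achieved with a small sample of 20 per group. The specific numbers are illustrative assumptions.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Assumed inputs: d = 0.5 (a medium effect), alpha = .05, desired power = .80
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   alternative="two-sided")
print(f"Required n per group: {n_per_group:.1f}")  # roughly 64 per group

# Conversely: the power actually achieved with n = 20 per group and the same effect
achieved = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=20,
                                alternative="two-sided")
print(f"Power with n = 20 per group: {achieved:.2f}")  # well below .80
```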
Discussion: Design preferences
• Signs of domain maturity?
  • More classroom-based studies
  • More experimental studies
  • More delayed posttests
Discussion: Summary
Causes/explanations:
• Inconsistencies among reviewers
• Lack of standards
• Lack of familiarity with design and appropriate data analysis and reporting
• Inadequate training (Lazaraton et al., 1987)
• Non-synthetic-mindedness
• Publication bias
Effects:
• Limited interpretability
• Limited meta-analyzability
• Overestimation of effects
• Overreliance on p values
→ Slower progress
Intro: Assessing Meta-Analytic (M-A) Quality
• M-As have high visibility and impact on theory and practice → their quality is critical
• Several instruments have been proposed for assessing M-A quality:
  • Stroup et al. (2000)
  • Shea et al. (2007)
  • JARS/MARS (APA, 2008)
  • Plonsky (2012)
Plonsky’s (2012) Instrument for Assessing M-A Quality
• Goal 1: Assess transparency and thoroughness as a means to
  • Clearly delineate the domain under investigation
  • Enable replication
  • Evaluate the appropriateness of the methods in addressing/answering the study’s RQs
• Goal 2: Set a tentative, field-specific standard
  • Inform meta-analysts and reviewers/editors of M-As
• Organization: lit review/intro, methods, discussion
What items would you include?
Plonsky’s (2012) Instrument for Assessing M-A Quality—Section I Combine?
Plonsky’s (2012) Instrument for Assessing M-A Quality—Section II
Plonsky’s (2012) Instrument for Assessing M-A Quality—Section III
Looking Forward
Recommendations for: individual researchers, journal editors, meta-researchers, researcher trainers, and learned societies
Dear individual researchers,
• Consider power before AND after a study (but especially before)
• p is overrated (meaningless?), especially when working with (a) small samples, (b) large samples, (c) small effects, or (d) large effects
• Report and interpret data thoroughly (EFFECT SIZES!); an effect-size calculation sketch follows below
• Consider regression and multivariate analyses
• Calculate and report instrument reliability
• Team up with an experimental (or observational) researcher
• Develop expertise in one or more novel (to you) methods/analyses
Love, Luke
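To make the "report effect sizes" advice concrete, here is a minimal sketch (not from the original talk) of computing Cohen's d and an approximate 95% confidence interval from two groups' descriptive statistics; the means, SDs, and sample sizes are made-up values.

```python
import math

def cohens_d_with_ci(m1, sd1, n1, m2, sd2, n2, z=1.96):
    """Cohen's d for two independent groups with an approximate 95% CI."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    # A common large-sample approximation to the standard error of d
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d, d - z * se, d + z * se

# Hypothetical group statistics (treatment vs. comparison)
d, lo, hi = cohens_d_with_ci(m1=78.2, sd1=9.5, n1=25, m2=72.4, sd2=10.1, n2=25)
print(f"d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Reporting d alongside its interval (rather than p alone) gives readers, and future meta-analysts, something they can actually synthesize.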
Dear journal editors,
• Use your influence to improve rigor, transparency, and consistency
• It’s not enough to require reporting (of effect sizes, SDs, reliability, etc.) – require interpretation too!
• Develop field-wide and field-specific standards
• Include special methodological reviews (see Magnan, 1994)
• Devote (precious) journal space to methodological discussions and reports
Love, Luke
Dear meta-researchers,
• Use your voice!
  • Guide interpretations of effect sizes in your domains
  • Evaluate and make known methodological strengths, weaknesses, and gaps; encourage effective practices and expose weak ones
• Don’t just summarize
  • Explain variability in effects, not just means (e.g., variability due to small samples, heterogeneous samples, or treatments)
  • Examine substantive and methodological changes over time and as they relate to outcomes
• Cast the net wide in searching for primary studies
Love, Luke
Dear researcher trainers,
• Place lots of emphasis on the basics: descriptive statistics, sample size + power + effect size + p, the synthetic approach, ANOVA
• Encourage more specialized courses, in other departments if necessary
Love, Luke
Dear learned societies (AILA, AAAL, LSA, etc.),
• Designate a task force or committee to establish field-specific standards for research and reporting practices, including:
  • at least one member of the executive committee,
  • members from the editorial boards of relevant journals,
  • a few quantitatively- and qualitatively-minded researchers,
  • and one or more methodologists from other disciplines
Love, Luke
Closure • Content objectives: conceptual and practical (but mostly conceptual) • Inform participants’ current and future research efforts • Motivate future inquiry with a methodological focus • Happy to consult or collaborate on projects related to these discussions