
Assessing the Quality of Research Plans and Publications


Presentation Transcript


  1. Assessing the Quality of Research Plans and Publications

  2. Validity ----> Credibility ----> Usability For Making Decisions About: • Instructional Methods: Effective? • Program Materials: Effective? • Assessment Instruments and Methods: Measure what they are supposed to, reliably? • Data, findings: Accurate? Generalizable?

  3. Adequacy of the Question Is the question, hypothesis, problem, or interest stated precisely enough that things to measure objectively can be derived from it? Poor. “Do students learn more when tasks are authentic?” [What does ‘authentic’ mean? What does ‘learn more’ mean? Amount? Speed?] Better. “Is there a difference in the number of learning trials required for students to learn to decode words that are in common use (such as fun, car, cat, is, run) vs. pseudowords (fis, nif, ris)?”

  4. Coverage of the Topic Poor. A new theory of reading is presented. The only literature cited was done by the author and by persons sympathetic to the author’s position (that a new theory is needed). The author criticizes other approaches to reading instruction (except his own), but presents none of the extensive research that DOES support other approaches and/or that criticizes the author’s. In other words, the author is misleading readers into thinking that his approach is the best one.

  5. Coverage of the Topic Better. A new theory of reading is presented. The author identifies the other major approaches to reading instruction, and presents a representative sample of the research on each approach—both research that supports and research that challenges these approaches. The author also presents research that supports and that challenges his own approach.

  6. Relevance Did the literature review examine literature that is both directly relevant to the topic at hand and to larger relevant issues? For example, did a literature review on effective mathematics instruction include mathematics research and research on instruction in general? Poor. The literature review focuses narrowly on the topic.

  7. Relevance Better. The literature review focuses on the topic and on other relevant issues, such as the proper distribution of practice. Therefore, the reader can see that the topic at hand is important on both a smaller and larger scale.

  8. Useful Summary Did the writer summarize the literature with a set of generalizations stated as propositions and facts, and diagrams? Did the writer identify possible gaps in what is known? Poor. The literature review is mostly a cascade of supporting citations that gives the appearance of serious intent to conduct unbiased work. The literature does not lead obviously to particular research questions.

  9. Useful Summary Better. The literature review ends with a summary of what is known empirically, partly known but not confirmed, and what is unknown. These empirical findings are stated such that it is clear exactly what the writer is talking about and what needs to be studied.

  10. Useful Summary Poor. “The more engaged students are, the more they participate.” [Vague] Good. “The higher the rate of opportunities to respond, the fewer times students become inattentive.” [Objective, observable]

  11. Feasibility Is the project do-able, or feasible? Can sufficient data be collected and analyzed? Poor. “We wish to determine the most effective method for teaching comprehension.” [It is impossible to determine what IS most effective because you cannot study everything that IS and will be.] Better. “We wish to determine which of three methods for teaching comprehension is associated with the highest gains in scores of comprehension on the Especially Fine Test of Reading Comprehension.”

  12. Feasibility Poor. “Our objective is to determine whether violent images in the media cause violent behavior in schools.” [You may learn that students who commit more violent acts in school also watch more violent events on TV, but you can never know if seeing those events causes the violence. Besides, there are so many intervening variables and contributing conditions that you could never study all the connections.] Better. “Our objective is to determine whether students who commit more violent acts in school also watch more violent events on TV.”

  13. Null Hypothesis Is the research or report based on the null hypothesis—an intellectually honest position? “I believe program X works, but I’m going to assume that it doesn’t work and I’m going to collect data to try to show that it doesn’t work. If the data do NOT show that X does not work, then I will conclude that maybe it does work. Maybe.”

  14. Null Hypothesis Or, on the contrary, is the author trying to persuade readers to join him in accepting a method, or does he already believe he knows the truth or knows what works, or is he not very interested in the possibility that he is wrong?

  15. Null Hypothesis Poor. “Good readers do NOT sound out words. They use context to predict what words say. I will support this statement with examples.” The point is not to support what you believe but to TEST it—to try to FALSIFY it. Look for examples that do NOT support the belief or hypothesis. Look for good readers who do NOT guess. Anyone can find evidence to support the stupidest ideas. Hitler could “prove” he was a kind person by reporting how often he petted his dog.

  16. Null Hypothesis Poor. An author only reports studies that show one thing (that Program X is associated with high achievement). The author concludes that Program X produces high achievement. However, there is some research showing that Program X did not produce high achievement. The author did not report THAT research. [This is called “cherry picking.” It is also called not telling the whole truth.]

  17. Null Hypothesis Good. An author conducts research on Math Program X in one school (suburban, middle class). Kids who got Program X learned a lot of math. The author says, “It probably won’t work again.” She tests the program again in a very similar school. Same results. “Okay, but it won’t work in a poor, urban school.” Same results. “Okay, that was a fluke. It won’t happen again.” She tests it in another poor, urban school. Same results. “Okay, but it won’t work with Hispanic kids.” It does. “Okay, but it won’t work with POOR Hispanic kids.” It does. “Okay, but it won’t work in rural areas.” It does. “Okay, maybe it’s wrong to say it doesn’t work. It makes sense for a principal to try it.” [Replication]

  18. Appropriate Design Is the design (survey, experiment, ethnography) appropriate to the question? Let’s say the research question has to do with what works better (what is associated with better outcomes), or whether something worked. For example, is one reading program better than another? Does a math program produce more skilled math students?

  19. Appropriate Design Poor. Do a survey of teachers. Ask their opinions. [If so, you are measuring teacher perception. You WANT to measure student achievement. That requires that you measure what students DO, not merely what teachers think.] Poor. Have teachers make up their own instruments to measure progress and outcomes. [How will you know that these instruments measure what they are supposed to measure? Are reliable? Are predictive? Are as useful as standardized instruments? You won’t.]

  20. Appropriate Design Poor. Conduct ethnographic observations in classes. Observe student-teacher interaction and students working together and individually. [This may be interesting information, but it tells you little about WHAT math or reading skills students learned. It tells you about the conditions in which they learned or didn’t learn. You can’t use ethnographic research as the sole source of evaluative data on methods and curricula. You need to know what students learned and how much they learned.]

  21. Appropriate Design Better. You have an experimental design. You give pre-tests, post-tests, and possibly measures of progress in between. You use instruments that have already been validated by research, and that measure student learning—specifically, the accuracy and speed of solving math problems, or reading connected text. If you are comparing one program with another, you have comparison classes that are as similar as possible so that other variables (such as help from parents, differences in students’ abilities) do not contribute to the outcomes. Observers are trained to use the instruments and the reliability of their measures is determined.

  22. Experimental Designs Classic Design: Pre-test, Post-test, Control Group, with Random Allocation. The experimental group (E) gets a pre-test, the treatment, and a post-test; the control group (C) gets the same pre-test and post-test without the treatment. However, randomization may not be possible. Some students are denied effective instruction.
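
  To make the comparison concrete, here is a minimal Python sketch (not from the slides; all scores are made up) of how gain scores from the classic E/C design might be compared once the data are collected:

```python
# Minimal sketch (made-up scores): compare gain scores for a randomly
# allocated experimental group (E) and control group (C).
from scipy import stats

e_pre,  e_post = [42, 38, 45, 40, 37, 44], [58, 55, 61, 54, 50, 60]
c_pre,  c_post = [41, 39, 44, 42, 36, 43], [47, 45, 50, 48, 41, 49]

e_gain = [post - pre for pre, post in zip(e_pre, e_post)]
c_gain = [post - pre for pre, post in zip(c_pre, c_post)]

# Independent-samples t-test on the pre-to-post gains.
t, p = stats.ttest_ind(e_gain, c_gain)
print(f"Mean gain E = {sum(e_gain)/len(e_gain):.1f}, "
      f"C = {sum(c_gain)/len(c_gain):.1f}, t = {t:.2f}, p = {p:.3f}")
```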

  23. Experimental Designs Time Series. [Chart: Percentage Passing plotted by year from 1999 through 2008, with the earlier years labeled Eclectic and the later years labeled Reading First.] No no-treatment control group. However, samples will vary from year to year.

  24. Experimental Designs Interrupted Time Series. Same Group, Reversal of Treatment and Baseline: Eclectic (A1), Reading First (B1), Eclectic (A2), Reading First (B2).
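
  A minimal sketch, again with hypothetical weekly scores, of how the outcome in each phase of an A1-B1-A2-B2 reversal design might be summarized:

```python
# Minimal sketch (hypothetical weekly scores): summarize each phase of an
# A1-B1-A2-B2 reversal design by its mean outcome.
phases = {
    "A1 (Eclectic)":      [52, 54, 53, 55],
    "B1 (Reading First)": [63, 66, 68, 70],
    "A2 (Eclectic)":      [58, 56, 55, 54],
    "B2 (Reading First)": [67, 70, 72, 74],
}

for label, scores in phases.items():
    print(f"{label}: mean = {sum(scores) / len(scores):.1f}")

# If the outcome rises in each B (treatment) phase and falls back in each
# A (baseline) phase, that pattern supports a treatment effect even though
# there is no separate control group.
```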

  25. Appropriate Design Rule. If you want to describe what is happening at one point in time (for example, the percentage of students in each grade level that pass state tests in reading—a cross-section) you could do a SURVEY (literally, over view) of a sample of schools that represent the population of schools in the state (e.g., some urban, some suburban, some rural; some wealthy, some middle class, some poor).
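
  A minimal sketch, using a made-up school list, of how such a representative sample might be drawn by stratifying on locale and income:

```python
# Minimal sketch (made-up school list): draw a stratified random sample so
# each locale/income combination in the state is represented in the survey.
import random

schools = [                                # (name, locale, income) -- hypothetical
    ("School A", "urban", "poor"),         ("School B", "urban", "middle"),
    ("School C", "suburban", "wealthy"),   ("School D", "suburban", "middle"),
    ("School E", "rural", "poor"),         ("School F", "rural", "middle"),
    # ...the rest of the population of schools would be listed here
]

strata = {}
for name, locale, income in schools:
    strata.setdefault((locale, income), []).append(name)

sample = []
for group in strata.values():
    k = max(1, len(group) // 2)            # take about half of each stratum
    sample.extend(random.sample(group, k))

print(sample)
```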

  26. Appropriate Design These survey data might show relationships; for example, the higher the social class, the higher the percentage of students who pass. But that does not show that social class is the cause of achievement. It could be that parental education is part of the cause; educated parents can teach their own children, or wealthier schools have better teachers.

  27. Appropriate Design If you want to find out what people think and feel, ask them, using interviews or questionnaires. This information may be useful in making certain decisions. For example, teachers using a new reading program may say that they do not like it, or say that they need more assistance. Still, these opinions cannot be used to judge whether a program is well-designed or is effective. That is a different variable and would have to be measured in terms of student achievement by teachers who used the program properly.

  28. Causal Relationships If the research is assessing causal relationships (e.g., program effectiveness) did the researchers satisfy the conditions for making credible causal inferences? Did they:

  29. Causal Relationships • Determine that change in the outcome variables (e.g., achievement) followed change in the input variables (introduction of the program)? This requires a pre-test and perhaps repeated or alternative measures. • Collect data on both the outcome variables (e.g., achievement) and the input variables (e.g., whether, how often, and how well teachers used the program)? Otherwise, if data show that achievement is low, you might conclude the program is ineffective when in fact teachers did not use it properly. • Rule out the effects of extraneous factors, such as maturation, other sources of instruction, and unreliable measurement?

  30. Definitions Often, • When definitions and measures are vague (you don’t know exactly what you are supposed to see), • When measures are subjective (students enjoy, appreciate, demonstrate, understand), and • When data collection involves writing narrative and impressions rather than counting what students and teachers do… it serves the function of disguising bias, ignorance, and agenda behind a façade of “science.”

  31. Definitions Weak, vague, imprecise, and too-broad definitions foster or support unaccountability. How can you be held accountable if you never say exactly what you do and what students are supposed to learn? It ensures that beliefs and biases will always be supported by evidence because when definitions are vague and general, and data are subjective, almost anything can count as supporting evidence.

  32. Definitions Were adequate conceptual definitions (from which measures were derived) used? That is, did conceptual definitions cover all relevant features of a concept and exclude irrelevant features? Poor. Reading is defined as a psycholinguistic guessing game. Therefore, researchers measure student guessing as evidence of reading. Is guessing what is meant by reading?

  33. Definitions Poor. Reading is defined as making sense of text, or “meaning-making.” [In other words, reading is wrongly defined not by what readers DO but by what reading accomplishes.] Therefore, researchers will only measure comprehension. If a teacher is not yet working on comprehension (but IS working on other reading skills) students will by that narrow definition be considered nonreaders.

  34. Definitions Better. Reading is a cognitive routine for accurately and rapidly decoding written text into words and connected statements, and then comprehending the definitions and propositions communicated by the text. [This definition covers all the reading skills (the correspondence between letters and sounds, sounding out words, reading words and sentences fluently, and knowing vocabulary and comprehension strategies). Research based on this definition would be obliged either to study all of what is meant by reading or to explicitly limit the research to certain subskills.]

  35. Definitions Were adequate operational definitions used? Operational definitions are supposed to be derived from conceptual definitions. In addition, operational definitions should provide clear examples that cover the range of what is implied by the conceptual definition, and should exclude what is not relevant.

  36. Definitions Poor. Reading is (properly) conceptually defined as a cognitive routine for accurately and rapidly decoding written text into words and connected statements and then comprehending the definitions and propositions communicated by the text. However, the operational definition of reading includes how children handle books (upright, turn pages), name the parts of a book, and recognize environmental print. These may be important, but they are NOT part of the conceptual definition of reading. Therefore, kids whose reading is measured according to this definition may get high scores even though they are not reading.

  37. Definitions Better. The researchers state that they will not measure all aspects of reading. They are interested (in this study) only in students accurately and rapidly decoding words. They define decoding words this way: “The student says each sound in a word, does not stop between the sounds, and says the word as a unit (blends sounds into a whole).” This operational definition yields objective, observable measures.

  38. Definitions Were objective measures derived from the operational definitions? Decisions about which variables affect other variables, or about whether a program or method is effective, are so important to children’s lives that measures must be objective; that is, any observer can SEE what is being measured.

  39. Definitions Poor measures. • “Students’ enjoyment of reading.” [How many times they smile?] • “Students’ appreciation of literary genres.” [Students say Thanks to a poem?] • “Students’ demonstrating understanding of the sounding out routine.” [They use hand puppets?] If a researcher doesn’t define variables in terms of what persons DO, then the researcher probably doesn’t know WHAT he’s talking about.

  40. Definitions Better measures. • Enjoyment of reading might be measured objectively (indirectly) by HOW MANY books or how much time students read on their own. • Students’ appreciation of literary genres might be measured objectively by how many samples of literary genres (fiction, biography, poetry) they correctly name. • Students’ demonstrating understanding of the sounding out routine might be measured objectively by the number of words in a paragraph students correctly read within two minutes.

  41. Tested Instruments Were instruments and measurement methods tested for validity and reliability? For example, were the scores obtained from a new instrument compared to scores obtained from another instrument known to be valid and reliable? And were the scores highly correlated? If so, then the new instrument is probably valid and reliable. [Criterion validity] If a new instrument is not validated, there is no way to know if it measures what it is supposed to measure and provides accurate information.
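
  A minimal sketch (hypothetical scores) of one way criterion validity might be checked, by correlating scores from the new instrument with scores from an already-validated instrument given to the same students:

```python
# Minimal sketch (hypothetical scores): estimate criterion validity by
# correlating the new instrument with an established, validated instrument
# administered to the same students.
from scipy import stats

new_instrument = [12, 18, 25, 31, 36, 40, 47, 52]
established    = [15, 20, 28, 30, 38, 41, 45, 55]

r, p = stats.pearsonr(new_instrument, established)
print(f"r = {r:.2f}, p = {p:.3f}")
# A high correlation suggests the new instrument measures much the same
# thing as the instrument already known to be valid and reliable.
```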

  42. Sampling Are the sample size, composition, and selection/allocation of participants to groups appropriate for the type of study and for how the findings might be used? Poor. The author claims that a teaching method is effective. However, it was used in only one class, or in one school. There is no way to tell if it would be effective anywhere else. Poor. The author claims that a method was not effective. However, it was used in only one class, or in one school. There is no way to tell if it would have been effective elsewhere.

  43. Sampling Poor. The author claims that one method worked better than another. However, the two comparison groups were not created by random assignment or by matching. Therefore, the groups may have been different in other ways besides the method that is associated with higher achievement.
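
  A minimal sketch of how comparison groups might be created by random assignment, using a hypothetical class roster:

```python
# Minimal sketch: create two comparison groups by random assignment from a
# hypothetical class roster, so pre-existing differences are spread by
# chance rather than by anyone's choice.
import random

students = [f"Student {i}" for i in range(1, 21)]
random.shuffle(students)

half = len(students) // 2
group_a = students[:half]    # will receive Method A
group_b = students[half:]    # will receive Method B

print("Method A:", group_a)
print("Method B:", group_b)
```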

  44. Sampling Better. The researcher conducted a pilot study to see if a method seems to work well enough that it ought to be given a more valid test. The author used a small sample (one class or school). This is not a representative sample of the school population, but that is all right. [Do you want to use a method that may be harmful on a lot of kids?] The author claims that the findings are very tentative and cannot be generalized anywhere else. Further research is needed.

  45. Sampling Better. The author has conducted research on the same method in many different settings. Each new study is called a replication. The sample of places is representative of the population of schools. The author claims that the data from most of the studies suggest that the method is effective most of the time, but the author cannot figure out what factors hinder its effectiveness or what additional factors may be needed to make it work. The author advises caution in using the method. [This is honest and morally responsible.]

  46. Longitudinal Was the research conducted for long enough to reveal important processes? Poor. A new teaching method was used for (two weeks, a month, a semester). Students made significant gains in achievement between pre-test and post-test. The author says that the findings are evidence that the method works. [The introduction of something new is often followed by beneficial change, especially because students and teachers expect something to happen.] Do YOU plan to use the method for such a short time?

  47. Longitudinal Better. A new teaching method was used for a year. [Longitudinal research] For the first month or so, students made significant gains. But progress slowed. By the end of the year, students receiving the new method had gained no more than students in the control class. The authors conclude that there is no evidence that the program works better than the current method.

  48. Level of Measurement Was the proper level of measurement used? It is best to measure with the highest degree of precision or detail. Least detail. You can measure at the nominal level. These are categories, such as a student who is a “Visual learning style reader,” an “Auditory learning style reader,” or a “Tactile/kinesthetic learning style reader.”

  49. Level of Measurement A little more detail. You can measure at the ordinal level, which implies rank order. • Proficient reader • Emergent reader • Struggling reader • Nonreader

  50. Level of Measurement Most detail. You can measure at the ratio level: real numbers. Objective data. Count something everyone can see. • Some students in grade 3 read connected text 120 correct words per minute. • Other students in grade 3 read connected text 100 correct words per minute. • Still other students in grade 3 read connected text less than 40 correct words per minute.
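
  A minimal sketch (hypothetical values) contrasting the three levels of measurement for reading, with correct words per minute as the ratio-level measure:

```python
# Minimal sketch (hypothetical values): the same student described at the
# nominal, ordinal, and ratio levels of measurement.
def correct_words_per_minute(words_attempted: int, errors: int,
                             seconds: float) -> float:
    """Ratio-level measure: count of correct words scaled to one minute."""
    return (words_attempted - errors) / (seconds / 60.0)

nominal = "Auditory learning style reader"   # category only
ordinal = "Emergent reader"                  # rank order only
ratio = correct_words_per_minute(words_attempted=130, errors=10, seconds=60)

print(nominal, ordinal, f"{ratio:.0f} correct words per minute", sep=" | ")
```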
