Coh-Metrix: An Automated Measure of Text Cohesion Danielle S. McNamara, Yasuhiro Ozuru, Max Louwerse, Art Graesser
Coh-Metrix Investigators • Co-PIs and Senior Researchers: Max Louwerse, Art Graesser, Zhiqiang Cai, Randy Floyd, Xiangen Hu, Vasili Rus • Postdocs & Staff: Rachel Best, David Dufty, Christian Hempelman, Tenaha O’Reilly, Yasuhiro Ozuru • Many students
Coh-Metrix • Coh-Metrix v1.2 Analyzes texts on many different dimensions of cohesion and language • Input text on a web site • Outputs 12 primary measures and over 200 additional measures Graesser, McNamara, Louwerse, & Cai, 2004
Prior Research • Increasing text cohesion improves memory for text content. • Increasing argument overlap between sentences. • Most plastics are good insulators. So are clothes you wear, like sweaters and coats. • Most plastics are good insulators. Other good insulators are the clothes you wear, like sweaters and coats. • Adding connectives • For example, most plastics are good insulators. • because, consequently, so that, in addition, however • Adding headers and topic sentences
Prior Research • Increasing text cohesion improves memory for text content. • Text cohesion is particularly crucial for low-knowledge readers. • Decreasing text cohesion helps high-knowledge readers process the text more actively and understand it at a deeper level. • McNamara, Kintsch, Songer, & Kintsch (1996, C&I) • McNamara & Kintsch (1996, DP) • McNamara (2001, CJEP)
Cohesion and Coherence • Research points to the need to consider text difficulty in terms of text cohesion and coherence. • Cohesion is a property of the text. • Coherence is a property of the reader’s mental representation. • We need automated measures of cohesion and coherence.
Current Method: Readability Measures • E.g., Flesch-Kincaid Grade Level • Based on the work of Rudolph Flesch in the 1940s • Scores range from 0-12 to predict grade appropriateness • Measure based on surface characteristics • sentence length • word length
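To make the "surface characteristics" point concrete, here is a minimal sketch of the standard published Flesch-Kincaid Grade Level formula, which depends only on words per sentence and syllables per word. The syllable counter is a rough vowel-group heuristic and the sentence splitter is a simple regular expression; both are assumptions made for illustration, not how any particular readability tool is implemented.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level from sentence length and word length only."""
    # Naive sentence split on ., ! and ? (an assumption; abbreviations will confuse it).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # The raw value is not clamped to any grade range here.
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    sample = "Most plastics are good insulators. So are clothes you wear, like sweaters and coats."
    print(round(flesch_kincaid_grade(sample), 2))
```

Nothing in this calculation looks at how sentences relate to one another, which is why a formula of this kind cannot register changes in cohesion.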
Goals of Coh-Metrix Tool • Analyze texts on many different dimensions of cohesion and language • Input text on a web site • Outputs over 200 measures • Focus primarily on deeper levels of meaning and cohesion, unlike standard readability formulas • Tailor texts to students (K12, college) with different world knowledge and abilities
Computational Linguistics Modules • Lexicons • Corpora norms • Morpho-semantics • Pattern classifiers • Part-of-speech tagging • Syntactic parsing • Latent Semantic Analysis
• Any disorder that stops the heart from supplying blood to the body is a threat to life. Heart disease is such a disorder. • Any disorder that stops the blood supply is a threat to life. Heart disease is very common. [Figure: Argument overlap and F-K scores on a scale from easy to hard]
Cohesion and Readability Scores for 19 pairs of passages examined in 12 published studies [Figure: Argument overlap and F-K scores on a scale from easy to hard]
List of Cohesion Publications • Beck et al. (1984) • Beck et al. (1991) • Britton & Gulgoz (1989) • Cataldo & Oakhill (2000) • Kintsch (1990) • Lehman & Schraw (2002) • Linderholm et al. (2000) • Loxterman et al. (1994) • McNamara (2001) • McNamara et al. (1996) • Vidal-Abarca et al. (2000) • Voss & Silfies (1996)
What variables showed a greater than 50% difference in favor of the cohesive text? • Linderholm et al. (2000), Mademoiselle Germaine (Easy Text): clarification connectives; causal particle to verb ratio; causal connectives; pronoun incidence • McNamara et al. (1996), Mammal Text, Exp. 1 • Lehman & Schraw (2002), The Quest for the Northwest Passage • No differences • causal particle to verb ratio; causal connectives; LSA sentence to sentence; noun overlap
Overall Results • The 20 variables showing the largest differences were co-reference measures. • Argument overlap measures showed the largest differences in comparison to noun and stem overlap measures • Argument overlap includes pronouns • They skied all day. They were tired. • Regardless of whether overlap was counted at distances of 1, 2, or 3 sentences • Adjacent overlap showed the largest difference
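The idea of overlap between sentences at distances of 1, 2, or 3 can be illustrated with a small sketch. It computes the proportion of sentence pairs at a given distance that share at least one non-function word, with pronouns counted, as in the "They skied all day. They were tired." example. This is only a rough approximation under stated assumptions: it uses a tiny hand-written stop list instead of part-of-speech tagging or parsing, the function names are hypothetical, and it is not the Coh-Metrix implementation.

```python
import re

# Tiny illustrative stop list (an assumption). Pronouns are deliberately left out
# of it, because argument overlap counts pronouns as well as nouns.
STOP_WORDS = {
    "a", "an", "the", "is", "are", "was", "were", "be", "to", "of",
    "and", "or", "so", "all", "in", "on", "like",
}


def split_sentences(text: str) -> list[str]:
    """Naive split on ., ! and ? (an assumption)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def candidate_arguments(sentence: str) -> set[str]:
    """Lower-cased tokens that are not in the stop list."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def overlap_proportion(text: str, distance: int = 1) -> float:
    """Share of sentence pairs (i, i + distance) with at least one word in common."""
    if distance < 1:
        raise ValueError("distance must be at least 1")
    word_sets = [candidate_arguments(s) for s in split_sentences(text)]
    pairs = [(word_sets[i], word_sets[i + distance])
             for i in range(len(word_sets) - distance)]
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a & b) / len(pairs)


if __name__ == "__main__":
    low = "Most plastics are good insulators. So are clothes you wear, like sweaters and coats."
    high = ("Most plastics are good insulators. Other good insulators are the clothes "
            "you wear, like sweaters and coats.")
    print(overlap_proportion(low), overlap_proportion(high))
```

Setting `distance=1` corresponds to adjacent sentences; larger values compare sentences that are further apart.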
Other Significant Variables • Type-Token Ratio for Nouns (L>H) • Higher level constituents per sentence (H>L) • Ratio of causal particles and causal verbs (p<.06; H>L) • Causal connectives (p<.07; H>L) • Celex, log Freq, min in sentence (p<.08; L>H) • Average Words per Sentence (p<.08; H>L) • LSA, sentence to sentence (p<.11; H>L)
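For readers unfamiliar with the first variable, a type-token ratio is the number of distinct words (types) divided by the total number of words (tokens). The sketch below assumes the nouns have already been extracted by some part-of-speech tagger, which is not shown; the function name is hypothetical and this is not the Coh-Metrix code.

```python
def type_token_ratio(tokens: list[str]) -> float:
    """Distinct tokens divided by total tokens, ignoring case."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    lowered = [t.lower() for t in tokens]
    return len(set(lowered)) / len(lowered)


if __name__ == "__main__":
    # Nouns listed by hand for illustration (an assumption, no tagger is run).
    nouns = ["disorder", "heart", "blood", "body", "threat", "life",
             "heart", "disease", "disorder"]
    print(round(type_token_ratio(nouns), 2))
```

A lower ratio means the same nouns are repeated more often; a higher ratio means more distinct nouns relative to the total.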
Indicates that the high-cohesion texts did not add new information
Number of Words: Descriptive Statistics • N = 38 • Minimum = 101.0 • Maximum = 1390.0 • Mean = 590.8 • Std. Deviation = 381.9
ANNOUNCING THE RELEASE OF Coh-Metrix 1.1
Current Goals • Examine cohesion measures by grade level for TASA and complete textbooks. • Conduct empirical studies to further examine the effects of text cohesion for adults. • Conduct experiments to establish the effects of cohesion for young children. • E.g., currently conducting comprehension and eye-tracking studies with 3rd-5th grade children.
What will Coh-Metrix achieve? • Enhance education by giving educators better tools for choosing textbooks • Help publishers more appropriately tailor books to target age groups • Help writers improve the cohesion of their writing • Help researchers better understand the hidden properties of text