The Computational Foundations of Language Arts
Art Graesser, Zhiqiang Cai & Nia Dowell
Psychology, Computer Science, & the Institute for Intelligent Systems
Overview
• Analysis of text and discourse with Coh-Metrix
• Five primary measures of text complexity
• Complexity profiles for different literary texts
Text Difficulty Measures
• Common measures
  • Flesch-Kincaid (Klare, 1975)
  • Degrees of Reading Power (Koslin, Zeno, & Koslin, 1987)
  • Lexile scores (Stenner, 2006, MetaMetrics)
• Typical factors
  • Word familiarity
    • Word frequency in the language
    • Number of letters
    • Number of syllables
  • Sentence length
• Validated primarily with cloze comprehension tests
• Predictive of surface-level comprehension
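As an illustration of how word length and sentence length combine in such measures, here is a minimal Python sketch of the widely published Flesch-Kincaid Grade Level formula. The syllable counter is a rough vowel-group heuristic assumed for this sketch, and the function names are hypothetical; published tools typically rely on pronunciation dictionaries.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (heuristic, an assumption)."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing silent 'e' as non-syllabic, except in '-le' endings.
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def fk_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level for a text, using naive tokenization."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade_from_counts(len(words), len(sentences), syllables)
```

Both terms of the formula are surface features, which is consistent with the point that these measures mainly predict surface-level comprehension.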
Multilevel framework of discourse comprehension
• Words
• Syntax
• Textbase
  • Explicit ideas (propositions)
  • Referential cohesion
• Situation model
  • Causal, intentional, temporal, spatial, and logical relationships
  • Connectives
• Genre and rhetorical structure
• Pragmatic communication
References: Graesser & McNamara (2010); Graesser, Millis & Zwaan (1997); Kintsch; Perfetti
Measures from Coh-Metrix
Graesser, McNamara, Louwerse, & Cai (2004)
[Diagram of the Coh-Metrix pipeline. Labels: Text; Preprocessing (filters); Syntax Analysis (tagger, parser); Lexical Analysis (lemmatizer, stemmer); Database Info (WordNet, LSA, word lists, CELEX, MRC); Syntax Features; Lexical Features; Word Difficulty; Sentence Complexity; Referential Cohesion; Causal Cohesion; Temporal Cohesion; Spatial Cohesion; Text Complexity Components; Coh-Metrix Database]
Example Coh-Metrix Measures
• Co-reference Cohesion
  • Noun and argument overlap
  • Stem overlap (lemmas: run, runs, runner)
  • Latent semantic analysis (LSA)
  • Lexical diversity (type-token ratio)
  • Pronouns
• Word Measures
  • Number of syllables
  • Part of speech (noun, verb…)
  • Word frequency
  • Concreteness, imagery
  • Multiple meanings
• Syntax
  • Structural complexity
  • Modifiers per noun-phrase
  • Words before main verb of main clause
  • Syntactic similarity between sentences
• Situation Model Cohesion
  • Connectives & discourse markers
  • Causal and intentional verbs
  • Causal and intentional cohesion
  • Repetition in tense and aspect
  • Logical operators (and, or, therefore, if, then, not)
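A few of the measures above can be approximated with very little code. The Python sketch below is not the Coh-Metrix implementation: it assumes a naive regex tokenizer, a small hypothetical stopword list in place of part-of-speech tagging, and a per-1,000-words incidence scale chosen for illustration.

```python
import re

# Hypothetical stopword list standing in for part-of-speech filtering.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "or",
             "is", "are", "was", "were", "it"}
# Logical operators listed on the slide.
LOGICAL_OPERATORS = {"and", "or", "therefore", "if", "then", "not"}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is discarded."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text: str) -> float:
    """Lexical diversity: unique word types divided by total tokens."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def adjacent_overlap(sentences: list[str]) -> float:
    """Share of adjacent sentence pairs with at least one common content word."""
    content = [set(tokenize(s)) - STOPWORDS for s in sentences]
    pairs = list(zip(content, content[1:]))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a & b) / len(pairs)


def logical_operator_incidence(text: str) -> float:
    """Occurrences of logical operators per 1,000 tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in LOGICAL_OPERATORS)
    return 1000 * hits / len(tokens)
```

For example, `adjacent_overlap(["The cat sat.", "The cat slept.", "Dogs bark."])` counts one overlapping pair out of two. A fuller version would restrict the overlap to nouns and match on stems, which requires a tagger and a stemmer.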
Analysis of TASA Texts
• TASA: Touchstone Applied Science Associates
• 37,651 texts
• Represents texts a student would encounter throughout K-12
• Texts had a mean length of 288.6 words (SD = 25.4)
• Most of the texts are in language arts, science, and social studies
• Degrees of Reading Power (DRP) scores
• Conducted a principal components analysis
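To make the principal components step concrete, here is a minimal pure-Python sketch that standardizes each measure and extracts the first component of the correlation matrix by power iteration. It is an illustration under stated assumptions, not the analysis run on the TASA corpus: the function names are hypothetical, the input is any list of rows of numeric measures, no column may be constant, and only the first component is recovered.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows: list[list[float]]) -> list[list[float]]:
    """Standardize each column to mean 0 and population SD 1."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]  # assumes no column is constant
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def correlation_matrix(z_rows: list[list[float]]) -> list[list[float]]:
    """Correlation matrix computed from already standardized rows."""
    n, k = len(z_rows), len(z_rows[0])
    return [[sum(r[i] * r[j] for r in z_rows) / n for j in range(k)]
            for i in range(k)]


def first_component(matrix: list[list[float]], iterations: int = 200):
    """Power iteration: returns (loading vector, eigenvalue) of the top component."""
    k = len(matrix)
    v = [1 / math.sqrt(k)] * k  # uniform start; fails if orthogonal to the answer
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(k))
                     for i in range(k))
    return v, eigenvalue
```

Dividing the eigenvalue by the number of measures gives the proportion of variance that the component accounts for. A statistics package would normally be used to extract and rotate several components at once.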
Coh-Metrix Ease of Processing
• Narrativity. Narrative text tells a story, with characters, events, places, and things that are familiar to the reader.
• Referential cohesion. High-cohesion text contains words and ideas that overlap across sentences and the entire text, forming threads that connect the textbase together for the reader.
• Situation model cohesion. Causal, intentional, and temporal connectives help the reader to form a more coherent and deeper understanding of the text.
• Syntactic simplicity. Sentences with few words and simple, familiar structures are easier to process and understand.
• Word concreteness. Concrete words evoke mental images and are more meaningful to the reader than abstract words.
Narrativity (Genre)
NARRATIVE (e.g., stories, language arts)
• Familiar words
• Early age of acquisition
• Verbs
• Adverbs
• Pronouns
• Negations
• Intentional actions
INFORMATIONAL (e.g., science, expository)
• Nouns
• Adjectives
• Longer noun-phrases
• Passives
Z-scores on Five Components as a Function of DRP Grade Levels
Z-scores for Language Arts versus Science and Social Studies
Z-scores on Five Components as a Function of DRP Grade Levels for Language Arts
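The z-scores plotted in the charts above express each score relative to a reference distribution. As a minimal Python sketch, a raw score can be standardized against a set of scores and, under the assumption that scores are roughly normally distributed, mapped to a percentile. The function names are hypothetical, and the normal-curve mapping is an assumption of this sketch rather than a documented step of the analysis.

```python
from statistics import NormalDist, mean, pstdev


def z_scores(values: list[float]) -> list[float]:
    """Standardize values against their own mean and population SD."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("values must not all be identical")
    return [(v - m) / sd for v in values]


def percentile_from_z(z: float) -> float:
    """Percentile rank of a z-score under a standard normal curve (assumption)."""
    return 100 * NormalDist().cdf(z)
```

Under this mapping a z-score of 0 corresponds to the 50th percentile and a z-score of 1 to roughly the 84th.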
Percentile on Text Complexity: profiles for literary texts
[One chart per text. Each chart plots the text's percentile from EASY to DIFFICULT on five scales: narrativity (high to low), referential cohesion (high to low), situation model cohesion (high to low), syntax (simple to complex), and word abstractness (concrete to abstract).]
• Midsummer Night’s Dream (grades 6-8), William Shakespeare
• Romeo and Juliet (grades 9-10), William Shakespeare
• Macbeth (grades 11-12), William Shakespeare
• Death of a Salesman (grades 11-12), Arthur Miller
• Glass Menagerie (grades 9-10), Tennessee Williams
• Adventures of Tom Sawyer (grades 6-8), Mark Twain
• Little Women (grades 6-8), Louisa May Alcott
• Grapes of Wrath (grades 9-10), John Steinbeck
• Fahrenheit 451 (grades 9-10), Ray Bradbury
• Jane Eyre (grades 9-10), Charlotte Brontë
• As I Lay Dying (grades 11-12), William Faulkner
• The Great Gatsby (grades 11-12), F. Scott Fitzgerald
Next Steps
• Analyze K-12 texts in different states
• Gates Foundation and Core Standards
• Structural equation modeling on 5 complexity components
• Integrating LIWC with Coh-Metrix