LIWC Linguistic Inquiry & Word Count Jeff Spicer & Matthew Egizii
The Pennebaker Dictionary • LIWC uses dictionaries of categories to define its search terms. • The Pennebaker Dictionary is built in, but others can be imported. • The Pennebaker Dictionary (2001) • LIWC's default set of psychologically meaningful categories • 74 subdictionaries (categories) • {80 in LIWC 2007} • Each subdictionary is composed of words chosen and assessed by a set of judges, who then agreed upon the subdictionary scales (agreement ranged from 93% to 100%). • Many of these words appear in multiple categories.
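A minimal sketch of how dictionary-based counting of this kind can work, assuming a simple lowercase tokenizer. The category names and word lists below are hypothetical stand-ins, not the actual Pennebaker word lists, and LIWC's own matching rules may differ. Note that a word listed in several categories is counted once in each of them.

```python
import re
from collections import Counter

# Hypothetical toy dictionary for illustration only; these are NOT
# the real Pennebaker categories or word lists.
TOY_DICTIONARY = {
    "positive_emotion": {"happy", "love", "glad"},
    "negative_emotion": {"sad", "hate", "grief"},
    # Every emotion word also belongs to the broader "affect" category.
    "affect": {"happy", "love", "glad", "sad", "hate", "grief"},
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_percentages(text, dictionary):
    """Return each category's share of the total word count, in percent."""
    words = tokenize(text)
    counts = Counter()
    for word in words:
        for category, vocabulary in dictionary.items():
            if word in vocabulary:
                counts[category] += 1  # one word may hit several categories
    if not words:
        return {category: 0.0 for category in dictionary}
    return {c: 100.0 * counts[c] / len(words) for c in dictionary}


scores = category_percentages("I love you and I hate grief", TOY_DICTIONARY)
```

Here the seven-word sentence contains one positive and two negative words, and all three also count toward the broader category.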
The Pennebaker Dictionary • If you are able, use the Pennebaker 2007 dictionary rather than the 2001 version: • It removes several categories that had “consistently low base rates and were rarely used: Optimism, Positive Feelings, Communication Verbs, Other References, Metaphysical, Sleeping, Grooming, School, Sports, Television, Up, and Down. The category of unique Words (also known as Type/Token ratio) has also been removed.” • It adds the categories of Conjunctions, Adverbs, Quantifiers, Auxiliary Verbs, Commonly-used Verbs, Impersonal Pronouns, Total Function Words, and Total Relativity Words. • Also, the categories themselves are much more fleshed out: • Religion is no longer strictly “Catholicism,” as it was before (which seemed a tad biased).
The Pennebaker Dictionary • The LIWC website has a page comparing the scores each dictionary produces on its library of texts. • Means, SDs, Correlations • Comparing LIWC2007 with LIWC2001 Dictionaries
Preparing Text • LIWC uses .txt or ASCII files for analysis. • Files should be checked for: • Correct U.S. Spelling • Spelled-out meaningful abbreviations • Removal of “Non-Fluency” words
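The checklist above could be partly automated before handing files to LIWC. The following is a rough sketch under stated assumptions: the filler list and the abbreviation map are hypothetical examples chosen for illustration, and spelling correction is left out entirely.

```python
import re

# Hypothetical examples; a real project would build these lists
# from its own texts.
FILLERS = {"um", "uh", "er", "hm"}
ABBREVIATIONS = {"dept": "department", "govt": "government"}

PUNCTUATION = ".,;:!?\"'"


def prepare_text(raw):
    """Drop filler tokens and spell out known abbreviations."""
    cleaned = []
    for token in raw.split():
        core = token.strip(PUNCTUATION).lower()
        if core in FILLERS:
            continue  # remove the non-fluency word (and its punctuation)
        if core in ABBREVIATIONS:
            # Replace only the abbreviation, keeping attached punctuation.
            token = re.sub(re.escape(core), ABBREVIATIONS[core], token,
                           flags=re.IGNORECASE)
        cleaned.append(token)
    return " ".join(cleaned)


prepared = prepare_text("Um, the govt said, uh, nothing")
```

The cleaned string can then be written to a .txt file with `open(path, "w", encoding="ascii", errors="replace")`, assuming plain ASCII output is wanted.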
Reading the Results • Results are given as a % of the total words in the text. • Except for: • Word Count • Words Per Sentence • Sentences Ending with a Question Mark (?) • Results are saved to a .xls (spreadsheet) file. • The file is tab-delimited, meaning that importing it into an SPSS data file is quite simple.
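Because the output is tab-delimited text, it can also be read outside SPSS. Below is a small sketch using Python's standard `csv` module. The column headers and file names in the sample are assumptions made for illustration (the real LIWC header row may differ); the numbers are the word counts, words per sentence, and question-mark percentages reported later in this deck.

```python
import csv
import io

# Stand-in for the contents of a LIWC results file. Headers and file
# names are hypothetical; the values come from the results slides.
SAMPLE = (
    "Filename\tWC\tWPS\tQmarks\n"
    "odyssey.txt\t117643\t37.43\t0.23\n"
    "aeneid.txt\t101370\t32.74\t0.41\n"
    "beowulf.txt\t23726\t25.19\t0.03\n"
)


def load_results(handle):
    """Parse tab-delimited rows, converting numeric fields to float."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        parsed = {}
        for key, value in row.items():
            try:
                parsed[key] = float(value)
            except ValueError:
                parsed[key] = value  # keep non-numeric fields as text
        rows.append(parsed)
    return rows


results = load_results(io.StringIO(SAMPLE))
```

For a file on disk, the same function can be given `open(path, newline="")` in place of the `io.StringIO` wrapper.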
Opening & Processing Files • Opening: Allows you to read/edit the text within LIWC • Processing: Runs the text analysis
Setting Dictionaries & Categories • Each of the categories can be turned on/off with a checkbox.
Analyze Function • Segmenting the File • Segmenting the Selection allows you to divide the text into multiple parts for analysis.
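As an illustration of what segmenting does, here is a sketch that splits a text into a fixed number of roughly equal parts by word count. This is only one possible segmentation scheme, assumed for the example; the options LIWC itself offers are not described in these slides.

```python
def segment_by_count(text, n_segments):
    """Split text into n_segments parts of near-equal word count."""
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    words = text.split()
    base, extra = divmod(len(words), n_segments)
    segments = []
    start = 0
    for index in range(n_segments):
        # The first `extra` segments each take one leftover word.
        size = base + (1 if index < extra else 0)
        segments.append(" ".join(words[start:start + size]))
        start += size
    return segments


parts = segment_by_count("a b c d e f g", 3)
```

Each part could then be analyzed separately, for example to compare the opening, middle, and close of a long epic.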
Analysis of Epic Texts • We decided to use the power of CATA on several huge literary blocks of text: • The Odyssey • The Aeneid • Beowulf
Analysis of Epic Texts • Textualization of Oral Epic Tradition • Attempt to capture the Ekphrasis of the original medium. • Some elements are lost in translation. • Question: Which elements are both difficult to describe and also necessary to pass on to a culture?
Analysis of Epic Texts • Primarily, we were interested in references to Gods, Religious Tradition, and Worship. • We chose the (now defunct) category “Metaphysical” rather than “Religion.” • Its word choices are more in line with spirituality than with modern, formalized religion. • We also used the Standard Information category: • Word Count, Words/Sentence, Sentences ending with ?, LIWC dictionary words, Unique words, Words longer than 6 characters
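Most of the Standard Information measures can be approximated with a few lines of code. The sketch below assumes a naive sentence splitter (break after ., !, or ?) and a simple alphabetic tokenizer, so its numbers would not exactly match LIWC's own. The function and key names are made up for this example, and the "LIWC dictionary words" measure is omitted because it depends on the dictionary.

```python
import re


def standard_info(text):
    """Approximate several Standard Information measures for a text."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        return {"word_count": 0, "words_per_sentence": 0.0,
                "question_pct": 0.0, "unique_pct": 0.0, "long_word_pct": 0.0}
    questions = sum(1 for s in sentences if s.endswith("?"))
    long_words = sum(1 for w in words if len(w) > 6)
    return {
        "word_count": len(words),
        "words_per_sentence": len(words) / len(sentences),
        "question_pct": 100.0 * questions / len(sentences),
        # Unique words: distinct word types as a share of all tokens.
        "unique_pct": 100.0 * len(set(words)) / len(words),
        "long_word_pct": 100.0 * long_words / len(words),
    }


info = standard_info(
    "Sing, goddess, of the wanderer. Who returns? The wanderer returns!"
)
```

On this ten-word sample, seven distinct words give a unique-word share of 70%, and five words longer than six characters give a long-word share of 50%.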
Source of Error? • We had some trouble with the Psychological Processes group. • Several categories wouldn’t shut off, even after de-selecting them. • ??? • So we decided to run them too!
Processed Files • After you hit Process and choose where to save the .xls file, the results open as plain text within LIWC.
Results & Graphs • Total Word Counts: • Odyssey: 117643 • Aeneid: 101370 • Beowulf: 23726
Results & Graphs • Unique Words (%) • Odyssey: 5.76 • Aeneid: 8.54 • Beowulf: 15.76 • Words/Sentence • Odyssey: 37.43 • Aeneid: 32.74 • Beowulf: 25.19
Results & Graphs • Question Marks (% of Sentences) • Odyssey: 0.23 • Aeneid: 0.41 • Beowulf: 0.03 • Exclamation Marks (% of Sentences) • Odyssey: 0.01 • Aeneid: 0.1 • Beowulf: 0.72 • Metaphysical (% of Words) • Odyssey: 0.97 • Aeneid: 1.37 • Beowulf: 1.42
Findings • The Odyssey: • Is MASSIVE • Has the longest sentences • Has the lowest % of unique words • Has the lowest % of exclamations • Is the least interested in the Metaphysical.
Findings • The Aeneid: • Is also pretty big • Has a larger share of Metaphysical text than the Odyssey • Also isn’t interested in exclamations • Asks the most questions.
Findings • Beowulf: • Is pretty short (comparatively) • Has much shorter sentences • Is filled with many unique words • Asks few questions • Is the most interested in the Metaphysical • Is very excitable!!!!!