COM 633: Content Analysis CATA

COM 633: Content Analysis CATA. Kimberly A. Neuendorf, Ph.D. Cleveland State University Fall 2010. COM 633 Fall 2010 CATA Presentations. Kate & Julie: LIWC & PCAD Jen & Diane: LIWC & MCCALite? Fran & Dongwoo: CATPAC & WordStat Jon & Elizabeth: Yoshikoder & General Inquirer


Presentation Transcript


  1. COM 633: Content Analysis CATA • Kimberly A. Neuendorf, Ph.D. • Cleveland State University • Fall 2010

  2. COM 633 Fall 2010 CATA Presentations • Kate & Julie: LIWC & PCAD • Jen & Diane: LIWC & MCCALite? • Fran & Dongwoo: CATPAC & WordStat • Jon & Elizabeth: Yoshikoder & General Inquirer • Joe: Diction

  3. CATA: Computer Aided Text Analysis • Why might you want to use CATA rather than traditional human-coding techniques? • CATA programs typically have been written by researchers with a specific need; thus, their utility is often limited. • Online search and acquisition opportunities have made CATA easier, more attractive (e.g., Nexis)

  4. Purposes of CATA • 1. Descriptive—e.g., word counts • Modell project, using: • VBPro, M. Mark Miller, 1980s software • 2. Coding of Open-ended Survey Responses • WordStat, SimStat adjunct program • (Provalis Research; Normand Peladeau)

  5. Purposes of CATA • Standard Dictionaries: most of the following applications use internal “standard” dictionaries • 3. Linguistic and Sociolinguistic Measures • General Inquirer, Philip Stone, 1966 • Harvard IV Dictionary • MCCALite, Don McTavish & Ellen Pirro • 116 “idea categories” are applied to multiple characters in a script • CATPAC, Joseph Woelfel • Semantic “neural” networks; no actual dictionary

  6. Purposes of CATA • 4. Psychometric Measures (or “Thematic Content Analysis”—Smith) • General Inquirer • e.g., Lasswell Values Dictionary • 5. Clinical Psychological/Psychiatric Diagnoses • PCAD, Louis Gottschalk & Robert Bechtel • Computer version of Gottschalk’s earlier human-coded schemes devised to provide alternative diagnostic techniques

  7. Purposes of CATA • 6. Verbal Style or Communicator Style • LIWC, Pennebaker, Booth, & Francis • e.g., positive emotions, cognitive processes • Also includes many linguistic measures and some that might be used as psychometrics • Diction, Rod Hart • Computer application of Hart’s earlier human-coded schemes aimed at measuring characteristics of political speech—e.g., aggression, cooperation, ambivalence

  8. Purposes of CATA • 7. Authorship Attribution • Most use simple counts of letters or words to attribute authorship (e.g., the Federalist papers; Raymond Chandler; Shakespeare) • Basic computer/word processing programming is sufficient
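
As a minimal sketch of the simple-count approach named on this slide, the Python below profiles each text by the rate of a few marker words and picks the known author whose profile is closest. The marker-word list, the sample sentences, and the names `profile` and `closest_author` are illustrative assumptions, not part of any program covered in this deck; real attribution work relies on much larger samples and word lists.

```python
import re
from collections import Counter

# Hypothetical marker words; a real study would choose these empirically.
FUNCTION_WORDS = ["the", "of", "upon", "while", "whilst", "by"]

def profile(text):
    """Return the rate per 1,000 words of each marker word in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return [1000 * counts[w] / total for w in FUNCTION_WORDS]

def closest_author(disputed, known):
    """known maps author name -> sample text; return the nearest author."""
    target = profile(disputed)

    def distance(author):
        # Sum of absolute differences between the two rate profiles.
        candidate = profile(known[author])
        return sum(abs(a - b) for a, b in zip(candidate, target))

    return min(known, key=distance)

known = {
    "A": "Upon the whole, the matter rests upon the will of the people.",
    "B": "Whilst the states agree, the power is held by the union whilst it lasts.",
}
guess = closest_author("The decision rests upon the judgment of the court.", known)
print(guess)
```

The distance used here is a plain sum of absolute differences; it is only one of many possible choices and is picked for readability.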

  9. Measurement in CATA • Three choices: • Custom Dictionaries: complicated, time-consuming • Standard Dictionaries: a task of matching one’s conceptualization to someone else’s operationalization, sometimes a scavenger hunt; similar to the challenge of finding an appropriate scale for a survey • “Emergent” Coding: outcome based on language patterns that emerge (e.g., CATPAC)
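
To make the custom-dictionary option concrete, here is a minimal Python sketch in which each category is a list of indicator words and a text is coded by counting matches. The categories, the word lists, the trailing `*` stem convention, and the function names are assumptions made for illustration; they do not reproduce the dictionary format of any specific CATA program.

```python
import re

# Hypothetical custom dictionary: each category lists its indicator words.
# A trailing "*" matches any word beginning with that stem.
CUSTOM_DICTIONARY = {
    "aggression": ["attack*", "fight", "destroy*"],
    "cooperation": ["together", "share*", "partner*"],
}

def matches(word, entry):
    """True if word matches a dictionary entry (exact or stem wildcard)."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry

def code_text(text, dictionary):
    """Count how many words in text fall into each dictionary category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = {category: 0 for category in dictionary}
    for word in words:
        for category, entries in dictionary.items():
            if any(matches(word, entry) for entry in entries):
                counts[category] += 1
    return counts

result = code_text(
    "We fight together and share the work; they attacked.",
    CUSTOM_DICTIONARY,
)
print(result)
```

Most of the effort in a custom dictionary goes into choosing and validating the word lists, not into the counting step shown here.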

  10. Quantitative CATA Programs

  11. Quantitative CATA Programs

  12. Validity and CATA • Validation part of development of CATA system (e.g., Lin et al., 2009—genres of online discussion threads) • Validation of thematic CA (psychometrics) against self-report—rare and uncertain (e.g., McClelland et al., 1992) • A comprehensive model for assessing content, external, and predictive validity when using CATA—Short, Broberg, Cogliser, Brigham (2010) as applied to “entrepreneurial orientation”: • Content validity—an inductive/deductive combo • External validity—use multiple sampling frames • Predictive validity—measure non-CATA variables that should relate

  13. Validity of Standard Dictionaries • Trusting the Standard Dictionary: an issue of face validity • Few CATA programs reveal the full dictionary lists (e.g., Diction, General Inquirer) • None reveal the full algorithm, including disambiguation (e.g., well, pot, leaves) • None account for negation • Construct and Criterion Validity • Rod Hart’s Diction: “normed” rather than validated • Gottschalk and Bechtel’s PCAD: validated against standard psychiatric diagnoses
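
The negation problem can be seen in a small Python sketch: a context-free count treats "not happy" as a positive hit, while a crude rule that checks the preceding word does not. The word sets and both function names are hypothetical, and the one-word look-back is a deliberately simple stand-in, not how any program in this deck works.

```python
import re

POSITIVE = {"happy", "good", "pleased"}  # hypothetical category words
NEGATORS = {"not", "no", "never"}        # hypothetical negation cues

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def naive_count(text):
    """Count every positive word, ignoring context."""
    return sum(1 for w in tokenize(text) if w in POSITIVE)

def negation_aware_count(text):
    """Skip a positive word when the word just before it is a negator."""
    words = tokenize(text)
    total = 0
    for i, w in enumerate(words):
        negated = i > 0 and words[i - 1] in NEGATORS
        if w in POSITIVE and not negated:
            total += 1
    return total

sentence = "I am not happy, and the service was never good."
print(naive_count(sentence), negation_aware_count(sentence))
```

A one-word look-back still misses constructions such as "not very happy", which is part of why negation remains a validity concern for dictionary-based counts.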

  14. Quantitative CATA Programs

  15. Yoshikoder

  16. About Yoshikoder • Created by Will Lowe at Harvard’s Department of Government • Can be downloaded free at www.yoshikoder.org • A cross-platform, multi-lingual CATA program • Must run one case at a time • Assumes the researcher will create dictionaries • Can import external dictionaries • Exports results into Excel

  17. Yoshikoder: KWIC and Concordance
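
For readers unfamiliar with KWIC (key word in context), the Python sketch below lists each occurrence of a keyword with a few words of context on either side. The function name `kwic`, the `window` parameter, and the sample text are assumptions for illustration; this is not Yoshikoder's implementation.

```python
import re

def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence."""
    words = re.findall(r"\w+", text)
    lines = []
    for i, w in enumerate(words):
        if w.lower() == keyword.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            lines.append((left, w, right))
    return lines

sample = "The court ruled today. Critics say the court ignored precedent."
concordance = kwic(sample, "court", window=2)
for left, key, right in concordance:
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>20} | {key} | {right}")
```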

  18. Yoshikoder: Dictionary Report

  19. WordStat

  20. About WordStat • Created by Normand Peladeau, as part of the SimStat suite for quantitative data analysis (a counterpart to SPSS) • Must be run as part of SimStat • Particularly suited to analyzing open-ended responses, because the data set typically includes both numeric and textual variables, which can immediately be cross-tabulated • The “standard” dictionaries that are included are incomplete and should be avoided • Also includes KWIC

  21. The WordStat Interface (within SimStat)

  22. Selection of Independent & Dependent Variables—Including Textual Variable

  23. Standard WordStat “Dictionaries”

  24. Breakdown of very limited WordStat “Dictionary”

  25. WordStat Output: Word counts

  26. WordStat Output: Dendrogram

  27. WordStat Output: Crosstab with bar graph

  28. WordStat Output: Crosstab and 3D representation

  29. WordStat Output: KWIC

  30. General Inquirer (PC/MAC version)

  31. About General Inquirer • Created by Philip Stone in the Department of Social Relations at Harvard in the 1960s—on mainframe for many years • The current version combines the "Harvard IV-4" dictionary content-analysis categories, the "Lasswell" dictionary content-analysis categories, and five categories based on the social cognition work of Semin and Fiedler, making for 182 categories (dictionaries) in all

  32. The General Inquirer (PC) Interface • Input and output files must be named • Two choices: Tags (application of dictionaries) & Words

  33. General Inquirer Output: Tags (data file that may easily be exported to Excel & SPSS) First row of each set is the ‘r’ (raw count) form of the output. This corresponds to frequencies. Second row of each set is the ‘s’ (scaled count) form of the output. This corresponds to percentages (of total).
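
The relationship between the two rows can be sketched in a few lines of Python: the scaled value is the raw count expressed as a percentage. This assumes the "total" is the number of words in the text; the counts below are made-up values, and `scaled_counts` is a hypothetical helper, not part of General Inquirer.

```python
def scaled_counts(raw_counts, total_words):
    """Convert raw category counts to percentages of all words in the text."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return {tag: 100 * n / total_words for tag, n in raw_counts.items()}

raw = {"Positiv": 12, "Negativ": 3}  # illustrative raw counts for one text
scaled = scaled_counts(raw, total_words=200)
print(scaled)
```

Scaled counts make texts of different lengths comparable, which raw frequencies alone do not.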

  34. General Inquirer Output: Words

  35. PCAD

  36. About PCAD • Developed by Gottschalk & Bechtel, using scales developed by Gottschalk & Gleser for human coding in the 1960s • Diagnostic: assesses one text at a time • Intended for naturally occurring speech or writing, minimum 80 words • Measures states of neuropsychiatric interest such as: • Anxiety • Hostility • Cognitive impairment • Depression • Schizophrenia • Achievement Strivings • Hope

  37. The PCAD Interface

  38. PCAD Interface-2

  39. PCAD Output: 4 Types (Clauses, Summaries, Analyses, Diagnoses)

  40. PCAD Output: Analyses

  41. PCAD Output: Diagnoses

  42. LIWC

  43. About LIWC • Created by Pennebaker, Booth, & Francis • “Looks at how people write & their state of mind” • Intended to measure both affective and cognitive constructs • 84 Output Variables (standard dictionaries): • 17 Standard linguistic dimensions (e.g., number of pronouns) • 25 Word categories (e.g., “psychological constructs – affect, cognition”) • 10 Time categories (e.g., “space, motion”) • 19 Personal concerns (e.g., “home”) • James W. Pennebaker & Martha E. Francis

  44. LIWC Dictionaries (dimensions) with sample words: http://www.liwc.net/descriptiontable1.php

  45. The LIWC Interface

  46. LIWC Output: Data Matrix (Each row is a case/text, each column a dictionary)
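
The layout described on this slide (one row per text, one column per dictionary) can be sketched in Python as follows. The dictionaries, the texts, the choice of percentages as cell values, and the helper names are all assumptions for illustration; the real LIWC dictionaries are far larger and are not reproduced here.

```python
import csv
import io
import re

# Hypothetical dictionaries and texts, for illustration only.
DICTIONARIES = {"posemo": {"happy", "love"}, "home": {"house", "kitchen"}}
TEXTS = {
    "case1": "I love my house and my kitchen.",
    "case2": "Happy people love company.",
}

def percent_in(words, vocabulary):
    """Percentage of words that belong to vocabulary (0.0 for empty text)."""
    if not words:
        return 0.0
    return 100 * sum(w in vocabulary for w in words) / len(words)

def build_matrix(texts, dictionaries):
    """Return a header row plus one data row per text."""
    columns = list(dictionaries)
    rows = []
    for case_id, text in texts.items():
        words = re.findall(r"[a-z']+", text.lower())
        scores = [round(percent_in(words, dictionaries[c]), 2) for c in columns]
        rows.append([case_id] + scores)
    return ["case"] + columns, rows

header, rows = build_matrix(TEXTS, DICTIONARIES)

# Write the matrix as CSV text, a format spreadsheets and stats packages can read.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)
writer.writerows(rows)
print(out.getvalue())
```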

  47. Diction

  48. About Diction • Created by Roderick P. Hart, University of Texas, originally for the purpose of analyzing political discourse • To measure “semantic features”, uses a series of 31 standard dictionaries and five “Master Variables” (scales constituted of combinations of the 31): • Activity • Optimism • Certainty • Realism • Commonality • Users can create custom dictionaries in addition to standard dictionaries. • The program can accept individual or multiple passages.

  49. The Diction Interface
