
Catpac & WordStat

Learn about Catpac and WordStat, two powerful software analysis tools for examining text and analyzing consumer behavior. Catpac recognizes word frequency and co-occurrence, while WordStat offers hierarchical categorization and statistical analysis.

shawnaj

Presentation Transcript


  1. Catpac & WordStat Dongwoo Kim & Fran Stewart COM633, Fall 2010

  2. Catpac: The Basics • Originally created by Joseph Woelfel to examine consumer behavior and marketing. • Presented as part of the Galileo package of software analysis tools. • Billed as a “self-organizing artificial neural network” optimized for examining text.

  3. Catpac: What It Does • Recognizes frequency of words used in text. • Focuses on co-occurrence of words – words that appear near each other in context. • Uses cluster analysis to display word co-occurrence. • Incorporates ThoughtView’s perceptual mapping and Oresme’s interactive clustering.

  4. Catpac: How Does It Work? • Catpac moves a window of n words through the text. For example, if the window size selected is 7 words, then Catpac will systematically scan words 1-7, then words 2-8, 3-9 and so on until it completes the document. • Words appearing in the window then activate the neurons representing them. Connections among activated neurons allow Catpac to associate words that appear close together within the text.
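The sliding-window scan described on this slide can be sketched in a few lines of Python. This is only an illustration of the windowing idea using plain pair counts; it is not Catpac's neural network, and the function name `window_cooccurrence` and the sample sentence are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def window_cooccurrence(words, window_size=7):
    """Count how often each pair of distinct words shares a sliding window."""
    pair_counts = Counter()
    # Slide over words 1-7, then 2-8, 3-9, ... until the text is exhausted.
    last_start = max(len(words) - window_size, 0)
    for start in range(last_start + 1):
        window = words[start:start + window_size]
        # Each unordered pair of distinct words in the window scores once.
        for a, b in combinations(sorted(set(window)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical sample text, scanned with a 4-word window.
text = "reliability of data depends on coders and coders produce data"
counts = window_cooccurrence(text.lower().split(), window_size=4)
print(counts.most_common(3))
```

Words that keep landing in the same window accumulate higher pair counts, which is the kind of association a cluster analysis can then work from.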

  5. Getting Started • Catpac can only be used on ASCII text files, so Word documents will need to be converted to .txt files. • The simplest analysis is a “dendogram,” according to the Galileo manual. • A dendrogram is “a branching diagram representing a hierarchy of categories based on degree of similarity or number of shared characteristics especially in biological taxonomy.” – Merriam-Webster • Dendron is Greek for tree.

  6. Step 1: Convert to .txt file

  7. Step 2: Input .txt file in Catpac.

  8. Step 3: Select file to be analyzed.

  9. Step 4: Make a Dendrogram. (Note the spelling error.)

  10. This is what you will see ... 50 most frequent words 25 most frequent words

  11. … after you exclude common words. (This seems a bit clunky given what the program purports to do.)

  12. Results • Of the 738 total words in Klaus Krippendorff’s article on “Testing the Reliability of Content Analysis Data: What Is Involved and Why?”, the most frequently used word in the text was data. It appears 84 times, accounting for 11.4 percent of all words used. • Reliability was the next most often-used word, accounting for more than 8 percent of the total words used.
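The percentage on this slide is simple arithmetic: 84 occurrences out of 738 words is about 11.4 percent. A minimal sketch of that kind of frequency count follows. The stop list and the sample sentence are hypothetical, and the sketch computes shares over the words left after exclusion, which is an assumption about how the totals were counted.

```python
from collections import Counter
import re

# Hypothetical stop list; Catpac's own exclude file is not reproduced here.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "for"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Return (word, count, percent of counted words), most frequent first."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in stop_words]
    total = len(words)
    counts = Counter(words)
    return [(w, n, round(100 * n / total, 1)) for w, n in counts.most_common()]


sample = "The reliability of data is the reliability of the coders and the data."
for word, count, percent in word_frequencies(sample):
    print(word, count, percent)

# The share reported on the slide: 84 occurrences out of 738 words.
print(round(100 * 84 / 738, 1))  # 11.4
```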

  13. Compare that to … His 1,368-word discussion of “Computing Krippendorff’s Alpha-Reliability,” where data was the most frequently used word (excluding common articles and prepositions). The word appears 65 times, followed closely by reliability and observers.

  14. Dendrograms Ward’s Method Centroid

  15. Examining Word Clusters • The Oresme interactive clustering function allows for examining concepts that are associated with each other. • “Cycle Input” tells which concepts are activated by a selected concept. • “Cycle Output” “cycles the network output window back into itself.” • Huh? • “Instead of ‘thinking’ about the concepts you originally gave it, it is thinking about the concepts generated by the concepts you originally gave it.”

  16. This is what it looks like … Cycle Input Cycle Output The manual makes note of what some analysts call the “Buddhist monk syndrome,” where “after sufficient contemplation, it appears that all things are one.”

  17. To map these cluster concepts … • First save as a crud file (.crd). • “Select Open from the ThoughtView File menu.” • Wait, where the heck is ThoughtView? (CRD files extract coordinate information from the dendrograms.)

  18. 2D mapping of concept clusters Note the tight grouping of words like reliability, data and coders on the right.

  19. 3D mapping of concept clusters

  20. 3D mapping allows for rotation …

  21. Now for a demonstration …

  22. WordStat

  23. WordStat is… • Content analysis module of SimStat. • Designed to analyze textual information (open-ended responses, interview transcripts, journal articles, news stories, websites, etc.) • Used both for automatic categorization of text using a dictionary and for manual coding.

  24. WordStat has… • Integrated text-mining analysis and visualization tools. • Hierarchical categorization dictionary or user-generated dictionary. • Keyword-in-context (KWIC) and keyword retrieval tools. • Capability of statistical analyses (factor analysis, word frequencies, etc.).

  25. Getting Started • First open SimStat because WordStat must be run as part of the SimStat program. • Build your own dictionary because WordStat’s standard dictionaries are lacking. • Run spell-check on the text to be analyzed because misspelled words may be left uncoded. • Select text-type file (Text, MS Word, HTML, Excel, SPSS files)

  26. Example Study • Sense of humor study data (N=288, including 52 cases with missing data) • Open-ended responses (Q: instances of sense of superiority in humor) • Demographic information (gender, ethnic background and political philosophy) and sense of humor

  27. How to get WordStat • Free trial version on the website: http://www.provalisresearch.com/wordstat/WordStatDownload.html • Dictionary: http://www.provalisresearch.com/wordstat/RID.html

  28. How to use WordStat • Create or import an existing dataset

  29. How to use WordStat • Create or import an existing dataset

  30. How to create dictionary • Add categories and words

  31. Dictionary for example study
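Dictionary-based coding of this kind can be illustrated with a small Python sketch. The category names below follow the report slides, but the word lists, the `categorize` function and the sample response are invented for illustration; this is not WordStat's dictionary format or its matching logic.

```python
import re
from collections import Counter

# Hypothetical dictionary: category names follow the report slides,
# but the word lists are invented for illustration.
DICTIONARY = {
    "Family": {"mother", "father", "brother", "sister", "family"},
    "Politics": {"president", "election", "democrat", "republican"},
    "Religion": {"church", "priest", "god"},
}


def categorize(response, dictionary=DICTIONARY):
    """Count how many words of a response fall into each category."""
    counts = Counter()
    for word in re.findall(r"[a-z']+", response.lower()):
        for category, entries in dictionary.items():
            if word in entries:
                counts[category] += 1
    return counts


print(categorize("My brother joked about the president at church."))
```

Exact matching like this also shows why spell-checking matters: a misspelled word matches no entry and goes uncoded.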

  32. Results • Frequencies

  33. Results • Frequencies - chart

  34. Results • Frequencies – dendrogram, concept map

  35. Results • Crosstab word count - gender

  36. Results • Crosstab word count – political tendency

  37. Results • Crosstab word count – ethnicity

  38. Results • Crosstab word count – combination
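A crosstab of coded words by a demographic variable can be sketched as follows. The coded pairs are hypothetical, not the study's data, and the sketch stops at counts and column percentages; it does not run the significance tests reported later.

```python
from collections import Counter, defaultdict

# Hypothetical coded data: one (group, category) pair per coded word.
coded = [
    ("F", "Family"), ("F", "Family"), ("F", "Politics"),
    ("M", "Family"), ("M", "Politics"), ("M", "Politics"), ("M", "Politics"),
]


def crosstab(pairs):
    """Return {group: Counter(category -> count)}."""
    table = defaultdict(Counter)
    for group, category in pairs:
        table[group][category] += 1
    return table


def column_percent(table):
    """Convert each group's counts to percentages of that group's total."""
    return {
        group: {cat: round(100 * n / sum(counts.values()), 1)
                for cat, n in counts.items()}
        for group, counts in table.items()
    }


table = crosstab(coded)
print(column_percent(table))
```

Column percentages make groups of different sizes comparable, which matters when, as here, the groups do not contain the same number of coded words.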

  39. Results • KWIC (Keyword-in-Context)
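A keyword-in-context listing shows each hit with a few words on either side. The sketch below is a minimal illustration, assuming whitespace tokenization and a hypothetical `kwic` helper; WordStat's own KWIC tool offers more options than this.

```python
def kwic(text, keyword, width=3):
    """Return (left context, keyword as found, right context) for each hit."""
    tokens = text.split()
    hits = []
    for i, token in enumerate(tokens):
        # Compare case-insensitively and ignore surrounding punctuation.
        if token.strip(".,;:!?\"'()").lower() == keyword.lower():
            left = " ".join(tokens[max(i - width, 0):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Hypothetical sample text.
sample = "Reliable data require careful coders. The data were coded twice."
for left, word, right in kwic(sample, "data", width=2):
    print(f"{left:>15} [{word}] {right}")
```

Reading the hits in context is one way to spot ambiguous words that a dictionary would otherwise overcount.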

  40. Reports • Overall: Humor > Race > Family > Politics > Religion • Gender (M: 105, F: 131): Women used more Family (p<.05), less Politics (n.s.) (Charts: count, column percent)

  41. Reports • Ethnic background (W: 159, NW: 67) • White people used more Humor (p<.01), less Religion (n.s.) • Political philosophy (S Consv: 13, Consv: 30, Mid: 64, Libr: 63, S Libr: 38, No Comment: 28) (Charts: column percent)

  42. Limitations • Incomplete dictionary • Overestimation: ambiguous words, overlapping • Underestimation: misspellings, odd expressions • Categorization: obscurations, incongruities

  43. More? Q & A
