
Yoshikoder & General Inquirer

Explore how General Inquirer and Yoshikoder help analyze news content by categorizing words and identifying sentiments. Learn about using categories from Harvard IV-4 and Lasswell dictionaries for a comprehensive analysis.

leroyj




Presentation Transcript


  1. Yoshikoder & General Inquirer • Jonathan Simon • Elizabeth Langdon • COM 633, Fall 2010

  2. General Inquirer - Basics • The function of GI is to generate a count of words falling into various dictionary-supplied categories • Uses categories from the Harvard IV-4 dictionary and the Lasswell dictionary, as well as five categories based on the social cognition work of Semin and Fiedler • 182 categories in all • Each category is a list of words and word senses

  3. General Inquirer - Dictionaries • Examples of Harvard IV-4 categories: • Pstv: 1045 positive words, plus a subset of 557 words tagged Affil for words indicating affiliation or supportiveness • Ngtv: 1160 negative words, plus a subset of 833 words tagged Hostile for words indicating an attitude or concern with hostility or aggressiveness • Strong: 1902 words implying strength, plus a subset of 689 words tagged Power, indicating a concern with power, control or authority • Weak: 755 words implying weakness, plus a subset of 284 words tagged Submit, indicating submission to authority or power, dependence on others, vulnerability to others, or withdrawal

  4. General Inquirer - Dictionaries • Examples of Lasswell categories: • PowGain = 65 words about power increasing • PowLoss = 109 words about power decreasing • PowEnds = 30 words about the goals of the power process • PowAren = 53 words referring to political places and environments • PowCon = 228 words for ways of conflicting

  5. General Inquirer - Dictionaries • For names and basic descriptions of each category: http://www.wjh.harvard.edu/~inquirer/homecat.htm • For a list of all words contained in each of the 182 categories: http://www.webuse.umd.edu:9090/tags/

  6. General Inquirer - Dictionaries • Users CAN add new categories • Considerations for adding categories: • “Somewhat comparable to producing a set of survey questions that everyone agrees has validity in measuring a well-specified construct” • To map categories with accuracy requires attention to word use, word senses, and disambiguation routines

  7. General Inquirer – Application & Use • Purpose: Analyze content of news articles from three different sources • Articles are about the same Ted Strickland fundraiser • Sources include a newscast (via closed captioning) from WKYC, an online article from FOX8, and an online article from The Plain Dealer

  8. General Inquirer – Application & Use Beginning Screens:

  9. General Inquirer – Application & Use • Input: • Select the content you wish to analyze • Use plain text format (.txt) • Analyze a single file or multiple files at one time • To analyze multiple files simultaneously, save them to a directory (e.g. F:\NewsArticles) • In output, each file will have its own line of data within your Excel file (one row for single files, multiple rows for multiple files)
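
The single-file versus directory input described above can be sketched in Python. This is only an illustration of the "one row per file" idea, not how GI itself reads input; the function names `tokenize` and `rows_for_input` and the simple regular-expression tokenizer are assumptions made for this example.

```python
from pathlib import Path
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rows_for_input(path):
    """Return one (file name, word count) row per .txt file.

    `path` may point to a single .txt file or to a directory of them,
    mirroring the two input modes described on the slide.
    """
    target = Path(path)
    files = sorted(target.glob("*.txt")) if target.is_dir() else [target]
    return [(f.name, len(tokenize(f.read_text(encoding="utf-8")))) for f in files]
```

Passing a directory such as the slide's F:\NewsArticles would yield several rows, while passing a single file yields exactly one.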

  10. General Inquirer – Application & Use • Dictionary: • You will not need to change this! GI will analyze your content using all of its 182 categories • Output: • Specify where you want the data output to be saved, name the file and add the .xls extension

  11. General Inquirer – Application & Use • Tags: • Output is a matrix of counts and percentages of words falling into the dictionaries’ semantic categories • Format column includes r (raw count, or simple count of words) and s (scaled count, or percentage of words in each category) • Wordcount column is total number of words in the file • Leftovers column shows words not found in any dictionary
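
The relationship between the r and s rows, Wordcount, and Leftovers can be made concrete with a small Python sketch. The two-category `TOY_DICTIONARY` below is invented for illustration; the real GI dictionaries are far larger and distinguish word senses, which this sketch ignores. Leftovers is treated here as a count of tokens matched by no category, which is an assumption about the column's contents.

```python
import re

# Toy stand-in for the GI dictionaries (hypothetical word lists).
TOY_DICTIONARY = {
    "Pstv": {"support", "good", "win"},
    "Ngtv": {"attack", "bad", "lose"},
}


def tag_counts(text, dictionary=TOY_DICTIONARY):
    """Compute raw counts, scaled counts (percent of all words),
    the total word count, and the number of unmatched words."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    known = set().union(*dictionary.values())
    raw = {cat: sum(w in vocab for w in words) for cat, vocab in dictionary.items()}
    scaled = {cat: (100.0 * n / total if total else 0.0) for cat, n in raw.items()}
    leftovers = sum(w not in known for w in words)
    return {"r": raw, "s": scaled, "Wordcount": total, "Leftovers": leftovers}
```

For a ten-word sentence with two Pstv words, the raw count is 2 and the scaled count is 20.0, which is why scaled counts are the safer basis for comparing files of different lengths.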

  12. General Inquirer – Application & Use

  13. General Inquirer – Application & Use • Words: • Output is a count of all words appearing in your file • Rows are words, columns are file names
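
A words-by-files layout like the one described above could be built as follows. This is a minimal sketch rather than GI's own procedure; the function name `word_matrix` and the file names in the usage note are hypothetical.

```python
from collections import Counter
import re


def word_matrix(docs):
    """Build a word-by-file count table.

    `docs` maps file name -> text. Returns (columns, rows), where
    `columns` lists the file names and `rows` maps each word to its
    counts in column order.
    """
    counts = {
        name: Counter(re.findall(r"[a-z']+", text.lower()))
        for name, text in docs.items()
    }
    columns = list(docs)
    vocabulary = sorted(set().union(*counts.values()))
    rows = {word: [counts[name][word] for name in columns] for word in vocabulary}
    return columns, rows
```

Calling `word_matrix({"wkyc.txt": "jobs jobs schools", "fox8.txt": "jobs taxes"})` gives one row per word, with a zero wherever a word does not occur in a file.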

  14. General Inquirer – Application & Use

  15. General Inquirer – Results • Overall, the WKYC story can be viewed as more positive and affiliative than the FOX8 and Plain Dealer articles • The WKYC story showed the highest percentages in all positively valenced categories • The FOX8 or Plain Dealer article showed higher percentages in all negatively valenced categories • CATA / GI findings reflect the overall tone of the articles as experienced by readers (e.g. pulled quotes, emphasis on political / economic climates, etc.)

  16. General Inquirer – Results

  17. Yoshikoder- Basics • Yoshikoder provides a general word count, a custom dictionary word count, KWIC (keyword in context), and a reading highlight function • The program can handle multiple documents and analyze them individually or side by side • All dictionaries must be either custom built or downloaded from an external source – several dictionaries are available on the Yoshikoder website

  18. Yoshikoder- Dictionaries • Dictionaries consist of 2 levels: Categories and Patterns • Categories are concept words that fall into a larger construct • Patterns are individual words or phrases that fall into a category and are actually searched for • Yoshikoder dictionaries allow wild cards (*)
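
The category/pattern split and the `*` wildcard can be illustrated with a short Python sketch. It is not Yoshikoder's own matching code: the `ISSUES` dictionary is hypothetical, wildcard matching is borrowed from Python's `fnmatch` module, and only single-word patterns are handled (phrases are left out).

```python
from fnmatch import fnmatchcase
import re

# Hypothetical issue dictionary: category -> patterns that are searched for.
ISSUES = {
    "Jobs": ["job*", "employ*", "unemploy*"],
    "Education": ["school*", "teacher*", "educat*"],
}


def category_counts(text, dictionary=ISSUES):
    """Count the words in `text` that match any pattern of each category.

    A word matching several patterns in one category is counted once.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return {
        category: sum(any(fnmatchcase(w, p) for p in patterns) for w in words)
        for category, patterns in dictionary.items()
    }
```

A pattern such as `job*` picks up "job", "jobs", and "jobless" alike, so wildcards keep a dictionary short at the cost of occasionally matching unintended words.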

  19. Yoshikoder– Application & Use • Purpose: Analyze content of news articles from three different sources • Articles are about the same Ted Strickland fundraiser • Sources include a newscast (via closed captioning) from WKYC, an online article from FOX8, and an online article from The Plain Dealer • This analysis will identify which issues were most frequently mentioned in these stories, given a list of predetermined possible issues

  20. Yoshikoder– Application & Use Beginning Screen:

  21. Yoshikoder– Application & Use • Add Document: • Documents must be .txt files

  22. Yoshikoder– Application & Use Multiple Documents can be uploaded

  23. Yoshikoder – Building a Dictionary [screenshot with numbered callouts 1–4]

  24. Yoshikoder – Building a Dictionary [screenshot with numbered callouts 5–9]

  25. Yoshikoder – Building a Dictionary It is important to make sure that the proper level is highlighted when adding a category or pattern. Yoshikoder can stack categories within each other

  26. Yoshikoder – Import a Dictionary Pre-made or downloaded dictionaries can be imported

  27. Yoshikoder – Analysis • A Yoshikoder “concordance” is a KWIC analysis • Concordance > Make Concordance • Results can be exported to HTML or Excel
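
For readers unfamiliar with KWIC (keyword in context), the idea behind a concordance can be sketched in a few lines of Python. This is a generic illustration, not Yoshikoder's implementation; the function name `kwic` and the `window` parameter are assumptions made for this example.

```python
import re


def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each hit.

    `window` is the number of words kept on each side of the keyword.
    Matching is case-insensitive and on whole words only.
    """
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines
```

Each returned tuple corresponds to one concordance line, so the result can be written out as a three-column table.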

  28. Yoshikoder - Analysis • Report • Document Word Frequencies reports the frequencies of all words in an individual document • All Word Frequencies reports the frequencies of all words in all documents, sorted by document • Unified Word Frequencies reports the frequencies of all words in all selected documents

  29. Yoshikoder - Analysis • Report • Dictionary Report shows the frequencies of dictionary words, by category or pattern, for an individual document • A unified dictionary report exports the category frequencies to an Excel spreadsheet • Document Comparison will compare any two documents • Statistical Comparison Report will compare any two documents in terms of percent difference
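
One plausible reading of a percent-difference comparison between two documents is sketched below in Python. The slides do not say how Yoshikoder computes its Statistical Comparison Report, so this is an assumption for illustration only: each category count is converted to a share of the document's total words, and the two shares are subtracted. The function name `share_differences` is hypothetical.

```python
def share_differences(counts_a, total_a, counts_b, total_b):
    """Compare two documents category by category.

    `counts_a` and `counts_b` map category -> count; `total_a` and
    `total_b` are the documents' total word counts. Returns
    category -> (percent in A, percent in B, A minus B).
    """
    result = {}
    for category in sorted(set(counts_a) | set(counts_b)):
        share_a = 100.0 * counts_a.get(category, 0) / total_a if total_a else 0.0
        share_b = 100.0 * counts_b.get(category, 0) / total_b if total_b else 0.0
        result[category] = (share_a, share_b, share_a - share_b)
    return result
```

Working with shares rather than raw counts keeps a long newscast transcript from looking more issue-heavy than a short article simply because it contains more words.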

  30. Yoshikoder – Analysis Results

  31. Yoshikoder Results – Unified Dictionary Report

  32. Yoshikoder – Analysis Results The Channel 3 newscast contained more issue keywords than the FOX 8 and PD stories, with the biggest difference in focus being in education issues. The “Jobs” issue was the most frequently mentioned; however, it was emphasized more in the FOX 8 and PD stories than in Channel 3’s coverage. The remaining issue mentions were sporadic, with little overlap between the sources.
