Computer-Aided Text Analysis: Tips and Techniques Jeremy Short Jerry S. Rawls College of Business Administration Texas Tech University
Two Views on Words “Words are the voice of the heart.” Confucius (China's most famous teacher, philosopher, and political theorist, 551-479 BC) “I was reading the dictionary. I thought it was a poem about everything.” Steven Wright (comedian)
What Is Content Analysis? Content analysis is a research method that uses a set of procedures to classify or categorize communication. It offers an unobtrusive way to gather attributions, cognitions, or other organizational projections. Commonly analyzed communications include shareholder letters and organizational mission statements.
Examples of Content Analysis Short, J.C., & Palmer, T.B. (2003). Organizational performance referents: An empirical examination of their content and influences. Organizational Behavior and Human Decision Processes, 90: 209-224. Palmer, T.B., & Short, J.C. (2008). Mission statements in U.S. colleges of business: An empirical examination of their content with linkages to configurations and performance. Academy of Management Learning and Education, 7: 454-470.
Computer-Aided Text Analysis Computer-Aided Text Analysis (CATA) is a specific form of content analysis. CATA often proceeds with the assumption that word choices provide valuable information in the context of a particular organizational narrative. CATA is advantageous because it can process hundreds of documents quickly with extremely high reliability. Despite these benefits, fewer than 25% of the content analysis studies analyzed by Duriau, Reger, & Pfarrer (2007 ORM) used CATA.
Confusion about Content Analysis Krippendorff (2004) suggests, ‘‘Deductive and inductive inferences are not central to content analysis’’ (p. 36). Neuendorf (2002) suggests: You may use standard dictionaries (e.g., those in Hart’s program DICTION) or originally created dictionaries. When creating original dictionaries, be sure to first generate a frequency list from your text sample, and examine for key words and phrases. (p. 50) Management researchers using content analytic methods generally incorporate both deductive and inductive approaches when using content analysis (Doucet & Jehn, 1997; L. Doucet, B. Kabanoff, & T. Pollock, personal communications, December 14, 2008)
Construct Validation Using CATA Pretty much all of my thoughts today are ‘best practices’ lifted from the following: Short, J.C., Broberg, J.C., Cogliser, C.C., & Brigham, K. (2009). Construct validation using computer-aided text analysis (CATA): An illustration using entrepreneurial orientation. Organizational Research Methods. doi:10.1177/1094428109335949
Content Validity Content validity assesses the match between a construct's theoretical definition and its empirical measurement (Nunnally & Bernstein, 1994). Our approach to CATA relies on single words as the unit of analysis. Key steps involve content validity, external validity, reliability, assessment of dimensionality, and predictive validity. I illustrate our approach using the construct of entrepreneurial orientation.
Deductive Content Validity
1. Create a working definition of the construct of interest, using a priori theory when possible. We begin with the formal definition of entrepreneurial orientation offered by Lumpkin and Dess (1996 AMR), who define the construct as the “processes, practices, and decision-making activities that lead to new entry” (p. 136).
2. Conduct an initial assessment of construct dimensionality (autonomy, competitive aggressiveness, innovativeness, proactiveness, and risk taking).
3. Develop exhaustive list(s) of key words.
4. Validate the word lists with content experts and assess interrater reliability.
Validated Word List for Autonomy At-liberty, authority, authorization, autonomic, autonomous, autonomy, decontrol, deregulation, distinct, do-it-yourself, emancipation, free, freedom, freethinking, independence, independent, liberty, license, on-one’s-own, prerogative, self-directed, self-directing, self-direction, self-rule, self-ruling, separate, sovereign, sovereignty, unaffiliated, unattached, unconfined, unconnected, unfettered, unforced, ungoverned, unregulated
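Once a word list like this is validated, scoring a document comes down to counting how often the listed words appear. The following is a minimal sketch of that idea, not a reproduction of how DICTION matches words. The tokenizer, the abbreviated word set, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

# Abbreviated, illustrative subset of the autonomy word list above.
AUTONOMY_WORDS = {
    "autonomy", "autonomous", "freedom", "independent",
    "independence", "self-directed", "unfettered",
}

# Assumed tokenizer: runs of letters, keeping internal hyphens/apostrophes
# so that entries such as "self-directed" stay in one piece.
TOKEN_RE = re.compile(r"[a-z]+(?:[-'][a-z]+)*")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def count_dictionary_hits(text, dictionary):
    """Return (Counter of matched words, total number of tokens)."""
    tokens = tokenize(text)
    hits = Counter(t for t in tokens if t in dictionary)
    return hits, len(tokens)


# Hypothetical snippet standing in for a shareholder letter.
letter = (
    "Our independent business units enjoy the freedom to pursue "
    "self-directed growth. Independent thinking matters."
)
hits, n_tokens = count_dictionary_hits(letter, AUTONOMY_WORDS)
print(dict(hits), sum(hits.values()), n_tokens)
```

Returning the token total alongside the hits makes it easy to express the count per 100 or per 1,000 words when documents differ in length.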
Interrater Reliability Holsti’s (1969) method for assessing interrater reliability uses the formula PAO = 2A / (nA + nB), where PAO is the proportion of agreement observed, A is the number of agreements between the two raters, and nA and nB are the numbers of words coded by each rater. Although there is no generally accepted ‘‘rule of thumb’’ for interrater reliability coefficients analogous to the .70 heuristic for coefficient alpha, Riffe, Lacy, and Fico (2005) and Krippendorff (2004) suggest treating values greater than .80 as acceptable.
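Holsti's formula is simple enough to check by hand, but a small helper avoids arithmetic slips when several word lists are being rated. This sketch assumes the agreement count and each rater's total have already been tallied; the function name and the example numbers are hypothetical.

```python
def holsti_pao(n_agreements, n_coded_a, n_coded_b):
    """Holsti's proportion of agreement observed: 2A / (nA + nB)."""
    total = n_coded_a + n_coded_b
    if total == 0:
        raise ValueError("At least one rater must have coded a word.")
    return 2 * n_agreements / total


# Hypothetical tallies: rater A kept 40 words, rater B kept 44,
# and the two raters agreed on 36 of them.
pao = holsti_pao(36, 40, 44)
print(round(pao, 3))  # 2 * 36 / 84 -> 0.857
```

With these made-up tallies the coefficient clears the .80 level mentioned above.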
Inductive Content Analysis
1. Identify commonly used words from narrative text of interest using DICTION or other CATA software.
2. Identify or create a working definition of the construct of interest to guide word selection.
3. Identify words that match the construct of interest.
4. Establish initial interrater reliability.
5. Refine and finalize word lists.
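Step 1 of the inductive approach amounts to building a frequency list from the sample texts. DICTION or other CATA software would normally do this; the sketch below shows the same idea in plain Python. The function name, the stopword set, the minimum-count cutoff, and the two sample sentences are assumptions for illustration.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z]+(?:[-'][a-z]+)*")


def frequency_list(texts, min_count=2, stopwords=frozenset()):
    """Return (word, count) pairs, most frequent first.

    Words in `stopwords` and words seen fewer than `min_count`
    times across all texts are left out.
    """
    counts = Counter()
    for text in texts:
        counts.update(
            w for w in TOKEN_RE.findall(text.lower()) if w not in stopwords
        )
    return [(w, c) for w, c in counts.most_common() if c >= min_count]


# Hypothetical two-document sample.
sample = [
    "We take bold risks and innovate.",
    "Bold new products; we innovate and we grow.",
]
candidates = frequency_list(sample, min_count=2, stopwords={"we", "and"})
print(candidates)
```

The resulting list is only a pool of candidates. Raters still decide which words match the construct definition in steps 2 through 5.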
External Validity
1. Select appropriate samples and relevant narrative texts to examine the construct of interest (we chose shareholder letters since entrepreneurial orientation has been conceptualized as a firm-level construct).
2. Compare two relevant samples when possible (we compare the S&P 500 with high-growth firms from the Russell 2000).
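The slide does not say which statistic to use for the two-sample comparison. One common choice, used here purely as an assumption, is Welch's t statistic on the per-document dictionary scores from each sample. The scores below are invented, and a real analysis would also need degrees of freedom and a p-value from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / se


# Hypothetical word-count scores for one dimension in each sample.
large_firms = [1, 2, 3, 4, 5]
high_growth_firms = [2, 4, 6, 8, 10]
t_stat = welch_t(large_firms, high_growth_firms)
print(round(t_stat, 3))
```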
Assessment of Validity (table not reproduced; * p < .01)
Reliability Ensure reliability by analyzing texts using a computer-aided technique. We relied on DICTION 5.0 (Hart, 2000).
Dimensionality Assess construct dimensionality using visual inspection of the correlation matrix. If dimensions are uncorrelated, they might be assessing different constructs, and the dimensions might exhibit problems of convergent validity. If dimensions are correlated above .5, the construct may not be multidimensional. If dimensions are too highly correlated, consider collapsing subdimensions to form a single measure (or fewer subdimensions).
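Visual inspection works for a handful of dimensions, and a short script can list the pairwise correlations and flag those above the .5 level mentioned above. This is a sketch with a hand-written Pearson correlation. The function names, the dimension scores, and the decision to flag only positive correlations above the threshold are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(scores, threshold=0.5):
    """Return (dim_a, dim_b, r, flagged) for every pair of dimensions."""
    names = sorted(scores)
    rows = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(scores[a], scores[b])
            rows.append((a, b, r, r > threshold))
    return rows


# Hypothetical per-firm scores on three dimensions.
scores = {
    "innovativeness": [1, 2, 3, 4, 5],
    "proactiveness": [2, 4, 6, 8, 10],
    "risk_taking": [5, 3, 4, 1, 2],
}
for a, b, r, flagged in pairwise_correlations(scores):
    print(f"{a} / {b}: r = {r:.2f}{' (above threshold)' if flagged else ''}")
```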
Predictive Validity Examine the measure's ability to predict theoretically related variables not captured via content analysis, using regression or structural equation modeling.
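As the simplest possible illustration of the regression route, the sketch below fits a one-predictor ordinary least squares line relating a CATA score to an outcome variable. Real studies would use a statistics package with control variables and significance tests. The function name and the numbers are hypothetical.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, r_squared). Assumes x and y have the
    same length and that neither is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data: a CATA-based score and a later performance measure.
cata_score = [1, 2, 3, 4]
performance = [3, 5, 7, 9]
slope, intercept, r_squared = ols_fit(cata_score, performance)
print(slope, intercept, r_squared)
```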
CATA Using DICTION Short, J.C., & Palmer, T.B. (2008). The application of DICTION to content analysis research in strategic management. Organizational Research Methods, 11: 727-752. The DICTION software package (Hart, 2000) contains 31 predefined dictionaries with more than 10,000 search words that can be used to analyze any given text. Grounded in linguistic theory (Bligh et al., 2004), the dictionaries were developed from a number of different types of narrative texts, including business texts such as annual reports, mission statements, and CEO speeches.
DICTION Master Variables
1. Certainty - language that indicates resoluteness, inflexibility, completeness, and a tendency to speak with authority.
2. Optimism - language endorsing some person, group, concept, or event.
3. Activity - language featuring movement, change, and implementation of ideas and the avoidance of inertia.
4. Realism - language describing tangible, immediate, recognizable matters.
5. Commonality - an approximation of the communitarian concepts found in the work of Etzioni (1993); this variable examines language that highlights agreed-on values of a group and rejects idiosyncratic modes of engagement.
Optimism Score
Formula: [Praise + Satisfaction + Inspiration] - [Blame + Hardship + Denial]
PRAISE: Affirmations of some person, group, or abstract entity. For example, terms isolating important social qualities (dear, delightful, witty), entrepreneurial qualities (successful, conscientious, renowned), and moral qualities (faithful, good, noble).
SATISFACTION: Terms associated with positive affective states (cheerful, passionate, happiness), with moments of undiminished joy (thanks, smile, welcome), and with moments of triumph (celebrating, pride).
INSPIRATION: Abstract virtues deserving of universal respect. Most of the terms in this dictionary are nouns isolating desirable moral qualities (faith, honesty, self-sacrifice, virtue) as well as attractive personal qualities (courage, dedication, wisdom, mercy).
BLAME: Terms designating social inappropriateness (mean, naive, sloppy, stupid) as well as downright evil (fascist, blood-thirsty, repugnant, malicious).
HARDSHIP: Terms for natural disasters (earthquake, starvation, tornado, pollution), hostile actions (killers, bankruptcy, enemies, vices), and censurable human behavior (infidelity, despots, betrayal).
DENIAL: A dictionary consisting of negative function words (nor, not, nay).
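The Optimism formula is a difference of two sums, which the sketch below spells out. It assumes the six dictionary subscores for a text are already available (for example, exported from DICTION). Any standardization DICTION applies before combining them is not modeled here, and the input values are invented.

```python
POSITIVE = ("praise", "satisfaction", "inspiration")
NEGATIVE = ("blame", "hardship", "denial")


def optimism_score(subscores):
    """[Praise + Satisfaction + Inspiration] - [Blame + Hardship + Denial]."""
    positive = sum(subscores[name] for name in POSITIVE)
    negative = sum(subscores[name] for name in NEGATIVE)
    return positive - negative


# Hypothetical subscores for a single shareholder letter.
letter_subscores = {
    "praise": 6.0, "satisfaction": 3.5, "inspiration": 2.5,
    "blame": 1.0, "hardship": 2.0, "denial": 4.0,
}
print(optimism_score(letter_subscores))  # 12.0 - 7.0 -> 5.0
```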
DICTION Calculated Variables
1. Insistence - is based on the use of repeated words. The assumption is that repetition of key terms indicates a preference for a limited, ordered world.
2. Embellishment - is the ratio of adjectives to verbs and is based on David Boder’s (1940) conception that heavy modification ‘‘slows down’’ a verbal passage by de-emphasizing human and material action.
3. Variety - conforms to Johnson’s (1946) Type-Token Ratio, which divides the number of different words in a passage by the passage’s total words. A high score indicates a speaker’s avoidance of overstatement and a preference for precise statements.
4. Complexity - is a measure of the average number of characters per word in a given input file. Complexity is based on the notion that convoluted phrasings make a text’s ideas and implications unclear.
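Variety and Complexity are defined precisely enough above to sketch directly: a type-token ratio and an average word length. The tokenizer, the case folding for Variety, and the choice to count hyphens and apostrophes as characters are assumptions, so results may differ from DICTION's own output. Insistence and Embellishment are left out because they need details (the repetition formula, part-of-speech tagging) not given here.

```python
import re

WORD_RE = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")


def _words(text):
    """Split text into word tokens; raise if there are none."""
    tokens = WORD_RE.findall(text)
    if not tokens:
        raise ValueError("Text contains no words.")
    return tokens


def variety(text):
    """Type-token ratio: distinct words divided by total words."""
    tokens = [w.lower() for w in _words(text)]
    return len(set(tokens)) / len(tokens)


def complexity(text):
    """Average number of characters per word."""
    tokens = _words(text)
    return sum(len(w) for w in tokens) / len(tokens)


passage = "We grow and we win"
print(variety(passage))     # 4 distinct words / 5 total
print(complexity(passage))  # 14 characters / 5 words
```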