Exploring Constructions in Texts: N-grams Analysis & Functionality in Language

This paper discusses the identification & functional contributions of constructions in literary classics and political speeches through N-grams analysis and construction grammar. Learn about the theoretical framework, methodological positioning, and discursive functionality of constructions in discourse.


Presentation Transcript


  1. Constructions in Wonderland: Exploring the functionality of constructions through N-grams. Functionality of Language | Sprogets Funktionalitet, Aalborg University, 10 December 2014. Kim Ebensgaard Jensen (Aalborg University), Yoshikata Shibuya (Kyoto University of Foreign Studies)

  2. Introduction: Is there a way to (semi-)automatically identify constructions in texts, discourses, and corpora? Could N-gram analysis and N-gram-based network analysis be ways to do that? How can (semi-)automatic identification of constructions help us learn about the functional contributions of constructions in discourse? To this end, we will analyze two literary classics and four political speeches.

  3. Introduction. Outline: Theoretical framework (positioning of paper; (very) basic principles of construction grammar; functionality of constructions). Method (data collection and methods; wordclouds; N-grams; network analysis). Analyses (wordclouds; N-grams; networks).

  4. Theoretical grounding

  5. Theoretical and methodological positioning of this paper. Theoretical positioning: construction grammar (e.g. Goldberg 1995; Croft 2001); cognitive poetics/stylistics (e.g. Stockwell 2002); cognitive discourse analysis (e.g. Hart 2013). Methodological positioning: corpus stylistics (e.g. Semino & Short 2004); corpus-aided discourse studies (e.g. Baker 2012); text mining (e.g. Miner et al. 2012).

  6. Constructions: In construction grammar, a construction is a functional unit that pairs form with semantic and/or discourse-pragmatic function (Goldberg 1995, 2006; Croft 2001, 2005; Hilpert 2014). Examples from English, given as [form]/[function] (cf. Langacker 1987): [S V IO DO]/[TRANSFER OF POSSESSION] (Goldberg 1995); [X BE so Y that Z]/[SCALAR CAUSATION] (Bergen & Binsted 2004); [you don't want me to V]/[THREATENING SPEECH ACT] (Martínez 2013); [to begin with]/[INTRODUCTION OF LIST OF ITEMS] (Lipka & Schmid 1994); [PROacc CLinf (or NP)]/[DISBELIEF TOWARDS PROPOSITION] (Lambrecht 1990), as in 'What, me worry?!' and 'Him a doctor?!'.

  7. Constructions: Constructions may be schematic, substantive (fixed), or something in between (Fillmore et al. 1988). Constructions may be atomic/simple, complex, or something in between, and they form a lexicon-syntax continuum (e.g. Goldberg 1995; Croft 2001). Language competence is an inventory of constructions (a.k.a. the construct-i-con) of varying degrees of abstraction, which are instantiated in language use (e.g. Goldberg 1995). In most contemporary incarnations of construction grammar, the construct-i-con is usage-based (e.g. Croft 2001). Constructions are subject to general human cognitive processes and principles, such that language is not a separate, autonomous cognitive faculty; thus, construction grammar is part of the overall endeavor of cognitive linguistics.

  8. The discursive and stylistic functionality of constructions: If constructions are functional units (pairings of form and meaning/function), then they logically must contribute to discourse as part of a speaker's linguistic repertoire. Here are two examples. Writers of fiction may use constructions in: descriptions of actions and happenings; characterizations (Culpeper 2009) and mind styles (Fowler 1977), by having characters use certain constructions in their dialog and narrative, or by using certain constructions in the descriptions of characters or of their actions; setting up the text-world and specifying temporal relations in the narrative; foregrounding, deviation, parallelism etc. (e.g. Short & Leech 2007). In political speeches, speakers may use constructions in: framing of issues and other ideologically based representations; organization of topics/issues; rhetorical strategies (including parallelisms etc.).

  9. Method

  10. Data collection and methods. Text data: Alice's Adventures in Wonderland by Lewis Carroll (1865), obtained via Project Gutenberg (= AW); Adventures of Huckleberry Finn by Mark Twain (1884), obtained via Project Gutenberg (= HF); inaugural speeches by US Presidents, obtained via Bartleby. Text mining and corpus-linguistic methods: wordclouds (R package ‘wordcloud’); N-grams (R package ‘tau’, AntConc); network analysis (R package ‘igraph’); concordances (AntConc).

  11. Wordclouds: Wordclouds are graphical representations of the lexical texture of a text, based on frequencies. They are visual versions of frequency lists, except that they do not report the actual frequency figures.
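Since a wordcloud is described as a visual version of a frequency list, the underlying list can be sketched in a few lines. This is a minimal Python illustration, not the R ‘wordcloud’ workflow named in the slides; the tokenization rule, the function name `frequency_list`, and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter

def frequency_list(text, top=10):
    """Return the `top` most frequent word types with their token counts."""
    # Assumption: lowercase the text and treat runs of letters/apostrophes as words.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top)

# Invented sample text, used only to show the shape of the output.
sample = "the cat said the hat and the cat sat"
print(frequency_list(sample, top=2))  # [('the', 3), ('cat', 2)]
```

Unlike the wordcloud, the list keeps the exact counts, which is the information the slide notes is missing from the visual form.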

  12. N-grams: An N-gram is a string of N adjacent words; the N-grams of interest here are those that recur frequently in a data set (such as a corpus or a text). N-grams are named according to the number of words in the string (N = number): monogram (1-gram) = one word, bigram (2-gram) = two words, trigram (3-gram) = three words, four-gram (4-gram) = four words, five-gram (5-gram) = five words, etc. Examples: 'damn you' (2-gram), 'what the hell' (3-gram), 'pick up the phone' (4-gram) (e.g. Stubbs 2009).

  13. N-grams: N-gram retrieval is a text mining technique used to identify N-grams in a data set, text, or corpus. How it works (in a nutshell): ask a computer to find strings of N words, and it returns a list of N-grams ranked by frequency. An example, 4-grams in all of Shakespeare's plays: (1) find all instances of word + word + word + word combinations in the collective body of Shakespeare's plays; (2) calculate the frequency of each combination; (3) list the combinations in order of frequency.
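The three retrieval steps (find, count, rank) can be sketched as follows. This is a plain-Python illustration under simplifying assumptions, not the R ‘tau’ or AntConc procedure used for the presentation; the helper names `tokenize` and `ngram_frequencies` and the sample string are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase, and treat runs of letters/apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())

def ngram_frequencies(tokens, n):
    """Steps 1 and 2: find every run of n adjacent words and count it."""
    grams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)

# Invented sample text, not a quotation from any of the analyzed works.
sample = "said the king said the queen said the king"
tokens = tokenize(sample)

# Step 3: rank the 2-grams by frequency, most frequent first.
ranked = ngram_frequencies(tokens, 2).most_common()
for gram, freq in ranked:
    print(" ".join(gram), freq)
```

Changing the second argument to 3, 4, or 5 gives the longer N-grams; the sliding window simply gets wider.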

  15. N-grams: N-gram analysis can be used for a wide range of things: identification of the "aboutness" of a text or discourse; phraseological analysis; probabilistic language modeling; analysis of aspects of style, genre and register; identification of various types of anonymous writers; identification of constructions and their functionality in a text or discourse; etc.

  16. Network analysis: Network analysis sets up data points as nodes in a network and calculates the strengths of association between them. As a text mining method, it represents word types (as opposed to tokens) in a text as nodes and, based on N-gram relations, sets up relations between the nodes, i.e. the words.
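A minimal sketch of such a network, assuming 2-gram relations and using raw co-occurrence frequency as the edge weight. The slides do not say which association measure was used, and the analysis itself was run with the R package ‘igraph’, so the measure, the function names, and the sample text here are all assumptions.

```python
import re
from collections import Counter, defaultdict

def bigram_network(text):
    """Nodes are word types; a directed edge links each word to the word after it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Edge weight = how often the two words occur as a 2-gram (an assumed measure).
    return Counter(zip(tokens, tokens[1:]))

def node_strength(edges):
    """Sum the weights of all edges touching each node."""
    strength = defaultdict(int)
    for (left, right), weight in edges.items():
        strength[left] += weight
        strength[right] += weight
    return dict(strength)

# Invented sample text.
edges = bigram_network("said the king said the queen")
strength = node_strength(edges)
print(edges[("said", "the")], strength["the"])
```

Because nodes are types rather than tokens, the two occurrences of 'said the' collapse into one edge of weight 2 instead of two separate links.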

  17. Analysis

  18. Wordclouds: a first look

  19-20. Wordclouds: [Figures: wordcloud of AW; wordcloud of HF] Visually attractive; informative to some extent; no details (e.g. frequency).

  21. N-grams in AW and HF

  22-30. N-grams in AW: [Tables: 2-grams, 3-grams, 4-grams] Patterns highlighted in the tables: speech / dialog; 'said'; definite NP. Construction identified: [DIALOG said NPdef/unique]/[TOPICALIZATION OF DIALOG].

  31-43. N-grams in HF: [Tables: 2-grams, 3-grams, 4-grams, 5-grams] Two different constructions, labeled 'productive' and 'less productive' on the slides: [it BEn't no X]; [there BEn't no X]. Event-relating constructions used to temporally structure events in the narrative: [X and then Y]/[EVENT1 FOLLOWED BY EVENT2]; [by and by X]/[EVENT HAPPENING AFTER SOME TIME].
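Once an N-gram such as 'by and by' has been flagged, a concordance shows what fills the open slot X. The slides list concordances (AntConc) among the methods; the following is only a rough keyword-in-context sketch in Python, with a hypothetical function name `kwic` and an invented sample sentence rather than a quotation from HF.

```python
import re

def kwic(text, phrase, width=3):
    """Return (left context, phrase, right context) for each hit of `phrase`."""
    # Assumption: lowercase, and treat runs of letters/apostrophes as words.
    tokens = re.findall(r"[a-z']+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    hits = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            hits.append((left, " ".join(target), right))
    return hits

# Invented sentence, used only to show the output format.
hits = kwic("We waited, and by and by the raft came along.", "by and by")
print(hits)  # [('we waited and', 'by and by', 'the raft came')]
```

The right-hand context is where the X of [by and by X] appears, so sorting the hits on that column is one way to inspect how the slot is filled.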

  44. N-gram analysis: pros and cons 1. Pro: Simple N-gram analysis can help us identify and address constructions and their functionality in a single text or discourse, as it identifies frequent combinations of words in the text or discourse in question. Con (problem #1): What simple N-gram analysis does not tell us is whether those frequent combinations of words can also be found in other texts. Our N-gram analyses of AW and HF do not tell us whether the findings are just general patterns in English or whether they actually characterize the texts. Solution to problem #1: To obtain a list of N-grams that really delineate a given text (so that we can identify which constructions are characteristically associated with it), a comparative analysis can be useful.

  45. N-grams: AW vs. HF (note that here we focus on 2-grams). Procedure: (1) 2-gram analysis of AW and HF; (2) normalization of frequencies for AW and HF, per 10,000 words; (3) comparison between the 2-grams of AW and those of HF with Fisher’s exact test.
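The normalization and comparison steps can be sketched as below. This is a standard-library Python illustration, not the authors' R code. It assumes that the test is run on raw counts arranged in a 2x2 table (occurrences of one 2-gram versus all other 2-gram tokens, in each text), with normalized rates used only for reporting; the slides do not spell out how the two steps were combined. All counts shown are hypothetical.

```python
from math import comb

def per_10k(count, corpus_size):
    """Normalize a raw count to a rate per 10,000 words."""
    return count * 10_000 / corpus_size

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, c: occurrences of the 2-gram in text 1 and in text 2
    b, d: all other 2-gram tokens in text 1 and in text 2
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of every table that shares the observed margins.
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum the tables that are no more probable than the observed one
    # (the small tolerance guards against floating-point ties).
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9)))

rate = per_10k(30, 20_000)                    # hypothetical count and text size
p_value = fisher_exact_two_sided(3, 1, 1, 3)  # tiny made-up table
print(rate, round(p_value, 4))                # 15.0 0.4857
```

A small p-value for a given 2-gram would suggest that its frequency differs between the two texts by more than chance, which is what makes it a candidate for a text-specific pattern rather than a general pattern of English.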

  46-47. N-grams: AW vs. HF

  48. Other texts: a quick look at inaugural speeches

  49. What about other texts? Political texts: inaugural speeches by US Presidents. Procedure: (1) 2-gram analysis of four inaugural speeches, by George Bush (GB), Bill Clinton (BC), George W. Bush (GWB), and Barack Obama (BO); (2) normalization, per 1,000 words; (3) Fisher’s exact test.

  50. Results
