
Corpus (2)


hafwen

Presentation Transcript


  1. Corpus (2)

  2. Games • Look at your watch and time how long it takes you to recognise each of the scrambled words and word groups below: • mttaer • slpeling • it deosn’t mttaer • this is bcuseae

  3. Will you say ‘Yes’? • To my 'selected' strange-minded friends: If you can read the following paragraph, reply it with a "Yes". • Only great minds can read this. This is weird, but interesting! fi yuo cna raed tihs, yuo hvae a sgtrane mnid too. Cna yuo raed tihs? Olny 55 plepoe out of 100 can. i cdnuolt blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg. The phaonmneal pweor of the hmuan mnid, aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it dseno't mtaetr in waht oerdr the ltteres in a wrod are, the olny iproamtnt tihng is taht the frist and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it whotuit a pboerlm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Azanmig huh? yaeh and I awlyas tghuhot slpeling was ipmorantt! if you can raed tihs forwrad it. • Reply the topic with a "YES" if you are within the 55 ^_^

  4. Meaning is created • According to a researcher at Cambridge University, it doesn’t matter in what order the letters in a word are, the only important thing is that the first and last letter be in the right place. The rest can be a total mess and you can still read it without a problem. This is because the human mind does not read every letter by itself, but the word as a whole.

  5. Meaning is created • The human mind does not read every word in a clause by itself, but the co-selection of words as a whole. • Words do not create meaning in isolation. They enter into meaningful relations with the words around them. Some of these relations we can observe directly, e.g. compounds, idioms, phrasal verbs, jargon expressions and fixed phrases. But there are many others that we can only observe with a computer. • Meaning is created by the co-selection of words, i.e. patterns of co-selection among words have a direct connection with meaning.

  6. Five categories of co-selection • Sinclair (2004) put forward a model of five categories of co-selection as components of a lexical item: • Obligatory: 1. the core: “invariable, and constitutes the evidence of the occurrence of the lexical item as a whole” (Sinclair 2004:141); 2. semantic prosody • Optional: 3. collocation (Firth 1951, 1957); 4. colligation (Firth 1951, 1957); 5. semantic preference • i.e. the lexical item = semantic prosody + the core (+ semantic preference) (+ collocation) (+ colligation)

  7. How to use a corpus • 2. Read for formal patterns: Collocation • “You shall know a word by the company it keeps” (Firth 1957:179) • “We may use the term node to refer to an item whose collocations we are studying, and we may define a span as the number of lexical items on each side of a node that we consider relevant to that node. Items in the environment set by the span we will call collocates.” (Sinclair 1966:415) • Collocates are the words which occur in the neighbourhood of your search word.
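The node / span / collocate terminology above can be sketched in a few lines of Python. This is only an illustration: the function name `collocates`, the sample sentence and the span of 2 are assumptions made here, and every token is counted towards the span, whereas Sinclair's definition counts lexical items.

```python
from collections import Counter

def collocates(tokens, node, span=4):
    """Count the words found within `span` tokens on either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]      # up to `span` tokens before the node
        right = tokens[i + 1:i + 1 + span]     # up to `span` tokens after the node
        counts.update(left + right)
    return counts

# Invented sample text, whitespace-tokenised for simplicity
text = "please post the letter today the letter needs a stamp and an envelope"
tokens = text.split()
letter_collocates = collocates(tokens, "letter", span=2)
print(letter_collocates.most_common(5))
```

With a span of 2, "stamp" falls outside the window and is not counted; widening the span brings it in, which is why the choice of span matters when reading a collocate list.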

  8. Collocation • The collocates of a word are words which frequently co-occur in the vicinity of that word, e.g. • letter co-occurs with post, stamp, envelope • cause co-occurs with destruction, damage, disease, pain • seeking co-occurs with asylum, help, advice, support, information

  9. Concordance: provide

  10. Questions • What are the collocates of the word ‘provide’ in the concordance?

  11. Collocation • This is a lexical relation between two or more words which have a tendency to co-occur within a few words of each other in running text. For example, PROVIDE frequently occurs with words which refer to valuable things which people need, such as help and assistance, money, food and shelter, and information. These are some of the frequent collocates of the verb. (Stubbs 2002: 24) • Schematically: collocates … node … collocates, where the span covers positions -5 to +5 around the node. • In order to identify the patterns, we need to sort the collocates according to our needs.
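One way to sort collocates is by their position relative to the node. The sketch below tabulates collocates for each offset in a -5 to +5 span; the function name `positional_collocates` and the sample sentence are invented for illustration, and real concordancers offer other sort orders as well.

```python
from collections import Counter, defaultdict

def positional_collocates(tokens, node, span=5):
    """Map each offset in [-span, +span] (0 excluded) to a Counter of words."""
    table = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for offset in range(-span, span + 1):
            j = i + offset
            # Skip the node itself and positions that fall outside the text
            if offset != 0 and 0 <= j < len(tokens):
                table[offset][tokens[j]] += 1
    return table

# Invented sample text
tokens = ("they provide help to families and we provide food and shelter "
          "while others provide information").split()
table = positional_collocates(tokens, "provide", span=5)

# Words found immediately to the right (+1) and left (-1) of the node
print(sorted(table[1]))
print(sorted(table[-1]))
```

Reading the +1 column here shows what is provided (help, food, information), while the -1 column shows who provides it.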

  12. How to use a corpus • 3. Read for formal patterns: Colligation • Colligation can be defined as ‘the grammatical company a word keeps and the position it prefers’: in other words, a word’s colligations describe what it typically does grammatically (Hoey 2000:234). • Knowledge of a collocation, if it is to be used appropriately, necessarily involves knowledge of the patterns or colligations in which that collocation can occur acceptably (Hargreaves 2000:214).

  13. Colligation • A word has a colligate when a particular word class co-occurs in the vicinity of the word, e.g. • The word the often co-occurs with a noun or noun phrase. • The word cases often co-occurs with a quantifier: some, many, most, more, both, several, etc.
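Because colligation concerns word classes, it can only be counted once each word carries a tag. The sketch below assumes the text has already been tagged; the tag labels (QUANT, NOUN, etc.), the hand-tagged sample and the function name `colligates` are all invented for illustration and do not follow any particular tagset.

```python
from collections import Counter

def colligates(tagged, node, offset=1):
    """Count the word-class tags found at a fixed offset from each `node`."""
    counts = Counter()
    for i, (word, _tag) in enumerate(tagged):
        j = i + offset
        if word == node and 0 <= j < len(tagged):
            counts[tagged[j][1]] += 1
    return counts

# Hand-tagged sample (hypothetical tags, not a real tagset)
tagged = [
    ("in", "PREP"), ("some", "QUANT"), ("cases", "NOUN"), ("the", "DET"),
    ("rule", "NOUN"), ("fails", "VERB"), ("but", "CONJ"), ("in", "PREP"),
    ("most", "QUANT"), ("cases", "NOUN"), ("it", "PRON"), ("holds", "VERB"),
    ("and", "CONJ"), ("many", "QUANT"), ("cases", "NOUN"), ("remain", "VERB"),
]

before_cases = colligates(tagged, "cases", offset=-1)  # class just before "cases"
after_the = colligates(tagged, "the", offset=1)        # class just after "the"
print(before_cases, after_the)
```

In this toy sample every occurrence of cases is preceded by a quantifier and the is followed by a noun, matching the two examples on the slide.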

  14. ‘against’ in sport news report

  15. Concordance sample of give (BNC World Edition) - colligations

  16. What are the colligations of ‘against’? • What are the colligations of ‘give’?

  17. Semantic prosody • “the determiner of the meaning of the whole lexical item” (Sinclair 2004:141) • “a subtle element of attitudinal, often pragmatic meaning” (Sinclair 2004:145) • “shows how the rest of the item is to be interpreted functionally” (Sinclair 2004:34)

  18. Semantic prosody • Read as a sample of social practice: semantic or discourse prosody • A discourse prosody is a feature which extends over more than one unit in a linear string. […] Discourse prosodies express speaker attitude (Stubbs 2002: 65) • ‘the consistent aura of meaning with which a form is imbued by its collocates’ … prosodies based on very frequent forms can bifurcate into ‘good’ and ‘bad’, using a grammatical principle like transitivity in order to do so.

  19. Build up

  20. Cause

  21. Semantic preference (1) • “the restriction of regular co-occurrence to items which share a semantic feature, e.g. about sport or suffering” (Sinclair 2004:141) • An item shows semantic preference when it co-occurs with “a class of words which share some semantic feature (such as words to do with ‘medicine’ or ‘change’)” (Stubbs 2001:88) • “the relation, not between individual words, but between a lemma or word-form and a set of semantically related words” (Stubbs 2001:65)

  22. Semantic preference (2) • The relation between an individual word and a set of semantically related words, e.g. • The word commit is related to behaviour, e.g. commit suicide, commit a crime • The word large is associated with words that express quantities and sizes, e.g. large number(s), scale, part, amounts, quantities, areas • The word heated is associated with the following semantic set: debate, discussion, argument, exchange • The word agent is associated with the following semantic set: estate, travel, secret, literary
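Semantic preference can be approximated by grouping a word's collocates into semantic sets and summing their frequencies. In the sketch below the set memberships are taken from the examples above, but the set labels, the collocate counts and the function name `semantic_preference` are invented for illustration; in practice the sets have to be built by the analyst.

```python
from collections import Counter

# Hypothetical hand-made semantic sets (labels are invented)
SEMANTIC_SETS = {
    "quantity_size": {"number", "numbers", "scale", "part",
                      "amounts", "quantities", "areas"},
    "dispute": {"debate", "discussion", "argument", "exchange"},
}

def semantic_preference(collocate_counts, semantic_sets):
    """Sum collocate frequencies per semantic set."""
    totals = Counter()
    for word, freq in collocate_counts.items():
        for label, members in semantic_sets.items():
            if word in members:
                totals[label] += freq
    return totals

# Invented collocate counts for "large", not taken from any corpus
large_collocates = Counter({"number": 12, "scale": 7, "amounts": 5, "dog": 1})
profile = semantic_preference(large_collocates, SEMANTIC_SETS)
print(profile.most_common())
```

A profile dominated by one set, as here, is the kind of evidence that suggests a semantic preference; the software can do the counting, but judging which words belong together is still the reader's job.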

  23. Example

  24. Abstract relations among the four categories • Collocation is “precisely located in the physical text” (Sinclair 2004:142) and can be observed directly. It involves single words which can be directly observed in texts (Stubbs 2001). • Colligation can only be observed after assigning “a word class to each word under examination” (Sinclair 2004:142). It involves word classes or grammatical phenomena which cannot be directly observed, but which are often small and closed (Stubbs 2001). • Semantic preference cannot be observed without noticing “similarity of meaning” (Sinclair 2004:142). It involves a class of words belonging to the same semantic set, which are abstract and open-ended, but “have frequent and typical members” (Stubbs 2001:88). • Semantic prosody “is not subject to any conventions of linguistic realisation, and so is subject to enormous variation, making it difficult for a human or a computer to find it reliably” (Sinclair 2004:144). It is even more open-ended and typically has lexical variability (Stubbs 2001:88).

  25. WordSmith Tools • WordSmith Tools (WS) is an excellent concordancing package developed by Mike Scott and distributed by Oxford University Press. • WS contains a suite of programs: Concord, WordList and KeyWords. • Each of these is an independent application that can be started from within WS, but KeyWords works on word lists produced by WordList.

  26. How to use a corpus • Read for repeated events: Wordlist • A list generated by the software containing all the words used in the corpus. The words can be listed alphabetically or by frequency (how often each word is repeated). • A wordlist of the BNC Sampler

  27. Wordlist • This program generates word lists based on one or more ASCII or ANSI text files. The word lists are automatically generated in both alphabetical and frequency order. • The point of it… • These can be used: 1. simply in order to study the type of vocabulary used; 2. to identify common word clusters; 3. to compare the frequency of a word in different text files or across genres; 4. to compare the frequencies of cognate words or translation equivalents between different languages; 5. to get a concordance of one or more of the words in your list.
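To make the idea of a word list concrete, here is a minimal Python sketch that produces both orderings from a short text. It is not how WordList itself works: the tokenisation rule (lower-cased runs of letters and apostrophes), the function name `word_list` and the sample text are assumptions made for this example.

```python
import re
from collections import Counter

def word_list(text):
    """Return a Counter of lower-cased word tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

freq = word_list("The cat sat on the mat. The mat was flat.")

# Frequency order: most frequent first, ties broken alphabetically
by_frequency = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
# Alphabetical order
alphabetical = sorted(freq.items())

print(by_frequency[:3])
print(alphabetical[:3])
```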

  28. Within WordList you can compare two lists, or carry out consistency analysis, i.e. to find out which words recur consistently in lots of texts of a given genre, for stylistic comparison purposes. • These word-lists may also be used as input to the KeyWords program, which analyses the words in a given text and compares frequencies with a reference corpus, in order to generate lists of "key-words" and "key-key-words".
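Consistency analysis, as described above, asks in how many texts a word occurs rather than how often it occurs overall. A rough sketch, assuming whitespace tokenisation and three invented one-sentence "texts" (the function name `consistency` is also made up here):

```python
from collections import Counter

def consistency(texts):
    """For each word, count the number of texts in which it appears."""
    doc_counts = Counter()
    for text in texts:
        # set() ensures a word is counted once per text, however often it recurs
        doc_counts.update(set(text.lower().split()))
    return doc_counts

# Invented sample texts
texts = [
    "the match ended in a draw",
    "the team won the match",
    "a late goal won it",
]
spread = consistency(texts)
print([w for w, n in sorted(spread.items()) if n >= 2])
```

Words that appear in most texts of a genre are candidates for the genre's consistent vocabulary, whereas a word that is frequent in only one text will score low on this measure.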

  29. Concord • Concord is a program which makes a concordance using DOS, Text Only, ASCII or ANSI text files. • To use it you will specify a search word, which Concord will seek in all the text files you have chosen. It will then present a concordance display, and give you access to information about collocates of the search word, dispersion plots showing where the search word came in each file, cluster analyses showing repeated clusters of words (phrases) etc.
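A concordance display of the kind described above is often called KWIC (key word in context). The following Python sketch builds one from a token list; it does not reproduce Concord's behaviour, and the function name `kwic`, the context width and the sample sentence are assumptions for illustration.

```python
def kwic(tokens, node, width=3):
    """Return (left context, node, right context) for each hit of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Invented sample text
tokens = "we provide help and they provide food to families".split()
lines = kwic(tokens, "provide", width=2)

# Right-align the left context so the search word lines up in a column
for left, node, right in lines:
    print(f"{left:>20}  {node}  {right}")
```

Lining the search word up in a central column is what lets the eye pick out repeated patterns to its left and right.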

  30. The point of it… • The point of a concordance is to be able to see lots of examples of a word or phrase, in their contexts. You get a much better idea of the use of a word by seeing lots of examples of it, and it's by seeing or hearing new words in context lots of times that you come to grasp the meaning of most of the words in your native language. It's by seeing the contexts that you get a better idea about how to use the new word yourself. A dictionary can tell you the meanings but it's not much good at showing you how to use the word.

  31. Language students can use a concordancer to find out how to use a word or phrase, or to find out which other words belong with a word they want to use. For example, it's through using a concordancer that you could find out that in academic writing, a paper can describe, claim, or show, though it doesn't believe or want (*this paper wants to prove that ...). • Language teachers can use the concordancer to find similar patterns so as to help their students. They can also use Concord to help produce vocabulary exercises, by choosing two or three search-words, blanking them out, then printing. • Researchers can use a concordancer, for example when searching through a database of hospital accident records, to see whether fracture is associated with fall, grease, ladder. Or to examine historical documents to find all the references to land ownership.

  32. Keywords • This is a program for identifying the "key" words in one or more texts. Key words are those whose frequency is unusually high in comparison with some norm. • The point of it… • Key-words provide a useful way to characterise a text or a genre. Potential applications include: language teaching, forensic linguistics, stylistics, content analysis, text retrieval. • The program compares two pre-existing word-lists, which must have been created using the WordList tool. One of these is assumed to be a large word-list which will act as a reference file. The other is the word-list based on one text which you want to study. • The aim is to find out which words characterise the text you're most interested in, which is automatically assumed to be the smaller of the two texts chosen. The larger will provide background data for reference comparison.
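The comparison of a small word list against a large reference list can be sketched as follows. The slides do not say which statistic KeyWords applies, so this example uses the log-likelihood statistic, a common choice for keyness in corpus linguistics, as an assumption. The function names, the toy frequency lists and the 3.84 cut-off (the 5% critical value of chi-square with one degree of freedom) are likewise choices made for this illustration.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """a, b: word frequency in study / reference; c, d: total corpus sizes."""
    e1 = c * (a + b) / (c + d)   # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)   # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study, reference, min_ll=3.84):
    """List words that are unusually frequent in `study` versus `reference`."""
    c, d = sum(study.values()), sum(reference.values())
    scored = []
    for word, a in study.items():
        b = reference.get(word, 0)
        ll = log_likelihood(a, b, c, d)
        # Keep only words relatively MORE frequent in the study text
        if a / c > b / d and ll >= min_ll:
            scored.append((word, round(ll, 2)))
    return sorted(scored, key=lambda kv: -kv[1])

# Invented frequency lists: a 40-word study text and a 1000-word reference
study = Counter({"corpus": 10, "the": 20, "of": 10})
reference = Counter({"corpus": 1, "the": 500, "of": 250, "a": 249})
key = keywords(study, reference)
print(key)
```

In this toy data "the" and "of" have the same relative frequency in both lists, so only "corpus" comes out as key, which illustrates why very common words rarely characterise a text.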
