LIN 3098 Corpus Linguistics. Albert Gatt. In this lecture. We proceed with our discussion of how corpus-based studies influence the study of grammar. Focus: lexico-grammar. Uses of corpora in grammar studies. The use of corpora to study grammar is relatively recent.
LIN 3098 Corpus Linguistics Albert Gatt
In this lecture • We proceed with our discussion of how corpus-based studies influence the study of grammar. • Focus: lexico-grammar
Uses of corpora in grammar studies • The use of corpora to study grammar is relatively recent. • With corpora, the unit of analysis tends to be the word (tokens/types) • Studies of lexis are therefore a natural application. • Corpus-based study of grammar has in fact emphasised the role of lexis. • Also aided by recent developments in automatic POS tagging and parsing. • Additional grammatical information enables search and analysis of complex structures.
Part 1 The relationship between grammar and lexis
Degrees of abstraction • We have already looked at the use of corpora in studying collocations. • Given sufficient grammatical annotation, we can look at collocational patterns at different degrees of abstraction.
Degrees of abstraction • Example: all preceding collocates of the noun time in the BNC. • Not all collocates are equally interesting. • lots of noise when searching for a single word!
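The noise problem can be illustrated with a minimal sketch. The sentence below is an invented toy sample, not BNC data, and the function name `preceding_collocates` is hypothetical. With no grammatical filter, a function word such as "the" is counted alongside the more informative collocates.

```python
from collections import Counter

# Invented toy sample (NOT taken from the BNC), already lower-cased and tokenised.
tokens = (
    "the time has come and by the time we left "
    "it was a long time since the first time"
).split()

def preceding_collocates(tokens, node):
    """Count the word immediately to the left of each occurrence of `node`."""
    return Counter(
        tokens[i - 1] for i in range(1, len(tokens)) if tokens[i] == node
    )

counts = preceding_collocates(tokens, "time")
print(counts.most_common())  # the function word "the" comes out on top
```

On a real corpus the same effect is much stronger: the top of the list is dominated by articles, prepositions and other high-frequency words.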
Practical task 1 • Let’s try to make our search more interesting, by focusing on a combination of lexical and grammatical material. • Conduct a search for: • Any adjective followed by the noun time
Degrees of abstraction • Example: only adjectival collocates of the noun time in the BNC. • Can make grammatically informed queries. [ADJ + time] • Allows focus on what is truly of interest.
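A query of the form [ADJ + time] can be sketched over POS-tagged text as follows. This is only an illustration under stated assumptions: the sample is invented and hand-tagged, the tag names (`ADJ`, `NOUN`, ...) are a simplified stand-in for whatever tagset the corpus actually uses, and `adjectives_before` is a hypothetical helper name.

```python
from collections import Counter

# Invented, hand-tagged toy sample with a simplified tagset (not BNC tags).
tagged = [
    ("a", "DET"), ("long", "ADJ"), ("time", "NOUN"), ("ago", "ADV"),
    ("the", "DET"), ("time", "NOUN"), ("was", "VERB"),
    ("a", "DET"), ("hard", "ADJ"), ("time", "NOUN"), ("and", "CONJ"),
    ("a", "DET"), ("long", "ADJ"), ("time", "NOUN"),
]

def adjectives_before(tagged, node):
    """Count adjectives that immediately precede the noun `node`."""
    pairs = zip(tagged, tagged[1:])  # each token with its right neighbour
    return Counter(
        word
        for (word, tag), (next_word, next_tag) in pairs
        if tag == "ADJ" and next_word == node and next_tag == "NOUN"
    )

adj_counts = adjectives_before(tagged, "time")
print(adj_counts.most_common())
```

Compared with an unfiltered search, "the time" is no longer counted, because the left-hand slot is restricted to adjectives.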
Practical task 2 • We can go further in abstracting away from specific lexical material. • Conduct a search for: • Any adjective followed by any noun
Degrees of abstraction • Suppose we were interested in all adjective-noun combinations. [ADJ + N] • Given a query language of the right complexity (such as CQL), we can extract grammatically interesting collocations.
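The idea of a query in which each position can constrain the word, the tag, or neither can be sketched with a small pattern matcher. This is not CQL itself and not how any particular corpus tool is implemented; the slot format (a dict with optional "word" and "tag" keys), the function name `find_matches`, the simplified tags and the sample sentence are all assumptions made for illustration.

```python
def find_matches(tagged, pattern):
    """Return every run of tokens that satisfies `pattern`.

    `pattern` is a list of slots. Each slot is a dict that may constrain
    "word" and/or "tag"; an empty dict matches any token.
    """
    n = len(pattern)
    hits = []
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if all(
            slot.get("word", word) == word and slot.get("tag", tag) == tag
            for slot, (word, tag) in zip(pattern, window)
        ):
            hits.append(tuple(word for word, _ in window))
    return hits

# Invented, hand-tagged toy sample with a simplified tagset.
tagged = [
    ("the", "DET"), ("old", "ADJ"), ("house", "NOUN"), ("had", "VERB"),
    ("a", "DET"), ("green", "ADJ"), ("door", "NOUN"), ("and", "CONJ"),
    ("old", "ADJ"), ("windows", "NOUN"),
]

# Fully abstract query: [ADJ + N]
adj_noun = find_matches(tagged, [{"tag": "ADJ"}, {"tag": "NOUN"}])
print(adj_noun)

# Mixed query: any adjective followed by the specific word "door"
adj_door = find_matches(tagged, [{"tag": "ADJ"}, {"word": "door"}])
print(adj_door)
```

The same matcher covers both degrees of abstraction: replacing a tag constraint with a word constraint moves the query back towards a lexical search.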
Limitations of these approaches • What we’ve done still retains a focus on the word. • The main purpose is to improve lexical research by incorporating a limited amount of grammatical info (usually POS) • Can we go further and really investigate grammar?
Part 2 Collocational Frameworks
Does this sound familiar? • Colourless green ideas sleep furiously • Chomsky’s example illustrates an approach to syntax where: • the primary focus is on syntactic rules • rules manipulate lexical items of the right categories • “grammatical” or “legal” is distinct from “sensible” or “meaningful” • syntactic rules operate (semi-) independently of lexical items: if X is of the right category, then X can be slotted into a syntactic position
Chicken and egg questions • When we formulate an utterance, which comes first? • syntax? • lexical items? • both in parallel? • Do particular syntactic constructions have a meaning (or communicative function)? E.g. what is the meaning of: • the appositive that-construction The reason that he gave was… • the extraposed it-construction It is possible to hire a car if you want one.
Lexical approaches to grammar • Assumptions: • syntactic structures are highly sensitive to the lexical items that they can select • structures also may have specific communicative functions or meanings • speakers/authors convey meaning, and syntax is used as a resource to convey it • ideally, grammar+lexis should be viewed as part and parcel of the same process • phraseology and co-selection play an important role • in particular constructions, we find that particular words tend to co-occur with great regularity
The idiom principle • Sinclair (1991): • “a language user has available to him or her a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analyzable into segments”
Implications • The idiom principle suggests that speakers/writers: • Don’t just apply abstract rules to build structures; • Re-use bits of structure; • It also implies that bits of structure are themselves meaningful.
The idiom principle vs open choice • This principle contrasts with the “open-choice” principle. • Open choice predicts that: • Syntactic rules operate independently of lexical items. • Structures are constructed by applying rules and “plugging” in lexemes.
Putting the idiom principle to work • Renouf and Sinclair (1991) introduced collocational frameworks • Intended as a practical way to investigate the use and meaning of grammatical constructions • A collocational framework consists of a pattern involving 3 items: • A function word • A content word (specified via POS) • Another function word • Example: [a + Noun + of]
Collocational frameworks • Is a pattern like [a + Noun + of] a linguistic unit? If it is, we would expect that: • The grammatical context (a, of) places restrictions on the semantics of the Noun in the middle (not just any noun can be used)
Practical task 3 • Conduct a search for: • The collocational framework [a+Noun+of] • In looking at the nouns that occur here, can you spot any semantic commonalities? • What does this tell you about the way the structure itself is used, and what it usually means?
[a + Noun + of] • Nouns in this construction are often quantities: • a lot of • a number of • ... • This suggests that this construction itself places a restriction on the semantics of the content words used in it.
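Extracting the nouns that fill a framework such as [a + Noun + of] can be sketched as a scan over word trigrams. The sample below is invented and hand-tagged with a simplified tagset, and `framework_fillers` is a hypothetical helper name; a real study would run the equivalent query over the corpus and inspect the full frequency list.

```python
from collections import Counter

# Invented, hand-tagged toy sample with a simplified tagset (not BNC data).
tagged = [
    ("a", "DET"), ("lot", "NOUN"), ("of", "PREP"), ("people", "NOUN"),
    ("saw", "VERB"), ("a", "DET"), ("number", "NOUN"), ("of", "PREP"),
    ("birds", "NOUN"), ("and", "CONJ"), ("a", "DET"), ("lot", "NOUN"),
    ("of", "PREP"), ("them", "PRON"), ("sat", "VERB"), ("on", "PREP"),
    ("a", "DET"), ("wall", "NOUN"), ("near", "PREP"), ("us", "PRON"),
]

def framework_fillers(tagged, left, right, middle_tag="NOUN"):
    """Count the words tagged `middle_tag` that occur between `left` and `right`."""
    fillers = Counter()
    trigrams = zip(tagged, tagged[1:], tagged[2:])
    for (w1, _), (w2, t2), (w3, _) in trigrams:
        if w1.lower() == left and t2 == middle_tag and w3.lower() == right:
            fillers[w2.lower()] += 1
    return fillers

fillers = framework_fillers(tagged, "a", "of")
print(fillers.most_common())
```

In this toy sample only the quantity nouns "lot" and "number" fill the slot, while "a wall near" is rejected because the right-hand function word is not "of".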
Collocational frameworks: final remarks • Renouf and Sinclair did not suggest that any string of words or pattern counts as a collocational framework. • Crucially, there has to be evidence for semantic restrictions on content words. • E.g. [Verb in NP] doesn’t count as a good pattern, because practically any verb can occur in the first position.
Part 3 Colligates
Colligations • Roughly, a collocation at the level of part of speech. • An idea due to Firth. The main question is: • What are the grammatical environments in which a particular word occurs? • One way of answering this question is to look for a word, and then look at the POSs to the left and right.
Practical task 4 • Conduct a search for the word consequence, specifying any word to the right and any word to the left. • Make a frequency count of node tags. • What do you observe?
Some data (Gries 2009) • Left context of consequence • Article • Adjective • ... • Right context: • Of • Preposition • ...
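Counting the POS tags immediately to the left and right of a node word can be sketched as below. The sample is invented and hand-tagged with a simplified tagset, so the counts are illustrative only and do not reproduce the figures reported by Gries; `colligates` is a hypothetical helper name.

```python
from collections import Counter

# Invented, hand-tagged toy sample with a simplified tagset (not BNC data).
tagged = [
    ("as", "PREP"), ("a", "DET"), ("consequence", "NOUN"), ("of", "PREP"),
    ("this", "DET"), (",", "PUNC"), ("the", "DET"), ("direct", "ADJ"),
    ("consequence", "NOUN"), ("of", "PREP"), ("war", "NOUN"), ("was", "VERB"),
    ("a", "DET"), ("serious", "ADJ"), ("consequence", "NOUN"),
    ("for", "PREP"), ("all", "DET"),
]

def colligates(tagged, node):
    """Return (left, right) Counters of the POS tags adjacent to `node`."""
    left, right = Counter(), Counter()
    for i, (word, _) in enumerate(tagged):
        if word.lower() != node:
            continue
        if i > 0:
            left[tagged[i - 1][1]] += 1
        if i + 1 < len(tagged):
            right[tagged[i + 1][1]] += 1
    return left, right

left, right = colligates(tagged, "consequence")
print("left: ", left.most_common())
print("right:", right.most_common())
```

The only difference from a collocation count is that tags are tallied instead of word forms, which is why the approach stays linear and POS-based.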
Observations • This operationalisation of the concept of colligation is closely related to the collocational framework of Renouf and Sinclair. • It’s primarily intended to give an idea of the grammatical environment in which a word occurs.
Limitations • Both collocational frameworks and colligations have some drawbacks: • They’re still highly word-based • They focus only on POS (not full syntax) • Their view of grammatical structure is purely linear.
Part 4 Some case studies
Example 1: It as object • Components: • non-referential use of it • object of a verb • followed by an NP or AdjP • Examples (from the BNC): • Many people who use drugs regularly find it difficult to exist in a drug-free world . • You can also find it hard to remember things • in court unless they agree to do so , making it difficult for detainees to challenge the validity
Example 1 continued • Typical analysis: • this construction involves extraposition: People who use drugs find existing in a drug-free world difficult. People who use drugs find it difficult to exist in a drug-free world • Some empirical observations on lexis (Francis 1993): • 98% of cases involve find and make • some other verbs like think, consider, see to • Possible “meaning”/function of the structure: • a stereotyped way of presenting a situation in terms of how it is evaluated • evaluation is placed after the verb
Example 2: appositive clauses • Apposition: • a relation between an NP and another phrase which refers to the same thing (Leech and Svartvik, 1975) • Examples: • your daughter, the lawyer, is here • In English, can also occur with that-clauses and to-clauses: • the news that your daughter was here • the plot to assassinate the president
Example 2: appositive clauses • Distinguished from restrictive relative clauses: • the dog that I saw yesterday • restricts the reference of the head noun • Appositive clause: • the fact that I came • does not restrict the reference of the head noun • “amplifies” or “qualifies” the head noun
Example 2: Appositives • Appositive that-clauses (BNC): • The fining of airlines plus the fact that the nationals of many refugee-producing countries • as firm as the Emperor Augustus about the principle that a ruler's actual appearance matters less • Traditional grammars (Leech and Svartvik 1975): • “head noun must be an abstract noun” • Question: • what are the lexical restrictions here? • do they have implications for the function of this syntactic structure?
Levels of stereotypicality in syntax • Phraseological constraints: • the co-selection of particular lexical items within a particular syntactic structure • These seem to range on a continuum. • At one extreme: fixed, unchanging constructions (behave like multi-word lexical items) • At the other: complete freedom in lexical selection.
Phraseology • Completely fixed idioms: • it never rains but it pours • Less fixed idioms: • put on a brave face • putting a brave face on … • put a good face on… • Some room for lexical manoeuvre • Semi-prepackaged phrases which allow for variation: • I haven’t the faintest/foggiest/remotest idea/notion • Highly nebulous lexico-syntactic dependencies: • be a case of X • a case of déjà vu • a case of take the money and run • …
Syntactic “fixedness” • Given the cline from fixed to flexible, some linguists (e.g. Francis 1993) suggest that the distinction between “lexicon” and “syntax” is arbitrary. • This argument is based on phraseological constraints observable only in very large corpora. • This is not too far from recent positions in Generative Grammar: • Jackendoff’s (2002) parallel architecture; • Construction Grammar (e.g. Goldberg, 1995)
The “item” and the “environment” • Francis proposes that the distinction between “lexical item” and “syntactic environment” only be used for convenience. • Proposed method: • look at a syntactic environment • discover lexical regularities • focus on a subset of the lexical items • discover further generalisations about the grammar of those items
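The first two steps of this method (fix a syntactic environment, then look for lexical regularities in it) can be sketched using the it-as-object environment from Example 1. Everything here is an assumption made for illustration: the sample is invented, the tokens are hand-annotated as (word, lemma, tag) triples with a simplified tagset, `verbs_in_environment` is a hypothetical name, and the surface pattern [Verb + it + ADJ + to] is only a linear approximation that cannot tell referential from non-referential it.

```python
from collections import Counter

# Invented toy sample of (word, lemma, tag) triples, simplified tagset.
sample = [
    ("people", "people", "NOUN"), ("find", "find", "VERB"), ("it", "it", "PRON"),
    ("difficult", "difficult", "ADJ"), ("to", "to", "TO"), ("exist", "exist", "VERB"),
    ("this", "this", "DET"), ("makes", "make", "VERB"), ("it", "it", "PRON"),
    ("hard", "hard", "ADJ"), ("to", "to", "TO"), ("sleep", "sleep", "VERB"),
    ("she", "she", "PRON"), ("found", "find", "VERB"), ("it", "it", "PRON"),
    ("easy", "easy", "ADJ"), ("to", "to", "TO"), ("leave", "leave", "VERB"),
    ("he", "he", "PRON"), ("saw", "see", "VERB"), ("it", "it", "PRON"),
    ("yesterday", "yesterday", "ADV"),
]

def verbs_in_environment(sample):
    """Count verb lemmas occurring in the surface pattern [Verb + it + ADJ + to]."""
    verbs = Counter()
    windows = zip(sample, sample[1:], sample[2:], sample[3:])
    for verb, pron, adj, to in windows:
        if (
            verb[2] == "VERB"
            and pron[0].lower() == "it"
            and adj[2] == "ADJ"
            and to[0].lower() == "to"
        ):
            verbs[verb[1]] += 1  # count by lemma, so find/found are merged
    return verbs

verbs = verbs_in_environment(sample)
total = sum(verbs.values())
shares = {lemma: count / total for lemma, count in verbs.items()}
print(verbs.most_common())
print(shares)
```

Counting by lemma rather than word form is what lets the skew towards a few verbs show up; the later steps of the method would then take one of those verbs and examine its other grammatical environments.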
Case study: Extraposed it-clauses • One of the most frequent adjectives is possible: • it is possible to hire a car • it is possible that it will rain • Proposed interpretations: • that-clause is used for possibility • to-clause is used to express ability • This suggests that possible might have (at least) two different meanings.
The grammar of possible • Further patterns involving possible: • article + superl. adj. + possible + noun the best possible start • as … as possible • … • Main idea: specifications of possible grammatical environments of the item can help specify its range of meanings. • These examples seem to confirm the ability/possibility uses of possible