Explore the integration of text annotations, FrameNet lexicon, and Constructicon for English language analysis. Learn about creating a database encompassing text annotations and grammatical constructions. Discover how FrameNet documents lexical units' use and meaning in English via examples from a vast text corpus. Uncover the importance of structurally simple examples in illustrating word meanings and valences within frames. Dive into identifying frame elements and constructing representative examples for different valence possibilities. Address challenges in illustrating verb usage and frame elements through text annotations and grammatical constructions.
Toward the Linking of Text Annotations, the FrameNet Lexicon, and an Intended Future Constructicon • C. J. Fillmore, Berkeley
Change of Emphasis • Departing slightly from promises made in the abstract, I’ll be adding some discussion on • what it would take to discover and record the constructions found in a large English Text that is also lexically annotated, in “Frame Semantics” terms, and • how one could construct an Open Source online directory of partial descriptions of grammatical constructions for English,
without ignoring the promised concern for • indicating in Lexical Entries information about the constructions in which the words participate, and • indicating in Construction Entries, information about the lexical items that participate in them. • This obviously requires constructing a single articulated database that includes text annotations, a frame-based Lexicon, and a register of constructions - a Constructicon.
“FrameNet” • Our goal in FrameNet is to document the use and meaning of lexical units in English - especially “frame-bearing” words - by careful examination of attested examples taken from a very large text Corpus. • This means we need to find good examples of each of the words we describe, and that requires some attention.
Criteria for choosing examples • FrameNet lexicographers are told that when they choose examples for illustrating the meaning and use of lexical units, • The example sentences should be structurally simple. • Their lexical content should illustrate the semantic frames they realize. • Enough examples should be collected to illustrate all of each word’s valences - its basic combinatorial affordances.
Why use simple examples? Suppose we’re working on the verb accuse. • Simple example: • Their publisher accused me of plagiarism. • Complex example: • Plagiarism is something I would hate to be accused of. Point: The second is a perfectly good example of an English sentence, but its complexity has nothing to do with relevant facts about the verb accuse. It would not be a good dictionary example of the verb.
Finding “Frame Elements” • For words in the frame containing accuse, the annotations we produce recognize three main roles, and our job is to show how these are expressed in sentences headed by the verb. We can refer to these three roles as • accuser [the person who does the accusing] • accused [the person accused of wrongdoing] • charge [the offense] • [accuser Their publisher] accused [accused me] of [charge plagiarism]. • [charge Plagiarism] is something [accused I] would hate to be accused of. (accuser unexpressed)
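The role labelling on this slide can be pictured as a small data structure. What follows is only a minimal sketch: the names `FrameElement`, `annotate`, and `unexpressed` are invented for illustration and are not FrameNet's actual annotation format, and spans are found by naive first-occurrence string search.

```python
from dataclasses import dataclass

# Hypothetical record for one labelled span; not FrameNet's real schema.
@dataclass(frozen=True)
class FrameElement:
    role: str   # "accuser", "accused", or "charge"
    start: int  # character offset where the span starts
    end: int    # character offset just past the span

CORE_ROLES = ("accuser", "accused", "charge")

def annotate(sentence, target, spans):
    """Locate the target word and each role's text (first occurrence) in the sentence."""
    elements = []
    for role, text in spans.items():
        start = sentence.index(text)  # raises ValueError if the text is absent
        elements.append(FrameElement(role, start, start + len(text)))
    t = sentence.index(target)
    return {"sentence": sentence, "target": (t, t + len(target)), "elements": elements}

def unexpressed(annotation):
    """List the core roles that have no span in this sentence."""
    present = {fe.role for fe in annotation["elements"]}
    return [role for role in CORE_ROLES if role not in present]

simple = annotate(
    "Their publisher accused me of plagiarism.", "accused",
    {"accuser": "Their publisher", "accused": "me", "charge": "plagiarism"},
)
complex_ = annotate(
    "Plagiarism is something I would hate to be accused of.", "accused",
    {"charge": "Plagiarism", "accused": "I"},
)
```

Calling `unexpressed(complex_)` reports the accuser as missing, which is the information the parenthetical on the slide records for the passive example.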
Why “frame relevant” contexts? They accused me of it. Pronouns don’t tell us much about what is going on in this sentence. Our examples are always single sentences, and even if we could find the antecedents of they and it in the surrounding text that would not tell us much about the verb itself. Their publisher accused me of plagiarism. This has more information about the context of an accusation and provides information about the charge.
Why representative? • We want examples of each valence possibility we discover. The verb accuse has VPs of two types: V + NP + PP[of NP] • They accused me of theft. • burglary, arson, perjury, murder V + NP + PP[of VP-ing] • They accused me of stealing their car. • lying to the judge, • killing their dog, • insulting their mother V + NP
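The requirement that every valence possibility be exemplified can be checked mechanically once examples carry a pattern label. This is a minimal sketch under that assumption; the pattern strings are copied from the slide, while the labelling of each sentence and the function names are illustrative only.

```python
from collections import defaultdict

# Valence patterns listed on the slide for "accuse".
PATTERNS = ["V + NP + PP[of NP]", "V + NP + PP[of VP-ing]", "V + NP"]

# Hand-labelled examples (sentence, pattern); the labels are assumed, not computed.
EXAMPLES = [
    ("They accused me of theft.", "V + NP + PP[of NP]"),
    ("They accused me of perjury.", "V + NP + PP[of NP]"),
    ("They accused me of stealing their car.", "V + NP + PP[of VP-ing]"),
]

def group_by_pattern(examples):
    """Collect the example sentences under each valence pattern."""
    groups = defaultdict(list)
    for sentence, pattern in examples:
        groups[pattern].append(sentence)
    return dict(groups)

def missing_patterns(examples, expected):
    """Return the expected patterns that have no example yet."""
    seen = {pattern for _, pattern in examples}
    return [pattern for pattern in expected if pattern not in seen]
```

With the three examples above, `missing_patterns(EXAMPLES, PATTERNS)` flags the bare `V + NP` pattern as still lacking an illustration.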
Problems with the criteria • Most “simple” sentences illustrating the use of a verb are not frame-revealing, since the arguments are mainly pronouns. • Sentences in which all of the frame-relevant elements are expressed in a single clause are unnatural-sounding -- the kinds of sentences linguists and psycholinguists make up. (“The publisher accused the author of plagiarism.”) • Many words do not occur often enough (even in a very large corpus) to provide simple and clear examples of all of their affordances.
Full Text Annotation • In general, for our lexicographic work, we tried to steer clear of syntactically complex structures, while knowing that we were missing the possibilities of giving good explanations of certain lexical units. • For reasons related to the interests of our later funders, FrameNet activities have moved from “mere” lexicon building, with the use of a vast Research Corpus, to the annotation of continuous texts, letting the examples found there provide material for lexical analysis.
This means that we now have to deal with • mistakes • ambiguities • sentence fragments • repetitions • and - especially - “non-core” grammatical constructions
Constructions? • If we’re going to start dealing with constructions in our work, we need strategies and principles for • recognizing a construction when we see one, • discovering and recording its properties, and • convincing ourselves and our colleagues that what we’ve found really does need the kind of description and explanation that requires the positing of a special construction.
As grammarians, we feel the need to incorporate each new construction within a consistent and coherent generative construction grammar; but as text analysts, we can be (temporarily) satisfied with partial descriptions. • This is normal linguistics: we’ve always been able to recognize (clear cases of), say, the “tough construction,” but it’s taking forever to come up with a satisfying account of it.
The Strategy • If you find something that looks as if it can’t be described within the framework provided by the current state of your theory, keep trying to make it fit. • If you have to give up, then try to see it, not as a lonely idiom, but as an instance of some general grammatical phenomenon, and explore that phenomenon as thoroughly as you can. • If nothing works, then call it an idiom and add it to the lexicon - at least for now.
Valence and Grammar • Familiar FrameNet valences presuppose a portion of the basic grammar. • That is, the information they provide about grammatical functions (subject, object, complement, head, modifier, determiner, etc.) is taken as meaning that we know how these words behave in sentences built up with such construction types as predication, complementation, modification, determination, and the like. • [ILLUSTRATE WITH “accuse”] • Comment on that word “core”.
THE PLAN OF THIS TALK • To examine a few construction types. • one that has fixed slots and fixed words • one that’s pure syntactic form • one whose properties are mostly hidden • To suggest ways of connecting lexical and constructional information. • To suggest ways of annotating texts for their constructions. • To propose cooperatively building a public online construction registry for English.
1a Case: next week • My account will be a little fussy, since I want to illustrate the reasons for deciding that something is a construction, and the need to look for its “boundaries.” So, suppose you come upon the phrase next week in a sentence like Let’s finish this job next week.
1a Case: next week • First impression: What’s the problem? • This is a case of simple modification: adjective next + noun week • But wait! • why doesn’t next week have an article? • why doesn’t it come with a preposition? • why does it mean what it means?
1a What does it mean? • The phrase next week, by itself, refers to the calendar week which comes immediately after the calendar week which includes ‘now’, i.e., the moment of speaking. • It is a deictically anchored time expression. • Compare it to the next week. This phrasing is anaphorically anchored and is much more regular.
1a Is it a simple idiom? • If it’s an idiom, just add it to the lexicon and look for a more interesting problem. • But wait! We find completely analogous interpretations with • next month • next year • next semester • So maybe it’s a construction that uses the word next followed by a noun naming a temporal period.
1a Restrictions • It works fine with week, month, year, and a few special words like semester, but • it doesn’t work with day: *next day • and it doesn’t seem to work with calendric units that are too big to figure in the life experiences of a single individual: *next millennium • So we have to formulate all these restrictions too. (Maybe.)
1a Wait! We’re not finished. • There are semantically and formally analogous patterns that use, instead of next, the words this and last; they too are deictically anchored expressions, and they too exclude day. • this X: the X which contains ‘now’: this week, this month, this year, *this day • last X: the X which precedes the X containing ‘now’: last week, last month, last year, *last day
1a What have we got so far? • Special use of this, next and last. • notice: this is a demonstrative, next and last are adjectives • combining, without prepositions or articles, with specific words that name calendric time periods • forming meanings that relate these time periods as identical to, following, or preceding, the named period containing ‘now’.
1a Descriptive Choices • We could state the conditions for the construction as generally as possible, • regarding the exclusion of the day unit as explained by a pre-emption: in order to express these meanings, the words today, yesterday and tomorrow are required,
1a • regarding the exclusion of century and millennium by describing the function of the construction in terms of the practical limits of human planning, and • regarding the inclusion of non-calendric terms like semester or hour (meaning ‘class hour’ in a school setting), as an exploitation of the system, something that might not need to be described in the grammar.
1a Are we there yet? • No. Here are some more facts about these words: • If we want to talk about the X that follows next X, or the X that precedes last X, we say: • the X after next • the X before last • Notice that here the words next and last, by themselves, mean next X and last X • the week after next, the week before last • the month after next, the month before last • COMPARE: the day before yesterday, the day after tomorrow
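The facts collected so far amount to a mapping between phrases and signed offsets from the period containing ‘now’. The sketch below is one way to write that mapping down, assuming only the units week, month, and year and treating day as pre-empted by today, yesterday, and tomorrow; the function names and the table layout are illustrative, not a proposed formalism.

```python
# Offsets relative to the period containing 'now'.
OFFSETS = {"last": -1, "this": 0, "next": 1}
UNITS = {"week", "month", "year"}  # day is excluded: it is pre-empted (see DAY_WORDS)

# The pre-empting expressions for the day unit.
DAY_WORDS = {
    -2: "the day before yesterday", -1: "yesterday", 0: "today",
    1: "tomorrow", 2: "the day after tomorrow",
}

def offset_of(phrase):
    """Return (unit, offset) for a phrase the construction licenses, else None."""
    words = phrase.lower().split()
    if len(words) == 2 and words[0] in OFFSETS and words[1] in UNITS:
        return words[1], OFFSETS[words[0]]
    if len(words) == 4 and words[0] == "the" and words[1] in UNITS:
        if words[2:] == ["after", "next"]:
            return words[1], 2
        if words[2:] == ["before", "last"]:
            return words[1], -2
    return None

def express(unit, offset):
    """Return the phrase for a unit at a given offset (-2..2) from 'now'."""
    if offset not in DAY_WORDS:
        raise ValueError("only offsets from -2 to 2 are covered")
    if unit == "day":
        return DAY_WORDS[offset]
    if unit not in UNITS:
        raise ValueError(f"unit not covered: {unit}")
    if offset == 2:
        return f"the {unit} after next"
    if offset == -2:
        return f"the {unit} before last"
    word = {v: k for k, v in OFFSETS.items()}[offset]
    return f"{word} {unit}"
```

Starred forms such as next day and next millennium come out as `None` here simply because those units are absent from `UNITS`; the sketch records the restriction without explaining it.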
1a And there’s still more. • The words this, last and next also occur with the names of members of temporal cycles, like • weekday names (Monday, etc.), • month names (January, etc.), • season names (summer, etc.), and • day part names (morning, etc.)
1a • And these have regular but complicated interpretations: • last Friday is ‘the Friday of last week’; • next summer is ‘the summer of next year’; • this March is ‘the March of this year’, • last night is ‘the night of yesterday [last day]’. and there are various extensions, pre-emptions, exceptions.
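The reading given here for weekday names, last Friday as ‘the Friday of last week’, can be worked through on real dates. This is a sketch of that one reading only: it assumes calendar weeks start on Monday, ignores the extensions and exceptions the slide mentions, and `resolve_weekday` is a name made up for the example.

```python
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
OFFSETS = {"last": -1, "this": 0, "next": 1}

def resolve_weekday(phrase, now):
    """Resolve 'last/this/next <weekday>' as the weekday of last/this/next week."""
    word, day_name = phrase.split()
    # Assumption: the calendar week runs Monday to Sunday.
    week_start = now - timedelta(days=now.weekday())
    target_week = week_start + timedelta(weeks=OFFSETS[word])
    return target_week + timedelta(days=WEEKDAYS.index(day_name))

now = date(2024, 5, 15)  # a Wednesday
last_friday = resolve_weekday("last Friday", now)
next_friday = resolve_weekday("next Friday", now)
```

On this reading, with ‘now’ on Wednesday 15 May 2024, last Friday is 10 May and next Friday is 24 May, not the nearer 17 May, which is this Friday.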
1a Conclusions so far • We have here a family of constructions that make clear use of particular lexical items in particular combinations, that combine with words of particular semantic types, and that have semantic interpretations that do not follow from anything else we know about the grammar of English.
A lexicon of English has to show that these words can have these functions in these constructions. • A constructicon of English has to show what words, or classes of words, can participate in each of its constructions for which lexical membership is specified. • Text annotations for English should link each word to the relevant lexical entry, and each construction instance should be linked to the relevant entry in a constructicon.
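The three-way linking described on this slide can be sketched as three record types with cross-references, plus a check that the references agree in both directions. Everything below is hypothetical: the class names, the construction name `deictic_calendar_period`, and the dictionary-based storage are assumptions made for illustration, not the design of FrameNet or of any existing constructicon.

```python
from dataclasses import dataclass, field

@dataclass
class LexicalEntry:
    lemma: str
    constructions: list = field(default_factory=list)  # names of constructions

@dataclass
class ConstructionEntry:
    name: str
    lexical_items: list = field(default_factory=list)  # participating lemmas

@dataclass
class Annotation:
    text: str
    words: dict        # word form in the text -> lemma in the lexicon
    construction: str  # name of the construction this text instantiates

def broken_links(lexicon, constructicon, annotations):
    """Report cross-references that do not resolve or are not reciprocated."""
    problems = []
    for lemma, entry in lexicon.items():
        for name in entry.constructions:
            cx = constructicon.get(name)
            if cx is None or lemma not in cx.lexical_items:
                problems.append(f"lexicon:{lemma}->{name}")
    for name, cx in constructicon.items():
        for lemma in cx.lexical_items:
            if lemma not in lexicon or name not in lexicon[lemma].constructions:
                problems.append(f"constructicon:{name}->{lemma}")
    for ann in annotations:
        if ann.construction not in constructicon:
            problems.append(f"annotation:{ann.text}->{ann.construction}")
        for form, lemma in ann.words.items():
            if lemma not in lexicon:
                problems.append(f"annotation:{form}->{lemma}")
    return problems

CX = "deictic_calendar_period"  # hypothetical construction name
lexicon = {w: LexicalEntry(w, [CX]) for w in ("this", "next", "last", "week")}
constructicon = {CX: ConstructionEntry(CX, ["this", "next", "last", "week"])}
annotations = [Annotation("next week", {"next": "next", "week": "week"}, CX)]
```

Adding a lexical entry for day that claims membership in the construction would make `broken_links` report a one-sided link, since the construction entry does not list day.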
Intermediate Cases • There are lots of constructions that people (some of them in this room) have described that have both lexical and grammatical-pattern requirements.
Traditional and Special • Questions, imperatives, relative clauses, comparatives - each of these with many types. • Serial verbs (Goldberg), WXDY (Kay +), Let_alone (Fillmore +), MadMagazine (Lambrecht), Presentatives (Lakoff), Nominal Extraposition (Michaelis +), Way Construction (Goldberg +), Away Construction (Jackendoff), Correlative Conditional (Lots of people), Tautologies (Wierzbicka), Just Because (Hirose, Bender & Kathol), and dozens more.
Adjective Negation with “no” • It seems that the only adjectives that can be “negated” with no are fair, good, and different. • And these seem to be different from the structure that has no modifying a comparative adjective: • no bigger than a bug • no taller than my baby sister • *no older than Methuselah • *no younger than Chuck
Presentatives • Here comes Harry, wearing my shirt. • Here he comes, wearing my shirt. • First part: here or there • Second part: V+NP or Pron+V • Verb: go, come, be, sit, stand, lie, hang • Third part (optional): secondary predicate
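The three-part description of presentatives is specific enough to try as a toy recognizer. The sketch below assumes whitespace tokenization, present-tense verb forms only, and that a pronoun subject must precede the verb; the verb list is from the slide, while the form table, the pronoun list, and the function name are illustrative assumptions.

```python
# Present-tense forms only (an assumption); lemmas are those listed on the slide.
VERB_FORMS = {
    "go": {"go", "goes"}, "come": {"come", "comes"}, "be": {"is", "are"},
    "sit": {"sit", "sits"}, "stand": {"stand", "stands"},
    "lie": {"lie", "lies"}, "hang": {"hang", "hangs"},
}
FORM_TO_LEMMA = {form: lemma for lemma, forms in VERB_FORMS.items() for form in forms}
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}

def parse_presentative(sentence):
    """Split a sentence into the three parts of the pattern, or return None."""
    main, _, secondary = sentence.strip().rstrip(".!").partition(",")
    words = main.split()
    if len(words) < 3 or words[0].lower() not in {"here", "there"}:
        return None
    second, third = words[1].lower(), words[2].lower()
    result = {"first": words[0].lower(), "secondary": secondary.strip() or None}
    if second in FORM_TO_LEMMA and third not in PRONOUNS:
        # V + NP order: the verb precedes a full noun phrase.
        result.update(order="V+NP", verb=FORM_TO_LEMMA[second],
                      subject=" ".join(words[2:]))
        return result
    if second in PRONOUNS and third in FORM_TO_LEMMA and len(words) == 3:
        # Pron + V order: a pronoun subject precedes the verb.
        result.update(order="Pron+V", verb=FORM_TO_LEMMA[third], subject=words[1])
        return result
    return None
```

Both slide examples parse, with different values for `order` and the same optional secondary predicate, wearing my shirt.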
1b By contrast, • There are some constructions that have no specified lexical components. • One of these is “Right Node Raising”, so-called. We might want to call it the Shared Completion Construction. • Description • a final phrase “completes” each of two truncated phrases, • these connected by some kind of conjoining or adjoining device • associated with paired foci
1b Form • y or x…y is a conjoining (adjoining, subjoining) device • form is 0+(x)+1+y+2+3 • 1 and 2 offer paired foci • Interpretation: 0+1+3 {and} 0+2+3, where ‘{and}’ is the meaning of the conjoining device • Slots: 0 = Preceding Context, (x) = optional first part of the device, 1 = Part-1, y = the device, 2 = Part-2, 3 = Completion • Example: [0 I wouldn’t] [1 touch] [y let alone] [2 eat] [3 anything that ugly].
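The slot formula and its interpretation can be run on the slide's own example. This is a minimal sketch: the class name `SharedCompletion` and its field names are invented here, and the ‘meaning’ of the device is left as the device string itself rather than modelled.

```python
from dataclasses import dataclass

@dataclass
class SharedCompletion:
    context: str            # slot 0: preceding context
    part1: str              # slot 1: first focus
    device: str             # y: the conjoining device
    part2: str              # slot 2: second focus
    completion: str         # slot 3: the shared completion
    device_first: str = ""  # (x): optional first part of a two-part device

    def surface(self):
        """Linear form 0 + (x) + 1 + y + 2 + 3."""
        parts = [self.context, self.device_first, self.part1,
                 self.device, self.part2, self.completion]
        return " ".join(p for p in parts if p)

    def expand(self):
        """Interpretation: (0 + 1 + 3, device, 0 + 2 + 3)."""
        first = f"{self.context} {self.part1} {self.completion}"
        second = f"{self.context} {self.part2} {self.completion}"
        return first, self.device, second

example = SharedCompletion("I wouldn't", "touch", "let alone", "eat",
                           "anything that ugly")
```

Expanding the example yields the two full clauses, I wouldn't touch anything that ugly and I wouldn't eat anything that ugly, related by let alone.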
1b Conjunctions • “Conjunctions” observed participating in this construction include: • and, or, but, both…and, either…or, not_only…but_also, if_not, but_not, if_even, rather_than, instead_of, let_alone,
1b Conclusions so far • This last construction seems to operate very generally, interacting with almost any grammatical device that permits the expression of contrasting foci. • There appears to be no reason to associate this construction with anything in the lexicon. • And it doesn’t seem possible to blame the construction itself for the meaning of any of the expressions it inhabits. It’s pure syntax.
1c A “hidden” construction? • Now here’s something I think is a special construction, but it’ll be hard to convince most of my grammarian friends. • Components: • a predicate with meaning related to ‘having’ • the word “the” • a noun construable as the name of a resource • an infinitive complement controlled by whoever is interpreted as the subject of the ‘having’ relation, or alternatively a Purpose phrase with “for”
1c Examples • I don’t have the money to take a vacation. • We lack the staff to take on such a project. • Where can I find the cash to buy something that expensive? • Do we have the resources to manage that? • We don’t have the fuel to make it to the next town. • Who’ll give us the funds to do that?
1c (verb with ‘having’ semantics, bracketed) • I don’t [have] the money to take a vacation. • We [lack] the staff to take on such a project. • Where can I [find] the cash to buy something that expensive? • Do we [have] the resources to manage that? • We don’t [have] the fuel to make it to the next town. • Who’ll [give] us the funds to do that?
1c (noun construable as resource, bracketed) • I don’t have the [money] to take a vacation. • We lack the [staff] to take on such a project. • Where can I find the [cash] to buy something that expensive? • Do we have the [resources] to manage that? • We don’t have the [fuel] to make it to the next town. • Who’ll give us the [funds] to do that?
1c (complement controlled by ‘haver’, bracketed) • I don’t have the money [to take a vacation]. • We lack the staff [to take on such a project]. • Where can I find the cash [to buy something that expensive]? • Do we have the resources [to manage that]? • We don’t have the fuel [to make it to the next town]. • Who’ll give us the funds [to do that]?
1c Mystery • The construction allows us to explain the fact that the sequence [the N to VP] is not a self-standing constituent, having a bounded meaning independent of its context. • Evidence *I lost [the money to take a vacation]. *We spilled [the fuel to get us to the next town]. *She just fired [the staff to complete the project].
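The components of this construction can be approximated with a surface pattern, which is enough to separate the six examples from the three starred sentences. The sketch below is deliberately crude: the verb and noun lists contain only the words that appear on the slides plus a few inflected forms, the optional pronoun covers the give us case, and a regular expression is a stand-in for real syntactic analysis, not a claim about how the construction should be detected.

```python
import re

# Word lists taken from the slide examples; the inflected forms are an assumption.
HAVING = "have|has|had|lack|lacks|lacked|find|finds|found|give|gives|gave"
RESOURCE = "money|staff|cash|resources|fuel|funds"

PATTERN = re.compile(
    rf"\b(?P<verb>{HAVING})\b"               # predicate related to 'having'
    r"(?:\s+(?:me|you|him|her|us|them))?"    # optional recipient, as in 'give us'
    r"\s+the\s+"                             # the word "the"
    rf"(?P<noun>{RESOURCE})"                 # noun construable as a resource
    r"\s+to\s+(?P<vp>\w+)",                  # start of the infinitive complement
    re.IGNORECASE,
)

def match_resource_construction(sentence):
    """Return (verb, noun, first word of the infinitive) or None."""
    m = PATTERN.search(sentence)
    return (m.group("verb"), m.group("noun"), m.group("vp")) if m else None

GOOD = [
    "I don't have the money to take a vacation.",
    "We lack the staff to take on such a project.",
    "Where can I find the cash to buy something that expensive?",
    "Do we have the resources to manage that?",
    "We don't have the fuel to make it to the next town.",
    "Who'll give us the funds to do that?",
]
STARRED = [
    "I lost the money to take a vacation.",
    "We spilled the fuel to get us to the next town.",
    "She just fired the staff to complete the project.",
]
```

The starred sentences fail only because lost, spilled, and fired are not in the ‘having’ list, which mirrors the point that [the N to VP] depends on its governing predicate rather than standing alone.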