Corpus Studies of Constituent Ordering Tom Wasow
An example, from Steven Pinker’s The Language Instinct, p. 131: In my laboratory we use it as an easily studied instance of mental grammar, allowing us to document • in great detail • the psychology of linguistic rules • from infancy to old age • in both normal and neurologically impaired people, • in much the same way that biologists focus on the fruit fly Drosophila to study the machinery of the genes.
One of the other 119 possible orders: In my laboratory we use it as an easily studied instance of mental grammar, allowing us to document • the psychology of linguistic rules • in great detail • in both normal and neurologically impaired people, • from infancy to old age • in much the same way that biologists focus on the fruit fly Drosophila to study the machinery of the genes.
And another order ?? In my laboratory we use it as an easily studied instance of mental grammar, allowing us to document • in much the same way that biologists focus on the fruit fly Drosophila to study the machinery of the genes • in both normal and neurologically impaired people, • in great detail • the psychology of linguistic rules • from infancy to old age
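The figure of 119 other possible orders follows from the five constituents after "document": they can be arranged in 5! = 120 ways, one of which is Pinker's original. A minimal sketch of that arithmetic, with the constituent strings copied from the example (the variable names are illustrative only):

```python
from itertools import permutations

# The five constituents that follow "document" in Pinker's sentence.
constituents = [
    "in great detail",
    "the psychology of linguistic rules",
    "from infancy to old age",
    "in both normal and neurologically impaired people,",
    "in much the same way that biologists focus on the fruit fly "
    "Drosophila to study the machinery of the genes",
]

orders = list(permutations(constituents))
total_orders = len(orders)       # every ordering, including the original
other_orders = total_orders - 1  # orderings other than the original

print(total_orders, other_orders)  # 120 119
```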
What makes some orders sound more natural than others? • The answer might shed light on the psychological processes underlying language use. • It might also have practical applications: • for on-line style checkers • for machine translation • for other applications requiring robust generation
The Alternations I Studied • Heavy Noun Phrase Shift: • We take too many dubious idealizations for granted. • We take for granted too many dubious idealizations. • The Verb-Particle Construction: • We figured out the problem. • We figured the problem out. • Dative Alternation: • Kim handed a toy to the baby. • Kim handed the baby a toy.
Factors I Looked At • Structural complexity (or “weight”) • Discourse status (or “newness”) • Semantic connectedness of verb and following constituents • Lexical biases of verbs • Ambiguity avoidance
Grammatical Weight • Behaghel’s “Gesetz der Wachsenden Glieder”: “Von zwei Gliedern von verschiedenem Umfang steht das umfangreichere nach.” • Translation: Law of Growing Constituents: “Of two constituents of different size, the larger one follows the smaller one.” • In other words: Simple phrases precede complex ones.
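As a rough illustration of "simple phrases precede complex ones", the sketch below checks whether a sequence of constituents is ordered short before long. Using word count as the measure of size is an assumption made here for simplicity, and the function names are hypothetical. The test phrases are the Heavy Noun Phrase Shift pair from the alternations slide.

```python
def word_count(phrase):
    """Size of a constituent, approximated as its number of words."""
    return len(phrase.split())

def short_before_long(constituents):
    """True if no constituent is longer (in words) than the one after it."""
    sizes = [word_count(c) for c in constituents]
    return all(a <= b for a, b in zip(sizes, sizes[1:]))

# "We take too many dubious idealizations for granted."
canonical = ["too many dubious idealizations", "for granted"]
# "We take for granted too many dubious idealizations."
shifted = ["for granted", "too many dubious idealizations"]

print(short_before_long(canonical))  # False: 4 words precede 2
print(short_before_long(shifted))    # True: 2 words precede 4
```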
Many Proposals to Make Behaghel’s Generalization Precise • Some absolute, others relative • Some categorical, others graded • Corpus data support relative, graded definition • Various proposed measures are so highly correlated that they can’t be distinguished
Categorical Weight Definitions • An NP is heavy if it "dominates S" [Ross (1967, rule 3.26)] • "the condition on complex NP shift is that the NP dominate an S or a PP" [Emonds (1976; 112)] • "Counting a nominal group as heavy means either that two or more nominal groups...are coordinated...., or that the head noun of a nominal group is postmodified by a phrase or clause" [Erdmann (1988; 328), emphasis in original] • "the dislocated NP [in HNPS] is licensed when it contains at least two phonological phrases" [Zec and Inkelas (1990; 377)] • "it is possible to formalize the intuition of 'heaviness' in terms of an aspect of the meaning of the constituents involved, namely their givenness in the discourse" [Niv (1992; 3)]
Graded Weight Definitions • Number of words dominated [Hawkins (1990)] • Number of nodes dominated [Hawkins (1994)] • Number of phrasal nodes (i.e. maximal projections) dominated [Rickford et al. (1995; 111)]
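The three graded measures can be sketched on a toy phrase-structure tree. Everything specific here is an assumption made for illustration: the nested-tuple tree encoding, the bracketing of the example phrases, the set of labels treated as phrasal, and the choice to count the root node itself among the nodes "dominated".

```python
# A tree is (label, child, child, ...); a leaf is a plain string (one word).
PHRASAL = {"NP", "PP", "VP", "AP", "S"}  # assumed set of phrasal labels

def words(tree):
    """Number of words (leaves) in the tree."""
    if isinstance(tree, str):
        return 1
    return sum(words(child) for child in tree[1:])

def nodes(tree):
    """Number of labelled nodes, counting the root and part-of-speech nodes."""
    if isinstance(tree, str):
        return 0
    return 1 + sum(nodes(child) for child in tree[1:])

def phrasal_nodes(tree):
    """Number of nodes whose label is in the assumed PHRASAL set."""
    if isinstance(tree, str):
        return 0
    own = 1 if tree[0] in PHRASAL else 0
    return own + sum(phrasal_nodes(child) for child in tree[1:])

# Assumed bracketings of two phrases from the HNPS example.
heavy_np = ("NP",
            ("AP", ("Adv", "too"), ("A", "many")),
            ("A", "dubious"),
            ("N", "idealizations"))
light_pp = ("PP", ("P", "for"), ("A", "granted"))

for name, tree in [("heavy_np", heavy_np), ("light_pp", light_pp)]:
    print(name, words(tree), nodes(tree), phrasal_nodes(tree))
```

On any of the three measures the noun phrase comes out heavier than the prepositional phrase, which is the kind of agreement that makes the measures hard to tell apart.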
Numbers of Examples
           HNPS      DA    V-Prt
V DO X   10,592     426      496
V X DO      694     615    1,205
TOTAL    11,286   1,041    1,701
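The counts above can be turned into proportions to compare how often each construction appears in the V X DO order. This is only a sketch of the arithmetic on the figures from the slide; the dictionary layout and variable names are illustrative.

```python
# Counts copied from the "Numbers of Examples" slide.
counts = {
    "HNPS":  {"V DO X": 10592, "V X DO": 694},
    "DA":    {"V DO X": 426,   "V X DO": 615},
    "V-Prt": {"V DO X": 496,   "V X DO": 1205},
}

# Column totals, which should match the TOTAL row of the slide.
totals = {name: sum(c.values()) for name, c in counts.items()}

# Share of examples in each construction that use the V X DO order.
share_v_x_do = {name: c["V X DO"] / totals[name] for name, c in counts.items()}

for name in counts:
    print(f"{name:6s} total={totals[name]:6d}  V X DO share={share_v_x_do[name]:.1%}")
```

The shares come to roughly 6% for HNPS, 59% for DA, and 71% for V-Prt.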
Testing Categorical Definitions as Relative Criteria using HNPS
Correlation Coefficients for 3 Weight Measures
                          HNPS    DA    V-Prt
Words & Nodes              .94    .96    .99
Words & Phrasal Nodes      .96    .97    .95
Nodes & Phrasal Nodes      .94    .96    .98
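For readers unfamiliar with how such coefficients are obtained, here is a minimal sketch of a Pearson correlation between two weight measures. The slides do not say which correlation statistic was used, so Pearson's r is an assumption, and the word and node counts below are invented for illustration; they are not the corpus data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented word and node counts for six phrases (NOT corpus data).
word_counts = [1, 2, 4, 5, 8, 12]
node_counts = [2, 3, 6, 9, 13, 20]

r = pearson(word_counts, node_counts)
print(f"r = {r:.2f}")
```

Because longer phrases almost always contain more nodes, any realistic sample yields a coefficient close to 1.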
Two Verb Classes and HNPS • Vt (for "transitive verbs") require NP objects in all their subcategorizations: bring, carry, make, place, put, set, take. • Vp (for "prepositional verbs") can occur with NP objects but also have uses with an immediately following PP and no NP object: add, build, call, draw, give, hold, leave, see, show, write.
Predictions
      SPEAKER'S PERSPECTIVE      LISTENER'S PERSPECTIVE
Vt    HNPS rare                  HNPS relatively common
Vp    HNPS relatively common     HNPS very rare
Two Verb Classes and DA • Vs (for "sentential verbs") may be followed by an NP and a that-clause or infinitival VP: offer, show, teach, tell, write • Vn (for "non-sentential verbs") may not be followed by an NP and a that-clause or infinitival VP: assign, bring, give, hand, pay, send, take
Predictions
      SPEAKER'S PERSPECTIVE              LISTENER'S PERSPECTIVE
Vs    double object relatively common    double object relatively rare
Vn    double object relatively rare      double object relatively common
Newness • The “Given-Before-New Principle”, as formulated by Clark & Clark: “Given information should appear before new information.” • Many variants in the literature.
Are weight and newness distinct effects? • New information requires more words to convey than old information (e.g., descriptions vs. pronouns) • Is one of these factors just a side-effect of the other? • Surprisingly, nobody asked this question until a few years ago.
Weight and newness are distinct. • With my students, I conducted corpus analyses and a production experiment to tease weight and newness apart. • Both methods showed the two factors were not reducible to one.
Weight & Newness Aren’t the Whole Story “On this side of the Atlantic, the Lancaster-Oslo/Bergen corpus was designed to replicate as closely as possible the Brown corpus, the only difference being that this corpus contains British rather than American English texts.” Judith Klavans, “Computational Linguistics,” in W. O’Grady, M. Dobrovolsky, & F. Katamba, Contemporary Linguistics: An Introduction
Another Factor: Semantic Connectedness Behaghel again: “das geistig eng Zusammengehörige auch eng zusammengestellt wird” Translation: “What belongs together mentally is also placed close together.”
Collocations and Idioms • Idioms (semantically opaque collocations): • …bring pressure to bear • Semantically transparent collocations: • …bring the meeting to an end • Non-collocations: • ...bring a pencil to the meeting
Dependent vs. Independent Particles • Dependent: They ate the cookies up. • The meaning of “up” is dependent on the meaning of “ate”, since the cookies don’t go up. • Independent: They picked the cookies up. • The meaning of “up” is independent of the meaning of “picked”, since the cookies go up.
Possible Explanations for Factors Influencing Order Variation • Short before long is easier to process, because hard tasks are postponed. • Given before new facilitates efficient communication by establishing common ground.
Possible Explanations for Factors Influencing Order Variation (continued) • Long phrases and new information are hard to produce and thus get postponed. • Choices in word order allow speakers flexibility in production. • Our memory for words includes information about what constructions they occur in and how frequently.
Another Possible Factor: Ambiguity Avoidance • Global ambiguity: • I saw a man wearing an odd hat with a telescope. • I saw with a telescope a man wearing an odd hat. • Local ambiguity: • They gave Grant’s letters to Lincoln to a museum. • They gave a museum Grant’s letters to Lincoln.
Corpus Search for Local Ambiguity • Few ambiguities of the relevant form (3) • The company gave the U.S. rights to the drug to the Population Council… • More unambiguous word orders (56) • Giuliani gave the commissioner the ceremonial key to the city… • But all unambiguous cases are also cases of short-before-long.
Experimental Method 1. Speaker silently reads a sentence: A museum received Grant's letters to Lincoln from the foundation.
Experimental Method 2. Sentence disappears from screen. Listener reads question from list: What did the foundation do?
Experimental Method 3. Speaker answers the listener’s question: The foundation gave .... the museum, um, Grant's letters to Lincoln. Listener chooses the correct response on list (from two choices).