The Web in Theoretical Linguistics Research: Two Case Studies Using the Linguist’s Search Engine. Philip Resnik, Aaron Elkiss, Heather Taylor, and Ellen Lau, University of Maryland. Berkeley Linguistics Society, February 20, 2005.
*Theje dberk eobbfid dbeonc kdoeb. Did that sound OK to you? “A small, imperfect experiment…”
Diagram: Nature of Grammar (Data-oriented, Probabilistic, Ordered constraints, Hard / Categorical) versus Nature of Elicitation (Conventional / Binary; graded scale {__, ?, ??, ?*, *, **}; Contrasts; Magnitude estimation). Work cited: Schütze (1996); Cowart (1997); Bard, Robertson, and Sorace (1996); Crocker and Keller (2005); Sorace and Keller (2005).
Diagram: Nature of Grammar (Data-oriented, Probabilistic, Ordered constraints, Hard / Categorical) versus Nature of Elicitation, with a third dimension added: Source of Language Sample (naturally occurring). Language Technology offers corpora, part-of-speech taggers, treebanks, statistical parsers, semantic role labeling, …etc. For the Linguist: ?
If you build it, they will come… Manning (2003): “…it remains fair to say that these tools have not yet made the transition to the Ordinary Working Linguist without considerable computer skills.”
% export TGREP_CORPUS=wsj_mrg.crp
% tgrep -n __ | grep . | gzip > wsj_mrg.txt.gz
% tgrep2 -C -p wsj_mrg.txt wsj_mrg.t2c.g NP !<< PP [> NP | >> VP]
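For readers who do not use TGrep2: in its pattern language, A !<< B means “A does not dominate B”, A > B means “A is immediately dominated by B”, A >> B means “A is dominated by B”, and [ … | … ] groups alternatives. The last query therefore asks for an NP that contains no PP and that is either the immediate child of an NP or somewhere inside a VP. A minimal sketch, assuming Penn Treebank-style bracketed input with a labelled root, spells that condition out by hand; it illustrates what the query means, not how tgrep2 works, and the helper names (parse, words, find_matches) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)  # empty for words


def parse(text):
    """Parse one bracketed tree, e.g. '(S (NP (NNP John)) (VP (VBD left)))'.
    Assumes the root bracket carries a label (no bare '( (S ...) )' wrapper)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":          # a bare word becomes a leaf
            leaf = Node(tokens[pos])
            pos += 1
            return leaf
        node = Node(tokens[pos + 1])    # the label follows the open bracket
        pos += 2
        while tokens[pos] != ")":
            node.children.append(read())
        pos += 1                        # skip the closing bracket
        return node

    return read()


def words(node):
    """The words (leaves) under a node, left to right."""
    if not node.children:
        return [node.label]
    return [w for child in node.children for w in words(child)]


def dominates_pp(node):
    """True if a phrase labelled PP occurs anywhere below node (NP << PP)."""
    return any(
        (c.children and c.label == "PP") or dominates_pp(c) for c in node.children
    )


def find_matches(tree):
    """Collect NPs satisfying NP !<< PP [> NP | >> VP], in pre-order."""
    hits = []

    def walk(node, ancestors):
        if node.children and node.label == "NP" and not dominates_pp(node):
            parent_is_np = bool(ancestors) and ancestors[-1].label == "NP"  # > NP
            inside_vp = any(a.label == "VP" for a in ancestors)             # >> VP
            if parent_is_np or inside_vp:
                hits.append(node)
        for child in node.children:
            walk(child, ancestors + [node])

    walk(tree, [])
    return hits


tree = parse(
    "(S (NP (NNP John)) (VP (VBD saw) (NP (NP (DT the) (NN man))"
    " (PP (IN with) (NP (DT a) (NN telescope))))))"
)
for np in find_matches(tree):
    print(" ".join(words(np)))   # prints "the man", then "a telescope"
```

John is skipped because it is neither inside a VP nor directly under an NP, and the man with a telescope is skipped because it dominates a PP.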
Roadmap • Motivations • The Linguist’s Search Engine • Case Study 1: Psycholinguistics • Case Study 2: Syntax • Conclusions
A Brief Illustration of the LSE • Pollard and Sag (1994); discussion in Manning (2003) • (a) We consider Kim to be an acceptable candidate • (b) We consider Kim an acceptable candidate • (c) We consider Kim quite acceptable • (d) We consider Kim among the most acceptable candidates • (e) *We consider Kim as an acceptable candidate • (f) *We consider Kim as quite acceptable • (g) *We consider Kim as among the most acceptable candidates • (h) *We consider Kim as being among the most acceptable candidates
Query By Example: Type an example of the structure you’re interested in. The LSE generates an automatic analysis. (You don’t have to agree with the analysis!)
A few mouse clicks later, you have a description of the structure you’re looking for. The LSE creates the query for you.
Hit ‘search’ and the LSE retrieves sentences whose analysis matches the structure you specified.
Two Case Studies • Focus in this talk: • What was the study about? • How was the LSE useful? In both cases, my co-authors were naïve users of the Linguist’s Search Engine. I didn’t discover the LSE had been useful to them until after the fact.
Case Study I: Psycholinguistics • Nina Kazanina, Ellen Lau, Moti Lieberman, Colin Phillips and Masaya Yoshida, “Active Dependency Formation in the Processing of Backwards Anaphora”. 17th Annual CUNY Sentence Processing Conference, University of Maryland, College Park. March 2004. http://www.ling.umd.edu/ninaka/Papers/CUNY_2004_slides.pdf
Active Dependency Formation. Wh-dependencies: The teacher asked what the team was laughing about __. • Wh-word signals upcoming dependency formation • Active processing of dependency observed: filled-gap effect • Dependency formation constrained by grammar: island constraints. Backwards anaphora: While he was watching TV, John heard the phone ring. • Early pronoun signals upcoming dependency formation • Active processing of dependency observed? • Dependency formation constrained by grammar?
Active Dependency Formation Original data for testing the prediction: the gender mismatch effect • While she was cooking dinner, John listened to the radio. • She was cooking dinner while John listened to the radio. Principle C rules out coreference in the c-commanded position, so no mismatch effect should be observed in the second sentence. Results looked good, but there was a confound! In She was cooking dinner while John listened to the radio, the target position is not expected. A construction was needed in which the target position is expected; otherwise the processor might simply have stopped looking for a target.
Active Dependency Formation Possible solution: expletive constructions • It was clear to his mother that John should go. (No Principle C) • It was clear to him that John should go. (Principle C) Question: does this construction really have the right properties? • Is the second clause consistently expected? • Is it consistently expletive rather than referential? Options: • Rely on experimenter intuition • Do a pilot study • Sift through a corpus
Query by example: “It was clear to him” becomes It AUX [clear to NP]
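One way to picture what the query-by-example step produces: the example’s parse becomes a tree fragment in which some nodes (here the auxiliary and the object NP) are generalized to “any constituent with this label”. The sketch below assumes that reading, plus a matching rule in which the query’s children must appear among a node’s children in order, possibly with other material in between. The LSE’s actual matching semantics and tag set may differ; the names T, ANY, matches, search, and expletive are hypothetical, and the trees are hand-built.

```python
def T(label, *kids):
    """Build a (label, children) tree node; words are plain strings."""
    return (label, list(kids))


def ANY(label):
    """Query node matching any constituent with this label (a generalized node)."""
    return (label, None)


def matches(q, t):
    if isinstance(q, str):                       # query word: same word, any case
        return isinstance(t, str) and q.lower() == t.lower()
    if isinstance(t, str) or q[0] != t[0]:       # labels must agree
        return False
    if q[1] is None:                             # generalized node: contents free
        return True
    # Query children must match tree children in order, gaps allowed. Greedy
    # earliest matching suffices because each pairing is tested independently.
    i = 0
    for qk in q[1]:
        while i < len(t[1]) and not matches(qk, t[1][i]):
            i += 1
        if i == len(t[1]):
            return False
        i += 1
    return True


def search(q, tree):
    """All subtrees of `tree` (pre-order) that match query `q`."""
    if isinstance(tree, str):
        return []
    hits = [tree] if matches(q, tree) else []
    for child in tree[1]:
        hits += search(q, child)
    return hits


def leaves(t):
    return [t] if isinstance(t, str) else [w for c in t[1] for w in leaves(c)]


# It AUX [clear to NP]: "was" and "him" from the example are generalized.
QUERY = T("S", T("NP", T("PRP", "it")),
               T("VP", ANY("AUX"),
                       T("ADJP", T("JJ", "clear"), T("PP", T("TO", "to"), ANY("NP")))))


def expletive(adj, obj):
    """Hand-built, simplified parse of 'It was ADJ to OBJ that John should go'."""
    return T("S", T("NP", T("PRP", "It")),
                  T("VP", T("AUX", "was"),
                          T("ADJP", T("JJ", adj), T("PP", T("TO", "to"), obj)),
                          T("SBAR", T("IN", "that"),
                                    T("S", T("NP", T("NNP", "John")),
                                           T("VP", T("MD", "should"), T("VP", T("VB", "go")))))))


him = T("NP", T("PRP", "him"))
for hit in search(QUERY, expletive("clear", him)):
    print(" ".join(leaves(hit)))   # It was clear to him that John should go
print(search(QUERY, expletive("important", him)))   # []
```

Because the query fixes the adjective as clear, It was important to him … does not match; generalizing that node as well moves toward the broader It AUX Adj PP query.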
Active Dependency Formation: Result • Verified that virtually all results of the search did involve expletive it with a following clause. • Obtained reassurance in designing the follow-up study. • Later double-checked using an off-line completion study. The LSE made it easy to start with linguists’ intuitions and find relevant evidence in naturally occurring text. The LSE also makes it easy to look for additional relevant data that may not have occurred to the experimenter.
Query by example: It AUX Adj PP that… (any adjective; a PP with any preposition)
Adjectives retrieved: clear, important, vital, manifest, interesting, necessary, obvious
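The search on these slides relies on the LSE’s parses. A rough, parser-free approximation over part-of-speech-tagged text shows the idea of letting the adjective and preposition vary and tallying which adjectives turn up. Everything here is an assumption for illustration: forms of be stand in for AUX, a run of nominal tags stands in for the PP’s object NP, the helper names are hypothetical, and the tiny corpus is invented, so the counts are not the study’s results.

```python
from collections import Counter

BE = {"is", "was", "are", "were", "be", "been", "'s"}          # stand-in for AUX
NP_TAGS = {"DT", "PRP", "PRP$", "NN", "NNS", "NNP", "NNPS", "JJ", "CD"}


def tag(s):
    """Turn 'It/PRP was/VBD ...' into [('It', 'PRP'), ('was', 'VBD'), ...]."""
    return [tuple(tok.rsplit("/", 1)) for tok in s.split()]


def frame_adjective(tagged):
    """Adjective in 'It BE Adj P NP that ...', or None if the frame is absent."""
    for i in range(len(tagged) - 4):
        (w0, _), (w1, _), (adj, t2), (_, t3) = tagged[i:i + 4]
        if w0.lower() != "it" or w1.lower() not in BE or t2 != "JJ" or t3 not in {"IN", "TO"}:
            continue
        j = i + 4
        while j < len(tagged) and tagged[j][1] in NP_TAGS:    # the PP's object
            j += 1
        if j > i + 4 and j < len(tagged) and tagged[j][0].lower() == "that":
            return adj.lower()
    return None


def tally(corpus):
    counts = Counter()
    for sentence in corpus:
        adj = frame_adjective(sentence)
        if adj is not None:
            counts[adj] += 1
    return counts


corpus = [
    tag("It/PRP was/VBD clear/JJ to/TO him/PRP that/IN John/NNP should/MD go/VB"),
    tag("It/PRP is/VBZ important/JJ for/IN the/DT team/NN that/IN we/PRP win/VBP"),
    tag("It/PRP was/VBD clear/JJ to/TO everyone/NN that/IN it/PRP rained/VBD"),
    tag("It/PRP was/VBD cold/JJ outside/RB"),
]
print(tally(corpus).most_common())   # [('clear', 2), ('important', 1)]
```

As with the LSE results, every hit still needs to be read to confirm that it really is expletive it with a following clause.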
Case Study II: Syntax • Heather Taylor, “Interclausal (co)dependency: the case of the comparative correlative”, Proc. Michigan Linguistics Society, October 2004. http://www.ling.umd.edu/events/syntax/abstracts/heather1.PDF
Comparative Correlatives* The Xer …, the Yer … • Highlighted in recent debates about the UG approach • Central question: are these constructions amenable to an analysis based on UG principles, or do they present a challenge to the UG view? Central claim here: the LSE is useful regardless of which side of the debate you’re on. *A.k.a. Conditional correlatives, correlative conditionals, “more-more” constructions
Comparative Correlatives: two analyses • Culicover and Jackendoff (1999): sui generis IP/CP structure; interclausal relationships accounted for outside the syntax • Taylor (2004): UG analysis relating CCs to conditionals • Clause structure in the diagrams: [CP [the more XP]i (that) IP … ti …] [CP [the more XP]j (that) IP … tj …]
Comparative Correlatives • McCawley’s generalization (1988, 1998): deletion of copular main verbs in CCs is sensitive to semantic properties of the subject (generic/specific) • The better an advisor (is), the more successful a student (is). • The more obnoxious Fred {is / *Ø}, the less attention you should pay. • But analysis of LSE data exposes the role of: • Phonological weight of the subject • Parallelism (copula in both clauses, deletion in both clauses), casting doubt on the generalization’s validity
Comparative Correlatives *The more obnoxious Fred, the less attention you should pay to him. ?The more obnoxious Fred’s younger brother, the less attention you should pay to him. ?The longer the day’s activities are, the sleepier the campers. ?The longer the day’s activities, the sleepier the campers are. √The longer the day’s activities, the sleepier the campers. Informant judgments confirm the tendencies indicated by naturally occurring data.
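Tallying clause-final copulas is one way to look for the parallelism effect among search results. The sketch below is a crude, hypothetical string heuristic, not the LSE’s parse-based query: it splits a CC at the first comma and records whether each clause ends in is, are, was, or were (“none” simply means no clause-final copula). The sentences are the examples from this slide with judgment marks dropped; the count says nothing about their acceptability.

```python
import re
from collections import Counter

COPULAS = {"is", "are", "was", "were"}
# Two clauses, each starting with "the", split at the first comma (crude).
CC = re.compile(r"^\s*(the\s.+?),\s*(the\s.+?)[.!?]?\s*$", re.IGNORECASE)


def copula_pattern(sentence):
    """('copula' | 'none', 'copula' | 'none') for a CC's two clauses, or None."""
    m = CC.match(sentence)
    if m is None:
        return None
    return tuple(
        "copula" if clause.split()[-1].lower() in COPULAS else "none"
        for clause in m.groups()
    )


sentences = [
    "The longer the day's activities are, the sleepier the campers.",
    "The longer the day's activities, the sleepier the campers are.",
    "The longer the day's activities, the sleepier the campers.",
    "The better an advisor is, the more successful a student is.",
]
patterns = Counter(copula_pattern(s) for s in sentences)
parallel = sum(n for p, n in patterns.items() if p and p[0] == p[1])
print(parallel, "of", len(sentences), "parallel")   # 2 of 4 parallel
```

Run over real search output, the same tally would separate parallel patterns (copula in both clauses, or in neither) from mixed ones for closer inspection.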
Comparative Correlatives • Overt then • The hungrier Romeo gets, then the more pizza he eats. • Cf. If Romeo gets hungrier, then he eats more pizza. • LSE searches suggest that overt then is not anomalous. • Might this support a UG account that provides a unified treatment of CCs and conditionals? One more fact to add to the theoretical debate!
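Searching for overt then amounts to asking, of each comparative correlative, whether then precedes the second clause. A string-level sketch, assuming plain sentences rather than the LSE’s parsed results; the pattern and function name are hypothetical, and the pattern would also accept non-CC sentences that merely have the shape The …, the ….

```python
import re

# A CC-like string "The ..., (then) the ..."; the optional group captures overt "then".
CC_THEN = re.compile(r"^\s*the\s[^,]+,\s*(then\s+)?the\s", re.IGNORECASE)


def overt_then(sentence):
    """True/False for a CC with/without overt 'then'; None if no CC shape is found."""
    m = CC_THEN.match(sentence)
    return None if m is None else m.group(1) is not None


for s in [
    "The hungrier Romeo gets, then the more pizza he eats.",
    "The hungrier Romeo gets, the more pizza he eats.",
    "If Romeo gets hungrier, then he eats more pizza.",
]:
    print(overt_then(s), "|", s)   # True, then False, then None
```

The conditional is rejected because it does not begin with the; a filter like this only narrows the candidates, which still have to be read.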
Conclusions • The LSE is useful to traditional linguists (traditional?!) • Confirming/disconfirming intuitions (theory → data) • Exposing a wider range of data (data → theory) • The LSE complements new methodological trends • Magnitude estimation, etc. • The LSE is available for anyone to use • http://lse.umiacs.umd.edu
Conclusions • Chomsky (1979): “You can also collect butterflies and make many observations. If you like butterflies, that’s fine; but such work must not be confounded with research, which is concerned to discover explanatory principles of some depth and fails if it does not do so.” • Einstein (1940): “Science is the attempt to make the chaotic diversity of our sense-experience correspond to a logically uniform system of thought [in which] experience must be correlated with the theoretical structure… What we call physics comprises that group of natural sciences which base their concepts on measurements…”
A Web Search Tool for the Ordinary Working Linguist • Must have linguist-friendly “look and feel” • Must minimize learning/ramp-up time • Must permit real-time interaction • Must permit large-scale searches • Must allow search on linguistic criteria • Must be reliable • Must evolve with real use
LSE Example: Text in Parallel Translation Example: seeing how English “completive particle” usages (eat up, indicating a telic event, versus simply eat) are rendered in different languages.
LSE Example: Implicit Objects • Resnik (1993, 1996): • Information-theoretic model of selectional constraints • Model makes predictions with respect to implicit objects • Implicit objects • John ate Ø (= John ate something edible) • *John found Ø (can’t mean John found something findable). • Question from audience: • “Doesn’t your model then predict that the verb titrate should permit implicit objects?” • Options • Find informants for whom titrate is in the working vocabulary • Slog through corpora looking for titrate used “intransitively”
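In Resnik’s model, a verb’s selectional preference strength is the relative entropy between the distribution of semantic classes of its objects, P(c|v), and the prior class distribution, P(c): S(v) = Σ_c P(c|v) log(P(c|v) / P(c)). Verbs with strong preferences (eat) are predicted to allow implicit objects more readily than verbs with weak ones (find), which is why a verb like titrate, whose objects are very restricted, makes an interesting test case. A minimal sketch computing S(v) in bits (the log base only rescales S) from class counts; the class names and numbers are invented toy values, not data from Resnik (1993, 1996).

```python
import math


def preference_strength(object_class_counts, prior):
    """S(v) = sum over classes c of P(c|v) * log2(P(c|v) / P(c)), in bits.

    object_class_counts: how often the verb's objects fall in each class.
    prior: P(c) over the same classes (assumed > 0 wherever a count is > 0).
    """
    total = sum(object_class_counts.values())
    strength = 0.0
    for c, n in object_class_counts.items():
        if n == 0:
            continue                      # 0 * log 0 is taken as 0
        p_c_given_v = n / total
        strength += p_c_given_v * math.log2(p_c_given_v / prior[c])
    return strength


# Hypothetical toy numbers, for illustration only.
prior = {"food": 0.25, "artifact": 0.5, "abstraction": 0.25}
eat = {"food": 8, "artifact": 0, "abstraction": 0}
find = {"food": 1, "artifact": 2, "abstraction": 1}

print(preference_strength(eat, prior))    # 2.0 (objects far from the prior)
print(preference_strength(find, prior))   # 0.0 (objects match the prior)
```

Answering the audience question empirically still requires object counts for titrate, which is exactly the data the two options above would have to supply.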