This course aims to connect the big picture of linguistic theory to specific research possibilities in psycholinguistics through case studies, discussions, moderated debates, and presentations. Readings consist of primary literature available online, and assignments involve more writing than implementing. Collaboration is encouraged, and an A corresponds to 80% or higher. The course covers topics such as language learning, language diversity, language processing, language acquisition, and more.
syllabus etc.
• ‘Official’ materials will be available at colinphillips.net > teaching
• Course goal: link the big picture to specific research possibilities
• Course format: case studies, discussion, “moderated debate”, presentations
• Readings: all primary literature, all available online
• Assignments: more writing than implementing
• Grades: A = 80% … and it’s not your target
• Silence is not golden
• Collaboration: yes, yes, yes!
• Dates: M, W until May 13th. Except: 2/20, 4/3, 4/29, uncertain 2/18.
Topics • “Linguistic theory” and psycholinguistics • Language learning & language diversity • Language processing as evidence • Language acquisition • Grammatical constraints in toddlers and preschoolers • Language processing in children • Language processing • Processing models across levels of analysis • Memory processes in language understanding • Comprehension vs. production
what we need What a good psycholinguist needs: • Understanding of the problem space, how pieces connect • Practical, analytical, computational skills • Zoom in/out: connect details to broader issues • Taste/nose for a good problem • Ability to reason about the implications of evidence (‘theory of the task’) • Knowledge, about language(s), about learner groups, etc. This is different from standard linguistic analysis/theory And: success = research x communication
taste / nose for a problem
• we rarely learn much when we are right
• a good problem is tractable
• a good problem is related to a good hypothesis (at least one)
• conflicts between well-established generalizations are often revealing
• some areas are too messy or crowded (or fast-moving) to be fruitful
• if a project is not worth doing, it is not worth doing well (H. Gleitman)
• simple is good
Language Learning
• Simple logic, difficult implementation
• [Diagram: Grammars, Children, Experience]
Defining the Learning Problem • The output of learning is complex • Examples: that-t, wanna contraction, parasitic gaps, reconstruction, etc. etc. • The output of learning is hard to observe • Crucial input for learning is hard to observe • It’s noisy (on both sides of the ear) • It’s dissimilar from what must be learned • It’s rare • Yet learning is robust • We should be able to describe the learning problem at multiple grains of analysis, just like the output of learning
Obvious variation • English verbs precede their objects (ate the pizza); Japanese verbs follow their objects (piza-o tabeta) • English distinguishes the vowels in sheep and ship; Spanish does not • All Russian verbs encode aspect (± completed action); English verbs do not • etc. etc. etc.
Easy to Observe • English is an SVO language, Japanese is an SOV language • John ate the pizza. • John-ga piza-o tabeta. • English wh-questions involve wh-fronting, Chinese counterparts do not • Who did Sally meet __ ? • Sally met who? [Chinese] • English main verbs follow adverbs, French main verbs precede adverbs • Joe always drinks coffee in the morning. • Jean boit toujours du café avec son petit déjeuner. [CP’s bad French] (‘J. drinks always coffee with his breakfast’)
Not-so-obvious variation • Example 1: Pronoun Interpretation • While John was reading the book, he ate an apple. • While he was reading the book, John ate an apple. [English: ok; Russian: not ok] • John ate an apple while he was reading the book. • He ate an apple while John was reading the book. [impossible in all languages] • Example 2: Constraints on questions • What do you think Sally ate ___? • What do you think that Sally ate ___? • Who do you think ___ ate the donut? • Who do you think that ___ ate the donut? [English: bad; Italian: ok]
Language typology & learning • The Big Idea: identifying constraints on language variation and explaining the success of language learning are essentially the same problem • Universals: properties that are common to all human languages do not need to be learned • Co-variation: clusters of non-universal properties that consistently co-occur in a language reflect a single underlying trait (and so those properties do not need to be learned individually) • Ensuring learning success: any non-obvious property that must be learned should be part of a cluster that includes an ‘obvious’ property, thereby ensuring reliable learning • Current status …
Micro-variation • Greatly expanded database of language information: worldwide typological surveys, dense regional dialect projects • Reliable clusters are harder to find. Not good news for learners. • Large-scale studies biased towards more easy-to-observe phenomena • Important challenge: does variation in ‘non-obvious’ properties show micro-variation? More rigid constraints in domains where learning is more difficult? Testing semantic variation.
that-trace effect
• Who do you think John likes __?
• Who do you think that John likes __?
• Who do you think __ likes John?
• Who do you think that __ likes John?
  English: *   French: *   Levantine Ar.: *
  Spanish: ok   Italian: ok   Beni-Hassan Ar.: ok
• Languages that allow the that-trace sequence also allow a post-verbal subject position (‘Telephoned John.’), i.e., Who do you think that likes John __?
typology problem = learning problem: Principles & Parameters program (1980s)
Challenges… • Link all hard-to-observe facts to easy-to-observe phenomena • Find reliable parameters of variation in the face of microvariation • Find a reliable learning procedure • Show evidence of abstract inference in learning
Language Diversity langscape.umd.edu
statistical learning! Elissa Newport (Georgetown)
Challenges… • Learning is closely tied to experience • Robust learning procedures available, noise sensitive • Evidence of learning available • Almost nothing to say about hard-to-observe phenomena • Little to say about typological consistency
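A toy illustration of what ‘statistical learning’ can buy. The sketch below is my own minimal example in the spirit of the Saffran/Aslin/Newport segmentation studies, not code from any reading; the syllable stream, threshold, and function names are made up for illustration. It segments a continuous stream by positing word boundaries wherever the forward transitional probability between adjacent syllables dips.

```python
# Minimal sketch of transitional-probability segmentation (illustration only).
# Word boundaries are posited where P(next syllable | current syllable) is low.
from collections import Counter

def transitional_probs(syllables):
    """Forward transitional probability P(B | A) for each adjacent pair (A, B)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}

def segment(syllables, threshold=0.75):
    """Insert a word boundary wherever the transitional probability dips."""
    tps = transitional_probs(syllables)
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# A stream built from the made-up 'words' bidaku, padoti, golabu:
# high TPs inside words (1.0), lower TPs across word boundaries (0.5).
stream = "bi da ku pa do ti go la bu bi da ku go la bu pa do ti bi da ku".split()
print(segment(stream))  # ['bidaku', 'padoti', 'golabu', 'bidaku', ...]
```

Note what the toy makes vivid: this procedure learns only from distributional facts in the input, which is why it has little to say about hard-to-observe phenomena or typological consistency.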
Language Processing Logic of problem is less clear
Close Alignment is not Desirable “It has sometimes been argued that linguistic theory must meet the empirical condition that it account for the ease and rapidity of parsing. But parsing does not, in fact, have these properties. […] In general, it is not the case that language is readily usable or ‘designed for use.’” (Chomsky & Lasnik, 1993, p. 18) “we understand everything twice” (Townsend & Bever, 2001) “…the language comprehension system creates representations that are ‘good enough’ (GE) given the task that the comprehender needs to perform.” (Ferreira & Patson, 2007, p. 71)
Number Cognition • Multiple systems for encoding quantity: approximate number system, exact number system • Task-specific routines for specific numerical problems can be distinguished from deeper understanding of number • e.g., 423 x 56 = ___ • e.g., early counting: “one, two, three, four … two dogs!”
Mon-Weds 2/4-6/19 • Branigan & Pickering 2017 on structural priming • Levels of analysis (David Marr and all that) • Logic and history of relations between linguistics and psycholinguistics
Logic of Structural Analysis • Priming: if X primes Y, then they both engage the same stored structure • Acceptability: • If X can be coordinated [X and Y] then it is a constituent • If X can be the antecedent for ellipsis [“and Sally did too”] then it is a constituent • If X can bind Y [“Every dog loves its owner”] then X c-commands Y • Etc. • Argument for levels of representation … • Argument about limitations of acceptability judgments …
Fast: Language Processing • Process models • Focus on what speakers do quickly • Little interest in abilities that manifest more slowly
Slow: Grammatical Theories • Process-neutral models • Focus on what speakers can do when freed of time/memory limits • Talk of ‘mental computation’, but little interest in real-time operations
Not a linguistics vs. psychology split: language development shows no such divide.
Standard Grammatical Analysis (a.k.a. ‘syntactic theory’) • Hierarchical groupings of terminals • All elements are discrete symbolic representations • No time dimension; derivations generally not taken as claims about actual time steps (more discussion: Phillips & Lewis 2013) • Default questions: How acceptable is this sentence? Why is it so (un)acceptable? Does it violate combinatorial rules? Is it just hard? … • This is a narrow set of questions to ask.
Cognitive Models of Sentence Structure Building (a.k.a. ‘processing theories’) • Same questions + many more • Order always matters; time sometimes matters • representation = memory encoding; dependencies = memory access • nodes may have varying activation levels • computations may depend on independent cognitive abilities • comprehension and production work with limited information (Lewis, Vasishth & Van Dyke 2006; Dillon, Sloggett, Mishler, & Phillips 2013)
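To make “varying activation levels” concrete, here is a rough sketch of the ACT-R-style memory assumed by models in the Lewis & Vasishth tradition. This is my own illustration with made-up parameter values and example timings, not code from the cited papers: a chunk’s base-level activation decays with time since its uses, and predicted retrieval latency falls as activation rises, so a recently re-activated subject is retrieved faster at the verb than a stale one.

```python
import math

def base_level_activation(use_times, now, decay=0.5):
    """ACT-R-style base-level activation: ln of the sum of t^(-decay) over past
    uses, where t is the time elapsed since each use (parameters illustrative)."""
    return math.log(sum((now - t) ** (-decay) for t in use_times if now > t))

def retrieval_latency(activation, latency_factor=0.2):
    """Predicted retrieval time falls exponentially as activation rises."""
    return latency_factor * math.exp(-activation)

# A subject encoded at t=0 and re-activated at t=2 is retrieved faster at the
# verb (t=3) than one last touched only at t=0 (longer decay interval).
recent = base_level_activation([0.0, 2.0], now=3.0)
stale = base_level_activation([0.0], now=3.0)
print(retrieval_latency(recent) < retrieval_latency(stale))  # True
```

The point of the sketch is only that “dependencies = memory access” has measurable consequences: retrieval difficulty varies continuously with the memory state, unlike the all-or-nothing judgments of standard grammatical analysis.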
Neural Models of Sentence Structure Building • Different vocabulary / toolkit • Constraints from connectivity & anatomy • Nodes correspond to complex activity patterns (Van der Velde & de Kamps 2006)
One Language System, Multiple Levels of Analysis (Marr 1982): “linguistic” ≈ “computational?”, “cognitive” ≈ “algorithmic?”, “neural” ≈ “implementational” • Marr-levels don’t really fit language research. And there aren’t really 3 discrete levels.
“Suppose […] that one actually found the apocryphal grandmother cell.* Would that really tell us anything much at all? It would tell us that it existed – Gross’s hand-detectors tell us almost that – but not why or even how such a thing may be constructed from the outputs of previously discovered cells.” (Marr 1982, p. 15) * A cell that fires only when one’s grandmother comes into view.
Many Levels of Analysis • structural constraints • (left-right) order • time • resource limitations • memory architecture and access • non-discrete units • etc.
What did John say that Mary thought that Sally likes __? *What did John read a book [RC that praised __]? • Formal constraint on long-distance dependencies: why is the second example bad? • It’s unrepresentable. • It’s representable, but naughty. • It’s just hard.
Multiple Language Systems, Multiple Levels of Analysis • Grammar • Parser • Producer
Motivations for Non-Separation • Simpler architecture • A single real-time model is more testable • Better prospects for unifying levels of analysis • Learning • The counter-evidence is not as strong as suspected
Motivations for Separation • Over-extension of competence-performance distinction • Slaying the DTC dragon (Derivational Theory of Complexity) • Comprehension – Production differences • Arguments for grammatical derivations • What is language “designed for”? • Implementation independence • Systematic judgments are slow; language processing is dumb
Chomsky 1965 • “We thus make a fundamental distinction between competence (the speaker-hearer's knowledge of his language) and performance (the actual use of language in concrete situations). Only under the idealization set forth in the preceding paragraph is performance a direct reflection of competence. […] Observed use of language […] may provide evidence as to the nature of this mental reality, but surely cannot constitute the actual subject matter of linguistics, if this is to be a serious discipline.” (p. 4) • “To avoid what has been a continuing misunderstanding, […] a generative grammar is not a model for a speaker or a hearer. […] When we say that a sentence has a certain derivation with respect to a particular generative grammar, we say nothing about how the speaker or hearer might proceed, in some practical or efficient way, to construct such a derivation. These questions belong to the theory of language use - the theory of performance.” (p. 9)
Townsend & Bever (2001, ch. 2) • “Linguists made a firm point of insisting that, at most, a grammar was a model of competence - that is, what the speaker knows. This was contrasted with effects of performance, actual systems of language behaviors such as speaking and understanding. Part of the motive for this distinction was the observation that sentences can be intuitively ‘grammatical’ while being difficult to understand, and conversely.”
Townsend & Bever (2001, ch. 2) • “…Despite this distinction the syntactic model had great appeal as a model of the processes we carry out when we talk and listen. It was tempting to postulate that the theory of what we know is a theory of what we do, thus answering two questions simultaneously. 1. What do we know when we know a language? 2. What do we do when we use what we know?”
Townsend & Bever (2001, ch. 2) • “…It was assumed that this knowledge is linked to behavior in such a way that every syntactic operation corresponds to a psychological process. The hypothesis linking language behavior and knowledge was that they are identical.” Note: (i) No indication of how the psychological processes are realized (ii) No commitment on which syntactic operations count (iii) No claim that syntactic processes are the only processes in language comprehension.
Miller & Chomsky (1963) • ‘The psychological plausibility of a transformational model of the language user would be strengthened, of course, if it could be shown that our performance on tasks requiring an appreciation of the structure of transformed sentences is some function of the nature, number and complexity of the grammatical transformations involved.’ (Miller & Chomsky 1963: p. 481)
Miller (1962)
1. Mary hit Mark. K(ernel)
2. Mary did not hit Mark. N
3. Mark was hit by Mary. P
4. Did Mary hit Mark? Q
5. Mark was not hit by Mary. NP
6. Didn’t Mary hit Mark? NQ
7. Was Mark hit by Mary? PQ
8. Wasn’t Mark hit by Mary? PNQ
Miller (1962) Transformational Cube [figure: the eight sentence types above arranged as vertices of a cube, with the N, P, and Q transformations as its three dimensions]
Townsend & Bever (2001, ch. 2) • “The initial results were breathtaking. The amount of time it takes to produce a sentence, given another variant of it, is a function of the distance between them on the sentence cube. (Miller & McKean 1964)” • “…It is hard to convey how exciting these developments were. It appeared that there was to be a continuing direct connection between linguistic and psychological research. […] The golden age had arrived.”
Derivational Theory of Complexity
• Miller & McKean (1964): matching sentences with the same meaning or ‘kernel’
  • Joe warned the old woman. (K) / The old woman was warned by Joe. (P): 1.65 s
  • Joe warned the old woman. (K) / Joe didn’t warn the old woman. (N): 1.40 s
  • Joe warned the old woman. (K) / The old woman wasn’t warned by Joe. (PN): 3.12 s
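The DTC’s core prediction can be stated compactly: represent each sentence type as its set of transformations and predict that matching cost grows with the number of transformations separating the two sentences, i.e., their distance on Miller’s cube. The sketch below only illustrates that logic; the distance measure and the example dictionary are mine, not a reconstruction of Miller & McKean’s analysis.

```python
# Illustrative sketch of the DTC logic: sentence types are sets of
# transformations (N = negation, P = passive, Q = question), and the
# predicted cost of matching two types is their distance on Miller's cube,
# i.e., how many transformations must be applied or undone.

SENTENCE_TYPES = {
    "K":   set(),             # Mary hit Mark.
    "N":   {"N"},             # Mary did not hit Mark.
    "P":   {"P"},             # Mark was hit by Mary.
    "Q":   {"Q"},             # Did Mary hit Mark?
    "NP":  {"N", "P"},        # Mark was not hit by Mary.
    "NQ":  {"N", "Q"},        # Didn't Mary hit Mark?
    "PQ":  {"P", "Q"},        # Was Mark hit by Mary?
    "PNQ": {"P", "N", "Q"},   # Wasn't Mark hit by Mary?
}

def cube_distance(a, b):
    """Number of edges between two vertices of the transformational cube
    (size of the symmetric difference of their transformation sets)."""
    return len(SENTENCE_TYPES[a] ^ SENTENCE_TYPES[b])

# DTC predicts the K/PN match (distance 2) to be slower than K/P or K/N
# (distance 1), in line with the 1.65 s / 1.40 s / 3.12 s pattern above.
for pair in [("K", "P"), ("K", "N"), ("K", "NP")]:
    print(pair, cube_distance(*pair))
```

Stating the prediction this baldly also makes its failure mode clear: the measure counts transformations and nothing else, which is exactly what the next slides push back on.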
McMahon (1963)
a. i. seven precedes thirteen K (true)
   ii. thirteen precedes seven K (false)
b. i. thirteen is preceded by seven P (true)
   ii. seven is preceded by thirteen P (false)
c. i. thirteen does not precede seven N (true)
   ii. seven does not precede thirteen N (false)
d. i. seven is not preceded by thirteen PN (true)
   ii. thirteen is not preceded by seven PN (false)
Easy Transformations
• Passive
  • The first shot the tired soldier the mosquito bit fired missed.
  • The first shot fired by the tired soldier bitten by the mosquito missed.
• Heavy NP Shift
  • I gave a complete set of the annotated works of H.H. Munro to Felix.
  • I gave to Felix a complete set of the annotated works of H.H. Munro.
• Full Passives
  • Fido was kissed (by Tom).
• Adjectives
  • The {red house / house which is red} is on fire.
Failure of DTC? • Any DTC-like prediction is contingent on a particular theory of grammar, which may be wrong • … and on a theory of how to embed a grammar in a recognition device • It’s not surprising that transformations are not the only contributor to perceptual complexity • memory demands, which may increase or decrease • ambiguity, where the grammar does not help • difficulty of access, e.g., John donated the dog the mouse chased to the school.