Introduction to Language Acquisition Theory Janet Dean Fodor St. Petersburg July 2013

Class 4. Poverty of the stimulus: Missing positive evidence

Can learners rely on positive evidence?
• On Friday, we observed the almost total absence of direct negative evidence for learners (= evidence about what is not in the language), such as usable corrective feedback.
• It is unclear whether indirect negative evidence can step in to distinguish grammatical from ungrammatical sentences. If not, it follows that a grammar must be acquired on the basis of positive evidence (= evidence of what is in the language).
• Hearing a sentence is positive evidence of its grammaticality. Especially valuable: the ‘triggers’ for target language values of UG-provided parameters.
• Today, we will see that much positive evidence is missing also. And some positive ‘evidence’ is misleading. This has further implications for how much UG must contribute.

Various imperfections of the input

Poverty of the Positive Stimulus (‘POPS’)
• Self-evident: A child hasn't already heard every sentence s/he produces. Children generalize beyond their input.
• So POPS is at least trivially true: The input does not exemplify the whole language that is acquired.
• Least interesting: Learners substitute new words into observed constructions.
  The cat sat. The dog sat.
• More interesting: Recursion. Learners extend the language by generating degree-n clauses like degree-1 clauses.
  Mary said that John thinks that Susan hopes that....
• But not by generalizing degree-0 to degree-1: The Penthouse Principle (What happens upstairs...).
• Most interesting: Learners deduce the existence or properties of a novel construction.

An early example of POPS: Aux sequences
• Child input: It may have rained. It has been raining.
• Mental representation of the input (Chomsky 1957): Aux → Tns (M) (have) (be) + V
• Predicts the existence of: It may have been raining.
• Chomsky concludes: Children would assume that this is in the language even if they had never heard it.
• Kimball (1973; an early corpus study) reported that children don't reliably hear it.
• Pullum & Scholz (2002) dispute this, giving examples from Moby Dick and Wuthering Heights. Also the Wizard of Oz and Peter Pan.
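The prediction made by the Aux rule can be checked mechanically. The following is a minimal sketch, assuming the rule is read as a fixed-order sequence with three independently optional slots; the names `OPTIONAL_SLOTS`, `aux_expansions`, and `observed` are illustrative and not from the slides. It enumerates every expansion of the rule and shows that the pattern behind "It may have been raining" is licensed even though only the two input patterns were observed.

```python
from itertools import product

# Optional slots of the rule Aux -> Tns (M) (have) (be), in their fixed order.
OPTIONAL_SLOTS = ["M", "have", "be"]

def aux_expansions():
    """Return every expansion of Aux -> Tns (M) (have) (be) as a tuple of symbols."""
    expansions = []
    for choices in product([False, True], repeat=len(OPTIONAL_SLOTS)):
        chosen = [slot for slot, keep in zip(OPTIONAL_SLOTS, choices) if keep]
        expansions.append(tuple(["Tns"] + chosen))
    return expansions

# Patterns attested in the child input quoted on the slide.
observed = {
    ("Tns", "M", "have"),    # It may have rained.
    ("Tns", "have", "be"),   # It has been raining.
}

all_patterns = aux_expansions()
unattested = [p for p in all_patterns if p not in observed]

print(len(all_patterns))                         # 8
print(("Tns", "M", "have", "be") in unattested)  # True: "It may have been raining."
```

The rule licenses eight patterns in total, so a learner who represents the input this way generalizes well beyond the two attested sequences.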

Structure-dependent auxiliary inversion
• The most-discussed argument for POPS. Chomsky (1975): the auxiliary-inversion transformation in English questions.
  (a) The man is tall.
  (b) Is the man tall?
• Two different generalizations are compatible with (a) and (b).
  - Move the structurally highest aux to the front. CORRECT
  - Move the linearly first aux to the front. WRONG
• The linear generalization gives wrong results in complex sentences. It predicts (d), instead of the correct (e).
  (c) The man who is tall is in the room.
  (d) * Is the man who tall is in the room?
  (e) Is the man who is tall in the room?
• But there’s no way to tell that from the one-clause examples.
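The two candidate generalizations can be compared on examples (a) and (c) with a small sketch. It assumes a deliberately crude representation that is not from the slides: a clause is a Python list whose first element is the subject NP (itself a list, possibly containing a relative clause), and "structurally highest aux" is approximated as the first auxiliary that is an immediate daughter of the clause. The function names are hypothetical.

```python
AUX = {"is"}

def flatten(tree):
    """Flatten a nested list of words into a flat word list."""
    words = []
    for node in tree:
        if isinstance(node, list):
            words.extend(flatten(node))
        else:
            words.append(node)
    return words

def front_linear(tree):
    """WRONG rule: move the linearly first auxiliary to the front."""
    words = flatten(tree)
    i = next(k for k, w in enumerate(words) if w in AUX)
    return [words[i]] + words[:i] + words[i + 1:]

def front_structural(tree):
    """CORRECT rule: move the first aux that is an immediate daughter of the clause."""
    i = next(k for k, node in enumerate(tree)
             if isinstance(node, str) and node in AUX)
    return flatten([tree[i]] + tree[:i] + tree[i + 1:])

simple_sentence = [["the", "man"], "is", "tall"]
complex_sentence = [["the", "man", ["who", "is", "tall"]], "is", "in", "the", "room"]

# On the one-clause example the two rules are indistinguishable.
print(front_linear(simple_sentence) == front_structural(simple_sentence))  # True

# On the complex example they diverge, as in (d) versus (e).
print(" ".join(front_linear(complex_sentence)))      # is the man who tall is in the room
print(" ".join(front_structural(complex_sentence)))  # is the man who is tall in the room
```

The structural rule needs the bracketing as input; the linear rule does not. That difference is the point of the argument: the one-clause data underdetermine which kind of representation the rule is stated over.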

Structure-dependence of transformations
• Crain and Nakayama (1987) tested children aged 3-6 years.
  ‘Ask Jabba if the boy who is watching Mickey is happy.’
  (f) Is [the boy who is watching Mickey] _ happy?
  (g) *Is [the boy who _ watching Mickey] is happy?
  The children made no linear-generalization errors like (g).
• Chomsky famously claimed “a person could go through much or all of his life without ever having been exposed to” the correct version; hence the structure-dependence of transformational rules must be innately given.
• Many disagreements over 3 decades about whether children do reliably hear such sentences. See slides below.

Strongest example of POPS: Parasitic gaps
• Which article did you file without reading it?
  Which article did you file without reading e?
  Both good. Same meaning. Overt pronoun is optional.
• John was killed by a rock falling on him.
  *John was killed by a rock falling on e.
  Overt pronoun cannot be omitted here.
• Chomsky claims (1983, i.a.) that constructions like these “are so rare that it is quite likely that during the period a child masters his native language (the first five or six years of life), he never hears any of these constructions, or he hears them very sporadically. Nonetheless, every native speaker of English knows flawlessly when you can and can't drop pronouns in these kinds of sentences.”

Parasitic gap constructions have remarkable properties
• P-gaps occur inside extraction islands, such as adjunct islands (without reading e) and subject/RC islands:
  Which linguist did [ everyone who met e ] admire t?
• Island constraints block extractions that create a single ‘normal’ empty category (a trace):
  * Which linguist did [ everyone who met t ] admire Sue?
• Regardless of how children come to know about island constraints (innately?), they also come to know that these constraints are not applicable to parasitic gaps.
• (a) It is not clear that they receive positive data for p-gaps. (b) Even if they do, they must not overgeneralize island constraint violations to ‘normal’ gaps. Somehow, they know that p-gaps are grammatical but special.

Do p-gaps follow from independently established principles of UG?
• Chomsky’s claim: The existence of parasitic gaps, and the fine constraints on when and where they can occur, aren’t acquired from experience.
• So, despite their curious properties, they must follow from innate principles + positive evidence about other kinds of empty categories.
• In “Some Concepts and Consequences…” (1982), he argues that a p-gap is a null pronoun which becomes A-bar-bound by the moved antecedent of the ‘real’ gap, as long as the ‘real’ gap doesn’t c-command it.
• He claims that the existence of p-gaps is entailed by the existence of null pronouns, plus the binding theory, θ-criterion, and Projection Principle.

Later refinements of the linguistic analysis
• Chomsky’s (1982) analysis thus explained why these constructions (not needed for communication!) exist in many (all?) languages. But it didn’t cover all the facts.
• the person that John described t …
  (a) …without examining [any pictures of e]. (e inside R-branch)
  (b) * …without [any pictures of e] being on file. (e inside L-branch)
• Later proposals by Kayne (Connectedness, 1983) and Chomsky (Barriers, 1986).
• Kayne: the ‘government projection’ (g-projection) of the p-gap must meet up with the g-projection of the trace. See the trees on the next slides.
• This blocks p-gaps inside left branches, as in (b), since government is to the right in English, unless the left branch satisfies Connectedness:
  (c) a person who people that talk to e usually admire t. (e inside L-branch)

Connectedness constraint on the g-projections of the p-gap and the trace
[Tree diagram] Not connected. Ungrammatical.

Connected g-projections of the p-gap and the trace
[Tree diagram] Connected. Grammatical, as long as NP is phonologically null (= the trace, which licenses the p-gap). Ungrammatical if NP is overt (*him, *Sam).

Are all instances of POPS evidence of innate linguistic knowledge?
• For p-gaps we are inclined to conclude that what the learner’s brain has to supply, to compensate for stimulus poverty, is non-trivial and specifically linguistic.
• But in other cases, maybe a child’s extensions of the patterns in the input sample are guided just by general principles of induction, rather than specialized language-specific generalization principles.
• There have been some recent challenges of this kind to POPS arguments for UG.
• One suggestion: Any learner could arrive at these same generalizations just by tracking the statistical properties of input sentences.

Structure-dependent auxiliary-inversion
• Reali & Christiansen (2005) maintained that structure-dependent aux-inversion can be learned just by tracking the frequency of bigrams (two-word sequences).
• Their bigram-based model was 96% correct in choosing between the ✓ and * versions of pairs like:
  Is the boy [who is crying] t hurt?
  * Is the boy [who t crying] is hurt?
• But Kam, Stoyneshka, Tornyova, Fodor & Sakas (2008) showed that this was due entirely to the high corpus frequency of the bigram ‘who is’ in all of R&C’s ✓ versions.
• The bigram model failed on all other variants of the same general rule (object gap, do-support, main verb inversion in Dutch) which don’t have ‘who is’.
  Did the boy who Sue likes…
• Conclusion: the R&C success was just a fluke.
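To make the ‘who is’ point concrete, here is a toy sketch of a bigram scorer. It is not Reali & Christiansen's model or corpus: the six-sentence corpus is invented for illustration, add-one smoothing is an assumption, and the second test pair (with the relativizer "that") is a hypothetical variant chosen only because it lacks the bigram ‘who is’. The outcome says nothing about real child-directed speech; it only shows how a single frequent bigram can decide the choice.

```python
import math
from collections import Counter

def train_bigrams(sentences):
    """Count bigrams and their left contexts over a list of sentences."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        vocab.update(words)
        contexts.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))
    return contexts, bigrams, len(vocab)

def log_prob(sentence, model):
    """Log probability of a sentence under add-one-smoothed bigram estimates."""
    contexts, bigrams, vocab_size = model
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(words, words[1:])
    )

def prefers_first(first, second, model):
    """True if the model scores the first sentence above the second."""
    return log_prob(first, model) > log_prob(second, model)

# Invented toy corpus; note that 'who is' is frequent and no complex question occurs.
toy_corpus = [
    "is the boy hurt",
    "the boy who is crying is hurt",
    "the girl who is tall is happy",
    "is the girl happy",
    "who is crying",
    "who is that",
]
model = train_bigrams(toy_corpus)

# Pair containing 'who is': the grammatical version wins.
print(prefers_first("is the boy who is crying hurt",
                    "is the boy who crying is hurt", model))   # True

# Same structure without 'who is': the ungrammatical version wins.
print(prefers_first("is the boy that is crying hurt",
                    "is the boy that crying is hurt", model))  # False
```

In the first pair the grammatical string is carried by the well-attested bigram ‘who is’. In the second pair that support is gone, and the ungrammatical string scores higher because ‘crying is’ and ‘is hurt’ occur in the toy declaratives.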

Empirical evaluation of POPS: a methodological challenge
• What would substantiate the claim that children acquire some language facts without benefit of relevant positive evidence?
  - Psycholinguistic data showing the child knows X at age Y.
  - Plus evidence that the child wasn’t exposed to X by age Y.
• Practical problems:
  - Very difficult to prove a child was not exposed to X (need day-and-night recordings for years!). Instead, estimate, based on absence from CDS corpora.
  - Does overhearing adult talk count? (See next slide.)
  - Does one instance count as exposure?
  - Do non-conversational genres like nursery rhymes?

Overheard language: incidental learning
• Saffran et al. (1997): 6-7 year olds overheard a 21-minute tape of nonsense words, while engaged in an art project. No instruction to listen or remember.
• Two days in a row.
• Then tested on distinguishing words from non-words.
  Words: babupu bupada dutaba patubi pidabu tutibu
  Nonwords: batipa bidata dupitu pubati tipabu tapuba
• Children correct 68.3% (p<.01). Adults correct 73.1%.
• Conclusion: “incidental learning is a robust phenomenon that may play a role in natural language acquisition”.
• If so, children may learn from much richer input than CDS.

Pullum & Scholz don’t agree that child-directed speech is a unique (limited) genre
• Because Hudson (1994) found approximately the same % of nouns in texts of many genres (including some child language, though no CDS).
• Some findings show genre differences, e.g., reduced relative clauses are much rarer in conversation than in newspapers.
• But P&S deem it appropriate to cite, as evidence against POPS, examples of aux-inversion from The Wall Street Journal and The Importance of Being Earnest (1895).
  WSJ: Is a young professional who lives in a bachelor condo as much a part of the middle class as a family in the suburbs?
  Oscar Wilde: Who is that young person whose hand my nephew Algernon is now holding in what seems to me a peculiarly unnecessary manner?
• P&S do admit children’s sentences are short, and suggest a 4-word Aux-example: Has whoever left returned?

Corpus estimates of exposure to positive input
• P&S cite just 3 examples from CDS. To Nina at 2+ yrs. All where’s (JDF: possibly treated as a unit by the child).
  Where’s the little blue crib that was in the house before?
  Where’s the other dolly that was in here?
  Where’s the other doll that goes in there?
• Sampson (1989) cites a children’s encyclopedia for 10-yr olds, and a William Blake poem (The Tyger, 1794; see below) which also contains *main verb fronting:
  In what distant deeps or skies burnt the fire of thine eyes?
• A serious CDS corpus search (Yang & Legate, 2002): For both +NullSubj and +V2, acquired at approx. age 3;2 (= Crain & Nakayama’s age-of-acquisition data for aux-inversion), the relevant evidence made up 1.2% of total CDS input.
• But the relevant evidence for aux-inversion in CDS: Nina: 0.07%, Adam: 0.05%.

Other practical problems in substantiating POPS
• Exposure to construction X before the child has the capacity to process construction X should not count, but it is difficult to establish when that is.
• Production tends to be delayed relative to perception in any domain (it's a more demanding task). So production evidence that the child knows X by age Y may be absent even if the child really does know X. → Comprehension experiments are needed.
• Adults may anticipate readiness of a child to cope with X (e.g., relative clauses). So it’s not unlikely that a child who utters X has already heard X, even if the hearing was not the cause.
• So, even if children do "invent" X on the basis of UG, as Chomsky’s POS claim implies, could it be proven?

A novel form of argument for POPS
• Children sometimes invent ungrammatical forms, which we can be pretty sure they didn’t hear.
• If that novel form is grammatical in other languages, it’s arguably not a random invention or error, but is drawn from the possibilities made available by UG. The child just has a UG parameter set wrong.
• Thornton (1990): Long-distance Wh-extraction in English. 2-5 yr olds often insert an overt Wh-item into the intermediate Comp. Ungrammatical in English, but ok in other languages (Romani, some German dialects,...).
  * Who do you think who Grover wants to hug?
• Conclusion: Children use UG to compensate for gaps in their input evidence. (How can I form a long-distance question?)
• Thornton suggests they’re pronouncing the intermediate trace, just a PF-level mistake.

Another example of inventing a UG-compatible but wrong form
• Corpus study: Left-branch extractions in Dutch by several children (approx. 3-6 yrs); van Kampen (1997). These are grammatical in Latin & Polish, but not in adult Dutch.
  *Welke wil jij liedje zingen? (Which want you t song sing?)
  *Ik weet niet hoe het lang is. (I know not how it t long is.)
• These children are not copying their input. But unrelated children behave alike, so these are not just random errors.
• Proposed explanation: They’re exercising a UG option, when their input info (positive/negative?) is insufficient.
• van Kampen argues that left-branch extraction is a learner’s default, because it more closely reflects scope at LF.
• In general: This is a promising form of argument for UG-guided learning. More cases would be welcome.

Joint implications of POPS and PONS
• Assuming incremental learning (= retain or change grammar after each input), what size & shape is the generalization a learner formulates based on a novel input?
• Due to POPS, the positive input merely sets a lower bound on the generalization: it must license at least that example. But which others as well?
• Due to PONS, negative data (if any) merely set a very loose upper bound: little info about what not to license.
• In between POPS and PONS is a huge information gap, allowing a host of alternative grammar hypotheses.
• Yet there is consistency across all normally developing children.
• IN THIS GAP, the choice of grammar must be made by something internal to the learner (all learners) = UG or LM.
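The lower-bound and upper-bound idea can be pictured with sets. The sketch below assumes a drastically simplified setting that is not from the slides: each candidate grammar is just the finite set of sentences it licenses, and the grammar names and the sentences "a", "b", "c" are placeholders. Positive input removes only grammars that fail to license it; negative input (if there were any) removes only grammars that license it.

```python
# Hypothetical candidate grammars, each modeled as the set of sentences it licenses.
candidates = {
    "G_small": {"a"},
    "G_mid": {"a", "b"},
    "G_large": {"a", "b", "c"},
    "G_other": {"b", "c"},
}

def consistent(candidates, positive_input, negative_input=frozenset()):
    """Names of grammars that license all positive input and no negative input."""
    return sorted(
        name for name, language in candidates.items()
        if positive_input <= language and not (negative_input & language)
    )

# Positive input alone sets only a lower bound: three grammars survive.
print(consistent(candidates, {"a"}))                  # ['G_large', 'G_mid', 'G_small']

# More positive input raises the lower bound but still leaves a choice.
print(consistent(candidates, {"a", "b"}))             # ['G_large', 'G_mid']

# Only negative input would supply an upper bound.
print(consistent(candidates, {"a", "b"}, {"c"}))      # ['G_mid']
```

With little or no negative input, the data alone do not select among the surviving grammars, so the selection has to come from the learner.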

Now, misleading positive input

Ungrammatical input may occur (but is not the major problem)
• An early influential study: Adult speech to young children “is unswervingly wellformed” (Newport, Gleitman & Gleitman 1977).
• But there are many fragments in child-directed speech (e.g., “The blue one”, “Over there”). Might a child mistake these for whole sentences? (JDF: Ok if well-formed in context. Children have to learn ellipsis too.)
• Children may misanalyze what they hear, which is equivalent to hearing ungrammatical sentences.
• Misanalysis errors aren’t easy to document in syntax, but are observed for morphology:
  weather report → weathery port → weathery man

Misleading input is ignored: how?
• Learners evidently filter out some of their positive input. Not just slips of the tongue, other speech errors, and L2 speech.
• They hear archaic forms in nursery rhymes and stories, but they don’t adopt them into their own grammar. E.g.,
  (i) Now I lay me down to sleep.
  (ii) Did he who made the Lamb make thee? (Wm Blake; cited by Sampson 1989 as a positive source for correct aux-inversion, contra Chomsky)
• The same poems contain seriously misleading info:
  (iii) I pray the Lord my soul to keep.
  (iv) Did he smile his work to see? (*Topicalization in infinitival complement clause.)
• How do children know which examples to ignore?

‘Peripheral’ input is not generalized: why/how?
• Children don’t mistake exceptional constructions for core.
• Children are exposed to (and use!) many idioms, some of which resemble triggers for parameters. Danger! E.g.,
  (1) Out popped the cuckoo. (Could mis-trigger Verb Second for English main verbs.)
  (2) I’m gonna have me a nice hot bath. (Could mis-trigger local binding of pronouns, as in Maori.)
• How do children know what’s representative of the language as a whole? It can’t be frequency; some idioms are very frequent. (Here you are. Let go of me.)
• Could UG help learners distinguish core from periphery? Perhaps: designated triggers for parameters (Fodor 1994). The canonical instance: Mary_i pinched her_i → +LocalBinding

Can stimulus poverty reveal the exact content of UG?
• POS can reveal what must follow from UG.
• POS rules out any linguistic theory too weak to provide the missing input information or to represent the input in a relevant way (e.g., finite state grammars; bigrams).
• Also all theories that wrongly predict which patterns of generalization over the input are natural.
• But the 'subtraction method' cannot by itself deliver the "psychologically real" grammar (the exact mix of principles, rules, lexical entries, constraints on derivations or representations, etc.) in people’s heads.
• Different linguistic theories assume different UGs, which capture that information in different ways.

Summary of stimulus poverty
• Children’s input is seriously uninformative in some basic respects, and potentially misleading.
• Yet learners quite reliably end up with (essentially) the same grammar as each other, and as their adult models.
• Innate linguistic knowledge (UG) would be capable of resolving some of the indeterminacies, though not all.
• Strategies of the learning mechanism may help: innately given procedures for coping with incomplete input info, e.g., the Uniqueness Principle and the Subset Principle (Class 5).
• If not, that leaves a vacuum to be filled, perhaps by more powerful data-driven / statistical / probabilistic / neural network approaches. Many of these reject UG entirely. But hybrid models may be developed.

Please read, for Friday (Class 5)
• 1½ pages, on a retrenchment paradox due to the Subset Principle.
• This is an excerpt from Fodor, J. D. ‘Syntax acquisition: An evaluation measure after all?’ In Of Minds and Language: The Basque Country Encounter with Noam Chomsky (2009).
