Differential Case Marking: Objects

Optimality Theory and Pragmatics
Compact seminar at the University of Vienna, summer semester 2005
Manfred Krifka
Stochastic Optimality Theory, Learning Algorithms, Evolutionary Optimality Theory

Differential Case Marking: Objects
• In many languages, case marking of subject and object depends on a variety of factors.
• Hebrew: Only definite object NPs are case marked.
  Ha-seret her’a ‘et ha-milxama.
  ‘the-movie showed ACC the-war’
  Ha-seret her’a (*‘et) milxama.
  ‘the-movie showed (*ACC) war’
• Spanish: Only animate object NPs are case marked.
  Busco a una señora.
  ‘I-look-for ACC a woman’
  Busco (*a) una casa.
  ‘I-look-for (*ACC) a house’
• Bossong (1985): differential object marking, attested in more than 300 languages.
• Explanation in Aissen (2002): two scales determine differential object marking:
  • Animacy: Human > Animate > Inanimate
  • Definiteness: Pers.Pronoun > Name > Def.NP > Indef.Spec.NP > Nonspec.NP
• Generalization: Object marking is more likely at the high end of the scales.

A closer look: DOM in medieval Spanish
[Figure from Judith Aissen: Differential Object Marking. Iconicity vs. Economy. Draft, Stanford 2000]

Differential Object Marking in German: Nominative / Accusative Syncretism
• German also shows differential object marking, determined by gender:
  • Masculine: Der Mann sieht den Hasen. Der Hase sieht den Mann. NOM ≠ ACC
  • Feminine: Die Frau sieht den Hasen. Der Hase sieht die Frau. NOM = ACC, syncretism
  • Neuter: Das Kind sieht den Hasen. Der Hase sieht das Kind. NOM = ACC, syncretism
• Syncretism in the neuter is inherited (it is general in Indo-European languages); in the feminine it developed in Middle High German / Early New High German.
• Syncretism within one inflection class of nouns (n-stems) depends on animacy:
  der Mensch / den Mensch-en, der Bote / den Bot-en, der Hase / den Has-en ...
  der Regen / den Regen, der Kragen / den Kragen, der Besen / den Besen ...
• Fish count as inanimate: der Karpfen / den Karpfen, der Rochen / den Rochen
• Cases that can be categorized either way: der Same(n) / der Wille(n), der Friede(n), ...
• Doublets: der Drache, der Drachen; der Rappe, der Rappen; der Lump, der Lumpen

Differential Object Marking in German: Nominative / Accusative Syncretism
• Animacy as a factor in case syncretism in general: masculine nouns are more likely to be animate than feminine nouns.
• Example: corpus of Ruoff (1981), 500,000 words, spoken everyday narratives from the Swabian region

Differential Object Marking in German: Nominative / Accusative Syncretism
• Animacy as a systematic factor: nominal derivation
• Masculine derivations are often animate: Lehr-er, Lehr-ling, Praktik-ant, Psycho-loge
• Feminine derivations are not animate: Frei-heit, Freund-schaft, Kleid-ung, Diskuss-ion, Sing-erei
  Exception: derivation of female forms (Movierung), e.g. Präsident-in.
• Neuter derivations are likewise often not animate.
• Hence: case syncretism in German also has an affinity to the general regularities of differential object marking.

Differential Case Marking: Subjects
• Differential subject marking (“Split Ergativity”). Example: Dyirbal, Australia.
• 1st and 2nd person pronouns: No marking of subject NP
  ɲana banaga-nyu. ‘we returned’
  ɲana ɲurra-na bura-n. ‘we you-ACC saw’
  ɲurra banaga-nyu. ‘you returned’
  ɲurra ɲana-na bura-n. ‘you us-ACC saw’
• Other pronouns and NPs: Ergative marking of the subject of a transitive sentence:
  ɲuma banaga-nyu. ‘Father returned’
  ɲuma-ɲgu yabu bura-n. ‘Father-ERG mother saw’
• Mixed system:
  ɲuma-ɲgu ɲurra-na bura-n. ‘Father-ERG you-ACC saw’
• In hundreds of languages (Basque, Georgian, Hindi ...) the distribution of subject marking is governed by similar scales (Silverstein 1976):
  • Animacy: Human > Animate > Inanimate
  • Definiteness: Pers.Pronoun > Name > Def.NP > Indef.Spec.NP > Nonspec.NP
• Generalization: Subject marking is more likely at the low end of the scales.

Differential Case Marking: Scale Alignment
• Aissen (2002): Case marking patterns as the result of the alignment of two scales, here illustrated with the definiteness scale.
• Alignment of the two scales produces the following markedness scales:
  • Subj/pronoun > Subj/name > Subj/def > Subj/spec > Subj/nonspec
  • Obj/nonspec > Obj/spec > Obj/def > Obj/name > Obj/pronoun
[Figure labels: harmonic alignment, case marking unlikely; disharmonic alignment, case marking likely]

Scale Alignment and OT constraints
• Expression of marking tendencies, Hebrew:
• Relevant parts of basic hierarchies: Subj > Obj, +def > –def
• Aligned hierarchies (harmonic > disharmonic):
  Subj/+def > Subj/–def
  Obj/–def > Obj/+def (only this one is relevant here)
• Corresponding constraint ranking: *Obj/+def >> *Obj/–def
  “Not marking definite objects is worse than not marking indefinite objects”
  Better interpretation: “Case marking of definite objects is more important than case marking of indefinite objects”
• Markedness constraint: *STRUC: Avoid Structure (explicit marking): Speaker Economy
  (not strictly necessary for Hebrew case, but relevant later)
• Constraint ranking: *Obj/+def >> *STRUC >> *Obj/–def
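The ranking *Obj/+def >> *STRUC >> *Obj/–def can be checked with a small strict-domination evaluation. This is only a minimal sketch: the function names and the encoding of candidates as "object NP, marked or unmarked" are assumptions made for illustration, not part of Aissen's formalization.

```python
# Ranking as given on the slide, highest-ranked constraint first.
RANKING = ["*Obj/+def", "*STRUC", "*Obj/-def"]

def violations(definite, marked):
    """Violation profile of one candidate: an object NP that is marked or not."""
    return {
        "*Obj/+def": int(definite and not marked),      # unmarked definite object
        "*Obj/-def": int(not definite and not marked),  # unmarked indefinite object
        "*STRUC": int(marked),                          # any overt case morpheme
    }

def optimal_marking(definite):
    """Return True if the case-marked candidate wins under strict ranking."""
    def profile(marked):
        v = violations(definite, marked)
        # Lexicographic comparison of this list implements strict domination.
        return [v[c] for c in RANKING]
    return min([False, True], key=profile)

# Hypothetical summary: which kinds of object end up case marked.
pattern = {"definite": optimal_marking(True), "indefinite": optimal_marking(False)}
```

Under this encoding the definite object is marked and the indefinite object is not, which is the Hebrew pattern described on the slide.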

Derivation of Dyirbal System
• The facts, again:
• 1st and 2nd person pronouns: No marking of subject NP
  ɲana banaga-nyu. ‘we returned’
  ɲana ɲurra-na bura-n. ‘we you-ACC saw’
  ɲurra banaga-nyu. ‘you returned’
  ɲurra ɲana-na bura-n. ‘you us-ACC saw’
• Other pronouns and NPs: Ergative marking of the subject of a transitive sentence:
  ɲuma banaga-nyu. ‘Father returned’
  ɲuma-ɲgu yabu bura-n. ‘Father-ERG mother saw’
• Mixed marking:
  ɲuma-ɲgu ɲurra-na bura-n. ‘Father-ERG you-ACC saw’
• No marking:
  ɲana ɲuma bura-n. ‘we Father saw’

OT Constraints, Case marking in a Dyirbal-like Language
• Basic hierarchies, universal: S(ubj) > O(bj), 1(st) > 3(rd)
• Aligned hierarchies: S/1 > S/3, O/3 > O/1
• Generated constraint orders: *S/3 >> *S/1, *O/1 >> *O/3
  “marking of S/3 is more important than marking of S/1”
• Combined constraints: {*S/3, *O/1} >> *STRUC >> {*S/1, *O/3}

Where do the hierarchies come from?
• Aissen simply assumes hierarchies like S > O, 1 > 3, def > indef as given.
• Bresnan, Dingare & Manning (2001), Zeevat & Jäger (2002): The hierarchies can be explained by typical patterns of language use.
• Example: Subjects and objects in 3151 simple transitive clauses of Swedish everyday conversation (SAMTAL corpus, Ö. Dahl)

Biases in the SAMTAL Corpus
[Table: probabilities that subjects and objects have certain properties, SAMTAL corpus of spoken Swedish (collected by Ö. Dahl, analyzed by Zeevat & Jäger)]
Resulting statistical biases, expressed as conditional probabilities; e.g., p(Subj | +def), the probability that a +def NP is a subject, is 63%:

  p(Subj | +def)  = 63%    p(Subj | –def)  =  4%
  p(Obj  | +def)  = 37%    p(Obj  | –def)  = 96%
  p(Subj | +pron) = 66%    p(Subj | –pron) =  9%
  p(Obj  | +pron) = 33%    p(Obj  | –pron) = 91%
  p(Subj | +anim) = 90%    p(Subj | –anim) =  7%
  p(Obj  | +anim) = 10%    p(Obj  | –anim) = 93%

This holds for a fairly large and representative corpus of spoken Swedish; the tendencies can be reproduced for other languages and communities, but collecting further data is absolutely necessary.

Statistical Bias and Bidirectional OT
• Zeevat & Jäger (2002), Jäger (2003)
• Economical encoding:
  • Case marking is disfavored for frequent combinations, e.g., indefinite objects: p(Obj | –def) = 96%
  • but case marking is favored for infrequent combinations, e.g., indefinite subjects: p(Subj | –def) = 4%, definite objects: p(Obj | +def) = 37%
• A case for weak bidirectional optimization?
  • Preference for simple forms: –case >> +case
  • Preference for meanings that correspond to the bias: Obj/–def >> Obj/+def
[Diagram of candidate pairs: ⟨–case, Obj/–def⟩, ⟨+case, Obj/–def⟩, ⟨–case, Obj/+def⟩, ⟨+case, Obj/+def⟩. The optimal pairs give the case marking pattern of Hebrew.]
• Problem: There is no choice to interpret a given NP as +def or –def; this is explicitly marked!
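The economy idea can be made concrete with the conditional probabilities reported for the SAMTAL corpus. The sketch below is an illustration only: treating "infrequent" as "not the majority grammatical function for this feature" is an assumption, and the names `P`, `expected_function` and `favors_marking` are hypothetical.

```python
# Conditional probabilities p(function | feature) as reported for the SAMTAL corpus.
P = {
    ("Subj", "+def"): 0.63, ("Obj", "+def"): 0.37,
    ("Subj", "-def"): 0.04, ("Obj", "-def"): 0.96,
    ("Subj", "+pron"): 0.66, ("Obj", "+pron"): 0.33,
    ("Subj", "-pron"): 0.09, ("Obj", "-pron"): 0.91,
    ("Subj", "+anim"): 0.90, ("Obj", "+anim"): 0.10,
    ("Subj", "-anim"): 0.07, ("Obj", "-anim"): 0.93,
}

def expected_function(feature):
    """Grammatical function a hearer would guess from the bias alone."""
    return max(("Subj", "Obj"), key=lambda function: P[(function, feature)])

def favors_marking(function, feature):
    """Assumed criterion: marking pays off when the bias points the other way."""
    return function != expected_function(feature)

# The three combinations mentioned on the slide.
examples = {
    "indefinite object": favors_marking("Obj", "-def"),
    "indefinite subject": favors_marking("Subj", "-def"),
    "definite object": favors_marking("Obj", "+def"),
}
```

With these figures, indefinite objects come out as not needing marking, while indefinite subjects and definite objects do, in line with the bullets above.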

Statistical Bias and Bidirectional OT
• Zeevat & Jäger assume the following constraints:
  • *STRUC: Avoid structure, i.e. avoid overt marking
  • FAITH: Faithful interpretation of case morphemes, e.g. ACC: Obj, ERG: Subj
  • BIAS: An NP of a certain category is interpreted as having the grammatical function that is most probable for this category, e.g. inanimate NP: Obj
• Ranking: FAITH >> BIAS >> *STRUC
• Hearer optimality and speaker optimality (Asymmetric Bi-OT):
  • Hearer optimality: For a given form, choose the meaning that shows the least severe constraint violation! In the case at hand, interpret an NP according to its case marking; if there is no case marking, follow the statistical bias (I-Implicature).
  • If two competing forms are both hearer-optimal for a given meaning, the speaker can choose the preferred one (here: the one without case marking).
  • Hearers have to be served first, as speakers want to be understood.
• Definition:
  • A pair ⟨F, M⟩ ∈ GEN is hearer-optimal iff there is no alternative ⟨F, M’⟩ ∈ GEN such that ⟨F, M’⟩ > ⟨F, M⟩.
  • A pair ⟨F, M⟩ ∈ GEN is optimal iff it is hearer-optimal and there is no alternative ⟨F’, M⟩ ∈ GEN such that ⟨F’, M⟩ is hearer-optimal and ⟨F’, M⟩ > ⟨F, M⟩.
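The two definitions can be tried out on a toy language with the forms Ø, ERG and ACC and the ranking FAITH >> BIAS >> *STRUC. This is a hedged sketch rather than Zeevat & Jäger's implementation: the candidate set, the one-violation-per-constraint encoding, and the assumption that animate NPs are most likely subjects and inanimate NPs most likely objects are all simplifications chosen for illustration.

```python
from itertools import product

FORMS = ["Ø", "ERG", "ACC"]
MEANINGS = ["Subj", "Obj"]
RANKING = ["FAITH", "BIAS", "*STRUC"]          # highest-ranked first
LIKELY = {"anim": "Subj", "inanim": "Obj"}     # assumed statistical bias
CASE_MEANING = {"ERG": "Subj", "ACC": "Obj"}   # what each case morpheme encodes

def profile(category, form, meaning):
    """Violations of <form, meaning> for an NP of the given category, in ranking order."""
    v = {
        "FAITH": int(form in CASE_MEANING and CASE_MEANING[form] != meaning),
        "BIAS": int(LIKELY[category] != meaning),
        "*STRUC": int(form != "Ø"),
    }
    return tuple(v[c] for c in RANKING)

def hearer_optimal(category, form, meaning):
    """No alternative meaning for the same form has a strictly better profile."""
    own = profile(category, form, meaning)
    return all(profile(category, form, m) >= own for m in MEANINGS)

def optimal(category, form, meaning):
    """Hearer-optimal, and no hearer-optimal alternative form is strictly better."""
    if not hearer_optimal(category, form, meaning):
        return False
    own = profile(category, form, meaning)
    return all(
        profile(category, f, meaning) >= own
        for f in FORMS
        if hearer_optimal(category, f, meaning)
    )

def optimal_pairs(category):
    return {(f, m) for f, m in product(FORMS, MEANINGS) if optimal(category, f, m)}
```

Under these assumptions, animate NPs are optimal as unmarked subjects or ACC-marked objects, and inanimate NPs as unmarked objects or ERG-marked subjects: marking appears exactly where the bias would mislead the hearer.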

Example: Animacy in a language with ERG and ACC

From Pragmatics to Grammar?
• One caveat: The OT tableaux typically abstract away from important factors, e.g. word order, plausibility, selectional restrictions.
  • The lightning killed the man.
    Even though the man is animate and in object position, it wouldn’t need case marking, as only animates can be killed.
• A second caveat: Case marking is typically part of the core grammar, and not derived by pragmatic rules.
  But: Pragmatic tendencies are one source of core grammar (functionalist view of grammar).

Motivation for Stochastic Optimality Theory
• Judith Aissen (2000) and Joan Bresnan (2002): There is not just a universal tendency towards differential case marking in the core grammars of languages; the same tendency can also describe optional case marking within a language.
• Example: Case marking by postpositions in colloquial Japanese (data: Fry 2001, Ellipsis and wa-marking in Japanese):
  Subj/anim: 60%    Subj/inanim: 70%
  Obj/anim:  54%    Obj/inanim:  47%
• Obligatory case marking patterns can be seen as extreme cases of statistical marking patterns, e.g. Spanish:
  Obj/anim: 100%    Obj/inanim: 0%
• Stochastic Optimality Theory (StOT; Boersma 1998, Functional Phonology), originally developed for phonological phenomena, can be used to model this intuition: Core grammar phenomena are not essentially different from usage-based statistical tendencies in phenomena that core grammar leaves, to a certain degree, optional.

Stochastic Optimality Theory (StOT)
• Main differences between standard OT and Stochastic OT:
  • Constraint ranking on a continuous scale: Every constraint is assigned a real number, which determines the ranking of the constraints and is a measure of the distance between them.
  • Stochastic evaluation: For each evaluation, the placement of a constraint is modified by adding a noise value with normal distribution. The ordering of the constraints after adding this noise value determines the actual evaluation of the set of candidates.
[Figure labels: constraints C1, C2 overlap: mostly C1 >> C2, sometimes C2 >> C1; constraints C1, C2 do not overlap: C1 >> C2 (almost) all the time]
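Stochastic evaluation can be sketched in a few lines: each constraint's ranking value is perturbed by normally distributed noise, and the perturbed values are sorted. The function names and the noise standard deviation of 2.0 are assumptions for illustration; the slide does not state a value.

```python
import random

def sample_ranking(ranking_values, noise_sd=2.0, rng=random):
    """Return the constraints ordered by their noisy selection points, highest first."""
    selection_points = {
        constraint: value + rng.gauss(0.0, noise_sd)
        for constraint, value in ranking_values.items()
    }
    return sorted(selection_points, key=selection_points.get, reverse=True)

def ranking_frequency(ranking_values, higher, lower, trials=10_000, seed=0):
    """Estimate how often `higher` outranks `lower` across repeated evaluations."""
    rng = random.Random(seed)  # seeded so that the estimate is reproducible
    hits = 0
    for _ in range(trials):
        order = sample_ranking(ranking_values, rng=rng)
        if order.index(higher) < order.index(lower):
            hits += 1
    return hits / trials

# Overlapping constraints: both orders occur, C1 >> C2 more often.
overlapping = ranking_frequency({"C1": 2.0, "C2": 0.0}, "C1", "C2")
# Widely separated constraints: C1 >> C2 practically always.
separated = ranking_frequency({"C1": 100.0, "C2": 0.0}, "C1", "C2")
```

Closely spaced ranking values yield variable orderings, while widely spaced values behave like a categorical ranking in standard OT.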

Stochastic OT: Ordering Probabilities
• Difference between mean values > 10: C1 dominates C2 categorically, p(C2 > C1) < 10^-10
• Difference between mean values ≈ 5: preference for C1 >> C2, but C2 >> C1 also leads to grammatical results, p(C2 > C1) ≈ 10%
• Difference between mean values = 0: no ranking preference, p(C2 > C1) = p(C1 > C2) = 50%, random outcomes.
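If both constraints receive independent normal noise with the same standard deviation, the ordering probability has a closed form: the difference of the two selection points is itself normal, with standard deviation √2 times the noise standard deviation. The sketch below computes p(C2 > C1) under that assumption. The function name and the default `noise_sd=2.0` are illustrative choices; the exact percentages depend on the noise value, which the slide does not state.

```python
import math

def reversal_probability(mean_difference, noise_sd=2.0):
    """p(C2 > C1) when C1's ranking value exceeds C2's by `mean_difference`.

    Assumes independent N(0, noise_sd^2) noise on both constraints, so that
    p = Phi(-d / (noise_sd * sqrt(2))) = 0.5 * erfc(d / (2 * noise_sd)).
    """
    return 0.5 * math.erfc(mean_difference / (2.0 * noise_sd))

# Probability of the reversed ranking for the three cases on the slide.
cases = {d: reversal_probability(d) for d in (0.0, 5.0, 10.0)}
```

Whatever noise value is chosen, a difference of 0 gives exactly 50%, and the probability of a reversal falls off steeply as the difference grows.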

Stochastic OT and Gradual Learning
• Boersma (1998), Boersma & Hayes (2001), in Linguistic Inquiry: Gradual Learning Algorithm (GLA) for learning constraint rankings (not for learning the possible candidates, GEN)
• In phonology: GEN: pairs of phonological forms and phonetic interpretations: ⟨/…/, […]⟩
• In semantics/pragmatics: GEN: pairs of syntactic/morphological forms and semantic/pragmatic interpretations: ⟨F, M⟩

Boersma’s Gradual Learning Algorithm (GLA)
• 0. Initial state: All constraint values are set to 0.
• 1. Learning datum: input-output pair ⟨i, o⟩
• 2. Generation:
  a. For each constraint, a noise value drawn from a normal distribution is added to its current ranking value: this is the selection point of the constraint.
  b. Constraints are ranked in the order of their selection points.
  c. The grammar generates an output o’ for the input i; alternative pair: ⟨i, o’⟩
• 3. Comparison: If o’ = o, nothing happens. Otherwise, the algorithm compares the constraint violations of the learning datum ⟨i, o⟩ with those of the generated datum ⟨i, o’⟩.
• 4. Adjustment:
  a. All constraints that favor the learning datum ⟨i, o⟩ over the self-generated ⟨i, o’⟩ are increased by a small predefined numerical amount (“plasticity”).
  b. All constraints that favor the self-generated ⟨i, o’⟩ over the learning datum ⟨i, o⟩ are decreased by the plasticity value.
• 5. Final state: Steps 1 – 4 are repeated until the constraint values stabilize.
• Plasticity may change over the lifetime from high to low.
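The steps above can be sketched as a short program. This is a minimal illustration, not Boersma's implementation: the data structure (a dict from candidate outputs to violation counts), the parameter values `plasticity=0.1` and `noise_sd=2.0`, the fixed number of steps in place of a convergence check, and the toy example (an animate subject that is always observed unmarked, with the constraints *STRUC and m:S/+a ‘mark animate subjects’) are all assumptions.

```python
import random

def evaluate(values, candidates, noise_sd, rng):
    """Step 2: add noise, rank by selection point, return the winning output."""
    points = {c: v + rng.gauss(0.0, noise_sd) for c, v in values.items()}
    order = sorted(points, key=points.get, reverse=True)
    # Strict domination: compare violation counts in ranking order.
    return min(candidates, key=lambda o: [candidates[o].get(c, 0) for c in order])

def gla_update(values, candidates, observed, plasticity=0.1, noise_sd=2.0, rng=random):
    """Steps 2 to 4 for one learning datum; `values` is modified in place."""
    generated = evaluate(values, candidates, noise_sd, rng)
    if generated == observed:
        return  # Step 3: the grammar already produces the datum
    for c in values:
        seen = candidates[observed].get(c, 0)
        own = candidates[generated].get(c, 0)
        if seen < own:
            values[c] += plasticity   # constraint favors the learning datum
        elif seen > own:
            values[c] -= plasticity   # constraint favors the self-generated output

def train(candidates, observed, steps=2000, seed=0):
    """Steps 0 and 5: start from 0 and repeat for a fixed number of learning data."""
    rng = random.Random(seed)
    values = {"*STRUC": 0.0, "m:S/+a": 0.0}
    for _ in range(steps):
        gla_update(values, candidates, observed, rng=rng)
    return values

# Toy input: an animate subject; the unmarked form violates m:S/+a,
# the ergative-marked form violates *STRUC.
CANDIDATES = {"Ø": {"m:S/+a": 1}, "ERG": {"*STRUC": 1}}
learned = train(CANDIDATES, observed="Ø")
```

Because every datum in this toy run is unmarked, each error promotes *STRUC and demotes m:S/+a, so the two values drift apart until mismatches become rare.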

Bidirectional Gradual Learning Algorithm (BiGLA)
• Jäger (2003): ‘The bidirectional gradual learning algorithm’
• Speaker-based learning: Input: Meaning, Output: Form. ⟨i, o⟩ = ⟨M, F⟩. The speaker compares different forms.
• Hearer-based learning: Input: Form, Output: Likely meaning. ⟨i, o⟩ = ⟨F, M⟩. The hearer compares different meanings.
• The hearer also uses speaker-based reasoning: On hearing F with likely meaning M, the hearer checks: Would I have used a different F’ to express M? If yes: Adjust rankings to increase the likelihood of using F to express M.
[Figure labels: hypothesized form, observed form, observed likely meaning, hypothesized meaning]

Modelling Pragmatics
• The Bidirectional Gradual Learning Algorithm (BiGLA) can be tested experimentally.
• Implementation: evolOT, downloadable with files at: http://uni-potsdam.de/~jaeger/nasslli03
• Example: Differential Object Marking triggered by definiteness (e.g., Hebrew)
• Input: Statistical distributions of the SAMTAL corpus.

Development of Differential Object Marking
• Starting state: constraints start out equally ranked.
• After 1000 generations, the ranking of constraints is firmly established, including the previously observed m:Obj/+def >> *STRUC >> m:Obj/–def.
[Figure: ranking differences between constraints over generations; curves: “mark definite objects!”, *STRUC, “mark indefinite objects!”]

Development of Split Ergativity (Animacy)
• Start out with a high value of FAITH: Every NP is case marked.
• Lower value of FAITH: Fewer NPs are case marked.
[Figure curves: “mark animate objects!”, “mark inanimate subjects!”, *STRUC, “don’t mark inanimate objects!”, “don’t mark animate subjects!”]

Development of Split Ergativity: Initial State doesn’t matter
[Figures with different initial states; curves: “mark animate objects!”, “mark inanimate subjects!”, *STRUC, “don’t mark inanimate objects!”, “don’t mark animate subjects!”]

Development of Split Ergativity: Initial State doesn’t matter (continued)
[Figures with further initial states; curves: “mark animate objects!”, “mark inanimate subjects!”, *STRUC, “don’t mark inanimate objects!”, “don’t mark animate subjects!”]

Learning under the Microscope: Speaker Mode
Assume the current constraint ranking includes the following relative ranking, where m:S/+a: ‘mark animate subjects’ and *STRUC: ‘avoid marking’.
[Diagram: relative ranking of *STRUC and m:S/+a]
• Incoming datum: Subj.anim-Ø (non-marked animate subject)
• In speaker mode: The algorithm produces one of the forms:
  a. Subj.anim-Ø (= learning datum, nothing happens)
  b. Subj.anim-ERG (satisfying FAITH)
• Comparison with learning datum:
  b. *STRUC favors the datum and is promoted, m:S/+a disfavors the datum and is demoted.
  Ultimately, *STRUC will rank higher than m:S/+a, suppressing marking of animate subjects.
In general: If a form is produced that differs from the datum and is
  – a non-marked NP: promotion of *STRUC and/or demotion of the marking constraint (see example)
  – a case-marked NP: demotion of *STRUC, promotion of FAITH if the case marking is different.

Learning under the Microscope: Hearer Mode
Assume the current constraint ranking includes the following relative ranking, where m:S/+a: ‘mark animate subjects’ and m:O/+a: ‘mark animate objects’.
[Diagram: relative ranking of m:O/+a and m:S/+a]
• Incoming datum: Subj.anim-Ø (non-marked animate NP interpreted as subject)
• In hearer mode: The algorithm produces one of the interpretations (as subject or object):
  a. Subj.anim-Ø (= learning datum, nothing happens)
  b. Obj.anim-Ø
• Comparison with learning datum:
  b. m:S/+a favors the datum and is promoted, m:O/+a disfavors the datum and is demoted.
In general: If a meaning is produced that differs from the datum and the NP is
  – a case-marked NP: promotion of FAITH
  – a non-marked NP: promotion and/or demotion of marking constraints (see example)
