This article discusses the importance of the given/new distinction in information status and its uses in natural language processing. It explores different models for identifying given/new information and provides examples from the Boston Directions Corpus.
Varieties of Information Status • Contrast: John wanted a poodle but Becky preferred a corgi. • Topic/comment: The corgi they bought turned out to have fleas. • Theme/rheme: The corgi they bought turned out to have fleas. • Focus/presupposition: It was Becky who took him to the vet. • Given/new: Some wildcats bite, but this wildcat turned out to be a sweetheart.
Today: Given/New • Why do we care about Given/New? • Defining Given/New: why is this hard? • Hearer-based and Discourse-based models • Uses of Given/New information in NLP • Identifying Given/New information automatically • Rule-based • Corpus-based • The Boston Directions Corpus • Laboratory studies suggest new directions
Why do we care about the given/new distinction? • Building a model of the discourse • What do S and H believe to be true? • What is in their consciousness now? • What is ‘grounded’? • Speech technologies • TTS: Given information is often deaccented while new information is usually accented • ASR?
Defining Given/New • Halliday ’67: • Given: recoverable from some form of context • New: not recoverable • Chafe ’74, ’76: • Given: what S believes is in H’s consciousness • New: what S believes is not in H’s consciousness • “Chafe-givenness”: Yesterday I had my class disrupted by a bulldog. I’m beginning to dislike dogs. (Here dogs is given.) • But not vice versa: Yesterday I had my class disrupted by a dog. I’m beginning to dislike bulldogs. (Here bulldogs is not given.)
Prince ’81: A Given/New Taxonomy • Text as set of instructions from S to H on how to construct a discourse model • Model includes discourse entities, attributes, and links between entities • Discourse entities: individuals, classes, exemplars, substances, concepts (NPs) • Entities as ‘hooks’ on which to hang attributes (Webber ’78) • Entities when first introduced are new
Brand-new (H must create a new entity): I saw a dinosaur today. • Unused (H already knows of this entity): I saw your mother today. • Evoked entities are old: they are already in the discourse • Textually evoked: The dinosaur was scaly and gray. • Situationally evoked: The light was red when you went through it. • Inferrables • Containing:
I bought a carton of eggs. One of them was broken. • Non-containing: A bus pulled up beside me. The driver was a monkey.
Given/New and Definiteness/Indefiniteness • Definiteness: subject NPs tend to be syntactically definite and old • Indefiniteness: object NPs tend to be indefinite and new: I saw a black cat yesterday. The cat looked hungry. • Definite articles, demonstratives, possessives, personal pronouns, proper nouns, and quantifiers like all and every signal definiteness… but: There were the usual suspects at the bar. • Indefinite articles and quantifiers like some, any, and one signal indefiniteness… but: This guy came into the room.
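A surface-cue classifier built from the marker lists above shows why definiteness is only a proxy for givenness. This is a minimal sketch: the word lists are hypothetical and incomplete, and treating any capitalized first word as a proper noun is a crude assumption (it also fires on sentence-initial words).

```python
# Hypothetical marker lists based on the classes named on the slide.
DEFINITE_MARKERS = {"the", "this", "that", "these", "those",      # article, demonstratives
                    "my", "your", "his", "her", "its", "our", "their",  # possessives
                    "all", "every"}                                # quantifiers
INDEFINITE_MARKERS = {"a", "an", "some", "any", "one"}
PERSONAL_PRONOUNS = {"i", "you", "he", "she", "it", "we", "they",
                     "me", "him", "her", "us", "them"}


def surface_definiteness(np: str) -> str:
    """Classify an NP as 'definite', 'indefinite', or 'unknown' from its first word."""
    tokens = np.split()
    if not tokens:
        raise ValueError("empty noun phrase")
    first = tokens[0].lower()
    if first in DEFINITE_MARKERS or first in PERSONAL_PRONOUNS:
        return "definite"
    if first in INDEFINITE_MARKERS:
        return "indefinite"
    if tokens[0][0].isupper():  # crude proper-noun guess (assumption)
        return "definite"
    return "unknown"
```

Both exceptions on the slide defeat it: "the usual suspects" is classified as definite although it introduces new entities, and "this guy" is classified as definite although indefinite "this" introduces a brand-new one.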
What’s wrong with a simple Hearer-centric model of given/new? • Hearer-centric information status: • Given: what S believes H has in his/her consciousness • New: what S believes H does not have in his/her consciousness • But discourse entities may also be given and new wrt the current discourse • Discourse-old: already evoked in the discourse • Discourse-new: not evoked
(1) A: I’ve decided to make an appointment with Lee Bollinger. (2) B: Why do you want to see Bollinger? • Hearer status of discourse entities in 1? 2? • If B is your roommate? your mother? a guy on the subway? • Discourse status of discourse entities in 1? 2? • What would be the hearer/discourse status of discourse entities in this version? (1) A: I’ve decided to make an appointment with Lee Bollinger. (2a) B: Why do you want to see the president? (2b) B: Have you talked to his secretary?
What does this new Hearer/Discourse given/new distinction provide? • A way to separate what is explicit in the discourse model from what is believed to be in speaker/hearer cognitive model • A way to explain given/new in more complex terms • To identify coreference relations • To explain deaccenting in ASR and TTS
Gross Oversimplification: Given Items Tend to be Deaccented • Accenting and deaccenting: making items intonationally prominent or not • Critical to get this distinction ‘right’ in TTS • Accenting everything makes it hard for people to understand anything, e.g. I like my cat and my cat adores me. One potato, two potato, three potato,… If a discourse entity is given for one speaker then it may or may not be given for another speaker.
How can we determine automatically whether a discourse entity is given or new? • A rule-based approach: • Stem the content words in the discourse • Select a window: an incoming item is labeled ‘given’ if its stem matches that of an item occurring within the preceding window • Other items are ‘new’ • Is this hearer-based? Discourse-based? • How well does it work? • 65-75% accurate (precision) depending on genre, domain
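The rule-based approach can be sketched in a few lines. This is an illustration under stated assumptions, not the original system: the stopword list is hypothetical, `crude_stem` is a rough suffix stripper standing in for a real stemmer such as Porter's, and the window is counted in content words.

```python
import re

# Hypothetical stopword list; a real system would use a fuller list or a POS tagger.
STOPWORDS = {"a", "an", "the", "and", "or", "but", "to", "of", "in", "on", "at",
             "from", "then", "i", "you", "he", "she", "it", "we", "they",
             "me", "my", "your", "is", "was"}


def crude_stem(word: str) -> str:
    """Very rough suffix stripping (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def label_given_new(text: str, window: int = 20) -> list[tuple[str, str]]:
    """Label each content word 'given' if its stem occurred among the
    previous `window` content words, else 'new'."""
    if window < 1:
        raise ValueError("window must be at least 1")
    history: list[str] = []  # stems of earlier content words, oldest first
    labels = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in STOPWORDS:
            continue
        stem = crude_stem(word)
        status = "given" if stem in history[-window:] else "new"
        labels.append((word, status))
        history.append(stem)
    return labels


print(label_given_new("I like my cat and my cat adores me."))
```

Because it only matches stems of earlier words, this rule cannot capture Chafe-givenness: after "bulldog", the word "dogs" has the stem "dog" and is still labeled new.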
Boston Directions Corpus (Hirschberg & Nakatani ’96) • Experimental Design • 12 speakers: 4 used • Spontaneous and read versions of 9 direction-giving tasks • Corpus: 50 min read; 67 min spontaneous • Labeling • Prosodic: ToBI intonational labeling • Discourse: Grosz & Sidner • Given/new (Prince ’92), grammatical function, p.o.s., …
Boston Directions Corpus: Describe how to get to MIT from Harvard • d1 (dsp1: step 1: enter and get token): first enter the Harvard Square T stop and buy a token • d2 (dsp2: inbound on red line): then proceed to get on the inbound um Red Line uh subway
dp3 (dsp3: take subway from hs, to cs to ks): and take the subway from Harvard Square to Central Square and then to Kendall Square • dp4 (dsp4: get off T.): then get off the T
Hearer and Discourse Given/New Labeling first enter <HG/DN the Harvard Square T stop> and buy <HI/DN a token> then proceed to get on <HI/DN the inbound um Red Line uh subway> and take <HG/DG the subway> from <HG/DG Harvard Square> to <HG/DN Central Square> and then to <HG/DN Kendall Square> then get off <HG/DG the T>
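To use labeled data like this, the tagged spans first have to be extracted. A minimal sketch, assuming the tags always have the form `<H?/D? text>`, where the letter after H is the hearer status and the letter after D is the discourse status. Reading G as given, N as new, and I as inferrable is our interpretation of the tags, not a documented scheme.

```python
import re
from collections import Counter

# Assumed tag shape: <H{status}/D{status} text>
LABEL_RE = re.compile(r"<H([A-Z])/D([A-Z])\s+([^>]+)>")


def parse_labels(transcript: str) -> list[tuple[str, str, str]]:
    """Return (hearer_status, discourse_status, np_text) for each tagged span."""
    return [(h, d, text.strip()) for h, d, text in LABEL_RE.findall(transcript)]


EXCERPT = ("first enter <HG/DN the Harvard Square T stop> and buy <HI/DN a token> "
           "then proceed to get on <HI/DN the inbound um Red Line uh subway> "
           "and take <HG/DG the subway> from <HG/DG Harvard Square> "
           "to <HG/DN Central Square> and then to <HG/DN Kendall Square> "
           "then get off <HG/DG the T>")

items = parse_labels(EXCERPT)
status_counts = Counter((h, d) for h, d, _ in items)
print(status_counts)
```

Keeping the two statuses as separate fields, rather than one combined label, makes it easy to test which dimension better predicts accenting.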
What could we do with this labeled data? • Can we predict given/new? • Can we predict what will be accented and what will be deaccented?
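One obvious starting point for predicting accent is the "given items are deaccented" rule as a baseline. The sketch below is a hypothetical baseline, not a reported system: it deaccents discourse-given items and accents everything else, and scores predictions against gold accent labels, which in practice would come from the corpus's ToBI annotation (not reproduced here).

```python
from typing import Sequence


def predict_accent(discourse_status: str) -> bool:
    """Baseline rule: deaccent (False) discourse-given items ('G'); accent (True) the rest."""
    return discourse_status != "G"


def accuracy(predicted: Sequence[bool], gold: Sequence[bool]) -> float:
    """Fraction of items whose predicted accent status matches the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and the same length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Discourse statuses of the eight labeled NPs in the excerpt, in order.
discourse_statuses = ["N", "N", "N", "G", "G", "N", "N", "G"]
predictions = [predict_accent(s) for s in discourse_statuses]
```

A rule this simple ignores hearer status, grammatical function, and surface position, which is exactly what the following slides question.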
What else might be at work? • Given/new and grammatical function • Hypothesis: how discourse entities are evoked in a discourse influences how ‘given’ they are • E.g., How might grammatical function and surface position interact with the accentuation of ‘given’ items? • Cases: • X has not been mentioned in the prior context • X has been mentioned, with the same grammatical function/surface position • X has been mentioned but with a different grammatical function/surface position
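The three cases can be made operational by comparing a target mention with the context. A minimal sketch: the `Mention` record and the choice to compare against the most recent prior mention are illustrative assumptions, and surface position is represented simply as the index of the NP in the sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    entity: str    # e.g. "triangle"
    function: str  # grammatical function, e.g. "subj", "dobj", "pobj"
    position: int  # surface position, e.g. index of the NP in the sentence


def context_case(target: Mention, context: list[Mention]) -> str:
    """Classify the target into one of the three cases on the slide."""
    prior = [m for m in context if m.entity == target.entity]
    if not prior:
        return "not mentioned"
    last = prior[-1]  # assumption: compare with the most recent mention
    if last.function == target.function and last.position == target.position:
        return "same function/position"
    return "different function/position"


context = [Mention("triangle", "subj", 0), Mention("cylinder", "dobj", 1)]
print(context_case(Mention("triangle", "dobj", 1), context))
```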
Experimental Design • Major problem: • How to elicit ‘spontaneous’ productions while varying the desired phenomena systematically? • Key: simple variations and actions can capitalize upon the natural tendency to associate grammatical functions with particular thematic roles for a given set of verbs
[Figures: stimulus scenes containing a triangle, rectangle, cylinder, diamond, and octagon, shown in an initial layout and in panels labeled Context 1, Context 2, Context 3, Target(A), and Target(B).]
Experimental Conditions • 10 native speakers of standard American English • Subject and experimenter in a soundproof booth • Subject told to describe scenes to a confederate outside the booth, who was visible but provided no feedback • 10 practice scenarios • ~20 minutes per subject
Prosodic Analysis • Target turns were excised and analyzed independently by two judges for the location of pitch accents on each referring expression: accented (2), unsure (1), deaccented (0) • The two judges’ ratings give an accentedness score from 0 to 4 (81% agreement on the 0 and 2 scores)
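The scoring scheme amounts to adding two 0-2 ratings. The sketch below illustrates it; the agreement function encodes one possible reading of "agreement for 0 and 2 scores" (items where at least one judge gave a clear 0 or 2), which is an assumption, and the example ratings are made up for illustration.

```python
def accentedness(judge1: list[int], judge2: list[int]) -> list[int]:
    """Sum two judges' ratings per referring expression into a 0-4 score.
    Ratings: 2 = accented, 1 = unsure, 0 = deaccented."""
    if len(judge1) != len(judge2):
        raise ValueError("judges must rate the same items")
    for r in judge1 + judge2:
        if r not in (0, 1, 2):
            raise ValueError(f"invalid rating: {r}")
    return [a + b for a, b in zip(judge1, judge2)]


def clear_case_agreement(judge1: list[int], judge2: list[int]) -> float:
    """Assumed reading: among items where at least one judge gave 0 or 2,
    the fraction on which both judges gave the same rating."""
    pairs = [(a, b) for a, b in zip(judge1, judge2) if a in (0, 2) or b in (0, 2)]
    if not pairs:
        raise ValueError("no clear-case ratings")
    return sum(a == b for a, b in pairs) / len(pairs)


# Made-up ratings, for illustration only.
j1, j2 = [2, 0, 2, 1, 0], [2, 0, 1, 1, 2]
print(accentedness(j1, j2), clear_case_agreement(j1, j2))
```

Note that a summed score of 2 conflates two judges who were both unsure (1 + 1) with an outright disagreement (0 + 2).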
Findings • In general • Items that differ from context to target in grammatical function or surface position tend to be accented • Items that share grammatical function and surface position tend to be deaccented • But • Subjects tend to be accented more often than objects, even if previously mentioned in the same role • Direct objects and PP objects tend to be more distinguished from subjects than from one another
How can we explain these observations? • Consider our examples (capitals mark accented words; in each, the first NP is the subject and the second the direct object): The TRIANGLE touches the CYLINDER. The triangle touches the DIAMOND. The triangle touches the OCTAGON. The RECTANGLE touches the TRIANGLE. • An entity may be ‘given’ or ‘new’ with respect to the role it plays in the discourse
Given/New Is Sensitive to the Role the Discourse Entity Plays • E.g., a discourse entity may retain a given thematic role or take on a new one • By the time the target is uttered, ‘triangle’ is established both as a ‘given’ discourse entity and as the discourse topic (the backward-looking center, or BLC, in centering theory) • But this status has been established for ‘triangle’ as agent • What is new, and perhaps focused, in the target is ‘triangle’s’ new thematic role as patient: the players are the same but the roles are different
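The difference between entity-level and role-level givenness can be shown by tracking two sets side by side. A minimal sketch, assuming mentions arrive as (entity, thematic role) pairs; the agent/patient assignments for the example sequence follow the slide's description of "touches".

```python
def role_sensitive_status(mentions: list[tuple[str, str]]) -> list[tuple[str, str, str, str]]:
    """For each (entity, role) mention, report whether the entity is given
    and whether the entity is given *in that role*."""
    seen_entities: set[str] = set()
    seen_pairs: set[tuple[str, str]] = set()
    result = []
    for entity, role in mentions:
        entity_status = "given" if entity in seen_entities else "new"
        role_status = "given" if (entity, role) in seen_pairs else "new"
        result.append((entity, role, entity_status, role_status))
        seen_entities.add(entity)
        seen_pairs.add((entity, role))
    return result


# The example sequence: three contexts, then the target.
EXAMPLE = [("triangle", "agent"), ("cylinder", "patient"),
           ("triangle", "agent"), ("diamond", "patient"),
           ("triangle", "agent"), ("octagon", "patient"),
           ("rectangle", "agent"), ("triangle", "patient")]

for row in role_sensitive_status(EXAMPLE):
    print(row)
```

In the final target mention, "triangle" is given as an entity but new as a patient, matching the accent on TRIANGLE in the target sentence.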
Consequences for NLP • Identification of given/new status must be sensitive to more complex model of context (grammatical function/thematic role) • Will this help us predict deaccenting more accurately? • Stay tuned…..