[A Recursive Annotation Scheme [for Referential Information Status]]
Arndt Riester¹, David Lorenz², Nina Seemann¹
¹ Institute for Natural Language Processing (IMS) & SFB 732, University of Stuttgart
² English Department, University of Freiburg
19.5.2010, LREC, Malta
Information Status
• Describes the cognitive activation of nominal expressions
• Distinguishes between given and new items
• or between given, accessible and new items (Chafe 1976, 1994)
• or between evoked, inferrable and new items (Prince 1981)
• or, e.g., Prince (1992), Nissim et al. (2004), Dipper et al. (2007): mediated-part, accessible-inferable, mediated-possessive, discourse-old, mediated-aggregated, old-id-generic, mediated-general, textually evoked, discourse-new, brand-new anchored, old-event, hearer-new, old-generic, mediated-situation, old-relative, situationally evoked, brand-new unanchored, mediated-func_values, accessible-general, unused, mediated-event, containing inferrable, old-identity, bridging, old-general, accessible-situation
Desiderata
• A simple scheme based on clear theoretical assumptions
• Good inter-coder agreement for different textual genres
• Full coverage of all nominal expressions
• Capable of dealing with recursive embeddings
• [the red gem [in [the Queen's]A crown]B]C: 3 referents, 3 nested labels for information status
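The recursive embedding in the gem example can be pictured as a small tree in which every nominal expression carries its own label. The sketch below is only an illustration: the `Mention` class and its `walk` method are hypothetical names, and the labels A, B, C are the placeholder slots from the slide, not actual information status classes.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Mention:
    """A nominal expression with its own label and any embedded mentions.

    Hypothetical structure for illustration; not the format used by the authors.
    """
    text: str
    label: str  # placeholder slot name (A, B, C), not a real class label
    children: List["Mention"] = field(default_factory=list)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Mention"]]:
        """Yield (depth, mention) pairs, outermost expression first."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Build the example bottom-up: innermost referent first.
queen = Mention("the Queen's", "A")
crown = Mention("in the Queen's crown", "B", [queen])
gem = Mention("the red gem in the Queen's crown", "C", [crown])

# One line per referent, indented by embedding depth.
for depth, mention in gem.walk():
    print("  " * depth + f"[{mention.label}] {mention.text}")
```

Each of the three referents receives exactly one label, which is what "full coverage" plus "recursive embeddings" amounts to for this phrase.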
Two levels of givenness
• Givenness of words: repetition, synonymy, hypernymy
(2) {On my way home, I saw a poodle.}
a. It reminded me of Anna's poodle.
b. It reminded me of Anna's dog.
• Givenness of referents: coreference
(3) {On my way home, I saw a poodle.}
a. The poodle / It tried to bite me.
b. The stupid beast tried to bite me.
• Keep the two apart! In the following: given ≡ coreferential
• But see Baumann & Riester (2010) for a two-level scheme (importance for prosody)
A Simple Rule for Definite Expressions
• Definite descriptions, demonstratives, proper names and pronouns trigger the presupposition that their referent should be identified in "the" context (e.g. Heim, 1983; van der Sandt, 1992).
• Claim: Information status classes should directly reflect the four context components.
Annotating Hearer Knowledge (unused)
• Prince (1981): choice of referring expression reflects the speaker's / writer's assumptions concerning the hearer's knowledge (assumed familiarity)
• No access to the speaker's mind
• Simplification: as an annotator, decide on the basis of your own expectations whether a (non-given) item is known to an intended audience
• "Barack Obama": encyclopaedic knowledge
• "the woman Max went out with last night": accommodation
News Example (USA Today, 17.5.10)
[...] [Protestants]indef-resumptive still account [for about 55% [of the 111th Congress]unused-unknown]indef-partitive-contained, but [a recent flurry of Catholic and Jewish appointments]indef-new has turned [them]given-pronoun [into a minority of one [on the Supreme Court]bridging]indef-new(predicate). Should [Kagan]given-short be confirmed [next week]situative, [[the nation's]given-epithet highest court]given-epithet would be [a Protestant-free zone]indef-generic [for the first time since [John Jay, [the nation's]given-repeated first chief justice (and an Episcopalian)]unused-unknown]unused-unknown, banged [[his]given-pronoun gavel]unused-unknown [in 1790]unused-known.
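The bracket notation of the news example can be read mechanically with a stack, one level per open bracket. The following is a minimal sketch, not the authors' tooling (the corpus itself was annotated in SALTO). It assumes that every label directly follows its closing bracket, consists of lowercase words joined by hyphens with an optional parenthesised suffix such as `(predicate)`, and is followed by whitespace or punctuation. The function name `parse_labeled_brackets` is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed label shape: lowercase words joined by hyphens, optional "(suffix)".
LABEL = re.compile(r"[a-z]+(?:-[a-z]+)*(?:\([a-z]+\))?")


def parse_labeled_brackets(text: str) -> List[Tuple[str, str, int]]:
    """Return (label, span_text, depth) for every bracketed expression.

    Spans are reported in the order in which their brackets close, so an
    embedded expression appears before the expression that contains it.
    """
    stack: List[List[str]] = []  # one text buffer per currently open bracket
    found: List[Tuple[str, str, int]] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "[":
            stack.append([])
            i += 1
        elif ch == "]":
            if not stack:
                raise ValueError(f"unmatched ']' at position {i}")
            inner = "".join(stack.pop())
            match = LABEL.match(text, i + 1)
            label = match.group(0) if match else ""
            found.append((label, inner, len(stack)))
            if stack:
                # The embedded words also belong to the enclosing expression.
                stack[-1].append(inner)
            i = match.end() if match else i + 1
        else:
            if stack:
                stack[-1].append(ch)
            i += 1
    if stack:
        raise ValueError("unclosed '[' in input")
    return found


SAMPLE = (
    "Should [Kagan]given-short be confirmed [next week]situative, "
    "[[the nation's]given-epithet highest court]given-epithet would be "
    "[a Protestant-free zone]indef-generic"
)

spans = parse_labeled_brackets(SAMPLE)
for label, span, depth in spans:
    print(f"{'  ' * depth}{label}: {span}")
```

An unlabeled bracket is returned with an empty label, provided the text after it does not start with a lowercase letter immediately after the bracket.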
Data
• Transcripts from German radio news bulletins (three full days of (hourly) news)
• About 3000 sentences
• Parsed with XLE / German LFG grammar (Rohrer & Forst 2006)
• Annotated with SALTO tool (Burchardt et al. 2006), extended TigerXML format
• Two annotators, verification and ultimate decision by a third annotator
Annotation using SALTO (Burchardt et al. 2006)
"... said Kirchner in Cordoba ..."
"... the Argentinian head of state ..."
Inter-Annotator Agreement (Cohen 1960)
• Evaluation performed on a subset comprising 1149 nominal expressions, which the annotators had to identify by themselves
• 1100 expressions identified by both annotators
• 757 labeled identically
• Agreement:
  κ = .66 (full scheme: 21 subclasses)
  κ = .78 (core scheme comprising 6 classes: given, situative, bridging, unused, indef, other)
• Comparison:
  • Dipper et al. (2007), κ = .55 (newspaper commentaries)
  • Nissim et al. (2004), κ = .79 (full); κ = .85 (core) (dialogue) (fewer embeddings; pre-exclusion of "difficult" cases)
(Source: Ritz et al. 2008)
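For readers unfamiliar with the measure: Cohen's κ compares the observed agreement p_o with the agreement p_e expected by chance from each annotator's label distribution, κ = (p_o − p_e) / (1 − p_e). The raw agreement reported here is 757 / 1100 ≈ .69. The sketch below computes κ for two parallel label sequences; the function name `cohen_kappa` and the six-item toy data are made up for illustration and are not taken from the corpus.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: share of items with identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the two annotators' label proportions.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[lab] * count_b[lab] for lab in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy data (invented): six expressions labeled with core classes.
annotator_1 = ["given", "given", "indef", "unused", "bridging", "indef"]
annotator_2 = ["given", "indef", "indef", "unused", "unused", "indef"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # 0.538, i.e. 7/13
```

In the toy data the raw agreement is 4/6 ≈ .67, yet κ is only about .54, which shows why κ values are lower than the raw agreement rate.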
Conclusion
• Scheme enables fast, comprehensible and reliable annotations of nested expressions in arbitrary text genres
• Useful for
  • Computational linguists: e.g. creating a gold standard for anaphora resolution and related tasks
  • Theoretical linguists: empirical data for investigations into form of referring expressions, (non-)restrictivity of modification, word order, grammatical role, discourse structure etc.
  • Phoneticians: investigating prosody in spoken corpora
Learn more: http://www.ims.uni-stuttgart.de/~arndt
Details: given
Subclasses: pronoun, reflexive, short, repeated, epithet
• Both had the blessings of Dr. Richard Klausner. But even [Klausner]given-short had to be persuaded at first.
• Before the European Union's ban on incandescent light bulbs went into effect on Sept. 1, consumers across Europe raided stores to stockpile [the familiar bulbs]given-epithet.
Details: bridging
Subclasses: 0, text, contained
• Germany lost the football match against England because [the audience]bridging was against them.
• United were trailing 3-1 when Fletcher was felled [in the area]bridging-text by Aleksei Berezutski. The Scotland midfielder was then yellow-carded by [the referee]bridging-text.
Details: bridging-contained vs. unused-unknown
• The Republicans won [the governorship of Virginia]bridging-contained. (expected / prototypical relationship)
• He was convicted of helping to organise [the seizure [of Osama Moustafa Nasr]]unused-unknown from a Milan street in February 2003. (non-prototypical relationship, can't be separated)
• # Speaking of Osama Moustafa Nasr, [the seizure] happened in 2003.
Details: indef
Subclasses: new, generic, partitive, resumptive
• [A man]indef-new came in. He bought a pair of shoes.
• [Serious beer drinkers]indef-generic should head straight to this 550-year-old institution.
• At violent clashes between the police and demonstrating Kurds, [three demonstrators]indef-partitive were injured.
• That's close to how a cancer vaccine works, but not precisely. Most experts see [cancer vaccines]indef-resumptive as a hybrid of treatment and prevention.
Other
• expletive
• null: nobody, nothing
• relative: non-restrictive relative clause
• cataphor: can be indefinite or definite
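The core scheme used in the agreement evaluation (given, situative, bridging, unused, indef, other) can be derived from the full labels by keeping only the part before the first hyphen. The sketch below assumes that full labels are written as in the examples (e.g. `given-epithet`, `indef-new(predicate)`) and that the four "other" items are stored under their bare names. Both the storage format and the function name `core_class` are assumptions for illustration.

```python
# Core classes named on the agreement slide; everything else maps to "other".
CORE_CLASSES = ("given", "situative", "bridging", "unused", "indef")


def core_class(full_label: str) -> str:
    """Collapse a full-scheme label such as 'given-epithet' to its core class."""
    # Drop an optional "(...)" suffix, then keep the part before the first hyphen.
    head = full_label.split("(", 1)[0].split("-", 1)[0]
    return head if head in CORE_CLASSES else "other"


for label in ["given-epithet", "bridging", "indef-new(predicate)", "unused-known", "expletive"]:
    print(f"{label} -> {core_class(label)}")
```

Collapsing labels this way makes agreement on the core scheme computable from the same annotations as agreement on the full scheme.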