19.5.2010 LREC Malta

[A Recursive Annotation Scheme [for Referential Information Status]] Arndt Riester (1), David Lorenz (2), Nina Seemann (1). (1) Institute for Natural Language Processing (IMS) & SFB 732, University of Stuttgart; (2) English Department, University of Freiburg. 19.5.2010 LREC Malta.

Presentation Transcript


  1. [A Recursive Annotation Scheme [for Referential Information Status]]
     Arndt Riester (1), David Lorenz (2), Nina Seemann (1)
     (1) Institute for Natural Language Processing (IMS) & SFB 732, University of Stuttgart
     (2) English Department, University of Freiburg
     19.5.2010 LREC Malta

  5. Information Status
     • Describes the cognitive activation of nominal expressions
     • Distinguishes between given and new items
     • or between given, accessible and new items (Chafe 1976, 1994)
     • or between evoked, inferrable and new items (Prince 1981)
     • or: e.g. Prince (1992), Nissim et al. (2004), Dipper et al. (2007)
     Labels shown on the slide: mediated-part, accessible-inferable, mediated-possessive, discourse old, mediated-aggregated, old-id-generic, mediated-general, textually evoked, discourse new, brand-new anchored, old-event, hearer new, old-generic, discourse old, mediated-situation, old-relative, situationally evoked, brand-new unanchored, mediated-func_values, accessible-general, unused, mediated-event, containing inferrable, old-identity, bridging, old-generic, old-general, accessible-situation

  7. Desiderata
     • A simple scheme based on clear theoretical assumptions
     • Good inter-coder agreement for different textual genres
     • Full coverage of all nominal expressions
     • Capable of dealing with recursive embeddings
     • [the red gem [in [the Queen's]A crown]B]C
       3 referents → 3 nested labels for information status
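The recursive embedding on this slide can be modelled as a small tree. The following is a minimal sketch, not the authors' data format: the `Mention` class and its methods are hypothetical names introduced here, and the labels "A", "B", "C" are the slide's index letters standing in for real information status labels.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple, Union


@dataclass
class Mention:
    """A nominal expression; parts are plain strings or embedded mentions."""
    parts: List[Union[str, "Mention"]]
    label: Optional[str] = None  # information status label; None = not yet annotated

    def text(self) -> str:
        """Surface string of this mention, including everything embedded in it."""
        return " ".join(p if isinstance(p, str) else p.text() for p in self.parts)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Mention"]]:
        """Yield (depth, mention) for this mention and all mentions nested in it."""
        yield depth, self
        for p in self.parts:
            if isinstance(p, Mention):
                yield from p.walk(depth + 1)


# [the red gem [in [the Queen's]A crown]B]C
# "A", "B", "C" are placeholder labels taken from the slide, not status classes.
queen = Mention(["the Queen's"], label="A")
crown = Mention(["in", queen, "crown"], label="B")
gem = Mention(["the red gem", crown], label="C")

for depth, mention in gem.walk():
    print("  " * depth + f"{mention.text()}  [{mention.label}]")
```

Walking the tree visits three mentions, one per referent, so each nesting level can carry its own label.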

  9. Two levels of givenness
     • Givenness of words: repetition, synonymy, hypernymy
       (2) {On my way home, I saw a poodle.}
           a. It reminded me of Anna's poodle.
           b. It reminded me of Anna's dog.
     • Givenness of referents: coreference
       (3) {On my way home, I saw a poodle.}
           a. The poodle / It tried to bite me.
           b. The stupid beast tried to bite me.
     • Keep the two apart! In the following: given ≡ coreferential
     • But see Baumann & Riester (2010) for a two-level scheme (→ importance for prosody)

  10. Context Theory

  11. A Simple Rule for Definite Expressions
     • Definite descriptions, demonstratives, proper names, pronouns trigger the presupposition that their referent should be identified in "the" context (e.g. Heim, 1983; van der Sandt, 1992).
     • Claim: Information status classes should directly reflect the four context components.

  15. Annotating Hearer Knowledge (unused)
     • Prince (1981): choice of referring expression reflects the speaker's / writer's assumptions concerning the hearer's knowledge (assumed familiarity)
     • No access to the speaker's mind
     • Simplification: as an annotator, decide upon your own expectations whether a (non-given) item is known to an intended audience
     • Examples: "the woman Max went out with last night", "Barack Obama"
     • Slide callouts: encyclopaedic knowledge, accommodation

  16. News Example (USA Today, 17.5.10)
     [...] [Protestants]indef-resumptive still account [for about 55% [of the 111th Congress]unused-unknown]indef-partitive-contained, but [a recent flurry of Catholic and Jewish appointments]indef-new has turned [them]given-pronoun [into a minority of one [on the Supreme Court]bridging]indef-new(predicate). Should [Kagan]given-short be confirmed [next week]situative, [[the nation's]given-epithet highest court]given-epithet would be [a Protestant-free zone]indef-generic [for the first time since [John Jay, [the nation's]given-repeated first chief justice (and an Episcopalian)]unused-unknown]unused-unknown, banged [[his]given-pronoun gavel]unused-unknown [in 1790]unused-known.
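Bracketed text of this kind can be read back into labelled spans with a small stack-based parser. This is a sketch under stated assumptions: the function name `parse_annotated` is hypothetical, a label is assumed to follow its closing bracket directly and to consist only of letters, hyphens and parentheses, and a label is assumed to be separated from the next word by whitespace or punctuation. It is not the format used by the annotation tool.

```python
import re
from typing import List, Optional, Tuple

# Assumed label alphabet: letters, hyphens, parentheses, e.g. "indef-new(predicate)".
LABEL = re.compile(r"[A-Za-z][A-Za-z()-]*")

Span = Tuple[int, str, Optional[str]]  # (nesting depth, text, label)


def parse_annotated(text: str) -> List[Span]:
    """Return all bracketed spans, inner spans before the spans that contain them."""
    spans: List[Span] = []
    stack: List[List[str]] = []  # text collected for each currently open bracket
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "[":
            stack.append([])
            i += 1
        elif ch == "]":
            if not stack:
                raise ValueError(f"unmatched ']' at position {i}")
            inner = "".join(stack.pop())
            match = LABEL.match(text, i + 1)  # label directly after the bracket, if any
            label = match.group(0) if match else None
            i = match.end() if match else i + 1
            spans.append((len(stack), " ".join(inner.split()), label))
            if stack:
                stack[-1].append(inner)  # embedded text also belongs to the parent span
        else:
            if stack:
                stack[-1].append(ch)
            i += 1
    if stack:
        raise ValueError("unclosed '['")
    return spans


sample = (
    "Should [Kagan]given-short be confirmed [next week]situative, "
    "[[the nation's]given-epithet highest court]given-epithet "
    "would be [a Protestant-free zone]indef-generic"
)
for depth, span_text, label in parse_annotated(sample):
    print(depth, repr(span_text), label)
```

Unlabelled brackets are returned with `None` as their label, so partially annotated text can still be parsed.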

  21. Data
     • Transcripts from German radio news bulletins (three full days of (hourly) news)
     • About 3000 sentences
     • Parsed with XLE / German LFG grammar (Rohrer & Forst 2006)
     • Annotated with SALTO tool (Burchardt et al. 2006), extended TigerXML format
     • Two annotators, verification and ultimate decision by a third annotator

  22. Annotation using SALTO (Burchardt et al. 2006)
     "... said Kirchner in Cordoba ..."
     "... the Argentinian head of state ..."

  23. Inter-Annotator Agreement (Cohen 1960)
     • Evaluation performed on a subset comprising 1149 nominal expressions, which the annotators had to identify by themselves
     • 1100 expressions identified by both annotators
     • 757 labeled identically
     • Agreement
       • κ = .66 (full scheme: 21 subclasses)
       • κ = .78 (core scheme comprising 6 classes: given, situative, bridging, unused, indef, other)
     • Comparison:
       • Dipper et al. (2007), κ = .55 (newspaper commentaries)
       • Nissim et al. (2004), κ = .79 (full); κ = .85 (core) (dialogue) (fewer embeddings; pre-exclusion of "difficult" cases)
       (Source: Ritz et al. 2008)
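Cohen's κ corrects raw agreement for chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected from each annotator's label frequencies. The counts on the slide give a raw agreement of 757 / 1100 ≈ .69 for the full scheme. The sketch below computes κ on toy data; the two label sequences are invented for illustration, and collapsing a full label to its core class by taking the part before the first hyphen is an assumption based on the label names, not a documented rule of the scheme.

```python
from collections import Counter
from typing import Sequence

# Core classes named on the slide; everything else is treated as "other".
CORE = {"given", "situative", "bridging", "unused", "indef"}


def core_class(label: str) -> str:
    """Collapse a full label such as 'given-pronoun' to a core class (assumed rule)."""
    head = label.split("-", 1)[0]
    return head if head in CORE else "other"


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)  # chance agreement
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations, not data from the study.
ann1 = ["given-pronoun", "given-short", "indef-new", "unused-known", "bridging", "indef-generic"]
ann2 = ["given-pronoun", "given-epithet", "indef-new", "unused-unknown", "bridging", "indef-new"]

kappa_full = cohen_kappa(ann1, ann2)
kappa_core = cohen_kappa([core_class(x) for x in ann1], [core_class(x) for x in ann2])
print(f"full: {kappa_full:.4f}  core: {kappa_core:.4f}")
```

In the toy data every disagreement is between subclasses of the same core class, so κ rises once labels are collapsed. The slide reports the same direction of effect for the real data (.66 full, .78 core).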

  24. Conclusion
     • Scheme enables fast, comprehensible and reliable annotations of nested expressions in arbitrary text genres
     • Useful for
       • Computational linguists: e.g. creating a gold standard for anaphora resolution and related tasks
       • Theoretical linguists: empirical data for investigations into form of referring expressions, (non-)restrictivity of modification, word order, grammatical role, discourse structure etc.
       • Phoneticians: investigating prosody in spoken corpora
     Learn more: http://www.ims.uni-stuttgart.de/~arndt

  25. Thank you!

  26. Details: given
     Subclasses: pronoun, reflexive, short, repeated, epithet
     • Both had the blessings of Dr. Richard Klausner. But even [Klausner]given-short had to be persuaded at first.
     • Before the European Union's ban on incandescent light bulbs went into effect on Sept. 1, consumers across Europe raided stores to stockpile [the familiar bulbs]given-epithet

  27. Details: bridging
     Subclasses: 0, text, contained
     • Germany lost the football match against England because [the audience]bridging was against them.
     • United were trailing 3-1 when Fletcher was felled [in the area]bridging-text by Aleksei Berezutski. The Scotland midfielder was then yellow-carded by [the referee]bridging-text.

  28. Details: bridging-contained vs. unused-unknown
     • The Republicans won [the governorship of Virginia]bridging-contained. (expected / prototypical relationship)
     • He was convicted of helping to organise [the seizure [of Osama Moustafa Nasr]]unused-unknown from a Milan street in February 2003. (non-prototypical relationship, can't be separated)
     • # Speaking of Osama Moustafa Nasr, [the seizure] happened in 2003.

  29. Details: indef
     Subclasses: new, generic, partitive, resumptive
     • [A man]indef-new came in. He bought a pair of shoes.
     • [Serious beer drinkers]indef-generic should head straight to this 550-year-old institution.
     • At violent clashes between the police and demonstrating Kurds, [three demonstrators]indef-partitive were injured.
     • That's close to how a cancer vaccine works, but not precisely. Most experts see [cancer vaccines]indef-resumptive as a hybrid of treatment and prevention.

  30. Other
     • expletive
     • null: nobody, nothing
     • relative: non-restrictive relative clause
     • cataphor: can be indefinite or definite
