Analysis of geographic references
András Kornai, Beth Sundheim


Presentation Transcript


1. HLT/NAACL03 workshop, 31 May 2003
Analysis of geographic references
András Kornai, Beth Sundheim

2. Thanks to
• Sponsors: AQUAINT, TIDES
• Program committee: Doug Appelt, Merrick Lex Berman, Sean Boisen, Quintin Congdon, Jim Cowie, Doug Jones, Linda Hill, George Wilson
• Conference support: Ed Hovy, James Allen, Steven Abney, Dragomir Radev, Ali Hakim, Dekang Lin

3. Program
• 19 papers submitted, 12 accepted
• 2 invited speakers
• 2 discussion periods
• Authors asked to email presentations to geowkshp@kornai.com by end of day

4. Changes
• Afternoon invited speaker: Jerry Hobbs (ISI) replaces Randy Flynn (NIMA)
• Paper presentation ordering: Li et al. swapped with Manov et al. (9:30am vs. 12:10pm)
• Additional workshop event: Linda Hill (UCSB) poster during breaks

5. Workshop goals
• Exchange information on work in the analysis and grounding of place names and other forms of geographic reference
• Informally assess the state of the art in handling various aspects of the problem
• Identify ways to follow up on the workshop as a community

6. External resources
• Diversity across projects:
  • ADL, Tipster, NIMA/USGS, UN-LOCODE, TGN, GB Historical GIS, web, …
• Integrated resources:
  • KIM KB (Manov et al.), named entity word list in InfoXtract, extended multi-gazetteer MetaCarta db, …
• Net result: how happy are we with current resources and integration solutions?
  • With coverage of named places, richness of information, utility for NLP analysis as well as for grounding references?
  • With using a named entity finder as an analysis preprocessor?

7. Entity finding in text
• Some systems (for now) entirely manual
• Semi-automated (with human review)
• Fully automated:
  • FS template matching (sketched below)
  • (Weighted) rule-based
  • HMM-based
  • Confidence-based
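The fully automated approaches range from finite-state template matching to statistical tagging. As a minimal sketch of the finite-state end of that spectrum, the pattern below proposes capitalized token sequences as candidates and filters them against a toy gazetteer; PLACE_TEMPLATE and the GAZETTEER contents are invented for illustration and are not taken from any system presented at the workshop.

```python
import re

# Toy gazetteer: surface names mapped to candidate entries (invented data).
GAZETTEER = {
    "San Francisco": [{"feature": "city", "lat": 37.77, "lon": -122.42}],
    "Crimea": [{"feature": "region"}, {"feature": "cape"}],
}

# Finite-state-style template: an optional designator ("St.", "Mt.", "Ft.")
# followed by one or more capitalized tokens.
PLACE_TEMPLATE = re.compile(r"\b(?:(?:St|Mt|Ft)\.\s+)?(?:[A-Z][a-z]+\s?)+")

def find_candidate_places(text):
    """Yield (span, surface form) pairs that match the template
    and occur in the gazetteer."""
    for m in PLACE_TEMPLATE.finditer(text):
        surface = m.group().strip()
        if surface in GAZETTEER:          # cheap gazetteer filter
            yield m.span(), surface

text = "The ship left San Francisco and sailed toward Crimea."
print(list(find_candidate_places(text)))
```

Real systems layer weighted rules or HMM scores on top of this kind of candidate generation; the template here only generates candidates.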

8. Disambiguation
• What do we mean?
  • Discrimination between names of places and other types of names
  • Disambiguation of place reference by location of place
  • Disambiguation of place reference by type of place
• How well do current techniques work, and what hard problems remain?
  • Relative difficulty given texts about the U.S., detailed location references, historical texts
  • Relation to the general word sense disambiguation problem
  • Use of non-local descriptive references, coreference, …
  • Co-occurrence of names with non-spatial clue terms (“San Francisco” and “earthquake”); a sketch of this idea follows this slide
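The last bullet, co-occurrence with non-spatial clue terms, lends itself to a short sketch: score each gazetteer candidate for an ambiguous name by how many of its clue terms occur in the document, and abstain when none do. The candidate entries and clue lists below are assumptions made up for illustration.

```python
# Candidate gazetteer entries for an ambiguous name, each with clue
# terms that tend to co-occur with that reading (invented data).
CANDIDATES = {
    "San Francisco": [
        {"entry": "San Francisco, California, USA",
         "clues": {"earthquake", "bay", "california"}},
        {"entry": "San Francisco, Cordoba, Argentina",
         "clues": {"cordoba", "argentina"}},
    ],
}

def disambiguate(name, doc_tokens):
    """Pick the candidate whose clue terms best overlap the document."""
    doc = {t.lower() for t in doc_tokens}
    scored = [(len(c["clues"] & doc), c["entry"]) for c in CANDIDATES[name]]
    score, entry = max(scored)
    return entry if score > 0 else None   # abstain when no clue fires

doc = "The earthquake shook the bay area near San Francisco".split()
print(disambiguate("San Francisco", doc))  # San Francisco, California, USA
```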

9. Disambiguation (2)
Observations from the Nov. ’02 name annotation round:
• For the 80% of name instances that could be linked to a gazetteer entry, evidence from local context was enough to pick the corresponding entry in over 75% of cases
  • This augurs well for successful automation
• For the remaining 20% of name instances, no gazetteer linkage could be made: either the name did not appear in the gazetteer at all (the majority of cases), or it appeared there only in the wrong sense
  • This lack of gazetteer coverage presents a significant challenge

10. Failure modes (1)
• Lack of complete match on name (a normalization sketch follows slide 11)
  • St. Petersburg – no variant in gazetteer with “St[.]”
• Multiple acceptable entries
  • [the] Crimea – one for “regions”, one for “capes”
• Transliteration differences
  • Sheremetyevo -> Sheremet’yevo
  • Belarus -> Byelarus
• Mismatch on feature type
  • Simferopol, Vladikavkaz – “capital” in doc, but not in gazetteer

11. Failure modes (2)
• Many matching entries, but no clear winner
  • Prigorodny – 16 hits on Prigorod (many in Russia)
• No entry for general places
  • Asia – no entry in gazetteer
• Variant name missing from entry
  • America – no match in gazetteer (i.e., not a listed variant)
• Name in doc matches wrong entry in gazetteer
  • The Heavenly Ski Resort – exactly matches entry with BUILDING feature, but the correct entry is under Heavenly Valley Ski Area (with LOCALE feature in USGS GNIS and “sports facilities” feature in ADL gazetteer)
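Two of the failure modes above, the missing “St[.]” variant and transliteration differences, can be partially addressed by normalizing both the document form and the gazetteer form to a common key before matching. The rules below are illustrative assumptions, not a complete solution.

```python
import re

def normalize(name):
    """Reduce a place name to a crude matching key (illustrative rules)."""
    key = name.lower()
    key = re.sub(r"\bst\.\s+", "saint ", key)  # "St. Petersburg" -> "saint petersburg"
    key = key.replace("’", "").replace("'", "")  # drop apostrophes: Sheremet'yevo
    key = key.replace("ye", "e")               # Byelarus -> Belarus (very rough)
    return re.sub(r"\s+", " ", key).strip()

# Document form and gazetteer form now map to the same key.
assert normalize("St. Petersburg") == normalize("Saint Petersburg")
assert normalize("Sheremetyevo") == normalize("Sheremet’yevo")
assert normalize("Byelarus") == normalize("Belarus")
```

Note that the last rule is dangerously broad (it also rewrites “yevo” endings); in practice transliteration variants are better handled by listing them explicitly in the gazetteer entry itself.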

12. Foreign language
• Example: TIDES surprise language exercise
• Challenge: develop resources and NLP tools for a foreign language in a month (June)
• Can’t expect to find an existing placename gazetteer for this language
• The language is likely to have a non-western script; ease of transliteration unpredictable

13. Community
• Offerings from SPAWAR Systems Center:
  • Annotated corpora available to those with licenses for the source texts, along with the annotation protocol
  • “Modernized” (with respect to diacritics) Tipster gazetteer available upon request
• Call for papers:
  • Special issue of the TALIP journal on temporal and spatial information processing (editors: Mani, Pustejovsky, Sundheim)
  • Submissions due December 1 – think about it!

14. Tagging
• Finding the entity in text
• Disambiguation
  • Type assignment
• Grounding (sketched below)
  • Linking to a unique gazetteer entry
  • Assigning coordinates
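A minimal sketch of the grounding step, assuming a gazetteer keyed by normalized name: the mention is linked only when it resolves to a unique entry, and coordinates are inherited from that entry. The gazetteer contents, the id scheme, and the abstain-on-ambiguity policy are assumptions for illustration.

```python
# Toy gazetteer keyed by lowercased name (invented entry and id scheme).
GAZETTEER = {
    "simferopol": [
        {"id": "UA-43-001", "feature": "city", "lat": 44.95, "lon": 34.10},
    ],
}

def ground(surface):
    """Link a tagged mention to a unique gazetteer entry, if any."""
    entries = GAZETTEER.get(surface.lower(), [])
    if len(entries) == 1:                 # unique match: ground it
        e = entries[0]
        return {"entry_id": e["id"], "coords": (e["lat"], e["lon"])}
    return None                           # ambiguous or unknown: leave ungrounded

print(ground("Simferopol"))
# {'entry_id': 'UA-43-001', 'coords': (44.95, 34.1)}
```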

15. Annotation standards
• Example: Automatic Content Extraction (ACE) – see the sketch below
  • XML-based
  • Levels: mentions (instances), entities, inter-entity relations
  • Types of mentions: names, nominals (descriptive references), pronouns
  • Entity categories with respect to places: LOCATION, FACILITY, GEOPOLITICAL ENTITY (GPE)
  • Each category has defined subtypes (new)
  • Scheme allows for metonymic usage and fuzzy meaning
• Software tools to support manual annotation, output format transformation, annotation lookup and review
• Entity and relation schemes could/should be elaborated further over time
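To make the levels concrete, the sketch below builds an ACE-like record for one GPE entity with a single name mention. The tag and attribute names are invented to mirror the levels listed above; they do not reproduce the actual ACE DTD.

```python
import xml.etree.ElementTree as ET

# One entity (category GPE) with one name-type mention and its text extent.
# Element and attribute names are hypothetical, chosen to mirror the
# mention/entity levels described on the slide.
entity = ET.Element("entity", id="E1", category="GPE")
mention = ET.SubElement(entity, "mention", id="E1-M1", type="NAME")
extent = ET.SubElement(mention, "extent", start="17", end="30")
extent.text = "San Francisco"

print(ET.tostring(entity, encoding="unicode"))
# Output (wrapped for readability): <entity id="E1" category="GPE">
#   <mention id="E1-M1" type="NAME"><extent start="17" end="30">
#   San Francisco</extent></mention></entity>
```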

  16. Volume and pressure

17. Conclusions
• Procedural input sought from participants: shall we summarize at the end?
• Who is “we”?
  • Organizers?
  • Session chairs?
  • Committee members?
  • Panel?
