Zero Pronoun Resolution in Japanese Jeffrey Shu Ling 575 Discourse and Dialogue
Review of Zero Pronouns • (Pro)nouns often dropped if pragmatically/semantically inferable from context • General preference for names/nouns rather than pronouns in polite/formal speech (except 1st person and demonstrative pronouns) • Grammatical markers removed with nouns • Pronominal referent may or may not appear in discourse prior to appearance of zero pronoun
Overview of Objectives • Find general syntactic rules that can be used to identify the presence of zero pronouns • Find syntactic/semantic/pragmatic clues to identify referents for ZPs, or determine appropriate pronoun/noun insertion if not explicitly found in textual context • Determine priority in case of conflict
Syntactic Identification of Zero Pronouns • Determining the presence of a zero pronoun is fairly straightforward for subjects/topics and objects • Determine syntactic argument structure of verb • Identify whether a noun exists to fill each argument (simple task with grammatical markers) • Grammatical markers are sometimes omitted in casual speech • Not so straightforward for other nouns (e.g. locatives) • Might not be important if unstated (e.g. “I’ll go to the store” vs. “I’ll go”). An unstated locative usually has a precedent.
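The identification steps above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the `CASE_FRAMES` table, the function name `find_zero_pronouns`, and the rule that a は-marked topic fills the subject slot are all assumptions made for the example, and the input is assumed to be already segmented into (noun, particle) pairs.

```python
# Hypothetical case frames: the particles each verb is assumed to require.
CASE_FRAMES = {
    "食べる": ["が", "を"],  # "eat": subject and direct object
    "行く": ["が"],          # "go": subject only; the goal is treated as optional
}


def find_zero_pronouns(arguments, verb):
    """Return the required particles of `verb` that no overt noun fills.

    `arguments` is a list of (noun, particle) pairs found in the clause.
    Each missing particle marks a slot where a zero pronoun is assumed.
    """
    required = CASE_FRAMES[verb]
    present = {particle for _, particle in arguments}
    # Simplifying assumption: a topic marked with は fills the subject slot.
    if "は" in present:
        present.add("が")
    return [particle for particle in required if particle not in present]


# "りんごを食べる" ("eat an apple"): the subject is missing.
print(find_zero_pronouns([("りんご", "を")], "食べる"))
```

A sketch like this only works when particles are present; as the slide notes, casual speech often drops them, so a real system would need another way to assign roles.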
Findings: Some Generalizations from Semantics/Syntax • In consecutive statements with ZPs, the topic/subject is often the same • Many verbs have a semantic preference for certain types of nouns (e.g. animate vs. inanimate), which can narrow the possibilities • Imperatives are always 2nd person (unless speaker is talking to him/herself) • Imperatives rarely have an explicit subject/topic
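The verb-preference idea can be illustrated with a small filter. The following is only a sketch under stated assumptions: the `Candidate` type, the `SUBJECT_ANIMACY` table, and its two entries are hypothetical and chosen for illustration, and animacy is the only feature considered.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    animate: bool


# Hypothetical preference table: True means the verb's subject is assumed
# to be animate, False means it is assumed to be inanimate.
SUBJECT_ANIMACY = {
    "食べる": True,   # "eat"
    "壊れる": False,  # "break" (intransitive)
}


def filter_by_animacy(candidates, verb):
    """Keep only the candidates whose animacy matches the verb's preference.

    Verbs without an entry in the table place no restriction on the subject.
    """
    wanted = SUBJECT_ANIMACY.get(verb)
    if wanted is None:
        return list(candidates)
    return [c for c in candidates if c.animate == wanted]


candidates = [Candidate("太郎", True), Candidate("りんご", False)]
print([c.text for c in filter_by_animacy(candidates, "食べる")])
```

Treating the preference as a hard filter is a simplification; since the slide speaks of a preference rather than a requirement, a weighting scheme might fit better in practice.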
Findings: Conversations • 1P by far most commonly used pronoun • If ZP in statement is 1P, usually has an explicit precedent • If 2P, it may or may not have an explicit precedent • Question/answer format generalizations • Subject/topic of question usually 2/3P • If 1P, often directly stated • Subject of answer almost always opposite pronoun of question (2/3P vs. 1P), same referent (i.e. same person) • 3P both simplest and potentially most complicated • 3P personal pronouns (e.g. he/she) very rarely used • If 3P ZP, almost always preceded by explicit reference to name/noun • Gender indeterminate (possible problem for MT)
Findings: Domain Specific • Formal situations usually dictate specific conventions to be followed • Representatives almost never use 1P singular or 3P personal pronouns • 2P ZPs can be inferred from domain context (a corporate statement would probably be addressing its customers or investors)
Findings: Domain Specific • Reference articles (e.g. Wikipedia) • While ZPs are commonly used, the antecedent is usually found no more than a few sentences prior • ZPs in consecutive sentences usually have the same referent • If no clear antecedent can be found, the subject of the article is often a reasonable assumption • 1st and 2nd person virtually non-existent, can be safely eliminated as possibilities • Exceptions: reader-addressed texts (e.g. reference guides) • News articles follow similar conventions
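The reference-article heuristic could be sketched as a short backward search with a fallback. The function name `resolve_in_reference_article` and the default `window=3` are assumptions for this example; the slide says only "a few sentences", so the window size is a guess that would need tuning.

```python
def resolve_in_reference_article(prior_subjects, article_subject, window=3):
    """Guess the referent of a zero-pronoun subject in a reference article.

    `prior_subjects` holds the explicit subject of each earlier sentence,
    oldest first, with None where the subject was itself a zero pronoun.
    The search looks back at most `window` sentences and otherwise falls
    back to the subject of the article.
    """
    for subject in reversed(prior_subjects[-window:]):
        if subject is not None:
            return subject
    return article_subject


# The most recent explicit subject within the window is reused.
print(resolve_in_reference_article(["東京", None, None], "日本"))
# Nothing explicit within the last three sentences: fall back to the article subject.
print(resolve_in_reference_article(["東京", None, None, None], "日本"))
```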
Findings: Domain Specific • Visual media (e.g. comics, TV, etc.) • Most problematic • Referent very often in visual context with no textual context • Heavy reliance on visuals results in not only (pro)noun dropping, but “anything” dropping (including verbs), losing syntactic information • Quite possibly impossible to resolve without human intervention • Purely textual works (e.g. novels) usually have enough information
Priority • Domain has highest priority in all situations • Rules for separate domains largely mutually exclusive • Failure to determine pronouns appropriate to the domain can lead to misinformation • Japanese particularly sensitive to social context • Generic pronoun insertion may be highly inappropriate • Simple domain information can be extremely valuable • Unless ruled out by domain, general conversation rules may be applicable to many different media
Priority • Pragmatics/semantics > syntax • Ex: Question/answer conventions are of greater relevance than the assumption that the subject of the previous sentence is the same as the subject of the current sentence • Ex: A verb's semantic preference for certain types of nouns is of greater relevance than the grammatical role of the ZP (e.g. the naïve assumption that direct objects tend to be inanimate)
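One way to picture this ordering is a list of rules tried in priority order, where the first rule that produces a guess wins. The sketch below is illustrative only: the rule functions, the context keys (`previous_was_question`, `previous_subject`), and the person labels are hypothetical, and domain rules, which would sit ahead of both rules shown, are left out for brevity.

```python
def question_answer_rule(ctx):
    # Pragmatic convention: a question with a 2P subject is usually
    # answered with a 1P subject.
    if ctx.get("previous_was_question") and ctx.get("previous_subject") == "2P":
        return "1P"
    return None


def same_subject_rule(ctx):
    # Syntactic default: reuse the subject of the previous sentence.
    return ctx.get("previous_subject")


# Highest priority first: pragmatics/semantics before syntax.
RULES = [question_answer_rule, same_subject_rule]


def resolve_subject(ctx):
    """Return the first guess produced by the rules, or None if none applies."""
    for rule in RULES:
        guess = rule(ctx)
        if guess is not None:
            return guess
    return None


# After a 2P question, the pragmatic rule overrides the same-subject default.
print(resolve_subject({"previous_was_question": True, "previous_subject": "2P"}))
```

Keeping the rules in an explicit list makes the priority visible and easy to reorder when a new domain calls for different conventions.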
An Idea • Instead of inserting “best guess” pronouns, provide a selection of best candidates in text for user to disambiguate • In current MT systems that insert generic pronouns, users have to “interpret” (guess) what is really meant anyway • Insertion of pronouns is never 100% certain • Some media (visual) require human intervention • Insertion of pronouns/nouns can lead to misinformation, faux pas, and sense of unreliability of system • It is much faster to pick out of a set of provided candidates rather than guess whether the pronoun is right or wrong, and go back to try to figure out what is going on
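The candidate-selection idea might look something like the sketch below. The scores, the function names `top_candidates` and `render_choices`, and the bracketed output format are all assumptions made for illustration; how candidates would be scored is not specified here.

```python
def top_candidates(scores, k=3):
    """Return the k highest-scoring candidate referents, best first.

    `scores` maps each candidate string to a numeric score from some
    upstream resolver (hypothetical here).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def render_choices(candidates):
    """Format the candidates as an inline choice for the reader to pick from."""
    return "[" + " / ".join(candidates) + "]"


scores = {"he": 0.2, "I": 0.5, "you": 0.3}
print(render_choices(top_candidates(scores, k=2)) + " will go.")
```

Showing the alternatives inline keeps the uncertainty visible to the user instead of hiding it behind a single guessed pronoun.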