Generative Models of Discourse Eugene Charniak Brown Laboratory for Linguistic Information Processing (BLLIP)
Joint Work With • Micha Elsner (PhD student, Brown) • Joseph Osterwile (Ex Undergraduate, Brown)
Abstract Discourse, the study of how the meaning of a document is built out of the meanings of its sentences, is the inter-sentential analogue of semantics. In this talk we consider the following abstract problem in discourse. Given a document, randomly permute the order of the sentences and then attempt to distinguish the original from the permuted version. We present a sequence of generative models that can handle the problem with increasing accuracy. Each model accounts for some aspect of the document, and assigns a probability to the document's contents. In the standard generative way the subsequent models simply multiply individual probabilities to get their results. We also discuss the linkage of this abstract task to more realistic ones such as essay grading, document summarization and document generation.
NOTICE! This example is doctored to illustrate the program. You can ask me about the real randomized abstract if you like. Revised Abstract: We present a sequence of generative models that can handle the problem with increasing accuracy. Each model accounts for some aspect of the document, and assigns a probability to the document's contents. Given a document, randomly permute the order of its sentences and then attempt to distinguish the original from the permuted version. In the standard generative way the subsequent models simply multiply individual probabilities to get their results. In this talk we consider the following abstract problem in discourse. We also discuss the linkage of this abstract task to more realistic ones such as essay grading, document summarization and document generation. Discourse, the study of how the meaning of a document is built out of the meanings of its sentences, is the inter-sentential analogue of semantics.
A Note on “Generative” When we talk about a “generative” model we do NOT mean a model that actually generates language. (If we do mean that, we will say “literally generate”.) Rather, “generative” is used in machine learning to talk about a model that assigns a probability to the input. So “generate” = “assign a probability to”.
Our Three Models • So each of our three models assigns a probability to some aspect of the input (head-nouns, pronouns, and noun-phrase syntax, respectively). • The idea is that the probability assigned to the original document should be higher than that assigned to the random one. • One advantage of such generative models is that if done correctly, they can be combined by just multiplying their probabilities together. This is, in fact, exactly what we do.
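The combination step described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Model` interface, the function names, and the toy word-overlap model are all assumptions made for the example. Multiplying probabilities is done by adding log-probabilities to avoid underflow.

```python
import math
from typing import Callable, Sequence

Document = Sequence[str]               # an ordered list of sentences
Model = Callable[[Document], float]    # hypothetical interface: returns a log-probability

def combined_log_prob(doc: Document, models: Sequence[Model]) -> float:
    """Multiply the models' probabilities, i.e. add their log-probabilities."""
    return sum(model(doc) for model in models)

def pick_original(doc_a: Document, doc_b: Document, models: Sequence[Model]) -> Document:
    """Return whichever ordering the combined model finds more probable."""
    if combined_log_prob(doc_a, models) >= combined_log_prob(doc_b, models):
        return doc_a
    return doc_b

def toy_overlap_model(doc: Document) -> float:
    """Toy stand-in (NOT one of the talk's models): rewards a sentence
    that shares at least one word with the sentence before it."""
    score = 0.0
    for prev, cur in zip(doc, doc[1:]):
        shared = set(prev.lower().split()) & set(cur.lower().split())
        score += math.log(0.8 if shared else 0.2)
    return score

original = ["we study discourse", "discourse has models", "models assign probability"]
permuted = [original[2], original[0], original[1]]
guess = pick_original(permuted, original, [toy_overlap_model])
```

In the talk's setting, each of the three models would take the place of `toy_overlap_model`, and the list passed to `pick_original` would contain all of them.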
More Formally We generate each sentence conditioned on the previous sentences. For each sentence we compute three probabilities: head nouns, pronouns, and NP syntax.
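The slide's equation did not survive in this transcript. One way to write the factorization the text describes is shown below; the notation is an assumption (D is the document, S_i its i-th sentence, and the three subscripted factors stand for the head-noun, pronoun, and NP-syntax models), not the authors' original formula.

```latex
\[
P(D) \;=\; \prod_{i=1}^{n} P(S_i \mid S_1, \ldots, S_{i-1})
\;\approx\; \prod_{i=1}^{n}
  P_{\mathrm{head}}(S_i \mid S_1, \ldots, S_{i-1}) \,
  P_{\mathrm{pron}}(S_i \mid S_1, \ldots, S_{i-1}) \,
  P_{\mathrm{NP}}(S_i \mid S_1, \ldots, S_{i-1})
\]
```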
Generative Models of Discourse I Introduction II Model 1 – Head Nouns (Entity Grids) III Model 2 - Pronominal Reference IV Model 3 – Noun-Phrase Syntax V Real Problems (Future Work)
Nouns Tend to Repeat Discourse, the study of how the meaning of a document is built out of the meanings of its sentences, is the inter-sentential analogue of semantics. In this talk we consider the following abstract problem in discourse. Given a document, randomly permute the order of the sentences and then attempt to distinguish the original from the permuted version. We present a sequence of generative models that can handle the problem with increasing accuracy. Each model accounts for some aspect of the document, and assigns a probability to the document's contents. In the standard generative way the subsequent models simply multiply individual probabilities to get their results. We also discuss the linkage of this abstract task to more realistic ones such as essay grading, document summarization and document generation.
Entity Grids • Following Barzilay, Lapata, and Lee, an entity grid is an array with the “entities” (really just the head nouns) of the document on one axis, the sentence ordering on the other, and at each point the role the entity plays in the sentence. As in previous work we limit the roles to subject (S), object (O), other (X) and not mentioned (-).
A (Partial) Entity Grid
Sentence    1 2 3 4 5 6 7
Discourse   S X - - - - -
Meaning     X - - - - - -
Document    X - X - X - -
Sentences   X - X - - - -
Talk        - X - - - - -
Problem     - O - O - - -
Order       - - O - - - -
Original    - - X - - - -
Version     - - X - - - -
Models      - - - X - S -
The Grid for the Randomized Document
Sentence    1 2 3 4 5 6 7
Discourse   - - - - X - S
Meaning     - - - - - - X
Document    - X X - - - X
Sentences   - - X - - - X
Talk        - - - - X - -
Problem     O - - - O - -
Order       - - O - - - -
Original    - - X - - - -
Version     - - X - - - -
Models      X - - S - - -
The Basic E-grid Probability For head-noun probabilities we look at each head noun's probability given its two-sentence history (what roles (S, O, X, -) it filled in the two previous sentences). The product runs over each noun n in the sentence, and the history terms are the roles n plays in the (i-1)th and (i-2)th sentences.
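A rough sketch of the two-sentence-history idea follows. It is not the authors' code: the grid representation (a dict from head noun to a string of roles), the padding of the history with "-" before the document starts, the choice to score every grid cell including "not mentioned", and the add-alpha smoothing are all assumptions made to keep the example small.

```python
import math
from collections import Counter, defaultdict

ROLES = ["S", "O", "X", "-"]

def train(grids):
    """Count role transitions: (role two back, role one back) -> current role."""
    counts = defaultdict(Counter)
    for grid in grids:
        for roles in grid.values():
            padded = ["-", "-"] + list(roles)   # assumed: no mention before sentence 1
            for i in range(2, len(padded)):
                counts[(padded[i - 2], padded[i - 1])][padded[i]] += 1
    return counts

def log_prob(grid, counts, alpha=1.0):
    """Sum of log P(role_i | role_{i-1}, role_{i-2}) over all entities and sentences,
    with add-alpha smoothing (an assumption, not taken from the talk)."""
    total = 0.0
    for roles in grid.values():
        padded = ["-", "-"] + list(roles)
        for i in range(2, len(padded)):
            history = counts.get((padded[i - 2], padded[i - 1]), Counter())
            numerator = history[padded[i]] + alpha
            denominator = sum(history.values()) + alpha * len(ROLES)
            total += math.log(numerator / denominator)
    return total

# A few rows of the partial grid from the slides, one role per sentence.
abstract_grid = {
    "discourse": "SX-----",
    "document":  "X-X-X--",
    "problem":   "-O-O---",
    "models":    "---X-S-",
}
toy_counts = train([abstract_grid])          # toy training set of one document
abstract_score = log_prob(abstract_grid, toy_counts)
```

With transition counts estimated from a large parsed corpus rather than one toy grid, the same `log_prob` call would be applied to the original and the permuted grid and the two scores compared.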
Model 1 Results Baseline 50% Model 1 82.2% Trained on 10,000 automatically parsed documents from the NTC corpus, tested on 1323 other documents from the same corpus.
Generative Models of Discourse I Introduction II Model 1 - Entity Grids III Model 2 - Pronominal Reference IV Model 3 – Noun-Phrase Syntax V Real Problems (Future Work)
Can Pronouns Help? • In our abstract the only important pronouns have intra-sentential antecedents. • Furthermore, when the document is out of order, there will almost always be something for the pronoun to point back to. • As we will see, pronouns are the weakest of our models, but they do help.
Adding Pronouns to the Mix To handle pronouns we need to consider the various pronoun resolution possibilities, summing over all of them. Unfortunately this sum is intractable, so we approximate it with its single most probable set of reference assignments. This is reasonable because most documents have only one set of reference assignments that makes sense.
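The two equations on this slide are missing from the transcript. A plausible reading of the text, written with assumed notation (a ranges over complete sets of antecedent assignments for the pronouns in sentence S_i), is that the sum over assignments is replaced by its largest term:

```latex
\[
P(S_i \mid S_1, \ldots, S_{i-1})
\;=\; \sum_{a} P(S_i, a \mid S_1, \ldots, S_{i-1})
\;\approx\; \max_{a} P(S_i, a \mid S_1, \ldots, S_{i-1})
\]
```

If one assignment carries almost all of the probability mass, as the slide argues, the maximum is close to the full sum.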
The Probability of an Antecedent and the Pronoun Given the Antecedent • Probability that the antecedent is a, given how far away a is and how often it has been mentioned. • Probability of the pronoun's number given the antecedent. • Probability of the pronoun's gender given the antecedent.
Example Pronoun Probabilities • P(ref=x | x is 1 back and appeared 1 time) = 0.25 • P(ref=x | x is 1 back and appeared >4 times) = 0.86 • P(“asbestos” is neuter) = 0.998 • P(“alice” is feminine) = 0.84 • P(“it” has a plural antecedent) = 0.04
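To make the three factors concrete, here is a small sketch that scores candidate antecedents for the pronoun "it" and keeps the best one. The `Candidate` class and function names are hypothetical. The values 0.25, 0.86, and 0.998 come from the slide; the number probabilities and the chance that "alice" is neuter are made-up placeholders, and pairing these particular words with these particular distances is also invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    word: str
    p_antecedent: float  # P(antecedent = a | distance, mention count)
    p_number: float      # P(pronoun number | a)
    p_gender: float      # P(pronoun gender | a)

def score(c: Candidate) -> float:
    """Product of the three factors listed on the slide."""
    return c.p_antecedent * c.p_number * c.p_gender

def best_antecedent(candidates):
    """Keep only the most probable antecedent (the max approximation)."""
    return max(candidates, key=score)

# Candidates for the pronoun "it".
candidates = [
    # 0.25 and 0.998 are from the slide; 0.9 is a placeholder.
    Candidate("asbestos", p_antecedent=0.25, p_number=0.9, p_gender=0.998),
    # 0.86 is from the slide; 0.9 and 0.05 are placeholders.
    Candidate("alice", p_antecedent=0.86, p_number=0.9, p_gender=0.05),
]
choice = best_antecedent(candidates)
```

Even though "alice" is the more salient candidate here, the gender factor makes "asbestos" the better antecedent for "it".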
Model 2 Results Model 1 82.2% Model 2 71.3% Model 1+2 85.3%
Pronoun Reference vs. Discourse Modeling
                             Best Gender Model   Weak Gender Model
Model 2 Discourse            71.3%               66.7%
Pronoun Reference Accuracy   79.1%               75.5%
Generative Models of Discourse I Introduction II Model 1 - Entity Grids III Model 2 - Pronominal Reference IV Model 3 – Noun-Phrase Syntax V Real Problems (Future Work)
Abstract Discourse, the study of how the meaning of a document is built out of the meanings of its sentences, is the inter-sentential analogue of semantics. In this talk we consider the following abstract problem in discourse. Given a document, randomly permute the order of the sentences and then attempt to distinguish the original from the permuted version. We present a sequence of generative models that can handle the problem with increasing accuracy. Each model accounts for some aspect of the document, and assigns a probability to the document's contents. In the standard generative way the subsequent models simply multiply individual probabilities to get their results. We also discuss the linkage of this abstract task to more realistic ones such as essay grading, document summarization and document generation.
Distinctions Between First and Non-First Mentions • The first mention of an entity tends to have more deeply embedded syntax, • It is longer at every level of embedding, • Uses the determiner “a” more often, • Uses certain key words more or less often. E.g., most newspapers seem to follow the convention that “John Doe” will be followed by “Mr. Doe”.
Using This Information • We assume that the first time a particular head noun occurs is the first mention, and all subsequent uses are non-first. • We have a generative model of the noun-phrase syntax/key-words that should pick out the correct ordering.
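The first-mention assumption in the first bullet is simple enough to sketch directly. The function name, the input format (one list of head nouns per sentence), and the lowercasing are assumptions made for illustration.

```python
def label_mentions(sentences):
    """Label each head noun 'first' the first time it occurs in the document
    and 'nonfirst' on every later occurrence."""
    seen = set()
    labeled = []
    for heads in sentences:
        row = []
        for head in heads:
            key = head.lower()   # assumed: case-insensitive match on the head noun
            row.append((head, "nonfirst" if key in seen else "first"))
            seen.add(key)
        labeled.append(row)
    return labeled

# Head nouns from the first two sentences of the abstract (hand-picked).
heads_per_sentence = [
    ["discourse", "meaning", "document", "sentences"],
    ["talk", "problem", "discourse"],
]
labels = label_mentions(heads_per_sentence)
```

Permuting the sentences changes which occurrence of a head noun gets the `first` label, which is what lets a model of noun-phrase syntax prefer one ordering over another.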
Generative NP Syntax l ∈ {first, nonfirst}. h = height. The probability of a larger h will be higher for l=first. s is either a non-terminal or a key word.
A Simple Example Tree: (NP (DET the) (NOUN document)). P(the | start, h=1, l) is high for l=nonfirst. P(h=1 | l) is high for l=nonfirst.
Model 3 Results Model 1 82.2% Model 1+2 85.3% Model 3 86.2% Model 1+3 89.1% Model 1+2+3 90.3%
Generative Models of Discourse I Introduction II Model 1 - Entity Grids III Model 2 - Pronominal Reference IV Model 3 – Noun-Phrase Syntax V Real Problems (Future Work)
Future Models • Next week: Probabilistic choice of pronoun/full-NP. • Next month: Insert quotations. (Almost) never in first sentence. Usually clustered together. • Next year: Temporal relations between sentences, relations between verbs, different kinds of descriptions.
Real Problems -er • Given an abstract representation of what we know about the entities in the document, (really) generate the words for those entities • Given the sentences of two documents, and the first sentence of one of them, pick out the rest of the sentences of that document. • The same, but with 10 documents on (roughly) the same topic.