Noun Phrase Translation: a discriminative approach
Grazia Russo-Lassner, Ling 895
Classification in a nutshell
• Statistical classification: given a training set of objects, each with a class label, a classifier seeks to learn the relationship between the objects and their class labels so that new observations, whose class labels are unknown, can be assigned to a class (a minimal sketch follows this list).
• Examples of classification tasks in NLP:
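A minimal sketch of this setup, using scikit-learn's LogisticRegression on toy feature vectors; the data and labels below are invented purely for illustration.

```python
# Toy illustration of statistical classification: learn a mapping from
# feature vectors to class labels, then assign labels to new observations.
from sklearn.linear_model import LogisticRegression

# Training set: each object is a feature vector with a known class label.
X_train = [[0.2, 1.0], [0.4, 0.8], [0.9, 0.1], [0.8, 0.3]]
y_train = ["good", "good", "bad", "bad"]

clf = LogisticRegression()
clf.fit(X_train, y_train)          # learn the relationship features -> label

X_new = [[0.3, 0.9], [0.95, 0.2]]  # new observations with unknown labels
print(clf.predict(X_new))          # assign each to a class
```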
Discriminative vs. generative approach
• Training a classifier means estimating f: X → Y, or P(Y|X).
• Generative (or joint) classifiers learn probabilities over the observed data and the hidden parameters, i.e. P(X,Y).
  • Examples: n-gram models, Naïve Bayes classifiers, HMMs, PCFGs.
• Discriminative (or conditional) classifiers learn the probabilities of the hidden parameters given the data, i.e. P(Y|X), without assuming anything about the input distribution P(X).
  • Examples: neural networks, logistic regression, maximum entropy models, SVMs, perceptrons.
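A small sketch contrasting the two families on the same toy data, assuming scikit-learn's GaussianNB as the generative model and LogisticRegression as the discriminative one; the data are invented for illustration.

```python
# Generative vs. discriminative on identical toy data: GaussianNB models the
# joint P(X, Y) via per-class feature distributions and a class prior, while
# LogisticRegression models the conditional P(Y | X) directly.
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X = [[0.1, 1.2], [0.3, 0.9], [1.0, 0.2], [0.8, 0.1]]
y = [1, 1, 0, 0]

generative = GaussianNB().fit(X, y)              # estimates P(X | Y) and P(Y)
discriminative = LogisticRegression().fit(X, y)  # estimates P(Y | X) directly

x_new = [[0.5, 0.5]]
print(generative.predict_proba(x_new))     # probabilities derived from the joint model
print(discriminative.predict_proba(x_new)) # conditional model, no assumption about P(X)
```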
Discriminative reranking in NP translation
1. For a source phrase s, produce an n-best list of translation candidates c1, c2, …, cn with a baseline MT system.
2. Represent each <s, c> pair as a set of features which express evidence about the observed phrase pair and the class we want to predict (good vs. bad translation).
3. Train a discriminative classifier to recognize a good translation and produce a statistical model.
4. Reorder the n-best list using the statistical model from step (3) (a sketch of this step follows below).
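A sketch of the reranking step. Here `extract_features` and `classifier` are hypothetical stand-ins for the feature extractor and the trained discriminative classifier, and the classifier is assumed to expose a scikit-learn-style `predict_proba` with the "good" class at index 1.

```python
def rerank(source_phrase, nbest_candidates, extract_features, classifier):
    """Return the n-best list sorted by the classifier's 'good' probability."""
    scored = []
    for cand in nbest_candidates:
        fv = extract_features(source_phrase, cand)     # features for the <s, c> pair
        p_good = classifier.predict_proba([fv])[0][1]  # assumed: P(good | features) at index 1
        scored.append((p_good, cand))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored]
```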
NP reranking: example
[Figure: side-by-side n-best lists for one source NP. The baseline MT system ranks garbled candidates such as "delegations institutions", "delegations institutional '", "delegations affairs", "'s delegations institutions", "delegations are institutions", "interinstitutional institutions", "organisations institutions", "agencies institutions", "clamouring institutions ?", and "institutions questioned" near the top; after applying the reranking model, the correct translation "institutional delegations" is promoted to the top of the list.]
Pros and cons of reranking in MT
• Pros:
  • Exploiting global features not available in the baseline system
  • Keeping the baseline system simple
  • Keeping decoding complexity low
• Cons:
  • The system's improvement depends on what is available in the n-best list
NP Translation architecture: training
[Diagram: parallel text is processed by NP bracketers and NP alignment to create training data (s, e); adaptive selection and labeling yield (s, e1, good), (s, e2, bad), …, (s, en, bad); feature extraction (using web counts from the WWW) turns these into feature vectors (<fv1>, good), …, (<fvn>, bad), which feed classifier training to produce the translation classifier.]
Creating labeled training items: positive evidence
[Diagram: the same training pipeline, highlighting that positive examples (s, e, good) come from NP alignment over the parallel text.]
Creating labeled training items: negative evidence (pseudo-supervised approach)
[Diagram: for each source phrase s, the decoder produces candidate translations e1 … en; candidates that do not match the aligned gold phrase are labeled (s, e, bad), alongside the single positive example (s, e, good).]
• Class imbalance problem: negative evidence outnumbers positive evidence!
• Solution: among the negative evidence, pick candidate translations that are closest in terms of minimum edit distance to the correct phrase (a sketch follows below).
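A sketch of this negative-evidence filter; `edit_distance` here is a plain word-level Levenshtein distance, and the selection criterion is an illustration of the idea rather than the exact procedure used in the system.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two phrases."""
    a, b = a.split(), b.split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        curr = [i]
        for j, wb in enumerate(b, 1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1]

def select_negatives(gold_phrase, candidates, k):
    """Keep the k incorrect candidates closest to the gold translation."""
    wrong = [c for c in candidates if c != gold_phrase]
    wrong.sort(key=lambda c: edit_distance(c, gold_phrase))
    return wrong[:k]
```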
Feature extraction
Each (s, e) pair is represented by the following features (a sketch of assembling the feature vector follows below):
• Target Web Counts: TWC(e)
• Source Web Counts: SWC(s)
• TWC(e)/SWC(s)
• Target Token Count: TTC(e)
• Source Token Count: STC(s)
• TTC(e)/STC(s)
• LM_decoder(s, e)
• TM_decoder(s, e)
• LM*TM_decoder(s, e)
• Web Context Similarity (s, e)
• Tralex score
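A sketch of how such a feature vector might be assembled for one (s, e) pair; every helper passed in (web_count, lm_score, tm_score, context_similarity, tralex_score) is a hypothetical stand-in for the corresponding lookup listed above.

```python
def feature_vector(s, e, web_count, lm_score, tm_score, context_similarity, tralex_score):
    """Assemble the feature vector for a source phrase s and candidate translation e."""
    twc, swc = web_count(e), web_count(s)       # target / source web counts
    ttc, stc = len(e.split()), len(s.split())   # target / source token counts
    lm, tm = lm_score(s, e), tm_score(s, e)     # decoder LM and TM scores
    return [
        twc, swc, twc / max(swc, 1),            # TWC, SWC, TWC/SWC
        ttc, stc, ttc / max(stc, 1),            # TTC, STC, TTC/STC
        lm, tm, lm * tm,                        # LM, TM, LM*TM
        context_similarity(s, e),               # web context similarity
        tralex_score(s, e),                     # translation lexicon score
    ]
```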
NP Translation: run time
[Diagram: Spanish text is processed by the NP bracketer to extract source NPs; the decoder produces candidates e1 … en, yielding unlabeled pairs (s, e1, ???), …, (s, en, ???); features are extracted (with web counts from the WWW) and scored by the translation classifier, giving (fv1, score1), …, (fvn, scoren): an n-best list of candidate English phrase translations, sorted by confidence!]
Data
• EuroParl Spanish-English sentence-aligned corpus
• ~700 sentence pairs, 70-20-10 random split (training / development / test)
• NP pairs used for reranking and tuning, broken down by training, development, and test set [per-split counts from the original table not recovered]
How large an N to use for the n-best list?
• Avg[N]: the average rank of the correct answer in the decoded candidate list (a sketch of its computation follows below)
• Avg[N] does not improve much past N = 200
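A sketch of how Avg[N] can be computed from the decoder's n-best lists; the data structures (a dict of ranked candidate lists and a dict of gold translations, keyed by source phrase) are assumptions made for illustration.

```python
def avg_rank(nbest_lists, gold, N):
    """Average 1-based rank of the gold translation within the top-N candidates,
    over the source phrases whose gold translation appears in that top-N slice."""
    ranks = []
    for src, candidates in nbest_lists.items():
        top_n = candidates[:N]
        if gold[src] in top_n:
            ranks.append(top_n.index(gold[src]) + 1)
    return sum(ranks) / len(ranks) if ranks else None
```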
Expanding the candidate list
• Why is it necessary? Successful reranking depends on what is in the n-best list.
• Dictionary-based approach (a sketch follows below):
  • Given a source phrase s consisting of words s1 s2 … sn, a candidate lattice is generated for each permutation order of s, using an automatically produced translation lexicon.
  • Dictionary-based candidates are ranked by tralex score, and ties are broken by language model score.
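A sketch of the dictionary-based candidate generation, assuming the translation lexicon is available as a dict from source words to lists of target words; the ranking by tralex score with language-model tie-breaking is not shown here and is replaced by a simple cap on the number of candidates.

```python
from itertools import permutations, product

def dictionary_candidates(source_phrase, tralex, limit=20):
    """Generate candidate translations from every permutation of the source words,
    combining each word's lexicon translations (cross product), up to `limit`."""
    words = source_phrase.split()
    candidates = []
    for order in permutations(words):
        options = [tralex.get(w, [w]) for w in order]  # fall back to the source word itself
        for combo in product(*options):
            candidates.append(" ".join(combo))
            if len(candidates) >= limit:
                return candidates
    return candidates
```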
Evaluation
• On two types of n-best lists:
  • RList: just the n-best list from the decoder
  • AList: the decoder's n-best list augmented with the dictionary-based candidates (20 per source phrase)
Evaluation metrics
• ValidRecs: number of source phrases that get a correct translation somewhere in the n-best list
• FirstRight: number of times the correct translation is ranked first
• AvgRank_on_ValidRecs: average position of the correct translation, over the source phrases that get a correct translation
• AvgRank_on_AllRecs: average position of the correct translation, over all source phrases (a sketch computing these metrics follows below)
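A sketch computing the four metrics; how the rank of a missing correct translation is charged in AvgRank_on_AllRecs is not specified on the slide, so the penalty used below is an assumption.

```python
def evaluate(results, gold):
    """results: source phrase -> ranked candidate list; gold: source phrase -> correct translation."""
    valid_recs, first_right, valid_ranks, all_ranks = 0, 0, [], []
    penalty_rank = max(len(c) for c in results.values())  # rank charged when gold is absent (assumption)
    for src, candidates in results.items():
        if gold[src] in candidates:
            rank = candidates.index(gold[src]) + 1
            valid_recs += 1
            first_right += rank == 1
            valid_ranks.append(rank)
            all_ranks.append(rank)
        else:
            all_ranks.append(penalty_rank)
    return {
        "ValidRecs": valid_recs,
        "FirstRight": first_right,
        "AvgRank_on_ValidRecs": sum(valid_ranks) / len(valid_ranks) if valid_ranks else None,
        "AvgRank_on_AllRecs": sum(all_ranks) / len(all_ranks),
    }
```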
Feature selection
[Table: features of the top 100 best-performing models on the development set]
Results on dev RList: a statistically significant improvement over the baseline …
• The improvement of the best-performing model over the baseline is statistically significant w.r.t. FirstRight and AvgRank_on_ValidRecs (p = 0.001), but not w.r.t. AvgRank_on_AllRecs (p = 0.2).
• Possible reasons:
  • A large number of cases have no correct translation in the list.
  • A translation counts as correct only if it exactly matches the gold phrase.
Results on dev RList: the contribution of the web-based context similarity …
• A model that uses contextual knowledge has a better chance of pushing to the top of the list candidates that are an exact translation of the source phrase and that are similar to it in context.
• The best-performing models on dev RList are those trained on a combination of CR, LM, STC, SC, and one of the WCS features.
Results on dev RList: the contribution of the web-based context similarity … continued
[Table/chart from the original slide not recovered]
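The slides do not spell out how the web-based context similarity is computed; one hedged reading, purely as an illustration and not the author's exact method, is a cosine similarity over bag-of-words vectors built from web snippets retrieved for the source phrase and for the candidate translation.

```python
from collections import Counter
from math import sqrt

def context_similarity(source_snippets, candidate_snippets):
    """Cosine similarity of bag-of-words vectors over two sets of web snippets
    (an assumed stand-in for the WCS feature, for illustration only)."""
    a = Counter(w for snip in source_snippets for w in snip.lower().split())
    b = Counter(w for snip in candidate_snippets for w in snip.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```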
Future Work
• More sophisticated strategies for dealing with missing values of the web-based context similarity feature (subphrases, stemming, controlling for phrase length, etc.)
• A more challenging baseline than ReWrite, such as a phrase-based MT system
• Evaluate the benefits of the NP translation system on real document translation (the sentence-telescoping framework is already built and the evaluation infrastructure has already been developed)
• Evaluate the effect of the NP translation system in a multilingual QA setting