A Lightweight and High Performance Monolingual Word Aligner

Xuchen Yao and Benjamin Van Durme (Johns Hopkins), Chris Callison-Burch (UPenn), and Peter Clark (Vulcan)

monolingual word alignment • Aligning one sentence pair from RTE2 • Premise: Linda Johnson, who lives with her husband, Charles, and two cats in ... , said Katrina has ... • Hypothesis: Linda Johnson is married to Charles • alignment contributed by Brockett (2007) ACL 2013, Sofia

monolingual vs. bilingual alignment • less training data (labeled or unlabeled), but more lexical resources • semantic relatedness: cued by distributional word similarities • the same grammar shared by source/target sentences

a discriminative model • first proposed by Blunsom and Cohn (2006): • s, t: source (observation) and target sentence • a: target word indices (0 to target length); state 0 is the NULL state, used for deletion • f(): feature functions
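The equation shown on this slide did not survive in the transcript. Assuming the standard linear-chain CRF form that Blunsom and Cohn (2006) use for word alignment, it would read roughly as below. The feature weights \lambda_k and the normalizer Z are notation introduced here for illustration; s, t, a, and f follow the slide.

```latex
p(\mathbf{a} \mid \mathbf{s}, \mathbf{t}) =
  \frac{\exp\left( \sum_{i} \sum_{k} \lambda_k \, f_k(a_{i-1}, a_i, \mathbf{s}, \mathbf{t}) \right)}
       {Z(\mathbf{s}, \mathbf{t})},
\qquad
Z(\mathbf{s}, \mathbf{t}) =
  \sum_{\mathbf{a}'} \exp\left( \sum_{i} \sum_{k} \lambda_k \, f_k(a'_{i-1}, a'_i, \mathbf{s}, \mathbf{t}) \right)
```

Here i ranges over source word positions and a_i is the target index (or 0 for NULL) assigned to source word i.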


desired Viterbi decoding path
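The figure for this slide is not preserved. As a rough illustration of what a Viterbi decoding path looks like in this setting, here is a minimal sketch, assuming one state per target word plus state 0 for NULL and one decoding step per source word. The function `viterbi_align` and the `emit`/`trans` score callbacks are hypothetical stand-ins for the weighted feature sums of the model; this is not the jacana-align implementation.

```python
from typing import Callable, List


def viterbi_align(n_src: int, n_tgt: int,
                  emit: Callable[[int, int], float],
                  trans: Callable[[int, int], float]) -> List[int]:
    """Return the best state sequence, one state per source word.

    States are 0..n_tgt: 0 is NULL (source word deleted), j >= 1 means
    "aligned to target word j" (1-based). emit(i, j) scores source word i
    (0-based) in state j; trans(p, j) scores moving from state p to j.
    """
    if n_src == 0:
        return []
    states = range(n_tgt + 1)
    best = [emit(0, j) for j in states]   # best score ending in each state
    back = []                             # back-pointers for steps 1..n_src-1
    for i in range(1, n_src):
        new_best, pointers = [], []
        for j in states:
            prev = max(states, key=lambda p: best[p] + trans(p, j))
            new_best.append(best[prev] + trans(prev, j) + emit(i, j))
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Follow back-pointers from the best final state.
    path = [max(states, key=lambda j: best[j])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy scores (assumptions): exact match is rewarded, NULL is neutral.
src = ["Linda", "lives", "with", "Charles"]
tgt = ["Linda", "married", "Charles"]


def toy_emit(i: int, j: int) -> float:
    if j == 0:
        return 0.0
    return 1.0 if src[i] == tgt[j - 1] else -1.0


print(viterbi_align(len(src), len(tgt), toy_emit, lambda p, j: 0.0))
# prints [1, 0, 0, 3]
```

With these toy scores, "Linda" and "Charles" align to their matches and the two middle source words go to the NULL state.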


features • string similarity • Jaro-Winkler, Dice-Sorensen, Hamming, Jaccard, Levenshtein, n-gram overlap and common prefix matching • POS tag matching • WordNet • hypernym, hyponym, synonym, derived form, entailing, causing, members of, have member, substances of, have substances, parts of, have part
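A few of the string similarity measures named above can be sketched as follows. This is a minimal illustration, assuming character bigrams for the set-based measures and lowercasing before comparison; the function names and the normalization by the longer word are choices made here, not taken from jacana-align.

```python
from typing import Dict, Set


def char_bigrams(word: str) -> Set[str]:
    """Set of character bigrams of a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient over character bigram sets."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x or not y:              # words shorter than two characters
        return 1.0 if a == b else 0.0
    return 2.0 * len(x & y) / (len(x) + len(y))


def jaccard(a: str, b: str) -> float:
    """Jaccard index over character bigram sets."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x or not y:
        return 1.0 if a == b else 0.0
    return len(x & y) / len(x | y)


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def common_prefix_len(a: str, b: str) -> int:
    """Length of the shared prefix of two words."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def string_features(src: str, tgt: str) -> Dict[str, float]:
    """Feature values in [0, 1] for one source/target word pair."""
    s, t = src.lower(), tgt.lower()
    longest = max(len(s), len(t)) or 1
    return {
        "identical": float(s == t),
        "dice": dice(s, t),
        "jaccard": jaccard(s, t),
        "levenshtein_sim": 1.0 - levenshtein(s, t) / longest,
        "common_prefix": common_prefix_len(s, t) / longest,
    }
```

For example, `string_features("married", "marry")` gives a pair of inflectionally related words high prefix and edit-distance similarity even though they are not identical.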

features • positional • offset difference between src/tgt word • context • whether neighboring words are similar • helps to align function words • distortion (Markov feature) • how far apart two aligned target words are
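The positional, context, and distortion features can be sketched as below. The exact definitions are not given on the slide, so this is a sketch under assumptions: the positional offset is taken as the difference of relative positions, neighbor similarity is reduced to a case-insensitive exact match, and distortion is the absolute jump between consecutive non-NULL states. All names are hypothetical.

```python
from typing import List, Optional, Tuple

NULL_STATE = 0  # state 0 means "source word not aligned"


def positional_offset(i: int, j: int, n_src: int, n_tgt: int) -> float:
    """Difference in relative position of source word i and target word j.

    i and j are 0-based word positions; n_src and n_tgt are sentence lengths.
    """
    return abs(i / n_src - j / n_tgt)


def context_match(src: List[str], tgt: List[str],
                  i: int, j: int) -> Tuple[bool, bool]:
    """Whether the left and right neighbors of src[i] and tgt[j] match.

    Exact lowercase match stands in for a real similarity test here.
    """
    left = i > 0 and j > 0 and src[i - 1].lower() == tgt[j - 1].lower()
    right = (i + 1 < len(src) and j + 1 < len(tgt)
             and src[i + 1].lower() == tgt[j + 1].lower())
    return left, right


def distortion(prev_state: int, cur_state: int) -> Optional[int]:
    """Distance between two consecutively aligned target words.

    States are 1-based target positions; returns None when either state
    is NULL, since no target-side distance is defined in that case.
    """
    if prev_state == NULL_STATE or cur_state == NULL_STATE:
        return None
    return abs(cur_state - prev_state)
```

A context feature of this kind lets a function word such as "the" prefer the target "the" whose neighbor also matches, which is one way to read the slide's remark about function words.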

Implementation: jacana-align • source code at http://code.google.com/p/jacana • lightweight: uses only a POS tagger and WordNet • written in Scala, optimized with L-BFGS • platform independent, compiles to a .jar file, fully interoperable with Java • high performance? -> evaluation

Baselines • GIZA++ • Tree Edit Distance (with stem/WordNet matching) • MANLI • MacCartney, B., Galley, M. & Manning, C. D. A Phrase-Based Alignment Model for Natural Language Inference. EMNLP 2008 • MANLI-constraint (decoding with ILP) • Thadani, K. & McKeown, K. Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment. ACL 2011

performance in F1 • chart annotation: 10.3%

performance in F1 • chart annotations: 0.8%, 3.3%

performance in speed (seconds per sentence) • when sentences are more balanced, jacana-align is about 20x faster

performance in speed (seconds per sentence) • chart annotations: 30x, 30x, 4x • the speed of jacana-align is less sensitive to increases in sentence length

Conclusion • state-of-the-art monolingual word aligner • in accuracy • in speed • open source, use it and hack it!

thank you • with a demo
