Parallel Clustering of English Verbs into Levin Classes

6.338/18.337 Final Project
Melanie Goetz, Andrew Hogue
May 13, 2004

Background
• Levin [1993] hand-classified verbs
• 3086 verbs into 264 classes (with overlaps)
• Utilized verb arguments and alternations
• E.g. “the glass broke” or “broke the glass”
• Classes correlated with semantic meaning of verbs

Our Approach
• Automatically classify verbs
• Build graph G with a node for each word and an edge between two words that appear in the same sentence
• First, build a bipartite graph of verbs and prepositions
• Then extend it with subject nouns and object nouns
• Use spectral partitioning to divide verbs into classes
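The first graph-building step can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the function name `build_verb_prep_graph` and the toy `observations` list are hypothetical, and each sentence is assumed to have already been reduced to one verb plus its prepositions.

```python
from collections import defaultdict

def build_verb_prep_graph(observations):
    """Return {word: {neighbor: weight}} for a verb/preposition bipartite graph.

    The weight counts how many sentences contain both words.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for verb, prepositions in observations:
        # Count each preposition at most once per sentence.
        for prep in set(prepositions):
            graph[verb][prep] += 1
            graph[prep][verb] += 1
    return graph

# Hypothetical input: (verb, prepositions) pairs, one per sentence.
observations = [
    ("break", ["with"]),
    ("shatter", ["with"]),
    ("break", ["into"]),
    ("give", ["to"]),
]
graph = build_verb_prep_graph(observations)
```

Here "break" and "shatter" become connected through the shared preposition "with", while "give" stays in a separate component. Extending the graph with subject and object nouns would add further node types in the same way.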

Our Approach

Parallel Implementation
• Three components:
• Extract meaningful words from parsed corpus
• Merge per-processor sparse matrices without bringing data to front end
• Run parallel spectral partitioning on full graph

Parsing
• Embarrassingly parallel
• Wall Street Journal corpus of 99 documents
• Each processor independently extracts parse trees from its share of the corpus, then the relevant words from each tree
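As a rough sketch of the per-tree extraction step, the snippet below pulls verbs and prepositions out of a bracketed parse tree. It assumes Penn Treebank style tags (`VB*` for verbs, `IN` for prepositions, which also covers subordinating conjunctions). The function name `extract_words` and the sample tree are hypothetical, and finding subjects and objects would require walking the tree structure, which is not shown.

```python
import re

# Matches a leaf such as "(VBD broke)": a tag and a word with no nested brackets.
LEAF = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")

def extract_words(tree_text):
    """Return (verbs, prepositions) found in one bracketed parse tree."""
    verbs, preps = [], []
    for tag, word in LEAF.findall(tree_text):
        if tag.startswith("VB"):
            verbs.append(word.lower())
        elif tag == "IN":
            preps.append(word.lower())
    return verbs, preps

# Hypothetical example tree.
tree = "(S (NP (DT The) (NN glass)) (VP (VBD broke) (PP (IN into) (NP (NNS pieces)))))"
verbs, preps = extract_words(tree)
```

Each call depends only on its own tree, so the work can be divided among processors with no communication.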

Indexing
• Need to combine matrices from separate processors into one indexing scheme
• Bringing to front end is inefficient
• Solution: share “vocabulary lists” between processes
• Allows each process to use the same index for each word
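The vocabulary-sharing idea can be illustrated as follows. This is a sketch under assumptions: the helper names `build_global_index` and `reindex` are hypothetical, the processes are simulated with plain lists, and in a real parallel run the vocabulary lists would be exchanged between processes while the matrix entries stay where they are.

```python
def build_global_index(local_vocabularies):
    """Merge per-process word lists into one word -> index map."""
    all_words = set()
    for vocab in local_vocabularies:
        all_words.update(vocab)
    # Sorting makes the numbering deterministic, so every process
    # that sees the same lists derives the same map.
    return {word: i for i, word in enumerate(sorted(all_words))}

def reindex(local_entries, local_vocab, global_index):
    """Rewrite (row, col, value) triples from local to global indices."""
    return [
        (global_index[local_vocab[r]], global_index[local_vocab[c]], v)
        for r, c, v in local_entries
    ]

# Two hypothetical processes with different local numberings.
vocab_a = ["break", "with"]
vocab_b = ["with", "give", "to"]
entries_a = [(0, 1, 1)]  # break - with
entries_b = [(1, 2, 1)]  # give - to

index = build_global_index([vocab_a, vocab_b])
merged = reindex(entries_a, vocab_a, index) + reindex(entries_b, vocab_b, index)
```

Only the word lists are shared. Each process remaps its own sparse entries locally, which is what avoids funnelling the matrices through the front end.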

Indexing

Partitioning
• Based on specpart.m from Meshpart toolkit
• Serial version uses Cholesky decomposition
• Our parallel version uses eigs() function as we only need a few eigenvalues
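For readers unfamiliar with spectral partitioning, here is a small self-contained sketch of a single bisection step. It is not a port of specpart.m and does not call eigs(): it swaps in a plain power iteration on a shifted Laplacian to approximate the Fiedler vector (the eigenvector for the second-smallest Laplacian eigenvalue). It assumes a connected graph given as a dense adjacency matrix and splits vertices by sign, which is one common choice and not necessarily the rule the toolkit uses. All names are hypothetical.

```python
import math

def fiedler_vector(adj, iterations=500):
    """Approximate the Fiedler vector of a connected graph by power iteration."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    shift = 2.0 * max(degree)  # upper bound on the largest Laplacian eigenvalue
    x = [float(i) for i in range(n)]
    for _ in range(iterations):
        # Remove the constant component (eigenvalue 0 of the Laplacian).
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        # y = (shift * I - L) x, where L = D - A.
        y = [
            (shift - degree[i]) * x[i] + sum(adj[i][j] * x[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return x

def spectral_bisect(adj):
    """Split vertices into two parts by the sign of the Fiedler vector."""
    v = fiedler_vector(adj)
    part_a = [i for i, vi in enumerate(v) if vi < 0]
    part_b = [i for i, vi in enumerate(v) if vi >= 0]
    return part_a, part_b

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

parts = spectral_bisect(adj)
```

On this toy graph the split falls on the bridge edge, separating the two triangles. Because only one extremal eigenvector is needed, an iterative solver that computes a few eigenpairs is a natural fit for the large sparse case.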

Results
• Clustered 3317 sentences from Wall Street Journal corpus
• 2827 unique words
• Included subjects, verbs, objects, prepositions

Results - Parsing

Results - Indexing

Results - Partitioning

Results - Clustering

Future Work
• Parse other corpora (Project Gutenberg)
• Restrict word types to verb/preposition or subject/verb/object
• Other ways to use eigenvectors for partitioning into more than 2 parts
