Parallel Clustering of English Verbs into Levin Classes

6.338/18.337 Final Project. Melanie Goetz, Andrew Hogue. May 13, 2004.

Presentation Transcript


  1. Parallel Clustering of English Verbs into Levin Classes 6.338/18.337 Final Project Melanie Goetz Andrew Hogue May 13, 2004

  2. Background • Levin [1993] hand-classified verbs • 3086 verbs into 264 classes (with overlaps) • Utilized verb arguments and alternations • E.g. “the glass broke” or “broke the glass” • Classes correlated with semantic meaning of verbs

  3. Our Approach • Automatically classify verbs • Build graph G with node for each word, edges if words appear in same sentence • First, build bipartite graph with verbs and prepositions • Extend with subject nouns, object nouns • Use spectral partitioning to divide verbs into classes
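A minimal sketch of the graph construction described on this slide, in Python. The input format (sentences as lists of `(word, role)` pairs), the role names, and the function name `build_cooccurrence_graph` are assumptions made for illustration; the slides do not specify them. With the default roles it builds the verb/preposition bipartite graph, and passing more roles extends it with subject and object nouns.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences, roles=("verb", "prep")):
    """Return {(node_a, node_b): weight}, where a node is a (role, word) pair.

    Two nodes are joined when they occur in the same sentence and have
    different roles, so roles=("verb", "prep") gives a bipartite graph.
    Connecting only across roles is an assumption of this sketch.
    """
    weights = defaultdict(int)
    for sentence in sentences:
        # Keep only the word types we care about, once per sentence.
        nodes = sorted({(role, word) for word, role in sentence if role in roles})
        for a, b in combinations(nodes, 2):
            if a[0] != b[0]:
                weights[(a, b)] += 1  # edge weight = number of shared sentences
    return dict(weights)


# Hypothetical toy input, not taken from the corpus used in the project.
toy_sentences = [
    [("glass", "subj"), ("broke", "verb"), ("into", "prep")],
    [("he", "subj"), ("broke", "verb"), ("glass", "obj"), ("with", "prep")],
    [("she", "subj"), ("cut", "verb"), ("bread", "obj"), ("with", "prep")],
    [("vase", "subj"), ("broke", "verb"), ("into", "prep")],
]
bipartite = build_cooccurrence_graph(toy_sentences)
extended = build_cooccurrence_graph(toy_sentences, roles=("verb", "prep", "subj", "obj"))
```

Keying nodes by `(role, word)` keeps "glass" as a subject distinct from "glass" as an object; whether the original project made that distinction is not stated.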

  4. Our Approach

  5. Our Approach

  6. Parallel Implementation • Three components: • Extract meaningful words from parsed corpus • Merge per-processor sparse matrices without bringing data to front end • Run parallel spectral partitioning on full graph

  7. Parsing • Embarrassingly parallel • Wall Street Journal corpus of 99 documents • Each processor separately extracts tree from corpus and relevant words from tree
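The extraction step might look like the following sketch, which assumes the parsed corpus is in Penn Treebank bracket notation and that verbs carry `VB*` tags and prepositions the `IN` tag. The function name `extract_words` and the sample tree are hypothetical. Because each document is handled independently, the same function could be mapped over documents on separate processors.

```python
import re

# A leaf in bracket notation looks like "(TAG word)"; inner nodes such as
# "(NP (DT the) ...)" do not match because the word may not contain parentheses.
LEAF = re.compile(r"\(([A-Z$]+) ([^()\s]+)\)")


def extract_words(tree):
    """Return (word, role) pairs for the verbs and prepositions in one tree."""
    found = []
    for tag, word in LEAF.findall(tree):
        if tag.startswith("VB"):
            found.append((word.lower(), "verb"))
        elif tag == "IN":
            # IN also covers subordinating conjunctions in the Penn tagset,
            # so a real extractor would likely need extra filtering.
            found.append((word.lower(), "prep"))
    return found


# Hypothetical example tree, not taken from the corpus.
sample_tree = "(S (NP (DT The) (NN glass)) (VP (VBD broke) (PP (IN into) (NP (NNS pieces)))))"
sample_words = extract_words(sample_tree)
```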

  8. Indexing • Need to combine matrices from separate processors into one indexing scheme • Bringing to front end is inefficient • Solution: share “vocabulary lists” between processes • Allows each process to use the same index for each word
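The shared-vocabulary idea can be sketched in Python as below. This is a single-process simulation under stated assumptions: each "process" holds a local vocabulary list and a sparse matrix stored as a `{(row, col): count}` dictionary, and the function name `merge_with_shared_vocabulary` is invented for illustration. In a real parallel run, step 1 would be an all-gather of the vocabulary lists and step 3 a distributed sum, so the matrix entries themselves would never pass through the front end.

```python
def merge_with_shared_vocabulary(local_vocabs, local_matrices):
    """Relabel per-process sparse matrices into one global index and sum them."""
    # Step 1: every process sees all vocabulary lists and builds the same
    # sorted union, so all processes agree on one index per word.
    global_vocab = sorted(set().union(*local_vocabs))
    index = {word: i for i, word in enumerate(global_vocab)}

    # Step 2: each process relabels its own entries using only that index.
    relabeled = []
    for vocab, matrix in zip(local_vocabs, local_matrices):
        to_global = [index[word] for word in vocab]
        relabeled.append(
            {(to_global[i], to_global[j]): v for (i, j), v in matrix.items()}
        )

    # Step 3: entries with the same global coordinates are added together.
    merged = {}
    for part in relabeled:
        for key, value in part.items():
            merged[key] = merged.get(key, 0) + value
    return global_vocab, merged


# Hypothetical data for two processes with overlapping vocabularies.
vocabs = [["broke", "into"], ["with", "broke", "into"]]
matrices = [{(0, 1): 2}, {(1, 0): 1, (1, 2): 3}]
vocab, merged = merge_with_shared_vocabulary(vocabs, matrices)
```

Sorting the union is one simple way to make the index deterministic on every process; any ordering works as long as all processes compute the same one.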

  9. Indexing

  10. Indexing

  11. Partitioning • Based on specpart.m from Meshpart toolkit • Serial version uses Cholesky decomposition • Our parallel version uses eigs() function as we only need a few eigenvalues
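For readers unfamiliar with spectral partitioning, here is a small pure-Python sketch of one bisection step. It is not the project's code: a shifted power iteration stands in for `eigs()`, the matrix is dense, and the function names are invented. It assumes a connected, unweighted graph with at least two nodes. The idea is to find the eigenvector of the graph Laplacian for the second-smallest eigenvalue (the Fiedler vector) and split the nodes at its median value.

```python
import math
import statistics


def laplacian(n, edges):
    """Dense graph Laplacian L = D - A for an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L


def fiedler_vector(L, iterations=500):
    """Approximate the Fiedler vector by power iteration on (shift*I - L)."""
    n = len(L)
    # 2 * max degree bounds the largest eigenvalue of L (Gershgorin), so the
    # shifted matrix has no negative eigenvalues.
    shift = 2.0 * max(L[i][i] for i in range(n))
    x = [float(i) for i in range(n)]  # arbitrary deterministic start
    for _ in range(iterations):
        # Remove the constant component, which belongs to eigenvalue 0 of L.
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        y = [shift * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(value * value for value in y))
        x = [value / norm for value in y]
    return x


def spectral_bisect(n, edges):
    """Split nodes 0..n-1 into two lists at the median of the Fiedler vector."""
    f = fiedler_vector(laplacian(n, edges))
    cut = statistics.median(f)
    low = [i for i in range(n) if f[i] <= cut]
    high = [i for i in range(n) if f[i] > cut]
    return low, high


# Two triangles joined by a single edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
part_a, part_b = spectral_bisect(6, edges)
```

On this toy graph the split separates the two triangles, cutting only the bridging edge. For a graph of the size reported in the slides, a sparse eigensolver that returns only a few eigenpairs is the practical choice, which matches the motivation given above for using `eigs()`.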

  12. Results • Clustered 3317 sentences from Wall Street Journal corpus • 2827 unique words • Included subjects, verbs, objects, prepositions

  13. Results - Parsing

  14. Results - Indexing

  15. Results - Partitioning

  16. Results - Clustering

  17. Future Work • Parse other corpora (Project Gutenberg) • Restrict word types to verb/preposition or subject/verb/object • Other ways to use eigenvectors for partitioning into more than 2 parts