Evaluation of a Stylometry System on Various Length Portions of Books

Ida Schulstad, Mark Boga, Cranston Jordan, Kara Pally, Vinnie Monaco, Richard DeStefano, John Stewart, and Charles Tappert


Presentation Transcript


  1. Evaluation of a Stylometry System on Various Length Portions of Books Ida Schulstad, Mark Boga, Cranston Jordan, Kara Pally, Vinnie Monaco, Richard DeStefano, John Stewart, and Charles Tappert

  2. Stylometry • “Stylometry is the application of the study of linguistic style, usually to written language …” and “… is often used to attribute authorship to anonymous or disputed documents” – Wikipedia

  3. Book Text Experiments • In this study, stylometry was used to verify the identity of authors • Data: 30 authors, 10 books from each author • System: a previously developed stylometry system, enhanced with additional features • Performance of the stylometry system was determined on these literary texts • In particular, the study measured how much performance improves as text length increases

  4. Classification System: Cha’s Dichotomy Model (used in all of our biometric authentication systems) • The feature space is transformed into a feature-difference space by calculating vector distances between pairs of samples from the same person (intra-person distances) and between pairs of samples from different people (inter-person distances). • Figure: transformation from (a) feature space to (b) feature-difference space
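
The transformation described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `difference_vectors`, the toy feature values, and the use of element-wise absolute differences as the "vector distance" are all assumptions made for the example.

```python
from itertools import combinations


def difference_vectors(samples):
    """Map labeled feature vectors into feature-difference space.

    samples: list of (author_label, feature_vector) pairs.
    Returns (intra, inter): difference vectors for same-author pairs
    and for different-author pairs.
    Assumption: the distance is the element-wise absolute difference.
    """
    intra, inter = [], []
    for (label_a, vec_a), (label_b, vec_b) in combinations(samples, 2):
        diff = [abs(x - y) for x, y in zip(vec_a, vec_b)]
        if label_a == label_b:
            intra.append(diff)   # intra-person distance
        else:
            inter.append(diff)   # inter-person distance
    return intra, inter


# Hypothetical toy data: two authors, two samples each, three made-up features.
samples = [
    ("author_a", [0.5, 2.0, 1.0]),
    ("author_a", [0.75, 2.0, 1.5]),
    ("author_b", [1.5, 0.5, 3.0]),
    ("author_b", [1.0, 0.5, 3.5]),
]
intra, inter = difference_vectors(samples)
print(len(intra), "intra-person pairs;", len(inter), "inter-person pairs")
```

Whatever the number of authors, the result is a two-class problem (same author vs. different author), which is what lets a single classifier verify identities.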

  5. Book Text Experiments - #1 • The 30-Author Main Experiment • Each author's books were split into 5 for training and 5 for testing • Strong training: the system was trained on the test subjects • EERs for sample sizes of 2K, 5K, and 10K words: 34%, 30%, and 25% • Figure: Receiver Operating Characteristic (ROC) curves for 250, 500, 1K, 2K, 5K, and 10K words • The Equal Error Rate (EER) decreases as text length increases
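
The EER is the point on the ROC curve where the false accept rate (FAR) equals the false reject rate (FRR). The sketch below shows one simple way to estimate it from two lists of distances. The function name `equal_error_rate`, the threshold sweep, and the toy distances are assumptions for illustration; the slides do not say how the system computes its EERs.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from same-author and different-author distances.

    A pair is accepted as "same author" when its distance <= threshold.
    Returns (eer, threshold) at the threshold where FAR and FRR are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(d > t for d in genuine) / len(genuine)     # same-author pairs rejected
        far = sum(d <= t for d in impostor) / len(impostor)  # different-author pairs accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical distances, not taken from the study.
genuine = [0.1, 0.2, 0.3, 0.6]
impostor = [0.25, 0.5, 0.7, 0.9]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.0%} at threshold {threshold}")
```

With real data the two rates rarely cross exactly at an observed threshold, so the sketch reports the average of FAR and FRR at the closest point.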

  6. Book Text Experiments - #2 • Strong training on 15 of the authors • Trained on 5 books from each author, tested on the remaining 5 • Performance improved with fewer subjects • EERs: ~20% for 10K, 24% for 5K, and 30% for 2K word samples • Figure: Receiver Operating Characteristic (ROC) curves for 2K, 5K, and 10K words

  7. Equal Error Rate (EER) vs. Text Length in Literary Book Texts from 30 Authors • EER decreases logarithmically as a function of text length
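
A logarithmic trend of this kind can be checked with a least-squares fit of EER against the natural log of the text length. The sketch below uses only the three 30-author EERs quoted on slide 5 (34%, 30%, and 25% for 2K, 5K, and 10K words). The model form `EER = a + b * ln(words)` and the helper name `fit_log_trend` are assumptions, and three points are far too few to confirm the shape of the curve; the fit is illustrative only.

```python
import math


def fit_log_trend(lengths, eers):
    """Least-squares fit of eer = a + b * ln(length). Returns (a, b)."""
    xs = [math.log(n) for n in lengths]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(eers) / len(eers)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, eers))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# EERs (in percent) for the 30-author experiment, as quoted on slide 5.
lengths = [2000, 5000, 10000]
eers = [34.0, 30.0, 25.0]
a, b = fit_log_trend(lengths, eers)
print(f"EER ~ {a:.1f} + ({b:.2f}) * ln(words)")
```

A negative slope `b` means the EER falls by a roughly constant number of percentage points each time the text length is multiplied by a fixed factor.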
