Dive into 19th and early 20th-century English literature through NLP topic modeling, exploring lexicon, books, and segmentation.
Top 100 Topic Modeling • MSDS 7337 – NLP • Patrick McDevitt • 04-Dec-2018
Summer reading challenge • Discover a book from each topic category • Passport to 19th- and early 20th-century (mostly) English literature!
Lexicon (nouns) of Gutenberg: 17,206,479 characters
Topic Modeling – by the numbers
• 5,827 stopwords
• 105 books
• 02:03:28 normalization process time
• Linux Ubuntu 16.04 LTS, Intel Core i5 CPU @ 1.70 GHz x 4
• 72,408,393 characters
• 9 topics
• 7,597 text segments
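The counts above imply some simple averages, and the segmentation and stopword steps can be illustrated with a minimal sketch. The slides do not say how the books were split into segments, so the helpers `segment_text` and `remove_stopwords` and the `target_chars` value below are hypothetical, chosen only to land near the average segment size implied by the totals.

```python
# Totals taken from the slide.
TOTAL_CHARS = 72_408_393
N_SEGMENTS = 7_597
N_BOOKS = 105

# Averages implied by the totals (not stated on the slide).
avg_chars_per_segment = TOTAL_CHARS / N_SEGMENTS   # roughly 9,500 characters
avg_segments_per_book = N_SEGMENTS / N_BOOKS       # roughly 72 segments


def segment_text(text, target_chars=9500):
    """Split text into segments of about target_chars characters.

    Hypothetical helper: segments are cut on whitespace once the
    running length reaches target_chars. The original segmentation
    rule is not given in the slides.
    """
    segments, current, size = [], [], 0
    for word in text.split():
        current.append(word)
        size += len(word) + 1  # +1 for the joining space
        if size >= target_chars:
            segments.append(" ".join(current))
            current, size = [], 0
    if current:  # keep the trailing partial segment
        segments.append(" ".join(current))
    return segments


def remove_stopwords(segment, stopwords):
    """Lowercase a segment and drop any token found in the stopword set."""
    return [w for w in segment.lower().split() if w not in stopwords]
```

With a stopword set loaded from whatever list is in use, each segment would be passed through `remove_stopwords` before being handed to a topic model; the 9-topic model itself is outside the scope of this sketch.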