Top 100 Topic Modeling. MSDS 7337 – NLP. Patrick McDevitt, 04-Dec-2018
Summer reading challenge • discover a book from each topic category • Passport to 19th and early 20th century (mostly) English literature!
NLP – Topic Model – Pipeline
Lexicon (nouns) of Gutenberg: 17,206,479 characters
Topic Modeling – by the numbers
• 5,827 stopwords
• 105 books
• 02:03:28 normalization process time (Linux Ubuntu 16.04 LTS, Intel Core i5 CPU @ 1.70 GHz x 4)
• 72,408,393 characters
• 9 topics
• 7,597 text segments
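The slides report the pipeline's scale (books split into text segments, a stopword list, 9 topics) but not its code. The sketch below is one way the same steps could fit together, using only the Python standard library. Several parts are assumptions, not the author's method. The tiny `STOPWORDS` set stands in for the 5,827-word list. The fixed-size `segment` helper is hypothetical, because the deck does not say how segments were cut. The model is Latent Dirichlet Allocation fitted by collapsed Gibbs sampling, and the deck does not name the algorithm that was actually used.

```python
import random
import re
from collections import Counter

# Hypothetical stand-in for the real 5,827-word stopword list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "is", "it",
             "he", "she", "his", "her"}


def normalize(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def segment(tokens, size):
    """Hypothetical: split one book's tokens into consecutive chunks of `size`."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling; return per-topic word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    doc_topic = [[0] * n_topics for _ in docs]       # n(d, k)
    topic_word = [Counter() for _ in range(n_topics)]  # n(k, w)
    topic_total = [0] * n_topics                       # n(k)
    assign = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | everything else), unnormalized.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the resampled topic.
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_words(topic_word, n=3):
    """Most frequent words per topic, ignoring zero counts."""
    return [[w for w, c in tw.most_common() if c > 0][:n] for tw in topic_word]


if __name__ == "__main__":
    books = {
        "moby": "Call me Ishmael. The whale, the sea, and the ship.",
        "emma": "Emma Woodhouse, handsome, clever, and rich, with a comfortable home.",
    }
    docs = [seg for text in books.values() for seg in segment(normalize(text), 4)]
    print(top_words(lda_gibbs(docs, n_topics=2, n_iter=50)))
```

Treating each segment, rather than each whole book, as an LDA "document" is what allows a single book to contribute to several topics. On a real corpus you would use `n_topics=9` and a far larger stopword list. A library implementation would also be much faster than this pure-Python loop.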