Classifying Movie Scripts by Genre
Alex Blackstock, Matt Spitz
6/9/08
Overview
  Motivation
    classifying movie scripts may identify box-office flops and successes before they're even produced!
  Data
    freely available movie scripts (DailyScripts.com, etc.)
    IMDB genres (several labels per movie)
  Tools
    Lucene
    MEMM from PA3
    jBNC (naïve Bayes classifier)
    Stanford Named Entity Recognizer
    Stanford Part-Of-Speech Tagger
Features
  Non-NLP
    dialogue shape
    character information
  NLP
    POS ratios
    Named Entity appearances
  Character-Based NLP
    analyze individual characters
    exclamations
    main vs. secondary
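The slides name the features but do not define them, so the following is only a minimal sketch of how two of them might be computed. It assumes a POS ratio is the fraction of tokens carrying each tag, and that the exclamation feature is the fraction of a character's lines ending in "!". The function names, input shapes, and sample data are hypothetical and are not taken from the original project.

```python
from collections import Counter


def pos_ratios(tagged_tokens):
    """Fraction of tokens carrying each POS tag.

    tagged_tokens: list of (word, tag) pairs, e.g. the output of a POS tagger.
    Assumption: this is one plausible reading of "POS ratios".
    """
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


def exclamation_rate(dialogue):
    """Fraction of each character's lines that end with '!'.

    dialogue: list of (character, line) pairs extracted from a script.
    Assumption: this is one plausible reading of the "exclamations" feature.
    """
    lines = Counter()
    exclaimed = Counter()
    for character, line in dialogue:
        lines[character] += 1
        if line.rstrip().endswith("!"):
            exclaimed[character] += 1
    return {c: exclaimed[c] / lines[c] for c in lines}


# Made-up sample input, for illustration only.
sample_tags = [("the", "DT"), ("cat", "NN"), ("sat", "VBD"), ("mat", "NN")]
sample_dialogue = [("ALICE", "Move!"), ("ALICE", "Stay here."), ("BOB", "Run!")]

print(pos_ratios(sample_tags))
print(exclamation_rate(sample_dialogue))
```

Per-character rates like these could then be aggregated separately for main and secondary characters, though the slides do not say how that split was made.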
Evaluation Metrics
  Example output: Blade II
    gold labels: Action, Thriller, Horror
    guessed labels: Action, Adventure, Horror, Thriller, ...
  F1 Score
    computed per genre
    weighted average over all genres
    # of guesses allowed = # of gold labels
  Partial Credit Score
    allows for some error
    # of guesses allowed = # of gold labels * 1.5
    guesses beyond the # of gold labels are penalized, but still earn points
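The F1 metric described above can be sketched roughly as follows. This assumes that the classifier returns a ranked list of genres, that the list is cut off at the number of gold labels, and that the weighted average uses each genre's gold-label count as its weight; the slides do not state the weighting, so that part is an assumption. The Partial Credit Score is not sketched because the slides do not give its penalty formula. All function names are hypothetical.

```python
from collections import defaultdict


def top_k_guesses(ranked, gold):
    """Keep only as many ranked guesses as there are gold labels."""
    return ranked[:len(gold)]


def per_genre_f1(examples):
    """Return {genre: (f1, support)} over (ranked_guesses, gold_labels) pairs."""
    tp = defaultdict(int)
    fp = defaultdict(int)
    fn = defaultdict(int)
    for ranked, gold in examples:
        guessed = set(top_k_guesses(ranked, gold))
        gold_set = set(gold)
        for g in guessed & gold_set:
            tp[g] += 1
        for g in guessed - gold_set:
            fp[g] += 1
        for g in gold_set - guessed:
            fn[g] += 1

    scores = {}
    for genre in set(tp) | set(fp) | set(fn):
        guessed_n = tp[genre] + fp[genre]
        support = tp[genre] + fn[genre]  # number of gold occurrences
        precision = tp[genre] / guessed_n if guessed_n else 0.0
        recall = tp[genre] / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[genre] = (f1, support)
    return scores


def weighted_f1(examples):
    """Average per-genre F1, weighted by gold-label count (an assumption)."""
    scores = per_genre_f1(examples)
    total = sum(support for _, support in scores.values())
    if total == 0:
        return 0.0
    return sum(f1 * support for f1, support in scores.values()) / total


# The Blade II example from the slide: three gold labels, so three guesses count.
blade = (["Action", "Adventure", "Horror", "Thriller"],
         ["Action", "Thriller", "Horror"])
print(weighted_f1([blade]))
```

With only this one movie, Action and Horror are hits, Adventure is a false positive, and Thriller is missed because it falls at rank four, outside the three allowed guesses.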
Conclusions
  Success!
    best feature set: basic NLP & POS tagging
    PC Score: 0.601
    F1 Score: 0.551
  Classifier comparison (jBNC)
  N-way classification problem
    22 genres
    average of 3.02 genres/datum
  Dataset Issues
    consistency
    diversity
    size