Design and Implementation of a Dynamic Data MLP to Predict Motion Picture Revenue David A. Gerasimow
Problem Statement • Problem: Motion picture revenue is seemingly unpredictable. • Solution: Develop an artificial neural network that takes into account the characteristics of successful films and predicts the opening weekend box-office revenue of upcoming releases. • However, the film industry is constantly changing, as is public taste. • Consequently, develop a dynamic data artificial neural network that continually retrains itself on the most up-to-date data.
Data Collection 1 • Determine the significant characteristics of a film that contribute to its success or failure at the box office. • The characteristics include: 1) Month and year of release 2) Genre 3) Rating (e.g., G, PG) 4) Runtime 5) Number of theatres in which the film plays 6) Production studio 7) Holiday weekend opening? 8) Sequel? 9) Color, black and white, or animation
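A minimal sketch of how the characteristics above might be turned into a numeric input vector. Counting month and year separately gives ten values, which would be consistent with a 10-input network, but that mapping is an inference. The category lists, the `Film` class, and the `encode_film` function are hypothetical; the slides do not give the actual encoding.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical category lists; the real project's categories are not given in the slides.
GENRES = ["action", "comedy", "drama", "horror", "family"]
RATINGS = ["G", "PG", "PG-13", "R"]
STUDIOS = ["studio_a", "studio_b", "other"]
FORMATS = ["color", "black_and_white", "animation"]


@dataclass
class Film:
    month: int
    year: int
    genre: str
    rating: str
    runtime_min: int
    theatres: int
    studio: str
    holiday_opening: bool
    sequel: bool
    film_format: str


def encode_film(film: Film) -> List[float]:
    """Map one film to a 10-element numeric vector, one entry per characteristic."""
    return [
        float(film.month),
        float(film.year),
        float(GENRES.index(film.genre)),      # categorical -> integer code
        float(RATINGS.index(film.rating)),
        float(film.runtime_min),
        float(film.theatres),
        float(STUDIOS.index(film.studio)),
        1.0 if film.holiday_opening else 0.0,  # yes/no -> 1/0
        1.0 if film.sequel else 0.0,
        float(FORMATS.index(film.film_format)),
    ]
```

Integer codes are the simplest choice for the categorical fields; a one-hot encoding would be a reasonable alternative but would change the number of inputs.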
Data Collection 2 • All films released since 1989 that earned more than fifteen million dollars are listed at: www.boxofficeguru.com • Furthermore, film-specific information (e.g., genre) is listed at: www.imdb.com • Data collection application development (in Visual Basic 6.0) • dataextractor.exe extracts information from files downloaded from www.boxofficeguru.com and converts them to a readable format. • dataconcatenator.exe links the readable files into a single file. • dataconverter.exe searches the single data file to determine which data fields need to be filled in manually from www.imdb.com • This data collection process needs to be performed only once. An ANN designed from this one-time data set is a standard static data neural network.
Dynamic Data Collection • Develop an application that gathers data continually and automatically, allowing the ANN to be retrained using up-to-date data. • updatewizard.exe (developed in Visual Basic 6.0) • Functionality of updatewizard.exe • Step 1: Download up-to-date data from www.boxofficeguru.com, then process and concatenate it. • Step 2: Compare the up-to-date data to the current data. If there is a difference, the ANN needs to be retrained. • Step 3: Create new training and testing files from the up-to-date data.
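Steps 2 and 3 can be sketched as follows. This is an illustration in Python, not the Visual Basic code of updatewizard.exe. The function names, the one-record-per-string representation, and the rule of holding out every fourth row for testing are all assumptions.

```python
from typing import List, Tuple


def needs_retraining(current_rows: List[str], fresh_rows: List[str]) -> bool:
    """Step 2: report a difference between the datasets, ignoring row order."""
    return sorted(current_rows) != sorted(fresh_rows)


def make_train_test(rows: List[str], test_every: int = 4) -> Tuple[List[str], List[str]]:
    """Step 3: hold out every test_every-th row for testing (hypothetical split rule)."""
    train = [r for i, r in enumerate(rows) if i % test_every != 0]
    test = [r for i, r in enumerate(rows) if i % test_every == 0]
    return train, test
```

Comparing sorted rows means a reordered download does not trigger a retrain, while any added, removed, or changed record does.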
Developing ANN • For the motion picture revenue application, an MLP is appropriate. • Determine the optimal MLP configuration using: • Three-way cross-validation • Multiple trials of MLP training • Compute the mean and standard deviation of the classification rates to choose a configuration.
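The selection procedure above can be sketched as below, assuming "three-way" means three folds with two used for training and one for testing. The `train_and_score` callback is a hypothetical stand-in for training one MLP configuration and returning its classification rate on the held-out fold; the number of trials is also an assumption.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def three_way_folds(samples: Sequence, seed: int) -> List[list]:
    """Shuffle the samples and deal them into three roughly equal folds."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::3] for i in range(3)]


def evaluate_config(
    samples: Sequence,
    train_and_score: Callable[[list, list], float],
    trials: int = 5,
) -> Tuple[float, float]:
    """Return mean and standard deviation of the classification rate."""
    rates = []
    for trial in range(trials):
        folds = three_way_folds(samples, seed=trial)
        for k in range(3):
            test = folds[k]
            # Train on the two folds that are not held out.
            train = [s for j, fold in enumerate(folds) if j != k for s in fold]
            rates.append(train_and_score(train, test))
    return statistics.mean(rates), statistics.stdev(rates)
```

Running `evaluate_config` once per candidate configuration gives a mean and spread for each, so a configuration can be preferred when its mean rate is higher and its standard deviation is not noticeably worse.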
MLP Configuration • After three-way cross-validation and multiple trials, the results were: • 10-6-X configuration (X represents the number of output classes – varies depending on options chosen in updatewizard.exe) • Learning rate: α = 0.1 • Momentum constant: μ = 0.7 • Max. number of epochs: 5000 • Samples per epoch: 64 • Scaling of input: [-5, 5] • Other values are defaults as specified in bp.m
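Scaling inputs to [-5, 5] is commonly done with a per-feature linear (min-max) mapping. The sketch below assumes that approach; the slides do not say how bp.m performs the scaling, and `scale_column` is a hypothetical helper.

```python
from typing import List, Sequence


def scale_column(values: Sequence[float], lo: float = -5.0, hi: float = 5.0) -> List[float]:
    """Linearly map one feature column so its minimum lands on lo and its maximum on hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # A constant column carries no information; place it at the midpoint.
        return [(lo + hi) / 2.0 for _ in values]
    span = vmax - vmin
    return [lo + (v - vmin) * (hi - lo) / span for v in values]
```

For example, `scale_column([0, 5, 10])` returns `[-5.0, 0.0, 5.0]`. The minimum and maximum found on the training data would normally be reused when scaling a new film, so that training and prediction inputs share one scale.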
MATLAB Files for MLP • Project MATLAB m-files modified from Professor Yu Hen Hu’s code for back-propagation MLP. • Modified code contained in: • moviesbp.m • moviesbptest.m • moviesbpconfig.m • Modification allows for: • application-specific characteristics • hard-coding of the configuration • an interface with the Windows application that predicts the opening weekend revenue of a newly released film
Prediction • The Windows application newmovie.exe lets the user enter a newly released film’s characteristics through a graphical user interface. • newmovie.exe stores the data in testsinglemovie.txt, which is read by moviesbp.m. moviesbp.m then classifies the film.
Results 1 • MLP Classification Rates: 54% - 59% • Improvement over past ANN approaches used by students in CS/ECE/ME 539. • Random classification: Roughly 20% • The MLP therefore performs well above chance. • Dynamic Data Aspect • Data is updated weekly on www.boxofficeguru.com. Run updatewizard.exe to update automatically.
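To make the comparison with random classification concrete: a chance rate of roughly 20% is what uniform guessing among five equally likely classes would give. The five-class figure is an inference from that number, not something the slides state, and both helper functions below are hypothetical.

```python
from typing import Sequence


def classification_rate(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of films whose predicted revenue class matches the actual class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


def chance_rate(num_classes: int) -> float:
    """Expected rate of uniform random guessing over equally likely classes."""
    return 1.0 / num_classes
```

Under that assumption, `chance_rate(5)` is 0.2, so the reported 54% to 59% is between about 2.7 and 3 times the chance rate.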
Results 2 • The project was functional for less than two weeks. • Thus, not enough time has passed to accumulate enough data for a statistically significant improvement in MLP performance. • According to the dynamic data ANN model, performance should increase gradually over time.