Project #2 Presentation
CS895 – Applied Visual Analytics, Spring 2013
Old Dominion University
VAST Mini Challenge 1 – Visualizing Box Office
http://boxofficevast.org/
Mat Kelly, Sai Chaitanya Tirumerla, Ibrahim Ben Mustafa
{mkelly, stirumer, iben}@cs.odu.edu
May 3rd, 2013
Group Wiki Page: www.bit.ly/cs795s13
For more info, visit our wiki page: bit.ly/cs795s13
Project Description {http://boxofficevast.org/vast-welcome.html}
• "Main theme - Movie success at the box office and in viewer ratings"
• Predict ticket sales vs. movie ratings
• Visual analytics to support the statement
• Restricted sources given to us
-- IMDb - http://www.imdb.com/interfaces
-- Twitter – Twitter API
-- bitly (not sure about this source)
Approach to the Project
• Dumping the data from the restricted sources
• Working on IMDb and Twitter data simultaneously
• Using a sentiment-based classifier to separate tweets into positive, neutral, and negative classes
• Tried integrating Python libraries for the Twitter data {https://dev.twitter.com/docs/twitter-libraries}
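The three-class sentiment split mentioned above can be illustrated with a minimal lexicon-based sketch. This is not the classifier the team used (the slides do not say which one that was); the word lists and the `classify_sentiment` function are hypothetical and only show the positive / neutral / negative idea.

```python
import re

# Hypothetical, tiny word lists for illustration only; a real classifier
# would rely on a trained model or a much larger lexicon.
POSITIVE_WORDS = {"good", "great", "love", "awesome", "amazing", "fun"}
NEGATIVE_WORDS = {"bad", "boring", "hate", "awful", "terrible", "worst"}


def classify_sentiment(text):
    """Return 'positive', 'negative', or 'neutral' for a tweet's text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Score = number of positive words minus number of negative words.
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A tweet with no words from either list, or with equal counts from both, falls into the neutral class.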
• Time-frame issues when accessing tweets by ID
• Chose the OAuth route (Twitter API 1.1) to deal with the issue, which also failed frequently
---------------End of Twitter Data-------------
• IMDb – huge, unrefined data consisting of many text files
• In line with the theme, the .list files (actors, business, movies, and ratings) were selected
• Ran into memory issues
• Python code did not fetch proper results
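One common way around memory issues with large text dumps is to stream the file line by line instead of loading it whole. The sketch below is an assumption-laden illustration, not the team's script: the line layout (distribution, votes, rating, title) and the `iter_ratings` helper are assumed, and the vote count in the sample is made up.

```python
import io
import re

# Assumed layout of a ratings line: leading spaces, a 10-character
# distribution field, the vote count, the rating, then the title.
RATING_LINE = re.compile(r"^\s+([0-9.*]{10})\s+(\d+)\s+(\d+\.\d)\s+(.+?)\s*$")


def iter_ratings(lines):
    """Lazily yield (title, rating, votes) from an iterable of text lines."""
    for line in lines:
        match = RATING_LINE.match(line)
        if match:  # header and blank lines simply do not match
            _distribution, votes, rating, title = match.groups()
            yield title, float(rating), int(votes)


# Made-up sample in the assumed layout; the vote count is not real data.
SAMPLE = (
    "MOVIE RATINGS REPORT\n"
    "\n"
    "      0000001222   12345   6.2  G.I. Joe: Retaliation (2013)\n"
)
rows = list(iter_ratings(io.StringIO(SAMPLE)))
```

Because `iter_ratings` is a generator, passing it an open file handle keeps only one line in memory at a time.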
Brute Force Approach
• Manual extraction of data
• Establishing links between the files
• Data from IMDb API {http://imdbapi.org/}
• Manually sanitized and converted the data to JSON
• Python-based processing script to read the JSON
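Reading the sanitized JSON and linking records might look roughly like the sketch below. The field names (`id`, `title`, `rating`, `actors`) and the `index_by_actor` helper are hypothetical, since the slides do not show the actual JSON schema; the sample IDs and ratings are taken from the data slide in this deck.

```python
import json
from collections import defaultdict

# Hypothetical JSON schema; only the IDs, titles, and ratings come from the deck.
SAMPLE_JSON = """
[
  {"id": "tt1583421", "title": "G.I. Joe: Retaliation (2013)", "rating": 6.2,
   "actors": ["nm0425005", "nm0000596", "nm0496932"]},
  {"id": "tt0882977", "title": "Snitch (2013)", "rating": 6.8,
   "actors": ["nm0425005", "nm0001608", "nm1256532"]}
]
"""


def index_by_actor(movies):
    """Map each actor ID to the list of movie IDs the actor appears in."""
    by_actor = defaultdict(list)
    for movie in movies:
        for actor_id in movie["actors"]:
            by_actor[actor_id].append(movie["id"])
    return dict(by_actor)


movies = json.loads(SAMPLE_JSON)
movies_by_actor = index_by_actor(movies)
```

The resulting index is one simple way to express the "links between the files": given an actor ID, it returns every movie record that mentions it.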
Manual Strategy – Collection & Cleaning
• Taking VAST movies and IMDb movies into consideration
• For each movie: the top three actors
• For each actor: the movies where he/she is the lead or among the top three
• For each movie obtained, the opening-weekend box office take and the ratings are collected; we then selected that movie's top three actors and repeated the operation recursively
• Factors for prediction and design
-- The top three actors in the movie
-- The leading writer of the movie
-- The leading director of the movie
-- The previous success (rating and box office take) of each person listed above
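The collection loop described above (movie, top three actors, their other movies, and so on) can be sketched as a traversal with a depth cap. The team did this manually; the code below is only an illustration, written as an iterative breadth-first walk rather than literal recursion. The `TOP_ACTORS` and `LEAD_MOVIES` lookups and the `collect` function are hypothetical, populated with a trimmed subset of the names shown in the data slide.

```python
from collections import deque

# Hypothetical in-memory lookups standing in for the manually collected data.
TOP_ACTORS = {
    "G.I. Joe: Retaliation (2013)": ["Dwayne Johnson", "Jonathan Pryce", "Byung-hun Lee"],
    "Snitch (2013)": ["Dwayne Johnson", "Barry Pepper", "Jon Bernthal"],
    "Masquerade (2012)": ["Byung-hun Lee", "Seung-yong Ryoo", "Hyo-ju Han"],
}
LEAD_MOVIES = {
    "Dwayne Johnson": ["G.I. Joe: Retaliation (2013)", "Snitch (2013)"],
    "Byung-hun Lee": ["G.I. Joe: Retaliation (2013)", "Masquerade (2012)"],
}


def collect(start_movie, max_depth=2):
    """Walk movie -> top three actors -> their movies, up to max_depth hops."""
    seen_movies = {start_movie}
    seen_actors = set()
    queue = deque([(start_movie, 0)])
    while queue:
        movie, depth = queue.popleft()
        for actor in TOP_ACTORS.get(movie, [])[:3]:
            seen_actors.add(actor)
            if depth >= max_depth:
                continue  # record the actor but do not expand any further
            for other in LEAD_MOVIES.get(actor, []):
                if other not in seen_movies:
                    seen_movies.add(other)
                    queue.append((other, depth + 1))
    return seen_movies, seen_actors
```

The depth cap matters because the actor/movie graph grows quickly; without it, the walk would keep pulling in more of the database with every hop.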
Flow
• For actor John Smith (in 3+ movies: VASTMovie1, IMDBMovie1, and IMDBMovie2):
- VASTMovie1 ---- Movie ID ---- Sales ---- Ratings
-- John Smith (id 012)
-- Actor2 (id 123)
-- Actor3 (id 234)
- IMDBMovie1 ---- Movie ID ---- Sales ---- Ratings
-- Actor4 (id 456)
-- John Smith (id 012)
-- Actor5 (id 345)
- IMDBMovie2 ---- Movie ID ---- Sales ---- Ratings
-- Actor2 (id 123)
-- Actor6 (id 678)
-- Actor7 (id 890)
G.I. Joe: Retaliation (2013) 6.2
-- Dwayne Johnson (nm0425005)
-- Jonathan Pryce (nm0000596)
-- Byung-hun Lee (nm0496932)
Snitch (2013) 6.8
-- Dwayne Johnson (nm0425005)
-- Barry Pepper (nm0001608)
-- Jon Bernthal (nm1256532)
Masquerade (2012) 7.6
-- Byung-hun Lee (nm0496932)
-- Seung-yong Ryoo (nm2440627)
-- Hyo-ju Han (nm2174122)

nm0425005 Dwayne Johnson
-- Pain & Gain (2013) : tt1980209 $20,244,505 - 7.0
-- G.I. Joe: Retaliation (2013) : tt1583421 $40,501,814 - 6.2
-- Snitch (2013) : tt0882977 $13,167,607 - 6.8
-- Journey 2: The Mysterious Island (2012) : tt1397514 $27,335,363 - 5.7

nm0000596 Jonathan Pryce
-- Dark Blood (2012) : tt0293069 N/A - N/A
-- Hysteria (2011) : tt1435513 $35,656 - 6.7
-- My Zinc Bed (TV 2008) : tt1056101 N/A - 5.6
-- Sherlock Holmes and the Baker Street Irregulars (TV 2007) : tt0892743 N/A - 6.3
-- The Moon and the Stars (2007) : tt0460873 N/A - 6.1
-- Brothers of the Head (2005) : tt0432260 $10,794 - 6.3
-- De-Lovely (2004) : tt0352277 $123,920 - 6.4
-- The Affair of the Necklace (2001) : tt0242252 $125,523 - 6.0
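As an illustration of how a person's "previous success" could be summarized from records like these, the sketch below averages the box office take and rating over an actor's listed films, skipping N/A entries. This plain average is an assumption made for illustration and is not the team's prediction algorithm, which the slides do not spell out; `parse_money`, `mean_or_none`, and `summarize` are hypothetical helpers. The figures are the Dwayne Johnson entries listed above.

```python
def parse_money(value):
    """Convert a string such as '$20,244,505' to an int; 'N/A' becomes None."""
    if value == "N/A":
        return None
    return int(value.replace("$", "").replace(",", "").replace(" ", ""))


def mean_or_none(values):
    """Average the known (non-None) values, or return None if there are none."""
    known = [v for v in values if v is not None]
    return sum(known) / len(known) if known else None


def summarize(filmography):
    """Return (mean box office take, mean rating) for (title, take, rating) rows."""
    takes = [parse_money(take) for _title, take, _rating in filmography]
    ratings = [rating for _title, _take, rating in filmography]
    return mean_or_none(takes), mean_or_none(ratings)


# Rows copied from the slide: (title, box office take, rating).
DWAYNE_JOHNSON = [
    ("Pain & Gain (2013)", "$20,244,505", 7.0),
    ("G.I. Joe: Retaliation (2013)", "$40,501,814", 6.2),
    ("Snitch (2013)", "$13,167,607", 6.8),
    ("Journey 2: The Mysterious Island (2012)", "$27,335,363", 5.7),
]
mean_take, mean_rating = summarize(DWAYNE_JOHNSON)
```

When predicting a specific movie, that movie's own row would presumably be left out of the average so that the target does not leak into its own prediction.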
Prediction Algorithm
Validating a sample movie; the numbers given:
Calculated from movie attributes:
Tools
• D3.js – http://d3js.org/
• jQuery – http://jquery.com/
• Glue JavaScript
• Python
• IMDb API – http://imdbapi.org/
• IMDbPY – http://imdbpy.sourceforge.net/
Future Work
• Show the actors' other works in the calculation of the prediction
• Provide more movie info when a movie node is clicked
• More effective visualization of the movie information shown when hovering over the center of the circle
• Info about the selected actor's other works, including statistics
Problems
• Acquiring data from IMDb and Twitter
• Not everything could be obtained from the IMDb API
• Several approaches we tried initially failed
• Complexity of the data to handle within the set time frame
• Due to time constraints, not everything is covered
Implementation
http://www.cs.odu.edu/~mkelly/semester/2013_spring/project2/
Questions?
Thank You!!