Project #2 Presentation CS895 – Applied Visual Analytics, Spring 2013 Old Dominion University

VAST Mini Challenge 1 – Visualizing Box Office, http://boxofficevast.org/. Mat Kelly, Sai Chaitanya Tirumerla, Ibrahim Ben Mustafa {mkelly, stirumer, iben}@cs.odu.edu


Presentation Transcript


  1. Project #2 Presentation
     CS895 – Applied Visual Analytics, Spring 2013, Old Dominion University
     VAST Mini Challenge 1 – Visualizing Box Office, http://boxofficevast.org/
     Mat Kelly, Sai Chaitanya Tirumerla, Ibrahim Ben Mustafa
     {mkelly, stirumer, iben}@cs.odu.edu
     May 3rd, 2013
     For more info, visit our group wiki page: www.bit.ly/cs795s13

  2. Project Description {http://boxofficevast.org/vast-welcome.html}
     • "Main theme - Movie success at the box office and in viewer ratings"
     • Predict ticket sales vs. movie ratings
     • Visual analytics to support the statement
     • Restricted sources given to us:
       - IMDb: http://www.imdb.com/interfaces
       - Twitter: Twitter API
       - bitly (not sure)

  3. Approach to the Project
     • Dumping the data from the restricted sources
     • Working on IMDb and Twitter data simultaneously
     • Using a sentiment-based classifier to differentiate the positive, neutral and negative sentiment classes
     • Tried integrating Python libraries for the Twitter data {https://dev.twitter.com/docs/twitter-libraries}
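The slides do not say which sentiment classifier was used. As a minimal sketch of the three-class idea only, the Python below labels a tweet by counting hits against two small word lists. `POSITIVE_WORDS`, `NEGATIVE_WORDS`, `classify_sentiment` and `class_counts` are hypothetical names, and the word lists are illustrative placeholders, not the project's data.

```python
import re

# Hypothetical lexicons for illustration only; not the project's word lists.
POSITIVE_WORDS = {"good", "great", "love", "loved", "amazing", "awesome", "best"}
NEGATIVE_WORDS = {"bad", "boring", "hate", "hated", "awful", "terrible", "worst"}


def classify_sentiment(tweet: str) -> str:
    """Return 'positive', 'negative', or 'neutral' from net lexicon hits."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"  # no hits, or hits that cancel out


def class_counts(tweets):
    """Tally how many tweets fall into each of the three classes."""
    counts = {"positive": 0, "neutral": 0, "negative": 0}
    for tweet in tweets:
        counts[classify_sentiment(tweet)] += 1
    return counts
```

For example, `class_counts(["Loved Snitch, great movie", "Saw G.I. Joe today", "That was boring"])` puts one tweet in each class. A trained classifier could replace `classify_sentiment` without changing the tallying step.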

  4. Time-framing issues accessing the tweets by ID
     • Chose the OAuth route (Twitter API v1.1) to deal with the issue, which also failed frequently
     ---------------End of Twitter Data-------------
     • IMDb: huge, unrefined data consisting of many text files
     • According to the theme, the .list files (actors, business, movies and ratings) were selected
     • Ran into memory issues
     • Python code did not fetch proper results
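One common way around memory issues with the IMDb .list files is to stream a file line by line instead of reading it whole. The sketch below assumes the classic ratings.list layout, in which each data line holds a 10-character vote distribution, a vote count, the rating and the title. `iter_ratings`, `ratings_for` and `Rating` are hypothetical helpers, and lines that do not match the assumed layout (headers, separators) are simply skipped.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed layout of a ratings.list data line:
#   <10-char vote distribution>  <votes>  <rating>  <title (year)>
RATING_LINE = re.compile(r"^\s*([0-9.*]{10})\s+(\d+)\s+(\d+\.\d)\s+(.+?)\s*$")


class Rating(NamedTuple):
    title: str
    votes: int
    rating: float


def iter_ratings(lines: Iterable[str]) -> Iterator[Rating]:
    """Yield one Rating per data line, reading lazily.

    Only the current line is held in memory, so a large .list file can be
    scanned without loading it all at once.
    """
    for line in lines:
        match = RATING_LINE.match(line)
        if match:  # header and separator lines do not match and are skipped
            _, votes, rating, title = match.groups()
            yield Rating(title, int(votes), float(rating))


def ratings_for(lines: Iterable[str], wanted: set) -> dict:
    """Keep only the titles of interest (for example, the VAST movies)."""
    return {r.title: r.rating for r in iter_ratings(lines) if r.title in wanted}
```

Because a file object iterates line by line, usage could look like `with open("ratings.list", encoding="latin-1") as f: ratings_for(f, vast_titles)`, where the Latin-1 encoding and the `vast_titles` set are assumptions.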

  5. Brute Force Approach
     • Manual extraction of data
     • Establishing links between the files
     • Data from IMDb API {http://imdbapi.org/}
     • Manually sanitized and converted the data to JSON
     • Python-based processing script to read the JSON
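The sanitized JSON format is not shown in the slides. Assuming a hypothetical schema in which each record carries `id`, `title`, `box_office` and `rating` strings plus an optional `actors` list, a Python reader might normalize values such as "$20,244,505" and "N/A" (the forms that appear in the sample data later in the deck) and index records by IMDb ID so the other files can link to them. `parse_money`, `parse_rating` and `load_movies` are hypothetical names.

```python
import json
from typing import Optional


def parse_money(value: str) -> Optional[int]:
    """'$20,244,505' (or '$20 ,244 ,505') -> 20244505; 'N/A' -> None."""
    cleaned = "".join(value.split()).replace("$", "").replace(",", "")
    return int(cleaned) if cleaned.isdecimal() else None


def parse_rating(value: str) -> Optional[float]:
    """'7.0' -> 7.0; 'N/A' or other non-numeric text -> None."""
    try:
        return float(value)
    except ValueError:
        return None


def load_movies(text: str) -> dict:
    """Index sanitized records by IMDb ID (hypothetical schema)."""
    movies = {}
    for record in json.loads(text):
        movies[record["id"]] = {
            "title": record["title"],
            "box_office": parse_money(record["box_office"]),
            "rating": parse_rating(record["rating"]),
            "actors": record.get("actors", []),  # missing cast -> empty list
        }
    return movies
```

Keeping missing values as `None` rather than 0 lets later averages skip them instead of dragging results down.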

  6. Manual Strategy – Collection & Cleaning
     • Taking VAST movies and IMDb movies into consideration
     • For each movie → top three actors
     • For each actor → movies where he/she is the lead or among the top three
     • For each movie obtained, the opening-weekend box office take and the ratings are collected; we then repeated the process of selecting three actors, performing this operation recursively
     • Factors for prediction and design:
       -- The top three actors in the movie
       -- The leading writer of the movie
       -- The leading director of the movie
       -- Each of the previously listed members' previous success with movies (rating and box office take)
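The recursive collection step can be pictured as a graph expansion: movie → top three actors → their lead movies → their top three actors, and so on. The sketch below writes it iteratively with a queue (breadth-first) rather than with literal recursion, and adds visited sets plus a `max_depth` cap to bound the number of lookups. The cap, `collect`, and the two lookup dictionaries (standing in for IMDb API calls) are assumptions, not details from the slides.

```python
from collections import deque


def collect(seed_movies, top_actors, lead_movies, max_depth=2):
    """Expand movie -> top three actors -> their lead movies, breadth-first.

    top_actors:  movie id -> actor ids in billing order (stand-in for IMDb lookups)
    lead_movies: actor id -> movie ids where that actor is in the top three
    max_depth:   hypothetical cap on actor hops to bound the lookups
    Returns the sets of movie ids and actor ids reached.
    """
    seen_movies = set(seed_movies)
    seen_actors = set()
    queue = deque((movie, 0) for movie in seed_movies)
    while queue:
        movie, depth = queue.popleft()
        if depth >= max_depth:
            continue  # far enough from the seed movies; do not expand further
        for actor in top_actors.get(movie, [])[:3]:
            if actor in seen_actors:
                continue  # this actor's movies were already queued
            seen_actors.add(actor)
            for other in lead_movies.get(actor, []):
                if other not in seen_movies:
                    seen_movies.add(other)
                    queue.append((other, depth + 1))
    return seen_movies, seen_actors
```

Seeded with G.I. Joe: Retaliation (tt1583421) and Dwayne Johnson's lead movies from the sample data, one hop reaches Pain & Gain, Snitch and Journey 2; a second hop adds Snitch's other top-billed actors.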

  7. Flow
     • For actor John Smith (in 3+ movies, VASTMovie1, IMDBMovie1, and IMDBMovie2):
       - VASTMovie1 --- Movie ID --- Sales --- Ratings
           -- John Smith (id 012)
           -- Actor2 (id 123)
           -- Actor3 (id 234)
       - IMDBMovie1 --- Movie ID --- Sales --- Ratings
           -- Actor4 (id 456)
           -- John Smith (id 012)
           -- Actor5 (id 345)
       - IMDBMovie2 --- Movie ID --- Sales --- Ratings
           -- Actor2 (id 123)
           -- Actor6 (id 678)
           -- Actor7 (id 890)
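The flow is essentially a movie → cast mapping; inverting it gives the actor → movies links that connect titles to one another. A minimal Python sketch using the placeholder movies and ids from the flow (the function name `movies_by_actor` is hypothetical):

```python
from collections import defaultdict


def movies_by_actor(movies):
    """Invert a movie -> cast mapping into actor id -> list of movies."""
    index = defaultdict(list)
    for movie, cast in movies.items():
        for actor_id in cast:
            index[actor_id].append(movie)
    return dict(index)


# Placeholder movies and actor ids from the flow example.
flow = {
    "VASTMovie1": ["012", "123", "234"],  # John Smith, Actor2, Actor3
    "IMDBMovie1": ["456", "012", "345"],  # Actor4, John Smith, Actor5
    "IMDBMovie2": ["123", "678", "890"],  # Actor2, Actor6, Actor7
}
index = movies_by_actor(flow)
```

Here `index["012"]` lists VASTMovie1 and IMDBMovie1 for John Smith, and `index["123"]` shows that Actor2 is what ties VASTMovie1 to IMDBMovie2.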

  8. Movies (rating) with their top three actors:
     • G.I. Joe: Retaliation (2013), 6.2: Dwayne Johnson (nm0425005), Jonathan Pryce (nm0000596), Byung-hun Lee (nm0496932)
     • Snitch (2013), 6.8: Dwayne Johnson (nm0425005), Barry Pepper (nm0001608), Jon Bernthal (nm1256532)
     • Masquerade (2012), 7.6: Byung-hun Lee (nm0496932), Seung-yong Ryoo (nm2440627), Hyo-ju Han (nm2174122)
     Each actor's movies (title: IMDb ID, opening-weekend box office - rating):
     • nm0425005 Dwayne Johnson
       - Pain & Gain (2013): tt1980209, $20,244,505 - 7.0
       - G.I. Joe: Retaliation (2013): tt1583421, $40,501,814 - 6.2
       - Snitch (2013): tt0882977, $13,167,607 - 6.8
       - Journey 2: The Mysterious Island (2012): tt1397514, $27,335,363 - 5.7
     • nm0000596 Jonathan Pryce
       - Dark Blood (2012): tt0293069, N/A - N/A
       - Hysteria (2011): tt1435513, $35,656 - 6.7
       - My Zinc Bed (TV 2008): tt1056101, N/A - 5.6
       - Sherlock Holmes and the Baker Street Irregulars (TV 2007): tt0892743, N/A - 6.3
       - The Moon and the Stars (2007): tt0460873, N/A - 6.1
       - Brothers of the Head (2005): tt0432260, $10,794 - 6.3
       - De-Lovely (2004): tt0352277, $123,920 - 6.4
       - The Affair of the Necklace (2001): tt0242252, $125,523 - 6.0
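Slide 6 lists each member's previous success (rating and box office take) as a prediction factor. One simple way to summarize a filmography like the ones above, assuming an unweighted mean and treating N/A as missing (`None`), is sketched below; `track_record` is a hypothetical helper, not the project's code.

```python
from statistics import mean


def track_record(filmography):
    """Average box office take and rating over (take, rating) pairs.

    None marks an N/A value and is left out of that average; if every
    value is missing, None is returned for it.
    """
    takes = [take for take, _ in filmography if take is not None]
    ratings = [rating for _, rating in filmography if rating is not None]
    return (mean(takes) if takes else None,
            mean(ratings) if ratings else None)


# Dwayne Johnson's four titles from the sample data: (opening weekend in $, rating)
johnson = [(20244505, 7.0), (40501814, 6.2), (13167607, 6.8), (27335363, 5.7)]
avg_take, avg_rating = track_record(johnson)
```

For these four titles the sketch gives an average opening weekend of $25,312,322.25 and an average rating of about 6.4. Because missing values are skipped per field, Jonathan Pryce's TV movies still contribute their ratings even though their box office is N/A.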

  9. Prediction Algorithm
     • Validating a sample movie; the numbers given:
     • Calculated from movie attributes:
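The transcript does not preserve the prediction formula or the validation numbers. Purely as a hypothetical illustration of how the slide 6 factors could be combined, the sketch below takes a weighted average of per-person track records (top three actors, lead writer, lead director) and reports the percentage error against a known figure. The equal default weights, `predict`, and `percent_error` are assumptions, not the project's algorithm.

```python
def predict(track_records, weights=None):
    """Weighted average of per-person (avg_take, avg_rating) pairs.

    Hypothetical scheme: track_records would hold one pair each for the top
    three actors, the lead writer and the lead director. None entries are
    skipped and the remaining weights are renormalized.
    """
    if weights is None:
        weights = [1.0] * len(track_records)  # assumption: equal weights

    def combine(position):
        pairs = [(w, rec[position]) for w, rec in zip(weights, track_records)
                 if rec[position] is not None]
        total = sum(w for w, _ in pairs)
        return sum(w * v for w, v in pairs) / total if total else None

    return combine(0), combine(1)  # (predicted take, predicted rating)


def percent_error(predicted, actual):
    """Absolute error as a percentage of the known (actual) value."""
    return abs(predicted - actual) * 100 / actual
```

Validating a sample movie would then mean computing `predict(...)` from the people's track records and checking `percent_error` against the movie's actual opening weekend and rating.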

  10. Tools
     • D3.js: http://d3js.org/
     • jQuery: http://jquery.com/
     • Glue JavaScript
     • Python
     • IMDB API: http://imdbapi.org/
     • IMDbPY: http://imdbpy.sourceforge.net/
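With Python on the data side and D3.js in the browser, one plausible hand-off is a node-link JSON file. The sketch assumes that shape: a `nodes` array plus `links` whose `source`/`target` are node indices, a form D3 force layouts accept. `to_node_link` and `write_graph` are hypothetical, and the deck does not show the format the implementation actually used.

```python
import json


def to_node_link(movies):
    """Build a {"nodes": [...], "links": [...]} graph from movie -> cast data.

    Each link's source/target is an index into the nodes list.
    """
    nodes, position, links = [], {}, []

    def node_index(name, kind):
        key = (kind, name)  # keep a movie and an actor with the same name apart
        if key not in position:
            position[key] = len(nodes)
            nodes.append({"name": name, "type": kind})
        return position[key]

    for movie, cast in movies.items():
        m = node_index(movie, "movie")
        for actor in cast:
            links.append({"source": m, "target": node_index(actor, "actor")})
    return {"nodes": nodes, "links": links}


def write_graph(movies, path):
    """Write the graph as JSON for a D3 page to load."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(to_node_link(movies), f)
```

A D3 page could then load the file with `d3.json` and hand `nodes` and `links` to a force layout; the `type` field gives the page a way to style movie and actor circles differently.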

  11. Future Work
     • Show the actors' other works in the prediction calculation
     • Provide more movie info when a movie node is clicked
     • More effective visualization of movie information when hovering over the center of the circle
     • Info about the selected actor's other works, including statistics

  12. Problems
     • Acquiring data from IMDb and Twitter
     • Not everything could be obtained from IMDb API
     • Several approaches we took initially failed
     • Complexity of the data to handle within the set time frame
     • Due to time constraints, not everything is covered

  13. Implementation: http://www.cs.odu.edu/~mkelly/semester/2013_spring/project2/

  14. Questions?

  15. Thank You!!
