DNC: Big Data and Data Mining in the 2012 US Election
Azamat Kamzin, Mandar Bhide

Overview
• Highlights of Narwhal
• System Organization
• Classification
• Associative patterns
• Predictive models
• References

Highlights
• Codename: Narwhal
• Budget: $100 million
• Lead Developer: Scott VanDenPlas
• Chief Analytics Officer: Dan Wagner
• Team: Approx. 200 members
• General Objective:
  • Bring together information on voters, supporters, and donors in one place (unlike in 2008, when information was split across 6 different servers/vendors)
  • It ranked among the top 20 largest consumer/customer databases ever made
• Size, as per a VanDenPlas tweet:
  • “4Gb/s, 10k requests per second, 2,000 nodes, 3 datacenters, 180TB and 8.5 billion requests...”
  • (Service Provider: Amazon Cloud)

System Organization (diagram)
• Data sources feeding Narwhal:
  • 2008 voter databases
  • Private/public databases
  • Data collection/enrichment:
    • Automated survey of 1.2 million calls per day
    • Tracking visitors' behavior online using cookies
• Narwhal → Dreamcatcher, which estimates:
  • Level of support for Obama
  • Likelihood to vote
  • Estimated donation amount
• Resulting actions:
  • Call/email to motivate the voter
  • Best channel and timeslot to advertise
  • Directing volunteers to the right door
  • Right email ad to the right person

Dreamcatcher: Voter Classification
• Voters were classified into 4 categories

Dreamcatcher: Association Pattern
• Output: Detailed profile of voters
• Inputs are attributes of each individual stored in Narwhal:
  • Voting history
  • Social media likes and comments
  • Volunteering
  • Magazine subscriptions
  • Registered car
  • Insurance data
  • Individual private information from firms like Aristotle

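The slides do not say which technique Dreamcatcher used to find association patterns. As a minimal sketch of one common approach, the support and confidence of a rule such as "voted in 2008 → volunteered" can be computed over per-voter attribute sets. The attribute names and the five records below are made up for illustration and are not campaign data.

```python
def support(records, items):
    """Fraction of records that contain every attribute in `items`."""
    items = set(items)
    return sum(1 for record in records if items <= record) / len(records)


def confidence(records, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the records."""
    antecedent_support = support(records, antecedent)
    if antecedent_support == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(records, both) / antecedent_support


# Hypothetical voter profiles: each set holds the attributes known for one person.
records = [
    {"voted_2008", "volunteered", "magazine_subscriber"},
    {"voted_2008", "volunteered"},
    {"voted_2008", "magazine_subscriber"},
    {"magazine_subscriber"},
    {"voted_2008", "volunteered", "registered_car"},
]

rule_support = support(records, {"voted_2008", "volunteered"})
rule_confidence = confidence(records, {"voted_2008"}, {"volunteered"})
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

A rule with high support and high confidence suggests two attributes tend to occur together, which is one way individual attributes could be combined into a profile.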
Predictive Models
• A/B Testing:
  • To determine which image or text draws a higher user response
  • Ex.: “Learn More” garnered 18.6 percent more signups per visitor than the default of “Sign Up.”
• Time Series Analysis:
  • To understand approval and disapproval trends

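To make the A/B testing bullet concrete, the sketch below computes the relative lift in signups per visitor between two button texts and a two-proportion z statistic. The visitor and signup counts are hypothetical, chosen only so that the lift matches the 18.6 percent figure quoted on the slide; the slides do not give the real counts or say which significance test was used.

```python
import math


def signup_rate(signups, visitors):
    """Signups per visitor for one variant."""
    return signups / visitors


def relative_lift(variant_rate, control_rate):
    """Relative improvement of the variant over the control."""
    return (variant_rate - control_rate) / control_rate


def two_proportion_z(control_signups, control_visitors,
                     variant_signups, variant_visitors):
    """z statistic for the difference between two signup rates (pooled variance)."""
    p_control = control_signups / control_visitors
    p_variant = variant_signups / variant_visitors
    pooled = (control_signups + variant_signups) / (control_visitors + variant_visitors)
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / control_visitors + 1 / variant_visitors)
    )
    return (p_variant - p_control) / std_err


# Hypothetical counts, not campaign data.
control = {"signups": 1000, "visitors": 10000}   # default text: "Sign Up"
variant = {"signups": 1186, "visitors": 10000}   # alternative text: "Learn More"

lift = relative_lift(
    signup_rate(variant["signups"], variant["visitors"]),
    signup_rate(control["signups"], control["visitors"]),
)
z = two_proportion_z(control["signups"], control["visitors"],
                     variant["signups"], variant["visitors"])
print(f"relative lift: {lift:.1%}, z statistic: {z:.2f}")
```

A z statistic well above roughly 2 would suggest the difference is unlikely to be noise at these sample sizes; with fewer visitors the same lift could easily be inconclusive.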
Predictive Models
• Regression
  • Used to estimate electoral votes (the dependent variable) based on top issues such as the economy, healthcare, etc.
  • Packages used were SAS, R, and MATLAB
• Decision Trees
  • We don't believe they used decision trees, due to the large number of attributes, which differ for each individual

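The regression bullet can be illustrated with a minimal least-squares sketch. It is written in Python for illustration only (the slide names SAS, R, and MATLAB), uses a single predictor for brevity where the slide mentions several issues, and all numbers are invented: `economy_approval` is a hypothetical issue score and `electoral_votes` a hypothetical outcome.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted value of the dependent variable for a new x."""
    return slope * x + intercept


# Hypothetical data, not campaign data.
economy_approval = [40, 45, 50, 55, 60]      # independent variable (percent)
electoral_votes = [200, 240, 270, 300, 340]  # dependent variable

slope, intercept = fit_line(economy_approval, electoral_votes)
estimate = predict(slope, intercept, 52)
print(f"slope={slope:.2f} intercept={intercept:.2f} estimate at 52%: {estimate:.1f}")
```

With several issues as predictors, the same idea extends to multiple regression, where each issue gets its own coefficient.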
Reference
• Michael Scherer (November 8, 2012). “How Obama's data crunchers helped him win”. Retrieved from http://www.cnn.com/2012/11/07/tech/web/obama-campaign-tech-team
• Sasha Issenberg (December 19, 2012). “How President Obama’s campaign used big data to rally individual voters”. Retrieved from http://www.technologyreview.com/featuredstory/509026/how-obamas-team-used-big-data-to-rally-voters/