Using Data Science as Evidence in Public Policy With Big Data and Elections. Dr. Brand Niemann Director and Senior Enterprise Architect – Data Scientist Semantic Community http://semanticommunity.info/ AOL Government Blogger http://gov.aol.com/bloggers/brand-niemann/ November 1-2, 2012
Using Data Science as Evidence in Public Policy With Big Data and Elections Dr. Brand Niemann Director and Senior Enterprise Architect – Data Scientist Semantic Community http://semanticommunity.info/ AOL Government Blogger http://gov.aol.com/bloggers/brand-niemann/ November 1-2, 2012 http://semanticommunity.info/CNSTAT
Start by Asking Questions • Which data is by State, which by Congressional District, and which by time? • Which is the easiest to reformat? • Which is the most interesting? • Where have the candidates been? • Which data is free? • Etc. Note: Drew Conway (@drewconway) speaking about the joys, challenges, and power of data science: "Data science, as a discipline, is fundamentally about human behavior." http://semanticommunity.info/AOL_Government/2012_Recorded_Future_User_Conference
Then Look for the Evidence • Brainstorm: • What Have I Done Before? • 2012 Annual Statistical Abstract: • Chapter 7. Elections • Google Searches: • Election and Voting Data • Conferences: • National Academy Seminars • Television: • Debates, etc.
Begin With the End In Mind (Stephen Covey) • Story (publicity and money) • Research Notes (document what I did and learned) • Conditioned Data Sets (added value) • Spotfire Dashboard (cool visualizations) • Lecture to Students at George Mason University (help them learn what a data scientist/data journalist does)
My 5-Step Method • What I like to do to illustrate (data science) and explain (data journalism) is the following (like a recipe): • Put the Best Content into a Knowledge Base (e.g. MindTouch) • The 2012 Annual Statistical Abstract, CNSTAT, etc. • Put the Knowledge Base into a Spreadsheet (Excel) • Linked Data to Subparts of the Knowledge Base • Put the Spreadsheet into a Dashboard (Spotfire) • Data Integration and Interoperability Interface • Put the Dashboard into a Semantic Model (Excel) • Data Dictionaries and Models • Put the Semantic Model into Dynamic Case Management (Be Informed) • Structured Process for Updating Data in the Dashboard
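Step 2 of the recipe (a spreadsheet whose rows link back to subparts of the knowledge base) can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's actual workbook: the page labels and URLs are the ones listed in this deck, while the column names `subpart` and `link` and the helper `build_link_sheet` are hypothetical, and CSV text stands in for the Excel file.

```python
import csv
import io

# Knowledge-base pages named elsewhere in this deck, as (label, URL) pairs.
KNOWLEDGE_BASE_PAGES = [
    ("Knowledge Base", "http://semanticommunity.info/CNSTAT"),
    ("Chapter 7. Elections (Visualizations)",
     "http://semanticommunity.info/FedStats.net#Section_7_ELECTIONS"),
    ("Chapter 7. Elections (Metadata)",
     "http://semanticommunity.info/FedStats.net#Section_7._Elections"),
    ("Story", "http://semanticommunity.info/FedStats.net#Story"),
]


def build_link_sheet(pages):
    """Return CSV text with one row per knowledge-base subpart."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["subpart", "link"])  # hypothetical column names
    writer.writerows(pages)
    return buffer.getvalue()


link_sheet = build_link_sheet(KNOWLEDGE_BASE_PAGES)
print(link_sheet)
```

Keeping the link next to each row is what would let a dashboard built on the spreadsheet point readers back to the source page for every value.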
Knowledge Base http://semanticommunity.info/CNSTAT
2012 Annual Statistical Abstract: Chapter 7. Elections (Visualizations) http://semanticommunity.info/FedStats.net#Section_7_ELECTIONS
2012 Annual Statistical Abstract: Chapter 7. Elections (Metadata) http://semanticommunity.info/FedStats.net#Section_7._Elections
FedStats.net: Commemorating over 135 years of making statistics available to citizens everywhere http://semanticommunity.info/FedStats.net#Story
FedStats.gov Remains Rich Source Of Government Data For Citizens http://gov.aol.com/2012/07/26/fedstats-gov-remains-rich-source-of-government-data-for-citizens/
2012 Annual Statistical Abstract http://www.census.gov/compendia/statab/
Data From CD-ROM to My Server http://semanticommunity.net/StatAbs2012/
Spreadsheet http://semanticommunity.info/@api/deki/files/19606/Elections2012.xls
Welcome to the Campaign 2012 Interactive Dashboard My Note: Not like the next slide! http://campaign2012.c-span.org/electoral-college-map
CNN Electoral Map http://www.cnn.com/ELECTION/2012/ecalculator
CNN Electoral Map in Excel http://semanticommunity.info/@api/deki/files/19606/Elections2012.xls
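An electoral map kept in a spreadsheet boils down to a tally: each state carries a number of electoral votes and an assignment, and the totals are compared with the majority threshold. The sketch below shows that arithmetic only. The four states and their vote counts are the post-2010 apportionment values as best I recall them and should be checked against the spreadsheet; the assignments ("Candidate A", "Candidate B", "Toss-up") are entirely hypothetical and do not come from CNN or from the author's file.

```python
from collections import defaultdict

TOTAL_ELECTORAL_VOTES = 538
VOTES_TO_WIN = TOTAL_ELECTORAL_VOTES // 2 + 1  # simple majority: 270

# (state, electoral votes, HYPOTHETICAL assignment) -- illustrative subset only.
SCENARIO = [
    ("California", 55, "Candidate A"),
    ("Texas", 38, "Candidate B"),
    ("New York", 29, "Candidate A"),
    ("Florida", 29, "Toss-up"),
]


def tally(rows):
    """Sum electoral votes per assignment label."""
    totals = defaultdict(int)
    for _state, votes, assigned_to in rows:
        totals[assigned_to] += votes
    return dict(totals)


totals = tally(SCENARIO)
for label, votes in sorted(totals.items()):
    remaining = max(VOTES_TO_WIN - votes, 0)
    print(f"{label}: {votes} electoral votes ({remaining} short of {VOTES_TO_WIN})")
```

Changing one assignment and re-running the tally is the same "what if" exercise that an interactive electoral calculator offers.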
CNN Electoral Map in Spotfire https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?Elections-Spotfire
Data Set Inventory and Results http://semanticommunity.info/CNSTAT#Story
2012 Annual Statistical Abstract Election Tables Metadata https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?Elections-Spotfire
Table 397. Participation in Elections for President and U.S. Representatives and Table 402. Vote Cast for President, by Major Political Party https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?Elections-Spotfire
Table 405. Electoral Vote Cast for President by Major Political Party--States https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?Elections-Spotfire
Table 408. Apportionment of Membership in House of Representatives, by State https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?Elections-Spotfire
Table 410. Vote Cast by Congressional Districts: 2010 https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?Elections-Spotfire
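The opening question (which data is by State, which by Congressional District, and which by time?) amounts to tagging each table in the inventory and grouping by the tag. A minimal sketch follows, using the five table titles shown on these slides. The geography tags are my own reading of the titles, not metadata taken from the Statistical Abstract, and the tag "time" for Tables 397 and 402 in particular is an assumption.

```python
from collections import defaultdict

# Table numbers and titles as shown on the slides; the "geography" tags are
# ASSUMPTIONS inferred from the titles, not official metadata.
INVENTORY = [
    {"table": 397, "geography": "time",
     "title": "Participation in Elections for President and U.S. Representatives"},
    {"table": 402, "geography": "time",
     "title": "Vote Cast for President, by Major Political Party"},
    {"table": 405, "geography": "state",
     "title": "Electoral Vote Cast for President by Major Political Party--States"},
    {"table": 408, "geography": "state",
     "title": "Apportionment of Membership in House of Representatives, by State"},
    {"table": 410, "geography": "congressional district",
     "title": "Vote Cast by Congressional Districts: 2010"},
]


def group_by_geography(inventory):
    """Map each geography tag to the list of table numbers carrying it."""
    groups = defaultdict(list)
    for entry in inventory:
        groups[entry["geography"]].append(entry["table"])
    return dict(groups)


grouped = group_by_geography(INVENTORY)
for geography, tables in grouped.items():
    print(f"{geography}: tables {tables}")
```

A grouping like this also indicates which boundary file a dashboard would need for each table: State boundaries for the "state" group and Congressional District boundaries for the district group.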
Cover Page https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?Elections-Spotfire
Conclusions and Suggestions • I recently had the pleasure of attending three very interesting and related professional statistical meetings that showed that statisticians really care about current issues. • This made me appreciate that elections are a big data problem that is approached in three basic ways: historical elections data, collection and modeling of polling survey data before the election, and use of social media. • So I inventoried the historical and polling survey data (that I could get) to aid in selection and visualization in a dashboard, and found I needed both Congressional and State boundary files, as shown in a table. • So imagine an election season in which we had fewer or no polls to influence voters, so they could focus on the candidates and the issues. Then, just after the polls closed (by gentleman's agreement with Congress), we would get an amazing example of big data processing in which we could all participate, by seeing the precinct voting results posted to Twitter and processed by the many apps that developers had built to bring us interesting and useful results. I am eager to see that happen in 2014 and 2016! • I will be updating these results with the final 2012 elections data and providing another story.
Extra Slides • Boundary Files: • US States Repositioned • US Counties Repositioned • US Congressional Districts 1 • US Congressional Districts 2 • Sources: • Spotfire • https://silverspotfire.tibco.com/us/library • US Census • http://www.census.gov/cgi-bin/geo/shapefiles2010/main
US States Repositioned https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?Elections-Spotfire
US Counties Repositioned https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?Elections-Spotfire
US Congressional Districts 1 https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?Elections-Spotfire
US Congressional Districts 2 https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?Elections-Spotfire