Federal Big Data Working Group Meetup

Dr. Brand Niemann
Director and Senior Data Scientist
Semantic Community
http://semanticommunity.info/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup
January 7, 2014

Mission Statement
• Federal: Supports the Federal Big Data Initiative, but is not endorsed by the Federal Government or its Agencies;
• Big Data: Supports the Federal Digital Government Strategy, which is "treating all content as data", so big data = all your content;
• Working Group: Data Science Teams composed of Federal Government and Non-Federal Government experts producing big data products (see Possible Team Presentations below); and
• Meetup: The world's largest network of local groups, meant to revitalize local community and help people around the world self-organize, like the MOOCs (Massive Open Online Courses) being considered by the White House.

Co-organizers
• Brand Niemann and Kate Goodier
• Kate Goodier, Host: Excelerate Solutions offices in Tysons Corner:
• Capacity about 50, with Skype and Wi-Fi available. The Silver Line Spring Hill Metro stop (planned to open in March 2014) is across the street (Route 7 and Spring Hill Road).
• Directions to the building are easy, and there is open underground parking:
• See photo below, taken from the Excelerate Solutions office looking south to the Spring Hill Road Silver Line Metro station.
• Logistics:
• Refreshments, restrooms, etc.

Suggested Format
• 6:30 p.m. Tutorials and Refreshments (I will start with the Proposed GMU Course, and hope that others will offer to do tutorials as well)
• 7:00 p.m. Introductions and Announcements (10 seconds per individual, depending on the size of the group)
• Remarks by Dr. George Strawn, Director, NITRD/NCO, and co-chair of the Federal Big Data Senior Steering Work Group
• 7:15 p.m. Featured Presentation/Demonstration (where did you get the data, where did you store the data, and what were your results?)
• Start with our Semantic Big Data Science Application: Semantic Medline on the YarcData Graph Appliance for the Federal Big Data Senior Steering Work Group, which our Semantic Data Science Team recently presented successfully to Lee Watkins Jr., Director of Bioinformatics at the Institute of Genetic Medicine Center for Inherited Disease Research (CIDR).
• 8:30 p.m. Networking/Individual Demos (talk among yourselves and look at one another's work)
• 9:00 p.m. Continue Your Conversations Elsewhere (we need to clear out of the space)

Next Meetups
• Second Meetup: Tuesday, February 4, 6:30 p.m.
• Continue Data Science Tutorial: Practical Data Science for Data Scientists
• What Went Wrong with the Obamacare Web Site, and How Can It Be Fixed? and Why the First Rollout of HealthCare.gov Crashed, an Architectural Assessment, Eric Kavanagh, Inside Analysis, and Geoffrey Malafsky, PSIKORS Institute; Healthcare.gov Data Science, Brand Niemann, Semantic Community; and Healthcare.gov Prototype Video, Kees van Mansom, Be Informed
• Third Meetup: Tuesday, February 18, 6:30 p.m.
• Continue Data Science Tutorial: Modus Operandi Semantic Knowledge Base
• Wave All-Source Semantic Fusion Engine: Eric Little, Modus Operandi; and Department of Defense Metadata Engineers
• Fourth Meetup: March 4, 6:30 p.m.
• Continue Data Science Tutorial: Graph Databases and Bigdata SYSTAP Literature Survey of Graph Databases
• Bigdata SYSTAP, Michael Personick and Bryan Thompson, SYSTAP
• April Workshop: Date and Location TBA
• 2nd Cloud: SOA, Semantics, Data Science, and Business Concept Computing (16th SOA for eGov Conference)

Practical Data Science for Data Scientists http://semanticommunity.info/Data_Science/Practical_Data_Science_for_Data_Scientists

Resources
• Required Textbook
• Doing Data Science: http://shop.oreilly.com/product/0636920028529.do
• Free Sampler: http://cdn.oreillystatic.com/oreilly/booksamplers/9781449358655_sampler.pdf (PDF)
• Optional Supplemental Reading:
• Data Science Starter Kit: http://shop.oreilly.com/category/get/data-science-kit.do
• DC Data Community: http://datacommunitydc.org/blog/about/
• DC Data Community Calendar: http://datacommunitydc.org/blog/calendar/
• Technology Requirements
• Internet and Free Tools like Spotfire Cloud: https://spotfire.cloud.tibco.com/tsc/#!/compproductrequest
• NodeXL: http://nodexl.codeplex.com/

Class 1
• 1/21 What is Data Science and the Data Science Process?
• Discuss Reading: Chapters 1 and 2
• My Resources:
• http://semanticommunity.info/Data_Science
• http://semanticommunity.info/Analytics/Predictive_Analytic_World_Government_2013#Story
• Hands-on Class Exercise: Individual and Team Profiles and Case Study: RealDirect

Tutorial
• Overview: Data Science and the Data Science Process
• My Profile: Breaking Government/AOL Government Data Stories and Products
• Select some interesting content and make it structured
• Select a related data set/table
• Explore both and write a story about it:
• Where did you get the data?
• Where did you store the data?
• What were your results?
• What were the steps?
• Assignment: Do something like My Profile

Overview: Data Science Key Concepts Extracted
What is Data Science?
The future belongs to the companies and people that turn data into products.
See Sidebar Topics: http://semanticommunity.info/Data_Science

Overview: Data Science Process
So my three overlapping circles are: "Find and Prepare Data Sets", "Store and Query Data Sets", and "Discover Data Stories in the Data Sets".
See mapping between the three Venn Diagrams in the table below.
http://semanticommunity.info/Analytics/Predictive_Analytic_World_Government_2013#Story

Select some interesting content http://breakinggov.com/2012/03/30/defense-department-bets-big-on-big-data/

Make it structured http://semanticommunity.info/@api/deki/files/27612/SpotfireCloud.xlsx

Select a related data set/table
My Note: This is
• Categorized (Faceted Search)
• Correlation (Two Numeric Variables)
• Relational (Columns and Rows)
• Linked (URLs)
• Semantic Web (Subject, Predicate, and Object)
• Graph/Network Analytics (Edge and Node Tables)
• Geospatial (Could add Latitude and Longitude)
http://semanticommunity.info/@api/deki/files/27612/SpotfireCloud.xlsx
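To make the "Relational", "Semantic Web", and "Graph/Network Analytics" views in the note above concrete, here is a minimal Python sketch of how one table row could be re-expressed as subject-predicate-object triples and then as node and edge tables. The column names and the sample row are hypothetical and are not taken from SpotfireCloud.xlsx; the function names are also made up for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def row_to_triples(row: Dict[str, str], key: str) -> List[Triple]:
    """Turn one relational row into (subject, predicate, object) triples.

    The value in the `key` column becomes the subject; every other
    column name becomes a predicate and its cell value the object.
    """
    subject = row[key]
    return [(subject, column, value)
            for column, value in row.items() if column != key]


def triples_to_graph(triples: List[Triple]):
    """Split triples into a node table and an edge table."""
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    edges = [{"source": s, "target": o, "label": p} for s, p, o in triples]
    return nodes, edges


# Hypothetical row, invented only to show the shape of the data.
row = {
    "title": "Example story",
    "category": "Big Data",
    "url": "http://example.org/story",
}
triples = row_to_triples(row, key="title")
nodes, edges = triples_to_graph(triples)
print(triples)
print(nodes)
print(edges)
```

The same row therefore supports faceted search (the category column), linking (the URL column), and network analysis (the edge table) without changing the underlying data.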

AOL Gov to BreakingGov Migration
Note: The lack of correlation between Excel size and Spotfire size is due to the presence of large boundary (Shape) files.
Web Player
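One way to check a "lack of correlation" claim like the one in the note above is to compute the Pearson correlation coefficient between the two size columns. The sketch below is only an illustration: the function is a plain-Python implementation of the standard formula, and the sizes are placeholder numbers invented to exercise it, not measurements from the migration.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Placeholder file sizes in MB, invented only to exercise the function.
excel_mb = [1.0, 2.0, 3.0, 4.0]
spotfire_mb = [40.0, 5.0, 60.0, 8.0]
r = pearson_r(excel_mb, spotfire_mb)
print(f"r = {r:.3f}")
```

A value of r near zero would be consistent with the note: when a few files carry large boundary data, Spotfire size stops tracking Excel size.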

Spotfire Silver to Spotfire Cloud Migration Web Player

Explore both and write a story about it
• Where did you get the data?
• The Web and spreadsheets
• Where did you store the data?
• Spreadsheets
• What were your results?
• All files were accounted for in the two migrations (data quality), versatile formats were created, and the visualizations help me and others build on this data science work
• Steps:
• Search MindTouch for the Spotfire file name (for example, GDELT-Spotfire)
• Find where it was used, at one or more locations
• Change the Web Player links in the Spotfire Dashboard, Story, and Slides
• Test to see if the embedded file works
• Repeat the process 283 times!
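The steps above were done by hand, but the same loop can be sketched in Python. This is a hedged illustration, not the procedure actually used: the in-memory `pages` dictionary stands in for a MindTouch search, the URL bases are hypothetical example.org addresses, and the "test" step only checks that no old link remains rather than loading the embedded file.

```python
from typing import Dict, List, Tuple


def find_pages_using(file_name: str, pages: Dict[str, str]) -> List[str]:
    """Steps 1-2: return the titles of pages whose text mentions the file."""
    return [title for title, body in pages.items() if file_name in body]


def swap_link(body: str, file_name: str, old_base: str, new_base: str) -> str:
    """Step 3: point the Web Player link for one file at the new location."""
    return body.replace(old_base + file_name, new_base + file_name)


def migrate(file_names: List[str], pages: Dict[str, str],
            old_base: str, new_base: str) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Step 5: repeat the search/replace/test cycle for every file."""
    updated = dict(pages)  # leave the caller's pages untouched
    report: Dict[str, List[str]] = {}
    for name in file_names:
        hits = find_pages_using(name, updated)
        for title in hits:
            updated[title] = swap_link(updated[title], name, old_base, new_base)
            # Step 4 (simplified): the old link must be gone after the swap.
            assert old_base + name not in updated[title]
        report[name] = hits
    return updated, report


# Hypothetical pages and URL bases, for illustration only.
pages = {
    "Dashboard": "Web Player: http://old.example.org/GDELT-Spotfire",
    "Story": "No embedded file on this page.",
}
updated, report = migrate(["GDELT-Spotfire"], pages,
                          "http://old.example.org/", "http://new.example.org/")
print(report)
print(updated["Dashboard"])
```

Keeping a per-file report of where each link was found mirrors the data-quality result above: every file can be accounted for after the migration.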

Preview of What You Are Going To Hear
• The Best Way to Get BIG DATA is By Starting Small:
• BIG DATA
• Subcommittee on Networking and Information Technology Research and Development (NITRD Subcommittee)
• These three activities fostered Semantic Medline on the YarcData Graph Appliance for the White House Big Data Initiative.
• Data Science Team Example
• Generic Problems
• Semantic Medline – YarcData Graph Appliance Application for Federal Big Data Senior Steering WG
• Modus Operandi: Mantra, Performance, and Vision
• Knowledge Base: Modus Operandi Web Intelligence in MindTouch
• Big Data in Memory: Innovation Story
