Federal Big Data Working Group Meetup. Dr. Brand Niemann Director and Senior Data Scientist Semantic Community http://semanticommunity.info/ http://www.meetup.com/Federal-Big-Data-Working-Group/ http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup June 30, 2014.
Mission Statement
• Federal: Supports the Federal Big Data Initiative, but is not endorsed by the Federal Government or its Agencies;
• Big Data: Supports the Federal Digital Government Strategy, which is "treating all content as data", so big data = all your content;
• Working Group: Data Science Teams composed of Federal Government and Non-Federal Government experts producing big data products (How was the data collected, Where is it stored, and What are the results?); and
• Meetup: The world's largest network of local groups to revitalize local community and help people around the world self-organize, like the MOOCs (Massive Open Online Courses) being considered by the White House.
Co-organizers: Brand Niemann and Katherine Goodier
What Are We Doing?
• Leadership of the Semantic Data Science Team that produced Semantic Medline running on the YarcData Graph Appliance.
• Founding and co-organizing of the Federal Big Data Working Group Meetup.
• A graduate class prepared for GMU entitled “Practical Data Science for Data Scientists”.
• Using the Cross Industry Standard Process for Data Mining (CRISP-DM; Shearer, 2000) to build a Data Science Knowledge Base.
• Mining of the Data Science and Digital Earth scientific journals for the CODATA International Workshop on Big Data for International Scientific Programmes (June 8-9, in Beijing).
• Participation in the Data FAIRport (Findable, Accessible, Interoperable, and Reusable) with “Data Publication in Data Browsers”.
• Providing data stories that persuade and presentation materials for public education conferences like the COM.BigData Conference (August 4-6, in Washington, DC).
How Are We Doing It?
• Federating Use Cases: Data Science (Brand Niemann); Environmental and Earth Science (Joan Aron); and Astronomy (Kirk Borne)
• Federating Data Publications: Structured Scientific Content (Papers, journals, books, reports, etc.); Data FAIRports (Findable, Accessible, Interoperable, and Reusable); and Data Stories That Persuade (Claims and Evidence)
• Federating Solutions & Technologies: Hand-Crafted by Individuals and Teams (Mary Galvin, STEM); Data Mining Standards and Products (Brand Niemann, Data Publications in Data Browsers); Machine Processing (Fredrik Salvesen, Semantic Data Publications on YarcData Graph Appliance); Reading and Reasoning (Katherine Goodier and Chuck Rehberg, Semantic Insights on Elsevier Content Text Mining); and Data Curation at Scale (Alan Wagner, Tamr on 1000s of Spreadsheets)
Data FAIRport Final Report, Interview, and Joint Hackathons Started http://datafairport.org/ http://semanticommunity.info/Data_Science/Euretos_BRAIN
June 2nd Meetup: Ontology Summit 2014 Postmortem and Reading & Reasoning with Semantic Insights
• How Was the Meetup?
  • Meeting was very good.
  • This is a smart group. Thank you.
• We Listen and Respond:
  • Visual Document Mining. Simple clustering applied to text (with no ontology).
  • SIRA is much more advanced, but you might like to watch the 4-minute video.
  • https://www.overviewproject.org/
http://www.meetup.com/Federal-Big-Data-Working-Group/events/186838842/
The Overview Project https://www.overviewproject.org/
Fourth Paradigm and Fourth Question
• The Fourth Paradigm of Science (1):
  • First Paradigm. Observation, descriptions of natural phenomena, and experimentation.
  • Second Paradigm. Theoretical science such as Newton’s laws of motion and Maxwell’s equations.
  • Third Paradigm. Simulation and modelling, such as in astronomy.
  • Fourth Paradigm. Data-intensive science that exploits the large volumes of data in new ways for scientific exploration, such as the International Virtual Observatory Alliance in astronomy.
• The Fourth Question of Big Data for Science (2):
  • How was the data collected?
  • Where is the data stored?
  • What are the data results?
  • Does the data story persuade?
(1) Bell, G., Hey, T., & Szalay, A. (2009) Beyond the data deluge, Science 323, 6 March 2009, pp. 1297-1298.
(2) de Waard, A. (2014) About Stories, that Persuade With Data, Federal Big Data Working Group Meetup, 20 May, 41 slides.
Activities
• Mentoring:
  • White House Energy Datapalooza, May 28 (In process with Alexandra Winkler, Knowledge Cities Graduate Student)
• Health Datapalooza V, June 1-3, and HHS Fellowship:
  • Story and Application for HHS 12-month External Entrepreneur Fellowship for Innovative Design, Development and Linkages of Databases
• Big Data for Government, June 16-17:
  • Keynote from Dr. George Strawn and Presentation by Dr. Tom Rindflesch and Semantic Medline/YarcData Team
• Improving Government Performance in the Era of Big Data: Opportunities and Challenges for Federal Agencies, June 19:
  • A big data workshop hosted by White House Office of Science and Technology Policy and the Georgetown University McCourt School of Public Policy’s Massive Data Institute with Mary Galvin
• EarthCube All-Hands Meeting, June 24-26:
  • ESIP Earth Science Analytics with Joan Aron, Global Environmental/Climate Change Scientist
• Keynote and Panel: COM.BigData 2014, August 4-6:
  • Katherine Goodier, Organizer and Moderator, with Joan Aron, Mary Galvin, Chuck Rehberg, Tom Rindflesch, and Kirk Borne
EarthCube Data Science Publications http://workspace.earthcube.org/earthcube-data-science-publications http://semanticommunity.info/Data_Science/EarthCube_Data_Science_Publications
Keynote and Panel: COM.BigData 2014 http://www.com-geo.org/conferences/2014/prog_keynotes.htm
Next Meetup: July 7 Data Science of White House Big Data Review and Brooke Aker: Big Data Lens on openFDA
• Katherine Goodier:
  • Legislative Data and Transparency Conference and Use Case on Privacy and Security
• Mary Galvin:
  • Improving Government Performance in the Era of Big Data: Opportunities and Challenges for Federal Agencies
• Brooke Aker:
  • Background
    • Working on data analytics since 1987, when I did my first regression analysis on surplus government cheese! Now working on healthcare and security predictive analytics and machine learning.
  • Networking and Agenda with Announcements, Presentations, Training, and Demos
    • Here is a nice method to use if you are seeking to understand new technology, its applicability, and its readiness for use. It is also emblematic of good Big Data practice: turning a large, free information resource into something valuable with simple, straightforward thinking, driven by sophisticated software. Enjoy. http://www.bigdatalens.com/blog/2014...ta-methodology
  • Participation in other Meetups
    • Lots of other Big Data Meetup Groups. Was at the Data Salon in Cambridge, Mass. last night!
  • Do you live near a DC Metro Station and use Skype?
    • Use Skype for sure
June 30th Meetup: Continue Data Science Tutorial
• Practical Data Science for Data Scientists:
  • Reading Assignments:
    • Chapter 15: The Students Speak
      • We invited the students who took Introduction to Data Science version 1.0 to contribute a chapter to the book. They chose to use their chapter to reflect on the course and describe how they experienced it.
    • Chapter 16: Next-Generation Data Scientists, Hubris, and Ethics
      • "The best minds of my generation are thinking about how to make people click ads… That sucks." (Jeff Hammerbacher)
      • We’d like to encourage the next-gen data scientists to become problem solvers and question askers, to think deeply about appropriate design and process, and to use data responsibly and make the world better, not worse.
  • Resources: AmericasDataFestCompetition
• Team Homework Exercise:
  • Study about Graph Databases, Graph Computing, and Semantic Medline
  • Review Wiki and View Videos: YarcDataVideos (Schizo-7 minutes, Cancer-21 minutes).
  • Ask Me Questions and Prepare to Ask Questions Next Week
Practical Data Science for Data Scientists Providing On-Line Class With Private Tutoring Class 8 http://semanticommunity.info/Data_Science/Practical_Data_Science_for_Data_Scientists
Follow Ben Shneiderman's 8 Golden Rules of Data Science
• Preparation
  • Choose actionable problems & appropriate theories
  • Consult domain experts & generalists
• Exploration
  • Examine data in isolation & contextually
  • Keep cleaning & add related data
  • Apply visualizations & statistical patterns, clusters, gaps, outliers, missing & uncertain data
• Decision
  • Evaluate your efficacy, refine your theory
  • Take responsibility, own your failures
  • World is complex, proceed with humility
Source: "8 Golden Rules of Data Science" http://semanticommunity.info/Data_Science/Ben_Shneiderman
Agenda
• MIT Big Data Initiative: Sam Madden & Current Elephants: Michael Stonebraker
• Background: See Workshops on Extremely Large Databases
• 6:30 pm Welcome and Introduction
• 6:35 pm Laura Keilson - Xcelerate Solutions is hiring!
• 6:40 pm MIT Big Data Initiative: bigdata@CSAIL and the new Intel Science and Technology Center for Big Data, Sam Madden
• 7:10 pm Brief Member Introductions
• 7:15 pm Alan Wagner, Tamr Demo
• 7:30 pm Why the current "elephants" are good at nothing, Data Tamer, and data integration issues, Michael Stonebraker
• 8:30 pm Open Discussion
• 8:45 pm Networking
• 9:00 pm Depart
• July 7 and August 4: Once a month
• Silver Line Spring Hill Metro Station Opens July 26th