Federal Big Data Working Group Meetup. Dr. Brand Niemann Director and Senior Data Scientist Semantic Community http://semanticommunity.info/ http://www.meetup.com/Federal-Big-Data-Working-Group/ http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup July 28, 2014.
Mission Statement
• Federal: Supports the Federal Big Data Initiative, but is not endorsed by the Federal Government or its Agencies;
• Big Data: Supports the Federal Digital Government Strategy, which is "treating all content as data", so big data = all your content;
• Working Group: Data Science Teams composed of Federal Government and Non-Federal Government experts producing big data products (How was the data collected, Where is it stored, What are the results, and Does the data story persuade?); and
• Meetup: The world's largest network of local groups to revitalize local community and help people around the world self-organize, like the MOOCs (Massive Open Online Courses) being considered by the White House to reduce the cost of higher education.
Co-organizers: Brand Niemann and Katherine Goodier
What Are We Doing?
• Leadership of the Semantic Data Science Team that produced Semantic Medline running on the Yarc Data Graph Appliance.
• Founding and co-organizing of the Federal Big Data Working Group Meetup.
• A graduate class prepared for GMU entitled “Practical Data Science for Data Scientists”.
• Using the Cross Industry Standard Process for Data Mining (CRISP-DM; Shearer, 2000) to build a Data Science Knowledge Base.
• Mining of the Data Science and Digital Earth scientific journals for the CODATA International Workshop on Big Data for International Scientific Programmes, June 8-9, in Beijing.
• Participation in the Data FAIRport (Findable, Accessible, Interoperable, and Reusable) with “Data Publication in Data Browsers”.
• Providing data stories that persuade and presentation materials for public education conferences like the COM.BigData Conference, August 4-6, in Washington, DC.
How Are We Doing It?
• Federating Use Cases: Data Science (Brand Niemann); Environmental and Earth Science (Joan Aron); and Astronomy (Kirk Borne)
• Federating Data Publications: Structured Scientific Content (papers, journals, books, reports, etc.); Data FAIRports (Findable, Accessible, Interoperable, and Reusable); and Data Stories That Persuade (Claims and Evidence)
• Federating Solutions & Technologies: Hand-Crafted by Individuals and Teams (Mary Galvin, STEM); Data Mining Standards and Products (Brand Niemann, Data Publications in Data Browsers); Machine Processing (Fredrik Salvesen, Semantic Data Publications on Yarc Data Graph Appliance); Reading and Reasoning (Katherine Goodier and Chuck Rehberg, Semantic Insights on Elsevier Content Text Mining); and Data Curation at Scale (Alan Wagner, Tamr on 1000s of Spreadsheets)
Data FAIRPort Final Report, Interview, and Joint Hackathons Started http://datafairport.org/ http://semanticommunity.info/Data_Science/Euretos_BRAIN
Fourth Paradigm and Fourth Question
• The Fourth Paradigm of Science (1):
  • First Paradigm. Observation, descriptions of natural phenomena, and experimentation.
  • Second Paradigm. Theoretical science such as Newton’s laws of motion and Maxwell’s equations.
  • Third Paradigm. Simulation and modelling, such as in astronomy.
  • Fourth Paradigm. Data-intensive science that exploits the large volumes of data in new ways for scientific exploration, such as the International Virtual Observatory Alliance in astronomy.
• The Fourth Question of Big Data for Science (2):
  • How was the data collected?
  • Where is the data stored?
  • What are the data results?
  • Does the data story persuade?
(1) Bell, G., Hey, T., & Szalay, A. (2009) Beyond the data deluge, Science 323, 6 March 2009, pp. 1297-1298.
(2) de Waard, Anita (2014) About Stories, that Persuade With Data, Federal Big Data Working Group Meetup, 20 May, 41 slides.
July 7th Meetup: Data Science for the Big Data Review and Brooke Aker: Big Data Lens
• How Was the Meetup?
  • OpenFDA seems to provide a very easy interface for analysis against diverse data.
  • OpenFDA looks great - API needs some use!
  • Lots of good presenters and topics. Great reference material on the semantic web site as always. Thanks Brand and Katherine!
  • Along with introducing meeting participants to OpenFDA, the speakers, as is true of all good teachers, kept repeating the basic concepts and methods. Very good.
  • Meetup had excellent overview of OpenFDA, thanks for sharing the slides and links.
  • Informative discussion of FDA application
• We Listen and Respond:
  • File Name: OpenFDA code.rtf. Description: Here is the Python (ver 2.7) code I promised last night. Just comment/un-comment the section you want to run.
  • Here is a tutorial, albeit in a different domain, for beginners who want to learn just enough to access and visualize data on the web. http://www.vidyasource.com/tutorial/Web/HTML/JavaScript/Python/REST/Architecture/Programming/2014/01/14/starting-with-data
  • Here is the link to Glue, a Python tool for exploring and visualizing data sets. http://www.glueviz.org/en/latest/
http://www.meetup.com/Federal-Big-Data-Working-Group/events/186838842/
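For readers who want to give the openFDA API "some use", here is a minimal sketch of a query helper. It is not the code from the OpenFDA code.rtf file, which is not reproduced in these slides. It assumes the public drug adverse event endpoint at https://api.fda.gov/drug/event.json and its `search`, `count`, `limit`, and `skip` query parameters, and it is written for Python 3 rather than 2.7. The function names and the field name in the usage note are illustrative assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the openFDA drug adverse event endpoint.
BASE_URL = "https://api.fda.gov/drug/event.json"


def build_query_url(search=None, count=None, limit=None, skip=None):
    """Build a query URL; parameters left as None are omitted."""
    params = {}
    if search is not None:
        params["search"] = search
    if count is not None:
        params["count"] = count
    if limit is not None:
        params["limit"] = limit
    if skip is not None:
        params["skip"] = skip
    query = urlencode(params)
    return BASE_URL + ("?" + query if query else "")


def fetch(url, timeout=30):
    """Download and decode one JSON response (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def top_terms(payload, n=5):
    """Return the first n (term, count) pairs of a `count` query response.

    Assumes the response carries a "results" list of
    {"term": ..., "count": ...} objects.
    """
    rows = payload.get("results", [])[:n]
    return [(row["term"], row["count"]) for row in rows]
```

A call such as `fetch(build_query_url(search="patient.drug.medicinalproduct:aspirin", limit=5))` would then return up to five matching reports, assuming that field name is valid for the endpoint.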
Brooke Aker Followup
• Hoping everyone learned as much as I did. Good to have the chance to interact with like-minded folks. And all your connections and pointers are a big help to me, Brand.
• I can write the routine to spit out data into CSV (spreadsheet) format for you if you want. Do you know what you want to concentrate on?
• Thank you and I do not know beforehand until I look at all the data rows and columns if possible.
• OK you got it. How much (many records) are you looking for?
• The 3.6 Million!
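The CSV export discussed in this exchange could be sketched as follows. This is not Brooke Aker's routine. It assumes each record is a nested JSON-style dictionary, flattens nested keys into dotted column names, and joins list values with "|". The function names and sample fields are hypothetical.

```python
import csv
import io


def flatten(record, prefix=""):
    """Flatten nested dicts into a single level with dotted keys.

    List values are joined into one cell with "|" (a simplification).
    """
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        elif isinstance(value, list):
            flat[name] = "|".join(str(item) for item in value)
        else:
            flat[name] = value
    return flat


def records_to_csv(records, out):
    """Write records to the file-like object `out`; return the column names."""
    rows = [flatten(record) for record in records]
    # Use the union of all keys so that no column is lost.
    columns = sorted({column for row in rows for column in row})
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return columns


def records_to_csv_text(records):
    """Convenience wrapper that returns the CSV as a string."""
    buffer = io.StringIO()
    records_to_csv(records, buffer)
    return buffer.getvalue()
```

This version holds every row in memory to discover the columns. For millions of records, a fixed column list and batch-by-batch writing would be the more practical design.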
Data Science for OpenFDA http://semanticommunity.info/Data_Science/Data_Science_for_OpenFDA
Data Science for Vertical Data Mining http://semanticommunity.info/Data_Science/Data_Science_for_Vertical_Data_Mining
Data Science for RDA http://semanticommunity.info/Data_Science/Data_Science_for_RDA
Data Mining NSF Selected Publications We are starting to mine selected NSF publications for funding opportunities, data sets, policy guidance, etc. by hand and with SIRA. http://semanticommunity.info/Data_Science/NSF_Strategic_Plan/Selected_Publications#Story
Earth and Space Science Education and Informatics
• Thought Leader 1: Professor Kirk Borne
  • Topic 1-1: Teaching Science Data Analytics Skills Needed to Facilitate Heterogeneous Data/Information Research: The Future Is Here
  • Topic 1-2: Identifying and Better Understanding Data Science Activities, Experiences, Challenges, and Gap Areas
  • Topic 1-3: Advancing Analytics using Big Data Climate Information System
  • Topic 1-4: Big Data in the Geosciences: New Analytics Methods and Parallel Algorithms
• Thought Leader 2: Dr. Michael Stonebraker
  • Topic 2-1: Leveraging Enabling Technologies and Architectures to Enable Data Intensive Science
  • Topic 2-2: Open Source Solutions for Analyzing Big Earth Observation Data
  • Topic 2-3: Technology Trends for Big Science Data Management
Big Data Science for Astronomy & Space and ESIP Earth Sciences Data Analytics
DRAFT Future Meetups
• August 4: Keynote and Panel: COM.BigData 2014
• September 8: OSTP / NITRD FASTER CoP
• October 6: Wolfram Language and Michael Daconta, Build a Knowledge Base with my (experimental) software EzKb
• November 3: Georgetown Massive Data Institute
• December 1: NSF GEO/EarthCube and ICER (Integrative and Collaborative Education & Research)
Keynote and Panel: COM.BigData 2014
Next Meetup: August 4-6 http://www.com-geo.org/conferences/2014/prog_keynotes.htm
Michael Daconta: EzKb 1
• Install the Easy Knowledge Base Editor (EzKb) Today!
• For an online article I am writing for my Government Computer News column (called Reality Check), I am releasing an alpha release of my Java software called the Easy Knowledge Base Editor (or EzKb for short). I created a Windows installer and Windows executable that you can download here. If there is interest I will create a manual install with the Jar files for Linux and MacOS installations.
• There are some help files (but I need to create many more) and some YouTube videos on my YouTube channel to give you an introduction to the software and some of the things that it can do. Be warned that this is alpha software so it is not feature complete. It has integrated maps and an integrated WordNet dictionary.
• The way to think about this software is that each tab represents a layer in your knowledge base, starting from the smallest layer (a single fact) to more complex layers like things (aka Entities or Classes), to a relationship editor (connect things to create relationships) to rules (if-then constructs) to triggers (when to execute rules) and many other items (like a .csv file import). I have a grand vision for this software and frankly, not enough time to actually create what I envision. So, I am releasing it in this alpha state and then will continue to improve it (as time permits).
• Enjoy!
http://www.daconta.us/Articles/EzKb-Installer.html
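The layering described here (facts, then relationships, then if-then rules) can be illustrated with a small sketch. This is not EzKb's implementation or API, since EzKb is Java software whose internals are not shown in these slides. It assumes facts are stored as (subject, relation, object) triples and that a rule is a function deriving new triples from known ones. All names and the genealogy data are hypothetical.

```python
from itertools import product

# Assumption: a fact is a (subject, relation, object) triple.
facts = {
    ("Ann", "parent_of", "Bob"),
    ("Bob", "parent_of", "Cid"),
}


def grandparent_rule(known):
    """If X parent_of Y and Y parent_of Z, then X grandparent_of Z."""
    derived = set()
    for (x, rel1, y1), (y2, rel2, z) in product(known, repeat=2):
        if rel1 == "parent_of" and rel2 == "parent_of" and y1 == y2:
            derived.add((x, "grandparent_of", z))
    return derived


def forward_chain(known, rules):
    """Apply every rule repeatedly until no new facts appear."""
    known = set(known)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new
```

Calling `forward_chain(facts, [grandparent_rule])` adds the derived triple `("Ann", "grandparent_of", "Cid")` to the two original facts.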
Michael Daconta: EzKb 2
• 16:35: Demonstration of Inferencing in EZKB by Michael Daconta
  • Demonstrate the creation of rules and inferencing using a Genealogy Knowledge Base. NOTE: recommend viewing this screen ...
  • https://www.youtube.com/watch?v=7Bh-bbF6Lsc
• 5:14: Demonstration of Creating Triggers in EZKB by Michael Daconta
  • Triggers enable you to fire actions and rules that act against your knowledge base. This video walks you through the detail of ...
  • https://www.youtube.com/watch?v=7glXwc3bbkM
• 7:42: Demonstration of Creating Classes and Inheritance in EZKB by Michael Daconta
  • How to use EZKB to implement inheritance of attributes in modeling your data. NOTE: recommend viewing this screen cast in ...
  • https://www.youtube.com/watch?v=HRIsLL16wi0
• 3:52: Demonstration of Alert Triggers and Rules in EZKB by Michael Daconta
  • This video walks you through creating a trigger and alert to monitor a credit card statement for purchases that exceed a threshold.
  • https://www.youtube.com/watch?v=eSe_j5fBEhM
• 7:42: Intro to Facts in EZKB by Michael Daconta
  • A basic explanation of adding facts to your knowledge base. NOTE: recommend viewing this screen cast in Full-Screen Mode for ...
  • https://www.youtube.com/watch?v=qB_Kzoo5VJ8
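The alert-trigger idea from the credit card video can be sketched in a few lines. This is an illustration of the general pattern only, not a reproduction of what EZKB does. It assumes a trigger pairs a condition with an action and is run over a list of purchases. The class names, fields, and message format are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Purchase:
    merchant: str
    amount: float


@dataclass
class Trigger:
    """Runs `action` on every purchase for which `condition` holds."""
    condition: Callable[[Purchase], bool]
    action: Callable[[Purchase], str]

    def run(self, purchases: List[Purchase]) -> List[str]:
        return [self.action(p) for p in purchases if self.condition(p)]


def threshold_alert(limit: float) -> Trigger:
    """Build a trigger that alerts on purchases strictly above `limit`."""
    return Trigger(
        condition=lambda p: p.amount > limit,
        action=lambda p: (
            f"ALERT: {p.merchant} charged {p.amount:.2f} (limit {limit:.2f})"
        ),
    )
```

For a statement with a 42.10 grocery purchase and a 612.00 airline purchase, `threshold_alert(500).run(statement)` yields a single alert for the airline charge.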
Data Science Challenges
• DataBay "Reclaim the Bay" Innovation Challenge:
  • August 1-3, 2014, Smithsonian Environmental Research Center, 647 Contees Wharf Rd, Edgewater, MD 21037
  • http://databay.splashthat.com
  • http://semanticommunity.info/Data_Science/Data_Science_for_DataBay
• HHS VizRisk Challenge is the first-ever government hackathon that brings together students, designers, coders, healthcare enthusiasts, and government officials to produce visualizations of behavioral health data to inform personal and policy decisions.
  • Open for submissions July 28th - October 28th:
  • directors@hhsvizrisk.org
OMB Ontology and Ontologizing Memo
• In Summary:
• An Ontology:
  • is a formal representation of meaning in an information system;
  • creates the bridge between the internal world of the computer and the external world of people’s understanding;
  • provides an interlingua between disparate data sources and knowledge bases;
  • allows us to build useful and usable systems for complex tasks in health care.
• Remember:
  • don’t try to divorce the Ontology from its application (the ‘universal ontology’)
  • building and embedding an Ontology in a useful application has pitfalls that require judgment, experience, clarity of purpose, and resources.
http://semanticommunity.info/Data_Science/NSF_Strategic_Plan#Ontology_and_Ontologizing
Agenda
• Silver Line Metro For OMB Ontology Memo from Multiple Experts (Blevins, Morosoff, & Pohl)
• Agenda:
  • 6:30 p.m. Welcome and Introduction (New Tutorial and Mentoring)
  • 6:45 p.m. Peter Morosoff, President, E-MAPS, Inc.
  • 7:00 p.m. Jens G. Pohl, PhD, Professor of Architecture, Emeritus, California Polytechnic State University, Senior Director, Adaptive Systems, Tapestry Solutions (a Boeing Company). Intelligent Information Management Tools in a Service-Oriented Software Environment
  • 7:30 p.m. Brief Member Introductions
  • 7:45 p.m. David Blevins, Staff Engineer, Booz Allen Hamilton. Currently supports Life Sciences research performed by the Federal Government. Ontologies in Medical Care and Integration/Reuse Challenges at the Clinical and Enterprise Level
  • 8:15 p.m. Open Discussion
  • 8:45 p.m. Networking
  • 9:00 p.m. Depart