SOA Pilots: Federation of SOA and Semantic Medline. Dr. Brand Niemann Director and Senior Enterprise Architect – Data Scientist Semantic Community http://semanticommunity.info/ AOL Government Blogger http://gov.aol.com/bloggers/brand-niemann/ October 2, 2012.
SOA Pilots: Federation of SOA and Semantic Medline Dr. Brand Niemann Director and Senior Enterprise Architect – Data Scientist Semantic Community http://semanticommunity.info/ AOL Government Blogger http://gov.aol.com/bloggers/brand-niemann/ October 2, 2012 http://semanticommunity.info/Federal_SOA/14th_SOA_for_E-Government_Conference_October_2_2012 http://semanticommunity.info/Federal_SOA/Federation_of_SOA http://semanticommunity.info/A_NITRD_Dashboard/Semantic_Medline http://semanticommunity.info/Federal_SOA/14th_SOA_for_E-Government_Conference_October_2_2012#Blog_By_Brand_Niemann
Overview • Key People: • Gus Hunt, CTO, CIA • Robert Ames, Senior VP for Technology, In-Q-Tel • Dr. George Strawn, Director, National Coordination Office, Networking and Information Technology Research and Development (NITRD) Program, OSTP, White House • Big Data: • Transactions • Interactions • Conversations • Four Vs: • Volume (Terabytes to Zettabytes) • Variety (Structured to Structured and Unstructured) • Velocity (Batch to Streaming Data) • Value (Worth the Extra Expense? Need Data Scientist)
Intelligence Community Loves Big Data • Gus Hunt, CTO, CIA http://gov.aol.com/2012/03/13/why-the-intelligence-community-loves-big-data/ http://semanticommunity.info/AOL_Government/Intelligence_Community_Loves_Big_Data
Big Data and the Government Enterprise • Robert Ames, Senior VP for Technology, In-Q-Tel • Dr. George Strawn, Director, National Coordination Office, Networking and Information Technology Research and Development (NITRD) Program, OSTP, White House http://semanticommunity.info/AOL_Government/Big_Data_and_the_Government_Enterprise#Story
Big Data Innovation Conference http://analytics.theiegroup.com/bigdata-boston
Big Data Innovation Data Science http://semanticommunity.info/AOL_Government/Big_Data_Innovation#Story-Pre-Summit
Big Data Innovation Dashboard https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?IEGroup-BigDataInnovation-Spotfire
Panève’s ZettaLeaf & ZettaTree Products • Scalable single level storage • Panève’s scalable single level storage model collapses the server, network, and storage by removing software and replacing them with memory system primitives. This eliminates all network and network-processing overhead associated with accessing storage and delivers a 10,000X increase in raw performance. http://semanticommunity.info/@api/deki/files/19353/exec_summary_20120916.pdf
Big Data in Memory Innovation Story • Met Jef Sharp, President, Panève: • Amazingly fast access and massive storage – Big Data Supercomputer on My Mobile Device • Johns Hopkins University – Blackbook (CIA Cloud) • I suggested: • Greylock Partners - #2 Data Scientist in the World (DJ Patil, Entrepreneur-in-Residence who built the first formal data science team at LinkedIn) • Works for In-Q-Tel (Robert Ames, Senior VP for Technology, In-Q-Tel) • Works for CIA (Gus Hunt, CTO, CIA) • Who Wants Big Data Supercomputer on Mobile Devices
Big Data Innovation Conference Book http://www-01.ibm.com/software/info/rte/bdig/bdwa-7-post.html
Understanding Big Data • Understanding Big Data: • Analytics for Enterprise Class Hadoop and Streaming Data • This book is about big data: Big Data is a Big Deal! • Big data is going to change the way you do things in the future, how you gain insight, and how you make decisions (the change isn’t going to be a replacement, but rather a synergy and extension). This book is meant to help you get up to speed quickly on this technology and to show you the unique things IBM is doing to turn the freely available open source big data technology into a big data platform. There’s a major difference: the platform leverages the open source technologies (never forking them) and marries them to enterprise capabilities provided by a technology leader that understands the benefits a platform can provide. • By the time you are done reading this book, you’ll have a good handle on the big data opportunity that lies ahead, a better understanding of the requirements that ensure you have the right big data platform (as opposed to just technology), and a strong foundational knowledge of the business opportunities that lie ahead with big data and some of the technologies available. http://semanticommunity.info/@api/deki/files/19341/IML14296USEN.pdf
YARC Data Solutions & Products • Graph analytics at work: finding needles in a needle stack: • Many Big Data problems are about searching for things you know you want to find. It's challenging because the volumes of data make it like searching for a needle in a haystack. But it's easy because a needle and a piece of hay, though similar, do not look exactly alike. • But discovery problems are about finding what you don't know. Imagine trying to find a needle in a stack of needles. That's even harder. How can you find the right needle if you don't know what it looks like? How can you discover something new if you don't know what you're looking for? • In order to find the unknown, you often have to know the right question to ask. It takes time and effort to ask every question and you keep learning as you continue to ask questions. uRiKA dramatically shortens this cycle. In the same amount of time it took you to ask one question, we enable you to ask a thousand questions, making it more likely that you'll discover the answer that gives you a "uRiKA" moment - and helps you gain competitive advantage. • uRiKA specializes in discovering the unknown, the unpredicted - and completely unexpected. Learn how customers spanning government, financial services and healthcare organizations are able to find needles in needle stacks that change the balance in their favor. • YarcData's uRiKA: Big Data appliance for real time graph analytics (512 terabytes in memory). http://www.yarcdata.com/solutions.html & http://www.yarcdata.com/products.html
$100,000 YarcData Big Data Graph Analytics Challenge • The YarcData Big Data Graph Analytics Challenge will recognize the best submissions of un-partitionable, big data graph problems. The Challenge is open until October 31, 2012, and will award prizes ranging from $3,000 to $70,000. http://www.yarcdata.com/graph-analytic-challenge.html
MITRE Big Data Analytics http://www.mitre.org/work/tech_papers/2012/12_0076/12_0076.pdf http://www.mitre.org/news/digest/advanced_research/06_12/data_analytics.html
October 4th ACT-IAC Big Data Forum! • Questions: • Do you have a Big Data challenge? • Do you face compliance issues from Big Data implementation? • How do you ensure a usable result from your Big Data Project? • Do you want to discuss these and other Big Data issues with more than 20 top level decision makers and thought leaders from government, including the Department of Defense, Homeland Security, and Energy, industry and academia as they explore ways to address the ever expanding challenges surrounding the processing and analysis of Big Data? • Keynote Speaker: • Dr. George Strawn, Director, National Coordination Office, Networking and Information Technology Research and Development (NITRD) Program, OSTP, White House • Venue: Grand Hyatt Washington, 1000 H Street, NW, Washington, DC 20001 • Time: 12:30 – 6 p.m. • MY NOTE: CANCELLED – WORKING TO DO AT 15th CONFERENCE (APRIL 2nd) BASED ON JANUARY 24, 2013, PRESENTATION TO THE FEDERAL BIG DATA SENIOR STEERING GROUP (ALSO DECEMBER 12th BIG DATA PART II). http://semanticommunity.info/Federal_SOA/14th_SOA_for_E-Government_Conference_October_2_2012#Big_Data_Fall_Forum_2012
Innovation by Our Data Science Team • Members: • See below (and anyone else that would like to join us) • Presentations: • Semantic Information Integration within the Healthcare Sector – Eric Little, Orbis Technologies • Using Semantic Medline on the New Cray Graph Computer for Medical Research – Victor Pollara, Noblis • Panel Discussion: Big Data and the Government Enterprise • Kate Goodier (Moderator, IC), and *Dr. George Strawn (OSTP/NITRD/NCO), Dr. Eric Little (Orbis Technologies), Dr. Victor Pollara (Noblis), *Steve Reinhardt (Cray), and Dr. Tom Rindflesch (NLM) • Please Think of Questions * Note: Gadi Ben-Yehuda replaces Dr. George Strawn, and Mark Guiton replaces Steve Reinhardt.
Eric Little • Eric Little is currently Director of Information Management at Orbis Technologies, Inc., in Orlando, FL. He received a Ph.D. in Philosophy and Cognitive Science in 2002 from the University at Buffalo, State University of New York. He then held a Post-Doctoral Fellowship in the University at Buffalo’s Department of Industrial Engineering, developing ontologies for multisource information fusion applications (2002-04). Dr. Little then worked for several years as Assistant Professor of Doctoral Studies in Health Policy & Education and Director of the Center for Ontology and Interdisciplinary Studies at D'Youville College, Buffalo, NY (2004-2009). He left academia in 2009 to work as Chief Knowledge Engineer at the Computer Task Group (CTG) before joining Orbis.
Victor Pollara • Dr. Pollara is a Senior Principal Scientist at Noblis in the Health Innovation mission area. He applies several decades of experience in theoretical computer science, bioinformatics, knowledge extraction from text, and algorithm design to develop computational solutions for complex, data-driven problems. His current work is focused on applying formal modeling and semantic technologies to large, heterogeneous data sets and experimenting with Noblis’ Cray XMT2 as a multi-billion triplestore server.
Kate Goodier • Ms. Goodier is a senior engineering consultant for the STRATIS division of L-3 Communications. She has more than 20 years of experience in technical program management and systems development team leadership for both industry and the intelligence community. In addition to technical program management and management support, she has extensive systems engineering and integration experience in large ACAT I programs. She maintains sponsored accounts in the Joint Requirements Oversight Council (JROC) and other knowledge-bases. Ms. Goodier was the fifth employee hired at the Center for Information Protection for Dept. of Treasury, FBI, and CIA. She was recognized by the Federal Enterprise Architecture (FEA) Program Management Office (PMO) as an expert in system Data Engineering and developed the Data Reference Model (DRM) version 1.5 Data Description guidance for the FEA. She is a member of the Scientific Committee for the Semantic Technologies in Intelligence, Defense and Security community.
Gadi Ben-Yehuda • Gadi Ben-Yehuda is the Director of Innovation and Social Media for The Center for the Business of Government. • Mr. Ben-Yehuda has worked on the Web since 1994, when he received an email from Maya Angelou through his first Web site. He has an MFA in poetry from American University, has taught writing at Howard University, and has worked in Washington, DC, for nonprofits, lobbying organizations, Fleishman-Hillard Global Communications, and Al Gore's presidential campaign. • Prior to his current position, Gadi was a Web Strategist for the District of Columbia's Office of the Chief Technology Officer (OCTO). Additionally, Gadi has taught creative, expository, and Web writing for more than 10 years to university students, private-sector professionals, and soldiers, including Marines at the Barracks at 8th and I in Washington, DC. (The lattermost by far the most disciplined.) • You can follow Gadi on Twitter, read his columns on Huffington Post, see his posts on GovLoop, and read his blog entries on the IBM Center for the Business of Government site.
Mark Guiton • Mark Guiton serves as Director, Government Relations, responsible for working with federal executive and legislative branch officials on a variety of program, policy and procurement issues as they relate to advanced computing. Prior to joining Cray, Mr. Guiton served as legislative director in the U.S. Congress from 1999 to 2003 with a focus on appropriations and technology matters. From 1995 to 1998, he served as a technology policy advisor working closely with the House Government Management, Information and Technology subcommittee. Before working in Congress, he was a computer programmer/analyst for Shared Medical Systems Corporation (now Siemens). Mr. Guiton received a B.S. in computer science with a concentration in electrical engineering from the University of Scranton, Pennsylvania.
Tom Rindflesch • Thomas C. Rindflesch has a Ph.D. in linguistics from the University of Minnesota and conducts research in natural language processing at the National Library of Medicine. He leads a research group focused on exploiting the Library’s resources to support development of advanced information management technologies in the biomedical domain.
BIG DATA at the Hill • My three suggestions: • What Congress Should Do to Help Big Data • Allow access to confidential data like the Census Data Centers • Allow sharing between statistical agencies • Have a Chief Data Officer that promotes a Federal Data Science Community of Data Scientists and Statisticians • The Federal Government Should First Focus on the Value of Big Data • Hadoop Projects are costing 50 times more than expected • DHS's Big Data in the Cloud Project failed, but it failed fast and at less cost • Semantic Medline on the Cray Graph Computer is an example of a Federal Data Science Team Project with Value • The Federal Government Should Foster Real Innovation with Government Data • Encourage private industry to add value to government data • Consider having the Federal Government's Chief Statistician be the Chief Data Officer • Empower the Government's Data Scientists and Statisticians to Analyze Big Data and Statistical Data http://semanticommunity.info/AOL_Government/BIG_DATA_at_the_Hill