NFAIS 2012 Annual Conference: The Outlook for Big Data. Chris Greer, Information Technology Laboratory, National Institute of Standards and Technology
Article I, Section 8: “The Congress shall have the power to … fix the standard of weights and measures.” Mission: To promote U.S. innovation and industrial competitiveness by advancing measurement science, standards, and technology in ways that enhance economic security and improve our quality of life. • National Bureau of Standards established by Congress in 1901 • Designated the National Institute of Standards and Technology in 1988
• IT Measurement and Testing • Mathematical and Statistical Analyses for Measurement Science • Technology Development • Modeling and Simulation for Measurement Science • IT Standards Development and Deployment
Big Data - Definition • Data mass: volume, velocity, and/or complexity • Data-enabled analytics: correlation and inference analyses enabled by data mass. Big data is data mass and/or data analytics beyond the capacity of your current system.
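The “correlation analyses” in this definition rest on measures such as the Pearson correlation coefficient. Here is a minimal sketch in Python using only the standard library. The function name `pearson_r` and the sample values are hypothetical and serve only as illustration:

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two variance sums (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example data, invented for illustration only.
visits = [120, 340, 560, 800, 1010]
sales = [11, 30, 52, 79, 98]
print(round(pearson_r(visits, sales), 3))
```

A value near +1 or -1 signals a strong linear relationship but says nothing about cause. Larger samples make the estimate of r more stable, which is one reason data mass enables these analyses.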
Big Data - Volume [Chart: worldwide information vs. available storage, in petabytes] Source: John Gantz, IDC Corporation, The Expanding Digital Universe
Big Data - Volume Source: IDC Corporation, Worldwide Information Growth Ticker, Feb 2012
Big Data - Velocity LSST: “Suspended between its vast mirrors will be a three billion-pixel sensor array, which on a clear winter night will produce 30 terabytes of data. In less than a week this remarkable telescope will map the whole night sky …. And then the next week it will do the same again … building up a database of billions of objects and millions of billions of bytes.” (Nature 440:383) • Sloan Digital Sky Survey: 140 terabytes, 2000 to present • LSST (Large Synoptic Survey Telescope): expected 140 terabytes every 5 days • Square Kilometer Array: expected 140 terabytes every 3 seconds
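The volumes on this slide can be converted into sustained data rates. This back-of-the-envelope sketch in Python assumes decimal terabytes (10^12 bytes) and data arriving evenly over each period. The helper name `bytes_per_second` is hypothetical:

```python
# Assumption: "terabyte" is the decimal unit, 10**12 bytes.
TB = 10**12
SECONDS_PER_DAY = 86_400


def bytes_per_second(volume_bytes: float, period_seconds: float) -> float:
    """Average rate needed to produce volume_bytes every period_seconds."""
    return volume_bytes / period_seconds


lsst_rate = bytes_per_second(140 * TB, 5 * SECONDS_PER_DAY)  # 140 TB every 5 days
ska_rate = bytes_per_second(140 * TB, 3)                      # 140 TB every 3 seconds

print(f"LSST: {lsst_rate / 10**6:,.0f} MB/s")
print(f"SKA:  {ska_rate / TB:,.1f} TB/s")
print(f"SKA/LSST ratio: {ska_rate / lsst_rate:,.0f}")
```

Under these assumptions, LSST averages about 324 MB/s. The Square Kilometer Array figure works out to about 46.7 TB/s, roughly 144,000 times faster.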
Big Data - Complexity Combining Structured and Unstructured Data
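One common way to combine the two kinds of data is to extract simple features from free text and then join them to structured records by a shared key. Here is a minimal Python sketch. The records, notes, keyword list and `tag_samples` helper are all hypothetical. Plain keyword matching stands in for the far richer text analytics used in practice:

```python
import re
from collections import Counter

# Hypothetical structured records (e.g., rows from a database table).
samples = [
    {"id": "S1", "site": "north", "yield_kg": 12.5},
    {"id": "S2", "site": "south", "yield_kg": 9.1},
    {"id": "S3", "site": "north", "yield_kg": 14.0},
]

# Hypothetical unstructured field notes that mention the same sample IDs.
notes = """
S1 looked healthy after rain. S2 showed drought stress.
Revisited S2: drought stress worse. S3 healthy.
"""

KEYWORDS = ("healthy", "drought")


def tag_samples(text: str, ids: list[str]) -> dict[str, Counter]:
    """Count keyword mentions in each sentence that names a sample ID."""
    tags = {sid: Counter() for sid in ids}
    for sentence in re.split(r"[.!?]\s*", text):
        for sid in ids:
            if re.search(rf"\b{re.escape(sid)}\b", sentence):
                for kw in KEYWORDS:
                    tags[sid][kw] += len(re.findall(rf"\b{kw}\b", sentence, re.IGNORECASE))
    return tags


tags = tag_samples(notes, [s["id"] for s in samples])
# Join step: attach the text-derived counts to each structured record.
combined = [{**s, "mentions": dict(tags[s["id"]])} for s in samples]
for row in combined:
    print(row)
```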
Big Data – Volume, Velocity, and Complexity The Department of Defense’s ARPANET project, launched in 1966 to explore methods for “resource sharing among computers”, initially connected 4 nodes. Today’s Internet links more than 2.2 billion users over more than 200,000 networks worldwide, with 14 new users added every second.
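As a quick arithmetic check of the growth rate on this slide, the sketch below assumes a constant 14 new users per second over a 365-day year:

```python
# Assumptions: the rate stays constant at 14 users/s; a year has 365 days.
USERS_PER_SECOND = 14
SECONDS_PER_DAY = 24 * 60 * 60

per_day = USERS_PER_SECOND * SECONDS_PER_DAY
per_year = per_day * 365

print(f"{per_day:,} new users per day")
print(f"{per_year:,} new users per year")
```

That comes to about 1.2 million new users a day, or roughly 441 million a year. This is about a fifth of the 2.2 billion users cited.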
Big Data - Analytics Writing in a recent issue of the journal Science, Hod Lipson and Michael Schmidt describe how they programmed a computer to take unstructured and imperfect lab measurements from swinging pendulums and mechanical oscillators and, with just the slightest initial direction and no knowledge of physics, mechanics, or geometry, derive equations representing fundamental laws of nature. Source: Gary Anthes, Communications of the ACM, Vol. 52, No. 11
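The system described searches a space of candidate equations. This sketch does not reproduce that method. Instead, it swaps in a far simpler technique to show the basic idea of recovering a law from noisy measurements. It assumes the law has the form T = a·L^b and fits a and b by least squares in log-log space. The pendulum data are synthetic, generated from the small-angle formula with 1% noise. The value g = 9.81 m/s² and all function names are assumptions for illustration:

```python
import math
import random

G = 9.81  # m/s^2, standard gravity (assumed)


def simulate_periods(lengths, noise=0.01, seed=0):
    """Synthetic 'lab measurements': ideal small-angle pendulum periods with
    multiplicative Gaussian noise. Illustrative data, not real measurements."""
    rng = random.Random(seed)
    return [2 * math.pi * math.sqrt(L / G) * (1 + rng.gauss(0, noise)) for L in lengths]


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b as a straight line in log-log space.
    Returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b


lengths = [0.25, 0.5, 1.0, 1.5, 2.0, 3.0]  # metres
periods = simulate_periods(lengths)
a, b = fit_power_law(lengths, periods)
print(f"T ≈ {a:.2f} * L^{b:.3f}")  # theory: a = 2*pi/sqrt(g) ≈ 2.01, b = 0.5
```

The fitted exponent lands close to 0.5, recovering the square-root dependence of period on length. The search, however, only covers the single power-law form that was assumed up front.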
Big Data - Analytics “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”: “Google's founding philosophy is that we don't know why this page is better than that one: If the statistics … say it is, that's good enough. No semantic or causal analysis is required. That's why Google can translate languages without actually ‘knowing’ them (given equal corpus data, Google can translate Klingon into Farsi as easily as it can translate French into German).” (Chris Anderson, Wired Magazine, 06.23.08)
Recommendations: • Design and organize for data agility • Treat data as assets
Design and Organize for Data Agility Over the next decade, the number of servers (virtual and physical) worldwide will grow by a factor of 10, the amount of information managed by enterprise [and cloud] datacenters will grow by a factor of 50, and the number of files the datacenter will have to deal with will grow by a factor of 75, at least. J. Gantz and D. Reinsel, Extracting Value from Chaos, IDC Corp., June 2011
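Growth factors over a decade are easier to compare as implied annual rates. This short Python sketch assumes that "decade" means exactly 10 years of steady compound growth. The helper name `implied_annual_growth` is hypothetical:

```python
# Implied compound annual growth rate for a total growth factor over a
# number of years: rate = factor ** (1 / years) - 1.
def implied_annual_growth(factor: float, years: int = 10) -> float:
    return factor ** (1 / years) - 1


for label, factor in [("servers", 10), ("information", 50), ("files", 75)]:
    print(f"{label:12s} x{factor:<3d} over 10 years -> {implied_annual_growth(factor):.1%} per year")
```

This gives roughly 26% a year for servers, 48% for information, and 54% for files. The file figure is stated as "at least", so its rate is a lower bound.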
Do the capabilities of your current, in-house IT systems meet the big data needs of your organization? 1. Yes 2. No
Design and Organize for Data Agility The iPlant platform helps researchers use tools and data more easily and efficiently. It provides sustainable access to high performance computing, interoperable software analysis, and large data sets. Source: Frontiers in Plant Science, SA Goff et al., 25 Jul 2011; www.iplantcollaborative.org
Design and Organize for Data Agility I.B.M., seeing an opportunity in data-hunting services, created a Business Analytics and Optimization Services group in April. The unit will tap the expertise of the more than 200 mathematicians, statisticians and other data analysts in its research labs — but that number is not enough. I.B.M. plans to retrain or hire 4,000 more analysts across the company. S. Lohr, New York Times, Aug. 5, 2009
Does your organization employ any mathematicians or statisticians? 1. Yes 2. No
Treat Data as Assets • Organizational Data Policy • Data Management Plans • Risk Management Plans • Designed-in Information Security
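A data management plan can be captured as a structured record, which makes gaps easy to spot. Here is a minimal Python sketch. The `DataManagementPlan` class, its fields (preservation, access and use policies, plus security controls) and the example values are hypothetical and not taken from any particular standard or template:

```python
from dataclasses import dataclass, field


@dataclass
class DataManagementPlan:
    """Hypothetical minimal record of an organizational data management plan.
    Field names are illustrative, not drawn from any standard or template."""
    dataset: str
    owner: str
    preservation_policy: str = ""  # how long and where the data are kept
    access_policy: str = ""        # who may read or obtain the data
    use_policy: str = ""           # permitted uses, licensing, attribution
    security_controls: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Names of required sections that are still empty."""
        required = {
            "preservation_policy": self.preservation_policy,
            "access_policy": self.access_policy,
            "use_policy": self.use_policy,
        }
        missing = [name for name, text in required.items() if not text.strip()]
        if not self.security_controls:
            missing.append("security_controls")
        return missing


plan = DataManagementPlan(
    dataset="survey-2012",            # hypothetical dataset name
    owner="Data Stewardship Office",  # hypothetical owner
    preservation_policy="Retain raw files for 10 years in two geographic locations.",
    access_policy="Open after a 12-month embargo.",
)
print(plan.missing_sections())  # ['use_policy', 'security_controls']
```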
Does your organization have a formal data management plan describing preservation, access, and use policies? 1. Yes 2. No
Thank you! Contact information: chris.greer@nist.gov