On Big Data. Patricia Florissi, Ph.D., VP - Americas & EMEA CTO. April 2012.
IN 2010 THE DIGITAL UNIVERSE WAS 1.2 ZETTABYTES: 1,200,000,000,000,000,000,000 bytes. Zetta, Exa, Peta, Tera, Giga, Mega, Kilo, Byte. Source: 2010 IDC Digital Universe Study
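The byte count on this slide follows from the decimal (SI) prefix ladder it lists, where each step is a factor of 1,000. A minimal sketch of that conversion, assuming decimal rather than binary prefixes; the helper name `to_bytes` is made up for illustration:

```python
from decimal import Decimal

# SI (decimal) prefixes named on the slide, smallest to largest.
# Each step up the ladder multiplies by 1,000.
PREFIXES = ["Kilo", "Mega", "Giga", "Tera", "Peta", "Exa", "Zetta"]


def to_bytes(value: str, prefix: str) -> int:
    """Convert a decimal string such as "1.2" in the given prefix to bytes.

    Decimal is used instead of float so the result is exact.
    """
    exponent = 3 * (PREFIXES.index(prefix) + 1)  # Kilo -> 3, ..., Zetta -> 21
    return int(Decimal(value) * 10 ** exponent)


digital_universe_2010 = to_bytes("1.2", "Zetta")
print(f"{digital_universe_2010:,}")  # 1,200,000,000,000,000,000,000
```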
The Data Deluge This Decade. INFORMATION GROWING: 2009: 0.8 Zettabytes; 2020: 35.2 ZB, 44 TIMES LARGER. WORLDWIDE IT STAFFING WILL GROW BY LESS THAN 50%. Source: IDC Digital Universe Study, sponsored by EMC, 2011
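The slide gives only two endpoints (0.8 ZB in 2009, 35.2 ZB in 2020) and a cap on staffing growth. The sketch below checks the 44x figure and derives two numbers the slide does not state: an implied yearly growth rate, assuming growth compounds at a constant rate, and the minimum increase in data per IT staff member, assuming staffing grows by the full 50%. Both are illustrative assumptions, not IDC figures.

```python
# Endpoints taken from the slide.
size_2009_zb = 0.8
growth_factor = 44
years = 2020 - 2009  # 11 years

# 0.8 ZB growing 44-fold reproduces the 35.2 ZB on the slide.
size_2020_zb = size_2009_zb * growth_factor

# Assumption: constant compound annual growth between the two endpoints.
annual_growth = growth_factor ** (1 / years) - 1

# Assumption: staffing grows by the full 50% cap, which gives a lower
# bound on how much more data each staff member has to handle.
staffing_growth_cap = 0.50
data_per_staff_factor = growth_factor / (1 + staffing_growth_cap)

print(f"2020 size: {size_2020_zb:.1f} ZB")
print(f"Implied annual growth: {annual_growth:.0%}")
print(f"Data per IT staff member: at least {data_per_staff_factor:.0f}x")
```

Under these assumptions the data volume grows by roughly 41% per year, and each IT staff member ends the decade responsible for at least about 29 times more data.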
From Information Deluge To Big Data Agenda • How Did We Get Here? • What Is Big Data Anyway? • Does Big Data Matter? • Who Said It Mattered To Brazil and the USA?
Waves Of Change. You Are Here! Networked/Distributed Computing, PC/Microprocessor, Minicomputer, Mainframe
Dramatic x86 Performance Growth 2000% Performance Increase Since 2005 Xeon E7-4800 Ten Core 32nm Xeon 7500 Eight Core 45nm Xeon 7400 Six Core 45nm Xeon 7300 Quad Core 65nm Xeon 7100 Dual Core 65nm Xeon 7040 Dual Core 90nm Xeon 3.66 GHz Single Core 90nm Source: Intel internal OLTP database workload performance estimates as of 15 April 2011. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.
Dominant Market Share For x86 x86 As A Percent Of Worldwide Server Shipments 1989 1995 2000 2005 2010 Unit Share Revenue Share Source: IDC
Flash Fills The Performance Gap Processors 1,000,000,000s DRAM Memory 100,000,000s OPERATIONS PER SECOND Flash 100,000s Hard Disk 100s Picoseconds Nanoseconds Microseconds Milliseconds Seconds LATENCY
Then Came Virtualization Old World – Physical New World – Virtual Dynamic Pools Of Compute & Storage Dedicated, Vertical Stacks
Waves Of Change. Cloud Computing, Networked/Distributed Computing, PC/Microprocessor, Minicomputer, Mainframe
What’s Driving The Data Deluge? Video Rendering, Video Surveillance, Social Media, Mobile Sensors, Gene Sequencing, Medical Imaging, Geophysical Exploration, Smart Grids. COST TO SEQUENCE ONE GENOME HAS FALLEN FROM $100M IN 2001 TO $10K IN 2011. READING METERS EVERY 15 MINUTES IS 3,000X MORE DATA INTENSIVE. FACEBOOK GROWS BY 250 MILLION PHOTOS / DAY
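Two of the figures on this slide can be sanity-checked with simple arithmetic. The slide does not say what the 3,000x meter-reading figure is compared against; the sketch below assumes a baseline of one reading per 30-day month, which is a guess and is labeled as such in the code.

```python
# Smart meters: one reading every 15 minutes.
readings_per_hour = 60 // 15
readings_per_month = readings_per_hour * 24 * 30  # assumes a 30-day month

# Assumption (not stated on the slide): the baseline is a single
# manual reading per month.
baseline_readings_per_month = 1
meter_data_factor = readings_per_month / baseline_readings_per_month

# Genome sequencing: cost figures quoted on the slide.
cost_2001_usd = 100_000_000
cost_2011_usd = 10_000
cost_reduction_factor = cost_2001_usd // cost_2011_usd

print(f"Meter readings per month: {readings_per_month}")
print(f"Sequencing cost reduction: {cost_reduction_factor:,}x")
```

With a monthly baseline the 15-minute schedule yields 2,880 readings per month, which is consistent with the slide's rounded 3,000x. The sequencing cost falls by a factor of 10,000.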
Big Data Refers To… • All Data that comes at high Volume • All Data that comes at high Velocity • All Data that comes from a Variety of Sources • All Data that brings Complexity • All Data that challenges existing Information Infrastructure Capabilities • All Data that makes us “Think Different” Today
Big Data Is A Relative Concept What is Big Today… May Not Be So Big Tomorrow….
Data Sources Are Expanding INFORMATION IN THE ENTERPRISE WILL • GROW 50X • IN THE NEXT 10 YEARS Source: 2011 IDC Digital Universe Study
Big Data Applications Structured Data Unstructured Data Semi-Structured Data Telco Billing Retail POS Sales Forecast Gene Sequencing Movie Editing Seismic Study Social Media Clickstream Productivity Web Content Storage Services Social Media Clickstream Productivity Hybrid Cloud
The Complexity Of Big Data Structured Data Unstructured Data Semi-Structured Data Telco Billing Retail POS Sales Forecast Gene Sequencing Movie Editing Seismic Study Social Media Clickstream Productivity Web Content Storage Services • The service could not have been better • The service could have been better • The service could have been better, even if they were dead
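The three customer comments on this slide differ by only a few words yet carry very different meanings, which is why unstructured text is hard to analyze. A minimal sketch of a keyword-based sentiment scorer shows how easily a simple approach misreads them. The word lists and the function `naive_sentiment` are hypothetical, made up for illustration, and are not part of any product in the presentation.

```python
# Hypothetical word lists for a deliberately naive scorer.
POSITIVE = {"better", "good", "great"}
NEGATIONS = {"not", "never", "no"}


def naive_sentiment(text: str) -> int:
    """Count positive keywords; flip the sign if any negation word appears."""
    words = text.lower().replace(",", " ").split()
    score = sum(1 for word in words if word in POSITIVE)
    if any(word in NEGATIONS for word in words):
        score = -score
    return score


comments = [
    "The service could not have been better",
    "The service could have been better",
    "The service could have been better, even if they were dead",
]
for comment in comments:
    print(naive_sentiment(comment), comment)
```

The scorer rates the first comment, which is the strongest praise, as negative, and rates the two complaints as positive. Getting all three right requires understanding the idiom and the sarcasm, not just the keywords.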
90% OF THE DIGITAL UNIVERSE IS UNSTRUCTURED Source: 2011 IDC Digital Universe Study
Massive Numbers Of Massive Files Files In The Digital Universe Big Data Applications 500 Quadrillion 5+ TB Source: 2011 IDC Digital Universe Study, EMC Customers
Record File System IO Performance Single File System 1,100,000+ 636,036 403,326 190,675
Record File System Capacity Single File System 15 PB 2 PB 100 TB 64 TB Source: Vendor Product Specifications
Big Data Is About Predictive Analytics
Old Analytic Processes Administrator Bottleneck Reactive, Unresponsive Opaque, No Collaboration
New Analytic Processes Are Different Self-Service Iterative, Agile Transparent, Real Time Collaboration
How Can Big Data Transform Your Business? New Source of Customer, Product and Operational Insights Today’s Decision-making Big Data Decision-making “Forward-looking” insight Exploit all data from diverse sources Real-time, correlated “Rearview Mirror” hindsight Less than 10% of available data Incomplete, disjointed, inaccurate
Big Data Apps Require Big Data Analytics Your Approach To Business Analytics Must Change Limited, Pre-Defined Expansive, Iterative Slow & Reactive Limited Insight Risky Shadow Repositories Agile & Proactive Expanded Insight & Correlation Improved Compliance
The White House Big Data R&D Initiative “The initiative we are launching today promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security.”
White House Big Data Call To Action March 29th, 2012 To Action To Knowledge From Data
CLOUD TRANSFORMS IT BIG DATA TRANSFORMS BUSINESS
Big Data At Every Single Step Seismic: Pre-stack Velocity Data Interpretation Geologic Model Navigation Culture Data Log Curves Pressure Data Seismic: Post-stack
Oil & Gas Going Through Severe Levels of Complexity Thought Leadership Bigger data at Higher quantities analyzed More scientifically moving at Wider distances needing to be Better managed accessible over Longer lifecycles of time Innovation Trends: • Higher precision images • More measures, more often, more places • More iterations • Longer production periods • More scarcity of even more specialized skills • Greater collaboration and automation • More analytical processes over longer
EMC Enables Next Gen Upstream Upstream Explore Develop Produce • Seismic Acquisition and Processing • Seismic & Geological Interpretation • Reservoir Modeling • Reservoir Simulation • Reservoir Management Pre-stacked Data Post-stacked Data Interpreted Image Modeled Image Simulation Image Reservoir Image
EMC Enables Next Gen Interpretation • Seismic & Geological Interpretation • SCIENTIFIC WORKFLOW MANAGEMENT
EMC Defines Next Next Gen Upstream Explore Develop Produce
EMC Next Next Gen Big Data Science Explore Develop Produce
Brazil Develops A Robust Ecosystem “Intellectual growth should commence at birth and cease only at death.” UFRJ Campus Technology Park
EMC to open an R&D Center in the University Technology Park in Rio Co-located on the University Campus with Petrobras R&D (CENPES) Neighbor to Schlumberger, Landmark/Halliburton & GE 30+ Big Data Scientists Collaborating with 50 others on Campus Partnering with Intel & Cisco Research on Oil & Gas Acquisition, Analysis, Collaboration & Visualization of Seismic Data EMC’s Expansion in Brazil Petrobras R&D Expansion Technology Park UFRJ Campus