Big Data Santi Apichairojkul System Consultant
• 35 ZB: By 2020, the digital universe will be 44 times larger than it was in 2009
• 61%: Of executives surveyed want more information when making a decision
• 5-6%: Productivity boost realized by companies that use data-directed decision-making
• 83%: Of mid-market CIOs surveyed identified analytics as their top-priority investment area
• 2X: IDC estimates that the digital universe will double every 18 months
• #1: Ranking of Analytics & BI in Gartner CIO survey of top technology priorities for 2012
What’s Big Data? Confidential
Big Data?
• Hardware and software technologies for managing big volumes of data
• Datasets whose size is beyond the ability of typical database software tools
• Focus on Web 2.0 Technologies
• Database Scale-out
• Relational Data Analytics
• Distributed Data Analytics
• Distributed File Systems
• Real Time Analytics
What’s Big Data?
• Volume: A large amount of data, growing at large rates
• Variety: The range of data, types and structure to the data
• Velocity: The speed at which the data must be processed and a decision made
The ‘Big Data’ Phenomenon
Big Data Drivers
• The proliferation of data capture and creation technologies
• Increased “interconnectedness” drives consumption (creating more data)
• Inexpensive storage makes it possible to keep more, longer
• Innovative software and analysis tools turn data into information
[Diagram labels: More Consumption, More Devices, New & Better Information, More Content]
• Every gigabyte of stored content can generate a petabyte or more of transient data*
• The information about you is much greater than the information you create
Big Data encompasses not only the content itself, but how it’s consumed
*Source: IDC 2011
Big Data Solution Requirements
• Cost-effectively manage the volume, variety and velocity of data
• Process and analyze large, complex data sets…quickly
• Flexibly adapt to context changes and new data types
• Big Data Retention Solutions
• Big Data Analytics Solutions
Dell Big Data Retention Solutions
Big Data Retention Solution
The data sources and tools
Reduce
• Size: Massive patented de-dupe and compression, typically 95-97% storage capacity savings
• Hardware: Low-cost Dell servers and storage
• Resources: Eliminates requirements for specialized skillsets, infrastructure, and services
Retain
• Preserve: Maintains record volumes in original format
• Immutable: Tamper-proof WORM & audit trails
• Configurable: User-configurable retention policies
• Massively Scalable: With no complexity
• Longevity: Long-term optimized systems
Retrieve
• Standards: SQL & BI tools via ODBC/JDBC
• Performant: Fast queries for large complex datasets
• Flexible: With schema evolution & point-in-time access
The Dell Big Data Retention Solution
RainStor Leads Industry with 40X Compression
[Bar chart: compression ratio vs. raw data for Flatfile Gzip, Hadoop LZO, Compressed Relational, Columnar, and RainStor; bars labeled 3X, 6X, 7X, 8X, and 40X (RainStor)]
Source: Ratios vs. Raw – RainStor Benchmarks using customer data (2011)
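The 40X headline and the 95-97% storage savings quoted for the retention solution describe the same quantity in two ways. The sketch below converts between a compression ratio and the fraction of storage saved. The function names are illustrative, not part of any vendor tool.

```python
def savings_from_ratio(ratio: float) -> float:
    """Fraction of storage saved when data is compressed ratio:1."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 - 1.0 / ratio


def ratio_from_savings(savings: float) -> float:
    """Compression ratio implied by a fractional storage saving."""
    if not 0 <= savings < 1:
        raise ValueError("savings must be in [0, 1)")
    return 1.0 / (1.0 - savings)


# Ratios taken from the chart labels above.
for label, ratio in [("3X", 3), ("8X", 8), ("40X", 40)]:
    print(f"{label}: {savings_from_ratio(ratio):.1%} of raw capacity saved")

# The "typically 95-97%" savings range expressed as ratios.
for savings in (0.95, 0.97):
    print(f"{savings:.0%} saved is about {ratio_from_savings(savings):.0f}X")
```

Read this way, a 40X ratio corresponds to 97.5% savings, and the typical 95-97% range corresponds to roughly 20X to 33X.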
Big Data Analytics Solutions
What is Apache Hadoop?
Hadoop is a platform for data storage and processing that is…
• Scalable
• Fault tolerant
• Open source
CORE HADOOP COMPONENTS
• Hadoop Distributed File System (HDFS): File sharing and data protection across physical servers
• MapReduce: Distributed computing across physical servers
Scales economically
• Can be deployed on commodity hardware
• Open source platform guards against vendor lock-in
Excels at complex analysis
• Scale-out architecture divides workloads across multiple nodes
• Flexible file system eliminates ETL bottlenecks
Consolidates everything
• A single repository for storing and mining any type of data
• Not bound by a single schema
Distributed File System (DFS)
Traditional
• Black Box
• Big Iron
• Big Disk
Distributed File System (DFS)
• General-purpose, standards-based servers, storage, networking
• Software that easily scales processing to 1000s of cores/systems
DFS - Architecture
MPP (Massively Parallel Processing) Shared-Nothing Architecture
• SQL, MapReduce
• Master Servers: Query planning & dispatch
• Network Interconnect
• Segment Servers: Query processing & data storage
• External Sources: Loading, streaming, etc.
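The division of labor on this slide (master servers plan and dispatch, segment servers process and store) can be illustrated with a small in-process sketch. The `Master` and `Segment` classes are hypothetical stand-ins, threads take the place of the network interconnect, and the round-robin loader is an assumption. None of this describes a specific product.

```python
from concurrent.futures import ThreadPoolExecutor


class Segment:
    """Hypothetical segment server: owns its own rows and shares nothing."""

    def __init__(self):
        self.rows = []  # list of (region, amount) tuples

    def partial_sum(self, region):
        # Query processing happens next to the data this segment stores.
        matching = [amount for r, amount in self.rows if r == region]
        return sum(matching), len(matching)


class Master:
    """Hypothetical master server: dispatches work and merges partial results."""

    def __init__(self, n_segments):
        self.segments = [Segment() for _ in range(n_segments)]

    def load(self, rows):
        # Assumed round-robin distribution of rows across segments.
        for i, row in enumerate(rows):
            self.segments[i % len(self.segments)].rows.append(row)

    def average(self, region):
        # Dispatch the same sub-query to every segment in parallel.
        with ThreadPoolExecutor(max_workers=len(self.segments)) as pool:
            partials = list(pool.map(lambda s: s.partial_sum(region), self.segments))
        total = sum(p[0] for p in partials)
        count = sum(p[1] for p in partials)
        return total / count if count else None


master = Master(n_segments=3)
master.load([("east", 10.0), ("west", 4.0), ("east", 20.0),
             ("east", 30.0), ("west", 6.0)])
print(master.average("east"))
```

Each segment returns only a (sum, count) pair, so the master merges small partial results instead of pulling raw rows across the interconnect.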
MapReduce
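As a rough illustration of the MapReduce programming model, here is a single-process word count in plain Python. It is a sketch of the map, shuffle, and reduce steps only. It does not use the Hadoop API, and the function names are chosen for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


counts = word_count(["big data big", "data tools"])
print(counts)
```

In a real cluster the map and reduce calls would run on different servers, with the framework handling the shuffle between them. Here everything runs in one process to keep the data flow visible.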
HDFS & MapReduce - Briefly
Hadoop in Production
Dell Apache Hadoop Solution
• Petabyte-scale data management: an open source distributed file system and a computational processing engine called MapReduce for highly scalable data management.
• For: Financial, research institutions, retail, media & entertainment, telecom, government, and health and life sciences
• Benefits:
• Reliable, scalable, low-cost file storage
• Rapid parallel processing of big data
• Complements existing data management systems
[Diagram: Joint Services & Support + Cluster-optimized PowerEdge C (6248sw, 4 x C2100)]
Dell Cloud Solutions
Revolution R Enterprise | Big Data Analysis
What does big data mean to you?
• How will you handle your big data?
• How do you plan to use analytics in your business?
• Are you considering adding analytics to the services you offer your customers?
• Who are the decision makers and end users of your big data, BI, and/or analytics?
• How are you storing your Big Data?
The power to do more