Big Data and Analytics on Vblock: Business Decision Maker Deck
Name, February 2014
IT is being fundamentally disrupted: mobility, cloud, big data, new applications, speed, economics, the Internet of Things, team skills, complexity.
Extend our leadership in converged infrastructure to bring big data to the core: VCE helps transform business with big data and analytics on Vblock.
• RFP excerpt: "The following products/services are NOT OPEN for bidding in this RFP: products; any technology that clashes with a current bank strategic platform (MPP database, for example); commodity or appliance hardware." (Source: Request for Proposal, Global 100 Bank, December 13, 2013)
• 2013 is the year of larger-scale adoption of big data: 42% of respondents had invested in big data or were planning to do so within a year. (Source: Doug Laney, "Predicts 2013: Information Innovation," March 2013)
• The Industrial Internet will add $15 trillion to global GDP by 2030. (Source: GE, April 2013)
• Source: McKinsey, "Game changers: Five opportunities for US growth and renewal," July 2013
Did you know? 76% of VCE customers are currently using (35%) or considering using (41%) Vblock for big data and analytics in 2013.
Customers run big data and analytics with mixed workloads on Vblock
[Chart: Vblock System use cases (Source: VCE 2013 Customer Satisfaction & Loyalty Survey)]
• Databases (n=171)
• Private cloud (n=172)
• App dev/test (n=158)
• Microsoft suite (n=156)
• SAP/ERP (n=130)
• Business continuity/disaster recovery (n=160)
• VDI (n=160)
• Big data/analytics (n=118)
• Remote/branch office (n=128)
Why customers are interested in running big data and analytics on virtualization and Vblock Systems
• Run a multi-tenant service model
• Speed deployment
• Rationalize HW/SW costs
• Aggregate workloads
• Increase utilization
• Optimize and standardize infrastructure
• Provision faster
• Lower operational cost
• Improve quality of service and availability
• Secure environment
• Promote innovation
• Pre-engineered, pre-validated, pretested
VCE customer benefits: results of a September 2013 IDC research study
• 4x faster time to deployment (160 days down to 40)
• 5x faster time for new services (25 days down to 5)
• 79% less staff effort
• 96% reduction in downtime
• 50% reduction in annual data center costs
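You can cross-check the 4x and 5x claims against the before/after day counts on the same slide. The short Python sketch below does that arithmetic. The slide does not say which day count belongs to which metric; the pairing here is inferred from the ratios.

```python
# Before/after durations in days, as quoted from the September 2013 IDC study.
# Pairing each figure with a metric is inferred from the ratios (160/40 = 4, 25/5 = 5).
MEASUREMENTS = {
    "time to deployment": (160, 40),
    "time for new services": (25, 5),
}


def speedup(before_days: float, after_days: float) -> float:
    """How many times faster the 'after' process is than the 'before' process."""
    return before_days / after_days


def percent_reduction(before_days: float, after_days: float) -> float:
    """Percentage of the original duration that was eliminated."""
    return 100.0 * (before_days - after_days) / before_days


for metric, (before, after) in MEASUREMENTS.items():
    print(f"{metric}: {speedup(before, after):.0f}x faster, "
          f"{percent_reduction(before, after):.0f}% less elapsed time")
```

A 4x speedup is the same as a 75% cut in elapsed time, and 5x is an 80% cut. This is a useful way to restate the figures for audiences who think in percentages.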
Challenges that cannot be solved by commodity infrastructure
• Unable to impact the business with siloed "big data" projects and shadow IT, away from enterprise IT
• Lack of flexibility in moving to IT-as-a-service: virtualization? multi-tenancy?
• Security and privacy concerns in commodity infrastructure, which cannot run as a service
• Availability and scalability concerns when moving to a production environment
• Not getting value from our big data initiative because of limited maturity
• Unable to decouple server and storage and scale them to business demand
A Gartner survey of 720 members worldwide found that 64 percent of organizations were investing or planning to invest in big data technology in 2013: 30 percent had already invested, 19 percent planned to invest within the next year, and an additional 15 percent planned to invest within two years. (Source: Gartner, September 2013)
Converging your investments to drive value from big data
New approach: embrace existing and new data and applications on an open, converged platform (next generation of IT, private and public)
• Hadoop big data into the core business: evolve analytics with big data for high value
• High availability and performance: a pre-engineered system with best-of-breed technologies
• As-a-service operation: virtualization, multi-tenancy, backup and recovery, and business continuity
• Standardization and flexibility: standardized across data centers, adaptable to logical and physical change
• Incremental deployment: ease of scaling compute and storage, mix-and-match technology components
• Security, governance, and compliance: oversight and visibility over disparate data sources
Vblock™ extended to drive value from big data and analytics
• Architected to drive value from a shared infrastructure model
• Ideal for an as-a-service model with multi-tenancy and virtualization enablement
• Converge data and application investments incrementally
Key capabilities:
• Mission-critical availability
• Enterprise-class data protection
• Open, mixed-workload ready
• Native Hadoop integration: HDFS with Isilon
• Virtualization and multi-tenancy with vSphere Big Data Extensions
• Lifecycle system assurance
• Lower TCO: operations, process, and licensing
• Independent scaling of compute and storage
• Reuse existing skills for the Vblock 300 and 700 families
Target use cases to combine all data
• Horizontal: risk and fraud management; customer and offer management; logistics and operations; process efficiency; price optimization
• Vertical: financial services; retail/online services; healthcare; oil, gas, and energy; telcos
• Unstructured data: logs, POS, images, video/audio, locational, network, genomics, email, docs, etc.
• Structured data: customer, patient, product, supplier, purchase, financial, risk, etc.
• Multi-structured, semi-structured, or industry-standard data (HIPAA, HL7, NCPDP, ACORD, DTCC, EDI-X12, EDIFACT, SWIFT, FIX, NACHA)
Embrace all data on a converged platform for big data and analytics
• Applications/systems: open to other sources; Pivotal Analytic Database (Greenplum) pre-loaded
• Virtualization: VMware vSphere, including Big Data Extensions (BDE)
• Server: Cisco Unified Computing System (UCS) B-series servers running VMware vSphere; server flash XtremSF/SW (optional, Phase 2)
• Network: Cisco Data Center and Cloud Networking (DCN) portfolio: Nexus 5K, MDS, InfiniBand (contingency)
• Storage: EMC Symmetrix VMAX, VNX (shared), and Isilon
• Protection: EMC Avamar, Data Domain, VPLEX, RecoverPoint
HDFS: Integrated Isilon with Hadoop
• Step 1: Much or all of the data lives on the Isilon/Hadoop cluster, which is reachable over SMB, NFS, HTTP, FTP, and HDFS.
• Step 2: Jobs are run on the Hadoop cluster's MapReduce compute nodes.
[Diagram: web click data, decision support databases, OLAP data, and the EDW feed Isilon; Isilon presents the HDFS name node and data node services to the MapReduce compute nodes.]
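With Isilon serving HDFS, the Hadoop compute nodes are pointed at the Isilon cluster instead of a local NameNode. This is typically done through the `fs.defaultFS` property in `core-site.xml`. The Python sketch below writes such a minimal file.

Two values are assumptions:
- The SmartConnect zone name is hypothetical.
- Port 8020 is assumed because it is the conventional NameNode RPC port.

Confirm the values your OneFS release and Hadoop distribution expect.

```python
import xml.etree.ElementTree as ET

# Hypothetical SmartConnect zone name for the Isilon cluster; replace with your own.
ISILON_ZONE = "isilon-hdfs.example.com"
# Assumed HDFS RPC port (8020 is the conventional NameNode port); verify for your setup.
HDFS_PORT = 8020


def build_core_site(zone: str, port: int) -> str:
    """Return a minimal core-site.xml body that makes Isilon the default file system."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = f"hdfs://{zone}:{port}"
    return ET.tostring(configuration, encoding="unicode")


print(build_core_site(ISILON_ZONE, HDFS_PORT))
```

In practice the distribution's management tooling usually owns this file. The sketch only shows which property changes when Isilon takes over the NameNode role.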
Cost comparison: DAS vs. Isilon
Customer requirement: 640 TB raw capacity
• DAS option: 14.8% usable capacity per DataNode; 38 racks of servers
• Isilon option: 10 racks (including compute); 65% less expensive than DAS
[Chart: cost of each option from $0 to $6M, broken down by servers, energy, configuration/installation, management, Hadoop licensing, and network.]
Hadoop on Isilon is often significantly less costly!
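Part of the DAS footprint comes from HDFS's default replication factor of 3, which stores every block three times. The sketch below compares usable capacity for the same 640 TB raw requirement.

Two caveats apply:
- The 80% protection efficiency for shared storage is an assumed example value. Actual OneFS overhead depends on cluster size and protection level.
- Neither number reproduces the slide's cost model.

```python
RAW_TB = 640  # raw capacity requirement from the cost comparison above


def hdfs_usable_tb(raw_tb: float, replication_factor: int = 3) -> float:
    """Usable capacity when HDFS keeps replication_factor full copies of every block.

    Ignores scratch space and file-system overhead, so real usable space is lower.
    """
    return raw_tb / replication_factor


def shared_usable_tb(raw_tb: float, protection_efficiency: float) -> float:
    """Usable capacity on parity-protected shared storage.

    protection_efficiency is the fraction of raw capacity left for data after
    protection overhead; it is a hypothetical input, not an Isilon specification.
    """
    if not 0.0 < protection_efficiency <= 1.0:
        raise ValueError("protection_efficiency must be in (0, 1]")
    return raw_tb * protection_efficiency


das_tb = hdfs_usable_tb(RAW_TB)
shared_tb = shared_usable_tb(RAW_TB, protection_efficiency=0.8)  # assumed example value
print(f"DAS, 3x replication: {das_tb:.1f} TB usable")
print(f"Shared storage at 80% efficiency: {shared_tb:.1f} TB usable")
```

The gap between roughly 213 TB and 512 TB usable is the capacity effect behind "eliminate 3x mirroring". It explains part of why the DAS option needs so many more racks.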
Virtualize Hadoop on Vblock
• Drive operational simplicity with performance
• Maximize resource utilization
• Architect a scalable platform
[Diagram: Hadoop distributions (Pivotal, Hortonworks, Apache) run as virtualized name and data nodes under vSphere Big Data Extensions, with data on Isilon (SMB, NFS, HTTP, FTP, HDFS) and on VMAX/VNX.]
Virtualized Hadoop: vSphere Big Data Extensions value propositions
Drive operational simplicity with performance; maximize resource utilization; architect a scalable platform
• Rapid deployment
• Self-service tools
• Performance
• True multi-tenancy
• Elastic scaling
• Avoid dedicated hardware
• VM-based isolation
• Increase resource utilization
• Deployment choice
• Maintain management flexibility at scale
• Control costs
• Leverage vSphere features
Why HDFS on Isilon with Vblock
• No ingest necessary
• Eliminate the NameNode single point of failure (SPOF)
• Eliminate 3x mirroring
• Enterprise feature set
• Multi-protocol access
• Simultaneous multi-distribution support
• Better cost!
• SmartDedupe for Hadoop
• SEC 17a-4 compliant WORM
• Kerberos authentication
• Hadoop multi-tenancy
• Supports all major distributions: Apache, Pivotal, Hortonworks, Cloudera
• Great performance!
• HDFS license key included in OneFS: no additional license cost
New way forward: deploying Hadoop on Vblock with Isilon as shared infrastructure
Vblock™ Systems evolving to accommodate big data and analytics: modular, scalable, designed for the next-gen data center
• Workload patterns change over time
• Shifting resources need flexible points of scalability
• Simplified deployment, management, and ongoing operations
Integrated data protection option: Vblock™ Data Protection
• Continuous availability, best-in-class system uptime
• Advanced deduplication
• Optimum resource utilization
• Eliminate risk, faster time to value
• Support for the entire Vblock Systems portfolio
Components: Avamar 7 plus Data Domain, EMC RecoverPoint, EMC VPLEX
Conclusion and next steps
In summary:
• The technology inflection point is here: an opportune time to rethink infrastructure
• VCE and our investors, EMC, Cisco, and VMware, will work with you on your journey
• We are focused on helping you drive value from big data and analytics while leveraging your existing investments
Calls to action:
• Identify your focus areas and initiatives
• Deep dive into your objectives and challenges
• Map your priority engagements to VCE solutions
Isilon OneFS: Built for Big Data
• Massive scalability: up to 20 PB in a single file system
• Unmatched performance: up to 118 GB/s of concurrent throughput, up to 740 MB/s single-stream throughput, up to 25.1 TB global cache
• Application and workflow consolidation
• Industry-leading reliability and self-healing
• Management simplicity: automates activities "unfit for humans"
• Symmetric scale-out architecture
• Fully distributed, fine-grained services
• Unified IP storage (NFS, SMB, Object, HDFS)
Independent scaling
Before (Hadoop cluster nodes with direct-attached storage):
• Storage-to-compute ratio is fixed
• Scaling compute means scaling capacity
• Difficult to provide QoS
• Compute upgrade is a forklift upgrade
After (compute sized to required performance, storage sized to required capacity):
• Scale compute independent of storage
• Achieve optimal performance balance even as workloads evolve
• No data migrations, ever!
• Add new performance as hardware evolves
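The "before/after" contrast comes down to how node counts are computed. The Python sketch below is a simplified model, and everything in it is hypothetical:
- The per-node core and capacity figures are invented example values.
- The workload is invented.
- The helper names are illustrative only.

With DAS, every node adds a fixed amount of both compute and capacity. With shared storage, the two are counted separately.

```python
import math
from dataclasses import dataclass


@dataclass
class Requirement:
    compute_cores: int   # total cores the workload needs
    storage_tb: float    # usable capacity the workload needs


def das_nodes(req: Requirement, cores_per_node: int, tb_per_node: float) -> int:
    """Coupled sizing: every DAS node adds a fixed amount of compute AND capacity,
    so the cluster must grow until whichever resource is scarcer is satisfied."""
    return max(math.ceil(req.compute_cores / cores_per_node),
               math.ceil(req.storage_tb / tb_per_node))


def decoupled_nodes(req: Requirement, cores_per_node: int,
                    tb_per_storage_node: float) -> tuple[int, int]:
    """Independent sizing: returns (compute nodes, storage nodes), counted separately."""
    return (math.ceil(req.compute_cores / cores_per_node),
            math.ceil(req.storage_tb / tb_per_storage_node))


# Hypothetical storage-heavy workload: 200 cores, 600 TB usable.
workload = Requirement(compute_cores=200, storage_tb=600)
print(das_nodes(workload, cores_per_node=16, tb_per_node=12))               # 50
print(decoupled_nodes(workload, cores_per_node=16, tb_per_storage_node=36))  # (13, 17)
```

In this example the DAS cluster needs 50 nodes (800 cores) just to hold 600 TB, although the workload only needs 200 cores. Decoupled sizing buys 13 compute nodes and sizes storage on its own.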
Snapshot/version control
Before:
• Traditional HDFS has no built-in replication to a remote site
• No snapshotting of data
• Loss of version control
• Not designed for mission-critical use
After:
• Full SnapshotIQ™ integration identifies changes
• Multi-threaded, multi-node scale-out replication
• Improved RPO/RTO for business continuity
• Geo-replicated Hadoop!
Vblock Systems: true converged infrastructure
• Solutions and services: VCE and partners
• System 100, System 200, System 340, System 720, and specialized systems
• VCE Vision™ Intelligent Operations software
The same sizing outcome you look for, applied to big data and analytics
Key: pay attention to workloads (visualization/analytic apps, MPP databases, Hadoop, HDFS, HBase, name/data nodes, etc.) and translate their metrics into sizing:
• Volume: higher transaction volume, change data, number of queries, etc.
• Velocity: lower response time and latency, coordination of batch and real-time processing
• Variety: data types, block/file; e.g., XML behaves differently from images
Sizing by layer:
• Applications/systems: open to other sources; Pivotal Analytic Database (Greenplum) pre-loaded
• Virtualization: VMware vSphere, including Big Data Extensions (BDE) for Hadoop
• Server: Cisco B-series (baseline) or C-series; size the number of nodes, CPU, memory, and disk; server flash XtremSF/SW (optional, Phase 2)
• Network: Nexus 5K, MDS, InfiniBand (contingency); confirm no customization is necessary
• Storage: EMC Symmetrix VMAX, VNX (shared), and Isilon; size the raw storage capacity required
• Protection: EMC Avamar, Data Domain, VPLEX, RecoverPoint; size by RPO, RTO, distance, and volume
Considerations for big data workloads
If virtualized:
• Take the CPU requirement and convert it to SPECint.
• Provide enough processors to serve the total SPECint requirement.
• Calculate nodes per blade based on the CPU requirement.
• Calculate memory per blade based on nodes per blade (ensure the division yields a whole number of node memory allocations).
• Use a VNX5400 or VNX5600 array if the blade limit doesn't impact the solution.
• Provide boot space for ESXi.
• Provide VNX space for each node's local disk.
• Attach the Isilon config as the parent quote.
• Ensure additional Nexus 5K switches and enough Twinax cables are added.
• Ensure enough physical ports are provided on the Vblock Nexus 5K for uplinks from the Isilon 5K.
• Size the DR solution using standard methods, and ensure that ports used for DR aren't consumed by Isilon uplinks.
If not virtualized:
• Provide blades to match the CPU and memory requirement (blades may provide too much CPU).
• Configure the remainder following the notes for the virtualized case.
Also:
• Review the whitepaper on Isilon attach to Vblock 300/700.
• Contact an EMC Isilon SE to assist with holding the Isilon config.
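The first four virtualized-sizing steps reduce to a few ceiling divisions. The Python sketch below is a hypothetical illustration, not a VCE sizing tool. The following are all assumed inputs:
- the SPECint rating per processor
- the number of virtual Hadoop nodes
- the per-node memory
- the hypervisor overhead

The function and class names are also invented for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class BladePlan:
    processors: int
    blades: int
    nodes_per_blade: int
    memory_per_blade_gb: int


def plan_virtualized_blades(required_specint: float, specint_per_processor: float,
                            processors_per_blade: int, hadoop_nodes: int,
                            memory_per_node_gb: int,
                            hypervisor_overhead_gb: int = 0) -> BladePlan:
    # Enough processors to serve the total SPECint requirement.
    processors = math.ceil(required_specint / specint_per_processor)
    # Blades needed to host those processors.
    blades = math.ceil(processors / processors_per_blade)
    # Spread the virtual Hadoop nodes across the blades (round up).
    nodes_per_blade = math.ceil(hadoop_nodes / blades)
    # Memory per blade is a whole multiple of per-node memory, plus ESXi overhead.
    memory_per_blade_gb = nodes_per_blade * memory_per_node_gb + hypervisor_overhead_gb
    return BladePlan(processors, blades, nodes_per_blade, memory_per_blade_gb)


# All inputs below are hypothetical example values, not VCE sizing data.
plan = plan_virtualized_blades(required_specint=5000, specint_per_processor=400,
                               processors_per_blade=2, hadoop_nodes=24,
                               memory_per_node_gb=48, hypervisor_overhead_gb=8)
print(plan)
```

Memory is derived from nodes per blade rather than set independently, so each blade always holds a whole number of node memory allocations. The remaining steps (boot space, VNX model, Nexus 5K ports, DR) still need the standard sizing methods.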
Styles of Big Data & Analytics on Vblock
DAS = direct-attached storage; HDFS = Hadoop Distributed File System
VCE distinct benefits to business
FEX = Fabric Extender
VCE for virtual shared infrastructure: the gap
Example solution architecture
• Applications/systems: open to other sources; Pivotal Analytic Database (Greenplum) pre-loaded
• Virtualization: VMware vSphere
• Server: Cisco B-series; server flash XtremSF/SW (optional, Phase 2)
• Network: Nexus 5K, MDS, InfiniBand (contingency)
• Storage: VNX (shared)
Note: The detailed architecture will vary based on sizing and the customer environment.
Solution leverages current Vblock: Big Data and Analytics on Vblock
Deployment guide: http://media.vceview.com/documents/isilon-with-vblock-300-700-deployment.pdf
• Applications/systems: open to Hadoop distributions
• Virtualization: VMware vSphere, including Big Data Extensions (Serengeti), for virtual provisioning
• Server: Cisco B- or C-series; server flash XtremSF/SW (optional, Phase 2)
• Network: Cisco unified networks
• Platform: Vblock 340 or Vblock 720: virtual, shared infrastructure to support mission-critical deployment
• Storage: VNX (or VMAX) for Pivotal DB/structured data and boot/virtualized LUNs; Isilon for native HDFS integration (base configuration)