
Big Data’s Virtualization Journey

Big Data’s Virtualization Journey. Andrew Yu, Sr. Director, Big Data R&D, VMware. Big Data: Not Just for the Web Giants – Now the Intelligent Enterprise. Real-time analysis allows instant understanding of market dynamics.





Presentation Transcript


  1. Big Data’s Virtualization Journey. Andrew Yu, Sr. Director, Big Data R&D, VMware

  2. Big Data: Not Just for the Web Giants – Now the Intelligent Enterprise

  3. Real-time analysis allows instant understanding of market dynamics. Retailers can gain an intimate understanding of their customers’ needs and use direct, targeted marketing. Market Segment Analysis → Personalized Customer Targeting

  4. The Emerging Pattern of Big Data Systems: Retail Example. Diagram components: Analytics, Real-Time Processing, Data Science, Machine Learning, Real-Time Streams, Parallel Data Processing, Exa-scale Data Store, Cloud Infrastructure.

  5. A single GE jet engine produces 10 terabytes of data in one hour, or about 90 petabytes per year. This data enables early detection of faults and common-mode failures, and provides feedback to product engineering. Post Mortem → Proactively Maintained Connected Product
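
The slide's two figures are consistent only if the engine produces data around the clock. A quick check, assuming continuous operation for all 8,760 hours of a year and decimal units (1 PB = 1,000 TB), gives about 87.6 PB, which rounds to the slide's 90 PB:

```python
# Back-of-envelope check of the slide's jet engine figures.
# Assumptions: the engine runs (and logs) continuously, 365 days a year,
# and 1 PB = 1,000 TB (decimal units).
TB_PER_HOUR = 10            # from the slide
HOURS_PER_YEAR = 24 * 365   # 8,760 hours of continuous operation


def yearly_petabytes(tb_per_hour: float, hours: float) -> float:
    """Convert an hourly data rate in TB into a yearly total in PB."""
    return tb_per_hour * hours / 1000


if __name__ == "__main__":
    print(f"{yearly_petabytes(TB_PER_HOUR, HOURS_PER_YEAR):.1f} PB per year")  # 87.6 PB per year
```

An engine that flies fewer hours per year would produce proportionally less, so the 90 PB figure is best read as an upper bound per engine.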

  6. Storage: Plan for Petascale Data Storage and Processing. Analytics data rapidly outgrows traditional data size by 100x. Chart: PB of data.

  7. Cloud Infrastructure Supports Mixed Big Data Workloads. Diagram: Machine Learning, Real-Time Analytics, and Hadoop workloads running side by side on one cloud infrastructure (Compute, Storage/Availability, Network/Security, Management).

  8. Cloud Infrastructure Supports Multiple Tenants. Diagram: Web User Analytics, Historical Customer Behavior, and Financial Analysis tenants sharing one cloud infrastructure (Compute, Storage/Availability, Network/Security, Management).

  9. Software-defined Datacenter: Compute. The Core Values of Virtualization Apply to Big Data • Agility / rapid deployment • Isolation for resource control and security • Lower capex • Operational efficiency. Diagram layers: Compute, Storage/Availability, Network/Security, Management.

  10. Strong Isolation between Workloads Is Key. Diagram: a hungry workload (1), a reckless workload (2), and a nosy workload (3) sharing one cloud infrastructure.
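
Isolation has a security side (the nosy workload) and a resource side (the hungry and reckless workloads). For the resource side, vSphere lets administrators set shares, reservations, and limits on VMs and resource pools. The following is a simplified, hypothetical proportional-share allocator with per-workload caps, not vSphere's actual scheduler; the function name, workload names, and numbers are illustrative assumptions. It shows how a hungry workload with huge demand still cannot take capacity another workload is entitled to:

```python
from typing import Dict, Tuple


def allocate(capacity: float,
             workloads: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Hypothetical proportional-share allocator (not vSphere's scheduler).

    workloads maps name -> (shares, cap), where cap is the most the workload
    can use, e.g. min(limit, demand). Capacity a capped workload cannot use
    is redistributed to the others in proportion to their shares.
    """
    alloc = {name: 0.0 for name in workloads}
    active = set(workloads)
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_shares = sum(workloads[n][0] for n in active)
        fair = {n: remaining * workloads[n][0] / total_shares for n in active}
        # Workloads that cannot use their whole fair share are pinned at their cap.
        saturated = {n for n in active if workloads[n][1] <= fair[n]}
        if not saturated:
            # Every remaining workload can absorb its fair share: hand it out.
            for n in active:
                alloc[n] = fair[n]
            break
        for n in saturated:
            alloc[n] = float(workloads[n][1])
            remaining -= workloads[n][1]
        active -= saturated
    return alloc


if __name__ == "__main__":
    demo = {
        "hungry": (1, 1000),    # wants far more than its share
        "reckless": (1, 1000),
        "nosy": (1, 10),        # only needs a little
    }
    # hungry and reckless get 45 each; nosy gets its cap of 10
    print(allocate(100, demo))
```

Redistributing unused capacity is what keeps consolidation efficient: isolation caps a workload at its entitlement only while others actually need the rest.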

  11. Consolidation of Workloads: Higher Utilization (Hadoop 1, Hadoop 2, HBase) • Without virtualization, independent Hadoop clusters each have access to only a fraction of the total physical resources. • Consolidated and virtualized, a single cluster has access to the entire pool of physical resources. • For common use cases, this reduces latency for priority jobs on the consolidated cluster. • Multiple HDFS instances are striped across all physical hosts.
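
The utilization and latency argument can be made concrete with hypothetical numbers: three tenants (for example the slide's Hadoop 1, Hadoop 2, and HBase), each given a dedicated 10-slot cluster, versus one shared 30-slot pool. This sketch assumes perfectly divisible work and ignores data locality and scheduling overhead; the function names are illustrative:

```python
from typing import List


def siloed_utilization(demands: List[float], silo_slots: List[float]) -> float:
    """Share of all slots in use when each tenant is limited to its own cluster."""
    used = sum(min(d, s) for d, s in zip(demands, silo_slots))
    return used / sum(silo_slots)


def pooled_utilization(demands: List[float], total_slots: float) -> float:
    """Share of all slots in use when every tenant draws from one shared pool."""
    return min(sum(demands), total_slots) / total_slots


def job_hours(work_slot_hours: float, slots: float) -> float:
    """Completion time of perfectly divisible work spread over `slots` slots."""
    return work_slot_hours / slots


if __name__ == "__main__":
    demands = [25, 5, 0]   # hypothetical slot demand per tenant
    silos = [10, 10, 10]   # hypothetical dedicated cluster sizes
    print(siloed_utilization(demands, silos))       # 0.5
    print(pooled_utilization(demands, sum(silos)))  # 1.0
    # Tenant 1's 250 slot-hour job: its own silo vs. the slots free in the pool
    print(job_hours(250, silos[0]), job_hours(250, sum(silos) - demands[1]))  # 25.0 10.0
```

With these demands the silos are only 50% busy, while in the pool tenant 1 can use the 15 slots that tenants 2 and 3 leave idle, so its job finishes in 10 hours instead of 25.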

  12. Big Data Mix of Workloads. Compute layer: Hadoop (batch analysis), HBase (real-time queries), Big SQL (Impala, Pivotal HAWQ), NoSQL (Cassandra, Mongo, etc.), Other (Spark, Shark, Solr, Platfora, etc.). Beneath it: File System/Data Store, then Virtualization across the physical hosts.

  13. Software-defined Datacenter: Storage. Requirements of Next-Generation Storage • 10x lower cost of storage • Support a variety of application types • Handle explosive data growth • Solve the privacy and security issues. Diagram layers: Compute, Storage/Availability, Network/Security, Management.

  14. Software-defined Storage Enables Fundamental Economics. Chart of petabytes deployed for Traditional SAN/NAS, Scale-out NAS (Isilon, NTAP), and Distributed Object Storage (HDFS, MapR, Ceph).

  15. Big Data Using Local Disks • Servers with local disks: 16–24 core server, 12–24 SATA disks of 2–4 TB each, 10 GbE adapter • High-performance 10 GbE top-of-rack switch per rack • iSCSI/NFS shared storage for vMotion, etc.
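
The per-server disk configuration implies a range of raw capacity. The sketch below also divides by HDFS's default replication factor of 3 to estimate how much distinct data one server can hold; it ignores space reserved for temporary and intermediate data, which reduces the usable figure further in practice. The function names are illustrative:

```python
def raw_capacity_tb(disks: int, tb_per_disk: float) -> float:
    """Raw disk capacity of one server."""
    return disks * tb_per_disk


def distinct_data_tb(raw_tb: float, replication: int = 3) -> float:
    """Distinct data that fits when HDFS keeps `replication` copies of each block.

    Ignores space reserved for temporary/intermediate data.
    """
    return raw_tb / replication


if __name__ == "__main__":
    for disks, size in [(12, 2), (24, 4)]:  # low and high end from the slide
        raw = raw_capacity_tb(disks, size)
        print(f"{disks} x {size} TB: {raw} TB raw, {distinct_data_tb(raw)} TB distinct")
```

At the low end (12 × 2 TB) that is 24 TB raw and 8 TB of distinct data per server; at the high end (24 × 4 TB), 96 TB raw and 32 TB.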

  16. Big Data Storage: Scale-out Network Storage • Hadoop protocol • Snapshots • POSIX apps • Full NFS access • Replication • Erasure coding. Diagram: elastic compute on top of scale-out network storage.
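
Replication and erasure coding trade raw-capacity overhead against rebuild cost. The slide does not say which erasure code the scale-out store uses; this sketch assumes Reed-Solomon RS(6,3), with 6 data cells plus 3 parity cells per stripe, which is one of the built-in policies of HDFS erasure coding in Hadoop 3. The function names and the 1,000 TB figure are illustrative:

```python
def replicated_raw_tb(logical_tb: float, copies: int = 3) -> float:
    """Raw capacity needed to keep `copies` full copies of the data."""
    return logical_tb * copies


def erasure_coded_raw_tb(logical_tb: float, data_cells: int = 6,
                         parity_cells: int = 3) -> float:
    """Raw capacity for a Reed-Solomon (data, parity) code: every `data_cells`
    cells of data are stored together with `parity_cells` parity cells."""
    return logical_tb * (data_cells + parity_cells) / data_cells


if __name__ == "__main__":
    logical = 1000.0  # 1,000 TB of data (hypothetical)
    print(replicated_raw_tb(logical))     # 3000.0
    print(erasure_coded_raw_tb(logical))  # 1500.0
```

Under these assumptions, 3-way replication costs 200% extra raw capacity and survives the loss of 2 copies, while RS(6,3) costs 50% extra and survives the loss of any 3 cells in a stripe, at the price of more computation and network traffic when rebuilding lost data.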

  17. Customer Success: Hadoop as a Service at FedEx • Elastic vSphere Cluster • Mixed Workloads • vSphere • Existing Rack Mount Servers • Scale-out Isilon Cluster • Shared Data • NAS + Hadoop

  18. Storage Configuration for Data/Compute Separation with Isilon. Diagram: Hadoop virtual nodes 1, 2, and 3 on one virtualization host, running a JobTracker and TaskTrackers. Each node boots from an OS image VMDK and keeps temporary data on Ext4-formatted VMDKs backed by shared SAN/NAS storage, while HDFS (NameNode and DataNode) is served by Isilon.

  19. Agile Big Data at FedEx

  20. Breakthrough Use Cases • Web Log Analysis: initial exploration focused on detecting mobile devices accessing the website. Analysis of 570 billion web server log entries took approximately 9 minutes on a small cluster. • ZIP Code Analysis: analysis of data to determine which ZIP codes are the largest sources or destinations of shipments. • Shipment Analysis: analysis of shipment information to find patterns that may delay a package.
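
For scale, 570 billion entries in about 9 minutes (540 s) is roughly 1.06 billion entries per second across the cluster. The slides do not describe how mobile devices were detected; the following is a hypothetical MapReduce-style sketch in plain Python that classifies each log line by its user-agent string. The marker keywords, the function names, and the assumption that logs use the combined log format are all illustrative:

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical keyword list; real device detection uses much richer rules.
MOBILE_MARKERS = ("Mobile", "Android", "iPhone", "iPad")


def map_log_line(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit ("mobile", 1) or ("desktop", 1) for one log line.

    Assumes combined log format, where the last quoted field is the user agent.
    """
    parts = line.rsplit('"', 2)
    user_agent = parts[-2] if len(parts) == 3 else ""
    kind = "mobile" if any(m in user_agent for m in MOBILE_MARKERS) else "desktop"
    yield kind, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Reduce step: sum the counts emitted for each key."""
    totals: Counter = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


if __name__ == "__main__":
    sample = [  # hypothetical log lines
        '192.0.2.1 - - [10/Oct/2013:13:55:36 -0700] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0)"',
        '192.0.2.2 - - [10/Oct/2013:13:55:37 -0700] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    ]
    pairs = (pair for line in sample for pair in map_log_line(line))
    print(reduce_counts(pairs))
```

On a Hadoop cluster the map step would run in parallel over blocks of the log files and the framework would group the pairs by key before the reduce step; here the two steps simply run in sequence on one machine.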

  21. Cloud Infrastructure Is Ready for Big Data – Are You?

  22. Q&A
