
Large Scale Computing Systems


tavi

Presentation Transcript


  1. Large Scale Computing Systems
     A new ERA: BIG x 3
     Revisit: algorithms, architectures, distributed systems, parallel computing, scalable DBs
     • Data
     • Computations
     • Infrastructures

  2. Big Data
     • ‘Moore's’ Law: data doubles every 18 months
     • 90% of today’s data was created in the last 2 years
     • Facebook: 20 TB/day compressed
     • CERN/LHC: 40 TB/day (15 PB/year)
     • NYSE: 1 TB/day
     • Many more: web logs, financial transactions, medical records, etc.

  3. Data Growth
     • 1 EB (exabyte, 10^18 bytes) = 1000 PB (petabyte, 10^15 bytes): US mobile data traffic last year (2010)
     • 0.8 ZB (zettabyte) = 800 EB: entire global mass of digital data in 2009, according to IDC
     • 35 ZB (zettabyte, 10^21 bytes): IDC’s forecast for all digital data in 2020
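The two IDC figures on this slide imply a growth factor and a doubling time, which can be worked out as a quick check. This is a minimal sketch that assumes smooth exponential growth between 2009 and 2020; the function name `doubling_time_years` is made up for illustration.

```python
import math

def doubling_time_years(start, end, years):
    """Doubling time implied by growing from `start` to `end` over `years`,
    assuming smooth exponential growth (an assumption, not a slide claim)."""
    return years / math.log2(end / start)

ZB_2009 = 0.8   # global digital data in 2009 (IDC, from the slide)
ZB_2020 = 35.0  # IDC forecast for 2020 (from the slide)

growth_factor = ZB_2020 / ZB_2009
doubling = doubling_time_years(ZB_2009, ZB_2020, 2020 - 2009)

print(f"growth factor: about {growth_factor:.1f}x")
print(f"implied doubling time: about {doubling * 12:.0f} months")
```

Under that assumption the forecast corresponds to a roughly 44-fold increase and a doubling time of about two years, which is somewhat slower than the 18-month rule of thumb quoted on the Big Data slide.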

  4. MapReduce
     • A programming model
     • A software framework for writing applications that
       • rapidly process vast amounts of data in parallel
       • on large clusters of compute nodes
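The programming model can be illustrated with the classic word-count example. This is a single-process sketch of the map, shuffle, and reduce phases, not the API of any particular framework; the function names are chosen for illustration, and a real framework would run the map and reduce calls in parallel across cluster nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run the three phases sequentially over a list of documents."""
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = word_count(["big data", "big compute"])
print(counts)
```

Because each `map_phase` call touches only one record and each `reduce_phase` call touches only one key, both phases can be distributed without shared state, which is what lets the model scale to large clusters.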

  5. Cloud computing
     • Big Data pushes databases to their limits
     • NoSQL databases
       • Horizontally scalable, schema-free, multi-datacenter data stores that can handle PBs of data
       • Google’s BigTable, Facebook’s Cassandra, LinkedIn’s Voldemort, Amazon’s Dynamo, and many more
     • Cloud Computing
       • Virtualized resources from distant data centers
       • Elastic and “pay as you go” resource provisioning
       • Easy resource manipulation through an API

  6. Big computations
     Challenges for exascale computing:
     • Scalability up to millions of cores
     • Programmability (revisit traditional parallel programming models)
     • Fault tolerance (in thousands or millions of nodes, several may fail every day)
     • Low power consumption (maximize GFLOPS per watt)
     It’s not High-Performance Computing (HPC) anymore… it’s High-Efficiency Computing (HEC)
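The fault-tolerance point follows from simple arithmetic: even reliable nodes fail often in aggregate once there are enough of them. The sketch below assumes independent failures and a per-node mean time between failures (MTBF) of 5 years; that MTBF is a hypothetical value chosen for illustration, not a figure from the slides.

```python
def expected_failures_per_day(nodes, mtbf_years):
    """Expected node failures per day, assuming independent failures
    and the same mean time between failures for every node."""
    return nodes / (mtbf_years * 365.0)

ASSUMED_MTBF_YEARS = 5.0  # hypothetical per-node MTBF, for illustration only

for nodes in (10_000, 100_000, 1_000_000):
    rate = expected_failures_per_day(nodes, ASSUMED_MTBF_YEARS)
    print(f"{nodes:>9} nodes: about {rate:.1f} failures/day")
```

Under these assumptions a 10,000-node system already sees several failures per day, so software at this scale has to treat failure as a routine event.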

  7. Exascale applications
     • Computations on sparse matrices: the heart of scientific and engineering simulations
     • (Huge) graph algorithms: shortest paths, PageRank, etc.
     • Regular grids: solving PDEs with millions of unknowns
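The sparse-matrix kernel that dominates many of these applications is the matrix-vector product. The sketch below uses the compressed sparse row (CSR) layout, one common storage format picked here for illustration; the slides do not name a format. The function name and the small example matrix are also illustrative.

```python
def csr_matvec(values, col_index, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values    : nonzero entries, row by row
    col_index : column of each nonzero entry
    row_ptr   : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_index[k]]
    return y

# Example matrix:
# [[2, 0, 1],
#  [0, 3, 0],
#  [4, 0, 5]]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
col_index = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]

y = csr_matvec(values, col_index, row_ptr, [1.0, 1.0, 1.0])
print(y)  # each entry is the sum of one row: [3.0, 3.0, 9.0]
```

Only the nonzero entries are stored and visited, so the cost scales with the number of nonzeros, not with the full matrix size. The same kernel underlies iterative graph algorithms such as PageRank, where the matrix encodes the link structure.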

  8. Big Infrastructures
     • OS, architectures revisited
     • Virtualization
     • Cloud Facilities - Datacenters
     • Distributed storage: hundreds of PBs using commodity disks
     • HPC clusters: exascale computing using scalable ‘ingredients’
