Large Scale Computing Systems
A new ERA: BIG x 3 (Data, Computations, Infrastructures)
Revisit: algorithms, architectures, distributed systems, parallel computing, scalable DBs
Big Data
• ‘Moore's’ Law: Data doubles every 18 months
• 90% of today’s data was created in the last 2 years
• Facebook: 20 TB/day compressed
• CERN/LHC: 40 TB/day (15 PB/year)
• NYSE: 1 TB/day
• Many more: Web logs, financial transactions, medical records, etc.
Data Growth
• 1 EB (exabyte, 10^18 bytes) = 1000 PB (petabyte, 10^15 bytes): US mobile data traffic last year (2010)
• 0.8 ZB (zettabyte) = 800 EB: entire global mass of digital data in 2009, according to IDC
• 35 ZB (zettabyte, 10^21 bytes): IDC’s forecast for all digital data in 2020
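As a rough check on these growth figures, the sketch below computes the doubling time implied by the two IDC numbers quoted on this slide (0.8 ZB in 2009, 35 ZB forecast for 2020). It assumes steady exponential growth between the two dates, which is a simplification; the function name is illustrative only. Under that assumption the implied doubling time comes out at about 24 months.

```python
import math

# Decimal (SI) storage units, in bytes.
PB = 10**15
EB = 10**18
ZB = 10**21

def doubling_time_months(start_bytes, end_bytes, years):
    """Months needed to double, assuming steady exponential growth."""
    doublings = math.log2(end_bytes / start_bytes)
    return years * 12 / doublings

# Figures quoted on the slide: 0.8 ZB in 2009, 35 ZB forecast for 2020.
implied = doubling_time_months(0.8 * ZB, 35 * ZB, years=2020 - 2009)

print(f"1 EB = {EB // PB} PB, 0.8 ZB = {0.8 * ZB / EB:.0f} EB")
print(f"Implied doubling time: {implied:.1f} months")
```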
MapReduce
• A programming model
• A software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes
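The programming model can be illustrated with a single-process word count. This is only a sketch of the map, shuffle, and reduce steps, not the API of any particular framework: the function names are made up for illustration, and a real framework would run the map and reduce calls in parallel across cluster nodes and handle data movement and failures.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # Each document could be mapped on a different node; here we loop.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big compute", "big infrastructures"])
print(counts)
```

Because each map call looks at one record and each reduce call at one key, both phases can be spread over many machines without the application code changing.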
Cloud computing
• Big Data pushes databases to their limits
• NoSQL databases
  • Horizontally scalable, schema-free, multi-datacenter data stores that can handle PBs of data
  • Google’s BigTable, Facebook’s Cassandra, LinkedIn’s Voldemort, Amazon’s Dynamo, and many more
• Cloud Computing
  • Virtualized resources from distant data centers
  • Elastic and “pay as you go” resource provisioning
  • Easy resource manipulation through an API
Big computations
Challenges for exascale computing:
• Scalability up to millions of cores
• Programmability (revisit traditional parallel programming models)
• Fault tolerance (in thousands or millions of nodes, several may fail every day)
• Low power consumption (maximize GFLOPS/Watt)
It’s not High-Performance Computing (HPC) anymore… it’s High-Efficiency Computing (HEC)
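The fault-tolerance point can be made concrete with a back-of-the-envelope estimate. The sketch below assumes independent nodes that fail at a constant rate, and the 5-year mean time between failures (MTBF) per node is a hypothetical value chosen only for illustration, not a figure from these slides.

```python
import math

def expected_failures_per_day(nodes, mtbf_years):
    """Expected node failures per day, assuming independent nodes that
    fail at a constant rate (an exponential lifetime model)."""
    mtbf_days = mtbf_years * 365
    return nodes / mtbf_days

def prob_no_failure(nodes, mtbf_years, days=1.0):
    """Probability that no node at all fails within the given period."""
    return math.exp(-expected_failures_per_day(nodes, mtbf_years) * days)

# Hypothetical per-node MTBF of 5 years.
for nodes in (1_000, 100_000, 1_000_000):
    rate = expected_failures_per_day(nodes, mtbf_years=5)
    quiet = prob_no_failure(nodes, mtbf_years=5)
    print(f"{nodes:>9} nodes: {rate:7.1f} failures/day, "
          f"P(no failure in a day) = {quiet:.3g}")
```

Under these assumptions the failure rate grows linearly with node count, so at large scale a failure-free day becomes very unlikely and software has to treat failures as routine.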
Exascale applications
• Computations on sparse matrices: the heart of scientific and engineering simulations
• (Huge) graph algorithms: shortest paths, PageRank, etc.
• Regular grids: solving PDEs with millions of unknowns
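To show how the first two application classes relate, here is a minimal PageRank sketch using power iteration over an adjacency list. Storing and visiting only the existing edges is the same idea as a sparse matrix-vector product. The damping factor of 0.85, the fixed iteration count, and the toy graph are assumptions made for illustration; this is a small teaching version, not an exascale implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a graph stored as {node: [out-neighbours]}.

    Every link target must also appear as a key in `links`.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not links[v])
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        # Sparse step: each edge contributes exactly once.
        for v in nodes:
            for target in links[v]:
                new_rank[target] += damping * rank[v] / len(links[v])
        rank = new_rank
    return rank

# Toy graph: "c" is linked to by every other node.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.3f}")
```

The work per iteration is proportional to the number of edges rather than to the square of the number of nodes, which is what makes very large graphs tractable.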
Big Infrastructures
• OS, Architectures revisited
• Virtualization
• Cloud Facilities - Datacenters
• Distributed storage: 100s of PBs using commodity disks
• HPC clusters: Exascale computing using scalable ‘ingredients’