Distributed Computing Primer CMSC 491/691 Hadoop-Based Distributed Computing Spring 2014 Adam Shook Some content adapted from Dr. Kalpakis’s CMSC 621 slides
Agenda • Evolution of Computing Infrastructure • Networking Infrastructure • Properties of Distributed Systems • Example System Architectures
Mainframe – 50s to 70s • Custom hardware • Custom low-level specialized code • Very expensive solutions
Client/Server – 80s to 00s • IT-led architectures • More portable solutions • Scalable solutions based on demand • Reign of the Enterprise Data Warehouse
Cloud – 00s to Today • Consumer-grade infrastructure • Growing IaaS and PaaS markets • Data revolution • Focus on applications and not infrastructure
Where does Hadoop fit? • A piece of your data infrastructure • Can crunch data for analytics • Can expose data for web applications • Exploration of raw data • Augments today’s infrastructure • IMO, a big toolbox that can do a bit of everything
Single Server • Server: HDD, CPU, RAM, NIC • Scale Up: faster CPUs, bigger storage • Scale Out: more servers
Local-Area Network (LAN) • Diagram: two racks of servers, each server with HDD, CPU, RAM, and NIC, plus a WAN gateway
Wide Area Network (WAN) • London, England • New York, NY • Beijing, China
Distributed Systems • The development of low-cost, powerful microprocessors, together with the invention of high-speed networks, enables us to construct computer systems by connecting a large number of computers • A distributed system is a collection of independent computers that appears to its users as a single coherent system.
Transparency • Sometimes it makes sense to expose distribution rather than hide it
Properties of Distributed Systems • Reliability • Scalability • Availability • Efficiency • CAP Theorem
Reliability • Can the system deliver services in the face of several component failures?
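One common way to tolerate component failures is to keep several copies of the data. The sketch below is a worked bit of arithmetic rather than anything from the slides: it assumes each replica fails independently with the same probability, and the function name and the 10% failure rate are illustrative assumptions.

```python
def survival_probability(replicas: int, failure_prob: float) -> float:
    """Probability that at least one of `replicas` independent copies survives.

    Assumes every replica fails independently with probability `failure_prob`
    (a simplification; real failures are often correlated, e.g. a whole rack).
    """
    if replicas < 1:
        raise ValueError("need at least one replica")
    if not 0.0 <= failure_prob <= 1.0:
        raise ValueError("failure_prob must be in [0, 1]")
    # All replicas fail together with probability failure_prob ** replicas.
    return 1.0 - failure_prob ** replicas


# Hypothetical 10% failure chance per replica.
for n in (1, 2, 3):
    print(n, round(survival_probability(n, 0.1), 6))
```

Under these assumptions, each extra replica cuts the chance of losing every copy by a factor of ten, which is one reason scale-out systems replicate data across servers.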
Scalability • Can the system scale to support a growing number of tasks?
Availability • How much latency is imposed on the system when a failure occurs?
Efficiency • How efficient is the system, in terms of latency and throughput?
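Latency and throughput can be measured directly. The following is a minimal sketch, assuming a single-threaded caller and a hypothetical `measure` helper; it is not a benchmark harness.

```python
import time
from typing import Callable, Tuple


def measure(task: Callable[[], object], runs: int = 100) -> Tuple[float, float]:
    """Run `task` sequentially and return (mean latency in s, throughput in tasks/s)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    start = time.perf_counter()
    for _ in range(runs):
        task()
    elapsed = time.perf_counter() - start
    return elapsed / runs, runs / elapsed


# Stand-in workload; a real measurement would call the system under test.
latency, throughput = measure(lambda: sum(range(10_000)))
print(f"mean latency: {latency:.6f} s, throughput: {throughput:.1f} tasks/s")
```

With one sequential caller, throughput is simply the inverse of mean latency. In a distributed system the two can diverge: adding servers may raise throughput while the latency of each individual request stays the same.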
CAP Theorem • Consistent • Available • Partition Tolerant • Trade-off between Consistency and Availability when a network partition occurs
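The trade-off can be illustrated with a toy two-replica store. This is a deliberately simplified sketch: `ToyStore`, its `mode` values, and the `partitioned` flag are all hypothetical names, and no real system works this simply.

```python
class ToyStore:
    """Two in-memory replicas with a switchable network partition.

    mode "CP": refuse writes during a partition (stay consistent).
    mode "AP": accept writes during a partition (stay available, may diverge).
    """

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]
        self.partitioned = False

    def write(self, replica_id: int, key: str, value: int) -> bool:
        if not self.partitioned:
            # Healthy network: every replica receives the write.
            for replica in self.replicas:
                replica[key] = value
            return True
        if self.mode == "CP":
            return False  # Unavailable, but the replicas never disagree.
        # AP: only the reachable replica is updated, so replicas can diverge.
        self.replicas[replica_id][key] = value
        return True

    def read(self, replica_id: int, key: str):
        return self.replicas[replica_id].get(key)


for mode in ("CP", "AP"):
    store = ToyStore(mode)
    store.write(0, "x", 1)
    store.partitioned = True
    accepted = store.write(0, "x", 2)
    print(mode, accepted, store.read(0, "x"), store.read(1, "x"))
```

In the CP run the second write is rejected and both replicas still return 1. In the AP run the write is accepted, but the two replicas return different values until the partition heals and they are reconciled.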
Stateful vs. Stateless • Whether or not a distributed system saves its state on an attached device for recovery
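A stateful component might look roughly like the sketch below, which checkpoints a counter to a local file and reloads it on restart. The class name, file name, and JSON layout are assumptions chosen for illustration.

```python
import json
import os
import tempfile


class CheckpointedCounter:
    """A counter that saves its state to disk after every update."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.count = 0
        if os.path.exists(path):
            # Recover the last saved state after a restart.
            with open(path) as f:
                self.count = json.load(f)["count"]

    def increment(self) -> int:
        self.count += 1
        # Write to a temporary file, then rename, so a crash mid-write
        # does not leave a truncated checkpoint behind.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"count": self.count}, f)
        os.replace(tmp_path, self.path)
        return self.count


with tempfile.TemporaryDirectory() as directory:
    state_file = os.path.join(directory, "state.json")
    counter = CheckpointedCounter(state_file)
    counter.increment()
    counter.increment()
    restarted = CheckpointedCounter(state_file)  # simulates a process restart
    print(restarted.count)
```

A stateless version would simply keep `count` in memory and start again from zero after a failure, which is simpler but pushes recovery onto the client or another service.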
Distributed System Pitfalls • Peter Deutsch identifies false assumptions made when building distributed systems • The network is reliable • The network is secure • The network is homogeneous • The topology does not change • Latency is zero • Bandwidth is infinite • Transport cost is zero • There is one administrator
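Code that rejects the first assumption, that the network is reliable, usually retries failed calls. The sketch below shows one common pattern, retry with exponential backoff; `call_with_retries` and the simulated `flaky_fetch` operation are hypothetical, and the delays are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call `operation`, retrying on ConnectionError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the failure to the caller.
            sleep(base_delay * 2 ** attempt)  # 0.1 s, 0.2 s, 0.4 s, ...
    raise AssertionError("unreachable")


# Simulated remote call that fails twice before succeeding.
failures_left = [2]


def flaky_fetch() -> str:
    if failures_left[0] > 0:
        failures_left[0] -= 1
        raise ConnectionError("simulated network failure")
    return "payload"


print(call_with_retries(flaky_fetch, base_delay=0.01))
```

Retries are only safe when the operation can be repeated without side effects, and the waiting they add is itself a reminder that latency is not zero.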
References • http://webdam.inria.fr/Jorge/html/wdmch15.html • Google Images