Dive into the world of cloud computing and grid computing, comparing data and computing grids, Hadoop File System, MapReduce, and more. Learn about the advantages of Hadoop and the challenges of cloud computing. Discover the nuances of grid computing, distributed data storage, and parallel execution. Get insights into how cloud computing is redefining modern computing paradigms, offering scalability, reduced costs, and improved resource utilization.
Cloud Computing: What, why, how? Noam Bercovici, Renata Dividino
Motivation • Count how frequently each word appears in the MEDLINE corpus (18 million texts)
Motivation • I want to extend my research to another corpus • Need more computing resources
Agenda • Introduction • Data Grid vs. Computing Grid • Grid Computing • Cloud Computing • Data Grid (Hadoop File System) • Computing Grid (MapReduce) • Conclusion
Data Grid vs. Computing Grid • Data Grid: distributed data storage; controlled sharing and management of large amounts of distributed data • Computing Grid: parallel execution; pieces of a program are divided among several computers • Grid Computing: Data Grid + Computing Grid
Grid Computing • [Diagram: the Grid, with a Master, a Task, and Slaves]
Grid Computing • Motivation: high performance, improved resource utilization • Aims to create the illusion of a simple yet powerful computer out of a large number of heterogeneous systems • Tasks are submitted to the grid and distributed across its nodes
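The master/slave task distribution described above can be sketched on a single machine. This is only an illustration under loud assumptions: a thread pool stands in for the grid nodes, and the names `run_task` and `master` are hypothetical, not part of any grid framework.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(task):
    """Work done by one worker ("slave"); here it just squares a number."""
    return task * task


def master(tasks, n_workers=4):
    """Hand every task to the worker pool and collect results in task order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map distributes tasks over the workers and preserves input order
        return list(pool.map(run_task, tasks))


results = master(range(5))
print(results)
```

The caller sees only `master(...)`, which is the "illusion of a single computer": how many workers exist and which one ran which task stays hidden.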
Cloud Computing • "The interesting thing about cloud computing is that we've redefined cloud computing to include everything that we already do." • Larry Ellison, during Oracle's Analyst Day
Cloud Computing • Pay-as-you-go • No initial investments • Reduced operation costs • Scalability • Availability
Cloud Computing - Open Issues • Bandwidth and latency • Lack of standards and portability • "Black-box" implementations • Security and lack of control • Immature tools and framework support • Legal issues (ownership, auditing, etc.) • Limited Service Level Agreements (SLAs)
Data Grid (Hadoop FS - Overview) • Caching of data • [Diagram: a Client asks the Namenode (master node) for a specific text …; the Namenode holds the index and metadata (Name, .., ..); block operations go to the Datanodes (slave nodes), which replicate blocks]
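The roles in this overview can be illustrated with a toy, in-memory model. It is a sketch under stated assumptions, not the Hadoop FS implementation or its API: the class `ToyNamenode`, the round-robin placement, the tiny block size, and the replication factor of 2 are all invented for illustration, and the real client/datanode protocol is collapsed into two methods.

```python
import itertools


class ToyNamenode:
    """Keeps only metadata: which blocks make up a file and where replicas live."""

    def __init__(self, datanodes, block_size=4, replication=2):
        if replication > len(datanodes):
            raise ValueError("need at least as many datanodes as replicas")
        self.datanodes = {name: {} for name in datanodes}  # node -> {block_id: data}
        self.metadata = {}  # filename -> [(block_id, [node, ...]), ...]
        self.block_size = block_size
        self.replication = replication
        self._next_node = itertools.cycle(datanodes)  # hypothetical placement policy

    def write(self, filename, data):
        """Split data into blocks and store each block on several datanodes."""
        blocks = []
        for start in range(0, len(data), self.block_size):
            block_id = f"{filename}#{start // self.block_size}"
            nodes = [next(self._next_node) for _ in range(self.replication)]
            for node in nodes:
                self.datanodes[node][block_id] = data[start:start + self.block_size]
            blocks.append((block_id, nodes))
        self.metadata[filename] = blocks

    def read(self, filename, failed=()):
        """Reassemble a file, skipping datanodes listed as failed."""
        parts = []
        for block_id, nodes in self.metadata[filename]:
            alive = [n for n in nodes if n not in failed]
            if not alive:
                raise IOError(f"all replicas of {block_id} are unavailable")
            parts.append(self.datanodes[alive[0]][block_id])
        return "".join(parts)


namenode = ToyNamenode(["dn1", "dn2", "dn3"])
namenode.write("a.txt", "hello world")
print(namenode.read("a.txt", failed={"dn1"}))
```

Because every block lives on two datanodes, the read still succeeds when one datanode is marked as failed, which is the point of replication in the diagram.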
Counting Words in Text Files • [Diagram: Split-Operation, then Map-Operation, then Reduce-Operation. Each split is passed to countWords(File), which emits counts per word (w1, w2, w3, w4, w5, …); the reduce step combines them into totals, e.g. w1: 6, w2: 14, w3: 15, w4: 17, w5: 1]
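The split, map, and reduce steps for word counting can be sketched in plain Python. This is a minimal single-process illustration, not Hadoop's MapReduce API: `count_words` mirrors the `countWords(File)` label from the diagram, while `reduce_counts` and the sample texts are assumptions made up for the example.

```python
from collections import Counter


def count_words(text):
    """Map step: count the words of one split (one file's text)."""
    return Counter(text.lower().split())


def reduce_counts(partial_counts):
    """Reduce step: sum the per-split counts into one total per word."""
    totals = Counter()
    for partial in partial_counts:
        totals.update(partial)  # Counter.update adds counts, it does not overwrite
    return dict(totals)


# Split step: each string stands in for one file of the corpus (made-up data)
splits = ["the grid the cloud", "the cloud"]

partials = [count_words(text) for text in splits]  # could run on separate nodes
totals = reduce_counts(partials)
print(totals)
```

Each call to `count_words` depends only on its own split, so the map calls could run on different nodes in parallel; only the reduce step needs to see all partial results.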
Advantages of Hadoop • Written purely in Java; requires installation of Cygwin under Windows • Available under LGPL and Apache 2.0 license • Usually offers only one implementation for each feature of a grid framework • May also use file systems other than Hadoop FS • Very flexible implementation of MapReduce • For the split operation, supports only FileSplit out of the box • Better suited for computations where … • … large data collections have to be handled • … the reduce operation is more than a simple aggregation of the map's output
Thank you! • Questions?