
Building Scalable, High Performance Cluster and Grid Networks: The Role Of Ethernet

Thriveni Movva, CMPS 5433


Presentation Transcript


  1. Building Scalable, High Performance Cluster and Grid Networks: The Role Of Ethernet • Thriveni Movva • CMPS 5433

  2. Overview • About Grids/Clusters • Uses of Grid • Differences between Grids/Clusters • Benefits of Grid • Grid Architecture • Building Ethernet Network for Grids/Clusters • Examples of Ethernet Grids/Clusters • Conclusion/Summary

  3. What Is A Grid Computer? • A hardware and software system • Integrates a collection of distributed system components: computer systems, storage, etc. • Solves large-scale computation problems • Appears to the user as a single, large “virtualized” computing system • Consists of geographically dispersed computers

  4. What is a Cluster? • Multiprocessor system consisting of co-located computers and storage • Viewed as though it were a single computer • Connected through fast local area networks (localized within a room or building) • Provides more speed and/or reliability than a single computer • More cost-effective than single computers of comparable speed or reliability

  5. Uses of Grid Computing • Computer systems and other resources need not be dedicated to individual users or applications • They can be made available for dynamic pooling and sharing according to changing needs • Over the Internet, Grid-based resource sharing and collaborative problem solving can be extended to multi-institutional “Virtual Organizations”

  6. Differences between Grids/Clusters • Grids: dispersed over a local, metropolitan, or wide area network; span administrative boundaries; focus on problems in distributed computing and resource sharing; distribute workloads among different machine types and operating systems • Clusters: localized within a room or building; single administration; focus on compute-intensive problems and HPC; homogeneous (single type of processor and OS)

  7. Benefits Of The Grid • Grid computing offers a number of potential uses and benefits, which fall into three broad categories: • High Performance Computing (HPC) • Data Federation and Collaboration • Resource Allocation and Optimization

  8. High Performance Computing (HPC) • Benefits computationally intensive, parallelizable applications • Uses an array of numerous commodity or specialized systems • Most applications of the Grid fall into the HPC classification • Advantages of HPC: cost-effective solutions to critical problems; high return on investment; solves problems that were previously unsolvable within the given time and cost; solves problems too large for conventional supercomputers • Fields in which the HPC Grid has successfully addressed a wide range of computational problems include climate/weather/ocean modeling and simulation, Internet search engines, signal/image processing, pharmaceutical research, and military forces simulation

  9. Data Federation and Collaboration • Consolidates data from different sources into a single data service • Hides data location, local ownership, and infrastructure from the application • Does not disrupt local users, applications, or data management policies • Facilitates a wide range of integrated applications, such as: corporate performance dashboards; marketing analysis tools; customer service applications; data mining applications

  10. Resource Allocation and Optimization • Shares computing and storage to improve resource utilization • For example, applications and batch jobs can be transferred to an idle server • Benefits of resource optimization: reclaims much of the stranded capacity of the computing infrastructure; reduces the level of capital investment; requires no modification of existing applications

  11. Grid Computing Architecture • The basic architecture of a Grid consists of: • User Interface • Applications • Grid Middleware • Computing Resources • Grid Network

  12. Applications • Classification of parallel applications: • Embarrassingly Parallel Computations (EPC): divided into independent parts; allocated to multiple processors for simultaneous execution; no communication is required between the processors; example: testing large integers to determine prime numbers • Parametric and Data Parallel Computations, also referred to as Nearly Embarrassingly Parallel Computations (NEPC): each processor works on an independent subset of the data; the data is later gathered by a single process; example: Internet search engines • Loosely Coupled Synchronous Parallel Computations: require inter-process communication among a small subset of processors before the computation can be completed
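
The embarrassingly parallel case can be illustrated with the slide's own example, primality testing. The following is only a minimal sketch: the function names are hypothetical, trial division stands in for a real primality test, and a local thread pool stands in for the processors of a cluster or Grid (threads keep the sketch portable; a process pool or Grid scheduler would supply actual parallelism). The point it shows is that each candidate is tested with no communication between tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n: int) -> bool:
    """Trial division; deliberately simple, not tuned for very large integers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def find_primes(candidates, executor):
    """Test every candidate as an independent task; tasks never communicate."""
    candidates = list(candidates)
    # Executor.map returns results in the same order as the inputs.
    flags = executor.map(is_prime, candidates)
    return [n for n, ok in zip(candidates, flags) if ok]


if __name__ == "__main__":
    # Assumption: four local workers stand in for four Grid processors.
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(find_primes(range(1_000_000, 1_000_050), pool))
```

Because `find_primes` only needs an object with a `map` method, the same function could be handed a process pool or a remote executor without changing the per-candidate work.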

  13. Grid Middleware • Gives the Grid the semblance of a single computer system • Provides coordination among computing resources of the Grid • Provides location transparency • Allows the applications to run over a virtualized layer of networked resources • Available from system vendors and independent software vendors • Example: Globus Toolkit

  14. Functions of Middleware • Discovery and monitoring: discovers which resources or services are available; monitors their status • Resource allocation and management: matches application requirements to the available computing resources; creates and schedules remote jobs as required; ensures optimum load balancing and resource utilization • Security: shared resources may contain sensitive information; secures communications and authenticates user identities using SSL/TLS, etc. • Message passing system: used by compute-intensive parallel applications for inter-process communication; examples: MPI (Message Passing Interface) and PVM (Parallel Virtual Machine)
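
The resource allocation function can be sketched as a matching step between job requirements and free capacity. This is an illustration only, not how Globus or any other middleware implements scheduling: the `Resource` and `Job` types, their fields, and the greedy "most free CPUs first" policy are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    free_cpus: int
    free_mem_gb: int


@dataclass
class Job:
    name: str
    cpus: int
    mem_gb: int


def allocate(jobs, resources):
    """Place each job on the feasible resource with the most free CPUs.

    Returns {job name: resource name, or None when nothing fits}.
    Hypothetical greedy policy; real middleware uses richer criteria.
    """
    placement = {}
    for job in jobs:
        feasible = [r for r in resources
                    if r.free_cpus >= job.cpus and r.free_mem_gb >= job.mem_gb]
        if not feasible:
            placement[job.name] = None
            continue
        # Favoring the least-loaded resource spreads work across the pool.
        target = max(feasible, key=lambda r: r.free_cpus)
        target.free_cpus -= job.cpus
        target.free_mem_gb -= job.mem_gb
        placement[job.name] = target.name
    return placement
```

A call such as `allocate([Job("j1", 4, 8)], [Resource("a", 8, 32)])` would place `j1` on `a` and reduce that resource's free capacity accordingly.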

  15. Ethernet Networks for Clusters and Grids • Single-switch Clusters • Large Clusters • Ethernet Grid Networks

  16. Single-switch Clusters • Built using a single high-availability Gigabit Ethernet (GbE) switch/router as the cluster interconnect • The maximum size of a single-switch Ethernet cluster is determined by the non-blocking port capacity of the switch • Current switch/routers can interconnect more than 600 GbE-connected servers • All server ports are configured to be in the same subnet

  17. Large Clusters • Built using meshes of federated Ethernet switches • The Ethernet switches use non-blocking, Constant Bisectional Bandwidth (CBB) topologies • CBB: provides scalability to support thousands of cluster nodes; provides high-bandwidth connectivity to the network • The core of the cluster gives each node switch an equal share of the load to avoid blocking of ports
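
How a CBB design reaches thousands of nodes can be seen with a little port arithmetic. The sketch below assumes one particular arrangement that the slides do not specify: a two-tier topology in which every port runs at the same speed, each node switch dedicates at least as many ports to uplinks as to servers, and each node switch has exactly one uplink to every core switch. The function names are hypothetical.

```python
def oversubscription(downlinks: int, uplinks: int) -> float:
    """Ratio of server-facing to core-facing bandwidth (1.0 means non-blocking)."""
    return downlinks / uplinks


def max_nodes_two_tier(node_switch_ports: int, core_switch_ports: int) -> dict:
    """Size a non-blocking two-tier mesh under the assumptions stated above."""
    downlinks = node_switch_ports // 2            # ports facing servers
    uplinks = node_switch_ports - downlinks       # ports facing the core
    core_switches = uplinks                       # one uplink per core switch
    node_switches = core_switch_ports             # each core port feeds one node switch
    return {
        "core_switches": core_switches,
        "node_switches": node_switches,
        "nodes": node_switches * downlinks,
        "oversubscription": oversubscription(downlinks, uplinks),
    }


if __name__ == "__main__":
    print(max_nodes_two_tier(48, 48))
```

Under these assumptions, 48-port switches at both tiers give 48 node switches with 24 servers each, or 1152 servers, with equal bandwidth toward the servers and toward the core.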

  18. Ethernet Grid Networks (Campus Grid network based on Ethernet switching) • Ethernet allows the cluster to participate in a broader campus or enterprise Grid structure • Desktop computers and workstations are connected to the campus Grid network using GbE • Server farms outside of the cluster are connected to site switches using GbE • Goals of campus LANs: give high priority to general Grid traffic; ensure critical Grid traffic does not incur any added latency

  19. Grid Tools • Tools used to prioritize critical Grid traffic: • Priority queuing: the forwarding capacity of a congested port is immediately allocated to any high-priority traffic that enters the queue • Rate limiting and policing: limits the amount of lower-priority traffic that enters the network • Weighted Random Early Discard (WRED): packet loss from buffer overflow can be avoided if buffers are never allowed to fill to capacity; overflows can be avoided by applying WRED to the lower-priority traffic; WRED thus prevents high-priority packets from arriving at a buffer that is already overflowing with lower-priority packets
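
The WRED behavior described above can be sketched with the usual RED drop curve: no drops below a minimum average queue depth, certain drops at or above a maximum, and a linearly rising probability in between, with a separate curve per priority class. The thresholds, class names, and averaging weight below are illustrative assumptions, not values from the presentation or from any switch.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class WredProfile:
    min_th: int    # average depth (packets) below which nothing is dropped
    max_th: int    # average depth at or above which everything is dropped
    max_p: float   # drop probability reached just below max_th


# Hypothetical profiles: low-priority traffic is discarded much earlier,
# leaving buffer headroom for high-priority packets.
PROFILES = {
    "low": WredProfile(min_th=20, max_th=60, max_p=0.5),
    "high": WredProfile(min_th=80, max_th=100, max_p=0.05),
}


def update_average(avg: float, current_depth: int, weight: float = 0.1) -> float:
    """Exponentially weighted moving average of the queue depth."""
    return (1.0 - weight) * avg + weight * current_depth


def drop_probability(avg_depth: float, profile: WredProfile) -> float:
    if avg_depth < profile.min_th:
        return 0.0
    if avg_depth >= profile.max_th:
        return 1.0
    span = profile.max_th - profile.min_th
    return profile.max_p * (avg_depth - profile.min_th) / span


def admit(priority: str, avg_depth: float, rng=random.random) -> bool:
    """Return True if the arriving packet is enqueued, False if discarded."""
    return rng() >= drop_probability(avg_depth, PROFILES[priority])
```

With these example profiles, an average depth of 70 packets discards every arriving low-priority packet while high-priority packets are still always admitted, which is the separation the slide describes.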

  20. Examples of Ethernet Clusters/Grids • TeraGrid: a multi-institutional effort to build and deploy the world’s most comprehensive computing infrastructure for open scientific research • NASA: uses the ESDCD “Grid of clusters” to help scientists increase their understanding of the Earth, the solar system, and the universe through computational modeling and processing of space-borne observations

  21. Conclusion/Summary • Ethernet continues to evolve as a highly cost-effective and flexible technology • The majority of parallel and general Grid applications are very well served by the performance characteristics of Ethernet as the cluster/Grid interconnect • In the future, Ethernet end-to-end data transfer bandwidths, message latencies, and CPU utilization will improve dramatically due to NIC enhancements • Volume production is leading to price declines • These developments are expected to improve the overall performance of existing Ethernet clusters/Grids and to broaden the use of cluster/Grid technology across a wider range of commercial enterprises
