
HPC Open Forum for Researchers


Presentation Transcript


  1. HPC Open Forum for Researchers

  2. Overview
     • Received $1.8 million grant to expand Cardinal Research Cluster (CRC) and research computing infrastructure
     • Identified weak links in CRC
     • Identified needs for new hardware based on current usage and requests
     • Developed recommendations

  3. Cardinal Research Cluster - CRC (diagram)
     • Campus network connectivity: 1/10 Gbps Ethernet
     • Login nodes: crc.hpc.louisville.edu
     • SMP IBM p570: 16 CPUs
     • High Performance Computing Cluster: 304 nodes / 2432 cores, 16 or 32 GB/node, 4x DDR InfiniBand (per-node arithmetic sketched below)
     • Global Storage; Informatics Server/Storage: 100+ TB
     • Visualization Cluster (CECS); visualization server; statistical server
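As a quick check of what these headline figures imply per node, here is a minimal arithmetic sketch; the aggregate-RAM range simply assumes every node sits at the 16 GB or 32 GB bound.

     # Per-node figures implied by the cluster totals on the slide above.
     nodes, cores = 304, 2432
     cores_per_node = cores // nodes        # 2432 / 304 = 8 cores per node
     ram_low  = nodes * 16                  # all nodes at 16 GB -> 4864 GB aggregate
     ram_high = nodes * 32                  # all nodes at 32 GB -> 9728 GB aggregate
     print(cores_per_node, ram_low, ram_high)   # 8 4864 9728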

  4. CRC Limitations
     • Network limitations
        • Network switch has no free ports; no room for expansion
        • Limited capacity to campus backbone and Internet2
     • Storage limitations
        • Scratch space is already becoming full
        • Slow/unreliable performance of GPFS storage
        • Lack of a good archiving system
     • Single points of failure
        • No redundancy in storage servers; all must be online to function
        • No backup hardware for management, queue, and user nodes

  5. Usage Trends
     • Lots of serial or single-node jobs, very few massively parallel jobs (illustrated in the sketch below)
     • Bioinformatics jobs
     • Molecular dynamics jobs
     • Some Gaussian jobs are single node; none should require more than ~4 nodes
     • Current massively parallel jobs are well served by the existing InfiniBand nodes
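To illustrate why this mix favors sheer node count over interconnect speed, here is a toy comparison; the 5% serial fraction and the 168-core pool (one blade center, per slide 11) are illustrative assumptions, not measurements.

     # Toy model: many independent serial jobs vs. one tightly coupled
     # parallel job on the same pool of cores (numbers are illustrative).
     cores = 168                        # e.g., one blade center (slide 11)

     # Independent serial jobs never communicate, so completed-job throughput
     # scales linearly with core count and needs no fast interconnect.
     serial_throughput = cores          # 168 jobs per unit job-time

     # One parallel job with an assumed 5% serial fraction (Amdahl's law)
     # saturates long before 168 cores, however fast the network is.
     s = 0.05
     amdahl_speedup = 1 / (s + (1 - s) / cores)
     print(serial_throughput, round(amdahl_speedup, 1))   # 168 18.0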

  6. Researcher Requests
     • Expand storage capacity
     • Provide the ability to have larger quotas
     • Provide data archiving and management
     • Expand visualization servers
     • Provide the ability to quickly add application servers

  7. Other Considerations
     • Need for a separate statistical server to free the shared-memory p570 system to focus on computation
     • Need to implement the second phase of Oracle RAC redundancy (extended RAC)
     • Need for general-purpose application servers that can be allocated to dedicated research applications
     • Need for local scratch disks on compute nodes
     • Need for facilities upgrades (cooling and power)

  8. Recommendation - Networking
     • Remark: The CRC network switch cannot be expanded and is a single point of failure
     • Recommendation: Redesign networking for expansion of the research computing infrastructure and improved connectivity (a rough sketch of the layout follows)
        • Add a new core switch for shared resources, including storage, user nodes, the p570, and the visualization and statistical servers
        • Add a switch for expansion of compute nodes and servers on the CRC
        • Expand connectivity to the campus backbone network and Internet2
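Purely as an illustration of the grouping described above (device names are placeholders, and the exact attachment of each compute pool to CRC-1 versus CRC-2 is an assumption), the redesigned two-tier layout could be pictured like this:

     # Assumed two-tier layout: edge switches uplink to a new core switch,
     # which also carries the shared resources (names are placeholders).
     topology = {
         "core-switch": [
             "global-storage", "login-nodes", "p570-smp",
             "visualization-servers", "statistical-server",
             "campus-backbone", "internet2",
             "crc-1-switch", "crc-2-switch",          # edge-switch uplinks
         ],
         "crc-1-switch": ["hpc-compute-nodes"],                         # assumed
         "crc-2-switch": ["serial-job-cluster", "application-servers"], # assumed
     }

     def path_to(shared_resource, start="hpc-compute-nodes"):
         """List the hops from a leaf device up through its switches to a
         shared resource hanging off the core switch."""
         parent = {dev: sw for sw, devs in topology.items() for dev in devs}
         hops, node = [start], start
         while node in parent:
             node = parent[node]
             hops.append(node)
         return hops + [shared_resource]

     print(path_to("global-storage"))
     # ['hpc-compute-nodes', 'crc-1-switch', 'core-switch', 'global-storage']

The point of the extra tier is that compute expansion consumes ports on the edge switches rather than on the core, while shared resources stay one hop behind the core switch.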

  9. CRC – Network Redesign

  10. Recommendation - Storage
     • Remark: Address storage space expansion and performance issues
     • Recommendation:
        • Add storage space
        • Increase the number of storage servers
        • Increase the allocation of scratch space
        • Review the quota structure with the governance committee
        • Develop archiving systems
        • Continue to address GPFS tuning concerns

  11. Recommendation - Computation
     • Remark: Lots of serial or single-node jobs, very few massively parallel jobs
     • Recommendation: Implement a new cluster optimized for high-throughput serial processing
        • Use blade centers as a low-cost way to maximize the number of compute nodes
        • 14 nodes per blade center (168 cores) allows most jobs to run within a single blade center, with a high-speed network among its nodes (see the sizing sketch below)
        • The network between blade centers offers less optimal inter-blade communication than the intra-blade network
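A minimal sketch of the sizing arithmetic behind this recommendation, using the 14-node/168-core figure from this slide and the ~4-node Gaussian case from slide 5; treating that as the largest routinely expected job is an assumption.

     # Sizing figures from this slide: 14 nodes and 168 cores per blade center.
     nodes_per_center = 14
     cores_per_center = 168
     cores_per_node = cores_per_center // nodes_per_center   # 12 cores per node

     # Largest routinely expected job: a ~4-node Gaussian run (slide 5).
     largest_job_nodes = 4
     largest_job_cores = largest_job_nodes * cores_per_node  # 48 cores
     fits_in_one_center = largest_job_nodes <= nodes_per_center
     print(cores_per_node, largest_job_cores, fits_in_one_center)   # 12 48 True

So even the largest common multi-node case stays inside one chassis and never touches the slower inter-chassis network; only jobs spanning multiple blade centers would see the less optimal path.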

  12. Recommendation – Computation
     • Remark: Address requested and required capabilities
     • Recommendations:
        • Add dedicated statistical server
        • Implement extended Oracle RAC
        • Add rack of general-purpose servers
        • Add visualization systems
        • Expand local scratch disk on compute nodes
        • Provide backup server(s) for queue and management nodes

  13. Datacenter Requirements
     • Proposed project to upgrade cooling and electrical in the darkroom
     • Submitted ARI-R2 grant application: stimulus funding for renovation or expansion of a research facility
        • $400,000 for datacenter renovation
        • $450,000 for network expansion
        • Decision expected by January 2010

  14. Software Needs
     • First round of software acquired
     • $85,000 committed to ongoing support
     • $65,000 available for additional acquisitions
     • Need to define needs and priorities for this year

  15. Summary of Recommendations
     • Redesign cluster network around a core switch
     • Expand storage and address performance issues
     • Add compute cluster optimized for serial jobs
     • Provide additional statistical, visualization, and general-purpose application servers
     • Upgrade datacenter facilities to accommodate cluster upgrades

  16. CRC - before (diagram)
     • Campus network connectivity: 1/10 Gbps Ethernet
     • Login nodes: crc.hpc.louisville.edu
     • SMP IBM p570: 16 CPUs
     • High Performance Computing Cluster: 304 nodes / 2432 cores, 16 or 32 GB/node, 4x DDR InfiniBand
     • Global Storage; Informatics Server/Storage: 100+ TB
     • Visualization Cluster (CECS); visualization server; statistical server

  17. CRC - after (diagram)
     • Campus network connectivity: 1/10 Gbps Ethernet
     • New core switch plus CRC-1 and CRC-2 switches
     • Login nodes: crc.hpc.louisville.edu
     • SMP IBM p570: 16 CPUs
     • High Performance Computing Cluster: 304 nodes / 2432 cores, 16 or 32 GB/node, 4x DDR InfiniBand
     • Serial/Small Job Cluster
     • Application servers
     • Global Storage; Informatics Server/Storage: 100+ TB
     • Visualization Cluster (CECS); visualization servers; statistical server

  18. Comments and Questions
