HPC USER FORUM I/O PANEL, April 2009, Roanoke, VA
Panel questions: 1 response per question; limit length to 1 slide
Q1: With the formalization of Parallel NFS as a standard, what steps are being taken to enable it to be hosted on current (and future) platform choices?
• Q2: What tools are available to help optimize this (from the application layer all the way to the archival stage)? What is missing, and who should provide it?
• Q3: We are all facing complexity and cost issues. With IB or 10 GbE (40/100 GbE), where should the HPC community focus its resources for all I/O?
• Q4: Too many standards: interconnects and media layers are issues today. iSCSI/FCoE/FCoCEE/FCoIB have all been touted as the solution(s). Is it even relevant in the HPC arena? Is fragmentation the only choice?
• Q5: What do you consider to be the top 3 main (technical or human) issues in HPC I/O?
Q1. Parallel NFS is finally here!
• Current platforms are likely to be folded in with non-parallel servers and updated with clients as parallel resources become of interest.
• Future general-purpose and capacity HPC systems will likely be sought with pNFS capabilities
• Allowing for decoupling of storage and compute system purchase cycles, and potentially for more organic growth of the former.
• Future capability machines still warrant coupled storage capabilities
• The scale of compute and storage needs follows technology trends, such that buying storage today that is sufficient for a big system next year isn't an efficient investment
• Nor is the working storage for last year's system likely to be sufficient for today's.
• The lure of squeezing out "10% more" performance by using an accelerated access method is still strong for the core system, but the "out of the box" availability of parallel clients (a minimal client-side check is sketched below) will better enable the data-center ecosystem (post-processing, data transfer, archiving, front-end & other systems)
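As a rough illustration of the "out of the box" client availability point above, here is a minimal sketch (assuming a Linux client and a hypothetical /scratch mount point) that checks whether a mount shows up as NFSv4.1, the minor version that introduced pNFS, by reading /proc/mounts. An NFSv4.1 mount does not by itself prove a pNFS layout is actually in use; this is only a starting point, not a definitive capability test.

```python
#!/usr/bin/env python3
"""Minimal sketch: does a mount point look like an NFSv4.1 (pNFS-capable) mount?

Assumes a Linux client where /proc/mounts lists NFS mounts with an option
string containing "vers=4.1" (or "minorversion=1" on older distributions).
The mount point below is hypothetical.
"""

MOUNT_POINT = "/scratch"          # hypothetical pNFS-backed mount point


def looks_like_pnfs_capable(mount_point: str) -> bool:
    """Return True if the mount is NFSv4 minor version 1 per /proc/mounts."""
    with open("/proc/mounts") as mounts:
        for line in mounts:
            device, mountpt, fstype, options = line.split()[:4]
            if mountpt != mount_point:
                continue
            if fstype not in ("nfs", "nfs4"):
                return False
            opts = options.split(",")
            # NFSv4.1 is the first minor version that defines pNFS layouts.
            return "vers=4.1" in opts or "minorversion=1" in opts
    return False


if __name__ == "__main__":
    capable = looks_like_pnfs_capable(MOUNT_POINT)
    print(f"{MOUNT_POINT}: NFSv4.1 (pNFS-capable) mount: {capable}")
```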
Q2. Parallel NFS – implementation details…
• The need to abstract some of the complication allows for faster adoption and easier usage, but tools for administrators to observe and analyze operations are likely to be crucial to realizing the full potential (a toy observation script is sketched below).
• Tools for this purpose will necessarily come from some pNFS vendors, though development by or contribution to the open community should help further adoption.
• Having a decent DIY option speeds adoption, though it likely still leaves plenty of room for differentiated vendor offerings for those who need more/better.
• Exposure of advanced controls for advanced users should also be present in the clients. Sane defaults set by administrators help, but "one size fits all" settings can be troublesome for a diverse application base.
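As a loose example of the DIY observation tooling mentioned above, the sketch below tallies cumulative per-operation counts for one NFS mount from the Linux client's /proc/self/mountstats. The /scratch mount point is hypothetical, the field layout is assumed from typical Linux NFS client output, and a real tool would sample deltas over time rather than print one snapshot.

```python
#!/usr/bin/env python3
"""Minimal sketch: per-operation NFS counts from /proc/self/mountstats.

Assumes a Linux NFS/pNFS client.  The mount point is hypothetical and the
per-op line layout ("<OP>: <cumulative-count> ...") reflects typical
mountstats output; vendor tooling would go much further than this.
"""

MOUNT_POINT = "/scratch"          # hypothetical pNFS-backed mount point


def per_op_counts(mount_point: str) -> dict:
    """Return {operation_name: cumulative_count} for the given NFS mount."""
    counts = {}
    in_target_mount = False
    in_per_op = False
    with open("/proc/self/mountstats") as stats:
        for line in stats:
            if line.startswith("device "):
                in_target_mount = f" mounted on {mount_point} " in line
                in_per_op = False
            elif in_target_mount and "per-op statistics" in line:
                in_per_op = True
            elif in_target_mount and in_per_op and ":" in line:
                op, fields = line.strip().split(":", 1)
                # First field after the op name is the cumulative op count.
                counts[op] = int(fields.split()[0])
    return counts


if __name__ == "__main__":
    counts = per_op_counts(MOUNT_POINT)
    # A few operations an admin might watch on a pNFS client; LAYOUTGET
    # activity suggests layouts are actually being handed out.
    for op in ("READ", "WRITE", "GETATTR", "LAYOUTGET"):
        print(f"{op:>10}: {counts.get(op, 0)}")
```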
Q3. Physical media interconnects…
• Given enough time, Ethernet wins most battles.
• Datacenter heterogeneity and cost-effective 10GigE LOM are likely to command volume for clients at large.
• Focus on backward compatibility and cost compromises may not serve the higher-bandwidth needs.
• IB bandwidth cost-effectiveness and density are still hard to beat.
• Current lead and narrower focus may allow IB to continue to be differentiated and attractive to the HPC [storage] market.
• Renewed ASIC competition should lead to better and/or cheaper options. LOMs are already addressing host cost.
• Tiered/hybrid access (native IB clients vs. 10GigE clients via gateways) fits the needs of capability systems (rough gateway arithmetic is sketched below).
• For HPC systems, meeting goals within budget is challenging, and semi-commodity interconnects (IB) may still warrant their premium & effort.
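For the tiered/hybrid access bullet above, a back-of-the-envelope sketch of gateway sizing is given below. The link data rates are the nominal figures of the era (4x QDR InfiniBand ~32 Gb/s after 8b/10b encoding, 10GigE = 10 Gb/s); the client count, per-client demand, and efficiency factor are purely hypothetical illustration values, not measurements.

```python
#!/usr/bin/env python3
"""Back-of-the-envelope sketch for tiered/hybrid (IB + 10GigE) access.

Link rates are nominal; ETH_CLIENTS, CLIENT_DEMAND_GBPS, and EFFICIENCY
are purely hypothetical illustration values.
"""

import math

QDR_IB_GBPS = 32.0        # 4x QDR data rate after 8b/10b encoding (nominal)
TENGIGE_GBPS = 10.0       # 10 Gigabit Ethernet (nominal)
EFFICIENCY = 0.7          # hypothetical protocol/gateway efficiency

ETH_CLIENTS = 200         # hypothetical number of 10GigE-attached clients
CLIENT_DEMAND_GBPS = 2.0  # hypothetical sustained demand per client

aggregate_demand = ETH_CLIENTS * CLIENT_DEMAND_GBPS
per_gateway_capacity = QDR_IB_GBPS * EFFICIENCY
gateways_needed = math.ceil(aggregate_demand / per_gateway_capacity)

print(f"Aggregate 10GigE client demand : {aggregate_demand:.0f} Gb/s")
print(f"Usable capacity per IB gateway : {per_gateway_capacity:.1f} Gb/s")
print(f"Gateways needed (nominal)      : {gateways_needed}")
```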
Q4. Layer protocols above the interconnects
• HPC sites are often early adopters and become accustomed to working with technology selections for project durations, even if the market shifts toward competing technology.
• Tolerance for less-standardized technology is reasonably high if the comparative advantages over more mainstream options are worthwhile.
• Use of locally attached storage, shared at the file or object level, isolates many HPC installations from the pain of fragmentation.
• Convergence that leads to reduced costs and greater choice through interoperability is a benefit to the HPC arena.
Q5. I/O issues not yet addressed?
• Dealing with increased client count.
• File systems: increasing core counts increase metadata load (it is not uncommon for file count to grow in proportion to the number of compute cores).
• Too much concurrency of sequential [user] streams may look random to the storage (some relief via middleware such as PSC Zest and LANL PLFS); the sketch below illustrates the effect.
• Relaxing (suspending/breaking) POSIX compliance may help.
• Increasing demand for database access as a means of coping with large data sets imposes an IOPS demand orthogonal to aggressive bandwidth demands: can/should/must a common capability satisfy both?
• The cost of storage media bandwidth is straining budgets that try to chase computational advances.
• Newer technologies provide more options for IOPS or bandwidth, but not capacity, so disk (and tape) are still in the picture and are not fully isolated from the performance struggle.
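To make the "sequential streams look random" bullet concrete, the sketch below prints the byte offsets a storage system would see when several processes each write their own region of a shared file sequentially while their requests interleave in time. The process count, record size, and segmented layout are hypothetical illustration values; middleware such as PLFS or Zest reorganizes exactly this kind of pattern into per-process logs.

```python
#!/usr/bin/env python3
"""Minimal sketch: why per-process sequential I/O can look random to storage.

Each of NPROCS processes owns a contiguous region of a shared file and
writes its own records sequentially, but requests from all processes
interleave in time, so the offsets arriving at the storage system jump
between distant regions.  All sizes are hypothetical illustration values.
"""

NPROCS = 8                 # hypothetical number of writer processes
RECORD_BYTES = 64 * 1024   # hypothetical record size per write
RECORDS_PER_PROC = 4       # hypothetical records written by each process


def file_offset(rank: int, record: int) -> int:
    """Offset of a rank's record in a segmented N-to-1 shared-file layout."""
    return (rank * RECORDS_PER_PROC + record) * RECORD_BYTES


if __name__ == "__main__":
    # Interleave the streams roughly as storage might receive them:
    # one record from each rank per round, round after round.
    arrivals = [(rank, rec) for rec in range(RECORDS_PER_PROC)
                for rank in range(NPROCS)]
    previous = None
    for rank, rec in arrivals:
        offset = file_offset(rank, rec)
        jump = "start" if previous is None else f"{offset - previous:+d}"
        print(f"rank {rank} record {rec}: offset {offset:>8d}  jump {jump}")
        previous = offset
```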