Virtual Streams: Performance-Robust Parallel I/O
Z. Morley Mao, Noah Treuhaft
CS258, 5/17/99, Professor Culler
Introduction
• Clusters exhibit performance heterogeneity: static and dynamic, arising from both hardware and software
• Consistent peak performance demands adaptive software: building performance-robust parallel software means keeping heterogeneity in mind
• This work explores the adaptivity appropriate for I/O-bound parallel programs, and how to provide that adaptivity
Heterogeneity demands adaptivity
[Diagram: processes reading from disks across cluster nodes]
• Physical I/O streams are simple to build and use
• But their performance is highly variable: different drive models, bad blocks, multizone behavior, file layout, competing programs, host bottlenecks
• I/O-bound parallel programs run at the rate of the slowest disk
Virtual Streams
[Diagram: processes accessing disks through a Virtual Streams layer]
• Performance-robust programs want virtual streams that...
  • eliminate dependence on individual disk behavior
  • continually equalize throughput delivered to processes
Graduated Declustering (GD): a Virtual Streams implementation
[Diagram: client processes A and B use the GD client library to reach GD servers holding replicas of A and B]
• Data is replicated (mirrored) for availability
• Use the replicas to provide performance availability, too
• A fast network makes remote disk access comparable to local access
• Distributed algorithm for adaptivity:
  • each client provides information about its progress
  • each server reacts by scheduling requests to even out progress
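The distributed scheduling loop above can be sketched as a small simulation. This is an illustrative model, not the original implementation: each server mirrors data for two clients and, on every scheduling step, serves whichever of its clients has made the least progress so far.

```python
# Hypothetical sketch of progress-based GD scheduling (names are
# illustrative, not from the original implementation).

def schedule_round(progress, server_clients, server_rates):
    """One round: every server sends one block's worth of data to its laggard.

    progress       -- blocks delivered so far, per client id
    server_clients -- for each server, the pair of client ids it can serve
    server_rates   -- blocks each server can deliver this round
    """
    for server, clients in enumerate(server_clients):
        # Serve the client that is furthest behind among this server's replicas.
        laggard = min(clients, key=lambda c: progress[c])
        progress[laggard] += server_rates[server]
    return progress

# Four clients, four servers; server i mirrors data for clients i and (i+1)%4.
server_clients = [(i, (i + 1) % 4) for i in range(4)]
progress = [0, 0, 0, 0]
rates = [1, 1, 1, 1]            # all disks healthy: one block per round
for _ in range(8):
    schedule_round(progress, server_clients, rates)
# With no perturbation, every client advances in lockstep.
```

Lowering one entry of `rates` models a perturbed disk: the local "serve the laggard" rule then shifts requests toward the healthy replicas, which is exactly the equalizing behavior the next slide illustrates.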
GD in action
[Diagram: four clients reading from four mirrored servers. Before perturbation, every server delivers bandwidth B and every client receives B. After Server1 is perturbed down to B/2, the servers redistribute request service so that every client receives 7B/8.]
• Local decisions yield global behavior
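The 7B/8 figure on this slide is just the aggregate-bandwidth arithmetic, which a few lines make explicit:

```python
# Aggregate-bandwidth arithmetic behind the slide: one of four mirrored
# servers drops to B/2, and GD spreads the loss evenly over the clients.
B = 1.0
server_bw = [B, B / 2, B, B]    # Server1 perturbed
aggregate = sum(server_bw)      # 3.5 * B total across the cluster
per_client = aggregate / 4      # equalized share per client
```

So instead of one unlucky client running at B/2 and the rest at B, all four settle at 7B/8 of the unperturbed rate.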
Evaluation of the original (progress-based) GD implementation
[Graph: bandwidth measurements showing seek overhead]
• Seek overhead arises because every client reads from all of its replicas
Deficiency of the original GD implementation: seek overhead
• Under sequential data access, seeks occur even when there is no perturbation
• Seeks become more significant as disk transfer rates increase
• We need a new algorithm that...
  • reads mostly from a single disk when there is no perturbation
  • dynamically adjusts to perturbation when necessary
  • achieves both performance adaptivity and minimal overhead
Proposed solution: response-rate-based GD
• The number of requests a client sends to a server depends on that server's response rate
• Servers use request-queue lengths to make scheduling decisions
• Uses implicit information and is "historyless": no bandwidth information is transmitted between server and client
• Advantage: each client can have a primary server
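The slide's idea can be sketched as a request-window simulation. This is a hypothetical model of the policy, not the actual implementation: a client keeps a fixed window of outstanding requests and sends each new request to whichever server just responded, so faster servers automatically receive more requests with no bandwidth numbers ever exchanged.

```python
# Illustrative sketch of historyless, response-rate-based request flow.
import heapq

def simulate(server_delay, blocks, window=4):
    """Return how many blocks each server delivers to one client."""
    clock = 0.0
    delivered = [0] * len(server_delay)
    events = []  # heap of (completion_time, server)
    # Prime the window round-robin across the replica servers.
    for i in range(window):
        s = i % len(server_delay)
        heapq.heappush(events, (clock + server_delay[s], s))
    for _ in range(blocks):
        clock, s = heapq.heappop(events)
        delivered[s] += 1
        # Historyless policy: reissue the next request to the server
        # that just responded, whatever its reason for being fast.
        heapq.heappush(events, (clock + server_delay[s], s))
    return delivered

shares = simulate([1.0, 2.0], blocks=300)
# The server with half the latency ends up delivering about twice the blocks.
```

Because request routing follows responses rather than reported bandwidth, the scheme is "historyless" in the sense of the next slide: no control traffic, at the cost of slower convergence.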
Evaluation of response-rate-based GD
[Graph: bandwidth vs. number of perturbed disk nodes, showing reduced seek overhead]
Historyless vs. history-based adaptivity
• History-based (progress-based):
  • adjustment to perturbation occurs gradually over time
  • close to perfect knowledge, if the information is not outdated
  • extra overhead in sending control information
• Historyless (response-rate-based):
  • primary-server designation is possible
  • increases sensitivity to real perturbation by creating "artificial" perturbation
  • accounts for the varying performance of data consumers
  • takes longer to converge
Stability and Convergence
• How long does the system take to converge?
  • linear in the number of nodes
  • depends on the last occurrence of perturbation
  • influenced by the style of communication (implicit vs. explicit)
Server request handoff
• When a server finishes all of its requests, it contacts other servers holding the same replicas and helps serve their clients (work stealing)
• Handoff keeps all disks busy whenever possible
• Design decision: how many requests to hand off? It depends on the bandwidth history of both servers and on the size of the request queue
• Benefit vs. cost tradeoff
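A minimal work-stealing sketch makes the handoff mechanics concrete. This is a simplification: the slide says the real policy weighs the bandwidth history of both servers and the queue size, whereas the version below just takes half of the busy peer's pending queue.

```python
# Simplified server request handoff (steal-half policy; the actual
# policy also considered bandwidth history, per the slides).

def handoff(idle_queue, busy_queue):
    """Move half of the busy server's pending requests to the idle one."""
    steal = len(busy_queue) // 2
    if steal:
        # Take from the tail so the busy server keeps its oldest requests.
        idle_queue.extend(busy_queue[-steal:])
        del busy_queue[-steal:]
    return idle_queue, busy_queue

idle, busy = handoff([], list(range(10)))
```

Steal-half is the classic work-stealing compromise: it keeps both disks busy without ping-ponging requests, since each handoff at most halves the imbalance.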
Writes
• Identical to reads, except...
  • writes create incomplete replicas with "holes"
  • track the holes in metadata
  • afterward, do "hole-filling", both for availability and for performance robustness
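The hole-tracking idea can be sketched with a toy replica structure. The slides only say that holes are recorded in metadata and filled afterward, so the representation below (a per-replica set of missing block numbers) is an assumption for illustration:

```python
# Hypothetical hole-tracking sketch: each replica records which blocks it
# missed during a write burst; a background hole-filler later copies the
# missing blocks from a complete peer replica.

class Replica:
    def __init__(self, nblocks):
        self.data = [None] * nblocks
        self.holes = set(range(nblocks))   # blocks not yet written here

    def write(self, block, payload):
        self.data[block] = payload
        self.holes.discard(block)

    def fill_from(self, peer):
        """Hole-filling: copy every missing block from a complete replica."""
        for block in sorted(self.holes):
            self.write(block, peer.data[block])

primary, mirror = Replica(4), Replica(4)
for blk in range(4):
    primary.write(blk, f"data{blk}")
mirror.write(1, "data1")        # the mirror missed blocks 0, 2, and 3
mirror.fill_from(primary)       # afterward, both replicas are complete
```

Once hole-filling completes, the mirror is again a full replica, restoring both availability and the performance headroom that GD's read path relies on.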
Conclusions
• What did we achieve?
  • a new, response-rate-based load-balancing algorithm
  • delivers equal bandwidth to parallel-program processes in the face of performance heterogeneity
  • demonstrated the stability of the system
  • reduced seek overhead
  • server request handoff
  • writes
  • a useful abstraction for streaming I/O in clusters
Future Work
• Hot-file replication
• Regaining peak bandwidth after perturbation ceases
• Achieving orderly replies
• A multiple-disk abstraction