This paper discusses the challenges in communication and synchronization in large-scale distributed machine learning and proposes HiPS, a hierarchical parameter synchronization approach. The paper presents the HiPS design, theoretical evaluation, simulation evaluation, and testbed evaluation, showcasing its effectiveness in reducing communication costs and improving performance in distributed machine learning.
HiPS: Hierarchical Parameter Synchronization in Large-Scale Distributed Machine Learning
Jinkun Geng, Dan Li, Yang Cheng, Shuai Wang, and Junfeng Li
ACM SIGCOMM Workshop on NetAI
Background • Distributed Machine Learning: Computation + Communication
Background • Strong Computation Power (GPU & TPU)
Background • Communication Challenge • TCP: High Latency & Low Throughput, Kernel Overheads, etc. • RDMA: A Promising Alternative to TCP
Background • An MNIST Benchmark with 1 Million Parameters
Background • RoCE/RDMA: Multi-Vendor Ecosystem • Many Problems in Fat-Tree-Based Deployment
Background • Fat-Tree-Based Deployment • PFC Pause Frame Storm [SIGCOMM'15, '16; NS-3 Simulation] • Resilient RoCE: Performance Sacrifice [Chelsio-Tech] • Synchronization Performance
Background • Fat-Tree-Based Deployment • PFC Pause Frame Storm [SIGCOMM'15, '16] • Resilient RoCE: Performance Sacrifice ⇒ Server-Centric Networks
Background • Fat-Tree-Based Deployment • Synchronization Performance ⇒ Hierarchical Synchronization
Background • Server-Centric Networks • Fewer hops lead to fewer PFC pause frames • Servers prevent the cascading effect of PFC pause frames
Background • Synchronization Algorithm • PS-Based • Mesh-Based • Ring-Based
Background • Synchronization Algorithm • PS-Based (Pull + Push)
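The pull + push pattern can be summarized in a few lines. Below is a minimal in-memory sketch (plain numpy, illustrative class and function names, no real networking), not the paper's implementation:

```python
# Minimal sketch of PS-based synchronization (pull + push), assuming one
# logical parameter server and numpy gradients; names are illustrative only.
import numpy as np

class ParameterServer:
    def __init__(self, params):
        self.params = params          # global model parameters

    def pull(self):
        return self.params.copy()     # workers pull the latest parameters

    def push(self, grad, lr=0.01):
        self.params -= lr * grad      # server applies each pushed gradient

def worker_step(ps, compute_grad):
    params = ps.pull()                # pull: fetch current parameters
    grad = compute_grad(params)       # local computation on a mini-batch
    ps.push(grad)                     # push: send the gradient back

# Usage: two workers synchronizing one step through the server.
ps = ParameterServer(np.zeros(4))
for _ in range(2):
    worker_step(ps, lambda p: np.ones_like(p))
print(ps.params)                      # parameters after two pushed gradients
```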
Background • Synchronization Algorithm • Mesh-Based (Diffuse + Collect)
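In the diffuse + collect pattern, each worker diffuses a partition of its gradient to every peer, aggregates the partition it is responsible for, and then collects the aggregated partitions back. A minimal in-memory sketch, assuming fully connected workers and numpy gradients:

```python
# Minimal sketch of mesh-based (diffuse + collect) synchronization, assuming
# N fully connected workers exchanging numpy gradient partitions in memory.
import numpy as np

def mesh_sync(grads):
    n = len(grads)
    parts = [np.array_split(g, n) for g in grads]     # partition each gradient

    # Diffuse: worker j receives partition j from every worker and aggregates it.
    aggregated = [sum(parts[i][j] for i in range(n)) for j in range(n)]

    # Collect: every worker gathers all aggregated partitions back.
    full = np.concatenate(aggregated)
    return [full.copy() for _ in range(n)]

# Usage: 3 workers with 6-element gradients; each ends with the global sum.
grads = [np.full(6, i + 1.0) for i in range(3)]
print(mesh_sync(grads)[0])            # [6. 6. 6. 6. 6. 6.]
```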
Background • Synchronization Algorithm • Ring-Based (Scatter + Gather)
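Ring-based synchronization performs a scatter-reduce pass followed by an all-gather pass around a logical ring. The sketch below simulates both passes in memory (numpy only, no real transport); it illustrates the classic ring allreduce schedule rather than any HiPS-specific variant:

```python
# Minimal simulation of ring allreduce (scatter-reduce + all-gather), assuming
# N workers on a logical ring; pure numpy, no real network transport.
import numpy as np

def ring_allreduce(grads):
    n = len(grads)
    # Each worker splits its gradient into n chunks (copied to avoid aliasing).
    chunks = [list(np.array_split(np.array(g, dtype=float), n)) for g in grads]

    # Scatter-reduce: in n-1 steps, worker r sends chunk (r - step) % n to its
    # right neighbour, which accumulates it. Afterwards worker r holds the
    # fully reduced chunk (r + 1) % n.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, chunks[r][(r - step) % n].copy())
                 for r in range(n)]                 # snapshot "simultaneous" sends
        for r, idx, data in sends:
            chunks[(r + 1) % n][idx] += data

    # All-gather: in n-1 more steps, each worker forwards the reduced chunk it
    # just received, so every worker ends up with every reduced chunk.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, chunks[r][(r + 1 - step) % n].copy())
                 for r in range(n)]
        for r, idx, data in sends:
            chunks[(r + 1) % n][idx] = data

    return [np.concatenate(c) for c in chunks]

# Usage: 3 workers with 6-element gradients; every worker gets the global sum.
grads = [np.arange(6, dtype=float) + r for r in range(3)]
print(ring_allreduce(grads)[0])                    # [ 3.  6.  9. 12. 15. 18.]
```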
HiPS Design • Map the Logical View onto the Physical Structure • Flexible (Topology-Aware) • Hierarchical (Efficient)
HiPS Design • HiPS in BCube
HiPS Design • HiPS in BCube (Server <01>)
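The hierarchical idea in BCube can be illustrated with a two-phase toy example: servers first aggregate within their level-0 group (servers attached to the same level-0 switch), then within their level-1 group. The sketch below uses BCube(3,1) addressing <a1 a0> and in-memory sums only; the actual HiPS/BML protocol is more involved (for example, it exploits both RNICs), so this is just the high-level structure:

```python
# Simplified sketch of hierarchical synchronization over BCube(3,1), assuming
# servers addressed as <a1 a0> and in-memory numpy aggregation; illustration
# only, not the exact HiPS/BML schedule.
import numpy as np

N = 3                                            # BCube(3,1): 9 servers <a1 a0>
grads = {(a1, a0): np.full(4, a1 * N + a0, dtype=float)
         for a1 in range(N) for a0 in range(N)}

def group_reduce(keys):
    """Aggregate gradients inside one group and give every member the sum."""
    total = sum(grads[k] for k in keys)
    for k in keys:
        grads[k] = total.copy()

# Level-0 phase: servers sharing digit a1 (same level-0 switch) aggregate.
for a1 in range(N):
    group_reduce([(a1, a0) for a0 in range(N)])

# Level-1 phase: servers sharing digit a0 (same level-1 switch) aggregate the
# level-0 partial sums, so every server ends with the global sum.
for a0 in range(N):
    group_reduce([(a1, a0) for a1 in range(N)])

print(grads[(0, 1)])                             # global sum: 0+1+...+8 = 36
```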
HiPS Design • HiPS in Torus
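In a torus, every row and every column is already a physical ring with wraparound links, so ring-style scatter + gather can run along one dimension and then the other. A small sketch of that mapping (the grid size and node labels are illustrative assumptions, not the paper's configuration):

```python
# Sketch of how a 2D torus maps onto ring-based synchronization: each row and
# each column forms a physical ring (with wraparound), so the scatter + gather
# pass can run per dimension in turn. Grid size and labels are illustrative.
R, C = 4, 4

def row_ring(r):
    return [(r, c) for c in range(C)]            # ring along dimension 0

def col_ring(c):
    return [(r, c) for r in range(R)]            # ring along dimension 1

def ring_neighbors(node):
    r, c = node
    return {
        "row": ((r, (c - 1) % C), (r, (c + 1) % C)),   # left / right, wraparound
        "col": (((r - 1) % R, c), ((r + 1) % R, c)),   # up / down, wraparound
    }

print(row_ring(1))              # the ring that node (1, 2) uses in phase 1
print(ring_neighbors((1, 2)))   # its neighbours in the row ring and column ring
```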
Future Work • Conduct Further Comparative Studies • Integrate HiPS into DML Systems
Simulation Evaluation • NS-3 Simulation with VGG Workload • BCube: GST (Global Synchronization Time) reduced by 37.5% ~ 61.9% • Torus: GST reduced by 49.6% ~ 66.4%
[Figures: GST Comparison with RDMA in BCube; GST Comparison with RDMA in Torus]
Testbed Evaluation • System Instance of HiPS: BML • Add an Op in TensorFlow • 9 Servers, each equipped with 2 RNICs (BCube(3,1)) • MNIST and VGG19 as Benchmarks • Ring Allreduce in Ring and Mesh-Based (P2P) Sync in Fat-Tree as Baselines
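As a rough illustration of "adding an Op in TensorFlow": a compiled synchronization kernel is typically exposed to the Python side via tf.load_op_library. The shared-object path and op name below are hypothetical placeholders, not BML's actual artifacts:

```python
# Hedged sketch of hooking a custom synchronization op into TensorFlow; the
# .so path and op name are hypothetical, not BML's real build outputs.
import tensorflow as tf

# A C++ kernel compiled against the TensorFlow headers would be loaded as a
# shared object exposing the registered op to Python.
hips_ops = tf.load_op_library("./hips_sync_ops.so")   # hypothetical library

grads = tf.constant([0.1, 0.2, 0.3])
synced_grads = hips_ops.hips_all_reduce(grads)        # hypothetical op name
```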
Testbed Evaluation • [Figure] 18.7% ~ 56.4% improvement over the baselines
Ongoing Work • Conduct Further Comparative Studies • Optimize HiPS in DML Systems • More Cases of Network for AI
Thanks!
NASP Research Group: https://nasp.cs.tsinghua.edu.cn/