Explore how Smart Parameter Server improves distributed machine learning efficiency through selective updates, proactive pushes, and intelligent strategies. Experiment results show significant reductions in training time compared to baseline methods.
Accelerating Distributed Machine Learning by Smart Parameter Server • Jinkun Geng, Dan Li and Shuai Wang
Background • Distributed machine learning has become common practice because of: • 1. The explosive growth of data size
Background • Distributed machine learning has become common practice because of: • 2. The increasing complexity of training models • ImageNet competition, network depth in layers: <10 (Hinton, 2012), 22 (Google, 2014), 152 (Microsoft, 2015), 1207 (SenseTime, 2016)
Background • Parameter Server (PS)-based architecture is widely supported by mainstream DML systems.
Background • However, the power of the PS architecture has not been fully exploited. • 1. Communication redundancy • 2. Straggler problem
Background • A deeper insight… • 1. Worker-centric design is less efficient • 2. PS can be more intelligent (i.e., SmartPS)
Background • To make PS more intelligent… • Dependency-Aware • Straggler-Assistant
Design Strategies • To make PS more intelligent… • 1. Selective update • 2. Proactive push • 3. Prioritized transmission • 4. Unnecessary push blockage
Evaluation • Experiment Setting: • 17 nodes with different performance configurations: 1 PS + 16 workers • 2 benchmarks: • Matrix Factorization and PageRank • 5 baselines: • BSP, ASP, SSP (slack=1), SSP (slack=2), SSP (slack=3)
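The five baselines differ only in how far a fast worker may run ahead of the slowest one. The sketch below illustrates the standard definitions of BSP, ASP and SSP, not the implementation used in the talk; the function name `can_proceed` and the notion of a per-worker "clock" (number of completed iterations) are assumptions made for illustration.

```python
def can_proceed(model: str, worker_clock: int, slowest_clock: int, slack: int = 0) -> bool:
    """Decide whether a worker may start its next iteration.

    worker_clock  -- iterations this worker has completed (hypothetical counter)
    slowest_clock -- iterations completed by the slowest worker
    slack         -- staleness bound, only used by SSP
    """
    lead = worker_clock - slowest_clock  # how far ahead of the straggler we are
    if model == "ASP":
        # Asynchronous Parallel: never wait for anyone.
        return True
    if model == "BSP":
        # Bulk Synchronous Parallel: wait until every worker has caught up.
        return lead == 0
    if model == "SSP":
        # Stale Synchronous Parallel: run ahead by at most `slack` iterations.
        return lead <= slack
    raise ValueError(f"unknown synchronization model: {model}")


if __name__ == "__main__":
    # A worker that is 2 iterations ahead of the straggler.
    for model, slack in [("BSP", 0), ("ASP", 0), ("SSP", 1), ("SSP", 2), ("SSP", 3)]:
        print(model, slack, can_proceed(model, worker_clock=5, slowest_clock=3, slack=slack))
```

Under these definitions, SSP with slack=0 behaves like BSP, which is why the straggler problem is most visible in the BSP baseline.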
Evaluation • MF Benchmark: With a common threshold, SmartPS reduces the training time by 68.1% to 90.3% compared with the baselines.
Evaluation • PR Benchmark: With a common threshold, SmartPS reduces the training time by 65.7% to 84.9% compared with the baselines.
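A reduction in training time can also be read as a speedup factor: if the time drops by a fraction r, the speedup is 1 / (1 - r). The helper below is a small illustrative calculation (the function name is hypothetical) applied to the percentages reported on the two slides above.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a training-time reduction in percent into a speedup factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


if __name__ == "__main__":
    reported = {
        "MF": (68.1, 90.3),  # Matrix Factorization benchmark
        "PR": (65.7, 84.9),  # PageRank benchmark
    }
    for name, (low, high) in reported.items():
        print(f"{name}: {speedup_from_reduction(low):.1f}x to "
              f"{speedup_from_reduction(high):.1f}x")
```

For the MF benchmark this works out to roughly 3.1x to 10.3x, and for PR to roughly 2.9x to 6.6x, depending on the baseline.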
Further Discussion • Comparison to some recent works: • 1. Both leverage the knowledge of parameter dependency • 2. Both leverage prioritized transmission for DML acceleration
Ongoing Work • A deeper insight into PS-based arch… • Function of PS: • 1. Parameter Distribution -> Data Access Control • 2. Parameter Aggregation -> Data Operation • Function of Worker: • 1. Parameter Refinement -> Data Operation
Ongoing Work • [Diagram labels: Parameter Distribution, Parameter Aggregation, Parameter Refinement]
Ongoing Work • [Diagram labels: Data Access Control, Data Operation, Data Operation]
Ongoing Work • [Diagram labels: Data Access Control, Token, Token, Token, Data Operation]
Next Generation of SmartPS • Parameter Server -> Token Server • 1. Decouple data (access) control and data operation • 2. A lightweight and smart Token Server instead of Parameter Server
Thanks! • NASP Research Group • https://nasp.cs.tsinghua.edu.cn/ • https://www.gengjinkun.com/