90 likes | 171 Views
C++11 and three compilers: performance considerations. CERN openlab Summer Students Lightning Talks Sessions Supervised by Pawel Szostek Stephen Wang. 19/08/2014. Motivation. “Surprisingly, C++11 feels like a new language” by Stroustrup .
E N D
C++11 and three compilers: performance considerations CERNopenlabSummerStudents LightningTalksSessions Supervised by PawelSzostek StephenWang • 19/08/2014
Motivation • “Surprisingly, C++11 feels like a new language”byStroustrup. • Improvements: Language usability,Mulithreadingandotherstuff. • Howaboutperformance? • Four questions: Timemeasurement methods,For-loop efficiency,std::async,STLalgorithms in parallel mode • GCC4.9.0,ICC15.0.0,Clang+LLVM3.4 • -O2, -O3, -Ofast Stephen Wang
Contribution • Makefourmicro-benchmarksforeachfeature. • Code:https://github.com/wangyichao/CPP11_Benchmarks • Automatizetheprocess,makeresultsrepeatable • Pythonandbashscriptareused. • Compile->Run->Table(3compiler * 3options * containers * algorithms…) • Useprofilingtoolstocheckperformancesuchasvectorizationandthreads. • Perf,IntelVtune,Likwid,evenPMU • Answerbasicquestions. Stephen Wang
Conclusions auto start = std::chrono::steady_clock::now(); auto end = std::chrono::steady_clock::now(); intelapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count(); overhead.push_back(elapsed); RDTSCP std::chrono Stephen Wang
Conclusions • std::chrono (C++11) isareliabletimemeasurement. auto start = std::chrono::steady_clock::now(); microseconds_sleep(index); auto end = std::chrono::steady_clock::now(); intelapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count(); Stephen Wang
Conclusions • Performanceoffor-loopvarieswithiterationmethodandcontainer. • Whichcontainersarevectorized? Tablerange-basedforloop StephenWang
Conclusions • std::asyncspawns threads when called andthecomputationisdoneimmediately. ( stdlibc++ from GCC) double sum = 0; auto handle1 = std::async(std::launch::async, fsum, v.begin(), v.begin()+distance); Autohandle2=std::async(…); … … sum = handle1.get()+handle2.get()+handle3.get()+handle4.get(); StephenWang
Conclusions • GNUlibstdc++parallelspeeds uppartofSTLalgorithms,whoseperformancevarieswithcontainers. Stephen Wang
Thanks StephenWang