BulkCommit: Scalable and Fast Commit of Atomic Blocks in a Lazy Multiprocessor Environment
Authors: Xuehai Qian, Benjamin Sahelices, Depei Qian
Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, 2014
Presented by: Pravin Ramteke (CS13F08), Sawan Belekar (13IS04F)
Course Instructor: Dr. Basavaraj Talawar
Motivation
• Architectures that continuously execute Atomic Blocks or Chunks (e.g., TCC, BulkSC)
  • Chunk: a group of dynamically contiguous instructions executed atomically
  • Providing performance and programmability advantages [Hammond04][Ahn09]
• Chunk commit is an important operation: making the state of a chunk visible atomically
• We focus on the designs with lazy detection of conflicts
  • Provides higher concurrency in codes with high conflicts
  • Parallelizing the commit is challenging
• Requires a consistent conflict resolution decision over all the distributed directory modules
  • Therefore, most current schemes have some sequential steps in the commit
• In addition, the current lazy conflict resolutions are sub-optimal
  • Incur a squash when there is only a Write-After-Write (WAW) conflict
Lifetime of a Chunk
[Figure: timeline of a chunk: Execution, then Commit (Grouping followed by Propagation)]
• Execution:
  • Reads and writes bring lines into the cache
  • No written line is made visible to other processors
  • Execution ends when the last instruction of the chunk completes
• Commit: make the chunk state visible atomically
  • Grouping: set the relative order of any two conflicting chunks
    • Grabbing the directory: locking the local memory lines and detecting the conflicts
    • After a commit grabs all the relevant directories, it is guaranteed to commit successfully
  • Propagation: making the stores in a chunk visible to the rest of the system
    • Involves sending invalidations and updating directory states
    • Atomicity is ensured since the relevant cache lines are logically locked by signatures during the process
Inefficiency: Squash on WAW-Only
[Figure: P0 and P1 both write x. Conventional system (store buffer, L1 cache, directory): the WAW is serialized without re-execution. Chunk-based system (chunks C0 and C1, Execution then Commit with Grouping and Propagation): the WAW-only conflict is serialized with a squash; C1 is squashed and re-executed.]
Contribution: BulkCommit
• BulkCommit: commit protocol with parallel grouping and squash-free serialization of WAW-only conflicts
  • IntelliSquash: no squash on WAW
    • Insight: using the L1 cache as the "store buffer" for the chunk
  • IntelliCommit: parallel grouping without broadcast
    • Insight: using a preemption mechanism to ensure the consistent order of two conflicting chunks
• BulkCommit tries to achieve the optimal commit protocol design
Outline
• Motivation
• IntelliSquash
• IntelliCommit
• Evaluation
IntelliSquash: Insight
• Challenge: the speculative data produced by a chunk cannot be lost when the chunk is ready to commit
• Solution: use the L1 cache as the "store buffer" for a chunk
  • Similar to the store buffer in the conventional system
  • On receiving an invalidation, the speculative dirty words of a line are preserved
  • Absent bit: it is set when
    • The line is not present
    • The line contains some speculative words
  • Per-word dirty bit (not shown)
[Figure: cache line m in P0 and P1 with its spV, A (Absent), and d bits, together with the directory state of m (D in P1, S in P0 and P1), around a commit]
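The invalidation behaviour on this slide can be sketched as a small model. This is only an illustration under assumptions that are not in the slides: a four-word line, the class and method names (`CacheLine`, `spec_write`, `invalidate`), and the choice to represent dropped non-speculative words as `None`.

```python
from dataclasses import dataclass

@dataclass
class CacheLine:
    """Hypothetical model of one L1 line used as the chunk's store buffer."""
    words: list        # data words of the line
    spec_dirty: list   # per-word dirty bits for speculative writes
    absent: bool = False   # the slide's Absent (A) bit
    valid: bool = True     # whether the line slot still holds anything

    def spec_write(self, idx, value):
        # A speculative store inside a chunk marks only that word dirty.
        self.words[idx] = value
        self.spec_dirty[idx] = True

    def invalidate(self):
        # Assumed behaviour: if the line holds speculative words, keep them
        # and set Absent; the non-speculative words are no longer usable.
        if any(self.spec_dirty):
            self.absent = True
            self.words = [w if d else None
                          for w, d in zip(self.words, self.spec_dirty)]
        else:
            # No speculative data: an ordinary invalidation drops the line.
            self.valid = False
```

In this sketch a line written speculatively survives an invalidation with only its dirty words intact, which is the property the slide relies on to avoid a squash on a WAW-only conflict.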
IntelliSquash: Merge Operation
• Merge the remote non-speculative cache line with the local speculative words
• Performed when the whole line with the Absent bit set is brought to the cache
  • On misses to a word not present
  • On commit
    • The line is not accessed again
    • Therefore, need to bring the line to the cache as if there is a miss
• The dirty word is merged with the non-speculative line
• Unset the Absent (A) bit
[Figure: P0 reads line m; the line's words and its A, spV, and d bits before and after the merge, with the directory state of m]
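The merge itself amounts to a per-word selection. The following is a minimal sketch, assuming the line is a list of words and the per-word dirty bits are a parallel list of booleans; the function name `merge_line` is hypothetical.

```python
def merge_line(local_words, spec_dirty, remote_words):
    """Merge a fetched non-speculative line with local speculative words.

    For each word, the local value wins if its per-word dirty bit is set;
    otherwise the value from the incoming (remote) line is taken.
    """
    if not (len(local_words) == len(spec_dirty) == len(remote_words)):
        raise ValueError("line and dirty-bit vectors must have equal length")
    return [local if dirty else remote
            for local, dirty, remote in zip(local_words, spec_dirty, remote_words)]
```

After such a merge the line is fully present again, which corresponds to unsetting the Absent bit.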
IntelliCommit Protocol
• On chunk commit:
  • Processor sends commit requests to all the relevant directory modules
  • Directory module receives a commit request:
    • Locks the memory lines
    • Responds with commit_ack
  • Processor counts the number of commit_ack received
  • Processor sends commit_confirm when it receives the expected number of commit_ack
[Figure: processor P exchanging messages with directory modules D0 to D3, with the point where the group is formed marked]
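The conflict-free case of this handshake can be sketched as follows. It is a simplified, sequential model: the class `Directory`, the function `commit_chunk`, and the whole-module lock standing in for "locks the memory lines" are assumptions for illustration, and conflicts between chunks are not modelled here.

```python
class Directory:
    """Hypothetical directory module that locks lines on a commit request."""

    def __init__(self, name):
        self.name = name
        self.locked_by = None  # chunk currently holding this module's lines

    def commit_request(self, chunk_id):
        # Lock the relevant memory lines (modelled as one lock per module)
        # and acknowledge the request.
        self.locked_by = chunk_id
        return "commit_ack"


def commit_chunk(chunk_id, directories):
    """Send commit requests, count the acks, and confirm once all arrive."""
    expected = len(directories)
    acks = 0
    for directory in directories:
        if directory.commit_request(chunk_id) == "commit_ack":
            acks += 1
    # commit_confirm is sent only after the expected number of acks.
    return "commit_confirm" if acks == expected else None
```

In real hardware the requests travel in parallel and acks arrive in any order; the counter is what lets the processor detect that the group is formed without a broadcast.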
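Conflicting Chunks Trying to Commit
• Different overlapped directory modules receive commit requests in opposite order
• Need to avoid deadlock
[Figure: processors P0 to P3 and directory modules D0 to D3; chunks C0 and C3 send commit requests to overlapping directory modules]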
IntelliCommit: Deadlock Resolution
• Basic idea: enforce a consistent order between two conflicting chunks
• Piggyback a hardware-generated random number with the commit request
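One way to see why a piggybacked number gives a consistent order: every directory module compares the same two tags and therefore picks the same winner regardless of arrival order. The sketch below assumes that the lower number has higher priority and that the processor ID breaks ties; neither rule is stated on the slide, and the names `make_request` and `higher_priority` are hypothetical.

```python
import random

def make_request(proc_id, rng):
    """Build a commit-request tag: (random number, processor id)."""
    # Assumption: a 16-bit random number; the slide does not give a width.
    return (rng.randrange(2 ** 16), proc_id)

def higher_priority(req_a, req_b):
    """Return the request that wins a conflict.

    Assumption: lower random number wins, processor id breaks ties.
    The result does not depend on argument order, so every module
    reaches the same decision locally.
    """
    return min(req_a, req_b)
```

For example, a directory that sees C0's request first and another that sees C3's request first both call `higher_priority` on the same pair of tags and agree on the winner.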
Why Does IntelliCommit Work?
1. When the directory group of a chunk is already formed, the chunk cannot be preempted by another chunk
2. All the modules involved in a conflict reach the same decision on which chunk has the higher priority, locally
IntelliCommit Implementation
• Extra messages (P = Processor, D = Directory):
  • preempt_request (D→P)
  • preempt_ack (P→D)
  • preempt_nack (P→D)
  • preempt_finish (D→P)
• Commit Ack Counter (CAC): number of commit_ack not yet received
• Preemption Vector (PV) (N = #P = #D):
  • Each processor: N counters of size log(N)
  • PV[i] at Pj = k
    • Pj's chunk is preempted by Pi's chunk in k directories
  • Increase PV[i]: about to send preempt_ack for Pi's chunk
  • Decrease PV[i]: received a preempt_finish for Pi's chunk
• When to send commit_confirm?
  • (CAC == 0) && (for each i, PV[i] == 0)
• Received all commit_ack and the chunk is not preempted by any other chunk in any directory
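The bookkeeping above can be sketched per processor as follows. The class name `CommitState` and its method names are hypothetical; only the CAC, the PV, and the commit_confirm condition come from the slide.

```python
class CommitState:
    """Hypothetical per-processor state for one committing chunk."""

    def __init__(self, num_procs, expected_acks):
        self.cac = expected_acks      # commit_ack messages still outstanding
        self.pv = [0] * num_procs     # pv[i]: directories where Pi's chunk preempts ours

    def on_commit_ack(self):
        self.cac -= 1

    def on_send_preempt_ack(self, i):
        # About to send preempt_ack for Pi's chunk: one more preemption.
        self.pv[i] += 1

    def on_preempt_finish(self, i):
        # Received preempt_finish for Pi's chunk: that preemption is over.
        self.pv[i] -= 1

    def can_send_commit_confirm(self):
        # (CAC == 0) && (for each i, PV[i] == 0)
        return self.cac == 0 and all(count == 0 for count in self.pv)
```

A chunk that has collected every commit_ack but is still preempted in one directory must wait; the condition becomes true only after the matching preempt_finish arrives.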
Evaluation
• Cycle-accurate NoC simulation with processor and cache model
• Number of cores: 16 and 64
• 11 SPLASH-2 and 7 PARSEC applications
• One or two outstanding chunks
• Implemented most distributed commit protocols:
  • Scalable TCC (ST)
  • Scalable Bulk (SB)
  • BulkCommit without IntelliSquash (BC-SQ)
  • BulkCommit (BC)
SPLASH-2 Performance
• BulkCommit reduces both squash and commit time
One and Two Outstanding Chunks
• Using two outstanding chunks is not always useful due to the set restriction
  • Two chunks from the same processor cannot write the same cache set
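The set restriction can be illustrated with a small check on two write sets. The cache geometry (64-byte lines, 64 sets) and the function names are assumptions chosen for the example, not values from the evaluation.

```python
def cache_set(addr, line_bytes=64, num_sets=64):
    """Map a byte address to its cache set index (assumed geometry)."""
    return (addr // line_bytes) % num_sets

def violates_set_restriction(writes_a, writes_b, line_bytes=64, num_sets=64):
    """True if two chunks from one processor write to a common cache set."""
    sets_a = {cache_set(a, line_bytes, num_sets) for a in writes_a}
    sets_b = {cache_set(b, line_bytes, num_sets) for b in writes_b}
    return not sets_a.isdisjoint(sets_b)
```

With this geometry, addresses 0 and 4096 map to the same set, so a second outstanding chunk writing 4096 would conflict with a first chunk writing 0 even though the lines differ.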
Conclusion
• Proposed BulkCommit: commit protocol with parallel grouping and squash-free serialization of WAW-only conflicts
• Key properties:
  • Serializing WAW between chunks without squashing
    • Exploiting the similarity of a chunk commit and an individual store
  • Parallel grouping
    • Using preemption mechanisms to order two conflicting chunks consistently
• Results:
  • Eliminates the commit bottleneck even with a single outstanding chunk
  • Reduces the squash time for some applications