Multi Core Processors and Casino Programming
W. J. Paul, Vienna 2014

layers of system architecture
• different programming models on different layers
• instruction set architecture (ISA) …
• …
• parallel C + devices + macro assembly + assembly + interrupts
[Figure labels: physical gates, ISA, hypervisor]

layer n of system architecture
• user sees programming model (purple) provided by layer n
• implementer implements it in programming model of layer n-1 (white)
• implementations usually simple or wrong
• KISS
[Figure labels: layer n-1, layer n]

layer n of system architecture
• user sees programming model (purple) provided by layer n
• implementer implements it in programming model of layer n-1 (white)
• implementations usually simple
• easy IF we know programming model on layer n-1
[Figure labels: layer n-1, layer n]

if we only kind of know programming model of layer n-1 …
[Figure labels: layer n-1, n, …]

the casino is presently everywhere
• ISA of multicore systems is only kind of known
• list of operating conditions in these 3000 pages might be incomplete
• complete list can be obtained by correctness proof of processor hardware
• semantics stack on top is not completely defined + justified

mismatch
• manufacturers of real time systems
  • avoid multicore, or
  • presently turn off all parallel features they can
• they know what they are doing

roadmap/plan of talk
• ISA-sp for multicore processors
• MIPS 86 = MIPS + TSO
• below:
  • hardware correctness for multicore nondeterministic ISA
  • collect operating conditions
  • bottom of roadmap: digital gates
  • bottom: physical gates
• above:
  • define semantics layers
  • justify arguing about implementation in lower layers
  • ownership and order reduction

ISA-sp
• X64 ISA model
• E. Cohen: communicating sequential components; order of steps nondeterministic
• sb: store buffer
• mmu: memory management unit; walking of page tables nondeterministic (speculation)
• APIC: device, interrupts
• disk: for booting
[Figure labels: disk, APIC, mem + caches, sb, mmu, core]
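
The effect of store buffers with a nondeterministic step order can be sketched with a toy model. This is only an illustration under assumptions: the class `TsoMachine`, the function `sb_litmus_outcomes`, and the two-core test program are hypothetical and far simpler than the ISA-sp model of the talk (no mmu, APIC, disk, or caches).

```python
from collections import deque
from itertools import permutations


class TsoMachine:
    """Toy model (assumption): shared memory plus one FIFO store buffer per core."""

    def __init__(self, n_cores):
        self.mem = {}
        self.sb = [deque() for _ in range(n_cores)]

    def write(self, core, addr, value):
        # A write enters the core's own store buffer, not memory.
        self.sb[core].append((addr, value))

    def read(self, core, addr):
        # A read is served by the newest matching entry of the core's own
        # store buffer, otherwise by memory (unwritten addresses read as 0).
        for a, v in reversed(self.sb[core]):
            if a == addr:
                return v
        return self.mem.get(addr, 0)

    def drain(self, core):
        # Store buffer step: the oldest buffered write reaches memory.
        if self.sb[core]:
            a, v = self.sb[core].popleft()
            self.mem[a] = v


def sb_litmus_outcomes():
    """Core 0 runs x = 1; r0 = y. Core 1 runs y = 1; r1 = x.

    Returns the set of all (r0, r1) over every order of the six steps
    W (write), R (read), D (drain) that keeps W before R and W before D
    on each core.
    """
    outcomes = set()
    steps = [(c, k) for c in (0, 1) for k in "WRD"]
    for order in permutations(steps):
        pos = {s: i for i, s in enumerate(order)}
        if not all(pos[(c, "W")] < pos[(c, "R")] and pos[(c, "W")] < pos[(c, "D")]
                   for c in (0, 1)):
            continue
        m = TsoMachine(2)
        regs = [None, None]
        for c, k in order:
            own, other = ("x", "y") if c == 0 else ("y", "x")
            if k == "W":
                m.write(c, own, 1)
            elif k == "R":
                regs[c] = m.read(c, other)
            else:
                m.drain(c)
        outcomes.add(tuple(regs))
    return outcomes
```

In this sketch the outcome `(0, 0)` is reachable because both writes can still sit in their store buffers when the reads are executed; a memory without store buffers would exclude it.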

Nondeterministic ISA
• hardware correctness
  • induction on cycles t of deterministic hardware
  • ne(t): number of nondeterministic ISA steps completed at cycle t
• oracle input o for these steps
  • unit stepped
  • initial walk guessed of MMU
  • walk used by core
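
The role of ne(t) and of the oracle inputs can be illustrated with a very small toy. Everything here is an assumption made for illustration: the two counters standing in for units, the fixed arbitration pattern, and the names `hardware_run` and `isa_run`. The oracle carries only the "unit stepped" component; guessed walks are not modelled.

```python
def hardware_run(cycles):
    """Deterministic toy hardware.

    Returns the visible state before each cycle, the list of oracle inputs
    (which unit completed a step), and ne, where ne[t] is the number of ISA
    steps completed before cycle t.
    """
    state = {"core": 0, "mmu": 0}
    oracle = []
    ne = [0]
    states = [dict(state)]
    for t in range(cycles):
        # Assumed fixed arbitration: the mmu completes a step every third
        # cycle, the core on the remaining odd cycles, other cycles stall.
        if t % 3 == 0:
            unit = "mmu"
        elif t % 2 == 1:
            unit = "core"
        else:
            unit = None
        if unit is not None:
            state[unit] += 1
            oracle.append(unit)
        ne.append(len(oracle))
        states.append(dict(state))
    return states, oracle, ne


def isa_run(oracle):
    """Nondeterministic toy ISA: the oracle input selects the unit stepped."""
    state = {"core": 0, "mmu": 0}
    configs = [dict(state)]
    for unit in oracle:
        state[unit] += 1
        configs.append(dict(state))
    return configs


# Coupling checked cycle by cycle: hardware state at cycle t equals the
# ISA configuration after ne[t] steps.
states, oracle, ne = hardware_run(12)
configs = isa_run(oracle)
coupled = all(states[t] == configs[ne[t]] for t in range(len(states)))
```

The point of the sketch is the direction of the argument: the oracle is extracted from the deterministic hardware run, and the ISA computation driven by that oracle is then compared with the hardware at every cycle.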

Implementation dependent operating conditions
• pipeline stages: pc-translate, fetch, decode, execute, ea-translate, memory, gpr write back
• old: when is write to gpr visible?
• forwarding and stalling

Implementation dependent operating conditions
• pipeline stages: pc-translate, fetch, decode, execute, ea-translate, memory, gpr write back
• when is write of an instruction visible?
• speculation
• Kröning 1999

Implementation dependent operating conditions
• pipeline stages: pc-translate, fetch, decode, execute, ea-translate, memory, gpr write back
• when is write of an instruction or page table by other processor visible?
• drain pipe + store buffer + sync

invlpg: can be implemented without software condition in nondeterministic ISA
• pipeline stages: pc-translate (wo), fetch, decode, execute, ea-translate, memory, gpr write back
• core:
  • step at stage 'memory'
• IMMU:
  • step at stage 'pc-translate'; speculation in ISA
• pipeline walk wo in ghost registers
• invariant: wo in virtual tlb
• core step(wo)
  • only allowed if invariant holds
• invariant:
  • inhibit use of translation in tlb invlpg'd by instruction in stages decode … memory
  • roll back pc-translate using translation invlpg'd at stage fetch (speculative execution)
• interrupt in stage decode
• changes to untranslated mode
• IMMU step in stage pc-translate would not occur in deterministic ISA
• was speculated in nondeterministic ISA (even with deterministic MMU)

current research/last for hardware
• pipeline stages: pc-translate, fetch, decode, execute, ea-translate, memory, gpr write back
• When are device steps visible in multicore machines?

ISA + devices and driver correctness (Dublin 2009)
• hardware parallel even with sequential processor
• ISA nondeterministic concurrent, 1 step at a time
• disable interrupts of devices > 1 and don't poll them
• reorder their device steps out of driver run of dev 1
• pre and post conditions for drivers …
• assumes absence of side channels
  • Device 1: motor
  • Device 2: climate control
  • Side channel: power consumption
[Figure labels: dev 1, proc, dev k]

C + devices
• Implementation
  • access device ports by assembly code
  • do not allocate C variables to ports
  • disable interrupts during run of translated C code
• Order reduction: device steps can be reordered to assembly portion
• Semantics
  • Configurations (a,c,d) or (a,d)
  • d for device
  • device steps only for (a,d)

Ownership (1) concept
• Classify addresses
  1. local (e.g. C stack)
  2. shared and read only (e.g. program)
  3. shared owned (temporarily local/locked)
  4. shared writeable not owned (locks)
• invariants:
  • at most 1 owner …
  • disjointness …
• safe programs: act like names of address classes suggest
• accesses to class 4 atomic at the language level
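
The four address classes can be turned into a small access check. This is a minimal sketch of one plausible reading of "act like names of address classes suggest"; the exact rules, the enum `Cls`, and the function `is_safe` are assumptions, not the formal definition behind the slide.

```python
from enum import Enum


class Cls(Enum):
    LOCAL = 1             # e.g. C stack
    SHARED_READ_ONLY = 2  # e.g. program
    SHARED_OWNED = 3      # temporarily local/locked
    SHARED_WRITEABLE = 4  # not owned, e.g. locks


def is_safe(cls, owner, thread, kind, atomic=False):
    """Return True if the access respects the assumed ownership rules.

    cls    : address class of the accessed address
    owner  : thread owning the address (None if not owned)
    thread : accessing thread
    kind   : "read" or "write"
    atomic : whether the access is atomic at the language level
    """
    if cls in (Cls.LOCAL, Cls.SHARED_OWNED):
        # Assumption: only the single owner may touch these addresses.
        return owner == thread
    if cls is Cls.SHARED_READ_ONLY:
        # Anyone may read, nobody may write.
        return kind == "read"
    # Class 4: any thread may access, but only atomically.
    return atomic
```

For example, `is_safe(Cls.SHARED_OWNED, owner=1, thread=0, kind="read")` is `False` in this sketch, while a class 4 access is accepted only with `atomic=True`.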

Ownership (2) Def: structured parallel C (almost folklore)
• Classify addresses
  1. local (e.g. C stack)
  2. shared and read only (e.g. program)
  3. shared owned (temporarily local/locked)
  4. shared writeable not owned (locks)
• multiple C threads
• sequentially consistent memory!
• shared: heap + global variables
• local: stacks
• safe w.r.t. ownership
• class 4 access: volatile
• interleave at (compiler consistency points before) class 4 accesses

Ownership (3) structured parallel C to parallel assembly
• IF
  • translate threads with sequential compiler
  • translate volatile C access to interlocked ISA access
  • at most 1 class 4 access between two interleaving points (e.g. no global pointer chasing to global variable)
• THEN
  • ISA program safe
  • multicore ISA simulates parallel C
• Baumann 2014
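
The third condition (at most 1 class 4 access between two interleaving points) lends itself to a simple check over a thread's access trace. The trace encoding with the strings `"ip"`, `"local"`, and `"class4"` and the function name are hypothetical and only serve as illustration.

```python
def respects_interleaving_rule(trace):
    """Check that at most one class 4 access lies between interleaving points.

    trace is a list of events of one thread:
      "ip"     : interleaving point
      "local"  : access to a local or owned address
      "class4" : access to a shared writeable, not owned address
    """
    class4_since_ip = 0
    for event in trace:
        if event == "ip":
            class4_since_ip = 0
        elif event == "class4":
            class4_since_ip += 1
            if class4_since_ip > 1:
                return False
    return True


# One volatile access per interleaving interval: accepted.
ok_trace = ["ip", "local", "class4", "ip", "class4", "local"]

# Global pointer chasing to a global variable: reading the shared pointer
# and then its shared target are two class 4 accesses in one interval.
chasing_trace = ["ip", "class4", "class4"]
```

In this sketch `respects_interleaving_rule(ok_trace)` holds, while `chasing_trace` is rejected, which mirrors the pointer chasing example on the slide.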

Ownership (4) parallel store buffer reduction in ISA-sp
• maintain local dirty bits
  • class 4 write since last local sb-flush
• class 4 read only if dirty = 0
• Cohen, Schirmer (ITP 2010): store buffers invisible
  • formal proof, 70 pages
  • no mmu
• push through hierarchy
  • implement sb-flush as compiler intrinsic in C
[Figure labels: dirty, C, compiler, m-asm, m-assembler, ISA-u=asm, before, ISA-sp]
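
The dirty bit discipline can be sketched on a toy store buffer machine. The class `Core`, its method names, and the two-core test program are assumptions for illustration; the sketch buffers only class 4 writes and is not the formal model of the cited proof.

```python
from collections import deque
from itertools import combinations


class Core:
    """Toy core (assumption): FIFO store buffer plus a local dirty bit."""

    def __init__(self, mem):
        self.mem = mem
        self.sb = deque()
        self.dirty = False  # class 4 write since last local sb-flush?

    def shared_write(self, addr, value):
        # Class 4 write: goes to the store buffer and sets the dirty bit.
        self.sb.append((addr, value))
        self.dirty = True

    def flush(self):
        # sb-flush: drain the whole store buffer, then clear the dirty bit.
        while self.sb:
            addr, value = self.sb.popleft()
            self.mem[addr] = value
        self.dirty = False

    def shared_read(self, addr):
        # Class 4 read is only allowed if dirty = 0. In this sketch only
        # class 4 writes are buffered, so the buffer is empty here.
        if self.dirty:
            raise RuntimeError("class 4 read while dirty bit is set")
        return self.mem.get(addr, 0)


def disciplined_outcomes():
    """Core 0: x = 1; flush; r0 = y. Core 1: y = 1; flush; r1 = x.

    Returns all (r0, r1) over every interleaving of the two programs.
    """
    outcomes = set()
    for slots in combinations(range(6), 3):  # positions of core 0's steps
        mem = {}
        cores = [Core(mem), Core(mem)]
        pc = [0, 0]
        regs = [None, None]
        for i in range(6):
            c = 0 if i in slots else 1
            own, other = ("x", "y") if c == 0 else ("y", "x")
            if pc[c] == 0:
                cores[c].shared_write(own, 1)
            elif pc[c] == 1:
                cores[c].flush()
            else:
                regs[c] = cores[c].shared_read(other)
            pc[c] += 1
        outcomes.add(tuple(regs))
    return outcomes
```

Under this discipline the outcome `(0, 0)` no longer occurs in the sketch, so the program behaves as if the store buffers were invisible.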

Ownership (5) parallel store buffer reduction in ISA-sp
• maintain local dirty bits
  • class 4 write since last local sb-flush
• class 4 read only if dirty = 0
• Chen, Cohen, Kovalev (VSTTE 2014): store buffers invisible
  • 94-page proof
  • with mmu
  • page tables local to processor + mmu or shared
  • new ownership class: locally shared. Processor access while local mmu walks: class 4
[Figure labels: dirty, C, compiler, m-asm, m-assembler, ISA-u=asm, before, ISA-sp]

Ownership (6): Semantics of C + interrupts (Pentchev 2014)
• C program thread + handler threads
  • ownership discipline between program and handler thread
  • interleave at consistency points around class 4 accesses
• Parallel C program threads + handler threads
  • ownership as for structured parallel C for local threads + handlers
  • new ownership class: locally shared between program thread and handler

Summary
• Hardware
  • search of software conditions almost completed (except multicore + devices)
  • so far only known type of software conditions found
  • with nondeterministic ISA no software conditions for use of invlpg
• Software stack
  • C + assembly
  • C + devices
  • structured parallel C
  • store buffer reduction with MMUs
  • C + interrupts

Once this research is done
• we could quit
• if we wanted to