Clocking and Timing in Fault-Tolerant Systems-on-Chip Andreas Steininger
Outline The Clock as a Blessing The Clock as a Curse Alternative Synchronization Schemes • GALS • fully asynchronous • the DARTS approach Conclusion
Contributors to this Work The DARTS project team TU Vienna Gottfried Fuchs Matthias Fuegger Ulrich Schmid Thomas Handl RUAG Space Gerald Kempf Manfred Sust Wolfgang Zangerl
The Need for Fault Tolerance miniaturization is key to progress in VLSI => smaller structures => lower voltage swing => smaller critical charge => higher operating frequencies … result in higher susceptibility to faults (SET, EMI, …) => cannot avoid faults, need to tolerate them
The Role of Time “The only reason for time is so that everything doesn’t happen at once”, Albert Einstein
The Need for Clocking activities need to be co-ordinated • on system level (braking of wheels, …) • on algorithmic level (consensus, …) • on communication level • on logic level (state machine switching, …) co-ordination in the time domain (synchronization) is an efficient way to attain this => need a global notion of time (discrete “ticks”)
The Quality of Synchronization local time (number of ticks) precision π real time
Typical Precision Values on system level: ms … ms on algorithm level: ms … ms on communication level: ns … ms on logic level: ps … ns
Synchronization Requirements phase synchronisation (for “hardware clock” on logic level) 1 µs is excellent precision for a distributed clock; at 1 GHz this means 360,000° phase shift clock synchronisation (for distributed time base on algorithmic level)
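The phase-shift figure on this slide follows from precision × clock frequency × 360°. A minimal sketch of that arithmetic, assuming a hypothetical helper name `phase_shift_degrees` and two illustrative offsets (1 µs and 1 ms, both at 1 GHz):

```python
def phase_shift_degrees(precision_s: float, clock_hz: float) -> float:
    """Phase shift, in degrees, that a timing offset represents at a given clock rate."""
    clock_periods = precision_s * clock_hz  # number of clock periods inside the offset
    return clock_periods * 360.0            # one full clock period equals 360 degrees


# Illustrative values only: 1 microsecond and 1 millisecond of offset at 1 GHz.
for label, precision_s in (("1 us", 1e-6), ("1 ms", 1e-3)):
    degrees = phase_shift_degrees(precision_s, 1e9)
    print(f"{label} at 1 GHz: {degrees:,.0f} degrees")
```

An offset of 1 µs spans 1000 clock periods at 1 GHz (360,000°), and 1 ms spans a million periods. Either is far too coarse for phase synchronisation on logic level, even though it can be excellent for a distributed time base.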
Globally Synchronous Design whole design is “isochronic” (“perfect” precision) • time conveyed by clock transitions • perfect co-ordination of all activities very efficient design • can assume consistent states • high level of abstraction very efficient implementation: • single crystal oscillator • single control line (clock net)
“Isochronic” Regions? speed of light (in medium) = 2 × 10^8 m/s = 20 cm/ns Ref 2 cm 1 GHz 4 GHz 8 GHz
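To see why an “isochronic” region shrinks with clock frequency, one can compute how far a signal travels within one clock period. The sketch below assumes the propagation speed given on the slide (2 × 10^8 m/s); the function name is hypothetical.

```python
SIGNAL_SPEED_M_PER_S = 2e8  # propagation speed in the medium, as stated on the slide


def distance_per_clock_period_cm(clock_hz: float) -> float:
    """Distance (in cm) a signal covers during one clock period."""
    period_s = 1.0 / clock_hz
    return SIGNAL_SPEED_M_PER_S * period_s * 100.0  # metres to centimetres


for f_ghz in (1, 4, 8):
    reach_cm = distance_per_clock_period_cm(f_ghz * 1e9)
    print(f"{f_ghz} GHz: {reach_cm:.1f} cm per clock period")
```

At 8 GHz a full clock period corresponds to only 2.5 cm of travel, which is on the order of the 2 cm reference length shown on the slide.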
The Variation Problem Designer User ? (unknown) projected conditions actual conditions worst case actual system ? (imperfections) system model safety margins Timing completely fixed after design No way to react to actual conditions & system (“PVT variations”)
Fault-Tolerant Architectures • Duplication & Comparison • Triple-Modular Redundancy FU FU voter ERR Y =? FU FU FU
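The two architectures can be sketched behaviourally as follows. This is a software illustration only, not the hardware voter of the slide; the function names are hypothetical, and the outputs of the function units (FU) are modelled as plain Python values.

```python
from collections import Counter


def duplicate_and_compare(fu_a, fu_b):
    """Duplication & comparison: forward one output and flag ERR on mismatch."""
    error = fu_a != fu_b
    return fu_a, error


def tmr_vote(fu_a, fu_b, fu_c):
    """Triple-modular redundancy: return the value produced by at least two FUs."""
    value, count = Counter([fu_a, fu_b, fu_c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority: all three replicas disagree")
    return value


print(duplicate_and_compare(4, 3))  # mismatch is detected but cannot be masked
print(tmr_vote(4, 4, 3))            # a single faulty replica is outvoted
```

Duplication can only detect a fault, while the majority vote masks a single faulty replica. Both rely on the replicas presenting the same step of the computation at the moment of comparison.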
Lock-Step Operation single clock single point of failure good replica determinism FU “3” “4” voter Y FU “4” “3” FU “4” “3”
Lock-Step Operation independent clocks single fault tolerant bad replica determinism FU “3” “4” voter Y FU “3” “4” FU “3” “4”
Fault-Tolerant HW-Clocking FU v voter Y FU v FU v
Fault-Tolerant HW-Clocking ? FU v D voter Y FU v D ? FU v
The Charm of SoCs billions of transistors fit on one die => structuring into (IP) modules “System-on-Chip” BUT: large clock distribution networks => “isochronic”?? FT clocking does not work with large skew may need individual clocks for function modules => clock synchrony neither attainable nor desirable
Co-ordination of Data Exchange When can SNK use its input? When it is valid and consistent f(x) SRC SNK When can SRC apply the next input? When SNK has consumed the previous one
The Synchronous Approach f(x) SRC SNK co-ordination based on (global) time
Alternative: Asynchronous Design co-ordination based on handshaking REQ: “Data word valid, you can use it” f(x) SRC SNK ACK: “Data word consumed, send the next”
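The REQ/ACK exchange can be illustrated with a small sequential model. The slide does not say which handshake protocol is meant; the sketch assumes a four-phase (return-to-zero) protocol, and the function name and the edge labels in the log are hypothetical.

```python
def four_phase_transfer(words):
    """Move words from SRC to SNK, logging every REQ/ACK edge in protocol order."""
    req = ack = False
    data = None
    received, log = [], []
    for word in words:
        data = word             # SRC drives the data lines first ...
        req = True              # ... then raises REQ: "data word valid"
        log.append("REQ+")
        received.append(data)   # SNK reacts to REQ and consumes the word
        ack = True              # SNK raises ACK: "data word consumed"
        log.append("ACK+")
        req = False             # SRC withdraws REQ after seeing ACK
        log.append("REQ-")
        ack = False             # SNK withdraws ACK, the channel is idle again
        log.append("ACK-")
    return received, log


received, log = four_phase_transfer([0xA, 0xB])
print(received)
print(log)
```

Every step waits for the previous edge instead of a clock tick, so the transfer rate adapts to the actual speed of both sides. If either side stops responding, the exchange simply halts.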
Async. Design – Advantages closed-loop control makes timing much more robust and adaptive to PVT variations no need for worst-case timing local handshakes replace global clock activity only when needed beneficial for EMI tends to stop operation in case of fault
Async. Design – Disadvantages Need to handle race between REQ and data
Async. Design – Disadvantages Need to handle race between REQ and data REQ: “Data word valid, you can use it” f(x) SRC SNK
Async. Design – Disadvantages Need to handle race between REQ and data Solution 1: “Bundled Data” REQ: “Data word valid, you can use it” f(x) SRC SNK
Async. Design – Disadvantages Need to handle race between REQ and data Solution 2: “Delay Insensitive” (Coding) REQ: “Data word valid, you can use it” Completion detection f(x) SRC SNK
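One common delay-insensitive code is dual-rail encoding, where each bit travels on two wires and validity is visible in the data itself. The slide does not name a specific code, so dual-rail is an assumption here, as are all function names in this sketch.

```python
SPACER = (0, 0)  # both rails low: "no data yet"


def encode_bit(bit):
    """Dual-rail encoding as (false_rail, true_rail); exactly one rail is high."""
    return (0, 1) if bit else (1, 0)


def encode_word(bits):
    return [encode_bit(b) for b in bits]


def is_complete(rails):
    """Completion detection: every bit position carries a valid code word."""
    return all(false_rail != true_rail for false_rail, true_rail in rails)


def decode_word(rails):
    if not is_complete(rails):
        raise ValueError("word not yet complete")
    return [true_rail for _, true_rail in rails]


word = encode_word([1, 0, 1])
partial = [word[0], SPACER, word[2]]      # middle bit has not arrived yet
print(is_complete(partial), is_complete(word))
print(decode_word(word))
```

The receiver derives “data valid” from the code words themselves, so no race between a separate REQ wire and the data lines can occur. The price is the doubled wire count and the completion detector, which is the hardware overhead the deck lists as a disadvantage.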
Async. Design – Disadvantages Need to handle race between REQ and data significant HW overhead (coding, delay elements) “adaptive” timing not as predictable more difficult to design classical fault-tolerance schemes not applicable tends to stop operation in case of fault
Best of Both Worlds GALS: Globally Asynchronous Locally Synchronous retain efficiency of synchronous design wherever possible: “intra-module” use asynchronous principle where clock distribution too cumbersome: “inter-module” First mention in PhD thesis by Chapiro / Stanford 84
A GALS Example DSP 2.7 GHz CPU 2 GHz PCI-IF 533 MHz USB-IF 24 MHz
Communication in GALS Shared Memory producer writes to memory, consumer reads from there pro: control flow stays independent • shared single-port memory • true dual-port memory Direct Messages (Data words) move data word from producer’s output register to consumer’s input register • non-buffered / buffered (FIFO queues) • clock fixed, data-driven or pausible
Shared Memory decoupling of clock domains by memory acting as a third party => high area overhead => unusual for single-port memory arbitration required • arbitration problem (unbounded delay …) • one side may block the other at the arbiter for multiport memory problems are confined to access to the same cell • busy flag may become metastable • blocking still possible for one specific address
Shared Memory perfect decoupling of data path potential metastability problems at arbitration logic potential blocking through arbitration DSP 2.7 GHz CPU 2 GHz 0xff14 Arbitration shared memory
Direct Messages clock domain boundary is between producer’s output register and consumer’s input register in general a synchronizer is needed at consumer’s input • definitely for conventional (fixed) clock • can be avoided by data-driven / pausible clocking control flows of producer and consumer are strongly coupled: not maintaining the input/output register blocks other party buffers/queues/FIFOs can • mitigate, but not avoid this problem (full/empty) • compensate variations in the data rate on both sides, but not different average data rates
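The last point, that a FIFO absorbs rate variations but not a mismatch in average rates, can be checked with a small discrete-time model. The function name, the read-before-write ordering within a step, and the traffic patterns are assumptions made for illustration.

```python
def count_blocked_writes(depth, writes, reads):
    """Count how often the producer finds the FIFO full.

    writes/reads are per-time-step flags (1 = side wants to transfer a word).
    """
    level = 0      # words currently stored in the FIFO
    blocked = 0
    for wants_write, wants_read in zip(writes, reads):
        if wants_read and level > 0:
            level -= 1                 # consumer removes one word
        if wants_write:
            if level < depth:
                level += 1             # producer stores one word
            else:
                blocked += 1           # FIFO full: producer is blocked
    return blocked


steady_reads = [1, 0] * 16                       # consumer: one word every 2 steps
bursty_writes = [1, 1, 1, 1, 0, 0, 0, 0] * 4     # same average rate, but bursty
fast_writes = [1] * 32                           # twice the consumer's average rate

print(count_blocked_writes(4, bursty_writes, steady_reads))
print(count_blocked_writes(4, fast_writes, steady_reads))
```

With equal average rates, a depth of 4 is enough to absorb the bursts in this example. With the faster producer, the FIFO fills up and the producer is blocked regardless; a deeper FIFO only delays that moment.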
Direct Messages S S data moving over clock domain boundary metastability problems => need to insert handshake … with synchronizers DSP 2.7 GHz CPU 2 GHz 0xff14 and (optional) buffers
Arbiter: Principle purpose: ○ manage concurrent requests to shared resource method: ○ handle pairs of request_in / grant_out ○ requests may arrive in any order ○ arbiter must activate only one grant_out at a time (respond to the first requester) Mutual Exclusion (MUTEX) problem: ○ resolve concurrent requests => metastability problem
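The request/grant behaviour can be sketched as a purely behavioural model. The class and method names are hypothetical. Because the calls below are strictly sequential, truly simultaneous requests never occur, so the sketch shows only the mutual-exclusion rule and not the metastability issue, which is a property of the circuit.

```python
class Arbiter:
    """Behavioural arbiter: at most one grant is active at any time."""

    def __init__(self):
        self.granted_to = None   # client currently holding the grant
        self.waiting = []        # pending requests in arrival order

    def request(self, client):
        """Register a request; return True if this client holds the grant."""
        if self.granted_to is None:
            self.granted_to = client             # first requester wins
        elif client != self.granted_to and client not in self.waiting:
            self.waiting.append(client)          # later requesters must wait
        return self.granted_to == client

    def release(self, client):
        """Withdraw a request; the grant passes to the longest-waiting client."""
        if self.granted_to == client:
            self.granted_to = self.waiting.pop(0) if self.waiting else None
        return self.granted_to


arb = Arbiter()
print(arb.request("R1"))   # R1 arrives first and is granted
print(arb.request("R2"))   # R2 has to wait
print(arb.release("R1"))   # grant moves on to R2
```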
Arbiter: Circuit MUTEX element: SR latch Vout,FF R1 G1’ G1 Vmeta Vth,inv G2’ G2 R2 t “Metastability filter”: e.g., hi-threshold inverter [from D. J. Kinniment “Synchronization and Arbitration in Digital Systems”, Wiley]
Arbiter: Operation R1 G1’ G1 G2’ G2 R2 R1 R2 G1 G2
Muller C-Element IF a = b THEN y = a ELSE hold y a b C y reset a y RS a C b set y b
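The rule “IF a = b THEN y = a ELSE hold y” translates directly into a small state-holding model. The class name and the initial output value are assumptions made for illustration.

```python
class MullerCElement:
    """Output follows the inputs when they agree and holds its value otherwise."""

    def __init__(self, initial=0):
        self.y = initial

    def update(self, a, b):
        if a == b:
            self.y = a     # both inputs agree: output takes their value
        return self.y      # inputs differ: previous output is held


c = MullerCElement()
for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
    print(a, b, "->", c.update(a, b))
```

The output rises only after both inputs have risen and falls only after both have fallen, which is why the element is used to join two handshake signals.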
Muller C-Element: Circuit [Alain Martin, Caltech]
Data-Driven Clocking Principle: ○ as soon as new data arrive => start clocking ○ determine number k of clock cycles required to process new data ○ stop clocking after k cycles, wait for next data Properties: ○ need to switch clock on and off => beware spurious clock pulses! ○ no metastability problem: data stable as soon as consumer clock starts ○ potential for power saving ○ useful for specific applications only (no pipe!)
Data-Driven Clock: Circuit / 1 CLK out • CLK half period determined by D D D CLK out
Data-Driven Clock: Circuit / 2 CLK out • transition on REQ answered by transition on CLK out • min CLK half period determined by D C REQ D ACK D CLK out REQ ACK
Pausible Clocking Principle: ○ producer requests consumer’s clock to pause ○ data provided to input register during idle time ○ consumer’s clock may resume - free running (“pausible clock”) - with one cycle only (“stoppable clock”) Properties: ○ need to switch clock on and off => beware spurious clock pulses! => beware of clock tree delays! ○ producer controls consumer’s clock (blocking!) ○ applications must cope with paused clock
Pausible Clock: Circuit / 1 CLK out • inverter generates next REQ from ACK • self-oscillation C REQ D ACK D CLK out REQ ACK
Pausible Clock: Circuit / 2 CLK out ACK’ • external unit can safely stop CLK by activating REQ’ • … and gets ACK’ as a response C REQ’ Arb D D CLK out REQ’ ACK’
Pausible Clock: Circuit / 3 ACK1 ACKn CLK out REQ1 REQn Arb Arb C D • for more external sources arbiters can be added and “anded” before the Muller C-Element • the two inverters can be eliminated by using a Muller C-Element with inverting output
Advantages of GALS synchronous islands can be designed efficiently modules operate independently can use module-specific clock & timing clocking is no single point of failure
Problems with GALS operation of modules not (inherently) co-ordinated: synchrony for communication but not on system / algorithm level communication has to cross clock boundaries potential for metastability => performance penalty through synchronizers OR => module must handle irregular clocking
The DARTS Idea Distributed Algorithms for Robust Tick Synchronization phase synchronisation tick synchronisation clock synchronisation
The DARTS Approach Concept: Multiple synchronized tick generators Method: Distributed algorithm for fault-tolerant tick generation implemented in (asynchronous) digital logic Advantages • No crystal oscillator(s) • No critical clock tree • Clock is no single point of failure! • Reasonable synchrony