AMULET3i - asynchronous SoC Steve Furber - sfurber@cs.man.ac.uk • Agenda: • AMULET3i • Design tools • Future problems
AMULET3 • a third generation asynchronous ARM • performance comparable with ARM9 • radically new internal organisation • based on reorder buffer • Harvard core, unified I/D memory • under development within the OMI ATOM project • first application as part of a telecommunications controller
AMULET3 core organisation • Harvard core • forward from reorder buffer • out-of-order completion • in-order register update • aborts handled at writeback
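The reorder-buffer behaviour listed on this slide can be illustrated with a small software model. This is only a sketch under stated assumptions, not the AMULET3 circuit: the class name `ReorderBuffer`, its methods, and the dictionary-based entries are all hypothetical, and a raised exception stands in for a pipeline stall. It shows results completing out of order, operands being forwarded from the buffer, registers being updated strictly in program order, and an abort being acted on only when the faulting entry reaches writeback.

```python
from collections import deque


class ReorderBuffer:
    """Toy model of a reorder buffer (hypothetical names, not AMULET3 RTL)."""

    def __init__(self):
        self.entries = deque()   # oldest instruction at the left
        self.registers = {}      # architectural register file

    def allocate(self, dest):
        """Reserve a slot in program order for a result destined for `dest`."""
        entry = {"dest": dest, "value": None, "done": False, "abort": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value, abort=False):
        """Results may arrive in any order (out-of-order completion)."""
        entry["value"] = value
        entry["done"] = True
        entry["abort"] = abort

    def read(self, reg):
        """Forward from the youngest buffered result for `reg`, if any."""
        for entry in reversed(self.entries):
            if entry["dest"] == reg:
                if entry["done"] and not entry["abort"]:
                    return entry["value"]
                # A real pipeline would stall here; the model just signals it.
                raise LookupError(f"result for {reg} not available yet")
        return self.registers.get(reg, 0)

    def writeback(self):
        """Retire completed entries from the head only (in-order update)."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            if entry["abort"]:
                # Abort handled at writeback: discard it and all younger work.
                self.entries.clear()
                break
            self.registers[entry["dest"]] = entry["value"]
            retired.append(entry["dest"])
        return retired
```

In this model a younger instruction that finishes first is visible to `read` immediately but does not reach `registers` until every older entry has retired.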
AMULET3H local bus RAM • segmented memory • I & D ports arbitrate at each block • quad-word I & D line buffers
AMULET3 tools • LARD • behavioural modelling tool for async design • 10x designer productivity vs Asim • Petrify • much enhanced FORCAGE descendant • can handle wider range of circuits • Balsa • synthesis tool used for DMA controller
Tools - LARD • Language for Asynchronous Research and Development • parallel processes with communication primitives • extensive data types • modelling of elapsed time • used to model AMULET3 • available from AMULET web site
Tools - LARD • Features • time view • block view • HLL debug • test generation • co-simulation • Platforms • UNIX/Linux
Tools - BALSA • Synthesis system for asynchronous circuits • similar to Philips ‘Tangram’ • used for AMULET3H DMA controller • direct HLL to netlist compilation • syntax directed translation • peephole optimisation
Tools - Petrify • Petri Net modelling tool • for low-level asynchronous circuits • speed-independent synthesis • technology mapping • very powerful • can be tricky to use • extensively used to design AMULET3 modules
AMULET3 validation • workstations now powerful enough to run ARM validation suite under TimeMill • around 8 CPU-weeks total • testing full functionality now very hard • very complex system-on-chip • design aimed at high performance • timing margins much reduced • validation complex and uncertain
AMULET3 - problems • high performance target • timing margins must be small • timing is hard to verify • very dependent on accurate extraction, models • modelling tools are imperfect • e.g. crosstalk • bus wire delay 1.5ns +/- 1ns crosstalk • careful layout gives 0.9ns +/- 0.15ns • how can we be sure such factors are OK?
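The crosstalk figures on this slide can be turned into a quick worked calculation. The sketch below assumes that the "+/-" values are symmetric bounds on the wire delay and that a bundled-data matched delay must cover the worst case; both are assumptions for illustration, and the function names are hypothetical.

```python
def delay_window(nominal_ns, spread_ns):
    """Best-case and worst-case wire delay for a symmetric +/- spread."""
    return nominal_ns - spread_ns, nominal_ns + spread_ns


def matched_delay_report(nominal_ns, spread_ns):
    """Summarise what a matched delay would have to cover (assumed model)."""
    best, worst = delay_window(nominal_ns, spread_ns)
    return {
        "best_ns": best,
        "worst_ns": worst,
        # Ratio of slowest to fastest arrival: a rough measure of uncertainty.
        "worst_over_best": worst / best,
        # Time lost at nominal conditions if the matched delay is set to worst.
        "slack_at_nominal_ns": worst - nominal_ns,
    }


# Figures quoted on the slide.
default_layout = matched_delay_report(1.5, 1.0)    # 1.5ns +/- 1ns
careful_layout = matched_delay_report(0.9, 0.15)   # 0.9ns +/- 0.15ns

for name, report in (("default", default_layout), ("careful", careful_layout)):
    print(name, report)
```

Under these assumptions the default layout needs a matched delay of about 2.5ns with a 5:1 spread between slowest and fastest arrival, while the careful layout needs about 1.05ns with a spread of roughly 1.4:1.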
The Future • timing accuracy is getting harder • wire delays will become more significant • crosstalk will get worse • on-chip transistor variance will increase • higher speeds will lead to higher noise • will delay-matching be viable? • alternatives are dual-rail or other DI codes • incur significant area and power overheads
Alternatives to bundled data • Delay-insensitive codes • timing encoded in data • dual-rail encoding • 100% area overhead cf. bundled data • significant power cost • e.g. NCL from Theseus • deal just announced with Motorola • use conventional synthesis tools • timing closure ceases to be an issue
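A short sketch of dual-rail encoding shows where the 100% overhead comes from and how timing is carried in the data itself. The wire convention used here (a `(true, false)` pair per bit, with `(0, 0)` as the spacer between codewords) is a common one but is an assumption, as are the function names; the slides do not specify either.

```python
SPACER = (0, 0)  # assumed "no data" state between codewords


def dual_rail_encode(value, width):
    """Encode `width` bits, LSB first, as (true_wire, false_wire) pairs."""
    pairs = []
    for i in range(width):
        bit = (value >> i) & 1
        pairs.append((1, 0) if bit else (0, 1))
    return pairs


def is_complete(pairs):
    """Completion detection: every bit has exactly one of its two wires high."""
    return all(t != f for t, f in pairs)


def dual_rail_decode(pairs):
    """Recover the value once the codeword is complete."""
    if not is_complete(pairs):
        raise ValueError("codeword not yet complete")
    return sum(t << i for i, (t, _f) in enumerate(pairs))


word = dual_rail_encode(0b1010, 4)
wires_used = 2 * len(word)   # 8 wires for 4 bits: 100% more than 4 data wires
```

The receiver needs no separate timing reference: it waits until `is_complete` holds, which is why delay matching and its timing closure problem disappear, at the cost of twice the wires and one transition per bit.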
Alternatives to bundled data • Delay-insensitive codes • N-of-M codes • 3-of-6 code • 50% area overhead • 3 transitions to send 4 bits • 2-of-7 code • 75% area overhead • 2 transitions to send 4 bits • well-suited to inter-chip communication • may suit on-chip buses
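The overhead and transition counts quoted for the N-of-M codes follow from simple counting, sketched below. The overhead is measured against the data wires only (4 wires for 4 bits), which is the assumption that reproduces the slide's figures. The function names and the particular assignment of codewords to values in `build_codebook` are hypothetical; any one-to-one assignment would do.

```python
from itertools import combinations
from math import comb


def code_stats(n, m):
    """Capacity and cost of an n-of-m code (exactly n of m wires go high)."""
    codewords = comb(m, n)               # number of distinct codewords
    bits = codewords.bit_length() - 1    # whole bits carried: floor(log2)
    return {
        "codewords": codewords,
        "bits": bits,
        "wire_overhead": (m - bits) / bits,  # extra wires vs. plain data wires
        "transitions": n,                    # wires that switch per codeword
    }


def build_codebook(n, m, bits):
    """Map each value in range(2**bits) to one n-of-m wire pattern (arbitrary order)."""
    codebook = {}
    for value, high_wires in enumerate(combinations(range(m), n)):
        if value == 1 << bits:
            break  # spare codewords are left unused
        codebook[value] = tuple(1 if w in high_wires else 0 for w in range(m))
    return codebook


three_of_six = code_stats(3, 6)   # 20 codewords, 4 bits, 50% overhead
two_of_seven = code_stats(2, 7)   # 21 codewords, 4 bits, 75% overhead
book = build_codebook(3, 6, three_of_six["bits"])
```

For comparison, dual-rail sends the same 4 bits as four 1-of-2 codes: 8 wires (100% overhead) and 4 transitions, so 2-of-7 halves the switching activity and 3-of-6 halves the wire overhead.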
Conclusions • complex async design is feasible • standard tools are just about survivable • additional tools improve productivity • ideal design flow: • LARD-like specification • formal verification of high-level properties • automated synthesis onto module library • timing closure is the major problem • may ultimately rule out bundled data