The Next Big Leap in Adaptive/Reconfigurable Systems
Bob Plunkett, Director, Product Development
Time for an IC Paradigm Shift
• Cost: design costs of custom silicon dominate the cost equation for low-volume applications
  • Per-part manufacturing cost is increasingly irrelevant
• Fixes: fixing design problems carries significant time and cost penalties
  • Mask sets >$1M
  • Up to 9 months to redo layout and cycle through fab
• Power: power-efficiency improvements of conventional ICs have not kept pace with increasing IC complexity
  • Per-part power consumption is increasing
• Performance gap: standard off-the-shelf silicon does not deliver the performance that modern communication algorithms need
  • e.g. Wideband CDMA systems
Requirements for the Next-Generation IC
• Algorithmic Breadth
  • Ability to handle a wide range of algorithms without modification
• Abstraction
  • Hide details of implementation from developer
• Unified Programming Model
  • Program system as a single unit, using a single tool
• Scalability
  • Allow system to grow from simple to very complex, without changing programming model, without extensively modifying software
• Relocatability
  • Leave details of where an application runs to the operating system
  • Eliminate need for developer to understand other applications running on the same machine
Requirements for the Next-Generation IC
• Forward Compatibility
  • Permit code written for one machine to be carried forward to a new machine, without extensive modification
• Exploit Algorithmic Parallelism
  • Break away from linear programming model of microprocessors and DSPs
  • Allow algorithms to spread out over the hardware, supporting both homogeneous and heterogeneous parallelism
• Exploit clock speed and silicon size improvements
  • Efficiently use silicon rather than adding more silicon
  • Avoid redesign of an implementation and take advantage of faster clocks and smaller parts
• Low-level optimization
  • Provide tools to support low-level optimization when performance is the key criterion for implementation success
Adaptive Computing – 3 Elements
• Silicon IC
  • Scalable
  • Heterogeneous at very low level
  • Adapts clock cycle by clock cycle
• Development tools
  • New SilverC language that bypasses limitations of old sequential programming models
• Operating system
  • Module independence from hardware addresses
  • Manage download of new SilverWare (binary) modules
The Adaptive Computing Machine (ACM) is the first and only all-software-programmable, real-time, adaptive IC that delivers high performance, low power consumption, low cost and architecture flexibility in a single chip.
The All Software Solution: Design for Consumer Products
[Figure: block diagrams comparing the Conventional IC Approach with the ACM Approach. Labels recoverable from the diagram: Power Mgmt., Interleaver, FEC Encoder, Ext I/O, Packet Controller, Flash, ACM, Modulator, RF, System Timer, Audio Codec, Searcher, Vocoder, A/D, D/A, USB, Frequency Synthesizer, Filters, UI / Graphics, SRAM, Channel Coder, Channel Decoder, RAKE Receiver, AGC Algorithm, Protocol Stack, Error Correction, MMU, ASIC, DSP, RISC]
• Power consumption < 87%
• Die size reduces > 60%
• Performance increases ~ 9x
• Design time drops ~ 50%
Applicable where power/cost/flexibility matter!
The Adaptive Computing Machine
Heterogeneous nodes are connected by a homogeneous communications network
• Scalable from one node to thousands
• Point-to-point interconnect for tasks between nodes
• Packet-based
• Abstracted in SilverC language/tools: easy to program
[Figure: nodes attached to an interconnect matrix. Labels: Node, Interconnect Matrix]
Algorithmic Diversity
• Both high and low vector content
• Both streaming and block oriented
• Dataflow
• Complex control
• Linear and non-linear arithmetic
• Bit manipulations
• Variable word sizes: 8, 16, 32 bits and widths in between
• Complex high-speed, finite-state machines
• Memory, gate, bandwidth dominated designs
• Both parallel and sequential dominated algorithmic elements
Diverse algorithms require diverse hardware: heterogeneity
Heterogeneity
[Figure: the node types drawn as blocks; drawing size is relative to silicon area]
• PSN: 0.5 GOPS
• AN2: 2.1 GOPS
• DAN: 4.5 GOPS
• AXN: 11 GOPS
• DFN: 17 GOPS
• DBN: 45 GOPS
Heterogeneity 4 Degrees – Algorithm, Control, Parallelism, Bit Widths
Abstract the Hardware Differences
Diverse architecture simplified by Unified Programming Model from inside Node Wrapper
[Figure: each node type (AN2, DAN, PSN, AXN, DFN, DBN) sits inside a common node wrapper. Labels recoverable from the diagram: Node Wrapper, MIN Input Pipeline, Hardware Task Manager, DMA Engine, Memory, Computing Unit, MIN Output Pipeline]
Abstract Away Connections
Complex connections are abstracted into buffered pipes in a dataflow model
• Abstracted in SilverC language/tools: easy to program
• Programmer only cares about node type: not where, not which one, not which address
• Tasks are relocated through simple assignments
Abstract Away the Adaptability
• SilverC is a system design language for hardware abstraction
• Algorithms are represented as tasks
• Tasks execute in a dataflow model
• Tasks operate asynchronously based on data availability
• Abstraction allows tasks to be easily relocated
• OS hides details of underlying hardware changes & assignments from programmer
[Figure: a task with an input port and an output port, mapped onto an array of nodes]
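The firing rule described here (a task runs whenever its input ports hold data, with no global sequencing) can be pictured with a small Python sketch. This is an analogy, not SilverC and not the ACM runtime: `Pipe`, `Task`, `run` and `demo_firing` are hypothetical names invented for illustration.

```python
from collections import deque


class Pipe:
    """Buffered FIFO joining one task's output port to another's input port."""

    def __init__(self):
        self.buf = deque()


class Task:
    """A task that may fire once every input pipe holds at least one item."""

    def __init__(self, name, fn, inputs, output):
        self.name = name
        self.fn = fn
        self.inputs = inputs
        self.output = output

    def ready(self):
        return all(len(p.buf) > 0 for p in self.inputs)

    def fire(self):
        # Consume one item per input port, produce one item on the output port.
        args = [p.buf.popleft() for p in self.inputs]
        self.output.buf.append(self.fn(*args))


def run(tasks):
    """Fire any ready task until none is ready; data alone drives the order."""
    fired = []
    progress = True
    while progress:
        progress = False
        for task in tasks:
            if task.ready():
                task.fire()
                fired.append(task.name)
                progress = True
    return fired


def demo_firing():
    a, b, out = Pipe(), Pipe(), Pipe()
    adder = Task("add", lambda x, y: x + y, [a, b], out)
    a.buf.extend([1, 2, 3])   # three items waiting on the first port
    b.buf.extend([10, 20])    # only two on the second
    fired = run([adder])
    return list(out.buf), list(a.buf), fired
```

In `demo_firing` the adder fires twice and then stalls, leaving the unmatched `3` buffered in its pipe until more data arrives on the other port.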
Scalability
ACM scales from a few nodes to hundreds of nodes
• Same programming model
• Same application code
Relocatability - SilverC
SilverC Example - 3 Tasks, 2 Links
[Figure: Producer<16> linked to Min<16> linked to Consumer<1>]
• All addressing is symbolic: Linker/Loader handles the details of specific implementations

    void main() {
        Producer<16> myProducer;
        Min<16> myMin;
        Consumer<1> myConsumer;
        link(myProducer.dataOut, myMin.dataIn);
        link(myMin.dataOut, myConsumer.dataIn);
        execute;
    }
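For readers without the SilverC tools, the shape of this three-task, two-link pipeline can be mimicked in Python with generators. The slide does not say what the tasks compute, so the sketch assumes that `Producer<16>` emits blocks of 16 samples, `Min<16>` reduces each block to its minimum, and `Consumer<1>` takes one value at a time. All function names are illustrative stand-ins.

```python
def producer(samples, block_size=16):
    """Stand-in for Producer<16>: yield consecutive full blocks of samples."""
    for start in range(0, len(samples) - block_size + 1, block_size):
        yield samples[start:start + block_size]


def min_task(blocks):
    """Stand-in for Min<16> (assumed behavior): one minimum per input block."""
    for block in blocks:
        yield min(block)


def consumer(values):
    """Stand-in for Consumer<1>: collect values one at a time."""
    return [value for value in values]


def main(samples):
    my_producer = producer(samples)
    my_min = min_task(my_producer)   # plays the role of the first link(...)
    return consumer(my_min)          # plays the role of the second link(...)
```

As in the SilverC listing, `main` names only the tasks and how they connect; nothing in it says where any stage runs.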
Forward Compatibility
• ACM provides forward compatibility through the all-software model
• Hardware assignments done at linking/load time
• Higher clock speeds have no impact on software design
• New capabilities can be introduced through new nodes with no impact to existing designs
• Evolution of adaptive computing will provide for full compilation of applications onto nodes
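Load-time hardware assignment can be pictured as a loader that matches each task's requested node type against whatever nodes the target chip offers. The Python sketch below is a simplified illustration, not the actual ACM linker/loader: `place` is a hypothetical function, it assumes one task per node, and the task-to-node-type pairings and chip layouts are invented for the example.

```python
def place(tasks, nodes):
    """Assign each task to a free node of the type it asks for.

    tasks: list of (task_name, node_type)
    nodes: list of (node_id, node_type) describing one chip
    Returns {task_name: node_id}; raises ValueError if a type runs out.
    """
    free = {}
    for node_id, node_type in nodes:
        free.setdefault(node_type, []).append(node_id)

    placement = {}
    for name, node_type in tasks:
        if not free.get(node_type):
            raise ValueError(f"no free {node_type} node for task {name}")
        placement[name] = free[node_type].pop(0)
    return placement


# Hypothetical application: tasks name a node type, never an address.
APP = [("searcher", "DBN"), ("vocoder", "AN2")]

# Two hypothetical chips with different sizes and layouts.
CHIP_A = [(0, "AN2"), (1, "DBN")]
CHIP_B = [(0, "DBN"), (1, "DBN"), (2, "AN2"), (3, "AN2")]
```

The same `APP` description places onto both chips (`place(APP, CHIP_A)` and `place(APP, CHIP_B)`) with different node numbers each time, which is the sense in which the application code itself does not change.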
Exploit Algorithmic Parallelism
• ACM supports both homogeneous and heterogeneous parallelism
  • Homogeneous parallelism: the same algorithm spread across multiple nodes, which need not be the same node type
  • Heterogeneous parallelism: different algorithms spread across multiple nodes
• Programmer can exploit parallelism up to the number of available nodes
• A task is parallelized homogeneously by creating additional instances of it on multiple nodes; only the upstream feeder and downstream consumer tasks are modified to support the parallelism
• Several nodes have internal structures that support homogeneous parallelism within the node, e.g. AXN, DBN, DFN
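The point that only the feeder and the downstream consumer change when a task is replicated can be sketched as a scatter/gather arrangement. The Python below models the structure only (it runs sequentially and is not SilverC); `feeder`, `worker`, `collector` and `run_parallel` are hypothetical names, and the block-minimum worker is an arbitrary example task.

```python
def feeder(blocks, n_instances):
    """Upstream task: deal blocks round-robin across n_instances lanes."""
    lanes = [[] for _ in range(n_instances)]
    for index, block in enumerate(blocks):
        lanes[index % n_instances].append(block)
    return lanes


def worker(lane):
    """The replicated task; its body is identical for every instance."""
    return [min(block) for block in lane]


def collector(results):
    """Downstream task: interleave lane outputs back into input order."""
    merged = []
    rounds = max((len(r) for r in results), default=0)
    for i in range(rounds):
        for lane_result in results:
            if i < len(lane_result):
                merged.append(lane_result[i])
    return merged


def run_parallel(blocks, n_instances):
    lanes = feeder(blocks, n_instances)
    return collector([worker(lane) for lane in lanes])
```

Changing `n_instances` alters only how `feeder` deals the blocks and how `collector` merges them; `worker` stays untouched and the output order is the same for any instance count.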
Exploit Speed/Silicon Size Improvements
• Node designs behave like processors: faster clocks mean the node finishes its tasks faster
  • Without more tasks to run, a node has more idle time with a faster clock
• Exploit higher clock speeds by adding more tasks to a node; existing tasks remain unmodified
  • New tasks can be moved in from elsewhere in the same application: fewer nodes needed, smaller part
  • New applications: put more capability on an existing design
• Unlike an ASIC or an FPGA, a new design is not required to take advantage of faster silicon
  • Hardware timing does not affect software behavior
  • Software design remains unmodified and ports transparently
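The idle-time argument is simple arithmetic: a node's utilization is the cycles per second its tasks need divided by its clock rate. The Python sketch below works one example; the task names, cycle budgets and clock rates are made-up numbers, not ACM figures, and the model ignores scheduling overhead.

```python
def utilization(tasks, clock_hz):
    """Fraction of a node's cycles used by its resident tasks.

    tasks: {task_name: cycles needed per second}
    """
    return sum(tasks.values()) / clock_hz


def idle_fraction(tasks, clock_hz):
    """Share of the node's cycles left unused."""
    return 1.0 - utilization(tasks, clock_hz)


def can_add(tasks, new_task_cycles, clock_hz):
    """True if one more task fits within the node's cycle budget."""
    return sum(tasks.values()) + new_task_cycles <= clock_hz


# Hypothetical workload: two tasks needing 60M and 30M cycles per second.
TASKS = {"filter": 60_000_000, "agc": 30_000_000}
```

With these numbers the node is 90% busy at 100 MHz and 45% busy at 200 MHz, so a further task needing 100M cycles per second fits only on the faster part, and neither existing task has to change.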
Allow Low-Level Optimization
Node assemblers permit low-level optimization of algorithms
Conclusion
• Adaptive computing sets a new paradigm for IC technologies: all software programmable, highly scalable, high performance, low power consumption, low cost, architecture flexibility
• ACM's algorithmic breadth eliminates the need for discrete uP, DSP, ASIC/FPGA solutions
• ACM merges heterogeneous processing elements under a Unified Programming Model, abstracting many details of the underlying hardware
• The all-software development model provides forward compatibility
  • Efficiently use transistors rather than adding more silicon
  • Exploit clock speed and silicon size improvements
  • Facilitate adjustments for algorithmic parallelism along the way