ECE 506 Reconfigurable Computing http://www.ece.arizona.edu/~ece506 Lecture 1 Course Introduction Ali Akoglu
Background needed for this course • You should be familiar with: • Digital design • Architecture • Controller+Datapath • Memory Hierarchy • Pipelining • More listed in syllabus • Assumes no knowledge of reconfigurable computing: the topic is self-contained! • Reconfigurable computing is a lot more than just devices
Goals • Understanding of issues related to RC (reconfigurable computing) • Architectures • Tools • Design methodologies • Detailed investigation of a specific problem • Research project
Course Organization • 3-5 Homework assignments (30%) • Project (45%) • Exam (15%) • Class participation/attendance (10%) • No required text – readings will be assigned from research papers
Project • Groups • Size to be determined based on enrollment • Likely 3-4 per group • Topic subject to instructor approval • Will give examples • Phase 1: Literature Review (no page limit), Due: Feb 19, In class hard copy! • Phase 2: Project Plan, Due Feb 26, In class hard copy • Phase 3: Presentations, starting March 17 • Phase 4: Final Report and Demo (Last class, In class hard copy)
Reading List • Reading List will be posted ahead of time • Collection of papers and/or tutorials • Discussion Oriented • Participation
General Purpose Computing? • In 1945, John von Neumann showed that a computer could execute any kind of computation, given a properly programmed control, without the need for hardware modification. • This idea quickly became the foundation of future generations of high-speed digital computers. • One reason is its simplicity of programming, which follows the sequential way of human thinking.
General Purpose Computing? • Because all algorithms must be programmed sequentially to run on a von Neumann (VN) computer, many algorithms cannot be executed with their best potential performance.
General Purpose Computing • Advantage: • Flexibility: any well-coded program can be executed • Drawbacks: • Speed: not optimal because of sequential program execution (temporal resource sharing) • Resource efficiency: only part of the hardware is required to execute a given instruction; the rest remains idle • Memory access: memories are about 10 times slower than the processor • These drawbacks are compensated for by high clock speeds, pipelining, caches, instruction prefetching, etc.
Domain Specific Processors • The data path is tailored for optimal execution of a common set of operations that characterizes the algorithms of the domain • Digital signal processors (DSPs) are among the most widely used domain-specific processors, in telecommunication, multimedia, automotive, radar, sonar, seismic, image processing, etc. • Ability to perform one or more multiply-accumulate (MAC) operations in a single cycle. • Special support for efficient looping. • A special loop or repeat instruction allows a loop to be implemented without spending any instruction cycles on updating and testing the loop counter or branching back to the top of the loop. • Customized for data of a given width according to the application domain: in image processing, where pixels are represented in the Red Green Blue (RGB) system with one byte per color, an image-processing DSP does not need more than an 8-bit data path. • Specialization increases performance and improves device utilization. • Flexibility is reduced, because the processor can no longer be used to implement applications other than those for which it was optimally designed.
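The multiply-accumulate operation mentioned above can be illustrated with a minimal Python sketch. The function names `mac` and `dot_product` are hypothetical and chosen only for this illustration. Python models the arithmetic only; on a DSP each `mac` step would map to a single-cycle instruction, and the loop bookkeeping would be handled by the hardware loop support rather than by extra instructions.

```python
def mac(acc, x, y):
    """One multiply-accumulate step: returns acc + x * y."""
    return acc + x * y


def dot_product(coeffs, samples):
    """Sum of coeffs[i] * samples[i], computed with one MAC step per pair."""
    acc = 0
    for c, s in zip(coeffs, samples):
        acc = mac(acc, c, s)
    return acc


if __name__ == "__main__":
    # Three coefficient/sample pairs, so three MAC steps.
    print(dot_product([1, 2, 3], [4, 5, 6]))
```

A dot product of this form is the core of many filtering kernels, which is one reason a single-cycle MAC pays off in signal processing workloads.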
Application Specific Processors • Although DSPs incorporate a degree of application-specific features such as MAC and data width optimization, they still remain sequential machines. • If a processor has to be used for only one application, then the processing unit could be designed and optimized for that particular application. • In multimedia processing, processors are usually designed to perform the compression of video frames according to a video compression standard.
Application Specific Processors • Example: if (a < b) then { d = a+b; c = a*b; } else { d = b+1; c = a-1; } • VN computer: at least 3 instructions, so Exec. time >= 3 * t_instruction • Application-specific circuit: the complete execution is done in parallel in a single clock cycle, so Exec. time = t_clock = delay of the longest path from input to output • The VN computer needs to be clocked at least 3 times faster to reach an equal execution time
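The comparison in the if/else example can be sketched in Python as follows. This is an illustrative model, not a hardware description: the function names and the step counter are assumptions made for the sketch. The sequential version counts one step for the comparison and one for each assignment on the taken branch. The spatial version computes all four candidate results independently and then selects between them, which is roughly how a circuit with parallel arithmetic units and multiplexers would behave.

```python
def sequential_version(a, b):
    """Models the VN computer: returns (d, c, steps executed)."""
    steps = 1  # the comparison a < b
    if a < b:
        d = a + b
        steps += 1
        c = a * b
        steps += 1
    else:
        d = b + 1
        steps += 1
        c = a - 1
        steps += 1
    return d, c, steps


def spatial_version(a, b):
    """Models the dedicated circuit: every unit works on the same inputs."""
    # Four independent arithmetic units and one comparator.
    sum_ab = a + b
    prod_ab = a * b
    b_plus_1 = b + 1
    a_minus_1 = a - 1
    select = a < b
    # Two multiplexers pick the outputs based on the comparator result.
    d = sum_ab if select else b_plus_1
    c = prod_ab if select else a_minus_1
    return d, c


if __name__ == "__main__":
    print(sequential_version(2, 5))
    print(spatial_version(2, 5))
```

Both versions produce the same `d` and `c`. The difference is that the sequential model needs three steps, while in the spatial model the delay is set by the longest path through one arithmetic unit and one multiplexer.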
Implementation Spectrum (Hardware vs. Software) • Computer hardware, such as application-specific integrated circuits (ASICs), • provides highly optimized resources for quickly performing critical tasks, • but is permanently configured to only one application via a multimillion-dollar design and fabrication effort. • Computer software provides the flexibility to change applications and perform a huge number of different tasks, • but is orders of magnitude worse than ASIC implementations in terms of performance, silicon area efficiency, and power usage. • Reconfigurable hardware blends the benefits of both hardware and software: • it implements circuits just like hardware, yet can be reprogrammed cheaply and easily to implement a wide range of tasks. • (Figure: spectrum from Microprocessor through Reconfigurable Hardware to ASIC)
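Processing Approaches, Need for Reconfigurable Computing • (Figure: taxonomy of processing approaches. Top-level labels: Programmable, Reconfigurable, ASIC. Further labels include Special Purpose, General Purpose, CISC, RISC, Data Level Parallelism (SIMD, MIMD), Instruction Level Parallelism (VLIW/Superscalar), and with or without a media-extended ISA.)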
What is Reconfigurable Computing? • Computation using hardware that can adapt at the logic level (post-fabrication) to solve specific problems • A way of implementing circuits without fabricating a device • Spatial structure of the device is modified to match the new application.
Reconfigurable Computing • What is it? • Compute by building a circuit rather than executing instructions. • Efficient for long-running computations • Video and image processing • DSP • Network processing • Example: Z[i] = a*X[i] + b*Y[i] • Program: Load rx, X; Mpy r1, rx, ra; Load ry, Y; Mpy r2, ry, rb; Add r3, r1, r2; Store r3, Z • Circuit (figure): X and Y feed two multipliers (by a and b), whose outputs feed an adder that produces Z • Implement computations spatially, simultaneously computing millions of operations in resources distributed across a silicon chip.
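The contrast between the six-instruction program and the multiplier/adder circuit can be sketched in Python. The tiny interpreter, the instruction tuples, and the function names are assumptions made for illustration; they do not correspond to any real instruction set. `run_sequential` steps through the program one instruction at a time for a single element, while `run_spatial` models a fixed datapath applied to every element, with no instruction fetch involved.

```python
# The slide's program, encoded as (opcode, operands...) tuples.
PROGRAM = [
    ("load", "rx", "X"),
    ("mpy", "r1", "rx", "ra"),
    ("load", "ry", "Y"),
    ("mpy", "r2", "ry", "rb"),
    ("add", "r3", "r1", "r2"),
    ("store", "r3", "Z"),
]


def run_sequential(x, y, a, b):
    """Executes PROGRAM for one element; returns (Z, instructions executed)."""
    memory = {"X": x, "Y": y}
    regs = {"ra": a, "rb": b}
    for instr in PROGRAM:
        op = instr[0]
        if op == "load":
            regs[instr[1]] = memory[instr[2]]
        elif op == "mpy":
            regs[instr[1]] = regs[instr[2]] * regs[instr[3]]
        elif op == "add":
            regs[instr[1]] = regs[instr[2]] + regs[instr[3]]
        elif op == "store":
            memory[instr[2]] = regs[instr[1]]
    return memory["Z"], len(PROGRAM)


def run_spatial(xs, ys, a, b):
    """Models the circuit: two multipliers feeding one adder, per element."""
    return [a * x + b * y for x, y in zip(xs, ys)]


if __name__ == "__main__":
    print(run_sequential(3, 4, 2, 5))
    print(run_spatial([1, 2], [3, 4], 2, 5))
```

In hardware, the spatial datapath could also be replicated or pipelined so that many elements are in flight at once, which is where the speedup for long-running streaming computations comes from.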
Reconfigurable Computing? • Can be hundreds of times faster than microprocessor-based designs • Unlike in ASICs, computations are programmed into the chip, not permanently frozen by the manufacturing process. • An FPGA-based system can be programmed and reprogrammed many times. • Reprogramming can deliver a bug fix to correct faulty behavior or add a new feature. • Reconfigure a generic computation engine for a new task • Reconfigure a device during operation to allow a single piece of silicon to simultaneously do the work of numerous special-purpose chips
Reconfigurable Computing? • Delivering the best of hardware and software? Not quite! • Creating efficient programs for reconfigurable devices is more complex • Useful only for operations that process large streams of data, such as signal processing, networking, and the like. • Compared to ASICs, they may be 5 to 25 times worse in terms of area, delay, and performance. • An ASIC design may take months to years to develop and have a multimillion-dollar price tag • An RC design might take only days to create and cost tens to hundreds of dollars. • For systems that do not require the absolute highest achievable performance or power efficiency, RC is a compelling design alternative.
Reconfigurable Computing? • Current devices can compute functions • on the order of millions of basic gates, • running at speeds in the hundreds of megahertz. • To boost speed and capacity, additional special elements can be embedded, • such as large memories, multipliers, fast-carry logic for arithmetic and logic functions, and even complete microprocessors. • Reconfigurable devices today are capable of implementing complete systems
Reconfigurable Computing Devices: FPGA • Field Programmable Gate Arrays • Logic blocks in a general routing structure. • The array of logic gates is the G and A in FPGA. • Logic blocks perform simple combinational logic as well as sequential logic. • An FPGA can implement very complex circuits. • The logic and routing elements in an FPGA are controlled by programming points. • By way of a configuration file or bitstream, an FPGA can be configured to implement the user’s desired function, • allowing customization at the user’s electronics bench, or even in the final end product. • This is why FPGAs are field programmable.
FPGA • Customizing an FPGA merely involves storing values to memory locations. • Much like compiling and then loading a program onto a computer, creating an FPGA-based circuit is a simple process of creating a bitstream to load into the device.
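The idea that configuring an FPGA means storing values to memory locations can be sketched with a small lookup table model. This is an assumption added for illustration: the slides do not say how a logic block is built, and the `Lut2` class, its methods, and the bit patterns are hypothetical names for this sketch. The four stored bits play the role of programming points, and the list passed to `configure` stands in for a very small bitstream.

```python
class Lut2:
    """Two-input lookup table whose function is set by four stored bits."""

    def __init__(self):
        self.bits = [0, 0, 0, 0]  # unconfigured: always outputs 0

    def configure(self, bits):
        """Stores four configuration bits (a stand-in for a bitstream)."""
        if len(bits) != 4 or any(bit not in (0, 1) for bit in bits):
            raise ValueError("expected four bits, each 0 or 1")
        self.bits = list(bits)

    def evaluate(self, in1, in0):
        """Uses the two inputs as an address into the stored bits."""
        return self.bits[(in1 << 1) | in0]


# Bit patterns indexed by (in1, in0) = 00, 01, 10, 11.
AND_BITS = [0, 0, 0, 1]
XOR_BITS = [0, 1, 1, 0]

if __name__ == "__main__":
    lut = Lut2()
    lut.configure(AND_BITS)
    print([lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)])
    lut.configure(XOR_BITS)  # same block, new function, no new hardware
    print([lut.evaluate(a, b) for a in (0, 1) for b in (0, 1)])
```

The same object computes AND and then XOR without any structural change. Only the stored bits differ, which is the sense in which the device is reprogrammed rather than refabricated.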
Reconfigurable Computing, Function Level Programming? • Because of the FPGA’s dual nature—combining the flexibility of software with the performance of hardware—an FPGA designer must think differently from designers who use other devices. • Software developers typically write sequential programs that exploit a microprocessor’s ability to rapidly step through a series of instructions. • In contrast, a high-quality FPGA design requires thinking about spatial parallelism—that is, simultaneously using multiple resources spread across a chip to yield a huge amount of computation.
Reconfigurable Computing, Function Level Design? • The flexibility of FPGAs gives architects new opportunities generally not available in ASICs • Designs can be rapidly developed and deployed, and even reprogrammed in the field with new functionality. • They do not demand the huge design teams and validation efforts required for ASICs. • Also, the configuration can be changed even while the device is running. • However, because FPGAs are noticeably slower and have lower capacity than ASICs, designers must carefully optimize their design for the target device.
Fields of Application • Rapid Prototyping: Testing hardware before fabrication • Software simulation • Relatively inexpensive • Slow • Accuracy? • Hardware emulation • Hardware testing under real operation conditions • Fast • (Relatively) Accurate • Allows for several iterations
Fields of Application • Post-fabrication Customization • Time to market advantage • Ship the first version of a product • Remote upgrading with new product versions • Remote repairing
Fields of Application • Multi-modal Computation: Reconfigurable vehicles, mobile phones, etc. • Example functions (from figure): built-in digital camera, video phone service, games, Internet, navigation system, emergency, diagnostics, different standards and protocols, monitoring, entertainment
Fields of Application • Adaptive Computing Systems • Computing systems that are able to adapt their behavior and structure to changing operating and environmental conditions, time-varying optimization objectives, and physical constraints like changing protocols, new standards, or dynamically changing operation conditions of technical systems.
Fields of Application • Fault tolerance • Autonomous fault detection on communication lines • Detection of defective nodes • Task migration on node failure • Load balancing computation
Conclusion • 10 years of Moore’s-law progress led to the microprocessor • Raised engineers’ productivity • Problem-solving became programming • Grew to billions of units/year • Further speed gains are no longer expected, because of transistor unreliability and increased variation • Stalled progress in design methods for thirty years • Multi-core designs are already available, but have major problems: • The shared memory model does not scale to hundreds of processors on a chip • The distributed memory model is difficult to program • Power consumption and temperature are further problems • Reconfigurable Processors, Networks, and Memories on a Chip may be the solution…
Hot Reconfigurable Computing Research Areas • Developing power-efficient architectures and CAD techniques for FPGAs • Important new applications for reconfigurable devices (especially embedded applications and security) • Better understanding the role of standard microprocessors and reconfigurable hardware. • Multiple types of parallelism • Coarse-grained reconfigurable architectures • 3D Reconfigurable Architectures • Autonomous Systems • Self-healing