Layered Approach to Intrinsic Evolvable Hardware Using Direct Bitstream Manipulation of Virtex II Pro Devices
Rashad S. Oreifej, Rawad N. Al-Haddad, Heng Tan, and Ronald F. DeMara
University of Central Florida
Evolvable Hardware
[Figure: Evolvable Hardware as the overlap of Intelligent Search (Bayesian, Simulated Annealing, Genetic Algorithms, Nearest Neighbor) and Hardware Design (Amplifiers, Filters, FPGAs, Antennas)]
• Automated construction: develop electronic circuits by intelligent search
• Applications: design, optimization, or failure recovery phases
Evolvable Hardware Applications: FPGAs for evolving digital logic
• GAs frequently use binary strings to represent candidate solutions: the genotype
• Translation to the FPGA configuration bitstream maps genotype to phenotype
[Figure: an individual (chromosome) shown as a binary string, with one gene highlighted]
GAs and Evolution
Genetic Algorithms:
• Implement guided trial-and-error search using principles of Darwinian evolution
• Iterative selection enforces "survival of the fittest"
• Genetic operators (mutation, crossover, ...) can be used to refurbish designs
Extrinsic Evolution (simulation in the loop):
• Functional models abstract physical aspects of the device
• Representation has to undergo placement and routing before implementation
Intrinsic Evolution (hardware in the loop):
• Fitness is measured using physical device output
• Observes constraints imposed by the internal structure
[Figure: two flowcharts. Extrinsic: Genetic Algorithm with a software model in the loop, then "Done?", then "Build it". Intrinsic: Genetic Algorithm with the hardware in the loop.]
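The selection, crossover, and mutation loop described on this slide can be sketched as follows. This is a minimal illustration, not the authors' GA Engine: the function name `evolve`, the tournament size, the elitism step, and all parameter values are assumptions, and the `fitness` callback stands in for whatever measurement (software model or physical device) a real run would use.

```python
import random

def evolve(fitness, n_bits=16, pop_size=20, generations=50,
           p_mut=0.05, seed=0):
    """Evolve binary-string chromosomes; all parameters are illustrative."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        elite = max(pop, key=fitness)
        history.append(fitness(elite))
        nxt = [elite[:]]  # elitism: the best individual always survives
        while len(nxt) < pop_size:
            # Tournament selection: the fitter of two random individuals.
            a = max(rng.sample(pop, 2), key=fitness)
            b = max(rng.sample(pop, 2), key=fitness)
            # One-point crossover.
            cut = rng.randrange(1, n_bits)
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness), history

if __name__ == "__main__":
    # Stand-in fitness: number of 1 bits in the chromosome.
    best, history = evolve(lambda chrom: sum(chrom))
    print(best, history[-1])
```

In an intrinsic setup, the `fitness` callback would download the candidate to the device and score the observed outputs; in an extrinsic setup it would query a software model instead.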
Related Work
• Conventional vs. Evolutionary Design [Miller, Alg. Evol Strat. 98]
  • Presents a GA that can evolve 100% functional adder and multiplier circuits
  • Explored the effect of the device physical constraints (Xilinx 6216 FPGA)
  • Emphasized EH feasibility over FPGA implementation concerns
• Fitness-based vs. Population-based Evolution [Keymeulen, IEEE Trans. Rel 02]
  • Design of fault-insensitive electronic components using evolutionary techniques
  • Online and offline repair techniques via an intrinsic design tool (EHWPack)
  • A fine-grained CMOS Field Programmable Transistor Array (FPTA) architecture is used to evolve an analog multiplier and a digital XNOR
• Intrinsic EHW on Virtex Devices [Hollingworth, ICES00]
  • Evolution by partial reconfiguration of the bitstream for changes from a baseline circuit
  • Runtime configuration using Xilinx's JBits interface (Java in the loop)
• Recent General-purpose Frameworks Support Bitstream Reuse
  • Blodget et al. [Blodget FPL03]: two-layer framework for Virtex II devices using the Xilinx Partial Reconfiguration Toolkit (XPART), utilizing a soft processor core within the FPGA
  • Williams et al. [Williams ERSA04]: Egret focuses on a full SOC solution using ICAP and an embedded Linux system on a Xilinx Virtex II chip, with bash shell scripts performing operations such as obtaining partial bitstreams from remote servers and initiating reconfiguration
  • Kalte et al. [Kalte PDPS05]: the REPLICA (Relocation per online Configuration Alteration) filter uses the SelectMAP interface to manipulate the bitstream and carry out relocation during the regular download process
UCF Intrinsic Evolution Platform
The developed platform uses the following hardware components on the FPGA chip:
• JTAG (IEEE 1149.1) Port
  • Half-duplex serial communication interface
  • Connects to the General-purpose Native jtAg Tester (GNAT) on the FPGA side, and to the parallel port (IEEE 1284) of the host PC through a Xilinx Parallel Cable
  • Carries the input/output data exchanged between the host PC and the FPGA
• GNAT
  • Implemented in the bitstream to reside in the reconfigurable area
  • Connects to the BSCAN_VIRTEX2 block via the Test Data Input (TDI), Test Data Output (TDO), and control signals, and to the targeted circuit via a straightforward read/write bus interface
• Evolved Circuit
  • Circuit to be evolved on the FPGA chip
  • Circuit peripherals are connected to the read/write bus of the GNAT to receive input data and deliver output data
UCF Platform Software Components
The developed platform consists of the following software components:
• GA Engine
  • C++-based console application implemented using an object-oriented architecture
  • Implements a conventional population-based GA with runtime-customizable parameters
• Chromosome Manipulator
  • C-based GA operators library (although executed using Visual Studio .NET)
  • Provides the GA Engine module with a logical abstraction of the genetic operators that hides the hardware
• MRRA (Multilayer Runtime Reconfiguration Architecture)
  • Partitions operations into Logic, Translation, and Reconfiguration layers with a standardized set of APIs
  • FPGA configurations are manipulated at runtime using on-chip resources on the Xilinx Virtex II Pro via PC (JTAG) or PowerPC (SelectMAP)
• Bitstream File
  • Pre-compiled baseline bitstream generated using the Xilinx CAD tools
  • The platform manipulates this bitstream to carry out the physical mapping of crossover or mutation
Intrinsic Evolution Workflow
• START: module-based flow
• Iterate: frame-based flow
Multilayer Runtime Reconfiguration Architecture (MRRA): Framework for Dynamic Reconfiguration
• Three layers (Logic, Translation, and Reconfiguration) with well-defined interfaces promote modularity and reuse through a set of high-level APIs that carry out the partial reconfiguration process with reduced manual intervention.
• Task-level Modularity: provides support at levels down to and including task-level granularity. A task is defined as an arbitrary function synthesized to a module that can be dynamically downloaded into the reconfigurable device, either Module-based or Frame-based.
• Runtime Scenario Support: provides the ability to generate and reconfigure task bitstreams at runtime as well as at design time. At design time it is not necessarily known which tasks will arrive, when they will arrive, or, in selected cases, what some of their specific properties will be.
• Encapsulation: the control logic of each layer is self-contained, with a fixed interface to the other layers. If new control algorithms are added or the device platform is changed, the system can be ported more readily.
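The layering and encapsulation idea can be illustrated with a small sketch. Everything here is hypothetical and chosen only to show the separation of concerns: the class and method names, the `LutRequest` type, and in particular the byte-offset formula in `TranslationLayer`, which is a placeholder and not the Virtex II Pro addressing scheme. The Reconfiguration layer writes to an in-memory image where a real one would drive JTAG or SelectMAP.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LutRequest:
    """Hardware-independent request issued by the Logic layer."""
    slice_x: int
    slice_y: int
    truth_table: int  # 16-bit LUT contents

class TranslationLayer:
    """Turns a request into (byte offset, bytes). Placeholder mapping only."""
    def __init__(self, rows, bytes_per_slice=2):
        self.rows = rows
        self.bytes_per_slice = bytes_per_slice

    def translate(self, req):
        # Hypothetical linear layout; the real frame addressing differs.
        offset = (req.slice_x * self.rows + req.slice_y) * self.bytes_per_slice
        return offset, req.truth_table.to_bytes(2, "big")

class ReconfigurationLayer:
    """Applies byte edits to an in-memory image standing in for the device."""
    def __init__(self, image):
        self.image = image

    def write(self, offset, data):
        self.image[offset:offset + len(data)] = data

class LogicLayer:
    """Exposes device-independent operations to the GA."""
    def __init__(self, translation, reconfiguration):
        self.translation = translation
        self.reconfiguration = reconfiguration

    def set_lut(self, x, y, truth_table):
        offset, data = self.translation.translate(LutRequest(x, y, truth_table))
        self.reconfiguration.write(offset, data)

if __name__ == "__main__":
    image = bytearray(64)
    logic = LogicLayer(TranslationLayer(rows=4), ReconfigurationLayer(image))
    logic.set_lut(1, 2, 0x8E8E)
    print(image.hex())
```

Because the Logic layer only sees `set_lut`, swapping the device or the transport means replacing the lower two classes without touching the search code.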
MRRA Logic Control Flow
The flow adopts and integrates the Module-based Flow from the standard Xilinx flow, adding selected area-management ability and a direct bit-management process that we term the Frame-based Flow. The Module-based Flow is used at design time. Later, the translation engine supports autonomous reconfiguration without a GUI.
• One-Dimensional Area Management is performed on the full physical FPGA device by partitioning it into one-dimensional, column-based rectangles for fixed and reconfigurable modules, arranged according to size and specified area constraints. Tools such as PlanAhead are accommodated.
• Bus Macros maintain correct connections between modules by spanning the boundaries of these rectangular regions. Next, the modules are implemented and verified individually to create the Module Implementation, and are optimized by additional Two-Dimensional Area Allocation placements inside each module to minimize the partial reconfiguration bitstream size.
• After the initial bitstream download, precompiled partial bitstreams can be monitored by algorithms in the Logic Layer and updated directly on the device for dynamic reconfiguration. New modification requests can be generated by the user logic in a hardware-independent representation, as depicted by the Runtime Flow. Although the boundary of a module is fixed, the physical logic resources inside it can be modified at runtime.
Direct Bitstream Manipulation: Concept and Case Study
[Figure: adder/subtracter block with inputs X, Y, Cin/Bin and outputs S/D, Cout/Bout; LUT contents 0x96 (S and D) and 0xE8 / 0x8E (Cout / Bout); label "Logic Switch"]

(a) 1-Bit Full Adder        (b) 1-Bit Full Subtracter
X  Y  Cin | Cout  S         X  Y  Bin | Bout  D
0  0  0   | 0     0         0  0  0   | 0     0
0  0  1   | 0     1         0  0  1   | 1     1
0  1  0   | 0     1         0  1  0   | 1     1
0  1  1   | 1     0         0  1  1   | 1     0
1  0  0   | 0     1         1  0  0   | 0     1
1  0  1   | 1     0         1  0  1   | 0     0
1  1  0   | 1     0         1  1  0   | 0     0
1  1  1   | 1     1         1  1  1   | 1     1

• Change a one-bit full adder to a one-bit full subtracter
• Both have three one-bit inputs and two one-bit outputs, and use 2 LUTs with identical logic connections between the LUTs and the I/O signals
• The only difference is the truth table stored inside one LUT, which changes from 0xE8 to 0x8E
• Practical case study: dynamically reconfigurable SHA-1/MD5 Message Digest hashing algorithms
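The hex constants in this case study can be checked by packing each truth-table column into an integer. The sketch below assumes a bit ordering in which X is the most significant address bit and Cin/Bin the least significant, so that bit i of the constant is the output for input row i; the helper name `truth_table` is illustrative.

```python
def truth_table(fn, n_inputs=3):
    """Pack fn's outputs into an int; bit i is the output for input row i."""
    value = 0
    for i in range(2 ** n_inputs):
        # Row index i read as (X, Y, Cin) with X as the most significant bit.
        bits = [(i >> (n_inputs - 1 - k)) & 1 for k in range(n_inputs)]
        value |= (fn(*bits) & 1) << i
    return value

def carry(x, y, c):
    """Carry-out of a full adder (majority function)."""
    return (x & y) | (x & c) | (y & c)

def borrow(x, y, b):
    """Borrow-out of a full subtracter computing X - Y - Bin."""
    nx = 1 - x
    return (nx & y) | (nx & b) | (y & b)

def sum_or_diff(x, y, c):
    """Sum and difference outputs are both the three-input XOR."""
    return x ^ y ^ c

if __name__ == "__main__":
    print(hex(truth_table(carry)))        # 0xe8
    print(hex(truth_table(borrow)))       # 0x8e
    print(hex(truth_table(sum_or_diff)))  # 0x96
```

Under this ordering the two carry/borrow constants differ in four bit positions (0xE8 XOR 0x8E = 0x66), while the 0x96 LUT is shared by both circuits.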
Direct Bitstream Management: Equations deduced to locate logic content in the V2Pro bitstream
• Each CLB has 4 slices arranged in 2 columns and 2 rows, named XiYj, where i is the slice column number counted from the left, 0 <= i <= 2N-1, and N is the number of CLB columns; j is the slice row number counted from bottom to top, 0 <= j <= 2K-1, and K is the number of CLB rows, e.g. XC2VP7: N=40, K=34
• Each configuration frame has a unique 32-bit address made up of a Block Address (BA), a Major Address (MJA), a Minor Address (MNA), and a byte number offset
• Let X denote column and overhead include GCLK + leftmost IOB + IOI col (e.g. 3):
• Full configuration file: organized consecutively by frame without labeling:
• In the 5 bytes of a slice, the first 16 bits hold the G-LUT truth table (left to right as MSB to LSB) and the last 16 bits hold the F-LUT truth table (in reverse order, from LSB to MSB). Each LUT has at most 4 inputs and therefore up to 16 truth table entries; when fewer than 4 inputs are used, the remaining unused entries are filled with duplicates of the effective values of the used entries:
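The LUT encoding rules in the last bullet (duplication of unused entries, MSB-first G-LUT, bit-reversed F-LUT) can be sketched as below. This covers only the 16-bit fields, not the frame-address equations. The function names are illustrative, and the sketch assumes the unused inputs are the high-order address lines, so duplication is a simple repetition of the smaller table.

```python
def expand_lut(tt, used_inputs):
    """Fill all 16 entries by repeating a 2**used_inputs-entry truth table.

    Assumes the unused inputs are the high-order address bits.
    """
    entries = 1 << used_inputs
    tt &= (1 << entries) - 1
    out = 0
    for k in range(16 // entries):
        out |= tt << (k * entries)
    return out

def reverse16(value):
    """Reverse the bit order of a 16-bit value."""
    return int(f"{value:016b}"[::-1], 2)

def slice_lut_fields(g_tt, f_tt):
    """Return the two 16-bit fields as bytes.

    G-LUT is stored MSB first; F-LUT is stored in reverse bit order.
    """
    return g_tt.to_bytes(2, "big"), reverse16(f_tt).to_bytes(2, "big")

if __name__ == "__main__":
    adder_carry = expand_lut(0xE8, used_inputs=3)
    sub_borrow = expand_lut(0x8E, used_inputs=3)
    print(hex(adder_carry), hex(sub_borrow))
    print([field.hex() for field in slice_lut_fields(adder_carry, sub_borrow)])
```

With three used inputs, the adder's carry table 0xE8 expands to 0xE8E8; whether it is then written as-is or bit-reversed depends on whether it lives in the G-LUT or the F-LUT.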
Stuck-at Zero and One Fault Modeling
[Figure label: LUT address]
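One common way to model a stuck-at fault on a LUT address line is to compute the truth table the circuit effectively sees when that line is forced to 0 or 1. The sketch below shows that general technique; the slide does not preserve how the authors injected faults, so the function `stuck_at` and its conventions (line 0 is the least significant address bit) are assumptions.

```python
def stuck_at(tt, line, value, n_inputs=4):
    """Effective truth table when address `line` is stuck at `value`.

    tt: packed truth table, bit a = output for address a.
    line: index of the faulty address bit (0 = least significant).
    value: 0 for stuck-at-zero, 1 for stuck-at-one.
    """
    mask = 1 << line
    out = 0
    for addr in range(1 << n_inputs):
        # The LUT is always read at the address with the faulty bit forced.
        forced = (addr | mask) if value else (addr & ~mask)
        out |= ((tt >> forced) & 1) << addr
    return out

if __name__ == "__main__":
    # Full-adder carry (0xE8) with the carry-in line (bit 0) faulty.
    print(hex(stuck_at(0xE8, line=0, value=0, n_inputs=3)))
    print(hex(stuck_at(0xE8, line=0, value=1, n_inputs=3)))
```

For the adder's carry LUT, a carry-in stuck at 0 reduces the majority function to X AND Y, and a carry-in stuck at 1 reduces it to X OR Y. A fault on an unused address line leaves a duplicated truth table unchanged.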
Performance Metrics
• The numerical measure of fitness for the best individual of the final generation, e.g. 2^8 input combinations (two 4-bit inputs) × 5 output bits = 1280
• The arithmetic mean of the fitness of all individuals in the final generation of the run
• The total number of generations in the run
• The time elapsed to perform the GA crossover and mutation during the entire run
• The time elapsed to apply the input patterns and read back the corresponding outputs for all the fitness evaluations during the entire run
• The average time taken by a single genetic crossover for a certain GA run
• The average time taken by a single genetic mutation for a certain GA run
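The maximum-fitness example (1280) corresponds to counting correct output bits over every input combination. The sketch below assumes the target is a 4-bit adder with a 5-bit result, which matches the stated input and output widths but is not named on the slide; `circuit` is a hypothetical callable standing in for reading outputs back from the device.

```python
def max_fitness(n_input_bits, n_output_bits):
    """Number of output bits checked across all input combinations."""
    return (2 ** n_input_bits) * n_output_bits

def fitness(circuit, reference, width=4, out_bits=5):
    """Count output bits that match the reference over all input pairs."""
    mask = (1 << out_bits) - 1
    score = 0
    for a in range(2 ** width):
        for b in range(2 ** width):
            wrong = (circuit(a, b) ^ reference(a, b)) & mask
            score += out_bits - bin(wrong).count("1")
    return score

def adder(a, b):
    """Assumed reference behaviour: 4-bit + 4-bit gives a 5-bit sum."""
    return a + b

def adder_bad_msb(a, b):
    """Example faulty circuit whose top output bit is always inverted."""
    return (a + b) ^ 0x10

if __name__ == "__main__":
    print(max_fitness(8, 5))              # 1280
    print(fitness(adder, adder))          # 1280
    print(fitness(adder_bad_msb, adder))  # 1024
```

A circuit that gets one of the five output bits wrong on every pattern loses 256 points, which illustrates how partial correctness is rewarded during evolution.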
Experimental Results Summary
[Table not preserved; surviving annotations: "Fastest convergence", "Repair must overcome failed resource limitation", "Microsecond Order"]
Circuit Evolution: Fitness vs. Time
[Figure: fitness vs. time plots for three experiments: Unseeded Design, Seeded Design, and Repair (Stuck-at Fault)]
Results Summary
• An intrinsic evolution platform was developed for genetic operators and fitness assessment, using API layers that directly manipulate the configuration bitstream on Xilinx Virtex II Pro devices
• Three experiments were conducted: unseeded design, seeded design, and repair
• Full design/repair is achievable on this platform, with an average time of 0.4 microseconds per genetic mutation, 0.7 microseconds per genetic crossover, and 5.6 milliseconds for the intrinsic evaluation of one input pattern
• For realizing intrinsic genetic operators on a Virtex II Pro device, this is a performance advantage of three orders of magnitude over JBits (millisecond order) and more than seven orders of magnitude over the Xilinx design tool driven flow (multiple seconds)
• Current work is on utilizing partial reconfiguration to reduce JTAG transfer time and on porting to the Virtex-4 platform
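A rough time budget built from the reported averages shows why the current work targets JTAG transfer time. The calculation below assumes each individual is evaluated on 256 input patterns (the two 4-bit input example from the Performance Metrics slide) and receives one crossover and one mutation; both counts are assumptions for illustration, not figures from the experiments.

```python
MUTATION_S = 0.4e-6          # reported average per genetic mutation
CROSSOVER_S = 0.7e-6         # reported average per genetic crossover
EVAL_PER_PATTERN_S = 5.6e-3  # reported time per input-pattern evaluation

def time_per_individual(patterns=256, crossovers=1, mutations=1):
    """Return (operator time, evaluation time) in seconds per individual."""
    operators = crossovers * CROSSOVER_S + mutations * MUTATION_S
    evaluation = patterns * EVAL_PER_PATTERN_S
    return operators, evaluation

if __name__ == "__main__":
    ops, evaluation = time_per_individual()
    print(f"operators: {ops * 1e6:.1f} us, evaluation: {evaluation:.4f} s")
    print(f"evaluation / operators = {evaluation / ops:.2e}")
```

Under these assumptions the evaluation phase takes about 1.43 seconds per individual and outweighs the genetic operators by roughly six orders of magnitude.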
References
[1] S. Vigander, "Evolutionary Fault Repair in Space Applications," Master's thesis, Dept. of Computer & Information Science, Norwegian University of Science and Technology (NTNU), Trondheim, 2001.
[2] J. F. Miller, P. Thomson, and T. Fogarty, "Designing Electronic Circuits Using Evolutionary Algorithms. Arithmetic Circuits: A Case Study," in Algorithms and Evolution Strategy in Engineering and Computer Science, D. Quagliarella, J. Periaux, C. Poloni, and G. Winter, Eds. Chichester, England, 1998, pp. 105-131.
[3] D. Keymeulen, R. S. Zebulum, Y. Jin, and A. Stoica, "Fault-Tolerant Evolvable Hardware Using Field-Programmable Transistor Arrays," IEEE Transactions on Reliability, vol. 49, issue 3, September 2000.
[4] R. S. Oreifej, C. A. Sharma, and R. F. DeMara, "Expediting GA-Based Evolution Using Group Testing Techniques for Reconfigurable Hardware," in Proc. International Conference on Reconfigurable Computing and FPGAs (Reconfig'06), San Luis Potosi, Mexico, September 20-22, 2006, pp. 106-113.
[5] R. F. DeMara and K. Zhang, "Autonomous FPGA Fault Handling through Competitive Runtime Reconfiguration," in Proc. of the NASA/DoD Conference on Evolvable Hardware (EH'05), Washington D.C., U.S.A., June 29-July 1, 2005.
[6] G. Hollingworth, S. Smith, and A. Tyrrell, "The Intrinsic Evolution of Virtex Devices Through Internet Reconfigurable Logic," in Proc. of the Third International Conference on Evolvable Systems, April 2000.
[7] H. Tan and R. F. DeMara, "A Device-Controlled Dynamic Configuration Framework Supporting Heterogeneous Resource Management," in Proc. of the International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA'05), Las Vegas, Nevada, U.S.A., June 27-30, 2005.
[8] D. Wallace, "Using the JTAG Interface as a General-Purpose Communication Port," www.xilinx.com/publications/xcellonline/xcell_53/xc_pdf/xc_jtag53.pdf, 2005.
[9] Xilinx, "Parallel Cable IV Connects Faster and Better," Xcell Journal, Spring 2002.
[10] Xilinx, "Using a Microprocessor to Configure Xilinx FPGAs via Slave Serial or SelectMAP Mode," v1.4, November 2003.
[11] B. Blodget, P. James-Roxby, E. Keller, S. McMillan, and P. Sundararajan, "A Self-Reconfiguring Platform," in Proceedings of Field-Programmable Logic and Applications 2003, Lisbon, Portugal, September 1-3, 2003.
[12] J. Williams and N. Bergmann, "Embedded Linux as a Platform for Dynamically Self-Reconfiguring Systems-On-Chip," in Proceedings of Engineering of Reconfigurable Systems and Algorithms (ERSA 2004), Las Vegas, Nevada, U.S.A., June 21-24, 2004.
[13] H. Kalte, G. Lee, M. Porrmann, and U. Ruckert, "REPLICA: A Bitstream Manipulation Filter for Module Relocation in Partial Reconfigurable Systems," in Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, Denver, Colorado, U.S.A., April 4-8, 2005.