A Novel High-level Dynamic Hardware-Software Remapping Technique for Mission Critical Reconfigurable Computers Luis E. Cordova, Duncan A. Buell Cordova
Outline
• Problem and definitions
• Motivation
• Architecture
• N Techniques
• Advantages
• Disadvantages
• Lessons learned

Problem and Definitions
1. Reconfigurable computers (RCs) are built from FPGAs + CPUs + memories, etc.
• RCs are general-purpose embedded platforms
• RCs are used to accelerate scientific applications
Problem:
• Achieve fault tolerance over heterogeneous hardware
• An RC requires knowledge of where the electronics inside the satellite is used, its orbit, for how long, and which direction it is facing
• The solution is adaptability
Notes: There are more FPGAs than microprocessors in an RC (examples: SRC, BEE, etc.). This is the chicken-and-egg dilemma: "SRAM-based FPGAs are less reliable than microprocessors", but they are reconfigurable.

Motivation
• Ground tracking of the LEO orbit of the CEASE device
[Figure: ground track plot. Source: [Amptek03]]

Case: SRC Hardware Architecture
[Figure: block diagram of the SRC platform. Labeled components: two P4 (2.8 GHz) processors with L2 caches, MIOC, computer memory (8 GB), PCI-X, SNAP and uP interface board, control FPGA (XC2V6000), on-board memory (24 MB), user FPGA 1 and FPGA 2 (XC2V6000), and chain ports. Annotated rates include a maximum payload rate of 1400 MB/s and 2400 MB/s for each chain port.]
Source: [SRC]

Fault Tolerance Techniques
• Dynamic FPGA-HOST remapping
• Dynamic FPGA-FPGA remapping
• FPGA Checkpointing
• System-level radiation tolerance
• Sanity checks with golden copy
• Streaming Heart Beat Signal
• Redundancy-based Data integrity
• Control Flow Tolerance
• Memory scrubbing
• Hardware-Software Backup Threading
• Dynamic Spatial Radiation Tolerance
• HW-HW and HW-SW injection
Categories shown on the slide: Remapping and Recovery, Monitoring, Protection, Profiling

System Dynamic Remapping
Dynamic redistribution between uProc and FPGAs:
• Radiation environment (SETs/SEUs)
• Program code: main( ) { comp 1 … comp N } // end
• Faults on uProc are handled by other methods
• Faults on the FPGA side (comp 4, comp 7; User FPGA 1, User FPGA 2)
• Speedup demand of the computation
Trade-offs:
- parallelism
- tolerance
- FPGA resources

Static Host-FPGA Mapping
[Figure: Hybrid Computer Under Test (HCUT), split into a Processor side and a MAP reconfigurable fabric side. Labeled blocks: main ( ), parity & check ( ), saboteur ( ), check & parity ( ), saboteur ( ), map_function ( ), self_repair ( ), diagnose ( ), BRAM, OBM.]

Remapping and Monitoring
Hierarchical tolerance:
• uProc level
• MAP level
• RTL level
[Figure: a Host RadHard uProc connected to Chip 1 and Chip 2 through on-board memory (OBM) and a bridge. Labels: dynamic remapping between uProc and FPGAs, dynamic remapping between FPGAs, streaming heart beat, bridge heart beat.]

Top level Remapping
Levels 1, 2 and 3 run from more to less hardware functionality mapped.
    /* a nonzero return from mapIt() is treated as a failed mapping */
    if (mapIt(mapnum1)) {
        fprintf(stdout, "Hybrid level 1 failed!\n");
        fprintf(stdout, "Entering hybrid level 2.\n");
        if (mapIt(mapnum2)) {
            fprintf(stdout, "Hybrid level 2 failed!\n");
            fprintf(stdout, "Entering level 3 (full software).\n");
            /* Computation on Software */
            computeInSoftware(A, B, C, D);
        } else {
            user2(n, A, B, C, D, &time, 0);
        }
    } else {
        user1(n, A, B, C, D, &time, 0);
    }

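The three-level fallback can be tried on a host without any FPGA attached. The following is a minimal sketch, assuming that `mapIt` returns nonzero on failure as its use on the slide suggests. The `map_available` table, the `level` enum and `select_level` are hypothetical names introduced for illustration and are not part of the SRC programming environment.

```c
#include <stdio.h>

/* Hypothetical availability table, indexed by map number (index 0 unused).
 * 1 means the configuration loads, 0 means it fails. */
int map_available[3] = {0, 0, 1};

/* Hypothetical stand-in for the platform call: 0 on success, nonzero on failure. */
int mapIt(int mapnum)
{
    return map_available[mapnum] ? 0 : 1;
}

enum level { LEVEL_HYBRID_1 = 1, LEVEL_HYBRID_2 = 2, LEVEL_SOFTWARE = 3 };

/* Walk the levels from most to least hardware functionality mapped. */
enum level select_level(int mapnum1, int mapnum2)
{
    if (mapIt(mapnum1) == 0)
        return LEVEL_HYBRID_1;
    fprintf(stdout, "Hybrid level 1 failed!\n");

    if (mapIt(mapnum2) == 0)
        return LEVEL_HYBRID_2;
    fprintf(stdout, "Hybrid level 2 failed!\n");

    return LEVEL_SOFTWARE;   /* full software computation */
}
```

With the table as initialized, `select_level(1, 2)` falls through to hybrid level 2; clearing `map_available[2]` as well sends the computation to software.
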
Block RAM ‘Flip-Flop’ Scrubbing
    // computation
    for (i = 0; i < n; i++) {
        tmr_in = al[i];
        saboteur = bl[i];   // reading input stream
        // Block RAM Scrubbing Technique
        if (i % 2) {
            bram_rw = scrubb_flip[i];   // parity check flag error
            scrubb_flop[i] = bram_rw;
        } else {
            bram_rw = scrubb_flop[i];   // parity check flag error
            scrubb_flip[i] = bram_rw;
        }
        // bram_rw is used later on
        ...
[Figure: two Block RAMs, "Scrubb flip" and "Scrubb flop", alternate read and write roles on even and odd iterations; parity bits are checked on each read, and the word read becomes the next bram_rw.]

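The slide only marks where the parity check happens. A minimal host-side sketch of one scrubbing step follows, assuming one even-parity bit per 64-bit word. The `bram_word` struct, `parity64` and `scrub_step` are hypothetical names, and skipping the write-back on a mismatch is a choice made for this sketch, not something the slide specifies.

```c
#include <stdint.h>
#include <stddef.h>

/* Returns 1 when the number of set bits in w is odd, i.e. the even-parity bit. */
int parity64(uint64_t w)
{
    int p = 0;
    while (w) {
        p ^= 1;
        w &= w - 1;   /* clear the lowest set bit */
    }
    return p;
}

/* One stored word plus the parity bit recorded when it was written. */
struct bram_word {
    uint64_t data;
    int parity;
};

/* Read word i from one buffer, check its parity, and rewrite it into the other.
 * Odd i reads flip and writes flop; even i does the opposite, as on the slide.
 * Returns the word read; *error is set to 1 on a parity mismatch, in which
 * case the destination copy is left untouched. */
uint64_t scrub_step(struct bram_word *flip, struct bram_word *flop,
                    size_t i, int *error)
{
    struct bram_word *src = (i % 2) ? flip : flop;
    struct bram_word *dst = (i % 2) ? flop : flip;

    uint64_t bram_rw = src[i].data;
    *error = (parity64(bram_rw) != src[i].parity);

    if (!*error) {
        dst[i].data = bram_rw;
        dst[i].parity = src[i].parity;
    }
    return bram_rw;
}
```

Leaving the destination untouched on a mismatch keeps the last good copy available, at the cost of the two buffers diverging until the fault is handled.
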
Hardware-Hardware & Software-Hardware Fault Injection
    // datapath level module redundancy -- DPLMR
    result_1 = tmr_in * bram_rw + (saboteur & 16LL);
    result_2 = tmr_in * bram_rw + (saboteur & 8LL);
    result_3 = tmr_in * bram_rw + (saboteur & 4LL);
    result_4 = tmr_in * bram_rw + (saboteur & 2LL);
    ...
• Hardware-Hardware: the saboteur is generated in hardware (LFSR = linear feedback shift register)
• Software-Hardware: the saboteur is read from the input stream (recall previous slide)
    saboteur = bl[i];   // reading input stream
[Figure: redundant data paths 1 to N. Each data path k multiplies tmr_in by bram_rw and adds the saboteur bit for k to produce result_k.]

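The masking scheme means that each saboteur bit perturbs exactly one redundant data path. A minimal sketch with four paths and the masks from the slide follows; the function name `dplmr`, the `N_PATHS` constant and the array form are assumptions made for illustration.

```c
#include <stdint.h>

#define N_PATHS 4

/* Compute the same product on N_PATHS redundant data paths. Path k adds the
 * saboteur bit selected by its own mask (16, 8, 4, 2 as on the slide), so a
 * zero saboteur leaves all paths equal and one set bit perturbs one path. */
void dplmr(int64_t tmr_in, int64_t bram_rw, int64_t saboteur,
           int64_t result[N_PATHS])
{
    static const int64_t mask[N_PATHS] = {16, 8, 4, 2};

    for (int k = 0; k < N_PATHS; k++)
        result[k] = tmr_in * bram_rw + (saboteur & mask[k]);
}
```

For example, with `tmr_in = 3`, `bram_rw = 5` and `saboteur = 8`, only the second path deviates from 15.
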
Dynamic Spatial Radiation Hardening 1
    // accept the first pair of enabled data paths whose results agree
    if ((result_1 == result_2) && (en_hub1 == 1) && (en_hub2 == 1)) {
        final_result = result_1; mul_diagnose_opt = 12;
    } else if ((result_2 == result_3) && (en_hub2 == 1) && (en_hub3 == 1)) {
        final_result = result_2; mul_diagnose_opt = 23;
    } else if ((result_3 == result_4) && (en_hub3 == 1) && (en_hub4 == 1)) {
        final_result = result_3; mul_diagnose_opt = 34;
    } else if ((result_4 == result_5) && (en_hub4 == 1) && (en_hub5 == 1)) {
        final_result = result_4; mul_diagnose_opt = 45;
    } else if ((result_5 == result_1) && (en_hub5 == 1) && (en_hub1 == 1)) {
        final_result = result_5; mul_diagnose_opt = 51;
    } else {
        final_result = result_5; mul_diagnose_opt = 55;
    }
[Figure: data paths 1 to N feed an Enabling Hub, which passes result_A, result_B and result_C to the Voting block; Voting produces final_result and the Multi-diagnose Option.]

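The chain of comparisons follows a regular pattern: pair (k, k+1) wraps around at N and is accepted only if both paths are enabled. A minimal sketch of the same voter written as a loop follows, assuming the results and enable flags are held in arrays; `vote` and `N_PATHS` are hypothetical names introduced here.

```c
#include <stdint.h>

#define N_PATHS 5

/* Pairwise voter over N_PATHS redundant results. Pair (k, k+1 mod N) is
 * accepted when both values agree and both paths are enabled. The diagnose
 * code follows the slide: 12, 23, 34, 45, 51, and 55 when no pair agrees. */
int64_t vote(const int64_t result[N_PATHS], const int en_hub[N_PATHS],
             int *mul_diagnose_opt)
{
    for (int k = 0; k < N_PATHS; k++) {
        int next = (k + 1) % N_PATHS;
        if (result[k] == result[next] && en_hub[k] == 1 && en_hub[next] == 1) {
            *mul_diagnose_opt = (k + 1) * 10 + (next + 1);
            return result[k];
        }
    }
    *mul_diagnose_opt = 55;
    return result[N_PATHS - 1];   /* same default as the final else on the slide */
}
```
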
Dynamic Spatial Radiation Hardening 2
    // on the next iteration, enable/disable redundant datapaths circularly
    if (temp_mul_diagnose_opt != mul_diagnose_opt) {
        temp_v  = en_hub5;
        en_hub5 = en_hub4;
        en_hub4 = en_hub3;
        en_hub3 = en_hub2;
        en_hub2 = en_hub1;
        en_hub1 = temp_v;
    }
    temp_mul_diagnose_opt = mul_diagnose_opt;
Example with N = 5: en_hub1 = 1, en_hub2 = 1, en_hub3 = 1, en_hub4 = 0, en_hub5 = 0, so the enabled data paths are 1, 2, 3 (*).
[Figure: the en_hub flags rotate through temp_v; data paths 1 to N feed the Enabling Hub, which passes result_A, result_B and result_C to Voting, producing final_result and the Multi-diagnose Option.]
* Implementing an LFSR is similar.

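The rotation of the enable flags can be checked in isolation. A minimal sketch follows, assuming the five flags are kept in an array; `rotate_enables` and its return value are hypothetical additions for illustration.

```c
#define N_PATHS 5

/* Rotate the enable flags by one position when the diagnose code changed
 * since the previous iteration, as on the slide: en_hub1 takes the old
 * en_hub5, en_hub2 the old en_hub1, and so on.
 * Returns 1 if a rotation happened, 0 otherwise. */
int rotate_enables(int en_hub[N_PATHS], int *temp_mul_diagnose_opt,
                   int mul_diagnose_opt)
{
    int rotated = 0;

    if (*temp_mul_diagnose_opt != mul_diagnose_opt) {
        int temp_v = en_hub[N_PATHS - 1];
        for (int k = N_PATHS - 1; k > 0; k--)
            en_hub[k] = en_hub[k - 1];
        en_hub[0] = temp_v;
        rotated = 1;
    }
    *temp_mul_diagnose_opt = mul_diagnose_opt;
    return rotated;
}
```

Starting from enables 1, 1, 1, 0, 0, one rotation yields 0, 1, 1, 1, 0, so the enabled data paths move from 1, 2, 3 to 2, 3, 4.
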
Control Flow Tolerance: IF statement
    // Agent-based control flow technique
    // xor of two 0/1 flags; arguments and result are parenthesized
    #define xor(x,y) (((x) & !(y)) | (!(x) & (y)))
    control_flag1 = 0;
    control_flag2 = 0;
    ...
    if (condition) {
        control_flag1 = 1;
        ...
    }
    ...
    if (condition & tolerance) {
        control_flag2 = 1;
        ...
    }
    error_flag = xor(xor(condition, control_flag1), control_flag2) ...;
[Figure: condition drives two muxes (if true / if false) that set control_flag1 and control_flag2; the flags and the condition combine into error_flag.]

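The error_flag expression is elided on the slide, so the check below is not the authors' formula. It is one possible way to flag a disagreement, sketched under the assumption that both flags should mirror the branch decision. `checked_if` and the `skip_flag2` fault-injection hook are hypothetical.

```c
/* Evaluate a guarded branch with two independent flag updates and report a
 * control-flow error when a flag disagrees with the condition or with the
 * other flag. skip_flag2 is a hypothetical fault-injection hook: when it is
 * nonzero the second update is suppressed to mimic an upset. */
int checked_if(int condition, int skip_flag2)
{
    int control_flag1 = 0;
    int control_flag2 = 0;

    if (condition) {
        control_flag1 = 1;
        /* ... the guarded computation would go here ... */
    }
    if (condition && !skip_flag2) {
        control_flag2 = 1;   /* redundant record of the same decision */
    }

    int taken = (condition != 0);
    return (taken != control_flag1) || (control_flag1 != control_flag2);
}
```
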
Control Flow Tolerance: FOR statement
    // Agent-based control flow technique
    #pragma src parallel sections
    {
        #pragma src section
        {
            for (i = 0; i < sz; i++) { control_counter1++; }   // data path
        }
        #pragma src section
        {
            for (i = 0; i < sz; i++) { control_counter2++; }   // dummy path
        }
    }
    if (control_counter1 == control_counter2) { error = 0; } else { error = 1; }
[Figure: the data path drives counter1 and the dummy path drives counter2; an equality comparison of the two counters produces error.]

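The counter comparison can be exercised on a host by running the two loops back to back instead of in parallel sections. This is a minimal sketch under that assumption; `checked_for` and the `lost_iterations` fault-injection parameter are hypothetical.

```c
/* Run the data loop and a dummy loop with the same bound, each keeping its own
 * counter, then compare the counters. On the MAP the two loops would sit in
 * parallel sections; here they run sequentially. lost_iterations is a
 * hypothetical fault-injection parameter that cuts the data loop short. */
int checked_for(int sz, int lost_iterations)
{
    int control_counter1 = 0;   /* data path */
    int control_counter2 = 0;   /* dummy path */

    for (int i = 0; i < sz - lost_iterations; i++) {
        /* ... body of the real computation ... */
        control_counter1++;
    }
    for (int i = 0; i < sz; i++) {
        control_counter2++;
    }
    return (control_counter1 == control_counter2) ? 0 : 1;
}
```
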
Resource Utilization
Area is crucial for assessing efficiency, but it is also a flexible variable that we can tune with our programming model.
Table I. Resource Utilization [table not reproduced]
* Design levels: 1 = bare-bones design, 2 = radhard design (moderate), 3 = radhard design (high)
Total for chip xc2v6000-ff1517-4: 33,792 slices (x2 FFs), 144 Mult/BRAM

FPGA Checkpointing
    // attempt to back up the On-Board-Memory (OBM) banks
    if (status == temporary_failure) {
        obm_single_dma_stripe_backup(status, backed_up_obm_data);
    } else if (status == at_speed_backup) {
        obm_double_dma_looping_backup(status, backed_up_obm_data);
    } else {
        // FPGA unrecoverable
        backed_up_obm_data = NULL;
        status = 0;
    }
[Figure: the Host RadHard uProc exchanges control, status and backed_up_obm_data with Chip 1 and Chip 2; OBM banks are labeled A to H.]

Hardware-Software Backup Threading
Two types of threads:
1. POSIX thread backup
2. FPGA leading thread
[Figure: two microprocessors, uP1 and uP2, each run an FPGA routine for comp x with an openMP backup; the two sides are linked by message passing.]

Compute Data Integrity
    // Compute Data Integrity technique
    int main() {
        rst_count = 0;
        hw_valid = 0;
        ...
        for (i = 0; i < compute_blocks; i++) {
            for (j = 0; j < sz; j++) {
                if (hw_valid) {
                    sw_array->aarray[j] = hw_array->aarray[j];
                } else {
                    hw_array->aarray[j] = sw_array->aarray[j];
                }
            }
            ...
[Figure: hw_valid selects (if) between hw_array and sw_array.]

Hardware-Software Backup Threading
            // Backup threading technique (continues the compute_blocks loop in main)
            pthread_create(&thread_hw, NULL, &foo_hw, NULL);
            pthread_create(&thread_sw, NULL, &foo_sw, NULL);
            pthread_testcancel();
            pthread_join(thread_hw, NULL);
            pthread_join(thread_sw, NULL);
            printf("compute_block done! \n");
            if (rst_count > 2) {
                system("snap Reset");
                rst_count = 0;
            }
        }
        printf("job done! \n");
        return(0);
    }
    ...
[Figure: thread_hw runs foo_hw and thread_sw runs foo_sw. Labels: hw_valid = 1, rst_count++, hw_valid = 0.]

“foo_SW” Software Thread
    // foo_sw : software version of function foo
    // (pthread start routines take a void * argument; it is unused here)
    void *foo_sw(void *arg) {
        pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
        printf("I am thread foo_sw \n");
        for (j = 0; j < sz; j++) {
            sw_array->aarray[j] = 1 + sw_array->aarray[j];
        }
        // software finished first: cancel the hardware thread
        status = pthread_cancel(thread_hw);
        pthread_testcancel();
        printf("canceling thread_hw with status = %i\n", status);
        pthread_exit(NULL);
        return NULL;
    }

“foo_HW” Hardware Thread
    // foo_hw : hardware version of function foo
    // (pthread start routines take a void * argument; it is unused here)
    void *foo_hw(void *arg) {
        pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
        printf("I am thread foo_hw \n");
        rst_count++;   // stays incremented if the thread is cancelled during the MAP call
        foo_hw_map(hw_array->aarray, hw_array->mapno);
        rst_count--;
        hw_valid = 1;
        // hardware finished first: cancel the software thread
        status = pthread_cancel(thread_sw);
        pthread_testcancel();
        printf("canceling thread_sw with status = %i\n", status);
        pthread_exit(NULL);
        return NULL;
    }

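The bookkeeping around hw_valid and rst_count can be followed without threads. The sketch below replaces the two racing pthreads with sequential calls, which is a simplification and not the scheme on the slides. `foo_hw_stub`, `foo_sw_seq`, `compute_block` and the `hw_fail` parameter are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical hardware stub: increments each element like foo_sw does, and
 * returns 0 on success or nonzero when the simulated FPGA fails. */
int foo_hw_stub(int *a, size_t n, int fail)
{
    if (fail)
        return 1;
    for (size_t j = 0; j < n; j++)
        a[j] += 1;
    return 0;
}

/* Software version of the same computation. */
void foo_sw_seq(int *a, size_t n)
{
    for (size_t j = 0; j < n; j++)
        a[j] += 1;
}

/* One compute block: run both versions, then copy the trusted array over the
 * other one (hardware result if hw_valid, software result otherwise).
 * Returns hw_valid; *rst_count grows by one on every hardware failure. */
int compute_block(int *hw_array, int *sw_array, size_t n,
                  int hw_fail, int *rst_count)
{
    int hw_valid = (foo_hw_stub(hw_array, n, hw_fail) == 0);

    foo_sw_seq(sw_array, n);
    if (!hw_valid)
        (*rst_count)++;

    for (size_t j = 0; j < n; j++) {
        if (hw_valid)
            sw_array[j] = hw_array[j];
        else
            hw_array[j] = sw_array[j];
    }
    return hw_valid;
}
```

A caller would compare `rst_count` against a threshold after each block, as the slides do with `rst_count > 2`, before resetting the hardware.
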
SystemC – Calling MAP C from C++
• Offline
  • Development is seamless and based on code transformation that can be copy/pasted to a MAP C design
• Online
  • Online Interface (OIF). The MAP hardware is treated as an object. Computation is performed at the high level, e.g. in main ( )
[Figure: SystemC modules stimulus, FIR (foo_hw) and display, connected by the signals reset, CLK, input_valid, sample, output_data_ready and result.]

Sanity Checking
    // Read back (if the API supports it)
    p_bitstream_new = JTAG_bitstream_read_back();
    bitstream_error = compare(p_bitstream_new, p_bitstream_golden);
    // Sanity checking with hw module database
    foo_hw_1(argument_1, result_1);
    ...
    foo_hw_N(argument_N, result_N);
    for (i = 0; i < modules; i++) {
        // compare the i-th module's result with its own golden copy
        error[i] = compare(result_i, golden_i);
    }
[Figure: the outputs of foo_1 ( ) to foo_N ( ) are compared for equality with golden values (sw-computed or stored) to produce error [ ].]

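The per-module comparison against golden copies can be sketched as follows. The signature of `compare`, the use of `long` results and the `sanity_check` helper are assumptions made for illustration; the slide does not define them.

```c
#include <stddef.h>
#include <string.h>

/* Byte-wise comparison of a module result against its golden copy.
 * Returns 0 when they match and 1 otherwise. */
int compare(const void *result, const void *golden, size_t len)
{
    return memcmp(result, golden, len) != 0;
}

/* Fill error[i] for each module and return the number of mismatches. */
int sanity_check(const long *result, const long *golden,
                 int *error, size_t modules)
{
    int failures = 0;

    for (size_t i = 0; i < modules; i++) {
        error[i] = compare(&result[i], &golden[i], sizeof result[i]);
        failures += error[i];
    }
    return failures;
}
```
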
Advantages
• Hardening: dynamic levels of radiation hardening or customization. The system description is fully synthesizable in either SW (compiled -> processor) or HW (forged -> C-to-FPGA compilation)
• Fault injection: fault injection can be specified at a high level (ANSI C or Fortran) and can be interfaced with scripts for verification and test
• Simulation and emulation capabilities: at-speed tolerance check, debugging, cycle-accurate simulation, hardware emulation

Disadvantages
Too high level:
• Optimization is at first attempted only through the hardware compiler
• Further optimization is achieved by a skilled or experienced programmer
• Fine tuning is possible at the expense of time, yet this obstacle is being overcome by more advanced hardware compiler technology and released programming techniques

Lessons Learned
• Tested high-level advanced fault tolerance techniques
• Develop high-performance embedded computing techniques that are power aware and versatile enough to counteract different radiation scenarios
• High-performance supercomputing methodologies need terrestrial-based radiation hardening because of amplifying effects in supercomputers comprising a large number of processing elements