PROCStar III Performance Characterization

Final Presentation. PROCStar III Performance Characterization. Instructor: Ina Rivkin. Performed by: Idan Steinberg, Evgeni Riaboy. Semester Project, Winter 2010.

Presentation Transcript


  1. Final Presentation: PROCStar III Performance Characterization. Instructor: Ina Rivkin. Performed by: Idan Steinberg, Evgeni Riaboy. Semester Project, Winter 2010

  2. Project Overview • With the introduction of a new FPGA-based board, we had to devise a series of tests to examine the device's maximum practical performance, so that students who use these boards in future projects can plan an optimal design based on the measured performance. • All the tests are intended to determine the maximal frequency that still ensures correct results (this is why we check all the data for correctness).
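
The test strategy described above (raise the clock until the checked data first goes wrong) can be sketched as a simple sweep. This is only an illustration, not the authors' test harness: `run_test` and `fake_board` are hypothetical stand-ins for programming the board at a given frequency and reading back its error flag.

```python
from typing import Callable, Optional


def first_failing_frequency(
    run_test: Callable[[float], bool],
    start_mhz: float,
    stop_mhz: float,
    step_mhz: float,
) -> Optional[float]:
    """Sweep upward and return the first frequency whose data check fails.

    run_test(freq) is a hypothetical callback: True means all data was correct.
    Returns None if no frequency in the sweep fails.
    """
    steps = int(round((stop_mhz - start_mhz) / step_mhz))
    for i in range(steps + 1):
        freq = start_mhz + i * step_mhz
        if not run_test(freq):
            return freq
    return None


# Hypothetical stand-in for the hardware: data is correct below 65 MHz.
def fake_board(freq_mhz: float) -> bool:
    return freq_mhz < 65.0


print(first_failing_frequency(fake_board, 10.0, 400.0, 1.0))
```

The maximal usable frequency is then the last swept value before the one returned.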

  3. Gidel PROCStar III Stratix III 260E • 4 Altera Stratix III 260E FPGAs, with 256 MB on-chip memory • 8-lane PCIe host interface • 8 DDR2 banks, with 2*2 GB on the first FPGA and 1*2 GB on each of the other FPGAs • ~2 MB FPGA internal RAM • 255K logic elements (per FPGA)

  4. The PROCStar III Processing Unit

  5. Project Goals Testing the PROCStar III board for: • Maximum frequency of reading/writing between the FPGA and the memory banks • Maximum communication speed between FPGAs, both on their adjacent connection and on their shared bus • Highest possible performance of the internal logic in add, subtract, multiply, divide and sqrt configurations

  6. Test 1: External Memories Transfer Rate • In stage 1, as preparation for the test, we wrote a constant (X’1E) to the memory in FIFO configuration from the PCIe. • In stage 2, we read the data from the FPGA and compared it to the written data, in order to determine the maximum frequency between the memory bank and the FPGA. [Diagram: PROCStar III board, FPGA connected to Memory Bank]

  7. Test 1: Results We ran the test across the full spectrum of frequencies (0 – 400 MHz) and were unable to find a failing frequency for the memory access operation. After examining the system in SignalTap, we concluded that the memory access is always 16 cycles long, regardless of the frequency.
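
A fixed 16-cycle access means the access time in nanoseconds shrinks as the clock rises. The short calculation below illustrates this, assuming the 16 cycles are counted in the same clock that was swept (the slide does not say which clock domain SignalTap was sampling).

```python
ACCESS_CYCLES = 16  # read-access length reported on the slide


def access_latency_ns(freq_mhz: float, cycles: int = ACCESS_CYCLES) -> float:
    """Latency in ns of a fixed-cycle access at the given clock frequency."""
    if freq_mhz <= 0:
        raise ValueError("frequency must be positive")
    # One cycle lasts 1000 / freq_mhz nanoseconds.
    return cycles * 1000.0 / freq_mhz


for f in (100, 200, 400):
    print(f"{f} MHz: {access_latency_ns(f)} ns")
```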

  8. Project Plan, Tests 2 and 3: FPGA Communication We tested the communication between the FPGAs, both on their adjacent connection and on their shared bus.

  9. Test 2: FPGA Communication We built a state machine that creates 4 different outputs on FPGA2, transferred the data to FPGA1 over their direct connection, and checked the data for correctness (running at increasing frequencies on the data channel). [Diagram: PROCStar III board, FPGA 1 connected to FPGA 2]

  10. Test 2: Results The FSM on both FPGA1 and FPGA2: [Figure: FSM state diagram]

  11. Test 2: Results • For data 00, 01, 10, 11 (LSB), we found the failure frequency to be 247 MHz. • For data 00.. and 11.., which is the worst case because all bits change between transfers, we found the failure frequency to be 65 MHz. • For a vector 10101010100… and the 000… vector, we found the failure frequency to be 133 MHz.

  12. Test 2: Results • Data length: 100 bits
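
The data dependence in the Test 2 results tracks how many lines switch between consecutive transfers. The sketch below counts the toggling bits for each of the three patterns on a 100-bit word. It assumes the first pattern is a 2-bit counter on the least significant bits with all other bits held constant, and that the third pattern strictly alternates 1 and 0; the slides do not spell out either detail.

```python
WIDTH = 100  # data length from the slide


def toggles(a: int, b: int) -> int:
    """Number of bit positions that differ between two consecutive words."""
    return bin(a ^ b).count("1")


def max_toggles(seq) -> int:
    """Largest toggle count between consecutive words, sequence repeated cyclically."""
    return max(toggles(seq[i], seq[(i + 1) % len(seq)]) for i in range(len(seq)))


ALL_ONES = (1 << WIDTH) - 1
ALTERNATING = int("10" * (WIDTH // 2), 2)  # assumed strict 1010... pattern

patterns = {
    "2-bit counter on LSBs": [0b00, 0b01, 0b10, 0b11],
    "all zeros / all ones": [0, ALL_ONES],
    "1010... / 000...": [ALTERNATING, 0],
}

for name, seq in patterns.items():
    print(f"{name}: up to {max_toggles(seq)} of {WIDTH} bits toggle")
```

Under these assumptions the ordering of toggle counts (2, 50, 100) is the reverse of the ordering of the reported failure frequencies (247, 133, 65 MHz).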

  13. Test 3: FPGA Communication We built a state machine that creates 4 different outputs on FPGA2, transferred the data to FPGA1, 3 and 4 over their shared bus, and checked the data for correctness (running at increasing frequencies on the data channel). [Diagram: PROCStar III board, FPGA 1, FPGA 2, FPGA 3 and FPGA 4 connected by a shared bus]

  14. Test 3: Results • For data 00, 01, 10, 11 (LSB), we found the failure frequency to be 89 MHz for FPGA4, 97 MHz for FPGA1, and 101 MHz for FPGA3. • For data 00.. and 11.., which is the worst case because all bits change between transfers, we found the failure frequency to be 72 MHz for FPGA4, 73 MHz for FPGA1, and 76 MHz for FPGA3. • For a vector 10101010100… and the 000… vector, we found the failure frequency to be 87 MHz for FPGA4, 98 MHz for FPGA1, and 96 MHz for FPGA3.

  15. Test 3: Results • Data length: 37 bits
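
Because the bus is shared, the usable frequency for a given data pattern is bounded by whichever receiver fails first. The snippet below tabulates the Test 3 figures quoted above and picks the limiting receiver per pattern; the dictionary layout and the function name are illustrative only.

```python
# Failure frequencies in MHz, copied from the Test 3 results slide.
FAILURE_MHZ = {
    "00, 01, 10, 11 (LSB)": {"FPGA1": 97, "FPGA3": 101, "FPGA4": 89},
    "00... / 11...": {"FPGA1": 73, "FPGA3": 76, "FPGA4": 72},
    "1010... / 000...": {"FPGA1": 98, "FPGA3": 96, "FPGA4": 87},
}


def limiting_receiver(per_fpga):
    """Return (name, MHz) of the receiver with the lowest failure frequency."""
    name = min(per_fpga, key=per_fpga.get)
    return name, per_fpga[name]


for pattern, per_fpga in FAILURE_MHZ.items():
    name, mhz = limiting_receiver(per_fpga)
    print(f"{pattern}: {name} fails first, at {mhz} MHz")
```

In all three patterns the first receiver to fail is FPGA4.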

  16. Test 4: Internal Functions Testing We built 2 slice units for the fixed-point configuration: one with ALU1, which includes a DSP unit, and one with ALU2, without a DSP unit. [Diagram, one per slice: FSM, ALU1 or ALU2, and comparator; labels: 18 bit per operation, Data error per operation, Test done per operation]

  17. Test 4: Internal Functions Testing We built another slice unit in the floating-point configuration to check that the comparator doesn’t affect the performance. [Diagram: FSM, ALU, and comparators; labels: 18 bit per operation, Data error per operation, Test done per operation, Comparator testing (from FSM), Comparator testing (from ALU)]

  18. Test 4: Internal Functions Testing Floating-point ALU implementation: all the units are from the Altera arithmetic library. [Diagram: inputs from the FSM and constants feed ALTFP_ADD, ALTFP_SUB, ALTFP_MULT, ALTFP_DIV and ALTFP_SQRT]

  19. Test 4: Internal Functions Testing Fixed-point ALU implementation: the LPM_DIVIDE, ALTSQRT and half_dsp_block units are from the Altera arithmetic library. [Diagram: inputs from the FSM and constants feed add, sub, Mul_le (multipliers, adders and FF), LPM_DIVIDE, ALTSQRT and half_dsp_block]

  20. Test 4: Internal Functions Testing: Fixed point: Frequency at which one of the units fails for each configuration:

  21. Test 4: Internal Functions Testing: Floating point: Frequency at which one of the units fails for each configuration:

  22. Conclusions • Our goal was to determine the maximal performance of the PROCStar III 260 board with different operations performed on and between the FPGAs. • The most significant conclusion we reached is that the results are completely temperature dependent and differ greatly when the system gets hot. • The threshold for normal performance is 62°C; above this temperature, performance degrades sharply.

  23. Conclusions • The memory testing we performed led us to the conclusion that memory access can be performed at any frequency that the system supports (the read access from the memory is always 16 cycles long). • The communication testing between FPGAs, both on adjacent channels and on the bus shared by all of them, led us to the conclusion that the results are completely data dependent, as can be seen from the results.

  24. Conclusions • The mathematical operations testing was the most complicated. We tried many system configurations before choosing to implement the test using a unit duplicated several times. • To make sure that our testing was performed correctly and that the comparator unit does not affect the results, we tested it in several ways.

  25. Conclusions • First, we built a design containing solely the comparator, duplicated 200 times, and made sure that it never fails at any frequency. • Then, to be sure that the physical implementation of the design does not affect the comparison, we created another comparator in each slice unit and connected identical inputs to it (for both the FSM and the ALU unit outputs). These units never failed at any frequency, showing that in each design the comparator does not affect the results.
