
Nios™ Advanced Training

Nios™ Advanced Training. Session II: Memory Accesses. Memory access instructions: LD, ST, LDP, STP, LDS, STS, PFX, EXT16D, EXT16S, EXT8D, EXT8S, FILL16, FILL8.

selima

Presentation Transcript


  1. Nios™ Advanced Training, Session II: Memory Accesses

  2. Memory Access Instructions • LD, ST • LDP, STP • LDS, STS • PFX • EXT16D • EXT16S • EXT8D • EXT8S • FILL16 • FILL8

  3. Prefixable Instructions
     [Figure: 16-bit encodings of PFX IMM11 and of an instruction with IMM5 and Ra fields]
     • The following instructions can be extended by a PFX instruction: ADDI, AND, ANDN, CMPI, MOVHI, MOVI, OR, SUBI, XOR
     • LD, LDP, LDS, ST, STP and STS do NOT follow the same mechanism
     • PFX instruction example with MOVI and MOVHI: PFX IMM11 + MOVI IMM5 fill the low half of Ra, PFX IMM11 + MOVHI IMM5 fill the high half:
         Ra bits 31..21 = IMM11[15..5] (PFX before MOVHI)
         Ra bits 20..16 = IMM5 (MOVHI)
         Ra bits 15..5  = IMM11[15..5] (PFX before MOVI)
         Ra bits 4..0   = IMM5 (MOVI)

  4. The best way to use the PFX instruction
         PFX   %hi(100)        ; extract bits 5..15 of x
         MOVI  %g1, %lo(100)   ; extract low 5 bits of x
         PFX   %xhi(100)       ; extract bits 21..31 of x
         MOVHI %g1, %xlo(100)  ; extract bits 16..20 of x
     Here x is the constant being loaded (100 in this example).
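
The four operators above are plain bit-field extractions, which can be sketched in C. This is a host-side illustration only: the function names (`nios_lo`, `nios_hi`, `nios_xlo`, `nios_xhi`, `nios_join`) are made up for this example and are not part of the Nios SDK.

```c
#include <stdint.h>

/* Hypothetical helpers mirroring the %lo / %hi / %xlo / %xhi operators. */
uint32_t nios_lo(uint32_t x)  { return x & 0x1Fu; }           /* bits 4..0   */
uint32_t nios_hi(uint32_t x)  { return (x >> 5) & 0x7FFu; }   /* bits 15..5  */
uint32_t nios_xlo(uint32_t x) { return (x >> 16) & 0x1Fu; }   /* bits 20..16 */
uint32_t nios_xhi(uint32_t x) { return (x >> 21) & 0x7FFu; }  /* bits 31..21 */

/* Reassemble the 32-bit constant from the four fields, in the same way the
 * PFX/MOVI and PFX/MOVHI pairs place them in the destination register. */
uint32_t nios_join(uint32_t xhi, uint32_t xlo, uint32_t hi, uint32_t lo)
{
    return (xhi << 21) | (xlo << 16) | (hi << 5) | lo;
}
```

For x = 100 this gives `nios_hi(100) == 3` and `nios_lo(100) == 4`, the K and IMM5 values used later in the pointer-with-offset example.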

  5. Addressing Modes - Simple
     [Figure: 16-bit instruction fields, with the index of Rb and the index of Ra]
     • LD = load data from memory: Ra = Mem[Rb]
     • ST = store data to memory: Mem[Rb] = Ra
     • If prefixed by PFX: Ra = Mem[Rb + 4·s(K)], where s(K) is the sign-extended K value set by PFX

  6. Addressing Modes - Simple
     Register contents:
         %r17 (source address) = 00.00.00.04
         %r7  (destination)    = 87.65.43.21
     Memory (byte address: contents, bits 7..0):
         00.00.00.04: 21
         00.00.00.05: 43
         00.00.00.06: 65
         00.00.00.07: 87
     • Sample code:
         MOV %r17, #4      ; place read address in register 17
         LD  %r7, [%r17]   ; read word at byte address 4

  7. Addressing Modes - Simple (with Offset)
     Register contents:
         %i3 (base address)  = 00.00.00.08
         %K  (offset)        = 00.00.00.07
         source address      = 00.00.00.24
         %r7 (destination)   = AB.CD.EF.00
     Memory (byte address: contents, bits 7..0):
         00.00.00.24: 00
         00.00.00.25: EF
         00.00.00.26: CD
         00.00.00.27: AB
     • Sample code:
         MOV %i3, #8      ; place read address (8) in register %i3
         PFX #7           ; offset is 7 words (28 bytes, 0x1C)
         LD  %r7, [%i3]   ; read word at byte address 0x24
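
The address arithmetic of the prefixed LD can be checked with a small C model. This is a sketch under the assumption that s(K) means sign extension of the 11-bit K value; `sext11` and `ld_st_address` are illustrative names, not SDK functions.

```c
#include <stdint.h>

/* Sign-extend an 11-bit K value (the "s(K)" of the slides). */
int32_t sext11(uint32_t k)
{
    k &= 0x7FFu;
    return (k & 0x400u) ? (int32_t)k - 0x800 : (int32_t)k;
}

/* Effective byte address of a PFX-prefixed LD/ST: Rb + 4 * s(K).
 * Unsigned arithmetic wraps, so negative offsets behave as expected. */
uint32_t ld_st_address(uint32_t rb, uint32_t k)
{
    return rb + (uint32_t)(4 * sext11(k));
}
```

With the slide's values, `ld_st_address(8, 7)` is 8 + 4·7 = 36, i.e. byte address 0x24.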

  8. Addressing Modes - Pointer
     [Figure: 16-bit instruction fields: Rp, IMM5, index of Ra]
     • LDP = load with pointer addressing: Ra = Mem[Rp + 4·IMM5]
     • STP = store with pointer addressing: Mem[Rp + 4·IMM5] = Ra
     • Rp must be r16, r17, r18 or r19
     • If prefixed by PFX: Ra = Mem[Rp + 4·s(K:IMM5)]

  9. Addressing Modes - Pointer
     Register contents:
         %r16 (base pointer)   = 00.00.00.20
         IMM offset (#3*4)     = 00.00.00.0C
         destination address   = 00.00.00.2C
         %r3 (source)          = 98.76.54.32
     Memory (byte address: contents, bits 7..0):
         00.00.00.2C: 32
         00.00.00.2D: 54
         00.00.00.2E: 76
         00.00.00.2F: 98
     • Sample code:
         MOV %r16, #0x20       ; set base pointer to 0x20
         STP [%r16, #3], %r3   ; store word to byte address 0x2C

  10. Addressing Modes - Pointer (with Offset)
      100d = 0000 0000 0110 0100b (bits 15..0): K register (bits 15..5) = 3, IMM5 (bits 4..0) = 4
      • Sample code:
          MOV %r16, #0x20             ; set base pointer to 0x20
          PFX %hi(100)                ; hi loads the upper 11 bits
          STP [%r16, %lo(100)], %r3   ; lo loads the lower 5 bits
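
A C model of both pointer forms makes the offsets concrete. It is only a sketch: the function names are invented here, and the prefixed form assumes that s(K:IMM5) is the 16-bit concatenation of K and IMM5, sign-extended, as the formula on the pointer-addressing slide suggests.

```c
#include <stdint.h>

/* LDP/STP without PFX: Rp + 4 * IMM5 (IMM5 is 0..31). */
uint32_t ldp_address(uint32_t rp, uint32_t imm5)
{
    return rp + 4u * (imm5 & 0x1Fu);
}

/* LDP/STP with PFX: Rp + 4 * s(K:IMM5), where K:IMM5 is treated as a
 * signed 16-bit word offset (assumption stated in the lead-in). */
uint32_t ldp_address_pfx(uint32_t rp, uint32_t k, uint32_t imm5)
{
    uint32_t raw = ((k & 0x7FFu) << 5) | (imm5 & 0x1Fu);
    int32_t offset = (raw & 0x8000u) ? (int32_t)raw - 0x10000 : (int32_t)raw;
    return rp + (uint32_t)(4 * offset);
}
```

`ldp_address(0x20, 3)` gives 0x2C, matching the unprefixed example. Under the stated assumption, the prefixed example (K = 3, IMM5 = 4, so a word offset of 100) would address 0x20 + 400 = 0x1B0.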

  11. Addressing Modes - Stack
      • LDS: load data from memory (with pointer addressing)
      • STS: store data to memory (with pointer addressing)
      • Addressed through the stack register with an 8-bit offset: the stack register is always r14, and a scaled, unsigned 8-bit offset is added to the base address (stack)
      • Does not support an extended offset using PFX

  12. Instruction Set: EXT16D
      • Assembler syntax: EXT16D %rA, %rB
      • Example:
          LD     %i3, [%i4]
          EXT16D %i3, %i4
      • rA before: [ Half Word 1 | Half Word 0 ], half word n selected by rB[1..0]
      • rA after:  [ 0 | Half Word n ]

  13. Instruction Set: EXT8D
      • Assembler syntax: EXT8D %rA, %rB
      • Example:
          LD    %i3, [%i4]
          EXT8D %i3, %i4
      • rA before: [ Byte 3 | Byte 2 | Byte 1 | Byte 0 ], byte n selected by rB[1..0]
      • rA after:  [ 0 | Byte n ]
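
The two dynamic extract instructions can be modelled in C as a shift followed by a mask. This is an illustrative sketch, not SDK code. One assumption is labeled in the comments: for EXT16D the half word index is taken from bit 1 of rB, so that a byte address in rB selects the half word that contains it.

```c
#include <stdint.h>

/* EXT8D model: keep byte n of ra (n = rb[1..0]) and zero the other bits. */
uint32_t ext8d(uint32_t ra, uint32_t rb)
{
    unsigned n = rb & 3u;
    return (ra >> (8u * n)) & 0xFFu;
}

/* EXT16D model: keep half word n of ra and zero the upper half.
 * Assumption: n = rb[1] (bit 1 of the two low address bits). */
uint32_t ext16d(uint32_t ra, uint32_t rb)
{
    unsigned n = (rb >> 1) & 1u;
    return (ra >> (16u * n)) & 0xFFFFu;
}
```

With the word 0x87654321 loaded from byte address 4, `ext8d(0x87654321, 5)` returns 0x43, the byte stored at address 5 in the simple addressing example.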

  14. Instruction Set: EXT16S
      • Assembler syntax: EXT16S %rA, IMM1
      • Example:
          LD     %i3, [%i4]
          EXT16S %i3, 1
      • rA before: [ Half Word 1 | Half Word 0 ], half word n selected by IMM1
      • rA after:  [ 0 | Half Word n ]

  15. Instruction Set: FILL16
      • Assembler syntax: FILL16 %r0, %rA
      • Example:
          FILL16 %r0, %i3
      • rA before: [ Half Word 1 | Half Word 0 ]
      • Result:    [ Half Word 0 | Half Word 0 ]
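
FILL16 is a replicate operation, which a short C model shows directly. The sketch below is for illustration only. `fill8` is included by analogy because FILL8 appears in the instruction list but is not detailed on the slides, so its behavior here is an assumption.

```c
#include <stdint.h>

/* FILL16 model: copy half word 0 of ra into both halves of the result. */
uint32_t fill16(uint32_t ra)
{
    uint32_t h0 = ra & 0xFFFFu;
    return (h0 << 16) | h0;
}

/* FILL8 model (assumed by analogy): copy byte 0 of ra into all four bytes. */
uint32_t fill8(uint32_t ra)
{
    uint32_t b0 = ra & 0xFFu;
    return b0 * 0x01010101u;
}
```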

  16. Memory Interface: Berkeley Architecture
      [Figure: general-purpose processor (register file, program counter) with its pipeline stages: instruction fetching, instruction decoding, executing, writing back]
      • Program memory: processor instructions
      • Data memory: variables, stack, user data
      • The data and address buses have to be shared between data and instructions

  17. Memory Interface Decoder Impact • Design requirements • Guarantee reliable access to external memory • Achieve 50MHz w/o cache • Uses Fast IO on APEX pad • ~16ns addr-out-to-data-in • Requires 2 clocks when switching between external memory and other devices

  18. Memory Interface: 16-bit Instruction Impact
      • 16-bit transfer construction: write 0xFFC0 at @ = 0x1FC0
          PFX  %hi(0x1FC0)    ; 1 clock     (address construction)
          MOVI %g1, 0x0       ; 1 clock
          PFX  %hi(0xFFC0)    ; 1 clock     (data construction)
          MOVI %g2, 0x0       ; 1 clock
          ST   [%g1], %g2     ; 2 or 4 clocks (write cmd)
        Number of clock cycles = 6 or 8
        Throughput < 16.5 Mbytes/s (core running at 50 MHz)
      • 32-bit transfer construction: write 0xAAAA FFC0 at @ = 0xF0CE 1FC0
          PFX   %hi(0x1FC0)   ; 1 clock     (address construction)
          MOVI  %g1, 0x0      ; 1 clock
          PFX   %hi(0xF0C0)   ; 1 clock
          MOVHI %g1, 0xE      ; 1 clock
          PFX   %hi(0xFFC0)   ; 1 clock     (data construction)
          MOVI  %g2, 0x0      ; 1 clock
          PFX   %hi(0xAAA0)   ; 1 clock
          MOVHI %g2, 0xA      ; 1 clock
          ST    [%g1], %g2    ; 2 or 4 clocks (write cmd)
        Number of clock cycles = 10
        Throughput < 20 Mbytes/s (core running at 50 MHz)
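
The cycle counts and the 32-bit throughput bound follow from simple arithmetic, sketched here in C. The function names are illustrative, and the model assumes one write per sequence with no overlap between sequences.

```c
#include <stdint.h>

/* Cycles for a sequence of single-cycle instructions followed by one store. */
unsigned sequence_cycles(unsigned single_cycle_instructions, unsigned store_cycles)
{
    return single_cycle_instructions + store_cycles;
}

/* Bytes per second when `bytes` are written every `cycles` clocks. */
uint64_t throughput_bytes_per_s(uint64_t clock_hz, uint64_t bytes, uint64_t cycles)
{
    return clock_hz * bytes / cycles;
}
```

For the 32-bit case, `sequence_cycles(8, 2)` is 10 and `throughput_bytes_per_s(50000000, 4, 10)` is 20,000,000 bytes/s, the 20 Mbytes/s upper bound quoted on the slide. For the 16-bit case, `sequence_cycles(4, 2)` and `sequence_cycles(4, 4)` give the 6 and 8 cycles listed.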

  19. Memory Interface Read/Write switch • Read/Write switch between 2 internal @ • Read/Write switch between external / internal @

  20. Page Transfer Tips
      [Figure: 32-bit bus connecting the data source RAM, the data destination RAM and the program ROM]
      • To support page transfers, use: PFX, LDP, STP
      • Instruction memory is connected to the Highest Bus.

  21. Page Transfer Tips
      • Transfer from @1 = 0xAAAA FFC0 to @2 = 0xF0CE 1FC0: 5 or 8 clock cycles per transfer
      • Address @1 construction (1 clock each):
          PFX   %hi(0xFFC0)
          MOVI  %r16, 0x0
          PFX   %hi(0xAAA0)
          MOVHI %r16, 0xA
      • Address @2 construction (1 clock each):
          PFX   %hi(0x1FC0)
          MOVI  %r17, 0x0
          PFX   %hi(0xF0C0)
          MOVHI %r17, 0xE
      • Read cmd (2 or 4 clocks) and write cmd (3 or 4 clocks):
          LDP %r1, [%r16, 0x0]
          STP [%r17, 0x0], %r1
          LDP %r1, [%r16, 0x1]
          STP [%r17, 0x1], %r1
          …
          LDP %r1, [%r16, 0x3E]
          STP [%r17, 0x3E], %r1
          LDP %r1, [%r16, 0x3F]
          STP [%r17, 0x3F], %r1
      • Address @1 and @2 update (1 clock each):
          PFX  %hi(0xF0)
          ADDI %r16, 0xF
          PFX  %hi(0xF0)
          ADDI %r17, 0xF

  22. Page Transfer Tips
      • Clock cycles:
          Address @1 & @2 construction: 8
          64 transfers: 320 / 576
          Address @1 & @2 update: 4
          Transfers plus update: 324 / 580 clock cycles per 128 word accesses
      • Throughput 80 Mbytes/s in the best case, if the instruction memory = data memory (core running at 50 MHz)
      • If data memory ≠ instruction memory, throughput ≈ 45 Mbytes/s (core running at 50 MHz)
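
The page figures can be reproduced with a few lines of C. This is a sketch with invented function names. It assumes that each of the 64 transfers counts as two 32-bit word accesses (one read, one write), which is what makes 128 word accesses, and that the 576-cycle case corresponds to 9 cycles per transfer (576 / 64), taken as given from the slide.

```c
#include <stdint.h>

/* Cycles for one page pass: n transfers plus the pointer update. */
unsigned page_cycles(unsigned transfers, unsigned cycles_per_transfer,
                     unsigned update_cycles)
{
    return transfers * cycles_per_transfer + update_cycles;
}

/* Bytes per second for `word_accesses` 32-bit accesses done in `cycles` clocks. */
uint64_t page_throughput(uint64_t clock_hz, uint64_t word_accesses, uint64_t cycles)
{
    return clock_hz * word_accesses * 4u / cycles;
}
```

`page_cycles(64, 5, 4)` gives 324 and `page_cycles(64, 9, 4)` gives 580. At 50 MHz, 128 word accesses in 324 cycles come to roughly 79 Mbytes/s, and in 580 cycles to roughly 44 Mbytes/s, in line with the rounded 80 and 45 Mbytes/s on the slide.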

  23. Compiler Aspect: The C Code
      • Basic loop declaration
          int main(void)
          {
              volatile long *segment_read  = (volatile long *) 0x2000;
              volatile long *segment_write = (volatile long *) 0x2500;
              int i;   /* loop index */
              for (i = 0; i < 100; i++)
              {
                  segment_write[i] = *segment_read;
              }
              return 0;
          }

  24. Compiler Aspect: Default Compiler's Output
      for (i = 0; i < 100; i++)
          101a: 03 98   pfx   %hi(0x60)          ; loop index
          101c: 82 34   movi  %g2,0x4
      { segment_write[i] = (*segment_read);
          101e: 01 b0   ldp   %g1,[%l0,0x0]      ; data loading
          1020: 01 a8   stp   [%l2,0x0],%g1      ; data storing
          1022: ff 9f   pfx   %hi(0xffe0)        ; loop test
          1024: e1 37   movi  %g1,0x1f
          1026: ff 9f   pfx   %hi(0xffe0)
          1028: e1 6f   movhi %g1,0x1f
          102a: 22 00   add   %g2,%g1
          102c: c2 7e   skprz %g2
          102e: f7 87   br    101e <main+0xe>
          1030: 92 04   addi  %l2,0x4            ; @destination update
      }

  25. Compiler Aspect: Optimization Options
      • -funroll-loops: perform the optimization of loop unrolling. This is only done for loops whose number of iterations can be determined at compile time or run time.
      • How to use it?
          NIOS-BUILD -cc "-funroll-loops" myprg.c

  26. Compiler Aspect: Optimization Options
      • Result (block copy repeated 10 times, then loop test and @destination update)
          … segment_write[i] = *segment_read;
          101e: 01 b4   ldp   %g1,[%l1,0x0]      ; block copy
          1020: 01 a0   stp   [%l0,0x0],%g1
          1022: 90 04   addi  %l0,0x4
          1024: 01 b4   ldp   %g1,[%l1,0x0]      ; block copy
          1026: 01 a0   stp   [%l0,0x0],%g1
          1028: 90 04   addi  %l0,0x4
          102a: 01 b4   ldp   %g1,[%l1,0x0]      ; block copy
          102c: 01 a0   stp   [%l0,0x0],%g1
          102e: 90 04   addi  %l0,0x4
          …
          1054: 01 b4   ldp   %g1,[%l1,0x0]      ; block copy
          1056: 01 a0   stp   [%l0,0x0],%g1
          1058: ff 9f   pfx   %hi(0xffe0)        ; loop test
          105a: c1 36   movi  %g1,0x16
          105c: ff 9f   pfx   %hi(0xffe0)
          105e: e1 6f   movhi %g1,0x1f
          1060: 22 00   add   %g2,%g1
          1062: c2 7e   skprz %g2
          1064: dc 87   br    101e <main+0xe>
          1066: 90 04   addi  %l0,0x4            ; @destination update
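
At source level, the unrolled listing corresponds roughly to the hand-written C below: ten copies per pass and ten passes, so the loop test runs 10 times instead of 100. This is a sketch of the transformation and not compiler output. The function name `copy_unrolled` and its parameters are invented for the example.

```c
/* Hand-unrolled equivalent of the 100-iteration copy loop:
 * 10 passes, each performing 10 copies from the same source word. */
void copy_unrolled(volatile long *dst, const volatile long *src)
{
    int i;
    for (i = 0; i < 100; i += 10) {
        dst[i + 0] = *src;
        dst[i + 1] = *src;
        dst[i + 2] = *src;
        dst[i + 3] = *src;
        dst[i + 4] = *src;
        dst[i + 5] = *src;
        dst[i + 6] = *src;
        dst[i + 7] = *src;
        dst[i + 8] = *src;
        dst[i + 9] = *src;
    }
}
```

On the target it would be called with the two segment pointers, for example `copy_unrolled(segment_write, segment_read)`. The gain comes from amortizing the seven-instruction loop test over ten copies rather than one.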

  27. C Writing Aspect: Nios_map.h & Nios_peripherals.h
      • Nios_map.h and Nios_peripherals.h are generated directly by the MegaWizard in the /mynios_sdk/inc directory. How do I use them in my C code?
      • Nios_peripherals.h:
          // Timer Registers
          typedef volatile struct
          {
              int np_timerstatus;    // read only, 2 bits (any write to clear TO)
              int np_timercontrol;   // write/readable, 4 bits
              int np_timerperiodl;   // write/readable, 16 bits
              …
              int np_timersnaph;     // read only, 16 bits
          } np_timer;

          // Timer Register Bits
          enum
          {
              np_timerstatus_run_bit     = 1,        // timer is running
              np_timerstatus_to_bit      = 0,        // timer has timed out
              np_timercontrol_stop_bit   = 3,        // stop the timer
              …
              np_timercontrol_start_mask = (1<<2),   // start the timer
              np_timercontrol_cont_mask  = (1<<1),   // continuous mode
              np_timercontrol_ito_mask   = (1<<0)    // enable time out interrupt
          };

          // Timer Routines
          int nr_timer_milliseconds(void);   // Starts on first call, hogs timer1.
      • Nios_map.h:
          #define na_null                 ((void *)     0x00000000)
          #define na_mycpu_cpu            ((void *)     0x00000000)
          #define na_mycpu_cpu_end        ((void *)     0x00400000)
          #define na_rom_boot             ((void *)     0x00000000)
          #define na_ram_sys              ((void *)     0x00000400)
          #define na_ram_prg              ((void *)     0x00001000)
          #define na_uart                 ((np_uart *)  0x00000800)
          #define na_uart_irq             20
          #define na_timer                ((np_timer *) 0x00000600)
          #define na_timer_irq            18
          #define na_internal_ram_page_A  ((void *)     0x00002000)

  28. C Writing Aspect: Nios_map.h & Nios_peripherals.h
          int main(void)
          {
              np_timer *timer = na_timer;
              long timerPeriod = 0xFFFFFFFF;
              // Set Timer
              timer->np_timerperiodh = timerPeriod >> 16;      // Timer TimeOut Period
              timer->np_timerperiodl = timerPeriod & 0xffff;
              timer->np_timercontrol = timer->np_timercontrol | np_timercontrol_cont_mask;    // Set Continuous mode
              timer->np_timercontrol = timer->np_timercontrol & ~np_timercontrol_ito_mask;    // IRQ Disabled
              …
      • The pointer is created by "np_timer *timer = na_timer;"
      • The internal timer register is selected by the field name (timer->np_timercontrol)
      • The register bit is selected by the mask (np_timercontrol_cont_mask, np_timercontrol_ito_mask)
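
The same read-modify-write pattern can be tried on a host machine by pointing the code at an ordinary struct instead of the peripheral address. The sketch below is hypothetical: `sim_timer`, the `sim_` masks and `sim_timer_setup` are stand-ins invented for this example, and the struct is a reduced version of the layout because the generated `np_timer` definition is abridged on the slide.

```c
/* Reduced, hypothetical stand-in for the generated np_timer layout. */
typedef volatile struct {
    int np_timerstatus;
    int np_timercontrol;
    int np_timerperiodl;
    int np_timerperiodh;
} sim_timer;

enum {
    sim_timercontrol_cont_mask = (1 << 1),  /* continuous mode */
    sim_timercontrol_ito_mask  = (1 << 0)   /* time-out interrupt enable */
};

/* Program the period, set continuous mode, and disable the time-out IRQ. */
void sim_timer_setup(sim_timer *timer, unsigned long period)
{
    timer->np_timerperiodh = (int)((period >> 16) & 0xFFFFu);
    timer->np_timerperiodl = (int)(period & 0xFFFFu);
    timer->np_timercontrol |= sim_timercontrol_cont_mask;   /* set one bit   */
    timer->np_timercontrol &= ~sim_timercontrol_ito_mask;   /* clear one bit */
}
```

OR-ing with a mask sets a bit and AND-ing with the inverted mask clears it, so the other control bits keep their values. On the real system the pointer would come from `na_timer` rather than from a local struct.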

  29. NiosTM Advanced Training SESSION II Lab – Memory Accesses Measure

  30. Goals
      • Creating a Quartus II project
      • Generating a Nios™ system variation
      • Writing and compiling a C application to support page transfers in different modes (a template will be provided)
      • Simulating with ModelSim (Verilog mode)
      • Compiling with Quartus II (a pin-out file will be provided)
      • Configuring the Nios board
      • Downloading the SREC file and measuring the access time
      • Using the GDB debugger

  31. The Nios System
      [Figure: Nios system block diagram. On the on-chip bus: ROM boot, RAM system, RAM prg, RAM Page A, RAM Page B, UART (Rx, Tx), timer. On the external bus: external RAM with Page A and Page B.]

  32. Creating a Quartus II project
      • Launch Quartus II
      • Open "File > New Project Wizard"
      • Fill in the three following fields:
        • Working directory = "d:\training_nios\session2"
        • Project Name = memory_access
        • Top Level Name = memory_access
      • Click on Finish
      • Open "File > New…"
      • Select Block Diagram/Schematic File
      • Open "File > Save as…"
      • File Name = memory_access

  33. Generating a Nios System Variation 1/2
      • Launch the Nios MegaWizard Plug-In Manager
        • Double-click in the schematic window
        • Click on the "MegaWizard Plug-In Manager…" button
        • Select "Create a new custom megafunction variation"
        • Select the ALTERA Excalibur Nios™ megafunction
        • Select the Verilog HDL output type
        • Give it the name mycpu
      • Parameterize your core system
        • Nios 32 bits, 20-bit address, 256-register file, 3-bit shifter, no MSTEP, no MUL

  34. Generating a Nios System Variation 2/2
      • Nios system organisation
        • Main Prog Memory = ram_prg
        • Main Data Memory = ram_sys
        • Host Communication = uart
        • Debug Communication = uart
        • Boot ID Message = free to fill
        • Boot Device = ram_prg (for the simulation we boot from ram_prg, which will be preloaded; for real use on the board, we will change the boot device to rom_germs)
        • Interrupt Vector Table = ram_sys
        • Synthesis Target Family = None (for the simulation, we do not synthesize the core)
      • Place the Nios system symbol in the schematic window
      • Save the schematic file as memory_access.bdf
      • A "mycpu.ptf" file is generated, which describes your whole Nios system

  35. Writing & compiling a C Program
      • In Windows Explorer, create the directory mysrc in D:\training_nios\session2\mycpu_sdk\
      • Copy the mem_access.c file into D:\training_nios\session2\mycpu_sdk\mysrc\
      • Complete the program and set up a transfer from
        • Internal Page A, to
        • External Page B.
      • Open a Bash window and go to D:\training_nios\session2\mycpu_sdk\mysrc\
      • Run "NIOS-BUILD mem_access.c" to generate the compiled code
        • mem_access.srec
        • mem_access.objdump

  36. PTF file modification
      • Open the mycpu.ptf file in D:\training_nios\session2\ with your favorite editor
      • Turn on simulation support file generation by setting the variable do_build_sim to 1 as follows:
          SYSTEM mycpu
          {
              WIZARD_SCRIPT_ARGUMENTS
              {
                  do_build_sim = "1";
      • ram_prg user file specification: find the MODULE ram_prg section and change the following lines:
          WIZARD_SCRIPT_ARGUMENTS
          {
              Writeable = "1";
              Contents = "user_file";
              Initfile = "mycpu_sdk\\mysrc\\mem_access.srec";
          }

  37. Generating the Simulation environment
      • Open a Bash window and go to "D:\training_nios\session2"
      • Run the following command:
          GENERATE_PROJECT mycpu
      • Create a compile_verilog.do file in "D:\training_nios\session2\mycpu_sim" containing:
          vlog -work work ./mycpu_test_bench.v
          vsim work.mycpu_test_bench

  38. Simulating with ModelSim 1/5
      • Launch ModelSim Altera Edition or SE 5.4
      • Open the "File > Change directory…" menu and select "D:\training_nios\session2\mycpu_sim"
      • Type "do compile_verilog.do" in the command line
      • Open the "View > Structure" menu
      • Open the "View > Signal" menu

  39. Simulating with ModelSim 2/5
      • Select the following signals:
          /mycpu_test_bench/the_mycpu_core/clk
          /mycpu_test_bench/the_mycpu_core/reset_n
          /mycpu_test_bench/the_mycpu_core/the_timer/irq
          /mycpu_test_bench/the_mycpu_core/the_timer/timer_select
          /mycpu_test_bench/the_mycpu_core/the_timer/internal_counter     [set the radix format to dec]
          /mycpu_test_bench/the_mycpu_core/the_mycpu_cpu/ifetch
          /mycpu_test_bench/the_mycpu_core/the_mycpu_cpu/mem_addr         [set the radix format to hex]
          /mycpu_test_bench/the_mycpu_core/the_mycpu_cpu/data_from_cpu    [set the radix format to hex]
          /mycpu_test_bench/the_mycpu_core/the_mycpu_cpu/data_to_cpu      [set the radix format to hex]
          /mycpu_test_bench/the_mycpu_core/the_mycpu_cpu/mem_wr_n
          /mycpu_test_bench/the_mycpu_core/the_mycpu_cpu/mem_rd_n
      • In the Waves window, open "Edit > Display Properties…"
      • Set the number of signal name path elements displayed to 1
      • In the Waves window, save your wave format as wave.do
      • Type "run 200 µs" in the command line

  40. Simulating with ModelSim 3/5
      • Find the beginning of the transfer
        • Search for the value 0x2000 on the address (@) line
      • Count the number of clock cycles for the read access
        • Nb_read = _______
      • Count the number of clock cycles for the write access
        • Nb_write = _______
      • Find the address at which the instruction for the first read access and the first write access is fetched. How many clock cycles before the access is it fetched?
        • Pipe_length = ______

  41. Simulating with ModelSim 4/5: re-simulating each time the software has been modified
      • Open a Bash window and go to D:\training_nios\session2\mycpu_sdk\mysrc\
      • Run "NIOS-BUILD mem_access.c" to generate the compiled code
      • Open a Bash window and go to "D:\training_nios\session2"
      • Run the following command:
          GENERATE_PROJECT mycpu
      • In ModelSim, in the command line:
        • Type "do compile_verilog.do"
        • Type "do wave.do"
        • Type "run 200 µs"

  42. Simulating with ModelSim 5/5
      • Change the program in order to set up the following transfers:
        • From External Page A to Internal Page B
        • From External Page A to External Page B
        • From Internal Page A to Internal Page B
      • For every simulation, count the number of clock cycles.

  43. Re-Generating the Nios system & Producing an EDIF file
      • Edit the file mycpu.ptf in "D:\training_nios\session2"
        • Enable the synthesis by setting the "skip_synth" option to 0
        • Change the boot device memory to "rom_boot", which contains the GERMS monitor
          • Find the reset_module entry in WIZARD_SCRIPT_ARGUMENTS of the MODULE mycpu_cpu
      • Open a Bash window and go to "D:\training_nios\session2"
      • Run the following command:
          GENERATE_PROJECT mycpu

  44. Compiling w/ Quartus II 1/2
      • In Quartus, double-click in the schematic window; the Symbol manager is launched
      • Type Input in the Name field and click OK
      • Copy and paste the Input symbol n times and connect all the input ports of the Nios symbol
      • Place an Output symbol in front of each output port
      • Double-click on each Input/Output symbol and change the name as shown hereafter

  45. Compiling w/ Quartus II 2/2
      • Select the APEX device type
        • Open "Processing > Compiler Settings…"
        • Select the "Chips & Devices" tab
        • Select Family APEX20KE and select EP20K200EFC484-2*
        • Click on the "Device & Pin Options" button
        • Select the "Unused Pins" tab and select Reserve all unused pins "As inputs, tri-stated"
        • Click OK 2 times.
      • Assign I/O pins according to the Nios demo board features
        • Close the project by selecting "File > Close Project"
        • In Windows Explorer, open the memory_access.csf file in "D:\training_nios\session2" and copy the I/O assignments into the CHIP section from the session2_io_pin.txt file provided
      * Please check your board to know the exact 20K device mounted on it

  46. Configuring the Nios board
      • Re-open the "memory_access" project; the I/O assignments are now taken into account
      • Start the compilation by selecting "Processing > Start compilation" (the compilation takes about 6 minutes)
      • Open the "Processing > Open Programmer" menu
      • Click "Add file" and select the memory_access.sof file
      • Enable the "Program/Configure" box
      • Start the configuration

  47. Question
      • Why is your application running even though you did not send any SREC file through the UART by using Nios-Run?

  48. Downloading the code
      [Figure: board buttons: Nios soft reset, APEX hard reset]
      • Test the connection with the GERMS monitor by
        • going to the Bash window,
        • typing "NIOS-RUN -t" (terminal mode),
        • resetting the Nios and typing ENTER (memory will be dumped)
      • Open a Bash window and go to D:\training_nios\session2\mycpu_sdk\mysrc\
      • Download the SREC file by typing "NIOS-RUN mem_access.srec"

  49. Measuring the throughput
      • Change the program in order to set up the following transfers:
        • From Internal Page A to Internal Page B
        • From Internal Page A to External Page B
        • From External Page A to External Page B
        • From External Page A to Internal Page B
      • For every simulation:
        • note the number of clock cycles required to complete the loop,
        • divide by the number of transfers done (32000),
        • multiply by 8 to get the throughput in bytes/s

  50. Using the debugger
      • Add code to the "main" function:
          #if NIOS_GDB
              nios_gdb_install(1);
              nios_gdb_breakpoint();
          #endif
      • Build the program:
          nios-build -debug mem_access.c
      • Run the shell script produced by "nios-build":
          myprogram.gdb
