NiosTM Advanced Training SESSION II Memory Accesses
Memory Access Instructions • LD, ST • LDP, STP • LDS, STS • PFX • EXT16D • EXT16S • EXT8D • EXT8S • FILL16 • FILL8
Prefixable Instructions [Figure: 16-bit encoding of PFX with its IMM11 field, and of the prefixed instruction with its IMM5 and Ra fields; the resulting 32-bit register is laid out as IMM11 (bits 31..21), IMM5 (bits 20..16), IMM11 (bits 15..5), IMM5 (bits 4..0)] • The following instructions can be extended by the PFX instruction: ADDI, AND, ANDN, CMPI, MOVHI, MOVI, OR, SUBI, XOR • LD, LDP, LDS, ST, STP and STS do NOT follow the same mechanism • PFX instruction example with MOVI and MOVHI: Ra = [PFX IMM11 : MOVHI IMM5 : PFX IMM11 : MOVI IMM5]
The best way to use PFX instruction • PFX %hi(100) ; Extract bits 5..15 of x • MOVI %g1, %lo(100) ; Extract low 5 bits of x • PFX %xhi(100) ; Extract bits 21..31 of x • MOVHI %g1, %xlo(100) ; Extract bits 16..20 of x
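The four operators above can be modelled on a host machine. The sketch below is not part of the Nios toolchain: the function names are hypothetical, and it assumes that PFX + MOVI writes K:IMM5 into the lower half word while clearing the upper one, and that PFX + MOVHI writes K:IMM5 into the upper half word while keeping the lower one.

```c
#include <assert.h>
#include <stdint.h>

/* Field extractors mirroring the assembler operators on the slide. */
static uint32_t nios_lo(uint32_t x)  { return x & 0x1Fu; }          /* bits 4..0   */
static uint32_t nios_hi(uint32_t x)  { return (x >> 5) & 0x7FFu; }  /* bits 15..5  */
static uint32_t nios_xlo(uint32_t x) { return (x >> 16) & 0x1Fu; }  /* bits 20..16 */
static uint32_t nios_xhi(uint32_t x) { return (x >> 21) & 0x7FFu; } /* bits 31..21 */

/* Assumed model of PFX + MOVI: lower half word = K:IMM5, upper half cleared. */
static uint32_t model_pfx_movi(uint32_t k, uint32_t imm5)
{
    assert(k < 0x800u && imm5 < 0x20u);
    return (k << 5) | imm5;
}

/* Assumed model of PFX + MOVHI: upper half word = K:IMM5, lower half kept. */
static uint32_t model_pfx_movhi(uint32_t reg, uint32_t k, uint32_t imm5)
{
    assert(k < 0x800u && imm5 < 0x20u);
    return (((k << 5) | imm5) << 16) | (reg & 0xFFFFu);
}

/* Rebuild a 32-bit constant the way the four-instruction sequence does. */
static uint32_t model_load_const(uint32_t x)
{
    uint32_t reg = model_pfx_movi(nios_hi(x), nios_lo(x));
    return model_pfx_movhi(reg, nios_xhi(x), nios_xlo(x));
}
```

For x = 100 the extractors give %hi = 3 and %lo = 4, and `model_load_const` returns its argument unchanged for any 32-bit value.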
Addressing Modes – Simple [Figure: 16-bit instruction fields holding the index of Rb and the index of Ra] • LD = Load data from memory: Ra = Mem[Rb] • ST = Store data to memory: Mem[Rb] = Ra • If prefixed by PFX: Ra = Mem[Rb + 4 × s(K)]
Addressing Modes - Simple
Register   Contents      Role
%r17       00.00.00.04   @ Source
%r7        87.65.43.21   Destination
Byte Address   Memory (bits 7..0)
00.00.00.04    21
00.00.00.05    43
00.00.00.06    65
00.00.00.07    87
• Sample code:
MOV %r17, #4      ; place read address in register 17
LD %r7, [%r17]    ; read word at byte address 4
Addressing Modes - Simple (with Offset)
Register   Contents      Role
%i3        00.00.00.08   Base address
%K         00.00.00.07   Offset
           00.00.00.24   @ Source
%r7        AB.CD.EF.00   Destination
Byte Address   Memory (bits 7..0)
00.00.00.24    00
00.00.00.25    EF
00.00.00.26    CD
00.00.00.27    AB
• Sample code:
MOV %i3, #8       ; place read address (8) in register %i3
PFX #7            ; offset is 7 words (28 bytes, 0x1C)
LD %r7, [%i3]     ; read word at byte address 0x24
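The address arithmetic of the prefixed LD can be checked with a short host-side sketch. The function names are hypothetical, and the sketch assumes that s(K) on the earlier slide means the 11-bit K value sign-extended before it is scaled by 4.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the 11-bit K value (assumed meaning of s(K)). */
static int32_t sign_extend_k(uint32_t k)
{
    assert(k < 0x800u);
    return (k & 0x400u) ? (int32_t)k - 0x800 : (int32_t)k;
}

/* Effective byte address of LD/ST: Rb alone, or Rb + 4*s(K) after a PFX. */
static uint32_t ld_address(uint32_t rb, int prefixed, uint32_t k)
{
    if (!prefixed)
        return rb;
    return rb + (uint32_t)(4 * sign_extend_k(k));
}
```

With the values of the sample code, `ld_address(8, 1, 7)` yields 0x24, the byte address read on the slide.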
Addressing Modes - Pointer [Figure: 16-bit instruction fields Rp, IMM5 and index of Ra] • LDP = Load with pointer addressing: Ra = Mem[Rp + 4 × IMM5] • STP = Store with pointer addressing: Mem[Rp + 4 × IMM5] = Ra • Rp must be one of r16, r17, r18 or r19 • If prefixed by PFX: Ra = Mem[Rp + 4 × s(K:IMM5)]
Addressing Modes - Pointer
Register   Contents      Role
%r16       00.00.00.20   Base Pointer
#3*4       00.00.00.0C   IMM Offset
           00.00.00.2C   @ Destination
%r3        98.76.54.32   Source
Byte Address   Memory (bits 7..0)
00.00.00.2C    32
00.00.00.2D    54
00.00.00.2E    76
00.00.00.2F    98
• Sample code:
MOV %r16, #0x20        ; set base pointer to 0x20
STP [%r16, #3], %r3    ; store word to byte address 0x2C
Addressing Modes - Pointer (with Offset) • 100d = 0000 0000 0110 0100b: K register (bits 15..5) = 3, IMM5 (bits 4..0) = 4 • Sample code: • MOV %r16, #0x20 ; set base pointer to 0x20 • PFX %hi(100) ; hi loads upper 11 bits • STP [%r16, %lo(100)], %r3 ; lo loads lower 5 bits
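The pointer-mode address calculation, with and without a prefix, can be sketched as follows. The function name is hypothetical, and the sketch assumes that s(K:IMM5) is the 16-bit concatenation of K and IMM5 interpreted as a signed word offset.

```c
#include <assert.h>
#include <stdint.h>

/* Effective byte address of LDP/STP.
 * Without PFX: Rp + 4*IMM5.
 * With PFX:    Rp + 4*s(K:IMM5), K:IMM5 taken as a signed 16-bit word
 *              offset (assumption about the s() notation). */
static uint32_t ldp_address(uint32_t rp, int prefixed, uint32_t k, uint32_t imm5)
{
    assert(k < 0x800u && imm5 < 0x20u);
    if (!prefixed)
        return rp + 4u * imm5;

    uint32_t raw = (k << 5) | imm5;                       /* K:IMM5, 16 bits */
    int32_t offset = (raw & 0x8000u) ? (int32_t)raw - 0x10000
                                     : (int32_t)raw;      /* sign extension  */
    return rp + (uint32_t)(4 * offset);
}
```

The unprefixed store of the previous slide is `ldp_address(0x20, 0, 0, 3)`, which gives 0x2C. The prefixed example with K = 3 and IMM5 = 4 is a word offset of 100, so the store lands at 0x20 + 400 = 0x1B0 under this model.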
Addressing Modes - Stack • LDS = Load data from memory (with stack addressing) • STS = Store data to memory (with stack addressing) • Addressed by the stack register plus an 8-bit offset • The stack register is always r14 • A scaled, unsigned 8-bit offset is added to the base address (stack) • Does not support extended offsets using PFX
Instruction Set EXT16D • Assembler syntax: EXT16D %rA, %rB • Example: LD %i3, [%i4] followed by EXT16D %i3, %i4 • rA before: Half Word 1 | Half Word 0 • Selector: rB[1..0] • rA after: 0 | Half Word n
Instruction Set EXT8D • Assembler syntax: EXT8D %rA, %rB • Example: LD %i3, [%i4] followed by EXT8D %i3, %i4 • rA before: Byte 3 | Byte 2 | Byte 1 | Byte 0 • Selector: rB[1..0] • rA after: 0 | Byte n
Instruction Set EXT16S • Assembler syntax: EXT16S %rA, IMM1 • Example: LD %i3, [%i4] followed by EXT16S %i3, 1 • rA before: Half Word 1 | Half Word 0 • Selector: IMM1 • rA after: 0 | Half Word n
Instruction Set FILL16 • Assembler syntax: FILL16 %r0, %rA • Example: FILL16 %r0, %i3 • rA before: Half Word 1 | Half Word 0 • After: Half Word 0 | Half Word 0
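The before/after diagrams of the EXT and FILL instructions can be expressed as plain C functions. This is a behavioural sketch with hypothetical names, not a description of the hardware: it assumes byte 0 and half word 0 are the least significant ones (consistent with the memory diagrams earlier in this session), and that the dynamic 16-bit form picks the half word from bit 1 of the address held in rB.

```c
#include <assert.h>
#include <stdint.h>

/* EXT8 model: keep byte n of ra in the low byte, zero everything else. */
static uint32_t model_ext8(uint32_t ra, uint32_t n)
{
    assert(n < 4u);
    return (ra >> (8u * n)) & 0xFFu;
}

/* EXT16 model: keep half word n of ra in the low half, zero the rest. */
static uint32_t model_ext16(uint32_t ra, uint32_t n)
{
    assert(n < 2u);
    return (ra >> (16u * n)) & 0xFFFFu;
}

/* Dynamic forms: the selector comes from the low address bits in rb. */
static uint32_t model_ext8d(uint32_t ra, uint32_t rb)
{
    return model_ext8(ra, rb & 3u);
}

static uint32_t model_ext16d(uint32_t ra, uint32_t rb)
{
    return model_ext16(ra, (rb >> 1) & 1u);  /* assumption: bit 1 selects */
}

/* FILL16 model: copy half word 0 of ra into both halves of the result. */
static uint32_t model_fill16(uint32_t ra)
{
    uint32_t h0 = ra & 0xFFFFu;
    return (h0 << 16) | h0;
}
```

A typical use is a byte read from an arbitrary address: load the aligned word with LD, then apply EXT8D with the same address register so that the two low address bits select the byte.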
Memory Interface Berkeley Architecture [Figure: pipeline stages (Instruction Fetching, Instruction Decoding, Executing, Writing Back) around the General-Purpose Processor Register File and the Program Counter; the program memory holds processor instructions; the data memory holds variables, stack and user data] • The data and address buses have to be shared between data and instructions
Memory Interface Decoder Impact • Design requirements • Guarantee reliable access to external memory • Achieve 50MHz w/o cache • Uses Fast IO on APEX pad • ~16ns addr-out-to-data-in • Requires 2 clocks when switching between external memory and other devices
Memory Interface 16-bit Instruction Impact
• 16-bit transfer construction
• Write 0xFFC0 at @ = 0x1FC0
Instruction          Clock cycles   Step
PFX %hi(0x1FC0)      1              Address construction
MOVI %g1,0x0         1              Address construction
PFX %hi(0xFFC0)      1              Data construction
MOVI %g2,0x0         1              Data construction
ST [%g1], %g2        2, 4           Write cmd
• Number of clock cycles = 6, 8
• Throughput < 16.5 Mbytes/s (core running at 50 MHz)
• 32-bit transfer construction
• Write 0xAAAA FFC0 at @ = 0xF0CE 1FC0
Instruction          Clock cycles   Step
PFX %hi(0x1FC0)      1              Address construction
MOVI %g1,0x0         1              Address construction
PFX %hi(0xF0C0)      1              Address construction
MOVHI %g1,0xE        1              Address construction
PFX %hi(0xFFC0)      1              Data construction
MOVI %g2,0x0         1              Data construction
PFX %hi(0xAAA0)      1              Data construction
MOVHI %g2,0xA        1              Data construction
ST [%g1], %g2        2, 4           Write cmd
• Number of clock cycles = 10
• Throughput < 20 Mbytes/s (core running at 50 MHz)
Memory Interface Read/Write switch • Read/Write switch between 2 internal @ • Read/Write switch between external / internal @
Page Transfer Tips [Figure: 32-bit buses connecting the data source RAM, the data destination RAM and the program ROM] • To support page transfers, use: • PFX • LDP • STP • Instruction memory is connected to the Highest Bus.
Page Transfer Tips
• 5, 8 clock cycles per transfer
• Transfer from @1 = 0xAAAA FFC0 to @2 = 0xF0CE 1FC0
Address @1 side           Address @2 side           Clock cycles    Step
PFX %hi(0xFFC0)           PFX %hi(AFC0)             1               Address @1 / @2 construction
MOVI %r16,0x0             MOVI %r17,0x0             1               Address @1 / @2 construction
PFX %hi(0xAAA0)           PFX %hi(0xFFC0)           1               Address @1 / @2 construction
MOVHI %r16,0xA            MOVHI %r17,0x0            1               Address @1 / @2 construction
LDP %r1, [%r16, 0x0]      STP [%r16, 0x0], %r1      2, 4 / 3, 4     Read cmd / Write cmd
LDP %r1, [%r15, 0x1]      STP [%r16, 0x1], %r1      2, 4 / 3, 4     Read cmd / Write cmd
…
LDP %r1, [%r16, 0x3E]     STP [%r16, 0x3E], %r1     2, 4 / 3, 4     Read cmd / Write cmd
LDP %r1, [%r15, 0x3E]     STP [%r16, 0x3E], %r1     2, 4 / 3, 4     Read cmd / Write cmd
PFX %hi(0xF0)             PFX %hi(0xF0)             1               Address @1 / @2 update
ADDI %r16, 0xF            ADDI %r17, 0xF            1               Address @1 / @2 update
Page Transfer Tips • Address @1 & @2 construction: 8 clock cycles • 64 transfers: 320/576 clock cycles • Address @1 & @2 update: 4 clock cycles • Clock cycles per 128 word accesses: 324/580 • Throughput: 80 Mbytes/s in the best case, if instruction memory = data memory (core running at 50 MHz) • If data memory ≠ instruction memory, throughput: 45 Mbytes/s (core running at 50 MHz)
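The throughput figures can be cross-checked with a one-line formula: bytes moved, times the clock frequency, divided by the clock cycles spent. The helper below is a hypothetical illustration. It assumes that the slide counts all 128 word accesses (64 reads plus 64 writes, 512 bytes in total), which is the interpretation under which the quoted numbers come out.

```c
#include <assert.h>

/* Throughput in bytes per second for `bytes` moved in `cycles` clock
 * cycles at a core clock of `clock_hz`. */
static double throughput_bytes_per_s(double bytes, double cycles, double clock_hz)
{
    assert(cycles > 0.0);
    return bytes * clock_hz / cycles;
}
```

With 512 bytes at 50 MHz, 324 cycles give about 79 Mbytes/s and 580 cycles give about 44 Mbytes/s, close to the rounded 80 and 45 Mbytes/s quoted above.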
Compiler Aspect The C Code
• Basic loop declaration
void main(void)
{
    volatile long *segment_read  = (long *) 0x2000;
    volatile long *segment_write = (long *) 0x2500;
    int i;  /* loop index */

    for (i = 0; i < 100; i++)
    {
        segment_write[i] = *segment_read;
    }
}
Compiler Aspect Default Compiler's output
for (i = 0; i < 100; i++)
101a: 03 98   pfx %hi(0x60)          ; loop index
101c: 82 34   movi %g2,0x4           ; loop index
{ segment_write[i] = (*segment_read);
101e: 01 b0   ldp %g1,[%l0,0x0]      ; data loading
1020: 01 a8   stp [%l2,0x0],%g1      ; data storing
1022: ff 9f   pfx %hi(0xffe0)        ; loop test
1024: e1 37   movi %g1,0x1f          ; loop test
1026: ff 9f   pfx %hi(0xffe0)        ; loop test
1028: e1 6f   movhi %g1,0x1f         ; loop test
102a: 22 00   add %g2,%g1            ; loop test
102c: c2 7e   skprz %g2              ; loop test
102e: f7 87   br 101e <main+0xe>     ; loop test
1030: 92 04   addi %l2,0x4           ; @destination update
}
Compiler Aspect Optimization Options -funroll-loops • Perform the optimization of loop unrolling. This is only done for loops whose number of iterations can be determined at compile time or run time. • How to use it ? NIOS-BUILD –cc "-funroll-loops" myprg.c
Compiler Aspect Optimization Options
• Result
… segment_write[i] = *segment_read;
101e: 01 b4   ldp %g1,[%l1,0x0]      ; block copy (repeated 10 times)
1020: 01 a0   stp [%l0,0x0],%g1
1022: 90 04   addi %l0,0x4
1024: 01 b4   ldp %g1,[%l1,0x0]      ; block copy
1026: 01 a0   stp [%l0,0x0],%g1
1028: 90 04   addi %l0,0x4
102a: 01 b4   ldp %g1,[%l1,0x0]      ; block copy
102c: 01 a0   stp [%l0,0x0],%g1
102e: 90 04   addi %l0,0x4
…
1054: 01 b4   ldp %g1,[%l1,0x0]      ; block copy
1056: 01 a0   stp [%l0,0x0],%g1
1058: ff 9f   pfx %hi(0xffe0)        ; loop test
105a: c1 36   movi %g1,0x16          ; loop test
105c: ff 9f   pfx %hi(0xffe0)        ; loop test
105e: e1 6f   movhi %g1,0x1f         ; loop test
1060: 22 00   add %g2,%g1            ; loop test
1062: c2 7e   skprz %g2              ; loop test
1064: dc 87   br 101e <main+0xe>     ; loop test
1066: 90 04   addi %l0,0x4           ; @destination update
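What the compiler does with -funroll-loops can be written by hand in C to make the transformation visible. The sketch below is an illustration with a hypothetical function name, not the compiler's actual output: it performs ten stores per loop pass, so the loop test and branch are paid once per ten transfers instead of once per transfer.

```c
#include <assert.h>
#include <stddef.h>

/* Copy one source word to `count` destination words, ten stores per pass.
 * `count` must be a multiple of 10, as in the 100-iteration loop above. */
static void fill_unrolled_by_10(volatile long *dst, const volatile long *src,
                                size_t count)
{
    assert(count % 10u == 0u);
    for (size_t i = 0; i < count; i += 10) {
        dst[i + 0] = *src;
        dst[i + 1] = *src;
        dst[i + 2] = *src;
        dst[i + 3] = *src;
        dst[i + 4] = *src;
        dst[i + 5] = *src;
        dst[i + 6] = *src;
        dst[i + 7] = *src;
        dst[i + 8] = *src;
        dst[i + 9] = *src;
    }
}
```

Because both pointers are volatile, every read and every write is kept, which matches the ldp/stp pairs repeated in the listing.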
C writing Aspect Nios_map.h & Nios_peripherals.h
• Nios_map.h & Nios_peripherals.h are generated directly by the MegaWizard in the /mynios_sdk/inc directory. How do I use them in my C code?
Nios_peripherals.h
// Timer Registers
typedef volatile struct
{
    int np_timerstatus;   // read only, 2 bits (any write to clear TO)
    int np_timercontrol;  // write/readable, 4 bits
    int np_timerperiodl;  // write/readable, 16 bits
    …
    int np_timersnaph;    // read only, 16 bits
} np_timer;
// Timer Register Bits
enum
{
    np_timerstatus_run_bit = 1,            // timer is running
    np_timerstatus_to_bit = 0,             // timer has timed out
    np_timercontrol_stop_bit = 3,          // stop the timer
    …
    np_timercontrol_start_mask = (1<<2),   // start the timer
    np_timercontrol_cont_mask = (1<<1),    // continuous mode
    np_timercontrol_ito_mask = (1<<0)      // enable time out interrupt
};
// Timer Routines
int nr_timer_milliseconds(void);  // Starts on first call, hogs timer1.
Nios_map.h
#define na_null                 ((void *)     0x00000000)
#define na_mycpu_cpu            ((void *)     0x00000000)
#define na_mycpu_cpu_end        ((void *)     0x00400000)
#define na_rom_boot             ((void *)     0x00000000)
#define na_ram_sys              ((void *)     0x00000400)
#define na_ram_prg              ((void *)     0x00001000)
#define na_uart                 ((np_uart *)  0x00000800)
#define na_uart_irq             20
#define na_timer                ((np_timer *) 0x00000600)
#define na_timer_irq            18
#define na_internal_ram_page_A  ((void *)     0x00002000)
C writing Aspect Nios_map.h & Nios_peripherals.h
int main(void)
{
    np_timer *timer = na_timer;
    long timerPeriod = 0xFFFFFFFF;
    // Set Timer
    timer->np_timerperiodh = timerPeriod >> 16;     // Timer TimeOut Period
    timer->np_timerperiodl = timerPeriod & 0xffff;
    timer->np_timercontrol = timer->np_timercontrol | np_timercontrol_cont_mask;   // Set Continuous mode
    timer->np_timercontrol = timer->np_timercontrol & ~np_timercontrol_ito_mask;   // IRQ Disabled
    …
• Pointer created here: np_timer *timer = na_timer;
• Internal timer register selected here: timer->np_timercontrol
• Register bit selected here: np_timercontrol_cont_mask, np_timercontrol_ito_mask
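The register access pattern above can be tried on a host machine by pointing the code at an ordinary struct instead of the memory-mapped timer. Everything prefixed with demo_ below is a hypothetical stand-in: the struct lists only the fields used on the slides, in an assumed order, and the real layout comes from the generated Nios_peripherals.h. The two mask values are the ones shown in the header excerpt.

```c
#include <assert.h>

/* Stand-in for the generated np_timer type (assumed, reduced layout). */
typedef volatile struct {
    int np_timerstatus;
    int np_timercontrol;
    int np_timerperiodl;
    int np_timerperiodh;
} demo_timer;

/* Mask values copied from the header excerpt on the slide. */
enum {
    demo_timercontrol_cont_mask = (1 << 1),  /* continuous mode          */
    demo_timercontrol_ito_mask  = (1 << 0)   /* time-out interrupt enable */
};

/* Program the period, select continuous mode, disable the interrupt. */
static void demo_timer_setup(demo_timer *timer, unsigned long period)
{
    assert(timer != 0);
    timer->np_timerperiodh = (int)((period >> 16) & 0xFFFFu);
    timer->np_timerperiodl = (int)(period & 0xFFFFu);
    timer->np_timercontrol |= demo_timercontrol_cont_mask;   /* set bit   */
    timer->np_timercontrol &= ~demo_timercontrol_ito_mask;   /* clear bit */
}
```

On the target, the same function body would be applied to the pointer obtained from na_timer. The read-modify-write keeps the other control bits untouched, which is why the slide ORs and ANDs the register instead of assigning a constant.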
NiosTM Advanced Training SESSION II Lab – Memory Accesses Measure
Goals • Creating a Quartus II project • Generating a NiosTM System Variation • Writing, compiling C code application to support Page transfers in different modes Template will be provided • Simulating with Modelsim (verilog mode) • Compiling w/ Quartus II Pin-Out file will be provided • Configuring the Nios board • Downloading SREC and Measure the access time • Using the GDB debugger
The Nios System [Figure: block diagram of the Nios system. On the on-chip bus: ROM Boot, RAM system, RAM Prg, RAM Page A, RAM Page B, UART (Rx, Tx) and TIMER. On the external bus: external RAM holding Page A and Page B]
Creating a Quartus II project • Launch Quartus II • Open "File > New Project Wizard" • Fill in the following three fields • Working directory = "d:\training_nios\session2" • Project Name = memory_access • Top Level Name = memory_access • Click on Finish • Open "File > New…" • Select Block Diagram/Schematic File • Open "File > Save As…" • File Name = memory_access
Generating a Nios System Variation 1/2 • Launch the Nios MegaWizard Plug-In Manager • Double-click in the Schematic Window • Click on the "MegaWizard Plug-In Manager…" button • Select "Create a new custom megafunction variation" • Select the ALTERA Excalibur NiosTM megafunction • Select the Verilog HDL output type • Give it the name mycpu • Parameterise your core system • NIOS 32 bits, 20 bits @, 256-register file, 3 bits shifter, No MSTEP, No MUL
Generating a Nios System Variation 2/2 • Nios system organisation • Main Prog Memory = ram_prg • Main Data Memory = ram_sys • Host Communication = uart • Debug Communication = uart • Boot ID Message = Free to fill • Boot Device = ram_prg (for the simulation we boot from ram_prg, which will be preloaded; for real use on the board we will change the boot device to rom_germs) • Interrupt Vector Table = ram_sys • Synthesis Target Family = None (for the simulation, we do not synthesize the core) • Place the Nios system symbol in the schematic window • Save the schematic file as memory_access.bdf • A "mycpu.ptf" file is generated which describes your whole Nios system
Writing & compiling a C Program • In Windows Explorer, create the directory mysrc in D:\training_nios\session2\mycpu_sdk\ • Copy the mem_access.c file into D:\training_nios\session2\mycpu_sdk\mysrc\ • Complete the program and set up a transfer from • Internal Page A, to • External Page B. • Open a Bash window and go to D:\training_nios\session2\mycpu_sdk\mysrc\ • Run "NIOS-BUILD mem_access.c" to generate the compiled code • mem_access.srec • mem_access.objdump
PTF file modification
• Open the mycpu.ptf file with your favorite editor in D:\training_nios\session2\
• Turn on simulation support file generation by setting the variable do_build_sim to 1 as follows:
SYSTEM mycpu
{
    WIZARD_SCRIPT_ARGUMENTS
    {
        do_build_sim = "1";
• ram_prg user file specification
• Find the MODULE ram_prg section and change the following lines
WIZARD_SCRIPT_ARGUMENTS
{
    Writeable = "1";
    Contents = "user_file";
    Initfile = "mycpu_sdk\\mysrc\\mem_access.srec";
}
Generating the Simulation environment • Open a Bash window and go to "D:\training_nios\session2" • Run the following command • GENERATE_PROJECT mycpu • Create a compile_verilog.do file in "D:\training_nios\session2\mycpu_sim" • Add vlog -work work ./mycpu_test_bench.v • Add vsim work.mycpu_test_bench
Simulating with ModelSim 1/5 • Launch ModelSim Altera Edition or SE 5.4 • Open the "File > Change Directory…" menu and select "D:\training_nios\session2\mycpu_sim" • Type "do compile_verilog.do" in the command line • Open the "View > Structure" menu • Open the "View > Signal" menu
Simulating with ModelSim 2/5 • Select the following signals: • /mycpu_test_bench/the_mycpu_core/clk • /mycpu_test_bench/the_mycpu_core/reset_n • /mycpu_test_bench/the_mycpu_core/the_timer/irq • /mycpu_test_bench/the_mycpu_core/the_timer/timer_select • /mycpu_test_bench/the_mycpu_core/the_timer/internal_counter [set the radix format to dec] • /mycpu_test_bench/the_mycpu_core/the_mycpu_cpu/ifetch • /mycpu_test_bench/the_mycpu_core/the_mycpu_cpu/mem_addr [set the radix format to hex] • /mycpu_test_bench/the_mycpu_core/the_mycpu_cpu/data_from_cpu [set the radix format to hex] • /mycpu_test_bench/the_mycpu_core/the_mycpu_cpu/data_to_cpu [set the radix format to hex] • /mycpu_test_bench/the_mycpu_core/the_mycpu_cpu/mem_wr_n • /mycpu_test_bench/the_mycpu_core/the_mycpu_cpu/mem_rd_n • In the Waves window, open "Edit > Display Properties…" • Set the number of signal name path elements displayed to 1 • In the Waves window, save your wave format as wave.do • Type "run 200 µs" in the command line
Simulating with ModelSim 3/5 • Find the beginning of the transfer • Search for the value 0x2000 in the @ line • Count the number of clock cycles for the read access • Nb_read = _______ • Count the number of clock cycles for the write access • Nb_write = _______ • Find the @ at which the instruction of the first read access and of the first write access is fetched. How many clock cycles before the access is it fetched? • Pipe_length = ______
Simulating with ModelSim 4/5 Re-simulating each time the software has been modified • Open a Bash window and go to D:\training_nios\session2\mycpu_sdk\mysrc\ • Run "NIOS-BUILD mem_access.c" to generate the compiled code • Open a Bash window and go to "D:\training_nios\session2" • Run the following command • GENERATE_PROJECT mycpu • Under ModelSim, in the command line • Type "do compile_verilog.do" • Type "do wave.do" • Type "run 200 µs"
Simulating with ModelSim 5/5 • Change the program in order to set up the following transfers • From External Page A to Internal Page B • From External Page A to External Page B • From Internal Page A to Internal Page B • For every simulation, count the number of clock cycles.
Re-Generating the Nios system & Producing an EDIF file • Edit the file mycpu.ptf in "D:\training_nios\session2" • Enable synthesis by setting the "skip_synth" option to 0 • Change the boot device memory to "rom_boot", which contains the GERMS monitor • Find the topic reset_module in WIZARD_SCRIPT_ARGUMENTS of the MODULE mycpu_cpu • Open a Bash window and go to "D:\training_nios\session2" • Run the following command • GENERATE_PROJECT mycpu
Compiling w/ Quartus II 1/2 • Under Quartus, double-click in the schematic window; the Symbol manager is launched • Type Input in the Name field and click OK • Copy and paste the Input symbol n times and connect all the input ports of the Nios symbol • Place an Output symbol in front of each output port • Double-click on each Input/Output symbol and change the name as shown hereafter
Compiling w/ Quartus II 2/2 • Select the APEX device type • Open "Processing > Compiler Settings…" • Select the "Chips & Devices" tab • Select Family APEX20KE and select EP20K200EFC484-2* • Click on the "Device & Pin Options" button • Select the "Unused Pins" tab and select Reserve all unused pins "As inputs, tri-stated" • Click OK 2 times. • Assign I/O pins according to the Nios demo board features • Close the project by selecting "File > Close Project" • Under Windows Explorer, open the memory_access.csf file in "D:\training_nios\session2" and copy the I/O assignments from the provided session2_io_pin.txt file into the CHIP section * Please check your board to find out the exact 20K device mounted on it
Configuring the Nios board • Re-open the "memory_access" project • The I/O assignments are now taken into account • Start the compilation by selecting "Processing > Start Compilation"; the compilation takes about 6 minutes • Open the "Processing > Open Programmer" menu • Click "Add file" and select the memory_access.sof file • Enable the "Program/Configure" box • Start the configuration
Question? • Why is your application running even though you did not send any SREC file through the UART using Nios-Run?
Downloading the code [Figure: board buttons for Nios soft reset and APEX hard reset] • Test the connection with the GERMS monitor by • Going to the Bash window, • Typing "NIOS-RUN –t" (terminal mode) • Resetting the Nios and typing ENTER (memory will be dumped) • Open a Bash window and go to D:\training_nios\session2\mycpu_sdk\mysrc\ • Download the SREC file by typing "NIOS-RUN mem_access.srec"
Measuring the throughput • Change the program in order to set up the following transfers • From Internal Page A to Internal Page B • From Internal Page A to External Page B • From External Page A to External Page B • From External Page A to Internal Page B • For every simulation • Note the number of clock cycles required to complete the loop, • Divide by the number of transfers done (32000), • Multiply by 8 to get the throughput in bytes/s
Using the debugger
• Add code to the "main" function
#if NIOS_GDB
    nios_gdb_install(1);
    nios_gdb_breakpoint();
#endif
• Build the program
nios-build -debug mem_access.c
• Run the shell script produced by "nios-build"
myprogram.gdb