
Memory

INF5060: Multimedia data communication using network processors. Memory. 10/9 - 2004. Overview. Memory on the IXP cards Kinds of memory Its features Its accessibility Microengine assembler Memory management. Kinds of Memory. Microengine. IXP Functional Units. Host machine. PCI Bus.



Presentation Transcript


  1. INF5060: Multimedia data communication using network processors • Memory • 10/9 - 2004

  2. Overview • Memory on the IXP cards • Kinds of memory • Its features • Its accessibility • Microengine assembler • Memory management

  3. Kinds of Memory

  4. IXP Functional Units (block diagram) • Host machine • PCI Bus, 64 bit/33 MHz • PCI-to-PCI bridge • IXP Network Processor • Microengine • StrongARM Core • PCI Bus Unit • SDRAM Unit • SRAM Unit • IX Bus Unit • Various busses • SDRAM (up to 256 MB), 64 bit/116 MHz • SRAM (up to 8 MB), 32 bit/116 MHz • Flash ROM (up to 8 MB) • Memory Mapped I/O devices • IX Bus, 64 bit/104 MHz • Other IX devices • Ethernet MAC

  5. Kinds of Memory • Physical memory on the IXP1200 is contiguous • Memory in general is not byte-addressable • Memory units emulate byte addressing for the StrongARM • Big endian architecture • StrongARM: big endian mode • Microengines are big endian

  6. Terms • Careful! Inconsistencies! • Wording in Intel IXP manuals • Word: 16 bit • Longword: 32 bit • Quadword: 64 bit • Wording in StrongARM and other ARM manuals • Halfword: 16 bit • Word: 32 bit
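To keep the two vocabularies apart in StrongARM-side C code, one option is to fix the Intel names as typedefs and to read multi-byte values explicitly in big endian order. This is only a sketch: the typedef names and the helper `load_be_longword` are made up for illustration and do not come from any Intel or ARM header.

```c
#include <assert.h>
#include <stdint.h>

/* Intel IXP wording (typedef names are hypothetical, chosen for this sketch). */
typedef uint16_t ixp_word_t;      /* Intel "word": 16 bit (ARM: halfword) */
typedef uint32_t ixp_longword_t;  /* Intel "longword": 32 bit (ARM: word) */
typedef uint64_t ixp_quadword_t;  /* Intel "quadword": 64 bit */

/* Read a big endian longword from a byte buffer.
 * Works the same on little and big endian hosts. */
static ixp_longword_t load_be_longword(const uint8_t *p)
{
    assert(p != 0);
    return ((ixp_longword_t)p[0] << 24) |
           ((ixp_longword_t)p[1] << 16) |
           ((ixp_longword_t)p[2] << 8)  |
            (ixp_longword_t)p[3];
}
```

With such typedefs, a comment like "offset in longwords" stays unambiguous no matter which manual the reader has open.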

  7. Kinds of Memory • Memory accessible to StrongARM • Mapped into a single address space • Memory accessible to microengines • Individually mapped • Separate assembler instructions for each kind • Diagram labels: SDRAM, Scratchpad, Microengine registers, SRAM • StrongARM address map (start address per device) • Device 6, SDRAM Unit: C000 0000 (up to FFFF FFFF) • Device 5, AMBA Translation Unit: B000 0000 • Device 4, Reserved: A000 0000 • Device 3, StrongARM Core System: 9000 0000 • Device 2, Reserved: 8000 0000 • Device 1, PCI Unit: 4000 0000 • Device 0, SRAM Unit: 0000 0000

  8. Memory: memory, cache memory, registers • StrongARM core caches • Microengine registers • SDRAM • SRAM • IX Bus Unit: Scratch(pad) memory

  9. StrongARM

  10. StrongARM Core Features • A general purpose processor • With MMU • 16 Kbytes instruction cache • Round robin replacement • 8 Kbytes data cache • Round robin replacement • Write-back cache, cache replacement on read, not on write • 512 byte mini-cache for data that is used once and then discarded • To reduce flushing of the main data cache • Instruction code stored in SDRAM

  11. StrongARM Core Access • Full access to • SDRAM Unit • SRAM Unit incl. FlashROM • PCI Bus Unit • Access to microengines’ • Program code • Status registers • Program counters • Access to IX bus unit’s • Status registers • Scratch memory • Diagram labels: Microengine, StrongARM Core, SDRAM Unit, SRAM Unit, PCI Bus Unit, IX Bus Unit

  12. Microengines

  13. Microengine Features • 4 hardware contexts • 2K x 32 bit instruction control store • Every instruction is 32 bits long • No instruction cache • Instructions downloaded onto the microengine by the StrongARM • Not loaded from RAM on demand • 5-stage instruction pipeline • Blocks for reference operations • Deferred execution to reduce context switch penalty • 256 registers • 32 bit registers • Load and store architecture • Must bring data into registers, work, write to destination • Single cycle access in registers • Use “reference command” to fetch into registers • Yield/sleep during fetch execution

  14. Microengine Access • Full access to • SDRAM Unit • SRAM Unit • IX Bus Unit • Access to StrongARM • Interrupts • Trigger status register reads • Access to PCI bus unit • Initiate DMA with SDRAM • Access to other microengines • None • Access to self • Inter-thread signaling • No access to own instruction code • Diagram labels: Microengine, StrongARM Core, SDRAM Unit, SRAM Unit, PCI Bus Unit, IX Bus Unit

  15. Microengine Registers From: IXP1200 Family Hardware Reference Manual

  16. Microengine Registers • 256 registers • 128 general purpose registers • Arranged in two banks A and B • Instructions with 2 input registers • From different banks • Otherwise assembler warning • 128 transfer registers • Transfer registers are not general purpose registers • Ports to their neighboring functional unit • 64 SDRAM transfer registers • Transfer to and from SDRAM • 32 read / 32 write • 64 SRAM transfer registers • Transfer to and from everything but SDRAM • 32 read / 32 write • 4 busses can be used in parallel • By different threads • Loading transfer registers • 64 bytes at once from one functional unit to another • 128 bytes at once from the IX bus

  17. SDRAM

  18. General features • Recommended use • StrongARM instruction code • Large data structures • Packets during processing • 64-bit addressed (8 byte aligned, quadword aligned) • 256 Mbytes • 928 Mbytes/s peak bandwidth • Higher bandwidth than SRAM • Higher latency than SRAM • Access • StrongARM • Microengines • StrongARM takes precedence • PCI DMA on behalf of microengines • Direct access to IX Bus Unit’s Transmit and Receive FIFO

  19. Special features • Byte, word, longword access supported through a read-modify-write access to quadwords • Speed penalty • Direct path from SDRAM to IX Bus Transmit and Receive FIFOs • Controlled by microengines • Up to 64 bytes transferable without microengine involvement • Byte aligner between SDRAM and IX Bus • For sending to the Transmit FIFO • Shift bytewise when e.g. header length has changed • Can only be used by microengines in the t_fifo_wr command

  20. SRAM

  21. General features • Recommended use • Lookup tables • Free buffer lists • Data buffer queue lists • 32-bit addressed (4 byte aligned, longword aligned) • 8 Mbytes • 464 Mbytes/s peak bandwidth • Lower bandwidth than SDRAM • Lower latency than SDRAM • Access • StrongARM • Microengines • StrongARM takes precedence
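The peak bandwidth figures follow directly from the bus widths and clock rates shown in the functional unit diagram (64 bit at 116 MHz for SDRAM, 32 bit at 116 MHz for SRAM). A short sketch of the arithmetic, assuming one transfer per clock cycle and counting a Mbyte as 10^6 bytes; the function name is made up:

```c
#include <assert.h>

/* Peak bandwidth in Mbytes/s for a bus that moves one full bus width
 * per clock cycle (assumption of this sketch). */
static unsigned peak_mbytes_per_s(unsigned bus_width_bits, unsigned clock_mhz)
{
    assert(bus_width_bits % 8u == 0);
    return (bus_width_bits / 8u) * clock_mhz;
}
```

For example, `peak_mbytes_per_s(64, 116)` gives 928 for SDRAM and `peak_mbytes_per_s(32, 116)` gives 464 for SRAM, matching the numbers on the slides.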

  22. Accessing SRAM • StrongARM access • Byte, word and longword access • Bit operations through SRAM Alias Address Space • Bit, byte, word write supported through read-modify-write • Microengine access • Bit and longword access only • Up to 8 longwords with one command • Bit write supported through read-modify-write • Bit operations within instructions

  23. Special features • Atomic push/pop operations • For maintaining lists • 8 entry push/pop register list • Microengines • Named commands • StrongARM • Dedicated memory addresses • Don’t cache these memory areas • Atomic bit test, set and clear • For synchronized access • Microengine • Use a write transfer register • Specify bits to test, read, or write • Reading the bit changes the write transfer register • StrongARM • Special macros for read-modify-write operations • Blocks until operation is completed • Don’t cache this memory

  24. Special features • 8 entry CAM (content addressable memory) for read locks • For synchronized access • 8 concurrent locks on memory • Protect from StrongARM and microengines • Read, unlock and write_unlock • Microengines • sram assembler command • Waits until lock is released • StrongARM • 3 separate 8 MByte mapped memory regions • Failed locking is indicated by flags, read always successful • Don’t cache these memory areas
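The behaviour of the 8 entry lock CAM can be approximated by a small software model. This sketch is an illustration only: the structure and function names are invented, and the try-lock style (fail and report instead of waiting) corresponds to the StrongARM view, where a failed lock is signalled by a flag. A microengine would wait instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAM_ENTRIES 8u  /* 8 concurrent locks, as on the slide */

struct lock_cam {
    uint32_t addr[CAM_ENTRIES];
    bool     used[CAM_ENTRIES];
};

/* True if addr is currently held in the CAM. */
static bool cam_is_locked(const struct lock_cam *cam, uint32_t addr)
{
    assert(cam != 0);
    for (unsigned i = 0; i < CAM_ENTRIES; i++)
        if (cam->used[i] && cam->addr[i] == addr)
            return true;
    return false;
}

/* Try to lock addr. Fails if addr is already locked or the CAM is full. */
static bool cam_try_lock(struct lock_cam *cam, uint32_t addr)
{
    if (cam_is_locked(cam, addr))
        return false;
    for (unsigned i = 0; i < CAM_ENTRIES; i++) {
        if (!cam->used[i]) {
            cam->used[i] = true;
            cam->addr[i] = addr;
            return true;
        }
    }
    return false;
}

/* Release the lock on addr. Returns false if addr was not locked. */
static bool cam_unlock(struct lock_cam *cam, uint32_t addr)
{
    assert(cam != 0);
    for (unsigned i = 0; i < CAM_ENTRIES; i++) {
        if (cam->used[i] && cam->addr[i] == addr) {
            cam->used[i] = false;
            return true;
        }
    }
    return false;
}
```

The model makes the resource limit visible: a ninth lock fails even when it targets an address nobody else is using.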

  25. StrongARM Core Memory Map (start address per device) • Device 6, SDRAM Unit: C000 0000 (up to FFFF FFFF) • Device 5, AMBA Translation Unit: B000 0000 • Device 4, Reserved: A000 0000 • Device 3, StrongARM Core System: 9000 0000 • Device 2, Reserved: 8000 0000 • Device 1, PCI Unit: 4000 0000 • Device 0, SRAM Unit: 0000 0000
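The StrongARM memory map can be turned into a small lookup that tells which device a given address belongs to. The boundaries are taken from the slide. The function names are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Device number (0..6) that a StrongARM address falls into. */
static int strongarm_device(uint32_t addr)
{
    if (addr >= 0xC0000000u) return 6; /* SDRAM Unit */
    if (addr >= 0xB0000000u) return 5; /* AMBA Translation Unit */
    if (addr >= 0xA0000000u) return 4; /* Reserved */
    if (addr >= 0x90000000u) return 3; /* StrongARM Core System */
    if (addr >= 0x80000000u) return 2; /* Reserved */
    if (addr >= 0x40000000u) return 1; /* PCI Unit */
    return 0;                          /* SRAM Unit */
}

/* Human-readable name for a device number. */
static const char *strongarm_device_name(int dev)
{
    static const char *const names[7] = {
        "SRAM Unit", "PCI Unit", "Reserved", "StrongARM Core System",
        "Reserved", "AMBA Translation Unit", "SDRAM Unit"
    };
    assert(dev >= 0 && dev <= 6);
    return names[dev];
}
```

Such a helper is handy in debug output, for instance to check that a pointer handed to a microengine really lies in SDRAM.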

  26. Memory Map for SRAM addresses

  27. IX Bus Unit

  28. IX Bus Unit (block diagram) • IXP Network Processor • SDRAM Unit • StrongARM • Microengines • IX Bus Unit “FBI” Engine Interface • Scratchpad • Status Registers • Hash Units • Transmit FIFO • Receive FIFO • IX Bus • Other IX devices • Ethernet MAC

  29. Scratch Memory: General Features • Recommended use • Passing messages between processors and between threads • Semaphores, mailboxes, other IPC • 32-bit addressed (4 byte aligned, longword aligned) • 4 Kbytes • Has an atomic autoincrement instruction • Only usable by microengines

  30. StrongARM Core Memory Map (start address per device) • Device 6, SDRAM Unit: C000 0000 (up to FFFF FFFF) • Device 5, AMBA Translation Unit: B000 0000 • Device 4, Reserved: A000 0000 • Device 3, StrongARM Core System: 9000 0000 • Device 2, Reserved: 8000 0000 • Device 1, PCI Unit: 4000 0000 • Device 0, SRAM Unit: 0000 0000 • Legend: ME = microengine

  31. Microengine Assembler

  32. Using Microengine Registers • Programming • Context-relative addressing • Each thread can have its own window of registers (one fourth of the total), so threads can’t overwrite each other’s registers • Absolute addressing • Register is visible to all threads • Context-relative vs. absolute addressing • Decided on a per-instruction basis • Assembler • Supports symbolic names • Assigns registers from the different kinds • Programmer must take care concerning the number of registers used • Programmer can hint the assembler to assign (transfer) registers contiguously • Context-relative addressing of the registers • Threads are only able to address their own register share • This is more typically used • Assembler notations • symbolic_register_name – general purpose register • $symbolic_register_name – SRAM transfer register • $$symbolic_register_name – SDRAM transfer register • Absolute addressing • Threads can use more than their share of registers • Threads can communicate via registers • Assembler notations • @symbolic_register_name – general purpose register • @$symbolic_register_name – SRAM transfer register • @$$symbolic_register_name – SDRAM transfer register
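The relation between context-relative and absolute register numbers can be sketched as a simple index calculation. The sketch assumes that each of the 4 contexts owns one contiguous quarter of the 128 general purpose registers and ignores the split into banks A and B. The real register file layout may differ, and all names here are invented.

```c
#include <assert.h>

#define NUM_CONTEXTS     4u    /* hardware contexts per microengine */
#define NUM_GPRS         128u  /* general purpose registers */
#define GPRS_PER_CONTEXT (NUM_GPRS / NUM_CONTEXTS)  /* one fourth of the total */

/* Map (context, context-relative index) to an absolute register index.
 * Assumption: windows are contiguous and laid out context by context. */
static unsigned gpr_absolute_index(unsigned ctx, unsigned rel)
{
    assert(ctx < NUM_CONTEXTS);
    assert(rel < GPRS_PER_CONTEXT);
    return ctx * GPRS_PER_CONTEXT + rel;
}
```

Under this model, two threads using the same relative index never collide, while an absolute (`@`) name always refers to the same physical register in every thread.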

  33. Microengine Assembler • ALU • alu[dest_reg, A_operand, alu_op, B_operand] • Perform addition, subtraction, bit operations • dest_reg • transfer register (TR), general purpose register (GPR) or nothing • A_operand • TR, GPR, immediate data, or nothing • B_operand • TR, GPR, or immediate data • ALU_SHF • alu_shf[dest_reg, A_operand, alu_op, B_operand, B_op_shift_cnt] • Like ALU, but shift B_operand before evaluation • dest_reg • Context-relative TR, GPR, or nothing • A_operand • TR, GPR, immediate data, or nothing • B_operand • TR, GPR, or immediate data

  34. Microengine Assembler • BR_BCLR, BR_BSET • br_bclr[reg, bit_position, label#] • Branch if the given bit (0-31) in register reg is cleared or set, respectively • reg • Context-relative TR or GPR • BR=BYTE, BR!=BYTE • br=byte[reg, byte_spec, byte_compare_value, label#] • Branch if the indicated byte (0-3) of register reg equals the constant byte_compare_value, or does not, respectively • reg • Context-relative TR or GPR

  35. Microengine Access to SDRAM • Read, write, Receive FIFO read, Transmit FIFO write • sdram[sdram_cmd, $$sdram_xfer_reg, source_op_1, source_op_2, ref_count], optional_token • Parameters • sdram_cmd • read: read from SDRAM to TRs • write: write from TRs to SDRAM • r_fifo_rd: read from Receive FIFO to SDRAM • t_fifo_wr: write to Transmit FIFO from SDRAM • $$sdram_xfer_reg • The first of a set of contiguous TRs for read and write operations • One ref_count requires two TRs • source_op_1/2 • Specifies the address to read from or to write to • ref_count • Values between 1 and 8 are valid • optional_token • ctx_arb allows other threads to run until memory operation is complete • ctx_swap switches context to the next thread • The (complicated) indirect_ref option must be used with r_fifo_rd and t_fifo_wr

  36. Microengine Access to SRAM (1/2) • Read, write, read and lock, write and unlock, unlock, … • sram[sram_cmd, $sram_xfer_reg, source_op_1, source_op_2, ref_count] optional_token • sram_cmd • Read or write • $sram_xfer_reg • the first of ref_count contiguous TRs • source_op_1+source_op_2 • Specifies the address to read from or to write to • ref_count • The number of longwords read or written • sram[read_lock, $sram_xfer_reg, source_op_1, source_op_2, ref_count] optional_token • Like sram[read, …] • But lock the address source_op_1+source_op_2 • sram[write_unlock, $sram_xfer_reg, source_op_1, source_op_2, 1] optional_token • Write one TR to source_op_1+source_op_2 and unlock the address • sram[unlock, --, source_op_1, source_op_2, 1] optional_token • Unlock the address specified by source_op_1+source_op_2

  37. Microengine Access to SRAM (2/2) • …, bit operations, push, pop • sram[bit_wr, $bit_mask, source_op_1, source_op_2, bit_op] optional_token • As with scratch memory but with the larger address space • $bit_mask is a write TR that holds the mask on input and optional results • sram[push, --, source_op_1, source_op_2, queue_num] optional_token • Add source_op_1 and source_op_2 to get an address • Push the address onto queue queue_num • sram[pop, $popped_list, --, --, queue_num] optional_token • Pop an address from queue queue_num • Store the pointer in the TR $popped_list
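The push and pop commands behave like operations on a stack of addresses, which is what makes them suitable for free buffer lists. The model below is a sketch under assumptions: it supposes that each of the 8 queues keeps a head pointer and that the list is linked through the first longword of each pushed entry, with address 0 meaning "empty". The names and the tiny memory size are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_LONGWORDS 64u  /* tiny model memory, hypothetical size */
#define NUM_QUEUES     8u   /* 8 entry push/pop register list */

struct sram_model {
    uint32_t mem[SRAM_LONGWORDS];  /* longword-addressed memory */
    uint32_t head[NUM_QUEUES];     /* 0 means "list empty" in this sketch */
};

/* Push the entry at addr onto queue q: link it in front of the old head. */
static void model_push(struct sram_model *s, unsigned q, uint32_t addr)
{
    assert(q < NUM_QUEUES);
    assert(addr != 0 && addr < SRAM_LONGWORDS);
    s->mem[addr] = s->head[q];
    s->head[q] = addr;
}

/* Pop the most recently pushed address from queue q, or 0 if empty. */
static uint32_t model_pop(struct sram_model *s, unsigned q)
{
    assert(q < NUM_QUEUES);
    uint32_t addr = s->head[q];
    if (addr != 0)
        s->head[q] = s->mem[addr];
    return addr;
}
```

In hardware the two steps of each operation are performed atomically by the SRAM unit, so several threads can share one list without extra locking. The C model has no such guarantee.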

  38. Microengine Access to Scratch Memory • Read, write, bit operations, in-place increment • scratch[bit_wr, $sram_xfer_reg, source_op_1, source_op_2, bit_op], optional_token • Bit operations • scratch[read, $sram_xfer_reg, source_op_1, source_op_2, ref_count], optional_token • Read into transfer registers • scratch[write, $sram_xfer_reg, source_op_1, source_op_2, ref_count], optional_token • Write from transfer registers • scratch[incr, --, source_op_1, source_op_2, 1], optional_token • In-place increment by 1 • Parameters • source_op_1/2 • Context-relative transfer registers (TRs) or immediate values • Sum between 0 and 1023 • $sram_xfer_reg • For read and write: the first of a set of contiguous TRs to be read or written • For bit_wr: a TR containing a bit mask • ref_count • Number of longwords read or written • Between 1 and 8 • bit_op • set_bits, clear_bits, test_and_set_bits, test_and_clear_bits • For the test_ operations, the write TR is modified
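The four bit operations can be summarised in a short software model. It is a sketch, not the hardware behaviour in every detail: it assumes that the test_ variants hand back the previous content of the longword through the transfer register, and the enum and function names are invented.

```c
#include <assert.h>
#include <stdint.h>

enum bit_op { SET_BITS, CLEAR_BITS, TEST_AND_SET_BITS, TEST_AND_CLEAR_BITS };

/* Apply op to *mem using the mask held in *xfer.
 * Assumption: for the test_ operations, *xfer receives the old value of *mem. */
static void model_bit_wr(uint32_t *mem, uint32_t *xfer, enum bit_op op)
{
    assert(mem != 0 && xfer != 0);
    uint32_t mask = *xfer;
    uint32_t old = *mem;

    switch (op) {
    case SET_BITS:
        *mem = old | mask;
        break;
    case CLEAR_BITS:
        *mem = old & ~mask;
        break;
    case TEST_AND_SET_BITS:
        *mem = old | mask;
        *xfer = old;
        break;
    case TEST_AND_CLEAR_BITS:
        *mem = old & ~mask;
        *xfer = old;
        break;
    }
}
```

A simple semaphore follows from this: a thread issues test_and_set_bits on a flag bit and owns the resource only if the returned old value had the bit cleared.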

  39. Microengine Assembler • Ordering problems • Example • immed[$$temp, 0x1234] • sdram[write, $$temp, base, 0, 1], ctx_swap, defer[1] • immed[$$temp, 0x5678] • The wrong value may be written • Writing and context swapping are deferred • The register modification may overtake the write • Address of a register • It is possible to determine the address of a register • .local a_gp_reg • immed[a_gp_reg, &$an_sram_reg] • .endlocal

  40. Memory Management

  41. Resource Manager • Task • Used by StrongARM code • For microACEs and microACE applications to interface with microengines • API • Load code into microengines • Enable/disable microengines • Get/set microengine configuration and resource assignment • Send and receive packets to and from microcode blocks • Allocate and access uncached SRAM, SDRAM and Scratch memory

  42. Resource Manager • Data structures • RmMemoryHandle • Opaque handle identifying memory allocated by the resource manager • typedef int RmMemoryHandle

  43. Resource Manager • RmMalloc • Allocate a particular kind of memory • RM_SRAM • RM_SDRAM • RM_SCRATCH • Some SRAM and SDRAM is already used by the ASL, some SDRAM is used by Linux, the rest can be used freely by microACEs for data structures of their choosing • The memory is not cached • The memory is not protected by an MMU, and the virtual address is the same for all processes • Returned pointers are always aligned (SDRAM to 8 bytes, SRAM and Scratch to 4 bytes) • Requested sizes are rounded to alignment • This allocation is not efficient • microACEs should allocate all memory they need at once and manage it themselves • ix_error RmMalloc( RmMemoryType in_memory_type, unsigned char* out_mem_handle_ptr, int in_size_in_bytes ); • RmFree • Releases memory allocated by RmMalloc • ix_error RmFree( unsigned char* ptr );
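The advice to allocate once and manage the memory yourself can be followed with a simple bump allocator on top of one large block. The sketch below does not call the Resource Manager. It only models the sub-allocation and the rounding to the alignment of each memory kind (8 bytes for SDRAM, 4 bytes for SRAM and Scratch). The enum, the structure and the function names are hypothetical, and the block's base pointer is assumed to be aligned already, as the slide states for pointers returned by RmMalloc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the memory kinds; not the real RmMemoryType. */
enum pool_kind { POOL_SRAM, POOL_SDRAM, POOL_SCRATCH };

/* Alignment per memory kind, as given on the slide. */
static size_t pool_alignment(enum pool_kind kind)
{
    return kind == POOL_SDRAM ? 8u : 4u;
}

/* Round n up to the next multiple of align. */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0);
    return (n + align - 1) / align * align;
}

struct pool {
    unsigned char *base;  /* start of the block obtained once (assumed aligned) */
    size_t size;          /* total size of the block in bytes */
    size_t used;          /* bytes handed out so far */
    enum pool_kind kind;
};

/* Hand out the next n bytes (rounded to alignment), or NULL if exhausted. */
static void *pool_alloc(struct pool *p, size_t n)
{
    assert(p != NULL);
    size_t need = round_up(n, pool_alignment(p->kind));
    if (need > p->size - p->used)
        return NULL;
    void *result = p->base + p->used;
    p->used += need;
    return result;
}
```

In a microACE, `base` would be the pointer obtained from a single RmMalloc call at start-up. Freeing individual pieces is left out on purpose, since many packet processing structures live for the whole run.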

  44. Resource Manager • Translating between virtual and physical addresses • The microengines map memory differently into their address space than the StrongARM • StrongARM addresses make no sense to the microengines and have to be translated to offsets from the start of each particular kind of memory (and back) • RmGetPhysOffset • ix_error RmGetPhysOffset( RmMemoryType in_memory_type, unsigned char* in_data_ptr, unsigned int* out_offset ); • Translate address in_data_ptr in RmMalloc’d memory to its offset from the start of the given memory type • The offset is in longwords (4 byte units) for SRAM and Scratch, and in quadwords (8 byte units) for SDRAM • RmGetVirtualAddress • ix_error RmGetVirtualAddress( RmMemoryType in_memory_type, unsigned char** out_buffer_ptr, unsigned int in_offset); • Take the physical offset from the base of the given memory type and translate it into a virtual address valid for the StrongARM
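The translation itself is plain pointer arithmetic once the base of a memory kind is known. The following sketch mirrors the unit sizes from the slide (4 byte units for SRAM and Scratch, 8 byte units for SDRAM) but is not the Resource Manager implementation. The enum, the function names and the idea of passing the base pointer explicitly are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the memory kinds; not the real RmMemoryType. */
enum xlate_kind { XLATE_SRAM, XLATE_SDRAM, XLATE_SCRATCH };

/* Size of one offset unit in bytes for the given memory kind. */
static size_t xlate_unit_bytes(enum xlate_kind kind)
{
    return kind == XLATE_SDRAM ? 8u : 4u;
}

/* StrongARM pointer -> offset in units from the base of the memory kind. */
static unsigned int phys_offset(enum xlate_kind kind,
                                const unsigned char *base,
                                const unsigned char *ptr)
{
    size_t unit = xlate_unit_bytes(kind);
    assert(ptr >= base);
    size_t diff = (size_t)(ptr - base);
    assert(diff % unit == 0);  /* pointer must be aligned to the unit size */
    return (unsigned int)(diff / unit);
}

/* Offset in units -> StrongARM pointer. */
static unsigned char *virt_address(enum xlate_kind kind,
                                   unsigned char *base,
                                   unsigned int offset)
{
    return base + (size_t)offset * xlate_unit_bytes(kind);
}
```

The same byte distance yields different offsets depending on the memory kind: 16 bytes into SDRAM is offset 2, while 16 bytes into SRAM is offset 4. Mixing up the unit size is an easy mistake when passing addresses to microcode.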

  45. Summary • Memory on the IXP cards • Kinds of memory • Its features • Its accessibility • Microengine assembler • Resource Manager functions
