TI Enhanced ARM925T Core
OMAP1510 Architecture • TI Enhanced ARM925T Core • Up to 168 MHz (maximum frequency) • Voltage: 1.5 V nominal • 16 KB I-cache; 8 KB D-cache • 192 KB of shared internal SRAM (frame buffer) • Support for 32-bit and 16-bit (Thumb mode) instruction sets • Data and program MMUs • Two 64-entry translation look-aside buffers (TLBs) for MMUs • 17-word write buffer
TI925T – MPU SUBSYSTEM • The ARM9TDMI core, as enhanced by Texas Instruments, is called the TI925T • Based on the Harvard architecture • -- Separate buses for instructions and data • -- Allows concurrent instruction and data access (reduces the processor's CPI) • 32-bit ARM mode and 16-bit Thumb mode
ARM9 • RISC processor • Load-store architecture • Fixed-length instructions and a fixed-time pipelined organization • Register organization • -- 16 GPRs under User mode • -- 5 shadow registers (r8 to r12) under FIQ mode • -- 5 SP registers for exception-mode stack handling • -- 5 LR registers for exception handling • -- 5 SPSRs to save the status flags when an exception is taken • -- 1 CPSR to hold the ALU status flags and the processor state
CPU Details • Register bank with 37 registers • 32-bit address and data buses • ALU • Barrel shifter • Multiplier
Registers • The ARM core has a total of 37 registers: • 31 general-purpose registers, including a program counter. These registers are 32 bits wide. • 6 status registers. These are also 32 bits wide, but only some of the bits (the condition flags, the interrupt disable bits, the Thumb state bit, and the mode bits) are allocated or need to be implemented.
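As a quick sanity check, the register breakdown from the ARM9 slide can be tallied against the totals given here. This is only a sketch; the dictionary keys are illustrative labels, not official register names.

```python
# Tally of the ARM9 register file using the per-group counts from the slides.
# The group names are illustrative labels chosen for this sketch.
GENERAL_PURPOSE = {
    "user_mode_r0_to_r15": 16,   # includes the program counter (r15)
    "fiq_shadow_r8_to_r12": 5,
    "banked_stack_pointers": 5,  # one SP per exception mode
    "banked_link_registers": 5,  # one LR per exception mode
}
STATUS = {
    "spsr": 5,  # one per exception-handling mode
    "cpsr": 1,
}


def register_totals():
    """Return (general-purpose count, status count, overall total)."""
    general = sum(GENERAL_PURPOSE.values())
    status = sum(STATUS.values())
    return general, status, general + status


print(register_totals())
```

The three numbers correspond to the 31 general-purpose registers, 6 status registers, and 37 registers in total.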
Saved Program Status Registers (SPSRs) • The SPSRs are used to store the CPSR when an exception is taken. One SPSR is accessible in each of the exception-handling modes. • User mode and System mode do not have an SPSR because they are not exception-handling modes.
Current Program Status Register (CPSR) • The CPSR holds: • copies of the Arithmetic Logic Unit (ALU) status flags • the current processor mode • interrupt disable flags. • The ALU status flags in the CPSR are used to determine whether conditional instructions are executed or not. • On Thumb-capable processors, the CPSR also holds the current processor state (ARM or Thumb).
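The CPSR fields listed above can be illustrated by decoding a raw 32-bit value. The sketch below assumes the standard ARMv4T bit layout (N, Z, C, V in bits 31 to 28; I, F, T in bits 7, 6, 5; mode in bits 4 to 0); the function and field names are made up for this example.

```python
# Mode encodings of the ARMv4T CPSR mode field (bits 4..0).
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(value):
    """Split a 32-bit CPSR value into its flags, masks, state, and mode."""
    return {
        "N": bool(value >> 31 & 1),  # negative
        "Z": bool(value >> 30 & 1),  # zero
        "C": bool(value >> 29 & 1),  # carry
        "V": bool(value >> 28 & 1),  # overflow
        "irq_disabled": bool(value >> 7 & 1),
        "fiq_disabled": bool(value >> 6 & 1),
        "state": "Thumb" if value >> 5 & 1 else "ARM",
        "mode": MODES.get(value & 0x1F, "reserved"),
    }


print(decode_cpsr(0x600000D3))
```

The example value has Z and C set, both interrupt sources masked, and the core in Supervisor mode, ARM state.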
Program Counter (pc) • The program counter is accessed as r15 (or pc). It is incremented by one word (four bytes) for each instruction in ARM state, or by two bytes in Thumb state. • Branch instructions and data-processing instructions can load a destination address into the program counter. For example, to return from a subroutine, copy the link register into the program counter using: • MOV pc,lr • During execution, r15 does not contain the address of the currently executing instruction. The address of the currently executing instruction is typically pc - 8 for ARM, or pc - 4 for Thumb.
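The relationship between the value read from r15 and the executing instruction can be written as a small helper. This is a sketch only; `executing_address` and the state labels are names invented for the example.

```python
# Distance between the value read from r15 and the executing instruction,
# per the figures on the slide (8 bytes in ARM state, 4 bytes in Thumb state).
PC_READ_AHEAD = {"arm": 8, "thumb": 4}


def executing_address(pc_value, state):
    """Return the address of the instruction that read pc_value from r15."""
    return (pc_value - PC_READ_AHEAD[state]) & 0xFFFFFFFF


print(hex(executing_address(0x8008, "arm")))
print(hex(executing_address(0x8004, "thumb")))
```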
Memory Interface • ARM data bus (32-bit) • -- To ease connection to sub-word-sized memory systems, input data and instructions can be latched on a byte-by-byte basis. • External data bus • -- Either a single 32-bit bidirectional bus • -- Or separate 32-bit unidirectional data-in and data-out buses.
Version 5 • Improves the efficiency of ARM/Thumb interworking in T variants • Adds some extra instructions in both ARM and Thumb modes • Adds more instruction options for coprocessor designers • Some instructions are executed unconditionally.
Additional Instructions • BKPT • BLX • CLZ • CDP2, LDC2, STC2, MCR2, MRC2 • Minor changes with LDR, LDM
BKPT Instruction (ARM) • Causes a software breakpoint to occur. • Handled by an exception handler installed on the prefetch abort vector. • Takes a 16-bit immediate value, which is ignored by ARM hardware but may be used by the debugger to store additional information about the breakpoint. • Unconditional instruction. • BKPT <immediate>
BKPT Instruction (Thumb) • Causes a software breakpoint and uses the prefetch abort vector. • Hardware can optionally override this behaviour. • Takes an 8-bit immediate value. • BKPT <immediate_8>
BLX instruction (ARM), immediate form • Used to call a Thumb subroutine from the ARM instruction set. • Unconditional branch. • Uses a 24-bit offset, which gives a range of ±32 Mbytes. • BLX <target_address>
BLX instruction (ARM), register form • Uses the address specified in a register, like the BX instruction. • The least significant bit of the register is copied into the T bit of the CPSR. • BLX {<cond>} <Rm>
BLX instruction (Thumb), immediate form • Uses an 11-bit offset and works the same way as the BL instruction. • BLX <target address>
BLX instruction (Thumb), register form • Uses the target address specified in a register. • The T flag is updated with bit 0 of the specified register. • BLX <Rm>
Condition code 0b1111 • Prior to V3, this code meant that the instruction was never executed (NV). • In V3 and V4, its behaviour is unpredictable. • In V5, it is used to encode various instructions that can only be executed unconditionally.
CDP2, LDC2, STC2, MCR2, MRC2 • These variants have the condition field of the instruction set to 0b1111. • This provides additional opcode space for coprocessor designers. • The resulting instructions can only be executed unconditionally.
CLZ instruction (ARM) • Count Leading Zeros • CLZ {cond} <Rd>, <Rm> • Returns the number of binary zero bits before the first binary one bit in a register value. • Source register is scanned from the most significant bit towards the least significant bit. • Result is 32 if no bits are set in the source register and zero if bit 31 is set.
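The behaviour described for CLZ can be modelled in software by scanning from bit 31 downwards. This is an illustrative model of the result only, not a description of how the hardware computes it; `clz32` is a name chosen for the sketch.

```python
def clz32(value):
    """Count the zero bits above the most significant set bit of a 32-bit value."""
    value &= 0xFFFFFFFF
    count = 0
    # Scan from the most significant bit towards the least significant bit.
    for bit in range(31, -1, -1):
        if value & (1 << bit):
            break
        count += 1
    return count


print(clz32(0), clz32(0x80000000), clz32(1))
```

A typical use is normalisation: shifting a non-zero value left by `clz32(value)` places its top set bit in bit 31.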
LDM instruction (ARM) • If the PC is loaded in the process, bit 0 of the loaded value determines whether execution continues in ARM or Thumb state. • T bit = Value[0]
POP instruction (Thumb) • If the PC is loaded, bit 0 of the loaded value determines whether execution continues after this branch in ARM state or in Thumb state. • T bit = Value[0]
LDR instruction (ARM) • If the destination register is the PC, bit 0 of the loaded value determines whether execution continues in ARM or Thumb state. • T bit = Value[0]
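The interworking rule shared by LDM, POP, and LDR (and by the register forms of BLX) can be sketched as follows. The helper name `load_pc` is hypothetical, and the model only clears bit 0 of the target; alignment handling of other low-order bits is deliberately left out.

```python
def load_pc(value):
    """Model a V5 load to the PC: bit 0 selects the state, the rest is the target."""
    value &= 0xFFFFFFFF
    state = "thumb" if value & 1 else "arm"   # T bit = Value[0]
    target = value & 0xFFFFFFFE               # bit 0 is not part of the address
    return target, state


print(load_pc(0x8001))
print(load_pc(0x9000))
```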
ARM 9 Architecture • Based on the Harvard architecture • -- Separate buses for instructions and data • -- Allows concurrent instruction and data access (reduces the processor's CPI) • Normally uses separate instruction and data caches.
ARM 9 Pipeline • Uses Five Stage pipeline • Instruction Fetch (F) • Instruction Decode (D) • Execute (E) • Data Memory Access (M) • Register Write (W)
Pipeline stages: Cycle 1 and Cycle 2 • [Figure: five-stage pipeline datapath with the stages Instruction Fetch, Instruction Decode / Register Fetch, Execute / Address Calculation, Memory Access, and Write Back]
Pipeline Stage 1 & 2 • 1. Instruction fetch cycle (IF) • -- load instruction • -- update program counter • 2. Instruction decode / register fetch cycle (ID) • -- fetch source registers • -- sign-extend immediate field
Pipeline Stage 3 • The third cycle is known as the execution / effective address cycle (EX). • The actions performed in this cycle depend on the type of operation. • Loads and stores • -- calculate effective address • ALU operations • -- perform ALU operation • Branches • -- compute branch target • -- determine if the branch is taken
Pipeline Stage 4 • The fourth cycle is known as the memory access / branch completion cycle (MEM). • The only DLX instructions active in this cycle are loads, stores, and branches. • Loads • -- read data from memory into the processor • Stores • -- store data into memory • Branches • -- go to branch target or next instruction • ALU operations • -- do nothing
Pipeline Stage 5 • The fifth cycle is known as the write-back cycle (WB). • During this cycle, results are written to the register file. • Loads • -- write value from memory into register file • ALU operations • -- write ALU result into register file • Stores and branches • -- do nothing
Visualizing Pipelining • [Figure: pipeline diagram with time (clock cycles 1 to 7) on the horizontal axis and instruction order on the vertical axis; each instruction passes through Ifetch, Reg, ALU, DMem, and Reg in successive cycles]
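The overlap between instructions in a five-stage pipeline can be reproduced with a short script. The sketch assumes an ideal pipeline with no stalls or branch penalties; the stage letters follow the F, D, E, M, W naming used earlier, and `pipeline_schedule` is a name chosen for the example.

```python
STAGES = ["F", "D", "E", "M", "W"]  # fetch, decode, execute, memory, write


def pipeline_schedule(num_instructions):
    """Return one row per instruction; column c shows the stage active in cycle c+1."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = ["."] * total_cycles
        # Instruction i enters the pipeline one cycle after instruction i-1.
        for s, name in enumerate(STAGES):
            row[i + s] = name
        rows.append("".join(row))
    return rows


for row in pipeline_schedule(3):
    print(row)
```

With no stalls, n instructions complete in n + 4 cycles, so the cost per instruction approaches one cycle as n grows.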
The MPU core incorporates: • A coprocessor 15 (CP15) and protection module • Data and program memory management units (MMUs) with translation look-aside buffers. • A separate 16K-byte instruction cache and 8K-byte data cache. Both are two-way associative with virtual index virtual tag (VIVT). • A 17-word write buffer (WB) • A local bus interface • The OMAP1510 device uses the TI925T core in little endian mode only.
Cache Memory • Cache memory minimizes external memory access and allows the use of low-cost RAM while maintaining maximum performance. • Cached cores are also ideal in systems where the processor must share limited bus bandwidth with other devices requiring high data throughput (such as streaming audio or video). • The processor operates at full speed from the cache, leaving the system bus free for use by other devices.
2-Way Set-Associative Mapping • [Figure: address split into Tag, Index, and Offset fields; each of the two ways holds a valid bit (V), tag (T), and data (D), and both tags are compared in parallel] • Compromise between direct mapping and fully associative mapping • Index same as in direct mapping • But each cache address contains the contents and tags of 2 or more memory address locations • Tags of that set are compared simultaneously, as in fully associative mapping • A cache with set size N is called N-way set-associative • 2-way, 4-way, and 8-way are common
Instruction Cache • The 16K-byte instruction cache (I-cache) has 1024 lines of 16 bytes arranged as a two-way set-associative cache. It uses the virtual addresses generated by the processor core. • The I-cache is always reloaded one line at a time. • It can be enabled or disabled via the CP15 control register (I_CP15 bit) and is disabled and flushed upon reset.
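The cache figures quoted for the TI925T can be turned into set counts and address fields with a few lines of arithmetic. This is a sketch based on the textbook tag/index/offset split for a set-associative cache; the exact field usage inside the TI925T is an assumption, and the function names are invented for the example.

```python
def cache_geometry(size_bytes, line_bytes, ways):
    """Derive line count, set count, and address field widths for a cache."""
    lines = size_bytes // line_bytes
    sets = lines // ways
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # log2 for powers of two
        "index_bits": sets.bit_length() - 1,
    }


def split_address(va, geometry):
    """Split a virtual address into (tag, index, offset) for the given geometry."""
    offset_bits = geometry["offset_bits"]
    index_bits = geometry["index_bits"]
    offset = va & ((1 << offset_bits) - 1)
    index = (va >> offset_bits) & ((1 << index_bits) - 1)
    tag = va >> (offset_bits + index_bits)
    return tag, index, offset


ICACHE = cache_geometry(16 * 1024, 16, 2)
DCACHE = cache_geometry(8 * 1024, 16, 2)
print(ICACHE)
print(DCACHE)
print([hex(field) for field in split_address(0x12345678, ICACHE)])
```

For the 16K-byte I-cache this gives 1024 lines in 512 sets; the same calculation for the 8K-byte D-cache gives 512 lines in 256 sets.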
Instruction Cache • When the I-cache is enabled, it is searched whenever the processor requests an instruction. • If the cache hits, data is returned to the core whether the MMU is enabled or not. • If a cache read misses, a line fetch is performed and data is written to the cache following a least recently used (LRU) replacement algorithm. • If the I-cache is disabled, it is not searched, and every instruction fetch generates a single 16-bit or 32-bit external access. • For best performance, enable the I-cache as soon as possible after reset.
Validity of I-Cache • The flush I-cache instruction is fetched at cycle time 0, for example, but not executed until cycle time 4 (the TI925T uses a five-stage opcode pipe). • Thus, four additional opcodes potentially are still fetched from the I-cache before the flush I-cache opcode is executed. • Once executed, the entire I-cache is invalidated before the next opcode executes. • The I-cache content is not flushed when the I-cache is disabled. Its contents remain valid and are accessible again when the I-cache is reenabled.
Data Cache • The 8K-byte data cache (D-cache) has 512 lines of 16 bytes arranged as a two-way set-associative cache. It uses the virtual addresses generated by the processor. • The D-cache is always reloaded one line at a time. • The D-cache always requires the MMU to be enabled; it is always disabled when the MMU is off. • The D-cache can operate in write-through (WT) or in copy-back (CB) mode. • The translation look-aside buffer (TLB) descriptors that are placed in memory determine which mode is used. • The D-cache is disabled and flushed upon reset. • The D-cache supports byte, half-word, and word accesses.
Operation of D-Cache • If the D-cache is enabled, it is searched whenever the processor performs a data load or store. • If the cache hits on a load, data is returned to the core regardless of the C_MMU bit. • If a cache read misses, the C_MMU bit is examined. If it is 1, a line fetch is performed and the line is written to the cache following an LRU (least recently used) replacement algorithm. • If C_MMU is 0, a single external access is performed and the cache is not updated. • Stores that hit the D-cache always update it, regardless of the C_MMU bit, to keep the D-cache contents consistent with the external memory. • Stores that miss do not update the D-cache.
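The load and store rules above can be captured in a small behavioural model. This is a sketch under stated assumptions: it tracks tags only (no data), uses the 8K-byte, 16-byte-line, two-way geometry from the Data Cache slide, and assumes that a store hit also refreshes the LRU order, which the slides do not specify. `DCacheModel` and its method names are invented for the example.

```python
from collections import OrderedDict

LINE_BYTES = 16
WAYS = 2
NUM_SETS = 8 * 1024 // LINE_BYTES // WAYS  # 256 sets


class DCacheModel:
    """Tag-only model of the D-cache hit/miss policy described on the slide."""

    def __init__(self):
        # One LRU-ordered tag store per set; the oldest entry sits first.
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def _locate(self, address):
        line = address // LINE_BYTES
        return self.sets[line % NUM_SETS], line // NUM_SETS

    def load(self, address, c_mmu):
        tags, tag = self._locate(address)
        if tag in tags:
            tags.move_to_end(tag)          # mark as most recently used
            return "hit"
        if not c_mmu:
            return "single external access"  # cache is left unchanged
        if len(tags) == WAYS:
            tags.popitem(last=False)       # evict the least recently used line
        tags[tag] = True
        return "line fill"

    def store(self, address):
        tags, tag = self._locate(address)
        if tag in tags:
            tags.move_to_end(tag)          # assumption: store hits refresh LRU
            return "hit, cache updated"
        return "miss, cache not updated"   # no allocation on a store miss


cache = DCacheModel()
print(cache.load(0x1000, c_mmu=False))
print(cache.load(0x1000, c_mmu=True))
print(cache.load(0x1004, c_mmu=False))
print(cache.store(0x2000))
```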
Validity of D-Cache • The D-cache always requires that the MMU be enabled. • The CP15 register allows software to invalidate the entire D-cache. • Disabling the D-cache and reenabling it does not invalidate it. • If CB mode is used, software must first clean the cache to make main memory coherent with it. • Cleaning is not the same as flushing: cleaning writes modified lines back to memory, whereas flushing invalidates the cache contents. • The entire D-cache can be invalidated with a single flush D-cache instruction through the CP15 cache operation register. • The D-cache is flushed upon reset. • If the D-cache is disabled, its contents remain valid and are accessible when the cache is reenabled.
Write Buffer • The write buffer (WB) increases system performance and can buffer up to seventeen 32-bit words of data. • The MMU attributes B (B_MMU) and C (C_MMU) (which are part of the TLB descriptor) and the CP15 control register W bit (W_CP15) control WB behavior. • Clearing W_CP15 and C_CP15 upon reset ensures that all accesses are non-bufferable until the MMU is enabled. • To use the write buffer, the MMU must be enabled.
Enabling Write buffer • To use the write buffer, you must enable the MMU. • However, you can enable the two functions simultaneously with a single write to the CP15 control register. • The write buffer is always disabled when the MMU is off. • Clearing bit 3 in the CP15 control register disables the write buffer.
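The control-register manipulation described here comes down to setting and clearing bits. The sketch below operates on a plain integer standing in for the CP15 control register value; on real hardware the value would be read and written with coprocessor instructions. Bit 3 for the write buffer comes from the slide, while bit 0 for the MMU enable follows the standard ARM CP15 control register layout and is assumed to apply to the TI925T. All names are illustrative.

```python
M_BIT = 1 << 0  # MMU enable (standard ARM CP15 control register layout, assumed)
W_BIT = 1 << 3  # write buffer enable (bit 3, as stated on the slide)


def enable_mmu_and_write_buffer(control):
    """Set both enables so that one register write turns on MMU and write buffer."""
    return control | M_BIT | W_BIT


def disable_write_buffer(control):
    """Clear bit 3 to disable the write buffer."""
    return control & ~W_BIT


def write_buffer_active(control):
    """The write buffer is usable only when the MMU is enabled as well."""
    return bool(control & M_BIT) and bool(control & W_BIT)


control = enable_mmu_and_write_buffer(0)
print(hex(control), write_buffer_active(control))
print(write_buffer_active(disable_write_buffer(control)))
```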
Coprocessor 15 • TI925T operation and configuration are controlled with coprocessor instructions, configuration pins, and the MMU translation tables. • The coprocessor instructions manipulate on-chip registers, which control the configuration of the cache memories, write buffer, and MMU.
Memory Management Unit • The MPU MMU performs virtual-to-physical address translations and access permission checks for access to the system memory. • It provides the flexibility and security required for the OS to manage the physical memory space shared by the DSP subsystem and the MPU subsystem. • The MPU MMU provides no protection from DSP shared memory accesses. • The MMU supports memory accesses based on sections or pages. Sections represent memory blocks of 1M byte. Three different page sizes are supported: • Large pages consist of 64K-byte blocks of memory. • Small pages consist of 4K-byte blocks of memory. • Tiny pages consist of 1K-byte blocks of memory.
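Each mapping size determines how many low-order address bits pass through translation unchanged. The sketch below splits a virtual address into a block base and an offset for each of the four sizes; the dictionary keys and function name are chosen for this example.

```python
# Block sizes supported by the MMU, as listed on the slide.
BLOCK_SIZES = {
    "section": 1024 * 1024,
    "large page": 64 * 1024,
    "small page": 4 * 1024,
    "tiny page": 1024,
}


def split_virtual_address(va, kind):
    """Return (block base, offset within block) for the given mapping size."""
    size = BLOCK_SIZES[kind]
    offset = va & (size - 1)  # sizes are powers of two, so masking works
    return va - offset, offset


for kind in BLOCK_SIZES:
    base, offset = split_virtual_address(0x12345678, kind)
    print(f"{kind:10s} base={base:#010x} offset={offset:#x}")
```

Only the base part is translated; the offset is carried over to the physical address unchanged.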
The MMU hardware required to perform these functions consists of: • A 64-entry translation look-aside buffer for instructions (I_TLB) • A 64-entry translation look-aside buffer for data (D_TLB) • Access control logic • Translation table walking logic
Translation Look-Aside Buffer • The TLB contains entries for virtual-to-physical address translation and access permission checking. • Access control logic • -- If the TLB contains a translated entry for the VA, the access control logic determines whether the access is permitted. • -- If access is permitted, the MMU generates the appropriate PA corresponding to the VA. • -- If access is not permitted, the MMU sends an abort signal to the TI925T.
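The lookup-then-check sequence can be sketched with a dictionary standing in for the TLB. This is a deliberately simplified model: it assumes 4K-byte small pages only, reduces the permission check to a single hypothetical "writable" flag, and returns `None` on a TLB miss to indicate that the translation table walking logic would take over. `MmuAbort` and `translate` are names invented for the example.

```python
PAGE_SIZE = 4096  # small pages only, to keep the sketch short


class MmuAbort(Exception):
    """Stands in for the abort signal sent to the core."""


def translate(va, tlb, is_write):
    """Translate va using tlb, a dict of page number -> (frame number, writable)."""
    page, offset = divmod(va, PAGE_SIZE)
    entry = tlb.get(page)
    if entry is None:
        return None  # TLB miss: a translation table walk would be needed
    frame, writable = entry
    # Access control: simplified here to a read-only versus read-write check.
    if is_write and not writable:
        raise MmuAbort(f"write to read-only page at {va:#010x}")
    return frame * PAGE_SIZE + offset


tlb = {0x12345: (0x00080, False)}  # one read-only mapping
print(hex(translate(0x12345678, tlb, is_write=False)))
print(translate(0x00001000, tlb, is_write=False))
```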