ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION – ARM
ARM architecture processors are popular in mobile phone systems
ARM (Advanced RISC Machine) Features
• ARM has a 32-bit architecture but also supports 16-bit and 8-bit data types.
• ARM can be programmed for little-endian or big-endian data alignment in memory.
• ARM provides the advantage of a CISC in terms of functionality, along with the advantage of a RISC in terms of faster program execution and reduced code length.
• The ARM processor has a RISC core for processing.
• Combination of RISC and CISC features: ARM supports an instruction set based on complex addressing modes.
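The little-endian and big-endian option listed above can be illustrated with a minimal Python sketch. It uses the standard `struct` module to lay out one 32-bit word in both byte orders; the value 0x12345678 is an arbitrary choice for illustration and does not come from the slides.

```python
import struct

word = 0x12345678  # arbitrary 32-bit example value (assumption, not from the slides)

# Little endian: least significant byte is stored at the lowest address.
little = struct.pack("<I", word)
# Big endian: most significant byte is stored at the lowest address.
big = struct.pack(">I", word)

print(little.hex())  # 78563412
print(big.hex())     # 12345678

# Reading the bytes back with the matching byte order recovers the word.
assert struct.unpack("<I", little)[0] == word
assert struct.unpack(">I", big)[0] == word
```

The same four bytes read with the wrong byte order give a different word, which is why both ends of a memory interface must agree on the setting.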
In-built compilation unit
• Compiles the CISC instructions into RISC formats, which are then executed by the RISC core of the processor.
• Internally, many instructions are implemented as in a RISC (without a micro-programmed unit).
Jazelle technology
• Faster execution of Java code
ARM Thumb 16-bit instructions
• The Thumb set is designed for 16-bit word lengths and instructions, which are executed internally by the same 32-bit core.
• Instruction fetch is 2 bytes in Thumb mode instead of 4 bytes in ARM mode.
• Data alignment is in steps of 2 bytes in Thumb mode instead of 4 bytes in ARM mode.
• Memory savings of up to 35% over the equivalent 32-bit code, while retaining all the benefits of a 32-bit system (such as access to a full 32-bit address space).
• Enables 32-bit performance at 8/16-bit system cost in terms of memory needs.
Thumb and 32-bit ARM modes
• The processor can switch from one mode to the other.
• There are no overheads (in terms of time or memory) in moving between the Thumb state and the normal ARM state of the code. The two states are compatible on a normal basis.
• Gives the code designer complete control over performance and code-size optimization.
ARM7 versions
• ARM7TDMI® (integer core)
• ARM7TDMI-S™ (synthesisable version of ARM7TDMI)
• ARM7EJ-S™ (synthesisable core with DSP and Jazelle technology)
• ARM720T™ (cached processor macrocell: 8K cached core with a Memory Management Unit (MMU), supporting operating systems including Windows CE, Palm OS, Symbian OS and Linux)
• 130 MIPS on the Dhrystone 2.1 benchmark in a typical 0.13 μm process
ARM9 versions
• ARM920T (dual 16K caches with an MMU, supporting multiple OSs)
• ARM922T (dual 8K caches for applications supporting multiple OSs)
• ARM940T™ (dual 4K caches for embedded control applications running an RTOS)
• 32-bit RISC processor core with a 5-stage integer pipeline
• 8-entry write buffer to avoid blocking the processor on external memory writes
• Achieves 1.1 MIPS/MHz, 300 MIPS (Dhrystone 2.1) in a typical 0.13 μm process
ARM11 versions
• Families with the ARMv6 instruction set architecture, which includes the Thumb® extensions for code density, Jazelle™ technology for Java™ acceleration, ARM DSP extensions, and SIMD media processing extensions.
• MMU supporting operating systems, including Palm OS
• 32-bit RISC processor core with an 8-stage integer pipeline, static and dynamic branch prediction, and separate load-store and arithmetic pipelines to maximize instruction throughput
• Targets a performance range of 400 to 1200 Dhrystone MIPS
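Memory Architecture
• ARM7 has a Princeton (von Neumann) memory architecture, with a single memory interface shared by instructions and data.
• ARM9 processors have a Harvard architecture, with separate instruction and data interfaces.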
Faster implementation and reduced code lengths
• Faster execution is due to the instant availability of the register word to the execution unit.
• Reduced code lengths: most instructions use registers as operands.
• Only a few bits in the instruction are needed to specify a register as an operand, whereas 8, 16 or 32 bits are needed to specify a memory address as an operand, in addition to the displacement bits in the instruction.
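To make the point about register operands concrete, here is a small Python sketch that splits a 32-bit ARM data-processing instruction into its fields. The field positions follow the classic ARM-state encoding as commonly documented (an assumption beyond the slides), and the function name `decode_data_processing` is hypothetical. Each register operand takes only 4 bits, enough to select one of 16 registers.

```python
def decode_data_processing(instr):
    """Split a 32-bit ARM-state data-processing word into its fields."""
    return {
        "cond": (instr >> 28) & 0xF,       # condition code
        "immediate": (instr >> 25) & 0x1,  # 1 if operand2 is an immediate
        "opcode": (instr >> 21) & 0xF,     # e.g. 0b0100 = ADD, 0b0010 = SUB
        "S": (instr >> 20) & 0x1,          # 1 if the flags are updated
        "Rn": (instr >> 16) & 0xF,         # first source register (4 bits)
        "Rd": (instr >> 12) & 0xF,         # destination register (4 bits)
        "operand2": instr & 0xFFF,         # second operand (register or immediate)
    }

# 0xE0831005 encodes ADD r1, r3, r5 with the "always" condition.
fields = decode_data_processing(0xE0831005)
print(fields["opcode"], fields["Rn"], fields["Rd"], fields["operand2"])  # 4 3 1 5
```

Three register operands therefore cost 12 bits in total, which is what leaves room for the condition field and opcode inside a single 32-bit word.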
ARM registers
• R0 to R15.
• R15 also functions as the program counter.
• R14 functions as the link register.
• R13 may be used as the stack pointer.
• CPSR (current program status register).
• SPSR (saved program status register).
Processor Modes
• The ARM has seven basic operating modes:
  • User: unprivileged mode under which most tasks run
  • FIQ: entered when a high-priority (fast) interrupt is raised
  • IRQ: entered when a low-priority (normal) interrupt is raised
  • Supervisor: entered on reset and when a Software Interrupt instruction is executed
  • Abort: used to handle memory access violations
  • Undef: used to handle undefined instructions
  • System: privileged mode using the same registers as User mode
[Figure: The ARM Register Set. For each of the User, FIQ, IRQ, SVC, Undef and Abort modes, the diagram shows the currently visible registers (r0 to r12, r13 (sp), r14 (lr), r15 (pc), cpsr and spsr) and the registers that are banked out.]
The Registers
• ARM has 37 registers, all of which are 32 bits long:
  • 1 dedicated program counter
  • 1 dedicated current program status register
  • 5 dedicated saved program status registers
  • 30 general-purpose registers
• The current processor mode governs which of several banks is accessible. Each mode can access:
  • a particular set of r0-r12 registers
  • a particular r13 (the stack pointer, sp) and r14 (the link register, lr)
  • the program counter, r15 (pc)
  • the current program status register, cpsr
• Privileged modes (except System) can also access:
  • a particular spsr (saved program status register)
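The register totals above can be checked with a short Python sketch. It assumes the usual banking scheme of the classic ARM architecture, which the slide does not spell out: User and System share r0 to r14, FIQ has its own r8 to r14, and IRQ, Supervisor, Abort and Undef each have their own r13 and r14. The list names are illustrative only.

```python
# r0-r14 shared by User and System modes (15 registers).
user_system = [f"r{i}" for i in range(15)]

# FIQ mode banks its own copies of r8-r14 (7 registers).
fiq_banked = [f"r{i}_fiq" for i in range(8, 15)]

# IRQ, Supervisor, Abort and Undef each bank r13 and r14 (4 x 2 = 8 registers).
other_banked = [f"r{i}_{mode}"
                for mode in ("irq", "svc", "abt", "und")
                for i in (13, 14)]

general_purpose = user_system + fiq_banked + other_banked

# One SPSR per privileged exception mode; System mode has none.
spsrs = [f"spsr_{mode}" for mode in ("fiq", "irq", "svc", "abt", "und")]

total = len(general_purpose) + len(spsrs) + 1 + 1  # plus pc and cpsr
print(len(general_purpose), len(spsrs), total)  # 30 5 37
```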
Program Status Registers
[Figure: bit layout of the program status registers. Bits 31 to 28: N, Z, C, V; bit 27: Q; bit 24: J; bit 7: I; bit 6: F; bit 5: T; bits 4 to 0: mode; the remaining bits are undefined. Fields: f = bits 31 to 24, s = bits 23 to 16, x = bits 15 to 8, c = bits 7 to 0.]
• Condition code flags
  • N = Negative result from ALU
  • Z = Zero result from ALU
  • C = ALU operation Carried out
  • V = ALU operation oVerflowed
• Sticky overflow flag (Q flag)
  • Architecture 5TE/J only
  • Indicates if saturation has occurred
• J bit
  • Architecture 5TEJ only
  • J = 1: Processor in Jazelle state
• Interrupt disable bits
  • I = 1: Disables the IRQ.
  • F = 1: Disables the FIQ.
• T bit
  • Architecture xT only
  • T = 0: Processor in ARM state
  • T = 1: Processor in Thumb state
• Mode bits
  • Specify the processor mode
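As a rough illustration of the program status register layout, the following Python sketch decodes a 32-bit CPSR value into its flags and mode. The 5-bit mode encodings in `MODES` are the standard ARM values and are an addition here, since the slide only says that the mode bits specify the processor mode; `decode_cpsr` is a hypothetical helper name.

```python
# Standard ARM mode-bit encodings (assumption: not listed on the slide).
MODES = {
    0b10000: "User", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "Supervisor",
    0b10111: "Abort", 0b11011: "Undef", 0b11111: "System",
}

def decode_cpsr(cpsr):
    """Return the flag bits and mode name held in a 32-bit status word."""
    def bit(n):
        return (cpsr >> n) & 1
    return {
        "N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28),
        "Q": bit(27), "J": bit(24),
        "I": bit(7), "F": bit(6), "T": bit(5),
        "mode": MODES.get(cpsr & 0x1F, "reserved"),
    }

# Example value: Z and C set, both interrupts disabled, ARM state, Supervisor mode.
state = decode_cpsr(0x600000D3)
print(state["mode"], state["Z"], state["C"], state["I"], state["F"], state["T"])
# Supervisor 1 1 1 1 0
```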
Program Counter (r15)
• When the processor is executing in ARM state:
  • All instructions are 32 bits wide
  • All instructions must be word aligned
  • Therefore the pc value is stored in bits [31:2] with bits [1:0] undefined (as an instruction cannot be halfword or byte aligned).
• When the processor is executing in Thumb state:
  • All instructions are 16 bits wide
  • All instructions must be halfword aligned
  • Therefore the pc value is stored in bits [31:1] with bit [0] undefined (as an instruction cannot be byte aligned).
• When the processor is executing in Jazelle state:
  • All instructions are 8 bits wide
  • The processor performs a word access to read 4 instructions at once
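The alignment rules above amount to ignoring the low address bits of the program counter. A minimal Python sketch, assuming the undefined bits are simply treated as zero (a modelling choice, not something the slide states), with the hypothetical helper `aligned_pc`:

```python
def aligned_pc(address, state):
    """Return the instruction address the pc refers to in the given state."""
    if state == "ARM":
        return address & 0xFFFFFFFC   # keep bits [31:2], word aligned
    if state == "Thumb":
        return address & 0xFFFFFFFE   # keep bits [31:1], halfword aligned
    if state == "Jazelle":
        return address & 0xFFFFFFFF   # byte-wide instructions, no alignment
    raise ValueError(f"unknown state: {state}")

print(hex(aligned_pc(0x8003, "ARM")))    # 0x8000
print(hex(aligned_pc(0x8003, "Thumb")))  # 0x8002
```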
ARM Codes
• ARM code is forward compatible with higher versions.
• ARM7 code is forward compatible with ARM9, ARM9E and ARM10 processors as well as the Intel XScale micro-architecture.
• ARM9E and ARM10 families use a Vector Floating Point (VFP) ARM coprocessor, which adds full floating-point operations.
• VFP also provides fast development in SoC design when using tools like MATLAB®.
• Applications include image processing (scaling), 2D and 3D transformations, font generation and digital filters.
ARM Intelligent Energy Manager (IEM) technology
• Uses advanced algorithms to optimally balance processor workload and energy consumption.
• Maximizes system responsiveness.
• IEM works with the operating system and mobile OS.
• An application running on a mobile phone dynamically adjusts the required CPU performance level.
ARM processors: AHB (AMBA Advanced High-performance Bus) interface
• AMBA is an established open specification for on-chip interconnects.
• AMBA serves as a framework for SoC designs and development of the IP library.
• AHB is supported in all new ARM cores.
• Provides a high-performance and fully synchronous back plane. (Back plane means an additional set of controllers, which can access another common bus that is distinct from the system bus in a system with multilevel buses.)
• Multi-layer AHB, in the ARM926EJ-S and all members of the ARM10 family, represents a significant advancement: it reduces access latencies and increases the bandwidth available to multi-master systems.
ARM Instruction Set Features
• Two instruction sets: 16-bit Thumb and 32-bit ARM mode instructions
• Operations on 8-bit, 16-bit or 32-bit data types
• Data alignment in memory: two-byte words in the Thumb set and four-byte words in 32-bit ARM mode
ARM7 instruction set: Data Transfer Instructions
• Load a byte into a register (LDRB).
• Store a byte from a register (STRB).
• Store a half word from a register (STRH). [A word in ARM is 32 bits.]
• Load a half word into a register, as such or signed (LDRH or LDRSH).
• Instructions for transfer between the registers and memory. The memory address is given by a register used in the index, index-relative or post auto-index addressing mode.
• Load a word into a register (LDR).
• Store a word from a register (STR).
• Set a memory address into a register (ADR). The address is of 12 bits. [An alternative for setting a 16-bit address in a register is to use any register or r15 in an arithmetic operation.]
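The difference between the byte, half-word and word transfers can be sketched in Python over a `bytearray` standing in for memory. The sketch assumes little-endian memory and models only the data movement, not the addressing modes; the lower-case function names mirror the mnemonics but are otherwise hypothetical.

```python
def ldrb(mem, addr):
    return mem[addr]                                   # byte, zero-extended

def ldrh(mem, addr):
    return int.from_bytes(mem[addr:addr + 2], "little")  # half word, zero-extended

def ldrsh(mem, addr):
    value = int.from_bytes(mem[addr:addr + 2], "little", signed=True)
    return value & 0xFFFFFFFF                          # half word, sign-extended to 32 bits

def ldr(mem, addr):
    return int.from_bytes(mem[addr:addr + 4], "little")  # full 32-bit word

def strb(mem, addr, value):
    mem[addr] = value & 0xFF                           # store the low byte only

def strh(mem, addr, value):
    mem[addr:addr + 2] = (value & 0xFFFF).to_bytes(2, "little")

def str_word(mem, addr, value):
    mem[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

memory = bytearray(8)
strh(memory, 0, 0x8001)
print(hex(ldrh(memory, 0)))   # 0x8001
print(hex(ldrsh(memory, 0)))  # 0xffff8001
```

The signed load fills the upper 16 bits with copies of the sign bit, so a negative 16-bit value stays negative when used as a 32-bit register word.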
Word transfer between registers
• Move (MOV).
• Move NOT (MVN), which moves the bitwise complement of the operand.
Conditional execution of load, move or store instructions
• Conditions for signed numbers: LT (less than), GT (greater than), LE (less or equal), EQ (equal), NE (not equal), VS (overflow), VC (no overflow), GE (greater or equal).
• Conditions for unsigned numbers: HI (higher), LS (lower or same), PL (plus, not negative), MI (minus), CC (carry bit reset), and CS (carry bit set).
• Example: MOVLT r3, #10.
• This moves the immediate operand 10 to r3, provided a previous comparison instruction showed the first source as less than the second.
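The conditional move example can be modelled with a short Python sketch. The flag tests in `CONDITIONS` follow the standard ARM definitions of each condition code (an addition, since the slide lists only the names), and `flags_after_cmp` and `mov_cond` are hypothetical helpers. Operands are assumed to be 32-bit unsigned values.

```python
MASK = 0xFFFFFFFF

def flags_after_cmp(a, b):
    """Return (N, Z, C, V) as set by comparing two 32-bit words (a - b)."""
    result = (a - b) & MASK
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                            # carry set means no borrow
    v = (((a ^ b) & (a ^ result)) >> 31) & 1   # signed overflow
    return n, z, c, v

CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,
    "NE": lambda n, z, c, v: z == 0,
    "CS": lambda n, z, c, v: c == 1,
    "CC": lambda n, z, c, v: c == 0,
    "MI": lambda n, z, c, v: n == 1,
    "PL": lambda n, z, c, v: n == 0,
    "VS": lambda n, z, c, v: v == 1,
    "VC": lambda n, z, c, v: v == 0,
    "HI": lambda n, z, c, v: c == 1 and z == 0,
    "LS": lambda n, z, c, v: c == 0 or z == 1,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: z == 0 and n == v,
    "LE": lambda n, z, c, v: z == 1 or n != v,
}

def mov_cond(cond, flags, regs, rd, value):
    """Write value to regs[rd] only if the condition holds for the flags."""
    if CONDITIONS[cond](*flags):
        regs[rd] = value & MASK

regs = {"r1": 3, "r2": 7, "r3": 0}
flags = flags_after_cmp(regs["r1"], regs["r2"])  # models CMP r1, r2
mov_cond("LT", flags, regs, "r3", 10)            # models MOVLT r3, #10
print(regs["r3"])  # 10
```

Comparing 0xFFFFFFFF with 1 shows why separate signed and unsigned conditions exist: HI holds (a large unsigned value) and LT also holds (the same bits read as signed are -1).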
Bit Transfer or Manipulation Instructions
• Logical Shift Left of the register bits (LSL).
• Arithmetic Shift Left of the register bits (ASL).
• Logical Shift Right of the register bits (LSR).
• Arithmetic Shift Right of the register bits (ASR).
• Rotate Right of the register bits (ROR).
• Rotate Right of the register bits, extended through the carry bit (RRX).
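The shift and rotate operations can be sketched on 32-bit values in Python. This is a simplified model assuming shift amounts from 0 to 31 and ignoring the carry-out of the ordinary shifts; ASL gives the same result as LSL, so it is not shown separately. The function names mirror the mnemonics.

```python
MASK = 0xFFFFFFFF

def lsl(x, n):
    return (x << n) & MASK            # zeros enter from the right

def lsr(x, n):
    return (x & MASK) >> n            # zeros enter from the left

def asr(x, n):
    # Interpret x as a signed 32-bit value so the sign bit is replicated.
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> n) & MASK

def ror(x, n):
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK

def rrx(x, carry):
    """Rotate right by one through carry; returns (result, new_carry)."""
    return ((carry << 31) | (x >> 1)) & MASK, x & 1

print(hex(lsr(0x80000000, 4)))  # 0x8000000
print(hex(asr(0x80000000, 4)))  # 0xf8000000
print(hex(ror(0x00000001, 1)))  # 0x80000000
```

The logical and arithmetic right shifts differ only in what enters at the top: zeros for LSR, copies of the sign bit for ASR.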
Arithmetical Instructions
• Three operands from the registers.
• One source may, however, use immediate operand addressing in addition and subtraction.
• Add two words without carry; the result is in the third operand (ADD).
• Add two words with carry; the result is in the third operand (ADC).
• Subtract two words without carry; the result is in the third operand (SUB). [Carry bit used as borrow.]
• Subtract two words with carry; the result is in the third operand (SBC).
Arithmetical Instructions
• Reverse subtract (the first source from the second) two words without carry; the result is in the third operand (RSB). [Carry bit used as borrow.]
• Reverse subtract two words with carry; the result is in the third operand (RSC).
• Multiply two different registers; the result is in the destination register (MUL).
• Multiply two source registers, add the product to a third source register, and accumulate the new result in a destination register (MLA).
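The arithmetic instructions listed above can be sketched as 32-bit operations in Python. The sketch assumes the standard ARM convention that the carry flag acts as an inverted borrow in SBC and RSC (C = 1 means no borrow), which the slides only hint at; the helper `adds`, which also returns the carry-out, is a hypothetical addition used for the 64-bit example.

```python
MASK = 0xFFFFFFFF

def add(rn, op2):      return (rn + op2) & MASK
def adc(rn, op2, c):   return (rn + op2 + c) & MASK
def sub(rn, op2):      return (rn - op2) & MASK
def sbc(rn, op2, c):   return (rn - op2 - (1 - c)) & MASK   # C = 1 means no borrow
def rsb(rn, op2):      return (op2 - rn) & MASK             # operands reversed
def rsc(rn, op2, c):   return (op2 - rn - (1 - c)) & MASK
def mul(rm, rs):       return (rm * rs) & MASK              # low 32 bits of the product
def mla(rm, rs, rn):   return (rm * rs + rn) & MASK         # multiply, then accumulate

def adds(rn, op2):
    """Add and also return the carry-out, as a flag-setting add would."""
    total = rn + op2
    return total & MASK, total >> 32

# 64-bit addition built from two 32-bit steps:
# 0x00000001_FFFFFFFF + 0x00000000_00000001
low, carry = adds(0xFFFFFFFF, 0x00000001)
high = adc(0x00000001, 0x00000000, carry)
print(hex(high), hex(low))  # 0x2 0x0
```

Chaining the carry from the low word into ADC for the high word is the usual reason the with-carry variants exist.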
Logic Instructions
• Bitwise OR two words; the result is in the third operand (ORR).
• Bitwise AND two words; the result is in the third operand (AND).
• Bitwise Exclusive OR two words; the result is in the third operand (EOR).
• Clear bits (BIC). [There is one source for the bits and a second source for the mask; the result is in the third operand.]
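A brief Python sketch of the four logic operations on 32-bit words follows. BIC is the least obvious one: it clears every bit of the first source that is set in the mask. The function names mirror the mnemonics (with a trailing underscore where Python reserves the word).

```python
MASK = 0xFFFFFFFF

def orr(rn, op2):  return (rn | op2) & MASK
def and_(rn, op2): return (rn & op2) & MASK
def eor(rn, op2):  return (rn ^ op2) & MASK
def bic(rn, mask): return rn & ~mask & MASK   # clear the bits selected by mask

value = 0x000000FF
print(hex(bic(value, 0x0000000F)))  # 0xf0
print(hex(eor(value, 0x0000000F)))  # 0xf0
print(hex(orr(value, 0x00000F00)))  # 0xfff
```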
Conditional execution of arithmetical or logical instructions
• Example: SUBGE r1, r3, r5. The operand in r5 is subtracted from r3 and the result is placed in r1 if the GE condition resulted earlier (N and V status bits equal on comparison of two signed numbers).
• Conditions can be the results of a comparison or test.
Compare and Test Instructions
• The result goes to the CPSR, which stores the four condition bits N, Z, C and V.
• Bitwise test (AND) of two words (TST).
• Bitwise test for equivalence (Exclusive OR) of two words (TEQ).
• Compare two words; the result is at the CPSR condition bits (CMP).
• Compare negative: compare one word with the negative of the other; the result is at the CPSR condition bits (CMN).
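A minimal Python sketch of how the compare and test instructions derive the condition bits is given below. It assumes the standard ARM definitions: CMP sets flags from a subtraction, CMN from an addition, and TST and TEQ from a bitwise AND and Exclusive OR. For TST and TEQ only N and Z are modelled, since C depends on the shifter and V is left unchanged. The function names are hypothetical.

```python
MASK = 0xFFFFFFFF

def cmp_flags(a, b):
    """Flags (N, Z, C, V) from a - b; the difference itself is discarded."""
    result = (a - b) & MASK
    v = (((a ^ b) & (a ^ result)) >> 31) & 1
    return result >> 31, int(result == 0), int(a >= b), v

def cmn_flags(a, b):
    """Flags (N, Z, C, V) from a + b, i.e. a compared with the negative of b."""
    total = a + b
    result = total & MASK
    v = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    return result >> 31, int(result == 0), total >> 32, v

def tst_flags(a, b):
    """Flags (N, Z) from a AND b."""
    result = a & b & MASK
    return result >> 31, int(result == 0)

def teq_flags(a, b):
    """Flags (N, Z) from a XOR b; Z is set when the two words are identical."""
    result = (a ^ b) & MASK
    return result >> 31, int(result == 0)

print(cmp_flags(5, 5))              # (0, 1, 1, 0)
print(cmn_flags(1, 0xFFFFFFFF))     # (0, 1, 1, 0)
print(tst_flags(0b1010, 0b0100))    # (0, 1)
```

In the CMN example, 0xFFFFFFFF is -1 in two's complement, so the Z flag reports that 1 equals the negative of -1.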
Program-Flow Control Instructions
• Branch (B) or conditional branch operations.
• Branch to an address relative to the PC word in r15 (B). 'B #1A8' means add 1A8 to the PC and change the program flow.
• 'BGE #100' means that if a GE condition resulted from a compare or test, add 100 to the PC.
• Similar instructions exist for the other conditions of the processor status flags.
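The PC-relative branch can be sketched at the encoding level in Python. The details below go beyond the slide and follow the standard ARM-state encoding: the instruction holds a signed 24-bit offset counted in words, and the PC reads as the branch's own address plus 8 because of the pipeline. The function name `branch_target` is hypothetical.

```python
def branch_target(instr_addr, instr):
    """Return the destination of an ARM-state B or BL instruction word."""
    if (instr >> 25) & 0b111 != 0b101:
        raise ValueError("not a branch instruction")
    offset = instr & 0x00FFFFFF          # 24-bit offset field
    if offset & 0x00800000:              # sign-extend negative offsets
        offset -= 1 << 24
    # Offset is in words (shift left by 2); PC is 8 bytes ahead of the branch.
    return (instr_addr + 8 + (offset << 2)) & 0xFFFFFFFF

# 0xEAFFFFFE is an unconditional branch back to itself (offset -2 words).
print(hex(branch_target(0x8000, 0xEAFFFFFE)))  # 0x8000
# 0xEA000000 has offset 0, so it lands two instructions ahead.
print(hex(branch_target(0x8000, 0xEA000000)))  # 0x8008
```

The top four bits of the word carry the condition, so a BGE uses the same offset field with condition code 0xA instead of 0xE.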
Software Interrupt instruction
• SWI has an 8-bit opcode, and the remaining bits are not used by the processor.
• Gives a single vector address of the ISR for SWI.
• The remaining bits in the SWI are read back by the programmer to compute the ISR and ISR parameter pointers.
• This unique feature permits handling the large number of SWIs required in the OS and application functions, threads or tasks.
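The way a handler recovers the unused bits of an SWI can be sketched in Python. The sketch assumes the standard ARM-state encoding, in which the top 8 bits hold the condition and the SWI opcode and the low 24 bits are ignored by the processor. The service numbers and the `HANDLERS` table are made up for illustration.

```python
def decode_swi(instr):
    """Split an ARM-state SWI word into its condition and 24-bit number."""
    if ((instr >> 24) & 0xF) != 0xF:
        raise ValueError("not an SWI instruction")
    return {"cond": (instr >> 28) & 0xF, "number": instr & 0x00FFFFFF}

# Hypothetical service table: the single SWI vector dispatches on the number.
HANDLERS = {
    0x11: lambda: "write",
    0x12: lambda: "read",
}

def swi_dispatch(instr):
    """Model the common handler: read back the instruction and dispatch."""
    number = decode_swi(instr)["number"]
    handler = HANDLERS.get(number)
    return handler() if handler else "unknown service"

print(decode_swi(0xEF000011))    # {'cond': 14, 'number': 17}
print(swi_dispatch(0xEF000011))  # write
```

Because every SWI enters through one vector, the 24-bit field is what lets a single handler serve a large number of distinct requests.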