Microprocessor system architectures – ARMv8 Jakub Yaghob
ARM architecture • RISC • Large uniform register file • Load/store architecture • Simple addressing modes • Execution states • AArch64 and AArch32 • Architecture profiles • A – application profile • R – real-time profile • M – microcontroller profile
Execution states – AArch64 • AArch64 • 31 64-bit general-purpose registers • X30 – procedure link register • 64-bit PC, SPs, ELRs (exception link registers) • 32 128-bit SIMD registers • Single instruction set A64 • Exception levels EL0-EL3 • 64-bit virtual addressing • System register names carry a suffix that indicates the lowest EL with access • PSTATE (Process state)
Execution states – AArch32 • AArch32 • 13 32-bit general-purpose registers • 32-bit PC, SP, LR (link register) • Some registers banked for each execution mode • Single ELR (return from Hyp) • 32 64-bit SIMD registers • A32 instruction set – fixed-length encoding, compatible with ARMv7 • T32 instruction set – variable-length, compatible with ARMv7 Thumb • 32-bit virtual addressing • CPSR (Current Program Status Register)
Supported data types, cryptographic extension • Integer • B, H, W, D, Q • Floating point • HP, SP, DP • IEEE 754 • Cryptographic extension • Operates on the vector register file • AES, SHA1, SHA2-256
Memory model • The ARM memory model supports • Generating an exception on an unaligned memory access • Restricting access by applications to specified areas of memory • Translating virtual addresses provided by executing instructions into physical addresses • AArch64 – 64-bit addressing, TCR (Translation Control Register) determines VA range, EL0+EL1 have 2 independent VA ranges, each with its own translation controls • AArch32 – 32-bit addressing, TCR determines VA range, OS can split VA range into 2 subranges for EL0+EL1, each with its own translation controls • Altering the interpretation of multi-byte data between big-endian and little-endian • Controlling the order of accesses to memory • Controlling caches and address translation structures • Synchronizing access to shared memory by multiple PEs
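To make the two AArch64 VA ranges for EL0+EL1 more concrete, the following is a minimal sketch of the range arithmetic. It assumes the region sizes come from the T0SZ and T1SZ fields of TCR_EL1 (field names taken from the architecture, not from the slides); the function name `va_ranges` is made up for this illustration and models only the arithmetic, not the register itself.

```python
def va_ranges(t0sz: int, t1sz: int):
    """Return ((low_start, low_end), (high_start, high_end)).

    Each region covers 2**(64 - TxSZ) bytes. The lower region starts at
    address 0; the upper region ends at the top of the 64-bit space.
    """
    low_size = 1 << (64 - t0sz)
    high_size = 1 << (64 - t1sz)
    low = (0, low_size - 1)
    high = ((1 << 64) - high_size, (1 << 64) - 1)
    return low, high


low, high = va_ranges(16, 16)  # two 48-bit regions
print(hex(low[1]))   # 0xffffffffffff
print(hex(high[0]))  # 0xffff000000000000
```

Addresses that fall between the two regions belong to neither range and would produce a translation fault in a real system.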
Application architecture – AArch64 • 31 general-purpose registers R0-R30 • 64-bit GP registers X0-X30 • X30 procedure link register • 32-bit GP registers W0-W30 • Register encoding 1Fh (31) selects ZR (zero register) or SP, depending on the instruction • 32 vector registers V0-V31 • FPCR, FPSR – floating-point control and status registers • SP 64-bit • WSP 32-bit • Current SP • PC 64-bit
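The relation between the X and W views of a register can be sketched as a toy model. The class `RegFile` and its method names are hypothetical; the sketch assumes only two architectural facts: a write to Wn zero-extends into the full 64-bit register, and encoding 31 behaves as the zero register (the SP interpretation of that encoding is left out).

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


class RegFile:
    """Toy model of the 31 AArch64 general-purpose registers R0-R30."""

    def __init__(self):
        self.r = [0] * 31

    def read_x(self, n: int) -> int:
        # Encoding 31 (1Fh) reads as zero in this model (ZR view only).
        return 0 if n == 31 else self.r[n]

    def read_w(self, n: int) -> int:
        # Wn is the low 32 bits of the same register.
        return self.read_x(n) & MASK32

    def write_x(self, n: int, value: int) -> None:
        if n != 31:  # writes to ZR are discarded
            self.r[n] = value & MASK64

    def write_w(self, n: int, value: int) -> None:
        # A write to Wn clears the upper 32 bits of Xn.
        self.write_x(n, value & MASK32)


rf = RegFile()
rf.write_x(0, 0xFFFF_FFFF_FFFF_FFFF)
rf.write_w(0, 0x1234)
print(hex(rf.read_x(0)))  # 0x1234
```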
Application architecture – AArch64 – PSTATE • Process state for EL0 • Data processing flags • N – negative • Z – zero • C – carry • V – overflow • Exception masking bits • D – debug mask • A – system error mask • I – IRQ mask • F – FIQ mask
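The four data processing flags can be illustrated with a flag-setting 64-bit addition (the behaviour of an instruction such as ADDS). This is a sketch of the arithmetic only; the function name `adds64` is invented for this example.

```python
def adds64(a: int, b: int):
    """Return (result, N, Z, C, V) for a 64-bit flag-setting addition."""
    mask = (1 << 64) - 1
    a &= mask
    b &= mask
    full = a + b
    result = full & mask
    n = result >> 63              # N: result is negative (top bit set)
    z = int(result == 0)          # Z: result is zero
    c = int(full > mask)          # C: unsigned carry out of bit 63
    sign_a, sign_b = a >> 63, b >> 63
    # V: both operands have the same sign, but the result's sign differs.
    v = int(sign_a == sign_b and n != sign_a)
    return result, n, z, c, v


print(adds64(0xFFFF_FFFF_FFFF_FFFF, 1))  # (0, 0, 1, 1, 0)
```

Adding 1 to the largest positive signed value, 0x7FFF_FFFF_FFFF_FFFF, sets N and V instead, since the result wraps into the negative range without an unsigned carry.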
System registers • Register naming • <register_name>_ELx, x∈{0,1,2,3} • General system control registers • Debug registers • Generic timer registers • Performance monitor registers • Optional • Trace registers • Optional • Generic Interrupt Controller (GIC) CPU interface registers • Optional
Software control and EL0 • Exception handling • Interrupts • Memory system aborts • Undefined instructions • System calls • Secure monitor or Hypervisor traps • System instructions for control flow • WFI – Wait For Interrupt • WFE – Wait For Event • YIELD – hint • Can enter low-power state • Cache management • EL0 access must be enabled by EL1 • Debug events • BRK – breakpoint (BKPT in A32/T32) • DBG – hint to the debug system (A32/T32 only) • HLT – entry to Debug state
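Caches and memory hierarchy • Point of Unification • Instruction cache, data cache, and translation table walks of a PE are guaranteed to see the same copy of a memory location • Point of Coherency • All agents that can access memory are guaranteed to see the same copy of a memory location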
Memory types • Normal • Bulk memory operations, R/W, R/O • Device • Speculative reads forbidden • Additional attributes • Gathering (G / nG) • nG prevents merging of multiple R/W accesses into one • Reordering (R / nR) • nR preserves access order and synchronization requirements • Early write acknowledgement (E / nE) • With E, a write can be acknowledged before it reaches the end point • Shareability • Non-shareable, inner shareable, outer shareable • Cacheability • Non-cacheable, write-through cacheable, write-back cacheable
Alignment • Instruction alignment • A64 instructions must be word-aligned • Data alignment • Unaligned access to any Device memory causes an Alignment fault • Normal memory • SCTLR_ELx.A – configures unaligned access behavior • Generate an Alignment fault • Perform an unaligned access • Unaligned access • Not guaranteed to be atomic • Takes a number of additional cycles • Can abort on any of the memory accesses it makes, so it may abort more than once
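The data alignment rules above can be written down as a small decision function. This is a simplified sketch for ordinary loads and stores only; the function name and the returned strings are invented for illustration, and cases such as exclusive accesses, which have stricter rules, are ignored.

```python
def access_outcome(addr: int, size: int, device: bool, sctlr_a: bool) -> str:
    """Classify a data access of `size` bytes at `addr`.

    device  -- True if the target is Device memory, False for Normal memory
    sctlr_a -- value of the SCTLR_ELx.A alignment check bit
    """
    if addr % size == 0:
        return "aligned access"
    if device:
        # Unaligned access to Device memory always faults.
        return "alignment fault"
    if sctlr_a:
        # Normal memory with alignment checking enabled.
        return "alignment fault"
    # Normal memory, checking disabled: performed, but possibly non-atomic.
    return "unaligned access"


print(access_outcome(0x1002, 4, device=False, sctlr_a=False))  # unaligned access
print(access_outcome(0x1002, 4, device=True, sctlr_a=False))   # alignment fault
```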
Endian support • Instruction endianness • A64 instructions are always little-endian • Data endianness • SCTLR_EL1.E0E – set at EL1 or higher, configures data endianness for EL0 • Instructions for reversing byte order in registers • REV16, REV32, REV64
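What the byte-reversal instructions do to a 64-bit register value can be sketched as follows. The helper `_rev_chunks` is hypothetical; the three public functions model the 64-bit register forms, where REV16 reverses the bytes within each halfword, REV32 within each word, and REV64 across the whole doubleword.

```python
MASK64 = (1 << 64) - 1


def _rev_chunks(x: int, chunk_bytes: int) -> int:
    """Reverse the byte order inside every chunk of `chunk_bytes` bytes."""
    data = (x & MASK64).to_bytes(8, "little")
    out = b"".join(
        data[i:i + chunk_bytes][::-1] for i in range(0, 8, chunk_bytes)
    )
    return int.from_bytes(out, "little")


def rev16(x: int) -> int:
    return _rev_chunks(x, 2)


def rev32(x: int) -> int:
    return _rev_chunks(x, 4)


def rev64(x: int) -> int:
    return _rev_chunks(x, 8)


value = 0x0102030405060708
print(hex(rev16(value)))  # 0x201040306050807
print(hex(rev32(value)))  # 0x403020108070605
print(hex(rev64(value)))  # 0x807060504030201
```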
Synchronization and semaphores • Load-exclusive instructions • LDXP, LDXR, LDXRH, LDXRB • Store-exclusive instructions • STXP, STXR, STXRH, STXRB • Clear-exclusive • CLREX • Designed to scale on multiprocessor systems
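The load-exclusive / store-exclusive pattern is easier to follow with a toy model of a single PE's exclusive monitor. Everything here is a simplification: the class `ExclusiveMonitor` and the helper `atomic_increment` are made-up names, memory is a dictionary, and no other PE is modelled. The one architectural detail relied on is that a store-exclusive reports 0 on success and 1 on failure.

```python
class ExclusiveMonitor:
    """Toy single-PE model of LDXR / STXR / CLREX."""

    def __init__(self):
        self.memory = {}
        self.marked = None  # address tagged by the last load-exclusive

    def ldxr(self, addr: int) -> int:
        self.marked = addr
        return self.memory.get(addr, 0)

    def stxr(self, addr: int, value: int) -> int:
        """Store only if the monitor still tags addr; 0 = success, 1 = failure."""
        ok = self.marked == addr
        self.marked = None  # the monitor is cleared either way
        if ok:
            self.memory[addr] = value
            return 0
        return 1

    def clrex(self) -> None:
        self.marked = None


def atomic_increment(mon: ExclusiveMonitor, addr: int) -> int:
    # Retry loop: repeat until the store-exclusive succeeds.
    while True:
        old = mon.ldxr(addr)
        if mon.stxr(addr, old + 1) == 0:
            return old + 1


mon = ExclusiveMonitor()
print(atomic_increment(mon, 0x1000))  # 1
```

In this model a CLREX between the load and the store makes the store fail, which is why real code wraps the pair in a retry loop.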
Exception levels • Exception levels EL0-EL3 • EL0 – unprivileged execution, applications • EL1 – OS kernel • EL2 – supports virtualization of non-secure operation, hypervisor • EL3 – supports switching between two security states (secure state, non-secure state), secure monitor • All implementations must include EL0 and EL1 • Stack pointer register selection • SP_ELx
Exception mechanism • Saved Program Status Register • Saves PE state on taking exceptions • SPSR_ELx for exception taken to ELx • When returning from an exception, PE state is restored to the state stored in SPSR_ELx • Exception link registers • ELR_ELx holds preferred exception return address
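The interplay of SPSR_ELx and ELR_ELx on exception entry and return can be sketched with a very small state machine. The class `PE` and its fields are hypothetical and heavily simplified: PSTATE is reduced to the current EL plus an opaque flags value, and side effects of real exception entry (such as masking interrupts or selecting a stack pointer) are left out.

```python
class PE:
    """Toy processing element with per-EL SPSR and ELR."""

    def __init__(self):
        self.el = 0
        self.pc = 0
        self.flags = 0
        self.spsr = {1: None, 2: None, 3: None}  # SPSR_EL1..SPSR_EL3
        self.elr = {1: 0, 2: 0, 3: 0}            # ELR_EL1..ELR_EL3

    def take_exception(self, target_el: int, handler_pc: int, return_pc: int):
        # Save the interrupted state in the target EL's SPSR and ELR.
        self.spsr[target_el] = (self.el, self.flags)
        self.elr[target_el] = return_pc
        self.el = target_el
        self.pc = handler_pc

    def eret(self):
        # Restore state from the current EL's SPSR and ELR.
        current = self.el
        self.el, self.flags = self.spsr[current]
        self.pc = self.elr[current]


pe = PE()
pe.pc, pe.flags = 0x4000, 0b0100
pe.take_exception(target_el=1, handler_pc=0x80000, return_pc=0x4004)
print(pe.el, hex(pe.pc))  # 1 0x80000
pe.eret()
print(pe.el, hex(pe.pc))  # 0 0x4004
```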
Exception vectors • Vector Base Address Register (VBAR) • One for each ELx that can take exceptions (VBAR_EL1, VBAR_EL2, VBAR_EL3) • Defines base address for the vector table at that ELx
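How a vector address is derived from the base register can be shown with a short sketch. The layout used here is the AArch64 one as defined by the architecture rather than by the slides: four groups of entries selected by where the exception came from, four exception kinds per group, and 0x80 bytes per entry. The list and function names are invented for this example.

```python
# Order matters: the index of each name determines its offset.
SOURCES = [
    "current_el_sp0",    # same EL, using SP_EL0
    "current_el_spx",    # same EL, using SP_ELx
    "lower_el_aarch64",  # from a lower EL running AArch64
    "lower_el_aarch32",  # from a lower EL running AArch32
]
KINDS = ["synchronous", "irq", "fiq", "serror"]


def vector_address(vbar: int, source: str, kind: str) -> int:
    """Return the address of the vector entry for the given exception."""
    return vbar + 0x200 * SOURCES.index(source) + 0x80 * KINDS.index(kind)


# An IRQ taken from a lower EL in AArch64 state, with the table at 0x80000:
print(hex(vector_address(0x80000, "lower_el_aarch64", "irq")))  # 0x80480
```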
System calls • SVC • Supervisor call exception • EL0 calls OS at EL1 • HVC • Hypervisor call exception • For EL1 and higher • SMC • Secure monitor call exception • For EL1 and higher
Virtual Memory System Architecture • VMSA • Provides MMU • MMU translates VAs to PAs independently for ELx and security states • AArch64 supports VAs and PAs of up to 48 bits
Address translation system • VMSAv8-64 • Translation Table Base Register (TTBR) • Translation Control Register (TCR) • Up to four levels of address lookup • IA of up to 48 bits • OA of up to 48 bits • A translation granule size of 4K, 16K, 64K
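For the common case of a 4K granule and a 48-bit input address, the four lookup levels correspond to four 9-bit index fields plus a 12-bit page offset. The sketch below assumes exactly that configuration (other granules split the address differently); the function name `split_va_4k` is made up for illustration.

```python
def split_va_4k(va: int) -> dict:
    """Split a 48-bit VA into table indices for a 4K granule."""
    return {
        "l0": (va >> 39) & 0x1FF,  # bits [47:39]
        "l1": (va >> 30) & 0x1FF,  # bits [38:30]
        "l2": (va >> 21) & 0x1FF,  # bits [29:21]
        "l3": (va >> 12) & 0x1FF,  # bits [20:12]
        "offset": va & 0xFFF,      # bits [11:0]
    }


va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
print(split_va_4k(va))
# {'l0': 1, 'l1': 2, 'l2': 3, 'l3': 4, 'offset': 5}
```

Each index selects one of 512 eight-byte descriptors in a 4 KB table, which is why every level consumes 9 bits.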
MMU faults • All types of MMU exceptions • Alignment fault • Permission fault • Translation fault • Address size fault • Synchronous external abort on a translation table walk • Access flag fault • TLB conflict abort
Translation Lookaside Buffers (TLB) • TLB • Caches results from translation table walks • Global pages • Process-specific pages • Address Space Identifier (ASID) • Implementation defined size 8 or 16 bits • Virtual Machine Identifier (VMID) • Concept of locked entries • Optional for implementation • Maintenance instructions • TLBI <operation>{,Xt}
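The role of ASID and VMID tagging can be sketched with a dictionary-based toy TLB. The class `TLB` and its methods are hypothetical and ignore capacity, replacement, and locked entries. The point illustrated is that process-specific entries match only for their own ASID, global entries match for any ASID, and an invalidate-by-ASID operation leaves global entries alone.

```python
class TLB:
    """Toy TLB keyed by (VMID, ASID, virtual page)."""

    def __init__(self):
        # Global entries are stored with ASID None.
        self.entries = {}

    def insert(self, vmid, asid, vpage, ppage, is_global=False):
        key = (vmid, None if is_global else asid, vpage)
        self.entries[key] = ppage

    def lookup(self, vmid, asid, vpage):
        # Try the process-specific entry first, then a global one.
        for key in ((vmid, asid, vpage), (vmid, None, vpage)):
            if key in self.entries:
                return self.entries[key]
        return None  # miss: a translation table walk would be needed

    def invalidate_asid(self, vmid, asid):
        # Drop non-global entries belonging to one address space.
        self.entries = {
            k: v for k, v in self.entries.items()
            if not (k[0] == vmid and k[1] == asid)
        }


tlb = TLB()
tlb.insert(vmid=0, asid=1, vpage=0x10, ppage=0xA0)
tlb.insert(vmid=0, asid=1, vpage=0x20, ppage=0xB0, is_global=True)
print(tlb.lookup(0, 2, 0x10))  # None
print(tlb.lookup(0, 2, 0x20))  # 176
```

Tagging entries this way lets the OS switch processes without flushing the whole TLB.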