Microprocessor system architectures – IA64 Jakub Yaghob
Application architecture features – I • Instruction set architecture • Load-Execute-Store architecture, no stack, no division instruction • Explicit parallelism • Massive resources (128 integer and 128 FP registers, 64 predicate registers, 8 branch registers) • Enhancements • Speculation, predication, software pipelining, branch prediction, multimedia instructions • Instruction level parallelism • Independent instructions in bundles • Multiple bundles per clock
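Because there is no divide instruction, division is done in software. A reciprocal approximation (the frcpa instruction) is refined with Newton-Raphson steps built from fused multiply-adds. The Python sketch below uses a crude linear seed in place of frcpa's real lookup table. The names approx_reciprocal and divide are hypothetical, Python does not fuse the multiply-adds, and the divisor must be finite and nonzero.

```python
import math

def approx_reciprocal(b: float) -> float:
    """Hypothetical stand-in for the hardware reciprocal seed (frcpa):
    a linear estimate accurate to only about 4 bits."""
    m, e = math.frexp(abs(b))             # |b| = m * 2**e, 0.5 <= m < 1
    seed = 48 / 17 - 32 / 17 * m          # max relative error 1/17 on [0.5, 1)
    return math.copysign(math.ldexp(seed, -e), b)

def divide(a: float, b: float, steps: int = 4) -> float:
    """Divide using only multiplies and adds: refine 1/b with
    Newton-Raphson, then correct the quotient with one remainder step."""
    y = approx_reciprocal(b)
    for _ in range(steps):
        err = 1.0 - b * y                 # would be one fnma on IA-64
        y = y + y * err                   # one fma; roughly doubles correct bits
    q = a * y
    r = a - b * q                         # remainder of the first quotient guess
    return q + r * y

print(divide(1.0, 3.0))
```

Each step squares the relative error, so four steps starting from 1/17 are more than enough for double precision.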
Application architecture features – II • Explicit parallelism • Instruction group • Defined by the compiler • Instructions in a group may execute in parallel • Strict requirements on dependencies • No register RAW or WAW dependencies within a group • Memory model • Relatively weak • The only ordering guaranteed is for RAW, WAW and WAR dependencies on the same memory location • Explicit memory access synchronization
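A compiler has to end an instruction group before any instruction that would read or rewrite a register already written in the same group. In IA-64 assembly this is done with a stop, written ;;. The greedy Python sketch below models only this register RAW/WAW rule. The Insn representation is hypothetical, and the architecture's exceptions (such as parallel compares writing the same predicate) are not modeled.

```python
from typing import FrozenSet, List, NamedTuple

class Insn(NamedTuple):
    text: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]

def split_into_groups(insns: List[Insn]) -> List[List[str]]:
    """Greedily place a stop (;;) before any instruction that would create
    a register RAW or WAW dependency inside the current instruction group.
    WAR is allowed within a group, so it never forces a stop here."""
    groups: List[List[str]] = [[]]
    written = set()
    for insn in insns:
        if insn.reads & written or insn.writes & written:
            groups.append([])             # stop (;;): start a new group
            written = set()
        groups[-1].append(insn.text)
        written |= insn.writes
    return groups

def make(text, reads, writes):
    return Insn(text, frozenset(reads), frozenset(writes))

code = [
    make("add r14 = r32, r33", {"r32", "r33"}, {"r14"}),
    make("add r33 = r35, r36", {"r35", "r36"}, {"r33"}),  # WAR on r33: same group
    make("add r16 = r14, r33", {"r14", "r33"}, {"r16"}),  # RAW on r14: needs a stop
]
groups = split_into_groups(code)
```

Here the first two instructions share one group, and the third starts a new one.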
Speculation • Loading from memory early • Control speculation • Load hoisted above the condition (branch) that guards it • The load is sometimes executed “uselessly”, when the condition is not met • Data speculation • Load hoisted above a store that may alias it • Checked using the ALAT (Advanced Load Address Table) • Speculation check • A speculative load that would cause an exception does not raise it; the exception is deferred (NaT) and detected by the check • Data speculation is invalidated if the memory location is written
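Data speculation can be pictured as a small table keyed by the load's target register. An advanced load (ld.a) creates an entry, a store to the same address removes it, and the check (chk.a or ld.c) reloads when the entry is gone. This Python sketch assumes exact address matching and unlimited capacity. The real ALAT compares overlapping address ranges and can lose entries for other reasons. All names are hypothetical.

```python
class ALAT:
    """Toy Advanced Load Address Table: register name -> address."""
    def __init__(self):
        self.entries = {}

    def advanced_load(self, reg, addr):    # ld.a records the entry
        self.entries[reg] = addr

    def store(self, addr):                 # a store kills matching entries
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def check(self, reg):                  # chk.a / ld.c: did the entry survive?
        return reg in self.entries

def speculative_read(memory, alat, load_addr, store_addr, value):
    # ld8.a r8 = [load_addr]: load hoisted above a possibly aliasing store
    r8 = memory[load_addr]
    alat.advanced_load("r8", load_addr)
    # st8 [store_addr] = value
    memory[store_addr] = value
    alat.store(store_addr)
    # check: if the entry is gone, the store aliased the load, so reload
    if not alat.check("r8"):
        r8 = memory[load_addr]
    return r8
```

When the addresses differ, the early value is kept. When they alias, the check forces a reload and the stored value is returned.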
Predication • Predicate registers • 64 1-bit predicate registers PR0-PR63 • PR0 hardwired to 1, writes are ignored • No dedicated arithmetic/logic flags • Set by compare instructions • A compare writes a pair of PRs (the result and its complement) • Several compare types (some of them may violate the WAW rule inside an instruction group) • Nearly all instructions are qualified by a PR
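Predication lets a compiler replace a short if/else with a compare followed by two predicated instructions (if-conversion). The Python sketch below models the predicate file and a normal compare writing a complementary pair. abs_diff is a hypothetical example. The Python `if` only stands for the qualifying predicate deciding whether each instruction's result is committed.

```python
class Predicates:
    def __init__(self):
        self._pr = [False] * 64
        self._pr[0] = True                 # PR0 is hardwired to 1

    def __getitem__(self, i):
        return self._pr[i]

    def write(self, i, value):
        if i != 0:                         # writes to PR0 are ignored
            self._pr[i] = value

    def cmp(self, p1, p2, cond):
        """Normal compare type: p1 <- cond, p2 <- not cond."""
        self.write(p1, cond)
        self.write(p2, not cond)

def abs_diff(a, b):
    pr = Predicates()
    x = None
    pr.cmp(1, 2, a > b)                    # cmp.gt p1, p2 = a, b
    # both instructions are issued; only the one whose predicate is 1 commits
    if pr[1]: x = a - b                    # (p1) sub x = a, b
    if pr[2]: x = b - a                    # (p2) sub x = b, a
    return x
```

No branch is needed, so there is nothing to mispredict.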
Register stack • Support for function calls • GR0-GR31 are global registers • GR32-GR127 form the register stack • Each procedure has its own register frame • Two variable-sized areas: local and output • Frame sizes set by the alloc instruction; frames are renamed on calls • After a call, the caller's first output register becomes the callee's GR32 • If the register stack overflows, the CPU frees registers by saving them to memory
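The renaming can be sketched as a mapping from logical GR32+n to a physical slot offset by the frame's base. In this Python sketch, alloc only resizes the current frame. A call moves the base past the caller's locals, so the caller's outputs become the callee's first registers. There is no overflow handling and all names are hypothetical. `locals_` includes the input registers, and `saved` stands in for the frame marker kept in ar.pfs.

```python
class RegisterStack:
    """Toy model of stacked registers GR32-GR127 (no RSE spilling).
    Logical GRn (n >= 32) maps to physical slot bof + (n - 32)."""
    def __init__(self, physical_size=96):
        self.phys = [0] * physical_size
        self.bof = 0          # bottom of the current frame in the physical file
        self.sof = 0          # size of frame (locals + outputs)
        self.sol = 0          # size of locals (including inputs)
        self.saved = []       # stands in for the ar.pfs chain of frame markers

    def alloc(self, locals_, outputs):
        # alloc resizes the current frame; it does not move it
        self.sol, self.sof = locals_, locals_ + outputs

    def _slot(self, n):
        if not 32 <= n < 32 + self.sof:
            raise IndexError(f"r{n} is outside the current frame")
        return self.bof + (n - 32)

    def read(self, n):
        return self.phys[self._slot(n)]

    def write(self, n, value):
        self.phys[self._slot(n)] = value

    def call(self):
        # br.call: the callee frame starts at the caller's first output
        self.saved.append((self.bof, self.sof, self.sol))
        self.bof += self.sol
        self.sof -= self.sol
        self.sol = 0

    def ret(self):
        # br.ret: restore the caller's frame marker
        self.bof, self.sof, self.sol = self.saved.pop()

rs = RegisterStack()
rs.alloc(3, 2)                 # caller: r32-r34 local, r35-r36 output
rs.write(35, 111)
rs.write(36, 222)
rs.call()                      # callee sees the arguments in r32 and r33
```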
Privilege levels and serialization • Privilege levels • Like IA-32, levels 0-3 • System instructions and registers accessible only with CPL=0 • Serialization • Data dependency • Applies to all application and system resources except control registers • Values written to a register are observed by instructions in subsequent instruction groups • Instruction serialization • Modifications are observed before subsequent instruction group fetches are re-initiated • Data serialization • Modifications affecting both execution and data memory accesses are observed • In-flight resources • Reading a resource that has not yet been serialized returns an undefined value
Processor Status Register (PSR) • Controls the current execution environment • Divided into four overlapping sections • Accessed with special instructions
Control registers • 128 control registers • Most are reserved, only 26 are used • Groups • Global control registers • CR0 (DCR=Default Control Register) • CR2 (IVA=Interruption Vector Address) • CR8 (PTA=Page Table Address) • Interruption control registers • Describe the interruption currently being handled • Writes are not serialized implicitly
Banked general registers • Fast switching of GR16-GR31 for interrupt handlers • Current bank in PSR.bn • Bank switching • An interruption selects bank 0 • rfi sets the bank from IPSR.bn • bsw switches to the specified bank • NaT bits are banked as well
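Here is a minimal Python sketch of the banking rules listed above. It keeps a (value, NaT) pair per register. It assumes that ordinary code runs on bank 1 and that the interruption copies PSR.bn into IPSR, which rfi later restores. The class and method names are hypothetical.

```python
class BankedGRs:
    """Two copies of GR16-GR31 (with NaT bits); PSR.bn selects one."""
    def __init__(self):
        # bank[b][i] holds (value, nat) for GR(16+i) in bank b
        self.bank = [[(0, False)] * 16 for _ in range(2)]
        self.psr_bn = 1                    # assumption: normal code uses bank 1
        self.ipsr_bn = None

    def read(self, n):
        return self.bank[self.psr_bn][n - 16]

    def write(self, n, value, nat=False):
        self.bank[self.psr_bn][n - 16] = (value, nat)

    def interrupt(self):
        self.ipsr_bn = self.psr_bn         # PSR saved into IPSR
        self.psr_bn = 0                    # an interruption selects bank 0

    def rfi(self):
        self.psr_bn = self.ipsr_bn         # rfi restores PSR.bn from IPSR.bn

    def bsw(self, bank):
        self.psr_bn = bank                 # bsw.0 / bsw.1
```

The handler can use GR16-GR31 freely without saving the interrupted code's values.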
Virtual memory model • Virtual regions • Support OSes with multiple address spaces • Protection domain mechanism • Supports OSes with a single address space • TLB • Paging algorithms are left to the OS • VHPT (Virtual Hash Page Table) • Improves TLB performance • Inverted page tables • Other mechanisms • Various page sizes, fixed translations, …
TLB • Separate TLBs for code and data • The data TLB also translates VHPT and RSE accesses • Each TLB is divided into two parts • Translation registers (TR) • Fully associative array • The OS sets translations explicitly • No automatic replacement • Translation cache (TC) • Entries can be inserted by an instruction • Automatic replacement (entries also filled from the VHPT)
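The split between fixed translation registers and a replaceable translation cache can be sketched as below. The Python model is hypothetical. It keys entries by virtual page number only, ignoring region IDs and page sizes. It also uses LRU replacement for the TC, whereas the real replacement policy is implementation-specific.

```python
from collections import OrderedDict

class TLB:
    """Toy model of one TLB: fixed TRs plus a replaceable TC."""
    def __init__(self, tc_capacity=4):
        self.tr = {}                       # vpn -> ppn, never evicted
        self.tc = OrderedDict()            # vpn -> ppn, replaceable
        self.tc_capacity = tc_capacity

    def insert_tr(self, vpn, ppn):         # e.g. itr.d, set explicitly by the OS
        self.tr[vpn] = ppn

    def insert_tc(self, vpn, ppn):         # e.g. itc.d, or a VHPT walker fill
        self.tc[vpn] = ppn
        self.tc.move_to_end(vpn)
        if len(self.tc) > self.tc_capacity:
            self.tc.popitem(last=False)    # evict the least recently used entry

    def lookup(self, vpn):
        if vpn in self.tr:
            return self.tr[vpn]
        if vpn in self.tc:
            self.tc.move_to_end(vpn)
            return self.tc[vpn]
        return None                        # miss: VHPT walk or fault to the OS
```

A TR entry survives any amount of TC traffic. This is why the OS uses TRs for translations that must never miss.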
Access rights on pages • Defined by TLB.ar and TLB.pl • Values of TLB.ar (a/b means a at privilege level TLB.pl, b at more privileged levels) • 0: Read only • 1: Read, execute • 2: Read, write • 3: Read, write, execute • 4: Read only/read, write • 5: Read, execute/read, write, execute • 6: Read, write, execute/read, write • 7: Execute, promote/read, execute
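Following the list above, the rights check for TLB.ar values 0-6 can be sketched in Python. Privilege level 0 is the most privileged. The sketch assumes that code running less privileged than TLB.pl gets no access. It leaves out ar = 7, whose execute/promote semantics need extra rules. page_rights is a hypothetical helper.

```python
# Rights per TLB.ar value, following the list above:
# (rights when CPL == TLB.pl, rights when CPL is more privileged)
AR_RIGHTS = {
    0: ("r", "r"),
    1: ("rx", "rx"),
    2: ("rw", "rw"),
    3: ("rwx", "rwx"),
    4: ("r", "rw"),
    5: ("rx", "rwx"),
    6: ("rwx", "rw"),
}

def page_rights(ar: int, pl: int, cpl: int) -> str:
    """Return the rights ('r', 'w', 'x' letters) a page grants at CPL."""
    if ar == 7:
        raise NotImplementedError("ar=7 (execute/promote) is not modeled")
    if cpl > pl:
        return ""                          # less privileged than TLB.pl: no access
    at_pl, more_privileged = AR_RIGHTS[ar]
    return at_pl if cpl == pl else more_privileged
```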
Virtual addressing – other – I • Page sizes • 4k, 8k, 16k, 64k, 256k, 1M, 4M, 16M, 64M, 256M, 4G • Region registers (RR) • The highest 3 bits of a VA (the virtual region number) index one of the 8 RRs • rid – region identifier • ps – preferred page size • ve – VHPT walker enable
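Bit arithmetic shows how a virtual address is taken apart. The top three bits select the region register. The preferred page size from that register then splits the rest into a page number and an offset. split_va is a hypothetical helper. The resulting (RR[vrn].rid, vpn) pair is what a TLB lookup would then match.

```python
def split_va(va: int, ps: int):
    """Split a 64-bit VA for a region whose RR.ps = ps (pages of 2**ps
    bytes). Returns (vrn, vpn, offset)."""
    assert 0 <= va < 1 << 64
    vrn = va >> 61                         # bits 63..61 select RR[vrn]
    region_offset = va & ((1 << 61) - 1)
    vpn = region_offset >> ps              # virtual page number within the region
    offset = va & ((1 << ps) - 1)          # byte offset within the page
    return vrn, vpn, offset

print(split_va(0xE000000000012345, 14))    # 16 KB pages in region 7
```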
Virtual addressing – other – II • Protection keys • At least 16 protection key registers • The key in a TLB entry is compared with the protection key registers; if none matches, a "Key Miss" fault is raised
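Assuming key checking is enabled, the lookup can be sketched as a search of the protection key registers for the TLB entry's key. The Python sketch follows the usual PKR fields: a valid bit and read/write/execute disable bits. A False result stands for a key permission fault. The names are hypothetical.

```python
from typing import List, NamedTuple

class PKR(NamedTuple):
    key: int
    valid: bool = True
    rd: bool = False                       # read disable
    wd: bool = False                       # write disable
    xd: bool = False                       # execute disable

class KeyMissFault(Exception):
    pass

def check_key(tlb_key: int, pkrs: List[PKR], access: str) -> bool:
    """Return True if access ('r', 'w' or 'x') is allowed; raise
    KeyMissFault if no valid PKR holds the TLB entry's key."""
    for pkr in pkrs:
        if pkr.valid and pkr.key == tlb_key:
            disabled = {"r": pkr.rd, "w": pkr.wd, "x": pkr.xd}[access]
            return not disabled
    raise KeyMissFault(tlb_key)
```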
VHPT – II • Properties • The CPU never writes into the VHPT • The CPU does not keep the TLB and the VHPT coherent • Two formats • Short – one per region, 8-byte entries • Long – one large table for the whole system, 32-byte entries • Various sizes, powers of 2 • Searched when the TLB lookup misses • If found in the VHPT, the entry is automatically inserted into the TC • Fixed hash functions
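The lookup order can be sketched with Python dicts standing in for the TC and the VHPT. A real walker indexes the table with the fixed hash function instead. The model only reads the VHPT, matching the rule that the CPU never writes it. translate and TranslationFault are hypothetical names.

```python
class TranslationFault(Exception):
    """Stands in for the TLB-miss fault delivered to the OS."""

def translate(vpn, tc: dict, vhpt: dict, walker_enabled: bool = True):
    """TLB first, then (if enabled) the VHPT; a VHPT hit is copied into
    the TC, otherwise the OS has to handle the fault."""
    if vpn in tc:
        return tc[vpn]
    if walker_enabled and vpn in vhpt:
        tc[vpn] = vhpt[vpn]                # automatic insertion into the TC
        return tc[vpn]
    raise TranslationFault(vpn)
```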
Physical addressing and memory attributes • Physical addresses have only 63 bits • Current implementations implement only 50 bits • Memory attributes • Virtual – taken from the page translation, like IA-32 (WB, WC, …) • Physical – selected by bit 63 of the physical address • 0 – WB, speculative • 1 – UC, non-speculative • Nontrivial rules for memory ordering
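Decoding a physical-mode address is plain bit manipulation. The Python sketch assumes the 50 implemented bits quoted above and uses hypothetical names.

```python
IMPLEMENTED_PA_BITS = 50                   # assumption: width quoted above

def decode_physical(pa: int):
    """Split a physical-mode address into (address, attribute, speculative),
    using bit 63 as the memory-attribute selector."""
    uc = (pa >> 63) & 1
    addr = pa & ((1 << 63) - 1)
    if addr >> IMPLEMENTED_PA_BITS:
        raise ValueError("address exceeds the implemented physical address bits")
    return (addr, "UC", False) if uc else (addr, "WB", True)
```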
Interrupts – I • Kinds by handler • IVA-based • Handled by the OS, vector table address in CR2 (IVA) • PAL-based • Handled by PAL or by system firmware, possibly by the OS • Kinds by behavior • Abort • Interrupt • External, asynchronous • Fault • Trap • Interrupts are disabled during interrupt handling
Interrupts – II • 81 exceptions currently defined • 5 for "hard" exceptions • RESET, INIT, INT, MCA, PMI • 23 for IA-32 emulation • IVA-based interrupts • Vectors have fixed addresses • Several exceptions share one vector • External interrupts • 256 vectors • Priority given by the vector number • Current vector in CR65 (IVR=Interrupt Vector Register) • Current priority in CR66 (TPR=Task Priority Register)
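In this sketch, the priority class of an external interrupt is vector // 16 and a higher vector means higher priority. Choosing the next interrupt to deliver is then a max over the pending vectors. In this Python sketch, `masked_class` is a hypothetical stand-in for the TPR masking threshold. The special low vectors (0-15) are not modeled.

```python
def highest_pending(pending, masked_class=None):
    """Pick the pending external interrupt to deliver, or None.
    Classes <= masked_class (hypothetical TPR stand-in) are blocked."""
    candidates = [v for v in pending
                  if masked_class is None or v // 16 > masked_class]
    return max(candidates, default=None)
```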
RSE – I • Register Stack Engine (RSE) • Transfers the register stack to and from memory • Works in the background without software intervention • Several activity modes (lazy, store-intensive, load-intensive, eager) • The physical register stack must have at least 96 registers • Larger sizes come in multiples of 16
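The effect of the RSE can be sketched by counting registers only. This Python model assumes lazy behaviour: it spills only on overflow and fills only when a returning caller's frame is missing. It ignores the overlap of caller outputs and callee inputs and uses hypothetical names. It is not the RSE's actual algorithm.

```python
class RSEModel:
    """Count-only sketch: the oldest registers are spilled to the backing
    store when a new frame does not fit, and filled back on return."""
    MAX_FRAME = 96                         # GR32-GR127

    def __init__(self, physical: int = 96):
        assert physical >= 96 and physical % 16 == 0
        self.physical = physical
        self.frames = []                   # frame sizes, oldest first
        self.in_file = 0                   # registers held in the physical file
        self.spilled = 0                   # registers held in the backing store
        self.spills = 0                    # total registers written to memory
        self.fills = 0                     # total registers read back

    def call(self, frame_size: int):
        assert 0 <= frame_size <= self.MAX_FRAME
        self.frames.append(frame_size)
        self.in_file += frame_size
        excess = self.in_file - self.physical
        if excess > 0:                     # overflow: spill the oldest registers
            self.in_file -= excess
            self.spilled += excess
            self.spills += excess

    def ret(self):
        self.in_file -= self.frames.pop()
        caller = self.frames[-1] if self.frames else 0
        missing = caller - self.in_file
        if missing > 0:                    # underflow: fill the caller's frame
            self.in_file += missing
            self.spilled -= missing
            self.fills += missing
```

With 96 physical registers, four nested 30-register frames force 24 registers out to memory. They come back only when the outermost frame becomes current again.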
Firmware • Processor Abstraction Layer (PAL) • Unified interface to the CPU firmware • System Abstraction Layer (SAL) • Isolates the OS from implementation differences between platforms • Extensible Firmware Interface (EFI) • OS booting • Each firmware layer (including the OS) has a defined entry point • PAL and SAL are placed in the 16 MB of memory immediately below 4 GB • Fixed structure
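The location of the firmware area follows from simple arithmetic: 16 MB ending exactly at the 4 GB boundary.

```python
FOUR_GB = 4 * 2**30
SIXTEEN_MB = 16 * 2**20

fw_start = FOUR_GB - SIXTEEN_MB            # first byte of the firmware area
fw_end = FOUR_GB - 1                       # last byte, just below 4 GB
print(hex(fw_start), hex(fw_end))          # 0xff000000 0xffffffff
```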