Welcome To The Am29000 • 3-Bus Harvard Memory Architecture • General Design
Let’s Get Ready to Rumble • A real RISC machine • Lean instruction set • Free of microcode • Pipelining Pipelining Pipelining • Double-precision, floating-point arithmetic unit (Am29027 arithmetic accelerator) • On-chip Memory Management Unit with 64 entries
32-bit, three-bus architecture • Instructions and data held in separate external memory systems • One 32-bit bus for each memory system • Shared 32-bit address bus • Conflicts on the shared address bus are avoided by burst-mode addressing (later) and a large number of on-chip registers (later) • The shared address bus causes only an estimated 5% performance loss • All instruction fetches are directed to instruction memory • All data accesses are directed to data memory or I/O space
Four external access spaces • Instruction memory (just mentioned) • Data memory and I/O space (just mentioned) • ROM space • Coprocessor space
Pipelining • 4-stage: • Fetch • Decode • Execute • Write Back • Independent hardware used at each stage • Simplified Execution stage • Since now we have faster memory, processing times are no longer fetch-stage dominated. Rather, they are execute-stage dominated. Therefore we simplify the execution stage with: • Simplified instruction formats • Limited instruction complexity • Operating on data in registers
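The overlap between the four stages can be sketched with a short Python model. This is a minimal illustration that assumes an ideal pipeline with no stalls; the names `pipeline_schedule` and `total_cycles` are invented for this sketch and do not come from the slides.

```python
STAGES = ["Fetch", "Decode", "Execute", "Write Back"]  # stage names from the slide


def pipeline_schedule(num_instructions):
    """Return {cycle: {stage: instruction index}} for an ideal, stall-free pipeline."""
    schedule = {}
    for instr in range(num_instructions):
        for offset, stage in enumerate(STAGES):
            # Assumption: instruction `instr` enters Fetch at cycle `instr`
            # and advances exactly one stage per cycle.
            schedule.setdefault(instr + offset, {})[stage] = instr
    return schedule


def total_cycles(num_instructions):
    """Cycles needed to retire all instructions when nothing stalls."""
    if num_instructions == 0:
        return 0
    # The first instruction needs len(STAGES) cycles; each later one adds a cycle.
    return len(STAGES) + num_instructions - 1


if __name__ == "__main__":
    for cycle, busy in sorted(pipeline_schedule(5).items()):
        print(cycle, busy)
```

Once the pipeline is full, every stage holds a different instruction in the same cycle, which is why independent hardware per stage matters.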
Registers • 32-bit • Many of them • Reduce data fetches from off-chip memory • Act as a cache for program data • Multiport register file allows simultaneous access to more than one operand • Most instructions operate on registers • To save memory access time
Registers (Continued) • 3 independent register regions • General Purpose registers • Translation Look-Aside Buffer (TLB) registers • Special Purpose registers
General Purpose Registers • 128 local registers • More than 64 global registers • These are the source and destination for most instructions • User mode can only access general purpose registers • Registers are implemented by a multiport register file • Contains a minimum of 3 access ports • 2 of them provide simultaneous read access to the register file • The third is for updating a register value
Register windows • Allocated from stack of 128 registers • Used for parameter passing • Dynamically sized • Results in very efficient procedure calls
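How dynamically sized windows make parameter passing cheap can be sketched in Python. This is a simplified model under stated assumptions, not the processor's exact mechanism: the class `RegisterWindow`, its `sp` field, and the downward-growing allocation are choices made for this illustration, and overflow past 128 live registers (which real software would have to spill to memory) is not handled.

```python
NUM_LOCAL = 128  # size of the local register stack, from the slide


class RegisterWindow:
    """Toy model: local register n maps to physical slot (sp + n) mod 128."""

    def __init__(self):
        self.regs = [0] * NUM_LOCAL
        self.sp = 0        # hypothetical: physical index of the current frame's register 0
        self.frames = []   # sizes of the frames allocated by active calls

    def _phys(self, n):
        return (self.sp + n) % NUM_LOCAL

    def read(self, n):
        return self.regs[self._phys(n)]

    def write(self, n, value):
        self.regs[self._phys(n)] = value

    def call(self, new_regs):
        # Allocate only as many registers as the callee asks for (dynamic sizing).
        self.sp = (self.sp - new_regs) % NUM_LOCAL
        self.frames.append(new_regs)

    def ret(self):
        # Release the callee's registers; the caller's window is visible again.
        self.sp = (self.sp + self.frames.pop()) % NUM_LOCAL


if __name__ == "__main__":
    w = RegisterWindow()
    w.write(0, 42)      # caller places an argument in its register 0
    w.call(4)           # callee allocates 4 fresh registers
    print(w.read(4))    # callee sees the argument as its register 4, no copy made
    w.ret()
```

Because the caller's and callee's windows overlap, the argument is never copied or pushed to memory, which is where the efficient procedure calls come from.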
Memory Access • Only through explicit load and store • Delayed branching • To prevent pipeline stalls • Interrupts • Programmer is able to define own interrupt architecture • Enables OPTIMIZATION • Separate data and instruction cache • Enables concurrent data and instruction access • Branch Target Cache • Details on next slide
Branch Target Cache (BTC) • Supplies first four instructions of previously taken branches • Very very cool • Solves jump problem very nicely
Branch Target Cache (Continued) • Example: • Say we have a 3-cycle first-access latency for branch instructions and 1-cycle access in burst mode • Typically every 5th instruction is a branch • Without BTC each of these would take 5 cycles to complete its execution (the pipeline would stall for 4 cycles) • BTC can hide all 3 cycles of latency to enable a branch to execute in a single cycle • BTC Rocks!
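The example's arithmetic can be worked through in a few lines of Python. This sketch uses only the slide's figures (every 5th instruction is a branch, 5 cycles per branch without the BTC, 1 cycle with it) and additionally assumes that every other instruction takes 1 cycle and that every branch hits in the BTC; the function name `average_cpi` is invented here.

```python
def average_cpi(branch_every, branch_cycles, other_cycles=1):
    """Average cycles per instruction when one in `branch_every` instructions is a branch."""
    non_branches = branch_every - 1
    return (non_branches * other_cycles + branch_cycles) / branch_every


# Slide figures: a branch costs 5 cycles without the BTC, 1 cycle with it.
without_btc = average_cpi(branch_every=5, branch_cycles=5)
with_btc = average_cpi(branch_every=5, branch_cycles=1)  # assumes a BTC hit on every branch

if __name__ == "__main__":
    print(f"CPI without BTC: {without_btc:.2f}")
    print(f"CPI with BTC:    {with_btc:.2f}")
    print(f"Speedup:         {without_btc / with_btc:.2f}x")
```

Under these assumptions four 1-cycle instructions plus one 5-cycle branch average 9/5 = 1.8 cycles per instruction, against 1.0 when the BTC hides the branch latency.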
Branch Target Cache (Continued) • Maintained internally by processor hardware • 32 cache entries (known as blocks) of 4 instructions each • Each entry records • Whether it was accessed in User or Supervisor mode • Whether the address is virtual or physical
Branch Target Cache (Continued) • Entry does not tell: • Which process accessed it • Therefore systems which run multiple tasks must invalidate the cache when a user context switch occurs • Can use IRETINV (interrupt return and invalidate) to do this • BTC can hold instructions of frequently taken trap handler routines but • Entries are replaced in the cache on a random basis • No way to lock entries in the cache
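The behaviour described on the last two slides can be modelled in Python. This is a rough sketch, not the hardware design: it treats the cache as fully associative for simplicity (the slides do not state the organization), and the class `BranchTargetCache` and its method names are invented for the illustration. The entry count, block size, mode and address-type tags, random replacement, and whole-cache invalidation come from the slides.

```python
import random


class BranchTargetCache:
    ENTRIES = 32  # cache entries (blocks), from the slide
    BLOCK = 4     # instructions per entry, from the slide

    def __init__(self, seed=None):
        self.entries = {}               # (target, supervisor, physical) -> instructions
        self.rng = random.Random(seed)  # seedable only to make the sketch repeatable

    def lookup(self, target, supervisor, physical):
        """Return the cached instructions for a taken branch, or None on a miss."""
        return self.entries.get((target, supervisor, physical))

    def fill(self, target, supervisor, physical, instructions):
        key = (target, supervisor, physical)
        if key not in self.entries and len(self.entries) >= self.ENTRIES:
            # Random replacement: any entry may be evicted, none can be locked.
            victim = self.rng.choice(list(self.entries))
            del self.entries[victim]
        self.entries[key] = list(instructions[: self.BLOCK])

    def invalidate(self):
        """Drop every entry, as a context switch must do (for example via IRETINV)."""
        self.entries.clear()


if __name__ == "__main__":
    btc = BranchTargetCache(seed=0)
    # Process A caches a branch target at a virtual address in User mode.
    btc.fill(0x1000, supervisor=False, physical=False, instructions=["a0", "a1", "a2", "a3"])
    # Process B branches to the same virtual address: the tag has no process ID,
    # so without invalidation it would be handed process A's instructions.
    print(btc.lookup(0x1000, supervisor=False, physical=False))
    btc.invalidate()
    print(btc.lookup(0x1000, supervisor=False, physical=False))
```

The tag holds the mode and address type but no process identifier, so two user processes that branch to the same virtual address are indistinguishable, which is why the cache has to be invalidated on a context switch.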
Burst-Mode Memory Interface • Provides a simplified transfer mechanism • Only applies to consecutive access sequences • Used for all instruction fetches • Used for load-multiple and store-multiple data access
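A rough cost model shows why burst mode pays off for consecutive accesses. The sketch below reuses the figures from the branch example earlier in the deck (3-cycle first access, 1 cycle per word in burst mode) and assumes, for comparison only, that without burst mode every word would pay the full first-access latency; `access_cycles` is a name invented for this illustration.

```python
def access_cycles(n_words, first_latency=3, burst_cycle=1, burst=True):
    """Cycles to transfer `n_words` consecutive words under the stated assumptions."""
    if n_words == 0:
        return 0
    if burst:
        # Pay the first-access latency once, then stream one word per burst cycle.
        return first_latency + (n_words - 1) * burst_cycle
    # Assumed baseline: every access is a fresh first access.
    return n_words * first_latency


if __name__ == "__main__":
    for n in (1, 4, 8):
        print(n, access_cycles(n), access_cycles(n, burst=False))
```

The saving grows with the length of the consecutive run, which fits the slide's point that burst mode is used for instruction fetches and for load-multiple and store-multiple accesses.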