
Welcome To The AM2900






  1. Welcome To The AM2900 3-Bus Harvard Memory Architecture General Design

  2. Let’s Get Ready to Rumble • A real RISC machine • Lean instruction set • Free of microcode • Pipelining Pipelining Pipelining • Double-precision, floating-point arithmetic unit (Am29027 arithmetic accelerator) • 64-entry Memory Management Unit on-chip

  3. 32-bit, three-bus architecture • Instruction and data held in separate external memory systems • One 32-bit bus for each memory system • Shared 32-bit address bus • Avoids conflicts by supporting burst-mode addressing (later) and a large number of on-chip registers (later) • Shared address bus causes only an estimated 5% performance loss • All instruction fetches are directed to instruction memory • All data accesses are directed to data memory or I/O space

  4. Four external access spaces • Instruction memory (just mentioned) • Data memory and I/O space (just mentioned) • ROM space • Coprocessor space

  5. Pipelining • 4-stage: • Fetch • Decode • Execute • Write Back • Independent hardware used at each stage • Simplified Execution stage • Now that memory is faster, processing times are no longer fetch-stage dominated; they are execute-stage dominated. Therefore the execution stage is simplified with: • Simplified instruction formats • Limited instruction complexity • Operating on data in registers

  6. Registers • 32-bit • Many of them • Reduce data fetches from off-chip memory • Act as a cache for program data • Multiport register file allows simultaneous access to more than one operand • Most instructions operate on registers • To save memory access time

  7. Registers Continued • 3 independent register regions • General Purpose registers • Translation Look-Aside Buffer (TLB) registers • Special Purpose registers

  8. General Purpose Registers • 128 local registers • More than 64 global registers • These are the source and destination for most instructions • User mode can only access general purpose registers • Registers are implemented by a multiport register file • Contains a minimum of 3 access ports • 2 of them provide simultaneous read access to the register file • The third is for updating a register value
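A minimal sketch of the 3-port access pattern just described (the register count and names are illustrative, not the exact hardware layout): two read ports serve operands in the same cycle that the third port writes a result.

```c
#define NUM_REGS 192  /* 128 local + 64 global, per the counts above */

static int regfile[NUM_REGS];

struct read_ports { int a, b; };

/* One register-file cycle: ports A and B read simultaneously while the
   third port writes wval into register rw. Reads see the old value
   (read-before-write semantics in this toy model). */
struct read_ports regfile_cycle(int ra, int rb, int rw, int wval)
{
    struct read_ports out = { regfile[ra], regfile[rb] };
    regfile[rw] = wval;
    return out;
}
```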

  9. Register windows • Allocated from a stack of 128 registers • Used for parameter passing • Dynamically sized • Results in very efficient procedure calls
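The dynamically sized windows can be sketched as a simple stack allocator over the 128 local registers. This is a toy model; the real hardware tracks the window with a stack-pointer register and spills older windows to memory when the register stack fills, which this sketch replaces with a failure return.

```c
#define LOCAL_REGS 128

static int window_base = LOCAL_REGS;  /* register stack grows downward */

/* Allocate a window of n registers for a procedure call; returns the
   base index, or -1 when the register stack is exhausted (where the
   real processor would spill older windows to memory instead). */
int window_push(int n)
{
    if (n > window_base)
        return -1;
    window_base -= n;
    return window_base;
}

/* Release the window on procedure return. */
void window_pop(int n)
{
    window_base += n;
}
```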

  10. Memory • Accessed only through explicit load and store • Delayed branching • To prevent pipeline stalls • Interrupts • Programmer is able to define own interrupt architecture • Enables OPTIMIZATION • Separate data and instruction caches • Enable concurrent data and instruction access • Branch Target Cache • Details on next slide
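Delayed branching means the instruction after a taken branch (the delay slot) still executes, so the pipeline never discards the instruction it has already fetched. A toy interpreter sketch of that semantics (the opcodes and program format are invented for illustration):

```c
enum op { ADD1, BRANCH, HALT };

/* Run a tiny program with delayed-branch semantics: when the BRANCH at
   pc is taken, the instruction at pc+1 (the delay slot) executes
   before control transfers to target[pc]. */
int run(const enum op *prog, const int *target)
{
    int acc = 0, pc = 0;
    while (prog[pc] != HALT) {
        if (prog[pc] == BRANCH) {
            if (prog[pc + 1] == ADD1)  /* execute the delay slot */
                acc++;
            pc = target[pc];
        } else {  /* ADD1 */
            acc++;
            pc++;
        }
    }
    return acc;
}
```

For the program `{ ADD1, BRANCH→4, ADD1 (delay slot), ADD1, HALT }` the delay-slot ADD1 executes but the ADD1 that the branch jumps over does not.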

  11. Branch Target Cache (BTC) • Supplies the first four instructions of previously taken branches • Very very cool • Solves the jump problem very nicely

  12. Branch Target Cache Continued • Example: • Say we have a 3-cycle first-access latency for branch instructions and 1-cycle access in burst mode • Typically every 5th instruction is a branch • Without the BTC each of these would take 5 cycles to complete its execution (the pipeline would stall for 4 cycles) • The BTC can hide all 3 cycles of latency to enable a branch to execute in a single cycle • BTC Rocks!
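Plugging the example's own numbers into a small model: if every 5th instruction is a branch, a 5-cycle branch averages out to 1.8 cycles per instruction, while a BTC hit (1-cycle branch) restores 1.0.

```c
/* Average cycles per instruction when every 5th instruction is a
   branch costing branch_cycles and the other four cost 1 cycle each
   (the assumptions quoted in the slide). */
double avg_cpi(int branch_cycles)
{
    return (4 * 1 + branch_cycles) / 5.0;
}
```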

  13. Branch Target Cache Continued • Maintained internally by processor hardware • 32 cache entries (known as blocks) of 4 instructions each • Each entry records • Whether it was accessed in User or Supervisor mode • Whether the address is virtual or physical

  14. Branch Target Cache Continued • An entry does not record: • Which process accessed it • Therefore systems which run multiple tasks must invalidate the cache when a user context switch occurs • Can use IRETINV (interrupt return and invalidate) to do this • The BTC can hold instructions of frequently taken trap handler routines, but • Entries are replaced in the cache on a random basis • There is no way to lock entries in the cache
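A sketch of what an entry might look like, given only the fields the slides list (the actual tag layout is not specified here; the struct and function names are invented): a lookup must match the address, the User/Supervisor mode bit, and the virtual/physical bit, and because entries carry no process identifier, a context switch wipes the whole cache.

```c
#include <stdint.h>
#include <string.h>

#define BTC_ENTRIES 32
#define BLOCK_INSNS 4

struct btc_entry {
    uint32_t target;     /* branch target address */
    int supervisor;      /* 1 = Supervisor mode, 0 = User mode */
    int physical;        /* 1 = physical address, 0 = virtual */
    int valid;
    uint32_t insn[BLOCK_INSNS];  /* first four target instructions */
};

static struct btc_entry btc[BTC_ENTRIES];

/* A hit requires address, mode, and address-space bits to all match. */
const uint32_t *btc_lookup(uint32_t addr, int supervisor, int physical)
{
    for (int i = 0; i < BTC_ENTRIES; i++)
        if (btc[i].valid && btc[i].target == addr &&
            btc[i].supervisor == supervisor &&
            btc[i].physical == physical)
            return btc[i].insn;
    return NULL;
}

/* No per-process tag, so a context switch must invalidate everything,
   as IRETINV does. */
void btc_invalidate_all(void)
{
    memset(btc, 0, sizeof btc);
}
```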

  15. Burst-Mode Memory Interface • Provides a simplified transfer mechanism • Only applies to consecutive access sequences • Used for all instruction fetches • Used for load-multiple and store-multiple data accesses
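Reusing the latency figures assumed in the BTC example above (3 cycles for a first access, 1 cycle per subsequent word in burst mode), a sketch of what bursting a run of consecutive accesses saves:

```c
/* Cost of fetching n consecutive words: one 3-cycle first access plus
   1 cycle per remaining word in burst mode, versus 3 cycles each when
   every word pays the first-access latency on its own. */
unsigned burst_cycles(unsigned n)  { return n ? 3 + (n - 1) : 0; }
unsigned single_cycles(unsigned n) { return 3 * n; }
```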
