1. COMP 206: Computer Architecture and Implementation
Montek Singh
Mon., Nov. 22, 2004
Topic: Main Memory (DRAM) Organization
2. Outline
Introduction
DRAM Organization
Challenges
Bandwidth
Granularity
Performance
Reading: HP3 5.8 and 5.9
3. Basics of DRAM Technology
DRAM (Dynamic RAM)
Used mostly in main mem.
Capacitor + 1 transistor/bit
Need refresh every 4-8 ms
Refresh takes about 5% of total time
Read is destructive (need for write-back)
Access time < cycle time (because of writing back)
Density (25-50):1 to SRAM
Address lines multiplexed
pins are scarce!
SRAM (Static RAM)
Used mostly in caches (I, D, TLB, BTB)
1 flip-flop (4-6 transistors) per bit
Read is not destructive
Access time = cycle time
Speed (8-16):1 to DRAM
Address lines not multiplexed
high-speed decoding is important
4. DRAM Organization: Fig. 5.29
5. Chip Organization
Chip capacity (= number of data bits)
tends to quadruple
1K, 4K, 16K, 64K, 256K, 1M, 4M, …
In early designs, each data bit belonged to a different address (x1 organization)
Starting with 1Mbit chips, wider chips (4, 8, 16, 32 bits wide) began to appear
Advantage: Higher bandwidth
Disadvantage: More pins, hence more expensive packaging
6. Chip Organization Example: 64Mb DRAM
7. DRAM Access
Several steps in DRAM access:
Half of the address bits select a row of the square array
Whole row of bits is brought out of the memory array into a buffer register (slow, 60-80% of access time)
Other half of address bits select one bit of buffer register (with the help of multiplexer), which is read or written
Whole row is written back to memory array
Notes:
This organization is demanded by needs of refresh
Has advantages: e.g., nibble, page, and static column mode operation
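The four access steps can be sketched as a toy Python model of one square bit array. This is an illustration under stated assumptions, not a circuit description: the class name `SquareDRAM` is hypothetical, the upper half of the address is assumed to select the row (the slide only says "half"), and the destructive read is modeled by clearing the array row until it is written back from the row buffer.

```python
from typing import List, Optional, Tuple


class SquareDRAM:
    """Toy model of a square DRAM bit array (hypothetical, for illustration)."""

    def __init__(self, addr_bits: int) -> None:
        if addr_bits <= 0 or addr_bits % 2:
            raise ValueError("a square array needs a positive, even number of address bits")
        self.half = addr_bits // 2          # bits in the row (and column) address
        self.side = 1 << self.half          # rows = columns = 2**half
        self.array: List[List[int]] = [[0] * self.side for _ in range(self.side)]

    def split(self, addr: int) -> Tuple[int, int]:
        """Assumption: upper half of the address = row, lower half = column."""
        return addr >> self.half, addr & (self.side - 1)

    def access(self, addr: int, write_bit: Optional[int] = None) -> int:
        """Read (and optionally write) one bit; returns the bit read before any write."""
        row, col = self.split(addr)
        # Steps 1-2: the row address moves the whole row into the row buffer.
        # The read is destructive, so this model clears the row in the array.
        row_buffer = self.array[row]
        self.array[row] = [0] * self.side
        # Step 3: the column address picks one bit of the buffer (the multiplexer).
        bit = row_buffer[col]
        if write_bit is not None:
            row_buffer[col] = write_bit
        # Step 4: the whole row is written back, restoring (or updating) it.
        self.array[row] = row_buffer
        return bit


mem = SquareDRAM(addr_bits=4)      # 16 bits arranged as a 4 x 4 array
mem.access(9, write_bit=1)         # 9 = 0b10_01 -> row 2, column 1
print(mem.access(9))               # 1
```

Because every access moves and restores a whole row, a refresh can reuse the same row-read/write-back path, which is why the slide ties this organization to refresh.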
8. DRAM Refresh
Refreshes are performed one row at a time.
Consider a 1Mx1 DRAM chip with 190 ns cycle time
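Time to refresh all rows, one row per cycle (a 1Mx1 chip is a 1024 × 1024 array, so about 10^3 rows):
$190 \times 10^{-9}\,\mathrm{s} \times 10^{3} = 0.19\,\mathrm{ms} < 4\text{--}8\,\mathrm{ms}$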
Refresh complicates operation of memory
Refresh control competes with CPU for access to DRAM
Each row refreshed once every 4-8 ms irrespective of the use of that row
Want to keep refresh fast (< 5-10% of total time)
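The refresh arithmetic can be checked with a short Python sketch. It assumes a square array (so the 1Mx1 chip has exactly 1024 rows rather than the rounded 10^3) and one row refreshed per memory cycle; the function names are hypothetical helpers, not part of any library.

```python
import math


def refresh_time_s(capacity_bits: int, cycle_time_s: float) -> float:
    """Time to refresh every row once, one row per memory cycle.

    Assumes a square array, so a chip of N bits has sqrt(N) rows.
    """
    rows = math.isqrt(capacity_bits)
    if rows * rows != capacity_bits:
        raise ValueError("capacity must be a perfect square for this sketch")
    return rows * cycle_time_s


def refresh_overhead(capacity_bits: int, cycle_time_s: float, interval_s: float) -> float:
    """Fraction of each refresh interval spent refreshing rows."""
    return refresh_time_s(capacity_bits, cycle_time_s) / interval_s


# The slide's example: 1M x 1 chip, 190 ns cycle time.
t = refresh_time_s(1 << 20, 190e-9)
print(f"full refresh: {t * 1e3:.3f} ms")   # full refresh: 0.195 ms
for interval in (4e-3, 8e-3):
    share = refresh_overhead(1 << 20, 190e-9, interval)
    print(f"{interval * 1e3:.0f} ms interval: {share:.1%}")
```

With 1024 rows the full refresh takes about 0.195 ms, roughly 4.9% of a 4 ms interval and 2.4% of an 8 ms interval, in line with the "about 5% of total time" figure for DRAM refresh.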
9. Memory Performance Characteristics
Latency (access time)
The time interval between the instant at which the data is called for (READ) or requested to be stored (WRITE), and the instant at which it is delivered or completely stored
Cycle time
The time between the instant the memory is accessed, and the instant at which it may be validly accessed again
Bandwidth (throughput)
The rate at which data can be transferred to or from memory
Reciprocal of cycle time
“Burst mode” bandwidth is of greatest interest
Cycle time > access time for conventional DRAM
Cycle time < access time in “burst mode” when a sequence of consecutive locations is read or written
10. Improving Performance
Latency can be reduced by
Reducing access time of chips
Using a cache (“cache trades latency for bandwidth”)
Bandwidth can be increased by using
Wider memory (more chips)
More data pins per DRAM chip
Increased bandwidth per data pin
11. Two Recent Problems
DRAM chip sizes quadrupling every three years
Main memory sizes doubling every three years
Thus, the main memory of the same kind of computer is being constructed from fewer and fewer DRAM chips
This results in two serious problems
Diminishing main memory bandwidth
Increasing granularity of memory systems
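The two growth rates together mean the chip count for a given class of machine halves every three years. A minimal Python sketch of that trend; the 64 MB memory built from 16 Mb chips is a hypothetical starting point, not data from the slides, and the helper names are illustrative.

```python
from typing import List, Tuple

BITS_PER_MBIT = 2**20
BITS_PER_MBYTE = 8 * 2**20


def chips_needed(memory_bits: int, chip_bits: int) -> int:
    """Number of DRAM chips needed to build the memory (ceiling division)."""
    return -(-memory_bits // chip_bits)


def project(memory_bits: int, chip_bits: int, generations: int) -> List[Tuple[int, int]]:
    """(years from now, chip count) per 3-year generation, using the slide's growth rates."""
    result = []
    for g in range(generations):
        result.append((3 * g, chips_needed(memory_bits, chip_bits)))
        memory_bits *= 2   # main memory size doubles every three years
        chip_bits *= 4     # DRAM chip capacity quadruples every three years
    return result


# Hypothetical starting point: 64 MB of main memory built from 16 Mb chips.
print(project(64 * BITS_PER_MBYTE, 16 * BITS_PER_MBIT, 4))   # [(0, 32), (3, 16), (6, 8), (9, 4)]
```

Fewer chips means fewer data pins feeding the memory bus, which is the root of both problems listed above.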
12. Increasing Granularity of Memory Systems
The granularity of a memory system is the minimum memory size, and also the minimum increment in the amount of memory, permitted by the memory system
Too large a granularity is undesirable
Increases cost of system
Restricts its competitiveness
Granularity can be decreased by
Widening the DRAM chips
Increasing the per-pin bandwidth of the DRAM chips
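Why wider chips reduce granularity can be seen with a small calculation. This Python sketch assumes a 64-bit data bus with no ECC and that one bank of chips must cover the whole bus; `granularity_bytes` is a hypothetical helper.

```python
def granularity_bytes(bus_width_bits: int, chip_width_bits: int, chip_capacity_bits: int) -> int:
    """Smallest memory (and smallest increment) when one bank of chips must fill the data bus."""
    if bus_width_bits % chip_width_bits:
        raise ValueError("chip width must divide the bus width")
    chips_per_bank = bus_width_bits // chip_width_bits
    return chips_per_bank * chip_capacity_bits // 8


CHIP_BITS = 64 * 2**20   # a 64 Mb DRAM chip
for width in (1, 4, 16):
    g = granularity_bytes(64, width, CHIP_BITS)
    print(f"x{width} chips: {64 // width} per bank, granularity {g // 2**20} MB")
```

With 64 Mb chips the granularity drops from 512 MB (64 x1 chips) to 128 MB (16 x4 chips) to 32 MB (4 x16 chips). Higher per-pin bandwidth helps in the same way: a narrower bus, and therefore fewer chips per bank, can deliver the same bandwidth.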
13. Granularity Example
14. Granularity Example (2)
15. Improving Memory Chip Performance
Several techniques to get more bits/sec from a DRAM chip:
Allow repeated accesses to the row buffer without another row access time
burst mode, fast page mode, EDO mode, …
Simplify the DRAM-CPU interface
add a clock to reduce overhead of synchronizing with the controller
= synchronous DRAM (SDRAM)
Transfer data on both rising and falling clock edges
double data rate (DDR)
Each of the above adds a small amount of logic to exploit the high internal DRAM bandwidth
16. Basic Mode of Operation
Slowest mode
Uses only single row and column address
Row access is slow (60-70ns) compared to column access (5-10ns)
Leads to three techniques for DRAM speed improvement
Getting more bits out of DRAM on one access given timing constraints
Pipelining the various operations to minimize total time
Segmenting the data in such a way that some operations are eliminated for a given set of accesses
17. Nibble (or Burst) Mode
Several consecutive columns are accessed
Only first column address is explicitly specified
Rest are internally generated using a counter
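A minimal Python sketch of the internal column counter: the controller supplies only the first column address, and the chip generates the rest. The burst length of 4 (a "nibble") and wrap-around at the end of the row are assumptions for illustration; real parts define their own burst ordering, and `burst_columns` is a hypothetical name.

```python
from typing import List


def burst_columns(start_col: int, burst_len: int, row_width: int) -> List[int]:
    """Column addresses for one burst: only start_col comes from the controller."""
    if not 0 <= start_col < row_width:
        raise ValueError("start column outside the row")
    cols = []
    counter = start_col                       # loaded from the one explicit column address
    for _ in range(burst_len):
        cols.append(counter)
        counter = (counter + 1) % row_width   # assumed wrap-around at the end of the row
    return cols


print(burst_columns(5, 4, 1024))      # [5, 6, 7, 8]
print(burst_columns(1022, 4, 1024))   # [1022, 1023, 0, 1]
```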
18. Fast Page Mode
Accesses arbitrary columns within the same row
Static column mode is similar
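The benefit of keeping a row open can be estimated with a simple cost model. This Python sketch charges a row access plus a column access for every reference in basic mode, but only a column access while the row stays the same in fast page mode. It ignores precharge and other timing parameters; 65 ns and 8 ns are example values picked from the ranges on the "Basic Mode of Operation" slide, and `access_time_ns` is a hypothetical helper.

```python
from typing import Iterable, Optional, Tuple


def access_time_ns(accesses: Iterable[Tuple[int, int]], t_row: int, t_col: int,
                   page_mode: bool) -> int:
    """Total time for a sequence of (row, column) accesses under a simple cost model."""
    open_row: Optional[int] = None
    total = 0
    for row, _col in accesses:
        if page_mode and row == open_row:
            total += t_col              # row already in the row buffer
        else:
            total += t_row + t_col      # (re)open the row, then select the column
            open_row = row
    return total


# Eight arbitrary columns in row 3.
refs = [(3, c) for c in (7, 2, 9, 0, 5, 5, 1, 4)]
print(access_time_ns(refs, 65, 8, page_mode=False))  # 584
print(access_time_ns(refs, 65, 8, page_mode=True))   # 129
```

The saving comes entirely from skipping repeated row accesses, so it disappears when consecutive references fall in different rows.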
19. EDO Mode
Arbitrary column addresses
Pipelined
EDO = Extended Data Out
Has other modes like “burst EDO”, which allows reading of a fixed number of bytes starting with each specified column address
20. Evolutionary DRAM Architectures
SDRAM (Synchronous DRAM)
Interface retains a good part of conventional DRAM interface
addresses multiplexed in two halves
separate data pins
two control signals
All address, data, and control signals are synchronized with an external clock (100-150 MHz)
Allows decoupling of processor and memory
Allows pipelining a series of reads and writes
Peak speed per memory module: 800-1200 MB/sec
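The 800-1200 MB/sec figure follows from the clock rate if one assumes a 64-bit (8-byte) module data path and one transfer per clock. A short Python check; `peak_bandwidth_mb_s` is a hypothetical helper, and setting `transfers_per_clock=2` models the double-data-rate idea (transfers on both clock edges).

```python
def peak_bandwidth_mb_s(clock_mhz: float, bus_bytes: int = 8, transfers_per_clock: int = 1) -> float:
    """Peak transfer rate in MB/s (1 MB = 10**6 bytes)."""
    return clock_mhz * bus_bytes * transfers_per_clock


for mhz in (100, 150):
    print(f"SDRAM @ {mhz} MHz: {peak_bandwidth_mb_s(mhz):.0f} MB/s")   # 800, then 1200
print(f"DDR @ 100 MHz: {peak_bandwidth_mb_s(100, transfers_per_clock=2):.0f} MB/s")  # 1600
```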
21. Revolutionary DRAM Architectures
Examples
RDRAM (Rambus DRAM)
MDRAM (MoSys DRAM)
Salient features
Many smaller memory banks interleaved on one chip
“Protocol based” architecture
Narrow, fully multiplexed communication protocol
Example: RAMBUS (RDRAM, DRDRAM)
Each chip is more like a memory system than a component
Interleaved memory and a high-speed interface
Packet-switched bus (split transaction bus)
Chip can return variable #bytes from a single request, performs own reset, transfers on both clock edges
Narrow bus (1-2 data bytes)
Up to 3 transactions can be done concurrently
Internally, 72-bit wide bus with 5 ns cycle time
Up to 1600 Mbps peak bandwidth
Expensive!
22. Achieving Higher Memory Bandwidth
23. Memory Interleaving
Goal: take advantage of the combined bandwidth of multiple DRAMs in the memory system
Memory address A is converted into (b,w) pair, where
b = bank index
w = word index within bank
Logically a wide memory
Accesses to B banks staged over time to share internal resources such as memory bus
Interleaving can be on
Low-order bits of address (cyclic)
b = A mod B, w = A div B
High-order bits of address (block)
Combination of the two (block-cyclic)
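The three mappings from address A to the (b, w) pair can be written as small Python functions. The names, and the example parameters (B = 4 banks of 8 words, block size 2), are hypothetical choices for illustration.

```python
from typing import Tuple


def cyclic(addr: int, banks: int) -> Tuple[int, int]:
    """Low-order interleaving: b = A mod B, w = A div B."""
    return addr % banks, addr // banks


def block(addr: int, words_per_bank: int) -> Tuple[int, int]:
    """High-order interleaving: consecutive addresses fill one bank before the next."""
    return addr // words_per_bank, addr % words_per_bank


def block_cyclic(addr: int, banks: int, block_size: int) -> Tuple[int, int]:
    """Blocks of block_size consecutive words are dealt to the banks round-robin."""
    blk, offset = divmod(addr, block_size)
    bank = blk % banks
    word = (blk // banks) * block_size + offset
    return bank, word


B = 4
for a in range(12):
    print(a, cyclic(a, B), block(a, 8), block_cyclic(a, B, 2))
```

With `block_size=1` the block-cyclic mapping reduces to the cyclic one, and with `block_size` equal to the bank size it reduces to the block mapping. Cyclic interleaving sends consecutive words to different banks, which is what lets sequential accesses overlap.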
24. Low-order Bit Interleaving
25. Mixed Interleaving
Memory address register is 6 bits wide
Most significant 2 bits give bank address
Next 3 bits give word address within bank
LSB gives (parity of) module within bank
$6 = 000110_2 = (00, 011, 0) = (0, 3, 0)$
$41 = 101001_2 = (10, 100, 1) = (2, 4, 1)$
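The field extraction for this 6-bit address can be sketched with shifts and masks in Python; `decode_mixed` is a hypothetical name, and the two prints reproduce the slide's examples.

```python
from typing import Tuple


def decode_mixed(addr: int) -> Tuple[int, int, int]:
    """Split a 6-bit address into (bank, word within bank, module within bank)."""
    if not 0 <= addr < 64:
        raise ValueError("address must fit in 6 bits")
    bank = (addr >> 4) & 0b11     # 2 most significant bits
    word = (addr >> 1) & 0b111    # next 3 bits
    module = addr & 0b1           # least significant bit
    return bank, word, module


print(decode_mixed(6))    # (0, 3, 0)
print(decode_mixed(41))   # (2, 4, 1)
```

Because the module is chosen by the lowest bit, consecutive addresses alternate between the two modules of a bank (low-order interleaving), while the top bits select banks in large blocks (high-order interleaving).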
26. Other types of Memory
ROM = Read-only Memory
Flash = ROM which can be written once in a while
Used in embedded systems, small microcontrollers
Offer IP protection, security
Other?