Chapter 5-1 Memory System • Memory System • Next Lecture: Cache Memory
Basic Concept [Figure: Connection of the memory to the processor: k-bit address bus from MAR, n-bit data bus to MDR, up to 2^k addressable locations, word length = n bits, control lines (R/W*, MFC, etc.)] • The maximum size of the memory that can be used is determined by the addressing scheme • Example: a 32-bit computer that generates 32-bit addresses is capable of addressing 2^32 = 4G memory locations (bytes, in a byte-addressable machine)
Basic Concept (cont.) • Main memory is used to store and retrieve data in word-length units • Definition of word length: the number of bits stored and retrieved in one memory access • Many current CPUs may be able to retrieve 64 bits (Pentium 4) or even 128 bits in one clock cycle to fill their internal cache • Most modern computers are byte-addressable • Example: a possible address assignment for a byte-addressable 32-bit computer using the big-endian arrangement
Basic Concept (cont.) • When a 32-bit address is sent from the CPU to memory • The high-order 30 bits (A2-A31) determine which word will be accessed • The low-order 2 bits (A0-A1) specify which byte within the word • In a read operation, the other bytes of the word may be fetched as well • If the operation is a byte write, the control circuitry of the memory (called the memory controller) must ensure that the contents of the other bytes of the same word are not changed [Figure: Byte addressing in a 32-bit memory: A2-A31 select the word; A0-A1 drive a 2-to-4 decoder that generates the byte selects; four 8-bit byte lanes on data lines D0-D31; big-endian byte addresses by word: 0, 1, 2, 3 | 4, 5, 6, 7 | 8, 9, a, b | c, d, e, f]
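To make the address split concrete, here is a minimal Python sketch for the 32-bit byte-addressable case above. It assumes 4-byte words and that D31 is the most significant data line, so under big-endian ordering byte 0 of a word travels on D31-D24. The function names are illustrative, not taken from the slides.

```python
def split_byte_address(addr: int) -> tuple[int, int]:
    """Split a 32-bit byte address into (word index, byte offset).

    A2-A31 (the high-order 30 bits) select the 32-bit word;
    A0-A1 (the low-order 2 bits) select the byte within that word.
    """
    if not 0 <= addr < 2**32:
        raise ValueError("address must fit in 32 bits")
    word_index = addr >> 2      # drop A0-A1, keep A2-A31
    byte_offset = addr & 0b11   # keep only A0-A1
    return word_index, byte_offset


def big_endian_byte_lane(byte_offset: int) -> tuple[int, int]:
    """Return (msb, lsb) data-line positions for a byte, big-endian.

    Assumption: D31 is the most significant line, so byte 0 (the most
    significant byte in big-endian order) occupies D31-D24.
    """
    msb = 31 - 8 * byte_offset
    return msb, msb - 7
```

For example, byte address 0xD lies in word 3 at byte offset 1, which this sketch places on lines D23-D16.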
Basic Concept (cont.) • If MAR is k bits long and MDR is n bits long, then • the memory may contain up to 2^k addressable locations • during a memory cycle, n bits of data are transferred between the memory and the CPU • The transfer takes place over the processor bus, which has k address lines and n data lines • The bus also includes control lines: Read, Write, etc. • In a byte-addressable computer, one or two additional control lines are needed to indicate when only a byte, rather than a full word of n bits, is to be transferred
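A small sketch of how the widths of MAR (k) and MDR (n) bound the memory and the bus. The function name and dictionary keys are illustrative only.

```python
def memory_parameters(k: int, n: int) -> dict[str, int]:
    """Derive basic memory parameters from MAR width k and MDR width n."""
    return {
        "addressable_locations": 2 ** k,  # at most 2^k distinct addresses
        "address_lines": k,               # one bus line per MAR bit
        "data_lines": n,                  # one bus line per MDR bit
        "bits_per_transfer": n,           # bits moved in one memory cycle
    }


# The 32-bit example from the first slide: 2^32 = 4G addressable locations.
params = memory_parameters(32, 32)
print(params["addressable_locations"] // 2**30, "G locations")
```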
Memory Access Time • A useful measure of the speed of memory units is the time that elapses between the initiation of an operation and the completion of the operation • Memory Access Time: the time between the Read signal and the MFC (memory function complete) signal from the memory • Memory Cycle Time: minimum time delay between the initiation of successive memory operations • The cycle time is typically slightly longer than the access time • Main memories are usually implemented using RAMs (random access memory) • CPU can usually process instructions and data faster than they can be fetched from a reasonably priced memory unit • The memory cycle time is the bottleneck in the system
Desirable Memory System • Fast access time • Larger size
Possible Speed and Size Enhancements • Cache memory • Memory interleaving: • Divides the memory into a number of memory modules • Arranges addressing such that successive words in the address space are placed in different modules, which relaxes the cycle-time constraint • If requests for memory access tend to involve consecutive addresses, the accesses will be sent to different modules • Increases the average rate of fetching words from the main memory
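Interleaved Memory Modules [Figure: Two interleaved modules: A3-A31 address a word within a module; A2 drives a 1-to-2 decoder that selects the module; module 0 holds byte addresses 0, 8, 10, 18; module 1 holds byte addresses 4, c, 14, 1c] • Assume: memory access time (MAT) is 10 ns and memory cycle time (MCT) is 12 ns • If consecutive words are always fetched, • What is the speed difference between a system without interleaving and a system with two interleaved modules?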
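One way to reason about the exercise is a simple timing model. The sketch below is hedged: it assumes word i is stored in module i mod m, a module can begin a new access one cycle time after its previous access started, accesses to different modules overlap fully, and the bus never becomes a bottleneck. Under those assumptions it reports when the last of N consecutive words becomes available.

```python
def fetch_time(num_words: int, modules: int, access_ns: float, cycle_ns: float) -> float:
    """Time (ns) until the last of num_words consecutive words is available.

    Model assumptions (not from the slides):
      * word i lives in module i % modules (low-order interleaving);
      * a module can start a new access cycle_ns after its previous start;
      * accesses to different modules overlap completely (no bus contention).
    """
    next_free = [0.0] * modules  # earliest start time for each module
    finish = 0.0
    for i in range(num_words):
        m = i % modules
        start = next_free[m]
        next_free[m] = start + cycle_ns          # module busy for one cycle
        finish = max(finish, start + access_ns)  # data ready after access time
    return finish


no_interleave = fetch_time(8, 1, access_ns=10, cycle_ns=12)
two_modules = fetch_time(8, 2, access_ns=10, cycle_ns=12)
print(no_interleave, two_modules, no_interleave / two_modules)
```

With MAT = 10 ns, MCT = 12 ns and 8 words, this model gives 94 ns without interleaving and 46 ns with two modules; as N grows the ratio approaches 2. A bus that can carry only one transfer at a time would narrow the gain.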
Internal Organization of Memory Chips (16 x 8 bits) [Figure: Organization of a 16 x 8 memory chip: an address decoder driven by A0-A3 selects one of the word lines W0-W15; each column of memory cells connects through bit lines b and b* to a Sense/Write circuit; the Sense/Write circuits are controlled by R/W and CS and drive the data input/output lines b7 through b0]
Internal Organization of Memory Chips • Memory cells are usually organized in the form of an array • Each cell of the array is capable of storing one bit of information • Each row of cells constitutes a word • All cells of a row are connected to a common line: word line • A word line is driven by the address decoder on the chip • Cells in each column are connected to a Sense/Write circuit by two bit lines
Internal Organization of Memory Chips • The sense/write circuits are connected to the data input/output lines of the chip • During a read operation, the sense/write circuits read the information stored in the cells selected by a word line and transmit the information to the output data lines • During a write, the sense/write circuits receive input information and store it into the cells of the selected word • The data input and data output of each sense/write circuit are connected to a single bidirectional data line in order to reduce the number of pins required • Control line R/W* specifies the direction of the transfer • Control line CS selects a given chip in a multichip memory system
Internal Organization of Memory Chips • Consider a memory block that has 1K memory cells • Its circuitry can be organized as 128 x 8, requiring a total of 19 external connections • Or it can be organized as 1K x 1, which uses 16 pins even if separate pins are provided for the data input and data output lines • The required 10-bit address is divided into two groups • 5 bits are needed to specify a row of 32 cells • All the cells in that row are accessed in parallel • The remaining 5 bits select a cell in the row • Given the column address, only one of the cells is connected to the external data lines by the input and output multiplexers
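The pin counts above can be checked with the following sketch. It assumes the usual accounting: address lines, plus data lines, plus two power pins (supply and ground), plus two control pins (R/W* and CS). It also shows the 5-bit row / 5-bit column split of the 1K x 1 chip. The names are illustrative.

```python
def pin_count(words: int, bits_per_word: int, separate_data_io: bool = False) -> int:
    """External connections: address + data + 2 power + 2 control (R/W*, CS)."""
    address_lines = (words - 1).bit_length()  # bits needed to number the words
    data_lines = bits_per_word * (2 if separate_data_io else 1)
    return address_lines + data_lines + 2 + 2


def row_column(address: int, column_bits: int = 5) -> tuple[int, int]:
    """Split a 10-bit cell address into (row, column) for a 32 x 32 array."""
    row = address >> column_bits                  # high-order 5 bits pick a word line
    column = address & ((1 << column_bits) - 1)   # low-order 5 bits drive the 32-to-1 mux
    return row, column


print(pin_count(128, 8))                         # 128 x 8 organization
print(pin_count(1024, 1, separate_data_io=True)) # 1K x 1 with separate data in/out
```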
Organization of a 1K x 1 Memory Chip [Figure: The 10-bit address is split into a 5-bit row address, decoded to drive word lines W0-W31 of a 32 x 32 memory cell array, and a 5-bit column address, which controls a 32-to-1 output multiplexer and input demultiplexer; Sense/Write circuitry, R/W and CS controls, and a single data input/output line]
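Static Memories • Consist of circuits that are capable of retaining their state as long as power is applied: SRAM [Figure: A static RAM cell: two cross-coupled inverters with nodes X and Y, connected to bit lines b and b* through transistors T1 and T2, which are controlled by the word line]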
SRAM Operation • Two inverters are cross-connected to form a latch, which is connected to the two bit lines b and b* through transistors T1 and T2 • These transistors act as switches that can be opened or closed under the control of the word line • When the word line is at the ground level, the transistors are off and the latch retains its state • Example: assuming the cell is in state “1”, if the logic value at point X is 1 and at point Y is 0, this state is maintained as long as the signal on the word line is at the ground level
SRAM Operation • Read operation • The word line is activated to close switches T1 and T2 • If the cell is in state “1”, the signal on bit line b becomes high and the signal on bit line b* becomes low • The opposite is true if the cell is in state “0” • Write operation • The state of the cell is set by placing the appropriate value on bit line b and its complement on b* and then activating the word line • This forces the cell into the corresponding state
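A behavioral sketch of the read and write sequences above, ignoring all electrical detail. The class and method names are hypothetical; the latch nodes X and Y are modeled as complementary bits.

```python
from typing import Optional, Tuple


class SRAMCell:
    """Behavioral model of the SRAM cell: state "1" means X = 1, Y = 0."""

    def __init__(self, state: int = 0) -> None:
        self.x = state
        self.y = 1 - state

    def read(self, word_line: bool) -> Optional[Tuple[int, int]]:
        # Word line at ground: T1 and T2 are off, the bit lines see nothing.
        if not word_line:
            return None
        # T1 and T2 closed: b follows X and b* follows Y.
        return self.x, self.y

    def write(self, b: int, word_line: bool) -> None:
        # b is placed on bit line b and its complement on b*; activating
        # the word line forces the latch into the corresponding state.
        if word_line:
            self.x, self.y = b, 1 - b


cell = SRAMCell(1)
print(cell.read(True))   # bit lines (b, b*) while the cell holds "1"
cell.write(0, True)
print(cell.read(True))
```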
Dynamic Memories • SRAMs are fast but expensive because each cell requires six transistors • Less expensive RAMs can be implemented if simpler cells are used • However, these cells do not retain their state indefinitely • In dynamic memory, information is stored as a charge on a capacitor • A DRAM cell can retain information for only a few milliseconds • Its contents must be periodically refreshed by restoring the capacitor charge to its full value
DRAM Operation • To store information in a cell, transistor T is turned on and the appropriate voltage is applied to the bit line • If the bit line is high, this causes a known amount of charge to be stored on the capacitor • After the transistor is turned off, the capacitor begins to discharge due to leakage • Information stored in the cell can be retrieved correctly only if it is read before the charge on the capacitor drops below some threshold value recognized as “1”
DRAM Operation • During a read, the bit line is placed in a high-impedance state and the transistor is turned on • A sense circuit connected to the bit line determines whether the charge on the capacitor is above or below the threshold value • The read operation discharges the capacitor in the cell • To retain the information stored in the cell, DRAMs include special circuitry, called the Sense/Write circuit, that writes back the value read • Thus, a cell is refreshed every time it is read • In fact, all cells connected to a given word line are refreshed whenever this word line is activated
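The following toy model illustrates why a read refreshes a DRAM cell. The threshold, the linear leakage rate and the class name are hypothetical simplifications (real leakage is not linear); only the behavior, decay below a threshold and write-back after a read, follows the text.

```python
class DRAMCell:
    """Toy one-transistor DRAM cell; charge is a fraction of full charge."""

    def __init__(self, threshold: float = 0.5, leak_per_ms: float = 0.05) -> None:
        self.charge = 0.0
        self.threshold = threshold      # hypothetical sense threshold for "1"
        self.leak_per_ms = leak_per_ms  # hypothetical linear leakage rate

    def write(self, bit: int) -> None:
        # Bit line high charges the capacitor fully; low discharges it.
        self.charge = 1.0 if bit else 0.0

    def elapse(self, ms: float) -> None:
        # Transistor off: the capacitor slowly leaks charge.
        self.charge = max(0.0, self.charge - self.leak_per_ms * ms)

    def read(self) -> int:
        bit = 1 if self.charge >= self.threshold else 0
        # Reading discharges the cell, so the sense/write circuit writes the
        # sensed value back; this is what refreshes the cell.
        self.write(bit)
        return bit


refreshed = DRAMCell()
refreshed.write(1)
refreshed.elapse(9)
print(refreshed.read())   # read in time: value recovered and restored
refreshed.elapse(9)
print(refreshed.read())   # still correct because the first read refreshed it
```

A cell left unread for the same 18 ms would decay below the threshold and read back as 0.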
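DRAM Example and Operation [Figure: Internal organization of a 2M x 8 dynamic memory chip: the row address (A20-A9, loaded into the row address latch by RAS) feeds a row decoder that selects one row of a 4096 x (512 x 8) cell array; the column address (A8-A0, loaded into the column address latch by CAS) feeds a column decoder that selects one group of eight Sense/Write circuits; CS and R/W controls; data lines D7-D0]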
DRAM Example and Operation • A 16-megabit DRAM organized as 2M x 8 • The cells are organized in the form of a 4K x 4K array; the high-order 12 bits and the low-order 9 bits of the 21-bit address constitute the row and column addresses of a cell (each row of 4096 cells holds 512 groups of 8 bits) • To reduce the number of pins needed for external connections, the row and column addresses are multiplexed • During a read or write operation, the row address is applied first • It is loaded into the row address latch in response to a pulse on the RAS signal
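A sketch of the address split for the 2M x 8 chip, assuming the 21-bit address is divided exactly as described (row = A20-A9, column = A8-A0). Because the row part is the wider one, multiplexing lets both halves share 12 address pins. The constant and function names are illustrative.

```python
ROW_BITS, COL_BITS = 12, 9  # 4096 rows; 512 groups of 8 cells per row


def dram_row_col(address: int) -> tuple[int, int]:
    """Split a 21-bit address of the 2M x 8 chip into (row, column).

    The row (A20-A9) is latched on RAS and the column (A8-A0) on CAS;
    both are sent over the same address pins, one after the other.
    """
    if not 0 <= address < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("address must fit in 21 bits")
    row = address >> COL_BITS
    column = address & ((1 << COL_BITS) - 1)
    return row, column


address_pins = max(ROW_BITS, COL_BITS)  # multiplexed pins needed
print(address_pins, dram_row_col((1 << 21) - 1))
```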
DRAM Read/Write Operation • The read operation is initiated, in which all the cells on the selected row are read and refreshed • The column address is applied to the address pins and loaded into the column latch controlled by a pulse on CAS signal • The information in the column latch is decoded and the appropriate sense/write circuit is selected • If the R/W* indicates a read, the outputs of the selected circuit are transferred to the data pins (D0-D7) • If the R/W* indicates a write, the data at the data inputs are transferred to the selected circuits; this information is used to overwrite the contents of the selected cells in the corresponding column
DRAM Refresh Operation • To ensure that the contents of a DRAM are maintained, each row of cells must be accessed periodically • Typically once every 2 to 16 milliseconds • Refresh circuitry can perform this function automatically • Some dynamic memory chips incorporate a refresh facility within the chips themselves • Because of their high density and low cost, dynamic memories are widely used in the main memory units of computers
Enhancement • Consider an application in which a number of memory locations at successive addresses are to be accessed • Assume that the cells are all on the same row • It is only necessary to load the row address once • Different column addresses can then be loaded during successive memory cycles • The rate at which such block transfers can be carried out is typically about double that for transfers involving random addresses • This faster rate can be exploited in transferring a data block from memory to the cache
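A rough timing comparison, using hypothetical parameters: t_row for loading the row address and reading the row, and t_col for each column access. These numbers are placeholders, not chip specifications.

```python
def block_transfer_ns(num_bytes: int, t_row_ns: float, t_col_ns: float,
                      same_row: bool) -> float:
    """Time to access num_bytes successive locations (hypothetical model).

    same_row=True: the row address is loaded once, then only columns change.
    same_row=False: every access pays for both the row and the column phase.
    """
    if same_row:
        return t_row_ns + num_bytes * t_col_ns
    return num_bytes * (t_row_ns + t_col_ns)


fast = block_transfer_ns(8, t_row_ns=30, t_col_ns=30, same_row=True)
slow = block_transfer_ns(8, t_row_ns=30, t_col_ns=30, same_row=False)
print(fast, slow, slow / fast)
```

When t_row and t_col are about equal, the same-row case approaches twice the rate of the random-address case as the block grows, which is consistent with the "about double" figure above.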
Memory System Design Considerations • The choice of a RAM depends on several factors: speed, power dissipation, size • SRAMs are generally used when very fast operation is the primary requirement • DRAMs are the predominant choice for implementing computer main memories, because their high density and low cost make large memories economically feasible
Example Configuration: 64K x 8 Memory Using 16K x 1 Chips • Consider a small static memory consisting of 64K words of 8 bits each • 16K x 1 chips are used • The address bus is 16 bits wide • The high-order 2 bits of the address are decoded to obtain the four chip select control signals • The remaining 14 address bits are used to access specific locations inside each chip of the selected row • The R/W* inputs of all chips are tied together to provide a common R/W* control
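A sketch of the arithmetic behind this configuration: the chip count follows from the two memory sizes, and the decode splits a 16-bit address into a 2-bit chip-select field and a 14-bit internal address. Variable and function names are illustrative.

```python
CHIP_WORDS, CHIP_BITS = 16 * 1024, 1  # 16K x 1 chips
MEM_WORDS, MEM_BITS = 64 * 1024, 8    # 64K x 8 memory

chip_rows = MEM_WORDS // CHIP_WORDS      # rows of chips needed for 64K words
chips_per_row = MEM_BITS // CHIP_BITS    # chips side by side for 8 bits
total_chips = chip_rows * chips_per_row


def decode(address: int) -> tuple[int, int]:
    """Map a 16-bit address to (chip-select row, address inside each chip)."""
    chip_select = address >> 14   # high-order 2 bits: one of four CS lines
    internal = address & 0x3FFF   # low-order 14 bits: location within a chip
    return chip_select, internal


print(chip_rows, chips_per_row, total_chips)
```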
Memory Configuration • Consider a large dynamic memory with an organization similar to the previous figure • The control circuitry differs in three respects • The row and column parts of the address for each chip have to be multiplexed • A refresh circuitry is needed • The timing of various steps of a memory cycle must be carefully controlled
Memory Configuration Example • DRAM chips and the required control circuitry for a 16M-byte dynamic memory unit • DRAM chips are arranged in a 4 x 8 array • The individual chips have a 1M x 4 organization • The array has a total storage capacity of 4M words of 32 bits • The memory unit is assumed to be connected to an asynchronous memory bus that has • 22 address lines (ADRS21-0) • 32 data lines (DATA31-0) • Two handshake signals: memory request and MFC • A Read/Write* line to indicate the type of memory cycle requested
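A sketch of the capacity arithmetic and address decoding for this unit. It assumes that ADRS21-20 select one of the four rows of chips, while ADRS19-10 and ADRS9-0 form the row and column addresses multiplexed into each 1M x 4 chip. The names are illustrative.

```python
CHIP_ROWS, CHIPS_PER_ROW = 4, 8  # 4 x 8 array of 1M x 4 chips
CHIP_WORDS, CHIP_BITS = 2**20, 4

WORD_BITS = CHIPS_PER_ROW * CHIP_BITS   # bits delivered per word
WORDS = CHIP_ROWS * CHIP_WORDS          # words in the whole array
BYTES = WORDS * WORD_BITS // 8          # total capacity in bytes


def decode_adrs(adrs: int) -> dict[str, int]:
    """Split the 22-bit bus address ADRS21-0 (assumed field layout)."""
    return {
        "chip_row": (adrs >> 20) & 0x3,  # ADRS21-20: which row of chips
        "row": (adrs >> 10) & 0x3FF,     # ADRS19-10: sent first, with RAS
        "column": adrs & 0x3FF,          # ADRS9-0: sent second, with CAS
    }


print(WORD_BITS, WORDS // 2**20, "M words,", BYTES // 2**20, "M bytes")
```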
DRAM Read Cycle • The CPU activates the address, the Read/Write* and the memory request lines • The access control block recognizes the request when the memory request signal becomes active • It sets the start signal to 1 • The timing control block responds by activating the Busy signal to prevent the access control from accepting new requests before the cycle ends • The timing control loads the row and column addresses into the memory chip by activating RAS and CAS lines
During this time, it uses the Row/Column* line to select first the row address ADRS19-10, followed by the column address ADRS9-0 • After obtaining the row and column parts of the address, the selected memory chips place the contents of the requested bit cells on their data outputs • The timing control block then activates MFC • At the end of the memory cycle, the Busy signal is deactivated and the access unit becomes free to accept new requests
The refresh control block periodically generates refresh requests • The access control block indicates to the refresh control block that it may proceed with a refresh operation by activating the refresh grant line • The access control block arbitrates between memory access requests and refresh requests • If memory access requests and refresh requests arrive simultaneously, refresh requests are given priority to ensure that stored information is not lost • When the refresh control block receives the refresh grant signal, it activates the refresh line
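The arbitration policy can be summarized as a small decision function. This is a hypothetical sketch of the access control block's choice at one decision point, not a description of a specific circuit.

```python
from typing import Optional


def arbitrate(cpu_request: bool, refresh_request: bool, busy: bool) -> Optional[str]:
    """Return "refresh", "access", or None for one arbitration point.

    Refresh wins ties so that stored information is never lost, and no new
    cycle starts while the timing control block reports Busy.
    """
    if busy:
        return None
    if refresh_request:
        return "refresh"  # corresponds to activating the refresh grant line
    if cpu_request:
        return "access"   # corresponds to setting the start signal
    return None


print(arbitrate(cpu_request=True, refresh_request=True, busy=False))
```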
This causes the address multiplexer to select the refresh counter as the source of the row address • The contents of the counter are loaded into the row latches of all memory chips when the RAS is activated • During this time, the R/W* may indicate a write • We must ensure that this does not cause new information to be loaded into any cells during the refresh • The decoder block can deactivate all CS lines to prevent memory chips from responding to R/W*
DRAM Refresh Cycle • The rest of the refresh cycle is the same as a normal read cycle • At the end of the refresh, the refresh control block increments the refresh counter in preparation for the next refresh cycle • The CPU and the refresh circuit compete for access to memory • The refresh circuit must be given priority • The response of the memory to a request from the CPU or DMA may be delayed if refresh is in progress • During a refresh operation, all memory rows may be refreshed in succession before the memory is returned to normal use (burst refresh mode) • An alternative interleaves refresh operations on successive rows with accesses from memory bus
Refresh Overhead • Consider a memory array with 1M x 1 chips • Each chip contains a cell array organized as 1024 x 1024 x 1 • There are 1024 rows with 1024 bits per row • Assume it takes 130 ns to refresh one row • Each row must be refreshed once every 16 ms • Time needed to refresh all rows in the chip = 1024 × 130 ns ≈ 0.133 ms • In burst mode, all rows are refreshed in succession • Refresh overhead = 0.133 ms / 16 ms ≈ 0.0083, so less than 1% of the memory cycles are used for refresh operations
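The overhead figure can be reproduced with a few lines of arithmetic, using exactly the parameters stated above.

```python
ROWS = 1024             # rows per chip (1024 x 1024 x 1 array)
ROW_REFRESH_NS = 130    # time to refresh one row
REFRESH_PERIOD_MS = 16  # every row must be refreshed within this period

refresh_all_ms = ROWS * ROW_REFRESH_NS / 1e6   # ns -> ms for all rows
overhead = refresh_all_ms / REFRESH_PERIOD_MS  # fraction of time spent refreshing

print(f"{refresh_all_ms:.3f} ms per {REFRESH_PERIOD_MS} ms -> {overhead:.2%} overhead")
```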
Refresh Overhead • There is an apparent increase in the access time of the memory when a request arrives while a refresh operation is in progress • The variability in access time is easily accommodated with an asynchronous bus • In the case of a synchronous bus, it may be possible to hide a refresh cycle within the early part of the bus cycle (if sufficient time remains after the refresh to carry out a read or a write) • Alternatively, the refresh circuit may request bus cycles in the same manner as any device with DMA capability
Read Only Memories [Figure: A possible configuration of a ROM cell] • Logic value 0 is stored in the cell if the transistor is connected to ground at point P; otherwise, a 1 is stored • To read the state of the cell, the word line is activated • The transistor switch is closed and the voltage on the bit line drops to near zero if there is a connection between the transistor and ground • If there is no connection to ground, the bit line remains at the high voltage, indicating a 1 • A sense circuit at the end of the bit line generates the proper output value • Data are written into a ROM when it is manufactured
PROM (Programmable ROM) • Some ROMs allow data to be loaded by the user • Programmability is achieved by inserting a fuse at point P • Before it is programmed, the memory contains all 0s • The user can insert 1s at specific locations by burning the fuses with high current pulses • PROMs provide flexibility and convenience • ROMs are economically attractive for high volumes • The cost of preparing the mask for storing specific information into a ROM makes them expensive when only a small number are needed
EPROM, EEPROM, Flash ROM • EPROMs (Erasable reProgrammable ROM) • Allow stored data to be erased and new data to be loaded • Provide flexibility during the development phase of a digital system • Erasure requires dissipating the charges trapped in the transistors • This can be done by exposing the chip to ultraviolet light • A disadvantage of EPROMs is that a chip must be physically removed from the circuit for reprogramming • EEPROMs (Electrically Erasable and reProgrammable ROM) • An alternative to EPROMs, they can be programmed and erased electrically • Cells in EEPROMs can be erased selectively • Disadvantage of EEPROMs: different voltages are needed for erasing, writing and reading stored data • Flash ROM: higher density, but erasable only block by block
Speed, Size and Cost • Very fast memory can be achieved by using SRAM chips • SRAMs are more expensive because their basic cell uses 6 transistors • It is impractical, because of cost, to build a large memory using SRAM chips • SRAMs can be used for cache memories • Alternative: DRAM chips • DRAMs are less expensive but significantly slower • A large, affordable main memory can be built with DRAMs • Very large disks are available at a reasonable price for secondary storage • A huge amount of storage can be provided by magnetic disks
DDR Memory • Double-Data-Rate Synchronous Dynamic Random Access Memory (DDR SDRAM) • 184 pins • 64-bit data width • Transfers data on both the rising and falling edges of the clock signal (double pumped) • This nearly doubles the transfer rate without increasing the frequency of the front side bus • Thus a 100 MHz DDR system has an effective transfer rate of 200 MHz (200 million transfers per second) • With data transferred 64 bits at a time, DDR RAM gives a transfer rate of (memory bus clock rate) × 2 (for double data rate) × 64 (number of bits transferred) / 8 (number of bits per byte). Thus, with a bus frequency of 100 MHz, DDR SDRAM gives a maximum transfer rate of 1600 MB/s (1.6 GB/s).
DDR Memory • PC-1600: DDR-SDRAM memory module specified to operate at 100 MHz using DDR-200 chips, 1.600 GByte/s bandwidth • PC-2100: DDR-SDRAM memory module specified to operate at 133 MHz using DDR-266 chips, 2.133 GByte/s bandwidth • PC-2700: DDR-SDRAM memory module specified to operate at 166 MHz using DDR-333 chips, 2.667 GByte/s bandwidth • PC-3200: DDR-SDRAM memory module specified to operate at 200 MHz using DDR-400 chips, 3.200 GByte/s bandwidth • PC-xxxx denotes the theoretical bandwidth (in MB/s), whereas DDR-xxx denotes the effective clock speed (in millions of transfers per second)
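The formula from the previous slide reproduces this table. One caveat, stated as an assumption of the sketch: the nominal 133 MHz and 166 MHz clocks are really 133 1/3 MHz and 166 2/3 MHz, which is why 400/3 and 500/3 are used below. The function name is illustrative.

```python
def ddr_peak_mb_per_s(bus_mhz: float, bus_width_bits: int = 64,
                      transfers_per_clock: int = 2) -> float:
    """Peak rate in MB/s: clock x transfers per clock x width (bits) / 8."""
    return bus_mhz * transfers_per_clock * bus_width_bits / 8


modules = [("PC-1600", 100), ("PC-2100", 400 / 3),
           ("PC-2700", 500 / 3), ("PC-3200", 200)]
for name, mhz in modules:
    print(name, round(ddr_peak_mb_per_s(mhz)), "MB/s")
```

The rounded results match the numbers in the module names, which is the sense in which PC-xxxx denotes bandwidth.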
DDR2 Memory • Double-Data-Rate Two Synchronous Dynamic Random Access Memory (DDR2 SDRAM) • 240 pins, 64-bit data width • Transfers data on both the rising and falling edges of the clock • Electrical interface improvements, on-die termination, prefetch buffers and off-chip drivers have further boosted the clock frequency • The key difference between DDR and DDR2 is that in DDR2 the bus is clocked at twice the speed of the memory cells, allowing transfers from two different cells to occur in the same memory cell cycle. Thus, without speeding up the memory cells themselves, DDR2 can effectively operate at twice the bus speed of DDR. • Note: the latency of each cell may be the same as in DDR
DDR2 Memory • DDR2's bus frequency is boosted by electrical interface improvements, on-die termination, prefetch buffers and off-chip drivers • However, latency is greatly increased as a trade-off because the cells take twice as long (in terms of bus cycles) to produce a result, and additional buffering adds yet more delay. While DDR SDRAM has typical read latencies of between 2 and 3 bus cycles, DDR2 may have read latencies between 3 and 9 cycles.
Dual Channel • Dual-channel DDR/DDR2 SDRAM architecture describes a motherboard technology that effectively doubles data throughput from RAM to the memory controller • Dual-channel-enabled memory controllers use two 64-bit data channels, giving a combined 128-bit data path, to move data from RAM to the CPU
DDR3 Memory • Double-Data-Rate 3 Synchronous Dynamic Random Access Memory (DDR3 SDRAM) • The memory comes with a promise of a power consumption reduction of 40% compared to current commercial DDR2 modules, due to DDR3's 90nm fabrication technology, allowing for lower operating currents and voltages (1.5 V, compared to DDR2's 1.8 V or DDR's 2.5 V). • "Dual-gate" transistors will be used to reduce leakage of current. • PC3-6400: DDR3-SDRAM specified to run at 400 MHz using DDR3-800 chips, 6.40 GB/s bandwidth • PC3-8500: DDR3-SDRAM run at 533 MHz using DDR3-1066 chips, 8.53 GB/s bandwidth • PC3-10600: DDR3-SDRAM run at 667 MHz using DDR3-1333 chips, 10.67 GB/s bandwidth
Prefetch Buffer • The prefetch buffer is a memory cache located on modern RAM modules which stores data before it is actually needed. • The width (or burst length) of the prefetch buffer is increased with each successive standard of modern DDR SDRAM modules • DDR SDRAM's prefetch buffer width is 2-bit. • DDR2 SDRAM's prefetch buffer width is 4-bit. • DDR3 SDRAM's prefetch buffer width is 8-bit.
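Prefetch depth ties the DDR generations together: per data pin, the data rate equals the memory-cell (core) clock times the prefetch depth. The sketch assumes a 100 MHz core clock, which is the core clock of DDR-200, DDR2-400 and DDR3-800 parts; the dictionary and function names are illustrative.

```python
PREFETCH = {"DDR": 2, "DDR2": 4, "DDR3": 8}  # bits prefetched per data pin


def data_rate_mt_s(generation: str, core_mhz: float) -> float:
    """Millions of transfers per second on each data pin.

    The cell array delivers PREFETCH bits per pin per core clock; the faster
    I/O bus then sends them out one after another.
    """
    return core_mhz * PREFETCH[generation]


def module_mb_s(generation: str, core_mhz: float, width_bits: int = 64) -> float:
    """Peak module bandwidth in MB/s for a width_bits-wide module."""
    return data_rate_mt_s(generation, core_mhz) * width_bits / 8


for gen in PREFETCH:
    print(gen, data_rate_mt_s(gen, 100), "MT/s,", module_mb_s(gen, 100), "MB/s")
```

For a 64-bit module, the DDR3 case gives 6400 MB/s, matching PC3-6400, while the memory cells themselves run no faster than in DDR-200.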