Unit 4: Memory System Design
Memory System • Two basic parameters determine memory system performance. • Access Time: The time for a processor request to be transmitted to the memory system, access a datum, and return it to the processor. (Depends on physical parameters such as bus delay and chip delay.) • Memory Bandwidth: The ability of the memory to respond to requests per unit of time. (Depends on the memory system organization, the number of memory modules, etc.)
Memory System Organization • A memory system consists of a number of memory banks, each made up of a number of memory modules, each capable of performing one memory access at a time. • Multiple memory modules in a memory bank share the same input and output buses. • In one bus cycle, only one module within a memory bank can begin or complete a memory operation. • The memory cycle time should be greater than the bus cycle time.
Memory System Organization • In systems with multiple processors or with complex single processors, multiple requests may occur at the same time, causing bus or network congestion. • Even in a single-processor system, requests arising from different buffered sources may target the same memory module, resulting in memory system contention that degrades the bandwidth.
Memory System Organization • The maximum theoretical bandwidth of the memory system is the number of memory modules divided by the memory cycle time. • The Offered Request Rate is the rate at which the processor would submit memory requests if memory had unlimited bandwidth. • The offered request rate and the maximum memory bandwidth together determine the maximum Achieved Memory Bandwidth.
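The relation between these three quantities can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names are hypothetical, and taking the smaller of the offered rate and the peak bandwidth as an upper bound on achieved bandwidth is an assumption that ignores contention.

```python
def max_bandwidth(num_modules: int, cycle_time_s: float) -> float:
    """Maximum theoretical bandwidth in requests per second:
    number of memory modules divided by the memory cycle time."""
    if num_modules <= 0 or cycle_time_s <= 0:
        raise ValueError("num_modules and cycle_time_s must be positive")
    return num_modules / cycle_time_s


def achieved_upper_bound(offered_rate: float, num_modules: int,
                         cycle_time_s: float) -> float:
    """Upper bound on achieved bandwidth (assumption: no contention).
    The memory cannot serve more than is offered, nor more than its peak."""
    return min(offered_rate, max_bandwidth(num_modules, cycle_time_s))


if __name__ == "__main__":
    # Illustrative numbers only: 8 modules, 100 ns memory cycle time.
    print(max_bandwidth(8, 100e-9))
    print(achieved_upper_bound(5e7, 8, 100e-9))
```

Contention between requests to the same module pushes the achieved bandwidth below this bound, which is what the memory models later in the unit estimate.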
Achieved vs. Offered Bandwidth Offered Request Rate: • Rate that processor(s) would make requests if memory had unlimited bandwidth and no contention
Memory System Organization • The offered request rate does not depend on the organization of the memory system. • It depends on the processor architecture, the instruction set, etc. • The analysis and modeling of a memory system depends on the number of processors that request service from the common shared memory system. • For this we use a model in which n simple processors access m independent modules.
Memory System Organization • Contention develops when multiple processors access the same module. • A single pipelined processor making n requests to the memory system during a memory cycle resembles the n-processor, m-module memory system.
The Physical Memory Module • A memory module has two important parameters. • Module Access Time: The time needed to retrieve a word into the output memory buffer of the module, given a valid address in its address register. • Module Cycle Time: The minimum time between requests directed at the same module. Memory Access Time is the total time for the processor to access a word in memory. In a large interleaved memory system it includes the module access time plus transit time on the bus, bus access overhead, error detection and correction delay, etc.
Semiconductor Memories • Semiconductor memories fall into two categories. • Static RAM or SRAM • Dynamic RAM or DRAM The data retention method of SRAM is static, whereas that of DRAM is dynamic. Data in SRAM remains in a stable state as long as power is on. Data in DRAM must be refreshed at regular time intervals.
[Figure: DRAM Cell. Labels: Address Line, Data Line, Capacitor, Ground.]
SRAM vs. DRAM • An SRAM cell uses six transistors and resembles a flip-flop in construction. • Data remains in a stable state as long as power is on. • SRAM is much less dense than DRAM but has much faster access and cycle times. • In a DRAM cell, data is stored as charge on a capacitor, which decays with time and requires periodic refresh. This increases access and cycle times.
SRAM vs. DRAM • DRAM cells, built from a capacitor controlled by a single transistor, offer very high storage density. • DRAM uses a destructive readout process, so the data read out must be amplified and subsequently written back to the cell. • This operation can be combined with the periodic refresh required by DRAMs. • The main advantage of the DRAM cell is its small size, which gives very high storage density and low power consumption.
Memory Module • Memory modules are composed of DRAM chips. • A DRAM chip is usually organized as $2^n \times 1$ bit, where n is an even number. • Internally the chip is a two-dimensional array of memory cells consisting of rows and columns. • Half of the memory address is used to specify a row address (one of $2^{n/2}$ row lines). • The other half is similarly used to specify one of $2^{n/2}$ column lines.
Memory Module • To save on pinout for better overall density, the row and column addresses are multiplexed on the same lines. • Two additional lines, RAS (Row Address Strobe) and CAS (Column Address Strobe), gate first the row address and then the column address into the chip. • The row and column addresses are then each decoded to select one out of $2^{n/2}$ possible lines. • The intersection of the active row and column lines is the desired bit of information.
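The address multiplexing described above can be sketched as follows. This is an illustrative Python fragment, not taken from the slides: the function name is hypothetical, and the choice of the upper half of the address as the row and the lower half as the column is an assumption, since the slides only say that each half selects one of the two dimensions.

```python
from typing import Tuple


def split_address(addr: int, n: int) -> Tuple[int, int]:
    """Split an n-bit cell address into (row, column) halves of n/2 bits each.

    Assumption: the upper n/2 bits form the row address (sent with RAS)
    and the lower n/2 bits form the column address (sent with CAS).
    """
    if n <= 0 or n % 2 != 0:
        raise ValueError("n must be a positive even number")
    if not 0 <= addr < (1 << n):
        raise ValueError("address does not fit in n bits")
    half = n // 2
    row = addr >> half                  # one of 2^(n/2) row lines
    col = addr & ((1 << half) - 1)      # one of 2^(n/2) column lines
    return row, col


if __name__ == "__main__":
    # A 2^16 x 1 chip needs only 8 multiplexed address pins.
    row, col = split_address(0xABCD, 16)
    print(hex(row), hex(col))
```

With multiplexing, a chip with n address bits needs only n/2 address pins plus the RAS and CAS strobes.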
Memory Module • The column line signals are then amplified by a sense amplifier and transmitted to the output pin D out during a Read Cycle. • During a Write Cycle, the write enable signal stores the contents of D in at the selected bit address.
Memory Timing • At the beginning of a Read Cycle, the RAS line is activated first and the row address is put on the address lines. • With RAS active and CAS inactive, the row address is stored in the row address register. • This activates the row decoder and selects a row line in the memory array. • Next, CAS is activated and the column address is put on the address lines.
Memory Timing • CAS gates the column address into the column address register. • The column address decoder then selects a column line. • The desired data bit lies at the intersection of the active row and column lines. • During a Read Cycle the Write Enable is inactive (low), and the output line D out stays in a high-impedance state until it is driven high or low depending on the contents of the selected location.
Memory Timing • The time from the beginning of RAS until the data output line is activated is called the chip access time (T chip access). • The chip cycle time (T chip cycle) is the time required by the row and column address lines to recover before the next address can be entered and a read or write initiated. • It is determined by the amount of time that the RAS line is active plus the minimum time that RAS must remain inactive to let the chip and sense amplifiers fully recover for the next operation.
Memory Module • In addition to memory chips, a memory module contains a Dynamic Memory Controller and a Memory Timing Controller, which provide the following functions. • Multiplexing the n address bits into row and column addresses. • Generating the RAS and CAS signals at the appropriate times. • Providing timely refresh to the memory system.
[Figure: Memory Module block diagram. Labels: n address bits, Dynamic Memory Controller, n/2 address bits, Memory Chip ($2^n \times 1$), Memory Timing Controller, D out, Bus Drivers, p bits.]
Memory Module • When a memory read operation completes, the data-out signals are directed to the bus drivers, which interface with the memory bus common to all memory modules. • The access and cycle times of the module differ from the chip access and cycle times. • The module access time includes the delay due to the dynamic memory controller, the chip access time, and the delay in passing through the output bus drivers.
Memory Module • So in a memory system we have three access and cycle times. • Chip access and Chip cycle time • Module access and Module Cycle time • Memory (System) access and cycle time. (Each lower item includes the upper items)
Memory Module • Two important features found on a number of memory chips are used to improve the transfer rate of memory words. • Nibble Mode • Page Mode
Nibble Mode • A single address is presented to the memory chip and the CAS line is toggled repeatedly. • The chip interprets this CAS toggling as a mod $2^w$ progression of the low-order column address bits. • For w = 2, four sequential words can be accessed at a higher rate from the memory chip: [00] → [01] → [10] → [11]
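The mod $2^w$ progression can be illustrated with a short Python sketch. The function name is hypothetical, and the wrap-around behavior when the starting address is not aligned to a multiple of $2^w$ is an assumption that follows from reading "mod $2^w$" literally; real chips may differ.

```python
from typing import List


def nibble_sequence(column_addr: int, w: int = 2) -> List[int]:
    """Column addresses produced by 2^w successive CAS toggles.

    Only the w low-order bits advance, modulo 2^w; the upper column
    bits stay fixed (assumed wrap-around for unaligned starts).
    """
    if column_addr < 0 or w < 0:
        raise ValueError("column_addr and w must be non-negative")
    mask = (1 << w) - 1
    base = column_addr & ~mask      # fixed high-order column bits
    start = column_addr & mask      # starting low-order bits
    return [base | ((start + i) & mask) for i in range(1 << w)]


if __name__ == "__main__":
    # Starting at low-order bits 00 gives 00, 01, 10, 11.
    print([format(a, "04b") for a in nibble_sequence(0b1100)])
```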
Page Mode • A single row is selected, and non-sequential column addresses may be entered at a higher rate by repeatedly activating the CAS line. • It is slower than nibble mode but offers greater flexibility in addressing multiple words within a single address page. • Nibble mode usually refers to access of four consecutive words. Chips that can retrieve more than four consecutive words call this feature fast page mode.
Error Detection and Correction • High-density DRAM cells are very small. • Each cell therefore holds only a very small amount of charge to represent its data state. • This makes the data vulnerable to corruption from environmental perturbations, static electricity, etc. • Error detection and correction is thus an intrinsic part of memory system design.
Error Detection and Correction • The simplest type of error detection is parity. • A parity bit is added to each memory word so that the total number of 1's in the word, including the parity bit, is even (or odd). • If a single error occurs in any bit of the word, the number of 1's modulo 2 is inconsistent with the parity assumption, and the word is known to have been corrupted.
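A minimal Python sketch of the parity scheme follows. The function names are hypothetical and words are modeled as non-negative integers, which is an assumption made for illustration.

```python
def parity_bit(word: int, even: bool = True) -> int:
    """Parity bit that makes the total number of 1's even (or odd)."""
    if word < 0:
        raise ValueError("word must be non-negative")
    ones = bin(word).count("1")
    bit = ones % 2                  # 1 if the data has an odd number of 1's
    return bit if even else bit ^ 1


def is_consistent(word: int, stored_parity: int, even: bool = True) -> bool:
    """True if the word read back agrees with its stored parity bit."""
    return parity_bit(word, even) == stored_parity


if __name__ == "__main__":
    word = 0b1011
    p = parity_bit(word)            # stored alongside the word
    corrupted = word ^ 0b0100       # flip a single bit
    print(is_consistent(word, p), is_consistent(corrupted, p))
```

Parity detects any odd number of bit errors but cannot locate them, and an even number of errors goes unnoticed.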
Error Detection and Correction • Most modern memories incorporate hardware to automatically correct single errors (ECC: error correcting codes). • The simplest code of this type is a geometric block code. • The message bits to be checked are arranged in a roughly square pattern, and each row and column is augmented with a parity bit. • If one row and one column indicate a fault when decoded at the receiving end, the fault lies at the bit where they intersect, which can simply be inverted to correct the error.
[Figure: Two-Dimensional ECC. An 8 × 8 array of data bits (rows 0 to 7, columns 0 to 7) with column parity bits C0 to C7, row parity bits P0 to P7, and one additional parity bit P8.]
Error Detection and Correction • For 64 message bits we need to add 17 parity bits: 8 for the rows, 8 for the columns, and one additional bit to compute parity over the parity row and column. • If a failure is noted in only a single row, only a single column, or in multiple rows and columns, it is a multi-bit failure and a non-correctable state is entered.
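The two-dimensional scheme for 64 message bits can be sketched in Python as below. This is an illustration under stated assumptions rather than a hardware description: the names are hypothetical, even parity is assumed, and the checker assumes the stored parity bits themselves are not corrupted, so the extra parity-on-parity bit is computed but not used for correction.

```python
SIZE = 8  # 8 x 8 = 64 message bits


def parities(bits):
    """Return (row_parity, col_parity, overall): 8 + 8 + 1 = 17 parity bits."""
    row_par = [sum(row) % 2 for row in bits]
    col_par = [sum(bits[r][c] for r in range(SIZE)) % 2 for c in range(SIZE)]
    overall = (sum(row_par) + sum(col_par)) % 2   # parity over the parity bits
    return row_par, col_par, overall


def check_and_correct(bits, row_par, col_par):
    """Compare recomputed parities with the stored ones.

    Exactly one bad row and one bad column: flip the bit at their
    intersection in place. Any other mismatch is treated as a
    multi-bit failure.
    """
    new_rows, new_cols, _ = parities(bits)
    bad_rows = [r for r in range(SIZE) if new_rows[r] != row_par[r]]
    bad_cols = [c for c in range(SIZE) if new_cols[c] != col_par[c]]
    if not bad_rows and not bad_cols:
        return "ok"
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        bits[bad_rows[0]][bad_cols[0]] ^= 1
        return "corrected"
    return "uncorrectable"


if __name__ == "__main__":
    data = [[(3 * r + c) % 2 for c in range(SIZE)] for r in range(SIZE)]
    rp, cp, _ = parities(data)
    received = [row[:] for row in data]
    received[3][5] ^= 1                      # inject a single-bit error
    print(check_and_correct(received, rp, cp), received == data)
```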
Achieved Memory Bandwidth • Two factors have a substantial effect on achieved memory bandwidth. • Memory Buffers: Buffering should be provided for memory requests, in the processor or in the memory system, until the memory reference is complete. This maximizes the requests the processor can make and so can increase the achieved bandwidth. • Partitioning of Address Space: The memory space should be partitioned so that memory references are distributed evenly across the memory modules.
Assignment of Address Space to m Memory Modules
Module:   0      1      2      ...   m-1
Address:  0      1      2      ...   m-1
          m      m+1    m+2    ...   2m-1
          2m     2m+1   2m+2   ...   3m-1
Interleaved Memory System • Partitioning the memory space across m memory modules is based on the premise that successive references tend to be to successive memory locations. • Successive memory locations are assigned to distinct memory modules. • For m memory modules, an address x is assigned to module x mod m. • This partitioning strategy is termed an interleaved memory system, and the number of modules m is the degree of interleaving.
Interleaved Memory System • Since m is a power of two, x mod m is given by the low-order bits of the memory address, which therefore select the memory module to be referenced. • This is called low-order interleaving. • Memory addresses can also be mapped to memory modules by higher-order interleaving. • In higher-order interleaving the upper bits of the memory address define a module and the lower bits define a word within that module.
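The two mappings can be compared with a small Python sketch. The function names and the parameters `module_bits` and `addr_bits` are hypothetical, introduced here only for illustration; m is assumed to be $2^{\text{module\_bits}}$.

```python
from typing import Tuple


def low_order(addr: int, module_bits: int) -> Tuple[int, int]:
    """Low-order interleaving: module = addr mod m, word = addr div m."""
    m = 1 << module_bits
    return addr & (m - 1), addr >> module_bits


def high_order(addr: int, module_bits: int, addr_bits: int) -> Tuple[int, int]:
    """Higher-order interleaving: upper bits pick the module,
    lower bits pick the word inside that module."""
    word_bits = addr_bits - module_bits
    if word_bits < 0 or not 0 <= addr < (1 << addr_bits):
        raise ValueError("address or bit widths out of range")
    return addr >> word_bits, addr & ((1 << word_bits) - 1)


if __name__ == "__main__":
    # 4 modules, 4-bit addresses: sequential addresses 0..5.
    print([low_order(a, 2)[0] for a in range(6)])      # spread across modules
    print([high_order(a, 2, 4)[0] for a in range(6)])  # stay in one module
```

Sequential addresses rotate through all modules under low-order interleaving, while under higher-order interleaving they stay within one module until its address range is exhausted.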
Interleaved Memory System • In higher-order interleaving most references tend to remain within a particular module, whereas in low-order interleaving the references tend to be distributed across all the modules. • Thus low-order interleaving provides better memory bandwidth, whereas higher-order interleaving can be used to increase the reliability of the memory system by allowing it to be reconfigured.
Memory Systems Design • High-performance memory system design is an iterative process. • Bandwidth and partitioning of the system are determined by evaluating cost, access time, and queuing requirements. • More modules provide more interleaving and more bandwidth, reduce queuing delay, and improve access time. • But more modules also increase system cost, and the interconnect network becomes more complex, more expensive, and slower.
Memory Systems Design The basic design steps are as follows: 1. Determine the number of memory modules and the partitioning of the memory system. 2. Determine the offered bandwidth: peak instruction processing rate multiplied by expected memory references per instruction multiplied by the number of processors. 3. Decide on the interconnection network: physical delay through the network plus delays due to network contention reduce bandwidth and increase access time. A high-performance time-multiplexed bus or a crossbar switch can reduce contention but increases cost.
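The offered-bandwidth product in the design steps can be written out directly. The Python sketch below uses hypothetical function names and illustrative numbers; converting the offered rate into requests per memory cycle by multiplying with the memory cycle time is an added convenience, not something stated on the slide.

```python
def offered_bandwidth(peak_instr_per_s: float, refs_per_instr: float,
                      num_processors: int) -> float:
    """Offered requests per second: peak instruction rate
    x memory references per instruction x number of processors."""
    return peak_instr_per_s * refs_per_instr * num_processors


def requests_per_cycle(offered_rate: float, cycle_time_s: float) -> float:
    """Offered requests arriving during one memory cycle (assumed helper)."""
    return offered_rate * cycle_time_s


if __name__ == "__main__":
    # Illustrative values: 100 MIPS peak, 1.5 references per instruction,
    # 4 processors, 100 ns memory cycle time.
    rate = offered_bandwidth(100e6, 1.5, 4)
    print(rate, requests_per_cycle(rate, 100e-9))
```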
Memory Systems Design 4. Assess referencing behavior: the sequence of requests a program makes to memory can be • Purely sequential: each request follows a sequence. • Random: requests are uniformly distributed across modules. • Regular: each access is separated by a fixed number of locations (vector or array references). The random request pattern is commonly used in memory system evaluation.
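The three referencing patterns can be turned into sequences of module numbers with a short Python sketch. It assumes low-order interleaving (module = address mod m); the function names, the seed parameter, and the example values are hypothetical.

```python
import random
from typing import List


def sequential_refs(start: int, count: int, m: int) -> List[int]:
    """Modules touched by `count` consecutive addresses."""
    return [(start + i) % m for i in range(count)]


def strided_refs(start: int, stride: int, count: int, m: int) -> List[int]:
    """Modules touched by regular accesses a fixed stride apart."""
    return [(start + i * stride) % m for i in range(count)]


def random_refs(count: int, m: int, seed: int = 0) -> List[int]:
    """Modules touched by uniformly random requests (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.randrange(m) for _ in range(count)]


if __name__ == "__main__":
    m = 4
    print(sequential_refs(0, 6, m))    # rotates through all modules
    print(strided_refs(0, 4, 5, m))    # stride equal to m: one module only
    print(strided_refs(0, 3, 4, m))    # stride coprime to m: all modules
    print(random_refs(8, m))
```

A stride that is a multiple of m sends every access to the same module, which is the worst case for an interleaved memory.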
Memory Systems Design 5. Evaluate the memory model: assess the achieved bandwidth, the actual memory access time, and the queuing required in the memory system to support the achieved bandwidth.
Memory Models Nature of Processor: • Simple Processor: Makes a single request and waits for the response from memory. • Pipelined Processor: Makes multiple requests for various buffers in each memory cycle. • Multiple Processors: Each requests once every memory cycle. A single processor with n requests per memory cycle is asymptotically equivalent to n processors each requesting once every memory cycle.
Memory Models Achieved Bandwidth: Bandwidth available from the memory system. B(m) or B(m, n): Number of requests serviced in each module service time Ts = Tc (m is the number of modules and n is the number of requests each cycle). B(w): Number of requests serviced per second. B(w) = B(m) / Ts
Hellerman’s Model • One of the best known memory models. • Assumes a single sequence of addresses. • Bandwidth is determined by the average length of a conflict-free sequence of addresses (i.e. no match in the w low-order bit positions, where $w = \log_2 m$ and m is the number of modules). • The modeling assumption is that no address queue is present and no out-of-order requests are possible.
Hellerman’s Model • Under these conditions the maximum available bandwidth is found to be approximately $B(m) = \sqrt{m}$ and $B(w) = \sqrt{m} / T_s$. • The lack of queuing limits the applicability of this model to simple unbuffered processors with strict in-order referencing to memory.
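The average conflict-free run length behind this model can be computed exactly with a short Python sketch. It assumes that successive addresses fall on modules uniformly at random and independently, which is an assumption added here for illustration, and the function name is hypothetical. The expected run length is the sum over k of the probability that the first k addresses hit k distinct modules; it grows roughly like the square root of m.

```python
import math


def hellerman_bandwidth(m: int) -> float:
    """Expected number of requests before the first module conflict.

    Uses E[L] = sum_{k=1..m} P(first k addresses hit k distinct modules),
    with P(k) = m * (m-1) * ... * (m-k+1) / m^k.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    total = 0.0
    prob_distinct = 1.0
    for k in range(1, m + 1):
        prob_distinct *= (m - k + 1) / m   # k-th address avoids the k-1 used modules
        total += prob_distinct
    return total


if __name__ == "__main__":
    for m in (2, 4, 8, 16, 32):
        print(m, hellerman_bandwidth(m), math.sqrt(m))
```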
Strecker’s Model • Model Assumptions: • n simple processor requests are made per memory cycle and there are m modules. • There is no bus contention. • Requests are random and uniformly distributed across modules. The probability of any one request going to a particular module is 1/m. • Any busy module serves one request. • All unserviced requests are dropped each cycle. • There are no queues.
Strecker’s Model • Model Analysis: • Bandwidth B(m, n) is the average number of memory requests serviced per memory cycle. • This equals the average number of memory modules busy during each memory cycle. Probability that a module is not referenced by one processor = $(1 - 1/m)$. Probability that a module is not referenced by any processor = $(1 - 1/m)^n$. Probability that a module is busy = $1 - (1 - 1/m)^n$. So B(m, n) = average number of busy modules = $m[1 - (1 - 1/m)^n]$
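The closed form can be checked against a direct simulation of the model's assumptions. The Python sketch below is illustrative only: the function names, the number of simulated cycles, and the seed are hypothetical choices.

```python
import random


def strecker_bandwidth(m: int, n: int) -> float:
    """B(m, n) = m * [1 - (1 - 1/m)^n], the average number of busy modules."""
    if m <= 0 or n < 0:
        raise ValueError("m must be positive and n non-negative")
    return m * (1.0 - (1.0 - 1.0 / m) ** n)


def simulate(m: int, n: int, cycles: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate: each cycle, n requests pick modules uniformly
    at random; each busy module serves one request and the rest are dropped."""
    rng = random.Random(seed)
    busy_total = 0
    for _ in range(cycles):
        busy_modules = {rng.randrange(m) for _ in range(n)}
        busy_total += len(busy_modules)
    return busy_total / cycles


if __name__ == "__main__":
    print(strecker_bandwidth(8, 8), simulate(8, 8))
```

The simulated average should sit close to the formula, since the simulation drops unserviced requests each cycle exactly as the model assumes.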
Strecker’s Model • The achieved memory bandwidth is less than the theoretical maximum because of contention. • Because the model neglects congestion carried over from previous cycles, the bandwidth it calculates is still higher than what is actually achieved.