Modern Computer Architecture My_notes www.computing.northampton.ac.uk/~brian brian.thompson@northampton.ac.uk
Indicative Content:
1. Introduction: classification of computer architectures; architectural concepts; performance measurement and comparison; trends in technology
2. Instruction Set Architectures: memory addressing and operands; operations; control flow
3. Pipelining: implementation; hazards and solutions
4. Instruction-Level Parallelism (ILP): concept; dynamic scheduling and out-of-order execution; branch prediction; speculation; limitations of ILP
5. Data-Level Parallelism (DLP): SIMD principle; SIMD operations; SIMD programming
6. Thread-Level Parallelism (TLP): simultaneous multi-threading; multi-core architectures and symmetric multiprocessors; cache coherence protocols
7. Vector Processors: architecture; programming
8. Distributed Shared Memory: directory-based cache coherence
9. Distributed Memory Computers (DMC): bus architectures; static and dynamic interconnection networks
10. Alternative Architectures: System on Chip architectures (SoC); asynchronous processor designs; FPGAs
11. Memory Hierarchy: design; performance
Read Only Memory
ROM
• Bipolar: Mask ROMs, PROMs
• MOS: Mask ROMs, PROMs, EPROMs, EEPROMs, Flash

• ROM - mask programmed
• PROM - fusible links
• EPROM - UV erasable
• EEPROM - electrically erasable
• Flash - block erasable, e.g. BIOS
Random Access Memory
RAM
• Dynamic: needs refresh, cheap-ish
• Static: fast, expensive

• Fast Page Mode (FPM)
• Extended Data Out (EDO)
• Burst EDO (BEDO)
• Synchronous DRAM (SDRAM)
• Double Data Rate SDRAM (DDR SDRAM)
• DIMM/SIMM
DRAM Cell: capacitor model
During a read operation, one of the row select lines is brought high by decoding the row address (low-order address bits). The activated row select line turns on the switch transistors for all cells in the selected row. The refresh amplifier associated with each column then senses the voltage level on the corresponding capacitor and interprets it as a 0 or a 1: if the level is more than 50 percent of full charge it is read as a 1, otherwise as a 0. The column address (high-order address bits) enables one cell in the selected row for output. The read cycle is actually a read/write cycle. If a '1' is read, the cell is rewritten with a '1' to recharge it. If a '0' is read, no recharge is necessary.
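The read/write behaviour of the capacitor model can be sketched in a few lines of Python. This is only an illustration of the 50 percent threshold and the recharge-on-read step described above. The class name `DramCell`, the `leak` method and the charge values are assumptions made for the example, not part of the original notes.

```python
from dataclasses import dataclass

THRESHOLD = 0.5  # decision point: 50 percent of full charge


@dataclass
class DramCell:
    """Hypothetical one-bit cell; charge is a fraction of full charge (0.0 to 1.0)."""
    charge: float

    def leak(self, fraction: float) -> None:
        """Model charge leaking away between refreshes."""
        self.charge = max(0.0, self.charge - fraction)

    def read(self) -> int:
        """Sense the bit, then rewrite a '1' so the capacitor is recharged."""
        bit = 1 if self.charge > THRESHOLD else 0
        if bit == 1:
            self.charge = 1.0  # the read cycle is really a read/write cycle
        return bit


cell = DramCell(charge=1.0)
cell.leak(0.3)          # charge has dropped to 0.7
print(cell.read())      # 1, and the cell is back at full charge
print(cell.charge)      # 1.0
```

If the charge is allowed to leak below the threshold before the next read or refresh, the stored 1 is lost, which is why dynamic RAM needs refreshing.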
• Access Time
• Memory Cycle Time
• Transfer Rate - 1/(Cycle time)
• Parity
• Error Checking
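[Figure: DRAM block diagram. A 12-bit row address is latched into the row address latch by RAS (row address strobe), and a 12-bit column address is latched into the column address latch by CAS (column address strobe). Together they select a cell in a 4096 x 4096 bit memory array, which connects to the I/O control.]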
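As a rough sketch of how a cell address maps onto the 4096 x 4096 array, the Python below splits a 24-bit address into its 12-bit row and column parts. It assumes the convention from the DRAM cell description (row address in the low-order bits, column address in the high-order bits). The function name `split_address` is made up for the example.

```python
ROW_BITS = 12  # 2**12 = 4096 rows
COL_BITS = 12  # 2**12 = 4096 columns


def split_address(addr: int) -> tuple[int, int]:
    """Return (row, column) for a 24-bit cell address.

    Assumption: row = low-order 12 bits, column = high-order 12 bits.
    """
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address does not fit in 24 bits")
    row = addr & ((1 << ROW_BITS) - 1)   # sent first, latched by /RAS
    col = addr >> ROW_BITS               # sent second, latched by /CAS
    return row, col


row, col = split_address(0x123456)
print(hex(row), hex(col))  # 0x456 0x123
```

Because the two halves are sent one after the other, the chip needs only 12 address pins rather than 24.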
DRAM Organisation
[Figure: a 4-bit module built from four DRAM chips, Chip 1 to Chip 4]
DRAM Module
Precharge delays • Precharge delays plus row access times (tRAC)
Access time
1. The row address is placed on the address bus.
2. The /RAS pin is activated, placing the row address into the Row Address Latch.
3. The Row Address Decoder locates the row to be sent to the sense amplifiers.
4. The Write Enable line is deactivated, i.e. this is a memory read.
5. The column address is multiplexed on to the address pins.
6. The /CAS pin is activated, placing the column address into the Column Address Latch.
7. /CAS also acts as Output Enable, so once /CAS has stabilised the sense amplifiers can read the desired data.
8. /RAS and /CAS are deactivated.
Asynchronous Latency
• Access Time - the time between the address being placed on the bus and the data appearing on the data bus
• Cycle Time - the time between successive read operations
• Minimising these two is the goal of the memory designer, with access time being the main target, to increase bus speed.
• A 2 GHz processor can do more in 70 ns than a 400 MHz processor, so a slow memory access costs it more lost cycles
• Need to insert wait states
[Figure: Fast Page Mode cycle time]
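The 2 GHz versus 400 MHz comparison can be made concrete by counting how many processor clock cycles fit into a 70 ns memory access. The sketch below uses integer arithmetic on MHz and ns to avoid rounding surprises. The function name `cycles_spanned` is an assumption for the example.

```python
def cycles_spanned(clock_mhz: int, time_ns: int) -> int:
    """Whole processor clock cycles needed to cover time_ns (rounded up)."""
    # cycles = clock_mhz * 1e6 * time_ns * 1e-9 = clock_mhz * time_ns / 1000
    return (clock_mhz * time_ns + 999) // 1000


print(cycles_spanned(2000, 70))  # 140 cycles at 2 GHz
print(cycles_spanned(400, 70))   # 28 cycles at 400 MHz
```

The faster processor idles for five times as many cycles during the same access, which is why memory latency matters more as clock speeds rise.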
Fast Page Mode
• Activate /RAS once, then perform 4 /CAS cycles
• Removes the /RAS delay from all but the first read
• The initial read takes 6 cycles, whereas each of the next 3 takes only 3 cycles
• This is called 6-3-3-3 DRAM, using x-y-y-y notation
• However, you cannot activate the next column address until the data from the previous read is gone
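A short calculation shows what the x-y-y-y notation buys. The sketch below adds up the cycles for a four-read burst at 6-3-3-3 and compares it with four full 6-cycle reads. The function name `burst_cycles` is made up for the illustration.

```python
def burst_cycles(timing: str) -> int:
    """Total clock cycles for a burst written in x-y-y-y notation."""
    return sum(int(n) for n in timing.split("-"))


fast_page = burst_cycles("6-3-3-3")   # one /RAS, then four /CAS cycles
full_reads = burst_cycles("6-6-6-6")  # every read pays the full row access

print(fast_page)   # 15
print(full_reads)  # 24
```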
Note: the column address for the next read is activated before the data from the previous read is gone.
Features of SDRAM
• Controls with commands
  • Activate, read, write etc.
• Multiple bank configuration
  • Can precharge one bank while reading/writing to another
• Adoption of control by Mode Register
  • Can set burst length, CAS latency etc.
• Synchronous operation
  • Latches each control signal at the rising edge of the basic clock
  • Synchronised with the system clock
• Selectable CAS latency
• Selectable burst length
  • The number of words that can be input or output continuously
• Clock 1: ACTIVATE the row by asserting /CS and /RAS. The row address is placed on the address bus to determine which row to activate.
• Clock 3: READ the required column from the activated row by asserting /CAS while placing the column's address on the address bus.
• Clocks 5-10: The data from the selected row and column goes out onto the data bus, followed by a BURST of other columns, the order of which depends on which BURST MODE has been set.
Memory Latencies
Data sheets quote timings such as 9-9-9-24 (2T) for a memory module. What do the numbers mean? They are CL-tRCD-tRP-tRAS and the command rate (CMD) respectively, and the values are measured in clock cycles.
• CAS Latency (1st number) - the delay between the CAS signal and the availability of valid data on the data pins. The lower the latency, the better the performance.
• tRCD (2nd number) - RAS to CAS delay, i.e. the delay between activating a row and accessing a column in it. When memory is accessed sequentially, the row is already active and tRCD will not have much impact. However, if memory is not accessed in a linear fashion, the currently active row must be deactivated and a new row activated.
• tRP (3rd number) - row precharge time, i.e. the time needed to close the active row before another row can be activated. In conjunction with tRCD, the time (in clock cycles) required to switch rows and select the next cell for reading, writing or refreshing is therefore a combination of tRP and tRCD.
• tRAS (4th number) - row active time, i.e. the minimum time a row must stay active between being activated and being precharged. tRAS has little effect on overall system performance but can affect system stability if set incorrectly.
• Command Rate - the time needed between the chip select signal and the point when commands can be issued to the RAM module IC. Typically this is either 1 clock or 2 (1T or 2T).
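To see how the timing numbers combine, the sketch below parses a CL-tRCD-tRP-tRAS string and adds up the cycles for three cases: the row is already open, the bank is idle, or a different row must first be closed. The names `Timings`, `parse` and the three helper functions are made up for the example, and the 800 MHz clock is an assumed value chosen only to show the conversion to nanoseconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timings:
    cl: int    # CAS latency
    trcd: int  # RAS to CAS delay
    trp: int   # row precharge time
    tras: int  # row active time


def parse(spec: str) -> Timings:
    """Parse a data-sheet string such as '9-9-9-24'."""
    cl, trcd, trp, tras = (int(x) for x in spec.split("-"))
    return Timings(cl, trcd, trp, tras)


def row_open_cycles(t: Timings) -> int:
    return t.cl                      # row already active: only CL applies


def row_closed_cycles(t: Timings) -> int:
    return t.trcd + t.cl             # activate the row, then read


def row_switch_cycles(t: Timings) -> int:
    return t.trp + t.trcd + t.cl     # precharge, activate, then read


def to_ns(cycles: int, clock_mhz: float) -> float:
    return cycles * 1000 / clock_mhz


t = parse("9-9-9-24")
print(row_open_cycles(t), row_closed_cycles(t), row_switch_cycles(t))  # 9 18 27
print(to_ns(row_switch_cycles(t), 800))  # 33.75 (ns, at an assumed 800 MHz clock)
```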
Bank Interleaving
• SDRAM divides memory into two to four banks so that more data can be accessed simultaneously. This is known as interleaving.
• Using a notebook analogy, two-way interleaving is like dividing each page in a notebook into two parts and having two assistants each retrieve a different part of the page.
• Even though each assistant must take a break (be refreshed), breaks are staggered so that at least one assistant is working at all times.
• Therefore, they retrieve the data much faster than a single assistant could get the same data from one whole page, especially since no data can be accessed while a single assistant takes a break.
• This allows the processor to initiate a new memory access before the previous access completes, and results in continuous data flow.
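The benefit of staggering accesses across banks can be sketched with a toy timing model. It assumes that consecutive accesses rotate round-robin over the banks, that each bank is busy for a fixed number of cycles per access, and that one new access can be issued per cycle. None of these figures come from the notes, and `total_cycles` is a hypothetical helper.

```python
def total_cycles(n_accesses: int, n_banks: int, bank_busy: int) -> int:
    """Cycles until the last of n sequential accesses completes."""
    bank_free_at = [0] * n_banks   # cycle at which each bank is next free
    now = 0                        # earliest cycle the next access may issue
    finish = 0
    for i in range(n_accesses):
        bank = i % n_banks                     # round-robin bank selection
        start = max(now, bank_free_at[bank])   # wait if the bank is still busy
        bank_free_at[bank] = start + bank_busy
        finish = bank_free_at[bank]
        now = start + 1                        # next access issues a cycle later
    return finish


for banks in (1, 2, 4):
    print(banks, total_cycles(8, banks, bank_busy=4))
# 1 32
# 2 17
# 4 11
```

With a single bank every access waits for the previous one to finish. With four banks a new access starts every cycle, which is the continuous data flow described above.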
Memory Interleaving
[Figure: comparison of ordinary and interleaved memory access]
• An: Row Address
• Bn: Column Address
• D: Data Readout
• Memory voltage - was originally 5 volts. However, as cell geometries decreased, memory circuitry became smaller and more sensitive. Today, computer memory components can operate at as low as 1.5 volts, which allows them to run faster and consume less power.
• Bandwidth - the bandwidth capacity of the memory bus increases with its width (in bits) and its frequency (in MHz). By transferring 8 bytes (64 bits) at a time and running at 100 MHz, SDRAM increases memory bandwidth to 800 MB/s.
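The 800 MB/s figure follows directly from bus width times frequency. A minimal sketch, taking 1 MB as 10^6 bytes and one transfer per clock (the function name is hypothetical):

```python
def peak_bandwidth_mb_s(bus_width_bits: int, clock_mhz: int) -> int:
    """Peak bandwidth in MB/s for one transfer per clock cycle."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_mhz


print(peak_bandwidth_mb_s(64, 100))  # 800
```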
Error Correction and Detection
• Simple parity checking detects single-bit errors (more generally, any odd number of flipped bits) but cannot correct them.
• ECC (Error Correction Codes) uses a special algorithm to generate values called check bits. The algorithm adds the check bits together to calculate a checksum, which it stores with the data. When data is read from memory, the algorithm recalculates the checksum and compares it with the checksum of the written data.
• If the checksums are equal, the data is valid and operation continues.
• If they are different, the data has an error, and the ECC memory logic isolates the error and reports it to the system. In the case of a single-bit error, the ECC memory logic can correct the error and output the corrected data so that the system continues to operate.
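The limits of simple parity are easy to demonstrate. The sketch below uses even parity over an 8-bit word: a single flipped bit is caught, but two flipped bits cancel out and go unnoticed. It illustrates parity only, not the ECC algorithm, and the function names are assumptions for the example.

```python
def parity_bit(data: int) -> int:
    """Even parity: 1 if data contains an odd number of 1 bits, else 0."""
    return bin(data).count("1") % 2


def parity_ok(data: int, stored_parity: int) -> bool:
    """Recompute the parity on read and compare it with the stored bit."""
    return parity_bit(data) == stored_parity


word = 0b10110010
stored = parity_bit(word)                 # written alongside the data

print(parity_ok(word, stored))            # True:  data unchanged
print(parity_ok(word ^ 0b0001, stored))   # False: single-bit error detected
print(parity_ok(word ^ 0b0011, stored))   # True:  double-bit error missed
```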
Double Data Rate DDR
• There are presently three generations of DDR memories:
• DDR1 memory, with a maximum rated speed of 400 MHz (effective data rate) and a 64-bit (8 bytes) data bus, is now becoming obsolete and is not being produced in massive quantities.
• DDR2 memory is the second generation of DDR memory. DDR2 starts at a speed of 400 MHz (effective data rate).
• DDR3 is the third generation of DDR memory. DDR3 memory provides a reduction in power consumption of 30% compared to DDR2 modules due to DDR3's 1.5 V supply voltage. The main benefit of DDR3 comes from the higher bandwidth made possible by DDR3's 8-burst-deep prefetch buffer, in contrast to DDR2's 4-burst-deep or DDR's 2-burst-deep prefetch buffer.
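The effect of the prefetch depth can be sketched numerically: for the same internal memory array clock, a deeper prefetch buffer lets the interface deliver more transfers per array cycle. The 200 MHz array clock below is an assumed value for illustration, and the function names are hypothetical. The prefetch depths and the 64-bit bus are from the text above.

```python
PREFETCH_DEPTH = {"DDR": 2, "DDR2": 4, "DDR3": 8}  # words fetched per array access


def data_rate_mt_s(generation: str, array_clock_mhz: int) -> int:
    """Effective data rate in mega-transfers per second."""
    return array_clock_mhz * PREFETCH_DEPTH[generation]


def peak_bandwidth_mb_s(generation: str, array_clock_mhz: int, bus_bytes: int = 8) -> int:
    """Peak bandwidth in MB/s over a 64-bit (8-byte) data bus."""
    return data_rate_mt_s(generation, array_clock_mhz) * bus_bytes


for gen in ("DDR", "DDR2", "DDR3"):
    # 200 MHz array clock is an assumption made for this comparison
    print(gen, data_rate_mt_s(gen, 200), peak_bandwidth_mb_s(gen, 200))
# DDR 400 3200
# DDR2 800 6400
# DDR3 1600 12800
```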