Uniprocessor Architecture

The von Neumann Machine • John von Neumann: 1903-1957 • Stored-program computer, unlike the ENIAC (1946), which was “programmed” by rewiring the computer. • Key insight: programs could be treated as data. • Hardware implements the cycle: fetch-(decode)-execute • The basis for virtually every computer since • EDVAC (1950)

The Stored Program Concept • Von Neumann’s proposal was to store the program instructions right along with the data. • This may sound trivial, but it represented a profound paradigm shift. • The stored program concept was proposed in 1945; to this day, it remains the fundamental architecture of most computers. • What are the components of this architecture?

Four Sub-Components • There are four sub-components in von Neumann architecture: • Memory • Input/Output (called “IO”) • Arithmetic-Logic Unit • Control Unit • While only 4 sub-components are called out, there is a 5th, key player in this operation: a bus, or wire, that connects the components together and over which data flows from one sub-component to another
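The stored-program idea and the fetch-(decode)-execute cycle can be sketched as a toy machine. This is a minimal illustration, not a real instruction set: the opcodes LOAD, ADD, STORE and HALT, their numeric values, and the two-cell instruction format are all assumptions made for the example. Instructions and data sit in the same memory list, the program counter stands in for the control unit, and the accumulator stands in for the ALU.

```python
# Toy stored-program machine: instructions and data share one memory list.
# The opcodes below are hypothetical, chosen only for illustration.
LOAD, ADD, STORE, HALT = 1, 2, 3, 0

def run(memory):
    """Run the fetch-decode-execute cycle until HALT; return the final memory."""
    pc = 0    # program counter (control unit)
    acc = 0   # accumulator (ALU)
    while True:
        # Fetch: each instruction occupies two cells, opcode then operand address.
        opcode, operand = memory[pc], memory[pc + 1]
        pc += 2
        # Decode and execute.
        if opcode == LOAD:
            acc = memory[operand]
        elif opcode == ADD:
            acc += memory[operand]
        elif opcode == STORE:
            memory[operand] = acc
        elif opcode == HALT:
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode}")

# Cells 0-7 hold the program and cells 8-10 hold data.
# Nothing in memory marks which cells are instructions and which are data.
program = [LOAD, 8, ADD, 9, STORE, 10, HALT, 0, 2, 3, 0]
result = run(list(program))
print(result[10])  # 5
```

Because the program is just numbers in memory, it could itself be loaded, copied, or modified like any other data.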

von Neumann Architecture

von Neumann Architecture • Most of you should know this material, so let’s review it quickly: it is fundamental to processor design, and it will emphasize the problems that still exist in computer architecture. • Remember that most multiprocessor systems are actually multicomputers, so we never get away from the uniprocessor and its associated issues, both positive and negative.

Von Neumann Architecture • What am I talking about when I say positive and negative issues?

Von Neumann Architecture • On the positive side, it’s simple to understand. • Its introduction has been the fundamental starting point for the rapid improvements in processors over the years.

Von Neumann Architecture • Memory bottlenecks are a problem. • Could a person looking directly into memory tell the difference between instructions and data? • What about languages like SequenceL? Do their data structures map directly into memory? • Even if data were identifiable in memory, could we tell what type of data it is?

Von Neumann Architecture • Try xxd on any binary file.
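One way to see why raw memory does not reveal what it holds is to read the same bytes under several interpretations. The sketch below uses Python's standard struct module; the four byte values are arbitrary and were picked only for illustration.

```python
import struct

raw = bytes([0x48, 0x69, 0x21, 0x00])   # four arbitrary bytes

as_text = raw[:3].decode("ascii")        # first three bytes read as ASCII characters
as_uint = struct.unpack("<I", raw)[0]    # all four read as a little-endian unsigned int
as_float = struct.unpack("<f", raw)[0]   # the same bits read as a 32-bit float

print(raw.hex())   # the view a hex dump would give
print(as_text)
print(as_uint)
print(as_float)
```

All three readings are equally valid; only the program that uses the bytes decides which one applies.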

Harvard architecture • An enhanced design that provides independent paths for data addresses, data, instruction addresses, and instructions. • Mark I • Mark IV (Aiken’s work at Harvard).

Harvard architecture

Control Unit • What might we find in the control unit of the von Neumann processor?

Control Unit • Program Counter • Instruction register • Registers

Control Unit • Major parameters: speed, complexity (cost), and flexibility. • Goal: minimize the instruction cycle time. • Implementations: hardwired and microprogrammed.

Hardwired control • inflexible but optimal • complex

Microprogrammed control • Microprogramming is a method of control design in which control-signal selection and sequencing information is stored in a ROM or RAM called the Control Memory (CM). • A microinstruction fetched from the CM specifies the control signals to be activated at any given time.

Microprogrammed control • A set of microinstructions is referred to as a microprogram, whose execution corresponds to the execution of a single machine instruction. • A set of microprograms that interpret a particular instruction set is sometimes referred to as an emulator. • A micro-assembler translates microprograms into executable code that can be stored in the control memory.

Microprogrammed control • Systematic • Flexible • More expensive because of the control memory and additional access circuitry. • Can emulate other machines if microprograms are available. • Potential decrease in operating speed.
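The idea of a control memory can be sketched as a lookup table. In this hedged illustration, each machine opcode maps to a microprogram (a list of microinstructions), and each microinstruction is simply the set of control signals asserted in that step. The opcode names and signal names are hypothetical and do not describe any particular processor.

```python
# Control memory: opcode -> microprogram (a list of microinstructions).
# Each microinstruction is the set of control signals to assert in one step.
# All opcode and signal names are hypothetical.
CONTROL_MEMORY = {
    "LOAD": [{"PC_to_MAR"}, {"MEM_read", "PC_inc"}, {"MDR_to_ACC"}],
    "ADD":  [{"PC_to_MAR"}, {"MEM_read", "PC_inc"}, {"MDR_to_ALU", "ALU_add"}],
}

def execute(opcode):
    """Return (step, signals) pairs for the microprogram of one machine instruction."""
    trace = []
    for step, microinstruction in enumerate(CONTROL_MEMORY[opcode]):
        trace.append((step, sorted(microinstruction)))
    return trace

for step, signals in execute("ADD"):
    print(step, signals)
```

The flexibility mentioned above shows up here as a data change: supporting a new instruction means adding an entry to the table rather than redesigning circuitry, at the cost of one table lookup per step.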

ALU • What’s its purpose?

ALU • The Arithmetic Logic Unit is the brains of the processor. • It contains a register called an accumulator upon which it can perform operations. • What operations? • Do these operations take place in all processors?

ALU • Progressive enhancements: • Faster algorithms and implementations: carry lookahead adders, fast multipliers, hardware division, etc. • Large number of registers: data buffers to reduce access to main memory. • Stack-based ALU: zero-address machines; operands and intermediate results are maintained in stack registers. • Pipelined ALU: multiple stage execution. • Multiple functional units: add, multiply, shift, etc., for simultaneous execution of several operations. • Multiple ALUs: each ALU performs all arithmetic logic functions simultaneously.
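The stack-based (zero-address) idea from the list above can be sketched in a few lines. This is an illustrative model only: the operation names ADD, SUB and MUL are assumptions, and a bare number in the program stands for a push of that value. The arithmetic operations name no operands; they take them from the top of the stack and leave the intermediate result there.

```python
def run_stack_machine(program):
    """Evaluate a zero-address program; numbers are pushed, operators use the stack."""
    stack = []
    for instr in program:
        if isinstance(instr, (int, float)):
            stack.append(instr)              # push an operand
            continue
        right = stack.pop()                  # operands come off the stack,
        left = stack.pop()                   # most recent first
        if instr == "ADD":
            stack.append(left + right)
        elif instr == "SUB":
            stack.append(left - right)
        elif instr == "MUL":
            stack.append(left * right)
        else:
            raise ValueError(f"unknown operation {instr}")
    return stack.pop()

# (2 + 3) * 4 written in postfix order
print(run_stack_machine([2, 3, "ADD", 4, "MUL"]))  # 20
```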

Memory • There are several different flavors of memory • Why isn’t just one kind used?

Memory • RAM • ROM • Cache • Registers

Memory Address Register • The special register is called the MAR, the memory address register (called the machine address register in some texts). • For a machine with 2^n address cells, the MAR must be able to hold a number as large as 2^n - 1, so it must be n bits wide. • What’s it for?

Memory Operations • Two basic operations occur within this subcomponent: a fetch and a store. • The fetch operation: • A cell address is loaded into the MAR. • The address is decoded, which means that a specific cell is located through circuitry. • The contents of that cell are copied into another special register, called the Memory Data Register (MDR), also called the Memory Buffer Register (MBR). • Data is copied, not destroyed. Therefore this operation is non-destructive.

Memory Operations • The second memory operation is called a store. • The fetch is like a read operation; the store is like a write operation. • In the store, the address of the cell into which data is going to be stored is moved to the MAR and decoded. • The contents of the accumulator are copied into the cell whose address is held in the MAR. • This operation is destructive: data originally stored at that memory location is overwritten.
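The fetch and store operations described above can be sketched as a small class. This is a simplified model under stated assumptions: the class name Memory and its method names are hypothetical, decoding is reduced to a range check, and the value passed to store stands in for the contents of the accumulator.

```python
class Memory:
    """Memory with 2**n cells, an address register (MAR) and a data register (MDR)."""

    def __init__(self, address_bits):
        self.cells = [0] * (2 ** address_bits)
        self.max_address = 2 ** address_bits - 1   # largest value the MAR must hold
        self.mar = 0
        self.mdr = 0

    def _decode(self):
        # Stand-in for the decoding circuitry: check that the cell exists.
        if not 0 <= self.mar <= self.max_address:
            raise IndexError(f"address {self.mar} out of range")

    def fetch(self, address):
        self.mar = address                  # load the address into the MAR
        self._decode()                      # locate the cell
        self.mdr = self.cells[self.mar]     # copy the contents; the cell is unchanged
        return self.mdr

    def store(self, address, value):
        self.mar = address
        self._decode()
        self.cells[self.mar] = value        # the old contents are overwritten

memory = Memory(address_bits=4)
memory.store(3, 42)
print(memory.fetch(3), memory.fetch(3))    # fetching twice gives the same value
memory.store(3, 7)
print(memory.fetch(3))                     # the earlier 42 is gone
```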

A simplified walkthrough of how a basic read memory access is performed • The address for the memory location to be read is placed on the address bus. • The memory controller decodes the memory address and determines which chips are to be accessed. • The lower half of the address ("row") is sent to the chips to be read. • After allowing sufficient time for the row address signals to stabilize, the memory controller sets the row address strobe (sometimes called row address select) signal to zero. This line is abbreviated as "RAS" with a horizontal line over it, or "/RAS".

a simplified walkthrough of how a basic read memory access is performed • When the /RAS signal has settled at zero, the entire row selected is read by the circuits in the chip. Note that this action refreshes all the cells in that row; refreshing is done one row at a time. • The higher half of the address ("column") is sent to the chips to be read.

After allowing sufficient time for the column address signals to stabilize, the memory controller sets the column address strobe (or column address select) signal to zero. This line is abbreviated as "CAS" with a horizontal line over it, or "/CAS". • When the /CAS signal has settled at zero, the selected column is fed to the output buffers of the chip. • The output buffers of all the accessed memory chips feed the data out onto the data bus, where the processor or other device that requested the data can read it.
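The row-then-column sequence in the walkthrough can be sketched as follows. This is a simplified model, not a timing-accurate one: the class DramChip and the function names are hypothetical, the address width is assumed to be even, and the strobe signals are reduced to two method calls. Following the walkthrough, the lower half of the address is treated as the row and the higher half as the column.

```python
def split_address(address, address_bits):
    """Split an address into (row, column) halves as in the walkthrough above."""
    half = address_bits // 2
    row = address & ((1 << half) - 1)   # lower half of the address
    column = address >> half            # higher half of the address
    return row, column

class DramChip:
    """A square array of cells with a row buffer (hypothetical model)."""

    def __init__(self, address_bits):
        side = 1 << (address_bits // 2)
        self.cells = [[0] * side for _ in range(side)]
        self.row_buffer = None

    def ras(self, row):
        # Row strobe: the entire selected row is read into the chip's circuits.
        self.row_buffer = self.cells[row]

    def cas(self, column):
        # Column strobe: the selected column of that row goes to the output buffer.
        return self.row_buffer[column]

def read(chip, address, address_bits):
    row, column = split_address(address, address_bits)
    chip.ras(row)
    return chip.cas(column)

chip = DramChip(address_bits=8)
chip.cells[3][11] = 99
print(split_address(0b10110011, 8))   # (3, 11)
print(read(chip, 0b10110011, 8))      # 99
```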

Memory • important parameters: • speed (bandwidth, latency), • capacity, and • cost

Memory • Bandwidth: the number of data units that can be accessed per second; a function of access time and cycle time (the minimum time between requests to memory) • Access times are typically 5 - 70 nanoseconds • Cycle time ≥ access time, since the memory needs time to recover before the next request can start
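The relationship between cycle time and bandwidth can be made concrete with a rough calculation. The sketch assumes one access can start every cycle time and that each access delivers a fixed number of bytes; the function name and the figures used (8-byte and 32-byte accesses, a 50 ns cycle time) are illustrative assumptions, not measurements.

```python
def bandwidth_bytes_per_second(bytes_per_access, cycle_time_ns):
    """Peak bandwidth when one access can start every cycle_time_ns nanoseconds."""
    return bytes_per_access * 1e9 / cycle_time_ns

# Same cycle time, different fetch widths (values are hypothetical).
narrow = bandwidth_bytes_per_second(8, 50)
wide = bandwidth_bytes_per_second(32, 50)
print(narrow / 1e6, "MB/s")
print(wide / 1e6, "MB/s")
```

With these figures the 8-byte access gives about 160 MB/s, and fetching four times as many bytes per cycle gives four times the bandwidth without changing the latency of a single access.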

Memory • Other important parameters: • power consumption, • weight, • volume, • reliability, and • error detection/correction capability

Memory • Progressive enhancements: • Wider word fetch: multiple memory words are fetched per memory cycle. • Cache memory: a fast memory block 10 to 100 times faster than main memory. It takes advantage of the temporal and spatial locality of references during the execution of a program.

Cache • If a requested address is not found in the cache, then a cache miss occurs. An effective cache memory design yields few cache misses, i.e., a high hit ratio. • Writing to the cache presents a problem because of the need to: • maintain data consistency • interact with I/O processors.

a) multiplexes requests from both I/O and the CPU. Note however that all traffic will be concentrated in the cache unit. • b) a direct path to the cache units raises the problem of consistency when several processors have their own cache.

This figure shows a solution in which the directory is replicated. • To maintain consistency, any update in the I/O copy invalidates the corresponding entries in the cache.

Cache • Reading is not a problem in caches unless the item read has been marked dirty; a dirty cache entry would require an update. Dirty entries are usually associated with multiprocessor systems. • In a single-processor system, a read could still cause a miss.

Cache • The CPU can handle write operations in two different ways: • Write-through: a write to the cache also implies a write to main memory. This may increase data traffic between main memory and the CPU, but the I/O processor (IOP) does not need to read from the cache because data in main memory is consistent with data in the cache. • Write-back (write-in cache): the CPU writes to the cache only. Main memory is updated when the block (page) is no longer needed in the cache. In this case the IOP must check its cache directory: on a hit, it reads the data from the cache; otherwise it accesses main memory.
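The traffic difference between the two write policies can be sketched with a small simulation. The model is deliberately simplified and its details are assumptions: one word per block, a fully associative cache with first-in first-out eviction, allocation on every write, and a final flush of dirty blocks. The function name count_memory_writes is hypothetical.

```python
from collections import OrderedDict

def count_memory_writes(write_addresses, policy, capacity=2):
    """Count writes that reach main memory for a sequence of CPU writes."""
    cache = OrderedDict()   # block address -> dirty flag, oldest block first
    memory_writes = 0
    for addr in write_addresses:
        if addr not in cache and len(cache) == capacity:
            _, dirty = cache.popitem(last=False)   # evict the oldest block
            if dirty:
                memory_writes += 1                 # write-back happens on eviction
        if policy == "write-through":
            cache[addr] = False                    # memory is always up to date
            memory_writes += 1                     # every write also goes to memory
        elif policy == "write-back":
            cache[addr] = True                     # write to cache only, mark dirty
        else:
            raise ValueError(f"unknown policy {policy}")
    # Blocks still dirty at the end must eventually be written to memory.
    memory_writes += sum(cache.values())
    return memory_writes

writes = [0, 0, 0, 1, 1, 2]
print(count_memory_writes(writes, "write-through"))  # 6
print(count_memory_writes(writes, "write-back"))     # 3
```

Repeated writes to the same block are where write-back saves traffic; the price is that main memory can be stale until the block is evicted, which is why an I/O processor has to consult the cache directory.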

Cache Performance • What is the price of a cache miss? • How often do misses typically occur? • What happens when they occur?

Cache Performance • We can view CPU execution time as: CPU execution time = (CPU clock cycles + memory stall cycles) × clock cycle time • Memory stall cycles are the number of cycles the CPU waits for a main memory access.

Cache Performance

Cache Performance • Assume we have a computer where the clocks per instruction (CPI) is 1.0 when all memory accesses hit in the cache. The only data accesses are loads and stores, and these total 50% of the instructions. If the miss penalty is 25 clock cycles and the miss rate is 2%, how much faster would the computer be if all instructions were cache hits?

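Cache Performance • CPU execution time if all accesses hit? CPU execution time = (CPU clock cycles + memory stall cycles) × clock cycle time = (IC × CPI + 0) × clock cycle time = IC × 1.0 × clock cycle time, where IC is the instruction count.

Cache Performance • With cache misses, what are the memory stall cycles? Each instruction makes one instruction fetch plus, on average, 0.5 data accesses, so: memory stall cycles = IC × memory accesses per instruction × miss rate × miss penalty = IC × (1 + 0.5) × 0.02 × 25 = IC × 0.75

Cache Performance • With cache misses included: CPU execution time = (IC × 1.0 + IC × 0.75) × clock cycle time = 1.75 × IC × clock cycle time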

Cache Performance • The performance ratio is the inverse of the ratio of the execution times. • The computer with no cache misses is 1.75 times faster.

Cache Performance • Therefore, given the 25 clock cycle penalty, a miss rate of 2% increases execution time by 75% relative to the CPU with no misses. • Is 25 reasonable?
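The example can be checked, and the question about the 25-cycle penalty explored, with a short calculation. The sketch assumes 1.5 memory accesses per instruction (one instruction fetch plus 0.5 data accesses), which is the count that yields the 1.75 figure above; the function name and the alternative penalties of 10 and 100 cycles are illustrative assumptions.

```python
def slowdown(base_cpi, accesses_per_instruction, miss_rate, miss_penalty):
    """Ratio of execution time with misses to execution time with all hits."""
    stall_cycles_per_instruction = accesses_per_instruction * miss_rate * miss_penalty
    return (base_cpi + stall_cycles_per_instruction) / base_cpi

# One instruction fetch plus 0.5 data accesses per instruction.
accesses = 1 + 0.5

for penalty in (10, 25, 100):
    ratio = slowdown(1.0, accesses, 0.02, penalty)
    print(f"miss penalty {penalty:3d} cycles -> {ratio:.2f}x slower")
```

With a 25-cycle penalty the ratio is 1.75, matching the example; at 100 cycles the same 2% miss rate makes the machine about four times slower, which shows how sensitive the result is to the penalty.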

Memory Hierarchy

Cache design • Direct mapped • A block has only one place it can appear in the cache. • Fully associative • A block can be anywhere in the cache. • Set associative • A block is restricted to a set of places in the cache.
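The three placement policies can be expressed with one function by treating direct mapped and fully associative as the two extremes of set associativity. This sketch assumes the common modulo placement rule (set index = block address mod number of sets); the function name and the 8-frame cache in the example are illustrative assumptions.

```python
def candidate_frames(block_address, num_frames, associativity):
    """Return the cache frames in which a memory block may be placed."""
    num_sets = num_frames // associativity
    set_index = block_address % num_sets       # modulo placement (assumed)
    first = set_index * associativity
    return list(range(first, first + associativity))

# Where can memory block 12 go in a cache with 8 block frames?
print(candidate_frames(12, 8, 1))   # direct mapped: [4]
print(candidate_frames(12, 8, 2))   # 2-way set associative: [0, 1]
print(candidate_frames(12, 8, 8))   # fully associative: [0, 1, 2, 3, 4, 5, 6, 7]
```

Higher associativity gives a block more places to live, which tends to reduce conflict misses, but every candidate frame must be checked on each lookup.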
