CS 2200 Presentation 18a Parallel Processors
Our Road Map • Processor • Memory Hierarchy • I/O Subsystem • Parallel Systems • Networking
The Next Step • Create more powerful computers simply by interconnecting many small computers • Should be scalable • Should be fault tolerant • More economical • Multiprocessors • High throughput running independent tasks • Parallel Processing • Single program on multiple processors
Key Questions • How do parallel processors share data? • How do parallel processors communicate? • How many processors?
Sharing Data I • Communication with memory via loads and stores • Same box • Single address space [Diagram: two processors connected to one shared memory]
Problems? [Diagram: two processors connected to one shared memory]
Sharing Data I has Two Flavors! • Uniform Memory Access (UMA) • Symmetric Multiprocessors (SMP) • Non-Uniform Memory Access (NUMA)
Sharing Data I: Uniform Memory Access (UMA) • Symmetric Multiprocessor (SMP) [Diagram: three processors, each with its own cache, sharing one memory]
Sharing Data I: Non-Uniform Memory Access (NUMA) [Diagram: four nodes, each with CPU x 4, a cache, its own memory, and an I/O channel]
Sharing Data II • Use Message Passing • Each machine capable of • Send • Receive [Diagram: three computers, each with private memory, connected by a local area network]
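The send/receive idea above can be sketched in a few lines of Python. This is only an illustration: threads and `queue.Queue` stand in for separate computers and the local area network, and the names `Machine`, `send`, `receive` and `worker` are made up for this sketch. The point it tries to show is that no machine ever touches another machine's memory; all data moves in explicit messages.

```python
import queue
import threading


class Machine:
    """Stand-in for one computer: private memory plus an inbox on the network."""

    def __init__(self, name):
        self.name = name
        self.private_memory = {}    # never accessed by any other machine
        self.inbox = queue.Queue()  # assumption: the "network" is a queue

    def send(self, other, message):
        other.inbox.put((self.name, message))

    def receive(self):
        return self.inbox.get(timeout=5)


def worker(me, boss):
    # Receive a chunk of work, compute on private data, send the result back.
    _sender, numbers = me.receive()
    me.private_memory["partial"] = sum(numbers)
    me.send(boss, me.private_memory["partial"])


boss = Machine("boss")
workers = [Machine(f"w{i}") for i in range(3)]
threads = [threading.Thread(target=worker, args=(w, boss)) for w in workers]
for t in threads:
    t.start()

# Hand each worker ten numbers: 0..9, 10..19, 20..29.
for i, w in enumerate(workers):
    boss.send(w, list(range(i * 10, i * 10 + 10)))

total = sum(boss.receive()[1] for _ in workers)
for t in threads:
    t.join()
print(total)  # prints 435, the sum of 0..29
```

On real machines the queue would be replaced by a network transport, and the cost of each message is what makes communication overhead matter.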
Connection Schemes • Single Bus • Improved feasibility due to microprocessors • Caches can reduce bus traffic • Need to worry about cache coherency • Network
Programming • In contrast to instruction-level parallelism, which the programmer can largely ignore... • Writing efficient multiprocessor programs is hard. • Wizards write programs with a sequential interface (e.g. databases, file servers, CAD) • Communications overhead becomes a factor • Requires a lot of knowledge of the hardware!!!
Speedup Challenge • To get the full benefit of parallelism, we need to be able to parallelize the entire program! • Amdahl’s Law • Time_after = (Time_affected / Improvement) + Time_unaffected • Example: We want a 100 times speedup with 100 processors • That requires Time_unaffected = 0!!!
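A small numeric sketch of Amdahl's Law may make the example concrete. The function names `time_after` and `speedup` are chosen here for illustration, and the total time before the improvement is normalized to 1.

```python
def time_after(time_affected, time_unaffected, improvement):
    """Amdahl's Law: execution time after speeding up only the affected part."""
    return time_affected / improvement + time_unaffected


def speedup(parallel_fraction, processors):
    """Overall speedup when parallel_fraction of the time is spread over processors."""
    before = 1.0  # normalized original execution time
    after = time_after(parallel_fraction, 1.0 - parallel_fraction, processors)
    return before / after


# Only a fully parallel program gets the whole 100x from 100 processors.
print(speedup(1.0, 100))
# Leaving just 1% of the time sequential cuts the speedup to about 50x.
print(speedup(0.99, 100))
```

The second call shows why the slide insists on Time_unaffected = 0: even a tiny sequential part dominates once the parallel part has been divided by 100.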
Multiprocessor Cache Coherency • Means that values in cache and memory are consistent or that we know they are different and can act accordingly • Considered to be a good thing. • Becomes more difficult with multiple processors and multiple caches! • Popular technique: Snooping! • Write-invalidate • Write-update
P: One of many processors.
Processor: Addr 000000 R W. This indicates which operation the processor is trying to perform and at which address.
The processor’s cache: Tag (4 bits), 4 lines (ID), Valid, Dirty and Shared bits. Processor: Addr 000000 R W. Cache (Tag ID V D S): 0000 00 0 0 0; 0000 01 0 0 0; 0000 10 0 0 0; 0000 11 0 0 0.
Note: For this somewhat simplified example we won’t concern ourselves with how many bytes (or words) are in each line. Assume that it’s more than one. Processor: Addr 000000 R W. Cache (Tag ID V D S): 0000 00 0 0 0; 0000 01 0 0 0; 0000 10 0 0 0; 0000 11 0 0 0.
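As a rough sketch of how this simplified cache interprets an address: with a 4-bit tag and 4 lines, a 6-bit address splits into a tag (upper 4 bits) and a line ID (lower 2 bits). Offset bits within a line are ignored, as in the slides. The name `split_address` is made up for this illustration.

```python
TAG_BITS = 4    # from the slide: 4-bit tag
INDEX_BITS = 2  # 4 lines, so a 2-bit line ID


def split_address(addr):
    """Split a 6-bit address into (tag, line ID); byte offsets are ignored."""
    index = addr & ((1 << INDEX_BITS) - 1)
    tag = addr >> INDEX_BITS
    return tag, index


tag, index = split_address(0b101010)
print(f"{tag:0{TAG_BITS}b} {index:0{INDEX_BITS}b}")  # prints "1010 10"
```

So address 101010 can only live in line 10, and it is present only if that line is valid and holds tag 1010.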
The bus, with indication of address and operation. Processor: Addr 000000 R W. Cache (Tag ID V D S): 0000 00 0 0 0; 0000 01 0 0 0; 0000 10 0 0 0; 0000 11 0 0 0. Bus: Addr 000000 R W.
These bus operations are coming from other processors which aren’t shown. Processor: Addr 000000 R W. Cache (Tag ID V D S): 0000 00 0 0 0; 0000 01 0 0 0; 0000 10 0 0 0; 0000 11 0 0 0. Bus: Addr 000000 R W.
Main memory. Processor: Addr 000000 R W. Cache (Tag ID V D S): 0000 00 0 0 0; 0000 01 0 0 0; 0000 10 0 0 0; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
Processor issues a read. Processor: Addr 101010 R W. Cache (Tag ID V D S): 0000 00 0 0 0; 0000 01 0 0 0; 0000 10 0 0 0; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
Cache reports... MISS. Processor: Addr 101010 R W. Cache (Tag ID V D S): 0000 00 0 0 0; 0000 01 0 0 0; 0000 10 0 0 0; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
Cache reports... MISS, because the tags don’t match! Processor: Addr 101010 R W. Cache (Tag ID V D S): 0000 00 0 0 0; 0000 01 0 0 0; 0000 10 0 0 0; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
Data read from memory. Processor: Addr 101010 R W. Cache (Tag ID V D S): 0000 00 0 0 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
Data read from memory. The S bit indicates that this line is “shared”, which means other caches might have the same value. Processor: Addr 101010 R W. Cache (Tag ID V D S): 0000 00 0 0 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
From now on we will show these as 2-step operations… Step 1: the request. Processor: Addr 101010 R W. Cache (Tag ID V D S): 0000 00 0 0 0; 0000 01 0 0 0; 0000 10 0 0 0; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
Step 2… what was the result and the change to the cache. MISS. Processor: Addr 101010 R W. Cache (Tag ID V D S): 0000 00 0 0 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
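The two-step read above can be sketched as follows. This is a minimal model under stated assumptions, not the course's reference implementation: `Line` and `Cache` are names invented here, memory contents are not modelled, and evicting a dirty line (which would need a write-back) is left out because the slides do not cover it.

```python
from dataclasses import dataclass


@dataclass
class Line:
    tag: int = 0
    valid: bool = False
    dirty: bool = False
    shared: bool = False


class Cache:
    """4-line direct-mapped cache with V, D and S bits per line."""

    def __init__(self):
        self.lines = [Line() for _ in range(4)]

    def read(self, addr):
        """Processor read: report HIT or MISS and update the cache."""
        tag, index = addr >> 2, addr & 0b11
        line = self.lines[index]
        if line.valid and line.tag == tag:
            return "HIT"
        # Miss: the line comes from memory. Other caches might hold the
        # same value, so it is marked shared and clean.
        self.lines[index] = Line(tag=tag, valid=True, dirty=False, shared=True)
        return "MISS"


cache = Cache()
print(cache.read(0b101010))  # prints "MISS"
print(cache.lines[0b10])     # line 10 now holds tag 1010 with V=1, D=0, S=1
```

A second read of the same address would find the matching tag in a valid line and report a hit.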
A write... Processor: Addr 111100 R W. Cache (Tag ID V D S): 0000 00 0 0 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
Write miss. Processor: Addr 111100 R W. Cache (Tag ID V D S): 1111 00 1 1 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
Write miss. Keep in mind that since most cache configurations have multiple bytes per line, a write miss will actually require us to get the line from memory into the cache first, since we are only writing one byte into the line. Processor: Addr 111100 R W. Cache (Tag ID V D S): 1111 00 1 1 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
Note: The dirty bit signifies that the data in the cache is not the same as in memory. Processor: Addr 111100 R W. Cache (Tag ID V D S): 1111 00 1 1 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
Another read... Processor: Addr 101010 R W. Cache (Tag ID V D S): 1111 00 1 1 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
…this time a hit! HIT! Processor: Addr 101010 R W. Cache (Tag ID V D S): 1111 00 1 1 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
Now another write... Processor: Addr 111100 R W. Cache (Tag ID V D S): 1111 00 1 1 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
To a dirty line! This is a write hit, and since the shared bit is 0 we know we are in the exclusive state. Processor: Addr 111100 R W. Cache (Tag ID V D S): 1111 00 1 1 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
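The write miss and the later write hit can be sketched like this. The names `Line` and `write` are illustrative assumptions, and the sketch deliberately ignores two things: the fetch of the rest of the line from memory on a miss, and the bus traffic a write to a shared line would need.

```python
from dataclasses import dataclass


@dataclass
class Line:
    tag: int = 0
    valid: bool = False
    dirty: bool = False
    shared: bool = False


def write(lines, addr):
    """Processor write on a 4-line direct-mapped cache; returns what happened."""
    tag, index = addr >> 2, addr & 0b11
    line = lines[index]
    if line.valid and line.tag == tag:
        result = "WRITE HIT"
    else:
        # On a miss the whole line would first be fetched from memory,
        # because only part of the line is being written (not modelled).
        result = "WRITE MISS"
    # Either way the cached copy now differs from memory (dirty) and this
    # cache holds it exclusively (not shared).
    lines[index] = Line(tag=tag, valid=True, dirty=True, shared=False)
    return result


lines = [Line() for _ in range(4)]
print(write(lines, 0b111100))  # prints "WRITE MISS"
print(write(lines, 0b111100))  # prints "WRITE HIT"
```

After both writes, line 00 reads 1111 with V=1, D=1, S=0, matching the cache state shown on the slides.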
Now another processor, failing to find what it needs in its cache, goes to the bus… a “bus read miss”. Processor: Addr 000000 R W. Cache (Tag ID V D S): 1111 00 1 1 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 010101 R W. Memory.
Our cache, which is monitoring the bus (snooping), sees the miss but can’t help. Processor: Addr 000000 R W. Cache (Tag ID V D S): 1111 00 1 1 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 010101 R W. Memory.
Another bus request... Processor: Addr 000000 R W. Cache (Tag ID V D S): 1111 00 1 1 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 101010 R W. Memory.
Since we have this value in our cache, we can satisfy the request from our cache, assuming that this will be quicker than from memory. Processor: Addr 000000 R W. Cache (Tag ID V D S): 1111 00 1 1 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 101010 R W. Memory.
And another request. This time to a dirty line. Processor: Addr 000000 R W. Cache (Tag ID V D S): 1111 00 1 1 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 111100 R W. Memory.
We have to supply the value out of our cache since it is more current than the value in memory. Processor: Addr 000000 R W. Cache (Tag ID V D S): 1111 00 1 1 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 111100 R W. Memory.
We also mark it as shared. Why? Processor: Addr 000000 R W. Cache (Tag ID V D S): 1111 00 1 1 1; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
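The three snooping cases above (can't help, supply a clean copy, must supply a dirty copy) can be sketched in one function. `Line` and `snoop_bus_read` are hypothetical names for this illustration, and the returned strings only describe the action; no data is actually transferred.

```python
from dataclasses import dataclass


@dataclass
class Line:
    tag: int = 0
    valid: bool = False
    dirty: bool = False
    shared: bool = False


def snoop_bus_read(lines, addr):
    """React to another processor's read miss seen on the bus."""
    tag, index = addr >> 2, addr & 0b11
    line = lines[index]
    if not (line.valid and line.tag == tag):
        return "CANNOT HELP"
    # Another cache is about to hold this line too, so it is now shared.
    line.shared = True
    # A dirty copy is newer than memory, so it has to come from this cache.
    return "SUPPLY DIRTY DATA" if line.dirty else "SUPPLY CLEAN DATA"


# Cache state from the slides, listed as tag, V, D, S for lines 00..11.
lines = [Line(0b1111, True, True, False), Line(),
         Line(0b1010, True, False, True), Line()]
print(snoop_bus_read(lines, 0b010101))  # prints "CANNOT HELP"
print(snoop_bus_read(lines, 0b101010))  # prints "SUPPLY CLEAN DATA"
print(snoop_bus_read(lines, 0b111100))  # prints "SUPPLY DIRTY DATA"
```

After the last call, line 00 is still dirty but now has S=1, which is the state shown on the slide.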
If, for example, our next operation was a write to this line... Processor: Addr 111100 R W. Cache (Tag ID V D S): 1111 00 1 1 1; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
We would have to note that it was again exclusive and let the other caches know. ZAP. Processor: Addr 111100 R W. Cache (Tag ID V D S): 1111 00 1 1 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
We could then write repeatedly to this line, and since we have exclusive ownership no one has to know! Processor: Addr 111100 R W. Cache (Tag ID V D S): 1111 00 1 1 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 000000 R W. Memory.
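A possible sketch of the "ZAP" under write-invalidate: the first write to a shared line broadcasts an invalidate, and later writes stay silent because the line is exclusive again. The classes and the `peers` list are assumptions made for this example; a real bus would broadcast to all caches in hardware.

```python
from dataclasses import dataclass


@dataclass
class Line:
    tag: int = 0
    valid: bool = False
    dirty: bool = False
    shared: bool = False


class Cache:
    def __init__(self):
        self.lines = [Line() for _ in range(4)]
        self.peers = []            # assumption: other caches on the same bus
        self.invalidates_sent = 0  # counts bus messages, for illustration

    def _lookup(self, addr):
        line = self.lines[addr & 0b11]
        return line if line.valid and line.tag == addr >> 2 else None

    def write_hit(self, addr):
        """Write to a line this cache already holds."""
        line = self._lookup(addr)
        if line is None:
            raise KeyError("this sketch only covers write hits")
        if line.shared:
            # The ZAP: every other cache must drop its copy.
            self.invalidates_sent += 1
            for peer in self.peers:
                peer.snoop_invalidate(addr)
            line.shared = False    # exclusive again
        line.dirty = True

    def snoop_invalidate(self, addr):
        line = self._lookup(addr)
        if line is not None:
            line.valid = False


ours, other = Cache(), Cache()
ours.peers, other.peers = [other], [ours]
ours.lines[0] = Line(0b1111, True, True, True)    # dirty and shared
other.lines[0] = Line(0b1111, True, False, True)  # the other cache's copy
for _ in range(3):
    ours.write_hit(0b111100)
print(ours.invalidates_sent)  # prints 1: only the first write used the bus
```

This is the payoff of tracking the shared bit: repeated writes to an exclusive line cost no bus traffic.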
In a similar way we must respond to write misses by other caches. Processor: Addr 000000 R W. Cache (Tag ID V D S): 1111 00 1 1 0; 0000 01 0 0 0; 1010 10 1 0 1; 0000 11 0 0 0. Bus: Addr 101010 R W. Memory.
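The slides stop before showing the response, so the following is an assumption based on how write-invalidate snooping usually works: a cache that sees another cache's write miss for a line it holds must give up its copy, and if that copy is dirty it has to supply the data first. `Line` and `snoop_bus_write_miss` are names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Line:
    tag: int = 0
    valid: bool = False
    dirty: bool = False
    shared: bool = False


def snoop_bus_write_miss(lines, addr):
    """React to another cache's write miss seen on the bus (assumed behavior)."""
    tag, index = addr >> 2, addr & 0b11
    line = lines[index]
    if not (line.valid and line.tag == tag):
        return "IGNORE"
    # The writer will own the line exclusively, so our copy must go.
    action = "SUPPLY DIRTY DATA, THEN INVALIDATE" if line.dirty else "INVALIDATE"
    lines[index] = Line(tag=tag)  # V, D and S all cleared
    return action


# Cache state from the last slide; the bus carries a write miss for 101010.
lines = [Line(0b1111, True, True, False), Line(),
         Line(0b1010, True, False, True), Line()]
print(snoop_bus_write_miss(lines, 0b101010))  # prints "INVALIDATE"
print(lines[0b10].valid)                      # prints "False"
```

Under a write-update scheme the cache would instead refresh its copy with the new value; that variant is not shown here.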