EV7
Peter Bannon, Staff Fellow, HP
pbannon@hp.com
Alpha Microprocessor Roadmap BARC 2003
EV7 Features
• Alpha 21264 core with enhancements
• Integrated L2 cache
• Integrated memory controller
• Integrated network interface
EV7 System Block Diagram
[Figure: a grid of twelve processors labeled 364, each attached to its own memory (M) and I/O (IO).]
[Figure: untitled chip diagram. Block labels: EV68 Core, L2 Tag, L2 Data (two blocks), Router (N, E, S, W), I/O, Mem0, Mem1, and two groups marked P 0 1 2 3 and P 4 5 6 7.]
Integrated L2 Cache
• 1.75 MB, 7-way set associative, with ECC
• 20 GB/s total read/write bandwidth
• 16 victim buffers for L1 -> L2
• 16 victim buffers for L2 -> memory
• 9.6 ns load-to-use latency
• A tag access can start every cycle
• Data access in 4-cycle blocks
• Coupled tag/data access to minimize latency
• Decoupled tag access to minimize resource use
Two Integrated Memory Controllers
• RDRAM memory
• Directly connected to the processor
• High data capacity per pin
• 800 Mb/s operation
• 75 ns load-to-use latency
• 12.8 GB/s peak bandwidth
• 6 GB/s read or write bandwidth
• 2048 open pages
• 64-entry directory-based cache coherence engine
• ECC SECDED
• Optional 4+1 parity in memory
ZBox Block Diagram
[Figure: three sections labeled Cache Coherence Engine, DRAM Scheduling, and Data Path. Other labels include a queue of 32 memory references, PRQ, RSQ, RCAS, WCAS, slot map, CC state machine, ROW/COL outputs, PA, CHK, COR, REMAP, data in/out, directory in/out, and data to core.]
Integrated Network Interface
• Direct processor-to-processor interconnect
• 4 links, 6.4 GB/s per link
• 32 bits + ECC at 800 Mb/s in each direction
• 18 ns processor-to-processor latency
• ECC (single error correct, double error detect) per hop
• Out-of-order network with adaptive routing
• IO, Request, Forward, Special, and Response channels
• V0, V1, and Adaptive virtual networks
• Asynchronous clocking between processors
• 3 GB/s I/O interface per processor
Rbox Block Diagram
Class         Bytes   Adap.   V0   V1
Request         12      8      1    1
Forward         12      8      1    1
Block Resp.     76      3      1    1
Resp.           12      8      1    1
Write I/O       76      1      2    2
Read I/O        12      1      2    2
Special         12      8      0    0
~1KB per port
[Figure: eight input queues (IQ) fed by ports E, W, N, S, IO, C, Z0, Z1; output blocks (O) driving ports E, W, N, S, IO, L0, L1.]
while (p) p=*p;
CPU INT 2000
CPU FP 2000
Database Performance
64P Running TTOY, 32 memory controllers
64P Running TTOY, 64 memory controllers