Explore the evolution of razor blade technology, processor advancements, and future predictions in this engaging discussion led by Charlie Brej. From Brej's Laws to the impact of compound interest, delve into the fascinating world of technological progression. Discover insights on productivity, design complexities, and the shifting landscape of the tech industry, all presented in an informative and thought-provoking manner. Join the conversation and envision the upcoming innovations that could shape our future.
Group Talk Charlie Brej APT Group University of Manchester Async Forum

Part 1: The Future According to Me

Razor Blades
Scheme 1: “Name” [Number] Plus/Extreme/Ultra/Turbo/?X
• Trac II Plus
• Core Quad Extreme
• Athlon 64 FX
• GeForce 8800 Ultra
Scheme 2: “Company Name” Fusion/Quattro/Mach
• Gillette Fusion, AMD Fusion, Ford Fusion
• Schick Quattro, NVIDIA Quadro, Audi Quattro
• Gillette Mach, ATI Mach, Ford Mustang Mach 1
• Maybe more soon…
[Image labels: 1901, 1971, 1998, 2004, 2005]

Razor Blade History

Prediction: 2007
[Figure labels: Jan-Sept, 15 Blade, Apple iShave]

Why did this not happen?
• Because you don’t need more than five blades on your razor
• Unless we grow larger faces
• Which hasn’t happened before, so we won’t need them for some time
• We don’t need more than four processors
• Unless we invent an automagic parallelism extractor
• Which we haven’t since the 60s, so we won’t need them for some time
• People will still demand faster single thread performance

Real Future
• Moore’s law will continue
• Transistor count doubles every 18 months
• Moving into 3rd dimension
• Intelligent transistors placed per person will remain constant
• Not copy-paste
• Verification becomes problematic
• Designs become very complicated

Productivity
• Managers: 40%
• Grunt Coder: 80%
• Sales: 0%
• Hero Coder: 100%
• Marketing: -20%
• Maintainers: 60%
• Admin: 20%
[Speech bubbles: “Can we make it pink?” and How about “Intel Terrano”]

Brej’s Law
• Person years per design doubles every 18 months
• Most transistors are copy-paste
• Verification becomes much more complex
• Hero coders become more rare
• People get stupider
• Marketing becomes more important

Brej’s Law
• 1985: 5 person years
• ARM
• 1997: 2560 person years
• Pentium II (about right)
• 2007: 81920 person years
• Intel has 94,000 employees
• AMD has 16,000
• A new design every 7 years
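
The doubling rule behind these figures can be sketched in a few lines. This is only an illustration: the constant and function names are made up here, and the observation that the slide's 2560 and 81920 correspond to exactly 9 and 14 doublings of the 1985 baseline is an inference from the arithmetic, not something the slide states.

```python
# Minimal sketch of the "person years double every 18 months" rule.
# All names are hypothetical; only the numbers come from the slide.
BASE_YEAR = 1985
BASE_PERSON_YEARS = 5
DOUBLING_PERIOD_YEARS = 1.5  # 18 months


def person_years(doublings):
    """Person years per design after a whole number of doublings."""
    return BASE_PERSON_YEARS * 2 ** doublings


def year_after(doublings):
    """Calendar year reached after that many 18-month periods."""
    return BASE_YEAR + doublings * DOUBLING_PERIOD_YEARS


for n in (9, 14):
    print(n, "doublings:", person_years(n), "person years around", year_after(n))
```

Nine and fourteen doublings land in 1998.5 and 2006, close to the 1997 and 2007 quoted on the slide, which fits its "about right" spirit.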

Brej’s Law
• 2028: Entire population of the USA are employed by Intel
• 2031: Entire population of China employed by AMD
• 2034: Entire world population working on creating Pentium 12
• 2090: Project to build Pentium 15 starts but hits a snag as universe finishes before the project does

“The most powerful force in the universe is compound interest” Albert Einstein
“And we didn't have any fancy Sony Playstation video games. We had the Atari 2600! There were no multiple levels or screens. It was just ONE screen, forever, and you could never win. The game just kept getting harder and faster until you died. Just like LIFE!” Ernest Cline

Back to the Future
• Transistors will be free
• Mostly consumed in memory
• Diminishing returns
• Single thread grinds to a halt
• Increase performance by 1% get 100% more money
• Fewer designs
• Very expensive and long lead up times
• Extend rather than redesign

Part 2: Wagging Logic: Non Throughput-Bound Design Methodology

Introduction
• Async performance
• Asynchronous logic is slow
• Wagging Logic
• Example circuits
• Red Star
• Design
• Results
• Conclusions

Data propagation
[Figure: logic stages separated by C-elements, with Latency and Cycle Time marked on a time axis from 0 to 12]

Control propagation
[Figure: logic stages separated by C-elements, with Latency and Cycle Time marked on a time axis from 0 to 12]

And then it gets worse
• Latency is at least six times lower than the cycle time
• Assumes all data arrives at the same time
• Assumes all acknowledgements arrive at the same time
• Actual number is somewhere between 10 and 100

What can we do
• Use two-phase signalling
• Halve the control delay
• Lose all average case advantages
• Fine grain pipelining
• Need to add 10+ latches per stage
• Adds latency
• Faster completion
• Anti-tokens, Early-drop latches…
• Careful timing analysis

Wagging Latches
• Alternate latch read/write
• Capacity of two latches
• Depth of one latch
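
As a rough software analogy for the three bullets above, the sketch below models a pair of latches that are written and read in alternation. It is not the circuit from the talk: the class and method names are invented for illustration, and real wagging latches are handshaking hardware rather than a Python object.

```python
class WaggingLatches:
    """Software analogy: N latches written and read in rotation.

    Each value sits in exactly one latch (depth of one), but up to
    N values can be held at once (capacity of N, two by default).
    """

    def __init__(self, ways=2):
        self.latches = [None] * ways
        self.full = [False] * ways
        self.write_idx = 0  # latch that takes the next write
        self.read_idx = 0   # latch that supplies the next read

    def can_write(self):
        return not self.full[self.write_idx]

    def write(self, value):
        if not self.can_write():
            raise RuntimeError("target latch still holds unread data")
        self.latches[self.write_idx] = value
        self.full[self.write_idx] = True
        self.write_idx = (self.write_idx + 1) % len(self.latches)

    def can_read(self):
        return self.full[self.read_idx]

    def read(self):
        if not self.can_read():
            raise RuntimeError("target latch is empty")
        value = self.latches[self.read_idx]
        self.full[self.read_idx] = False
        self.read_idx = (self.read_idx + 1) % len(self.latches)
        return value
```

A writer can deposit two values before the reader has taken any, yet no value ever has to ripple through a second latch on its way out.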

Wagging Logic
• Apply same method to the logic
• Rotate logic allowing one to set while others reset
[Figure labels: Set, Reset, Reset]

Single Channel Mixer

LCM Channels Mixer

Direct Connection Mixer

32bit Incrementer Example
[Figure labels: Reg, +1, HB, Slice 0, Slice 1, Slice 2]

32bit Incrementer
• Optimal Design: 3288 Operations, 3.04 GDs per operation
• Original Design: 77 Operations, 130 GDs per operation
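
The two result lines can be compared with a little arithmetic. The sketch below assumes "GDs" means gate delays and that each run is summarised by its operation count and average delay per operation; the dictionary layout and names are illustrative only.

```python
# Figures copied from the slide; the structure and names are hypothetical.
original = {"operations": 77, "gd_per_op": 130.0}
optimal = {"operations": 3288, "gd_per_op": 3.04}


def total_gate_delays(run):
    """Approximate length of a run: operations times delay per operation."""
    return run["operations"] * run["gd_per_op"]


# Ratio of per-operation delays, i.e. the throughput improvement.
speedup = original["gd_per_op"] / optimal["gd_per_op"]

print(total_gate_delays(original), total_gate_delays(optimal), speedup)
```

Both products come out near 10,000 gate delays, which suggests (as an inference, not a statement from the slide) that the two designs were run for the same length of time, with the wagged version completing roughly 43 times as many operations.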

32bit Accumulator Example
• Load or Accumulate

32bit Accumulator Example
[Figure labels: Load, Accumulate, Accumulate, Load, Accumulate, Load]

32bit Accumulator Example

Transistors are “Free”
• What is expensive?
• Design effort
• Time to market
• Yield
• What we want
• Simple
• Copy-Paste
• Redundancy

Redundancy
[Figure: six slices]

Arrangement
[Figure: several arrangements of slices, labelled Slice 0 to Slice 5]

Teaching Monkeys
• Dynamic extraction of parallelism
• Implicit data dependency tracking
• No locking
• No polling
• No handshakes
• Average case performance

Red Star
• MIPS ISA
• 32bit RISC
• Fast and simple development
• Use synchronous design methodology
• Complicated features without complicated design effort
• OOO execution, banked caching…

Red Star

Register Bank

ADD R1, R1, #1
• 1401 Operations
• 7.14 GDs per operation

Branch Logic
[Figure labels: PC, +1, +]
• Additional unnecessary stages to extend the branch shadow

Overlapping Instructions
[Figure: eight overlapping instructions, each passing through Fetch, Decode, Execute, Memory, Dummy, WriteBack, with the Branch Shadow marked]

Nine Instruction Loop

Caching: 4 Instruction Loop
• Program: 0: Instruction, 1: Instruction, 2: Instruction, 3: Branch 0
[Figure: RAM holding addresses 0 to 7 alongside the Slice 0 to Slice 3 caches; addresses 0, 1, 2 and 3 each appear once in the caches]

Caching: 3 Instruction Loop
• Program: 0: Instruction, 1: Instruction, 2: Branch 0
[Figure: RAM holding addresses 0 to 7 alongside the Slice 0 to Slice 3 caches; addresses 0, 1 and 2 appear in several slice caches]

Caching: Delayed Branch
• Program: 0: Instruction, 1: Instruction, 2: Branch 0
• If (PC % WagLevel != Slice): execute a NOP, don’t increment the PC
[Figure: RAM holding addresses 0 to 7 alongside the Slice 0 to Slice 3 caches; addresses 0, 1 and 2 each appear once in the caches, and a NOP is shown]
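
The effect of the delayed-branch rule on the slice caches can be illustrated with a small simulation. This is a sketch under simplifying assumptions: slices take turns in strict rotation, one instruction is issued per turn, and the last instruction of the loop branches back to address 0. The function name and its parameters are hypothetical, not taken from the Red Star design.

```python
def simulate(loop_len, wag_level, steps, delayed_branch):
    """Return (per-slice sets of cached addresses, number of NOPs issued)."""
    caches = [set() for _ in range(wag_level)]
    nops = 0
    pc = 0
    for step in range(steps):
        slice_id = step % wag_level  # slices take turns in rotation
        if delayed_branch and pc % wag_level != slice_id:
            # Rule from the slide: execute a NOP, don't increment the PC.
            nops += 1
            continue
        caches[slice_id].add(pc)  # this slice fetches and caches address pc
        # The last instruction of the loop is "Branch 0".
        pc = 0 if pc == loop_len - 1 else pc + 1
    return caches, nops


print(simulate(4, 4, 12, delayed_branch=False))
print(simulate(3, 4, 12, delayed_branch=False))
print(simulate(3, 4, 12, delayed_branch=True))
```

In this model the four-instruction loop maps one address to each slice. The three-instruction loop without the rule drifts across slices until every cache holds every address. With the rule, one slice idles on a NOP each iteration and each address stays in a single cache.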

Caching
• Instead of one large 16Kb cache
• 12bit address
• 16 small 1Kb caches
• 8bit address
• Approximately 50% faster lookup
• No data duplication
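
One way to read the 12-bit versus 8-bit figures is that the remaining four address bits pick one of the 16 slices. The sketch below assumes the low bits select the slice, in line with the PC % WagLevel test on the delayed-branch slide; that bit assignment and all names here are assumptions rather than details confirmed by the talk.

```python
WAG_LEVEL = 16          # number of slices, each with its own small cache
WORDS_PER_SLICE = 256   # 8-bit local address


def split_address(addr):
    """Split a 12-bit word address into (slice index, local cache address)."""
    if not 0 <= addr < WAG_LEVEL * WORDS_PER_SLICE:
        raise ValueError("address outside the 12-bit range")
    return addr % WAG_LEVEL, addr // WAG_LEVEL


print(split_address(0))     # first word, held by slice 0
print(split_address(17))    # second word held by slice 1
print(split_address(4095))  # last word, held by slice 15
```

Because every address maps to exactly one slice, no word needs to be stored twice, which matches the "No data duplication" bullet.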

Area
• ~4 times larger than synchronous
• Times the number of slices
• Currently 45,000 gates per slice
• 15,000 gates without the register bank
• Approx 6 million transistors (16 way)
• 2 million without the register bank
• Final design target: 4 million transistors
• Don’t wag the register bank (66% of area)
• Simplify completion detection (50% of area)
• Technology mapper
• Complete the ISA
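
The area figures are consistent with each other, as a quick check shows. The transistors-per-gate ratio below is derived from the slide's own numbers and is not a figure the talk gives; the variable names are illustrative.

```python
SLICES = 16
GATES_PER_SLICE = 45_000
GATES_PER_SLICE_NO_REGBANK = 15_000
TRANSISTORS_16_WAY = 6_000_000  # "approx" on the slide

total_gates = SLICES * GATES_PER_SLICE
# Implied average, inferred from the slide's figures.
transistors_per_gate = TRANSISTORS_16_WAY / total_gates

transistors_no_regbank = SLICES * GATES_PER_SLICE_NO_REGBANK * transistors_per_gate
regbank_share = 1 - GATES_PER_SLICE_NO_REGBANK / GATES_PER_SLICE

print(total_gates, round(transistors_per_gate, 2))
print(round(transistors_no_regbank), round(regbank_share, 3))
```

The result reproduces the 2 million transistors quoted without the register bank, and the register bank's share of about two thirds agrees with the "66% of area" bullet.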

How much is 4 million?