MAD MAC 525
Farhan Mohamed Ali, Jigar Vora, Sonali Kapoor, Avni Jhunjhunwala
Design Manager: Zack Menegakis
1st May, 2006
Final Presentation
Design a crucial part of a GPU called the Multiply Accumulate Unit (MAC) which is revolutionizing graphics
Agenda
• Marketing – Jigar
• Project and Algorithm Description – Farhan
• Implementation Part I – Farhan
• Implementation Part II – Sonali
• Floorplan – Sonali
• Layout – Avni
• Verification – Avni
• Design Specifications – Avni
• Conclusion – Jigar
Marketing – Jigar
Purpose
• MAD MAC 525 accelerates FP16 blending to enable true HDR graphics
• Huh??
Marketing Description Implementing Floorplan Layout Verify Specifications
Beauty of High Dynamic Range
• With HDR rendering, pixel intensity can extend beyond the range of traditional graphics
• Nature doesn’t have a limited pixel intensity and neither should computer graphics
• In other words:
  • Bright things can be really bright
  • Dark things can be really dark
  • And the details can be seen in both
Applications of HDR
Target Market
• Target Market Segment
  • Graphics chip manufacturers
  • High speed DSP manufacturers
  • CPU co-processors
• Potential Customers
Design Comparison
• Top 180nm graphics chip is the NVIDIA NV16
  • Highest speed only 250 MHz
  • 9-bit integer precision
• As games become more advanced, they need faster graphics chips
• Conclusion: Market Needs a FAST MAD MAC
Project Description
• Multiply Accumulate unit (MAC)
• Executes the function AB+C on 16-bit floating point inputs
• Format – 1 bit sign, 5 bit exponent and 10 bit significand
• Multiply and add run in parallel to greatly speed up the operation
• Rounding is performed only once, giving greater accuracy than separate multiply and add operations
• Also known as:
  • Fused Multiply Add (FMA)
  • Multiply Add (MAD/MADD) in graphics shader programs
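To make the 1/5/10-bit format concrete, here is a minimal decoding sketch. It assumes the IEEE 754 half-precision conventions (exponent bias of 15, hidden leading 1, all-ones exponent reserved for infinity and NaN); the slide gives only the field widths, so those conventions and the function name `decode_fp16` are assumptions.

```python
def decode_fp16(bits):
    """Split a 16-bit pattern into its fields and compute the value it encodes.

    Assumes IEEE 754 half-precision conventions (bias 15, hidden leading 1).
    Returns (sign, exponent_field, fraction_field, value).
    """
    sign = (bits >> 15) & 0x1        # 1 sign bit
    exponent = (bits >> 10) & 0x1F   # 5 exponent bits
    fraction = bits & 0x3FF          # 10 stored significand bits

    if exponent == 0:                # zero or subnormal: no hidden 1
        value = fraction * 2.0 ** -24
    elif exponent == 0x1F:           # infinity or NaN
        value = float("inf") if fraction == 0 else float("nan")
    else:                            # normal number: hidden 1 restored
        value = (1 + fraction / 1024) * 2.0 ** (exponent - 15)

    return sign, exponent, fraction, -value if sign else value


print(decode_fp16(0x3C00))  # (0, 15, 0, 1.0)
print(decode_fp16(0xC100))  # (1, 16, 256, -2.5)
```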
Algorithm
• FP Multiply (A*B)
  • Multiply significands
  • Add exponents
  • Normalize
  • Round
• FP Add (A+B)
  • Align smaller number to larger number
  • Add significands
  • Normalize
  • Round
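The four multiply steps can be sketched at the bit level. This is a simplified illustration, not the MAD MAC datapath: it assumes IEEE 754 half-precision encoding with bias 15 and round-to-nearest-even, and it handles only normal operands whose product is also normal. The names `fp16_bits` and `fp16_multiply` are hypothetical.

```python
import struct

BIAS = 15  # assumed IEEE 754 half-precision exponent bias


def fp16_bits(x):
    """Bit pattern of x after rounding to half precision (reference helper)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def fp16_multiply(a_bits, b_bits):
    """Multiply two normal FP16 bit patterns; the result must also be normal."""
    sign = ((a_bits ^ b_bits) >> 15) & 1
    exp_a = (a_bits >> 10) & 0x1F
    exp_b = (b_bits >> 10) & 0x1F
    sig_a = (a_bits & 0x3FF) | 0x400  # restore hidden 1: 11-bit significand
    sig_b = (b_bits & 0x3FF) | 0x400

    product = sig_a * sig_b           # step 1: multiply significands (22 bits)
    exponent = exp_a + exp_b - BIAS   # step 2: add exponents, remove one bias

    # Step 3: normalize. The product lies in [1, 4); shift one extra place
    # when it is 2 or more.
    shift = 10
    if product >> 21:
        shift = 11
        exponent += 1

    # Step 4: round to nearest, ties to even.
    kept = product >> shift
    remainder = product & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if remainder > half or (remainder == half and kept & 1):
        kept += 1
    if kept >> 11:                    # rounding carried into a new top bit
        kept >>= 1
        exponent += 1

    if not 0 < exponent < 31:
        raise OverflowError("result is not a normal FP16 number")
    return (sign << 15) | (exponent << 10) | (kept & 0x3FF)


# 1.5 * 2.5 = 3.75
print(hex(fp16_multiply(fp16_bits(1.5), fp16_bits(2.5))))  # 0x4380
```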
Algorithm
• FP Multiply-Add (AB+C)
  • Align significand C based on the exponent difference (exp A + exp B − exp C)
  • Multiply significands A and B
  • Add the A*B significand result to the aligned significand C
  • Normalize
  • Round
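The fused sequence rounds once, whereas a separate multiply followed by an add rounds twice. The sketch below shows the numerical difference at the value level rather than the bit level. It relies on Python's `struct` half-precision format for rounding and on exact rational arithmetic for the fused result; the operand values are chosen for illustration and do not come from the slides.

```python
import struct
from fractions import Fraction


def round_fp16(x):
    """Round a Python float to the nearest half-precision value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def mac_separate(a, b, c):
    """Multiply, round, add, round: two rounding steps."""
    return round_fp16(round_fp16(a * b) + c)


def mac_fused(a, b, c):
    """Compute a*b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    # Going through a double first could, in rare cases, round twice;
    # for the operands below the exact result fits in a double.
    return round_fp16(float(exact))


# All three operands are exactly representable in FP16.
a = 1 + 2**-10
b = 1 + 2**-9
c = -(1 + 3 * 2**-10)

print(mac_separate(a, b, c))  # 0.0
print(mac_fused(a, b, c))     # 1.9073486328125e-06, i.e. 2**-19
```

Here the low-order bit of the product (2**-19) is discarded by the intermediate rounding in the separate version, so the subsequent add cancels to zero; the fused version keeps it.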
Block Diagram
[Figure: block diagram with inputs A, B, C; blocks Multiplier, Exp Calc, Align, Adder, Leading 0 Anticipator, Normalize, Round, Ovf Checker; output Y]
Implementation
• Design target: 300 MHz
• Speed is the design goal
• Ambitious target?
• How we planned to achieve this
  • Fast logic – parallelize operations as much as possible
  • Pipelining
Implementation
• Adder
  • Carry Select vs Carry Lookahead tree
Implementation
• Adder
  • Han-Carlson based carry lookahead adder
  • 6 lookahead logic stages for 32 bit adder
  • Less logic than a Kogge-Stone adder
  • Less wiring than a Brent-Kung adder
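The six-stage count follows from the textbook Han-Carlson structure: one stage that merges each odd bit with its even neighbour, log2(32) − 1 = 4 Kogge-Stone stages over the odd bits only, and one final stage that fills in the even bits. The bit-level model below is a sketch of that textbook structure, not the team's netlist; the function name and return tuple are assumptions.

```python
def han_carlson_add(a, b, width=32):
    """Add two unsigned width-bit integers with a Han-Carlson prefix network.

    Returns (sum, carry_out, prefix_stage_count).
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # bit propagate
    G, P = g[:], p[:]                                      # group signals
    stages = 0

    # Stage 1: every odd bit absorbs the even bit just below it.
    for i in range(1, width, 2):
        G[i] = g[i] | (p[i] & g[i - 1])
        P[i] = p[i] & p[i - 1]
    stages += 1

    # Middle stages: Kogge-Stone prefix over the odd bits only.
    distance = 2
    while distance < width:
        G_prev, P_prev = G[:], P[:]
        for i in range(1, width, 2):
            if i - distance >= 1:
                G[i] = G_prev[i] | (P_prev[i] & G_prev[i - distance])
                P[i] = P_prev[i] & P_prev[i - distance]
        stages += 1
        distance *= 2

    # Final stage: each even bit takes the carry from the odd bit below it.
    for i in range(2, width, 2):
        G[i] = g[i] | (p[i] & G[i - 1])
    stages += 1

    # Sum logic: carry into bit i is the group generate of bits 0..i-1.
    carries = [0] + G[:-1]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, G[width - 1], stages


print(han_carlson_add(0xFFFFFFFF, 1))  # (0, 1, 6)
```

Only half of the bit positions carry prefix cells in the middle stages, which is where the logic saving over Kogge-Stone comes from, at the cost of one extra stage.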
Implementation
• Multiplier
  • Carry-Save Multiplier
  • Avoids having ripple carry in every stage
  • Enables regular and compact layout
  • Easy to pipeline
  • Final 10 bit add stage using carry lookahead adder
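The carry-save idea can be sketched at the word level: each partial product is folded into a (sum, carry) pair with a 3:2 compressor, so no carry ripples until one final carry-propagate add. The operand width of 11 bits (10 stored bits plus a hidden 1) is an assumption, and this model performs the final add over the full width rather than only the upper bits as an array layout would.

```python
def carry_save_add(x, y, z):
    """3:2 compressor on whole words: x + y + z == sum_word + carry_word."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def carry_save_multiply(a, b, width=11):
    """Multiply unsigned integers; b is assumed to fit in `width` bits."""
    sum_word, carry_word = 0, 0
    for i in range(width):
        # Partial product for bit i of b.
        partial = (a << i) if (b >> i) & 1 else 0
        # Fold it in without propagating carries.
        sum_word, carry_word = carry_save_add(sum_word, carry_word, partial)
    # Single carry-propagate add (a carry lookahead adder in hardware).
    return sum_word + carry_word


print(carry_save_multiply(1536, 1280))  # 1966080
```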
Implementation
• Leading Zero Anticipator
  • Predicts the number of shifts needed in normalize
  • Normalize begins with zero delay
  • Operates in parallel with the adder, so the normalize shift can be predicted to within one shift position left or right
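What the anticipator predicts is the leading-zero count of the adder result. The sketch below computes that count directly from the sum and applies the normalizing shift. It is a plain leading-zero counter, not the anticipation logic itself, which the slides do not detail; the function names and the 8-bit example width are assumptions.

```python
def leading_zeros(value, width):
    """Number of zero bits above the most significant 1 in a width-bit word."""
    if value == 0:
        return width
    return width - value.bit_length()


def normalize(value, exponent, width):
    """Shift left until the top bit is 1 and adjust the exponent to match."""
    shift = leading_zeros(value, width)
    if shift == width:  # an all-zero word cannot be normalized
        return 0, exponent
    mask = (1 << width) - 1
    return (value << shift) & mask, exponent - shift


# 0b00010110 has three leading zeros in an 8-bit word.
print(normalize(0b00010110, 5, 8))  # (176, 2), and 176 == 0b10110000
```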
Implementation
• Latches
  • Pulse Latches
  • Practically eliminates setup time
  • 16 transistors per pulse generator
  • Simplified version of those used in a certain high speed CPU
[Figure: Clock pulse generator]
Design Decision: Pass Logic
• Extensive use of Pass Logic
  • Reduces transistor count
  • Reduces area
• Transistor count reduced from 20,200 to 12,800
• Example
  • Normalize: 3400 -> 942
  • Align: 1500 -> 530
• Ensure all pass logic is buffered
Design Decision: Pipelining
• Initially planned 6 pipeline stages
• Reduced to 4 pipeline stages
  • Adder – Fast Carry Lookahead architecture
  • Multiplier – Ripple Carry to Carry Lookahead
Pipeline Stages
[Figure: pipeline diagram with blocks Reg A, Reg B, Reg C, Multiplier, Exp Calc, Align C, Adder, Ld Zero, Normalize, Round, Output]
Schematics
• Multiplier
[Figure: multiplier schematic; labels: Inputs, Pipeline, Outputs]
Schematic
• Adder
[Figure: adder schematic; labels: Inputs, six Look Ahead Logic stages, Sum Logic, Outputs]
Floorplan Evolution
Initial Floorplan
[Figure: floorplan with blocks Multiplier, Reg A, Reg B, Reg C, Exp Calc, Align C, Pipeline Reg (three), Adder, Ld Zero, Round, Normalize, Overflow checker, Reg Y]
Floorplan Evolution
Final Floorplan
[Figure: floorplan with blocks Reg A, Reg B, Reg C, Exponents, Multiplier, Ld zero, Align, Adder, Ovf, Normalize, Round, Output]
Layout Decisions
• 3 cell heights – 6.03, 5.04 and 3.55
• Uniform width vdd and ground rails
• Wider vdd and ground rails in power hungry modules
• Max of 8 latches per clock pulse generator
• Uniform metal directionality within each block
Final Layout
Final Layout
[Figure: layout with the Multiplier highlighted]
Multiplier
• Height: 191.6
• Width: 206.38
• Area: 20,388
[Figure: multiplier layout; labels: In, Bit Slice, Pipeline Reg, Output]
Final Layout
[Figure: layout with the Multiplier and Adder highlighted]
Adder
• Height: 122.9
• Width: 110.2
• Area: 13,202
[Figure: adder layout; labels: Adder, Incrementer]
Final Layout
[Figure: layout with blocks Input, Exponents, Multiplier, Ld zero, Align, Adder, Ovf, Normalize, Round, Out]
Layer Masks
• Active: 14.04%
• Poly: 9.25%
• Metal 1: 34.08%
• Metal 2: 18.00%
• Metal 3: 14.99%
• Metal 4: 6.23%
Verification Of Design
• Behavioral and Structural Verilog
• Extensive Testing – Unable to find C or Matlab Code
• Schematic and Layout testing
• Analog Simulations – Compare Output with Behavioral
• Full Chip Verification
Design Specifications
• Critical path delay = 2.25 ns
• Clock speed = 400 MHz
• Pipeline stages = 4
• Height by width = 195.26 um * 303.255 um
• Area = 59,214 um^2
• Aspect ratio = 1:1.55
• Transistor density = 0.22
• Total Pin Count = 67
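The derived figures can be cross-checked with a few lines of arithmetic. This sketch assumes the density figure is transistors per um^2 and that it uses the 12,800 transistor count quoted on the pass logic slide; neither assumption is stated on this slide.

```python
height_um = 195.26
width_um = 303.255
critical_path_ns = 2.25
transistors = 12_800  # assumed: post-pass-logic count from the earlier slide

area_um2 = height_um * width_um
aspect_ratio = width_um / height_um
density = transistors / area_um2           # transistors per um^2 (assumed unit)
max_clock_mhz = 1000 / critical_path_ns    # upper bound set by the critical path

print(round(area_um2))         # 59214
print(round(aspect_ratio, 2))  # 1.55
print(round(density, 2))       # 0.22
print(round(max_clock_mhz))    # 444
```

A 2.25 ns critical path allows a clock of up to about 444 MHz, so the quoted 400 MHz leaves some timing margin.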
Conclusion – Jigar
Everyone Needs a MAD MAC
• Graphics – HDR Rendering, Blending and Shader ops
  • Fastest 180nm GPU: 250 MHz (9-bit Int)
  • MAD MAC 525: 400 MHz (16-bit FP)
Everyone Needs a MAD MAC
• DSPs – Computing Vector Dot-Products in Digital Filters