Presentation 1: MAD MAC 525
Farhan Mohamed Ali (W2-1), Jigar Vora (W2-2), Sonali Kapoor (W2-3), Avni Jhunjhunwala (W2-4), Siven Seth (W2-5)
W2
25th January, 2006
Design Proposal
MAD MAC 525 Status:
• Project chosen
• Specifications defined
• Architecture (in progress)
• To be done
  • Verilog: gate-level design
  • Schematic
  • Floor plan
  • Layout
  • Extraction, LVS, post-layout simulation
MAD MAC and HDR
• MAD MAC accelerates FP16 blending to enable true HDR graphics
• What is HDR?
  • HDR = High Dynamic Range
  • Dynamic range is the ratio of the largest value of a signal to its smallest measurable value
  • The dynamic range of luminance in real-world scenes can be 100,000 : 1
  • With HDR rendering, pixel intensities are allowed to extend beyond the [0..1] range of traditional graphics
  • Nature isn’t clamped to [0..1] and neither should CG be
• In lay terms:
  • Bright things can be really bright
  • Dark things can be really dark
  • And the details can be seen in both
MAD MAC 525
• Multiply Accumulate unit (MAC)
  • Executes the function AB+C on 16-bit floating-point inputs
  • The multiply and the add proceed in parallel, which greatly speeds up the operation
  • Rounding is performed only once, so the result is more accurate than a separate multiply followed by an add
• Also known as:
  • Fused Multiply Add (FMA)
  • Multiply Add (MAD/MADD) in graphics shader programs
• Many applications benefit from a fast FMA
  • Graphics – HDR rendering, blending and shader ops
  • DSPs – computing vector dot-products in digital filters
  • Fast division and square root – eliminates extra hardware
• Available in many newer CPUs and DSPs because it’s so cool
• One ring (circuit) to rule them all!
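The single-rounding claim can be illustrated in software. The following is a minimal sketch, not the proposed hardware: it assumes the IEEE 754 half-precision format with round-to-nearest-even, uses Python's `struct` format code `e` to round to fp16, and the helper names `round_fp16`, `mad_separate` and `mad_fused` are hypothetical.

```python
import struct
from fractions import Fraction


def round_fp16(x: float) -> float:
    """Round a Python float to the nearest fp16 value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def mad_separate(a: float, b: float, c: float) -> float:
    """Multiply, round to fp16, then add and round again (two roundings)."""
    product = round_fp16(a * b)
    return round_fp16(product + c)


def mad_fused(a: float, b: float, c: float) -> float:
    """Compute A*B+C exactly, then round to fp16 once.

    Assumption: the exact result fits in a double for the operands used
    here, so converting through float adds no second rounding step.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return round_fp16(float(exact))


# Operands chosen so the low-order bits of the product matter.
a = b = 1 + 2.0 ** -10        # 1.0009765625, exactly representable in fp16
c = -(1 + 2.0 ** -9)          # -1.001953125, exactly representable in fp16

print("separate:", mad_separate(a, b, c))
print("fused:   ", mad_fused(a, b, c))
```

With these operands the exact product is 1 + 2^-9 + 2^-20. Rounding it to fp16 before the add discards the 2^-20 term, so the two-step version returns 0, while the fused version keeps it and returns 2^-20.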
Design Decisions:
• Implementing a 16-bit (fp16) format
  • 1-bit sign, 10-bit significand and 5-bit exponent
  • Range of 6.0e-8 to 6.5e4
  • Used in industry today in HLSL graphics shaders
  • Compatible with the OpenEXR format used in the latest games
• Adder – high-speed custom hybrid adder
• Multiplier – array multiplier, chosen for speed and ease of pipelining
• Speed – target >= 300 MHz on a 180 nm manufacturing process
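The bit layout and the quoted range can be checked with a short decoder. This is a sketch under stated assumptions: the slides do not give the exponent bias or the treatment of subnormals, so the code assumes the IEEE 754 half-precision layout (bias 15, subnormals, infinities and NaNs) that the OpenEXR half type also uses. The function name `decode_fp16` is hypothetical.

```python
def decode_fp16(bits: int) -> float:
    """Decode a 16-bit pattern: 1 sign bit, 5 exponent bits, 10 fraction bits.

    Assumes IEEE 754 half-precision conventions (exponent bias 15).
    """
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF

    if exponent == 0x1F:                      # all ones: infinity or NaN
        value = float("inf") if fraction == 0 else float("nan")
    elif exponent == 0:                       # zero or subnormal: 0.f * 2^-14
        value = fraction * 2.0 ** -24
    else:                                     # normal: 1.f * 2^(e - 15)
        value = (1 + fraction / 1024) * 2.0 ** (exponent - 15)
    return -value if sign else value


smallest = decode_fp16(0x0001)   # smallest positive (subnormal) value
largest = decode_fp16(0x7BFF)    # largest finite value

print("smallest positive:", smallest)
print("largest finite:   ", largest)
print("ratio largest/smallest:", largest / smallest)
```

Under these assumptions the smallest positive value is 2^-24 (about 6.0e-8) and the largest finite value is 65504 (about 6.5e4), matching the range on the slide. Their ratio is roughly 1.1e12, comfortably above the 100,000 : 1 luminance range quoted for real-world scenes.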
Estimated Transistor Count
• Registers (input, output, pipelining): 2500
• Array multiplier: 6000
• Adder: 2000
• Alignment shifter + leading-zero anticipator: 2000
• Normalize: 2000
• Rounding: 1500
• Special cases and control logic: 2000
• Total: 18000
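As a quick sanity check on the budget, the per-block estimates can be summed and expressed as shares of the total. The figures are copied from the slide; the dictionary name `estimates` is only illustrative.

```python
# Per-block transistor estimates, copied from the slide.
estimates = {
    "Registers (input, output, pipelining)": 2500,
    "Array multiplier": 6000,
    "Adder": 2000,
    "Alignment shifter + leading-zero anticipator": 2000,
    "Normalize": 2000,
    "Rounding": 1500,
    "Special cases and control logic": 2000,
}

total = sum(estimates.values())

# Print each block with its share of the overall budget.
for block, count in estimates.items():
    print(f"{block:<46} {count:>6} {100 * count / total:5.1f}%")
print(f"{'Total':<46} {total:>6}")
```

The array multiplier is the largest single item, at one third of the estimated total.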
Problems and Questions???
• A 32-bit format would be preferred for greater accuracy, but the pin count and transistor count would be beyond the scope of this class
• The transistor count is hard to estimate at this point