Jaguar: A Next-Generation Low-Power x86-64 Core • University of Tehran, School of Electrical and Computer Engineering • Presented by: Ali Teymouri • Based on the article “Jaguar: A Next-Generation Low-Power x86-64 Core” • Course: Custom Implementation of DSP Systems
Outline • Introduction • Motivation • Comparing the two cores • Architecture • Improvements • Conclusion
List of AMD microprocessors [5] • 1 AMD-originated architectures • 1.1 Am2900 series (1975) • 1.2 29000 (29K) (1987–95) • 2 Non-x86 architecture processors • 2.1 Second source (1974) • 2.2 Second source (1982) • 3 x86 architecture processors • 3.1 Second source (1979–91) • 3.2 Am X86 series (1991–95) • 3.3 K5 architecture (1995) • 3.4 K6 architecture (1997–2001) • 3.5 K7 architecture (1999–2005) • 3.6 K8 core architecture • 3.7 K10 core architecture • 3.8 Bulldozer module architecture • 3.9 Bobcat core architecture
Bobcat • Core power gating and a microarchitecture optimized for low power • Designed for mobile and tablet devices to address specific customer demands • 4.5–18 W power range • Figure: “Bobcat” low-power core [2]
Jaguar core • Figures: “Bobcat” low-power core [4]; “Jaguar” core [4]
Jaguar CU • AMD's first 28 nm quad-core x86-64 compute unit (CU) • Built as a unit that can be deployed in a wide variety of SoCs for different applications • Spans a wide array of applications, from under 5 W to 25 W • Figure: Jaguar CU [4]
Motivation • Build Jaguar-based SoCs to fit a range of markets: – Tablets and hybrids – Value notebooks – Ultrathin notebooks – Value desktops [1]
Architecture • Improved IPC, frequency, and power relative to “Bobcat” (BT) • Estimated typical IPC improvement over “Bobcat”: >15%* • Redesigned load-store unit • 4×32 B instruction-cache loop buffer to save power • Improved instruction-cache prefetcher for higher IPC • Added L2 prefetcher • Added hardware integer divider • Improved C6 and CC6 entry/exit latencies • Clock gating of >92% of flops in typical applications
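The IPC figure can be related to overall speed through the standard relation performance ∝ IPC × clock frequency, which holds when the instruction count stays the same. The sketch below illustrates that relation under this assumption. The function name `relative_speedup` and the 10% frequency gain in the usage lines are hypothetical and do not come from the source.

```python
def relative_speedup(ipc_gain: float, freq_gain: float) -> float:
    """Speedup over a baseline core, assuming the same instruction count.

    Execution time = instructions / (IPC * frequency), so the speedup is the
    product of the IPC ratio and the frequency ratio.
    """
    ipc_ratio = 1.0 + ipc_gain
    freq_ratio = 1.0 + freq_gain
    return ipc_ratio * freq_ratio


if __name__ == "__main__":
    # 15% IPC gain at the same clock (the slide's lower bound), then with a
    # hypothetical 10% frequency gain added purely for illustration.
    print(f"IPC only:        {relative_speedup(0.15, 0.0):.3f}x")
    print(f"IPC + 10% clock: {relative_speedup(0.15, 0.10):.3f}x")
```

The two gains multiply rather than add, so a modest frequency increase on top of an IPC gain gives more than their sum.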
Architecture • The JG core is optimized for two main frequency targets, at low and high voltage • This gives the core the dynamic range to serve several markets • 3-Vt solution: • HVT/RVT/LVT (high, regular, and low threshold voltage) • Longer gate-length variants for each Vt • BT used a 10-layer metal stack • JG uses an 11-layer metal stack [1]
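A 3-Vt library lets a design trade leakage against speed path by path. One common, generic approach gives each path the slowest, lowest-leakage flavor that still meets the clock period. This is not necessarily AMD's flow. The sketch below makes two simplifying assumptions: one flavor applies to a whole path, and the delay and leakage scale factors (relative to RVT) are hypothetical numbers chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VtFlavor:
    name: str
    delay_scale: float    # hypothetical path delay relative to RVT
    leakage_scale: float  # hypothetical leakage relative to RVT


# Ordered from slowest/lowest leakage to fastest/highest leakage.
FLAVORS = (
    VtFlavor("HVT", delay_scale=1.25, leakage_scale=0.2),
    VtFlavor("RVT", delay_scale=1.00, leakage_scale=1.0),
    VtFlavor("LVT", delay_scale=0.85, leakage_scale=5.0),
)


def assign_vt(path_delays_ps: list[float], clock_period_ps: float) -> list[Optional[str]]:
    """Give each path the lowest-leakage flavor that still meets the clock period.

    path_delays_ps are RVT delays; None marks a path that no flavor can close.
    """
    assignment: list[Optional[str]] = []
    for rvt_delay in path_delays_ps:
        for flavor in FLAVORS:  # try the lowest-leakage flavor first
            if rvt_delay * flavor.delay_scale <= clock_period_ps:
                assignment.append(flavor.name)
                break
        else:
            assignment.append(None)
    return assignment


def total_leakage(assignment: list[Optional[str]]) -> float:
    """Sum the relative leakage of all paths that were assigned a flavor."""
    by_name = {f.name: f.leakage_scale for f in FLAVORS}
    return sum(by_name[name] for name in assignment if name is not None)


if __name__ == "__main__":
    # Hypothetical RVT path delays against a 550 ps clock period.
    result = assign_vt([400, 500, 560, 700], clock_period_ps=550)
    print(result, total_leakage(result))
```

Fast, leaky LVT devices end up only where timing demands them. The remaining paths fall back to RVT or HVT to save leakage.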
High-Speed Flop • Custom-built flip-flops [4] maximize performance compared with traditional master-slave flops • Larger flops consume more dynamic power • To minimize the power and area impact, they are inserted only in critical paths • Custom flops account for <8% of flops [1]
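The "critical paths only" rule can be illustrated with a small selection routine. It upgrades only endpoints that miss timing with a standard flop, then reports the upgraded fraction. This is a hypothetical sketch, not AMD's methodology. The endpoint names, the slack values, and the assumed per-swap delay gain `flop_gain_ps` are all invented for illustration.

```python
def select_fast_flops(endpoint_slack_ps: dict[str, float], flop_gain_ps: float):
    """Choose which flop endpoints get a faster (larger, hungrier) custom flop.

    endpoint_slack_ps: timing slack in ps at each endpoint with a standard flop.
    flop_gain_ps: hypothetical delay saved by swapping in a custom flop.
    Returns (upgraded, still_failing, upgraded_fraction).
    """
    # Upgrade only endpoints that miss timing, to limit the power/area cost.
    upgraded = sorted(name for name, slack in endpoint_slack_ps.items() if slack < 0)
    # Endpoints still failing after the swap would need other timing fixes.
    still_failing = [name for name in upgraded
                     if endpoint_slack_ps[name] + flop_gain_ps < 0]
    fraction = len(upgraded) / len(endpoint_slack_ps) if endpoint_slack_ps else 0.0
    return upgraded, still_failing, fraction


if __name__ == "__main__":
    # Hypothetical endpoints and slacks.
    slacks = {"ff0": -12.0, "ff1": -3.0, "ff2": 25.0, "ff3": 40.0, "ff4": 8.0}
    print(select_fast_flops(slacks, flop_gain_ps=10.0))
```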
CU-Level Clock Distribution • Matched clock delay to all endpoints to minimize latency • Extensive clock gating • Each unit's clock is independently gated to reduce dynamic power [1]
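Gating the clock saves dynamic power because of the usual switching-power relation P = αCV²f. The sketch below rests on three simplifications: the clocked load is one lumped capacitance, gated flops draw no clock power, and gating-cell overhead is ignored. The capacitance, voltage, and frequency in the usage lines are hypothetical. The 0.92 gated fraction echoes the ">92% of flops" clock-gating figure on the Architecture slide.

```python
def dynamic_power_w(activity: float, cap_f: float, vdd_v: float, freq_hz: float) -> float:
    """Switching power P = alpha * C * V^2 * f."""
    return activity * cap_f * vdd_v ** 2 * freq_hz


def clock_power_with_gating_w(clock_cap_f: float, vdd_v: float, freq_hz: float,
                              gated_fraction: float) -> float:
    """Clock-network power when a fraction of the clocked load is gated off.

    Ungated clock nets toggle every cycle (activity = 1); gated ones are
    assumed to draw no clock power. Gating-cell overhead is ignored.
    """
    active_cap = clock_cap_f * (1.0 - gated_fraction)
    return dynamic_power_w(1.0, active_cap, vdd_v, freq_hz)


if __name__ == "__main__":
    # Hypothetical 1 nF clocked load at 0.9 V and 1.5 GHz.
    ungated = clock_power_with_gating_w(1e-9, 0.9, 1.5e9, 0.0)
    gated = clock_power_with_gating_w(1e-9, 0.9, 1.5e9, 0.92)
    print(f"ungated: {ungated:.3f} W, 92% gated: {gated:.4f} W")
```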
Power Gating • Integrated power gating • Headers have 4 independent enables • Diagram: highlighted headers within the JG core • Area overhead is ~3% [1]
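Power gating removes leakage from an idle core. However, turning the headers off and back on (and saving and restoring state) costs energy, so gating only pays off when the idle period is long enough. The sketch below shows this generic break-even rule. It is not AMD's power-management policy. The transition energy and saved-leakage values are hypothetical.

```python
def break_even_time_s(transition_energy_j: float, leakage_saved_w: float) -> float:
    """Shortest idle period for which power gating saves net energy."""
    if leakage_saved_w <= 0:
        raise ValueError("leakage_saved_w must be positive")
    return transition_energy_j / leakage_saved_w


def should_power_gate(idle_time_s: float, transition_energy_j: float,
                      leakage_saved_w: float) -> bool:
    """Gate only when the expected idle period exceeds the break-even time."""
    return idle_time_s > break_even_time_s(transition_energy_j, leakage_saved_w)


if __name__ == "__main__":
    # Hypothetical: 2 uJ to enter/exit the gated state, 100 mW leakage saved.
    t_be = break_even_time_s(2e-6, 0.1)
    print(f"break-even idle time: {t_be * 1e6:.1f} us")
    print(should_power_gate(1e-3, 2e-6, 0.1))  # 1 ms idle
    print(should_power_gate(1e-6, 2e-6, 0.1))  # 1 us idle
```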
Conclusion • “Jaguar” is AMD's first 28 nm bulk CPU • Quad core with a shared L2 • Supports a wide range of applications • Low power, with a focus on high density and smaller chip area • Improved IPC, frequency, and power relative to BT • A worthy successor to the “Bobcat” x86-64 core
References • [1] T. Singh, J. Bell, and S. Southard, “Jaguar: A Next-Generation Low-Power x86-64 Core,” in 2013 IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 17–21, 2013, section 3. • [2] D. Foley, P. Bansal, D. Cherepacha, R. Wasmuth, A. Gunasekar, S. Gutta, and A. Naini, “A Low-Power Integrated x86-64 and Graphics Processor for Mobile Computing Devices,” IEEE Journal of Solid-State Circuits, vol. 47, no. 1, Jan. 2012. • [3] www.hitechreview.com • [4] www.semiaccurate.com • [5] www.wikipedia.org