170 likes | 305 Views
Mostafa Koraei A presentation for DSP Implementation Course. Godson-3B1500. Main Reference : Godson-3B1500: A 32nm 1.35GHz 40W 172.8GFLOPS 8-Core Processor on ISSCC 2013. BLX IC Design Corporation founded in 2002 in Beijing BLX is fabless Fabricated in STMicroelectronics
E N D
MostafaKoraei A presentation for DSP Implementation Course Godson-3B1500 • Main Reference : Godson-3B1500: A 32nm 1.35GHz 40W 172.8GFLOPS 8-Core Processor on ISSCC 2013
BLX IC Design Corporation founded in 2002 in Beijing BLX is fabless Fabricated in STMicroelectronics Loongson processors with Linux are strategic for china Based on MIPS64 Prof Hu Weiwu History of Godson [1,2]
Instruction set is MIPS32 Internal architecture is different 32 bit , 266 MHz O.18 micron CMOS 8 KB Data/Instruction Cache 200 MFLOPS In 2007 ICT bought MIPS license Loongson1 or Godson-232 [1,2]
64 bit architecture 500 MHz Loongson2 Loongson2E • 4 way superscalar out of order execution • 2 ALU , 2 FPU • Separate 64/64 KB instruction and data L1 caches • On chip 512 KB L2 cache • Max 7 w at 1 GHz Loongson2F • DDRII memory controller • Max 4 W at 1 GHz [2,3]
Loongson2 Loongson2F • Software-controlled dynamic power management Loongson2G • 1 GHz 65 nm • 3 Watt • 64KB+64KB L1 (four-way) • 1 MB L2 cache Loongson2H [3] • SATA, USB, GMAC controller [2,3]
Scalable architecture at 2010 65 nm ,1 GHz , Quad core ,Max 15 Watt 64 entry register file Separated 16 entry reservation station for fixed point and floating point Reconfigurable core GS464 or Gstera Dynamic L2 cache migration DMA reconfigurable to data is from or to L2 or main memory Godson 3 [4,6]
Micro Architecture BTB : Branch Target Buffer BHT: Branch History Table AGU: Address Generation Unit ITLB: Instruction Translation look aside Buffer DTLB: Data Translation look aside Buffer [6]
8 core ,32 nm , 1.35 GHz ,40 Wat 172.8 GFLOPS Cu-layer high-κ metal-gate (HKMG) 1.14 billion transistors in 182.5mm2 35% power-efficiency improvement Godson 3B1500 [1,5]
last-level cache (LLC) is increased from 4MB to 8MB a 4-way 128KB private victim cache in each core low-cost asynchronous FIFO between every core and uncore to isolate cores in voltage and frequency Modifications of the memory hierarchy [1]
Update P2P HyperTransport (HT) from 1 to 2 Memory access speed from DDRIII 800 to DDRII 1200 Modification in I/O [1]
Clocking scheme globally asynchronous locally synchronous (GALS) External rclk 33 MHz Global clock glck 200 MHz DFS Dynamic Frequency Scaling 1.5 GHz Node Clock 1 GHz DCDL : digital-controlled delay lines [5]
Two 64b 153.6Gb/s on-die termination (ODT) impedance (60-120Ω range with 5Ω step) Dynamic output slew-rate control dynamic off-chip driver (OCD) impedance (34-40Ω range with 1Ω step Memory Interface [1]
Bandwidth of 22.4GB/s Up to 2.8Gb/pin/s BER of less than 10^-15 (CDR)simple direct sampling in low-power mode All-digital DLL-based CDR for high speed HyperTransport [1]
HPC with Godson-3A Linpack test results [4]
[1] Godson-3B1500: A 32nm 1.35GHz 40W 172.8GFLOPS 8-Core Processor ISSCC 2013 / SESSION 3 / PROCESSORS/ 3.5 [2] Loongson on Wikipedia [3] Godson-2H: a complex low power SOC in 65nm CMOS IEEE 2012 [4] Design and Implementation of BIOS for Godson-3A Interconnections IEEE 2011 [5] Godson-3B: A 1GHz 40W 8-Core 128GFLOPS Processor in 65nm CMOS ISSCC 2011 / SESSION 4 / ENTERPRISE PROCESSORS & COMPONENTS / 4.4 [6] GODSON-3:A SCALABLE MULTICORE RISCPROCESSORWITHX86 EMULATION IEEE 2009 [7] Microarchitecture and Performance Analysis of Godson-2 SMT Processor IEEE 2006 References