AndesCore™ N1213-S www.andestech.com
AndesCore™ N1213-S
• CPU Core
  • 32-bit CPU
  • Single issue with 8-stage pipeline
  • AndeStar™ ISA with intermixable 16-/32-bit instructions to reduce code size
  • Dynamic branch prediction to reduce branch penalties
  • 32/64/128/256-entry BTB
• Configurability for customers
  • Configuration options for power, performance and area requirements
AndesCore™ N1213-S
• MMU
  • Fully associative iTLB/dTLB: 4 or 8 entries
  • 4-way set-associative main TLB: 32/64/128 entries
  • Two groups of page sizes supported: (4K, 1M) and (8K, 1M)
  • Locking support for TLB
• I & D cache
  • Virtual index and physical tag (for faster context switching)
  • Cache size: 8KB/16KB/32KB/64KB
  • Cache line size: 16B/32B
  • 2/4-way set associative
  • I Cache locking support
AndesCore™ N1213-S
• I & D local memory
  • Wide range of sizes for internal/external local memory
    • 4KB~1024KB
  • Fixed access latencies for internal local memory
  • Double buffer mode for D local memory
  • Optional external local memory interface
• Bus
  • Synchronous/Asynchronous AHB
    • 1 or 2 port configuration
  • Synchronous HSMP
    • AXI like
    • 1 or 2 port configuration
AndesCore™ N1213-S
• For performance
  • Improved memory accesses
    • 1D/2D DMA, load/store multiple
  • Efficient synchronization without locking the whole bus
    • Load lock, store conditional instructions
  • Vectored interrupt to improve real-time performance
    • 6 interrupt signals
  • MMU
    • Optional HW page table walker
    • TLB management instructions
• For flexibility
  • Memory-mapped IO space
  • PC-relative jumps for position-independent code
  • JTAG-based debug support
  • Optional embedded program trace interface
  • Performance monitors for performance tuning
  • Bi-endian modes to support flexible data input
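The load lock / store conditional pair listed above can be illustrated with a toy model. The Python sketch below is not Andes code: the `ToyMemory` class, its method names and the reservation bookkeeping are assumptions chosen only to show the general idea, in which a conditional store succeeds only if no other store has hit the reserved address since the locked load.

```python
class ToyMemory:
    """Toy model of load-locked / store-conditional semantics (hypothetical, not the N1213-S implementation)."""

    def __init__(self):
        self.data = {}          # address -> value
        self.reservations = {}  # cpu id -> reserved address

    def load_locked(self, cpu, addr):
        # Read the value and remember that this CPU is watching addr.
        self.reservations[cpu] = addr
        return self.data.get(addr, 0)

    def store(self, addr, value):
        # Any store to addr cancels every reservation on that address.
        self.data[addr] = value
        self.reservations = {c: a for c, a in self.reservations.items() if a != addr}

    def store_conditional(self, cpu, addr, value):
        # Succeeds only if this CPU's reservation on addr is still intact.
        if self.reservations.get(cpu) != addr:
            return False
        self.store(addr, value)
        return True


def atomic_increment(mem, cpu, addr):
    # Retry the read-modify-write until the conditional store succeeds.
    while True:
        value = mem.load_locked(cpu, addr)
        if mem.store_conditional(cpu, addr, value + 1):
            return value + 1
```

Only the reserved address is monitored in this model, which is the sense in which such a scheme avoids locking the whole bus.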
AndesCore™ N1213-S Overview
• For power management
  • Clock-gated pipeline
  • Low-power mode support instructions
  • Redundant memory access reduction
  • Support for many CPU/bus frequency ratios
Cache SRAM example – 32KB
• Instruction cache tag
  • 256 (cache line #) x 4 (ways) x 22 bits
  • 22 = {Valid (1), Lock (1), index (20)}
• Instruction cache data (32KB)
  • 2048 (entry #) x 32 bits x 4 (ways)
• Data cache tag
  • 256 (cache line #) x 4 (ways) x 22 bits
• Data cache data (32KB and byte access)
  • 2048 (entry #) x 8x4 bits x 4 (ways)
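The array depths in this example can be re-derived from the cache geometry. The sketch below is illustrative only: the function and key names are hypothetical, and the 32-byte line size is inferred from 32KB / (256 lines x 4 ways) rather than stated on the slide.

```python
def cache_sram_shapes(cache_bytes=32 * 1024, ways=4, line_bytes=32,
                      word_bytes=4, tag_entry_bits=22):
    """Derive per-way SRAM depths from the cache geometry (defaults follow the 32KB example)."""
    way_bytes = cache_bytes // ways          # bytes held by one way
    lines_per_way = way_bytes // line_bytes  # depth of each tag array
    data_entries = way_bytes // word_bytes   # depth of each 32-bit-wide data array
    return {
        "tag_depth": lines_per_way,
        "tag_bits_total": lines_per_way * ways * tag_entry_bits,
        "data_depth": data_entries,
        "data_bits_total": data_entries * word_bytes * 8 * ways,
    }


shapes = cache_sram_shapes()
print(shapes["tag_depth"], shapes["data_depth"])  # 256 2048
```

With the defaults, the data arrays add up to 262144 bits, which is the full 32KB.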
N1213-S Cache configuration
• Cache sets per way
  • 128/256/512/1024
• Cache ways
  • 2/4 ways
• Cache line size
  • 16B/32B
• Cache size combinations (sets x line size x ways)
  • 256 x 16B x 2 = 8KB
  • 128 x 32B x 2 = 8KB
  • 256 x 16B x 4 = 16KB
  • 512 x 16B x 2 = 16KB
  • 128 x 32B x 4 = 16KB
  • 256 x 32B x 2 = 16KB
  • 256 x 32B x 4 = 32KB
  • 512 x 32B x 2 = 32KB
  • 512 x 16B x 4 = 32KB
  • 1024 x 16B x 2 = 32KB
  • 512 x 32B x 4 = 64KB
  • 1024 x 16B x 4 = 64KB
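Each listed combination is simply sets per way x line size x ways. As a quick cross-check, the Python sketch below recomputes every size from the slide; the tuple layout and function name are illustrative choices, not part of any Andes tool.

```python
# (sets per way, line size in bytes, ways, size in KB) as listed on the slide.
LISTED_COMBINATIONS = [
    (256, 16, 2, 8), (128, 32, 2, 8),
    (256, 16, 4, 16), (512, 16, 2, 16), (128, 32, 4, 16), (256, 32, 2, 16),
    (256, 32, 4, 32), (512, 32, 2, 32), (512, 16, 4, 32), (1024, 16, 2, 32),
    (512, 32, 4, 64), (1024, 16, 4, 64),
]


def cache_size_kb(sets, line_bytes, ways):
    """Total cache capacity in KB for a set-associative cache."""
    return sets * line_bytes * ways // 1024


for sets, line_bytes, ways, listed_kb in LISTED_COMBINATIONS:
    computed = cache_size_kb(sets, line_bytes, ways)
    status = "ok" if computed == listed_kb else "MISMATCH"
    print(f"{sets} x {line_bytes}B x {ways} = {computed}KB ({status})")
```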
Cache replacement algorithm
• Pseudo LRU (default)
• Random
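The slide does not say which pseudo-LRU variant the core uses. As one common possibility, the sketch below shows a tree-based pseudo-LRU for a single 4-way set, using 3 state bits. The class name and bit convention are assumptions made for illustration.

```python
class TreePLRU4:
    """Tree pseudo-LRU for one 4-way set (assumed variant, not confirmed by the slide)."""

    def __init__(self):
        # bits[0]: root (0 = victim in ways 0/1, 1 = victim in ways 2/3)
        # bits[1]: chooses between way 0 and way 1
        # bits[2]: chooses between way 2 and way 3
        self.bits = [0, 0, 0]

    def touch(self, way):
        # On an access, point every bit on the path away from the used way.
        if way < 2:
            self.bits[0] = 1
            self.bits[1] = 1 - (way & 1)
        else:
            self.bits[0] = 0
            self.bits[2] = 1 - (way & 1)

    def victim(self):
        # Follow the bits from the root to the way to replace next.
        if self.bits[0] == 0:
            return self.bits[1]
        return 2 + self.bits[2]
```

Touching ways 0, 2, 1, 3 in that order leaves way 0 as the next victim, which matches true LRU for this access pattern. The two schemes can diverge on other patterns.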
N1213-S MMU organization
(Block diagram: IFU with 4/8-entry I-uTLB, LSU with 4/8-entry D-uTLB, M-TLB arbiter, 32x4 M-TLB with tag and data arrays, HPTWK, bus interface unit)
• M-TLB: N (=32) sets x K (=4) ways = 128 entries
• M-TLB entry index
  • Way number: bits [Log2(N*K)-1 : Log2(N)] (bits 6:5 for 32x4)
  • Set number: bits [Log2(N)-1 : 0] (bits 4:0 for 32x4)
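The M-TLB entry index layout (set number in the low Log2(N) bits, way number above it) can be sketched as a small decoder. The function name and the range check are illustrative additions.

```python
def split_mtlb_index(index, n_sets=32, n_ways=4):
    """Split an M-TLB entry index into (way, set); defaults model the 32x4 configuration."""
    if not 0 <= index < n_sets * n_ways:
        raise ValueError("index out of range for this M-TLB size")
    set_bits = n_sets.bit_length() - 1   # Log2(N) for a power-of-two N
    way = index >> set_bits              # bits [Log2(N*K)-1 : Log2(N)]
    set_number = index & (n_sets - 1)    # bits [Log2(N)-1 : 0]
    return way, set_number


print(split_mtlb_index(0b1100101))  # (3, 5)
```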
MTLB entry: 63 bits
• MTLB tag
  • VPN[31:12]: virtual page number
    • 4KB: VPN[31:12]
    • 8KB: VPN[31:13]
  • CID[8:0]: process ID
  • G: global bit
  • S: S=1, 1MB page table
  • Valid
  • Lock
• MTLB data
  • D: dirty bit
  • X: executable bit
  • A: accessed bit
  • PPN[31:12]: physical page number
  • C[2:0]: cacheability attributes
  • M[2:0]: access privilege for user and superuser mode
• MTLB configuration options
  • 32x4 = 128
  • 16x4 = 64
  • 8x4 = 32
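The VPN ranges above follow directly from the page size: the page offset takes the low bits and the VPN is whatever remains. A minimal sketch, assuming 32-bit virtual addresses and a 20-bit offset for 1MB pages (the slide does not spell out the 1MB case), with hypothetical names:

```python
# Offset width in bits for each supported page size.
PAGE_SHIFT = {"4K": 12, "8K": 13, "1M": 20}


def split_virtual_address(va, page_size):
    """Return (virtual page number, page offset) for a 32-bit virtual address."""
    shift = PAGE_SHIFT[page_size]
    return va >> shift, va & ((1 << shift) - 1)


vpn, offset = split_virtual_address(0x12345678, "4K")
print(hex(vpn), hex(offset))  # 0x12345 0x678
```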
Support for internal/external vectored interrupts
• Internal vectored interrupt
  • Interrupts are prioritized inside the AndesCore™
  • Hw0 has the highest priority
• External vectored interrupt
  • Interrupts are prioritized outside the AndesCore™ using an external interrupt controller
• The size of a vectored entry point can be 4, 16, 64 or 256 bytes.
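To show what the entry-point size means in practice, the sketch below computes the address of a vector entry under the assumption that entries are laid out contiguously from a base address. That layout, the function name and the parameters are illustrative assumptions, not taken from Andes documentation.

```python
ENTRY_SIZES = (4, 16, 64, 256)  # entry-point sizes in bytes, from the slide


def vector_entry_address(base, entry_index, entry_size):
    """Address of a vector entry, assuming a contiguous table starting at base."""
    if entry_size not in ENTRY_SIZES:
        raise ValueError("entry size must be 4, 16, 64 or 256 bytes")
    return base + entry_index * entry_size


print(hex(vector_entry_address(0x1000, 3, 16)))  # 0x1030
```

A 4-byte entry has room for a single 32-bit instruction, typically a jump, while the larger sizes leave room for a short handler in place.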
Local memory
• Internal or external local memory configuration options
• Two different access modes for internal local memory
  • Normal access mode
  • Double buffer mode
    • 2-bank structure
    • Each bank is ½ of the local memory size
    • CPU and DMA can access local memory at the same time
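Double buffer mode can be pictured as a ping-pong scheme: the CPU works on one bank while the DMA fills the other, and the roles are then swapped. The Python model below is a software analogy only. The class, its methods and the explicit `swap` call are assumptions, and how the hardware actually switches banks is not described on the slide.

```python
class DoubleBufferedLocalMemory:
    """Toy ping-pong model: two banks, each half of the local memory size."""

    def __init__(self, size_bytes):
        half = size_bytes // 2
        self.banks = [bytearray(half), bytearray(half)]
        self.cpu_bank = 0  # bank currently visible to the CPU

    @property
    def dma_bank(self):
        # The DMA always targets the bank the CPU is not using.
        return 1 - self.cpu_bank

    def dma_fill(self, data):
        bank = self.banks[self.dma_bank]
        if len(data) > len(bank):
            raise ValueError("transfer larger than one bank")
        bank[:len(data)] = data

    def cpu_view(self):
        return self.banks[self.cpu_bank]

    def swap(self):
        # Exchange roles once the CPU and the DMA have both finished.
        self.cpu_bank = self.dma_bank
```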
DMA
• Two channels
  • One active channel
• Only accessible in superuser mode
• For both instruction and data local memory
• External address can be incremented with stride
• Optional 2-D element transfer from external memory
(Diagram: N1213-S local memory, DMA controller, external memory; 1D/2D transfers)
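A strided 2-D transfer reads a rectangular block out of a larger array in external memory. The generator below sketches the external addresses such a transfer would touch. The parameter names are hypothetical and do not correspond to actual N1213-S DMA register fields.

```python
def dma_2d_addresses(base, elem_bytes, width, height, row_stride_bytes):
    """Yield external addresses for a height x width block, advancing by row_stride_bytes per row."""
    for row in range(height):
        row_base = base + row * row_stride_bytes
        for col in range(width):
            yield row_base + col * elem_bytes


addresses = list(dma_2d_addresses(0x1000, 4, 2, 2, 0x100))
print([hex(a) for a in addresses])  # ['0x1000', '0x1004', '0x1100', '0x1104']
```

Setting `height` to 1 reduces this to a plain 1-D transfer.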
N1213-S BUS
• AMBA 2.0 AHB bus
  • 1 port
  • 2 ports
    • ICU/MMU (read only) for port 1
    • LSU/DMA/EDM (read/write) for port 2
• HSMP
  • High speed memory port
  • Same frequency as the CPU core
  • AMBA 3.0 (AXI) protocol compliant, but with reduced I/O requirements
  • 1 and 2 port configuration
N1213-S Debug environment
(Diagram labels: AICE, a USB in-circuit emulator (external ICE); N1213-S containing the CPU core and EDM)
EDM (Embedded Debug Module) block diagram, N1213-S
• BCU: breakpoint compare unit
• TAP: JTAG-style interface
• DIMU: stores the debug program
Performance Monitor
• Performance Counter 0, Performance Counter 1, Performance Counter 2
• Events shown: CPU cycle counting (CPU clock cycles), instruction counting (CPU instruction executions), …, cache miss events, memory and cache accesses, branch or other events, …
Downsizing control
(Table residue: 32KB, 16KB; 128-entry, 64-entry; 64X2 (max size = 128), 32X2, 64, 32X1)
• Cache and MTLB for 4-way only
Signal pins
• General port signals
  • Reset, CPU clock, AHB clock, Bus_CLOCK_Phase
• Configuration port signals
  • Endian setting
  • IVB (initial vector base)
• Interrupt port signals
• AHB interface signals
• Multi-core lock signal
• HSMP interface signals
• Power management
  • Standby, Wakeup
• EDM interface signals
• Tracer interface signals
• Test port signals
  • Scan, MBIST, …
• Optional external local memory interface signals
Clock ratio
• The supported clock ratios between the CPU core clock and the AMBA bus clock are 1/1, 2/1, 3/1, 4/1, 5/1, 6/1, 3/2, 5/2, 8/1, 10/1, 12/1, 14/1, 15/1, 18/1 and 20/1.
• The clock divider is not part of the AndesCore.
• The high speed memory bus clock is the same as the CPU core clock.
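As a worked example of the ratios, the sketch below derives the AMBA bus clock from a CPU clock. It reads each listed ratio as CPU clock : bus clock, which is an assumption about the slide's notation. The function name and the MHz unit are illustrative.

```python
from fractions import Fraction

# Ratios copied from the slide, read here as CPU clock : bus clock (assumption).
SUPPORTED_RATIOS = {
    Fraction(r) for r in
    "1/1 2/1 3/1 4/1 5/1 6/1 3/2 5/2 8/1 10/1 12/1 14/1 15/1 18/1 20/1".split()
}


def bus_clock_mhz(cpu_mhz, ratio):
    """Bus clock for a given CPU clock and one of the listed CPU:bus ratios."""
    ratio = Fraction(ratio)
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"ratio {ratio} is not in the listed set")
    return Fraction(cpu_mhz) / ratio


print(bus_clock_mhz(400, "5/2"))  # 160
```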
Configuration options
• Cache size (I & D)
  • 256 x 16B x 2 = 8KB
  • 128 x 32B x 2 = 8KB
  • 256 x 16B x 4 = 16KB
  • 512 x 16B x 2 = 16KB
  • 128 x 32B x 4 = 16KB
  • 256 x 32B x 2 = 16KB
  • 256 x 32B x 4 = 32KB
  • 512 x 32B x 2 = 32KB
  • 512 x 16B x 4 = 32KB
  • 1024 x 16B x 2 = 32KB
  • 512 x 32B x 4 = 64KB
  • 1024 x 16B x 4 = 64KB
• Direct map 2 or 4 bank
• Write through only (D cache)
• Cache policy
  • Pseudo-LRU (default), Random
Configuration options
• Instruction queue
  • 2/4/8
• u-ITLB-entry/u-DTLB-entry
  • 4 or 8
• MTLB-entry
  • 8 x 4 (way) = 32
  • 16 x 4 = 64
  • 32 x 4 = 128
• BTB-entry
  • 16 x 2 (way) = 32
  • 32 x 2 = 64
  • 64 x 2 = 128
  • 128 x 2 = 256
• Internal or external local memory
• ILM/DLM
  • 4KB/8KB/16KB/32KB/64KB/128KB/256KB/512KB/1024KB
Configuration options
• AHB port
  • 1 or 2
• HSMP
  • 1 or 2
• AHB clock synchronization
  • Synchronous (default) or Asynchronous
• EDM breakpoints
  • 0/1/2/3/4/5/6/7/8
Configuration options
• Optional (exist or not)
  • HPTW (hardware page table walker)
  • 16-bit ISA
  • Performance extension ISA
  • MAC related ISA (refer to MSC_CFG)
  • ICE
  • EPT interface
  • Performance monitor
  • Gated clock
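The numeric options on the configuration slides can be collected into a simple lookup to sanity-check a chosen configuration. The sketch below is only an illustration: the dictionary keys and the `check_config` helper are invented names and do not correspond to any Andes configuration tool or file format.

```python
# Allowed values taken from the configuration slides; the key names are hypothetical.
ALLOWED_VALUES = {
    "instruction_queue": {2, 4, 8},
    "utlb_entries": {4, 8},
    "mtlb_entries": {32, 64, 128},
    "btb_entries": {32, 64, 128, 256},
    "local_memory_kb": {4, 8, 16, 32, 64, 128, 256, 512, 1024},
    "ahb_ports": {1, 2},
    "hsmp_ports": {1, 2},
    "edm_breakpoints": set(range(9)),
}


def check_config(config):
    """Return the (option, value) pairs that fall outside the listed choices."""
    return [
        (name, value)
        for name, value in config.items()
        if name in ALLOWED_VALUES and value not in ALLOWED_VALUES[name]
    ]


print(check_config({"btb_entries": 256, "mtlb_entries": 48}))  # [('mtlb_entries', 48)]
```

Option names missing from the lookup are ignored here. A stricter checker could reject them instead.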
EDA tools
• Synthesizer
  • Synopsys Design Compiler
• Simulator
  • Cadence Incisive
• Formal verification
  • Synopsys Formality
• STA
  • Synopsys PrimeTime
• FPGA
  • Synplicity + Xilinx
N1213 on UMC 0.13HS process * 1.08V 125C slow silicon
Thank You!!! www.andestech.com