Goals • Provide an overview of the 860 device • Allow a quick start of an 860 design cycle • Gain familiarity with debug issues particular to the 860 • Create the basis to build further experience
Outline • 860 Architecture • Debug considerations
Outline • 860 Architecture • Device overview • Core CPU • SIU • CPM
Device overview (block diagram) • PowerPC Core: 4 KB I-Cache, IMMU, 4 KB D-Cache, DMMU • System Interface Unit: Memory Controller, Bus Interface Unit, System Functions, Real Time Clock, PCMCIA • Comm. Processor Module: 32-bit RISC and Program ROM, Internal Memory Space, MAC, Four Timers, Interrupt Controller, Parallel I/O, Baud Rate Generators, Timers, Serial DMAs, Virtual IDMAs • Serial channels: SCC1, SCC2, SCC3, SCC4, SMC1, SMC2, SPI, I2C • Serial Interface with Time Slot Assigner • FEC (860T)
CPU • Embedded version of the PowerPC core • One instruction fetched per clock • One instruction issued and retired per clock • Up to three instructions in execution per clock • Most instructions execute in one clock • Branches can execute in zero clocks
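Programming Model (all registers are 32 bits) • General purpose registers: GPR0 to GPR31 • Condition and status: CR, XER, FPSCR • Machine state and version: MSR, PVR • Branch control: CTR, LR • Time base and decrementer: TBU, TBL, DEC • Exception save/restore: SRR0, SRR1 • Other special purpose registers: SPRn to SPRx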
MSR (bit 0 is the MSB, bit 31 is the LSB) • Bits 0-12: 0 • Bit 13 POW: Power management enabled • Bit 14: 0 • Bit 15 ILE: Interrupt little endian mode • Bit 16 EE: External interrupt enable • Bit 17 PR: Privilege level • Bit 18 FP: Floating point available • Bit 19 ME: Machine check enable • Bits 20 and 23 (shown as 0): Floating point exception mode [0,1] • Bit 21 SE: Single step trace enabled • Bit 22 BE: Branch trace enabled • Bit 24: 0 • Bit 25 IP: Exception [interrupt] prefix • Bit 26 IR: Instruction address translation enabled • Bit 27 DR: Data address translation enabled • Bits 28-29: 0 • Bit 30 RI: Recoverable exception • Bit 31 LE: Little endian mode
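Because the MSR is numbered with bit 0 as the MSB, a bit number has to be mirrored before it can be used as a conventional mask. The following is a minimal sketch of that conversion; the bit positions are counted from the slide's layout, and the helper names `msr_mask` and `decode_msr` are illustrative, not part of any vendor header.

```python
# MSR bit positions counted from the slide, in IBM numbering (bit 0 = MSB).
MSR_BITS = {
    "POW": 13, "ILE": 15, "EE": 16, "PR": 17, "FP": 18, "ME": 19,
    "SE": 21, "BE": 22, "IP": 25, "IR": 26, "DR": 27, "RI": 30, "LE": 31,
}


def msr_mask(name):
    """Return the conventional (bit 0 = LSB) mask for a named MSR bit."""
    return 1 << (31 - MSR_BITS[name])


def decode_msr(value):
    """Return the sorted names of all known MSR bits set in a 32-bit value."""
    return sorted(name for name in MSR_BITS if value & msr_mask(name))


print(hex(msr_mask("EE")))     # 0x8000
print(decode_msr(0x00001042))  # ['IP', 'ME', 'RI']
```

The same mirroring (31 minus the bit number) applies to any other 32-bit register documented in this numbering.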
CPU Overview (block diagram) • Instruction Unit: Sequential Fetcher, Instruction Queue, Branch Processing (CTR, CR, LR), Dispatch • Integer Unit (/, +, *, XER) • GPR File (R0-R31) with GP Rename Regs • Load/Store Unit • Completion Unit • Inst. MMU and Inst. Cache • Data MMU and Data Cache • Main Memory
Execution Units • Execution units operate in parallel • Fetch / Branch • Integer • Load / Store • Completion
Fetch / Dispatch • Instructions are fetched individually • Non-branch instructions enter the instruction queue • Branch instructions are redirected to the branch unit • One instruction can be sent to the execution units and one to the branch unit for a total of two issued instructions per clock • All instructions “appear” to execute sequentially
Instruction Queue (diagram) • On each CPU clock there is a 32 bit wide transfer from the instruction cache • Instructions fall through to the first open location in the queue • The branch instruction closest to the bottom of the queue is issued to the branch unit on each clock • The bottom non-branch instruction is dispatched to an available execution unit • [Diagram: Instruction Cache feeding the instruction queue; Branch Processing with CTR, CR and LR; Execution Unit]
Branch • Branches are pre-executed, giving an effective execution time of zero clocks • Instruction queue provides look ahead to determine data dependencies • Unresolved conditional branches are statically predicted under control of the compiler
Subroutine Control Flow • Branch to sub: the branch places the address of the instruction following it into the Link Register (LR) • Instructions in the subroutine save the LR to the software maintained stack (GPR1) to allow nested function calls • The LR is reused for another call • The LR is recalled from the stack to allow a return from subroutine • Branching to the contents of the LR is a return instruction
Integer • Integer unit directly accesses the GPR file • Rename registers prevent stalls and allow instructions to be un-executed • Most instructions execute in one clock
Load/Store • Responsible for all transfers between the GPR file and main memory • Speculative loads are placed in the rename registers • Speculative stores remain in the store queue
Completion • Holds instructions executed in parallel or out of order until they can be retired in order • Retiring an instruction commits its results to the processor state • Simply discarding an instruction from the completion queue effectively un-executes it • One instruction can be retired per clock
Instruction Set • 68K instructions were based on an accumulator, direct memory model • add (0x00035300).L, D4 • [Diagram: the memory operand at 0x00035300 is added directly into D4 of the data register file D0-D7]
Instruction Set • PowerPC instructions are based on a triadic, load/store model • lwz r2,0x00035300 • add r6,r2,r4 • [Diagram: the word at 0x00035300 is first loaded into GPR2, then GPR2 + GPR4 is written to GPR6 in the register file GPR0-GPR31]
Exceptions • All exceptions cause processing to vector to a predetermined memory location • The base address of the vector table is controlled by the [IP] bit in the MSR • Each vector is placed on a 256 byte (0x100) boundary • 64 instructions can be placed at a vector before hitting the next vector • Reset = 0xnnn00100 • Machine Check = 0xnnn00200 • External Interrupt = 0xnnn00500 • Decrementer = 0xnnn00900 • Etc.
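The vector arithmetic above can be sketched as follows. The offsets are the ones listed on the slide; the two table bases (0x00000000 and 0xFFF00000) correspond to the two reset vector addresses quoted elsewhere in this deck, and the function names are illustrative.

```python
# Vector offsets listed on the slide.
VECTOR_OFFSETS = {
    "reset": 0x100,
    "machine_check": 0x200,
    "external_interrupt": 0x500,
    "decrementer": 0x900,
}
INSTRUCTION_BYTES = 4  # every PowerPC instruction is one 32-bit word


def vector_address(name, msr_ip):
    """Return the vector address for an exception given the MSR[IP] bit."""
    base = 0xFFF00000 if msr_ip else 0x00000000
    return base | VECTOR_OFFSETS[name]


def instructions_per_vector(spacing=0x100):
    """Number of instructions that fit before the next vector starts."""
    return spacing // INSTRUCTION_BYTES


print(hex(vector_address("reset", msr_ip=1)))               # 0xfff00100
print(hex(vector_address("external_interrupt", msr_ip=0)))  # 0x500
print(instructions_per_vector())                            # 64
```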
Exceptions (vector table layout) • MSR[IP] = 1: vector table in Flash, reset vector at FFF00100 • MSR[IP] = 0: vector table in RAM, reset vector at 00000100 • Offsets: Reset 100, Machine Check 200, DSI 300, ISI 400, External 500 • Each vector has room for 64 instructions
Exceptions • Only the Decrementer and the External Interrupt can be masked by the [EE] bit in the MSR • Machine Check exceptions can vector to a routine or force Checkstop state • All other exceptions are synchronous (caused by instruction execution) and are unmaskable
Nesting Exceptions • When an exception occurs, return state is stored in the processor • There is no automated stacking of critical registers • The address of the return instruction is stored in SRR0 • The MSR prior to the exception is in SRR1 • The [EE] bit of the MSR is cleared • Software must save these registers and any GPRs it uses to a software maintained stack • The EABI specifies GPR1 to be the stack pointer • The [RI] bit in the MSR is set by software when enough information is saved to allow recovery from a nested exception
Exception Control Flow • An exception after the completion of an instruction causes flow to be directed to the ISR • The address of the instruction to return to is placed into SRR0 by the hardware • The MSR[RI] bit is cleared by the exception hardware and set by software after the SRRs have been saved • Instructions save the SRRs to the software maintained stack (GPR1) to allow nested exceptions • It is safe for exceptions to occur in the section of code where MSR[RI] is set • An exception while MSR[RI] is cleared causes a machine check event • Breakpoints are exceptions! • The MSR[RI] bit is cleared by the software just before the SRRs are restored • The SRRs are recalled from the stack to allow a return from the exception • rfi returns from the ISR
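The save and restore sequence can be modelled in a few lines. This is a toy sketch, not processor-accurate: the `Core` class and its method names are invented for illustration, only the EE and RI bits are modelled, and an exception taken while MSR[RI] is clear simply raises an error here rather than modelling the machine check event the slide describes.

```python
MSR_EE = 0x8000  # external interrupt enable
MSR_RI = 0x0002  # recoverable exception


class Core:
    """Toy model of SRR0/SRR1 handling around a nested exception."""

    def __init__(self):
        self.msr = MSR_EE | MSR_RI
        self.pc = 0
        self.srr0 = 0
        self.srr1 = 0
        self.stack = []  # software maintained stack (GPR1 in the EABI)

    def take_exception(self, vector):
        """Hardware side: capture return state, clear EE and RI, vector."""
        if not self.msr & MSR_RI:
            raise RuntimeError("exception with MSR[RI] clear: SRRs overwritten")
        self.srr0, self.srr1 = self.pc, self.msr
        self.msr &= ~(MSR_EE | MSR_RI)
        self.pc = vector

    def prologue(self):
        """Software side: stack the SRRs, then mark the state recoverable."""
        self.stack.append((self.srr0, self.srr1))
        self.msr |= MSR_RI

    def epilogue_and_rfi(self):
        """Software side: clear RI, restore the SRRs, then return (rfi)."""
        self.msr &= ~MSR_RI
        self.srr0, self.srr1 = self.stack.pop()
        self.pc, self.msr = self.srr0, self.srr1


def demo():
    core = Core()
    core.pc = 0x1000
    core.take_exception(0x500)  # first exception
    core.prologue()
    core.pc = 0x520             # somewhere inside the first handler
    core.take_exception(0x900)  # nested exception, safe because RI is set
    core.prologue()
    core.epilogue_and_rfi()     # back in the first handler
    core.epilogue_and_rfi()     # back in the interrupted code
    return core.pc, core.msr


print([hex(v) for v in demo()])  # ['0x1000', '0x8002']
```

Calling `take_exception` twice without `prologue` in between raises the error, which is the window the MSR[RI] bit exists to flag.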
Cache • Independent instruction and data caches implement an internal Harvard Architecture • Each cache is 4 Kbyte, two way set associative • The 860P has an 8K, four way set associative instruction cache • Caching of separate memory areas is controlled by the MMU
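Cache Organization • Address (bit 31 down to bit 0 in the diagram): address tag (20), set select (7), word, byte • The address tag is stored with each block • 128 sets, each with Way 0 and Way 1 • Each way holds an address tag, state and words 0-7 • Blocks are numbered 0 to 255: the first set holds blocks 0 and 1, the last set holds blocks 254 and 255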
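How an address selects a set can be worked out from the cache size, the number of ways and the number of sets. The sketch below derives the field widths from the 4 Kbyte, two way, 128 set figures rather than copying them from the diagram; with those inputs the line works out to 16 bytes and the tag to 21 bits, so the widths printed in the diagram should be checked against the device manual. The function names are illustrative.

```python
def cache_geometry(cache_bytes=4096, ways=2, sets=128, addr_bits=32):
    """Derive line size and address field widths for a set associative cache."""
    line_bytes = cache_bytes // (ways * sets)
    offset_bits = line_bytes.bit_length() - 1  # log2 for powers of two
    set_bits = sets.bit_length() - 1
    tag_bits = addr_bits - set_bits - offset_bits
    return line_bytes, offset_bits, set_bits, tag_bits


def split_address(addr, offset_bits, set_bits):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & ((1 << offset_bits) - 1)
    set_index = (addr >> offset_bits) & ((1 << set_bits) - 1)
    tag = addr >> (offset_bits + set_bits)
    return tag, set_index, offset


line_bytes, offset_bits, set_bits, tag_bits = cache_geometry()
print(line_bytes, offset_bits, set_bits, tag_bits)  # 16 4 7 21
print([hex(v) for v in split_address(0x00035300, offset_bits, set_bits)])
# ['0x6a', '0x30', '0x0']
```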
Cache Operation • Each cache block (or line) can be in one of three states (MEI protocol) • M = modified (or dirty) • Resides in cache and differs from memory • E = exclusive (resident and clean) • Resides in cache and is identical to memory • I = invalid (not resident) • The “shared” state of the full MESI protocol is not supported • Would allow synchronization of multiply cached blocks • There is no cache snooping to monitor external masters
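The three states can be read as a small state machine. The transition table below is a generic sketch that assumes a write-back cache which allocates a line on a store miss; the slide does not list the transitions, so treat the table and the names `next_state`, `needs_writeback` and `run` as illustrative.

```python
# (current state, operation) -> next state, assuming write-back with allocate.
TRANSITIONS = {
    ("I", "load"): "E",   # miss: line filled from memory, identical to it
    ("I", "store"): "M",  # miss: line allocated, then modified
    ("E", "load"): "E",
    ("E", "store"): "M",  # first write makes the line dirty
    ("M", "load"): "M",
    ("M", "store"): "M",
    ("I", "flush"): "I",
    ("E", "flush"): "I",
    ("M", "flush"): "I",  # dirty data is written to memory first
    ("I", "invalidate"): "I",
    ("E", "invalidate"): "I",
    ("M", "invalidate"): "I",  # dirty data is discarded
}


def next_state(state, op):
    """Return the state of a cache line after one operation."""
    return TRANSITIONS[(state, op)]


def needs_writeback(state, op):
    """Only flushing a modified line writes data back to memory."""
    return state == "M" and op == "flush"


def run(ops, state="I"):
    """Apply a sequence of operations and return the visited states."""
    visited = [state]
    for op in ops:
        state = next_state(state, op)
        visited.append(state)
    return visited


print(run(["load", "store", "flush"]))  # ['I', 'E', 'M', 'I']
```

With no shared state and no snooping, nothing in this table reacts to an external master changing memory, which is why software has to flush or invalidate around external DMA.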
Cache control • Hardware implementation dependent registers (HIDn) control cache function • Enabling • Invalidate • Locking • Supervisor instructions provide block level control • Allocate, flush, invalidate, store, touch, zero • Ability to store a given block of memory into the cache is controlled by the MMU • Each block or page in the MMU has WIG bits • (Write-through, Inhibited, Guarded)
MMU • The MMU provides for both memory translation and access control • The system boots in Real (un-translated) mode • To effectively use the caches, the MMU must be used in page mode • Effectively, a null translation is performed
Protection • The primary use of the MMU in embedded applications is for cache control and access protection • The WIG bits are set for each page • W = write-through (applicable only to data cache) • I = inhibited • G = guarded (indicates that memory is ill-behaved) • I/O spaces • No speculative reads or pre-fetches
Translation • Page translation provides a virtual memory space of 2^52 bytes • System must be debugged with RTOS tools • Emulators and hardware debuggers don’t support it
Real mode • The 32 bit logical address is used directly as the 32 bit physical address • WIG: W = 0: write-back • I = 0: cache enable • G = 1: memory is guarded
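Page mode • The logical address is split into 10, 10 and 12 bit fields • The first 10 bits index the level one table: a 20 bit table base, the 10 bit index and 00 form the address of the Level One Descriptor, which carries W and G • The second 10 bits index the level two table, whose Level Two Descriptor carries I and the 20 bit physical page number • Physical address: 20 bit page number followed by the 12 bit offset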
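The 10/10/12 split can be sketched as a two level lookup. Nested dictionaries stand in for the descriptor tables here, attribute bits are left out, and the names `split_logical` and `translate` are illustrative. The example maps a single page onto itself, which is the "null translation" case used when the MMU is enabled only for cache control.

```python
def split_logical(addr):
    """Split a 32-bit logical address into (level one, level two, offset)."""
    level_one_index = (addr >> 22) & 0x3FF
    level_two_index = (addr >> 12) & 0x3FF
    offset = addr & 0xFFF
    return level_one_index, level_two_index, offset


def translate(addr, level_one_table):
    """Walk the two levels and rebuild the physical address."""
    l1, l2, offset = split_logical(addr)
    physical_page = level_one_table[l1][l2]  # KeyError models an unmapped page
    return (physical_page << 12) | offset


# Hypothetical tables: one 4 KB page mapped to the same physical page.
tables = {0x048: {0x345: 0x12345}}

print([hex(v) for v in split_logical(0x12345678)])  # ['0x48', '0x345', '0x678']
print(hex(translate(0x12345678, tables)))           # 0x12345678
```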
Reset Types • Power-on reset is used to align all logic from a chaotic state after Vcc stabilizes • The PLL then begins to lock • Hard reset is analogous to the normal reset on other processors • The PLL is not affected • Soft reset can be used to initiate a warm start • Not commonly used • Not driven or monitored by the emulator • Basically, a non-returnable exception to the reset vector
Reset Sequence • POR asserted: HRESET & SRESET asserted, PLL locks, RSTCONF sampled, internal logic reset, HRESET & SRESET negated • HRESET asserted: HRESET & SRESET asserted, RSTCONF sampled, internal logic reset, HRESET & SRESET negated • SRESET asserted: SRESET asserted, internal logic reset, SRESET negated
Memory Map • Startup boot map (before software execution): at boot, CS0 is active for the entire address space, so Flash appears throughout the map; all other chip selects are invalid • Application target map: Flash on CS0, IMMR, I/O on CSi, RAM on CSx,y,z
Configuration Word • Configuration word is latched from the upper 16 bits of the data bus during the reset cycle • Field order, MSB first: EARB, IIP, 00, BPS, 0, ISB, DBGC, DBPC, EBDF, 0 • Value shown: 0000_0000_0000_0000 • EARB – External arbitration • IIP – Initial core prefix • BPS – Boot port size • ISB – Initial internal space base • DBGC – Debug pin configuration • DBPC – Debug port pins configuration • EBDF – External bus division factor
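Decoding a sampled configuration word is a matter of shifting out each field. In this sketch the field order comes from the slide, but the field widths (one bit each for EARB and IIP, two bits for BPS, ISB, DBGC, DBPC and EBDF) are an assumption chosen so the fields and reserved zeros add up to 16 bits; confirm them against the device manual. `decode_config_word` is an illustrative name.

```python
# (name, first bit with bit 0 = MSB of the 16-bit word, width in bits).
# Widths are assumptions; the order follows the slide.
FIELDS = [
    ("EARB", 0, 1),
    ("IIP", 1, 1),
    ("BPS", 4, 2),
    ("ISB", 7, 2),
    ("DBGC", 9, 2),
    ("DBPC", 11, 2),
    ("EBDF", 13, 2),
]


def decode_config_word(word):
    """Return a dict of field values from a 16-bit configuration word."""
    fields = {}
    for name, first_bit, width in FIELDS:
        shift = 16 - first_bit - width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


print(decode_config_word(0x0000))  # every field is 0
print(decode_config_word(0x8C00))  # EARB = 1 and BPS = 3, others 0
```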
Memory Map Implications • Since the Flash memory accessed by CS0 occupies the entire address space, boot code can be linked to execute in a number of different locations • Any branches will change the NIA from the boot location to the linked location • All other chip selects are off • IMMR RAM is still available • CS0 must be reduced in scope before activating other chip selects • Be careful not to pull the rug out from under the boot code when reducing CS0 • BSP re-entry issues: • Altering chip select option registers while assuming the value in the Valid bit • Can the chip selects to the RAM and Flash be altered while running out of either?
Memory Map Init Issues • Three different factors can enhance (confuse) the boot process: • The MSR[IP] • The reset vector can be 0x0000_0100 or 0xfff0_0100 • Determined by the Reset Configuration Word • Not changed by an SRESET • CS0 scope • CS0 responds to the entire memory map • It must be changed while it is being used • It may have already been reduced by a previous pass through the BSP • Code link results • Execution can start in code that is linked to a different address than the boot vector • Only the address lines within the memory device are significant • PC relative addressing will solve this, right? WRONG! • The first branch will set the NIA MSBs to the current execution value
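The point that only the address lines within the memory device are significant can be shown with one mask. This sketch assumes a hypothetical 1 MB boot flash (the slides do not give a size) and a CS0 that still matches every address; `flash_offset` is an illustrative name.

```python
FLASH_BYTES = 1 << 20  # hypothetical 1 MB boot flash, must be a power of two


def flash_offset(cpu_address, flash_bytes=FLASH_BYTES):
    """Offset the flash actually sees while CS0 matches every address."""
    return cpu_address & (flash_bytes - 1)


high_vector = 0xFFF00100  # reset vector with MSR[IP] = 1
low_vector = 0x00000100   # reset vector with MSR[IP] = 0

# Both reset vectors land on the same flash location.
print(hex(flash_offset(high_vector)), hex(flash_offset(low_vector)))  # 0x100 0x100
```

Once CS0 is narrowed to its final base and mask, only one of those two addresses still reaches the flash, which is why the code doing the narrowing must already be running at an address that stays valid.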
RTOS Boot Sequences • Flash holds the boot code, the BSP and, optionally, a compressed application image • Boot code loads an external application image over a communication channel or backplane • Or boot code decompresses and relocates the application from flash • RAM holds the uncompressed application image plus data, stack, heap, etc. • IMMR and I/O are also in the map • Chip Select x consists of a Base Register (base address, V) and an Option Register (mask, options)
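Endian Bus Connections • 68K: MS byte lane is bits 31-24, LS byte lane is bits 7-0; an 8 bit device (bits 7-0) connects to the MS byte lane • X86: MS byte lane is bits 31-24, LS byte lane is bits 7-0; an 8 bit device (bits 7-0) connects to the LS byte lane • PPC: MS byte lane is bits 0-7, LS byte lane is bits 24-31; an 8 bit device connects to the MS byte lane, with device bit 7 on bus bit 0 and device bit 0 on bus bit 7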
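Big Endian Bus • 860 byte lanes: bits 0-7 (MS byte lane), 8-15, 16-23, 24-31 (LS byte lane) • 8 bit device: 860 bits 0-7 connect to device bits 7-0 • 16 bit device: 860 bits 0-7 and 8-15 connect to device bits 15-8 and 7-0 • 32 bit device: 860 bits 0-7, 8-15, 16-23 and 24-31 connect to device bits 31-24, 23-16, 15-8 and 7-0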
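The lane assignment can be checked with Python's `struct` module, which can serialize a word in big endian order. In this sketch the byte at the lowest address is the most significant byte and travels on bus bits 0-7; the `byte_lanes` helper is an illustrative name.

```python
import struct


def byte_lanes(value):
    """Map a 32-bit word onto the four byte lanes of a big endian bus."""
    data = struct.pack(">I", value)  # byte 0 is the most significant byte
    return {
        "0-7 (MS)": data[0],
        "8-15": data[1],
        "16-23": data[2],
        "24-31 (LS)": data[3],
    }


lanes = byte_lanes(0x11223344)
print({lane: hex(byte) for lane, byte in lanes.items()})
# {'0-7 (MS)': '0x11', '8-15': '0x22', '16-23': '0x33', '24-31 (LS)': '0x44'}

# A little endian CPU would place the same word in memory in reverse order.
print(struct.pack("<I", 0x11223344).hex())  # 44332211
```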
SIU • The SIU contains the logic to interface the external system components to the 860 • Contains all of the glue logic needed for a typical embedded application
SIU Overview • System Interface Unit blocks: Memory Controller, Bus Interface Unit, System Functions, Real Time Clock, PCMCIA
860 bus cycle • [Timing diagram showing the Address and Data signals]
Memory Control • 8 banks of memory • Each can be configured for any type of device • Glueless support of SRAM, EPROM, Flash • Using general purpose chip select machine • Two user programmable machines
System control • Clock synthesis • Reset control • Interrupt control • Real time clock • Periodic interrupt timer • Bus monitor • Bus arbiter • Watchdog timer
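Interrupt Control • CPM Interrupt Controller sources: Port C [4:15], CPM Timer [1:4], SCC [1:4], SMC [1:2], SPI, I2C, PIP, IDMA [1:2], SDMA, RISC Timers • SIU Interrupt Controller sources: IRQ[0-7] (edge / level), Timebase, PIT, Realtime clock, PCMCIA and the CPM Interrupt Controller output • The SIU Interrupt Controller drives the CPU INT input • The Software Watchdog Timer or IRQ0 drives the input labelled Reset in the diagram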
SIU Interrupt Vectors • All external interrupts cause processing at 0xnnn00500 • There is space for 64 instructions to save processor state and resolve the SIU vector • Vectors are six bits • Indirect addressing is used to decommutate to service routines • A 16 bit load from the long word address of the SIVEC register will point to a 64 entry array of 1K byte (256 instruction) service routines. • An 8 bit load will allow a 64 entry jump table of branch instructions • A shifting operation can alter the size between these two choices
SIU Interrupt Vector Register • Bits 0-5: six bit interrupt code • Bits 6-31: 0 • An 8 bit read from address 0xnnnn001C returns bits 0-7 • A 16 bit read from address 0xnnnn001C returns bits 0-15 • A 32 bit read from address 0xnnnn001C returns bits 0-31
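Because the six bit code sits in the most significant bits of the register, the width of the read decides the stride of the dispatch table. The sketch below models the register as a 32-bit integer and shows the offsets each read size produces; the function names are illustrative.

```python
def sivec_value(code):
    """Build the 32-bit register value for a six bit interrupt code."""
    if not 0 <= code < 64:
        raise ValueError("interrupt code must fit in six bits")
    return code << 26  # code occupies bits 0-5 (bit 0 = MSB)


def read8(sivec):
    """An 8-bit read returns bits 0-7: the code times 4."""
    return (sivec >> 24) & 0xFF


def read16(sivec):
    """A 16-bit read returns bits 0-15: the code times 1024."""
    return (sivec >> 16) & 0xFFFF


sivec = sivec_value(5)
print(read8(sivec))   # 20: byte offset of entry 5 in a table of 4-byte branches
print(read16(sivec))  # 5120: byte offset of routine 5 in 1 Kbyte slots
```

Shifting the 16-bit value right by some number of bits gives any stride between those two, which is the "shifting operation" option mentioned above.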