230 likes | 452 Views
Introduction to the C2000 Control Law Accelerator Part 1. Lori Heustess C2000 Applications April 8, 2009. Session Agenda. Introduction: What is it? Why is it? Architecture: Floating-Point Format, Tasks, CLA Execution Flow,
E N D
Introduction to the C2000 Control Law Accelerator Part 1 Lori Heustess C2000 Applications April 8, 2009
Session Agenda Introduction:What is it? Why is it? Architecture: Floating-Point Format, Tasks, CLA Execution Flow, Time Slicing, Register Set, Program and Data Bus, Memory and Register Access Instructions: Format, Addressing Modes, Types of Instructions Parallel Instructions, Status Flags Pipeline:Pipeline Stages, Affects on Instructions CLA Compared to C28x+FPU CLA in a Control System: Code Partitioning, “Just in Time” ADC Sampling Code Development and Debug: Anatomy of CLA Code, Initialization, Code Debug
Session Agenda Introduction:What is it? Why is it? Architecture: Floating-Point Format, Tasks, CLA Execution Flow, Time Slicing, Register Set, Program and Data Bus, Memory and Register Access Instructions: Format, Addressing Modes, Types of Instructions Parallel Instructions, Status Flags Pipeline: Pipeline Stages, Affects on Instructions CLA Compared to C28x+FPU CLA in a Control System: Code Partitioning,“Just in Time” ADC Sampling Code Development and Debug: Anatomy of CLA Code, Initialization, Code Debug
3.3V What is the Control Law Accelerator (CLA)? An independent 32-bit floating-point math accelerator C28x CPU High Res PWM 12-bit ADC CLA CMP • Operates independently of the C28x CPU • Clocked at the CPU frequency (SYSCLKOUT) • independent register set, memory bus structure and processing unit • Direct access to ePWM+HRPWM, ADC result and comparator registers • Low interrupt response time – no nesting of tasks • Can read an ADC result “just-in-time” • Execution of algorithms in parallel with the C28x CPU • Executes time-critical control loops concurrently with the main CPU
3.3V What is the Control Law Accelerator (CLA)? An independent 32-bit floating-point math accelerator C28x CPU High Res PWM 12-bit ADC CLA CMP • Fully programmable: IEEE 32-bit floating-point • Easier to code than fixed-point • Inherently more robust • Removes scaling and saturation burden. • Sign inversion problems go away • Supports instructions to convert fixed-point to float when reading in the value (ex: MIU16TOF32)
3.3V Key Drivers Digital Power Applications Improved system robustness (IEC60730, SIL-3) Free-up C28x CPU for other tasks (comms, diagnostics) Improved support for multi-channel (phase/freq) loops Faster system response & higher MHz control loops Reduced sample-to-output delay (& jitter) Benefits of the CLA An independent 32-bit floating-point math accelerator C28x CPU High Res PWM 12-bit ADC CLA CMP Automotive, White-goods All Applications
PICCOLO Device With CLA (F2803x series) DAC VDDA Analog Power 3.3V +/-10% Digital Power VREG 3.3V +/-10% VDDIO 3 x Comp VDD (core voltage) VSSA VREGENZ VSS POR/BOR XRSn 32-bit CLA 60MHz Msg RAM 256Byte Prog RAM 8KByte Data0 RAM 2KByte Data1 RAM 2KByte Secure Sleep EQEP LIN (SCI) CAN DAC 3 x Comp 3xDAC 10-bit 3 x Comp Ax - A I O M u x + G P I O M u x GPIO0 P e r B u s EPWM1 HRPWM ADC 12-bit 2 S/H 4.6MSPS Interrupt EPWM2 HRPWM EPWM3 HRPWM EPWM4 HRPWM EPWM5 HRPWM EPWM6 HRPWM Bx 32-bit C28-CPU 60MHz Per Bus EPWM7 HRPWM OSC1 10MHz Interrupt JTAG PLL 3 External Interrupts 32b 32b 32b OSC2 10MHz WD P e r B u s ECAP m u x LPM X1 EXT XTAL M0,M1 RAM 4KByte Secure FLASH 64/128 KByte 4/8 sectors L0 RAM 4KByte X2 SCI GPIO MUX 2 * SPI SPI Boot ROM XCLKIN I2C OTP 2KByte GPIOx
Session Agenda Introduction: What is it? Why is it? Architecture: Floating-Point Format, Tasks, CLA Execution Flow, Time Slicing, Register Set, Program and Data Bus, Memory and Register Access Instructions: Format, Addressing Modes, Types of Instructions Parallel Instructions, Floating-Point Flags Pipeline: Pipeline Stages, Affects on Instructions CLA Compared to C28x+FPU CLA in a Control System: Code Partitioning, “Just in Time” ADC Sampling Code Development and Debug: Anatomy of CLA Code, Initialization, Code Debug
IEEE Single-Precision Floating-Point Format S E M Value 0 1 0 0 Positive or Negative Zero 0 1 0 Non-Zero Denormalized Number 0 1 1–254 0–0x7FFFF Positive or Negative Values* 0 1 255 (max) 0 Positive or Negative Infinity 0 1 255 (max) Non-Zero Not a Number (NaN) 1 Sign Bit (0 = Positive, 1 = Negative) S E M 8-bit Exponent (Biased) 23-bit Mantissa (Implicit Leading Bit + Fraction Bits) *Normal Positive and Negative Values are Calculated as: ( -1 ) s x 2 (E-127) x 1.M+/- ~1.7 x 10 -38 to +/- ~3.4 x 10 +38 The normalized IEEE numbers have a hidden 1. Thus the equivalent signed integer resolution is the number of mantissa bits + sign + 1
IEEE Single-Precision Floating-Point Format IEEE 754 Most Widely Used Standard for Floating Point • Standard number formats, special values (NaN, infinity) • Rounding modes & floating-point operations • Used on many CPUs including C67x, C28x+FPU Simplifications for the CLA (Same as C28x+FPU): • Flags & compare operations: Negative zero is treated as positive zero • Denormalized values are treated as zero • Not-a-number (NaN) is treated as infinity • Round-to-zero mode supported (truncate) • Round-to-nearest mode supported (even) These formats are commonly handed this way on embedded processors. Note: The C3x used a different format.
MIFR MICLR MIFRC MIOVF MICLROVF MVECT1 to MIER MIRUN MVECT8 MR0 (32) MR1 (32) MAR0 MR2 (32) MAR1 MR3 (32) C28x CPU CLA Configuration Registers IACK #16bit Task Triggers From Peripheral Interrupts or Software CLA1_INT1 to CLA1_INT8 LVF, LUF PIE INT11 INT12 ADCINT1 to ADCINT8 MPERINT1 to MPERINT8 EPWM1_INT to EPWM7_INT Main CPU Read/Write Data Bus T0INT (CPU Timer 0) MPISRCSEL1 Data RAMs Map to CPU or CLA Space Map to CPU or CLA Space CLA Program Memory MEMCFG RAM0 CLA Data Bus MCTL CLA Program Bus RAM1 Message RAMs CPU to CLA CLA Data Read Addr Bus CLA Execution Registers CLA to CPU CLA Prog Data Bus CLA Data Read Data Bus Registers CLA Prog Addr Bus CLA Data Write Addr Bus ADC Result ePWM HRPWM MSTF (32) CLA Data Write Data Bus Main CPU Data Read Bus COMP MEALLOW MPC Main C28x CPU Bus
What is a CLA Task? CLA Task: CLA assembly code routine CLA Supports 8 interrupts (Task1 to Task8) The start address of the task is configurable (MVECTx) The end address is marked with an MSTOP instruction Executed by the CLA in response to an interrupt event • The task executed depends on the interrupt received: • Interrupt 1 = Task1: ADCINT1 or EPWM1_INT (Highest Priority) • Interrupt 2 = Task2: ADCINT2 or EPWM2_INT • … • Interrupt 7 = Task7: ADCINT7 or EPWM7_INT • Interrupt 8 = Task8: ADCINT8 or CPU Timer 0 (Lowest Priority) • Once a task begins it runs to completion (no task/interrupt nesting) Tasks can also be started via the main CPU’s IACK instruction For example: IACK #0x0003 will flag Task1 and Task2
CLA Time Slicing CPU Task 4 CPU Task 2 CPU Task 3 CPU Task 3 CPU Task 1 CPU Task 1 CLA Task 1 CLA Task 2 CLA Task N CLA Task 1 CLA Task 2 The CLA performs multiple tasks using the "Time Slicing" method The main CPU handles other system tasks, the two work in parallel Communication between CPU & CLA is via shared RAM
MIFR MICLR MIFRC MIOVF MICLROVF MVECT1 to MIER MIRUN MVECT8 MR0 (32) MR1 (32) MAR0 MR2 (32) MAR1 MR3 (32) CLA Register Set CLA Configuration Registers CLA Configuration Registers: CSM and EALLOW Protected Main CPU has Read and Write Access Interrupt/Task Control MIFR: Flag MICLR: Clear MIFRC: Force MIOVF: Overflow flag MICLROVF: Overflow clear Configuration and Control MEMCFG: Memory config MCTL: CLA control MIER: Interrupt enable/disable MIRUN: Which task is running Interrupt/Task Source Selection MPISRCSEL1: Task1: ADCINT1 or EPWM1_INT Task2: ADCINT2 or EPWM2_INT …. Task7: ADCINT7 or EPWM7_INT Task8: ADCINT8 or CPU Timer 0 MPISRCSEL1 MEMCFG MCTL Eight Interrupt (Task) Vectors MVECT1 to MVECT8 Offset from the start of CLA Program Memory to the beginning of the task CLA Execution Registers: CSM Protected Main CPU has Read Only Access CLA Execution Registers MSTF: Status Register Zero, negative, overflow, underflow Rounding mode RPC: Return PC MEALLOW Four 32-bit Result Registers MR0 – MR3 MSTF (32) MPC: 12-bit Program Counter Offset from the start of CLA program memory Indicates instruction in the D2 phase Two 16-bit Auxiliary Registers MAR0, MAR1 Used for indirect addressing MPC
No Task Pending? (MIFR) Task Enabled? (MIER) No Yes Yes No Task Request ? Clear MIFR.x bit Set MIRUN.x bit MPC == MVECTx Run CLA Yes MIFR bit Set? No End of Task? MSTOP Set MIFR bit (Task Pending) No Yes Set MIOVF Bit (Overflow Flagged) Yes Clear MIRUN.x bit Task x Interrupt to PIE CLA Execution Flow Task request is via software or interrupt assigned in MPISRCSEL1: Task1: ADCINT1 or EPWM1_INT Task2: ADCINT2 or EPWM2_INT … Task7: ADCINT7 or EPWM7_INT Task8: ADCINT8 or CPU Timer 0 x = Highest priority task both enabled and pending Priority Task1: Highest . . . Task8: Lowest The task runs to completion (No task nesting) The main CPU continues code execution in parallel with the CLA When a task completes a task-specific interrupt is sent to the PIE Note: Software task requests will not set MIOVF
Piccolo (2803x) CLA Memory and Register Access Data RAMs RAM0 RAM1 Message RAMs CPU to CLA CLA to CPU Registers ADC Result ePWM HRPWM COMP CLA Program Memory CLA Program Memory: - Mapped to CPU program and data space at reset - CLA code must be even aligned (all instructions are 32-bits) - 4K x 16 (2048 CLA instructions), single cycle CLA Data Memory: - Two blocks: RAM0 and RAM1. 1K x 16 each, single cycle - Mapped to CPU program and data space at reset - Each block can be independently mapped to CLA data space Message RAMs: - Used to pass data between the CLA and CPU CPU to CLA message RAM (Ignores CLA writes) CLA to CPU message RAM (Ignores CPU writes) - Always mapped to both CPU and CLA memory space - 128 x 16 each, single cycle Registers the CLA can Directly Access: - ePWM + HRPWM, Comparator and ADC Result registers - CLA MEALLOW protects EALLOW registers from CLA writes
Session Agenda Introduction: What is it? Why is it? Architecture: Floating-Point Format, Tasks, CLA Execution Flow, Time Slicing, Register Set, Program and Data Bus, Memory and Register Access Instructions: Format, Addressing Modes, Types of Instructions Parallel Instructions, Floating-Point Flags Pipeline: Pipeline Stages, Affects on Instructions CLA Compared to C28x+FPU CLA in a Control System: Code Partitioning,“Just in Time” ADC Sampling Code Development and Debug: Anatomy of CLA Code, Initialization, Code Debug
CLA Instructions Source Operands Destination • Same instruction format as the C28x and C28x+FPU Destination operand is always on the left Fixed Point: MPY ACC, T, loc16 Floating Point: MPYF32 R0H, R1H, R2H CLA: MMPYF32 MR0, MR1, MR2 Same mnemonics as the C28x+FPU but with a leading “M” To enable support for CLA instructions on the use the switch: --cla_support=cla0 (C2800 codegen tools v5.2.x or later)
CLA Addressing Modes CLA has only two addressing modes: • Both modes can access the low 64k of memory which includes: • All of the CLA data space • Both message RAMs • Shared peripheral registers • No stack pointer or data page pointer Direct Addressing: Encodes the 16-bit Address of the Variable: MMOV32 MR1, @_Var1 Indirect Addressing with 16-bit Post Increment Syntax: MAR0[#imm16]++ MAR1[#imm16]++ Uses the address in MAR0 or MAR1 to access memory After the read or write MAR0/MAR1 is incremented by #Imm16 MMOV32 MR1, MAR0[-2]++ ; Load MR1 with what MAR0 points to ; & post increment MAR0 by -2
Types of CLA Instructions * Number of cycles varies based on how many of the delay slots can be used up
Parallel Instructions Single instruction Single opcode Performs 2 operations Example: Add + parallel store Parallel bars indicate a parallel instruction • MADDF32 MR3, MR3, MR1 • || MMOV32 @_Var, MR3 Both Operations Complete in a Single Cycle!
Look for the next presentation Introduction to the C2000 Control Law Accelerator Part 2 Thank you!