Microprocessors, Advanced: Partitioning an Embedded System for Multicore Design
January 31, 2012
Jack Ganssle
The Schedule Grows Faster Than The Code!
IBM data:
  person-yrs   LOC/month
  1            439
  10           220
  100          110
  1000         55
COCOMO: Schedule = C * KLOC^M (C and M are both > 1)
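The superlinear growth in the COCOMO-style formula above can be sketched in a few lines. The constants C and M here are assumptions chosen for illustration (real COCOMO calibrations depend on project type); the point is only that with M > 1, doubling the code size more than doubles the schedule.

```python
# Illustrative sketch of Schedule = C * KLOC^M with C and M both > 1.
# C = 2.4 and M = 1.12 are placeholder values, not from the talk.

def schedule(kloc, c=2.4, m=1.12):
    """Schedule in arbitrary units for a program of `kloc` thousand lines."""
    return c * kloc ** m

# Doubling the code size from 10 KLOC to 20 KLOC:
ratio = schedule(20) / schedule(10)
print(round(ratio, 2))  # ≈ 2.17 — more than 2x, since M > 1
```

This is the same effect as the IBM table: as programs grow, productivity per person-month falls, so the schedule grows faster than the code.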
Partitioning Code
Fact: The easiest way to write great modules fast is to keep them small, with few dependencies.
Smaller functions:
• have fewer bugs (bug rates are 2 to 6x lower)
• are more likely to meet specs
• are done faster.
We Turn Micros into Mainframes
[Diagram: an 8051 with sensors and an interface, running 1,000,000 lines of code]
A Better Design
[Diagram: supervisory code coordinating several small processors, each running its own I/O code]
Interprocessor Communications
[Diagram: a main CPU linked over I2C, a serial interface, to dedicated processors for serial/encryption, a rangefinder, and transaction processing]
The Tradeoff
[Diagram: the classic project triangle of schedule, quality, and features]
Requirements Scrubbing = 73.6%!
Don’t Wait for Hardware
• Build an I/O board that plugs into the PC
• Simulate!
• Virtualization – Virtutech, CoWare, VaST
• FitNesse: http://fitnesse.org/
• CatsRunner: www.agilerules.com/projects/catsrunner/index.phtml
What About Multicore?
[Diagram: a CPU connected directly to memory; accesses take hundreds of nsec, CPUs run at tens of MHz]
Then Came Prefetchers
[Diagram: a prefetch queue between CPU and memory; accesses drop to under 100 nsec, CPUs at tens of MHz]
Then Came Pipelines
[Diagram: CPU and memory; accesses of 30-50 nsec, CPUs at tens of MHz]
Old: Fetch -> Decode -> Execute, one instruction at a time
Pipelined: the fetch, decode, and execute stages overlap across successive instructions
Cache
[Diagram: a cache between CPU and memory; CPU and cache run at CPU speed (hundreds of MHz), memory at 30-50 nsec]
Cache Splits in Two
[Diagram: CPU with an L1 cache at CPU speed (over 1 GHz), an L2 cache at 3-5 nsec, and memory at 30-50 nsec]
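A quick back-of-the-envelope calculation shows why this hierarchy works. Using rough latencies in the spirit of the slides, and hit rates that are purely assumptions for illustration, the average memory access time lands far closer to L1 speed than to DRAM speed:

```python
# Average memory access time (AMAT) sketch. The latencies roughly
# follow the slides (L2 at 3-5 ns, memory at 30-50 ns); the L1
# latency and both hit rates are assumptions for illustration.
L1_NS, L2_NS, MEM_NS = 1.0, 4.0, 40.0
l1_hit, l2_hit = 0.95, 0.90   # assumed hit rates

amat = l1_hit * L1_NS + (1 - l1_hit) * (l2_hit * L2_NS + (1 - l2_hit) * MEM_NS)
print(round(amat, 2))  # ≈ 1.33 ns — close to L1, despite slow DRAM
```

With high hit rates, the slow memory at the bottom of the hierarchy is almost invisible; with low hit rates, the CPU runs at memory speed.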
SMP
Symmetric Multiprocessing (SMP) – multiple identical CPUs working with a shared memory array.
[Diagram: two CPU cores sharing one memory]
Amdahl’s Law for SMP
Max speedup = 1 / (f + (1 - f)/n)
Where:
n = number of processors
f = fraction of the operation that cannot be parallelized
With an Infinite # CPUs
[Chart: speedup vs. the portion that is not parallelizable; as the core count goes to infinity, speedup approaches 1/f]
Best Case: 66% Parallelizable
[Chart: speedup vs. number of cores; with 66% of the work parallelizable, speedup flattens out below 3x no matter how many cores are added]
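The flattening curve above falls straight out of Amdahl's Law. A minimal sketch, using the formula as given on the slides (f = serial fraction, n = number of cores):

```python
# Amdahl's Law: max speedup = 1 / (f + (1 - f)/n),
# where f is the fraction that cannot be parallelized.

def amdahl_speedup(f, n):
    return 1.0 / (f + (1.0 - f) / n)

# The 66%-parallelizable case from the slide (f = 0.34):
for n in (1, 2, 4, 16, 1024):
    print(n, round(amdahl_speedup(0.34, n), 2))
# Speedup creeps toward, but never reaches, 1/0.34 ≈ 2.94.
```

Going from 4 cores to 1024 cores buys less than a 1.5x improvement here, which is the heart of the multicore caution in the rest of the talk.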
But Memory is a Bottleneck!
[Diagram: two CPU cores, each with its own L1 cache (typically 32 KB), sharing an L2 cache (typically 2-4 MB) in front of a single memory]
And so is Comm
[Diagram: four CPU cores with per-core L1 caches, two shared L2 caches, and one memory]
Then there’s the cache coherency problem.
The Irony
• Programs in L1 run blazingly fast
• But why use a 32-bit CPU that can address 4 GB to run a 32 KB program?
A Colorimeter SMP Design
[Diagram: three cores (R, G, B) on a common bus to shared memory, each with its own A/D converter and display, and each running the same loop:]
- Read A/D
- FIFO data
- Do FIR
- Calculate R (or G, or B)
- Display
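The per-channel loop in the colorimeter example can be sketched as below. The filter taps and A/D readings are made-up placeholders, not from the talk; the point is only the shape of the work each core repeats.

```python
# Sketch of one colorimeter channel's processing step: take the most
# recent A/D samples from a FIFO and run a small FIR filter over them.
# Taps and sample values are illustrative assumptions.

def fir(samples, taps):
    """Convolve the newest len(taps) samples with the filter taps."""
    recent = samples[-len(taps):][::-1]   # newest sample first
    return sum(s * t for s, t in zip(recent, taps))

taps = [0.25, 0.5, 0.25]            # simple low-pass taps (assumption)
fifo = [10, 12, 11, 13, 12]         # pretend A/D readings
channel_value = fir(fifo, taps)
print(channel_value)  # 12.25
```

In the SMP design, three cores each run this loop against shared memory; the AMP design on the next slide instead dedicates hardware to each stage.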
ASMP
Asymmetric Multiprocessing (ASMP or AMP) – multiple CPUs, identical or not, each running a specific activity.
[Diagram: two CPU cores, each with its own memory, connected by a communications link]
A More Natural Design via AMP
[Diagram: three per-channel pipelines, each running A/D -> FIFO -> FIR on dedicated hardware, feeding Calc R, Calc G, and Calc B stages with their own displays]
Another Assembly Line
[Diagram: data flowing through a chain of CPUs, each with its own private memory, assembly-line fashion]
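The assembly-line idea can be sketched with queues: each stage is an independent worker with its own input queue, like CPUs with private memories passing data down a chain. The two stage functions here are toy placeholders, not from the talk.

```python
# Minimal pipeline sketch: two stages connected by queues, each
# running in its own thread, with None as a shutdown sentinel.
import queue
import threading

def stage(fn, q_in, q_out):
    """Pull items from q_in, apply fn, and pass results to q_out."""
    while True:
        item = q_in.get()
        if item is None:          # sentinel: propagate and stop
            q_out.put(None)
            break
        q_out.put(fn(item))

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
t1 = threading.Thread(target=stage, args=(lambda x: x + 1, q1, q2))
t2 = threading.Thread(target=stage, args=(lambda x: x * 2, q2, q3))
t1.start(); t2.start()

for x in (1, 2, 3):
    q1.put(x)
q1.put(None)

results = []
while (r := q3.get()) is not None:
    results.append(r)
t1.join(); t2.join()
print(results)  # [4, 6, 8]
```

As with a hardware assembly line, throughput scales with the number of stages as long as data keeps flowing, but the stages communicate only through their queues, not shared state.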
Implications
Multicore can give huge performance improvements, but for non-parallel problems it may not yield much.
It’s hard, or impossible, to predict the speedup of most algorithms once they grow larger than L1.
Many embedded apps are largely non-parallelizable.
In some cases AMP offers a better solution than SMP.