HW/SW partitioning in the implementation of real-time systems on FPGAs with softcores

Outline • Intro & Motivation • Model • Algorithms • Experiments

Intro and Motivation • Past work on design optimization for single-processor scheduling • The schedulability condition can be viewed as a feasibility region in the domain of the design variables • This region is convex for EDF under reasonable assumptions • Availability of softcores for FPGAs • NIOS II for Altera • Co-design problem • A functionality can be implemented in HW (inside the FPGA) or in SW (inside or outside the FPGA), executed by one or more softcores (how many?).

Motivation • Start from some system Model (Simulink) • Explore different HW design options (0-1-2-4-… NIOS) • For each design option find optimal design configuration by means of convex linear optimization • HW implementation is subject to area constraints • SW implementation is subject to schedulability constraints

HW (area) Constraints • Models available: • Single-dimension • Condition: linear bound or slotted linear bound

HW (area) Constraints • Models available: • Two-dimensional cutting stock problem • Complex, more realistic and extremely well-studied problem (real-world implications) • Linear bound solutions can be found in the operations research literature!

Reality of FPGAs (additional resource constraints)

Schedulability constraints • EDF (or Liu & Layland sufficient) bound • How realistic is it? • Implementations of FP and EDF on NIOS exist • What about deadlines = periods, task independence, and so on?
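As a quick illustration of the two bounds named on this slide, the sketch below checks a task set against the EDF utilization bound (U ≤ 1) and the Liu & Layland sufficient bound for fixed priorities, n(2^(1/n) − 1). It assumes independent periodic tasks with deadlines equal to periods on a single processor; the function names and the example task set are hypothetical, not taken from the original slides.

```python
def utilization(tasks):
    """Total utilization of a list of (wcet, period) pairs."""
    return sum(c / t for c, t in tasks)


def edf_schedulable(tasks):
    # EDF, deadlines = periods, independent tasks, one processor:
    # U <= 1 is both necessary and sufficient.
    return utilization(tasks) <= 1.0


def ll_bound(n):
    # Liu & Layland bound for n tasks under rate-monotonic priorities.
    return n * (2 ** (1.0 / n) - 1)


def rm_schedulable_sufficient(tasks):
    # Sufficient only: a task set above the bound may still be schedulable.
    if not tasks:
        return True
    return utilization(tasks) <= ll_bound(len(tasks))


# Hypothetical task set: (worst-case computation time, period)
tasks = [(1, 4), (2, 6), (1, 8)]
print(utilization(tasks), edf_schedulable(tasks), rm_schedulable_sufficient(tasks))
```

The EDF test is linear in the utilizations, which is what makes it convenient as a constraint in a linear optimization problem.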

The Model • Starting point: Simulink model

The Model • Implementation of a Simulink model • HW implementation: commercial tools exist (Celoxica) for implementing Simulink blocks on an FPGA.

The Model • SW implementation: commercial tools exist, Real-Time Workshop + Embedded Coder (MathWorks) or TargetLink (dSPACE), for implementing Simulink blocks as a set of concurrent threads. • Threads inherit the sampling period of the blocks (periodic model) • No overrun is permitted (deadlines = periods) • Communication is by switched buffers (asynchronous, tasks are independent) • Code generation and switched buffers are not commercially available for EDF, but nothing prevents their implementation

The Model • FPGA = rectangular area of Logic Elements (LEs). All dimensions are expressed in LEs • FPGA height = $H$ • FPGA width = $W$ • Assume a homogeneous two-dimensional model of the FPGA (array of LEs) • $k$ softcores $CPU_l$, $l = 1, \dots, k$, are implemented in the FPGA ($k = 0, 1, 2, \dots$): each core requires an area $s_w \times s_h$, with $s_h \le H$ and $s_w \le W$

The Model • System model = network of blocks • $V = \{F_1, F_2, \dots, F_n\}$ is the set of functional blocks • A block $F_i$ can be implemented in HW or SW, according to the value of $s_{il} \in \{0,1\}$: $s_{il} = 1$ if block $F_i$ is executed in SW on $CPU_l$. If not executed in SW, a block MUST be implemented in HW. • If implemented in HW, a block requires an area $w_i \times h_i$ • If implemented in SW, a block $F_i$ has a worst-case computation time $g_i$ and a period of execution $t_i$, hence a utilization $u_i = g_i / t_i$ (a HW implementation has $g_i \approx 0$)

The Model • If implemented in SW, a block is executed in the context of a thread with the same period. • $m_{i,j} = 1$ if $F_i$ is mapped for execution in thread $t_j$ and 0 otherwise (these are constants, not optimization variables!) • Schedulability constraint (for each NIOS)
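The per-CPU schedulability constraint can be illustrated with a small check. The sketch below assumes the EDF utilization bound (total utilization at most 1 on each softcore) and computes utilization per block, which is equivalent to computing it per thread because threads inherit the block periods. The function names and the numeric data are hypothetical.

```python
def cpu_utilizations(g, t, s):
    """Utilization of each CPU.

    g[i], t[i]: worst-case computation time and period of block F_i.
    s[i][l]:    1 if block F_i runs in SW on CPU l, 0 otherwise
                (a row of zeros means the block is implemented in HW).
    """
    k = len(s[0]) if s else 0
    return [sum(g[i] / t[i] * s[i][l] for i in range(len(g))) for l in range(k)]


def assignment_is_valid(s):
    # Each block runs on at most one CPU.
    return all(sum(row) <= 1 for row in s)


def all_cpus_schedulable(g, t, s, bound=1.0):
    # Assumed bound: 1.0 corresponds to EDF with deadlines = periods.
    return all(u <= bound for u in cpu_utilizations(g, t, s))


# Hypothetical example: 4 blocks, 2 softcores, block 3 left in HW.
g = [2, 1, 3, 1]
t = [10, 5, 10, 4]
s = [[1, 0], [1, 0], [0, 1], [0, 0]]
print(cpu_utilizations(g, t, s), all_cpus_schedulable(g, t, s))
```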

Results to be exploited • Cutting stock approximate (linear) solution: level packing (Lodi) • Pack the items in rows forming levels • The first level is the bottom of the bin, the second level is built on top of the first, and so on • In each level, the leftmost item is the tallest one • The bottom level is the tallest one • Items are sorted and renumbered by non-increasing $h_i$ values.
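To make the level structure concrete, here is a minimal greedy sketch that builds levels with the properties listed above. It uses a first-fit rule over items sorted by non-increasing height, which is a simple heuristic and not the exact ILP model referred to on the slides; the function name and the example items are hypothetical.

```python
def level_pack(items, W):
    """Greedy level packing of (width, height) items into a bin of width W.

    Returns (total_height, levels). Each level is opened by its tallest
    item, which is also its leftmost item.
    """
    order = sorted(items, key=lambda wh: wh[1], reverse=True)
    levels = []
    for w, h in order:
        if w > W:
            raise ValueError("item wider than the bin")
        for level in levels:
            if level["used"] + w <= W:      # first level with enough width left
                level["used"] += w
                level["items"].append((w, h))
                break
        else:                               # no level fits: this item opens a new one
            levels.append({"height": h, "used": w, "items": [(w, h)]})
    return sum(level["height"] for level in levels), levels


# Hypothetical blocks as (w_i, h_i), packed into a bin of width 8.
items = [(4, 3), (3, 5), (5, 2), (2, 4), (3, 1)]
height, levels = level_pack(items, 8)
print(height, [level["items"] for level in levels])
```

Because items are processed by non-increasing height, the first level opened is the tallest, and the height of every level is fixed by the item that opens it.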

Results to be exploited • An example: • there are n potential levels (one for each initializing block)

Results to be exploited • Variables: • $y_i = 1$ if item $i$ initializes level $i$, 0 otherwise • Objective (original): • minimize the height of the required rectangle

Results to be exploited • Constraints (original): • $x_{ij}$, $i \in \{1, \dots, n-1\}$, $j > i$: $x_{ij} = 1$ if item $j$ is packed in level $i$, 0 otherwise • Each item is packed exactly once • Width constraint
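For reference, the objective and constraints described in words on these slides can be written as the following integer linear program. This is a reconstruction from the variable definitions given here and from the standard level-packing formulation in the operations research literature, not a transcription of the original slide equations. It assumes items are numbered by non-increasing height $h_i$, that $w_i$ is the item width, and that $W$ is the bin width.

```latex
\begin{align}
\min \quad & \sum_{i=1}^{n} h_i\, y_i
  && \text{(total height of the opened levels)} \\
\text{s.t.} \quad
  & \sum_{i=1}^{j-1} x_{ij} + y_j = 1, && j = 1, \dots, n
  \quad \text{(each item is packed exactly once)} \\
  & \sum_{j=i+1}^{n} w_j\, x_{ij} \le (W - w_i)\, y_i, && i = 1, \dots, n-1
  \quad \text{(width constraint)} \\
  & y_i \in \{0,1\}, \quad x_{ij} \in \{0,1\}
\end{align}
```

The width constraint also forces $x_{ij} = 0$ whenever level $i$ is not opened ($y_i = 0$), so no separate linking constraint is needed.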

Reusing Results • These results can be reused as follows: • The original objective can be retained or it can become a constraint

Results to be exploited • The existence of a packing (Each item is packed exactly once) • Becomes … • Each item is packed exactly once or it is executed on a CPU

Results to be exploited • The width constraint is retained … • A schedulability constraint must be added for each CPU • Options: • Minimize height with the utilization constraint • Minimize utilization with height constraint
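One way to write the combined problem is sketched below. It is an assumed formulation built from the definitions in these slides: $x_{ij}$ and $y_i$ are the level-packing variables, $s_{il}$ assigns block $F_i$ to $CPU_l$, $u_i = g_i / t_i$, and the EDF bound of 1 is used on each CPU. The area taken by the CPUs themselves is not accounted for here.

```latex
\begin{align}
& \sum_{i=1}^{j-1} x_{ij} + y_j + \sum_{l=1}^{k} s_{jl} = 1,
  && j = 1, \dots, n
  && \text{(packed once, or run on one CPU)} \\
& \sum_{j=i+1}^{n} w_j\, x_{ij} \le (W - w_i)\, y_i,
  && i = 1, \dots, n-1
  && \text{(width, unchanged)} \\
& \sum_{i=1}^{n} u_i\, s_{il} \le 1,
  && l = 1, \dots, k
  && \text{(schedulability of each CPU)}
\end{align}
```

The first option then minimizes $\sum_{i} h_i y_i$ subject to these constraints. The second adds the height constraint $\sum_{i} h_i y_i \le H$ and minimizes utilization; taking the total $\sum_{l} \sum_{i} u_i s_{il}$ as the objective is one possible reading, since the slides do not say whether total or maximum per-CPU utilization is meant.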

Problem • The available area is not rectangular! • The area necessary for implementing the k CPUs must be considered • Solution: • Start with the 1-CPU case: there are two possible partitionings • [Figure: the two partitionings of a $W \times H$ FPGA around one CPU; labeled dimensions $s_w$, $s_h$, $H - s_h$, $W - s_w$] • Duplicate all packing variables (the complexity of the problem is correspondingly increased)
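The two partitionings of the 1-CPU case can be sketched as follows, assuming the CPU occupies a corner of the FPGA so that the free area is L-shaped and can be cut into two rectangles either horizontally or vertically. The function name and the numbers are hypothetical; the rectangle dimensions follow the labels that survive from the original figure.

```python
def one_cpu_partitionings(W, H, sw, sh):
    """Two ways to split the free area around one corner CPU of size sw x sh.

    Each partitioning is a list of two (width, height) rectangles.
    """
    if not (0 < sw <= W and 0 < sh <= H):
        raise ValueError("CPU does not fit in the FPGA")
    # Horizontal cut: a full-width strip plus the strip beside the CPU.
    horizontal = [(W, H - sh), (W - sw, sh)]
    # Vertical cut: a full-height strip plus the strip above the CPU.
    vertical = [(W - sw, H), (sw, H - sh)]
    return horizontal, vertical


def area(rects):
    return sum(w * h for w, h in rects)


# Hypothetical sizes in Logic Elements.
horizontal, vertical = one_cpu_partitionings(W=10, H=8, sw=3, sh=2)
print(horizontal, vertical)
```

Each rectangle becomes a separate bin for the level-packing model, which is why the packing variables have to be duplicated.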

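Problem • For the k-CPU case additional assumptions are required (CPUs are packed by rows, by columns, or …) • [Figure: CPUs packed by rows or by columns in a $W \times H$ FPGA; labeled dimensions $s_w$, $s_h$, $H - k s_h$, $H - 2 s_h$, $W - k s_w$, $W - 2 s_w$]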

Experimenting with GLPK • Demo …
