This presentation covers the synthesis of embedded software for reactive systems. The work is part of a project that develops a formal design environment combining design methodologies, formal methods, and modeling techniques; participants from several universities and companies contribute to it.
Synthesis of Embedded Software for Reactive Systems Jordi Cortadella Universitat Politècnica de Catalunya, Barcelona Joint work with: Robert Clarisó, Alex Kondratyev, Luciano Lavagno, Claudio Passerone and Yosinori Watanabe (UPC, Cadence Berkeley Labs, Politecnico di Torino)
Methodology for Platform-based System Design
(diagram) A system designer (e.g. of a set-top box) builds the system on a platform containing an MPEG2 engine, µP, custom logic, communications, graphics and DSP blocks, interacting with a contents provider (e.g. a TV broadcast company), a platform provider (e.g. a semiconductor company), an external IP provider (e.g. a software modem) and an internal IP provider (e.g. an MPEG2 engine). The parties exchange requirements specifications and testbenches, and functional and performance models (with agreed interfaces and abstraction levels).
Metropolis Project
• Goal: develop a formal design environment
• Design methodologies: abstraction levels, design problem formulations
• EDA: formal methods for automatic synthesis and verification; a modeling mechanism supporting heterogeneous semantics and concurrency
• Participants:
  • UC Berkeley (USA): methodologies, modeling, formal methods
  • CMU (USA): formal methods
  • Politecnico di Torino (Italy): modeling, formal methods
  • Universitat Politècnica de Catalunya (Spain): modeling, formal methods
  • Cadence Berkeley Labs (USA): methodologies, modeling, formal methods
  • Philips (Netherlands): methodologies (multi-media)
  • Nokia (USA, Finland): methodologies (wireless communication)
  • BWRC (USA): methodologies (wireless communication)
  • BMW (USA): methodologies (fault-tolerant automotive controls)
  • Intel (USA): methodologies (microprocessors)
Metropolis Framework
Inputs: Function Specification, Architecture Specification, Design Constraints.
Metropolis formal methods: Synthesis/Refinement and Analysis/Verification.
• Metropolis Infrastructure
  • Design methodology
  • Meta model of computation
  • Base tools
    - Design imports
    - Meta model compiler
    - Simulation
Outline • The problem • Synthesis of concurrent specifications for sequential processors • Compiler optimizations across processes • Previous work: Dataflow networks • Static scheduling of SDF networks • Code and data size optimization • Quasi-Static Scheduling of process networks • Petri net representation of process networks • Scheduling and code generation • Open problems
Environmental controller
(block diagram) The temperature and humidity sensors (TSENSOR, HSENSOR) feed the TEMP FILTER and HUMIDITY FILTER processes; their outputs TDATA and HDATA go to the CONTROLLER, which drives the AC (AC-on), the dehumidifier (DRYER-on) and the alarm (ALARM-on).
Environmental controller
TEMP-FILTER:
  float sample, last;
  last = 0;
  forever {
    sample = READ(TSENSOR);
    if (|sample - last| > DIF) {
      last = sample;
      WRITE(TDATA, sample);
    }
  }
HUMIDITY-FILTER:
  float h, max;
  forever {
    h = READ(HSENSOR);
    if (h > MAX) WRITE(HDATA, h);
  }
CONTROLLER:
  float tdata, hdata;
  forever {
    select(TDATA, HDATA) {
      case TDATA:
        tdata = READ(TDATA);
        if (tdata > TFIRE) WRITE(ALARM-on, 10);
        else if (tdata > TMAX) WRITE(AC-on, tdata-TMAX);
      case HDATA:
        hdata = READ(HDATA);
        if (hdata > HMAX) WRITE(DRYER-on, 5);
    }
  }
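As a purely illustrative sketch of what one of these processes could look like in plain C (the blocking channel primitives channel_read/channel_write, the channel identifiers and the DIF value are assumptions, not part of the slides):

#include <math.h>

/* Hypothetical blocking, point-to-point channel primitives. */
extern float channel_read(int channel_id);
extern void  channel_write(int channel_id, float value);
enum { TSENSOR, TDATA };       /* channel identifiers (assumed) */
#define DIF 0.5f               /* threshold value chosen arbitrarily */

void temp_filter(void) {
    float sample, last = 0.0f;
    for (;;) {                               /* "forever" in the slides */
        sample = channel_read(TSENSOR);      /* blocking read */
        if (fabsf(sample - last) > DIF) {
            last = sample;
            channel_write(TDATA, sample);
        }
    }
}

HUMIDITY-FILTER and CONTROLLER would follow the same pattern, with select mapping to whatever multi-channel wait primitive the runtime offers.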
Execution on top of an operating system:
  TSENSOR event → T-FILTER wakes up → T-FILTER executes → T-FILTER sleeps
  HSENSOR event → H-FILTER wakes up → H-FILTER executes & sends data to HDATA → H-FILTER sleeps
  → CONTROLLER wakes up → CONTROLLER executes & reads data from HDATA → …
(interleaving among the environment, the processes and the operating system)
Compiler optimizations
• Instruction level: a = b*16  →  a = b >> 4
• Basic blocks: common subexpressions, copy propagation
• Intra-procedural (across basic blocks): loop invariants, induction variables
• Inter-procedural: inline expansion, parameter propagation
• Inter-process (?): channel optimizations, OS overhead reduction
Each optimization enables further optimizations at lower levels.
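For concreteness, a small C example of the first two levels; the snippet is illustrative only and not taken from the slides:

/* Instruction level: strength reduction turns the multiply into a shift. */
unsigned scale(unsigned b) {
    return b * 16;              /* an optimizing compiler emits b << 4 */
}

/* Basic block: common subexpression elimination and copy propagation. */
int area_sum(int w, int h) {
    int a = w * h;              /* w*h computed once ...             */
    int b = w * h + 1;          /* ... and reused here after CSE     */
    int c = a;                  /* copy propagation replaces c by a  */
    return b + c;
}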
Partial evaluation (example)
Specification: subsets(n,k) = n! / (k! * (n-k)!)

int subsets (int n, int k) { return fact(n) / (fact(k) * fact(n-k)); }
int pairs (int n) { return subsets (n,2); }

... print (pairs(x+1)) ...      ... print (pairs(5)) ...

Partial evaluation (by compiler optimizations) specializes the calls step by step:

... print ((x+1)*x / 2) ...     ... print (pairs(5)) ...

... print ((x+1)*x / 2) ...     ... print (10) ...
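A plain C rendering of this specification, useful for seeing what the partial evaluator works on; the function bodies are assumptions consistent with the formula above:

#include <stdio.h>

/* A partial evaluator can specialize pairs(x+1) to (x+1)*x/2 and
   fold pairs(5) to the constant 10, as sketched in the slides. */
static int fact(int n) {
    int r = 1;
    for (int i = 2; i <= n; i++) r *= i;
    return r;
}

static int subsets(int n, int k) { return fact(n) / (fact(k) * fact(n - k)); }
static int pairs(int n)          { return subsets(n, 2); }

int main(void) {
    int x = 4;                        /* arbitrary example input */
    printf("%d\n", pairs(x + 1));     /* optimizable to (x+1)*x/2 */
    printf("%d\n", pairs(5));         /* optimizable to the constant 10 */
    return 0;
}

Even with x known only at run time, a compiler can inline pairs and subsets and fold the constant factorials, which is exactly the chain of rewrites shown above.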
Inter-process partial evaluation
The network computes pairs(n): a producer process reads n from channel A and feeds three factorial processes ("x!") through channels B, C and D; their results arrive on channels E, F and G at a consumer process that writes the result to channel H.

forever {
  n = read (A);
  write (B, n);
  write (C, n-2);
  write (D, 2);
}

forever {
  x = read (E);
  y = read (F);
  z = read (G);
  write (H, x/(y*z));
}
Considered in isolation, neither process offers any chances for optimization.
Looking across processes, however, channel D always carries the constant 2, and since 2! = 2, channel G also carries only 2s.
The factorial process between D and G can therefore be eliminated: the producer writes the constant 2 directly to G.

forever {
  n = read (A);
  write (B, n);
  write (C, n-2);
  write (G, 2);
}

forever {
  x = read (E);
  y = read (F);
  z = read (G);
  write (H, x/(y*z));
}
Copy propagation across processes substitutes the constant 2 at the consumer; channel G now only synchronizes (a token is available, its value does not matter).

forever {
  n = read (A);
  write (B, n);
  write (C, n-2);
  write (G, *);
}

forever {
  x = read (E);
  y = read (F);
  read (G);
  write (H, x/(y*2));
}
By scheduling operations properly, the FIFOs may become variables (at most one element per FIFO).
Scheduling merges the producer and the consumer into a single thread; channels B/E and C/F become variables v1/v2 and v3/v4, with the two remaining factorial processes (computing v2 from v1 and v4 from v3) still in between:

forever {
  n = read (A);
  v1 = n;
  v3 = n-2;
  x = v2;
  y = v4;
  write (H, x/(y*2));
}
Inlining the factorial processes yields a single process, and now we can apply conventional compiler optimizations:

forever {
  n = read (A);
  v1 = n;
  v2 = fact (v1);
  x = v2;
  v3 = n-2;
  v4 = fact (v3);
  y = v4;
  write (H, x/(y*2));
}
forever {
  n = read (A);
  x = fact (n);
  y = fact (n-2);
  write (H, x/(y*2));
}

If some "clever" theorem prover could realize that fact(n) = n*(n-1)*fact(n-2), the following code could be derived ...
forever {
  n = read (A);
  write (H, n*(n-1)/2);
}
Recall that this was the original specification of the system:

forever {
  n = read (A);
  write (B, n);
  write (C, n-2);
  write (D, 2);
}

forever {
  x = read (E);
  y = read (F);
  z = read (G);
  write (H, x/(y*z));
}

(plus the three factorial processes between them)
... and this is the final implementation after inter-process optimization:

forever {
  n = read (A);
  write (H, n*(n-1)/2);
}

• Only one process (no context switching overhead)
• Channels substituted by variables (no communication overhead)
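For concreteness, a minimal C sketch of this final single process, assuming blocking channel primitives read_A and write_H are provided by the target platform (both names are placeholders, not from the slides):

/* Sketch of the optimized single process. */
extern int  read_A(void);
extern void write_H(int value);

void pairs_process(void) {
    for (;;) {
        int n = read_A();
        write_H(n * (n - 1) / 2);   /* = fact(n) / (fact(n-2) * 2) */
    }
}

The whole network collapses into one loop with neither OS scheduling nor FIFO management.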
Goal: improve performance, code size, power consumption, ...
• Reduce operating system overhead
• Reduce communication overhead
• How? Do as much as possible statically and automatically
  • Scheduling
  • Compiler optimizations
Outline • The problem • Synthesis of concurrent specifications • Compiler optimizations across processes • Previous work: Dataflow networks • Static scheduling of SDF networks • Code and data size optimization • Quasi-Static Scheduling of process networks • Petri net representation of process networks • Scheduling and code generation • Open problems
Dataflow networks • Powerful mechanism for data-dominated systems • (Often stateless) actors perform computation • Unbounded FIFOs perform communication via sequences of tokens carrying values • (matrix of) integer, float, fixed point • image of pixels, ….. • Determinacy: • unique output sequences given unique input sequences • Sufficient condition: blocking read (process cannot test input queues for emptiness)
Intuitive semantics
• Example: FIR filter
  • single input sequence i(n)
  • single output sequence o(n)
  • o(n) = c1 * i(n) + c2 * i(n-1)
(block diagram: input i feeds coefficient c1 directly and coefficient c2 through a unit delay initialized with i(-1); the two products are added to produce o)
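As an illustration only (the actor interface below is an assumption, not a construct from the slides), one firing of this two-tap FIR could be coded in C as:

/* Two-tap FIR actor: o(n) = c1*i(n) + c2*i(n-1).
   Each firing consumes one input token and produces one output token;
   the state holds the previous sample (initially the token i(-1)). */
typedef struct {
    float c1, c2;
    float prev;                  /* i(n-1) */
} fir_state;

static float fir_fire(fir_state *s, float in) {
    float out = s->c1 * in + s->c2 * s->prev;
    s->prev = in;                /* current sample becomes i(n-1) next time */
    return out;
}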
Examples of dataflow actors
• SDF (Static Dataflow): fixed number of input and output tokens per firing (e.g. an adder consuming 1 token on each input and producing 1 token, or an FFT consuming 1024 tokens and producing 1024)
• BDF (Boolean Dataflow): a control token determines the number of consumed and produced tokens (e.g. the select and merge actors, steered by a T/F control input)
Static scheduling of DF • Key property of DF networks: output sequences do not depend on firing sequence of actors (marked graphs) • SDF networks can be statically scheduled at compile-time • execute an actor when it is known to be fireable • no overhead due to sequencing of concurrency • static buffer sizing • Different schedules yield different • code size • buffer size • pipeline utilization
Balance equations
(edge from actor A to actor B: A produces np tokens per firing, B consumes nc tokens per firing)
• The number of produced tokens must equal the number of consumed tokens on every edge (channel)
• Repetitions (or firing) vector v of schedule S: number of firings of each actor in S
• v(A) · np = v(B) · nc must be satisfied for each edge
Balance equations
Example graph: A produces 3 tokens per firing to B (B consumes 1), B produces 1 token to C (C consumes 1), and A produces 2 tokens to C on each of two edges (C consumes 1).
Balance for each edge:
• 3 v(A) - v(B) = 0
• v(B) - v(C) = 0
• 2 v(A) - v(C) = 0
• 2 v(A) - v(C) = 0
In matrix form, with one row per edge and one column per actor (A, B, C):

M = |  3  -1   0 |
    |  0   1  -1 |
    |  2   0  -1 |
    |  2   0  -1 |

• M · v = 0 iff S is periodic
• Full rank (as in this case)
  • no non-zero solution
  • no periodic schedule (too many tokens accumulate on A→B or B→C)
If the production rate of A on the A→B edge is changed from 3 to 2, the matrix becomes

M = |  2  -1   0 |
    |  0   1  -1 |
    |  2   0  -1 |
    |  2   0  -1 |

• Non-full rank
  • infinite solutions exist (linear space of dimension 1)
• Any multiple of v = | 1 2 2 |T satisfies the balance equations
• ABCBC and ABBCC are minimal valid schedules
• ABABBCBCCC is a non-minimal valid schedule
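For illustration (the actor firing functions are placeholders, not from the slides), the minimal valid schedule ABCBC compiles into a fixed sequential loop with no run-time scheduling decisions:

/* Generated code for the static schedule A B C B C:
   one iteration of the infinite loop is one period of the schedule. */
extern void fire_A(void), fire_B(void), fire_C(void);

void run_schedule(void) {
    for (;;) {
        fire_A();
        fire_B(); fire_C();
        fire_B(); fire_C();
    }
}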
Static SDF scheduling • Main SDF scheduling theorem (Lee ‘86): • A connected SDF graph with n actors has a periodic schedule iff its topology matrix M has rank n-1 • If M has rank n-1 then there exists a unique smallest integer solution v to M v = 0
Deadlock
• If no actor is firable in a state before reaching the initial state, no valid schedule exists (Lee '86)
• Example: for the SDF graph on this slide, simulating the candidate schedule (2A) B C reaches a state in which no actor is firable before the initial state is restored: deadlock!
Code size minimization
• Assumptions (based on DSP architecture):
  • subroutine calls expensive
  • fixed iteration loops are cheap ("zero-overhead loops")
• Global optimum: single appearance schedule, e.g. ABCBC → A (2BC), ABBCC → A (2B) (2C)
  • may or may not exist for an SDF graph…
  • buffer minimization relative to single appearance schedules (Bhattacharyya '94, Lauwereins '96, Murthy '97)
Buffer size minimization
Example graph: A → C and B → C (A and B each produce 1 token per firing, C consumes 10 from each input), and C → D (C produces 1, D consumes 10).
• Assumption: no buffer sharing
• Example: v = | 100 100 10 1 |T
• Valid SAS: (100 A) (100 B) (10 C) D
  • requires 210 units of buffer area
• Better (factored) SAS: (10 (10 A) (10 B) C) D
  • requires 30 units of buffer area, but…
  • requires 21 loop initiations per period (instead of 3)
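A sketch of the code a scheduler might emit for the factored schedule (10 (10 A) (10 B) C) D; the actor firing functions are placeholders. The nesting also shows where the 21 loop initiations per period come from (1 outer loop plus 2 inner loops entered 10 times each):

/* Factored single-appearance schedule (10 (10 A) (10 B) C) D:
   nested fixed-iteration loops keep each buffer at 10 tokens. */
extern void fire_A(void), fire_B(void), fire_C(void), fire_D(void);

void run_period(void) {
    for (int i = 0; i < 10; i++) {              /* outer loop: 10 times   */
        for (int j = 0; j < 10; j++) fire_A();  /* A→C buffer stays at 10 */
        for (int j = 0; j < 10; j++) fire_B();  /* B→C buffer stays at 10 */
        fire_C();
    }
    fire_D();                                   /* C→D buffer holds 10    */
}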
Scheduling more powerful DF • SDF is limited in modeling power • More general DF is too powerful • non-Static DF is Turing-complete (Buck ‘93) • bounded-memory scheduling is not always possible • Boolean Data Flow: Quasi-Static Scheduling of special “patterns” • if-then-else, repeat-until, do-while • Dynamic Data Flow: run-time scheduling • may run out of memory or deadlock at run time • Kahn Process Networks: quasi-static scheduling using Petri nets • conservative: schedulable network may be declared unschedulable
Outline • The problem • Synthesis of concurrent specifications • Compiler optimizations across processes • Previous work: Dataflow networks • Static scheduling of SDF networks • Code and data size optimization • Quasi-Static Scheduling of process networks • Petri net representation of process networks • Scheduling and code generation • Open problems
The problem
• Given: a network of Kahn processes
  • Kahn process: sequential function + ports
  • communication: port-based, point-to-point, uni-directional, multi-rate
• Find: a single sequential task
  • functionally equivalent to the original network (modulo concurrency)
  • threads driven by input stimuli (no OS intervention)
(running example: the environmental controller)
Event-driven threads
Init()   (triggered by Reset)
  last = 0;

Tsensor()
  sample = READ(TSENSOR);
  if (|sample - last| > DIF) {
    last = sample;
    if (sample > TFIRE) WRITE(ALARM-on, 10);
    else if (sample > TMAX) WRITE(AC-on, sample-TMAX);
  }

Hsensor()
  h = READ(HSENSOR);
  if (h > MAX) WRITE(DRYER-on, 5);
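A hypothetical single-task dispatcher for these threads in C; the event encoding, the threshold values and the output primitives are all assumptions made for the sketch:

#include <math.h>

#define DIF   0.5f
#define TFIRE 60.0f
#define TMAX  25.0f
#define MAX   80.0f    /* humidity threshold, named MAX as in the slides */

extern void write_ALARM_on(int v);
extern void write_AC_on(float v);
extern void write_DRYER_on(int v);

enum event { EV_RESET, EV_TSENSOR, EV_HSENSOR };

static float last;

void dispatch(enum event ev, float value) {
    switch (ev) {
    case EV_RESET:                       /* Init() */
        last = 0.0f;
        break;
    case EV_TSENSOR:                     /* Tsensor() */
        if (fabsf(value - last) > DIF) {
            last = value;
            if (value > TFIRE)     write_ALARM_on(10);
            else if (value > TMAX) write_AC_on(value - TMAX);
        }
        break;
    case EV_HSENSOR:                     /* Hsensor() */
        if (value > MAX) write_DRYER_on(5);
        break;
    }
}

Each input stimulus runs the corresponding thread to completion, with no operating system intervention between events.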