Precision Timed Embedded Systems Using TickPAD Memory Matthew M Y Kuo* Partha S Roop* Sidharta Andalam† Nitish Patel* *University of Auckland, New Zealand †TUM CREATE, Singapore
Introduction • Hard real-time systems • Need to meet real-time deadlines • Catastrophic events may occur when deadlines are missed • Synchronous execution approach • Good for hard real-time systems • Deterministic • Reactive • Aids static timing analysis • Well-bounded programs • No unbounded loops or recursion
Synchronous Languages • Execute in logical time • Ticks • Sample inputs → computation → emit outputs (sketched in C below) • Synchronous hypothesis • Ticks are instantaneous • Assumes the system executes infinitely fast • The system is faster than the environment's response • Worst-case reaction time • Time between two logical ticks • Languages • Esterel • Scade • PRET-C • An extension to C
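The tick-based execution model above can be pictured as a reactive loop. Below is a minimal sketch in plain C, assuming hypothetical sample_inputs, compute_reaction, and emit_outputs functions (none of these names come from the languages listed; real synchronous compilers generate or structure this loop for you).

```c
#include <stdbool.h>

/* Illustrative stubs - a real system would read sensors, run the
   synchronous program, and drive actuators here. */
static void sample_inputs(void)    { /* read environment inputs     */ }
static void compute_reaction(void) { /* run this tick's computation */ }
static void emit_outputs(void)     { /* write environment outputs   */ }

int main(void) {
    /* One iteration of this loop is one logical tick. The synchronous
       hypothesis assumes the body finishes before the environment
       changes again; the worst-case reaction time (WCRT) bounds how
       long one iteration can take. */
    while (true) {
        sample_inputs();
        compute_reaction();
        emit_outputs();
    }
}
```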
PRET-C • Light-weight multithreading in C • Provides thread-safe memory access • A C extension implemented as C macros
Introduction • Practical systems require larger memories • Not all applications fit in on-chip memory • Require a memory hierarchy • Processor-memory gap [1] Hennessy, John L., and David A. Patterson. Computer Architecture: A Quantitative Approach. San Francisco, CA: Morgan Kaufmann, 2011.
Introduction • Traditional approaches • Caches • Scratchpads • However, there is scant research on memory architectures tailored to synchronous execution and concurrency.
Caches (diagram: CPU and main memory, before a cache is added)
Caches (diagram: a cache between the CPU and main memory) • Traditional caches • A small, fast piece of memory • Temporal locality • Spatial locality • Hardware controlled • Replacement policy
Caches (diagram: CPU, cache, main memory) • Hard real-time systems • Need to model the architecture to compute the WCRT • Cache models • Trade-off between analysis computation time and tightness • A very tight worst-case estimate is not scalable
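To make the analysis burden concrete, here is a minimal sketch of a direct-mapped cache lookup in C; the line size, cache size, and indexing are invented for illustration. The hit/miss outcome depends on the tag and valid state left behind by earlier accesses, which is exactly the history a WCRT cache model has to track.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative parameters - not the actual cache studied here. */
#define LINE_BYTES  16u   /* 4 instructions x 4 bytes              */
#define NUM_LINES   64u   /* direct-mapped: one tag per cache line */

static uint32_t tag_store[NUM_LINES];
static bool     valid[NUM_LINES];

/* Returns true on a hit, false on a miss (after installing the line).
   The hardware decides this at run time, so a WCRT analysis must model
   the tag/valid state at every program point. */
static bool cache_access(uint32_t addr) {
    uint32_t line  = addr / LINE_BYTES;
    uint32_t index = line % NUM_LINES;
    uint32_t tag   = line / NUM_LINES;

    if (valid[index] && tag_store[index] == tag)
        return true;               /* hit: locality pays off                 */

    valid[index]     = true;       /* miss: fetch line, evict previous owner */
    tag_store[index] = tag;
    return false;
}
```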
Scratchpad (diagram: an SPM between the CPU and main memory) • Scratchpad memory (SPM) • Software controlled • Statically allocated • Statically or dynamically loaded • Requires an allocation algorithm • e.g. ILP or greedy (sketched after the next slide)
Scratchpad (diagram: CPU, SPM, main memory) • Hard real-time systems • Easy to compute a tight WCRT • May reduce the worst-case performance • Balance between the number of reload points and their overheads • May perform worse than a cache in the worst case
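As a concrete illustration of the greedy allocation mentioned above, here is a minimal sketch in C. The Block descriptor, its fields, and the executions-per-byte heuristic are assumptions for this example; actual SPM allocators (ILP-based or greedy) are driven by their own cost models.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical code-block descriptor - fields are illustrative only. */
typedef struct {
    const char *name;
    size_t      size;        /* bytes the block occupies            */
    unsigned    exec_count;  /* how often it runs (profile/static)  */
    bool        in_spm;      /* allocation decision                 */
} Block;

/* Greedy static allocation: repeatedly place the unallocated block with
   the highest benefit density (executions per byte) until the SPM is full. */
static void greedy_spm_alloc(Block *blocks, size_t n, size_t spm_capacity) {
    size_t used = 0;
    for (;;) {
        Block *best = NULL;
        for (size_t i = 0; i < n; i++) {
            Block *b = &blocks[i];
            if (b->in_spm || b->size == 0 || used + b->size > spm_capacity)
                continue;
            if (!best ||
                (double)b->exec_count / b->size >
                (double)best->exec_count / best->size)
                best = b;
        }
        if (!best)
            break;               /* nothing else fits */
        best->in_spm = true;
        used += best->size;
    }
}
```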
TickPAD (diagram: CPU with both a cache and an SPM in front of main memory) • Cache – good overall performance, hardware controlled • SPM – good worst-case performance, easy for fast and tight static analysis
TickPAD (diagram: the TickPAD memory, TPM, placed between the CPU and main memory) • Aims to combine the strengths of both: good overall performance, hardware controlled, good worst-case performance, easy for fast and tight static analysis
TickPAD (diagram: CPU, TPM, main memory) • TickPAD Memory • TickPAD – Tick Precise Allocation Device • A memory controller • Hybrid between caches and scratchpads • Hardware-controlled features • Static software allocation • Tailored for synchronous languages • For instruction memory
PRET-C • Example program (thread tree: main spawns t1, t2, t3; the fragment is laid out and commented below): int main() { init(); PAR(t1,t2,t3); ... } void thread t1() { compute; EOT; compute; EOT; }
PRET-C • compute – plain C computation
PRET-C • PAR(t1, t2, t3) – spawns the child threads
PRET-C • EOT – end of tick, a synchronization boundary
PRET-C • Child threads terminate after their last EOT
PRET-C • The main thread then resumes
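For readability, the fragment from the walkthrough above is laid out and commented below. It is a sketch of the slides' example, not a complete compilable program: init and compute stand for arbitrary C code, and PAR and EOT are the PRET-C macros described above.

```c
int main() {
    init();             /* plain C computation                      */
    PAR(t1, t2, t3);    /* spawn child threads t1, t2, t3; main     *
                         * resumes once all of them have terminated */
    ...
}

void thread t1() {
    compute;            /* local computation for this tick          */
    EOT;                /* end of tick - synchronization boundary   */
    compute;
    EOT;                /* the thread terminates after its last EOT */
}
```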
PRET-C Execution (timeline diagram: threads scheduled sequentially within a tick) • Sample inputs
PRET-C Execution • The threads run one after another within the tick: main, then t1, then t2, …
PRET-C Execution • Emit outputs
PRET-C Execution • 1 tick (reaction time) – from sampling inputs to emitting outputs
PRET-C Execution • Local tick – the segment of the tick executed by a single thread
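A rough formalization of the timing annotations above: if tick i starts (samples its inputs) at time t_i, its reaction time is t_{i+1} - t_i, and the worst-case reaction time is WCRT = max_i (t_{i+1} - t_i). The memory architecture that follows aims to keep this maximum both small and easy to bound statically.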
Assumptions • 1 cache line holds 4 instructions and takes 1 burst transfer from main memory • A cache miss takes 38 clock cycles [2] • Buffers are 1 cache line in size • Each instruction takes 2 cycles to execute [2] J. Whitham and N. Audsley. The Scratchpad Memory Management Unit for Microblaze: Implementation, Testing, and Case Study. Technical Report YCS-2009-439, University of York, 2009.
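A back-of-the-envelope consequence of these assumptions (assuming the burst transfer completes before execution starts): executing the 4 instructions of a line that is already buffered costs 4 × 2 = 8 cycles, while a line that must first be fetched from main memory costs about 38 + 4 × 2 = 46 cycles, so every avoided fetch saves roughly 38 cycles.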
TickPAD - Overview • Spatial memory pipeline • To accelerate linear code
TickPAD - Overview • Associative loop memory • For predictable temporal locality • Statically allocated and dynamically loaded
TickPAD - Overview • Tick address queue • Stores the resumption addresses of the active threads
TickPAD - Overview • Tick instruction buffer • Stores the instructions at the resumption of the next active thread • To reduce context switching overhead at state/tick boundaries
TickPAD - Overview • Command table • Stores a set of commands to be executed by the TickPAD controller.
TickPAD - Overview • Command buffer • A buffer to store operands fetched from main memory • For commands requiring 2+ operands
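Putting the components above together, the following C struct is one conceptual sketch of the state a TickPAD controller maintains. All sizes and field names here are invented for illustration; the real hardware's dimensions are design parameters, not given on these slides.

```c
#include <stdint.h>

/* Illustrative sizes only. */
#define LINE_WORDS       4   /* instructions per cache line */
#define LOOP_MEM_LINES  32
#define MAX_THREADS      8
#define CMD_TABLE_SIZE  16

typedef struct {
    /* Spatial memory pipeline: line buffers that accelerate linear code. */
    uint32_t line_buffer[2][LINE_WORDS];

    /* Associative loop memory: statically allocated, dynamically loaded
       lines for predictable temporal locality. */
    uint32_t loop_tags[LOOP_MEM_LINES];
    uint32_t loop_mem[LOOP_MEM_LINES][LINE_WORDS];

    /* Tick address queue: resumption addresses of the active threads. */
    uint32_t tick_addr_queue[MAX_THREADS];

    /* Tick instruction buffer: instructions at the resumption point of the
       next active thread, cutting context-switch cost at tick boundaries. */
    uint32_t tick_instr_buffer[LINE_WORDS];

    /* Command table and command buffer: controller commands, plus operands
       fetched from main memory for commands needing 2+ operands. */
    uint64_t command_table[CMD_TABLE_SIZE];
    uint32_t command_buffer[LINE_WORDS];
} TickPADState;
```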
Spatial Memory Pipeline • Cache – on a miss • Fetches from main memory into the cache • The first instruction misses, subsequent instructions on that line hit • Requires the cache history for timing analysis • Scratchpad – unallocated code • Executes from main memory • Miss cost for all instructions • Simple timing analysis
Spatial Memory Pipeline (diagram: CPU, line buffer, main memory) • Memory controller • Single-line buffer • Simple analysis • Only the previous instruction needs to be analysed • The first instruction misses, subsequent instructions on that line hit
Spatial Memory Pipeline • Computation requires many lines of instructions • Exploit spatial locality • Predictably prefetch the next line of instructions • Add another buffer
Spatial Memory Pipeline • To preserve determinism • Prefetching is only active when the current line contains no branch (sketched below)
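A minimal sketch of this prefetch rule in C, assuming a hypothetical is_branch decoder (the real controller would inspect the opcodes of its own ISA). The point is that the decision depends only on the contents of the current line, so the timing analysis never needs a branch-speculation model.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_WORDS 4   /* instructions per line, per the assumptions above */

/* Hypothetical decode helper: does this instruction word encode a branch?
   The opcode check is illustrative only. */
static bool is_branch(uint32_t instr) {
    return (instr >> 26) == 0x12;
}

/* Prefetch rule: fetch the next sequential line into the second buffer
   only when the current line contains no branch, so the prefetched line
   is guaranteed to be the one that executes next - determinism is kept
   and the analysis only ever inspects the current line. */
static bool should_prefetch_next(const uint32_t line[LINE_WORDS]) {
    for (int i = 0; i < LINE_WORDS; i++) {
        if (is_branch(line[i]))
            return false;
    }
    return true;
}
```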