
Instruction Cache Memory Issues in Real-Time Systems

Licentiate dissertation by Filip Sebek, October 11th, 2002. Opponent: Axel Jantsch (KTH). Examiner: Lars Wanhammar (LiTH).


Presentation Transcript


  1. Instruction Cache Memory Issues in Real-Time Systems • Licentiate dissertation • Filip Sebek • October 11th, 2002 • Opponent: Axel Jantsch (KTH) • Examiner: Lars Wanhammar (LiTH)

  2. Outline of this dissertation • Seminar • About this thesis (Lennart Lindh) • Thesis presentation (Filip Sebek) • Comments and questions (Axel Jantsch and Filip Sebek) • Questions from the audience • Consideration (Lars Wanhammar, Axel Jantsch, and Lennart Lindh) • Festivity (?) at the department

  3. Organisation RT Systems Design Lab Comp. Architecture Lab Computer Science Lab Graduate Education Lic school Int’l MSc school Undergraduate Education

  4. • Stefan Sjöberg: Design ASIC/FPGA with Top Down Design Flow and VHDL (RealFast ABB) • Leif Enblom (ABB APR): Multiprocessor system for (ABB KK) • Joakim Persson: Redundant System (ProTang, KK) • Mohammed El Shobaki: System Monitoring/Debugging of S/Multiprocessor Systems • Stefan Stjernen: IP Design (RealFast, Industrial Research School: Electronic Design) • Johan Stärner: Multiprocessor Architecture (KK) • Tommy Klevin: Bus analyzer (RealFast) • Filip Sebek: Instruction Cache Memory Issues in RTS • Raimo Haukilahti KTH/MDH: Low-Power Techniques for HW-RTOS (KTH)

  5. The title and the questions • Title: Instruction Cache Memory Issues in Real-Time Systems • Initial questions • How do I measure the cache-related preemption delay in a real-time system? • Is a cache memory really a problem in real-time systems?

  6. Automatic control – Real-time system • A real-time system must produce correct results in time • Examples • Air bag in action • An armored tank in movement shoots • Supertanker turns • Toaster • Get input – sample… • Compute – execute instructions • Actuate – control the process… • = Action!

  7. Real-time system implementation • Often as many ”small” cyclic programs – tasks or processes – that communicate with each other Alarm task Computation Sample task Actuate

  8. What Real-Time research is about: • Predicting execution time (of a task) • Difficult – many parameters • Input data sensitive • Program design • Hardware dependent • Compiler dependent • Several methods • Scheduling tasks • static or dynamic • may allow pre-emption

  9. The title and the questions • Title: Instruction Cache Memory Issues in Real-Time Systems • Initial questions • How do I measure the cache-related preemption delay in a real-time system? • Is a cache memory really a problem in real-time systems?

  10. What is a cache memory? • Cache memories are faster than primary memory and keep pace with CPU speed • They reduce congesting bus traffic • They save energy • Instruction fetch time becomes variable with caches: hit-time and miss-penalty [Figure: I/O, CPU, CACHE and MEM blocks; the CPU-to-cache path is labelled "Fast (~95%)" and the path to memory "Slow (~5%)"]
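The variable fetch time mentioned on this slide can be sketched with the usual linear model: every fetch pays the hit-time, and a miss additionally pays the miss-penalty. The function name and the nanosecond parameters below are assumptions made for illustration; they are not taken from the presentation.

```c
#include <assert.h>

/* Average instruction fetch time under a simple linear cache model.
 * hit_time_ns and miss_penalty_ns are hypothetical parameters,
 * not measurements from the presentation. */
double avg_fetch_time_ns(double hit_time_ns, double miss_penalty_ns,
                         double miss_ratio)
{
    assert(miss_ratio >= 0.0 && miss_ratio <= 1.0);
    /* Every fetch pays the hit time; a miss also pays the penalty. */
    return hit_time_ns + miss_ratio * miss_penalty_ns;
}
```

With an assumed 10 ns hit-time and 100 ns miss-penalty, a 5% miss-ratio (the "Slow (~5%)" share in the figure) gives roughly 15 ns per fetch, while a 100% miss-ratio gives 110 ns.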

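  11. How does a cache memory work? • Cache hit and cache miss • Locality • Temporal locality: • memory references close in time • loops and functions • Spatial locality: • memory references close in space • cache block and wide data bus
      /* SIZE is a compile-time constant defined elsewhere */
      int funk(int term) {
          int vector[SIZE] = {0};   /* zero-initialised so that += reads defined values */
          int i, sum = 0;
          for (i = 0; i < SIZE; i++) {
              vector[i] += term;    /* consecutive elements: spatial locality */
              sum += vector[i];     /* same loop body reused: temporal locality */
          }
          return sum;
      }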

  12. The title and the questions • Title: Instruction Cache Memory Issues in Real-Time Systems • Initial questions • How do I measure the cache-related preemption delay in a real-time system? • Is a cache memory really a problem in real-time systems?

  13. Cache memories and real-time • Cache memories make execution time variable • Sample, execute, actuate – action! • Sample, execute, actuate – action! • Sample, execute, actuate – action! Missed deadline? • Analysis is non-trivial: • cache contents depend on the execution path • execution time along a path depends on the cache contents

  14. Predicting cache behavior • Avoidance and simplifications • Disable cache! • Specially designed processors and caches • Static analysis • + no probe effects • + safe overestimation • - modern hardware • (Paper C) • Simulation • + simple • - simulator must model correctly • Real measurement • + measure on complex systems • - probe effect • (Papers A, B, D)

  15. The title and the questions • Title: Instruction Cache Memory Issues in Real-Time Systems • Initial questions • How do I measure the cache-related preemption delay in a real-time system? • Is a cache memory really a problem in real-time systems?

  16. Measurement and probe effect • Most measurements affect the measured object when the probe is added to or removed from the measured environment. • Examples: • A warm thermometer measures a glass of cold water • A computer monitoring system measures CPU load • Reduce the intrusion (probe effect) to a minimum!

  17. Facts and Problems → Solutions

  18. The Built-in Performance Monitor • Exploit the performance monitor built into the CPU • 4 registers on MPC750 • Counts events • L1 instruction fetch miss • Branch miss • Processor clocks • Completed instructions • Completed load/stores • … • NON-INTRUSIVE!
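As a rough illustration of how two such event counters could be turned into a miss-ratio, the sketch below takes a snapshot before and after the measured code and divides the deltas. The `pm_snapshot` struct and function name are hypothetical, how the counters are actually read on the MPC750 is not shown, and dividing fetch misses by completed instructions is only an approximation of the miss-ratio used in the presentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of two 32-bit event counters; reading the real
 * performance monitor registers is platform specific and omitted here. */
typedef struct {
    uint32_t fetch_misses;
    uint32_t completed_instructions;
} pm_snapshot;

/* Approximate miss-ratio between two snapshots. */
double miss_ratio_between(pm_snapshot start, pm_snapshot end)
{
    /* Unsigned subtraction is modulo 2^32, so a single counter
     * wrap-around between the snapshots still yields the right delta. */
    uint32_t misses = end.fetch_misses - start.fetch_misses;
    uint32_t instrs = end.completed_instructions - start.completed_instructions;
    assert(instrs > 0);
    return (double)misses / (double)instrs;
}
```

Because only two register reads bracket the measured code, the measurement itself adds very little extra code, which is the point of the "non-intrusive" remark on the slide.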

  19. SARA CPU Card

  20. SARA MP-system and MAMon

  21. My questions revised • Initial questions: • How do I measure the cache-related preemption delay in a real-time system? • Is a cache memory really a problem in real-time systems? • Modified questions: • Is there a simple(r) way to predict or measure cache misses in a real-time system? • Can an instruction cache cause a missed deadline when it is enabled? • How much is the cache-related pre-emption delay in absolute and relative terms?

  22. Outline of this presentation • Introduction • The cache memory and real-time • Measurement and probe effect • CPX2000 – “SARA system” • My own questions • Synthetic code generation • Analysis • Determine worst-case cache miss-ratio of a program • Measure instruction execution time w/wo cache • Measure cache related preemption delay • Conclusion and future work

  23. Current state in presentation: • We have 3 questions! • We have an experimental system! • We can measure on it with a small intrusion! • Q: Measure on what program?

  24. Code generation: size • Workbench • Standard benchmark? (Rhealstone, EEMBC etc.) • Measure worst-case situations • Synthetic code – size specific • One big loop • addis r3,r3,0x0000 = 4 bytes • Not representative code – no problem! • Swap out cache contents – find maximum cost • Code size measured in “cache size”

  25. Code generation: miss-ratio • One (out of several) methods • "Play with spatial locality" • Method: jump instructions break spatial locality • Requirement: code size ≥ 2 × cache size • Result: miss-ratios from 1/block size up to 100%
      Example layouts for a 4-word cache block ((m) = miss, (h) = hit, n.u. = not used):
      25%:   L1: nop(m)  nop(h)   nop(h)   nop(h)    L2: nop(m)  nop(h)   nop(h)   nop(h)
      100%:  L1: J L2(m) n.u.     n.u.     n.u.      L2: J L3(m) n.u.     n.u.     n.u.
      50%:   L1: nop(m)  J L2(h)  n.u.     n.u.      L2: nop(m)  J L3(h)  n.u.     n.u.
      33%:   L1: nop(m)  nop(h)   J L2(h)  n.u.      L2: nop(m)  nop(h)   J L3(h)  n.u.
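The layouts on this slide follow one rule: if each cache block executes k instructions before control leaves it, one fetch misses and k - 1 hit, so the miss-ratio is 1/k. A minimal generator sketch under that rule is shown below. The function names, the GNU-assembler-style `.balign` directive and the PowerPC-style `nop` / `b` mnemonics are assumptions for illustration, not the author's actual tool; 4 bytes per instruction matches the `addis` example on the previous slide.

```c
#include <assert.h>
#include <stdio.h>

/* Miss-ratio when each cache block executes used_words instructions:
 * the first fetch misses, the remaining used_words - 1 hit. */
double expected_miss_ratio(int used_words)
{
    assert(used_words >= 1);
    return 1.0 / used_words;
}

/* Emit one big loop of num_blocks cache-block-aligned code blocks.
 * Each block executes used_words instructions and then leaves the block,
 * either by a jump or by falling through when the block is full. */
void emit_synthetic_code(FILE *out, int num_blocks, int block_words,
                         int used_words)
{
    assert(out != NULL);
    assert(num_blocks >= 1 && used_words >= 1 && used_words <= block_words);
    for (int b = 1; b <= num_blocks; b++) {
        int last = (b == num_blocks);
        /* The last block always jumps back to L1 to close the loop. */
        int needs_jump = last || used_words < block_words;
        int nops = needs_jump ? used_words - 1 : used_words;
        fprintf(out, "    .balign %d\n", block_words * 4); /* 4 bytes/instr */
        fprintf(out, "L%d:\n", b);
        for (int i = 0; i < nops; i++)
            fprintf(out, "    nop\n");
        if (needs_jump)
            fprintf(out, "    b L%d\n", last ? 1 : b + 1);
    }
}
```

Calling `emit_synthetic_code(stdout, 2, 4, 2)` would print two 16-byte-aligned blocks, each holding one `nop` and one branch, which corresponds to the 50% layout above. The misses only recur on every pass if the loop is large enough to evict itself, which is what the code-size requirement on the slide is for.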

  26. Analysis!

  27. 1. Code interpretation: miss-ratio (the reverse of generating code with a fixed miss-ratio)
      Instruction   Block size = 4 words   Block size = 8 words
      i1            miss  1/4              miss  1/6
      i2            hit   1/4              hit   1/6
      i3            hit   1/4              hit   1/6
      i4            hit   1/4              hit   1/6
      i5            miss  1/2              hit   1/6
      beq 10        hit   1/2              hit   1/6
      i7            -     -                -     -
      i8            -     -                -     -
      i9            -     -                -     -
      i10           miss  1/3              miss  1/4
      i11           hit   1/3              hit   1/4
      i12           hit   1/3              hit   1/4
      jmp 18        miss  1/1              hit   1/4
      i14           -     -                -     -
      i15           -     -                -     -
      i16           -     -                -     -
      Miss-ratio    4/10 = 40%             2/10 = 20%
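The totals on this slide can be reproduced by walking the executed instructions (i1 to i5, beq 10, i10 to i12, jmp 18, i.e. positions 1 to 6 and 10 to 13) and counting a miss whenever a fetch falls in a different cache block than the previous fetch. The sketch below does exactly that; the function name is hypothetical, and the model deliberately considers spatial locality only, so a block that is re-entered later counts as a miss again.

```c
#include <assert.h>
#include <stddef.h>

/* Count misses along an execution path using spatial locality only.
 * path holds 1-based instruction positions (in words) in fetch order. */
size_t count_block_misses(const int *path, size_t n, int block_words)
{
    assert(block_words >= 1);
    size_t misses = 0;
    int prev_block = -1;                 /* no block fetched yet */
    for (size_t i = 0; i < n; i++) {
        assert(path[i] >= 1);
        int block = (path[i] - 1) / block_words;
        if (block != prev_block)         /* entering another block: miss */
            misses++;
        prev_block = block;
    }
    return misses;
}
```

For the path above this yields 4 misses out of 10 fetches with 4-word blocks and 2 out of 10 with 8-word blocks, matching the 40% and 20% figures.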

  28. 1. Code interpretation: miss-ratio (the reverse of generating code with a fixed miss-ratio) • Line size = 4 words [Figure: the instruction listing i1 to i16 with beq 10 and jmp 18, each executed instruction annotated with hit or miss and its local miss-ratio]

  29. 1. Code interpretation: miss-ratio • Determine the worst-case cache miss-ratio (WCCMR) • The highest frequency of misses possible for a program! • Depends on execution path (actually input data) [Figure: two execution paths, one with a higher and one with a lower miss-ratio] • The WCCMR path is the most energy consuming! • Optimize for • Speed or size • Energy consumption

  30. 1.Key concepts bounding WCCMR • Spatial locality analysis • Determine instruction’s ”local miss-ratio” • Execution path analysis • Determine the weight of each basic block (loop dependent) • Search • Find the execution path with the highest cache miss-ratio
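The three steps on this slide can be sketched as follows: each basic block carries its instruction count and miss count per pass (from the spatial locality analysis) and a weight (how many times the path passes through it, for example a loop bound); a path's miss-ratio is the weighted misses divided by the weighted instructions; the search picks the path with the highest ratio. The types, names and the exhaustive search over an explicit list of paths are assumptions for illustration, not the dissertation's algorithm.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int instructions;  /* instructions executed per pass through the block */
    int misses;        /* misses per pass, from spatial locality analysis */
    int weight;        /* number of passes, e.g. a loop iteration bound */
} basic_block;

typedef struct {
    const basic_block *blocks;
    size_t count;
} exec_path;

/* Weighted miss-ratio of one execution path. */
double path_miss_ratio(exec_path p)
{
    long misses = 0, instrs = 0;
    for (size_t i = 0; i < p.count; i++) {
        misses += (long)p.blocks[i].weight * p.blocks[i].misses;
        instrs += (long)p.blocks[i].weight * p.blocks[i].instructions;
    }
    assert(instrs > 0);
    return (double)misses / (double)instrs;
}

/* Index of the path with the highest miss-ratio (first one on ties). */
size_t find_wccmr_path(const exec_path *paths, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (path_miss_ratio(paths[i]) > path_miss_ratio(paths[best]))
            best = i;
    return best;
}
```

Enumerating every path explicitly only scales to small programs; it is used here to keep the sketch short. As in the analysis it illustrates, temporal locality is ignored, so misses are charged on every loop iteration.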

  31. 1. Result (finding WCCMR)
      ...
      if (a > b) {
          ...
          ...
          do {
              ...
          } while (c > d);
      } else {
          ...
          ...
          while (e < 3) {
              ...
          }
      }
      ...
      [Figure: code regions labelled (1) to (6); one is marked "max !!"]

  32. Outline of this presentation • Introduction • The cache memory and real-time • Measurement and probe effect • CPX2000 – “SARA system” • My own questions • Synthetic code generation • Analysis • Determine worst-case cache miss-ratio of a program • Measure instruction execution time w/wo cache • Measure cache related preemption delay • Conclusion and future work

  33. 2. When is a cache memory beneficial? • On cache misses, the complete cache block is loaded • If cache block > instruction size → miss-penalty • A cache can reduce system performance! • High miss-ratio AND long miss-penalty • Experiment: • Generate code with a fixed miss-ratio • Measure time • Plot the average execution time

  34. 2. Threshold miss-ratio level (@CPX2000) [Figure: execution time (ns/instruction) versus cache miss-ratio (%) for cache enabled and cache disabled; the threshold level is marked at 84%]
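Under a simple linear model, the threshold is the miss-ratio at which the cached fetch time (hit-time plus miss-ratio times miss-penalty) equals the fetch time with the cache disabled. The sketch below solves for that point; the function name and all timing parameters are hypothetical and are not the CPX2000 measurements behind the 84% figure.

```c
#include <assert.h>

/* Miss-ratio above which an enabled cache is slower than no cache,
 * assuming cached time = hit_time + miss_ratio * miss_penalty.
 * The result is clamped to [0, 1]; 1.0 means the cache never loses. */
double threshold_miss_ratio(double uncached_fetch_ns, double hit_time_ns,
                            double miss_penalty_ns)
{
    assert(miss_penalty_ns > 0.0);
    double m = (uncached_fetch_ns - hit_time_ns) / miss_penalty_ns;
    if (m < 0.0)
        m = 0.0;
    if (m > 1.0)
        m = 1.0;
    return m;
}
```

For example, with an assumed 100 ns uncached fetch, 10 ns hit-time and 180 ns miss-penalty, the threshold is 0.5: code with more than 50% misses would run faster with the cache disabled.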

  35. 2. When is a cache memory beneficial? • Concluding question: • "When is instruction caching beneficial?" • Answer: • "Always" (!!) • "No code is so jumpy" • "No missed deadlines" • "Safe!" • (New Q&As) • "Why 84% miss?" • "Low refill penalty" • "Why?" • "Burst refill!" [Figure: two I/O, CPU, CACHE, MEM diagrams, one labelled "Request" and "HIT", the other "Request", "MISS!" and "Refill block"]

  36. Outline of this presentation • Introduction • The cache memory and real-time • Measurement and probe effect • CPX2000 – “SARA system” • My own questions • Synthetic code generation • Analysis • Determine worst-case cache miss-ratio of a program • Measure instruction execution time w/wo cache • Measure cache related preemption delay • Conclusion and future work

  37. 3. Cache Related Preemption Delay • Extrinsic cache behavior - task interference • Non-preemptive systems • Preemptive systems • Cache Related Preemption Delay - CRPD [Figure: miss-ratio over time for tasks T1 and T2, first without preemption (T1, then T2), then with preemption (T2 preempts T1, T1 resumes)]

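  38. 3. CRPDmax measurement [Figure: miss-ratio over time as T2 preempts T1 and T1 resumes; a non-preempted iteration 1 and a preempted iteration 2 of T1 are shown, with instructions i3, i4 and "i4 (cont.)" marked around the preemption point]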

  39. 3. CRPDmax measurement • OS: 43-87 µs • Timestamps from the figure (non-preempted and preempted iterations): 915399425, 922751625, 921219825, 921592925, 918791225 • CRPD = ((e - d) + (c - b)) - (b - a) = 195 500 ns = 195.5 µs
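The formula on this slide compares a preempted iteration of T1, split into the part before preemption (c - b) and the part after resumption (e - d), with a non-preempted iteration (b - a). The sketch below assumes that a to e are the five timestamps from the slide in ascending order and that they are in nanoseconds; that mapping is an inference, although it reproduces the stated 195 500 ns exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Cache related preemption delay from five timestamps (ns):
 * a..b  non-preempted iteration,
 * b..c  preempted iteration up to the preemption,
 * c..d  preempting task and OS (excluded),
 * d..e  remainder of the preempted iteration after resumption. */
int64_t crpd_ns(int64_t a, int64_t b, int64_t c, int64_t d, int64_t e)
{
    assert(a <= b && b <= c && c <= d && d <= e);
    int64_t non_preempted = b - a;
    int64_t preempted = (c - b) + (e - d);
    return preempted - non_preempted;
}
```

With the slide's values, `crpd_ns(915399425, 918791225, 921219825, 921592925, 922751625)` evaluates to 3 587 300 - 3 391 800 = 195 500 ns.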

  40. 3. CRPD (@CPX2000) [Figure: CRPD (microseconds) versus T1 task size (% of cache size); the value 195.5 µs is marked]

  41. Conclusions and summary of results • The worst-case cache miss-ratio of a program can be identified to quantify the energy usage of the memory system • The CPX2000 system cannot miss any deadline because of an enabled instruction cache. • Synthetic workbenches can force a system into a worst-case state • The cache related preemption delay has been measured as a function of task size.

  42. Future Work • None! • Develop the analysis method of worst-case cache miss-ratio levels • by including temporal locality • Data caches • (Generate synthetic code) • Measure CRPD • Measure threshold miss-ratio level

  43. Acknowledgements • Research was funded by • KK-stiftelsen • Department of Computer Science and Engineering (Mälardalen University) • Thank you… • Supervisor Professor Dr. Ing. Lennart Lindh • All people at the Computer Architecture Lab • My family
