Reliability-Aware OS Support for FPGA-Based Systems

This presentation describes a reliability-aware operating system scheduling approach for FPGA-based systems. The scheduler uses otherwise unused FPGA space to duplicate critical tasks, which improves reliability while keeping performance degradation within a programmer-specified Quality of Service (QoS) limit. The current implementation targets error detection only. In simulations that use an error injection module, two embedded applications, and a tolerated degradation of at most 5%, the approach is reported to catch most fatal errors. Ongoing work includes experiments with a more diverse set of benchmarks and task duplication within other types of OS schedulers, such as First-Come-First-Served.


Presentation Transcript


  1. Reliability-Aware OS Support for FPGA-Based Systems M. Kandemir, G. Chen, and F. Li Department of Computer Science & Engineering The Pennsylvania State University, USA 224/MAPLD 2004

  2. Outline • Introduction • Background knowledge • Improving reliability by duplicating tasks • Experimental results • Ongoing work and conclusion

  3. Acronyms • FPGA: Field Programmable Gate Array • CLB: Configurable Logic Block • STG: Subtask Graph

  4. Introduction • FPGAs combine the flexibility of software with the high performance of ASICs • Prior research mostly addressed architecture design and programming and compilation issues • Increasing soft-error rates make reliability an important factor in system design • Our focus: Reliability-aware OS scheduling for FPGA-based systems

  5. The Reconfigurable System [Figure: a 6×8 array of Configurable Logic Blocks (CLBs) with regions assigned to Process 1, Process 2, and Process 3; the interconnects and input-output blocks are omitted]

  6. Improving Reliability • Traditionally, the OS scheduler schedules parallel execution of multiple processes to maximize FPGA space utilization • Data dependences between different processes might prevent full utilization of the FPGA space • Our approach uses the available FPGA space to duplicate processes and improve reliability

  7. Duplicating Processes [Figure: a CLB array holding Process 1, Process 2, and Process 3 together with a duplicate of Process 1 and a duplicate of Process 3]

  8. Issues in Duplicating Processes • Tasks (processes) have different criticality • Each task may require a different amount of FPGA space • Duplications can cause performance degradation • We use a QoS parameter to indicate the maximum tolerable performance degradation • A checker task is scheduled for each duplicated task to check the outputs of the primary task and the duplicate
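
The checker task mentioned on this slide can be illustrated with a minimal sketch. The function name `run_with_checker` and the idea of modeling tasks as Python callables are assumptions made for illustration; the slides do not describe how the checker is implemented. The sketch only detects a mismatch, in line with the deck's statement that the current implementation focuses on error detection.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def run_with_checker(task: Callable[[], T],
                     duplicate: Callable[[], T]) -> Tuple[T, bool]:
    """Run a primary task and its duplicate, then compare their outputs.

    Returns (primary_output, mismatch_detected). This is detection only:
    with two copies the checker cannot tell which copy is the faulty one.
    """
    primary_out = task()
    duplicate_out = duplicate()
    return primary_out, primary_out != duplicate_out


# Hypothetical usage: the duplicate disagrees, so an error is flagged.
output, error_detected = run_with_checker(lambda: 4, lambda: 5)
```

A mismatch tells the system that one of the two copies was corrupted, which is enough to flag the result as unreliable.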

  9. Subtask Graph (STG) • Each process to be scheduled is represented by a subtask graph • Each node represents a process code portion (subtask) that will be executed in a single quantum of time once it gets scheduled • The jth node of process i is denoted as STG_ij • An edge from Vi to Vj indicates a data or control dependence from Vi to Vj
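
One possible in-memory form of a subtask graph is sketched below. The class name `SubtaskGraph`, its fields, and the `ready` helper are hypothetical and are not taken from the slides; the sketch only assumes that subtasks are numbered from 0 and that an edge means the destination must wait for the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SubtaskGraph:
    """Subtask graph of one process; nodes are subtask indices 0..n-1."""
    process_id: int
    num_subtasks: int
    # successors[src] holds the subtasks that depend on subtask src
    successors: Dict[int, Set[int]] = field(default_factory=dict)

    def add_dependence(self, src: int, dst: int) -> None:
        """Record a data or control dependence from subtask src to dst."""
        self.successors.setdefault(src, set()).add(dst)

    def ready(self, done: Set[int]) -> List[int]:
        """Unfinished subtasks whose predecessors have all finished."""
        blocked = {dst
                   for src, dsts in self.successors.items()
                   if src not in done
                   for dst in dsts}
        return [j for j in range(self.num_subtasks)
                if j not in done and j not in blocked]


# Hypothetical process 1 with a chain of three subtasks: 0 -> 1 -> 2.
stg = SubtaskGraph(process_id=1, num_subtasks=3)
stg.add_dependence(0, 1)
stg.add_dependence(1, 2)
```

A scheduler could call `ready` once per time quantum to find the subtasks that are eligible to run.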

  10. Subtask Graph • Since our processes are extracted from the same application, there might be data dependences between different processes

  11. Our Approach • Task duplication under QoS guarantees • Current implementation focuses only on error detection • Steps, in order: annotation step, QoS specification step, task identification step, task ranking step, scheduling step

  12. Annotation Step • The application programmer uses annotations to indicate which data structures are critical from the reliability viewpoint

  13. QoS Specification Step • The application programmer also indicates how much additional latency can be tolerated during application execution in exchange for the reliability provided

  14. Task Identification Step • An automatic code analyzer analyzes the application source code and identifies tasks

  15. Task Ranking Step • Tasks are ranked based on how they operate on critical data • They are ordered from the most important task to the least important one
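
A ranking of this kind might be sketched as follows, using the criterion named in the experimental setup (frequency of accesses to critical data). The function name `rank_tasks`, the access counts, and the alphabetical tie-break are assumptions added for illustration.

```python
from typing import Dict, List


def rank_tasks(critical_accesses: Dict[str, int]) -> List[str]:
    """Order task names from most to least important.

    Importance is taken to be the number of accesses a task makes to
    annotated critical data. Ties are broken by task name (an assumption)
    so that the ordering is deterministic.
    """
    return sorted(critical_accesses,
                  key=lambda name: (-critical_accesses[name], name))


# Hypothetical access counts per task.
ranking = rank_tasks({"filter": 3, "encrypt": 10, "pack": 3})
```

Here `ranking` places `encrypt` first because it touches critical data most often.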

  16. Scheduling Step • The OS scheduler is modified so that, whenever there is an opportunity, it duplicates tasks that run on the FPGA device • Whenever the scheduler predicts that the QoS limit is about to be reached, it stops duplicating tasks
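
The duplication policy on this slide can be approximated by a simple greedy pass over the ranked tasks. This is a simplified, one-shot sketch and not the authors' scheduler, which makes its decisions while the application runs. The `Task` fields, the per-task `extra_time` estimate, and the function `choose_duplicates` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    clbs: int          # FPGA area needed by one copy, in CLBs (assumed unit)
    extra_time: float  # assumed latency added by duplicating and checking


def choose_duplicates(ranked: List[Task], free_clbs: int,
                      baseline_time: float, qos_limit: float) -> List[str]:
    """Pick tasks to duplicate, most important first.

    Duplication stops as soon as the next duplicate would push the predicted
    slowdown past qos_limit (a fraction of baseline_time). A task that does
    not fit in the remaining free area is skipped, not treated as a stop.
    """
    budget = baseline_time * qos_limit
    spent = 0.0
    chosen = []
    for task in ranked:
        if spent + task.extra_time > budget:
            break                      # QoS limit would be exceeded
        if task.clbs > free_clbs:
            continue                   # no room for this duplicate
        chosen.append(task.name)
        free_clbs -= task.clbs
        spent += task.extra_time
    return chosen


# Hypothetical ranked tasks, 16 free CLBs, and a 5% degradation limit.
tasks = [Task("t1", 8, 2.0), Task("t2", 20, 1.0),
         Task("t3", 4, 2.0), Task("t4", 4, 3.0)]
duplicated = choose_duplicates(tasks, free_clbs=16,
                               baseline_time=100.0, qos_limit=0.05)
```

In this example `t2` is skipped for lack of space and `t4` is rejected because it would exceed the 5% budget, so only `t1` and `t3` are duplicated.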

  17. Experimental Setup • An error injection module injects errors with a specified probability • Two real-life embedded applications: encr and usonic • The performance of our reliability-aware scheduler is compared with that of a normal Shortest-Job-First scheduler • At most 5% performance degradation is tolerated • Tasks are ranked according to the frequency of accesses to critical data • Fatal errors: errors that would lead to a crash of the application
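
An error injection module of the kind listed above could look like the sketch below. The slides state only that errors are injected with a specified probability; the single-bit-flip fault model, the 32-bit width, and the name `inject_error` are assumptions chosen to mimic a soft error.

```python
import random


def inject_error(value: int, probability: float,
                 rng: random.Random, width: int = 32) -> int:
    """With the given probability, flip one randomly chosen bit of value.

    The single-bit flip is an assumed fault model, not one taken from the
    slides. Passing in the generator keeps experiments reproducible.
    """
    if rng.random() < probability:
        return value ^ (1 << rng.randrange(width))
    return value


# Hypothetical usage with a seeded generator.
rng = random.Random(0)
clean = inject_error(5, 0.0, rng)      # probability 0: value is unchanged
corrupted = inject_error(5, 1.0, rng)  # probability 1: exactly one bit flips
```

Feeding `corrupted` to one copy of a duplicated task and `clean` to the other would make their outputs diverge, which is the situation a checker task is meant to detect.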

  18. Experimental Data

  19. Ongoing Work • Experimenting with a diverse set of benchmarks • Implementing task duplication within other types of OS schedulers such as First-Come-First-Served

  20. Conclusion • The OS scheduler tries to provide reliability through task duplication under QoS guarantees • Improving FPGA space utilization by duplicating for reliability • Providing reliability for critical tasks first • Catching most fatal errors
