Reliability-Aware OS Support for FPGA-Based Systems
M. Kandemir, G. Chen, and F. Li
Department of Computer Science & Engineering
The Pennsylvania State University, USA
224/MAPLD 2004
Outline
• Introduction
• Background knowledge
• Improving reliability by duplicating tasks
• Experimental results
• Ongoing work and conclusion
Acronyms
• FPGA: Field Programmable Gate Array
• CLB: Configurable Logic Block
• STG: Subtask Graph
Introduction
• FPGAs combine the flexibility of software with the high performance of ASICs
• Prior research mostly addressed architecture design, programming, and compilation issues
• Increasing soft-error rates make reliability an important factor in system design
• Our focus: reliability-aware OS scheduling for FPGA-based systems
The Reconfigurable System
[Figure: a 6×8 array of Configurable Logic Blocks (CLBs) with Process 1, Process 2, and Process 3 mapped onto it; the interconnects and input-output blocks are omitted]
Improving Reliability
• Traditionally, the OS scheduler runs multiple processes in parallel to maximize FPGA space utilization
• Data dependencies between different processes might prevent full utilization of the FPGA space
• Our approach uses the FPGA space that remains available to duplicate processes and improve reliability
Duplicating Processes
[Figure: the CLB array holding Process 1, Process 2, and Process 3 together with a duplicate of Process 1 and a duplicate of Process 3]
Issues in Duplicating Processes
• Tasks (processes) differ in criticality
• Each task may require a different amount of FPGA space
• Duplication can cause performance degradation
• We use a QoS parameter to indicate the maximum tolerable performance degradation
• A checker task is scheduled for each duplicated task to compare the outputs of the primary task and its duplicate
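The checker idea in the last bullet can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the names `checker` and `run_with_duplicate`, and the choice to model a task as a plain callable, are assumptions made for the example. The sketch only reports a mismatch and does not attempt recovery.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def checker(primary_out: T, duplicate_out: T) -> bool:
    """Return True when both copies agree, False when an error is detected."""
    return primary_out == duplicate_out


def run_with_duplicate(primary: Callable[[], T],
                       duplicate: Callable[[], T]) -> Tuple[T, bool]:
    """Run the primary task and its duplicate, then compare their outputs."""
    p = primary()
    d = duplicate()
    return p, checker(p, d)


# Hypothetical task: both copies compute the same sum over some data.
data = [3, 1, 4, 1, 5]


def good() -> int:
    return sum(data)


def faulty() -> int:
    # Simulates a single flipped bit in the result of one copy.
    return sum(data) ^ 1


result_ok = run_with_duplicate(good, good)      # outputs agree
result_bad = run_with_duplicate(good, faulty)   # mismatch is flagged
```

On the FPGA the two copies would occupy separate CLB regions and run in parallel; here they simply run one after the other.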
Subtask Graph (STG)
• Each process to be scheduled is represented by a subtask graph
• Each node represents a portion of the process code (a subtask) that executes in a single time quantum once it is scheduled
• The jth node of process i is denoted as STGij
• An edge from vi to vj indicates a data or control dependence from vi to vj
Subtask Graph
• Since our processes are extracted from the same application, there might be data dependences between different processes
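One way to hold subtask graphs in memory, including dependences that cross process boundaries, is sketched below. The class name `SubtaskGraphs`, the `(process, subtask)` tuple used as a node key, and the `ready` helper are all assumptions made for illustration; the slides do not describe a concrete data structure.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]  # (process i, subtask j), standing in for STGij


class SubtaskGraphs:
    """Subtasks of all processes plus the dependences between them."""

    def __init__(self) -> None:
        self.nodes: Set[Node] = set()
        self.preds: Dict[Node, Set[Node]] = {}

    def add_subtask(self, process: int, index: int) -> Node:
        node = (process, index)
        self.nodes.add(node)
        self.preds.setdefault(node, set())
        return node

    def add_dependence(self, src: Node, dst: Node) -> None:
        """Record a data or control dependence from src to dst."""
        for node in (src, dst):
            self.add_subtask(*node)
        self.preds[dst].add(src)

    def ready(self, finished: Set[Node]) -> List[Node]:
        """Unfinished subtasks whose predecessors have all finished."""
        return sorted(n for n in self.nodes
                      if n not in finished and self.preds[n] <= finished)


g = SubtaskGraphs()
a = g.add_subtask(1, 1)
b = g.add_subtask(1, 2)
c = g.add_subtask(2, 1)
g.add_dependence(a, b)  # dependence inside process 1
g.add_dependence(a, c)  # dependence from process 1 to process 2

first = g.ready(set())
second = g.ready({a})
```

A scheduler would call `ready` at each time quantum to find the subtasks it is allowed to place on the FPGA.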
Our Approach
• Task duplication under QoS guarantees
• Current implementation focuses only on error detection
Steps: annotation step, QoS specification step, task identification step, task ranking step, scheduling step
Annotation Step
• The application programmer uses annotations to indicate which data structures are critical from a reliability viewpoint
QoS Specification Step
• The application programmer also specifies how much additional latency can be tolerated during application execution in exchange for the reliability provided
Task Identification Step
• An automatic code analyzer examines the application source code and identifies tasks
Task Ranking Step
• Tasks are ranked according to how they operate on critical data
• They are ordered from the most important task to the least important one
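A ranking of this kind can be sketched in a few lines of Python. The experimental setup later in the slides ranks tasks by how frequently they access critical data, so the sketch uses an access count as the score; the function name `rank_tasks`, the task names, the counts, and the alphabetical tie-break are hypothetical.

```python
from typing import Dict, List


def rank_tasks(critical_accesses: Dict[str, int]) -> List[str]:
    """Order tasks from most to least important.

    Importance is the number of accesses to critical data; ties are
    broken by task name so the order is deterministic (an assumption).
    """
    return sorted(critical_accesses,
                  key=lambda task: (-critical_accesses[task], task))


# Hypothetical access counts produced by a code analyzer.
counts = {"task_a": 120, "task_b": 450, "task_c": 0, "task_d": 120}
ranking = rank_tasks(counts)
```

Here `ranking` lists `task_b` first, and `task_c`, which never touches critical data, last.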
Scheduling Step
• The OS scheduler is modified so that, whenever there is an opportunity, it duplicates tasks that run on the FPGA device
• Whenever the scheduler predicts that the QoS limit is about to be reached, it stops duplicating tasks
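The duplication decision can be sketched as a greedy pass over the ranked tasks. This is an illustration under stated assumptions rather than the scheduler from the talk: the `Task` fields, the per-task slowdown estimate `extra_delay`, and the function `choose_duplicates` are hypothetical, and the FPGA area needed by checker tasks is ignored for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    clbs: int           # CLBs the task (and therefore its duplicate) needs
    extra_delay: float  # predicted slowdown, as a fraction, if duplicated


def choose_duplicates(ranked: List[Task], free_clbs: int,
                      qos_limit: float) -> List[str]:
    """Pick tasks to duplicate, most important first.

    A task is duplicated only if its duplicate fits in the free CLBs.
    Duplication stops once the next duplicate would push the predicted
    slowdown past the QoS limit.
    """
    chosen: List[str] = []
    used_delay = 0.0
    for task in ranked:
        if used_delay + task.extra_delay > qos_limit:
            break  # QoS limit about to be reached: stop duplicating
        if task.clbs <= free_clbs:
            free_clbs -= task.clbs
            used_delay += task.extra_delay
            chosen.append(task.name)
    return chosen


# Hypothetical ranked task list, 16 free CLBs, 5% tolerable degradation.
ranked = [
    Task("t1", clbs=8, extra_delay=0.02),
    Task("t2", clbs=20, extra_delay=0.01),  # too large for the free space
    Task("t3", clbs=6, extra_delay=0.02),
    Task("t4", clbs=4, extra_delay=0.02),   # would exceed the QoS limit
]
duplicated = choose_duplicates(ranked, free_clbs=16, qos_limit=0.05)
```

With these numbers `t1` and `t3` are duplicated, `t2` is skipped because it does not fit, and the loop stops at `t4` because duplicating it would exceed the 5% limit.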
Experimental Setup
• An error injection module injects errors with a specified probability
• Two real-life embedded applications: encr and usonic
• The performance of our reliability-aware scheduler is compared with that of a normal Shortest-Job-First scheduler
• QoS limit: at most 5% performance degradation is tolerated
• Tasks are ranked according to how frequently they access critical data
• Fatal errors: errors that would cause the application to crash
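An error injection module of the kind mentioned in the first bullet could look like the sketch below. The slides do not say how errors are modelled, so the single-bit-flip model, the 32-bit word width, and the function name `inject_error` are assumptions made for illustration.

```python
import random


def inject_error(value: int, probability: float,
                 rng: random.Random, width: int = 32) -> int:
    """With the given probability, flip one randomly chosen bit of value.

    Models a soft error as a single bit flip in a `width`-bit word
    (an assumed error model).
    """
    if rng.random() < probability:
        return value ^ (1 << rng.randrange(width))
    return value


rng = random.Random(2004)  # fixed seed so runs are repeatable
original = 0x1234

never = inject_error(original, 0.0, rng)   # probability 0: value unchanged
always = inject_error(original, 1.0, rng)  # probability 1: exactly one bit flips
flipped_bits = bin(original ^ always).count("1")
```

Passing the corrupted value to one copy of a duplicated task lets the checker's detection rate be measured for a chosen error probability.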
Experimental Data
Ongoing Work
• Experimenting with a diverse set of benchmarks
• Implementing task duplication within other types of OS schedulers, such as First-Come-First-Served
Conclusion
• The OS scheduler tries to provide reliability through task duplication under QoS guarantees
• Improving FPGA space utilization by duplicating for reliability
• Providing reliability for critical tasks first
• Catching most fatal errors