Custom designed CPU architecture based on a hardware scheduler and independent pipeline registers - concept and theory of operation 黃翔 Huang, Xiang 電機系, Department of Electrical Engineering 國立成功大學, National Cheng Kung University Tainan, Taiwan, R.O.C (06)2757575 轉62400 轉2825, Office: 奇美樓, 6F,95602 Email: hxhxxhxxx@gmail.com Website address: http://j92a21b.ee.ncku.edu.tw/broad/index.html
Abstract • In classical RTOSes based on software schedulers, overhead and jitter become a major problem when the number of tasks and the rate of context switches are high. • When these parameters exceed admissible values, the result can be performance degradation, increased power consumption, or even missed deadlines. • If part of the scheduling components, or the entire functionality, is moved from software to hardware, a significant improvement in task switching times can be achieved. • This paper presents a custom-designed multi pipeline register architecture (MPRA) that has a dedicated hardware scheduler unit integrated into the CPU.
1. Introduction (1/3) • Depending on the target application of an embedded system, we can distinguish two types: • Hard real-time: all tasks must deliver their results before the deadline, or the system may enter an unrecoverable state and endanger the entire functionality of the controlled system. • Soft real-time: if a result from a task arrives too late, it can be ignored by the consumer tasks. • In real-time systems, any additional overhead due to task switching may lead to increased jitter. • To provide high-resolution timing for such applications, software schedulers usually increase the tick rate of the system timer. • This entails an increased overhead that may cause missed deadlines. • The overhead of the CPU is directly influenced by the number of tasks, the number of interrupts and the frequency of task switches.
1. Introduction (2/3) • Fig. 1 shows the μC/OS-II context switch for the ARM920T architecture. • Normally this sequence is executed twice, in addition to the time needed by the Interrupt Service Routine and the scheduler code. • The rest of the CPU load is reserved for the OS services, interrupts and context switching. • The proposed architecture minimizes the overhead of task switching by moving the scheduling mechanism from software to hardware, leaving the processor's full power and time for useful tasks instead of context switches.
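As a rough illustration of how the switch frequency drives CPU overhead, the sketch below estimates the fraction of CPU time a software scheduler spends on context switching alone. The function name and every numeric value are illustrative assumptions, not measurements from this work.

```python
def switch_overhead_fraction(switches_per_second, cycles_per_switch, cpu_hz):
    """Fraction of CPU cycles consumed by context switching alone."""
    if cpu_hz <= 0:
        raise ValueError("cpu_hz must be positive")
    return switches_per_second * cycles_per_switch / cpu_hz


# Hypothetical figures: a 100 MHz core and 500 cycles per software context switch.
low_rate = switch_overhead_fraction(1_000, 500, 100_000_000)    # 1,000 switches/s
high_rate = switch_overhead_fraction(20_000, 500, 100_000_000)  # 20,000 switches/s
print(f"{low_rate:.1%} vs {high_rate:.1%}")
```

Under these assumed numbers, raising the switch rate by a factor of 20 raises the overhead by the same factor, which is why a faster system tick alone does not come for free.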
1. Introduction (3/3) Fig. 1. Context switch sequence in the μC/OS-II software scheduler. SP-stack pointer, TCB-task control block, R1-R12-processor registers, CPSR-current program status register
2. Related Work (1/2) • [1] proposed an architecture based on a mixed hardware-software scheduler. • The hardware scheduler was in charge of handling the system timer tick events and executing a selected scheduling algorithm, while the software side was responsible for performing the context switches. • In [1], the hardware scheduler is an independent piece of hardware connected to the CPU through the address/data bus and an interrupt line. • They also proposed changing the scheduling algorithm on the fly, at the cost of a time penalty. • Besides improving task switching time, dedicated hardware schedulers offer a significant reduction in power consumption [12].
2. Related Work (2/2) • [8] compares the performance of systems that implement the scheduler on a single processor, on a coprocessor, or in an external dedicated hardware unit. • The comparison shows an additional overhead due to processor-coprocessor communication, specific to software schedulers that are implemented in a separate CPU. • [11] implemented an architecture based on an external hardware scheduler that can also perform error detection in the task execution flow.
3. Concept and Theory of Operation (1/3) • The architecture presented in this paper has the following characteristics: • RISC CPU, von Neumann memory architecture • task switching and scheduling performed in hardware with zero CPU cycles of overhead (the next task is available for execution starting with the next CPU cycle) • stack free • multi pipeline • fully preemptive execution • fixed priority tasks • fixed priority interrupts • centralized offline scheduling • multitasking with up to 128 tasks and interrupts • hardware activated tasks
3. Concept and Theory of Operation (2/3) • The proposed architecture uses the HAT (Hardware Activated Task) model described by Gaitan. • We avoid any priority inversion situation in which high priority tasks get interrupted by interrupts assigned to lower priority tasks. • In our model, the Interrupt Service Routine (ISR) becomes an Interrupt Service Task (IST). • Each task has its own set of pipeline registers and its own context in the register file. • Task switching is performed by simply remapping the pipeline registers and the register file context that are specific to a task. • The component responsible for this remapping is the hardware scheduler. • In our design, the hardware scheduler is a component of the CPU itself and is directly responsible for the pipeline and register file page remapping. • Fig. 2 presents a block-level view of the design.
3. Concept and Theory of Operation (3/3) Fig. 2. Multi pipeline real-time architecture based on hardware schedulers. Green color is used to represent banked components.
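The remapping idea can be sketched in software as a toy model in which a task switch only changes a bank selector and never copies register contents. The class name, the 16 registers per task and the single `pc` pipeline field are assumptions made for illustration; the actual hardware organization is the one shown in Fig. 2.

```python
class BankedCpu:
    """Toy model: one register bank and one pipeline-register set per task."""

    def __init__(self, num_tasks, regs_per_task=16):
        self.banks = [[0] * regs_per_task for _ in range(num_tasks)]
        self.pipeline = [{"pc": 0} for _ in range(num_tasks)]
        self.active = 0  # selector driven by the hardware scheduler

    def switch_to(self, task_id):
        # No save/restore loop: only the selector changes.
        if not 0 <= task_id < len(self.banks):
            raise IndexError("unknown task id")
        self.active = task_id

    def write(self, reg, value):
        self.banks[self.active][reg] = value

    def read(self, reg):
        return self.banks[self.active][reg]


cpu = BankedCpu(num_tasks=4)
cpu.write(1, 42)   # task 0 writes its own R1
cpu.switch_to(2)
cpu.write(1, 7)    # task 2 writes its own R1
cpu.switch_to(0)
print(cpu.read(1))  # task 0 still sees its own value
```

Because each task keeps a private bank, the cost of a switch in this model does not depend on how many registers a task uses.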
4. Hardware Scheduler Engine (1/6) • The Hardware Scheduler Engine (HSE) is responsible for high-speed task context switching. • In real-time systems, it is possible that tasks of high urgency and importance are interrupted by interrupts assigned to less important tasks. • Sometimes we do not want this to happen. • This is the reason interrupts are treated as tasks here and are assigned priorities like any other task. • Fig. 3 shows a sample application. • Using the unified interrupt space model, tasks and interrupts are arranged in the following priority order: T1, T2, I_GPIO, I_ADC, T3, I_CAN, I_RS485, I_868, T4, I_Timer, T5
4. Hardware Scheduler Engine (2/6) Fig. 3. Interrupts and tasks organization in a sample real-time industrial control application. GPIO-General Purpose Input Output, ADC-Analog to Digital Converter, CAN-Controller Area Network, RS485-Electrical standard for balanced serial data transmission described in the TIA/EIA-485 standard.
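A minimal sketch of the unified priority space follows. It assumes that the order listed for the sample application runs from highest to lowest priority and that entries starting with `I_` are interrupt sources; the `preempts` helper is hypothetical and only illustrates that an interrupt is ranked exactly like a task.

```python
# Sample application order, assumed to be highest priority first.
PRIORITY_ORDER = ["T1", "T2", "I_GPIO", "I_ADC", "T3", "I_CAN",
                  "I_RS485", "I_868", "T4", "I_Timer", "T5"]

# Lower rank number means higher priority.
RANK = {name: position for position, name in enumerate(PRIORITY_ORDER)}


def preempts(requester, running):
    """True if `requester` outranks the currently running entry."""
    return RANK[requester] < RANK[running]


print(preempts("I_GPIO", "T3"))  # interrupt above T3 in the order
print(preempts("I_CAN", "T2"))   # interrupt below T2 in the order
```

Under this model the CAN interrupt cannot interrupt T2, which is the priority inversion case the unified interrupt space is meant to rule out.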
4. Hardware Scheduler Engine (3/6) • The sorted string is used directly to configure the Service Request Register (SRR) inside the hardware scheduler (Fig. 4). • The HATs and ISTs are directly activated by the inputs from the SRR. • Each task has its own comparator module that is fed from the system timer and can be used to provide a cyclic execution for each task that has been executed. • ISTs and asynchronously activated tasks do not subscribe to the cyclic timer event. • Whenever a task has to be executed, the corresponding bit in the SRR is set, and the HSE uses this information to decide which task is executed next. • The bits of the SRR are already arranged in descending priority order.
4. Hardware Scheduler Engine (4/6) Fig. 4. HSE organization. SRR-Service Request Register, MR-Mask Register, ISR-In Service Register, HAT-Hardware Activated Task, IST-Interrupt Service Task
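The selection step of the HSE can be sketched as a priority encoder over the SRR and MR bits. The bit conventions here are assumptions, not taken from the design: bit 0 is treated as the highest priority and a 1 in the MR enables a request. The `Comparator` class is a hypothetical stand-in for the per-task comparator that sets an SRR bit cyclically.

```python
def highest_priority_pending(srr, mr):
    """Return the index of the highest-priority unmasked request, or None.

    Assumption: bit 0 is the highest priority; MR bit = 1 enables a request.
    """
    pending = srr & mr
    if pending == 0:
        return None
    # Isolate the lowest set bit and convert it to a bit index.
    return (pending & -pending).bit_length() - 1


class Comparator:
    """Hypothetical per-task comparator giving cyclic activation."""

    def __init__(self, bit, period):
        self.bit = bit
        self.period = period
        self.next_match = period

    def tick(self, timer, srr):
        # Set the task's SRR bit when the system timer reaches the match value.
        if timer >= self.next_match:
            self.next_match += self.period
            srr |= 1 << self.bit
        return srr


print(highest_priority_pending(0b10100, 0b11111))  # bits 2 and 4 requested
print(highest_priority_pending(0b10100, 0b11011))  # bit 2 masked out
```

In hardware this selection is combinational, so no loop over the tasks is needed; the bit trick above plays the same role in the sketch.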
4. Hardware Scheduler Engine (5/6) • The Instruction Set Architecture has a dedicated instruction that is used to signal execution information from the task. • The watchdog unit expects to receive this information from a running task only once in the time period specific to that task. • Table I presents the number of clock cycles needed to perform the context switch in the proposed design, in comparison with other software schedulers and one commercial hardware scheduler. • As Table I shows, the hardware schedulers are one to two orders of magnitude faster.
4. Hardware Scheduler Engine (6/6) TABLE I Multi Pipeline Register Architecture context switch comparison with other schedulers
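The watchdog rule described for the HSE (exactly one execution signal from a task per task period) can be sketched as follows. The class and method names are hypothetical, and how the watchdog reacts to a violation is not modelled, because that behaviour is not stated here.

```python
class TaskWatchdog:
    """Hypothetical check: exactly one execution signal per task period."""

    def __init__(self):
        self.signals = 0

    def signal(self):
        # Models the dedicated instruction issued by the running task.
        self.signals += 1

    def end_of_period(self):
        # True when exactly one signal arrived during the elapsed period.
        ok = self.signals == 1
        self.signals = 0
        return ok


wd = TaskWatchdog()
wd.signal()
first_period_ok = wd.end_of_period()   # one signal: expected behaviour
second_period_ok = wd.end_of_period()  # no signal: task did not report
```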
5. Register File • The register file has been specially designed to be controlled by the HSE. • The stack concept has been eliminated to completely remove CPU-to-memory transactions during task context saving. • With a stack, the time required to complete the context switch usually rises with the number of tasks that have to be saved. • However, a similar mechanism has been implemented to support calls, returns and argument passing between functions, and it is handled automatically by the register file with zero time overhead.
Conclusion • As seen in the introductory part, external hardware schedulers add an extra overhead due to communication with the CPU over memory-mapped I/O or data buses. • This communication overhead is eliminated entirely because the hardware scheduler is now part of the CPU itself, and the scheduler is controlled through special CPU instructions. • The special design of the register file in this project completely eliminates the need to save the task context on the stack, and thus the time that would normally be spent on stack data handling. • Software RTOS schedulers increase the system timer frequency to ensure the real-time response of some tasks. • This usually comes with an extra overhead due to the increased number of context switches.