HartOS - a Hardware Implemented RTOS for Hard Real-time Applications

Anders Blaabjerg Lange, Karsten Holm Andersen, Ulrik Pagh Schultz, Anders Stengaard Sørensen
University of Southern Denmark, Faculty of Engineering, The Maersk Mc-Kinney Moller Institute
Campusvej 55, DK-5230 Odense M, Denmark (e-mail: {anlan, kha, ups, anss}@mmmi.sdu.dk)

Abstract: This paper introduces HartOS, a hardware-implemented, micro-kernel-structured RTOS targeted at hard real-time embedded applications running on FPGA-based platforms. Historically, hardware RTOSs have been too inflexible and have had limited features and resources. HartOS is designed to be flexible and supports most of the features normally found in a software-based RTOS. To ensure fast, low-latency and jitter-free communication between the CPU and the RTOS, HartOS uses the ARM AXI4-Stream bus recently supported by the MicroBlaze soft-core processor. Compared to μC/OS-II, HartOS has up to 3 orders of magnitude less mean error in generating the correct period for a periodic task and around 1 order of magnitude less jitter, while having up to 100% less overhead depending on the tick frequency.

Keywords: Hardware RTOS, FPGA, AXI4-Stream, Fast Simplex Link, Coprocessor, MicroBlaze

1. INTRODUCTION & MOTIVATION

An RTOS is a critical component in the development of flexible, high-quality and maintainable real-time embedded software, but utilizing a software-based RTOS can make it difficult to achieve hard real-time performance (Vetromille et al., 2006). A hardware-based RTOS can be implemented to provide all the benefits and resources of standard software-based RTOSs, but without the drawbacks and limitations (Song et al., 2007) (Maruyama et al., 2010).

The main weaknesses of software-based RTOSs are that they suffer from computational overhead, jitter and a large memory footprint (Kuacharoen et al., 2003) (Kohout et al., 2003). RTOS computational overhead is caused mainly by tick interrupt management, which gets worse with more tasks and higher tick frequencies (Vetromille et al., 2006), but task scheduling, resource allocation and various other API functions also take execution time from the tasks running on the CPU. API functions whose execution time depends on the system state, the number of tasks and resources in use, etc. cause jitter. External asynchronous interrupt handling is also a source of indeterminism if not used with care, as it can impose uncertainty on the ability of normal, perhaps high-priority, tasks to complete within their deadline (Lindh, 1991).

By implementing task, resource and interrupt management in hardware it is possible to remove the RTOS computational overhead of task scheduling, tick/time handling and interrupt management (Kohout et al., 2003). API functions can in most cases be accelerated vastly by handling them in hardware (Nakano et al., 1995). The jitter caused by the varying runtime of API functions can be removed, since execution in hardware can be made completely deterministic. Likewise, external asynchronous interrupts can be guaranteed not to interfere with high-priority/critical tasks by handling and scheduling ISRs in hardware. The RTOS memory footprint can also be reduced, as only a simple software-based API is required. Historically, hardware RTOSs have nevertheless had limited success; the main issues of the earlier implementations are that they have been too inflexible (Kuacharoen et al., 2003), used inefficient, high-latency communication interfaces (C. M. Ferreira, 2009) and lacked key features and resources.

We present the hardware-based RTOS HartOS, which significantly improves on the limitations of both software-based and previous hardware-based RTOSs. HartOS provides high performance while retaining much of the flexibility and many of the features found in a software-based RTOS. These improvements are enabled through a kernel design optimized for hardware implementation and by utilizing the newest tools and recent advances in FPGA hardware, such as the Xilinx EDK suite and the AXI4-Stream bus.

The remainder of the paper is structured as follows: Section 2 describes related work, Section 3 HartOS features and benefits, Section 4 HartOS design and implementation, Section 5 the experiments (description, results and evaluation) and finally Section 6 the conclusion and future work.

2. RELATED WORK

Since the beginning of the 1990s, the design and development of specialized coprocessors, implementing for example hardware acceleration of time-consuming scheduling algorithms or other real-time kernel primitives, and even of full-blown RTOSs, have received growing attention from the scientific communities working within the areas of computer architecture, real-time systems and operating systems.

Two trends in the design of hardware-based RTOSs and scheduling accelerators have emerged during the last two decades. The minor of the two is the design of specialized processors, as in the FASTCHART (Lindh, 1991), Silicon-RTOS (Murtaza et al., 2006) and H-Kernel (Song et al., 2007) projects. The common feature of these specialized processors is that they have two or more register sets, making them capable of executing a context switch in only one clock cycle. Both the FASTCHART and Silicon-RTOS designs use instruction set architecture (ISA) integration of the hardware-based RTOS, whereas the H-Kernel uses a standard address/data bus for CPU-to-RTOS communication. Although this approach yields the largest possible improvement, removing all RTOS overhead including context switch time, it is not the most popular.

The larger trend, including both limited and full RTOSs such as FASTHARD (Lindh, 1992), RTU (Adomat et al., 1996), Silicon TRON (Nakano et al., 1995), F-Timer (Parisoto et al., 1997), the δ-Framework (Mooney and Blough, 2002), RTM (Kohout et al., 2003), OReK CoP (C. M. Ferreira, 2009), ARTESSO (Maruyama et al., 2010) and scheduling accelerators such as the Spring Scheduling Coprocessor (Stankovic and Ramamritham, 1991), makes use of standard processors and exclusively uses a standard address/data bus for CPU-to-RTOS communication. Only the hardware RTOS by (Kuacharoen et al., 2003) is implemented so that it can use an address/data bus, a coprocessor (stream) bus and ISA integration.

Looking at the results of just a few of the projects outlined above: the RTU (Lee et al., 2003), for example, is reported to be up to 50% faster in executing a test application than the Atalanta RTOS, and in another experiment (Nordstrom et al., 2005), where μC/OS-II was ported to use the RTU, speedups of 7.5% to 370% for OS system calls were reported. The memory footprint was also reported to be 24-38% of that of μC/OS-II. (Kohout et al., 2003) reports a 90% decrease in RTOS overhead and an 81% decrease in interrupt latency using the Real-Time task Manager (RTM).

Although many hardware-based RTOSs have emerged over the last two decades, few have been successfully commercialized or even become popular. Only the RTU has (to our knowledge) made it into a commercial product, the Sierra kernel (Nordstrom and Asplund, 2007), although it seems to have been discontinued by its last known owner, Prevas AB. The Sierra kernel is a full hardware-based RTOS, but it has few resources, which limits its applicability mainly to small embedded systems. The main reasons believed to be behind the limited success of hardware RTOSs are a combination of historical factors such as slow chip-to-chip communication, inflexible implementations and too few features compared to software-based RTOSs. Nevertheless, with the increasing use of SoCs, programmable logic becoming cheaper and the possibility of fast runtime reprogramming of logic hardware as supported in today's FPGAs, the problems of communication speed, inflexibility and lacking features can be overcome.

3. HARTOS FEATURES & BENEFITS

HartOS is designed to be flexible and to support most of the features normally found in a standard software RTOS directly in hardware.

The API is developed for the MicroBlaze processor but can easily be ported to other platforms as well. In order to ensure a low-latency and jitter-free communication infrastructure between the processor and the kernel, HartOS uses the coprocessor/stream interface of the MicroBlaze processor. Both the old FSL and the new AXI4-Stream interfaces are supported.

The kernel is structured in three major parts. The first and mandatory part is the Task Manager, which implements the necessary functions for basic task management. The other two major parts are optional: the Interrupt and Resource Managers. All HartOS kernel functions are implemented in hardware; only the context switch ISR (interrupt service routine) and an easy-to-use API are implemented in software. Interrupts are handled in hardware and scheduled on equal terms with all other tasks, so the CPU is only interrupted when a context switch actually needs to take place.

The flexibility of the HartOS kernel is provided through graphical-tool-based configuration of the VHDL generics, ensuring that the kernel can easily be configured in numerous ways. The scheduler, which uses a Static Fixed Priority FCFS (First Come First Served) algorithm (Liu, 2000), can run either as a pure time (tick) based or as a fully event-based scheduler. The structure of the scheduler can be altered to optimize the system for speed (low latency) or area (low logic consumption) and any optimum in between.

The task manager supports up to 16384 tasks, but is currently limited to 2048 tasks by the implementation of the resource manager. Tasks can block not only on resources, interrupts and relative time, but also on periodic-time increments; the latter makes it easy to create truly periodic tasks. Watchdogs, CPU load calculation, stack over- and underflow detection circuitry, as well as statistics capture logic for each task and the dispatcher, can be included in the kernel if desired.

The resource manager supports up to 512 mutexes and 512 semaphores; the number of each resource type can be configured individually. The semaphores support a maximum counting argument of up to 16 bits and can be individually configured with regard to the maximum count. Mutexes are protected using a simplified version of the Stack Based Priority Ceiling protocol (Liu, 2000), ensuring that deadlocks cannot occur and that tasks always acquire the needed mutexes when scheduled.

The interrupt manager supports up to 32 external interrupts. Each interrupt input is individually configurable for high/low-level or rising/falling-edge triggering. HartOS handles interrupts as events that can trigger (unblock) a task; it is thus possible to have several tasks waiting for the same interrupt, and a task can also wait for several interrupts. Tasks blocking on an interrupt can specify a maximum wait time, after which a timeout occurs and unblocks the task. If the scheduler runs as an event-based scheduler, all interrupt events trigger a scheduler cycle in order to keep the interrupt latency low. The advantage of handling interrupts in this way is that they can be prioritized and scheduled like any other task in the system. This enables high-priority tasks running control loops and digital filters to maintain a strict period without being affected by the simultaneous handling of asynchronous interrupts, as any running task will only be interrupted (preempted) and context switched out if the task servicing an interrupt has a higher priority.
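To make the task-level view of these features concrete, the sketch below shows how a periodic control task and an interrupt-driven task might be written against the HartOS software API. It is a minimal illustration only: the function names (hartos_wait_next_period, hartos_wait_interrupt), the tick length and all constants are assumptions for the example, not the documented API.

/* Illustrative sketch only: the hartos_* names and constants below are
 * hypothetical stand-ins for the HartOS software API described above. */
#include <stdint.h>

#define JOINT_PERIOD_TICKS  200u    /* assuming a 1 us system tick: 200 us period */
#define ENCODER_IRQ         3u      /* one of the up to 32 external interrupts    */
#define IRQ_TIMEOUT_TICKS   1000u   /* 32-bit timeout, no software emulation      */

/* Hypothetical wrappers around the stream-based kernel interface. */
extern void hartos_wait_next_period(uint32_t period_ticks);   /* periodic-time increment */
extern int  hartos_wait_interrupt(uint32_t irq_mask, uint32_t timeout_ticks);

/* High-priority control loop: blocking on periodic-time increments keeps the
 * period from drifting even when the task's own execution time varies. */
void joint_control_task(void)
{
    for (;;) {
        hartos_wait_next_period(JOINT_PERIOD_TICKS);
        /* read sensors, run the control law, write actuators */
    }
}

/* Interrupt-driven task, scheduled like any other task and unblocked either
 * by the interrupt event or by the 32-bit timeout, whichever comes first. */
void encoder_service_task(void)
{
    for (;;) {
        if (hartos_wait_interrupt(1u << ENCODER_IRQ, IRQ_TIMEOUT_TICKS) == 0) {
            /* interrupt occurred: service the encoder */
        } else {
            /* timeout expired: handle the missing pulses */
        }
    }
}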

Fig. 1. HartOS architecture

The disadvantage of this method is the time required for the scheduler execution, but the latency can be controlled and minimized through the layout of the scheduler.

The timing argument used by HartOS for all timeouts and for task blocking is 32 bits wide, so there is no need for software emulation of long blocking periods. Depending on the kernel configuration, a system tick period in the sub-microsecond range is possible, ensuring high granularity and hence smaller jitter in the timing of each task. An additional feature of the HartOS kernel is the ability to stall (shut down) the processor pipeline while no tasks are ready to run and in this way conserve power.

By implementing task, resource and interrupt management in hardware, HartOS completely removes the overhead caused by the task scheduling, tick/time, resource and interrupt management present in a normal software-based RTOS. API functions are accelerated vastly by handling them in hardware, leaving only a small API interface in software, which minimizes the memory footprint. Jitter is removed from RTOS function calls as the execution in hardware is completely deterministic, and the jitter/indeterminism introduced by external asynchronous interrupts is removed by handling them in hardware and scheduling interrupt service routines as tasks. For details regarding HartOS performance, interrupt latencies, API functions etc., please see (Lange, 2011a).

4. HARTOS DESIGN & IMPLEMENTATION

Figure 1 shows a detailed architecture block diagram of the HartOS kernel. HartOS has been developed with a micro-kernel structure in mind and therefore consists of different modules with clearly defined interfaces. This allows changes and additions to be integrated into the kernel with a minimum amount of work. The overall architecture is based on three modules responsible for the main features of the kernel: Task, Interrupt and Resource Management. The three main modules are interfaced to each other and to the software API through a register map/shared memory space.
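As an illustration of how the software API might reach this register map over the stream-based link described below, the following sketch marshals a kernel call as a short packet. It assumes the Xilinx fsl.h putfsl/getfsl macros on the MicroBlaze; the command-word layout (opcode in the upper byte) and the opcode value are invented for the example and do not reflect the actual HartOS packet format.

/* Hypothetical marshalling of a HartOS kernel call over the MicroBlaze
 * FSL/AXI4-Stream coprocessor link; the packet layout is an assumption. */
#include <stdint.h>
#include "fsl.h"                    /* Xilinx putfsl()/getfsl() stream macros */

#define OP_TASK_SLEEP  0x01u        /* invented opcode for illustration */

static inline uint32_t hartos_call1(uint32_t opcode, uint32_t arg)
{
    uint32_t result;

    /* Blocking writes into the TX FIFO feeding the API Processor. */
    putfsl(opcode << 24, 0);        /* command word on stream link 0 */
    putfsl(arg, 0);                 /* single argument               */

    /* Blocking read of the status/return word from the RX FIFO. */
    getfsl(result, 0);
    return result;
}

/* Example call: block the current task for a number of system ticks. */
static inline uint32_t hartos_task_sleep(uint32_t ticks)
{
    return hartos_call1(OP_TASK_SLEEP, ticks);
}

Because the microcode in the API Processor interprets the command word, calls of this form can be added or changed without altering the hardware control and data paths.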

For HartOS it would be both convenient and easy to use a memory-mapped bus to interface to the shared memory space in the kernel; this is also the predominant approach of similar projects, e.g. (Kohout et al., 2003), see (Lange, 2011a) for more details. However, as the MicroBlaze only has one memory-mapped peripheral PLB/AXI bus, latencies added by arbitration between possibly multiple masters, address decoding and data routing would add jitter and decrease performance. In order to get the highest possible performance in the communication link between the CPU and the kernel, a stream-link/coprocessor interface such as the Xilinx Fast Simplex Link (FSL) or the ARM AXI4-Stream interface was the logical choice. These interfaces have been selected based on their high throughput, low latency and simplicity. A set of FIFOs in the up- and down-stream links ensures fast packet transfers and the possibility of running the HartOS kernel at a different clock frequency than the MicroBlaze processor.

In order to interface the stream-link infrastructure to the shared memory/register map efficiently and flexibly, a special-purpose CISC-based API Processor has been designed and implemented to execute the kernel functions invoked from the API. The reason for choosing a CISC architecture is that it can directly mimic the desired functionality of a specialized finite state machine for each API kernel function. The CISC-based design also makes it possible to add functionality that is transparent to (i.e. runs in parallel with) the software API and the application running on the CPU. The control and data paths are used for every kernel call, and additional functions can easily be added and existing ones modified by updating the microcode stored in the Microcode ROM. In this way the API-to-kernel interface is both efficient in terms of logic usage and easily extensible and reconfigurable.

The Task Manager (shown in Figure 1) contains four submodules: the Dispatcher, Timer, Watchdog and Scheduler. The Watchdog module implements a traditional watchdog and an automatic (once configured) context switch latency watchdog; both watchdogs are optional. The load calculation module in the Timer module, which also contains the tick and system time generation logic, is also optional. The Dispatcher module takes care of generating an IRQ (interrupt request) whenever a new task is scheduled by the Scheduler module. The Dispatcher also generates the idle signal (used for calculating the CPU load and halting the CPU pipeline if enabled) and the context switch latency count used by the Context Switch Latency Watchdog. The Dispatcher module can also optionally generate context switch statistics data.

As the scheduler, like the rest of the kernel, is implemented in hardware, it is possible to update the Task Control Blocks (TCBs) in parallel and in that way speed up the scheduling process. The scheduler implementation in HartOS is flexible and can be configured for the desired performance and logic/resource consumption. As it can implement a scheduler that is (A) fully sequential, (B) fully parallel or (C) a hybrid of the two (see Figure 2), it is denoted a Hybrid Scheduler. A TCM (Task Control Module) contains a set of TCBs which are updated and scheduled sequentially; the process inside each TCM runs in parallel with the other TCMs, and the results are scheduled (selected) in a tree-structured scheduler.
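The division of work between the sequential scan inside a TCM and the parallel compare-select stage can be illustrated with a small behavioural C model. This is not the VHDL implementation; the field names, the "lower value means higher priority" convention and the sizes chosen here are assumptions made only to show the structure of the hybrid scheduler.

/* Behavioural C model of the hybrid scheduler: each TCM scans its TCBs
 * sequentially for the highest-priority ready task, and the per-TCM winners
 * are then reduced by compare-select. Conventions are modelling assumptions. */
#include <stdint.h>
#include <stdio.h>

#define NUM_TCM       4        /* parallel Task Control Modules (1..32 in HartOS) */
#define TCBS_PER_TCM  8        /* TCBs scanned sequentially per TCM (1..512)      */
#define PRIO_IDLE     0xFFFFu

typedef struct {
    uint16_t priority;         /* lower value = higher priority (assumption) */
    uint8_t  ready;            /* 1 if the task is ready to run              */
} tcb_t;

static tcb_t tcb[NUM_TCM][TCBS_PER_TCM];

/* Sequential part: one TCM picks its best ready task. In hardware all TCMs
 * perform this scan concurrently. */
static int tcm_scan(int tcm, uint16_t *best_prio)
{
    int best = -1;
    *best_prio = PRIO_IDLE;
    for (int i = 0; i < TCBS_PER_TCM; i++) {
        if (tcb[tcm][i].ready && tcb[tcm][i].priority < *best_prio) {
            *best_prio = tcb[tcm][i].priority;
            best = i;
        }
    }
    return best;
}

/* Parallel/tree part: compare-select over the TCM winners. */
static int schedule(void)
{
    uint16_t winner_prio = PRIO_IDLE;
    int winner = -1;
    for (int t = 0; t < NUM_TCM; t++) {
        uint16_t p;
        int i = tcm_scan(t, &p);
        if (i >= 0 && p < winner_prio) {
            winner_prio = p;
            winner = t * TCBS_PER_TCM + i;    /* global task id */
        }
    }
    return winner;                             /* -1 means idle */
}

int main(void)
{
    tcb[1][3] = (tcb_t){ .priority = 10, .ready = 1 };
    tcb[2][5] = (tcb_t){ .priority =  4, .ready = 1 };
    printf("next task id: %d\n", schedule());  /* prints 21 (TCM 2, TCB 5) */
    return 0;
}

Configuring more TCMs shortens the sequential scan (fewer TCBs per TCM) at the cost of a larger compare-select tree, which is exactly the speed/area trade-off described above.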

Fig. 2. Scheduler architectures

Fig. 3. Mutex queue

The Hybrid Scheduler of HartOS supports 1 to 32 TCMs and 1 to 512 TCBs in each TCM, hence a maximum of 16384 tasks.

The Interrupt Manager consists of three main components and two subcomponents. The main components are the Interrupt Controller, the Strobe Generator and the Interrupt Control Modules (ICMs). The ICM in turn is made up of two subcomponents: the Interrupt Control Block (ICB) RAM and the ICB Updater. There is an ICB for each TCB in the Scheduler and likewise an ICM for each TCM. This enables interrupt processing for each task to be done synchronously with the TCB update procedure of the Scheduler module and also allows every task to be triggered on any number of the implemented interrupt inputs. The Interrupt Controller provides up to 32 individually configurable interrupt inputs that can be triggered on either edge or level with the desired polarity. The Strobe Generator sends an interrupt strobe event to the TCM FSM in the Task Manager when an interrupt is detected by the Interrupt Controller. This triggers a scheduler sequence in the Task Manager, ensuring the shortest possible interrupt latency for the kernel.

The Resource Manager implements semaphores, which can be individually configured as binary or counting semaphores, and mutexes, which are protected using the Stack Based Priority Ceiling protocol. The Stack Based Priority Ceiling protocol was selected because it prevents deadlocks, is simple to implement, ensures resources are always granted (tasks are never blocked when trying to acquire a resource) and has the same worst-case performance as the Basic Priority Ceiling protocol (Liu, 2000).

The Mutex Manager consists of two subcomponents, the Mutex Controller and the Mutex Control Block (MCB). As the Stack Based Priority Ceiling protocol is used for protecting mutex access, tasks are guaranteed to always acquire the needed mutexes, so there is no need to support a pend queue for each mutex. A task must never block itself (self-suspend) while it holds a mutex, as this would prevent tasks below the ceiling from running. Taking this requirement into account, and also requiring tasks to release mutexes in the reverse order of acquisition (enforced by the kernel), enables an efficient hardware-based implementation using the structure of a singly linked list. Figure 3 illustrates how a queue of acquired mutexes is represented; mutexes are inserted and removed at the front of the queue (a behavioural sketch of this bookkeeping is given below).

The Semaphore Manager consists of three subcomponents: the Semaphore Controller, the Semaphore Control Block (SCB) and the Semaphore Task Control Module (STCM). In order to ensure the necessary operations can be executed efficiently, the semaphore pend queue is implemented as a doubly linked list.
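The mutex bookkeeping of Figure 3 can be modelled in a few lines of C. The sketch below is a behavioural model only, assuming the numeric conventions suggested by the figure (31 as the "no ceiling" sentinel and lower values meaning higher priority/ceiling); the struct layout and function names are not taken from the HartOS implementation.

/* Behavioural model of the Figure 3 mutex queue: acquired mutexes form a
 * singly linked list ordered by acquisition, each entry saving the previous
 * system ceiling; releases are only allowed in reverse order. */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NO_MUTEX   (-1)
#define NO_CEILING 31              /* "no ceiling" sentinel, as in Figure 3 */

typedef struct {
    uint8_t ceiling;               /* priority ceiling of this mutex        */
    uint8_t saved_ceiling;         /* system ceiling before acquisition     */
    int     owner;                 /* owning task id, -1 if free            */
    int     next;                  /* next mutex in the acquired queue      */
} mcb_t;

static mcb_t   mcb[8];
static int     front = NO_MUTEX;   /* most recently acquired mutex          */
static uint8_t system_ceiling = NO_CEILING;

static void mutex_init(int m, uint8_t ceiling)
{
    mcb[m] = (mcb_t){ .ceiling = ceiling, .saved_ceiling = NO_CEILING,
                      .owner = -1, .next = NO_MUTEX };
}

/* Under the stack-based priority ceiling protocol a scheduled task is never
 * blocked here, so acquire just pushes onto the queue and raises the ceiling. */
static void mutex_acquire(int m, int task)
{
    mcb[m].owner         = task;
    mcb[m].saved_ceiling = system_ceiling;
    mcb[m].next          = front;
    front                = m;
    if (mcb[m].ceiling < system_ceiling)
        system_ceiling = mcb[m].ceiling;
}

/* Releases must come in reverse acquisition order (enforced by the kernel). */
static void mutex_release(int m)
{
    assert(m == front);
    system_ceiling = mcb[m].saved_ceiling;
    front          = mcb[m].next;
    mcb[m].owner   = -1;
}

int main(void)
{
    mutex_init(1, 10); mutex_init(2, 15); mutex_init(3, 7);
    mutex_acquire(1, 5);                                   /* ceiling 31 -> 10 */
    mutex_acquire(3, 5);                                   /* ceiling 10 -> 7  */
    printf("system ceiling: %u\n", (unsigned)system_ceiling);   /* 7  */
    mutex_release(3);
    mutex_release(1);
    printf("system ceiling: %u\n", (unsigned)system_ceiling);   /* 31 */
    return 0;
}

Because the protocol guarantees that a scheduled task can always acquire its mutexes, the acquire path never blocks, which is what makes a single linked list (rather than per-mutex pend queues) sufficient in hardware.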

5. EXPERIMENTS

In order to compare the performance of HartOS to a standard software-based RTOS, two different tests have been conducted. μC/OS-II was selected as the software RTOS to be tested against HartOS; the results can however be generalized to any software RTOS based on the same design principles as μC/OS-II.

The first test is a synthetic benchmark inspired by the preemptive context switch test from the Thread-Metric benchmarking suite by (Expresslogic, 2011). This test is designed to measure the RTOS overhead as a function of the tick frequency and the number of tasks. To simulate the tasks doing work, a simple delay loop is executed in each task; the number of iterations used for the loop is indicated as "Loop delay" in the figures illustrating the results.

The second test is a simulation of an application controlling a three-joint robotic manipulator. The application consists of a background task calculating various statistical data based on the output from three periodic tasks, each responsible for controlling a joint. The first periodic task must run at 5 kHz and the other two at 1 kHz. This test is designed to measure the RTOS's ability to generate the correct period and how much CPU time is left for the background calculation task (a sketch of this task set is shown below).

5.1 Experiment results

The results of the first test can be seen in Figures 4, 5 and 6. Figure 4 shows the RTOS CPU load as a function of the number of tasks at various tick frequencies with a fixed loop delay for both HartOS and μC/OS-II. For HartOS there was only an average standard deviation of 0.0425% between the individual curves obtained for each tick period, so Figure 4 only shows the average curve for HartOS. Figures 5 and 6 show the RTOS CPU load as a function of the number of tasks at a fixed tick frequency but with varying loop delays, in order to show the impact of longer tasks.

The results of the second test can be seen in Figure 7, which shows the CPU time available to the calculation task, and in Figures 8 and 9, which show the mean period error and the peak-to-peak period jitter for each of the periodic tasks. Periodic Task 1 (PT1) has a target period of 200 μs, while PT2 and PT3 have a target period of 1 ms.

The experiments have been performed on an AXI-based MicroBlaze CPU running at 100 MHz on a Xilinx SP605 kit equipped with an XC6SLX45T FPGA. Task period measurements have been obtained using a LeCroy WaveRunner 204Xi 2 GHz 10 GS/s oscilloscope; each data point on the curves is the mean of a dataset containing 10 samples, each obtained over a 5 second period. The first sample in each dataset has been censored to remove startup noise.
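For reference, the robot-controller test could be structured roughly as below. The periods follow the 200 μs and 1 ms targets stated above (assuming a 1 μs tick), and hartos_wait_next_period is the same hypothetical API wrapper used in the earlier sketches; the actual benchmark source is available from the project website (Lange, 2011b).

/* Rough structure of the manipulator test: three periodic joint tasks
 * (one at 5 kHz, two at 1 kHz) and a lowest-priority background task.
 * The hartos_* name is a hypothetical API wrapper, not the real API. */
#include <stdint.h>

extern void hartos_wait_next_period(uint32_t period_ticks);

static volatile uint32_t joint_output[3];
static volatile uint32_t background_iterations;

static void joint_task(int joint, uint32_t period_ticks)
{
    for (;;) {
        hartos_wait_next_period(period_ticks);
        joint_output[joint]++;          /* placeholder for the joint control law */
    }
}

void joint1_task(void) { joint_task(0, 200u);  }   /* 200 us period, 5 kHz */
void joint2_task(void) { joint_task(1, 1000u); }   /* 1 ms period, 1 kHz   */
void joint3_task(void) { joint_task(2, 1000u); }   /* 1 ms period, 1 kHz   */

/* Lowest priority: the work it manages to do is what Figure 7 measures. */
void background_calc_task(void)
{
    for (;;)
        background_iterations++;        /* placeholder for the statistics work */
}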

Fig. 4. HartOS and μC/OS-II CPU load vs. tick period

Fig. 7. Calculation task CPU usage/load

Fig. 5. HartOS CPU load vs. loop delay

Fig. 8. Periodic tasks: period error (mean)

Fig. 6. μC/OS-II CPU load vs. loop delay

Fig. 9. Periodic tasks: period jitter (Peak-Peak)

A single sample for HartOS at a 25 ms tick period and two samples for μC/OS-II at a 500 μs tick period containing obviously false data have also been censored; the cause of these errors is believed to be a timing bug in the test code. For the HartOS CPU load measurements with a loop delay between 100 and 10000, the maximum standard deviation (MSD) is 0.009%. For the measurements with a loop delay of 100000 the MSD is 0.107%; this is due to the small integer count reached by the worker loops, where a difference of 1 in the count results in a large relative deviation. The MSD for the calculation task CPU time measurements is 0.00054% for HartOS and 0.00068% for μC/OS-II. Detailed experiment descriptions, measurements and source code can be found at the project website (Lange, 2011b).

5.2 Experiment result evaluation

The benchmark test, which has been run at tick periods ranging from 5 μs to 100 ms, clearly shows that software-based RTOSs (in this instance μC/OS-II) suffer from a much larger overhead, comprised mainly of tick interrupt processing and scheduling, than HartOS does. Looking at Figure 4, it can be seen that HartOS has a consistent overhead independent of the tick frequency; this overhead consists only of context switching and API calls, whereas the overhead of μC/OS-II consists of context switching, API calls, tick interrupt handling and scheduling. The overhead of μC/OS-II depends strongly on the tick frequency, thus encouraging the use of a tick frequency that is as low as possible.

The results shown in Figure 4 have been obtained with a relatively low loop delay, implying rather short tasks and hence many context switches. In order to get a picture of how the RTOSs behave with tasks doing more work, the loop delays have been increased; the results of these tests are shown in Figures 5 and 6. Figure 5 clearly shows how the overhead of HartOS quickly approaches zero as the tasks have more work to do; this is expected, as HartOS' overhead only consists of API calls and context switches. For μC/OS-II the overhead also decreases, as seen in Figure 6, but instead of going toward zero it approaches a straight line with a slope that is a function of the tick frequency.

The tick frequency, and hence the size of the discrete time quanta used in any RTOS, limits the accuracy with which a task can be scheduled. It is thus desirable to raise the tick frequency as high as possible to ensure accurate task scheduling. The problem with software-based RTOSs is that high tick frequencies cause excessive overhead, as shown by the benchmark test. HartOS does not suffer from this problem, as it implements tick handling, scheduling etc. in hardware.

The simulation of the robotic manipulator controller has been run at tick periods of 5, 10, 25, 50, 100 and 200 μs. μC/OS-II was not able to run at the 5 μs tick period, hence Figures 8 and 9 do not contain data at 5 μs for μC/OS-II. Figure 7 shows the expected result: the calculation task running under HartOS is unaffected by the tick frequency and is able to do the same amount of work no matter how fast the tick timer runs. For μC/OS-II the result also fits with what is to be expected from the scheduler design and the benchmark test.

The results shown in Figures 8 and 9 show HartOS to have a nearly constant (tick frequency independent) mean period error of only 0.9 ns and 4 ns respectively for the tasks with target periods of 200 μs and 1 ms. μC/OS-II on the other hand has a much larger error, which however reduces drastically as the tick frequency is lowered, with less overhead as a result. Although the mean period error of μC/OS-II is reduced when the tick frequency is lowered, the peak-to-peak period jitter increases dramatically (for PT1), from an initially low 48 ns to 1 μs peak-peak, as shown in Figure 9. HartOS on the other hand has a constant, relatively low peak-to-peak jitter of around 100 ns regardless of the tick frequency.

6. CONCLUSION AND FUTURE WORK

HartOS is considered a step forward in the development of hardware-based RTOSs: it relieves the processor of the computational overhead of running a software-based RTOS, it supports high tick frequencies with no overhead penalty, and it provides a hard real-time environment where API functions are jitter-free and their execution time is independent of the number of active tasks, external events and the tick frequency, while still providing the flexibility and features developers expect from a real RTOS. Compared to μC/OS-II, HartOS has up to 3 orders of magnitude less mean error in generating the correct period for a task and around 1 order of magnitude less jitter in the generated period, while at the same time (depending on the tick frequency) having between 100% and 6.5% less overhead than μC/OS-II in a simulated task set for a robotic manipulator.

Future work will encompass further testing, real-life field tests and additional features such as event flags, queues, mailboxes, deadlock detection for semaphores, advanced scheduling algorithms and multiprocessor support.

REFERENCES

Adomat, J., Furunas, J., Lindh, L., and Starner, J. (1996). Real-time kernel in hardware RTU: a step towards deterministic and high-performance real-time systems. Proceedings of the Eighth Euromicro Workshop on Real-Time Systems, 1996, 164–168.
C. M. Ferreira, A.S.R.O. (2009). RTOS Hardware Coprocessor Implementation in VHDL. Technical report. http://www.academia.edu/.
Expresslogic (2011). Measuring RTOS Performance. Technical report, Online. http://rtos.com/PDFs/MeasuringRTOSPerformance.pdf.
Kohout, P., Ganesh, B., and Jacob, B. (2003). Hardware support for real-time operating systems. First IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2003, 45–51.
Kuacharoen, P., Shalan, M.A., and Mooney, V.J., III (2003). A Configurable Hardware Scheduler for Real-Time Systems. Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms, 96–101.
Lange, A.B. (2011a). Hardware RTOS for FPGA based embedded systems. Master's thesis, University of Southern Denmark. http://www.hartos.dk/publications/msc-thesis/hartos.pd
Lange, A.B. (2011b). HartOS PDeS experiments data. http://www.hartos.dk/publications/pdes-2012/hartos.zip
Lee, J., Mooney, V.J., III, Daleby, A., Ingstrom, K., Klevin, T., and Lindh, L. (2003). A comparison of the RTU hardware RTOS with a hardware/software RTOS. Proceedings of the Asia and South Pacific Design Automation Conference, 2003, 683–688.
Lindh, L. (1991). FASTCHART - a fast time deterministic CPU and hardware based real-time kernel. Proceedings, Euromicro Workshop on Real Time Systems, 1991, 36–40.
Lindh, L. (1992). FASTHARD - A Fast Time Deterministic HARDware Based Real-time Kernel. Proceedings, Fourth Euromicro Workshop on Real-Time Systems, 1992, 21–25.
Liu, J.W.S. (2000). Real-Time Systems. ISBN-10: 0-13-099651-3.
Maruyama, N., Ishihara, T., and Yasuura, H. (2010). An RTOS in hardware for energy efficient software-based TCP/IP processing. IEEE 8th Symposium on Application Specific Processors, 2010, 58–63.
Mooney, V.J., III and Blough, D. (2002). A hardware-software real-time operating system framework for SoCs. IEEE Design & Test of Computers, 19(6), 44–51.
Murtaza, Z., Khan, S., Rafique, A., Bajwa, K., and Zaman, U. (2006). Silicon real time operating system for embedded DSPs. International Conference on Emerging Technologies '06, 188–191.
Nakano, T., Utama, A., Itabashi, M., Shiomi, A., and Imai, M. (1995). Hardware implementation of a real-time operating system. Proceedings of the 12th TRON Project International Symposium, 1995, 34–42.

Nordstrom, S. and Asplund, L. (2007). Configurable Hardware/Software Support for Single Processor Real-Time Kernels. International Symposium on System-on-Chip 2007, 1–4.
Nordstrom, S., Lindh, L., Johansson, L., and Skoglund, T. (2005). Application specific real-time microkernel in hardware. 14th IEEE-NPSS Real Time Conference, 2005, 4 pp.
Parisoto, A., Souza, A., J., Carro, L., Pontremoli, M., Pereira, C., and Suzim, A. (1997). F-Timer: dedicated FPGA to real-time systems design support. Proceedings of the 9th Euromicro Workshop on Real-Time Systems, 1997, 35–40.
Song, M., Hong, S.H., and Chung, Y. (2007). Reducing the Overhead of Real-Time Operating System through Reconfigurable Hardware. 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools, 2007, 311–316.
Stankovic, J.A. and Ramamritham, K. (1991). The Spring kernel: a new paradigm for real-time systems. IEEE Software, 8(3), 62–72.
Vetromille, M., Ost, L., Marcon, C., Reif, C., and Hessel, F. (2006). RTOS Scheduler Implementation in Hardware and Software for Real Time Applications. 17th IEEE International Workshop on Rapid System Prototyping, 2006, 163–168.