Copyright © IFAC Real Time Programming, Schloss Dagstuhl, Germany, 1999

DEVELOPING A TESTBED FOR DISTRIBUTED REAL-TIME APPLICATIONS

P.T. Woolley, W.M. Walker and A. Burns

Department of Computer Science, University of York, UK. e-mail: {woolley, billy, burns}@cs.york.ac.uk
Abstract: Fixed priority scheduling is regarded as a standard technique for the scheduling of tasks in real-time systems on uni-processors. There are, however, also communication protocols which are based upon fixed priority scheduling. This paper discusses a testbed which has been constructed using embedded controller boards connected via a CANbus. The testbed supports experimentation with distributed real-time systems using both the Ada 95 and C programming languages. Support is provided for a higher level CAN communication protocol by a set of interface routines, and for multi-tasking on a single processor through the provision of a kernel of routines based upon the Pthread standards. Results are presented which suggest that the testbed allows the deterministic scheduling of both tasks and communications, thus facilitating experimentation with distributed real-time systems. Copyright © 1999 IFAC
Keywords: Ada 95, Real-Time Systems, Testbed, Controller Area Network.
1. INTRODUCTION
Fixed priority preemptive scheduling has become a standard technique for implementing real-time systems (Burns and Wellings, 1996). The majority of attention has been focused on the issues relating to the scheduling of tasks on a single processor, but communication protocols are also available which are based on fixed priority scheduling of messages. It is therefore possible to consider complete distributed systems based upon fixed priority despatching. The theoretical scheduling work has been presented previously, e.g. (Liu and Layland, 1973), (Audsley et al., 1993), (Gallmeister, 1995), (Burns and Wellings, 1996).

The modifications made to the Ada programming language during the 9X process have resulted in a language which is well suited to programming real-time systems, and so Ada was used as the means of programming fixed priority based applications. To make use of Ada it was first necessary to produce a predictable and effective run-time system, and it is for this reason that the Gnat Ada compiler was ported to a bare processor. The fixed priority communications protocol used was one based upon the CANbus, and thus it was also necessary to develop a programmer's API to the CAN. The way in which Gnat was ported to the hardware architecture means that the testbed also includes support for programs written in C.

The availability of Gnat has significantly improved access to the Ada programming language for general programming, but there is still no easy way to experiment with real-time behaviour, particularly on distributed hardware. The testbed discussed in this paper facilitates such experimentation. In this paper we address the issues involved in constructing an effective testbed for analysing and evaluating distributed scheduling schemes, and describe our implementation. Data on the performance and predictability of the implementation are presented, and a small demonstration system is also described.
2. HARDWARE ARCHITECTURE OF THE TESTBED
The basic hardware architecture used for the testbed comprises a number of processor nodes connected together via a serial CANbus. The processor nodes are mainly embedded Motorola 68376 based boards. This processor was chosen because it includes an on-board CAN controller: the TouCAN module. The remaining nodes in the network are either standard personal computers (PCs) with PCI CAN hardware, or further Motorola 68332 based processor boards which have access to additional CAN hardware devices; in both cases the CAN hardware is based upon the Intel CAN controller chip.

The CANbus is operated using the maximum 1Mbps data rate, and has an overall length under 10m. The CAN API, discussed in section 4.2, supports up to 10 nodes on the network. The embedded controller boards are used to provide predictable applications, while the PCs provide support for the development of programs. This support is in the form of the Gnat Ada compiler (configured as a cross-compiler), and software tools for downloading programs, accessing data files via CAN, and monitoring the network traffic.

3. PORTING THE ADA RUNTIME

Support for the Ada programming language has been provided by porting the Gnat system to the bare processors used on the embedded controller boards. Apart from the production of a Gnat cross-compiler this has meant the construction of a kernel which implements the functions necessary for multi-tasking and time-based processing.

The kernel supports multi-tasking via the POSIX threads standard (POSIX.1c), and includes support for priority based preemptive scheduling, mutexes, condition variables, and timed delays. The kernel itself is written in a mixture of Motorola assembly language and C, in order to obtain maximum efficiency in terms of execution time and compactness of machine code. This additionally means that it is possible to use POSIX threads from within C programs.

When writing an Ada tasking program, the compiler links to the relevant functions of the pthread kernel, and thus the underlying despatching policy of Ada tasks within this environment is priority based FIFO scheduling by default. This means that the analyses developed for fixed priority scheduling can be used for the applications created. Full details of the porting of Gnat to the processors used within the testbed, and of the pthread kernel, are given in (Walker et al., 1999).
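As an illustrative sketch (not taken from the testbed sources), a minimal Ada 95 program running under this default policy can make the despatching explicit with the standard Annex D pragmas; the priority value chosen here is arbitrary:

   pragma Task_Dispatching_Policy (FIFO_Within_Priorities);

   with System;
   procedure Demo is
      --  A statically assigned priority: tasks of equal priority are
      --  despatched in FIFO order; higher priorities preempt lower ones.
      task Sensor_Poll is
         pragma Priority (System.Default_Priority + 2);
      end Sensor_Poll;

      task body Sensor_Poll is
      begin
         null;  --  application work here
      end Sensor_Poll;
   begin
      null;
   end Demo;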
4. INTEGRATING THE CAN

Integrating CAN into the testbed can be considered to be composed of issues in two areas: technical issues concerned with serial communications via CAN, and the provision of a programmer's API (Figure 1) for access to the CANbus.

4.1 Technical Issues

The major technical issue to be overcome in setting up a CAN based serial communications scheme is ensuring that all CAN devices on the network are transmitting data at as close to the same rate as possible. A difference in communication rate of as little as 2% can cause messages to go undetected due to loss of synchronisation. Initially the Intel based CAN devices were generating an exact (measured on an oscilloscope) 1Mbps transmission rate, while the TouCAN chips were operating approximately 2% faster. The difference in speed was due to the underlying processor clock rate from which the CAN chip derived its timing: the TouCAN boards were operating with a 16MHz clock rate, while the Intel based chips obtained a 1MHz frequency from dedicated oscillators. This led, after between 15 minutes and 2 hours, to a loss of synchronisation. Additionally, some data frames were more likely to fail than others; this effect was caused by the bit-stuffing behaviour of the low level CAN protocol. By changing the processor clock rate on the TouCAN boards to 21MHz the difference in the CAN transmission rates was brought to within 1%, and the problem was alleviated.

4.2 Programmer's API to CAN

Once the physical data transfer problems were solved, the problem remained of sending arbitrary length data messages across the CANbus, while maintaining predictability in message transmissions. The standard and extended low level CAN protocols (2.0A and 2.0B) allow for a maximum of 8 bytes per frame. If more space than this is required for data, then multiple frames must be sent. A further motivation for the provision of a higher level CAN communication protocol was to isolate the programmer from the underlying hardware. This has the advantage that programming communications becomes both easier to achieve and more reliable (in that direct access to hardware registers is not required).

   procedure initialise_can_message_system (node_number : integer);

   procedure reserve_can_channel (channel : integer; mode : unsigned_8);

   function open_can_tx_channel (channel       : integer;
                                 queueing_mode : integer;
                                 access_mode   : integer) return integer;

   function open_can_rx_channel (channel       : integer;
                                 queueing_mode : integer;
                                 access_mode   : integer;
                                 blocking_mode : integer) return integer;

   procedure read_can_message (CAN_handle : integer; data : in out buffer);

   procedure send_can_message (CAN_handle  : integer;
                               destination : integer;
                               data        : in buffer;
                               byte_count  : integer);

Fig. 1. Ada Programmer's API to the Distributed Real-Time Execution Environment's CAN protocol.
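A minimal usage sketch of this API follows. The channel, node and byte-count values are illustrative only, and the named constants TRANSMIT and QUEUING stand in for whatever encodings the kernel actually defines for the mode parameters:

   procedure CAN_Demo is
      Tx_Handle : integer;
      Data      : buffer;  --  message payload, as declared by the API above
   begin
      --  Step 1: activate the TouCAN hardware and assign this board
      --  a unique node number.
      initialise_can_message_system (node_number => 3);

      --  Step 2: reserve a channel; the channel number is the message
      --  priority, with higher numbers representing higher priorities.
      reserve_can_channel (channel => 10, mode => TRANSMIT);

      --  Step 3: open the channel and send. A 16-byte message exceeds
      --  the 8-byte frame limit and is packaged into two frames.
      Tx_Handle := open_can_tx_channel (channel       => 10,
                                        queueing_mode => QUEUING,
                                        access_mode   => 0);
      send_can_message (CAN_handle  => Tx_Handle,
                        destination => 5,   --  receiver's node number
                        data        => Data,
                        byte_count  => 16);
   end CAN_Demo;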
From the programmer's point of view, communication across the CAN network is performed by reading and writing messages of arbitrary length to CAN communication channels. The first step in using the CAN communication protocol developed here is to initialise the CAN message system. This is required both to activate the TouCAN hardware and to assign a unique node number to the board in question. This is important since messages are directed to specific nodes via their node number, and allowing software assignment of node identifiers enables greater flexibility. Step two is to reserve CAN channels for transmitting and receiving data. The idea behind reserving channels is to allow protection against accidental use of the CANbus: if a message is directed to a channel which has not been specifically reserved, an error will be generated. The channel number selected is effectively the priority of the CAN message, with higher numbered channels representing higher priorities. Once reserved, a channel may then be used for reading or writing messages to the CANbus. If the message data length is greater than 8 bytes, the message is packaged and sent out in as many frames as required. The reading of CAN messages can be either BLOCKING or NON_BLOCKING, and reading and writing can be performed in QUEUING or NON_QUEUING modes.

The aim of the particular communication protocol developed for the testbed was to generate a communication protocol which was simple, and which encapsulated the concept of fixed priority preemptive messaging. In this implementation the channel numbers represent message priorities, thus facilitating analysis of the network traffic. Messages may be analysed according to the channel number upon which they are transmitted, together with their expected duration on the network (obtained from Figure 2), in order to assess the amount of time required for a message to be transmitted in its entirety. In a sense the protocol itself can be considered as a medium level protocol, the purpose of which is to provide a base for the predictable scheduling of arbitrary length messages.

If two tasks on the same node are transmitting messages, the lower priority message (i.e. the message on the lower channel number) will be suspended after the currently transmitting frame has been sent. Once the higher priority message has been successfully transmitted the low priority message is resumed. This means that the delay on any message is equal to its own transmission time, plus the combined transmission times of all higher priority messages on any node, plus some overheads for preempting and packaging of data. The traffic on the CANbus can thus be analysed in a similar manner to tasks on a single processor (Tindell et al., 1995).

Due to the nature of the low level CAN protocol, multi-frame messages of higher priority will be sent in their entirety before any lower priority messages sent from another node, even if a multi-frame low priority message has begun transmission. The reason for this behaviour is that priority arbitration is performed for each low level CAN frame sent, not on a per message basis. The multiple parts of a message are queued in 7-frame blocks within the CAN device's message buffers. This means that as soon as one frame has been transmitted, the next (with the same priority) is entered into the arbitration process for the next message frame. This behaviour ensures that the highest priority message, regardless of its length, will be delayed by no more than the time taken to transmit one frame of a lower priority message.
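This per-frame arbitration is what makes the bus traffic amenable to the fixed priority response time analysis cited above. As a sketch (following the style of (Tindell et al., 1995), and ignoring queueing jitter), the worst-case response time $R_m$ of a message $m$ satisfies

$$R_m = w_m + C_m, \qquad w_m = B_m + \sum_{j \in hp(m)} \left\lceil \frac{w_m + \tau_{bit}}{T_j} \right\rceil C_j$$

where $C_m$ is the transmission time of $m$, $B_m$ is the blocking due to at most one lower priority frame already in arbitration, $hp(m)$ is the set of higher priority messages, $T_j$ their periods, and $\tau_{bit}$ the transmission time of a single bit. The recurrence is solved by fixed point iteration starting from $w_m = B_m$.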
5. PERFORMANCE EVALUATION OF THE TESTBED

Performance evaluation of the testbed is split into two parts: evaluation of the task switching overheads, and evaluation of the data transfer times across the CANbus. The values obtained are timed from the perspective of an application program, so that the measurements represent the timing overheads which are incurred rather than the time taken to perform kernel functions. The time values obtained thus include the overheads of entering and exiting the kernel functions as well as performing the required functions. The timer used was the standard Motorola 68376 real-time clock, updating every 244us.

Fig. 2. Transmission and Reception Times for CAN Messages from 1 to 40 bytes. [Plot: times in clock ticks (approximately 3250 to 3350) against message length in bytes.]
5.1 Task Switching Overhead

Switching between tasks (whether they are Ada tasks or C-language threads) is governed by the thread_manager within the pthread kernel developed as a part of the run-time environment of the testbed. The major overhead (in terms of execution time) is thus present within this kernel function, since it is here that the thread queues are examined and modified. In order to obtain the execution time of this function, a C program was constructed such that a number of threads with known execution times were executed. A flag within the kernel registered the number of times that the thread_manager was called. The difference between the expected execution time of the group of threads (the sum of their individual execution times) and the measured execution time was taken to be the time spent within the thread_manager, and was calculated to be 0.5 of a clock tick (122us) per entry.

The timing measurements were recorded with the processor system clocks, and this means that there is always a ± 1 clock tick (244us) error on any recorded time value. The experiments were arranged in such a way that these errors were small in comparison to the computations being timed. For example, when recording CAN transmission times the time taken was measured around 100 message transmissions. The error on the clock values was ± 1 and so the error on the duration recorded is ± 2 clock ticks. Once the time value is divided by the number of messages sent this error is reduced to ± 0.02 clock ticks.

In order to assess the task switching overhead of an Ada tasking program, a similar experiment was set up. A block of Ada code was executed and timed. This code was then used for both the main task and all child tasks in a multi-tasking Ada program. The action of the main task was to record the time and then lower its own priority so that all child tasks were released before the main task could complete its computation. The kernel overhead was then calculated by comparing the measured main task execution time with the combined execution times of all of the tasks in the system. It was expected that, since the Ada compiler adds additional validity checks and code to manage the tasking, the task switch overhead would be greater than the calculated thread_manager overhead. The results obtained are shown below:

   Number of tasks                   = 5 (main task + 4 child tasks)
   Expected loop execution time      = 1410 ticks
   Measured loop execution time      = 1422 ticks
   Number of calls to thread_manager = 7
   Task switching overhead           = (1422 - 1410)/7
                                     = 1.71 ticks (approx. 418us)

The task switching overhead can be broken down by tracing the kernel function calls which the Ada program makes. It was found that for a change of priority the Gnat Ada compiler checks the priority range against the maximum and minimum allowable priority values returned by the kernel, creates and locks a mutex, performs the priority change, unlocks the mutex, and then enters the thread_manager to select which task to make active next. This means that the thread_manager is entered more than once per call to the set priority function in an Ada program, thus accounting for the increased task switching overhead.
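A sketch of this measurement program in Ada 95 might look as follows; the priority values, the use of Ada.Real_Time and Ada.Dynamic_Priorities, and the Work procedure are illustrative assumptions rather than the testbed's actual sources:

   with System;
   with Ada.Real_Time;          use Ada.Real_Time;
   with Ada.Dynamic_Priorities;

   procedure Switch_Test is
      pragma Priority (System.Default_Priority + 1);  --  main starts highest

      Start, Stop : Time;

      procedure Work is
      begin
         null;  --  the timed block of code, identical in every task
      end Work;

      task type Child is
         pragma Priority (System.Default_Priority);
      end Child;

      task body Child is
      begin
         Work;
      end Child;

      Children : array (1 .. 4) of Child;
   begin
      Start := Clock;
      --  Lowering our own priority releases all four children before
      --  the main task's computation can complete.
      Ada.Dynamic_Priorities.Set_Priority (System.Priority'First);
      Work;
      Stop := Clock;
      --  (Stop - Start), compared against five times Work's stand-alone
      --  execution time and divided by the number of thread_manager
      --  calls, yields the per-switch overhead.
   end Switch_Test;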
5.2 CAN Communication Overheads
The overhead of using the CANbus for communications can be split into three considerations: the first is the time between issuing the send command and the time at which the full message has actually been sent onto the CANbus; the second is the time required to receive a message from the CANbus and make it available to the application; and the third is the impact of on-node and off-node preemption on multi-frame messages.

To assess the transmission time characteristics of the CAN protocol developed for this testbed, the time taken to transmit 100 messages of varying data length (from 1 byte to 200 bytes) was measured. To ensure that the complete message had been sent, immediately after the send command the program entered a loop continually polling a kernel flag (via a function call) which was set once the message had completed its transmission. Figure 2 shows the resulting transmission times to be step-wise linear with respect to message length. The steps in the plot are consistent with the increasing frame counts of the messages, and can be expected from the transmission behaviour (i.e. a 16-byte message requires 2 frames, and thus has an additional overhead involved in initialising 2 CAN buffers compared to a message which requires only 1 CAN buffer). After every 7th frame there is also an additional delay caused by the refilling of buffers, because messages are placed into the buffers 7 frames at a time (the TouCAN chip has space for 16 CAN buffers, and so 7 of these are used for transmission, leaving the rest free for reception). Since the message transmission times are broadly speaking linear, and repeated experimentation gives the same results, they are also predictable.
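This step-wise behaviour can be summarised in a simple model (a restatement of the description above, not a formula from the paper): an L-byte message occupies

$$N(L) = \lceil L/8 \rceil \ \text{frames}, \qquad \text{refills}(L) = \lceil N(L)/7 \rceil - 1$$

buffer refills. So a 16-byte message needs $N = 2$ frames and no refill, while a 200-byte message needs $N = 25$ frames and 3 refills, each refill adding its own fixed delay to the otherwise linear transmission time.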
In order to assess the reception time for CAN messages, two nodes were employed. The first node transmitted 100 messages each of sizes from 1 byte to 200 bytes. The second node polled for these messages in NON_BLOCKING mode, and recorded the amount of time taken from the start of transmission until the complete messages had arrived. Synchronisation was achieved by having the reception node BLOCK on a CAN channel before commencing timing, and having the transmitting node send a header for each block of messages. The reception time was found to closely follow the transmission time. The activity of receiving messages is slightly faster than the complete transmission of a message, and in Figure 2 it appears that messages are received before they have been completely sent. It must be remembered that the transmission program was timing until the point at which confirmation of the transmission was recorded, and this will include the time taken to change the TouCAN hardware registers and update the kernel status data. If BLOCKING mode had been used for reception, there could have been an additional 1 tick (244us) on all reception time values, due to the need for the kernel to wake up the task when a CAN frame was detected. This must be considered when using BLOCKING mode reads, but represents an additional delay on the basic function of reception.

The impact of message preemption was estimated by arranging for two tasks to write messages to the CANbus at a regular interval. The priority of one message was much higher than the other, with the lower priority message also having the longer data length and the lower task priority. Thus when the lower priority task is running and transmitting its long message, the higher priority task can become active and preempt that message. By timing the transmission time of the lower priority message the impact of the preemption of messages can be assessed. This was performed with both on-node and off-node preemption of messages, and the results are shown below:

   Time to transmit 200 byte messages             = 17667 ticks
   Time to transmit 100 byte messages             = 974 ticks
   Estimated time for preempted 200 byte messages = 17667 + 974 = 18641 ticks
   Measured time for preempted 200 byte messages  = 18639 ticks

Accounting for a ± 2 clock tick error on all timings, these results indicate that there is negligible cost involved in the on-node preemption of CAN messages. Additionally, the low priority message did not interfere with the transmission of the high priority message.
5.3 Implementation of the CAN Driver

The memory model of the boards used for this testbed is simple linear addressing with no memory protection: an application running on one of the boards has access to all available memory at any one time. The CAN "driver" consists of kernel functions for transmitting and receiving messages. This means that there is no task or daemon running to serve the CAN, and thus no impact is made upon schedulability unless the CANbus functions are actually used. For transmission, the overheads described in the previous section are all that are incurred; the transmission function can be treated from a timing perspective like any other function, since it simply copies the messages into the TouCAN hardware buffers. Reception requires more consideration, as it is interrupt driven. If a CAN frame arrives at the node, an interrupt is generated and the message is copied from the TouCAN hardware into a kernel data structure. Control then returns to whatever task was previously running, unless a higher priority task is blocked awaiting a message on the appropriate channel number. Only CAN frames which have been expressly directed to the node have any impact on timing, because the TouCAN hardware is used to mask out messages which do not have the correct node mask in their CAN identifier.
The additional overhead which needs to be considered in schedulability analysis is therefore related to the expected arrival rate of messages at a particular node. In terms of schedulability analysis, this overhead can be treated as a highest priority task with period equal to the message arrival rate. Where there are multiple messages directed to one node, each message has a priority with respect to other messages, related to its channel number.
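One way to fold this into the standard fixed priority response time analysis (a sketch under the model just described, not a formula from the paper) is to add a reception-interrupt interference term to every task on the node:

$$I_{can}(t) = \sum_{m \in M} \left\lceil \frac{t}{T_m} \right\rceil N_m \, C_{int}$$

where $M$ is the set of message streams directed to the node, $T_m$ the arrival period of stream $m$, $N_m$ the number of frames per message of that stream (each frame raises one interrupt), and $C_{int}$ the cost of copying one frame from the TouCAN hardware into the kernel data structure. $I_{can}(R_i)$ is then added into the usual response time recurrence for each task $i$, exactly as the interference of a highest priority task would be.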
6. A DEMONSTRATION SYSTEM

The controlled system for the demonstration was an electric slot-racing set, modified such that there were infrared sensors around the lanes of the track (784 per lane), and the cars were fitted with infrared emitters. The sensors were connected to a Motorola 68030 processor board in such a way that when the emitter from a car triggered a sensor, an interrupt was generated on the processor board and the sensor number and current processor clock counter value were recorded in memory locations. This processor board was connected via a VMEbus to a Motorola 68332 based processor board with an Intel CAN controller chip. Track sensor data was broadcast on the CANbus every 8ms, and this was sufficient to locate the position of cars on either lane to the closest sensor just passed. The lane voltages were set in response to a CAN message containing a lane number and requested voltage value. An interesting feature of the track layout was the chicanes, where it was possible for two cars to collide. The aim of the demonstration controller was to allow two cars to complete the circuit with fast lap times, while ensuring that they did not collide on any of the chicanes. The control software was arranged such that two MC68376 boards were used, one for each lane, and co-ordination was performed by direct observation of the position of both cars (by sensor number). By ensuring that, when in the chicanes, the cars were never closer than 5 sensors apart (just over one car length in this case), the cars were able to complete circuits of the track without colliding on the chicanes. In addition, the trailing car was not held up for the full length of the chicane, but only for sufficient time to avoid collision with the lead car.

7. CONCLUSION

The software and hardware of the testbed provide a suitable platform for experimentation with real-time distributed applications. The use of the Gnat Ada compiler in conjunction with the pthread kernel which has been developed enables the predictable and timely execution of multi-tasking programs on individual nodes in the network, and the CANbus facilitates the analysis of network traffic. The measurements undertaken thus far indicate that sufficient data is available on the performance of the testbed to enable the accurate assessment of analysis techniques, and to facilitate experimentation with real-time distributed software.

8. REFERENCES

Audsley, N., Tindell, K. and Burns, A. (1993). The end of the line for static cyclic scheduling? In: Proc. Fifth Euromicro Workshop on Real-Time Systems. 36-41.
Burns, A. and Wellings, A. (1996). Real-Time Systems and Programming Languages. 2nd ed. Addison-Wesley. Harlow.
Gallmeister, B.O. (1995). POSIX.4: Programming for the Real World. O'Reilly & Associates. Sebastopol, CA.
Liu, C. and Layland, J. (1973). Scheduling Algorithms for Multiprogramming in a Hard Real-Time Environment. JACM 20(1). 46-61.
Tindell, K., Burns, A. and Wellings, A. (1995). Calculating Controller Area Network (CAN) Message Response Times. Control Engineering Practice. 3(8). 1163-1169.
Walker, W.M., Woolley, P.T. and Burns, A. (1999). An Experimental Testbed for Embedded Real-Time Ada 95. To appear in Ada Letters, 1999.