Bit accurate timing analysis on a frame based CAN model

Marcus Müller, Johannes Klöckner, Wolfgang Fengler

Ilmenau University of Technology, Ilmenau, Germany (e-mail: {marcus.mueller, johanner.kloeckner, wolfgang.fengler}@tu-ilmenau.de)
Abstract: This paper presents an approach to model the CAN communication protocol on a level that abstracts from the bit representation, in order to significantly reduce simulation time for models of distributed applications on heterogeneous communication clusters. A central part of the approach is the definition of methods that abstract bit-based behaviour into a frame based model while preserving important timing and error characteristics, for which the calculation of the transmitted frame length is an important aspect. The approach uses a-priori simulations to determine specific model parameters such as the stuff bit distribution and error probabilities. An error classification has been conducted to represent the inherent CAN error handling on frame level; it determines the exact number of bit times after which a transmission error is detected, to ensure a high degree of accuracy.

Keywords: Fieldbus, CAN, modelling, Discrete event systems, System level modelling, MLDesigner, simulation

1. INTRODUCTION

Current development in the automotive industry is concerned with a transition from CAN based communication to the more advanced FlexRay protocol for hard real-time applications. Considering further established protocols like MOST and LIN and their interactions with FlexRay results in complex heterogeneous communication clusters combining subnetworks with different physical, architectural and protocol characteristics. Development of such complex distributed systems is aided by the introduction of a model based design stage, at which major application scenarios can be analysed, resulting in educated decisions on cluster architecture and function distribution and their effects on communication behaviour and performance. The modeling of communication clusters for automotive applications centered around the FlexRay protocol demands means to model heterogeneous distributed embedded communication clusters, including the currently still widely used CAN protocol.
A general approach is the top-down modeling of a distributed embedded application that is, with increasing hierarchical resolution, mapped onto execution units interconnected by heterogeneous communication systems. Since in this context the focus is not on researching the inherent properties of the CAN protocol, the overhead of building and simulating a detailed bit-oriented model would defeat its purpose. Therefore a more abstract model is created that saves a significant amount of simulation time, yet is able to introduce the important behavioural properties into the cluster modeling.
The following sections introduce the underlying modeling concept and the modeling paradigm and tool used for the studies. Before the characteristics of the presented approach are discussed, the necessary basics of the CAN protocol are summarized.

2. UNDERLYING MODELING CONCEPT

The results presented in this paper extend a modeling approach presented in Klöckner et al. (2008). The modeling strategy supports a generalized model based top-down development process. A single communication system is defined as a composition of three basic elements, which can be realized on different levels of abstraction. These elements are Host, Communication Controller (CC) and Channel. The interconnection is shown in Figure 1.

Fig. 1. Basic Model Structure for networked embedded systems, Klöckner et al. (2008). [Figure: two Nodes, each consisting of a Host and a Communication Controller, connected by a Channel.]

In the current work the focus is on the CC and the Channel. The Host contains only basic functionality, since its behaviour depends on the specific application. A combination of a Host and a CC is named node so it contains the
functionality of the application and the communication behavior. The communication protocol should be implemented within the CC. The CC contains functionalities like synchronization, error detection and message handling. Towards the host several services should be provided, which are accessed by a host application. The application itself is described within the host. In addition, the host contains a specialized sublayer to realize the protocol related access to the CC. The host uses the CC services for configuration, send and receive operations and processing of the received data. The channel element describes the physical characteristics of the nodes' interconnection. The main task of this element is the reception and transmission of data. This data transfer takes the physical delay into account. A fault model can be integrated in the channel element to model transmission errors.

3. MODELING PARADIGM

The focus of the approach in Klöckner et al. (2008) lies on homogeneous communication systems using FlexRay. The modeling strategy uses the tool MLDesigner, MLDesign Technologies Inc. (2007), which is also chosen for the modeling of Controller Area Network (CAN) systems. MLDesigner is based on the Ptolemy project of UC Berkeley, The Ptolemy Project (2009). It provides different models of computation called domains, e.g. the discrete event domain (DE), finite state machines (FSM) and the synchronous data flow domain (SDF). The tool provides means for functional modeling, simulation and scenario based validation. MLDesigner uses a special terminology for its elements; the most relevant terms are introduced briefly here. The top level element in the modeling hierarchy is called System. It provides no interfaces and contains other blocks. These building blocks can be elementary blocks (Primitives), like FSMs and C/C++ source code blocks, or hierarchical structure blocks (Modules). Building blocks communicate with their environment by using Variables or Ports.
A Port is represented as an arrow on the bounding box of a building block. Information is distributed by exchanging Particles over ports along attached signal paths. Variables can be realized as Memory blocks that can be accessed by elementary blocks even across hierarchy levels. In comparison to bit and cycle accurate protocol simulation, the frame based discrete event simulation in MLDesigner saves simulation time, especially for large clusters that include the application level. The advantage of the used simulation tool is its multi-domain simulation. The developed models use the domains DE and FSM, which are well suited for networked embedded systems at the regarded level of abstraction. For more detailed models taking physical characteristics into account, and for models considering the system's environment, a multi-domain approach can easily be undertaken.

4. CAN BASICS

CAN is a widely used field bus protocol originally designed for automotive applications and specified in Robert Bosch GmbH (1991). It has been designed as a message-based CSMA-CA (Carrier Sense Multiple Access with Collision
Avoidance) protocol. Message based means that the protocol addresses different message types rather than different nodes. Carrier Sense means that a node monitors the bus and only starts a transmission on an idle bus, and that it monitors its own transmission for bit errors. These bit errors can be caused by a faulty bus or, more likely, by another node transmitting a dominant bit. In the latter case, Collision Avoidance requires the node that detected the bit error to cease its transmission and wait for an idle bus again. This mechanism is the main aspect of CAN's bus arbitration, since the more dominant bits a message begins with, the higher its transmission priority will be.

Fig. 2. CAN frame structure. [Figure: SOF | Arbitration field | Ctrl. field | Data | CRC field | ACK field | EOF]

A CAN message is called a frame; its basic structure is depicted in Figure 2. The standard CAN protocol features two distinct frames, the data frame with a data section of up to 8 bytes and the remote frame without a data section, which serves as a data request message. The arbitration field comprises an identifier that assigns the priority to the message and the RTR bit, which privileges a data frame over a remote frame. The extended CAN format provides the respective frames with a longer identifier to address more message types in larger systems. CAN features various error detection mechanisms. Bit monitoring is used to detect bit shift errors, violations of the bit stuffing rule and general frame structure violations; additionally a CRC check is performed. Immediately upon detecting an error, the detecting node disrupts the current transmission by sending a dominant Error sequence, which terminates the erroneous frame and notifies all nodes. In case the reserved space between two regular frames is disrupted by a transmission, which is a sign of a temporarily high load, nodes can respond with a dominant Overload sequence, which blocks the bus and provides additional processing time for the nodes.

Table 1. Basic CAN data frame bit times

Field      SOF  Arbit.  Ctrl.  Data            CRC  ACK  EOF
Bit times  1    12      6      8 * data bytes  16   2    7
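As a small illustration of Table 1, the nominal (unstuffed) length of a standard data frame follows from summing the field lengths. The sketch below is not part of the original model; the function names are assumptions chosen for illustration.

```python
# Field lengths in bit times of a standard CAN data frame, taken from Table 1.
# The data field contributes 8 bit times per data byte and is added separately.
FIXED_FIELD_BITS = {"SOF": 1, "Arbit.": 12, "Ctrl.": 6, "CRC": 16, "ACK": 2, "EOF": 7}


def nominal_frame_bits(data_bytes: int) -> int:
    """Unstuffed length of a standard data frame in bit times."""
    if not 0 <= data_bytes <= 8:
        raise ValueError("a CAN data frame carries 0 to 8 data bytes")
    return sum(FIXED_FIELD_BITS.values()) + 8 * data_bytes


def frame_duration_us(frame_bits: int, bit_rate_bps: int) -> float:
    """Bus occupation time in microseconds for a given length and bit rate."""
    return frame_bits * 1_000_000 / bit_rate_bps
```

For example, nominal_frame_bits(8) gives 108 bit times, which corresponds to 216 microseconds at an assumed bit rate of 500 kbit/s. Stuff bits are not included in this value.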
Although all possible CAN message formats have been considered during the conducted research, this paper uses only one frame type to illustrate the developed modeling techniques: the CAN standard data frame with 11 bit identifier and up to 8 bytes of data, the bit timing of which is shown in Table 1.

5. FRAME BASED MODELING APPROACH

As indicated before, this approach abstracts from the bit level, so not the logic bit but the MAC layer CAN frame is considered as the atomic element of information transfer. Concerning modeling and simulation in the Discrete Event domain, a frame transmission is expressed by emission and consumption of a single particle. The frame's contents are modeled as a complex data structure attached to the particle.

The characteristics of this underlying paradigm cause a central problem. Since the emission and consumption of a discrete event particle, and thereby of the complete frame data, is conducted without time delay, the actual delay of the bit-wise transmission has to be modeled explicitly. Furthermore, bit-wise mechanisms and effects like bus arbitration, bus utilization, fault injection and error handling have to be abstracted to work on atomic frames, but still have to show bit time based effects. The modeling aspects to address these challenges will be presented in the following sections.
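The explicitly modeled delay can be illustrated with a minimal discrete event sketch. It is written in plain Python rather than MLDesigner, and the names deliver_times and bit_time_us are hypothetical. Each frame is a single event whose consumption is scheduled after a delay proportional to its frame length; bus contention is ignored here.

```python
import heapq


def deliver_times(emissions, bit_time_us):
    """Return (delivery_time_us, label) pairs in chronological order.

    emissions: iterable of (emit_time_us, frame_length_bits, label).
    A frame is one atomic event; the bit-wise transmission is represented
    only by the delay frame_length_bits * bit_time_us added to its time stamp.
    """
    queue = []
    for emit_time_us, frame_length_bits, label in emissions:
        heapq.heappush(queue, (emit_time_us + frame_length_bits * bit_time_us, label))
    # Pop all scheduled events in time stamp order.
    return [heapq.heappop(queue) for _ in range(len(queue))]
```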
5.1 Model structure

A simulation model is structured as depicted by the small example in Figure 3, with two CAN nodes connected by a bus. Each node is comprised of a host application element and a communication controller containing the LLC and MAC level functionality of the CAN protocol. The MAC functions have been adapted to process data according to the frame-based mechanisms. Each node possesses a transmitting and a receiving connection to the bus, over which event particles with attached frame information are provided and consumed, respectively. The CAN bus element constitutes the model element for reconstructing bit-based behavioural effects on the abstracted frames based on error probability information of the connected nodes.

Fig. 3. Small CAN cluster model

5.2 The CAN node element

A CAN node's communication is parametrized by preconfiguring the frame format FF (standard or extended) and the number of data bytes DLC. To characterize its abstracted low level behavioural properties a CAN node features a set of parameters relevant for simulation of fault injection and error handling mechanisms only. Although explicitly not necessary in the CAN protocol, the identification of a faulty node requires the introduction of an abstract node id NID. To address reliability, two probabilities have been added. Firstly, the probability of a node producing an erroneous frame PoNE, which results from the accumulated bit error probabilities (a property of the physical transceiver and bus assembly); secondly, the probability of creating a fake error PoCFE in case a node detects an error in a correct frame due to a local receiver failure.

The internal functional structure of the CAN MAC element is depicted in Figure 4 and reveals the central state machine NodeControl, that contains the frame handling during transmit and receive operation, and the subsystem ErrorControl for the local error state protocol of the node, as defined in the CAN specification. The blocks FrameCoding and LLCMsgCoding form the interface to the higher protocol level.

Fig. 4. CAN node MAC layer
5.3 The abstract CAN frames

To support the abstract modeling and frame based simulation, a special data structure had to be defined. This data structure embeds the functionally significant sections of the CAN frame into additional data required by the simulation semantics. From this approach's abstract point of view the bus element transforms a frame transmitted by a node's MAC layer to the channel (MAC2ChanFrame) into a frame to be received by a node's MAC layer (Chan2MACFrame). Since both address different aspects of the simulation semantics, different structures for transmitter-to-bus and bus-to-receiver frames with respective data fields have to be used. The following figures present the structures of these frames. The first field FL represents the frame length in bits. This value is calculated by the sending node and will be used to determine the frame's transmission duration. The ID field represents the CAN message identifier. Together with frame format FF and RTR bit RTR it is used for bus arbitration.

Fig. 5. Frame from node MAC to Bus element (MAC2ChanFrame). [Figure: fields FL | ID | FF | RTR | Data | PoNE]

When a frame is transmitted the data structure depicted in Figure 5 is emitted from a node element and carries the
corresponding PoSE field to provide this information to the fault injection mechanisms in the Bus element.

Fig. 6. Frames from Bus element to node MAC (Chan2MACFrame) - unspoiled frame (top) and erroneous frame (bottom). [Figure: top fields FL | ID | FF | RTR | Data; bottom fields EFL | ID | FF | RTR | FD]

The bus element transforms the transmitted structure and produces a data structure that represents the frame read from the bus by the nodes. This can be a correctly transmitted frame or a spoiled frame. As can be seen in Figure 6, two different data structures have to be used to handle both cases. The unspoiled frame only contains the essential data and CAN protocol information, supplemented by the FL field for simulation timing reasons. The erroneous frame has, first and most important of all, a modified error frame length EFL, since at some point during transmission an error would have been detected, the frame aborted and replaced by the error code transmission. The additional field at the end of the structure is the first detector field FD, containing information about the node first to report the error. It is used to influence the local error handling of each node.

5.4 The Bus element

As indicated before, all the behavioural aspects that emerge only on bit level are introduced by the abstract Bus element. Figure 7 shows the internal functional structure of this model block. As mentioned before, the frames coming from the nodes are transformed. This recoding is performed by the block Chan2MACFrameCoding, which emits the data structures shown in Figure 6. Which of these structures is eventually transmitted to the nodes, an unspoiled or an erroneous frame, is determined by the fault injection model that is housed in the block Cluster ErrorCreation. The preceding block BusArbitration performs the arbitration mechanism on abstract frames and thereby selects the frame actually to be transmitted via the bus. Parallel to this main data path, the frame data structures are submitted to the subsystem BusControl, which determines the global bus state, which is the most important property for the abstracted Carrier Sense mechanism.
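The data flow through the Bus element can be sketched in a few lines. The following is an illustration only, not the MLDesigner implementation: the Python class and function names are assumptions, the field names follow Figures 5 and 6, only standard-format frames are handled, and the decision whether an error occurs (together with EFL and FD) is passed in by the caller instead of being drawn by a fault injection model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MAC2ChanFrame:          # node MAC -> Bus element (Figure 5)
    fl: int                   # frame length in bit times
    can_id: int               # ID field
    ff: str                   # frame format, "standard" in this sketch
    rtr: bool                 # True for a remote frame
    data: bytes
    pone: float               # sender's probability of node error


@dataclass(frozen=True)
class Chan2MACFrame:          # Bus element -> node MAC, unspoiled (Figure 6, top)
    fl: int
    can_id: int
    ff: str
    rtr: bool
    data: bytes


@dataclass(frozen=True)
class Chan2MACErrorFrame:     # Bus element -> node MAC, erroneous (Figure 6, bottom)
    efl: int                  # error frame length in bit times
    can_id: int
    ff: str
    rtr: bool
    fd: int                   # first detector (abstract node id)


def arbitrate(frames: List[MAC2ChanFrame]) -> Tuple[MAC2ChanFrame, List[MAC2ChanFrame]]:
    """Pick the winner among frames emitted with identical time stamps.

    Dominant bits win, so the lowest identifier has priority; for equal
    identifiers a data frame (dominant RTR bit) beats a remote frame.
    """
    winner = min(frames, key=lambda f: (f.can_id, f.rtr))
    losers = [f for f in frames if f is not winner]
    return winner, losers


def recode(tx: MAC2ChanFrame, efl: Optional[int] = None, fd: Optional[int] = None):
    """Transform a transmitted frame into the frame read from the bus."""
    if efl is None:
        return Chan2MACFrame(tx.fl, tx.can_id, tx.ff, tx.rtr, tx.data)
    if fd is None:
        raise ValueError("an erroneous frame needs a first detector id")
    return Chan2MACErrorFrame(efl, tx.can_id, tx.ff, tx.rtr, fd)
```

Comparing identifiers numerically is equivalent to the bit-wise comparison on the bus only as long as all competing frames use the same frame format, which is why the sketch is restricted to standard frames.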
Fig. 7. Internal view of the Bus element

6. ABSTRACTING BEHAVIOURAL ASPECTS

This section is dedicated to the behavioural abstractions from the bit level to the frame level that are performed by the aforementioned subsystems inside the Bus element.

6.1 Arbitration

The bus arbitration during the CSMA-CA mechanism is conducted bitwise and evaluated by the transmitting node itself through comparison of the sent bit and the actual bus level. In discrete event simulation all simultaneously transmitting nodes emit their frames atomically as particles with identical time stamps. Arbitration is now performed by collecting the frame particles consumed at the same time step, determining the dominant message by comparing the ID fields and discarding all other particles. The nodes that would have lost arbitration during transmission have to determine that based on the returning bus particles.

6.2 Time and Bus state modeling

Modeling the timing behaviour of the transmissions is the most important part of this modeling approach. This aspect is closely coupled with the Bus state as a global but abstract cluster property. Once the arbitration has decided which frame is actually transmitted over the channel, the timing behaviour can be determined based on the frame length property FL coded in the data structure. As depicted in Figure 8, the timing behaviour model is built around a central FSM block, the task of which is to switch between the different phases of bus utilisation. These phases are first of all the Idle phase without any transmission and the Frame phase, when a frame is being transmitted and the bus is blocked. The inter frame sequence IFS has a fixed length of three bit times, as defined in the CAN specification, as has the Overload sequence with 14 bit times. Refer to Robert Bosch GmbH (1991) for details on these values. The blocks surrounding the central Bus State Control FSM create the duration of the phases by inserting the required delays into the signal flows.

Fig. 8. Bus element time behaviour model
The FSM controlling the Bus state according to the utilization phases is depicted in Figure 9. Starting from the “Idle” state, on an incoming event on the FramIn port, the state is changed to “frame”, during which the FL value is transferred through the “FrameLength” port to the externally attached delay block. After the delay has passed, a “Release” signal is generated, which triggers the transition to the “ifs” state. The other state transitions are conducted accordingly.
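The state sequence described above can be written down as a small transition table. This is a sketch under assumptions: only the transitions named in the text (idle to frame on an incoming frame, frame to ifs on Release) are taken from the paper, while the return from ifs to idle after the three IFS bit times and all Python names are illustrative guesses. The Overload phase is omitted.

```python
IFS_BITS = 3  # inter frame sequence length in bit times

# (state, event) -> next state
TRANSITIONS = {
    ("idle", "FrameIn"): "frame",
    ("frame", "Release"): "ifs",
    ("ifs", "Release"): "idle",   # assumed: the bus becomes idle after the IFS delay
}


def run_bus_fsm(frame_lengths):
    """Return the (state, start_bit, end_bit) phases for frames that are
    sent one after another, each starting from an idle bus."""
    state, now, phases = "idle", 0, []
    for fl in frame_lengths:
        state = TRANSITIONS[(state, "FrameIn")]
        phases.append((state, now, now + fl))        # delay block waits FL bit times
        now += fl
        state = TRANSITIONS[(state, "Release")]
        phases.append((state, now, now + IFS_BITS))  # fixed IFS delay
        now += IFS_BITS
        state = TRANSITIONS[(state, "Release")]      # back to idle
    return phases
```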
Fig. 9. Bus state FSM

The duration of the Frame phase is determined by the frame length property FL or EFL of the message put back on the channel. These values are calculated from the original frame length by considering two additional aspects: the bit stuffing and the fault injection. These two processes change the actual time the frame occupies the bus; the former is highly dependent on the bit distribution throughout the frame, the latter depends on possible errors in the frame and the corresponding time of detection and frame cancellation. Since both aspects require a more sophisticated approach, they will be handled separately in the following sections.

6.3 Bit stuffing

Each real CAN frame undergoes a coding process called bit stuffing before being submitted to the physical layer, during which a single bit of opposite polarity is inserted after every series of five consecutive bits of the same polarity, thus supporting the bit synchronisation of the nodes and introducing a simple fault detection mechanism. The number of stuff bits inserted into a frame depends on the bit distribution in the frame, mainly in its data section. Since the presented approach abstracts from the bit representation, the exact number of stuff bits cannot be determined on the given abstraction level. Therefore the exact length of a given frame and its bit accurate transmission time have to be deduced otherwise. The simulation model is based on an a-priori determination of the stuff bit distribution. A worst case assumption subjects a standard CAN data frame to the insertion of ⌊(34 + 8 ∗ DLC)/5⌋ stuff bits into the stuffable region from SOF to CRC. To get more realistic results, statistical studies have been conducted, both simulation and experiment based. In Rauchaupt (1994) a probability distribution for the appearance of one to ten stuff bits and an average number of stuff bits in frames with one to eight data bytes is given. In Nolte et al. (2001) a probabilistic model for stuff bit estimation has been introduced. Based on these ideas a simulation setup was developed, which allows a more specific analysis of the different classes of CAN frames. These classes are determined by analysing the variable and interdependent sections of the frame structure. As shown in Figure 10, CAN frames can be distinguished by three attributes: the ID, the length of which determines the frame type (standard or extended); the RTR bit, which distinguishes data and remote frames; and the data length code DLC, indicating how many data bytes (one through eight) will follow. Combining these criteria results in 16 classes for all lengths of standard and extended data frames plus two remote frame classes.

Fig. 10. CAN frame classification criteria. [Figure: frame fields SOF | ID | RTR | DLC | Data | CRC ... EOF; the ID selects the classes Basic/extended, the RTR bit the classes Data/Remote and the DLC the classes 0 .. 8 byte.]

Within one class of frames only the identifier and the data section are truly variable, whereas the SOF bit is predefined, the RTR and DLC are fixed and the CRC depends on the preceding contents, as indicated in Figure 11.
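The stuffing rule and an a-priori estimate of the stuff bit distribution for one frame class can be sketched as follows. This is an illustrative Python reconstruction, not the authors' simulation setup: the function names are assumptions, frame contents are drawn uniformly at random, only standard data frames are covered, and the CRC is computed with the 15 bit generator polynomial of the CAN specification (0x4599).

```python
import random
from collections import Counter


def crc15(bits):
    """CAN CRC sequence over a list of bits (generator polynomial 0x4599)."""
    reg = 0
    for bit in bits:
        feedback = bit ^ ((reg >> 14) & 1)
        reg = (reg << 1) & 0x7FFF
        if feedback:
            reg ^= 0x4599
    return reg


def to_bits(value, width):
    """Bit list of the given width, most significant bit first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def stuffable_bits(can_id, data):
    """Unstuffed bits from SOF to the end of the CRC sequence (standard data frame)."""
    # SOF, 11 bit ID, RTR, IDE, r0 (all dominant for a standard data frame), DLC
    bits = [0] + to_bits(can_id, 11) + [0, 0, 0] + to_bits(len(data), 4)
    for byte in data:
        bits += to_bits(byte, 8)
    return bits + to_bits(crc15(bits), 15)


def stuff(bits):
    """Insert a bit of opposite polarity after five equal consecutive bits."""
    out, run_bit, run_len = [], None, 0
    for bit in bits:
        out.append(bit)
        run_bit, run_len = (bit, run_len + 1) if bit == run_bit else (bit, 1)
        if run_len == 5:
            run_bit, run_len = 1 - bit, 1   # the stuff bit starts a new run
            out.append(run_bit)
    return out


def stuff_bit_histogram(dlc, samples, seed=0):
    """Count how often each number of stuff bits occurs for random frames of one class."""
    rng = random.Random(seed)
    histogram = Counter()
    for _ in range(samples):
        data = [rng.getrandbits(8) for _ in range(dlc)]
        frame = stuffable_bits(rng.getrandbits(11), data)
        histogram[len(stuff(frame)) - len(frame)] += 1
    return histogram
```

Dividing the counts returned by stuff_bit_histogram by the number of samples yields an empirical distribution for one class; replacing the random generator by recorded application data would give the application dependent variant.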
Fig. 11. CAN data frame variable sections

The a-priori determination of stuff bits is conducted by creating a significant number of frames of each class, the bit representation of which varies randomly within the respective sections, calculating the CRC, conducting the bit stuffing and determining the stuff bit distribution. The creation of frame contents can be conducted purely randomly or application dependent, in case real data exist. Once the distributions are available, they can be introduced into the actual frame-based CAN scenario models. The block Stuffbit Computation housed in the Bus control element model shown in Figure 8 is able to classify the incoming frames according to their attributes and apply the respective stuff bit distribution, which results in a more realistic frame length and transmission duration. This block also allows the substitution of different methods of stuff bit determination within this modeling approach.

6.4 Error modeling

Errors in frames occur when at least one bit's polarity is reversed by a disturbance. Considering the methods of detection, errors can be classified in two main categories: CRC errors and bit errors. CRC errors are reported by any node after a failing cyclic redundancy check of the received bit sequence from SOF to CRC. Bit errors are detected when the comparison of the expected bit value and the actual bus level fails. These bit errors can be further distinguished. Bit monitoring errors occur when the actual bus level differs from the bit polarity a sending node has put on the bus. Bit stuff errors can be detected by any node as a violation of the stuff rule. Frame errors are irregularities in the frame coding scheme (e.g. reserved delimiter bits, EOF sequence). When transferring this classification to the frame based modeling approach, each frame can be subdivided into significant sections, according to which class of error is detectable. Figure 12 shows the four partitions (G1 through G4) of globally detectable errors, that is, errors caused by a transceiver failure or disturbance on the channel. Section G1 is the SOF bit; an error in this bit is only detectable by a sending node. During the arbitration phase G2 only bit stuff errors can be detected, equally by senders and receivers; any other bit shift will just be interpreted as a loss of arbitration. In G3 any bit shift will immediately be detected by the sender due to bit monitoring. Violations in section G4 will be detected by all nodes as frame errors.

Fig. 12. Frame segments for global error behaviour. [Figure: frame fields SOF | Arbit. | Ctrl. | Data | CRC | CRCDel | ACK | ACKDel | EOF with segments G1 to G4.]

Another type of errors are the locally detected errors induced by a local receiver fault; these errors are nevertheless globally reported and result in the destruction of the frame. Figure 13 shows the local fault sections L1 through L3. Section L1 contains the first four bits of the arbitration field, where errors cannot yet be detected as bit stuff errors, so only CRC violations can occur. In section L2 a receiver can potentially detect both stuff and CRC errors, and in L3 only frame errors can be detected.
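To make the segmentation concrete, the following sketch maps an unstuffed bit position of a standard data frame to its global and local segment, using the bit time intervals of Table 2, and returns the bit time at which an error is signalled. The function names are assumptions; the signalling rule used here (CRC errors after the ACK delimiter, all other errors at the next bit time) follows the CAN specification, and stuff bits are ignored.

```python
def global_segment(bit, dlc):
    """Global error segment (G1..G4) of a 1-based, unstuffed bit position."""
    if bit < 1 or bit > 44 + 8 * dlc:
        raise ValueError("bit position outside the frame")
    if bit == 1:
        return "G1"
    if bit <= 13:
        return "G2"
    if bit <= 34 + 8 * dlc:
        return "G3"
    return "G4"


def local_segment(bit, dlc):
    """Local error segment (L1..L3); the SOF bit belongs to none of them."""
    if 2 <= bit <= 5:
        return "L1"
    if 6 <= bit <= 34 + 8 * dlc:
        return "L2"
    if 35 + 8 * dlc <= bit <= 44 + 8 * dlc:
        return "L3"
    return None


def error_signal_bit(error_type, bit, dlc):
    """Bit time at which the error flag starts.

    A CRC error is signalled at the bit following the ACK delimiter,
    every other error type at the bit following its detection.
    """
    if error_type == "crc":
        ack_delimiter = 34 + 8 * dlc + 3   # CRC delimiter, ACK slot, ACK delimiter
        return ack_delimiter + 1
    return bit + 1
```

The value returned by error_signal_bit is only the starting point of the error signalling; how many further bit times are added to obtain the EFL depends on the error sequence length assumed in the model.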
Fig. 13. Frame segments for local error behaviour. [Figure: frame fields SOF | Arbit. | Ctrl. | Data | CRC | CRCDel | ACK | ACKDel | EOF with segments L1 to L3.]

This error classification has been conducted to represent the inherent CAN error handling on frame level, but also to consider the influence of error handling on the transmission timing behaviour. The most important property for the timing behaviour of a transmission is the length of an erroneous frame, that is, the exact number of bit times after which a transmission error has been detected, the frame is rejected and the bus state is reset to idle. Table 2 shows the bit time intervals of the above mentioned error detection segmentation for a standard CAN data frame. The error frame length can be computed depending on the type of the detected error. A CRC error can only be signalled after the acknowledgement delimiter (ACKDel). The other error types, bit monitoring, bit stuffing and frame coding errors, are signalled immediately. The error model for the frame based approach is realized in a distributed way. On the one hand, fault injection is performed by the Bus element within the Cluster ErrorCreation block (refer to Figure 7). This block relies on the sender's probability of node error (PoNE) attached to the sent frame, the Bus element's probability of transmission error (PoTE) and the global knowledge of the receivers' PoNEs to create an erroneous frame. On the other hand, the error handling is still being performed in the CAN nodes' MAC layer.

Table 2. Errors segment bit times

Segment  Bit time intervals
G1       1
G2       2..13
G3       14..34+8*DLC
G4       35+8*DLC..44+8*DLC
L1       2..5
L2       6..34+8*DLC
L3       35+8*DLC..44+8*DLC

7. CONCLUSIONS

This paper presented an approach to model the CAN protocol on a level abstracting from the actual bit representation. For reasons of simulation performance the data exchanged between the nodes is described at message level, which reduces a CAN frame to an atomic unit of information. Introducing means to abstract bit-based behavioural aspects, like error modeling and bit stuffing, allows transmission timing analysis with bit-level accuracy to be conducted on frame-level models. Specific model parameters concerning failure behaviour and error probabilities applying to concrete applications can be created by additional a-priori simulations. The developed modeling approach supports cluster design decisions for complex heterogeneous networked embedded systems in early development stages by enabling the detailed exploration, analysis and validation of timing and functional behaviour. Simulation based analysis can be used to explore system parameters adapted to the functionality and needed for the developed applications.

REFERENCES
Klöckner, J., Köhler, S., and Fengler, W. (2008). Model based design of networked embedded systems. In ICINCO-SPSMC, 253–259. INSTICC Press.
MLDesign Technologies Inc. (2007). MLDesigner Documentation, Version 2.7. http://www.mldesigner.com/.
Nolte, T., Hansson, H., Norström, C., and Punnekkat, S. (2001). Using bit-stuffing distributions in CAN analysis. IEEE Real-Time Embedded Systems Workshop.
Rauchaupt, L. (1994). Performance analysis of CAN based systems. In 1st international CAN Conference, volume 135, 7–9.
Robert Bosch GmbH (1991). CAN Specification V2.0.
The Ptolemy Project (2009). http://ptolemy.eecs.berkeley.edu/.