Available online at www.sciencedirect.com
Procedia Computer Science 10 (2012) 422 – 431
The 3rd International Conference on Ambient Systems, Networks and Technologies (ANT)
A State-Machine Model for Reliability Eliciting over Wireless Sensor and Actuator Networks José Cecílioa, Pedro Furtado University of Coimbra, Coimbra, Portugal
Abstract Advances in communications and embedded systems have led to the proliferation of wireless sensor and actuator networks (WSANs) in a wide variety of application domains. One important key of many such WSAN applications is the needed to meet non-functional requirements (e.g., lifetime, reliability, time guarantees) as well as functional ones (e.g. monitoring, actuation). Some application domains even require that sensor nodes be deployed in harsh environments (e.g., refineries), where they can fail due to communication interference, power problems or other issues. Unfortunately, the node failures can be catastrophic for critical or safety related systems. State machines can offer a promising approach to separate the two concerns – functional and non-functional – bringing forth reliability exception conditions handling, by means of fault handling states. We develop an approach that allows users to define and program typical applications using their platform language, but also adds state machine logic to design, view and handle explicitly other concerns such as reliability. The experimental section shows a working deployment of this concept in an industrial refinery setting.
© 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of [name organizer] Keywords: WSN; Programming model; State-Machine; Industrial applications.
1. Introduction Sensor network programmers have to deal with functional requirements – e.g. nodes have to collect sensed data with a certain period, other nodes relay information and yet other nodes have to actuate based on sensed values and computed values. Non-functional requirements, and in particular reliability and performance issues, are a cross-cutting concern, in the sense that they are very important but orthogonal to the functional processing requirements. Ensuring the reliability of sensor network applications is an important problem, and carefully taking into account reliability exception processing should be an a
Corresponding author. Tel.: +351 239 790 000; fax: +351 239 701 266. E-mail address:
[email protected].
1877-0509 © 2012 Published by Elsevier Ltd. doi:10.1016/j.procs.2012.06.055
José Cecílio and Pedro Furtado / Procedia Computer Science 10 (2012) 422 – 431
important part of sensor network programming paradigms, especially in industrial and other safety critical environments. It is important to be able to elicit reliability and performance issues in a way that it is easily specified and verifiable by programmers. We propose a framework that allows programmers to use their platform language (e.g. Contiki-C or NesC) for the normal functional processing, but then adds a state-machine model for specifying and clearly eliciting non-functional exception handling into the design and code. The framework generates platform-level code (which can be further modified at will by the programmer). This way the programmers have all the flexibility of the platform code language such as NesC, but the program is able to clearly separate the functional from non-functional concerns and programmers are able to see what exception conditions have been dealt with and to add or modify those in a simple way. Consider the following application scenario, which motivates the usefulness of the approach: A sensor network in some industrial setting has nodes sensing some physical processes and sending the samples into controller programmable logic controllers (PLCs). The sensor network is using a schedule-based media access control protocol (MAC) to deliver performance guarantees, but the MAC was designed so that it is possible to modify the slot assignment dynamically and there are spare slots within the epoch to be used optionally for retries. With our approach, after the functional code is developed to sense and send the messages, we setup a state machine as shown in figure 1 so that up to 2 extra message retry slots can be added or removed to/from a node if the node’s packet loss rate is above a desired reliability threshold value. The figure shows three states – a Monitoring state, the one implementing the functional sensing and data communication requirements – and two reliability-related states: add and remove retry slot states. The figure also specifies that a packet loss above 1% jumps to the add extra slot state, which will configure an extra slot (if there are less than 2 extra slots for the node) and always return to the previous state, and the remove extra slot state, which will do the inverse of returning extra slots t a free pool when the packet loss rate is below 0.5%. Always
Remove retry slot
packetLoss > 1% Add retry slot
Monitor
packetLoss < 0.5%
Always
Fig. 1. State machine model
This example shows clearly how our approach separates concerns and allows programmers to verify and be sure about their reliability handling functionality. If you would look into the Monitor state, you would see the typical sensing code (functional part), while the state-transition conditions on packet loss and the “Add Retry Slot” and “Remove Retry Slot” states elicit reliability issues handling. The rest of the paper is organized as follows: section 2 reviews the related work; section 3 describes the proposed approach and framework. It explains how the approach allows programmers to use their platform language coding together with state machines for separation of concerns. Section 4 shows an example of practical implementation and results obtained from deployment in a refinery industrial setting, where the state machine can be used to deal effectively with reliability concerns; and section 5 concludes the paper.
423
424
José Cecílio and Pedro Furtado / Procedia Computer Science 10 (2012) 422 – 431
2. Related Work Related work on this subject includes strategies to abstract low-level programming details based on state machines. Those works focus mainly on providing an approach for high-level event-handling that is based on state-machines, while we propose the use of state-machines to elicit reliability functionality in a typical programming environment for WSANs. A number of works have motivated our work and the hypothesis that state machines are good highlevel abstractions of event-driven sensor network programs. Kasten [1] and Kim [2] present the design of two models of state machines. They present a programming language that allows the programmer to directly implement sensor networks as finite state machines. In comparison to those works, our work does not focus on providing a different programming language for functional specification, but rather at adding state machines to the typical programming approach for handling non-functional - reliability - exception handling. Other works apply state-machines to perform validation of sensor network programs. For example, Lighthouse [3] uses state machine specifications in order to statically analyze dynamic memory usage in SOS [4] applications and Archer [5] use the state machines to represent correct memory usage specifications on TinyOS, also enforcing these specifications at run time. Another somewhat related issue is programming abstractions for WSNs. Mottola [7] provides a taxonomy of WSN programming abstractions, where some of the relevant characteristics concerning language are the communication and computation perspectives, programming idiom and distribution model. Macro-programming approaches include: SNACK [8], Tenet [9], Flask [10], semantic streams [11]. Some approaches target subsets of nodes depending on application-level information (e.g. Regiment [12] and Pieces [13]) and others offer a global view programming of the system (e.g. TinyDB [14]). Some are imperative approaches, which are programming solutions based on sequential or event-driven semantics (e.g. Abstract Regions [15] and Pleaides [16]), others are declarative, which are usually very concise in describing the system behaviour. This description can be based on database-style (e.g. TinyDB [14] and Cougar [17]) or rule-oriented approaches. Functional paradigms express application processing by applying one or more functions to data sensed in some part of the system (e.g. Regiment [7] and snBench [18]). If more than a single idiom is associated to address different aspects, those are Hybrid approaches such as presented in ATaG [19]. By putting together the flexibility of platform languages and the important separation of concerns given by mixing that with state machines for reliability concerns, our approach uses only the necessary amount of abstraction to elicit clearly the reliability issues, while maintaining the full power of platform languages. 3. The State Machine Approach for Separation of Concerns A state machine is a model used to design computer applications. It is composed of a finite number of states and transitions. A transition is a set of actions that starts from one state and ends in another (or the same) state. A transition is started by some trigger, which can be an event or a condition. State machines can be used in a multitude of very different problems and with very different objectives. The details of the approach vary significantly depending on those factors. Most frequently, it would not be a great advantage to be able to write a whole sensor network application by expressing every little detail as states and transitions, a typical platform programming language such as NesC is much better suited at that. However, the state machine abstraction is still very useful in that context for explicitly specifying the handling of some cross-cutting concerns in a manner that makes them quite explicit and easy to verify. This is the reason why we propose representing
José Cecílio and Pedro Furtado / Procedia Computer Science 10 (2012) 422 – 431
reliability exception handling in that manner, while the normal functional coding uses the traditional imperative language. 3.1. State Machine Elements and Model The main elements of our proposed approach are depicted in Figure 2 a). The state machine model is composed of states, transitions, global variables and global code. Global variables are variables that all states share, and include variables that are used in specifying transition conditions. Global code is code that runs outside states and is executed at the beginning of the application. For instance, network message received callbacks are typically global code. Users specify the states, each with a well-defined state name (ending by the “State” keyword) and transition condition expressions, which are regular “if” condition clauses of the platform language. A state is further defined by a setup block of code, a cycle block of code and local variables. The setup block is where the programmer puts the code that runs only once at the beginning in the state. Some states may have only a setup, in which case they run a piece of code and jump back to another state through a transition with a condition that is evaluated to true, or with a condition keyword “always” (always jump). The cycle block has an associated “cycle period” variable that is set by the programmer, and includes code that will run until some transition condition evaluated to true makes the execution jump to another state. With this simple approach, programmers specify any code they want to in their native platform language as they would do without the approach, and use the state machine approach (optionally) to organize the code into states and transitions, adding the reliability exception handling solution code. This state machine model runs sequentially in a single thread. It is, however, possible to run multiple state machines simultaneously in multiple threads, and they may share resources. Figure 2 b) shows an example of two state machines: for example, one state may be receiving actuation commands and includes a time fault handling state which uses a safety value for actuation when the actuation command does not arrive within a certain time (we describe this example in more detail later on); the other thread may be testing network connection, and if it fails the node tries to establish the connection through some other node.
Fig. 2.
a) Generic State Machine Model
b) Parallel state machines
425
426
José Cecílio and Pedro Furtado / Procedia Computer Science 10 (2012) 422 – 431
3.2. Code Generation Platform code is generated from the state machine filled with all relevant elements for an application. Figure 3 reveals the skeleton structure of part of the code that is generated, in this case for contiki-C. Each state will have a corresponding method with only two calls – setup and cycle – the setup method will get the programmer code that runs as soon as that state starts, and the cycle method consists mainly of a cycle that receives the code that should run in loop mode, and tests the state transition conditions to jump to another state if necessary. Finally, Figure 4 shows the main method of the application, which simply calls the appropriate states at each time. char nextState; // Put here your global state variables; /* your state A */ void state_A(){ setupA() cycleA();// states may have cycles or not } void setupA() { // one-time state start running code } void cycleA() { // cycle code, code that runs every cycle until state change or app end unsigned stateAcycleInterval = 10; while (nextState == A) { PROCESS_WAIT_EVENT(); if (etimer_expired(&et)){ // code goes here } if( /* jump to B condition goes here */) { et.kill(); nextCycle=B; } if( /* jump to C condition goes here */) { et.kill(); nextCycle=C; } etimer_set(&et, stateAcycleInterval); } }
Fig. 3. Generated Code – States, transitions and variables /*---------------------------------------------------------------------------*/ //Set up ABC static const struct abc_callbacks abc_call = {packet_recv}; static struct abc_conn abc; /*---------------------------------------------------------------------------*/ PROCESS_THREAD(envmon_process, ev, data) { static struct etimer et; PROCESS_BEGIN(); SENSORS_ACTIVATE(sht11_sensor); SENSORS_ACTIVATE(all_sensor); abc_open(&abc, 128, &abc_call); nextState=A; while (1) { if(nextState==A) state_A(); if(nextState==B) state_B(); …. } }
Fig. 4. Generated Code – Main method
José Cecílio and Pedro Furtado / Procedia Computer Science 10 (2012) 422 – 431
4. Experimental Results In this section, we describe our tests with a prototype of the state machine model we developed in a testbed with reliability and safety control concerns. For these tests we used the context of a wireless sensor networks with performance control guarantees, with a small testbed WSAN deployment in an oil refinery. In that context our proposal provided us an alternative approach to define applications where reliability and timing are important requirements. For instance, applications such as closed-loop with time guarantees requirements or data monitoring reliability are easily modelled by our proposal. The testbed is comprised of 16 hierarchically-organized nodes (figure 5), which have been deployed to monitor and act in physical variables of the refinery.
2
7
3
4
5
6
12
10
8
9
11
Data Messages
13
14
15
16
Actuation Messages
Fig. 5. Deployment structure
The WSAN setup uses TelosB motes running Contiki operating system, where the Rime stack is used to establish network communication. The control stations are connected to the WSN through a bridge, which allows collecting data and sending actuation commands to the nodes. A schedule based MAC protocol (GinMAC [6]) is used, where time slots are defined for upstream messages, downstream messages and control flow messages. These control flow slots are used to send specific information, such as time synchronization. There is also a slot pool to be allocated by nodes for retries if necessary. The testbed was configured to have leaf nodes monitoring physical variables, and an actuator node operating a valve based on commands that were determined in a PC control station based on sensed data. Each leaf node should get data from sensors every second. The data is relayed by intermediate nodes until it reaches the sink node. Figure 6 depicts the state machines designed for the leaf nodes with which we monitor refinery variables, evaluate and control reliability. Always
Remove retry slot
Add retry slot
Monitor
packetLoss < 0.5%
Fig. 6. Deployment structure
packetLoss > 1%
Always
427
428
José Cecílio and Pedro Furtado / Procedia Computer Science 10 (2012) 422 – 431
The global code of the model includes a pool of slots that can be allocated by each node to retransmit the data messages. The monitor state depicted in figure 6 will collect data from environment sensors and send the data to the sink node. At the same time, this state evaluates the packet loss. When the packet loss is higher than 1%, the state “Add retry slot” is activated and a new retry slot is added from a pool of free retry slots (if there is one available), for additional message retries by the link. This way it is expected that the packet loss of the node decreases. The “Add retry slot” state then returns control flow to the “Monitor” state. This process can be repeated if the packet loss remains too high, until a maximum number of two retries is reached. Meanwhile if packet losses decrease (below 0.5%) the state “Remove retry slot” is activated and, if there is any extra slot allocated to the node, this slot is removed. Figure 7 shows the pseudo-code of each state. State “Monitor”: SetupMonitor(){} CycleMonitor (){ While (1) { collect data from environment Build data message Send message to the sink node. Evaluate packet loss If packet_loss > 1% Activate state “add retry slot” If packet_loss < 0.5% Activate state “remove retry slot” } }
State “AddRetrySlot”:
State “RemoveRetrySlot”:
SetupAddRetrySlot(){ If free slot to add Add slot; Return; } cycleAddRetrySlot(){}
SetupRemoveRetrySlot(){ If any extra slot occupied remove slot; Return; } cycleRemoveRetrySlot(){}
Fig. 7. Pseudo-code
To assess the data reliability we ran two tests during 75 hours. Figure 8 shows the results of reliability when we compare the state machine with retry slot allocation and without allocation.
José Cecílio and Pedro Furtado / Procedia Computer Science 10 (2012) 422 – 431
Message delivery rate per node 100% 99%
Message delivery rate
98% 97% 96% 95% 94% End to End Messages Success Rate
93% 92%
End to End Messages Success Rate (with retries)
91% 90% 2
3
4
5
6
7
8
10
12
13
14
15
16
Node
Fig. 8. Reliability
From Figure 8, it is possible to see that the designed state machine has allowed nodes with lower reliability to improve their reliability, by incorporating retry slots. This way the system was able to provide high reliability for all nodes (all nodes have less than 0.5% of losses). In the same experimental setup we also attached an actuator to node 16 (figure 5), which opens and closes a valve based on a closed loop control decided in a PC control station. The PC-based control station should send a command to the actuation node every two seconds, based on a decision algorithm that operates over the sensed data. We developed a state machine for the actuator, illustrated in figure 9. The state machine involves periodically waiting for an actuation command. Upon receiving the actuation message, the node writes a value that comes in the message in the Digital-to-Analog Converter (DAC) controlling the valve, effectively closing or opening the valve to a certain degree based on the value. When an actuation command message is received, the application also updates a variable holding the timestamp of the last received command instant. The state transition condition evaluates whether the time between now and the instant of the last received command is greater than an actuationTimeout (3s was used as timeout). If the time elapsed was greater than a certain threshold, the state machine goes to a safety control state. This state writes a predefined safety value in the actuator register to position the valve in a safe position. When a command is received, if the current state is “Safety Control State”, the “BackToNormal” condition causes a return to the “Actuation State”. Actuation Timeout: Now()-lastCmdTS > actuationTimeout
Actuation State
Safety Control State
BackToNormal: msgWaiting == TRUE
Fig. 9. Actuation and Safety Control State Machine
With our modelling done, we then controlled the control station application that closes the loop with a specific rate to inject a timing fault – that is, the application would stop sending commands for a specific period of time - and collected performance logs in the control station to analyze the modifications (state switching). With these logs we have built timeline charts to show the results.
429
430
José Cecílio and Pedro Furtado / Procedia Computer Science 10 (2012) 422 – 431
Figure 10 illustrates a timing diagram showing the instant of submission of the fault, the instant when the node detected the fault and switched to safety control state. The fault had 1 minute of duration and the actuation timeout was defined as 3s. 5 Start fault
4,5 4 3,5
(b)
3 2,5
Safety control State
(a)
2
Actuation state
1,5 1 0,5 15:33:02
15:32:53
15:32:44
15:32:35
15:32:26
15:32:17
15:32:08
15:31:59
15:31:50
15:31:41
15:31:32
15:31:23
15:31:14
15:31:05
15:30:56
15:30:47
15:30:38
15:30:29
15:30:20
0
Time
Fig. 10. Fault injection timeline
In this experiment the green line (a) shows the value being written to the valve in the actuation state, while the red line (b) shows the value that was written during the safety control state. 5. Conclusions In this paper we proposed a state machine based approach to add reliability concerns to WSN applications with reliability and performance requirements. The idea is that it is beneficial to separate those non-functional concerns from the functional concerns, so that reliability and performance related requirements are clearly elicited through the state machine mechanism. On the other hand, it is also important to keep the platform language programming capability, which is why the approach allows users to program the applications in platform language and to add reliability and performance concerns elicitation through state machine behaviour at the same time. We have described the mechanism, code templates used by it and how platform code is generated. We have shown an example that prove the benefit of the approach and presented some results from a practical testbed deployment of it. REFERENCES
[1] O. Kasten and K. Romer, “Beyond event handlers: Programming wireless sensors with attributed state machines”. In Proc. of IPSN, 2005.
[2] T. Kim and S. Hong, “State machine based operating system architecture for wireless sensor networks”. In Proc. of Parallel and Distributed Computing: Applications and Technologies, 2004.
[3] R. Shea, S. Markstrum, T. Millstein, R. Majumdar, and M. B. Srivastava, “Static checking for dynamic resource management in sensor network systems”. Technical Report TRUCLA-NESL-200611-02, UCLA, 2006.
[4] C.-C. Han, “A dynamic operating system for sensor nodes”. In Proc. of MobiSys, 2005. [5] W. Archer, P. Levis, and J. Regehr, “Interface contracts for tinyOS”. In Proc. of IPSN, 2007. [6] P. Suriyachai, J. Brown, and U. Roedig, “Poster Abstract: A MAC Protocol for Industrial Process Automation and Control,” in In Proceedings of 7th European Conference on Wireless Sensor Networks (EWSN 10), Coimbra, Portugal, February 2010.
[7] Mottola L., “"Programming Wireless Sensor Networks: From Physical to Logical Neighborhoods".,” Politecnico di Milano (Italy)., 2008.
[8] B. Greenstein, E. Kohler, e D. Estrin, “A sensor network application construction kit (SNACK),” Proceedings of the 2nd international conference on Embedded networked sensor systems, Baltimore, MD, USA: ACM, 2004, pp. 69-80.
José Cecílio and Pedro Furtado / Procedia Computer Science 10 (2012) 422 – 431
[9] O. Gnawali, K. Jang, J. Paek, M. Vieira, R. Govindan, B. Greenstein, A. Joki, D. Estrin, e E. Kohler, “The Tenet architecture
[10] [11] [12] [13] [14] [15]
[16] [17] [18] [19]
for tiered sensor networks,” Proceedings of the 4th international conference on Embedded networked sensor systems, Boulder, Colorado, USA: ACM, 2006, pp. 153-166. G. Mainland, M. Welsh, e G. Morrisett, Flask: A Language for Data-driven Sensor Network Programs, 2006. K. Whitehouse, F. Zhao, e J. Liu, “Semantic streams: A framework for composable semantic interpretation of sensor data,” Lecture Notes in Computer Science, vol. 3868, 2006, p. 5. R. Newton, G. Morrisett, e M. Welsh, “The regiment macroprogramming system,” Proceedings of the 6th international conference on Information processing in sensor networks, Cambridge, Massachusetts, USA: ACM, 2007, pp. 489-498. J. Liu, M. Chu, J. Reich, e F. Zhao, “State-centric programming for sensor-actuator network systems,” Pervasive Computing, IEEE, vol. 2, 2003, pp. 50-62. S.R. Madden, M.J. Franklin, J.M. Hellerstein, e W. Hong, “TinyDB: an acquisitional query processing system for sensor networks,” ACM Trans. Database Syst., vol. 30, 2005, pp. 122-173. M. Welsh e G. Mainland, “Programming sensor networks using abstract regions,” Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1, San Francisco, California: USENIX Association, 2004, pp. 3-3. N. Kothari, R. Gummadi, T. Millstein, e R. Govindan, “Reliable and efficient programming abstractions for wireless sensor networks,” SIGPLAN Not., vol. 42, 2007, pp. 200-210. Y. Yao e J. Gehrke, “The cougar approach to in-network query processing in sensor networks,” SIGMOD Rec., vol. 31, 2002, pp. 9-18. M.J. Ocean, A. Bestavros, e A.J. Kfoury, “snBench,” Proceedings of the 2nd international conference on Virtual execution environments - VEE '06, Ottawa, Ontario, Canada: 2006, p. 89. A. Bakshi, V.K. Prasanna, J. Reich, e D. Larner, “The Abstract Task Graph: a methodology for architecture-independent programming of networked sensor systems,” Proceedings of the 2005 workshop on End-to-end, sense-and-respond systems, applications and services, Seattle, Washington: USENIX Association, 2005, pp. 19-24.
431