A new learning automata-based approach for maximizing network lifetime in wireless sensor networks with adjustable sensing ranges

Hosein Mohamadi (a), Shaharuddin Salleh (a,*), Mohd Norsyarizad Razali (b), Sara Marouf (a)

(a) Center for Industrial and Applied Mathematics, Universiti Teknologi Malaysia, 81310 Johor Bahru, Malaysia
(b) Faculty of Science, Universiti Teknologi Malaysia, 81310 Johor Bahru, Malaysia

Neurocomputing, http://dx.doi.org/10.1016/j.neucom.2014.11.056
Received 4 June 2014; revised 15 November 2014; accepted 21 November 2014

Abstract

Recently, several algorithms have been proposed to solve the problem of target coverage in wireless sensor networks (WSNs). A conventional assumption is that sensors have a single power level (i.e., a fixed sensing range); in real applications, however, sensors might have multiple power levels, which determine different sensing ranges and, consequently, different power consumptions. Accordingly, one of the most important problems in WSNs is to monitor all the targets in a specific area and, at the same time, maximize the network lifetime when sensors have multiple power levels. To solve this problem, this paper proposes a learning automata-based algorithm equipped with a pruning rule. The proposed algorithm attempts to select a set of sensor nodes with minimum energy consumption that monitors all the targets in the network. To investigate the efficiency of the proposed algorithm, several simulations were conducted, and the obtained results were compared with those of two greedy-based algorithms. The results showed that, compared to the greedy-based algorithms, the proposed learning automata-based algorithm was more successful in prolonging the network lifetime and constructing a higher number of cover sets.

Keywords: Wireless sensor networks, Cover set formation, Learning automata

* Corresponding author. E-mail address: [email protected] (Shaharuddin Salleh).

1. Introduction

In recent years, wireless sensor networks (WSNs) have been widely used in many applications such as national security, surveillance, and health care [2]. A WSN is composed of a number of low-power, low-cost sensor nodes equipped with components for sensing and processing data, as well as for communicating with other sensor nodes [2]. One of the most challenging issues in WSNs is coverage, which concerns how well the sensors cover the monitored region. In general, the coverage problem can be classified into two main types: area coverage and target coverage. Area coverage addresses the problem of covering every point within the monitored area, whereas target coverage refers to covering only a set of fixed or moving targets within the sensor field [3].

This study addresses the problem of target coverage in cases in which sensors have multiple power levels, with the aim of extending the network lifetime (i.e., the amount of time during which the monitoring activity can be performed). This problem, known as Maximum Network Lifetime with Adjustable Ranges (MNLAR) [1], is of great importance because the energy of sensors is limited and their batteries cannot be easily recharged, especially in harsh environments. To overcome this problem, power-saving mechanisms can be used to optimize sensor energy consumption. In general, these mechanisms take the form of two different techniques: (i) scheduling the state of sensor nodes and (ii) adjusting the sensing range of sensor nodes. In the scheduling technique, an appropriate state (either active or passive) is chosen for each sensor node in order to save the limited energy of sensors during the network operation. In the adjusting technique, the most appropriate sensing range of each active sensor is chosen for monitoring the targets in such a way that as much energy as possible is saved.

To solve the target coverage problem and prolong the network lifetime as far as possible, this paper makes use of the advantages of both techniques. In other words, a scheduling algorithm is designed in which only some of the sensors are active at any given time, while the other sensors are switched to the sleep state. Additionally, the algorithm attempts to find a minimum sensing range for the active sensors that meets the target coverage requirement.

To demonstrate the efficiency of the scheduling technique, an example network composed of three sensors and three targets is given in Fig. 1. Let S = {s1, s2, s3} signify the set of sensors and T = {t1, t2, t3} the set of targets. In this network, each sensor node has a single power level and can monitor a target if the target is positioned within its sensing range. For instance, sensor s3 can monitor targets t1 and t2.

In this network, the possible cover sets are {s1, s2}, {s1, s3}, and {s2, s3}. A cover set is a subset of sensors that monitors all the targets. The classical assumption is that each sensor can be active for 1 unit of time. By activating one of the above-mentioned cover sets, e.g., {s1, s2}, for the whole battery life of its sensors, all the targets can be monitored for 1 unit of time. The network lifetime cannot then be further extended, since only s3 has residual lifetime and it cannot monitor all the targets by itself. In contrast, by activating each of the three cover sets {s1, s2}, {s1, s3}, and {s2, s3} for 0.5 units of time, the network lifetime reaches 1.5 units of time. As this example shows, the scheduling technique can successfully extend the network lifetime.


Figure 1: Example network with three sensors and three targets.

To show the efficiency of the adjusting technique, an example network consisting of three sensors, three targets, and two power levels is depicted in Fig. 2. Figures 2.A and 2.B display the network when the sensors have a single power level and multiple power levels, respectively. In this paper, (si, a) refers to sensor si when activated at level a. Assume that the batteries can keep the sensors active for 1 unit of time at power level 1 and 0.5 units at power level 2. If a single power level is taken into account, there is only one cover set (i.e., {(s1, 1), (s3, 1)}) and the total network lifetime is equal to 1. With multiple power levels, by contrast, there are more cover sets and the network lifetime is equal to 1.5 (for example, when {(s1, 2), (s2, 2)}, {(s3, 2), (s2, 2)}, and {(s1, 2), (s3, 2)} are each activated for 0.5 units of time). In fact, the selection of sensors with appropriate sensing ranges can considerably prolong the network lifetime.



Figure 2: Example network with three sensors, three targets and two power levels.

A number of studies have recently been conducted to solve the MNLAR problem; however, learning automata (LA), which are modern heuristic methods, have not received adequate attention. This paper proposes an LA-based scheduling algorithm to find a near-optimal solution to the MNLAR problem. In the proposed algorithm, network operation is divided into a number of rounds, and the outcome of each round is a cover set. To generate a cover set, the algorithm explores a possible cover set of the network. The cover set is rewarded if the amount of energy consumed by the adjusted sensors in the cover set is less than that of the best cover set found so far. As the proposed algorithm proceeds, the automata learn how to choose the best actions (the adjusted sensors with appropriate sensing ranges) so as to find an optimal cover set of the network among all cover sets. The process of generating cover sets continues as long as the remaining sensors can fully cover all the targets. The performance of the proposed algorithm was evaluated through several experiments in terms of the network lifetime, and the obtained results were compared with those of two greedy-based algorithms. The results demonstrated that the proposed algorithm was more successful in extending the network lifetime and in increasing the number of constructed cover sets.

In general, this paper makes the following contributions: (1) designing an LA-based algorithm to solve the MNLAR problem; (2) proposing a pruning rule to improve the performance of the proposed algorithm; and (3) evaluating the performance of the algorithm through several experiments.

The remainder of this article is organized as follows. In Section 2, related studies on prolonging the network lifetime are presented. In Section 3, the MNLAR problem is presented. In Section 4, LA and variable action-set LA are

introduced. In Section 5, a new scheduling algorithm is proposed for solving the problem. In Section 6, the performance of the proposed algorithm is evaluated through simulation experiments. Finally, Section 7 concludes the paper.

2. Related work

One of the most important challenges in WSNs is increasing the network lifetime. Energy efficiency is thus a central issue in WSNs, where the batteries of sensors cannot be changed or recharged. Recently, most studies on extending the network lifetime have focused on the management of energy consumption. One of the methods commonly used for increasing the network lifetime is scheduling the nodes' activity. In this method, the network operation falls into several rounds. In each round, the sensors of one cover set are activated to perform the network operation, and the other sensors are switched to the inactive state to save energy. This strategy makes an important contribution to prolonging the network lifetime for two reasons. First, inactive sensors consume a negligible amount of energy. Second, if a sensor is frequently switched between the active and inactive states, its battery can last for a longer time. The scheduling technique is applied in cases where sensors are deployed redundantly. The adjusting technique enables sensors to save energy when they are covering targets positioned in their vicinity, since the power consumption of a sensor depends on the distance between the sensor and its targets. This technique is applied to networks where sensors have multiple sensing ranges [23]. This paper attempts to find a solution to the target coverage problem and, simultaneously, extend the network lifetime by means of both power-saving techniques (i.e., scheduling sensors and adjusting sensing ranges). In the following, some outstanding studies are presented.

In the literature, several studies can be found that have used the scheduling technique for solving the target coverage problem in WSNs (see [5, 6, 7, 8, 9, 10, 11, 12]). Cardei and Du [5] first addressed the problem of target coverage and proved its NP-completeness. They modeled the problem as disjoint cover sets, each of which could monitor all the targets. In [6], non-disjoint cover sets were introduced, in which each sensor could take part in more than one cover set. The authors demonstrated that non-disjoint cover sets make a positive contribution to prolonging the network lifetime. In [7], two greedy algorithms were proposed to solve the target coverage problem by maximizing the number of cover sets through the management of critical targets. Another method to solve the target coverage problem, proposed in [8], was to apply the optimization capability of

memetic algorithms. In [9], the authors proposed an iterative approximation based on Lagrangean relaxation and subgradient optimization to solve the problem and prolong the network lifetime. LA have also been employed to solve the target coverage problem (e.g., [10, 11, 12, 24]). The above-mentioned studies attempted to efficiently solve the target coverage problem in networks where sensors have a single power level.

On the other hand, the literature contains a number of studies that have combined both power-saving techniques in order to solve the target coverage problem in WSNs (see [4, 14, 25, 26, 27]). In [4], the authors modeled the target coverage problem mathematically as a linear program with an exponential number of variables, which they solved using an approximation algorithm. In [14], a different model was introduced for adjusting the sensing range of sensors, in which each sensor could smoothly adjust its sensing range between 0 and a maximum value. In [25], two coverage algorithms were introduced based on an adjustable model in which sensors could select from among three sensing ranges (i.e., maximum, medium, and small). The authors attempted to decrease as much as possible the overlapped area between sensor nodes in order to save energy, hence maximizing the network lifetime. In [26], a number of heuristics, including greedy formulations and Steiner-tree- and Voronoi-based approaches, were proposed to select a connected cover set with minimum energy consumption in networks in which sensors can adjust their sensing range and transmission radius. The authors demonstrated the superiority of the Voronoi-based approach over the others. In [27], the problem of finding the maximum number of sensor cover sets was investigated. To this end, the authors proposed a linear programming-based formulation, a linear programming-based heuristic, and greedy formulations. They used a sensing model wherein sensors could select their sensing range from among a set of predefined values. In [20], two versions of the problem were addressed: in the first, sensing ranges were assumed to be continuously adjustable, while in the second, sensors had to select their sensing range from a set of predefined values that was the same for all sensors. For solving these problems, an exact approach based on a column generation algorithm was proposed, in which a genetic algorithm was used to reduce the computation time. In [1], the assumption was that each sensor could be activated at a certain number of alternative power levels, corresponding to different sensing ranges and power consumptions. To solve the problem, the authors proposed a greedy heuristic, a local search procedure, and an exact method based on the column generation technique.

In [20, 1], several experiments were conducted to compare the effect of adjustable sensing ranges and the classical single-range scheme on extending the network lifetime. The obtained results showed that the use of adjustable sensing ranges was superior to the classical single-range scheme.

3. The Problem of Maximum Network Lifetime with Adjustable Ranges

This section explains the problem of maximizing the network lifetime in WSNs with adjustable sensing ranges, known as MNLAR [1]. Table 1 presents the notations used in this paper.

Table 1: Notations

Notation  | Meaning
N         | Number of sensors
M         | Number of targets
a         | Number of alternative power levels, a = 1, 2, ...
sn        | A sensor, for all n = 1, 2, ..., N
tm        | A target, for all m = 1, 2, ..., M
ln        | Lifetime of sensor sn
S         | Set of sensors, S = {s1, s2, ..., sN}
T         | Set of targets, T = {t1, t2, ..., tM}
(sn, a)   | Sensor sn activated at level a; this pair is also called an adjusted sensor
T(sn, a)  | Set of all targets covered by sensor sn when it is set at level a

The problem can be described as follows. Assume that several targets with known locations are deployed within an area of interest. In addition, a number of non-rechargeable sensors with adjustable sensing ranges are distributed near the targets to monitor them continuously. Each sensor can monitor all the targets located within its sensing range, has a limited amount of energy, and can be activated at a finite number of alternative power levels (sensing ranges). Each active sensor consumes energy depending on the size of its sensing range [1], whereas inactive sensors do not monitor any target and hence consume a negligible amount of energy. The sensors are homogeneous in both their initial battery power and their energy consumption at each sensing range.

As the power levels gradually extend the sensing ranges of the sensors, for each sensor sn and each level a > 1, we have T(sn, b) ⊆ T(sn, a) ∀ b ∈ {1, ..., a − 1}. Moreover, the adjusted sensor (sn, a) is defined as a minimal adjusted sensor for target tm if tm ∈ T(sn, a) and either a = 1 or tm ∉ T(sn, b) ∀ b ∈ {1, ..., a − 1}.

In this paper, to model different battery consumptions, a positive parameter ∆a is assigned to each power level a [1]. This parameter denotes the ratio between the battery consumption at level a and at level 1 (the least powerful and hence cheapest level). For instance, ∆a = 2 indicates that the energy consumption at level a is twice that at level 1 (obviously, ∆1 = 1). Additionally, the total battery power is normalized to the energy consumption of level 1, meaning that the battery of a sensor keeps it active for 1 time unit if it is always set at level 1.

Problem: Organize the sensors into several cover sets in such a way that each cover set can monitor all the targets and, at the same time, the network lifetime is maximized. In this paper, organizing the sensors refers to specifying the mode of each sensor as either active or passive and finding the most appropriate sensing range of each active sensor.
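To make these definitions concrete, the following Python sketch (our own, not part of the paper; all names are illustrative) encodes the nested coverage sets T(sn, a), the minimal adjusted sensor for a target, and the ∆a battery model.

# Illustrative model of Section 3 (our sketch; names are hypothetical).
# coverage[a] is T(sn, a) for one sensor sn, with T(sn, 1) ⊆ T(sn, 2) ⊆ ...

def minimal_adjusted_level(coverage, target):
    """Smallest level a such that target is in T(sn, a), i.e., the level at
    which (sn, a) is a minimal adjusted sensor for the target; None if the
    sensor never covers the target."""
    for a in sorted(coverage):
        if target in coverage[a]:
            return a
    return None

# Battery model: delta[a] is the consumption ratio of level a w.r.t. level 1.
delta = {1: 1.0, 2: 2.0}  # e.g., level 2 drains the battery twice as fast
# With the battery normalized to 1 time unit at level 1, a sensor kept
# permanently at level a lasts 1 / delta[a] time units (0.5 at level 2 here).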

4. Learning automata and variable action-set learning automata

4.1. Learning Automata

A learning automaton is an adaptive decision-making unit that improves its own performance by learning how to select the optimal action from among a finite set of actions [15]. To achieve this goal, the learning automaton continuously interacts with a random operating environment. Its operation starts when the automaton chooses one of its available actions according to its action probability vector. The chosen action is then applied as an input to the random environment. Afterwards, the environment assesses the action and informs the automaton of the result via a reinforcement signal in the form of a reward or penalty. Accordingly, the automaton updates its action probability vector and selects the next action. The environment is represented by a triple E = (α, β, c), where α = {α1, α2, ..., αr} signifies the set of inputs (actions), β = {β1, β2, ..., βm} is the set of outputs, and c = {c1, c2, ..., cr} is the set of penalty probabilities (measured by the reaction of the environment); each element of c is associated with an element of α. Depending on whether the penalty probabilities are constant or variable, the environment is categorized as stationary or non-stationary, respectively. Based on the nature of the reinforcement signal β, the environment can be classified as P-model, Q-model, or S-model. P-model refers to an environment wherein the reinforcement signal can take only the two binary values 0 and 1. Q-model is an environment in which the reinforcement signal takes a finite number of values in the interval [0, 1]. Finally, in an S-model environment, the reinforcement signal is a continuous random variable that assumes values in an interval [a, b].

LA are categorized into two main groups [15]: fixed-structure LA and variable-structure LA. Here, the latter is explained in detail. A variable-structure learning automaton is represented by a triple (β, α, L), where β signifies the set of inputs, α represents the set of actions, and L is the learning algorithm, a recurrence relation used for modifying the action probability vector. Let αi(k) ∈ α represent the action selected by the learning automaton and p(k) the probability vector defined over the action set at instant k. Let a denote the reward parameter, which determines the amount of increase of the action probabilities; let b denote the penalty parameter, which determines the amount of their decrease; and let r denote the number of actions the automaton can take. At each instant k, if the chosen action αi(k) is rewarded by the random environment, the action probability vector p(k) is updated according to Eq. (1); if the action is penalized, the update is performed by Eq. (2). If a = b, the recurrences (1) and (2) are called the linear reward-penalty (LR−P) method; if a ≫ b, they are called the linear reward-ε-penalty (LR−εP) method; and if b = 0, they are called the linear reward-inaction (LR−I) method. In the last case, once the chosen action is penalized by the environment, the action probability vector remains unchanged.

p_j(k + 1) = p_j(k) + a[1 − p_j(k)]   if j = i
p_j(k + 1) = (1 − a) p_j(k)           ∀ j ≠ i        (1)

p_j(k + 1) = (1 − b) p_j(k)                     if j = i
p_j(k + 1) = b/(r − 1) + (1 − b) p_j(k)         ∀ j ≠ i        (2)
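As an illustration, the reward and penalty updates of Eqs. (1) and (2) can be coded as follows (a minimal Python sketch of ours, not from the paper; setting b = 0 yields the LR−I scheme employed later in this paper).

def update_probabilities(p, i, rewarded, a=0.1, b=0.0):
    """Update the action probability vector p after action i was chosen.
    rewarded=True applies Eq. (1); rewarded=False applies Eq. (2).
    With b = 0, a penalty leaves p unchanged (the LR-I scheme)."""
    r = len(p)
    if rewarded:  # Eq. (1)
        return [pj + a * (1 - pj) if j == i else (1 - a) * pj
                for j, pj in enumerate(p)]
    # Eq. (2)
    return [(1 - b) * pj if j == i else b / (r - 1) + (1 - b) * pj
            for j, pj in enumerate(p)]

Both branches preserve the property that the probabilities sum to one.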

LA are an appropriate optimization tool in cases such as (i) complex and dynamic environments containing a large amount of uncertainty, (ii) situations where there is insufficient information about the environment, and (iii) large-scale problems categorized as NP-complete (see [21, 22]). WSNs are one of the most significant examples in which all of the aforementioned cases can be found. Recently, several LA-based scheduling algorithms

have been proposed for solving the problem of target coverage in sensor networks [17, 18, 19, 28].

4.2. Variable Action-Set Learning Automata

In a variable action-set learning automaton, the number of available actions at each instant is variable. In [16], it has been shown that when the reinforcement scheme is LR−I, a variable action-set learning automaton is absolutely expedient and also ε-optimal. Such an automaton has a finite set of n actions, α = {α1, α2, ..., αn}. A = {A1, A2, ..., Am} represents the set of action subsets, and A(k) ⊆ α stands for the subset of actions that the learning automaton can choose at instant k. An external agency randomly selects the particular action subset according to the probability distribution ψ(k) = {ψ1(k), ψ2(k), ..., ψm(k)}, where ψi(k) = prob[A(k) = Ai | Ai ∈ A, 1 ≤ i ≤ 2^n − 1]. Moreover, p̂i(k) = prob[α(k) = αi | A(k), αi ∈ A(k)] represents the probability of choosing action αi given that the action subset A(k) has been selected and αi ∈ A(k). The scaled probability p̂i(k) is defined as

p̂i(k) = pi(k) / K(k)        (3)

where K(k) = Σ_{αi ∈ A(k)} pi(k) is the sum of the probabilities of the actions in subset A(k), and pi(k) = prob[α(k) = αi].

In a variable action-set learning automaton, the process of selecting an action and updating its probability can be described as follows. Let A(k) denote the action subset chosen at instant k. Before an action is selected, the probabilities of all the actions in the selected subset are scaled using Eq. (3). Then, according to the scaled action probability vector p̂(k), the automaton randomly selects one of its available actions. Based on the response from the environment, the scaled action probability vector of the learning automaton is updated; note that in this step, only the probabilities of the available actions are updated. Finally, the probability vector of the actions contained in the chosen subset is re-scaled using pi(k + 1) = p̂i(k + 1) · K(k), for all αi ∈ A(k). A proof of the absolute expediency and ε-optimality of the above-described method can be found in [16].
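The scale/select/update/re-scale cycle can be sketched as follows (our illustration, reusing update_probabilities from the sketch in Section 4.1; the helper names are ours).

import random

def variable_action_set_step(p, available, was_rewarded, a=0.1):
    """One step of a variable action-set LA under LR-I.
    p: full action probability vector; available: indices forming A(k);
    was_rewarded: function mapping the chosen index to True/False."""
    K = sum(p[i] for i in available)               # K(k) in Eq. (3)
    p_hat = [p[i] / K for i in available]          # scaled probabilities
    c = random.choices(range(len(available)), weights=p_hat)[0]
    chosen = available[c]
    p_hat = update_probabilities(p_hat, c, was_rewarded(chosen), a=a, b=0.0)
    for idx, i in enumerate(available):            # re-scale: pi = p̂i · K(k)
        p[i] = p_hat[idx] * K
    return chosen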

5. Proposed Algorithm

In this section, we propose a new scheduling algorithm to solve the MNLAR problem; its pseudocode is shown in Algorithm 1. In the proposed algorithm, the network operation falls into several rounds (the number of rounds depends on the sensor selection strategy), and the output of each round is a cover set that is able to monitor all targets. In the following, we explain this pseudocode and introduce some notations used in this paper. Line 1 contains the input parameters. Parameter ln, initialized in lines 2-4, represents the residual lifetime of sensor sn. Set SOL, initialized in line 5, contains the cover sets together with their activation times. Parameter LF, initialized in line 6, holds the network lifetime. Set SOL and the value of LF are returned as the solution and the maximum network lifetime, respectively. Line 7 checks whether the available sensors can still monitor all the targets and generate a new cover set.

Algorithm 1 Scheduling Algorithm
01. Input: Wireless sensor network Net = (S, T), number of power levels a
02. for each sn ∈ S do
03.     ln ← 1
04. end for
05. SOL ← ∅
06. LF ← 0
07. while ∪_{sn ∈ S} T(sn, a) ≡ T do
08.     Ccur ← call procedure CSF(S, T, a) to construct a cover set
09.     WTl ← max feasible activation time for Ccur
10.     LF ← LF + WTl
11.     for each (sn, a) ∈ Ccur do
12.         ln ← ln − (∆a · WTl)
13.         if ln = 0 then
14.             S ← S − {sn}
15.         end if
16.     end for
17.     SOL ← SOL ∪ {(Ccur, WTl)}
18. end while
19. return (SOL, LF)

To generate a cover set at each round, we propose an LA-based algorithm (see Subsection 5.1). The proposed algorithm iteratively finds different possible cover sets of the network and, depending on the response received from the environment, the cover sets are penalized or rewarded. Eventually, the proposed algorithm learns how to find the cover set with the highest probability among all of the cover sets. Following the construction of a cover set, an appropriate activation time within a given upper bound is assigned to the cover set to keep the solution feasible, and this activation time is added to the total network lifetime. In line 10, the network lifetime is updated. Next, the residual energy of the sensors in the cover set is updated based on the activation time, and sensors with no residual energy are eliminated from the set of available sensors (lines 11-16). To compute the activation time, we consider the adjusted sensor of Ccur that minimizes ln/∆a; let us call it (sh, b). We set WTl = l_sh/∆b; this ensures a feasible activation time for each (sn, a) ∈ Ccur. Line 14 updates the list of available sensors in set S. In line 17, the newly constructed cover set and its activation time are added to the solution. The process of cover set formation continues until the remaining sensors can no longer cover all the targets. Finally, as the output of the algorithm, the constructed cover sets and their activation times are returned.
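A minimal sketch of this activation-time rule and the residual-energy update (our code; the data structures are illustrative):

def max_feasible_activation_time(cover_set, lifetime, delta):
    """WTl = min over (sn, a) in Ccur of ln / delta[a]; the bottleneck adjusted
    sensor (sh, b) determines how long the whole cover set can stay active."""
    return min(lifetime[n] / delta[a] for (n, a) in cover_set)

def apply_round(cover_set, lifetime, delta):
    """Drain each selected sensor by delta[a] * WTl and report exhausted ones."""
    wt = max_feasible_activation_time(cover_set, lifetime, delta)
    for (n, a) in cover_set:
        lifetime[n] -= delta[a] * wt
    exhausted = {n for (n, a) in cover_set if lifetime[n] <= 1e-12}
    return wt, exhausted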

5.1. Cover Set Formation

In this section, we propose an LA-based algorithm that generates cover sets containing sensors with appropriate sensing ranges. The algorithm is composed of two phases: (i) initialization and (ii) sensing node selection. In the former, a network of LA is constructed, and then the action-set and action probability vector of each learning automaton are configured. In the latter, the LA attempt to select a subset of sensors with appropriate sensing ranges to construct an optimal cover set. The following subsections describe these phases in detail.

5.1.1. Initialization

In the initialization phase, a network of LA is constructed and their action-sets and action probability vectors are adjusted. In the proposed algorithm, the network of LA is formed by equipping each target in the network with a learning automaton. The network of LA can be modeled by a pair ⟨A, α⟩, where A = {Am | ∀ tm ∈ T} represents the set of LA corresponding to the targets in the network, and α = {αm | ∀ Am ∈ A} denotes the set of action-sets of the LA, in which αm = {αm^1, αm^2, ..., αm^{rm}} defines the set of actions that learning automaton Am is able to select (for each αm ∈ α), and rm is the cardinality of action-set αm, which depends on the number of minimal adjusted sensors that monitor the target corresponding to the learning automaton. In this algorithm, LA with a variable number of actions are used, and a pruning rule is proposed to prune the action-sets of the LA. This is because, with fixed action-set LA, redundant sensors and/or more than one sensing range of the same sensor might be chosen. In the algorithm, an automaton can be in either the active or the passive state (initially, all automata are set to the passive state).

After generating the network of LA, their action-sets and action probability vectors should be configured. Each learning automaton forms its action-set by assigning an action to each minimal adjusted sensor that covers the target corresponding to the learning automaton. Let αm = {αm^j | minimal adjusted sensor (sn, a) covers target tm}; action αm^j corresponds to the selection of minimal adjusted sensor (sn, a) (as an active sensor) by learning automaton Am. Having formed the action-sets of the LA, the algorithm should adjust their action probability vectors. Let p = {pm | ∀ αm ∈ α} signify the set of action probability vectors, and let pm = {pm^j | ∀ αm^j ∈ αm} represent the action probability vector of learning automaton Am, where pm^j corresponds to the choice probability of action αm^j. To improve the convergence speed of the proposed algorithm, the action probability vector of automaton Am should be configured in such a way that adjusted sensors with high covering power are more likely to be selected. To this end, the proposed algorithm initially configures the action probability vector of learning automaton Am using Eq. (4):

pm^j(k) = CP(sn, a) / Σ CP(sn, a)        (4)

where CP(sn, a) denotes the covering power of adjusted sensor (sn, a), and Σ CP(sn, a) signifies the total covering power of all adjusted sensors that monitor target tm. The covering power of adjusted sensor (sn, a) is evaluated using Eq. (5):

CP(sn, a) = |T(sn, a) ∩ Tcur| / ∆a        (5)

Here, |T(sn, a) ∩ Tcur| represents the number of uncovered targets that can be monitored by the adjusted sensor (sn, a), and ∆a denotes the consumption ratio. Eq. (5) gives a higher score to sensors with relevant covering capabilities, while it gives a lower score to sensors with high power levels that do not bring considerable improvements.
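Eqs. (4) and (5) translate directly into the following sketch (ours; actions is an illustrative list of (covered_targets, delta_a) pairs, one per minimal adjusted sensor covering the automaton's target):

def covering_power(covered, uncovered, delta_a):
    """Eq. (5): still-uncovered targets reached by the adjusted sensor,
    discounted by its consumption ratio delta_a."""
    return len(covered & uncovered) / delta_a

def initial_probabilities(actions, uncovered):
    """Eq. (4): normalize the covering powers into a probability vector."""
    cps = [covering_power(cov, uncovered, d) for (cov, d) in actions]
    total = sum(cps)
    return [cp / total for cp in cps]

For automaton A3 in the Fig. 3 example below, this reproduces the vector {0.25, 0.5, 0.25} computed in the text.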



Figure 3: Example network with three sensors, three targets, and two power levels.

For clarification, consider the example network depicted in Figure 3, in which three sensors and three targets are deployed. As mentioned earlier, each learning automaton forms its action-set by assigning an action to each minimal adjusted sensor that covers the target corresponding to that learning automaton. For example, learning automaton A3, which is assigned to target t3, has three actions, because target t3 is covered by three minimal adjusted sensors ({(s1, 2), (s2, 1), (s3, 2)}). Assume that ∆2 = 2 (note that, by definition, ∆1 = 1). The covering powers of these adjusted sensors are as follows: CP(s1, 2) = 2/2 = 1, CP(s2, 1) = 2/1 = 2, and CP(s3, 2) = 2/2 = 1. Here, the total covering power Σ CP(sn, a) is 4. Therefore, the initial action probability vector of automaton A3 is p3(1) = {p3^s1 = 0.25, p3^s2 = 0.5, p3^s3 = 0.25}. It can be seen that the sensor with the higher covering power is more likely to be chosen.

Note that the action-set and action probability vector of an automaton may change over time. The change occurs in two conditions: (i) when the energy of a sensor runs out and (ii) when the action-set of a learning automaton is pruned. For example, if adjusted sensor (sn, a) becomes disabled at stage k + 1, the action-set of learning automaton Am is updated by removing the action corresponding to that adjusted sensor. The choice probability of the removed action (αm^j) is set to zero, and that of every other action (αm^{j′}) is updated as follows:

pm^{j′}(k + 1) = pm^{j′}(k) · [1 + pm^j(k) / (1 − pm^j(k))]    ∀ j′ ≠ j        (6)
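Eq. (6) is equivalent to zeroing the removed action and renormalizing the remaining probabilities, as the following sketch (ours) shows:

def remove_action(p, j):
    """Eq. (6): set p[j] to zero and scale every other entry by
    [1 + p[j] / (1 - p[j])], i.e., by 1 / (1 - p[j])."""
    pj = p[j]
    return [0.0 if i == j else pi * (1 + pj / (1 - pj))
            for i, pi in enumerate(p)]

# remove_action([0.2, 0.4, 0.4], 0) -> [0.0, 0.5, 0.5], matching the example below.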

Consider the example network depicted in Fig. 3, and assume that sensor node s1 becomes unavailable at stage k. Assume also that the probability vector of automaton A3 at stage k is p3(k) = {p3^s1 = 0.2, p3^s2 = 0.4, p3^s3 = 0.4}. According to Eq. (6), the probability vector of automaton A3 at stage k + 1 (after sensor node s1 runs down) is p3(k + 1) = {p3^s1 = 0, p3^s2 = 0.5, p3^s3 = 0.5}. Up to this step, a network of LA has been constructed, and the action-sets and action probability vectors of the LA have been configured. Next, the sensing node selection phase is explained.

5.1.2. Sensing Node Selection

After the action-sets and action probability vectors of the LA have been configured, the proposed algorithm selects a subset of appropriate adjusted sensors to form a cover set. The algorithm is composed of a number of stages, at each of which a subset of adjusted sensors is selected. These adjusted sensors are chosen by the LA through a process leading to the formation of a cover set, during which the actions that may cause redundancy are disabled. Afterwards, the optimality of the constructed cover set is evaluated by a random environment; here, the WSN plays the role of the random environment for the LA. The environment computes the total energy consumed by the sensors in the constructed cover set and produces a response indicating its optimality. Depending on this response, the selected actions are either rewarded or penalized. The iterative process of constructing a cover set and updating the action probability vectors continues until a cover set with minimum energy consumption is formed. The pseudocode of the proposed algorithm is shown in Algorithm 2.

The k-th stage of the algorithm is described as follows. Sets Tcur and Ccur keep the list of uncovered targets and the list of adjusted sensors selected so far, respectively. Set Dcur keeps the list of dominated automata. Once the process of cover set formation starts, the most critical passive automaton is chosen and marked as active. The automaton then prunes its action-set using the pruning rule and selects one of its actions, corresponding to an adjusted sensor that covers the most critical target. Next, the adjusted sensor is added to the cover set (i.e., Ccur). After that, the list of uncovered targets is updated by removing the targets covered by the selected adjusted sensor. Then, the activated automaton and the passive automata corresponding to the targets covered by the selected adjusted sensor are added to the list of dominated automata (i.e., Dcur). This is done to avoid the activation of these automata and the selection of redundant adjusted sensors in later stages: once an adjusted sensor covering more than one target is selected, the automata corresponding to those targets are dominated, which helps the algorithm activate only automata whose actions cover new targets. The process of activating a passive automaton and selecting an action continues until all the targets are covered.

Algorithm 2 Cover Set Formation
01. Input: Wireless sensor network
02. Output: A cover set with appropriate adjusted sensors
03. Assumption:
04.     Assign an automaton to each target
05.     Let αm denote the action-set of automaton Am
06. begin
07.     Let Tk denote the dynamic threshold at stage k
08.     Let k denote the stage number, initially set to zero
09.     repeat
10.         Tcur ← T
11.         Ccur ← ∅
12.         Dcur ← ∅
13.         while Tcur ≢ ∅ do
14.             Find a critical passive automaton and activate it (call it Am)
15.             Automaton Am prunes its action-set and chooses one of its actions (say (sn, a))
16.             Add (sn, a) corresponding to the selected action to Ccur
17.             Update the list of uncovered targets (i.e., Tcur)
18.             Add Am and the automata related to the targets covered by (sn, a) to Dcur
19.         end while
20.         for each (sn, a) ∈ Ccur do
21.             SumE ← SumE + (a ∗ (EI/P))
22.         end for
23.         if SumE ≤ Tk then
24.             Reward the chosen actions of the activated automata
25.             Tk ← SumE
26.         else
27.             Penalize the chosen actions of the activated automata
28.         end if
29.         k ← k + 1
30.     until (stopping conditions = true)
31. end algorithm
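The stage loop of Algorithm 2 can be sketched as follows (our simplification, not the paper's exact procedure: a uniform random pick stands in for the LA-driven selection, and energy is charged as the sum of the chosen levels' ∆a ratios rather than the a ∗ (EI/P) term of line 21; the data structures are illustrative).

import random

def form_cover_set(targets, sensors, threshold):
    """One stage: build a cover set under the pruning rule, then test it
    against the dynamic threshold. sensors maps a sensor id to a list of
    (level, delta, covered_targets) options; assumes the sensors can
    jointly cover all targets."""
    uncovered = set(targets)
    cover, used = [], set()
    while uncovered:
        # Pruning rule: skip sensors already committed at some level and
        # adjusted sensors that would cover no new target.
        candidates = [(s, lvl, d, cov)
                      for s, options in sensors.items() if s not in used
                      for (lvl, d, cov) in options if cov & uncovered]
        s, lvl, d, cov = random.choice(candidates)  # stand-in for LA selection
        cover.append((s, lvl, d))
        used.add(s)
        uncovered -= cov
    energy = sum(d for (_, _, d) in cover)
    rewarded = energy <= threshold                  # reward tightens the threshold
    return cover, energy, rewarded, (energy if rewarded else threshold)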


Pruning rule: In this algorithm, each activated learning automaton prunes its action-set by disabling the actions corresponding to both the adjusted sensors that monitor only already-covered targets and the other sensing ranges of sensors for which one sensing range has already been selected. This pruning rule avoids two cases: the selection of redundant adjusted sensors and the selection of more than one sensing range per sensor. As a result, the number of actions is reduced and the convergence speed is increased, which in turn decreases the running time of the algorithm.

After a cover set is formed, the energy consumption of the created cover set is compared with the dynamic threshold Tk. Based on this comparison, the chosen actions of the activated automata in the cover set are either rewarded or penalized: if the energy consumption of the created cover set is greater than the dynamic threshold Tk, the chosen actions are penalized; otherwise, they are rewarded. The dynamic threshold is initially set to a large value, and at each stage it is set to the energy consumption of the last rewarded cover set. When the action probability vectors of the activated automata have been updated, the k-th iteration ends with re-enabling the disabled actions of each activated learning automaton. As the algorithm proceeds, the automata learn how to choose a subset of sensors with appropriate sensing ranges such that a cover set with minimum energy consumption is generated. The algorithm terminates once either the probabilities of all selected adjusted sensors are greater than a specific threshold or the number of constructed cover sets exceeds a predefined threshold. Finally, the cover set with minimum energy consumption is returned as the output.

6. Simulation Results

This section evaluates the performance of the proposed algorithm. Several experiments were carried out to assess the influence of various parameters on the network lifetime and the number of cover sets. The obtained results were compared with those of two greedy algorithms: AR-Greedy [1] and Centralized Greedy [4]. In these experiments, the wireless sensor network was set up as follows. The sensors and targets were uniformly and randomly distributed within a square deployment area of size 1000 m × 1000 m. It was assumed that all the targets were monitored by at least one sensor and that each sensor could monitor at least one target. By default, each sensor had one unit of energy; additionally, the numbers of sensors and targets were set to 250 and 10, respectively, and the sensing range was fixed to 200 m.

Each simulation scenario was executed 10 times, and the average network lifetime and number of cover sets were computed for each scenario. In the proposed LA-based algorithm, a round of the algorithm terminated in two cases: either the number of constructed cover sets exceeded 100, or the probability of the constructed cover set exceeded 0.9. In LA-based algorithms, the optimality of the solution is affected by the learning rate, so this value should be set carefully so that the algorithm obtains acceptable results within a reasonable running time. In this study, we set the learning rate of the proposed algorithm to 0.1 for all experiments. Additionally, we employed the reinforcement scheme LR−I to update the action probability vectors of the learning automata.

Experiment 1. In this experiment, the influence of the number of sensors on the network lifetime and the number of cover sets was investigated. The number of sensors ranged from 100 to 200 in steps of 25, and the number of power levels ranged from 1 to 4. Note that if the number of power levels equals 1, all sensors have a fixed sensing range. The results presented in Table 2 indicate that the network lifetime increases with the number of sensors. This is because, with more sensors, each target has a chance of being covered by more sensors, which leads to the construction of more cover sets. The results also show that multiple power levels considerably affect the network lifetime. Additionally, when the power level is set to 1, the performance of the LA-based algorithm is identical to that of AR-Greedy, whereas when it is set to 3 or 4, the LA-based algorithm achieves a longer network lifetime than AR-Greedy. This shows that the LA-based algorithm is more successful in constructing cover sets with more appropriate sensing ranges. Furthermore, the results obtained from the proposed algorithm were compared with those of Centralized Greedy in terms of the average number of constructed cover sets when the number of power levels was set to 4. The results in Fig. 4 show that the proposed algorithm is able to construct more cover sets, owing to its management of critical targets.


Table 2: Effect of the number of sensors on the network lifetime

Power level |   N = 100   |   N = 125   |   N = 150   |   N = 175   |   N = 200
            |  AR  |  LA  |  AR  |  LA  |  AR  |  LA  |  AR  |  LA  |  AR  |  LA
1           | 1.47 | 1.47 | 1.85 | 1.86 | 2.3  | 2.28 | 2.7  | 2.7  | 3.05 | 3.1
2           | 1.57 | 1.55 | 2    | 2.05 | 2.47 | 2.49 | 2.8  | 2.79 | 3.35 | 3.48
3           | 1.85 | 1.89 | 2.42 | 2.5  | 3.05 | 3.15 | 3.42 | 3.54 | 3.82 | 4.05
4           | 2.07 | 2.27 | 2.65 | 2.97 | 3.37 | 3.7  | 3.95 | 4.27 | 4.65 | 5.2

LA: learning automata-based algorithm; AR: AR-Greedy.


Figure 4: Effect of the number of sensors on the number of constructed cover sets (LA-based algorithm vs. Centralized Greedy).

Experiment 2. In this experiment, the impact of the number of targets on the network lifetime and the number of cover sets was examined. To this end, the number of targets was varied from 5 to 25 in steps of 5. As presented in Table 3, the network lifetime decreases as the number of targets increases.

This is because, in this situation, more sensors are needed to cover the targets; consequently, the energy of the sensors is exhausted sooner. The results also show that the LA-based algorithm outperforms AR-Greedy in terms of extending the network lifetime. Additionally, as shown in Fig. 5, the proposed algorithm is more successful than Centralized Greedy in terms of the number of constructed cover sets.

Table 3: Effect of the number of targets on the network lifetime

Power level |   M = 5     |   M = 10    |   M = 15    |   M = 20    |   M = 25
            |  AR  |  LA  |  AR  |  LA  |  AR  |  LA  |  AR  |  LA  |  AR  |  LA
1           | 5.85 | 5.8  | 4.05 | 4    | 3.42 | 3.41 | 3.17 | 3.2  | 2.87 | 2.9
2           | 7.47 | 7.45 | 4.2  | 4.32 | 3.55 | 3.61 | 3.35 | 3.35 | 2.75 | 2.82
3           | 9.4  | 9.52 | 5.17 | 5.42 | 4.3  | 4.45 | 3.77 | 3.84 | 3.45 | 3.52
4           | 11.3 | 11.52| 6.35 | 6.7  | 4.55 | 4.8  | 4.02 | 4.18 | 3.65 | 3.84

LA: learning automata-based algorithm; AR: AR-Greedy.


Figure 5: Effect of the number of targets on the number of constructed cover sets (LA-based algorithm vs. Centralized Greedy).


Experiment 3. In this experiment, the impact of the sensing range on the network lifetime and the number of constructed cover sets was investigated. The sensing range was varied from 150 m to 250 m in steps of 25 m. The results in Table 4 show that an increase in the sensing range also increases the network lifetime. The reason is that when the sensing range enlarges, sensor nodes can cover more targets, so fewer sensors are required to monitor all the targets. The obtained results also show that the LA-based algorithm achieves a longer network lifetime than the AR-Greedy algorithm. Figure 6 shows that the proposed algorithm outperforms Centralized Greedy in terms of the number of constructed cover sets. As can be concluded from all the conducted experiments, the proposed LA-based algorithm is more successful than the AR-Greedy and Centralized Greedy algorithms in terms of extending the network lifetime and the number of constructed cover sets, respectively.

Table 4: Effect of sensing range on the network lifetime

Power level |   R = 150   |   R = 175   |   R = 200   |   R = 225   |   R = 250
            |  AR  |  LA  |  AR  |  LA  |  AR  |  LA  |  AR  |  LA  |  AR  |  LA
1           | 2    | 2    | 3.05 | 3    | 4.05 | 3.96 | 5.02 | 4.96 | 5.8  | 5.7
2           | 2.17 | 2.15 | 3.27 | 3.27 | 4.2  | 4.25 | 5.57 | 5.5  | 6.45 | 6.5
3           | 2.82 | 2.96 | 4.02 | 4.15 | 5.17 | 5.35 | 6.17 | 6.24 | 7.65 | 7.8
4           | 3.22 | 3.75 | 4.67 | 5    | 6.2  | 6.9  | 7.35 | 8.15 | 8.7  | 9.5

R: sensing range (m); LA: learning automata-based algorithm; AR: AR-Greedy.



Figure 6: Effect of sensing range on the number of constructed cover sets (LA-based algorithm vs. Centralized Greedy).

7. Conclusion

This paper addressed the problem of target coverage in WSNs in which the sensors have multiple power levels (sensing ranges). To solve this problem, we proposed a learning automata-based algorithm equipped with a pruning rule for enhancing its performance. The algorithm aims to select sensors with appropriate sensing ranges in a way that meets the target coverage requirement and, at the same time, maximizes the network lifetime. To evaluate the performance of the proposed algorithm, several experiments were conducted and the results were compared with those of two greedy-based algorithms. It was shown that the proposed algorithm contributes more to extending the network lifetime and constructs a higher number of cover sets.

8. Acknowledgment

The authors would like to thank Universiti Teknologi Malaysia and the Malaysian Ministry of Education for providing funds and support for this research under research grants no. 01G14 and 04H43.

References

[1] R. Cerulli, R. De Donato, A. Raiconi, Exact and heuristic methods to maximize network lifetime in wireless sensor networks with adjustable sensing ranges, European Journal of Operational Research, 220 (2012) 58-66.

[2] J. Yick, B. Mukherjee, D. Ghosal, Wireless sensor network survey, Computer Networks, 52 (2008) 2292-2330.

[3] C. Zhu, C. Zheng, L. Shu, G. Han, A survey on coverage and connectivity issues in wireless sensor networks, Journal of Network and Computer Applications, 35 (2012) 619-632.

[4] M. Cardei, J. Wu, M. Lu, Improving network lifetime using sensors with adjustable sensing ranges, International Journal of Sensor Networks, 1 (2006) 41-49.

[5] M. Cardei, M.T. Thai, Y. Li, W. Wu, Energy-efficient target coverage in wireless sensor networks, in: Proceedings of the 24th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), 2005, pp. 1976-1984, vol. 3.

[6] M. Cardei, D.-Z. Du, Improving wireless sensor network lifetime through power aware organization, Wireless Networks, 11 (2005) 333-340.

[7] D. Zorbas, D. Glynos, P. Kotzanikolaou, C. Douligeris, Solving coverage problems in wireless sensor networks using cover sets, Ad Hoc Networks, 8 (2010) 400-415.

[8] C.-K. Ting, C.-C. Liao, A memetic algorithm for extending wireless sensor network lifetime, Information Sciences, 180 (2010) 4818-4833.

[9] J. Fethi, A Lagrangean-based heuristics for the target covering problem in wireless sensor network, Applied Mathematical Modelling, 37(10) (2013) 6780-6785.

[10] H. Mostafaei, M. Meybodi, Maximizing lifetime of target coverage in wireless sensor networks using learning automata, Wireless Personal Communications, 71 (2013) 1461-1477.

[11] H. Mohamadi, A. Ismail, S. Salleh, A. Nodhei, Learning automata-based algorithms for finding cover sets in wireless sensor networks, The Journal of Supercomputing, 66 (2013) 1533-1552.

[12] H. Mohamadi, A. Ismail, S. Salleh, Solving target coverage problem using cover sets in wireless sensor networks based on learning automata, Wireless Personal Communications, 75 (2014) 447-463.


[13] A. Dhawan, C.T. Vu, A. Zelikovsky, Y. Li, S.K. Prasad, Maximum lifetime of sensor networks with adjustable sensing range, in: Proceedings of the Seventh ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, 2006, pp. 285-289.

[14] A. Dhawan, A. Aung, S. Prasad, Distributed scheduling of a network of adjustable range sensors for coverage problems, in: S. Prasad, H. Vin, S. Sahni, M. Jaiswal, B. Thipakorn (Eds.), Information Systems, Technology and Management, Springer Berlin Heidelberg, 2010, pp. 123-132.

[15] K. Najim, A.S. Poznyak, Learning Automata: Theory and Applications, Prentice-Hall, New York, 1994.

[16] M.A.L. Thathachar, B.R. Harita, Learning automata with changing number of actions, IEEE Transactions on Systems, Man, and Cybernetics, 17 (1987) 1095-1100.

[17] H. Mohamadi, A.S. Ismail, S. Salleh, A learning automata-based algorithm for solving coverage problem in directional sensor networks, Computing, 95 (2013) 1-24.

[18] H. Mohamadi, A. Ismail, S. Salleh, A. Nodehi, Learning automata-based algorithms for solving the target coverage problem in directional sensor networks, Wireless Personal Communications, 73 (2013) 1309-1330.

[19] H. Mohamadi, A.S. Ismail, S. Salleh, Utilizing distributed learning automata to solve the connected target coverage problem in directional sensor networks, Sensors and Actuators A: Physical, 198 (2013) 21-30.

[20] A. Rossi, A. Singh, M. Sevaux, An exact approach for maximizing the lifetime of sensor networks with adjustable sensing ranges, Computers & Operations Research, 39 (2012) 3166-3176.

[21] W. Jiang, C.-L. Zhao, S.-H. Li, L. Chen, A new learning automata based approach for online tracking of event patterns, Neurocomputing, 137 (2014) 205-211.

[22] M. Mozafari, R. Alizadeh, A cellular learning automata model of investment behavior in the stock market, Neurocomputing, 122 (2013) 470-479.

[23] H. Mohamadi, S. Salleh, M. Norsyarizad Razali, Heuristic methods to maximize network lifetime in directional sensor networks with adjustable sensing ranges, Journal of Network and Computer Applications, 46 (2014) 26-35.

[24] H. Mohamadi, S. Salleh, A. Ismail, S. Marouf, Scheduling algorithms for extending directional sensor network lifetime, Wireless Networks, in press.

[25] J. Wu, S. Yang, Coverage issue in sensor networks with adjustable ranges, in: Proceedings of the International Conference on Parallel Processing Workshops, 2004, pp. 61-68.

[26] Z. Zhou, S.R. Das, H. Gupta, Variable radii connected sensor cover in sensor networks, ACM Transactions on Sensor Networks (TOSN), 5 (2009) 8.

[27] M. Cardei, J. Wu, M. Lu, M.O. Pervaiz, Maximum network lifetime in wireless sensor networks with adjustable sensing ranges, in: Proceedings of the International Conference on Wireless and Mobile Computing, Networking and Communications, vol. 3, 2005, pp. 438-445.

[28] H. Mohamadi, S. Salleh, A. Ismail, A learning automata-based solution to the priority-based target coverage problem in directional sensor networks, Wireless Personal Communications, in press.
