Relay node placement under budget constraint




Pervasive and Mobile Computing 53 (2019) 1–12

Contents lists available at ScienceDirect

Pervasive and Mobile Computing journal homepage: www.elsevier.com/locate/pmc

Relay node placement under budget constraint ∗

Chenyang Zhou, Anisha Mazumder, Arun Das, Kaustav Basu, Navid Matin-Moghaddam, Saharnaz Mehrani, Arunabha Sen

Arizona State University, School of Computing, Informatics and Decision System Engineering, Tempe, AZ 85287, USA

Article info

Article history: Available online 17 December 2018

Keywords: Maximal connectedness; Disconnectivity; NP-complete; Approximation algorithms; Inapproximability

Abstract

The relay node placement problem in the wireless sensor network domain has been studied extensively. But under a fixed budget, it may be impossible to procure the minimum number of relay nodes needed to design a connected network of sensor and relay nodes. Nevertheless, one would still like to design a network with a high level of connectedness, or low disconnectedness. In this paper, we introduce the notion of a measure of the “connectedness” of a disconnected graph. We study a family of problems whose goal is to design a network with “maximal connectedness” subject to a fixed budget constraint. © 2018 Elsevier B.V. All rights reserved.

1. Introduction

The relay node placement (RNP) problem, because of its importance in wireless sensor networks, has been studied fairly extensively in the last decade [1–10]. The RNP problem has been studied in several different scenarios. In one scenario, a number of sensor nodes (SNs) have been placed in a deployment area and the objective is to place the fewest relay nodes (RNs) in the area such that (i) each SN is within the communication range of at least one RN and (ii) the network formed by the RNs is connected. This is a two-tiered network model where the RNs serve as cluster heads (or higher tier nodes) to form a connected network topology for delivery of data collected by the sensors (lower tier nodes). RNP problems have also been studied in a single-tiered network model, where each SN is not required to be in direct contact with an RN, as SNs are capable of forwarding packets received from other SNs. In a single-tiered network model, data collected at an SN is delivered to the data collection point over multiple hops through other SNs and RNs.

In this paper, we focus on the single-tiered network model, where a set of SNs has already been placed in the deployment area, and the goal is to place at most a specified number of RNs to realize a certain objective. This goal is novel and differs from the literature on RNP problems, where the goal is to place as few RNs in the deployment area as possible so that the resulting network comprising SNs and RNs is connected. As the deployment of RNs involves cost, it may not be possible to acquire and deploy the number of RNs necessary to make the entire network connected, particularly when one has to operate under a fixed budget. Budget constraints are extremely practical considerations in the real world.
So, under stringent budget constraints, one has to give up the idea of having a fully connected network connecting all the SNs, i.e., a traditional 1-connected network [11]. But one would still like to have a network with a high level of “connectedness”. In other words, one would like to make the network as connected as possible even though, according to the traditional notion of connectivity, the resultant network’s connectivity remains 0. In this paper, we introduce the notion of “connectedness” in a disconnected graph and present three metrics to measure it. Although resource constrained versions of RNP problems have been studied in the literature [7,8], to the best of our knowledge, a formal treatment of the “connectedness” of a disconnected graph has not been undertaken earlier.

Our first metric to measure the connectedness of a disconnected graph is the size of the largest connected component [11] of the graph. We argue that a larger largest connected component in a disconnected graph indicates a higher degree of connectedness of the graph. Our second metric is the number of connected components of the graph. In a manner similar to the first metric, we argue that a lower number of connected components in a disconnected graph also indicates a higher degree of connectedness of the graph.

The problem scenario studied in this paper is depicted diagrammatically in Fig. 1. By communication range, we refer to the upper bound on transmission range. Furthermore, the mathematical abstraction of the RNP problem corresponds to the Geometric Steiner Tree Problem [12], and the terms Steiner points and terminal points are used in the abstraction (where the Steiner points and terminal points correspond to the locations of the RNs and SNs respectively). So, in this paper we use the terms “sensor nodes” and “terminal points” interchangeably.

Fig. 1. Figure showing variation in placing relay nodes for different objectives and budget constraints.

Consider a set of twenty-three SNs (shown as blue circles) deployed as shown in Fig. 1(a). In Fig. 1(a) there are three clusters: the first with ten terminal points, the second with eight, and the third with five. The intra-cluster distances are within the communication range, whereas the inter-cluster ones are not. Suppose that the maximum inter-cluster distance is less than twice the communication range, so that one RN is sufficient for connecting any two clusters. If we have the budget to place two RNs (shown as red squares), then under both metrics of connectedness, the placement of RNs shown in Fig. 1(b) is an optimal solution. However, if we have a budget for only one RN, the solution shown in Fig. 1(c) is an optimal solution under the budget constraint according to the second metric of connectedness.

∗ Corresponding author. https://doi.org/10.1016/j.pmcj.2018.12.001 1574-1192/© 2018 Elsevier B.V. All rights reserved.
This is true as there are exactly two connected components, which is the best that can be achieved with only one RN. However, in this solution, the largest connected component has only thirteen nodes and is not optimal according to the first metric. Fig. 1(d) shows the placement of the RN which is optimal under the budget constraint according to the first metric, having the largest connected component with eighteen terminal points. It may be noted that this placement also results in an optimal solution under the budget constraint according to the second metric.

The third metric to measure the “connectedness of a disconnected graph” is the size of the smallest connected component of the graph. We argue that a larger size of the smallest connected component can also be viewed as an indicator of a higher degree of connectedness of the graph.

The contributions and organization of this paper are as follows:

1. We present the novel notion of “connectedness” of a disconnected graph and propose three metrics for measuring it.
2. We show that the network design problems using the three metrics are NP-complete (Section 3).
3. We show that the computation of the optimal solution w.r.t. the first metric, even for a mere three SNs, is non-trivial (Section 4).
4. We present an approximation algorithm with a performance bound of 1/10 for the first metric (Section 4) and present experimental results to demonstrate the efficacy of the algorithm (Section 5).
5. We present inapproximability results for the second and the third metrics (Section 4).

Size and number of connected components are both important attributes of a disconnected graph. Our first and third metrics take into account only the size (largest and smallest) and the second takes into account only the number of components. One can argue that a metric to measure connectedness of a disconnected graph should take into account both the size and the number of connected components simultaneously.
To address this issue and as a direction of future research, in Section 6 we propose a fourth metric to measure the “connectedness” of a disconnected graph that takes into account both the size and the number of connected components.

2. Related works

The RNP problem in wireless sensor networks has been studied extensively in the last few years [2,4–8,12,13]. Most of the studies can be categorized in the following way: (i) single-tiered network versus two-tiered network [12,14], (ii) 1-connected network versus k-connected network (k ≥ 2) [12,15], (iii) homogeneous transmission range of nodes versus heterogeneous transmission range of nodes [1,4,12], (iv) location (for placement of RNs) unconstrained problem versus location constrained problem [8,12].

Lin and Xue introduced the Steiner Minimum Tree with Minimum Number of Steiner Points and Bounded Edge Length problem (SMT-MSPBEL) in [12]. This problem is exactly the same as placing the fewest RNs in the deployment area so that the network formed by the sensor nodes and the RNs is connected. They proved that the SMT-MSPBEL problem is NP-hard and presented an approximation algorithm with a performance bound of 5. Later Chen et al. in [16] proved that the Lin–Xue algorithm is actually a 4-approximation algorithm and also presented a 3-approximation algorithm for this problem. Cheng et al. in [17] presented another 3-approximation algorithm with faster execution time.

The RNP problem studied in [12] was conducted w.r.t. a single-tiered network. The authors of [14] studied the Delay Constrained Relay Node Placement (DCRNP) problem in a two-tiered setting and proposed approximation algorithms for it. In a two-tiered network model the SNs are grouped into clusters, where each SN in a cluster is within the communication range of its cluster head, which is an RN. In this model, the SNs transmit sensed data to the cluster head, which relays it to the data collection point over multiple hops through other RNs. Several authors have focused their attention on the study of both single-tiered and two-tiered networks [2,4].

The RNP problem studied in [12] had the goal of making the network formed by the SNs and RNs connected (i.e., 1-connected). The authors of [15] studied the generalized version of the RNP problem with the goal of designing a k-connected network where k > 1. They presented a genetic algorithm based approach as well as a greedy algorithm based approach.
The transmission ranges of SNs and RNs were assumed to be homogeneous (identical) in [12]. Follow-up research on this topic introduced heterogeneity of transmission range at two levels. The first level of heterogeneity was introduced in [4], where the transmission ranges of the SNs and the RNs were different. The authors proved NP-hardness of the problem and presented a 7-approximation algorithm for the case where connectivity k = 1. Zhang et al. in [2] presented a 14-approximation algorithm for the case where k = 2. The second level of heterogeneity was introduced in [1], where different SNs were allowed to have different transmission ranges. The authors presented approximation algorithms for scenarios where unidirectional or bidirectional paths connect the SNs and RNs.

The work in [12] studied the unconstrained location problem in the sense that there were no constraints on the locations where the RNs can be placed. In the constrained version of the problem studied in [13], the relay nodes can only be placed in a pre-defined set of candidate locations. The goal is to add a minimum number of RNs that guarantee connectivity such that the outage probability is minimized when constructing the routing tree. The authors of [7] considered a deployment budget and studied the tradeoff between the network throughput, the deployment budget, and overall system coverage. The authors of [18] studied the budget constrained Road Side Unit (RSU) deployment problem, trying to maximize Vehicles–RSUs–Vehicles (V–R–V) communications.

3. Problem formulation

As discussed earlier, the goal of this study is to design sensor networks with a high degree of connectedness even when operating in an environment where the number of available RNs is less than the number of RNs necessary to make all the SNs connected. As a first step in this direction, we formalize the notion of connectedness in three different ways. Accordingly, we have three well defined problems, and formal statements of these three problems are provided below.
The input to the sensor network design problem is: (i) the locations of a set of SNs (terminal points) $P = \{p_1, p_2, \ldots, p_n\}$ in the Euclidean plane, (ii) the communication range $R$ of the SNs, and (iii) a budget $B$ on the number of RNs that can be placed in the deployment area. From the set of points $P$ and communication range $R$, we construct a graph $G = (V, E)$ in the following way: corresponding to each point $p_i \in P$ we create a node $v_i \in V$, and two nodes $v_i$ and $v_j$ have an edge $e_{i,j} \in E$ if the distance between the corresponding points $p_i$ and $p_j$ is at most $R$. It may be noted that the graph $G = (V, E)$ so constructed may be disconnected. The purpose of deploying the RNs is to make the augmented graph $G' = (V', E')$ (comprising SNs and RNs) connected. Suppose that $m$ RNs are deployed at points $Q = \{q_1, q_2, \ldots, q_m\}$. Corresponding to every point $q_i \in Q$ there is a node $v_i \in V' - V$, and there is an edge between two nodes $v_i$ and $v_j$ in $V'$ if the distance between the corresponding points is at most $R$.

With unlimited budget $B$, a sufficient number of RNs can be deployed to ensure that the graph $G' = (V', E')$ is connected. However, if the budget is smaller than the minimum number of RNs necessary to make $G'$ connected, this goal is unachievable. Even in this scenario, one would like to have the graph $G'$ as connected as possible. This gives rise to a connectedness maximization problem. The goal of creating the graph $G'$ with maximal connectedness or least disconnectedness can be achieved by (i) deploying the RNs in a fashion that maximizes the size of the largest connected component of $G'$, or (ii) deploying the RNs in a fashion that maximizes the size of the smallest connected component of $G'$, or (iii) deploying the RNs in a fashion that minimizes the number of connected components of $G'$.
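The graph construction and the three objectives above can be illustrated with a minimal Python sketch. It is not part of the original paper. The function names are hypothetical, and the sketch assumes that the size of a component is the number of terminal points (SNs) it contains, as in the discussion of Fig. 1; components containing no SN are ignored.

```python
import math
from itertools import combinations


def build_graph(points, R):
    """Adjacency lists: two nodes are adjacent when their distance is at most R."""
    adj = {i: [] for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= R:
            adj[i].append(j)
            adj[j].append(i)
    return adj


def connectedness_metrics(sensors, relays, R):
    """Return the three metrics of the augmented graph G' built from P and Q.

    Assumption (illustrative): component size counts sensor nodes only.
    """
    points = list(sensors) + list(relays)
    adj = build_graph(points, R)
    n_sensors = len(sensors)
    seen, sizes = set(), []
    for start in range(len(points)):
        if start in seen:
            continue
        seen.add(start)
        stack, count = [start], 0
        while stack:  # iterative depth-first search over one component
            node = stack.pop()
            if node < n_sensors:
                count += 1
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if count:
            sizes.append(count)
    return {"largest": max(sizes), "smallest": min(sizes), "count": len(sizes)}


sensors = [(0, 0), (1, 0), (3, 0)]
before = connectedness_metrics(sensors, [], R=1)
after = connectedness_metrics(sensors, [(2, 0)], R=1)
```

In this toy instance one relay at (2, 0) joins the isolated sensor to the other two, so all three metrics improve at once; in general the three objectives can favour different placements.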
We refer to (i) as the Budget Constrained Relay node Placement for Maximizing the size of the Largest Connected Component (BCRP-MLCC) problem, (ii) as the Budget Constrained Relay node Placement for Maximizing the size of the Smallest Connected Component (BCRP-MSCC) problem, and (iii) as the Budget Constrained Relay node Placement for Minimizing the Number of Connected Components (BCRP-MNCC) problem. Next, we provide formal definitions of the three problems.

Given the locations of $n$ SNs in the Euclidean plane $P = \{p_1, p_2, \ldots, p_n\}$, transmission range $R$ and a budget $B$ on the number of relay nodes that can be deployed, is it possible to find a set of points $Q = \{q_1, q_2, \ldots, q_m\}$, where $m \le B$, in the same plane where relay nodes can be deployed, so that:


(i) BCRP-MLCC Problem: the size of the largest connected component in the graph $G' = (V', E')$ corresponding to the point sets $P$ and $Q$ is at least $X$, for a pre-specified value $X$? (The graph construction rule from the point sets $P$ and $Q$ is described earlier in this section.)
(ii) BCRP-MSCC Problem: the size of the smallest connected component of the graph $G' = (V', E')$ corresponding to the point sets $P$ and $Q$ is at least $Y$, for a pre-specified value $Y$?
(iii) BCRP-MNCC Problem: the number of connected components in the graph $G' = (V', E')$ corresponding to the point sets $P$ and $Q$ is at most $Z$, for a pre-specified value $Z$?

The unconstrained version of the RNP problem is known as the Steiner Tree Problem with Minimum Number of Steiner Points with Bounded Edge Length (STP-MSPBEL) and was studied in [12].

STP-MSPBEL Problem: Given a set of $n$ terminal points (locations of SNs) $P = \{p_1, p_2, \ldots, p_n\}$ in the Euclidean plane, and positive constants $R$ and $B$, is there a tree $T$ spanning a point set $Q \supseteq P$ such that each edge in the tree has length no greater than $R$ and the number of points in $Q \setminus P$, called Steiner points, is at most $B$?

The authors of [12] have shown that STP-MSPBEL is NP-complete. As the STP-MSPBEL problem is a special case of all of the BCRP-MLCC, BCRP-MSCC and BCRP-MNCC problems (for instance, BCRP-MNCC with $Z = 1$ asks exactly for a connected network), and STP-MSPBEL is NP-complete, we can conclude that all three problems we study are NP-complete.

4. Problem solution

In Section 4.1 we show that the computation of the optimal solution of the BCRP-MLCC problem is non-trivial even when the number of sensor nodes is as few as three. In Section 4.2 we present an approximation algorithm for the BCRP-MLCC problem with a performance bound of 1/10. In Section 4.3 we present inapproximability results for the BCRP-MSCC and BCRP-MNCC problems.

4.1. Optimal solution for a special case of the BCRP-MLCC

When the number of SNs is 2, i.e., $n = 2$, the BCRP-MLCC problem can be solved trivially.
Consider a special case of the BCRP-MLCC problem where $n = 3$, and the distance between each pair of these nodes is more than the transmission range $R$. W.l.o.g., assume that the transmission range of an RN is 1 unit, i.e., $R = 1$; otherwise we can always divide the length of each side by $R$. Then, for any two nodes $u, v$ on the two-dimensional plane, let $I_{u,v}$ be the interval with $u, v$ as end points, and let $|I_{u,v}|$ be the length of the interval. The following observation and lemmas follow:

Observation 1. Let $f(u, v)$ be the minimum number of relay nodes that must be placed for $u$ to communicate with $v$ (in isolation w.r.t. other nodes). Then $f(u, v) = \lceil |I_{u,v}| \rceil - 1$.

Lemma 1. Let $u, v, x, y$ be four nodes on the two-dimensional plane. If $|I_{u,v}| \ge |I_{x,y}|$, then $f(u, v) \ge f(x, y)$.

Lemma 2. If $|I_{u,v}|$ is an integer and $|I_{u,v}| - 1 < |I_{x,y}| \le |I_{u,v}|$, then $f(u, v) = f(x, y)$.

Given three SNs $A, B, C$ on the two-dimensional plane, we want to find the minimum number $M$ of RNs such that $A, B, C$ can communicate with each other. If the budget is at least $M$, then the optimal solution is 3. Otherwise, the optimal solution is at most 2 and can be computed trivially. Here, we assume $A, B, C$ are not on a straight line; otherwise, the problem can be solved easily by considering two intervals. Therefore, we consider the setting in which $A, B, C$ form a triangle. It may also be noted that if the length of the smallest side of the triangle is at most 1, the problem becomes trivial as well. W.l.o.g., say the side $AB$ has length at most 1; then $A$ and $B$ can communicate with each other directly, so we only need to consider link $A, C$ or $B, C$. From Observation 1 the solution will be $\min\{f(A, C), f(B, C)\} = \min\{\lceil |I_{A,C}| \rceil - 1, \lceil |I_{B,C}| \rceil - 1\}$. Hence, we consider scenarios where all side lengths are greater than 1. Evidently, in such a scenario, we need to place at least one RN and we should place all RNs within the triangle area.

Claim 1. There exists an optimal solution which contains a relay node $D$ such that all the other relay nodes are located on the intervals $I_{A,D}$, $I_{B,D}$ and $I_{C,D}$. In other words, the resulting solution looks like a star as shown in Fig. 2(a).

Proof. Given any optimal solution, we know the locations of all RNs. Since $A$ and $B$ can communicate with each other (Fig. 2(b)), there must be an $A$–$B$ path $P = (A = v_1, v_2, \ldots, v_{n'} = B)$ using RNs as intermediate nodes. Similarly, there is an $A$–$C$ path $Q = (A = u_1, u_2, \ldots, u_{n''} = C)$. Also, there is no RN outside $P \cup Q$, as $A, B, C$ are already connected. Let $D$ be the common node of $P$ and $Q$ that has the largest index on $P$. Such a $D$ exists since $A = v_1 = u_1$ is a candidate. After obtaining $D$, we divide $P$ into two sub-paths $A$–$D$ and $D$–$B$. Since our objective is to minimize the number of RNs, both of these sub-paths should be intervals. We consider the same for path $Q$, and the resulting shape looks like a star (in some cases, the resulting shape overlaps two sides of the triangle, when $D$ is located at the same location as one of $A$, $B$ or $C$). □
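Observation 1 and the placement of relays along a straight interval can be sketched in Python as follows. This is an illustrative sketch only: the function names and the small tolerance `EPS` (which guards the ceiling against floating-point noise) are assumptions, not part of the paper.

```python
import math

EPS = 1e-9  # assumed tolerance so that a length of 3.0000000001 still counts as 3


def relays_needed(u, v, R=1.0):
    """f(u, v) = ceil(|I_uv| / R) - 1, clamped at 0 for coincident points."""
    return max(math.ceil(math.dist(u, v) / R - EPS) - 1, 0)


def relay_positions(u, v, R=1.0):
    """Place the f(u, v) relays at distance R apart, starting from u towards v."""
    k = relays_needed(u, v, R)
    d = math.dist(u, v)
    return [
        (u[0] + (v[0] - u[0]) * i * R / d, u[1] + (v[1] - u[1]) * i * R / d)
        for i in range(1, k + 1)
    ]


two_and_a_half = relays_needed((0, 0), (2.5, 0))  # length 2.5 with R = 1
exactly_three = relay_positions((0, 0), (3, 0))   # length 3 with R = 1
```

For an interval of length 2.5 the last hop has length 0.5, and for an interval of length exactly 3 every hop has length 1, which is why both cases need two relays.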


Fig. 2. Constructions for proof of Claim 1.

Fig. 3. Scenario 1.

For any triangle, w.l.o.g., say $(B, C)$ is the longest side with length $L$. Then, it takes at least $\Theta(\lceil L \rceil)$ time to compute the coordinates of all the RNs. Next we present an algorithm that finds the minimum number of required RNs in $O(L^2)$ time. The main idea behind the algorithm is to consider the possible options for the optimal location of $D$. Once the location of $D$ is fixed, the other RNs can be placed greedily at unit distance apart (since $R = 1$) along $I_{A,D}$, $I_{B,D}$ and $I_{C,D}$, which gives the required minimum number of RNs for this choice of $D$. We categorize the different options for the location of $D$ into three major “Scenarios”, which are further divided into different cases. For each setting, we compute the optimal location of $D$ and the total number of RNs needed for that choice of $D$. We finally take the location of $D$ which minimizes the total required number of RNs over all categories to obtain the solution for BCRP-MLCC when $n = 3$. The three major scenarios considered are as follows:

Scenario 1: $D$ is located inside the triangle (not on a side).
Scenario 2: $D$ is located on side $AC$.
Scenario 3: $D$ is located on either side $BC$ or $AB$.

We describe Scenario 1 in detail and omit the descriptions and analysis of Scenarios 2 and 3 due to space limitations.

Scenario 1: As mentioned earlier, Scenario 1 is when $D$ is located inside the triangle (not on a side), as shown in Fig. 3(a).

Claim 2. In Scenario 1, an optimal solution exists where $|I_{C,D}|$ is an integer.

Proof. We pick an optimal solution such that $\alpha = \angle BAD$ is the smallest. Since $D$ is located inside the triangle, $\alpha > 0$. Let $CIR_{P,r}$ be the circle whose centre is $P$ with radius $r$; then $D$ is on the circumference of $CIR_{A,|I_{A,D}|}$ as well as $CIR_{B,|I_{B,D}|}$, as shown in Fig. 3(b). Suppose $|I_{C,D}|$ is not an integer, say $|I_{C,D}| = M - \epsilon$, $M \in \mathbb{N}^+$. Then we move $D$ along the circumference of $CIR_{A,|I_{A,D}|}$ by a very small distance to a point $D'$, such that $\angle BAD' < \alpha$ and $\angle D'AD \le \min\{\alpha, \epsilon / |I_{A,D}|\}$. By the triangle inequality, $|I_{C,D'}| < |I_{C,D}| + |I_{D,D'}| < |I_{C,D}| + |\overset{\frown}{DD'}| \le M$, where $\overset{\frown}{DD'}$ is the arc from $D$ to $D'$. According to Lemma 2, $f(C, D) = f(C, D')$. Next we consider $I_{A,D'}$. By the construction of $D'$, $|I_{A,D}| = |I_{A,D'}|$, which implies $f(A, D) = f(A, D')$. Finally, we consider $I_{B,D'}$. Since $CIR_{A,|I_{A,D}|}$ intersects $CIR_{B,|I_{B,D}|}$ at $D$, $D'$ must be within $CIR_{B,|I_{B,D}|}$; hence $|I_{B,D'}| < |I_{B,D}|$ and, by Lemma 1, $f(A, D') + f(B, D') + f(C, D') \le f(A, D) + f(B, D) + f(C, D)$. Thus $D'$ also yields an optimal solution but with a smaller angle $\angle BAD'$, which contradicts our choice of $D$ and $\alpha$. So such a $D'$ does not exist and $|I_{C,D}|$ must be an integer. □


Algorithm 1: Algorithm to compute scenario 1.(i)
    for $i = 0$ to $\lfloor L \rfloor$ do
        for $j = 0$ to $\lfloor L \rfloor$ do
            Compute intersection point, say $D$, of $CIR_{A,i}$ and $CIR_{C,j}$ if the two circles intersect.
            if the intersection point is inside the triangle then
                Compute $f(A, D) + f(B, D) + f(C, D)$ using $f(A, D) = \lceil |I_{A,D}| \rceil - 1$ etc.
            end if
        end for
    end for
    Choose $D$ that minimizes $f(A, D) + f(B, D) + f(C, D)$, call it $D_1$.

Algorithm 2: Algorithm to compute scenario 1.(ii).I
    for $i = 0$ to $\lfloor L \rfloor$ do
        Compute tangent line $AD$ to circle $CIR_{C,i}$.
        if the intersection point is inside the triangle then
            Compute $f(A, D) + f(B, D) + f(C, D)$.
        end if
    end for
    Choose $D$ that minimizes $f(A, D) + f(B, D) + f(C, D)$, call it $D_2$.

Algorithm 3: Algorithm to compute scenario 1.(ii).II
    for $i = 0$ to $\lfloor L \rfloor$ do
        Compute intersection point of $CIR_{C,i}$ and $CIR_{E,|AB|/2}$, where $E$ is the mid point of side $AB$.
        if the intersection point is inside the triangle then
            Compute $f(A, D) + f(B, D) + f(C, D)$.
        end if
    end for
    Choose $D$ that minimizes $f(A, D) + f(B, D) + f(C, D)$, call it $D_3$.

Next we show that in Scenario 1 (i) either $|I_{A,D}|$ is an integer, or (ii) one of $\angle ADC$ and $\angle ADB$ is $\pi/2$.

Case (i): $|I_{A,D}|$ is integral: Here, as shown in Algorithm 1, we enumerate $|I_{A,D}|$ and $|I_{C,D}|$ (since both are integers). The intersection point of $CIR_{A,|I_{A,D}|}$ and $CIR_{C,|I_{C,D}|}$ (if there are two intersection points, we pick the one inside the triangle) is a candidate for $D$ (Fig. 4(a)). Among all candidates, the one that minimizes $f(A, D) + f(B, D) + f(C, D)$ is the final candidate.

Case (ii): $|I_{A,D}|$ is not integral: By the choice of $D$, $|I_{A,D}|$ cannot be extended. There could be only two reasons for this: either $\angle ADC = \pi/2$, i.e., $AD$ is a tangent line of circle $CIR_{C,|I_{C,D}|}$; or $\angle ADB = \pi/2$, i.e., $AD$ is a tangent line of circle $CIR_{B,|I_{B,D}|}$. We omit the proof of this claim due to space limitations. This gives rise to the following sub-cases:

Sub-Case I: $\angle ADC = \pi/2$: As shown in Algorithm 2, we enumerate over integer values of $|I_{C,D}|$. Then we compute a tangent $AD$ to $CIR_{C,|I_{C,D}|}$ and get the coordinates of $D$ (Fig. 5(a)). Among all different $D$s, we choose the one that minimizes $f(A, D) + f(B, D) + f(C, D)$ as the final candidate.

Sub-Case II: $\angle ADB = \pi/2$: Let $E$ be the mid point of $AB$; then, by Thales' theorem, $D$ lies on the circumference of $CIR_{E,|AB|/2}$. As presented in Algorithm 3, again we enumerate over integral values of $|I_{C,D}|$ and compute the intersection point of $CIR_{E,|AB|/2}$ and $CIR_{C,|I_{C,D}|}$ (Fig. 5(b)).
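The enumeration of Case (i) (Algorithm 1) can be sketched in Python as below. This is a hedged illustration with $R = 1$, not the authors' implementation: the helper names, the tolerance `EPS`, and the strict-interior test are assumptions, and only Case (i) is covered (Algorithms 2 and 3 and Scenarios 2 and 3 are not).

```python
import math

EPS = 1e-9  # assumed tolerance against floating-point noise in the ceiling


def f(u, v):
    """Observation 1 with R = 1: relays needed on the interval from u to v."""
    return max(math.ceil(math.dist(u, v) - EPS) - 1, 0)


def circle_intersections(c1, r1, c2, r2):
    """Intersection points of two circles (empty list if they do not meet)."""
    d = math.dist(c1, c2)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)    # distance from c1 to the chord
    h = math.sqrt(max(r1 * r1 - a * a, 0.0))     # half-length of the chord
    ex, ey = (c2[0] - c1[0]) / d, (c2[1] - c1[1]) / d
    mx, my = c1[0] + a * ex, c1[1] + a * ey
    return [(mx - h * ey, my + h * ex), (mx + h * ey, my - h * ex)]


def strictly_inside(p, a, b, c):
    """True when p lies in the interior of triangle abc (not on a side)."""
    def cross(o, s, t):
        return (s[0] - o[0]) * (t[1] - o[1]) - (s[1] - o[1]) * (t[0] - o[0])
    s1, s2, s3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    return (s1 > 0 and s2 > 0 and s3 > 0) or (s1 < 0 and s2 < 0 and s3 < 0)


def scenario_1_case_i(A, B, C):
    """Enumerate integer radii i, j around A and C; return (cost, D) or None."""
    L = max(math.dist(A, B), math.dist(B, C), math.dist(A, C))
    best = None
    for i in range(math.floor(L) + 1):
        for j in range(math.floor(L) + 1):
            for D in circle_intersections(A, i, C, j):
                if strictly_inside(D, A, B, C):
                    cost = f(A, D) + f(B, D) + f(C, D)
                    if best is None or cost < best[0]:
                        best = (cost, D)
    return best


A, B, C = (0, 0), (6, 0), (3, 5)
result = scenario_1_case_i(A, B, C)
```

The double loop visits $O(L^2)$ pairs of radii and does constant work per pair, which matches the running time stated for the algorithm.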

Fig. 4. Constructions for Case (i) and Case (ii) under Scenario 1.

Fig. 5. Constructions for Scenario 1, Case (ii), Sub-Cases I and II.

4.2. Approximation algorithm for BCRP-MLCC

As our approximation algorithm is based on the Minimum K-Spanning Tree (K-MST) problem, we first describe the K-MST problem.

Minimum K-Spanning Tree Problem (K-MST): Given a graph $G = (V, E)$, real numbers $K$, $B$, and a weight function $w : E \to \mathbb{N}$, is there a tree $T$ in $G$ spanning at least $K$ nodes such that $\sum_{e \in T} w(e) \le B$?

The K-MST problem is NP-hard [19]. The approximation algorithm for the K-MST problem provided by Garg in [19] guarantees a performance bound of 2. We will present this algorithm and make use of it for developing an approximation algorithm for the BCRP-MLCC problem. Consider a complete weighted graph $G = (V, E)$ constructed from the set of points $P$ for BCRP-MLCC, where each node $v_i \in V$ corresponds to a point $p_i \in P$ and the weight $w(v_i, v_j)$ on the edge between the nodes $v_i$ and $v_j$ is set equal to $\lceil d(p_i, p_j)/R \rceil - 1$, where $d(p_i, p_j)$ is the Euclidean distance between the points $p_i$ and $p_j$ and $R$ is the transmission range of the sensors. We then apply the following algorithm on the constructed graph $G = (V, E)$, $|V| = n$.

Algorithm 4: Approximation Algorithm for BCRP-MLCC
    for $i = n$ to $1$ do
        Set $K = i$ and apply Garg's algorithm to find the spanning tree $T$
        if $weight(T) \le B$ then
            return $T$
        end if
    end for
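The outer loop of Algorithm 4 can be sketched in Python as follows. Garg's 2-approximation for K-MST is not reproduced here; the sketch plugs in a simple Prim-style greedy heuristic, `greedy_k_tree`, purely as a hypothetical stand-in with no approximation guarantee. All function names are assumptions for illustration.

```python
import math


def relay_weight(p, q, R):
    """Edge weight ceil(d(p, q) / R) - 1: relays needed on the straight segment."""
    return max(math.ceil(math.dist(p, q) / R - 1e-9) - 1, 0)


def greedy_k_tree(points, R, K):
    """Stand-in K-MST heuristic (NOT Garg's algorithm).

    Grows a Prim-style tree from every start node until it spans K nodes
    and returns (total_weight, edges, nodes) of the cheapest tree found.
    """
    n = len(points)
    best = None
    for start in range(n):
        in_tree, edges, total = {start}, [], 0
        while len(in_tree) < K:
            w, u, v = min(
                (relay_weight(points[a], points[b], R), a, b)
                for a in in_tree
                for b in range(n)
                if b not in in_tree
            )
            in_tree.add(v)
            edges.append((u, v))
            total += w
        if best is None or total < best[0]:
            best = (total, edges, in_tree)
    return best


def bcrp_mlcc_approx(points, R, B, k_tree_solver=greedy_k_tree):
    """Try K = n, n-1, ..., 1 and return the first tree whose weight fits B."""
    for K in range(len(points), 0, -1):
        total, edges, nodes = k_tree_solver(points, R, K)
        if total <= B:
            return nodes, edges, total
    return set(), [], 0


points = [(0, 0), (1, 0), (3, 0), (10, 0)]
nodes_b1, edges_b1, used_b1 = bcrp_mlcc_approx(points, R=1, B=1)
nodes_b0, edges_b0, used_b0 = bcrp_mlcc_approx(points, R=1, B=0)
```

Passing the solver as a parameter keeps the outer loop identical to Algorithm 4, so a genuine implementation of Garg's algorithm could be substituted without touching the rest.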

We denote by $P'$ the vertex set of the resulting tree $T$, and let $AppSol = |P'|$. Each node of $P'$ corresponds to a point $p_i \in P$, and naturally we have $P' \subseteq P$. An edge in $T$ connecting nodes $p_i$ and $p_j$ corresponds to a straight line segment $I_{p_i,p_j}$ between points $p_i$ and $p_j$. As our goal is to have a deployment of RNs, we next show how to place them on $T$. Denote the set of all RNs by $Q'$. For any straight segment $I_{u,v}$, RNs are placed at distance $R$ (transmission range) apart starting from one end point, w.l.o.g. say $u$, towards the other end point $v$. It is easily verifiable that every edge needs $\lceil d(p_i, p_j)/R \rceil - 1$ RNs. As required, $|Q'| \le B$, where $B$ is the budget for the number of RNs. Thus, we obtain a new tree $T_{AppSol}$ formed by the node set $P' \cup Q'$, and the layout of $T_{AppSol}$ is our desired connected component (Fig. 6(a)).

We will refer to an optimal solution of the BCRP-MLCC problem as $T_{OptSol}$, which is also a tree-style layout after deployment of relay nodes. Similarly, $T_{OptSol}$ can be viewed as a point set $P'' \cup Q''$, where $P'' \subseteq P$ and $OptSol = |P''| \ge |P'|$. In addition, the budget constraint is satisfied, i.e., $|Q''| \le B$. Without loss of generality, we can assume that $T_{OptSol}$ is the optimal solution which uses the fewest RNs (Fig. 6(b)). The following lemma was established in [12].

Lemma 3. There exists a shortest length optimal Steiner tree for STP-MSPBEL such that every Steiner point has degree at most five.

The lemma states that the optimal solution to STP-MSPBEL, denoted as OptSol, has a layout on a two-dimensional plane such that every Steiner point has degree at most five. Evidently it is true for any connected component if the graph cannot be connected. Since we consider the largest connected component, according to Lemma 3, the tree $T_{OptSol}$ formed by $P'' \cup Q''$ also has a layout in the two-dimensional plane such that no $q_i \in Q''$ has degree greater than five. The layout of such a tree is shown in Fig. 7.

Fig. 6. Example of layout of $T_{AppSol}$ and $T_{OptSol}$.

Fig. 7. Layout of $T_{OptSol}$ whose relay nodes have degree no more than 5.

Definition 2. Generalized Depth First Search (GDFS): This is almost the same as Depth First Search [11]: we maintain a sequence recording the order in which nodes are traversed. However, a node is enumerated again when coming back from one of its neighbours and going to the next one. This implies that a node may appear multiple times in the final resulting sequence.

One possible sequence generated by the GDFS on the $T_{OptSol}$ layout shown in Fig. 7 starting from $p_1$ is the following: $p_1, q_1, p_2, q_1, p_3, q_1, p_4, q_1, p_5, q_2, p_6, q_2, p_7, q_6, p_{10}, q_6, p_{11}, q_6, p_{12}, q_6, p_7, q_3, q_4, q_5, p_{13}, q_5, p_{14}, q_5, p_{15}, q_5, p_{16}, q_5, q_4, q_3, p_7, q_2, p_8, q_2, p_9, q_2, p_5, q_1, p_1$. Some nodes appear multiple times because a node is counted again when going backwards. W.l.o.g., we refer to terminal points as p-type points and RNs as q-type points. The following lemma follows from Lemma 3.

Lemma 4. The number of occurrences of any q-type point in the sequence produced by GDFS on $T_{OptSol}$ is at most five.

Definition 3. Logical Hamiltonian Path: Given a connected graph $G = (V, E)$, a Logical Hamiltonian Path, $LHP = v_1, v_2, \ldots, v_n$, is a permutation of nodes such that $e(v_i, v_{i+1})$ is an edge of LHP and corresponds to a path from $v_i$ to $v_{i+1}$ in $G$.
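The GDFS of Definition 2 can be sketched in Python as a depth-first traversal that re-appends a node each time the search returns to it. The edge list below is an assumption: Fig. 7 is not reproduced here, so the tree was reconstructed from the example sequence given above, and the function name is hypothetical.

```python
def gdfs(tree, root):
    """Generalized DFS on a tree given as adjacency lists.

    A node is appended on first visit and again every time the traversal
    comes back to it from one of its neighbours.
    """
    order = []

    def visit(node, parent):
        order.append(node)
        for nxt in tree[node]:
            if nxt != parent:
                visit(nxt, node)
                order.append(node)  # re-enumerate when coming back

    visit(root, None)
    return order


# Tree reconstructed from the example sequence (assumed to match Fig. 7).
edges = [
    ("p1", "q1"), ("q1", "p2"), ("q1", "p3"), ("q1", "p4"), ("q1", "p5"),
    ("p5", "q2"), ("q2", "p6"), ("q2", "p7"), ("p7", "q6"), ("q6", "p10"),
    ("q6", "p11"), ("q6", "p12"), ("p7", "q3"), ("q3", "q4"), ("q4", "q5"),
    ("q5", "p13"), ("q5", "p14"), ("q5", "p15"), ("q5", "p16"),
    ("q2", "p8"), ("q2", "p9"),
]
tree = {}
for a, b in edges:
    tree.setdefault(a, []).append(b)
    tree.setdefault(b, []).append(a)

sequence = gdfs(tree, "p1")
q_counts = {q: sequence.count(q) for q in ("q1", "q2", "q3", "q4", "q5", "q6")}
```

On this tree every relay of degree five (q1, q2, q5) occurs exactly five times in the sequence, which is the extreme case allowed by Lemma 4.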

Given a $T_{OptSol}$ and its layout, define $cost(P)$ to be the number of RNs on $P$ if $P$ is a path in $T_{OptSol}$. Then $cost(LHP) = \sum_{e \in LHP} cost(e)$. We have the following lemma. The detailed proof has been made available at https://sites.google.com/site/appendixforbcrp/:

Lemma 5. If $T_{OptSol}$ contains $B$ relay nodes, there is an LHP s.t. $cost(LHP) \le 5B$, and we can construct a corresponding graph $C^*$ with at most $5B$ relay nodes while using straight segments between terminal points.

Fig. 8 shows one desired LHP and the corresponding $C^*$ from $T_{OptSol}$ in Fig. 7. We use dashed lines in Fig. 8(a) since LHP is not a real path and its “edges” only stand for logical order and cost. In Fig. 8(b), edges are solid lines because they represent real intervals. We note here that we use the same labels for RNs as in $T_{OptSol}$, so that the figure gives a direct impression of how each edge is constructed and how RNs repeat. For example, $(p_1, p_2)$ are directly connected via $q_1$. Therefore, it does not take more than one node to link $p_1$ and $p_2$ using a straight interval, and we draw $q_1$ on $I_{p_1,p_2}$ to indicate that. Similarly, $(p_5, p_6)$ are directly connected via $q_2$. Hence we draw $q_2$ on $I_{p_5,p_6}$ to show another “propagation” of $q_2$. On the other hand, $(p_{12}, p_{13})$ are indirectly connected, thus we draw two segments $I_{p_{12},p_7}$ and $I_{p_7,p_{13}}$ in $C^*$ with repetition of RNs. Reflecting on Fig. 8(a), $p_{12}$ has a logical edge to $p_{13}$ using at most 4 relay nodes. One may notice that $C^*$ is not necessarily a tree, but it must contain a spanning tree.

Fig. 8. Figure showing one desired LHP and corresponding $C^*$ from $T_{OptSol}$ in Fig. 7.

Algorithm 5: Algorithm to find a subpath, $P'$, of LHP that contains at least $n/10$ p points and costs at most $B/2$.
    $S = \emptyset$ ($S$ is a set of p points that will be in the subpath)
    $i = 1$
    $Total\_Cost = 0$
    for $i = 1$ to $n - 1$ do
        $S = S \cup \{p^*_i\}$
        if $|S| \ge n/10$ then
            return $S$
        end if
        if $Total\_Cost + cost(e(p^*_i, p^*_{i+1})) \le B/2$ then
            $Total\_Cost = Total\_Cost + cost(e(p^*_i, p^*_{i+1}))$
        else
            $S = \emptyset$
            $Total\_Cost = 0$
        end if
    end for
    $S = S \cup \{p^*_n\}$
    return $S$

We now give an algorithm (Algorithm 5) that finds a subpath of the logical path LHP that contains at least $n/10$ p-type points and costs at most $B/2$. It may be recalled that LHP is a permutation of n terminal points with cost at most 5B. Again, without loss of generality, we assume that the p points on the path LHP are numbered sequentially from $p^*_1$ to $p^*_n$.
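To make the numbering $p^*_1, \ldots, p^*_n$ and the edge costs concrete, the following sketch derives one candidate LHP from the GDFS sequence quoted earlier. It assumes, purely for illustration, that terminals are ordered by their first appearance in that sequence and that the cost of a logical edge is the number of distinct q-type nodes met between two consecutive first appearances. The construction actually used to prove Lemma 5 is given in the online appendix and may differ; `lhp_from_walk` is a hypothetical helper.

```python
# GDFS sequence as printed in the text (p = terminal point, q = relay node).
WALK = ("p1 q1 p2 q1 p3 q1 p4 q1 p5 q2 p6 q2 p7 q6 p10 q6 p11 q6 p12 q6 "
        "p7 q3 q4 q5 p13 q5 p14 q5 p15 q5 p16 q5 q4 q3 p7 q2 p8 q2 p9 q2 "
        "p5 q1 p1").split()


def lhp_from_walk(walk):
    """Return (terminal order, logical edge costs) under the stated assumption."""
    order, costs = [], []
    seen = set()
    relays_since_last = set()
    for node in walk:
        if node.startswith("q"):
            relays_since_last.add(node)
        elif node not in seen:
            if order:  # close the logical edge from the previous new terminal
                costs.append(len(relays_since_last))
            order.append(node)
            seen.add(node)
            relays_since_last = set()
        # A terminal seen before is only passed through; relays keep accumulating.
    return order, costs


order, costs = lhp_from_walk(WALK)
budget = len({v for v in WALK if v.startswith("q")})  # B = relay nodes in the tree
total_cost = sum(costs)
print(order)
print(costs)
print(total_cost, "<=", 5 * budget)
```

On this sequence the logical edge from p12 to p13 costs 4, matching the discussion of Fig. 8, and the total stays within the 5B bound of Lemma 5.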

Lemma 6. There exists a subpath, P′, of LHP that includes at least $n/10$ p-type points and whose cost is at most $B/2$ (n is the total number of p points).

Proof. Algorithm 5 attempts to construct such a subpath P′ by sequentially scanning the terminal points $\{p^*_1, \ldots, p^*_n\}$ on the logical path LHP, adding one point at a time to the set S, starting from the point $p^*_1$. If at any point in time it finds a set S such that $|S| \ge n/10$ and the cost of the current set S is at most $B/2$, it returns that set and the algorithm terminates. On the other hand, if the current cost exceeds the threshold $B/2$ but $|S| < n/10$, it discards this set and resets Total_Cost to zero. Each reset is triggered by an accumulated cost exceeding $B/2$. Since the total cost of LHP is at most 5B and it contains n terminal points, this resetting can take place at most nine times, and we have at most 10 disjoint sets, each of which forms a subpath with cost no more than $B/2$. By the pigeonhole principle, one of these 10 sets contains at least $n/10$ p-type points, and hence we obtain the desired P′. Here we should notice that, from $p^*_i$ to $p^*_{i+1}$, there could be other terminal nodes (for example, from p16 to p8, p7 is also included). However, this does not decrease the cardinality of the set S and the lemma still holds. □
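The scan described in the proof can be sketched in a few lines. This is a minimal illustration of Algorithm 5, assuming the logical edge costs are supplied as a list in which entry i is the cost between consecutive terminals i and i+1. The function name `find_subpath`, the 0-based indices and the sample costs are illustrative choices, not part of the paper.

```python
def find_subpath(edge_costs, budget):
    """Scan the LHP left to right, as in Algorithm 5.

    edge_costs[i] is the cost of the logical edge between terminals i and i+1
    (0-based), so the path has n = len(edge_costs) + 1 terminals.
    Returns the indices of the terminals in the selected subpath.
    """
    n = len(edge_costs) + 1
    current = []  # the set S of the pseudocode, kept in path order
    total = 0     # Total_Cost of the pseudocode
    for i in range(n - 1):
        current.append(i)
        if 10 * len(current) >= n:                # |S| >= n/10
            return current
        if 2 * (total + edge_costs[i]) <= budget:  # stays within B/2
            total += edge_costs[i]
        else:                                      # discard S and start over
            current = []
            total = 0
    current.append(n - 1)
    return current


# Illustrative instance: 30 terminals, B = 6, total cost 10 <= 5B.
costs = [4, 2, 2, 1, 1] + [0] * 24
subpath = find_subpath(costs, budget=6)
subpath_cost = sum(costs[subpath[0]:subpath[-1]])
print(subpath, subpath_cost)
```

Integer comparisons (`10 * len(current) >= n` and `2 * (...) <= budget`) are used instead of the fractions n/10 and B/2 to avoid rounding issues.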


Fig. 9. Experimental evaluation of Algorithm 4 for the BCRP-MLCC problem. The X-axis and Y-axis denote the budget and the approximation ratio, i.e., (size of largest connected component in approximate solution) / (size of largest connected component in optimal solution), respectively.

Now we present Garg's algorithm for the k-MST problem.

Definition 4. Garg's algorithm: Given a weighted graph G = (V, E) and a number k, the algorithm finds a subgraph with exactly k vertices whose total cost is at most 2 times the optimal.

The following lemma follows from Lemma 6, Lemma 5 and the guaranteed performance bound of 2 of Garg's algorithm.

Lemma 7. Given $T_{OptSol}$, let $G_V$ be the complete graph using P′ as node set and all straight segments between any two terminals as edge set. If we set $k = n/10$ ($n = |P'|$) and run Garg's algorithm on $G_V$, it returns a subgraph (tree) with total cost at most B.

Theorem 1. Algorithm 4 is a $\frac{1}{10}$-approximation algorithm, i.e., $\frac{AppSol}{OptSol} \ge \frac{1}{10}$.

Proof. From Lemma 7, we know that with cost B, Algorithm 4 will be able to find a spanning tree with at least $k \ge \frac{OptSol}{10}$ nodes. By enumerating the value of k in decreasing order, we must find a feasible tree, and AppSol ≥ k. Hence the theorem holds. Garg's algorithm runs in $O(mn^4 \log n)$ time, and the running time of our Algorithm 4 is $O(mn^5 \log n)$. □

4.3. Inapproximability of BCRP-MSCC and BCRP-MNCC

Detailed proofs for the following two theorems have been made available online at https://sites.google.com/site/appendixforbcrp/.

Theorem 2. There is no polynomial-time approximation algorithm for BCRP-MSCC with approximation ratio better than $\frac{1}{2}$ unless P = NP.

Theorem 3. There is no polynomial-time approximation algorithm for BCRP-MNCC within approximation 2 − ϵ for any ϵ > 0, unless P = NP.

5. Experimental results

In this section, we present our experimental results to demonstrate the efficacy of our BCRP-MLCC approximation algorithm. We have developed a mathematical programming formulation (a detailed description of the formulation has been made available online at https://sites.google.com/site/appendixforbcrp/) to compute the optimal solution and compare it with the results obtained from Algorithm 4. For our experiments, we have used synthetic random data with 6, 8, 10, and 12 SNs in a 5 × 10 deployment area, varying the number of RNs from 1 to 5. It may be noted that experimental results with a larger number of SNs or a larger number of RNs could not be provided, as the time to compute the optimal solutions using our mathematical programming formulation turned out to be unacceptably high. We have computed the optimal solution using the CPLEX solver for AMPL on an Intel Core i7 machine with 8 GB RAM and a 2.3 GHz processor. The approximation algorithm was also executed on the same machine.

In Fig. 9, we have plotted the ratio between the sizes of the largest connected component in the approximate (Algorithm 4) and the optimal (obtained using the mathematical formulation) solutions for the maximization problem BCRP-MLCC. The X-axis and Y-axis denote the budget (i.e., the number of RNs available) and the approximation ratio (i.e., the size of the largest connected component in the approximate solution divided by the size of the largest connected component in the optimal solution), respectively. It may be noted that although our analysis in Section 4.2 guarantees only that the ratio of the approximate to the optimal solution is never lower than 0.1, in our experiments this ratio was never lower than 0.4. Also, the computation time for the approximate solution was only a fraction of the time required for finding the optimal solution.
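The ratio plotted in Fig. 9 can be computed from two relay placements with a sketch such as the one below. It rests on two assumptions that this section does not restate: two nodes are linked whenever their Euclidean distance is at most the transmission range R, and the size of a component counts sensor nodes only. The function name, coordinates and placements are made up for illustration and are not the paper's experimental data.

```python
import math


def largest_component_size(sensors, relays, comm_range):
    """Largest number of sensors in one connected component (union-find)."""
    nodes = list(sensors) + list(relays)
    parent = list(range(len(nodes)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Assumed link rule: distance at most comm_range.
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if math.dist(nodes[i], nodes[j]) <= comm_range:
                parent[find(i)] = find(j)

    sizes = {}
    for i in range(len(sensors)):  # assumption: count sensors only, not relays
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)


sensors = [(0, 0), (2, 0), (5, 0)]
approx = largest_component_size(sensors, [(1, 0)], 1.0)                   # one relay used
optimal = largest_component_size(sensors, [(1, 0), (3, 0), (4, 0)], 1.0)  # three relays
ratio = approx / optimal
print(approx, optimal, ratio)
```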


Table 1
Component frequency distribution table. Frequency distribution of connected components in G = (V, E); Ci = number of connected components of size i, n = |V|.

G = (V, E), C        Cn    Cn−1    Cn−2    ···    C2    C1
G1 = (V1, E1), C1    xn    xn−1    xn−2    ···    x2    x1
G2 = (V2, E2), C2    yn    yn−1    yn−2    ···    y2    y1
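For a vector such as the rows of Table 1, Section 6 defines the weight $w(C) = \sum_{i=1}^{n} C_i (n+1)^{i-1}$ and the disconnectivity δ(G) = 1/w(C). The sketch below evaluates both for the two six-node graphs of the example in that section, and also recovers the vector from the weight by reading off base-(n+1) digits. The function names are illustrative, and the list layout (index i−1 holds $C_i$, i.e., the reverse of the column order of Table 1) is a choice made here.

```python
from collections import Counter
from fractions import Fraction


def cfdv(component_sizes):
    """Component frequency distribution vector; index i-1 holds C_i."""
    n = sum(component_sizes)
    counts = Counter(component_sizes)
    return [counts.get(size, 0) for size in range(1, n + 1)]


def weight(c):
    """w(C) = sum over i of C_i * (n+1)^(i-1), with n = len(c)."""
    n = len(c)
    return sum(c_i * (n + 1) ** (i - 1) for i, c_i in enumerate(c, start=1))


def disconnectivity(c):
    """delta(G) = 1 / w(C), kept exact as a Fraction."""
    return Fraction(1, weight(c))


def cfdv_from_weight(w, n):
    """Invert weight(): since 0 <= C_i <= n, the C_i are base-(n+1) digits of w."""
    c = []
    for _ in range(n):
        w, digit = divmod(w, n + 1)
        c.append(digit)
    return c


g1 = cfdv([4, 1, 1])  # one component of size 4, two of size 1
g2 = cfdv([3, 3])     # two components of size 3
print(weight(g1), float(disconnectivity(g1)))
print(weight(g2), float(disconnectivity(g2)))
print(cfdv_from_weight(weight(g1), 6))
```

Using `Fraction` keeps δ(G) exact, so the inverse mapping can work from the integer weight rather than from a rounded decimal.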

6. Future direction

It may be noted that while the BCRP-MLCC/BCRP-MSCC problems focus only on the size of the largest/smallest connected component while ignoring the number of components, the BCRP-MNCC problem focuses only on the number of connected components while ignoring their sizes. One may argue that all three metrics are myopic, as they pay attention either only to the size of the largest/smallest component or only to the number of components, whereas both the size and the number of connected components are key attributes of a disconnected graph. To address this concern, we propose a fourth metric to measure the "connectedness" of a disconnected graph that takes into account both the size and the number of connected components. In the following we define the metric.

Every disconnected graph comprises a number of components of differing sizes. A graph on n nodes might comprise one component of size n, or n components of size one, or any other distribution in between. Accordingly, any graph can be described by the frequency distribution of its component sizes. We define $C_i$ as the number of connected components of size i in G = (V, E), with $0 \le C_i \le n$ for all $1 \le i \le n$ (as shown in Table 1). A Component Frequency Distribution Vector (CFDV) C is defined as a vector of size n whose entry $C_i$ specifies the number of connected components of size i in the graph G = (V, E). The weight of a CFDV C is defined as follows: $w(C) = \sum_{i=1}^{n} C_i (n+1)^{i-1}$. It may be noted that w(C) takes into account both the size and the number of components of the graph, with larger components being assigned higher weights than smaller components. We define the "disconnectivity" of a graph G = (V, E), δ(G), with CFDV C as δ(G) = 1/w(C).

Example: Consider two graphs G1 = (V1, E1) and G2 = (V2, E2), each with 6 nodes. The CFDVs of G1 and G2 (C1 and C2), written in the column order of Table 1 (from the entry for size 6 down to the entry for size 1), are [0, 0, 1, 0, 0, 2] and [0, 0, 0, 2, 0, 0] respectively, implying that G1 comprises one component of size 4 and two components of size 1, while G2 comprises two components of size 3. With these parameter values, $w(C1) = 1 \cdot 7^3 + 2 \cdot 7^0 = 345$ and $w(C2) = 2 \cdot 7^2 = 98$, which in turn give δ(G1) = 1/345 ≈ 0.002899 and δ(G2) = 1/98 ≈ 0.010204. As the value of δ(G1) is smaller than δ(G2), G1 is less disconnected (or, more connected) than G2. It may be noted that we arrive at this conclusion because we choose to give higher weights to components with a larger size. The optimization problem using this metric is referred to as the Budget Constrained Relay node Placement for Minimum Disconnectivity (BCRP-MD) problem. Next, we provide a formal definition of the BCRP-MD problem.

BCRP-MD Problem: Given the locations of n SNs in the Euclidean plane P = {p1, p2, ..., pn}, transmission range R and a budget B on the number of RNs that can be deployed, is it possible to find a set of points Q = {q1, q2, ..., qm}, where m ≤ B, in the same plane where RNs can be deployed, so that the disconnectivity, δ(G′), of the graph G′ = (V′, E′) corresponding to the point sets P and Q is at most D, for a pre-specified value D?

It is true that the two key attributes of a disconnected graph (size and number of components) could have been combined in ways other than the one proposed here, i.e., $w(C) = \sum_{i=1}^{n} C_i (n+1)^{i-1}$. However, this way of combining the two attributes has the advantage that, when simply given the Component Frequency Distribution Vector (CFDV) C, one can easily compute the "disconnectivity" δ(G). Conversely, given δ(G) and n, the number of nodes in G, one can easily recompute the CFDV of G: since each $C_i \le n$, the entries of C are the base-(n+1) digits of w(C) = 1/δ(G).

7. Conclusion

In this paper we study a family of problems in the domain of relay node placement under a budget constraint, where it might not be possible to obtain enough relay nodes to make the network of sensor and relay nodes connected. To the best of our knowledge, this is the first formal study on the "connectedness" of a disconnected graph. To this end, we propose four metrics. Detailed analyses have been presented for three of the metrics, and the fourth metric is proposed as a direction for future research.

References

[1] X. Han, X. Cao, E.L. Lloyd, C.C. Shen, Fault-tolerant relay node placement in heterogeneous wireless sensor networks, Mobile Comput. IEEE Trans. 9 (5) (2010) 643–656.
[2] W. Zhang, G. Xue, S. Misra, Fault-tolerant relay node placement in wireless sensor networks: Problems and algorithms, INFOCOM 2007, in: 26th IEEE International Conference on Computer Communications, IEEE, 2007, pp. 1649–1657.
[3] M. Younis, I.F. Senturk, K. Akkaya, S. Lee, F. Senel, Topology management techniques for tolerating node failures in wireless sensor networks: A survey, Comput. Netw. 58 (2014) 254–283.
[4] E.L. Lloyd, G. Xue, Relay node placement in wireless sensor networks, IEEE Trans. Comput. 56 (1) (2007) 134–138.
[5] J. Li, L.L.H. Andrew, C.H. Foh, M. Zukerman, H.H. Chen, Connectivity coverage and placement in wireless sensor networks, Molecul. Diversi. Preserv. Intl. Sens. 9 (2009) 7664–7693.
[6] F. Senel, M. Younis, Novel relay node placement algorithms for establishing connected topologies, J. Netw. Comput. Appl. 70 (2016) 114–130.


[7] J.Y. Chang, Y.W. Chen, A cluster-based relay station deployment scheme for multi-hop relay networks, Commun. Netw. J. IEEE 17 (2015) 84–92.
[8] S. Misra, N.E. Majd, H. Huang, Approximation algorithms for constrained relay node placement in energy harvesting wireless sensor networks, IEEE Trans. Comput. 63 (12) (2014) 2933–2947.
[9] C. Ma, W. Liang, M. Zheng, H. Sharif, A connectivity-aware approximation algorithm for relay node placement in wireless sensor networks, IEEE Sens. J. 16 (2) (2016) 515–528.
[10] C. Yang, K. Chin, On nodes placement in energy harvesting wireless sensor networks for coverage and connectivity, IEEE Trans. Ind. Inf. 13 (1) (2017) 27–36.
[11] T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein, Introduction to Algorithms, vol. 6, MIT Press, Cambridge, 2001.
[12] G.H. Lin, G. Xue, Steiner tree problem with minimum number of Steiner points and bounded edge-length, Inform. Process. Lett. 69 (1999) 53–57.
[13] M. Bagaa, A. Chelli, D. Djenouri, T. Taleb, I. Balasingham, K. Kansanen, Optimal placement of relay nodes over limited positions in wireless sensor networks, IEEE Trans. Wirel. Commun. 16 (4) (2017) 2205–2219.
[14] C. Ma, W. Liang, M. Zheng, Delay constrained relay node placement in two-tiered wireless sensor networks: A set-covering-based algorithm, J. Netw. Comput. Appl. 93 (2017) 76–90.
[15] S.K. Gupta, P. Kuila, P.K. Jana, Genetic algorithm for k-connected relay node placement in wireless sensor networks, in: Proceedings of the Second International Conference on Computer and Communication Technologies, Adv. Intell. Syst. Comput. 379 (2016) 721–729.
[16] D. Chen, D.Z. Du, X.D. Hu, G.H. Lin, L. Wang, G. Xue, Approximations for Steiner trees with minimum number of Steiner points, J. Global Optim. 18 (2000) 17–33.
[17] X. Cheng, D.Z. Du, L. Wang, B. Xu, Relay sensor placement in wireless sensor networks, Wirel. Netw. 14 (2008) 347–355.
[18] W. Huang, P. Li, T. Zhang, RSUs placement based on vehicular social mobility in VANETs, in: Proceedings of 13th IEEE Conference on Industrial Electronics and Applications, ICIEA, 2018, pp. 1255–1260.
[19] N. Garg, Saving an epsilon: a 2-approximation for the k-MST problem in graphs, in: Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing, vol. 12, 2005, pp. 386–402.