Opportunistic coverage for urban vehicular sensing



Computer Communications 60 (2015) 71–85

Contents lists available at ScienceDirect

Computer Communications journal homepage: www.elsevier.com/locate/comcom

Dong Zhao (a), Huadong Ma (a, ⇑), Liang Liu (a), Xiang-Yang Li (b, c)

(a) Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing, China
(b) Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
(c) Department of Computer Science and Technology, and School of Software, Tsinghua University, Beijing, China

Article info

Article history: Received 26 August 2014; received in revised form 5 December 2014; accepted 23 January 2015; available online 7 February 2015

Keywords: Vehicular sensor networks; Opportunistic sensing; Urban sensing; Coverage

Abstract

Opportunistic vehicular sensing is a new paradigm that exploits a variety of sensors embedded in vehicles or smartphones to collect data ubiquitously for large-scale urban sensing. Existing work lacks in-depth investigation of the coverage problem in such sensing systems: (1) how to define and measure the coverage; (2) what the relationship is between the coverage quality and the number of vehicles; and (3) how to select the minimum number of vehicles to achieve a specific coverage quality. First, we propose a metric called Inter-Cover Time (ICT) to characterize the coverage opportunities. Through empirical measurement studies on real mobility traces of thousands of taxis, we find that the aggregated ICT Distribution (ICTD) follows a truncated power-law distribution. We also analyze the reasons behind this particular pattern by evaluating four known mobility models. Second, we propose a metric called the opportunistic coverage ratio and derive it as a function of the aggregated ICTD. We also analyze how opportunistic coverage ratios change on different days of a week. Finally, we present a vehicle selection algorithm to address the third problem. In addition, we present a framework for recruiting vehicles, serving as fundamental guidelines for coverage measurement and network planning in urban vehicular sensing applications.

© 2015 Elsevier B.V. All rights reserved.

⇑ Corresponding author.
http://dx.doi.org/10.1016/j.comcom.2015.01.018
0140-3664/© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Over the past decade, the focus of Wireless Sensor Network (WSN) research has evolved from a series of small-scale testbeds and specialized applications to a new ubiquitous data collection paradigm for urban sensing at very large scale [1–4]. However, large-scale and fine-grained urban sensing requires deploying a large number of sensor nodes, which is economically infeasible or undesirable. Take the CitySee [5] project as an example: 100 sensor nodes and 1096 relay nodes are deployed for CO2 monitoring in an urban area of around 1 km². If this system were extended to a larger urban area, for example, the area within the 5th Ring Road in Beijing (about 900 km²), we would need to deploy at least 90 thousand sensor nodes and around 1 million relay nodes to maintain full field coverage and communication connectivity. The cost of the sensors, together with the deployment and maintenance costs, would make such a system hard to implement. Fortunately, recent advances in mobile sensing and communication technologies have triggered research on leveraging mobile sensor networks as an effective and affordable solution for large-scale and fine-grained urban sensing. This sensing paradigm is popularly called opportunistic sensing [3,6], participatory sensing [6], mobile crowdsensing [7], or people/human-centric sensing [3,8,2]. In particular, Vehicular Sensor Networks (VSNs) are emerging as a new tool for effectively monitoring the physical world, especially in urban areas where a large number of vehicles equipped with various sensors are expected [9]. In addition, today's smartphones are programmable and come with a rich set of cheap yet powerful embedded sensors, such as GPS, Wi-Fi/3G/4G interfaces, accelerometer, digital compass, gyroscope, microphone, and camera, which makes smartphones a promising platform for urban vehicular sensing. These vehicular sensing platforms will enable numerous novel applications such as environment monitoring [10,11], traffic monitoring [12,13], road surface monitoring [14], and street-parking availability statistics [15]. One application scenario is illustrated in Fig. 1: a group of vehicles equipped with sensors, GPS receivers, and wireless communication modules (e.g., Wi-Fi, 3G, and GSM modules) roam within the urban area; each vehicle periodically reports its sensory data (e.g., temperature, CO2 concentration) and current location to the monitoring center; the monitoring center then calculates the distribution of the sensory data and presents the result on urban maps (e.g., Google Maps). This paradigm has several advantages over the traditional stationary WSN solution:


• It is easier to deploy the network at lower cost, because many GPS-equipped vehicles already exist in major cities around the world. Meanwhile, vehicle mobility can be exploited to improve the coverage of the urban area.
• It is easier to maintain the network, because vehicle sensor nodes often have a more abundant power supply and stronger computation, storage, and communication capacity.
• It is more extensible and flexible, because we only need to recruit more vehicles to adapt to an expansion of the system scale.

At present, most work on urban sensing focuses only on hardware development, system and algorithm design, or prototype system deployment. However, there are few studies on coverage quality, for which the number of vehicles participating in sensing applications is a determining factor. Naturally, we expect to achieve satisfactory coverage quality using the minimum number of vehicles. To achieve this objective, we design a framework for recruiting vehicles (Section 3). However, three basic problems in this framework remain unsolved. The first is: how to define and measure the coverage? In stationary WSNs, it is often required that each point in the monitored region be covered all the time, and the coverage quality does not change over time [16–18]. In contrast, coverage in VSNs, called opportunistic coverage, is time-variant due to vehicle mobility, so the time factor must be considered. In this paper, we divide the whole urban area into grid cells and use a new metric, the Inter-Cover Time (ICT), defined as the time elapsed between two consecutive coverage events of the same grid cell, to characterize the opportunity with which a grid cell is covered. A shorter ICT means better coverage quality for a grid cell. We notice that the ICT is similar to the inter-contact time, which is a key factor affecting the packet delivery delay in Delay-Tolerant Networks (DTNs).
Recently, empirical results based on human and vehicle mobility traces have shown different distributions of inter-contact times: truncated power-law and exponential distributions [19–25]. In fact, many studies have shown that the temporal and spatial scaling laws of human communication, web access, work, and circadian patterns share some universal characteristics [26,27]. Thus, we conjecture that the ICT Distribution (ICTD) follows a particular pattern. Our empirical analysis of real mobility traces of thousands of taxis collected in Beijing and Shanghai verifies this conjecture, although the pattern is complex and affected by factors such as the size of grid cells and the number of vehicles. To identify the reasons behind this particular pattern, we also evaluate the features of opportunistic coverage under four known mobility models. We find that the coverage features of various mobility models can be well explained by the queuing model proposed by Barabási [28]. As a byproduct, this evaluation can also provide guidelines for selecting proper mobility models to verify cooperative sensing or coverage algorithms (e.g., [10,29]).

The second basic problem is: what is the relationship between the coverage quality of an urban area and the number of vehicles? This relationship can be used to accurately estimate the number of vehicles required to achieve a specific coverage quality, which is very important for setting the goals of incentive mechanisms. To solve this problem, we use a metric called the opportunistic coverage ratio, defined as the expected ratio of grid cells that can be opportunistically covered during a specific time interval. It can be derived as a function of the aggregated ICTD, and it increases monotonically with the number of vehicles and the time interval.

The third basic problem is: how to select the minimum number of vehicles to achieve a specific coverage quality requirement? Since different vehicles have heterogeneous mobility regions with some randomness, they contribute differently to the coverage. Although we have derived the relationship between coverage quality and the number of vehicles, the randomly selected vehicles used in the measurements may include redundant ones. To eliminate this redundancy, we propose an algorithm that selects the minimum number of vehicles to achieve a specific opportunistic coverage ratio based on the collected mobility traces of vehicles.
In summary, the contributions of this paper are as follows:
• We design an effective and efficient framework to recruit vehicles for urban sensing while guaranteeing a specific coverage quality requirement.
• We present a general model of opportunistic coverage and propose a new metric, the ICT, to measure the coverage quality of urban vehicular sensing systems. Through empirical measurement studies on Beijing and Shanghai taxi mobility traces, we find that the aggregated ICTD follows a truncated power-law distribution regardless of the size of grid cells and the number of vehicles.
• We evaluate the features of opportunistic coverage under four known mobility models and provide a preliminary investigation of the reasons behind our key observation on these features.
• Based on the two case studies in Beijing and Shanghai, we analyze the relationship between the opportunistic coverage ratio and the number of vehicles. We also analyze how the opportunistic coverage ratio changes on different days of a week, which further verifies the effectiveness of our proposed models and methods.
• We propose an algorithm to select the minimum number of vehicles to achieve a specific opportunistic coverage ratio, and evaluate its effectiveness on Beijing taxi mobility traces.

Fig. 1. The illustration of urban vehicular sensing (figure labels: sensor; GPS module; Wi-Fi/3G/GSM; wireless networks; monitoring center; sensing map).

A preliminary conference version of this work can be found in [30]. The remainder of this paper is organized as follows: Section 2 reviews related work. Section 3 describes the vehicle recruitment framework. Section 4 introduces the general model of opportunistic coverage. In Section 5, we describe the two datasets of taxi mobility traces, present the empirical analysis methods and results, and evaluate the features of opportunistic coverage under four known mobility models. We analyze the opportunistic coverage ratios of Beijing and Shanghai in Section 6. The vehicle selection algorithm is presented in Section 7. Finally, Section 8 concludes the paper.

2. Related work

2.1. People-centric urban sensing

Recently, many people-centric urban sensing applications have been developed that leverage human-carried or vehicle-mounted sensors to share local data, increase global awareness, compute community statistics, or map physical phenomena. These applications can be classified into two categories according to the type of carrier: vehicular sensing systems and human sensing systems. In the first category, a prototype VSN was demonstrated to monitor the CO2 concentration in areas of interest [11]. Nericell [12] was a system that monitored road and traffic conditions in a city by piggybacking on smartphones that users carry with them. VTrack [13] was an accurate, energy-aware road traffic delay estimation system that used a variety of sensors embedded in smartphones, such as GPS and Wi-Fi. Pothole Patrol [14] was a system that exploited the inherent mobility of participating vehicles, opportunistically gathering data from vibration and GPS sensors and processing the data to assess road surface conditions. ParkNet [15] was a system that used vehicles equipped with a GPS receiver and a passenger-side-facing ultrasonic rangefinder to collect parking space occupancy information while driving by. CarTel [31] was a mobile sensor computing system designed to collect, process, deliver, and visualize data from sensors located on mobile units such as automobiles; it has been used to analyze commute times and metropolitan Wi-Fi deployments, and for automotive diagnostics. MobEyes [32] provided efficient, lightweight support for proactive urban monitoring, based on the idea of exploiting vehicle mobility to opportunistically diffuse summaries of sensed data.
In the second category, the Common Sense project [33] developed an urban sensing system using a network of handheld air quality monitors, which allowed individuals to measure their personal exposure, groups to aggregate their members' exposure, and activists to mobilize grassroots community action. Ear-Phone [34] was an urban noise mapping system that created an open and inexpensive platform for rendering up-to-date noise maps. A compressed sensing approach was proposed to recover the sensing map more effectively from random and incomplete samples [10,34]. However, the above work focused only on hardware development, system and algorithm design, or prototype system deployment, and did not consider the coverage problem, for which the number of humans or vehicles participating in the sensing applications is a determining factor.

2.2. Coverage problems in sensor networks

Coverage is often used to measure the sensing quality provided by a particular sensor network and has received considerable research attention. Coverage problems in stationary WSNs have been extensively studied and are relatively well understood; for more details on static coverage, we refer interested readers to several survey papers [16–18]. Recently, researchers have started to explore the dynamic coverage of mobile WSNs, where the mobility of sensors may be random or controlled. The authors of [35] showed that coverage could be improved by allowing nodes to move continuously. The authors of [36] analyzed the target detection performance in a hybrid WSN consisting of both static and mobile nodes. The authors of [37] investigated the coverage of mobile heterogeneous WSNs in which sensors have various sensing radii. All of these studies assumed that the sensors followed a random mobility model. The authors of [38,39] studied the sweep coverage problem, in which a set of PoIs (Points of Interest) must be monitored periodically; they assumed that mobile sensors could be controlled to follow planned trajectories to achieve sweep coverage. However, none of these studies took the real features of human or vehicle mobility (e.g., hotspot effects and bursty nature) into consideration, so they cannot be applied to the opportunistic coverage problem in VSNs.

2.3. Coverage problems in people-centric urban sensing

Some researchers designed incentive mechanisms to attract more users to participate in sensing applications [40–42].
The authors of [43] described a full-featured geo-social crowdsensing platform for smart cities and proposed policies to manage participating users for completing specific sensing tasks. However, they did not consider how to measure the coverage quality of urban vehicular sensing systems or how many users are needed to guarantee the required coverage quality. The authors of [29] proposed the concept of probabilistic coverage in people-centric sensing. However, they focused only on node selection to satisfy the coverage of a specific Area of Interest (AoI) instead of the whole urban area. Moreover, they assumed that humans moved according to a theoretical mobility model (a Markov model). Similarly, the authors of [10] assumed that vehicles moved according to the Manhattan model or the City Section model, and used a compressive sensing approach to reconstruct the urban sensing map from random sensory data sampled by vehicles. In contrast, we perform empirical data analysis on real taxi mobility traces, showing that such theoretical mobility models cannot characterize the real features of opportunistic coverage. The authors of [44] proposed a participant selection framework, named CrowdRecruiter, that minimizes incentive payments by selecting a small number of participants while satisfying a probabilistic coverage constraint. Similarly, we previously proposed energy-efficient cooperative sensing and transmission methods that satisfy coverage constraints [45,46]. However, a systematic study of the coverage problem in people-centric urban sensing is still needed [47]. Similar to our work, the authors of [48] presented a systematic study of the coverage and scaling properties of place-centric crowdsensing. In contrast, we focus on the coverage problem for periodic urban vehicular sensing applications.


3. Vehicle recruitment framework

To select the minimum number of vehicles that achieves satisfactory coverage quality, we need to analyze the coverage quality based on the mobility traces of vehicles. One prerequisite, however, is that these mobility traces have been obtained. In fact, not all users want to share their mobility traces, which may reveal private and sensitive information. To address this, incentive mechanisms have been designed to attract users to contribute their private information [49,40,42]. However, a key problem remains: how should the goals of incentive mechanisms be set, that is, how many users are required to contribute their mobility traces? To solve this problem at low cost, we design a framework for recruiting vehicles for urban sensing, as illustrated in Fig. 2. Three components reside in the cloud:
• Incentive mechanism. First, the monitoring center publicizes tasks of collecting a specific number (the incentive goal) of mobility traces, together with an announced reward strategy; second, the monitoring center selects a set of users to report their mobility traces.
• Coverage analysis. By analyzing the relationship between the coverage quality of an urban area and the number of vehicles, we can determine the goal of incentive mechanisms. Moreover, by analyzing the coverage contributions of different vehicles, we can run the vehicle selection algorithm.
• Vehicle selection. This component selects the minimum number of vehicles to achieve the satisfactory coverage quality.

Specifically, the framework consists of the following five steps.
Step 1: Exploiting incentive mechanisms to collect the mobility traces of a certain number of vehicles.
Step 2: Performing coverage analysis based on the collected mobility traces, and estimating the number of vehicles required to achieve the specific coverage quality.
Step 3: Exploiting incentive mechanisms to further collect the mobility traces of the required number of vehicles.
Step 4: Performing coverage analysis based on the collected mobility traces.
Step 5: Selecting the minimum number of vehicles to achieve the specific coverage quality.
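The five steps can be read as a small pipeline in which each component is plugged in as a function. The sketch below is a hypothetical illustration, not the authors' implementation: `collect_traces`, `estimate_required`, `analyze`, and `select_minimum` are placeholder callables standing in for the incentive mechanism, the coverage analysis, and the vehicle selection component, and `pilot_size` and `target_ratio` are assumed names for the initial sample size and the coverage-quality requirement.

```python
from typing import Callable, List, Sequence, TypeVar

Trace = TypeVar("Trace")
Analysis = TypeVar("Analysis")


def recruit_vehicles(
    collect_traces: Callable[[int], List[Trace]],
    estimate_required: Callable[[Sequence[Trace], float], int],
    analyze: Callable[[Sequence[Trace]], Analysis],
    select_minimum: Callable[[Analysis, float], List[Trace]],
    pilot_size: int,
    target_ratio: float,
) -> List[Trace]:
    """Hypothetical driver for the five-step vehicle recruitment framework."""
    # Step 1 (incentive mechanism): collect traces from a pilot set of vehicles.
    pilot = collect_traces(pilot_size)
    # Step 2 (coverage analysis): estimate how many vehicles the target requires.
    required = estimate_required(pilot, target_ratio)
    # Step 3 (incentive mechanism): collect traces from that many vehicles.
    traces = collect_traces(required)
    # Step 4 (coverage analysis): analyze the coverage contributions of these traces.
    analysis = analyze(traces)
    # Step 5 (vehicle selection): keep the fewest vehicles that meet the target.
    return select_minimum(analysis, target_ratio)
```

Passing the components in as functions keeps the driver independent of any particular incentive mechanism, which matches the paper's note that the incentive mechanism can serve as a stand-alone component.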

The above five steps are performed by the three components, as illustrated in Fig. 2. Incentive mechanisms for crowdsourcing applications have been investigated in many recent studies [49,40,42]. For example, Danezis et al. used techniques from experimental economics and psychology to determine how much compensation must be offered to persuade someone to collect location information, and developed a sealed-bid second-price auction to attract user participation [49]. More recently, many researchers have developed sophisticated incentive mechanisms by considering different utility objectives from the perspectives of both the crowdsourcing platform (monitoring center) and participating users [40,42]. As an example, consider the following simple and effective mechanism: first, the monitoring center announces a total budget B to collect the mobility traces of k users on a specified day through a sealed-bid auction; then, n interested users submit their bids to participate in the auction; finally, the monitoring center selects k users and buys their mobility traces with proper payments. Singer presented a budget-feasible mechanism for this objective that guarantees truthfulness (i.e., each user obtains the best utility by reporting her true bid) [50]. If the monitoring center does not have a specified budget constraint, the "M+1-st Price Auction" mechanism [51] could be used. Note that in the rest of this paper we focus only on coverage analysis and vehicle selection; the incentive mechanism is beyond the scope of our discussion, since existing methods can serve as a stand-alone component in our framework.

4. Opportunistic coverage model

We consider a VSN composed of $n$ mobile vehicles $V = \{v_1, v_2, \ldots, v_n\}$ equipped with sensors in a large-scale urban area, where $v_k$ ($k = 1, 2, \ldots, n$) is a vehicle identifier. Each vehicle has a movement trajectory, which is a sequence of time-ordered GPS points.
The position of vehicle $v_k$ at time $t$ is denoted by $L_k(t)$. We focus on the coverage quality during a time span $T$, and divide $T$ into $l$ time slots of equal size, i.e., $T = l \cdot T_s$, where $T_s$ represents a sampling period, as illustrated in Fig. 3(a). In the space domain, we divide the whole urban area into $m$ grid cells of equal size, as illustrated in Fig. 3(b). Thus, the urban area can be denoted by a set of grid cells $G = \{g_1, g_2, \ldots, g_m\}$, where $g_i$ ($i = 1, 2, \ldots, m$) is a grid cell identifier. The size of the grid cells represents the spatial sensing granularity, which is determined by the application requirements. We consider a grid cell to be covered by a vehicle only when a new sampling period arrives and the vehicle is located within the grid cell. Let $C(g_i, L_k(t))$ denote whether grid cell $g_i$ is covered by vehicle $v_k$ at instant $t$, i.e.,

$$C(g_i, L_k(t)) = \begin{cases} 1, & \text{if } t \in \{T_s, 2T_s, \ldots, lT_s\} \text{ and } L_k(t) \in g_i, \\ 0, & \text{otherwise}. \end{cases} \qquad (1)$$

Fig. 2. The framework of recruiting vehicles for urban sensing.

Fig. 3. The illustration of discretizing the time–space domain: (a) time domain (time span $T$ divided into sampling periods $T_s$); (b) space domain (trajectories of mobile vehicles over grid cells $g$).
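A minimal sketch of the discretization and of the coverage indicator in Eq. (1), assuming that GPS positions have already been projected to planar coordinates in meters and that time is represented by an integer slot index $j$ (so $t = jT_s$). `Grid`, `cell_of`, and `coverage_indicator` are hypothetical names, and cell ids are 0-based here, whereas the text numbers them from 1.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Grid:
    """Hypothetical uniform grid of cols x rows square cells over a projected
    (planar, meter-based) urban area whose lower-left corner is (x0, y0)."""
    x0: float
    y0: float
    cell_size: float  # side length in meters, e.g. 500.0 for 500 m x 500 m cells
    cols: int
    rows: int

    @property
    def m(self) -> int:
        """Number of grid cells m."""
        return self.cols * self.rows

    def cell_of(self, x: float, y: float) -> Optional[int]:
        """Return the id of the grid cell containing (x, y), or None if outside."""
        cx = math.floor((x - self.x0) / self.cell_size)
        cy = math.floor((y - self.y0) / self.cell_size)
        if 0 <= cx < self.cols and 0 <= cy < self.rows:
            return cy * self.cols + cx
        return None


def coverage_indicator(grid: Grid, i: int, j: int, l: int,
                       position: Tuple[float, float]) -> int:
    """Eq. (1) with time given as an integer slot j, i.e. t = j * Ts.

    Returns 1 iff t is one of the sampling instants Ts, 2Ts, ..., l*Ts and the
    vehicle position L_k(t) lies inside grid cell g_i; otherwise returns 0.
    """
    if not 1 <= j <= l:
        return 0
    return 1 if grid.cell_of(*position) == i else 0
```

For example, with $T$ = 18 h and $T_s$ = 60 s as used later in Section 5.1, $l$ = 1080 slots. Representing time by slot index avoids testing floating-point timestamps for exact multiples of $T_s$.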

Definition 1 (Opportunistic coverage over a time interval). A grid cell $g_i \in G$ is said to be opportunistically covered during the time interval $\tau$ iff it is covered by at least one vehicle during this interval, i.e.,

$$\sum_{t=t_0}^{t_0+\tau} \sum_{k=1}^{n} C(g_i, L_k(t)) \ge 1, \qquad (2)$$

where $t_0$ is an arbitrary instant.

To characterize how frequently each grid cell can be sensed, we define the Inter-Cover Time as follows.

Definition 2 (Inter-Cover Time (ICT)). For grid cell $g_i$, the Inter-Cover Time $T_I$ is defined as the time elapsed between two consecutive coverage events, i.e.,

$$T_I \triangleq \inf\Big\{\tau : \sum_{k=1}^{n} C(g_i, L_k(t_0+\tau)) \ge 1\Big\}, \qquad (3)$$

given that $\sum_{k=1}^{n} C(g_i, L_k(t_0)) \ge 1$ and $\sum_{k=1}^{n} C(g_i, L_k(t_0+T_s)) = 0$.

For example, in Fig. 4, a grid cell is covered three times, by vehicles $v_1$, $v_2$, and $v_3$, with coverage durations of $3T_s$, $2T_s$, and $T_s$, respectively. From these, we can extract two ICT values (denoted as $T_I$ in Fig. 4). In this paper, we mainly focus on the ICT Distribution (ICTD) and omit the effects of coverage durations, as we elaborate in Section 5.2. The ICT reflects coverage quality directly: a shorter ICT means better coverage quality for a grid cell. Intuitively, the ICTD is affected by two factors: the size of grid cells and the number of vehicles. Specifically, the larger the grid cells or the more vehicles there are, the shorter the ICT becomes, because each grid cell is covered more frequently. To capture these factors, we denote the ICTD of grid cell $g_i$ with $n$ vehicles as

$$F_i(\tau, n) = P\{T_I \le \tau \mid g = g_i, N = n\}. \qquad (4)$$

To characterize the relationship between the coverage quality of an urban area and the number of vehicles, we define the opportunistic coverage ratio as follows.

Definition 3 (Opportunistic coverage ratio). The opportunistic coverage ratio is the expected ratio of grid cells that can be opportunistically covered during the time interval $\tau$, which can be expressed as

$$f_I(\tau) = \frac{\sum_{i=1}^{m} F_i(\tau, n)}{m}. \qquad (5)$$

Obviously, the opportunistic coverage ratio increases monotonically with the number of vehicles and the time interval.

5. Empirical data analysis

In this section, we first briefly introduce the two datasets of taxi mobility traces and the data preprocessing approach. Second, we analyze the coverage durations. Third, we introduce the model selection method for evaluating the ICTD. Fourth, we present our empirical analysis results for the ICTD, considering the effects of two factors: the size of grid cells and the number of vehicles. Finally, we evaluate the features of opportunistic coverage under four known mobility models and provide a preliminary investigation of the reasons behind our key observation on these features.

5.1. Data description

To explore the ICTD pattern in urban scenarios, we use two large-scale datasets of taxi mobility traces.¹ The first contains the GPS trajectories of 10,357 taxis from February 2 to February 8, 2008 in Beijing [4], with an average sampling interval of about 177 s. The second contains the GPS trajectories of 4316 taxis on February 20, 2007 in Shanghai [25]; its sampling interval is about 60 s when a taxi has passengers on board and about 15 s when it is vacant. In both datasets, each GPS report is a tuple (taxi ID, timestamp, longitude, latitude). To account for GPS errors and facilitate the analysis, we preprocess both datasets in three steps:
Step 1: Removing all GPS points with erroneous locations outside the urban area.
Step 2: Extracting the mobility traces of vehicles that have at least one GPS report every 30 min during 6:00–24:00, and removing the mobility traces of vehicles that remain stationary during the entire time span.
Step 3: Recomputing a position every 60 s by averaging all GPS points within that 60 s period. If there is no GPS report in a period, the position is estimated by interpolation.
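Step 3 can be sketched for a single taxi as follows. This is an illustration under stated assumptions, not the authors' preprocessing code: the text does not specify the interpolation method, so linear interpolation between neighboring 60 s bin centers is assumed, and longitude and latitude are averaged directly, which is reasonable for bins this short. `resample_positions` is a hypothetical name.

```python
import numpy as np


def resample_positions(timestamps, lons, lats, t_start, t_end, period=60.0):
    """Average the GPS points that fall in each `period`-second bin of
    [t_start, t_end); fill empty bins by linear interpolation (an assumption)
    between the centers of neighboring non-empty bins."""
    ts = np.asarray(timestamps, dtype=float)
    lon = np.asarray(lons, dtype=float)
    lat = np.asarray(lats, dtype=float)
    n_bins = int(np.ceil((t_end - t_start) / period))
    centers = t_start + period * (np.arange(n_bins) + 0.5)

    # Bin index of every report; drop reports outside the time span.
    idx = np.floor((ts - t_start) / period).astype(int)
    keep = (idx >= 0) & (idx < n_bins)
    idx, lon, lat = idx[keep], lon[keep], lat[keep]

    counts = np.bincount(idx, minlength=n_bins)
    filled = counts > 0
    if not filled.any():
        raise ValueError("no GPS reports inside the time span")

    out_lon = np.empty(n_bins)
    out_lat = np.empty(n_bins)
    # Mean position of the reports inside each non-empty bin.
    out_lon[filled] = np.bincount(idx, weights=lon, minlength=n_bins)[filled] / counts[filled]
    out_lat[filled] = np.bincount(idx, weights=lat, minlength=n_bins)[filled] / counts[filled]
    # Empty bins: linear interpolation; np.interp holds the edge values constant
    # before the first and after the last non-empty bin.
    out_lon[~filled] = np.interp(centers[~filled], centers[filled], out_lon[filled])
    out_lat[~filled] = np.interp(centers[~filled], centers[filled], out_lat[filled])
    return centers, out_lon, out_lat
```

For the 6:00–24:00 span used here, `t_end - t_start` is 64,800 s, which gives 1080 bins per taxi.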
After these steps, we obtain two new datasets containing the GPS points of 4067 taxis (February 3)² and 2079 taxis at every 60 s during 6:00–24:00 in Beijing and Shanghai, respectively. Part of the mobility traces is shown in Fig. 5. We select the area within the 5th Ring Road (about 900 km²) in Beijing and an area of about 900 km² in Shanghai for analysis. The time span $T$ is 18 h (6:00–24:00), and the sampling period $T_s$ is 60 s.

5.2. Analysis on the coverage durations

We analyze the taxi mobility traces in Beijing and Shanghai and show the CCDF plots of the coverage durations with different numbers of vehicles in Fig. 6. All plots are nearly straight lines on a log–log scale, which means that the distribution of coverage durations closely resembles a power-law distribution.

Fig. 4. Extracting ICTs of the same grid cell from the GPS reports of vehicles $v_1$, $v_2$, and $v_3$. Dotted boxes denote coverage durations for the grid cell; individual GPS reports are denoted by short arrows.
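The ICT extraction illustrated in Fig. 4 (Definition 2) and an empirical version of the opportunistic coverage ratio (Definition 3) can be sketched as follows. This sketch makes several assumptions. The input for each cell is a per-slot flag `covered[j]`, which is obtained by OR-ing the indicator of Eq. (1) over all vehicles at slot $j$. Eq. (3) is read literally, with $\tau > 0$, so that $T_I$ runs from the last covered slot of one coverage run to the first covered slot of the next; a single uncovered slot therefore yields $T_I = 2T_s$. Each $F_i(\tau, n)$ is estimated by the empirical CDF of the cell's ICT samples. Cells with no samples contribute 0, which is a choice made only for this sketch. `extract_icts` and `coverage_ratio` are hypothetical names.

```python
from typing import Dict, List, Sequence


def extract_icts(covered: Sequence[bool], ts: float) -> List[float]:
    """ICT samples of one grid cell from its per-slot coverage flags.

    covered[j] is True if at least one vehicle covers the cell at t = j * ts.
    Following Eq. (3), t0 is the last slot j of a coverage run (covered at j,
    not at j + 1), and T_I = (j' - j) * ts, where j' is the next covered slot.
    """
    icts: List[float] = []
    last_covered = None  # index of the most recent covered slot
    for j, is_covered in enumerate(covered):
        if is_covered:
            if last_covered is not None and j - last_covered > 1:
                icts.append((j - last_covered) * ts)  # a gap of uncovered slots just ended
            last_covered = j
    return icts


def coverage_ratio(icts_per_cell: Dict[int, List[float]], m: int, tau: float) -> float:
    """Empirical Eq. (5): the average over all m cells of F_i(tau), estimated as
    the fraction of cell i's ICT samples that are <= tau. Cells that are absent
    from the dict or have no samples contribute 0 in this sketch."""
    total = 0.0
    for samples in icts_per_cell.values():
        if samples:
            total += sum(1 for x in samples if x <= tau) / len(samples)
    return total / m
```

For the coverage pattern covered, covered, uncovered, uncovered, covered, uncovered, covered with $T_s$ = 60 s, `extract_icts` returns two ICTs, 180 s and 120 s. This mirrors the two $T_I$ values extracted in Fig. 4.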

¹ Although other datasets are available, they contain only a small number of vehicles, which is not enough for a comprehensive analysis. To evaluate our solution more completely, we also performed empirical data analysis on a small-scale taxi dataset from Rome, Italy [52], and found the same results as those detailed later in this paper. The details are omitted because that evaluation is limited.
² In the rest of this paper, the analysis of the Beijing taxi mobility traces uses only part of the traces from February 3; the Beijing traces from other days are analyzed only in Section 6.3 and Section 7.


Fig. 5. Distribution of the mobility traces of 1000 taxis on the Beijing and Shanghai maps during 6:00 am–12:00 am. Each red dot denotes a GPS point.

Fig. 6. CCDF of coverage durations with different numbers of vehicles: (a) results for the Beijing dataset (N = 1000, 2000, 3000, 4000); (b) results for the Shanghai dataset (N = 500, 1000, 1500, 2000). Both panels use log–log axes: time (seconds), 10¹ to 10⁵, versus CCDF of coverage durations, 10⁻⁶ to 10⁰.

Over 95 percent of coverage durations are shorter than 2 min, which is long enough for one or several sensing operations. In addition, the distributions of coverage durations are almost the same for different numbers of vehicles. Thus, the coverage durations may be determined only by vehicle speed and grid cell size, regardless of the number of vehicles. For this reason, we mainly focus on the ICTs and omit the effects of coverage durations.
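A minimal sketch of how CCDF curves like those in Fig. 6 can be computed from a list of coverage durations in seconds. `empirical_ccdf` and `loglog_slope` are hypothetical helper names. The least-squares slope is only a rough numerical check for a straight line on log–log axes; it is not the Akaike test described in Section 5.3.

```python
import numpy as np


def empirical_ccdf(samples):
    """Return (x, P[X > x]) evaluated at each distinct sample value x."""
    x = np.sort(np.asarray(samples, dtype=float))
    values = np.unique(x)
    # Number of samples strictly greater than each distinct value.
    greater = x.size - np.searchsorted(x, values, side="right")
    return values, greater / x.size


def loglog_slope(x, ccdf):
    """Least-squares slope of log10(CCDF) against log10(x), skipping zero
    CCDF points. A good straight-line fit suggests power-law-like behaviour."""
    x = np.asarray(x, dtype=float)
    ccdf = np.asarray(ccdf, dtype=float)
    mask = (x > 0) & (ccdf > 0)
    slope, _intercept = np.polyfit(np.log10(x[mask]), np.log10(ccdf[mask]), 1)
    return slope
```

The largest sample always has a CCDF of 0, so it cannot appear on a logarithmic axis. This is why `loglog_slope` masks it out.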

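Table 1. Descriptions of three statistical models: the name of the distribution, the range of the data, the Probability Density Function (PDF) $f(x)$, and the Cumulative Distribution Function (CDF) $F(x)$.

Distribution | Range | $f(x)$ | $F(x)$
Exponential | $x \ge a$ | $\lambda e^{-\lambda(x-a)}$ | $1 - e^{-\lambda(x-a)}$
Power-law | $x \ge a$ | $(\lambda-1)\,a^{\lambda-1}x^{-\lambda}$ | $1 - a^{\lambda-1}x^{1-\lambda}$
Truncated Pareto | $a \le x \le b$ | $\dfrac{\lambda a^{\lambda} x^{-\lambda-1}}{1-(a/b)^{\lambda}}$ | $1 - \dfrac{a^{\lambda}(x^{-\lambda} - b^{-\lambda})}{1-(a/b)^{\lambda}}$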
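To make the Table 1 formulas easy to check and to overlay on empirical CCDF plots, the sketch below transcribes them directly, writing λ as `lam`. The function names are hypothetical. A quick sanity check is that the numerical derivative of each CDF matches its PDF and that the truncated Pareto CDF goes from 0 at $x = a$ to 1 at $x = b$.

```python
import math

# Table 1 models. lam is the single fitted parameter; a (and b) are fixed bounds.


def pdf_exponential(x, a, lam):
    return lam * math.exp(-lam * (x - a))  # valid for x >= a


def cdf_exponential(x, a, lam):
    return 1.0 - math.exp(-lam * (x - a))


def pdf_power_law(x, a, lam):
    return (lam - 1) * a ** (lam - 1) * x ** (-lam)  # valid for x >= a, lam > 1


def cdf_power_law(x, a, lam):
    return 1.0 - a ** (lam - 1) * x ** (1 - lam)


def pdf_truncated_pareto(x, a, b, lam):
    return lam * a ** lam * x ** (-lam - 1) / (1.0 - (a / b) ** lam)  # a <= x <= b


def cdf_truncated_pareto(x, a, b, lam):
    return 1.0 - a ** lam * (x ** (-lam) - b ** (-lam)) / (1.0 - (a / b) ** lam)
```

The CCDF plotted in the figures is simply `1 - cdf_...(x, ...)`.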

5.3. Model selection method for evaluating the ICTD

Many measurement studies of human and vehicle mobility traces have shown that statistical patterns such as flight lengths, pause times, and inter-contact times resemble certain distributions: exponential, power-law, and truncated power-law distributions. Therefore, we use these three statistical models to estimate the ICTD, as shown in Table 1, where the truncated Pareto distribution has a power-law tendency at the head and decays exponentially at the tail. To identify, among the candidate models, the one best supported by the actual data, we perform the Akaike test [53,54] in the following three steps.


Step 1: Estimating the parameters of models using the Maximum Likelihood Estimation (MLE) method. Step 2: Calculating the Akaike’s Information Criterion (AIC) values for the models. The AIC value for model i 2 f1; 2; 3g is given by

$$\mathrm{AIC}_i = -2\log\left[L_i(\hat{\lambda}_i \mid \text{data } x)\right] + 2K_i, \tag{6}$$

where $L_i(\hat{\lambda}_i \mid \text{data } x)$ is the likelihood when the parameter $\lambda_i$ is set to its estimate from Step 1, given the known data set $x = \{x_1, x_2, \ldots\}$, and $K_i$ is the number of parameters estimated for model $i$ (here $K_1 = K_2 = K_3 = 1$³).
Step 3: Determine the best model. The Akaike Weight (AW) can be interpreted as the relative likelihood of each model. Let

$$\mathrm{AIC}_{\min} = \min_{i \in \{1,2,3\}} \{\mathrm{AIC}_i\}, \tag{7}$$

$$\Delta_i = \mathrm{AIC}_i - \mathrm{AIC}_{\min}, \quad i \in \{1,2,3\}. \tag{8}$$

Then, the AW values are given by

$$w_i = \frac{e^{-\Delta_i/2}}{\sum_{j=1}^{3} e^{-\Delta_j/2}}, \quad i \in \{1,2,3\}. \tag{9}$$

Thus, the best model is the one with the minimum AIC value, or equivalently the AW value closest to 1.

³ λ is the only unknown parameter of the three models; a and b are the known lower and upper bounds of the data $x_i \in x$.

5.4. Analysis results of the ICTD

We use the above model selection method to analyze the ICTD of the Beijing and Shanghai taxi mobility datasets, considering the effects of two factors. The MLE for the truncated Pareto distribution is performed over the range from two minutes up to the 99% quantile of each dataset. Similar results are observed for both datasets.

5.4.1. Effects of the size of grid cells

Intuitively, the larger the grid cell, the shorter the ICT. To analyze the effect of the grid cell size on the ICTD, we extract the mobility traces of 1000 taxis in Beijing and in Shanghai, and vary the grid cell size from 100 m × 100 m to 1000 m × 1000 m. The MLE and Akaike test results for the aggregated ICTD are shown in Table 2 and Table 3. The Akaike test yields AW = 1 for the truncated Pareto distribution regardless of the grid cell size. We also observe from Table 2 and Table 3 that the AIC value of the exponential distribution is smaller than that of the power-law distribution when the grid cell size is 100 m × 100 m, which means that the aggregated ICTD more closely resembles an exponential distribution than a power-law distribution. Consistently, Figs. 7(d) and 8(d) show that the CCDF (Complementary Cumulative Distribution Function) plots of the ICTs are almost straight lines in linear-log scale. As the grid cell size increases, the AIC value of the power-law distribution becomes smaller than that of the exponential distribution, which means that the aggregated ICTD more closely resembles a power-law distribution; Figs. 7(b), 7(c), 8(b) and 8(c) show that the CCDF plots of the ICTs are then almost straight lines in log–log scale. In all cases, the truncated Pareto distribution provides the best compromise.

Observation 1. The aggregated ICTD closely resembles a truncated Pareto distribution regardless of the grid cell size.

Table 2. The AIC and AW values for the ICT data from Beijing with different grid cell sizes. The number of taxis is 1000.

| Grid cell size | Exponential AIC | Exponential AW | Power-law AIC | Power-law AW | Truncated Pareto AIC | Truncated Pareto AW |
|---|---|---|---|---|---|---|
| 100 m × 100 m | 5,455,200 | 0 | 5,633,400 | 0 | 5,349,600 | 1 |
| 200 m × 200 m | 3,704,200 | 0 | 3,543,600 | 0 | 3,453,000 | 1 |
| 300 m × 300 m | 2,504,500 | 0 | 2,274,300 | 0 | 2,237,600 | 1 |
| 400 m × 400 m | 1,718,100 | 0 | 1,503,500 | 0 | 1,486,000 | 1 |
| 500 m × 500 m | 1,211,800 | 0 | 1,035,300 | 0 | 1,025,300 | 1 |
| 600 m × 600 m | 848,750 | 0 | 714,530 | 0 | 708,570 | 1 |
| 700 m × 700 m | 596,690 | 0 | 494,840 | 0 | 491,150 | 1 |
| 800 m × 800 m | 443,270 | 0 | 366,320 | 0 | 363,640 | 1 |
| 900 m × 900 m | 326,700 | 0 | 271,080 | 0 | 268,990 | 1 |
| 1000 m × 1000 m | 248,060 | 0 | 207,990 | 0 | 206,260 | 1 |

Table 3. The AIC and AW values for the ICT data from Shanghai with different grid cell sizes. The number of taxis is 1000.

| Grid cell size | Exponential AIC | Exponential AW | Power-law AIC | Power-law AW | Truncated Pareto AIC | Truncated Pareto AW |
|---|---|---|---|---|---|---|
| 100 m × 100 m | 5,403,800 | 0 | 5,475,300 | 0 | 5,205,500 | 1 |
| 200 m × 200 m | 3,824,400 | 0 | 3,548,700 | 0 | 3,477,800 | 1 |
| 300 m × 300 m | 2,777,200 | 0 | 2,478,300 | 0 | 2,444,500 | 1 |
| 400 m × 400 m | 1,998,800 | 0 | 1,724,800 | 0 | 1,707,700 | 1 |
| 500 m × 500 m | 1,440,800 | 0 | 1,210,000 | 0 | 1,200,700 | 1 |
| 600 m × 600 m | 1,052,500 | 0 | 867,700 | 0 | 862,100 | 1 |
| 700 m × 700 m | 764,040 | 0 | 625,410 | 0 | 621,500 | 1 |
| 800 m × 800 m | 574,050 | 0 | 467,290 | 0 | 464,600 | 1 |
| 900 m × 900 m | 427,220 | 0 | 347,520 | 0 | 345,500 | 1 |
| 1000 m × 1000 m | 326,950 | 0 | 268,690 | 0 | 266,940 | 1 |
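The three-step model selection of Section 5.3 can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the ICT samples are assumed to be already restricted to [a, b], the exponential and power-law MLEs use their standard closed forms, and the truncated Pareto MLE is found by bisection on the score function, which is valid because that log-likelihood is concave in λ. All helper names are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple


def mle_exponential(xs: List[float], a: float) -> float:
    # Closed form: lambda_hat = 1 / (mean(x) - a).
    return len(xs) / sum(x - a for x in xs)


def mle_power_law(xs: List[float], a: float) -> float:
    # Closed form: lambda_hat = 1 + n / sum(log(x / a)).
    return 1.0 + len(xs) / sum(math.log(x / a) for x in xs)


def mle_truncated_pareto(xs: List[float], a: float, b: float,
                         lo: float = 1e-6, hi: float = 100.0) -> float:
    # The log-likelihood is concave in lambda, so its derivative (the score)
    # is decreasing and its root can be bracketed and bisected on [lo, hi].
    n = len(xs)
    s = sum(math.log(x / a) for x in xs)
    log_r = math.log(a / b)  # negative, since a < b

    def score(lam: float) -> float:
        u = math.exp(lam * log_r)  # (a/b) ** lam
        return n / lam - s + n * u * log_r / (1.0 - u)

    if score(lo) <= 0.0:
        return lo
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def akaike_test(xs: List[float], a: float,
                b: float) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Steps 1-3: MLE, AIC as in Eq. (6), Akaike weights as in Eq. (9)."""
    n = len(xs)
    sum_log = sum(math.log(x) for x in xs)
    lam_e = mle_exponential(xs, a)
    lam_p = mle_power_law(xs, a)
    lam_t = mle_truncated_pareto(xs, a, b)
    log_lik = {
        "exponential": n * math.log(lam_e) - lam_e * sum(x - a for x in xs),
        "power-law": (n * math.log(lam_p - 1.0)
                      + n * (lam_p - 1.0) * math.log(a) - lam_p * sum_log),
        "truncated Pareto": (n * math.log(lam_t) + n * lam_t * math.log(a)
                             - (lam_t + 1.0) * sum_log
                             - n * math.log(1.0 - (a / b) ** lam_t)),
    }
    k = 1  # one estimated parameter (lambda) per model
    aic = {m: -2.0 * ll + 2 * k for m, ll in log_lik.items()}
    aic_min = min(aic.values())
    rel = {m: math.exp(-(v - aic_min) / 2.0) for m, v in aic.items()}
    total = sum(rel.values())
    return aic, {m: r / total for m, r in rel.items()}


if __name__ == "__main__":
    # Synthetic data only: shifted exponential ICTs with a = 2 min.
    rng = random.Random(42)
    xs = [2.0 + rng.expovariate(0.5) for _ in range(5000)]
    aic, aw = akaike_test(xs, a=2.0, b=max(xs))
    print(min(aic, key=aic.get), {m: round(w, 3) for m, w in aw.items()})
```

On this synthetic exponential sample the exponential model should win with an AW close to 1. With AIC gaps as large as those in Tables 2 and 3, $e^{-\Delta_i/2}$ underflows to 0 for the losing models, which is why their AW values appear as exactly 0.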

5.4.2. Effects of the number of vehicles

Intuitively, the more vehicles there are, the shorter the ICT. To analyze the effect of the number of vehicles on the ICTD, we fix the grid cell size at 100 m × 100 m and extract the mobility traces of different numbers of taxis in Beijing and in Shanghai. The MLE and Akaike test results for the aggregated ICTD are shown in Table 4 and Table 5. The Akaike test yields AW = 1 for the truncated Pareto distribution regardless of the number of vehicles. We also observe that in Beijing the AIC value of the exponential distribution is smaller than that of the power-law distribution when the number of vehicles is at most 1500, which means that the aggregated ICTD more closely resembles an exponential distribution than a power-law distribution; when the number of vehicles exceeds 1500, it more closely resembles a power-law distribution. Similarly, in Shanghai the aggregated ICTD more closely resembles an exponential distribution when the number of vehicles is below 1000, and a power-law distribution otherwise. Figs. 9 and 10 illustrate these trends. In all cases, the truncated Pareto distribution provides the best compromise.

Observation 2. The aggregated ICTD closely resembles a truncated Pareto distribution regardless of the number of vehicles.

Fig. 7. The CCDF of ICTs collected from 1000 taxis in Beijing with different grid cell sizes: (a) 100 m × 100 m; (b) 500 m × 500 m; (c) 1000 m × 1000 m; (d) 100 m × 100 m.

Fig. 8. The CCDF of ICTs collected from 1000 taxis in Shanghai with different grid cell sizes: (a) 100 m × 100 m; (b) 500 m × 500 m; (c) 1000 m × 1000 m; (d) 100 m × 100 m.

Table 4. The AIC and AW values for the ICT data from Beijing with different numbers of vehicles. The grid cell size is 100 m × 100 m.

| Vehicle number | Exponential AIC | Exponential AW | Power-law AIC | Power-law AW | Truncated Pareto AIC | Truncated Pareto AW |
|---|---|---|---|---|---|---|
| 100 | 816,710 | 0 | 925,700 | 0 | 815,410 | 1 |
| 500 | 3,025,200 | 0 | 3,233,000 | 0 | 3,007,000 | 1 |
| 1000 | 5,455,200 | 0 | 5,633,400 | 0 | 5,349,600 | 1 |
| 1500 | 7,656,600 | 0 | 7,700,300 | 0 | 7,392,900 | 1 |
| 2000 | 9,662,000 | 0 | 9,526,900 | 0 | 9,208,600 | 1 |
| 2500 | 11,531,000 | 0 | 11,181,000 | 0 | 10,857,000 | 1 |
| 3000 | 13,221,000 | 0 | 12,640,000 | 0 | 12,314,000 | 1 |
| 3500 | 14,753,000 | 0 | 13,950,000 | 0 | 13,622,000 | 1 |
| 4000 | 16,060,000 | 0 | 15,038,000 | 0 | 1,471,200 | 1 |

Table 5. The AIC and AW values for the ICT data from Shanghai with different numbers of vehicles. The grid cell size is 100 m × 100 m.

| Vehicle number | Exponential AIC | Exponential AW | Power-law AIC | Power-law AW | Truncated Pareto AIC | Truncated Pareto AW |
|---|---|---|---|---|---|---|
| 100 | 710,520 | 0 | 789,850 | 0 | 704,520 | 1 |
| 500 | 3,044,100 | 0 | 3,142,800 | 0 | 2,967,100 | 1 |
| 1000 | 5,475,300 | 0 | 5,403,800 | 0 | 5,205,500 | 1 |
| 1500 | 7,486,600 | 0 | 7,168,700 | 0 | 6,967,100 | 1 |
| 2000 | 9,359,400 | 0 | 8,739,300 | 0 | 8,540,600 | 1 |

Fig. 9. The CCDF of ICTs collected from different numbers of taxis in Beijing with the grid cell size set to 100 m × 100 m. Taxi number: (a) 100; (b) 2000; (c) 3000; (d) 4000.

Fig. 10. The CCDF of ICTs collected from different numbers of taxis in Shanghai with the grid cell size set to 100 m × 100 m. Taxi number: (a) 100; (b) 500; (c) 1500; (d) 2000.

5.5. Evaluation on mobility models

Based on the above empirical analysis, opportunistic coverage by vehicles exhibits two important features: coverage imbalance and truncated power-law ICTs. To identify the reasons behind these patterns, we evaluate the two features under four known mobility models. Random Way Point (RWP) and Random Direction (RD) are two of the most widely used models, mainly because they are simple to implement and analyze. However, many studies have observed that they fail to capture several features of human or vehicle mobility: (F1) spatio-temporal correlation, (F2) geographic restrictions, (F3) heterogeneous bounded mobility areas, (F4) truncated power-law flights and pause times, (F5) truncated power-law inter-contact times, and (F6) fractal waypoints. To capture features (F1) and (F2), the authors of [55] introduced the Manhattan model, which simulates the movement of mobile nodes on streets defined by maps. To capture features (F3)–(F6), the authors of [21] introduced the Self-similar Least Action Walk (SLAW) model, which characterizes several significant statistical patterns of human mobility. Although the impact of these mobility models on routing performance in Mobile Ad hoc NETworks (MANETs) and DTNs has been extensively analyzed, here we evaluate them from the perspective of opportunistic coverage. As a byproduct, the evaluation results also provide guidelines on selecting proper mobility models for verifying cooperative sensing or coverage algorithms (e.g., [10,29]).

In our simulations, 1000 nodes move in an area of 30 km × 30 km for 18 h. In particular, we set the maximum pause time of nodes to 1 h in the RWP and RD models, and the maximum movement time to 1 h in the RD model. In the Manhattan model, the urban map is composed of 250 horizontal streets and 250 vertical streets, and the movement patterns of nodes follow [55]. We set the movement velocity of each node to 20 km/h. The parameters of the SLAW model are summarized in Table 6; for their detailed meanings, please refer to [21].

On the one hand, we observe the distributions of coverage times of the 900 grid cells under the four mobility models, with the grid cell size set to 1000 m × 1000 m, as shown in Fig. 11. Coverage imbalance appears only in the SLAW model. This is because in the other three models all nodes move randomly over the whole area, so each node is equally likely to reach any location. In contrast, the SLAW model captures feature (F6): the waypoints of nodes are modeled as fractal points, reflecting the fact that people are more attracted to popular places.

On the other hand, we observe the aggregated ICTD from the MLE and Akaike test results under these mobility models, as shown in Table 7; the corresponding CCDF plots of the ICTs are shown in Fig. 12. Truncated power-law ICTs appear only in the SLAW model, whereas the other three models yield exponential ICTs. These patterns can be explained by the queuing model proposed by Barabási [28]. A vehicle's visits to different grid cells can be treated as the execution of different tasks. According to Barabási's model, first-come-first-serve and random task execution lead to uniform Poisson-like dynamics, in which the time interval between two consecutive actions by the same individual, called the waiting or inter-event time, follows an exponential distribution. In contrast, when individuals execute tasks based on some perceived priority, the timing of the tasks is heavy tailed: most tasks are executed rapidly, whereas a few experience very long waiting times. The movement patterns of nodes under the RWP, RD and Manhattan models resemble random task execution, thus leading to exponential ICTs. The SLAW model, by contrast, first generates fractal waypoints over a 2D map. Then each node selects a subset of fractal waypoint clusters and restricts its movement to its own designated set of clusters. Finally, each node moves among these fractal waypoints following the Least Action Trip Planning (LATP) principle. This movement pattern resembles priority-based task execution, thus leading to power-law ICTs.

Table 6. SLAW model parameter settings.

| Parameter | Value |
|---|---|
| Distance alpha | 3 |
| Number of waypoints | 10,000 |
| Hurst parameter | 0.75 |
| Clustering range | 100 m |
| Levy exponent for pause time | 1 |
| Minimum/maximum pause time | 30 s/3600 s |

6. Opportunistic coverage analysis

Recall from Section 4 that we must first obtain the ICTD $F_i(\tau; n)$ of each grid cell, and then the opportunistic coverage ratio $f_I(\tau)$ of an urban area. However, this is challenging for two reasons: (1) some grid cells (especially those that vehicles seldom visit) do not have enough ICT samples to estimate their distributions accurately, and (2) the number of grid cells may be so large (e.g., there are 90,000 grid cells within the 5th Ring Road of Beijing if the grid cell size is 100 m × 100 m) that the computation cost of fitting the ICTDs of all grid cells is very high. In this section, we first take several grid cells as examples to analyze the sensing quality of individual grid cells in a relatively accurate way. Then, we analyze the opportunistic coverage ratio of the whole urban area based on the aggregated ICTD of all grid cells, in a simple and effective way. Finally, we show how the opportunistic coverage ratio changes on different days of a week in Beijing, which also verifies the effectiveness of our proposed models and methods.

6.1. Opportunistic coverage for individual grid cells

We select two grid cells (1000 m × 1000 m), one in Beijing and one in Shanghai, and analyze their ICTDs by MLE and the Akaike test. The mobility traces of different numbers of vehicles are extracted to analyze the relationship between the ICTD and the number of vehicles. For each specified number of vehicles, we extract multiple mutually exclusive groups of vehicles and aggregate the results. As described before, the MLE for the truncated Pareto distribution is performed over the range from two minutes up to the 99% quantile of all samples. The results show that the ICTD of each grid cell follows a truncated Pareto distribution regardless of the number of vehicles. Thus, the ICTD of each grid cell can be expressed as

$$F_i(\tau; n) = 1 - \frac{a^{\lambda}\left(\tau^{-\lambda} - b^{-\lambda}\right)}{1 - (a/b)^{\lambda}}. \tag{10}$$

Fig. 13 shows the relationship between the exponent λ of the truncated Pareto distribution and the number of vehicles n for the two grid cells in Beijing and Shanghai. Least-squares linear regression fits these relationships well (all coefficients of determination are larger than 0.98). Thus, we can express the relationship between λ and n with linear functions. Combining these linear functions with Eq. (10), we obtain the ICTD of each grid cell, as shown in Fig. 14.
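A minimal sketch of this procedure: fit λ as a linear function of n by ordinary least squares, then plug the fitted λ(n) into Eq. (10). The (n, λ) pairs below are hypothetical placeholders, since the values behind Fig. 13 are not listed in the text, and the bounds a = 2 min and b = 120 min are likewise illustrative assumptions; the function names are our own.

```python
from typing import List, Tuple


def fit_line(ns: List[float], lams: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit lam = slope * n + intercept; returns (slope, intercept, R^2)."""
    k = len(ns)
    mean_n = sum(ns) / k
    mean_y = sum(lams) / k
    sxx = sum((n - mean_n) ** 2 for n in ns)
    sxy = sum((n - mean_n) * (y - mean_y) for n, y in zip(ns, lams))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_n
    ss_res = sum((y - (slope * n + intercept)) ** 2 for n, y in zip(ns, lams))
    ss_tot = sum((y - mean_y) ** 2 for y in lams)
    return slope, intercept, 1.0 - ss_res / ss_tot


def ictd_cell(tau: float, n: float, slope: float, intercept: float,
              a: float, b: float) -> float:
    """F_i(tau; n) of Eq. (10) with lam = slope * n + intercept, for a <= tau <= b."""
    lam = slope * n + intercept
    return 1.0 - a ** lam * (tau ** (-lam) - b ** (-lam)) / (1.0 - (a / b) ** lam)


if __name__ == "__main__":
    # Hypothetical (vehicle number, fitted exponent) pairs for one grid cell.
    ns = [500.0, 1000.0, 1500.0, 2000.0]
    lams = [0.61, 0.89, 1.21, 1.49]
    slope, intercept, r2 = fit_line(ns, lams)
    print(f"lambda(n) = {slope:.2e} * n + {intercept:.3f}  (R^2 = {r2:.3f})")
    for n in (500, 1000, 2000):
        # P(ICT <= 30 min) for this cell, with a = 2 min and b = 120 min.
        print(n, round(ictd_cell(30.0, n, slope, intercept, a=2.0, b=120.0), 3))
```

With a positive slope, a larger n gives a larger λ and hence a larger $F_i(\tau; n)$ at any fixed τ, which matches the intuition that more vehicles shorten the ICT.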

Fig. 11. The distributions of coverage times for 30 × 30 grid cells under four mobility models: (a) RWP model; (b) RD model; (c) Manhattan model; (d) SLAW model. The grid cell size is 1000 m × 1000 m. All values are normalized.

Table 7. The AIC and AW values for the ICT data under four mobility models. The grid cell size is 100 m × 100 m.

| Mobility model | Exponential AIC | Exponential AW | Power-law AIC | Power-law AW | Truncated Pareto AIC | Truncated Pareto AW |
|---|---|---|---|---|---|---|
| RWP | 7,447,100 | 1 | 8,353,000 | 0 | 7,611,600 | 0 |
| RD | 8,249,200 | 1 | 9,310,200 | 0 | 8,378,200 | 0 |
| Manhattan | 11,301,000 | 1 | 12,475,000 | 0 | 11,491,000 | 0 |
| SLAW | 2,809,200 | 0 | 2,687,000 | 0 | 2,590,200 | 1 |

6.2. Opportunistic coverage for the whole urban area

We denote by $p(n)$ the ratio of grid cells that can be opportunistically covered at least once during the entire time span T; $p(n)$ increases with the number of vehicles n. To avoid the problems of insufficient samples for individual grid cells and high computation cost, we assume that all these grid cells share the same ICTD and analyze the opportunistic coverage ratio based on the aggregated ICTD $F_a(\tau; n)$. The opportunistic coverage ratio of the whole urban area can then be expressed as

$$f_I(\tau) = \frac{F_a(\tau; n) \cdot m \cdot p(n)}{m} = F_a(\tau; n) \cdot p(n). \tag{11}$$

The expression of $F_a(\tau; n)$ can be obtained in the same way as in Section 6.1, and a linear function also fits $p(n)$ well. Combining these expressions with Eq. (11), we obtain a general expression for $f_I(\tau)$. The numerical results for the Beijing and Shanghai mobility datasets are shown in Fig. 15: $f_I(\tau)$ increases monotonically with both n and τ. Thus, we can easily estimate the number of vehicles required to achieve a specific coverage quality. For example, at least 1700 vehicles in Beijing and 1900 vehicles in Shanghai are needed for the opportunistic coverage ratio to be at least 50% over a time interval of one hour. Although different cities may need different numbers of vehicles to achieve a given opportunistic coverage ratio, our proposed models and methods provide general guidelines for network planning.
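Under linear fits for λ(n) and p(n), Eq. (11) turns vehicle planning into a one-dimensional search over n. The sketch below is illustrative only: the coefficients of λ(n) and p(n) are hypothetical (p(n) is additionally clipped to [0, 1]), the function names are our own, and it does not reproduce the 1700/1900 figures, which depend on the values fitted from the real traces.

```python
from typing import Optional, Tuple


def coverage_ratio(tau: float, n: int, lam_coef: Tuple[float, float],
                   p_coef: Tuple[float, float], a: float, b: float) -> float:
    """f_I(tau) = F_a(tau; n) * p(n) as in Eq. (11).

    lam(n) = lam_coef[0] * n + lam_coef[1] and p(n) = p_coef[0] * n + p_coef[1]
    are hypothetical linear fits; p(n) is clipped to [0, 1].
    """
    lam = lam_coef[0] * n + lam_coef[1]
    if tau <= a:
        f_a = 0.0
    elif tau >= b:
        f_a = 1.0
    else:  # truncated Pareto CDF, same form as Eq. (10)
        f_a = 1.0 - a ** lam * (tau ** (-lam) - b ** (-lam)) / (1.0 - (a / b) ** lam)
    p = min(1.0, max(0.0, p_coef[0] * n + p_coef[1]))
    return f_a * p


def min_vehicles(target: float, tau: float, n_max: int,
                 lam_coef: Tuple[float, float], p_coef: Tuple[float, float],
                 a: float, b: float) -> Optional[int]:
    """Smallest n in [1, n_max] with f_I(tau) >= target, or None if none exists."""
    for n in range(1, n_max + 1):
        if coverage_ratio(tau, n, lam_coef, p_coef, a, b) >= target:
            return n
    return None


if __name__ == "__main__":
    lam_coef = (6e-4, 0.3)  # hypothetical: lam(n) = 6e-4 * n + 0.3
    p_coef = (2e-4, 0.2)    # hypothetical: p(n) = 2e-4 * n + 0.2
    n_star = min_vehicles(0.5, 60.0, 4000, lam_coef, p_coef, a=2.0, b=120.0)
    print("vehicles needed for f_I(60 min) >= 50%:", n_star)
```

Because $f_I(\tau)$ increases monotonically with n when both slopes are positive, the first n that meets the target is the minimum; a bisection search would give the same answer with fewer evaluations.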

6.3. The regularity of opportunistic coverage ratio

Intuitively, people have different mobility patterns on different days, e.g., workdays, rest days, or festivals. Thus, it is important to investigate the regularity of the opportunistic coverage ratio for the whole urban area. From the previous subsection, by analyzing the taxi mobility traces of February 3, 2008 within the 5th Ring Road in Beijing, we know that at least 1700 taxis are needed to guarantee an opportunistic coverage ratio of at least 50% over a time interval of one hour. In this subsection, we further analyze how the opportunistic coverage ratio changes on different days (February 4 to February 7) of the week. In particular, we apply the same data preprocessing described in Section 5.1 to the mobility traces of these four days, and obtain the GPS points of different numbers of taxis every 60 s during 6:00–24:00 within the 5th Ring Road in Beijing. The total numbers of taxis contained in the mobility traces of each day are shown in Table 8. Note that the mobility traces of February 2 and February 8 are not analyzed, because on those two days no taxi has at least one GPS report every 30 min during 6:00–24:00. We randomly select three groups of taxis (1700 taxis per group) on each of the four days (February 4 to February 7) and analyze the opportunistic coverage ratios they achieve. The results are shown in Table 8. The opportunistic coverage ratios on February 4 and February 5 average 50.63% and 48.26%, respectively, both very close to 50%. However, the opportunistic coverage ratios on February 6 and February 7 are both well below 50%; on February 7 the average is only 30.84%. We note that the Chinese Spring Festival (Chinese New Year) fell on February 7, 2008.
On New Year's Eve and on the Spring Festival, most Chinese people gather at home for dinner and celebrations and seldom go out. Thus, taxis had fewer customers and remained stationary for longer than usual on these two days. We conjecture that this is an important reason why the opportunistic coverage ratios are reduced on these two days. It also suggests that the regularity of human mobility patterns on different days should be taken into account when performing network planning for opportunistic urban sensing applications. On the other hand, different groups with the same number of taxis on the same day achieve almost the same opportunistic coverage ratio (a difference of at most 1.37 percentage points). This shows that the relationship between the opportunistic coverage ratio and the number of vehicles is stable, and that our proposed models and methods can characterize this relationship accurately.

Table 8. Opportunistic coverage ratios within the 5th Ring Road in Beijing achieved by 1700 randomly selected taxis on different days.

| Date | # of taxis in total | Group 1 (%) | Group 2 (%) | Group 3 (%) | Average (%) |
|---|---|---|---|---|---|
| February 4, 2008 | 3982 | 50.00 | 51.34 | 50.56 | 50.63 |
| February 5, 2008 | 3727 | 47.88 | 49.13 | 47.76 | 48.26 |
| February 6, 2008 | 3549 | 35.66 | 35.95 | 35.38 | 35.66 |
| February 7, 2008 | 3600 | 31.17 | 31.24 | 30.10 | 30.84 |

Fig. 12. The CCDF $P(T_I > \tau)$ of ICTs versus τ (min), collected from mobility traces of 1000 nodes during 18 h generated by four mobility models, with exponential, power-law and truncated Pareto fits: (a) RWP model; (b) RD model; (c) Manhattan model; (d) SLAW model. The grid cell size is 100 m × 100 m.

Fig. 13. The relation between the exponent λ of the ICTD and the number of vehicles n (exponents and fitted lines) for two grid cells in Beijing and Shanghai: (a) Exponents of ICTD in Beijing; (b) Exponents of ICTD in Shanghai.

Fig. 14. The relationship among $F_i(\tau; n)$, n, and τ (min) for two grid cells in Beijing and Shanghai: (a) ICTD of one grid cell in Beijing; (b) ICTD of one grid cell in Shanghai.

Fig. 15. The relationship among $f_I(\tau)$, n, and τ (min) for the Beijing and Shanghai mobility datasets: (a) Results for Beijing datasets; (b) Results for Shanghai datasets.

7. Vehicle selection algorithm

In practice, it is hard to guarantee that all grid cells are covered during a specific time interval, because there are always some places, such as parks and lakes, that vehicles cannot pass through, or remote places that vehicles seldom reach. Thus, it is reasonable to require only that the opportunistic coverage ratio be no less than a specified threshold. To handle this problem, we divide the time span T into l coverage periods of equal length τ. Assume that a set of grid cells $G_x \subseteq G$ can be opportunistically covered during the x-th ($1 \le x \le l$) coverage period by a randomly selected set of vehicles $V_r = \{v_{r1}, v_{r2}, \ldots, v_{rn}\}$. The problem then becomes: how can we select the minimum number of vehicles such that the same grid cells are still opportunistically covered during each coverage period?

Definition 4 (Coverage contribution matrix). Let $d_{k,i}(x)$ denote the number of times that vehicle $v_k \in V_r$ covers grid cell $g_i \in G$ within the x-th ($x = 1, 2, \ldots, l$) coverage period. A coverage contribution matrix can then be formulated for each vehicle $v_k$:

$$D_k = \begin{bmatrix} d_{k,1}(1) & d_{k,2}(1) & \cdots & d_{k,m-1}(1) & d_{k,m}(1) \\ d_{k,1}(2) & d_{k,2}(2) & \cdots & d_{k,m-1}(2) & d_{k,m}(2) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ d_{k,1}(l) & d_{k,2}(l) & \cdots & d_{k,m-1}(l) & d_{k,m}(l) \end{bmatrix}_{l \times m}.$$
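Definition 4 does not spell out how a "cover" is counted from raw GPS samples, so the sketch below makes one assumption explicit: consecutive samples of the same vehicle in the same cell and coverage period count as a single cover, and a new cover starts whenever the cell or the period changes. The record format (vehicle id, timestamp in seconds, grid-cell index) and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical trace record: (vehicle id, timestamp in seconds, grid-cell index 0..m-1).
Record = Tuple[str, float, int]


def coverage_matrices(records: List[Record], t0: float, tau: float,
                      num_periods: int, num_cells: int) -> Dict[str, List[List[int]]]:
    """Build D_k (l x m, with l = num_periods and m = num_cells) for every vehicle.

    D_k[x][i] plays the role of d_{k,i}(x + 1), i.e. periods are 0-based here.
    """
    samples: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for vid, t, cell in records:
        samples[vid].append((t, cell))

    matrices: Dict[str, List[List[int]]] = {}
    for vid, pts in samples.items():
        d = [[0] * num_cells for _ in range(num_periods)]
        prev: Optional[Tuple[int, int]] = None  # (period, cell) of the previous sample
        for t, cell in sorted(pts):
            x = int((t - t0) // tau)  # 0-based coverage period index
            if not 0 <= x < num_periods:
                continue  # sample lies outside the time span T
            if (x, cell) != prev:
                d[x][cell] += 1  # a new cover of this cell in period x
            prev = (x, cell)
        matrices[vid] = d
    return matrices


if __name__ == "__main__":
    trace = [("v1", 0, 0), ("v1", 60, 0), ("v1", 120, 1), ("v1", 180, 0),
             ("v2", 3700, 2)]
    print(coverage_matrices(trace, t0=0.0, tau=3600.0, num_periods=2, num_cells=3))
    # {'v1': [[2, 1, 0], [0, 0, 0]], 'v2': [[0, 0, 0], [0, 0, 1]]}
```

In the example, v1 leaves cell 0 for cell 1 and then returns, so cell 0 is counted as covered twice in the first one-hour period; the algorithm below only uses whether $d_{k,i}(x) \ge 1$, so the exact counting rule matters less than which cells are reached.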

The vehicle selection problem can be formally formulated as follows.

Definition 5 (Vehicle selection problem). We are given a randomly selected set of mobile vehicles $V_r = \{v_{r1}, v_{r2}, \ldots, v_{rn}\}$. Each vehicle $v_k$ has a coverage contribution matrix $D_k$, such that the set of grid cells $G_x \subseteq G$ can be opportunistically covered during the x-th coverage period, $1 \le x \le l$, namely:

$$\sum_{k=r1}^{rn} d_{k,i}(x) \ge 1, \quad \forall g_i \in G_x,\ \forall x = 1, 2, \ldots, l.$$

The problem is to find the minimum-size subset $V_{\min} \subseteq V_r$ satisfying

$$\sum_{v_k \in V_{\min}} d_{k,i}(x) \ge 1, \quad \forall g_i \in G_x,\ \forall x = 1, 2, \ldots, l.$$

For the special case l = 1, this problem is exactly the Set Cover problem, which is NP-hard [56]. Thus, the more general vehicle selection problem is also NP-hard. To solve it, we adopt a greedy strategy, which runs in polynomial time and, for Set Cover, achieves an approximation ratio of at most $\ln N + 1$, where N is the number of elements to be covered (the $(1 - 1/e)$ guarantee of the greedy strategy applies to the related Maximum Coverage problem). However, we need to change the evaluation function to fit the more general vehicle selection problem. For each vehicle $v_k$, we define the evaluation function

$$f(v_k) = \sum_{x=1}^{l} \sum_{g_i \in G_x} \operatorname{sgn}\left(d_{k,i}(x)\right),$$

which counts the still-uncovered grid cells, summed over all coverage periods, that vehicle $v_k$ would opportunistically cover if it were added to $V_{\min}$. The details of the vehicle selection algorithm are shown in Algorithm 1. First, we initialize the set of selected vehicles $V_{\min}$ as an empty set (line 2). The algorithm then picks the best candidate vehicle $v_b$ in each iteration, and stops when all grid cells have been removed from every $G_x$ ($x = 1, 2, \ldots, l$). In each iteration, we first compute the evaluation function $f(v_k)$ for each vehicle $v_k$, and then select the best candidate vehicle $v_b$ that maximizes $f(v_k)$ (line 4). After $v_b$ is selected, we remove $g_i$ from $G_x$ if $g_i$ is opportunistically covered by $v_b$ within the x-th coverage period, namely if $d_{b,i}(x) \ge 1$ (lines 5–9). At the same time, $v_b$ is removed from $V_r$ and placed in $V_{\min}$ (lines 10 and 11). When the algorithm terminates, $V_{\min}$ is a subset of $V_r$ that achieves the same opportunistic coverage ratio as $V_r$.

Fig. 16. Evaluation results of the vehicle selection algorithm with different numbers of vehicles in $V_r$: (a) ratio of selected vehicles, $|V_{\min}|/|V_r|$; (b) opportunistic coverage ratio achieved by all vehicles (HS), all vehicles (TS), and selected vehicles (TS). The x-axis of both panels is the number of vehicles in $V_r$.

Algorithm 1. Vehicle selection algorithm

1: Given $\{G_1, G_2, \ldots, G_l\}$, $V_r = \{v_{r1}, v_{r2}, \ldots, v_{rn}\}$, $D_k$ ($k = r1, r2, \ldots, rn$);
2: $V_{\min} \leftarrow \emptyset$;
3: while $G_1 \ne \emptyset \lor G_2 \ne \emptyset \lor \cdots \lor G_l \ne \emptyset$ do
4:   Select $v_b = v_k \in V_r$ that maximizes $f(v_k) = \sum_{x=1}^{l} \sum_{g_i \in G_x} \operatorname{sgn}(d_{k,i}(x))$;
5:   for each $g_i \in G_x$ ($x = 1, 2, \ldots, l$) do
6:     if $d_{b,i}(x) \ge 1$ then
7:       $G_x \leftarrow G_x \setminus \{g_i\}$;
8:     end if
9:   end for
10:  $V_r \leftarrow V_r \setminus \{v_b\}$;
11:  $V_{\min} \leftarrow V_{\min} \cup \{v_b\}$;
12: end while
13: return $V_{\min}$
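For readers who want to experiment, Algorithm 1 can be sketched in Python as follows. This is a hedged illustration, not the authors' VC++ implementation: the data layout (targets[x] as the set $G_x$ of cell indices, d[v][x][i] as $d_{v,i}(x)$ with 0-based periods), the tie-breaking rule (smallest vehicle id among equal scores), and the error raised when some target cell cannot be covered are all our own assumptions.

```python
from typing import Dict, List, Set

Matrix = List[List[int]]  # coverage contribution matrix D_k, indexed [period][cell]


def gain(matrix: Matrix, remaining: List[Set[int]]) -> int:
    """Evaluation function f(v_k): still-uncovered cells, summed over periods,
    that this vehicle covers at least once."""
    return sum(1 for x, cells in enumerate(remaining) for i in cells if matrix[x][i] >= 1)


def select_vehicles(targets: List[Set[int]], d: Dict[str, Matrix]) -> List[str]:
    remaining = [set(cells) for cells in targets]  # working copies of G_1..G_l
    candidates = dict(d)                           # V_r
    selected: List[str] = []                       # V_min
    while any(remaining):
        if not candidates:
            raise ValueError("some target cells cannot be covered by V_r")
        # Line 4: pick v_b maximizing f(v_k); sorted() makes ties deterministic.
        best = max(sorted(candidates), key=lambda v: gain(candidates[v], remaining))
        if gain(candidates[best], remaining) == 0:
            raise ValueError("some target cells cannot be covered by V_r")
        # Lines 5-9: drop every g_i in G_x that v_b covers in period x.
        for x, cells in enumerate(remaining):
            cells.difference_update({i for i in cells if candidates[best][x][i] >= 1})
        # Lines 10-11: move v_b from V_r to V_min.
        del candidates[best]
        selected.append(best)
    return selected


if __name__ == "__main__":
    d = {"v1": [[1, 1, 0], [0, 0, 0]],
         "v2": [[0, 1, 1], [1, 0, 0]],
         "v3": [[0, 0, 1], [0, 0, 0]]}
    print(select_vehicles([{0, 1, 2}, {0}], d))  # ['v2', 'v1']
```

In the example, v2 is picked first because it covers three (cell, period) pairs; v1 then covers the only remaining cell, and v3 turns out to be redundant.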

To evaluate the vehicle selection algorithm, we implemented a standalone VC++ simulation platform and performed simulations based on the taxi mobility traces in Beijing. In particular, the dataset is divided into two parts. The mobility traces of the first three days (February 3 to February 5) are used as the history set (HS), from which the coverage contribution matrices of all vehicles are measured and a set of vehicles is selected by the vehicle selection algorithm. The mobility traces of the last two days (February 6 and February 7) are used as the test set (TS), on which the opportunistic coverage ratio is re-evaluated using the selected vehicles. The coverage period τ is set to one hour. We vary the number of vehicles in $V_r$ from 500 to 4000 in increments of 500. Fig. 16(a) shows the ratio of selected vehicles to all vehicles, $|V_{\min}|/|V_r|$. The ratio decreases as the number of vehicles in $V_r$ increases, indicating growing redundancy; it drops to only 57.8% when $V_r$ contains 4000 vehicles. Fig. 16(b) compares the opportunistic coverage ratios achieved by all vehicles in $V_r$ and by the selected vehicles in $V_{\min}$. Since the vehicle selection algorithm is executed on the HS, the selected vehicles in $V_{\min}$ achieve the same opportunistic coverage ratio on the HS as all vehicles in $V_r$. More importantly, Fig. 16(b) shows that on the TS the selected vehicles achieve almost the same opportunistic coverage ratio as all vehicles (a difference of at most 4.22%). This implies that vehicle mobility traces are often stable enough to allow meaningful prediction of future traces from a short history. Thus, exploiting the history traces of vehicles to select a subset of vehicles is an effective and efficient way to achieve the same opportunistic coverage ratio in the future.

8. Conclusions

We design an effective and efficient framework to recruit vehicles for urban sensing while guaranteeing a specific coverage quality requirement. The framework addresses three basic problems. First, to measure coverage quality, we use the ICT to characterize the opportunities with which a subregion is covered. Based on empirical measurement studies of real mobility traces of thousands of taxis collected in Beijing and Shanghai, we find that the aggregated ICTD closely resembles a truncated Pareto distribution regardless of the size of subregions and the number of vehicles. We evaluate four known mobility models from the perspective of opportunistic coverage and investigate the reasons behind our observations. Second, to characterize the relationship between the coverage quality of an urban area and the number of vehicles, we propose a metric called opportunistic coverage ratio and derive it as a function of the aggregated ICTD. We also analyze how opportunistic coverage ratios change on different days based on taxi mobility traces. Third, we propose a vehicle selection algorithm to select the minimum number of vehicles that achieves a specific coverage quality requirement. Our work provides fundamental guidelines on coverage measurement and network planning for urban vehicular sensing applications.

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant No. 61332005 and No. 61133015, the Funds for Creative Research Groups of China under Grant No. 61421061, the Specialized Research Fund for the Doctoral Program of Higher Education under Grant No. 20120005130002, and the Cosponsored Project of Beijing Committee of Education. The research of Prof. Xiang-Yang Li is partially supported by NSF CMMI 1436786 and by the National Natural Science Foundation of China under Grant No. 61272426.

References

[1] D. Cuff, M. Hansen, J. Kang, Urban sensing: out of the woods, Commun. ACM 51 (3) (2008) 24–33.
[2] M. Srivastava, T. Abdelzaher, B. Szymanski, Human-centric sensing, Philos. Trans. R. Soc. A: Math. Phys. Eng. Sci. 370 (1958) (2012) 176–197.

[3] A. Campbell, S. Eisenman, N. Lane, E. Miluzzo, R. Peterson, People-centric urban sensing, in: Second ACM/IEEE International Conference on Wireless Internet (WiCon), 2006, pp. 18–31.
[4] Y. Zheng, Y. Liu, J. Yuan, X. Xie, Urban computing with taxicabs, in: Proc. of ACM Ubicomp, 2011.
[5] X. Mao, X. Miao, Y. He, T. Zhu, J. Wang, W. Dong, X. Li, Y. Liu, CitySee: urban CO2 monitoring with sensors, in: Proc. of IEEE INFOCOM, 2012.
[6] N. Lane, S. Eisenman, M. Musolesi, E. Miluzzo, A. Campbell, Urban sensing systems: opportunistic or participatory?, in: Proc. of the 9th ACM Workshop on Mobile Computing Systems and Applications, 2008, pp. 11–16.
[7] R.K. Ganti, F. Ye, H. Lei, Mobile crowdsensing: current state and future challenges, IEEE Commun. Mag. 49 (11) (2011) 32–39.
[8] A. Campbell, S. Eisenman, N. Lane, E. Miluzzo, R. Peterson, H. Lu, X. Zheng, M. Musolesi, K. Fodor, G. Ahn, The rise of people-centric sensing, IEEE Internet Comput. 12 (4) (2008) 12–21.
[9] U. Lee, M. Gerla, A survey of urban vehicular sensing platforms, Comput. Netw. 54 (4) (2010) 527–544.
[10] X. Yu, H. Zhao, L. Zhang, S. Wu, B. Krishnamachari, V. Li, Cooperative sensing and compression in vehicular sensor networks for urban monitoring, in: Proc. of IEEE ICC, 2010, pp. 1–5.
[11] S. Hu, Y. Wang, C. Huang, Y. Tseng, A vehicular wireless sensor network for CO2 monitoring, IEEE Sens. (2009) 1498–1501.
[12] P. Mohan, V.N. Padmanabhan, R. Ramjee, Nericell: rich monitoring of road and traffic conditions using mobile smartphones, in: Proc. of ACM SenSys, 2008, pp. 323–336.
[13] A. Thiagarajan, L. Ravindranath, K. LaCurts, S. Madden, H. Balakrishnan, S. Toledo, J. Eriksson, VTrack: accurate, energy-aware road traffic delay estimation using mobile phones, in: Proc. of ACM SenSys, 2009, pp. 85–98.
[14] J. Eriksson, L. Girod, B. Hull, R. Newton, S. Madden, H. Balakrishnan, The pothole patrol: using a mobile sensor network for road surface monitoring, in: Proc. of ACM MobiSys, 2008.
[15] S. Mathur, T. Jin, N. Kasturirangan, J. Chandrasekaran, W. Xue, M. Gruteser, W. Trappe, ParkNet: drive-by sensing of road-side parking statistics, in: Proc. of ACM MobiSys, 2010, pp. 123–136.
[16] S. Meguerdichian, F. Koushanfar, M. Potkonjak, M. Srivastava, Coverage problems in wireless ad hoc sensor networks, in: Proc. of IEEE INFOCOM, 2001, pp. 1380–1387.
[17] M. Cardei, J. Wu, Energy-efficient coverage problems in wireless ad-hoc sensor networks, Comput. Commun. 29 (4) (2006) 413–420.
[18] A. Ghosh, S. Das, Coverage and connectivity issues in wireless sensor networks: a survey, Pervasive Mob. Comput. 4 (3) (2008) 303–334.
[19] A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, J. Scott, Impact of human mobility on the design of opportunistic forwarding algorithms, in: Proc. of IEEE INFOCOM, 2006.
[20] I. Rhee, M. Shin, S. Hong, K. Lee, S. Chong, On the Levy-walk nature of human mobility, in: Proc. of IEEE INFOCOM, 2008.
[21] K. Lee, S. Hong, S.J. Kim, I. Rhee, S. Chong, SLAW: self-similar least-action human walk, IEEE/ACM Trans. Netw. (TON) 20 (2) (2012) 515–529.
[22] T. Karagiannis, J. Le Boudec, M. Vojnovic, Power law and exponential decay of inter contact times between mobile devices, in: Proc. of ACM MobiCom, 2007.
[23] X. Zhang, J. Kurose, B. Levine, D. Towsley, H. Zhang, Study of a bus-based disruption-tolerant network: mobility modeling and impact on routing, in: Proc. of ACM MobiCom, 2007, pp. 195–206.
[24] A. Balasubramanian, B. Levine, A. Venkataramani, DTN routing as a resource allocation problem, in: Proc. of ACM SIGCOMM, 2007, pp. 373–384.
[25] H. Zhu, L. Fu, G. Xue, Y. Zhu, M. Li, L. Ni, Recognizing exponential inter-contact time in VANETs, in: Proc. of IEEE INFOCOM, 2010, pp. 1–5.
[26] D. Brockmann, L. Hufnagel, T. Geisel, The scaling laws of human travel, Nature 439 (7075) (2006) 462–465.
[27] M. Gonzalez, C. Hidalgo, A. Barabási, Understanding individual human mobility patterns, Nature 453 (7196) (2008) 779–782.
[28] A. Barabasi, The origin of bursts and heavy tails in human dynamics, Nature 435 (2005) 207–211.
[29] A. Ahmed, K. Yasumoto, Y. Yamauchi, M. Ito, Distance and time based node selection for probabilistic coverage in people-centric sensing, in: Proc. of IEEE SECON, 2011, pp. 134–142.
[30] D. Zhao, H.-D. Ma, L. Liu, J. Zhao, On opportunistic coverage for urban sensing, in: Proc. of IEEE MASS, 2013, pp. 231–239.


[31] B. Hull, V. Bychkovsky, Y. Zhang, K. Chen, M. Goraczko, A. Miu, E. Shih, H. Balakrishnan, S. Madden, CarTel: a distributed mobile sensor computing system, in: Proc. of ACM SenSys, 2006, pp. 125–138.
[32] U. Lee, B. Zhou, M. Gerla, E. Magistretti, P. Bellavista, A. Corradi, MobEyes: smart mobs for urban monitoring with a vehicular sensor network, IEEE Wirel. Commun. 13 (5) (2006) 52–57.
[33] P. Dutta, P. Aoki, N. Kumar, A. Mainwaring, C. Myers, W. Willett, A. Woodruff, Common sense: participatory urban sensing using a network of handheld air quality monitors, in: Proc. of ACM SenSys, 2009, pp. 349–350.
[34] R. Rana, C. Chou, S. Kanhere, N. Bulusu, W. Hu, Ear-Phone: an end-to-end participatory urban noise mapping system, in: Proc. of ACM/IEEE IPSN, 2010, pp. 105–116.
[35] B. Liu, P. Brass, O. Dousse, P. Nain, D. Towsley, Mobility improves coverage of sensor networks, in: Proc. of ACM/IEEE MobiHoc, 2005, pp. 300–308.
[36] T. Wimalajeewa, S. Jayaweera, Impact of mobile node density on detection performance measures in a hybrid sensor network, IEEE Trans. Wirel. Commun. 9 (5) (2010) 1760–1769.
[37] X. Wang, X. Wang, J. Zhao, Impact of mobility and heterogeneity on coverage and energy consumption in wireless sensor networks, in: Proc. of IEEE ICDCS, 2011.
[38] M. Li, W. Cheng, K. Liu, Y. Liu, X. Li, X. Liao, Sweep coverage with mobile sensors, IEEE Trans. Mob. Comput. 10 (11) (2011) 1534–1545.
[39] D. Zhao, H.-D. Ma, L. Liu, Mobile sensor scheduling for timely sweep coverage, in: Proc. of IEEE WCNC, 2012, pp. 1771–1776.
[40] D. Yang, G. Xue, X. Fang, J. Tang, Crowdsourcing to smartphones: incentive mechanism design for mobile phone sensing, in: Proc. of ACM MobiCom, 2012.
[41] L.G. Jaimes, I. Vergara-Laurens, M.A. Labrador, A location-based incentive mechanism for participatory sensing systems with budget constraints, in: Proc. of IEEE PerCom, 2012, pp. 103–108.
[42] D. Zhao, X.-Y. Li, H.-D. Ma, Budget-feasible online incentive mechanisms for crowdsourcing tasks truthfully, IEEE/ACM Trans. Netw. (TON) (2014), in press.
[43] G. Cardone, L. Foschini, C. Borcea, P. Bellavista, A. Corradi, M. Talasila, R. Curtmola, Fostering ParticipAction in smart cities: a geo-social crowdsensing platform, IEEE Commun. Mag. 51 (6) (2013) 112–119.
[44] D. Zhang, H. Xiong, L. Wang, G. Chen, CrowdRecruiter: selecting participants for piggyback crowdsensing under probabilistic coverage constraint, in: Proc. of ACM Ubicomp, 2014, pp. 703–714.
[45] D. Zhao, H.-D. Ma, L. Liu, Energy-efficient opportunistic coverage for people-centric urban sensing, Wirel. Netw. 20 (6) (2014) 1461–1476.
[46] D. Zhao, H.-D. Ma, S. Tang, X.-Y. Li, Coupon: a cooperative framework for building sensing maps in mobile opportunistic networks, IEEE Trans. Parallel Distrib. Syst. 26 (2) (2015) 392–402.
[47] H.-D. Ma, D. Zhao, P. Yuan, Opportunities in mobile crowd sensing, IEEE Commun. Mag. 52 (8) (2014) 29–35.
[48] Y. Chon, N.D. Lane, Y. Kim, F. Zhao, H. Cha, Understanding the coverage and scalability of place-centric crowdsensing, in: Proc. of ACM Ubicomp, 2013, pp. 3–12.
[49] G. Danezis, S. Lewis, R. Anderson, How much is location privacy worth, in: 4th Workshop on the Economics of Information Security (WEIS), 2005.
[50] Y. Singer, Budget feasible mechanisms, in: Proc. of IEEE FOCS, 2010, pp. 765–774.
[51] M. Abe, K. Suzuki, (M+1)-st price auction using homomorphic encryption, in: Public Key Cryptography, 2002, pp. 115–124.
[52] L. Bracciale, M. Bonola, P. Loreti, G. Bianchi, R. Amici, A. Rabuffi, CRAWDAD data set roma/taxi (v. 2014-07-17), 2014.
[53] K. Burnham, D. Anderson, Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, Springer, 2002.
[54] A. Edwards, R. Phillips, N. Watkins, M. Freeman, E. Murphy, V. Afanasyev, S. Buldyrev, M. da Luz, E. Raposo, H. Stanley, et al., Revisiting Lévy flight search patterns of wandering albatrosses, bumblebees and deer, Nature 449 (7165) (2007) 1044–1048.
[55] F. Bai, N. Sadagopan, A. Helmy, IMPORTANT: a framework to systematically analyze the impact of mobility on performance of routing protocols for ad hoc networks, in: Proc. of IEEE INFOCOM, 2003, pp. 825–835.
[56] M. Garey, D. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, New York, 1979.