A novel change-detection scheduler for a network of depth sensors


Journal Pre-proof

PII: S1047-3203(19)30354-2
DOI: https://doi.org/10.1016/j.jvcir.2019.102733
Reference: YJVCI 102733
To appear in: J. Vis. Commun. Image R.
Received Date: 4 May 2019
Revised Date: 18 October 2019
Accepted Date: 3 December 2019

Please cite this article as: M.S. RasouliDanesh, S. Payandeh, A novel change-detection scheduler for a network of depth sensors, J. Vis. Commun. Image R. (2019), doi: https://doi.org/10.1016/j.jvcir.2019.102733


© 2019 Published by Elsevier Inc.

A novel change-detection scheduler for a network of depth sensors

Maryam S. RasouliDanesh*¹ and Shahram Payandeh²
¹ Networked Robotics and Sensing Laboratory, Simon Fraser University, Canada; [email protected]
² Networked Robotics and Sensing Laboratory, Simon Fraser University, Canada; [email protected]

Abstract— In many monitoring applications, such as smart homes and surveillance, deploying multiple depth sensors increases the monitored area and offers better occlusion handling, while being insensitive to illumination conditions in comparison with RGB sensors. However, multiple sensors also increase the volume of data to be processed, along with the associated computational complexity and power consumption. To address these drawbacks, this paper proposes a novel change detection algorithm that can be used as part of a sensor scheduler in a centralized (e.g. star) network configuration. Initially, each sensor in the network performs a single scan of the common environment in order to detect any incremental changes in the sensed depth signal. This initial change detection is then used as the basis for several follow-up tasks such as foreground segmentation, background detection, target detection, and tracking. Instead of processing a complete depth frame, we propose to utilize a collection of 1D scans of the depth frames. A confidence function is defined that estimates the reliability of the changes detected by each sensor and reduces false positives triggered by noise and outliers. The proposed confidence function is analyzed through a performance study in the presence of sensor noise and other parameters that affect the reliability of each sensor's data. Finally, a score function is defined, based on the confidence of the detected changes and the sensor resolution, to rank and match sensors with the objects to be tracked, so that each target is tracked by the sensor (or sensors) offering the highest tracking score. This approach offers several advantages, such as decreasing the overall system power consumption by placing sensors with low confidence values in standby mode and reducing the overall computational overhead.

Index Terms— Change detection; Depth sensor; Network sensor; Sensor network scheduler; Background subtraction; RGBD tracking


1. INTRODUCTION

Monitoring systems are deployed in environments ranging from public areas, such as airports and hospitals, to private places such as homes in support of ageing in place. These systems serve a wide range of requirements including health, security, and surveillance. In general, different types of sensors have been utilized in such monitoring environments, including wearable sensors (e.g. accelerometers) and ambient sensors (e.g. RGB and depth cameras). Ambient sensors such as RGB cameras are the most common in applications such as surveillance monitoring. The depth sensor, another popular ambient sensor, has recently been gaining popularity and can replace, or sometimes complement, the information obtained through RGB sensors. For surveillance purposes, depth sensors can preserve the privacy of the subjects, which is especially crucial when

they are utilized for monitoring the living areas of smart homes, e.g. elderly monitoring in favour of ageing in place. In addition, depth sensors can monitor in low-illumination conditions where traditional RGB cameras fail to function, which makes them a suitable alternative in many applications such as sleep monitoring for movement disorders [1]. They also have advantages over traditional RGB cameras for background subtraction, not being sensitive to shadows or to changes in illumination conditions. However, event detection remains a challenging task for depth sensors due to the outliers and noise associated with the depth frames captured by commodity depth sensors, as well as depth camouflage and missing points in the depth frame. The information associated

Figure 1: The overview of three steps of the proposed method

with 3D images has motivated many attempts to provide a 3D representation of a scene based on traditional 2D images [2], [3]. Initial change detection in the sensed environment is an important part of a monitoring system and is the first step before other follow-up tasks such as background subtraction, human body detection, and target tracking. Background estimation can be extended to utilize more than one sensor: while a single depth sensor can only cover a portion of the scene, multiple depth sensors can cover a larger area. Some of the advantages of multi-sensor utilization are shown in [4]: it offers increased resolution and can reduce instances of occlusion that may occur when using a single sensor. In a network of depth sensors, an event in the environment can trigger more than one sensor. Using the information from all triggered sensors would not necessarily increase the accuracy of detection, since they usually provide redundant information. To improve the efficiency of the network, the most qualified sensor(s) should be selected to track a target; moreover, at each instant, the triggered sensors should be ranked and scheduled so as to provide higher accuracy in detection and tracking. In this paper, we present a sensor scheduler to accurately assign the tracking task to the most qualified sensor(s). To the best of our knowledge, the proposed method is the first approach to designing an efficient scheduler for a network of depth sensors. The proposed method consists of three related steps, each built on top of the previous one; Figure (1) shows these three steps. In the first step, we analyze the Kinect V2.0 sensor to characterize its systematic and non-systematic noise behaviour and present a formalism that serves as the basis of the second phase, the change detection algorithm. In the change detection algorithm, we utilize a 1D scan to reduce the time and computational complexity of the system; in our previous publications, we showed the efficiency of 1D scans in other applications such as sleep posture [1] and posture estimation [5]. The output of this phase is what we refer to as the "confidence function of the scan", which is then used in the scheduler to rank the sensors and identify their qualification for the tracking task. The remainder of the paper is organized as follows. Section 2 reviews some of the most related works. Section 3 describes the proposed method for collecting 1D scans. Section 4 presents details of the proposed temporal sensor scheduler algorithm associated with the tracking task. In section 5, an evaluation of the proposed method is presented for both single- and multi-sensor scenarios. Finally, discussion, concluding remarks, and future work are presented in section 6.

2. RELATED WORKS

The overall method proposed in this paper can be divided into three steps, which are highlighted in the following sections (Figure 1). Accordingly, the related works presented in this section cover each part of the proposed method.

First, we review the literature related to modelling the parameters affecting the precision of depth sensing (subsection 2.1). Then, current depth change detection methods are reviewed in subsection 2.2. Finally, a review of existing multi-sensor schedulers is presented in subsection 2.3.

2.1. Depth sensing accuracy and precision evaluation

Many published studies have evaluated different depth sensors with three main objectives. The first is application-based approaches, which evaluate the efficiency and accuracy of the sensor in a particular application such as face detection [6], multimedia [7], and reconstruction [8]; these papers evaluate the sensor parameters affecting its performance in the assigned task. In general, sensor accuracy is defined as the closeness between the sensed depth of a point and its real depth value, while precision is defined as the closeness between independently repeated measurements [7]. Both of these properties have been evaluated and compared: for example, [9] and [10] evaluate the accuracy of different sensors, and [11] evaluates their precision. In summary, defining a comprehensive model that captures the relationship between the various parameters affecting depth precision or accuracy remains a challenging issue. In addition, many attempts have been made to denoise the sensed depth data, e.g. [12], [13]; a comparison of some recent depth denoising techniques is discussed in [14]. In our methodology, we postulate that precision is a more dominant factor than accuracy, since for the proposed change detection we are interested in detecting temporal changes between frames, which facilitates the subsequent object tracking task. In other applications, such as computing odometry, accuracy plays the more important role [15].

2.2. Change detection

Depending on the scene, detecting the presence of changes can be approached in different ways, such as background subtraction (BGS), foreground segmentation, and moving object detection. Background estimation in RGB videos has been the subject of numerous studies; in the survey [16], 26 different BGS methods [17] are evaluated using metrics such as the D-score and structural similarity (SSIM). Several important factors have been identified that can deteriorate the accuracy of RGB-based background estimation algorithms, such as sudden changes of illumination, shadow, moving background (e.g. trees in the background), and camouflage [18]. Depth sensors allow the incorporation of synchronized spatial information of the scene, which can improve the background subtraction and foreground segmentation tasks; a comparative study of a large group of RGBD-based background subtraction algorithms is presented in [19]. Many works attempt to use depth data as a source of extra information for improving the accuracy of RGB background subtraction algorithms, and a common trend is to consider the depth data and RGB channels as

complementary data. In these methods there is usually a mechanism to address the special challenges associated with depth data, such as recovering missing points. For instance, in one approach the depth data is first denoised through a learning phase and the background is detected using a Gaussian mixture model (GMM); the foreground segmentation is then accomplished by combining data extracted from both depth and RGB [20]. In another approach, the authors use a Gaussian filter to smooth the depth frame and then create a cost map, thresholding the result to update the background [21]. Changing the color space is also common for increasing the stability of the system: in one approach, the color channels are first transformed to chromaticity space to avoid the effect of shadow, and the kernel density estimation (KDE) method is then applied to the color channels and the depth channel [22]. Unlike the color channels, whose background must be updated frequently (due to changes in illumination), no background update is necessary for the depth channel owing to its stability against illumination changes. In that method, the authors also consider the missing depth points and estimate the probability of each missing point being part of the foreground. Depth data has likewise been used to provide extra information that compensates for the artifacts of RGB-based background estimation: differences in the depth values of two frames are utilized to detect the ghost effect in RGB background subtraction, which occurs when a moving object is detected in the scene while there is no corresponding real object [23]. The VIBE (Visual Background Extractor) algorithm is another well-known background subtraction algorithm [24]. To overcome the challenges associated with VIBE, illumination details are extracted to decide which sensor (RGB or depth) is more reliable, and when the depth frame is noisy the RGB information is used to extract the most accurate data for segmentation [25]. Using only depth data for background subtraction is another approach in the literature: in a study comparing three types of sensor (RGB, depth, and thermal), depth is considered one of the best options for background subtraction [26]; the authors use a GMM in a learning phase and extract the foreground mask. In our proposed change detection method, we also utilize only depth data. The learning phase consists of offline learning to extract the sensor characteristics, and we show that a single background frame is sufficient to provide robust and accurate background subtraction.

2.3. Multi-depth sensors and scheduling

Deployment of a sensor network in monitoring applications needs to overcome a number of challenges, ranging from sensing coordination to fusion of the sensed information associated with a scene. General challenges related to multi-sensor systems are discussed in the literature, and sensor assignment, including scheduling, has been identified as one of the main challenges for such deployments [27]. This issue is studied under different terms such as sensor action planning, sensor selection, and sensor-to-task assignment. Scheduling is also studied in different applications such as video surveillance ([28], [29]) and tracking ([30], [31], and

[32]). For instance, a slowly moving object was tracked in a wireless camera sensor network with increased accuracy and lower energy consumption in [32]. The authors use a dynamic clustering method to schedule the distributed sensor network based on a defined contribution decision (CD), which can place each camera in one of three states: active (the cameras in the cluster that tracks the target), alert (cameras that have the target in their FOV but are not in the main cluster), and sleep (the target is not in the camera's field of view). To find the head of the cluster, which is responsible for scheduling the other nodes in the selected cluster, an optimization problem based on the distance to the camera and its energy requirements is posed and solved. A scheduling framework in a Bayesian setting was designed in [34]: the authors assume that all uncertainties are Gaussian and, based on a Bayesian estimator, calculate the expected uncertainty for each sensor; the scheduler then selects the sensor with the smallest expected uncertainty. The method was evaluated with an object in the overlapping FOV of all sensors; it utilized Kinect V1.0, and special hardware was therefore designed to turn each sensor on only when needed, in order to overcome the IR structure interference. In another approach, the problem of scheduling a depth sensor network was addressed by classifying the space into separate zones based on the sensors' FOV and depth coverage, with each target associated with a zone [4]; the authors define a list for each monitored space and update this space-list consistently. Using only part of the scene instead of all the sensed information also allows for a simpler calibration approach: for example, the hand skeleton is used to calibrate multiple sensors, and a Kalman filter is then employed to track the skeleton joints in a matching scheme [33]. However, no scheduling methodology is utilized there to improve the efficiency of the sensor network, and the data captured by all sensors are simply fused together. In the method proposed in this paper, instead of continuously using all available sensors, the most reliable and qualified sensor(s) are detected and assigned to further track the target. The method offers a low computational

Figure 2: The overall algorithm of the proposed method for N sensors


complexity and low power consumption. Power consumption is a focus of many researchers in different fields of computing [30], [35].

Figure 3: Depth imperfections. The figure shows the RGB image, the depth frame, and the corresponding point cloud in two different environments (a) and (b); the imperfections (missing points, out-of-range areas, spread noise and outliers, flying pixels, and depth shadow) are indicated by arrows with labelled boxes.

Figure (2) shows an overview of the proposed approach. The scheduler is designed based on the confidence of each sensor in detecting changes within its FOV; in the end, the scheduler decides which sensor(s) should be assigned to the change detection task. Each sensor performs a 1D scan of the 3D point cloud of the scene and estimates whether any change has occurred. This information is then passed to the scheduler, which selects the qualified sensor(s) and also determines the working frequency of each sensor.

3. CHANGE DETECTION

This section presents details of the proposed change detection algorithm. We use the following terminology: (a) a depth frame (or depth image) is a frame recorded by the depth sensor in which each pixel represents the distance of the corresponding sensed point from the sensor plane; (b) a point cloud is the 3D projection of the depth frame, where the coordinates of each point are defined with respect to the sensor's coordinate frame (e.g. the x, y, z frame shown in Figure (4c)); and (c) the field of view (FOV) is the portion of the environment that can be sensed and recorded by the sensor, depending on parameters such as its pan and tilt angles and its depth range.

3.1. Problem statement

Let N define the number of depth sensors k_i, i ∈ {1, …, N}, each with a limited FOV. Each sensor k_i covers a limited volume of the monitoring area as a function of its FOV, and in general, to ensure the best coverage, the FOVs of the sensors should have some overlap with one another [36]. When a target enters the monitoring area, multiple sensors may detect the presence and movements of the target; the question is which sensor(s) is/are more qualified to track the target, and at what instant of time. Let C_i, i ∈ {1, …, N} be the estimate of the confidence in the sensed information provided by sensor k_i, which expresses how reliably a detected change corresponds to a real change in the scene, as opposed to a false positive caused by outliers and noise. Score- or ranking-based selection of this kind is commonly used in many other applications, e.g. [37], [38], and [39]. To calculate the confidence function, we first analyze the precision of the Kinect sensor in order to model its noise behaviour.

3.2. Noise modelling

Possible sources of noise and error in a depth frame can be classified into two groups, systematic and non-systematic (random) [10]. Some sources of noise and error in a time-of-flight depth sensor (in particular, Kinect v2.0) are as follows: (a) Out-of-range points - each depth sensor has a limited sensing range; the Kinect v2.0 range is 0.5 m to 8 m (the Microsoft Kinect SDK human-tracking API works up to a range of 4.5 m), and points that are too close to or too far from the sensor are assigned a value of 0 in the recorded depth frame. (b) Flying pixels and missing pixels - the flying-pixel phenomenon occurs especially at image edges and depth discontinuities, as discussed in [11], which implies lower precision for points close to edges; in addition, some pixels may not be captured by the sensor at all and are stored with zero values, and depending on the type of sensor, the reflective and material properties of an object can also cause a higher rate of missing pixels [7]. (c) Multipath interference - ToF sensors are active sensors that rely on the reflection of light from surfaces to measure the distance from an object to the sensor plane; multiple reflections from different surfaces cause incorrect depth measurements. (d) Depth shadow - objects closer to the sensor block the IR rays, so the sensor is unable to measure the distance behind them. (e) Spread noise and outliers - the depth measured by the sensor is corrupted by noise and outliers [11]. Some of these depth imperfections are illustrated in Figures (3a) and (3b), including missing points, out-of-range areas, and flying pixels.

Figure 4: (a) The variance of the noise with respect to sensor distance (b) The variance of the noise with respect to sensor angle (c) The sensor coordinate system

Figure 5: The steps and representation used to extract a 1D scan. (a) The corresponding RGB frame (b) The corresponding depth frame (c) The depth map, in which the green plane is the performed scan (d) The same point cloud with the scan points shown as green hollows (e) The raw scan performed at row 220

To model the noise and imperfections in the depth measurements, the sensor is placed in front of a white wall located at different distances d from the sensor [40], [41]. Having a consistent color on an effectively infinite plane in the sensor's FOV makes it possible to focus on one parameter at a time, reduces the effect of other factors influencing the sensor's precision, and allows the noise to be modelled more accurately.

Let the captured frame be defined as f(i, j): i ∈ {0, …, H}, j ∈ {0, …, W}, whose resolution is H × W (e.g. for Kinect v2.0 the resolution is 512 × 424). The selected points form a small window of 4 by 4 pixels in the middle of the depth frame, i.e. f(i, j), i ∈ {H/2 − 2, …, H/2 + 2} and j ∈ {W/2 − 2, …, W/2 + 2}, similar to [6] (here, however, we aim to establish the approximate relationship between noise intensity and distance) (Figure 4). More than 5000 points at each distance were recorded over 100 frames in order to model the noise and outliers in each configuration. Figure (4a) shows the standard deviation of all captured points plotted against their mean value, which establishes the precision of the depth measurement. This relationship can be approximated by equation (1); all of the relations obtained in this subsection are then used to form the confidence of change detection in equation (3):

σ(d) = 10⁻⁹(−1.12 d³) + 10⁻⁶(3.15 d²) − 10⁻³(0.9 d) + 0.09 ,  (1)

where d is the mean distance to the sensor plane and σ(d) is the standard deviation of the noise (d is in metres, so the coefficients yield the standard deviation in millimetres). This equation shows that the noise intensity decreases as the distance between the sensor and the object increases. Similarly, Figure (4b) shows the standard deviation of points recorded at various distances with the sensor at various pan angles with respect to the flat wall. This relationship can be estimated as f(y) = 10⁻⁵(2.44 y²) + 10⁻⁵(2.83 y), where y is the coordinate along the y axis with the depth sensor at the origin; it shows that the noise grows as the pan angle increases. Under the assumption that the two error sources are independent, they can be combined as in equation (2). In general, as the standard deviation of the noise increases we expect more false positives, and hence the proposed change detection algorithm ranks the detected changes lower, as shown in subsection 3.4:

σ_n(y, d) = f(y) + σ(d) ,  (2)
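The noise model of equations (1) and (2) can be written down directly. The sketch below is illustrative only: the polynomial coefficients are those of equation (1) (d in metres, σ in millimetres), the pan-angle term is the quadratic fit quoted for Figure (4b), and the function names are our own, not the authors'.

```python
import numpy as np

def sigma_distance(d):
    """Equation (1): noise standard deviation (mm) vs. mean distance d (m)."""
    return 1e-9 * (-1.12 * d**3) + 1e-6 * (3.15 * d**2) - 1e-3 * (0.9 * d) + 0.09

def sigma_angle(y):
    """Pan-angle term f(y) fitted for Figure (4b); y is the lateral coordinate
    with the sensor at the origin (same units as Yd = -500, Yu = 500)."""
    return 1e-5 * 2.44 * y**2 + 1e-5 * 2.83 * y

def sigma_total(y, d):
    """Equation (2): the two error sources are assumed independent and summed."""
    return sigma_angle(y) + sigma_distance(d)

if __name__ == "__main__":
    # Noise estimate for a point 2 m from the sensor plane, 300 units off-axis.
    print(sigma_total(300.0, 2.0))
```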

3.3. Definition of 1D scans

A single 1D scan of the scene is utilized to detect any changes in the sensed environment. Let f(i, j): i ∈ {0, …, H}, j ∈ {0, …, W} be a captured depth frame. For a given row a, 0 ≤ a ≤ H, the scalar S^a(j) = f(a, j) is used as the depth value of pixel j in this 1D scan. The captured depth frame is first converted to a point cloud using the pinhole camera model; the sensor's intrinsic calibration parameters are obtained using the Kinect SDK 2.8 and the libfreenect2 library. We can then define the mapping s^a(j) ↦ Ŝ^a(y), Y_d < y < Y_u, where y is the lateral coordinate of the corresponding scan pixel in the point cloud (Figure (3a)) and Y_d and Y_u are the limits of the horizontal FOV of the sensor (e.g. for Kinect v2.0, Y_d = −500 and Y_u = 500). Figure (5c) shows the depth frame obtained using the pinhole camera model, corresponding to the RGB frame and depth frame shown in Figures (5a) and (5b), respectively. The scene shows a long hallway, part of which is out of range of the depth sensor. The green hollows in Figure (5d) represent the points associated with the scan plane located at row 220 of the image plane, Ŝ²²⁰. Figure (5e) shows the corresponding scan, where missing pixels appear at (0, 0): since the depth value assigned to a missing point is zero, the pinhole model maps it to (0, 0), generating the jumps to zero visible in Figure (5e). We use this scan, after preprocessing, as the basis of the change detection algorithm; an initial scan frame is defined in order to compare against subsequent frames. Owing to the 1D scan, the computation time is lower than when processing complete depth frames (see sections 3.4 and 4). However, such a reduction also introduces additional challenges in data interpretation and in the detection of changes, and the reduced information makes the data more sensitive to noise. As a result, a sequence of preprocessing steps needs to be carried out.

Figure 6: Data preprocessing before using the 1D scan. (a) RGB frame (b) Grayscale representation of the depth frame (c) The point cloud, in which the green line is the performed scan (d) The processed scan at row 220 for two scenes: the red curve is the current scene (a, b, d), s_t2 (foreground); the blue curve is the scan of the scene shown in Figure (5a, b, c), s_t1 (background); and the green curve is their subtraction (e) The confidence function along with s_t2 (the second scan), showing that the confidence is higher at the positions where a change occurs
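As an illustration of the scan definition above, the sketch below extracts one row of a depth frame and back-projects each pixel with a pinhole model; the row index and the intrinsics (fx, cx) are placeholder values, not the calibrated parameters used in the paper.

```python
import numpy as np

def extract_scan(depth_frame, row, fx, cx):
    """Take the 1D scan S^a(j) = f(a, j) and map each pixel to its lateral
    coordinate y (pinhole model) and its depth z. Missing pixels (depth == 0)
    map to (0, 0), producing the jumps to zero visible in Figure (5e)."""
    z = depth_frame[row, :].astype(np.float64)   # depth of every pixel in the row
    cols = np.arange(depth_frame.shape[1])
    y = (cols - cx) * z / fx                     # lateral coordinate; 0 where z == 0
    return y, z

def scan_difference(scan_bg, scan_cur):
    """g(y) = S_t2(y) - S_t1(y): raw subtraction of the background scan from
    the current scan (section 3.4)."""
    return scan_cur - scan_bg

if __name__ == "__main__":
    # Toy example with assumed intrinsics; a real frame would come from the sensor driver.
    frame = np.zeros((424, 512), dtype=np.uint16)
    frame[220, 200:300] = 2500                   # a fake object at 2.5 m
    y, z = extract_scan(frame, row=220, fx=365.0, cx=256.0)
    print(y[250], z[250])
```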

3.4. Change Detection

Let f_t be the frame captured at time t and Ŝ^a_t the corresponding scan. We would like to reliably detect any changes between two scans Ŝ^a_t1 and Ŝ^a_t2. Let us define g_t2t1(y) = Ŝ^a_t2(y) − Ŝ^a_t1(y) as the subtraction of the two scans (for simplicity, we use the notation g(y) instead of g_t2t1(y)). Figure (6) shows an example of two scans and their subtraction: the first scan is captured from a long hallway (Figure (5a, b, d)) while the targets are out of range of the depth sensor, and the second is captured when a target has entered the sensor's FOV (Figure (6a, b, c): RGB frame, depth frame, and point cloud, respectively). Figure (6d) shows Ŝ^a_t1(y), Ŝ^a_t2(y), and the value of g(y) mapped into 2D. As can be seen, g(y) contains non-zero values at the locations where a change has happened (e.g. a human body appears in the sensor's FOV at y ≅ 0 to 100). However, it also contains non-zero values at other locations due to the presence of noise and outliers: around y ≅ ±400, the value of g(y) is relatively high even though no change has occurred there. In other words, g(y) by itself does not provide any estimate of the accuracy of the detected changes. To provide a confidence for the detected changes, the noise and outliers associated with the sensor must be considered. Let w^a_t2,t1(y): y ∈ {Y_d, …, Y_u} be the weight of each pixel's contribution to detecting the changes between the two scans Ŝ^a_t2 and Ŝ^a_t1. The basic idea is to identify the pixels that are more vulnerable to noise and outliers and assign a smaller weight to them; consequently, the effect of noise, outliers, and missing points is reduced. We define a weight function expressing the stability of each pixel in each scan as

w^a_t2,t1(y) = β (1 / σ_n(y, Ŝ^a_t2(y))) + α exp(−∂g(y)/∂y) ,  (3)

where σ_n(y, Ŝ^a_t2(y)) is defined in equation (2) and the second term is included to account for flying pixels and spread noise and outliers. α and β are normalization factors; in this study we have selected α = 1 and β = 0.5. These values were selected experimentally and validated using a set of test images, sweeping α and β from zero to one in increments of 0.1 so as to maximize TP/(TP + FP), where TP is the number of true positives and FP the number of false positives. The first term of equation (3) reflects the reliability of the captured depth pixel with respect to the systematic noise and outliers: as the pan angle and the distance of the target with respect to the camera increase, the effect of noise increases as well (shown in section 3.2). This term therefore weights each pixel according to its distance and angle to the sensor; as the distance or pan angle increases, the weight of the pixel decreases and hence its contribution to detecting changes is reduced. As mentioned in section 3.2, another factor that

highly affects the precision of the sensor is the flying pixels. The flying-pixel phenomenon usually occurs along depth edges; in the second term of equation (3) we identify these edges by taking the derivative of the scan along the scan axis. This strategy also helps to identify missing pixels and outliers: since outliers have values different from their neighbours, they generate a high derivative. As the number of flying pixels or outliers increases, the weight decreases accordingly. In summary, this assigns a weight to each point in the point cloud such that lower weights are given to points that are more vulnerable to depth imperfections, especially those more exposed to noise and outliers. Next, we define the confidence of each point in the point cloud for detecting the onset of a change in the scene. Here the contribution of neighbouring pixels is considered by convolving the weights of the scan with a box signal:

C(y) = ∫ w^a_t2,t1(τ) · h(τ − y) dτ ,

where h(y) = sgn(y) − sgn(y − T) and T is the width of the box signal. The larger T is, the more neighbouring pixels contribute to the confidence of a given pixel; in our experiments we select T = 12, which means that twelve neighbouring points are used to calculate the confidence of a point in the scan. Using a threshold Th, the confidence of change detection for each point of the scan is defined as

∁(y) = C(y) if C(y) > Th, and ∁(y) = 0 otherwise ,  (4)

Figure (6e) shows the confidence of the performed scan.
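A compact sketch of equations (3) and (4) follows. It is not the authors' implementation: the per-point noise std (equation (2)) is passed in as an array, the derivative of g is a finite difference whose absolute value is taken so that the exponential term stays bounded, and alpha, beta, T, and Th use the values quoted in the text (α = 1, β = 0.5, T = 12) and the Th = 2 example of Figure (7).

```python
import numpy as np

def pixel_weights(sigma, g, alpha=1.0, beta=0.5):
    """Equation (3): per-point weight. `sigma` is the per-point noise std from
    equation (2); `g` is the scan difference. The derivative of g is a finite
    difference, taken in absolute value (an implementation choice)."""
    dg = np.abs(np.gradient(g))
    return beta / np.maximum(sigma, 1e-6) + alpha * np.exp(-dg)

def confidence(weights, T=12, Th=2.0):
    """Equations (3)-(4): accumulate the weights of T neighbouring points
    (convolution with a box of width T), then zero out values below Th."""
    C = np.convolve(weights, np.ones(T), mode="same")
    return np.where(C > Th, C, 0.0)

if __name__ == "__main__":
    g = np.zeros(500); g[200:260] = 400.0   # a synthetic change in the scan difference
    sigma = np.full(500, 2.0)               # constant 2 mm noise, for illustration
    C = confidence(pixel_weights(sigma, g))
    print(int(C.argmax()), float(C.max()))
```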

Algorithm 1: The region growing algorithm
1) stack st = [pixels at which changes were detected (the seeds)]
2) radius_search = the radius within which to look for connected pixels
3) while st is not empty
4)   p = pop st
5)   for i = −radius_search to radius_search
6)     Distance = distance between p and its i-th neighbour
7)     if (Distance is less than a threshold and the i-th neighbour is not tagged as changed)
8)       tag the i-th neighbour as changed and push it onto st
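Algorithm (1) operates on a single scan, so it reduces to a short stack-based routine; the sketch below follows the pseudocode above, with an illustrative depth-gap threshold and search radius.

```python
def region_grow_1d(depth_scan, seeds, radius=3, depth_gap=50.0):
    """Stack-based 1D region growing (Algorithm 1): starting from the seed
    indices, absorb neighbouring scan points whose depth differs from the
    current point by less than `depth_gap`."""
    changed = set(seeds)
    stack = list(seeds)
    while stack:
        idx = stack.pop()
        for i in range(-radius, radius + 1):
            j = idx + i
            if 0 <= j < len(depth_scan) and j not in changed:
                if abs(depth_scan[j] - depth_scan[idx]) < depth_gap:
                    changed.add(j)
                    stack.append(j)
    return sorted(changed)

if __name__ == "__main__":
    scan = [3000.0] * 20 + [1500.0] * 10 + [3000.0] * 20   # a person-sized dip in depth
    print(region_grow_1d(scan, seeds=[25]))                # grows over indices 20..29
```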

As can be seen in Figure (6e), the target's location is detected when the target enters the FOV, and noise and outliers are successfully ignored. As the threshold increases, only the more pronounced changes are detected; a lower threshold causes the sensor to be triggered by subtle changes, leading to a higher false-positive rate. Figure (7) illustrates the effect of the threshold value, where the image is taken from a cluttered laboratory area in which the sensor points at a door and the target opens the door and enters the room. Three different threshold values, Th = 0, Th = 2, and Th = 5, have been used to illustrate the importance of thresholding. In Figure (7a), Th = 0 causes false positives due to noise and outliers. With Th = 2, the subtle change of the door is detected (Figure (7b)), while with Th = 5 the change of the door cannot be identified (Figure (7c)). Algorithm (2) summarizes the proposed change detection method.


Algorithm 2: The proposed change detection algorithm
1) for each frame do:
2)   W = width of the frame
3)   a = index of the scan row (1 … height of the frame)
4)   Bframe = background frame
5)   Cframe = current frame
6)   alpha = coefficient 1
7)   beta = coefficient 2
8)   gamma = coefficient 3
9)   FX = focal length in the X direction
10)  FY = focal length in the Y direction
11)  PX = principal point in the X direction
12)  PY = principal point in the Y direction
13)  L_w = length of the window
14)  for i = 2 to W
15)    g(i) = Bframe(a, i) − Cframe(a, i)
16)    y(i) = (a·W + i) · Cframe(a, i) / FY
17)    dy = y(a, i) − y(a, i − 1)
18)    W(i) = exp(−alpha·(g(i) − g(i − 1)) / dy) + 1 / (a1·Cframe(a, i)³ + a2·Cframe(a, i)² + a3·Cframe(a, i) + a4 + f(y(i)))   ; weight of pixel (a, i), per equation (3), with σ(d) = a1·d³ + a2·d² + a3·d + a4
19)    for j = i − L_w to i
20)      C(i) += W(j)
21)    if (C(i) is greater than the threshold)
22)      the pixel (a, i) belongs to the foreground
23)    else
24)      the pixel (a, i) does not belong to the foreground

Figure 7: The effect of Th on the result of change detection. (a) Th = 0 causes false positives (b) Th = 2: the changes of the scene are detected (c) Th = 5: the subtle change of the door is not distinguishable


Considering the limited range of the sensor, the complexity of the change detection algorithm is O(W·L_w) for one scan, where W is the width of the depth frame and L_w ≪ W is the size of the window. In the next step, a region growing algorithm is employed to completely extract the region of interest (ROI), i.e. all the points belonging to the target. Algorithm (1) illustrates the proposed region growing algorithm. To find the seeds of the region growing algorithm, the maxima of C(y) > 0 are identified along the scan; the seeds are the points from which the region growing is applied so that the whole ROI can be segmented. These points are shown in red in Figure (6e). The region growing algorithm runs on a 1D scan, and hence its time complexity is reduced significantly: for a single scan the complexity is O(W²), and for M scans it is O(M·W²), whereas applying the region growing algorithm to the whole frame would give a complexity of O((W·H)²), where H is the height of the depth frame. Figure (8) shows the result of the region growing algorithm on the scan of Figure (7); the black points are the points of the scan selected as the detected change (foreground).

Figure 8: The result of the region growing algorithm shown by black points, where the green points are the scan. (a) One target (b) More than one target

Figure 9: (a) Depth frames captured by K1 (b) Depth frames captured by K2 (c) The score with parameters α = 1, β = γ = 0 (d) The score with parameters β = 1, α = γ = 0 (e) The score with α = 0.0001, β = 0.05, and γ = 0 (f) The score with α = 0.0001, β = 0.05, and γ = 0.1

4. SCHEDULING MULTI-SENSOR FOR TRACKING TASK

In a network of depth sensors, multiple sensors can be triggered when a target moves in the monitoring area. In this section, a scheduler is proposed in order to select the most qualified sensor(s) which can be used for tracking. First, a method for sensor scheduling is proposed followed by a detailed analysis and a comparative study of our approach with the case when no scheduler is present.

4.1. The scheduler

In order to determine the most qualified sensor for detecting the target, three main factors need to be considered.

a) Confidence in the change detection - We define the confidence of the detected change using equation (4) in section (3.4); the sensor that provides higher confidence is a better candidate for the tracking task. The boundaries of a detected change in each sensor's FOV were defined in section (3.4) using the region growing algorithm. To calculate the overall confidence of a particular change in a FOV, the sum of all the corresponding confidence values is taken in equation (5), which represents the score of the i-th sensor. This score reflects two factors: the confidence of each pixel in the detected change and the number of pixels belonging to the detected change. The size of the detected change is a key factor in the tracking score of each sensor; the larger the detected region, the more likely the sensor is qualified to detect the changes:

Score_C(i) = Σ_{y1 < y < y2} C(y) ,  i ∈ {1, …, N} ,  (5)

where y1 and y2 are the boundaries of the selected region in the point cloud, determined by applying the region growing algorithm of section (3.4).

b) Provided depth resolution - Another factor affecting the score of a sensor is the resolution of the detected changes. To take the impact of resolution into account in the final score, we calculate the mean distance of the detected change points (foreground) to the sensor's plane.

Algorithm 3: The proposed multi-sensor scheduler
1) for (all sensors)
2)   if (a new frame is available)
3)     if (there are any changes)
4)       calculate the score
5)       if (the score is high)
6)         run the sensor at the higher frequency
7)       else
8)         run the sensor at the standby frequency
9)       end
10)    else
11)      run the sensor at the standby frequency
12)    end
13) end
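A minimal sketch of the scheduling loop of Algorithm 3 follows, assuming hypothetical frame sources, an externally supplied score function (equation (8)), and two example sampling rates; none of these values come from the paper.

```python
from dataclasses import dataclass

ACTIVE_HZ = 30.0     # assumed full frame rate
STANDBY_HZ = 5.0     # assumed reduced (standby) frame rate

@dataclass
class SensorState:
    name: str
    rate_hz: float = STANDBY_HZ

def schedule(sensors, frames, score_fn, score_threshold=10.0):
    """One scheduling pass over all sensors (Algorithm 3): a sensor whose
    detected change scores high runs at the active rate; all others are kept
    in standby. A score of zero is treated as "no change detected"."""
    for sensor in sensors:
        frame = frames.get(sensor.name)
        if frame is None:                      # no new frame available
            sensor.rate_hz = STANDBY_HZ
            continue
        score = score_fn(frame)                # equation (8) on the sensor's scan
        sensor.rate_hz = ACTIVE_HZ if score > score_threshold else STANDBY_HZ
    return sensors

if __name__ == "__main__":
    sensors = [SensorState("K1"), SensorState("K2")]
    frames = {"K1": "frame-with-target", "K2": None}
    print(schedule(sensors, frames, score_fn=lambda f: 42.0))
```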

Figure 10: An example from each class of the dataset. In each panel, the upper-right image is an example of the background, the upper-left image is an example containing the foreground, and the lower image is the point cloud with the detected changes. (a) Depth camouflage class (b) Color camouflage class (c) Illumination changes class (d) Intermittent motion class (e) Out of range class (f) Shadow class

The score has an inverse relationship with this distance [11], and hence equation (6) defines the effect of resolution on the final score, increasing the score as the distance decreases (i.e. as the resolution increases). Higher depth resolution gives higher detail and is therefore preferable. The value ε = 0.01 is chosen based on the minimum and maximum range of the sensor so that Score_R(i) takes meaningful and distinguishable values:

Score_R(i) = exp(−ε d̄_i) ,  i ∈ {1, …, N} ,  (6)

where d̄_i = (Σ_{y1 < y < y2} Ŝ^a_t2(y)) / |y2 − y1| is the mean distance of the detected change points to the sensor plane of the i-th sensor.

c) The continuity of the monitoring - To prevent fluctuation between the selected sensors, a penalizing strategy is employed to avoid quick switches between sensors, expressed by equation (7):

P(i) = 1 if the i-th sensor is currently tracking the target, and P(i) = 0 otherwise ,  (7)

By considering the above three factors, a confidence score function can be defined as

score(i) = α Score_C(i) + β Score_R(i) + γ P(i) ,  (8)

where the values of α, β, and γ are defined experimentally and normalize the effect of each factor defined by equations (5) to (7). To provide an understanding of the effect of each factor, we conducted an experiment in which two sensors are located at slightly different pan angles while the target walks away from the sensors, for the scan Ŝ²⁵⁰. Figures (9a) and (9b) show sample depth frames captured by K1 and K2, respectively. Figures (9c) and (9d) illustrate Score_C(i) and Score_R(i) for sensors K1 and K2. We select α = 0.0001 and β = 0.5 to normalize their effects while giving the confidence in the change detection a slightly higher contribution to the final score.
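The three factors of equations (5)–(8) combine into a single score per sensor. In the sketch below, the confidence array C and the depths of the detected points are assumed to be available from the change detection step; α, β, and γ take the values quoted in the text, and ε = 0.01 as in equation (6).

```python
import numpy as np

def score_confidence(C, y1, y2):
    """Equation (5): sum of the confidence values inside the detected region."""
    return float(np.sum(C[y1:y2]))

def score_resolution(depths, eps=0.01):
    """Equation (6): favours closer (higher-resolution) detections by penalising
    the mean distance of the detected points to the sensor plane."""
    return float(np.exp(-eps * np.mean(depths)))

def sensor_score(C, depths, y1, y2, currently_tracking,
                 alpha=1e-4, beta=0.5, gamma=0.1):
    """Equation (8): weighted combination of confidence, resolution, and the
    continuity penalty P(i) of equation (7)."""
    P = 1.0 if currently_tracking else 0.0
    return (alpha * score_confidence(C, y1, y2)
            + beta * score_resolution(depths)
            + gamma * P)
```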

Table 1: The result of the change detection algorithm on each class of the dataset; the overall result for each class is given in the "Entire Class" row of each block.

| Class | Sequence | TP | FP | TN | FN | Rec↑ | Sp↑ | FPR↓ | FNR↓ | PWC%↓ | Pre↑ | F1↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Depth camouflage | Wall | 19120 | 0 | 119760 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
| | DcamSe | 58274 | 562 | 369313 | 11 | 0.9998 | 0.9985 | 0.0015 | 0 | 0.13 | 0.9904 | 0.9951 |
| | Despatx_ds | 12753 | 359 | 283820 | 128 | 0.9901 | 0.9987 | 0.0013 | 0.0099 | 0.1639 | 0.9726 | 0.9813 |
| | Entire Class | 90147 | 921 | 772893 | 139 | 0.9985 | 0.9988 | 0.0012 | 0.0015 | 0.1227 | 0.9899 | 0.9942 |
| Color camouflage | Cseq1 | 28312 | 425 | 162623 | 0 | 1 | 0.9974 | 0.0026 | 0 | 0.22 | 0.9852 | 0.9926 |
| | colorCam2 | 41057 | 1238 | 187465 | 0 | 1 | 0.9934 | 0.0066 | 0 | 0.5388 | 0.9707 | 0.9851 |
| | Hallways | 16242 | 11 | 365467 | 3520 | 0.8219 | 1.00 | 0 | 0.178 | 0.91 | 0.9993 | 0.9020 |
| | Entire Class | 85611 | 1674 | 715555 | 3520 | 0.9605 | 0.9977 | 0.0023 | 0.0395 | 0.6441 | 0.9808 | 0.9706 |
| Illumination changes | Chairbox | 12196 | 236 | 324848 | 70 | 0.9943 | 0.9993 | 0 | 0.0057 | 0.0907 | 0.9810 | 0.9876 |
| | Genseq1 | 10177 | 0 | 251577 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
| | Ls_ds | 0 | 532 | 259948 | 0 | NA | 0.9988 | 0.0020 | NA | 0.2042 | NA | NA |
| | TimeOfDay | 0 | 0 | 787840 | 0 | NA | 1 | 0 | NA | 0 | NA | NA |
| | Entire Class | 22373 | 768 | 1624213 | 70 | 0.9969 | 0.9995 | 0 | 0.0031 | 0.0509 | 0.9668 | 0.9816 |
| Intermittent Motion | Abandoned1 | 6384 | 57 | 152919 | 0 | 1 | 0.9996 | 0 | 0 | 0.0358 | 0.9912 | 0.9956 |
| | Abandoned2 | 21851 | 2420 | 134429 | 660 | 0.9707 | 0.9823 | 0.0177 | 0.0293 | 1.9327 | 0.9003 | 0.9342 |
| | Entire Class | 28235 | 2477 | 287348 | 660 | 0.9772 | 0.9915 | 0.0085 | 0.0228 | 0.9842 | 0.9193 | 0.9474 |
| Out of range | Multipeople2 | 151399 | 1635 | 739312 | 3014 | 0.9805 | 0.9978 | 0.0022 | 0.0195 | 0.5192 | 0.9893 | 0.9849 |
| | Multipeople1 | 39848 | 69 | 672140 | 4558 | 0.8974 | 0.9999 | 0 | 0.1026 | 0.6457 | 0.9983 | 0.9451 |
| | Topviewlab1 | 11935 | 1137 | 414867 | 221 | 0.9818 | 0.9973 | 0.0027 | 0.0182 | 0.3172 | 0.9130 | 0.9462 |
| | Topviewlab2 | 12166 | 436 | 403325 | 73 | 0.9940 | 0.9989 | 0.0011 | 0.0060 | 0.1224 | 0.9654 | 0.9795 |
| | Topviewlab3 | 7672 | 1398 | 261972 | 410 | 0.9493 | 0.9947 | 0.0053 | 0.0507 | 0.6660 | 0.8459 | 0.8946 |
| | Entire Class | 223020 | 4675 | 2491616 | 8276 | 0.9642 | 0.9981 | 0.0019 | 0.0358 | 0.4748 | 0.9795 | 0.9718 |
| Shadow | fall01cam1 | 4301 | 959 | 97030 | 110 | 0.9751 | 0.9902 | 0.0098 | 0.0249 | 1.0439 | 0.8177 | 0.8895 |
| | genSeq2 | 16940 | 0 | 174611 | 449 | 0.9742 | 1 | 0 | 0.0258 | 0.2339 | 1 | 0.9869 |
| | Shadow_ds | 7449 | 0 | 204331 | 60 | 0.9920 | 1 | 0 | 0.0080 | 0.0283 | 1 | 0.9960 |
| | Shadows1 | 33306 | 0 | 132424 | 30 | 0.9991 | 1 | 0 | 0 | 0.0181 | 1 | 0.9995 |
| | Shadows2 | 17049 | 0 | 142311 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
| | Entire Class | 79045 | 959 | 750707 | 649 | 0.9919 | 0.9987 | 0.0013 | 0.0081 | 0.1934 | 0.9880 | 0.9899 |

*TP = true positive, FP = false positive, TN = true negative, FN = false negative, Rec = recall, Sp = specificity, FPR = false positive rate, FNR = false negative rate, PWC% = percentage of wrong classification, Pre = precision, F1 = F-measure.


Table 2: Comparison of the proposed method and methods published in the dataset bench mark[36] in each class Sp↑

FPR↓

FNR↓ PWC%↓ Pre↑

F1↑

0.9985 0.9988 0.0012 0.0015 0.1227 0.9899

0.9942

0.8401 0.9985 0.0015 0.1599 0.9778 0.9682

0.8936

0.9725 0.9856 0.0144 0.0275 1.5809 0.8354

0.8935

FPR↓

FNR↓ PWC%↓ Pre↑

F1↑

0.9605 0.9977 0.0023 0.0395 0.6441 0.9808 0.9706 0.9563 0.9927 0.0073 0.0437 1.2161 0.9434 0.9488 0.4310 0.9767 0.0233 0.5690 16.0404 0.8018 0.4864

0.9816

0.4514 0.9955 0.0045 0.0486 0.9321 0.4737

0.4597

0.4366 0.9715 0.0285 0.0634 3.5022 0.4759

0.4527

0.4795 0.3392 0.4479 0.4699 0.4707

0.4159 0.4188 0.4587 0.4567 0.4504

0.4454 0.3569 0.4499 0.4610 0.4581

The proposed ap- 0.9642 0.9981 0.0019 0.0358 0.4748 0.9795 proach

0.9718

The proposed approach

0.9919 0.9987 0.0013 0.0081 0.1934 0.9880 0.9899

0.9260

RGBDSOBS

0.9323 0.9970 0.0030 0.0677 0.7001 0.9733 0.9500

RGBSOBS

0.9359 0.9881 0.0119 0.0641 1.5128 0.9140 0.9218

SRPCA

0.7592 0.9768 0.0232 0.2408 4.0602 0.8128 0.7591

AvgM-D Kim

0.8812 0.9876 0.0124 0.1188 1.9330 0.8927 0.8784 0.9270 0.9934 0.0066 0.0730 1.0771 0.9404 0.9314

SCAD

0.9665 0.9910 0.0090 0.0335 1.0093 0.9276 0.9458

RGBDSOBS

0.9816 0.9858 0.9935 0.9927 0.9914

0.0184 0.0142 0.0065 0.0073 0.0086

0.0205 0.1608 0.0521 0.0301 0.0293

2.9944 1.6943 0.9820 0.4432 2.4049

Sp↑

0.0031 0.0509 0.9668

0

0.1321 0.1632 0.1298 0.0159 0.3179

Rec↑

0.8083 0.8538 0.9009 0.9638 0.7648

0.9969 0.9995

0.0222 0.0078 0.0032 0.0037 0.0051

The proposed approach RGBDSOBS RGBSOBS SRPCA

0.7850 0.8860 0.9433 0.9447 0.9016

The proposed approach RGBDSOBS RGBSOBS SRPCA AvgM-D Kim SCAD cwisardH+

0.9778 0.9922 0.9968 0.9963 0.9949

1.9171 3.0717 1.1395 0.9715 1.0754

0.9170 0.9975 0.0025 0.0830 0.5613 0.9362

RGBSOBS

0.8902 0.9896 0.0104 0.1098 1.3610 0.8237

0.8527

SRPCA

0.8785 0.9878 0.0122 0.1215 1.6100 0.7443

0.8011

AvgM-D Kim

0.6319 0.9860 0.0140 0.3681 2.7663 0.6360 0.9040 0.9961 0.0039 0.0960 0.8228 0.9216

0.6325 0.9120

SCAD

0.9286 0.9965 0.0035 0.0714 0.5711 0.9357

0.9309

cwisardH+ 0.8959 0.9956 0.0044 0.1041 0.8731 0.9038

0.8987

IntermittentMotion

0.8679 0.8368 0.8702 SCAD 0.9841 cwisardH+ 0.6821

method

Color camouflage

Rec↑

Shadow

Out of range

Illumination changes

Depth camouflage

method The proposed approach RGBDSOBS RGBSOBS SRPCA AvgM-D Kim

0.8476 0.9001 0.9737 0.9875 cwisardH+ 0.9533

0.9389 0.9793 0.9927 0.9904 0.9849

The proposed approach RGBDSOBS RGBSOBS SRPCA AvgM-D Kim SCAD cwisardH+

0.9816 0.9858 0.9935 0.9927 0.9914

AvgM-D Kim SCAD

0.0611 0.0207 0.0073 0.0096 0.0151

0.1524 0.0999 0.0263 0.0125 0.0467

4.3124 2.0719 0.7389 0.7037 1.1931

0.8367 0.8096 0.9754 0.9677 0.9502

0.8329 0.8508 0.9745 0.9775 0.9510

0.9772 0.9915 0.0085 0.0228 0.9842 0.9193 0.9474 0.4514 0.9955 0.0045 0.0486 0.9321 0.4737 0.4597 0.4366 0.9715 0.0285 0.0634 3.5022 0.4759 0.4527 0.4795 0.3392 0.4479 0.4699 0.4707

cwisardH+ 0.9518

0.0184 0.0142 0.0065 0.0073 0.0086

0.0205 0.1608 0.0521 0.0301 0.0293

1.9171 3.0717 1.1395 0.9715 1.0754

0.4159 0.4188 0.4587 0.4567 0.4504

0.4454 0.3569 0.4499 0.4610 0.4581

0.9877 0.0123 0.0482 1.3942 0.9062 0.9264

*TP= true positive, FP= false positive, TN= True negative, FN, False negative, rec=Recall, SP=specificity, FPR= false positive rate, FNR=false negative rate , PWC%=percentage of wrong classification, PR= precision, and 𝐹1 F-measure

The score using these coefficients is shown in Figure (9e). Finally, γ = 0.1 is selected to make the scheduler stable against small changes in the score (shown in Figure (9f)).

4.2. Analysis of the scheduler

Algorithm 3 outlines the proposed scheduler. Sensors with lower confidence can run at a lower frequency, which conserves overall energy. In addition, any further detection algorithm needs to run only on a limited number of sensors, so for networks with a large number of nodes the efficiency of the network is increased. The reduction in computational complexity is most significant on the central handler, which manages and schedules all the sensors. The scheduler can be used on top of any sensor (visual) task management. Let F_sen be the sampling frequency of the employed sensor and T the duration of the monitoring; then the number of frames taken by a sensor is T × F_sen. If all sensors work at the same frequency and all sensors are involved in the monitoring task, the number of frames captured of the scene without the scheduler is n_fr_normal = N × T × F_sen. Let R be the rate of reduction in the standby mode and N′ the number of sensors in standby mode; then the number of operations is

n_fr_sc = (N − N′) × T × F_sen × f(m) + N′ × T × (F_sen / R) × f(m) = N × T × F_sen × f(m) − N′ × (1 − 1/R) × T × F_sen × f(m) ,  (9)

where n_fr_normal > n_fr_sc. Increases in the values of R and N′ result in decreases in the number of frames. Furthermore, a large portion of the energy is spent on capturing images; by lowering the number of frames, the scheduler can reduce the overall power consumption of a complex, battery-supported sensor network.
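The saving promised by equation (9) is easy to tabulate. The sketch below compares the number of captured frames with and without the scheduler for assumed values of N, T, F_sen, R, and the number of standby sensors; the per-frame cost f(m) cancels in the ratio and is therefore omitted.

```python
def frames_without_scheduler(N, T, F_sen):
    """All N sensors capture at F_sen for the whole monitoring period T."""
    return N * T * F_sen

def frames_with_scheduler(N, T, F_sen, N_standby, R):
    """Equation (9) without the per-frame cost f(m): standby sensors run at F_sen / R."""
    return (N - N_standby) * T * F_sen + N_standby * T * F_sen / R

if __name__ == "__main__":
    # Assumed example: 4 sensors, 60 s, 30 fps, 3 sensors in standby, rate reduced 6x.
    normal = frames_without_scheduler(4, 60, 30)
    sched = frames_with_scheduler(4, 60, 30, N_standby=3, R=6)
    print(normal, sched, 1 - sched / normal)   # fraction of frames saved
```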

5. EVALUATION AND DISCUSSION

In this section, the evaluation results of the proposed methods are presented. First, the proposed change detection approach is evaluated using a published dataset collected for background subtraction purposes. Then, the evaluation is extended to different real-time scenarios. Finally, the performance of the multi-sensor tracking and scheduling is evaluated in a variety of scenarios.

Figure 11: Real-time single-sensor set-up. (a) A long hallway, part of which is out of the sensor's range (b) A cluttered environment with the sensor pointing at the entrance door, where d is the distance between the sensor and the entrance door

5.1. Single-sensor change-detection evaluation

We evaluate the proposed change detection algorithm both on a dataset collected with a depth sensor and in real-time scenarios using an experimental setup. The employed dataset is the SBM-RGBD dataset [40], which contains different sets of RGB and depth frames specialized for background detection and covering various situations that challenge change detection algorithms, such as depth camouflage (when the foreground is close to the background), color camouflage (when the foreground color is close to the background's), and single and multiple foregrounds. We then evaluate the proposed algorithm in real-time scenarios in various situations to explore its efficiency in a more dedicated environment.

5.2. Evaluation of single-sensor change detection using datasets

To evaluate the proposed change detection method, an RGB-D dataset specifically published for background subtraction applications has been utilized [40]. This dataset was captured using Kinect v1.0, whereas we analyze our method based on Kinect v2.0; the noise and outliers which affect the confidence function are assumed to follow the same behaviour as described previously [8]. The proposed method requires the whole background to be present in at least one frame. The dataset is divided into classes, each with a different set of images focusing on one challenge of change detection applications, and the proposed algorithm has been run on the depth data only of each class. Some samples of the dataset are shown in Figure (10). The figure illustrates an example from each class, where the top-right RGB image is the background image, the top-left image is one of the examples including the foreground, and

Figure 12: The setup for multi-sensor tracking in six different scenarios

the point cloud with the scan plane shown by green hollows (mostly Ŝ²⁴⁰) is shown under the RGB images, with the detected changes illustrated by black circles. It should be mentioned that the green shadow is a graphical artifact and that the whole area is detected by the proposed algorithm. The results are shown in Table (1) for each sequence of the dataset separately; the last row of each block contains the overall result for the class, including the number of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). In addition, the table contains the dataset benchmark metrics, defined as follows: recall, Rec = TP/(TP + FN); specificity, Sp = TN/(TN + FP); false positive rate, FPR = FP/(FP + TN); false negative rate, FNR = FN/(TP + FN); percentage of wrong classification, PWC = 100 (FN + FP)/(TP + FN + FP + TN); precision, Pre = TP/(TP + FP); and F-measure, F1 = 2·Precision·Recall/(Precision + Recall). Lower values of FPR, FNR, and PWC are preferable, while higher values of the other metrics (Rec, Sp, Pre, and F1) are preferable; these preferences are indicated by a small arrow next to each metric in the tables. As can be seen in Table 1, the approach is able to detect the changes of the scene accurately in most cases. For the color camouflage category, the best recall value is obtained by the Kim method, about 0.1% better than our result, which shows that the number of true positives detected by our method is slightly lower than theirs. In the same category, the FNR of SCAD is approximately 0.2% lower than that of our approach, which shows that the number of false negatives produced by our method is higher than theirs; however, our approach gains the highest results on the other metrics.
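The benchmark metrics listed above reduce to a few lines of code; the helper below is a convenience sketch (not the benchmark's official evaluation code), checked here against the Depth camouflage "Entire Class" counts of Table (1).

```python
def metrics(TP, FP, TN, FN):
    """Benchmark metrics used in Tables (1)-(3), computed from raw counts."""
    rec = TP / (TP + FN) if TP + FN else float("nan")
    sp = TN / (TN + FP) if TN + FP else float("nan")
    fpr = FP / (FP + TN) if FP + TN else float("nan")
    fnr = FN / (TP + FN) if TP + FN else float("nan")
    pwc = 100.0 * (FN + FP) / (TP + FN + FP + TN)
    pre = TP / (TP + FP) if TP + FP else float("nan")
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else float("nan")
    return {"Rec": rec, "Sp": sp, "FPR": fpr, "FNR": fnr,
            "PWC%": pwc, "Pre": pre, "F1": f1}

if __name__ == "__main__":
    # The Depth camouflage "Entire Class" counts from Table (1).
    print(metrics(TP=90147, FP=921, TN=772893, FN=139))
```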

Table 3: The result of the real-time experiments

| Scene | d | TP | FP | TN | FN | Rec↑ | Sp↑ | FPR↓ | FNR↓ | PWC%↓ | Pre↑ | F1↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hallway | ∞ | 1304 | 0 | 13919 | 137 | 0.9049 | 1 | 0 | 0.0951 | 0.8919 | 1 | 0.9501 |
| Clutter lab | 3.5 | 1921 | 108 | 14752 | 115 | 0.9435 | 0.9927 | 0.0073 | 0.0565 | 1.3198 | 0.9468 | 0.9451 |
| Clutter lab | 4 | 1350 | 114 | 20890 | 174 | 0.8858 | 0.9946 | 0.0054 | 0.1142 | 1.2784 | 0.9221 | 0.9036 |
| Clutter lab | 4.5 | 1864 | 50 | 20052 | 50 | 0.9739 | 0.9975 | 0.0025 | 0.0261 | 0.4542 | 0.9739 | 0.9739 |
| Clutter lab | 5 | 1936 | 118 | 23058 | 100 | 0.9509 | 0.9949 | 0.0051 | 0.0491 | 0.8647 | 0.9426 | 0.9467 |
| Clutter lab | 6 | 3513 | 50 | 42241 | 276 | 0.9272 | 0.9988 | 0.0012 | 0.0728 | 0.7075 | 0.9860 | 0.9557 |
| Clutter lab | 7.5 | 1706 | 83 | 28758 | 173 | 0.9079 | 0.9971 | 0.0029 | 0.0921 | 0.8333 | 0.9536 | 0.9302 |

Figure 13: (a, b) The depth frames captured by K1 and K2, respectively, and the corresponding point clouds, in which the scan is shown by green hollows (c) The trajectory of the target moving in the monitoring area (d) Comparison of the scores of the two sensors in each frame (e) The representation of each sensor's state while cooperating in target tracking: shaded green shows that the target is being tracked by the sensor, blue indicates detection only, and red indicates no detection of the target

In the category of intermittent motion, the RGBD-SOBS method is approximately 0.2% better than our results, showing that we have included more pixels as foreground than they have; however, in all other cases our approach shows better results. Table (2) shows the overall result for each class along with the results of some other methods (background subtraction and foreground segmentation) published in [40]. The proposed method's results are shown in a gray row at the beginning of each class, and the best result for each metric in each class is shown in bold.

5.2.1. Real-time single-sensor change-detection evaluation

In order to study the performance of the proposed change detection algorithm in a more challenging environment, the algorithm was run in different scenarios with a defined distance between the sensor and the expected onset of change. Two different setups were considered, shown in Figure (11). In the first setup, the sensor is located in a long hallway in which part of the scene is out of the sensor's range; the targets moved along the hallway while being recorded by the depth sensor. In the second setup, the sensor is located in a cluttered area and its FOV is

restricted by different objects in the scene. The sensor points at the entrance door, where the onset of change starts as the target enters the room. In the different scenarios, the distance between the door and the sensor's plane was increased step by step to generate new depth frame streams (the corresponding distance is denoted by d in Figure (11b)). In Table (3), the first row shows the result for the hallway experiment (shown in Figure (11a)), where the actual onset of change is out of the range of the sensor's FOV; the other rows of the table contain the results of the proposed method in the cluttered lab (Figure (11b)). The second column of Table (3) shows the distance between the sensor and the onset of the change; in the first row this value is given as d = ∞, representing the fact that the event starts out of the sensor's range.

5.3. Multi-sensor results and scheduling

To evaluate the proposed scheduler, we defined six different scenarios: three of them are captured in a long hallway, part of which is out of the sensors' range, and the rest in a cluttered environment. The sensors are placed in a variety of positions with respect to each other, covering different parts of the monitored environment


Figure 14: (a, b) The depth frames captured by K1 and K2, respectively, and the corresponding point clouds, in which the scan is shown by the hollow green markers. (c) The trajectory of the target moving in the monitoring area. (d) Comparison of the scores of the two sensors in each frame. (e) Representation of the sensors' roles in cooperative target tracking: shaded green indicates that the sensor is assigned to track the target, blue indicates that the sensor detects the target but is not assigned, and red indicates no detection of the target.



Figure 15: (a, b) The depth frames captured by K1 and K2, respectively, and the corresponding point clouds, in which the scan is shown by the hollow green markers. (c) The trajectory of the target moving in the monitoring area. (d) Comparison of the scores of the two sensors in each frame. (e) Representation of the sensors' roles in cooperative target tracking: shaded green indicates that the sensor is assigned to track the target, blue indicates that the sensor detects the target but is not assigned, and red indicates no detection of the target.

with various portions of their FOVs overlapping. Figure (12) illustrates the multi-sensor setup for each of these scenarios. Here we first explain the results of our scheduler and then show how the sensors are assigned to the tracking task. We also compare the energy consumption and the change-detection accuracy with and without the scheduler.

5.3.1. The scheduler evaluation

The detailed results of each scenario are shown in Figures (13, 14, 15, 16, 17, and 18). In each of these figures, a sample of the depth frames and point clouds captured by both sensors is first presented (parts (a) and (b)) to show the sensed information of each sensor, and the sensors' relative spatial positions are shown in part (c). Part (c) also contains the trajectory of the moving target in the FOV of the sensors. The score of each sensor, computed in each frame based on equation (8), is shown in part (d) of each figure. Finally, the action of the scheduler is visualized using a 1D heat map of the sensor scores in part (e). The heat maps are generated from the value of the final score of each sensor. If a sensor cannot detect any change in its FOV, the heat-map color is red. If a sensor detects the change but is not assigned to track the target,

the color is a shade of blue, with darker blue indicating a higher score. Finally, shades of green indicate that the sensor detects the change and is assigned to track the target; similarly, darker green indicates a higher score. The first three scenarios are captured in a long hallway. In the first scenario (Figure (12a)), the sensors are located at the same distance and next to each other. Initially, as the target moves toward the sensors, both sensors have similar scores. As the target moves closer to either one of the sensors, that sensor's score increases, as expected. The target's trajectory, the score of each sensor, and an example of each depth frame and point cloud are shown in Figure (13). In the second hallway scenario (Figure (12b)), one of the sensors is located in front of the other to enlarge the monitored area. As the target moves toward the sensors, the sensor closest to the target (K1) detects it, and as the target gets closer its score increases. Meanwhile, as the target enters the FOV of K2, it is detected by K2 as well. When the target then exits the FOV of K1, the tracking task is assigned to the other sensor. The target's trajectory, the score of each sensor, and an example of each depth frame and point cloud are shown in Figure (14).
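As an illustrative sketch of how such a heat map could be generated, the small helper below maps a sensor's per-frame state to a colour; the function name and the exact shading rule are our own simplification rather than the authors' implementation.

```python
def heatmap_color(detected: bool, assigned: bool, score: float, score_max: float):
    """Map a sensor's per-frame state to an RGB color for the 1D heat map.

    Red: no change detected; shades of blue: change detected but the sensor is
    not assigned; shades of green: the sensor is assigned to track the target.
    Darker shades correspond to higher scores.
    """
    if not detected:
        return (1.0, 0.0, 0.0)                                   # red
    shade = 1.0 - 0.7 * min(score / max(score_max, 1e-9), 1.0)   # higher score -> darker
    return (0.0, shade, 0.0) if assigned else (0.0, 0.0, shade)
```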


Figure 16: (a, b) The depth frames captured by K1 and K2, respectively, and the corresponding point clouds, in which the scan is shown by the hollow green markers. (c) The trajectory of the target moving in the monitoring area. (d) Comparison of the scores of the two sensors in each frame. (e) Representation of the sensors' roles in cooperative target tracking: shaded green indicates that the sensor is assigned to track the target, blue indicates that the sensor detects the target but is not assigned, and red indicates no detection of the target.


In the third hallway scenario (Figure (12c)), one of the sensors is located in front of the other while their viewing directions are perpendicular to each other. As the target moves toward the sensors, K1 detects it first since it points down the hallway, so it is assigned to track the target until the target enters the FOV of the other sensor (K2). Since the difference between the scores is not significant, the target, although closer to K2, continues to be tracked by K1 in order to avoid switching between the two sensors. The target's trajectory, the score of each sensor, and an example of each depth frame and point cloud are shown in Figure (15). The other three scenarios are taken from a cluttered laboratory space. In the first of these (Figure (12d)), both sensors cover the area around the door and their FOVs largely overlap. The target's trajectory, the score of each sensor, and an example of each depth frame and point cloud are shown in Figure (16). As expected, in the first set of frames, when the target is in the field of view of K2, K2 is assigned to track the target because of the higher resolution and confidence it provides; as the target exits its FOV, K1 is assigned to track the target. In the second cluttered-laboratory scenario (Figure (12e)), the sensors are positioned in front of each other and the target walks in the area between them. The scheduler finds the best sensor to track the target in each frame, and the tracking task switches between the sensors. However, the exchange of the task between them is followed by a small lag due to the presence of P(i) in equation (8). The target's trajectory, the score of each sensor, and an example of each depth frame and point cloud are shown in Figure (17). The last scenario (Figure (12f)) involves a situation where the distances of the sensors to the target are almost the same as in the first scenario; however, in this case the target first enters the FOV of the sensor that is farthest away (K1), so K1 is initially assigned to track the target. In the follow-up frames, K2 also detects the target and, providing a higher score, takes over the tracking; therefore, K2 is expected to be assigned to track the target for most of the sequence.

The target's trajectory, the score of each sensor, and an example of each depth frame and point cloud are shown in Figure (18). As can be seen, in all of these scenarios the scheduler makes a logical decision when selecting the most qualified sensor, and the proposed method can easily be extended to more than two sensors.
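The switching behaviour described above can be summarised as a hysteresis rule: the currently assigned sensor keeps the tracking task unless a competitor's score exceeds it by a clear margin. The sketch below is our own illustration of this idea (the margin value and function name are hypothetical; in the paper the lag arises from the term P(i) in equation (8)).

```python
def assign_tracker(scores, current=None, margin=0.1):
    """Select the sensor that tracks the target in the current frame.

    scores  : dict of sensor id -> score for the sensors that detected a change
    current : sensor id currently assigned to the tracking task, or None
    margin  : relative score advantage a competitor needs before we switch
    """
    if not scores:
        return None                                   # no sensor detects the target
    best = max(scores, key=scores.get)
    if current in scores and scores[best] <= scores[current] * (1.0 + margin):
        return current                                # keep the current sensor (hysteresis)
    return best                                       # switch to the clearly better sensor


# Example: K2 scores slightly higher, but not by enough to justify a switch.
print(assign_tracker({"K1": 0.62, "K2": 0.66}, current="K1"))   # -> K1
```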

5.3.2. Evaluating the impact of the scheduler

In this subsection, we analyze the impact of using the scheduler on the whole tracking system. To do so, we compare four different conditions with respect to detection accuracy and energy consumption: first, each sensor is used separately to track the target; then both sensors are used together without the scheduler; and finally the scheduler is added so that the results can be compared. For measuring the accuracy of each condition $s$, we define

$DAcc_s = \sum_{i=1}^{n} \begin{cases} rec_i & \text{if } D_i^s = 1 \\ 0 & \text{if } D_i^s = 0 \end{cases}$  (10)

where $rec_i$ is the recall value calculated for frame $i$ (described in Section 5.1) and $D_i^s \in \{0,1\}$ is the detection value: $D_i^s = 0$ indicates that the subject is in the field of view of the whole tracking system but cannot be detected under condition $s$, and $D_i^s = 1$ indicates that the subject is detected by the sensor in the defined scenario; $n$ is the number of depth images captured during the tracking process. We would also like to compare a measure of the energy consumed by the tracking in each of the above conditions. Since most of the energy is spent on capturing frames, we define $E_n = n/(T \times F_{sen})$, where $T$ is the duration of the tracking and $F_{sen}$ is the sampling frequency of the sensor (refer to equation (9)). For the case where both sensors are utilized without the scheduler, the detection accuracy is calculated using the best recall value in each frame, given the limitation of the change-detection algorithm:

$DAcc_s = \sum_{i=1}^{n} \max_{j \in \{1,2\}} rec_i^j$  (11)
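As a minimal sketch (assuming per-frame recall and detection lists as inputs; not the authors' implementation), equations (9)–(11) translate directly into:

```python
def detection_accuracy(recalls, detected):
    """Eq. (10): sum of per-frame recall over the frames where the target is detected."""
    return sum(rec for rec, det in zip(recalls, detected) if det)


def detection_accuracy_no_scheduler(recalls_k1, recalls_k2):
    """Eq. (11): best per-frame recall when both sensors run without a scheduler."""
    return sum(max(r1, r2) for r1, r2 in zip(recalls_k1, recalls_k2))


def energy_measure(n_frames, duration_s, sampling_rate_hz):
    """En = n / (T x F_sen): captured frames relative to the maximum possible."""
    return n_frames / (duration_s * sampling_rate_hz)
```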


Figure 17: (a, b) The depth frames captured by K1 and K2, respectively, and the corresponding point clouds, in which the scan is shown by the hollow green markers. (c) The trajectory of the target moving in the monitoring area. (d) Comparison of the scores of the two sensors in each frame. (e) Representation of the sensors' roles in cooperative target tracking: shaded green indicates that the sensor is assigned to track the target, blue indicates that the sensor detects the target but is not assigned, and red indicates no detection of the target.


Figure (19) shows this comparison for each of the scenarios shown in Figure (12).


Figure 18: (a, b) The depth frames captured by K1 and K2, respectively, and the corresponding point clouds, in which the scan is shown by the hollow green markers. (c) The trajectory of the target moving in the monitoring area. (d) Comparison of the scores of the two sensors in each frame. (e) Representation of the sensors' roles in cooperative target tracking: shaded green indicates that the sensor is assigned to track the target, blue indicates that the sensor detects the target but is not assigned, and red indicates no detection of the target.

Figure 19: Comparison of detection accuracy and energy consumption under four conditions: (1) only K1 is assigned to the tracking task (blue bars); (2) only K2 is assigned to the tracking task (red bars); (3) the scheduler assigns sensors to the tracking task (green bars); (4) both sensors are assigned to the tracking task simultaneously (yellow bars).

Here we show results for the cases of using each sensor separately, both sensors together without the scheduler, and both sensors with the scheduler. As can be seen from the comparative analysis in Figure (19), using multiple sensors boosts the detection accuracy significantly in both cases where all sensors are utilized. However, with the scheduler (Sc), less energy is spent on capturing images (25% less in this experiment), while the accuracy remains close to the maximum achieved when both sensors are used without the scheduler. In these experiments we also halved the sampling rate of a sensor while it is in standby mode (not assigned to the tracking task). In the scenario where the subject moves between the sensors' FOVs (Figure (12e)), the number of images captured with the scheduler is closest to the maximum number captured when using both sensors without the scheduler.
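As a back-of-the-envelope sketch of where the saving comes from (the frame rate and duration below are assumed values rather than the experimental settings), keeping one sensor assigned at full rate and the other on standby at half rate reduces the number of captured frames by exactly 25% in the idealized case, consistent with the figure reported above:

```python
F_SEN = 30.0   # assumed sensor sampling rate (frames per second)
T = 60.0       # assumed duration of the tracking sequence (seconds)

# Without the scheduler, both sensors capture every frame:
n_without = 2 * T * F_SEN                      # 3600 frames

# With the scheduler, at any moment one sensor is assigned (full rate)
# while the other is on standby at half the rate:
n_with = T * F_SEN + T * (F_SEN / 2.0)         # 2700 frames

saving = 1.0 - n_with / n_without
print(f"captured frames reduced by {saving:.0%}")  # -> 25%
```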

6. SUMMARY AND CONCLUSIONS

In this paper, we presented a sensor-scheduling scheme for a network of depth sensors based on a model for detecting the onset of changes and its utilization during object tracking. The proposed approach provides an efficient scheme in which the sensor(s) with the highest qualification continue the tracking, which increases the overall efficiency of the tracking and reduces the computational load. We utilized only 1D scans as the basis of the change-detection method instead of the whole point cloud. In the proposed method, the behaviour of noise and outliers is characterized and used to define a weighting scheme. With the contribution of the neighbouring points and the weight of each point, the confidence of the changes in a frame is calculated and employed to avoid faulty triggering of sensors due to noise and outliers. We evaluated the proposed change-detection method on a published depth-sensor dataset specially designed to cover the most challenging scenarios in change-detection applications, where it showed an improvement in change-detection accuracy. In the worst case, the precision was 96% and the recall was 97%, showing that accuracy is maintained while the computational complexity is significantly lowered. The overall precision of the proposed method on the dataset was 99.83% and its recall was 97.54%, indicating a highly reliable method for change detection. In addition, we evaluated the proposed method using a real-time experimental set-up under different conditions; the results show highly reliable detection of the onset of change in different environments. Finally, the calculated confidence of the detected change, along with the sensor resolution, was utilized to schedule a network of sensors in a centralized architecture. To the best of our knowledge, the proposed method is the first scheduler of this kind that uses the sensors' own features to schedule a network of depth sensors. The scheduler was evaluated in six different scenarios, in which it made logical decisions when selecting the qualified sensor for the tracking task. These scenarios were run under different conditions: using only one sensor, using multiple sensors without the scheduler, and applying the scheduler to the multi-sensor configuration. The results showed that employing multiple sensors improves the accuracy significantly, and that adding the scheduler also improves the energy consumption of the tracking system. The proposed change-detection method can be employed as the basis of an object-detection algorithm (such as human-body detection). Furthermore, the algorithm can be extended to multi-subject tracking, which can be achieved with a matching scheme that identifies the corresponding object in the different sensors' perspectives.

BIBLIOGRAPHY

1.

Rasouli D, M.S.; Payandeh, S. A novel depth image analysis for sleep posture estimation. J. Ambient Intell. Humaniz. Comput. 1–16.

2.

Hong, C.; Yu, J.; Wan, J.; Tao, D.; Wang, M. Multimodal deep autoencoder for human pose recovery. IEEE Trans. Image Process. 2015, 24, 5659–5670.

3.

Hong, C.; Yu, J.; Tao, D.; Wang, M. Image-based three-dimensional human pose recovery by multiview locality-sensitive sparse retrieval. IEEE Trans. Ind. Electron. 2014, 62, 3742–3751.

4.

5.

6.

7.

8.

9.

10.

11.



12.

Quan, W.; Li, H.; Han, C.; Xue, Y.; Zhang, C.; Jiang, Z.; Hu, H. A depth enhancement strategy for Kinect depth image. In Proceedings of the MIPPR 2017: Pattern Recognition and Computer Vision; Cao, Z., Wang, Y., Cai, C., Eds.; SPIE, 2018.

13.

Suh, Y.-H.; Rhee, S.K.; Lee, K.-W. Continuous location tracking of people by multiple depth cameras. In Proceedings of the 2015 International Conference on Information and Communication Technology Convergence (ICTC); IEEE, 2015.

Huhle, B.; Schairer, T.; Jenke, P.; Strasser, W. Robust non-local denoising of colored depth data. In Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; IEEE, 2008.

14.

Rasouli, S.D.M.; Payandeh, S. Dynamic posture estimation in a network of depth sensors using sample points. In Proceedings of the Systems, Man, and Cybernetics (SMC), 2017 IEEE International Conference on; 2017; pp. 1710– 1715.

Javaheri, A.; Brites, C.; Pereira, F.; Ascenso, J. Subjective and objective quality evaluation of 3D point cloud denoising algorithms. In Proceedings of the 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW); 2017; pp. 1–6.

15.

Kerl, C.; Sturm, J.; Cremers, D. Robust odometry estimation for RGB-D cameras. In Proceedings of the 2013 IEEE International Conference on Robotics and Automation; IEEE, 2013.

16.

Sobral, A.; Vacavant, A. A comprehensive review of background subtraction algorithms evaluated with synthetic and real videos. Comput. Vis. Image Underst. 2014, 122, 4– 21.

17.

Sobral, A. BGSLibrary: An opencv c++ background subtraction library. In Proceedings of the IX Workshop de Visao Computacional; 2013; Vol. 2, p. 7.

18.

Toyama, K.; Krumm, J.; Brumitt, B.; Meyers, B. Wallflower: principles and practice of background maintenance. In Proceedings of the Seventh IEEE International Conference on Computer Vision; IEEE, 1999.

19.

Maddalena, L.; Petrosino, A. Background subtraction for moving object detection in rgbd data: A survey. J. Imaging 2018, 4, 71.

20.

Nguyen, V.-T.; Vu, H.; Tran, T.-H. An efficient combination of RGB and depth for background subtraction. In Proceedings of the National Foundation for Science and Technology Development (NAFOSTED) Conference on Information and Computer Science; 2014; pp. 49–63.

21.

Pham, T.T.D.; Nguyen, H.T.; Lee, S.; Won, C.S. Moving object detection with Kinect v2. In Proceedings of the Consumer Electronics-Asia (ICCE-Asia), IEEE International Conference on; 2016; pp. 1–4.

Amon, C.; Fuhrmann, F.; Graf, F. Evaluation of the spatial resolution accuracy of the face tracking system for kinect for windows v1 and v2. In Proceedings of the Proceedings of the 6th Congress of the Alps Adria Acoustics Association; 2014; pp. 16–17.

Zennaro, S.; Munaro, M.; Milani, S.; Zanuttigh, P.; Bernardi, A.; Ghidoni, S.; Menegatti, E. Performance evaluation of the 1st and 2nd generation Kinect for multimedia applications. In Proceedings of the 2015 IEEE International Conference on Multimedia and Expo (ICME); IEEE, 2015.

Xu, G.; Payandeh, S. Sensitivity study for object reconstruction using a network of time-of-flight depth sensors. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA); IEEE, 2015.

Yang, L.; Zhang, L.; Dong, H.; Alelaiwi, A.; El Saddik, A. Evaluating and Improving the Depth Accuracy of Kinect for Windows v2. IEEE Sensors J. 2015, 15, 4275–4285.

Fankhauser, P.; Bloesch, M.; Rodriguez, D.; Kaestner, R.; Hutter, M.; Siegwart, R. Kinect v2 for mobile robot navigation: Evaluation and modeling. In Proceedings of the 2015 International Conference on Advanced Robotics (ICAR); IEEE, 2015.

Wasenmüller, O.; Stricker, D. Comparison of Kinect V1 and V2 Depth Images in Terms of Accuracy and Precision. In Computer Vision – ACCV 2016 Workshops; Springer International Publishing, 2017; pp. 34–45.

22.

Moyà-Alcover, G.; Elgammal, A.; Jaume-i-Capó, A.; Varona, J. Modeling depth for nonparametric foreground segmentation using RGBD devices. Pattern Recognit. Lett. 2017, 96, 76–85.

23.

Hu, T.; Zhang, H.; Zhu, X.; Clunis, J.; Yang, G.; Maddalena, L.; Petrosino, A.; Pham, T.T.D.; Nguyen, H.T.; Lee, S.; et al. Improving video segmentation by fusing depth cues and the visual background extractor (ViBe) algorithm. Syst. Man, Cybern. (SMC), 2017 IEEE Int. Conf. 2017, 96, 76–85.

24.

Barnich, O.; Van Droogenbroeck, M. ViBE: A powerful random technique to estimate the background in video sequences. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing; IEEE, 2009.

25.

Hu, T.; Zhang, H.; Zhu, X.; Clunis, J.; Yang, G. Depth sensor based human detection for indoor surveillance. Futur. Gener. Comput. Syst. 2018, 88, 540–551.

26.

Palmero, C.; Clapés, A.; Bahnsen, C.; Møgelmose, A.; Moeslund, T.B.; Escalera, S. Multi-modal RGB–Depth–Thermal Human Body Segmentation. Int. J. Comput. Vis. 2016, 118, 217–239.

27.

28.

Xiong, N.; Svensson, P. Multi-sensor management for information fusion: issues and approaches. Inf. Fusion 2002, 3, 163–186.

Liu, H.; Wan, P.; Yi, C.-W.; Jia, X.; Makki, S.; Pissinou, N. Maximal lifetime scheduling in sensor surveillance networks. In Proceedings of the IEEE 24th Annual Joint Conference of the IEEE Computer and Communications Societies; IEEE.

29.

Yan, T.; He, T.; Stankovic, J.A. Differentiated surveillance for sensor networks. In Proceedings of the First International Conference on Embedded Networked Sensor Systems (SenSys '03); ACM Press, 2003.

30.

Fu, P.; Tang, H.; Cheng, Y.; Li, B.; Qian, H.; Yuan, X. An energy-balanced multi-sensor scheduling scheme for collaborative target tracking in wireless sensor networks. Int. J. Distrib. Sens. Networks 2017, 13, 155014771769896.

31.

Gilliam, C.; Ristic, B.; Angley, D.; Suvorova, S.; Moran, B.; Fletcher, F.; Gaetjens, H.; Simakov, S. Scheduling of Multistatic Sonobuoy Fields Using Multi-Objective Optimization. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); IEEE, 2018.

32.

Fu, P.; Cheng, Y.; Tang, H.; Li, B.; Pei, J.; Yuan, X. An Effective and Robust Decentralized Target Tracking Scheme in Wireless Camera Sensor Networks. Sensors 2017, 17, 639.

33.

Sun, S.-W.; Kuo, C.-H.; Chang, P.-C. People tracking in an environment with multiple depth cameras: A skeleton-based pairwise trajectory matching scheme. J. Vis. Commun. Image Represent. 2016, 35, 36–54.

34.

Faion, F.; Friedberger, S.; Zea, A.; Hanebeck, U.D. Intelligent sensor-scheduling for multi-kinect-tracking. In Proceedings of the 2012 {IEEE}/{RSJ} International Conference on Intelligent Robots and Systems; IEEE, 2012.

35.

Taher, M.R.H.; Ahmadi, H.; Hashemi, M.R. Power-aware analysis of H.264/AVC encoding parameters for cloud gaming. In Proceedings of the 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW); 2014; pp. 1–6.

36.

Bodor, R.; Drenner, A.; Schrater, P.; Papanikolopoulos, N. Optimal Camera Placement for Automated Surveillance Tasks. J. Intell. Robot. Syst. 2007, 50, 257–295.

37.

Yu, J.; Rui, Y.; Tao, D. Click prediction for web image reranking using multimodal sparse coding. IEEE Trans. Image Process. 2014, 23, 2019–2032.

38.

Yu, J.; Tao, D.; Wang, M.; Rui, Y. Learning to rank using user clicks and visual features for image retrieval. IEEE Trans. Cybern. 2014, 45, 767–779.

39.

Yu, J.; Yang, X.; Gao, F.; Tao, D. Deep multimodal distance metric learning using click constraints for image ranking. IEEE Trans. Cybern. 2016, 47, 4014–4024.

40.

Schuon, S.; Theobalt, C.; Davis, J.; Thrun, S. High-quality scanning using time-of-flight depth superresolution. In Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; IEEE, 2008.

41.

Lachat, E.; Macher, H.; Mittet, M.A.; Landes, T.; Grussenmeyer, P. First experiences with Kinect v2 sensor for close range 3D modelling. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2015, 40, 93.
