Optics Communications 448 (2019) 69–75
Foveated ghost imaging based on deep learning

Xiang Zhai a,b,∗, Zheng-dong Cheng a, Yang-di Hu a, Yi Chen a, Zhen-yu Liang a, Yuan Wei a

a State Key Laboratory of Pulsed Power Laser Technology, National University of Defense Technology, Hefei 230037, China
b Science and Technology on Electro-Optical Information Security Control Laboratory, Tianjin 300450, China
ARTICLE INFO

Keywords:
Ghost imaging
Foveated imaging
Deep learning
Object detection
ABSTRACT

Ghost imaging is an unconventional imaging mechanism that utilizes high-order correlation to reconstruct an object's image. Limited by the maximum refresh rate of the DMD or SLM, the sampling efficiency of ghost imaging has been a major obstacle to practical application. In this paper, foveated ghost imaging based on deep learning (DPFGI) is proposed to generate non-uniform resolution speckle patterns, taking the object detection result as the fovea point. We combine foveated speckle patterns inspired by the human visual system with a GAN-based ghost imaging object detection system to intelligently select the region of interest for foveated imaging. Simulation and experimental results show that DPFGI can detect objects in undersampled images with higher accuracy and achieve a higher PSNR in the fovea region than uniform-resolution ghost imaging, which opens new perspectives for more intelligent ghost imaging.
1. Introduction

As an unconventional imaging mechanism based on high-order correlation of light fields, ghost imaging (GI), also known as correlated imaging, allows an object's image to be reconstructed in the optical path without the object. In 1995, GI was first demonstrated as a quantum scheme by Pittman et al. [1] using entangled photon pairs. GI was later demonstrated as a classical scheme by Bennink et al. [2] using pseudothermal light in 2002. Whether performed with quantum states or classical light, both GI schemes need one reference arm and one test arm to record the light fields in two correlated beams. Subsequently, computational ghost imaging (CGI) was proposed by Shapiro et al. [3], which simplifies the optical scheme by employing a digital micro-mirror device (DMD) or spatial light modulator (SLM) to replace the reference arm, and improves reconstruction efficiency by applying compressive sensing reconstruction algorithms [4]. Owing to the advantages of GI in scattering robustness and wide spectral range, it has attracted extensive attention in many related fields, including remote sensing [5], terahertz imaging [6], three-dimensional LiDAR [7], fluorescence imaging [8], optical encryption [9], etc.

For the practical application of GI, the sampling efficiency limited by the maximum refresh rate of the DMD or SLM remains a major challenge to be overcome. Several attempts focus on optimizing imaging methods [10–13] and improving hardware performance [14–16]. Nevertheless, high-resolution imaging in complex backgrounds still struggles with the speed/quality trade-off. Owing to the sampling process of multiple speckle patterns in CGI, speckle pattern design is one of the key factors determining sampling resolution
and reconstruction speed [17]. Inspired by the fact that the human visual system's spatial resolution decreases away from the fovea, the point of gaze [18,19], foveated imaging emerged to realize space-variant resolution and low-bandwidth processing [20–23]. More recently, Phillips et al. [24] demonstrated an adaptive foveated single-pixel imaging method that creates spatially variant patterns on a DMD by combining a Cartesian grid within the fovea with a peripheral cylindrical polar system of pixels surrounding the fovea. However, the fovea gaze control in [24] provides a constant fovea size and a limited number of fovea locations, which is not suitable for objects of uncertain size in real scenes. As is well known, the human visual system can automatically extract the most important parts from a large number of redundant visual inputs through high-level processing in the cerebral cortex, such as memory, object detection, and object tracking. Therefore, we can train a selective visual attention mechanism for foveated ghost imaging by deep learning methods to improve sampling efficiency.

In this paper, we focus on deep learning methods for foveated ghost imaging that offer more intelligent approaches to fovea gaze control. The proposed foveated ghost imaging based on deep learning (DPFGI) generates foveated speckle patterns according to the object detection results. The foveated speckle pattern, inspired by the human visual system, is constructed from an $L$-layer pyramid of speckle patterns of different resolutions to enhance the image quality in the fovea region. The object detection system introduces generative adversarial nets (GAN) into the SSD architecture to improve robustness to undersampling. We evaluate the performance of DPFGI compared with uniform-resolution ghost imaging and validate the feasibility of detecting objects of a specific category for foveated ghost imaging.
∗ Corresponding author.
E-mail address: [email protected] (X. Zhai).

https://doi.org/10.1016/j.optcom.2019.05.019
Received 1 April 2019; Received in revised form 6 May 2019; Accepted 12 May 2019; Available online 15 May 2019
0030-4018/© 2019 Elsevier B.V. All rights reserved.
2. Methods

2.1. Experimental setup

The experimental setup of DPFGI is represented graphically in Fig. 1. A collimated laser beam irradiates a scene containing objects and background. The reflected light passes through a convergent lens and illuminates a DMD, where it is modulated by a series of speckle patterns. A bucket detector measures the total light intensity transmitted by each speckle pattern. Knowing the measured light intensities and the sequence of speckle patterns, we can reconstruct the image through the correlation of light fields.

Different from uniform-resolution ghost imaging (URGI), the DPFGI method requires two stages of sampling and reconstruction. First, the DMD generates a series of uniform-resolution speckle patterns to reconstruct an undersampled image with as few measurements as possible. The computer then performs deep learning object detection on the undersampled image to acquire the bounding box positions of specific objects. Finally, the DMD generates foveated speckle patterns based on the given bounding box for further task-specific imaging.

In image reconstruction, we assume that the total number of pixels in one speckle pattern is $N$, so the image $T(i)$ can be vectorized as an $N$-element column vector $\mathbf{x} \in \mathbb{R}^{N \times 1}$. The speckle pattern in the $m$th measurement is recorded as $I_m(i)$, where $i = 1, 2, \ldots, N$ indexes the pixels and $m = 1, 2, \ldots, M$ indexes the sampling frames. Each speckle pattern can therefore be vectorized as a row vector $\Phi_m \in \mathbb{R}^{1 \times N}$. For $M$ measurements at the same resolution, the sampling process can be written as an $M \times N$ measurement matrix:

$$\Phi = \begin{bmatrix} \Phi_1 \\ \Phi_2 \\ \vdots \\ \Phi_M \end{bmatrix} = \begin{bmatrix} I_1(1) & I_1(2) & \cdots & I_1(N) \\ I_2(1) & I_2(2) & \cdots & I_2(N) \\ \vdots & \vdots & \ddots & \vdots \\ I_M(1) & I_M(2) & \cdots & I_M(N) \end{bmatrix}. \quad (1)$$

In the $m$th measurement, the light intensity recorded by the bucket detector is

$$b_m = \sum_{i=1}^{N} I_m(i)\, T(i). \quad (2)$$

Thus, the ghost imaging linear system is

$$\mathbf{y} = \Phi \mathbf{x}, \quad (3)$$

where $\mathbf{y} = [b_1, b_2, \ldots, b_M]^{\mathrm{T}}$ is the vectorized representation of the light intensities measured in the $M$ measurements. The image $T(i)$ can be reconstructed by solving the above system as an optimization problem under the $l_0$ norm. Several studies [25,26] show that total variation minimization by augmented Lagrangian and alternating direction algorithms (TVAL3) [27] based on the Hadamard basis [28] performs outstandingly among state-of-the-art compressive sensing algorithms. Therefore, we use the TVAL3 algorithm based on the Hadamard basis for ghost imaging reconstruction in this paper. Furthermore, when the number of measurements is much smaller than the number of pixels in one speckle pattern, i.e., $M \ll N$, both the sampling rate and the image quality are relatively low, since Eq. (3) then contains too many unknowns. Therefore, one way to improve reconstruction quality with a limited number of measurements is to decrease the number of pixels in one speckle pattern while reserving enough pixels in the region of interest.
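As a minimal numerical sketch of Eqs. (1)–(3), the following Python fragment simulates the sampling process with random ±1 patterns and recovers an estimate by second-order correlation. The correlation step is only a simple stand-in for the TVAL3 solver used in the paper, and the function names are our own illustration:

```python
import numpy as np

def sample_gi(x, patterns):
    """Eqs. (2)-(3): bucket values y = Phi @ x for M speckle patterns."""
    # patterns: (M, N) measurement matrix Phi; x: (N,) vectorized image T(i)
    return patterns @ x

def correlate_gi(y, patterns):
    """Differential correlation reconstruction <(b - <b>) I>, a simple
    stand-in for the TVAL3 compressive-sensing solver used in the paper."""
    return (y - y.mean()) @ patterns / len(y)

rng = np.random.default_rng(0)
H, W = 64, 64                                     # pattern resolution
N, M = H * W, 1000                                # N pixels, M measurements (M << N)
x = rng.random(N)                                 # stand-in object image T(i)
patterns = rng.choice([-1.0, 1.0], size=(M, N))   # +/-1 speckle patterns
y = sample_gi(x, patterns)                        # bucket detector intensities b_m
x_hat = correlate_gi(y, patterns).reshape(H, W)   # undersampled estimate
```

In DPFGI, the first, undersampled stage would run this kind of pipeline before object detection, with `patterns` replaced by the Hadamard-derived speckle patterns.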
2.2. Foveated speckle pattern

The fovea of the human retina, the most sensitive area of vision, provides the highest resolution at the center of gaze, with resolution falling off toward the peripheral space. Foveated imaging simulates this biological property that spatial resolution drops sharply with retinal eccentricity. Based on this principle, we use the multi-resolution pyramid method to design the foveated speckle pattern. Since the bounding box in object detection is rectangular, the proposed foveated speckle pattern also follows rectangular coordinates.

Let $P_{\mathrm{origin}}(x, y)$ denote the $W \times H$ uniform-resolution speckle pattern in two-dimensional rectangular coordinates. We perform equal-interval sampling on the original uniform-resolution speckle pattern to constitute an $L$-layer pyramid of speckle patterns with different resolutions. The speckle pattern of the $l$th layer is given by

$$P_l(x, y) = \begin{cases} P_{\mathrm{origin}}(x, y) & l = 1 \\ P_{l-1}\left( 2^{l-1} \left[ x / 2^{l-1} \right] + 1,\; 2^{l-1} \left[ y / 2^{l-1} \right] + 1 \right) & l = 2, 3, \ldots, L \end{cases} \quad \forall x, y, \quad (4)$$

where $[\cdot]$ stands for the integer-valued function. In this way, each equal-interval $2^{l-1} \times 2^{l-1}$ pixel patch of the speckle pattern takes the same value $+1$ or $-1$ as the upper-left pixel in that patch and is combined as one large pixel for processing.

With deep learning object detection, we obtain the coordinates of the bounding box. Given that the central point of the bounding box is $F = (x_f, y_f)$ and that the width and height of the bounding box are $w_f$ and $h_f$ respectively, the masks of the $L$ layers can be expressed as

$$Mask_l(x, y) = \begin{cases} 1 & \left( \sum_{i=0}^{l-1} r_i \le \left| x - x_f \right| - w_f \le \sum_{i=0}^{l} r_i \right) \cap \left( \sum_{i=0}^{l-1} r_i \le \left| y - y_f \right| - h_f \le \sum_{i=0}^{l} r_i \right) \\ 0 & \text{else} \end{cases} \quad (5)$$

where $r_l$ indicates the radius of the $l$th layer, and $r_0 = r_1 = 0$. For the case where the resolution reduces uniformly from the central point to the borders, the radius $r_l$ is determined by the relative spatial position of the bounding box:

$$r_l = \frac{\max\left\{ x_f - \frac{w_f}{2},\; y_f - \frac{h_f}{2},\; W - x_f - \frac{w_f}{2},\; H - y_f - \frac{h_f}{2} \right\}}{L - 1}. \quad (6)$$

Here, we multiply the speckle pattern of each layer $P_l$ by its mask $Mask_l$ and add the products together. In this way, the final speckle pattern decreases the total number of pixels $N$ and highlights the fovea region, with resolution decreasing from the fovea to the periphery. The foveated speckle pattern is calculated by

$$P_{Fovea} = \sum_{l=1}^{L} Mask_l \times P_l. \quad (7)$$

For comparison, Fig. 2 demonstrates an example of the 'Lena' image reconstructed with uniform-resolution speckle patterns and with foveated speckle patterns. The original speckle pattern is a 128 × 128 pixel Hadamard matrix taking values of +1 or −1, as shown in Fig. 2(a). The 'Lena' image reconstructed with uniform-resolution speckle patterns by 2000 measurements is shown in Fig. 2(b). Here, we take the central point $F = (64, 64)$ as the fovea of the 'Lena' image, reduce the pixel number between adjacent layers by a factor of 4, and set the parameters $L = 4$ and $w_f = h_f = 32$. Hence, the total number of pixels $N$ in one speckle pattern decreases from 16,384 to 2224. The resolution of the foveated speckle pattern reduces from the center point to the periphery, and the red lines highlight the edges of each layer, as shown in Fig. 2(c). The 'Lena' image reconstructed with foveated speckle patterns by 2000 measurements is shown in Fig. 2(d). Comparing Fig. 2(b) and (d), the image reconstructed with foveated speckle patterns presents better detail in the fovea region than the image reconstructed with uniform-resolution speckle patterns by the same number of measurements, while its resolution in the surrounding region is lower. Consequently, foveated speckle patterns enhance the detail quality in the fovea region at the expense of the resolution in the surrounding region, and improve the sampling efficiency in specific scenes where the imaging target is known.
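The construction of Eqs. (4)–(7) can be prototyped as follows. This is a minimal Python sketch under our reading of the mask condition in Eq. (5) as rectangular rings of cumulative width $r_l$ around the bounding box; the helper names are our own:

```python
import numpy as np

def pyramid_layer(p_origin, l):
    """Eq. (4): each 2^(l-1) x 2^(l-1) patch of layer l takes the value of
    its upper-left pixel (assumes sides divisible by 2^(l-1))."""
    s = 2 ** (l - 1)
    coarse = p_origin[::s, ::s]
    return np.kron(coarse, np.ones((s, s), dtype=p_origin.dtype))

def foveated_pattern(p_origin, xf, yf, wf, hf, L=4):
    """Eqs. (5)-(7): overlay L pyramid layers using rectangular ring masks
    around the detected bounding box (our reading of Eq. (5))."""
    Hh, Ww = p_origin.shape
    # Eq. (6): uniform layer radius from the box position relative to the borders
    r = max(xf - wf / 2, yf - hf / 2, Ww - xf - wf / 2, Hh - yf - hf / 2) / (L - 1)
    yy, xx = np.mgrid[0:Hh, 0:Ww]
    # Eq. (5) distance terms |x - xf| - wf and |y - yf| - hf, clipped at 0
    d = np.maximum(np.maximum(np.abs(xx - xf) - wf, np.abs(yy - yf) - hf), 0.0)
    out = np.zeros_like(p_origin, dtype=float)
    for l in range(1, L + 1):
        if l == 1:                      # r_0 = r_1 = 0: layer 1 is the fovea box
            mask = d <= 0
        else:                           # ring between cumulative layer radii
            inner = (l - 2) * r
            outer = np.inf if l == L else (l - 1) * r
            mask = (d > inner) & (d <= outer)
        out[mask] = pyramid_layer(p_origin, l)[mask]
    return out
```

For the 'Lena' example in Fig. 2, `foveated_pattern(p, 64, 64, 32, 32, L=4)` would be applied to each 128 × 128 Hadamard pattern.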
Fig. 1. Experimental setup of foveated ghost imaging based on deep learning.
Fig. 2. Simulation results of the 'Lena' image with uniform-resolution ghost imaging and foveated ghost imaging. (a) 128 × 128 uniform-resolution speckle pattern. (b) 'Lena' image reconstructed with uniform-resolution speckle patterns by 2000 measurements. (c) Foveated speckle pattern. (d) 'Lena' image reconstructed with foveated speckle patterns by 2000 measurements.

2.3. GAN-based ghost imaging object detection

In order to apply deep learning object detection to ghost imaging, we previously introduced generative adversarial networks (GAN) [29] into ghost imaging object detection on the Faster-RCNN architecture to improve robustness to low resolution and low sampling rates [30]. Given the real-time demand that DPFGI places on object detection, we considered mainstream object detection architectures including Faster-RCNN [31], SSD [32], YOLO [33,34], etc. Numerous studies show that, among these architectures, Faster-RCNN has an accuracy advantage and YOLO a speed advantage [35]. Nevertheless, SSD balances accuracy and speed by combining the multi-scale feature maps of Faster-RCNN with the regression mechanism of YOLO. SSD300 achieves above 70% mAP on the PASCAL VOC2007 test set at more than 40 FPS [32], which satisfies the real-time demand of DPFGI. Hence, we choose SSD as the object detection architecture in DPFGI.

Fig. 3 shows the GAN-based ghost imaging object detection system on the SSD architecture. We take the ghost imaging reconstruction process as the generative model, generating reconstructed images $G(T_i, M_j, N_j)$ from the $n$ original images $T_i$ in the training set under $k$ groups of different resolutions $N_j$ and sampling rates $M_j / N_j$, where $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, k$. The training batches consisting of generated images are then input into the SSD object detection model, which we take as the discriminative model. In the SSD architecture with a 300 × 300 input size, the VGG-16 base network keeps its first five layers and transforms the FC6 and FC7 layers into two convolution layers. Moreover, three convolution layers and an average pooling layer are added to the end of the base network. Afterwards, the multi-scale feature maps are used for the offsets of the default boxes and the predictions of the different categories. Finally, the non-maximum suppression step outputs the detection results. According to the calculated loss of the final detections, we select the reconstructed images with high loss values, which are difficult to detect, as the training set for the next iteration.

Fig. 3. GAN-based ghost imaging object detection system on the SSD architecture.

The loss function of the discriminative model, $L_D$, is the weighted sum of the confidence loss $L_{conf}(x, c)$ and the localization loss $L_{loc}(x, b^p, b^g)$:

$$L_D = \frac{1}{Num} \left( L_{conf}(x, c) + \alpha L_{loc}\left( x, b^p, b^g \right) \right), \quad (8)$$

where $Num$ is the number of matched default boxes, $x = \{0, 1\}$ indicates whether the matched default box belongs to the category, $b^p$ is the predicted box, $b^g$ is the ground-truth box, $c$ is the confidence that the object belongs to the category, and $\alpha$ is a weight set by cross-validation.

The training steps of the GAN-based ghost imaging object detection system on the SSD architecture are as follows:

Step 1: Pre-train the SSD architecture with the $n$ original images, and take the resulting checkpoints as the initial object detection model.
Step 2: For each original image, the generative model reconstructs images with random speckle patterns under the $k$ groups of different resolutions and sampling rates.

Step 3: The discriminative model takes the $k$ groups of $n$ images as input and obtains the loss for each image. For each original image, the generative model selects the resolution and sampling rate with the highest loss value.

Step 4: To optimize the object detection model, train the SSD model on the images selected by the generative model.

Step 5: Repeat Steps 3 and 4 for the specified number of iterations.

In this way, we generate examples that are harder to detect for training the object detection model. Through this GAN-style joint training, we expand the training set and improve the robustness of the model to different degrees of low resolution and low sampling rate; a minimal sketch of this loop is given below. Notably, we adjust the minimum confidence threshold according to the sampling rate, and when different classification results have overlapping bounding boxes, we keep only the one with the highest confidence.
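The following Python sketch outlines Steps 2–5 as a hard-example selection loop. `reconstruct_gi`, `ssd_loss`, and `ssd_train_step` are hypothetical placeholders for the ghost-imaging generative model and the SSD discriminative model described above, not the actual TensorFlow implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def reconstruct_gi(image, height, rate):
    """Generative model G(T_i, M_j, N_j): GI reconstruction at a given
    resolution and sampling rate (stubbed here with additive noise)."""
    return image + rng.normal(0.0, 1.0 - rate, image.shape)

def ssd_loss(image):
    """Discriminative model: detection loss L_D of Eq. (8) (stubbed)."""
    return float(np.var(image))

def ssd_train_step(batch):
    """One SSD optimization step on the selected hard examples (stubbed)."""
    pass

# k groups of (height, sampling rate); width adapts to the aspect ratio.
groups = [(128, 0.10), (128, 0.20), (64, 0.20), (64, 0.30)]
originals = [rng.random((300, 300)) for _ in range(8)]     # n training images

for iteration in range(5):                                  # Step 5
    hard_examples = []
    for img in originals:                                    # Steps 2-3
        recons = [reconstruct_gi(img, h, rate) for h, rate in groups]
        losses = [ssd_loss(r) for r in recons]
        hard_examples.append(recons[int(np.argmax(losses))])  # highest loss
    ssd_train_step(hard_examples)                            # Step 4
```

In the real system, `ssd_train_step` would correspond to resuming the SSD checkpoints on the selected batch within the TensorFlow training pipeline.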
Table 1. PASCAL VOC2007 test detection average precision (%).

| Model | mAP | aero | bike | bird | boat | bottle | bus | car | cat | chair | cow |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SSD | 36.8 | 45.3 | 41.5 | 36.2 | 29.4 | 23.8 | 40.7 | 40.5 | 28.1 | 36.0 | 36.7 |
| SSD* | 42.1 | 52.4 | 46.1 | 38.4 | 36.8 | 28.4 | 50.3 | 44.9 | 33.8 | 40.3 | 40.0 |
| GAN-SSD | 44.4 | 54.0 | 48.9 | 39.8 | 36.3 | 33.7 | 52.5 | 45.5 | 37.2 | 39.5 | 41.3 |

| Model | table | dog | horse | mbike | person | plant | sheep | sofa | train | tv |
|---|---|---|---|---|---|---|---|---|---|---|
| SSD | 32.5 | 46.2 | 40.3 | 39.1 | 38.9 | 24.6 | 43.1 | 33.8 | 43.4 | 35.0 |
| SSD* | 33.8 | 53.4 | 46.8 | 43.4 | 45.0 | 26.7 | 50.7 | 37.6 | 51.5 | 41.2 |
| GAN-SSD | 34.2 | 55.0 | 48.2 | 47.6 | 49.7 | 31.5 | 54.1 | 36.9 | 54.5 | 46.8 |
3. Results

3.1. Simulation results

To verify the effectiveness of DPFGI, we first train the object detection model with the PASCAL VOC datasets [36] on the TensorFlow platform. We use the 5011 images of 20 categories in the 'VOC2007 trainval' dataset and the 17,125 images of 20 categories in 'VOC2012 trainval' for training, and the 4952 images in the 'VOC2007 test' dataset for testing. We train the models using SGD for 80,000 iterations; the learning rate starts from 0.001 and is reduced to 0.0001 after 60,000 iterations. The experimental platform is configured with the Ubuntu 16.04 operating system, an Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40 GHz processor, 32 GB RAM and an Nvidia GeForce GTX 1080Ti/PCIe/SSE2 graphics card. We set 4 groups of different resolutions and sampling rates: (1) height 128, width adaptive, sampling rate 10%; (2) height 128, width adaptive, sampling rate 20%; (3) height 64, width adaptive, sampling rate 20%; (4) height 64, width adaptive, sampling rate 30%. Images reconstructed with uniform-resolution speckle patterns of height 64 and adaptive width by 1000 measurements are taken for testing.

We then compare the following three models: (a) the model trained using the original images, namely SSD; (b) the model trained using the original images and unified reconstructed images of height 64 and adaptive width by 1000 measurements, namely SSD*; (c) the GAN-based training model, namely GAN-SSD. Table 1 shows the mAP and the average precision of the 20 categories for these three models. From the results, we can see that our proposed model has the highest accuracy, surpassing SSD by 7.6% and SSD* by 2.3%.

As shown in Fig. 4(1-a,b,c,d), we select four original images containing an aeroplane, a car, a train, and a person with a dog from the 'VOC2007 test' dataset as examples. In Fig. 4(2-a,b,c,d), we reconstruct images with uniform-resolution speckle patterns of height 128 and adaptive width by 2000 measurements as the URGI results for later comparison. Then, we implement DPFGI on the four original images. For object detection, we first reconstruct undersampled images with uniform-resolution speckle patterns of height 64 and adaptive width by 1000 measurements, setting the minimum confidence threshold to 0.3. The detection results of the undersampled images by the trained GAN-based object detection model are shown in Fig. 4(3-a,b,c,d). According to the detection results, the 4-layer foveated speckle patterns of the four images are generated as in Fig. 4(4-a,b,c,d,e). In particular, for the fourth image, in which two objects are detected, we generate two different foveated speckle patterns taking the two objects as fovea points respectively: Fig. 4(4-d) takes the detected 'person' as the fovea point, and Fig. 4(4-e) takes the detected 'dog' as the fovea point. Finally, the images reconstructed with foveated speckle patterns by 1000 measurements are shown in Fig. 4(5-a,b,c,d,e).

In single-object scenes, shown in Fig. 4(2-a,b,c) and (5-a,b,c), the image quality in the fovea region, i.e., the bounding box area, is significantly enhanced by the DPFGI method. It is noteworthy that although DPFGI requires two stages of sampling, the image quality in the fovea region still improves greatly when the total number of measurements is the same.
Fig. 4. Simulation results of URGI and DPFGI. (1) The original images containing an aeroplane, a car, a train, and a person with a dog. (2) Images reconstructed with uniform-resolution speckle patterns of height 128 and adaptive width by 2000 measurements. (3) Detection results on undersampled images reconstructed with uniform-resolution speckle patterns of height 64 and adaptive width by 1000 measurements. (4) Foveated speckle patterns based on the object detection results. (5) Images reconstructed with foveated speckle patterns by 1000 measurements.
In multi-object scenes, shown in Fig. 4(2-d) and (5-d,e), DPFGI distinguishes the two categories of objects and improves the image quality in the corresponding object region with different foveated speckle patterns. This demonstrates the feasibility of performing foveated ghost imaging on the region of a specific object category while intelligently suppressing interference from objects of other categories.

For quantitative evaluation, Table 2 presents the peak signal-to-noise ratio (PSNR) of the overall image and of the fovea region for the URGI and DPFGI results, computed against the original images. The improvement in the overall PSNR of DPFGI compared with URGI is quite limited, and the overall PSNR even decreases for images with relatively complex backgrounds such as the 'aeroplane'. Nevertheless, the advantage of DPFGI over URGI in fovea PSNR is obvious, owing to the improvement of the equivalent sampling rate in the fovea region.
Table 2. Overall and fovea PSNR (dB) of the URGI and DPFGI simulation results.

| Region | Method | aeroplane | car | train | person+dog |
|---|---|---|---|---|---|
| Overall | URGI | 22.19 | 20.08 | 17.83 | 19.16 |
| Overall | DPFGI | 21.20 | 20.29 | 17.92 | 18.88 (person fovea) / 19.28 (dog fovea) |
| Fovea | URGI | 29.55 | 26.93 | 28.02 | 26.86 (person) / 29.01 (dog) |
| Fovea | DPFGI | 34.47 | 29.20 | 29.57 | 27.74 (person) / 31.23 (dog) |
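As a reference for how the metrics in Table 2 can be computed, here is a minimal Python sketch of the overall and fovea-region PSNR; the peak value of 255 and the function names are our assumptions:

```python
import numpy as np

def psnr(ref, img, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and an image."""
    mse = np.mean((ref.astype(float) - img.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def fovea_psnr(ref, img, xf, yf, wf, hf):
    """PSNR restricted to the fovea region (the detected bounding box)."""
    x0, x1 = int(xf - wf / 2), int(xf + wf / 2)
    y0, y1 = int(yf - hf / 2), int(yf + hf / 2)
    return psnr(ref[y0:y1, x0:x1], img[y0:y1, x0:x1])
```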
3.2. Experimental results

To validate the simulation results, experiments are implemented using the setup shown in Fig. 1. We use a CW laser source (wavelength 532 nm, 300 mW) to illuminate the scene and a DMD (Texas Instruments DLP4500) to modulate the light field. The DMD has a resolution of 912 × 1140 with a micro-mirror size of 7.6 μm × 7.6 μm. The total light intensity is detected by a photomultiplier (Thorlabs PMM02-1). The experimental scene consists of an aeroplane model and a car model in front of a black background on the optical table, as shown in Fig. 5(a).

We start the URGI reconstruction with 128 × 128 Hadamard speckle patterns, where each pixel consists of 2 × 2 micro-mirrors on the DMD. The URGI result reconstructed by 2000 measurements is shown in Fig. 5(b); the projection time is 0.0182 s and the reconstruction time is 1.3750 s. Then, we start the DPFGI reconstruction with 64 × 64 Hadamard speckle patterns, where each pixel consists of 4 × 4 micro-mirrors on the DMD. The undersampled image reconstructed by 1000 measurements is shown in Fig. 5(c); the projection time is 0.0091 s and the reconstruction time is 0.4531 s. The detection result on the undersampled image by the trained GAN-SSD model is shown in Fig. 5(d); the detection time is 0.0338 s. According to the two detected objects, the computer generates foveated speckle patterns on the fovea of the detected 'aeroplane' and 'car' respectively. Finally, the images reconstructed on the fovea of the 'aeroplane' and the 'car' by 1000 measurements are shown in Fig. 5(e) and (f) respectively; the projection time is 0.0091 s and the reconstruction times are 0.7969 s and 0.7656 s respectively.
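To map a low-resolution Hadamard pattern onto the DMD, each pattern pixel is replicated over a block of micro-mirrors (2 × 2 for the 128 × 128 patterns, 4 × 4 for the 64 × 64 patterns). The sketch below illustrates this binning; the separable outer-product construction of a 2D Hadamard pattern is one common choice and is our assumption, since the paper does not specify its pattern ordering:

```python
import numpy as np
from scipy.linalg import hadamard

def dmd_frame(pattern, bin_size):
    """Replicate each pattern pixel over a bin_size x bin_size block of
    DMD micro-mirrors (2x2 for 128x128 patterns, 4x4 for 64x64 patterns)."""
    return np.kron(pattern, np.ones((bin_size, bin_size), dtype=int))

# A separable 64 x 64 Hadamard speckle pattern (+1/-1) built from two rows
# of the 64 x 64 Hadamard matrix; one such pattern per measurement.
H = hadamard(64)
pattern = np.outer(H[3], H[5])     # one 64 x 64 pattern
frame = dmd_frame(pattern, 4)      # 256 x 256 micro-mirror frame
```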
It can be seen from Fig. 5 that both URGI and DPFGI can reconstruct the scene, but the DPFGI results show better image quality in the corresponding fovea region. The GAN-SSD model performs object detection on the undersampled image and achieves a largely correct distinction and positioning of the two categories of objects. The total computation time of reconstruction and detection in DPFGI is shorter than that in URGI.
Fig. 5. Experimental results of URGI and DPFGI. (a) The original scene containing the aeroplane and car. (b) The URGI result with 128 × 128 speckle patterns by 2000 measurements. (c) The undersampled image with 64 × 64 speckle patterns by 1000 measurements. (d) The detection result on the undersampled image. (e) The DPFGI result taking the detected 'aeroplane' as the fovea point, by 1000 measurements. (f) The DPFGI result taking the detected 'car' as the fovea point, by 1000 measurements.
However, the process of generating foveated speckle patterns may cost extra time, which can be addressed by generating several series of foveated speckle patterns of different positions and sizes in advance and selecting the closest match during imaging. Thus, DPFGI automatically obtains non-uniform resolution images foveated on objects of different categories, reduces the total number of pixels, increases the effective sampling rate, and ultimately improves the image quality in the fovea region.
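One way such a precomputed pattern bank could be selected at run time is sketched below; the helper and the bank layout are hypothetical illustrations of the strategy described above:

```python
import numpy as np

def nearest_bank_entry(bank, xf, yf, wf, hf):
    """Return the precomputed foveated-pattern series whose fovea center and
    size are closest to the detected bounding box. `bank` maps integer
    (xf, yf, wf, hf) tuples to pattern stacks built offline, e.g. on a
    coarse grid of fovea centers and a few bounding-box sizes."""
    keys = np.array(list(bank.keys()), dtype=float)
    query = np.array([xf, yf, wf, hf], dtype=float)
    idx = int(np.argmin(np.linalg.norm(keys - query, axis=1)))
    return bank[tuple(int(v) for v in keys[idx])]
```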
4. Conclusions

In summary, we have proposed and validated foveated ghost imaging based on deep learning, which utilizes GAN-based ghost imaging object detection results to generate foveated speckle patterns. Different from projecting partial speckle patterns on the scene, DPFGI improves the image quality in the fovea region given by the object detection results while keeping the field of view unchanged, which opens the possibility of high-speed tracking ghost imaging in a more intelligent manner. Considering that the deep learning object detection system has limitations with multi-category objects and dramatic scene changes, we can enhance our system by establishing a dataset of objects of a certain category with variable occlusions and deformations to train a more adaptive model. For the uncertain number of measurements needed to reconstruct an undersampled image for detection, we can further adopt adaptive sampling to obtain detection results with as few measurements as possible. In addition, it is necessary to speed up speckle pattern generation and image reconstruction using GPU acceleration and parallel computing.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (NSFC) (61271376) and the Anhui Provincial Natural Science Foundation, China (1208085MF114).

References

[1] T. Pittman, Y. Shih, D. Strekalov, A. Sergienko, Optical imaging by means of two-photon quantum entanglement, Phys. Rev. A 52 (1995) R3429.
[2] R.S. Bennink, S.J. Bentley, R.W. Boyd, Two-photon coincidence imaging with a classical source, Phys. Rev. Lett. 89 (2002) 113601.
[3] J.H. Shapiro, Computational ghost imaging, Phys. Rev. A 78 (2008) 061802.
[4] O. Katz, Y. Bromberg, Y. Silberberg, Compressive ghost imaging, Appl. Phys. Lett. 95 (2009) 131110.
[5] B.I. Erkmen, Computational ghost imaging for remote sensing, J. Opt. Soc. Amer. A 29 (2012) 782–789.
[6] D. Shrekenhamer, C.M. Watts, W.J. Padilla, Terahertz single pixel imaging with an optically controlled dynamic spatial light modulator, Opt. Express 21 (2013) 12507–12518.
[7] W. Gong, C. Zhao, H. Yu, M. Chen, W. Xu, S. Han, Three-dimensional ghost imaging lidar via sparsity constraint, Sci. Rep. 6 (2016) 26133.
[8] M. Tanha, S. Ahmadikandjani, R. Kheradmand, H. Ghanbari, Computational fluorescence ghost imaging, Eur. Phys. J. D 67 (2013) 44.
[9] P. Clemente, V. Durán, E. Tajahuerce, J. Lancis, Optical encryption based on computational ghost imaging, Opt. Lett. 35 (2010) 2391–2393.
[10] F. Ferri, D. Magatti, L. Lugiato, A. Gatti, Differential ghost imaging, Phys. Rev. Lett. 104 (2010) 253603.
[11] B. Sun, S.S. Welsh, M.P. Edgar, J.H. Shapiro, M.J. Padgett, Normalized ghost imaging, Opt. Express 20 (2012) 16892–16901.
[12] M. Amann, M. Bayer, Compressive adaptive computational ghost imaging, Sci. Rep. 3 (2013) 1545.
[13] W.-K. Yu, M.-F. Li, X.-R. Yao, X.-F. Liu, L.-A. Wu, G.-J. Zhai, Adaptive compressive ghost imaging based on wavelet trees and sparse representation, Opt. Express 22 (2014) 7133–7144.
[14] M.-J. Sun, M.P. Edgar, D.B. Phillips, G.M. Gibson, M.J. Padgett, Improving the signal-to-noise ratio of single-pixel imaging using digital microscanning, Opt. Express 24 (2016) 10476–10485.
[15] Z.-H. Xu, W. Chen, J. Penuelas, M. Padgett, M.-J. Sun, 1000 fps computational ghost imaging using LED-based structured illumination, Opt. Express 26 (2018) 2427–2434.
[16] M.-J. Sun, W. Chen, T.-F. Liu, L.-J. Li, Image retrieval in spatial and temporal domains with a quadrant detector, IEEE Photonics J. 9 (2017) 1–6.
[17] M. Chen, E. Li, S. Han, Application of multi-correlation-scale measurement matrices in ghost imaging via sparsity constraints, Appl. Opt. 53 (2014) 2924–2928.
[18] B. O'Brien, Vision and resolution in the central retina, J. Opt. Soc. Amer. 41 (1951) 882–894.
[19] W. Becker, A. Fuchs, Further properties of the human saccadic system: eye movements and correction saccades with and without visual fixation points, Vis. Res. 9 (1969) 1247–1258.
[20] W.S. Geisler, J.S. Perry, Real-time foveated multiresolution system for low-bandwidth video communication, in: Human Vision and Electronic Imaging III, Int. Soc. Opt. Photonics, 1998, pp. 294–306.
[21] R. Larcom, T.R. Coffman, Foveated image formation through compressive sensing, in: 2010 IEEE Southwest Symposium on Image Analysis & Interpretation (SSIAI), IEEE, 2010, pp. 145–148.
[22] I.B. Ciocoiu, Foveated compressed sensing, in: 2011 20th European Conference on Circuit Theory and Design (ECCTD), IEEE, 2011, pp. 29–32.
[23] G. Carles, S. Chen, N. Bustin, J. Downing, D. McCall, A. Wood, A.R. Harvey, Multi-aperture foveated imaging, Opt. Lett. 41 (2016) 1869–1872.
[24] D.B. Phillips, M.-J. Sun, J.M. Taylor, M.P. Edgar, S.M. Barnett, G.M. Gibson, M.J. Padgett, Adaptive foveated single-pixel imaging with dynamic supersampling, Sci. Adv. 3 (2017) e1601782.
[25] Y. Kang, Y.-P. Yao, Z.-H. Kang, L. Ma, T.-Y. Zhang, Performance analysis of compressive ghost imaging based on different signal reconstruction techniques, J. Opt. Soc. Amer. A 32 (2015) 1063–1067.
[26] L. Bian, J. Suo, Q. Dai, F. Chen, Experimental comparison of single-pixel imaging algorithms, J. Opt. Soc. Amer. A 35 (2018) 78–87.
[27] C. Li, An Efficient Algorithm for Total Variation Regularization with Applications to the Single Pixel Camera and Compressive Sensing, Rice University, 2010.
[28] M. Harwit, Hadamard Transform Optics, Elsevier, 2012.
[29] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, Adv. Neural Inf. Process. Syst. (2014) 2672–2680.
[30] X. Zhai, Z. Cheng, Y. Wei, Z. Liang, Y. Chen, Compressive sensing ghost imaging object detection using generative adversarial networks, Opt. Eng. 58 (2019) 013108.
[31] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, Adv. Neural Inf. Process. Syst. (2015) 91–99.
[32] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A.C. Berg, SSD: Single shot multibox detector, in: European Conference on Computer Vision, Springer, 2016, pp. 21–37.
[33] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[34] J. Redmon, A. Farhadi, YOLOv3: An incremental improvement, arXiv preprint arXiv:1804.02767, 2018.
[35] J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama, Speed/accuracy trade-offs for modern convolutional object detectors, in: IEEE CVPR, 2017.
[36] M. Everingham, L. Van Gool, C.K. Williams, J. Winn, A. Zisserman, The PASCAL visual object classes (VOC) challenge, Int. J. Comput. Vis. 88 (2010) 303–338.