Reconstruction for limited-data nonlinear tomographic absorption spectroscopy via deep learning


Accepted Manuscript

Reconstruction for limited-data nonlinear tomographic absorption spectroscopy via deep learning
Jianqing Huang, Hecong Liu, Jinghang Dai, Weiwei Cai

PII: S0022-4073(18)30280-2
DOI: 10.1016/j.jqsrt.2018.07.011
Reference: JQSRT 6160

To appear in: Journal of Quantitative Spectroscopy & Radiative Transfer

Received date: 23 April 2018
Revised date: 26 June 2018
Accepted date: 15 July 2018

Please cite this article as: Jianqing Huang , Hecong Liu , Jinghang Dai , Weiwei Cai , Reconstruction for limited-data nonlinear tomographic absorption spectroscopy via deep learning, Journal of Quantitative Spectroscopy & Radiative Transfer (2018), doi: 10.1016/j.jqsrt.2018.07.011

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Highlights

- A new inversion method based on convolutional neural networks for nonlinear tomographic absorption spectroscopy is demonstrated.
- The effects of network parameters on the performance of the neural networks are investigated.
- CNN is compared against SA and shows better noise immunity and higher computational efficiency.


Reconstruction for limited-data nonlinear tomographic absorption spectroscopy via deep learning


Jianqing Huang, Hecong Liu, Jinghang Dai, and Weiwei Cai*

Key Lab of Education Ministry for Power Machinery and Engineering, School of Mechanical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China

*Corresponding author: [email protected]
Phone: +86-021-3420-6540
Fax: +86-021-3420-6540


Abstract

Nonlinear tomographic absorption spectroscopy (NTAS) is an emerging gas sensing technique for reactive flows that has been proven capable of simultaneously imaging the temperature and concentration of an absorbing gas. However, the nonlinear tomographic problems are typically solved with an optimization algorithm such as simulated annealing, which suffers from high computational cost. This problem becomes more severe when thousands of tomographic datasets need to be processed to temporally resolve turbulent flames. To overcome this limitation, in this work we propose a reconstruction method based on convolutional neural networks (CNN) which takes full advantage of the large amount of tomographic data to build an efficient neural network that rapidly predicts the reconstruction when fed with the sinograms. Simulative studies were performed to investigate how the parameters affect the performance of the neural networks. The results show that CNN can effectively reduce the computational cost and at the same time achieve an accuracy level similar to that of SA. The successful demonstration of CNN in this work indicates possible applications of other sophisticated deep neural networks, such as deep belief networks (DBN) and generative adversarial networks (GAN), to nonlinear tomography.

© 2018 Elsevier Ltd.

Keywords: Absorption spectroscopy, nonlinear tomography, convolutional neural networks, combustion diagnostics


1. Introduction

Tomographic absorption spectroscopy (TAS) is a versatile imaging technique that has found extensive applications in flow and combustion diagnostics due to its species-specificity, non-intrusiveness, and fast response [1]. According to the mathematical principles it relies on, TAS can be divided into linear and nonlinear modalities [2]. The linear modality is based on the concept of classical tomography, in which the projections are the line-of-sight (LOS) integrals of the field of absorption coefficient to be reconstructed. For linear TAS, there are numerous well-established inversion algorithms available, such as the algebraic reconstruction technique [3, 4], the Landweber algorithm [5, 6], and maximum likelihood expectation maximization [7], to name a few. Detailed systematic comparisons between these algorithms can be found in [3, 8]. Linear TAS is typically implemented with inexpensive diode lasers [9, 10]. However, due to their narrow spectral bandwidth, only one or two non-interfering absorption transitions can be covered by a diode laser. In order to obtain sufficient information about the flow field, a large number of LOS measurements is usually needed for satisfactory accuracy and spatial resolution [5]. However, for some applications, e.g. engine measurements, optical access is extremely restricted and only a limited number of line-of-sight measurements is available. To alleviate the ill-posedness, the arrangement of the probing laser beams should be optimized so that the spatial sampling can be exploited to the maximum extent [11-13].

The second approach to mitigating the ill-posedness of linear TAS is to enhance the sampling in the spectral domain through broadband absorption spectroscopy using light sources such as supercontinuum [14] and Fourier domain mode locking lasers [15]. However, due to its inherent limitation, linear tomography cannot handle multispectral information simultaneously in a single reconstruction [2]. Nonlinear tomography can effectively overcome this limitation by formulating the inversion problem as a nonlinear equation system whose variables are the temperature and species concentration distributions. The nonlinear equation system can then be solved with a global optimizer such as simulated annealing (SA) [16-18]. However, when a temporal resolution of tens of kHz is required, it can take thousands of hours to finish the reconstructions even for the data measured within one second. Thus, an efficient method is urgently needed to process the large amount of data more rapidly.

Even though the large set of tomographic data poses a challenge for rapid processing, the information provided by the data can be exploited to accelerate the reconstructions [19, 20]. For example, proper orthogonal decomposition (POD) can be applied to extract the most salient eigenmodes of training samples that are either measured beforehand or predicted from numerical models [21]. The solution of the nonlinear equation system can then be fitted as a combination of the eigenmodes. As the number of eigenmodes needed is much smaller than the original number of variables, the dimension of the inversion problem can be significantly reduced. It has been demonstrated that the computational time for nonlinear TAS can be reduced by roughly five times using the POD method [22].

Here, we propose an alternative approach based on deep neural networks (DNN) [23, 24], which can also take advantage of the existing tomographic data and the corresponding reconstructions. The convolutional neural network (CNN) [25] is one of the most prevalent examples of DNN and has found extensive applications. For example, it has been utilized to reduce the artefacts of limited-angle X-ray CT images [26], to enhance the image quality of low-dose X-ray CT reconstructions [27], and to produce reconstructions for ultrasonic tomography [28], to list a few. Excellent reviews of CNN can be found in [25, 29]. However, to the best of the authors' knowledge, CNN has so far only been applied to modalities of linear tomography [26-28]. In this work, we aim to demonstrate its capability in reducing the computational time in the context of nonlinear TAS. The remainder of this work is organized as follows: Section 2 introduces the principles of both NTAS and CNN; Sections 3 and 4 present the numerical studies, including the parameter tuning of CNN and its comparison with SA; and the final section concludes this work and proposes future research directions.

2. Mathematical background

2.1 Nonlinear tomographic absorption spectroscopy

The mathematical formulation of nonlinear tomographic absorption spectroscopy has been introduced in detail in the literature [14, 30]. Here, we briefly introduce its theory for the reader's convenience. As shown in Figure 1, the tomographic field, which comprises the distributions of temperature (T) and species concentration (C), is discretized into square pixels. Several parallel probing beams, each specified by an angle α, a distance d and a wavelength λ, travel through this field to measure the absorption of the gas. Beer's law describes the relationship between T, C, pressure P and absorbance p [2]:


p( )   ln[ I t ( ) / I 0 ( )]

where

and

   [T (l ), C (l ),  , l ] dl

(1)

l

   S[T (l ), g ] [T (l ), C (l ), (g  0 )] P C (l ) dl , l

g

are the incident and transmitted light intensities of the beam, respectively; is

the integration path along the line-of-sight (LOS);

the absorption coefficient;

the

normalized line-shape function, which is a function of T, C, and P; and S[T(l),λg] is the line strength of the g-th non-negligible transition centered at λg that contributed to the absorbance at

Page 6 of 32

ACCEPTED MANUSCRIPT

the wavelength λ. For simplicity, P is assumed to be constant, while T and C are variable from pixel to pixel. For practical applications, Eq. (1) is typically discretized as [2]: n2

pi ( )    k (Tk , Ck ,  ) Lik ,

(2)

CR IP T

k 1

where n2 is the total number of pixels within the discretized field; k represents the k-th pixel; and is the absorption path length of the i-th beam within the k-th pixel. In order to reconstruct T and C distributions, a series of p should be obtained by repeating Eq. (2) for the LOS

 p1,1    p1, j 

pi ,1    , pi , j  1

 p1,1    p1, j 

AN US

measurements at a number of λ’s, and can be organized as a sequence of sinograms as:

pi ,1    pi , j  2

 p1,1    p1, j 

pi ,1    , pi , j  NW

(3)

M

where i is the number of parallel beams , j the number of angels, and NW the number of

ED

wavelengths. These spectrally-resolved sinograms obtained at different wavelengths imply enriched information of the tomographic field and can be inverted to recover the distributions of

PT

T and C with a well-established optimization method, e.g. simulated annealing algorithm (SA) [31]. However, the SA algorithm typically suffers from high computational cost, which hinders

CE

its applications to scenarios where rapid reconstructions are required. Thus, in this paper, we aim

AC

to propose a more efficient algorithm based on deep neural networks.
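To make the forward model of Eqs. (1)-(3) concrete, the following minimal numpy sketch assembles sinograms from discretized T and C fields via Eq. (2). The absorption-coefficient function is a hypothetical placeholder (a real implementation would evaluate line strengths and lineshapes from a spectroscopic database), and the names `forward_project` and `absorption_coefficient` are illustrative, not from the paper.

```python
import numpy as np

def absorption_coefficient(T, C, wavelength_index):
    """Hypothetical stand-in for the spectroscopic model alpha_k(T_k, C_k, lambda).
    A smooth placeholder is used here purely for illustration."""
    return C * np.exp(-(wavelength_index + 1) * 1000.0 / T)

def forward_project(T, C, L, n_wavelengths):
    """Discretized Beer's law, Eq. (2): p_i(lambda) = sum_k alpha_k * L_ik.

    T, C : (n, n) temperature and concentration fields
    L    : (n_beams, n*n) path-length matrix, L[i, k] = path of beam i in pixel k
    Returns sinograms of shape (n_wavelengths, n_beams), one row per transition.
    """
    sinograms = np.empty((n_wavelengths, L.shape[0]))
    for w in range(n_wavelengths):
        alpha = absorption_coefficient(T, C, w).ravel()  # alpha_k for every pixel
        sinograms[w] = L @ alpha                         # LOS sums, one per beam
    return sinograms
```

Stacking the rows per projection angle then yields the (beams × angles) sinograms of Eq. (3), one per wavelength.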

2.2 Convolutional neural networks

Convolutional neural networks (CNN) are among the most popular deep neural networks, featuring local receptive fields, shared weights and pooling [32]. Unlike fully connected neural networks, the convolution and pooling operations of a CNN can significantly reduce the number of network parameters while maintaining the feature-extraction ability of the network. As depicted in Figure 2 [33], a CNN generally includes an input layer, multiple hidden layers, a fully connected layer and an output layer. The hidden layers are composed of multiple convolutional and pooling layers. The training stage of a CNN consists of both feed-forward and error back-propagation processes.

Feed-forward process:

a) First, the input images, i.e. the sinograms, are processed with multiple convolution kernels to extract the most salient features. The results of the convolution operation, together with the biases, are then fed into the activation function, and a series of feature maps is finally obtained. The leaky ReLU activation function [34] is adopted in this work; the operations are defined as:

C = ReLU(W ∗ a + b),   (4)

ReLU(x) = { x, if x ≥ 0;  0.1x, if x < 0 },   (5)

where W is the weight matrix of each convolution kernel, a is the image to be convolved, and (∗) denotes the convolution operation; b is the bias matrix and C the feature maps to be transferred to the next layer.

b) A pooling method (e.g. average pooling, maximum pooling, etc.) with a suitably sized pooling filter is selected to reduce the dimensions of the feature maps.

c) The convolution and pooling operations are repeated until the original input sinograms are reduced to a one-dimensional feature column vector.

d) The feature column vector is multiplied by the coefficient matrix and added with the biases to produce the results, i.e. the predictions.


Error back-propagation process:

a) The predicted results are compared with the true values to calculate the error for each pixel, which is then propagated back from layer to layer.

b) A reasonable learning rate η_L is set to update the weights and the biases of each layer; the stochastic gradient descent method [35] is generally adopted for this update.

c) A loss function is defined; e.g. the L2 loss function is given by [36]:

L2 = (1/B) Σ_{b=1}^{B} Σ_{k=1}^{n²} (y*_k − y_k)²,   (6)

where B is the total number of T distributions in each batch, and y*_k and y_k are the true and predicted values of the k-th pixel, respectively. Batch training is repeated until the loss function converges. Once the training is completed, the CNN structure parameters are determined and ready to be used for prediction.

The trained CNN can be considered as a black box: by feeding a set of sinograms to it, the predictions can be generated quickly and with satisfactory accuracy.
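The feed-forward building blocks of Eqs. (4)-(6) can be sketched in plain numpy as follows. This is an illustrative single-channel implementation (using framework-style cross-correlation without kernel flipping), not the authors' code.

```python
import numpy as np

def leaky_relu(x, slope=0.1):
    """Leaky ReLU of Eq. (5): x for x >= 0, 0.1*x otherwise."""
    return np.where(x >= 0, x, slope * x)

def conv2d_valid(image, kernel, bias=0.0):
    """Single-channel 'narrow' (valid) convolution followed by the activation,
    i.e. Eq. (4): C = ReLU(W * a + b). Pure-numpy sketch, not optimized."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r+kh, c:c+kw] * kernel) + bias
    return leaky_relu(out)

def l2_loss(y_true, y_pred):
    """Batch L2 loss of Eq. (6): squared error summed over pixels,
    averaged over the batch dimension (first axis)."""
    return np.sum((y_true - y_pred) ** 2) / y_true.shape[0]
```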


3. Settings for simulative studies

Simulative studies have been conducted to verify the feasibility of CNN for NTAS problems. First of all, a total of 15,050 samples were artificially created: 15,000 for the training and 50 for the testing processes. Each sample contains a T phantom, a C phantom, and three sinograms. Water vapor was assumed to be the target species, probed with a broadband laser source which can cover three selected absorption transitions. The T phantoms feature three randomly distributed Gaussian peaks on top of a flat plane, and the C phantoms are characterized by two randomly distributed Gaussian peaks overlaid on a paraboloid. These phantoms were generated to mimic the multimodal flames encountered in practical combustion devices [2]. The phantoms were discretized into (40 × 40) square pixels, and one example of the T and C phantoms is shown in each of the two panels in Figure 3. Six projections, each with 40 parallel laser beams, were assumed, resulting in a total of 240 LOS measurements. Three sinograms, each with a dimensionality of (40 × 6), can then be obtained. These basic settings are the same for all the simulative cases discussed below unless stated otherwise.
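As an illustration, phantoms of the kind described above might be generated with the following numpy sketch. The base temperature, peak amplitudes, widths and value ranges are our own illustrative assumptions, not the values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_peak(X, Y, x0, y0, amplitude, width):
    """Isotropic 2-D Gaussian bump centered at (x0, y0)."""
    return amplitude * np.exp(-((X - x0) ** 2 + (Y - y0) ** 2) / (2 * width ** 2))

def make_T_phantom(n=40, base=800.0):
    """T phantom: three randomly placed Gaussian peaks on a flat plane.
    Base temperature, amplitudes and widths are illustrative choices."""
    X, Y = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
    T = np.full((n, n), base)
    for _ in range(3):
        T += gaussian_peak(X, Y, rng.uniform(0.2, 0.8), rng.uniform(0.2, 0.8),
                           rng.uniform(300, 800), rng.uniform(0.05, 0.15))
    return T

def make_C_phantom(n=40):
    """C phantom: two randomly placed Gaussian peaks overlaid on a paraboloid."""
    X, Y = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n))
    C = 0.05 * (1.0 - 0.5 * (X ** 2 + Y ** 2))  # paraboloid background
    for _ in range(2):
        C += gaussian_peak(X, Y, rng.uniform(-0.6, 0.6), rng.uniform(-0.6, 0.6),
                           rng.uniform(0.02, 0.08), rng.uniform(0.1, 0.3))
    return C
```

Pairing each phantom with sinograms computed by a forward model then yields one training sample.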

It has been shown in previous work that the reconstruction of the T distribution can be carried out independently from the reconstruction of the C distribution using the SA algorithm [37]. Similarly, the T and C distributions can also be reconstructed separately using the CNN method. Since the procedures to reconstruct the T and C distributions are similar with CNN, in this work we only illustrate how a CNN structure should be designed for T reconstruction.

According to the dimensions of the sinograms and the magnitude of the T distribution, the designed CNN architecture is shown in Figure 4. The hidden layers contain two convolutional layers and one pooling layer. We set up 8 convolution kernels in the first convolutional layer and 14 convolution kernels in the second one, i.e. the number of convolution kernels is 8+14, for narrow convolution [38] with a stride size of 1. The average pooling operation was then performed with suitably sized filters, after which a feature vector was obtained. After full connections, the feature vector was converted into a column of the T distribution, which can then be easily reformed into a (40 × 40) T distribution as the expected output. In particular, the CNN structure designed in this work has the following characteristics:

a) Random batch processing. Before each epoch of the training process, the 15,000 training samples were randomly divided into 300 batches, i.e. each batch contained 50 samples, and the sequence in which these 300 batches were used to update the CNN was randomly determined. We call each such update an iteration; hence each epoch of training comprised 300 iterations, and 100 epochs comprised 30,000 iterations in total. This operation ensures that the composition of each batch differs between epochs, which improves not only the convergence rate but also the prediction accuracy.


b) Adaptive learning rate. Since a smaller learning rate is more suitable for network optimization at the later stage of the training process, a fractional decay function was adopted to train the CNN in this work, whose mathematical expression is [39]:

η_{L,t} = η_{L,0} / (1 + q·t),  with q = 0.0003,   (7)

where q is the attenuation coefficient, and η_{L,t} and η_{L,0} are the learning rates in the t-th and 0-th epochs, respectively.

c) Stochastic gradient descent with momentum [40]. This method can restrain oscillation, facilitate convergence in the later periods of the training process, and help the network escape from a local minimum when its parameters oscillate back and forth near one. Mathematically, the method can be expressed as [40]:

v_t = μ·v_{t−1} − η_L·g,   (8)

ω_t = ω_{t−1} + v_t,   (9)

where μ is the static momentum coefficient, set to 0.9; g is the first-order gradient; v_t and v_{t−1} are the momentum values in the t-th and (t−1)-th epochs, respectively; and ω denotes the parameters to be updated.

Finally, in order to quantify the accuracy of the CNN's predictions, we define the average error between the reconstruction and the ground-truth distribution as [30]:

e = [ Σ_{k=1}^{40²} |y*_k − y_k| ] / [ Σ_{k=1}^{40²} y*_k ].   (10)
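The update rules of Eqs. (7)-(9) and the error metric of Eq. (10) can be sketched as follows; this is an illustrative numpy implementation under the stated settings (q = 0.0003, μ = 0.9), not the authors' code.

```python
import numpy as np

def decayed_learning_rate(lr0, epoch, q=0.0003):
    """Fractional learning-rate decay of Eq. (7): lr_t = lr_0 / (1 + q*t)."""
    return lr0 / (1.0 + q * epoch)

def momentum_step(params, velocity, grad, lr, mu=0.9):
    """Momentum SGD update of Eqs. (8)-(9):
    v_t = mu * v_{t-1} - lr * g;  w_t = w_{t-1} + v_t."""
    velocity = mu * velocity - lr * grad
    return params + velocity, velocity

def average_error(y_true, y_pred):
    """Average reconstruction error of Eq. (10):
    sum of absolute pixel errors normalized by the sum of true values."""
    return np.sum(np.abs(y_true - y_pred)) / np.sum(y_true)
```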


4. Results and discussion

The accuracy of the CNN prediction is largely dependent on parameters such as the learning rate, the number of convolution kernels, the number of training samples NS, and the number of wavelengths NW. Hence, the major focus of this section is to investigate how these parameters affect the performance of the CNN and how they should be determined. The optimized CNN structure was then compared with the simulated annealing algorithm (SA) [41] for the inversion of NTAS problems.

4.1 Determination of learning rate and number of convolution kernels

Setting a proper learning rate is a critical step in the CNN training procedure. An ideal learning rate will promote convergence; an improper one, on the other hand, will cause the training to fail due to either the vanishing or the exploding gradient problem [42]. There is no mathematically rigorous method to determine the optimal learning rate, but a proper one can be chosen according to the evolution of the loss function, which is the most commonly adopted approach for training CNNs. Keeping the other three parameters constant, five different learning rates were used to train the CNN, and the evolution of the respective loss functions is shown in Figure 5. As can be seen from the figure, when the learning rate was too small, the loss function converged slowly and to a local minimum. Increasing the learning rate could speed up the convergence and improve the predictions, but when the learning rate was too large, the loss function diverged and the training process failed. Therefore, a moderate learning rate was suggested for the NTAS inversion problems studied here.

The number of convolution kernels is one of the major factors that dictate the complexity of a CNN, and an appropriate configuration of convolution kernels facilitates quick extraction of the characteristics of the training set and improves the learning efficiency. Here, simulations with four configurations of convolution kernels were performed to study the convergent behavior of the loss function. As shown in Figure 6, the CNN with 8+14 convolution kernels performed best during the training process, with the fastest convergence rate and the minimum loss function. On the one hand, a CNN with too few kernels is insufficient to extract useful features from the training set; on the other hand, an over-estimated number of convolution kernels not only complicates the training process but also significantly increases the training time. For example, the training time for the case with the most convolution kernels was about 12 times that of the case with the fewest. Consequently, considering both the training time and the convergence of the loss function, the number of convolution kernels in this work was suggested to be 8+14.

4.2 Effects of the number of training samples and wavelengths used

Machine learning algorithms such as CNN are data-driven, and a sufficient number of samples should be used to extract the useful features during the learning process [43]. To explore the influence of the number of training samples on the prediction accuracy of the T distributions, five groups of training sets with different sample sizes were generated in the same way as introduced in Section 3, with the other parameters fixed as constants. The reconstruction error and the computational time as a function of the number of training samples are shown in Panels (a) and (b) of Figure 7, respectively. As can be seen, the reconstruction error decreased as more training samples were used; however, beyond a certain sample size the error was reduced at a slower pace, indicating that NS was no longer the major factor affecting the reconstruction accuracy. In addition, as the training time scales linearly with NS, it is not sensible to train with an excessively large number of samples. Thus, in this work 25,000 samples were adopted for the CNN training procedure discussed in the sections below.

number of samples. Thus, in this work 25000 samples were adopted for the CNN training procedure discussed in the sections below.

The successful implementation of NTAS relies on multispectral measurements. For CNN,

AN US

it is easy to accommodate multi-channel inputs, which are the sinograms for a few transitions. The testing results were shown in Figure 8. As can be seen, reconstruction error decreased gradually as

increased. The reason is that the more wavelengths used, the richer spectral

information can be incorporated into the CNN training process so that CNN can connect more

M

features of the sinograms to the T distribution, which is the key to the performance of CNN. It has to be pointed out that, when only one transition was used i.e.

, SA was unable to

ED

reconstruct the T distribution, but CNN can still work with an acceptable reconstruction error

PT

that was slightly larger than 6%. This difference roots in the principles of both methods. The SA algorithm relies on the solution of the set of nonlinear equations which can be satisfied with an

CE

infinite number of solution when only one transition was used. However, the CNN circumvents the solution of the nonlinear equation system by directly mapping the extracted features of the

AC

sinogram(s) to the reconstructions. Such mapping can be very accurate when a sufficiently large number of training samples were used. Furthermore, the influence of

is not the larger the better. On one hand,

became weaker when the prediction accuracy reached its peak as

increases; on the other hand, a too large

will increase computational cost. Thus, the

simulative studies discussed below were all conducted assuming five transitions were used.

Page 14 of 32

ACCEPTED MANUSCRIPT

4.3 Comparison between CNN and SA

Knowing the influence of the learning rate, the number of convolution kernels, the number of training samples and the number of wavelengths on the CNN prediction accuracy, an optimal CNN framework for reconstructing the temperature distribution was designed with the suggested learning rate, 8+14 convolution kernels, 25,000 training samples and five wavelengths. Three representative T phantoms along with the corresponding reconstructions from the CNN are shown in Figure 9. Qualitatively, there was a high degree of similarity between the phantoms and the reconstructions, and the locations and magnitudes of the Gaussian peaks were in good agreement. The results demonstrate that the designed CNN effectively extracted the T distribution modes from the training samples while retaining the smoothness property. We also repeated the same reconstruction procedure using sinograms with reduced dimensions, meaning fewer probing beams were used. The results show that the reconstruction errors increased slightly but were still less than 3%. As can be seen from Figure 9, when the number of Gaussian peaks increased or a peak was near the edge, the reconstruction error was larger. This may be due to the small number of LOS measurements and the inherent lack of spatial resolution.

In order to validate the reconstructions of the CNN, we compared it against the SA algorithm, which is currently the most widely adopted inversion method for NTAS problems [31, 41]. As noise prevails in practical applications, artificial Gaussian noise was added to the LOS measurements, and four sets of testing samples, each with a different noise level, were prepared for the comparative study. As can be seen from Figure 10, without noise the reconstruction errors of the two algorithms were almost the same. As the noise level increased, the reconstruction error of SA increased significantly faster than that of the CNN. In other words, the CNN has better noise immunity than SA, suggesting a good prospect for practical applications.

This can be explained by the fact that random noise does not have any salient features; after tens of thousands of convolution, pooling and activation operations, the noise was filtered out and its influence continuously weakened. In addition, compared with SA, the CNN has an overwhelming advantage in terms of computational efficiency. Both algorithms were implemented on the same computer with an Intel® Core™ i7-DMI2-X79 PHC 2.60 GHz CPU. For the same testing cases, the CNN completed each reconstruction in about 0.7 milliseconds, while SA took about 10 hours. These two critical advantages make the CNN a promising technique for real-time measurements. It has to be noted that although the CNN's training process took about 1 hour, once the network was established it could be used continuously to process the data.

5. Conclusions

In conclusion, this paper proposes a new inversion method for solving nonlinear tomographic problems based on convolutional neural networks (CNN). The simulative studies performed in this work have shown that the temperature distribution can be rapidly reconstructed using an optimized CNN structure when a large set of training samples is available. Compared with the simulated annealing (SA) algorithm, the CNN can achieve comparable accuracy and has better noise immunity. These two desirable features suggest that the CNN is a promising technique for applications where rapid data processing or real-time monitoring is necessary. The successful demonstration of CNN in this work also indicates possible applications of other sophisticated deep neural networks, such as deep belief networks (DBN) [44] and generative adversarial networks (GAN) [45], to nonlinear tomography.


Acknowledgement

This work was funded by the National Natural Science Foundation of China under grant number 51706141 and the Chinese Government 'Thousand Youth Talent Program'.


References


[1] Cai W, Kaminski CF. Tomographic absorption spectroscopy for the study of gas dynamics and reactive flows. Progress in Energy & Combustion Science 2017; 59:1-31.
[2] Dai J, Yu T, Xu L, Cai W. On the regularization for nonlinear tomographic absorption spectroscopy. Journal of Quantitative Spectroscopy and Radiative Transfer 2018; 206:233-241.
[3] Yu T, Cai W. Benchmark evaluation of inversion algorithms for tomographic absorption spectroscopy. Applied Optics 2017; 56:2183.
[4] Wang F, Cen KF, Li N et al. Two-dimensional tomography for gas concentration and temperature distributions based on tunable diode laser absorption spectroscopy. Measurement Science & Technology 2010; 21:045301.
[5] Liu C, Xu L, Chen J et al. Development of a fan-beam TDLAS-based tomographic sensor for rapid imaging of temperature and gas concentration. Optics Express 2015; 23:22494-22511.
[6] Yang WQ, Spink DM, York TA, Mccann H. An image-reconstruction algorithm based on Landweber's iteration method for electrical-capacitance tomography. Measurement Science & Technology 1999; 10:1065.
[7] Busa KM, Mcdaniel JC, Brown MS, Diskin GS. Implementation of maximum-likelihood expectation-maximization algorithm for tomographic reconstruction of TDLAT measurements. Aerospace Sciences Meeting 2014.
[8] Daun KJ. Infrared species limited data tomography through Tikhonov reconstruction. Journal of Quantitative Spectroscopy & Radiative Transfer 2010; 111:105-115.
[9] Stritzke F, Van dKS, Feiling A et al. Ammonia concentration distribution measurements in the exhaust of a heavy duty diesel engine based on limited data absorption tomography. Optics Express 2017; 25:8180.
[10] Zhang L, Wang F, Zhang H et al. Simultaneous measurement of gas distribution in a premixed flame using adaptive algebraic reconstruction technique based on the absorption spectrum. Chinese Optics Letters 2016; 14:66-70.
[11] Mccormick D, Twynstra MG, Daun KJ, Mccann H. Optimising laser absorption tomography beam arrays for imaging chemical species in gas turbine engine exhaust plumes. International Society for Industrial Process Tomography 2013.
[12] Daun KJ, Twynstra MG. Laser-absorption tomography beam arrangement optimization using resolution matrices. Applied Optics 2012; 51:7059-7068.
[13] Yu T, Tian B, Cai W. Development of a beam optimization method for absorption-based tomography. Optics Express 2017; 25:5982-5999.
[14] Cai W, Kaminski CF. A tomographic technique for the simultaneous imaging of temperature, chemical species, and pressure in reactive flows using absorption spectroscopy with frequency-agile lasers. Applied Physics Letters 2014; 104:545-562.


[15] Kranendonk LA, An X, Caswell AW et al. High speed engine gas thermometry by Fourier-domain mode-locked laser absorption spectroscopy. Optics Express 2007; 15:15115-15128.
[16] Dai J, O'Hagan S, Liu H et al. Hyperspectral tomography based on multi-mode absorption spectroscopy (MUMAS). Applied Physics Letters 2017; 111:184102.
[17] Cai W, Ewing DJ, Ma L. Investigation of temperature parallel simulated annealing for optimizing continuous functions with application to hyperspectral tomography. Applied Mathematics & Computation 2011; 217:5754-5767.
[18] Cai W, Kaminski CF. A numerical investigation of high-resolution multispectral absorption tomography for flow thermometry. Applied Physics B 2015; 119:29-35.
[19] Yu T, Liu H, Zhang J et al. Toward real-time volumetric tomography for combustion diagnostics via dimension reduction. Optics Letters 2018; 43:1107.
[20] Yu T, Cai W, Liu Y. Rapid tomographic reconstruction based on machine learning for time-resolved combustion diagnostics. Review of Scientific Instruments 2018; 89:043101.
[21] Torniainen ED, Gouldin FC. Tomographic reconstruction of 2-D absorption coefficient distributions from a limited set of infrared absorption data. Combustion Science & Technology 1998; 131:85-105.
[22] Cai W, Ma L. Hyperspectral tomography based on proper orthogonal decomposition as motivated by imaging diagnostics of unsteady reactive flows. Applied Optics 2010; 49:601-610.
[23] Schmidhuber J. Deep learning in neural networks: An overview. Neural Networks 2015; 61:85-117.
[24] Sozykin AV. An overview of methods for deep learning in neural networks. Vestn YuUrGU Ser Vych Matem Inform 2017; 6:28-59.
[25] Mccann MT, Jin KH, Unser M. Convolutional neural networks for inverse problems in imaging: A review. IEEE Signal Processing Magazine 2017; 34:85-95.
[26] Zhang H, Li L, Qiao K et al. Image prediction for limited-angle tomography via deep learning with convolutional neural network. arXiv preprint arXiv:1607.08707, 2016.
[27] Chen H, Zhang Y, Zhang W et al. Low-dose CT via convolutional neural network. Biomedical Optics Express 2017; 8:679.
[28] Lähivaara T, Kärkkäinen L, Huttunen JMJ, Hesthaven JS. Deep convolutional neural networks for estimating porous material parameters with ultrasound tomography. The Journal of the Acoustical Society of America 2018; 143:1148-1158.
[29] Zhou FY, Jin LP, Dong J. Review of convolutional neural network. Chinese Journal of Computers 2017; 40:1229-1257.
[30] Cai W, Kaminski CF. Multiplexed absorption tomography with calibration-free wavelength modulation spectroscopy. Applied Physics Letters 2014; 104:4788-4797.
[31] Cai W, Ma L. Applications of critical temperature in minimizing functions of continuous variables with simulated annealing algorithm. Computer Physics Communications 2010; 181:1116.
[32] Ahn SungMahn. Deep learning architectures and applications. Journal of Intelligence & Information Systems 2016; 22:127-142.
[33] Ozcan A, Günaydin H, Wang H et al. Deep learning microscopy. Optica 2017; 4:1437-1443.
[34] Xu B, Wang N, Chen T, Li M. Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853, 2015.
[35] Ketkar N. Deep learning with Python. Apress; 2017.
[36] Hennig C, Kutlukaya M. Some thoughts about the design of loss functions. Revstat Statistical Journal 2007; 5:19-39.


Figure Captions

Figure 1. Illustration of the mathematical formulation of nonlinear tomographic absorption spectroscopy (NTAS) and the discretized tomographic field. Parallel probing beams are specified by α, d and .

Figure 2. Illustration of convolutional neural networks (CNN), including (a) the training process and (b) the testing process.

Figure 3. Panel (a): an example temperature phantom with three randomly distributed Gaussian peaks in the plane, used for the simulative studies. Panel (b): an example concentration phantom with two randomly distributed Gaussian peaks overlaid on a paraboloid, used for the simulative studies.

Figure 4. The framework of the CNN for NTAS inversion. FC: fully connected layer.

Figure 5. Evolution of the loss function for each of the four learning rates. A sudden increase in the loss function indicates divergence and failure of the training process.

Figure 6. Evolution of the loss function for five configurations of the number of convolution kernels N_K.

Figure 7. Panel (a): loss function for cases with different numbers of training samples N_S. Panel (b): training time as a function of the number of training samples.

Figure 8. Panel (a): loss function as a function of the number of wavelengths N_W. Panel (b): training time as a function of the number of wavelengths.

Figure 9. Comparisons between three representative phantoms (first row) and the corresponding reconstructions (second row) obtained via CNN.

Figure 10. Reconstruction errors from two sets of cases with different noise levels. The red and blue lines are the results for CNN and SA (simulated annealing), respectively.
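The network framework of Figure 4, convolutional feature extraction followed by a fully connected (FC) mapping onto the discretized field, can be prototyped in a few lines. The sketch below is a minimal NumPy forward pass only; all sizes (a 8 × 10 array of projection data, N_K = 4 kernels of size 3 × 3, a 20 × 20 reconstruction grid) are illustrative assumptions, not the authors' configuration, and training (gradient-based minimization of the loss as in Figures 5-8) is omitted.

```python
import numpy as np

def conv2d_valid(x, kernels):
    # Naive "valid" 2-D convolution (cross-correlation, as in most
    # deep-learning libraries): x has shape (H, W), kernels (K, kh, kw).
    K, kh, kw = kernels.shape
    H, W = x.shape
    out = np.zeros((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[k, i, j] = np.sum(x[i:i + kh, j:j + kw] * kernels[k])
    return out

def relu(x):
    # Rectified linear activation.
    return np.maximum(x, 0.0)

def cnn_forward(measurements, kernels, W_fc, b_fc):
    # Convolutional feature extraction, then an FC regression that maps
    # the flattened feature maps onto the flattened discretized field.
    features = relu(conv2d_valid(measurements, kernels))
    return W_fc @ features.ravel() + b_fc

# Illustrative (assumed) sizes: 8 x 10 projection data, N_K = 4 kernels
# of size 3 x 3, field reconstructed on a 20 x 20 grid (400 unknowns).
rng = np.random.default_rng(0)
data = rng.random((8, 10))
kernels = 0.1 * rng.standard_normal((4, 3, 3))
n_features = 4 * (8 - 3 + 1) * (10 - 3 + 1)   # 4 * 6 * 8 = 192
W_fc = 0.01 * rng.standard_normal((400, n_features))
b_fc = np.zeros(400)
field = cnn_forward(data, kernels, W_fc, b_fc)
print(field.shape)   # (400,), i.e. a flattened 20 x 20 field
```

In practice the weights would be learned by stochastic gradient descent on a mean-square-error loss over simulated phantom/measurement pairs, after which a single forward pass of this kind replaces the iterative inversion.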
