Computers in Biology and Medicine 42 (2012) 714–723
Three penalized EM-type algorithms for PET image reconstruction

Yueyang Teng a,b,*, Tie Zhang a

a School of Sciences, Northeastern University, 110004 Shenyang, China
b Neusoft Positron Medical Systems Co., Ltd., 110179 Shenyang, China
Article history: Received 9 January 2011; accepted 18 April 2012

Abstract
Based on Bayesian theory, Green introduced the maximum a posteriori (MAP) algorithm to obtain smooth reconstructions for positron emission tomography. This algorithm is flexible and convenient for most penalties, but its convergence is hard to guarantee. With a similar goal, Fessler penalized a weighted least squares (WLS) estimator with a quadratic penalty and solved it with the successive over-relaxation (SOR) algorithm; however, that algorithm is time-consuming and difficult to parallelize. Anderson proposed another WLS estimator for faster convergence, for which few regularization methods have been studied. For the three regularized estimators above, we develop three new expectation maximization (EM) type algorithms to solve them. Unlike MAP and SOR, the proposed algorithms yield update rules by minimizing auxiliary functions constructed at the previous iterations, which ensures that the cost functions decrease monotonically. Experimental results demonstrate the robustness and effectiveness of the proposed algorithms. © 2012 Elsevier Ltd. All rights reserved.
Keywords: Auxiliary function; Noise propagation; Regularization method; Smoothness penalty; Penalty parameter
1. Introduction

Positron emission tomography (PET), an important imaging tool in modern medicine, provides noninvasive quantification of biochemical and biological activity inside a living subject. Many reconstruction methods, both analytic and iterative, have been developed and applied in clinical practice. Iterative methods have attracted more attention because they can generally model the imaging physics and suppress noise artifacts better than analytic methods. To date, iterative algorithms have mainly been based on either the maximum likelihood (ML) estimate or the weighted least squares (WLS) model. When the data are very noisy or a smoother reconstruction is required, unregularized iterative methods remain inadequate, so regularization techniques have been developed to control noise propagation and to smooth homogeneous areas.

The ML-expectation maximization (MLEM) algorithm [1] has been widely used in PET reconstruction; it takes into account that the counts of each projection follow a Poisson distribution. Every iteration of the EM algorithm consists of two steps: the E-step, in which the algorithm computes a conditional expectation, and the M-step, in which it maximizes that conditional expectation. It is well known that ML estimation is equivalent to minimizing the Kullback–Leibler (KL) divergence between measured
* Corresponding author at: School of Sciences, Northeastern University, 110004 Shenyang, China. Tel.: +86 2483661976; fax: +86 2483661796. E-mail address: [email protected] (Y. Teng).

0010-4825/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.compbiomed.2012.04.004
projections and estimated projections. The E-step can then be seen as constructing an auxiliary function at the previous iteration, and the M-step as minimizing this auxiliary function [2,3].

To improve image quality in the presence of heavy noise, Green [4] developed a Bayesian reconstruction that controls noise behavior through the maximum a posteriori (MAP) algorithm, introducing a penalty term to reflect prior information about the reconstruction. The one-step-late (OSL) formulation was used to yield the required solution, which is extremely convenient and flexible in combining many penalties. However, the penalty parameter must be selected carefully to obtain good convergence properties.

Aiming at a smooth reconstruction, and exploiting the fact that a projection approximates its estimate, Fessler [5] introduced a penalized WLS (PWLS) model. That work restricts the penalty term to a quadratic function, so the model becomes a quadratic program that can be solved with the successive over-relaxation (SOR) algorithm. MAP and SOR differ in that SOR updates the pixels sequentially (coordinate descent), whereas MAP updates all pixels simultaneously. In general, sequential algorithms are difficult to parallelize; in addition, as pointed out in [6], this comes at the cost of running time. Anderson et al. [7] proposed another WLS model to obtain a rapid algorithm with higher contrast, but few regularization methods have been studied for it.

In this paper we develop three EM-type algorithms to solve the Bayesian estimate and the two PWLS models of Fessler and Anderson. Using the concept of an auxiliary function, we derive update rules that solve them iteratively. The auxiliary function is constructed on
the previous iteration, and its minimizer, taken as the new iterate, decreases the cost function. Besides monotonicity, the proposed algorithms automatically satisfy the nonnegativity constraints without requiring any adjustable learning rate. Details of these algorithms are presented, and their performance is evaluated on two simulated phantoms and a real clinical data set.
2. Methodology

2.1. A review

We denote all vectors as column vectors unless specified otherwise. Let x ∈ R^J represent the image and y ∈ R^I the projections; x_j is the value of the j-th pixel and y_i the value of the i-th projection. Denote the system matrix by W ∈ R^(I×J), satisfying Σ_{i=1}^{I} W_ij = 1. Assuming that each projection is a Poisson variable, Shepp and Vardi introduced the log-likelihood function l(x) for reconstruction. The ML estimate consists in choosing the image that maximizes l(x):

  l(x) = Σ_{i=1}^{I} [ y_i ln(Wx)_i − (Wx)_i − ln(y_i!) ]    (1)

The log-likelihood function can be viewed as the negative KL-divergence between the real projections y and the estimated projections Wx, up to a constant term. The ML estimate thus amounts to minimizing the KL-divergence below:

  F_L(x) = Σ_{i=1}^{I} [ (Wx)_i − y_i ln(Wx)_i ]    (2)
We have removed the constant term for convenience. Fessler considered each projection as approximating its estimate, which leads to a WLS estimator:

  F_W(x) = (1/2) Σ_{i=1}^{I} [ y_i − (Wx)_i ]^2 / D_ii    (3)

where D is a predetermined diagonal matrix, usually obtained by preprocessing y to avoid division by zero. Following this idea, another WLS estimator was introduced by Anderson et al. to obtain a rapid reconstruction:

  F_A(x) = (1/2) Σ_{i=1}^{I} [ y_i − (Wx)_i ]^2 / (Wx)_i    (4)
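For concreteness, the three estimators can be written as a few NumPy reductions. This is an illustrative sketch, not code from the paper; the array names `W`, `y`, `x` and `D` simply mirror the symbols above.

```python
import numpy as np

def F_L(x, W, y):
    """KL-type ML cost, Eq. (2): sum_i [(Wx)_i - y_i ln(Wx)_i]."""
    Wx = W @ x
    return np.sum(Wx - y * np.log(Wx))

def F_W(x, W, y, D):
    """Fessler's WLS cost, Eq. (3); D holds the diagonal of the weight matrix."""
    Wx = W @ x
    return 0.5 * np.sum((y - Wx) ** 2 / D)

def F_A(x, W, y):
    """Anderson's WLS cost, Eq. (4), weighted by the current estimate Wx."""
    Wx = W @ x
    return 0.5 * np.sum((y - Wx) ** 2 / Wx)
```

Note that both quadratic costs vanish exactly when the data are consistent, i.e. y = Wx.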
Fessler's and Anderson's models can be considered as χ²-divergences between y and Wx [8,9]. Both the KL- and the χ²-divergence are included in Amari's α-divergence [10,11]. The estimators above have been used by an overwhelming majority of iterative reconstruction methods. Regardless of whether F is F_L, F_W or F_A, we can always impose a penalty V on it to obtain a smooth reconstruction:

  min_x  Φ(x) = F(x) + β V(x)
  s.t.   x_j ≥ 0,  j = 1, ..., J    (5)

where F_L, F_W and F_A are collectively called F, and β is the penalty parameter. For F_L + βV, this is the Bayesian estimate introduced by Green. There is a slight difference between Green's model and Eq. (5), since the former is formulated as a maximization. By the one-step-late strategy, the following update can be obtained:

  x_j^{t+1} = [ x_j^t / (1 + β ∂V(x^t)/∂x_j) ] Σ_{i=1}^{I} W_ij y_i / (Wx^t)_i    (6)
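A minimal sketch of the one-step-late update (6), assuming a caller-supplied penalty gradient `dV` and columns of `W` summing to one as in the text; all names are illustrative. With β = 0 it reduces to the MLEM update.

```python
import numpy as np

def osl_map_update(x, W, y, beta, dV):
    """One-step-late MAP iteration, Eq. (6).

    The penalty gradient dV is evaluated at the previous iterate x^t,
    which keeps the update in multiplicative form. Assumes the columns
    of W sum to one, so the sensitivity term is identically 1."""
    ratio = y / (W @ x)          # y_i / (W x^t)_i
    backproj = W.T @ ratio       # sum_i W_ij y_i / (W x^t)_i
    return x * backproj / (1.0 + beta * dV(x))
```

As the text warns, nothing prevents 1 + β ∂V/∂x_j from becoming negative for a large β, which is exactly the nonnegativity risk discussed below.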
In spite of its flexibility and convenience in applications, there is no theory or experiment guaranteeing its convergence for an arbitrary penalty parameter. It also risks violating the physical nonnegativity of the pixels, because the sign of 1 + β ∂V(x^t)/∂x_j is uncertain. Generally, a smaller β is required to obtain stable performance.

For F_W + βV, Fessler restricted V to a quadratic form and then solved the problem with the SOR algorithm. This algorithm is a variant of the Gauss–Seidel method for solving a linear system of equations: the extrapolation takes a weighted average between the previous iterate and the computed Gauss–Seidel iterate successively for each component.

For F_A + βV, although Anderson's algorithm converges faster than the MLEM algorithm, it is difficult to obtain a smooth reconstruction with it, so a penalty is needed to improve image quality. However, few studies have explored the corresponding PWLS reconstructions.

2.2. New update rules

We assume that the penalty term V has the following form:

  V(x) = (1/2) Σ_{j=1}^{J} Σ_{l=1}^{J} p_jl S(x_j, x_l)    (7)

where p_jl ≥ 0 are weights and S is a convex function on two-dimensional real space. Next we define the auxiliary function and use it to solve Eq. (5).

Definition 1. g(x, x') is an auxiliary function for G(x) at x' if g(x', x') = G(x') and g(x, x') ≥ G(x).

It is clear that G decreases under the update x'' = argmin_x g(x, x'), which can be proved by G(x'') ≤ g(x'', x') ≤ g(x', x') = G(x'). The auxiliary function has been used in many places [12,13]. In the following, we construct auxiliary functions f_L, f_W, f_A and v for F_L, F_W, F_A and V at x^t, respectively. Certainly, f + βv is then an auxiliary function of F + βV.

Let λ_ij = W_ij x_j^t / (Wx^t)_i. Since λ_ij ≥ 0 and Σ_{j=1}^{J} λ_ij = 1, F_L has the following auxiliary function [2,3]:

  f_L(x, x^t) = Σ_{i=1}^{I} [ (Wx)_i − y_i Σ_{j=1}^{J} λ_ij ln( W_ij x_j / λ_ij ) ]    (8)

Similarly, we construct the auxiliary functions for F_W and F_A:

  f_W(x, x^t) = (1/2) Σ_{i=1}^{I} (1/D_ii) [ y_i^2 − 2 y_i (Wx)_i + Σ_{j=1}^{J} λ_ij (W_ij x_j / λ_ij)^2 ]    (9)

  f_A(x, x^t) = (1/2) Σ_{i=1}^{I} [ −2 y_i + (Wx)_i + y_i^2 Σ_{j=1}^{J} λ_ij (W_ij x_j / λ_ij)^{−1} ]    (10)

It is easy to verify that f_W(x^t, x^t) = F_W(x^t) and f_A(x^t, x^t) = F_A(x^t). From the convexity of φ(x) = x^2 and of x^{−1}, we can prove that Σ_{j=1}^{J} λ_ij (W_ij x_j / λ_ij)^2 ≥ (Wx)_i^2 and Σ_{j=1}^{J} λ_ij (W_ij x_j / λ_ij)^{−1} ≥ (Wx)_i^{−1}, so that f_W(x, x^t) ≥ F_W(x) and f_A(x, x^t) ≥ F_A(x).

Based on a similar idea, we can construct an auxiliary function for S. First note the identity

  (x_j, x_l) = (1/2)(2x_j − x_j^t, x_l^t) + (1/2)(x_j^t, 2x_l − x_l^t)    (11)

From the convexity of S we get

  S(x_j, x_l) ≤ (1/2) S(2x_j − x_j^t, x_l^t) + (1/2) S(x_j^t, 2x_l − x_l^t) = s(x_j, x_l)    (12)
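The majorization property g(x, x^t) ≥ G(x), with equality at x = x^t, can be sanity-checked numerically for f_L of Eq. (8). This check uses made-up data and is not part of the paper's experiments; it relies on the simplification W_ij x_j / λ_ij = (x_j / x_j^t)(Wx^t)_i, valid for positive images and iterates.

```python
import numpy as np

def F_L(x, W, y):
    """KL-type cost, Eq. (2)."""
    Wx = W @ x
    return np.sum(Wx - y * np.log(Wx))

def f_L(x, xt, W, y):
    """Auxiliary function for F_L at xt, Eq. (8), with
    lam_ij = W_ij xt_j / (W xt)_i (each row of lam sums to one)."""
    Wxt = W @ xt
    lam = W * xt[None, :] / Wxt[:, None]
    # W_ij x_j / lam_ij simplifies to (x_j / xt_j) * (W xt)_i
    inner = lam * np.log((x / xt)[None, :] * Wxt[:, None])
    return np.sum(W @ x - y * inner.sum(axis=1))
```

At x = xt the inner sum collapses to ln(Wx^t)_i, recovering F_L(x^t) exactly; away from x^t the concavity of the logarithm makes f_L an upper bound.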
It is easy to verify that s(x_j^t, x_l^t) = S(x_j^t, x_l^t), so s is indeed an auxiliary function of S. An auxiliary function of V can then be written as

  v(x, x^t) = (1/4) Σ_{j=1}^{J} Σ_{l=1}^{J} p_jl [ S(2x_j − x_j^t, x_l^t) + S(x_j^t, 2x_l − x_l^t) ]
            = (1/4) Σ_{j=1}^{J} Σ_{l=1}^{J} [ p_jl S(2x_j − x_j^t, x_l^t) + p_lj S(x_l^t, 2x_j − x_j^t) ]    (13)

We should note that f_L, f_W, f_A and v are separable in x, i.e., minimizing them amounts to finding each minimal x_j separately. Taking the partial derivatives of f_L, f_W, f_A and v:

  ∂f_L(x, x^t)/∂x_j = 1 − (x_j^t / x_j) Σ_{i=1}^{I} W_ij y_i / (Wx^t)_i    (14)

  ∂f_W(x, x^t)/∂x_j = Σ_{i=1}^{I} (1/D_ii) W_ij [ −y_i + (x_j / x_j^t)(Wx^t)_i ]    (15)

  ∂f_A(x, x^t)/∂x_j = 1/2 − (1/2)(x_j^t / x_j)^2 Σ_{i=1}^{I} W_ij [ y_i / (Wx^t)_i ]^2    (16)

  ∂v(x, x^t)/∂x_j = (1/4) Σ_{l=1}^{J} [ p_jl ∂S(2x_j − x_j^t, x_l^t)/∂x_j + p_lj ∂S(x_l^t, 2x_j − x_j^t)/∂x_j ]    (17)

Then we can solve the following one-variable equation to obtain the update for every pixel:

  ∂f(x, x^t)/∂x_j + β ∂v(x, x^t)/∂x_j = 0    (18)

where f represents f_L, f_W or f_A.

2.3. Quadratic penalty as a special case

We focus on the penalty term of quadratic form S(x_j, x_l) = (x_j − x_l)^2, where p_jl = 1 for orthogonal nearest neighbors, 1/√2 for diagonal neighboring pixels, and 0 otherwise. The partial derivative of v can then be written out in detail:

  ∂v(x, x^t)/∂x_j = 4x_j Σ_{l∈N_j} p_jl − 2x_j^t Σ_{l∈N_j} p_jl − 2 Σ_{l∈N_j} p_jl x_l^t    (19)

where N_j is the set of the eight neighbors of the j-th pixel. For the Bayesian estimate and for Fessler's and Anderson's PWLS models, Eq. (18) can respectively be transformed into quadratic, linear and cubic equations in one variable with real coefficients.

For the Bayesian estimate, denote a_j = 4β Σ_{l∈N_j} p_jl, b_j = 1 − 2β[ x_j^t Σ_{l∈N_j} p_jl + Σ_{l∈N_j} p_jl x_l^t ] and c_j = −x_j^t Σ_{i=1}^{I} W_ij y_i / (Wx^t)_i; then a_j x_j^2 + b_j x_j + c_j = 0 has the sole positive solution

  x_j^{t+1} = [ −b_j + sqrt( b_j^2 − 4 a_j c_j ) ] / (2 a_j)    (20)

For Fessler's PWLS model, let a_j = (1/x_j^t) Σ_{i=1}^{I} W_ij (Wx^t)_i / D_ii + 4β Σ_{l∈N_j} p_jl and b_j = −Σ_{i=1}^{I} W_ij y_i / D_ii − 2β[ x_j^t Σ_{l∈N_j} p_jl + Σ_{l∈N_j} p_jl x_l^t ]; the solution of a_j x_j + b_j = 0 is also positive:

  x_j^{t+1} = −b_j / a_j    (21)

For Anderson's PWLS model, denote a_j = 4β Σ_{l∈N_j} p_jl, b_j = 1/2 − 2β[ x_j^t Σ_{l∈N_j} p_jl + Σ_{l∈N_j} p_jl x_l^t ] and c_j = −(1/2)(x_j^t)^2 Σ_{i=1}^{I} W_ij [ y_i / (Wx^t)_i ]^2, and we can solve the cubic polynomial a_j x_j^3 + b_j x_j^2 + c_j = 0 to obtain the update rule. Appendix A shows that it has a sole positive real root.

If β = 0, we obtain two new updates for Fessler's and Anderson's (unpenalized) WLS models:

  x_j^{t+1} = x_j^t (W' D^{−1} y)_j / (W' D^{−1} W x^t)_j    (22)

  x_j^{t+1} = x_j^t sqrt( Σ_{i=1}^{I} W_ij [ y_i / (Wx^t)_i ]^2 )    (23)

It should be noted that the latter differs from Anderson's update rule. In addition, there are some simpler quadratic penalties, for example V(x) = Σ_{j=1}^{J} (x_j − C)^2 with a given C [3] and V(x) = ||x||_2^2 [14]. These are naturally separable in the x_j, so we need not construct auxiliary functions for them. Note that they do not consider local smoothness information, so the reconstructions can be less smooth. These subjects, however, are outside the scope of this paper.

3. Simulation

We studied the behavior of the proposed algorithms compared to MLEM, MAP and SOR. The proposed algorithms based on the Bayesian estimate and on Fessler's and Anderson's PWLS models are called PMLEM, PFWLS and PAWLS, respectively. The experiments were performed on a Lenovo PC with a 3.00 GHz Pentium 4 (Dual Core) and 1.5 GB memory. All algorithms were implemented in MATLAB.

3.1. Materials and reconstructions

A simulated Shepp-Logan phantom was used to evaluate the performance of the algorithms, shown in Fig. 1(a) with a 64 × 64 grid and 3 mm pixels. A circular phantom with the same parameters, shown in Fig. 2(a), was also used; it has clearly defined relative activities for its four parts (air, background, cold sphere and hot sphere, with activity ratio 0:2:1:8). Simulated phantoms have many advantages, including prior knowledge of the pixel values and controllable noise. Total counts were 10^5 for both simulated phantoms. The system matrix was obtained using the "angle of view" method [1]. The diagonal matrix D was computed using Fessler's "data-plugin" technique. For the SOR algorithm, the relaxation factor ω was set to 0.7 as recommended by [5]. The MATLAB function ROOTS was called to solve the cubic polynomials for Anderson's PWLS reconstruction method.

Using the system matrix, we forward-projected both simulated phantoms onto sinograms with 64 radial bins (bin size 3 mm) and 96 angular views evenly spaced over π, shown in Figs. 1(b) and 2(b). The noisy projections were obtained by Fessler's pseudo-random formulation:

  y_i = c_i Poisson{ c_i^{−1} (y_i^n + a_i y_i^n) } − c_i Poisson{ c_i^{−1} a_i y_i^n }    (24)

where y_i^n is the noise-free projection, a_i = 10% simulates the contribution of random events, and c_i is the i-th detector efficiency. We selected c_i = 1 since we paid no attention to detector efficiency. Note that y_i no longer follows a Poisson distribution. To evaluate the performance of the algorithms under different noise distributions and levels, we also generated Poisson-noisy projections by the following formulation:

  y_i = w y_i^n + (1 − w) Poisson{ y_i^n }    (25)

where 0 ≤ w ≤ 1 is a weight coefficient, and the SNR (signal-to-noise ratio) was used to measure the noise level:

  SNR = w / (1 − w)    (26)
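As an illustration of how the closed form (20) is applied in practice, the sketch below performs one full PMLEM iteration. It assumes precomputed neighborhood sums `sum_p` (holding Σ_{l∈N_j} p_jl) and `neigh` (holding Σ_{l∈N_j} p_jl x_l^t) and a strictly positive β; all names are illustrative.

```python
import numpy as np

def pmlem_update(x, W, y, beta, sum_p, neigh):
    """One PMLEM iteration: per pixel, solve a_j x^2 + b_j x + c_j = 0
    and keep the unique positive root, Eq. (20).

    Assumes beta > 0 and sum_p > 0 everywhere (so a_j > 0); since
    c_j <= 0, the discriminant exceeds b_j^2 and the '+' root is
    nonnegative, which enforces the nonnegativity constraint."""
    backproj = W.T @ (y / (W @ x))          # sum_i W_ij y_i / (W x^t)_i
    a = 4.0 * beta * sum_p
    b = 1.0 - 2.0 * beta * (x * sum_p + neigh)
    c = -x * backproj
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
```

The PFWLS and PAWLS iterations follow the same pattern, replacing the quadratic solve by the linear form (21) or by a cubic root-finding step per pixel.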
Fig. 1. (a) Shepp-Logan phantom (64 × 64), (b) noise-free projections (96 × 64), (c) pseudo-random projections, (d) Poisson-noisy projections with SNR = 1, and (e) Poisson-noisy projections with SNR = 10.
Fig. 2. (a) Circular phantom (64 × 64) with relative activities 0:2:1:8 for air, background, cold sphere and hot sphere, (b) noise-free projections (96 × 64), and (c) pseudo-random projections.
We used the pseudo-random (Eq. (24)) and Poisson-noisy (Eq. (25), SNR = 1 and 10) projections of the Shepp-Logan phantom, shown in Fig. 1(c)–(e), and the pseudo-random projections of the circular phantom, shown in Fig. 2(c).

Real clinical transmission projections were also used to evaluate the performance of the algorithms. They were acquired on POSITRON's mPower scanner with a 5 mCi Ge68 rotating rod source. We performed a short transmission scan, which is more comfortable for the patient but leads to fewer counts and more noise in the projections. It is well known that fine detail is generally not of concern in a transmission scan (an attenuation image is usually segmented into air, the lungs and soft tissue to suppress noise [15]). The acquisition lasted 2 min, and 20 M counts in 61 slices were obtained. The normalization correction was performed using measurements from a calibration scan. The raw data comprised 256 radial bins and 192 angles (bin size 2 mm). These angles, however, were not evenly spaced, so (also to reduce computational complexity) we resampled them to 128 bins and 128 evenly spaced angles. The attenuation coefficients of the attenuating media were computed from these transmission projections, and reconstruction was performed on the attenuation coefficients shown in Fig. 3. For fairness, all the algorithms were initialized with the same positive uniform image whose total counts equaled those of the projections.
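The two noise models of Eqs. (24) and (25) can be simulated directly. This is a sketch under the stated assumptions (c_i = 1, a_i = 0.1), with the randoms mean taken proportional to the same noise-free projections `y_true`; the function and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def pseudo_random(y_true, a=0.1, c=1.0):
    """Fessler-style pseudo-random projections, Eq. (24): randoms-corrected
    prompts with detector efficiency c and randoms fraction a.
    The difference of two Poisson draws is no longer Poisson."""
    prompts = c * rng.poisson((y_true + a * y_true) / c)
    randoms = c * rng.poisson(a * y_true / c)
    return prompts - randoms

def poisson_noisy(y_true, w):
    """Weighted mixture of clean and Poisson projections, Eq. (25);
    the noise level is summarized by SNR = w / (1 - w), Eq. (26)."""
    return w * y_true + (1.0 - w) * rng.poisson(y_true)
```

With w = 1 the Poisson-noisy model returns the clean projections; with w = 0 it is purely Poisson.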
Fig. 3. Real clinical data (128 × 128) from a transmission scan.

3.2. Evaluation

The mean absolute error (MAE) was used to measure the difference between the reconstructed and original images, where n is the total number of image pixels:

  MAE(t) = (1/n) || x^t − x^original ||_1    (27)

We also investigated resolution and noise for these algorithms. Eq. (24) was used to generate 100 realizations, and 200 iterations were executed for each. The mean and the deviation are related to resolution and noise, respectively. They were calculated for every pixel, forming two images:

  m = (1/100) Σ_{l=1}^{100} x(l)    (28)

  s = sqrt( (1/(100−1)) Σ_{l=1}^{100} (x(l) − m)^2 )    (29)

where x(l) represents the l-th reconstructed image. We further computed the AMAE (average MAE), contrast and ADEV (average deviation) for selected ROIs (regions of interest) to measure image quality and noise control quantitatively:

  AMAE = (1/(100 |Ω|)) Σ_{l=1}^{100} Σ_{j∈Ω} | x(l)_j − x_j^original |    (30)

  Contrast = [ Σ_{j∈Ω1} m_j / Σ_{k∈Ω2} m_k ] / [ Σ_{j∈Ω1} x_j^original / Σ_{k∈Ω2} x_k^original ]    (31)

  ADEV = (1/|Ω|) Σ_{j∈Ω} s_j    (32)

where Ω1, Ω2 and Ω denote ROIs, and x(l)_j represents the j-th component of the l-th reconstruction. The desired values are 0, 1 and 0 for AMAE, contrast and ADEV, respectively.

4. Results

4.1. Shepp-Logan phantom

We first used the Shepp-Logan phantom to investigate the behavior of the algorithms. To find a suitable penalty parameter for MAP, SOR, PMLEM, PFWLS and PAWLS, we compare the change of the MAE (computed by Eq. (27)) with varying β in Fig. 4, where the pseudo-random projections are used and 200 iterations are executed for every reconstruction. A smaller MAE indicates a reconstruction closer to the true image. We can see that the optimal β is obtained at 10^−3 for all the algorithms.

Fig. 4. MAE as a function of β on the pseudo-random projections with 200 iterations. The minimal MAE for each algorithm is marked.

Fig. 5. MAE versus increasing iteration number.

Fig. 5 compares the MAE curves of all the algorithms for β = 10^−3 over the first 100 iterations. As can be seen, the MLEM MAE curve increases after 10 iterations; it obviously fails to suppress noise. SOR rapidly decreases the curve and then flattens out, indicating fast convergence. PFWLS decreases the curve more slowly than SOR, but the two become very close after 50 iterations. It can also be seen that MAP and PMLEM obtain very low MAE curves after 25 iterations and deliver similar curves after 60 iterations, making it difficult to
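The figures of merit in Eqs. (27)–(32) reduce to simple array operations. A sketch, where `recons` stacks the realizations along the first axis and the ROIs are boolean masks; all names are illustrative.

```python
import numpy as np

def mae(x, x_orig):
    """Mean absolute error, Eq. (27)."""
    return np.mean(np.abs(x - x_orig))

def mean_and_dev(recons):
    """Pixelwise mean and standard deviation images, Eqs. (28)-(29)."""
    return recons.mean(axis=0), recons.std(axis=0, ddof=1)

def amae(recons, x_orig, roi):
    """Average MAE over an ROI across all realizations, Eq. (30)."""
    return np.mean(np.abs(recons[:, roi] - x_orig[roi]))

def contrast(m, x_orig, roi1, roi2):
    """ROI contrast of the mean image relative to the truth, Eq. (31)."""
    return (m[roi1].sum() / m[roi2].sum()) / (x_orig[roi1].sum() / x_orig[roi2].sum())

def adev(s, roi):
    """Average standard deviation within an ROI, Eq. (32)."""
    return s[roi].mean()
```

Note the `ddof=1` in the deviation, matching the 1/(100−1) normalization of Eq. (29).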
Fig. 6. Reconstructed images by (a) MLEM, (b) MAP, (c) SOR, (d) PMLEM, (e) PFWLS and (f) PAWLS.
distinguish each other. PAWLS converges a little more slowly than SOR, but still more rapidly than MLEM.

Fig. 6 displays the reconstructions corresponding to Fig. 5 at the 100th iteration. We can see from Fig. 6(a) that the MLEM reconstruction clearly suffers from serious noise artifacts, whereas the other algorithms successfully smooth the reconstructions and improve the homogeneity within every region, so that image noise is effectively suppressed. The SOR result is inferior to those obtained by MAP, PMLEM, PFWLS and PAWLS, with more noise artifacts. In fact, except for MLEM and SOR, the algorithms provide reconstructions too similar to be visually distinguished.

Fig. 7 quantitatively compares the horizontal profiles of the reconstructions with that of the true phantom at the 32nd row. The MLEM profile is omitted because of its invalid noise suppression. We can see that MAP, PMLEM, PFWLS and PAWLS follow the outline of the phantom more accurately than SOR. We attribute this to SOR's truncation of pixels below zero to accommodate the nonnegativity of the image during updating. These conclusions are consistent with the visual assessment of Fig. 6.

Fig. 7. Horizontal central profiles of the reconstructions at the 32nd row.

Fig. 8 displays the reconstructions at the 100th iteration for β = 10^−4 and 10^−2. For β = 10^−4, the SOR result still suffers more noise artifacts, while the other reconstructions look very similar to the naked eye. For β = 10^−2, we can clearly see that MAP obtains an ill-reconstructed image; it is unsurprising that a divergent algorithm gives a nonsensical reconstruction. The other algorithms obviously place too much emphasis on smoothness, so that the constituent parts of the phantom become difficult to distinguish from each other.

Considering the influence of the noise distribution and level on the optimal β and on convergence, we plotted MAE–β curves on the Poisson-noisy projections with SNR = 1 and 10 in Fig. 9 (every reconstruction with 200 iterations). As can be seen, for SNR = 1 the optimal β is obtained at 10^−4 for all the algorithms, and for SNR = 10 at 10^−5. Together with Fig. 4, this emphatically indicates that the convergence of MAP depends on the noise level. It is reasonable for MAP that the optimal β depends on the noise level, but the dependence of its convergence on the noise level makes it hard to apply.
4.2. Circular phantom

Using Eqs. (28)–(32), we analyze the image quality and noise control of the algorithms on the circular phantom with the pseudo-random projections. The penalty parameter β = 10^−3 was selected according to the conclusion of Fig. 4, since this phantom also has 10^5 counts and pseudo-random noise.

Mean images are shown in Fig. 10. MLEM provides, as expected, a noisy reconstruction, while MAP fails to achieve a correct reconstruction at all. We can also see that the other four images are very close to each other, at least from a visual standpoint. In the rest of this subsection we give quantitative indexes to compare them; since MAP corrupts the reconstruction, little attention will be paid to it.

The standard deviation images are displayed in Fig. 11. We scaled the reconstructions of MLEM and MAP to the same gray levels, and likewise those of SOR, PMLEM, PFWLS and PAWLS, because MLEM and MAP produce very large deviations while the other four algorithms present relatively small ones. As can be
Fig. 8. From left to right, reconstructed images using MAP, SOR, PMLEM, PFWLS and PAWLS; the first row with β = 10^−4 and the second with β = 10^−2.

Fig. 9. MAE as a function of β on the Poisson-noisy projections with 200 iterations: (a) SNR = 1 and (b) SNR = 10. The minimal MAE for each algorithm is marked.
Fig. 10. Mean images using (a) MLEM, (b) MAP, (c) SOR, (d) PMLEM, (e) PFWLS and (f) PAWLS. Each image is scaled according to its own minimum and maximum.
Fig. 11. Standard deviation images using (a) MLEM, (b) MAP, (c) SOR, (d) PMLEM, (e) PFWLS and (f) PAWLS, where (a) and (b) have been scaled to the same gray levels, and likewise (c)–(f).

Table 1
Comparison of AMAE, contrast and ADEV for several ROIs.

Index (desired value)         ROI           MLEM    MAP     SOR     PMLEM   PFWLS   PAWLS
AMAE (0)                      Hot sphere    61.81   143.36  28.69   28.95   30.58   31.51
                              Cold sphere   22.38   11.08   6.04    5.33    5.63    6.06
                              Background    32.94   13.79   7.06    6.12    6.38    5.77
Contrast (1) (to background)  Hot sphere    0.98    0.14    0.84    0.81    0.80    0.75
                              Cold sphere   1.08    1.26    1.18    1.18    1.23    1.25
ADEV (0)                      Hot sphere    75.87   35.77   5.18    3.86    3.63    3.60
                              Cold sphere   30.05   4.56    5.24    3.96    3.54    3.55
                              Background    40.50   8.96    5.01    3.83    3.48    3.50
seen, the deviations of MLEM and MAP obviously depend on the activities of the object: less (or more) activity correlates with lower (or higher) deviation. However, we desire uniform deviations across activities when dealing with real clinical data. The other four images provide a uniform deviation over the whole object, but SOR clearly gives the largest one, revealing the greatest sensitivity to noise. The remaining three (PMLEM, PFWLS and PAWLS) are considerably close.

Table 1 lists the AMAE and ADEV for the hot sphere, cold sphere and background, and the contrasts of the hot and cold spheres to the background. MLEM and MAP give very large AMAE values for all ROIs, while the others provide comparable results on all the ROIs. We can also see that MLEM gives the best contrasts and MAP the worst; the SOR contrasts are slightly superior to those of PMLEM, PFWLS and PAWLS. From the table we can draw a conclusion similar to that of Fig. 11: MLEM and MAP give very large, activity-dependent deviations, and the SOR deviations are larger than those of PMLEM, PFWLS and PAWLS. Considering contrast and ADEV together, we cannot pursue both of them; except for MAP, when one falls, the other rises.
4.3. Real clinical data

Fig. 12 shows the reconstructions of the patient's projection data after 30 iterations using MLEM, MAP, SOR, PMLEM, PFWLS and PAWLS. The reconstruction grid was 128 × 128, with 4 mm pixels. We also scaled the sum of the attenuation coefficients to 10^5, since the total counts affect the selection of the penalty parameter; in general, the reconstructed image then needs to be rescaled to the original values.

From the above, although we cannot provide an effective approach for selecting a suitable penalty parameter under different requirements, we can experimentally decide which value gives the smoothest MAP reconstruction. In fact, the MAP algorithm with β = 10^−3 obtains the smoothest reconstruction among {10^−9, 10^−8, ..., 1} on the real clinical projections, so the penalty parameter was fixed to 10^−3 for all the algorithms to obtain a sufficiently smooth reconstruction.

Fig. 12(a) clearly illustrates that the MLEM image is contaminated by noise. The SOR reconstruction also shows a large noise level, although lower than MLEM's. The other algorithms appear able to control the noise. For real clinical data it is difficult to evaluate the algorithms quantitatively, but by visual inspection we come to conclusions similar to those of Fig. 6.

The overall CPU times for 30 iterations are listed in Table 2. It should be pointed out that SOR costs about 43 times the CPU time of MLEM, while MAP, PMLEM and PFWLS have similar computational costs to MLEM. PAWLS requires about 3 times the running time of MLEM, which stems from solving the cubic polynomials.

5. Conclusion and discussion

For the Bayesian estimate and the two PWLS models, which take into account the influence of smoothness, we propose three
Fig. 12. Reconstructed images of real clinical data using (a) MLEM, (b) MAP, (c) SOR, (d) PMLEM, (e) PFWLS and (f) PAWLS.
Table 2
Comparison of time costs (in seconds) for 30 iterations.

MLEM    MAP     SOR      PMLEM   PFWLS   PAWLS
22.31   20.50   947.84   21.39   20.53   75.34
EM-type algorithms that minimize constructed auxiliary functions so as to decrease the cost functions. The experimental results clearly show the main defect of MAP, i.e., the lack of convergence for large penalty parameter values; all the other considered algorithms always converge. For SOR and our algorithms, we must trade off contrast against noise suppression, because none of them achieves both. In terms of convergence speed, the SOR algorithm converges very fast, while our algorithms inherit slow convergence from MLEM. In terms of time cost, SOR requires a great deal of computation time, whereas the other algorithms need comparable running times. To conclude, our algorithms have a great advantage over MAP; compared with SOR, they have both advantages and disadvantages.

It should be noted that the proposed algorithms, especially with a smaller penalty parameter, behave similarly in convergence rate, contrast, noise suppression, and so on. This phenomenon may be explained by the fact that F_L, F_W and F_A (with no penalty imposed) can be considered as KL- and χ²-divergences between y and Wx, both of which are special cases of the α-divergence.¹ Similar algorithmic derivations on similar estimators lead to similar behavior.

¹ As shown in [9], the α-divergence measures the difference between two real numbers as D_α(p, q) = (1/(α(1−α)))[ αp + (1−α)q − p^α q^{1−α} ]. Three special cases are the KL-divergence and two χ²-divergences: D_1(p, q) = lim_{α→1} D_α(p, q) = q log(q/p) − q + p, D_2(p, q) = (1/2)(p−q)²/q and D_{−1}(p, q) = (1/2)(p−q)²/p. Using the α-divergence to measure the difference between y and Wx, we obtain F_α(y, Wx) = (1/(α(1−α))) Σ_{i=1}^{I} [ α y_i + (1−α)(Wx)_i − y_i^α (Wx)_i^{1−α} ], so F_L, F_W and F_A are three special cases (F_W(y, Wx) is a variant of F_{α=−1}(y, Wx) that avoids division by zero).

In reconstruction, it is often necessary to design an edge-preserving image model as the prior distribution, such as the total variation approach [16] and the nonlinear anisotropic diffusion approach [17,18], but such algorithms are mainly based on the MAP algorithm. Aiming at the acceleration of our algorithms, the
ordered subsets technique [19] could fulfill the task. In addition, Shepp and Vardi [1] provided an imperfect convergence proof for MLEM in 1982; the defect was remedied (though with difficulty) by the same authors in 1985, based on Csiszár and Tusnády's work [20]. Fessler proved the convergence of PWLS+SOR based on the well-known theory of quadratic programming. We cannot yet provide a strict convergence proof for the proposed algorithms, although the simulation results suggest convergence. Future work will focus on these topics.
Conflict of interest statement

None declared.
Acknowledgment

This work was supported by the National Natural Science Foundation of China (Nos. 10771031, 11071033).
Appendix A. Existence and uniqueness of a positive solution

In mathematics, the complex conjugate root theorem states that if $P$ is a polynomial in one variable with real coefficients, then the complex conjugate of any root of $P$ is also a root. It follows that any cubic polynomial with real coefficients must have at least one real root [21]. For convenience, we rewrite the equation in the simple form $z^3 + pz^2 - q = 0$ ($q > 0$). It can be factorized over $\mathbb{R}$ as
$$(z + k_1)(z^2 + k_2 z + k_3) = 0 \;\Longleftrightarrow\; z^3 + (k_1 + k_2)z^2 + (k_1 k_2 + k_3)z + k_1 k_3 = 0, \eqno{(A.1)}$$
where $k_1 k_2 + k_3 = 0$ and $k_1 k_3 = -q < 0$. If $k_1 > 0$, then $k_3 < 0$ and $k_2 > 0$, so that $z^2 + k_2 z + k_3 = 0$ has a sole positive root. If $k_1 < 0$, then $k_3 > 0$ and $k_2 > 0$, so that $z^2 + k_2 z + k_3 = 0$ has either two negative real roots or no real root, and thus $z = -k_1$ is the sole positive solution.
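The uniqueness argument above can be spot-checked numerically. The sketch below (NumPy assumed; `positive_root` is a hypothetical helper, not from the paper) computes all roots of $z^3 + pz^2 - q = 0$ and asserts that exactly one is positive:

```python
import numpy as np

def positive_root(p, q):
    """Return the unique positive real root of z^3 + p*z^2 - q = 0 (q > 0)."""
    # Coefficients of z^3 + p z^2 + 0*z - q, highest degree first.
    roots = np.roots([1.0, p, 0.0, -q])
    real = roots[np.isclose(roots.imag, 0.0)].real
    pos = real[real > 0]
    # The appendix argument guarantees exactly one positive real root.
    assert len(pos) == 1
    return pos[0]

# Spot-check: z^3 + 2z^2 - 3 = (z - 1)(z^2 + 3z + 3), so z = 1 is the positive root.
assert np.isclose(positive_root(2.0, 3.0), 1.0)
```

Repeating the check over random $(p, q)$ pairs with $q > 0$ gives a quick empirical confirmation of the uniqueness claim.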
References

[1] L.A. Shepp, Y. Vardi, Maximum likelihood reconstruction for emission tomography, IEEE Trans. Med. Imag. 1 (2) (1982) 113–122.
[2] C.L. Byrne, Iterative image reconstruction algorithms based on cross-entropy minimization, IEEE Trans. Image Process. 2 (1) (1993) 96–103.
[3] C.M. Chen, H.H. Lu, Y.P. Hsu, Cross-reference maximum likelihood estimate reconstruction for positron emission tomography, Biomed. Eng. Appl. Basis Commun. 13 (4) (2001) 190–198.
[4] P.J. Green, Bayesian reconstructions from emission tomography data using a modified EM algorithm, IEEE Trans. Med. Imag. 9 (1) (1990) 84–93.
[5] J.A. Fessler, Penalized weighted least-squares image reconstruction for positron emission tomography, IEEE Trans. Med. Imag. 13 (2) (1994) 290–300.
[6] J.A. Fessler, E.P. Ficaro, N.H. Clinthorne, K. Lange, Grouped-coordinate ascent algorithms for penalized-likelihood transmission image reconstruction, IEEE Trans. Med. Imag. 16 (2) (1997) 166–175.
[7] J.M.M. Anderson, B.A. Mair, M. Rao, C.H. Wu, Weighted least-squares reconstruction algorithms for positron emission tomography, IEEE Trans. Med. Imag. 16 (2) (1997) 159–165.
[8] Y. Teng, T. Zhang, Fixed point algorithm in PET reconstruction, in: 2010 3rd International Congress on Image and Signal Processing, vol. 5, 2010, pp. 2034–2038.
[9] Y. Teng, T. Zhang, Iterative reconstruction algorithms with α-divergence for PET imaging, Comput. Med. Imag. Graph. 35 (4) (2011) 294–301.
[10] S. Amari, Differential-Geometrical Methods in Statistics, Lecture Notes in Statistics, vol. 28, Springer, Berlin, 1985.
[11] A. Cichocki, S. Amari, Families of alpha-, beta- and gamma-divergences: flexible and robust measures of similarities, Entropy 12 (6) (2010) 1532–1568.
[12] D.D. Lee, H.S. Seung, Algorithms for non-negative matrix factorization, in: Advances in Neural Information Processing Systems, vol. 13, 2001, pp. 556–562.
[13] A. Cichocki, H. Lee, Y.D. Kim, S. Choi, Non-negative matrix factorization with α-divergence, Pattern Recognition Lett. 29 (9) (2008) 1433–1440.
[14] J. Zhou, J.L. Coatrieux, L.M. Luo, Noniterative sequential weighted least squares algorithm for positron emission tomography, Comput. Med. Imag. Graph. 32 (8) (2008) 710–719.
[15] H. Zaidi, J. Diaz-Gomez, A.O. Boudraa, D.O. Slosman, Fuzzy clustering-based segmented attenuation correction in whole-body PET imaging, Phys. Med. Biol. 47 (7) (2002) 1143–1160.
[16] V.Y. Panin, G.L. Zeng, G.T. Gullberg, Total variation regulated EM algorithm, IEEE Trans. Nucl. Sci. 46 (6) (1999) 2202–2210.
[17] O. Demirkaya, Anisotropic diffusion filtering of PET attenuation data to improve emission images, Phys. Med. Biol. 47 (20) (2002) 271–278.
[18] C. Riddell, H. Benali, I. Buvat, Diffusion regularization for iterative reconstruction in emission tomography, IEEE Trans. Nucl. Sci. 51 (3) (2004) 712–718.
[19] H.M. Hudson, R.S. Larkin, Accelerated image reconstruction using ordered subsets of projection data, IEEE Trans. Med. Imag. 13 (4) (1994) 601–609.
[20] Y. Vardi, L.A. Shepp, L. Kaufman, A statistical model for positron emission tomography, J. Am. Stat. Assoc. 80 (389) (1985) 8–20.
[21] A. Jeffrey, Complex Analysis and Applications, CRC Press, Boca Raton, 2005.