Mathematical Biosciences 313 (2019) 48–60
A Bayesian hierarchical point process model for epidermal nerve fiber patterns
C. Andersson a,⁎, T. Rajala b, A. Särkkä a
a Department of Mathematical Sciences, Chalmers University of Technology and the University of Gothenburg, Göteborg SE-412 96, Sweden
b Department of Statistical Science, University College London, Gower Street, London WC1E 6BT, UK
ARTICLE INFO
Keywords: Bayesian estimation; Cluster point process; Epidermal nerve fibers; Hierarchical structure; Nerve entry points; Thomas process
ABSTRACT
We introduce the Thomas process in a Bayesian hierarchical setting as a model for point pattern data with a nested structure. This model is applied to a nerve fiber data set which consists of several point patterns of nerve entry points from 47 subjects divided into 3 groups, where the grouping is based on the diagnosed severity of a certain nerve disorder. The modeling assumption is that each point pattern is a realization of a Thomas process, with parameter values specific to the subject. These parameter values are in turn assumed to come from distributions that depend on which group the subject belongs to. To fit the model, we construct an MCMC algorithm, which is evaluated in a simulation study. The results of the simulation study indicate that the group level mean of each parameter is well estimated, but that the estimation of the between subject variance is more challenging. When fitting the model to the nerve fiber data, we find that the structure within clusters appears to be the same in all groups, but that the number of clusters decreases with the progression of the nerve disorder.
1. Introduction

Most methods for analyzing and modeling point pattern data are concerned with treating a single observed point pattern or replicated patterns, i.e. several observations from one, in every sense identical, process. However, in a data set consisting of several observations, it is not uncommon that the mechanism that produces the observations is in principle the same, but has certain aspects that vary among the observations. From a modeling perspective, we would think of this as different observations coming from the same underlying point process model but with different parameter values, where an observation refers to an observed point pattern. Moreover, there might be several groups in the data, and observations that are in the same group are expected to be more similar to each other than to observations from different groups. In this case, it is very often of interest to characterize the data both on the observation level and on the group level. This hierarchy could be extended further by including more levels.

Such a hierarchical structure is present in a data set consisting of skin samples taken from subjects in three groups: healthy individuals, those with mild diabetic neuropathy and those with moderate/severe diabetic neuropathy. The samples are point patterns consisting of the locations where epidermal nerve fibers (ENFs) enter the outermost living layer of the skin. These patterns are hereafter referred to as entry point patterns. The data set has several samples from each subject, and each subject belongs either to the healthy group or to one of the diabetic groups, as illustrated in Fig. 1.

Our goal is to reach a better understanding of the process generating these point patterns and to examine whether the process differs between the groups. To achieve this, we model the entry point patterns, which are clustered, by the Thomas process and estimate the subjectwise and groupwise parameters in a Bayesian hierarchical setting. This means that the samples from one subject are assumed to be realizations of a Thomas process with certain parameter values, and that the parameter values of one subject are assumed to be realizations of a distribution that is shared between subjects in the same group. For the group level parameters, we specify prior distributions. Apart from the priors for the group level parameters, this can also be viewed as the point patterns from one subject being realizations of a Thomas process, with parameter values coming from a random effects model. Bayesian estimation of the parameters of the Thomas process, or the Neyman-Scott process in general, can be found in the literature [see e.g. 1,2]. However, in these earlier papers, parameters are estimated based on a single point pattern, not in such a complicated hierarchical setting with groups, subjects within groups and samples within subjects, as here. We fit the model using an MCMC algorithm, sampling from the posterior distribution of the parameters given the data. This provides inference both on the parameter values of individual subjects and on the group level means of these parameters. To evaluate the goodness of fit of the model we obtain posterior distributions of the pair correlation functions by sampling from the posterior distributions of the parameters and plugging these values into the analytically known function.

The paper is organized as follows. In Section 2, we describe the nerve data in more detail. In Section 3, we introduce the Bayesian hierarchical framework and the Thomas process, and describe how the Thomas process can be formulated in this framework. Furthermore, we describe the MCMC algorithm we constructed to fit the model. In Section 4 we present the results of a simulation study conducted to evaluate the parameter estimation method. The results of fitting the model to the nerve fiber data, and evaluation of the goodness of fit of the model, are presented in Section 5. Finally, in Section 6, we discuss the implications of our findings and some future work.

⁎ Corresponding author. E-mail address: [email protected] (C. Andersson).
https://doi.org/10.1016/j.mbs.2019.04.010 Received 31 August 2018; Received in revised form 30 April 2019; Accepted 30 April 2019 Available online 01 May 2019 0025-5564/ © 2019 Elsevier Inc. All rights reserved.

Fig. 1. Illustration of the structure of the data.

2. Data

The data we have available consist of suction induced skin biopsies taken from the epidermis, which is the outermost layer of the skin [see 3]. From the resulting images, epidermal nerve fibers (ENFs) were traced in sample boxes of size 330 µm × 432 µm × z, where z varies between 0 and 220 µm. Our focus will be on the spatial structure of the locations of nerve entry points, the locations where nerve trunks enter the epidermis, and they all lie on the bottom of the epidermis. It is not completely clear how the suction blister technique affects the ENF pattern in 3D. It should not affect the entry point pattern but may affect the end point pattern if the suctioning is prolonged (personal communication with Ioanna Panoutsopoulou). Therefore, we concentrate on 2D projections of the resulting point patterns.

The data consist of two skin blisters taken from the right calf of 32 healthy volunteers (healthy) and 15 subjects diagnosed with diabetes who are suffering from diabetic neuropathy. The group of diabetic subjects is further divided into two groups, eight subjects who are suffering from mild neuropathy (mild) and seven subjects who are suffering either from moderate or from severe neuropathy (moderate/severe). The number of subjects and samples from each group are shown in Table 1. From three to six images were taken from the two blisters, typically two images per blister. Therefore, the hierarchical structure consists of groups (healthy, mild, moderate/severe) at the top level, subjects at the middle level, and samples at the bottom level, as illustrated in Fig. 1. Note that in our hierarchical analysis, we consider all the images from one subject to be independent, identically distributed observations, and ignore that they originate from two blisters since there should be no blister effect present in the data (personal communication with William R. Kennedy; [4]).

Each sample is a point pattern consisting of the 2D locations of the ENF entry points. In Fig. 2, all samples from five different subjects are presented, along with the estimated pair correlation functions (pcfs) for those samples. For almost all samples, the estimated pcfs indicate aggregation, but they also exhibit large variability among samples from the same subject. For our analysis, only point patterns with at least five points are kept, leading to the above mentioned group sizes. Such a small number of entry points is due to the severity of neuropathy of the subject in question; 10 subjects from the moderate/severe group, with no samples above this threshold, were excluded.

3. Method

In this section we describe the Bayesian hierarchical framework and the Thomas process. Moreover, we introduce the MCMC algorithm we use for fitting the model.

3.1. Bayesian hierarchical framework

To capture the structure of the data in a model, we cast it in a Bayesian hierarchical setting. To ease the presentation we will describe the framework in terms of the data set we intend to model, and use the levels in Fig. 1, i.e. samples within subjects within groups, ignoring the blister level as explained in Section 2. The samples, which are the observed point patterns, are at the lowest level of the hierarchy. Let Xgsi denote the sample i ∈ {1, …, ngs} from subject s ∈ {1, …, ng} of group g ∈ {1, …, n}. These point patterns are assumed to be realizations of the same process, X, but with parameters, θgs, that may differ between subjects. These subject specific parameters are assumed to come from a common distribution pS( · ; θg) within the group g, while the group level parameters are assumed to come from some common distribution pG( · ; θ0), where θ0 is a vector of the prior parameters. A concise way of formulating this is
\[
\begin{aligned}
\theta_g \mid \theta_0 &\sim p_G(\theta_g; \theta_0), & g &\in \{1, \dots, n\},\\
\theta_{gs} \mid \theta_g &\sim p_S(\theta_{gs}; \theta_g), & s &\in \{1, \dots, n_g\},\\
X_{gsi} \mid \theta_{gs} &\sim p_X(X_{gsi}; \theta_{gs}), & i &\in \{1, \dots, n_{gs}\},
\end{aligned}
\]

Table 1
Number of subjects, total number of samples and mean point count in the samples for each group. In the analysis the data from the moderate and severe groups are combined into one group.

Group            Subjects  Samples  Mean point count
Healthy          32        128      26.5
Mild             8         28       21.4
Moderate/Severe  7         20       12.3
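This generative hierarchy can be sketched in a few lines of code. The distribution choices below are hypothetical stand-ins (a log-normal for pS and a placeholder pattern sampler for pX); they are not the paper's concrete model, which is specified in Section 3.2.1.

```python
import math, random

random.seed(1)

def sample_subject_params(group_params):
    # p_S: subject level parameters drawn around the group level values
    # (illustrative log-normal choice)
    return {k: random.lognormvariate(math.log(v), 0.3) for k, v in group_params.items()}

def sample_pattern(subject_params):
    # p_X: placeholder for a point process sampler given subject parameters
    n = max(0, int(random.gauss(subject_params["kappa"] * subject_params["alpha"], 1)))
    return [(random.random(), random.random()) for _ in range(n)]

# theta_g for one group; theta_gs per subject; X_gsi per sample
group_params = {"kappa": 20.0, "alpha": 3.0}
data = []
for s in range(4):  # subjects within the group
    theta_gs = sample_subject_params(group_params)
    samples = [sample_pattern(theta_gs) for _ in range(3)]  # 3 samples per subject
    data.append((theta_gs, samples))

print(len(data), len(data[0][1]))  # 4 subjects, 3 samples each
```

The nesting mirrors the display above: one draw of θgs per subject, then ngs conditionally independent patterns per subject.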
where n is the number of groups, ng is the number of subjects in group g and ngs is the number of samples from subject s in group g. In the model fitting, we will sample from the posterior distribution of the parameters given the data. Note that in this general setting, the joint posterior distribution of all the unknown parameters can be written as
\[
p(\{\theta_g\}, \{\theta_{gs}\} \mid \{X_{gsi}\}, \theta_0) \propto p_G(\{\theta_g\}; \theta_0)\, p(\{\theta_{gs}\}; \{\theta_g\})\, p(\{X_{gsi}\}; \{\theta_{gs}\})
= \prod_g \Big( p(\theta_g; \theta_0) \prod_s \Big( p_S(\theta_{gs}; \theta_g) \prod_i p_X(X_{gsi}; \theta_{gs}) \Big) \Big).
\]
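Numerically, this factorization means the joint log-posterior is a sum of log-densities over the three levels. The sketch below uses hypothetical stand-in densities (a normal group prior, a normal subject level on the log scale, and a Poisson count likelihood) rather than the paper's exact choices, purely to show the structure of the sum.

```python
import math

def log_normal_pdf(x, mu, sigma):
    # log density of N(mu, sigma^2)
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def log_posterior(theta_g, theta_gs, counts, theta_0=(0.0, 1.0)):
    """Sum of the three factorization levels for one group.
    theta_g: group level log-mean; theta_gs: list of subject log-intensities;
    counts[s]: list of observed point counts for subject s."""
    lp = log_normal_pdf(theta_g, *theta_0)                       # p(theta_g; theta_0)
    for s, th in enumerate(theta_gs):
        lp += log_normal_pdf(th, theta_g, 0.5)                   # p_S(theta_gs; theta_g)
        lam = math.exp(th)
        for n in counts[s]:                                      # p_X(X_gsi; theta_gs)
            lp += n * math.log(lam) - lam - math.lgamma(n + 1)   # Poisson log-pmf
    return lp

val = log_posterior(3.0, [2.9, 3.1], [[20, 18], [22, 25]])
print(round(val, 2))
```

An MCMC sampler only ever needs this sum (or ratios of it), which is why the factorized form is convenient.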
3.2. Thomas process

The Thomas process [5] is an example of a Neyman-Scott cluster process [6]. Following the presentation by Møller and Waagepetersen [7], it can be constructed as follows. Let C be a stationary Poisson process with intensity κ > 0. Conditional on C, let Xc for each point c ∈ C be independent Poisson processes with intensities λc(ξ) = α k(ξ − c; ω), where

\[
k(\xi; \omega) = \frac{1}{2\pi\omega^2} \exp\Big( -\frac{\|\xi\|^2}{2\omega^2} \Big), \qquad (1)
\]

that is, an isotropic Gaussian kernel with standard deviation ω. Then, X = ∪c∈C Xc is a Thomas process with parameters κ, α and ω. We refer to C as the parent point process, and X as the daughter point process. Note that α is the average number of daughters per cluster, and so the intensity of X is given by λ = κα. The Thomas process can also be represented as a Cox process [7] driven by the random intensity
\[
Z(\xi) = \sum_{c \in C} \lambda_c(\xi), \qquad (2)
\]
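The construction just described can be sketched as a direct simulation on the unit square. This is a minimal sketch, not the paper's code; parents are generated on a window enlarged by a few multiples of ω so that clusters centred just outside the window still contribute daughters.

```python
import math, random

def poisson(rng, lam):
    # Knuth's multiplication method, adequate for modest lam
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def simulate_thomas(kappa, alpha, omega, seed=0):
    """Simulate a Thomas process on the unit square.
    Parents: Poisson(kappa * area) on an enlarged window; each parent gets a
    Poisson(alpha) number of daughters displaced by N(0, omega^2) in x and y."""
    rng = random.Random(seed)
    pad = 4 * omega                      # edge correction for the parent window
    area = (1 + 2 * pad) ** 2
    points = []
    for _ in range(poisson(rng, kappa * area)):
        cx = rng.uniform(-pad, 1 + pad)
        cy = rng.uniform(-pad, 1 + pad)
        for _ in range(poisson(rng, alpha)):
            x, y = rng.gauss(cx, omega), rng.gauss(cy, omega)
            if 0 <= x <= 1 and 0 <= y <= 1:  # keep daughters in the window
                points.append((x, y))
    return points

pattern = simulate_thomas(kappa=20, alpha=3, omega=0.05)
print(len(pattern) > 0)
```

The parameter values are the ones used in the simulation study of Section 4.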
meaning that if we condition on a fixed parent pattern c, the density of the daughter point process restricted to a bounded observation window W is given by

\[
p(\mathbf{x} \mid \mathbf{c}) = \Big( \prod_{\xi \in \mathbf{x}} \alpha \sum_{c \in \mathbf{c}} k(\xi - c; \omega) \Big) \exp\Big( |W| - \alpha \sum_{c \in \mathbf{c}} \int_W k(\xi - c; \omega)\, d\xi \Big), \qquad (3)
\]

with respect to the unit rate Poisson process.

3.2.1. Hierarchical Thomas process

In order to fit the Thomas process to the data in the hierarchical framework we formulate the model as illustrated in Fig. 3, where the plates represent the different levels of the hierarchy and the mathematical formulation of the model is given next to the plate diagram. In words, μκ, μα, μω², τκ, τα and τω² are the hyperparameters of the group level means on the log-scale, and aθ and bθ are hyperparameters for the respective group level precisions. Conditional on these parameters, the subject specific parameters are log-normal, with mean and precision depending on the group, meaning that the density of θgs | μθ,g, τθ,g is given by

\[
p(\theta_{gs} = t \mid \mu_{\theta,g}, \tau_{\theta,g}) = \sqrt{\frac{\tau_{\theta,g}}{2\pi}}\, t^{-1} \exp\Big( -\frac{\tau_{\theta,g} (\log t - \mu_{\theta,g})^2}{2} \Big),
\]

for θ ∈ {κ, α, ω²}. The latent parent pattern of each sample from this subject is from a stationary Poisson process with intensity κgs. Finally, given a parent pattern, the subject level mean cluster size αgs, and the cluster spread ω²gs, an observed pattern Xgsi is from an inhomogeneous Poisson process with intensity Z(ξ), ξ ∈ Wgsi, with Z(ξ) given by (2) and Wgsi being the observation window for the corresponding pattern.

An interpretation of the model on the natural scale can be found by noting that if θgs ∼ logN(μθ,g, τθ,g⁻¹) the expected value and standard deviation are given by

\[
E[\theta_{gs}] = \exp\Big( \mu_{\theta,g} + \frac{1}{2\tau_{\theta,g}} \Big), \qquad (4)
\]

\[
SD[\theta_{gs}] = \exp\Big( \mu_{\theta,g} + \frac{1}{2\tau_{\theta,g}} \Big) \sqrt{\exp(\tau_{\theta,g}^{-1}) - 1}. \qquad (5)
\]

Expectations as in (4) can be used to characterize groups, and differences between groups, in an intuitive way. Moreover, the precision on the log-scale gives the standard deviation on the natural scale as a factor times the expected value. We will let this factor, the coefficient of variation, be denoted by rg, i.e.

\[
r_g = \sqrt{\exp(\tau_{\theta,g}^{-1}) - 1},
\]

and henceforth also use the notation of (4), so that, for example, θg = E[θgs]. The priors of the means can be derived (see Appendix C) in terms of the priors of μθ,g and τθ,g as

\[
p_{\theta_g}(t) = \int_0^{\infty} p_{\mu_{\theta,g}}\Big( \log t - \frac{1}{2v} \Big)\, p_{\tau_{\theta,g}}(v)\, t^{-1}\, dv,
\]

and evaluated numerically, where pμθ,g(·) and pτθ,g(·) are the prior densities of μθ,g and τθ,g, respectively.

3.2.2. MCMC algorithm

Since we work in a Bayesian setting we are interested in the posterior distribution of the parameters given the data. This is obtained by sampling from the posterior distribution using an MCMC algorithm. At the group level this is done by Gibbs sampling, and the posteriors are obtained from the fact that we use conjugate priors for both the mean and precision parameters. For the group level means, μθ,g, θ ∈ {κ, α, ω²}, the likelihood for θgs is log-normal with mean μθ,g and precision τθ,g, and the prior for μθ,g is normal, meaning that the posterior is also normal as

\[
\mu_{\theta,g} \mid \ldots \sim N\Big( \frac{\tau_\theta \mu_\theta + \tau_{\theta,g} \sum_s \log \theta_{gs}}{n_g \tau_{\theta,g} + \tau_\theta},\; (n_g \tau_{\theta,g} + \tau_\theta)^{-1} \Big), \qquad \theta \in \{\kappa, \alpha, \omega^2\}.
\]

With the same method, and with the same notation, the priors for the precisions are gamma distributions, meaning that the posteriors are also gamma as

\[
\tau_{\theta,g} \mid \ldots \sim \Gamma\Big( a_\theta + n_g/2,\; b_\theta + \frac{1}{2} \sum_s (\log \theta_{gs} - \mu_{\theta,g})^2 \Big), \qquad \theta \in \{\kappa, \alpha, \omega^2\}.
\]

For the subject level parameters we resort to the Metropolis-within-Gibbs algorithm, since the full conditionals are too complicated to sample from directly. The Hastings ratios of the updates are found in Appendix A. For the updates of the latent parent patterns we also use the birth-death-move Metropolis-Hastings algorithm [7]. The Hastings ratios for these updates are also given in Appendix A.

3.3. Goodness of fit methods

To evaluate the goodness of fit of the model, we compare the pair correlation function (pcf) using parameters sampled from the posterior distributions and the pcfs estimated from the data. For each subject, posterior pcf samples can be computed [8, p. 377] as

\[
g(t) = g(t; \kappa, \alpha, \omega) = 1 + (4\pi\kappa\omega^2)^{-1} \exp[-t^2/(4\omega^2)], \qquad (6)
\]

where κ, α, ω are samples from the posterior distribution of the model. The pcf is estimated from data by kernel estimation as described by Illian et al. [8, Sec 4.3.3], using the Epanechnikov kernel (re-normalised for distances near zero) and bandwidth 0.2/√λ̂. Because of the kernel smoothing, this estimator is a biased estimator of the true pcf but ratio unbiased for the convolution of the true pcf with the kernel [9]. Therefore, we compare the kernel smoothed posterior pcfs

\[
g_{\mathrm{smooth}}(t) = \int g(s)\, k(t - s)\, ds \qquad (7)
\]

to the pcfs estimated from the data. Here, g is the original posterior pcf defined in (6), and k the kernel. For each subject, we can plot the posterior pcf envelope together with the intensity weighted mean of the pcfs estimated from each sample. In addition, we compute the integrated error between the posterior mean kernel smoothed pcf ḡ per subject and the pcf ĝ for each sample of that subject,

\[
\mathrm{IWE}(\bar{g}, \hat{g}) = \int_l^u \frac{[\bar{g}(t) - \hat{g}(t)]^q}{s(t)}\, dt, \qquad (8)
\]

where we use q = 1 for signed error and q = 2 for squared error, and set s(t) to be the standard deviation of the posterior pcf envelope. The integral is computed as a Riemann sum over l = 1.5, 3.0, …, 30.0 = u units. We then plot the point count weighted mean IWE per subject.

4. Simulation study

In order to evaluate how well the method works, and how the different parameters and the group size affect the results, we conduct a simulation study. The cases studied here are close to the values estimated from the data in Section 5, in terms of the expected number of parent points κ, the expected number of daughters per parent α, and the kernel standard deviation ω relative to the window size. However, the spatial scale is different in the simulation study (unit square) and the data analysis (330 µm × 432 µm), and to account for this we adjust the values of μκ and μω² according to the change of scale. In addition, the ω in the nerve data corresponds roughly to the ω used in the simulations multiplied by the side length of the nerve data window, and the κ in the nerve data to the κ in the simulations divided by the area of the nerve data window. The remaining parameters of the prior distributions are free of the window scale, since τθ, aθ and bθ affect the prior standard deviation relative to the expected value for each parameter θ ∈ {κ, α, ω²}, and μα relates to
the prior mean cluster size, which is scale free in itself.

4.1. Set-up

Each of the group level expectations, κg, αg and ω²g as defined in (4) in Section 3.2.1, is varied between two levels. The variability within a group is parameterized through rg, the ratio between the standard deviation and the mean. We also allow rg to vary between two levels, but keep it the same for all group level parameters in each case. Moreover, we consider two group sizes, while the number of replicates per subject is kept fixed at 3. A complete description of the parameter choices is given in Table 2, although it should be noted that only 13 of the 32 possible parameter combinations are considered. The patterns were simulated on the unit square.

In each scenario, data are generated according to the model structure, i.e. for each parameter θ and subject s, we draw a value θgs from the log-normal distribution with log-scale mean μθ,g and precision τθ,g, which can be derived using (4) and (5). Thereafter, we simulate three patterns from the Thomas process with the sampled parameter values κgs, αgs and ω²gs for each subject. The model is then fitted to the simulated data using the MCMC algorithm described in Section 3.2.2, and the groups are compared in terms of the posterior distributions of the different parameters. The parameters of the prior distributions are chosen as presented in Table 3.

Table 2
The levels of the parameters in the different simulated cases.

Parameter            Value
κg                   10, 20
αg                   1, 3
ω²g                  0.025², 0.05²
rg                   0.25, 0.5
Group size           10, 25
Samples per subject  3

Table 3
Values of the hyperpriors for the simulation study.

Parameter  Value
μκ         log(20)
μα         log(5)
μω²        log(0.1)
τθ         1/4, θ ∈ {κ, α, ω²}
aθ         2, θ ∈ {κ, α, ω²}
bθ         0.2, θ ∈ {κ, α, ω²}

4.2. Results of the simulation experiment

A summary of the parameter estimates of the natural scale means, κg, αg and ω²g, for the different cases considered is given in Table 4. For each parameter we give the posterior median, denoted by θ̂g, as a point estimate and a 95% credible interval. These estimates can be compared both to the true values θg and to the mean of the sampled subject level parameters denoted by θ̄gs. In general, the point estimates are quite close to the mean of the sampled values. Moreover, overlaps of the credible intervals between different simulated cases are rare, indicating adequate identifiability of the parameters.

We can get more detailed insights into the parameter estimates by plotting the posteriors from different cases together, as in Figs. 4 and 5. The true values of the parameters and the corresponding estimates from the sampled subject level values are indicated by vertical lines. For the natural scale expectations, we take the mean of the subject level values as our estimate, see Table 4. The log scale mean, μθ,g, is estimated by the mean of log θgs, and the precisions, τθ,g, by 1/s²gs, where sgs is the sample standard deviation of log(θgs), and θ ∈ {κ, α, ω²}.

Table 4
For each parameter, θ, the true value is denoted by θg, the mean of the true subject specific values is denoted by θ̄gs, the posterior median is denoted by θ̂g and 95% CI is the 95% credible interval. The order of the scenarios is the same in all tables.

Summary of the estimates of κg for the different simulated cases.

Subjects  rg    κg  κ̄gs   κ̂g    95% CI
10        0.25  10  10.3  9.76  [8.18, 11.69]
10        0.25  20  21.1  22.5  [18.97, 27.18]
10        0.25  20  19.1  20.6  [15.94, 27.91]
10        0.50  10  9.89  9.63  [7.70, 12.24]
10        0.50  20  17.8  16.2  [11.92, 21.38]
25        0.25  10  9.40  9.38  [8.10, 10.84]
25        0.25  10  10.0  9.97  [8.77, 11.43]
25        0.25  20  21.4  22.5  [19.84, 25.29]
25        0.50  10  9.54  10.0  [8.18, 12.41]
25        0.50  10  11.2  10.7  [8.72, 13.91]
25        0.50  20  19.9  20.1  [15.89, 25.93]
25        0.50  20  17.5  17.0  [12.97, 22.51]
25        0.50  20  19.6  18.5  [15.67, 22.59]

Summary of the estimates of αg for the different simulated cases.

Subjects  rg    αg  ᾱgs   α̂g    95% CI
10        0.25  3   2.75  2.85  [2.41, 3.30]
10        0.25  3   2.95  2.77  [2.30, 3.36]
10        0.25  3   3.38  3.07  [2.48, 3.89]
10        0.50  3   2.32  2.56  [1.96, 3.23]
10        0.50  1   1.00  1.04  [0.70, 1.63]
25        0.25  3   3.10  3.14  [2.88, 3.41]
25        0.25  3   2.86  3.03  [2.69, 3.43]
25        0.25  1   1.11  1.10  [0.92, 1.27]
25        0.50  3   3.07  3.01  [2.39, 3.85]
25        0.50  3   3.39  3.39  [2.71, 4.31]
25        0.50  1   1.22  1.21  [0.93, 1.67]
25        0.50  1   1.04  1.03  [0.79, 1.36]
25        0.50  3   2.79  2.90  [2.40, 3.60]

Summary of the estimates of ω²g for the different simulated cases (× 10⁻⁴).

Subjects  rg    ω²g   ω̄²gs   ω̂²g    95% CI
10        0.25  25    20.09  20.94  [17.38, 25.87]
10        0.25  6.25  5.61   5.40   [4.75, 6.21]
10        0.25  25    26.68  24.22  [19.76, 30.70]
10        0.50  25    19.70  22.02  [17.38, 27.75]
10        0.50  6.25  5.99   6.72   [4.09, 17.65]
25        0.25  6.25  5.58   5.55   [5.13, 6.01]
25        0.25  25    24.46  23.90  [20.99, 27.71]
25        0.25  6.25  6.49   7.11   [6.08, 8.76]
25        0.50  6.25  5.21   5.21   [4.42, 6.58]
25        0.50  25    23.27  23.90  [19.45, 30.18]
25        0.50  6.25  6.03   5.44   [4.59, 6.70]
25        0.50  25    24.95  23.56  [18.59, 31.35]
25        0.50  25    25.50  29.13  [22.90, 41.16]

We start by considering two groups with different mean parameter values, but with the same standard deviation to mean ratio given by rg = 0.5. Both groups consist of 10 subjects, each having three samples, giving us in total 30 samples. The posterior distributions of all parameters are presented in Fig. 4. The first and second rows give the posteriors for the mean parameters on the natural and log scale, respectively, and the third row gives the posterior for the precision on the log scale. (Note that the true group level parameter, thick vertical line, on the last row can be computed by using the relationship between rg and τθ,g, giving us τθ,g = 4.48 when rg = 0.5.) In general, the parameters are fairly well estimated, even for such small groups with rather large variance within the groups. For the α and ω parameters, the posteriors are clearly separated, both on the natural and on the log scale, indicating that we can clearly distinguish a difference between the two groups. For κ and μκ, the difference is not as clear, with overlap between the posteriors from the different groups. An explanation for the greater overlap for the posteriors of κ is simply that the ratio between the means in
the two groups is 2, while it is 3 and 4 for α and ω², respectively. The posteriors of the precision parameters seem to be more affected by the prior than the posteriors of the mean parameters, possibly due to the small sample size. This is especially clear when considering τα,g. It might be that the groups are too small to obtain much information about the precision of the unobserved variables, but it is also possible that the true posterior is quite close to the prior. To investigate the effect of the group size and the within group variance, we consider the scenario where we fix α = 3 and ω² = 0.0025 for both groups and vary κ, rg and the group size. From Fig. 5, it is quite clear that the group size and the within group variation affect the estimates of the mean parameters. Considering the posteriors for μκ,g, in the middle panel, it is obvious that the estimate is much more precise with the large group–small variance combination than with the small
Table 5
Values of the hyperpriors for the ENF data. Here, |W| = 430·330, which is the approximate area of the observation windows.

Parameter  Value
μκ         log(25/|W|)
μα         log(0.5)
μω²        log(40)
τθ         1/4, θ ∈ {κ, α, ω²}
aθ         2, θ ∈ {κ, α, ω²}
bθ         0.2, θ ∈ {κ, α, ω²}
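The rough window-scale correspondence stated in Section 4 (κ scales with the inverse window area, ω with the window side length) can be sketched as follows. This is an illustration only: the helper name is hypothetical, the unit-square reference values are taken from Table 3, and the results are not meant to reproduce the Table 5 values exactly, which were chosen by the authors.

```python
import math

# Unit-square hyperprior locations (Table 3)
mu_kappa_unit = math.log(20)    # log parent intensity on the unit square
mu_omega2_unit = math.log(0.1)  # log squared kernel sd on the unit square

def rescale_hyperpriors(width, height):
    """Adjust log-scale locations for an observation window of the given size:
    kappa scales with 1/area, omega (hence omega^2 with side^2) with the side length."""
    area = width * height
    side = width  # illustrative choice of reference side
    return {
        "mu_kappa": mu_kappa_unit - math.log(area),
        "mu_omega2": mu_omega2_unit + 2 * math.log(side),
    }

vals = rescale_hyperpriors(430, 330)
print(round(vals["mu_kappa"], 2), round(vals["mu_omega2"], 2))
```

The point is only that location hyperparameters move additively on the log scale under a change of spatial units, while τθ, aθ and bθ are unaffected.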
Fig. 2. Empirical pair correlation functions (leftmost column) and all samples (remaining columns) from five different subjects.
Fig. 3. Model structure and mathematical formulation of the hierarchical Thomas process. Here PoissonPP( · ) is the density of the (inhomogeneous) Poisson process with respect to the unit rate Poisson process, and Z(ξ) is the intensity given by (2).
group–large variance combination. For the groups with the larger true value of μκ,g, the differences are very small, and the effect of decreasing the group size and increasing the variance seems to be very similar. Finally, it can be seen in the rightmost panel of Fig. 5, that the precision parameter seems well estimated for rg = 0.5. In addition, the posteriors are very different from the prior for both group sizes, even though the overlap between these posteriors and those for the groups with rg = 0.25 is quite substantial.
5. Analysis of the nerve fiber data

In this section, we fit the Thomas process to the entry points of the ENF patterns and estimate the group and subject level parameters using the Bayesian hierarchical approach introduced in Section 3. As mentioned earlier, we have three groups of subjects: healthy, mild and moderate/severe. The values of the parameters of the hyperpriors that we used for the data are given in Table 5. Some convergence and autocorrelation results for the MCMC chains are shown in Appendix B.

To evaluate the goodness of fit of the model, we compare the kernel smoothed posterior pcfs in (7) for each subject to the ones estimated from the samples from the respective subject. To get an overview, we compute, for each subject, the integrated error IWE between the posterior mean and the estimated pcf of each sample as defined in (8). This provides a measure of the relative goodness of fit for each subject, i.e. when comparing the IWEs for two subjects, a smaller absolute value indicates a better fit of the model. The signed and squared IWEs are presented in Fig. 6. We also plotted the posterior pcf envelopes together with the pcfs estimated from each sample and the intensity weighted
Fig. 4. Posterior densities for the two groups, one with κg = 10, αg = 3, and ω²g = 0.0025, and one with κg = 20, αg = 1, and ω²g = 0.000625. Each group has 10 subjects and rg = 0.5. The thick vertical lines indicate the true group level parameter values, while the thin vertical lines are the corresponding estimates from the true subject level parameters. The prior densities are plotted with solid black lines.
Fig. 5. Posterior densities of the natural and log scale means of κg, and for the precision τκ,g, for the four groups with fixed αg and ω²g, and varying κ, r and group size. Vertical lines are estimates from the true subject level parameter values, and black solid lines are the prior densities.
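In the set-up of Section 4.1, the log-scale mean and precision used to draw subject level values are derived from the natural-scale mean θg and ratio rg by inverting eqs. (4) and (5). A minimal sketch of that inversion:

```python
import math, random

def lognormal_params(theta_g, r_g):
    """Invert eqs. (4)-(5): from natural-scale mean theta_g and
    coefficient of variation r_g to log-scale mean and precision."""
    tau = 1.0 / math.log(1.0 + r_g**2)           # r_g = sqrt(exp(1/tau) - 1)
    mu = math.log(theta_g) - 1.0 / (2.0 * tau)   # E = exp(mu + 1/(2 tau))
    return mu, tau

mu, tau = lognormal_params(theta_g=10.0, r_g=0.5)
# check: the implied natural-scale mean recovers theta_g
print(round(math.exp(mu + 1.0 / (2.0 * tau)), 6))

# draw one subject level value, as in the simulation study
rng = random.Random(0)
theta_gs = rng.lognormvariate(mu, math.sqrt(1.0 / tau))
```

With rg = 0.5 this gives τ = 1/log(1.25) ≈ 4.48, the value quoted for the thick vertical line in Fig. 4.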
Fig. 6. Integrated signed (left) and squared (middle) error between the posterior mean pcf per subject and the pcf for each sample of that subject, and a "scatter" plot between the two (right). One outlier subject with IWE 250 and IWSE 9145 is not included in the plots.
Fig. 7. Posterior pcf envelopes (grey) together with the intensity weighted pointwise mean of these estimates (solid blue) plotted for the same five subjects as in Fig. 2.
pointwise mean of these estimates for all 47 subjects. For the majority of the subjects the model provides a good fit, and even for the subjects with larger IWEs the range is estimated fairly well. The largest differences are found at short distances, but considering the low point counts in each sample, the differences between the samples from the same subject, and the variance of the estimate of the pcf at these distances, these differences are not necessarily indications of a poor fit of the model. Five of the subjectwise intensity weighted mean pcfs with posterior envelopes, two examples of subjects with relatively low IWEs (id01130 and id00275) and three examples of subjects with relatively high IWEs (id01146, id00261 and id00278) are presented in Fig. 7. The samples and individual pcfs are shown in Fig. 2. From the IWEs we see that the model has a tendency to slightly overestimate the pcfs. From the group level posterior pcfs, see Fig. 8, we see that the entry point
pattern seems to become more clustered as the neuropathy advances. This is in line with results in previous studies: Kennedy et al. [10] noted that entry point patterns from the thigh of one diabetic subject were more clustered than those from healthy subjects. Waller et al. [11] compared one entry point pattern from the thigh of a healthy subject to two patterns each from one subject diagnosed with mild, one with moderate and one with severe diabetic neuropathy, and found that the patterns from the moderate and severe subjects were significantly more clustered in terms of the L-function. Furthermore, Andersson et al. [12] compared the entry point patterns from the foot, in terms of groupwise pooled centered L-functions for groups of healthy and mild subjects, and found strong indications that the patterns from the mild group are more clustered. In contrast, Myllymäki et al. [13] analyzed the patterns from the calf, and compared the healthy group to a group composed of
subjects with mild or moderate diabetic neuropathy, using Gaussian process regression for the centered L-function. They found no indication that the disease status affected the structure of the entry point patterns, but that the end point patterns were more clustered in the group of mild and moderate subjects.

Considering the group level mean parameters, the mean number of daughters per cluster, αg, and the variance of the Gaussian kernel, ω²g, there is a large overlap between the posteriors from the different groups, as can be seen in Table 6. The posterior medians of αg are very similar for the healthy and moderate/severe groups, 0.64 and 0.61, respectively. Interestingly, the median is higher, 0.77, for the mild group, indicating that the average cluster size in the mild group would be slightly higher than in the other two groups, but given the large overlap between the groupwise posteriors, the difference is not significant. For ωg, the median ranges from 6.62 to 7.31, which is about 2% of the smaller side of the observation window. The median values for the healthy and mild groups are very close to each other and the median for the moderate/severe group is slightly higher. The credible interval of the median in the moderate/severe group includes the intervals of the two other medians. The posterior variances of αg and ωg differ between the groups, most likely due to the different group sizes, but there is no indication that the groups are different in terms of the mean parameters.

For κg, the posteriors for the healthy and the moderate/severe groups are clearly different, with the posterior median of κg in the former being approximately twice as large as in the latter, as seen in Table 6. These values of κg correspond to about 40 and 20 parent points per sample in the respective groups. Moreover, there is no overlap between the 95% credible intervals. The mild group's posterior is halfway between the other two and has some overlap with both. Again, the posterior variance is different in the different groups, but the variances are more similar than for the other two parameters.

There seems to be very little information in the data to estimate the precisions, since the posteriors are very similar to the priors, see the bottom row in Fig. 9. Since the precision parameters describe the between subject variance of unobserved parameters, this is not surprising. However, as was found in the simulation study, the estimates of the means still seem reliable, but care has to be taken when choosing the priors for the precision parameters, which is further discussed in Section 6.

Some realizations from the Thomas process on the unit square with the group level mean parameters adapted to the unit square are shown in Fig. 10. These could be compared to the point patterns in Fig. 2.

Fig. 8. Group level posterior pcf envelopes for the three groups, normal, mild, and moderate/severe.
Table 6
Summary of the groupwise natural-scale mean parameters for the ENF data.

Group            | κ        | CIκ                  | α    | CIα            | ω    | CIω
Healthy          | 2.93e-04 | [2.48e-04, 3.54e-04] | 0.64 | [0.557, 0.743] | 6.69 | [6.19, 7.43]
Mild             | 2.05e-04 | [1.49e-04, 2.98e-04] | 0.77 | [0.545, 1.139] | 6.62 | [5.36, 8.25]
Moderate/Severe  | 1.43e-04 | [9.60e-05, 2.06e-04] | 0.61 | [0.411, 0.878] | 7.31 | [5.25, 10.01]
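To get a feel for what patterns with the estimated group level parameters look like (cf. Fig. 10), the posterior medians in Table 6 can be plugged into a standard Thomas process simulator. The sketch below is a minimal Python illustration, not the code used in the paper; the window size (330 × 432, in the same units as ω) and all function names are our own assumptions.

```python
import math
import random

def sample_poisson(lam):
    # Knuth-style Poisson sampler; adequate for the small means used here
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def simulate_thomas(kappa, alpha, omega, width, height, pad=None):
    """Simulate a Thomas process on [0, width] x [0, height].
    Parents are Poisson(kappa) on a window extended by `pad` on each side,
    so clusters centered just outside the window still contribute daughters."""
    if pad is None:
        pad = 4.0 * omega  # parents farther out contribute negligibly
    ext_area = (width + 2 * pad) * (height + 2 * pad)
    points = []
    for _ in range(sample_poisson(kappa * ext_area)):
        cx = random.uniform(-pad, width + pad)   # parent location
        cy = random.uniform(-pad, height + pad)
        for _ in range(sample_poisson(alpha)):   # Poisson(alpha) daughters
            x, y = random.gauss(cx, omega), random.gauss(cy, omega)
            if 0.0 <= x <= width and 0.0 <= y <= height:
                points.append((x, y))
    return points

# healthy-group posterior medians from Table 6; window size is hypothetical
pattern = simulate_thomas(2.93e-4, 0.64, 6.69, 330.0, 432.0)
```

With these parameter values one expects roughly κα|W| ≈ 27 points per pattern, matching the order of magnitude of the ENF samples.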
Fig. 9. Posteriors of group level parameters for the ENF data.
Fig. 10. Simulated patterns on the unit square with group level mean parameter values adapted from the data observation window to the unit square for the three groups.

Finally, box plots of the posteriors of the subject level parameters, κgs, αgs and ωgs², are presented in Fig. 11. A log-scale is used since the posteriors on the natural scale are quite skewed, and a few large values for some subjects make the box plots on the original scale hard to read. The group level results for κg are reflected in the box plot for κgs, with the subjects in the moderate/severe group having lower parent point intensities than the subjects in the healthy group. Again, the mild group seems to lie between the two other groups, with some subjects being more similar to the healthy group and others more similar to the moderate/severe group. The larger variances in the mild and moderate/severe groups appear also on the subject level for all parameters. From these plots there is no clear correlation between the posteriors of κgs and the other two subject level parameters.

Fig. 11. Box plots of samples from the posterior distribution of the subject level parameters on log scale. Subjects are ordered according to mean posterior κgs-value.

6. Discussion and conclusions

Our aim with this work was to find a suitable model for clustered point patterns based on data that have a hierarchical structure, such as ENF entry point data with samples taken from subjects that belong to different groups. We introduced a hierarchical point process model, namely the Thomas process in a Bayesian hierarchical framework. Such a modeling framework allows us to capture differences between the groups, and also to describe the variability within the groups. We constructed an MCMC algorithm for estimating the model parameters by sampling from their posteriors. To evaluate how well the parameters can be estimated, and how well groups with different parameter values can be distinguished, we conducted a simulation study. Finally, the model was fitted to the ENF entry point data. To assess the goodness of fit of the point process model for each subject, we compared the posterior pcf and the pcfs estimated from the individual samples. Due to the large number of subjects, we chose to present integrated error plots for the difference between the pcfs instead of traditional envelope plots for each subject.

Our simulation study showed that the MCMC algorithm for sampling from the posteriors works well and is feasible to run on a standard laptop. We also saw that by using the hierarchical approach, we were able to estimate the group mean parameters and separate the groups rather well. However, having only 10 or 25 subjects per group was not enough to obtain good estimates for the precision parameters connected to the within group variance. The choice of the priors seems to have a rather large impact on the posteriors of the precisions which, in turn, affect the posteriors of the mean parameters. One of our observations was that a prior for the precision that is mainly concentrated on values close to zero leads to relatively large posterior variances for the mean parameters. Such a choice is conservative in the sense that it makes it more difficult to infer whether the groups are different. Concerning the analysis of the ENF data, the Thomas process provided a good, although not perfect, fit for most subjects. The results gave new insights into the group differences, and improved understanding of the nerve regeneration process in the different groups.

The Thomas process was chosen as the model for the ENF entry point patterns since the entry point locations in all groups were aggregated. Our first idea was to model the aggregation by using the log Gaussian Cox process (LGCP). Both models are physically feasible: in the Thomas process, the parent points can be thought of as nerve trunks in deeper skin layers that we cannot observe, and entry points appear close to each other since they stem from the same trunk. In the LGCP, the aggregation can be explained by a varying intensity caused by the ridges in the dermis (the skin layer below the epidermis), where entry points are more likely to appear. To investigate which model seemed most suitable in terms of goodness of fit, we fitted both models to each sample using the minimum contrast method, and compared the nearest neighbor distance distribution functions estimated from each sample to the ones from the fitted models by using the global rank envelope test [14] based on 2500 simulations. With the null hypothesis that each sample was a realization from the respective model with the estimated parameters, the Thomas process was rejected fewer times than the LGCP (21 compared to 35 out of 143 samples). There seemed to be strong aggregation at
short ranges which the LGCP model was not able to capture, and therefore we chose to proceed with the Thomas process. The Thomas process is a fairly simple cluster process, and any of the components could be modified to obtain a more flexible model. Our strategy was to start with the Thomas process, and evaluate whether a more complex model was needed once we had fitted it to the ENF data. Since the point counts in the samples are rather low, we would risk not being able to identify all parameters with a more complex model. Our assessment, based on the posterior mean pcf envelopes, is that the model fit is adequate for our purposes in this paper, but that there is room for improvement. The slight positive bias for the IWE indicates that the model overestimates the clustering, and it is possible that a model with a repulsive parent process or a different distribution for the locations of the daughters relative to their parent would provide a better fit. Moreover, the mean number of daughter points in the ENF data was estimated to be less than one. This may mean that the Poisson distribution is not the best model for the daughter point counts, since e.g. the probability of an empty cluster is quite high (with α ≈ 0.64, the Poisson probability of an empty cluster is e^(−0.64) ≈ 0.53). One of the authors is currently investigating whether a distribution with a smaller variance than that of the Poisson distribution would give a better fit for the daughter point counts. It should be noted, however, that such a distribution would not give a Cox process, and the corresponding MCMC approach would be computationally heavier than the approach presented here. Finally, we would like to extend the model by including a 3D-model for the nerve endings to obtain a complete model for the ENF structure.

Declarations of interest

None.

Acknowledgements

The authors thank William R. Kennedy and Gwen Wendelschafer-Crabb (University of Minnesota) for blister immunostaining, quantification and morphometry of the ENF data, and for helpful discussions. The authors would also like to thank the two anonymous reviewers, whose comments and suggestions helped to improve the manuscript. The project has been financially supported by the Swedish Research Council (VR 2013-5212), the Knut and Alice Wallenberg Foundation (KAW 2012.0067) and the Engineering and Physical Sciences Research Council (EP/N007336/1).
Appendix A. Hastings ratios

For the subject level parameters we use Metropolis within Gibbs, since the forms of the posteriors are too complicated to sample from directly. To ease notation in the Hastings ratios, we introduce

$$\gamma_W(c;\omega) = \int_W k(c - u;\omega)\, du = \frac{1}{2\pi\omega^2} \int_W \exp\left( -\frac{\|u - c\|^2}{2\omega^2} \right) du.$$
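For an axis-aligned rectangular window, the two-dimensional Gaussian integral γ_W(c; ω) factorizes into a product of two one-dimensional integrals, so it can be evaluated through the error function. The sketch below is our own illustration of this (placing the window corner at the origin and using our own names), not the paper's implementation; it assumes W = [0, a] × [0, b].

```python
import math

def norm_cdf(z):
    # standard normal CDF expressed through the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gamma_W(c, omega, a, b):
    """gamma_W(c; omega) for a rectangular window W = [0, a] x [0, b]:
    the probability that a daughter of a parent at c = (c1, c2) lands in W.
    The 2-D Gaussian integral factorizes into two 1-D integrals."""
    c1, c2 = c
    px = norm_cdf((a - c1) / omega) - norm_cdf(-c1 / omega)
    py = norm_cdf((b - c2) / omega) - norm_cdf(-c2 / omega)
    return px * py
```

A parent deep inside the window has γ_W close to 1, while a parent on the window boundary loses roughly half of its daughters outside the window.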
For all three subject level parameters, the proposals are log-normal, with the current value as the mean and a standard deviation s_θ, θ ∈ {κ, α, ω}.

Ratio for κgs

For a proposed κ*_gs, given the current value κ_gs, we get the log ratio as

$$\log r(\kappa^*_{gs}; \kappa_{gs}) = \log\frac{\kappa^*_{gs}}{\kappa_{gs}} + \sum_i \left( n(C_{gsi}) \log\frac{\kappa^*_{gs}}{\kappa_{gs}} - (\kappa^*_{gs} - \kappa_{gs}) |W^{\mathrm{ext}}_{gsi}| \right) - \frac{\tau_{\kappa,g}}{2}\left( (\log \kappa^*_{gs} - \mu_{\kappa,g})^2 - (\log \kappa_{gs} - \mu_{\kappa,g})^2 \right),$$

where W^ext_gsi is the extended window in which we simulate the parent points of X_gsi.
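As an illustration of this update, the log ratio can be assembled from the per-sample parent counts n(C_gsi) and extended window areas. The following sketch is our own reading of the formula above (function and argument names are ours, and the counts and areas are assumed precomputed); note that every term changes sign when the proposed and current values are swapped.

```python
import math

def log_ratio_kappa(k_prop, k_cur, n_parents, ext_areas, mu, tau):
    """Log Hastings ratio for kappa_gs with a log-normal random-walk proposal.
    n_parents[i] = n(C_gsi), ext_areas[i] = |W_gsi^ext|; (mu, tau) are the
    group level mean and precision of log kappa_gs."""
    log_q = math.log(k_prop / k_cur)      # log-normal proposal correction
    lr = log_q
    for n_i, area_i in zip(n_parents, ext_areas):
        # Poisson likelihood of the parent pattern on the extended window
        lr += n_i * log_q - (k_prop - k_cur) * area_i
    # log-normal prior contribution
    lr -= 0.5 * tau * ((math.log(k_prop) - mu) ** 2 - (math.log(k_cur) - mu) ** 2)
    return lr
```

The antisymmetry in (κ*, κ) gives a quick correctness check: the log ratio of a proposal and of its reverse must cancel.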
Hastings ratio for αgs

For a proposed α*_gs, given the current value α_gs, we get

$$\log r(\alpha^*_{gs}; \alpha_{gs}) = \log\frac{\alpha^*_{gs}}{\alpha_{gs}} + \sum_i \left( n_{gsi} \log\frac{\alpha^*_{gs}}{\alpha_{gs}} - (\alpha^*_{gs} - \alpha_{gs}) \sum_{c \in C_{gsi}} \gamma_{W_{gsi}}(c; \omega_{gs}) \right) - \frac{\tau_{\alpha,g}}{2}\left( (\log \alpha^*_{gs} - \mu_{\alpha,g})^2 - (\log \alpha_{gs} - \mu_{\alpha,g})^2 \right).$$
Hastings ratio for ωgs

For a proposed (ω*_gs)², given the current value ω²_gs, we get

$$\log r((\omega^*_{gs})^2; \omega^2_{gs}) = \left( 1 - \sum_i n_{gsi} \right) \log\frac{(\omega^*_{gs})^2}{\omega^2_{gs}} + \sum_i \sum_{x \in X_{gsi}} \log\frac{\displaystyle\sum_{c \in C_{gsi}} \exp\left( -\frac{\|x - c\|^2}{2(\omega^*_{gs})^2} \right)}{\displaystyle\sum_{c \in C_{gsi}} \exp\left( -\frac{\|x - c\|^2}{2\omega^2_{gs}} \right)} - \alpha_{gs} \sum_i \sum_{c \in C_{gsi}} \left( \gamma_{W_{gsi}}(c; \omega^*_{gs}) - \gamma_{W_{gsi}}(c; \omega_{gs}) \right) - \frac{\tau_{\omega,g}}{2}\left( (\log(\omega^*_{gs})^2 - \mu_{\omega,g})^2 - (\log \omega^2_{gs} - \mu_{\omega,g})^2 \right).$$
Hastings ratios for birth-death-move updates

Finally, the latent parent patterns, C_gsi, are updated by the birth-death-move algorithm. In each iteration, we propose a move with probability 1/2 and a birth or a death with probability 1/4 each. Since these updates depend only on the subject level parameters, and the patterns from the same subject are updated independently, we drop the indices. Assume we have a current parent pattern, C, of a pattern X observed in a window W, and propose a new point, ξ, to add to the pattern. The Hastings ratio is then
$$r(C \cup \{\xi\}; C) = \exp(-\alpha\, \gamma_W(\xi; \omega)) \times \prod_{x \in X} \left( 1 + \frac{k(\xi - x; \omega)}{\sum_{c \in C} k(c - x; \omega)} \right) \times \frac{\kappa |W^{\mathrm{ext}}|}{n(C) + 1},$$

where W^ext is the extended window for the parents and n(C) is the number of points in C. When we propose to delete an existing point, ξ, from the pattern, the Hastings ratio is

$$r(C \setminus \{\xi\}; C) = \exp(\alpha\, \gamma_W(\xi; \omega)) \times \prod_{x \in X} \left( 1 - \frac{k(\xi - x; \omega)}{\sum_{c \in C} k(c - x; \omega)} \right) \times \frac{n(C)}{\kappa |W^{\mathrm{ext}}|}.$$
Finally, when we propose to move a point ξ to a new location η, the Hastings ratio is

$$r((C \setminus \{\xi\}) \cup \{\eta\}; C) = \exp(-\alpha [\gamma_W(\eta; \omega) - \gamma_W(\xi; \omega)]) \times \prod_{x \in X} \left( 1 + \frac{k(\eta - x; \omega) - k(\xi - x; \omega)}{\sum_{c \in C} k(c - x; \omega)} \right).$$
Appendix B. MCMC diagnostics

To check the convergence of the MCMC algorithm for the ENF data, we consider plots of the evolution of the potential scale reduction factor (psrf) [15,16] for the group level parameters, shown in Fig. B.12. The computations are based on the output of two chains for the normal group, with different initial values, and the plots are produced using the CODA package for R [17]. A value close to 1 for the psrf indicates that the chains have converged. We find that for all parameters the psrf is close to 1 and quite stable from 250,000 iterations on. For the subject level parameters we compute the psrf for each parameter from the second half of the chains, presented in Table B.7. These are all close to 1, and only two values are greater than 1.1. When considering the upper limits of the 95% confidence intervals, there are four values greater than 1.2, with the largest value slightly below 1.5. The multivariate psrf (mpsrf) for the group level parameters is estimated to be 1.59. However, when computing the mpsrf for each subject separately, they are all below 1.1. From the estimated auto-correlation functions (Fig. B.13), it is clear that there are quite long-range correlations in the output. Most likely, this is a consequence of the parent patterns changing by at most one birth, death or move in every iteration. A possible solution would be to update the parent patterns more than once in every iteration of the chain, but this is also the most computationally expensive operation in every iteration.
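The univariate psrf compares between-chain and within-chain variances [15]. The paper computes it with the CODA package in R; the following Python sketch of the standard estimator is for illustration only (names are ours, and it omits CODA's degrees-of-freedom correction and confidence interval).

```python
import math

def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for m chains of equal length n."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)       # between-chain variance
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m                   # within-chain variance
    var_hat = (n - 1) / n * W + B / n    # pooled estimate of the posterior variance
    return math.sqrt(var_hat / W)
```

Values close to 1 indicate convergence; chains stuck in different regions of the parameter space give a psrf well above 1.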
Fig. B.12. Evolution of the psrf for the group level parameters. The black solid lines are the point estimates of the psrf, and the red dashed lines are the upper limits of the 95% confidence intervals.
Table B.7
Point estimates of the psrfs of the subjectwise parameters.

Subject | κgs  | αgs  | ωgs
1       | 1.03 | 1.03 | 1.01
2       | 1.01 | 1.01 | 1.01
3       | 1.00 | 1.00 | 1.01
4       | 1.00 | 1.00 | 1.03
5       | 1.01 | 1.00 | 1.01
6       | 1.00 | 1.00 | 1.01
7       | 1.00 | 1.00 | 1.01
8       | 1.00 | 1.00 | 1.01
9       | 1.00 | 1.00 | 1.00
10      | 1.04 | 1.03 | 1.02
11      | 1.02 | 1.02 | 1.09
12      | 1.02 | 1.02 | 1.02
13      | 1.01 | 1.00 | 1.01
14      | 1.00 | 1.00 | 1.02
15      | 1.00 | 1.00 | 1.01
16      | 1.00 | 1.00 | 1.00
17      | 1.00 | 1.00 | 1.02
18      | 1.01 | 1.01 | 1.01
19      | 1.00 | 1.00 | 1.01
20      | 1.00 | 1.00 | 1.00
21      | 1.01 | 1.01 | 1.04
22      | 1.00 | 1.00 | 1.05
23      | 1.00 | 1.00 | 1.03
24      | 1.01 | 1.01 | 1.02
25      | 1.01 | 1.00 | 1.01
26      | 1.04 | 1.03 | 1.05
27      | 1.03 | 1.02 | 1.04
28      | 1.00 | 1.00 | 1.06
29      | 1.14 | 1.11 | 1.01
30      | 1.00 | 1.00 | 1.00
31      | 1.01 | 1.00 | 1.01
32      | 1.00 | 1.00 | 1.03
Fig. B.13. Estimated auto-correlation functions for the group level parameters.
Appendix C. Deriving prior densities for the group level mean parameters

Let log θ_{g,s} ~ N(μ_{θ,g}, τ_{θ,g}^{-1}), where μ_{θ,g} has density p_μ(·) and τ_{θ,g} has density p_τ(·), with p_τ(v) = 0 for v < 0. Furthermore, let θ_g = E[θ_{g,s}] = exp(μ_{θ,g} + 1/(2τ_{θ,g})). To derive the density for θ_g, we start from its distribution function:

$$F_{\theta_g}(t) = P(\theta_g \le t) = \int_0^\infty P\left( \exp\left( \mu_\theta + \frac{1}{2v} \right) \le t \right) p_\tau(v)\, dv = \int_0^\infty P\left( \mu_\theta \le \log t - \frac{1}{2v} \right) p_\tau(v)\, dv = \int_0^\infty F_\mu\left( \log t - \frac{1}{2v} \right) p_\tau(v)\, dv,$$

where F_μ is the distribution function of μ_θ. To obtain the density function of θ_g we differentiate F_{θ_g} to obtain

$$p_{\theta_g}(t) = \int_0^\infty p_\mu\left( \log t - \frac{1}{2v} \right) t^{-1} p_\tau(v)\, dv.$$
Supplementary material

Supplementary material associated with this article can be found, in the online version, at 10.1016/j.mbs.2019.04.010.
References

[1] J. Møller, R.P. Waagepetersen, Modern statistics for spatial point processes, Scand. J. Stat. 34 (4) (2007) 643–684.
[2] T. Mrkvička, Distinguishing different types of inhomogeneity in Neyman–Scott point processes, Method. Comput. Appl. Probab. 16 (2) (2014) 385–395.
[3] G. Wendelschafer-Crabb, W.R. Kennedy, D. Walk, Morphological features of nerves in skin biopsies, J. Neurol. Sci. 242 (2006) 15–21.
[4] I.G. Panoutsopoulou, G. Wendelschafer-Crabb, J.S. Hodges, W.R. Kennedy, Skin blister and skin biopsy to quantify epidermal nerves: a comparative study, Neurology 72 (14) (2009) 1205–1210.
[5] M. Thomas, A generalization of Poisson's binomial limit for use in ecology, Biometrika 36 (1/2) (1949) 18–25.
[6] J. Neyman, E.L. Scott, Statistical approach to problems of cosmology (with discussion), J. R. Stat. Soc. B 20 (1958) 1–43.
[7] J. Møller, R.P. Waagepetersen, Statistical Inference and Simulation for Spatial Point Processes, CRC Press, 2003.
[8] J. Illian, A. Penttinen, H. Stoyan, D. Stoyan, Statistical Analysis and Modelling of Spatial Point Patterns, John Wiley & Sons, Ltd., 2008.
[9] T. Fiksel, Edge-corrected density estimators for point processes, Statistics 19 (1) (1988) 67–75.
[10] W.R. Kennedy, M. Nolano, G. Wendelschafer-Crabb, T.L. Johnson, E. Tamura, A skin blister method to study epidermal nerves in peripheral nerve disease, Muscle Nerve 22 (3) (1999) 360–371.
[11] L.A. Waller, A. Särkkä, V. Olsbo, M. Myllymäki, I.G. Panoutsopoulou, W.R. Kennedy, G. Wendelschafer-Crabb, Second-order spatial analysis of epidermal nerve fibers, Stat. Med. 30 (23) (2011) 2827–2841.
[12] C. Andersson, P. Guttorp, A. Särkkä, Discovering early diabetic neuropathy from epidermal nerve fiber patterns, Stat. Med. 35 (24) (2016) 4427–4442.
[13] M. Myllymäki, A. Särkkä, A. Vehtari, Hierarchical second-order analysis of replicated spatial point patterns with non-spatial covariates, Spatial Stat. 8 (2014) 104–121.
[14] M. Myllymäki, T. Mrkvička, P. Grabarnik, H. Seijo, U. Hahn, Global envelope tests for spatial processes, J. R. Stat. Soc. Ser. B (Stat. Methodol.) 79 (2) (2017) 381–404.
[15] A. Gelman, D.B. Rubin, Inference from iterative simulation using multiple sequences, Stat. Sci. (1992) 457–472.
[16] S.P. Brooks, A. Gelman, General methods for monitoring convergence of iterative simulations, J. Comput. Graph. Stat. 7 (4) (1998) 434–455.
[17] M. Plummer, N. Best, K. Cowles, K. Vines, CODA: convergence diagnosis and output analysis for MCMC, R News 6 (1) (2006) 7–11. URL https://journal.r-project.org/archive/