Computational Statistics and Data Analysis 55 (2011) 2871–2879
Semiparametric regression with shape-constrained penalized splines

Martin L. Hazelton a,∗, Berwin A. Turlach b

a Institute of Fundamental Sciences—Statistics & Bioinformatics, Massey University PN461, Private Bag 11222, Palmerston North, New Zealand
b School of Mathematics and Statistics (M019), University of Western Australia, 35 Stirling Highway, Crawley WA 6009, Australia

Article history: Received 18 June 2010; received in revised form 31 March 2011; accepted 27 April 2011; available online 4 May 2011.

Keywords: Linear mixed model; MCMC; Shape constraint; Spline; Truncated multivariate normal
Abstract

In semiparametric regression models, penalized splines can be used to describe complex, non-linear relationships between the mean response and covariates. In some applications it is desirable to restrict the shape of the splines so as to enforce properties such as monotonicity or convexity on regression functions. We describe a method for imposing such shape constraints on penalized splines within a linear mixed model framework. We employ Markov chain Monte Carlo (MCMC) methods for model fitting, using a truncated prior distribution to impose the requisite shape restrictions. We develop a computationally efficient MCMC sampler by using a correspondingly truncated multivariate normal proposal distribution, which is a restricted version of the approximate sampling distribution of the model parameters in an unconstrained version of the model. We also describe a cheap approximation to this methodology that can be applied for shape-constrained scatterplot smoothing. Our methods are illustrated through two applications, the first involving the length of dugongs and the second concerned with growth curves for sitka spruce trees.

© 2011 Elsevier B.V. All rights reserved.
1. Introduction

Classical linear regression models are not appropriate whenever the application at hand includes correlated responses or the regression functions do not have a simple form (such as a low-order polynomial). One way of addressing these problems simultaneously is to use semiparametric regression models based on penalized splines in a linear mixed modelling framework. The inclusion of random effects allows the dependence structure in the data to be modelled, and provides requisite shrinkage estimators of spline coefficients. An excellent account of this type of semiparametric regression model is provided by Ruppert et al. (2003).

Penalized splines allow a great deal of flexibility in modelling the shape of the relationship between variables. However, this can lead to estimation of functions with implausible properties. Consider, for example, Fig. 1. This displays data on the growth of sitka spruce trees (Picea sitchensis) in normal conditions and in an ozone-rich atmosphere. A semiparametric model was fitted with the effect of ozone represented as a linear (fixed) effect, and penalized splines were used to describe the growth curves. The fitted model suggests that the trees decrease in size from about day 400 to day 500, and again after day 650. Such behaviour is biologically unlikely, indicating that this feature of the model is an artefact of the underlying fitted penalized spline. We revisit this example in greater detail in Section 4.2.

In cases like the sitka growth example, there is a need to place constraints on the shape of the regression function. A monotonicity constraint is a common requirement, but in some applications (particularly in economics) we may instead wish to enforce convexity (Hildreth, 1954; Matzkin, 1991).
Other applications, apart from growth curves (Silverman, 1985; Silverman and Wood, 1987), in which one wishes to impose monotonicity constraints include nonparametric calibration (see, e.g., Knafl et al., 1984), and the estimation of dose–response curves (see, e.g., Kelly and Rice, 1990).
∗ Corresponding author. Tel.: +64 6 356 9099 x2483; fax: +64 6 350 5682.
E-mail addresses: [email protected] (M.L. Hazelton), [email protected] (B.A. Turlach).
0167-9473/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2011.04.018
Fig. 1. A semiparametric regression model (bold lines) for the growth curves of sitka spruces (Picea sitchensis) in normal and ozone-rich atmospheres.
A number of authors have looked at shape-constrained spline smoothing. Some of the early notable contributions to this problem are due to Dierckx (1980), Wright and Wegman (1980), and Villalobos and Wahba (1987). Traditionally, shape-constrained spline smoothing is implemented by choosing a suitable basis for the spline, usually a B-spline basis, such that the desired shape constraints can easily be imposed through suitable linear constraints on the parameters of the basis functions. Thus, fitting a shape-constrained smoothing spline is reduced to either a linear programming problem or a quadratic programming problem. See, among others, Micchelli et al. (1985), Irvine et al. (1986), Ramsay (1988), Fritsch (1990), Schmidt and Scholz (1990), Tantiyaswasdikul and Woodroofe (1994), and He and Shi (1998). Alternative approaches are discussed in Ramsay (1998), Heckman and Ramsay (2000), Turlach (2005), Tutz and Leitenstorfer (2007), Leitenstorfer and Tutz (2007), Wang and Li (2008), and Wang (2008). Theoretical results can be found in Utreras (1985), Mammen and Thomas-Agnan (1999), Meyer and Woodroofe (2000), and Meyer (2008).

Later papers have often taken a Bayesian approach to the problem. See, for example, Holmes and Heard (2003), Neelon and Dunson (2004), and Shively et al. (2009). In a recent paper, Brezger and Steiner (2008) developed a method for fitting monotone functions in general additive models using Bayesian P-splines, building on the work of Lang and Brezger (2004) and Brezger and Lang (2006). Using a B-spline basis, they showed that the requirement for monotonicity is simply an isotonic ordering of the spline coefficients. They enforced this by assigning a suitably truncated multivariate normal distribution to these parameters. Model fitting was performed using Markov chain Monte Carlo (MCMC) methods.
Their algorithm has a nested structure in that it is necessary to run a short inner sampler (with burn in) to draw values from the requisite truncated multivariate normal distribution, and this must be done at each step of the main (outer) MCMC algorithm.

In this paper, we also develop a shape-constrained semiparametric regression model using penalized splines, implemented in a Bayesian framework. Unlike Brezger and Steiner (2008), however, we employ a truncated power series basis for the penalized splines. We discuss how constraints can be imposed, and provide specific details for monotonic and convex regression. As with Brezger and Steiner (2008), our constraints result in the need to sample spline coefficients from a truncated normal distribution. We propose a computationally efficient MCMC sampler that does not require an inner sampler to handle the truncated distribution. Our trick is to use a proposal distribution derived as a truncated version of the sampling distribution for model parameters for an unconstrained model. We also describe a cheap approximation that can be applied to scatterplot smoothing problems.

We set up our modelling framework in the next section. We begin by describing the model structure. We cover model fitting using least squares and likelihood methods because we require these results for later developments. We then discuss how shape constraints may be imposed on the regression function by placing linear constraints on the spline coefficients. In Section 3, we discuss Bayesian inference and describe our MCMC sampler. Our methods are illustrated in Section 4 on two applications, one involving the length of dugongs (a large marine mammal) and the other being a continuation of our examination of the sitka growth data. We draw conclusions and discuss possible extensions of our work in Section 5.

2. A shape-constrained semiparametric regression model

2.1. Linear mixed models with penalized splines

We start by considering the simple case where we observe data (xi, yi) for individuals i = 1, . . . , n, and wish to model the mean of y as a function of x. A nonparametric regression model is

    Yi = m(xi) + εi  (i = 1, . . . , n),  (1)
where m is some unknown smooth mean function, and ε1, . . . , εn are independent normally distributed error terms with zero mean and variance σε². A popular approach to fitting such models is to represent m as a spline. See, for example, Green and Silverman (1994) and Ruppert et al. (2003), and the references therein. The precise formulation of the model depends then on the choice of (the number of) knots and the family of basis functions for the spline. We work with penalized regression splines (Eilers and Marx, 1996; Marx and Eilers, 1998; Ruppert and Carroll, 2000). These are characterized by having a relatively modest number of knots (certainly far fewer than one per distinct x value, as is the case with smoothing splines). Using the simple and intuitive truncated power function basis, our pth-order spline is given by

    s(x) = s(x|β, u) = β0 + β1 x + · · · + βp xᵖ + Σ_{j=1}^{K} uj (x − κj)₊ᵖ,  (2)

where x₊ = max(0, x) and κ1 < κ2 < · · · < κK are knots with K < n. We employ the popular rule of thumb K = min(n∗/4, 35), where n∗ is the number of distinct x values (see Ruppert et al., 2003). The parameter vectors β = (β0, . . . , βp)ᵀ and u = (u1, . . . , uK)ᵀ can be estimated by minimizing the penalized sum of squares

    Σ_{i=1}^{n} {yi − m(xi)}² + λ Σ_{j=1}^{K} uj²,  (3)
where m(·) = s(·|β, u) and λ is a tuning parameter that controls the smoothness of the fitted regression function. It is now well understood that this is equivalent to fitting the linear mixed model

    y = X1 β + Z1 u + ε,  (4)

where the random effects vector u has a N(0, G1) distribution with G1 = σu² I; the n × (p + 1) fixed effects design matrix is given by X1 = [1, x, . . . , xᵖ] and the n × K random effects design matrix by Z1 = [(x − κ1 1)₊ᵖ, . . . , (x − κK 1)₊ᵖ]. Here, 1 denotes a vector of ones, and exponentiation of a vector is to be interpreted as applying elementwise. The variance parameters in (4) are related to the tuning parameter in (3) by λ = σε²/σu².

The use of a linear mixed model framework allows us to combine penalized splines with linear effects to produce semiparametric regression models (Ruppert et al., 2003). We will focus our attention on the case where these models contain a single spline term so that the mean function (conditional on the random effects) is given by

    m(xi) = s(xi|β1, u1) + x2,iᵀ β2 + z2,iᵀ u2,

where x2,i and z2,i are vectors of fixed and random effect explanatory variables for case i. The parameter vectors β = (β1ᵀ, β2ᵀ)ᵀ and u = (u1ᵀ, u2ᵀ)ᵀ are partitioned into spline coefficients and non-spline coefficients, with the non-spline random effect vector u2 assumed to have an N(0, G2) distribution. Note that we reserve the notation σu² specifically for the variance of the components of the spline random effects u1. Extension to additive models containing multiple spline terms is entirely straightforward in principle, but the additional notation required would complicate the exposition of this paper without providing any significant additional insight.

To complete the description of this model as a linear mixed model, we define the (full) design matrices by X = [X1, X2], where X2 = [x2,ij], and Z = [Z1, Z2], where Z2 = [z2,ij]. The vector of all the random effects u follows an N(0, G) distribution with block diagonal covariance matrix
    G = [ G1  0
          0   G2 ],

and we denote by R = σε² I the covariance matrix for the error vector ε. We also denote the combined design matrices by C = [X, Z], and define V = Z G Zᵀ + R following the notation of Ruppert et al. (2003). We write θ = (βᵀ, uᵀ)ᵀ for the vector of all regression parameters, and τ for the vector of all parameters describing the covariance matrices R and G. Without any shape-related constraints on the model parameters, the classical estimated best linear unbiased predictors (EBLUPs) of β and u are, respectively,

    β̂ = (Xᵀ V̂⁻¹ X)⁻¹ Xᵀ V̂⁻¹ y  (5)

and

    û = Ĝ Zᵀ V̂⁻¹ (y − X β̂),  (6)
where V̂ and Ĝ are estimates of the covariance matrices V and G obtained by residual maximum likelihood (REML) estimation. The covariance matrix of these EBLUPs can be approximated by

    Var [ β̂
          û − u ] ≈ (Cᵀ R⁻¹ C + B)⁻¹,  (7)

where

    B = [ 0  0
          0  G⁻¹ ].
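Equations (2)–(6) are straightforward to prototype. The sketch below (Python with NumPy; all function names are ours, and λ is fixed by hand rather than estimated by REML) builds the truncated power function basis and minimizes the penalized sum of squares (3), which for the simple scatterplot model coincides with the EBLUPs (5) and (6) at the plugged-in variance ratio λ = σε²/σu².

```python
import numpy as np

def tpf_basis(x, knots, p=2):
    """Truncated power function basis of order p, as in Eq. (2)."""
    X1 = np.vander(x, p + 1, increasing=True)                 # 1, x, ..., x^p
    Z1 = np.maximum(x[:, None] - knots[None, :], 0.0) ** p    # (x - kappa_j)_+^p
    return X1, Z1

def fit_penalized_spline(x, y, knots, p=2, lam=1.0):
    """Minimize the penalized sum of squares (3) at a fixed
    lambda = sigma_eps^2 / sigma_u^2; equivalent to the EBLUPs
    (5)-(6) for the scatterplot model at that variance ratio."""
    X1, Z1 = tpf_basis(x, knots, p)
    C = np.hstack([X1, Z1])
    D = np.diag([0.0] * (p + 1) + [1.0] * len(knots))  # penalize u only
    theta = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
    return theta[:p + 1], theta[p + 1:]                # beta_hat, u_hat

# Toy usage: smooth a noisy monotone curve.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 60))
y = np.log1p(5.0 * x) + rng.normal(0.0, 0.1, 60)
knots = np.quantile(x, np.linspace(0.1, 0.9, 5))       # quantile-spaced knots
beta, u = fit_penalized_spline(x, y, knots, p=2, lam=1.0)
```

For real data one would obtain λ from the REML estimates of σε² and σu², as described above.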
2.2. Shape constraints

Various types of shape constraint can be applied to the relationship between y and x by placing restrictions on the coefficients ξ = (β1ᵀ, u1ᵀ)ᵀ of the spline s. However, in many cases the constraints will be somewhat complex, particularly when the spline is of order p ≥ 3. Consider, for example, the requirement that s be a monotonic non-decreasing function. The most general form for a monotonic polynomial g of order 2ρ + 1 is

    g(x) = ζ0 + ζ1 ∫₀ˣ ∏_{i=1}^{ρ} {ηi² + 2ηi t + (1 + ωi²) t²} dt,

where ζ1 ≥ 0 and ζ0, ηi, and ωi are arbitrary real values (see, e.g., Hawkins, 1994). This implies somewhat complicated quadratic constraints on the coefficients in a cubic spline, for instance. When we look at MCMC Bayesian model fitting in the next section, we will wish to sample from distributions whose support is defined by these constraints. It will be difficult to develop a computationally efficient means of doing so with such general constraints. In light of these comments, we will focus on linear constraints on the coefficients, which we will write as

    Aξ ≥ c  (8)
for an appropriate matrix A and vector c. Such constraints are very manageable in the context of the aforementioned MCMC computation, but naturally place restrictions on the types of shape constraint that can be represented. Nonetheless, linear constraints on ξ are sufficient to handle the important cases of monotonicity (using linear or quadratic splines), and convexity (using quadratic splines).

For linear splines with the truncated power function basis from (2), the parameter vector is ξ = (β0, β1, u1, . . . , uK)ᵀ. We obtain non-decreasing monotonicity if β1 ≥ 0 and β1 + Σ_{i=1}^{j} ui ≥ 0, for all j = 1, . . . , K. Hence

    A = [ 0  1  0  0  · · ·  0
          0  1  1  0  · · ·  0
          ⋮  ⋮  ⋮  ⋱        ⋮
          0  1  1  1  · · ·  1 ]

and c = 0 in this case.

For quadratic splines, the relevant parameter vector is ξ = (β0, β1, β2, u1, . . . , uK)ᵀ. We assume that monotonicity is required over some interval [κ0, κK+1] with κ0 < κ1 and κK < κK+1. (The endpoints of this interval are not knots, but the use of this notation simplifies the mathematics below.) In practice, we typically set κ0 = min{x1, . . . , xn} and κK+1 = max{x1, . . . , xn}. Non-decreasing monotonicity is assured if and only if

    β1 + 2β2 κj + 2 Σ_{k=1}^{j} uk (κj − κk)₊ ≥ 0  for j = 0, . . . , K + 1,

and hence A is a (K + 2) × (K + 3) matrix given by

    A = [ 0  1  2κ0    0            0    · · ·  0
          0  1  2κ1    0            0    · · ·  0
          0  1  2κ2    2(κ2 − κ1)   0    · · ·  0
          ⋮  ⋮  ⋮      ⋮                 ⋱     ⋮
          0  1  2κK+1  2(κK+1 − κ1)      · · ·  2(κK+1 − κK) ].

Convexity of s for p = 2 is obtained with c = 0 and the matrix

    A = [ 0  0  1  0  · · ·  0
          0  0  1  1  · · ·  0
          ⋮  ⋮  ⋮  ⋱        ⋮
          0  0  1  1  · · ·  1 ].
While all the cases above have employed c = 0 in (8), the possibility of a non-zero vector c provides increased flexibility. For example, we can impose a minimum or maximum gradient constraint at the origin which might, for example, be appropriate when modelling stock and recruitment relationships in fisheries research (see, e.g., Myers et al., 1999).
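The constraint matrices of this section are easy to assemble mechanically. The following sketch (hypothetical helper names; c = 0 throughout) constructs A for a non-decreasing linear spline, a non-decreasing quadratic spline, and a convex quadratic spline.

```python
import numpy as np

def A_monotone_linear(K):
    """Non-decreasing linear spline, xi = (beta0, beta1, u1, ..., uK):
    rows encode beta1 >= 0 and beta1 + u1 + ... + uj >= 0."""
    A = np.zeros((K + 1, K + 2))
    A[:, 1] = 1.0                                   # beta1 column
    A[:, 2:] = np.tril(np.ones((K + 1, K)), k=-1)   # cumulative u columns
    return A

def A_monotone_quadratic(knots, kappa0, kappaK1):
    """Non-decreasing quadratic spline: the derivative
    beta1 + 2*beta2*kappa_j + 2*sum_k u_k*(kappa_j - kappa_k)_+
    must be non-negative at kappa_0 < kappa_1 < ... < kappa_{K+1}."""
    kap = np.concatenate([[kappa0], knots, [kappaK1]])
    K = len(knots)
    A = np.zeros((K + 2, K + 3))
    A[:, 1] = 1.0                                   # beta1 column
    A[:, 2] = 2.0 * kap                             # beta2 column
    A[:, 3:] = 2.0 * np.maximum(kap[:, None] - knots[None, :], 0.0)
    return A

def A_convex_quadratic(K):
    """Convex quadratic spline: the second derivative
    2*beta2 + 2*(u1 + ... + uj) must be non-negative between knots."""
    A = np.zeros((K + 1, K + 3))
    A[:, 2] = 1.0                                   # beta2 column
    A[:, 3:] = np.tril(np.ones((K + 1, K)), k=-1)
    return A
```

Each function returns A such that the shape constraint is A ξ ≥ 0; the column for β0 is identically zero because the intercept is unconstrained.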
3. Bayesian model fitting by MCMC methods

MCMC methods are a popular and generally effective means for fitting hierarchical models like the linear mixed model (4) within the Bayesian framework. We can fit shape-constrained semiparametric regression models by simply truncating the multivariate normal distribution for ξ according to the linear constraints described in the previous section. However, this creates the problem of sampling from truncated multivariate normal distributions.

Algorithms to simulate from a univariate truncated normal distribution were proposed by Geweke (1991), Robert (1995), and Chopin (2011); see also Gentle (2003, Chapter 5). A Gibbs sampler to simulate directly from a multivariate truncated normal distribution was described by Damien and Walker (2001), while Geweke (1991) and Robert (1995) suggest using their univariate rejection sampler in conjunction with Gibbs sampling for sampling from a truncated normal distribution. While Robert (1995) discusses a Gibbs sampler suitable for quite general truncations, Geweke (1991) concentrates on truncations given by a set of linear inequalities. A drawback of Geweke's proposal is that the matrix that defines the linear inequalities has to be square and of full rank. Rodriguez-Yam et al. (2004) propose a Gibbs sampler that avoids these restrictions and works for truncated multivariate normal distributions for which the truncation is defined by any set of linear inequalities.

In our work, we used an implementation of Robert's univariate rejection sampler in conjunction with Rodriguez-Yam et al.'s Gibbs sampler. Rodriguez-Yam et al.'s refinement produces chains with good mixing properties even when the components of the vector being sampled are highly correlated. This is important, because the spline coefficients using the truncated power series basis may be heavily dependent due to near collinearity in the design matrix.
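To make the truncated-normal sampling step concrete, here is a minimal coordinate-wise Gibbs sampler for a multivariate normal truncated by linear inequalities, in the spirit of Geweke (1991). It is an illustrative sketch only (all names are ours), not the Rodriguez-Yam et al. (2004) refinement used in the paper, which additionally transforms to approximately uncorrelated coordinates to improve mixing.

```python
import numpy as np
from scipy.stats import truncnorm

def gibbs_tmvn(mu, Sigma, A, c, x0, n_iter=500, rng=None):
    """Gibbs sampling from N(mu, Sigma) truncated to {x : A x >= c}.
    Each full conditional is univariate normal truncated to the
    interval implied by the linear inequalities; x0 must be feasible."""
    rng = rng or np.random.default_rng()
    P = np.linalg.inv(Sigma)                       # precision matrix
    x = np.array(x0, dtype=float)
    d = len(mu)
    out = np.empty((n_iter, d))
    for t in range(n_iter):
        for i in range(d):
            # Conditional N(m, v) of x_i given the remaining coordinates.
            v = 1.0 / P[i, i]
            m = mu[i] - v * P[i, np.arange(d) != i] @ (np.delete(x, i) - np.delete(mu, i))
            # Bounds on x_i implied by each row: a_i * x_i >= c_j - sum_{k != i} a_k x_k.
            lo, hi = -np.inf, np.inf
            for a, cj in zip(A, c):
                r = cj - a @ x + a[i] * x[i]
                if a[i] > 0:
                    lo = max(lo, r / a[i])
                elif a[i] < 0:
                    hi = min(hi, r / a[i])
            sd = np.sqrt(v)
            x[i] = truncnorm.rvs((lo - m) / sd, (hi - m) / sd,
                                 loc=m, scale=sd, random_state=rng)
        out[t] = x
    return out
```

The Rodriguez-Yam et al. refinement roughly amounts to running such a sampler after a whitening change of variables, with the linear constraints transformed accordingly.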
Using a Gibbs sampler is (of course) significantly more computationally expensive than direct sampling from the corresponding distribution without truncation. It follows that we would prefer to employ an approach that does not require a new run of Rodriguez-Yam et al.'s sampler at each iteration of the outer MCMC algorithm, in particular because this would require the generation of a fresh burn in at each step.

The previous discussion motivates the development of a Metropolis–Hastings algorithm in which the proposal distribution for ξ is independent of the current state. We will then require just a single burn in period, after which we can generate a single candidate value of ξ at each iteration. However, such an approach will only provide computational efficiency if the acceptance rate is reasonable, which in turn requires that the proposal distribution is at least a crude approximation to the posterior. Our idea is to employ a truncated multivariate normal proposal distribution based on the estimated sampling distribution of the (estimated) BLUPs β̂1 and û1 with the linear constraints applied. Specifically, we employ a proposal distribution q defined by

    q = TN_S(ξ̂, H (Cᵀ R⁻¹ C + B)⁻¹ Hᵀ),  (9)

where TN_S(µ, Σ) denotes a multivariate normal distribution with mean vector µ and covariance matrix Σ, truncated to support S ≡ S_{A,c} = {x : Ax ≥ c}; ξ̂ = (β̂1ᵀ, û1ᵀ)ᵀ; and the matrix H extracts the spline coefficients from the vector of all regression parameters, so that it contains 0s everywhere apart from a diagonal of 1s from position (1, 1) to (p + 1, p + 1), and another from position (p + 2, p + p2 + 2) to (p + K + 1, p + p2 + K + 1), where p2 is the number of non-spline fixed effects (i.e. the dimension of β2). Using this proposal distribution, our complete MCMC algorithm is of the following form.

1. Select initial vectors θ⁽⁰⁾ and τ⁽⁰⁾. Set t = 1.
2. Generate a candidate ξ∗ ∼ q.
3. Compute the acceptance probability

    π = min{1, [q(ξ⁽ᵗ⁻¹⁾) f(y|θ∗, τ⁽ᵗ⁻¹⁾) f(u1∗|(σu²)⁽ᵗ⁻¹⁾) f(β1∗)] / [q(ξ∗) f(y|θ⁽ᵗ⁻¹⁾, τ⁽ᵗ⁻¹⁾) f(u1⁽ᵗ⁻¹⁾|(σu²)⁽ᵗ⁻¹⁾) f(β1⁽ᵗ⁻¹⁾)]}.

   Set ξ⁽ᵗ⁾ ← ξ∗ with probability π; otherwise, set ξ⁽ᵗ⁾ ← ξ⁽ᵗ⁻¹⁾.
4. Sample β2⁽ᵗ⁾, u2⁽ᵗ⁾, and τ⁽ᵗ⁾ (conditional on ξ⁽ᵗ⁾) using the Metropolis–Hastings algorithm.
5. Set t ← t + 1 and return to step 2.

In the formula for π, f denotes a generic density function which is then specified by the dummy variables. The random effects density is f(u1|σu²) = TN_S(0, σu² I). The prior density f(β1) should be chosen to have support S. A truncated diffuse multivariate normal is a possible choice. It should be recognized that any prior placing no probability mass on the boundary of S will correspond to an assumption that the constraint in (8) holds in a strict sense. Alternatives that do place probability mass on the boundary are discussed by Neelon and Dunson (2004) for linear splines and Shively et al. (2009) in the context of monotonic smoothing splines. However, the use of such priors would consequently require a proposal distribution that also placed mass at the boundary.

We note that it is not necessary to compute the normalizing constant for either f(β1) or f(u1|σu²) when computing π, because these constants cancel. Moreover, we also obtain a helpful cancellation when updating σu², but only if c = 0 in (8). To see this, note that the requisite conditional for σu² is

    f(σu²|u1) ∝ f(u1|σu²) f(σu²) = TN_S(0, σu² I) f(σu²).
Now, if c = 0,

    ∫_S φ_{σu²I}(x) dx = ∫_S φ_I(x) dx = ν,

where φ_Σ(·) is the multivariate normal density with zero mean vector and covariance matrix Σ, because S_{A,0} = {x : Ax ≥ 0} does not change under multiplication of x by a scalar. It follows that the normalizing constant ν of TN_S(0, σu² I) does not depend upon σu² when c = 0, and hence will cancel when constructing acceptance probabilities for this parameter. However, if c ≠ 0, then the normalizing constant does depend on σu², and hence explicit calculation of this term is necessary. Repeated calculation of the normalizing constant at each iteration will typically slow the algorithm to an unacceptable degree.

In order to keep our methodology entirely general (i.e. applicable whether or not c = 0), we prefer to fix the value of σu² at its REML value σ̂u² obtained when fitting the unconstrained model. This can be viewed as a type of empirical Bayes methodology (see, e.g., Casella, 2001), although we sample any other hyperparameters in the model (i.e. those describing the distribution of the non-spline random effects u2) in the usual Bayesian way. Since the parameter σu² controls the amount of smoothness in the fitted spline, this approach can also be regarded as a technique for preselecting the smoothing parameter for s. Naturally, we may go further in terms of controlling the degree of smoothness by adjusting the parameter λ = σε²/σu² in the unconstrained model to a preferred value, rather than estimating σε² and σu² separately by REML.

The methodology described above can be employed to perform shape-constrained scatterplot smoothing when there are no other fixed and random effects in the model except for x. However, in that case a computationally cheap approximation to the posterior mean fit is available. If we employ flat priors on S_{A,c}, then q will be an approximation to the posterior density. Hence,

    s̆(x) = ∫ s(x|ξ) q(ξ) dξ

will be an approximation to the posterior mean spline, which can be estimated by

    ŝ(x) = (1/N) Σ_{t=1}^{N} s(x|ξ⁽ᵗ⁾)  (10)
in the usual way, where we have assumed that the burn in period has been discarded (so that t = 1 for the first iteration thereafter) and N is the number of (post burn in) samples from q.

4. Two applications

4.1. Scatterplot smoothing for dugong data

We consider first data on the growth of dugongs (Dugong dugong), a type of large marine mammal related to the manatee (sea cow). The data comprise single observations on age (in years) and length (in metres) of 27 individuals captured near Townsville in north Queensland, Australia. See Marsh (1980) for details. Our aim is to fit monotonic non-decreasing curves to describe length as a function of age. Since this is a scatterplot smoothing exercise, we have the option of using the computationally cheap approximation described in Eq. (10).

We display three fitted curves for the data in Fig. 2, in each case obtained using a quadratic spline with K = 5 knots (evenly spaced on the quantile scale). We use diffuse multivariate normal priors, truncated where necessary. The dot–dash line is the unconstrained fitted spline. This curve has a rather implausible turning point at around x = 25 years, after which mean length decreases. Given that the typical lifespan of a dugong is 50–60 years, this cannot be a result of old age. Rather, it would seem to be an artefact of the smoothing methodology. The monotonic fitted curves provided by the full MCMC approach and the approximation thereof appear more biologically reasonable. There is clearly little difference in these two fits. The shaded grey region around the curve describes pointwise 95% credible intervals for the shape-constrained curve.

The full MCMC curve took 78 s to compute, based on N = 10,000 iterations of the sampler running on a desktop PC (quad core 2.66 GHz CPU with 3.24 GB available RAM running Windows XP). The acceptance rate for the sampler was 49%. The approximation method took 39 s on the same platform.
All computing was performed using the software R, version 2.10 (R Development Core Team, 2009).

4.2. Modelling the growth of sitka spruces

This example is concerned with the data for sitka spruces that we mentioned in the Introduction. They are taken from a study in which 54 trees were grown in an ozone-rich atmosphere (coded ozone = 1 below) and 25 were grown under normal conditions (ozone = 0). Repeated measurements of the tree size were taken over a two-year period. This was calculated as being (proportional to) the product of tree height and diameter squared, which is a good approximation to above-ground biomass after inclusion of a suitable multiplicative constant (see, e.g., Lucas et al., 1988). This size measurement is converted to a log scale for our subsequent analysis.
Fig. 2. Penalized spline regression models of the mean length of dugongs (Dugong dugong) as a function of age. The shaded region describes pointwise 95% credible intervals for the monotonic model.
Fig. 3. A monotone semiparametric regression model for the growth curves of sitka spruces (Picea sitchensis) in normal and ozone-rich atmospheres.
A possible semiparametric regression model for these data is

    logsizeij = s(dayij) + Ui + β4 ozonei + εij  (i = 1, . . . , n; j = 1, . . . , T),

where dayij is the time in days from 1 January 1998 at which the jth measurement was taken on the ith tree. The coefficient β4 is the fixed effect of ozone, and u2 = (U1, . . . , Un)ᵀ is a vector of independent tree-specific N(0, σU²) random effects.

The growth of these trees is seasonal and non-linear, motivating the use of a spline term for the growth curve. It is implausible that the trees will shrink, so it is natural to impose a monotonicity constraint on s. We employ a monotone spline s with K = 3 knots, evenly spaced on the quantile scale. Diffuse multivariate normal priors are assigned to the regression coefficients, suitably truncated for the spline parameters. The priors for σε and σU are uniform.

The fitted growth curves are displayed in Fig. 3. Clearly the growth rate is slow in the middle of the time period considered. In particular, from about day 450 to day 525 there is negligible growth. As we saw in the Introduction, unconstrained spline models tend to model an implausible decrease in size over that period. We describe pointwise 80% credible intervals for the growth curves for each group using grey shading. The lack of overlap between these intervals provides evidence of an ozone effect, a finding mirrored by the posterior point estimate of β̂4 = −0.304 and 95% credible interval (−0.592, −0.023) for the parameter β4. This result ties in with current understanding in phytology, which suggests that exposure of sitka spruces to ozone makes these trees more sensitive to frost, and consequently slows growth, particularly during the colder parts of the year (see, e.g., Lucas et al., 1988).
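Both applications summarize the constrained fit by a posterior mean curve (cf. Eq. (10)) together with pointwise credible intervals. That post-processing step takes only a few lines given an array of sampled spline coefficient vectors; the sketch below (names hypothetical) assumes a quadratic truncated power basis.

```python
import numpy as np

def summarize_spline_draws(xi_draws, xgrid, knots, p=2, level=0.95):
    """Pointwise posterior summary of s(x|xi) over MCMC draws, as in
    Eq. (10): evaluate each sampled spline on a grid, then average and
    take quantiles.  xi_draws has one row per (post burn-in) draw."""
    X1 = np.vander(xgrid, p + 1, increasing=True)
    Z1 = np.maximum(xgrid[:, None] - knots[None, :], 0.0) ** p
    C = np.hstack([X1, Z1])                 # basis evaluated on the grid
    curves = xi_draws @ C.T                 # one fitted curve per draw
    alpha = (1.0 - level) / 2.0
    return (curves.mean(axis=0),            # estimate of s-hat(x)
            np.quantile(curves, alpha, axis=0),
            np.quantile(curves, 1.0 - alpha, axis=0))
```

Pointwise 80% bands, like those shaded in Fig. 3, correspond to level = 0.8.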
5. Discussion

In this paper, we have described a methodology for imposing shape constraints on penalized splines in semiparametric regression models. One of the strengths of our method is that it can handle a range of curve shapes, unlike many earlier methods, which are restricted to the case of monotonicity. Nevertheless, we acknowledge that the choice of shape constraint dictates the order of spline that can be employed if we are to obtain the requisite linear inequalities on the spline coefficients. Our methodology cannot, for example, fit monotone cubic splines, although it can manage both monotonicity and convexity constraints when using quadratic splines.

Like a number of other Bayesian approaches, our method requires that we sample from a truncated (normal) distribution. Our Metropolis–Hastings algorithm uses a proposal distribution (derived from the fit of an unconstrained model) that is a good match for the posterior, hence ensuring a healthy acceptance rate. Moreover, our method requires that we sample from just one specific truncated normal distribution, hence obviating the need to burn in a new truncated normal sampler at each iteration of the outer algorithm. The truncated normal distribution from which we do sample can be quite strongly correlated because of our use of a truncated power series spline basis, in contrast to methods based on B-splines. Nonetheless, recent refinements to algorithms for truncated normal samplers provide good mixing even in correlated cases, and mean that our methodology is computationally efficient.

While we have focused solely on semiparametric extensions to the linear mixed model, our methodology can be applied in a direct manner to generalized linear mixed models. However, one might expect the resulting MCMC algorithms to be somewhat less efficient than in the linear model case.
The point is that a normal sampling distribution for the regression parameter estimates will provide a relatively good approximation to the posterior (based on diffuse priors) for linear models, but may be a poorer match for generalized linear models. As a result, our suggested proposal distribution proves less effective in the context of the latter, producing a lower acceptance rate in the sampler.

Acknowledgements

The authors would like to thank the editor, an associate editor, and two referees for their valuable comments and constructive suggestions. The work on this paper was partly done while the second author was employed at the Department of Statistics and Applied Probability at the National University of Singapore; during that time, funding was received from the Singapore Ministry of Education AcRF R–155–000–069–133.

References

Brezger, A., Lang, S., 2006. Generalized structured additive regression based on Bayesian P-splines. Computational Statistics & Data Analysis 50 (4), 967–991.
Brezger, A., Steiner, W.J., 2008. Monotonic regression based on Bayesian P-splines: an application to estimating price response functions from store-level scanner data. Journal of Business & Economic Statistics 26 (1), 90–103.
Casella, G., 2001. Empirical Bayes Gibbs sampling. Biostatistics 2 (4), 485–500.
Chopin, N., 2011. Fast simulation of truncated Gaussian distributions. Statistics and Computing 21 (2), 275–288.
Damien, P., Walker, S.G., 2001. Sampling truncated normal, beta, gamma densities. Journal of Computational and Graphical Statistics 10 (2), 206–215.
Dierckx, P., 1980. An algorithm for cubic spline fitting with convexity constraints. Computing 24, 349–371.
Eilers, P.H.C., Marx, B.D., 1996. Flexible smoothing with B-splines and penalties (with discussion). Statistical Science 11 (2), 89–121. With comments and a rejoinder by the authors.
Fritsch, F.N., 1990. Monotone piecewise cubic data fitting. In: Mason, J.C., Cox, M.G. (Eds.), Algorithms for Approximation II. Chapman and Hall, London, pp. 99–106.
Gentle, J.E., 2003. Random Number Generation and Monte Carlo Methods, 2nd edn. Statistics and Computing. Springer-Verlag, New York.
Geweke, J., 1991. Efficient simulation from the multivariate normal and Student-t distributions subject to linear constraints. In: Keramidas, E.M. (Ed.), Computer Science and Statistics: Proceedings of the Twenty-Third Symposium on the Interface. Interface Foundation of North America, Fairfax, Virginia, pp. 571–578.
Green, P.J., Silverman, B.W., 1994. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman & Hall, London.
Hawkins, D.M., 1994. Fitting monotonic polynomials to data. Computational Statistics 9 (3), 233–247.
He, X., Shi, P., 1998. Monotone B-spline smoothing. Journal of the American Statistical Association 93 (442), 643–650.
Heckman, N., Ramsay, J.O., 2000. Penalized regression with model-based penalties. Canadian Journal of Statistics 28, 241–258.
Hildreth, C., 1954. Point estimates of ordinates of concave functions. Journal of the American Statistical Association 49, 598–619.
Holmes, C.C., Heard, N.A., 2003. Generalized monotonic regression using random change points. Statistics in Medicine 22, 623–638.
Irvine, L.D., Marin, S.P., Smith, P.W., 1986. Constrained interpolation and smoothing. Constructive Approximation 2 (2), 129–151.
Kelly, C., Rice, J., 1990. Monotone smoothing with application to dose–response curves and the assessment of synergism. Biometrics 46, 1071–1085.
Knafl, G., Sacks, J., Spiegelman, C., Ylvisaker, D., 1984. Nonparametric calibration. Technometrics 26 (3), 233–241.
Lang, S., Brezger, A., 2004. Bayesian P-splines. Journal of Computational and Graphical Statistics 13 (1), 183–212.
Leitenstorfer, F., Tutz, G., 2007. Generalized monotonic regression based on B-splines with an application to air pollution data. Biostatistics 8 (3), 654–673.
Lucas, P.W., Cottab, D.A., Sheppard, L.J., Francis, B.J., 1988. Growth responses and delayed winter hardening in sitka spruce following summer exposure to ozone. New Phytologist 108, 495–504.
Mammen, E., Thomas-Agnan, C., 1999. Smoothing splines and shape restrictions. Scandinavian Journal of Statistics 26, 239–252.
Marsh, H.R., 1980. Age determination of the dugong [dugon dugon (Müller)] in Northern Australia and its biological implications. In: Age Determination in Toothed Cetaceans and Sirenians. International Whaling Commission, Cambridge.
Marx, B.D., Eilers, P.H.C., 1998. Direct generalized additive modeling with penalized likelihood. Computational Statistics and Data Analysis 28 (2), 193–209.
Matzkin, R.L., 1991. Semiparametric estimation of monotone and concave utility functions for polychotomous choice models. Econometrica 59 (5), 1315–1327.
Meyer, M.C., 2008. Inference using shape-restricted regression splines. Annals of Applied Statistics 2 (3), 1013–1033.
Meyer, M.C., Woodroofe, M., 2000. On the degrees of freedom in shape-restricted regression. Annals of Statistics 28 (4), 1083–1104.
Micchelli, C.A., Smith, P.W., Swetitis, J., Ward, J.D., 1985. Constrained Lp approximation. Constructive Approximation 1 (1), 93–102.
Myers, R., Bowen, K., Barrowman, N., 1999. Maximum reproductive rate of fish at low population sizes. Canadian Journal of Fisheries and Aquatic Sciences 56, 2404–2419.
Neelon, B., Dunson, D.B., 2004. Bayesian isotonic regression and trend analysis. Biometrics 60, 398–406.
R Development Core Team, 2009. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL: http://www.R-project.org.
Ramsay, J.O., 1988. Monotone regression splines in action (with discussion). Statistical Science 3 (4), 425–461.
Ramsay, J.O., 1998. Estimating smooth monotone functions. Journal of the Royal Statistical Society Series B 60 (2), 365–375.
Robert, C.P., 1995. Simulation of truncated normal random variables. Statistics and Computing 5, 121–125.
Rodriguez-Yam, G., Davis, R., Scharf, L., 2004. Efficient Gibbs sampling of truncated multivariate normal with application to constrained linear regression. Technical report, Department of Statistics, Colorado State University. URL: http://www.stat.colostate.edu/~rdavis/papers/CLR.pdf.
Ruppert, D., Carroll, R.J., 2000. Spatially-adaptive penalties for spline fitting. Australian & New Zealand Journal of Statistics 42 (2), 205–223.
Ruppert, D., Wand, M.P., Carroll, R.J., 2003. Semiparametric Regression. Cambridge University Press.
Schmidt, J.W., Scholz, I., 1990. A dual algorithm for convex–concave data smoothing by cubic C²-splines. Numerische Mathematik 57, 333–350.
Shively, T.S., Sager, T.W., Walker, S.G., 2009. A Bayesian approach to non-parametric monotone function estimation. Journal of the Royal Statistical Society Series B 71 (1), 159–175.
Silverman, B.W., 1985. Some aspects of the spline smoothing approach to non-parametric regression curve fitting (with discussion). Journal of the Royal Statistical Society, Series B 47 (1), 1–50.
Silverman, B.W., Wood, J.T., 1987. The nonparametric estimation of branching curves. Journal of the American Statistical Association 82, 551–558.
Tantiyaswasdikul, C., Woodroofe, M.B., 1994. Isotonic smoothing splines under sequential designs. Journal of Statistical Planning and Inference 38, 75–88.
Turlach, B.A., 2005. Shape constrained smoothing using smoothing splines. Computational Statistics 20 (1), 81–104. doi:10.1007/BF02736124.
Tutz, G., Leitenstorfer, F., 2007. Generalized smooth monotonic regression in additive modeling. Journal of Computational and Graphical Statistics 16 (1), 165–188.
Utreras, F.I., 1985. Smoothing noisy data under monotonicity constraints: existence, characterization and convergence rates. Numerische Mathematik 47, 611–625.
Villalobos, M., Wahba, G., 1987. Inequality-constrained multivariate smoothing splines with application to the estimation of posterior probabilities. Journal of the American Statistical Association 82 (397), 239–248.
Wang, X., 2008. Bayesian free-knot monotone cubic spline regression. Journal of Computational and Graphical Statistics 17 (2), 373–387.
Wang, X., Li, F., 2008. Isotonic smoothing spline regression. Journal of Computational and Graphical Statistics 17 (1), 21–37.
Wright, I., Wegman, E., 1980. Isotonic, convex and related splines. Annals of Statistics 8, 1023–1035.