An algorithm for quantile smoothing splines

Computational Statistics & Data Analysis 22 (1996) 99-118

Pin T. Ng
Department of Economics, University of Houston, Houston, TX 77204-5882, USA
Received March 1995; revised July 1995

Abstract

For $p = 1$ and $\infty$, Koenker, Ng and Portnoy (Statistical Data Analysis Based on the L1 Norm and Related Methods (North-Holland, New York, 1992); Biometrika, 81 (1994)) proposed the $\tau$th $L_p$ quantile smoothing spline, $\hat{g}_{\tau,L_p}$, defined to solve

$$\min_{g \in \mathcal{G}_p} \text{``fidelity''} + \lambda\,\text{``}L_p\text{ roughness''}$$

as a simple, nonparametric approach to estimating the $\tau$th conditional quantile functions given $0 < \tau < 1$. They defined "fidelity" $= \sum_{i=1}^{n} \rho_\tau(y_i - g(x_i))$ with $\rho_\tau(u) = (\tau - I(u < 0))u$, "$L_1$ roughness" $= \sum_{i=1}^{n-1} |g'(x_{i+1}) - g'(x_i)|$, "$L_\infty$ roughness" $= \max_x g''(x)$, $\lambda \geq 0$, and $\mathcal{G}_p$ to be an appropriately defined functional space. They showed $\hat{g}_{\tau,L_p}$ to be a linear spline for $p = 1$ and a parabolic spline for $p = \infty$, and suggested computation using conventional linear programming techniques. We describe a modification to the algorithm of Bartels and Conn (ACM Trans. Math. Software, 6 (1980)) for linearly constrained discrete $L_1$ problems and show how it can be utilized to compute the quantile smoothing splines. We also demonstrate how monotonicity and convexity constraints on the conditional quantile functions can be imposed easily. The parametric linear programming approach to computing all distinct $\tau$th quantile smoothing splines for a given penalty parameter $\lambda$, as well as all the quantile smoothing splines corresponding to all distinct $\lambda$ values for a given $\tau$, is also provided.

Keywords: Quantile; Splines; Robustness; Nonparametric regression; Monotone regression; Constrained optimization

1. Introduction

Let $f_{Y|X}(y|x)$ be the conditional probability density function of a continuous random variable $Y$ given $X = x$. The $\tau$th conditional quantile function, $g_\tau(x)$, is a


function of $x$ such that

$$\tau = \int_{-\infty}^{g_\tau(x)} f_{Y|X}(y|x)\, dy$$

for $0 < \tau < 1$. Koenker et al. [9, 10] introduce $L_1$ and $L_\infty$ quantile smoothing splines as nonparametric estimators for $g_\tau(x)$. Given $n$ pairs of observations $\{(x_i, y_i)\}_{i=1}^{n}$ with $a = x_0 < x_1 < \cdots < x_n < x_{n+1} = b$, some smooth function $g$ and the check function $\rho_\tau(u) = [\tau - I(u < 0)]u = [\tfrac{1}{2} + (\tau - \tfrac{1}{2})\operatorname{sgn}(u)]|u|$, we define "fidelity" to the data as

$$\text{``fidelity''} = \sum_{i=1}^{n} \rho_\tau(y_i - g(x_i)).$$
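As a concrete illustration, a minimal Python sketch of the check function and the fidelity term (an illustration only; the paper's own implementation is in FORTRAN):

import numpy as np

def rho_tau(u, tau):
    # Check function: rho_tau(u) = (tau - I(u < 0)) * u.
    u = np.asarray(u, dtype=float)
    return (tau - (u < 0)) * u

def fidelity(y, fitted, tau):
    # "fidelity" = sum of check-function losses over the sample.
    return rho_tau(np.asarray(y) - np.asarray(fitted), tau).sum()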

The $\tau$th $L_p$ quantile smoothing spline, $\hat{g}_{\tau,L_p}$, is the solution to

$$\min_{g \in \mathcal{G}_p} \text{``fidelity''} + \lambda\,\text{``}L_p\text{ roughness''} \tag{1}$$

where $\lambda \geq 0$ is the usual smoothing parameter controlling the trade-off between fidelity to the data and roughness of the fit. Koenker et al. [9, 10] suggest two versions of the "$L_p$ roughness" measure for the fit. The $L_1$ roughness is measured by the total variation of the first derivative,

$$\text{``}L_1\text{ roughness''} = V(g') = \sum_{i=1}^{n-1} |g'(x_{i+1}) - g'(x_i)| \tag{2}$$

for some function $g$ which belongs to the following expanded Sobolev space,

$$\mathcal{G}_1 = \mathcal{V}^{(2)} = \left\{ g : g(x) = a_0 + a_1 x + \int_0^1 (x - y)_+\, d\mu(y),\ V(\mu) < \infty,\ \mu = g'',\ a_i \in \mathbb{R},\ i = 0, 1 \right\},$$

introduced in Pinkus [12]. The $L_\infty$ roughness is measured in the usual $L_\infty$ norm of $g''$,

$$\text{``}L_\infty\text{ roughness''} = \|g''\|_\infty = \max_x g''(x) \tag{3}$$

in the following $L_\infty$ Sobolev space

$$\mathcal{W}_\infty^{(2)} = \left\{ g : g(x) = a_0 + a_1 x + \int_0^1 (x - y)_+\, g''(y)\, dy,\ g'' \in L_\infty,\ a_i \in \mathbb{R},\ i = 0, 1 \right\}.$$

The solution $\hat{g}_{\tau,L_1}$ is a linear smoothing spline whose second derivative does not exist everywhere (see [9, Theorem 1] for detail). As a result, "$L_1$ roughness" is measured by the total variation of the first derivative of functions in the expanded Sobolev space $\mathcal{V}^{(2)}$ instead of the usual $L_1$ norm of the second derivative, $\int |g''(x)|\, dx$, of functions in the usual Sobolev space $\mathcal{W}^{(2)}$. The median smoothing splines ($\tau = 0.5$) are nonparametric estimators of the conditional median function and provide robust alternatives to least squares estimates of the conditional mean function as estimators of central tendency.

[Fig. 1: scatterplot of fuel consumption against engine output (horsepower), with the unconstrained $L_1$ (solid) and $L_\infty$ (dotted) median smoothing splines and the $L_2$ cubic smoothing spline.]

Fig. 1 presents the $L_1$ and $L_\infty$ median smoothing splines' estimates of the conditional median, along with Reinsch's [14] $L_2$ cubic smoothing spline estimate of the conditional mean, for the "cars" data set analyzed in Hawkins [6]. The sample has 392 observations on fuel consumption (response) and power output (covariate), both available in the "cars" data set provided in the 1983 ASA Data Exposition. The degree of smoothness determined by $\lambda$ was chosen using the Schwarz information criterion (SIC) suggested by Koenker et al. [9] for the $L_1$ and $L_\infty$ quantile smoothing splines, while the smoothing parameter for the cubic smoothing spline was selected using Craven and Wahba's [5] generalized cross-validation. The conditional mean estimate is sensitive to outliers in fuel consumption, labeled as "o" in Fig. 1, while neither conditional median estimate is.

Additional constraints on the fitted function in the form of monotonicity, concavity or convexity are sometimes desirable. Examples include estimation of per unit cost functions and efficient production frontiers in economics, where the estimated functions are sometimes required to be convex and concave, respectively. Growth curves, demand functions, supply functions and total cost functions are examples where monotonicity constraints on the conditional quantile estimates may be needed. These additional constraints are easily incorporated into the quantile smoothing splines, as will be shown in Section 3. Fig. 2 presents the constrained versions of the $L_1$ and $L_\infty$ median smoothing splines. Here, the median smoothing splines are forced to be monotonically decreasing, as noted by Hawkins [6, p. 241]: "Clearly, the fuel consumption should increase with the engine's power output, but the relationship is likely to be nonlinear". Again the smoothing parameters are chosen by the SIC. Comparing Fig. 1 to Fig. 2, the constrained median smoothing splines agree closely with the unconstrained versions except in the extreme left and right regions.


[Fig. 2: Constrained $L_1$ (solid) and $L_\infty$ (dotted) median smoothing splines, $\hat{g}_{.5,L_1}$ and $\hat{g}_{.5,L_\infty}$; fuel consumption against engine output (horsepower).]

There are many situations where aspects of the conditional distribution other than the central tendency are also of substantial interest. Some examples include seasonal patterns of extreme temperatures, water levels or pollution readings. Figs. 3 and 4 display, respectively, the $L_1$ and $L_\infty$ quantile smoothing splines' estimates of the $\tau = (0.10, 0.25, 0.50, 0.75, 0.90)$ conditional quantile functions of the "cars" data set. The monotonically constrained quantile smoothing splines for $\tau = 0.1, 0.25$ in Figs. 3 and 4 are probably somewhat undersmoothed, reflecting the fact that SIC is not very good at selecting the smoothing parameters for extreme quantiles, as noted in Koenker et al. [9]. Currently, we are working on other information criteria for choosing the smoothing parameters of extreme quantiles, which we will report in a future paper. Nevertheless, heteroskedasticity is quite apparent in the data: dispersion in fuel efficiency is greater for small-engine cars than for those with larger engines. In fact, the entire family of distinct quantile function estimates for a chosen $\lambda$, as well as all the quantile function estimates corresponding to all the distinct $\lambda$'s for a fixed $\tau$, can be obtained easily via parametric programming techniques, as illustrated in Section 5.

Bartels and Conn [2] [BC] provide an efficient algorithm for solving a linearly constrained discrete $L_1$ problem. Their algorithm is well suited, with some modifications, for computing the $L_1$ and $L_\infty$ quantile smoothing splines. We provide details of the modifications in Section 4. We first develop some theory of quantile smoothing splines in Section 2. Section 3 shows how monotonicity and convexity/concavity constraints can be incorporated into the objective functions. Implementation of the algorithm is detailed in Section 4. Section 5 demonstrates how parametric linear programming techniques can be used to obtain the entire path of quantile smoothing splines corresponding to different smoothing parameters for a given $\tau$, as well as the entire spectrum of quantile smoothing splines corresponding to all the distinct $\tau$'s given a fixed $\lambda$. FORTRAN algorithms and an S(plus) interface are available from the author by sending E-mail to [email protected].

[Fig. 3: $\tau = (0.10, 0.25, 0.50, 0.75, 0.90)$ constrained $L_1$ quantile smoothing splines; fuel consumption against engine output (horsepower).]

[Fig. 4: $\tau = (0.10, 0.25, 0.50, 0.75, 0.90)$ constrained $L_\infty$ quantile smoothing splines; fuel consumption against engine output (horsepower).]

2. Theory

2.1. The $L_1$ penalty linear quantile smoothing splines

Koenker et al. [9] show the existence of a natural linear smoothing spline of the form

$$\hat{g}_{\tau,L_1}(x) = \begin{cases} \alpha_i + \beta_i (x - x_i) & \text{for } x \in [x_i, x_{i+1}),\ i = 1, \ldots, n \\ \alpha_0 + \beta_0 (x - x_1) & \text{for } x \in [x_0, x_1) \end{cases}$$

solving (1) with $p = 1$.


Let $h_i = x_{i+1} - x_i$, $i = 1, \ldots, n - 1$. From the continuity constraints of the linear spline, we have $\alpha_0 = \alpha_1$ plus

$$\hat{g}_{\tau,L_1}(x_{i+1}) = \alpha_i + \beta_i h_i = \alpha_{i+1}, \quad i = 1, \ldots, n - 1,$$

which yields

$$\beta_i = (\alpha_{i+1} - \alpha_i)/h_i, \quad i = 1, \ldots, n - 1. \tag{4}$$

In addition, we have $\beta_0 = \beta_1$ and $\beta_{n-1} = \beta_n$; otherwise the "roughness" measured by (2) could be reduced without affecting the "fidelity". The "$L_1$ roughness" penalty can thus be expressed as

$$V(g') = \sum_{i=1}^{n-2} |\beta_{i+1} - \beta_i| = \sum_{i=1}^{n-2} \left| (\alpha_{i+2} - \alpha_{i+1})/h_{i+1} - (\alpha_{i+1} - \alpha_i)/h_i \right|. \tag{5}$$

We have $2(n + 1)$ parameters and $n + 2$ constraints. Write the $n$ free parameters as $\theta = (\alpha_1, \ldots, \alpha_n)'$ and let $\beta = (\beta_1, \ldots, \beta_{n-1})'$. We may rewrite the objective function as

$$\min_{\theta \in \mathbb{R}^n} \sum_{i=1}^{n} \left[\tfrac{1}{2} + (\tau - \tfrac{1}{2})\operatorname{sgn}(\tilde{y}_i - \tilde{x}_i'\theta)\right] |\tilde{y}_i - \tilde{x}_i'\theta| + \lambda \sum_{j=n+1}^{2n-2} |\tilde{y}_j - \tilde{x}_j'\theta|, \tag{6}$$

where $\tilde{y}_i$ and $\tilde{y}_j$ are the $i$th and $j$th elements of the pseudo response vector $\tilde{y}' = (y_1, \ldots, y_n, 0'_{n-2})$, respectively, and $\tilde{x}_i$ and $\tilde{x}_j$ are the $i$th and $j$th rows of the $(2n - 2) \times n$ pseudo design matrix

$$\tilde{X} = \begin{bmatrix} I_n \\ E \end{bmatrix}.$$

In the pseudo design matrix, $I_n$ is an $n \times n$ identity matrix and

$$E = \begin{bmatrix} \frac{1}{h_1} & -\left(\frac{1}{h_1} + \frac{1}{h_2}\right) & \frac{1}{h_2} & 0 & \cdots & 0 \\ 0 & \frac{1}{h_2} & -\left(\frac{1}{h_2} + \frac{1}{h_3}\right) & \frac{1}{h_3} & \cdots & 0 \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \frac{1}{h_{n-2}} & -\left(\frac{1}{h_{n-2}} + \frac{1}{h_{n-1}}\right) & \frac{1}{h_{n-1}} \end{bmatrix}$$

is an $(n - 2) \times n$ matrix.
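As a minimal Python sketch (illustrative, following the notation above rather than the paper's code), $E$ can be assembled directly from the knot spacings:

import numpy as np

def make_E(x):
    # Second-difference matrix of eq. (5): row i carries
    # 1/h_i, -(1/h_i + 1/h_{i+1}), 1/h_{i+1} in columns i, i+1, i+2.
    x = np.asarray(x, dtype=float)
    h = np.diff(x)
    n = len(x)
    E = np.zeros((n - 2, n))
    for i in range(n - 2):
        E[i, i] = 1.0 / h[i]
        E[i, i + 1] = -(1.0 / h[i] + 1.0 / h[i + 1])
        E[i, i + 2] = 1.0 / h[i + 1]
    return E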

Also write the $(2n - 2)$-vector of weights as

$$w = \left( \tfrac{1}{2} + (\tau - \tfrac{1}{2})\operatorname{sgn}(\tilde{y}_1 - \tilde{x}_1'\theta),\ \ldots,\ \tfrac{1}{2} + (\tau - \tfrac{1}{2})\operatorname{sgn}(\tilde{y}_n - \tilde{x}_n'\theta),\ \lambda \mathbf{1}' \right)',$$

where $\mathbf{1}$ is an $(n - 2)$-vector of 1's. We then may formulate (6) as the following linear program (see e.g. [1, 7, 15]):

$$\min_{\theta \in \mathbb{R}^n}\ w'(u + v) \quad \text{s.t.} \quad \tilde{y} - \tilde{X}\theta = u - v, \quad u \in \mathbb{R}_+^{2n-2},\ v \in \mathbb{R}_+^{2n-2}.$$
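For illustration, this linear program can be handed to an off-the-shelf solver. The following Python sketch (a stand-in using scipy.optimize.linprog and the make_E helper above, not the modified Bartels-Conn algorithm of Section 4) uses the equivalent fixed weights $\tau$ on $u$ and $1 - \tau$ on $v$ for the fidelity rows:

import numpy as np
from scipy.optimize import linprog

def l1_quantile_spline(x, y, tau, lam):
    # Solve eq. (6): variables are theta (n, free), u and v (2n-2 each, >= 0),
    # with the constraint Xtilde theta + u - v = ytilde.
    n = len(x)
    Xt = np.vstack([np.eye(n), make_E(x)])            # pseudo design [I_n; E]
    yt = np.concatenate([np.asarray(y, float), np.zeros(n - 2)])
    m = 2 * n - 2
    wu = np.r_[tau * np.ones(n), lam * np.ones(n - 2)]
    wv = np.r_[(1 - tau) * np.ones(n), lam * np.ones(n - 2)]
    c = np.concatenate([np.zeros(n), wu, wv])
    A_eq = np.hstack([Xt, np.eye(m), -np.eye(m)])
    bounds = [(None, None)] * n + [(0, None)] * (2 * m)
    res = linprog(c, A_eq=A_eq, b_eq=yt, bounds=bounds, method="highs")
    return res.x[:n]                                   # alpha_1, ..., alpha_n

The returned $\alpha_i$ are the fitted values at the knots; the spline interpolates them linearly.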

2.2. The $L_\infty$ penalty quadratic quantile smoothing splines

The solution to (1) with (3) is a quadratic smoothing spline of the form

$$\hat{g}_{\tau,L_\infty}(x) = \begin{cases} \alpha_i + \beta_i (x - x_i) + \gamma_i (x - x_i)^2 & \text{for } x \in [x_i, x_{i+1}),\ i = 1, \ldots, n - 1 \\ \alpha_0 + \beta_0 (x - x_1) + \gamma_0 (x - x_1)^2 & \text{for } x \in [x_0, x_1) \end{cases}$$

as suggested in Koenker et al. [9]. Introducing a new parameter, $\sigma > 0$, we can write the objective function as

$$\min_{g \in \mathcal{G}} \sum_{i=1}^{n} \rho_\tau\{y_i - g(x_i)\} + \lambda\sigma \quad \text{subject to} \quad -\sigma \leq g''(x_i) \leq \sigma, \quad i = 1, \ldots, n - 1. \tag{7}$$

From the continuity constraints of the quadratic smoothing spline, we have

$$\hat{g}_{\tau,L_\infty}(x_{i+1}) = \gamma_i h_i^2 + \beta_i h_i + \alpha_i = \alpha_{i+1}, \quad i = 1, \ldots, n - 1,$$
$$\hat{g}'_{\tau,L_\infty}(x_{i+1}) = 2\gamma_i h_i + \beta_i = \beta_{i+1}, \quad i = 1, \ldots, n - 1. \tag{8}$$

Eliminating $\beta_i$, we have

$$\gamma_i h_i + \gamma_{i+1} h_{i+1} = \left[(\alpha_{i+2} - \alpha_{i+1})/h_{i+1}\right] - \left[(\alpha_{i+1} - \alpha_i)/h_i\right], \quad i = 1, \ldots, n - 2.$$

In matrix form, this can be expressed as

$$R\gamma = Q\tilde{\theta},$$

where $\tilde{\theta} = (\gamma_1, \alpha_1, \ldots, \alpha_n)'$, $\gamma = (\gamma_1, \ldots, \gamma_{n-1})'$,

$$R = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ h_1 & h_2 & 0 & \cdots & 0 \\ 0 & h_2 & h_3 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & h_{n-2} & h_{n-1} \end{bmatrix}$$

is an $(n - 1) \times (n - 1)$ matrix and

$$Q = \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & \frac{1}{h_1} & -\left(\frac{1}{h_1} + \frac{1}{h_2}\right) & \frac{1}{h_2} & \cdots & 0 \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \frac{1}{h_{n-2}} & -\left(\frac{1}{h_{n-2}} + \frac{1}{h_{n-1}}\right) & \frac{1}{h_{n-1}} \end{bmatrix}$$

is an $(n - 1) \times (n + 1)$ matrix. This yields

$$\gamma = R^{-1} Q \tilde{\theta} = M\tilde{\theta}.$$

Including the exterior segments, $[x_0, x_1)$ and $[x_n, x_{n+1}]$, we have $3(n + 1)$ parameters and $2(n - 1)$ linear continuity constraints, plus the four conditions that $\gamma_0 = \gamma_n = 0$, $\beta_0 = \beta_1$ and $\alpha_0 = \alpha_1$. This leaves us with $(n + 1)$ free parameters. Thus, reparameterizing $\hat{g}_{\tau,L_\infty}$ by the $(n + 2)$-vector $\theta' = (\tilde{\theta}', \sigma)$, we can rewrite (7) as

$$\min_{\theta \in \mathbb{R}^{n+2}} \sum_{i=1}^{n} \left[\tfrac{1}{2} + (\tau - \tfrac{1}{2})\operatorname{sgn}(\tilde{y}_i - \tilde{x}_i'\theta)\right] |\tilde{y}_i - \tilde{x}_i'\theta| + \lambda |\tilde{y}_{n+1} - \tilde{x}_{n+1}'\theta| \tag{9}$$

subject to

$$\begin{bmatrix} M & \mathbf{1} \\ -M & \mathbf{1} \end{bmatrix} \theta \geq 0,$$

where $\tilde{y}_i$ is the $i$th element of $\tilde{y}' = (y_1, \ldots, y_n, 0)$, and $\tilde{x}_i$ is the $i$th row of the $(n + 1) \times (n + 2)$ matrix

$$\tilde{X} = \begin{bmatrix} 0 & I_n & 0 \\ 0 & 0' & 1 \end{bmatrix}.$$

The equivalent linear programming formulation of (9) is

$$\min_{\theta \in \mathbb{R}^{n+2}}\ w'(u + v) \quad \text{s.t.} \quad \tilde{y} - \tilde{X}\theta = u - v, \quad \begin{bmatrix} M & \mathbf{1} \\ -M & \mathbf{1} \end{bmatrix} \theta \geq 0, \quad u \in \mathbb{R}_+^{n+1},\ v \in \mathbb{R}_+^{n+1},$$

where the $(n + 1)$-vector of weights is

$$w = \left( \tfrac{1}{2} + (\tau - \tfrac{1}{2})\operatorname{sgn}(\tilde{y}_1 - \tilde{x}_1'\theta),\ \ldots,\ \tfrac{1}{2} + (\tau - \tfrac{1}{2})\operatorname{sgn}(\tilde{y}_n - \tilde{x}_n'\theta),\ \lambda \right)'.$$
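A corresponding Python sketch of (9) (again an illustration with scipy.optimize.linprog rather than the algorithm of Section 4; the layout of $R$, $Q$ and $\tilde{X}$ follows the reconstruction above):

import numpy as np
from scipy.optimize import linprog

def linf_quantile_spline(x, y, tau, lam):
    # theta = (gamma_1, alpha_1..alpha_n, sigma); u, v each of length n.
    n = len(x)
    h = np.diff(x)
    R = np.zeros((n - 1, n - 1))
    Q = np.zeros((n - 1, n + 1))
    R[0, 0] = 1.0
    Q[0, 0] = 1.0                       # first row pins gamma_1
    for i in range(1, n - 1):           # continuity relations (8)
        R[i, i - 1], R[i, i] = h[i - 1], h[i]
        Q[i, i] = 1.0 / h[i - 1]
        Q[i, i + 1] = -(1.0 / h[i - 1] + 1.0 / h[i])
        Q[i, i + 2] = 1.0 / h[i]
    M = np.linalg.solve(R, Q)           # gamma = M theta_tilde
    nv = (n + 2) + 2 * n
    c = np.zeros(nv)
    c[n + 1] = lam                      # lambda * sigma
    c[n + 2:2 * n + 2] = tau            # u weights
    c[2 * n + 2:] = 1 - tau             # v weights
    A_eq = np.zeros((n, nv))            # alpha_i + u_i - v_i = y_i
    A_eq[:, 1:n + 1] = np.eye(n)
    A_eq[:, n + 2:2 * n + 2] = np.eye(n)
    A_eq[:, 2 * n + 2:] = -np.eye(n)
    A_ub = np.zeros((2 * (n - 1), nv))  # |gamma_i| <= sigma
    A_ub[:n - 1, :n + 1] = -M
    A_ub[n - 1:, :n + 1] = M
    A_ub[:, n + 1] = -1.0
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (1 + 2 * n)
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * (n - 1)),
                  A_eq=A_eq, b_eq=np.asarray(y, float), bounds=bounds,
                  method="highs")
    return res.x[1:n + 1]               # fitted values alpha_1..alpha_n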


3. Monotonicity and convexity constraints

There are situations under which further monotonicity and/or convexity constraints must be imposed on the estimated conditional quantile functions. Due to the linear programming nature of the quantile smoothing splines, these additional constraints can be incorporated easily into the linear programming formulation.

3.1. Monotonicity

The $L_1$ smoothing spline and the $L_\infty$ smoothing spline are monotone if the slopes of all the segments have the same sign. From now on we will concentrate only on constraining the fitted function to be nondecreasing. For the $L_1$ quantile smoothing splines, the constraints on the slopes $\beta = (\beta_1, \ldots, \beta_{n-1})'$ can be expressed in terms of $\theta' = (\alpha_1, \ldots, \alpha_n)$, using (4), as

$$\beta = F\theta \geq 0, \tag{10}$$

where

$$F = \begin{bmatrix} -\frac{1}{h_1} & \frac{1}{h_1} & 0 & \cdots & 0 \\ 0 & -\frac{1}{h_2} & \frac{1}{h_2} & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -\frac{1}{h_{n-1}} & \frac{1}{h_{n-1}} \end{bmatrix}$$

is an $(n - 1) \times n$ matrix. Now we must minimize the objective function (6) subject to (10).

Because $\theta' = (\gamma_1, \alpha_1, \ldots, \alpha_n, \sigma)$ for the $L_\infty$ monotonic quantile smoothing splines, we can express the constraints, using (8), as

$$\beta = [0\ F\ 0]\theta - J\gamma = \left( [0\ F\ 0] - J[M\ 0] \right)\theta \geq 0, \tag{11}$$

where $J = \operatorname{diag}(h_i)$ is an $(n - 1) \times (n - 1)$ diagonal matrix. The monotonically increasing $L_\infty$ quantile smoothing spline becomes the solution to (9) subject to the additional inequality constraints in (11).
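A Python sketch of (10) (illustrative; scipy's convention is A_ub z <= b_ub, so $F\theta \geq 0$ enters as $-F\theta \leq 0$):

import numpy as np

def make_F(x):
    # Slope matrix of eq. (10): beta_i = (alpha_{i+1} - alpha_i)/h_i.
    x = np.asarray(x, dtype=float)
    h = np.diff(x)
    n = len(x)
    F = np.zeros((n - 1, n))
    for i in range(n - 1):
        F[i, i] = -1.0 / h[i]
        F[i, i + 1] = 1.0 / h[i]
    return F

In the L1 sketch of Section 2.1, one would pass A_ub = -F (padded with zero columns for u and v) and b_ub = 0 to linprog to force a nondecreasing fit.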

3.2. Convexity and concavity

Convexity for the linear $L_1$ quantile smoothing splines requires

$$\hat{g}'_{\tau,L_1}(x_{i+1}) - \hat{g}'_{\tau,L_1}(x_i) = \beta_{i+1} - \beta_i \geq 0, \quad i = 1, \ldots, n - 1.$$

We can see from (5) that this becomes

$$E\theta \geq 0.$$


For the $L_\infty$ quantile smoothing splines, the convexity constraint requires $\hat{g}''_{\tau,L_\infty}(x) \geq 0$, which translates to $\gamma_i \geq 0$, $i = 1, \ldots, n - 1$. In matrix form this becomes

$$\gamma = M\tilde{\theta} = [M\ 0]\theta \geq 0.$$

Again we simply need to solve the corresponding linear program subject to the additional set of inequality constraints. Similarly, we can impose concavity constraints by multiplying the constraining matrices for convexity by minus one.
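Wiring the convexity constraint into the earlier L1 sketch is one extra inequality block (illustrative, reusing make_E from Section 2.1):

# Convexity, E theta >= 0, in linprog form -E theta <= 0; the zero block
# covers the u and v columns of the LP in l1_quantile_spline.
E = make_E(x)
A_ub_extra = np.hstack([-E, np.zeros((E.shape[0], 2 * (2 * len(x) - 2)))])
b_ub_extra = np.zeros(E.shape[0])

Concavity flips the sign of E.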

4. Implementation

Bartels and Conn [2] [BC] provide an efficient nonsimplex active-set algorithm for the following linearly constrained discrete $L_1$ problem:

$$\min_{x \in \mathbb{R}^n} \sum_j |a_j'x - b_j| \quad \text{subject to} \quad G'x = h, \quad C'x \geq d, \tag{12}$$

where $a_j \in \mathbb{R}^n$, $b_j$ is a scalar, $G$, $C$ are matrices with columns $g_k, c_l \in \mathbb{R}^n$, respectively, and $h$, $d$ are vectors with elements $\eta_k$, $\delta_l$, respectively. To incorporate our objective functions (6) and (9), we extend (12) to the more general setting of

$$\min \sum_i \left[\tfrac{1}{2} + (\tfrac{1}{2} - \tau)\operatorname{sgn}(s_i'x - t_i)\right] |s_i'x - t_i| + \sum_j |a_j'x - b_j| \quad \text{subject to} \quad G'x = h, \quad C'x \geq d \tag{13}$$

with $s_i \in \mathbb{R}^n$ and $t_i$ being a scalar.

Unfortunately, BC use a notation quite different from that in the regression literature and in Sections 2 and 3. We choose to retain as much of BC's notation as possible in this section, to allow easier cross reference to BC, and Bartels et al. [3], who provide more details. As a result, a few notes are required to help bridge the gap. First, the residuals $\tilde{y}_i - \tilde{x}_i'\theta$ and $\tilde{y}_j - \tilde{x}_j'\theta$ in (6) and (9) are negatives of the residuals $s_i'x - t_i$ and $a_j'x - b_j$ in (13), respectively. The weights associated with the fidelity portion of (13), therefore, are adjusted accordingly to reflect this fact. Second, the parameter vector $\theta$ in (6) and (9) corresponds to the vector $x$ in (13).

Following BC we rewrite our objective function (13) as minimizing the following unconstrained problem:

$$\psi(x) = \gamma \sum_i \left[(1 - 2\tau) + \operatorname{sgn}(s_i'x - t_i)\right](s_i'x - t_i) + \gamma \sum_j \operatorname{sgn}(a_j'x - b_j)(a_j'x - b_j) + \sum_k \operatorname{sgn}(g_k'x - \eta_k)(g_k'x - \eta_k) - \sum_l \min(0, c_l'x - \delta_l), \tag{14}$$


where $\gamma$ is the Lagrange multiplier. Notice we have also rescaled the weights in the fidelity by a factor of two for programming convenience. Our problem (13), or equivalently (14), encompasses BC's original problem in (12). Let $S, A, G, C$ be matrices with columns $s_i, a_j, g_k, c_l \in \mathbb{R}^n$, respectively. Further, let $t, b, h, d$ be vectors with components $t_i, b_j, \eta_k, \delta_l$, respectively, whose dimensions equal the column dimensions of $S, A, G, C$, respectively. Letting

$$E = [e_1, \ldots, e_m] = [\gamma S \mid \gamma A \mid G \mid C], \qquad f = [\zeta_1, \ldots, \zeta_m]' = [\gamma t' \mid \gamma b' \mid h' \mid d']',$$

we may rewrite (14) more compactly as

$$\psi(x) = \sum_\mu \bar{\rho}_\tau(e_\mu'x - \zeta_\mu) + \sum_\omega |e_\omega'x - \zeta_\omega| - \sum_\nu \min(0, e_\nu'x - \zeta_\nu), \tag{15}$$

where $\bar{\rho}_\tau(u) = [(1 - 2\tau) + \operatorname{sgn}(u)]u$ and where $\mu$, $\omega$, and $\nu$ are the indices for $[\gamma S]$, $[\gamma A \mid G]$ and $C$, respectively. Letting $\mathcal{E}(x) = \mathcal{E} = \{\varphi \mid e_\varphi'x = \zeta_\varphi\}$ be the index set of the active columns in $E$ associated with the zero residuals, (15) can be written as

$$\psi(x) = \sum_{\mu \notin \mathcal{E}} \bar{\rho}_\tau(e_\mu'x - \zeta_\mu) + \sum_{\omega \notin \mathcal{E}} |e_\omega'x - \zeta_\omega| - \sum_{\nu \notin \mathcal{E}} \min(0, e_\nu'x - \zeta_\nu). \tag{16}$$

We also denote $N$ as the submatrix of $E$ formed from the active columns (arranged in any convenient order) indexed by $\mathcal{E}$. With (16), the following optimization algorithm corresponds to that of BC's:

{select any x in R^n}; {select any gamma > 0};
repeat
    {reduce gamma};
    repeat
        {update}; {find p};
        if not minimal then begin
            {find alpha};
            x := x + alpha*p
        end
    until minimal
until {feasible or gamma too small};

In the following sections, we describe only those portions of the modules in the above algorithm that differ from those in BC.


4.1. find p

To find the descent direction, $p$, express the change in $\psi(x)$ as

$$\Delta\psi(x; \alpha p) = \psi(x + \alpha p) - \psi(x) = \alpha g'p + \alpha \left\{ \sum_{\mu \in \mathcal{E}} \hat{\sigma}_\mu e_\mu'p + \sum_{\omega \in \mathcal{E}} \hat{\sigma}_\omega e_\omega'p + \sum_{\nu \in \mathcal{E}} \hat{\sigma}_\nu e_\nu'p \right\} \tag{17}$$

for small $\alpha > 0$, where the restricted gradient is given by

$$g = \sum_{\mu \notin \mathcal{E}} \sigma_\mu e_\mu + \sum_{\omega \notin \mathcal{E}} \sigma_\omega e_\omega + \sum_{\nu \notin \mathcal{E}} \sigma_\nu e_\nu \tag{18}$$

with

$$\sigma_\mu = \begin{cases} 2 - 2\tau & \text{if } e_\mu'x - \zeta_\mu > 0 \\ 1 - 2\tau & \text{if } e_\mu'x - \zeta_\mu = 0 \\ -2\tau & \text{if } e_\mu'x - \zeta_\mu < 0, \end{cases} \qquad \sigma_\omega = \begin{cases} 1 & \text{if } e_\omega'x - \zeta_\omega > 0 \\ 0 & \text{if } e_\omega'x - \zeta_\omega = 0 \\ -1 & \text{if } e_\omega'x - \zeta_\omega < 0, \end{cases} \qquad \sigma_\nu = \begin{cases} 0 & \text{if } e_\nu'x - \zeta_\nu \geq 0 \\ -1 & \text{if } e_\nu'x - \zeta_\nu < 0, \end{cases}$$

and

$$\hat{\sigma}_\mu = \begin{cases} 2 - 2\tau & \text{if } e_\mu'p > 0 \\ 1 - 2\tau & \text{if } e_\mu'p = 0 \\ -2\tau & \text{if } e_\mu'p < 0, \end{cases} \tag{19}$$

$$\hat{\sigma}_\omega = \begin{cases} 1 & \text{if } e_\omega'p > 0 \\ 0 & \text{if } e_\omega'p = 0 \\ -1 & \text{if } e_\omega'p < 0, \end{cases} \tag{20}$$

$$\hat{\sigma}_\nu = \begin{cases} 0 & \text{if } e_\nu'p \geq 0 \\ -1 & \text{if } e_\nu'p < 0. \end{cases} \tag{21}$$

As in BC, two cases are possible.
Case 1: The orthogonal projection of $g$ onto the null space of the submatrix $N'$ is not zero.
In this case, we let

$$p = -Z_2 D_2 Z_2' g,$$

where $Z_2$ and $D_2$ are the submatrices in the orthogonal-triangular factorization of $N$. As a result, $g'p < 0$ and so $\Delta\psi < 0$ for small enough $\alpha \geq 0$.

Case 2: The orthogonal projection of $g$ onto the null space of $N'$ is zero. In this case, $g$ is in the range of $N$ and hence $g = Nw$, where the vector $w$ is $w = R^{-1}Z_1'g$. The change in $\psi(x)$ can then be expressed as

$$\psi(x + \alpha p) - \psi(x) = \alpha \left[ \sum_{\mu \in \mathcal{E}} (\hat{\sigma}_\mu + w_\mu)\hat{d}_\mu |e_\mu'p| + \sum_{\omega \in \mathcal{E}} (\hat{\sigma}_\omega + w_\omega)\hat{d}_\omega |e_\omega'p| + \sum_{\nu \in \mathcal{E}} (\hat{\sigma}_\nu + w_\nu)(e_\nu'p) \right],$$

where

$$\hat{d}_\varphi = \begin{cases} 1 & \text{if } e_\varphi'p > 0 \\ 0 & \text{if } e_\varphi'p = 0 \\ -1 & \text{if } e_\varphi'p < 0. \end{cases}$$

The change in $\psi(x)$ will be nonnegative for all $\alpha > 0$ and any $p$ if the following conditions hold:

$$2\tau - 2 \leq w_\mu \leq 2\tau \quad \forall \mu \in \mathcal{E}, \qquad -1 \leq w_\omega \leq 1 \quad \forall \omega \in \mathcal{E}, \qquad 0 \leq w_\nu \leq 1 \quad \forall \nu \in \mathcal{E}. \tag{22}$$

The descent direction $p$ is then given by

$$p = Z_1 (R^{-1})' v,$$

where $v$ is a vector whose components are zeros except

$$v_{\varphi_0} = -\operatorname{sgn}(w_{\varphi_0}) \tag{23}$$
112

P.T. No~Computational Stat&tics & Data Analysis 22 (1996) 99-118

for some selected index $\varphi_0 \in \mathcal{E}$ whose element $w_{\varphi_0}$ of $w$ violates the corresponding condition in (22).
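To make the bookkeeping of (15)-(21) concrete, a small Python sketch (an illustration under the index conventions above, not the author's FORTRAN code) that evaluates the objective and the restricted gradient:

import numpy as np

def psi(x, E, f, tau, mu_idx, omega_idx, nu_idx):
    # Objective of eq. (15); gamma is assumed already folded into E and f.
    r = E.T @ x - f
    fid = (((1 - 2 * tau) + np.sign(r[mu_idx])) * r[mu_idx]).sum()
    pen = np.abs(r[omega_idx]).sum()
    con = np.minimum(0.0, r[nu_idx]).sum()
    return fid + pen - con

def restricted_gradient(x, E, f, tau, mu_idx, omega_idx, nu_idx, active):
    # Eq. (18): sum over inactive columns only; active is a set of indices.
    r = E.T @ x - f
    g = np.zeros_like(x)
    for i in mu_idx:
        if i not in active:
            g += ((1 - 2 * tau) + np.sign(r[i])) * E[:, i]   # sigma_mu
    for i in omega_idx:
        if i not in active:
            g += np.sign(r[i]) * E[:, i]                     # sigma_omega
    for i in nu_idx:
        if i not in active and r[i] < 0:
            g -= E[:, i]                                     # sigma_nu = -1
    return g

The degeneracy refinement of Section 4.4, randomizing the sign contribution of extra zero-residual columns over (-1, 0, 1), is omitted here.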


4.2. find alpha

Corresponding to BC, let $\mathcal{A}(x) = \mathcal{A} = \{\alpha^{(\pi)}\}_{\pi=1}^{s}$, where, for $\varphi \notin \mathcal{E}$, $\alpha^{(\pi)} = (\zeta^{(\pi)} - e^{(\pi)\prime}x)/(e^{(\pi)\prime}p) \neq 0$ and $e^{(\pi)}, \zeta^{(\pi)}$ are the elements of $E, f$ that correspond to $\alpha^{(\pi)}$. Also assume $\alpha^{(\pi)}$ is indexed so that $\alpha^{(1)} \leq \alpha^{(2)} \leq \cdots \leq \alpha^{(s)}$. With the above choice of $p$ in {find p} along with $\alpha^{(1)} = \min \mathcal{A}$, we can write

$$\psi(x + \alpha p) = \psi(x) + \alpha p' g^{(0)} \tag{24}$$

for $0 \leq \alpha < \alpha^{(1)}$. The gradient $g^{(0)}$ is given by

$$g^{(0)} = \begin{cases} g & \text{if Case 1} \\ g + \sum_{\mu \in \mathcal{E}} \hat{\sigma}_\mu e_\mu + \sum_{\omega \in \mathcal{E}} \hat{\sigma}_\omega e_\omega + \sum_{\nu \in \mathcal{E}} \hat{\sigma}_\nu e_\nu & \text{if Case 2} \end{cases} \tag{25}$$

and $g$ is the original restricted gradient in (18). When $\alpha$ increases past $\alpha^{(1)}$ and falls in the range $\alpha^{(1)} < \alpha < \alpha^{(2)}$, we have

$$\psi(x + \alpha p) = \psi(x + \alpha^{(1)}p) + (\alpha - \alpha^{(1)})p' g^{(1)},$$

where

$$g^{(1)} = g^{(0)} - \begin{cases} 2\sigma^{(1)} e^{(1)} & \text{if } e^{(1)} \text{ is a column of } [\gamma S \mid \gamma A \mid G] \\ (1 + 2\sigma^{(1)}) e^{(1)} & \text{if } e^{(1)} \text{ is a column of } [C] \end{cases}$$

to account for the change in sign of the residual, $e^{(1)\prime}(x + \alpha p) - \zeta^{(1)}$. In general, when $\alpha$ increases past $\alpha^{(\pi)}$ but is below $\alpha^{(\pi+1)}$, we have

$$\psi(x + \alpha p) = \psi(x + \alpha^{(\pi)}p) + (\alpha - \alpha^{(\pi)})p' g^{(\pi)}$$

and the necessary change to $g^{(\pi)}$ is

$$g^{(\pi)} = g^{(\pi-1)} - \begin{cases} 2\sigma^{(\pi)} e^{(\pi)} & \text{if } e^{(\pi)} \text{ is a column of } [\gamma S \mid \gamma A \mid G] \\ (1 + 2\sigma^{(\pi)}) e^{(\pi)} & \text{if } e^{(\pi)} \text{ is a column of } [C] \end{cases} \tag{26}$$

in which $\sigma^{(\pi)}$ is the corresponding element of $\sigma_\mu$, $\sigma_\omega$ or $\sigma_\nu$ for $\pi = 1, \ldots, s$. So long as $p' g^{(\pi)} < 0$, $\psi(x + \alpha p)$ will decrease as the index $\pi$ increases from 1 toward $s$ and the value of $\alpha$ increases past $\alpha^{(\pi)}$ consecutively, until a value of $\pi$ is reached such that $p' g^{(\pi)} \geq 0$. The module of {find alpha} then is

{adjust g as necessary to obtain g^(0) using (25)};
pi := 0;
repeat
    pi := pi + 1;
    {obtain g^(pi) from g^(pi-1) by (26)}
until p' g^(pi) >= 0.

4.3. update

As in BC, the updating activities involving $x$, $\mathcal{E}$ and $N$ remain intact. When updating the objective function (15), however, we separate the index sets $\mu$ and $\omega$ corresponding to the fidelity (involving $\bar{\rho}_\tau$) and the penalty (involving the absolute norm) and update their contributions to (15) accordingly. When $\tau = 0.5$, the fidelity norm and the penalty norm are both absolute norms and the dichotomy of the index sets becomes unessential. For general $\tau$, however, the separation of the index sets is crucial in obtaining the correct value of the objective function during each inner loop of our algorithm.
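The {find alpha} module above walks the breakpoints while updating the gradient by (26). A simpler, less efficient line search (an illustration reusing the psi sketch above, and assuming the step is bounded) reaches the same minimizer by direct evaluation, since $\psi$ is piecewise linear and convex along $p$:

import numpy as np

def find_alpha(x, p, E, f, tau, mu_idx, omega_idx, nu_idx):
    # Breakpoints where a residual changes sign along x + alpha*p.
    r = E.T @ x - f
    d = E.T @ p
    with np.errstate(divide="ignore", invalid="ignore"):
        a = -r / d
    a = np.unique(a[np.isfinite(a) & (a > 0)])
    if len(a) == 0:
        return 0.0
    vals = [psi(x + ai * p, E, f, tau, mu_idx, omega_idx, nu_idx) for ai in a]
    return float(a[int(np.argmin(vals))])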

4.4. Degeneracy

Degeneracy occurs when there are more zero residuals in the objective function (15) at an intermediate solution, $x$, than the number of parameters to be solved. In our situation, degeneracy is the norm rather than an anomaly. For the $L_1$ penalty, it is obvious from (6) that when $\lambda \to 0$, the $L_1$ quantile smoothing spline will try to interpolate all the $n$ observations and, hence, will yield $n$ zero residuals in the fidelity measure. As $\lambda \to 0$, the penalty measure also will approach zero and, hence, will yield possibly a maximum of $n - 2$ more zero residuals. In the case of the $L_\infty$ penalty, degeneracy occurs when $\lambda \to \infty$. As the $L_\infty$ quantile smoothing spline tries to fit a straight line through the data, all inequality constraints, $2(n - 1)$ of them, in (9) will be tight with $\sigma \to 0$, so that the contribution to the $L_\infty$ penalty is negligible. With only $n + 2$ free parameters, there will be $n - 4$ extra zero residuals.

BC handle degeneracy by choosing a minimal set of linearly independent columns associated with zero residuals for the $N$ matrix of active columns. Extra zero-residual columns then are perturbed by randomizing their signs over $(-1, 1)$ in the contribution to the restricted gradient $g$ in (18). This approach works fine if the cross product $e_\varphi'p$ for $\varphi \notin \mathcal{E}$ is non-zero. A large enough number of cycles eventually perturbs the restricted gradient to the correct quantity and, hence, leads to a correct choice of $p$. Problems arise, however, when $e_\varphi'p = 0$ for a direction of descent $p$


for $\varphi \notin \mathcal{E}$. Notice that the $\tilde{X}$ and $M$ matrices are extremely sparse in (6) and (9). If $p$ is generated by Case 2 of {find p}, all components of $p$ except one will be zero and the cross product $e_\varphi'p$ easily can be zero. When $e_\varphi'p = 0$, the sign contribution to the gradient should be computed according to (19)-(21). The sign should, therefore, be randomized over $(-1, 0, 1)$ rather than over $(-1, 1)$. A randomized sign of $+1$ or $-1$ will not yield a correct restricted gradient and, as a result, the chosen direction of descent $p$ will not be correct. As a consequence, no step should be taken in the {step} module. In this case, the algorithm will search for the potential $p$ forever. In order to ensure a zero step is taken before the correct $p$ is computed from a correctly randomized restricted gradient, we recompute the restricted gradient in the module {find alpha} to ensure $p' g^{(\pi)} < 0$ before a step of $\alpha^{(\pi)}$ is taken.

5. Parametric linear programming

Koenker et al. [9] recommend minimizing the Schwarz information criterion

$$\mathrm{SIC}(\lambda) = \log\left( n^{-1} \sum_{i=1}^{n} \rho_\tau\big(y_i - \hat{g}_\lambda(x_i)\big) \right) + \tfrac{1}{2}\, n^{-1}\, p_\lambda \log n$$

to choose the smoothing parameter $\lambda$ for the $\tau$th quantile smoothing splines. The effective dimension $p_\lambda$, which takes an integer value between 2 and $n$, is the number of interpolated response observations ($y_i$'s). This approach requires the formidable task of computing the entire path of $\tau$th quantile smoothing splines corresponding to all the infinitely many different values of $\lambda$. To construct point-wise confidence intervals or confidence bands for $\hat{g}_{\tau,L_1}$, we can adopt the direct approach of Zhou and Portnoy [16], as demonstrated in Ng and Smith [11]. This method requires a similarly formidable task of computing the whole spectrum of quantile smoothing splines in $\tau$ for a given $\lambda$. The above computational problems can, however, be ameliorated substantially by adopting a parametric programming approach, the study of the effect of parametric changes on the current optimal solution. Even though there are infinitely many different $\tau$ in $[0, 1]$, there are only finitely many distinct quantile smoothing splines in a sample of size $n$ for a fixed $\lambda$, due to the nonsmooth nature of our linear objective function (13). The same is true in terms of the smoothing parameter $\lambda$ given a fixed $\tau$. Portnoy [13] shows the number of distinct regression quantiles in a linear regression setting to be $O_p(n \log n)$.
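A Python sketch of the criterion (the formula as given in Koenker et al. [9]; the interpolation tolerance is an illustrative choice):

import numpy as np

def sic(y, fitted, tau, tol=1e-7):
    # SIC(lambda) = log(mean check loss) + (1/2) * p_lambda * log(n) / n,
    # with p_lambda the number of interpolated observations.
    n = len(y)
    u = np.asarray(y, float) - np.asarray(fitted, float)
    loss = ((tau - (u < 0)) * u).mean()
    p_lam = int((np.abs(u) < tol).sum())   # effective dimension
    return np.log(loss) + 0.5 * p_lam * np.log(n) / n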

Fig. 5 presents all distinct $\tau$th $L_1$ quantile smoothing splines for a fixed $\lambda$, for the data set $x = (1, 2, 4, 7, 9)$ and $y = (3, 2, 7, 8, 6)$ analyzed in [4]. Here $\lambda$ is chosen to be large enough for the $L_1$ quantile smoothing splines to become the linear regression quantiles of Koenker and Bassett [7]. Altogether there are three jump points ($\frac{7}{22}$, $\frac{1}{2}$ and $\frac{3}{4}$) for $0 \leq \tau \leq 1$. The solid line connecting the points $(2, 2)$ and $(9, 6)$ is the $L_1$ estimated $\tau$th conditional quantile function, $\hat{g}_{\tau,L_1}$, for all $0 \leq \tau \leq \frac{7}{22}$. The dotted line joining $(1, 3)$ and $(9, 6)$ is $\hat{g}_{\tau,L_1}$ for all $\frac{7}{22} < \tau \leq \frac{1}{2}$. There is no unique solution for $\tau = \frac{7}{22}$; any linear combination of the solid and dotted lines is an $L_1$ estimated $(\frac{7}{22})$th conditional quantile function.

[Fig. 5: All distinct $L_1$ estimated $\tau$th conditional quantile functions, $\hat{g}_{\tau,L_1}$, for $0 \leq \tau \leq 1$: $\tau \in [0, \frac{7}{22}]$ (solid), $\tau \in [\frac{7}{22}, \frac{1}{2}]$ (dotted), $\tau \in [\frac{1}{2}, \frac{3}{4}]$ and $\tau \in [\frac{3}{4}, 1]$.]

[Fig. 6: All the distinct $L_1$ estimated conditional medians, $\hat{g}_{.5,L_1}$, for $0 \leq \lambda < \infty$: $\lambda \in [0, \frac{1}{2}]$, $\lambda \in [\frac{1}{2}, 2]$ (dotted) and $\lambda \in [2, \infty)$.]

All unique median ($\tau = 0.5$) smoothing splines corresponding to the distinct values of $\lambda$ are given in Fig. 6. To perform parametric linear programming in $\tau$ and $\lambda$, we begin by looking at (17), which is the directional derivative of (16) in the direction of $\alpha p \neq 0$.


Because $\psi(x)$ is a convex function, it attains a unique minimum at $x^*$ if and only if $\Delta\psi(x^*; \alpha p) \geq 0$ for all $\alpha p \neq 0$. Hence we have

$$\Delta\psi(x^*; \alpha p) = \alpha \left\{ \sum_{\mu \in \mathcal{E}} \hat{\sigma}_\mu e_\mu'p + \sum_{\omega \in \mathcal{E}} \hat{\sigma}_\omega e_\omega'p + \sum_{\nu \in \mathcal{E}} \hat{\sigma}_\nu e_\nu'p \right\} + \alpha \left( \sum_{\mu \notin \mathcal{E}} \sigma_\mu^* e_\mu + \sum_{\omega \notin \mathcal{E}} \sigma_\omega^* e_\omega + \sum_{\nu \notin \mathcal{E}} \sigma_\nu^* e_\nu \right)' p \geq 0,$$

where the $*$'s denote quantities evaluated at $x^*$. Letting $u = \alpha N'p$, we have $\Delta\psi(x^*; \alpha p) \geq 0$ for all $\alpha p \neq 0$ if and only if the corresponding bounds hold for all $u \neq 0$. Denoting $i_\mu$, $i_\omega$, and $i_\nu$ as vectors of zeros with the respective $\mu$th, $\omega$th and $\nu$th components being ones, and setting $u = i_\mu, i_\omega, i_\nu$ consecutively, we have

$$2\tau - 2 \leq \left[ \sum_{\mu \notin \mathcal{E}} \sigma_\mu^* e_\mu' + \sum_{\omega \notin \mathcal{E}} \sigma_\omega^* e_\omega' + \sum_{\nu \notin \mathcal{E}} \sigma_\nu^* e_\nu' \right] Z_1 (R^{-1})'\, i_\mu \leq 2\tau \tag{27}$$

for $\mu \in \mathcal{E}$,

$$-1 \leq \left[ \sum_{\mu \notin \mathcal{E}} \sigma_\mu^* e_\mu' + \sum_{\omega \notin \mathcal{E}} \sigma_\omega^* e_\omega' + \sum_{\nu \notin \mathcal{E}} \sigma_\nu^* e_\nu' \right] Z_1 (R^{-1})'\, i_\omega \leq 1 \tag{28}$$

for $\omega \in \mathcal{E}$, and

$$0 \leq \left[ \sum_{\mu \notin \mathcal{E}} \sigma_\mu^* e_\mu' + \sum_{\omega \notin \mathcal{E}} \sigma_\omega^* e_\omega' + \sum_{\nu \notin \mathcal{E}} \sigma_\nu^* e_\nu' \right] Z_1 (R^{-1})'\, i_\nu \leq 1 \tag{29}$$

for $\nu \in \mathcal{E}$.
P. T. Ng I Computational Statistics & Data Analysis 22 (1996) 99-118

117

5.1. Next z To compute the entire path of quantile smoothing splines for a fixed )`, we begin with r0 = 0, which yields the lower extreme smoothing spline, as in [8]. The solution remains optimal if and only if conditions (27)-(29) are satisfied. Collecting all terms associated with z in (27)-(29) into =~, =o~,=~, respectively, and the remaining into Au, Ao, A,, respectively, the next z at which the current solution ceases to be optimal is

rl

= min,>,o

minze~,

-~,

, mino~ee{

--

A~+I ~"~ tO

min,s• {

l~Ao~ 1 l"~ O)

J

-,A'~,1 Z= A,}},.,,:,

Continuing this iteration until '~ph-i ~'p < 1, which gives the upper extreme smoothing spline, yields the entire family of quantile smoothing splines. In Fig. 5, we have z~ = 7,z2 = ~,z3 ] = ~3 and "1~ 4 = T5 = 1, SO only four distinct L~ quantile smoothing splines for 0 < z < 1 exist. :

5.2. Next )` Similarly, the entire path of solutions corresponding to all distinct )`s for a given z can be obtained by starting the iteration at 2o = ~ , which corresponds to the zth linear regression quantile of Koenker and Bassett [7]. Collecting all terms associated with )` into E's and the remaining into A's, the next smaller distinct 2 is 21 = max,
max,et

-

,

,

, maxo~. ~to

maxve~,

~,

~t.o

_

Again the iteration is carried out until $\lambda_{p+1} = \lambda_p \geq 0$, which corresponds to the $\tau$th quantile interpolating spline. Fig. 6 shows three distinct median $L_1$ smoothing splines for $\lambda$ between $\lambda_0 = \infty$, $\lambda_1 = 2$, $\lambda_2 = \frac{1}{2}$ and $\lambda_3 = \lambda_4 = 0$.

Acknowledgements

I would like to thank Roger Koenker for inciting my interest to write this paper. The very helpful comments of an anonymous referee are also greatly appreciated. All remaining errors are mine. Computations in the paper are performed on equipment supported by the National Science Foundation Grant SES 89-22472.


References

[1] Armstrong, R. and J. Hultz, An algorithm for a restricted discrete approximation problem in the L1 norm, SIAM J. Numer. Anal., 14 (1977) 555-565.
[2] Bartels, R. and A. Conn, Linearly constrained discrete L1 problems, ACM Trans. Math. Software, 6 (1980) 594-608.
[3] Bartels, R., A. Conn and J. Sinclair, Minimization techniques for piecewise differentiable functions: the L-infinity solution to an overdetermined linear system, SIAM J. Numer. Anal., 15 (1978) 224-241.
[4] Bassett, G. and R. Koenker, An empirical quantile function for linear models with iid errors, J. Amer. Statist. Assoc., 77 (1982) 407-415.
[5] Craven, P. and G. Wahba, Smoothing noisy data with spline functions, Numer. Math., 31 (1979) 377-403.
[6] Hawkins, D., Fitting monotonic polynomials to data, Comput. Statist., 9 (1994) 233-247.
[7] Koenker, R. and G. Bassett, Regression quantiles, Econometrica, 46 (1978) 33-50.
[8] Koenker, R. and V. d'Orey, Computing regression quantiles, Appl. Statist., 36 (1988) 383-393.
[9] Koenker, R., P. Ng and S. Portnoy, Quantile smoothing splines, Biometrika, 81 (1994) 673-680.
[10] Koenker, R., S. Portnoy and P. Ng, Nonparametric estimation of conditional quantile functions, in: Yadolah Dodge (Ed.), Statistical Data Analysis Based on the L1 Norm and Related Methods (North-Holland, New York, 1992).
[11] Ng, P. and J.L. Smith, The elasticity of demand for gasoline: a semi-parametric analysis, Mimeo (Dept. of Economics, University of Houston, Houston, TX, 1994).
[12] Pinkus, A., On smoothest interpolants, SIAM J. Math. Anal., 19 (1988) 1431-1441.
[13] Portnoy, S., Asymptotic behavior of the number of regression quantile breakpoints, SIAM J. Sci. Comp., 12 (1991) 867-883.
[14] Reinsch, C., Smoothing by spline functions, Numer. Math., 10 (1967) 177-183.
[15] Schuette, D.R., A linear programming approach to graduation, Trans. Soc. Actuaries, 30 (1978) 407-445.
[16] Zhou, Q. and S. Portnoy, A direct approach to construct confidence intervals and confidence bands for regression quantiles, Mimeo (Dept. of Statistics, University of Illinois, Champaign, IL, 1994).