Statistics & Probability Letters 12 (1991) 517-525
North-Holland
December 1991

Inferences about correlation structure based on proof loading experiments

Richard A. Johnson and K.T. Wu
Department of Statistics, University of Wisconsin, Madison, WI 53706, USA
Abstract: Units can have two or more important strength properties. A procedure is described for estimating the correlation between two strength properties that, individually, can only be observed through destructive testing. The procedure involves proof loading a unit in one failure mode and then loading survivors, in a second mode, to failure. We first review results for the case where the two strength properties follow a bivariate normal distribution. The selection of an optimal single proof load is highlighted. A new nonparametric formulation is also explored, and estimation of mixed moments and the correlation is considered. Some large sample estimation results are also given.

Keywords: Correlation, nonparametric estimation, proof load.
1. Introduction

In the design of complex systems, it is often the case that components undergo more than one kind of stress. An example, the one that motivated this research, comes from the design of wood frame structures. The 2 x 4 or 2 x 6 members of a roof truss will simultaneously be subjected to both bending and tensile stresses. In order to make optimal use of a scarce resource in this application, it is important to have knowledge of the joint distribution of bending and tensile strength for these specimens.

Typically, the various strength properties such as tensile, compression, and bending strengths of a specimen will be positively correlated. Pieces with few knots and straight grains will be strong on all of the properties while specimens with large edge knots will break easily in any of the modes of failure.

When the material is steel or some other homogeneous medium, pairs of nearly identical specimens can be created. The correlation between any two strength properties can then be estimated by breaking one member of each pair in each mode of failure. However, because of natural variation, it is difficult or impossible to match specimens of dimension lumber even when they come from the same tree or stand of trees. Another approach is needed for cases like this where good pairing of experimental units is not possible.

Galligan, Johnson and Taylor (1981) report a design for estimating the correlation in a bivariate normal distribution when each variable is measured separately by loading the experimental unit until it fails. Bartlett and Lwin (1984) independently considered estimation of correlation in a bivariate lognormal using the same design, but their estimation procedure required that additional samples also be available from each of the two marginal distributions. The proof load procedure involves two steps.

Design Step 1. Load the unit in mode 1 up to an established maximum load L. If the unit fails, record its mode 1 strength x. If it does not fail, remove the load and proceed to Design Step 2.

Design Step 2. Load the unit in mode 2 until failure and record its mode 2 strength y.
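The two design steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `proof_load_test` and the numerical strengths are hypothetical, and in a real experiment the two strengths are latent, so only the returned record would be available.

```python
def proof_load_test(x_strength, y_strength, proof_load):
    """Apply Design Steps 1 and 2 to a single unit.

    x_strength and y_strength are the (normally unobservable) mode 1 and
    mode 2 strengths; they are passed in here only to simulate the test.
    Returns (recorded_value, failed_in_mode_1).
    """
    # Design Step 1: load in mode 1 up to the proof load.
    if x_strength < proof_load:
        return x_strength, True   # unit broke; its mode 1 strength is recorded
    # Design Step 2: the survivor is loaded to failure in mode 2.
    return y_strength, False      # only the mode 2 strength is recorded


# Hypothetical (x, y) strengths for three units and a proof load of 5.0.
units = [(3.2, 4.1), (5.6, 6.0), (4.9, 3.7)]
records = [proof_load_test(x, y, 5.0) for x, y in units]
```

Each record holds exactly one strength value plus the failure indicator, which is why the correlation cannot be estimated by the usual sample formula.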
Research supported by the Air Force Office of Scientific Research under Grant No. AFOSR-87-0256.

0167-7152/91/$03.50 © 1991 - Elsevier Science Publishers B.V. All rights reserved
When the joint distribution of X and Y is bivariate normal, the likelihood is

$$\prod_{j \in F} \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\!\left(-\frac{(x_j-\mu_1)^2}{2\sigma_1^2}\right) \times \prod_{j \notin F} \frac{1}{\sqrt{2\pi}\,\sigma_2} \exp\!\left(-\frac{(y_j-\mu_2)^2}{2\sigma_2^2}\right) \int_{u_j}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{z^2}{2}\right) dz$$

where $u_j = [L - \mu_1 - \rho(\sigma_1/\sigma_2)(y_j - \mu_2)]/[\sigma_1 \sqrt{1-\rho^2}]$ and $F$ is the set of indices where $x_j$ is observed.

Point estimation of the moments and correlation, when (X, Y) is bivariate normal, has been studied by De Amorim and Johnson (1986), Bartlett and Lwin (1984), Evans, Johnson and Green (1984), Galligan, Johnson and Taylor (1981) and Johnson and Galligan (1983). The selection of the proof load L is also considered in De Amorim and Johnson (1986) and Evans, Johnson and Green (1984). All of this literature concerns a single proof load, although De Amorim and Johnson (1986) consider designs where one part of the sample is first proof loaded in mode 1 and the other part is first proof loaded in mode 2.

Our goal is to study nonparametric estimation procedures, and these will require multiple proof loads. In Section 3, we prove the identifiability of the distribution of (X, Y) under a multiple proof load design constructed from an increasing mesh of finite proof loads $L_1, L_2, \ldots, L_m$ on the mode 1 variable X. Elsewhere, we consider estimation of the conditional survival function, $P(Y > y \mid X > L)$, the mode 2 survival function of those items which survive proof load L. This conditional survival function describes one aspect of the dependence between X and Y.

The main results of this paper are concerned with the problem of nonparametric estimation of the mixed moments $E[X^\alpha Y^\beta]$. This leads to a new estimate of the correlation, corr(X, Y), and thus provides an alternative measure with which to compare the normal theory maximum likelihood estimate.
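As an illustration of the bivariate normal likelihood above, the following Python sketch evaluates its logarithm using only the standard library. The function names and argument order are assumptions made for this example, not part of the original paper; the survivor term uses the conditional normal distribution of X given Y = y, so that P(X > L | y) equals the upper standard normal tail at u_j.

```python
import math


def normal_logpdf(value, mean, sd):
    """Log density of a N(mean, sd^2) variable at value."""
    return (-0.5 * ((value - mean) / sd) ** 2
            - math.log(sd) - 0.5 * math.log(2.0 * math.pi))


def proof_load_loglik(failed_x, survivor_y, load,
                      mu1, mu2, sigma1, sigma2, rho):
    """Log-likelihood for proof load data under a bivariate normal model.

    failed_x   : mode 1 strengths of units that failed below the proof load
    survivor_y : mode 2 strengths of units that survived the proof load
    Note: math.log raises ValueError if the survival probability
    underflows to zero (extremely large u).
    """
    total = sum(normal_logpdf(x, mu1, sigma1) for x in failed_x)
    scale = sigma1 * math.sqrt(1.0 - rho ** 2)
    for y in survivor_y:
        # u_j = [L - mu1 - rho (sigma1/sigma2)(y_j - mu2)] / [sigma1 sqrt(1 - rho^2)]
        u = (load - mu1 - rho * (sigma1 / sigma2) * (y - mu2)) / scale
        survival = 0.5 * math.erfc(u / math.sqrt(2.0))  # P(N(0,1) > u)
        total += normal_logpdf(y, mu2, sigma2) + math.log(survival)
    return total
```

A maximum likelihood estimate would be obtained by passing the negative of this function to a numerical optimizer; that step is omitted here.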
2. More background

The mechanical and physical properties of building materials must be understood in order to design safe structures. Lumber is a prime example where the physical and mechanical properties of each piece are related to the physical and mechanical properties of structures into which they are placed. Building materials typically have several strength properties which could, ultimately, be related to the strength of structures. These include the modulus of elasticity in bending, bending strength (modulus of rupture), tensile strength, and compressive strength. Except for the modulus of elasticity, any single strength property can be determined by loading a specimen to failure in a test apparatus designed for that property. Consequently, all of the strength properties cannot be measured on a single specimen and, until the current proof loading procedure, relationships among them could only be roughly approximated.

Suddarth, Woeste and Galligan (1978) first investigated the importance of the correlation between strength properties and its influence on the performance of structural systems made of lumber. They showed that a positive correlation can have a major effect on the load-carrying capacity and reliability of structural systems such as the metal-plate wood truss used in the roof structure of most homes. Therefore, it is necessary to have an accurate estimate of the correlation between strength properties if structural systems are to be designed efficiently and with a high degree of safety for the loads they must bear.

Johnson (1980) reviews the uses of proof loads for a single variable. Galligan, Johnson and Taylor (1981) exploit the idea of a proof load in a novel way. They present a design, with a fixed proof load, for estimating the correlation in a bivariate normal distribution.
Table 2.1
The optimal choice of proof load. Known normal marginal distributions

$\rho$    $L_{z_1,\mathrm{opt}}$    $I_\rho(L_{z_1,\mathrm{opt}})$    $P[Z_1 < L_{z_1,\mathrm{opt}}]$
0.00      0.61      0.4049      0.7291
0.20      0.60      0.4305      0.7257
0.40      0.55      0.5272      0.7088
0.60      0.46      0.8012      0.6772
0.80      0.32      1.9491      0.6255
0.90      0.23      5.1867      0.5910
0.95      0.16      14.2761     0.5636

In their initial application, a moderately large amount of information was available on the marginal distributions since many specimens had been loaded to failure in strength mode 1 and many others in strength mode 2. Consequently, maximum likelihood estimates were obtained for the correlation coefficient with the marginal distributions held fixed. Johnson and Galligan (1983) report the analysis of eight data sets and give approximate confidence intervals for the correlation based on the large sample distribution of the likelihood ratio statistic. They also extend the approach to consider estimation of the correlation of error terms in a bivariate multiple regression model.

Evans, Johnson and Green (1984) used Monte Carlo techniques to find the optimal fixed proof load, L, for estimating the correlation coefficient for a given population correlation. Samples of sizes 100 and 300 were considered because these seemed reasonable for the lumber applications.

De Amorim and Johnson (1986) determine the optimal proof load by maximizing the Fisher information. In the usual notation for a bivariate normal distribution, this information can be expressed in terms of the standardized proof load
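$$L_{z_1} = \frac{L - \mu_1}{\sigma_1}, \qquad (2.1)$$

the standardized variable $Z_2 = (X_2 - \mu_2)/\sigma_2$ (with $X_2$ denoting the mode 2 strength $Y$), and the quantity $a(Z_2) = (L_{z_1} - \rho Z_2)/(1 - \rho^2)^{1/2}$ as

$$I_\rho(L_{z_1}) = \frac{1}{(1-\rho^2)^3} \int_{-\infty}^{\infty} (z_2 - \rho L_{z_1})^2 \, \frac{\varphi^2(a(z_2))}{1 - \Phi(a(z_2))} \, \varphi(z_2) \, dz_2 ,$$

where $\varphi$ and $\Phi$ denote the standard normal density and distribution function.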
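The Fisher information integral can be evaluated numerically. The sketch below is a minimal illustration, assuming a plain trapezoidal rule on [-8, 8] and a simple grid search; the function names, the integration limits and the grid are choices made for this example and are not taken from De Amorim and Johnson (1986).

```python
import math


def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def upper_tail(t):
    """1 - Phi(t) for the standard normal distribution."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def fisher_information(rho, lz1, half_width=8.0, steps=4000):
    """Trapezoidal approximation of I_rho(L_z1) for known normal marginals."""
    scale = math.sqrt(1.0 - rho ** 2)
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        z2 = -half_width + k * h
        a = (lz1 - rho * z2) / scale
        tail = upper_tail(a)
        # Guard against underflow of the tail probability far out in the tail.
        value = 0.0 if tail <= 0.0 else (
            (z2 - rho * lz1) ** 2 * phi(a) ** 2 / tail * phi(z2))
        total += value * (0.5 if k in (0, steps) else 1.0)
    return total * h / (1.0 - rho ** 2) ** 3


def best_load(rho, grid):
    """Grid point with the largest Fisher information."""
    return max(grid, key=lambda lz1: fisher_information(rho, lz1))
```

At $\rho = 0$ the integral reduces to $\varphi^2(L_{z_1})/(1 - \Phi(L_{z_1}))$, which is about 0.405 at $L_{z_1} = 0.61$, in agreement with the first row of Table 2.1.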
A brief summary of the optimal proof loads is given in Table 2.1. Generally, the optimal proof load decreases as $\rho$ increases. Monte Carlo studies with sample sizes 50, 100, 200 and 300 support the use of the Fisher information criterion. Evans, Johnson and Green (1984) also investigated the coverage of confidence intervals centered at the maximum likelihood estimator.

When the parameters of the marginal normal distributions are unknown, estimation of $\rho$ proved to be much more difficult. De Amorim and Johnson (1986) minimized the element, in the inverse of the Fisher information matrix, corresponding to $\rho$. A brief summary of the optimal proof loads is given in Table 2.2. Comparing Tables 2.1 and 2.2, we see that the optimal proof load has now moved to the lower tail.

Table 2.2
The optimal choice of proof load. Unknown normal marginal distributions

$\rho$    $L_{z_1,\mathrm{opt}}$    $P[Z_1 < L_{z_1,\mathrm{opt}}]$
0.00      -1.13      0.1292
0.20      -1.12      0.1314
0.40      -1.04      0.1492
0.60      -0.91      0.1814
0.80      -0.62      0.2676
0.90      -0.52      0.3015
0.95      -0.39      0.3483

Unfortunately, even the best proof load scheme is by no means satisfactory. The estimated large sample variance for the maximum likelihood estimator $\hat\rho$ is $577432\,n^{-1}$ for $\rho = 0$ and drops only to $32\,n^{-1}$ for $\rho = 0.60$. A major source of difficulty is an unusual amount of indeterminacy between $\rho$ and $\mu_2$. Many partially maximized likelihood functions had a long ridge on the $(\rho, \mu_2)$ plane.

To circumvent this difficulty, De Amorim and Johnson (1986) suggested a symmetric design where half of the specimens were proof loaded in mode 1 with survivors failed in mode 2 while the second half of the specimens were proof loaded in mode 2 with survivors failed in mode 1. Minimizing the element of the inverse Fisher information matrix corresponding to $\rho$ in this case led to the optimal proof loads given in Table 2.3. Note that the optimal proof loads are in the upper tail of the strength distribution. With the optimal symmetric design, the estimated variance for the maximum likelihood estimator dropped to $3.1\,n^{-1}$ for $\rho = 0$ and the multiplier is less than one for $\rho \ge 0.7$.

Table 2.3
The optimal choice of proof load with the symmetric design

$\rho$    $L_{z_1,\mathrm{opt}}$    $P[Z_1 < L_{z_1,\mathrm{opt}}]$
0.00      0.83      0.7967
0.20      0.77      0.7794
0.40      0.68      0.7517
0.60      0.57      0.7157
0.80      0.41      0.6591
0.90      0.29      0.6141
0.95      0.22      0.5781

Alternatively, one can consider a hybrid where a fixed number of specimens N are divided into three groups. Those in the first group are failed in mode 1, those of the second group are failed in mode 2, and those in the third group are assigned to the original proof load scheme. De Amorim and Johnson (1986) indicate that it appears best (i) not to assign any specimens to the first group, (ii) to assign about 25% of the specimens to the second group and (iii) to place the proof load in the upper right hand tail of the mode 1 strength distribution. The proof load is higher than when $\rho$ is the only unknown parameter.

In a nonparametric setting a single proof load does not suffice.
3. Identifiability of the joint distribution under a multiple proof load scheme

Because we were motivated by the example concerning strength variables, we develop the statistical theory in terms of non-negative random variables. Let (X, Y) have a continuous bivariate distribution function $F_{X,Y}$ on $[0, \infty) \times [0, \infty)$ with $F_{X,Y}(0, 0) = 0$. Under proof load L, we observe $(Z_L, \delta_L)$ where

$$Z_L = \begin{cases} X, & \text{if } X < L, \\ Y, & \text{if } X > L, \end{cases} \qquad \text{and} \qquad \delta_L = \begin{cases} 1, & \text{if } X < L, \\ 0, & \text{if } X > L, \end{cases} \qquad (3.1)$$

and let $F_{Z_L,\delta_L}$ be the distribution function of $(Z_L, \delta_L)$ on $[0, \infty) \times \{0, 1\}$. To achieve identifiability, we need to consider a dense set $\mathbb{L}$ of proof loads in the interior of the support of the mode 1 variable X. The support of X, supp(X), is defined as the closure of the set $\{t : t > 0, \text{ and } 0 < F_X(t + \varepsilon) - F_X(t - \varepsilon) \text{ for every } \varepsilon > 0\}$.

Theorem 3.1. The joint distribution of X and Y is identifiable under the proof load design if $\mathbb{L}$ is dense in the interior of supp(X). More specifically, let $(Z_{L,1}, \delta_{L,1})$ and $(Z_{L,2}, \delta_{L,2})$ be two observed random vectors with underlying random vectors $(X_1, Y_1)$ and $(X_2, Y_2)$, respectively. Then

$$F_{Z_{L,1},\delta_{L,1}} = F_{Z_{L,2},\delta_{L,2}} \ \text{ on } [0, \infty) \times \{0, 1\}, \qquad \forall L \in \mathbb{L}, \qquad (3.2)$$

implies $F_{X_1,Y_1} = F_{X_2,Y_2}$ on $[0, \infty) \times [0, \infty)$.
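Proof. Write $\bar F_X = 1 - F_X$ and $\bar F_{X,Y}(x, y) = P(X > x, Y > y)$. Then

$$P(Z_L \le s, \delta_L = t) = \begin{cases} F_X(s), & \text{if } 0 \le s \le L,\ t = 1, \\ F_X(L), & \text{if } L < s,\ t = 1, \\ F_{Y \mid X > L}(s)\, \bar F_X(L), & \text{if } 0 \le s,\ t = 0. \end{cases} \qquad (3.3)$$

By (3.2) and (3.3), we have

$$F_{X_1}(s) = F_{X_2}(s), \qquad 0 \le s \le L, \quad \forall L \in \mathbb{L},$$

and

$$F_{Y_1 \mid X_1 > L}(s)\, \bar F_{X_1}(L) = F_{Y_2 \mid X_2 > L}(s)\, \bar F_{X_2}(L), \qquad \forall s > 0, \ L \in \mathbb{L}.$$

Since $F_{X_i}$ is a continuous function and $\mathbb{L}$ is dense in the support supp($X_i$) of $X_i$, i = 1, 2, we have $F_{X_1} = F_{X_2}$ and supp($X_1$) = supp($X_2$). Next,

$$\bar F_{X_1,Y_1}(L, s) = P(X_1 > L, Y_1 > s) = 1 - P(X_1 \le L) - P(X_1 > L, Y_1 \le s) = 1 - P(X_2 \le L) - P(X_2 > L, Y_2 \le s) = \bar F_{X_2,Y_2}(L, s), \qquad \forall s > 0, \ L \in \mathbb{L},$$

since (3.3) gives the inner equality. Thus $\bar F_{X_1,Y_1} = \bar F_{X_2,Y_2}$ on $\mathbb{L} \times [0, \infty)$. Since $\bar F_{X_i,Y_i}$ is continuous and $\mathbb{L}$ is dense in the interior of the common support of $X_1$ and $X_2$, we have $\bar F_{X_1,Y_1} = \bar F_{X_2,Y_2}$ on $[0, \infty) \times [0, \infty)$, and hence $F_{X_1,Y_1} = F_{X_2,Y_2}$. This completes the proof. □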
Given a finite number of experimental units, one cannot conduct tests with an infinite number of proof loads. Conceptually, we consider a sequence of finite sets that increases to a dense limit $\mathbb{L}$. This will be our approach in the following section.
4. An estimator for mixed moments

In this section, we first obtain estimates of moments of Y, conditional on X being greater than the proof load. These quantities are then used to develop estimates of the mixed moments and the correlation coefficient. Let (X, Y) be a bivariate random vector with nonnegative components such that $P[X > 0, Y > 0] = \bar F(0, 0) = 1$. Fix m increasing proof loads $\mathbb{L}_m = \{L_1, L_2, \ldots, L_m\}$, where each $L_i$, i = 1, ..., m, is contained in the interior of supp(X). For each $L_i$, we take a random sample $(Z_{ij}, \delta_{ij})$, j = 1, 2, ..., $n_i$, where

$$Z_{ij} = \begin{cases} X_{ij}, & \text{if } X_{ij} < L_i, \\ Y_{ij}, & \text{if } X_{ij} > L_i, \end{cases} \qquad \text{and} \qquad \delta_{ij} = \begin{cases} 1, & \text{if } X_{ij} < L_i, \\ 0, & \text{if } X_{ij} > L_i, \end{cases} \qquad (4.1)$$

for i = 1, 2, ..., m. Our estimator of $E(Y^\beta \mid X > L_i)$ is motivated by the following generalization of a well known result concerning nonnegative random variables.
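Lemma 4.1. Suppose that $E(X^\alpha Y^\beta)$ exists for some $\alpha \ge 0$, $\beta \ge 0$, and $\alpha + \beta > 0$. Then, with $\bar F(x, y) = P[X > x, Y > y]$,

$$E[X^\alpha Y^\beta I[X > L_1, Y > L_2]] = L_1^\alpha L_2^\beta \bar F(L_1, L_2) + L_1^\alpha \int_{L_2}^{\infty} \beta y^{\beta-1} \bar F(L_1, y)\, dy + L_2^\beta \int_{L_1}^{\infty} \alpha x^{\alpha-1} \bar F(x, L_2)\, dx + \int_{L_1}^{\infty}\!\!\int_{L_2}^{\infty} \alpha\beta\, x^{\alpha-1} y^{\beta-1} \bar F(x, y)\, dy\, dx.$$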
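Proof. We can rewrite $X^\alpha$ and $Y^\beta$ as

$$X^\alpha = \int_0^\infty \alpha x^{\alpha-1} I[X > x]\, dx \qquad \text{and} \qquad Y^\beta = \int_0^\infty \beta y^{\beta-1} I[Y > y]\, dy.$$

Hence

$$\begin{aligned}
E[X^\alpha Y^\beta I[X > L_1, Y > L_2]]
&= E\left[ \int_0^\infty \alpha x^{\alpha-1} I[X > x]\, dx \times \int_0^\infty \beta y^{\beta-1} I[Y > y]\, dy \times I[X > L_1, Y > L_2] \right] \\
&= E\left[ \int_0^\infty\!\!\int_0^\infty \alpha\beta\, x^{\alpha-1} y^{\beta-1} I[X > x]\, I[Y > y]\, I[X > L_1, Y > L_2]\, dy\, dx \right] \\
&= E\left[ \int_0^\infty\!\!\int_0^\infty \alpha\beta\, x^{\alpha-1} y^{\beta-1} I[X > \max(L_1, x),\ Y > \max(L_2, y)]\, dy\, dx \right].
\end{aligned}$$

By Fubini's Theorem, we can interchange the order of the integration. Hence

$$\begin{aligned}
E[X^\alpha Y^\beta I[X > L_1, Y > L_2]]
&= \int_0^\infty\!\!\int_0^\infty \alpha\beta\, x^{\alpha-1} y^{\beta-1} E[I[X > \max(L_1, x),\ Y > \max(L_2, y)]]\, dy\, dx \\
&= \int_0^{L_1}\!\!\int_0^{L_2} \alpha\beta\, x^{\alpha-1} y^{\beta-1} E[I[X > L_1, Y > L_2]]\, dy\, dx
 + \int_0^{L_1}\!\!\int_{L_2}^{\infty} \alpha\beta\, x^{\alpha-1} y^{\beta-1} E[I[X > L_1, Y > y]]\, dy\, dx \\
&\quad + \int_{L_1}^{\infty}\!\!\int_0^{L_2} \alpha\beta\, x^{\alpha-1} y^{\beta-1} E[I[X > x, Y > L_2]]\, dy\, dx
 + \int_{L_1}^{\infty}\!\!\int_{L_2}^{\infty} \alpha\beta\, x^{\alpha-1} y^{\beta-1} E[I[X > x, Y > y]]\, dy\, dx \\
&= L_1^\alpha L_2^\beta \bar F(L_1, L_2) + L_1^\alpha \int_{L_2}^{\infty} \beta y^{\beta-1} \bar F(L_1, y)\, dy + L_2^\beta \int_{L_1}^{\infty} \alpha x^{\alpha-1} \bar F(x, L_2)\, dx \\
&\quad + \int_{L_1}^{\infty}\!\!\int_{L_2}^{\infty} \alpha\beta\, x^{\alpha-1} y^{\beta-1} \bar F(x, y)\, dy\, dx.
\end{aligned}$$

This proves the lemma. □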
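Specializing to $\alpha = 0$, $L_1 = L_i$, and $L_2 = 0$, we have a result concerning the resulting population that would survive a proof load of $L_i$ units. In particular,

$$E(Y^\beta \mid X > L_i) = \frac{1}{\bar F_X(L_i)} E[Y^\beta I[X > L_i]] = \frac{1}{\bar F_X(L_i)} \int_0^\infty \beta y^{\beta-1} \bar F(L_i, y)\, dy \qquad (4.2)$$

for $\beta > 0$. Expression (4.2) suggests the following intuitive estimator for $E(Y^\beta \mid X > L_i)$ based on the empirical joint survival function

$$\hat{\bar F}(L_i, y) = \sum_{k=1}^{n_i} I[X_{ik} > L_i, Y_{ik} > y] / n_i$$

and any estimator of the marginal survival function $\bar F_X(L_i)$. We write $\bar F_i$ for $\bar F_X(L_i)$ and $\hat{\bar F}_i$ for $\hat{\bar F}_X(L_i)$. Then

$$\hat E(Y^\beta \mid X > L_i) = \frac{1}{\hat{\bar F}_i} \int_0^\infty \beta y^{\beta-1} \hat{\bar F}(L_i, y)\, dy$$

or

$$\hat E(Y^\beta \mid X > L_i) = \frac{1}{n_i \hat{\bar F}_i} \sum_{k=1}^{n_i} I[X_{ik} > L_i] \int_0^\infty \beta y^{\beta-1} I[Y_{ik} > y]\, dy = \frac{1}{n_i \hat{\bar F}_i} \sum_{k=1}^{n_i} Y_{ik}^\beta I[X_{ik} > L_i]. \qquad (4.3)$$

In the remainder of our development, we choose the empirical estimator $n_i \hat{\bar F}_i = \sum_{k=1}^{n_i} I[X_{ik} > L_i]$ in (4.3). This simplifies the large sample distribution of the estimator $\hat E(Y^\beta \mid X > L_i)$.

Theorem 4.1. Suppose that $\bar F_{i-1} \ne \bar F_i$ and $n = \sum_{i=1}^{m} n_i$, $n_i / n \to \lambda_i$ $(0 < \lambda_i < 1)$, as $\min_i n_i \to \infty$. Then

$$\sqrt{n}\, \bigl[ \hat E(Y^\beta \mid L_{i-1} < X < L_i) - E(Y^\beta \mid L_{i-1} < X < L_i) \bigr] \xrightarrow{d} N\bigl(0,\ \mathrm{AVar}(\hat E(Y^\beta \mid L_{i-1} < X < L_i))\bigr),$$

where

$$\begin{aligned}
\mathrm{AVar}\bigl(\hat E(Y^\beta \mid L_{i-1} < X < L_i)\bigr) ={}& \frac{\mathrm{Var}(Y^\beta I[X > L_{i-1}])}{\lambda_{i-1} (\bar F_{i-1} - \bar F_i)^2} + \frac{\bar F_{i-1}(1 - \bar F_{i-1})\, E^2(Y^\beta I(L_{i-1} < X < L_i))}{\lambda_{i-1} (\bar F_{i-1} - \bar F_i)^4} \\
& - 2\, \frac{(1 - \bar F_{i-1})\, E(Y^\beta I(L_{i-1} < X < L_i))\, E(Y^\beta I(X > L_{i-1}))}{\lambda_{i-1} (\bar F_{i-1} - \bar F_i)^3} \\
& + \frac{\mathrm{Var}(Y^\beta I[X > L_i])}{\lambda_i (\bar F_{i-1} - \bar F_i)^2} + \frac{\bar F_i (1 - \bar F_i)\, E^2(Y^\beta I(L_{i-1} < X < L_i))}{\lambda_i (\bar F_{i-1} - \bar F_i)^4} \\
& - 2\, \frac{(1 - \bar F_i)\, E(Y^\beta I(L_{i-1} < X < L_i))\, E(Y^\beta I(X > L_i))}{\lambda_i (\bar F_{i-1} - \bar F_i)^3}. \qquad (4.4)
\end{aligned}$$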
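Proof. Under the assumptions on the sample sizes, the Central Limit Theorem establishes that

$$\sqrt{n_i} \begin{bmatrix} \hat{\bar F}_i - \bar F_i \\ \hat E(Y^\beta I(X > L_i)) - E(Y^\beta I(X > L_i)) \end{bmatrix} \xrightarrow{d} N(0, \Sigma(\beta, L_i)),$$

where

$$\Sigma(\beta, L_i) = \begin{bmatrix} \bar F_i (1 - \bar F_i) & (1 - \bar F_i)\, E(Y^\beta I(X > L_i)) \\ (1 - \bar F_i)\, E(Y^\beta I(X > L_i)) & \mathrm{Var}(Y^\beta I(X > L_i)) \end{bmatrix}$$

for any i, $1 \le i \le m$. Since the samples at proof loads $L_{i-1}$ and $L_i$ are independent, we conclude that

$$\sqrt{n} \begin{bmatrix} \hat{\bar F}_{i-1} - \bar F_{i-1} \\ \hat E(Y^\beta I(X > L_{i-1})) - E(Y^\beta I(X > L_{i-1})) \\ \hat{\bar F}_i - \bar F_i \\ \hat E(Y^\beta I(X > L_i)) - E(Y^\beta I(X > L_i)) \end{bmatrix} \xrightarrow{d} N(0, \Sigma(\beta, L_{i-1}, L_i)),$$

where

$$\Sigma(\beta, L_{i-1}, L_i) = \begin{bmatrix} \lambda_{i-1}^{-1} \Sigma(\beta, L_{i-1}) & 0 \\ 0 & \lambda_i^{-1} \Sigma(\beta, L_i) \end{bmatrix}.$$

In order to obtain the variance of the estimator $\hat E(Y^\beta \mid L_{i-1} < X < L_i)$, we consider the transformation

$$g(x_1, x_2, x_3, x_4) = \frac{x_2 - x_4}{x_1 - x_3},$$

which maps the four sample quantities above to the estimator, and define the vector

$$\mu^T = [\bar F_{i-1},\ E(Y^\beta I[X > L_{i-1}]),\ \bar F_i,\ E(Y^\beta I[X > L_i])],$$

so that $g(\mu) = E(Y^\beta \mid L_{i-1} < X < L_i)$. Since $\bar F_{i-1} \ne \bar F_i$, the transformation g is differentiable at $\mu$ with partial derivatives

$$\frac{\partial g(\mu)}{\partial x_1} = -\frac{E(Y^\beta I[L_{i-1} < X < L_i])}{(\bar F_{i-1} - \bar F_i)^2}, \qquad \frac{\partial g(\mu)}{\partial x_2} = \frac{1}{\bar F_{i-1} - \bar F_i},$$

$$\frac{\partial g(\mu)}{\partial x_3} = \frac{E(Y^\beta I[L_{i-1} < X < L_i])}{(\bar F_{i-1} - \bar F_i)^2}, \qquad \frac{\partial g(\mu)}{\partial x_4} = -\frac{1}{\bar F_{i-1} - \bar F_i}.$$

Equation (4.4) follows easily by the $\delta$-method and the proof is complete. □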
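The estimator of Theorem 4.1 and a plug-in version of its large sample variance can be sketched as follows. This is an illustrative sketch under stated assumptions: each sample is a list of hypothetical `(z, delta)` pairs coded as in (4.1), the function names are invented for this example, and the returned variance is the $\delta$-method quadratic form with population quantities replaced by sample analogues and $\lambda_i n$ replaced by $n_i$.

```python
def survivor_summary(sample, beta):
    """Summaries of one proof load sample of (z, delta) pairs.

    delta = 1 means the unit failed in mode 1 (z is then its mode 1
    strength); delta = 0 means it survived (z is its mode 2 strength).
    Returns n, the survivor fraction, and the mean and variance of
    Y^beta * I[X > L].
    """
    n = len(sample)
    terms = [(z ** beta) * (1 - delta) for z, delta in sample]
    fbar = sum(1 - delta for _, delta in sample) / n
    mean = sum(terms) / n
    var = sum(t * t for t in terms) / n - mean ** 2
    return n, fbar, mean, var


def interval_moment(sample_lo, sample_hi, beta):
    """Estimate E(Y^beta | L_lo < X < L_hi) and its approximate variance.

    Raises ZeroDivisionError when both samples have the same survivor
    fraction, mirroring the condition Fbar_{i-1} != Fbar_i.
    """
    n1, f1, e1, v1 = survivor_summary(sample_lo, beta)
    n2, f2, e2, v2 = survivor_summary(sample_hi, beta)
    d = f1 - f2            # estimate of Fbar_{i-1} - Fbar_i
    m = e1 - e2            # estimate of E(Y^beta I[L_{i-1} < X < L_i])
    estimate = m / d

    def block(n, f, e, v):
        # Delta-method contribution of one independent sample.
        return (m * m / d ** 4 * f * (1 - f) + v / d ** 2
                - 2 * m / d ** 3 * (1 - f) * e) / n

    return estimate, block(n1, f1, e1, v1) + block(n2, f2, e2, v2)
```

With very few units per proof load the variance estimate is unstable, so the sketch is meant to show the structure of (4.4) rather than to serve as an inference tool.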
The estimator in Theorem 4.1 can be used to develop approximations to the mixed moments. For a fixed number of proof loads m, and with the convention $L_0 = 0$, we have that

$$E(X^\alpha Y^\beta) = \sum_{i=0}^{m-1} E(X^\alpha Y^\beta I[L_i < X < L_{i+1}]) + E(X^\alpha Y^\beta I[X > L_m]). \qquad (4.5)$$

Numerically, we can approximate this expectation over the grid of proof loads $\mathbb{L}_m$ by

$$E_{\mathbb{L}_m}(X^\alpha Y^\beta) = \sum_{i=0}^{m-1} L_i^\alpha E(Y^\beta I[L_i < X < L_{i+1}]) + L_m^\alpha E(Y^\beta I[X > L_m])$$

and, if the proof load grid is fine enough, the approximation will be quite good. Therefore, for m fixed proof loads, we obtain an estimate

$$\hat E_{\mathbb{L}_m}(X^\alpha Y^\beta) = \sum_{i=0}^{m-1} L_i^\alpha \hat E(Y^\beta I[L_i < X < L_{i+1}]) + L_m^\alpha \hat E(Y^\beta I[X > L_m]) \qquad (4.6)$$

where

$$\hat E(Y^\beta I[L_i < X < L_{i+1}]) = \hat E(Y^\beta I[X > L_i]) - \hat E(Y^\beta I[X > L_{i+1}])$$

and, from (4.2),

$$\hat E(Y^\beta I[X > L_i]) = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}^\beta I[X_{ij} > L_i],$$

where $(X_{ij}, Y_{ij})$, j = 1, 2, ..., $n_i$, are the observations at the ith proof load $L_i$ for i = 1, ..., m. The i = 0 term in (4.6) vanishes when $\alpha > 0$ because $L_0 = 0$.

This estimate should provide a good estimate of the mixed moments when the number of proof loads is large and they are well spaced in the support of the joint distribution. We have shown, under some regularity conditions, that the estimator $\hat E_{\mathbb{L}_m}(X^\alpha Y^\beta)$ is consistent as the set of proof loads approaches a dense set and the minimum $n_i$ goes to infinity. For a fixed set of m proof loads, it follows rather directly that the estimator (4.6) is asymptotically normal provided that $E(X^{2\alpha} Y^{2\beta}) < \infty$. Additional regularity conditions are needed in order to establish that the limiting mean $E_{\mathbb{L}_m}(X^\alpha Y^\beta)$ can be replaced by $E(X^\alpha Y^\beta)$ as $\mathbb{L}_m$ increases to a dense set. An estimate of the correlation is obtained directly from the estimates of the mixed moments.
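A minimal Python sketch of the grid estimate (4.6) follows, assuming $\alpha > 0$ so that the term at $L_0 = 0$ drops out and only the samples at $L_1, \ldots, L_m$ are needed. The function names and the `(z, delta)` data layout (coded as in (4.1)) are assumptions made for this illustration.

```python
def survivor_moment(sample, beta):
    """(1/n_i) * sum of Y^beta * I[X > L_i] from (z, delta) pairs.

    delta = 0 marks a survivor, whose recorded z is its mode 2 strength.
    """
    return sum((z ** beta) * (1 - delta) for z, delta in sample) / len(sample)


def mixed_moment(loads, samples, alpha, beta):
    """Grid estimate of E(X^alpha Y^beta) over increasing proof loads.

    loads   : [L_1, ..., L_m] in increasing order
    samples : samples[i] is the list of (z, delta) pairs observed at loads[i]
    Requires alpha > 0, so the interval (0, L_1) contributes nothing.
    """
    if alpha <= 0:
        raise ValueError("this sketch assumes alpha > 0")
    e = [survivor_moment(s, beta) for s in samples]
    # Tail term: L_m^alpha * E-hat(Y^beta I[X > L_m]).
    total = loads[-1] ** alpha * e[-1]
    # Interval terms: L_i^alpha * (E-hat at L_i minus E-hat at L_{i+1}).
    for i in range(len(loads) - 1):
        total += loads[i] ** alpha * (e[i] - e[i + 1])
    return total
```

Because each X in an interval is replaced by the left endpoint $L_i$, the estimate tends to understate the moment on a coarse grid; a finer grid reduces this effect, in line with the consistency statement above.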
References

Bartlett, N.R. and T. Lwin (1984), Estimating a relationship between different destructive tests on timber, Appl. Statist. 33, 65-72.
De Amorim, S.D. and R.A. Johnson (1986), Experimental designs for estimating the correlation between two destructively tested variables, J. Amer. Statist. Assoc. 81, 807-812.
Evans, J.W., R.A. Johnson and D.W. Green (1984), Estimating the correlation between variables under destructive testing, or how to break the same board twice, Technometrics 26, 285-290.
Galligan, W.L., R.A. Johnson and J.R. Taylor (1981), Examination of the concomitant properties of lumber, Proc. of the Metal Plate Wood Truss Conf. (1979) (Forest Products Laboratory, Madison, WI).
Johnson, R.A. (1980), Current statistical methods for estimating lumber properties by proofloading, Forest Products J. 30, 14-22.
Johnson, R.A. and W.L. Galligan (1983), Estimating the concomitance of lumber strength properties, Wood and Fiber Sci. 15, 235-244.
Suddarth, S.K., F.E. Woeste and W.L. Galligan (1978), Probabilistic engineering applied to wood members in bending/tension, Res. Paper FPL 302, U.S. Dept. of Agriculture, Forest Service, Forest Products Laboratory (Madison, WI).