Engineering Fracture Mechanics Vol. 31, No. 2, pp. 221-235, 1988. Printed in Great Britain.
0013-7944/88 $3.00 + .00 © 1988 Pergamon Press plc.
SUGGESTED PROCEDURES FOR THE ANALYSIS OF MULTI-SPECIMEN FRACTURE TOUGHNESS TEST DATA

R. MOSKOVIC and P. L. WINDLE

Scientific and Technical Branch, OED South, CEGB, Gravesend, Kent, U.K.
Abstract-The analysis of a single set of fracture toughness data, in the form of J estimates and crack extension $\Delta a$, for the prediction of mean and lower bound values is often hampered by insufficient sample size. This report outlines methods based on multiple regression techniques for the analysis of a combination of data from several sources to overcome this difficulty. In the context of a multiple regression approach, the analysis of data to identify temperature dependence and for $\Delta a > 2$ mm is also discussed. Techniques for monitoring the regression models for discrepancies with the data are presented.
1. INTRODUCTION

Multi-specimen fracture toughness testing provides data in the form of J-integral estimates ($J$) and crack growth measurements ($\Delta a$). The analysis of these data must aim to provide valid, unbiased and efficient (i.e. low variance) predictions of the requisite toughness parameters for defect assessment. These parameters are $J_{0.2}$, an engineering value of the initiation toughness, $J_g$, the maximum valid J at which crack growth is J-controlled, and $dJ/da$, the crack growth resistance. Furthermore, to satisfy the current need for lower bound estimates and the possible future requirements of a Probabilistic Fracture Mechanics assessment, the analysis should also obtain an estimate of the distributional form of the population of toughness predictions.

In the present procedures[1, 2], the estimation of fracture toughness parameters is based on an unweighted regression analysis of a single set of $J/\Delta a$ data. Such analyses often cannot provide all the required information, with the definition of realistic lower bound values and population estimates being particularly problematic.

In this report, new methods appropriate to the analysis of $J/\Delta a$ multi-specimen data with the aim of predicting toughness will be recommended. These methods will include techniques through which the database can be increased to facilitate predictions of population parameters and lower bound values, multiple regression approaches to include further independent variables such as temperature ($T$), and a technique for combining the toughness predictions for several sources. The report also describes standard diagnostic techniques for testing the assumptions of the regression model.

2. CHOICE OF REGRESSION MODEL
ASTM[1] and CEGB[2] procedures currently recommend different methods of analysis of multi-specimen fracture toughness data. ASTM gives a J on $\Delta a$ regression analysis with J the dependent variable and $\Delta a$ the independent variable. The CEGB procedure suggests that, for crack growth in the range of 0.2 mm to 2 mm, the sum of the squares of the perpendicular distances from the observations to the regression line should be minimized and, to allow for curvature in the resistance curve at long crack lengths when $\Delta a > 2$ mm, defines a power law curve through the data. It is clearly important to establish that the choice of regression analysis and curve fitting does give a best fit line to the data, which satisfies the paramount aim of providing unbiased toughness predictions with minimum variance. Therefore, before expanding the process of analysis to include further independent variables and techniques for estimating lower bound values, various analytic approaches including the recommended procedures will be reviewed and tested, where possible, to ascertain the most appropriate method.
2.1. Regression analysis

In order to obtain a regression equation which can predict J values, either the ASTM recommended J on $\Delta a$ regression, which minimizes the sum of the squares of the differences between the measured and predicted values of J, or the CEGB recommended minimization of the sum of the squares of the perpendicular distances must be used. The regression analysis would give:

$$\hat{J} = \hat{\alpha} + \hat{\beta}\,\Delta a, \qquad (1)$$
where the ^ symbol refers to the regression estimate of the parameter. For the regression model of eq. (1), the errors in each observation of the dependent variable are assumed to be statistically independent and uncorrelated with each other, and the error distribution is assumed to have constant variance (i.e. to be homoscedastic) at all points, as shown schematically in Fig. 1. Any error in the independent variable, $\Delta a$, will introduce bias into the estimate of the slope $\beta$. In reality, a J on $\Delta a$ regression obtained by minimizing the differences between the measured and predicted J values will produce unbiased estimates of the parameter[3]:

(2)

where $\sigma_e^2$ is the variance in $\Delta a$ due to error and $\sigma_{\Delta a}^2$ is determined by the range of $\Delta a$ values and is given by $[\sum(\Delta a - \overline{\Delta a})^2]/(N - 1)$, where $\overline{\Delta a}$ is the mean value of $\Delta a$ and N is the number of data points. Hence in this case a further assumption is made that errors in $\Delta a$ are small compared with the range of crack length values. In consequence, it is argued that little bias is introduced into the estimates of $\beta$, supporting the case for a J on $\Delta a$ regression.

For the regression model used as the basis of the CEGB recommended approach[2], the assumptions of statistically independent and uncorrelated errors and homoscedasticity are made for the errors in both the dependent and independent variables. For such a model, it is also necessary to make an assumption as to the relative magnitude of the variances of the error distributions in each variable. In the CEGB procedure, the variance of the error distribution for J is assumed equal to that for the error distribution for $\Delta a$. Choosing this procedure introduces complications into the use of multiple regression techniques and the process of estimating lower bound values. For example, it is only feasible to obtain a confidence region for the regression line but not a prediction limit for a particular point along the line. In view of this and on consideration of the assumptions made, it is concluded that some form of a J on $\Delta a$ regression should be carried out.

The particular form that the J on $\Delta a$ regression takes, i.e. whether weighted or unweighted, depends on the family of resistance curves from which the sample of observations is drawn[4]. By testing the regression model against the model assumptions, the need for a weighted, as opposed to an unweighted, regression can be identified. This testing is based on an analysis of the residuals, which are defined by the difference between the measured and predicted J values.

Fig. 1. Statistical model for unweighted regression analysis. [Schematic plot of J against $\Delta a$ showing the regression line; note: J variance is constant.]
2.2. Analysis of residuals

Before using the regression analysis to predict toughness values, it is clearly important to assess the assumptions of the regression model used against the data analysed for indications which may lead to a better approach. Several standard techniques are available for this and they are based on the analysis of residuals. It is recommended that these tests be applied. If the fitted model does not produce a set of residuals that is considered acceptable then some aspect of the model will need modification. The form of the model currently assumed in the analysis of multi-specimen fracture toughness data is as represented in Fig. 1. It should be noted that use of these diagnostic techniques with small sample sizes may not be helpful and, where possible, measures to increase the data base should be used, for example multiple regression techniques.

For a direct comparison of the residuals, it is useful to calculate the studentized residual, $r_i$, from:

$$r_i = e_i/[\mathrm{Var}(e_i)]^{1/2}, \qquad (3)$$

where $e_i = J_i - \hat{J}_i$ and

$$\mathrm{Var}(e_i) = s^2(1 - v_{ii}), \qquad (4)$$

with

$$s^2 = \frac{\sum e_i^2}{N - 2} \qquad (5)$$

and, for the simple linear regression of J on $\Delta a$[5],

$$v_{ii} = \frac{1}{N} + \frac{(\Delta a_i - \overline{\Delta a})^2}{\sum(\Delta a_i - \overline{\Delta a})^2}. \qquad (6)$$
A plot of the values of $r_i$ against the fitted values ($\hat{J}_i$) of the dependent variable provides a helpful method of quickly checking the statistical properties of the regression model for the following:

2.2.1. Residuals are randomly scattered and uncorrelated. Testing for randomness in the residuals is based on the statistic

$$d^* = \sum (e_{i+1} - e_i)^2 \Big/ \sum (e_i - \bar{e})^2$$

with $\bar{e} = \sum e_i/N$. If the hypothesis of randomness is correct then the distribution of $d^*$ will have a mean of 2 and a variance of:

$$\mathrm{Var}(d^*) = (N - 2)/N^2.$$

To obtain the probability of having a certain value of $d^*$, and hence randomness, the value of $(d^* - 2)/\sqrt{\mathrm{Var}(d^*)}$ is compared with the tabulated values of the cumulative probabilities of the standard normal deviate.
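The residual calculations of eqs (3) to (6) and the randomness statistic can be sketched in a few lines. The following is a minimal illustration rather than the authors' implementation: it fits the simple linear model to the 100 mm CT data as listed in Table 1, and it assumes the variance $(N - 2)/N^2$ for $d^*$, a form that is consistent with the probability quoted for the linear model in Table 3. The function names are hypothetical, and the data are taken in order of increasing crack extension.

```python
import math

# 100 mm CT data as listed in Table 1 (J in kN/m, crack extension in mm),
# ordered by increasing crack extension.
J = [343.0, 486.0, 466.0, 745.0, 900.0, 1221.0, 1332.0]
DA = [0.23, 0.425, 0.44, 1.15, 2.12, 2.95, 4.23]


def studentized_residuals(x, y):
    """Fit J = alpha + beta*da by least squares; return (e, v, r)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_mean - beta * x_mean
    e = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]          # residuals
    s2 = sum(ei ** 2 for ei in e) / (n - 2)                         # eq. (5)
    v = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]            # eq. (6)
    r = [ei / math.sqrt(s2 * (1.0 - vi)) for ei, vi in zip(e, v)]   # eqs (3), (4)
    return e, v, r


def randomness_test(e):
    """Return d* and the upper-tail normal probability of its standardized value."""
    n = len(e)
    e_mean = sum(e) / n
    num = sum((e[i + 1] - e[i]) ** 2 for i in range(n - 1))
    den = sum((ei - e_mean) ** 2 for ei in e)
    d_star = num / den
    var_d = (n - 2) / n ** 2                       # assumed form of Var(d*)
    z = (d_star - 2.0) / math.sqrt(var_d)
    prob = 0.5 * math.erfc(z / math.sqrt(2.0))     # P(Z >= z), standard normal
    return d_star, prob


e, v, r = studentized_residuals(DA, J)
d_star, prob = randomness_test(e)
print(f"d* = {d_star:.2f}, probability of randomness = {100 * prob:.1f}%")
```

Because $d^*$ is built from successive differences, the result depends on the order in which the residuals are taken; sorting by $\Delta a$ (or, equivalently for a positive slope, by fitted value) is the natural choice here.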
2.2.2. Independence of residuals and fitted values. A test of independence is provided by the statistic

$$\sum e_i \hat{J}_i.$$
A value of zero confirms independence and a large finite value indicates an incorrect model. Critical values for this statistic are not defined in statistical texts; however, values greater than unity may be taken as indicating an unacceptable model.

2.2.3. Variation in the variance of $J_i$ along the regression line. If the assumption of a constant variance (homoscedasticity) is correct then a plot of the residuals $r_i$ against the independent variable, $\Delta a$, of the form shown in Fig. 2(a) will be obtained. Any tendency for heterogeneity of variance of the types shown in Figs 2(b) and (c) indicates a need for weighted regression. Physical processes which could lead to such heterogeneity are discussed in [4], as are suitable weighting functions. For heteroscedasticity of the form shown in Fig. 2(b), the weighting function $(1/\hat{J}_i^2)/\sum(1/\hat{J}_i^2)$ is recommended, whilst for variation in variance of the form shown in Fig. 2(c) the weighting function should be $(\Delta a_i)/\sum(\Delta a_i)$.

Fig. 2. Residuals plots. [Schematic plots of residuals against $\Delta a$: (a) constant variance; (b) variance increasing with $\Delta a$; (c) variance decreasing with increasing $\Delta a$.]

2.2.4. Data which fail to conform with the model whilst the bulk appear to do so, i.e. outliers. Those data with large $r_i$ values (i.e. $r_i > 2$) should be examined with a view to their being outliers; the conditions under which such data were determined should be scrutinized for indicators which may lead to their exclusion from the analysis. It should be noted that exclusion of a data point is only necessary if the residual is large and the data have a significant effect on the regression parameters. A statistic useful for estimating the influence that data have on regression estimates is Cook's distance, $D_i$, given by:

$$D_i = \frac{1}{p}\, r_i^2\, \frac{v_{ii}}{(1 - v_{ii})}$$
with $v_{ii}$ defined in eq. (6) and where p is the number of parameters estimated in the regression[4]. This is a measure of the difference between the values of the regression parameters calculated both with and without the ith data point. Data with both large $r_i$ and $D_i$ (i.e. both values > 2) are candidates for exclusion from the analysis and should be re-examined for evidence which might support their exclusion. Even if no physical reason is apparent for the occurrence of an outlier, exclusion from the regression is recommended because of its large influence on the predictions.

Finally, having decided on the appropriate data for analysis and having carried out a suitable regression procedure, a measure of how well the most significant regression equation fits the data is required. This goodness of fit of the regression model is indicated by $R^2$, where R is the correlation coefficient, since it reveals what proportion of the variation of the dependent variable is accounted for by the fitted regression equation. Conversely, $(1 - R^2)$ is the proportion of the variation which is unexplained by the regression equation, and this is due either to experimental error or to mis-specification of the model.

2.3. Curve fitting

The CEGB procedure for the determination of the fracture resistance of a ductile steel[2] recommends that a power law curve be fitted if $\Delta a_{\max}$ is greater than 2 mm. The regression model for the power law fit is non-linear in the regression parameters and has the form:

$$J = C\,\Delta a^{m}, \qquad (10)$$
where C and m are the parameters determined by regression analysis. In order to fit the model given by eq. (10) it is necessary to transform this equation into the linear form:

$$\ln J = \ln C + m \ln \Delta a. \qquad (11)$$
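As an illustration of the log-log transformation, the sketch below fits the power law to the 100 mm CT data as listed in Table 1 by simple linear regression of ln J on ln $\Delta a$. It is a minimal example under that assumption only; the function name `fit_power_law` is hypothetical, and C is obtained by back-transforming the fitted intercept with no bias correction applied.

```python
import math

# 100 mm CT data as listed in Table 1 (J in kN/m, crack extension in mm).
J = [343.0, 486.0, 466.0, 745.0, 900.0, 1221.0, 1332.0]
DA = [0.23, 0.425, 0.44, 1.15, 2.12, 2.95, 4.23]


def fit_power_law(da, j):
    """Least-squares fit of ln J = ln C + m ln(da); returns (C, m)."""
    lx = [math.log(value) for value in da]
    ly = [math.log(value) for value in j]
    n = len(lx)
    lx_mean = sum(lx) / n
    ly_mean = sum(ly) / n
    sxx = sum((xi - lx_mean) ** 2 for xi in lx)
    sxy = sum((xi - lx_mean) * (yi - ly_mean) for xi, yi in zip(lx, ly))
    m = sxy / sxx
    ln_c = ly_mean - m * lx_mean
    return math.exp(ln_c), m       # back-transformed C, exponent m


C, m = fit_power_law(DA, J)
print(f"J = {C:.0f} * da^{m:.3f}")
```

The least-squares criterion is applied here to ln J rather than to J itself, so the fitted curve does not minimize the sum of squared residuals in J.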
This equation is linear in the regression coefficients, which can be estimated by linear regression analysis using the least squares method. The main drawback of the log-log transformation is that it leads to a biased estimate of parameter C[6]. Hence, as an alternative, it is proposed that the fracture toughness data should be analysed by multiple regression and, in order to take account of possible non-linearity in the data associated with large values of crack extension, that the regression model has the form of a polynomial equation:

$$J = m_0 + m_1\Delta a + m_2\Delta a^2 + \cdots + m_i\Delta a^i + \cdots + m_n\Delta a^n, \qquad (12)$$

where $m_0$ (the value of J associated with zero crack growth) is the intercept and $m_1, m_2, \ldots, m_i, \ldots, m_n$ are the remaining coefficients, which are estimated by the regression analysis. Equation (12) provides the best means for unbiased prediction of the initiation fracture toughness and crack growth resistance.

There are many standard texts which cover the development of a multiple regression analysis. For completeness the procedure will be briefly described here and is based on Draper and Smith[7]. In order to develop the best fit regression model it is necessary to select, for insertion in the model, only those variables which are significant. The procedures for selection of variables in regression models are described by Draper and Smith[7]. The most relevant of these, for the analysis of fracture toughness data, are the methods of forward selection and backward elimination. These two methods can be combined. The former can be used, initially, to identify the important variables, and these are then inserted in the regression model. The latter can be used to remove redundant variables from the final regression model.

The method of forward selection involves adding variables to the regression model sequentially until the increase in the regression sum of squares (RSS) due to the inclusion of an additional variable is no longer statistically significant. At each stage in the regression analysis, the significance of a variable is tested by an F-test using the F ratio of (regression mean square/residual mean square). If this mean square ratio exceeds the critical F value then a statistically significant regression has been obtained.
As the first step a simple linear regression, with a single independent variable, is carried out for each of the variables and the variable which brings about the largest RSS is inserted first in the regression model. At each subsequent step a regression is performed for each variable not yet in the regression model, in the presence of those variables already included in the model, and the variable which contributes most to the increase in the RSS is inserted next in the model. The effectiveness of the variables in the final regression equation is tested further by using the method of backward elimination: certain variables are omitted from the regression model and the significance of this omission is tested. By this process redundant variables are removed from the regression equation. The removed variables are those which may have entered the regression equation at an early stage and may have become correlated with other variables entering the regression model at later stages. A case study illustrating the forward selection procedure is given below.

2.3.1. Curve fitting by multiple regression method: case study. The data to be analysed, given in Table 1, were obtained by testing 100 mm sized compact tension specimens of a low alloyed steel[8]. The variables considered for insertion in the regression model are those in eq. (12) and the results obtained in each step of the regression analysis are given in Table 2.

Step 1: Select the first variable for inclusion. Test statistics for regression equations with one variable are presented in Table 2. The highest RSS is obtained with $\Delta a$ and the corresponding F-value is greater than the critical values at both the 1% and 5% levels of significance. Note that $(\Delta a)^2$ also provides a significant F-value but has a smaller RSS. Hence $\Delta a$ is the first variable chosen for inclusion and the subsequent regression analysis gives:

$$\hat{J} = 374 + 249\Delta a \qquad (13)$$

with $\hat{J}$ in kN/m and $\Delta a$ in mm.

Step 2: Select a second variable for inclusion. Table 2 presents three regression equations with two independent variables in each. The RSS for all models are similar; however, in each case the F-value is lower than the critical values. Hence none of the models with a second variable included provides a statistically significant regression and the best fit regression model is given by eq. (13). However, as the aim is to fit a curve to the data, it is pertinent to quote the most appropriate regression model with two independent variables, i.e.:

$$\hat{J} = 289 + 406\Delta a - 37\Delta a^2. \qquad (14)$$

Note that a further step to include a third independent variable produces little change in RSS and no significant regression model. It will be instructive to compare these regression models with one based on the previously mentioned power law fit recommended in the CEGB procedure[2]. The power law fit, derived using a log J/log $\Delta a$ regression analysis of the data, is:

$$\hat{J} = 692.3\,\Delta a^{0.462}. \qquad (15)$$

Table 1. Fracture toughness data base for multiple regression analysis[8]

Specimen size    J (kN/m)    $\Delta a$ (mm)    Dummy variables
                                                $Z_1$    $Z_2$
20 mm CT         295         0.2                0        0
20 mm CT         305         0.31               0        0
20 mm CT         355         0.26               0        0
20 mm CT         383         0.36               0        0
20 mm CT         436         0.43               0        0
20 mm CT         486         0.9                0        0
40 mm CT         279         0.24               1        0
40 mm CT         321         0.28               1        0
40 mm CT         362         0.47               1        0
40 mm CT         505         0.68               1        0
40 mm CT         545         0.99               1        0
40 mm CT         676         1.316              1        0
40 mm CT         665         0.87               1        0
100 mm CT        343         0.23               0        1
100 mm CT        486         0.425              0        1
100 mm CT        466         0.44               0        1
100 mm CT        745         1.15               0        1
100 mm CT        900         2.12               0        1
100 mm CT        1221        2.95               0        1
100 mm CT        1332        4.23               0        1

Table 2. Curve fitting of a set of fracture toughness data by multiple regression method (100 mm CT specimen data from Table 1)

Step   Terms in the regression model                     Term contributing to     Sum of squares           Degrees of freedom       Computed† test   Critical values for F test   % fit
                                                         the increase in the      Regression   Residual    Regression   Residual    statistic F      1% level     5% level
                                                         regression sum of squares
1      $J_0 + \Delta a$                                  $\Delta a$               0.85398      0.03667     1            5           116.5            16.26        6.61        95.9
1      $J_0 + \Delta a^2$                                $\Delta a^2$             0.73448      0.15618     1            5           23.5             16.26        6.61        82.5
1      $J_0 + \Delta a^3$                                $\Delta a^3$             0.61597      0.27468     1            5           11.2             16.26        6.61        69.1
1      $J_0 + \Delta a^4$                                $\Delta a^4$             0.47186      0.41879     1            5           5.63             16.26        6.61        53.0
2      $J_0 + \Delta a + \Delta a^2$                     $\Delta a^2$             0.02274      0.01393     1            4           6.53             21.20        7.71        98.4
2      $J_0 + \Delta a + \Delta a^3$                     $\Delta a^3$             0.02252      0.01415     1            4           6.29             21.20        7.71        98.4
2      $J_0 + \Delta a + \Delta a^4$                     $\Delta a^4$             0.02260      0.01407     1            4           6.43             21.20        7.71        98.4
3      $J_0 + \Delta a + \Delta a^2 + \Delta a^3$        $\Delta a^3$             0.00002      0.01391     1            3           0.004            34.12        10.13       98.4
3      $J_0 + \Delta a + \Delta a^2 + \Delta a^4$        $\Delta a^4$             0.00021      0.01372     1            3           0.05             34.12        10.13       98.5

†F = regression mean square/residual mean square, where mean square = sum of squares/degrees of freedom.

Fig. 3. Linear, polynomial and power law crack growth resistance curves for 100 mm CT specimen data. [Plot of J against $\Delta a$ (mm) showing the data and the curves $J = 374 + 249\Delta a$, $J = 289 + 406\Delta a - 37\Delta a^2$ and $J = 692\Delta a^{0.462}$.]
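The forward selection steps of the case study can be sketched as follows. This is a minimal illustration, not the procedure of Draper and Smith[7] in full: it uses the 100 mm CT data with the three-decimal crack extensions listed in Table 5, takes the 5% critical F values from Table 2 as a lookup keyed by residual degrees of freedom, and solves the normal equations with a small hand-written Gaussian elimination. The helper names (`solve`, `fit`, `forward_selection`) are hypothetical, and backward elimination is not included.

```python
# 100 mm CT data (J in kN/m; crack extension in mm, three decimals as in Table 5).
J = [343.0, 486.0, 466.0, 745.0, 900.0, 1221.0, 1332.0]
DA = [0.227, 0.425, 0.442, 1.151, 2.119, 2.951, 4.227]

# 5% critical F values from Table 2, keyed by residual degrees of freedom.
F_CRIT_5 = {5: 6.61, 4: 7.71, 3: 10.13}


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit(powers, x, y):
    """Fit J = m0 + sum(m_k * da**k for k in powers); return (coefficients, residual SS)."""
    rows = [[1.0] + [xi ** k for k in powers] for xi in x]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    resid = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(rows, y))
    return coef, resid


def forward_selection(x, y, candidates, f_crit):
    """Add the power giving the largest gain in regression SS while its F is significant."""
    selected, history = [], []
    _, resid = fit(selected, x, y)                 # residual SS about the mean
    remaining = list(candidates)
    while remaining:
        trials = {k: fit(selected + [k], x, y)[1] for k in remaining}
        best = min(trials, key=trials.get)         # smallest residual SS = largest gain
        df = len(x) - (len(selected) + 2)          # residual degrees of freedom
        f_value = (resid - trials[best]) / (trials[best] / df)
        history.append((best, f_value))
        if df not in f_crit or f_value <= f_crit[df]:
            break                                  # no significant improvement: stop
        selected.append(best)
        remaining.remove(best)
        resid = trials[best]
    return selected, history


selected, history = forward_selection(DA, J, [1, 2, 3, 4], F_CRIT_5)
print("powers retained:", selected)
for power, f_value in history:
    print(f"candidate da^{power}: F = {f_value:.2f}")
```

Each F value here is the partial F for the term being added, i.e. the gain in regression sum of squares (one degree of freedom) divided by the residual mean square of the enlarged model, which is how the step 2 and step 3 entries of Table 2 are formed.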
Figure 3 shows the experimental data and the three regression models of eqs (13), (14) and (15). A comparison of the models will be based on the statistical testing of the respective residuals; the results of these tests are given in Table 3. From Table 3 it is clear that, for these data, the power law fit should be rejected on the basis of the tests for independence and randomness, and the model given in eq. (14) on the basis of the randomness test, thus confirming the earlier conclusion that eq. (13) is the correct regression model. In reality, the power law fit is always likely to be rejected on the basis of these statistical tests carried out in $J/\Delta a$ space. Unless there are physical reasons for choosing a power law fit, the polynomial regression model is more appropriate for use in the analysis of data in which $\Delta a$ values exceed 2 mm.

3. ANALYSIS OF SEVERAL SETS OF FRACTURE TOUGHNESS DATA
Both the previously outlined diagnostic techniques based on the analysis of residuals and the estimation of the population of predicted toughness values are more effectively carried out for large sample sizes; a typical single set of 5 multi-specimen data pairs is insufficient. The need for an increased data base provides the impetus to treat several sets of multi-specimen data together using multiple regression analysis. With a view to defining lower bound values and population parameters for Probabilistic Fracture Mechanics assessments, data sets for which it is reasonable to assume the same form of error distribution could be taken together for analysis even though the differences in the respective crack growth resistance curves may be statistically significant. For example, such data sets could be made up of data from similar material but separate plates, or data from similar material but differing specimen geometry and temperature. Alternatively, the aim may be to test for the presence of statistically significant differences in the regression relationships between several data sets in order to indicate that there is a dependency on some factor, for example specimen size, geometry or temperature.

These situations can be dealt with within the framework of a multiple regression analysis[5, 7] with the factors included as further independent variables. It is clear that factors such as specimen size, plate type or weld type are not continuous variables, whereas temperature can be treated as either a continuous or a discrete variable. For instance, whenever a step change in the dependent variable occurs with temperature, as with strain ageing effects, then temperature must be treated as a discrete independent variable. If the extra factors for inclusion in the multiple regression analysis are of the non-continuous type then an analysis using dummy variables is required; otherwise the independent variables can be included directly. The decision either to include or exclude the dummy variable factors is based on the results of the tests for their statistical significance. If exclusion of the dummy variable factors is indicated then this implies that the data from the different sources can be analysed as one.

Table 3. Statistical testing of residuals for 100 mm CT specimen data

                                                                                 Statistical tests of residuals
Regression model             Fitted equation                                % Fit ($R^2$)   †Test of independence   ‡$d^*$   §Probability of randomness
Polynomial (linear)          $\hat{J} = 374 + 249\Delta a$                  95.9            0.00                    2.26     20.9%
Polynomial (second order)    $\hat{J} = 289 + 406\Delta a - 37\Delta a^2$   98.4            0.00                    3.22     <0.0001%
Power law                    $\hat{J} = 692\Delta a^{0.462}$                98.5            10.7                    3.22     <0.0001%

†Statistic for testing of independence = $\sum e_i \hat{J}_i$; a value of zero indicates independence of residuals and fitted values.
‡$d^* = \sum (e_{i+1} - e_i)^2/\sum (e_i - \bar{e})^2$, where $\bar{e} = \sum e_i/N$.
§Probability of randomness = Prob. ($d^* \geq 2$) = Prob. $Z \geq [(d^* - 2)/\sqrt{\mathrm{Var}(d^*)}]$, where Z is the value of the standard normal deviate. See Section 2.2 for discussion of this statistic.
To illustrate the use of multiple regression analysis involving dummy variables, a case study analysis of data from specimens of different sizes is given in the next section. For a further example, involving temperature dependence, see [9].

3.1. Case study for multiple regression analysis of fracture toughness data involving the use of dummy variables

The analysis will be illustrated using the data reported by Ingham et al.[8] for several sizes of test specimens (20, 40 and 100 mm) of a low alloyed steel. The data are given in Table 1 and plotted in Fig. 4. The multiple regression model to be fitted to these data will have to use dummy variables to uniquely identify each specimen size and must also include polynomial terms. Hence, the regression model will have the form:

$$J = \sum_{i=0}^{n} \sum_{j=0}^{p-1} m_{ij} Z_j \Delta a^{i}, \qquad (16)$$
where $m_{ij}$ are the regression coefficients, $Z_j$ the dummy variables, n is the highest exponent for $\Delta a$ and p is the number of levels in the factor. Note that, in this form, the first term in the series when expanded (i.e. $m_{00}Z_0$) is the value of the intercept, $J_0$, and $Z_0 = 1$. The data to be analysed, together with the dummy variables which uniquely define each specimen size, represented by $Z_1$ and $Z_2$, are shown in Table 1. In a similar manner to the step-by-step process used earlier for choosing independent variables for inclusion in the model, the most appropriate regression equation describing these data and based on eq. (16) is built up.

Step 1: Select the first independent variable for inclusion. A degree of judgement is necessary at this stage to keep the number of variables to a manageable level; for these data a maximum value of n = 3 was chosen. Hence J was regressed separately on $\Delta a$, $\Delta a^2$ and $\Delta a^3$ to find the regression giving the highest RSS with no dummy variables as yet involved. Having identified the first variable for inclusion (i.e. $\Delta a$), the dummy variables $Z_1$ and $Z_2$ can be included to test for differences in the data from each source. At this point, the model was effectively:

$$\hat{J} = J_0 + m_{01}Z_1 + m_{02}Z_2 + m_{10}Z_0\Delta a + m_{11}Z_1\Delta a + m_{12}Z_2\Delta a. \qquad (17)$$

The significance of each of the dummy variable coefficients (i.e. the $m_{ij}$) is tested using the t-statistic, and if t > 2 the dummy variable was retained in the regression. The results of this process indicated that none of the dummy variable terms involving $Z_1$ and $Z_2$ were significant and the most appropriate regression relationship was:

$$\hat{J} = 294.9 + 276\Delta a. \qquad (18)$$

See Table 4 for the relevant test statistic values.
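A model of the form of eq. (17) can be fitted by ordinary least squares once the dummy variables have been written into the design matrix. The sketch below is a minimal illustration under stated assumptions: the J values and dummy variable coding follow Table 1, the crack extensions are the three-decimal values listed in Table 5, and the t-value of each coefficient is taken as the estimate divided by its standard error from $s^2 (X^T X)^{-1}$ with $N - p$ residual degrees of freedom. The helper names are hypothetical.

```python
import math

# J (kN/m) and crack extension (mm, three decimals as in Table 5) for all specimens.
J = [295.0, 305.0, 355.0, 383.0, 436.0, 486.0,
     279.0, 321.0, 362.0, 505.0, 545.0, 676.0, 665.0,
     343.0, 486.0, 466.0, 745.0, 900.0, 1221.0, 1332.0]
DA = [0.197, 0.313, 0.259, 0.357, 0.431, 0.899,
      0.242, 0.284, 0.469, 0.684, 0.988, 1.316, 0.872,
      0.227, 0.425, 0.442, 1.151, 2.119, 2.951, 4.227]
SIZE = [20] * 6 + [40] * 7 + [100] * 7        # specimen size in mm


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ols_with_t(rows, y):
    """Return least-squares coefficients and their t-values."""
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    resid = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(rows, y))
    s2 = resid / (n - p)                                  # residual mean square
    t_values = []
    for j in range(p):
        unit = [1.0 if k == j else 0.0 for k in range(p)]
        c_jj = solve(xtx, unit)[j]                        # j-th diagonal of (X'X)^-1
        t_values.append(coef[j] / math.sqrt(s2 * c_jj))
    return coef, t_values


def design_eq17(da, size):
    """Row of the design matrix for eq. (17): Z1 marks 40 mm, Z2 marks 100 mm specimens."""
    z1 = 1.0 if size == 40 else 0.0
    z2 = 1.0 if size == 100 else 0.0
    return [1.0, z1, z2, da, z1 * da, z2 * da]


names = ["J0", "Z1", "Z2", "da", "Z1*da", "Z2*da"]
coef, t = ols_with_t([design_eq17(d, s) for d, s in zip(DA, SIZE)], J)
for name, c, tv in zip(names, coef, t):
    print(f"{name:6s} coefficient = {c:9.2f}  t = {tv:6.2f}")

# Pooled model with the dummy variable terms dropped.
pooled, t_pooled = ols_with_t([[1.0, d] for d in DA], J)
print(f"pooled: J = {pooled[0]:.1f} + {pooled[1]:.1f} da")
```

With every dummy variable term included, the model is equivalent to fitting a separate straight line to each specimen size, so the t-values of the dummy variable coefficients measure how far the 40 mm and 100 mm lines depart from the 20 mm line.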
Table 4. Analysis of fracture toughness data for three different specimen sizes by multiple regression method

                                         Regression coefficients                             Contribution of additional variables to goodness of fit
Step   Variables in         Value        Test statistic      % Fit      Increase       Statistic F     Critical value of
       regression model                  t-value                        in % fit                       F at 5% level
1      $J_0$                294.9                            94.06                     286.67          4.41
       $\Delta a$           276.0        16.89
2      $J_0$                244.2                            96.44      2.38           11.3            4.45
       $\Delta a$           368.2        12.19
       $\Delta a^3$         -6.1         -3.31
3      $J_0$                247.9                            96.88      0.44           2.26            4.49
       $\Delta a$           372.0        12.71
       $\Delta a^3$         -6.4         -3.65
       $Z_j\Delta a^3$      -123.0       -1.50†
       $J_0$                294          3.60                98.12      1.68           0.89            5.91
       $Z_1$                -122.5       -0.45†
       $Z_2$                165.3        0.25†
       $\Delta a$           391.6        1.84†
       $Z_1\Delta a$        -212.3       -0.41†
       $Z_2\Delta a$        15.6         0.01†
       $\Delta a^2$         28.8         0.25†
       $Z_1\Delta a^2$      566.6        0.46†
       $Z_2\Delta a^2$      286.1        0.15†
       $\Delta a^3$         -12.4        -0.07†
       $Z_1\Delta a^3$      -409.0       -0.50†
       $Z_2\Delta a^3$      -211.6       -0.26†

†Regression coefficients which are not significant. Their true value cannot be distinguished from zero.
Fig. 4. Crack growth resistance curve obtained by multiple regression analysis of the data in Table 1. [Plot of J against $\Delta a$ (mm) for the 20 mm, 40 mm and 100 mm CT specimens, with the fitted curve $J = 244 + 368\Delta a - 6\Delta a^3$.]
Step 2: Introduce a second variable. In a similar process to step 1, $\Delta a^3$ was found to be the appropriate variable and again none of the dummy variable coefficients was significant. Hence:

$$\hat{J} = 244 + 368\Delta a - 6\Delta a^3. \qquad (19)$$

All subsequent steps produced no statistically significant regression model; see Table 4 for the relevant test statistic values. This analysis has indicated that there are no statistically significant differences between the
Table 5. Studentized residuals and Cook distances for data from Table 1

J (kN/m)    $\Delta a$ (mm)    Studentized residuals    Cook's distance    Predicted J (kN/m)
295         0.197              -0.38582                 0.00574            316.63
305         0.313              -0.95502                 0.02699            359.19
355         0.259              0.27643                  0.00255            339.39
383         0.357              0.13516                  0.00049            375.30
436         0.431              0.58830                  0.00816            402.34
486         0.899              -1.48238                 0.05427            570.69
279         0.242              -0.96103                 0.03205            333.16
321         0.284              -0.48705                 0.00748            348.56
362         0.469              -0.94516                 0.01988            416.19
505         0.684              0.19088                  0.00072            494.02
545         0.988              -1.00345                 0.02894            602.01
676         1.316              -0.70413                 0.02607            714.75
665         0.872              1.81531                  0.07807            561.14
343         0.227              0.27289                  0.00268            327.65
486         0.425              1.50083                  0.05364            400.15
466         0.442              1.04187                  0.02515            406.35
745         1.151              1.54200                  0.09256            658.60
900         2.119              -1.33983                 0.25955            966.26
1221        2.951              0.98532                  0.17225            1173.87
1332        4.227              -0.58176                 0.25955            1339.77
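The quantities in Table 5 can be computed along the following lines. This is a sketch under stated assumptions, not the authors' code: it fits the model of eq. (19) (terms $J_0$, $\Delta a$ and $\Delta a^3$) to the J and $\Delta a$ columns of Table 5, generalizes eq. (6) to the diagonal of the hat matrix $X(X^T X)^{-1}X^T$, and generalizes eq. (5) by using $N - p$ residual degrees of freedom. The helper names are hypothetical.

```python
import math

# J (kN/m) and crack extension (mm) columns of Table 5.
J = [295.0, 305.0, 355.0, 383.0, 436.0, 486.0,
     279.0, 321.0, 362.0, 505.0, 545.0, 676.0, 665.0,
     343.0, 486.0, 466.0, 745.0, 900.0, 1221.0, 1332.0]
DA = [0.197, 0.313, 0.259, 0.357, 0.431, 0.899,
      0.242, 0.284, 0.469, 0.684, 0.988, 1.316, 0.872,
      0.227, 0.425, 0.442, 1.151, 2.119, 2.951, 4.227]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def diagnostics(rows, y):
    """Return coefficients, leverages v_ii, studentized residuals r_i and Cook's distances D_i."""
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    e = [yi - sum(c * v for c, v in zip(coef, r)) for r, yi in zip(rows, y)]
    s2 = sum(ei ** 2 for ei in e) / (n - p)               # assumed: N - p degrees of freedom
    # Leverage v_ii = x_i' (X'X)^-1 x_i, the diagonal of the hat matrix.
    v = [sum(a * b for a, b in zip(r, solve(xtx, r))) for r in rows]
    r_stud = [ei / math.sqrt(s2 * (1.0 - vi)) for ei, vi in zip(e, v)]
    cook = [ri ** 2 / p * vi / (1.0 - vi) for ri, vi in zip(r_stud, v)]
    return coef, v, r_stud, cook


rows = [[1.0, x, x ** 3] for x in DA]                     # terms of eq. (19)
coef, v, r_stud, cook = diagnostics(rows, J)
print("coefficients:", [round(c, 1) for c in coef])
for ji, xi, ri, di in zip(J, DA, r_stud, cook):
    print(f"J = {ji:6.0f}  da = {xi:5.3f}  r = {ri:8.5f}  D = {di:7.5f}")
```

Points that lie far from the bulk of the $\Delta a$ values have leverages close to unity, so they can have a large Cook's distance even when the studentized residual is small.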
Fig. 5. Plot of studentized residuals against fitted J values for data in Table 1.
data for each specimen size, i.e. there is no specimen size dependency in these data. By taking all data together, the estimated value of the variance of the error distribution, $s^2$, is based on a much larger sample and it is usually found that, as a result, the standard errors of the coefficients are smaller. This effect feeds through into estimates of lower bound values. It must be emphasised at this point that even if some specimen size dependence had been found, the multiple regression model should still be used to analyse all the data together because the increased sample size provides a reasonable basis on which to estimate $s^2$ and carry out an analysis of residuals. For these data the values of $r_i$ and $D_i$ are given in Table 5, with $r_i$ plotted against fitted $\hat{J}_i$ in Fig. 5. No discrepancy between the regression model and the data is evident from the residuals, which promotes confidence in the chosen form of the model.

3.2. Analysis of temperature dependence of fracture toughness data

For many plant integrity assessments it is necessary to establish the variation of the initiation fracture toughness and crack growth resistance with temperature. Hence there is a need to obtain a relationship in which J is a function of both crack growth and temperature. Recently, multiple linear regression analyses have been used by Moskovic[9] to investigate the temperature dependence of fracture toughness of several steels. Two types of regression model were developed, depending on the material's propensity to strain age. Full details of the analyses are given in [9] and will not be repeated here.

As an alternative to the multiple regression method of analysis, a method based on a series of simple linear regressions has been utilized by other authors. This approach carries out a regression analysis of $J/\Delta a$ data at each temperature, using these to predict a $J_{0.2}$ estimate for each temperature before finally regressing $\hat{J}_{0.2}$ on temperature to obtain a temperature dependence.

The pitfalls in this method need to be clearly stated. Each $\hat{J}_{0.2}$ value is a predicted value based on the $J/\Delta a$ regression at that temperature (T) and will have a variance which is dependent on the scatter about that particular regression. In a regression analysis, one observed, or measured, variable is regressed against other such variables: $\hat{J}_{0.2}$ is not the result of an observation but is a predicted value. Also, more significantly, the assumption of constant error variance, which is made when regressing $\hat{J}_{0.2}$ on T, does not apply as the variance of $\hat{J}_{0.2}$ will be different for each $J/\Delta a$ regression. A more appropriate regression model could include the weighting of each $\hat{J}_{0.2}$ with $1/\mathrm{Var}\,\hat{J}_{0.2}$ followed by a regression on T. In reality, however, the magnitude of $\mathrm{Var}\,\hat{J}_{0.2}$ is usually such as to completely swamp any changes in $\hat{J}_{0.2}$ values with T, making the whole exercise irrelevant. A more detailed comparison of the two methods is given in [9], in which it was concluded that the multiple regression method is the more effective method.
3.3. Analysis of data from several sources at several temperatures

A combination of the techniques is necessary for the analysis of data from several sources and at several temperatures. An acceptable approach would be:

(i) Analyse the data from each source separately in order to determine the form of the temperature dependence; if a temperature factor is appropriate, due to strain ageing effects for example, then use dummy variables (see Section 3.1), and if a continuous temperature dependence is expected then use T and $\Delta a$ as the independent variables (see [9]);

(ii) Each data set would then be represented by a single regression equation of some form; the data from separate sources can now be compared using a dummy variable factor for each source to obtain finally the single relationship best fitting the whole data set.

For example, with 3 data sources, each source containing $J/\Delta a$ data for several temperatures, a set of T dependencies could be:

Source 1    $J_1 = \alpha_1 + \beta_1\Delta a + \delta_1\Delta a T$
Source 2    $J_2 = \alpha_2 + \beta_2\Delta a + \delta_2\Delta a T$
Source 3    $J_3 = \alpha_3 + \beta_3\Delta a + \delta_3\Delta a T$.

The combination of these, using dummy variables $Z_1$ and $Z_2$, would produce a regression model:

$$\hat{J} = \alpha_1 + \beta_1\Delta a + \delta_1\Delta a T + m_1Z_1 + m_2Z_2 + m_3Z_1\Delta a + m_4Z_2\Delta a + m_5Z_1\Delta a T + m_6Z_2\Delta a T. \qquad (20)$$
It must be emphasized that for an analysis to be carried out on combined data sets, there should be a physically based rationale for considering the data to be similar; for example, plates should be of nominally the same composition and metallurgical condition.
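A minimal sketch of how the combined model of eq. (20) might be set up and fitted is given below. The dummy-variable coding (source 1 as the baseline, $Z_1 = 1$ for source 2, $Z_2 = 1$ for source 3), the function name `design_matrix` and the synthetic noise-free data are all assumptions made for illustration; the coding defined in section 3.1 may differ.

```python
import numpy as np


def design_matrix(da, temp, source):
    """Build the nine regressor columns of eq. (20).

    Assumed (hypothetical) coding: source 1 is the baseline,
    Z1 = 1 for source 2 and Z2 = 1 for source 3.
    """
    da = np.asarray(da, dtype=float)
    temp = np.asarray(temp, dtype=float)
    source = np.asarray(source)
    z1 = (source == 2).astype(float)
    z2 = (source == 3).astype(float)
    return np.column_stack([
        np.ones_like(da), da, da * temp,   # alpha_1, beta_1, delta_1
        z1, z2,                            # m1, m2: intercept shifts
        z1 * da, z2 * da,                  # m3, m4: shifts in the da term
        z1 * da * temp, z2 * da * temp,    # m5, m6: shifts in the da*T term
    ])


# Noise-free synthetic data; (alpha, beta, delta) per source chosen arbitrarily.
true = {1: (100.0, 150.0, -0.10), 2: (120.0, 140.0, -0.05), 3: (90.0, 160.0, -0.20)}
da, temp, source, j = [], [], [], []
for s, (alpha, beta, delta) in true.items():
    for t in (20.0, 150.0, 290.0):
        for x in (0.2, 0.5, 1.0, 1.5):
            da.append(x)
            temp.append(t)
            source.append(s)
            j.append(alpha + beta * x + delta * x * t)

X = design_matrix(da, temp, source)
coef, *_ = np.linalg.lstsq(X, np.asarray(j), rcond=None)
print(np.round(coef, 4))
```

With this coding each $m$ coefficient is the difference between a source and the baseline (for example $m_1 = \alpha_2 - \alpha_1$), so coefficients that are not significantly different from zero indicate terms that the sources may share.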
4. ESTIMATION OF DISTRIBUTIONAL FORM AND LOWER BOUNDS
An integrity assessment of structures containing flaws requires, as part of the input, values for the lower bound toughness of the material. Furthermore, any Probabilistic Fracture Mechanics assessment will also need an estimated distribution representative of the population of toughness predictions. With the most appropriate regression model defined by the procedures described, estimation of the distributional form for toughness predictions, and hence lower bound values, is feasible. Texts on the subject of regression analysis usually represent the population of predicted values by a normal distribution based on the regression estimate of the mean value and the variance. However, in the context of giving toughness predictions it could be argued that a non-negative, possibly skewed, distribution is physically more meaningful. The Weibull function provides a flexible functional form for such a distribution, but the parameters defining the appropriate fit to the data cannot be derived from regression mean and variance estimates. The following procedures aim to provide a sample of predicted values to which any acceptable distributional form can be fitted.
The regression estimates of the mean and variance of the population of predicted values are useful reference parameters. The mean of the population of predicted $J$ values is given by eq. (1) and its variance is derived from:
$$\mathrm{Var}\,\hat{J}_i = \mathrm{Var}\,\hat{\alpha} + \Delta a_i^2\,\mathrm{Var}\,\hat{\beta} + 2\,\Delta a_i\,\mathrm{Cov}(\hat{\alpha}, \hat{\beta}).$$
The regression estimate of the population variance ($\sigma^2$) (see Fig. 1) is given by $s^2$ as defined in eq. (5). Hence the estimated variance of the population of predicted $J_i$ values is
$$\mathrm{Var}\,J_i(\mathrm{pred}) = \mathrm{Var}\,\hat{J}_i + s^2. \qquad (21)$$
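The variance expressions above can be checked numerically with a short sketch for the straight-line case. It uses the standard result for the variance of a linear combination, in which $\mathrm{Var}\,\hat{\beta}$ is multiplied by $\Delta a^2$, and assumes that $s^2$ in eq. (5) is the residual sum of squares divided by $n - 2$. The function names and data are hypothetical.

```python
def fit_line(da, j):
    """Unweighted least-squares fit of J = alpha + beta * da.

    Returns the two estimates, s^2, and the variances and covariance
    of the coefficient estimates.
    """
    n = len(da)
    x_bar = sum(da) / n
    y_bar = sum(j) / n
    sxx = sum((x - x_bar) ** 2 for x in da)
    beta = sum((x - x_bar) * (y - y_bar) for x, y in zip(da, j)) / sxx
    alpha = y_bar - beta * x_bar
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(da, j))
    s2 = rss / (n - 2)                     # assumed form of s^2 in eq. (5)
    var_alpha = s2 * (1.0 / n + x_bar ** 2 / sxx)
    var_beta = s2 / sxx
    cov_ab = -s2 * x_bar / sxx
    return alpha, beta, s2, var_alpha, var_beta, cov_ab


def prediction_variance(da0, s2, var_alpha, var_beta, cov_ab):
    """Variance of the fitted mean and of a predicted J at crack growth da0."""
    var_mean = var_alpha + da0 ** 2 * var_beta + 2.0 * da0 * cov_ab
    return var_mean, var_mean + s2         # second value corresponds to eq. (21)


# Hypothetical data: crack growth in mm, J in arbitrary consistent units.
da = [0.1, 0.3, 0.6, 0.9, 1.2, 1.5]
j = [110.0, 150.0, 185.0, 240.0, 270.0, 330.0]
alpha, beta, s2, var_alpha, var_beta, cov_ab = fit_line(da, j)
var_mean, var_pred = prediction_variance(0.2, s2, var_alpha, var_beta, cov_ab)
print(alpha + beta * 0.2, var_mean, var_pred)
```

The variance of a prediction always exceeds the variance of the fitted mean by $s^2$, which is why lower bounds based on predictions are wider than confidence limits on the regression line itself.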
Suitable methods for deriving the population of toughness predictions from various sets of data can now be outlined.
4.1. Data from a single plate or weld
(a) $J_{0.2}$ estimation. As mentioned previously, the standard techniques for predicting lower bound values are based on the assumption of a normal population: an alternative method will be outlined here. The simplest case to consider is that of a single set of $J/\Delta a$ multi-specimen data. It has been shown [10] that the distribution of predicted $J$ values can be estimated for any $\Delta a$ by backfitting the residuals about the regression line of the observed $J$, i.e. $(J_i - \hat{J}_i)$, to form a common distribution about the regression estimate of $J_i$ at the particular $\Delta a_i$. The arguments in support of this method are given in [10]; the basis of the approach is the regression model assumption that the variance of the population about the regression is the same at all points along the regression line (see Fig. 1). The residuals are backfitted in the studentized form since the distribution of studentized residuals has a mean of zero and standard deviation of one. This population of studentized residuals can be backfitted to form the distribution of the population of predicted $J$ values at any particular crack length. For example, to generate a population at $\Delta a = 0.2$ mm, for each datum the residual $e_i$ is obtained, then suitably scaled and added to $J_{0.2}$ to give:
$$J_{0.2} + e_i \sqrt{\mathrm{Var}\,J_{0.2} / \mathrm{Var}\,e_i}, \qquad (22)$$
where $\mathrm{Var}\,J_{0.2}$ is the estimate of the variance of the population of $J$ predictions at $\Delta a = 0.2$ mm, which is given by:
$$\mathrm{Var}\,J_{0.2} = s^2 + \mathrm{Var}\,\hat{J}_{0.2}$$
(see [5]). On the basis of this backfitted population of $J$ predictions, the best fitting distributional form can be determined. Again, the problem with using this technique with a single set of $J/\Delta a$ data is insufficient sample size. Hence recourse has to be made to the use of multiple regression to combine data from several sources for which the assumption of a common error variance is reasonable. On the basis of such a combined data set, the backfitting procedure can generate a population of predicted $J$ values for each data source. For example, with 5 data points at each of three temperatures for specimens from the same plate, a multiple regression analysis followed by backfitting will provide a population of predicted $J$ values of 15 points for each temperature.
(b) $dJ/da$ and $J_g$ estimates. The lower bound estimate of the mean value for $dJ/da$ is determined on the basis of the regression estimate of the variance in the slope (i.e. $\mathrm{Var}\,\hat{\beta}$) and the assumption of a normal distribution. This lower bound slope, in conjunction with a lower bound $J_{0.2}$ obtained as outlined in section 4.1(a), will provide the lower bound crack growth resistance curve. It is on the basis of this curve and the validity criteria that a lower bound $J_g$ can be estimated for use in assessments which allow for some stable crack growth.
4.2. Data from several sources
It is often the case that data for the particular plant item being assessed are not available; in this instance recourse has to be made to the general body of data available on material of similar type. These data, which will be from several sources, can be grouped together, using a backfitting approach, to produce a global population of predicted toughness values on which integrity assessments can be based. This approach has been used in the analysis of fracture toughness data for plate loaded in the through-thickness direction, as reported in [11].
The suggested methodology is first to take each source of data separately and, by backfitting as described in section 4.1, generate a population of toughness predictions for that data set. All of these populations can then be grouped together to form the global population which is to be fitted by the appropriate distribution function. Prior to determining a distributional fit, this global population should be plotted as a frequency histogram to check for the occurrence of bimodality.
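The backfitting step of section 4.1(a) and the grouping of section 4.2 can be sketched for the straight-line case as follows. This is an illustration under stated assumptions rather than the procedure of [10]: the residual variance is taken as $s^2(1 - h_i)$, with $h_i$ the leverage (the usual internally studentized form), the predicted variance at the chosen crack growth is $s^2$ plus the variance of the fitted mean, and the function name `backfit` and all data are hypothetical.

```python
from math import sqrt


def backfit(da, j, da0=0.2):
    """Backfit scaled residuals to the regression estimate of J at da0.

    Returns (j0, sample): j0 is the regression estimate at da0 and
    sample holds one backfitted predicted J value per datum.
    """
    n = len(da)
    x_bar = sum(da) / n
    y_bar = sum(j) / n
    sxx = sum((x - x_bar) ** 2 for x in da)
    beta = sum((x - x_bar) * (y - y_bar) for x, y in zip(da, j)) / sxx
    alpha = y_bar - beta * x_bar
    resid = [y - (alpha + beta * x) for x, y in zip(da, j)]
    s2 = sum(e * e for e in resid) / (n - 2)
    j0 = alpha + beta * da0
    # Variance of a predicted J at da0: s^2 plus the variance of the fitted mean.
    var_pred = s2 * (1.0 + 1.0 / n + (da0 - x_bar) ** 2 / sxx)
    sample = []
    for x, e in zip(da, resid):
        leverage = 1.0 / n + (x - x_bar) ** 2 / sxx
        var_e = s2 * (1.0 - leverage)      # assumed: internal studentization
        sample.append(j0 + e * sqrt(var_pred / var_e))
    return j0, sample


# Two hypothetical sources; all values are invented for illustration.
source_a = ([0.1, 0.3, 0.6, 0.9, 1.2, 1.5],
            [110.0, 150.0, 185.0, 240.0, 270.0, 330.0])
source_b = ([0.2, 0.4, 0.7, 1.0, 1.3],
            [140.0, 160.0, 215.0, 245.0, 300.0])

j0_a, sample_a = backfit(*source_a)
j0_b, sample_b = backfit(*source_b)

# Group the per-source populations into one global population (section 4.2).
global_population = sorted(sample_a + sample_b)
print(len(global_population), global_population[0], global_population[-1])
```

The sorted global population can then be binned into a frequency histogram to look for bimodality before any distribution, such as a Weibull function, is fitted.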
5. SUMMARY OF RECOMMENDATIONS
The aims of the analysis of toughness data are to give predictions of toughness values $\hat{J}_{0.2}$ ($K^2_{0.2}$) and crack growth resistance $dJ/da$ ($dK^2/da$), in terms of mean, lower bound and distributional form. With these needs in view, guidelines for the analysis of data sets can be given.
5.1. Single set of $J$ ($K^2$) and $\Delta a$ measurements
(i) If all crack lengths are smaller than 2 mm, a bivariate regression of $J$ on $\Delta a$ is recommended; otherwise multivariate regression should be used.
(ii) Analyse the residuals in the studentized form, testing for independence, heteroscedasticity and outliers.
(iii) Re-assess the data set: either re-evaluate or omit the outliers, and consider the need for a weighted regression analysis.
(iv) If a normal distribution for errors is to be assumed, then use the standard regression estimates of the confidence limits on intercept and slope.
(v) If other distributional fits are required, then backfit the scaled residuals as described in section 4.1(a) to generate a population of predicted values at the crack growth of interest.
(vi) Obtain lower bound $J$ values on the basis of the lower bound $J_{0.2}$ and $dJ/da$ values.
5.2. Data from several sources
The analysis of data from several sources, and at various temperatures, is based on a regression of $J$ ($K^2$), as the dependent variable, on suitable independent variables and utilizes the multiple regression technique as follows:
(i) With data from several sources (e.g. different plates, orientations) an analysis using dummy variables should be used to determine whether the data can be analysed as a single set or as several separate subsets.
(ii) Data at several temperatures from a single source should be analysed either with a dummy variables factor for temperature or, if a continuous temperature dependence is expected, with $T$ and $\Delta a$ as independent variables in a multiple regression analysis.
(iii) For data both from different sources and at various temperatures, a combination of approaches (i) and (ii) should be used in the analysis.
(iv) If normally distributed errors are to be assumed, then the multiple regression analysis will give directly the statistically based lower bounds.
(v) If other distributions are to be considered, then backfitting the residuals in a studentized form can generate a population of values at any crack growth and temperature.
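Recommendation 5.2(i) asks whether several sources can be treated as a single set. One conventional way to frame that decision, sketched below, is an extra-sum-of-squares F statistic comparing one pooled straight line with separate lines per source; fitting separate lines gives the same residual sum of squares as a model with dummy variables on both intercept and slope. The choice of this test, the function names and the data are assumptions for illustration, not necessarily the test used by the authors.

```python
def line_rss(da, j):
    """Residual sum of squares of an unweighted straight-line fit of J on da."""
    n = len(da)
    x_bar = sum(da) / n
    y_bar = sum(j) / n
    sxx = sum((x - x_bar) ** 2 for x in da)
    beta = sum((x - x_bar) * (y - y_bar) for x, y in zip(da, j)) / sxx
    alpha = y_bar - beta * x_bar
    return sum((y - alpha - beta * x) ** 2 for x, y in zip(da, j))


def pooling_f_statistic(sources):
    """Compare one pooled line against separate lines for each source.

    sources is a list of (da, j) pairs. Returns (f, df_extra, df_resid);
    f should be compared with tabulated F(df_extra, df_resid) values.
    """
    k = len(sources)
    n = sum(len(da) for da, _ in sources)
    rss_separate = sum(line_rss(da, j) for da, j in sources)
    all_da = [x for da, _ in sources for x in da]
    all_j = [y for _, j in sources for y in j]
    rss_pooled = line_rss(all_da, all_j)
    df_extra = 2 * (k - 1)     # extra intercepts and slopes in the full model
    df_resid = n - 2 * k       # residual degrees of freedom of the full model
    f = ((rss_pooled - rss_separate) / df_extra) / (rss_separate / df_resid)
    return f, df_extra, df_resid


# Two hypothetical sources with invented J values at the same crack growths.
source_a = ([0.2, 0.5, 1.0, 1.5], [120.0, 160.0, 230.0, 310.0])
source_b = ([0.2, 0.5, 1.0, 1.5], [200.0, 250.0, 320.0, 390.0])
f, df_extra, df_resid = pooling_f_statistic([source_a, source_b])
print(f, df_extra, df_resid)
```

A large F relative to the tabulated value suggests the sources should be kept as separate subsets; a small one supports analysing them as a single set, subject to the physical rationale for similarity noted in section 3.3.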
Acknowledgement-This paper is published with the permission of the Director (Nuclear Plant) Operational Engineering Division, Central Electricity Generating Board.
REFERENCES
[1] ASTM Standards E813-81, pp. 810-828 (1981).
[2] B. K. Neale, D. A. Curry, G. Green, J. R. Haigh and K. N. Akhurst, A procedure for the determination of the fracture toughness of ductile steels, CEGB Report TPRD/B/0495/R84 (1984).
[3] A. Madansky, American Stat. Assoc. J., p. 173 (March 1959).
[4] P. L. Windle, Statistical models for the analysis of fracture toughness test data, Engng Fracture Mech., 22, 885-895 (1985).
[5] S. Weisberg, Applied Linear Regression, Wiley, New York (1980).
[6] A. S. Goldberger, Econometric Theory, p. 218, Wiley, New York (1963).
[7] N. Draper and H. Smith, Applied Regression Analysis, Second Edition, Wiley, New York (1981).
[8] ~~~~~rn, J. T. Bland and G. Wardle, Seventh Int. Conf. on Structural Mechanics in Reactor Technology, p. G2/3
[9] R. Moskovic, Fracture toughness and crack growth resistance of pressure vessel plate and weld metal steels, CEGB Report SER/SSD/85/0026N.
[10] P. L. Windle, Backfitting: a technique for generating a population sample of fracture toughness predictions from multi-specimen test data, CEGB Report SER/SSD/86/.
[11] P. L. Windle, The analysis of mode I toughness data for mild steel plate loaded in the through-thickness direction, CEGB Report SER/SSD/85/0350/S.
(Received 29 September 1987)