Materials Science and Engineering A 527 (2009) 397–399
Materials Science and Engineering A journal homepage: www.elsevier.com/locate/msea
Rapid communication
On evaluating Weibull fits to mechanical testing data
Murat Tiryakioğlu a,∗, David Hudak b, Giray Ökten c
a Department of Engineering, Robert Morris University, 6001 University Boulevard, Moon Township, PA 15108, USA
b Department of Mathematics, Robert Morris University, Moon Township, PA 15108, USA
c Department of Mathematics, Florida State University, Tallahassee, FL 32306, USA
Article info
Article history: Received 29 July 2009; received in revised form 4 August 2009; accepted 5 August 2009
Keywords: Extreme value; Reliability; Castings
Abstract
The use of the coefficient of determination, R², and the Anderson–Darling (A²) hypothesis test to evaluate goodness-of-fit to the two-parameter Weibull distribution was investigated. Results of Monte Carlo simulations for sample sizes between 5 and 100 indicated that guidelines provided previously in the literature are too conservative for sample sizes up to 80. New guidelines for the use of R² and A² for sample sizes between 5 and 100 have been developed. The two measures of goodness-of-fit were found to agree more than 95% of the time, regardless of sample size. The use of the new guidelines is demonstrated on two datasets from the casting literature. © 2009 Elsevier B.V. All rights reserved.
1. Background
In 1951, Weibull [1] introduced an empirical distribution based on the “weakest link” theory, developed previously by Pierce [2]. The “weakest link” theory applies in situations analogous to the failure of a chain when one of its links fails [3]. Since then, the Weibull distribution has been used in a wide variety of applications, including mechanical testing data from material specimens with defects. The distribution function of the Weibull distribution, which models extreme values arising from the “weakest link” concept, is expressed as:
$$P = 1 - \exp\left[-\left(\frac{\sigma}{\sigma_0}\right)^{m}\right] \qquad (1)$$
where σ0 is the scale parameter and m is the shape parameter, alternatively referred to as the Weibull modulus. The shape parameter, m, in Eq. (1) has been used as a measure of reliability and has been applied to the brittle fracture of ceramics and to mechanical properties of metals, such as tensile and fatigue data. One application of the Weibull modulus is in the design of filling systems for aluminum alloy castings. Green and Campbell [4] showed that the tensile strength of cast Al–Si alloys follows a Weibull distribution and that the filling system design has a strong effect on the Weibull modulus. According to Campbell [5], m is often between 1 and 10 for pressure die castings and between 10 and 30 for many gravity-filled castings. For good-quality aerospace castings, m is expected to be between 50 and 100.
∗ Corresponding author. E-mail address: [email protected] (M. Tiryakioğlu).
0921-5093/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.msea.2009.08.014
Due to the destructive nature of the testing involved in these studies, m has to be estimated from a sample, sometimes a small one, using one of three methods: (i) linear regression, (ii) maximum likelihood, or (iii) moments. The most common method is linear regression, mainly because of its simplicity. Taking the logarithm of Eq. (1) twice yields a linear equation:
$$\ln\left[-\ln(1-P)\right] = m\ln(\sigma) - m\ln(\sigma_0) \qquad (2)$$
with a slope of m and an intercept of −m ln(σ0). When ln[−ln(1 − P)] is plotted versus ln(σ), the Weibull probability plot is obtained, which is the most common goodness-of-fit check: if the trend in the data suggests a linear relationship, the data are assumed to come from a Weibull distribution. The use of a probability plot, however, is subjective and may be insufficient to test the hypothesis that the data come from a particular distribution. That is why it is strongly recommended that probability plots always be augmented by formal goodness-of-fit tests [6]. When the linear regression technique is used, the coefficient of determination, R², can be taken as a measure of goodness-of-fit, as reported in the metal casting literature. Although R² is undeniably a measure of goodness-of-fit, the literature provides only ambiguous guidelines on the values of R² below which the fit should be considered poor. Doremus [7] stated that, for all sample sizes, the fit is good for R² ≥ 0.95 and poor for R² < 0.90. Note that no guidance was provided for R² values between 0.90 and 0.95. There are also formal hypothesis tests to determine whether a dataset comes from a specific type of distribution [8]. Among all formal hypothesis tests, the Anderson–Darling [9] goodness-of-fit test statistic (i) is known for its sensitivity to the tails of the distribution [8], and (ii) has been shown [10,11] to be superior to most other goodness-of-fit tests for a variety of distributions. The test statistic, A², is written as:
$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}\left[(2i-1)\ln P(x_i) + (2n+1-2i)\ln\left(1-P(x_i)\right)\right] \qquad (3)$$
where n is the sample size, i is the rank of the data point in the sample in ascending order, and P(x_i) is the cumulative probability for each data point, calculated with the estimated distribution parameters. Smaller values of A² indicate closer agreement between the data and the distribution being tested. The hypothesis that the dataset follows the tested distribution is rejected when the p-value is less than a specified significance level, α, which is typically prescribed as 0.05 (corresponding to 95% confidence). Although the Weibull distribution has been used to characterize mechanical testing data, it is the authors’ opinion that the literature offers insufficient guidance for the materials engineer to evaluate a Weibull fit and determine whether the mechanical data indeed follow the Weibull distribution. This study has been motivated by this need.
2. Experimental details
Monte Carlo simulations were used to generate n data points from a Weibull distribution with σ0,true = 1 and m_true = 3. It should be noted that the values of the scale and shape parameters are inconsequential because the distributions of the estimated σ0 and m, normalized by σ0,true and m_true respectively, are affected only by the sample size [12,13]. To estimate the probability for each data point, probability estimators (plotting positions) were assigned by
$$P = \frac{i - a}{n + b} \qquad (4)$$
where a and b are numbers such that 0 ≤ a ≤ 1 and 0 ≤ b ≤ 1. Combinations of a and b were chosen using the values provided by Tiryakioğlu and Hudak [14] so that the estimated scale parameters and Weibull moduli would be unbiased, i.e., the average of the estimated Weibull moduli is equal to the true value of the Weibull modulus, m_true. Fourteen sample sizes (n) ranging from 5 to 100 were investigated. For each sample size, simulations were repeated 20,000 times. For each iteration, the scale parameter and Weibull modulus were estimated by using Eq. (2), and R² as well as A² were calculated.
3. Results and discussion
The histogram of R² for n = 30 is presented in Fig. 1. Note that a strong majority of the values are above 0.95 and there are very few occurrences below 0.90. The fractions of R² values above 0.95 (f(R² ≥ 0.95)) and below 0.90 (f(R² < 0.90)) are given in Table 1. Note that for sample sizes less than n = 80, the fraction of occurrences in which R² ≥ 0.95 is less than 95%. Hence, the guideline that the fit is good when R² ≥ 0.95 is too conservative for sample sizes less than 80, especially for the lower values of n (15–50) commonly used in destructive testing. Although the simulations were conducted at m_true = 3, these results are valid for all values of the Weibull modulus, for the reasons stated previously. For small sample sizes, R² < 0.90 occurs too often: data that truly come from a Weibull distribution would be labeled a poor fit far more than 5% of the time (28.3% for n = 5). When n = 20, the occurrence is approximately equal to the commonly used significance level of 0.05. For larger sample sizes, the guideline is too relaxed and becomes irrelevant with increasing sample size. These findings indicate that the guidelines given by Doremus [7] are not adequate for using R² as a reliable tool to evaluate the goodness-of-fit to the Weibull distribution.
Fig. 1. The histogram of R² values when n = 30.
Table 1 The fraction of R² values above 0.95 and below 0.90 for all sample sizes.
n      f(R² ≥ 0.95)   f(R² < 0.90)
5      0.371          0.283
7      0.386          0.236
10     0.467          0.157
15     0.584          0.092
20     0.730          0.051
25     0.743          0.041
30     0.799          0.029
40     0.857          0.018
50     0.902          0.013
60     0.923          0.008
70     0.942          0.007
80     0.952          0.005
90     0.964          0.003
100    0.968          0.003
The critical values of R² for α = 0.05 (R²_0.05) were calculated for each sample size and are presented in Fig. 2, along with the best-fit curve, which has the following formula:
$$R^2_{0.05} = 1.0637 - \frac{0.4174}{n^{0.3}} \qquad (5)$$
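The Monte Carlo procedure of Section 2, and the way critical values such as Eq. (5) can be obtained from it, can be sketched in Python with only the standard library. This is a minimal, non-authoritative sketch, not the authors' code. Assumptions made here and not taken from the paper: a single plotting-position pair a = 0.3, b = 0.4 (a common median-rank choice) replaces the unbiased (a, b) combinations of Tiryakioğlu and Hudak [14]; 5000 repetitions are used instead of 20,000; and the function names weibull_regression_fit, anderson_darling and critical_values are hypothetical. Because of these substitutions, the simulated critical values should only be expected to lie near Eqs. (5) and (6), not to reproduce them.

```python
import math
import random


def weibull_regression_fit(data, a=0.3, b=0.4):
    """Fit a two-parameter Weibull distribution by linear regression on Eq. (2).

    Plotting positions follow Eq. (4), P_i = (i - a) / (n + b). The defaults
    a = 0.3, b = 0.4 are an assumption (a common median-rank choice), not the
    unbiased combinations of Ref. [14]. Returns (m, sigma0, r2).
    """
    x_sorted = sorted(data)
    n = len(x_sorted)
    xs = [math.log(v) for v in x_sorted]                        # ln(sigma_i)
    ys = [math.log(-math.log(1.0 - (i - a) / (n + b)))          # ln[-ln(1 - P_i)]
          for i in range(1, n + 1)]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx                      # slope of Eq. (2) = Weibull modulus
    intercept = y_bar - m * x_bar      # intercept of Eq. (2) = -m ln(sigma0)
    sigma0 = math.exp(-intercept / m)
    r2 = sxy ** 2 / (sxx * syy)        # coefficient of determination
    return m, sigma0, r2


def anderson_darling(data, m, sigma0):
    """A^2 of Eq. (3), with P(x_i) taken from the fitted Weibull CDF, Eq. (1)."""
    x_sorted = sorted(data)
    n = len(x_sorted)
    total = 0.0
    for i, x in enumerate(x_sorted, start=1):
        t = (x / sigma0) ** m                            # 1 - P(x_i) = exp(-t)
        log_p = math.log(max(-math.expm1(-t), 1e-300))   # ln P(x_i), stable for small t
        log_1mp = -t                                     # ln(1 - P(x_i))
        total += (2 * i - 1) * log_p + (2 * n + 1 - 2 * i) * log_1mp
    return -n - total / n


def critical_values(n, reps=5000, m_true=3.0, sigma0_true=1.0, alpha=0.05, seed=1):
    """Monte Carlo estimates of the R^2 and A^2 critical values at level alpha.

    Small R^2 and large A^2 indicate a poor fit, so the R^2 critical value is
    the alpha-quantile of the simulated R^2 values and the A^2 critical value
    is the (1 - alpha)-quantile of the simulated A^2 values.
    """
    rng = random.Random(seed)
    r2_values, a2_values = [], []
    for _ in range(reps):
        # random.weibullvariate(alpha, beta): alpha = scale, beta = shape
        sample = [rng.weibullvariate(sigma0_true, m_true) for _ in range(n)]
        m, sigma0, r2 = weibull_regression_fit(sample)
        r2_values.append(r2)
        a2_values.append(anderson_darling(sample, m, sigma0))
    r2_values.sort()
    a2_values.sort()
    k = int(alpha * reps)
    return r2_values[k], a2_values[reps - 1 - k]


if __name__ == "__main__":
    for n in (10, 30):
        r2_crit, a2_crit = critical_values(n)
        eq5 = 1.0637 - 0.4174 / n ** 0.3
        eq6 = 1.2344 - 1.3956 / math.sqrt(n)
        print(f"n={n}: simulated R2_0.05={r2_crit:.4f} vs Eq. (5) {eq5:.4f}; "
              f"simulated A2_0.05={a2_crit:.4f} vs Eq. (6) {eq6:.4f}")
```

Running the script prints simulated and fitted critical values side by side for n = 10 and n = 30. Taking the α-quantile for R² but the (1 − α)-quantile for A² reflects the direction of each test: a poor fit lowers R² but raises A².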
The R²_0.05 calculated from Eq. (5) can be used to evaluate the goodness-of-fit. If the R² of the linear regression from the Weibull probability plot is less than the R²_0.05 value, then it can be concluded that the data do not come from a Weibull distribution. If R² ≥ R²_0.05, then the hypothesis that the mechanical testing data follow a Weibull distribution cannot be rejected. The authors recommend that Eq. (5), rather than the guidelines provided by Doremus, be used to evaluate goodness-of-fit to the Weibull distribution with R².
Fig. 2. The critical values of R² for α = 0.05 for sample sizes between 5 and 100.
Fig. 2 also shows the critical values for A² with α = 0.05 (A²_0.05) as well as the best-fit curve, with the following formula:
$$A^2_{0.05} = 1.2344 - \frac{1.3956}{\sqrt{n}} \qquad (6)$$
If the A² of the Weibull fit is less than the A²_0.05 value calculated from Eq. (6), then the hypothesis that the data come from a Weibull distribution cannot be rejected. If A² ≥ A²_0.05, then the distribution is not Weibull. The performance of R² was also compared to that of A². The R² and A² values for n = 30 are crossplotted in Fig. 3. Note that the critical values of both are also indicated, dividing the plot into four regions. In Regions I and III, the results of the two tests agree; in Regions II and IV, the two tests yield opposite recommendations. It was determined that R² and A² agree more than 95% of the time, regardless of sample size. Hence, Eq. (5) yields goodness-of-fit results almost as reliable as those of the highly regarded Anderson–Darling test statistic. Therefore, Eq. (5) (and/or Eq. (6)) can be used as a reliable goodness-of-fit test for the Weibull distribution.
Fig. 3. The crossplot of R² and A² for n = 30, with critical values for both indicated.
Fig. 4. The Weibull probability plots for the two datasets.
Table 2 Statistical results on the Weibull fits to the two datasets.
Dataset   n    R²_0.05   A²_0.05   m       σ0 (MPa)   R²       A²
TF        45   0.9305    1.0264    11.16   288.3      0.9278   1.3913
BF        36   0.9213    1.0018    38.40   311.4      0.9522   0.7809
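The decision rules above reduce to two closed-form thresholds. The following Python sketch evaluates Eqs. (5) and (6) and applies them to the R² and A² values reported in Table 2. The function names are hypothetical, and refusing sample sizes outside 5 ≤ n ≤ 100 is an assumption of this sketch (the range over which the curves were fitted), not a rule stated by the authors.

```python
import math


def r2_critical(n):
    """Eq. (5): critical R^2 at alpha = 0.05 (curve fitted for 5 <= n <= 100)."""
    return 1.0637 - 0.4174 / n ** 0.3


def a2_critical(n):
    """Eq. (6): critical A^2 at alpha = 0.05 (curve fitted for 5 <= n <= 100)."""
    return 1.2344 - 1.3956 / math.sqrt(n)


def weibull_fit_rejected(n, r2, a2):
    """Apply the two decision rules; returns (rejected_by_R2, rejected_by_A2)."""
    if not 5 <= n <= 100:
        # Assumption of this sketch: do not extrapolate Eqs. (5) and (6).
        raise ValueError("Eqs. (5) and (6) were fitted only for 5 <= n <= 100")
    return r2 < r2_critical(n), a2 >= a2_critical(n)


# (n, R^2, A^2) for the two datasets, as reported in Table 2
table2 = {"TF": (45, 0.9278, 1.3913), "BF": (36, 0.9522, 0.7809)}

for name, (n, r2, a2) in table2.items():
    by_r2, by_a2 = weibull_fit_rejected(n, r2, a2)
    print(f"{name}: R2_0.05={r2_critical(n):.4f}, A2_0.05={a2_critical(n):.4f}, "
          f"rejected by R2: {by_r2}, rejected by A2: {by_a2}")
```

For TF (n = 45) the thresholds evaluate to about 0.9305 and 1.0264, so both criteria reject the Weibull fit; for BF (n = 36) they evaluate to about 0.9213 and 1.0018, and neither criterion rejects it, consistent with Table 2.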
4. Examples
Two datasets from the literature will be used to demonstrate the use of the goodness-of-fit techniques recommended by the authors. Both datasets are from a study by Green and Campbell [4,15], who showed that the tensile strength (S_T) of cast Al–7%Si–Mg alloys is affected to a great extent by the mold filling stage. If the mold is filled quiescently, tensile strength is not only higher but also less variable, i.e., more reliable. Conversely, tensile strength has a lower average and higher variability when the mold is filled turbulently. The two datasets represent these two types of mold filling: top-filled (TF), which is quite turbulent, and bottom-filled (BF), which is quiescent. The sample sizes are 45 and 36 for TF and BF, respectively. For the plotting position formula (Eq. (4)), a = 0, and b is 0.481 and 0.466 for TF and BF, respectively [14]. The Weibull probability plots are presented in Fig. 4, and the estimated parameters as well as the goodness-of-fit measures are given in Table 2. For TF, R² < R²_0.05, indicating that the Weibull fit has to be rejected. The same conclusion is reached with the Anderson–Darling hypothesis test because A² > A²_0.05. Note in Fig. 4 that the slope for the lowest five points seems to be less than that for the rest of the data. This change in slope is indicative of the presence of multiple defect distributions, and therefore the data may come from a mixture of two Weibull distributions [16]. For the bottom-filled castings, both measures of goodness-of-fit indicate that the Weibull fit is acceptable.
5. Conclusions
• The guidelines provided previously by Doremus for “good” fits are too conservative for sample sizes up to 80.
• New guidelines for the use of R² and A² for sample sizes between 5 and 100 have been developed in this study.
• The two measures of goodness-of-fit were found to agree more than 95% of the time, regardless of sample size.
• It is recommended that these guidelines be used when the two-parameter Weibull distribution is fitted to mechanical testing data by the linear regression method.
Acknowledgment
The authors would like to thank Dr. Paul N. Crepeau of General Motors for his comments on an earlier version of the manuscript.
References
[1] W. Weibull, J. Appl. Mech. 13 (1951) 293.
[2] F.T. Pierce, J. Textile Inst. 17 (1926) T355.
[3] T.T. Shih, Eng. Frac. Mech. 13 (1980) 257.
[4] N.R. Green, J. Campbell, Mater. Sci. Eng. A 173 (1993) 261.
[5] J. Campbell, Castings, vol. 303, 2nd ed., Elsevier, 2003.
[6] S.S. Shapiro, C.W. Brain, in: C. Taillie, G.P. Patil, B.A. Baldessari (Eds.), Statistical Distributions in Scientific Work, vol. 5, D. Reidel Publishing, 1981, p. 1.
[7] R.H. Doremus, J. Appl. Phys. 54 (1983) 193.
[8] M.A. Stephens, in: R.B. D’Agostino, M.A. Stephens (Eds.), Goodness of Fit Techniques, Marcel Dekker, 1986, p. 97.
[9] T.W. Anderson, D.A. Darling, J. Am. Stat. Assoc. 49 (1954) 765.
[10] M.A. Stephens, J. Am. Stat. Assoc. 69 (1974) 730.
[11] F.J. O’Reilly, M.A. Stephens, J. Roy. Stat. Soc. Ser. B 44 (1982) 353.
[12] D.R. Thoman, L.J. Bain, C.E. Antle, Technometrics 11 (1969) 445.
[13] A. Khalili, K. Kromp, J. Mater. Sci. 26 (1991) 6741.
[14] M. Tiryakioğlu, D. Hudak, J. Mater. Sci. 43 (2008) 1914.
[15] N.R. Green, J. Campbell, AFS Trans. 102 (1994) 341.
[16] C.A. Johnson, J. Frac. Mech. Ceram. 5 (1983) 365.