Reliability Engineering 18 (1987) 131-139
Proportional Hazards Analysis of Field Warranty Data
Thomas L. Landers
Department of Industrial Engineering, Bell Engineering Center, University of Arkansas, Fayetteville, Arkansas 72701, USA
and
William J. Kolarik
Department of Industrial Engineering, Texas Tech University, Lubbock, Texas 79409, USA
(Received: 16 October 1986)
ABSTRACT
This paper describes a proportional hazards reliability analysis and some of the methods of statistical inference, based on asymptotic theory. The authors report application of the Weibull proportional hazards model to the analysis of actual field warranty data for automotive air conditioning compressors. Statistical inference is illustrated through asymptotic normal confidence intervals and the likelihood ratio test statistic. The research employed a commercial data set to test theories relevant to defensive systems reliability analysis, and illustrates technology transfer from defense-related research to commercial applications.
Reliability Engineering 0143-8174/87/$03-50 © Elsevier Applied Science Publishers Ltd, England, 1987. Printed in Great Britain

1 INTRODUCTION
The proportional hazards (PH) models have recently received theoretical attention and found practical application in medical research. These models provide valuable insights in survival studies by permitting reliability to be expressed as a function of important explanatory variables, referred to as covariates. The PH models are especially useful in ex post facto studies, because such problems as nonhomogeneity and censoring can be accommodated. However, there have been very few engineering applications of the PH models to date. Dale reported applications of PH models to accelerated tests on nonrepairable systems and to simulated reliability growth data on repairable systems [1]. Ascher reported application of the nonparametric Cox model to the life analysis of nonrepairable naval components (Sonar Dome Rubber Windows) [2]. Ascher also reported an application of the Prentice, Williams and Peterson [3] repeated onset model for reliability studies on repairable Marine Gas Turbines [2].
The primary objective of this paper is to report a successful application of the Weibull PH model to reliability analysis of field warranty data on automotive air conditioning compressors. A secondary objective is to illustrate some of the important hypothesis tests employed in PH reliability studies.
2 THEORY
2.1 Proportional hazards models
Dale defined the basic proportional hazards model as

$$\lambda(t; Z_1, Z_2, \ldots, Z_k) = \lambda_0(t) \exp(\beta_1 Z_1 + \beta_2 Z_2 + \cdots + \beta_k Z_k) \qquad (1)$$

where $\lambda(t; Z_1, \ldots, Z_k)$ = the hazard rate at time $t$ for an individual with covariates $Z_1, Z_2, \ldots, Z_k$, $\lambda_0(t)$ = the baseline hazard function and $\beta_1, \beta_2, \ldots, \beta_k$ are the parameters of the model. In the literature, this model is commonly expressed as

$$\lambda(t; Z; \beta) = \lambda_0(t) \exp(Z\beta) \qquad (2)$$

where $Z$ is the vector of covariates and $\beta$ is the vector of model parameters, which are essentially regression coefficients estimated by the method of maximum likelihood. The maximum likelihood estimators (mles) are obtained by Newton-Raphson algorithms, using software codes which are becoming widely available and accessible to practicing engineers. The estimators and formal statistical tests are based on asymptotic properties. The two primary approaches to statistical inference are (1) asymptotic normality and (2) likelihood ratio statistics.
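The defining property of the model in eqn (2) can be checked numerically. The following is a minimal sketch, assuming a two-parameter Weibull baseline hazard; the parameter values (`alpha`, `delta`, `beta`) and the covariate vectors are hypothetical and serve only as an illustration. It shows that the ratio of hazards for two covariate vectors is the same at every time t.

```python
import math

def weibull_baseline_hazard(t, alpha, delta):
    """Baseline hazard lambda_0(t) of a two-parameter Weibull (assumed here)."""
    return (delta / alpha) * (t / alpha) ** (delta - 1.0)

def ph_hazard(t, z, beta, alpha, delta):
    """Hazard lambda(t; Z; beta) = lambda_0(t) * exp(Z beta), as in eqn (2)."""
    linear_predictor = sum(zk * bk for zk, bk in zip(z, beta))
    return weibull_baseline_hazard(t, alpha, delta) * math.exp(linear_predictor)

# Hypothetical parameter values, chosen only for illustration.
alpha, delta = 1000.0, 0.9
beta = [0.5, -0.3]
z_first, z_second = [1.0, 0.0], [0.0, 1.0]

# The baseline hazard cancels, so the hazard ratio does not depend on t.
hazard_ratios = [
    ph_hazard(t, z_first, beta, alpha, delta) / ph_hazard(t, z_second, beta, alpha, delta)
    for t in (10.0, 100.0, 500.0)
]
print(hazard_ratios)
```

Because the baseline cancels, the ratio reduces to exp[(Z_first − Z_second)β], which is why a plot of ln[−ln R(t)] against ln(t) shows parallel curves when the proportional hazards assumption holds.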
2.2 Asymptotic normality
To illustrate these methods, let $\theta$ be a vector of true distribution parameters and covariate regression coefficients. Under the assumption of asymptotic normality, the maximum likelihood estimator $\hat{\theta}$ is distributed multivariate normal, $N[\theta, I^{-1}(\theta)]$. The standard normal variate is given by

$$z = \frac{\hat{\theta}_j - \theta_j}{[I^{jj}(\hat{\theta})]^{1/2}} \qquad (3)$$

where $I^{jj}(\hat{\theta})$, the $j$th diagonal element of the inverse information matrix, is the asymptotic variance of $\hat{\theta}_j$, also denoted Asvar($\hat{\theta}_j$). Confidence intervals can be obtained using this standard normal variate [4].
2.3 Likelihood ratio statistics
For smaller samples, most authorities recommend that statistical inference be based on the likelihood ratio statistic

$$R(\theta_0) = L(\theta_0)/L(\hat{\theta}) \qquad (4)$$

where $L$ denotes the maximized likelihood for the model. Under the null hypothesis $H_0: \theta = \theta_0$, the asymptotic distribution of

$$\Lambda = -2 \log R(\theta_0) = -2 \log L(\theta_0) + 2 \log L(\hat{\theta}) \qquad (5)$$

is approximately chi-square with degrees of freedom equal to the order of the information matrix. Large values of $\Lambda$ lead to rejection of the null hypothesis [5].
3 COMPRESSOR WARRANTY
3.1 Objectives
The authors undertook a quantitative investigation of field warranty data on automobile air conditioning compressors. The objectives of this study were as follows: (1) to demonstrate practical application of theory; (2) to investigate the commercial utility of modeling concepts developed in the research; and (3) to study a case of reliability analysis in a field warranty data set for air conditioning compressors.
3.2 Approach
The authors were previously employed in quality assurance and reliability engineering. Prior experiences were in reliability and maintainability engineering, and in warranty management for a company manufacturing air conditioning systems for automobiles. This first-hand experience with field data analysis, and specifically in the warranty data base, provided valuable knowledge and insight.
The approach in this research included preparation and analysis of the warranty data. The authors obtained authorization from the manufacturer to use the warranty data for research purposes. All proprietary information, such as part numbers, model numbers and vehicle identification numbers, was purged from the working data set.
Statistical analyses were performed using the SAS procedures LIFETEST and LIFEREG. The LIFETEST procedure computes nonparametric estimates of reliability distributions from censored life data, by either the life-table method or the product-limit (Kaplan-Meier) method. LIFEREG fits parametric reliability models to censored life data. The allowed distributions are the exponential, Weibull, lognormal, loglogistic and gamma. Covariates may be included in the models. LIFEREG computes the maximum likelihood estimators of model parameters and regression coefficients by a Newton-Raphson algorithm.
The Weibull analysis employs the location-scale formulation. The scale parameter ($\sigma$) in the location-scale model is the reciprocal of the Weibull shape parameter ($\delta$) [6]. The Weibull reliability function is expressed in a variety of ways in the literature. A common form of the two-parameter Weibull without covariates is

$$R(t; \alpha; \delta) = \exp\{-[(t/\alpha)^{\delta}]\} \qquad (6)$$

where $\alpha$ = Weibull scale parameter and $\delta$ = Weibull shape parameter. The LIFEREG software uses the location-scale survival function, for $w = \ln(t)$:

$$G(w) = \exp\{-\exp[(w - \mu)/\sigma]\} \qquad (7)$$

where $w = \ln(t)$, $\mu$ = location parameter and $\sigma$ = scale parameter. The parameters in eqns (6) and (7) are related as follows:

$$\delta = 1/\sigma \qquad (8)$$

and

$$\alpha = \exp(\mu) \qquad (9)$$

Subsequent discussion in this paper assumes the location-scale model, and deals specifically with the estimates of $\sigma$ obtained from computer analysis.
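The equivalence of the two parameterizations can be verified with a short calculation. The sketch below assumes the relations stated for eqns (8) and (9), namely that the Weibull shape is the reciprocal of the location-scale scale parameter and that the Weibull scale is the exponential of the location parameter. The function names and the values of `mu`, `sigma` and `t` are hypothetical.

```python
import math

def weibull_from_location_scale(mu, sigma):
    """Return (alpha, delta): alpha = exp(mu), delta = 1 / sigma."""
    return math.exp(mu), 1.0 / sigma

def reliability_weibull(t, alpha, delta):
    """Eqn (6): R(t) = exp{-(t / alpha) ** delta}."""
    return math.exp(-((t / alpha) ** delta))

def reliability_location_scale(t, mu, sigma):
    """Eqn (7) evaluated at w = ln(t): G(w) = exp{-exp[(w - mu) / sigma]}."""
    w = math.log(t)
    return math.exp(-math.exp((w - mu) / sigma))

# Hypothetical location-scale estimates, for illustration only.
mu, sigma = 9.0, 1.2
alpha, delta = weibull_from_location_scale(mu, sigma)

t = 120.0
r_weibull = reliability_weibull(t, alpha, delta)
r_location_scale = reliability_location_scale(t, mu, sigma)
print(r_weibull, r_location_scale)  # the two forms agree up to rounding error
```

The agreement follows because (t/α)^δ = exp[δ(ln t − μ)] = exp[(w − μ)/σ].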
3.3 Automotive air conditioning
The warranty data set consisted of failure and censoring events on auto air conditioning compressors. Such equipment is installed at the factory, by a distributor or by a local dealership. The automotive model year begins in the fall of the calendar year, usually September or October, depending upon the automaker. For cars manufactured in the early model year, the air conditioning system is used minimally until the spring, around April or May of the following calendar year. There are seasonal patterns in the sales of automobiles and in the use of the air conditioning. There is also a lag of about one month from warranty claim until that claim is entered in the warranty data base. These factors make the assessment of system reliability very difficult. Problems and trends are unclear until well after the model year is complete. Consequently, sophisticated techniques are needed for the analysis of air conditioning warranty data. The analysis is justified by the importance of the product quality in a competitive marketplace and the substantial impact of warranty expense on profitability of the firm.

3.4 Weibull distribution and covariates
Engineers have recognized that the Weibull distribution is applicable to the life analysis of reciprocating mechanical equipment, such as compressors. The authors' experience in the industry and with this particular data set led to the belief that two variables might be important covariates: (1) dormancy (DORMANT) and (2) installer (INSTALL). The compressors in this sample were suspected of being placed under abnormally high stress (overcharging with refrigerant) at the time of installation. The researchers recognized that this was an important factor and that the stress might be relieved during the dormancy between installation and the subsequent air conditioning season. The compressors in the data set were installed between October of one year (year 1) and April of the following year (year 2).
Two phases were defined:
(1) dormancy (from October year 1 to April year 2);
(2) active (from April to September year 2).
The events consisted of either failure or Type I (time) censoring. There were 134 failures and 4433 censoring events (i.e. 97% censoring) in the active phase.
Fig. 1. Kaplan-Meier estimates (installer groups).
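The Kaplan-Meier estimates in Fig. 1 were produced with the SAS procedure LIFETEST. As a rough sketch of the same idea, the code below implements the product-limit estimator for right-censored data and maps the result to the coordinates ln(t) and ln[−ln R(t)] used for a Weibull plot. The function names and the small data set are hypothetical and are not taken from the warranty data.

```python
import math

def kaplan_meier(times, failed):
    """Product-limit estimate of R(t) at each distinct failure time.

    times  -- observed times (failure or censoring)
    failed -- True for a failure, False for a censored observation
    Returns a list of (t, R(t)) pairs.
    """
    observations = sorted(zip(times, failed))
    n_at_risk = len(observations)
    reliability = 1.0
    curve = []
    i = 0
    while i < len(observations):
        t = observations[i][0]
        failures_at_t = 0
        removed_at_t = 0
        # Group all observations tied at time t; units censored at t
        # are still counted as at risk for the failures at t.
        while i < len(observations) and observations[i][0] == t:
            failures_at_t += int(observations[i][1])
            removed_at_t += 1
            i += 1
        if failures_at_t:
            reliability *= 1.0 - failures_at_t / n_at_risk
            curve.append((t, reliability))
        n_at_risk -= removed_at_t
    return curve

def weibull_plot_points(curve):
    """Map (t, R) to (ln t, ln[-ln R]), skipping points where R is 0 or 1."""
    return [(math.log(t), math.log(-math.log(r))) for t, r in curve if 0.0 < r < 1.0]

# Hypothetical observations in days; False marks a censored unit.
times = [5, 8, 8, 12, 15, 20, 20, 20]
failed = [True, False, True, True, False, True, False, False]

curve = kaplan_meier(times, failed)
points = weibull_plot_points(curve)
print(curve)
print(points)
```

For Weibull data the transformed points fall near a straight line whose slope estimates the shape parameter, so strata with visibly different slopes suggest different shape parameters.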
3.5 Data analysis
Figure 1 illustrates the plots of $\ln(t)$ versus $\ln[-\ln R(t)]$, where $\ln(\cdot)$ denotes the natural logarithmic function. The reliability function (R) was estimated by the nonparametric Kaplan-Meier method, using the SAS procedure LIFETEST. Note that the plots for installers A and B have different slopes, giving subjective evidence that the two strata are characterized by different Weibull shape parameters. Accordingly, the graphical evidence indicates that the two levels of the INSTALL (Installer) class covariate do not form proportional hazards. The LIFETEST software provided three different chi-square test statistics for the null hypothesis of homogeneity (logrank, Wilcoxon and likelihood ratio), all of which were highly significant (p-values of 0.0001). The logrank and Wilcoxon tests of association for the DORMANT covariate gave strong evidence that dormancy (time elapsed between installation and the active air conditioning season) was an explanatory variable.
Figure 1 graphically indicates different Weibull shape parameters for the two installers. If there are two separate populations, then the data should not be pooled across strata. Therefore the authors investigated whether the difference in the estimated shape parameters is statistically significant, based on formal tests. The approach was to fit the Weibull distribution, without covariates, to the life data in the two installer strata. The maximum likelihood estimates were obtained by means of SAS procedure LIFEREG and are summarized in Table 1.
TABLE 1
Asymptotic Normal Intervals for Installers

Installer strata   Scale parameter estimate   Asymptotic variance   95% confidence interval
A                  1.185800                   0.023684              (0.884, 1.487)
B                  0.785786                   0.008118              (0.609, 0.962)
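The intervals in Table 1 can be recomputed from the tabulated estimates and asymptotic variances. The following minimal sketch assumes the usual symmetric form, estimate ± 1.96 × (asymptotic variance)^(1/2); the function and variable names are illustrative only.

```python
import math

Z_975 = 1.96  # 97.5th percentile of the standard normal distribution

def asymptotic_normal_interval(estimate, asvar, z=Z_975):
    """Two-sided interval: estimate +/- z * sqrt(asymptotic variance)."""
    half_width = z * math.sqrt(asvar)
    return estimate - half_width, estimate + half_width

# (scale parameter estimate, asymptotic variance) for each installer stratum
table_1 = {"A": (1.185800, 0.023684), "B": (0.785786, 0.008118)}

intervals = {
    installer: asymptotic_normal_interval(estimate, asvar)
    for installer, (estimate, asvar) in table_1.items()
}
for installer, (lower, upper) in intervals.items():
    print(f"{installer}: ({lower:.3f}, {upper:.3f})")
```

Checking whether each estimate lies outside the other stratum's interval is an informal comparison only; the formal test of the difference uses the combined variance of the two estimates.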
The 95% confidence intervals were based on asymptotic normality. As shown in Table 1, neither estimate falls within the 95% confidence interval of the other estimate. Assuming asymptotic normality, each estimate is distributed normal, $N[\hat{\sigma}, \mathrm{Asvar}(\hat{\sigma})]$, where $\hat{\sigma}$ is the estimated scale parameter of the location-scale model. To test the hypothesis of no difference in means, the null hypothesis is $H_0: \sigma_A - \sigma_B = 0$ and the test statistic is

$$Z = \frac{\hat{\sigma}_A - \hat{\sigma}_B - 0}{[\mathrm{Asvar}(\hat{\sigma}_A) + \mathrm{Asvar}(\hat{\sigma}_B)]^{1/2}} = \frac{1.185800 - 0.785786}{(0.023684 + 0.008118)^{1/2}} = 2.2395 > 1.96$$

This result indicates rejection of the null hypothesis at the 0.05 level ($z_{0.975} = 1.96$).
As an illustration of the likelihood ratio method, consider testing the hypothesis of exponentially distributed lives:

$$H_0: \sigma_i = 1 \quad \text{for } i = A, B$$
These hypothesis tests require maximization of the log likelihood in each stratum, with the scale parameter restricted to $\sigma_i = 1$. Log likelihood values for both the unrestricted and restricted models are presented in Table 2. To test the hypothesis that the scale parameter for distribution A is equal to one, the likelihood ratio statistic is

$$\Lambda = -2(-355.3501) + 2(-354.4367) = 1.8268$$

The value of $\chi^2(0.05, 1)$ is 3.84. Since 1.8268 < 3.84, the null hypothesis is not rejected for installer A. Similarly, to test the corresponding hypothesis for installer B, the likelihood ratio statistic has the value

$$\Lambda = 4.078$$

Since 4.078 > 3.84, the null hypothesis is rejected for installer B.
TABLE 2
Maximized Log Likelihoods for the Likelihood Ratio Tests on Scale Parameters

Installer strata   Scale parameter estimate   Log likelihood
A                  1.185800                   -354.4367
A                  1                          -355.3501
B                  0.785786                   -370.0552
B                  1                          -372.0941
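The two formal tests can be recomputed from the values in Tables 1 and 2. The sketch below is illustrative only: the function names are hypothetical, and the p-values use the identity P(χ² > x) = erfc[(x/2)^(1/2)], which holds for one degree of freedom. Recomputing Z from the rounded table entries gives a value near 2.24, slightly different in the third decimal from the reported 2.2395.

```python
import math

def z_statistic(est_a, asvar_a, est_b, asvar_b):
    """Z for H0: sigma_A - sigma_B = 0, using the sum of the asymptotic variances."""
    return (est_a - est_b) / math.sqrt(asvar_a + asvar_b)

def likelihood_ratio_statistic(loglik_restricted, loglik_unrestricted):
    """Lambda = -2 log L(theta_0) + 2 log L(theta_hat), as in eqn (5)."""
    return -2.0 * loglik_restricted + 2.0 * loglik_unrestricted

def chi_square_upper_tail_1df(x):
    """P(X > x) for a chi-square variate with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Scale parameter estimates and asymptotic variances from Table 1.
z = z_statistic(1.185800, 0.023684, 0.785786, 0.008118)

# Restricted (sigma = 1) and unrestricted log likelihoods from Table 2.
lr_a = likelihood_ratio_statistic(-355.3501, -354.4367)
lr_b = likelihood_ratio_statistic(-372.0941, -370.0552)

print(f"Z = {z:.4f}")
print(f"Installer A: Lambda = {lr_a:.4f}, p = {chi_square_upper_tail_1df(lr_a):.4f}")
print(f"Installer B: Lambda = {lr_b:.4f}, p = {chi_square_upper_tail_1df(lr_b):.4f}")
```

Each likelihood ratio test has one degree of freedom because only the scale parameter is restricted under the null hypothesis.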
Both the graphical evidence and the asymptotic tests indicated that the time-to-failure distributions are different for the two installation sites. Statistical analyses of the dormancy factor established that DORMANT was a covariate forming proportional hazards within the two populations defined by installer.

3.6 Model results
Within each installer stratum, active-phase reliability was modeled as a Weibull, with the characteristic life a function of the dormancy interval. The DORMANT class covariate was defined with three levels (zero, one to two, or three to four months). The mles of regression coefficients, for installer $i$, gave the following model for characteristic life (in days):

$$\alpha_i(z, \beta_i) = \exp(\beta_{i1} + Z_2\beta_{i2} + Z_3\beta_{i3}), \quad i = A, B \qquad (10)$$

where $\beta_{A1} = 10.0841$, $\beta_{A2} = -1.4082$, $\beta_{A3} = -0.6289$, $\beta_{B1} = 8.3264$, $\beta_{B2} = -1.1974$ and $\beta_{B3} = -0.7898$. Chi-square tests of the hypotheses $H_0: \beta_{ij} = 0$ ($i = A, B$; $j = 1, 2, 3$) indicated rejection, with observed significance levels (p-values) of 0.10 or less. In eqn (10), the indicator variables assume the values in Table 3.
TABLE 3
Indicator Variables

Dormancy (months)   Z2   Z3
0                   1    0
1-2                 0    1
3-4                 0    0
In the case of installer A, the active-phase reliability for an air conditioning season of 120 days, following a dormancy of two months, is (from eqn (6))

$$R(t; \alpha_A(z, \beta_A); \delta_A) = \exp[-(t e^{-z\beta_A})^{\delta_A}] = \exp[-(120 e^{-(10.0841 - 0.6289)})^{0.8433}] = 0.981 \qquad (11)$$
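The calculation in eqn (11) can be traced step by step. The sketch below is an illustration under stated assumptions: the indicator coding follows Table 3 (Z2 = 1 for zero dormancy, Z3 = 1 for one to two months, both zero for three to four months), the Weibull shape is taken as the reciprocal of the scale parameter estimate in Table 1, and the function and dictionary names are hypothetical.

```python
import math

# Regression coefficients (beta_i1, beta_i2, beta_i3) reported for eqn (10).
BETA = {"A": (10.0841, -1.4082, -0.6289), "B": (8.3264, -1.1974, -0.7898)}

# Location-scale scale parameter estimates from Table 1; delta = 1 / sigma.
SIGMA = {"A": 1.185800, "B": 0.785786}

# Indicator variables (Z2, Z3) for each dormancy level, as in Table 3.
INDICATORS = {"0": (1, 0), "1-2": (0, 1), "3-4": (0, 0)}

def characteristic_life(installer, dormancy):
    """Eqn (10): alpha_i = exp(beta_i1 + Z2 * beta_i2 + Z3 * beta_i3), in days."""
    b1, b2, b3 = BETA[installer]
    z2, z3 = INDICATORS[dormancy]
    return math.exp(b1 + z2 * b2 + z3 * b3)

def reliability(t_days, installer, dormancy):
    """Eqn (6) with the covariate-dependent characteristic life."""
    alpha = characteristic_life(installer, dormancy)
    delta = 1.0 / SIGMA[installer]
    return math.exp(-((t_days / alpha) ** delta))

# Installer A, 120-day season, dormancy of two months (level "1-2").
r_installer_a = reliability(120.0, "A", "1-2")
print(f"R = {r_installer_a:.3f}")
```

The same two functions can be evaluated for other dormancy levels or for installer B by changing the arguments.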
4 CONCLUSIONS
The research reported here demonstrated the potential of the proportional hazards models in reliability analysis of warranty data. The software performed satisfactorily and should prove very useful to practicing engineers. The research produced insights valuable for accelerated testing, management of installers and improvement of installation procedures.
The maximum likelihood estimates and statistical inferences were based on asymptotic theory. Warranty data sets tend to be reasonably large, compared to the sample sizes in life tests. However, warranty data tend to be heavily censored, as illustrated by the compressor data. The mles are generally regarded as better for larger samples and moderate censoring, and sensitivity to censoring is a subject for additional research. In the meantime, the authors suggest that the theory and methods described in this paper provide valuable tools for engineers to analyze warranty data.
ACKNOWLEDGEMENTS
This work represents a portion of research funded by the United States Defense Nuclear Agency/SDIO under contract number DNA-001-85-C0184. It illustrated how data from a commercial application can benefit research on defense-related topics. More importantly, the potential was illustrated for technology transfer from strategic defense research to the private sector.
REFERENCES
1. Dale, C. J. Application of the proportional hazards model in the reliability field, Reliability Engineering, 10 (1985), pp. 5-25.
2. Ascher, H. Regression analysis of repairable systems reliability, NATO ASI, F3, pp. 119-33.
3. Prentice, R. L., Williams, B. J. and Peterson, A. V. On the regression analysis of multivariate failure time data, Biometrika, 68 (1981), pp. 373-9.
4. Kalbfleisch, J. D. and Prentice, R. L. The Statistical Analysis of Failure Time Data, Wiley, New York, 1980.
5. Lawless, J. F. Statistical Models and Methods for Lifetime Data, Wiley, New York, 1982.
6. SAS Institute Incorporated. SAS User's Guide: Statistics, Version 5 Edition, SAS, Cary, North Carolina, 1985.