On the internal multivariate quality control of analytical laboratories. A case study: the quality of drinking water


Chemometrics and Intelligent Laboratory Systems 56 (2001) 93–103 www.elsevier.com/locate/chemometrics


O. Ortiz-Estarelles a, Y. Martín-Biosca a, M.J. Medina-Hernández a, S. Sagrado a,*, E. Bonet-Domingo b

a Departamento de Química Analítica, Facultad de Farmacia, Universitat de Valencia, C/ Vicente Andrés Estellés s/n, 46100 Burjassot, Valencia, Spain
b General de Análisis Materiales y Servicios (GAMASER) S.L., Valencia, Spain

Received 30 June 2000; accepted 26 February 2001

Abstract

Multivariate statistical process control (MSPC) tools, based on principal component analysis (PCA), partial least squares (PLS) regression and other regression models, are used in the present study for automatic detection of possible errors in the methods used for routine multiparametric analysis in order to design an internal Multivariate Analytical Quality Control (iMAQC) program. Such tools could notice possible failures in the analytical methods without resorting to any external reference since they use their own analytical results as a source for the diagnosis of the method's quality. Pseudo-univariate control charts provide an attractive alternative to traditional univariate and multivariate control charts. This approach uses the relative prediction error in percentage, Er (%), which is calculated from a multivariate model such as PLS, as the univariate control variable. Er offers quantitative information on the magnitude of the error and is sensitive to systematic errors at the 10% level (which are of analytical interest). Finally, its capacity to detect/quantify error in a single method can be checked a priori. As a case study for applying such strategies in routine analysis, the problem of the quality of drinking water was examined. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Quality control; Control charts; Multivariate statistical process control; Multivariate regression; PCA; PLS; Routine water analysis

* Corresponding author. Tel.: +34-96-38-64-878; fax: +34-96-38-64-953. E-mail address: [email protected] (S. Sagrado).

1. Introduction

The norms to regulate the quality of the water for human consumption include limits for all the parameters that directly affect human health [1]. Most of these parameters are obtained by applying analytical methods. The entire set of circumstances under which analytical results are produced can be defined as the laboratory's analytical system [2]. It is responsible for the accuracy of analytical data and, therefore, for water quality estimation (a "multiparametric" routine analysis system; here we have used the term "multiparametric" instead of the term "multianalyte" since not all the parameters determined in water samples are analytes).

0169-7439/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0169-7439(01)00114-9


Since the reliability of the analytical results depends on the quality of the analytical methods used, it becomes essential to validate them. The two key elements for controlling bias in routine quality control are the use of reference materials and participation in interlaboratory tests [2]. Nevertheless, these external control mechanisms are costly and must be applied periodically, which affects the analysis routine. Moreover, they do not prevent the analytical system from failing (i.e. a systematic error in a method) between the periods of validation. For that reason, it is desirable to complement the external validation scheme with internal laboratory quality control tools, which use the analytical results generated by an analytical method as the tool for diagnosing its quality. The traditional approach is to use the individual results of each analytical method (univariate statistics). Techniques for statistical process control (SPC) are being used increasingly in the analytical field, and control charts, for example, have become a popular tool for monitoring the stability of an analytical method [3]. Nevertheless, in multivariate problems such as water quality, a vector of analytical results is generated in the laboratory for each analysed sample. In such cases, monitoring univariate control data on all the individual parameters can be very misleading and difficult to interpret [4]. Alternatively, it is possible to use the joint information of several analytical results (multivariate statistics). Multivariate techniques, such as principal component analysis (PCA) and partial least squares (PLS) regression, are being used more and more to solve analytical problems. For example, there are numerous applications of these techniques in the water analysis field, but they have focused on characterizing environmental problems [5–9] and on multivariate calibration for determining chemical species [10,11].

In the last few years, in the field of process engineering, multivariate statistical process control (MSPC) techniques have been developed [4,12–14] based on PCA and PLS models. These techniques have proven to be useful in identifying changes in the systems and sensor failures in chemical processes. However, MSPC techniques have not been widely applied in the analytical area, where they have focused on the control of spectral signals [15,16]. As far as we know, these techniques have not been used to diagnose the possible bias in analytical methods as a means of internal validation in analytical laboratories. In the present study, MSPC tools based on PCA and PLS models are adapted for automatic detection of possible errors in the methods employed for routine analysis of drinking water, using only the analytical results generated by them. Several strategies were compared in terms of error detection ability by means of error simulation. Some aspects, such as the type of information obtained (qualitative or quantitative), the influence of the homogeneity of sample composition and the predictive ability of the model, were studied.

2. Experimental

The values of 25 parameters from 186 water samples corresponding to a period of 4 years (1996–1999) were used. The samples were taken at different points of the drinking water network and wells from the province of Valencia (Spain). The frequency and distribution of analysis in the different zones were established in accordance with the legislation [1]. The analyses were carried out in the GAMASER S.L. laboratories (specialized in the routine analysis of drinking water in this province) following standard methods of analysis [17]. The 25 parameters, selected from a total of 60 (complete analysis), were those showing concentration levels over the quantification limits of the corresponding analytical methods. The parameters used (and the corresponding units of measurement) were: (1) turbidity (in U.N.F.), (2) pH, (3) conductivity (in µS/cm at 20 °C), (4) chloride (in mg/l), (5) sulfate (in mg/l), (6) silica (in mg/l), (7) calcium (in mg/l), (8) magnesium (in mg/l), (9) sodium (in mg/l), (10) potassium (in mg/l), (11) hardness (in °F), (12) alkalinity (in mg/l HCO3−), (13) total dissolved solids (in mg/l), (14) dissolved oxygen (in %O2), (15) nitrate (in mg/l), (16) oxidability (KMnO4) (in mg/l O2), (17) total organic carbon (COT, in mg/l C), (18) boron (in mg/l), (19) fluoride (in mg/l), (20) trihalomethanes (THM, in mg/l), (21) iron (in mg/l), (22) zinc (in mg/l), (23) chlorine residual (in mg/l), (24) aluminium (in mg/l) and (25) temperature (in °C).


2.1. Chemometric methods

3. Results and discussion

Analysis of each water sample involves generating a vector with 25 analytical results, one from each method applied to the sample. In this context, each method can be considered a sensor, whereas the group of methods associated with the global performance of the laboratory can be considered a system. Finally, the totality of the analyses done over a period of time can be seen as the process under study. The available data were arranged in an X-matrix (168 × 25). MSPC techniques are usually used with a large, "clean" data set representing a "normal" operation system and showing a high degree of correlation. The results of routine laboratory analyses may not fit exactly the features of such systems. In our opinion, this does not mean that MSPC is not applicable to the control of analytical laboratories. However, the techniques should be adapted to the particular features of the data, and their applicability in relation to each particular method should be taken into account. A 168-sample data set is probably smaller than that used in most MSPC applications. However, it represents the variability of water quality over a 4-year period and therefore can be considered a representative set in this particular case. On the other hand, the main objective of this preliminary paper is to explore the possibilities of various control plots for alerting to possible systematic errors in the analytical methods in routine analysis. For this purpose, the available water quality data set serves as an adequate case study. The X-matrix was decomposed by PCA as:

$$X = T_k P_k^{T} + E \qquad (1)$$

where $T_k$ is the matrix of scores, $P_k$ is the matrix of loadings, k is the number of principal components (PCs) included in the model and E is the residual matrix. The residual matrix from the test samples ($E_t$) was calculated according to:

$$E_t = X_t \left( I - P_k P_k^{T} \right) \qquad (2)$$

where I is the identity matrix and $P_k$ is the loading matrix calculated in Eq. (1).
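As an illustration of Eqs. (1) and (2), the following Python/NumPy sketch fits a k-component PCA model by singular value decomposition and computes the residuals of new samples. It is not the authors' implementation: the function names (`fit_pca`, `project_residuals`), the synthetic data and the choice of autoscaling with the training mean and standard deviation are assumptions made for the example.

```python
import numpy as np

def fit_pca(X_train, k):
    """Autoscale X_train and return (mean, std, P_k) for a k-component PCA model."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    # Rows of Vt are the principal directions; the first k become the columns of P_k.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:k].T

def project_residuals(X_new, mean, std, P):
    """Eq. (2): E_t = X_t (I - P P^T), with X_t scaled by the training statistics."""
    Zt = (X_new - mean) / std
    return Zt @ (np.eye(P.shape[0]) - P @ P.T)

# Synthetic stand-in data (assumption): 140 samples, 25 correlated "methods".
rng = np.random.default_rng(0)
X = rng.normal(size=(140, 6)) @ rng.normal(size=(6, 25)) + 0.1 * rng.normal(size=(140, 25))

mean, std, P = fit_pca(X, k=6)
Z = (X - mean) / std
T = Z @ P            # scores T_k
E = Z - T @ P.T      # residual matrix E of Eq. (1)
E_t = project_residuals(X[:5], mean, std, P)
```

Because the loadings are orthonormal, the residuals of any sample can be obtained by projection alone, without refitting the model.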

3.1. Quality and structure of data set

The ideal situation in an analytical quality control study is to have a calibration set with no examples of out-of-control analytical measurements and a test set with confirmed examples of all expected out-of-control effects. However, this may not be possible when only historic data are available or when the cost of obtaining representative referenced samples is too high. In this situation, the data set needs to be split into a calibration set and a test set. The method of doing this can be quite arbitrary. The purpose is to get a calibration set that will generate a test limit that is conservative enough to highlight samples with possible problems. However, the samples that are put into the test set by such procedures should not be labelled as outliers or out-of-control. In truth, they may be, but there is no evidence that they are; they are simply the most extreme samples in the data set. As new samples are tested, evidence of truly out-of-control samples should be gathered at the same time, and the calibration and test sets revised from time to time to reduce the incidence of false positives given by the process. In this paper, we have used a data set for which we have no evidence about out-of-control samples, so we have used a multivariate procedure that consists of putting all of the samples with multivariately extreme values into the test set and checking that this set then includes all of the univariately extreme values as well.

3.2. Split of data into calibration and test sets

The localisation of extreme samples in the data matrix was performed on the autoscaled data. Fig. 1 shows two different plots revealing the presence of anomalous samples. Fig. 1a shows the univariate box plots for each column of X (method). Fig. 1b gives a multivariate approach based on a PCA model.

Fig. 1. Univariate and multivariate detection of anomalous samples. (a) Box plots for all methods. (b) Q vs. T² control plot based on PCA model.

A critical aspect of PCA is the selection of the number of principal components (k). Unfortunately, there is no universal, automatic criterion for this selection [18,19], and discrepancies between different criteria are common. Todeschini [18] did an extensive comparison of methods for establishing data correlation and the number of PCs to be used in the PCA model. The author proposed a new correlation index (K) as a measure of the total quantity of correlation in the data set, as well as linear (KL) and non-linear (KP) functions for estimating the maximum and minimum number of significant PCs, respectively. The author concluded that KL and KP can be used to estimate the number of PCs giving potentially useful information, and that they are simpler than most of the traditional criteria used up to now. Applying the Todeschini criterion to the data matrix, we obtain K = 0.41, which suggests a moderate degree of correlation, KL = 16 (which implies 92.3% of the explained variance, EV) and KP = 7 (67.6% EV). We select k = KP = 7 PCs in order to avoid obtaining a "conservative" result, since the objective is to detect extreme samples in the data matrix.

A hypothesis test showing anomalous (possibly out-of-control) samples can be obtained from the Hotelling T² and residual Q statistics and their confidence limits from the PCA model [13]. In MSPC, the Q or T² statistic is usually plotted vs. the sample number (in many cases associated with time) as control charts. Here we have preferred to plot the Q vs. T² statistics directly, together with their control limits [16,19]. Fig. 1b shows the Q–T² control plot corresponding to the 7-PC model. As this plot shows, samples 39, 81 and 87 are far from the rest, which confirms that these three samples are out-of-control, in agreement with the univariate box plots of Fig. 1a. Sample 39 presents a much higher T² value (the distance to the multivariate mean) than the corresponding control limit (T²-limit). This is interpreted as an unusual disturbance within the PCA model. Fig. 1a reveals that this sample presents high values in parameters 1 (turbidity) and 21 (iron) simultaneously. In contrast, samples 81 and 87 have a Q value well above the corresponding control limit (Q-limit), which indicates that the sample data shifted outside the normal operating PCA space. Usually this fact is associated with anomalous values in a single parameter. Fig. 1a confirms that sample 81 has the maximum value of parameter 13 (CO2), while sample 87 has a very low value in parameter 2 (pH). After elimination of these three samples, two new cycles (PCA model recalculation and elimination of samples outside the Q- and/or T²-limits) were applied and 25 more samples were eliminated.
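The cyclic procedure described above (fit the PCA model, compute T² and Q, set aside samples beyond the limits, refit) can be sketched as follows. This is a minimal illustration and not the authors' code. In particular, the control limits here are empirical percentiles of the training statistics, used as a stand-in for the theoretical confidence limits of Ref. [13], and the names `pca_t2_q` and `trim_extremes`, the `pct` parameter and the synthetic data are all assumptions.

```python
import numpy as np

def pca_t2_q(Z, k):
    """Return Hotelling T^2 and residual Q for each row of the autoscaled matrix Z."""
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:k].T
    lam = s[:k] ** 2 / (Z.shape[0] - 1)   # variance captured by each retained PC
    T = Z @ P
    t2 = np.sum(T ** 2 / lam, axis=1)     # distance to the centre inside the PCA space
    E = Z - T @ P.T
    q = np.sum(E ** 2, axis=1)            # squared distance to the PCA space
    return t2, q

def trim_extremes(X, k, cycles=3, pct=95.0):
    """Refit the model `cycles` times, each time dropping samples beyond either limit.

    Assumption: percentile limits replace the theoretical T^2 and Q limits.
    """
    keep = np.arange(X.shape[0])
    for _ in range(cycles):
        Xk = X[keep]
        Z = (Xk - Xk.mean(axis=0)) / Xk.std(axis=0, ddof=1)
        t2, q = pca_t2_q(Z, k)
        inside = (t2 <= np.percentile(t2, pct)) & (q <= np.percentile(q, pct))
        keep = keep[inside]
    removed = np.setdiff1d(np.arange(X.shape[0]), keep)
    return keep, removed

# Synthetic stand-in data (assumption) with one gross error in a single method.
rng = np.random.default_rng(1)
X = rng.normal(size=(168, 6)) @ rng.normal(size=(6, 25)) + 0.1 * rng.normal(size=(168, 25))
X[39, 0] += 50.0
keep, removed = trim_extremes(X, k=6)
```

With percentile limits a fixed fraction of samples is set aside in every cycle, so the number of removed samples depends on `pct` and `cycles` rather than on a significance level; the theoretical limits would be the natural replacement in a real application.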
The final PCA model was trained with the definitive 140 × 25 X-matrix, which was considered a "normal clean" data set for quality control. The Todeschini criterion values were K = 0.46 and a significant PC interval of 6–14. The value k = 6 PCs was chosen for this control model, which accounts for 67.6% EV. This strategy will lead to a model that, in the worst case, may increase the number of false detections (to be confirmed later). This is not as negative as the risk of having a non-representative model obtained with a non-clean data set. For instance, including in the model a sample containing an atypically high value for a single parameter (affecting the associated parameter loading in the first PCs) may mask the detection of future samples with an error in this parameter [13,14]. This new PCA model may serve to alert us to future anomalous (possibly out-of-control) test samples. The 25 × 25 matrix (formed by the previously eliminated out-of-control samples) was used as a test matrix (X_t) to check the performance of the PCA-based "alert system". The test data (non-scaled) were scaled according to the mean and standard deviation of the 140 × 25 training X-matrix, interpolated into the PCA model and, finally, their Q and T² statistics were obtained. Fig. 2a shows the Q vs. T² control plot including training samples (+) and interpolated test samples. As can be expected, all the test samples are above the Q- and T²-limits, i.e. in a real situation they will be classified as anomalous samples to be investigated.

At this point, it is interesting to have a tool that shows which parameters (analytical methods) are related to the detected anomalous test samples. Examining the residual matrix from the test samples (E_t) can facilitate this task by means of the so-called residual contribution plot strategy. These plots were designed to identify the sensors (variables) responsible for a high mean residual value in periods in which a system is beyond control [20]. As an alternative to the use of PCA residuals, a method for generating PCA-like residuals from single data blocks using PLS was developed [21]. In this method, the block of data (X-matrix) is used to calibrate every variable against every other variable with PLS. The prediction residuals corresponding to the X_t-matrix are used instead of the ones obtained by means of PCA. Fig. 2b shows the residual contribution plot (mean value of each column vector of E_t, from the PCA and PLS approaches) associated with test samples 8 and 24, which emerge as a high-Q and a high-T² sample, respectively (in Fig. 2a). For sample 8, a single parameter (18, boron) appears to be mainly responsible for the high Q value. In contrast, sample 24 presents lower, comparable residuals for methods 14, 21, 22 and 23, which is corroborated by the high T² value. This study could be extended to the rest of the anomalous test samples.
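The scaling and residual contribution steps described above can be sketched as follows. This is a hedged illustration restricted to the PCA residuals; the PLS-based residuals of Ref. [21] are not reproduced. The helper names (`autoscale_with`, `residual_contributions`), the synthetic data and the size of the simulated gross error are assumptions.

```python
import numpy as np

def autoscale_with(X, mean, std):
    """Scale new data with the mean and standard deviation of the training set."""
    return (X - mean) / std

def residual_contributions(X_test, mean, std, P):
    """Mean of each column of E_t = Z_t (I - P P^T): one value per analytical method."""
    Zt = autoscale_with(X_test, mean, std)
    E_t = Zt - Zt @ P @ P.T
    return E_t.mean(axis=0)

# Synthetic stand-in data (assumption): training and test sets share the same structure.
rng = np.random.default_rng(2)
W = rng.normal(size=(6, 25))
X_train = rng.normal(size=(140, 6)) @ W + 0.1 * rng.normal(size=(140, 25))
X_clean = rng.normal(size=(5, 6)) @ W + 0.1 * rng.normal(size=(5, 25))

mean, std = X_train.mean(axis=0), X_train.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd(autoscale_with(X_train, mean, std), full_matrices=False)
P = Vt[:6].T

# Gross error in one method: index 17 is parameter 18 (boron) in the list of Section 2.
shift = 5.0
X_faulty = X_clean.copy()
X_faulty[:, 17] += shift * std[17]

clean = residual_contributions(X_clean, mean, std, P)
faulty = residual_contributions(X_faulty, mean, std, P)
suspect = int(np.argmax(np.abs(faulty)))   # method with the largest mean residual
```

Plotting `faulty` as a bar chart gives the residual contribution plot; the bar of the disturbed method grows by the injected shift multiplied by the part of that variable lying outside the PCA space.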


Fig. 2. Multivariate identification of anomalous samples/methods based on a 6-PC PCA model. (a) Q vs. T² control plot including calibration samples (+) and interpolated test samples. (b) Residual contribution plots for detecting the methods contributing to PCA and PLS model residuals corresponding to test samples 8 and 24.

In the context of this paper, an anomalous analytical result produces a breakdown in the correlation between the method results and generates a Q value exceeding the established model Q-limit. The larger the Q values that characterise the test samples, the greater the probability that the error is associated with a single parameter, i.e. a systematic error in the analytical method. In the other cases, a high T² value (several analytical results are extreme simultaneously in a sample) is probably more related to the sample (nature, storage), the analyst (non-expert staff) or the laboratory environment.


3.3. Detection of systematic errors in the analytical methods

From an analytical point of view, the possibility of detecting/quantifying a systematic error in a single routine analytical method is very valuable. Up to now, detection problems have been addressed mainly by means of (internal) univariate control charts, while quantification problems have been addressed by means of (external) reference materials or methods or by interlaboratory experiments. Here, we explore the possibilities of performing these tasks using an internal multivariate quality control approach, i.e. one based on the analytical results generated by the methods themselves. Let us fix the desired detectable systematic error at the 10% level, and explore different alternatives for detecting the error by means of a simulation study.

First, we selected five "normal" samples that span the Q–T² space. These samples were eliminated and the PCA model was recalculated on the resulting 135 × 25 X-matrix. The eliminated 5 × 25 matrix was used as a validation matrix (X_v). Fig. 3 gives the Q vs. T² control plot showing the distribution of the "normal" set of training samples (+) and the interpolated X_v samples. As expected, these validation samples fall into the "normal" category.

Fig. 3. Q vs. T² plot showing the selected "normal" set of samples (+), from which five samples have been excluded and interpolated into the plot. These five samples are used as a validation set to simulate systematic errors in the analytical methods.

The next step was to introduce a 10% relative error (Er) in a single method (a column vector of X_v). Fig. 4 shows various strategies for trying to detect the simulated error applied to method number 7 (Ca). Fig. 4a shows that the Q–T² control plot was not able to detect the error in any of the five adulterated validation samples. In addition, the residual contribution plot (Fig. 4c) does not permit us to conclude that method number 7 is affected by error with respect to other method residuals. No detection capacity was found either for the univariate control chart (Fig. 4b), based on the mean of the 7th column of X (as centre line, CL) and two standard deviations for the upper and lower control limits (UCL and LCL, respectively). The five validation samples again fall into the UCL–LCL interval. Neither the traditional SPC nor the MSPC approach seems to be sensitive enough to detect the error at the fixed level.

Fig. 4. Control plots used in trying to detect the simulated 10% excess error in method number 7 (Ca) applied to the validation set samples. See text for details. (a) Multivariate Q vs. T² plot, (b) univariate control chart, (c) multivariate residual contribution plot and (d) pseudo-univariate control chart.

In order to find solutions to this problem, we adapted the approach based on PLS-residual calculation. Here, we used as the control variable the relative prediction error (in percentage), Er (%), from a PLS model. The regression was performed using the 7th method data (column vector of X) as the y-block or dependent variable and the rest of the methods (columns) as the X-block or independent variables. The number of latent variables (LVs) used in the PLS model was automatically selected by the algorithm based on cross-validation. The result is a "pseudo-univariate control chart", in which the univariate control variable, Er (%), is calculated by means of a multivariate approach. The Er (%) value of each sample (training and test) was obtained according to:

$$Er\,(\%) = 100 \, \frac{y_{actual} - y_{predicted}}{y_{predicted}} \qquad (3)$$

where all the y-values are rescaled after PLS calculation. We used this equation considering that the actual y-value may be affected by error while the predicted y-value is the "logical" one (from the information of the rest of the methods). In addition, this equation generates the right error sign and, therefore, it should be able to inform about the magnitude and the nature (excess or defect) of the systematic analytical error. Fig. 4d shows the results of this pseudo-univariate control chart corresponding to method number 7, using the same limit criteria as in Fig. 4b. It can be seen that the simulated error was detected in all the validation samples. In addition, the magnitude of the error was about 10% (by excess) in all adulterated samples.

The simulation procedure was repeated for the rest of the methods. Unfortunately, the situation varied depending on the method in which the error was applied. Fig. 5 shows the pseudo-univariate control charts for each of the 25 methods. The results vary from excellent error detection/quantification (methods 2, 7 and 11) to no error detection (last methods). In the case of those variables that cannot be predicted by the others, we can conclude that there is no causal relationship between them. In this situation, no statistical method will give a predictive model. Fig. 6 shows a classification of each of the 25 analytical methods based on their coefficient of variation (CV of the columns of X) and the percentage of explained variance of the PLS model (%EV of the y-block). It can be observed that for those methods located in the lower left part of the plot (low CV and high %EV values) the ability to detect the simulated 10% excess error is adequate and, in some cases, the magnitude of the error can be quantified (i.e. methods 2, 3, 7, 8 and 11). In the rest of the cases, the detection is doubtful (i.e. methods 5, 12) or not possible. It should be noted that 25 PLS loading plots would have to be checked to find information on the relationships between chemically linked parameters, which offers no advantage over the direct use of Fig. 5. In contrast, a single plot like Fig. 6 summarizes a priori the expectations of "good sensitivity to disturbance detection" for each of the methods. When the same simulation procedure was applied to the univariate control chart or the multivariate Q–T² approaches, qualitative error detection was positive only in the case of method number 2. This points to the relative goodness of the pseudo-univariate control chart.
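The pseudo-univariate control chart of Eq. (3) and the a priori screening by CV and %EV can be sketched as follows. The sketch is hedged in one important way: ordinary least squares (multiple linear regression) is used as a stand-in for the cross-validated PLS model of the paper, to keep the example dependent on NumPy only. All function names, the synthetic four-method data set and the in-sample definition of %EV are assumptions.

```python
import numpy as np

def fit_method_model(X_train, col):
    """Regress method `col` on all other methods (least squares with intercept)."""
    others = [j for j in range(X_train.shape[1]) if j != col]
    A = np.column_stack([np.ones(len(X_train)), X_train[:, others]])
    coef, *_ = np.linalg.lstsq(A, X_train[:, col], rcond=None)
    return others, coef

def predict(X, others, coef):
    return np.column_stack([np.ones(len(X)), X[:, others]]) @ coef

def er_percent(X, col, others, coef):
    """Eq. (3): Er(%) = 100 (y_actual - y_predicted) / y_predicted."""
    y_pred = predict(X, others, coef)
    return 100.0 * (X[:, col] - y_pred) / y_pred

def screening_stats(X_train, col, others, coef):
    """Coefficient of variation (%) of a method and %EV of its regression model."""
    y = X_train[:, col]
    resid = y - predict(X_train, others, coef)
    cv = 100.0 * y.std(ddof=1) / y.mean()
    ev = 100.0 * (1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))
    return cv, ev

# Synthetic stand-in data (assumption): method 3 is well predicted by methods 0 and 1,
# method 2 is unrelated to the others.
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 10.0, size=(140, 3))
y = 50.0 + 2.0 * x[:, 0] + 3.0 * x[:, 1] + rng.normal(scale=0.02, size=140)
X = np.column_stack([x, y])
X_train, X_valid = X[:135], X[135:].copy()
X_valid[:, 3] *= 1.10                     # simulated 10% excess error in method 3

others, coef = fit_method_model(X_train, col=3)
er_train = er_percent(X_train, 3, others, coef)
centre = er_train.mean()                  # centre line (CL)
lcl = centre - 2.0 * er_train.std(ddof=1)
ucl = centre + 2.0 * er_train.std(ddof=1)
er_valid = er_percent(X_valid, 3, others, coef)
flagged = er_valid > ucl                  # validation samples above the UCL
cv, ev = screening_stats(X_train, 3, others, coef)
```

In this toy case the adulterated samples give Er values close to 10% because the other methods predict method 3 almost perfectly; for a method with low %EV, such as the unrelated method 2 here, the same chart would not be expected to resolve a 10% error.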
When the simulated error was increased to the 20% level, the positive detection based on the pseudo-univariate approach was extended to methods 4, 10 and 14, following the trend marked by Fig. 6. Fig. 7 (left part) shows how the pseudo-univariate approach based on PLS regression was not able to detect a 20% excess error in X_v applied in method number 9. However, other regression models can be used instead of the PLS model. For instance, for the water quality data set used here, Multiple Linear Regression (MLR) offers similar features to those of PLS in detecting systematic errors. We used different regression algorithms to compare detection sensitivity. One of these methods was the locally weighted regression (LWR) method, in which local regression models are produced using points that are near the sample to be predicted in the independent variable space. This algorithm uses a function proportional to the Mahalanobis distance based on the principal components model of the X-matrix. A modification of this method, based on considering the distance in the dependent variable space [22], was considered. We use the nomenclature LWRxy in this last case. Fig. 7 (right part) shows that the LWRxy model permits the detection of the 20% error in method 9 much better than PLS, although the quantification of the error is still imprecise. This suggests that the pseudo-univariate control chart approach could easily be extended to non-linear situations (using non-linear algorithms).

Fig. 5. Pseudo-univariate control charts for all the methods. In each case, a 10% excess error over the validation set data was applied to the corresponding method, without changing the data of the rest of the methods. Each sub-figure corresponds to a method from 1 to 25 (going from left to right and from top to bottom).

Fig. 6. Classification of the methods based on their training data coefficient of variation (CV) and percentage of explained variance (%EV) obtained upon predicting the method data by means of the PLS model calculated with the data of the rest of the methods.

Fig. 7. Pseudo-univariate control charts for method 9 (Na) when a 20% excess error over the validation set data was applied to this method. The control variable, Er (%), was obtained by means of the PLS (left part) and LWRxy (right part) regression models.

The previous study has focused on the presence of a systematic error located in a single sensor, probably the most common situation in a routine analysis applying Good Laboratory Practices (GLP) conditions. However, the presence of systematic errors in two or more methods simultaneously must be taken into account, since a failure in a sensor can mask other sensor disturbances (algorithms to solve this situation have been proposed [14]). It is important to ascertain how this fact can influence the "alert tool" performance in order to prevent false (positive or negative) detections. We have simulated the presence of a 10% systematic error in two or three methods simultaneously (one of them showing good individual detection performance). For several method combinations, a slight (insufficient) improvement in detection ability was observed in the Q–T² plot. However, the differences between residuals (residual contribution plot) were reduced relative to the previous simulations, which makes it difficult to identify the disturbed method. In contrast, negligible changes were observed in the univariate and pseudo-univariate control charts. This means that for the water analysis data case studied, the Er (%) control variable seems to be robust enough for quality control purposes.

Finally, a certain caution about the data should be kept in mind. If the data are subject to seasonal variations or trends, methods to remove these effects must be applied prior to the application of the MSPC approaches. From visual inspection of the variables, we have not detected any trend pattern in the data set. Means and standard deviations are time-invariant.
This is a logical result since the data used in the present study come from processed samples (chemical treatment, sedimentation, dilution in tanks, etc., are applied to natural water in the treatment plant before its distribution to the network where the analyses were carried out). On the other hand, the autocorrelation function reveals periodicity in variable 25 (temperature). This is an expected result because the evolution of this parameter is much less "altered/filtered" in the treatment plant than that of the rest of the physicochemical parameters. The importance of variable 25 in predicting those methods showing "good sensitivity to disturbance detection" was negligible. Excluding variable 25 from the study revealed that the temperature did not affect the multivariate quality control of the rest of the parameters.

4. Conclusions

A laboratory that does routine multiparametric analysis can be likened to a process system and, therefore, can be subjected to multivariate statistical process control. The Q–T² control plot can be used to split real (non-reference) samples into calibration and test sets, selecting a normal training sample set (by setting aside extreme samples). In addition, it also serves for rapid detection of anomalous test samples. A large Q/T² ratio value increases the probability of having a systematic error in a single method, which can be identified by means of the residual contribution plot. However, this multivariate approach (as well as the traditional univariate control chart) could be insensitive to systematic errors at the 10% level (which are of analytical interest). The use of the relative prediction error, Er (%), as a control variable could improve the error detection task, and can also offer quantitative information on the magnitude of the error. Interpretation of pseudo-univariate control charts is easy because they use the same format as the traditional univariate control chart. In addition, Er can be obtained from a variety of multivariate regression models, which in some cases could improve its versatility, robustness or sensitivity with respect to error detection. Finally, the capacity to detect/quantify error in a single method can be checked a priori by examining its CV and %EV parameters. Therefore, it is possible to predefine the methods for which the control program can be applied. All these tools together (univariate, multivariate and pseudo-univariate) can serve as an immediate alarm system so that the problem can be investigated by a specialist. The control plots can be easily automated and implemented as an internal Multivariate Analytical Quality Control (iMAQC) program in the laboratory with minimum effort. The iMAQC program can be used as a complement to the external validation program. The real possibility of checking a possible bias associated with the analytical process (staff, instruments, methods, reagents, calibration, etc.) should be included in the cost–benefit analysis of the problem under study. Regarding the case study selected here, two main factors should be considered: (i) the complete analysis cost (60 parameters) of each water sample is approximately $1300; (ii) not detecting a method error (i.e. during inter-validation periods with external references) would have unpredictable consequences on health. On the other hand, the iMAQC program should imply a one-off cost related to the implementation of the algorithms, which could result in a favourable cost–benefit analysis. Future studies using this multivariate approach in other real situations (preferably using a priori known 'in-control' and 'out-of-control' samples) will help both to explore the statistical benefits, limitations and alternatives of this approach in relation to the QC information generated, and to promote its inclusion in the Quality Assurance scheme of routine analysis laboratories.

References

[1] European Council Directive 98/83/CEE.
[2] Analytical Methods Committee, R. Soc. Chem., Anal. 120 (1995) 29–34.
[3] E. Mullins, Analyst 119 (1994) 369–375.
[4] J.F. MacGregor, Int. Stat. Rev. 65 (1997) 309–323.


[5] M. Caselli, A. De Giglio, A. Mangone, A. Traini, J. Sci. Food Agric. 76 (1998) 533–536.
[6] P. Barbieri, G. Adami, A. Favretto, E. Reisenhofer, Fresenius' J. Anal. Chem. 361 (1998) 349–352.
[7] N. Kannan, N. Yamashita, G. Petrick, J.C. Duinker, Environ. Sci. Technol. 37 (1998) 1747–1753.
[8] G.S. Chen, K.W. Schramm, C. Klimm, Y. Xu, Y.Y. Zhang, A. Kettrup, Fresenius' J. Anal. Chem. 359 (1997) 280–284.
[9] J.M. Andrade, D. Prada, E. Alonso, P. López, S. Muniategui, P. de la Fuente, M.A. Quijano, Anal. Chim. Acta 292 (1994) 253–261.
[10] K.J. James, M.A. Stack, Fresenius' J. Anal. Chem. 358 (1997) 833–837.
[11] E. Engstrom, B. Karlberg, J. Chemom. 10 (1996) 509–520.
[12] T. Kourti, J.F. MacGregor, Chemom. Intell. Lab. Syst. 28 (1995) 3–21.
[13] B.M. Wise, B.R. Kowalski, in: F. McLennan, B.R. Kowalski (Eds.), Process Analytical Chemistry, Chapman & Hall, London, 1995, chap. 8.
[14] C.L. Stork, D.J. Veltkamp, B.R. Kowalski, Anal. Chem. 69 (1997) 5031–5036.
[15] A. Rius, M.P. Callao, F.X. Rius, Analyst 122 (1997) 737–741.
[16] J.B. Marzo, M.J. Medina-Hernández, S. Sagrado, E. Bonet, R. Gimenes, J. Chemom. 12 (1998) 323–336.
[17] M.A.H. Franson, APHA-AWWA-WEF (Eds.), Standard Methods for the Examination of Water and Wastewater, 20th edn., NW, Washington, 1998.
[18] R. Todeschini, Anal. Chim. Acta 348 (1997) 419–430.
[19] M.M. Morales, P. Martín, A. Llopis, L. Campos, S. Sagrado, Anal. Chim. Acta 394 (1999) 109–117.
[20] B.M. Wise, N.L. Ricker, D.J. Veltkamp, B.R. Kowalski, Process Control Qual. 1 (1990) 41–51.
[21] B.M. Wise, N.L. Ricker, D.J. Veltkamp, AIChE 1989 Annual Meeting, November, 1989.
[22] Z. Wang, T. Isaksson, B.R. Kowalski, Anal. Chem. 66 (1994) 249–260.