ILS : An R package for statistical analysis in Interlaboratory Studies

Accepted Manuscript ILS: An R package for statistical analysis in Interlaboratory Studies Miguel Flores, Rubén Fernández-Casal, Salvador Naya, Javier ...

Download PDF

3MB Sizes 0 Downloads 118 Views

Report

PDF Reader
Full Text

Accepted Manuscript ILS: An R package for statistical analysis in Interlaboratory Studies Miguel Flores, Rubén Fernández-Casal, Salvador Naya, Javier Tarrío-Saavedra, Roberto Bossano PII:

S0169-7439(18)30249-1

DOI:

10.1016/j.chemolab.2018.07.013

Reference:

CHEMOM 3660

To appear in:

Chemometrics and Intelligent Laboratory Systems

Received Date: 24 April 2018 Revised Date:

27 July 2018

Accepted Date: 30 July 2018

Please cite this article as: M. Flores, Rubé. Fernández-Casal, S. Naya, J. Tarrío-Saavedra, R. Bossano, ILS: An R package for statistical analysis in Interlaboratory Studies, Chemometrics and Intelligent Laboratory Systems (2018), doi: 10.1016/j.chemolab.2018.07.013. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

SC M AN U TE D

60

70

Lab 1 Lab 2 Lab 3 Lab 4 Lab 5 Lab 6 Lab 7

AC C

EP

50 40

Mass (%)

80

90

RI PT

100

TG curves of calcium oxalate

200

400 Temperature (C)

600

800

ACCEPTED MANUSCRIPT

RI PT

ILS: An R package for statistical analysis in Interlaboratory Studies Miguel Floresa , Rub´en Fern´andez-Casalc , Salvador Nayab , Javier Tarr´ıo-Saavedrab , Roberto Bossanod

a Escuela Polit´ ecnica Nacional. Quito, Ecuador MODES, CITIC, ITMATI, Departamento de Matem´ aticas, Escola Polit´ ecnica Superior, Universidade da Coru˜ na. Ferrol, Spain c Grupo MODES, CITIC, ITMATI, Facultade de Inform´ atica, Universidade da Coru˜ na. A Coru˜ na, Spain d Universidad de las Fuerzas Armadas ESPE, Quito, Ecuador

M AN U

SC

b Grupo

Abstract

In this paper we present an R package with routines to perform Interlaboratory Studies (ILS). The aim of the ILS package is to detect laboratories that provide not consistent results, working simultaneously with different test materials, from the perspective of the Univariate Data Analysis and the Functional

D

Data Analysis (FDA).

The ILS package estimates the Mandel’s h and k scalar statistics, based

TE

on the ASTM E691 and ISO 5725-2 standards, to identify laboratories that provide significantly different results. Cochran and Grubbs tests to evaluate the presence of outliers are also available. In addition, Analysis of Variance

EP

(ANOVA) techniques are provided, both for the cases of fixed and random effects, including confidence intervals for the parameters.

AC C

One of the novelties of this package is the incorporation of tools to perform an ILS from a functional data analysis approach. Accordingly, the functional nature of the data obtained by experimental techniques corresponding to analytical chemistry, applied physics and engineering applications (spectra, thermograms, and sensor signals, among others) is taking into account by implementing the functional extensions of Mandel’s h and k statistics. For this purpose, the ∗ Corresponding

author. Phone: +34 981167000 (ext. 3210). e-mail:[email protected]

Preprint submitted to Chemometrics and Intelligent Laboratory Systems

July 31, 2018

ACCEPTED MANUSCRIPT

ILS package also estimates the functional statistics H(t) and K(t), as well as the dH and dK test statistic, which are used to evaluate the repeatability and

using a bootstrap algorithm.

RI PT

reproducibility hypotheses where the critical ch and ck values are estimated by

Keywords: Interlaboratory studies, Functional data analysis, Outlier

SC

detection, Bootstrap, Data depth, R software

1. Introduction

An Interlaboratory Study (ILS) can be defined as a control procedure to

M AN U

evaluate the performance of a group of laboratories through a collaborative trial [1, 2]. In an Interlaboratory Study, an adequate number of laboratories are chosen to participate in the experiment with the aim of analysing the samples and obtain results.

Participating laboratories receive samples (previously homogenized or to be homogenized by the laboratories) for analysis, then, the measurements results of

D

the laboratories are evaluated according to the degree of data variability. Some of the most common factors that may be a cause of variability are: the equipment

TE

of laboratories, operators, materials, temperature and humidity, among others. Several scalar statistical techniques are frequently applied to study the consistency of test results from the different laboratories that participate in an ILS.

EP

Standard ASTM E-691 (Standard Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method) recommends applying only

AC C

one graphical technique from Mandel’s k and h statistics [2], while ISO 5725-2 (Accuracy – trueness and precision – of measurement methods and results) recommends, in addition to the graphic technique, to use the Cochran and Grubbs tests [1].

Additionally, through Analysis of Variance (ANOVA), the effect of the labo-

ratory factor over the response can be studied. The variance of repeatability and reproducibility can be also estimated when an ANOVA random effects model is considered, as shown in ISO 5725-2 [1]. On the other hand, if a fixed effect

2

ACCEPTED MANUSCRIPT

model is fitted, in addition to the F test, multiple comparisons of means can be performed with the Tukey Honest Significant Difference (HSD) method.

RI PT

To perform consistency tests for the repeatability and reproducibility hy-

potheses, as well as for the detection of outliers, the values of the statistics should be compared with their corresponding critical values. If these are greater, in-

consistency is detected in the results of laboratories. ISO 5725-2 provides some critical values depending on the number of laboratories p, number of measure-

SC

ments n and level of significance α.

At present, both ISO 5725 and ASTM-E691 do not provide a methodol-

M AN U

ogy for performing an ILS when data are functional, this is, in the case where the test results are curves (functional data). Functional Data Analysis (FDA) is a relatively new branch of statistics that takes curves as unit of analysis, also surfaces, and volumes defined in a continuum (such as time, or frequency’s domain). Considering the recent advances in computing science, and the increasing amount of data generated by experimental techniques and sensors, the FDA has had a great development in recent years. In fact, we have many sta-

D

tistical methodologies that have been developed and extended to the functional

TE

case, such as exploratory analysis, regression, classification, analysis of variance, and time series [3–5]. In the specific case of ILS, FDA extensions for Mandels’s h and k have been proposed and described by Flores et al. [6], in addition to

EP

other works where the FDA descriptive analysis had been introduced for ILS studies [7].

The aim of the ILS package [8] is, on the one hand, to facilitate the use of new

AC C

tools in the FDA context and, on the other hand, to provide a comprehensive set of the more used univariate outlier tests for ILS with scalar response. It is important to note that FDA techniques for Interlaboratory Studies are based on the proposals of Naya et al. [7], Flores et al. [9] and, above all, Flores et al. [6], whereby new functional extensions of h and k statistics are introduced for identifying non consistent laboratories. The functions that have been implemented in the ILS package can determine the Mandel’s h and k statistics both in a graphical and analytical way, using a functional approach. These test statistics have 3

ACCEPTED MANUSCRIPT

also been implemented to facilitate the implementation of Repeatability and Reproducibility studies (r&R) when the data are functional. In addition, the

RI PT

ILS package apply the methods suggested by the norms ASTM E-691 and ISO 5725-2 for the scalar case. The ILS package is available on the Comprehensive R Archive Network at http://CRAN.R-project.org/package=ILS.

The present ILS library implements and calls some of their routines in order

to perform outlier detection in the framework of the Interlaboratory Studies.

SC

Thus, regarding to ILS with scalar response, there are some interesting and useful computational tools in R software. Namely, the metRology package esti-

M AN U

mates the uncertainty of the measurement, and performs the required statistical calculations for Interlaboratory Studies [10], whereas multcomp performs analysis of variance (ANOVA) through F and Tuckey tests [11]. On the other hand, due to the exponential increasing of FDA available techniques, there are also a continuously increasing number of R libraries devoted to this branch of statistics. Among all of them, the most important and used packages (on which the present proposal is based) are fda.usc [12], that implements outlier detection

D

techniques and functional ANOVA, among other tools for FDA, and the fda

TE

[13]. The present ILS package uses the applications of the before mentioned multcomp and fda.usc packages.

This work is organized as follows. In Section 2, we describe examples of

EP

Interlaboratory Studies defined by four sets of experimental data. Then, the ILS package is used to summarize two of these sets. In Section 3, the functionality of the ILS package is illustrated through a standard ILS procedure using the

AC C

Glucose dataset. Further, in Section 4, the TG and DSC datasets (composed by thermogravimetric and colorimetric curves of calcium oxalate, respectively) are used to show the ILS package utilities when experimental data are curves (functional). Finally, the principal remarks of this study are summarized in the conclusion section.

4

ACCEPTED MANUSCRIPT

2. Examples of Interlaboratory Studies

RI PT

An Interlaboratory Study evaluates the analytical methods performed by laboratories, either for the evaluation of the efficiency of the laboratories involved, or for the performance of an experimental procedure, or for the valida-

tion of a standard guideline. For example, to show the application of consistency test, the ILS, package contains the Glucose dataset, avalaible on ASTM E-691

SC

[2] that corresponds to the results of a clinical test. Likewise, from a study of the properties of the calcium oxolate material, three datasets (IDT, TG, DSC)

were obtained, and incorporated into the package. These latter datasets have

M AN U

been extensively described in Naya et al. [7] and Flores et al. [6]. 2.1. Clinical study of blood glucose measurement

The Glucose dataset corresponds to the serum glucose test (measurements of the concentration of glucose in the blood used to control the diabetes). In the study, eight laboratories where involved, and five different tests were performed on blood samples labelled with different references, ranging from a low sugar

D

content to a very high one. Three replicates were obtained for each sample.

TE

Each of these laboratories measured five different concentration levels (A, B, C, D, E) of a given material, and at each of these levels, three measurements were taken (3 replicates). Each laboratory provided a total of 15 measurements

EP

(3 for each level), therefore, with 8 laboratories involved, 120 measurements were obtained.

AC C

In order to access this dataset, the ILS package installing and loading is required. Once loading is performed, the Glucose data.frame object is called using the following instructions. R> library("ILS") R> data("Glucose") The first step to perform an analysis with the ILS package consist on using the function ils.qcdata() (quality control data) that receives a data.frame

5

ACCEPTED MANUSCRIPT

as an argument. By default, the first column of the data frame must contain the response variable, the second column the replicates, the third column corre-

RI PT

sponds to the tested material, while the fourth column includes the laboratories where the procedure was performed.

For instance, the following code creates an object qcdata (of class ils.qcdata) and uses the summary() method to obtain descriptive statistics information of

SC

the dataset. R> qcdata <- ils.qcdata(Glucose)

x Min.

replicate material

: 39.02

1:40

1st Qu.: 78.45

2:40

Median :135.03

3:40

Mean

:149.09

Lab1

:15

B:24

Lab2

:15

C:24

Lab3

:15

D:24

Lab4

:15

E:24

Lab5

:15

Lab6

:15

(Other):30

TE

:309.40

laboratory

A:24

D

3rd Qu.:196.66 Max.

M AN U

R> summary(qcdata)

R> plot(qcdata, ylab = "Laboratory", xlab = "Glucose concentration in blood")

EP

Figure 1 shows the obtained values for each laboratory and material. It can be noted that the blood glucose level increases from material A to D and there is more variability between the results for each laboratory from material C to

AC C

material E.

In order to calculate the graphical and analytical statistics for the scalar

(univariate) case, first, the function lab.qcs() (quality control statistics) has to

be used. This function returns the estimates of the statistical required measures (mean, variance, etc.) for computing the Mandel’s h and k statistics, as well as the required measures to perform the Cochran and Grubbs tests. In the following commands, the lab.qcs() function uses the qcdata object to create the qcstat object that estimates both the mean and the global de6

ACCEPTED MANUSCRIPT

RI PT

Lab8

Lab7

Laboratory

Lab6

A B C D E

Lab5

Lab4

SC

Lab3

Lab2

50

M AN U

Lab1

100

150

200

250

300

Glucose concentration in blood

Figure 1: Measurements of glucose concentration in blood used to control diabetes.

viation from the results of all laboratories and all materials. The repeatability deviation (Sr ), the deviation between the means of laboratories (SB ), and the

D

reproducibility deviation (SR ) for each material are also estimated (following ASTM E-691 [2]). More information about definitions and calculation methods

TE

can be obtained in [1, 2].

R> qcstat <- ils.qcs(qcdata)

EP

R> summary(qcstat)

Number of laboratories:

AC C

Number of materials:

8

5

Number of replicates:

3

Laboratory summaries (means): Lab1

Lab2

Lab3

Lab4

Lab5

Lab6

Lab7

Lab8

A

41.28

41.44

41.45

41.46

41.46

42.02

40.46

42.58

B

78.32

79.23

79.90

80.96

78.69

79.89

79.52

80.35

C 133.20 135.41 134.59 140.83 133.27 136.62 132.49 134.71 D 193.65 195.11 192.09 197.21 193.05 197.24 191.26 198.12 7

ACCEPTED MANUSCRIPT

Laboratory summaries (std. dev.): Lab1

Lab2

Lab3

Lab4

Lab5

Lab6

Lab7

RI PT

E 293.25 298.92 292.67 295.82 293.56 294.96 290.14 296.62

Lab8

A 0.2230 0.4851 1.0608 1.8118 0.3667 1.408 1.248 0.8225 B 0.1582 1.3269 0.8303 2.7661 0.7754 1.637 2.060 0.5065

SC

C 0.5910 2.1680 1.7288 6.6200 1.1987 1.287 2.124 1.0344 D 0.0600 4.6824 1.5932 1.9366 1.8826 1.650 3.818 2.4638

M AN U

E 0.7267 9.1869 2.7101 0.8836 0.9544 4.034 3.304 1.6479

Material summaries: mean

S

S_r

S_B

S_R

A

41.52 0.5543 1.063 0.6061 1.059

B

79.61 0.8665 1.496 0.8627 1.495

C 135.14 1.9071 2.751 2.6567 3.479 D 194.72 1.4263 2.625 2.5950 3.366

D

E 294.49 2.8068 3.935 2.6931 4.192

TE

R> plot(qcstat, xlab = "Deviations", ylab = "Material") In Figure 2, the values of S (the global deviation of all laboratories), Sr

EP

(the repeatability’s deviation), SR (reproducibility’s deviation) and SB (the deviation between the means of the laboratories) are shown for each material. A greater variability can be noted from material C to material E. Materials C

AC C

and D have a greater variability between the results of the laboratories (SR )

and within them (Sr ). 2.2. Characterization of materials by thermogravimetric analysis In [7], 7 (emulated) laboratories analysed 15 samples of calcium oxalate by

Thermogravimetric (TG) techniques, obtaining 105 TG curves that shows the mass loss of oxalate as a function of temperature when the oxalate samples were

8

ACCEPTED MANUSCRIPT

RI PT

E

Material

D

S S_r S_B S_R

C

SC

B

M AN U

A

1

2

3

4

Deviations

Figure 2:

Measures of variability for each material obtained from the results of all the

laboratories.

heated at 20◦ C/min. The TG dataset contains the resulting values at 1000 dis-

D

cretization points, corresponding to temperatures ranging from 40◦ C to 850◦ C. Laboratories 1, 6, and 7 presented non-consistent results. In laboratory 1 a Si-

TE

multaneous Thermal Analyzer (STA) was used with an out of phase calibration program. In laboratory 6, we used a simultaneous SDT analyser with an old calibration, and finally, in laboratory 7, we used a simultaneous SDT analyser

EP

with a bias in the the temperature calibration with respect to the real values (2◦ C displaced with respect to the melting point of the zinc).

AC C

From the TG curves, a second set of data called IDT was obtained. The IDT (Initial Decomposition Temperature) is a parameter defined by the temperature at which the studied material losses the 5% of its weight when it is heated at a constant rate. The dataset is composed of the IDT values of the calcium oxalate samples analyzed by the 7 laboratories. It is an example of ILS study with scalar response, obtained by extracting just only one representative feature from the TG curve. It is important to stress that when a feature extraction process is performed, there is the risk of loosing relevant information and thus obtaining

9

ACCEPTED MANUSCRIPT

erroneous findings. In addition, excluding laboratory 1, 15 samples of Calcium Oxalate were

RI PT

analysed by Differential Scanning Calorimetry (DSC) thermal technique by the 2-7 laboratories. The resulting DSC curves, obtaining by using a SDT instru-

ment, accounts for the difference of energy between a reference and the oxalate sample. We can observe the exchange of energy between the sample and the

reference as a function of temperature when the latter vary as a linear function

SC

of the time defined by a slope of 20◦ C/min. The DSC dataset contains the 90

DSC curves discretized at 1000 temperature values ranging from 40◦ C to 850◦ C.

M AN U

For the estimation of the functional statistics (for the performance of the graphical and analytical methods), the procedure is the same as for the scalar case. The ILS package provides the ils.fqcdata() function to generate the functional quality control data from a matrix (or a data.frame), in which each row represents a test result. The dimension of the matrix must be n×m, where n is the number of replicates performed by the laboratories that participate in the study, and m is the number of points observed in each curve. The usual methods

D

were implemented to make plots and summaries of the resulting objects.

TE

Then, the ils.fqcs() function (functional quality control statistics) is needed for the estimation of some important statistical functional measures, namely functional mean, functional variance, etc. These are necessary for the estima-

EP

tion of the H(t), K(t), dH and dK statistics, that will be defined in the next sections.

To create an object that contains the functional quality control data from

AC C

the TG dataset, first we defined the grid in which the observations were obtained. In this case, the grid consists of 1000 temperature values ranging from 40◦ C to

850◦ C.

R> data(TG)

R> delta <- seq(from = 40, to = 850, length.out = 1000) R> fqcdata <- ils.fqcdata(TG, p = 7, argvals = delta) R> plot(x = fqcdata, main = "TG curves of calcium oxalate",

10

ACCEPTED MANUSCRIPT

xlab = "Temperature (C)", ylab = "Mass (%)",

+

legend = TRUE, x.co = 20, y.co = 90)

90

100

TG curves of calcium oxalate

SC

70

200

M AN U

60 40

50

Mass (%)

80

Lab 1 Lab 2 Lab 3 Lab 4 Lab 5 Lab 6 Lab 7

RI PT

+

400

600

800

Temperature (C)

D

Figure 3: TG curves obtained from calcium oxalate.

In Figure 3, the TG curves are presented. The following commands create

afterwards.

TE

a fqcstat object containing the functional statistics from fqcdata, and plot it

EP

R> fqcstat <- ils.fqcs(fqcdata) R> plot(fqcstat, xlab = "Temperature (C)", ylab = "Mass (%)")

AC C

The result is shown in Figure 4. The plot() method creates a panel with four graphs, in the first row the functional means and variances for each laboratory are presented, while in the second row the global functional mean and variance are plotted.

3. Interlaboratory Studies: Standard Approach The ILS package provides two groups of functions made to detect outlying

individual results (outlying replicates) and outlying laboratories: both for the 11

ACCEPTED MANUSCRIPT

0.0

40

400

600

800

200

Temperature (C)

400

600

800

SC

200

Temperature (C)

Global Functional Variance

0.20 0.10 0.00

40

50

60

70

Mass (%)

M AN U

80

90 100

0.30

Global Functional Mean

Mass (%)

Lab 1 Lab 2 Lab 3 Lab 4 Lab 5 Lab 6 Lab 7

RI PT

0.6 0.2

0.4

Lab 1 Lab 2 Lab 3 Lab 4 Lab 5 Lab 6 Lab 7

Mass (%)

80 70 60

Functional Variance by Laboratory

50

Mass (%)

90 100

Functional Mean by Laboratory

200

400

600

200

400

600

800

Temperature (C)

Figure 4:

TE

D

Temperature (C)

800

Functional descriptive statistics: functional means and functional variances for

each laboratory as well as the overall functional mean and the overall functional variance

EP

corresponding to the TG curves obtained from calcium oxalate.

scalar and the functional cases (Table 1). The ILS package offers graphical and

AC C

analytical procedures (statistical hypothesis test) for this purpose. As above mentioned, among the methodologies used to evaluate the consis-

tency of laboratory results, we must highlight the r&R studies, which quantify the variability between laboratories (reproducibility) and variability between results (repeatability). The repeatability is the variability between the results of the independent tests obtained for each individual laboratory. That is, it is the variability produced by the measurement system since the results are obtained by a single operator in each laboratory and in a short interval of time. On

12

ACCEPTED MANUSCRIPT

Table 1: Functions incorporated to the ILS package to perform Interlaboratory Studies from a scalar or functional approach.

Technique

Function

Description

Scalar

Plot

h.qcs, k.qcs

Mandel’s h and k statistics

Test

cochran.test

Cochran’s test

grubbs.test

Grubbs’ test

ils.aov

ANOVA

Plot

h.fqcs, k.fqcs

Mandel’s H(t) and K(t) functional statistics

Test

mandel.fqcs

Mandel’s functional test

SC

Functional

RI PT

Approach

the other hand, the reproducibility refers to the variability between the results

M AN U

of individual tests obtained in different laboratories, allowing to determine the bias.

Accordingly with the repeatability and reproducibility concepts, Mandel’s h and k statistics are used in ILS to detect laboratories that provide inconsistent results. The h statistic explains the variability between the laboratories, that is, estimates the bias, which is the difference of the means of each laboratory with

D

respect to the global mean, while the k statistic estimates the variability within the laboratories, comparing the repeatability corresponding to each laboratory.

TE

The decision rule for detecting whether a laboratory is inconsistent is based on the comparison between the value of the h or k statistic and the critical value,

EP

which is normally calculated with a significance level of 0.5% (as recommended by ASTM E-691 [2]).

On the other hand, the ILS package implements the Cochran test to examine

AC C

the consistency within a laboratory, whereas the Grubbs test is commonly used to examine consistency between laboratories. The Grubbs test can also be used as a consistency test for the results obtained in a laboratory using identical materials. These tests are recommended by ISO 5725-2 [1].

13

ACCEPTED MANUSCRIPT

3.1. Consistency tests The basic statistical model proposed in ISO 5725-2 that estimates the accu-

RI PT

racy and precision of an analytical method is: X = µ + B + ,

2 whereby µ is the global mean for the material under analysis, B ∼ N (0, σL ) is

SC

the laboratory bias component under repeatability conditions, and ∼ N (0, σr2 )

accounts for the random error occurring in each measure under repeatability 2 is given by: conditions. The reproducibility variance σR

M AN U

2 2 σR = σL + σr2 .

In Table 2, the one-way ANOVA approach statistics are shown, where xlj is the j-th result of laboratory l, x ¯l is the mean of the nl results of laboratory l, x ¯

D

is the global mean of the results of the p laboratories, Pp n2 1 n ¯= N − l=1 l , p−1 N Pp 2 2 and N = l=1 nl . The repeatability variance σr is estimated by Sr , which

TE

is the within-laboratory variance. On the other hand, the between-laboratory 2 variance σL , related to laboratory bias, is estimated by SL2 :

EP

SL2 =

M SL − Sr2 , n ¯

AC C

when M SL ≥ Sr2 (SL2 = 0 otherwise). 2. Table 2: ANOVA approach for the estimation of σr2 and σL

Source

Laboratory

Residual (repeatibility)

Mean squares Pp nl (¯ xl −¯ x)2 M SL = l=1 p−1 Pp Pnl (xlj −¯ xl )2 2 Sr = l=1 Nj=1 −p

Estimate of 2 ¯ σL σr2 + n

σr2

Using the ILS package, one-way ANOVA analysis and mean comparison test can be performed. However, laboratories that present non-consistent results

14

ACCEPTED MANUSCRIPT

should be excluded from the ILS in advance [1]. Accordingly, consistency tests and identification of atypical results must be performed previously to ANOVA

RI PT

analysis. There are two possible scenarios in which the presence of outliers can be evaluated: the first is that the results of one laboratory deviates from the others in terms of precision, that is, when the measurements made by a

laboratory significantly differ with respect to the measurements obtained by

other laboratories. The second scenario is related with the identification of

SC

outliers in a laboratory for a certain level. The statistics and tests recommended by ISO 5725-2 and ASTM E-691 are described below.

M AN U

3.1.1. Mandel’s h statistic and Grubbs test

Let {x1 , x2 , . . . , xp } be a sample of p observations. The xl , l = 1, . . . , p, are modelled as realizations of random variables Xl , l = 1, . . . , p, being identically and independently distributed according to the normal distribution N (µ, σ 2 ). We denote:

¯= X

l=1

Xl

p

,

D

as the mean of the Xl , and:

Pp

p

1 X ¯ 2, (Xl − X) p−1

TE

S2 =

l=1

as the sample variance of the Xl .

EP

Mandel’s h statistic [14] is defined by: Hl =

¯ Xl − X , l = 1, . . . , p, S

AC C

which has the same distribution for all l = 1, . . . , p. The critical value is [15]: (p − 1)tp−2;1− α2 hp;1− α2 = q , p(t2p−2;1− α + p − 2) 2

whereby tp−2;1− α2 is the (1 − α/2) quantile of the t distribution with v = p − 2 degrees of freedom. For the case of p laboratories that obtain n replicates each one, the h statistic is obtained as: hl =

x ¯l − x ¯ , l = 1, . . . , p, SB 15

ACCEPTED MANUSCRIPT

whereby SB =

qP p

l=1

(¯ xl −¯ x)2 p−1 ,

x ¯l is the mean of the n results of laboratory

l, and x ¯ is the global mean of the results of the p laboratories. A laboratory

RI PT

is detected as inconsistent when the observed value hl of the statistic is larger than the critical value, i.e. when hl ≥ hp;1− α2 .

On the other hand, if we want to determine if the observation Xmax =

max {X1 , . . . , Xp } is an outlier, the Grubbs test [16] could be used. The statistic

SC

corresponding to this test is defined by the following expression:

¯ ¯ Xl − X Xmax − X = . l=1,...,p S S

Gmax = max Hl = max l=1,...,p

an outlier, the test statistic is:

M AN U

If we want to determine if the smallest observation Xmin = min {X1 , . . . , Xp } is

¯ − Xl ¯ − Xmin X X = . l=1,...,p S S

Gmin = max {−Hl } = max l=1,...,p

The critical value for this test can be approximated [15] by:

D

(p − 1)tp−2;1− αp gp;1−α ≈ r . 2 p p − 2 + tp−2;1− α p

TE

For the special case where there are n replicates obtained by each one of the p laboratories, the Grubbs’ statistics are defined analogously. The observations must be replaced by the means of the results corresponding to each labora-

EP

tory, whereas the mean of the observations is also replaced by the global mean obtained as the average of the laboratory means. If a laboratory is identified as an outlier, after applying the h statistic and

AC C

the Grubbs test to different levels within a laboratory, this is an evidence of the presence of a laboratory high bias (due to a high systematic error in calibration, or errors in the equations used to compute the results). 3.1.2. Mandel’s k statistic and Cochran test Let us assume that the observed values xlj , l = 1, . . . , p, j = 1, . . . , n, are realizations of random variables Xlj identically and independently distributed according to a normal distribution N (µj , σ 2 ) for each replicate j and laboratory 16

ACCEPTED MANUSCRIPT

l, and denote by Sl2 the sample variance of the replicates of laboratory l for a given material: n X ¯ l )2 (Xlj − X = . n−1 j=1

RI PT

Sl2

Then (n − 1)Sl2 /σ 2 , l = 1, . . . , p, follow a χ2v distribution with v = n − 1 degrees of freedom. Mandel’s k statistic [14] is defined by: Sl , l = 1, . . . , p, Sr

where

Pp

l=1

=

p

Sl2

,

M AN U

Sr2

SC

Kl =

with the same distribution for all l = 1, . . . , p. The critical value [15] is: s p kp,n;1−α = , 1 + Fvp−1 ,v ;α 1

2

where Fv1 ,v2 ;α is the α-quantile of the F distribution with v1 = (p − 1)(n − 1) and v2 = n − 1 degrees of freedom. A laboratory is detected as inconsistent

thas is, kl ≥ kp,n;1−α .

D

when the observed value kl of the statistic is greater than the critical value,

TE

2 = max S12 , S22 , . . . , Sp2 On the other hand, to determine if the highest variance Smax is an outlier, we used the Cochran test:

EP

1 S2 S2 C = Ppmax 2 = max l2 . p l=1,...,p Sr l=1 Sl

The critical value of this test is approximated [15] by:

AC C

cp,n;1−α ≈

1 . 1 + (p − 1)Fv1 ,v2 ; αp

The Cochran test only evaluates the highest value in a series of variances. If

a laboratory is detected as an outlier, using the k statistic or with the Cochran

test, this indicates that the variance within the laboratory is high (due to lack of familiarity with the analytical method, differences of appreciation among operators, inadequate equipment, equipment in poor state, or careless execution), in which case the total of results collected by this laboratory should be rejected and taken out of the study. 17

ACCEPTED MANUSCRIPT

The detection of inconsistent laboratories must be repeated until it stops reporting outliers. However, the consistency tests should be used with cau-

RI PT

tion, because if this process is carried out in excess, could lead to false outlier identification. 3.2. ILS: Glucose study

In this section, we will use the qcdata and qcstat objects created in sub-

SC

section 2.1 from the Glucose dataset. First, an analysis of the variability for each laboratory will be performed. For this purpose, the k statistic (k.qcs())

M AN U

and the Cochran test (cochran.test()) will be used to identify if there is any laboratory with non-consistent results. Subsequently, the h statistic (h.qcs()) and the Grubbs test (grubbs.test()) will be used to perform an analysis to evaluate inter-laboratory variability.

The following code creates an object containing the scalar Mandel’s k statistic for each laboratory and material, which is subsequently plotted (see Figure 5).

D

R> k <- k.qcs(qcdata, alpha = 0.005)

TE

R> plot(k)

In Figure 5, the dotted line represents the critical value equal to 2.06, ob-

EP

tained for p = 8, n = 15 and α = 0.005 (following the ASTM standard). Hence, outliers were detected for the laboratory 2, when testing the material 5, and for material 3, when is tested by the laboratory 4, since the corresponding values

AC C

of the k statistics were greater than the critical value. The k.qcs() function returns a list (of class k.qcs) containing the following

components:

• k: The k statistic for each laboratory and material. • k.critical: The critical value (corresponding to the given significance level, specified in parameter alpha).

18

ACCEPTED MANUSCRIPT

RI PT

A B C D E

SC

1.5 0.0

M AN U

0.5

1.0

k

2.0

2.5

3.0

Glucose

Lab1

Lab2

Lab3

Lab4

Lab5

Lab6

Lab7

Lab8

Laboratory

Figure 5:

Values of Mandel’s k, classified by laboratory and material, corresponding to

glucose measurements in blood, and available in the Glucose database.

D

• violations: A logical matrix containing the results of checking if the

material.

TE

observed k value is less than the critical value for each laboratory and

EP

The summary() method prints the violations matrix. If an entry of this matrix is FALSE, the laboratory reports outlying results for the corresponding material

AC C

at the specified significance level. R> summary(k)

Number of laboratories: Number of materials:

5

Number of replicates: Significance level: Critical value:

8

3

0.005

2.06084

Inside control limits: 19

B

C

D

E

Lab1 TRUE TRUE

TRUE TRUE

TRUE

Lab2 TRUE TRUE

TRUE TRUE FALSE

Lab3 TRUE TRUE

TRUE TRUE

TRUE

Lab4 TRUE TRUE FALSE TRUE

TRUE

Lab5 TRUE TRUE

TRUE TRUE

TRUE

Lab6 TRUE TRUE

TRUE TRUE

TRUE

Lab7 TRUE TRUE

TRUE TRUE

TRUE

Lab8 TRUE TRUE

TRUE TRUE

TRUE

SC

A

RI PT

ACCEPTED MANUSCRIPT

M AN U

The Cochran’s test can be performed with the cochran.test() function. In this case study, with the maximum variance for each material, no laboratory was considered inconsistent, since the critical value was 0.52 and the p-values for each material exceeded the 5% significance level. R> cochran.test(qcdata)

D

Cochran's Test

TE

Critical value: 0.5157 Material Smax

C p.value

A Lab4 0.20034

1

2

B Lab4 0.15448

1

C Lab4 0.10935

1

D Lab2 0.08494

1

E Lab2 0.07416

1

3 4

AC C

5

EP

1

We proceeded to use the methods h.qcs() and plot() to estimate and plot

the h statistics for each laboratory and material. The critical value was 2.15, therefore, from this result it can be seen in Figure 6 that laboratories 4, 7 and

8 presented non-consistent results at a significance level of α = 0.005. R> h <- h.qcs(qcdata, alpha = 0.005) R> plot(h) 20

ACCEPTED MANUSCRIPT

R> summary(h)

Number of materials:

5

Number of replicates: Significance level: Critical value:

8

RI PT

Number of laboratories:

3

0.005

2.152492

C

D

E

Lab1 TRUE TRUE

TRUE

TRUE TRUE

Lab2 TRUE TRUE

TRUE

TRUE TRUE

Lab3 TRUE TRUE

TRUE

TRUE TRUE

Lab4 TRUE TRUE FALSE

TRUE TRUE

Lab5 TRUE TRUE

TRUE

TRUE TRUE

Lab6 TRUE TRUE

TRUE

TRUE TRUE

Lab7 TRUE TRUE

TRUE FALSE TRUE

Lab8 TRUE TRUE

TRUE FALSE TRUE

M AN U

B

D

A

SC

Inside control limits:

TE

Moreover, laboratories with very extreme results were detected by using the Grubbs’ test, i.e. laboratories defined by very large or small results (glucose content).

EP

R> grubbs.test(qcdata)

AC C

Grubbs' Test

Critical value: 2.032 Material Gmax G.max p.value.max Gmin G.min p.value.min

1

A Lab8 1.909

9.986e-02 Lab7 1.915

9.690e-02

2

B Lab4 1.564

3.708e-01 Lab1 1.490

4.568e-01

3

C Lab4 2.984

2.220e-16 Lab7 1.387

5.937e-01

4

D Lab8 2.388

4.186e-04 Lab7 2.424

8.644e-05

5

E Lab2 1.576

3.577e-01 Lab7 1.552

3.846e-01

21

ACCEPTED MANUSCRIPT

RI PT

A B C D E

SC

0

Lab1

Lab2

M AN U

−3

−2

−1

h

1

2

3

Glucose

Lab3

Lab4

Lab5

Lab6

Lab7

Lab8

Laboratory

Figure 6: Mandel’s h statistics, classified for material and laboratory.

D

Once the outlier laboratories are removed, one-way ANOVA analysis and mean comparison test can be performed [1]. The ILS package provides function

TE

ils.aov() to perform an analysis of variance considering random or fixed effects of the laboratory factor depending on the value of the random argument. By setting this parameter to TRUE, a random effects one-way ANOVA is done for

EP

each material, providing the corresponding F tests and confidence intervals for the variances. For instance, the following code shows an example using the

AC C

results of one material for the sake of simplicity. R> Glucose2 <- subset(Glucose, Material == "A") R> ils.aov(Glucose2, random = TRUE, level = 0.95)

AOV of Material: A

Analysis of Variance Table 22

ACCEPTED MANUSCRIPT

Df laboratory Residuals

Sum Sq Mean Sq F value Pr(>F)

7

7.7152

1.1022

16 18.0871

1.1304

0.975 0.4816

SC

Linear mixed model fit by REML ['lmerMod'] Formula: x ~ 1 + (1 | laboratory) Data: df

Groups

Name

M AN U

REML criterion at convergence: 71.0936 Random effects:

RI PT

Response: x

Std.Dev.

laboratory (Intercept) 1.393e-07 Residual

1.059e+00

Number of obs: 24, groups: Fixed Effects:

D

(Intercept)

laboratory, 8

TE

41.52

Confidence Intervals

97.5 %

sd_(Intercept)|laboratory

0.0000000

0.8337248

sigma

0.7776066

1.4169263

EP

2.5 %

AC C

(Intercept)

41.0863395 41.9503271

If a fixed effects ANOVA model is fitted (random = FALSE), multiple com-

parison of means with the Tukey HDS method are shown (and can be also plotted by setting the argument plot = TRUE).

23

ACCEPTED MANUSCRIPT

4. Interlaboratory Studies: New FDA Approach

RI PT

A random variable X is a functional variable if it takes values in a functional space F (full normed or semi-normed). A particular case occurs when the functional variable is of the form X = {X(t) : t ∈ T }, where T is an interval

T ⊂ R that belongs to a Hilbert space, as is the case of continuous functions in an interval [3].

SC

A set of functional data {x1 (t), . . . , xn (t)} is the observation of n functional variables X1 (t), . . . , Xn (t) with the same distribution as X(t). Where X(t) is

M AN U

usually assumed to be an element of: Z 2 L2 (T ) = f : T → R, f (t) dt < ∞ , T

with the inner product hf, gi =

R

T

f (t) g (t) dt.

In this context, the ILS package can be used to apply consistency tests (outlying detection) in an Interlaboratory Study. The techniques developed in [6] to check if inconsistent laboratories are detected either by outliers in the

D

within-laboratory or in between-laboratory variability, have been implemented

TE

in this package.

4.1. Hypothesis of reproducibility and repeatability

EP

In a ILS, each laboratory l experimentally test n samples, observing a set of curves X1l (t), . . . , Xnl (t) , l = 1, . . . , p. To test the hypothesis that there are no differences in the measurements between the laboratories, functional statistics Hl (t) and Kl (t) are estimated for each laboratory.

AC C

The null hypothesis of reproducibility states that: H0 : µ1 (t) = µ2 (t) = · · · = µp (t),

where µl (t), l = 1 . . . p, are the theoretical functional means of the laboratories. To evaluate the reproducibility of the laboratory results, the H(t) statistic is

calculated as follows: Hl (t) =

¯ l (t) − X(t) ¯ X , l = 1, . . . , p, Sl (t) 24

ACCEPTED MANUSCRIPT

¯ l (t) and Sl (t) are the the point-wise calculated functional mean and where X ¯ variance of the l-th laboratory, and X(t) is the global functional mean. To test

dH l

Z = kHl (t)k =

! 12

b 2

Hl (t) dt a

.

RI PT

the reproducibility hypothesis, the test statistic dH is defined as:

Significant larger values of dK will determine supposedly non-consistent labora-

SC

tories.

On the other hand, the null hypothesis of repeatability is:

M AN U

H0 : σ12 (t) = σ22 (t) = · · · = σp2 (t),

where σl2 (t), l = 1 . . . p, are the theoretical functional variances corresponding to each laboratory l. The repeatability test is based on the K(t) statistic, expressed as:

Kl (t) = where Sr (t) =

q P p 1 p

l=1

Sl (t) , l = 1, . . . , p, Sr (t)

Sl2 (t). The test statistic is dK l = kKl (t)k, and likewise,

D

significant larger values of dK will correspond to laboratories assumed as non-

TE

homogeneous.

A bootstrap procedure to reproduce the distribution of these statistics under the corresponding null hypothesis is proposed in [6]. This procedure provides

EP

the approximation of the critical values cK and cH of the test statistics dH and dK , respectively. Additionally, confidence bands for the functional statistics H(t) and K(t) can be computed (determined by the envelope of their bootstrap

AC C

counterparts with a norm less than the corresponding critical value). 4.2. ILS: Thermogravimetric Study In this section, with the aim to perform a r&R study of the TG and DSC func-

tional datasets described in subsection 2.2, the corresponding statistics H(t), K(t), dH and dK will be estimated and graphically represented. As above mentioned, the TG dataset contains thermogravimetric test results from 7 laboratories, while the DSC dataset contains results from 6 laboratories (excluding 25

ACCEPTED MANUSCRIPT

laboratory 1). Laboratories 1, 5 and 6 have provided different results from the remaining laboratories and should be detected as outliers.

RI PT

Function mandel.fqcs() estimates the functional statistics Hl (t) and Kl (t),

H and the test statistics dK l and dl , l = 1 . . . p. It also performs the bootstrap

procedure described in [6] to approximate the cK and cH critical values, and to

compute functional confidence bands for H(t) and K(t), under the correspond-

ing null hypothesis. Note that the computational time may be high, depend-

SC

ing on the sample size, the number of discretization points and the number of bootstrap samples. The results can be plotted with the default method (see

M AN U

Figure 7). For instance, the following code illustrates this feature with the TG functional dataset (the computational time of the bootstrap procedure was 58.4 s on a Intel I7-7500U CPU with 8GB of RAM). R> set.seed(1)

R> mandel.tg <- mandel.fqcs(fqcdata, nb = 100) R> plot(mandel.tg)

D

H The left panels of Figure 7 show the dK l and dl test statistics for each

TE

laboratory, l = 1 . . . p, as well as the corresponding critical values cH and cK (horizontal lines), constructed at a significance level of α = 0.01. In the case of the dH statistic, it is concluded that laboratory 7 does not meet the repro-

EP

ducibility hypothesis (according with [6], laboratories 1, 6, and 7 were detected as outliers through an iterative process). When the repeatability hypothesis is tested by the dK test statistic, laboratory 6 is identified as an outlier in the first

AC C

iteration.

The right panels of Figure 7 show the functional statistics Hl (t) and Kl (t)

for each laboratory, l = 1 . . . p, as well as the corresponding confidence bands (dotted lines in black). Functional estimates with a significant large test statistic are shown with a solid line. These functional estimates allow us to identify in which time intervals the laboratories provide different results. In this case, the differences between the laboratory 7 and the remaining are in the interval corresponding to the first degradation process of calcium oxalate. 26

ACCEPTED MANUSCRIPT

dH

Lab 1

Lab 3

Lab 5

1 2 3 −1

Lab 7

0

200

400

Laboratory

RI PT

0

−3

20

Statistic

40

H(t)

600

800

1000

SC

t

dK

0

2.0 0.0

1.0

M AN U

20

Statistic

40

3.0

K(t)

Lab 1

Lab 3

Lab 5

Laboratory

Lab 7

0

200

400

600

800

1000

t

Figure 7: TG dataset: The left panels show the dH (up) and dK (below) test statistics for

D

each laboratory and the corresponding critical values (dotted line). The right panels show the functional statistics H(t) (up) and K(t) (below) for each laboratory, with their confidence

with a solid line.

TE

bands (dotted lines). Functional estimates corresponding to outlier laboratories are shown

EP

Finally, the ILS package is used to perform outlier detection tasks in the Interlaboratory Study defined by the DSC dataset. Thus, Figure 8 shows that repeatability hypothesis is not reject. Otherwise, the reproducibility hypothesis

AC C

is rejected, and laboratory 6 is properly detected as an outlier (see top panels

of Figure 8).

R> data(DSC)

R> fqcdata.dsc <- ils.fqcdata(DSC, p = 6, +

index.laboratory = paste("Lab", 2:7), argvals = delta)

R> mandel.dsc <- mandel.fqcs(fqcdata.dsc, nb = 100) R> plot(mandel.dsc)

27

ACCEPTED MANUSCRIPT

dH

Lab 2

Lab 4

1 2 3 −1

Lab 6

0

200

400

Laboratory

600

800

1000

SC

t

dK

2.0 0.0

1.0

M AN U

20

Statistic

40

3.0

K(t)

0 Lab 2

Lab 4

Lab 6

Laboratory

Figure 8:

RI PT

0

−3

20

Statistic

40

60

H(t)

0

200

400

600

800

1000

t

DSC dataset: The left panels show the dH (up) and dK (below) test statistics for

D

each laboratory and the corresponding critical values (dotted line). The right panels show the functional statistics H(t) (up) and K(t) (below) for each laboratory, with their confidence

with a solid line.

EP

5. Conclusions

TE

bands (dotted lines). Functional estimates corresponding to outlier laboratories are shown

The present ILS library has been implemented in R software to provide

AC C

practitioners of Academia and Industry an open source computational tool to perform the statistical analysis in Interlaboratory Studies. The development and presentation of this library fills a gap within the software alternatives to carry out this type of analysis. In fact, this package provides the main descriptive and outlier detection tools

dealing with the Interlaboratory Studies and recommended by the ISO 5725-41994 and ASTM E-691 standards. Namely, Mandels’s h and k test (including their graphical output), Grubbs’ test, Cochran’s test, in addition to ANOVA 28

ACCEPTED MANUSCRIPT

utilities. On the other hand, apart from standard scalar statistical techniques, the ILS

RI PT

package provides FDA techniques to deal with functional data (curves), as data

unit, in the framework of Interlaboratory Studies. Indeed, the main novelty of this computational proposal is the implementation of the functional extensions

of the h and k Mandel’s statistics when data results are curves, preventing to lose

relevant information derived from reduction dimension and feature extraction

SC

processes. These new methods can identify successfully the outlier laboratories directly from data curves.

M AN U

Thus, different study cases dealing with analytical chemistry and applied physics (thermal analysis), and also clinical studies have been presented and analysed by using the scalar and FDA approaches provided by the ILS computational tool.

The availability of these computational tools dealing with functional data is important and necessary since the increasing number of processes that are actually sensorized, and the complexity of data obtained by laboratory experi-

D

mental techniques. They produce in many cases a huge volume of complex data

TE

that had to be properly analysed for their future exploitation. In this paper we have proposed new computational tools that allows to perform the statistical

EP

analysis of ILS studies when working with this new paradigm of data.

6. Acknowledgements

AC C

This research/work of Salvador Naya, Javier Tarr´ıo-Saavedra and Rub´en Fern´ andez-Casal have been supported by MINECO grants MTM2014-52876R and MTM2017-82724-R, and by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2016- 015 and Centro Singular de Investigaci´ on de Galicia

ED431G/01), all of them through the ERDF. The research of Miguel Flores has been partially supported by Grant PII-

DM-002-2016 of Escuela Polit´ecnica Nacional of Ecuador.

29

ACCEPTED MANUSCRIPT

References

RI PT

[1] ISO-5725, International Standard ISO 5725-4-1994: Accuracy (Trueness and precision) of measurement methods and results — Part 2: Basic method for the determination of repeatability and reproducibility of a standard measurement method, Geneva, Suiza, 1994.

[2] ASTM-E691, Practice for conducting and interlaboratory study to deter-

SC

mine the precision of a test method., West Conshohocken, USA, 2004. [3] J. O. Ramsay, B. W. Silverman, Functional Data Analysis, 2nd ed.,

M AN U

Springer-Verlag, New York, 2002.

[4] J. O. Ramsay, B. W. Silverman, Applied Functional Data Analysis: Methods and Case studies, Springer-Verlag, New York, 2002.

[5] F. Ferraty, P. Vieu, Nonparametric functional data analysis: theory and practice, Springer Science & Business Media, 2006.

D

[6] M. Flores, J. Tarr´ıo-Saavedra, R. Fern´andez-Casal, S. Naya, Functional extensions of Mandel’s h and k statistics for outlier detection in interlabo-

TE

ratory studies, Chemometrics and Intelligent Laboratory Systems (2018). [7] S. Naya, J. Tarr´ıo-Saavedra, J. L´opez-Beceiro, M. Francisco-Fern´andez,

EP

M. Flores, R. Artiaga, Statistical functional approach for interlaboratory studies with thermal data, Journal of Thermal Analysis and Calorimetry

AC C

118 (2014) 1229–1243. [8] M. Flores, R. Fern´ andez-Casal, S. Naya, J. Tarr´ıo-Saavedra, R. Bossano, ILS: Interlaboratory Study, 2018. URL: https://CRAN.R-project.org/ package=ILS, R package version 0.3.1.

[9] M. Flores, S. Naya, J. Tarr´ıo-Saavedra, R. Fern´andez-Casal, Functional Statistics and Related Fields, Springer, 2017, pp. 123–130. doi:10.1007/ 978-3-319-55846-2_16.

30

ACCEPTED MANUSCRIPT

[10] S. L. R. Ellison, metRology:

Support for Metrological Applications,

2017. URL: https://CRAN.R-project.org/package=metRology, R pack-

RI PT

age version 0.9-26-2.

[11] T. Hothorn, F. Bretz, P. Westfall, R. M. Heiberger, A. Schuetzenmeis-

ter, S. Scheibe, multcomp: Simultaneous Inference in General Parametric Models, 2017. URL: https://CRAN.R-project.org/package=multcomp,

SC

R package version 1.4-7.

[12] M. Febrero-Bande, M. Oviedo, Statistical computing in functional data

M AN U

analysis: The R package fda.usc, Journal of Statistical Software, Articles 51 (2012) 1–28.

[13] J. Ramsay, G. Hooker, S. Graves, Functional data analysis with R and Matlab, 2010.

[14] J. Mandel, A new analysis of interlaboratory test results, In: ASQC Quality Congress Transaction-Baltimore (2014) 360–366.

D

[15] P. Wilrich, Critical values of Mandel’s h and k, the Grubbs and the Cochran

TE

test statistic, AStA Adv Stat Anal 97 (2013) 1–10. [16] F. Grubbs, G. Beck, Extension of sample sizes and percentage points for significance tests of outlying observations, Technometrics 14 (1972) 847–

AC C

EP

854.

31