Evaluation of different validation strategies and long term effects in NIR calibration models




Food Chemistry 141 (2013) 2639–2648

Contents lists available at SciVerse ScienceDirect

Food Chemistry journal homepage: www.elsevier.com/locate/foodchem

Analytical Methods

Valeria Sileoni a,⇑, Ombretta Marconi b, Giuseppe Perretti b, Paolo Fantozzi a

a Italian Brewing Research Centre, University of Perugia, Via San Costanzo, 06126 Perugia, Italy
b Department of Economic and Food Science, University of Perugia, Via San Costanzo, 06126 Perugia, Italy

Article info

Article history: Received 23 August 2012; Received in revised form 7 January 2013; Accepted 24 April 2013; Available online 11 May 2013

Keywords: Near infrared spectroscopy; Barley malt; Beer; Validation; Analyses

Abstract

Stable and reliable NIR calibration models for barley malt quality assessment were developed and exhaustively evaluated. The measured parameters are fine extract, fermentability, pH, soluble nitrogen, viscosity, friability and free-amino nitrogen. The reliability of the calibration models was evaluated by comparing the classic leave-one-out internal validation with a more challenging one based on a resampling scheme. The long-term effects, intended as possible alterations of the predictive power of the NIR method due to the variation between samples collected in different years, were evaluated through an external validation, which demonstrated the stability of the developed calibration models. Finally, the accuracy and the precision of the calibration models were evaluated in comparison with the reference methods. This exhaustive evaluation offers a realistic idea of the predictive power of the developed NIR methods for future unknown samples and of their application in the beer industry.

© 2013 Elsevier Ltd. All rights reserved.

⇑ Corresponding author. Tel.: +39 075 585 7923; fax: +39 075 585 7946. E-mail address: [email protected] (V. Sileoni).
0308-8146/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.foodchem.2013.04.110

1. Introduction

Beer is the world's most widely consumed alcoholic beverage, with a world production of 1,846,393,000 hl in 2010, and is also likely the oldest alcoholic beverage (The Barth Report hops, 2010–2011). Each beer production step (e.g. lautering, fermentation and filtration) and the characteristics of the produced beer (e.g. flavour, colour, foam and stability) are strongly influenced by the quality of the barley malt, which is the main raw material. The quality of the barley malt is a complex character. Its most important feature is its behaviour in the mashing process and its ability to catabolise its components (Kunze, 2004). The Mitteleuropäische Brautechnische Analysenkommission (MEBAK, 2002) and the European Brewery Convention (EBC, 2006) defined the analytical specifications commonly used in Europe.

Ideally, the quality assessment methods routinely used in food manufacturing should be non-invasive, non-destructive, rapid and reliable. The most rapid analytical methods currently available are based on the physical properties of food products and include spectroscopic methods, such as near-infrared spectroscopy (NIRS) (Williams, 2001). Near infrared spectroscopy is a vibrational spectroscopy that involves the spectral region from 13,300 to 4000 cm⁻¹ (800–2500 nm) and measures overtones and combinations of the molecule's vibrational modes, principally those involving hydrogen bound to a heavier element, such as oxygen, carbon, nitrogen or sulphur (O–H, C–H, N–H or S–H bonds) (Siesler, 2008). As these types of bonds are common in organic matter, NIRS is widely used to analyse the composition of food products (Dahm & Dahm, 2001). Furthermore, the low absorbance of water makes the NIR region eminently suitable for the analysis of samples with a high water content, such as foods and beverages (Cozzolino, Kwiatkowski, Waters, & Gishen, 2007; Woodcock, Downey, & O'Donnel, 2008). Moreover, the longer path lengths compared with other spectroscopic techniques enable NIR spectra to be measured by transmission through intact materials, or by diffuse reflection for opaque biological samples. This allows rapid, low-cost and non-destructive analysis, because sample preparation is avoided or replaced by very simple procedures such as grinding.

In this context, the potential of NIRS for the brewing industry is evident, especially for malt quality assessment. In fact, grains and flour are optimal sample types for NIR diffuse reflectance measurements (Osborne, 2008). Indeed, a typical application of NIR spectroscopy is the measurement of the main quality parameters of barley malt, such as moisture and nitrogen content (Henry, 1985; Marte et al., 2009; Sileoni, Perretti, Marte, Marconi, & Fantozzi, 2010). Moreover, there are some studies on the determination of different malt quality parameters, such as extract, Diastatic Power (DP), β-glucan, Free-Amino Nitrogen (FAN) and Total Soluble Nitrogen (TSN), from spectra of grain (Black & Panozzo, 2001) and of ground samples (Sileoni, Perretti, Marconi, & Fantozzi, 2009).

Despite its many advantages and wide range of applications, NIRS was largely ignored by conservative spectroscopists in past years because of the overlap of combination bands and overtones, which greatly decreases its specificity, especially for interpretation purposes. However, NIR spectroscopy has become more attractive with the more recent availability of chemometrics for qualitative discrimination (Munck et al., 2010; Naes, Isaksson, Fearn, & Davies, 2002) and quantitative determination (Conzen, 2006; Romía & Bernàrdez, 2008). In fact, absorption bands arising from NIR vibrations are typically very broad, leading to spectra that often lack a detailed structure but contain significant amounts of information hidden by the overlapping nature of the specific absorption bands present. Consequently, multivariate (multiple wavelength) data analysis tools and calibration techniques (e.g. partial least-squares regression) are often exploited to extract the desired chemical information from spectral data sets (The American Society for Testing and Materials, 2001).

The use of advanced data handling (chemometrics) makes NIRS an indirect method that requires a large number of samples for calibration, covering a broad variability for each analytical parameter, with a uniform distribution between extreme values, to obtain an accurate calibration equation (Cruciani et al., 1989). This is especially true for natural matrices, such as food raw materials and their processing, where the composition cannot be exactly predetermined.
In fact, to build a regression model, a relatively large number of training samples (both spectra and the corresponding reference values) is required, and this training set should not only vary in the quantity of interest but should also contain the different factors contributing to the variation between samples. Concerning barley malt, these factors are genetics, growth conditions, seasonal variation, harvesting, storage and production technology (Marconi et al., 2011). This is why multivariate models in food processes take years to build and/or improve to a desirable level (Sileoni, Marconi, Perretti, Buiatti, & Fantozzi, 2010; Sileoni, van den Berg, Marconi, Perretti, & Fantozzi, 2011). Therefore, it is difficult to state in advance how a NIR calibration model will predict the parameter of interest for unknown future samples. It is necessary to calculate the error of prediction by means of different validation strategies, in order to evaluate how it changes according to the type and the number of samples used to calculate it. Moreover, an exhaustive evaluation of the real predictive power of the NIR model, through a comparison with the reference method, is needed.

Finally, considering that barley malt is a natural product, it is also necessary to consider the long-term effects on the NIR calibration models developed for its quality evaluation. The long-term effects are intended as possible alterations of the predictive power of the NIR method due to the variation between samples collected in different years, because of genetic, environmental and technological factors. Therefore, it is advisable to verify that the predictive ability of a NIR method does not change when samples collected in different years are analysed. In fact, if these possible sources of variation are included in the data set of the calibration model, the model can "recognise" and handle them properly, and consequently the error of prediction is expected to be stable.

Consequently, the aim of this paper is the development of stable and reliable NIR methods to measure important malt quality parameters, such as fine extract, fermentability, pH, soluble nitrogen, viscosity, friability and free-amino nitrogen. Subsequently, the reliability, stability, accuracy and precision of the developed NIR calibration models were exhaustively evaluated, adopting the same successful approach as previous papers from the same authors (Marte et al., 2009; Sileoni, Perretti, et al., 2010; Sileoni, Marconi, et al., 2010; Sileoni et al., 2011). Through different validation strategies, including the use of different numbers and kinds of samples, the assessment of the long-term effects and the evaluation of accuracy and precision in comparison with the reference methods, the authors offer a realistic idea of the predictive power of the NIR methods on future unknown samples and of their applicability in the beer industry.

2. Materials and methods

2.1. Samples

Commercial pale malt samples were supplied by different industrial malt-houses and mills, and are representative of the Italian market.
The number of samples ranges from 116 to 334 depending on the parameter considered, as shown in Table 1. The samples were collected from the 2006, 2007, 2008 and 2009 crops.

Table 1. Spectral range (cm⁻¹), number of samples in the complete dataset and number of outliers for each parameter.

| Parameter | Spectral range (cm⁻¹) | Complete dataset | Samples in calibration | LVs | R95 | Range true (predicted) | Bias | R | Slope | Offset | RMSEC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Viscosity (mPa·s) | 9000–4000 | 239 | 222 | 11 | 0.14 | 1.45–1.58 (1.45–1.56) | 8.17e−016 | 0.82 | 0.66 | 0.51 | 0.016 |
| FAN (mg/L) | 9000–4000 | 116 | 101 | 13 | 22 | 102–168 (104–169) | 1.04e−012 | 0.98 | 0.96 | 5.27 | 2.42 |
| Friability (%) | 9000–4000 | 196 | 176 | 10 | 4.9 | 78.8–99.4 (80.0–100.6) | 1.80e−013 | 0.92 | 0.84 | 14.12 | 1.60 |
| Fine extract (%dm) | 7163–6781; 6006–5624; 4463–4981 | 325 | 295 | 10 | 1.2 | 79.14–84.10 (79.71–83.13) | 1.50e−013 | 0.78 | 0.61 | 31.39 | 0.54 |
| Fermentability (%) | 9000–4000 | 318 | 288 | 13 | 2.8 | 77.78–83.40 (78.58–83.13) | 1.01e−013 | 0.85 | 0.72 | 22.93 | 0.51 |
| pH | 8320–7940; 6000–5600; 5320–4850; 4460–4000 | 334 | 304 | 14 | 0.12 | 5.73–6.18 (5.74–6.16) | 1.34e−014 | 0.87 | 0.75 | 1.79 | 0.03 |
| Soluble nitrogen (%dm) | 9091–9709; 7934–6009; 5620–4081 | 327 | 310 | 11 | 0.09 | 0.530–0.830 (0.559–0.818) | 1.0e−014 | 0.87 | 0.75 | 0.166 | 0.020 |

FAN: free amino nitrogen; dm: dry matter; LVs: latent variables; R95: reproducibility limit at 95% of probability; R: correlation coefficient; RMSEC: root mean square error of calibration. The RMSEC is calculated for the optimal number of latent variables.

2.2. Spectral analysis

Sample preparation, spectral acquisition and NIR instrument validation were performed according to Sileoni et al. (Marte et al., 2009; Sileoni, Perretti, et al., 2010; Sileoni et al., 2011), using a Vector 22/N FT-NIR spectrometer system equipped with a tungsten source, a Rock-solid interferometer and an integrating sphere module with a PbS detector for spectra acquisition in diffuse reflectance mode (Bruker Optics, Milan, Italy). Malt grain samples (1 kg) were homogenised by means of a sample divider (VLB, 110 Berlin, Germany) and finely ground by means of a DLFU type disk mill set at a distance between the disks of 0.2 mm (Bühler, Uzwil, Switzerland). The flours were used to carry out the reference analyses and to record the log(1/R) spectra in the spectral range of 11,500–4000 cm⁻¹, with a resolution of 8 cm⁻¹ and 64 scans. Absorption spectra were collected by placing the samples in a quartz-bottomed cup (4 cm inner diameter) that was spun during the measurement (10 rpm), at room temperature and against a gold-coated background.

2.3. Reference analyses

Standard methods from the Analytica European Brewery Convention (A-EBC, 2006) were used as reference analyses. The barley malt quality parameters considered are the following: fine extract (% dry matter), fermentability (%), pH, soluble nitrogen (% dry matter), viscosity (mPa·s), friability (%) and free-amino nitrogen (FAN, mg/L).

2.4. Software

Absorption spectra were collected by means of the OPUS software (version 5.5 or 6.5, Bruker Optics). All computations involving the calibration models (spectral data pretreatments, selection of the spectral data set, construction of the PLS regression model and its validation) were carried out with the MATLAB software (version 7.6) and in-house routines (The Mathworks Inc., Natick, MA, USA).

2.5. Data analysis

2.5.1. Calibration models set up

(a) Algorithm: The calibration models were developed using PLS1 (Partial Least Squares, or Projections on Latent Structures) regression. This algorithm provides a continuous relationship between two related sets of variables; its purpose is to build a calibration model using the minimum number of factors (or latent variables) derived from the spectral data matrix X, which contain all the information relevant to predict concentrations and are used instead of the original spectra (Geladi & Kowalski, 1986).

(b) Methods of preprocessing: In order to allow the PLS algorithm to find the best correlation between spectral data and concentrations, different scatter correction and derivation methods were compared, and the extended multiple scatter correction (EMSC) was found to be the best pretreatment. This method can be seen as a merging of the de-trending technique with the Multiplicative Scatter Correction (MSC), including both second-order polynomial fitting to the reference spectrum and fitting of a baseline on the wavelength axis. The EMSC algorithm minimises the signal variability caused by scatter from particulates in the samples (following the basic idea of MSC) with the inclusion of the wavelength dependency (Rinnan, van den Berg, & Balling, 2009; Martens & Stark, 1991).

(c) Spectral range: NIR spectroscopy is called a "full-spectrum method" because the calibration model can be set up using the whole spectral range, but the contribution of noisy spectral regions or of absorption bands from contaminants can degrade its quality. Therefore, the interval partial least-squares regression (iPLS) algorithm was used in order to choose the most suitable spectral range for modelling the concentration data. The spectra are divided into a number of intervals of equal width and local PLS models are developed on each of these spectral subintervals. The prediction performance of these local models is compared with that of the global (full-spectrum) model. The comparison is mainly based on the validation parameter RMSEP (root mean square error of prediction), but other parameters such as R (correlation coefficient), slope and offset are also evaluated to ensure a comprehensive model overview (Andersen & Bro, 2010; Nørgaard et al., 2000). The spectral range selected for each parameter is shown in Table 1.

(d) Outlier identification and elimination: The samples with a high difference between true and predicted data, a high Mahalanobis distance and a high spectral residual were excluded from the calibration data set (Sileoni et al., 2011). The number of samples used for the calibration after the removal of the outliers is shown for each parameter in Table 1.

(e) Number of calibration factors: The determination of the number of principal components is a crucial point for the quality of the calibration model. The factors (equivalent to principal components, or latent variables) explaining the spectral matrix are sorted in decreasing order according to their contribution to the spectral features. In this work, the ranks selected as optimal are the ones for which the RMSEP reaches its lowest value. If two or more ranks gave good results, an F-test was calculated to verify that these results were not statistically different, and then the lowest rank was chosen in order to prevent overfitting. The number of latent variables chosen for each parameter is shown in Tables 1–3.
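The scatter-correction idea behind step (b) can be illustrated with a minimal sketch of the basic MSC, which is the simpler precursor that EMSC extends. This is not the EMSC used by the authors and not their MATLAB code: the function name `msc`, the toy spectra and the use of the mean spectrum as reference are all assumptions made for illustration.

```python
from statistics import fmean


def msc(spectra):
    """Basic multiplicative scatter correction (MSC), not the extended EMSC.

    Each spectrum is regressed on the mean spectrum (s ~ a + b * reference)
    and then corrected as (s - a) / b.
    """
    n_points = len(spectra[0])
    # Reference spectrum: mean over all samples at each spectral point.
    reference = [fmean(s[j] for s in spectra) for j in range(n_points)]
    ref_mean = fmean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)

    corrected = []
    for s in spectra:
        s_mean = fmean(s)
        # Ordinary least-squares slope (b) and intercept (a) against the reference.
        b = sum((r - ref_mean) * (v - s_mean) for r, v in zip(reference, s)) / ref_ss
        a = s_mean - b * ref_mean
        corrected.append([(v - a) / b for v in s])
    return corrected


# Hypothetical toy data: one "chemical" signal seen with three different
# additive offsets and multiplicative scatter factors.
base = [0.10, 0.30, 0.20, 0.50, 0.40]
spectra = [
    base,
    [0.05 + 1.2 * v for v in base],
    [-0.02 + 0.8 * v for v in base],
]
corrected = msc(spectra)
```

In this toy case the three spectra differ only by offset and scale, so after correction they coincide with the mean spectrum; real spectra also carry chemical differences, which MSC is meant to leave in place.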

Table 2. Results of leave-one-out and leave-33%-out cross-validations.

| | Viscosity (mPa·s) | FAN (mg/L) | Friability (%) | Fine extract (%dm) | Ferm (%) | pH | Soluble nitrogen (%dm) |
|---|---|---|---|---|---|---|---|
| R95 | 0.14 | 22 | 4.9 | 1.2 | 2.8 | 0.12 | 0.09 |
| Range true | 1.450–1.580 | 102.0–168.0 | 78.8–99.4 | 79.14–84.10 | 77.78–83.40 | 5.86–6.18 | 0.530–0.760 |
| LVs | 11 | 13 | 10 | 10 | 13 | 14 | 11 |

Cross-validation leave-one-out

| | Viscosity (mPa·s) | FAN (mg/L) | Friability (%) | Fine extract (%dm) | Ferm (%) | pH | Soluble nitrogen (%dm) |
|---|---|---|---|---|---|---|---|
| Samples in calibration/validation | 221/1 | 100/1 | 175/1 | 294/1 | 287/1 | 303/1 | 309/1 |
| Range predicted | 1.451–1.564 | 108.7–167.6 | 80.5–100.4 | 79.71–83.17 | 78.74–82.95 | 5.88–6.15 | 0.565–0.805 |
| Bias | 5.85e−005 | 0.10 | 0.02 | 0.002 | 0.0014 | 1.98e−004 | 8.73e−005 |
| R | 0.73 a | 0.91 a | 0.85 a | 0.72 a | 0.72 a | 0.75 a | 0.82 a |
| Slope | 0.88 | 1.00 | 0.95 | 0.56 | 0.58 | 0.72 | 0.70 |
| Offset | 0.18 | 0.62 | 4.27 | 35.75 | 33.96 | 2.30 | 0.20 |
| RMSECV | 0.019 b | 5.2 b | 2.1 b | 0.60 b | 0.67 a | 0.04 b | 0.023 b |
| % of samples outside the range | 0 | 0 | 1 | 3 | 0 | 0 | 0 |

Cross-validation leave-33%-out

| | Viscosity (mPa·s) | FAN (mg/L) | Friability (%) | Fine extract (%dm) | Ferm (%) | pH | Soluble nitrogen (%dm) |
|---|---|---|---|---|---|---|---|
| Samples in calibration/validation | 149/73 | 68/33 | 118/58 | 198/97 | 193/95 | 204/100 | 208/102 |
| Range predicted | 1.470–1.558 | 108.8–165.1 | 81.1–98.1 | 80.04–83.31 | 78.39–82.73 | 5.80–6.15 | 0.598–0.744 |
| Bias | 0.0012 | 0.70 | 0.60 | 0.09 | 0.06 | 0.0075 | 0.002 |
| R | 0.68 ± 0.05 a | 0.89 ± 0.04 a | 0.84 ± 0.03 a | 0.72 ± 0.04 a | 0.70 ± 0.04 a | 0.72 ± 0.05 a | 0.80 ± 0.03 a |
| Slope | 0.63 | 0.91 | 0.70 | 0.46 | 0.60 | 0.62 | 0.70 |
| Offset | 0.56 | 13 | 27.25 | 43.66 | 32.61 | 2.87 | 0.20 |
| RMSECV | 0.021 ± 0.002 b | 5.9 ± 0.8 b | 2.2 ± 0.2 b | 0.61 ± 0.04 b | 0.70 ± 0.04 a | 0.05 ± 0.01 b | 0.024 ± 0.001 b |
| % of samples outside the range | 0 | 0 | 2 | 5 | 0 | 1 | 0 |

Numbers in the same column with the same letter (a, b) are not statistically different (P < 0.05). FAN: free amino nitrogen; Ferm: fermentability; dm: dry matter; LVs: latent variables; R95: reproducibility limit at 95% of probability; R: correlation coefficient; RMSECV: root mean square error of cross validation. The RMSECV is calculated for the optimal number of latent variables, and the % of samples outside the agreement range follows from it.

2.5.2. Evaluation of the model

The quality of calibration models can be assessed via statistical parameters, of which those expressing the mean error for the whole population rather than for a single sample are to be preferred. The statistic typically used to assess the quality of calibration models is the Root Mean Square Error of Calibration (RMSEC), which should be as low as possible:

$$\mathrm{RMSEC} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - y_i)^2}{N}}$$

where x_i = true values (determined by the reference methods), y_i = predicted values (determined by the NIR calibration models) and N = number of samples used for the calibration. This parameter was calculated for prediction using 1–20 PLS1 factors, and the lowest RMSEC and the highest R values are reported in the results.

The results of the calibration are usually displayed in a graph of the predicted values versus the true values. The model has a better predictive ability if the values are arranged along a line at 45° across the chart (the bisector). From the equation of the fitted trend line:

$$y = a x + b$$

it is possible to obtain the Slope a and the Offset b, and these two parameters can be used for the evaluation of the calibration model. In fact, the fitted trend line is expected to have an equation similar to that of the bisector, with Slope = 1 and Offset = 0.

Another statistic typically used to assess the quality of calibration models is a measure of the correlation (linear dependence) between the two variables x_i (true values) and y_i (predicted values), giving a value between +1 and −1 inclusive:

$$R = \sqrt{\frac{\left[\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\left[\sum_{i=1}^{N} (x_i - \bar{x})^2\right]\left[\sum_{i=1}^{N} (y_i - \bar{y})^2\right]}}$$

A value of R close to 1 indicates a good linear dependence between the true and predicted values, namely a good predictive ability of the model.

Another parameter that can be used to evaluate the calibration is the Bias, which is a systematic deviation of the predicted values y_i from the true values x_i, and which should be as low as possible:

$$\mathrm{Bias} = \frac{\sum_{i=1}^{N} (x_i - y_i)}{N}$$
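The statistics of Section 2.5.2 (root mean square error, R, Bias, Slope and Offset) can be computed from paired true and predicted values as in the following minimal sketch. The function name `calibration_stats` and the four-sample data set are hypothetical and serve only to show the formulas at work; the same function applies to RMSEC, RMSECV or RMSEP depending on which predictions are passed in.

```python
from math import sqrt
from statistics import fmean


def calibration_stats(true, pred):
    """Return the root mean square error, R, Bias, Slope and Offset.

    true: reference values x_i; pred: NIR-predicted values y_i.
    Slope and Offset come from the least-squares trend line y = a*x + b
    fitted to the predicted-versus-true plot.
    """
    n = len(true)
    x_bar, y_bar = fmean(true), fmean(pred)
    sxx = sum((x - x_bar) ** 2 for x in true)
    syy = sum((y - y_bar) ** 2 for y in pred)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(true, pred))
    slope = sxy / sxx
    return {
        "rmse": sqrt(sum((x - y) ** 2 for x, y in zip(true, pred)) / n),
        "r": sqrt(sxy ** 2 / (sxx * syy)),
        "bias": sum(x - y for x, y in zip(true, pred)) / n,
        "slope": slope,
        "offset": y_bar - slope * x_bar,
    }


# Hypothetical fine-extract-like values (%dm), for illustration only.
true = [80.0, 81.0, 82.0, 83.0]
pred = [80.2, 80.9, 82.1, 82.8]
stats = calibration_stats(true, pred)
```

Note that R is computed exactly as in the formula above, as the square root of a squared ratio, so this sketch returns its magnitude only.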

2.5.3. Validation

In order to check the real predictive power and the long-term stability of the NIR calibration models, three different validation tests were applied:

(a) Internal validation: the regular leave-one-sample-out cross-validation procedure, with as many one-object validation subsets as the number of samples included in the calibration set, was applied to all the developed calibration models.

(b) Internal validation: a more challenging leave-N-samples-out cross-validation, with N = 33% of the samples, was applied to all the developed calibration models in order to verify how the RMSEP and the R² values change with the number of samples used to calculate them. The excluded samples were selected at random. Because these two values change according to chance, the resampling procedure was repeated 200 times (Sileoni et al., 2011; Sileoni, Marconi, et al., 2010). The number of samples used in validation for each parameter is shown in Table 2.

(c) External validation: three test-set validations, where the validation samples were chosen according to the year of collection, were applied in order to check whether the calibration model has a stable predictive performance on samples collected in different years. Three different data blocks were defined and used for the validation: 2006, 2007, and 2008 + 2009. Two of these three data blocks were used to develop the calibration, and the third one was used as the validation set. The number of samples used in validation for each parameter is shown in Table 3. This "extrapolation" test should give a good indication of the model performance for coming years. These validations were applied to the calibration models developed for the most important malt quality parameters: fine extract (% dry matter), fermentability (%), pH and soluble nitrogen (% dry matter) (Sileoni et al., 2011; Sileoni, Marconi, et al., 2010).

The results of the three different validations were evaluated using the same statistical parameters calculated for the evaluation of the calibration models: the root mean square error of prediction (RMSEP) or root mean square error of cross validation (RMSECV), the slope, the offset, the correlation coefficient (R) and the bias.

Table 3. Results of the "according to the year of collection" test-set validation. R95: fine extract 1.2 %dm; fermentability 2.8%; pH 0.12; soluble nitrogen 0.09 %dm.

| Test set | Parameter | Samples in calibration/validation | Range true | LVs (best or CV) | Range predicted | Bias | R | Slope | Offset | RMSEP | % of samples outside the range |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2006 | Fine extract (%dm) | 184/111 | 79.14–82.86 | 10 | 79.86–83.18 | 0.25 | 0.62 | 0.48 | 35.75 | 0.70 | 5 |
| 2006 | Fermentability (%) | 187/101 | 78.01–83.40 | 13 | 79.50–82.75 | 0.07 | 0.65 | 0.45 | 33.96 | 0.76 | 0 |
| 2006 | pH | 194/110 | 5.86–6.18 | 14 | 5.87–6.14 | 0.021 | 0.72 | 0.61 | 2.30 | 0.05 | 1 |
| 2006 | Soluble nitrogen (%dm) | 194/116 | 0.530–0.760 | 11 | 0.555–0.753 | 0.002 | 0.79 | 0.66 | 0.20 | 0.024 | 0 |
| 2007 | Fine extract (%dm) | 171/124 | 79.14–84.10 | 12 (best) | 79.27–83.15 | 0.36 | 0.74 | 0.65 | 28.48 | 0.70 | 5 |
| 2007 | Fine extract (%dm) | | | 10 (CV) | 79.32–82.93 | 0.33 | 0.69 | 0.69 | 24.91 | 0.77 | 10 |
| 2007 | Fermentability (%) | 166/122 | 77.78–83.40 | 11 (best) | 78.82–83.40 | 0.02 | 0.68 | 0.61 | 31.57 | 0.80 | 0 |
| 2007 | Fermentability (%) | | | 13 (CV) | 78.68–83.70 | 0.13 | 0.65 | 0.61 | 32.04 | 0.86 | 0 |
| 2007 | pH | 176/128 | 5.73–6.12 | 13 (best) | 5.80–6.14 | 0.025 | 0.79 | 0.60 | 2.43 | 0.05 | 1 |
| 2007 | pH | | | 14 (CV) | 5.71–6.15 | 0.034 | 0.78 | 0.73 | 1.65 | 0.06 | 3 |
| 2007 | Soluble nitrogen (%dm) | 185/125 | 0.590–0.760 | 11 | 0.599–0.744 | 7.65e−004 | 0.73 | 0.60 | 0.27 | 0.024 | 0 |
| 2008 + 2009 | Fine extract (%dm) | 235/60 | 79.41–83.28 | 10 | 79.67–82.24 | 0.26 | 0.69 | 0.48 | 50.20 | 0.76 | 10 |
| 2008 + 2009 | Fermentability (%) | 223/55 | 79.00–83.10 | 13 | 79.14–82.57 | 0.30 | 0.63 | 0.42 | 55.55 | 0.80 | 0 |
| 2008 + 2009 | pH | 238/66 | 5.80–6.11 | 14 | 5.82–6.07 | 0.024 | 0.56 | 0.58 | 3.08 | 0.06 | 3 |
| 2008 + 2009 | Soluble nitrogen (%dm) | 244/66 | 0.61–0.83 | 16 (best) | 0.594–0.799 | 0.003 | 0.77 | 0.75 | 0.18 | 0.026 | 0 |
| 2008 + 2009 | Soluble nitrogen (%dm) | | | 11 (CV) | 0.602–0.738 | 0.004 | 0.66 | 0.55 | 0.30 | 0.030 | 0 |

LVs: latent variables; RMSEP: root mean square error of prediction; CV: cross-validation; R: correlation coefficient; R95: reproducibility limit at 95% of probability; dm: dry matter. Where the best number of LVs for the test set differs from the number selected in cross-validation, results are given for both. The RMSEP is calculated for the stated number of latent variables, and the % of samples outside the agreement range follows from it.

2.5.4. Comparison with the reference method

(a) Agreement

A method to evaluate the agreement between the model and the reference method involves the reproducibility limit at 95% (R95) of the reference method, namely the value which should, with a probability of 95%, include the absolute difference between the results of an analysis performed under reproducibility conditions (independent results obtained by the same method, on the same sample, in different laboratories with different operators and different instruments). A comparison between the NIR and the reference method can be performed by calculating the percentage of reference values (true values) that fall in the interval defined by the following equation (The American Society for Testing and Materials, 2001):

$$y_{\mathrm{predicted}} - R_{95} \le y_{\mathrm{true}} \le y_{\mathrm{predicted}} + R_{95}$$

If 95% or more of the reference values fall within this interval, then the estimates produced with the multivariate NIR model agree with those produced by the reference method as well as a second laboratory repeating the reference measurement would agree.

(b) Accuracy

The accuracy and the precision of values estimated from a NIR multivariate model are calculated from repeated spectral measurements (Marte et al., 2009; Sileoni, Perretti, et al., 2010; The American Society for Testing and Materials, 2001). The number of samples for which repeat measurements are taken should be at least three. The calibration curve should be tested by recording more than 10 independent spectra under repeatability conditions and evaluating the predicted values. The normality of the distribution of each data set was tested by means of the Shapiro–Wilk test with a probability level of P = 95%. Moreover, anomalous data were identified by means of the Huber test, which is based on the evaluation of the median. If both conditions are satisfied, that is, the distribution is normal and there are no anomalous data, the statistical parameters, such as the average value ($\bar{y}$) and the variance ($s_r^2$), can be calculated. The accuracy can be evaluated by checking that (Marte et al., 2009; Sileoni et al., 2010):

$$x_i - U_e \le \bar{y} \le x_i + U_e$$

where $\bar{y}$ is the average value of the distribution, x_i is the true value determined by the reference method and U_e is the extended uncertainty of the reference method, which can be calculated from the reproducibility limit at 95% (R95) by the following equation (Marte et al., 2009; Sileoni et al., 2010):

$$U_e = \frac{R_{95}}{\sqrt{2}}$$

(c) Precision

The precision can be evaluated by checking that the variance ($s_r^2$) of the distribution is statistically equivalent to that of the reference method ($\sigma_r^2$) by means of a chi-square test (The American Society for Testing and Materials, 2001). The variance of the reference method ($\sigma_r^2$) can be obtained from the repeatability limit at 95% (r95), which is the value including, with a probability of 95%, the absolute difference between the results of an analysis performed under repeatability conditions (independent results obtained by the same method, on the same sample, in the same laboratory with the same operator and instrument). The $\sigma_r^2$ can be calculated by the following equation (Marte et al., 2009; Sileoni et al., 2010):

$$\sigma_r^2 = \left(\frac{r_{95}}{\sqrt{2}\, t}\right)^2$$

where t is the Student coefficient for α = 0.05 and ν degrees of freedom (ν = N − 1, where N is the number of laboratories participating in the collaborative trial performed to validate the method and calculate R95 and r95).
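The three comparisons of Section 2.5.4 reduce to a few lines of arithmetic, sketched below under stated assumptions. The function names, the sample values, the repeatability limit of 0.5 and the Student coefficient of 2.0 are hypothetical; only the R95 of 1.2 (fine extract) is taken from the tables. The precision check is shown as the usual chi-square statistic, (n − 1)·s_r²/σ_r², which is an assumption about how the test is set up; it would then be compared with a tabulated critical value, a step left out here.

```python
from math import sqrt
from statistics import fmean, variance


def fraction_in_agreement(true, pred, r95_reprod):
    """Share of reference values with y_pred - R95 <= y_true <= y_pred + R95."""
    hits = sum(1 for x, y in zip(true, pred) if abs(x - y) <= r95_reprod)
    return hits / len(true)


def accuracy_ok(repeat_preds, true_value, r95_reprod):
    """Check x_i - U_e <= mean(predictions) <= x_i + U_e, with U_e = R95 / sqrt(2)."""
    u_e = r95_reprod / sqrt(2)
    return abs(fmean(repeat_preds) - true_value) <= u_e


def chi_square_statistic(repeat_preds, r95_repeat, t_coeff):
    """Return ((n - 1) * s_r^2 / sigma_r^2, degrees of freedom).

    sigma_r^2 = (r95 / (sqrt(2) * t))^2 is the reference-method variance.
    """
    sigma2_ref = (r95_repeat / (sqrt(2) * t_coeff)) ** 2
    dof = len(repeat_preds) - 1
    return dof * variance(repeat_preds) / sigma2_ref, dof


# (a) Agreement: hypothetical true/predicted pairs, R95 = 1.2 (fine extract).
agreement = fraction_in_agreement(
    [81.0, 82.0, 80.5, 83.0], [81.5, 81.2, 80.4, 84.5], 1.2
)

# (b), (c): eleven hypothetical repeated predictions for one sample
# whose reference value is 81.0.
repeats = [81.0, 81.1, 81.2, 81.3, 81.4, 81.2, 81.2, 81.1, 81.3, 81.2, 81.2]
accurate = accuracy_ok(repeats, 81.0, 1.2)
chi2, dof = chi_square_statistic(repeats, r95_repeat=0.5, t_coeff=2.0)
```

With these made-up numbers, three of the four reference values fall inside the agreement interval, which would be short of the 95% threshold described in (a).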

3. Results and discussion

3.1. Results of calibration

Various new calibration models have been developed for the determination of other malt quality parameters from NIR spectra of malt flour. The RMSEC was calculated using 1–20 PLS1 latent variables. For each calibration model, the best number of factors was chosen as the lowest rank corresponding to the lowest RMSEC. The lowest RMSEC values obtained for each parameter are shown in Table 1, and they are always lower than the R95 of the reference methods, which is the limit for the absolute difference between the results of an analysis performed under reproducibility conditions. These results demonstrate the suitability of the developed calibration models for the evaluation of the quality of brewing raw materials, and their good agreement with the reference methods.

Another indication of the good predictive ability of the developed NIR calibration models is given by the comparison between the ranges of the true values (measured by the reference methods) and of the predicted values (obtained by the NIR methods), both shown in Table 1. The mean difference between these two sets of data, namely the bias, is also displayed in Table 1 for each parameter, and it is always close to 0.

The values predicted by means of the calibration models versus the reference values are shown in Fig. 1 with the corresponding fitted trend lines. The linear dependence between these two sets of data can be evaluated visually, and it indicates a good correlation between the NIR spectra, suitably pretreated and cut to the most suitable spectral range, and the analytical data. These results confirm the suitability of the NIR methods for the determination of the parameters of interest. The good linearity between the spectral and the analytical data displayed in Fig. 1 is confirmed by the correlation coefficients shown in Table 1. The best NIR calibration models were obtained for the determination of FAN, friability, pH and soluble nitrogen (Fig. 1b, c, f and g), with correlation coefficients of about 0.9. For these parameters, the fitted trend lines between the true and predicted data show equations similar to that of the bisector, with slope values close to 1 (Table 1).
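The statement that every RMSEC lies below the corresponding R95 can be checked directly from the values listed in Table 1. The short sketch below does only that; the dictionary name `table1` and the ratio used as a summary are illustrative choices, and the numbers are transcribed from the table as read here.

```python
# (R95, RMSEC) pairs as read from Table 1; units as given in the table.
table1 = {
    "Viscosity (mPa·s)": (0.14, 0.016),
    "FAN (mg/L)": (22, 2.42),
    "Friability (%)": (4.9, 1.60),
    "Fine extract (%dm)": (1.2, 0.54),
    "Fermentability (%)": (2.8, 0.51),
    "pH": (0.12, 0.03),
    "Soluble nitrogen (%dm)": (0.09, 0.020),
}

# Ratio RMSEC / R95: values below 1 mean the calibration error is
# smaller than the reproducibility limit of the reference method.
ratios = {name: rmsec / r95 for name, (r95, rmsec) in table1.items()}
all_below = all(ratio < 1 for ratio in ratios.values())
closest = max(ratios, key=ratios.get)
```

On these figures the ratio stays below 0.5 for every parameter, with fine extract being the closest to its R95.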

Fig. 1. Predicted versus true (reference) values in calibration. Panels: (a) viscosity (cP), 11 LVs; (b) FAN (mg/l), 13 LVs; (c) friability (%), 10 LVs; (d) fine extract (%dm), 10 LVs; (e) fermentability (%), 13 LVs; (f) pH, 14 LVs; (g) soluble nitrogen (%dm), 11 LVs. Legend: 2006, 2007, 2008, 2009. (FAN: free amino nitrogen; dm: dry matter; LVs: latent variables; solid line: fitted trend line).


3.2. Results of validation

Results of validation (a): the calibration models were validated by a leave-one-out cross-validation (CV) procedure. The results are shown in Table 2. As explained in the Materials and methods section, in this procedure one sample is removed from the calibration data set and treated as an unknown: the model is built on the remaining samples and applied to predict the parameter of interest for the removed sample. The procedure is repeated for all the samples of the calibration data set, and the square root of the mean of the squared errors gives the RMSECV value. Unlike in the evaluation of the calibration, the sample used for calculating the error is removed from the calibration data set, so it is unknown to the model. Consequently, the RMSECV values are higher than the corresponding RMSEC values. Nevertheless, the RMSECV values shown in Table 2 are very low compared with the range of measured values (Table 2), and they indicate good agreement between the calibration models and the reference methods. In fact, the absolute mean difference between true and predicted data, namely the mean error of prediction, is always lower than the R95 of the reference methods (Table 2). Consequently, for all the parameters considered, at least 95% of the true values fall into the range defined by the predicted values ± the R95 of the reference methods, meaning that the values predicted by the NIR models agree with those produced by the reference method as well as a second laboratory repeating the reference measurement would. The leave-one-out CV results confirm that the best NIR calibration models were obtained for FAN, friability, pH and soluble nitrogen, which show the highest correlation coefficients and the highest slope values (Table 2).
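The leave-one-out procedure and the comparison with the R95 limit described above can be sketched as follows. This is only an illustration: a univariate least-squares line is used as a stand-in for the PLS model, and the function names, the data and the limit of 0.5 are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares y = a + b * x (stand-in for the PLS model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def rmse(true, pred):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(true, pred)) / len(true))


def loo_predictions(x, y):
    """Predict each sample with a model built on all the other samples."""
    preds = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return preds


def fraction_within(true, pred, limit):
    """Share of samples whose true value lies within pred +/- limit."""
    return sum(1 for t, p in zip(true, pred) if abs(t - p) <= limit) / len(true)


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(x, y)
rmsec = rmse(y, [a + b * xi for xi in x])    # error on the calibration samples
rmsecv = rmse(y, loo_predictions(x, y))      # error on left-out samples
share_ok = fraction_within(y, loo_predictions(x, y), limit=0.5)  # 0.5: assumed R95
```

For a least-squares model the left-out error is never smaller than the calibration error, which mirrors the observation that the RMSECV values are higher than the corresponding RMSEC values.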
Results of validation (b): however, a single validation performance criterion is often not sufficient to judge a NIR calibration model properly, and it can lead to over- or under-estimation of the model quality. In fact, the error of prediction is a statistic, which can change according to the type or number of samples considered in the validation. For example, leave-one-out cross-validation considers all the samples in the data set for the determination of the error of prediction, but only one sample at a time is left out to calculate it, which has a low perturbing effect on the model. For this reason, this technique can lead to an overestimation of the predictive ability of the models, and it may not be enough to give a realistic idea of their performance in real applications. Consequently, we decided to compare the simple leave-one-sample-out cross-validation (CV loo) with a more challenging CV with leave-33%-samples-out. The excluded samples were selected at random. This kind of validation was chosen in order to verify how the RMSECV values change with the number of samples used to calculate them. Because the error changes according to the type of samples considered in the validation, the resampling procedure was repeated 200 times. For each of the 200 i-sets, containing 33% of the samples, the error of prediction was calculated. Thus, for each of the 20 latent variables, 200 different values of the error of prediction were obtained, depending on the validation samples drawn. It was therefore possible to calculate a mean and an uncertainty for the errors of prediction and for the correlation coefficients. This kind of validation was applied to all the developed calibration models, allowing the determination of the RMSECV values and correlation coefficients with their standard deviations, shown in Table 2. These errors of prediction are not statistically different from the ones obtained by leave-one-out cross-validation.
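The resampling scheme described above (random exclusion of 33% of the samples, repeated 200 times, followed by the calculation of a mean and a standard deviation of the error) can be sketched as follows. As an assumption made for brevity, a univariate least-squares line replaces the PLS model; the function names, the seed and the data are hypothetical.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares y = a + b * x (stand-in for the PLS model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def repeated_holdout_rmse(x, y, fraction=1 / 3, repeats=200, seed=0):
    """Mean and standard deviation of the error over random leave-fraction-out sets."""
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    n = len(x)
    n_out = max(1, round(n * fraction))
    rmses = []
    for _ in range(repeats):
        held_out = sorted(rng.sample(range(n), n_out))  # one random i-set
        train = [i for i in range(n) if i not in held_out]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        squared = [(a + b * x[i] - y[i]) ** 2 for i in held_out]
        rmses.append(math.sqrt(sum(squared) / n_out))
    return statistics.mean(rmses), statistics.stdev(rmses)


# Hypothetical data: a straight line with a small alternating disturbance.
x = [float(i) for i in range(12)]
y = [2.0 * xi + 1.0 + (0.1 if i % 2 else -0.1) for i, xi in enumerate(x)]
mean_rmsecv, sd_rmsecv = repeated_holdout_rmse(x, y)
worst_case = mean_rmsecv + sd_rmsecv  # mean error plus one standard deviation
```

The quantity `worst_case` corresponds to the highest error considered in the comparison with the reference methods, obtained by adding the standard deviation to the mean RMSECV.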
Since the two cross-validation schemes give statistically equivalent errors, leave-one-out cross-validation does not overestimate the predictive ability of these models, which can therefore be considered stable and reliable. The worst-predicted i-set is considered in the comparison between the ranges of the true and the predicted values. This i-set leads to the highest error of prediction, which can be obtained by adding the standard deviation to each mean RMSECV value. Even when this worst case is considered in the comparison with the reference methods, the two methods show good agreement: for all the parameters considered, at least 95% of the true values fall into the range defined by the predicted values ± the R95 of the reference methods.

Results of validation (c): a more extreme kind of validation, which can be called a "year-according" test-set validation, was then applied to the most important malt quality parameters: fine extract (% dry matter), fermentability (%), pH and soluble nitrogen (% dry matter). For these four calibration models it was possible to check whether the predictive performance is stable over time, because the samples used to set up the models were analysed from 2006 to 2009. Three data blocks were defined according to the year of collection and used for the validation: 2006, 2007 and 2008 + 2009. Two of the three data blocks were used to develop the calibration and the third one was used as the validation set. This "extrapolation" test should clarify the long-term effects on NIR calibrations and give a good indication of the model performance in coming years. In fact, if the error of prediction for each parameter calculated through this "year-according" test-set validation does not change significantly compared with the one obtained through the internal validations, we can state that the developed models are reliable also for future, unknown samples. A stable error of prediction means that the possible sources of variation due to genetic, environmental and technological factors have been included in the calibration data set and suitably handled by the model. Therefore, the predictive ability of the NIR method is not expected to change when samples collected in different years are analysed.
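The "year-according" test-set validation described above, in which two data blocks are used for calibration and the third one for validation, can be sketched as follows. A univariate least-squares line is assumed here as a stand-in for the PLS model, and the function names and data are hypothetical; the 2007 block is deliberately shifted to show how a year effect appears in the RMSEP.

```python
import math


def fit_line(x, y):
    """Ordinary least squares y = a + b * x (stand-in for the PLS model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def year_block_rmsep(x, y, block):
    """Hold out each data block in turn and return its RMSEP."""
    results = {}
    for held in sorted(set(block)):
        train = [i for i, label in enumerate(block) if label != held]
        test = [i for i, label in enumerate(block) if label == held]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        squared = [(a + b * x[i] - y[i]) ** 2 for i in test]
        results[held] = math.sqrt(sum(squared) / len(test))
    return results


# Hypothetical data: the 2007 samples carry a constant offset of +1.
labels = ["2006", "2007", "2008+2009"] * 3
x = [float(i) for i in range(1, 10)]
y = [2.0 * xi + (1.0 if label == "2007" else 0.0) for xi, label in zip(x, labels)]
rmsep_by_block = year_block_rmsep(x, y, labels)
```

In this constructed example the model built without the 2007 block cannot anticipate the offset, so the RMSEP for 2007 is the largest of the three; a stable RMSEP across blocks would instead indicate that the calibration set already covers the year-to-year variation.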
The root mean square errors of prediction (RMSEP, dashed lines) calculated for each parameter from 1 to 20 PLS latent variables are shown in Fig. 2, compared with the root mean square errors of leave-one-out cross-validation (RMSECV, solid lines). The results of the test-set validations are displayed in Table 3, where the errors of prediction are compared with the reproducibility limits (R95) of the reference methods. For pH and soluble nitrogen the differences between the errors of prediction obtained by internal and external validation are not significant, while for fermentability and fine extract the "year-according" test-set validations show a lower predictive power than the cross-validations, and the predictive performance is not the same on the three data sets.

Fig. 2. Root mean square error of cross-validation (RMSECV, solid line) and root mean square error of prediction (RMSEP, dashed line) versus latent variables (LVs); (TSV: test set validation). Panels: soluble nitrogen, TSV on (a) 2006, (b) 2007, (c) 2008 and 2009; pH, TSV on (d) 2006, (e) 2007, (f) 2008 and 2009; fermentability, TSV on (g) 2006, (h) 2007, (i) 2008 and 2009; extract, TSV on (l) 2006, (m) 2007, (n) 2008 and 2009.

In particular, the calibration model developed for soluble nitrogen can be considered very stable and reliable. Fig. 2a and b clearly show that the number of latent variables chosen in cross-validation also leads to the minimum error of prediction for the external validations on the 2006 and 2007 samples. Moreover, the errors of prediction obtained by internal and external validation are the same, demonstrating the stability of the NIR method (Table 3). The model seems slightly under-fitted for the 2008 + 2009 validation (Fig. 2c), for which 16 latent variables are needed for the best prediction. In any case, even the highest error of prediction, which is equal to 0.030%dm for 11 latent variables on the 2008 + 2009 samples, is one third of the reproducibility limit of the reference method (Table 3). The NIR method therefore agrees well with the reference method, and this good performance can reasonably be expected also for unknown samples analysed, for example, in 2010.

Similar considerations apply to the calibration model developed for pH. As shown in Fig. 2d, e and f, the choice of 14 latent variables is suitable for the samples of the 2008 + 2009 data set, while the model is over-fitted for the 2006 and 2007 data sets. Consequently, 14 latent variables on the 2007 samples lead to the highest error of prediction, which is equal to 0.06. This error is nevertheless the same as that obtained for the 2008 + 2009 test-set validation (Table 3). These results indicate that a similar error can be expected for future unknown samples, and we can conclude that the developed model agrees well with the reference method, which has a reproducibility limit of 0.12.

The situation is more complex for fermentability. Fig. 2g, h and i show that the choice of 13 latent variables is the best for the samples from 2006 and from 2008 + 2009, while the model is over-fitted for the 2007 data set, where the lowest error of prediction is obtained with 12 LVs. The RMSEP value obtained with 13 LVs on the 2007 samples is 0.86% (Table 3), which is appreciably higher than the values calculated by both cross-validations. We can conclude that the internal validations were too optimistic, and that it is realistic to assume this predictive power for future unknown samples. However, the reproducibility limit of the reference method is very high (2.8%), so the agreement with the reference method remains good even in the worst case.

Full agreement with the reference method is not reached for the fine extract calibration model. Based on the cross-validation results, 10 latent variables were chosen. As shown in Fig. 2l and n, this choice was right for the samples from 2006 and from 2008 + 2009, but in both data sets the errors of prediction obtained by test-set validation are appreciably higher than the ones calculated by cross-validation (Table 3). The model seems under-fitted for the 2007 validation (Fig. 2m), for which 12 latent variables are needed for the best prediction. Even in this best case of 12 LVs, however, the RMSEP is 0.70%dm (Table 3), which is higher than the RMSECV and does not lead to good agreement with the reference method. Considering the RMSEP value obtained with 10 LVs on the 2007 and 2008 + 2009 samples, which is 0.77%dm, around 10% of the predicted values fall outside

the range defined by the true values ± the reproducibility limit of the reference method. It is difficult to state how the model will predict the fine extract values of unknown samples analysed, for example, in 2010, but it is necessary to be conservative and to assume an error of prediction higher than the one calculated by cross-validation and an agreement with the reference method lower than 95%.

3.3. Accuracy and precision

The accuracy and precision of all the calibration models considered were also estimated and statistically compared with those of the reference methods, with the good results shown in Table 4. Three samples were chosen, one for each data set, and analysed 10 times by the NIR methods under repeatability conditions. The accuracy was tested by checking that the mean of the 10 measurements falls inside the range defined by the "true" value, chemically determined, ± the extended uncertainty of the method. This test gave good results for the three samples for all the parameters considered, as expected given the good agreement between the NIR calibration models and the reference methods. The precision was tested by means of a χ² test comparing the variance calculated from the 10 measurements with that of the reference method. Table 4 shows that FAN, friability, extract and fermentability give variance values statistically comparable with those of the reference methods: the two-tailed χ² test shows that the null hypothesis (the two variance values are statistically equal) cannot be rejected at the 1% significance level for any of the samples analysed. These results lead to the conclusion that the developed NIR calibration models are as precise as the standard A-EBC methods. For viscosity, pH and soluble nitrogen, Table 4 shows that the calculated variances are significantly lower than those of the reference methods. For these parameters a lower one-tailed χ² test was performed, and the results show that for all the samples analysed the calculated variance is significantly lower than that of the reference method. This outcome means that the NIR calibration models developed for these parameters are even more precise than the standard A-EBC methods for repeated measurements. This can be explained by considering that the NIR methods are expected to involve fewer experimental variables than the reference ones.

Table 4. Comparison between the variances of the reference methods (σ²) and those of the NIR models (sd²) by two-tailed Chi square test or lower one-tailed Chi square test.

| Parameter | r95 | T criterion |
|---|---|---|
| Viscosity (mPa·s) | 0.012 | T < 3.053 |
| pH | 0.01 | T < 3.053 |
| Soluble nitrogen (%dm) | 0.014 | T < 3.053 |
| FAN (mg/L) | 4.3 | 2.60 < T < 26.76 |
| Friability (%) | 0.7 | 2.60 < T < 26.76 |
| Fermentability (%) | 0.21 | 2.60 < T < 26.76 |
| Extract (%dm) | 0.20 | 2.60 < T < 26.76 |

| Sample | Parameter | Predicted variance (sd²) | T | Mean predicted ± sd | True value ± Ue |
|---|---|---|---|---|---|
| 1 | Viscosity (mPa·s) | 3.15e-05 | 0.032 | 1.533 ± 0.006 | 1.520 ± 0.099 |
| 1 | pH | 6.53e-04 | 0.975 | 5.95 ± 0.03 | 5.94 ± 0.08 |
| 1 | Soluble nitrogen (%dm) | 9.68e-05 | 0.086 | 0.659 ± 0.010 | 0.650 ± 0.064 |
| 1 | FAN (mg/L) | 9.5 | 26.51 | 120.2 ± 3.1 | 115.3 ± 15.6 |
| 1 | Friability (%) | 0.4 | 6.88 | 88.5 ± 0.7 | 87.6 ± 3.5 |
| 1 | Fermentability (%) | 0.06 | 3.43 | 80.29 ± 0.24 | 81.73 ± 1.98 |
| 1 | Extract (%dm) | 0.06 | 3.60 | 82.07 ± 0.24 | 81.56 ± 0.85 |
| 2 | Viscosity (mPa·s) | 4.12e-05 | 0.041 | 1.544 ± 0.007 | 1.550 ± 0.099 |
| 2 | pH | 7.27e-04 | 1.088 | 5.83 ± 0.03 | 5.84 ± 0.08 |
| 2 | Soluble nitrogen (%dm) | 4.61e-05 | 0.040 | 0.603 ± 0.007 | 0.618 ± 0.064 |
| 2 | FAN (mg/L) | 3.6 | 10.05 | 143.5 ± 1.9 | 147.4 ± 15.6 |
| 2 | Friability (%) | 0.6 | 10.29 | 92.9 ± 0.8 | 95 ± 3.5 |
| 2 | Fermentability (%) | 0.12 | 6.86 | 80.17 ± 0.35 | 81.01 ± 1.98 |
| 2 | Extract (%dm) | 0.08 | 4.80 | 81.33 ± 0.29 | 82.06 ± 0.85 |
| 3 | Viscosity (mPa·s) | 8.59e-05 | 0.083 | 1.517 ± 0.009 | 1.510 ± 0.099 |
| 3 | pH | 5.91e-04 | 0.888 | 5.80 ± 0.03 | 5.79 ± 0.08 |
| 3 | Soluble nitrogen (%dm) | 1.02e-04 | 0.086 | 0.647 ± 0.011 | 0.662 ± 0.064 |
| 3 | FAN (mg/L) | 5.7 | 14.98 | 141.1 ± 2.4 | 141.3 ± 15.6 |
| 3 | Friability (%) | 0.8 | 13.71 | 85.1 ± 0.9 | 87.5 ± 3.5 |
| 3 | Fermentability (%) | 0.08 | 4.57 | 79.28 ± 0.29 | 80.07 ± 1.98 |
| 3 | Extract (%dm) | 0.05 | 3.00 | 82.76 ± 0.23 | 83.08 ± 0.85 |

FAN: free amino nitrogen; dm: dry matter; sd: standard deviation; χ²: tabulated Chi square; α: significance level; ν: degrees of freedom; T: calculated Chi square (ν·sd²/σ²); Ue: extended uncertainty.

In conclusion, the developed NIR methods are extremely stable and reliable, because when the same measurement is repeated ten times they give a variance (and hence a repeatability) statistically equal to or even lower than that of the reference method. This is a very important result. NIR calibrations for the determination of malt quality parameters are already available in the literature. However, the new NIR calibration models developed in this work (see Section 3.1, Results of calibration) were exhaustively evaluated in order to offer a realistic idea of their predictive power on future unknown samples and of their applicability in the beer industry. Furthermore, the comparison with the repeatability values of the A-EBC reference methods confirms the reliability of the predictive power of the models and their suitability for the evaluation of the quality of brewing raw materials.

4. Conclusions

New reliable NIR calibration models, suitable for the measurement of important barley malt quality parameters, were developed.
The results obtained demonstrate that a single performance criterion is often not sufficient to judge a near-infrared calibration model properly and to avoid over- or under-estimation of the model quality. In fact, the error of prediction is a statistic, and the results demonstrated that it changes according to the number and kind of samples used to calculate it. It is therefore mandatory to check that the variations in the errors of prediction calculated through different validations are not statistically significant, in order to give a realistic estimate of this error on future, unknown samples. Leave-one-sample-out cross-validation can be too optimistic, because the exclusion of one sample has a low perturbing effect on the model. The second type of cross-validation examined, leave-33%-out, gives a more realistic idea of the predictive power of the model. From the comparison between the results of these two validations, it can be concluded that the developed NIR models are extremely stable and reliable, because the error of prediction does not change significantly according to the type and number of samples used to measure it. Concerning the long-term effects, it is difficult to state how the models will predict the considered parameters of unknown future samples, so it is necessary to be conservative and to consider as probable the highest errors of prediction obtained with the external validation. These error values are higher than the ones calculated by both cross-validations but lower than the reproducibility limits of the reference methods. These results lead to the conclusion that the estimates produced by the NIR calibration models developed for pH, fermentability and soluble nitrogen agree with those produced by the reference method as well as a second laboratory repeating the reference measurement would agree under reproducibility conditions. This good agreement with the reference method can reasonably be expected also on future, unknown samples for all the parameters considered apart from fine extract. Furthermore, the variance calculated from ten repeated measurements is statistically equal to or even lower than that of the reference method, demonstrating good agreement between the precision of the NIR methods and that of the A-EBC reference methods.
Concerning the transferability of the developed NIR calibration models, we can state that the calculated regression coefficients can be applied to any NIR spectrum collected on an unknown malt sample, suitably pretreated and cut to the spectral range of interest, in order to obtain a rapid and reliable estimate of the desired quality parameter. Finally, it can be concluded that the newly developed calibration models are specific analytical tools perfectly suitable for the evaluation of the quality of brewing raw materials.
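The application of stored regression coefficients to a new spectrum, as described above, can be sketched as follows. Everything in this example is hypothetical: the spectral window, the coefficients, the intercept and the choice of standard normal variate (SNV) as pretreatment are assumptions for illustration and are not the settings of the models developed in this work.

```python
import statistics


def snv(values):
    """Standard normal variate: centre and scale one spectrum (assumed pretreatment)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def predict_parameter(wavenumbers, absorbances, low, high, coefficients, intercept):
    """Cut the spectrum to [low, high], pretreat it and apply the regression vector."""
    window = [a for w, a in zip(wavenumbers, absorbances) if low <= w <= high]
    if len(window) != len(coefficients):
        raise ValueError("spectral window and coefficient vector differ in length")
    treated = snv(window)
    return intercept + sum(c * v for c, v in zip(coefficients, treated))


# Hypothetical spectrum and model, for illustration only.
wavenumbers = [4000.0, 5000.0, 6000.0, 7000.0, 8000.0]
absorbances = [1.0, 2.0, 3.0, 4.0, 5.0]
estimate = predict_parameter(wavenumbers, absorbances, 5000.0, 7000.0,
                             coefficients=[0.5, 0.0, 1.5], intercept=5.0)
```

The same pretreatment and spectral range used in calibration have to be applied to the unknown spectrum before the coefficients are used; the length check above is a simple guard against a mismatch.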

References

Andersen, C. M., & Bro, R. (2010). Variable selection in regression: A tutorial. Journal of Chemometrics, 24(11), 728–737.


Black, C., & Panozzo, J. F. (2001). Utilising near infrared spectroscopy for predicting malting quality in whole grain barley and whole grain malt. In Proceedings of the 10th Australian barley technical symposium. Canberra, ACT, Australia.

Conzen, J. P. (2006). Multivariate calibration: A practical guide for developing methods in the quantitative analytical chemistry. Ettlingen, Germany: Bruker Optik GmbH.

Cozzolino, D., Kwiatkowski, M. J., Waters, J., & Gishen, M. (2007). A feasibility study on the use of visible and short wavelengths in the near-infrared region for the non-destructive measurement of wine composition. Analytical and Bioanalytical Chemistry, 387, 2289–2295.

Cruciani, G., Baroni, T., Clementi, S., Costantino, F., Riganelli, D., & Skagerberg, B. (1989). Predictive ability of regressions models. Part I: Standard deviation of prediction errors (SDEP). Journal of Chemometrics, 3, 499–509.

Dahm, D. J., & Dahm, K. D. (2001). The physics of near infrared scattering. In P. Williams & K. Norris (Eds.), Near-infrared technology in the agricultural and food industries (pp. 1–18). St. Paul, Minnesota, USA: American Association of Cereal Chemists.

EBC (European Brewery Convention) (2006). Analytica-EBC. Nürnberg, Germany: Fachverlag Hans Carl.

Geladi, P., & Kowalski, B. R. (1986). Partial least-squares regression: A tutorial. Analytica Chimica Acta, 185, 1–17.

Henry, R. J. (1985). Evaluation of barley and malt quality using near-infrared reflectance techniques. Journal of the Institute of Brewing, 91, 393–396.

Kunze, W. (2004). Malt production. In W. Kunze (Ed.), Technology brewing and malting (pp. 97–194). Germany: VLB Berlin.

Marconi, O., Sileoni, V., Sensidoni, M., Amigo Rubio, J. M., Perretti, G., & Fantozzi, P. (2011). Influence of barley variety and Southern Europe environmental and agronomic factors for malting and brewing. Journal of the Science of Food and Agriculture, 91, 820–830.
Marte, L., Belloni, P., Genorini, E., Sileoni, V., Perretti, G., Montanari, L., et al. (2009). Near infrared reflectance models for the rapid prediction of quality of brewing raw materials. Journal of Agricultural and Food Chemistry, 57, 326–333.

Martens, H., & Stark, E. (1991). Extended multiplicative signal correction and spectral interference subtraction: New preprocessing methods for near infrared spectroscopy. Journal of Pharmaceutical and Biomedical Analysis, 9, 625–635.

MEBAK (2002). Sudwerkkontrolle, Würze, Bier, Biermischgetränke und AfG (4th ed.). Freising, Germany: Mitteleuropäische Brautechnische Analysenkommission.

Munck, L., Møller Jespersen, B., Rinnan, Å., Fast Seefeldt, H., Møller Engelsen, M., Nørgaard, L., et al. (2010). A physiochemical theory on the applicability of soft mathematical models – Experimentally interpreted. Journal of Chemometrics, 24, 481–495.

Naes, T., Isaksson, T., Fearn, T., & Davies, T. (2002). Univariate calibration and the need for multivariate methods. In T. Naes, T. Isaksson, T. Fearn, & T. Davies (Eds.), A user-friendly guide to multivariate calibration and classification (pp. 11–18). Chichester, UK: NIR Publications.

Nørgaard, L., Saudland, J., Wagner, J., Nielsen, J. P., Munck, L., & Engelsen, S. B. (2000). Interval partial least-squares regression (iPLS): A comparative chemometric study with an example from near-infrared spectroscopy. Applied Spectroscopy, 54(3), 413–419.

Osborne, B. G. (2008). NIR analysis of cereals products. In D. A. Burns & E. W. Ciurczak (Eds.), Handbook of near-infrared analysis (III ed., pp. 399–414). New York, USA: CRC Press, Taylor and Francis Group.

Rinnan, Å., van den Berg, F., & Engelsen, S. B. (2009). Review of the most common pre-processing techniques for near-infrared spectra. Trends in Analytical Chemistry, 28(10), 1201–1222.

Romía, M. B., & Bernàrdez, M. A. (2008). Multivariate calibration for quantitative analysis. In Da-Wen Sun (Ed.), Infrared spectroscopy for food quality analysis and control (pp. 51–79). Burlington, USA: Academic Press.

Siesler, H. W. (2008). Basic principles of near-infrared spectroscopy. In D. A. Burns & E. W. Ciurczak (Eds.), Handbook of near infrared analysis (III ed., pp. 7–20). New York, USA: CRC Press, Taylor and Francis Group.

Sileoni, V., Perretti, G., Marconi, O., & Fantozzi, P. (2009). Evaluation of malt quality by near-infrared spectroscopy in reflectance. In Proceedings of the 32nd Congress of the European Brewery Convention. Hamburg, Germany.

Sileoni, V., Perretti, G., Marte, L., Marconi, O., & Fantozzi, P. (2010). Near-infrared spectroscopy for proficient quality evaluation of malt and maize in beer industry. Journal of the Institute of Brewing, 116(2), 134–139.

Sileoni, V., Marconi, O., Perretti, G., Buiatti, S., & Fantozzi, P. (2010). Long-term NIR calibration for malt extract. In 9th edition of the international symposium "Trends in Brewing". Ghent, Belgium.

Sileoni, V., van den Berg, F., Marconi, O., Perretti, G., & Fantozzi, P. (2011). Validation strategies for long term effects in NIR calibration models. Journal of Agricultural and Food Chemistry, 59, 1541–1547.

The American Society for Testing and Materials (ASTM) Practice E1655–00 (2001). In ASTM Annual Book of Standards, 03.06 (pp. 573–600). West Conshohocken, PA 19428–2959, USA.

The Barth Report hops 2010-2011 (2010–2011). World Beer Production 2009/2010. In The Barth Report hops 2010-2011 (pp. 7–8). Nuremberg, Germany: Barth-Haas Group, Joh. Barth & Sohn GmbH & Co KG.

Williams, P. C. (2001). Implementation of near infrared technology. In P. Williams & K. Norris (Eds.), Near infrared technology in the agricultural and food industries (pp. 145–170). St. Paul, USA: American Association of Cereal Chemists.

Woodcock, T., Downey, G., & O'Donnell, C. P. (2008). Review: Better quality food and beverages: The role of near infrared spectroscopy. Journal of Near Infrared Spectroscopy, 16(1), 1–29.