A comparison of methods for linear prediction of apple flavour from gas chromatographic measurements


Food Quality and Preference 4 (1993) 215-222

Per Brockhoff (a), Ib Skovgaard (b), Leif Poll (c) & Keld Hansen (c)

(a) Centre of Food Research, (b) Department of Mathematics and Physics, (c) Department of Dairy and Food Science, Royal Veterinary and Agricultural University, Thorvaldsensvej 40, DK-1871 Frederiksberg C, Denmark

(Received 13 April 1993; accepted 13 September 1993)

ABSTRACT

A comparative study of linear methods for prediction of sensory profiles from gas chromatography (GC) measurements was performed. The data used came from an experiment on the effect of storing apples at various oxygen concentrations. Partial least-squares regression and continuum regression showed the best performance, measured by a two-step cross-validation principle. The traditional prediction error sum of squares (PRESS) overestimated the predictive ability of a multiple linear regression approach. The quality of the predictions of sensory properties from GC analyses was measured in terms of a ‘panel size equivalent’. Thus, the predictions obtained in the present study were as accurate as predictions from an assessor panel consisting of 4-9 assessors, depending on the sensory property in question.

Keywords: Continuum regression; cross validation; gas chromatography; linear prediction methods; multiple linear regression; partial least squares; prediction ability; principal component regression; ridge regression; sensory analysis.

© 1994 Elsevier Science Limited 0950-3293/94/$07.00

INTRODUCTION

The idea of calibrating instrumental measurements with sensory information is of major interest for food research and the food industry. This paper will compare and discuss linear calibration methods based on data from sensory and gas chromatography (GC) analyses of ‘Jonagold’ apples stored at various O2 concentrations.

In building up sensory predictive models based on GC measurements of aroma components we face the classical problem of multicollinearity, as these components are closely related. A conventional multiple linear regression approach is not advisable, owing to extreme uncertainty about the parameter estimates. Researchers have turned to the biased regression techniques, of which the most common are principal component regression, ridge regression and partial least-squares regression (see Hoerl & Kennard, 1970; Næs & Martens, 1989). More recently, the approach of continuum regression was proposed by Stone and Brooks (1990). The continuum regression embraces principal component regression, partial least-squares regression and ordinary least-squares multiple linear regression. This is achieved by the introduction of a parameter, varying in a continuum, in a generalized factor selection criterion. The possible values of this parameter represent a continuum of possible regression models with principal component regression and multiple linear regression at the two extremes and partial least-squares regression as an intermediate point. Sundberg (1993) revealed a close connection between first-stage continuum regression and ridge regression, which we have used in this paper to implement a version of continuum regression as described in the Appendix. The continuum regression and its relationships to the well-known methods are briefly outlined in the section on statistical methods.

We also include two variants of a multiple linear regression approach where model selection is based on the PRESS statistic from cross-validation. This approach was taken by Kowalski (1990), and showed multiple linear regression to be superior to any of the biased techniques for five data sets.

The main objectives of this paper are to compare the predictive abilities of these methods and to find a suitable prediction model for the sensory properties of ‘Jonagold’ apples. We used two-step cross-validation to evaluate the predictive ability of the various models.

The idea of this is to partition the data into, say, three groups, and then in turn treat two of the groups as ‘training sets’ and one as the ‘test set’. This is referred to as the second cross-validation step. The first step refers to the use of cross-validation for model/factor selection in each training set. In this way, we avoid the performance criterion for the predictive ability being deflated by a selection based on the same criterion.

Finally, we present a new way of reporting this predictive ability of a model based on a given data set from a designed experiment, by reporting the number of assessors needed in a sensory panel to predict the sensory attribute in question equally well.

As the purpose of this paper is to investigate statistical aspects of the prediction problem, the description of experimental materials and methods will be brief and partly covered by references to other publications. For the same reason, the statistical method is presented in a section by itself.

EXPERIMENTAL MATERIALS AND METHODS

Storage

Apples of the variety ‘Jonagold’ were harvested on 11 November 1987. Immediately after harvest, the apples were transferred to four containers, where they were stored at 2°C in 1% O2-99% N2, 2% O2-98% N2, 4% O2-96% N2 and 21% O2-79% N2, respectively. Below, only oxygen concentrations are used as a shorthand notation. Two storage periods, of 109 days and 190 days, were considered. Further details on the apples and storage conditions have been given by Hansen et al. (1992). After storage, the apples were removed from the containers and allowed to ripen in normal atmosphere at 20°C for up to 40 days (post-storage period).

Gas chromatography (GC)

Analysis of the volatiles produced by the apples was performed 10 or 11 times during the post-storage period by dynamic headspace sampling, as described in detail by Poll and Hansen (1990) and Hansen et al. (1992). The volatiles were collected by a Poropac trap, eluted with ether and injected into the gas chromatograph. For computational reasons, only GC values for the 15 esters producing more than about 1% of the total volatile production were chosen for further statistical analysis.

However, as sensory evaluations and GC analyses were not made on the same days, linear interpolations were performed on the GC data to obtain GC values corresponding to the days of sensory evaluation. This seemed reasonable, as the time profiles of the GC results were fairly smooth. As sensory evaluations were performed six times during the post-storage period, the GC data set consists of 48 samples (six times, four storage conditions and two storage periods).

Sensory analysis

The apples were evaluated by a trained sensory panel of 8-10 assessors six times during the post-storage period. Approximately six apples from each treatment were placed in 3-litre glass jars with lids. The apples were covered to prevent assessors from gaining any visual impression of the fruits, and were evaluated by profile analysis, where the assessors were asked to evaluate the smell produced by the apples. The assessors were instructed to give points on a 0-5-point scale for the following properties: intensity, green, banana, pineapple, anise, musty and preference. These properties were chosen by the panel on the basis of discussions during the training sessions. For convenience we refer to ‘preference’ as a sensory attribute as well, although it is not a descriptive property.

STATISTICAL METHODS

Analysis of variance

For each of the two storage periods (109 days and 190 days) the sensory data were initially analysed by analysis of variance of the original assessor scores, according to the model

E(score) = Assessor + Day + Treatment + Day × Treatment    (1)

where we have used a notation indicating the main effects and interaction included. In this model ‘Treatment’ refers to the storage conditions and ‘Day’ refers to the post-storage time. To assess the overall treatment effect on the sensory attribute in question we then tested the reduction to the model

E(score) = Assessor + Day    (2)

by an F-test. Only attributes exhibiting clearly significant treatment effects were considered in the further analyses.

Two-step cross-validation

The 48 samples in the X matrix were divided into three subsets of 16 observations. The three subsets were chosen systematically to be similar with respect to short- or long-term storage, average post-storage and average oxygen treatments. Each subset was then in turn regarded as a test set, and the remaining two subsets as a training set. We denote the three 32 × 15 dimensional training set matrices by $X_1$, $X_2$ and $X_3$, respectively.
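The partition into three subsets and the two cross-validation steps can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the assignment `i % 3` is a placeholder for the balanced systematic partition used in the study, and ridge regression with a small grid of ridge parameters stands in for whichever prediction method is being tuned. The function names are hypothetical. The two returned summaries correspond to the quantities the paper goes on to call MRPRESS1 (inner, leave-one-out) and MRPRESS2 (outer, test set).

```python
import numpy as np

def ridge_fit(X, y, delta):
    """Centre X and y on the fitting data, then solve (X'X + delta*I) b = X'y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    b = np.linalg.solve(Xc.T @ Xc + delta * np.eye(X.shape[1]), Xc.T @ yc)
    return lambda X_new: y_mean + (X_new - x_mean) @ b

def loo_press(X, y, delta):
    """First step: leave-one-out PRESS within one training set."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        predict = ridge_fit(X[keep], y[keep], delta)
        press += (y[i] - predict(X[i:i + 1])[0]) ** 2
    return press

def two_step_cv(X, y, deltas, n_groups=3):
    """Second step: each group is the test set once; tuning uses the training set only."""
    groups = np.arange(len(y)) % n_groups      # placeholder systematic partition
    inner, outer = [], []
    for k in range(n_groups):
        test = groups == k
        train = ~test
        press1 = [loo_press(X[train], y[train], d) for d in deltas]
        best = int(np.argmin(press1))          # parameter chosen by inner cross-validation
        predict = ridge_fit(X[train], y[train], deltas[best])
        press2 = np.sum((y[test] - predict(X[test])) ** 2)
        inner.append(np.sqrt(press1[best] / train.sum()))
        outer.append(np.sqrt(press2 / test.sum()))
    return float(np.mean(inner)), float(np.mean(outer))

# Synthetic stand-in for the 48 x 15 GC matrix and one sensory attribute.
rng = np.random.default_rng(0)
X = rng.normal(size=(48, 15))
y = X[:, 0] + rng.normal(scale=0.5, size=48)
mrpress1, mrpress2 = two_step_cv(X, y, deltas=[0.1, 1.0, 10.0, 100.0])
```

Because the tuning parameter is selected without ever seeing the test group, the outer summary is not flattered by the selection step, which is the point of the two-step design.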

For each of the three training sets, full cross-validation was performed to ‘optimize’ each of the prediction methods. This means that each of the 32 samples was in turn left out of the estimation and then predicted. The ‘optimization’ is to be understood in a broad sense including model selection, variable selection and estimation, the details depending on the method being investigated. For example, this type of cross-validation was used to include or exclude variables in multiple linear regression, and to choose the number of factors in principal component regression and partial least-squares regression. We denote the mean score for the sensory attribute in question by $y_i$, where $i$ is the observation number for the data set. Each method leads to a prediction function and consequently to a ‘predicted value’ $\hat{y}_i$. For each training set, indexed by k = 1, 2, 3, the PRESS statistic is

$$\mathrm{PRESS1}_k = \sum_{i=1}^{32} (y_i - \hat{y}_i)^2$$

and we define the mean root prediction error sum of squares for the training sets as

$$\mathrm{MRPRESS1} = \frac{1}{3} \sum_{k=1}^{3} \left( \tfrac{1}{32}\, \mathrm{PRESS1}_k \right)^{1/2}$$

In a comparison of the methods we want to use error in predicting the test sets as a measure of predictive ability. To obtain this prediction error, the model in question was fitted to all 32 observations in the training set and used for prediction in the test set. Thus, for each test set we obtained a prediction error sum of squares

$$\mathrm{PRESS2}_k = \sum_{i=1}^{16} (y_i - \hat{y}_i)^2$$

leading to the mean root prediction error sum of squares

$$\mathrm{MRPRESS2} = \frac{1}{3} \sum_{k=1}^{3} \left( \tfrac{1}{16}\, \mathrm{PRESS2}_k \right)^{1/2}$$

MRPRESS1 gives the cross-validatory index traditionally reported in analyses, averaged over the three partitions. Instead of this, we used MRPRESS2. Apart from giving a more realistic estimate of prediction error, this gave a comparison of the various models for equal conditions. In particular, in a comparison between multiple linear regression (MLR) methods and principal component regression (PCR) and partial least-squares regression (PLS) methods, MRPRESS1 can be expected to favour MLR methods, as both variable selection and number of variables are determined by cross-validation. For PCR and PLS only the number of factors is found by cross-validation.

Linear prediction methods

Assessor mean scores of the sensory attributes were related to GC measurements of flavour volatiles by several methods; these included MLR, PCR, PLS, ridge regression (RR) and continuum regression (CR). Each of these methods obtains a linear function of the GC measurements, predicting the sensory attribute. The 32 × 15 GC data matrices, the $X_k$ matrices, were in all applications centred over samples; i.e. the mean of the 32 measurements was subtracted for each of the esters. Models with standardized as well as original data were used, except in MLR for which scaling makes no difference. For the standardized versions two slightly different standardizations were used. In the standardized PCR and PLS the $X_k$ matrices (k = 1, 2, 3) were columnwise standardized by column standard deviation; i.e. for each $X_k$ matrix the original (centred) $x_{ij}$ element was replaced by the standardized element

$$z_{ij} = \frac{x_{ij}}{s_j}$$

where $s_j$ is the standard deviation of column $j$. For RR and CR the column root sums of squares were used to convert $X_k$ to correlation form; i.e. the standardized element was

$$z_{ij} = \frac{x_{ij}}{SS_j^{1/2}}, \qquad \text{where } SS_j = \sum_{i=1}^{32} x_{ij}^2$$

We used the latter standardization for RR and CR, in accordance with standard ridge regression (see Hoerl & Kennard, 1970).

For MLR two approaches were applied. By MLR1 we denote a forward selection procedure which at each step includes that variable among the remaining variables giving the smallest cross-validation index (PRESS). This was continued until the PRESS statistic increased. By MLR2 we denote a corresponding backward selection procedure starting with all variables included; this procedure was continued until the last variable was excluded, and then the model with smallest PRESS statistic was chosen.

The prediction of the sensory score by its average, before centring, for the 32 cases in the training set is included for comparison, and is referred to below as the ‘constant’ prediction method. As this method does not use the GC measurements at all, it provides a reference from which we may see how much is gained by use of the GC data.

Continuum regression

We let $X$ denote a column mean centred $n \times p$ matrix of GC data and $y$ an $n$-vector of assessor mean scores for one sensory attribute. The ordinary least-squares predictor (MLR) and any component of PCR and PLS is of the form $c'x$, where $c$ is a $p$ vector of parameter estimates and $x$ is a $p$ vector of GC measurements. For one stage of the CR, the vector $c$ is chosen to maximize the criterion

$$T_\alpha = (c's)^2 \, (c'Sc)^{(2\alpha - 1)/(1 - \alpha)} \qquad (3)$$

with $s = X'y$, $S = X'X$ and $0 \le \alpha \le 1$. Here $\alpha$ is the continuum parameter, which can be chosen by cross-validation. Stone and Brooks (1990) showed that $\alpha = 0$ corresponds to MLR, $\alpha = \tfrac{1}{2}$ corresponds to PLS and $\alpha = 1$ corresponds to PCR. Moreover, Stone and Brooks (1990) pointed out that the predictors for these specializations may be referred to as ‘canonical correlation’, ‘canonical covariance’ and ‘canonical variance’, respectively. Thus CR chooses the ‘best’ predictor among a huge set of possibilities, including the standard methods, and a pre-decision of which to use is not necessary. If more than one stage of CR is applied, every $c$ is chosen to maximize the criterion (3) under the constraints of being orthogonal to all of the previously chosen $c$s. However, we consider first-stage CR only. Sundberg (1993) argued that first-stage CR is no worse than the general approach of RR. In the Appendix, the theoretical result by Sundberg relating first-stage CR to RR is given, as this is applied directly in this paper to find the optimal $\alpha$ value and corresponding predictor $c$. It should be noted that this implementation of CR only involves a search in the continuum ranging from MLR ($\alpha = 0$) to PLS ($\alpha = \tfrac{1}{2}$), and the part ranging from PLS to PCR ($\alpha = 1$) is omitted.
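The two ends of the continuum searched here can be checked numerically against criterion (3). The sketch below is an illustration only: it uses synthetic data of arbitrary size, restricts $c$ to unit length (a normalisation assumed here because the criterion is scale dependent for $\alpha > 0$), and the function names are hypothetical. It evaluates $T_\alpha$ at the least-squares direction for $\alpha = 0$ and at the direction of $s = X'y$ (the first PLS weight vector) for $\alpha = \tfrac{1}{2}$, and compares each with many random unit vectors.

```python
import numpy as np

def cr_criterion(c, s, S, alpha):
    """Criterion (3): (c's)^2 * (c'Sc)^((2*alpha - 1)/(1 - alpha)), for 0 <= alpha < 1."""
    exponent = (2.0 * alpha - 1.0) / (1.0 - alpha)
    return (c @ s) ** 2 * (c @ S @ c) ** exponent

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

rng = np.random.default_rng(1)
n, p = 32, 5                                # illustrative sizes, not the paper's 32 x 15
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)                         # column mean centring
y = X @ rng.normal(size=p) + rng.normal(size=n)
y -= y.mean()
s, S = X.T @ y, X.T @ X

c_mlr = unit(np.linalg.solve(S, s))         # direction of the least-squares coefficients
c_pls = unit(s)                             # direction of the first PLS weight vector

random_dirs = [unit(rng.normal(size=p)) for _ in range(2000)]
best_random_mlr = max(cr_criterion(c, s, S, 0.0) for c in random_dirs)
best_random_pls = max(cr_criterion(c, s, S, 0.5) for c in random_dirs)

# Neither candidate should be beaten by a random direction at its own alpha.
mlr_ok = cr_criterion(c_mlr, s, S, 0.0) >= best_random_mlr
pls_ok = cr_criterion(c_pls, s, S, 0.5) >= best_random_pls
```

At $\alpha = 0$ the criterion is the squared correlation-type ratio $(c's)^2/(c'Sc)$, and at $\alpha = \tfrac{1}{2}$ it reduces to the squared covariance term $(c's)^2$, which matches the ‘canonical correlation’ and ‘canonical covariance’ labels quoted above.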

Panel size interpretation of prediction ability

When attempting to predict the average score for a certain sensory attribute, say banana flavour, it makes sense to ask how precisely this value is measured. Even a perfect prediction function cannot hit the target without error, owing to the assessor score variability. The variance of the mean score caused by this variability can be estimated from the data by use of model (1). Let us consider the given experiment. For each storage time and each combination of ripening times and oxygen concentrations, $n$ assessors judged the banana flavour, $n$ being 8 or 10 depending on storage period. The mean score $\bar{y}$ for a given treatment and ripening time then has the variance

$$\mathrm{Var}(\bar{y}) = \frac{\sigma^2}{n}$$

where $\sigma^2$ is the individual assessor score variance estimated as the residual variance in the analysis of variance based on model (1). This would be the resulting prediction error for a ‘perfect’ method exactly hitting the mean $\mu$. More realistically, an unbiased predictor $\hat{y}_{pred}$ with a non-zero variance would result in the mean squared prediction error

$$\mathrm{Var}(\hat{y}_{pred} - \bar{y}) = \frac{\sigma^2}{n} + \mathrm{Var}(\hat{y}_{pred})$$

Let us suppose now that the mean score from the sensory evaluation were to be predicted by another panel with $m$ members assessing apples from the same combination of treatment and storage time. The resulting mean squared prediction error would be

$$\mathrm{Var}(\bar{y}_m - \bar{y}) = \left( \frac{1}{n} + \frac{1}{m} \right) \sigma^2 \qquad (4)$$

which may be used as a measuring stick for the prediction errors based on the GC measurements. As $n$ is the number of assessors in the investigation, and $\sigma^2$ is estimated from the analysis of variance, $m$ may be determined so that the expression above matches the mean squared prediction error achieved in each case. The result is the number, $m$, of panel members required to obtain a prediction of the same quality as that based on the GC measurements.

RESULTS

Analyses of variance

The F and P values for tests of model (2) against model (1) are listed in Table 1. Flavour intensity, banana flavour and preference are seen to be the only attributes showing clear significances for both 109 and 190 days of storage, and consequently only prediction of these from the GC analyses was attempted.

TABLE 1. F Statistics and P Values for the Significance of Storage Treatment Effects Based on Analyses of Variance of Raw Assessor Scores for 109 days and 190 days, respectively

Variable       109 days              190 days
               F        P            F        P
Intensity      11.34    <0.0001      10.52    <0.0001
Green           0.68    0.83          0.83    0.66
Banana         10.36    <0.0001      11.58    <0.0001
Pineapple       1.11    0.35          2.99    <0.0001
Anise           0.91    0.57          1.34    0.17
Musty           1.36    0.16          1.77    0.033
Preference      5.90    <0.0001       7.48    <0.0001

Prediction results for intensity, banana and preference

TABLE 2. Effect of O2 Concentration on 15 Flavour Volatiles

                                      Production of volatiles    Relative distribution in
                                      (µg/(kg l))                headspace (% of total)
No.   Compound                        1% O2       21% O2         1% O2       21% O2
1     Propyl acetate                  0.02        0.74            0.75        4.62
2     2-Methylpropyl acetate          0.05        0.19            1.38        1.04
3     Propyl propanoate               0.01        0.20            0.18        1.31
4     Butyl acetate                   0.21        5.37            5.74       24.09
5     2/3-Methylbutyl acetate         1.44        2.57           38.30       14.92
6     Butyl propanoate                0.03        0.85            0.88        4.19
7     Pentyl acetate                  0.02        0.21            0.56        1.13
8     Butyl butanoate                 0.03        0.65            0.93        2.76
9     Butyl 2/3-methylbutanoate       0.05        0.30            1.42        1.56
10    Hexyl acetate                   0.16        3.00            4.27       14.40
11    Propyl hexanoate                0.01        0.17            0.47        1.05
12    Hexyl propanoate                0.01        0.42            0.32        2.18
13    Butyl hexanoate                 0.06        0.99            2.03        4.56
14    Hexyl 2/3-methylbutanoate       0.07        0.78            2.35        3.65
15    Hexyl hexanoate                 0.03        0.59            1.27        2.53
      Total                           3.50       19.85           60.85       83.99

Production and relative distribution in headspace are values after 190 days in 1% O2-99% N2 or 21% O2-79% N2 (ambient atmosphere) and post-storage ripening for 7 days in ambient atmosphere at 20°C.

The 15 esters chosen for predictions are listed in Table 2, together with their concentrations and relative headspace distributions at two particular combinations of storage time and treatment. This shows the relative importance in magnitude of the chosen volatiles, and also that a considerable percentage of the concentration of

flavour volatiles is not taken into account in the predictions. The degree of multicollinearity for the 32 × 15 training data matrices was measured by the condition numbers of the corresponding 15 × 15 matrices of cross products, which were approximately 996 000, 663 000 and 1 048 000, respectively, for the three training sets. These numbers indicate a high degree of multicollinearity.

The prediction results are presented in Tables 3-5. In terms of the MRPRESS2 results, the unstandardized PLS and CR performed best, with a slight superiority of the PLS. Also, the number of factors was smallest for the PLS models. This confirms previous PLS experience. In fact, only one factor is needed for the optimal unstandardized PLS. Thus, this single-factor PLS is a special case of the first-stage CR, which is reflected by the smaller MRPRESS1 values in Tables 4 and 5 for CR as compared with PLS. Nevertheless, we note that for two out of three test sets the prediction errors (MRPRESS2) for PLS were slightly smaller than for CR, reflecting an increase in prediction error as a result of the estimation of an extra parameter. An interesting fact is that the unstandardized models in general perform better than the standardized ones.

The MRPRESS2 values can be compared with the value corresponding to the ‘constant’ prediction, which does not use the GC measurements at all. This shows that even by optimal choice of prediction model, the reduction in standard prediction error is only between 35 and 40% compared with the most naive choice of prediction, an illustration of the high degree of uncertainty in sensory data of this kind.

Also striking is the fact that if prediction ability were to be judged from the MRPRESS1 values, the MLR2 approach would clearly be the winner, and MLR1 would be comparable with PLS and CR. However, in MRPRESS2 the MLR1 is clearly worse than PLS and CR, and MLR2 is disastrous. For the preference attribute it even performs much worse than using the training set preference average as test set prediction.

TABLE 3. Prediction Results for Flavour Intensity

Model      Standardization   MRPRESS1   Optimization parameter   MRPRESS2
PCR        1                 0.476       1                       0.464
PCR        1/s               0.490       2                       0.483
PLS        1                 0.474       1                       0.458
PLS        1/s               0.493       1.3                     0.481
MLR1       1                 0.463       2.7                     0.466
MLR2       1                 0.435       6.3                     0.640
RR         1                 0.489      17.65                    0.460
RR         1/(SS)^(1/2)      0.480       1.03                    0.469
CR         1                 0.489       0.62                    0.457
CR         1/(SS)^(1/2)      0.474       0.25                    0.472
Constant   1                 0.716       0                       0.702

Optimization parameter, which is averaged over the three training sets, refers to the number of factors for PCR and PLS, the number of variables for MLR1 and MLR2, the ridge parameter for RR and the continuum parameter for CR.

TABLE 4. Prediction Results for Banana Flavour

Model      Standardization   MRPRESS1   Optimization parameter   MRPRESS2
PCR        1                 0.514       1.3                     0.538
PCR        1/s               0.561       1.3                     0.577
PLS        1                 0.515       1                       0.519
PLS        1/s               0.561       1.3                     0.578
MLR1       1                 0.524       3                       0.578
MLR2       1                 0.493       5.7                     0.743
RR         1                 0.526      18.56                    0.535
RR         1/(SS)^(1/2)      0.558       0.89                    0.564
CR         1                 0.513       0.75                    0.525
CR         1/(SS)^(1/2)      0.555       0.40                    0.565
Constant   1                 0.858       0                       0.855

Optimization parameter, which is averaged over the three training sets, refers to the number of factors for PCR and PLS, the number of variables for MLR1 and MLR2, the ridge parameter for RR and the continuum parameter for CR.

TABLE 5. Prediction Results for Preference

Model      Standardization   MRPRESS1   Optimization parameter   MRPRESS2
PCR        1                 0.462       1.3                     0.466
PCR        1/s               0.494       1.3                     0.479
PLS        1                 0.466       1                       0.451
PLS        1/s               0.495       1                       0.475
MLR1       1                 0.469       2.3                     0.505
MLR2       1                 0.417       9.3                     0.938
RR         1                 0.474      19.67                    0.469
RR         1/(SS)^(1/2)      0.495       1.17                    0.489
CR         1                 0.465       0.77                    0.456
CR         1/(SS)^(1/2)      0.490       0.45                    0.485
Constant   1                 0.751       0                       0.735

Optimization parameter, which is averaged over the three training sets, refers to the number of factors for PCR and PLS, the number of variables for MLR1 and MLR2, the ridge parameter for RR and the continuum parameter for CR.

PLS analysis of banana flavour

In this section, a brief report of our sensory-instrumental study is presented. We base this presentation on ‘full’ cross-validation on all 48 observations. Figures 1-4 contain the results of an unstandardized PLS analysis of banana flavour. Variables 4, 5 and 10 have large loadings, consistent with the fact that these are the major measured components in terms of concentration (see Table 2). The scores from the first PLS factor seem in particular to separate high and low oxygen treatments. PLS analyses of flavour intensity and preference give almost identical pictures. Figure 4 shows the prediction function together with the 48 observations.

FIG. 1. Cross-validated residual variance from the set of 48 observations from an unstandardized PLS analysis of banana flavour and the 15 aroma components. [x-axis: number of PLS factors included, 0-15]

Panel size interpretation of prediction ability

By substitution of the MRPRESS2 for the unweighted (unstandardized) PLS for, say, banana flavour (see Table 4), on the left-hand side of (4), we obtain

$$\frac{1}{m} = \frac{0.519^2}{\hat{\sigma}^2} - \frac{1}{n}$$

and the corresponding panel size $m$ can be calculated. The results are summarized in Table 6 for the sensory attributes in question and the PLS predictions. We see that the GC measurements lead to predictions comparable with the results from a sensory panel of between four and nine assessors.
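The substitution into (4) can be worked through numerically. The sketch below is an illustration only: the function name is hypothetical, the MRPRESS2 values are those reported for unstandardized PLS in Tables 3-5, and the residual variances and panel sizes $n$ are those reported in Table 6. It solves $\mathrm{MRPRESS2}^2 = (1/n + 1/m)\,\hat{\sigma}^2$ for $m$.

```python
def panel_size_equivalent(mrpress2, sigma2, n):
    """Solve mrpress2**2 = (1/n + 1/m) * sigma2 for the equivalent panel size m."""
    inv_m = mrpress2 ** 2 / sigma2 - 1.0 / n
    if inv_m <= 0:
        # Prediction error no larger than the panel's own sigma2/n: no finite m matches it.
        return float("inf")
    return 1.0 / inv_m

# (MRPRESS2 for unstandardized PLS, residual variance, number of assessors n)
cases = {
    ("Intensity", 109): (0.458, 0.66, 8),
    ("Banana", 109): (0.519, 0.79, 8),
    ("Preference", 109): (0.451, 0.84, 8),
    ("Intensity", 190): (0.458, 0.63, 10),
    ("Banana", 190): (0.519, 0.77, 10),
    ("Preference", 190): (0.451, 0.81, 10),
}
panel_sizes = {key: panel_size_equivalent(*values) for key, values in cases.items()}

for (attribute, days), m in panel_sizes.items():
    print(f"{attribute:<10} {days} days: m = {m:.1f}")
```

For banana flavour after 109 days this gives $1/m = 0.519^2/0.79 - 1/8 \approx 0.216$, i.e. $m \approx 4.6$, in line with the panel sizes of roughly four to nine assessors quoted above.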

FIG. 2. Loadings of the 15 aroma components on the first PLS factor from an unstandardized analysis predicting banana flavour. [x-axis: aroma component, 1-15]

FIG. 3. Scores relative to the first PLS factor from an unstandardized PLS analysis of banana flavour and the 15 aroma components. The 48 observations are ordered according to the four storage conditions. [panels: 1%, 2%, 4% and 21% oxygen; x-axis: observations]

FIG. 4. Fitted linear relationship between factor 1 and banana flavour from an unstandardized PLS analysis of banana flavour and the 15 aroma components; 72% of the variation in aroma component space is explained by factor 1. [x-axis: PLS factor 1]

TABLE 6. Panel Size Interpretation of PLS Predictions

Variable       109 days (n = 8)              190 days (n = 10)
               $\hat{\sigma}^2$   Panel size    $\hat{\sigma}^2$   Panel size
Intensity      0.66       5.2           0.63       4.3
Banana         0.79       4.6           0.77       4.0
Preference     0.84       8.5           0.81       6.6

DISCUSSION

The important sensory attributes for ‘Jonagold’ apples in the present study were flavour intensity, banana flavour and preference. The PLS and CR methods were shown to provide the best predictions of the three attributes in question, based on GC measurements of the 15 aroma volatiles. That PLS performs slightly better than (or similar to) PCR with a small number of factors is in good agreement with previous experience (see Næs & Martens, 1985; Næs et al., 1986; Kowalski, 1990). The CR is new in this context. It is important to note that it was used here only in a very restricted version. Only one factor was used and only the part of the continuum ranging from MLR ($\alpha = 0$) to PLS ($\alpha = \tfrac{1}{2}$) was examined. This leaves CR in its complete form as a strong candidate among the linear prediction methods.

There are, of course, several other possible approaches. For the given data, MLR models with multiplicative and quadratic terms and logarithmic transformations were tried at an earlier stage of our analysis, without any indications of improved results. These approaches were therefore not pursued further. Non-linear and non-parametric methods were not applied. Such methods have been developed, not only in the statistical literature, but also in chemometric literature (see Cruciani et al., 1992, and references therein). Recently, Sutter et al. (1992) advocated the selection of factors in PCR based on predictive ability rather than from the top down; an idea already proposed by Næs and Martens (1985).

A different approach to the predictions of sensory profiles is to build up models based on raw data rather than pre-averaging over judges. Næs and Kowalski (1989) presented several ways of performing such analyses, including unfoldings and factor models. This approach seems to be very attractive, as the interaction effects between assessors and treatment may be included in the modelling.

A two-step cross-validation procedure was applied to evaluate predictive ability. One step was used to ‘optimize’ a particular method and the other to compare the methods. This approach is different from that of Næs et al. (1986), who used a single fixed test set for prediction evaluation. Moreover, Næs et al. did not use cross-validation in their MLR modelling. It has been shown that the classical PRESS statistic based on ‘full’ cross-validation (MRPRESS1 values) favours the MLR models. Cruciani et al. (1992) reached the similar conclusion that this approach cannot be used to compare regression methods. Nevertheless, Kowalski (1990) used this method to conclude that MLR models give a better prediction than biased methods. This conclusion therefore seems highly questionable.

The generality of the conclusions obtained from a single experimental study, such as the one presented here, may be questioned, of course, and our findings should be seen in conjunction with those presented by other workers. Concerning the experimental conditions, the use of only four containers from which apples were taken at varying times might cause some difficulties in the separation of treatment effects from ‘container variation’, although the comparison of prediction methods should not be affected systematically by such confounding factors.

Finally, we have presented an interpretation of the predictive ability in terms of the number of assessors required to obtain a prediction of equal accuracy. This approach seems new, and offers a way of reporting and interpreting the results that reflects the underlying idea of replacing sensory evaluations by instrumental measurements.

ACKNOWLEDGEMENTS

This work was supported by the Danish Ministry of Education as part of the FØTEK Programme. A number of helpful comments from two referees are gratefully acknowledged.

REFERENCES

Cruciani, G., Baroni, M., Clementi, S., Costantino, G., Riganelli, D. & Skagerberg, B. (1992). Predictive ability of regression models. Part I: Standard deviations of prediction errors (SDEP). J. Chemometrics, 6, 335-46.

Hansen, K., Poll, L., Olsen, C. E. & Lewis, M. J. (1992). The influence of oxygen concentration in storage atmospheres on the post storage of ‘Jonagold’ apples. Lebensm.-Wiss. Technol., 25, 457-61.

Hoerl, A. E. & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55-67.

Kowalski, K. G. (1990). On the predictive performance of biased regression methods and multiple linear regression. Chemometrics and Intelligent Laboratory Systems, 9, 177-84.

Næs, T. & Kowalski, B. (1989). Predicting sensory profiles from external instrumental measurements. Food Qual. Pref., 4, 135-47.

Næs, T. & Martens, H. (1985). Comparison of prediction methods for multi-collinear data. Commun. Statist.-Simul. Comput., 14(3), 545-76.

Næs, T. & Martens, H. (1989). Multivariate Calibration. Wiley, Chichester, pp. 116-65.

Næs, T., Irgens, C. & Martens, H. (1986). Comparison of linear statistical methods for calibration of NIR instruments. J. R. Statist. Soc. B, 35(2), 195-206.

Poll, L. & Hansen, K. (1990). Reproducibility of headspace analysis of apples and apple juice. Lebensm.-Wiss. Technol., 23, 481-3.

Stone, M. & Brooks, R. J. (1990). Continuum regression: Cross-validated sequentially constructed prediction embracing ordinary least squares, partial least squares and principal components regression. J. R. Statist. Soc. B, 52(2), 237-69.

Sundberg, R. (1993). Continuum regression and ridge regression. J. R. Statist. Soc. B, 55(3), 653-9.

Sutter, J. M., Kalivas, J. H. & Lang, P. M. (1992). Which principal components to utilize for principal components regression. J. Chemometrics, 6, 217-25.

APPENDIX

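The relationship between CR and RR

In the notation from the section on continuum regression, we let $\gamma = \alpha/(1 - \alpha)$. Moreover, we let $b^{CR}(\gamma)$ denote the vector $c$ that maximizes (3), i.e. the CR estimates corresponding to continuum parameter $\gamma$, and let $b^{RR}(\delta)$ denote the RR estimates with ridge parameter $\delta$, i.e.

$$b^{RR}(\delta) = (S + \delta I)^{-1} s, \qquad \delta \ge 0$$

where $I$ is the $p$-dimensional identity matrix. Sundberg (1993) showed that for $0 \le \gamma < 1$

$$b^{CR}(\gamma) = \left( 1 + \frac{\gamma}{1 - \gamma} \right) b^{RR}(\delta_\gamma) \qquad (A1)$$

with

$$\delta_\gamma = \frac{\gamma}{1 - \gamma} \, \frac{b^{CR}(\gamma)' S \, b^{CR}(\gamma)}{b^{CR}(\gamma)' b^{CR}(\gamma)} = \frac{\gamma}{1 - \gamma} \, \frac{b^{RR}(\delta_\gamma)' S \, b^{RR}(\delta_\gamma)}{b^{RR}(\delta_\gamma)' b^{RR}(\delta_\gamma)} \qquad (A2)$$

This means that letting the ridge parameter $\delta$ vary from zero to infinity (while calculating $b^{RR}(\delta)$), we obtain from (A2) the continuum parameter $\gamma$ varying between zero and one, and from (A1) the CR estimates $b^{CR}(\gamma)$.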
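The mapping from a ridge parameter to a first-stage CR solution can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the function name is hypothetical, and (A1) and (A2) are used in the form $b^{CR} = (1 + \gamma/(1-\gamma))\, b^{RR}$ with $\gamma/(1-\gamma) = \delta \, b^{RR\prime} b^{RR} / (b^{RR\prime} S\, b^{RR})$. For each $\delta$ on a grid, the ridge estimate is computed first and the matching $\gamma$ and CR estimate follow without any further optimization.

```python
import numpy as np

def first_stage_cr_from_ridge(X, y, delta):
    """Map a ridge parameter delta >= 0 to (gamma, b_cr) using relations (A1) and (A2)."""
    S, s = X.T @ X, X.T @ y
    b_rr = np.linalg.solve(S + delta * np.eye(S.shape[0]), s)
    ratio = delta * (b_rr @ b_rr) / (b_rr @ S @ b_rr)   # equals gamma / (1 - gamma) by (A2)
    gamma = ratio / (1.0 + ratio)
    b_cr = (1.0 + ratio) * b_rr                          # rescaling given by (A1)
    return gamma, b_cr

# Synthetic, column-centred data of illustrative size.
rng = np.random.default_rng(2)
n, p = 32, 5
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
y = X @ rng.normal(size=p) + rng.normal(size=n)
y -= y.mean()

deltas = [0.0, 1.0, 10.0, 100.0, 1e4]
results = [first_stage_cr_from_ridge(X, y, d) for d in deltas]
gammas = [g for g, _ in results]
b_ols = np.linalg.solve(X.T @ X, X.T @ y)   # delta = 0 should reproduce this
```

With $\delta = 0$ the sketch returns $\gamma = 0$ and the ordinary least-squares estimate, and large $\delta$ pushes $\gamma$ towards one, the PLS end of the range searched in this paper. In practice the grid of $\delta$ values would be scored by cross-validation, and the $\gamma$ of the best entry reported as the continuum parameter.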