Confidence intervals for the triangle test can give reassurance that products are similar


Food Quality and Preference 6 (1995) 61-67. © 1995 Elsevier Science Limited. Printed in Great Britain. All rights reserved. 0950-3293/95/$9.50+.00


0950-3293(94)000424-x

CONFIDENCE INTERVALS FOR THE TRIANGLE TEST CAN GIVE REASSURANCE THAT PRODUCTS ARE SIMILAR

A. W. MacRae

School of Psychology, The University of Birmingham, Edgbaston, Birmingham, UK, B15 2TT

(Accepted 3 January 1995)

ABSTRACT

One of the major advantages of the triangle test, to set against its low statistical power, is its potential for revealing discriminable sensory differences when the nature of the difference is unknown. That makes it an attractive tool for seeking assurance that there is no sensory difference between samples (after a process change, for instance). Mere absence of significant difference is completely inadequate to give that reassurance and power analysis is much better. However, power analysis requires three somewhat arbitrary parameter values to be selected in advance. An alternative approach based on exact binomial confidence intervals is described which needs only a single parameter, comparable to the alpha level in a test of significance, to be specified. It is shown that the amount of data usually envisaged for seeking reassurance about lack of difference is much too small to do the task adequately.

INTRODUCTION

The triangle test requires an assessor to locate the odd sample in a set of three, of which two are identical. It is widely used for sensory analysis in industry in spite of concern about its poor sensitivity (Ennis, 1990, 1993). One reason for its widespread use is the fact that an assessor can make successful judgements without knowing how the samples differ. That makes it useful for assessing the detectability of sensory changes in a product when a new source of raw material or a changed process is introduced, because in such cases, not only the assessors but also the production experts may be uncertain about the nature of the alteration, if any, in the finished product. However, the only statistical analysis generally used is calculation of the significance of any above-chance success rate. While that is a satisfactory way to demonstrate the existence of a difference, should that be needed, it is woefully inadequate for the purpose of giving reassurance about absence of a perceptible difference. It is certainly not the case that lack of a significant difference constitutes evidence that there is no difference. However, awareness of that fact is not universal, and the current international standard for the triangle test (ISO, 1983) offers no analysis to give reassurance that two versions of a product are similar, though it is intended that the next revision of the ISO standard will include ways of seeking reassurance that samples are nearly indistinguishable.

One approach to assessing the strength of the evidence for similarity is a power analysis in which we determine not only an acceptable level (alpha) of Type-1 error (falsely concluding that there is a real difference when only chance is at work in the data) but also an acceptable level (beta) of Type-2 error (failing to conclude that there is a difference when in fact there is one). Tables to assist in the task have been provided by Schlich (1993), who also discusses the background and advantages of this style of analysis. However, Type-2 error does not have a unique value: for any alpha level and for any particular amount of data, the probability of Type-2 error depends on the size of difference that is considered to make a practical difference to the product; that is, it naturally depends on just what counts as a Type-2 error. Thus the approach requires three parameters of the analysis to be specified in advance: alpha, beta and the smallest degree of detectability that matters. The analyst has a problem if the outcome of a power analysis falls near a borderline. After all, there is something rather arbitrary about these advance choices, especially selecting the smallest difference that matters in practice, so the analyst may wonder if the analysis should be repeated, perhaps with some change in the size of the smallest difference that is considered important, or in the level of beta considered acceptable.

The alternative approach described here avoids that problem because it starts from the results rather than by choosing parameters in the abstract; or rather, only the significance level, alpha, needs to be selected. Because only one arbitrary parameter (alpha) is involved, the approach described here will usually be simpler to apply and is almost always more conceptually direct.
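The dependence of Type-2 error on the assumed size of difference can be made concrete with a small calculation. The following is only an illustrative sketch, not a reproduction of Schlich's tables: the function names `upper_tail`, `critical_count` and `type2_error` are hypothetical, and the conversion from a proportion of 'detectors' to a probability of a correct choice assumes the simple guessing model in which non-detectors succeed one time in three.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_count(n: int, alpha: float = 0.05, chance: float = 1 / 3) -> int:
    """Smallest number correct that is significant in a directional test."""
    for k in range(n + 1):
        if upper_tail(k, n, chance) <= alpha:
            return k
    return n + 1  # no outcome reaches significance


def type2_error(n: int, detectors: float, alpha: float = 0.05) -> float:
    """Beta: chance of a non-significant result when a proportion
    `detectors` of the population detects the difference (assumed model)."""
    p_correct = detectors + (1 - detectors) / 3
    return 1 - upper_tail(critical_count(n, alpha), n, p_correct)


if __name__ == "__main__":
    # Same alpha and same amount of data, three different assumed differences.
    for d in (0.25, 0.375, 0.5):
        print(f"detectors={d:.3f}  beta={type2_error(30, d):.3f}")
```

With alpha and the number of triangles held fixed, beta changes as the assumed proportion of detectors changes, which is why all three quantities must be chosen before a power analysis can be interpreted.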


BINOMIAL CONFIDENCE INTERVALS

The approach focuses on confidence bounds for the estimate of success rate provided by the data. For any statistic calculated from a set of data it is possible to calculate two bounds, one above and one below the observed value, spanning a range of values called the confidence interval. The calculation places the bounds so as to give any desired degree of confidence that the true value of the statistic in question (here, the estimated probability of an assessor making a correct choice) lies within the interval. (Rather than speaking of the ‘true value’, many prefer to speak of the ‘value of the statistic in the population from which these particular results were sampled’; the statistic calculated from the sample then gives an estimate of the corresponding parameter of that population.)

As with a test of significance, there is always a possibility of error in the outcome. In a significance test, we choose an alpha level to set the probability of Type-1 error that we consider tolerable. With confidence intervals, we choose a comparable alpha level to control the probability of the true answer lying outside the bounds. With a significance test, setting alpha too small brings a greater chance of failing to label a real effect significant. With confidence intervals, the penalty of too rigorous an alpha is that the bounds must be far apart so we obtain a wide range of credible values for our estimate of the population parameter.

Confidence intervals are rarely invoked in sensory analysis but are a standard tool of statistics. Often, they are calculated from approximations based on the normal distribution, for example by Smith (1981) in a rare invocation of confidence intervals in sensory analysis, but that is only for convenience rather than by necessity. Here, they are calculated from the binomial distribution, the exact distribution of probabilities of frequencies of right and wrong responses in a finite number of trials with the same probability of success on every trial. One of the first papers about confidence intervals (Clopper & Pearson, 1934) used the binomial distribution to illustrate the idea. Here, the approach has been adapted in an unusual way for convenient use with the triangle test.

It is usual to express confidence as a percentage. For example, ‘the 90% confidence interval’ is the range of possible parameter values for which a directional alpha from each of the bounds is 0.05. That is, if we adopt the probability of success at the bound as a ‘null hypothesis’, the data differ from that with an alpha of 0.05 in a directional test. The sum of these alphas is 0.10, giving a 10% chance of error in one direction or the other. With the triangular test, it is customary to invoke a directional hypothesis and that practice has been followed here. Calculations here are for a 90% confidence interval so as to correspond to a directional significance test at the 0.05 level.

The calculations required for exact binomial confidence intervals are simple in principle but too laborious to perform without a computer. The exact upper bound for a particular alpha is found by carrying out binomial tests of significance on the observed results for various values of the probability, p(C), of a correct choice being made. A trial value for p(C) is chosen which is greater than the observed proportion of correct choices. If a significance test using that p(C) as the ‘null hypothesis’ yields a probability greater than the target alpha, p(C) is increased. That is, it is made more different from the observed proportion of successes. If the probability is less than alpha, p(C) is moved closer to the observed proportion. This process is iterated until a value of p(C) is found for which the significance of the data is equal to the desired alpha. The lower bound is found in the corresponding way, beginning with a trial value below the observed proportion.

Figure 1 contains all the information needed for an analysis seeking reassurance that a sensory difference is small. The horizontal axis is scaled from 12 to 150, each scale point representing the total number of trials in a study. Ideally, that will equal the number of assessors, each making a single choice of odd item from a triangle of samples. Values are plotted in steps of three, partly to avoid crowding the graph but also because it is good practice to use multiples of three (and six is even better) to permit counterbalancing the position of the odd item across triangles. The vertical axis on the left shows the upper bound of a 90% confidence interval for the probability of making a correct choice. That is, the true value of p(C) is estimated to exceed the predicted bound on no more than 5% of occasions that the technique is used.
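The iterative search for the exact bounds described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's program: the function names `binom_cdf`, `upper_bound` and `lower_bound` are hypothetical, bisection is an assumed choice of search strategy, and only the standard library is used.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_bound(correct: int, n: int, alpha: float = 0.05, tol: float = 1e-9) -> float:
    """Value of p(C) above the observed proportion at which a result this
    low or lower has probability alpha (directional test)."""
    if correct >= n:
        return 1.0
    lo, hi = correct / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(correct, n, mid) > alpha:
            lo = mid  # probability too large: move away from the observed proportion
        else:
            hi = mid  # probability too small: move back towards it
    return (lo + hi) / 2


def lower_bound(correct: int, n: int, alpha: float = 0.05, tol: float = 1e-9) -> float:
    """Value of p(C) below the observed proportion at which a result this
    high or higher has probability alpha (directional test)."""
    if correct <= 0:
        return 0.0
    lo, hi = 0.0, correct / n
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1 - binom_cdf(correct - 1, n, mid) > alpha:
            hi = mid  # probability too large: move away from the observed proportion
        else:
            lo = mid  # probability too small: move back towards it
    return (lo + hi) / 2


if __name__ == "__main__":
    # A few (correct, triangles) outcomes of the kind discussed in the text.
    for correct, n in [(4, 12), (50, 150), (35, 75)]:
        print(correct, n, round(upper_bound(correct, n), 3))
```

Bisection is used here only because the tail probability changes monotonically with p(C); any root-finding method that reaches the target alpha would serve equally well.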
The upper bound of the 90% interval corresponds to a directional alpha of 0.05 in a test of significance, as commonly used with triangle tests. The right vertical axis represents a transformation that is often applied to p(C) to represent the percentage of ‘detectors’, P(D), in the population if each individual either detects the difference reliably or guesses completely at random. The relationship between the axes is that P(D) = 150p(C) - 50. Although the model it relates to is very implausible, the transformation is widely used and is provided here for convenience. These scale values do not map neatly onto the grid lines, so if more than an approximation is wanted it is better to calculate it from p(C) since that can be read to about three decimal places.

The curved line labelled ‘0’ represents the upper bound for a result where the number of successes is at the chance level, one third of the number of triangles. That is, on the left, four correct responses out of 12 result in an estimate of the population probability of correct responses having an upper bound of 0.609 while, on the right, 50 correct responses out of 150 have an upper bound of 0.402. Reading the right-hand scale, these probabilities can be translated into 41 and 10% of ‘detectors’, respectively. The lines above it labelled 1, 2, 3 and so on, represent upper bounds when the number of correct choices exceeds the number expected by chance by 1, 2, 3 and so on. Suppose that 75 triangles yield 35 correct responses. The number expected by chance is 25 (1/3 of 75) and the result exceeds that by 10 so we consult the heavy line labelled ‘10’. We can read from the value of that line in the horizontal position corresponding to 75 triangles that p(C) = 0.568 and P(D) = 35%.

These upper bounds may all seem alarmingly high. If so, it is because it is not generally appreciated how much data it takes to give adequate assurance of similarity. Significant results with a non-directional alpha of 0.05 can be obtained with as few as eight correct from twelve triangles but, with twelve triangles, an outcome at the chance level (four correct) gives reassurance at the same alpha level of 0.05 only that accuracy does not exceed twice the chance rate. Since that translates into more than 40% of the population detecting the difference, that is not much reassurance at all! Even with 150 triangles, only very limited reassurance of similarity can be given if performance is at the chance level. If this paper does no more than emphasize that point for sensory analysts it will have served a useful function.

Below-chance success frequencies

If a directional test is appropriate, then below-chance success frequencies, should they occur, generate lower confidence bounds than do chance outcomes.

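[Figure 1 plot. Horizontal axis: Number of Sets of Three (15 to 150). Left vertical axis: upper bound for p(C). Right vertical axis: percentage of ‘detectors’, P(D). Curves are labelled by the excess of correct responses over the chance expectation.]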

FIG. 1. Upper 90% confidence bounds for the estimated population proportion of correct responses in a triangle task as a function of the number of triangles presented and the excess of correct responses over the proportion expected by chance. Readings on the left represent a probability of correct response in the task which is unlikely to be exceeded by the true probability in the population. Readings on the right represent an estimate of the proportion of the population who detect the difference. Both conclusions are drawn with the same confidence as is given by a directional test of significance at the 0.05 level.


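[Figure 2 plot. Horizontal axis: Number of Sets of Three (15 to 150). Vertical axes: lower bound for p(C) and the corresponding percentage of ‘detectors’. Curves are labelled by the excess of correct responses over the chance expectation.]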

FIG. 2. Lower 90% confidence bounds for the estimated population proportion of correct responses in a triangle task as a function of the number of triangles presented and the excess of correct responses over the proportion expected by chance.

(Although the outcome is attributed to the influence of chance alone, the particular outcome observed is less plausible for some higher value of the population parameter than is an outcome at the chance level.) However, if the true state of affairs is that the sensory difference is imperceptible there is no way to encourage below-chance outcomes to occur: they are just occasional happy accidents. They are irrelevant if the purpose is to demonstrate the existence of a perceptible difference, but if the purpose is to gain reassurance that the sensory difference is small, the analysis should take note of them if they occur. In that case, we use the lines in Fig. 1 labelled -1, -2, -3 and so on, referring to frequencies of success that are 1, 2, 3 and so on below the number expected by chance. If a below-chance outcome does occur, the upper bound may come as low as the chance level. It cannot come below chance since, in order for these lines to be meaningful, we must believe that no conceivable process can generate systematically below-chance outcomes.

With 12 triangles, of which none are correctly identified, the upper bound is 1/3. In the graph, that corresponds to the line labelled ‘-4’ being below the base of the graph in the position corresponding to 12 triangles. Since the chance outcome for 12 triangles is four correct, -4 represents no correct responses. As another example, an upper bound at chance level is given by 60 triangles of which 13 or fewer are correct (because the chance frequency is 20 and the line representing -7 is below the baseline). It must be emphasized, however, that although these outcomes are possible they are very unlikely and it would not be sensible to design a study in the hope that they will occur. The probability of zero successes in 12 triangles is less than 0.01 by chance alone, for instance. If a difference is truly imperceptible, below-chance and above-chance outcomes will occur about equally often.
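The below-chance cases can be checked with a short calculation. The sketch below is an assumption-laden illustration rather than the author's program: `clipped_upper_bound` and `percent_detectors` are hypothetical names, the search is a plain bisection, and the bound is held at the chance level of 1/3 whenever the exact bound would fall below it, as the text requires.

```python
from math import comb

CHANCE = 1 / 3  # probability of a correct choice by guessing in a triangle test


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clipped_upper_bound(correct: int, n: int, alpha: float = 0.05) -> float:
    """Upper bound for p(C), never allowed to fall below the chance level."""
    if binom_cdf(correct, n, CHANCE) <= alpha:
        return CHANCE  # the exact bound would lie at or below chance
    if correct >= n:
        return 1.0
    lo, hi = CHANCE, 1.0
    while hi - lo > 1e-9:
        mid = (lo + hi) / 2
        if binom_cdf(correct, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def percent_detectors(p_correct: float) -> float:
    """The transformation P(D) = 150 p(C) - 50 used for the right-hand axis."""
    return 150 * p_correct - 50


if __name__ == "__main__":
    print("P(no successes in 12 triangles):", (2 / 3) ** 12)
    for correct, n in [(0, 12), (1, 12), (13, 60), (14, 60)]:
        bound = clipped_upper_bound(correct, n)
        print(correct, n, round(bound, 3), round(percent_detectors(bound), 1))
```

Clipping at the chance level encodes the stated belief that no process can generate systematically below-chance outcomes; without that belief the clipping step would not be justified.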

Confidence Intervals for the Triangle Test them

THE LOWER BOUND

only by probabilities.

independent

CONFIDENCE

underlying

theory,

of the confidence

theory

performance

of a binomial

on

or any other

in particular.

confidence

interval

is

useful. Figure 2 is set out in the same way as Fig. 1, but dis-

equally suitable for other forced-choice difference tests but some, such as the duo-trio test or pair comparisons

plays the lower bound

where

the

require

different

we can be confident

-

interval can also be

principle

they are processes

and do not depend

threshold

model of psychophysical The lower bound

of decision

observed performance

signal detection The

For that reason,

of the view taken


that is, the value below which

with an alpha level of 0.05 that the

true probability of success does not fall. To say that the lower bound lies above the chance level is to exclude the chance

level and anything

for the population

lower as plausible

parameter

underlying

other words, the result is statistically From

Fig.

significant

2

we

can

at the 0.05

see

values

the result.

In

which

expectation,

5 above chance, graph

the correct

choice

is 30 above

for this

(perhaps

of sample,

and 50 correct

the chance

an

Fig. 2 allows us probability

to be significant.

ple, if there were 60 triangles

of an

as well as

value for p(C)

see that the highest which

plausible

we have in the reality

hypothesis

in

others

However,

so the

may be less for

If so, the analysis needs

to be differ-

Directional

and non-directional

hypotheses

If the number

of correct

is 0.734.

From Fig. 1 we is 0.90’7. Both

of a result

believe

that

no conceivable

cause a below-chance

as that

which

is just

are reason-

DISCUSSION

perceptible

process

could

outcome

difference

is known, the triangle forced

choice

strong

a

evidence

a below-chance

success

does

exist,

with

the

assessors

responses to it. explanation (a sys-

tematic

because

effect)

is not very plausible

that the assessors

have given

it is not easy to envisage

they could

systematically

A cause

haps not entirely is to assess the noticeability of some in any other situation where the

as especially

making systematically inappropriate With the triangle test, the second

by mistake.

than three-alternative

systematic

rate of success, we can interpret

take that view, we must regard

deliberately:

task is less efficient

can be explained

rate as no better than performance at the chance level. Indeed, it might even be interpreted as evidence that a

which

of the sensory difference

is less than the num-

in two ways. The first is that the result has happened by chance. The second is that there is some cause. If we

requires

When the purpose improvement, or

choices

the outcome

(stronger than a chance number of successes) of lack of perceptible difference. If we are not prepared to

of 20.

probability

at the 0.05 level and the bounds

by chance,

selections,

expectation

ably narrow.

direction

to one out of three

belowchance

are made with the same confidence

significant

tests and

For exam-

Our best estimate of the probability of success, p(C) , is 50/60, or 0.833, but from Fig. 2 we also see that the

statements

The proba-

is 5 in 3AFC

here may be usable.

of

that a di-

ent in some respects.

ber expected

is noticeable,

the outcome

lowest plausible

here

the probability

4 above

potential.

improvement)

the result

a target sample

these procedures.

is $,

is appropriate.

by chance

of a directional

over the usual tables

to find a lower limit for the plausible just declaring

test of significance

bility of success

chance described

alone is 4, provided

provided

If our aim is to show that a difference

assessor making

by

methods

which is 10 + 5, or 15, and so on. The

but it has further

intended

rectional

by chance

the graphs

that is 5 + 4, or 9. With 30 we need

has no advantage

purpose,

choices

plausibility

at the 0.05 level. With 15 trials, we again need chance

correct

are

12 to 150. With 12 trials, we need 4 above the chance expectation (which is 4), so 8 out of 12 are significant

success

The

of trials from

outcomes

level for numbers

of

graphs.

apply equally well to any task where

matching

significant.

probability

a mechanism

give the wrong

for below-chance

inconceivable

answers

by

answer is per-

even with the triangle

test but with the 3AFC a cause is certainly when the task concerns

it virtually

false responses

imaginable:

a low level of some flavour

de-

fect, say, it is possible that assessors will wrongly identify which type of sample has the defect and will systemati-

(3AFC). That was demonstrated by MacRae & Geelhoed (1992) and Geelhoed et al. (1994), for example.

cally tend to pick one of the pair as having the defect. How plausible such an explanation is, depends on

The

various

the

difference

is caused

task on assessors

decision

strategies

explanation

by the different

and

that

they follow

of the difference

demands

the consequently in each

of

features

of the

task,

sensory

If such effects

attribute

in

question

task. An

ered possible, two consequences follow, both of which weaken the power of the procedure to give reassurance

resides in signal-detection

and the assessors.

the

different

are consid-

theory and has been expressed in various ways by Ennis (1990,1993), Frijters (19’79)) Ura (1960) and others.

that sensory differences are small. Firstly, the test needs to be nondirectional,

However, the methods described here refer only to observed numbers of each kind of outcome and model

confidence interval is needed to give an alpha of 0.05, leading to wider bounds. (The graphs here can be

so a 95%


used, however, an alpha comes

if they are interpreted

of O-1.)

Secondly,

may be evidence

any systematic chance

cause

since

for a perceptible can

should

strong evidence

occur,

for

they can (at best)

1 labelled

not be treated If any

be interpreted

with negative

if

a below-

against a difference.

same way as a success rate at the chance in Fig.

out-

difference

be envisaged

success rate, they should certainly

as especially

to an extent

as representing below-chance

in the

level. The lines

numbers

should

be

1 to require

The

with the tables by Schlich

is about

approaches

are to some

extent

yields the probability, ing

that

the

p, of Type-2

detectability

exceed some fixed amount. degree fixed

of detectability probability

levels

of

implausible.) correct

(/3 = 0.05)

here

are

of ‘detectors’

responses,

p(C)

, and

For

therefore

judged

to

the

+ 50)

(11 is 1 greater

which of

I call correct

it

may

be

of success

as it is to be lower than

undesirable

what reassurance grounds

possible

is required

for setting approach

and

some

to

specify

in

unless

there

are

particular

is to conduct

determine

the

criterion.

as large a test as

degree

of

reassurance

given by the results. Suppose that the test is, in fact, conducted triangles.

If the number

be 56 (that

of correct

is, 6 more

the manager

than

0.442

sponding

of correct

by reference

to 150 triangles.

using 150 turns out to

expectation),

with a confidence choices

to the curved

to 6, in a horizontal

corresponds

choices

the chance

can be reassured

of 0.05 that the probability exceed

the

the probability that as is desired is some-

position

A probability

line

corre-

corresponding

of being correct

to a percentage

level

does not

of ‘detectors’

of 0.442 of 150

X

0.442

- 50 = 16%, so the result gives reassurance

at the

usual

level

is no

higher

of confidence

that

the

percentage

than that.

as a percentage

we use one of the equa-

of how the two approaches and

11 correct

work

responses,

the

(1993) gives p = 0.00 for 50%

0.05 for 25% of ‘detectors’. obtained

of the per-

probability

p = 0.01 for 37.5%

as 0.533,

the

are tested or

must be below

Amount

of data needed

+ 150 or P(D) = 15Op(C) -50.

table in Schlich

of ‘detectors’,

be

is also expressed of choosing

an answer expressed

30 triangles

appropriate

0.05

(Higher

in the population,

Here is an example out.

error.

in the population

tions: p(C) = (P(D)

exist for a

gives an estimate

P(D) and he calls p,. To convert between of ‘detectors’

not

of Type-2

it is the probability

sample but Schlich

centage

declar-

does

plausibly

Beware that detectability

differently:

when

Here, Fig. 1 yields the largest

that might

detectability

error

of a difference

of 1

of con-

expectation.

The alternative

complementary.

10%

150 triangles judged

as likely to be higher

formal

For any number of triangles presented and number correct responses, the table in Schlich’s Appendix

than

what less than 0.5, since the observed frequency

advance

(1993)

than

correctly

chance level. With 150 triangles, the outcome will be as reassuring

Therefore,

Comparison

by more

that more

that the number

the chance

ignored.

detectable

sumers. For P(D) = 10, p(C) = (10 + 50) + 150 = 0.4. TO bring p(C) down to 0.4 with /I - O-05, can be seen in Fig.

of ‘detectors’

Figure

1 gives p(C)

by consulting

than the chance

and p = for /3 =

the line for +l

expectation

of 10 cor-

The graphs

provided

here allow for analysis of a maxi-

mum of 150 triangles. customary

in sensory

That

give firm reassurance cases. Similar (larger)

graphs

is a larger

about

lack of difference

can be prepared

beta levels, which generate

the cost of offering

number

than is

analysis but is still insufficient

less confidence

to

in most

for less stringent

narrower

bounds

at

that the true value

rect) above the horizontal scale position corresponding to 30 triangles. Converting p(C) to P(D), the percent-

of the parameter being estimated actually lies between them. But if a company really is concerned to avoid a

age of ‘detectors’,

flavour defect

being

of confidence

demanded

the 25% indicated

gives 30% by Schlich’s

not too different table,

that the value of p it gives is expressed only one digit. The table additionally

bearing

from

in mind

to a precision

of

gives p values for

of the opposite

kind -

noticeable.

gives answers for only one /3value but with that level of

smaller

is expected

confidence

significant,

percentages

of ‘detectors’

it can be used to discover

while

the graph

what is the maxi-

by consumers,

the degree

of the answers should

less than that which would be acceptable indeed

two other

noticed

a decision

Conventionally, before

that a difference

is

an alpha of 0.05 or

a difference

so it is reasonable

not be

for a decision

is considered

to demand

beta

mum probability of correct responses after observing any particular number of correct responses. Alternatively, it can reveal the largest number of correct responses that would be consistent with the probability of correct responses not exceeding some desired value.

about similarity we require much more data than for the purpose of demonstrating a sensory difference. The disparity is so great that most writers on the topic

Practical applications

despair of assessing similarity with the degree of confidence expected from tests of significance. Schlich

Suppose

a quality

ance that a process

assurance change

manager

wants

has not altered

reassur-

the product

be no larger than that if reassurance

that

should

of detectable difference is sought. It was noted earlier that in order

(1993) in his fifth example, being conducted to reassure

about lack

to get reassurance

envisages an assessment a marketing department

that a new ingredient will not cause a difference perceptible to consumers. He takes it as axiomatic that not more than 100 triangles can be used and consequently settles for a probability of ‘detection’ (P(D) in my terminology and p_d in his) of 0.25, at which his analysis offers a selection of tradeoffs between the probabilities of Type-1 and Type-2 errors, one pair of possibilities being 0.0434 and 0.0443 respectively, with the other options more unbalanced between the two types of error. These error probabilities are not particularly exacting and are close to the alpha of 0.05 used here. However, when using 100 triangles they are achieved only at the cost of agreeing that a 25% ‘detection rate’ in the population is acceptable, whereas one can well imagine that a probability of detection of 25% would be completely unacceptable for some types of flavour defect in a food product. The problem is that sensory analysts tend to allow their constraints to be set by what is considered to be an economically practicable amount of sensory testing. Instead, they should determine the requirements of the problem and design whatever programme of testing is logically required to answer the question with an acceptable degree of confidence.

Computer program availability

A stand-alone computer program to generate data similar to that used in Figs 1 and 2 runs on a PC and is available on request as a UUENCODED executable file by sending electronic mail to [email protected], or as an executable file by sending a PC-formatted 3.5 or 5.25 in. disk to the author. The user interface has not yet been optimised so the user needs some computer experience. A version with an improved user interface is under development.

REFERENCES

Clopper, C. J. & Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26, 404-13.

Ennis, D. M. (1990). Relative power of difference testing methods in sensory evaluation. Food Technol., 44, 114-7.

Ennis, D. M. (1993). The power of sensory discrimination methods. J. Sensory Stud., 8, 353-70.

Frijters, J. E. R. (1979). Variations of the triangular method and the relationship of its unidimensional probabilistic models to three-alternative forced-choice signal detection theory models. Br. J. Mathematical Stat. Psychology, 32, 229-41.

Geelhoed, E. N., MacRae, A. W. & Ennis, D. M. (1994). Preference gives more consistent judgments than oddity only if the task can be modelled as forced choice. Percept. Psychophys., 55, 473-7.

ISO (1983). Sensory analysis - Methodology - Triangular Test. Geneva: International Organization for Standardization, ISO 4120-1983. (Under revision)

MacRae, A. W. & Geelhoed, E. N. (1992). Preference can be more powerful than detection of oddity as a test of discriminability. Percept. Psychophys., 51, 179-81.

Schlich, P. (1993). Risk tables for discrimination tests. Food Qual. Preference, 4, 141-51.

Smith, G. L. (1981). Statistical properties of simple sensory difference tests: Confidence limits and significance tests. J. Sci. Food Agriculture, 32, 513-20.

Ura, S. (1960). Pair, triangle and duo-trio tests. Reports of Statistical Application Research, Union of Japanese Scientists and Engineers, 7, 107-19.