Acta Psychologica 129 (2008) 208–216
Empirical evaluation of the axioms of multiplicativity, commutativity, and monotonicity in ratio production of area

Thomas Augustin a,*, Katja Maier b
a Karl-Franzens-University of Graz, Austria
b Medical University of Graz, Austria

Article info
Article history: Received 23 January 2008; received in revised form 30 May 2008; accepted 2 June 2008; available online 26 July 2008
Keywords: Direct scaling; Ratio production; Commutativity; Multiplicativity; Monotonicity; Scale type

Abstract
Although direct scaling methods have been widely used in the behavioral sciences since the 1950s, theoretical approaches which could clarify the implicit assumptions inherent in Stevens’ ratio scaling approach were developed only recently. Today, it is generally accepted that the axioms of commutativity and multiplicativity are fundamental to the subjects’ ratio scaling behavior. Therefore, both axioms were evaluated in ratio production of area. Participants were required to adjust the area of a variable circle to prescribed ratio production factors. The results are in accordance with previous empirical findings: commutativity was satisfied, whereas multiplicativity failed to hold. Additionally, the validity of the monotonicity property was analyzed, which postulates that the subjects’ adjustments in a ratio production experiment preserve the mathematical order of the ratio production factors. Monotonicity was satisfied empirically, which is consistent with all the current theories of ratio scaling. © 2008 Elsevier B.V. All rights reserved.
* Corresponding author. Address: Cognitive Science Section, Department of Psychology, Karl-Franzens-Universitätsplatz 2/III, A-8010 Graz, Austria. Tel.: +43 316 380 8550. E-mail address: [email protected] (T. Augustin).
0001-6918/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.actpsy.2008.06.004

1. Introduction

One of the major methodological problems in psychology is that of finding quantitative measures of sensation. Although initially, the possibility of measuring sensations was strictly denied (Campbell, 1920, 1933; Stevens, 1946; for a summary of this controversy see Newman, 1974), more recent discussions have focused on the problem of how to construct a scale of sensation. The most prominent controversy on this issue arose between Fechnerians and Stevens’ supporters. Whereas Fechner (1860/1889) derived his logarithmic scale from Weber’s law on the assumption that just noticeable differences corresponded to equal units of sensation, Stevens pursued a strictly experimental approach, for example, by asking subjects who were presented with different physical stimuli to make numerical judgments reflecting their subjective experience (ratio estimation method). Similarly, in a ratio production experiment, participants are required to adjust a variable stimulus to a prescribed ratio, while observing a fixed standard stimulus. For a comprehensive summary of Stevens’ direct scaling methods, see Stevens (1971). Stevens and others collected a vast set of data on a wide variety of sensory modalities that were well fit by power functions of the form

\[ u(s) = a\,s^{b}, \tag{1} \]
with real-valued parameters a > 0 and b > 0. For a summary of well-known empirical findings, see Stevens (1975). The following results are of particular interest for this study: Stevens and Guirao (1963) showed, among other things, that for the magnitude production of squares, the power law holds with an exponent equal to 0.7. A similar result is due to Teghtsoonian (1965) for the judgments of size obtained by the magnitude estimation procedure: Participants were required to judge the apparent area of two-dimensional figures (e.g., circles, squares, triangles, polygons) with respect to a given standard stimulus whose area was arbitrarily called 10. The results indicate that the form of a figure has only minor influence on the exponent in Stevens’ power law (1): The exponents “range from 0.60 to 0.90, with most falling in the interval 0.76–0.81” (Teghtsoonian, 1965, p. 399). More recent studies, dealing with the relationship between perceptual and memorial psychophysics, confirm these results (e.g., Chew & Richardson, 1980; DaSilva, Marques, & Ruiz, 1987; Kemp, 1988; Kerst & Howard, 1978): In all four studies, participants were presented with an outline map of different countries. Their task was to rate the relative area of these countries with respect to a fixed reference country, whose area was assumed to be 100 units in size. The data indicate that for perceived visual areas, Stevens’ power law holds with an exponent around 0.8. Algom, Wolf, and Bergman (1985) obtained similar results by letting subjects estimate the areas of perceived rectangles.
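As a rough numerical illustration of the power law (1), the following sketch evaluates u(s) = a s^b for an exponent of b = 0.8, the approximate value reported above for perceived area. The function names and the choice a = 1 are assumptions made purely for illustration; they are not taken from the cited studies.

```python
def perceived_magnitude(s, a=1.0, b=0.8):
    """Stevens' power law u(s) = a * s**b (Eq. 1); a and b are illustrative values."""
    return a * s ** b


def physical_ratio_for(p, b=0.8):
    """Physical area ratio needed so that the perceived magnitude grows by the factor p."""
    return p ** (1.0 / b)


# Doubling the physical area raises the perceived magnitude by less than a factor of 2.
doubling = perceived_magnitude(2.0) / perceived_magnitude(1.0)

# Conversely, a perceived doubling requires more than twice the physical area.
needed = physical_ratio_for(2.0)

print(round(doubling, 3), round(needed, 3))
```

Under these assumptions, an exponent below 1 means that a "twice as large" production should exceed twice the physical area of the standard, which is the kind of compression the cited area-judgment studies describe.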
In spite of the fact that ratio scaling methods are frequently used in psychophysics, Stevens was vague about “what underlies and is accomplished by his method of ratio magnitude estimation” (Narens, 1996, p. 109). This was one of the reasons why several researchers have questioned Stevens’ approach of measuring sensations (e.g., Anderson, 1970, 1976; Graham, 1958; McKenna, 1985; Shepard, 1981). Although different authors provided elaborate theories of magnitude estimation and the closely related procedure of cross-modal matching (e.g., Krantz, 1972; Luce, 1990; Marley, 1972; Shepard, 1978, 1981), a comprehensive solution was not published until 1996 when Narens formulated what he believed were the implicit assumptions in Stevens’ approach. In his axiomatic theory of ratio magnitude estimation, Narens (1996) introduced the following terminology: If a subject in a ratio production experiment produces a stimulus x that appears to be p-times more intense than a standard stimulus t, then the subject’s adjustment is recorded as (x, p, t) ∈ E. Similarly, the triple (x, p, t) ∈ E can refer to the outcome of a ratio estimation experiment. Narens (1996) distinguishes cognitive axioms, dealing with the relationship between cognition and behavior, from behavioral axioms, relating the stimuli to the behavior. Of these axioms, the empirically testable axioms of commutativity (if (x, p, t) ∈ E, (z, q, x) ∈ E, (y, q, t) ∈ E, and (w, p, y) ∈ E, then z = w) and multiplicativity (if (x, p, t) ∈ E and (z, q, x) ∈ E, then (z, pq, t) ∈ E) are crucial to the interpretation of the subjects’ ratio scaling behavior: If the commutativity axiom holds along with a number of side conditions, then there exists an order-preserving¹ ratio scale φ on the set of physical stimuli and a strictly increasing transformation function f from the numerals occurring in the ratio scaling experiment into the positive real numbers such that

\[ (x, p, t) \in E \iff \varphi(x) = f(p)\,\varphi(t) \tag{2} \]
(Narens, 1996, Eq. (1)). One way of interpreting Formula (2) is that, in principle, the method of ratio production (respectively, ratio estimation) yields measurements on a ratio scale level, but that the practice of interpreting the numerals occurring in a ratio scaling experiment at face value is highly problematic unless the transformation function f is the identity function. This is why the multiplicative property plays a crucial role in the interpretation of the subjects’ ratio scaling behavior: If certain technical side conditions are satisfied, then the multiplicative property is necessary and sufficient for f to be a power function of the form f(p) = p^a. In that case, Formula (2) can be specified as

\[ (x, p, t) \in E \iff \varphi(x) = p^{a}\,\varphi(t), \tag{3} \]
with a real-valued parameter a > 0. Ellermeier and Faulhammer (2000) evaluated Narens’ crucial axioms of commutativity and multiplicativity by having subjects produce loudness ratios. Both axioms were tested for the ratio production factors p = 2 and q = 3. Their results are unequivocal: Whereas commutativity holds for loudness production, multiplicativity fails to hold. Zimmer (2005) obtained similar results by using loudness production factors less than 1, and Steingrimsson and Luce (2007) rejected the axiom of multiplicativity by using a different experimental methodology. The experimental data by Peißner (1999) on brightness perception provide evidence for the generality of these results across different sensory modalities. Finally, Augustin (2008) tested for the area production of circles whether the p × 1/p adjustments differ significantly from the 1× adjustments². Multiplicativity was violated for all participants.

¹ I.e., for all stimuli x, y ∈ X, x ≻ y ⇔ φ(x) > φ(y). Here, as in the following, ≻ is the natural physical ordering on the stimulus set X.
² Note that this is a special case of the multiplicative property.

Recently, Luce (2002, 2004) provided an extension of Narens’ axiomatization by introducing a generalization of Stevens’ ratio scaling procedure. In Luce’s notation, the expression x ∘_p y stands for that intensity z for which the subjective interval from y to z is perceived to stand in the proportion p to the subjective interval from y to x. Obviously, this operation is a generalization of Stevens’ ratio production procedure, which obtains as the special case when y = 0. Luce (2002) formulated conditions which are necessary and sufficient for the existence of a psychophysical function Ψ and a subjective weighting function W that are both strictly increasing such that

\[ W(p) = \frac{\Psi(x \circ_p y) - \Psi(y)}{\Psi(x) - \Psi(y)} \tag{4} \]
(Luce, 2002, Definition 3). Steingrimsson and Luce (2005a, 2005b) provided empirical evidence supporting this theory. Furthermore, it is important to note that, by setting y = 0 and Ψ*(x) := Ψ(x) − Ψ(0), Eq. (4) becomes

\[ \Psi^{*}(x \circ_p 0) = W(p)\,\Psi^{*}(x), \tag{5} \]

which is equivalent to Formula (2) proposed by Narens (1996)³.

It is worth mentioning that by finding support for Narens’ (1996) and Luce’s (2002, 2004) theoretical approaches, we also obtain implicit support for the following monotonicity property, which was already postulated by Narens (1996): If in a ratio production experiment an observer has to produce a stimulus x that appears to be p-times more intense than a referent t, and a stimulus y appearing q-times more intense than t, then the adjustments x and y preserve the mathematical order of the ratio production factors p and q. This requires, for instance, the 2× adjustment to be less intense than the 3× adjustment, the 3× adjustment to be less intense than the 4× adjustment, and so forth. Formally, this can be expressed as follows: If (x, p, t) ∈ E and (y, q, t) ∈ E, then

\[ p > q \iff x \succ y \tag{6} \]
(Narens, 1996, Axiom 3.1; see also Augustin, 2006, Axiom 3). Further empirical evidence in favor of Formula (6) results from the fact that the monotonicity axiom is necessary for Stevens’ power law to hold. Consequently, each of the myriad experiments that found the power law to provide a good fit to empirical data has implicitly confirmed the monotonicity property. Finally, it is important to note that the monotonicity property is closely related to the question of which scale type is appropriate to describe ratio scaling data: Augustin (2006) showed that Stevens’ method of constructing measurement scales yields at least an ordinal scale if the monotonicity property holds along with quite plausible technical side conditions.

Due to their fundamental importance for the interpretation of the subjects’ ratio scaling behavior, the present article provides an empirical evaluation of the axioms of commutativity and multiplicativity in the ratio production of area. Furthermore, since Eq. (6) is fundamental to all current theories of ratio scaling, we analyzed the validity of the monotonicity property as well. Except for the range of the ratio production factors, the experimental task was identical to that used by Augustin (2008): we presented two circular curves (a fixed referent and a variable comparison stimulus) to the participants, who were required to adjust the variable circle to the prescribed ratio production factors.

³ Note that Luce’s notation z = x ∘_p 0 is equivalent to Narens’ notation (z, p, x) ∈ E.

2. Method

2.1. Participants

Thirteen students at the University of Graz, five university graduates and two people with other qualification levels participated in the experiment. This sample consisted of twelve women and eight men, and had a median age of 26 years (range, 21–46 years). All of them reported normal or corrected-to-normal vision. With the exception of participant T.A. (the first author), none of the participants had prior knowledge of the hypothesis being investigated.

2.2. Stimuli and apparatus

The stimuli were presented on a 21-inch EIZO FlexScan Color LCD Monitor. They consisted of white circular curves on a dark blue background (cf. Fig. 1). We used the software program⁴ Orange TA 5.2 for the presentation of the stimuli and for the recording of the responses. The participants adjusted the stimuli by pressing one of two keys on a standard keyboard (CHERRY G83). In order to keep the distance to the monitor constant, we used a chin rest during the data collection. To reduce the effect of background noise, the experiment was conducted in a sound-proof chamber. The room was completely darkened; the only light source was the monitor.

2.3. Procedure

Each participant was tested individually in four sessions. At the beginning of each session, he or she sat down in front of the LCD monitor with his or her head placed on a chin rest at 75 cm viewing distance from the screen. The keyboard used for adjusting the stimuli was placed on a desk in front of the participants. The participants received written instructions on the screen. After they had become familiar with the task through eight practice trials, the experimenter left the chamber. The experiment consisted of 280 trials divided into four sessions. There was a five-minute break between Sessions 1 and 2, and between Sessions 3 and 4, and a longer 20-min break between Sessions 2 and 3. Each trial proceeded as follows: Two white circular curves appeared on a dark blue screen. The centers of both circles were presented on the same level in the middle of the screen. The left one was the standard stimulus and the right one was adjustable by the participant. Furthermore, a white numeral p (p = 2, 3, . . .)
was displayed in the bottom left corner of the screen, which was the request for the participant to adjust the area of the variable circle in such a way that it appeared to be p-times as large as the area of the standard stimulus (cf. Fig. 1). It is important to note that by adjusting the circle to a prescribed ratio production factor, a participant not only changes the size of the circle, but also its center. Alternatively, one might think of a different experimental paradigm, in which the center of the variable circle is fixed. However, since we intended to keep the distance between the two circle lines constant, we have not pursued this idea further. Nevertheless, the effect of shifting the circle’s center should be addressed in further research.

The adjustments were performed by pressing the “↑” and “↓” keys on the keyboard. In order to accelerate the adjustments, the participants could press the “Shift” key⁵ and release it again to do the fine tuning. There was no time restriction for the task. Each trial continued until the participant pressed the “Return” key to proceed to the next trial. Furthermore, if the screen appeared to be too small to adjust the area of the variable circle to the prescribed ratio production factor, the participants had the opportunity to press the “ESC” key. This was done to detect ceiling effects.

Sessions 1 and 2 consisted of 66 trials each. As standard stimuli, we used two circles with diameters of 75 and 125 pixels (corresponding to diameters of 2.5 cm and 4.1 cm on the screen, respectively). In each session, both standards were combined five times with the ratio production factors p = 2, 3 and 4, and three times with the factors p = 5, . . . , 10. In the following, the single adjustments given by the participants are denoted by $x_p^{(i)}$. For instance, $75_2^{(i)}$ is the i-th (i = 1, . . . , 10) adjustment made in response to a “two times as large” request starting from the standard stimulus t = 75. For each participant, the 60 single adjustments $75_p^{(i)}$ and $125_p^{(i)}$ (p = 2, 3, 4; i = 1, . . . , 10) from Sessions 1 and 2 were used as standard stimuli in the subsequent Sessions 3 and 4.

In each of the Sessions 3 and 4, the participants had to process 74 trials. Again, we presented the two standard stimuli t = 75 and t = 125 from Sessions 1 and 2, but also half of the above-mentioned single adjustments $75_p^{(i)}$ and $125_p^{(i)}$ that were collected in the previous sessions. As in the previous experimental work (e.g., Ellermeier & Faulhammer, 2000; Steingrimsson & Luce, 2005a, 2005b; Zimmer, 2005), we used single adjustments instead of averaged data. By this procedure, the data variance from Sessions 1 and 2, otherwise lost, could propagate through the rest of the experiment. Thus, in Sessions 3 and 4, each participant had an individual set of standard stimuli. In both sessions, the standard stimuli with 75 and 125 pixel diameters were combined two times with the ratio production factors p = 5, . . . , 10. This was done to present higher production factors in Sessions 3 and 4 as well. Furthermore, the standard stimuli $75_2^{(i)}$ and $125_2^{(i)}$ had to be adjusted once to the ratio production factors 2, 3, and 4, and the standard stimuli $75_3^{(i)}$, $75_4^{(i)}$, $125_3^{(i)}$, and $125_4^{(i)}$ were combined once with the production factor 2 (i = 1, . . . , 5 for Session 3, and i = 6, . . . , 10 for Session 4). This made it possible to present all ratio production factors in all four sessions.

⁴ which was developed by Michael D. Kickmeier-Rust, University of Graz, Austria.
⁵ In that case, the size of the variable stimulus changed more rapidly when pressing the “↑” and “↓” keys.

Fig. 1. Stimulus configuration projected onto the screen. The participant’s task was to adjust the right circle in such a way that it appeared to be p-times as large as the left (standard) circle. The numeral p (in this example p = 2) was displayed in the bottom left corner of the screen. This was the request for the participant to adjust the area of the variable circle in such a way that it appeared to be 2 times as large as the standard circle. The arrow indicates the direction of the alteration of the variable circle. The dashed-lined circles indicate possible adjustments given by the participants.
Altogether, there were 28 different types of adjustments: Each of the two standard stimuli from Sessions 1 and 2 was combined with each of the nine ratio production factors p = 2, 3, . . . , 10, resulting in 18 different types of p× adjustments. Furthermore, each standard was combined with each of the pairs (p, q) = (2, 2), (2, 3), (2, 4), (3, 2), and (4, 2), resulting in ten different types of consecutive p × q adjustments. Each participant produced ten adjustments of each type, resulting in a total of 280 adjustments. In all four sessions, the starting size of the variable circle was either minimal, with a diameter of 10 pixels (0.3 cm), or maximal, with a diameter of 780 pixels (25.7 cm). In order to ensure variability and reduce predictability of the experimental design, the ratio production factors, the standard stimuli and the starting size of the variable circle were randomly intermixed within each session.
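The trial counts given in this section can be cross-checked with a short bookkeeping sketch. The constant and function names below are hypothetical and only restate the design as described in the text; they do not come from the authors' experiment software.

```python
STANDARDS = [75, 125]                  # fixed standard diameters in pixels
FACTORS_LOW = [2, 3, 4]                # factors later reused as bases for p x q chains
FACTORS_HIGH = [5, 6, 7, 8, 9, 10]
PAIRS = [(2, 2), (2, 3), (2, 4), (3, 2), (4, 2)]


def trials_session_1_or_2():
    """Each standard: factors 2-4 five times each, factors 5-10 three times each."""
    return len(STANDARDS) * (len(FACTORS_LOW) * 5 + len(FACTORS_HIGH) * 3)


def trials_session_3_or_4():
    """Fixed standards plus half of the individual adjustments from Sessions 1 and 2."""
    fixed = len(STANDARDS) * len(FACTORS_HIGH) * 2      # factors 5-10, two times each
    from_2x = len(STANDARDS) * 5 * len(FACTORS_LOW)     # five 2x adjustments, then 2, 3, 4
    from_3x_4x = len(STANDARDS) * 2 * 5 * 1             # five 3x and five 4x, then factor 2
    return fixed + from_2x + from_3x_4x


def adjustment_types():
    """All (standard, factor sequence) combinations used in the experiment."""
    single = [(t, (p,)) for t in STANDARDS for p in FACTORS_LOW + FACTORS_HIGH]
    consecutive = [(t, pq) for t in STANDARDS for pq in PAIRS]
    return single + consecutive


total_trials = 2 * trials_session_1_or_2() + 2 * trials_session_3_or_4()
print(trials_session_1_or_2(), trials_session_3_or_4(), total_trials, len(adjustment_types()))
```

The sketch reproduces the 66 and 74 trials per session, the 280 trials per participant, and the 28 adjustment types stated above.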
3. Results

One participant had to be excluded from the data analysis since she obviously misunderstood the experimental task. Instead of adjusting in accordance with her perception, she used a fixed number of keystrokes for each ratio production factor. This was detected by examining the standard deviations of the participant’s adjustments, which were all close to zero. In very few cases (altogether 6 of 5320 trials), participants used the “Esc” key to indicate that the screen appeared to be too small to adjust the area of the variable circle to the prescribed ratio production factor (ceiling effect). These trials were excluded from all the subsequent data analyses. The total time for the experiment was about one and a half hours. The average duration was 19 min for Session 1, 15 min for Sessions 2 and 3, and 13 min for Session 4.

3.1. Validity of monotonicity in the sample

A descriptive comparison of the individual mean diameters $\bar{x}_p := \frac{1}{10}\sum_{i=1}^{10} x_p^{(i)}$ and $\bar{x}_{p+1} := \frac{1}{10}\sum_{i=1}^{10} x_{p+1}^{(i)}$ indicates that the monotonicity property holds for the area production of circles: In 285 of 304 cases⁶, we observed an increase from $\bar{x}_p$ to $\bar{x}_{p+1}$, which corresponds to 93.75% of all cases. Fig. 2, which shows a graphical representation of the individual mean diameters $\overline{75}_p$ and $\overline{125}_p$ versus p, provides further evidence for a monotonic trend.

Fig. 2. Plots of the individual mean diameters $\overline{75}_p$ (lower regression line) and $\overline{125}_p$ (upper regression line) versus p. Best fitting lines were obtained by linearly regressing $\bar{x}_p$ on p, separately for both standard stimuli.

Fig. 3. Cumulative sums for two participants (A.P., G.N.) representative of the main experiment. The left column refers to the smaller standard stimulus (75 pixel diameter), the right one to the larger standard (125 pixel diameter). To minimize the effect of random influences, the plots are restricted to n = 6, . . . , 10.
⁶ Note that for each participant and each standard stimulus (75 and 125 pixel diameters), we performed eight pairwise comparisons, giving a total of 304 comparisons for the whole sample.
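The descriptive monotonicity check can be sketched as a count of increases between neighbouring ratio production factors. The helper name and the example means below are hypothetical; only the totals (19 participants, 2 standards, 9 factors, 285 increases) are taken from the text.

```python
def count_increases(means):
    """Count adjacent pairs (p, p + 1) whose mean adjustment strictly increases."""
    return sum(1 for lower, upper in zip(means, means[1:]) if upper > lower)


# Bookkeeping for the whole sample, using the numbers reported in the text.
participants, standards, factors = 19, 2, 9
comparisons_per_series = factors - 1                     # p = 2..10 gives 8 neighbouring pairs
total_comparisons = participants * standards * comparisons_per_series
increase_rate = 285 / total_comparisons

# Hypothetical mean diameters for one participant and one standard (p = 2, ..., 10),
# with a single reversal between the 8x and 9x adjustments.
example_means = [100, 130, 155, 170, 190, 205, 230, 225, 250]
print(total_comparisons, increase_rate, count_increases(example_means))
```

In this made-up series, seven of the eight neighbouring comparisons show an increase, which is the kind of isolated reversal the analysis below treats as a minor violation.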
Additionally, we used a graphical approach based on the cumulative sums of the individual adjustments: For every participant, let

\[ S(x, p) : n \mapsto S_n(x, p) := \sum_{i=1}^{n} x_p^{(i)}, \qquad 1 \le n \le 10, \]

denote the cumulative sum of the individual p× adjustments starting from the standard stimulus x (75 or 125 pixel diameter). Then the monotonicity property requires that, for a fixed standard stimulus x and a fixed number n of repetitions, the inequality S_n(x, p) < S_n(x, p + 1) holds. Thus, in order to check for monotonicity, we compared the cumulative sums S(x, 2), S(x, 3), . . . , and S(x, 10) with each other. This was done separately for both standard stimuli and all participants. Monotonicity is satisfied for a participant if, for every standard stimulus x and every “sufficiently large” number n of repetitions⁷, the elements S_n(x, 2), S_n(x, 3), . . . , S_n(x, 10) can be ordered as

\[ S_n(x, 2) < S_n(x, 3) < \dots < S_n(x, 10). \]

Fig. 3 shows the results for two participants (A.P., G.N.) representative of the whole sample. To minimize the effect of random influences, the plots are restricted to n = 6, . . . , 10. According to Fig. 3, the data are in quite good correspondence with the monotonicity property: Participant A.P. (upper row of Fig. 3) shows no violation of monotonicity at all, and for the remaining two plots, referring to participant G.N., the violations of monotonicity can be corrected by interchanging two consecutive sequences: It is evident from the lower row of Fig. 3 that the violations of monotonicity can be attributed to the fact that the participant’s 8× adjustments, starting from the smaller standard stimulus, tend to systematically overshoot the 9× adjustments. If, however, the cumulative sums S(75, 8) and S(75, 9) are interchanged, then the adjustments are consistent with the monotonicity property. Similarly, for the larger standard stimulus, the monotonicity property can be corrected by interchanging S(125, 6) and S(125, 7).

It is important to note that these two participants are representative of the whole sample (cf. Table 1): For the smaller standard stimulus, nine participants (A.M., A.P., B.M., C.O., D.K., J.P., M.U., R.E., S.Z.) showed no violation at all, and for all but two (C.W., S.W.) of the remaining participants, the monotonicity property can be corrected by interchanging two consecutive sequences. Similarly, for the larger standard stimulus, there are ten participants (A.P., B.M., C.O., D.K., M.G., M.U., S.H., S.W., T.A., T.R.) with no axiom violation, and seven participants (A.C., A.M., C.W., E.P., G.N., H.G., J.P.) for whom the monotonicity property can be corrected by interchanging two consecutive sequences. Thus, the overall picture of results shows that the participants’ adjustments are in quite good correspondence with the monotonicity property.

⁷ Note that due to random influences, there might be minor deviations from the expected order if the number of repetitions is “small”.

Table 1
Empirical evaluation of the monotonicity property: S(x, p) > S(x, p + 1)?

Part.   x = 75              x = 125
A.C.    7/8                 6/7
A.M.    –                   5/6
A.P.    –                   –
B.M.    –                   –
C.O.    –                   –
C.W.    3/4, 5/6, 7/8/9     3/4
D.K.    –                   –
E.P.    6/7                 6/7
G.N.    8/9                 6/7
H.G.    8/9                 8/9
J.P.    –                   8/9
M.G.    9/10                –
M.U.    –                   –
R.E.    –                   6/7, 8/9/10
S.H.    8/9                 –
S.W.    7/9, 8/9            –
S.Z.    –                   4/5, 7/8
T.A.    8/9                 –
T.R.    8/9                 –

It was tested whether the cumulative sums can be ordered as S(x, 2) < S(x, 3) < . . . < S(x, 10). This was done separately for both standard stimuli (x = 75 and x = 125 pixel diameters) and all participants. The table entries indicate the violations of monotonicity. Participant J.P., for instance, showed no violation for the smaller standard stimulus, and one violation for the larger standard. The table entry “8/9” indicates that the participant’s 8× adjustments, starting from the larger standard stimulus, tend to systematically overshoot the 9× adjustments. If, however, the cumulative sums S(125, 8) and S(125, 9) are interchanged, then the adjustments are completely consistent with the monotonicity property.

Table 2
Empirical evaluation of the commutative property for the smaller standard stimulus with 75 pixel diameter: 75_{p,q} = 75_{q,p}?

Part.   (p, q) = (2, 3)   (p, q) = (2, 4)
A.C.    .684              .631
A.M.    .853              .105
A.P.    .684              .739
B.M.    .052              .029*
C.O.    .123              .123
C.W.    .739              .165
D.K.    .280              .353
E.P.    .165              .835
G.N.    .579              .089
H.G.    .123              .481
J.P.    .393              .579
M.G.    .143              .005*
M.U.    .684              .684
R.E.    1.00              .393
S.H.    .015*             .015*
S.W.    .218              .315
S.Z.    .971              .739
T.A.    .001*             .796
T.R.    .315              .105

For each participant, it was tested whether the area of the 2 × 3 (respectively, 2 × 4) adjustment is statistically indistinguishable from the area of the 3 × 2 (respectively, 4 × 2) adjustment. The table entries are p-values of the computed Mann–Whitney U-tests (two-tailed test, α = 0.05). Violations of commutativity are marked with an asterisk.

Table 3
Empirical evaluation of the commutative property for the larger standard stimulus with 125 pixel diameter: 125_{p,q} = 125_{q,p}?

Part.   (p, q) = (2, 3)   (p, q) = (2, 4)
A.C.    .075              .739
A.M.    .631              .529
A.P.    .529              .165
B.M.    .190              .315
C.O.    .912              .063
C.W.    .912              .105
D.K.    .631              .489
E.P.    .063              .075
G.N.    .218              .579
H.G.    .631              .931
J.P.    .739              .019*
M.G.    .001*             .002*
M.U.    .190              .190
R.E.    .796              .353
S.H.    .218              .075
S.W.    .393              .143
S.Z.    .052              .796
T.A.    .436              .015*
T.R.    .971              .447

For each participant, it was tested whether the area of the 2 × 3 (respectively, 2 × 4) adjustment is statistically indistinguishable from the area of the 3 × 2 (respectively, 4 × 2) adjustment. The table entries are p-values of the computed Mann–Whitney U-tests (two-tailed test, α = 0.05). Violations of commutativity are marked with an asterisk.

Table 4
Empirical evaluation of the multiplicative property for the smaller standard stimulus with 75 pixel diameter: 75_{p,q} = 75_{(pq)}?

Part.   (p, q) = (2, 2)   (p, q) = (2, 3)   (p, q) = (2, 4)
A.C.    .007*             .019*             .089
A.M.    .353              .015*             .315
A.P.    .190              .019*             .247
B.M.    <.001*            .009*             .353
C.O.    <.001*            <.001*            .002*
C.W.    <.001*            .029*             .247
D.K.    <.001*            <.001*            <.001*
E.P.    <.001*            .029*             .393
G.N.    .011*             .579              .481
H.G.    <.001*            <.001*            <.001*
J.P.    <.001*            <.001*            .063
M.G.    .796              .971              .123
M.U.    .029*             .739              .353
R.E.    .063              .023*             .003*
S.H.    <.001*            <.001*            .007*
S.W.    <.001*            <.001*            .043*
S.Z.    .143              .280              .529
T.A.    .003*             .002*             .796
T.R.    .007*             .075              .739

For each participant, it was tested whether the area of the 2 × 2 (respectively, 2 × 3 or 2 × 4) adjustment is statistically indistinguishable from the area of the 4× (respectively, 6× or 8×) adjustment. The table entries are p-values of the computed Mann–Whitney U-tests (two-tailed test, α = 0.05). Violations of multiplicativity are marked with an asterisk.

Table 5
Empirical evaluation of the multiplicative property for the larger standard stimulus with 125 pixel diameter: 125_{p,q} = 125_{(pq)}?

Part.   (p, q) = (2, 2)   (p, q) = (2, 3)   (p, q) = (2, 4)
A.C.    <.001*            .015*             .035*
A.M.    .019*             .043*             .089
A.P.    .029*             .089              .684
B.M.    .015*             .165              .123
C.O.    <.001*            <.001*            .002*
C.W.    <.001*            .009*             .105
D.K.    <.001*            <.001*            .010*
E.P.    <.001*            <.001*            <.001*
G.N.    .853              .011*             .002*
H.G.    <.001*            <.001*            .013*
J.P.    <.001*            .005*             .063
M.G.    .579              .853              .063
M.U.    .019*             .190              .393
R.E.    .002*             .315              .143
S.H.    .005*             .001*             .019*
S.W.    .002*             .019*             .123
S.Z.    .579              .089              .063
T.A.    <.001*            .005*             .971
T.R.    .063              .035*             .022*

For each participant, it was tested whether the area of the 2 × 2 (respectively, 2 × 3 or 2 × 4) adjustment is statistically indistinguishable from the area of the 4× (respectively, 6× or 8×) adjustment. The table entries are p-values of the computed Mann–Whitney U-tests (two-tailed test, α = 0.05). Violations of multiplicativity are marked with an asterisk.

3.2. Validity of commutativity in the sample

The axiom of commutativity is satisfied if the area of the p × q adjustment is statistically indistinguishable from the area of the q × p adjustment.
This was tested for the pairs (p, q) = (2, 3) and (p, q) = (2, 4), and both standards. Consequently, we conducted four (Mann–Whitney U-) tests for each participant and a total of 76 tests for the whole sample. We nevertheless did not correct for multiple comparisons, for the following reasons⁸: The aim of the analysis was to accept a statistical null hypothesis, and a correction for multiple comparisons would make this easier, not harder. Furthermore, we do not reject commutativity on the basis of a single significant result, but rather on the basis of the overall pattern of results. For this reason, we decided against a Bonferroni correction and consequently used the standard significance level of α = 0.05. Altogether we found a total of nine violations in 76 comparisons, which corresponds to a violation rate of 12%. It is worth mentioning that only three participants (M.G., S.H., T.A.) produced more than one violation of commutativity, whereas a majority of 14 participants showed no violation at all. The p-values of the conducted Mann–Whitney U-tests are listed in Tables 2 (smaller standard) and 3 (larger standard), respectively.

3.3. Validity of multiplicativity in the sample

In order to test for multiplicativity, we checked whether the area of the p × q adjustment was statistically indistinguishable from the area of the (p · q)× adjustment. This was tested for the pairs (p, q) = (2, 2), (p, q) = (2, 3), and (p, q) = (2, 4), and both standards (two-tailed Mann–Whitney U-tests). Thus, we tested the axiom of multiplicativity six times for each participant, which leads to a total of 114 tests. As in Section 3.2, we did not correct for multiple comparisons, but used the standard significance level of α = 0.05. In total, we observed 70 violations in 114 comparisons, which is equivalent to a proportion of 61%. Except for the two participants with no violation of multiplicativity (M.G., S.Z.), every participant produced at least two and up to six axiom violations. For the standard stimulus with 75 pixel diameter we found 34 violations in 57 tests, with only two participants showing no statistically significant result. Table 4 lists the corresponding p-values of the conducted Mann–Whitney U-tests.
Table 5, which contains a total of 36 axiom violations, shows that similar results hold for the larger standard stimulus with 125 pixel diameter.
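The per-participant decision rule used in Sections 3.2 and 3.3 can be sketched as a two-tailed Mann–Whitney U-test on two samples of adjustments. The text does not state whether exact or asymptotic p-values were computed, so the version below assumes the exact null distribution and samples without ties; all function names are hypothetical. Because the test is rank-based, applying it to diameters or to areas gives the same result.

```python
from functools import lru_cache
from math import comb, floor


def mann_whitney_u(a, b):
    """U statistic of sample a relative to sample b (ties count one half)."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


@lru_cache(maxsize=None)
def null_count(u, n1, n2):
    """Number of rank arrangements of n1 + n2 distinct values with U equal to u."""
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    # The largest value lies either in the first sample (adds n2 to U) or in the second.
    return null_count(u - n2, n1 - 1, n2) + null_count(u, n1, n2 - 1)


def exact_two_sided_p(a, b):
    """Exact two-tailed p-value; assumes no ties (an assumption of this sketch)."""
    n1, n2 = len(a), len(b)
    u = mann_whitney_u(a, b)
    u_small = floor(min(u, n1 * n2 - u))
    tail = sum(null_count(k, n1, n2) for k in range(u_small + 1))
    return min(1.0, 2.0 * tail / comb(n1 + n2, n1))


def axiom_violated(a, b, alpha=0.05):
    """An axiom counts as violated when the two samples differ significantly."""
    return exact_two_sided_p(a, b) < alpha
```

For a commutativity test, `a` would hold a participant's ten 2 × 3 adjustments and `b` the ten 3 × 2 adjustments; for multiplicativity, `a` would hold the 2 × 3 adjustments and `b` the 6× adjustments.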
4. Discussion

The aim of the present experiments was to investigate three basic conditions inherent in the current theories of ratio scaling. The obtained pattern of results suggests that, in general, commutativity is satisfied for the area production of circles. Note that the axiom of commutativity was violated in only nine of 76 tests, and that seven of these violations were produced by three participants. In contrast to this, the axiom of multiplicativity was violated in 70 of 114 tests. Since 17 of 19 participants produced at least two and up to six axiom violations, the conclusion is justified that multiplicativity is violated for the area production of circles. It is important to note here that a Bonferroni correction would only slightly affect the statistical results: If the significance level is corrected to⁹ α = 0.0125, then the axiom of commutativity is violated in four of 76 tests, which corresponds to a violation rate of 5% (compared to 12% for α = 0.05). Similarly, for the corrected significance level of α = 0.008, we observed 45 violations of multiplicativity, which is equivalent to a proportion of 39% (compared to 61% for α = 0.05).

⁸ The following argument was pointed out by Ragnar Steingrimsson, in a review of a former version of this article.
⁹ Note that four tests for commutativity and six for multiplicativity were conducted for each participant.

It is worth mentioning that, except for the range of the ratio production factors, Augustin (2008) used the same experimental paradigm to test whether the area of the p × 1/p adjustment is statistically indistinguishable from the area of the 1× adjustment. Ten persons participated in the experiment, and for each participant, four tests were conducted (Mann–Whitney U-test, α = .0125). Augustin’s (2008) results are in line with the present findings: each participant produced at least two violations of the multiplicative property, and a total of 28 of 40 tests (70%) showed statistically significant results.

Furthermore, the present findings are consistent with those reported by Ellermeier and Faulhammer (2000), Zimmer (2005), and Steingrimsson and Luce (2007) (cf. Section 1): For commutativity, we found a total of nine violations in 76 tests, which corresponds to a violation rate of 12%. Similarly, Ellermeier and Faulhammer (2000) reported a violation rate of 11%, and Zimmer (2005) found that 14% of the tests for commutativity were violated. For the axiom of multiplicativity, we computed a violation rate of 61%, whereas higher rates were reported by Ellermeier and Faulhammer (94%), Zimmer (89%), and Steingrimsson and Luce (86%).

Fig. 4. Cumulative sums for a single participant (C.O.) representative of the supplementary experiment. The left graphic refers to the smaller standard stimulus (75 pixel diameter), the right one to the larger standard (125 pixel diameter). To minimize the effect of random influences, the plots are restricted to n = 16, . . . , 25.

Ellermeier and Faulhammer (2000) and Zimmer (2005) argued that, according to their results, the adjustments of a single participant provide representations on a ratio scale level, but that the practice of deriving these representations directly from the ratio production factors used in the experiment is problematic. Formally, this situation can be summarized as
$$(x, p, t) \in E \iff \phi(x) = f(p)\,\phi(t),$$

where $\phi$ is a ratio scale, and $f$ is a strictly increasing transformation function from the numerals occurring in the ratio scaling experiment into the positive real numbers. It is important to note, however, that this conclusion is tenable only if the monotonicity property is satisfied empirically (cf. Section 1). The results of the present experiment provide evidence that monotonicity is satisfied for the area production of circles: Even though minor axiom violations were observed for most of the participants, the overall pattern of results indicates that the individual adjustments are in quite good correspondence with the monotonicity property.

To analyze whether the observed axiom violations are due to the relatively small number of repetitions, we repeated the experiment with a total number of N = 25 adjustments per condition (see footnote 10): We tested two participants from the main experiment (C.O., S.Z.), and three persons who had never participated in direct scaling experiments before. The procedure was similar to that in the main experiment: The experiment consisted of 200 trials divided into five sessions with short breaks in between. The experimental task was identical to that in the main experiment (cf. Section 3). Each session consisted of 40 trials: Two circular curves with diameters of 75 and 125 pixels were used as standard stimuli. In each session, the smaller standard stimulus was combined five times with the ratio production factors p = 7, 8, 9, and 10, and the larger standard was combined five times with the ratio production factors p = 4, 5, 6, and 7. Hence, for each combination of standard stimulus and ratio production factor, each participant produced a total number of 25 adjustments.

To check for monotonicity, we used a graphical approach similar to that used in the main experiment: Fig. 4, which shows the results for a single participant (C.O.) representative of the whole sample, provides a graphical representation of all cumulative sums. In order to minimize the effect of random influences, the plots are restricted to $n = 16, \ldots, 25$. Since for each displayed $n$, and each combination of standard stimulus $x$ and ratio production factor $p$, the inequality $S_n(x, p) < S_n(x, p + 1)$ holds, the participant’s adjustments are completely consistent with the monotonicity property. Since none of the remaining four participants showed a single axiom violation, there is some evidence that the observed deviations from monotonicity (cf. Table 1) can be attributed to the relatively small number of repetitions in the main experiment. Note that whereas in the main experiment ten observations per experimental condition were collected, the participants in the second experiment had to produce a total of 25 adjustments per condition.
10 It might be interesting to test the axioms of commutativity and multiplicativity in a similar way. However, since the observed pattern of results is perfectly in line with the previous empirical findings on loudness and brightness production (Ellermeier & Faulhammer, 2000; Peißner, 1999; Steingrimsson & Luce, 2007; Zimmer, 2005), we do not expect the results to differ materially from the present findings.
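The graphical monotonicity check described above can also be expressed as a small program. The following Python sketch is illustrative only: it assumes that $S_n(x, p)$ denotes the sum of the first $n$ adjustments for standard $x$ and factor $p$ (the definition is given earlier in the article and is not repeated in this section), the function names are hypothetical, and the data are synthetic rather than taken from the experiment.

```python
from itertools import accumulate


def monotonicity_violations(adjustments, n_min=16, n_max=25):
    """Return (p, p_next, n) triples where S_n(x, p) < S_n(x, p_next) fails.

    `adjustments` maps each ratio production factor p to the adjusted areas
    for ONE standard stimulus, in trial order.
    ASSUMPTION: S_n is the cumulative sum of the first n adjustments.
    """
    sums = {p: list(accumulate(values)) for p, values in adjustments.items()}
    factors = sorted(sums)
    violations = []
    for p, p_next in zip(factors, factors[1:]):
        for n in range(n_min, n_max + 1):
            if not sums[p][n - 1] < sums[p_next][n - 1]:
                violations.append((p, p_next, n))
    return violations


def fake_adjustments(p, trials=25):
    """Synthetic adjustments that grow with p (purely illustrative)."""
    return [1000 * p + (37 * i) % 200 - 100 for i in range(trials)]


consistent = {p: fake_adjustments(p) for p in (7, 8, 9, 10)}

# Swap the data of two factors to simulate a participant violating monotonicity.
swapped = dict(consistent)
swapped[8], swapped[9] = consistent[9], consistent[8]

print(monotonicity_violations(consistent))    # prints []
print(len(monotonicity_violations(swapped)))  # prints 10
```

Restricting the check to $n = 16, \ldots, 25$ mirrors the restriction of the plots in Fig. 4; an empty result corresponds to a participant whose adjustments are completely consistent with monotonicity.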
It is important to note that, with the exception of an unpublished master’s thesis by Peißner (1999), the monotonicity property has only been implicitly tested in the previous studies (e.g., Ellermeier & Faulhammer, 2000; Steingrimsson & Luce, 2005a, 2005b, 2007; Zimmer, 2005). Similar to the present experiment, Peißner (1999) provided an explicit test of monotonicity by having subjects produce brightness ratios of achromatic light stimuli. The results provide evidence that monotonicity is satisfied for all eleven participants of the experiment. It has to be noted, however, that monotonicity was tested only for a rather limited range of the ratio production factors: Since the main aim of Peißner’s study was to test the axioms of commutativity and multiplicativity, the experiment was limited to the ratio production factors p = 2, 3, 5, 6, and 7.

Furthermore, it is worth mentioning that finding monotonicity to hold is consistent with the predictions of all the current theories of ratio scaling: As pointed out in the introduction, the monotonicity property is fundamental to Stevens’ power law, as well as to the theoretical approaches by Narens (1996) and Luce (2002, 2004). Similarly, the monotonicity property plays a central role in Augustin’s (2006) theoretical framework: Finding monotonicity to hold provides evidence that Stevens’ method of constructing measurement scales yields at least an ordinal scale (Augustin, 2006, Corollary 5).

Finally, let us make some comments on the statistical data analysis employed in Section 3: A problem in the statistical analysis was that the standard deviations of the adjustments were strictly increasing with the ratio production factors, which indicates that the participants had more problems with higher production factors. Note, however, that this increase of the standard deviations is not surprising if Weber’s law ($\Delta(a)/a = \text{const.}$) is taken into account. Since, by this invariance principle, the just-noticeable difference $\Delta(a)$ increases with the intensity $a$ of the physical stimulus, the standard deviations of the participants’ adjustments are also expected to increase with the ratio production factors. In order to allow for unequal standard deviations (and possibly non-normally distributed adjustments), we applied the non-parametric Mann–Whitney U-test, which is based only on the rankings of the adjustments.

To sum up, the basic axiom of monotonicity is satisfied for the area production of circles, which is consistent with the myriad experiments that found the power law to provide a good fit to empirical data, as well as with all the current theories of ratio scaling (Luce, 2002, 2004; Narens, 1996). Furthermore, the present investigation found commutativity to hold, and multiplicativity to fail, which is perfectly in line with the previous empirical findings on loudness and brightness production (Ellermeier & Faulhammer, 2000; Peißner, 1999; Steingrimsson & Luce, 2007; Zimmer, 2005). Thus, the picture emerging is unequivocal: Whereas monotonicity and commutativity appear to be general laws that hold across different perceptual continua, multiplicativity is likely to be violated in most applications. However, in order to come to a final conclusion, further research will have to focus on other domains, such as visual length or heaviness.
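The contrast between commutativity and multiplicativity summarized above can be made explicit with the representation $\phi(x) = f(p)\,\phi(t)$ from the Discussion. The following derivation is a sketch under that representation only; the symbols $x_p$, $y_{p,q}$, $y_{q,p}$, and $y_{pq}$ are introduced here purely for illustration and do not appear in the original text. Here $x_p$ is the adjustment to factor $p$ with standard $t$, $y_{p,q}$ is the adjustment to factor $q$ with $x_p$ as the new standard, and $y_{pq}$ is the single adjustment to the product $pq$.

```latex
\begin{align*}
\phi(x_p)     &= f(p)\,\phi(t), \\
\phi(y_{p,q}) &= f(q)\,\phi(x_p) = f(q)\,f(p)\,\phi(t), \\
\phi(y_{q,p}) &= f(p)\,f(q)\,\phi(t) = \phi(y_{p,q})
              && \text{(commutativity holds for any } f\text{)}, \\
\phi(y_{pq})  &= f(pq)\,\phi(t)
              && \text{(equals } \phi(y_{p,q}) \text{ only if } f(pq) = f(p)\,f(q)\text{)}.
\end{align*}
```

Read this way, satisfied commutativity is compatible with any transformation $f$, whereas a violation of multiplicativity indicates that $f$ is not multiplicative and, in particular, rules out the identity $f(p) = p$.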
Acknowledgements

We are grateful to Ragnar Steingrimsson, Wolfgang Ellermeier, and Johan Wagemans for helpful comments on the earlier drafts of this paper. Correspondence should be addressed to Thomas Augustin, Department of Psychology, Karl-Franzens-Universität Graz, Austria.
References

Algom, D., Wolf, Y., & Bergman, B. (1985). Integration of stimulus dimensions in perception and memory: Composition rules and psychophysical relations. Journal of Experimental Psychology: General, 114, 451–471.
Anderson, N. H. (1970). Functional measurement and psychophysical judgment. Psychological Review, 77, 153–170.
Anderson, N. H. (1976). Integration theory, functional measurement and the psychophysical law. In H. G. Geissler & Y. M. Zabrodin (Eds.), Advances in psychophysics (pp. 93–129). Berlin: VEB Deutscher Verlag der Wissenschaften.
Augustin, T. (2006). Stevens’ direct scaling methods and the uniqueness problem. Psychometrika, 71, 469–481.
Augustin, T. (2008). Stevens’ power law and the problem of meaningfulness. Acta Psychologica, 128, 176–185.
Campbell, N. R. (1920). Physics: The elements. Cambridge: Cambridge University Press. Reprinted as Foundations of science: The philosophy of theory and experiment. New York: Dover Publications, Inc., 1957.
Campbell, N. R. (1933). The measurement of visual sensations. Proceedings of the Physical Society, 45, 565–571.
Chew, E. I., & Richardson, J. T. E. (1980). The relationship between perceptual and memorial psychophysics. Bulletin of the Psychonomic Society, 16, 25–26.
DaSilva, J. A., Marques, S. L., & Ruiz, E. M. (1987). Subject differences in exponents of psychophysical power functions for inferred, remembered, and perceived area. Bulletin of the Psychonomic Society, 25, 191–194.
Ellermeier, W., & Faulhammer, G. (2000). Empirical evaluation of axioms fundamental to Stevens’s ratio-scaling approach: I. Loudness production. Perception and Psychophysics, 62, 1505–1511.
Fechner, G. T. (1889). Elemente der Psychophysik [Elements of psychophysics]. Leipzig: Breitkopf und Härtel. (Original work published 1860).
Graham, C. H. (1958). Sensation and perception in an objective psychology. Psychological Review, 65, 65–76.
Kemp, S. (1988). Memorial psychophysics for visual area: The effect of retention interval. Memory and Cognition, 16, 431–436.
Kerst, S. M., & Howard, J. H. (1978). Memory psychophysics for visual area and length. Memory and Cognition, 6, 327–335.
Krantz, D. H. (1972). A theory of magnitude estimation and cross-modality matching. Journal of Mathematical Psychology, 9, 168–199.
Luce, R. D. (1990). On the possible psychophysical laws revisited: Remarks on crossmodal matching. Psychological Review, 97, 66–77.
Luce, R. D. (2002). A psychophysical theory of intensity proportions, joint presentations, and matches. Psychological Review, 109, 520–532.
Luce, R. D. (2004). Symmetric and asymmetric matching of joint presentations. Psychological Review, 111, 446–454.
Marley, A. A. J. (1972). Internal state models for magnitude estimation and related experiments. Journal of Mathematical Psychology, 9, 306–319.
McKenna, F. P. (1985). Another look at the “new psychophysics”. British Journal of Psychology, 76, 97–109.
Narens, L. (1996). A theory of ratio magnitude estimation. Journal of Mathematical Psychology, 40, 109–129.
Newman, E. B. (1974). On the origin of “scales of measurement”. In H. R. Moskowitz, B. Scharf, & J. C. Stevens (Eds.), Sensation and measurement (pp. 137–145). Dordrecht: D. Reidel Publishing Company.
Peißner, M. (1999). Experimente zur direkten Skalierbarkeit von gesehenen Helligkeiten [Experiments on the direct scalability of perceived brightness]. Unpublished master’s thesis, Universität Regensburg.
Shepard, R. N. (1978). On the status of “direct” psychological measurement. In C. W. Savage (Ed.), Minnesota studies in the philosophy of science (Vol. 9, pp. 441–490). Minneapolis: University of Minnesota Press.
Shepard, R. N. (1981). Psychological relations and psychophysical scales: On the status of “direct” psychological measurement. Journal of Mathematical Psychology, 24, 21–57.
Steingrimsson, R., & Luce, R. D. (2005a). Evaluating a model of global psychophysical judgments I: Behavioral properties of summations and productions. Journal of Mathematical Psychology, 49, 290–307.
Steingrimsson, R., & Luce, R. D. (2005b). Evaluating a model of global psychophysical judgments II: Behavioral properties linking summations and productions. Journal of Mathematical Psychology, 49, 308–319.
Steingrimsson, R., & Luce, R. D. (2007). Empirical evaluation of a model of global psychophysical judgments: IV. Forms for the weighting function. Journal of Mathematical Psychology, 51, 29–44.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677–680.
Stevens, S. S. (1971). Issues in psychophysical measurement. Psychological Review, 78, 426–450.
Stevens, S. S. (1975). Psychophysics: Introduction to its perceptual, neural, and social prospects. New York: Wiley.
Stevens, S. S., & Guirao, M. (1963). Subjective scaling of length and area and the matching of length to loudness and brightness. Journal of Experimental Psychology, 66, 177–186.
Teghtsoonian, M. (1965). The judgment of size. American Journal of Psychology, 78, 392–402.
Zimmer, K. (2005). Examining the validity of numerical ratios in loudness fractionation. Perception and Psychophysics, 67, 569–579.