JOURNAL OF MATHEMATICAL PSYCHOLOGY 13, 101-118 (1976)

Learning on a Response Continuum: Comparison of a Linear Change and a Functional Learning Model

J. DOUGLAS CARROLL¹
Bell Laboratories, Murray Hill, New Jersey

AND

SEYMOUR ROSENBERG
Rutgers University, New Brunswick, New Jersey

Two reinforcement schedules were used to compare the predictive validity of a linear change model with a functional learning model. In one schedule, termed “convergent,” the linear change model predicts convergence to the optimum response, while in the other, termed “divergent,” this model predicts that a subject’s response will not converge. The functional learning model predicts convergence in both cases. Another factor that was varied was presence or absence of random error or “noise” in the relationship between response and outcome. In the “noiseless” condition, in which no noise is added, a subject could discover the optimum response by chance, so that some subjects could appear to have converged fortuitously. In the “noisy” conditions such chance apparent convergence could not occur. The results did not unequivocally favor either model. While the linear change model’s prediction of nonconvergence in the divergent conditions (particularly the “noisy” divergent condition) was not sustained, there was a clear difference in speed of convergence, counter to the prediction inferred from the functional learning model. Evidence that at least some subjects were utilizing a functional learning strategy was adduced from the fact that subjects were able to “map out” the relation between response and outcome quite accurately in a follow-up task. Almost all subjects in the “noisy” conditions had evidently “learned” a strong linear relation, with slope closely matching the veridical one. The data were consistent with a hybrid model assuming a “hierarchy of cognitive strategies” in which more complex strategies (e.g., functional learning) are utilized only when the simpler ones (e.g., a linear change strategy) fail to solve the problem.
¹ Requests for reprints should be sent to J. Douglas Carroll, Room 7F-412, Bell Laboratories, 600 Mountain Avenue, Murray Hill, New Jersey 07974.
Copyright © 1976 by Academic Press, Inc. All rights of reproduction in any form reserved.

Linear change models for learning on a response continuum have been formalized by a number of investigators (Anderson, 1961; Rosenberg, 1962, 1968; Suppes, 1959, 1960). Most of these models are special cases of a stochastic process described by
Rouanet and Rosenberg (1964), termed the general linear model. More recent developments in linear change models based on conditioning principles are summarized by Estes (1972).

The class of learning tasks with which linear change models for the response continuum are concerned is one in which the subject is given a sequence of discrete trials, each of which consists of the following two events: (1) At a signal, the subject makes a choice from a response continuum; (2) the subject is then presented with a display showing the subject’s actual response and an outcome that defines the reinforced response for that trial.

The present experiment involved two schedules of reinforcement. In both schedules, the reinforced response on each trial is a linear transformation of the subject’s actual response on that trial; the two schedules differ in the nature of the linear transformation. There is a unique response which, as the result of the linear transformation in either schedule, will result in a correct outcome provided no noise is added to the transformation. Although the transformations were so chosen that the value of this optimal response on the continuum is the same in both schedules, the general linear model predicts that the learning of the optimal response will take place with only one schedule or transformation, termed convergent, and not with the other schedule, termed divergent. This prediction is based on the assumption that a critical “learning rate” parameter in the model is positive under both reinforcement schedules, an assumption that is in accord with previous results using noncontingent and two-person reinforcement schedules (Rosenberg, 1962, 1963; Suppes & Frankman, 1961). An empirical comparison of asymptotic behavior under the two (contingent) reinforcement schedules used in the present experiment thus provides a strong qualitative test of the general linear model.

Quite another model to describe learning in this general situation is the “functional learning” model (Carroll, 1963).
In qualitative terms, this model assumes that the subject learns a functional relationship between two variables and that this information is used to solve the problem of finding an optimal response. In the present instance, this model assumes that the subject eventually learns a close approximation to the veridical linear relation between his actual response and the outcome, this cognitively stored linear relation then being used to identify the optimal response. Evidence for such learning of linear relations, and even of nonlinear relations under some conditions, and for interpolation and extrapolation of these, can be found in Smedslund (1955), Carroll (1963), Bjorkman (1965a,b), Gray (1968), De Klerk, De Leeuw, and Oppe (1970), Hammond (1972), Brehmer (1971, 1973, 1974), and Brehmer, Kuylenstierna, and Liljergren (1974).

In order to preclude the possibility that learning takes place in either or both conditions by the subject’s accidental choice of the correct location on the response continuum rather than by the mechanisms postulated in either of these models, the two schedules were also run with noise added to the outcome. That is, the outcome for each trial consisted of the (linear) transformation of the response plus a random
component. With the addition of this random component, the general linear model predicts that the asymptotic value of the response mean will stabilize at the value of the optimal response in the convergent schedule only. The functional learning model predicts convergence under both schedules.
METHOD
Apparatus. Immediately in front of the subject was a panel with a movable knob scaled from 0 to 100 by means of which the subject made a response on each trial. Different colored lights on the same panel served as signals for the subject to respond. Behind and above the panel, at about the subject’s eye level, was an oscilloscope that displayed the outcome on each trial as a small blue dot of light. The screen was masked in such a way as to expose to the subject’s view only an 8-cm square area. Horizontal placement of the dot, in all conditions, was completely controlled by the subject’s response; a 0 response placed the dot somewhere on the left edge of the square and 100 placed it somewhere on the right edge, with a linear relation holding between these limits. The subject was informed of her control over horizontal placement. The experimenter controlled the vertical displacement of the dot according to rules that will be described below. A black line 0.04 cm wide was painted on the screen and referred to as the “goal line.” The locations of the goal line, which differed in the two reinforcement schedules, are shown as solid lines in Fig. 1. A correct outcome was defined for the subject as the dot touching anywhere on the goal line. Although the experimenter recorded the location of the dot on each trial, the subject was asked to report whether the dot was above, below, or touching the line by depressing one of three switches. This was done to assure that the subject paid close attention to each outcome on each trial. To enhance motivation, the subject was paid $0.02 for each correct outcome.

Subjects. Eighty female undergraduates from an introductory psychology course at a women’s college served as subjects. Most subjects were freshmen or sophomores, with a median age of 19 years. They were paid $1.00 for participating plus whatever they won during the learning trials.

Experimental sequence.
Subjects in all conditions were given the same tape-recorded instructions which explained the routine on each trial. Each trial lasted 12 sec and consisted of the following sequence: (a) At a signal, the subject positioned the response knob between 0 and 100; (b) the oscilloscope displayed the dot to the subject; (c) the subject moved the knob out of the numerical interval to an off-scale position and a new trial was started. Following the instructions, the subject was given several practice trials to familiarize her with the experimental routine and with the fact that she controlled the horizontal
[Figure 1: two panels, CONVERGENT CONDITION and DIVERGENT CONDITION, each marking an ACTUAL RESPONSE and a REINFORCED RESPONSE.]
FIG. 1. Sketch of the outcome display square for the convergent and divergent conditions. The solid line within each square is the subject’s goal line and the dotted line is the basic training function, i.e., the set of possible outcomes in a noiseless condition. Typical “actual” and “reinforced” responses are illustrated for each of the two cases.
displacement of the dot. She was told that the apparatus controlled the vertical displacement but that during the practice trials, the dot would always appear at the bottom edge of the square screen.

The number of training trials that followed practice varied, depending on whether the subject was in a “noiseless” or “noisy” condition (to be distinguished later). In a noiseless condition, the experimental session was terminated when the subject had achieved 20 consecutive “hits” (dot touching line), or had had 150 trials, whichever occurred first. In a noisy condition, 150 trials were given regardless of the number or distribution of “hits.” The differing criteria for terminating the session were used because a pilot study had indicated that subjects in a noiseless condition typically converged rapidly to a stereotyped response beyond which little or no change in response occurred. In a noisy condition, by contrast, no such stereotypy was anticipated since no single response value could result in 100% reinforcement. In neither case were subjects informed of the criteria for terminating the session.

At the conclusion of the session, each subject was given a series of 42 cards to enable the experimenter to later “map out” any relation that the subject might have learned between horizontal and vertical placements of the dot. An 8-cm square (the
size of the square on the oscilloscope) was printed on each card and the subject was asked to draw a dot within the square based on her best guess about the dot’s location, given the response value shown on the same card. The cards had different response values between 0 and 100 and were arranged in a different random order for each subject.

Design. A 2 x 2 design was used with 18 subjects in each of the noiseless conditions and 22 subjects in each of the two noisy conditions. The same basic relationship existed between the subject’s response and vertical placement of the dot in all four conditions, defined by the linear function

$$ y = 90 - .67x, \tag{1} $$

where y is the vertical coordinate of the dot measured on a scale from 0 to 100, and x is the subject’s response value, which also can be regarded as the dot’s horizontal coordinate on a 0 to 100 scale. Equation (1) will be referred to as the training function. The training function is represented in Fig. 1 as a dotted line.

One variable in the factorial design was the amount of error (noise) added to the basic training function. In one case, the noiseless, no error was added; in the second case, the noisy, a random error drawn from a beta-distribution was added. The parameters of this beta-distribution were m = n = 1, in which case the frequency function reduces to the following quadratic

$$ f(\epsilon) = \begin{cases} 6(\epsilon - \epsilon^2) & \text{if } 0 \le \epsilon \le 1, \\ 0 & \text{otherwise.} \end{cases} \tag{2} $$

This distribution is symmetric with a mean of .5 and a s.d. of .25, so that the entire distribution lies within ±2 s.d.’s of the mean. The ε’s obtained from Eq. (2) were linearly transformed to a mean of zero and s.d. of 5, so that the transformed distribution ranged symmetrically between +10 and -10. The distribution was limited to this range to assure that the added error would never drive the dot out of the viewing area. A different set of 150 such random deviates was computer generated for each of the 44 subjects in the noisy conditions, and these were added in sequence to the vertical coordinates determined by applying the basic training function to the subject’s response on each trial.

The second variable in the factorial design was the position of the goal line as shown in Fig. 1. The two goal lines can be represented by the following two linear functions: For the convergent schedule,

$$ y = 100 - x; \tag{3} $$

and for the divergent schedule,

$$ y = 82 - .41x. \tag{4} $$
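The noise procedure can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' program: the function names and the seed are assumptions, and the density 6(ε - ε²) is sampled as a Beta(2, 2) variate in the standard library's parametrization. The rescaling below is chosen to reproduce the stated range of -10 to +10; under it the standard deviation works out to roughly 4.5, close to the nominal value of 5 given in the text.

```python
import random

def training_function(x):
    """Noiseless vertical coordinate from Eq. (1): y = 90 - .67x."""
    return 90.0 - 0.67 * x

def noise_deviate(rng):
    """Draw e with density 6(e - e^2) on [0, 1], then rescale to [-10, 10]."""
    e = rng.betavariate(2.0, 2.0)  # 6e(1 - e) is the Beta(2, 2) density
    return 20.0 * (e - 0.5)        # assumed rescaling: mean 0, range -10..+10

def noisy_outcome(x, rng):
    """Vertical dot coordinate on a noisy trial for response x."""
    return training_function(x) + noise_deviate(rng)

rng = random.Random(0)  # arbitrary seed, for reproducibility only
deviates = [noise_deviate(rng) for _ in range(150)]  # one subject's sequence
```

Because the deviates never exceed 10 in absolute value, a response near the optimum (vertical coordinate near 70) keeps the dot well inside the 0 to 100 viewing area.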
The two goal lines form the same angle (11.25°) with the training function, and they intersect it at approximately the same point, i.e., x ≈ 30, y ≈ 70 (see Fig. 1).
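The geometry stated here can be checked directly from Eqs. (1), (3), and (4). The short Python sketch below is only a worked check; the helper names are hypothetical, and the angle computation assumes equal scaling on both axes of the square display.

```python
import math

def intersection(a1, b1, a2, b2):
    """Intersection of the lines y = a1 + b1*x and y = a2 + b2*x."""
    x = (a2 - a1) / (b1 - b2)
    return x, a1 + b1 * x

def angle_between(b1, b2):
    """Angle in degrees between two lines with slopes b1 and b2."""
    return abs(math.degrees(math.atan(b1) - math.atan(b2)))

TRAIN = (90.0, -0.67)            # Eq. (1): intercept, slope
CONVERGENT_GOAL = (100.0, -1.0)  # Eq. (3)
DIVERGENT_GOAL = (82.0, -0.41)   # Eq. (4)

conv_point = intersection(*TRAIN, *CONVERGENT_GOAL)
div_point = intersection(*TRAIN, *DIVERGENT_GOAL)
conv_angle = angle_between(TRAIN[1], CONVERGENT_GOAL[1])
div_angle = angle_between(TRAIN[1], DIVERGENT_GOAL[1])
```

Both intersections fall within about one unit of (30, 70), and both angles lie within half a degree of 11.25°, so the two schedules differ in the orientation of the goal line and hardly at all in the location of the optimal response.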
MODEL PREDICTIONS

The General Linear Model

The general linear model (Rouanet & Rosenberg, 1964) states that

$$ z_{n+1} = a_n z_n + b_n x_n + c_n y_n, \tag{5} $$

where $x_n$ = actual response on trial n, $y_n$ = reinforced response on trial n, and $z_n$ is a hypothetical variable associated with trial n whose value is $x_n$ plus an error component with an expected value of zero. Thus, $Ez_n = Ex_n$. The quantities $a_n$, $b_n$, and $c_n$ are all hypothetical random variables whose sum on any trial n is equal to one.

In the present study, the reinforced response $y_n$ can be expressed in terms of the actual response $x_n$. Substituting the training function (Eq. (1)) into the goal-line function for each reinforcement schedule (Eqs. (3) and (4)) yields the value of $y_n$ in terms of $x_n$. Thus,

$$ y_n = 100 - (90 - .67x_n) + (\text{noise, if any}) \quad \text{(convergent)} $$

and

$$ y_n = (82 - (90 - .67x_n))/.41 + (\text{noise, if any}) \quad \text{(divergent)}. $$

Substituting each of these expressions into Eq. (5) and taking expected values yields the following two difference equations

$$ Ex_{n+1} = (1 - .33c)\,Ex_n + 10c \quad \text{(convergent)} \tag{6} $$

$$ Ex_{n+1} = (1 + 26c/41)\,Ex_n - 8c/.41 \quad \text{(divergent)}. \tag{7} $$

Since the coefficient of $Ex_n$ is between 0 and 1 in the convergent condition, $Ex_n$ approaches the value of x (≈ 30) at the intersection of the training function and goal lines; on the other hand, since the coefficient of $Ex_n$ is greater than 1 in the divergent conditions, $Ex_n$ diverges to +∞ or -∞ depending on its initial value (Goldberg, 1958). This result also can be seen using Fig. 1. That is, the reinforced response is defined as the one that would place the dot on the goal line if the vertical coordinate of the dot were to remain unchanged. Thus, on trial n + 1 the subject “moves” her response away from her old response and toward the reinforced one. It is clear from inspection of Fig. 1 that this would lead her closer to the goal line in the convergent conditions, but away from it in the divergent ones. Hence, the general linear model makes qualitatively different predictions for these two conditions.
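The qualitative difference between the two schedules can be illustrated by iterating the expected-value recursion numerically. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the learning rate c = 0.3 and the starting values are arbitrary, noise is omitted, and the physical limits of the 0 to 100 dial are ignored so that divergence is visible.

```python
def expected_response_path(x0, c, schedule, n_trials=60):
    """Iterate Ex_{n+1} = (1 - c) Ex_n + c * (reinforced response)."""
    path = [x0]
    for _ in range(n_trials):
        x = path[-1]
        outcome = 90.0 - 0.67 * x              # training function, Eq. (1)
        if schedule == "convergent":
            reinforced = 100.0 - outcome       # goal line y = 100 - x solved for x
        elif schedule == "divergent":
            reinforced = (82.0 - outcome) / 0.41  # goal line y = 82 - .41x solved for x
        else:
            raise ValueError("schedule must be 'convergent' or 'divergent'")
        path.append((1.0 - c) * x + c * reinforced)
    return path

conv_path = expected_response_path(50.0, 0.3, "convergent")
div_path_high = expected_response_path(50.0, 0.3, "divergent")
div_path_low = expected_response_path(20.0, 0.3, "divergent")
```

With a positive c, the convergent path settles near x ≈ 30, while the divergent paths move steadily away from the same point, upward when started above it and downward when started below it.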
The Functional Learning Model

The functional learning model assumes that human subjects have the capacity to learn certain relatively “simple” functional relations between two sets of scaled variables. The set of variables comprising the “arguments” of these functions may correspond to stimuli, while the domain set may correspond to responses. In its most general form this model simply postulates a form of S-R theory, where the mapping of S onto R is continuous rather than discrete, which, of course, requires that S and R themselves be continuous rather than discrete.

To make the model more concrete, we outline the specific formulation advanced by Carroll (1963). The most general functional learning model assumes

$$ R = F(p_0, p_1, \ldots, p_m; S), $$

where $p_0, p_1, \ldots, p_m$ are parameters whose values define special cases of this general function, S is the stimulus variable, and R is the response variable (which, for present purposes, are both assumed to be unidimensional). The process of learning, in this model, consists of assigning values to these parameters, presumably by implementation of some sort of “subjective hill climbing” or optimization procedure. This process requires, of course, the existence of feedback defining the criterion, or “loss function,” for the optimization procedure. In cases where both S and R are overt events this might constitute some form of environmentally defined feedback. The basketball player, who makes both the angle and force of his throw continuous functions of perceived distance from the basket, gets feedback in the form of hits and misses (or of nearness of “near misses”) and presumably adjusts parameters of his S-R functions based on this feedback.

In the case where either S, R, or both are internal representations of external events, we might suppose a different form of feedback. This is particularly so when R is in some sense a “reinforced” response, one that is “supposed to go” with that stimulus with which it has been paired (as in paired associate learning). In such cases, it may be more appropriate to speak of the learning process as involving the relation between two sets of events, one of which is treated as the predecessor set and the other as the successor set. In such cases the “S-R mapping” could be viewed as producing a predicted response (or outcome) which is compared with the actually observed outcome, with some measure of the discrepancy between the two providing the feedback.

In the present situation, it will simplify matters if we define the “response” or “outcome” to be, not the vertical coordinate of the dot per se, but the signed vertical distance of the dot from the goal line (which is, in the present case, a linear function of the dot’s vertical coordinate).
The subject is assumed to be motivated to “learn” the relation between the horizontal coordinate of the dot, which plays the role of S (although it is in fact the result of a
response, setting the dial, on the part of the subject), and this outcome. Presumably, the subject adjusts parameters of an internally defined function until this “response” can be adequately predicted, that is, so that the overall discrepancy (as measured by the subjective loss function) between the predicted and observed responses is minimized. This process might involve use of memory to store representations of some previous S-R pairings, so that at any stage the loss function defining the overall measure of discrepancy might combine the discrepancy in the current “predicted” response and some composite measure of discrepancy for the memorial representations of previous S-R pairs encountered.

One important aspect of the functional learning model, which was confirmed by Carroll, is its prediction of interpolation and extrapolation of a learned S-R functional relation to values of S not previously encountered. This is an important difference between this model and classical S-R theory, as it implies learning of S-R pairings not previously encountered. In particular, the occurrence of extrapolation cannot be derived from principles of stimulus and response generalization.

Carroll (1963) proposed a special case of this general model in which the functional form was of the more special type of linear functionals, defined as

$$ R = p_0 + p_1 f_1(S) + p_2 f_2(S) + \cdots + p_m f_m(S), $$

that is, a linear combination of simple functions of S only, with the parameters $p_1, \ldots, p_m$ constituting the coefficients of the linear combination and $p_0$ serving as an additive constant. Carroll was able to show that functions of this form accounted for his data quite well, with m = 3, and the functions $f_1$, $f_2$, and $f_3$ being essentially equivalent to the first three “orthogonal polynomials” (linear, quadratic, and cubic, respectively). Stated differently, Carroll’s results indicate that to at least a very good degree of approximation, we can assume these functions to be of the form $R = p_0 + p_1 x + p_2 x^2 + p_3 x^3$ (i.e., polynomials of degree 3 or less). Carroll (1963) showed that linear functions were learned especially well and rapidly. This is relatively consistent with findings that will be discussed later. In the present study, the “S-R functions” were restricted to such linear functions.

We assume the subject starts with an initial vector of parameter values $\mathbf{p}^0 = (p_0^0, p_1^0, \ldots, p_m^0)$, and alters this vector iteratively by the “subjective hill climbing” (shc) procedure alluded to earlier. We merely assume that, asymptotically (as n grows large), $\mathbf{p}^n \to \mathbf{p}^*$, where $\mathbf{p}^n$ is the vector of parameter values on the nth trial or iteration, while $\mathbf{p}^*$ is the vector of parameter values optimizing the subjective loss function. Despite the vagueness of definition of both the shc procedure and of the loss function on which it operates, it seems reasonable to assume that, if the veridical S-R relation is described by a function within the admissible family defined by F, with parameter
vector $\mathbf{p}$, then $\mathbf{p}^n \to \mathbf{p}^* = \mathbf{p}$. It is also assumed that if the veridical S-R relation is approximately given by a function in the admissible family, again with parameter vector $\mathbf{p}$, then $\mathbf{p}^n \to \mathbf{p}^* \approx \mathbf{p}$. The word “approximately” and the relation “≈” will not be defined more explicitly at present, but could be given quite reasonable operational definitions, if necessary. Since, as we have seen, linear functions are clearly within the admissible family F, we would predict that the subjects in the present experiment should (asymptotically) “learn” something very close to this veridical relation.

Finally, how does the subject use whatever relation has been learned to produce a response (a dial setting)? Remembering that the subject’s overt response controls the horizontal dot coordinate, and that we are assuming the subject “knows” this R-S relationship (and so can, for a given value S, produce the response $R_S$ that would yield that value of S), we merely have to postulate that the subject produces a response $R_n = R_{S_n}$, where $S_n$ is defined as that stimulus value such that $F(p_0^n, p_1^n, \ldots, p_m^n; S_n) = 0$. That is, in words, the response on the nth trial is the response that produces a horizontal coordinate of the dot which, according to the subject’s “current” subjective S-R function, should produce a zero vertical discrepancy between the dot and the goal line. If, as we have argued earlier, the subject’s subjective function asymptotically approaches the “true” function or a function very “close” to it, the subject’s response should also asymptotically approach the “correct” response, or a “close” approximation. (We are ignoring, for the present, the possibility of the subject’s learning a nonmonotonic function. This could cause difficulties, but does not seem to occur in the present situation.)
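A minimal simulation may help make the response rule concrete. The sketch below is not the authors' formulation: the "subjective hill climbing" procedure is left unspecified in the text, so ordinary least squares over all remembered (S, discrepancy) pairs is swapped in as a stand-in, and the five initial exploratory trials, the function names, and the noise sampler are all assumptions made for illustration.

```python
import random

def discrepancy(s, schedule, noise=0.0):
    """Signed vertical distance of the dot from the goal line at response s."""
    dot = 90.0 - 0.67 * s + noise
    goal = 100.0 - s if schedule == "convergent" else 82.0 - 0.41 * s
    return dot - goal

def fit_line(pairs):
    """Least-squares intercept p0 and slope p1 for remembered (S, R) pairs."""
    n = len(pairs)
    mean_s = sum(s for s, _ in pairs) / n
    mean_r = sum(r for _, r in pairs) / n
    sxx = sum((s - mean_s) ** 2 for s, _ in pairs)
    if sxx == 0.0:
        return mean_r, 0.0
    p1 = sum((s - mean_s) * (r - mean_r) for s, r in pairs) / sxx
    return mean_r - p1 * mean_s, p1

def simulate(schedule, n_trials=150, noisy=True, seed=0):
    """Respond at the root of the currently fitted linear S-R function."""
    rng = random.Random(seed)
    memory, responses = [], []
    for trial in range(n_trials):
        if trial < 5:
            s = rng.uniform(0.0, 100.0)          # assumed initial exploration
        else:
            p0, p1 = fit_line(memory)
            s = -p0 / p1 if p1 != 0.0 else rng.uniform(0.0, 100.0)
            s = min(100.0, max(0.0, s))          # keep the response on the dial
        noise = 20.0 * (rng.betavariate(2.0, 2.0) - 0.5) if noisy else 0.0
        memory.append((s, discrepancy(s, schedule, noise)))
        responses.append(s)
    return responses
```

Under these assumptions the simulated subject locates the optimal response in both schedules once the linear relation has been fitted, which is the sense in which the functional learning model predicts convergence regardless of the orientation of the goal line.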
RESULTS AND DISCUSSION

Since the noiseless and noisy conditions differed with respect to the possibility of perfect asymptotic behavior, it was decided to analyze convergence differently in the two conditions.

Convergence in the Noiseless Conditions

In the noiseless conditions, where it was possible for the subject to obtain a correct outcome on any trial by giving the optimal response, the criterion of convergence was simply the number of trials before the subject gave 18 out of 20 correct responses. Since both the dot and the goal line have finite breadth, the locus of dot positions interpreted by the subject as being a “hit” (dot touching line) is a narrow band around the goal line. In the apparatus used in this study, this band was determined to be approximately ±5/√2 units perpendicular to the goal line. On the basis of this value, the goal line, and the training lines used in this study, a correct response was defined as any value ±5 response units from the optimal response (defined as the response value that placed the dot in the middle of the goal line when no noise is added).

Table 1 summarizes the number of trials to criterion for subjects in the convergent and divergent conditions-noiseless case. As can be seen, all but one subject reached the learning criterion within 150 trials. Moreover, although the median number of trials to criterion is smaller in the convergent condition, a median test indicated no significant difference between conditions. This conclusion is reinforced by the more sensitive Mann-Whitney U test (U = 134, n1 = n2 = 18, p > .10).

The fact that all subjects in the divergent condition locate and maintain the optimal response (although their rates of convergence may be somewhat slower) is evidence against the linear model. The main qualification to this conclusion, as already noted in the introduction, is that without noise the subject may accidentally find the optimal response (±5) and then reproduce it with sufficient accuracy (which is easy to do on the numbered response dial) to preclude the divergent effects postulated in the model. The analyses that follow are directed, therefore, to a comparison of convergence in the two noisy conditions.

TABLE 1
Number of Trials to Learning Criterion in the Convergent and Divergent Conditions-Noiseless Case

Rank order of subjects    Convergent    Divergent
within condition          condition     condition
        1                     24            23
        2                     24            24
        3                     24            25
        4                     25            25
        5                     25            26
        6                     25            34
        7                     26            35
        8                     26            37
        9                     26            39
       10                     28            41
       11                     30            41
       12                     38            43
       13                     47            44
       14                     51            52
       15                     55            59
       16                     60            73
       17                     77            81
       18                   >150           103
    Median                    27            40
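The Mann-Whitney statistic reported for the noiseless conditions can be recomputed from the trials-to-criterion values in Table 1. The Python sketch below is a check under one assumption: the single ">150" entry is coded as 151, which is harmless because only its rank matters. The function name is hypothetical.

```python
import statistics

# Trials to criterion from Table 1; ">150" is coded as 151 (assumption).
CONVERGENT = [24, 24, 24, 25, 25, 25, 26, 26, 26, 28, 30, 38,
              47, 51, 55, 60, 77, 151]
DIVERGENT = [23, 24, 25, 25, 26, 34, 35, 37, 39, 41, 41, 43,
             44, 52, 59, 73, 81, 103]

def mann_whitney_u(a, b):
    """Smaller of the two Mann-Whitney U statistics, counting ties as 1/2."""
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)

u = mann_whitney_u(CONVERGENT, DIVERGENT)
median_convergent = statistics.median(CONVERGENT)
median_divergent = statistics.median(DIVERGENT)
```

Computed this way, the smaller U equals 134 and the medians equal 27 and 40, in agreement with the values given in the text and in Table 1.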
Convergence in the Noisy Conditions

The random error added to the outcome in the noisy conditions results in some incorrect outcomes no matter how close the subject’s response is to the optimal value. Moreover, to the extent that response variance is likely to match error variance, the convergence criterion used for the noiseless conditions is much too stringent. Instead of specifying any particular convergence criterion for the noisy conditions, comparisons were made between them in learning rate as well as “asymptotic” behavior. Two response measures were calculated for each subject in each successive block of 10 trials: (1) the difference between the response mean and the optimal response
[Figure 2: horizontal axis TRIAL BLOCK, 1 to 15; vertical axis running from -10 to +30.]
FIG. 2. Plots of average response means for each of 15 trial blocks for the two “noisy” conditions, with ±1 standard deviation bounds placed around each. Solid lines are for the convergent, dashed lines for the divergent condition.
value, denoted d̄; (2) response variance, denoted s². The average values of d̄ and the geometric mean of s (response standard deviation) are plotted in Figs. 2 and 3, respectively. In Fig. 2, the ±1 standard deviation (across subjects within the trial block) bounds for d̄ are also plotted for the two conditions. These bounds are considerably more conservative than would be provided by the .05 confidence bounds in each case, which, of course, would be closer to ±2 s.d. units about each mean d̄. It should be noted that the information contained in these bounds is different from that provided in Fig. 3, which shows the geometric mean (over subjects) of the standard deviation within each subject (and within trial block). These two figures appear to support these general observations: (1) a tendency toward convergence in both conditions, as indicated by a downward trend in both measures; (2) a slower rate of convergence in the divergent than in the convergent condition; (3) considerable overlap of the two conditions, as exhibited by the fact that the ±1 s.d. bounds for either group tend to contain the d̄ curve for the other group. Analyses of variance confirmed the trial-related effects observed in Figs. 2 and 3.
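The two block measures are simple to compute from a subject's response sequence. The Python sketch below shows one way to do so; the function names are hypothetical, and the use of the sample (n - 1) variance is an assumption, since the text does not say which variance estimator was used.

```python
import math
import statistics

def block_measures(responses, optimal, block_size=10):
    """Per-block (d, s2): block mean minus the optimal response, and variance."""
    measures = []
    for start in range(0, len(responses) - block_size + 1, block_size):
        block = responses[start:start + block_size]
        d = statistics.mean(block) - optimal
        s2 = statistics.variance(block)  # sample variance (assumption)
        measures.append((d, s2))
    return measures

def geometric_mean_sd(variances):
    """Geometric mean over subjects of within-block standard deviations.

    Every variance must be positive, since logarithms are taken.
    """
    logs = [0.5 * math.log(v) for v in variances]
    return math.exp(sum(logs) / len(logs))

# Toy sequence of 20 responses (two blocks), optimal response taken as 30.
toy = list(range(20, 30)) + list(range(25, 35))
toy_measures = block_measures(toy, optimal=30.0)
```

A full session of 150 trials yields 15 such blocks per subject; averaging d across subjects, and taking the geometric mean of s across subjects, gives the kind of curves plotted in Figs. 2 and 3.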
[Figure 3: horizontal axis TRIAL BLOCK, 1 to 15.]
FIG. 3. Geometric means across subjects of standard deviations of responses within subject and within trial block, for each of the 15 trial blocks of the two “noisy” conditions. Solid lines are for the convergent condition, dotted lines for the divergent condition.
In order for the linear model to account for learning in both convergent and divergent conditions, it is necessary that the learning-rate parameter c be positive in the convergent condition and negative in the divergent condition. Estimation of c shows this to be the case: Estimated c is ≈ 0.5 in the convergent condition and ≈ -0.2 in the divergent condition. This difference, and also the signs of the c’s in the two conditions, are highly statistically reliable, as indicated by the fact that c’s estimated within the trial blocks were negative in all trial blocks in the divergent condition. The difference in absolute value is also consistent with the observed difference between the two conditions in learning rate. Nevertheless, the value of the linear model is severely limited in the absence of a theoretical rationale that describes the relation between c and reinforcement schedule.

The functional learning model, on the other hand, has the advantage that it is not necessary to adjust the value of a learning parameter ad hoc according to the reinforcement schedule. This model is also capable of accounting for the learning by the subject of the entire linear relationship between response and outcome and not just the correct response (see next section). However, the functional learning model is also limited to the extent that there is a reliable difference between the two conditions. If all that were involved in the learning process is the subject’s learning of a functional relationship, no differences would exist between the two conditions. It may be that subjects in the divergent condition gradually switch to the more cognitive learning process (functional learning) when they perceive that approaching the response value reinforced on the previous trial does not work. This could result in a slower learning rate for the divergent group but in the same asymptotic behavior as the convergent group.
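The link between the estimated c values and the observed learning rates can be made explicit by inserting the estimates into the coefficients of Eqs. (6) and (7). The Python sketch below is only a worked arithmetic check; the function name is hypothetical, and the rounded estimates 0.5 and -0.2 are taken from the text.

```python
def convergence_coefficient(c, schedule):
    """Coefficient of Ex_n in the difference equations (6) and (7)."""
    if schedule == "convergent":
        return 1.0 - 0.33 * c
    if schedule == "divergent":
        return 1.0 + 26.0 * c / 41.0
    raise ValueError("schedule must be 'convergent' or 'divergent'")

conv_coef = convergence_coefficient(0.5, "convergent")   # about 0.835
div_coef = convergence_coefficient(-0.2, "divergent")    # about 0.873
```

Both coefficients lie between 0 and 1, so both recursions converge, and the divergent coefficient is the one closer to 1, which corresponds to the slower approach to the optimal response seen in that condition.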
The two groups are apparently approaching the same d̄ and s², as already noted. A comparison of the two groups on the last 10 trials with a Mann-Whitney U test showed no statistically reliable differences. For d̄, U = 230.5 (n1 = n2 = 22, p > .5). For s², U = 176.5 (n1 = n2 = 22, p > .10).

Analysis of “questionnaire” data. The data from the cards given subjects at the close of the experiment were analyzed with a view to determining the extent to which subjects had “learned” the linear function underlying the relation between horizontal and vertical position of the dot (or, alternatively, between the subject’s response and vertical position). Since many subjects in the “noiseless” conditions found the solution very rapidly, whether by fortuitous choice of the optimal response (±5) or by processes postulated in either of the models described here, these subjects were not necessarily expected to have learned the relationship with any degree of precision. Therefore, only the questionnaire data for subjects in the noisy conditions will be discussed here.

There were two questions of interest. First, did subjects learn any linear relation? Second, did subjects tend to learn the correct linear relation? The first question was answered by reference to the correlations between horizontal and vertical placement
of the dot on these cards for each subject. The second question was answered by reference to the slopes of the estimated linear functions. The intercept was not considered of relevance because the exact position of the subject’s head and a slight drift in the apparatus affecting vertical placement of the dot might make the effective intercept different for different subjects. Table 2 shows the value of these two indices
TABLE 2
Correlations and Slopes of Regression Lines from Questionnaire Data
Columns: Subject number, r, Slope (convergent condition); Subject number, r, Slope (divergent condition)
C-l
-.990
***
-0.963
D-l
-.723
***
- 0.408
c-2
-.998
***
-0.982
-.946
***
-0.762
c-3
-.894
***
-.957
***
-0.699
c-4
+.934
***
-.941
***
-0.525
C-5
--.930
***
-.961
***
C-6
--.978
***
c-7
--.944
***
-0.723 +0.735 -0.848 -0.941 -0.943
.066 -.969 ***
iO.067 -0.909
D-2 D-3 D-4 D-5 D-6 D-7 D-8 D-9
-0.057
D-10
-.673
-0.384
D-11
-.670***
-0.503
-0.956
D-12
-0.291
D-13”
-.864"** -
-0.803 -
c-14
- .080 -.707 *** -.989 *** - .260 -.457 **
-0.429
D-14
-.937
***
-0.741
c-15
- .332 *
-0.162
D-15
-.912
***
-0.491
C-16
-.930
***
-0.866
-.870***
c-17
-.994
***
- 0.989
D-16 D-17
C-18
-.965
***
-0.818
D-18
-.953
***
-0.747
c-19 c-20
.007 -.977 ***
+0.007 -0.783
D-19
-0.812
D-20
-.846*** -.937 ***
c-21
--.873
***
-0.711
D-21
-.854
***
-0.673 -0.628
c-22
-.602
***
-0.399
D-22
-.791
***
-0.531
C-8 c-9 c-10 C-l 1 c-12 c-13
0 Subject D-13 failed *p < .05. **p < .Ol. ***p < .OOl.
to complete
the questionnaire.
p.158
-0.894 -0.120
-.954
***
-.926
***
-0.703
-.779
***
-0.646
***
-0.421
-0.724
-0.602
-.852***
- 0.230
LEARNING
ON
A RESPONSE
CONTINUUM
115
for subjects in the two conditions, together with associated significance levels for the correlation coefficients. In the noisy convergent condition, 18 of the 22 correlations are statistically significant at the .05 level, and of these, 15 are significant beyond the .OOl level. Taken as a whole, it is clear that these subjects have “learned” a linear relation between horizontal and vertical dot placement or, to be precise, are exhibiting this relation in their questionnaire responses. Of the significant correlations, all but one have the correct sign (negative). Interestingly, the one significant positive correlation was highly significant (p < .OOl), so that it is highly unlikely that this could have arisen by chance. By contrast, none of the correlations in the divergent condition are in the wrong direction. Moreover, all but one subject in the divergent condition exhibited correlation coefficients significant beyond the .OOl level. These values ranged from -.670 to -.957, indicating a generally extremely high level of relation. Thus, there would appear to be a generally higher degree of relatedness between horizontal and vertical coordinates of subject’s “maps” in the divergent condition than in the convergent condition. This is consistent with the notion that subjects in the divergent condition are more likely than subjects in the convergent condition to rely on the strategy embodied in the functional learning model. As for the data on slopes of the regression line for each subject, also shown in Table 2, the interest here is in whether or not these tend toward the slope of the training line. Since it does not seem reasonable to assume these regression coefficients to be normally distributed over subjects, a sign test was chosen to determine whether these coefficients differ significantly from -.67, which was the slope of the training line for both conditions. The results were all nonsignificant. 
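The two indices reported for each subject, the correlation and the regression slope of vertical on horizontal dot placement, can be computed as in the following sketch. The card positions are hypothetical (ten placements scattered around a line of slope -.67); the number of cards per subject and the actual placements are not given here, so the data and the function names are assumptions made for illustration only.

```python
import math


def correlation_and_slope(xs, ys):
    """Return (Pearson r, least-squares slope of y on x) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy), sxy / sxx


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical "map" for one subject: horizontal positions and the vertical
# positions the subject marked, scattered around a line with slope -0.67.
horizontal = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scatter = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]
vertical = [10.0 - 0.67 * x + e for x, e in zip(horizontal, scatter)]

r, slope = correlation_and_slope(horizontal, vertical)
t = t_statistic(r, len(horizontal))

if __name__ == "__main__":
    print(f"r = {r:.3f}, slope = {slope:.3f}, t = {t:.2f}")
```

The significance level attached to each r in Table 2 would follow from comparing such a t value with the t distribution on n - 2 degrees of freedom; the intercept is deliberately ignored, as in the text.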
One might suppose that some subjects misunderstood the instructions and were in fact “mapping out” the goal line rather than the training line. Since the training line was fairly close in slope to the goal line, the two may be hard to discriminate. However, it turns out that we can reject the hypothesis that the median slope equals that of the goal line in both conditions, by a sign test (p < .001 in both cases). It is the case, however, that in both conditions the median slope tends somewhat toward that of the goal line, which suggests that at least some subjects may have been so misinterpreting the instructions. The actual median slopes for the two conditions were -.800 and -.640, for the convergent and divergent conditions, respectively.
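The sign tests on the slopes can be sketched as an exact two-sided binomial test of a hypothesized median. The ten slopes below are hypothetical, and the goal-line slope of -1.0 is an assumed value chosen only for illustration, since the goal-line slope is not stated in this section; only the training-line slope of -.67 comes from the text.

```python
from math import comb


def sign_test(values, hypothesized_median):
    """Exact two-sided sign test; values equal to the hypothesized median are dropped.

    Returns (count above, count below, two-sided p value).
    """
    above = sum(1 for v in values if v > hypothesized_median)
    below = sum(1 for v in values if v < hypothesized_median)
    n = above + below
    k = min(above, below)
    # Probability of k or fewer "successes" under Binomial(n, 1/2), doubled.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return above, below, min(1.0, 2.0 * tail)


# Hypothetical per-subject regression slopes (not the Table 2 values).
slopes = [-0.95, -0.80, -0.72, -0.70, -0.66, -0.64, -0.60, -0.55, -0.50, -0.45]

TRAINING_SLOPE = -0.67   # slope of the training line, from the text
ASSUMED_GOAL_SLOPE = -1.0  # hypothetical; the goal-line slope is an assumption here

training_result = sign_test(slopes, TRAINING_SLOPE)
goal_result = sign_test(slopes, ASSUMED_GOAL_SLOPE)

if __name__ == "__main__":
    print("vs training line:", training_result)
    print("vs assumed goal line:", goal_result)
```

With these illustrative slopes the test against the training line is nonsignificant while the test against the assumed goal line rejects, which is the pattern of results described in the text. The sign test is used because it makes no normality assumption about the distribution of slopes over subjects.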
CONCLUSIONS
The present experiment has not resulted in unambiguously rejecting either model in favor of the other. On the one hand, the prediction of the linear change model (at least if ε is assumed to be fixed) of convergence in one case but not in the other has not been sustained. The data indicate convergence in both cases. On the other hand,
there is some difference in rate of convergence, favoring the convergent over the divergent condition, contrary to the prediction derived from the functional learning model that convergence should be equally fast in both conditions. The functional learning model is supported by the questionnaire results showing that most subjects (particularly in the divergent condition) did learn a linear relation close to the veridical one. At least some of the subjects in the convergent condition apparently did not learn such a relation at all, however, and one subject learned a linear relation of slope opposite in sign to the correct one. Taken as a whole the data seem to be most consistent with what might be called a “hierarchy of cognitive strategies” hypothesis. It might be speculated that in solving such problems subjects begin with simpler cognitive strategies, and progress to more complex strategies only when the simpler ones fail to solve the problem at hand. In the present case, the strategy implied by the linear change model could be viewed as a simpler one that is tried first, and the functional learning strategy as a more complex one. Subjects in the convergent condition can solve the problem, or at least attain a close approximation to the optimum solution, by using the simpler linear change strategy. But in the convergent noisy condition, a more adequate solution can be obtained by shifting to a functional learning approach, which many of these subjects may have done, as evidenced by the questionnaire results. In the divergent conditions, a functional learning strategy appears to be a necessary condition for attaining even an approximate solution to the problem, a fact consistent with the observation that all subjects in this condition seem to have learned the linear relation defined by the training line, as well as the fact that convergence does occur in this condition.
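The “hierarchy of cognitive strategies” idea can be made concrete with a toy simulation. Everything in this sketch is an assumption introduced for illustration: the noiseless schedule y = optimum + k(r - optimum), the rule that a subject abandons the linear change strategy after the discrepancy between response and reinforced response has grown on three consecutive trials (`patience`), and the treatment of functional learning as a least-squares line fit followed by solving for the response at which the fitted outcome equals the response. None of these details is specified by the hypothesis itself.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def run_hierarchy(k, start=25.0, optimum=15.0, epsilon=0.5, trials=20, patience=3):
    """Simulate a subject who tries linear change first, then functional learning.

    Returns (list of responses, trial index of the strategy switch or None).
    """
    responses, outcomes = [start], []
    strategy = "linear change"
    switch_trial = None
    for trial in range(trials):
        r = responses[-1]
        y = optimum + k * (r - optimum)  # hypothetical noiseless schedule
        outcomes.append(y)
        errors = [abs(o - x) for x, o in zip(responses, outcomes)]
        # The simple strategy "fails" if the discrepancy grew on each of the
        # last `patience` trials.
        failing = len(errors) > patience and all(
            errors[-i] > errors[-i - 1] for i in range(1, patience + 1)
        )
        if strategy == "linear change" and failing:
            strategy = "functional learning"
            switch_trial = trial
        if strategy == "linear change":
            next_response = r + epsilon * (y - r)
        else:
            intercept, slope = fit_line(responses, outcomes)
            # Response at which the fitted outcome equals the response itself.
            next_response = r if abs(1.0 - slope) < 1e-12 else intercept / (1.0 - slope)
        responses.append(next_response)
    return responses, switch_trial


convergent_responses, convergent_switch = run_hierarchy(k=0.5)
divergent_responses, divergent_switch = run_hierarchy(k=1.5)

if __name__ == "__main__":
    print("convergent: switch at", convergent_switch, "final", convergent_responses[-1])
    print("divergent:  switch at", divergent_switch, "final", divergent_responses[-1])
```

In this toy version the simulated convergent subject never needs to switch, while the simulated divergent subject drifts away from the optimum for a few trials, switches, and then reaches it. That is only a qualitative analogue of the slower convergence but equal asymptote described above, not a fitted model of the data.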
This idea of a hierarchy of cognitive strategies is quite consistent with the idea of a hierarchy of hypotheses about functional relations first suggested by Bjorkman (1965b) and spelled out in more detail by Brehmer (1974). (See also Brehmer, Kuylenstierna, & Liljergren, 1974.) Brehmer’s formulation explicitly postulates a hierarchy ranging from linear functions with positive slope, linear functions with negative slope, to two kinds of essentially quadratic curvilinear functions (those having a maximum versus those having a minimum), and finally (presumably) to certain classes of more complex nonlinear functions (e.g., cubic, or periodic functions). He bases this in large part on evidence by Bjorkman and Brehmer, indicating that this is the order of difficulty of the functions. (Carroll, 1963, had found no difference in efficiency of learning linear functions with positive versus negative slopes, but did find clear evidence that linear functions are learned much better than quadratic.) Our findings are highly consistent with this notion, particularly since it would be impossible to distinguish empirically between the learning of a linear function with positive slope and the linear change model (with positive ε), since the two would lead to identical predictions. Thus, the linear change model could be viewed as simply resulting from the first phase in a hierarchically organized hypothesis testing strategy, in which the hypothesis is that the S-R function is linear with positive slope.
The exact nature and organization of this hypothesized hierarchy of cognitive strategies, the detailed microstructure of each specific strategy within the hierarchy and the mechanics of the process of shifting between strategies within the hierarchy, would appear to offer quite promising ground for fruitful future research. The functional learning strategy, in particular, is one whose characteristics deserve wider exploration, including cases where nonlinear functions or functions of more than one variable are involved. The type of task employed in this study appears to be a useful one for such investigations.
REFERENCES

ANDERSON, N. H. Two learning models for responses measured on a continuous scale. Psychometrika, 1961, 26, 391-403.
BJORKMAN, M. Learning of linear functions: A comparison between a positive and a negative slope. Report Psychology Laboratory, Univ. of Stockholm, No. 183, 1965. (a)
BJORKMAN, M. Studies in prediction behavior: Explorations into predictive judgments based on functional learning and defined by estimation, categorization, and choice. Scandinavian Journal of Psychology, 1965, 6, 129-156. (b)
BREHMER, B. Subjects’ ability to use functional rules. Psychonomic Science, 1971, 24, 259-260.
BREHMER, B. Effects of cue validity and interpersonal learning of inference tasks with linear and nonlinear cues. American Journal of Psychology, 1973, 86, 29-48.
BREHMER, B. Hypotheses about relations between scaled variables in the learning of probabilistic inference tasks. Organizational Behavior and Human Performance, 1974, 11, 1-27.
BREHMER, B., KUYLENSTIERNA, J., & LILJERGREN, J. Effects of function form and cue validity on the subject’s hypotheses in probabilistic inference tasks. Organizational Behavior and Human Performance, 1974, 11, 338-354.
CARROLL, J. D. Functional learning: The learning of continuous functional mappings relating stimulus and response continua. Educational Testing Service Research Bulletin, RB-63-26, 1963.
DEKLERK, L. F. W., DELEEUW, J., & OPPE, S. Functional learning investigations with real valued functions. Psychological Institute, Univ. of Leiden, No. E024-70, 1970.
ESTES, W. K. Research and theory on the learning of probabilities. Journal of the American Statistical Association, 1972, 67, 81-102.
GOLDBERG, S. Introduction to difference equations. New York: Wiley, 1958.
GRAY, C. W. Predicting with intuitive correlations. Psychonomic Science, 1968, 11, 81-102.
HAMMOND, K. R. Inductive knowing. In J. Royce & W. Rozeboom (Eds.), The psychology of knowing. New York: Gordon and Breach, 1972.
ROSENBERG, S. Two person interactions in a continuous-response task. In J. Criswell, H. Solomon, & P. Suppes (Eds.), Mathematical methods in small group processes. Stanford: Stanford Univ. Press, 1962.
ROSENBERG, S. Behavior in a continuous-response task with noncontingent reinforcement. Journal of Experimental Psychology, 1963, 66, 168-176.
ROSENBERG, S. Mathematical models of social behavior. In G. Lindzey and E. Aronson (Eds.), The handbook of social psychology. Reading, Mass.: Addison-Wesley, 1968. Vol. I, revised ed.
ROUANET, H., & ROSENBERG, S. Stochastic models for the response continuum in a determinate situation: Comparisons and extensions. Journal of Mathematical Psychology, 1964, 1, 215-232.
SMEDSLUND, J. Multiple-probability learning. Oslo: Academisk-Forlag, 1958.
SUPPES, P. A linear model for a continuum of responses. In R. R. Bush & W. K. Estes (Eds.), Studies in mathematical learning theory. Stanford: Stanford Univ. Press, 1959.
SUPPES, P. Stimulus sampling theory for a continuum of responses. In K. J. Arrow, S. Karlin, & P. Suppes (Eds.), Mathematical methods in the social sciences, 1960. Stanford: Stanford Univ. Press, 1960.
SUPPES, P., & FRANKMANN, R. W. Test of stimulus sampling theory for a continuum of responses with unimodal noncontingent determinate reinforcement. Journal of Experimental Psychology, 1961, 61, 122-132.

RECEIVED: October 22, 1974