Sequential Item Selection: Optimal and Heuristic Policies

S. P. MARSHALL

University of California, Santa Barbara

JOURNAL OF MATHEMATICAL PSYCHOLOGY 23, 134-152 (1981)

Address reprint requests to Dr. S. P. Marshall at: Department of Psychology, University of California, Santa Barbara, Calif. 93106.

This paper considers testing in which the goal is to minimize the number of test items required to establish a learner’s state of ability. Focus is on optimal or near optimal selection over a well-defined universe of items or stimuli. Selection policies are determined for the case in which the items have hierarchical or partial hierarchical relationships. Derivation of an optimal policy rests upon techniques from dynamic programming. For situations in which an optimal policy may be too costly to compute, two heuristic approximations are offered. One heuristic counts the hypothetical estimates of ability that remain tenable following a response to each item and chooses the item that minimizes the expectation of that number. The other selects the item that maximizes the statistic of information.

How does one evaluate individual performance on a learning task? One can present items representing all aspects of the task, one can present a random selection of such items, or one can seek an optimal or near optimal sequence of items. Item selection is, of course, more general than the context of learning tasks and has applications in any study of individual performance. This paper is concerned with optimal and near optimal item selection.

Previous studies of optimal item or stimulus selection generally were concerned with optimal repetition of some or all of the items for the purpose of allowing the subject to learn the skills involved (Chant & Atkinson, 1973; Atkinson, 1972; Atkinson & Paulson, 1972; Groen & Atkinson, 1966; Karush & Dear, 1966). For example, the Karush and Dear model selects as the current test item that item with the least probability of having been learned by the individual. The model presupposes that items have already been presented once in random order, producing estimates of which items are unlearned. Similarly, Atkinson and Paulson compare three strategies of item presentation using three distinct models of learning. In all of these studies, the emphasis is on which of the already presented items to repeat. Initial selection of items is random.

Consider an alternative situation in which the objective is optimal selection of items that will be presented only once. Restriction to this case may be motivated by considerations of experimental design or by the cost of testing. Items may be difficult or costly to administer more than a single time (e.g., oral examinations or laboratory procedures), the individual to be evaluated may be available for only a limited amount of time, or the evaluator's time may be constrained.


The situation of interest here is that in which at least some of the items to be tested have common components and in which there is at least a partial hierarchical ordering of the items. One model for optimal item selection in this case is a Markov decision process that depends upon the current estimate of an individual's ability as well as the internal structure of the items themselves. The latter refers to distinguishable components of each item in addition to the item's relationship to other items in the universe. It is typically not necessary to test every possible item. Some items will be dependent upon others, and lack of ability on one item implies a similar lack on others.

This paper has four sections. The first consists of the specification of item types, the characterization of any individual's ability on the items, and the probability distributions over the possible ability characterizations. The second section defines the model as a Markov decision process, and an optimal policy for item selection is derived. In the third section, two heuristics approximating the optimal policy are presented. One heuristic selects the item that minimizes the number of remaining tenable ability states, and the second selects the item that maximizes the statistic of information. The final section contains comparisons of the optimal policy and the two heuristics.

ITEM TYPES AND PROBABILITY DISTRIBUTIONS

Suppose there is a set of components or skills to be tested. It is possible and even desirable to test these components individually and also in logical combinations. A hierarchical relationship exists between a combination and its component parts, and under the usual restriction of hierarchical structures, success on any combination requires success on all individual components of that combination. Thus, it will be advantageous to avoid testing any combinations containing components known to be unmastered by the individual being tested. Similarly, if the individual has demonstrated mastery on a combination item, it will be unnecessary to test singly the individual components of the combination. These components must be mastered since the individual has demonstrated the ability to use them in conjunction with other components. Many items are typically available for each skill or combination of skills. The class of items for a specific skill or specific combination is called an item type. For example, assume there are three basic components to be tested: c1, c2, and c3. The logical combinations of these are c1 + c2, c1 + c3, c2 + c3, and c1 + c2 + c3. The number of possible item types is seven, three individual components and four combinations. At the most, seven items could be presented by giving each type once, and at the least, one item could be presented. In the latter case, a correct response to the combination of c1 + c2 + c3 would indicate mastery of all components and combinations. If the individual were given the combination c1 + c2 as the first item
and responded correctly, certain information is obtained from the structure of the item and the response. It is now known that the individual has mastered not only c1 + c2 but c1 and c2 by themselves as well. Consequently, the number of possible items that remain to be tested is four rather than six. Although the assumption is made that a correct response on a combination implies mastery of the components of the combination, it is not assumed that mastery of the individual components implies similar mastery of the combination of those components. Thus, in the above example, suppose the individual had been presented c1 and c2 as the first two items and had responded correctly. It would still be necessary to test c1 + c2 because of a possible synergistic relationship between the combination and its components. That is, the whole may be greater than the sum of its parts. Some added complexity is present in the combination of the components. This complexity has been previously defined as the procedural component (Cotton, Gallagher, & Marshall, 1977). The procedural component may be the result of needing to perform the component skills in a certain order or may be simply a confusion factor occurring when several components are present in the same item. In any case, mastery of individual components does not imply mastery of combination items.

It is convenient to have a summary of an individual's achievement on the possible item types.

TABLE 1

Possible Ability Estimates for Seven Item Types

Ability estimate   c1   c2   c3   c1+c2   c1+c3   c2+c3   c1+c2+c3
       1            1    1    1     1       1       1        1
       2            1    1    1     1       1       1        0
       3            1    1    1     1       1       0        0
       4            1    1    1     1       0       1        0
       5            1    1    1     0       1       1        0
       6            1    1    1     1       0       0        0
       7            1    1    1     0       1       0        0
       8            1    1    1     0       0       1        0
       9            1    1    1     0       0       0        0
      10            1    1    0     1       0       0        0
      11            1    1    0     0       0       0        0
      12            1    0    1     0       1       0        0
      13            1    0    1     0       0       0        0
      14            0    1    1     0       0       1        0
      15            0    1    1     0       0       0        0
      16            1    0    0     0       0       0        0
      17            0    1    0     0       0       0        0
      18            0    0    1     0       0       0        0
      19            0    0    0     0       0       0        0
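The entries of Table 1 can also be generated mechanically. A minimal Python sketch (an illustration, not part of the original paper), assuming the hierarchical rule operates as subset closure, that is, that mastery of an item type implies mastery of every item type built from a subset of its components, enumerates the admissible vectors and reproduces the counts quoted below:

    from itertools import combinations, product

    def ability_vectors(n_components):
        # Item types are the nonempty subsets of components, e.g., {c1}, {c1, c2}, ...
        comps = range(1, n_components + 1)
        item_types = [frozenset(s) for r in comps for s in combinations(comps, r)]

        def admissible(v):
            # Hierarchical closure: mastery of an item type implies mastery
            # of every item type built from a subset of its components.
            return all(not v[i] or v[j]
                       for i, a in enumerate(item_types)
                       for j, b in enumerate(item_types) if b <= a)

        return [v for v in product((0, 1), repeat=len(item_types)) if admissible(v)]

    print(len(ability_vectors(3)))   # 19 vectors over 7 item types, as in Table 1
    print(len(ability_vectors(2)))   # 5 vectors over 3 item types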

A vector of 0's (nonmastery) and 1's (mastery) provides this information. The number of elements in the vector is determined by the number of possible item types. A matrix can be developed, with each row being a possible ability characterization of the individual. For p item types, there are 2^p vectors. This is an upper bound, and it is of interest because it applies generally. Formulas for the actual number of vectors would necessarily vary with the hierarchical structure of the model. For instance, in the simple example of 3 individual components given above, there are 7 item types and 19 possible vectors, rather than 2^7 or 128. For a simpler example, with only two components, there are 3 item types and only 5 vectors rather than 8.

The vectors for the three-component example above are shown in Table 1. The vector containing all 1's indicates the individual's mastery of all item types. The vector containing all 0's indicates ignorance of all item types. The ability vector (1 1 1 0 1 1 0) means that the individual could successfully respond to item types c1, c2, c3, c1 + c3, and c2 + c3. The individual has not shown mastery on c1 + c2 or c1 + c2 + c3.

The set of vectors in Table 1 represents all possible characterizations of an individual being tested in the three-component, seven-item example. At each stage of the testing process, there is some probability that the individual is characterized by each of these vectors. Testing is complete when all but one of the characterizations have probability zero. It is convenient to introduce the following notation in order to specify probabilities for the general case of p item types and n ability vectors:

a = 1, ..., p                    item types
R(a) = 1 if correct              response to item a
     = 0 if incorrect
θ = 1, ..., n                    ability types
D = [d_θa]                       matrix relating ability types to item types (n x p)
d_θa = 0 if ability θ cannot answer a
     = 1 if ability θ can answer a
d_θ = (d_θ1 d_θ2 ... d_θp)       ability vectors
S = (s(1) s(2) ... s(n))         probability distribution over ability types
s(θ) ≥ 0                         probability that θ is the true ability,
                                 where Σ_θ s(θ) = 1; s(θ) may also be denoted s_θ

When an individual responds to a specific question a, the response is recorded as correct or incorrect (1 or 0). The current information now consists of the vector S of prior probabilities plus the new information contained in the individual's response R(a). The probabilities can be adjusted in light of the new information. This is a natural situation in which to apply Bayes' theorem. The objective is to calculate the conditional probability that the individual has the true ability summarized in a particular vector, given the individual's response to an item. Suppose there is an initial probability vector S as defined above. An item of type a is presented, and the response observed. The probability of ability θ, given this response, is

s*(θ) = prob(θ | R(a))
      = prob(R(a) | θ) s(θ) / Σ_θ' [ prob(R(a) | θ') s(θ') ].                 (1)

For some abilities θ, prob(R(a) | θ) = 0; consequently, s*(θ) = 0 for such vectors. A new vector, S* = (s*(1) s*(2) ... s*(n)), may be formed from the repeated application of (1) over all θ. This is the posterior distribution, and it can be used as a prior distribution for the next item presentation.
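Computationally, the update is a pointwise product followed by renormalization, since under the no-guessing assumption prob(R(a) | θ) is d_θa for a correct response and 1 - d_θa for an incorrect one. A minimal sketch (illustrative Python; the function and array names are assumptions, not the paper's):

    import numpy as np

    def bayes_update(s, D, a, r):
        # s: prior over the n ability vectors; D: n x p matrix of d_(theta,a);
        # a: item type presented; r: observed response (1 correct, 0 incorrect).
        likelihood = D[:, a] if r == 1 else 1 - D[:, a]   # prob(R(a) | theta)
        posterior = likelihood * s                        # numerator of Eq. (1)
        return posterior / posterior.sum()                # renormalize

    # Two-component example: five ability vectors over (c1, c2, c1 + c2).
    D = np.array([[1, 1, 1], [1, 1, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]])
    s = np.full(5, 0.2)                                   # uniform prior
    print(bayes_update(s, D, 0, 1))                       # [.33 .33 0 .33 0]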

OPTIMAL ITEM SELECTION

The Markov Decision Process

The set of ability vectors, an initial probability distribution across the set, and a rule for updating the probability distribution are used in the model of optimal item selection. The logic connecting these features is that of a Markov decision process. A state of the model is a probability distribution across the possible ability vectors. Under this definition, the prior S described above is a state of the system. The states are denoted S_i. Although the system in theory contains an infinite number of states, in fact there is a finite set of possible successor states given any particular prior. Given a prior, the number of possible states of the system is 3^p. The number 3 indicates the outcomes for any possible item type: answered correctly, answered incorrectly, or not yet presented. Under hierarchical restrictions on the item types as described above, the number of possible states is further reduced. For example, with two components, there are three item types (p = 3): the two components and their combination. The number of distinct states is 3^3 = 27. Under hierarchical restrictions, the number of distinct states is 3^3 - λ, where λ represents the number of restricted states. For this example, λ = 3 (the states are eliminated for which mastery of the combination occurs jointly with the nonmastery of either or both components). For other cases, calculation of λ is a complex combinatorial problem and is not considered here. The important point for the current model is the determination of 3^p as the upper bound for the number of distinct states.

The system described here is a Markov process. The probabilities for moving from one state to another can be given in a matrix of transition probabilities. The probability of being in any state on trial t + 1 depends only upon the state of the system in trial t. One typical row of such a matrix is given below for S_i:

                        Trial t + 1
  Trial t    S_1   S_2   ...   S_j   ...   S_k   ...
  S_i         0     0    ...    Q    ...   1-Q   ...

For S_i (the prior distribution), S_j and S_k are the posterior distributions derived from the individual's response (correct or incorrect) and the distribution of S_i. For any row, only two entries will be nonzero, because only two transitions are possible as a function of the two categories of response. Given a particular prior, there are only two posterior distributions that can be calculated. The individual's response depends upon the particular item a presented. Details of the selection process are discussed in the next section of this paper. For the moment, assume that a selection process exists and that it is a function of the current state of the system. Define this function as a(S_i), with S_i being the current state. In state S_i, a(S_i) generates an item and the individual responds. Let Q be the probability of a correct response. Then Q is also the probability of the transition from S_i to S_j:

Q = prob(R(a(S_i)) = 1 | S_i).

In general, the transition probabilities will be designated q_ij, the probability of moving from S_i to S_j. The process is observed at time points t = 1, 2, ... to be in one of a certain number of possible states. For any observed S_i, an action must be taken. An action may be either the presentation of an item or the termination of testing. If the process is in S_i at time t and action a is chosen, the next state is identified by the transition probability q_ij(a), the conditional probability that the system moves to S_j at time t + 1 given S_i and a.

There will be costs associated with giving the test. Costs are naturally denominated in subject hours, experimenter hours, dollars, or other units. In general, the cost is dependent upon the number of items. For simplicity, assume a constant incremental cost per item and, without further loss of generality, take the incremental cost to be unity. The incremental cost of stopping the test is zero. However, the process should only stop in "terminal" states, where a terminal state is one with certainty of identification, i.e., one in which S_i has a single unitary element and zeros elsewhere. The technical means of assuring termination only in "terminal" states is to define m, the cost of terminating in a nonterminal state, to be sufficiently large. Loosely, one takes m to be infinity. The cost assumptions surrounding the decision to offer another item or to stop testing are summarized by

C(S_i, a) = 0    if action a is termination and S_i is terminal,
          = m    if action a is termination and S_i is not terminal,
          = 1    otherwise.
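Under the same no-guessing assumption, the transition probability Q for an item is just the total prior mass of the ability vectors that can answer it, and the one-step costs follow the definition above. A short sketch (illustrative Python; the helper names are assumptions, and a large finite number stands in for the infinite penalty m):

    import numpy as np

    def prob_correct(s, D, a):
        # Q = prob(R(a(S_i)) = 1 | S_i): mass of abilities with d_(theta,a) = 1.
        return float(s @ D[:, a])

    def cost(s, action, m=1e9):
        # C(S_i, a): 0 for stopping in a terminal state (one nonzero entry),
        # m for stopping early, and 1 for presenting another item.
        if action == "stop":
            return 0.0 if np.count_nonzero(s) == 1 else m
        return 1.0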

Both the transition probabilities q_ij(a) and costs C(S_i, a) are functions only of the current state and action. The process is by definition a Markov decision process, and as such, it has properties that allow determination of an optimal policy for taking any action. This optimal policy is defined here as the optimum next-item function and is described below.

The Optimum Next-Item Function

There are three considerations here: to define what it means for a policy to be optimal, to assure the existence of such a policy, and to find practical means to synthesize the optimum. The general method of characterizing an optimal policy is derived from dynamic programming principles. For any policy π and initial state S_i, define the total expected cost to be

V_π(S_i) = E_π [ Σ_{t=0}^∞ C(S_{i,t}, a_t) | S_i ],    i > 0,

where S_{i,t} denotes the state occupied at time t.

E_π is the conditional expectation given policy π. For C(S_{i,t}, a_t) = 1 for all t prior to appropriate termination, the total expected cost of testing is the average number of items needed to reach the terminal state that identifies the ability vector. An optimal policy may be defined as follows. First, let V(S_i) be the valuation function having the smallest value resulting from any policy π, when the model is in state S_i. Thus,

V(S_i) = inf_π V_π(S_i),    i > 0.

A policy π* is optimal if V_π*(S_i) = V(S_i).

One can compute expected values V_π(S_i) for all possible policies in order to select an optimal one, but for a large number of policies, the calculations may be difficult. Computational effort can often be reduced by use of the following theorem from dynamic programming.

THEOREM 1.

V(S_i) = min_a [ C(S_i, a) + Σ_j q_ij(a) V(S_j) ],    i > 0,                  (2)


where all terms are as defined above. The general proof of this theorem can be found in any text on dynamic programming and is therefore omitted here (see, for example, Dreyfus & Law, 1977, or Ross, 1970). With Theorem 1, one finds the minimum cost of S_i by adding the cost of the next item, C(S_i, a), to the expected cost of the possible next states and then taking the minimum of the calculated sums over actions. Equation (2) is the functional equation of the optimal policy.

Optimum policies exist because the policy space is essentially finite, as shown above. (The number of items is finite, and each needs to be presented only once.) It remains to synthesize an optimum policy. Define a stationary policy as a function mapping the state space into the action space. Policy f is the policy that selects action f(S_i) when the process is in S_i. Let N be the set of all functions from the state space into the nonnegative real numbers, and let u' be a policy. Then u(S_i) = V_u'(S_i) is the corresponding valuation. Clearly, u is a member of N. Given policies f and u', one can define a valuation T_f u for following policy f for one step and thereafter following u'. That is,

(T_f u)(S_i) = C(S_i, f(S_i)) + Σ_j q_ij(f(S_i)) u(S_j).

This defines a mapping T_f: N → N. If policy f is followed for the first step only, we have T_f u. If it is followed for the first two steps and then policy u' is followed, we have T_f^2 u. T_f^n u is, therefore, the valuation for following policy f for the first n steps and policy u' thereafter. From dynamic programming, we have the following lemma and theorem:

LEMMA 1. For u, v ∈ N, and f a stationary policy,

(i) u ≤ v implies T_f u ≤ T_f v,
(ii) T_f V_f = V_f,
(iii) (T_f^n 0)(S_i) → V_f(S_i) as n → ∞ for each S_i,

where 0 represents the function that is identically zero, T_f^1 = T_f, and for n > 1, T_f^n = T_f(T_f^{n-1}).

THEOREM 2. Let f* be the stationary policy that, when the process is in state S_i, selects the action (or an action) minimizing

C(S_i, a) + Σ_j q_ij(a) V(S_j).

Then

V_f*(S_i) = V(S_i)    for all    i > 0,

and f* is optimal.


(For proofs of Theorem 2 and Lemma 1, the reader is referred to Ross, 1970.) In summary, optimum policies exist, together with V(S_i), the optimum valuation. Given the optimum valuation, one can find f*, the optimum policy. In practice, one may sometimes use policy improvement routines related to Lemma 1 in synthesizing optimum or near optimum policies.
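Because a belief state here is determined by the set of ability vectors still tenable and the set of items already presented, the functional equation (2) can be evaluated by a memoized recursion over those two sets; the backward enumeration used later falls out of the recursion. The following sketch (illustrative Python, assuming unit item costs and no guessing; the names are not from the original) reproduces the expected cost of 2.4 obtained for the two-component example in the comparisons section:

    from functools import lru_cache

    # Two-component example: ability vectors over (c1, c2, c1 + c2).
    D = ((1, 1, 1), (1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 0))
    PRIOR = (0.2, 0.2, 0.2, 0.2, 0.2)
    ITEMS = range(3)

    @lru_cache(maxsize=None)
    def V(tenable, asked):
        # Minimum expected number of further items, Eq. (2), for the belief
        # state with the given tenable ability vectors and presented items.
        if len(tenable) == 1:
            return 0.0                      # terminal state: stop at zero cost
        mass = sum(PRIOR[t] for t in tenable)
        best = float("inf")
        for a in ITEMS:
            if a in asked:
                continue                    # repetition causes no transition
            right = frozenset(t for t in tenable if D[t][a] == 1)
            wrong = tenable - right
            if not right or not wrong:
                continue                    # item carries no information here
            q = sum(PRIOR[t] for t in right) / mass      # transition prob. Q
            best = min(best, 1.0 + q * V(right, asked | {a})
                                + (1 - q) * V(wrong, asked | {a}))
        return best

    print(V(frozenset(range(5)), frozenset()))           # 2.4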

TWO HEURISTICS APPROXIMATING THE OPTIMAL POLICY

Given the functional equation (2) of the valuation function, one can construct an optimal policy by use of dynamic programming strategies. However, in many instances there are large numbers of intermediate states and the calculations become unwieldy, if not impossible. The problem lies in the number of states that must be specified prior to the backward enumeration process of dynamic programming. Characteristically, the solutions to such dynamic programming problems are too costly to compute, and the preferred approach is through heuristic or systematic approximations to the true solutions. Two heuristics are evaluated here.

Heuristic One (H1)

The objective in the experimental testing situation defined here is the classification of an individual's ability over a set of items, using as few of the items as possible. One heuristic way to approach this objective is to select an item that will reduce the number of plausible ability characterizations as much as possible. Recall that {d_θ} is the set of vectors describing ability over the items and that S is a probability distribution across the set. Under the assumption that an individual does not guess successfully, the response to any given item will increase or leave unchanged the number of zero elements in S. A value of 0 for any s_θ means that d_θ is an impossible description of the individual's ability. The objective may be restated as the maximization of the number of zeros in S: the next item should yield the fewest nonzero terms for S.

Formally, H1's next-item function a(S_i) is defined in the following way. For n ability vectors and p items, construct a matrix W of order n x p. The typical element w_θa is the number of expected reductions in tenable ability vectors, given true ability θ and item type a. Weight each w_θa by the probability of having ability θ, and sum the results by column. In matrix notation, that becomes SW = X', with X' being the transpose vector of weighted reductions. The next item to be presented is the one corresponding to the largest element in X. Thus, a(S_i) is the item achieving

max_a X_a = max_a ( Σ_{θ=1}^n s_θ w_θa )    for a = 1, ..., p.               (3)

The matrix W changes following item presentation and the response of the individual. Some of the n vectors will be replaced by vectors of zero entries, indicating that they are no longer possible, and the column corresponding to the item selected will contain only zeros as well, since no new information can be gathered by repeating an item known to be mastered or nonmastered. The function is applied repeatedly until a final state of ability identification is obtained, with only a single nonzero vector remaining in W.
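A sketch of H1's next-item function follows (illustrative Python; rather than storing and patching W, each w_θa is recomputed as the number of tenable vectors the response would eliminate, which yields the same column sums):

    import numpy as np

    def h1_next_item(s, D, asked):
        # Choose the unasked item maximizing X_a = sum_theta s_theta * w_(theta,a).
        n, p = D.shape
        tenable = s > 0
        X = np.zeros(p)
        for a in range(p):
            if a in asked:
                continue                    # its column of W is all zeros
            ones = np.count_nonzero(tenable & (D[:, a] == 1))    # X_a of Theorem 3
            zeros = np.count_nonzero(tenable & (D[:, a] == 0))   # Y_a
            # A correct response (true ability with d = 1) eliminates the
            # `zeros` tenable vectors; an incorrect one eliminates the `ones`.
            w = np.where(D[:, a] == 1, zeros, ones) * tenable
            X[a] = s @ w
        return int(np.argmax(X))

    D = np.array([[1, 1, 1], [1, 1, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]])
    s = np.full(5, 0.2)
    print(h1_next_item(s, D, asked=set()))  # 0, i.e., c1; X = (2.4, 2.4, 1.6)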

Heuristic Two (H2)

An alternative approximation to an optimal policy depends upon the statistic of information (Attneave, 1959). This is a number, 0 ≤ I ≤ 1, referring to the amount of information conveyed by the use of any item. According to information theory, the more probable the message, the less information it carries. Information is an inverse monotonic function of probability, and

I(x_i) < I(x_j)    iff    prob(x_i) > prob(x_j),

where x_i and x_j are the messages. The usual definition of I(x_i) is

I(x_i) = -log_2 prob(x_i)

(see Coombs, Dawes, & Tversky, 1970, for the derivation of the definition from more elementary concepts). The expected information obtained from receiving one of a set of m messages is

E(I) = - Σ_{i=1}^m prob(x_i) log_2 prob(x_i),                                 (4)

and this is commonly called uncertainty.

The criterion for this heuristic approach is the maximum information conveyed by an item. In effect, the item with information value closest to unity will be selected. Each item leads to one of two messages: x_1 = correct and x_2 = incorrect. We want to determine the value of prob(x_1) that characterizes the item with maximum information. Clearly, prob(x_1) as close as possible to 1/2 characterizes the best item: if prob(x_1) = 1/2 and prob(x_2) = 1/2, then from (4) we have

E(I) = -{ (1/2) log_2 (1/2) + (1/2) log_2 (1/2) }
     = -{ (1/2)(log_2 1 - log_2 2) + (1/2)(log_2 1 - log_2 2) }
     = -(log_2 1 - log_2 2) = 1.

But unity is the maximum possible value of E(I). The heuristic, therefore, selects the item with prob(x_1) closest to 1/2, or .50. The procedure is similar to that used in the first heuristic. Instead of the matrix W, however, we use the original matrix D of ability vectors. Define X as the row vector formed by weighting each row of D by the appropriate value s_θ, so that X = SD. The next item is found by selecting the value in X that is closest to .50. That is,

min_a X_a = min_a | Σ_{θ=1}^n s_θ d_θa - .50 |.                               (5)
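H2 therefore reduces to a single matrix-vector product. A sketch under the same assumptions as before:

    import numpy as np

    def h2_next_item(s, D, asked):
        # Choose the unasked item whose success probability X_a = (SD)_a is
        # closest to .50, i.e., the item of maximum expected information.
        X = s @ D                            # X_a = sum_theta s_theta d_(theta,a)
        score = np.abs(X - 0.5)              # Eq. (5)
        score[list(asked)] = np.inf          # answered items carry no information
        return int(np.argmin(score))

    D = np.array([[1, 1, 1], [1, 1, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]])
    s = np.full(5, 0.2)
    print(h2_next_item(s, D, asked=set()))   # 0; |X - .5| = (.10, .10, .30)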

COMPARISONS OF THE HEURISTICS AND OPTIMAL POLICY

At issue is the question of how similar or dissimilar are the two heuristics and the optimal policy. There are two general cases to consider: (1) uniform priors and (2) nonuniform priors. These are evaluated separately.

Uniform Priors

Comparison of the two heuristics with the optimal policy: An example. With relatively few item types, it can be shown that both heuristics coincide with optimal policies, given a uniform distribution. For example, in calculating an optimal policy for the case of two components and three items, there are five possible vectors d_θ characterizing ability:

        c1   c2   c1+c2
(1)      1    1     1
(2)      1    1     0
(3)      0    1     0
(4)      1    0     0
(5)      0    0     0

and a state of the model is defined as a five-element probability vector. Possible actions are: a_1, present c1; a_2, present c2; and a_3, present c1 + c2. The decision to stop is automatically invoked when a terminal state is encountered. There are 3^3 = 27 possible states in this example, but only 13 are distinct. These are shown in Fig. 1.

To calculate the expected cost of the initial state in Fig. 1, it is necessary to determine all possible V(S_j). That is, it is necessary to find a cost for each state that includes the cost of all subsequent actions. These costs may be calculated by beginning at the outermost points of the tree, where subsequent cost is zero and the action is termination. Figure 1 also gives the costs for the current example; they are the numbers in parentheses. The optimal policy is determined by backward enumeration. There are three levels of decisions; these are indicated by dotted lines in Fig. 1. In backward enumeration, one begins at the outermost decision points and works back up the tree.

Consider the decisions at level 3. The value of zero for all succeeding states is sufficient information to calculate the best policies for states in level 3. For each of the nonterminal states at this level, there is only one informative question to ask, since the other two items have already been presented. Now move up one level and consider the state reached at level 2 by a correct response to item 1. Either item 2 or item 3 can be asked. (Item 1 has been presented; repetition of it would cause no transition.) One calculates the cost of each item and selects the minimum. Thus,

Cost (item 2) = (.67)(1) + (.33)(0) = .67,

and

Cost (item 3) = (.33)(0) + (.67)(1) = .67.


FIG. 1. Decision levels for the three-item example. Actions are circled numbers, and states are S_i. Costs are below state labels. States 0 are terminal; no action is taken. Possible responses to an action in a nonterminal state are correct (+) and incorrect (-).

The cost of item 2 is the probability of answering item 2 correctly given the current state, times the expected cost of the state to be reached given that response, (.67)(1), plus the probability of answering incorrectly times further expected costs, (.33)(0). Item 3 is similarly defined. The expected costs can now be expressed in the following way:

V = min { 1 + (.67)(1) + (.33)(0),  1 + (.33)(0) + (.67)(1) } = 1.67.

The first term in the brackets is the cost of presenting the current item. The second term is the probability of answering the problem correctly times the expected cost of the state obtained with a correct response. The third term is the product of the probability of an incorrect response and the expected value of the state obtained. For this state, either item 2 or item 3 is best as the next item. Expected costs are calculated for the other level-2 states in the same way.

Now there is enough information to evaluate the initial decision point at level 1. From the expected costs of the level-2 states, one calculates the expected cost of asking each of the three item types. The final minimization is

V(initial state) = min {  1 + .60(1.67) + .40(1) = 2.4    (item 1)
                          1 + .60(1.67) + .40(1) = 2.4    (item 2)
                          1 + .20(0) + .80(2)   = 2.6 }   (item 3)

The optimal strategy is the presentation of item 1 or item 2 as the first problem in the testing process, and the expected cost is 2.4. If the first item is answered correctly, either of the remaining items may be presented. If the first is answered incorrectly, the remaining item to be asked is the other single component. Heuristic H1 obtains the same policy. The matrix W has the form

        | 2  2  4 |
        | 2  2  1 |
    W = | 3  2  1 |
        | 2  3  1 |
        | 3  3  1 |

and S = (.20 .20 .20 .20 .20). From Eq. (3),

max_a X_a = max (2.4  2.4  1.6) = 2.4.

Either c1 or c2 is selected as the next item. Suppose c1 is given and is answered correctly. Now

        | 0  1  2 |
        | 0  1  1 |
    W = | 0  0  0 |
        | 0  2  1 |
        | 0  0  0 |

with tenable ability vectors of 1, 2, and 4. Vectors 3 and 5 correspond to nonmastery on c1 and are impossible given a correct response to this item. The vector S now is (.33 .33 0 .33 0). Application of Eq. (3) again yields X = (0 1.33 1.33), and either c2 or c1 + c2 can be the next item. Continued application of H1 results in a policy coincident with the optimum.


When the second heuristic, H2, is applied to the same example, the optimal policy is also obtained. With the matrix D of the five ability vectors and S = (.20 .20 .20 .20 .20), from Eq. (5),

min_a | Σ_{θ=1}^5 s_θ d_θa - .50 | = min ( |.60 - .50|, |.60 - .50|, |.20 - .50| )
                                   = min ( .10, .10, .30 ).

Again, either c1 or c2 would be the first choice. If c1 is selected and answered correctly, S = (.33 .33 0 .33 0), and repeated application of Eq. (5) yields

min ( .50, .17, .17 ),

and either c2 or c1 + c2 will be presented next. Thus, for the three-item example, both heuristics are optimal, given a uniform prior distribution of probabilities.

It has already been pointed out that the optimal policy cannot always be calculated at a reasonable cost when there exists a large number of items. For this reason, comparison of the two heuristics with the optimal policy cannot be made for the general case, and only the two heuristics are considered below.

Comparison of the two heuristics: The general case. A uniform prior is defined as a vector whose elements have identical value. Priors containing some number of zero elements and having all remaining terms identical will be termed quasi-uniform. Given either a uniform or quasi-uniform prior, an item, and an individual's response to the item, application of Bayes' theorem (1) leads to a new quasi-uniform distribution. (This is a result of the assumption that the individual cannot guess the correct response.) Thus, in the model presented here, uniform vectors lead only to quasi-uniform ones, and quasi-uniform vectors lead to other quasi-uniform ones. It will be shown below that the two heuristics select the same next item given either a uniform or a quasi-uniform distribution.

LEMMA 2. Let n* be a positive number and {X_a}, a = 1, ..., p, be a collection of numbers such that 0 ≤ X_a ≤ n*. Let Y_a = n* - X_a. Then a* maximizes X_a Y_a if and only if it minimizes |X_a - n*/2|.


Proof. (1) a* maximizes X_a Y_a if and only if it maximizes X_a n* - X_a^2; that is, if and only if it minimizes X_a^2 - n* X_a. (2) a* minimizes |X_a - n*/2| if and only if it minimizes (X_a - n*/2)^2; that is, if and only if it minimizes X_a^2 - n* X_a + n*^2/4. Since n*^2/4 is a constant, this corresponds to (1) above. Q.E.D.

THEOREM 3. Let S be a distribution of n probabilities having n* nonzero entries, all of which are equal. Then max_a Σ_{θ=1}^n s_θ w_θa and min_a | Σ_{θ=1}^n s_θ d_θa - .5 | select the same item a as the next item to be presented.

Proof. Define

d'_θa = d_θa    if s_θ > 0,
      = 0       otherwise.

Then X_a = Σ_θ d'_θa, the number of remaining 1's in column a, and Y_a = n* - X_a, the number of remaining 0's. The only values s_θ can take are 1/n* or 0. Thus:

(1) w_θa = Y_a if s_θ > 0 and d'_θa = 1 (there are X_a such θ); w_θa = X_a if s_θ > 0 and d'_θa = 0 (there are Y_a such θ); and w_θa = 0 if s_θ = 0. Hence

max_a Σ_{θ=1}^n s_θ w_θa = max_a ( X_a Y_a / n* + Y_a X_a / n* )
                         = max_a 2 X_a Y_a / n* = (2/n*) max_a X_a Y_a.

X_a and Y_a are both nonnegative, and the constant 2/n* can be dropped.

(2) Similarly,

min_a | Σ_{θ=1}^n s_θ d_θa - .5 | = min_a | (1/n*) Σ_{θ=1}^n d'_θa - .5 |
                                  = min_a | X_a / n* - 1/2 |

by definition of X_a. Factoring out 1/n*, the problem becomes

min_a | Σ_{θ=1}^n s_θ d_θa - .5 | = (1/n*) min_a | X_a - n*/2 |.

By Lemma 2, the solutions to max_a X_a Y_a and min_a |X_a - n*/2| are given by the same item type. Q.E.D.


Given either a uniform or a quasi-uniform prior, both heuristics H1 and H2 will choose the same item type for the next test item. Since a uniform prior can lead only to posterior distributions in which all nonzero entries are identical, every item type selected by H1 will also be selected by H2 until final diagnosis of ability is made. This is true regardless of the number of ability characterizations n or the number of item types p.

Nonuniform Priors

Comparison of the two heuristics with the optimal policy. The above theorem is true only for the uniform or quasi-uniform distributions as specified. For nonuniform priors in general, the two heuristics are not necessarily the same, and there is no general theorem to determine similarities between them and an optimal policy. It is useful, in this case, to return to the simple example of three items and five ability vectors characterizing performance on the three items. To compare the two heuristics and the optimal policy in terms of expected costs for this situation, 500 prior probability distributions were generated. The elements for each vector were selected randomly under the obvious constraint that they sum to unity. The results of this comparison are presented in Table 2. The expected cost of an optimal policy given each distribution was determined, as well as the expected costs of the policies generated by H1 and H2. The costs of the heuristics' policies were found by taking the costs generated for each possible state when calculating the optimal policy and then applying these costs to the strategies selected by each heuristic.

For the 500 vectors, H1 was an optimal policy in 460 instances (92%) and H2 was optimal in 434 instances (87%). For all but a single vector, either H1 or H2 coincided with an optimal policy. The average difference in expected costs across the 500 vectors was 0.011 for H1 and 0.006 for H2. A pattern is evident in the deviation of H1 from the optimal policy. For 37 of the 40 vectors in which H1 had a higher expected cost than the optimal, H1 selected a simple component (c1 or c2) rather than the composite (c1 + c2). In contrast, only once in 66 instances of disagreement did the second heuristic select a single component rather than the composite. In the remaining 65 cases, H2 selected c1 rather than c2 or vice versa.

TABLE 2

A Comparison of Heuristic 1 and Heuristic 2 with Optimal Policies, Based on 500 Probability Distributions

Result               Number of    Average difference    Standard deviation
                       cases      in expected cost      of difference
H1 = H2 = optimum       395              -                     -
H1 ≠ H2 = optimum        39          0.1403076             0.1174705
H2 ≠ H1 = optimum        65          0.0457846             0.0442967
H1 = H2 ≠ optimum         1              -                     -


Comparison of the two heuristics over many items. Given the high rate of coincidence between the optimal strategy and the two heuristics (92% and 87%), it appears that either one will perform well. While neither is optimal in all cases, both have high percentages of concurrence. Of course, as more items are added, the differences between the heuristics and the optimal policies may change. Some of the changes can be examined by comparing the selections for the two heuristics in a larger set of items. This has been done for the situation of testing individuals' abilities in fraction arithmetic. Details of the skills or components are given elsewhere (Marshall, 1980). There are 5 single components and 12 composites. The usual hierarchical relationship between composite and single skills leads to the assumption that mastery on any composite implies mastery on the components of that composite, as described earlier. For the 17 item types, there are 220 vectors characterizing an individual's ability. Thus, S is a vector of 220 elements.

There are two cases of interest, the uniform prior distribution and nonuniform ones. Theorem 3 dictates that both heuristics will select the same item as the first item to be presented. Selection of the second item will depend upon the individual's response to the first, but again, Theorem 3 assures the selection of an identical item by H1 and H2. The nonuniform case is more interesting. The issue is whether the two heuristics select the same or similar sets of problems for an individual with a particular ability type, given the same nonuniform prior.

To compare the two heuristics, an arbitrary prior was constructed. This distribution is approximately symmetric, with the middle range of abilities more heavily weighted than those representing almost total mastery or almost total ignorance. No element was given zero value, and the range of probabilities was 0.001 to 0.009.¹ For this prior, diagnosis was made as if each of the possible vectors were the true state. Thus, 220 diagnoses were made with each heuristic, and a set of problems was determined for each ability vector. In each diagnosis, items continued to be selected until certainty of diagnosis was obtained, i.e., until a single state had nonzero probability.

¹ The 220-element probability distribution used in the computer simulation was approximately symmetric, with entries ranging from 1 x 10^-3 to 9 x 10^-3 and summing to unity.

Table 3 gives the results of the 440 computer simulations of an individual's responses, given a particular true state of ability and each next-item function. From the 17 items, the maximum number presented by either heuristic was 11, and the minimum was 6. The mean number of items presented by H1 was 8.045; that for H2 was 8.345. The standard deviations were 0.888 and 1.436, respectively. Regardless of true ability, in general the heuristics required presentation of fewer than half of the set of items in order to obtain certain diagnosis.

TABLE 3

Number of Items Needed to Obtain Certainty of Identification of an Individual's True Ability

Number of items    H1    H2
       6            5     9
       7           57    63
       8           92    75
       9           56    17
      10           10    28
      11            0    28

The number of items presented was identical for 103 of the 220 diagnoses. In 84 of these, the two heuristics presented identical sets of problems. For the diagnoses in

which the two strategies did not select the same number of items, the number of problems in 87 of the 117 instances differed by a single item, and differed by two items in an additional 24 instances. The largest discrepancy in the number of problems presented by the two heuristics is 3, but this occurred only six times. The average discrepancy was 0.6727, and the standard deviation was 0.7599. In general, H1 presented fewer items for the extreme cases, and H2 presented fewer items for the middle range of abilities.
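The simulation loop underlying Table 3 can be assembled from the pieces already described: fix a true ability vector, let a heuristic choose items, generate the deterministic response, and apply Bayes' theorem until one vector remains. A sketch (illustrative Python; next_item may be either heuristic's selection function, such as the h1_next_item and h2_next_item sketches given earlier):

    import numpy as np

    def diagnose(true_theta, s0, D, next_item):
        # Present items until a single ability vector has nonzero probability;
        # return the number of items required for certain identification.
        s, asked, count = s0.copy(), set(), 0
        while np.count_nonzero(s) > 1:
            a = next_item(s, D, asked)
            r = D[true_theta, a]                            # deterministic response
            likelihood = D[:, a] if r == 1 else 1 - D[:, a]
            s = likelihood * s
            s = s / s.sum()                                 # Bayes update, Eq. (1)
            asked.add(a)
            count += 1
        return count

    # e.g., with the five-vector example and H1:
    # diagnose(0, np.full(5, 0.2), D, h1_next_item)  ->  3 items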

SUMMARY

A model has been developed for defining and synthesizing an optimal policy of item selection when items can be presented at most once. Although the optimum can always be defined under the constraints of the model, in practice the computations involved in determining it may exceed the capacity of most computers if the number of possible items to present is large. An alternative to optimal item presentation is offered through the development of two heuristics that approximate optimal policies. While it can be shown that neither heuristic is optimal in general, comparisons of the two indicate that both approach optimal strategies in the case of few possible items. A further comparison using computer simulation reveals that rather than presenting the full set of possible items, both heuristics substantially reduce the number of items that need to be presented in order to estimate an individual’s ability over a large set of items.

REFERENCES

ATKINSON, R. C. Ingredients for a theory of instruction. American Psychologist, 1972, 27, 921-931.
ATKINSON, R. C., & PAULSON, J. A. An approach to the psychology of instruction. Psychological Bulletin, 1972, 78, 49-61.
ATTNEAVE, F. Applications of information theory to psychology. New York: Holt, 1959.
CHANT, V. G., & ATKINSON, R. C. Optimal allocation of instructional effort to interrelated learning strands. Journal of Mathematical Psychology, 1973, 10, 1-25.
COOMBS, C. H., DAWES, R. M., & TVERSKY, A. Mathematical psychology. Englewood Cliffs, N.J.: Prentice-Hall, 1970.
COTTON, J. W., GALLAGHER, J. P., & MARSHALL, S. P. The identification and decomposition of hierarchical tasks. American Educational Research Journal, 1977, 14, 189-212.
DREYFUS, S. E., & LAW, A. M. The art and theory of dynamic programming. New York: Academic Press, 1977.
GROEN, G. J., & ATKINSON, R. C. Models for optimizing the learning process. Psychological Bulletin, 1966, 66, 309-320.
KARUSH, W., & DEAR, R. E. Optimal stimulus presentation strategy for a stimulus sampling model of learning. Journal of Mathematical Psychology, 1966, 3, 19-41.
MARSHALL, S. P. Procedural networks and production systems in adaptive diagnosis. Instructional Science, 1980, 9, 129-143.
ROSS, S. M. Applied probability models with optimization applications. San Francisco: Holden-Day, 1970.

RECEIVED: July 17, 1980