Cognitive Systems Research 7 (2006) 23–33 www.elsevier.com/locate/cogsys
Current advances in SWIFT
Action editor: Erik D. Reichle
E.M. Richter *, R. Engbert, R. Kliegl
Department of Psychology, University of Potsdam, P.O. Box 601553, 14415 Potsdam, Germany
Received 12 November 2004; accepted 1 July 2005
Available online 6 September 2005
Abstract

Models of eye movement control are very useful for gaining insights into the intricate connections of different cognitive and oculomotor subsystems involved in reading. The SWIFT model (Engbert, Longtin, & Kliegl, 2002, Vision Research, 42, 621–636) proposed a unified mechanism to account for all types of eye movement patterns that might be observed in reading behavior. The model is based on the notion of spatially distributed, or parallel, processing of words in a sentence. We present a refined version of SWIFT introducing a letter-based approach that proposes a processing gradient in the shape of a smooth function. We show that SWIFT extends its capabilities by accounting for distributions of landing positions.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Reading; Eye Movements; Mathematical Modeling
* Corresponding author. Tel.: +49 331 977 2127.
E-mail addresses: [email protected] (E.M. Richter), engbert@rz.uni-potsdam.de (R. Engbert), [email protected] (R. Kliegl).
URLs: http://www.psych.uni-potsdam.de/people/richter/index-e.html (E.M. Richter), http://www.agnld.uni-potsdam.de/~ralf (R. Engbert), http://www.psych.uni-potsdam.de/people/kliegl/index-e.html (R. Kliegl).
1389-0417/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.cogsys.2005.07.003

1. Introduction

As an everyday task, reading allows us to investigate highly diversified and complex behavior within a supposedly simple and well-defined array of stimuli. Reading is particularly suitable for a paradigmatic inquiry into the interrelationships between basic physiological and cognitive mechanisms. It is indicative of how perceptual, cognitive and motor systems interact to achieve a shared goal. With text comprehension as the "final product," we are interested in eye movements as a mediating and measurable behavioral correlate of the reading process.

During reading the eyes move in jumps or saccades. For Latin script these are usually rightward oriented, but can also be leftward or regressive in order to bring certain words into focus. The eyes fixate words for varying lengths of time. We consider these movement patterns as the product of visuomotor and cognitive systems controlling the reading process.

The measurement, analysis, and modeling of eye movements are among the most fruitful attempts to understand how visual information is processed and utilized to guide behavior (Findlay & Gilchrist, 2003). The measurement of fixation times on words or parts of sentences is now considered an essential tool for studying reading (Liversedge & Findlay, 2000; Rayner, 1998), and the amount of such data has increased enormously in the recent past.

Our central goal is to uncover how these data patterns connect with underlying processes. This can hardly be accomplished without quantitative mathematical modeling. Individual aspects of eye movement behavior can certainly be described qualitatively, but an approach that tries to integrate all findings requires a unified computational model. With the quantity of data available today, such a model needs to perform on many levels.

In the family of primary oculomotor models (POC), low-level information is utilized to reproduce the respective data patterns (McConkie, Kerr, & Dyre, 1994; Reilly &
O'Regan, 1998; Suppes, 1990, 1994; Yang & McConkie, 2001). Higher-level factors have only a modulating function. The so-called cognitive models, on the other hand, assume that eye movements are driven primarily by lexical processing and can be categorized according to how they conceptualize the allocation of visual attention. Some make the assumption that lexical access, i.e. word recognition, is strongly coupled to sequential shifts of attention (SAS) from word to word, which ensures that the meanings of the words become available in the order of their appearance in the text (Morrison, 1984; Reichle, Pollatsek, Fisher, & Rayner, 1998; Reichle, Rayner, & Pollatsek, 2003). Other models assume guidance by attentional gradients (GAG; Engbert, Longtin, & Kliegl, 2002; Legge, Klitz, & Tjan, 1997; Reilly & Radach, 2003), assigning processing capacities to more than one word in view under physiologically plausible constraints. For a thorough review see Reichle et al. (2003).

A model that claims to describe the reading process accurately needs to integrate oculomotor as well as lexical factors. E-Z Reader is certainly the most advanced SAS-based model and meets this requirement in many respects by addressing such phenomena as landing site distributions (cf. Reichle, Rayner, & Pollatsek, 1999), refixation behavior, word frequency effects, etc. Note that the SAS models' high-level assumption of a lexical control loop can be replaced without loss of accuracy with autonomous saccade generation, a low-level oculomotor mechanism, as demonstrated by Engbert and Kliegl (2001).

SWIFT¹ (Engbert et al., 2002), a member of the GAG model family, proposes parallel processing within a window spanning four words, thereby rejecting the strong assumption of sequential attention shifts. In its former version (Engbert et al., 2002), SWIFT already promoted the notion of a processing gradient by assigning a larger processing rate to the foveal word than to the parafoveal words.
In spite of its simplicity, this turned out to be a viable alternative to sequential attention shifts.

We will proceed with an overview of our current considerations regarding the refinement of the SWIFT model. We aim to enable it to integrate a larger number of findings (cf. Radach, Kennedy, & Rayner, 2004) than was possible with the former version. SWIFT is compatible with the general framework of saccade generation by Findlay and Walker (1999) as well as the theory of movement preparation by Erlhagen and Schöner (2002). Following the principle of minimal modeling, we attempted to involve as few assumptions as possible. All assumptions are to be physiologically and psychologically plausible. They must not be overly specific to reading, in order to allow for an extension of the model to other domains such as visual search. The model is designed to provide one general mechanism that is able to explain all types of eye movements in reading, such as forward and regressive saccades, refixations, and word skippings.

In the following sections, we discuss modeling goals, current advances in SWIFT with an overview of changes in the formalism, and numerical simulations and evaluations of the model's performance with respect to the modeling goals.
1 SWIFT is the acronym of Saccade generation With Inhibition by Foveal Targets.
2 We computed predictabilities from incremental guesses obtained from 83 subjects for each word in our sentence corpus.
2. Modeling objectives

Eye movements have been found to be affected by language processing, particularly when reading multisentence texts (e.g. Frazier, Pacht, & Rayner, 1999; Frazier & Rayner, 1990). Yet it appears that in the situation of reading single sentences, eye-movement control can be explained without a sophisticated model of language processing (Reichle et al., 2003). Instead, straightforward rules for word recognition suffice to explain most of the variance of experimental results. We will focus on the experimental phenomena that we tried to reproduce with our model, and on methods to evaluate the model's performance.

2.1. Empirical requirements

Word length, printed word frequency, and predictability (defined as the probability of guessing a word from reading the preceding words of the sentence) are word properties that have proven useful for predicting fixation durations and other experimental phenomena. These word properties are therefore commonly used as independent variables in the modeling tradition despite their substantial correlations. Unlike frequency, which can be determined from a large text corpus (Baayen, Piepenbrock, & Rijn, 1993), predictability depends on the context of any particular sentence. It has to be determined experimentally, employing, e.g., the incremental cloze task.² Typical effects are that longer words are likely to be inspected in more and longer fixations, whereas more frequent and more predictable words are skipped more often. Quantitative dependent variables are spatial (probabilities of different types of saccades) or temporal (fixation durations) in nature or some combination of these factors. The following dependent measures are typically reported in the literature.

2.1.1. Fixation durations

Inspection times have been used as a central tool to examine information processing in reading. We employ three separate, non-overlapping populations of fixations related to individual words, i.e. single fixation duration (SF), and first and second fixation duration (F1, F2) for refixated words. These measures are computed only for first-pass reading, meaning that we consider only fixations that are located on the rightmost word of the current (word-based) fixation sequence. In addition, we compute the total reading time (TT), or the sum of all inspection times regardless of the fixation sequence. Since computational models should account for these fixation duration distributions, the deviation of simulated data from empirical data is one source for the computation of our goodness-of-fit measure.

2.1.2. Fixation probabilities

Again, based on first-pass reading, the probabilities of skipping (P0) a word, fixating it twice (P2), and three or more times (P3+) are calculated. The probability of a single fixation (P1) is redundant, since it can be computed from the other probabilities. Since SWIFT also accounts for regressions, we enrich this set of measures by the interword regression probability (PR), i.e. the probability of a word being the target of a regressive saccade. Note that this is a measure relating to second-pass reading, like TT.

2.1.3. Effects of word length and frequency

The dependent measures are commonly summarized as functions of word length as well as classes of logarithmic word frequency. We use these functions as "benchmark" data in the sense that we base our fitting coefficient mainly on these curves.

2.1.4. Landing positions

Both random and systematic errors of the oculomotor system influence reading behavior (McConkie, Kerr, Reddix, & Zola, 1988). As a consequence, we observe a systematic deviation of mean landing positions from an assumed optimal viewing position.

We are able to conduct all of the above-mentioned analyses on one large data set (Kliegl, Grabner, Rolfs, & Engbert, 2004).
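The first-pass measures of Section 2.1 can be illustrated with a minimal sketch. It implements one possible reading of the definition (a fixation counts as first-pass when it lies on the rightmost word reached so far); the function name `classify_first_pass`, the category labels, and the example sequence are illustrative assumptions, not the authors' analysis code.

```python
from collections import Counter

def classify_first_pass(fixated_words):
    """Classify words by first-pass fixation count and tally regressions.

    fixated_words: 1-based word indices in the order they were fixated.
    Returns (categories, regression_targets).
    Assumption: a fixation is first-pass if it is on the rightmost word
    reached so far; this is one reading of the definition in the text.
    """
    first_pass = Counter()
    regression_targets = Counter()
    rightmost = 0
    previous = None
    for word in fixated_words:
        if word >= rightmost:
            # Fixation on the rightmost word of the sequence so far.
            rightmost = word
            first_pass[word] += 1
        if previous is not None and word < previous:
            # Interword saccade to the left: word is a regression target.
            regression_targets[word] += 1
        previous = word
    labels = {0: "skipped", 1: "single", 2: "two"}
    categories = {
        word: labels.get(first_pass[word], "three+")
        for word in range(1, rightmost + 1)
    }
    return categories, dict(regression_targets)

# Illustrative word-based fixation sequence.
categories, regressions = classify_first_pass([1, 3, 4, 5, 5, 7, 8, 7])
```

Averaging the category indicators over words and readers would give estimates of P0, P2, and P3+; P1 then follows as 1 − P0 − P2 − P3+.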
3. Outline of the model

Dynamic field theory (Erlhagen & Schöner, 2002) proposes a function of space and time, representing an activation field, that is distributed over a number of potential movement targets. To describe the evolution of that field, we rely on concepts from the theory of nonlinear dynamic systems (Engbert, Longtin, & Kliegl, 2004). Moreover, different subsystems (e.g. vision, memory, cognition, oculomotion) are allowed to crosstalk continuously. These attributes taken together make the theory well suited to research on eye movement control in reading. Thus, the concept of a movement-planning field from the dynamic field theory of movement preparation motivated the activation field for word targeting in SWIFT. However, we did not refer to the explicit mathematical formalism of the dynamic field theory, since it proposes a framework for movement planning without explicit reference to lexical processing demands. Hence, the formalism was simplified in order to allow for a straightforward incorporation of word recognition to account for lexical effects on eye movements in reading.³

3 Prospectively, dynamic field theory's concept of an interaction between local excitation and global inhibition might also prove useful to contribute to a coherent account of eye movements in reading.

As in the former version of SWIFT (Engbert et al., 2002), we use a rather simple, one-dimensional activation field. However, this already implies spatially distributed processing, i.e. several words can be processed in parallel. A model of eye movement control in reading should be plausible in light of the accumulated neurophysiological knowledge about saccade generation. SWIFT can be seen as a special case of the very general model proposed by Findlay and Walker (1999). As such, SWIFT is theoretically generalizable, e.g. to a two-dimensional, visual-search-like task (see Trukenbrod & Engbert, in preparation).

3.1. Theoretical assumptions

Only one core assumption (Principle V) has been added to the former version of SWIFT (Engbert et al., 2002). Yet, we concisely repeat the others here as well. Thus the core assumptions of SWIFT are as follows.

• Principle I: Spatially distributed processing of an activation field. Owing to the dynamic-field approach (Erlhagen & Schöner, 2002), the model produces all types of saccades in the course of a competition among words with different activations. The dynamic field of lexical activations evolves as several words are processed in parallel.
• Principle II: Separate pathways for saccade timing and target selection. Neurophysiological findings suggest a distinction between temporal and spatial aspects of saccade generation (Findlay & Walker, 1999). SWIFT integrates these two aspects at different stages of the saccade programming scheme (see Principle IV).
• Principle III: Random saccade generation with inhibition by foveal targets. Autonomous generation of saccade programs alone would lead to random fixation durations. Therefore, the autonomous timer is modulated by a foveal inhibition process, which is able to decelerate the reading rate for longer inspection times on difficult words.
• Principle IV: Two-stage saccade programming. Saccade programming is understood as a two-stage process, motivated by Becker and Jürgens' (1979) findings on the double-step paradigm. A preparatory labile stage is followed by a non-labile stage during which active programs can no longer be cancelled.
• Principle V: Systematic and random errors in saccade lengths. Following McConkie et al. (1988) we introduce systematic as well as random oculomotor errors.

An illustration of the model's architecture is given in Fig. 1. The next section addresses our translation of the principles into mathematical terms which allows for an implementation of the model on a computer to generate
artificial data by simulations that can be compared to actual reading data.

Fig. 1. Outline of SWIFT. Saccade programming is guided by separate temporal and spatial pathways. An autonomous random timer triggers new saccade programs and is itself susceptible to foveal inhibition. The set of lexical activations of all words in a sentence evolves dynamically and can be conceived of as a saliency map.

3.2. Mathematical description

The details from the original version of SWIFT (Engbert et al., 2002) are summarized briefly whereas the new additions are explained in more detail.

3.2.1. Dynamic field of activation

At any point in time activation values a_n(t) are assigned to each of N_w words in a sentence. The dynamic field {a_n(t)} of activations is a one-dimensional array that establishes a saliency map for target selection. The actual length of the sentence need not be known in advance. The activation value of a word expresses the current extent of processing in the course of its identification. It increases in a preprocessing stage until a given maximum is reached and decreases in the subsequent lexical completion stage. Upcoming saccade targets are determined by a simple transformation of the activation field into a discrete probability distribution.

3.2.2. Word difficulty

As in the former version of the model (Engbert et al., 2002; see also Engbert & Kliegl, 2001; Reichle et al., 1998) we assume that word difficulty L_n limits the maximum lexical activation for a given word n. We assume that word difficulty is determined by printed word frequency f_n and predictability p_n of word n as follows:

$$L_n = \underbrace{(\alpha - \beta \log f_n)}_{\text{frequency factor}} \; \underbrace{(1 - \theta p_n)}_{\text{predictability factor}}, \tag{1}$$

with the free parameters α as the maximum of the frequency term for low-frequency words, β as a measure of the frequency effect, and θ as a measure of the effect of predictability on lexical processing time. The first factor addresses the strong dependency of visually based lexical retrieval on word frequency. The predictability factor takes into account the effects of context on lexical difficulty. The parameter θ, whose range was set between 0 and 1, allows for the attenuation of the impact of predictability. Note that Eq. (1) implies that visual processing and word prediction are independent factors, which of course is an idealization.

3.2.3. Lexical processing rate

This part constitutes a major refinement of SWIFT. The notion of a processing gradient has been broken down to the letter level. The lexical processing rate, denoted by λ(ε) > 0, is a function of the physical distance of a letter from the current fixation position, the eccentricity ε. The fixation position at time t is denoted by k(t) and can attain real values between 0 and the number z of all characters, spaces and punctuation marks of a given sentence. For the purpose of word-based analyses, a fixation on a space is counted as a fixation on the adjacent word to the right. The eccentricity of letter j of word n with respect to the current fixation position is given by

$$\epsilon_{nj}(t) = x_{nj} - k(t), \tag{2}$$

where x_nj is the position of letter j of word n in sentence coordinates. We postulate that processing speed is mainly limited by visual acuity (see Legge, Hooven, Klitz, Mansfield, & Tjan, 2002), which is a function of eccentricity. Furthermore, we hypothesize a global attentional allocation to the default reading direction (e.g. Rayner, 1998), which leads us to a first approximation of the gradient of processing rate as an asymmetric Gaussian function. Thus, the lexical processing rate for a given eccentricity is computed as

$$\lambda(\epsilon) = \lambda_0 \exp\left(-\frac{\epsilon^2}{2\sigma^2}\right) \quad \text{with} \quad \sigma = \begin{cases} \sigma_L, & \text{if } \epsilon < 0, \\ \sigma_R, & \text{if } \epsilon \ge 0, \end{cases} \tag{3}$$

where σ_L and σ_R characterize the width of the processing span to the left and to the right, respectively (Fig. 2). To obtain a real density function the normalization constant has to attain the value of

$$\lambda_0 = \sqrt{\frac{2}{\pi}} \, \frac{1}{\sigma_R + \sigma_L}. \tag{4}$$

We assign a processing rate to a word by averaging the processing rates of its letters, i.e.

$$\lambda_n(t) = \frac{1}{M_n} \sum_{j=1}^{M_n} \lambda(\epsilon_{nj}(t)), \tag{5}$$

where M_n is the length of word n in letters.

Fig. 2. The gradient. Lexical processing rate is modeled as a normalized asymmetric Gaussian function with free parameters σ_L and σ_R.

3.2.4. Temporal evolution of the field of activations

As described above, word identification is modeled as a two-stage process (see also Engbert & Kliegl, 2001; Reichle et al., 1998). Concomitant lexical activation rises from 0 to L_n within time t_p, i.e. from unawareness of the word to completed preprocessing. In the course of the subsequent lexical completion, activation falls back to 0, indicating that the word has been completely processed. The only new aspect here is that we allow for a global decay of activation, which could reasonably be tied to a memory leakage that affects the entire saliency map. The evolution of the dynamically changing field of activations is described by a system of ordinary differential equations (ODE)

$$\frac{da_n(t)}{dt} = \begin{cases} +f\,\lambda_n(t) - \omega, & \text{if } t < t_p \ \text{(preprocessing)}, \\ -\lambda_n(t) - \omega, & \text{if } t \ge t_p \ \text{(lexical completion)}, \end{cases} \tag{6}$$

with the word-level processing rate λ_n(t) of Eq. (5) and the free parameters f > 1 as a preprocessing factor, i.e. build-up is faster than decrease, and ω as the strength of a global decay process. Preprocessing is conceived of as a preliminary stage of identification with the main purpose of entering a word into the set of possible saccade targets as soon as it first appears in the attentional window. Its function within the model is that of a mechanism and therefore it is not of equal rank to lexical processing. As will be demonstrated later, it is about 90 times faster than the lexical-completion branch of the ODE.

3.2.5. Saccade target selection

As mentioned earlier, the rationale behind saccade target selection is quite straightforward. We understand the selection process as being probabilistic and competitive. The model assigns a selection probability to each word by a straightforward transformation of their respective lexical activations. Thus, the probability p(n, t) to select word n as a saccade target at time t is given by its relative lexical activation

$$p(n, t) = \frac{a_n(t)}{\sum_{j=1}^{N_w} a_j(t)}. \tag{7}$$

This assumption yields a mechanism which lies somewhere between a completely random selection and a winner-takes-all target selection.
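The lexical side of the model (word difficulty, the asymmetric Gaussian processing gradient, the activation dynamics, and target selection, Eqs. 1–7) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the logarithm is taken as natural (the text does not state the base), and the stage switch at L_n and the clipping at zero are choices made for the sketch.

```python
import math

def word_difficulty(freq, pred, alpha, beta, theta):
    """Eq. (1): L_n = (alpha - beta * log f_n) * (1 - theta * p_n)."""
    # Assumption: natural logarithm; the base is not stated in the text.
    return (alpha - beta * math.log(freq)) * (1.0 - theta * pred)

def processing_rate(ecc, sigma_l, sigma_r):
    """Eqs. (3)-(4): normalized asymmetric Gaussian over eccentricity."""
    sigma = sigma_l if ecc < 0 else sigma_r
    lam0 = math.sqrt(2.0 / math.pi) / (sigma_r + sigma_l)
    return lam0 * math.exp(-ecc ** 2 / (2.0 * sigma ** 2))

def word_rate(letter_positions, fixation_pos, sigma_l, sigma_r):
    """Eqs. (2) and (5): mean processing rate over a word's letters."""
    rates = [processing_rate(x - fixation_pos, sigma_l, sigma_r)
             for x in letter_positions]
    return sum(rates) / len(rates)

def euler_step(a, preprocessing, rate, difficulty, f, omega, dt):
    """One Euler step of Eq. (6) for a single word.

    Returns (new_activation, still_preprocessing). Switching to lexical
    completion once the activation reaches L_n, and clipping at zero,
    are assumptions of this sketch.
    """
    if preprocessing:
        a += dt * (f * rate - omega)
        if a >= difficulty:
            return difficulty, False
        return max(a, 0.0), True
    a += dt * (-rate - omega)
    return max(a, 0.0), False

def selection_probabilities(activations):
    """Eq. (7): relative activations as target-selection probabilities."""
    total = sum(activations)
    if total <= 0.0:
        # No active word: no target can be selected (sketch convention).
        return [0.0] * len(activations)
    return [a / total for a in activations]
```

With illustrative values such as `sigma_l=2.0` and `sigma_r=4.0`, a word centered on the fixation position receives a higher `word_rate` than a word ten letters to the right, which is the processing gradient described above.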
3.2.6. Random timing and foveal inhibition

Saccade timing is defined as a stochastic process that can be modulated by foveal processing demands. The interval between two trigger signals that initiate new saccade programs is a γ-distributed random variable with mean t_sac and a standard deviation of (1/3) t_sac. The free parameter t_sac reflects a reader's individual mean reading rate as well as the difficulty of the text at hand. The duration of a fixation on a word is modulated by the extent of foveal activation. Let t_i be the time of initiation of the saccade program for saccade i. The command to initiate the program for saccade i + 1 will occur after an interval Δt_{i+1}, which is drawn from the above-mentioned γ-distribution. We assume that this interval can be prolonged by a foveal inhibition process. Thus, the next saccade program i + 1 will be triggered if

$$t > t_i + \Delta t_{i+1} + h\,a_k(t), \tag{8}$$

where the free parameter h denotes the strength of the inhibitory process and t is the time elapsed since the start of the current fixation. By way of analytical approximation (similar to the method employed by Kliegl & Engbert, 2003) it can be shown that the maximum inhibition time T is limited even for arbitrarily high values of h and is given by

$$T \xrightarrow{\,h \to \infty\,} \frac{\alpha}{\lambda_{\min}} = \frac{\alpha M_n}{\lambda_0} \left( 1 + \sum_{j=1}^{M_n - 1} \exp\left(-\frac{j^2}{2\sigma_L^2}\right) \right)^{-1}, \tag{9}$$

where λ_min denotes the processing rate assigned to the foveal word with the length M_n in the worst case of a fixation to the very right of the word, Eq. (5). The maximum inhibition time increases almost linearly with word length and it will turn out that it may actually attain a value of at most 65 ms for the longest word in our sentence corpus (described in Kliegl et al., 2004).

3.2.7. Saccade programming

Our thinking here stems from findings obtained with the double-step paradigm in saccade generation by Becker and Jürgens (1979). In the context of reading, similar ideas were proposed first by Reichle et al. (1998; see also Engbert & Kliegl, 2001). After the triggering of a saccade program, an assumed two-stage process is initiated which consists of a labile and a non-labile stage. The labile stage takes an average of τ_lab, throughout which it is susceptible to cancellation. It is followed by the non-labile stage with a selected saccade target, Eq. (7). The mean duration of the non-labile stage is referred to as τ_nl. The execution itself takes an average of τ_ex, throughout which preprocessing pauses due to saccadic blindness but lexical completion continues. Finally, the gaze position is updated and a new saccade program might be triggered. For all model runs the three τ-values were fixed at 150, 50, and 25 ms, respectively. A temporal scheme of saccade programming is illustrated in Fig. 3.

Fig. 3. Saccade programming in SWIFT. The triggering of a saccade program is governed by the temporal stream, whereas target selection at the beginning of the non-labile stage is guided by the saliency map.

3.2.8. Oculomotor errors

Our assumptions about oculomotor errors are derived from work by McConkie et al. (1988). Accordingly, the saccadic system aims at the optimal viewing positions within words, defined as word centers in SWIFT. However, the actual landing positions are shifted due to systematic errors and scattered due to random errors. Let L denote the intended saccade length to the center of the current target word. The actual saccade length ℓ can be computed as

$$\ell = L + \ell_{SRE} + \ell_G \tag{10}$$

with ℓ_SRE as the so-called systematic saccade range error and ℓ_G as a Gaussian-distributed random error with zero as its mean. The systematic error is assumed to arise from the system's hypothesized preference for saccades with length L_0. This property has the advantage of effective automation with the drawback of a limited adaptivity. Thus, the system undershoots for L > L_0 and overshoots for L < L_0. A linear approximation of the systematic error is

$$\ell_{SRE} = \delta_{SRE}\,(L_0 - L), \tag{11}$$

where δ_SRE specifies the strength of the saccade range error (McConkie et al., 1988). Since motor error tends to increase with motor amplitude, we assume, again as a linear approximation, that the spread of the random error component can be described by

$$\sigma_{\ell_G} = \delta_0 + \delta_1 L. \tag{12}$$

All parameters of Eqs. (11) and (12) were estimated from our data and held constant for all model runs. The parameters for the systematic error component were estimated separately for the different saccade types (Table 1).

Table 1
Estimates for oculomotor error parameters

Parameter   Forward                 Regressive
            Saccade   Refixation    Refixation   Saccade
δ_SRE       0.41      0.49          0.5          0.15
L_0         5.4       5.7           4.3          10.0
δ_0         0.870
δ_1         0.084

4. Simulation results

In this section, we will clarify the rationale of our parameter fitting, present parameter estimates, and finally discuss the model's performance in relation to experimental data.

4.1. Simulations and model parameters

All experimental phenomena that form the basis of our analyses are derived from the Potsdam corpus (Kliegl et al., 2004), a reference data set obtained from 230 participants. For the simulation input, we use the known independent measures of the corpus (1138 words from 144 sentences). The first and last words of each sentence are excluded from the statistical analysis. Each simulation is carried out for 500 virtual subjects. The temporal evolution of the ODEs (Eq. 6) is discretized in steps of 2 ms using the Euler method. SWIFT produces gaze trajectories that can be treated like actual data. We compute summary statistics for the dependent measures (i.e. mean fixation durations and probabilities) as functions of classes of word lengths and word frequencies and the distributions of the durations, as described in Section 2. We are then able to calculate χ²-type statistics reflecting the overall deviation of the simulated data from our experimental data. Optimization⁴ was carried out employing a genetic algorithm similar to the procedure described in Engbert et al. (2002).

Table 2 lists the set of parameters our optimization procedure converged at, together with respective estimates of error based on the 50 best performing parameter sets. Since the best performing 50 parameter sets fit the experimental data similarly well, we assume that the variances of the parameters in this collection are plausible measures of inter-individual parameter variance. Thus, the parameter variances (i.e. parameter sensitivities) were fed back into the model in order to simulate inter-individual differences, avoiding the general problem of insufficient variances of model outcomes.

4 Note that in contrast to linear models, where a different choice of one parameter value may often be compensated by a change in another parameter, we can hope for a unique solution. We also conducted an additional independent run of the genetic algorithm to verify the yielded estimates.
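The stochastic components of a simulation run, i.e. the random timer with foveal inhibition (Eq. 8) and the oculomotor error model (Eqs. 10–12), can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names are hypothetical, saccade lengths are assumed to be non-negative magnitudes in letters, and the gamma shape of 9 follows from requiring a standard deviation of one third of the mean.

```python
import random

def draw_timer_interval(t_sac, rng):
    """Gamma-distributed interval with mean t_sac and SD t_sac / 3."""
    # mean = shape * scale, SD = sqrt(shape) * scale  ->  shape = 9.
    shape = 9.0
    scale = t_sac / shape
    return rng.gammavariate(shape, scale)

def should_trigger(t, t_i, interval, h, foveal_activation):
    """Eq. (8): trigger once t exceeds the inhibited interval."""
    return t > t_i + interval + h * foveal_activation

def actual_saccade_length(intended, delta_sre, l0, delta0, delta1, rng):
    """Eqs. (10)-(12): intended length plus systematic and random error.

    Assumption: `intended` is a non-negative length in letters.
    """
    systematic = delta_sre * (l0 - intended)      # Eq. (11)
    spread = delta0 + delta1 * intended           # Eq. (12)
    random_error = rng.gauss(0.0, spread)         # zero-mean Gaussian
    return intended + systematic + random_error   # Eq. (10)

rng = random.Random(1)
# Values as read from the forward-saccade column of Table 1.
landing = actual_saccade_length(8.0, 0.41, 5.4, 0.870, 0.084, rng)
```

For an intended length of 8 letters and the forward-saccade values above, the systematic term alone shortens the saccade by 0.41 × (8 − 5.4) ≈ 1.07 letters, which is the undershoot for L > L_0 described in Section 3.2.8.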
Table 2
Estimates of free model parameters

Parameter               Symbol   Value   Error   Min   Max
Lexical parameters
Frequency, intercept    α        48.1    0.88    10    100
Frequency, slope        β        0.177   0.08    0     5
Predictability weight   θ        0.62    0.04    0     1
Visual processing
Visual span, right      σ_R      4.55    0.07    1     5
Visual span, left       σ_L      0.05    0.01    0     6
Preprocessing factor    f        90.7    9.82    1     120
Global decay            ω        0.026   0.00    0     0.1
Saccade timing
Random timing [ms]      t_sac    193.4   1.89    100   300
Inhibition factor       h        2.23    0.24    0     100

4.2. Model performance

We open this section with a simulation example to illustrate some of the intrinsic behavior of SWIFT. Then, we present the above-mentioned summary statistics for 500 virtual subjects in comparison to our experimental data. Finally, we discuss data concerning initial landing positions and refixation probabilities.

4.2.1. Simulation example

Fig. 4 depicts a single simulated reading trajectory; the word-based fixation sequence is {1, 3, 4, 5, 5, 7, 8, 7}, meaning that the model's eye first landed on word 1, then on word 3, and so forth. The time course of the set of activations {a_n(t)} is plotted in gray with fixation positions k(t) in black. Some phenomena will now be described briefly. As can be seen in Fig. 4, word 2 and word 6 are skipped, indicated by the fact that the fixation bars are never located within the respective word boundaries. The first skipping occurs due to oculomotor error since the skipped word was actually chosen as saccade target and missed, whereas word 6 was completely processed parafoveally. We get a refixation on word 5, although word 6 had higher activation at the time of target selection. This may happen due to the probabilistic nature of target selection. As stated earlier, SWIFT is able to produce regressions. Word 7 is regressed to since it was not completely processed during the two previous fixations. According to SWIFT, regressions might also be mislocated refixations arising from oculomotor errors.

In the depicted example there are never more than four words activated simultaneously. Larger simulations show that simultaneous activation of more than 5 words occurs less than 5% of the time. As can be seen in Fig. 5, most of the time the number of words processed in parallel is 3–4. Processing of just one word at a time, i.e. serial processing, usually occurs towards the end of sentences, since the rest of the sentence has been processed thoroughly by then. Neither in our data nor in our simulations did we find an acceleration, or shorter fixations, towards the end of sentences.

Fig. 4. Simulation example. The set of lexical activations is plotted as gray polygons and the trajectory of fixation positions is represented by the black line. The lengthy vertical bars represent the fixation durations: uppermost black areas denote compounds of the random terms involved in saccade timing; gray and white stretches denote labile and non-labile stages of saccade programming, respectively; the lowest black areas are times of saccade executions. Small black rings denote intended saccade targets.

Fig. 5. Amount of parallel processing in SWIFT. Depicted are the cumulative probabilities for the numbers of words processed in parallel (1, 2, 3, 4, 5, and more than 5) at any given time in the course of sentence reading. We simulated 200 virtual subjects reading all sentences from our corpus. We then rescaled the inspection time for each simulation in order to compute the probabilities. The extent of each area reflects the overall probability of processing the respective number of words.

4.2.2. Summary statistics

The effects of word length and word frequency on the dependent variables for experimental data and SWIFT simulations can be compared in Fig. 6. We summarized data for five logarithmic frequency classes (1–10, 11–100, 101–1000, 1001–10,000, and >10,000 per million) and 11 word-length classes (2–11 and >11 letters). Distributions of fixation durations of simulated data are compared to those of experimental data in Fig. 7. Simulated durations are in good agreement with experimental data apart from their narrower distributions, particularly in the case of F1 and F2.
[Figure 6, panels a–d. Panels (a) and (c): mean fixation duration in ms (first, second, single, total); panels (b) and (d): relative frequency (skipping, two, three+, regression). X-axes: word frequency class (1–5) in (a) and (b); word length class (2–12) in (c) and (d). Legend: experiment, simulation.]
Fig. 6. Summary statistics for measured (dotted lines) as well as simulated (solid lines) data. Depicted are effects of word frequency on (a) mean fixation durations and (b) relative frequencies of saccadic events. Also depicted are effects of word lengths on (c) fixation durations and (d) relative frequencies of saccadic events.

[Figure 7. Panels: first fixation, second fixation, single fixation, total reading time. X-axis: fixation duration in ms (100–400); y-axis: relative frequency (0–0.3). Legend: experiment, simulation.]
Fig. 7. Distributions of simulated and experimental fixation durations (F1, F2, SF and TT).

[Figure 8. Columns: 4-letter, 6-letter, and 8-letter words; rows: launch sites –1, –3, –5, –7. X-axis: initial landing position in letters (0–8); y-axis: % of fixations (0–0.5). Legend: experiment, simulation.]
Fig. 8. Distributions of initial landing positions as a function of word length and launch site. The launch site of a saccade is defined as the distance from the launch position to the space before the word it is directed towards. For clarity, only word lengths of 4, 6, and 8 letters and launch sites of 1, 3, 5, and 7 are depicted.

[Figure 9. Panels: Simulation, Experiment. X-axis: center based initial landing position (–4 to 2); y-axis: relative frequency of refixations (0–0.4). Curves: word lengths 4, 5, 6, 7, and 8.]
Fig. 9. Relative frequency of refixations as functions of initial landing position for different word lengths for experimental and simulated data.

4.2.3. Initial landing positions
SWIFT, in its former version (Engbert et al., 2002), was able to reproduce the summary statistics equally well. From the results depicted in Fig. 8 it is obvious that SWIFT with the additional assumption of the saccade range error, Eq. (11), yields a rather close fit to the experimentally observed effects concerning the distributions of initial landing positions as a function of word length and launch site. In both the model and the experimental data, landing positions are (a) more scattered for longer words and longer preceding saccades and (b) shifted towards the end of short words and towards the beginning of words after longer preceding saccades.
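The two landing-position effects named above (scatter and shift) can be checked on any set of saccade records with a few lines of code. The sketch below groups initial landing positions by word length and launch site and reports the relative frequency distribution, mean, and standard deviation per cell. The record format, the convention that landing position 0 is the space before the word, and the function name `landing_distributions` are assumptions for illustration; the sketch does not implement the saccade range error of Eq. (11).

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def landing_distributions(saccades):
    """Group initial landing positions by (word_length, launch_site).

    saccades: iterable of (word_length, launch_site, landing_position), all
              in letters; landing position 0 is taken to be the space before
              the word (hypothetical record format assumed for this sketch).
    Returns {(word_length, launch_site): {"dist": {position: rel. frequency},
                                          "mean": mean position,
                                          "sd": population SD of positions}}.
    """
    cells = defaultdict(list)
    for word_length, launch_site, landing in saccades:
        cells[(word_length, launch_site)].append(landing)

    summary = {}
    for cell, positions in cells.items():
        n = len(positions)
        dist = {pos: c / n for pos, c in sorted(Counter(positions).items())}
        summary[cell] = {
            "dist": dist,
            "mean": mean(positions),   # shift of the distribution
            "sd": pstdev(positions),   # scatter of the distribution
        }
    return summary
```

Comparing `sd` across cells addresses effect (a), and comparing `mean` relative to word length addresses effect (b), for experimental and simulated records alike.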
4.2.4. Refixation probability
The probability of a refixation as a function of initial landing position (cf. McConkie, Kerr, Reddix, Zola, & Jacobs, 1989) provides us with information on the optimal fixation position, i.e. the position which allows the most effective lexical processing. The more effective lexical processing is, the smaller the need for refixations. Experimental data yielded clear U-shaped curves with minima slightly shifted to the left of word centers. A comparison to simulated curves of relative frequencies of refixations is depicted in Fig. 9. The curves for the shorter words are at least qualitatively in accordance with real data, i.e. we find a dip to the left of word centers. For longer words the dip is less pronounced or absent. In addition, the simulated data show a much stronger influence of word length on refixation probabilities to the left of word centers. Thus SWIFT fails to accurately reproduce these data, a shortcoming that must be addressed in its future development.

5. Discussion
The translation of SWIFT to the level of letters was successful in many respects:
(a) SWIFT still accounts for the same range of phenomena of gaze sequences employing one general mechanism. In addition, more physiological plausibility is gained through replacing the discrete (word-based) gradient by a continuous gradient that takes into account different word lengths.
(b) SWIFT is capable of quantitatively reproducing the influence of word characteristics on inspection times and probabilities of the above-mentioned fixational events equally well.
(c) Moreover, SWIFT has increased its explanatory power: with the introduction of the additional principle of the saccade range error, it now accounts for landing position effects.

Yet, we have to acknowledge one shortcoming of SWIFT in its current state. Even though we were able to reproduce some qualitative aspects of the refixation behavior, the experimental data by and large disagree with our simulations on a quantitative level. To account for the refixation probabilities in a future version of SWIFT, it might be feasible to further assume that large saccade errors induce immediate corrective saccades, thus interrupting the autonomous saccade generation. An immediately following saccade would increase the probability of a refixation, because it is more probable that the fixated word wins the target selection again, as there is less time to change the saliency map. Since fixations on word margins are more likely to arise from saccades with larger errors, immediate corrective saccades after such large errors would yield the desired effect of an increased refixation probability for initial fixations on word margins relative to initial fixations on word centers.

The refinement of the processing gradient does not increase the number of parameters. Additional parameters for the saccade range error were fixed and hence add no further degrees of freedom to the model. One psychologically plausible parameter, namely memory decay, has been introduced. Together with the parameter h, which attenuates the influence of predictability on word difficulty (cf. Reichle et al., 2003, Eq. (2)), SWIFT's degrees of freedom have increased by two. On the other hand, we saved one s-parameter related to the programming of saccades and fixed two of the other previously free s-parameters. Finally, no additional states had to be added to the model, thus preserving its straightforward architecture.
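The refixation curves of Section 4.2.4 (relative frequency of refixations as a function of initial landing position, per word length) can be sketched as follows. The record format, the function names, and in particular the definition of the center-based position as `landing_letter - (word_length + 1) / 2` are assumptions made for this example and may differ from the definition used for Fig. 9.

```python
from collections import defaultdict


def refixation_curves(fixations):
    """Relative frequency of refixations per word length and centered position.

    fixations: iterable of (word_length, landing_letter, was_refixated), where
               landing_letter counts letters from 1 (hypothetical format).
    Returns {word_length: {centered_position: relative refixation frequency}}.
    """
    # (word_length, centered position) -> [number of refixations, total count]
    hits = defaultdict(lambda: [0, 0])
    for word_length, landing_letter, was_refixated in fixations:
        centered = landing_letter - (word_length + 1) / 2  # assumed definition
        cell = hits[(word_length, centered)]
        cell[0] += int(was_refixated)
        cell[1] += 1

    curves = defaultdict(dict)
    for (word_length, centered), (refix, total) in hits.items():
        curves[word_length][centered] = refix / total
    return dict(curves)


def curve_minimum(curve):
    """Return the centered position with the lowest refixation frequency."""
    return min(curve, key=curve.get)
```

A minimum at a negative centered position corresponds to a dip to the left of the word center, which is the pattern reported for the experimental data.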
References
Baayen, R. H., Piepenbrock, R., & Rijn, H. (1993). The CELEX lexical database (Release 1) [CD-ROM]. University of Pennsylvania, Philadelphia, PA: Linguistic Data Consortium.
Becker, W., & Jürgens, R. (1979). An analysis of the saccadic system by means of double step stimuli. Vision Research, 19, 1967–1983.
Engbert, R., & Kliegl, R. (2001). Mathematical models of eye movements in reading: a possible role for autonomous saccades. Biological Cybernetics, 85, 77–87.
Engbert, R., Longtin, A., & Kliegl, R. (2002). A dynamical model of saccade generation in reading based on spatially distributed lexical processing. Vision Research, 42, 621–636.
Engbert, R., Longtin, A., & Kliegl, R. (2004). Complexity of eye movements in reading. International Journal of Bifurcation and Chaos, 14, 493–503.
Erlhagen, W., & Schöner, G. (2002). Dynamic field theory of movement preparation. Psychological Review, 109, 545–572.
Findlay, J. M., & Gilchrist, I. D. (2003). Active vision. The psychology of looking and seeing. Oxford: Oxford University Press.
Findlay, J. M., & Walker, R. (1999). A model of saccade generation based on parallel processing and competitive inhibition. Behavioral and Brain Sciences, 22, 661–721.
Frazier, L., Pacht, J. M., & Rayner, K. (1999). Taking on semantic commitments II: collective versus distributive readings. Journal of Memory and Language, 29, 181–200.
Frazier, L., & Rayner, K. (1990). Taking on semantic commitments: Processing multiple meanings vs. multiple senses. Journal of Memory and Language, 29, 181–200.
Kliegl, R., & Engbert, R. (2003). SWIFT explorations. In J. Hyönä, R. Radach, & H. Deubel (Eds.), The mind's eye: Cognitive and applied aspects of eye movements (pp. 103–117). Oxford: Elsevier.
Kliegl, R., Grabner, E., Rolfs, M., & Engbert, R. (2004). Length, frequency, and predictability effects of words on eye movements in reading. European Journal of Cognitive Psychology, 16, 262–284.
Legge, G. E., Hooven, T. A., Klitz, T. S., Mansfield, J. S., & Tjan, B. S. (2002). Mr. Chips 2002: new insights from an ideal-observer model of reading. Vision Research, 42, 2219–2234.
Legge, G. E., Klitz, T. S., & Tjan, B. S. (1997). Mr. Chips: an ideal-observer model of reading. Psychological Review, 104, 524–553.
Liversedge, S. P., & Findlay, J. M. (2000). Saccadic eye movements and cognition. Trends in Cognitive Sciences, 4, 6–14.
McConkie, G. W., Kerr, P. W., & Dyre, B. P. (1994). What are 'normal' eye movements during reading: toward a mathematical description. In J. Ygge & G. Lennerstrand (Eds.), Eye movements in reading (pp. 315–327). Oxford: Elsevier.
McConkie, G. W., Kerr, P. W., Reddix, M. D., & Zola, D. (1988). Eye movement control during reading: I. The location of initial fixations on words. Vision Research, 28, 1107–1118.
McConkie, G. W., Kerr, P. W., Reddix, M. D., Zola, D., & Jacobs, A. M. (1989). Eye movement control during reading: II. Frequency of refixating a word. Perception and Psychophysics, 46, 245–253.
Morrison, R. E. (1984). Manipulation of stimulus onset delay in reading: evidence for parallel programming of saccades. Journal of Experimental Psychology: Human Perception and Performance, 10, 667–682.
Radach, R., Kennedy, A., & Rayner, K. (2004). Eye movements and information processing during reading [Special issue]. European Journal of Cognitive Psychology, 16, 1–352.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372–422.
Reichle, E. D., Pollatsek, A., Fisher, D. L., & Rayner, K. (1998). Toward a model of eye movement control in reading. Psychological Review, 105, 125–157.
Reichle, E. D., Rayner, K., & Pollatsek, A. (1999). Eye movement control in reading: accounting for initial fixation locations and refixations within the E–Z Reader model. Vision Research, 39, 4403–4411.
Reichle, E. D., Rayner, K., & Pollatsek, A. (2003). The E–Z Reader model of eye movement control in reading: comparisons to other models. Behavioral and Brain Sciences, 26, 446–526.
Reilly, R., & O'Regan, J. K. (1998). Eye movement control in reading: A simulation of some word-targeting strategies. Vision Research, 38, 303–317.
Reilly, R. G., & Radach, R. (2003). Foundations of an interactive activation model of eye movement control in reading. In J. Hyönä, R. Radach, & H. Deubel (Eds.), The mind's eye: Cognitive and applied aspects of eye movements (pp. 429–455). Amsterdam: Elsevier.
Suppes, P. (1990). Eye movement models for arithmetic and reading performance. In E. Kowler (Ed.), Eye movements and their role in visual and cognitive processes. Amsterdam: Elsevier.
Suppes, P. (1994). Stochastic models of reading. In J. Ygge & G. Lennerstrand (Eds.), Eye movements in reading. Oxford: Pergamon Press.
Trukenbrod, H. A., & Engbert, R. (in preparation). Eye movements in visual search: experiment and computational modeling.
Yang, S.-N., & McConkie, G. W. (2001). Eye movements during reading: A theory of saccade initiation time. Vision Research, 41, 3567–3585.