Cognitive Systems Research 11 (2010) 108–119 www.elsevier.com/locate/cogsys
Learning HMM-based cognitive load models for supporting human-agent teamwork

Action editor: Jiming Liu

Xiaocong Fan a,*, Po-Chun Chen b, John Yen c

a School of Engineering, The Behrend College, The Pennsylvania State University, Erie, PA 16563, USA
b Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
c College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802, USA

Received 25 February 2008; accepted 4 August 2008; available online 28 August 2008
Abstract

Cognitive studies indicate that members of a high-performing team often develop shared mental models to predict others' needs and coordinate their behaviors. The concept of shared mental models is especially useful in the study of human-centered collaborative systems that require humans to team with autonomous agents in complex activities. We take the position that in a mixed human-agent team, agents empowered with cognitive load models of human team members can help humans develop better shared mental models and thereby enhance team performance. Inspired by the human information processing system, we propose an HMM-based load model for members of human-agent teams and investigate the development of realistic cognitive load models. A cognitive experiment was conducted in team contexts to collect data on the observable secondary task performance of human participants. The data were used to train hidden Markov models (HMMs) with varied numbers of hypothetical hidden states. The results indicate that the model spaces have a three-layer structure. Statistical analysis also reveals some characteristics of the models at the top layer. This study can guide the selection of HMM-based cognitive load models for agents in human-centered multi-agent systems.

© 2008 Elsevier B.V. All rights reserved.

Keywords: Cognitive load model; Shared mental model; Hidden Markov model; Multi-agent teamwork
This is a significant extension of the paper presented at the Eighth International Conference on Cognitive Modeling, Ann Arbor, 2007.
* Corresponding author.
E-mail addresses: [email protected] (X. Fan), [email protected] (P.-C. Chen), [email protected] (J. Yen).
doi:10.1016/j.cogsys.2008.08.004

1. Introduction

Human-centered multi-agent teamwork has attracted increasing attention in the multi-agent systems field (Bradshaw et al., 2002; Norling, 2004; Fan, Sun, Sun, McNeese, & Yen, 2006). Human-centered teamwork, involving both humans and software agents, is about collaboratively establishing situation awareness, developing shared mental models as the situation evolves, and appropriately adapting to mixed-initiative activities. Humans and autonomous agents are generally thought to be complementary: while humans are limited by their cognitive capacity in information processing, they are superior in spatial, heuristic, and analogical reasoning; autonomous agents can continuously learn expertise and tacit problem-solving knowledge from humans to improve system performance.

However, the foundation of human-agent collaboration keeps being challenged by unrealistic modeling of mutual awareness of the state of affairs. In particular, few researchers have looked into the principles of modeling shared mental constructs between a human and his/her assisting agent. Moreover, human-agent relationships can go beyond partners to teams (Fan, Yen, & Volz, 2005). Many information processing limitations of individuals can be alleviated by having a group perform tasks. Although groups can also create additional costs centered
on communication, conflict resolution, and social acceptance, it is suggested that such limitations can be overcome if people have shared cognitive structures for interpreting task and social requirements (Lord & Maher, 1990). Therefore, there is a clear demand for investigations to broaden and deepen our understanding of the principles of modeling shared mental states among members of a mixed human-agent team.

There are several lines of research on multi-agent teamwork, both theoretical and empirical. For instance, joint intention (Cohen & Levesque, 1991) and shared plans (Grosz & Kraus, 1996) are two theoretical frameworks for specifying agent collaborations. One drawback is that, although both have deep philosophical and cognitive roots, neither explicitly accommodates the modeling of human team members. Cognitive studies suggest that teams which have shared mental models are expected to have common expectations of the task and team, which allow them to predict the behavior and resource needs of team members more accurately (Rouse, Cannon-Bowers, & Salas, 1992; Klimoski & Mohammed, 1994). Rouse et al. (1992) explicitly argue that team members should hold compatible models that lead to common "expectations". We agree, and believe that the establishment of shared mental models among human and agent team members is a critical step in advancing human-centered teamwork research.

However, it is extremely challenging to develop shared mental models among team members, for several reasons. In the first place, the concept of shared mental models can broadly include role assignment and its dynamics, teamwork schemas and progress, communication patterns and intentions, etc. Team members have to share a large amount of information dynamically in order to effectively coordinate their activities. In the second place, both humans and agents are resource-bounded.
It is of no use to share information with teammates who are already overloaded; they simply do not have the capacity to process it. Thus, we argue that to support human-agent collaboration, an agent system should be designed to allow the estimation and prediction of human teammates' (relative) cognitive loads, and to use these estimates to offer improvised, unintrusive help. Ideally, being able to predict the cognitive/processing capacity curves of teammates could allow a team member to share the right information with the right party at the right time (Fan & Yen, 2007), avoiding unbalanced work/cognitive loads across the team. While the long-term goal of our research is to understand how shared cognitive structures can enhance human-agent team performance, the specific objective of the work reported here is to develop a computational cognitive capacity model to facilitate the establishment of shared mental models.

The rest of the paper is organized as follows. In Section 2 we review cognitive load theories and measurements. An HMM-based load model is given in Section 3 to support resource-bounded teamwork
among human-agent pairs. Section 4 describes the cognitive task design and the experiment conducted to collect observable measures of secondary task performance in a team context. Section 5 reports the methodology of learning hidden Markov models (HMMs) and the principles of selecting appropriate HMMs for an agent to estimate its human partner's dynamic cognitive load. Section 6 concludes the paper.

2. Background on cognitive load studies

People are information processors. Most cognitive scientists (Lord & Maher, 1990) believe that the human information processing system consists of an executive component and three main information stores: (a) the sensory store, which receives and retains information for about one second; (b) working (or short-term) memory, which has a limited capacity to hold (approximately seven elements at any one time (Miller, 1956)), retain (for several seconds), and manipulate (two or three information elements simultaneously) information; and (c) long-term memory, which has virtually unlimited capacity (Baddeley, 1992) and contains a huge amount of accumulated knowledge organized as schemata.

Cognitive load studies are, by and large, concerned with working memory capacity and how to circumvent its limitations in human problem-solving activities such as learning and decision making. According to cognitive load theory (Paas & Merrienboer, 1993), cognitive load is a multidimensional construct representing the load that a particular task imposes on the performer. It has a causal dimension, including causal factors that can be characteristics of the subject (e.g. expertise level), the task (e.g. task complexity, time pressure), the environment (e.g. noise), and their mutual relations. It also has an assessment dimension, reflecting the measurable concepts of mental load (imposed exclusively by the task and environmental demands), mental effort (the cognitive capacity actually allocated to the task), and performance.
Lang's information processing model (Lang, 2000) consists of three major processes: encoding, storage, and retrieval. The encoding process selectively maps messages in the sensory stores that are relevant to a person's goals into working memory; the storage process consolidates the newly encoded information into chunks, forming associations and schemata to facilitate subsequent recall; the retrieval process searches the associated memory network for a specific element/schema and reactivates it into working memory. The model suggests that processing resources (cognitive capacity) are independently allocated to the three processes. In addition, working memory is used both for holding and for processing information (Baddeley, 1992). Due to its limited capacity, when greater effort is required to process information, less capacity remains for the storage of information. Hence, the allocation of the limited cognitive resources has to be balanced in order to enhance human performance. This leads to the issue of
measuring cognitive load, which has proven difficult for cognitive scientists. Cognitive load can be assessed by measuring mental load, mental effort, and performance using rating scales, psycho-physiological techniques, and secondary task techniques (Paas, Tuovinen, Tabbers, & Gerven, 2003). Self-ratings may appear questionable and restricted, especially when instantaneous load needs to be measured over time. Although physiological measures are sometimes highly sensitive for tracking fluctuating levels of cognitive load, costs and workplace conditions often favor task- and performance-based techniques, which involve measuring a secondary task as well as the primary task under consideration. Secondary task techniques are based on the assumption that performance on a secondary task reflects the level of cognitive load imposed by a primary task (Sweller, 1988). From the resource allocation perspective, assuming a fixed cognitive capacity, any increase in cognitive resources required by the primary task must inevitably decrease the resources available for the secondary task (Lang, 2000). Consequently, performance on a secondary task deteriorates as the difficulty or priority of the primary task increases. The level of cognitive load can thus be manifested by secondary task performance: the subject is becoming overloaded if the secondary task performance drops. A secondary task can be as simple as detecting a visual or auditory signal, but it requires sustained attention. Its performance can be measured in terms of reaction time, accuracy, and error rate. However, one important drawback of secondary task performance, as noted by Paas et al. (2003), is that it can interfere considerably with the primary task (competing for limited capacity), especially when the primary task is complex.
To better understand and measure cognitive load, Xie and Salvendy (2000) introduced a conceptual framework which distinguishes instantaneous load, peak load, accumulated load, average load, and overall load. The notion of instantaneous load, which represents the dynamics of cognitive load over time, seems especially useful for monitoring the fluctuation trend, so that free capacity can be exploited at the most appropriate time to enhance overall performance in human-agent collaborations.
3. HMM-based load models

We first introduce a human-centered teamwork model in which team members are human-agent pairs, and then describe an approach to uniformly model humans' cognitive load, agents' processing load, and the load of human-agent pairs.

3.1. A human-centered teamwork model

We use ⟨Hi, Ai⟩ to denote human-agent pair (HAP) i, and take a team to be composed of a group of collaborative HAPs. As illustrated in Fig. 1, each human team member interacts with the rest of the team through his/her agent, which maintains a model of its human partner, an information/activity processing model, a communication model, and a teamwork model including a model for each HAP in the team. An agent uses the model of its human partner to track the human's cognitive states (e.g. goals, intentions, trust) and presents the model to the human to establish mutual understanding of the state of affairs. Through the communication model and the processing model, an agent also dynamically updates its models of other HAPs, produces a shared picture of team activities, and presents the shared cognitive structure to its human partner.

People are limited information processors, and so are intelligent agent systems. In this simple human-centered teamwork model, we withdraw the assumption that an agent knows all the information/intentions communicated by other teammates. Instead, we adopt the constraint that an agent's processing capacity is limited by computing resources, inference knowledge, concurrent tasking capability, etc. We contend that due to limited processing capacity, an agent may only have the opportunity to process (make sense of) a portion of the incoming information, with the rest ignored. Taking this approach largely changes the way in which an agent views (models) the involvement and cooperativeness of its teammates in a team activity.
In other words, the establishment of shared mental models regarding team members' beliefs, intentions, and responsibilities can no longer rely on inter-agent communication alone. That said, we are not dropping the assumption
Fig. 1. Human-centered teamwork model.
that "teammates are trustable". We still stick to this, only that team members cannot over-trust each other: an agent has to consider the possibility that the information it shares with others might not be as effective as expected due to the recipients' limited processing capacities.

3.2. Computational cognitive capacity model

A hidden Markov model (HMM) (Rabiner, 1989) is a statistical approach to modeling systems that can be viewed as a Markov process with unknown hidden parameters. The hidden state variables are not directly visible, but they influence certain observable variables. Each hidden state has a probability distribution over the possible observable symbols; therefore the sequence of observable states can be used to make inferences about the sequence of hidden states. An HMM is denoted by λ = ⟨N, V, A, B, π⟩, where N is a set of hidden states, V is a set of observation symbols, A is a set of state transition probability distributions, B is a set of observation symbol probability distributions (one for each hidden state), and π is the initial state distribution. HMMs have been widely applied in bioinformatics and temporal pattern recognition (such as speech, handwriting, and gesture recognition).

For an intelligent agent serving as a cognitive aid, it is desirable that the model of its human partner implemented within the agent be cognitively acceptable, if not descriptively accurate. However, building a cognitive load model that is cognitively acceptable is not trivial. An HMM-based approach is used in this study for several reasons. First, cognitive load has a dynamic nature. As mentioned above, being able to monitor the dynamics of a human partner's cognitive load over time is very useful for an agent to proactively identify collaboration opportunities in human-centered teamwork. The inference of instantaneous cognitive load can be cast as a temporal pattern recognition problem, for which an HMM is especially suitable.
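The five-tuple λ = ⟨N, V, A, B, π⟩ maps directly onto a small data structure. The following Python sketch shows one way to hold the parameters and to generate an observation sequence from them; the uniform 5-state, 10-symbol parameters are illustrative placeholders, not values from any trained model.

```python
import numpy as np

# A minimal container for an HMM lambda = <N, V, A, B, pi>.
# The 5 hidden states and 10 observation symbols mirror the cognitive
# load model discussed below; all probabilities here are placeholders.

class HMM:
    def __init__(self, A, B, pi):
        self.A = np.asarray(A)    # state transition distributions (N x N)
        self.B = np.asarray(B)    # observation symbol distributions (N x M)
        self.pi = np.asarray(pi)  # initial state distribution (N,)
        assert np.allclose(self.A.sum(axis=1), 1.0)
        assert np.allclose(self.B.sum(axis=1), 1.0)
        assert np.isclose(self.pi.sum(), 1.0)

    def sample(self, T, rng=None):
        """Generate a length-T hidden-state path and observation sequence."""
        rng = rng or np.random.default_rng(0)
        N, M = self.B.shape
        states, obs = [], []
        s = rng.choice(N, p=self.pi)
        for _ in range(T):
            states.append(int(s))
            obs.append(int(rng.choice(M, p=self.B[s])))
            s = rng.choice(N, p=self.A[s])
        return states, obs

N, M = 5, 10  # 5 load levels; secondary-task scores 0..9
A = np.full((N, N), 1.0 / N)
B = np.full((N, M), 1.0 / M)
pi = np.full(N, 1.0 / N)
model = HMM(A, B, pi)
states, obs = model.sample(45)  # one 45-step sequence, as in the experiment
```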
Second, the HMM approach demands that the system being modeled (here, the human's cognitive capacity) have both observable and hidden state variables, and that the hidden variables be correlated with the observable variables. As discussed above, there is ample evidence supporting secondary task performance as a highly sensitive and reliable technique for measuring a human's cognitive load (Paas et al., 2003). We can thus use secondary task performance as the observable signal for estimating the hidden cognitive load state. For example, secondary task performance can be measured as the number of items correctly recalled. According to Miller's 7 ± 2 rule, the observable states then take integer values from 0 to 9 (taken to be 9 whenever the number of items correctly recalled is 9 or more). Hence, the strong tie, as uncovered in cognitive studies, between a human's cognitive load and his/her secondary task performance also justifies the use of an HMM approach.

In the following, we adopt the HMM approach to model human cognitive capacity. It is thus assumed that at each time step the secondary task performance of a
human subject in a team composed of HAPs is observable to all the team members. Human team members' secondary task performance is used for estimating their hidden cognitive loads.

As an example, Fig. 2 gives a 5-state HMM model of human cognitive load. The hidden states are 1 (negligibly loaded), 2 (slightly loaded), 3 (fairly loaded), 4 (heavily loaded), and 5 (overly loaded). The observable states are tied to secondary task performance; this example assumes they range from 0 to 9. Based on the B matrix given in Fig. 2, it is very likely that the cognitive load of the subject is "negligible" when the observable state is 9.

However, determining the current hidden load status of a human partner is not trivial. The model might be oversensitive if we only consider the last step's secondary task performance to locate the most likely hidden state. There is ample evidence suggesting that human cognitive load is a continuous function over time and does not manifest sudden shifts unless there is a fundamental change in tasking demands. To address this issue, we can place a constraint on the state transition coefficients, e.g. no jumps of more than 2 states are allowed. In addition, we take the position that a human subject is very likely overloaded if his secondary task performance has been mostly low over recent time steps, and very likely not overloaded if it has been mostly high. This leads to the following Windowed-HMM approach.

Given a pre-trained HMM λ of human cognitive load and the recent observation sequence O_t of length w (the effective window size), let e_t^λ be the estimated hidden state at time step t. First apply the HMM to the observation sequence to find the optimal sequence of hidden states S_t^λ = s_1 s_2 ... s_w (Viterbi algorithm). Then compute the estimated hidden state e_t^λ for the current time step, viewing it as a function of S_t^λ.
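The Viterbi step of this procedure can be sketched as the standard log-space recursion over the window. The 3-state, 2-symbol parameters below are illustrative only, not taken from the paper's models.

```python
import numpy as np

# Sketch of the Viterbi step of the Windowed-HMM approach: recover the
# most likely hidden-state path for the last w observations.

def viterbi(obs, A, B, pi):
    with np.errstate(divide="ignore"):        # allow log(0) -> -inf
        logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    T, N = len(obs), len(pi)
    delta = logpi + logB[:, obs[0]]           # best log-prob ending in each state
    psi = np.zeros((T, N), dtype=int)         # back-pointers
    for t in range(1, T):
        cand = delta[:, None] + logA          # cand[i, j]: via state i to j
        psi[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + logB[:, obs[t]]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):             # trace back-pointers
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Illustrative parameters: a sticky 3-state chain with 2 symbols.
A = np.array([[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]])
B = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])
pi = np.array([1 / 3, 1 / 3, 1 / 3])
print(viterbi([0, 0, 0, 0, 0], A, B, pi))  # -> [0, 0, 0, 0, 0]
```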
We consider all the hidden states in S_t^λ, weighted by their respective distances to e_{t−1}^λ (the estimated state of the last step): the closer a state in S_t^λ is to e_{t−1}^λ, the higher the probability of that state being e_t^λ. e_t^λ is set to be the state with the highest probability (note that a state may have multiple appearances in S_t^λ). More formally, the probability of state s being e_t^λ is given by:

p_λ(s, t) = Σ_{s = s_j ∈ S_t^λ} g(s_j) · e^(−|s_j − e_{t−1}^λ|),    (1)

where g(s_j) = e^j / Σ_{k=1}^{w} e^k is the weight of s_j ∈ S_t^λ (the most recent hidden state has the most significant influence in predicting the next state). The estimated state for the current step is the state with maximum likelihood:

e_t^λ = argmax_{s ∈ S_t^λ} p_λ(s, t).    (2)

Fig. 2. An HMM cognitive load model.
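Eqs. (1) and (2) translate directly into code once the Viterbi path over the current window is in hand. In the sketch below, `path` stands for the decoded sequence s_1 ... s_w and `e_prev` for the previous estimate e_{t−1}^λ; both names are our own. The transition constraint mentioned earlier (no jumps of more than 2 states) would be enforced in the A matrix, not here.

```python
import numpy as np

# Sketch of Eqs. (1)-(2): estimate the current hidden load state from
# the Viterbi path over the last w observations, weighting each state
# by its window position and its distance to the previous estimate.

def estimate_state(path, e_prev):
    w = len(path)
    weights = np.exp(np.arange(1, w + 1))
    weights /= weights.sum()              # g(s_j) = e^j / sum_k e^k
    scores = {}
    for g, s_j in zip(weights, path):
        # states closer to the last step's estimate score higher
        scores[s_j] = scores.get(s_j, 0.0) + g * np.exp(-abs(s_j - e_prev))
    return max(scores, key=scores.get)    # Eq. (2): argmax over s in path

# e.g. a window of w = 5 Viterbi states, previous estimate 4 ("heavily")
print(estimate_state([3, 4, 4, 5, 5], 4))  # -> 5
```

The later entries of the window dominate because g grows exponentially with the position j, matching the remark under Eq. (1).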
3.3. Agent processing load model

As mentioned, resource-bounded agents can be overwhelmed by incoming information; an agent has to adopt a realistic information processing strategy. One solution is load-state based. For example, an agent can ignore all incoming information when it is overloaded, while selectively processing messages when it is fairly loaded. This approach requires an agent to be aware of its own processing load status.

In general, statistical approaches or fuzzy techniques can be employed for an agent to model and estimate other agents' (including its own) instantaneous processing load. However, the model parameters of agent processing load and human cognitive load might have interaction effects on the measure of team performance. In order to gain a clear understanding of the effectiveness of the HMM-based cognitive load model in human-agent collaboration, we employ a uniform model for agent processing load estimation, as follows. According to schema theory (Paas & Merrienboer, 1993), multiple elements of information can be chunked as single elements in cognitive schemas. A schema can hold a huge amount of information, yet is processed as a single unit. We adapt this idea and assume that agent i's estimation of agent j's processing load at time step t is a function of two factors: the number of chunks c_j(t) and the total number s_j(t) of information items being considered by agent j. If c_j(t) and s_j(t) are observable to agent i, agent i can employ the Windowed-HMM approach described in Section 3.2 to model and estimate agent j's instantaneous processing load.
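As an illustration of how agent i might assemble the two observable factors, the sketch below derives (c_j(t), s_j(t)) from a hypothetical per-step list of agent j's pending information items; the item representation and the chunk-by-schema rule are our own assumptions, not prescribed by the model.

```python
# Sketch: build the 2-D observation vector [c_j(t), s_j(t)] that agent i
# would feed to an HMM of agent j's processing load. `inbox` is a
# hypothetical per-step list of agent j's pending information items,
# each tagged with the schema (chunk) it belongs to.

def load_observation(inbox):
    chunks = {item["schema"] for item in inbox}
    return [len(chunks), len(inbox)]  # (c_j(t), s_j(t))

step = [{"schema": "enemy-position"}, {"schema": "enemy-position"},
        {"schema": "terrain"}]
print(load_observation(step))  # -> [2, 3]
```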
For instance, we can use a 5-state HMM to model agent processing load, with multivariate Gaussian observation probability distributions employed for the hidden states.

3.4. HAP's processing load model

As discussed above, a HAP is viewed as a unit when teaming up with other HAPs. The processing load of a HAP can thus be modeled as the co-effect of the processing load of the agent and the cognitive load of the human partner as captured by the agent. Suppose agent A_i has models for its own processing load and its human partner H_i's cognitive load. Denote the agent processing load and human cognitive load of HAP ⟨H_i, A_i⟩ at time step t by l_t^i and m_t^i, respectively. Agent A_i can use l_t^i and m_t^i to estimate the load of ⟨H_i, A_i⟩ as a whole. Similarly, if l_t^j and m_t^j are observable to agent A_i, it can estimate the load of ⟨H_j, A_j⟩. For model simplicity, we can still employ HMMs for HAP processing load, with the estimated hidden states of the corresponding agent processing load and human cognitive load as the input observation vectors.

As revealed by the above discussion, the number of hypothetical hidden states is a critical parameter for modeling both human cognitive load and agent processing load. Below, we take an experimental approach: we collect realistic data to learn HMM-based cognitive load models with varied numbers of hidden states, and investigate the properties of the HMMs that are acceptable for modeling human cognitive load. The study of HMM-based agent processing load models is left for future research.

4. Cognitive task design and data collection

To study the dynamics of cognitive load when humans are working in collaborative settings, we developed a system simulating a dynamic battlefield infosphere. A team can have several team members, each of whom has limited observability (say, covering only a portion of the battlefield).
The goal of a team is to selectively share information among members in a timely manner to develop global situation awareness (e.g. for making critical decisions). Team members share information through a GUI with a shared belief map, as shown in Fig. 3. A shared belief map is a table of color-coded info-cells (cells associated with information). Each row captures the belief model of one
Fig. 3. Shared belief map.
team member, and each column corresponds to a specific information type (all columns together define the boundary of the information space being considered). Thus, info-cell C_ij of a map encodes all the beliefs (instances) of information type j held by agent i. Color coding is applied to each info-cell to indicate the number of information instances held by the corresponding agent. The concept of a shared belief map facilitates the development of global situation awareness. It helps maintain and present to a human partner a synergistic view of the shared mental models evolving within a team. Information types that are semantically related (e.g. by inference rules) can be organized close together in the map. Hence, nearby info-cells can form meaningful plateaus (or contour lines) of similar colors. Colored plateaus indicate those sections of a shared mental model that bear high degrees of overlap. In addition, the perceptible color (hue) differences manifested on a shared belief map indicate the information differences among team members, and hence visually represent the potential information needs of each team member.

We designed a primary task and a secondary task for the human subjects. The primary task of a human subject is to share the right information with the right party at the right time. At every time step (about 15 s), simulated spot reports (situational information) are generated and randomly dispatched to team members. An info-cell on a person's belief map flashes (for 2 s) whenever that person gets new information of the type represented by that cell. The flashed cells are exactly those with newly available information that should be shared with teammates at that time step. An info-cell is frozen (the associated information is no longer sharable) when the next time step arrives. Hence, a human subject has to share the newly available information with other team members under time stress.
To share the information associated with an info-cell, a human subject right-clicks the cell to pop up a context menu and selects the receiving teammate(s) from the menu. Because information is randomly dispatched to team members, the flashed info-cells vary from time to time for each participant, and up to 12 info-cells can be flashed at each time step.

Choosing an appropriate secondary task for the domain problem at hand is not trivial, although the general rationale is that secondary task performance should vary as the difficulty of the primary task increases. Typically, a secondary task requires the human subjects to respond to unpredictable stimuli in either overt (e.g. pressing a button) or covert (e.g. mental counting) ways. Merely for the purpose of estimating a human subject's cognitive load, any artificial task could be used as a secondary task for the subject to perform. In a realistic application, however, we have to make sure that the selected secondary task interacts with the primary task in meaningful ways, which is not easy and often depends on the domain problem at hand. Specific to this study, the secondary task of a human subject is to remember and mark the cells that flashed (not
necessarily in the exact order). Secondary task performance at step t is thus measured as the number of cells marked correctly at t. The more cells marked correctly, the lower the subject's cognitive load. While the experiment is designed in a collaborative setting with a meaningful primary task, we focus here especially on the secondary task performance. We would like to collect realistic data on secondary task performance and use the data to learn and understand the properties of HMM models of human cognitive load.

We randomly recruited 30 human subjects from undergraduate students and randomly formed 10 teams of size three. We ran the simulation system 9 times for each team and collected the secondary task performance of each team member: the number of info-cells marked correctly at each time step. Each run of the experiment had 45 time steps. We thus collected 10 × 3 × 9 = 270 observation sequences of length 45.

5. Learning cognitive load models

5.1. Learning procedure

With the collected observation sequences, we took a two-phase approach: we first applied a window method to learn HMMs from sub-sequences, and then evaluated each learned model against the original 270 observation sequences. Specifically, we assume human cognitive load can be modeled by HMMs with n hypothetical hidden states, where 3 ≤ n ≤ 10. To train an n-state HMM, we applied a window of width n to the original observation sequences to extract sub-sequences as training data. For example, to learn a 5-state HMM, a window of width 5 was used, which produced 270 × 41 = 11,070 training samples. The training samples were then fed to the Baum–Welch algorithm (Rabiner, 1989) to learn HMMs. The training process was terminated upon convergence of its log-likelihood. Given the possibility of convergence at local maxima, we randomly generated initial guesses of the parameters (A, B, π) and repeated the process 100 times for each hypothetical n-state model.
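The window-extraction phase can be sketched as follows. The sequences here are random stand-ins with the experiment's shape (270 sequences of length 45, scores 0 to 9); the real input is the collected secondary task performance data, and the extracted sub-sequences would then be fed to a Baum–Welch implementation.

```python
import numpy as np

# Sketch of the first training phase: slide a width-n window over each
# 45-step observation sequence to extract sub-sequences for Baum-Welch.

def window_samples(sequences, n):
    return [seq[i:i + n] for seq in sequences
            for i in range(len(seq) - n + 1)]

rng = np.random.default_rng(0)
sequences = rng.integers(0, 10, size=(270, 45)).tolist()  # stand-in data
samples = window_samples(sequences, 5)
print(len(samples))  # -> 11070, i.e. 270 x 41, as in the text
```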
Consequently, we obtained 8 model spaces, each containing 100 HMMs with n hidden states (3 ≤ n ≤ 10). The top component of each subfigure in Figs. 4 and 5 plots, in ascending order, the final log-likelihoods of the learned models in the corresponding model space.

In the second phase, for each learned HMM, we used the Forward procedure (Rabiner, 1989) to evaluate its performance by computing the occurrence probabilities (log-likelihoods) of the original 270 observation sequences, which produced 270 log-likelihoods. The middle component of each subfigure in Figs. 4 and 5 plots, for each corresponding model plotted in the top component, the mean of the 270 log-likelihoods resulting from fitting (testing) the model against the original observation sequences. The bottom component plots the standard deviation for each model in fitting.
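The evaluation phase's Forward procedure can be sketched with the standard scaled forward recursion, which returns log P(O | λ) without numerical underflow. The uniform parameters below are illustrative, not a trained model; under them every length-T sequence scores exactly T·log(1/10).

```python
import numpy as np

# Sketch of the Forward procedure used in the evaluation phase: the
# scaled forward recursion returns log P(O | lambda) for one sequence.

def forward_loglik(obs, A, B, pi):
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    log_lik = np.log(c)
    alpha /= c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # predict, then weight by emission
        c = alpha.sum()                 # rescale to avoid underflow
        log_lik += np.log(c)
        alpha /= c
    return log_lik

N, M = 5, 10  # illustrative uniform 5-state, 10-symbol model
A = np.full((N, N), 1.0 / N)
B = np.full((N, M), 1.0 / M)
pi = np.full(N, 1.0 / N)
obs = [9, 8, 9, 7, 9]
print(forward_loglik(obs, A, B, pi))  # -> 5 * log(1/10), about -11.5129
```

Averaging this quantity over the 270 collected sequences yields the per-model mean plotted in the middle component of each subfigure.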
Fig. 4. Each subfigure has top, middle, and bottom components, which plot the log-likelihoods of the models after training, the log-likelihoods in testing, and the standard deviations of the log-likelihoods in testing, respectively. The figure clearly indicates that (1) the maxima of each model space (from 3 to 10 hidden states) form a 3-layer structure; (2) better-trained models lead to better testing log-likelihoods; and (3) better-trained models incur lower deviations. The model spaces also varied: as the number of hidden states increased from 3 to 10, the fraction of models at the middle and bottom levels decreased while the fraction of models at the top level increased.
5.2. The model space of cognitive load

Each subfigure in Figs. 4 and 5 has top, middle, and bottom components, which plot the log-likelihoods of the models after training, the log-likelihoods in testing, and the standard deviations of the log-likelihoods in testing, respectively. The figures clearly indicate that the model spaces with 3 to 10 hidden states share several common properties. First, each model space has a three-layer structure: the log-likelihood maxima are clustered at three levels (models at the middle and bottom levels converged to local maxima). Second, better-trained models performed better in testing: the trend of the log-likelihoods in fitting is consistent with the trend of the log-likelihoods in training (as ordered ascendingly in the top component). Third, better-trained models produced lower deviations in fitting. Also, as the number of hidden states increased from 3 to 10, the fraction of models at the middle and bottom levels decreased while the fraction of models at the top level (converged at global maxima) increased. At the extremes, most of the models in the space of 3-state HMMs are 'bad' models, while most of the models in the space of 10-state HMMs are 'good' models.

5.3. Properties of 'good' cognitive load models

We may wonder whether any properties are shared by the 'good' models appearing at the top layer of each model space. We first examine B, the observation probability distributions. There is strong evidence that the B's of models at the top layer exhibited more distinguishable peaks, compared with those at lower layers, which typically had indistinguishable peaks or mixed distributions. Fig. 6c gives the B of one 5-state model at the top layer. Several statistics can be checked on the model parameter A, the state transition probability distributions. Considering only top-layer models, Table 1 gives the means of the longest hidden state jumps (LHSJ) and the mean fractions of transition pairs with stronger backward jumps (FSB). For example, for HMMs with 5 states, state transitions on average jump no more than 2.1 states, and of all the possible state transition pairs (Aij, Aji) where
[Fig. 5 appears here: four subfigures for 7-, 8-, 9-, and 10-state hidden Markov models, with the same panel layout as Fig. 4.]
Fig. 5. Each subfigure has top, middle, and bottom components, which plot the log-likelihoods of models after training, the log-likelihoods in testing, and the standard deviations of the log-likelihoods in testing. It clearly indicates that (1) the maxima of each model space (from 3 to 10 states) form a three-layer structure; (2) better-trained models lead to better testing log-likelihoods; and (3) better-trained models incur lower deviations. The model spaces also varied: as the number of hidden states increased from 3 to 10, the fraction of models at the middle and bottom levels decreased while the fraction of models at the top level increased.
1 ≤ i < j ≤ 5 and state j represents a higher cognitive load than state i, 66% have stronger backward transitions (Aij < Aji). Across models with 3 to 10 states, the means of LHSJ, ranging from 1.33 to 4.19, are linearly related to the number of states with slope 0.407 ≈ 2/5. Interestingly, all models have relatively more transition pairs with stronger backward jumps. This seems to suggest that humans can more easily recover from a higher cognitive load state than switch to one. For each state and each model, Table 2 gives the mean number of transitions and the mean initial state probability. For each model category except those with 3 states, the highest one (4-6 states) or two (7-10 states) states have far fewer transitions than the other states. This may suggest that humans, once they become "overly" loaded, cannot easily return to cognitively favorable states. The highest state of each model category also has an extremely low initial state probability, while the lower states have relatively higher initial state probabilities. This is intuitively plausible, because humans seldom become overloaded at the beginning. For each model category, Table 2 also indicates a trend: as the hidden state changed from low to high, the mean number of state transitions increased to its maximum, while πi decreased to its minimum (ignoring the highest state). This may indicate that the more active a state is, the less likely a human is to be in that state initially.

5.4. The number of hidden states

A crucial question is: how many hidden states are appropriate for modeling cognitive load with HMMs? Fig. 7a gives the boxplot of likelihoods in fitting for all the models in each space, and Fig. 7b gives the boxplot of likelihoods in fitting for top-layer models only. Fig. 7a shows that the model variance is small enough once the number of hidden states exceeds 4. Fig. 7b shows a linear improvement in the top-layer models as the number of states increases. However, because the observable measure of secondary task performance ranges from 0 to 9, models with too many hidden states may overfit the given data. In addition, the trained models with more than 8 states demonstrated 'strongly-connected' sub-structures. Fig. 8 gives a top-layer model with 10 states, where links on the upper part represent forward transitions and those on the
[Fig. 6 appears here: (a) an example 5-state HMM whose hidden states are labeled negligibly, slightly, fairly, heavily, and overly loaded, with transition probabilities on the arcs; (b) the state transition probability distributions A; (c) the observation probability distributions B over observable states 0-9, one curve per hidden state j = 1, ..., 5.]
Fig. 6. (a) An example 5-state HMM; (b) transition probability distributions A; (c) observation probability distributions B.
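The statistics reported in Table 1 can be computed directly from a transition matrix A such as the one in Fig. 6b. The sketch below uses a hypothetical 5-state matrix; the threshold eps for counting a transition as a 'jump' is our assumption, as the paper does not specify one.

```python
import numpy as np

def lhsj(A, eps=1e-3):
    """Longest hidden state jump: the largest |i - j| over transitions
    whose probability exceeds eps."""
    n = A.shape[0]
    return max(abs(i - j) for i in range(n) for j in range(n) if A[i, j] > eps)

def fsb(A):
    """Fraction of state pairs (i < j) with a stronger backward transition,
    i.e. A[i, j] < A[j, i]."""
    n = A.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(A[i, j] < A[j, i] for i, j in pairs) / len(pairs)

# Hypothetical 5-state transition matrix (rows sum to 1), illustration only.
A = np.array([
    [0.50, 0.30, 0.20, 0.00, 0.00],
    [0.40, 0.35, 0.15, 0.10, 0.00],
    [0.25, 0.30, 0.25, 0.15, 0.05],
    [0.10, 0.25, 0.35, 0.25, 0.05],
    [0.00, 0.10, 0.30, 0.40, 0.20],
])
```

For top-layer models, averaging these two quantities over the 100 learned matrices of a model space yields one column of Table 1.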
Table 1
Means of the longest hidden state jumps (LHSJ) and mean fractions of transition pairs with stronger backward jumps (FSB) for HMMs with 3 to 10 states.

States   3     4     5     6     7     8     9     10
LHSJ     1.33  1.76  2.10  2.56  2.83  3.30  3.82  4.19
FSB      1.00  0.76  0.66  0.62  0.60  0.58  0.56  0.54

LHSJ = 0.0906 + 0.407 × states.
lower part are backward ones (state transition probabilities are visualized as color densities). It is clear that this 10-state model could be reduced to a 5-state model if we view states (1,2,3) and (4,5,6,7) as two compound states. Practically, it is better to pick from the top-layer models with 7 states, which have a mean likelihood of about e^−81, only slightly lower than that of the 10-state models. It is also acceptable to choose from the top-layer models with 5 states; indeed, their performance could be as good as the 7-state models' if the best of the 5-state models were picked. In addition, there is ample evidence suggesting that human cognitive load is a continuous function over time and does not manifest sudden shifts unless there are fundamental changes in task demands. If we place a constraint on the state transition coefficients such that no jumps of more than 3 states are allowed, then models with 5, 6, or 7 states are the best choices (cf. Table 1). Moreover, models with fewer states are more interpretable: for a model like the 5-state one in Fig. 6, it is easier to assign meaningful names to the hidden states. In sum, with all the above factors considered, it seems we can follow the 7 ± 2 rule in choosing the number of hidden states for an HMM-based cognitive load model. To examine the efficacy of this principle, we used 5-state HMMs and conducted a study (Fan & Yen, 2007) involving two team types: ten teams performed with cognitive load estimation available from agents, and ten teams performed without such estimation. As shown in Fig. 9, below the shared belief map there is a load display for each
Table 2
For each state, the mean number of state transitions and the mean of initial state probability πi.

States   1          2          3          4          5          6          7          8          9          10
3        1.00/0.64  2.00/0.31  2.00/0.045
4        1.89/0.44  2.20/0.26  2.75/0.27  1.86/0.037
5        2.32/0.41  2.70/0.20  3.04/0.18  2.98/0.19  1.70/0.028
6        2.87/0.33  3.60/0.18  3.66/0.14  3.65/0.15  3.14/0.17  1.51/0.025
7        3.24/0.29  3.92/0.18  4.40/0.12  4.44/0.11  4.07/0.12  2.68/0.16  1.23/0.019
8        3.52/0.26  4.27/0.15  4.64/0.12  4.92/0.10  5.01/0.09  4.34/0.11  2.56/0.15  1.16/0.017
9        4.09/0.24  4.89/0.12  5.41/0.11  5.60/0.10  5.70/0.09  5.58/0.08  5.12/0.09  2.66/0.16  0.93/0.016
10       4.45/0.21  5.21/0.12  5.62/0.10  5.85/0.09  6.27/0.08  6.37/0.07  6.14/0.08  4.77/0.09  2.61/0.14  0.79/0.015

Cell content: number of state jumps / πi.
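The regression reported under Table 1 (LHSJ = 0.0906 + 0.407 × states) can be reproduced from the tabulated means with an ordinary least-squares fit:

```python
import numpy as np

# LHSJ means from Table 1, for HMMs with 3 to 10 hidden states.
states = np.arange(3, 11)
lhsj_means = np.array([1.33, 1.76, 2.10, 2.56, 2.83, 3.30, 3.82, 4.19])

# Degree-1 polynomial fit: slope and intercept of the regression line.
slope, intercept = np.polyfit(states, lhsj_means, 1)
```

The recovered slope is approximately 0.407 ≈ 2/5, matching the relation discussed in Section 5.3.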
[Fig. 7 appears here: (a) boxplot of log-likelihood in fitting over all models, and (b) boxplot of mean log-likelihood in fitting over top-layer (level 3) models only; x-axis: number of hidden states (3-10); y-axis: log-likelihood, exp(y), from about −87 to −79.]
Fig. 7. Boxplot of model log-likelihoods.
Fig. 8. A 10-state model of cognitive load.
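The reduction suggested by Fig. 8, viewing (1,2,3) and (4,5,6,7) as compound states, amounts to lumping the Markov chain. The sketch below shows one possible aggregation rule (weighted-average rows, summed columns); the uniform weighting and the placeholder transition matrix are our assumptions, as the paper does not specify an aggregation procedure.

```python
import numpy as np

def lump(A, partition, weights=None):
    """Aggregate a transition matrix A over a partition of the state set.
    Outgoing mass to a block is summed; rows within a block are combined
    by a weighted average (uniform weights by default)."""
    n = A.shape[0]
    if weights is None:
        weights = np.ones(n)
    L = np.zeros((len(partition), len(partition)))
    for a, Sa in enumerate(partition):
        Sa = np.asarray(Sa)
        w = weights[Sa] / weights[Sa].sum()
        for b, Sb in enumerate(partition):
            Sb = np.asarray(Sb)
            # row sums into block b, averaged over the members of block a
            L[a, b] = w @ A[np.ix_(Sa, Sb)].sum(axis=1)
    return L

# 10 states (0-indexed); blocks mirror the compound states (1,2,3) and (4,5,6,7).
partition = [[0, 1, 2], [3, 4, 5, 6], [7], [8], [9]]
A10 = np.full((10, 10), 0.1)          # placeholder uniform matrix, not Fig. 8's
A5 = lump(A10, partition)
```

The lumped matrix remains stochastic (each row sums to 1), so the 5-state reduction is itself a valid transition model.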
team member. There are two curves in each display: the blue (dark) curve plots the human's instantaneous cognitive load, and the red curve plots the processing load of the HAP as a whole. If there are n team members, each agent needs to maintain 2n HMM-based models to support the load displays. The human partner of a HAP can adjust her cognitive load at runtime, as well as monitor another HAP's agent processing load and the probability that it will process the information she sends at the current time step. Thus, the more closely a HAP can estimate the actual processing loads of other HAPs, the better information-sharing performance it can achieve. The experimental results indicated that teams with HMM-based load estimation performed statistically significantly better than teams without load estimation: being able to estimate other team members' cognitive loads allows them to share the needed information with the right party at the right time.

Fig. 9. Exploiting HMM-based cognitive load model in teamwork.

6. Discussion and conclusion

In a teamwork context involving both agents and humans, empowering agents with cognitive models of human team members can benefit overall team performance in many ways. For example, although human use of
cognitive simplification mechanisms (e.g. heuristics) does not always lead to errors in judgment, it can lead to predictable biases in responses (Lord & Maher, 1990). Such biases can be greatly alleviated if a human is aided by a cognitive agent that has a trained model of the human's cognitive inclinations. A realistic cognitive model also allows an agent to dynamically adjust its automation level. When its human peer is becoming overloaded, an agent can take over resource-consuming tasks, allowing the human to shift his or her limited cognitive resources to tasks where a human's role is indispensable. When its human peer's cognitive load is relatively low, an agent can take the chance to observe the human's operations in order to refine its cognitive model of the human. Moreover, many studies have documented that human choices and behaviors do not agree with the predictions of rational models. If an agent could establish and use cognitively-inspired computational models in offering services (e.g. decision recommendations), it would be much easier for a human user to understand, trust, and use the agent's services. However, it is extremely difficult to develop cognitive models that are descriptively accurate. For this reason, some simplification mechanisms or assumptions have been adopted in modeling human characteristics (e.g. Horvitz, Kadie, Paek, & Hovel, 2003; Sarne & Grosz, 2007). In this paper, we employed an HMM-based approach, where the HMM model of human cognitive load is trained on data collected in experiments recording the observable task performance of human participants. At a particular moment, an agent uses a weighted approach to "understand" (predict) its human partner's instantaneous cognitive load, with the most recent hidden state having the most significant influence. It is also worth noting that, purely for the purpose of estimating human cognitive load, any artificial task can be used as a secondary task for the subjects to perform.
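One way to realize such a weighted prediction of instantaneous load is by filtering: normalize the forward variables to obtain a posterior over hidden load states and take the posterior-weighted mean of per-state load levels. This is a sketch of that idea, not the paper's exact scheme; the 2-state model and the load scale are hypothetical.

```python
import numpy as np

def state_posterior(pi, A, B, obs):
    """Filtered distribution over hidden states after observing obs
    (forward variables, normalized at each step)."""
    alpha = pi * B[:, obs[0]]
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # predict, then weight by emission
        alpha /= alpha.sum()
    return alpha

def predict_load(pi, A, B, obs, levels):
    """Expected instantaneous cognitive load given the observations."""
    return float(state_posterior(pi, A, B, obs) @ levels)

# Hypothetical 2-state model and load levels, for illustration only.
pi = np.array([0.8, 0.2])
A = np.array([[0.9, 0.1], [0.3, 0.7]])
B = np.array([[0.7, 0.3], [0.2, 0.8]])
levels = np.array([1.0, 5.0])          # e.g. 'low' and 'high' load
load = predict_load(pi, A, B, [0, 1, 1], levels)
```

Because the posterior concentrates on the most recently supported states, the latest observations naturally dominate the estimate.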
Specific to this study, the human subjects were asked to remember and mark the cells flashed on the GUI. In a realistic application, we have to make sure that the selected secondary task is related to the primary task in meaningful ways. This is not easy and often depends on the problem to be studied and the nature of the domain. When the secondary task approach is not an option, cognitive load can also be estimated by using a variety of devices to track users' focus of attention (Hinckley, Pierce, Sinclair, & Horvitz, 2000; Maglio, Matlock, Campbell, Zhai, & Smith, 2000). To sum up, inspired by the human information processing system, we proposed an HMM-based load model for members of human-agent teams. To develop realistic cognitive load models, we conducted a cognitive experiment to capture humans' observable secondary task performance and used the data to train hidden Markov models (HMMs) with varied numbers of hypothetical hidden states. The results indicate that each model space has a three-layer structure, and it is suggested to choose models with 7 ± 2 hidden states. With all the constraints considered, HMMs with 5, 6, or 7 states are recommended as good choices for modeling human cognitive load. Statistical analysis revealed that good models also share some common properties: (1) the observation probability distributions have distinguishable peaks for different states; (2) the highest hidden states have extremely low initial state probabilities; and (3) the longest state jumps are linearly related to the number of states with a slope of about 2/5, and there are more transition pairs with stronger backward jumps. These findings can guide the selection of HMM-based cognitive load models. For future research along this line, it is worthwhile to investigate the properties of HMM-based agent processing models and HAP processing models and, in the long run, to fully understand how shared cognitive structures can enhance human-agent team performance.

References

Baddeley, A. D. (1992). Working memory. Science, 255, 556–559.

Bradshaw, J., Sierhuis, M., Acquisti, A., Gawdiak, Y., Prescott, D., Jeffers, R., et al. (2002). What we can learn about human-agent teamwork from practice. In Workshop on teamwork and coalition formation at AAMAS 02. Bologna, Italy.

Cohen, P. R., & Levesque, H. J. (1991). Teamwork. Nous, 25(4), 487–512.

Fan, X., Sun, B., Sun, S., McNeese, M., & Yen, J. (2006). RPD-enabled agents teaming with humans for multi-context decision making. In Proceedings of the fifth international joint conference on autonomous agents and multiagent systems (pp. 34–41). ACM Press.

Fan, X., & Yen, J. (2007). Realistic cognitive load modeling for enhancing shared mental models in human-agent collaboration. In Proceedings of the sixth international joint conference on autonomous agents and multiagent systems (pp. 383–390). ACM Press.

Fan, X., Yen, J., & Volz, R. A. (2005). A theoretical framework on proactive information exchange in agent teamwork. Artificial Intelligence, 169, 23–97.

Grosz, B., & Kraus, S. (1996). Collaborative plans for complex group actions.
Artificial Intelligence, 86, 269–358.

Hinckley, K., Pierce, J., Sinclair, M., & Horvitz, E. (2000). Sensing techniques for mobile interaction. In UIST '00: Proceedings of the 13th annual ACM symposium on user interface software and technology (pp. 91–100). New York, NY, USA: ACM.

Horvitz, E., Kadie, C., Paek, T., & Hovel, D. (2003). Models of attention in computing and communication: From principles to applications. Communications of the ACM, 46(3), 52–59.

Klimoski, R., & Mohammed, S. (1994). Team mental model: Construct or metaphor? Journal of Management, 20(2), 403–437.

Lang, A. (2000). The limited capacity model of mediated message processing. Journal of Communication, Winter, 46–70.

Lord, R. G., & Maher, K. J. (1990). Alternative information processing models and their implications for theory, research, and practice. Academy of Management Review, 15(1), 9–28.

Maglio, P. P., Matlock, T., Campbell, C. S., Zhai, S., & Smith, B. A. (2000). Gaze and speech in attentive user interfaces. In Advances in multimodal interfaces – ICMI 2000 (pp. 1–7). Berlin: Springer.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97.

Norling, E. (2004). Folk psychology for human modelling: Extending the BDI paradigm. In AAMAS '04: International conference on autonomous agents and multi agent systems (pp. 202–209).

Paas, F., & Merrienboer, J. V. (1993). The efficiency of instructional conditions: An approach to combine mental-effort and performance measures. Human Factors, 35, 737–743.

Paas, F., Tuovinen, J. E., Tabbers, H., & Gerven, P. W. M. V. (2003). Cognitive load measurement as a means to advance cognitive load theory. Educational Psychologist, 38(1), 63–71.

Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77, 257–286.

Rouse, W., Cannon-Bowers, J., & Salas, E. (1992). The role of mental models in team performance in complex systems. IEEE Transactions on Systems, Man, and Cybernetics, 22(6), 1296–1308.

Sarne, D., & Grosz, B. J. (2007). Sharing experiences to learn user characteristics in dynamic environments with sparse data. In AAMAS '07: Proceedings of the 6th international joint conference on autonomous agents and multiagent systems (pp. 202–209).

Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12, 257–285.

Xie, B., & Salvendy, G. (2000). Prediction of mental workload in single and multiple task environments. International Journal of Cognitive Ergonomics, 4, 213–242.