Statistical modeling for visualization evaluation through data fusion

Xiaoyu Chen, Ran Jin*

Laboratory of Data Science and Visualization, Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA 24061-0001, USA

* Corresponding author. E-mail address: [email protected] (R. Jin).

Article history: Received 15 June 2016; Received in revised form 19 December 2016; Accepted 22 December 2016; Available online xxx.

Abstract

There is high demand for data visualization that provides insights to users in various applications. However, a consistent, online visualization evaluation method to quantify mental workload or user preference is lacking, which leads to an inefficient visualization and user interface design process. Recently, advances in interactive and sensing technologies have made electroencephalogram (EEG) signals, eye movements, and visualization logs available for user-centered evaluation. This paper proposes a data fusion model and an application procedure for quantitative and online visualization evaluation. Fifteen participants joined the study, which was based on three different visualization designs. The results provide a regularized regression model that can accurately predict the user's evaluation of task complexity, and indicate the significance of all three types of sensing data for visualization evaluation. This model can be widely applied to data visualization evaluation, as well as to the evaluation and data analysis of other user-centered designs in human factors and ergonomics. © 2016 Elsevier Ltd. All rights reserved.

Keywords: Data fusion; Data visualization; Electroencephalogram (EEG); Eye tracking; User-centered designs; Visualization evaluation

1. Introduction

The increasing volume and variety of data make it a computationally and cognitively challenging task to obtain valuable information and insights from big data (Michael and Miller, 2013; Tien, 2003). How to deliver the right information to the right people at the right time in a specific scenario remains an open question (Keim et al., 2008). To address these issues, user-centered data visualization can graphically display raw data to help users rapidly narrow a large scope down to segments of interest and effectively obtain insights from the data (Fekete et al., 2008). Many data visualization techniques have been developed to reduce mental workload and to enhance users' perception of insights from data. Indeed, different visualization approaches should be selected according to the objectives. In addition to static visualizations (DeLamarter, 1986; Rohrer and Swing, 1997), a large number of interactive designs have been proposed to visualize high-dimensional, large-sample, and dynamic data sets (Carlis and Konstan, 1998; Van Ham and Van Wijk, 2004; Wills, 1997). In the past twenty years, Virtual Reality (VR) has also provided a platform to immerse users in data visualization (Bryson, 1996), while its counterpart, Augmented Reality (AR), mixes the real world with visualization designs to present graphical visualizations (Azuma, 1997). These new VR and AR based visualization methods have been widely used in many applications, such as medical science (Bichlmeier et al., 2007; Hansen et al., 2010), education (Billinghurst, 2002; Kaufmann, 2003), and industry (Doil et al., 2003; Nee et al., 2012). Data analytics results have also been integrated with an AR platform for manufacturing applications (Chen et al., 2016).

Despite the tremendous need for new visualization methods and platforms in a wide range of applications, researchers and practitioners have long identified the need to evaluate visualization tools to understand their potential and limitations (Lam et al., 2011; Plaisant, 2004). In the literature, many visualization evaluation methods have been proposed in Human-Computer Interaction (HCI). Field observations (Isenberg et al., 2008) and laboratory observations (Hilbert and Redmiles, 2000) are popular approaches that assess visualization tools through an understanding of their usage environments. In addition, heuristic evaluation, formative usability tests, and summative evaluation methods have been reported for evaluating user-centered designs (Hix et al., 1999). To evaluate the insights provided, a longitudinal study of visual analytics was developed to capture the entire visualization and analysis process (Saraiya et al., 2006). Furthermore, the controlled experiment approach has been used in evaluation study design to provide objective measures (Willett et al., 2007). Although various evaluation approaches have been presented, they still show limitations in three aspects: (1) the lack of unobtrusive data collection during users' operations in visualization evaluation, (2) the lack of consistent standards for evaluating different visualization tools, and (3) the lack of online data analytics for visualization evaluation in a timely manner.

In order to address the aforementioned limitations, various types of sensing techniques and the corresponding data analysis are considered. In Human Factors and Ergonomics (HFE), a typical unobtrusive and quantitative approach is to use sensors, such as EEG devices, eye tracking systems, and motion tracking systems, to assess mental workload. A similar strategy was used in this paper. EEG signals are highly sensitive to variations in mental workload and have been applied to mental workload assessment in several situations. Okogbaa et al. (1994) quantified the relationship between EEG signals and white-collar workers' mental fatigue for human reliability prediction. Drivers' drowsiness, which is closely related to mental fatigue, was studied by analyzing changes in the EEG α, β, β/α, and (α + θ)/β indices across different driving periods (Åkerstedt and Torsvall, 1984; De Waard and Brookhuis, 1991; Eoh et al., 2005). EEG spectral features were extracted to assess working memory load while participants performed tasks of different difficulty (Gevins et al., 1998; Murata, 2005). As another powerful tool, the use of eye movements has been established in psychology as a means of analyzing user attention patterns in information processing tasks (Rayner, 2012). In recent years, eye movements have found many applications in HFE research. For example, Dehais et al. (2012) used eye tracking as an index of attentional focus to assess cognitive conflicts in human-automation interactions. In the military, Lin et al. (2014) addressed the significant influence of camouflage patterns on eye-movement behavior and evaluated the effects of different camouflage designs. To facilitate user experience design, eye tracking techniques have been applied by many researchers and companies to understand, design, and evaluate user experience (Bergstrom and Schall, 2014). Because users' characteristics, such as cognitive abilities and personalities, affect the effectiveness of information visualization, eye movements have been used to evaluate traditional information visualization techniques such as tree diagrams (Burch et al., 2011) and box plots (Toker et al., 2013). In HFE research, motion tracking systems have also been widely adopted. For instance, motion tracking was used for interactive work design in occupational ergonomics (Ma et al., 2010). Motion tracking systems have also been integrated into virtual environments to assess work designs and to predict real-world ergonomic measurements (Hu et al., 2011; Rajan et al., 1999). Furthermore, a mouse motion tracking system was used to measure the stress condition (e.g., frustration, difficulty) of users during interaction with interfaces (Sun et al., 2014), which indicates its potential for evaluating information visualization.

Despite the successful applications of individual sensing technologies in HFE, the integration of different types of sensor data has not yet been explored and reported. Such an integration may yield a more accurate, online user-centered evaluation. Moreover, the contribution of each type of data to user-centered evaluation cannot be assessed without a data fusion framework such as the one studied in this paper.
This is an important issue, since it helps one understand how to select the right sensing devices for cost-effective and unobtrusive user-centered evaluation. Our motivation is to bridge the gap between the high demand for new data visualization tools and the low efficiency of user-centered evaluation processes by proposing a data fusion model for quantitative and online visualization evaluation. A wireless EEG device, a remote eye tracker, and a logging system (a mouse motion tracking system newly developed in this paper) are integrated to achieve this objective. The synchronized data are correlated with participants' subjective ratings of task complexity (i.e., visualization evaluation scores). These three types of data can be obtained online while the user is interacting with the visualization designs, and can be further used to predict the evaluation scores online before the post-evaluation survey is performed.

The rest of the paper is organized as follows. In Section 2, the study design is introduced. In Section 3, the results of the statistical analysis are presented. The findings are discussed in Section 4. An application guideline for the proposed approach in HFE is discussed in Section 5. The limitations of the data fusion model are presented in Section 6. Finally, we draw conclusions and summarize future work in Section 7.

2. Study design

An overview of the study design is shown in Fig. 1. The objective of the study design is to identify whether there is a strong correlation between the three types of sensing data and the subjective evaluation scores. In particular, three data sets of last and first names in a hierarchical structure were used to generate three hierarchical visualization designs. These designs were visualized and evaluated by 15 participants. During the evaluation, participants performed a free exploration task and 11 predefined tasks, which mainly involved searching for information in the graphic designs with simple calculations. The experiment followed an Institutional Review Board (IRB) approved procedure. For each participant, EEG signals, eye movements, and visualization logs were collected. Features were extracted and used to predict the evaluation scores that the participants provided for each task during the experiment. Eight models were estimated to unveil the correlation between the sensing data and the evaluation scores.

2.1. Hierarchical data sets and visualization designs

Three arrays of 252 English names, including last and first names, were randomly generated from a name database (Alan, 2011). Motivated by a data visualization example, Flare (UC Berkeley Visualization Lab, 2008), a hierarchical data structure was generated. Three data sets were automatically generated by randomly mapping the full-name arrays to the data structure. These settings were designed to exclude the effects of different data types and structures, and to reduce learning effects on the participants' performance. In this study, three visualization designs from the open-source visualization library D3 were used to visualize the data sets (Bostock, 2011). Fig. 2(a) presents the static node-link tree, in which full names and hierarchical relationships were mapped to circles, texts, and edges, respectively. The static node-link tree presents all semantic information at once, without any interaction. Fig. 2(b) shows an interactive version of the static node-link tree: after a click on a node, the corresponding branch is expanded or collapsed, which provides users with filtered information. Fig. 2(c) implements a different layout of nested circles (Wang et al., 2006). Nodes sharing the same parent are mapped to the same circle, and names are shown at the center of each circle. Hierarchical levels are represented by different levels of packing and interactive fields of view: by clicking on a circle of interest, the circle is zoomed in or out with more detail on the children circles inside it. The three visualizations were coded in HTML and JavaScript, and each of them used the web browser as the platform. Moreover, the diagrams shared the same screen with a resolution of 1920 × 1200.


Fig. 1. An overview of the study design.

Fig. 2. Three visualization designs of the hierarchical full name data sets: (a) static node-link tree, (b) collapsible node-link tree, and (c) zoomable pack layout.

2.2. Participants

Fifteen participants were recruited from the population of students and faculty members at Virginia Tech to participate in the visualization evaluation study. The sample size estimation was based on a calculation of the degrees of freedom in a linear regression model when ignoring the variable selection technique. Since there were 460 features in total, the sample size was expected to be larger than 460. Given three designs and 11 predefined tasks per design (excluding the first free exploration task, which was used for personalized standardization), at least 14 participants were needed. In this way, the sample size was 462 (3 designs × 11 tasks × 14 participants), slightly larger than the number of features. Participant No. 4 had missing eye movement data and was therefore removed from the samples for data analysis. There were nine male and five female participants, with an average age of 28.5 years (standard deviation 3.38); ages ranged from 24 to 35 years. Among the 14 participants, four wore glasses and three wore contact lenses. Although all participants came from colleges, there were substantial differences in their backgrounds: the 14 participants came from 13 departments with different levels of experience in data visualization, EEG devices, and eye tracking systems. Nine participants had experience with data visualization, three with eye tracking systems, and four with EEG devices.

2.3. Apparatus

The apparatus of the experiment consisted of one desktop computer with two monitors (Dell®, 1920 × 1200 resolution).


Monitor 1 presented the visualization designs to the participants, while Monitor 2 visualized the participants' EEG signals and eye movements. Participants were only allowed to look at Monitor 1 during the user study, while the experimenter used Monitor 2 to oversee the experiment. In addition, each participant was instrumented with an ABM® B-Alert 10 wireless EEG headset (256 Hz, 10 channels), an SMI® REDn remote eye tracker (20 Hz), and a computer-based logging software system newly developed in this research. To synchronize the EEG signals and the eye movements, data from the EEG headset and the eye tracking system were collected and synchronized in an ABM® external synchronization unit (ESU), which transmitted the data streams to the desktop with ESU time stamps and system time stamps. The system time stamps were used to synchronize the eye movements and the logs. The experimental setup is shown in Fig. 3.

2.4. User tasks

Twelve user tasks were planned in total, as presented in Table 1, divided into two groups: a free exploration task and 11 predefined tasks. Free exploration was designed for participants to learn the data sets, visualization structure, and interaction policy, reducing the learning effect to a certain degree. The data collected during this task served as the baseline for personalized features. After the free exploration, participants were assigned 11 predefined sustained visual tasks of the following three types: (1) find the nodes with given names (six tasks), (2) find the direct parent of nodes with given names (two tasks), and (3) count the total number of children of given names (three tasks). Tasks could be classified into three levels of difficulty according to the levels of the given nodes. Where a task required it, answers were typed into a textbox on the visualization website. During the experiment, the mouse was allowed at all times, while the keyboard was only allowed for typing the subjective task complexity score, or task answers (Tasks 9–11), after the participants finished each task. Data recorded while participants were using the keyboard were identified from the logs and excluded from the modeling step.

2.5. Experimental design and procedure

To identify the correlation between the sensing data and the evaluation scores, data collected from the wireless EEG device, the remote eye tracker, and the logging system were used to model and predict the evaluation scores. The scores can be attributed to three components, namely, the inherent task complexity, the design of the visualization, and the participant's perception and cognition when performing the task. Therefore, tasks and participants, which capture the inherent task complexity and the participants' perception and cognition respectively, were treated as blocks in the analysis to study the performance of the different designs. To compare the visualization designs, the null hypotheses were: design m is not better than design n, where m, n = 1, 2, 3 and m ≠ n. Since multiple comparisons would be made, one-tailed pairwise t tests with Bonferroni correction (Bender and Lange, 2001) were applied. Meanwhile, in order to validate the model, the same tests were applied to the performance based on the predicted evaluation scores. During the experiment, all participants were presented with the three visualization designs one after another. The order of the three designs was randomized by applying a Latin square (Bradley, 1958) to alleviate the learning effects in this within-subjects design.

Fig. 3. Experimental setup: Participants were instrumented with a 10-channel wireless EEG headset (256 Hz), a remote eye tracker (20 Hz), and an event-driven logging system. An ESU device was used to collect and synchronize the EEG signals and eye movements with ESU time stamps and with the system time stamps of the workstation. The synchronized data, the logs of the visualization layout parameters, and the browsing behaviors were recorded with system time stamps by the workstation and were further synchronized by aligning the system time stamps. In this figure, different data streams are presented as arrows (wireless signs) in different colors with texts in italics. The three visualization designs, with 12 tasks each in different orders, were assigned to the participants and displayed on Monitor 1. Meanwhile, the investigator monitored the EEG signals, eye movements, and logs on Monitor 2. When performing the tasks, participants were allowed to use the mouse at all times, while the keyboard was only allowed for typing the subjective task complexity score, or task answers (Tasks 9–11), after the participants finished each task.


Table 1
Task design.

Task 1 (Free Exploration): Explore the visualization design in three minutes.
Task 2 (Predefined, Type 1): Find a given name^a in level 1.^b
Task 3 (Predefined, Type 1): Find a given name in level 1.
Task 4 (Predefined, Type 1): Find a given name in level 2.
Task 5 (Predefined, Type 1): Find a given name in level 2.
Task 6 (Predefined, Type 1): Find a given name in level 3.
Task 7 (Predefined, Type 1): Find a given name in level 3.
Task 8 (Predefined, Type 2): Find a given name in level 2 and click on the direct parent.^c
Task 9 (Predefined, Type 2): Find a given name in level 3 and click on the direct parent.
Task 10 (Predefined, Type 3): Count the total number of children of two given names in level 1.
Task 11 (Predefined, Type 3): Count the total number of children of two given names in level 2.
Task 12 (Predefined, Type 3): Count the total number of children of two given names in level 3.

a Every given name differs from the others.
b Level is defined by the distance between the node and the root in a hierarchical data set.
c The direct parent node is defined as the directly linked node in the higher level.

Within each design, participants were assigned 12 tasks. The free exploration task was always assigned first, while the other 11 tasks were given in random order.

2.6. Data collection

While performing the user tasks in each session, participants were instrumented with the EEG device and the remote eye tracker, which recorded their EEG signals and eye movements. In addition, their visualization logs, such as mouse movements and mouse events with system time stamps, were recorded by the logging system. The data format is presented in Table 2. To synchronize the three types of data, an ESU was used for temporal alignment. The ESU used the time stamps from the logging system in the workstation and applied the same time stamps to both the EEG signals and the eye movements. Hence, there is no time delay among the three types of data.
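As a minimal sketch of this alignment step (not the authors' implementation), the Python snippet below merges three mock data streams on a shared system time stamp with pandas; the column names, values, and the nearest-neighbor matching policy are illustrative assumptions.

```python
# Sketch of temporal alignment across the three data streams, assuming each
# stream is a DataFrame with a shared system time stamp column "t" (seconds).
import pandas as pd

eeg = pd.DataFrame({"t": [0.000, 0.004, 0.008], "fz": [1.2, 1.3, 1.1]})   # 256 Hz
eye = pd.DataFrame({"t": [0.000, 0.050, 0.100],
                    "x": [640, 652, 660], "y": [400, 398, 395]})          # 20 Hz
logs = pd.DataFrame({"t": [0.010, 0.090], "event": ["move_over", "click"]})

# Align the slower streams to the EEG clock by nearest time stamp.
merged = pd.merge_asof(eeg.sort_values("t"), eye.sort_values("t"),
                       on="t", direction="nearest")
merged = pd.merge_asof(merged, logs.sort_values("t"),
                       on="t", direction="nearest")
print(merged)
```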

2.7. Feature extraction

The three types of sensing data were measured in functional data formats, and features were extracted for dimension reduction and temporal pattern representation. A summary of the features is presented in Table 3.

2.7.1. Features from EEG signals

A pass-band filter using the inverse Fast Fourier Transform (iFFT) was applied to the ten channels of the EEG signals to obtain five bands in the time domain (Doyle et al., 1974): (1) Delta (1–4 Hz), (2) Theta (4–8 Hz), (3) Alpha (8–13 Hz), (4) Beta (13–30 Hz), and (5) Gamma (31–40 Hz). To reduce the high between-participant variability in EEG signals, this study applied a personalized feature standardization approach (Wang et al., 2016) by subtracting each participant's baseline data from the free exploration task and scaling each band to [0, 1]. The lower and upper bounds were determined from a box plot of the distribution as follows (Wang et al., 2016),

s_scaled = (s_raw − B_l) / (B_u − B_l),    (1)

where B_l = max(min(s_raw), lower quartile − 1.5 × interquartile range) and B_u = min(max(s_raw), upper quartile + 1.5 × interquartile range). Hence, the EEG signals for different frequency bands and different participants are scaled to the same range without loss of statistical, morphological, and time-frequency differences. Afterwards, three types of EEG features were extracted as follows.

(1) Statistical features: The mean, standard deviation, energy, entropy, kurtosis, and skewness of the scaled EEG signals were used as statistical features.

(2) Morphological features: Two morphological features were extracted from each frequency band.
- Line length: the total length of the curve, i.e., the sum of the distances between successive points, computed as L = Σ_{k=2..N} |s_scaled(k−1) − s_scaled(k)|. Line length has been shown to be an efficient feature for classification problems (Esteller et al., 2001) and was used in this study.
- Mean of the nonlinear energy operator: nonlinear energy operators are frequently used to segment long-term EEG signals and to emphasize the combined amplitude and frequency content of the EEG (Agarwal and Gotman, 1999). In this study, the mean of the nonlinear energy, computed as (1/(N−2)) Σ_{k=2..N−1} [s_scaled(k)² − s_scaled(k−1) · s_scaled(k+1)], was used as a feature reflecting signal change in both the time and frequency domains.

(3) Time-frequency features: Wang et al. (2016) reported that EEG signals can be used to assess memory workload levels and identified wavelet entropy as one of the significant features for classifying four levels of workload. In this study, a four-level Discrete Wavelet Transform (DWT) (Lin et al., 2010) using a Daubechies wavelet was applied to each scaled EEG signal, yielding five levels of wavelet coefficients C_j(k), where j = 1, 2, …, 5. Afterwards, the relative wavelet energy, computed as p_j = E_j / E_tot, was applied to the five levels of wavelet coefficients, where E_j = ||r_j||² = Σ_k C_j(k)² and E_tot = ||S||² = Σ_{j=1..5} Σ_{k=1..K} C_j(k)², with K the length of the decomposed signals. The total wavelet entropy is then defined as S_WT ≡ S_WT(p) = −Σ_{j=1..5} p_j ln p_j. The wavelet entropy measures the degree of order or disorder of the signals, and therefore provides information about the underlying dynamical process associated with the signal (Rosso et al., 2001).
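The snippet below sketches this per-band feature extraction in Python for one filtered channel, under stated assumptions: the scaling follows Eq. (1) with box-plot bounds taken from the free-exploration baseline, the histogram-based entropy estimator and the "db4" Daubechies order are illustrative choices the paper does not specify, and the function names are hypothetical.

```python
import numpy as np
import pywt                      # PyWavelets
from scipy.stats import kurtosis, skew

def scale_signal(raw, baseline):
    # Eq. (1): box-plot bounds from the participant's free-exploration baseline.
    q1, q3 = np.percentile(baseline, [25, 75])
    iqr = q3 - q1
    b_lo = max(baseline.min(), q1 - 1.5 * iqr)
    b_hi = min(baseline.max(), q3 + 1.5 * iqr)
    return (raw - b_lo) / (b_hi - b_lo)

def band_features(s, bins=32):
    # Six statistical features; the histogram-based Shannon entropy is one
    # plausible estimator (assumption: the paper does not spell this out).
    p, _ = np.histogram(s, bins=bins)
    p = p[p > 0] / p.sum()
    feats = {
        "mean": s.mean(), "std": s.std(ddof=1),
        "energy": np.sum(s**2), "entropy": -np.sum(p * np.log(p)),
        "kurtosis": kurtosis(s), "skewness": skew(s),
        # Morphological features: line length and mean nonlinear energy.
        "line_length": np.sum(np.abs(np.diff(s))),
        "mean_nonlinear_energy": np.mean(s[1:-1]**2 - s[:-2] * s[2:]),
    }
    # Time-frequency feature: total wavelet entropy from a 4-level DWT,
    # which yields five coefficient levels; "db4" is an assumed order.
    coeffs = pywt.wavedec(s, "db4", level=4)
    e = np.array([np.sum(c**2) for c in coeffs])
    pj = e / e.sum()
    feats["wavelet_entropy"] = -np.sum(pj * np.log(pj))
    return feats

rng = np.random.default_rng(0)
baseline, task = rng.normal(size=2560), rng.normal(size=2560)
print(band_features(scale_signal(task, baseline)))
```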

Table 2
Data structure from the EEG device, the eye tracker and the logging system.

EEG Device: 10 channels,^a time stamps
Eye Tracker: trajectory, time stamps
Logging System: trajectory, mouse events,^b task start and end time, graph layout parameters, time stamps

a The 10 EEG channels include Electrocardiography (ECG), Fz, F3, F4, Cz, C3, C4, POz, P3, P4.
b Mouse events include scrolls, clicks, move overs and move outs of primitives.


Table 3
Summary of feature extraction.

Data: EEG Signals^a
Features: Stat. features (6),^b morphological features (2), time-freq. features (1)
Number: 450

Data: Eye Movements
Features: AOIs related features (2), distance related features (2)
Number: 4

Data: Logs
Features: AOIs related features (2), distance related features (2), task features (2)
Number: 6

a Features of the EEG signals were extracted from the Delta, Theta, Alpha, Beta, and Gamma bands (5 bands) of the ECG, Fz, F3, F4, Cz, C3, C4, POz, P3, P4 channels (10 channels), respectively. Hence, in total 10 channels × 5 bands × (6 statistical features + 2 morphological features + 1 time-frequency feature) = 450 features were extracted from the EEG signals.
b The number in brackets is the count of features in the corresponding group.


2.7.2. Eye movement features

Since two of the three visualization designs in this study were interactive, static approaches using predefined or post-defined AOIs alone would not be effective for modeling the interactions in the collapsible node-link tree and the zoomable pack layout. To dynamically map the eye movements to the AOIs, the logging system was used to record every layout-changing event, its parameters, and the corresponding system time stamps. The eye movements were registered to the complete layout based on the participants' operations. For each visualization design, the primitives' bounding boxes were defined as the AOIs, as shown in Fig. 4. Specifically, the AOIs were defined as the rectangular bounds of the displayed nodes in the static and collapsible node-link trees, and as circles in the zoomable pack layout design. Afterwards, two types of eye movement features were extracted as follows.

(1) AOIs related features: The number of eye hits within each AOI was counted, which is proportional to the dwell time in each AOI since the eye tracker had a fixed sampling rate. Hence, a heat map of time durations was obtained for each task, as shown in Fig. 4, where colors represent the standardized time durations in the AOIs. The mean and the standard deviation of the time durations within the AOIs for a task were extracted from the heat map as features.

(2) Distance related features: Reading path distances between two adjacent samples were calculated as a measurement of the reading path. The average distance and the distance variability of a disordered reading path are expected to be larger than those of an ordered reading path. In this study, the mean and the standard deviation of the eye movement distances between primitives were selected as two features for each task.
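A minimal sketch of these two feature groups, assuming gaze samples arrive as screen coordinates and AOIs as rectangular bounding boxes (the circular AOIs of the pack layout would need a point-in-circle test instead); the names and data are illustrative.

```python
import numpy as np

def eye_features(gaze, aois, sampling_rate=20.0):
    """gaze: (N, 2) screen coordinates; aois: id -> (x0, y0, x1, y1)."""
    # AOI-related: hit counts per AOI are proportional to dwell time
    # because the eye tracker samples at a fixed rate.
    durations = []
    for (x0, y0, x1, y1) in aois.values():
        hits = np.sum((gaze[:, 0] >= x0) & (gaze[:, 0] <= x1) &
                      (gaze[:, 1] >= y0) & (gaze[:, 1] <= y1))
        durations.append(hits / sampling_rate)
    durations = np.asarray(durations)
    # Distance-related: point-to-point reading-path step lengths.
    steps = np.linalg.norm(np.diff(gaze, axis=0), axis=1)
    return {"aoi_dur_mean": durations.mean(),
            "aoi_dur_std": durations.std(ddof=1),
            "path_mean": steps.mean(), "path_std": steps.std(ddof=1)}

rng = np.random.default_rng(1)
gaze = rng.uniform(0, 1920, size=(200, 2))
aois = {"root": (900, 40, 1020, 80), "child": (100, 300, 240, 340)}
print(eye_features(gaze, aois))
```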

2.7.3. Log features

The logging system served as a browsing behavior recorder. The mouse movements, mouse events (scrolls, clicks, move overs and move outs of primitives), dynamic layout parameters, task start and end time stamps, and time stamps of the logs during the tasks were recorded. The logs provided both temporal and spatial alignment for the eye movements and mouse movements. Six features in total were extracted from the logs.

(1) AOIs related features: Similar to the AOIs related features for the eye movements, the mean and the standard deviation of the mouse fixation time on AOIs during each task were used as two features, since they reflect participants' subconscious actions when reading. Moreover, the counts of mouse event pairs during each task were used to reflect the number of operations.

(2) Distance related features: Similar to the distance related features of the eye movements, the mean and the standard deviation of the mouse movement distances were used.

(3) Task features: The time consumed by a task was used as a feature for each task.

Fig. 4. AOIs for (a) the static node-link tree, (b) the collapsible node-link tree, and (c) the zoomable pack layout, from left to right. Colors represent the standardized time duration of eye movements in the AOIs for Participant No. 5. White color in an AOI indicates no dwell time.


2.8. A regularized linear regression model

To model the relationship among the EEG signals, the eye movements, the logs, and the evaluation scores from the three visualization designs, a linear regression model was used to unveil their correlation. To estimate this model, each combination of a task from the three visualization designs and a participant was considered as one sample. There are likely to be more sensing data features used as predictors than samples when training the model. Therefore, a Lasso variable selection approach was adopted to identify a compact set of significant predictors in the model (Tibshirani, 1996). Denote the data for the linear regression model as (X, Y), where X is the n × p matrix of p features and n samples, and Y is the vector of responses, i.e., the evaluation scores for the tasks, ranging from 1 (the lowest complexity) to 10 (the highest complexity). The relationship between X and Y can be modeled as Y = Xβ + ε, where β is the vector of model parameters and ε ~ N(0, σ²) is the model error, assumed independently and identically normally distributed with mean 0 and variance σ². The Lasso estimate of the model parameters can be formulated as (Tibshirani, 1996),

β̂_lasso = argmin_β (1/2) Σ_{i=1..n} (y_i − β_0 − Σ_{j=1..p} x_ij β_j)² + λ Σ_{j=1..p} |β_j|,    (2)

where λ is a tuning parameter. In this study, λ was selected by the Bayesian information criterion (BIC) based on the training data (Friedman et al., 2010).
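As an illustration of this estimation step (a sketch, not the authors' code), scikit-learn's LassoLarsIC fits the Lasso path and selects the tuning parameter by BIC, mirroring Eq. (2) with BIC-based tuning; the feature matrix and scores below are synthetic stand-ins.

```python
# Lasso with the tuning parameter chosen by BIC, on synthetic data sized
# like the study (462 samples, 460 features).
import numpy as np
from sklearn.linear_model import LassoLarsIC

rng = np.random.default_rng(0)
n, p = 462, 460
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:10] = rng.normal(size=10)          # a few "true" predictors
y = X @ beta + rng.normal(scale=0.5, size=n)

model = LassoLarsIC(criterion="bic").fit(X, y)
selected = np.flatnonzero(model.coef_)   # predictors with nonzero weights
print(f"lambda = {model.alpha_:.4f}, {selected.size} features selected")
```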


The regularized linear regression approach was applied to the collected data with different feature sets in eight models:

(1) Model 1 (M1): EEG signal features;
(2) Model 2 (M2): eye movement features;
(3) Model 3 (M3): log features;
(4) Model 4 (M4): EEG signal and eye movement features;
(5) Model 5 (M5): eye movement and log features;
(6) Model 6 (M6): EEG signal and log features;
(7) Model 7 (M7): EEG signal, eye movement, and log features without interactions;
(8) Model 8 (M8): EEG signal, eye movement, and log features with two-way interactions between the feature groups, i.e., all two-way interactions of one variable from one feature group with another variable from another feature group.

These eight models will be compared and discussed in terms of prediction performance, variable selection results, and statistical test results.

3. Results

In the study, 14 participants were assigned 11 predefined tasks to evaluate the three visualization designs, giving 462 samples in total. The free exploration tasks were used as a personalized baseline and thus were not used again in the modeling step. In total, 460 features were extracted as predictors from the raw data. If a model considers the two-way interactions of any two predictors from different feature groups, an additional 4524 interaction terms are added to the full model before variable selection. Ten-fold Cross Validation (CV) with random partitioning was used to evaluate the prediction performance for the visualization evaluation and to identify the significant predictors (Hastie et al., 2005). To reduce the dimensionality and select significant features, BIC was applied to balance model prediction performance against model complexity, and the model with the lowest BIC score was selected. The average lowest BIC scores over the 10 folds for Models 1–8 were 226.43 (SE = 4.77), 362.80 (SE = 2.07), 298.34 (SE = 2.00), 236.49 (SE = 3.39), 269.12 (SE = 2.94), 212.63 (SE = 6.05), 213.68 (SE = 4.06), and 195.51 (SE = 8.26), respectively. Root Mean Square Errors (RMSEs) were used to evaluate the prediction performance of the eight models. Fig. 5 summarizes the average and the box plot of the RMSEs of the testing samples over the ten folds, as well as the R² of the selected model based on the training samples over the ten folds.
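A sketch of this ten-fold CV protocol under the same synthetic-data assumption as the previous snippet; fold-level RMSEs and their standard error are computed as reported in the text.

```python
import numpy as np
from sklearn.linear_model import LassoLarsIC
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(462, 460))
y = X[:, :10] @ rng.normal(size=10) + rng.normal(scale=0.5, size=462)

rmses = []
for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = LassoLarsIC(criterion="bic").fit(X[train], y[train])
    resid = y[test] - model.predict(X[test])
    rmses.append(np.sqrt(np.mean(resid**2)))   # per-fold testing RMSE
print(f"RMSE: {np.mean(rmses):.2f} "
      f"(SE = {np.std(rmses, ddof=1) / np.sqrt(10):.2f})")
```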

Fig. 5. Performance of the eight models: M1–M8.


In addition, Fig. 6 represents the magnitudes of the model parameters of Model 7 in different colors. If the magnitude of a model parameter is positive, the corresponding predictor or feature is significant for predicting the evaluation score. Besides, the significance of the models was tested using F statistics. We used the predictors selected by the Lasso and re-estimated the model parameters using the training data sets over the ten folds. The average F statistics and P-values, with the corresponding standard deviations (SD, in parentheses) over the ten folds, are shown in Table 4. We also validated the linear regression model assumptions using residual plots, and confirmed that the use of a linear regression model was appropriate in this study. Fig. 7 presents the residual plots of Model 7, including the distribution of the residuals, the plot of residuals versus predictions, the QQ plot, and ε̂_t versus ε̂_{t−1} for lag-1 autocorrelation. To test the performance of the different designs according to the participants' evaluation scores and the predicted scores respectively, six pairwise one-tailed t tests with Bonferroni correction (α = 0.05/6 ≈ 0.0083) were applied; the results are shown in Table 5. Here, 1, 2, and 3 represent the collapsible node-link tree, the static node-link tree, and the zoomable pack layout designs, respectively. ">" means that the former design received a lower complexity score from participants than the latter design for presenting the same information. For example, 2 > 1 means the static node-link tree outperforms the collapsible node-link tree.
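The six comparisons can be sketched as one-sided paired t tests (assuming SciPy ≥ 1.6 for the `alternative` argument), pairing scores across (participant, task) blocks; the score vectors below are synthetic stand-ins, not the study's data.

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
blocks = 14 * 11                           # participants x predefined tasks
scores = {1: rng.normal(5.0, 1, blocks),   # collapsible node-link tree
          2: rng.normal(4.2, 1, blocks),   # static node-link tree
          3: rng.normal(6.1, 1, blocks)}   # zoomable pack layout

alpha = 0.05 / 6                           # Bonferroni-corrected level
for m in scores:
    for n in scores:
        if m == n:
            continue
        # H1 "m > n": design m receives lower complexity scores than n,
        # i.e., the paired differences scores[n] - scores[m] are positive.
        p = ttest_rel(scores[n], scores[m], alternative="greater").pvalue
        print(f"{m} > {n}: p = {p:.4f}{' *' if p < alpha else ''}")
```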

4. Discussion

Reviewing Fig. 5 and comparing the modeling results, the average RMSEs and standard errors (SE) for Models 1–8 are 1.39 (SE = 0.10), 1.54 (SE = 0.03), 1.42 (SE = 0.03), 1.24 (SE = 0.05), 1.37 (SE = 0.03), 1.23 (SE = 0.07), 1.18 (SE = 0.06), and 1.18 (SE = 0.05), respectively. Comparing the RMSEs of the testing samples, Models 7 and 8 predict the evaluation scores accurately: they have smaller average testing RMSEs than the other models. Models 4–6 have smaller average testing RMSEs than Models 1–3.

Table 4
Results of the F test for Models 1–8.

Model   F Statistics, Average (SD)   P-values, Average
M1      14.28 (3.57)                 0.00*
M2      77.45 (21.03)                0.00*
M3      82.36 (17.92)                0.00*
M4      16.69 (4.88)                 0.00*
M5      63.25 (8.64)                 0.00*
M6      14.21 (2.40)                 0.00*
M7      15.88 (3.98)                 0.00*
M8      13.4 (4.33)                  0.00*

* The P-values ≤ 2.2E-16.

This implies that the data fusion of features from all three types of sensor data (i.e., EEG signals, eye movements, and logs) yields better prediction performance than using one or two types of sensor data alone, and that the fusion of any two types of sensor data still performs better than using one type alone. In addition, the R² values of Models 1–8 are 0.69 (SE = 0.012), 0.33 (SE = 0.003), 0.43 (SE = 0.003), 0.68 (SE = 0.020), 0.48 (SE = 0.003), 0.71 (SE = 0.006), 0.71 (SE = 0.020), and 0.78 (SE = 0.020), respectively. The R² values of Models 7 and 8 are generally higher than those of the other models. This result is expected, as Models 7 and 8 use more features as predictors: if more predictors are used in the full model and selected into the final model, a higher R² is expected from better fitting of the training samples. These RMSE and R² results show the merit of data fusion in integrating multiple synchronized sensor streams to predict the visualization evaluation scores. Meanwhile, the eye tracking device is unobtrusive and logs are inexpensive to obtain, so these results can guide sensor selection in future visualization evaluation tasks. It is noticed that the box plot shows a large standard deviation of the RMSEs over the ten folds. One possible reason is that participant-specific effects may not be captured by a single model shared by all participants, even though personalized standardization was employed before modeling. This issue will be addressed in future work.

Fig. 6. Magnitude of the model parameters for Models 1–7. From left to right, the figure shows (a) EEG signal related features, (b) eye movement related features, and (c) log related features.

Please cite this article in press as: Chen, X., Jin, R., Statistical modeling for visualization evaluation through data fusion, Applied Ergonomics (2017), http://dx.doi.org/10.1016/j.apergo.2016.12.016

X. Chen, R. Jin / Applied Ergonomics xxx (2017) 1e11

9

Fig. 7. Residual plots of Model 7 for the assumption check. Top row, from left to right: the distribution of the residuals, and the residuals versus the predictions. Bottom row, from left to right: the QQ plot, and ε̂_t versus ε̂_{t−1} for lag-1 autocorrelation.

Table 5
Results of one-tailed pairwise t tests with Bonferroni correction.

Alternative Hypothesis   P-value (Evaluation Scores)   P-value (Predicted Scores)
2 > 1                    0.0003                        <0.0001
3 > 1                    1.0000                        1.0000
3 > 2                    1.0000                        1.0000
1 > 2                    0.9997                        1.0000
1 > 3                    <0.0001                       <0.0001
2 > 3                    <0.0001                       <0.0001

The test results for the participants' evaluation scores showed that the P-values for 2 > 1, 1 > 3, and 2 > 3 were less than 0.0083 when treating the combinations of tasks and participants as blocks. The results indicate that the static node-link tree was significantly better than the collapsible node-link tree and the zoomable pack layout, and that the collapsible node-link tree was significantly better than the zoomable pack layout. Furthermore, Table 5 shows that the predicted scores were as informative as the participants' evaluation scores, since the test results were consistent.

According to the variable selection results for Models 1–7 (Fig. 6), a number of features were selected from each type of sensor data, and the selected features are quite consistent across the different models. Furthermore, the small P-values (<0.05) of the F tests in Table 4 indicate the overall significance of the eight models. Fig. 7 shows the residual plots of Model 7 for checking the model assumptions; Model 7 is chosen for illustration because Models 7 and 8 both yield the best prediction performance, and Model 7 has far fewer predictors. The residual plots show that the residuals roughly follow a normal distribution with some outliers, which are mainly due to the limited range (from 1 to 10) of the response variable. The residuals roughly have zero mean and constant variance, and they are independent. Hence, this model is acceptable for modeling the response variable. To better interpret the nature of the predictors, Table 6 summarizes the features consistently selected in Model 7 over the 10 folds. The strong relationship between the evaluation scores and the features extracted from the EEG signals, eye movements, and logs can be attributed to the following three components of performing visual search tasks in visualization tools. 1) Sustained attention and cognitive load, which were quantified by the EEG signals.

In the literature, the alpha and gamma bands of EEG signals have been shown to be indicators of sustained attention while performing visual search tasks (Huang et al., 2007; Ossandón et al., 2012), while statistical, morphological, and time-frequency features of EEG signals were shown to be significantly related to participants' cognitive load (Wang et al., 2016). 2) Impacts of the visual cues' layouts and the participants' reading paths, which were quantified by the eye movements. Eye movements are useful physiological data for assessing layouts (Burch et al., 2011) and for capturing the patterns of reading paths (Rayner, 2012). And 3) participants' attempts and performance, which were captured by the logs. Each collection of features explained a different, partially overlapping proportion of the variation in the evaluation scores; therefore, the data fusion approach achieved the lowest prediction errors. Table 5 presents the results of the one-tailed pairwise t tests with Bonferroni correction when treating the combinations of participants and tasks as blocks; significant differences among the three visualization designs were found using the participants' evaluation scores and the predicted scores, respectively. The results indicate that the static node-link tree significantly outperforms the other two designs, while the zoomable pack layout ranks lowest.


Table 6
Selected features in Model 7 over the 10 folds.

Data          Predictor   Channel   Band    Feature
EEG signals   7           ECG       Delta   entropy
EEG signals   65          POz       Alpha   wavelet entropy
EEG signals   94          Fz        Delta   curve length
EEG signals   103         Fz        Theta   curve length
EEG signals   107         Fz        Theta   standard deviation
EEG signals   123         Fz        Beta    skewness
EEG signals   128         Fz        Gamma   wavelet entropy
EEG signals   143         Cz        Delta   standard deviation
EEG signals   148         Cz        Theta   curve length
EEG signals   182         C3        Delta   wavelet entropy
EEG signals   187         C3        Delta   entropy
EEG signals   204         C3        Alpha   skewness
EEG signals   254         C4        Beta    wavelet entropy
EEG signals   263         C4        Gamma   wavelet entropy
EEG signals   278         F3        Delta   standard deviation
EEG signals   314         F3        Gamma   standard deviation
EEG signals   353         F4        Gamma   wavelet entropy
EEG signals   391         P3        Beta    nonlinear energy

Data            Predictor   Feature
Eye movements   453         variance of distance
Logs            456         variance of distance
Logs            459         variance of mouse move over/out duration
Logs            460         task duration

The significant differences in evaluation scores are due to the fact that participants prefer having all the information presented at once rather than interacting with the information in such a sustained search context. Meanwhile, Table 5 shows the consistency between the evaluation scores and the predicted scores, thus validating the proposed data fusion model. In other words, this consistency suggests that the data fusion model based on objective measures can potentially replace the subjective evaluation scores.

5. Application procedure to visualization evaluation and general human factors and ergonomics problems

This research provides one way to systematically use EEG signals, eye movements, and visualization logs for the online, unobtrusive evaluation of data visualization. A similar procedure of study design and data analysis can be used not only for PC-based visualization, but also for visualization in other formats, such as information and decision visualization and wearable displays. To apply this method to real visualization evaluation cases, several guidelines should be followed. (1) A free exploration of the visualization tool should be conducted for each participant as a personalized standardization baseline in the very first phase. (2) The development of the models requires participants to complete tasks and provide post-evaluation scores; after the model is developed, it can predict the subjective evaluation scores online, and the collection of post-evaluation scores becomes optional. (3) For other visualizations, the visualization logs should be modified to reflect the workload or the insights, depending on the objectives of the visualization evaluation. In general, this data fusion framework can be extended to serve other user-centered design evaluation needs whenever the design evaluation relies completely or partially on subjective scores from users in HFE. In the past, the assessment of user-centered designs often relied on observations and subjective scores collected from questionnaires or interviews. Though irreplaceable, subjective scores are collected offline and obtrusively, which leads to an inefficient trial-and-error style of evaluation (Manenica and Corlett, 1973). Fortunately, subjective scores have been shown to be predictable from unobtrusive measures. In recent years, as a significant improvement in efficiency, statistical models

and machine learning approaches have been used in user-centered design evaluation, such as the prediction of the overall score of subjective automobile seat comfort (Kolich et al., 2004). These user-centered design evaluation studies generally involve different types of unobtrusive measures. To jointly analyze the correlation between the sensor data and the evaluation scores, the proposed data fusion framework can serve to improve online evaluation accuracy. For example, if EEG signals and eye movements are used in a study, the same features as in this study can be extracted, while the type of logs needs to be determined by the context (e.g., motion tracking data in an augmented or virtual reality design context, or body posture data in an automobile seat design context). Although the same features are used in different contexts, the feature selection process will ensure the estimation performance of the models. Afterwards, the model can predict subjective evaluation scores in settings where obtaining such scores directly is costly.

6. Limitations

There are some limitations to the data fusion model. First, the model was constructed using data from a group of well-educated participants; its generalization is therefore not obvious and needs further testing. Moreover, the model depends on both the participants and the tasks, which also limits its generalization. Second, although the model performed well at predicting evaluation scores, it lacks diagnostic functionality, which is important for locating the weaknesses of visualization designs. For diagnostic needs, the parameterization and evaluation of the visualization designs, such as the color coding and the layouts, were not addressed in the data fusion model. Last but not least, the relationship between the evaluation scores and personal preferences was not discussed in the current study.

7. Conclusions and future work

Data visualization is an important technique for bringing insights to users from the increasing variety and volume of data. However, there is a gap between the high demand for new data visualization tools and the low efficiency of user-centered design processes, due to the lack of unobtrusive, quantitative, and online user-centered evaluation methods. In this paper, we propose a data fusion method that integrates EEG signals, eye movements, and logs to predict visualization evaluation scores. In particular, a regularized regression model uses the online data, selects significant features, and predicts the evaluation scores accurately. This model is expected to provide quantitative, online feedback from users to improve visualization designs. The results also indicate the significance of using all three types of data for visualization evaluation. The data fusion model can be extended to other evaluation problems in HFE to provide an accurate indicator of users' subjective ratings. This paper leaves several open questions for future research. A personalized model for individual participants, such as the multitask Lasso (Kim and Xing, 2010), will be constructed to further improve prediction performance. Visualization design features will be considered in the study design and treated as covariates in the models, so that the method can be extended to more visualization designs. A cognitive model will also be studied with these sensor data to understand the human cognition process in data visualization.

Acknowledgement

The authors acknowledge Dr. Nathan Lau at Virginia Tech for providing access to the eye tracking devices.


References

Agarwal, R., Gotman, J., 1999. Adaptive segmentation of electroencephalographic data using a nonlinear energy operator. In: Proceedings of the 1999 IEEE International Symposium on Circuits and Systems, 4, pp. 199–202.
Åkerstedt, T., Torsvall, L., 1984. Continuous electrophysiological recording. In: Breakdown in Human Adaptation to 'Stress'. Springer, Netherlands, pp. 567–583.
Alan, D., 2011. Name Database. https://www.drupal.org/project/namedb/ (Accessed 16 June 2013).
Azuma, R.T., 1997. A survey of augmented reality. Presence Teleoperators Virtual Environ. 6 (4), 355–385.
Bender, R., Lange, S., 2001. Adjusting for multiple testing - when and how? J. Clin. Epidemiol. 54 (4), 343–349.
Bergstrom, J.R., Schall, A., 2014. Eye Tracking in User Experience Design. Elsevier.
Bichlmeier, C., Wimmer, F., Heining, S.M., Navab, N., 2007. Contextual anatomic mimesis hybrid in-situ visualization method for improving multi-sensory depth perception in medical augmented reality. In: 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, pp. 129–138.
Billinghurst, M., 2002. Augmented reality in education. New Horizons Learn. 12.
Bostock, M., 2011. D3.js - Data-driven Documents. http://d3js.org/ (Accessed 16 June 2013).
Bradley, J.V., 1958. Complete counterbalancing of immediate sequential effects in a Latin square design. J. Am. Stat. Assoc. 53 (282), 525–528.
Bryson, S., 1996. Virtual reality in scientific visualization. Commun. ACM 39 (5), 62–71.
Burch, M., Heinrich, J., Konevtsova, N., Höferlin, M., Weiskopf, D., 2011. Evaluation of traditional, orthogonal, and radial tree diagrams by an eye tracking study. IEEE Trans. Vis. Comput. Graph. 17 (12), 2440–2448.
Carlis, J.V., Konstan, J.A., 1998. Interactive visualization of serial periodic data. In: Proceedings of the 11th Annual ACM Symposium on User Interface Software and Technology, pp. 29–38.
Chen, X., Sun, H., Jin, R., 2016. Variation analysis and visualization of manufacturing processes via augmented reality. In: Proceedings of the 2016 Industrial and Systems Engineering Research Conference (in proceeding).
De Waard, D., Brookhuis, K.A., 1991. Assessing driver status: a demonstration experiment on the road. Accid. Anal. Prev. 23 (4), 297–307.
Dehais, F., Causse, M., Vachon, F., Tremblay, S., 2012. Cognitive conflict in human-automation interactions: a psychophysiological study. Appl. Ergon. 43 (3), 588–595.
DeLamarter, R.T., 1986. Big Blue: IBM's Use and Abuse of Power. Dodd, Mead & Company.
Doil, F., Schreiber, W., Alt, T., Patron, C., 2003. Augmented reality for manufacturing planning. In: Proceedings of the Workshop on Virtual Environments 2003. ACM, pp. 71–76.
Doyle, J.C., Ornstein, R., Galin, D., 1974. Lateral specialization of cognitive mode: II. EEG frequency analysis. Psychophysiology 11 (5), 567–578.
Eoh, H.J., Chung, M.K., Kim, S.-H., 2005. Electroencephalographic study of drowsiness in simulated driving with sleep deprivation. Int. J. Ind. Ergon. 35 (4), 307–320.
Esteller, R., Echauz, J., Tcheng, T., Litt, B., Pless, B., 2001. Line length: an efficient feature for seizure onset detection. In: Proceedings of the 23rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2, pp. 1707–1710.
Fekete, J.-D., Van Wijk, J.J., Stasko, J.T., North, C., 2008. The value of information visualization. In: Information Visualization. Springer, Berlin Heidelberg, pp. 1–18.
Friedman, J., Hastie, T., Tibshirani, R., 2010. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 (1), 1.
Gevins, A., Smith, M.E., Leong, H., McEvoy, L., Whitfield, S., Du, R., Rush, G., 1998. Monitoring working memory load during computer-based tasks with EEG pattern recognition methods. Hum. Factors 40 (1), 79–91.
Hansen, C., Wieferich, J., Ritter, F., Rieder, C., Peitgen, H.-O., 2010. Illustrative visualization of 3D planning models for augmented reality in liver surgery. Int. J. Comput. Assist. Radiol. Surg. 5 (2), 133–141.
Hastie, T., Tibshirani, R., Friedman, J., Franklin, J., 2005. The Elements of Statistical Learning: Data Mining, Inference and Prediction, vol. 27. Springer Series in Statistics. Springer, Berlin, pp. 83–85.
Hilbert, D.M., Redmiles, D.F., 2000. Extracting usability information from user interface events. ACM Comput. Surv. 32 (4), 384–421.
Hix, D., Swan, J.E., Gabbard, J.L., McGee, M., Durbin, J., King, T., 1999. User-centered design and evaluation of a real-time battlefield visualization virtual environment. In: Proceedings of IEEE Virtual Reality, pp. 96–103.
Hu, B., Ma, L., Zhang, W., Salvendy, G., Chablat, D., Bennis, F., 2011. Predicting real-world ergonomic measurements by simulation in a virtual environment. Int. J. Ind. Ergon. 41 (1), 64–71.
Huang, R.S., Jung, T.P., Makeig, S., 2007. Multi-scale EEG brain dynamics during sustained attention tasks. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4, IV-1173.
Isenberg, P., Zuk, T., Collins, C., Carpendale, S., 2008. Grounded evaluation of information visualizations. In: Proceedings of the 2008 Workshop on Beyond Time and Errors: Novel Evaluation Methods for Information Visualization, 6.
Kaufmann, H., 2003. Collaborative Augmented Reality in Education. Institute of Software Technology and Interactive Systems, Vienna University of Technology.
Keim, D., Andrienko, G., Fekete, J.-D., Görg, C., Kohlhammer, J., Melançon, G., 2008. Visual Analytics: Definition, Process, and Challenges. Springer, Berlin Heidelberg, pp. 154–175.
Kim, S., Xing, E.P., 2010. Tree-guided group Lasso for multi-task regression with structured sparsity. In: Proceedings of the International Conference on Machine Learning (ICML).
Kolich, M., Seal, N., Taboun, S., 2004. Automobile seat comfort prediction: statistical model vs. artificial neural network. Appl. Ergon. 35 (3), 275–284.
Lam, H., Bertini, E., Isenberg, P., Plaisant, C., Carpendale, S., 2011. Seven Guiding Scenarios for Information Visualization Evaluation. Technical Report.
Lin, Y.P., Wang, C.H., Jung, T.P., Wu, T.L., Jeng, S.K., Duann, J.R., Chen, J.H., 2010. EEG-based emotion recognition in music listening. IEEE Trans. Biomed. Eng. 57 (7), 1798–1806.
Lin, C.J., Chang, C.-C., Lee, Y.-H., 2014. Evaluating camouflage design using eye movement data. Appl. Ergon. 45 (3), 714–723.
Ma, L., Zhang, W., Fu, H., Guo, Y., Chablat, D., Bennis, F., Sawanoi, A., Fugiwara, N., 2010. A framework for interactive work design based on motion tracking, simulation, and analysis. Hum. Factors Ergon. Manuf. Serv. Ind. 20 (4), 339–352.
Manenica, I., Corlett, E.N., 1973. A model of vehicle comfort and a method for its assessment. Ergonomics 16 (6), 849–854.
Michael, K., Miller, K.W., 2013. Big data: new opportunities and new challenges [guest editors' introduction]. Computer 46 (6), 22–24.
Murata, A., 2005. An attempt to evaluate mental workload using wavelet transform of EEG. Hum. Factors 47 (3), 498–508.
Nee, A., Ong, S., Chryssolouris, G., Mourtzis, D., 2012. Augmented reality applications in design and manufacturing. CIRP Ann. Manuf. Technol. 61 (2), 657–679.
Okogbaa, O.G., Shell, R.L., Filipusic, D., 1994. On the investigation of the neurophysiological correlates of knowledge worker mental fatigue using the EEG signal. Appl. Ergon. 25 (6), 355–365.
Ossandón, T., Vidal, J.R., Ciumas, C., Jerbi, K., Hamamé, C.M., Dalal, S.S., Bertrand, O., Minotti, L., Kahane, P., Lachaux, J.P., 2012. Efficient "pop-out" visual search elicits sustained broadband gamma activity in the dorsal attention network. J. Neurosci. 32 (10), 3414–3421.
Plaisant, C., 2004. The challenge of information visualization evaluation. In: Proceedings of the Working Conference on Advanced Visual Interfaces, pp. 109–116.
Rajan, V.N., Sivasubramanian, K., Fernandez, J.E., 1999. Accessibility and ergonomic analysis of assembly product and jig designs. Int. J. Ind. Ergon. 23 (5), 473–487.
Rayner, K., 2012. Eye Movements and Visual Cognition: Scene Perception and Reading. Springer Science & Business Media.
Rohrer, R.M., Swing, E., 1997. Web-based information visualization. IEEE Comput. Graph. Appl. 17 (4), 52–59.
Rosso, O.A., Blanco, S., Yordanova, J., Kolev, V., Figliola, A., Schürmann, M., Başar, E., 2001. Wavelet entropy: a new tool for analysis of short duration brain electrical signals. J. Neurosci. Methods 105 (1), 65–75.
Saraiya, P., North, C., Lam, V., Duca, K.A., 2006. An insight-based longitudinal study of visual analytics. IEEE Trans. Vis. Comput. Graph. 12 (6), 1511–1522.
Sun, D., Paredes, P., Canny, J., 2014. MouStress: detecting stress from mouse motion. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 61–70.
Tibshirani, R., 1996. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B Methodol. 58, 267–288.
Tien, J.M., 2003. Toward a decision informatics paradigm: a real-time, information-based approach to decision making. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 33 (1), 102–113.
Toker, D., Conati, C., Steichen, B., Carenini, G., 2013. Individual user characteristics and information visualization: connecting the dots through eye tracking. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 295–304.
UC Berkeley Visualization Lab, 2008. Flare - Data Visualization for the Web. http://flare.prefuse.org/ (Accessed 16 June 2013).
Van Ham, F., Van Wijk, J.J., 2004. Interactive visualization of small world graphs. In: IEEE Symposium on Information Visualization (INFOVIS 2004), pp. 199–206.
Wang, W., Wang, H., Dai, G., Wang, H., 2006. Visualization of large hierarchical data by circle packing. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 517–520.
Wang, S., Gwizdka, J., Chaovalitwongse, W.A., 2016. Using wireless EEG signals to assess memory workload in the n-back task. IEEE Trans. Hum. Mach. Syst. 46 (3), 424–435.
Willett, W., Heer, J., Agrawala, M., 2007. Scented widgets: improving navigation cues with embedded visualizations. IEEE Trans. Vis. Comput. Graph. 13 (6), 1129–1136.
Wills, G.J., 1997. NicheWorks - interactive visualization of very large graphs. In: International Symposium on Graph Drawing. Springer, Berlin Heidelberg, pp. 403–414.
