Perception of bar graphs – A biased impression?



Computers in Human Behavior 59 (2016) 67–73

Contents lists available at ScienceDirect

Computers in Human Behavior journal homepage: www.elsevier.com/locate/comphumbeh

Full length article

Claudia Godau a,b,*, Tom Vogelgesang b, Robert Gaschler b,c

a Humboldt-Universität Berlin, Germany
b Image Knowledge Gestaltung, An Interdisciplinary Laboratory, Berlin, Germany
c FernUniversität in Hagen, Hagen, Germany

Article info

Article history: Received 9 July 2015; Received in revised form 15 January 2016; Accepted 27 January 2016; Available online xxx

Abstract

Computers provide ubiquitous contact with data graphs. Yet, employing the power of the human perception system bears the risk of being subject to its biases. Data graphs are used to present the means of different conditions and are supposed to convey group information, such as variability across conditions, as well as the grand average. Across three samples, we tested whether there is a bias in the central tendency perceived in bar graphs. The first experiment included 53 participants with a mean age of 27 years (plus a replication with N = 38, mean age = 23 years). Participants were provided with bar and point graphs and had to judge their means. We found that the mean value was systematically underestimated in bar graphs (but not in point graphs) across different methods of testing for biased evaluation. In a second experiment (N = 80, mean age = 24 years) we replicated and extended this finding by testing the effect of outliers on the bias in average estimation. For instance, outliers might trigger controlled processing. Yet, the underestimation of the average was replicated and was not affected by including outliers, even though the estimate was pulled towards the outlier. Thus, we should be cautious about relying on bar graphs when a bias-free estimate of the grand average is relevant. © 2016 Published by Elsevier Ltd.

Keywords: Bar graphs; Data graphs; Biased perception

1. Introduction

Continuous and immediate access to computers includes contact with data graphs. For instance, web applications employ automated bar graphs, and spreadsheet software makes it easy to generate data graphs in different contexts. Ainley (1994) showed that school children who are competent in using spreadsheets to record data are able to produce graphs quickly and easily, and even transfer these skills to producing hand-drawn graphs if required. Abilities to produce and read data graphs are correlated (Davis, 2011); moreover, Burley (2010) argues that information visualization is a valuable tool for knowledge integration. Most companies use spreadsheet software in work contexts. The use of data graphs in research has increased during the 20th century (Gross, Harmon, & Reidy, 2002), both in publications and at conferences (Cleveland, 1984). Data graphs enable a rapid apprehension of result patterns in terms of quantitative relations between values. In some cases, conclusions might be drawn too fast when graphs are involved. Even people with high expertise in the subject matter find articles with data graphs

* Corresponding author. Humboldt-Universität zu Berlin, Exzellenzcluster Bild Wissen Gestaltung, Unter den Linden 6, 10099 Berlin, Germany. E-mail address: [email protected] (C. Godau). http://dx.doi.org/10.1016/j.chb.2016.01.036 0747-5632/© 2016 Published by Elsevier Ltd.

more plausible, attributing to data graphs a potential for persuasion (Isberner et al., 2013). Note, however, that disciplines differ greatly in the extent to which they use data graphs (Arsenault, Smith, & Beauchamp, 2006; Kubina, Kostewicz, & Datchuk, 2010; Smith, Best, Stubbs, Johnston, & Archibald, 2000). Furthermore, there are many ways to visualize scientific results, and the design of graphs can affect the interpretation of the presented results (Fischer, Dewulf, & Hill, 2005; Huestegge & Philipp, 2011). Thus, while by and large data graphs are a common and efficient means to convey pattern information, there might be systematic biases in how perception extracts general aspects of a data set from a graph. Increasing our knowledge about the perception of data graphs might be most pressing for frequently used formats and features. The aim of our study was to examine whether bar graphs give a biased impression of central tendency.

There are extensive studies on various properties of different types of data graphs. For example, results shown in bar graphs can be read faster and with higher accuracy compared to pie graphs (Simkin & Hastie, 1987). The same authors specify that divided bar graphs should be avoided in favor of simple bar graphs in order to reduce errors. Furthermore, vertical bar graphs are reported to be more user-friendly than horizontal ones (Fischer et al., 2005). While bar graphs are recommended for discrete values, trends should be


represented in line graphs (cf. Zacks & Tversky, 1999). This should not, however, suggest that bar graphs are about individual values only. The specific strength of presenting data in a graph (e.g., rather than in a table) is that the reader can gain an instant impression of overall properties of the data set. A bar graph should not only convey the values of individual bars; it should also convey group information, such as the variability across conditions and the grand average. For instance, a bar graph representing % sugar per kind of convenience food should allow assessing variability across foods, but should also give an impression of the overall level. Likewise, a bar graph on percent climate gas emission change per industrial sector should convey the general level of change in the period assessed as well as the differences among the branches. Peebles (2008) compared line graphs, bar graphs, and Kiviat charts and showed that if people have to judge how much larger or smaller the value of a dimension is compared to the average, the values in bar graphs were systematically underestimated. In an experimental study we investigated whether there is a bias in the central tendency perceived in bar graphs. We report two experiments. In the first experiment, we tested the ‘underestimation of the mean in bar graphs’ across different methods. In the second experiment, we tested the ‘effect of outliers on the bias in average estimation’.

2. Literature review

2.1. Underestimation of the mean in bar graphs

Bar graphs (as well as line graphs) are the most common format in technical and popular media (Zacks, Levy, Tversky, & Schiano, 2002). Unfortunately, bar graphs cannot be considered best practice for all statements, as a biased impression when judging the height of bars has been repeatedly documented (Jarvenpaa & Dickson, 1988; Kosslyn, 2006; Peebles, 2008; Zacks, Levy, Tversky, & Schiano, 1998). The results of experiments investigating the perception of bar graphs are inconsistent. In some conditions, people overestimate the height of bars (Jarvenpaa & Dickson, 1988; Kosslyn, 2006; Zacks et al., 1998). In others, underestimation is documented (Peebles, 2008; Zacks et al., 1998). Potentially, the accuracy of judging the height of a bar is affected by nearby elements (Zacks et al., 1998). In their study, the presented graphs were either 2D or 3D and showed a test bar, either alone or together with a context bar. The task was to match the height of each test bar to the height of 50 different reference bars that were posted in front of the subjects in ascending order. If the difference between context bar and test bar was larger, the height of the bars was overestimated more (Zacks et al., 1998). Additionally, they found that people tend to overestimate short bars, whereas they tend to underestimate tall bars. One (yet untested) explanation for this result could be the biased impression of the height of vertical bars (Jarvenpaa & Dickson, 1988; Kosslyn, 2006). In our first experiment, we manipulated the height of the bars in order to follow up on this potential explanation. Testing for systematic biases in estimating the average from data graphs, we used different methods of assessing the estimate (graphic vs. numerical judgment) and varied format (bar graph vs. point graph) as well as height (high vs. low bars).

2.2. The effect of outliers on the bias in average estimation

The second experiment was set up to replicate and extend the underestimation of the mean in bar graphs investigated in the first experiment by testing the impact of individual bars that are either extremely high or extremely low. Outliers could influence performance in mean estimation on two levels. First, perceiving an outlier can raise awareness of the need for controlled processing (e.g., Brewer, 2012; Schützwohl, 1998). Given this general effect on how the mean estimation is performed, graphs with vs. without an outlier might differ in the amount of systematic bias in mean estimation. Second, two diverging predictions can be made concerning the specific impact of the outlier on mean estimation. On the one hand, one could assume that outliers draw attention (Zacks et al., 1998) and that, by attending to the location of the outlier, participants' mean estimates are biased towards the outlier. For instance, an extremely low bar might draw attention to its location below the mean line. As attended positions should have a stronger weight in the mean estimate (Kruschke, 2003), the attention hypothesis would suggest that high outliers pull the mean estimate upwards and low outliers pull it downwards. So the mean estimate would go with the outlier. On the other hand, one could assume that the mean estimate is pushed away from the outliers. For instance, outliers have a smaller impact on visual estimates of correlations from scatter plots than they have on the correlation coefficient (Bobko, 1979; Meyer, Taieb, & Flascher, 1997). Participants might (overly) discount the impact of the outliers (cf. Haberman & Whitney, 2010; Strobach & Carbon, 2013). A psychophysics perspective would also support such a pattern. Doubling the size of a stimulus (e.g., a bar) should increase its impact, but should fall short of doubling it. Thus, extreme bars should have less impact on the mean estimate than they have on the mean.

3. Methods

3.1. Underestimation of the mean in bar graphs

Fifty-three participants with a mean age of 27 years (SD = 7.83 years, range 18–49) took part in the experiment and were paid €8. They were recruited from the participant pool of the Department of Psychology at Humboldt-Universität (including students and other adults). Approximately 45% were women. Most participants (91%) were right-handed. All participants reported normal or corrected-to-normal vision.

The presented graphs consisted of 8 data points, shown as bars or dots in dark grey (see Fig. 1 for the material). The data for the graphs were generated randomly with Microsoft Excel using the RANDBETWEEN function. The experiment was divided into four blocks. Each block consisted of 80 trials in individually randomized order. In total, 320 graphs were presented.

In each trial of the first three blocks of the experiment, a data graph was shown together with a red horizontal line. This red line served as the criterion for judging the mean of the presented data. The participant's task was to answer whether this line should be higher or lower. To do so, the participant had to estimate the mean of the data points displayed in the graph. In Blocks 1 and 2 the red line was placed exactly on the numerical average of the displayed values and therefore showed the correct mean. If the participants had a systematic bias in estimating the mean from the graphs, this should show in a systematic deviation from the 50–50 guessing pattern.

The first block tested whether such a bias would be more pronounced in large rather than in small bars (while keeping variability amongst the bars constant). In the first block, 40 bar graphs with high bars (range 8–12; natural numbers) and 40 bar graphs with low bars (range 1–5; natural numbers) were presented. The latter were generated from the former by subtracting 7 units. The second block tested whether a systematic bias in perceiving the mean of data points in a graph is restricted to bar graphs or can also be found in point graphs. In the second block, 40 high bar graphs and 40 point graphs, which were generated in pairs from the same pseudo-data, were presented (range 8–12; natural numbers). In Block 3 we varied the distance between criterion line and the


Fig. 1. Conditions and examples of the presented graphs for all blocks of the first experiment (underestimation of the mean in bar graphs).

mean. The red line varied by 1.5 standard deviations around the mean. This allowed us to quantify the strength of a potential bias in terms of the true deviation from the average. In the third block, 80 high bar graphs were presented (range 8–12; rational numbers). These graphs were generated in pairs with the same deviation over or under the mean. Finally, to ensure that our results did not depend on a method employing a criterion line, we presented the mean numerically (Block 4). This time, the graphs included a scale on the left side. After presenting the graph, the mean was presented as a number (including two decimal places) on the next screen. As before, the participant's task was to judge whether the correct mean should be lower or higher. In this fourth block, too, 80 high bar graphs were presented (range 8–12; rational numbers).

The graphs were presented centrally on a screen with a resolution of 1280 × 1024. The experiment took about 30 min. Each block started with a short instruction and an example. The participants had to decide whether the correct mean of the data should be higher or lower than the mean presented as the horizontal red line (Blocks 1 to 3) or as a value (Block 4). They were asked to react as quickly and as accurately as possible by pressing one of two buttons. In Blocks 1 to 3 the graph was presented for 3 s and the participants had an additional 2 s for their response after the graph was hidden and replaced by a sentence requesting them to press the associated button. In the fourth block, the graph was presented for 8 s and the participants had an additional 7 s for their response after the graph was hidden and the mean was shown as a number. After a key press, the trial ended and the next trial was launched after a 2 s delay. No feedback was provided.
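The stimulus construction for Blocks 1 and 2 can be sketched in a few lines. The authors generated their values with the RANDBETWEEN function in Microsoft Excel; the Python version below is only an illustrative stand-in. The function names (`make_high_bars`, `make_low_bars`), the seed and the use of `random.randint` are assumptions made for this sketch and are not part of the original procedure.

```python
import random
import statistics

N_BARS = 8             # data points per graph, as described in the text
HIGH_RANGE = (8, 12)   # natural numbers used for the high bars
SHIFT = 7              # low bars = high bars minus 7 units


def make_high_bars(rng):
    """Draw 8 integer bar heights in the 8-12 range (stand-in for RANDBETWEEN)."""
    return [rng.randint(*HIGH_RANGE) for _ in range(N_BARS)]


def make_low_bars(high_bars):
    """Derive the matching low-bar graph by subtracting 7 units from each bar."""
    return [h - SHIFT for h in high_bars]


rng = random.Random(2016)  # hypothetical seed, only to make the sketch reproducible
high = make_high_bars(rng)
low = make_low_bars(high)

# In Blocks 1 and 2 the criterion line sat exactly on the arithmetic mean.
criterion_high = statistics.mean(high)
criterion_low = statistics.mean(low)

# Shifting every bar by a constant leaves the variability unchanged.
spread_high = statistics.pstdev(high)
spread_low = statistics.pstdev(low)
```

Because the low bars are a constant shift of the high bars, the two conditions differ in overall height but not in spread, which is the property the first block relies on.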

trial, participants were presented with a bar graph overlaid with a horizontal line and had to indicate manually, with the index and middle finger of their dominant hand, whether the mean of the bars in the graph should be higher (upper key) or lower (lower key). The line was set to the mean level, so that we could measure the response bias. We applied a time limit of 2 s in order to enforce fast processing. Each bar graph contained 11 bars. There were four different types of bar charts in this task. While 10 of the bars were determined randomly with the constraint that the standard deviation (SD) of their heights should be equal to 1, the types differed in the height chosen for the 11th bar. This bar was inserted at a random position into the array of the remaining bars (i.e., one random position had been left empty for the 11th bar). In the “+3SD” condition (30 trials), the 11th bar was calculated such that it deviated positively by 3 SD from the mean of the other 10 bars. In the “-3SD” condition (30 trials), there was a deviant in the bar chart, too, but this time it consisted of an especially small column. A “+1SD” and a “-1SD” condition (20 trials each) served as control conditions. By rescaling, the distance between the bottom of the bar chart and the yellow horizontal line symbolizing the suggested mean was kept constant across all trials.
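The outlier stimuli described above can be illustrated with a minimal sketch. Several details are assumptions introduced here, since the text does not specify them: the regular bars are drawn from a Gaussian and then rescaled, the population SD is used, and `base_mean`, the seed and the function name `make_outlier_graph` are hypothetical.

```python
import random
import statistics


def make_outlier_graph(rng, deviant_sd, n_regular=10, base_mean=10.0):
    """Return 11 bar heights: 10 regular bars with SD 1 plus one deviant bar."""
    raw = [rng.gauss(0.0, 1.0) for _ in range(n_regular)]
    raw_mean = statistics.mean(raw)
    raw_sd = statistics.pstdev(raw)  # population SD; the paper does not say which SD was used
    # Rescale so that the regular bars have mean `base_mean` and SD exactly 1.
    regular = [base_mean + (x - raw_mean) / raw_sd for x in raw]
    # The deviant lies `deviant_sd` SDs (SD = 1) from the mean of the other bars.
    deviant = statistics.mean(regular) + deviant_sd * 1.0
    bars = list(regular)
    bars.insert(rng.randrange(n_regular + 1), deviant)  # random slot among the 11 positions
    return bars, deviant


rng = random.Random(59)  # hypothetical seed
bars_plus3, deviant_plus3 = make_outlier_graph(rng, +3)
bars_minus3, deviant_minus3 = make_outlier_graph(rng, -3)

# Including the deviant shifts the arithmetic mean of all 11 bars by 3/11 of an SD.
shift_plus3 = statistics.mean(bars_plus3) - 10.0
```

Under these assumptions, a single 3 SD deviant moves the arithmetic mean of the 11 bars by 3/11 SD in the direction of the deviant, which is the normative amount against which a perceptual pull towards or away from the outlier can be compared.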

3.2. The effect of outliers on the bias in average estimation

4.1.1. Bias in high and low bars – Block 1

We applied t-tests comparing the rate of “should be lower” responses to the 50% chance level of the forced decision task. Participants significantly underrated the mean in the first block for the high bars, t(52) = 6.38, p < .001, as well as for the low bars, t(52) = 6.12, p < .001. We found no significant difference between the responses of the high bars in comparison to the low bar

We tested 80 university students from Berlin (mean age = 24 years, SD = 5.35 years, range = 18–52 years; 84% female; 89% right-handed). Half of the participants took part for course credit, the other half were paid €8. We applied a similar method as in the first experiment. On each

4. Results

4.1. Underestimation of the mean in bar graphs

All RTs < 100 ms (.25%) were excluded from further analysis in the experiment. The response percentages for the different blocks are presented in Fig. 2.


standard deviations. If the estimation were bias-free, the percentage of “mean should be lower” responses should be about 50% when the criterion line coincides with the mean. However, the distribution of the “mean should be lower” responses is shifted to the left. According to the fitted sigmoid function (R² = .976), the point of equivalence is .278 standard deviations lower than the correct mean. Thus, the criterion line has to be presented about one quarter of a standard deviation below the correct mean in order to compensate for the underestimation bias.
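The idea of reading the bias off the midpoint of a fitted sigmoid can be sketched as follows. This is not the authors' fitting procedure, which the text does not describe. It assumes a two-parameter logistic function fitted by a simple least-squares grid search, and it uses synthetic, noise-free data generated with a midpoint of -.278 SD purely for illustration; the offsets, slope and function names are hypothetical.

```python
import math


def logistic(x, x0, k):
    """Proportion of 'mean should be lower' responses at line offset x (in SD)."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))


def fit_logistic(xs, ps):
    """Least-squares grid search for the midpoint x0 and the slope k."""
    best = (float("inf"), 0.0, 1.0)
    for i in range(-500, 501):        # x0 from -1.000 to 1.000 in steps of .002
        x0 = i / 500.0
        for j in range(1, 61):        # k from 0.25 to 15 in steps of .25
            k = j * 0.25
            err = sum((logistic(x, x0, k) - p) ** 2 for x, p in zip(xs, ps))
            if err < best[0]:
                best = (err, x0, k)
    return best[1], best[2]


# Hypothetical offsets of the criterion line from the true mean, in SD units.
xs = [i * 0.25 for i in range(-6, 7)]             # -1.5 ... 1.5
# Synthetic data, NOT the study's data: generated from a known curve.
ps = [logistic(x, -0.278, 2.0) for x in xs]

pse, slope = fit_logistic(xs, ps)                 # pse = point of equivalence
```

The fitted midpoint `pse` is the offset at which both responses are equally likely; a negative value means the line has to sit below the true mean before observers stop asking for it to be lowered.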


4.1.4. Bias when judging against numbers – Block 4

In the fourth block, the mean was presented as a number after the graph. Again, people underestimated the mean. The percentage of “mean should be lower” responses was significantly higher than 50%, t(51) = 2.92, p = .01. The reaction times are presented in Fig. 4. They did not differ significantly between the “mean should be higher” and “mean should be lower” responses across all subconditions, F < 1. We calculated an ANOVA for each block and found no significant effects of gender or handedness on the responses in any of the blocks (all Fs ≤ 1).
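The comparisons against the 50% chance level reported for each block amount to one-sample t-tests on per-participant response rates. A minimal sketch is given below; the response rates are hypothetical values invented for illustration, and the helper `one_sample_t` is not taken from the paper.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return (t, df) for a one-sample t-test of the mean of `values` against mu0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # sample SD (n - 1 in the denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-participant percentages of "should be lower" responses.
rates = [55.0, 62.5, 70.0, 47.5, 65.0, 60.0, 72.5, 57.5]
t_value, df = one_sample_t(rates, 50.0)
```

A positive t value indicates that “should be lower” was chosen more often than chance, that is, that the displayed mean looked too high to participants. Obtaining a p value would additionally require the t distribution, which is left out of this sketch.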


Fig. 2. The percentage of participants who predominantly underrate the mean (>62.5% of all trials) is presented in orange squares, and the percentage of “mean should be lower” responses is presented in blue points for Blocks 1, 2 and 4 of the first experiment (underestimation of the mean in bar graphs). The error bars show the standard error of the mean. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

condition, t(52) = .75, p > .46. In addition, we used a binary bias score per participant to check whether the effect was present in most of the participants (rather than being strongly present only in a minority). According to the binomial distribution, a significant deviation from 50% was reached when a participant responded “should be lower” in 26 or more of the 40 trials (i.e., >62.5%). This was the case for 64.15% of the participants in the high bar condition and 62.26% in the low bar condition (light blue points in Fig. 2).
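The binomial criterion behind the binary bias score can be checked with a short calculation. The sketch below assumes a one-sided test at alpha = .05 against a response probability of .5; the text does not state the sidedness or the alpha level, so both are assumptions, as are the function names.

```python
from math import comb


def upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def smallest_significant_count(n, alpha=0.05):
    """Smallest k whose one-sided tail probability P(X >= k) falls below alpha."""
    for k in range(n + 1):
        if upper_tail(k, n) < alpha:
            return k
    return None


threshold = smallest_significant_count(40)      # 40 trials per condition in Block 1
tail_at_threshold = upper_tail(threshold, 40)
```

With 40 trials, the first count that clears this one-sided criterion is 26, so any participant above 62.5% (25 of 40) “should be lower” responses is classified as predominantly underrating the mean.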

4.2. The effect of outliers on the bias in average estimation

We discarded 6.65% of the trials because participants did not respond within the 2 s response window (SD = 5.85%). Despite the impact of the outliers (see below), all conditions showed > 60% “should be lower” responses (see Fig. 5). All differed significantly from the 50% criterion, t(79) = 5.32 to 10.4, ps < .001. The estimation of the mean was biased towards the outlier rather than away from it. As shown in Fig. 5, participants more

4.1.2. Bias in bar graphs vs. point graphs – Block 2

We compared bar versus point graphs in the second block. In the bar graph condition we found that participants again significantly underrated the mean, t(52) = 9.13, p < .001, whereas they significantly overrated the mean in the point graph condition, t(52) = 4.58, p < .001, with t(51) = 9.59, p = .001, for the difference between conditions. Apparently, the underestimation effect depends on the type of graph. Note that overrating in point graphs was not observed when these were tested in isolation (see Appendix 1a).


4.1.3. Measuring bias in distance from mean – Block 3

In Block 3, the red criterion line varied about 1.5 standard deviations around the correct mean. Fig. 3 presents the distribution of the percentage of the responses “mean should be lower” depending on the deviation of the criterion line from the correct mean in


Fig. 4. Reaction times in ms, split by “mean should be lower/higher” responses, for Blocks 1, 2, 3 and 4 of the first experiment (underestimation of the mean in bar graphs).


Fig. 3. The percentage of “mean should be lower” responses is presented per deviation of the reference line from the mean in standard deviations, together with the sigmoid fit (Block 3 data of the first experiment, underestimation of the mean in bar graphs).


Fig. 5. The percentage of participants who predominantly underrate the mean (>62.5% of all trials) is presented in orange squares, and the percentage of “mean should be lower” responses is presented in blue points (left y-axis) for the second experiment (effect of outliers on the bias in average estimation). Gray diamonds show RTs in ms (right y-axis). The error bars show the standard error of the mean. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

often responded “should be lower” when a single bar was extremely short (-3SD deviant) than when it differed less from the mean of the other bars (-1SD deviant). In line with this, a strong positive deviant had a stronger impact on pulling responses towards “should be higher” than a weaker deviant. A two-factorial ANOVA on % “should be lower” responses with deviant strength (3SD vs. 1SD) and deviant direction (smaller than average vs. larger than average bar) revealed a main effect of deviant direction, F(1, 79) = 4.62, p < .035, ηp² = .055, and an interaction of deviant direction and deviant strength, F(1, 79) = 12.065, p < .001, ηp² = .132. There was no main effect of deviant strength (F < 1).

Response times (Fig. 5) did not suggest that outliers induced a tendency for controlled processing. Instead, participants were fastest with a -3SD outlier, that is, when their dominant “should be lower” response was supported the most by the stimulus. Positive outliers led to slowed responses. Somewhat unexpectedly, weak support for the dominant response (-1SD deviant) led to the slowest RTs. Applying the above-mentioned two-factorial ANOVA to the RTs yielded a significant interaction of deviant direction and deviant strength, F(1, 79) = 7.3, p = .008, ηp² = .085 (Fs < 1.47 for the main effects).

5. Discussion

5.1. Underestimation of the mean in bar graphs

Apart from the values of individual bars, bar graphs should convey group information, such as the variability across bars and the grand average. The results of our first experiment suggest that people systematically underestimate the average of data presented in the form of bar graphs. On the one hand, we assume that the bias in estimating the central tendency from bar graphs is a problem that has become more relevant due to the computer-aided spreading of such graphs (Card, Mackinlay, & Schneiderman, 1999). Today, computers make it easy to generate data graphs. In particular, setting up graphs is commonly taught in school (Ainley, 1994). Furthermore, automatically generated bar graphs are ubiquitous in web applications as tools to convey information on status and dynamics (Roth, Kolojejchick, Mattis, & Goldstein, 1994). Thus, the problem of underestimating the mean when quickly scanning a bar graph might increase in relevance with the computer-aided spreading of bar graphs into everyday life. On the other hand, we do not assume that the problem is brought about by electronic displays. To rule out that


the bias is restricted to situations of human-computer interaction, we tested 38 students with a short paper-and-pencil version of the mean estimation task using material from Block 2 of the first experiment. We replicated the finding that bar graphs lead to an underestimation of the mean (see Appendix 1a).

Before turning to the second experiment, a note is in order on the unexpected overestimation of the mean found for point graphs in the reported experiment. This was likely an effect brought about by mixing bar graphs and point graphs. Follow-up experiments (Appendix 1b) suggested that, on the one hand, overestimation of the mean in point graphs was not present when testing point graphs in isolation. On the other hand, when presenting point graphs vs. bar graphs in a blocked design, overestimation of the mean in point graphs was only observed when participants had previously worked on bar graphs (rather than first working on the point graph block followed by the bar graph block). Thus, one explanation for the overestimation of the mean in point graphs in our first experiment could be that it is a reverse effect: having frequently answered “should be lower” to bar graphs, the participants might have taken the chance to respond “should be higher” with point graph stimuli, e.g., to balance the overall frequency of the two response options or, alternatively, due to prototype adaptation (cf. Strobach & Carbon, 2013).

5.2. The effect of outliers on the bias in average estimation

Despite including outliers, we replicated the bias to underestimate the mean in bar graphs. The “should be lower” responses were predominant even in bar graphs that contained a bar that was extremely high (+3SD outlier). This was the case even though mean estimates were biased towards the outliers. This is in line with the view that outliers attract attention, with the consequence that the attended position has an increased impact on the estimate.

6. Conclusion

Data graphs should allow patterns of values across conditions, such as variability and grand average, to be communicated quickly. Yet, our results point to a systematic bias in judging averages from a frequently used kind of data graph. The results showed that the mean appears lower than it actually is in bar graphs. That is, bar graphs can give a biased impression of central tendency. Our data are in line with results showing an underestimation of individual values compared to the average in bar graphs (Peebles, 2008). While Zacks et al. (1998) showed that with increasing bar height the underestimation of the height increased, we obtained no such effect. However, in our experiments participants were supposed to judge group information rather than give estimates for single bars.

The effect of underestimating the mean in bar graphs seems to be strong, as the tipping point between the “mean should be lower” and “mean should be higher” responses is shifted to about .28 standard deviations below the correct mean. The bias persisted even when we presented the numerical mean or outliers above the mean. Additionally, we showed that the height of the bars had no effect on the underestimation. We therefore need to think about other possible explanations, because the former one (Jarvenpaa & Dickson, 1988; Kosslyn, 2006) cannot explain the bias.

Peebles (2008) suggests that such effects might be rooted in differences in the distribution of spatial attention in bar graphs vs. point graphs. Attention might be drawn towards the (vertical) middle of bars rather than to the upper end of bars, while people might attend to the dots rather than the space between dots and x-axis. We offer support for this view, as outliers apparently attract attention, with the consequence that the attended position has an increased impact on the estimate. Developing this line of argument further, if attention is drawn


toward the middle height of the bars rather than the top of the bars, this could lead to an underestimation. Future work could test this hypothesis by tracking and manipulating eye movements during graph comprehension. Allocation of attention in graphs can be influenced by learning. Attention is placed on different features in a graph depending on graph literacy (Okan, Galesic, & Garcia-Retamero, 2015). That is, people with low graph literacy more often relied on misleading spatial-to-conceptual mappings, whereas people with higher graph literacy spent more time viewing the conventional features containing essential information for accurate interpretations. Novices focus on salient perceptual characteristics (e.g., color; Ali & Peebles, 2013; Lowe, 1999; 2003) but can improve in comprehending graphs with assisted dynamic visualizations guiding their attention (Ploetzner, Lippitsch, Galmbacher, Heuer, & Scherrer, 2009).

Taking a practical perspective, we need to know when to use which type of graph. While bar graphs are not a good choice when group information such as the average across conditions is relevant, they might remain useful in other settings. For instance, bar graphs seem to be preferable (compared to line graphs) for displaying relationships between three variables, because participants make fewer errors interpreting bar graphs (Peebles & Ali, 2009; Peebles, 2011). Focusing on a perceptual bias in a simple and frequently used graph format might serve as a basis for studying perceptual properties of more complex formats. In science, people produce far more complex data graphs on fMRI, DNA, proteins and environmental data (Bhatia, Perlman, Costello, & McComb, 2009; Shifrin, 2014). Computers support the interactive visual representation of data to amplify cognition (Card et al., 1999). Only interdisciplinary research projects could focus on the whole picture. Cognitive research has useful information on the strengths of visual perception for choosing the adequate design of graphs (Fekete, van Wijk, Stasko, & North, 2008).

Acknowledgments

The research reported in this paper was supported by the Deutsche Forschungsgemeinschaft (DFG) under Cluster of Excellence Image Knowledge Gestaltung (EXC1027/1).

Appendix

Methods Follow-up Experiment 1a – replicating underestimation of the mean in bar graphs

With the purpose of replicating the underestimation of the mean in bar graphs, a total of 38 participants (84.2% female) with a mean age of 23.2 years (range 19–31) were tested individually with a paper-pencil version of bar graphs and point graphs taken from Block 2 of Experiment 1. Participants were recruited from the participant pool of the Department of Psychology of Universität Koblenz-Landau, campus Landau (students of various disciplines). We showed each participant 18 bar graphs and 18 point graphs (order of format counterbalanced across participants, half starting with the 18 bar graphs, the others starting with the 18 point graphs). The graphs were selected in pairs such that identical pseudo data sets were shown in the bar graph block and in the point graph block. We presented six diagrams together per sheet of paper, and participants marked the estimated mean with a pen on the y-axis. Estimates were hand-coded to an accuracy of half units. As shown in Fig. 1, the y-axis had 7 marks at equal distances.

Results Follow-up Experiment 1a – replicating underestimation of the mean in bar graphs

On average, the mean indicated in the bar graphs was .291 units lower than the true mean, t(37) = 3.73, p < .001, for the comparison with zero average deviation from the true mean. When

relying on binary counts per bar graphs, we observed that 65.8% of the participants underestimated the mean in 12 or more of the 18 graphs (i.e., >62.5%). Thus, the underestimation of the mean was replicated with a paper-pencil-method. Somewhat surprisingly, participants overestimated the mean in the point graphs by a small (.047 units) but significant margin, t(37) ¼ 2.99, p ¼ .005. The difference between the mean estimate from bar-vs. point graphs was significant, t(37) ¼ 4.23, p < .001. For 94.7% of the participants (all but 2 participants) the estimated mean was smaller for the bar as compared to the point graph. Follow-up analyses suggested that the overestimation of the mean in the point graphs was only present in those participants working on the block with point graphs after the bar graph block (M ¼ .092 units), but not in the participants working on the point graphs first (M ¼ .003 units), t(36) ¼ 3.42, p ¼ .002, for the difference between groups balancing block order. Methods Follow-up Experiment 1b e no overestimation of the mean in point graphs. With the purpose to check for a potential overestimation of the mean in point graphs when these graphs were presented in isolation (rather than mixed with or preceded by bar graphs), we tested total of 33 participants (63.6% female) with a mean age of 28.2 years (20e47). They were recruited from the participant pool of the €t (including Department of Psychology at Humboldt-Universita students and other adults). Most of the participants were righthanded (81.8%). The procedure and the material of the computerized experiment were very similar to Experiment 1 (especially Block 2). The 80 graphs consist of 8 data points, shown as dots in dark grey. The data for the graphs was generated randomly (range 8e12; natural numbers). After a short instruction and an example, the participants were told that their task is to decide whether the exact mean of the data should be higher or lower than the presented line. 
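The stimulus generation described for Experiment 1b can be illustrated with a minimal Python sketch. It only reproduces what is stated above (80 graphs, 8 data points each, natural numbers in the range 8–12). The function names, the fixed seed, and especially the distance between the reference line and the true mean (`line_offset`) are assumptions made for illustration; the appendix does not report how the line was positioned.

```python
import random
from statistics import mean


def make_point_graph(rng, n_points=8, low=8, high=12):
    """Draw one pseudo data set: n_points natural numbers in [low, high]."""
    return [rng.randint(low, high) for _ in range(n_points)]


def make_trial(rng, line_offset=0.5):
    """Pair a data set with a reference line above or below its true mean.

    line_offset is a hypothetical value; the source does not state
    how far the presented line was from the true mean.
    """
    data = make_point_graph(rng)
    true_mean = mean(data)
    # Place the line randomly above (+) or below (-) the true mean.
    line = true_mean + rng.choice([-1, 1]) * line_offset
    # Correct response to "is the mean higher or lower than the line?"
    correct = "higher" if true_mean > line else "lower"
    return {"data": data, "true_mean": true_mean, "line": line, "correct": correct}


def make_block(n_graphs=80, seed=0):
    """Generate one block of trials; the seed makes the block reproducible."""
    rng = random.Random(seed)
    return [make_trial(rng) for _ in range(n_graphs)]
```

Calling `make_block()` returns a list of 80 trial dictionaries. Because the line is placed above or below the true mean at random in this sketch, an unbiased observer would be expected to answer "lower" on roughly half of the trials.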
Each point graph was presented for 3 s and the participants had an additional 2 s for their response. The participants were asked to react as quickly and as accurately as possible by pressing one of two buttons.

Results Follow-up Experiment 1b – no overestimation of the mean in point graphs

All RTs < 100 ms were excluded from further analysis. Of the responses, 50.82% were “the mean should be lower”. Estimates did not differ significantly from the mean, t(32) = .73, p = .47. Only 6.06% of the participants predominantly underestimated the mean (in more than 62.5% of the trials). We calculated an ANOVA with the factor gender or handedness and found no significant effects (all Fs  1).

References

Ainley, J. (1994). Building on children's intuitions about line graphs. In D. Ponte, J. Pedro, & J. F. Matos (Eds.), Proceedings of the international conference for the psychology of mathematics education (PME) (18th, Lisbon, Portugal, July 29-August 3, 1994) (Vols. I-IV). Retrieved from http://eric.ed.gov/?id=ED383537.
Ali, N., & Peebles, D. (2013). The effect of gestalt laws of perceptual organization on the comprehension of three-variable bar and line graphs. Human Factors: The Journal of the Human Factors and Ergonomics Society, 55(1), 183–203. http://dx.doi.org/10.1177/0018720812452592.
Arsenault, D. J., Smith, L. D., & Beauchamp, E. A. (2006). Visual inscriptions in the scientific hierarchy: mapping the “treasures of science.” Science Communication, 27(3), 376–428. http://dx.doi.org/10.1177/1075547005285030.
Bhatia, V. N., Perlman, D. H., Costello, C. E., & McComb, M. E. (2009). Software tool for researching annotations of proteins: open-source protein annotation software with data visualization. Analytical Chemistry, 81(23), 9819–9823. http://dx.doi.org/10.1021/ac901335x.
Bobko, P., & Karren, R. (1979). The perception of Pearson product moment correlations from bivariate scatterplots. Personnel Psychology, 32(2), 313–325.
Brewer, W. F. (2012). The theory ladenness of the mental processes used in the scientific enterprise. In R. W. Proctor, & E. J. Capaldi (Eds.), Psychology of science: implicit and explicit processes. New York: Oxford University Press, 290–233.
Burley, D. (2010). Information visualization as a knowledge integration tool. Journal of Knowledge Management Practice, 11(4).

Card, S. K., Mackinlay, J. D., & Shneiderman, B. (1999). Readings in information visualization: Using vision to think. San Francisco, Calif.: Morgan Kaufmann Publ.
Cleveland, W. S. (1984). Graphs in scientific publications. The American Statistician, 38(4), 261–269. http://dx.doi.org/10.2307/2683400.
Davis, D. R. (2011). Enhancing graph production skills via programmed instruction: an experimental analysis of the effect of guided-practice on data-based graph production. Computers in Human Behavior, 27(5).
Fekete, J.-D., van Wijk, J., Stasko, J. T., & North, C. (2008). The value of information visualization. In A. Kerren, J. T. Stasko, J.-D. Fekete, & C. North (Eds.), Information visualization – human-centered issues and perspectives. Springer. Retrieved from http://www.springer.com/us/book/9783540709558.
Fischer, M. H., Dewulf, N., & Hill, R. L. (2005). Designing bar graphs: orientation matters. Applied Cognitive Psychology, 19(7), 953–962. http://dx.doi.org/10.1002/acp.1105.
Gross, A. G., Harmon, J. E., & Reidy, M. S. (2002). Communicating science: The scientific article from the 17th century to the present. Oxford: Oxford Univ. Press.
Haberman, J., & Whitney, D. (2010). The visual system discounts emotional deviants when extracting average expression. Attention, Perception, & Psychophysics, 72(7), 1825–1838. http://dx.doi.org/10.3758/APP.72.7.1825.
Huestegge, L., & Philipp, A. M. (2011). Effects of spatial compatibility on integration processes in graph comprehension. Attention, Perception, & Psychophysics, 73(6), 1903–1915. http://dx.doi.org/10.3758/s13414-011-0155-1.
Isberner, M.-B., Richter, T., Maier, J., Knuth-Herzig, K., Horz, H., & Schnotz, W. (2013). Comprehending conflicting science-related texts: graphs as plausibility cues. Instructional Science, 41(5), 849–872. http://dx.doi.org/10.1007/s11251-012-9261-2.
Jarvenpaa, S. L., & Dickson, G. W. (1988). Graphics and managerial decision making: research-based guidelines. Communications of the ACM, 31(6), 764–774. http://dx.doi.org/10.1145/62959.62971.
Kosslyn, S. M. (2006). Graph design for eye and mind. New York: Oxford University Press Inc.
Kruschke, J. K. (2003). Attention in learning. Current Directions in Psychological Science, 12(5), 171. http://dx.doi.org/10.1111/1467-8721.01254.
Kubina, R. M., Kostewicz, D. E., & Datchuk, S. M. (2010). Graph and table use in special education: a review and analysis of the communication of data. Evaluation & Research in Education, 23(2), 105–119. http://dx.doi.org/10.1080/09500791003734688.
Lowe, R. K. (1999). Extracting information from an animation during complex visual learning. European Journal of Psychology of Education, 14(2), 225–244. http://dx.doi.org/10.1007/BF03172967.
Lowe, R. K. (2003). Animation and learning: selective processing of information in dynamic graphics. Learning and Instruction, 13(2), 157–176. http://dx.doi.org/10.1016/S0959-4752(02)00018-X.
Meyer, J., Taieb, M., & Flascher, I. (1997). Correlation estimates as perceptual judgments. Journal of Experimental Psychology: Applied, 3(1), 3–20. http://dx.doi.org/10.1037/1076-898X.3.1.3.
Okan, Y., Galesic, M., & Garcia-Retamero, R. (2015). How people with low and high graph literacy process health graphs: evidence from eye-tracking. Journal of Behavioral Decision Making. http://dx.doi.org/10.1002/bdm.1891.
Peebles, D. (2008). The effect of emergent features on judgments of quantity in configural and separable displays. Journal of Experimental Psychology: Applied, 14(2), 85–100. http://dx.doi.org/10.1037/1076-898X.14.2.85.
Peebles, D. (2011). The effect of graphical format and instruction on the interpretation of three-variable bar and line graphs. In L. Carlson, C. Hoelscher, & T. F. Shipley (Eds.), Proceedings of the 33rd annual conference of the cognitive science society (pp. 3143–3154). Austin, TX: Cognitive Science Society. Retrieved from http://cognitivesciencesociety.org/conference2011/index.html.
Peebles, D., & Ali, N. (2009). Differences in comprehensibility between three-variable bar and line graphs. In N. A. Taatgen, & H. van Rijn (Eds.), Proceedings of the 31st annual conference of the cognitive science society (pp. 2938–2943). Amsterdam: Cognitive Science Society. Retrieved from http://csjarchive.cogsci.rpi.edu/Proceedings/2009/index.html.
Ploetzner, R., Lippitsch, S., Galmbacher, M., Heuer, D., & Scherrer, S. (2009). Students' difficulties in learning from dynamic visualisations and how they may be overcome. Computers in Human Behavior, 25(1), 56–65. http://dx.doi.org/10.1016/j.chb.2008.06.006.
Roth, S. F., Kolojejchick, J., Mattis, J., & Goldstein, J. (1994). Interactive graphic design using automatic presentation knowledge. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 112–117). Boston, Massachusetts, USA: ACM.
Schützwohl, A. (1998). Surprise and schema strength. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24(5), 1182–1199. http://dx.doi.org/10.1037/0278-7393.24.5.1182.
Shifrin, N. (2014). Environmental data visualization. In Environmental perspectives (pp. 35–41). Springer International Publishing. http://dx.doi.org/10.1007/978-3-319-06278-5_6.
Simkin, D., & Hastie, R. (1987). An information-processing analysis of graph perception. Journal of the American Statistical Association, 82(398), 454–465. http://dx.doi.org/10.2307/2289447.
Smith, L. D., Best, L. A., Stubbs, D. A., Johnston, J., & Archibald, A. B. (2000). Scientific graphs and the hierarchy of the sciences: a Latourian survey of inscription practices. Social Studies of Science, 30, 73–94. http://dx.doi.org/10.1177/030631200030001003.
Strobach, T., & Carbon, C.-C. (2013). Face adaptation effects: reviewing the impact of adapting information, time, and transfer. Frontiers in Psychology, 4. http://dx.doi.org/10.3389/fpsyg.2013.00318.
Zacks, J., Levy, E., Tversky, B., & Schiano, D. J. (1998). Reading bar graphs: effects of extraneous depth cues and graphical context. Journal of Experimental Psychology: Applied, 4(2), 119–138. http://dx.doi.org/10.1037/1076-898X.4.2.119.
Zacks, J., Levy, E., Tversky, B., & Schiano, D. J. (2002). Graphs in press. In M. Anderson, & B. Meyer (Eds.), Diagrammatic representation and reasoning (pp. 187–206). Springer Science & Business Media.
Zacks, J., & Tversky, B. (1999). Bars and lines: a study of graphic communication. Memory & Cognition, 27(6), 1073–1079. http://dx.doi.org/10.3758/BF03201236.