Reflections on the 1951 Fitts List: Do Humans Believe Now that Machines Surpass Them?





Procedia Manufacturing 3 (2015) 5334–5341

6th International Conference on Applied Human Factors and Ergonomics (AHFE 2015) and the Affiliated Conferences, AHFE 2015

J.C.F. de Winter a,*, P.A. Hancock b

a Department of BioMechanical Engineering, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands
b MIT2 Laboratories, Department of Psychology and the Institute for Simulation and Training, University of Central Florida, Orlando, FL, USA

* Corresponding author. Tel.: +31-15-2786794. E-mail address: [email protected]

doi:10.1016/j.promfg.2015.07.641

Abstract

In 1951, Paul Fitts “surveyed the kinds of things men can do better than present-day machines, and vice versa” and introduced a list of statements, now commonly known as the Fitts list. A criticism sometimes raised with respect to the Fitts list is that it is outdated. Hence, it is important to address the issue of modernity, especially in light of the increasing rate of technological advance. A total of 249 engineering MSc students were asked to indicate whether humans surpass machines or whether machines surpass humans, for each of the 11 statements of the Fitts list, using an electronic voting system. A further 2,941 respondents from 103 countries did the same via a crowdsourcing facility. The results indicated that present-day machines are considered to surpass humans in respect to detection, perception, and long-term memory, while Fitts argued that the opposite held true. However, a closer reading of the 1951 report reveals that some of the supposed disagreements were already anticipated at that time.

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of AHFE Conference.

Keywords: Function allocation; crowdsourcing; CrowdFlower; clickers; electronic voting system

1. Introduction

Function allocation is perhaps the most critical issue in all of Human Factors research.





The question of whether tasks should be performed by humans or by machines has exercised the minds of those researching human interaction with technology for half a century and more. In 1951, Paul Fitts, together with some fellow founding fathers of the Human Factors discipline, benchmarked the issue of function allocation [1]. In their seminal report “Human engineering for an effective air-navigation and traffic-control system”, these authors “surveyed the kinds of things men can do better than present-day machines, and vice versa” and introduced 11 statements, now colloquially known as the Fitts list or the MABA-MABA approach (Table 1). The Fitts list has become a classic, but often debated, set of function allocation observations [2]. It has been argued that the Fitts list unduly dichotomizes human-machine interaction, as though humans and machines were necessarily competing entities [3]. Furthermore, a consensus now exists that the fact that you can automate a task does not mean that you should [4]. Another criticism often raised with respect to the Fitts list is that it is now outdated: what may have been an acceptable function allocation strategy in 1951 is not necessarily acceptable today [5–7].

As regards detection (Statement 1 in Table 1), sensors have become more precise for all base quantities. For example, scanning tunneling microscopy can detect surface irregularities at the atomic level [8], optical clocks can detect that time runs faster in the attic than in the basement [9], and high-speed cameras can detect tiny vibrations in everyday objects, enabling a reconstruction of the sounds that caused these vibrations [10]. Computational power (Statement 10) and the ability to store very large amounts of information (Statement 4) follow an exponential temporal trend in accord with Moore’s law [11]. For example, it is now possible to purchase hard disk drives with an areal density of over 500 Gbit/in² [12], a number that stands in stark contrast to the 0.000002 Gbit/in² of the first disk drive introduced in 1956 [13].
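To make the storage comparison concrete, a quick back-of-the-envelope calculation (an illustrative sketch, not part of the original paper; taking 2015 as the reference year is our assumption) shows that the two cited areal densities imply a doubling time of roughly two years, consistent with the Moore's-law trend mentioned above:

```python
import math

# Areal densities cited above, in Gbit per square inch.
density_1956 = 0.000002   # first disk drive, introduced in 1956 [13]
density_now = 500.0       # modern hard disk drive [12]; year 2015 assumed

growth = density_now / density_1956   # ~2.5e8-fold increase
doublings = math.log2(growth)         # ~27.9 doublings
years = 2015 - 1956                   # 59 years

print(f"growth factor: {growth:.1e}")                   # 2.5e+08
print(f"doubling time: {years / doublings:.1f} years")  # ~2.1 years
```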
The abilities to perceive patterns (Statement 2), improvise (Statement 3), reason inductively (Statement 5), and exercise judgment (Statement 6) have also come within the realm of machines. The performance of face and speech recognition systems is now close to human-level ability [14,15], and IBM’s Watson answered open-domain questions on the Jeopardy! TV quiz better than top-level human players did [16]. Some have argued that Watson does not demonstrate true “understanding” [17] or “knowledge” [18], because it merely searches databases and performs statistical analyses. Kurzweil [19], on the other hand, contends that Watson uses methods that resemble those that biology evolved in the form of the neocortex: “If understanding language and other phenomena through statistical analysis does not count as true understanding, then humans have no understanding either”.

In light of the ever-increasing rate of technological advance, debates on the relative abilities and responsibilities of humans versus machines will probably become an increasingly central issue in Human Factors. It is important to address the issue of modernity and investigate how people currently judge the strengths of their machine counterparts. In the present paper, we report the results of 249 engineering MSc students who completed the Fitts list by means of an electronic voting system during lectures on Human Factors, and of 2,941 respondents from 103 different countries who completed the Fitts list online via a crowdsourcing facility. Thus, our results are based on two populations with, presumably, different overall knowledge of machine abilities.

Table 1. The original Fitts list [1].

| Humans appear to surpass present-day machines in respect to the following: | Present-day machines appear to surpass humans in respect to the following: |
|---|---|
| 1. Ability to detect a small amount of visual or acoustic energy | 7. Ability to respond quickly to control signals and to apply great force smoothly and precisely |
| 2. Ability to perceive patterns of light or sound | 8. Ability to perform repetitive, routine tasks |
| 3. Ability to improvise and use flexible procedures | 9. Ability to store information briefly and then to erase it completely |
| 4. Ability to store very large amounts of information for long periods and to recall relevant facts at the appropriate time | 10. Ability to reason deductively, including computational ability |
| 5. Ability to reason inductively | 11. Ability to handle highly complex operations, i.e. to do many different things at once |
| 6. Ability to exercise judgment | |



2. Method

Two methods were employed to gather anonymous responses regarding the Fitts list: an electronic voting system (EVS) used during Master of Science (MSc) lectures on Human Factors, and an online survey using CrowdFlower.

2.1. Electronic voting system

The first author of this paper teaches the MSc course Man-Machine Systems at the Faculty of Mechanical, Maritime, and Materials Engineering of the Delft University of Technology (TU Delft). This course introduces students to the history of Human Factors research, and to topics such as vigilance, detection theory, human-automation interaction, human error, and virtual reality. In some of the lectures, the teacher used an EVS made by TurningPoint Technologies. EVSs (also called audience response systems, classroom response systems, or clickers) allow students to respond directly to statements or problems presented during a lecture. The reason for using an EVS was that it contributes to increased student engagement and attention; whether EVSs actually lead to better learning and retention of the lecture material remains debatable [20,21].

In one lecture on human-automation interaction, the lecturer introduced the Fitts list by asking the students whether machines surpass humans or humans surpass machines, for each of the 11 statements. The following order was employed: (1) Judgment, (2) Improvisation, (3) Simultaneous Operations, (4) Speed & Power, (5) Replication, (6) Induction, (7) Detection, (8) Perception, (9) Long Term Memory, (10) Short Term Memory, and (11) Computation (corresponding to Statements 6, 3, 11, 7, 8, 5, 1, 2, 4, 9, and 10 in Table 1). At this point in the lecture, the students had not yet been told about the Fitts list, nor that these statements dated from 1951. For each item, the lecturer read the statement aloud and gave the students sufficient time to respond (~20 s), monitoring the number of responses received on his laptop. After the votes for a statement were received, the results were presented as percentages (see Fig. 1 for the slide regarding the Judgment item), before proceeding to the next statement. After the lecture, the slides were made available on the university’s Blackboard Learning System.

The lecture that introduced the Fitts list was offered to cohorts of students on the following dates: 11 November 2010, 17 November 2011, 19 November 2012, 18 November 2013, 23 November 2013, and 14 November 2014. On 18 November 2013, the students were not from the course Man-Machine Systems, but from a course called Automotive Safety & Human Factors provided at the Faculty of Industrial Design Engineering of TU Delft.

Fig. 1. One slide from the lecture. This slide shows the Judgment item of the Fitts list. In this case, 31 students had voted “Humans surpass machines” while 9 had voted “Machines surpass humans”.

2.2. CrowdFlower

Crowdsourcing is increasingly recognized as a tool for conducting psychological research [22,23]. In the present study, we used CrowdFlower, a service that is well suited for survey research (see [24] for an overview).

The CrowdFlower survey was structured as follows. Participants were first provided with information about the aim of the research: “To explore the public opinion on humans and technology/automation/computers”. They were also informed that the survey would take approximately 5 min of their time, and they were provided with information regarding voluntariness and anonymity. The first author’s e-mail address was listed, in case respondents had questions about the survey; however, none of the respondents actually sent an e-mail. The survey consisted of the following questions:

- “Have you read and understood the above instructions?”, with response options Yes and No.
- “What is your gender?”, with response options Male, Female, and I prefer not to respond.
- “What is your age?” Here, only a positive integer could be submitted.
- Next, the survey queried the 11 statements of the Fitts list. The following introductory sentence was provided: “For the following 11 statements, please click ‘humans surpass machines’ when you believe that humans can perform this function better than present-day machines. Vice versa, click ‘machines surpass humans’ when you think that present-day machines can do it better than humans”. For each of the 11 items, the response option I prefer not to respond was also available. The order of the questions was the same as with the EVS. For the Inductive Reasoning item, a clarifying footnote was added: “Inductive reasoning = making an educated guess, reasoning based on abstractions, generalization, argument by analogy, prediction”.
- In addition, we used the propensity-to-trust-machines scale as defined by Merritt, Heimbaugh, LaChapell, and Lee [25]. This scale consists of the following six items: (1) I usually trust machines until there is a reason not to, (2) For the most part, I DISTRUST machines, (3) In general, I would rely on a machine to assist me, (4) My tendency to trust machines is high, (5) It is easy for me to trust machines to do their job, and (6) I am likely to trust a machine even when I have little knowledge about it. The response options were Strongly disagree, Disagree, Neither agree nor disagree, Agree, Strongly agree, and I prefer not to respond.
- Next, we asked: “In which year do you think that most cars will be able to drive fully automatically in your country of residence?”, with a single-line text box. The same question was asked in previous research [24,26,27]. The reason for asking this question was to investigate whether people’s beliefs about human versus machine abilities are associated with their estimates of when machines will actually take over control.
- The final question was: “Please provide any suggestions or comments on the survey”, with a multi-line text box.

Respondents were required to answer all questions, except the last one. In order to gather data from as large and diverse a population as possible, no restrictions were set with respect to the respondents’ country of residence, and ‘Level 1’ crowdworking was selected, that is, the lowest of the three available levels, accounting for 60% of CrowdFlower’s monthly completed work. Respondents were not allowed to submit multiple surveys. We offered a payment of $0.12 per respondent for completing the survey.

2.3. Statistical analyses

For the CrowdFlower sample, we excluded respondents who indicated they had not read the instructions, were under 18 years old, or answered I prefer not to respond on more than three items. For both the EVS and CrowdFlower samples, and for each of the 11 items of the Fitts list, we calculated the percentage of respondents stating that humans surpass machines. Respondents who did not provide an answer or answered I prefer not to respond were excluded from this calculation.
For the CrowdFlower sample, we calculated an aggregate ‘Machine score’ by z-transforming the 11 Fitts list items and then averaging across these items. A high Machine score indicates a tendency to believe that machines surpass humans, whereas a low Machine score indicates a tendency to believe that humans surpass machines. Similarly, we calculated a ‘Trust score’ from the six items of the trust scale, with a higher score representing more trust in machines. Correlation coefficients were calculated between the Machine score, on the one hand, and the Trust score, age, gender, and year of automated driving, on the other.
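As an illustration, the exclusion, scoring, and correlation steps of Section 2.3 could be expressed as follows. This is a minimal sketch, not the authors' actual analysis code; the file and column names (crowdflower_export.csv, read_instructions, item_1 … item_11, trust_1 … trust_6, year_automated) are hypothetical:

```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.read_csv("crowdflower_export.csv")  # hypothetical export file

# Exclusion criteria (Section 2.3): instructions not read, under 18,
# or 'I prefer not to respond' on more than three Fitts list items.
items = [f"item_{i}" for i in range(1, 12)]
prefer_not = df[items].eq("prefer not").sum(axis=1)
df = df[(df["read_instructions"] == "Yes") & (df["age"] >= 18) & (prefer_not <= 3)]

# Percentage per item stating that humans surpass machines,
# ignoring missing and 'prefer not to respond' answers.
for item in items:
    valid = df[item].isin(["humans", "machines"])
    print(item, round(100 * (df.loc[valid, item] == "humans").mean()), "%")

# Machine score: code machines = 1, humans = 0 ('prefer not' = NaN),
# z-transform each item, then average across the 11 items.
coded = df[items].replace(
    {"machines": 1.0, "humans": 0.0, "prefer not": np.nan}).astype(float)
machine = np.nanmean(stats.zscore(coded, nan_policy="omit"), axis=1)

# Trust score (items assumed pre-coded 1-5, with item 2 reverse-coded), then
# Pearson r with the Machine score; Spearman rho for the skewed year question.
trust = df[[f"trust_{i}" for i in range(1, 7)]].mean(axis=1)
print(stats.pearsonr(machine, trust))
print(stats.spearmanr(machine, df["year_automated"], nan_policy="omit"))
```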



3. Results

3.1. Electronic Voting System (EVS) data considerations

The students who attended the lectures were on average about 23 years old, and about 80% were male. The use of the EVS was optional; hence, not all students in the lecture room used the EVS, and the number of responses varied per item. A technical problem occurred with a receiver device during the lecture on 11 November 2010. As a result, 45 and 47 responses were collected for the Judgment and Improvisation items, respectively, but only 19–22 responses were collected for the other nine items of the Fitts list. The maximum number of responses across the 11 items was 47, 44, 52, 21, 41, and 44 for the six measurement occasions, which implies that at least 249 students participated. The total number of responses per item varied between 211 for Computation and 240 for Improvisation.

3.2. CrowdFlower data considerations

In total, 2,999 surveys were completed on CrowdFlower. The responses were gathered between 29 November 2014, 15:34, and 30 November 2014, 14:30, Central European Time. CrowdFlower offers respondents the option to provide satisfaction ratings. The respondents were generally satisfied with the task and payment (Overall satisfaction = 4.3/5; Instructions clear = 4.5/5; Ease of job = 4.3/5; Test questions fair = 4.1/5; Pay = 4.1/5; N = 479). A total of 58 respondents were excluded, leaving 2,941 respondents for further analyses. Of these 58 respondents, 25 indicated they had not read the instructions, 14 indicated they were under 18 years old, and 23 answered I prefer not to respond for more than three items (some respondents met more than one exclusion criterion).

Of the 2,941 respondents, 815 were female, 2,102 were male, and 24 indicated I prefer not to respond. The mean age of the respondents was 31.8 years (SD = 10.9, median = 29; calculated after removing 2 of the 2,941 values because of unrealistic ages greater than 115 years). The median response to the question “In which year do you think that most cars will be able to drive fully automatically in your country of residence?” was 2030, identical to previous surveys in which we asked the same question [24,26,27]. The 2,941 respondents were from 103 countries. Most respondents were from India (N = 263), followed by Venezuela (N = 156), Spain (N = 137), Portugal (N = 112), Canada (N = 107), and the United States (N = 106).

3.3. Fitts list results for the Electronic Voting System and CrowdFlower samples

Table 2 shows the results of the EVS and CrowdFlower surveys in terms of the percentages of respondents who think that humans surpass machines, and Fig. 2 provides a visual representation of the same data. Figure 2 shows a strong congruence between the two modes of surveying (r = 0.94). The EVS and crowdsourced samples concur that the Fitts list is no longer considered valid regarding Perception, Detection, and Long Term Memory. In other words, for these three items, present-day machines are now seen as having surpassed humans. The EVS sample exhibited more extreme responses than the crowdsourced sample. For example, the students were almost unanimous that machines surpass humans regarding Replication, with 218 out of 220 respondents (99%) indicating this to be the case. In contrast, in the crowdsourced sample, ‘only’ 2,579 of the 2,915 respondents (88%) indicated that machines surpass humans.

3.4. Correlational analyses of the CrowdFlower sample

The crowdsourcing respondents’ Machine score correlated modestly with the Trust score (Pearson r = 0.23; p = 4.94 × 10⁻³⁸; N = 2,941), but not with gender (r = 0.03; p = 0.063; N = 2,917; 1 = female, 2 = male), age (r = 0.02; p = 0.277; N = 2,939), or the year in which they think most cars will be able to drive fully automatically on the roads in their country of residence (Spearman ρ = −0.03; p = 0.151; N = 2,757). For the latter correlation, we report a Spearman rank-order correlation instead of a Pearson product-moment correlation, because the distribution of the reported year was highly skewed. For example, 25 people reported the year 3000 as the year in which they think that most cars will be able to drive fully automatically in their country of residence.



Table 2. The percentage of respondents who indicated that humans surpass machines.

| Item | Electronic voting system (EVS) | CrowdFlower |
|---|---|---|
| Judgment | 81% (192 / 237) | 92% (2,674 / 2,919) |
| Improvisation | 97% (232 / 240) | 86% (2,525 / 2,935) |
| Simultaneous Operations | 7% (15 / 218) | 14% (424 / 2,923) |
| Speed & Power | 6% (12 / 215) | 14% (412 / 2,923) |
| Replication | 1% (2 / 220) | 12% (336 / 2,915) |
| Induction | 88% (191 / 218) | 86% (2,512 / 2,908) |
| Detection | 16% (34 / 218) | 22% (646 / 2,915) |
| Perception | 45% (96 / 213) | 31% (914 / 2,918) |
| Long Term Memory | 29% (61 / 214) | 15% (427 / 2,924) |
| Short Term Memory | 22% (48 / 220) | 17% (502 / 2,924) |
| Computation | 25% (52 / 211) | 48% (1,396 / 2,900) |
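As a quick consistency check, the congruence between the two samples (r = 0.94, Section 3.3) can be approximately reproduced from the rounded percentages in Table 2; exact values may differ slightly because the reported r was computed on the unrounded data:

```python
from scipy.stats import pearsonr

# Percentage "humans surpass machines" per item, read off Table 2
# in row order (Judgment ... Computation).
evs = [81, 97, 7, 6, 1, 88, 16, 45, 29, 22, 25]
crowdflower = [92, 86, 14, 14, 12, 86, 22, 31, 15, 17, 48]

r, p = pearsonr(evs, crowdflower)
print(f"r = {r:.2f}")  # ~0.94, matching the congruence reported for Fig. 2
```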

Fig. 2. Percentage stating that humans surpass machines. A red-colored circle designates that the result is not in agreement with the Fitts list.

3.5. Results as a function of survey time for the CrowdFlower sample

One typical concern with surveys is that respondents may not take sufficient time to read the questions. CrowdFlower offers the opportunity to calculate the time spent completing the survey, and to relate this number to the item responses. Respondents took on average 320 s to complete the survey (SD = 226 s, median = 256 s).

Figure 3 shows the results of the Fitts list as a function of survey time. Specifically, we divided the 2,941 respondents into 50 groups based on the percentile rank of their survey completion time (Ns between 51 and 65 per group). The horizontal axis represents the mean survey time per group, while the vertical axis represents the corresponding percentage of respondents who believe humans surpass machines. The upper gray line represents the average of the following three items: Judgment, Improvisation, and Induction. The lower black line represents the average of the following seven items: Simultaneous Operations, Speed & Power, Replication, Detection, Perception, Long Term Memory, and Short Term Memory. The item Computation was excluded from this analysis, because the CrowdFlower sample seemed divided on whether humans surpass machines or not (see Table 2). The results in Fig. 3 make clear that the fastest respondents (e.g., the fastest 2%, who on average took only 67 s to complete the survey) gave answers that were closer to the equivocal 50% value.


[Fig. 3: line chart. Horizontal axis: Survey time (s), 0–1200. Vertical axis: Percentage humans surpass machines, 0–100. Regions above and below the 50% line are labeled “Humans surpass machines” and “Machines surpass humans”.]

Fig. 3. The percentage of respondents who think that humans surpass machines as a function of survey time (N = 2,941).
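The percentile-based grouping behind Fig. 3 could be sketched as follows. This is illustrative only; the data frame and its column names are hypothetical, with each Fitts item assumed coded 1 = "humans surpass machines" and 0 = "machines surpass humans":

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("crowdflower_export.csv")  # hypothetical export file

human_items = ["judgment", "improvisation", "induction"]        # upper gray line
machine_items = ["simultaneous_operations", "speed_power", "replication",
                 "detection", "perception", "long_term_memory",
                 "short_term_memory"]                           # lower black line
# 'computation' is excluded, as the CrowdFlower sample was divided on it.

# 50 equally sized groups based on the percentile rank of completion time.
df["group"] = pd.qcut(df["survey_time"].rank(method="first"), 50, labels=False)

g = df.groupby("group")
mean_time = g["survey_time"].mean()
upper = 100 * g[human_items].mean().mean(axis=1)    # % "humans surpass machines"
lower = 100 * g[machine_items].mean().mean(axis=1)

plt.plot(mean_time, upper, color="gray")
plt.plot(mean_time, lower, color="black")
plt.axhline(50, linestyle=":")
plt.xlabel("Survey time (s)")
plt.ylabel("Percentage humans surpass machines")
plt.show()
```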

4. Discussion

When comparing the present results to the original Fitts list, present-day humans apparently consider that machines surpass humans for detection, perception, and long-term memory, whereas in 1951 Paul Fitts and his colleagues argued that the opposite held true. These patterns are evidently related to improvements in sensor technology, artificial intelligence, and computer data storage capacity. However, a closer reading of the 1951 report reveals that some of these machine trends were already being anticipated by the authors at that time. For example, the 1951 report stated: “Machines can be constructed with memories, it is true, but the machines so far devised are not very efficient at the kind of selective, long-term storage needed in handling unique problems” and “a listing of those respects in which human capabilities surpass those of machines must, of course, be hedged with the statement that we cannot foresee what machines can be built to do in the future” [1].

The analysis according to survey completion time suggests that people who completed the survey quickly gave less meaningful answers than the other respondents (Fig. 3). Ironically, this observation represents a confirmation of the Fitts list: machines excel in computational speed, while the information processing capacity of humans is rather low, as evidenced by a speed-accuracy trade-off in their answers. What the Fitts list cannot explain is the issue of individual differences: why some crowdworkers preferred to optimize speed (possibly in an attempt to maximize monetary gain per unit of time) while others opted for accuracy is unknown at present.

We observed some subtle differences between the responses provided by the engineering students and the general crowdsourcing public. The students were almost unanimous in stating that humans surpass machines regarding Improvisation, and that machines surpass humans regarding Replication, whereas the CrowdFlower respondents gave more mixed answers. Whether the differences between the engineering students and the CrowdFlower respondents are attributable to the students’ knowledge of machine abilities is unknown.

One limitation of surveys is that the question and context shape the answer [28]. The lecturer’s mood, intonation, or other idiosyncratic factors may have influenced the responses of the students. For example, in one of the lectures, a student asked: “What is inductive reasoning?”, and the lecturer replied: “Making an educated guess”. During this specific lecture, 100% of the students indicated that humans surpass machines in Induction, compared to 88% overall (see Table 2). A limitation of asking humans to rate themselves is that humans generally overestimate their own abilities, especially in cases where the criterion is open to interpretation [29]. For example, most people think they are better drivers than the “average driver”, something that is logically impossible [30].

The march of automation in certain domains (e.g., computational speed) seems inexorable, but other dimensions (e.g., perception) are much more equivocal in their definition, their nature, and the assumption of their function by machine surrogates. According to Kurzweil [19], humans have a tendency to downplay the capacities of machine intelligence. For example, when IBM’s Deep Blue beat the reigning world chess champion in 1997, critics argued that Deep Blue was



unable to truly understand things such as language and metaphors. Now that computers are able to understand natural language (as demonstrated by IBM’s Watson), critics repeat the same message that computers do not truly ‘understand’ things. In reality, of course, fewer and fewer qualities are uniquely human, and the overall balance of humans and machines promises to set the profile of our future as a technology-dependent species.

References

[1] Fitts, P. M. (Ed.) (1951). Human engineering for an effective air-navigation and traffic-control system. Washington, DC: National Research Council.
[2] De Winter, J. C. F., & Dodou, D. (2014). Why the Fitts list has persisted throughout the history of function allocation. Cognition, Technology & Work, 16, 1–11.
[3] Hancock, P. A., & Scallen, S. F. (1996). The future of function allocation. Ergonomics in Design: The Quarterly of Human Factors Applications, 4, 24–29.
[4] Hancock, P. A. (2014). Automation: how much is too much? Ergonomics, 57, 449–454.
[5] Crouser, R. J., Ottley, A., & Chang, R. (2013). Balancing human and machine contributions in human computation systems. In Handbook of Human Computation (pp. 615–623). New York: Springer.
[6] Levis, A. H., Moray, N., & Hu, B. (1993). Task allocation problems and discrete event systems. In IFAC Symposia Series (p. 9). Pergamon Press.
[7] Corkindale, K. G. (1967). Man–machine allocation in military systems. Ergonomics, 10, 161–166.
[8] Binnig, G., Rohrer, H., Gerber, C., & Weibel, E. (1982). Surface studies by scanning tunneling microscopy. Physical Review Letters, 49, 57–61.
[9] Chou, C. W., Hume, D. B., Rosenband, T., & Wineland, D. J. (2010). Optical clocks and relativity. Science, 329, 1630–1633.
[10] Davis, A., Rubinstein, M., Wadhwa, N., Mysore, G. J., Durand, F., & Freeman, W. T. (2014). The visual microphone: passive recovery of sound from video. ACM Transactions on Graphics, 33, 79.
[11] Moore, G. E. (1965). Cramming more components onto integrated circuits. Electronics, 38, 114–117.
[12] Binnig, G., Despont, M., Drechsler, U., Haeberle, W., Lutwyche, M., Vettiger, P., ... & Kenny, T. W. (1999). Ultrahigh-density atomic force microscopy data storage with erase capability. Applied Physics Letters, 74, 1329–1331.
[13] Grochowski, E., & Halem, R. D. (2003). Technological impact of magnetic hard disk drives on storage systems. IBM Systems Journal, 42, 338–346.
[14] Graves, A., Mohamed, A. R., & Hinton, G. (2013). Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 6645–6649).
[15] Taigman, Y., Yang, M., Ranzato, M. A., & Wolf, L. (2014). DeepFace: Closing the gap to human-level performance in face verification. In 2014 IEEE Conference on Computer Vision and Pattern Recognition (pp. 1701–1708).
[16] Ferrucci, D. A. (2012). Introduction to “This is Watson”. IBM Journal of Research and Development, 56.
[17] Searle, J. (2011). Watson doesn’t know it won on “Jeopardy!”. The Wall Street Journal, February 23, 15A.
[18] Cummings, M. L. (2014). Man versus machine or man + machine? IEEE Intelligent Systems, 29, 62–69.
[19] Kurzweil, R. (2012). How to create a mind: The secret of human thought revealed. Penguin.
[20] Felce, A. (2007). A critical analysis of the use of electronic voting systems: ask the audience. Emirates Journal for Engineering Research, 12, 11–26.
[21] King, S. O., & Robinson, C. L. (2009). ‘Pretty Lights’ and Maths! Increasing student engagement and enhancing learning through the use of electronic voting systems. Computers & Education, 53, 189–199.
[22] Crump, M. J., McDonnell, J. V., & Gureckis, T. M. (2013). Evaluating Amazon’s Mechanical Turk as a tool for experimental behavioral research. PLOS ONE, 8, e57410.
[23] Rand, D. G. (2012). The promise of Mechanical Turk: How online labor markets can help theorists run behavioral experiments. Journal of Theoretical Biology, 299, 172–179.
[24] De Winter, J. C. F., Kyriakidis, M., Dodou, D., & Happee, R. (2015). Using CrowdFlower to study the relationship between self-reported violations and traffic accidents. Proceedings of the 6th International Conference on Applied Human Factors and Ergonomics (AHFE). Las Vegas, NV.
[25] Merritt, S. M., Heimbaugh, H., LaChapell, J., & Lee, D. (2013). I trust it, but I don’t know why: effects of implicit attitudes toward automation on trust in an automated system. Human Factors, 55, 520–534.
[26] Bazilinskyy, P., & De Winter, J. C. F. (2015). Auditory interfaces in automated driving: an international survey. PeerJ Computer Science.
[27] Kyriakidis, M., Happee, R., & De Winter, J. C. F. (2015). Public opinion on automated driving: Results of an international questionnaire among 5,000 respondents. Transportation Research Part F: Traffic Psychology and Behaviour, 32, 127–140.
[28] Schwarz, N. (1999). Self-reports: how the questions shape the answers. American Psychologist, 54, 93–105.
[29] Dunning, D., Meyerowitz, J. A., & Holzberg, A. D. (1989). Ambiguity and self-evaluation: The role of idiosyncratic trait definitions in self-serving assessments of ability. Journal of Personality and Social Psychology, 57, 1082–1090.
[30] Sundström, A. (2008). Self-assessment of driving skill: A review from a measurement perspective. Transportation Research Part F: Traffic Psychology and Behaviour, 11, 1–9.