Short-term reliability of a brief hazard perception test



Accident Analysis and Prevention 73 (2014) 41–46

Contents lists available at ScienceDirect

Accident Analysis and Prevention journal homepage: www.elsevier.com/locate/aap

Charles T. Scialfa ∗, Rosemary S. Pereverseff, David Borkenhagen

University of Calgary, Department of Psychology, Calgary, AB, Canada T2N 1N4

Article info

Article history: Received 13 October 2013; received in revised form 2 May 2014; accepted 7 August 2014

Keywords: Hazard perception; Reliability

Abstract

Hazard perception tests (HPTs) have been successfully implemented in some countries as a part of the driver licensing process and, while their validity has been evaluated, their short-term stability is unknown. This study examined the short-term reliability of a brief, dynamic version of the HPT. Fifty-five young adults (mean age = 21 yrs) with at least two years of post-licensing driving experience completed parallel, 21-scene HPTs with a one-month interval separating each test. Minimal practice effects (∼0.1 s) were manifested. Internal consistency (Cronbach’s alpha) averaged 0.73 for the two forms. The correlation between the two tests was 0.55 (p < 0.001) and correcting for lack of reliability increased the correlation to 0.72. Thus, a brief form of the HPT demonstrates acceptable short-term reliability in drivers whose hazard perception should be stable, an important feature for implementation and consumer acceptance. One implication of these results is that valid HPT scores should predict future crash risk, a desirable property for user acceptance of such tests. However, short-term stability should be assessed over longer periods and in other driver groups, particularly novices and older adults, in whom inter-individual differences in the development of hazard perception skill may render HPTs unstable, even over short intervals.

© 2014 Elsevier Ltd. All rights reserved.

∗ Corresponding author. Tel.: +1 403 220 4951. E-mail address: [email protected] (C.T. Scialfa). http://dx.doi.org/10.1016/j.aap.2014.08.007 0001-4575/© 2014 Elsevier Ltd. All rights reserved.

Hazard perception, the ability to detect and respond to potentially dangerous situations, is a critical skill for safe driving. Elander, West, and French (1993) reviewed factors that increase crash risk, and found that driving speed, risk-taking (e.g., the tendency to commit traffic violations) and hazard perception were each associated with collision likelihood. Of these, hazard perception is the most amenable to off-road assessment, as individuals who are more risk-seeking or tend to speed might be able to use deceit in response to a test that examines these tendencies (Wetton et al., 2011). In consequence, hazard perception tests (HPTs) have been developed for licensing, training and evaluation (Fisher et al., 2006; Senserrick and Haworth, 2005; Wells et al., 2008; Wood et al., 2013).

The characteristics of HPTs have varied across research questions, research groups and countries. HPTs have been developed in a variety of formats, including those utilizing still images (Huestegge et al., 2010; Scialfa et al., 2012a; Whelan et al., 2002) or simulated plan views of potentially hazardous scenarios (Fisher et al., 2006), dynamic video sequences (Horswill et al., 2010a; Scialfa et al., 2011; Shahar et al., 2010) and dynamic simulations (Fisher et al., 2006). In the UK, the brief HPT used for licensure consists of a small number of video segments requiring a speeded response (Driver and Vehicle Standards Agency, 2014). Queensland’s computer-based HPT also involves video segments depicting a variety of hazards, including lead vehicle braking, abrupt left turns from an oncoming vehicle and pedestrians entering the roadway unexpectedly. It is estimated to take 14 min on average to complete (Queensland Department of Transport and Main Roads, 2014).

There is considerable evidence substantiating the validity of HPTs. Estimations of the risk in HPT scenarios are strongly correlated with on-road risk estimates (Watts and Quimby, 1979). Novice drivers respond more slowly on HPTs than do their age-matched counterparts, even when baseline differences in simple reaction time are controlled (Scialfa et al., 2011; Wallis and Horswill, 2007). Older drivers are also slower to respond to video-based HPTs (Scialfa et al., 2012b; Wood et al., 2013), even after controlling for age-related generalized slowing that is greater for visuospatial tasks (Jenkins et al., 2000). Furthermore, crash involvement is greater in those who are deficient in hazard perception (Boufous et al., 2011; Horswill et al., 2010a; Wells et al., 2008). Finally, HPT training produces systematic improvements in simulator-based and on-road driving safety (Fisher et al., 2006). Thus, it is clear that HPTs are valid measures of a critical component of driving safety.

Separate from the issue of validity is that of the reliability of an HPT. Reliability is broadly construed as the consistency of a test, and the most common means by which it has been evaluated is internal consistency, generally operationalized by Cronbach’s alpha.


This measure is influenced greatly by the inter-correlations among items in a test; as such, a composite HPT, which results from the average of responses to many different scenarios, will be influenced by the degree to which a person’s latency to one scene is predictive of their latency to all scenes. A wide range of internal consistencies has been reported for HPTs, from unacceptably low to quite high (i.e., 0.28–0.93) (see Horswill and McKenna, 2004; Pelz and Krupat, 1974; Scialfa et al., 2011, 2012a). There are a number of reasons for these across-study discrepancies, including variations in test length and variations in scene content that may measure risk-tolerance and not hazard perception, per se.

It is important to bear in mind, however, that internal consistency is only one measure of reliability (cf., Lord and Novick, 1968), measuring how well scores generalize across items in a multi-item test. Because it can be measured at one time point, it is an economical psychometric index, but it is of no use in estimating the stability of scores, another desirable test property (see Oliver and Benet-Martínez, 2000 for an excellent review). Assuming that the underlying ability has stabilized, a reliable test will also demonstrate rank-order stability over time, as measured by a Pearson correlation. That is, a person’s relative position in a distribution of scores will remain stable from one time of testing to another (even if there is a mean shift in all scores due to practice, fatigue or other factors).

While it is often assumed that valid measures are reliable, this is not necessarily the case. A measure may correlate strongly with other measures of the same construct (e.g., braking distance to a hazard), discriminate groups known to differ on the construct (e.g., safe and unsafe drivers) and have a strong association with relevant outcome variables (e.g., crash risk), but have low test–retest reliability if there are individual differences in rate of growth or change in the ability (Heise, 1969). In fact, poor stability may be expected where a valid test reflects a more transient or developing state, as opposed to an enduring trait (Nesselroade et al., 1984). As an example, blood sugar levels may be measured quite accurately, but individual differences in short-term fluctuations are well known and would diminish any estimate of reliability based on two measures taken over time. On the other hand, considerable test–retest reliability would be expected in a valid measure that has stabilized developmentally, such as verbal skills in adulthood.

Many studies involving HPT do not report on the reliability of the measures taken (e.g., Borowsky et al., 2010; Huestegge et al., 2010; Shahar et al., 2010). This should not be taken as a strong criticism, because their intent is not to assess the psychometric properties of a test that will be used for population-level applications. However, even when the HPTs are being developed for applied purposes, the reliability estimates provided (e.g., Horswill et al., 2008) are measures of internal consistency (i.e., Cronbach’s alpha) and not short-term stability, because HPT was measured at only one time point. To our knowledge, while brief HPTs have been shown to be valid and internally consistent, to date there has not been an assessment of their short-term reliability in drivers whose hazard perception would be expected to have stabilized.
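The distinction drawn above between internal consistency and rank-order stability can be made concrete with a short sketch. It is an illustration only, under stated assumptions: the function names `cronbach_alpha` and `pearson_r` and the toy reaction times are hypothetical, and nothing here reproduces the authors' analysis code. The first function computes Cronbach's alpha from a participants-by-scenes table; the second computes the Pearson correlation used as the index of rank-order stability.

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a participants-by-items table.

    scores: list of rows, one per participant; each row holds that
    participant's score (e.g., a reaction time) on every item.
    """
    k = len(scores[0])  # number of items (scenes)
    # Sample variance of each item across participants.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each participant's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical reaction times (s): 4 participants x 3 scenes.
toy_scores = [
    [2.1, 2.4, 2.0],
    [2.6, 2.9, 2.7],
    [3.0, 3.1, 3.3],
    [2.3, 2.2, 2.5],
]
alpha = cronbach_alpha(toy_scores)

# A uniform 0.1 s practice gain shifts the mean but not the rank order.
session_1 = [mean(row) for row in toy_scores]
session_2 = [score - 0.1 for score in session_1]
stability = pearson_r(session_1, session_2)
```

Because every hypothetical participant improves by the same amount, `stability` equals 1 up to floating-point error, which matches the point that a mean shift alone does not reduce rank-order stability.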
While stability may not be obtained in those just learning to drive or those who are suffering from conditions that quickly change their capacity to identify hazards (e.g., stroke or retinal detachment), one would expect HPT scores to be relatively stable (in the rank-order sense) in those with sufficient experience. In fact, lack of short-term reliability in such a group would threaten user acceptance and adoption because the intent of such a test is to predict future risk.

In the current study we asked younger drivers with at least two years of post-licensing driving experience to complete alternate forms of a brief, dynamic HPT on two occasions, separated by approximately one month. Earlier work with longer versions of these tests (e.g., Scialfa et al., 2011, 2012a,b) allowed for the expectation that they would have high levels of internal consistency, but the more interesting question was whether they would also demonstrate sufficient across-time stability.

1. Methods

1.1. Participants

Sixty-two undergraduate students (42 females and 20 males) were recruited through the University of Calgary’s Research Participation System (RPS), and took part in this study in return for partial course credit. Criteria for participation included having a full driving license (Alberta Class 5 or better) for a minimum of two years. This young, but experienced driver sample was chosen because of convenience, ease of comparison with our previous studies in which a similar sample has served as a reference group (e.g., Scialfa et al., 2011, 2012a,b) and because this is an age and driving experience group in which hazard perception is likely to remain stable over the time interval of testing. Additional criteria included normal or corrected-to-normal visual acuity (20/40 or better), color vision, and contrast sensitivity. After participation, some participants were excluded from the analysis because they did not have a full driving license or did not attend a second session. Fifty-five participants were included in the analysis below.

1.2. Materials and apparatus

1.2.1. Hazard perception test

The primary measures were accuracy and reaction time for a number of potentially hazardous driving scenarios, gathered using two parallel versions of a dynamic HPT. The hazard perception test was modeled after that used by the UK and some Australian states (e.g., Horswill et al., 2008), which has been demonstrated to be a reliable and valid measure of collision risk in both novice and older drivers (Horswill et al., 2010b; Wells et al., 2008).
The scenes were filmed in Vancouver, B.C., Canada, and surrounding areas using a Sony Handycam Camcorder, model HDRSR11 in AVCHD 16 M (FH) format at a resolution of 1920 × 1080/60i. The camera was mounted inside a 2005 Subaru Impreza and secured to the inside door window on the passenger side of the vehicle. An extendable arm allowed the videotaped scenes to give a “driver’s eye” view. Filming occurred in March and April 2009, during daylight hours, generally under clear skies and dry roadway conditions in a variety of frequently encountered environments (e.g., residential, limited-access freeway). Each driving scene was edited from original files using Sony Vegas Movie Studio Platinum software (version 9.0a) at a resolution of 1280 × 720.

The two brief HPTs were shorter versions of those used by Scialfa et al. (2011). Both consisted of 26 scenes: Version 1’s scenes lasted from 12 to 62 s, and version 2’s scenes lasted from 17 to 58 s. When based on the sample of experienced young adults who participated in the study conducted by Scialfa et al. (2011), the internal consistency (Cronbach’s α) for version 1 was 0.79, and for version 2 was 0.67. Their mean reaction times were 2.84 s and 2.71 s, respectively.

A traffic conflict was defined as an event that required the driver to take evasive action, such as slowing down or swerving, to avoid a stationary or moving object (e.g., a parked vehicle, a construction pylon, or a cyclist). Both versions had a good representation of traffic conflicts that most frequently result in collisions by novice and adult drivers (McKnight and McKnight, 2003; Preusser et al., 1998). Table 1 lists the types of traffic conflicts represented by the 34 conflict scenes in both versions. In each version, seventeen (65%) scenes contained a traffic conflict, while nine (35%) contained no traffic conflicts. The scenes that had no traffic conflict were included to moderate a participant’s criterion for responding, reducing a bias for the selection of even improbable conflicts. Examples of these conflicts, still images taken from the videos used, are shown in Fig. 1.

Table 1
Types of traffic conflicts in each analyzed version of the HPT.

Types of traffic conflicts                                    Version 1 n (%)    Version 2 n (%)
Vehicle moving in the same direction as the camera car
  Signal/right turn                                           2 (12.5)           2 (13.33)
  Vehicle parking                                             1 (6.25)           1 (6.67)
  Vehicle slowing                                             2 (12.5)           3 (20.0)
  Vehicle merging/moving into path                            3 (18.75)          3 (20.0)
  Vehicle stopped in camera car’s lane                        2 (12.5)           2 (13.33)
Vehicle moving in the opposite direction as the camera car
  Vehicle crossing from the left                              1 (6.25)           0 (0)
Miscellaneous
  Parked vehicle                                              2 (12.5)           2 (13.33)
  Pedestrian/cyclist                                          3 (18.75)          2 (13.33)
Total scenes                                                  16                 15

Average size at the onset of the traffic conflict was 2.9 ± 1.3° high by 4.3 ± 2.2° wide. A traffic conflict’s spatial location at onset was 1.0° ± 0.8 and 0.8° ± 4.3 left of the centre of the scene. Custom software defined the onset, offset and spatial extent of the traffic conflicts of each scene (see Marrington et al., 2008). This same software was used to present these driving scenes to participants and record the spatial coordinates of their responses and their reaction times.

A touch-sensitive 17-in. LCD desktop monitor (Elo TouchSystems 1729L) with a resolution of 1280 × 1024 was used to present the HPTs (viewing distance of approximately 50 cm) and collect responses. A small, yellow circle appeared at the location touched by the participant so that they were aware their responses were being registered. It did not, however, reveal any information about the accuracy or latency of their responses.

1.2.2. Simple spatial reaction time

To control for baseline response latency, a simple spatial reaction time test (SSRT) was given to each participant. In this test, 16 high-contrast (Michelson contrast = 93%) black boxes of differing sizes appeared at random intervals and locations on the monitor. The boxes ranged in size from 2.75 cm × 2.8 cm to 13 cm × 14 cm and were chosen to represent the 25th, 50th, 75th, and 100th percentiles of the height and width at first appearance of the hazards in the HPT. Participants were instructed to touch these boxes as quickly and accurately as possible. As in the HPT, a small yellow circle that appeared where the participant touched the monitor provided feedback.
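The stimulus sizes above are given in degrees of visual angle for the hazards and in centimetres for the SSRT boxes, and the box contrast is reported as Michelson contrast. A minimal sketch of the standard conversions follows. It assumes that the approximate 50 cm viewing distance stated for the HPT also applies to the SSRT, which the text does not confirm; the function names and the luminance values are hypothetical and are not taken from the study.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of a given size."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def michelson_contrast(l_max, l_min):
    """Michelson contrast from maximum and minimum luminance."""
    return (l_max - l_min) / (l_max + l_min)


VIEWING_DISTANCE_CM = 50  # approximate distance reported for the HPT

# First dimension of the smallest and largest SSRT boxes in the text (cm).
smallest_deg = visual_angle_deg(2.75, VIEWING_DISTANCE_CM)
largest_deg = visual_angle_deg(13, VIEWING_DISTANCE_CM)

# Hypothetical luminances (cd/m^2) chosen to yield a contrast of 0.93.
contrast = michelson_contrast(193.0, 7.0)
```

Under these assumptions the smallest box subtends a little over 3° and the largest a little under 15°, which gives a rough sense of how the centimetre values relate to the hazard sizes reported in degrees.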
1.2.3. Vision tests

Visual acuity was assessed at a distance of 40 cm with a PostScript generated Landolt C chart that measured acuity in 0.05 logMAR steps from 20/200 to 20/10. Contrast sensitivity at 1.5–18 cycles per degree was measured with the VISTECH 6500. Color vision was assessed with the Farnsworth D-15 Color Test (Farnsworth, 1943). All vision testing was carried out within recommended photopic luminance levels.

Fig. 1. Static examples of representative hazards. Upper panel: lead vehicle braking. Middle panel: sudden left turn of oncoming vehicle. Lower panel: pedestrian entering roadway. Original images are of higher resolution. Blur results from capturing digital video as static image.

1.3. Procedure

Participants were randomly assigned to two groups, varying the order of tests taken for each group. Each participant was tested in two sessions that lasted about one hour and which were separated by 19 to 38 days (mean separation = 27.58 days). Version 1 of the HPT was administered to a randomly selected one-half of the participants in the first session, who received version 2 in the second session. The order was reversed for the remaining observers.

Upon arrival at the lab, each participant was informed of the details of the study and consent was obtained, after which vision tests were administered. Next, the participants were shown how to use the touch-screen monitor, and the SSRT was given. Instructions for the HPT were provided, and then seven scenes were practiced to allow the participant to become familiar with the task and protocol. Between the practice scenes, the researcher provided feedback and further instruction on what defined a traffic conflict. Finally, the participants took the HPT, which on average took 13.1 min for version 1 and 12.7 min for version 2.

2. Results

Table 2 provides descriptive data for those tested. Group 1 (n = 29) and Group 2 (n = 26) were made up of participants aged from 18 to 32 years, and 18 to 23 years, respectively. All had previous driving experience of two years or more, and held a valid driving license. The groups differed significantly in self-reported distance driven per year. They also differed marginally on age and contrast sensitivity at a spatial frequency of 12 cpd, but these differences, while statistically approaching significance, are not likely important in the present context (Table 2).

Table 2
Average demographic data and test scores from Group 1 and Group 2.

Measure                                       Group 1 mean (SD)    Group 2 mean (SD)    p-Value*
Age (years)                                   21.9 (3.75)          20.54 (1.39)         0.078
Gender ratio (M/F)                            17/12                19/7                 0.266
Years of education (beginning in Grade 1)     15.02 (1.81)         14.87 (1.69)         0.750
Years holding learner’s permit                1.99 (1.32)          1.80 (0.57)          0.498
Years of licensure (Class 5 or equivalent)    4.59 (3.31)          4.04 (1.26)          0.411
Driven distance (km/yr)                       7607                 22615                0.009
LogMAR near visual acuity                     −0.04                −0.01                0.299
Contrast sensitivity (SF1.5)                  72.41 (31.13)        72.50 (21.04)        0.990
Contrast sensitivity (SF3)                    143.93 (44.01)       135.73 (44.89)       0.502
Contrast sensitivity (SF6)                    146.03 (36.07)       126.81 (49.96)       0.105
Contrast sensitivity (SF12)                   87.90 (19.59)        75.69 (29.52)        0.081
Contrast sensitivity (SF18)                   27.62 (13.29)        29.69 (15.94)        0.602
SSRT (s)                                      0.76 (0.11)          0.74 (0.10)          0.566
Days between Sessions 1 and 2                 27.97 (3.59)         27.15 (3.16)         0.38

Note: SF = spatial frequency in cycles per degree of visual angle.
* p-Value associated with t-test.

The SSRT for each participant was calculated by taking the mean of all SSRT trials from both sessions. If a response was not recorded for a trial, it was replaced by the group’s average response time for that trial. The group means did not differ.

Although traffic conflicts were contained in each of the 17 scenes in both versions of the HPT, not all conflicts were identified consistently by all observers. Because it is difficult to interpret HPT data when experienced drivers fail to identify the hazard, some scenes were eliminated following data collection. Scenes were eliminated if the hit rate within the entire group was less than 85%; this removed one scene from version 1 and two from version 2. The average hazard perception response time was calculated using the remaining scenes.

A false alarm was identified as a non-traffic conflict scene in which the participant responded as if there was a conflict. Any scenes that produced a group false alarm rate of more than 15% were eliminated. Out of the total 18 scenes with no traffic conflicts, four were eliminated (3 from version 1, and 1 from version 2). False alarm rates were calculated from the remaining scenes. The two groups did not differ in false alarms.

A hit occurred when a participant properly identified a hazard within its temporal and spatial window. Hit rate was calculated for the scenes that were not eliminated through the scene selection process. Hit rates did not differ significantly between groups.
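The scene-selection and imputation rules described above lend themselves to a brief sketch. The thresholds (85% hit rate, 15% false alarm rate) and the replacement of missing SSRT trials by the group's mean for that trial come from the text; the function names, the data layout and the toy data are hypothetical, and "group" is taken here to mean whatever set of participants is passed in, which is an assumption.

```python
HIT_RATE_MIN = 0.85      # conflict scenes below this group hit rate are dropped
FALSE_ALARM_MAX = 0.15   # non-conflict scenes above this group rate are dropped


def rate(responses):
    """Proportion of True values in a list of booleans."""
    return sum(responses) / len(responses)


def retained_conflict_scenes(hits_by_scene):
    """Keep conflict scenes whose group hit rate is at least 85%."""
    return [s for s, hits in hits_by_scene.items() if rate(hits) >= HIT_RATE_MIN]


def retained_nonconflict_scenes(false_alarms_by_scene):
    """Keep non-conflict scenes whose group false alarm rate is at most 15%."""
    return [s for s, fas in false_alarms_by_scene.items()
            if rate(fas) <= FALSE_ALARM_MAX]


def impute_missing(rt_rows):
    """Replace missing trials (None) with the group's mean for that trial.

    rt_rows: one list of reaction times per participant, aligned by trial.
    """
    n_trials = len(rt_rows[0])
    trial_means = []
    for t in range(n_trials):
        observed = [row[t] for row in rt_rows if row[t] is not None]
        trial_means.append(sum(observed) / len(observed))
    return [[trial_means[t] if row[t] is None else row[t]
             for t in range(n_trials)] for row in rt_rows]


# Hypothetical data: 10 participants per scene.
toy_hits = {"scene_a": [True] * 9 + [False], "scene_b": [True] * 8 + [False] * 2}
toy_false_alarms = {"scene_c": [True] + [False] * 9, "scene_d": [True] * 2 + [False] * 8}

kept_conflict = retained_conflict_scenes(toy_hits)
kept_nonconflict = retained_nonconflict_scenes(toy_false_alarms)
filled = impute_missing([[0.7, None], [0.9, 0.8]])
```

In this toy example `scene_b` (80% hits) and `scene_d` (20% false alarms) are dropped, and the missing second trial of the first participant is filled with the only observed value for that trial.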
Using the retained 16 traffic conflict driving scenes in version 1, and 15 in version 2, an average hazard perception reaction time (HPRT) was calculated for each time of testing. HPRTs did not exhibit a time effect for Group 1 (Ms = 2.3990 and 2.2831, respectively, p = 0.468) or Group 2 (Ms = 2.2485 and 2.3281, respectively, p = 0.611). On average, HPRTs changed 0.10 s from the first to second session, a considerable degree of mean stability given that observers responded to alternate forms with different scenes tested at a one-month interval (Table 3).

Cronbach’s alpha was calculated from z-scores of average response times for all selected scenes containing a traffic conflict. These values were 0.73 for version 1 and 0.79 for version 2, a satisfactory level of reliability according to Bland and Altman (1997). Average standardized HPRTs were moderately correlated, r = 0.554, p < 0.001. Using the internal consistencies as indices of reliability, the disattenuated correlation was 0.72. This disattenuated correlation indicates what the short-term stability of the HPT would be if the reliability could be increased, most likely by increasing the number of items in the test.

Table 3
HPT test scores from Group 1 and Group 2.

Measure                                Group 1 mean (SD)    Group 2 mean (SD)    p-Value*
HPRT in version 1 (s)                  2.40 (0.49)          2.25 (0.52)          0.277
HPRT in version 2 (s)                  2.28 (0.70)          2.33 (0.60)          0.806
Hit rate in version 1 (%)              95.04 (9.79)         97.60 (3.10)         0.209
Hit rate in version 2 (%)              97.47 (4.15)         97.44 (3.81)         0.974
False alarm rate in version 1 (%)      6.32 (11.28)         5.77 (13.29)         0.868
False alarm rate in version 2 (%)      6.47 (12.33)         11.06 (14.72)        0.214

Note: All scores calculated after removal of scenes.
* p-Value associated with t-test.

3. Discussion

Hazard perception is a critical component of safe driving. HPTs, tests that assess hazard perception skill, have revealed sensitivity to differences between experienced and novice drivers (e.g., Groeger and Chapman, 1996; Maycock and Lockwood, 2007; Scialfa et al., 2011), agreement with drivers’ risk ratings of real driving situations (Watts and Quimby, 1979) and associations with crash likelihood (Wells et al., 2008). Thus, HPTs appear to be a valid measure of the ability to perceive hazards in the driving environment.

Aside from the issue of validity, if used in applied settings where decisions (e.g., about licensure or employment) are based on test performance, HPTs should demonstrate good reliability, broadly operationalized as consistency in the obtained score for a given individual. In multiple-item tests, when the underlying ability has stabilized, this means that scores on one item of the test should correlate positively with scores on other items, that alternate forms of the same test should be strongly associated, and that performance at one time should be positively correlated with scores at another time (see Oliver and Benet-Martínez, 2000).

Although the reliability of HPTs has not been evaluated in many examinations of hazard perception, more recent studies with a focus on test construction and evaluation have reported acceptable levels of internal consistency, as estimated by Cronbach’s alpha. For example, Scialfa et al. (2011) found that an 18-item dynamic HPT that discriminated novice from experienced drivers had an internal consistency of 0.75 and that a static 21-scene HPT with comparable discriminant validity yielded a Cronbach’s alpha of 0.91 (Scialfa et al., 2012b). Similarly, Horswill et al. (2008, Study 1) reported a Cronbach’s alpha of 0.78 for a brief, dynamic HPT when used with novice and experienced drivers. Additionally, Smith et al. (2009) found that brief HPTs with good internal consistency also showed good parallel-forms reliability. While such results are encouraging, confident use of an HPT depends on the stability of test scores over time in those individuals who can be assumed to have stabilized in the underlying ability.

The purpose of this study was to examine the short-term reliability of a pair of brief HPTs in experienced, young adult drivers. When disattenuated for the less-than-perfect internal consistency of the tests, we found a short-term reliability of 0.72 for these brief HPTs. This is an encouraging result, because such stability is a necessary property of any test that will be accepted by consumers and policy-makers in such an important activity as driving a personal vehicle. Lack of reliability would immediately increase the probability that an HPT could be challenged if one’s score on the test adversely affected one’s licensing or employment.

There are several reasons why the correlation between scores is not higher. First, short-term reliability is to be expected only when there are no across-person differences in development. If performance has not stabilized and individuals vary in their rate of learning, then a person’s rank order will change from one time of testing to another, thereby lowering the correlation between test scores. While all observers were required to have at least two years of post-license driving experience, this may not be sufficient for all people to have reached asymptotic levels of hazard perception skill. As well, there are likely individual differences in how observers learn the HPT task. Second, both temporal and content variability may have reduced short-term stability as assessed. Alternate forms reliability (e.g., Smith et al., 2009) is generally determined over a very brief time interval and test–retest reliability usually involves the same test content.
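The disattenuated value discussed above rests on the classical correction for attenuation, in which the observed correlation is divided by the square root of the product of the two reliabilities. The sketch below applies that standard formula to the rounded figures given in the Results; the function name is hypothetical, and the exact inputs the authors used are not stated, so this is an approximation rather than a reproduction of their computation.

```python
import math


def disattenuate(r_observed, reliability_x, reliability_y):
    """Classical correction for attenuation.

    Divides the observed correlation by the square root of the product
    of the two tests' reliabilities.
    """
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Rounded values reported in the Results section.
r_between_forms = 0.554
alpha_version_1 = 0.73
alpha_version_2 = 0.79

corrected = disattenuate(r_between_forms, alpha_version_1, alpha_version_2)
```

With these rounded inputs the sketch returns a value of about 0.73, close to the 0.72 the authors report; the small gap is left unexplained here because the unrounded inputs are not given.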
Because our observers responded to different, though admittedly similar, content over a one-month interval, the correlation between tests may have been reduced. Given these considerations, the obtained correlation seems quite acceptable for advocating the use of brief HPTs.

Of course, it is possible to increase the reliability of the test by increasing its length (see Horswill and McKenna, 2004 for a review). While increasing test length is relatively easy to accomplish, the greater time of testing may render such a test less useful for testing large numbers of people, as is done in those cases where HPTs are used as part of a licensing or evaluation process. Thus, we would suggest that a brief test with acceptable reliability and validity is generally preferred for use in the field.

One potential shortcoming to this research involves the choice of sample. The drivers who participated had an average age of about 21 years and had been driving for approximately 4 years. Collision data (see Evans, 2004 for a review) suggest that hazard perception is deficient in this group and that hazard perception can be improved even in those with considerable driving experience (Horswill et al., 2010a,b, 2013). Thus, it is desirable to replicate this work with very experienced drivers, for example, those working in the commercial trucking industry, to determine if the same short-term stability obtains. On the other hand, our observation of acceptable reliability in a younger sample is encouraging, because it suggests that HPT performance is stable, at least over the short term. Clearly, however, a longer test–retest interval would strengthen this assertion.

Future work can be directed along two related paths pertaining to validity and reliability. Validity can be assessed in several ways, but perhaps the most obvious is to predict on-road performance from a brief HPT. Favorable results have been reported recently by Wood et al. (2013) in which a brief HPT was used to predict, with more than 80% accuracy, which older drivers would pass or fail an on-road test. We are currently completing a similar study of healthy, older North American drivers and of cognitively impaired drivers.
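The earlier remark that reliability can be raised by lengthening the test, at the cost of testing time, can be illustrated with the Spearman-Brown prophecy formula. The paper does not name this formula; it is brought in here as a standard psychometric tool, and it assumes that any added scenes are parallel to the existing ones. The function names are hypothetical, and 0.73 is simply the alpha reported for version 1.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


def length_factor_needed(reliability, target):
    """Factor by which the test must be lengthened to reach a target reliability."""
    return (target * (1 - reliability)) / (reliability * (1 - target))


current_alpha = 0.73  # alpha reported for version 1

# Predicted reliability if the number of scenes were doubled.
doubled = spearman_brown(current_alpha, 2)

# How much longer the test would need to be to reach 0.90.
factor_for_090 = length_factor_needed(current_alpha, 0.90)
```

Under the parallel-items assumption, doubling the test is predicted to raise reliability to roughly 0.84, while reaching 0.90 would require a test more than three times as long, which illustrates the trade-off between reliability and testing time noted in the text.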


Short-term stability must also be assessed in at least three other populations. In novice drivers and in impaired older drivers, one might expect lower short-term reliability because people improve and decline at differing rates. In fact, in these groups short-term stability may not be expected even of a valid test. Furthermore, intraindividual variability in both mean and rank performance may be a powerful indicator of driving safety, given that variability has been shown to be an important predictor in other domains such as neurological and cardiovascular health (Jackson et al., 2012; Ram et al., 2011). In very experienced, more mature drivers, by contrast, short-term stability should be quite high. Whether this holds is an empirical question.

Acknowledgements

This research was supported by grants from (A303-ASQ) and the Alberta Motor Association (10001220). We would like to express our gratitude to Mark Horswill and The University of Queensland for use of the software to collect HPT data. Thanks also to David Stewart for invaluable assistance in programming and data analysis. Address correspondence to Charles T. (Chip) Scialfa, Department of Psychology, University of Calgary, Calgary, AB T2N 1N4, Canada.

References

Bland, J.M., Altman, D.G., 1997. Cronbach’s alpha. Br. Med. J. 314, 572.
Borowsky, A., Shinar, D., Oran-Gilad, T., 2010. Age, skill, and hazard perception in driving. Accid. Anal. Prev. 12, 277–287.
Boufous, S., Ivers, R., Senserrick, T., Stevenson, M., 2011. Attempts at the practical on-road driving test and the hazard perception test and the risk of traffic crashes in young drivers. Traffic Inj. Prev. 12, 475–482.
Driver and Vehicle Standards Agency, 2014. Driving theory test: booking and taking your test, Retrieved from http://safedrivingforlife.info/learners/i-wantdrive/driving-theory-test-booking-and-taking-your-test (retrieved April 23, 2014).
Elander, J., West, R., French, D., 1993. Behavioural correlates of individual differences in road-traffic crash risk: an examination of methods and findings. Psychol. Bull. 113, 279–294.
Evans, L., 2004. Traffic Safety. Science Serving Society, Bloomfield Hills, MI.
Farnsworth, D., 1943. Farnsworth-Munsell 100-hue and dichotomous tests for color vision. J. Opt. Soc. Am. 33, 568–574.
Fisher, D.L., Pollatsek, A.P., Pradhan, A., 2006. Can novice drivers be trained to scan for information that will reduce their likelihood of a crash? Inj. Prev. 12 (Suppl I), i25–i29.
Groeger, J.A., Chapman, P.R., 1996. Judgement of traffic scenes: the role of danger and difficulty. Appl. Cogn. Psychol. 10, 349–364.
Heise, D.R., 1969. Separating reliability and stability in test–retest correlation. Am. Sociol. Rev. 34, 93–101.
Horswill, M.S., Anstey, K.J., Hatherly, C.G., Wood, J., 2010a. The crash involvement of older drivers is associated with their hazard perception latencies. J. Int. Neuropsychol. Soc. 16, 939–944.
Horswill, M.S., Kemala, C.N., Wetton, M., Scialfa, C.T., Pachana, N.A., 2010b. Improving older drivers’ hazard perception ability. Psychol. Aging 25, 464–469.
Horswill, M.S., Marrington, S.A., McCullough, C.M., Wood, J., Pachana, N.A., McWilliam, J., Raikos, M.K., 2008. The hazard perception ability of older drivers. J. Gerontol. B: Psychol. Sci. 63B, 212–218.
Horswill, M.S., McKenna, F.P., 2004. Drivers’ hazard perception ability: situation awareness on the road. In: Banbury, S., Tremblay, S. (Eds.), A Cognitive Approach to Situation Awareness: Theory and Application. Ashgate, Aldershot, UK, pp. 155–175.
Horswill, M.S., Taylor, K., Newman, S., Wetton, M., Hill, A., 2013. Even highly experienced drivers benefit from a brief hazard perception training intervention. Accid. Anal. Prev. 52, 100–110.
Huestegge, L., Skottke, E., Anders, S., Müsseler, J., Debus, G., 2010. The development of hazard perception: dissociation of visual orientation and hazard processing. Transportation Research Part F. Traffic Psychol. Behav. 13, 1–8.
Jackson, J.D., Balota, D.A., Duchek, J.M., Head, D., 2012. White matter integrity and reaction time intraindividual variability in healthy aging and early-stage Alzheimer disease. Neuropsychologia 50, 357–366.
Jenkins, L., Myerson, J., Joerding, J., Hale, S., 2000. Converging evidence that visuospatial cognition is more age-sensitive than verbal cognition. Psychol. Aging 15, 157–175.
Lord, F.M., Novick, M.R., 1968. Statistical theories of mental test scores. Addison-Wesley, Reading, MA.
Marrington, S., Horswill, M., Wood, J., 2008. The effect of simulated cataract on drivers’ hazard perception ability. Optom. Vis. Sci. 85, 1121–1127.
Maycock, G., Lockwood, C.R., 2007. The accident liability of British car drivers. Transp. Rev. 13, 231–245.


McKnight, J.A., McKnight, S.A., 2003. Young novice drivers: Careless or clueless? Accid. Anal. Prev. 35, 921–925.
Nesselroade, J.R., Mitteness, L.S., Thompson, L.K., 1984. Short-term changes in anxiety, fatigue, and other psychological states in older adulthood. Res. Aging 6, 3–23.
Oliver, P.J., Benet-Martínez, V., 2000. Measurement: reliability, construct validation, and scale construction. In: Reis, H.T., Judd, C.M. (Eds.), Handbook of Research Methods in Social and Personality Psychology. Cambridge University Press, New York, NY, pp. 339–369.
Pelz, D.C., Krupat, E., 1974. Caution profile and driving record of undergraduate males. Accid. Anal. Prev. 6, 45–58.
Preusser, D.F., Williams, A.F., Ferguson, S.A., Ulmer, R.G., Weinstein, H.B., 1998. Fatal crash risk for older drivers at intersections. Accid. Anal. Prev. 30, 151–159.
Queensland Department of Transport and Main Roads, 2009. Hazard Perception Test for P1 Licence Holders, Retrieved online www.transport.qld.gov.au/hpt on April 23, 2014.
Ram, N., Gerstorf, D., Lindenberger, U., Smith, J., 2011. Developmental change and intraindividual variability: relating cognitive aging to cognitive plasticity, cardiovascular lability, and emotional diversity. Psychol. Aging 26, 363–371.
Scialfa, C.T., Borkenhagen, D., Lyon, J., Deschênes, M., Horswill, M., Wetton, M., 2012a. The effects of driving experience on responses to a static hazard perception test. Accid. Anal. Prev. 45, 547–553.
Scialfa, C.T., Deschênes, M.C., Ference, J., Boone, J., Horswill, M.S., Wetton, M., 2011. A hazard perception test for novice drivers. Accid. Anal. Prev. 43, 204–208.
Scialfa, C., Deschênes, M., Ference, J., Boone, J., Horswill, M.S., Wetton, M., 2012b. Hazard perception in older drivers. Int. J. Hum. Factors Ergon. 1, 221–233.

Senserrick, T., Haworth, N., 2005. Review of Literature Regarding National and International Young Driver Training, Licensing, and Regulatory Systems (Report no. 239), Retrieved from Monash University website: http://www.monash.edu.au/miri/research/reports/muarc239.pdf.
Shahar, A., Alberti, C., Clarke, D., Crundall, D., 2010. Hazard perception as a function of target location and the field of view. Accid. Anal. Prev. 42, 1577–1584.
Smith, S.S., Horswill, M.S., Chambers, B., Wetton, M., 2009. Hazard perception in novice and experienced drivers: the effects of sleepiness. Accid. Anal. Prev. 41, 729–733.
Wallis, T.S.A., Horswill, M.S., 2007. Using fuzzy signal detection theory to determine why experienced and trained drivers respond faster than novices in a hazard perception test. Accid. Anal. Prev. 39, 1177–1185.
Watts, G.R., Quimby, A.R., 1979. Design and validation of a driving simulator. In: Report LR 907. Transport and Road Research Laboratory, Crowthorne, UK.
Wells, P., Tong, S., Sexton, B., Grayson, G., Jones, E., 2008. Cohort II: A Study of Learner and New Drivers, Retrieved from Department for Transport (UK) website: http://www.dft.gov.uk/publications/cohort-ii-a-study-oflearner-and-new-drivers/.
Wetton, M.A., Hill, A., Horswill, M.S., 2011. The development and validation of a hazard perception test for use in driver licensing. Accid. Anal. Prev. 43, 1759–1770.
Whelan, M., Groeger, J., Senserrick, T., Triggs, T., 2002. Alternative methods of measuring hazard perception: Sensitivity to driving experience. In: Road Safety Research, Policing and Education Conference, pp. 81–86, 2000, RS2002.
Wood, J.M., Horswill, M.S., Lacherez, P.F., Anstey, K., 2013. Evaluation of screening tests for predicting older driver performance and safety assessed by an on-road test. Accid. Anal. Prev. 50, 1161–1168.
