Chapter 4
Self-Report Instruments and Methods
Timo Lajunen and Türker Özkan
Middle East Technical University, Ankara, Turkey
Handbook of Traffic Psychology. DOI: 10.1016/B978-0-12-381984-0.10004-9 Copyright © 2011 Elsevier Inc. All rights reserved.
1. INTRODUCTION
In recent decades, the number of social psychological studies in traffic research has increased drastically. The popularity of self-reports has also increased; for example, a SCOPUS literature search (October 15, 2010) returned three times more studies for the time period 2000–2009 including the search word “Questionnaire” than for 1990–1999. Because social psychological studies are mostly based on self-reports, increased interest in social psychological factors has also resulted in the increased use of self-report methodology. Self-reports include a great variety of different methods, including questionnaires and inventories, interviews, focus groups (Basch, DeCicco, & Malfetti, 1989; Kua, Korner-Bitensky, & Desrosiers, 2007), and driving diaries (Gulian, Glendon, Matthews, Davies, & Debney, 1990; Joshi, Senior, & Smith, 2001; Kiernan, Cox, Kovatchev, Kiernan, & Giuliano, 1999). Common features in all these diverse self-report measures are that participants are aware that they are participating in a study; they are asked to actively reply to more or less structured questions; and their responses are taken as “face valid” (that is, answers are scored and analyzed based on the responses and not, for example, according to response time or other behavioral or physiological measurement). In this way, the content of the responses in self-reports is assumed to reflect a respondent’s reality.
Self-reports and especially questionnaires have many advantages. They are usually less expensive than studies using an instrumented vehicle or a simulator, they provide more detailed information than observations, and they can reach large numbers of people. Representativeness of the sample is easy to establish and can be measured with direct statistical comparisons to driver populations. Moreover, the reliability of items and measurements can be easily evaluated with standard statistics. Due to large samples, complicated and detailed statistical analyses can be conducted.
The advantages of self-report-based survey studies are clear. However, self-reports also have some serious shortcomings that should be taken into account when planning a study. This chapter discusses the possible problems in using survey methods in traffic research. It concentrates on questionnaire studies because the survey questionnaire is the most often applied research method and, thus, a good example for this overview. Driving diaries and logs, as well as interviews, share the same problems with questionnaires but also have serious problems of their own. Therefore, this chapter concentrates on questionnaires as self-reports. Most of the examples are from the Driver Behaviour Questionnaire (DBQ) literature because the DBQ (Reason, Manstead, Stradling, Baxter, & Campbell, 1990) is one of the most widely used instruments for measuring driver behaviors and, thus, provides good demonstration material. Note, however, that most of the DBQ-related concerns are serious problems of any road behavior measure based on self-reports.
2. FOR WHAT KIND OF TRAFFIC RESEARCH CAN SELF-REPORT BE USED?
A literature search shows that self-report methodology has been used for a wide variety of research, including attitudes, opinions, beliefs, emotions, cognitive processes, and behaviors: basically any aspect of driving. Although self-reports can be the most appropriate tool or even the only tool for many research purposes (e.g., for measuring attitudes, beliefs, and opinions), they are often misused. In particular, self-reports of past accidents, near misses, mileage, and driver behavior can be misleading or biased.
2.1. Two Components of Driving: Driver Performance and Behavior
Driving can be seen as being composed of two separate components, driving skills and driving style (Elander, West, & French, 1993), or, in other words, driver performance and driver behavior (Evans, 1991). Driving skills include those information-processing and motor skills that may be expected to improve with practice and training, i.e., with
PART | I
driving experience (Elander et al., 1993). In addition to learning, driving-related skills can be considered to be affected by a driver’s general information-processing ability. The role of the driver’s general information-processing and motor ability is emphasized when some of those skills have declined either temporarily (e.g., driving while intoxicated) or permanently (e.g., Alzheimer’s disease). The driving task can be described as a skilled activity with several distinct levels that are organized hierarchically (Summala, 1996). This hierarchy, from bottom to top, includes the control (operational), maneuvering (guidance), and planning (navigational) levels (Michon, 1985; Van Der Molen & Botticher, 1988). In the beginning, all these functions need conscious control, but gradually with more practice and driving experience, these functions become increasingly automated (Summala, 1987). In this learning process, basic motor skills are acquired quickly, whereas the development of perceptual skills is slower. For example, beginner drivers learn to use the manual gear and clutch rather quickly but are slow to learn to use their peripheral vision for lane keeping (Mourant & Rockwell, 1972). In terms of measurement with self-reports, it is important to understand that drivers are unaware of most of the automated processes that they perform while driving.
Driving style concerns individual driving habits, that is, the way a driver chooses to drive. Driving style becomes established over a period of years but does not necessarily get safer with driving experience (Elander et al., 1993). Practice and increased exposure to the diversity of traffic situations result in improved skills but also increased subjective control of driving, less concern for safety, and habitually driving with narrow safety margins (Näätänen & Summala, 1976; Spolander, 1983; Summala, 1985). In fact, it has been reported that some safety-related skills,
Theories, Concepts, and Methods
such as scanning patterns and the keeping of adequate safety margins, fail to improve or even deteriorate once explicit tuition is removed and the feedback is not consistent as it is with many other safety-related skills (Duncan, Williams, & Brown, 1991). Not even the specific defensive driving courses focused on anticipatory and safety-oriented driving habits have an effect on drivers’ accident involvement (Lund & Williams, 1985).
Because driving is to some extent a “self-paced” task and drivers actually determine their own margin of error, driving style can be assumed to reflect drivers’ individual personality characteristics, attitudes, and motives. In addition, the effect of motivational factors and “extra-motives” (Näätänen & Summala, 1976) on driving safety is more remarkable in some driver groups than in others. For example, young male drivers tend to take unnecessary risks more frequently than do young female drivers (Evans, 1991). Drivers can, at least in part, influence their subjective crash risk by deliberately adopting a safe driving style that gives them large safety margins (Näätänen & Summala, 1976; Summala, 1980).
Figure 4.1 shows how driving skills (driving performance) and driving style (driver behavior) are related to the likelihood of errors and size of the safety margin and, finally, to crashes. In this model, errors and safety margin are treated as outcomes of behavior and skills. The “driving skill” (or driver performance) pathway (dotted arrows) describes how driving experience as exposure to a variety of traffic situations and as practice is related to driving skills, which in turn determine the probability of a driver error. The “driving style” pathway in Figure 4.1 describes how driving experience, personality factors, attitudes and beliefs, and lifestyle are related to driving style (or driving behavior), which in turn influences the sizes of safety margins: The more risky the driving style, the narrower
[FIGURE 4.1 Two pathways to crash. Diagram labels: General cognitive abilities, Driving experience, Driving skills, Errors, Personality factors, Attitudes and beliefs, Lifestyle, Driving style, Safety margins, Crash.]
margins the driver accepts. Finally, frequent driving errors due to lacking skills, and narrow safety margins, together lead to heightened crash risk. The literature on psychological factors associated with differential traffic crash involvement indicates that both driving skills and driving style are related to crash risk (Elander et al., 1993). Drivers’ maximum performance capabilities do not necessarily predict their accident involvement and, conversely, bad attitudes do not alone cause drivers to lose vehicle control in a curve. Because driving is a self-paced task and drivers can largely determine the task demands, a risky driver actually makes the driving task too difficult for himself or herself so that the demands exceed his or her capabilities. Effective countermeasures should therefore include both the driving skill and style components, and these components should be seen as related to each other.
2.2. Measuring Driver Behavior and Performance with Self-Reports: Driver Behavior Questionnaire and Driver Skill Inventory
Although in the literature self-reports have been used for measuring both driving skills (performance) and driving style (driver behavior), we believe that self-report methodology is better suited to studies of driver behavior than of driver performance, for two reasons. First, driver behavior refers to driving style, that is, the everyday way of driving that the driver prefers. For example, preferred speed, following distance, and rule obedience are all voluntary behaviors of which drivers are mostly aware. Drivers’ awareness of their driving skills can be assumed to be much lower because basic motor and perceptual processes are automatic and do not need attention. Exposure as learning means that experienced drivers are probably even less aware of their skills than are novices because controlling the vehicle requires conscious attention only in especially demanding situations. For example, shifting gears becomes automatic in the very early stages of learning to drive; thus, the experienced driver is no longer aware of his or her skill level in changing gears.
Second, driver behavior consists of behaviors that are often covered by the highway code or by unwritten norms. For example, speeding and driving while intoxicated violate both the traffic code and social norms of driving. Rudeness or aggressive driving might not be illegal but still violates social norms. Because of this normative character of most driver behaviors, drivers actually know what the ideal normative behavior would be and can compare their behavior to those norms (e.g., try to drive just slightly above the speed limit). Unlike the norms for driver behavior, the “skill norms” of driving are usually difficult to determine, and drivers are mostly unaware of their skill level. The only
time when drivers might realize shortcomings in their skills is when an error leads to some negative consequences, such as a crash or the engine stalling because the driver selected the wrong gear. Because of these two issues, a high awareness level and the existence of an absolute norm, self-reports are better suited to studies of driver behavior than to those of driver performance. Of course, drivers’ views of their skills can be studied with self-reports, but direct measurement of real driving skills with a self-report is practically impossible.
The problems of self-reports in the measurement of driving skills (performance) and driver behavior can be well demonstrated by comparing the Driver Behavior Questionnaire (DBQ) and the Driver Skill Inventory (DSI). The DBQ is based on a theoretical taxonomy of aberrant behaviors (Reason et al., 1990). The main distinction between errors and violations is based on the assumption that they have different psychological origins and demand different modes of remediation (Reason et al., 1990). Errors are the result of cognitive processing problems, whereas violations include a motivational component and contextual demands. Errors were defined by Reason et al. as “the failure of planned actions to achieve their intended consequences” (p. 1315) and differentiated into slips and lapses and mistakes. Violations referred to “deliberate deviations from those practices believed necessary to maintain the safe operation of a potentially hazardous system” (p. 1316). In their first study on the DBQ, Reason et al. found that driver errors and violations are two empirically distinct classes of behavior comprising three factors (deliberate violations, dangerous errors, and “silly” errors). Later, Parker, Reason, Manstead, and Stradling (1995) confirmed the three-factor structure of the DBQ. Since the publication of the original article by Reason et al. (1990), the DBQ has become one of the most widely used methods for measuring self-reported driving behaviors.
A meta-analytical review by de Winter and Dodou (2010) reported 174 studies in which the DBQ was used either in original or in modified form. Cross-cultural studies have demonstrated the universality of the distinction between errors and violations (Lajunen, Parker, & Summala, 2004; Özkan, Lajunen, Chliaoutakis, Parker, & Summala, 2006). The error–violation distinction seems to also be sample invariant because it has been found among different driver groups, such as professional drivers, motor riders, traffic offenders, probationary drivers, parent–child pairs, young women, and older drivers (de Winter & Dodou, 2010).
The DBQ items (abbreviated 19-item form) from a six-country study are listed in Table 4.1. By definition, the error items include items in which the driver makes a mistake; that is, these behaviors are not intentional, although the consequences can be serious. The error items mostly describe situations in which the driver makes an attention or perceptual error (“fail to see pedestrians crossing” and “miss ‘give way’ sign”) or vehicle handling error (“brake
TABLE 4.1 The Means of DBQ Items after Controlling the Effects of Age, Mileage, and Sex and ANOVA Results (F) in Finland (FIN), Great Britain (GB), Greece (GR), Iran (IRN), The Netherlands (NL), and Turkey (TR)
DBQ Items (Item No.)                                     FIN    GB     GR     IRN    NL     TR     F(7,1452)
Aggressive violations                                    0.78   0.86   1.66   1.33   0.67   1.20   40.69***
  Sound horn to indicate your annoyance (03)             1.00   1.29   2.39   1.75   1.07   1.89   40.73***
  Get angry, give chase (11)                             0.71   0.32   0.56   1.17   0.18   0.61   31.61***
  Aversion, indicate hostility (17)                      0.64   0.96   2.06   1.09   0.76   1.12   38.16***
Ordinary violations                                      1.21   1.20   0.88   1.21   1.19   0.94   11.33***
  Pull out, force your way out (06)                      0.34   0.99   0.62   0.79   0.54   0.58   16.77***
  Disregard the speed limit on a residential road (07)   2.51   1.69   1.18   2.12   1.88   1.44   29.52***
  Push in at last minute (12)                            0.49   0.60   0.47   1.15   0.73   0.64   15.81***
  Overtake a slow driver on the inside (13)              0.32   0.86   0.89   1.45   1.03   1.42   35.20***
  Race from lights (14)                                  1.35   1.31   1.04   0.84   1.66   0.83   17.03***
  Close following (15)                                   1.40   0.92   0.85   1.21   0.82   0.68   18.12***
  Shooting lights (16)                                   1.09   0.85   0.66   0.77   0.55   0.63   10.30***
  Disregard the speed limit on a motorway (19)           2.16   2.41   1.31   1.35   2.31   1.29   33.53***
Errors                                                   0.53   0.52   0.62   1.02   0.56   0.73   35.31***
  Queuing, nearly hit car in front (01)                  0.62   0.68   0.59   1.12   0.55   0.67   13.80***
  Fail to see pedestrians crossing (02)                  0.80   0.47   0.67   1.10   0.59   0.63   14.73***
  Fail to check your rearview mirror (04)                0.80   0.77   0.54   1.15   0.94   1.50   15.03***
  Brake too quickly on a slippery road (05)              0.59   0.69   0.67   0.83   0.66   0.83   3.39**
  Turning right nearly hit cyclist (08)                  0.22   0.30   0.51   0.95   0.39   0.45   28.32***
  Miss “give way” signs (09)                             0.26   0.25   0.60   0.86   0.32   0.47   22.89***
  Attempt to overtake someone turning left (10)          0.23   0.24   0.51   0.74   0.34   0.48   16.65***
  Underestimate the speed of an oncoming vehicle (18)    0.74   0.75   0.84   1.45   0.67   0.81   23.34***
*p < 0.05; **p < 0.01; ***p < 0.001.
Instruction: “How often, if at all, has this kind of thing happened to you?” Answer scale: 0, never; 1, hardly ever; 2, occasionally; 3, quite often; 4, frequently; 5, nearly all the time.
Source: Adapted from Özkan, Lajunen, Chliaoutakis, et al. (2006).
too quickly on a slippery road”). The other versions of the DBQ also include lapses, which largely involve failures of memory and are embarrassing but not dangerous (“forgot where you parked your car”). Hence, each of these error items requires the driver to (1) notice that he or she made an error, and (2) remember the error later when asked. The paradox in self-reports of attention or memory mistakes is that most of the errors go unnoticed. The DBQ lapses and error scales require a forgetful driver to recall his or her errors that he or she might not even have noticed. As Bjørnskau and Sagberg (2005) accurately noted, “Unconscious errors may be hard to remember precisely because they are unconscious” (p. 137). We can assume that drivers prone to cognitive failures and forgetting while
driving (e.g., elderly drivers and drivers with slight dementia) also have difficulties in remembering their mistakes when answering self-reports such as the DBQ. The DBQ is only one example; there are several even more demanding scales that measure the cognitive performance of the driver. The DSI (Lajunen & Summala, 1995) provides another example of measurement of driver skills (or errors) with self-reports. However, the DSI does not actually measure skills but, rather, measures a driver’s skill and safety orientation. Drivers are asked to evaluate themselves as drivers by indicating their strengths and weaknesses with regard to driving. Hence, the external criterion (compared to “other drivers”) or absolute criterion (frequency of
TABLE 4.2 Driver Skill Inventory (DSI) Items and Means for Finland (FIN), Sweden (SWE), Turkey (TR), and Greece (GR)
DSI Items (Item No.)                                        FIN    SWE    TR     GR     F(3,789–795)
Perceptual motor skills                                     3.73   4.07   3.84   3.90   5.46***
  Fluent driving (1)                                        3.50   3.76   3.86   3.87   8.75***
  Perceiving hazards in traffic (2)                         2.77   3.39   3.45   3.58   29.86***
  Managing the car through a skid (4)                       3.50   3.70   3.65   3.39   6.37***
  Predicting traffic situations ahead (5)                   3.38   3.75   3.87   3.62   14.11***
  Knowing how to act in particular traffic situations (6)   3.02   3.65   3.46   3.56   16.42***
  Fluent lane changing in heavy traffic (7)                 3.51   4.09   4.07   3.81   23.08***
  Controlling the vehicle (10)                              3.29   4.23   4.05   3.56   34.56***
  Make a hill start on a steep incline (13)                 3.19   3.93   3.93   3.68   31.30***
  Overtaking (14)                                           2.30   2.80   3.42   3.09   28.01***
  Reverse parking into a narrow gap (20)                    3.73   4.07   3.84   3.90   5.46***
Safety skills                                                                           2.53
  Driving behind a slow car without getting impatient (3)   3.11   2.39   3.05   2.79   14.94***
  Staying calm in irritating situations (9)                 3.29   3.13   3.32   3.34   1.36
  Keeping a sufficient following distance (11)              3.87   3.48   3.81   3.59   7.24***
  Conforming to the speed limits (16)                       3.60   2.75   3.59   3.30   25.38***
  Avoiding unnecessary risks (17)                           3.89   3.73   3.87   3.69
  Tolerating other drivers’ errors calmly (18)              3.15   2.96   2.90   3.58   20.02***
  Obeying the traffic lights carefully (19)                 4.04   4.30   4.18   4.15   2.77*
*p < 0.05; **p < 0.01; ***p < 0.001.
Instruction: “Please indicate your strong and weak components.” Scale: 0, definitely weak; 1, weak; 2, neither weak nor strong; 3, strong; 4, definitely strong.
Source: Data from Wallén Warner et al. (2010).
errors) was replaced by internal comparison. Table 4.2 describes the DSI perceptual motor skill and safety skill items as well as item means for Finland, Greece, Sweden, and Turkey. These items do not measure a driver’s absolute level of skills or proneness to safety behaviors but, rather, his or her orientation, that is, whether the respondent sees himself or herself as a skillful (in terms of perceptual motor skills) or a safe (rule obedient and risk avoidant) driver. In earlier studies, safety orientation (high scores on safety skills) was a strong negative predictor of accidents, whereas emphasis on perceptual motor skills was related to heightened accident risk. In Lajunen, Parker, and Stradling’s (1998) study, safety skills even buffered the effects of driver anger on violations. These comparisons of the DBQ and the DSI show that direct measurement of driver skills or errors is very difficult and possibly unreliable using self-report instruments. However, self-reports can be used reliably to measure a driver’s view or opinion about his or her skills. Although remembering individual
mistakes is difficult, drivers usually have a general idea of themselves as drivers, which can be measured by self-report instruments.
It was previously mentioned that the DBQ also measures violations (i.e., deliberate deviations from safe driving). Violations are committed to achieve some benefit in traffic, such as saving travel time (e.g., “overtake a slow driver on the inside”), to gain psychological satisfaction from feeling competent and capable (e.g., “race from lights” and “disregard the speed limit”), or to vent one’s anger (e.g., “sound horn to indicate your annoyance”). Whereas errors are unintended deviations from safe practices, violations are deliberate violations of a social norm, highway code, and/or safe practices. Asking “violating drivers” to honestly report their violations is the second paradox in the DBQ and self-reports in general: We assume that people who like to violate social norms in traffic would respect the honesty required in self-reports. It is more likely that drivers who do not respect general rules and norms in traffic would also not
be exactly honest when answering self-report surveys. The DSI again uses a different strategy for assessing a driver’s proneness to violations. Several safety skill items are actually reverse measures of proneness to violations (e.g., “staying calm in irritating situations,” “conforming to the speed limits,” and “avoiding unnecessary risks”). The positive phrasing of these items can be assumed to reduce the reluctance to answer the items honestly. In the DSI, the driver compares his or her perceptual motor skills and safety skills. The structure of the DSI encourages a driver to admit some “weaknesses” while he or she can excel in other aspects of driving. Whereas using self-reports as a direct measurement of behavior in the DBQ is problematic and possibly biased, the self-report is used adequately in the DSI.
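To make the scoring of the DBQ concrete, the sketch below averages a respondent's answers within the three scales of the abbreviated 19-item form. It is a minimal illustration in Python: the item groupings are read from Table 4.1, whereas scoring each scale as an item mean, the names DBQ_SCALES and score_dbq, and the example answers are assumptions made here, not a procedure specified in this chapter.

```python
from statistics import mean

# Item groupings as read from Table 4.1 (abbreviated 19-item DBQ).
DBQ_SCALES = {
    "aggressive_violations": [3, 11, 17],
    "ordinary_violations": [6, 7, 12, 13, 14, 15, 16, 19],
    "errors": [1, 2, 4, 5, 8, 9, 10, 18],
}


def score_dbq(responses):
    """Return the mean item score for each DBQ scale.

    `responses` maps item number (1-19) to an answer on the 0-5 scale
    (0 = never ... 5 = nearly all the time). Scoring by item mean is an
    assumption of this sketch.
    """
    for item, value in responses.items():
        if not 0 <= value <= 5:
            raise ValueError(f"item {item}: answer {value} is outside 0-5")
    return {
        scale: mean(responses[item] for item in items)
        for scale, items in DBQ_SCALES.items()
    }


# Hypothetical respondent: invented answers, not data from any study.
example = {item: 0 for item in range(1, 20)}
example.update({3: 2, 5: 1, 7: 3, 19: 3})
scores = score_dbq(example)
print(scores)
```

Under these assumptions the invented respondent scores 0.75 on ordinary violations and 0.125 on errors. Such scores summarize only what the driver reports; as argued above, they cannot correct for unnoticed errors or for less than honest reporting of violations.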
3. SELF-REPORTS OF ACCIDENTS, NEAR ACCIDENTS, AND MILEAGE
So far, it has been demonstrated that self-reports can be used to measure only certain aspects of driving. Although self-reports are not adequate measures of unconscious processes or norm violations that drivers would not like to report honestly, self-reports work well when recording driver attitudes, self-evaluated skills, and beliefs. Interestingly, most self-reported driver behaviors are evaluated in terms of how risky they are, that is, their potential to cause a serious accident. Hence, accident risk seems to be the ultimate criterion for driver behaviors; therefore, either the objective or the self-reported number of past accidents is used as the criterion for safe driving.
3.1. Self-Reports of Accident Involvement
In most studies concerning behavioral correlates of individual differences in road traffic accident risk, a driver’s accident history (the number and/or severity of accidents) has been used as a criterion for safety. These accident data are collected either from drivers’ self-reports or from official statistics. However, both are subject to systematic and random error and are therefore somewhat biased (Elander et al., 1993). The advantage of asking drivers to report accidents is that minor crashes can also be recorded. In addition, drivers’ self-reports are usually more detailed than official reports because drivers can be asked very specific questions. However, comparisons between self-reported and state-recorded numbers of accidents have shown some underreporting of accidents in self-reports. McGwin, Owsley, and Ball (1998) studied the agreement between self-reports and state records for identifying crash-involved older drivers. Results indicated that there was a moderate level of agreement between self-reported and state-recorded crash involvement. However, there were significant differences between crash-involved drivers
identified via state records and those who completed self-reports with respect to demographic (age and race), driving (annual mileage and days per week driven), and vision impairment. The authors concluded that research designed to identify risk factors for crash involvement among older drivers should carefully consider the issue of case definition, particularly if self-report is used to identify crash-involved older drivers.
Whereas McGwin et al.’s (1998) study investigated the agreement between self-reports of accidents and official records among older drivers, Boufous et al. (2010) studied the accuracy of self-report of on-road crashes and traffic offenses among 2991 young drivers in New South Wales, Australia. Participants completed the follow-up questionnaire in which they were asked if they had been involved in an on-road crash or had been convicted of a traffic offense while driving during the year prior to the survey. This information was linked to police crash data. Results showed a high level of accuracy in young drivers’ self-report of police-recorded crashes and of police-recorded traffic offenses. The authors concluded that surveys may be useful tools for estimating the incidence of on-road crashes and traffic offenses for young drivers. The difference between the results of the study by McGwin et al. and those by Boufous et al. may indicate that self-reports are less reliable among (even healthy) elderly drivers than among young drivers. Hence, the choice of accident recording method should also take into account the sample characteristics.
Self-reports of accidents are easily biased by intentional or unintentional misrepresentation (Elander et al., 1993). The latter source of bias can be caused either by a different definition of reportable accidents between drivers or by simple forgetting. In a study by Loftus (1993), 14% of people involved in road accidents leading to an injury did not remember the event a year later.
In a series of studies, Maycock and colleagues asked drivers to report accidents during the past 3 years and found that the forgetting rate was rather high at approximately 30% per year (Maycock, 1997). The forgetting rate was lower (14%) for serious accidents leading to injuries than for those accidents that resulted in only material damage. When the accidents were recalled over a 3-year period, more accidents were reported for the most recent part of the period than for the earlier part (Maycock, 1997). Based on these studies, Chapman and Underwood (2000) concluded that “there is reason to suspect that even severe accidents may be routinely forgotten by normal drivers over periods as short as a year” (p. 33). They also suggested that minor accidents are forgotten at much higher rates. Although self-reports of accidents seem to be more or less problematic and the degree of accuracy seems to depend on various factors (e.g., time period, age of the participants, and the way in which the survey is conducted), official
accident records from the police, hospitals, or insurance companies also seem to have many shortcomings. Forgetting, various definitions of accidents, or deliberate underreporting typical of self-report do not distort the official accident records, but official records have some other limitations. First, the police or insurance companies’ accident records do not include minor damages. Second, some driver groups, such as older drivers, can be overrepresented in these records for reasons not related to their risk of accident (Elander et al., 1993). For example, elderly drivers are overrepresented in the official accident statistics because they have a greater risk of being injured or dying compared to young drivers (Evans, 1991). The same statistical phenomenon caused by differential injury risk may be observed when males and females are compared (Evans, 1991).
Arthur et al. (2005) assessed the convergence of self-report and archival crash involvement and moving violations data in a 2-year longitudinal follow-up study. Results suggested a lack of convergence between self-report and archival data at both time 1 and time 2. Moreover, the self-report data included a broader range of incidents (more crashes and tickets) than did state records. Similar conclusions were drawn by Anstey, Wood, Caldwell, Kerr, and Lord (2009), who evaluated associations between self-reported crashes and state crash records among 488 community-dwelling participants aged 69 to 95 years. Crash history data were obtained from state records (5-year retrospective and 12-month prospective), retrospective self-report (5 years), and prospective monthly injury diaries (12 months).
As in the study by Arthur et al., respondents reported more accidents than were recorded in official records: During the past 5 years, 22.3% of respondents reported a crash, and 10.0% reported a crash in the 12-month follow-up period, whereas 3.2% of the sample had state crash records during the previous 5 years and 0.6% had state-recorded crashes during the 12-month follow-up period. The authors concluded that caution should be applied when using state crash records as an outcome measure in driving research and suggested that retrospective self-reported accidents over 5 years are preferable.
In addition to the age and gender bias as well as the omission of minor accidents, official accident records are not always available because of data protection acts and because the period for recorded accidents might be limited. Also, the type and role of the parties (e.g., culpability) in the accident are rarely available. Studies on self-reports of accidents and state records show that both recording methods have considerable shortcomings and strengths. In conclusion, it seems that the best self-report method for recording accidents is a retrospective self-report covering a maximum of 5 years. The shorter the reporting period, the smaller the underreporting bias due to forgetting should be. If possible, self-reports should be complemented with state accident records.
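The link between the length of the recall period and underreporting can be illustrated with a small calculation. The Python sketch below assumes, purely for illustration, that accidents are spread evenly over the recall period and that a constant proportion of them is forgotten each year. The default of 30% per year is the approximate rate reported by Maycock (1997); the constant-decay model and the function name recalled_fraction are assumptions that do not come from the studies cited.

```python
import math


def recalled_fraction(window_years, yearly_forgetting=0.30):
    """Expected share of accidents in the recall window that is still reported.

    Assumes (hypothetically) that accidents occur evenly over the window and
    that the same proportion is forgotten every year, so an accident that
    happened t years ago is still recalled with probability
    (1 - yearly_forgetting) ** t. The result is the average of that
    probability over the whole window.
    """
    if window_years <= 0:
        raise ValueError("window_years must be positive")
    if not 0 <= yearly_forgetting < 1:
        raise ValueError("yearly_forgetting must be in [0, 1)")
    retention = 1.0 - yearly_forgetting
    if retention == 1.0:
        return 1.0  # nothing is forgotten
    decay = -math.log(retention)  # continuous decay rate per year
    return (1.0 - retention ** window_years) / (decay * window_years)


for years in (1, 3, 5):
    print(years, round(recalled_fraction(years), 2))
```

With these assumptions, about 84% of the accidents in a 1-year window would still be reported, about 61% in a 3-year window, and about 47% in a 5-year window. The figures are only as good as the assumed decay model, but they show why shorter reporting periods should suffer less from forgetting.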
3.2. Self-Reported Near Accidents as a Criterion for Safe Driving
Because the forgetting rate increases as the time period for recall becomes longer and when minor accidents are asked for, we could suppose that self-reports of serious accidents over a short period, such as the previous year, would be the best estimate. However, accidents and especially serious accidents are infrequent. This means that samples in accident studies must be enormous, which is impossible for most studies. One approach to this problem is to record near accidents, which are extremely frequent events (Chapman & Underwood, 2000) and may lead to a crash in less optimal conditions. By definition, near accidents can be assumed to be a good candidate for a general measure of safety. Chapman and Underwood (2000) compared reports and recalls of more than 7000 car journeys from 80 subjects during the course of a year. These records included more than 400 reports of near accidents. The results showed that near accidents are generally forgotten extremely rapidly, with an estimated 80% of incidents being no longer reported after a delay of up to 2 weeks. If the near accident was a serious one or the respondent was the guilty party, the forgetting rate was lower. Chapman and Underwood’s study shows that although near accidents are very frequent and, thus, easier to use in statistical analyses than actual accidents, the high forgetting rate makes them an unreliable measure of risk. Near accidents can best be recorded by using a driving diary technique, in which drivers keep a log of their near accidents for a number of weeks. Because near accidents are recorded as soon as possible after they happen, the forgetting rate remains low, whereas the number of near accidents is much higher.
3.3. Self-Reports of Mileage and Driving Experience Mileage is one of the main factors measured in studies on driver behavior. Mileage as total lifetime mileage or annual mileage and type of exposure are extremely important variables because mileage (and exposure) largely determines a driver’s likelihood of being involved in an accident. Moreover, particular driver characteristics may lead only to particular forms of driving behavior, which may thereby affect liability only for accidents of certain types (Elander et al., 1993).
3.3.1. Mileage, Exposure, and Accident Involvement Overall mileage has been consistently reported to be associated with accident involvement (French, West, Elander, & Wilding, 1993; Quimby & Watts, 1981). This effect may be largely caused simply by greater exposure,
indicating that people who spend much time on the road are also exposed to the danger of having accidents more than people with low annual mileage (Summala, 1996). In addition to this simple effect on accident involvement based on exposure, the overall mileage has much more complicated effects on driving style and safety. It has been found that the relationship between mileage and accidents is not linear but, rather, a negatively accelerating curve, with smaller increases in accident rate at a higher level of mileage (Maycock, 1997). In the beginning of a driver’s career, high mileage means increased exposure and increased accident involvement, but for experienced drivers the higher levels of mileage do not increase the number of accidents in the same ratio (Evans, 1991; Maycock, 1997). The most likely explanation for this finding is that drivers with very high mileage per year drive mostly on relatively safe roads (Elander et al., 1993). People who drive low mileages tend to accumulate much of their mileage on congested city streets with two-way traffic and no restriction of access, whereas high-mileage drivers typically accumulate most of their mileage on freeways or other divided multilane highways with limited access. Because the driving task is simpler, the accident rate per mile is much lower on freeways, and beyond a certain point, a person driving half as many miles as another would be expected to have considerably more than half as many accidents (Janke, 1991). In addition, drivers with exceptionally high overall mileage may have developed expertise based on both extensive practice and interest in driving. It is even possible that drivers with very high annual mileage adopt a safer driving style than moderately experienced drivers, although mileage has been reported to correlate with faster driving (Wilson & Greensmith, 1983). The effects of mileage on driving style and accidents depend largely on the type of exposure. 
The type of roads usually driven (motorways vs. country roads, and traffic density), time of year (winter vs. summer), time of the day spent on the road (daylight vs. night), and usual purpose of driving (work related vs. free time) all affect the likelihood of accident involvement and driving style. Special effort should therefore be directed to measure these factors related to exposure in studies on driver style and accident involvement because exposure may be related to psychological variables under investigation. Ignoring the role of driving experience and exposure can increase error variance and reduce the true associations between psychological variables and accident frequency (Elander et al., 1993).
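The negatively accelerating relationship between mileage and accidents can be illustrated with a simple power function. The following is only a minimal sketch: the functional form (expected accidents = a × mileage^b with 0 < b < 1) and the values of `a` and `b` are assumptions chosen for illustration and are not estimates taken from Maycock (1997) or Janke (1991).

```python
def expected_accidents(annual_km, a=0.01, b=0.5):
    """Hypothetical expected accidents per year for a given annual mileage.

    a and b are illustrative values only; any 0 < b < 1 produces a
    negatively accelerating curve.
    """
    return a * annual_km ** b


def accidents_per_km(annual_km, a=0.01, b=0.5):
    """Accident rate per kilometre under the same hypothetical curve."""
    return expected_accidents(annual_km, a, b) / annual_km


low, high = 10_000, 20_000  # one driver covers half the distance of the other
ratio = expected_accidents(low) / expected_accidents(high)

print(f"accidents of low-mileage driver relative to high-mileage driver: {ratio:.3f}")
print(f"per-km rate, low mileage:  {accidents_per_km(low):.6f}")
print(f"per-km rate, high mileage: {accidents_per_km(high):.6f}")
```

Under these assumed parameters, the driver with half the mileage is expected to have about 71% (not 50%) of the accidents of the other driver, and therefore a higher accident rate per kilometre, which is the pattern described above.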
3.3.2. Measuring Mileage with Self-Reports In driver behavior studies, mileage is traditionally measured with either driving diaries or questionnaires, which are both based on self-reports. Self-reports of mileage share to some degree the same problems and bias sources as self-reports of
accidents. The accuracy of self-reports has been questioned in several studies, and in-vehicle recording devices have been suggested as an alternative to self-reports (Blanchard, Myers, & Porter, 2009). Although in-car recordings certainly provide a more unbiased and accurate measure of exposure than self-reports, their usefulness is still limited to certain types of studies. For example, large-scale population studies do not allow direct recordings of driving frequency and amount. Moreover, many studies require anonymous participation; thus, direct measurements cannot be considered. Although mileage self-reports share some of the problems with self-reports of accidents (e.g., drivers forget to report some trips), self-reports of mileage are less biased than self-reports of accidents for several reasons. An accident is a single event, but the estimation of mileage is continuous, which makes it naturally easier to evaluate. Whereas accidents can be embarrassing or even traumatic events that one would like to forget, mileage is a rather neutral issue. Moreover, there are more techniques for reporting exposure than accident involvement: Drivers can be asked to report their lifetime mileage, mileage during a certain period of time (year, month, week, or day), frequency of driving in days or number of trips, in addition to the type of exposure (e.g., in-city traffic, highways, and night driving). Drivers can also relate their lifetime mileage to other more objective measures, such as ownership and frequency of car use. Smith and Wood (1977) performed a study in which employees were asked to recall the individual business journeys that they had made during the previous 6 or 8 months, and then these recalls were compared with the expense claims submitted during the period. During 6 months, 27.3% of car journeys appeared to be forgotten, and this percentage increased to 34.8% for the 8-month period.
Note that although frequency of trips is an estimate for mileage, it might be more vulnerable to forgetting than self-reports of annual or monthly mileage because drivers are more likely to forget an individual trip than how much they drive in general. Staplin, Gish, and Joyce (2008) and Langford, Koppel, McCarthy, and Srinivasan (2008) studied the association between the extent of driving and crash involvement: The lower the annual mileage driven, the higher the per-distance crash rate. According to these studies, there is a clear pattern of misestimating for those who self-report an extremely low or extremely high number of miles driven, which casts serious doubt on self-reports, especially when using extreme estimates of mileage. Both studies demonstrated overestimation by the highest mileage drivers and underestimation by the lowest mileage drivers, with underestimation also linked specifically to a travel pattern composed of frequent but short trips. In both studies, the need for objective exposures instead of self-reports of exposure was highlighted.
Studies seem to indicate that self-report bias in mileage estimates can be best solved by not using self-reports but, rather, relying on, for example, GPS-based in-car recordings. Because this is not possible in many survey studies, the best remedy for reducing self-report bias is to use as many estimates of mileage as possible. For example, total annual or monthly mileage, frequency of driving (number of trips per time unit), or the proportion of long trips can be asked, and the final estimates can be based on several self-report indicators.
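One possible way of basing the final estimate on several self-report indicators is to standardize each indicator and average the standardized scores. The sketch below assumes equal (unit) weighting of the indicators; the indicator names and the four respondents' answers are invented for illustration and do not come from any study cited here.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of numbers to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("indicator has no variance")
    return [(v - m) / s for v in values]


def composite_exposure(indicators):
    """Average the z-scores of several indicators for each respondent.

    indicators maps a (hypothetical) indicator name to a list of values,
    one per respondent, always in the same respondent order.
    """
    standardized = [z_scores(vals) for vals in indicators.values()]
    return [mean(per_person) for per_person in zip(*standardized)]


# Made-up answers from four respondents (not real data).
survey = {
    "annual_km": [5_000, 12_000, 20_000, 40_000],
    "trips_per_week": [4, 10, 12, 14],
    "share_long_trips": [0.05, 0.10, 0.30, 0.50],
}
index = composite_exposure(survey)

for person, score in enumerate(index, start=1):
    print(f"respondent {person}: composite exposure index = {score:+.2f}")
```

Because the composite is expressed in standard deviation units, it ranks respondents by exposure rather than giving a distance in kilometres; whether equal weighting is appropriate depends on how reliable each indicator is believed to be.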
4. VALIDITY OF SELF-REPORTS OF DRIVING Validity simply means the extent to which a test (or a self-report) measures what it is intended to measure. Validation strategies are traditionally divided into techniques for evaluating the validity of measurement and techniques for evaluating the validity of decisions based on the test scores. The first approach to validation includes content and construct validation strategies, which are aimed at investigating if the contents of the test are adequate and if the correlations with other similar tests and measurements indicate that the structure of the test is supported by empirical evidence. The second group of strategies provides tools for investigating criterion-related validity, that is, the correlations between the test score and the criterion. Self-report measurements in traffic research do not differ from the usual psychological and educational measurements; thus, the same criteria can be used for assessing the measurement validity. Before validity can be evaluated, the test scores have to show a certain level of reliability. Reliability refers to consistency of measurement and can be evaluated using different strategies, including split-half and alpha reliability for evaluating internal consistency and test-retest (including parallel form reliability) for evaluating the stability of the scores over time. Because reliability is a prerequisite for validity, reliability analysis has to be completed successfully before any validity coefficients can be calculated. Because test reliability depends on the length of the test (number of items), quality of items, and the sample and is thus a very technical characteristic of the test, tests published in international peer-reviewed journals usually show adequate reliability.
4.1. Assessing the Reliability and Validity of a Self-Report Instrument of Driving: The Driver Behavior Questionnaire The DBQ (Reason et al., 1990) provides a good example for demonstrating how a test has been validated in traffic research. Two studies by Özkan and colleagues (Özkan et al., 2006; Özkan, Lajunen, & Summala, 2006) are used to demonstrate how reliability and validity of a self-report instrument can be assessed.
4.1.1. Reliability Virtually all DBQ studies report internal reliability coefficients for the DBQ scales. The general finding is that both violations and error scales show adequate reliabilities (internal reliability coefficients are usually ~0.80). When the error scale is split into dangerous errors and nondangerous “silly” mistakes, and the violations scale is split into ordinary violations and aggressive violations, the reliability coefficients tend to be somewhat lower. For example, in Özkan, Lajunen, and Summala’s (2006) study, the alpha reliabilities for errors and violations were 0.84 and 0.83, respectively. When errors were divided into mistakes and lapses and violations were divided into ordinary violations and aggressive violations, the alpha reliabilities were 0.81 for mistakes, 0.67 for lapses, 0.79 for ordinary violations, and 0.74 for aggressive violations (Özkan, Lajunen, & Summala, 2006). This slight decrease in alpha coefficients does not indicate unreliability but, rather, reflects the fact that the number of the items in the scale is directly related to the strength of the reliability coefficient. In addition to internal consistency (alpha coefficient or split-half reliability), the temporal stability of the scores indicates reliability of the instrument. Stability of the scores over time can be evaluated by calculating a test-retest reliability coefficient, that is, the correlation between time 1 and time 2 scores. High correlation between scores indicates high temporal stability. Because test-retest reliability analysis requires testing the same drivers twice, studies reporting retest reliabilities are rare, especially when the time gap between measurements is long. Parker et al. (1995) conducted a survey in which 1600 drivers’ DBQ responses were analyzed. Seven months after their original responses, 80 respondents completed the DBQ again.
Test-retest reliabilities were 0.69 for errors, 0.81 for violations, and 0.75 for lapses, which indicate relatively high reliability over time (Parker et al., 1995). The retest sample in the study by Parker et al. (1995) was small. Later, Özkan, Lajunen, and Summala (2006) assessed test-retest reliability of the DBQ in a sample of 622 drivers. The time gap between the measurements was 3 years, which reduces the probability of respondents remembering their initial answers to the DBQ items. The test-retest reliability was 0.50 for errors, 0.76 for violations, and 0.61 for the whole scale (Özkan, Lajunen, & Summala, 2006). The following conclusions based on Parker et al. (1995) and Özkan, Lajunen, Chliaoutakis, et al. (2006) can be drawn. First, DBQ scales and especially
violations show sufficient temporal stability. Second, the error scale shows much lower test-retest stability than the violations scale. This finding may not indicate problems in the temporal stability of the error scale because it can be assumed that among novice drivers the error scores should decrease as a function of mileage: The more young drivers drive and gain experience, the less likely they are to make errors. On the other hand, the opposite might be true for elderly drivers: The older the drivers become, the more frequently cognition-related errors occur. Moreover, the difference in stability scores supports the distinction between driving style (the way drivers tend to drive) and driving skills: Driver performance (lack of errors) can improve, but driving style (violations) as a habitual way of driving stays the same. In addition to the number and quality of the items, reliability coefficients also depend on the sample characteristics. The importance of sample characteristics can be especially seen when the reliability of a self-report instrument is studied among different driver groups (e.g., novices, elderly drivers, and professionals) in one culture or when similar samples from different countries and cultures are compared. Özkan, Lajunen, Chliaoutakis, et al. (2006) investigated the applicability of the DBQ in six different countries (Finland, Great Britain, Greece, Iran, The Netherlands, and Turkey). A total of 242 drivers were chosen from each of the six countries and were matched for age and sex. Reliabilities were compared to the original British data that were used for developing the DBQ. According to the results, reliabilities of the scales were at the same level as in the original British data. This finding demonstrates the cross-cultural applicability of the DBQ.
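The reliability coefficients discussed in this section can be computed with a few lines of code. The following sketch shows Cronbach's alpha, the Spearman-Brown formula (which expresses how reliability changes with test length, assuming the added or removed items are parallel to the existing ones), and a test-retest coefficient as a Pearson correlation. The response data are invented, and halving the scale in the Spearman-Brown example is a hypothetical scenario: only the starting value of 0.84 is taken from the text, and the result is not a statement about the actual number of items in any DBQ subscale.

```python
from statistics import mean, pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; rows are respondents, columns are items."""
    k = len(item_scores[0])
    items = list(zip(*item_scores))              # one tuple per item
    totals = [sum(row) for row in item_scores]   # scale score per respondent
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


def pearson_r(x, y):
    """Pearson correlation, used here as a test-retest coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Made-up responses of five drivers to a four-item scale (not real data).
responses = [[1, 2, 1, 2], [2, 2, 3, 3], [3, 4, 3, 4], [4, 5, 5, 4], [5, 5, 4, 5]]
alpha = cronbach_alpha(responses)

# Hypothetical: a scale with alpha = 0.84 is cut to half its length.
shorter = spearman_brown(0.84, 0.5)

# Made-up scale scores of the same five drivers at two measurement times.
time1 = [1.5, 2.5, 3.5, 4.5, 4.75]
time2 = [2.0, 2.25, 3.0, 4.5, 4.0]
retest = pearson_r(time1, time2)

print(f"alpha = {alpha:.2f}, halved-scale reliability = {shorter:.2f}, retest r = {retest:.2f}")
```

With these assumptions, halving a scale with a reliability of 0.84 gives a predicted reliability of about 0.72, which illustrates why shorter subscales tend to show lower alpha coefficients even when the items are of similar quality.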
4.1.2. Content Validity After determining that the instrument has reasonable reliability coefficients in the study population, the next step is to evaluate the content and construct validity of the instrument. Content validity is demonstrated when we can say that the test provides an adequate sample of a particular domain (Guion, 1977). “Adequate sample” means that test items in a self-report test cover all the relevant topics in the domain without including items that belong to another domain. For example, DBQ ordinary (nonaggressive) violations should cover all the most common and thus representative behaviors of “deliberate deviations from those practices believed necessary to maintain the safe operation of a potentially hazardous system” but not violations with aggressive motivation because those behaviors belong to the DBQ aggressive violations scale. This distinction is not always clear because some behaviors, such as “close following” or “forcing one’s way,” might contain an aggressive motivation but can also be violations without aggressive content.
The procedure for assessing content validity usually consists of the following three steps: (1) Describe the content domain (e.g., aberrant driving behavior), (2) determine the area of the domain (e.g., type of aberrant behavior) measured by each item (e.g., “disregard the speed limit on a residential road” measures highway code violations without aggressive aims, and “miss a ‘give way’ sign” measures potentially dangerous “errors”), and (3) compare the structure of the self-report instrument with the structure of the content domain. Hence, content validity cannot be evaluated numerically by using an index, but the process is based on qualitative comparisons between the theoretical model and contents of the instrument in which self-report test items (questions) as measurements are compared to theoretical constructs. This complete system of relationships among the constructs and behaviors is called a nomological network (Cronbach & Meehl, 1955). Such detailed content analysis has not been used for evaluating the DBQ, so the question about the content validity of the DBQ is still open. We can only state that the DBQ typology of dangerous but unintentional errors and deliberate violations seems to match well with the driver behavior performance (or style skills) model presented in Figure 4.1. This can be considered as support for content validity.
4.1.3. Construct Validity: Internal and External Validity Whereas content validity indicates the degree to which a self-report instrument “looks as it should,” construct validity shows to what degree the instrument “performs as it should.” Content validity analysis is based on qualitative comparisons, whereas construct validity is established when the correlation pattern between the instrument and other relevant measures is as theoretically expected. Hence, the test shows high construct validity when its items provide a good measure of the specific construct. Construct validity includes both convergent (test correlates with other tests measuring the same construct) and discriminant validity (test does not correlate with theoretically unrelated measures). Construct validity includes both the validity of the factor structure and relationships to other tests. The validity of the structure of the scale is usually assessed by using either exploratory or confirmatory factor analysis. Another possibility to determine if the test measures the construct concerned is to perform an experimental study in which the same aspect as supposedly measured by the test is experimentally manipulated. For example, driver skill training should increase drivers’ skill scores in the DSI (Lajunen & Summala, 1995) but not safety skills. Construct validity is most often evaluated by investigating a test’s correlation to earlier similar instruments. High correlations in the expected direction indicate construct validity. The most
advanced way of evaluating construct validity is the multitrait-multimethod approach (Campbell & Fiske, 1959). In this method, all constructs (both those that should be related and those that should not) are measured with different measurement methods. In driver behavior, this could mean self-reports, peer opinions (e.g., spouse), in-car measurements in real traffic, and simulator recordings. If the measurements of the same construct by using different methods correlate, this indicates construct validity. On the other hand, measurements of different unrelated constructs with the same methodology should not correlate. Correlations between unrelated constructs indicate method bias. Development of the Driver Social Desirability Scale (DSDS) (Lajunen, Corry, Summala, & Hartley, 1997) demonstrates how construct validity can be evaluated by using exploratory factor analysis for testing the structural validity of the scale and correlations with existing standard instruments for testing convergent validity. According to Lajunen et al., following the theory of Paulhus (1984), social desirability consists of two components: self-deception and impression management. Lajunen and colleagues developed a traffic-specific social desirability scale (DSDS) to control self-deception and impression management in self-reports of driving. Data were collected
both in Australia and in Finland to minimize cultural bias. The results of exploratory factor analysis showed that items grouped into two independent factors as hypothesized, which can be taken as an indication of construct validity of the structure (Table 4.3). Second, construct validity of the newly developed DSDS was evaluated by calculating correlations with the Balanced Inventory of Desirable Responding (BIDR) (Paulhus & Reid, 1991), which was used as a model for developing DSDS. Table 4.4 shows that the Driver Impression Management (DIM) scale had stronger correlations with the BIDR Impression Management (IM) scale than with the BIDR Self-Deception (SD) scale. Moreover, Driver Self-Deception (DSD) had stronger correlations with BIDR-SD than with BIDR-IM. This was especially true for the Finnish data. These correlations indicate an adequate level of convergent and discriminant validity for the DSDS scale. The factor structure of the DBQ has been well studied (Özkan, Lajunen, Chliaoutakis, et al., 2006; Özkan, Lajunen, & Summala, 2006). Based on comparisons between matched samples (by age and sex) from six countries (Finland, Greece, Iran, The Netherlands, United Kingdom, and Turkey) using confirmatory factor analysis,
TABLE 4.3 Factor Loadings for the Driver Social Desirability Scale Items in Australian (A) and Finnish (F) Samples

Item | F1 (A) | F1 (F) | F2 (A) | F2 (F)
Driver Impression Management (DIM)
I have never exceeded the speed limit. | 0.49 | 0.65 | |
I have never wanted to drive very fast. | 0.55 | 0.56 | |
I have never driven through a traffic light when it has just been turning red. | 0.45 | 0.63 | |
I always obey traffic rules, even if I’m unlikely to be caught. | 0.71 | 0.67 | |
I always keep sufficient distance from the car in front of my car. | 0.34 | 0.47 | |
If there were no police control, I would still obey speed limits. | 0.72 | 0.77 | |
I have never exceeded the speed limit or crossed a solid white line in the center of the road when overtaking. | 0.42 | 0.55 | |
Driver Self-Deception (DSD)
I always know what to do in traffic situations. | | | 0.60 | 0.65
I never regret my decisions in traffic. | | | 0.60 | 0.60
I don’t care what other drivers think of me. | | | 0.33 | 0.31
I always am sure how to act in traffic situations. | | | 0.91 | 0.77
I always remain calm and rational in traffic. | 0.35 [F1; sample column unclear in source] | | 0.61 | 0.59

Instruction: “The following items concern your driving in different situations. Please express your agreement or disagreement with each statement, selecting a number from the scale.” Answer scale: 7-point scale ranging from “not true” (1) to “quite true” (4) and “very true” (7). Source: Adapted from Lajunen et al. (1997).
TABLE 4.4 Correlation Coefficients between Driver Social Desirability Scales (DSDS) and Paulhus’ Balanced Inventory of Desirable Responding (BIDR) in the Finnish (FIN) and Australian (AUS) Data (a)

Scale | Sample | DSDS-DIM | DSDS-DSD | DSDS | BIDR-IM | BIDR-SD
DSDS-DSD | AUS | 0.21** | | | |
DSDS-DSD | FIN | 0.17* | | | |
DSDS | AUS | 0.86*** | 0.61*** | | |
DSDS | FIN | 0.85*** | 0.67*** | | |
BIDR-IM | AUS | 0.54*** | 0.51*** | 0.63*** | |
BIDR-IM | FIN | 0.48*** | 0.05 | 0.38*** | |
BIDR-SD | AUS | 0.16* | 0.47*** | 0.34*** | 0.44*** |
BIDR-SD | FIN | 0.16* | 0.42*** | 0.34*** | 0.31*** |
BIDR | AUS | 0.30*** | 0.43*** | 0.39*** | 0.66*** | 0.65***
BIDR | FIN | 0.42*** | 0.25*** | 0.45*** | 0.88*** | 0.73***

(a) Correlation coefficients indicating convergent validity are shown in boldface type, and correlations indicating discriminant validity are underlined. *p < 0.05; **p < 0.01; ***p < 0.001.
Özkan and colleagues concluded that the fit of the three-factor model (aggressive violations, ordinary violations, and errors) of the DBQ was partially satisfactory in each country. Exploratory factor analyses together with target (Procrustes) rotation and factorial agreement indexes showed that the “ordinary violations” factor was fully congruent and the “errors” factor was fairly congruent across countries (Özkan, Lajunen, Chliaoutakis, et al., 2006). The two-factor structure based on violations and errors seems to be universally valid and has been found in different driver groups, including professional drivers, motorcycle drivers, offenders, probationary drivers, parent-child pairs, young women, and older drivers (de Winter & Dodou, 2010). Although the DBQ yields slightly different factor structures in different countries, the core structure of the instrument seems to be stable, showing high construct validity. In addition to the factorial validity, the construct validity of the DBQ in terms of other tests has been investigated. de Winter and Dodou (2010) concluded that “the DBQ errors and violations factors are strongly situated in a network of correlations with other questionnaires and tests” (p. 463) and listed a great variety of factors to which the DBQ scale scores have been related. Although the
correlations between the DBQ and the variety of indicators are impressive, they still do not prove that the DBQ is a valid instrument. For example, correlations to other driving questionnaires might be based on method bias: The items in different driving inventories are often very similar even though the names of the scales suggest different contents. The most robust validation evidence for the DBQ would be the correlation between self-reported driver behavior (i.e., errors and violations) and the same behavior measured during real driving or in a simulator. Despite the large number of DBQ studies, there are no studies in which the DBQ has been systematically compared with errors and violations in real driving, which naturally casts doubts on the construct validity of the DBQ (af Wåhlberg, 2010). The lack of robust (i.e., from other types of measurements besides self-reports) validation evidence is not only a problem specific to the DBQ but also applies to many self-reports of driving. One reason for the lack of validation studies might be the practical difficulties: Driver samples in the in-car observation studies have usually been quite small, whereas questionnaire studies usually require large samples. Another problem is that a short drive with an instrumented car is unlikely to capture differences between drivers that, for example, the DBQ is designed to measure.
For example, aggressive or even ordinary violations are unlikely to be observed in study conditions. Moreover, the DBQ measures the frequency of aberrant behaviors in a year’s time, whereas a test drive with an instrumented car lasts only a few hours. Therefore, it is not surprising that self-reports of driving lack evidence of convergent validity. West, French, Kemp, and Elander (1993) compared self-reports of speed, calmness, and deviant driver behavior to similar observed behaviors among 48 drivers. According to the results, observed speed on the motorway correlated well with drivers’ self-reports of normal driving speed, observed calmness correlated with self-reported calmness, and observed carefulness correlated with self-reported deviant driving behavior. Hence, in this study, self-reports seem to reflect driver behavior well. Haglund and Åberg (2002) compared self-reported speed measures and observed speed among 533 drivers. The observed speed had a significant correlation of 0.36 with self-reported speed, 0.37 with normal speed, and -0.42 with intention to keep speed limit. Although these correlations are mediocre, they are still statistically significant and indicate that self-reports show some construct validity compared to objective observations.
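Why a mediocre correlation can nevertheless be highly significant in a sample of this size can be shown with a short calculation. The sketch below uses the reported values r = 0.36 and n = 533 and assumes, for illustration, that the correlation was computed on the full sample; the p-value relies on a normal approximation to the t distribution, which is a simplification that is reasonable only for large samples.

```python
import math


def correlation_t(r, n):
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def two_sided_p_normal(t):
    """Two-sided p-value from the normal approximation (large samples only)."""
    return math.erfc(abs(t) / math.sqrt(2))


r, n = 0.36, 533          # values reported in the text; full-sample n is assumed
t = correlation_t(r, n)
p = two_sided_p_normal(t)
shared = r ** 2           # proportion of variance shared by the two measures

print(f"t = {t:.2f}, approximate p = {p:.1e}, shared variance = {shared:.1%}")
```

The test statistic is close to 8.9, far beyond conventional significance thresholds, yet the two measures share only about 13% of their variance. Statistical significance and the strength of the association are therefore separate questions when construct validity is judged.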
4.1.4. Criterion-Related Validity: Concurrent and Predictive Validity Concurrent and predictive validity refer to validation strategies in which the predictive value of the test score is evaluated by validating it against a certain criterion. In the case of driver behavior, the most used criterion is a driver’s accident involvement. Hence, a self-report of driving shows validity if it is related to (or, preferably, predicts) accident involvement. In concurrent validation, the test scores and criterion variable are measured simultaneously. In predictive validation, the test scores are obtained at time 1 and the criterion scores at time 2, which allows one to evaluate the true predictive power of the self-report instrument. One of the strengths of the DBQ, especially for violations, is that it has strong correlations with drivers’ accident involvement (Özkan, Lajunen, Chliaoutakis, et al., 2006; Özkan, Lajunen, & Summala, 2006). The results of de Winter and Dodou’s (2010) meta-analytical study showed that both DBQ violations and errors correlated with self-reported accidents. However, the correlations of errors and violations with recorded accidents were not statistically significant, although this might be due to the small number of samples included in the meta-analysis. The meta-analysis also showed that errors and violations correlated negatively with age and positively with exposure, and that males reported fewer errors and more violations than females, which are all common findings in the DBQ literature. In addition to retrospective design, de Winter and Dodou investigated the DBQ and self-reported accidents
prospectively in a sample of 10,000 beginner drivers, who answered the DBQ after 6, 12, 24, and 36 months of licensure. The results of this study showed that the error and violation factors predicted accidents prospectively and retrospectively. Because de Winter and Dodou’s meta-analysis included a sample of more than 45,000 respondents and the prospective sample was also large, it can be concluded that the DBQ shows relatively high predictive validity in terms of self-reported accidents. Due to the small number of studies that have used official accident records as a criterion, the predictive validity of the DBQ in terms of officially recorded accidents is still unclear.
4.2. Socially Desirable Responding in Self-Reports of Driving We previously mentioned forgetting as one source of bias influencing accident and near accident self-reports. Forgetting can also naturally influence self-reports of driving behavior such as violations, but because driving behavior refers to how a driver chooses to drive (that is, to driving style in everyday situations), forgetting should have only a minor role in self-reports of driving behavior. Whereas errors are failures of cognitive processes (e.g., forgetting to check the rearview mirror when overtaking) and thus usually go unnoticed, driving behaviors are something that drivers choose to do and that they do repeatedly (e.g., speed choice). We can assume that self-reports of driving behavior and especially self-reports of aberrant driver behaviors are influenced by socially desirable responding rather than forgetting.
4.2.1. Socially Desirable Responding as Impression Management and Self-Deception Social psychological studies have shown that self-reports of personality, attitudes, and behavior are inaccurate or even biased to some degree because at least some subjects tend to engage in socially desirable responding, that is, a tendency to give answers that make the respondent look good (Paulhus, 1984; Paulhus & Reid, 1991). Consistent with these findings, it can be assumed that self-reports of driving behavior and driver attitudes are somewhat biased by socially desirable responding. A large number of studies using different measures of socially desirable responding have shown that it consists of two distinct factors called “impression management” (or “other-deception”) and “self-deception” (Paulhus, 1984; Paulhus & Reid, 1991). In this distinction, impression management refers to the deliberate tendency to give favorable self-descriptions to others and therefore is close to lying and falsification (Paulhus & Reid, 1991). The deliberate nature of impression management is also manifested in the finding that impression management increases in public settings and
seems to be a situation-dependent phenomenon (Paulhus, 1984). It has been recommended that impression management should be controlled when it is conceptually independent of the trait being assessed but still contributes to the self-reported scores of that trait (Paulhus, 1984; Paulhus & Reid, 1991). The self-deception factor can be characterized as a positively biased but subjectively honest self-description. In contrast to impression management, self-deception is an unintentional tendency and not influenced by the anonymity versus public context manipulation (Paulhus, 1984). Unlike impression management, self-deception has been reported to be intrinsically linked to positive personality constructs such as psychological adjustment (Sackeim & Gur, 1979; Taylor & Brown, 1988), high self-esteem (Paulhus & Reid, 1991), and lack of neuroticism (Borkenau & Ostendorf, 1989; Paulhus & Reid, 1991). These findings suggest that self-deception could be used for the purposes of gaining pleasure (ego enhancement) as well as avoiding pain (denial) and therefore provides an aid for coping with negative life events and threatening information (Paulhus, 1984; Paulhus & Reid, 1991). Self-deception appears to be more of a personality construct than simply a distorting factor. Paulhus (Paulhus, 1984; Paulhus & Reid, 1991) suggests that self-deception should not be controlled if it is an intrinsic aspect of the personality construct concerned. Note, however, that the bias caused by self-deception may not be constant over situations. Some situations can be more threatening than others and, therefore, elicit stronger need for self-deception.
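One statistical way of controlling impression management, when such control is judged appropriate, is to partial its scores out of the correlation between two self-reported variables. The sketch below uses the standard first-order partial correlation formula; the three correlation values and the variable labels are hypothetical and are not taken from any of the studies cited in this chapter.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical correlations (invented for illustration):
# x = self-reported violations, y = self-reported accidents,
# z = impression management score.
r_xy, r_xz, r_yz = 0.30, -0.40, -0.25
adjusted = partial_correlation(r_xy, r_xz, r_yz)

print(f"zero-order r = {r_xy:.2f}, partial r controlling for z = {adjusted:.2f}")
```

In this invented example, the correlation between self-reported violations and accidents drops from 0.30 to about 0.23 once impression management is held constant, which would suggest that part of the zero-order association reflects a shared tendency to underreport. Following the recommendation above, the same adjustment would not be applied to self-deception when it is an intrinsic part of the construct being measured.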
4.2.2. Socially Desirable Responding in Self-Reports of Driving Behavior
In self-reports of traffic behavior, impression management may cause serious bias. The majority of studies concerning personality and motivational factors related to accident proneness use retrospective designs in which accident history and punishments are elicited by self-report and then correlated with personality and background variables. It can be hypothesized that this kind of design is extremely liable to deliberate impression management. In fact, earlier findings show that drivers tend to report speeding tickets honestly but “forget” their involvement in other types of traffic violations (Summala & Hietamaki, 1984). In addition to impression management, the construct of self-deception can also be hypothesized as an important factor in driving behavior. Drivers’ sense of control in traffic and trust in their own capabilities as drivers also increase with driving experience and improvement in skills. An exaggerated sense of control and confidence in one’s judgment and skills constitutes a real risk factor in traffic, where proper alertness and anticipation of possible risks are essential for safety (Summala, 1988).
Lajunen and colleagues (1997) investigated the relationship between their DSDS and self-reported accidents (as victim and as responsible party), number of tickets, speeding (on 100 km/h [62 mph] roads, on 60 km/h [37 mph] roads, and in general), overtaking, rule compliance, and the Driver Behavior Inventory scales (Glendon et al., 1993) of dislike of driving, driver aggression, and driver alertness. The samples consisted of 203 Finnish and 201 Australian drivers. Correlation analyses indicated that driver impression management (lying) was negatively related to the self-reported number of accidents and punishments, overtaking frequency, speeding, and driving aggression, and it was positively related to traffic rule compliance. Driver self-deception correlated positively with variables measuring sense of control in traffic (Lajunen et al., 1997).
Lajunen and Summala (2003) investigated the effects of socially desirable responding on self-reports of driving by recording self-reports of driving in both public and private settings. In the public setting, 47 applicants for a driving instructor training course completed the DBQ and the BIDR as part of the entrance examination. In the private setting, 54 students in the training course completed the same questionnaires anonymously in the classroom. Comparisons showed a difference between the two settings in six DBQ item scores such that aberrant behaviors were reported less frequently in the public setting than in the private setting. The authors concluded that bias caused by socially desirable responding is relatively small in DBQ responses. Note, however, that the study was based on a between-subjects design (i.e., different respondents answered in each setting) and that an entrance examination for a driving instructor training course hardly reflects ordinary drivers’ responses.
Sullman and Taylor (2010) replicated Lajunen and Summala’s (2003) study by using a repeated measures design.
A sample of 228 undergraduate students completed the DBQ and a measure of socially desirable responding in class, which constituted a public place, and then did so again 2 months later in the privacy of their homes. As expected, participants demonstrated higher levels of general social desirability in the public setting than in the private setting. None of the DBQ items differed significantly across the two locations, and the authors concluded that the DBQ is not particularly vulnerable to socially desirable responding. Note, however, that the study was not counterbalanced and that the difference in privacy between the “private” and “public” settings was not maximized because subjects were asked for their names in both conditions.
Af Wåhlberg (2010) composed a questionnaire that included scales from several well-known driver inventories and distributed it three times to a group of young drivers in a driver education program and twice to a random group. The DIM scale from the DSDS (Lajunen et al., 1997) was
used to control for socially desirable responding. Whereas in earlier studies only the correlations (Lajunen et al., 1997) or group differences in quasi-experimental settings were studied (Lajunen & Summala, 2003), Af Wåhlberg controlled the effects of impression management when calculating the predictive power of driver behavior inventories. All self-report instruments, including the DBQ, included in the study correlated negatively with impression management, indicating bias: The correlations between the DBQ violation scale and impression management were -0.51 and -0.45. Moreover, the predictive power was more than halved when social desirability was controlled for. Impression management also correlated with self-reported accidents and penalty points in both samples. Similar influence of impression management on self-reported accident involvement (but not official records) was also found in an earlier study (Af Wåhlberg, Dorn, & Kline, 2009). The authors concluded that whenever self-reported accidents are used as an outcome variable and predicted by other self-report measures, a lie scale should be included and used for correcting the associations. The conclusion about self-report instruments was even more serious. According to Af Wåhlberg, even the most well-known psychometric scales used in driver research are susceptible to social desirability bias.
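One common way to control a third variable when examining the association between two self-report measures is the first-order partial correlation. The Python sketch below illustrates the idea only; it is not the analysis used in the studies cited. The function names, variable names, and scores are hypothetical and were made up for illustration, and the sketch assumes that impression management relates linearly to both measures.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Made-up scores for eight hypothetical drivers (not data from any study).
lie_scale = [1, 2, 3, 4, 5, 6, 7, 8]    # impression management score
violations = [9, 7, 8, 5, 6, 3, 4, 2]   # self-reported violations
accidents = [4, 4, 3, 3, 2, 1, 2, 0]    # self-reported accidents

# Association between the two self-reports before and after the control.
r_zero_order = pearson(violations, accidents)
r_partial = partial_corr(violations, accidents, lie_scale)
print(f"zero-order r = {r_zero_order:.2f}, partial r = {r_partial:.2f}")
```

When the partial correlation is markedly smaller than the zero-order correlation, part of the apparent association between the two self-reports may reflect shared impression management rather than driving behavior.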
4.2.3. How to Cope with Socially Desirable Responding in Self-Reports of Driving
The literature on socially desirable responding and self-reports of driving seems to be mixed. Whereas driver behavior scales seem to have significant correlations with socially desirable responding, quasi-experimental studies do not seem to indicate any serious bias in self-reports of driving. One possibility is to stop using self-reports of driving and accidents and to rely only on observed behavioral data and official accident records, as some researchers seem to suggest (Af Wåhlberg, 2010; Af Wåhlberg et al., 2009). The other possibility is to let the use of self-reports of driving go unchallenged and accept the small social desirability bias as an innate characteristic of self-reports. The first alternative would limit behavioral traffic research immensely because many fields, especially social psychology, require the use of self-reports. For example, driver attitudes, opinions, and attributions cannot be measured “objectively” but only with self-reports. Moreover, the objective measures also have serious methodological limitations, as studies using an instrumented vehicle, simulator, or laboratory tests show. Official accident records suffer from their own sources of bias. Studies conducted by Af Wåhlberg (2010) and Af Wåhlberg et al. (2009) show that the second alternative is not an option: Traffic researchers cannot continue ignoring bias in the self-reports of driving and outcomes.
Self-report research methodology offers various ways of coping with socially desirable responding. First, an emphasis on anonymity and confidentiality in instructions and procedures (e.g., sealed envelopes and large group data collection) reduces the effect of socially desirable responding. Second, scales for socially desirable responding, such as the DSDS, can be included in studies and their effect statistically controlled. Scales for controlling impression management, self-deception, careless answering style, and so forth can be easily designed and embedded in such instruments as the DBQ. It is surprising that traffic psychologists have ignored these biases, given that the use of control scales is common in mainstream psychological tests (e.g., the validity scales of the MMPI-2). Third, objective measures of accidents and behavior should be used whenever possible.
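As a minimal sketch of the second strategy, scale scores can be adjusted by regressing them on a social desirability score and keeping the residuals. The example below assumes a simple linear relationship between the two; the function names, variable names, and values are hypothetical and serve only as an illustration, not as a procedure prescribed by any of the instruments mentioned.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def residualize(scores, control):
    """Remove the part of `scores` that is linearly predictable from `control`."""
    intercept, slope = fit_line(control, scores)
    return [s - (intercept + slope * c) for s, c in zip(scores, control)]


# Made-up values for six hypothetical respondents (illustration only).
impression_mgmt = [2, 4, 5, 7, 8, 10]              # control scale score
dbq_violations = [3.1, 2.8, 2.5, 2.0, 1.9, 1.2]    # self-reported violations

# Adjusted scores are uncorrelated with the control scale by construction.
adjusted = residualize(dbq_violations, impression_mgmt)
print([round(a, 3) for a in adjusted])
```

The adjusted scores express how much more or less violation behavior a respondent reports than would be expected from his or her impression management score alone. Such an adjustment removes only linear effects of the control scale and is only as good as that scale.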
5. CONCLUSION
This chapter provided a general overview of the use of self-reports in traffic research. Although self-reports can offer a rich source of information, they also have some serious shortcomings and limitations that have to be taken into account. A review of studies using self-report methodology shows that traffic researchers pay far too little attention to the psychometric characteristics and validity of the tests. As the review of the DBQ studies showed, only a few studies have addressed the validity issues and cross-cultural applicability of self-reports of driving. More research, and especially large-sample validation studies with objective records of accidents and behavior, is needed to further assess the level of bias in self-reports and to develop effective strategies for reducing different sources of bias. When evaluating the role of self-reports in traffic research, we fully agree with Reason et al. (1990) that the DBQ is a powerful means to measure behaviors that are “too private to be detected by direct observation” but that at the same time “DBQ responses are several stages removed from the actuality of what goes on behind the wheel” (pp. 1329–1330). The same applies to self-reports in general: They are able to reveal information that is not available with any other measurement methods. The gap mentioned by Reason et al. between reality and the picture given by self-reports may not be possible to erase, but at least it can be considerably reduced with adequate use of self-report methodology.
REFERENCES
Af Wåhlberg, A. E. (2010). Social desirability effects in driver behavior inventories. Journal of Safety Research, 41(2), 99–106.
Af Wåhlberg, A. E., Dorn, L., & Kline, T. (2009). The effect of social desirability on self reported and recorded road traffic accidents. Transportation Research Part F: Traffic Psychology and Behaviour, 13(2), 106–114.
Anstey, K. J., Wood, J., Caldwell, H., Kerr, G., & Lord, S. R. (2009). Comparison of self-reported crashes, state crash records and an on-road driving assessment in a population-based sample of drivers aged 69–95 years. Traffic Injury Prevention, 10(1), 84–90.
Arthur, W., Jr., Bell, S. T., Edwards, B. D., Day, E. A., Tulare, T. C., & Tubre, A. H. (2005). Convergence of self-report and archival crash involvement data: A two-year longitudinal follow-up. Human Factors, 47(2), 303–313.
Basch, C. E., DeCicco, I. M., & Malfetti, J. L. (1989). A focus group study on decision processes of young drivers: Reasons that may support a decision to drink and drive. Health Education Quarterly, 16(3), 389–396.
Bjørnskau, T., & Sagberg, F. (2005). What do novice drivers learn during the first months of driving? Improved handling skills or improved road user interaction? In G. Underwood (Ed.), Traffic and Transportation Psychology: Theory and Application (pp. 129–140). Oxford: Elsevier.
Blanchard, R. A., Myers, A. M., & Porter, M. M. (2009). Correspondence between self-reported and objective measures of driving exposure and patterns in older drivers. Accident Analysis and Prevention, 42(2), 523–529.
Borkenau, P., & Ostendorf, F. (1989). Descriptive consistency and social desirability in self and peer reports. European Journal of Personality, 3, 31–45.
Boufous, S., Ivers, R., Senserrick, T., Stevenson, M., Norton, R., & Williamson, A. (2010). Accuracy of self-report of on-road crashes and traffic offences in a cohort of young drivers: The DRIVE study. Injury Prevention, 16(4), 275–277.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81–105.
Chapman, P., & Underwood, G. (2000). Forgetting near-accidents: The roles of severity, culpability and experience in the poor recall of dangerous driving situations. Applied Cognitive Psychology, 14(1), 31–44.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302.
de Winter, J. C. F., & Dodou, D. (2010). The Driver Behaviour Questionnaire as a predictor of accidents: A meta-analysis. Journal of Safety Research, 41(6), 463–470.
Duncan, J., Williams, P., & Brown, I. (1991). Components of driving skill: Experience does not mean expertise. Ergonomics, 34(7), 919–937.
Elander, J., West, R., & French, D. (1993). Behavioral correlates of individual differences in road-traffic crash risk: An examination of methods and findings. Psychological Bulletin, 113(2), 279–294.
Evans, L. (1991). Traffic safety and the driver. New York: Van Nostrand Reinhold.
French, D. J., West, R. J., Elander, J., & Wilding, J. M. (1993). Decision-making style, driving style, and self-reported involvement in road traffic accidents. Ergonomics, 36(6), 627–644.
Glendon, A. I., Dorn, L., Matthews, G., Gulian, E., et al. (1993). Reliability of the Driving Behaviour Inventory. Ergonomics, 36(6), 719–726.
Guion, R. M. (1977). Content validity: The source of my discontent. Applied Psychological Measurement, 1(1), 1–10.
Gulian, E., Glendon, A. I., Matthews, G., Davies, D. R., & Debney, L. M. (1990). The stress of driving: A diary study. Work and Stress, 4(1), 7–16.
Haglund, M., & Åberg, L. (2002). Stability in drivers’ speed choice. Transportation Research Part F: Traffic Psychology and Behaviour, 5(3), 177–188.
Janke, M. K. (1991). Accidents, mileage, and the exaggeration of risk. Accident Analysis and Prevention, 23(2–3), 183–188.
Joshi, M. S., Senior, V., & Smith, G. P. (2001). A diary study of the risk perceptions of road users. Health, Risk and Society, 3(3), 261–279.
Kiernan, B. D., Cox, D. J., Kovatchev, B. P., Kiernan, B. S., & Giuliano, A. J. (1999). Improving driving performance of senior drivers through self-monitoring with a driving diary. Physical and Occupational Therapy in Geriatrics, 16(1–2), 55–64.
Kua, A., Korner-Bitensky, N., & Desrosiers, J. (2007). Older individuals’ perceptions regarding driving: Focus group findings. Physical and Occupational Therapy in Geriatrics, 25(4), 21–40.
Lajunen, T., Corry, A., Summala, H., & Hartley, L. (1997). Impression management and self-deception in traffic behaviour inventories. Personality and Individual Differences, 22(3), 341–353.
Lajunen, T., Parker, D., & Stradling, S. G. (1998). Dimensions of driver anger, aggressive and highway code violations and their mediation by safety orientation in UK drivers. Transportation Research Part F: Traffic Psychology and Behaviour, 1(2), 107–121.
Lajunen, T., Parker, D., & Summala, H. (2004). The Manchester Driver Behaviour Questionnaire: A cross-cultural study. Accident Analysis and Prevention, 36(2), 231–238.
Lajunen, T., & Summala, H. (1995). Driving experience, personality, and skill and safety-motive dimensions in drivers’ self-assessments. Personality and Individual Differences, 19(3), 307–318.
Lajunen, T., & Summala, H. (2003). Can we trust self-reports of driving? Effects of impression management on driver behaviour questionnaire responses. Transportation Research Part F: Traffic Psychology and Behaviour, 6(2), 97–107.
Langford, J., Koppel, S., McCarthy, D., & Srinivasan, S. (2008). In defence of the ‘low-mileage bias.’ Accident Analysis and Prevention, 40(6), 1996–1999.
Loftus, E. F. (1993). The reality of repressed memories. American Psychologist, 48, 518–537.
Lund, A. K., & Williams, A. F. (1985). A review of the literature evaluating the defensive driving course. Accident Analysis and Prevention, 17(6), 449–460.
Maycock, G. (1997). Sleepiness and driving: The experience of UK car drivers. Accident Analysis and Prevention, 29(4), 453–462.
McGwin, G., Jr., Owsley, C., & Ball, K. (1998). Identifying crash involvement among older drivers: Agreement between self-report and state records. Accident Analysis and Prevention, 30(6), 781–791.
Michon, J. A. (1985). A critical review of driver behaviour models: What we know, what should we do? In L. Evans, & R. C. Schwing (Eds.), Human behavior and traffic safety (pp. 485–520). New York: Plenum.
Mourant, R. R., & Rockwell, T. H. (1972). Strategies of visual search by novice and experienced drivers. Human Factors, 14(4), 325–335.
Näätänen, R., & Summala, H. (1976). Road-user behavior and traffic accidents. Amsterdam/New York: North-Holland/Elsevier.
Özkan, T., Lajunen, T., Chliaoutakis, J. E., Parker, D., & Summala, H. (2006). Cross-cultural differences in driving behaviours: A comparison of six countries. Transportation Research Part F: Traffic Psychology and Behaviour, 9(3), 227–242.
Özkan, T., Lajunen, T., & Summala, H. (2006). Driver Behaviour Questionnaire: A follow-up study. Accident Analysis and Prevention, 38(2), 386–395.
Parker, D., Reason, J. T., Manstead, A. S. R., & Stradling, S. G. (1995). Driving errors, driving violations and accident involvement. Ergonomics, 38(5), 1036–1048.
Paulhus, D. L. (1984). Two-component models of socially desirable responding. Journal of Personality and Social Psychology, 46(3), 598–609.
Paulhus, D. L., & Reid, D. B. (1991). Enhancement and denial in socially desirable responding. Journal of Personality and Social Psychology, 60(2), 307–317.
Quimby, A. R., & Watts, G. R. (1981). Human factors in driving performance (Report No. 1004). Crowthorne, UK: Transport and Road Research Laboratory.
Reason, J., Manstead, A., Stradling, S., Baxter, J., & Campbell, K. (1990). Errors and violations on the roads: A real distinction? Ergonomics, 33(10–11), 1315–1332.
Sackeim, H. A., & Gur, R. C. (1979). Self-deception, other-deception, and self-reported psychopathology. Journal of Consulting and Clinical Psychology, 47, 213–215.
Smith, R. S., & Wood, J. E. A. (1977). Memory: Its reliability in the recall of long distance business travel (Supplementary Report No. 322). Crowthorne, UK: Transport and Road Research Laboratory.
Spolander, K. (1983). Bilförares uppfattning om egen körförmåga (Drivers’ assessment of their own driving ability) (Report No. 252). Linköping: Swedish Road and Traffic Research Institute.
Staplin, L., Gish, K. W., & Joyce, J. (2008). ‘Low mileage bias’ and related policy implications: A cautionary note. Accident Analysis and Prevention, 40(3), 1249–1252.
Sullman, M. J. M., & Taylor, J. E. (2010). Social desirability and self-reported driving behaviours: Should we be worried? Transportation Research Part F: Traffic Psychology and Behaviour, 13(3), 215–221.
Summala, H. (1980). How does it change safety margins if overtaking is prohibited: A pilot study. Accident Analysis and Prevention, 12(2), 95–103.
Summala, H. (1985). Modeling driver behavior: A pessimistic prediction? In L. Evans, & R. C. Schwing (Eds.), Human behavior and traffic safety (pp. 43–65). New York: Plenum.
Summala, H. (1987). Young driver accidents: Risk taking or failure of skills? Alcohol, Drugs and Driving, 3, 79–91.
Summala, H. (1988). Risk control is not risk adjustment: The zero-risk theory of driver behaviour and its implications. Ergonomics, 31(4), 491–506.
Summala, H. (1996). Accident risk and driver behaviour. Safety Science, 22(1–3), 103–117.
Summala, H., & Hietamaki, J. (1984). Drivers’ immediate responses to traffic signs. Ergonomics, 27(2), 205–216.
Taylor, S. E., & Brown, J. D. (1988). Illusion and well-being: A social psychological perspective on mental health. Psychological Bulletin, 103, 193–210.
Van Der Molen, H. H., & Botticher, A. M. T. (1988). A hierarchical risk model for traffic participants. Ergonomics, 31(4), 537–555.
Wallén Warner, H., Özkan, T., Lajunen, T., & Tzamalouka, G. (2010). Cross-cultural comparison of driving skills. Submitted for publication.
West, R., French, D., Kemp, R., & Elander, J. (1993). Direct observation of driving, self reports of driver behaviour, and accident involvement. Ergonomics, 36(5), 557–567.
Wilson, T., & Greensmith, J. (1983). Multivariate analysis of the relationship between drivometer variables and drivers’ accident, sex, and exposure status. Human Factors, 25(3), 303–312.