Accepted Manuscript

Full Length Article

Fully Contextualized, Frequency-Based Personality Measurement: A Replication and Extension

Chet Robie, Stephen D. Risavy, Djurre Holtrop, Marise Ph. Born

PII: S0092-6566(16)30200-8
DOI: http://dx.doi.org/10.1016/j.jrp.2017.05.005
Reference: YJRPE 3645

To appear in: Journal of Research in Personality

Received Date: 21 October 2016
Revised Date: 1 May 2017
Accepted Date: 21 May 2017

Please cite this article as: Robie, C., Risavy, S.D., Holtrop, D., Ph. Born, M., Fully Contextualized, Frequency-Based Personality Measurement: A Replication and Extension, Journal of Research in Personality (2017), doi: http://dx.doi.org/10.1016/j.jrp.2017.05.005
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
FRAME-OF-REFERENCE AND FREQUENCY-BASED ESTIMATION
Fully Contextualized, Frequency-Based Personality Measurement: A Replication and Extension

Chet Robie and Stephen D. Risavy, Wilfrid Laurier University
Djurre Holtrop, University of Western Australia
Marise Ph. Born, Erasmus University Rotterdam
Corresponding author: Chet Robie, Ph.D., Wilfrid Laurier University, Lazaridis School of Business & Economics, 75 University Avenue West, Waterloo, Ontario N2L 3C5.
[email protected] (email). 519-884-0710 ext. 2965 (work phone).
Abstract
We compared the predictive validity of two types of Frame-of-Reference personality measures to each other and to a baseline generic measure. Each version of the measures used a unique response format, referred to as frequency-based estimation, that allowed the behavioral consistency of responses to be gauged. Generic personality scales, scales tagged with “at school”, and completely modified scales were compared in their prediction of academic performance, counterproductive academic behavior, and participant reactions. Results showed that completely contextualized measures were the most predictively valid and, contrary to our expectations, behavioral consistency did not moderate the relationships. Face validity and, to a lesser extent, perceived predictive validity improved with increasing contextualization. We discuss the implications of our results for personality assessment in applied settings.

KEYWORDS: personality assessment; Frame-of-Reference; contextualization; frequency-based estimation
1. Introduction

The use of personality assessments to predict real-world behavioral outcomes has been a goal of personality and applied psychologists for decades. Extant research has established the usefulness of personality measures in relation to the prediction of important real-world outcomes, such as job performance (e.g., Barrick & Mount, 1991; Tett, Jackson, & Rothstein, 1991) and school success (e.g., grade point average [GPA]; McAbee & Oswald, 2013). Subsequently, two major developments have been advocated with regard to increasing the validity of personality measures. First, researchers investigated whether adding a situation specification to personality items, known as a Frame-of-Reference (FoR) modification, would increase criterion-related validity (e.g., Schmit, Ryan, Stierwalt, & Powell, 1995). Adding a FoR to personality items is most commonly achieved by adding a situational tag to the end of a personality item, such as “at work” or “at school”, and is often referred to as contextualization. Second, researchers investigated whether personality constructs are more predictive for individuals who express a more consistent personality, using the Frequency-Based Estimation (FBE) method for responding to personality assessment items (e.g., Edwards & Woehr, 2007; Fleisher, Woehr, Edwards, & Cullen, 2011). The FBE method requires respondents to indicate the percentage of time that their behavior is consistent with each personality item. Although personality tests can include both a situation-based (i.e., FoR) modification to items and a behavioral consistency-based response option format (i.e., FBE), only one study has thus far assessed the combined effects of these two concepts (Robie & Risavy, 2016). Moreover, the study by Robie and Risavy (2016) found results counter to what the extant FBE literature has found. Thus, further research is needed to develop a better
understanding of personality item modification and response option format combinations that may increase the validity of personality tests for predicting real-world outcomes. The current paper continues the quest to better understand the optimal way to design personality assessment items and response option formats by answering the call from Robie and Risavy (2016) to compare different levels of FoR contextualization (see below for an explanation of levels of contextualization) using the FBE response option format. Thus, the primary contribution of the current paper is to further our understanding of FoR modifications using the FBE response option format by assessing their interaction. This is achieved by using the FBE response option format with a generic as well as two FoR (i.e., tagged and completely contextualized; Holtrop, Born, De Vries, & De Vries, 2014) personality measures. In order to situate the current paper, first, the prior FoR and FBE research that is relevant to the current investigation is discussed. Next, the rationale for the current investigation as well as the resulting hypotheses are presented. Subsequently, a three-wave study designed to assess the focal hypotheses as well as respondent reactions to the different personality assessment combinations is described.

1.1 Previous research

1.1.1. Frame-of-Reference (FoR) research

The FoR modification of personality assessment items (e.g., adding “at work” to the end of a personality item stem) was initially investigated by Schmit and colleagues (1995); their research as well as the research of many others since then has provided empirical evidence that adding a specified context to personality measures increases their ability to predict real-world outcomes (cf. a meta-analysis by Shaffer and Postlethwaite [2012]). More recently, different levels of contextualization have begun to appear in the FoR research literature (e.g., Holtrop et
al., 2014; Pace & Brannick, 2010). The theoretical rationale underlying this line of research is that contextualizing personality assessment items beyond simply adding an “at work” or an “at school” tag to the end of a personality item may yield additional increments in criterion-related validity. Holtrop and colleagues (2014) were the first to compare the effect of two types of contextualized personality measures (i.e., a measure tagged with “at school” and a completely modified measure) with a baseline, generic (i.e., noncontextualized) measure for the traits of Conscientiousness, Emotional Stability, and Integrity. The completely modified scale went beyond the typical FoR modification of adding an “at work” or an “at school” tag to each item (e.g., “I keep my promises at school”) by completely revising each item (e.g., “I keep my promises when I agree to complete a section of a team project”). Using an undergraduate student sample in The Netherlands (N = 531) and a within-participants design, their results generally showed evidence of statistically significant increases in the prediction of outcome variables (i.e., objective GPA and counterproductive academic behavior) with increasing levels of contextualization. For example, the completely modified Conscientiousness scale of the Multicultural Personality Test – Big Six (MPT-BS; De Vries, De Vries, & Born, 2011; NOA, 2009) explained the most variance in GPA compared with the tagged and generic measures (similar conclusions could be derived for the pattern of results that emerged for the counterproductive academic behavior criterion). Moreover, regarding participant reactions, perceived predictive validity and face validity (i.e., the relevance of the questionnaire) also improved with increasing levels of contextualization; however, the students liked the contextualized measures less than the generic measure. This investigation by Holtrop et al. 
(2014) utilized the standard, Likert-type response option format for personality assessment;
however, other response option formats are available, such as the FBE response option format (e.g., Edwards & Woehr, 2007; Fleisher et al., 2011).

1.1.2. Frequency-Based Estimation (FBE) research

The FBE response option format was first proposed by Edwards and Woehr (2007) as a format that could be a viable alternative to the traditional, Likert-type response format for responding to personality assessment items. In the FBE format, respondents distribute 100 percentage points per item across three categories (i.e., very inaccurate, neither accurate nor inaccurate, and very accurate) to indicate how well that personality item reflects their behavior over the past six months. The FBE format can provide important information (i.e., within-item variability) that is not available through the Likert-type format. Specifically, FBE is a method of estimating behavioral consistency with a single administration; put differently, FBE allows an assessment of consistency over time within personality items. Behavioral consistency, in this case, refers to the variance in behavior across time that is associated with each personality item. Because personality measures are purported to be more predictive for individuals who consistently display the same type and level of behavior (referred to as traitedness by Baumeister and Tice [1988]), measuring behavioral consistency should theoretically improve the prediction of behavioral outcomes. Edwards and Woehr’s (2007) Study 1, which used an undergraduate student sample (N = 143), provided empirical evidence that the psychometric properties (i.e., reliability estimates and convergent validity coefficients) of the FBE response option format were, for the most part, similar to those of the Likert-type response option format when the respondents were completing the 50-item International Personality Item Pool (IPIP) Scale (Goldberg, 1999).
Edwards and Woehr’s (2007) Study 2 also used an undergraduate student sample (N = 120) as well as one to two personal
acquaintances (i.e., friends and/or family members; N = 210) that had known the participant for at least six months. The results of this second study provided empirical evidence that the measure of behavioral consistency obtained through using the FBE response option format (i.e., low within-item variability/high within-item consistency, meaning that respondents are more consistent over time) moderated self/other agreement for the personality traits of Agreeableness, Emotional Stability, and Extraversion, such that respondents who rated themselves as more consistent over time were more predictable (i.e., had higher levels of agreement with the ratings provided by their acquaintance[s]). Consistent with Kane (1986), the within-item variability term (i.e., the measure of behavioral stability) was calculated by computing the standard deviation of the three percentage responses for each item and then obtaining the mean within-item standard deviation across all of the items for each personality dimension. A more recent study by Fleisher and colleagues (2011) continued this line of research on the FBE format by providing further evidence for the validity of the FBE approach for assessing personality. Consistent with the earlier work of Edwards and Woehr (2007), Fleisher et al.’s (2011) Study 1 provided additional evidence for the statistical equivalence of the reliability coefficients (i.e., alphas) across both the FBE and Likert-type response option formats as well as convergent validity evidence (i.e., the correlation between scores on both response option formats). Extending the work of Edwards and Woehr (2007), Fleisher et al.’s (2011) Study 1 also found statistically significant correlations between the Big Five dimensions and motivational variables (e.g., communion striving, achievement striving, learning goal orientation, performance-avoid goal orientation), which were consistent across response option formats (i.e., FBE and Likert-type). 
Fleisher et al.’s (2011) Study 2 found significant interactions for Agreeableness and Conscientiousness in predicting peer ratings of task performance, such that
participants who rated themselves as being more behaviorally consistent (i.e., had lower within-item variability) exhibited stronger personality–performance relationships. An important extension to the work of Edwards and Woehr (2007) was the third study by Fleisher et al. (2011), which provided empirical evidence that the less transparent FBE format is less susceptible to conscious response distortion than the traditional, more transparent Likert-type format for the personality dimensions of Conscientiousness, Emotional Stability, and Openness to Experience when respondents were instructed to fake (i.e., provide socially desirable responses). Fleisher et al. (2011) echoed the conclusion from Edwards and Woehr (2007) that the FBE response option format has favorable empirical support and that future research using this response option format would be a fruitful endeavor. In fact, one future research direction noted by Fleisher and colleagues (2011) was to investigate the impact of context sensitivity (i.e., FoR) on the FBE response option format.

1.1.3. FoR and FBE research

Heeding the call from Fleisher et al. (2011), Robie and Risavy (2016) studied the separate and combined effects of the FoR and FBE concepts. Specifically, they utilized a large undergraduate student sample in Canada (N = 933) to examine the effects of FoR (i.e., tagged and generic versions) and response option format (i.e., FBE and Likert-type versions) in a between-participants design when investigating the predictive validity of a personality assessment. However, contrary to the work of Edwards and Woehr (2007) and Fleisher et al. (2011), they did not find the hypothesized moderating effect of behavioral consistency as measured with the FBE format. In fact, the moderating effect that they found was in the opposite direction from what was expected (i.e., the Conscientiousness–GPA relationship was stronger for less consistent respondents; Robie & Risavy, 2016).
Regarding respondent reactions, participants
in the FBE condition experienced more mental fatigue (effect size d = .69) and took longer to complete the assessment (effect size d = 1.04) than participants in the Likert-type condition (Robie & Risavy, 2016). It is important to note that this was the first published study not to find support for the FBE moderation hypothesis (i.e., that behavioral consistency moderates the relationship between personality scores and outcomes); thus, future research is needed to investigate the veracity of the FBE conclusions reported by Edwards and Woehr (2007) and Fleisher et al. (2011). A fruitful starting point would be to follow the recommendation from Robie and Risavy (2016) to more fully assess the robustness of the FBE moderation hypothesis by investigating all three levels of contextualization: generic, tagged, and completely contextualized.

1.2. Current study rationale and hypotheses

Overall, the current study is an attempt to replicate the incremental effect of increasing levels of contextualization (Holtrop et al., 2014) in combination with the line of research from Woehr and colleagues (Edwards & Woehr, 2007; Fleisher et al., 2011). This is achieved by comparing the three different levels of contextualization (i.e., generic, tagged, and completely contextualized) while utilizing the FBE response option format. Put differently, the current research is the first to examine whether the behavioral consistency effect observed in the series of studies conducted by Woehr and colleagues (Edwards & Woehr, 2007; Fleisher et al., 2011) extends to all three FoR modification levels. The current study is thus the first to conjointly examine the degree to which behavioral consistency aspects (i.e., the FBE response option format) and situations (i.e., the FoR modifications) moderate the predictive ability of personality variables. FBE can reasonably be seen as a measure of the behavioral consistency of responses, albeit one limited by asking
participants to retrospect on how situationally (in)consistent they have behaved over a given period of time (in the current study, and based on previous FBE research, respondents indicated how each personality item was reflective of their behavior over the past six months). It needs to be acknowledged that there are more stringent methods for measuring the effects of behavioral consistency across situations vis-à-vis personality (e.g., Funder, 2016; Steyer, Ferring, & Schmitt, 1992). FoR can reasonably be seen as an item modification that limits the measure to behaviors within a specific context (although it is acknowledged that other variables also assess situational variance; e.g., self- vs. other-ratings). However, the primary practical purpose of the current study is to examine these issues using extant methodologies based on classical test theory that can be used in a single administration; this is especially necessary in high-stakes testing (e.g., pre-employment testing, school admissions). Consistent with the aforementioned findings from Holtrop et al. (2014), it is hypothesized that increasing levels of contextualization (i.e., from generic to tagged to completely contextualized) will lead to increasing predictive validity for real-world outcomes (H1; replication hypothesis). Consistent with the aforementioned FBE research (i.e., Edwards & Woehr, 2007; Fleisher et al., 2011) and its extension to the different levels of contextualization, it is hypothesized that a respondent’s behavioral consistency will moderate the relationship between personality and outcomes at each level of contextualization, such that respondents who are more behaviorally consistent will have more predictable outcomes (H2; extension hypothesis).
Lastly, considering the cautions regarding applicant reactions to the FBE response option format identified by Robie and Risavy (2016), an exploratory examination of participant reactions for practical evaluation purposes is also conducted.

2. Method
2.1. Procedure and participants

Our design included scales from the MPT-BS personality questionnaire (NOA, 2009). Three versions of this personality questionnaire were utilized in our study: a generic version, a tagged version, and a completely contextualized version (see under predictor measures for further details). Participants were asked to complete three survey sessions in a proctored group setting in which a different version of the questionnaire was completed in each session. The order of completion of the three questionnaire versions was fully counterbalanced, with approximately equal numbers of participants randomly assigned to each of the six different order combinations. However, demographic items and a Counterproductive Academic Behavior measure (discussed below) were always completed during the first session. Approximately one week separated each of the survey sessions as a procedural remedy to possible biasing effects of common method variance (Podsakoff, MacKenzie, Lee, & Podsakoff, 2003). One-way ANOVAs with order as the independent variable and either scale score or within-item standard deviation as the dependent variable found significant order effects for the generic Conscientiousness scale mean [F(5, 343) = 3.00, p < .05, partial eta-squared = .04] and the generic Conscientiousness within-item standard deviation [F(5, 343) = 2.91, p < .05, partial eta-squared = .04]. We similarly checked for order effects for the five participant reaction variables across the three versions of the personality measure. Only two of the fifteen tests were statistically significant at the .05 alpha level. No discernible pattern emerged from the post-hoc multiple comparisons of these significant findings, suggesting that random order counterbalancing likely controlled for these sources of extraneous variance.
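The order-effect checks described above are standard one-way ANOVAs. As a minimal sketch (with synthetic data and group sizes, not the study's data), the F statistic and partial eta-squared for such a design can be computed as follows:

```python
# Illustrative one-way ANOVA for order effects: questionnaire order
# (six counterbalanced sequences) is the independent variable and a
# scale score is the dependent variable. All data here are synthetic.
import random

def one_way_anova(groups):
    """Return (F statistic, partial eta-squared) for a one-way design."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    # In a one-way design, partial eta-squared reduces to SS_b / (SS_b + SS_w)
    eta_sq = ss_between / (ss_between + ss_within)
    return f, eta_sq

random.seed(0)
# Six order groups of ~58 participants each (the study's N was 349)
groups = [[random.gauss(3.5, 0.5) for _ in range(58)] for _ in range(6)]
f_stat, eta_sq = one_way_anova(groups)
```

The obtained F would then be referred to an F distribution with (5, 343) degrees of freedom, matching the tests reported above.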
All participants (N = 349) were third- and fourth-year, mostly male (55.9%) and White (53.7%) undergraduate students enrolled in the Bachelor of Business Administration (BBA)
program at a mid-size comprehensive Canadian university located in Southwestern Ontario who participated in exchange for course credit. A total of 425 participants completed at least one session. Only 386 participants completed all three sessions. Of those, 365 replied in the affirmative to the question, “In your honest opinion, should we use your data from this survey?” Reasons for not affirming were typically related to self-reported low effort, not feeling well, or fatigue. Fifteen of these participants did not respond correctly to at least 13 of 15 directed response items (e.g., “To answer this question, please input 100 next to % very inaccurate”) that were spread equally among the three survey conditions. We chose 13 out of 15 as a cutoff as it seemed to be a natural demarcation (i.e., the number of individuals who cumulatively scored 12 or below was less than those who scored 13). Moreover, there is a minuscule probability of an individual answering 13 or more of the directed response items correctly by chance alone (Marjanovic, Struthers, Cribbie, & Greenglass, 2014). Finally, we could not retrieve GPA data for one participant, which brought our usable sample size to 349. Out of these 349 participants, 27.1% were currently employed either part- or full-time and most (51.4%) possessed more than two years of total work experience. A majority of these participants (82.0%) were enrolled in the university’s cooperative education program, which entails being placed into at least two, four-month, full-time placements with a local employer in the students’ anticipated career field.

2.2. Predictor measures

2.2.1. Multicultural Personality Test – Big Six (MPT-BS)

The MPT-BS (De Vries et al., 2011; NOA, 2009) is a personality inventory that consists of 200 short statements, measuring six personality dimensions: Integrity, Emotional Stability, Extraversion, Agreeableness, Conscientiousness, and Openness. The factor-level structure of the
MPT-BS is based on the HEXACO model (Lee & Ashton, 2004), but contains different subscales and operationalizes these scales independently from the HEXACO (NOA, 2009). We opted to use the MPT-BS over the HEXACO because Holtrop et al. (2014) found that participants rated the MPT-BS higher across format types on face validity, perceived predictive validity, and liking. In its original form, the MPT-BS is answered on a 5-point Likert-type scale, ranging from “disagree strongly” to “agree strongly” (in the current study, the FBE format described below was used instead). We only administered the Conscientiousness (32 items) and Integrity (23 items) scales to reduce testing time and possible participant fatigue and because these two scales were conceptually and empirically most aligned to the criteria we employed. An example item for Conscientiousness is: “I do not like rules” (recoded). An example item for Integrity is: “I’m honest about my intentions.” Alpha reliabilities in the present study ranged from .88 to .90 for Conscientiousness and from .69 to .79 for Integrity.

2.2.2. Tagged contextualization

An “at school” tag was added to the end of every personality inventory item. If the tag did not grammatically fit after the last word of the item, then it was placed elsewhere or modified slightly (e.g., “I am precisely on time for appointments relating to school”).

2.2.3. Complete contextualization

For complete contextualization, every item had been completely revised in previous research. Holtrop et al. (2014, p. 236) detailed the extensive process that was undertaken, which included the following steps: (1) generating examples, (2) developing a preliminary list of items, (3) back-translation, (4) revision, and (5) a final check by two experts on personality ratings who assigned the completely contextualized items to the facet scales used in the inventory (this generated an inter-rater reliability of .80; Hayes & Krippendorff, 2007). Example items for Conscientiousness
and Integrity for the generic, tagged, and complete contextualization versions can be found in Table 1.

2.2.4. Translation

The MPT-BS was originally written in Dutch. A test expert (the third author, whose first language is Dutch and who is also fluent in English) and a professional translator translated the original, tagged, and complete contextualization versions from Dutch to English. For the purposes of the current study, the first two authors (one American by birth and one Canadian by birth) made revisions to several items from the translated MPT-BS to ensure that all of the items conformed to American or Canadian conventions. These revisions were shared with the third author to ensure that each revised item still reflected its original intent. Several rounds of interactive revisions were undertaken to arrive at the fully translated measures.

2.2.5. Frequency-Based Estimation

We asked participants to estimate the relative frequency with which each of the three response categories (very inaccurate, neither accurate nor inaccurate, very accurate) reflected their behavior with respect to the item stem over the previous six months. Participants were required to assign percentage values to each response category such that the total summed to 100. We had to remove any mention of frequency from the personality items (in all formats) as it would be confusing to participants to answer, for example, an item that asked how frequently they “sometimes” engaged in a given behavior. Eight of the 23 Integrity scale items contained frequency information and thus needed to be modified (none of the Conscientiousness items contained frequency information). An example from the Integrity scale was modifying the item “I sometimes pretend to worry about others” to “I pretend to worry about others”.
The aforementioned percentage ratings were combined into a single score for each item consistent with previous FBE research conducted by Woehr and colleagues (Edwards & Woehr, 2007; Fleisher et al., 2011). We did this by assigning each response category a scale value that would be comparable to the five-point Likert-type scale (very inaccurate = .01, neither accurate nor inaccurate = .03, very accurate = .05) and then computed the sum of the three responses per item by weighting each scale value by the percentage reported for the response category. This resulted in a score for each item that ranged from 1 to 5 (the same scale as the Likert-type format). Participant scale scores on Conscientiousness and Integrity were obtained by averaging the items corresponding to each of the two dimensions (after reverse-coding negatively worded items). As an example, if a participant responded in the following manner to the item “Take charge”: % very inaccurate (15), % neither accurate nor inaccurate (35), and % very accurate (50), then the single score would be: (15 × .01) + (35 × .03) + (50 × .05) = 3.7. The frequency-based format allows one to measure within-item variability that is not possible with the Likert-type format. We calculated a measure of within-item variability that reflects the standard deviation of the distribution of an item reflected in the percentages assigned (Kane, 1986). An example of very high within-item variability would be if an individual reported that a given personality statement described them as very inaccurate 50% of the time and very accurate 50% of the time. An example of low within-item variability would be if an individual reported that a given personality statement described them as very accurate (or any of the options) 100% of the time.
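As a minimal sketch (not the authors' code) of the scoring just described: the item score weights each category's percentage by its scale value, and the within-item SD follows the interpretation implied by the worked examples above (zero when 100% is assigned to one category, maximal for a 50/50 split between the two extreme categories):

```python
def fbe_item_score(pcts):
    """Weighted FBE item score on the 1-5 metric.
    pcts: percentages for (very inaccurate, neither, very accurate),
    which must sum to 100."""
    weights = (0.01, 0.03, 0.05)  # map percentages onto the 1-5 scale
    assert sum(pcts) == 100
    return sum(p * w for p, w in zip(pcts, weights))

def within_item_sd(pcts):
    """SD of the item's response distribution: scale values 1/3/5
    weighted by the proportion assigned to each category (cf. Kane, 1986)."""
    values = (1.0, 3.0, 5.0)
    probs = [p / 100 for p in pcts]
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    return var ** 0.5

# The worked example from the text: 15 / 35 / 50 yields a score of 3.7
score = fbe_item_score((15, 35, 50))
```

Dimension-level scores would then average `fbe_item_score` across a scale's items (after reverse-coding), and the consistency index would average `within_item_sd` across the same items.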
We used the mean within-item SD across the items corresponding to each of the two personality dimensions as a measure of dimensional (in)consistency (see the
appendix for further details on how the scale scores and variability estimates were calculated as well as for illustrative examples).

2.3. Criterion measures

2.3.1. Grade Point Average (GPA)

Approximately two months after the participants had completed the study, their Grade Point Average (GPA) over their entire school career at the institution was obtained from the institution’s database. The GPA criterion is therefore an objective measure of students’ academic performance. GPA scores ranged from 6 to 12 (C+ to A+), with higher scores indicating better performance.

2.3.2. Counterproductive Academic Behavior (CAB)

To measure Counterproductive Academic Behavior (CAB), Holtrop et al. (2014) extracted 25 items relevant to the school context from a 40-item inventory of counterproductive behavior across school, home, and work contexts (Hakstian, Farrell, & Tweed, 2002; α = .83), using a 6-point Likert-type scale ranging from “Never even considered it” to “Did it three or more times.” An example item is: “Submitted a class paper or project that was not your own work.” Participants were asked to think of the last five school years when completing the CAB inventory.

2.3.3. Participant reactions

Three questions were designed to measure participant reactions, based on items from Smither, Reilly, Millsap, Pearlman, and Stoffey (1993) for face validity and perceived predictive validity, and based on items from Wiechman and Ryan (2003) for liking. The items were: “The content of this questionnaire is clearly related to my study” (face validity); “With the results of this questionnaire my study performance can be predicted” (perceived predictive validity), and “I
did not enjoy completing this questionnaire” (liking; reverse coded). Participants responded on a 7-point Likert scale, ranging from “completely disagree” to “completely agree”. A supporting study by Holtrop et al. (2014) estimated the single-item reliability of the participant reaction items using the procedure by Wanous and Reichers (1996), which found the following reliabilities: .55 for face validity, .60 for perceived predictive validity, and .64 for liking. We also included two other participant reaction items: mental effort and fatigue. Mental effort was measured with a single item: “In responding to the items in this questionnaire I invested…” on a 9-point Likert scale ranging from “very, very low mental effort” to “very, very high mental effort” (Leppink, Paas, Van der Vleuten, Van Gog, & Van Merriënboer, 2013). Fatigue was measured with a single item: “I became fatigued and tired during the testing” on a 5-point Likert scale ranging from “strongly disagree” to “strongly agree” (Arvey, Strickland, Drauden, & Martin, 1990).

2.4. Data analyses

To investigate our first hypothesis, we performed several three-step hierarchical multiple regression analyses according to the methods originally used by Pace and Brannick (2010) and subsequently by Holtrop et al. (2014). When inventory A significantly increases explained variance over inventory B, but not vice versa, inventory A is interpreted as explaining more variance than B. Results of these analyses are presented in Tables 3 and 4 for Conscientiousness with GPA and CAB, respectively, and Table 5 for Integrity with CAB. We always included one inventory version (generic/tagged/complete contextualization) in the second step, and one other version in the third step.
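The incremental logic above can be sketched as follows, using synthetic data rather than the study's data (the study's step 1 covariates are omitted here): enter one inventory version, add a second version, and test whether the addition increases explained variance.

```python
# Illustrative nested-model (incremental R-squared) comparison: does
# adding a second inventory version in step 3 add explained variance
# over the version entered in step 2? All data below are synthetic.
import numpy as np

def r_squared(y, *predictors):
    """R-squared from an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(1)
n = 349
generic = rng.normal(size=n)                             # e.g., generic scale
context = 0.7 * generic + rng.normal(scale=0.7, size=n)  # e.g., contextualized scale
gpa = 0.5 * context + rng.normal(size=n)                 # criterion

r2_step2 = r_squared(gpa, generic)
r2_step3 = r_squared(gpa, generic, context)
delta_r2 = r2_step3 - r2_step2   # increment from the added version
# Incremental F for one added predictor: df1 = 1, df2 = n - 3
f_inc = delta_r2 / ((1.0 - r2_step3) / (n - 3))
```

Running the same comparison with the two versions entered in the reverse order yields the "but not vice versa" half of the decision rule.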
To investigate our second hypothesis, we performed several five-step hierarchical moderated multiple regression analyses following the methods of Fleisher et al. (2011). Following that approach, we included a quadratic term to account for the fact that the within-item SD is necessarily restricted for extreme scores (i.e., an individual with a very low or very high trait level necessarily has low within-item variability). We performed these regressions for each version of the questionnaire (generic/tagged/complete contextualization). The results of these analyses can be found in Tables 6 and 7 for GPA and CAB, respectively.
3. Results
3.1. Descriptive results
The personality measures used in this study all showed adequate reliability, including the newly designed and modified personality scales (see Table 2 for descriptive statistics). With the exception of the completely contextualized Integrity scale, reliabilities were comparable across inventory types, suggesting that the modification process did not affect the scales' internal consistency. Correlations between the same personality dimensions across FoR versions ranged from .65 to .83, with the completely contextualized Integrity scale having the lowest correlations with its generic and tagged counterparts. The lower base rate of Integrity-related behavior, combined with a very specific context, likely produces lower inter-item correlations and thus a lower reliability coefficient and lower correlations with the other Integrity scales. This conjecture is supported by the lower standard deviation for the completely contextualized Integrity scale (.39) versus the generic and tagged versions (.45 and .46, respectively).
3.2. Tests of hypothesis 1
The regression results for Conscientiousness with GPA can be found in Table 3. Both the tagged and completely contextualized versions showed significant increases in explained variance over the generic version. The tagged and completely contextualized versions did not explain incremental variance over one another. The generic version did increment the tagged and completely contextualized versions, but this appeared to be a negative suppressor effect (Darlington, 1968). Specifically, when included in the regression equation, the generic version of Conscientiousness both increased the size of the regression coefficients for the other versions of the measure and received negative beta weights (even though it was positively, yet not significantly, correlated with the criterion). The generic version of the Conscientiousness measure thus "suppressed" criterion-irrelevant variance in the other versions, allowing them to predict a greater share of the variance in GPA. This negative suppression makes theoretical sense because the generic version is much broader in terms of the situations it taps, many of which may be irrelevant to how Conscientiousness is expressed vis-à-vis performance in school.
The regression results for Conscientiousness and Integrity with CAB can be found in Tables 4 and 5, respectively. For Conscientiousness, each version showed an increase in explained variance over the others. For Integrity, the generic and tagged versions of the measure did not provide increases in explained variance over the completely contextualized version. Conversely, the completely contextualized version showed a significant increase in explained variance over both the generic and tagged versions. Therefore, hypothesis 1 was partially supported.
3.3. Tests of hypothesis 2
The hierarchical moderated multiple regression results for Conscientiousness with GPA and for Conscientiousness and Integrity with CAB can be found in Tables 6 and 7, respectively.
In all cases, the interaction between scale score and within-item SD was not statistically significant, suggesting that behavioral consistency does not moderate the relationships examined in this study. Therefore, hypothesis 2 was not supported.
3.4. Exploratory findings: participant reactions
Results for participant reactions can be found in Table 8 and are displayed in Figure 1. For ease of comparison with the other 7-point scales, we converted mental effort and fatigue to a 7-point scale in Figure 1 using a simple linear transformation. Participants reported increasing levels of face validity from the generic, to the tagged, to the completely contextualized version. Only a small effect was found for perceived predictive validity, with the generic version seen as slightly less predictively valid than the tagged or completely contextualized versions. Participants felt lukewarm toward the assessments, with all versions hovering around a mean of 4 out of a possible 7. Participants also reported a fair amount of mental effort (with averages for each version well above 4 out of a possible 7) but did not find any of the versions particularly fatiguing (with averages for each version just above 3 out of a possible 7).1
4. Discussion
Our first hypothesis, that increasing levels of contextualization would lead to increasing levels of predictive validity in predicting real-world outcomes, was partially supported. The results for Conscientiousness predicting both GPA and CAB supported a "more unique variance accounted for" interpretation rather than an "increasing levels of predictive validity" interpretation. The results for Integrity predicting CAB fully supported our prediction: the completely contextualized version always incremented the validity of the other versions, but the reverse was not true.
1
For mental effort and fatigue, this refers to the data converted to a 7-point scale as reported in Figure 1.
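The 7-point conversion mentioned in the footnote is a simple linear rescaling; a minimal sketch (the function name is ours, for illustration):

```python
def to_seven_point(score, scale_points):
    """Linearly map a mean from a 1..scale_points scale onto a 1..7 scale."""
    return 1 + (score - 1) * 6 / (scale_points - 1)

# Mental effort was measured on a 9-point scale, fatigue on a 5-point scale;
# means below are the generic-version values from Table 8.
effort_7pt = to_seven_point(6.09, 9)   # about 4.82
fatigue_7pt = to_seven_point(2.42, 5)  # about 3.13
```

On this mapping the mental-effort means land well above 4 and the fatigue means just above 3, consistent with the description in the text.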
Our second hypothesis, that behavioral consistency would moderate the relationship between personality and outcomes at each level of contextualization, such that individuals who are more behaviorally consistent would have more predictable outcomes, was not supported. Thus far, only two published studies have found moderating effects in the expected direction for behavioral consistency using the FBE methodology (Edwards & Woehr, 2007; Fleisher et al., 2011), whereas one unpublished study found no effects (McGee, 2011), one published study found an effect opposite to expectations (Robie & Risavy, 2016), and the current study also found no effects.
The exploratory analyses showed that participants perceived the generic version to be the least face valid, followed by the tagged version, and then the completely contextualized version. The effect sizes for the other participant reactions were non-significant. Interestingly, because we utilized one of the same measures used in Holtrop et al. (2014), we can compare our exploratory results using an FBE format for three of the participant reaction variables (face validity, perceived predictive validity, and liking) with the Holtrop et al. (2014) results, which used a Likert-type format. Comparing the same version (generic/tagged/completely contextualized) across studies, participants in the Holtrop et al. (2014) study (who were also undergraduate students) rated each version significantly higher in face validity and liking than did participants in our study. Participants in our study rated the tagged version significantly higher in perceived predictive validity than did those in the Holtrop et al. (2014) study. It is also instructive to note that Robie and Risavy (2016) found statistically significant and practically large effects when comparing Likert-type and frequency-based formats on fatigue and time to completion. Specifically, participants in the frequency-based conditions experienced
significantly more fatigue (d = .69) and took more than twice as long to complete the measure as participants in the Likert-type condition.
4.1. Implications
Several implications can be drawn from this study. First, it appears that little may be gained from the use of frequency-based measurement in an applied setting. Behavioral consistency did not moderate the validity of a personality measure at any of the three levels of contextualization. Moreover, at each level of contextualization, participants found the Likert-type format more face valid and enjoyed completing it more than the frequency-based format. Second, contextualized measures of personality appear to remain predictive with formats other than Likert-type, but may not always be more predictive than generic versions when the frequency-based format is used. Note that the incremental validity of contextualization in this study was small, similar to the results of Holtrop et al. (2014).
4.2. Directions for future research
Future studies should be conducted in several areas. First, before the book is closed on frequency-based measurement, studies should be conducted with more motivated respondents. Fleisher et al. (2011) found frequency-based measurement to be more resistant to faking, and it is possible that in non-motivated samples the tendency to report higher levels of consistency reflects a response style. As Robie and Risavy (2016) suggested, it is more effortful for a participant to use all response categories in frequency-based measurement than to simply report that, for example, "Gets things done quickly" is very accurate of one's behavior 100% of the time. Second, the psychometric properties of frequency-based measurement should be studied more intensively. At present, the calculations for within-item
variability assume interval-level measurement. Hanisch (1992) found that the strong assumption of interval-level measurement does not always apply to Likert-type scales. Specifically, using item response theory methodology, Hanisch (1992) determined that in a Yes/?/No scale of job satisfaction, the "?" option should be scored as more of a negative than a positive response. Comparable psychometric issues may affect FBE measurement, which also uses three similar categories but applies a different scoring methodology. Third, the possible beneficial effects of contextualization on personality validity should be examined with more highly fake-resistant techniques, such as forced-choice assessment (Christiansen, Burns, & Montgomery, 2005).
4.3. Conclusions
Frame-of-reference and frequency-based estimation are two potentially promising methods for increasing the predictive validity of personality measures. The results of our study suggest that developers of personality measures can expect at least equal, and perhaps increased, validity from contextualizing measures, while almost certainly increasing face validity relative to non-contextualized measures. However, advocating the use of the frequency-based format may be premature at this juncture, given the conflicting findings on predictive validity and the degree to which participant reactions and testing time will almost certainly be adversely affected. We suggest that more research be conducted on both methods to support widespread operational use in school, work, and other applied settings.
References
Arvey, R. D., Strickland, W., Drauden, G., & Martin, C. (1990). Motivational components of test taking. Personnel Psychology, 43(4), 695-716. http://dx.doi.org/10.1111/j.1744-6570.1990.tb00679.x
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44(1), 1-26. http://dx.doi.org/10.1111/j.1744-6570.1991.tb00688.x
Baumeister, R. F., & Tice, D. M. (1988). Metatraits. Journal of Personality, 56(3), 571-598. http://dx.doi.org/10.1111/j.1467-6494.1988.tb00903.x
Christiansen, N. D., Burns, G. N., & Montgomery, G. E. (2005). Reconsidering forced-choice item formats for applicant personality assessment. Human Performance, 18(3), 267-307. http://dx.doi.org/10.1207/s15327043hup1803_4
Darlington, R. B. (1968). Multiple regression in psychological research and practice. Psychological Bulletin, 69(3), 161-182. http://dx.doi.org/10.1037/h0025471
De Vries, A., De Vries, R. E., & Born, M. Ph. (2011). Broad versus narrow traits: Conscientiousness and Honesty-Humility as predictors of academic criteria. European Journal of Personality, 25(5), 336-348. http://dx.doi.org/10.1002/per.795
Edwards, B. D., & Woehr, D. J. (2007). An examination and evaluation of frequency-based personality measurement. Personality and Individual Differences, 43(4), 803-814. http://dx.doi.org/10.1016/j.paid.2007.02.005
Fleisher, M. S., Woehr, D. J., Edwards, B. D., & Cullen, K. L. (2011). Assessing within-person personality variability via frequency estimation: More evidence for a new measurement
approach. Journal of Research in Personality, 45(6), 535-548. http://dx.doi.org/10.1016/j.jrp.2011.06.009
Funder, D. C. (2016). Taking situations seriously: The situation construal model and the Riverside Situational Q-Sort. Current Directions in Psychological Science, 25(3), 203-208. http://dx.doi.org/10.1177/0963721416635552
Goldberg, L. R. (1999). A broad-bandwidth, public domain, personality inventory measuring the lower-level facets of several five-factor models. In I. Mervielde, I. Deary, F. De Fruyt, & F. Ostendorf (Eds.), Personality psychology in Europe, Vol. 7 (pp. 7-28). Tilburg, The Netherlands: Tilburg University Press.
Hakstian, A. R., Farrell, S., & Tweed, R. G. (2002). The assessment of counterproductive tendencies by means of the California Psychological Inventory. International Journal of Selection and Assessment, 10(1/2), 58-86. http://dx.doi.org/10.1111/1468-2389.00194
Hanisch, K. A. (1992). The Job Descriptive Index revisited: Questions about the question mark. Journal of Applied Psychology, 77(3), 377-382. http://dx.doi.org/10.1037/0021-9010.77.3.377
Hayes, A. F., & Krippendorff, K. (2007). Answering the call for a standard reliability measure for coding data. Communication Methods and Measures, 1(1), 77-89. http://dx.doi.org/10.1080/19312450709336664
Holtrop, D., Born, M. Ph., de Vries, A., & de Vries, R. E. (2014). A matter of context: A comparison of two types of contextualized measures. Personality and Individual Differences, 68, 234-240. http://dx.doi.org/10.1016/j.paid.2014.04.029
Kane, J. S. (1986). Performance distribution assessment. In R. A. Berk (Ed.), Performance assessment: Methods and applications (pp. 237-273). Baltimore, MD: Johns Hopkins University Press.
Lee, K., & Ashton, M. C. (2004). Psychometric properties of the HEXACO personality inventory. Multivariate Behavioral Research, 39(2), 329-358. http://dx.doi.org/10.1207/s15327906mbr3902_8
Leppink, J., Paas, F., Van der Vleuten, C. P. M., Van Gog, T., & Van Merriënboer, J. J. G. (2013). Development of an instrument for measuring different types of cognitive load. Behavior Research Methods, 45(4), 1058-1072. http://dx.doi.org/10.3758/s13428-013-0334-1
Marjanovic, Z., Struthers, C. W., Cribbie, R., & Greenglass, E. R. (2014). The Conscientious Responders Scale: A new tool for discriminating between conscientious and random responders. Sage Open, 4, 1-10. http://dx.doi.org/10.1177/2158244014545964
McAbee, S. T., & Oswald, F. L. (2013). The criterion-related validity of personality measures for predicting GPA: A meta-analytic validity competition. Psychological Assessment, 25(2), 532-544. http://dx.doi.org/10.1037/a0031748
McGee, E. A. (2011). An examination of the stability of positive psychological capital using frequency-based measurement. Unpublished doctoral dissertation, University of Tennessee, Knoxville, TN. http://trace.tennessee.edu/utk_graddiss/999/
NOA (2009). Handleiding Multiculturele Persoonlijkheids Test – Big Six [Manual Multicultural Personality Test – Big Six]. Amsterdam: NOA BV.
Pace, V. L., & Brannick, M. T. (2010). Improving prediction of work performance through frame-of-reference consistency: Empirical evidence using openness to experience.
International Journal of Selection and Assessment, 18(2), 230-235. http://dx.doi.org/10.1111/j.1468-2389.2010.00506.x
Podsakoff, P. M., MacKenzie, S. B., Lee, J., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88(5), 879-903. http://dx.doi.org/10.1037/0021-9010.88.5.879
Robie, C., & Risavy, S. D. (2016). A comparison of frame-of-reference and frequency-based personality measurement. Personality and Individual Differences, 92, 16-21. http://dx.doi.org/10.1016/j.paid.2015.12.005
Schmit, M. J., Ryan, A. M., Stierwalt, S. L., & Powell, A. B. (1995). Frame-of-reference effects on personality scale scores and criterion-related validity. Journal of Applied Psychology, 80(5), 607-620. http://dx.doi.org/10.1037/0021-9010.80.5.607
Shaffer, J. A., & Postlethwaite, B. E. (2012). A matter of context: A meta-analytic investigation of the relative validity of contextualized and noncontextualized personality measures. Personnel Psychology, 65(3), 445-494. http://dx.doi.org/10.1111/j.1744-6570.2012.01250.x
Smither, J. W., Reilly, R. R., Millsap, R. E., Pearlman, K., & Stoffey, R. W. (1993). Applicant reactions to selection procedures. Personnel Psychology, 46(1), 49-76. http://dx.doi.org/10.1111/j.1744-6570.1993.tb00867.x
Steyer, R., Ferring, D., & Schmitt, M. J. (1992). States and traits in psychological assessment. European Journal of Psychological Assessment, 8, 79-98.
Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A meta-analytic review. Personnel Psychology, 44(4), 703-742. http://dx.doi.org/10.1111/j.1744-6570.1991.tb00696.x
Wanous, J. P., & Reichers, A. E. (1996). Estimating the reliability of a single item measure. Psychological Reports, 78, 631-634. http://dx.doi.org/10.2466/pr0.1996.78.2.631
Wiechmann, D., & Ryan, A. M. (2003). Reactions to computerized testing in selection contexts. International Journal of Selection and Assessment, 11(2/3), 215-229. http://dx.doi.org/10.1111/1468-2389.00245
Table 1
Example Conscientiousness and Integrity Items for Generic, Tagged, and Complete Contextualization Personality Inventories

| | Generic | Tagged | Fully Contextualized |
|---|---|---|---|
| Conscientiousness, Example Item 1 | I keep things tidy | I keep things tidy at school. | I keep my notes neatly organized. |
| Conscientiousness, Example Item 2 | I want to be the best | I want to be the best at school. | I want to be better than my fellow students. |
| Integrity, Example Item 1 | I am honest | I am honest at school. | I never lie to my lecturers. |
| Integrity, Example Item 2 | I keep my promises | I keep my promises at school. | I do my part for group assignments, as agreed with the other members of the group. |
Table 2
Descriptive Statistics and Intercorrelations for MPT-BS and Criterion Variables

| Variable | M | SD | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Conscientiousness (G) | 4.01 | 0.46 | .89 | | | | | | | | | | | | | |
| 2. Consc WSD (G) | 0.88 | 0.41 | -.52 | .95 | | | | | | | | | | | | |
| 3. Integrity (G) | 3.41 | 0.45 | .14 | -.05 | .79 | | | | | | | | | | | |
| 4. Integrity WSD (G) | 0.88 | 0.38 | -.35 | .87 | -.12 | .91 | | | | | | | | | | |
| 5. Conscientiousness (T) | 4.04 | 0.49 | .83 | -.39 | .09 | -.26 | .90 | | | | | | | | | |
| 6. Consc WSD (T) | 0.78 | 0.40 | -.38 | .77 | .01 | .70 | -.44 | .95 | | | | | | | | |
| 7. Integrity (T) | 3.59 | 0.46 | .08 | .00 | .77 | -.07 | .08 | -.01 | .78 | | | | | | | |
| 8. Integrity WSD (T) | 0.75 | 0.37 | -.19 | .67 | -.14 | .74 | -.21 | .84 | -.21 | .91 | | | | | | |
| 9. Conscientiousness (C) | 3.90 | 0.48 | .78 | -.33 | .06 | -.17 | .81 | -.29 | .05 | -.07 | .88 | | | | | |
| 10. Consc WSD (C) | 0.75 | 0.38 | -.34 | .75 | .00 | .70 | -.34 | .80 | .02 | .69 | -.32 | .94 | | | | |
| 11. Integrity (C) | 3.58 | 0.39 | .18 | -.04 | .65 | -.07 | .18 | -.05 | .66 | -.14 | .18 | -.02 | .69 | | | |
| 12. Integrity WSD (C) | 0.62 | 0.35 | -.27 | .67 | -.12 | .72 | -.27 | .73 | -.13 | .74 | -.20 | .86 | -.19 | .90 | | |
| 13. GPA | 9.64 | 1.31 | .02 | .10 | -.15 | .18 | .10 | .07 | -.14 | .20 | .11 | .00 | -.01 | .10 | -- | |
| 14. CAB | 2.24 | 0.59 | -.40 | .24 | -.24 | .25 | -.39 | .26 | -.22 | .18 | -.36 | .27 | -.32 | .26 | -.07 | .83 |

Note. N = 349. G = generic. T = tagged. C = completely contextualized. WSD = within-item standard deviation. CAB = Counterproductive Academic Behavior. Variables 1-12 are predictors; 13-14 are criteria. Values on the diagonal are internal consistency estimates (i.e., coefficient alpha). r ≥ |.11| significant at the .05 alpha level; r ≥ |.14| at the .01 level; r ≥ |.18| at the .001 level. A fuller correlation matrix can be found in an electronic document on the publisher's website.
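The significance cut-offs quoted in the note follow directly from the sample size; a quick check using the Fisher z approximation (an approximation chosen for this sketch, not necessarily the authors' method):

```python
import math

def critical_r(n, z=1.96):
    """Approximate two-tailed critical correlation via the Fisher z transform."""
    return math.tanh(z / math.sqrt(n - 3))

r_05 = critical_r(349)            # alpha = .05
r_01 = critical_r(349, z=2.576)   # alpha = .01
r_001 = critical_r(349, z=3.291)  # alpha = .001
```

With N = 349 these come out near .105, .138, and .175, close to the |.11|, |.14|, and |.18| cut-offs quoted in the note.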
Table 3
Hierarchical Multiple Regression Analyses Results for Conscientiousness with GPA

| Sequence of inclusion (Step 2 → Step 3) | Step 2 ΔR² | Step 2 β | Step 3 ΔR² | Step 3 β |
|---|---|---|---|---|
| Generic → Tagged | .00 | .02 | .02** | .26** |
| Generic → Complete | .00 | .02 | .03** | .26** |
| Tagged → Generic | .01† | .10† | .01* | -.19* |
| Tagged → Complete | .01† | .10† | .00 | .11 |
| Complete → Generic | .01* | .12* | .01* | -.18* |
| Complete → Tagged | .01* | .12* | .00 | .01 |

Note. Step 1, identical in all analyses, entered the control variables (gender, age, and minority status), ΔR² = .05***. † p < .10. * p < .05. ** p < .01. *** p < .001.
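The negative Step 3 beta weights for the generic version in Table 3 reflect negative suppression. A synthetic sketch of the mechanism, with made-up variables rather than the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
general = rng.normal(size=n)             # trait variance shared by all versions
school = rng.normal(size=n)              # school-specific trait variance
contextualized = general + school        # contextualized measure taps both components
generic = general + 0.15 * school        # generic measure is mostly general variance
gpa = school + 0.5 * rng.normal(size=n)  # criterion depends on the school-specific part

# OLS with both predictors: the generic version receives a negative weight
X = np.column_stack([np.ones(n), contextualized, generic])
beta, *_ = np.linalg.lstsq(X, gpa, rcond=None)
r_generic_gpa = np.corrcoef(generic, gpa)[0, 1]  # small positive zero-order correlation
```

Although the generic stand-in correlates only weakly (and positively) with the criterion, it receives a clearly negative regression weight because it removes criterion-irrelevant general variance from the contextualized predictor, which is the suppression pattern Darlington (1968) describes.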
Table 4
Hierarchical Multiple Regression Analyses Results for Conscientiousness with Counterproductive Academic Behavior

| Sequence of inclusion (Step 2 → Step 3) | Step 2 ΔR² | Step 2 β | Step 3 ΔR² | Step 3 β |
|---|---|---|---|---|
| Generic → Tagged | .16*** | -.41*** | .01* | -.20* |
| Generic → Complete | .16*** | -.41*** | .01† | -.14† |
| Tagged → Generic | .15*** | -.40*** | .02** | -.24** |
| Tagged → Complete | .15*** | -.40*** | .01† | -.15† |
| Complete → Generic | .13*** | -.38*** | .03*** | -.30*** |
| Complete → Tagged | .13*** | -.38*** | .03** | -.28** |

Note. Step 1, identical in all analyses, entered the control variables (gender, age, and minority status), ΔR² = .00. † p < .10. * p < .05. ** p < .01. *** p < .001.
Table 5
Hierarchical Multiple Regression Analyses Results for Integrity with Counterproductive Academic Behavior

| Sequence of inclusion (Step 2 → Step 3) | Step 2 ΔR² | Step 2 β | Step 3 ΔR² | Step 3 β |
|---|---|---|---|---|
| Generic → Tagged | .06*** | -.25*** | .00 | -.09 |
| Generic → Complete | .06*** | -.25*** | .05*** | -.30*** |
| Tagged → Generic | .05*** | -.23*** | .01* | -.18* |
| Tagged → Complete | .05*** | -.23*** | .06*** | -.32*** |
| Complete → Generic | .11*** | -.34*** | .00 | -.07 |
| Complete → Tagged | .11*** | -.34*** | .00 | -.02 |

Note. Step 1, identical in all analyses, entered the control variables (gender, age, and minority status), ΔR² = .00. † p < .10. * p < .05. ** p < .01. *** p < .001.
Table 6
Hierarchical Moderated Multiple Regression Analyses for Conscientiousness with GPA

| Step | Variable | ΔR² | β |
|---|---|---|---|
| 1 | Control variables | .05*** | -- |
| 2 | Generic | .00 | .02 |
| 3 | Generic quadratic | .00 | -.31 |
| 4 | Generic WSD | .01† | .13† |
| 5 | Generic × Generic WSD | .00 | -.25 |
| 1 | Control variables | .05*** | -- |
| 2 | Tagged | .01† | .10† |
| 3 | Tagged quadratic | .00 | -.11 |
| 4 | Tagged WSD | .01* | .15* |
| 5 | Tagged × Tagged WSD | .00 | -.21 |
| 1 | Control variables | .05*** | -- |
| 2 | Complete | .01* | .12* |
| 3 | Complete quadratic | .00 | .39 |
| 4 | Complete WSD | .00 | .05 |
| 5 | Complete × Complete WSD | .01 | -.58 |

Note. † p < .10. * p < .05. ** p < .01. *** p < .001. WSD = within-item standard deviation. Control variables in Step 1 were gender, age, and minority status.
Table 7
Hierarchical Moderated Multiple Regression Analyses for Conscientiousness and Integrity with CAB

Conscientiousness

| Step | Variable | ΔR² | β |
|---|---|---|---|
| 1 | Control variables | .00 | -- |
| 2 | Generic | .16*** | -.41*** |
| 3 | Generic quadratic | .00 | -.23 |
| 4 | Generic WSD | .00 | .05 |
| 5 | Generic × Generic WSD | .01 | .57 |
| 1 | Control variables | .00 | -- |
| 2 | Tagged | .15*** | -.40*** |
| 3 | Tagged quadratic | .00 | -.44 |
| 4 | Tagged WSD | .01† | .11† |
| 5 | Tagged × Tagged WSD | .00 | .27 |
| 1 | Control variables | .00 | -- |
| 2 | Complete | .13*** | -.38*** |
| 3 | Complete quadratic | .00 | -.17 |
| 4 | Complete WSD | .03** | .18** |
| 5 | Complete × Complete WSD | .00 | .01 |

Integrity

| Step | Variable | ΔR² | β |
|---|---|---|---|
| 1 | Control variables | .00 | -- |
| 2 | Generic | .06*** | -.25*** |
| 3 | Generic quadratic | .00 | .09 |
| 4 | Generic WSD | .06*** | .26*** |
| 5 | Generic × Generic WSD | .01 | .58 |
| 1 | Control variables | .00 | -- |
| 2 | Tagged | .05*** | -.23*** |
| 3 | Tagged quadratic | .00 | .74 |
| 4 | Tagged WSD | .02** | .17** |
| 5 | Tagged × Tagged WSD | .01 | .69 |
| 1 | Control variables | .00 | -- |
| 2 | Complete | .11*** | -.34*** |
| 3 | Complete quadratic | .00 | .06 |
| 4 | Complete WSD | .04*** | .21*** |
| 5 | Complete × Complete WSD | .00 | -.31 |

Note. † p < .10. * p < .05. ** p < .01. *** p < .001. WSD = within-item standard deviation. Control variables in Step 1 were gender, age, and minority status.
Table 8
Descriptive Statistics and Paired T-Tests for Participant Reactions

| Reaction | Generic M | SD | t^a | d | Tagged M | SD | t^b | d | Complete M | SD | t^c | d |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Face validity | 3.81 | 1.41 | -4.85 | -.27 | 4.19 | 1.44 | -5.17 | -.27 | 4.57 | 1.41 | 9.91 | .54 |
| Perceived predictive validity | 4.21 | 1.47 | -3.47 | -.18 | 4.47 | 1.35 | -0.18 | -.01 | 4.48 | 1.37 | 3.76 | .19 |
| Liking | 4.25 | 1.32 | -0.28 | -.01 | 4.27 | 1.38 | -1.08 | -.05 | 4.34 | 1.40 | 1.29 | .07 |
| Mental effort | 6.09 | 1.45 | 0.46 | .02 | 6.06 | 1.41 | -0.48 | -.03 | 6.10 | 1.50 | -0.04 | .01 |
| Fatigue | 2.42 | 1.11 | -2.01 | -.09 | 2.52 | 1.06 | 2.16 | .10 | 2.41 | 1.06 | -0.27 | -.01 |

Note. ^a generic vs. tagged. ^b tagged vs. complete. ^c complete vs. generic. t ≥ |1.97| significant at the .05 alpha level; t ≥ |2.59| at the .01 level; t ≥ |3.32| at the .001 level.
[Figure 1: grouped bar chart; mean score (1.00-7.00) on the y-axis for the Generic, Tagged, and Complete versions across the five participant reaction types.]
Figure 1. Participant reactions per FoR-type for the MPT-BS. Note that mental effort and fatigue have been transformed to a 7-point scale for comparison purposes.
Appendix
Calculation of Frequency-Based Scale Scores
Formula:

$$\text{Scale score} = \frac{1}{j}\sum_{i=1}^{j}\left(w_1 \cdot \%VI_i + w_2 \cdot \%N_i + w_3 \cdot \%VA_i\right)$$

where: i = item, j = total number of items, %VI = percentage assigned to "very inaccurate," %N = percentage assigned to "neither inaccurate nor accurate," %VA = percentage assigned to "very accurate," w1 = .01 for a positively worded item and .05 for a negatively worded item, w2 = .03, and w3 = .05 for a positively worded item and .01 for a negatively worded item.
Example scale score for a two-item scale with one positively worded item and one negatively worded item (illustrative percentages): for a positively worded item with %VI = 10, %N = 20, %VA = 70 and a negatively worded item with %VI = 60, %N = 30, %VA = 10,

$$\text{Scale score} = \frac{[.01(10) + .03(20) + .05(70)] + [.05(60) + .03(30) + .01(10)]}{2} = \frac{4.2 + 4.0}{2} = 4.1$$
Calculation of Frequency-Based Within-Item Standard Deviations
Formula:

$$WSD = \frac{1}{j}\sum_{i=1}^{j} WSD(i)$$

where:

$$WSD(i) = \sqrt{\frac{\%VI_i(1 - m_i)^2 + \%N_i(3 - m_i)^2 + \%VA_i(5 - m_i)^2}{100}}, \qquad m_i = \frac{1 \cdot \%VI_i + 3 \cdot \%N_i + 5 \cdot \%VA_i}{100}$$

and where: WSD = within-item standard deviation averaged across items, i = item, j = total number of items, the response options very inaccurate, neither inaccurate nor accurate, and very accurate are scored 1, 3, and 5, respectively, and m_i = the percentage-weighted mean for item i.
Example calculations of frequency-based within-item standard deviations for a single item at various levels of inconsistency:
High inconsistency (%VI = 50, %N = 0, %VA = 50): m = 3.0, so WSD(i) = √[(50(4) + 50(4))/100] = 2.00
Medium inconsistency (%VI = 45, %N = 50, %VA = 5): m = 2.2, so WSD(i) = √[(45(1.44) + 50(0.64) + 5(7.84))/100] = √1.36 ≈ 1.17
Low inconsistency (%VI = 0, %N = 10, %VA = 90): m = 4.8, so WSD(i) = √[(10(3.24) + 90(0.04))/100] = 0.60
where WSD(i) = within-item standard deviation for a single item. Higher values reflect greater inconsistency in the percentage allocation across response categories.
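A compact code sketch of these calculations, assuming the three response categories are scored at 1 (VI), 3 (N), and 5 (VA), consistent with the .01/.03/.05 weights; the function names and the scale-score example percentages are ours:

```python
import math

def item_score(pct_vi, pct_n, pct_va, negatively_worded=False):
    """Frequency-based item score: the .01/.03/.05 weights applied to percentages.

    Negatively worded items flip the VI and VA weights.
    """
    w_vi, w_va = (0.05, 0.01) if negatively_worded else (0.01, 0.05)
    return w_vi * pct_vi + 0.03 * pct_n + w_va * pct_va

def item_wsd(pct_vi, pct_n, pct_va):
    """Within-item SD across categories scored 1/3/5, weighted by percentages."""
    m = (1 * pct_vi + 3 * pct_n + 5 * pct_va) / 100
    var = (pct_vi * (1 - m) ** 2 + pct_n * (3 - m) ** 2 + pct_va * (5 - m) ** 2) / 100
    return math.sqrt(var)

def scale_score(items):
    """Average item scores; items are (pct_vi, pct_n, pct_va, negatively_worded)."""
    return sum(item_score(*it) for it in items) / len(items)

# The three inconsistency examples above:
high = item_wsd(50, 0, 50)    # 2.00
medium = item_wsd(45, 50, 5)  # about 1.17
low = item_wsd(0, 10, 90)     # 0.60
```

The ordering high > medium > low mirrors the appendix examples: spreading percentages across the extreme categories yields the largest within-item SD, while concentrating them in one category yields the smallest.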
Highlights:
Frame-of-reference (FoR) and frequency-based estimation (FBE) were examined
Higher contextualization led to increased predictive validity
FBE did not act as a moderator to increase predictive validity
Participants reacted more positively towards more highly contextualized measures