
PERSPECTIVES ON THE MEASUREMENT OF JOB ATTITUDES: THE LONG VIEW

Patricia C. Smith and Jeffrey M. Stanton
Bowling Green State University, Bowling Green, OH, USA

Insights from 50 years of industrial research and consulting projects were synthesized into a set of themes pertaining to the measurement of job attitudes. The themes include the importance of developing instruments using vocabulary and style familiar to respondents; of programmatic research in developing and validating measures; of paying attention to anomalies and discontinuities in data; and of integrating research results into science and practice. The authors develop recommendations for obtaining useful field experience and for fostering a scientific environment that respects and encourages efforts to improve psychological measurement.

The first author of this article has spent more than a half century considering, researching, and publishing on the topic of job attitudes (e.g., Smith, 1942). These diverse explorations have ranged from efforts to combat industrial monotony (Smith, 1955), through research on job involvement (Schwyhart & Smith, 1971), to study of the Protestant Work Ethic (Wollack, Goodale, Wijting, & Smith, 1971). It is her efforts to understand and measure job satisfaction and other closely related work attitudes, however, that have undoubtedly provided her most influential contribution to the field (e.g., Smith, Kendall, & Hulin, 1969; Ironson, Smith, Brannick, Gibson, & Paul, 1989). In the course of this work, the first author has stocked an enormous experiential warehouse with a rich variety of information. Consulting projects, norming studies, unpublished theses and dissertations, the odd data set that never made print, and numerous other sources of empirical, if sometimes anecdotal, data have filled this warehouse. The present review intends to throw open the warehouse doors and examine the contents to yield a set of useful messages from the data contained therein.

Direct all correspondence to: Patricia C. Smith, Bowling Green State University, Bowling Green, OH 43402-2719, USA.

Human Resource Management Review, Volume 8, Number 4, 1998, pages 367-386. Copyright © 1999 by Elsevier Science Inc. All rights of reproduction in any form reserved. ISSN: 1053-4822.


By mining about 50 years of professional experience in industrial psychology, we hope to reveal patterns of success – and failure – that carry useful lessons for practitioners and scientists with an interest in the measurement of job attitudes. Think of this as a qualitative meta-analysis of a career's worth of published and unpublished research. The observations and generalizations in this article are attributable primarily to the first author. The second author has participated only in the most recent phases of the research and so serves mainly an organizing and integrating role in the presentation of this material. Hence, the apparently awkward use of the first person in a coauthored article correctly represents the voice of the first author. Both authors, however, stand behind the conclusions and recommendations presented here.

The paper presents two major sections. The first section organizes the first author's research and applied experiences into a set of themes or perspectives. The second section develops action recommendations for scientists and practitioners based on the issues raised by these themes. We intend to provide useful advice and direction for scientist/practitioners who are interested in the measurement of job attitudes.

USE THE NATIVE TONGUE

Henry Gleitman, an experimental psychologist and former colleague at Cornell, once pinpointed a goal for the clarity and parsimony of scientific communication: "If you can't explain it to your grandmother, then you're explaining it wrong." Although this lesson could be widely applied in the social sciences, it has special significance in the realm of measurement of job attitudes. In particular, we organizational researchers must make the strongest possible effort to present our measurement instruments in clear language, appropriate for the individuals who will be responding to them.

Investigating Monotony

Through naive and fortuitous common sense, I put this approach to work while researching my dissertation (Smith, 1942, 1953, 1955). I worked with women in the garment industry to examine individual differences in feelings of monotony. Perhaps in part because of my age and gender, I was quickly able to gain the trust of the women in the mill. This trust made it possible for me to become immersed in their milieu and, in turn, to understand their work and jobs in the thoughts and language they themselves used to describe them. I made an effort to dress as they did – no business suits – and I also worked collaboratively with their union. These efforts had an incalculable payoff in providing unique insights into the research. The biggest payoff was in knowing the right questions to ask and how to ask them. I was able to fine-tune the language of my research survey in interviews and follow-ups with the workers.


Investigating Performance Measurement and Feedback

I carried this formative experience in speaking the language of the workers into my work on the development of behaviorally anchored rating scales (BARS) (Smith & Kendall, 1963; Bernardin, LaShells, Smith, & Alvares, 1976; Bernardin & Smith, 1981). Based in part on Flanagan's (1949) idea of referencing observed behavior, I imagined a set of rating scales that would be useful for feedback. This meant that the same people who would be making or receiving the ratings would word the anchors. In the 1963 study, specialists from the National League for Nursing and I worked with medical-surgical nurses to develop the rating scales. Each of the steps in the development of BARS included extensive input from the nurses. Head nurses provided the list of performance areas, definitions of these areas, behavioral observations and examples, and judgments about all these elements. Although the initial motivation for the development of BARS was to improve the psychometric characteristics of the ratings and their usefulness for feedback, another benefit that emerged was the high level of acceptability of the rating scales for both raters and ratees. I believe the scales were acceptable for two important reasons: first, head nurses and other nurses had participated in their development, and second, the scales were conceptualized and phrased in the language those nurses used to describe nursing performance.

Investigating Measurement of Satisfaction

Efforts leading to the development of the Job Descriptive Index (JDI) adhered to this same philosophy (Smith et al., 1969). My students and I began by asking people to imagine and describe the characteristics of their worst and best jobs. This foundation helped ensure that the resulting instrument would be meaningful, readable, and easy to complete.

The lesson that I extracted from all these experiences was this: when developing psychological measures, use a process from start to finish that captures the language and meanings of those who will be your respondents. An important corollary is that psychometricians should attend to the mental processes that we have asked respondents to use. Does our instrument ask them to think, feel, remember, predict, judge, generalize, infer, introspect, or some combination of these? As they complete our instruments, we should ask respondents to engage only in those mental activities that they are comfortable and practiced at performing. For some respondents, this implies keeping the amount of reading and the reading level as low as possible.
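Reading level, at least, is easy to screen before piloting. The brief sketch below is not something used in the JDI work; it simply shows how one might check candidate item stems with the third-party textstat package (an assumed dependency), using invented example items.

```python
# Hedged sketch: estimate the reading grade level of candidate item stems.
# Assumes the third-party "textstat" package is installed (pip install textstat);
# both items below are invented for illustration only.
import textstat

candidate_items = [
    "My work gives me a sense of accomplishment.",
    "Organizational contingencies attenuate my affective commitment.",  # jargon-heavy
]

for item in candidate_items:
    grade = textstat.flesch_kincaid_grade(item)
    print("%-62s  grade level: %4.1f" % (item, grade))
```

A check like this is no substitute for interviews with respondents, but it flags item stems whose vocabulary has drifted far from the respondents' own language.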

FOLLOW THE STREAM TO ITS SOURCE


Without pointing fingers at specific authors, I suspect that many research projects published in the major journals represent an end point for a particular line of research. Perhaps some unfortunate interplay between journal editors and tenure committees is to blame for this tendency of authors to bounce from one publishable project to another largely unrelated, but also publishable, one. Whatever the cause, we lose considerable intellectual depth and excitement when a research stream is not followed back to its various and usually numerous headwaters.

The Construct of Job Satisfaction

To illustrate a contrast with the present situation, consider the explorations of the construct of job satisfaction beginning with the publication of Locke, Smith, Kendall, Hulin, and Miller (1964). Actually, even this initial article was just the tip of a validation iceberg that eventually included collection of 21 different industrial samples in 18 organizations, a research presentation by Kendall (1961), and a technical report by Kendall, Smith, Hulin, and Locke (1963). Further considerations of these data appeared in Hulin and Smith (1965), which explored and discounted the possibility of a curvilinear relationship between job satisfaction and job tenure. That whole decade of job satisfaction research was finally captured and synthesized in the publication of the book The Measurement of Satisfaction in Work and Retirement by Smith et al. (1969).

Publication of this first book represented a possible and logical stopping point. No colleague would have argued with the decision to strike off in a new direction or pursue a subsidiary interest at that time. But clearly, additional work remained. Smith, Smith, and Rollo (1974) confirmed the factor structure of the JDI in a racially diverse sample. Johnson, Smith, and Tucker (1982) negated other researchers' assertions of the superiority of a Likert-scaled response format for the JDI. Smith et al. (1987) reported the first modernization of the JDI item content. Ironson et al. (1989) developed the initial version of the multi-item "Job-In-General" scale, which added a psychometrically sound, global satisfaction measure to the facet measures. These efforts (as well as much great work from other job satisfaction researchers) culminated in the publication of a second book, entitled Job Satisfaction, edited by Cranny, Smith, and Stone (1992). Work on the JDI and explorations of job satisfaction and related constructs continue to the present.

The important point emerging from this overview of JDI research concerns the value of a sustained research effort. Four decades of job satisfaction research have produced insights that simply could not have been obtained from a few months or years of data collection and analysis, followed by a single publication. While present-day organizational research tends to be "one-shot," follow-up and follow-through are critically important. I believe this point has special importance for efforts to develop instruments. I recognize, of course, that such single-minded focus is a luxury that many researchers in applied settings cannot afford. This fact only makes the issue that much more pertinent and imperative for academic researchers.


CHASE AFTER BUGS

One of the unfortunate side effects of "hit and run" research efforts is that sometimes we organizational researchers give up on a data set too quickly. Having demonstrated our initial hypotheses and expectations to be tenable, we shelve the data and move on to newer and perhaps more exciting pieces of research.

An Early Example

On occasion, we resist this temptation and stick with some data long enough to uncover something unique. For example, Jean Taylor's thesis (Taylor & Smith, 1956) investigated workers' learning curves with the expectation of confirming what so many researchers had previously found: a smooth curve that begins steeply but gradually levels off to a plateau. Armed with present-day computing gear, we probably would have run a nice curve-fitting program, found an acceptable fit, and been done with it. Instead, Jean Taylor and I somewhat laboriously plotted the curves for 70 individual operators and graphed them on common axes. What became clear from these plots, and even clearer in composite curves, was that these were not smooth curves at all: in fact, operators exhibited three distinct stages in which they apparently learned different aspects of the tasks. Learning seemed essentially linear within each stage, and this created three different slopes. Together, the three slopes gave the appearance of the classic power law for learning. We might never have found the multi-stage character of the task without a willingness to look for unexpected patterns.

A Modern Example

On the same theme, Delaney, Reder, Staszewski, and Ritter (1998) recently questioned the accuracy of this power law of learning. These researchers report data contradicting Newell and Rosenbloom's (1981) description of the "universal power law of practice." Delaney et al.'s study wonderfully illustrates my point: their careful analysis revealed the presence of a power law of learning, but only within the course of a single task strategy. When the task strategy changed, a subtle discontinuity appeared in task performance. Such discontinuities appear to have been overlooked or ignored by previous researchers such as Newell and Rosenbloom. As a side note, I found it curious that Delaney et al. did not cite some of the older literature on this topic. For example, Fleishman and Fruchter (1960) and Ghiselli and Haire (1960) published on the change in factor structure of a task during learning, and French (1965) researched differences in the abilities involved when different task strategies were employed.

J. P. Guilford, my undergraduate mentor, often said, "Milk your data dry." More specifically, I might advise that the discontinuities, unexplainable discrepancies, and glitches sometimes hide the richest insights into organizational data. Openness to looking in your data for unexpected patterns and relationships can lead to unexpected payoffs.
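In that spirit, the sketch below (synthetic data, not the Taylor and Smith operators) shows how one might plot a single operator's curve and compare a one-piece power-law fit against separate linear fits within three assumed stages. The stage boundaries, slopes, and starting values are all illustrative assumptions.

```python
# Hedged sketch: one synthetic operator with three roughly linear learning
# stages; compare a smooth power-law fit to a three-segment linear fit.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

weeks = np.arange(1, 31)
stage_means = np.concatenate([
    20 + 4.0 * np.arange(1, 11),   # stage 1: steep improvement
    60 + 2.0 * np.arange(1, 11),   # stage 2: moderate improvement
    80 + 0.5 * np.arange(1, 11),   # stage 3: slow polishing
])
output = stage_means + rng.normal(0, 1.5, weeks.size)

def power_law(t, a, b, c):
    # Classic smooth learning curve approaching an asymptote: a - b * t**(-c)
    return a - b * t ** (-c)

params, _ = curve_fit(power_law, weeks, output, p0=[90.0, 80.0, 0.5], maxfev=10000)
power_pred = power_law(weeks, *params)

# Piecewise fit: an ordinary least-squares line within each assumed stage.
piece_pred = np.empty_like(output)
for lo, hi in [(0, 10), (10, 20), (20, 30)]:
    coef = np.polyfit(weeks[lo:hi], output[lo:hi], 1)
    piece_pred[lo:hi] = np.polyval(coef, weeks[lo:hi])

def rmse(pred):
    return np.sqrt(np.mean((output - pred) ** 2))

print("Power-law RMSE:      %.2f" % rmse(power_pred))
print("Three-segment RMSE:  %.2f" % rmse(piece_pred))

# Plotting individual curves like this, operator by operator, is what exposed
# the stages in the original work.
plt.scatter(weeks, output, s=12, label="observed output")
plt.plot(weeks, power_pred, label="single power-law fit")
plt.plot(weeks, piece_pred, label="three linear stages")
plt.xlabel("Week of practice")
plt.ylabel("Units produced")
plt.legend()
plt.show()
```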


Stress vs. Satisfaction

The JDI research group, which has existed for more than 30 years now, is engaged in this kind of investigative process right now with regard to the relationship between stress and job satisfaction. We have repeatedly used the Stress-In-General (SIG) instrument (Smith, 1994) to measure work-related stress and the Job-In-General (JIG) instrument (Ironson et al., 1989) to measure global job satisfaction. Every time we pair these instruments in a data collection effort, we seem to get a radically different correlation – sometimes very small, sometimes very large. Is this a measurement problem? Probably not, since the internal reliability estimates of both measures average more than 0.90. We in the JDI research group are inclined instead to think that we are looking at different regions of a curvilinear relationship. We are in the process of publishing data that strongly suggest the existence of such a relationship.

In a consulting situation, I found a similar anomaly in the relationship of general job satisfaction to intention to quit. Normally, these two variables are strongly and inversely correlated. But at a garment plant in the rural South, I found a very different situation: high levels of job satisfaction and high levels of intention to quit and actual turnover. Upon close examination of the work situation, two important variables appeared that made sense of the anomalous relationship. The social and economic circumstances of the women who worked in the plant ensured that most of them would quit their jobs after a few months, only to be hired back again a few months later. The other variable was trust. These women had such a high degree of trust in their manager that their satisfaction with the restrictive work they did was much higher than I would have expected. Do these two variables moderate the relationship between satisfaction and intent to quit? I never tested a moderator hypothesis at that plant, but tracking down the anomaly led me to those unexpected insights.

Some Newer Perspectives

Some recent theoretical developments, such as catastrophe theory and chaos theory, also go along nicely with the idea of tracking down anomalies. In the field of epidemiology, researchers have developed the notion of a tipping point. The tipping point refers to a kind of critical mass of infection, beyond which the spread of disease becomes epidemic. Too rarely in organizational research do we give careful thought to such areas of non-linearity. For instance, the decision of an employee to leave a firm might come after a series of small setbacks or difficulties, each of which, taken singly, has only mild or negligible negative effects on job attitudes. At some critical point, however, the employee's commitment or satisfaction drops below an important threshold where the intention to quit becomes magnified and potent. After that, a chance occurrence, such as receiving a call from a recruiter, could easily trigger the employee's resignation. Somewhere in the great archive of partially analyzed data that has accumulated as a result of neglecting our own advice, there probably lies an insight into this problem.

The reverse of the tipping point can also occur. A recent book entitled Fixing Broken Windows (Kelling & Coles, 1996) presents evidence that attending to the little problems can help address the big problems. In terms of job attitudes, this phenomenon suggests that improvements in such small, niggling workplace issues as unnecessary bureaucratic complications may have unexpectedly large positive effects on workers' job attitudes.

Anecdotally, we have noticed such non-linearity in the relationship between pay and pay satisfaction as measured with the JDI. We have observed that a modest increment in pay can have an unexpectedly strong positive effect but that equal-sized increments thereafter do not have an equivalent effect on pay satisfaction. The same non-linearity may be true of the other facets of satisfaction as well. The main point is that a standard correlation or regression analysis would tend to mask these non-linear effects.
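A simple first check for the kind of curvilinearity described above is to let a quadratic term compete with the linear model. The sketch below uses synthetic numbers rather than the JDI group's data and assumes the statsmodels package; the inverted-U shape is built in purely for illustration.

```python
# Hedged sketch: test for a curvilinear stress-satisfaction relationship by
# comparing a linear model with one that adds a quadratic term.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 300
stress = rng.uniform(0, 10, n)
# Assume an inverted-U-shaped relation purely for illustration.
satisfaction = 5 + 1.2 * stress - 0.15 * stress ** 2 + rng.normal(scale=1.0, size=n)

X_linear = sm.add_constant(stress)
X_quadratic = sm.add_constant(np.column_stack([stress, stress ** 2]))

linear = sm.OLS(satisfaction, X_linear).fit()
quadratic = sm.OLS(satisfaction, X_quadratic).fit()

print("Linear R^2:             %.3f" % linear.rsquared)
print("Quadratic R^2:          %.3f" % quadratic.rsquared)
print("Quadratic term p-value: %.4f" % quadratic.pvalues[2])
```

If data really do come from different regions of a curve, the sign and size of a simple correlation will swing from sample to sample even when the underlying relationship is stable, which is one reading of the SIG-JIG pattern described above.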


Cleaning the Data

Of course, some discontinuities truly represent problems in data entry, coding, or analysis. In Smith et al. (1986), we advocated the use of 13 rules for avoiding, detecting, and correcting problems in data sets. Interesting non-linearities and other unexpected relationships found in research data should first be attributed to problems in data collection, entry, or analysis. I start cooking up new explanations for weird relationships only after I have exhausted the possibility that I have made a mistake in the data. The experiences discussed above also underscore a related bit of advice: the more involved you are with your own data collection and data entry, the more likely you are to recognize data anomalies for what they really are.

Cleaning the data goes well beyond correcting data entry errors, of course. Removing poorly functioning items from scales can greatly improve investigation of the substantive research questions. For example, when I removed a single item – "Too much to do" – from the work scale of the 1985 version of the JDI, the correlation between the scale and a measure of stress dropped 0.08. I considered this change a victory for discriminant validity: the dropped item had caused a spuriously high correlation between satisfaction and stress in the earlier version of the scale. This example shows that the "dog-work" of cleaning and clarifying scales has numerous payoffs, including the possibility of more meaningful comparisons across studies and of regression models uncluttered by error variance. With clean data, we can better examine the substantive questions of the research.

I can't resist pointing out one additional, related lesson. Always begin by examining data at the individual level. Use graphs, scatterplots, bar charts, histograms, or just lists of variables and values. Are the individual cases plausible? Do you see anything strange? Is the pattern of missing data meaningful? Do the patterns fit your preconceptions? Do they fit with previous findings? If individual-level data do not make sense, the aggregated statistical tests won't make any sense either. Don't pool your data too soon. Sometimes when you pool your data, you lose the trees for the forest.
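A minimal sketch of the item-drop check described above, using synthetic responses rather than JDI data: compare a scale's correlation with an outside measure when a suspect item is kept versus dropped. All loadings, item counts, and names are assumptions made for illustration.

```python
# Hedged sketch: how much does one contaminated item inflate a scale's
# correlation with an outside measure? Synthetic data, not the JDI analysis.
import numpy as np

rng = np.random.default_rng(2)
n = 400
satisfaction = rng.normal(size=n)
stress = rng.normal(size=n)

# Five "work" items load on satisfaction; the fifth also picks up stress content.
items = np.column_stack(
    [0.7 * satisfaction + rng.normal(scale=0.7, size=n) for _ in range(4)]
    + [0.5 * satisfaction + 0.5 * stress + rng.normal(scale=0.7, size=n)]
)
stress_scale = 0.8 * stress + rng.normal(scale=0.6, size=n)

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

full_scale = items.sum(axis=1)
trimmed_scale = items[:, :4].sum(axis=1)   # drop the suspect item

print("r(work scale, stress) with suspect item:    %.2f" % corr(full_scale, stress_scale))
print("r(work scale, stress) without suspect item: %.2f" % corr(trimmed_scale, stress_scale))
```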


DATE MODELS, BUT DO NOT MARRY THEM

In an article on theory building, Locke (1996) made the argument that researchers should let the weight of evidence accumulate before attempting to synthesize theory. Building theory too early may lead to considerable wheel-spinning as new and possibly contradictory evidence piles up and the theory must be patched to fit the new facts. I was Ed Locke's dissertation adviser, and I feel free to paraphrase his good suggestion as "let the data do the talking." While it is natural to begin with one's own assumptions and biases, I assert that researchers should make their implicit causal models explicit – but not marry them. Diagram or write down your mental model. Let the weight of anomalies that appear in your research shift the structure of your mental model. Keep testing your mental model by pitting it against alternative and especially simpler models. I can illustrate these ideas with reference to the earliest development of the JDI facet measures.

Description vs. Evaluation

Early efforts on the development of the JDI hid a number of unresolved questions. For instance, could or should description of the job and evaluation of the job be assessed separately? A process of trial and error ensued in which attempts were made to separate description and evaluation. The dissertation research of Maas (1966) made a good first try at separating descriptive words such as "hot" from evaluative words such as "good." Upon my arrival at Bowling Green State University, I asked the psychology faculty to divide the initial JDI items into two separate piles: a descriptive pile and an evaluative pile. They couldn't do it! Irreconcilable disagreement about the sorting appeared. Further, factor analyses showed no breakdown along descriptive and evaluative lines. Eventually, I concluded that, although psychologists can define description and evaluation as different response modes, for the purposes of measuring job satisfaction, respondents make no such separation. Most adjectives seem to contain an inherent evaluative component, and those few that are truly "neutral" are not very useful for assessing individual differences.

As efforts to refine the JDI have continued, I have repeatedly tested evidence for and against a separation between description and evaluation. Recently, Weiss and Cropanzano (1996) proposed a model called Affective Events Theory for examining the consequences of emotionally charged events at work. They may have begun to resolve the issue by showing connections between feelings about work events and judgments about them. Until we get more data, however, I wouldn't consider the issue entirely resolved. And I am perfectly satisfied to have the question sit there. Meanwhile, my colleagues and I will keep dreaming up new ways of getting evidence to resolve the question.
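One way to keep dating rather than marrying a model is to pit it against a simpler rival on the same data. The sketch below is not the original JDI factor analysis; it simulates adjective-style items from a single latent factor and compares one- and two-factor models by cross-validated likelihood, using scikit-learn as an assumed tool choice.

```python
# Hedged sketch: compare a one-factor model against a two-factor
# "description vs. evaluation" model on synthetic adjective items.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 600
general = rng.normal(size=n)   # one underlying attitude factor, by assumption

# Ten adjective-style items; in this simulation all load on the single factor,
# so the simpler model should fit about as well once overfitting is penalized.
items = np.column_stack(
    [0.7 * general + rng.normal(scale=0.7, size=n) for _ in range(10)]
)

for k in (1, 2):
    fa = FactorAnalysis(n_components=k, random_state=0)
    cv_loglik = cross_val_score(fa, items, cv=5).mean()
    print("factors=%d  mean cross-validated log-likelihood=%.2f" % (k, cv_loglik))
```

When the extra factor buys no real improvement in out-of-sample fit, the data are telling you to stay with the simpler mental model for now.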


This theme definitely ties back to the idea of sustained effort in pursuing a research area. Pure replications appear to be unpopular in our field, to judge by the tables of contents of scientific journals, but partial or complete replications seem like the best way to make sure that the model we have in our heads is the one that the data keep confirming. The study that confirms a previous finding contributes to the science by adding incrementally to our confidence in the underlying theory.

WHEN YOU'RE SURE, POUND THE TABLE

"If it was your own money, Doc, what would you do?" This question – from long ago but still well remembered – left me shaken. An old-time plant manager in a little factory in Appalachia challenged my delay in deploying a set of selection tests. I had pedantically explained to him the importance of the 5% significance level. The validation procedure had been picture-perfect, the criteria nice and clean, the qualitative reports enthusiastic, and the little dots relating scores to current performance on the scattergram lined up nicely. While the factory awaited my OK to use the tests, many people were being hired with primitive methods, and money and effort were being spent training them. Criterion data on these cases were months away. Of course, if it had been my own money at stake, I would have immediately begun screening with the tests rather than waiting for the rest of the data. I knew the tests were working, but academic caution had overtaken common sense. I undervalued my own ability to make an accurate tactical decision. I knew a great deal more about the people and organizations than the significance tests let on. The lesson I learned from these events was not to sell short my expertise, knowledge, judgment, and intuition.
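A back-of-envelope answer to the plant manager's question can be put in dollar terms with the standard Brogden-Cronbach-Gleser utility framing. This is not how the original decision was made, and every number below is an assumption inserted purely for illustration.

```python
# Hedged sketch: rough expected value of using a validated selection test now
# rather than waiting. Brogden-Cronbach-Gleser framing; all figures assumed.
from statistics import NormalDist

r_validity = 0.35          # observed validity of the test battery (assumed)
sd_y = 4000.0              # dollar SD of yearly performance per hire (assumed)
selection_ratio = 0.5      # hire half of the applicants (assumed)
cost_per_applicant = 20.0  # cost of testing one applicant (assumed)
hires_per_month = 10       # hiring volume at the plant (assumed)

# Mean standard score of those selected under top-down selection:
# z_bar = pdf(cutoff) / selection_ratio, with cutoff from the normal quantile.
cutoff = NormalDist().inv_cdf(1 - selection_ratio)
z_bar = NormalDist().pdf(cutoff) / selection_ratio

gain_per_hire = r_validity * sd_y * z_bar - cost_per_applicant / selection_ratio
print("Expected yearly gain per tested hire: $%.0f" % gain_per_hire)
print("Forgone yearly gain from one month of untested hiring: $%.0f"
      % (gain_per_hire * hires_per_month))
```

Even with conservative assumptions, the arithmetic usually favors acting on a clean validation rather than waiting for another round of criterion data, which is the plant manager's point in different clothing.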


Stress vs. Satisfaction Reprise

So now I will take a moment to apply the lesson to a message about job attitudes that I have failed to sell before. Dissatisfaction and stress are two separate and distinct feelings that must not be confused with each other. I am sure of the distinction, as sure as I can be after many years of research and application. I also believe the distinction has clear practical applications as well as theoretical importance. Dissatisfaction and stress have different causes. They are accompanied by different behaviors. They have different economic consequences for the organization and the person. And they should often be alleviated by different means. They may or may not occur at the same time in the same situation. Hard evidence shows, for thousands of cases, that items measuring job satisfaction lie on an axis nearly orthogonal to an axis defined by stress items. The correlation between the two varies, but the overlap averages about 10% using highly reliable measures. The two feelings obviously can occur together but do not do so consistently.

The big deal here is that job satisfaction is highly correlated with evaluation of the work itself, while job stress is equally highly correlated with time pressure (at least for U.S. workers). Job satisfaction, on the other hand, has little to do with time pressure, just as job stress is essentially unrelated to satisfaction with the work itself. Satisfaction has to do with preferences and liking; stress has to do with arousal and activation. Straying from the hard data to anecdotal observations, people who dislike the nature of their work assignments will not necessarily like the work better if it is spread more evenly over time. Likewise, distressed and overloaded people will not become calm by hurriedly doing "nicer" tasks. Eventually, members of both groups quit, but the dissatisfied person is likely to have planned the move, while the distressed person will explode and walk out when the last straw breaks the camel's back. How we approach fixing dissatisfaction and distress must also differ. Perceptions of control probably play an important role in alleviating both maladies.

IN GENERAL, BE SPECIFIC

I have noticed, even in my own research group, that we as researchers pay too little attention to levels of specificity in measurement. We often try to predict some specific behavior from a general construct or measure a general construct with very specific items. I believe we need to keep an eye on measuring the core of the construct while we simultaneously worry about the proper level of specificity in item content. If our measurement strategy requires approximating the core construct by a number of peripheral measures, we have to stay cognizant of the need for representativeness. The higher the degree of specificity of the items, the larger the number of items must be and the more carefully balanced the item sampling must be. Otherwise, we begin to capture unwanted content factors.

A Geometric Analogy

Rather than begin by illustrating these ideas with an example, I want to show a diagram that I have found useful. Fig. 1 shows a cone shape with a cylinder of constant radius in the middle. The cylinder represents the idealized "core" of the construct, and the height along the cylinder represents the degree of item content specificity with which the construct is measured. Thus, the top of the cone represents the highest possible level of "global-ness" and generality in the measurement device. For example, in attitude measurement the top of the cone would simply be the good-bad dichotomy.


Figure 1. The Measurement Cone Represents Specificity of Item Content Vertically and the Size of the Potential Item Pool as the Width of the Cone at Any Given Height

As one proceeds down the cylindrical core, one moves toward greater and greater specificity in item content. For example, "talk too much" is a stem that assesses one's attitude toward coworkers with a degree of specificity. The cone-shaped area surrounding the core represents the content space from which items can be sampled. At the very top, few items exist and they are extremely general. As one proceeds downward toward more specific, content-loaded items, greater variety emerges. With this greater variety, however, comes a greater likelihood of straying away from the core of the construct. Without careful, representative sampling of items, one cannot triangulate on that center zone where the construct lies. Of course, one could include items of both a general nature and of a specific nature. Most of the JDI facets contain such a mixture.

One implication to draw from this cone diagram concerns item sampling vs. internal consistency. As one proceeds down the cone toward greater specificity of item content, it becomes increasingly difficult to obtain high values for Cronbach's alpha. With careful, broad content sampling, as advocated above, and very specific item content, one can compile a set of good items that have fairly tenuous correlations with each other. Such items cover the construct broadly but make a poor showing on measures of internal consistency. This situation would not really be problematic except for our irrational attachment to Nunnally's (1967) magical .70 threshold for alpha. Scale developers have often responded by discarding diverse items, purifying factor structure, and consequently over-sampling certain small, lopsided content regions near the base of the cone. This approach improves alpha at the cost of creating a narrow measure that does not fully cover the construct. The cone analogy suggests two better approaches: increase the number of specific items sampled from the lower parts of the cone, or create a more global measure with fewer items nearer the top of the cone.
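The alpha trade-off is easy to see numerically. The sketch below generates two synthetic eight-item scales, one broad and specific and one narrow and redundant, and computes coefficient alpha for each; the loadings are assumptions chosen only to make the contrast visible, not estimates from any real scale.

```python
# Hedged sketch: broad, specific item content vs. narrow, redundant content.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(1)
n = 500
construct = rng.normal(size=n)           # the latent "core" of the construct

# Broad scale: each item taps the core weakly plus much specific content.
broad = np.column_stack([0.4 * construct + rng.normal(size=n) for _ in range(8)])
# Narrow scale: near-duplicate items drawn from one small content region.
narrow = np.column_stack([0.9 * construct + 0.3 * rng.normal(size=n) for _ in range(8)])

print("alpha, broad but specific items: %.2f" % cronbach_alpha(broad))
print("alpha, narrow redundant items:   %.2f" % cronbach_alpha(narrow))
```

The narrow scale wins on alpha while sampling only a sliver of the cone; whether that is a victory depends on how much of the construct you intended to cover.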


Another implication of the cone analogy concerns relationships between constructs. With some spatial imagination, one could visualize a pair of overlapping cones and closely spaced cylinders. The closely spaced cylinders represent two related constructs, and the overlapping cones suggest that the same item content can tap more than one construct, depending upon the scale context in which it is administered. This situation becomes more and more likely as item specificity increases, since the intersecting space of the two cones becomes larger as you progress toward the bases of the cones. Moreover, the correlation between the two constructs will be largest when both are measured with items at the same level of specificity. Whether this correlation is considered spurious or meaningful depends upon the application to which the measures are being put.

A Consulting Example

I once completed a consulting effort at a very troubled power plant where a few of these ideas were illustrated. We measured facet job satisfaction and global job satisfaction both before and after conducting an intervention to improve communication. The intervention involved developing BARS scales with feedback on observations of behavior. After the intervention, we found meaningful and statistically significant differences in satisfaction with supervision and satisfaction with coworkers, but no difference whatsoever in global job satisfaction. All of our intervention efforts had been directed at improving relationships and communication between people. The intervention impacted very specific classes of behavior. Our facet measures of satisfaction with supervision and satisfaction with coworkers comprised many items that also referenced specific behavior. In contrast, global satisfaction was measured at the highest level of generality – good vs. bad. Even though the facets were quite strongly correlated with the global measure, the intervention affected scores on the former but not the latter. How could our intervention have failed so completely to affect one of the major variables of interest? Perhaps changes in a facet do not cause changes in a global measure. Or did we measure global satisfaction too soon? Perhaps there is an inherent time lag and we did not wait long enough.

The other relevant anecdote from that power plant site was related to the use of a job characteristics survey developed by some other researchers. We realized that our "work" scale – satisfaction with the work itself – contained many items that overlapped with sub-scales of the job characteristics survey. In terms of the concepts represented in Fig. 1, we had two closely related constructs measured at a moderately specific level: our measurement "cones" intersected. We had used some of the same item stems, though within the context of different scales. Analysis of the data revealed two expected outcomes. First, the two scales were highly correlated. Second, a factor analysis of the component items of both scales revealed that even the overlapping items sorted themselves into the proper factors.


A Scale Development Example

This latter success highlights a potential danger, however. If we had sampled items in a particular content area too heavily for either of the two constructs, those items would have split off to form their own content factor. We ran into this problem when we pilot tested a new group of items for the 1997 revision of the JDI work scale. We included four experimental items pertaining to control over one's work tasks (Bachiochi, Stanton, Robie, Perez, & Smith, 1998). These items split into a separate, but correlated, factor rather than adhering to the main factor of the work scale. We had sampled this specific content area too heavily and created a somewhat lopsided mixture of items.

Closing Thoughts

I would like to close this section with a general plea for more care in constructing our organizational research instruments. My experience converges on the idea that organizational researchers must attend carefully to whether their items and scales measure features of the work environment, characteristics inherent in people, or the interactions between the two. In circumstances where people are defensive about revealing aspects of themselves, a focus on measuring the work environment may provide us with superior information. In other circumstances, where people's emotions and temperaments may color or distort their reports of the work environment, we may more profitably measure characteristics of people without reference to the environment. Of course, like the rest of my prognostications, this advice is subject to empirical verification. I am particularly curious to learn more about the extent to which temperament influences responses to the whole range of organizational research instruments. This idea, in turn, triggers a consideration of whether personality characteristics such as extraversion, affectivity, and trait anxiety may have a much more profound effect on job attitudes than we have supposed and documented so far. If temperament affects job attitudes, and job attitudes affect job behavior, the current trend of using personality tests in selection may yet reveal some uncharted depths.

RECOMMENDATIONS

Recommendation 1: Begin with Immersion


Work in organizations before you attempt to measure any construct. If possible, perform the job held by the incumbents you plan to measure. Learn the language and culture of your study population. Collect informal, qualitative, and pilot data before committing to a particular measurement method. Understand the thought processes that individuals in your target population use when evaluating your construct of interest. Collect plenty of data before publishing your grand unifying theory – enough so that the underlying patterns emerge and begin to repeat themselves. Let the data speak.

The flip side of this coin is organizational access. If you personally have decision-making influence in an organization, please consider granting access to researchers so that they can follow the above advice. Too often we organizational researchers face access denials based on concerns that the presence of researchers and their surveys will stir up trouble, reduce productivity, or engender the problem behavior that is the focus of our study. Although a number of intemperate responses usually come to mind at this point, my main rebuttal to such arguments is that they credit employees with very little intelligence. The great preponderance of employees needs no outside help to focus their disaffection, if it already exists. Few organizational environments could fail to benefit from the presence of ethical and professional researchers to help facilitate communication between and among employees and managers. The willingness to support organizational research usually has to start at or near the top, so if you are in a position of authority in your organization, please consider opening your doors.

Of course, we researchers have to do our part as well. Organizational access will improve if our research and intervention proposals are tailored to the needs of the whole organization, not merely to the concerns of the human resources department (who sometimes feel a bit isolated anyway). We, as internal or external organizational consultants, must bear in mind that the non-HR functions of the organization – planning, production, quality control, and so on – are all interrelated. In other words, do not just do a selection project for a client; think systemically and make sure that your project fits and addresses the whole context. Our own research agenda is likely to become more meaningful in the process. Fragmentation makes for unrealistic proposals – and often for contaminated research results. If we do not have the power to integrate across all parts of the organization, we can at least work in that direction. Integration is not only desirable for smooth-running operations but also a road to long-term organizational survival.

Recommendation 2: Do Not Forget the Classic Literature

I got very excited when I recently read that PsycInfo was expanding to cover psychological literature all the way back to 1872. Armed with modern search engines, on-line journals, and electronic citation indices, researchers find it all too easy to ignore findings that were published prior to 1980. In developing the BARS scales, for example, I remembered the Fels Parent Behavior Rating Scale, a vertical scale with behavioral anchors, reported in Guilford's (1954) Psychometric Methods.


Similarly, in working with learning curves (Taylor & Smith, 1956), I remembered a procedure called Vincent curving, in which different individual learning curves were converted to common psychological axes by re-scaling both output and time as percentages of their values at criterion. This technique revealed the underlying changes in process and in what needed to be emphasized in training. Finally, in making a clear distinction between satisfaction and stress, I drew strength from knowing that psychologists from Wundt (1896) onward had distinguished pleasure-displeasure from activation or arousal. Examples, with varying terminology, are Schlosberg (1952), Osgood, Suci, and Tannenbaum (1957), and Russell and Mehrabian (1977).

Each of these examples illustrates how critical it can be to have a broad familiarity with older research literatures and especially with the classic publications of our field. To avoid re-inventing the wheel, it is every researcher's responsibility to ensure that the research question, or one of its corollaries, has not already been asked and answered by a previous generation of psychologists. Along the way one usually uncovers a few jewels that make the trip more than worthwhile.
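For readers unfamiliar with the procedure, here is a rough sketch of the re-scaling idea described above (not the original Vincent method in detail): each learner's time axis is expressed as a percentage of time to criterion and output as a percentage of criterion output before averaging. The example curves are invented.

```python
# Hedged sketch of Vincent-style re-scaling: put learners who reach criterion
# at different speeds onto common percentage axes, then average the curves.
import numpy as np

def vincentize(curves, n_points=10):
    """curves: list of 1-D output sequences, one per learner, ending at criterion."""
    common_x = np.linspace(0, 100, n_points)      # percent of time to criterion
    rescaled = []
    for y in curves:
        y = np.asarray(y, dtype=float)
        x = np.linspace(0, 100, y.size)           # this learner's percent-of-time axis
        y_pct = 100 * y / y[-1]                   # percent of criterion output
        rescaled.append(np.interp(common_x, x, y_pct))
    return common_x, np.mean(rescaled, axis=0)

# Three invented learners who reach criterion at different speeds.
curves = [
    [10, 14, 19, 25, 30],
    [8, 12, 15, 19, 24, 28, 31, 33],
    [12, 18, 23, 27, 29, 30],
]
x_pct, mean_curve = vincentize(curves)
print(np.round(mean_curve, 1))
```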


Recommendation 3: Foster a Good Measurement Culture

Support a culture of good measurement in our field. I have a particular concern about the status of measurement in our scientific publications. Perhaps journal editors and reviewers could, in greater numbers, join an effort to stop rewarding "adhocracy" and start rewarding a deliberate, thoughtful, and long-term approach to measurement. It ought to be feasible to publish a well-developed measurement paper in any top journal. After that initial publication, it should be possible to accrete additional validity evidence via published articles. My intuition tells me that both of these activities have become exceedingly difficult. If collecting and publishing validity evidence constituted a publishable intermediate product, researchers might be less inclined to publish substantive findings prematurely.

Perhaps we need separate journal space for demonstration studies. Demonstration studies refine or confirm what we already know or suspect, while other research efforts help develop new theory by refuting prior assumptions. Instrument development and validation efforts often appear to fit better in the former class than the latter. Such studies sometimes appear to get short shrift because they fail to "break new ground" or "make a worthwhile contribution." Other prognosticators have asserted that the Internet will eventually revolutionize academic publishing and make present concerns about journal space moot. Regardless of how fast this revolution overtakes us, editors and reviewers should consider providing support for research "building blocks." If a manuscript provides a useful tool that other researchers can use to strengthen their own research, that is reason enough to publish the piece. In other sciences where measurement is difficult and probabilistic, e.g., branches of the biological sciences, manuscripts that take one small step toward isolating a particular phenomenon are granted journal space. These papers provide sufficient detail so that other researchers can begin to use the same technique in pursuit of their own ends. Einstein is said to have attributed his breakthroughs to his ability to "stand on the shoulders of giants." One can peruse the current contents of a number of top social science journals and find precious few shoulders to stand on.

These points seem to argue for more attention to synthesis articles as well. While some researchers make useful incremental advances with empirical studies, others must take a step back, pull together the work of others, and weave their common threads together. While meta-analyses do this in a very quantitative way, I believe we still have plenty of room for qualitative reviews, especially of advances in the measurement literature.

Recommendation 4: A Database of Organizational Measures

Let us develop a well-documented "nomological network" of measures that all organizational researchers can access. We need a "Handbook of Measures for Industrial-Organizational Psychology" or an update of Cook, Hepworth, Wall, and Warr (1981), Price (1972), or Robinson, Athanasiou, and Head (1969). Researchers could cooperate to develop a meta-meta-analysis with estimates of population correlations among a network of critical variables as well as typical scale and item-level statistics. This network could be published as a resource on the World Wide Web and updated with new data as they are produced.

Such a database would serve numerous purposes. Since it would contain item content, it would facilitate content comparisons between measures of closely related constructs. Researchers could then begin the process of cleaning up scales that have problems with factor structure, internal consistency, or discriminant validity because of unintentional and non-representative overlap with measures of other constructs. With the latest and best versions of measures available, applied researchers would make much better-informed decisions when assembling their organizational surveys. Authors would have strong ammunition when questioned by editors and reviewers about their choice of research instruments.

This database of organizational measures could create a new paradigm for publishing. Applied and academic researchers would provide a brief description of the nature of the scale, the circumstances of its administration, and primarily descriptive results. The "editor" and "reviewers" for the database project could link these data into the appropriate locations in the database and release a new "issue" of the database. The researchers would receive authorship credit for their contribution, but the real benefit would be to the field of organizational research. If only a fraction of the many validity and norming data sets that presently lie fallow in the archives of consulting firms and universities were published in this way, many researchers and organizations might benefit.
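To make the proposal concrete, one entry in such a database might carry fields like the following. Every field name and value here is a hypothetical illustration, not an existing standard and not actual JDI statistics.

```python
# Hypothetical sketch of a single entry in a shared measure database.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class MeasureEntry:
    construct: str                  # e.g., "job satisfaction - work facet"
    scale_name: str                 # name and version of the instrument
    items: List[str]                # full item content, to permit content comparison
    response_format: str            # e.g., "Yes / No / ?"
    reliability: Dict[str, float]   # e.g., {"alpha": 0.88}
    sample_description: str         # who responded, where, and when
    correlations: Dict[str, float]  # observed r with other cataloged measures

entry = MeasureEntry(
    construct="job satisfaction - work facet",
    scale_name="Illustrative work-satisfaction scale, version 1",
    items=["Interesting", "Routine", "Gives a sense of accomplishment"],
    response_format="Yes / No / ?",
    reliability={"alpha": 0.88},                     # invented number
    sample_description="n = 250 manufacturing employees (illustrative)",
    correlations={"global job satisfaction": 0.55},  # invented number
)
print(entry.scale_name, entry.reliability["alpha"])
```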


Recommendation 5: Open the Warehouse

Taking this recommendation to the next level, large organizations, consulting firms, and academic departments also house archival data pertaining to projects other than instrument validation and norming studies. While a sizable proportion of these data are undoubtedly proprietary, many of these data sets are old enough or anonymous enough that their publication would pose no threat to the organizations that originated them. I suspect that lurking within these data sets are numerous gems: modest insights into research questions we have all thought about but not had time to collect data on, analyze, and consider. I believe that, if a few of these groups would open their data warehouses to researchers, many new and interesting data mining efforts would ensue. Publishing archival data is a practice that non-profit organizations and government agencies have engaged in for years. The practice seems to have facilitated much useful research in sociology and other branches of the social sciences, and I can't think of any good reason why the organizational sciences couldn't benefit as well. The World Wide Web provides the perfect medium for publicizing such archives to interested researchers.

To set a good example, I have taken the first few steps along this road with archival JDI data. With the help of my coauthor and research team colleagues, I have made some of our JDI research data available for research purposes. We have created a web page (http://www.bgsu.edu/departments/psych/JDI/) to disseminate information about the archive and to describe a streamlined application process for obtaining access to the data. We hope to promote further development of the instrument and to encourage researchers to address some useful questions about job satisfaction that we have not had time to pursue.

CONCLUSIONS

A brief review of five decades of research on job attitudes provided the basis for extracting some useful lessons on the development of organizational measures and the importance of programmatic research integrated with both organizational needs and scientific requirements. Measurement is the basis of every scientific field, and, at present, many measurement problems in the field of applied psychology remain unresolved. When researchers generate new self-report scales for projects instead of finding and using established scales, they continue to compound these problems. The present paper argues for a more organized and structured approach to measurement in our field, especially in regard to the measurement of "soft," non-performance constructs such as job attitudes. All of us in the organizational research community – students, researchers, authors, reviewers, and editors – must work to foster a research culture that values high-quality measurement. This point pertains to the many worthwhile past and present efforts to develop new psychometric tools and techniques, but it also extends to important common-sense practices such as using existing scales, reporting complete scale statistics, supporting publication of measurement papers, and opening up existing data archives. Applied psychology can advance only to the extent that we define and measure our constructs with precision.


REFERENCES

Bachiochi, P. D., Stanton, J. M., Robie, C., Perez, L., & Smith, P. C. (1998, April). Revising the JDI Work subscale: Insights into stress and control. Poster presentation at the annual meeting of the Society for Industrial and Organizational Psychology, Dallas, TX.

Bernardin, H. J., & Smith, P. C. (1981). A clarification of some issues regarding the development and use of behaviorally anchored rating scales. Journal of Applied Psychology, 66, 458–463.
Bernardin, H. J., LaShells, M. B., Smith, P. C., & Alvares, K. M. (1976). Behavioral expectation scales: Effects of developmental procedures and formats. Journal of Applied Psychology, 61, 75–79.
Cook, J. D., Hepworth, S. J., Wall, T. D., & Warr, P. B. (1981). The experience of work: A compendium and review of 249 measures and their use. London: Academic Press.
Cranny, C. J., Smith, P. C., & Stone, E. F. (Eds.). (1992). Job satisfaction. New York: Lexington.
Delaney, P. F., Reder, L. M., Staszewski, J. J., & Ritter, F. E. (1998). The strategy-specific nature of improvement: The power law applies by strategy within task. Psychological Science, 9, 1–7.
Flanagan, J. C. (1949). A new approach for evaluating personnel. Personnel, 26, 33–42.
Fleishman, E. A., & Fruchter, B. (1960). Factor structure and predictability of successive stages of learning Morse code. Journal of Applied Psychology, 44, 97–101.
French, J. W. (1965). The relationship of problem-solving styles to the factor composition of tests. Educational and Psychological Measurement, 25, 9–28.
Ghiselli, E. E., & Haire, M. (1960). The validation of selection tests in the light of the dynamic characteristics of criteria. Personnel Psychology, 13, 225–231.
Guilford, J. P. (1954). Psychometric methods. New York: McGraw-Hill.
Hulin, C. L., & Smith, P. C. (1965). A linear model of job satisfaction. Journal of Applied Psychology, 49, 209–216.
Ironson, G., Smith, P. C., Brannick, M. T., Gibson, W. M., & Paul, K. B. (1989). Construction of a "Job-In-General" scale: A comparison of global, composite, and specific measures. Journal of Applied Psychology, 74, 193–200.
Johnson, S. M., Smith, P. C., & Tucker, S. M. (1982). Response format of the Job Descriptive Index: Assessment of reliability and validity by the multi-trait, multi-method matrix. Journal of Applied Psychology, 67, 500–505.
Kelling, G. L., & Coles, C. M. (1996). Fixing broken windows: Restoring order and reducing crime in our communities. New York: Martin Kessler Books.
Kendall, L. M. (1961). The relative validity of the Job Descriptive Index and other methods of measurement of job satisfaction. Paper presented at the annual meeting of the Eastern Psychological Association.

Kendall, L. M., Smith, P. C., Hulin, C. L., & Locke, E. A. (1963). The relative validity of the Job Descriptive Index and other methods of measurement of job satisfaction. Ithaca, NY: Cornell University, Industrial and Labor Relations. (Cornell Studies of Job Satisfaction: IV)


Locke, E. A. (1996). Using programmatic research to build a grounded theory. In P. J. Frost & M. S. Taylor (Eds.), Rhythms of academic life: Personal accounts of careers in academia. Thousand Oaks, CA: Sage.
Locke, E. A., Smith, P. C., Kendall, L. M., Hulin, C. L., & Miller, A. M. (1964). Convergent and discriminant validity for areas and rating methods of job satisfaction. Journal of Applied Psychology, 48, 313–319.

Maas, J. B. (1966). Satisfaction with work as indexed by income level. Ithaca, NY: Cornell University (Unpublished doctoral dissertation).

Newell, A., & Rosenbloom, P. S. (1981). Mechanisms of skill acquisition and the power law of practice. In J. R. Anderson (Ed.), Cognitive skills and their acquisition (pp. 1–55). Hillsdale, NJ: Erlbaum.
Nunnally, J. C. (1967). Psychometric theory. New York: McGraw-Hill.
Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. (1957). The measurement of meaning. Urbana: University of Illinois Press.
Price, J. L. (1972). Handbook of organizational measurement. Lexington, MA: Heath.
Robinson, J. R., Athanasiou, R. T., & Head, K. R. (1969). Measures of occupational characteristics. Ann Arbor: Institute for Social Research.
Russell, J. A., & Mehrabian, A. (1977). Evidence for a three-factor theory of emotions. Journal of Research in Personality, 11, 273–294.
Schlosberg, H. (1952). The description of facial expressions in terms of two dimensions. Journal of Experimental Psychology, 44, 229–237.
Schwyhart, W., & Smith, P. C. (1971). Factors in the job involvement of middle managers. Journal of Applied Psychology, 56, 227–233.
Smith, P. C. (1942). The prediction of individual differences in susceptibility to industrial monotony. Unpublished doctoral dissertation, Cornell University, Ithaca, NY.

Smith, P. C. (1953). The curve of output as a criterion of boredom. Journal of Applied Psychology, 37, 69–74.
Smith, P. C. (1955). The prediction of individual differences in susceptibility to industrial monotony. Journal of Applied Psychology, 39, 322–329.
Smith, P. C. (1994, July). Satisfaction and stress: Correlation and clarification. Invited address presented to the annual meeting of the American Psychological Society, Washington, DC.

Smith, P. C., & Kendall, L. M. (1963). Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 47, 149–155.
Smith, P. C., Kendall, L. M., & Hulin, C. L. (1969). The measurement of satisfaction in work and retirement. Chicago, IL: Rand McNally.
Smith, P. C., Smith, O. W., & Rollo, J. (1974). Factor structure for Blacks and Whites of the Job Descriptive Index and its discriminations of job satisfactions among three groups of subjects. Journal of Applied Psychology, 59, 99–100.
Smith, P. C., Budzeika, K. A., Edwards, N. A., Johnson, S. M., & Bearse, L. N. (1986). Guidelines for clean data: Detection of common mistakes. Journal of Applied Psychology, 71, 457–460.
Smith, P. C., Balzer, W., Brannick, M., Chia, W., Eggleston, S., Gibson, W., Johnson, B., Josephson, H., Paul, K., Reilly, C., & Whalen, M. (1987). The revised JDI: A facelift for an old friend. The Industrial-Organizational Psychologist, 24, 31–33.
Taylor, J. G., & Smith, P. C. (1956). An investigation of the shape of learning curves for industrial motor tasks. Journal of Applied Psychology, 40, 142–149.


Weiss, H. M., & Cropanzano, R. (1996). Affective events theory: A theoretical discussion of the structure, causes and consequences of affective experiences at work. In B. M. Staw & L. L. Cummings (Eds.), Research in Organizational Behavior, 18, 1–74.
Wollack, S. W., Goodale, J., Wijting, J., & Smith, P. C. (1971). Development of the Survey of Work Values. Journal of Applied Psychology, 55, 331–338.
Wundt, W. (1896). Grundriss der Psychologie [Outline of psychology]. Leipzig: Engelmann. (Reprinted in E. G. Boring (1929), History of experimental psychology. New York: Century)