SOME VIRTUES OF DISSATISFACTION IN THE SCIENCE AND PRACTICE OF PERSONNEL SELECTION

Robert M. Guion
Bowling Green State University, Bowling Green, OH, USA

Dissatisfaction with traditional ways of thinking and doing things can be either a mere reaction against tradition or a source of improvements in the scientific foundations and practices of management. In the specific field of employee selection, seven areas of dissatisfaction are enumerated and discussed. In most of them, those who have published their dissatisfactions have also pointed to ways to do things better. The conclusion is that dissatisfaction can point to ways to expand the options in the way employee selection programs are conceived and operated, without total abandonment of the valuable results that have accrued from traditional practice.

Habits, including habits of thought, have a way of settling in, of calcifying, of becoming "the right way" to do things or the right way to think. Into such inertia comes a sense, now and then, that something is not quite right. Sometimes it is only the odd person here and there who has that vague sense of dissatisfaction. Sometimes it is part of the zeitgeist, and several people, in different venues, share a common, if vague, sense of dissatisfaction. At first, it may be articulated poorly, if at all. In time, someone identifies the nature or cause of the dissatisfaction, and maybe it begins to be addressed. Those who see that something is not quite right, or even dead wrong, and make it clear to others may not themselves solve the problem, but they may nevertheless make an important contribution merely by calling attention to it. I have not made a serious and thorough study of the matter, but it seems to me that in almost any field of endeavor, this is how science develops and practice improves.

For several years, I had a vague sense of dissatisfaction with the way things were going in my own field of special interest – employee selection through personnel testing. I could articulate some problems, even if poorly, in some particulars. I knew that the field was too much rooted in the assumption that what happened in the past was a great guide to what will happen in the future, and that the assumption was generalized to a notion that the way we have always done it should be pretty much the way we should keep on doing it. I knew that the past was rapidly receding, that the now did not look enough like the then, and I suspected that the someday, even in the fairly short term, would look still more different. Based on this vague dissatisfaction, I started a new book on the use of measurements and quasi-measurements for making decisions about people and the quality of the work they do (Guion, 1998a).

I have certainly not been alone in being professionally and scientifically dissatisfied. One might even say that we have entered a new age, more than just a season, of discontent. I do not refer to people who can fairly be called chronic malcontents; I refer to a much broader characteristic of the contemporary scene. Dissatisfaction with the way things are and have been is on the rise, not only in employee selection (now called staffing) but in most areas concerned with the psychology of work and of human resources management. Some statements of dissatisfaction amount to little more than rhetorical tirades wrapped around words intended to be pejorative. Some of them use words like traditional, static, mechanistic, or individualistic, not to be informative, but to be derisive and to associate the author with inventiveness or novelty, change, fluidity, or a social rather than an individualistic orientation. But some statements of dissatisfaction are stimulating and deserve to be taken seriously; getting rid of the dissatisfying practice or attitude may change or even improve the way things are done and the ideas that drive them. However, new ideas, including those intended as replacements for old ones that fail to satisfy, should not be adopted recklessly or without analysis by thought or data; in the words of a very old cliché, one must be careful not to throw the baby out with the bath water.

I choose, in this article, to look both sympathetically and critically at some recently expressed dissatisfactions – and at some of my own that may not have been expressed in writing. The headings below express a half-dozen (plus one) criticisms of an unsatisfactory tradition; where I can, I try to see where these complaints might profitably take us.

SOME DISSATISFACTIONS

The World Has Changed but the Selection Model Has Not

The traditional model for selection research and practice calls for identifying a job for which improved selection methods are needed, specifying a criterion worth predicting, choosing a way to assess candidates that may predict it, correlating scores on the predictor with criterion measures, and (if the correlation is acceptable) selecting among candidates by giving preference to high scorers over low scorers on the predictor. Stewart and Carson (1997) are among those who are dissatisfied with this model; they declare it (rightly, I think) too mechanistic. It is a set of procedural rules for one job (or, at most, one job family) at a time, and too often it is not feasible in statistical or organizational terms.

The traditional model has been modified over time, but it dates from as early as World War I and was rather well set forth in the 1920s (cf. Freyd, 1923). Many things have changed in the last three-quarters of a century. The traditional model is static; it assumes that conditions when the validation study is finished will be pretty much the same as when it was started, but change is a fact of work life, and the rate of change has become startling. People used to stay on the jobs they were hired for. Now, people may apply for a specific job and get it, but they may not stay on it terribly long. They may get transferred to a different job, perhaps requiring a different set of skills. They may be promoted to a higher-level and presumably more demanding job. They might quit to join a different organization. The job itself might change, following the rate of technological and social change impinging on the organization or the rate of change within the organization's structure and culture. As just one example, consider the people who were hired for an assembly-line-like job in an organization that changed its basic concept to one of self-contained teams.

Several implications stem from the fact of change as a constant. Even in the traditional model, the ability to effect change or to adapt to changes that happen should often be the criterion. Change may also require different predictors and predictive hypotheses; perhaps we should choose candidates who are bright and versatile, who have more varied ways to further emerging organizational or team goals. Indeed, if we seek diversity, perhaps difference in habits of thinking is the very kind of diversity we should seek. But for at least some organizations, such as those adopting the structural imperative of autonomous teams, where change may be particularly rapid, traditional models may not work at all. For them, Stewart and Carson (1997) foresee several further implications, such as organizational competitive strategies based on strengths of employees (as opposed to finding candidates for employment who have the strengths needed for predetermined strategies), or a redefinition of person–organization fit (to which I return later).

Dissatisfaction with the old paradigm, because it does not fit the newness of some organizations, is useful. It can result in new paradigms or new hypotheses; Stewart and Carson (1997) suggested five propositions about alternatives to tradition in nontraditional organizations. They were firm, however, in saying that in traditional hierarchical organizations (most organizations, I dare say), the traditional selection model works well. The message seems clear and general: expand the options for selection procedures and research, but do not throw out old ways just because they have been around a long time.

We Use the Wrong Methods to Assess the Wrong Things

"Traditional testing" is usually thought (erroneously) to be multiple-choice testing, and many critics are dissatisfied with multiple-choice testing. "Life is not multiple choice," said Ryan and Greguras (1998) in their title, although they did not find realistic performance tests a demonstrably better approach. Most tests in the early part of the century consisted of open-ended questions to which the examinee produced an answer, e.g., oral trade tests (Osborne, 1940). It required No. 2 pencils and the invention of the test-scoring machine to turn multiple-choice items into the kind of testing that has come to be called "traditional." Recent years have brought many suggestions for improved measurement methods, including "constructed responses" (Bennett & Ward, 1993), but there is little evidence that they are consistently better at the assessment task (see Guion, 1998a, pp. 502–512).

Expressions of dissatisfaction are not particularly new. In his Presidential Address to the organization now known as the Society for Industrial and Organizational Psychology, Lawshe criticized "our tendency in the past to work on problems that fit our methods rather than to devise methods that are appropriate to the major problems of American management" (Lawshe, 1959, p. 290). The proper sequence, he argued, calls first for conceptualizing the problem to be considered, then identifying conceptually the variables operating in it, and then, and only then, finding ways to quantify those variables. We still fail to define the problem before devising methods to attack it; rather, we use the familiar methods we know and have available. Lawshe's approach makes much more sense than using an available measure of something that seems roughly like the variable we want, even if it is manifestly different; nevertheless, that Procrustean approach is still common enough (if not commonly described so blatantly) that it illustrates the timelessness of Lawshe's concern.

Recent expressions have focused more on dissatisfaction with the nature of the variables traditionally measured – the constructs or attributes, or traits, or characteristics (whatever term one prefers) being assessed. Traditional selection procedures (at least those discussed in professional journals) are most likely to be tests of cognitive abilities – mainly tests of components or kinds of intelligence, of general intelligence, and of their less factor-analytic look-alikes. In psychometrics, contemporary dissatisfaction seems to be expressed most often by those in two disciplines, cognitive psychology and educational measurement. The preface to the first volume in a series on individual differences in cognition says, "We feel the work presented in this volume addresses the clear need to inform psychometrics and the theory of individual differences with the perspectives of cognitive psychology" (Dillon & Schmeck, 1983, pp. xv–xvi, italics added). In summarizing a special issue on intelligence and life-long learning, Sternberg (1997) reiterated his well-known dissatisfaction with traditional measurement of intelligence by distinguishing static from dynamic tests and by outlining three kinds of theory-based tests (psychometric theories such as the Cattell–Horn theory of fluid and crystallized intelligence, biologically based theories, and systems theories including Sternberg's own triarchic theory of intelligence). He also reviewed new "kinds" or aspects (depending on one's overall definition) of intelligence and the methods of assessing them.

Educational measurement has promoted "authentic performance assessment" as both an improved method and an improved construct, referring to assessment of the quality of work being performed in real time under real (or realistic) conditions. The expressions of dissatisfaction with traditional educational assessment seem somewhat muted of late, but that may be due to a reduction of rhetorical excess now that the concept has been heard and frequently considered. Authentic performance assessment is another idea that is not really new, having been called work sample or job replica testing in earlier times (for some examples, see Guion, 1965, pp. 405–409).

One outcome has been a special interest in constructs known as competencies. Mansfield (1996) is among those whose dissatisfaction with traditional staffing tools has led to an emphasis on competency testing, but he went a step further and wrote about "building competency models" for the simultaneous description of requirements for several jobs. The word competency is too often bandied about without careful definition and delineation. For most writers advocating competency approaches, the word connotes something so broad as to be merely a different word for trait or generalized ability, or even factor. But when the word is focused more narrowly on the behavior required in a specified work context, it expands the pool of characteristics that can be assessed for selection, training, or performance evaluation. It can offer new dimensions to cognitive skills, and it can go beyond cognition to such categories as social competencies or motivational competencies or physical competencies. Many authorities are demanding more use of noncognitive predictors such as personality or motor skills, but it may be especially important to use the competency concept to make assessment in these areas more specifically work-related for a more explicit organizational context.

Mansfield's approach emphasizes competencies over generalized traits in the context of an integrated HR approach, one that links HR programs (e.g., selection and performance appraisal) and extends the resulting program across job categories. I am certainly sympathetic with this view; I also called for and illustrated such an integrated approach, although my approach did not emphasize the notion of competencies above other ways of characterizing candidates (Guion, 1998a, pp. 422–430). Both Mansfield and I call for a common taxonomy or vocabulary of qualifying characteristics and a common vocabulary for defining levels of such characteristics across diverse jobs. Although I do not know of any organization that has done all of the work for integrating HR programs that I envisioned, or even the more modest and realistic version envisioned by Mansfield, dissatisfaction with traditional constructs and limitations has produced potentially better ideas for identifying and assessing intrinsically important characteristics on an integrated, organization-wide basis.

Among other noncognitive traits, personality variables are receiving renewed attention after a long period of avoidance. Also, and not just incidentally, there is increased attention being given to what Borman and Motowidlo (1993) called "contextual criteria." These are the kinds of criteria that generalize across jobs, such as attendance or on-time records or general organizational citizenship. Hough (1998) reported that personality measures provided useful incremental validity over cognitive ability tests for contextual criteria but not for technical proficiency criteria. In a report of independent meta-analyses of the Big Five personality constructs in different occupational groups and for different kinds of criteria, Barrick and Mount (1991) reported useful mean validity coefficients in several of them, with uniformly valid predictions using measures of conscientiousness. Wagner (1997) expressed dissatisfaction with the ordinarily lauded meta-analytic finding of mean validity coefficients of about 0.50 for traditional cognitive ability tests, arguing that (a) this is an overestimate when multiple sources of information are used (generally considered preferable to single scores); and (b) it is an overestimate of the theoretical role of intelligence in a causal network. He noted with some satisfaction the movement in the field to broaden both the criterion and predictor constructs considered. So do I, and I point out further that this is another example of expanding traditional procedures rather than dismissing them out of hand.
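
The idea of incremental validity can be made concrete with a small simulation. The sketch below (in Python; the data are simulated, and the predictor weights are assumptions of my own for illustration, not Hough's results) shows the gain in squared multiple correlation when a personality measure is added to a cognitive test as a predictor of a contextual criterion.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 5000

    # Hypothetical predictors: a cognitive test and a conscientiousness
    # scale, assumed uncorrelated for simplicity.
    cognitive = rng.normal(size=n)
    conscientiousness = rng.normal(size=n)

    # Assumed true weights for a contextual criterion; both predictors
    # contribute.
    criterion = 0.4 * cognitive + 0.3 * conscientiousness + rng.normal(size=n)

    def r_squared(predictors, y):
        # R-squared from an ordinary least-squares fit with an intercept.
        X = np.column_stack([np.ones(len(y))] + list(predictors))
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        residuals = y - X @ beta
        return 1.0 - residuals.var() / y.var()

    r2_cognitive = r_squared([cognitive], criterion)
    r2_both = r_squared([cognitive, conscientiousness], criterion)
    print(f"cognitive alone: R^2 = {r2_cognitive:.3f}")
    print(f"adding personality: delta R^2 = {r2_both - r2_cognitive:.3f}")

The difference between the two R-squared values is the incremental validity; in Hough's report, that difference was worth having for contextual criteria but not for technical proficiency criteria.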

Nevertheless, many in the field of employment practice remain dissatisfied with the limited variety of characteristics traditionally considered in making employment decisions, with the methods available for assessing them, and with the tradition of defining requirements and setting up assessment procedures for one job at a time. It seems almost inarguable that a broader catalog of potential employment tests and other assessment procedures is needed, and it is clear that much recent work has been intended to reduce this particular source of dissatisfaction.

We Concentrate Too Much on Averages and Not Enough on Variability

When we look only at the mean of a distribution, we lose (in effect) much of the data in it, and we miss the opportunity for important information. This is another area of dissatisfaction with many ramifications. As just one example, consider research on test fairness. Wagner (1997) recently expressed dissatisfaction with the demise of concern about fairness in test use. In the late 1960s and 1970s, a rash of "fairness models" appeared (see the special issue of the Journal of Educational Measurement edited by Jaeger, 1976, for critical reviews). The so-called Cleary (1968) definition of fairness, the one most widely accepted, declares the use of a test to be fair to minority groups if the regression lines for predicting performance are the same in the majority and minority groups; i.e., for any given predictor score, the criterion score is, on the average, the same in both groups. As Wagner (and Thorndike, 1971, before him) pointed out, however, even in this admirable case, the criterion score difference is often about half that of the predictor score difference.
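
A small simulation can make both points at once. The sketch below (Python; the correlation, group gap, and sample sizes are illustrative assumptions, not estimates from any study) constructs data that satisfy Cleary's definition exactly, yet still shows the criterion gap running at about half the predictor gap when the validity is 0.50.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    r = 0.50   # assumed predictor-criterion correlation
    d = 1.0    # assumed predictor mean difference, in SD units

    # Standardized predictor scores; the minority mean is shifted down by d.
    x_majority = rng.normal(0.0, 1.0, n)
    x_minority = rng.normal(-d, 1.0, n)

    # The same criterion-on-predictor regression in both groups: the
    # Cleary-fair case.
    y_majority = r * x_majority + rng.normal(0, np.sqrt(1 - r**2), n)
    y_minority = r * x_minority + rng.normal(0, np.sqrt(1 - r**2), n)

    for label, x, y in (("majority", x_majority, y_majority),
                        ("minority", x_minority, y_minority)):
        slope, intercept = np.polyfit(x, y, deg=1)
        print(f"{label}: slope = {slope:.2f}, intercept = {intercept:.2f}")

    # Fair by Cleary's definition (equal lines), yet the criterion gap is
    # about r * d = 0.5 SD -- roughly half the 1.0 SD predictor gap, the
    # point made by Thorndike (1971) and Wagner (1997).
    print(f"predictor gap: {x_majority.mean() - x_minority.mean():.2f}")
    print(f"criterion gap: {y_majority.mean() - y_minority.mean():.2f}")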

Perhaps greater attention to explaining variances would yield fruit. It would be more satisfying if we could adequately explain departures from the mean. An explanation might be found in an influential third variable. Such a variable might reside in the minority groups with lower mean scores; an example might be vulnerability to general stereotypes about their abilities (Steele & Aronson, 1995). Or maybe the third variable would be a group-related difference in test-taking strategies (Guion, 1998b). In any case, research needs to center on explanations for persistent deviations from averages.

Closely related (emotionally, if not statistically) is the concept of adverse impact, another topic where attention focuses so often on means that the concepts of adverse impact and mean differences get confused. Adverse impact is a legal concept, not a psychometric one. It is typically based on selection ratios, the proportions of candidates who are accepted. Federal equal employment opportunity regulations (e.g., Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Department of Justice, 1978) define adverse impact as a selection ratio in one group (usually but not necessarily the minority or legally protected group) that is less than 80% of the selection ratio in another. Mean differences between groups are often taken as "measures" of adverse impact, which they are not. The two can tell a similar story if the overall selection ratio is somewhere close to the overall mean, but in the case of relatively small selection ratios (say, less than 0.25 overall) coupled with differences in subgroup variances, substantial differences in subgroup means may not be reflected in selection ratio differences – they might even be reversed in direction.
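
Because the four-fifths rule is arithmetic on selection ratios, it is easy to state exactly. The sketch below (Python; the applicant and hire counts are invented for illustration) computes the impact ratio as the Uniform Guidelines define it.

    def selection_ratio(hired, applicants):
        # Proportion of applicants who are accepted.
        return hired / applicants

    def impact_ratio(hired_a, applicants_a, hired_b, applicants_b):
        # The lower group's selection ratio divided by the higher group's.
        sr_a = selection_ratio(hired_a, applicants_a)
        sr_b = selection_ratio(hired_b, applicants_b)
        return min(sr_a, sr_b) / max(sr_a, sr_b)

    # Hypothetical counts: 60 of 200 majority applicants hired;
    # 15 of 100 minority applicants hired.
    ratio = impact_ratio(60, 200, 15, 100)
    print(f"impact ratio = {ratio:.2f}")          # 0.15 / 0.30 = 0.50
    if ratio < 0.80:
        print("adverse impact under the four-fifths rule")

Note that nothing in the calculation involves mean scores; that is precisely why mean differences are not measures of adverse impact.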

We should be much more dissatisfied with the concept of adverse impact – and, for that matter, with talk about mean group differences – than we are. We should be concerned with group selection ratios in relation to true scores. In an article that has gone remarkably unnoticed, Ironson, Guion, and Ostrander (1982) used item response theory to show that tests with a very steep slope in the test characteristic curve tend to exaggerate true group differences and that those with shallow slopes tend to minimize them. Since slopes may be related to validity, the best predictors may erroneously show substantial mean differences, and the search for smaller mean differences may lead to using the least valid predictors.

A further implication of this same dissatisfaction is that individuality needs to be considered more systematically than it has been in the employee selection literature. The topic has had little consideration in psychology as a whole and almost none in the psychology of employment. It is too often assumed that uniqueness can be pushed into the category of prediction error. We need to find ways to identify why subgroup differences exist rather than simply brushing them aside as ubiquitous error or assuming that they represent some basic social flaw. We need some way to assure that we consider individual candidates and their own unique patterns of potential contribution – the hallmark of individuality (Tyler, 1978). In part, the solution here will be found in new methods of data collection and analysis; we need ways to consider the gestalt idea of patterning of traits or of responses to test components. But we also need to expand our views of what should be considered in forming such patterns.

Perhaps banding provides an illustrative case. Banding groups together scores that do not differ appreciably and permits – encourages – selection decisions based on factors other than trivial differences in scores. The concept of banding has had two strikes against it. One is that it has been discussed almost exclusively within the context of demographic factors defining equal employment opportunity; it has been treated as a form of affirmative action, a highly emotionally charged concept. The other is that its introduction was based on statistical concepts of null hypothesis testing (e.g., Cascio, Outtz, Zedeck, & Goldstein, 1991; Schmidt, 1991) rather than management concepts of trivial differences that do not matter.

My own discussion of banding (Guion, 1998a) starts with the idea of a "range of indifference" in test scores, most likely determined by management judgments about the importance of criterion differences associated with specified predictor differences. It then proposes that other information – information not routinely available, or information judged useful but not validated for one reason or another – be used to make choices within a band. In a band of high scores, other information (e.g., prior felony arrests or a job-jumping employment history) might disqualify some candidates; if there are enough openings, all other candidates in the band would be selected. A subsequent score band might include more people without disqualifying characteristics than there are available positions. Choices among them could be random, but I would hope for a more thoughtful approach, e.g., basing them on patterns of prior achievement, indications of special but unusual abilities or other traits, or perhaps on information from expensive, time-consuming individual assessments. Special organizational needs (e.g., for diversity of talent or demographic diversity) can also serve as information to be considered when making choices within a score range of indifference. So far as I know, no one has yet taken the idea of banding to this organizationally relevant position.
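
To show the mechanics, the sketch below (Python) computes a band using the standard-error-of-the-difference width associated with Cascio et al. (1991) and then applies disqualifying information within the band. The candidate names, scores, reliability, and disqualifying flags are all hypothetical, and a width set by management judgments about a range of indifference could replace the statistical one.

    import math

    def band_width(sd, reliability, z=1.96):
        # Width within which two observed scores do not differ reliably:
        # z times the standard error of the difference between two scores.
        sem = sd * math.sqrt(1.0 - reliability)
        return z * sem * math.sqrt(2.0)

    # Hypothetical candidates: (name, score, disqualifying information?).
    candidates = [("A", 92, False), ("B", 90, True),
                  ("C", 88, False), ("D", 79, False)]
    sd, reliability = 10.0, 0.85
    width = band_width(sd, reliability)        # about 10.7 points here

    top = max(score for _, score, _ in candidates)
    band = [(name, score) for name, score, disqualified in candidates
            if score >= top - width and not disqualified]
    print(f"band width = {width:.1f}")
    print(f"choose among {band} on other grounds")
    # A and C remain in the band; B is disqualified by other information,
    # and D falls below the range of indifference.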

Selection Procedures Are Not Evaluated in the Way They Are Used

When candidates present themselves for consideration, they are usually assessed in some way – by interview, assessment center exercises, tests, biodata, or whatever. On some basis – perhaps top-down, or specifying that a candidate's score is passing or failing, or by considering some other information – decisions about these people get made: some of them are chosen and others are not. Unless the decision is based solely on scores used according to a prior rule (called a mechanistic model by Stewart & Carson, 1997), the process leading to it is some sort of system, even if it is not very systematically followed. What I find dissatisfying is that serious efforts at evaluating selection procedures tend to be limited to the evaluation of assessment methods, without evaluation of the procedures leading to selection decisions.

The principal concept in the evaluation of selection tests and other measurements is validity, and many experts have been dissatisfied with our concepts of validity (for some samples, see Cronbach, 1971; Frederiksen, 1986; Messick, 1995; Guion, 1998a). We used to refer to validity as a property of a test; now we refer to validity as a property of the inferences or interpretations we draw from test scores. The psychometric trinity of criterion-related, content, and construct validity was treated for many years as three discrete, different kinds of validity, although official recommendations and standards insisted that they were all "aspects of validity" – that validity had unity despite these differing, distinguishable components or categories of evidence. Content validity may have made sense to organizational decision makers, but it is not likely that construct validity ever did or does.

Criterion-related validity has consistently assumed a linear model – a model that says that if some level of score is good, the next higher level is better, and that how much better is the same at any score level. When linearity is assumed in test validation, it is also assumed that selection will follow a top-down system, choosing first the person with the highest score, then the one with the next highest score, and so on. Except in the public sector, this is not how selection decisions are typically made. HR managers and industrial-organizational psychologists who validate tests seem to make the obviously wrong assumption that decision makers will almost uniformly select candidates with higher scores in preference to those with lower scores throughout the entire range of scores. In many real-life settings, cut scores are set, and people are considered for employment every day, with daily numbers to hire for specific jobs. As soon as enough people who "pass" the test (and who have no disqualifying characteristic in the opinions of the decision makers) are found, hiring for that job, for that day, is over; others coming later, even if more qualified, are not considered. I may have presented merely a caricature of the procedure with cut scores, but not without some foundation. In any case, a validity coefficient for the dichotomous scores based on very high or very low cut scores can be materially lower than the coefficient reported for the full range of nondichotomized scores.
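
The size of that loss is easy to demonstrate. The simulation below (Python; an assumed true validity of .50 under normality, with the sample size and cut points chosen only for illustration) compares the full-range validity coefficient with the point-biserial correlation obtained when the predictor is reduced to pass/fail, first at a median cut and then at a very high cut.

    import numpy as np

    rng = np.random.default_rng(1)
    n, r = 100_000, 0.50
    x = rng.normal(size=n)
    y = r * x + rng.normal(scale=np.sqrt(1 - r**2), size=n)

    print(f"full-range validity: {np.corrcoef(x, y)[0, 1]:.2f}")
    for cut in (0.0, 1.5):   # a median cut and a very high cut, in z units
        passed = (x >= cut).astype(float)   # pass/fail replaces the score
        r_pb = np.corrcoef(passed, y)[0, 1]
        print(f"cut at z = {cut}: point-biserial validity = {r_pb:.2f}")
    # Expect roughly .50, .40, and .26: the more extreme the cut, the
    # greater the loss of predictive information.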

In other settings, moreover, decision makers may let unvalidated information totally override attention to validated test scores that have been shown to predict level of excellence. It is not a very good way to do business, but it is a common way. The system of using assessment results is perhaps more in need of evaluation than the assessment methods themselves.

In a more general vein, Alvares (1997) argued persuasively that an HR function should operate as a business, considering other parts of the organization as customers who contract for services at a fee; dissatisfied customers can go to someone else for the service they need. Similarly, Ulrich (1997) gave examples of an expanded view of the "real" HR customers. Such an approach is not necessarily easy; it requires, among other things, a good bit of advertising and training to tell "customers" what can be done, what should be done, and how the products provided should be used. Even so, persistent harping on validity coefficients and utility analyses without clear, business-oriented evidence of profitability can lead to trouble. Failure to demonstrate the effectiveness of the personnel counseling programs established after the Hawthorne studies is at least one reason for their demise. It could happen to sensible selection procedures just as surely. One danger of the work-unit-manager-as-customer idea, without insistence on evidence of profitability, is that poor methods that happen to be liked by the customers will satisfy them.

Highhouse (1997) identified yet another instance in which procedures for evaluating assessments do not match the selection processes actually used. To fill jobs or positions where only one, or at most only a few, will be hired, the choice is usually made from a relatively large set of candidates. Typically, statistical prediction is not feasible, and a judgment process is devised in which a subset of candidates is declared "finalists" – candidates with perhaps different sets of apparent strengths and weaknesses but who seem generally comparable in level of attractiveness for the position. Someone, or some group, must make a choice among these finalists to determine who will be chosen for the position. Highhouse (1997) recommended that basic decision research be applied in evaluating the judgments in such situations. I concur, and I support the recommendation more generally. I think we have failed to realize that most employment decisions are, in fact, judgments made by key actors in an organization, and that these judgments are often the selection procedures that need to be evaluated, even beyond Highhouse's special case (Guion, 1998a). If that is so, then decision theory and research should be much more widely used to try to discover the patterns of information that influence these decisions (to "capture" decision makers' policies) and to discover the patterns that typically lead to good (or poor) decisions. Judgments and decisions play increasingly large roles in the employment process, but they have not been given a commensurate role in research. Maybe the mere mention of the problem will lead someone to move toward the study of judgment in employment decisions.

Traditional Procedures Do Not Choose People Who Fit the Organization

This is a special case of the previous, more general dissatisfaction, but it is important enough to consider independently. Traditional methods choose people for specified jobs, not for an organization as a whole or for subunits, work groups, or teams. It should be recognized that the Uniform Guidelines (Equal Employment Opportunity Commission et al., 1978) accept this limitation and insist on job-relatedness – specifically, relatedness to the particular job for which a candidate is being considered. Candidates can be considered for a higher-than-entry-level job only if nearly all newly hired people will be promoted to it within a rather short period of time. So far as I know, no court case has addressed the question of selecting on the basis of person–organization fit. From a professional perspective, the Uniform Guidelines are so badly out of date that I do not worry about this limitation, but those who choose candidates on the basis of their probable advancement should be aware of it. A first judgment is whether it is more important to consider a candidate for membership in the organization in general, for a specified job within it, or for membership in some organizational subgroup, work group, or team – or whether there should be some combination of these considerations.

The question of person–organization fit has its pitfalls. Not the least of these is its history of being used as a surrogate for prejudice and bigotry. Not long ago, before the 1964 Civil Rights Act, many employers and government officials systematically denied employment to women and to locally visible ethnic groups on the grounds that these people would not "fit in" the organization.

Nevertheless, it is a topic that is returning, not in terms of equal employment opportunity or its denial, but in terms of organizational theory. There is new recognition of the importance of certain skills, including social skills, and of ways of approaching work. Dissatisfaction arises from the almost exclusive concern in traditional approaches with prediction of individual job performance, with no corresponding concern for individual functioning within a work group, team, or organization.

There are some responses to this dissatisfaction. One possible response (one that does not satisfy me) is that promoting fit is a training and development problem, not a selection problem. (It fails to satisfy me only because it seems to beg the question of how one determines fit.) Schneider (1987) presented the attraction–selection–attrition paradigm as describing both a selection problem (one can select only among those attracted to the organization) and a retention problem (keeping the organization as attractive as the employee thought it was when applying). Viewing it mainly as a development issue, Cannon-Bowers, Tannenbaum, Salas, and Volpe (1995) offered 16 propositions to stimulate research on team training. Brannick, Salas, and Prince (1997) recognized dissatisfaction with the criterion side of team selection and edited a volume on the measurement of team performance, explicitly treating the team as the unit of analysis.

The Linear Assumption Is Unnecessarily Limited

We are much too married to linear modeling. It is my guess that truly linear regression does not occur in nature unless we have restricted range, specifically range restricted to the middle of a complete distribution. For a variety of reasons, I suspect that a better fit in very low-level or very high-level jobs would be an ogival or logistic curve – one that has very small slopes at the ends of the distribution and a relatively steep slope nearer the center. Raju, Steinhaus, Edwards, and DeLessio (1991) used a two-parameter logistic regression (they called it a job characteristic curve) and found that it fit actual data rather well.
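
The shape of such a curve is easy to exhibit. The sketch below (Python) evaluates a two-parameter logistic function of the kind Raju et al. (1991) fit; the slope and location parameters are illustrative assumptions, not values from their study.

    import numpy as np

    def job_characteristic_curve(score, a, b):
        # P(success | score); a is the slope, b the score where P = .50.
        return 1.0 / (1.0 + np.exp(-a * (score - b)))

    scores = np.array([20, 35, 50, 65, 80])
    p = job_characteristic_curve(scores, a=0.10, b=50.0)
    for s, prob in zip(scores, p):
        print(f"score {s:3d}: P(success) = {prob:.2f}")
    # Unlike a linear regression, equal score differences do not imply
    # equal differences in predicted success: moving from 20 to 35 raises
    # P far less than moving from 35 to 50.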

Such a regression curve is, like any other regression function, based essentially on a static theory of work and performance. Mendenhall, Macomber, Gregersen, and Cutright (1998) were dissatisfied not only with the traditional linear regression but with its static nature and its necessary inclusion of a stochastic error term. They advocated a multivariate approach calling for a time dimension and two or more interaction terms. The mathematical foundation is complex, but a key sentence in their article needs no mathematics. I paraphrase it like this to emphasize two different assumptions: If one were to assume that organizational processes were inherently linear, or if one were to assume that they were inherently nonlinear, "how would organizational scholars think about, and view, the phenomena they study?" (Mendenhall et al., 1998, p. 13). Such a question makes it clear that what we think we know stems rather directly from the assumptions we make and the mathematical models we use. I have been among those who have some dissatisfaction with an automatic assumption of linearity, but I do not think we need to abandon linear models universally. What is needed is an expansion of the options and perspectives of modeling by inviting the development and testing of nonlinear models as well.

We Look to See What Others Do Instead of Finding Out What We Need to Do

"Ask any 10 human resource managers how they select employees and you will find that most of them work from the same set of unchallenged, generally unspoken ideas" (Behling, 1998, p. 77). These ideas, according to Behling, call for selecting candidates whose credentials match the requirements of a job for which they are to be hired. In a sense, the "unchallenged idea" is a traditional model without its traditional quantification. How can it be so common? It is because of a child-like declaration that "everyone else is doing it that way." Some of us, along with Behling, are dissatisfied with the habitual look over the shoulder to see (and do) what others are doing, whether it fits the local circumstance or not.

Ulrich (1997), dissatisfied with the moribund nature of the present field of human resources, issued nine challenges to be met if the field is to play the role it should. I will not reiterate them all, but the set generally offers a clear idea (actually nine of them) of what can be done to ease the dissatisfaction. The first is to focus more clearly, not on what the field can do, but on what the field will deliver to make organizations more effective – in short, on deliverables, not on doables. He described one company's HR plan, which excited its executives, as filled with new initiatives – such as competence-based staffing – that seemed to me (and perhaps to him) to be a litany of the current "in" programs that everyone who is anyone is doing. He "ventured to ask why – Why were they doing these new initiatives? Why would the business benefit from accomplishing these initiatives? The deliverables were never articulated, so the doables dominated thinking. If other functions used this approach, marketing would allocate its time running focus groups, doing surveys, and segmenting customers rather than on gaining market share and selling products" (Ulrich, 1997, p. 5). I see this as an all-too-common and sheep-like tendency to follow the leader, the leader being anyone or any firm that is doing something interesting, whether it serves a specified local organizational purpose or not.

His second challenge is to develop and use explanatory theory – to find or specify reasons why something works (or does not). Such theorizing is a special case of if...then reasoning: if the local circumstance or problem at hand is identified, then some action is under way or advocated. A source of dissatisfaction is the tendency to concentrate on the then of the if–then formula rather than the if; insistence on explanatory theory places the focus on the if – the context in which an action may be needed, in theory at least. A recent illustration of relevant theory building is the consideration of three selection models by Behling (1998): (a) one that seeks very bright, intelligent people and leaves it to post-hiring training to teach them any specific knowledge or skill required on the jobs they do; (b) one that worries little, if at all, about general intelligence but hires conscientious people who can be depended on to do their best; and (c) one that is a more traditional matching of specific qualifications and specific job requirements. The first two of these seem to be nontraditional models that give relatively little attention to person–job fit and a great deal to person–organization fit, in the sense that the organization as a whole needs bright people or conscientious people (or both) who can be trained in the specifics of particular jobs. These models, according to Behling's theory, are most appropriate when new employees will have to do a lot of problem solving, will have a lot of autonomy, will learn skills and knowledge on the job that are more important than those brought to the job, will need to learn and adapt rapidly, and do not differ importantly in the skills and abilities the job requires. Behling has shown the way: we can drop the one-size-fits-all mentality and instead develop and test theories about the models that are most appropriate for specific sets of circumstances.

One of my major dissatisfactions is a tendency in human resource management, and, it seems, in the world at large, to eschew quantification. If something exists, such as a benefit from an intervention, it exists to some degree and is therefore measurable. Measurement is often hard to do, at least to do well, and the temptation is always present to avoid it with anecdotes or warm, fuzzy feelings that all is well. Worse yet is the common willingness to assume that, because the intervention was designed by an expert and installed at great cost, it must therefore be good. As I have said in other contexts, the Biblical Gideon did not need to evaluate the effectiveness of his tests because they were given by God, and too many people today seem to believe that their favorite HR applications are God-given and therefore need no further evaluation. Ulrich (1997) asked some specific questions that can have quantifiable answers, such as the effect of an activity on a firm's market value, or the economic effect of not doing it, but he never said the answers would come easily.

SOME CLOSING REFLECTIONS

Through most of the 20th century, the psychology and sociology basic to human resource management have been generally well developed and have worked reasonably well. In the limited field of test validation by traditional criterion-related methods, the newer tool of meta-analysis has demonstrated, even discounting some hyperbole, that tradition has provided much of value. Some critics would wholly discard the traditional approach to employee selection and call for its demise. That is an unwise position. To abandon traditional approaches to employee selection is to abandon the valuable results they have demonstrated.

On the other hand, there are many thoughtful critics, many of them practitioners and researchers within the specialty of employee selection research and practice, who are not satisfied with the status quo. The tradition is fine, they say, but we can and need to do better – that is, to provide even better approaches to selection in employing organizations. To do so, we must get rid of the myopia that focuses only on the tradition and not on the opportunities to improve on it. Nearly all of the dissatisfactions I have enumerated call not for the elimination of tradition but for the expansion and further development of theoretical and procedural options – of which the traditional model is one, but only one.

REFERENCES

Alvares, K. M. (1997). The business of human resources. Human Resource Management, 36, 9–15.

Barrick, M. R., & Mount, M. K. (1991). The big five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1–26.

Behling, O. (1998). Employee selection: Will intelligence and conscientiousness do the job? Academy of Management Executive, 12, 77–86.

Bennett, R. E., & Ward, W. C. (Eds.). (1993). Construction versus choice in cognitive measurement: Issues in constructed response, performance testing, and portfolio assessment. Hillsdale, NJ: Lawrence Erlbaum Associates.

Borman, W. C., & Motowidlo, S. J. (1993). Expanding the criterion domain to include elements of contextual performance. In N. Schmitt & W. C. Borman (Eds.), Personnel selection in organizations (pp. 71–98). San Francisco: Jossey-Bass.

Brannick, M. T., Salas, E., & Prince, C. (Eds.). (1997). Team performance assessment and measurement: Theory, methods, and applications. Mahwah, NJ: Lawrence Erlbaum Associates.

Cannon-Bowers, J. A., Tannenbaum, S. I., Salas, E., & Volpe, C. E. (1995). Defining competencies and establishing team training requirements. In R. A. Guzzo & E. Salas (Eds.), Team effectiveness and decision making in organizations (pp. 333–380). San Francisco: Jossey-Bass.

Cascio, W. F., Outtz, J., Zedeck, S., & Goldstein, I. L. (1991). Statistical implications of six methods of test score use in personnel selection. Human Performance, 4, 233–264.

Cleary, T. A. (1968). Test bias: Prediction of grades of Negro and White students in integrated colleges. Journal of Educational Measurement, 5, 115–124.

Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 443–507). Washington, DC: American Council on Education.

Dillon, R. F., & Schmeck, R. R. (Eds.). (1983). Individual differences in cognition (Vol. 1). New York: Academic Press.

Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Department of Justice (1978). Uniform guidelines on employee selection procedures. Federal Register, 43(166), 38295–38309.

Frederiksen, N. (1986). Construct validity and construct similarity: Methods for use in test development and test validation. Multivariate Behavioral Research, 21, 3–28.

Freyd, M. (1923). Measurement in vocational selection: An outline of research procedures. Journal of Personnel Research, 2, 215–249, 268–284, 377–385.

Guion, R. M. (1965). Personnel testing. New York: McGraw-Hill.

Guion, R. M. (1998a). Assessment, measurement, and prediction for personnel decisions. Mahwah, NJ: Lawrence Erlbaum Associates.

Guion, R. M. (1998b). Jumping the gun at the starting gate: When fads become trends and trends become traditions. In M. D. Hakel (Ed.), Beyond multiple choice: Evaluating alternatives to traditional testing for selection (pp. 7–15). Mahwah, NJ: Lawrence Erlbaum Associates.

Highhouse, S. (1997). Understanding and improving job-finalist choice: The relevance of behavioral decision research. Human Resource Management Review, 7, 449–470.

Hough, L. (1998). Personality at work: Issues and evidence. In M. D. Hakel (Ed.), Beyond multiple choice: Evaluating alternatives to traditional testing for selection (pp. 131–166). Mahwah, NJ: Lawrence Erlbaum Associates.

Ironson, G. H., Guion, R. M., & Ostrander, M. (1982). Adverse impact from a psychometric perspective. Journal of Applied Psychology, 67, 419–432.

Jaeger, R. M. (Ed.). (1976). On bias in selection [Special issue]. Journal of Educational Measurement, 13, 3–99.

Lawshe, C. H. (1959). Of management and measurement. American Psychologist, 14, 290–294.

Mansfield, R. S. (1996). Building competency models: Approaches for HR professionals. Human Resource Management, 35, 7–18.

Mendenhall, M. E., Macomber, J. H., Gregersen, H., & Cutright, M. (1998). Nonlinear dynamics: A new perspective on IHRM research and practice in the 21st century. Human Resource Management Review, 8, 5–22.

Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performance as scientific inquiry into score meaning. American Psychologist, 50, 741–749.

Osborne, H. F. (1940). Oral trade questions. In W. H. Stead & C. L. Shartle (Eds.), Occupational counseling techniques: Their development and application (pp. 30–48). New York: American Book.

Raju, N. S., Steinhaus, S. D., Edwards, J. E., & DeLessio, J. (1991). A logistic regression model for personnel selection. Applied Psychological Measurement, 15, 139–152.

Ryan, A. M., & Greguras, G. J. (1998). Life is not multiple choice: Reactions to the alternatives. In M. D. Hakel (Ed.), Beyond multiple choice: Evaluating alternatives to traditional testing for selection. Mahwah, NJ: Lawrence Erlbaum Associates.

Schmidt, F. L. (1991). Why all banding procedures in personnel selection are logically flawed. Human Performance, 4, 265–277.

Schneider, B. (1987). The people make the place. Personnel Psychology, 40, 437–453.

Steele, C. M., & Aronson, J. (1995). Stereotype threat and the intellectual performance of African Americans. Journal of Personality and Social Psychology, 69, 797–811.

Sternberg, R. J. (1997). Intelligence and life-long learning: What's new and how can we use it? American Psychologist, 52, 1134–1139.

Stewart, G. L., & Carson, K. P. (1997). Moving beyond the mechanistic model: An alternative approach to staffing for contemporary organizations. Human Resource Management Review, 7, 157–184.

Thorndike, R. L. (1971). Concepts of culture-fairness. Journal of Educational Measurement, 8, 63–70.

Tyler, L. E. (1978). Individuality: Human possibilities and personal choice in the psychological development of men and women. San Francisco: Jossey-Bass.

Ulrich, D. (1997). Judge me more by my future than by my past. Human Resource Management, 36, 5–8.

Wagner, R. K. (1997). Intelligence, training, and employment. American Psychologist, 52, 1059–1069.