Electronic decision support for procurement management: evidence on whether computers can make better procurement decisions


Journal of Purchasing & Supply Management 9 (2003) 191–198

Chris Snijders (a,*), Frits Tazelaar (a), Ronald Batenburg (b)

(a) Department of Sociology/ICS, Utrecht University, Heidelberglaan 1, 3584 CS Utrecht, The Netherlands
(b) Institute of Information and Computing Sciences, Utrecht University, The Netherlands

Received 1 October 2002; accepted 1 March 2003

Abstract

We analyse how well purchasing managers are able to judge the likelihood of problems for a given purchasing transaction. The literature on clinical versus statistical prediction suggests that humans in general, including purchasing managers, are often outperformed on such tasks by relatively simple statistical formulas. Based on a vignette experiment with real purchasing transactions, we compare the performance of purchasing managers with that of freshmen students and of a statistical formula derived from a cross-validated sample. The results show that the formula outperforms the humans, and that experienced purchasing managers do not outperform freshmen students. We conclude that it would make sense to use decision support systems in the daily practice of purchase management, so that humans can devote their time to what they are good at, while being guided by statistical software that takes care of multi-dimensional decisions in noisy environments.

© 2003 Elsevier Ltd. All rights reserved.

1. Introduction

Some purchasing transactions can be foreseen to run smoothly without large investments in time and effort. For other purchasing transactions, a substantial investment in time and effort is necessary. Most people would agree that at least one of the tasks of a purchasing manager is to be able to decide whether a transaction belongs to the first or the second category. Stated otherwise, one of the tasks of a purchasing manager is to decide which of a set of transactions needs purchase management more. For some transactions it makes sense to ask for a lot of tenders, invest a lot in the screening of suppliers, spend a lot of time negotiating, and put serious effort into writing a detailed contract. For other transactions such investments are not necessary or not efficient (Batenburg et al., 2000). There are, however, compelling arguments, on the basis of the literature on clinical versus statistical prediction, that suggest that purchasing managers, like all other humans, are typically not good at making precisely these kinds of judgments. We set out to test this assertion. First, we briefly review the literature on clinical versus statistical prediction.

*Corresponding author. Present address: Department of Technology Management/ECIS, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands. Tel.: +31 40 2472640; fax: +31 40 2464646. E-mail address: [email protected] (C. Snijders). doi:10.1016/j.pursup.2003.09.001

2. The state of affairs in clinical versus statistical prediction

Clinical prediction is the term for a situation where a decision maker receives data on several dimensions of a certain decision and subsequently makes a prediction. Clinical prediction is an integral part of our society. Personnel managers predict whether or not an applicant will turn out to be a valuable addition to the company, psychiatrists predict whether a convicted murderer will murder again, doctors assess the likelihood of a patient having a certain disease, and there are certainly many other examples where relevant dimensions are combined into a single prediction or judgment by an 'expert'. In our context, one could think of a purchasing manager who, for some procurement decision, assesses which of a set of transactions is more likely to lead to problems, given possibly relevant dimensions such as market risk, supplier risk, profit risk, the degree of detail in the contract, or whatever other data are available for that decision.


Statistical prediction (or mechanical prediction) is the term for a situation where, on the basis of data on several dimensions, some formula is used to make a certain prediction (Meehl, 1954). The same examples as mentioned above spring to mind, only now it is not the expert (personnel manager, psychiatrist, medical doctor, purchasing manager, etc.) who predicts; the decision is made by using a formula instead. Analogous to the example given above, one could think of a procurement manager who, for some procurement decision, assesses the relevant dimensions and subsequently uses a score-card or a computer program to predict the probability that problems will occur for that particular transaction. Perhaps surprisingly, a large number of studies in diverse areas have shown the superiority of statistical prediction over clinical prediction. The experts hardly ever predict better, and actually often predict worse, than the statistical formula. Topics investigated include the prediction of academic success, business bankruptcy, longevity, military training success, myocardial infarction, neuropsychological diagnosis, parole violation, and the likelihood of violence. In the meta-analysis by Grove et al. (2000), the superiority of statistical over clinical prediction was still standing: "There seem, then, to be no barriers to a general preference for mechanical (=statistical) prediction where an appropriate mechanical algorithm is available" (p. 26).

The areas in which the superiority of statistical prediction has been established have three important aspects in common (Grove and Meehl, 1996):

(1) They are typically areas where accumulated experience, intuition, and Fingerspitzengefühl are considered important. Recruiters of personnel claim to know after the first 5 minutes of a job interview whether the applicant is appropriate, managers intuitively feel which of a set of suppliers is best, loan officers are thought to develop over the years a keen sense of which firm is most likely to be able to repay a loan, and clinical physicians are thought to combine the data from blood tests and scans in ways that are superior to merely adding and subtracting measurement results.

(2) Decisions or predictions involve the incorporation of a relatively large number of dimensions (typically more than five).

(3) Decisions or predictions involve the combination of dimensions in a "noisy environment". It is not clear which dimensions should be included, dimensions are hardly ever measured exactly, and it may very well be that even the most optimal combination of the available measurements still leads to a decision or prediction that is only reasonable, not good or perfect.

These three factors are considered part of the explanation why the experts are not as superior as one might imagine. In a nutshell, research mainly in (social) psychology has revealed that for these kinds of decisions, combining a large amount of data and then making a single prediction or decision, (1) experience and intuition do not offer many useful guidelines, (2) most humans are typically bad at consistently combining data on a large number of dimensions, and (3) humans characteristically perform badly in a stochastic environment.

3. Implications for purchase management

Though about two-thirds of the findings on clinical versus statistical prediction concern medical or clinical issues, and only about 10 tests have been performed that are related to business predictions (Grove et al., 2000, Table 1), these findings may have substantial implications for the theory and daily practice of procurement management, or even management in general. Typically, management tasks fit the above-mentioned three criteria. It is an area where practitioners tend to think that experience, expertise, and "feeling" or "intuition" are important. Many managerial decisions indeed involve the combination of a large number of dimensions (profit risk, market risk, volume of the transaction, involved costs of different kinds, past dealings with the same person or firm, the reputation of the other person or firm, to name just a few). And decisions are taken in an environment that is undeniably noisy: just think about how, or even whether, the different dimensions are measured, or about how adequately something like the probability of problems can be predicted. In other words, the perhaps shocking conclusion is that at least some managerial decisions are likely candidates to be added to the list of topics where clinical prediction falls short of statistical prediction. To put it even more bluntly: it may very well be that a simple formula outperforms a manager in decisions where the manager strongly feels that he or she is an expert.

4. A vignette experiment: are computers better purchasing managers?

We will put this hypothesis to the test in an experiment with a group of 30 purchasing managers and 60 students.

Table 1
Spearman correlations between actual and predicted scores, averaged per group

Formula                0.37
Students               0.26
Purchasing managers    0.24


Both purchasing managers and undergraduates were given several "vignettes" describing a procurement transaction (the procurement of IT products; see Fig. 1), and were asked to predict:

* the likelihood of this transaction being a problematic one;
* which of the following kinds of problems (if any) were to be expected:
  * late delivery
  * over price/budget
  * incorrect specifications upon delivery
  * inadequate documentation;
* how certain they were about their judgment.

In fact, the vignettes were chosen from a larger database of real purchasing transactions, so that we actually know the correct answers to what the purchasing managers and students are predicting and can compare their answers with the real ones (for a more detailed description of this database, see the section on vignette construction and measurement below, and Batenburg (1995/7) or Buskens and Batenburg (2000)). We calculated a formula that generates predictions on precisely the first two issues mentioned above. For both issues, part of the data was set aside, and the formula used was simply the best piecewise linear predictor based on that part of the dataset (many 'clinical versus statistical' studies use this kind of cross-validation to generate the formula that is to compete with the human experts). This implies that we in fact use five formulas: one to predict how problematic a transaction was going to be, and four separate formulas, one for each potential problem. Each formula was simply the best piecewise linear predictor as found in the data set aside.
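The paper specifies the formula only as "the best piecewise linear predictor" estimated on data set aside. As a rough illustration of what such a cross-validated construction could look like, here is a minimal Python sketch; the variable layout, the single hinge knot per dimension, and the simulated stand-in data are our own assumptions, not taken from the paper.

```python
# Illustrative sketch only: the paper reports a piecewise linear predictor
# fitted on data set aside, without further estimation details.
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(0)

# Stand-in data: rows = transactions, columns = vignette dimensions
# (e.g. price, supplier size, contract detail); y = problem score 0-100.
X = rng.uniform(0, 1, size=(971, 5))
y = np.clip(60 * X[:, 0] - 30 * X[:, 1] + rng.normal(0, 15, 971), 0, 100)

def hinge_features(X, knots):
    """Expand each column x into (x, max(0, x - knot)) so the fitted model
    is linear on each side of the knot: a piecewise linear predictor."""
    parts = [np.ones((X.shape[0], 1))]
    for j, k in enumerate(knots):
        parts.append(X[:, j:j+1])
        parts.append(np.maximum(0.0, X[:, j:j+1] - k))
    return np.hstack(parts)

# Cross-validation in the sense used in the clinical-vs-statistical
# literature: estimate the formula on one part of the sample ...
train, test = np.arange(0, 700), np.arange(700, 971)
knots = np.median(X[train], axis=0)   # one knot per dimension (assumption)
beta, *_ = lstsq(hinge_features(X[train], knots), y[train], rcond=None)

# ... and let it compete with human judges on cases it has not seen.
formula_pred = hinge_features(X[test], knots) @ beta
print("mean |error| on held-out cases:",
      np.mean(np.abs(formula_pred - y[test])))
```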

The product
It concerns the purchase of <pc(s), cabling, tailor-made software>.
Price in Dutch guilders: <between 100,000 and 200,000>
Importance for the profit of the buyer firm: <high>

The supplier
Size of the supplier (number of employees): <400>
Reputation in the market: <reasonable>

The buyer
Number of employees: <35>
Number of years in business with same supplier: <2>
Purchasing arranged through: <purchasing department>
Legal issues arranged through: <external experts>
Can judge price/quality for the different possible suppliers: <hardly>
Know other clients of the same supplier: <yes>

Ex ante purchasing management
Number of tenders: <2>
Total investment in search, screening, selection, negotiating, and contracting: <4 man-days>
Degree of detail of the written contract: <high>

On a scale from 0 to 100: How problematic do you think this transaction will turn out to be? __________
where: 0 = completely unproblematic, 100 = highly problematic

Which of the following problems do you expect to have a high probability of occurrence for this transaction? (you may choose more than one)
[ ] late delivery
[ ] over price/budget
[ ] incorrect specifications upon delivery
[ ] inadequate documentation

How certain are you about your answers?
1  2  3  4  5  6  7, where 1 = "I am just absolutely guessing" and 7 = "I am certain"

Fig. 1. Example vignette. Words between < > varied across vignettes.


Comparing both the answers calculated from the formula (statistical prediction) and the answers provided by the purchasing managers and the students (clinical prediction) with the actual answers from the data enables us to find out whether computers indeed outperform humans in predicting (the probability of) problems associated with IT transactions.

Subjects: All students involved were freshmen in information sciences (in Dutch: "informatiekunde") and participated as part of a course requirement. Purchasing managers participated in response to an invitation from a student. Each pair of students had to find one purchasing manager who was willing to participate. Preferably, the manager should have experience in both purchasing and IT, since the transactions on the vignettes were all about IT products. Ultimately, 30 purchasing managers and 60 students participated.

Procedure: Each subject (both students and purchasing managers) was given a set of eight vignettes. Fig. 1 shows an example.

4.1. Vignette construction and measurement

The vignettes were taken from our database of purchasing transactions (N=971), the External Management of Automation, a large-scale survey on the purchase of IT products by Dutch SMEs (5-200 employees; N=788 firms). The sampling frame was a business-to-business database of Dutch SMEs that contained information about the characteristics of these SMEs with respect to automation. The database can be considered representative of the Dutch population of SMEs (see Batenburg, 1995/7). Care was taken to achieve high response rates: firms were contacted by phone, and if a respondent agreed to fill out a survey on a specific purchasing transaction, field workers delivered the survey on the date agreed upon and were instructed to leave with the completed survey. If the respondent was willing to fill out a second survey on a different purchasing transaction, the field worker left a blank survey and a response envelope so that the respondent could fill out this second case at his or her convenience. Eventually, the response rate to the telephone interview was 67% (902 out of 1335). Multiplied by the field response rate of 87% (788 out of 902), the total response rate equaled 59% (788 out of 1335). This is a high response rate in comparison with other surveys among organizations (cf. Kalleberg et al., 1996, chapters 1 and 2). Non-response analysis showed that the response group is not biased on crucial firm characteristics such as size, industry, or region. The codebook, which includes the questionnaires, is downloadable from http://www.fss.uu.nl/soc/iscore.
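To make the arithmetic of the combined response rate explicit (all numbers as reported above):

```latex
\frac{902}{1335} \times \frac{788}{902} \;=\; \frac{788}{1335} \;\approx\; 0.59
```

that is, a 67% telephone response rate times an 87% field response rate yields the 59% total.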

Per transaction, over 300 items were measured, including those mentioned in the vignettes, such as the number of tenders, the number of days spent negotiating and contracting, etc. The problems associated with the transactions were measured by having respondents indicate on a 5-point scale the degree to which each problem had occurred. Based on results from the pilot phase, the respondents could choose from 'late delivery', 'over budget', 'product incomplete', 'product too slow or too confined', 'specification not as agreed', 'incompatible with other systems', 'sloppy installation', 'after-sales slow or missing', 'service slow or missing', 'necessary adjustments slow or missing', and 'insufficient documentation'. Our empirical analysis showed that there is a single dimension underlying these problems: a principal component analysis yielded a clear single component. Our data show that, on average, a large number of problems and a high degree of problems go together. The degree of "problematicness" is therefore measured as the average score on this list of problems, rescaled to 0-100. We then classified all cases according to their score into eight categories of similar size, ranging from 1 (not very problematic) to 8 (very problematic). Each individual received one (randomly chosen) vignette from each category, in a random order. For (almost) all sets of eight vignettes, there are three individuals who made predictions: a purchasing manager and two students. In addition, our formula made predictions for each of the vignettes. The 240 (30 × 8) vignettes given to the purchasing managers are all different. This guarantees a large spread in the kinds of vignettes under consideration, and it enables a clean comparison of purchasing managers versus students.
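As an illustration of the score construction just described, here is a minimal Python sketch; the 1-5 coding of the problem items and the simulated data are assumptions that merely stand in for the real survey.

```python
# Sketch of the "problematicness" score construction described above.
import numpy as np

rng = np.random.default_rng(1)
problems = rng.integers(1, 6, size=(788, 11)).astype(float)  # 11 items, coded 1-5

# Check for a single underlying dimension: share of variance of the first
# principal component (on real data the paper found one clear component).
Z = (problems - problems.mean(0)) / problems.std(0)
eigvals = np.linalg.eigvalsh(np.cov(Z.T))
print("variance share of first component:", eigvals[-1] / eigvals.sum())

# Problematicness = mean item score, rescaled from the 1-5 range to 0-100.
score = (problems.mean(axis=1) - 1) / 4 * 100

# Eight categories of roughly similar size, from 1 (not very problematic)
# to 8 (very problematic), via the empirical octiles.
octiles = np.quantile(score, np.linspace(0, 1, 9)[1:-1])
category = 1 + np.searchsorted(octiles, score)       # values 1..8

# Each subject gets one randomly chosen vignette from each category,
# presented in random order.
vignette_set = [int(rng.choice(np.flatnonzero(category == c)))
                for c in range(1, 9)]
rng.shuffle(vignette_set)
print("indices of one subject's eight vignettes:", vignette_set)
```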

5. Results

As it turns out, in no respect does our formula perform worse than either the students or the purchasing managers. First, we consider the prediction of how problematic the transaction would be. As mentioned above, the variable to be predicted was a score on a 0-100 scale. We consider two ways of analysing the data, both of which lead to the same conclusion.

Our first analysis is the most straightforward one. Per person, one can, for each vignette, calculate the difference between the actual score of the transaction in the data and the prediction given by the individual. For instance, consider a vignette that had a value of, say, 20 in the data. An individual who rates this transaction as, say, 68, is 48 points off. Per person, we calculated how many points off he or she was from the real values, summed across the eight vignettes given to that person. For comparison, we include the perfect score (0), the average random score (about 310), and the worst-case score (about 650). As can be seen from Fig. 2, the formula performs best (109), followed by the students (189) and the purchasing managers (213).

An obvious objection against this analysis is that it puts an extreme emphasis on subjects being able to replicate the scale on which problems occurred in the data. For instance, suppose that an individual was given eight vignettes with actual problem scores 10, 20, 30, ..., 80, and rated these vignettes as 1, 2, 3, ..., 8. Such ratings would lead to a worse-than-random absolute difference score of (10 − 1) + (20 − 2) + ... + (80 − 8) = 324, but clearly this rater did something right, since he managed to order the vignettes perfectly. Therefore, for each individual we now calculate the Spearman rank correlation between the actual and the predicted scores. A subject ordering the vignettes perfectly, as in the example just mentioned, would obtain the maximum score of +1, whereas ordering the transactions precisely the wrong way around would yield the minimum score of −1. Though the average scores are positive for all groups (a satisfactory result), again the formula outperforms the humans, and purchasing managers do not outperform even freshmen students (see Table 1). Using Wilcoxon's signed-rank test (Wilcoxon, 1945; or see Siegel and Castellan, 1988), we can conclude that the formula produces significantly better estimates than the purchasing managers and the students at the 3% level (one-sided). Alternatively, we can use the metric of Grove et al. (2000) and apply Fisher's z-transform to the correlations. We then find a difference of 0.13 in favour of the formula, where Grove et al. consider any difference larger than 0.10 to be of substantial interest. The differences between students and purchasing managers are not significant at the 5% level.

Next, we consider the predictions regarding the kind of problems to be expected. Whereas it would be quite far-fetched to believe upfront that people (or a formula) can predict the problems perfectly, it would seem reasonable to expect that a person's indication that a certain problem is likely to occur has at least some predictive value. This is what we will label the "usefulness" of a prediction.

Fig. 2. Sum of the absolute differences between actual and predicted score, averaged per group (bars for Perfect, Formula, Students, Managers, Random, and Worst case; vertical axis 0-700).
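For concreteness, here is a small Python sketch of the two evaluation metrics and the two tests used above (absolute differences, Spearman rank correlations, Wilcoxon's signed-rank test, and Fisher's z-transform). All numbers in the sketch are invented stand-ins, not the study's data.

```python
import numpy as np
from scipy.stats import spearmanr, wilcoxon

# One judge's eight vignettes: actual scores and two sets of predictions.
actual       = np.array([10, 20, 30, 40, 50, 60, 70, 80])
pred_manager = np.array([35, 15, 55, 30, 70, 40, 85, 60])
pred_formula = np.array([18, 22, 38, 35, 55, 52, 75, 78])

# Analysis 1: sum of absolute differences per judge (0 = perfect score).
print("manager off by", np.abs(actual - pred_manager).sum(), "points")
print("formula off by", np.abs(actual - pred_formula).sum(), "points")

# Analysis 2: Spearman rank correlation per judge (+1 = perfect ordering).
print("Spearman:", spearmanr(actual, pred_manager)[0],
      spearmanr(actual, pred_formula)[0])

# Comparing groups of such per-judge correlations: Wilcoxon's signed-rank
# test on the paired scores, and Fisher's z-transform as in Grove et al.
# (2000), who treat a difference above 0.10 as of substantial interest.
r_managers = np.array([0.30, 0.15, 0.25, 0.35, 0.10])  # made-up per-judge r's
r_formula  = np.array([0.40, 0.35, 0.30, 0.45, 0.33])
print(wilcoxon(r_formula, r_managers, alternative="greater"))
print("mean z-difference:",
      (np.arctanh(r_formula) - np.arctanh(r_managers)).mean())
```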


Table 2
Usefulness(a) of the judgments in predicting the actual occurrence of problems

Problem                                   Managers   Students   Formula
Late delivery                             Useless    Useless    Useful
Over price/budget                         Useless    Useful     Useful
Incorrect specifications upon delivery    Useful     Useless    Useful
Inadequate documentation                  Useless    Useless    Useful

(a) Predictions from groups of respondents (managers/students) are labelled "useful" if adding these predictions as an independent variable for the explanation of the actual problems improves the model fit.

In other words, when a purchasing manager indicates that, say, late delivery is likely to occur, then one should be able to predict more accurately whether such a problem indeed occurs than in the case where we do not have that manager's indication. But is this really the case? The results reveal a bleak picture of the predictive accuracy with respect to the expected problems. Again the humans are outperformed by the formula. Whereas the formula is useful in predicting all four problems, the predictions of both managers and students are of no use for three out of the four problem areas (see Table 2). When we, in addition, compare the percentage of correctly classified problems, the formula is correct in 69, 86, 82, and 63 percent of the cases, whereas both categories of humans average 52, 61, 56, and 56 percent.
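The paper does not spell out how "improves the model fit" was tested; one standard way to operationalize the usefulness criterion of Table 2 is a likelihood-ratio test on nested logit models, sketched below in Python with simulated stand-in data and assumed baseline covariates.

```python
# Sketch of the "usefulness" test: does adding a judge's indication improve
# the fit of a model explaining actual problem occurrence?
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(2)
n = 240
baseline = rng.normal(size=(n, 2))        # e.g. price, contract detail (assumed)
judge_says_late = rng.integers(0, 2, n)   # judge ticked "late delivery"
late_delivery = rng.integers(0, 2, n)     # what actually happened

X0 = sm.add_constant(baseline)
X1 = sm.add_constant(np.column_stack([baseline, judge_says_late]))

fit0 = sm.Logit(late_delivery, X0).fit(disp=0)   # without the judgment
fit1 = sm.Logit(late_delivery, X1).fit(disp=0)   # with the judgment

# Likelihood-ratio test: one extra parameter, so 1 degree of freedom.
lr = 2 * (fit1.llf - fit0.llf)
p = chi2.sf(lr, df=1)
print(f"LR = {lr:.2f}, p = {p:.3f} ->", "useful" if p < 0.05 else "useless")

# Percentage correctly classified by the judge alone, as also reported above.
print("judge accuracy:", (judge_says_late == late_delivery).mean())
```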

6. Counter arguments: are we selling experts short?

It is not easy, for laymen and certainly for experts, to accept the defeat of humans by a simple formula. Nevertheless, as mentioned above, scientifically this result is no surprise and is in line with previous research in other areas. Many of the standard counter-arguments (mostly put forward by the experts) have been addressed in some detail in a paper by Grove and Meehl (1996). Here, we briefly highlight some of the most commonly heard general objections and some that are specific to our topic.

One main argument explaining the relatively poor results of experts is that the kind of test used here is unfavourable for the expert. In real life, experts have learned to take decisions under time pressure while taking into account many subtleties simultaneously. In such a "messy" situation, it could be argued, the real potential of the human expert will surface. There are two arguments to be made against this position, one theoretical and one empirical. First, research in (social) psychology has shown many of the reasons why humans (including human experts) are not that good at tasks of the kind studied in this paper. People do not mentally store information consistently, are not good at retrieving information from memory, and are not good at combining information consistently (see, e.g., Wade and Tavris, 1998, chapters 8 and 9). To us it seems hard to fathom why people's judgments would improve when the conditions under which they work get more difficult. In fact, research has shown that this is typically not the case. For instance, when people have to make decisions in "noisy environments" (in the sense that there is some randomness involved), their decisions get less consistent and generally worse. In short, although we may want it to be true, neither theoretically nor empirically is there anything to back up the image of the razor-sharp analytical genius who pops up under pressure.

Another argument against using a model to generate predictions is that it might be better to inform experts of the model predictions and then have the experts predict again, now equipped with the knowledge of the model as well as their own expertise. Though there may be some merit to this suggestion, the empirical evidence is certainly not strongly in favour of this option. A meta-analysis suggests that when such feedback does produce improvement, it only moves the less accurate experts closer to the (naively) better ones, but it does not enable the latter to surpass the formula (cf. Grove and Meehl, 1996, p. 313).

In our particular case, arguments can be made not to include all subjects in all analyses. For instance, not all purchasing managers had experience in IT, and not all IT managers had experience in purchasing. Moreover, subjects sometimes indicated that they were not certain about their judgments. However, neither excluding the managers with the least appropriate levels of expertise nor excluding the vignettes where subjects had indicated they were not certain (nor both) has an impact on the general gist of the above-mentioned results: the formula performs best, then the humans, with a small edge for the students over the purchasing managers. Moreover, the performance of purchasing managers does not increase with experience or education, a result that is also in line with the literature on clinical versus statistical prediction (Garb, 1989; Grove et al., 2000).

An argument against our results can be made because our "laymen" were students in information sciences, and therefore relatively knowledgeable in IT matters. This is indeed a point to consider, and we plan to compare the results found here with those of subjects not related to IT at all. Nonetheless, we do not find effects of the experience of subjects, neither for the students nor for the experts. In other words, students with little experience in IT score just as well as students with more experience in IT, and experts with little experience score just as well as experts with lots of experience in IT. This shows that at least it is not so much the experience of subjects that drives the results. In any case, the point remains that, apparently, organizations' experts in IT purchasing do not outperform freshmen students, which is curious even if the freshmen students are students in information sciences.

7. A practical implication for procurement management

These results have interesting implications for the use of decision support systems in procurement management. At this point, it seems appropriate to quote Meehl (1986): "There is no controversy in social science that shows such a large body of qualitatively diverse studies coming out so uniformly in the same direction as this one [the relative validity of statistical versus clinical prediction]. When you are pushing 90 investigations [now over 130], predicting everything from the outcome of football games to the diagnosis of liver disease and when you can hardly come up with a half dozen studies showing even a weak tendency in favour of the clinician, it is time to draw a practical conclusion." (Meehl, 1986, pp. 372-373)

We started out by stating that at least one task of the purchasing manager is to decide on the optimal amount of investment, given the characteristics of the transaction. To perform this task satisfactorily, it is necessary to have an idea about the degree and number of problems that occur for different levels of investment. However, our results reveal that purchasing managers are certainly not better at this than a relatively simple formula. By implication, choosing the optimal level of investment would improve if one were to use such a formula, some kind of decision support system, without relying on human judgment. Apparently, for these kinds of decisions, it makes no sense to be guided by one's 'instincts' or 'expertise'.

At the moment, not many firms make use of decision support systems: there are not many software tools around that can add to procurement decisions to begin with, and because tailor-made software is extremely expensive, not many firms feel the investment in such systems would be profitable (Cook, 1997, p. 34). Currently, we are developing a decision support system based on our own database of transactions. The system is based on ongoing, systematic, large-scale quantitative research among hundreds of organizations and business firms, using thousands of real-life transactions between buyers and suppliers, and partners in research and development alliances. For any procurement transaction, it generates advice on the level of investment of resources in search and selection and in contracting, and on the likelihood of problems occurring after the sale, based on statistical analyses of this database of procurement transactions. As usage of the system progresses, the advice gets more and more fine-tuned to a particular firm's situation, because the system learns from previous transactions (a minimal sketch of this idea follows at the end of this section). Letting the computer do the statistics gives better estimates of which transactions to direct your management resources to than a purchasing manager basing this on gut reaction.

This decision support system is quite different from the expert systems developed earlier, which solve problems by emulating the problem-solving behaviour of a human expert (Cook, 1992). Expert systems rely heavily on the assumption that experts are indeed experts, and we have shown they are not. It also goes one step further than the more recently developed case-based reasoning systems (Cook, 1997) and experience-repository-based organizational learning systems (Nick et al., 2001, p. 365). Such systems use libraries of best practices: best-practice protocols, enriched with the experiences of managers. But again, such systems leave a task to the user that we argue should remain the computer's: sampling effectively and calculating statistically what is the best practice given the knowledge from past cases. In short, expert systems fail when no expert knowledge is around, case-based reasoning fails when sample selection and statistical combination of data are flawed, and making purchasing decisions of the kind we have tested proves to be a case in point.

Finally, we briefly want to address the obvious question: "If you are right, how come organizations or firms are not already using this?". We can imagine several reasons. First, purchasing often has a limited budget for investments in these kinds of tools, and the investments that are made are often used for computer support to automate relatively straightforward tasks, such as handling invoices. Second, many purchasing experts wrongly have the idea that their judgments are actually better. Third, there are not many concrete ideas about the size of the profit that can be gained by using such a system (a rough estimate using our database suggests about a 10% decrease in the size and probability of problems and about a 40% decrease in variable management costs). Fourth, potential users fear high start-up costs and the costs of commitment to using such a system. In fact, even if an organization or firm sees the potential benefits, there are still other barriers (cf. Kaplan et al., 2001). One is that the decision support system is not perfect. The expectation that "now that I have a system, everything should always run smoothly" cannot be met. Even though the real question is how the system compares to humans, this expectation is a major obstacle for some potential users. Moreover, a decision support system as suggested delivers a profit in the long run, not for each and every purchasing transaction.
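The paper does not describe the updating mechanism of the system. As a minimal sketch of one way a system could "learn from previous transactions", the general formula could be re-estimated with extra weight on a firm's own accumulated cases; the weighting scheme below is purely our own assumption.

```python
# Minimal sketch (our own construction): advice becomes more firm-specific
# by re-estimating the pooled formula with extra weight on the firm's cases.
import numpy as np

def fit_weighted(X, y, w):
    """Weighted least squares: solve (X'WX) b = X'Wy."""
    Xw = X * w[:, None]
    beta, *_ = np.linalg.lstsq(Xw.T @ X, Xw.T @ y, rcond=None)
    return beta

rng = np.random.default_rng(3)
X_all, y_all = rng.normal(size=(1000, 4)), rng.normal(size=1000)  # pooled database
X_firm, y_firm = rng.normal(size=(12, 4)), rng.normal(size=12)    # one firm's history

X = np.vstack([X_all, X_firm])
y = np.concatenate([y_all, y_firm])
w = np.concatenate([np.ones(1000), np.full(12, 10.0)])  # up-weight own cases (assumed)

beta = fit_weighted(X, y, w)
print("advice coefficients tuned toward the firm:", beta.round(2))
```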


8. Conclusion

The main thing to note, and we cannot emphasize this enough, is that the results shown here, although perhaps counter to our intuition, are essentially in line with results in the literature in other fields of application. Humans are simply not well equipped to make judgments that combine several dimensions in a noisy environment, and purchasing experts are no exception. One reason for this is that humans suffer from a list of well-documented 'mental flaws' that make precisely these kinds of tasks difficult: ignoring base rates, wrongly weighing the different dimensions, failing to take regression to the mean into account, availability bias, and a lack of adequate feedback on the accuracy of past judgments are just a few (see, e.g., Kahneman et al. (1982) or any general introduction to psychology). Perhaps more importantly, one should not only acknowledge that these results exist, but also accept them as an empirical fact and act accordingly. Actually, the main barrier resides in our heads. Grove et al.'s conclusion, supported by the results in this paper, is hard to accept, because it hits us in what we consider one of our most proficient skills: choosing wisely in our own area of expertise. Swallowing that pride and analysing the data objectively is the sensible way to proceed: do not let humans do what they are not good at, and do not let purchasing managers do what can be improved upon by using statistical prediction.

Acknowledgements

Snijders gratefully acknowledges support by the Royal Netherlands Academy of Arts and Sciences (KNAW).

References

Batenburg, R., 1995/7. The external management of automation 1995 (MAT95). Codebook of MAT95. ISCORE paper no. 58, Utrecht University, 218 pp., http://www.fss.uu.nl/soc/iscore.
Batenburg, R., Raub, W., Snijders, C., 2000. Contacts and contracts: temporal embeddedness and the contractual behaviour of firms. ISCORE paper no. 107, Utrecht University, 34 pp., http://www.fss.uu.nl/soc/iscore.
Buskens, V., Batenburg, R., 2000. The external management of automation. Codebook for the combined data from The Netherlands and Germany. ISCORE paper no. 175, Utrecht University, 258 pp., http://www.fss.uu.nl/soc/iscore.
Cook, R.L., 1992. Expert systems in purchasing: applications and development. International Journal of Purchasing and Materials Management 28 (4), 20-27.
Cook, R.L., 1997. Case-based reasoning systems in purchasing: applications and development. International Journal of Purchasing and Materials Management 33 (4), 32-39.
Garb, H.N., 1989. Clinical judgment, clinical training, and professional experience. Psychological Bulletin 105, 387-396.
Grove, W.M., Meehl, P.E., 1996. Comparative efficiency of informal and formal prediction procedures: the clinical-statistical controversy. Psychology, Public Policy, and Law 2 (2), 293-323.
Grove, W.M., Zald, D.H., Lebow, B.S., Snitz, B.E., Nelson, C., 2000. Clinical versus mechanical prediction: a meta-analysis. Psychological Assessment 12 (1), 19-30.
Kahneman, D., Slovic, P., Tversky, A., 1982. Judgment under Uncertainty: Heuristics and Biases. Cambridge University Press, Cambridge.
Kalleberg, A.L., Knoke, D., Marsden, P.V., Spaeth, J., 1996. Organizations in America: Analyzing Their Structures and Human Resource Practices. Sage, London.
Kaplan, S.E., Reneau, H., Whitecotton, S., 2001. The effects of predictive ability information, locus of control, and decision maker involvement on decision aid reliance. Journal of Behavioral Decision Making 14, 35-50.
Meehl, P.E., 1954. Clinical versus Statistical Prediction. University of Minnesota Press, Minneapolis, MN.
Meehl, P.E., 1986. Causes and effects of my disturbing little book. Journal of Personality Assessment 50, 370-375.
Nick, M., Althoff, K.-D., Tautz, C., 2001. Systematic maintenance of corporate experience repositories. Computational Intelligence 17 (2), 364-386.
Siegel, S., Castellan Jr., N.J., 1988. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York.
Wade, C., Tavris, C., 1998. Psychology, 5th Edition. Longman, New York.
Wilcoxon, F., 1945. Individual comparisons by ranking methods. Biometrics Bulletin 1, 80-83.