TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE 28, 43-61 (1985)
EFTE: An Interactive Delphi Method

KEITH R. NELMS and ALAN L. PORTER
ABSTRACT
A new expert opinion technique is developed and applied to estimate the impact of information technologies on clerical work. EFTE stands for estimate, feedback, talk, estimate: the sequence of steps involved in this variant on the Delphi approach. EFTE offers advantages in obtaining expert opinions on complex tasks where social interaction poses little problem.
We recently participated in a study attempting to forecast the effect that office automation will have on clerical employment [49]. This forecast depended directly on an evaluation of the effects of technological advancement on clerical productivity. Analytical models were explored as a means of linking technology and clerical productivity. However, available models proved inadequate for the problem at hand. As with many forecasts, we turned to expert opinion methods [1]. This paper sketches the needs of this forecast and assessment study with respect to opinion capture and summarizes the literature review conducted in the course of defining an appropriate opinion-capture technique. The modified Delphi procedure that was adopted is detailed, and a brief statement of the results is made. Methodological considerations in selecting expert opinion approaches are structured in terms of a set of basic operations: estimate, feedback, talk (and estimate in the case of EFTE).

Opinion Capture Requirements
The study for the Department of Labor set out to provide forecasts of clerical employment in banking and insurance for the time period 1985-2000. The forecasts needed to reflect the impact of advances in office automation technology as well as the effects of general industry trends. Figure 1 represents the six-component model used in this research project [44, 49]. The components represent the following information:

Work Description. The Work Description is composed of a structured model of clerical work in banking and insurance. Clerical work is broken into a series of job "clusters" representing similar work.
KEITH NELMS is Research Engineer with the Economic Development Laboratory, Georgia Institute of Technology Research Institute. ALAN PORTER is Associate Professor in Industrial & Systems Engineering. He is co-author of A Guidebook for Technology Assessment and Impact Analysis (North Holland, New York, 1980). Address reprint requests to Prof. Alan L. Porter, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332.
© 1985 by Elsevier Science Publishing Co., Inc.
0040-1625/85/$03.30
Fig. 1. The forecast model. [Block diagram linking the six model components: Work Description, Technology Forecast, Industry Workload Forecast, Technology Assessment, Processing Algorithm, and Clerical Employment Forecast.]
A task-function matrix is used to describe the time distribution of work in each of the clusters (see Table 1). The tasks are based on commonly observed clerical activities, the functions on a study of information flow.

Technology Forecast. The technology forecast summarizes literature research into the future of technologies that are key to office automation. Quantitative data are recorded when available.

Industry Workload Forecast. The industry workload forecast component projects the amount of clerical workload expected for the banking and insurance industries.
TABLE 1
Task-Function Matrix Analysis: Banking (Computation and Bookkeeping Cluster) (in Percent)
[The matrix reports the percent of cluster time spent in each cell formed by twelve clerical tasks (Type/Process/Distribute, Supervise Clerical, Bookkeeping and Accounting, File/Retrieve, Operate Equipment, Analyze/Prepare, Cash Operations, Accounts Records Maintenance, Executive Support, Company Mail, PR/Communications, Receptionist) and six information-handling functions (Input, Process, Output, Database, Communication, Monitoring); individual entries are small percentages of total cluster time, with a dash marking empty cells.]
Job Titles Included (by percent of clerical employment in U.S. banking for this cluster of related jobs): Bookkeeper (1.40), Payroll Clerk (0.23), Accounting Clerk (2.67), Statistical Clerk (0.27), Tabulating Machine Operator (0.03), Bookkeeping, Billing Machine Operator (1.18).
Technology Assessment. The technology assessment component, the component requiring an opinion-capture procedure, must relate the unstructured information from the technology forecast component and the structure of the work description component to produce a quantitative forecast of the effects of automation on clerical productivity. This estimate must be in a form compatible with the work description structure.

Processing Algorithm. The processing algorithm component produces the clerical employment forecast on the basis of quantitative information from the work description, technology forecast, and industry workload forecast components.

Clerical Employment Forecast. The clerical employment forecast component projects clerical employment and clerical productivity at various levels of aggregation for the period 1985-2000.

Note that the forecasting approach as defined in Figure 1 divorces industry workload growth trends from the impacts of technology on clerical activity. Thus the technology assessment component of the model, the portion of the model requiring opinion techniques, is ultimately a forecast of the impact of office automation technology on clerical productivity in the industries. The technology assessment requires judgment based on the work description and technology forecast model components, as well as a general knowledge of the industries, economics, marketing, and organizational behavior. This judgment should provide a quantitative forecast of clerical productivity as a function of time (it is assumed that the change in productivity with respect to time is primarily driven by advances in office automation technology). Since the chief impact of technology is assumed to be in the processing of information, automation effects are estimated with respect to the information-oriented functions in the work description component. To facilitate judgment and more easily relate these functions to specific technologies, a set of subfunctions was derived (see Table 2). As required by the processing algorithm component of the model, the opinion-capture technique must determine the percent of 1980-based time that is required to perform a fixed amount of work (with 1980 assumed to require 100 percent) for each of the years considered. The judgments are made at the subfunction level of detail.

We pondered which "experts" to engage. Other parts of the study had called on insurance and banking leaders to estimate the industries' likely response to the forthcoming office automation technologies [49]. We decided, however, that industry experts lacked a "micro"-level expertise for this assessment, that is, expertise on how new technology would alter specific task productivity. The appropriate expertise for this technology assessment task was found in the Georgia Tech research team. Professors, graduate students, and area professionals were involved in various portions of this study; they were already familiar with the concepts and components of the assessment. If this group were to be used as a source of expertise for the technology assessment, however, two considerations had to influence selection of the opinion technique:

Communication. The study team as a group contained the appropriate background to make the needed judgments; however, none of the individuals alone possessed adequate knowledge for making such assessments.
Although no single individual was indispensable to the judgment task, at least one of the group responsible for the work description component had to be involved in the assessment process. The same was true for the subgroup involved in the technology forecast component.
TABLE 2
Subfunctions

Input Function
Structured Input has a predictable and rigid structure.
Unstructured Input refers to material that lacks a rigid or predictable structure (on a detailed scale) as it is presented to the clerical worker.

Processing Function
Structured Processing consists of well-defined tasks. It assumes that the clerical worker knows what to do and how to do it.
Unstructured Processing consists of new, unusual, unique, and undefined processing tasks.

Output Function
Temporal Output consists of small amounts of information that have a limited (or uncertain) lifetime.
Distribution Output consists of output of a more substantial nature with respect to volume of information, lifetime of output, and audience.
Archival Output consists of records to be stored for possible future reference.

Data Base Function
(No subfunctions)

Communications Function
Local Temporal Communications consist of communicating nonenduring information in the local environment.
External Temporal Communications are nonenduring communications not in the local environment.
Bulk Transfer Communications are concerned with the communication of data (numbers and characters) in an unprocessed form and information in the form of reports.

Systems Monitoring Function
People Monitoring is the supervision of people.
Equipment Monitoring includes the monitoring of machines and equipment.
Information/Materials Monitoring includes such activities as the monitoring of office supplies inventories, petty cash, etc.
Geographics. All of the members of the study team were in the Atlanta metropolitan area. Thus, it was relatively easy to assemble the team for discussions.
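To make the data flow of Figure 1 concrete, the following minimal sketch shows how subfunction time-fraction judgments, task-function matrix shares, and a workload index could combine into an employment projection. It is an illustration only: the variable names and the simple multiplicative form are our assumptions, not the study's actual processing algorithm.

```python
# Illustrative sketch only: the names and the simple multiplicative form are
# assumptions for exposition, not the study's actual processing algorithm.

# Share of a job cluster's time spent in each information-handling function
# (one row of a task-function matrix such as Table 1; shares sum to 1.0 here).
function_shares = {"input": 0.20, "process": 0.35, "output": 0.15,
                   "database": 0.10, "communication": 0.15, "monitoring": 0.05}

# Hypothetical expert judgment: fraction of 1980 base time needed per unit of
# work in the year 2000 (1.0 would mean no productivity gain).
time_fraction_2000 = {"input": 0.50, "process": 0.45, "output": 0.45,
                      "database": 0.40, "communication": 0.45, "monitoring": 0.55}

def projected_employment(base_employment: float, workload_index: float) -> float:
    """Scale base-year employment by workload growth and by the time still
    required (relative to 1980) to handle each function's share of the work."""
    time_required = sum(share * time_fraction_2000[fn]
                        for fn, share in function_shares.items())
    return base_employment * workload_index * time_required

# Example: 1,000 clerks in the base year, workload up 60 percent by 2000.
print(round(projected_employment(1000, 1.6)))   # -> 736
```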
Opinion-Capture Techniques
Expert opinion literature contains a number of opinion-capture techniques. Porter et al. [47] recognize four categories: genius (single-individual) forecasting, survey (polling) forecasting, panel (face-to-face interaction) forecasting, and Delphi (survey with feedback without face-to-face interaction) forecasting. Research on noninteracting group techniques has shown statistical aggregation superior to individual judgment only in relatively simple tasks; yet, research on interacting groups has demonstrated clear superiority for groups at all levels of task difficulty [13, 20, 30, 33, 35, 41, 48, 54]. Later research has indicated that for the most difficult judgment tasks, interacting groups rarely reach the level of their most accurate members [22, 31]. It is important to note that a body of research has indicated the inferiority of interactive groups in some settings [37]. In an effort to overcome the undesirable effects of group interaction while retaining the positive aspects of interacting group judgments, Dalkey and Helmer [12] developed the Delphi technique. As originally conceived, the Delphi technique was designed to
address two primary goals: freedom from group pressure and effects (via anonymity), and consensus (attained via feedback and iteration).

The Delphi technique has generated serious discussion in the field of opinion capture [38]. Although questioned by theorists, it has been used in many applications. The Delphi-related literature found pertinent to this research can be divided into four broad categories:

Delphi Application Literature, in which the Delphi or some adaptation of it is used in solving a real-world problem. These citations usually do not contain a highly critical discussion of the technique (cf. [23, 43]).

Delphi Methodology Debate Literature, in which one body of literature "fleshes out" the basic techniques of the Delphi in an attempt to enhance it [2, 4, 5, 9, 10, 11, 32, 58] while another debates the merits of the technique [3, 8, 14, 24, 27, 45, 55]. Of particular interest is the debate stimulated by a Rand report authored by Sackman [51]. Sackman harshly criticized Delphi as weak social science and suggested that the technique be abandoned. Responses expressed both assenting and dissenting views [6, 19, 29, 52].

Delphi Adaptation Literature, in which authors propose alterations and adaptations to the "basic Delphi procedure" (cf. [25, 34, 56]). For instance, one of the interesting adaptations postulates an "interactive Delphi" based on time-sharing computer services.

Alternate Techniques Literature. Even the severest critics of Delphi acknowledge its growing popularity. An increasing need for opinion capture in government, industry, and academia drives this popularity. Several authors offer alternate opinion-capture techniques [15, 18, 21, 39, 40, 42, 46, 50, 57]. This literature usually describes experiments on a newly proposed technique, on Delphi, and on one or two other techniques previously established in the literature. The Delphi serves as the standard for comparison. In general, the newly proposed technique is found superior, with the Delphi technique rating either as a close second or as a distant last. Criticism of the experimental methods of other researchers is often advanced. Of particular interest is research performed by Fischer [16, 17], who concludes that group judgment is superior to individual judgment, but that ". . . from a practical standpoint, there is no evidence to suggest that the method used to aggregate opinions will have a substantial impact on the quality of the resulting forecast."
Technique Design
A review of the opinion-capture literature provides no generally superior method of opinion capture. Opinion-capture methodology should probably be designed in light of particular forecasting situations [42]. An opinion-capture technique for a problem at hand must address several issues: the effects of interaction, the roles of consensus and stability, and the nature of feedback.

INTERACTION EFFECTS
The Delphi technique, originally designed to bypass problems associated with group interaction, is now touted by many practitioners as a communication device. One of its founding fathers has remarked, “Delphi represents a useful communication device among a group of experts and thus facilitates the formation of a group judgment” [24]. However, the communication facilitated by the Delphi technique is too restricted for many problem situations. The requirement of written feedback, editing, and distributing places a high
cost on the communication of ideas. The distance and anonymity associated with Delphi prevent a meaningful "discussion" even when such interaction would be advantageous.

The nominal group technique (NGT) and its variants offer another approach to opinion capture. In its basic form, the NGT requires participants to generate ideas in private. Participants then engage in a round-robin meeting in which each offers one idea at a time. Discussion is then allowed. Finally, an appropriate form of vote is taken to determine the group's final answer [57]. The nominal group technique has been investigated in several studies. Gustafson and his associates [21] found it significantly superior to the Delphi variant employed in their research. However, others found no significant difference between the NGT approach and Delphi [17, 57]. Regardless, the NGT approach permits face-to-face group interaction. This seems to be a requirement for the research task at hand. However, it should be noted that NGT was designed for the generation and capture of ideas rather than for the estimation of prespecified quantities. Many aspects of the NGT approach as defined in the literature are irrelevant or administratively unwieldy for the technology assessment task.

The research of Gustafson et al. [21] provides a helpful framework for designing an opinion-capture technique by dimensioning techniques at an abstract level. The research examined three types of group opinion-capture techniques: Talk-Estimate (TE), which is analogous to an interacting group such as a committee or a panel; Estimate-Feedback-Estimate (EFE), which is represented by a Delphi; and Estimate-Talk-Estimate (ETE), which is a surrogate for the Nominal Group Technique. Although the Gustafson findings have been questioned because of procedural uncertainties [17, 57], the results provide useful abstractions for classifying opinion techniques. They identify three distinct processes in opinion capture: talk (interaction), feedback (one-way back communication), and estimate (the decision process). Implicit in the Gustafson approach is the possibility that other meaningful combinations of talk, feedback, and estimate are possible. This research considers different combinations of these individual activities.

Other research supports talk (face-to-face interaction) in opinion capture. Scheele [52] suggests that a main cause of participant dropout in Delphi is the "misuse of experts," whereby experts are forced to "fit" their knowledge into a question format designed by someone less knowledgeable, without an opportunity to interact. Dalkey et al. [11] found that additional relevant facts introduced during iterations of a Delphi survey were incorporated into the judgment process.

In summary, there is considerable interest in face-to-face interaction in opinion capture even though simple interacting group decision processes have proven consistently problematic in research over half a century. Techniques have been developed that incorporate some measure of face-to-face interaction with more structured opinion techniques. Research has shown that these new techniques (e.g., NGT) may be as good as, if not better than, the Delphi technique. We assert that the design of an opinion technique should be contingent on the features of the forecasting/assessment task at hand.

CONSENSUS AND STABILITY
The idea of consensus has long been a focal point of debate on Delphi methodology. Delphi was originally designed in part as an attempt to arrive at a consensus among experts. The goal was to arrive at a “good” consensus not driven by peer and political
pressures [4, 12]. Consensus was considered the stopping criterion for most early Delphi studies. However, the standards for "consensus" in a Delphi survey have never been rigorously established [51]. The desirability, reliability, and practicality of consensus in Delphi surveys have been, and continue to be, debated [7, 8, 14, 27, 34, 46, 51, 55].

A growing body of research questions consensus as a stopping criterion for Delphi studies [5, 9, 53]. Stability of results between two Delphi rounds has been suggested as a more appropriate criterion. This implies that consensus should not be considered a primary goal of the Delphi. Although earlier studies considered stability of group feedback as sufficient for determining the Delphi stopping point, Chaffin and Talley [5] offer evidence that stability of individual results should be required. The use of stability as a stopping criterion may shorten Delphi studies. Most changes in Delphi response occur in the first two rounds [14, 36].

FEEDBACK NATURE AND TIMING
It has been found, almost without exception, that the feedback in the Delphi process improves the quality of judgment [cf. 2, 11]. Dalkey et al. [11] also find indications that the format of the feedback has relatively little effect on the quality of the results. The Dalkey research did point to a danger in distributing only selected portions of feedback opinions: supplying only the "dissenting" opinions tends to degrade the quality of subsequent Delphi judgments. Research by Waldron [58] shows the quality of the Delphi judgment varying with the time between estimation and feedback. The shorter the delay between making a judgment and receiving feedback, the better the quality of the subsequent round's judgment.

The Estimate-Feedback-Talk-Estimate Procedure
On the basis of theoretical principles and empirical findings in the opinion-capture literature and a consideration of the needs of the technology assessment component of the clerical forecast model, a new technique was devised. The new technique, to be called an estimate-feedback-talk-estimate (EFTE) procedure, is a variant of the Delphi technique; it might in fact be referred to as an "Interactive Delphi" procedure.

DESCRIPTION OF THE EFTE PROCEDURE
The EFTE technique implemented in this research consists of the following steps:
1. Participants are given background information to be used in making the opinion judgments. This background information explains the work description structure and summarizes the technology forecast information. In addition, pertinent information is provided about the banking and insurance industries.
2. Participants are assembled face-to-face in a conference room. Questions regarding the background information are resolved by the Delphi manager. Discussion among the participants is discouraged.
3. A Delphi questionnaire is given to each participant. The instructions are "to estimate what percent of the 1980 base-time is required to complete a fixed amount of work for 1985, 1990, 1995, and 2000." This is to be done for each information-oriented work subfunction by estimating the extent to which the technology commercially available during that year will be in use. The participants complete the questionnaire and return it to the Delphi leader.
4. The questionnaire results are summarized and displayed before the entire group. Answers are ranked from high to low. The median and the quartiles are shown. In addition, the range is provided. (A sketch of this summary computation follows the list.)
5. The feedback results are discussed freely among the participants. The anonymity of each individual's survey response is maintained.
6. A second Delphi round is performed. In addition to the questionnaire, the participants are given index cards for anonymous questions and comments. The participants complete the questionnaire, write questions on the index cards (if they wish), and return both to the Delphi manager.
7. The results of the second round are summarized and posted. The anonymous questions are read and recorded for display.
8. The participants are again allowed to discuss the Delphi feedback.
9. The Delphi results are examined for individual stability. If sufficient stability is found,¹ the process is terminated. If stability is not found, then steps 6 through 9 are repeated.
10. Final results are summarized on paper. Additional statistical analysis is performed. This information is distributed to all participants. Comments and observations are solicited.
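The minimal sketch below illustrates the kind of summary posted in step 4. It is our own illustration (hypothetical response values, Python's statistics module), not code from the study.

```python
# A minimal sketch of the step 4 feedback display: our own illustration,
# not code from the study.
import statistics

def summarize_round(responses):
    """Return the statistics posted to the group for one subfunction and year."""
    q1, med, q3 = statistics.quantiles(responses, n=4)   # quartiles and median
    return {"ranked_high_to_low": sorted(responses, reverse=True),
            "median": med,
            "upper_quartile": q3,
            "lower_quartile": q1,
            "interquartile_range": q3 - q1,
            "range": max(responses) - min(responses)}

# Example: nine estimates of the percent of 1980 base time required in 2000.
print(summarize_round([60, 55, 53, 50, 45, 45, 40, 35, 30]))
```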
The "Delphi manager" in the EFTE technique serves a purely administrative role, and so is not responsible for leading or motivating the group discussions.

This EFTE procedure provides several significant theoretical advantages in opinion capture for this technology assessment task:

Protection from Group Effects. Anonymity is preserved in the opinion polling and in the other phases of the technique to the extent the participant chooses to be anonymous. Questions may be posed to the group anonymously. Not pushing for consensus also serves to reduce group pressures.

Face-to-Face Interaction. This "Interactive Delphi" permits the open discussion of ideas essential for this task, since each individual does not have the full expertise required for valid judgment. However, there is a procedural structure and an objective basis for the discussion (e.g., the feedback information is displayed for the entire group). This diverts attention away from social considerations and toward task considerations. Participants can discuss positions relative to other participants without having to reveal their own positions. Initial conversation that might color the individual participant's first judgments is prohibited.

Feedback. The Delphi-like iterative judgments allow individuals to refine their thinking in light of other judgments and in light of additional information. Feedback from each round is very fast. This preserves the participants' train of thought in making subsequent decisions.

Stability Stopping Criterion. The stability stopping criterion allows honest disagreement to exist in the final stages. This avoids the often unproductive pressures created when consensus is required. The degree of uncertainty is reflected in the final results by the standard deviation, range, and interquartile range of the answers. These "measures of uncertainty" may be the foundation of sensitivity analysis.

Fast Start to Finish. The "Interactive Delphi" approach allows the Delphi results to be obtained in less calendar time than the usual Delphi procedure.
¹Computerized operation of the response mechanism would permit statistical measures of stability in "real time." One might consider aggregation of individual changes (e.g., average percentage change) or of patterns across respondents (e.g., a chi-square test). Alternatively, as in this case, participants can be polled on the degree to which results are likely to change further.
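As a concrete illustration of the first option in the footnote, the sketch below computes an average-percentage-change stability check between two rounds. The function name, the 15 percent threshold, and the sample responses are our assumptions, not values used in the study.

```python
# A sketch of the footnote's "average percentage change" stability check; the
# function, the 15 percent threshold, and the sample data are our assumptions.
def individually_stable(round_prev, round_next, threshold_pct=15.0):
    """Stop when the mean absolute percentage change across individuals is small."""
    changes = [abs(b - a) / a * 100.0 for a, b in zip(round_prev, round_next)]
    mean_change = sum(changes) / len(changes)
    return mean_change <= threshold_pct, mean_change

# Example: one subfunction's estimates from the same nine participants in two rounds.
stable, drift = individually_stable([60, 55, 50, 45, 45, 40, 40, 35, 30],
                                    [55, 55, 50, 45, 40, 40, 40, 35, 30])
print(stable, round(drift, 1))   # -> True 2.2
```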
RESULTS OF THE EFTE PROCEDURE IN PRACTICE
Ten individuals were scheduled to participate in this "Interactive Delphi"; one was unable to attend, and one had to leave early. There were actually two parts to each round. The participants were asked to estimate the "Maximum Possible Impact" that technology could have on clerical productivity as well as the "Actual Expected Impact." The participants were a congenial group with no strong personality conflicts. Although there was considerable disagreement at times (reflected by the considerable lack of convergence in a few judgments), the disagreements reflected task complexities rather than social complexities [26].

Analyses of the results of the EFTE procedure are shown in Table 3a. Note that the median, upper quartile, and lower quartile serve as a basis for the technology assessment component. The median forms the "base case," while the quartiles are used in sensitivity analysis.

Response as a Function of Year. Without exception, the median responses decreased over time. However, two significant patterns emerged with respect to response convergence as a function of time. These patterns are shown in Table 3b, which gives the ratio of each subfunction-year value to the subfunction's smallest range and standard deviation. The two dominant patterns are:

1. Increasing Uncertainty. Four of the fourteen subfunctions showed increasing spread as measured by both the range of responses and the standard deviation of responses. This is intuitively sensible, since forecasting the distant future may be seen as more uncertain than forecasting the near future. The subfunctions showing decreasing convergence are archival output, external temporal communications, people monitoring, and equipment monitoring.
2. Concave Convergence. Nine of the remaining ten subfunctions showed a concave convergence pattern with respect to time. (Local temporal communications show a somewhat opposite pattern, but that is dominated by high convergence over time and appears only with certain measures.) There was less disagreement in the years 1985 and 2000 than in the years 1990 and 1995. By way of explanation, it may be hypothesized that the participants were in better agreement with respect to current productivity and eventual productivity than with respect to the timing and mechanism of change. It should be noted that some very large ranges and standard deviations are included in this group of responses.

Differences between Round One and Round Two
The differences between rounds one and two for the year 2000 are highlighted in Tables 4a and 4b. The changes in the response statistics fall into three broad categories:

1. Median Drops, Large Range Convergence. For five of the fourteen subfunctions, the median response dropped noticeably. In addition, the round-two range was less than one-half as large as the round-one range. This group includes structured processing, archival output, external temporal communications, bulk transfer communications, and equipment monitoring. With the exception of equipment monitoring, this group also witnessed an equally significant contraction of the interquartile range (IQR).
TABLE 3
Analysis of EFTE Procedure Results for the Years 1985-2000
[Part (a) reports, for each of the 14 information-handling subfunctions of Table 2 (Input: Structured, Unstructured; Process: Structured, Unstructured; Output: Temporal, Distribution, Archival; Database; Communications: Local Temporal, External Temporal, Bulk; Monitoring: People, Equipment, Information/Materials) and for each forecast year, the upper quartile, median, lower quartile, interquartile range, range, and standard deviation of the responses, with the range and standard deviation also recomputed after dropping the high and low extreme responses. Part (b) reports, for each subfunction, the ratio of each year's range and standard deviation to that subfunction's smallest range and standard deviation.]
Note: All values are in percent of 1980 base time required to perform a given unit of work. Up Q (upper quartile), Med (median), and Lo Q (lower quartile) identify the quartile responses; IQR (interquartile range) is the difference between Up Q and Lo Q; Range is the difference between the highest and lowest responses among the participants; S Dev is the standard deviation. Columns marked with a superscript a have the high and low extreme responses dropped.
TABLE 4
Analysis of EFTE Procedure Results for the Year 2000
[Part (a) reports, for each of the 14 subfunctions, the upper quartile, median, lower quartile, interquartile range, range, average, and standard deviation of the year-2000 responses for round one, for round two, and for round two with the high and low extreme responses dropped. Part (b) reports the change between round one and round two: the shift in the median and the round-two to round-one ratios of the range, IQR, and standard deviation. Part (c) compares the round-two statistics computed from all data with those computed after the extremes are dropped: the shift in the average and quartiles, the dropped-to-all ratios of range, IQR, and standard deviation, and the IQR-to-range ratio for both data sets.]
Note: All values are in percent of 1980 base time required to perform a given unit of work. Up Q, Med, and Lo Q identify the upper quartile, median, and lower quartile responses; IQR is the difference between Up Q and Lo Q; Range is the difference between the highest and lowest responses among the participants; Avg is the mean and S Dev the standard deviation. Columns marked with a superscript a have the high and low extreme responses dropped.
2. Median Stable, Large Range Convergence. For another six subfunctions, the median value remained relatively constant while the range converged by 50 percent or more. The six were structured input, unstructured input, unstructured processing, temporal output, distribution output, and local temporal communications. With the exception of unstructured input and local temporal communications, the IQR also converged by 50 percent or more.
3. Median Stable, Moderate Range Convergence. The remaining three subfunctions are characterized by a stable median and by range convergence of less than one-half. This group includes database, people monitoring, and information and materials monitoring. Although the IQR for the two monitoring subfunctions mirrored the changes in range, the IQR for the database actually increased slightly. (A sketch of these round-to-round comparisons follows the list.)
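The classification above rests on simple round-to-round comparisons; the following minimal sketch (our own, with hypothetical responses) shows one way to compute the median shift and the range and IQR ratios for a single subfunction.

```python
# A sketch (ours, with hypothetical responses) of the round-to-round comparisons
# behind the categories above: median shift plus range and IQR ratios.
import statistics

def convergence(round1, round2):
    """Return (median shift, range ratio, IQR ratio) from round 1 to round 2."""
    def rng(x):
        return max(x) - min(x)
    def iqr(x):
        q1, _, q3 = statistics.quantiles(x, n=4)
        return q3 - q1
    shift = statistics.median(round2) - statistics.median(round1)
    return shift, rng(round2) / rng(round1), iqr(round2) / iqr(round1)

# A range ratio below 0.5 corresponds to "large range convergence" in the text.
print(convergence([90, 75, 70, 60, 55, 40, 30, 20],
                  [70, 65, 60, 55, 50, 45, 40, 35]))   # -> (-5.0, 0.5, ~0.55)
```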
It is interesting to note that although most of the medians remained relatively the same for the two rounds, there were instances of significant change (archival output and bulk transfer communications).

Characteristics of Responses for the Year 2000
The responses for the year 2000 were examined from two analytical perspectives, as shown in Tables 4a and 4c. First, the effect of individual extreme responses was examined by calculating relevant statistics for the set of responses with the highest single and the lowest single points removed. In addition, the subfunctions were aggregated to the function level for the purpose of interpreting the results.

For each subfunction the two extreme responses were omitted and pertinent statistics were calculated. These statistics were then compared to the statistics for the full set of responses. Comparison of ranges and standard deviations for each subfunction indicates considerable convergence for all but one (temporal output), which can be attributed to the removal of the two extremes. We chose to make use of the nonparametric measures in the assessment (the 25th, 50th, and 75th percentile values) because of the general robustness of such measures.

The subfunction medians for the year 2000 vary between 23 and 80 percent (between 23 and 60 percent if people monitoring is ignored). However, if these subfunction medians are averaged on the basis of their functional relationship, the aggregated medians range only from 38 to 57 percent (38 to 47 percent if people monitoring is ignored). Thus, despite the wide variation in subfunction medians, functions tend to show increasing productivity at roughly the same rate.
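The function-level averaging just described can be illustrated as follows; the subfunction values are placeholders chosen for illustration, not the study's actual medians, and the unweighted mean is our assumption about the aggregation rule.

```python
# Illustration of the function-level aggregation described above; the values are
# placeholders, not the study's medians, and the unweighted mean is our assumption.
from statistics import mean

year_2000_medians = {   # percent of 1980 base time, hypothetical
    "output": {"temporal": 55, "distribution": 45, "archival": 30},
    "communications": {"local_temporal": 55, "external_temporal": 40, "bulk": 25},
}

function_medians = {fn: mean(subs.values()) for fn, subs in year_2000_medians.items()}
print(function_medians)   # the wide subfunction spread narrows at the function level
```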
SUMMARY OF EFTE RESULTS
In summary, several significant patterns emerge from the analysis of the EFTE results. Although most of these patterns would likely be present in a normal Delphi procedure, one or two are probably specific to the EFTE approach. The range and standard deviation between the first and second rounds converge to a large degree in most subfunctions. Although the medians for most subfunctions remain relatively stable, a marked shift downward is noted in several. This could be attributed to the information exchange that is the basic attribute of the EFTE approach. This is consistent with the findings of Dalkey et al. [11]. As expected, medians consistently decreased as a function of years. Note that the greatest decrease occurred in the 1990s for most subfunctions. Although uncertainty, as reflected in the range, standard deviation, and interquartile range, often increased with time, it was common for uncertainty to be greatest in the intermediate years 1990 and 1995. Thus, the participants may have similar impressions about the long-range effect
of office automation technology but differ in perceptions of the timing of change. (Timing could have major policy implications in this case in balancing workload, demographics, retraining options, etc.) Although the range and standard deviation of the subfunctions tend to be rather sensitive to outlying responses, the median and quartiles are relatively unaffected. This is generally to be expected, especially if most responses are clustered toward the median.
Implications
This study demonstrates both a new expert-opinion technique in EFTE (estimate-feedback-talk-estimate) and a design strategy for devising a technique best suited to situational needs. The prime motivation and cornerstone of the EFTE procedure is the ability of the participants to interact within a Delphi-like decision process. (Note that, unlike the standard Delphi procedures, EFTE provides immediate feedback and does not try to force a consensus.) EFTE may well represent a significant new procedure for opinion capture in multidisciplinary tasks. Further research on the EFTE procedure is needed to provide a better understanding of its strengths and weaknesses.

Different forecasts and assessments demand different strategies. Ideally, one would like to combine independent quantitative forecasts with opinion-based predictions. Convergence between the quantitative and qualitative assessments provides great confidence in the findings; divergence highlights areas to be explored. In the office automation study, available data allowed projection for various information technology components (from chips to microcomputers) in terms of sales, capabilities, and costs. We decided, however, that these were not sufficient for projecting office automation. We were able to juxtapose the impact assessment estimations described herein with other opinion-based projections for the two industries [49]. Results corresponded well, showing dramatic reductions in office job employment through the 1990s.

Within the family of expert opinion techniques there is a range of choices. We identify at least six considerations in choosing among Delphi, NGT, EFTE, etc.:

1. Logistics. Very simple tasks and limited resources indicate a single estimate round. In general, a survey is preferable to genius forecasting. In all other situations, feedback appears generally desirable, implying multiple rounds with estimate, feedback, and estimate as a minimum.
2. Feedback. The format used appears uncritical, except for the evidence against emphasis on outlier rationales only. This study points toward advantages of minimum delay times.
3. Communication Medium. Options include in-person (panel), panel with restricted or structured information exchange (as in NGT or EFTE), teleconference, computer conference, and mail. Choice would seem to be a function of access to experts, resources, and time.
4. Sample Size. While it is always desirable to maximize the sample size, Hogarth [28] supports groups of a dozen or fewer as converging on hoped-for validity levels (even for only moderately valid individual judgments).
5. Stopping Rule. Trade off feasibility (experts' patience, resources, and time available) against stability (of group and individual judgments). Research is needed to formulate stability criteria. Consensus is another possible criterion, but comes at the cost of increased group pressures.
6. Interaction. This depends on resources and constraints, of course, but we suggest it can be usefully considered in terms of the task and social complexities of the situation [26]. Task complexity reflects a difficult problem and dispersed facts/skills among group members. Social complexity rises as members have grounds for serious disagreement and care about the outcome. We nominate opinion techniques for consideration accordingly (parentheses indicating techniques with special problems):

Task Complexity    Social Complexity    Prime Opinion Technique
low                low                  survey, panel, (genius)
low                high                 survey, (genius)
high               low                  EFTE, (panel)
high               high                 Delphi, NGT
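A toy encoding of this table, purely for illustration (the dictionary form and the "caution" tag for parenthesized techniques are our own), is:

```python
# A toy encoding of the table above, purely for illustration; the dictionary form
# and the "caution" tag for parenthesized techniques are our own choices.
RECOMMENDED = {
    ("low", "low"):   ["survey", "panel", "genius (caution)"],
    ("low", "high"):  ["survey", "genius (caution)"],
    ("high", "low"):  ["EFTE", "panel (caution)"],
    ("high", "high"): ["Delphi", "NGT"],
}

def suggest(task_complexity: str, social_complexity: str) -> list:
    """Return candidate opinion-capture techniques for a forecasting situation."""
    return RECOMMENDED[(task_complexity, social_complexity)]

print(suggest("high", "low"))   # the authors' situation: EFTE is the prime choice
```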
As stated earlier, the participants in this research were on good terms with one another. The group was not disturbed by personality conflicts and there were no political spoils to be gained. In addition, the group was task-oriented and was not intimidated by healthy professional disagreement (i.e., the group exhibited low social complexity). Conversely, the task of relating diverse technological changes to specific work subfunctions over time was highly complex. Thus, the situation was almost "ideal" for the EFTE process. Further research should be conducted comparing the EFTE procedure with other opinion-capture techniques in situations requiring information sharing but having varying degrees of group complexity.

Another avenue of interesting research could apply the EFTE technique to teleconferencing. Teleconferencing generally follows a committee meeting approach, which cynics might say simply makes the bad decisions of a committee less expensive to obtain. An additional effect of teleconferencing is the distortion of communication channels (communication between locations is forced through the telecommunications process, while within one location communication employs more simultaneous channels). This raises two questions: could EFTE facilitate better decisions in teleconferenced meetings, and how should the feedback and talk components be managed in this scenario?

The EFTE procedure offers a significant addition to established opinion-capture techniques. Further research is needed to establish its credentials for various problem environments. We believe the framework of considering opinion techniques in terms of their essential components (estimate, talk, feedback) is useful. We also suggest six considerations to address, contingent on situation specifics, in selecting an expert opinion strategy for application.
The authors thank their close colleagues who worked together on this study: David Roessner (project director), Robert Mason, Frederick Rossini, Perry Schwartz, Peter Sassone, Frederick Tarpley, Terrye Schaetzel, and Susan Diehl. Fred Rossini did double duty on the thesis committee with Jerry Banks. We acknowledge partial support from the Employment and Training Administration, U.S. Department of Labor, under Grant/Contract No. 21-13-82-13. This study does not necessarily represent the official opinion or policy of the Department of Labor or of the Georgia Institute of Technology.
References
1. Armstrong, J. S., Relative Accuracy of Judgemental and Extrapolative Methods in Forecasting Annual Earnings, Journal of Forecasting 2, 4 (1983), 437-447.
2. Best, R. J., An Experiment in Delphi Estimation in Marketing Decision Making, Journal of Marketing Research 11 (Nov. 1974), 447-468.
3. Brockhaus, W. L. and Mickelsen, J. F., An Analysis of Prior Delphi Applications and Some Observations on its Future Applicability, Technological Forecasting and Social Change 10 (1977), 103-110.
4. Brown, B. and Helmer, O., Improving the Reliability of Estimates from a Consensus of Experts, The Rand Corporation, P-2986 (Sept. 1964).
5. Chaffin, W. W. and Talley, W. K., Individual Stability in Delphi Studies, Technological Forecasting and Social Change 16 (1980), 67-73.
6. Coates, J. F., In Defense of Delphi: A Review of "Delphi Assessment: Expert Opinion, Forecasting, and Group Process" by H. Sackman, Technological Forecasting and Social Change 7 (1975), 193-194.
7. Cyphert, F. R. and Gant, W. L., The Delphi Technique: A Tool for Collecting Opinions in Teacher Education, Journal of Teacher Education 21 (1970), 417-425.
8. Dagenais, F., The Reliability and Convergence of the Delphi Technique, The Journal of General Psychology 98 (1978), 307-308.
9. Dajani, J. S., Sincoff, M. Z., and Talley, W. K., Stability and Agreement Criteria for the Termination of Delphi Studies, Technological Forecasting and Social Change 13 (1979), 83-90.
10. Dalkey, N. C., The Delphi Method: An Experimental Study of Group Opinion, The Rand Corporation, RM-5888-PR (June 1969).
11. Dalkey, N. C., Brown, B., and Cochran, S. W., The Delphi Method, IV: The Effect of Percentile Feedback and Feed-In of Relevant Facts, The Rand Corporation, RM-6118-PR (March 1970).
12. Dalkey, N. C. and Helmer, O., The Experimental Application of the Delphi Method to the Use of Experts, Management Science 9 (April 1963), 458-467.
13. Dashiell, J. F., Experimental Studies of the Influence of Social Situations on the Behavior of Individual Human Adults, in A Handbook of Social Psychology (C. Murchison, Ed.), Clark University Press, Worcester, Mass. (1935).
14. Dodge, B. J. and Clark, R. E., Research on the Delphi Technique, Educational Technology (April 1977), 58-60.
15. Erffmeyer, R. C., Decision-Making Formats: A Comparison on an Evaluative Task of Interacting Groups, Consensus Groups, the Nominal Group Technique, and the Delphi Technique, Ph.D. Dissertation, Louisiana State University and Agricultural and Mechanical College (1981).
16. Fischer, G. W., An Experimental Study of Four Procedures for Aggregating Subjective Probability Assessments, NTIS, NR 197-029 (1975).
17. Fischer, G. W., When Oracles Fail: A Comparison of Four Procedures for Aggregating Subjective Probability Forecasts, Organizational Behavior and Human Performance 28 (1981), 96-110.
18. Ford, D. A., Shang Inquiry as an Alternative to Delphi: Some Experimental Findings, Technological Forecasting and Social Change 7 (1975), 139-164.
19. Goldschmidt, P. G., Scientific Inquiry or Political Critique?: Remarks on "Delphi Assessment, Expert Opinion, Forecasting, and Group Process" by H. Sackman, Technological Forecasting and Social Change 7 (1975), 195-213.
20. Gordon, K., Group Judgements in the Field of Lifted Weights, Journal of Experimental Psychology 7 (1924), 398-400.
21. Gustafson, D. H., Shukla, R. K., Delbecq, A., and Walster, G. S., A Comparative Study of Differences in Subjective Likelihood Estimates Made by Individuals, Interacting Groups, Delphi Groups, and Nominal Groups, Organizational Behavior and Human Performance 9 (1973), 280-291.
22. Hall, E. J., Mouton, J. S., and Blake, R. R., Group Problem Solving Effectiveness under Conditions of Pooling vs. Interaction, Journal of Social Psychology 59 (1963), 147-157.
23. Helmer, O., The Use of Expert Opinion in International-Relations Forecasting, Center for Futures Research and the Graduate School of Business Administration, University of Southern California (1973).
24. Helmer, O., Problems in Futures Research: Delphi and Causal Cross-Impact Analysis, Futures (Feb. 1977), 17-31.
25. Helmer, O., Looking Forward: A Guide to Futures Research, Sage Publications, Beverly Hills (1983), 155.
26. Herold, D. M., The Effectiveness of Work Groups, in Organizational Behavior (S. Kerr, Ed.), Grid, Columbus, Ohio (1979).
27. Hill, K. Q. and Fowles, J., The Methodological Worth of the Delphi Forecasting Technique, Technological Forecasting and Social Change 7 (1975), 179-192.
28. Hogarth, R. M., A Note on Aggregating Opinions, Organizational Behavior and Human Performance 21 (1978), 40-46.
29. Jillson, I. A., Developing Guidelines for the Delphi Method, Technological Forecasting and Social Change 7 (1975), 221-222.
30. Jenness, A., The Role of Discussion in Changing Opinion Regarding a Matter of Fact, Journal of Abnormal and Social Psychology 27 (1932), 279-296.
31. Holloman, C. R. and Hendrick, H. W., Problem Solving in Different Sized Groups, Personnel Psychology 24 (1971), 489-500.
32. Jones, H. W., An Investigation of the Effects of Feedback on Variability and Central Tendency of Group Opinion while Employing the Delphi Technique, Ph.D. Dissertation, Oregon State University (1973).
33. Kelley, T. L., The Applicability of the Spearman-Brown Formula for the Measurement of Reliability, Journal of Educational Psychology 16 (1925), 300-303.
34. Kendall, J. W., Variations of Delphi, Technological Forecasting and Social Change 11 (1977), 75-85.
35. Knight, H. C., A Comparison of the Reliability of Group and Individual Judgements, Ph.D. Thesis, Columbia University (1921).
36. Lanford, H. W., Technological Forecasting Methodologies: A Synthesis, American Management Association (1972).
37. Lewis, A. C., Sadosky, T. L., and Connolly, T., The Effectiveness of Group Brainstorming in Engineering Problem Solving, IEEE Transactions on Engineering Management EM-22, 3 (Aug. 1975).
38. Linstone, H. A. and Turoff, M. (Eds.), The Delphi Method: Techniques and Applications, Addison-Wesley, Reading, Mass. (1975).
39. Miner, F. C., Jr., A Comparative Analysis of Three Diverse Group Decision Making Approaches, Academy of Management Journal 22, 1 (1979), 81-93.
40. Moskowitz, H. and Bajgier, S. M., Validity of the DeGroot Model for Achieving Consensus in Panel and Delphi Groups, Journal of Interdisciplinary Modeling & Simulation 2, 1 (1979), 67-100.
41. Munsterberg, H., Psychology and Social Sanity, Doubleday, New York (1914).
42. Murnighan, J. K., Group Decision Making: What Strategies Should You Use?, Management Review (Feb. 1981), 55-62.
43. Nasreen, E., An Application of a Modified Delphi Technique in Evaluating a Program for Displaced Homemakers, University of Maryland, Baltimore Professional Schools (1981).
44. Nelms, K. R., An Assessment of the Effects of Office Automation Technology on Clerical Employment in the Banking and Insurance Industries, 1985-2000, Georgia Institute of Technology, Atlanta, Ga. (1984).
45. Nelson, B. W., Manipulation of Delphi Statements: A Taxonomy for Research, Boeing Computer Services, Inc., Seattle, Wash.
46. Penfield, G. M., The Relative Efficacy of Varying Applications of Face-to-Face Interaction vs. Delphi in Developing Consensus about Relative Priority among Goals in Student Affairs, Ph.D. Dissertation, University of Cincinnati (1975).
47. Porter, A. L., Rossini, F. A., Carpenter, S. R., and Roper, A. T., A Guidebook for Technology Assessment and Impact Analysis, Elsevier-North Holland, New York (1980).
48. Preston, M. G., Note on the Reliability and the Validity of the Group Judgment, Journal of Experimental Psychology 22 (1938), 462-471.
49. Roessner, J. D. et al., Impact of Office Automation on Office Workers, Georgia Institute of Technology, Atlanta, Ga. (prepared for U.S. Department of Labor, Employment and Training Administration, Contract No. 21-13-82-13) (1984).
50. Rohrbaugh, J., Improving the Quality of Group Judgment: Social Judgment Analysis and the Delphi Technique, Organizational Behavior and Human Performance 24 (1979), 73-92.
51. Sackman, H., Delphi Assessment: Expert Opinion, Forecasting, and Group Process, The Rand Corporation, R-1283-PR (1974).
52. Scheele, D. S., Consumerism Comes to Delphi: Comments on "Delphi Assessment: Expert Opinion, Forecasting, and Group Process" by H. Sackman, Technological Forecasting and Social Change 7 (1975), 215-219.
53. Scheibe, M., Skutsch, M., and Schofer, J., Experiments in Delphi Methodology, in The Delphi Method: Techniques and Applications (H. A. Linstone and M. Turoff, Eds.), Addison-Wesley, Reading, Mass. (1975), 267-287.
54. Smith, B. B., The Validity and Reliability of Group Judgment, Journal of Experimental Psychology 29 (1941), 420-434.
55. Spinelli, T., The Delphi Decision-Making Process, The Journal of Psychology 113 (1983), 75-80.
56. Turoff, M., Delphi + Computers + Communications = ?, in Industrial Applications of Technological Forecasting (M. J. Cetron and C. A. Ralph, Eds.), Wiley, New York (1971).
57. Van de Ven, A. H. and Delbecq, A. L., The Effectiveness of Nominal, Delphi, and Interacting Group Decision Making Processes, Academy of Management Journal 17, 4 (1974), 605-621.
58. Waldron, J. S., An Investigation into the Relationships Among Conceptual Level, Time Delay of Information Feedback, and Performance in the Delphi Process, Ph.D. Dissertation, Syracuse University (1970).

Received 19 July 1984