Computers & Operations Research 32 (2005) 2635–2651
www.elsevier.com/locate/dsw
Employee turnover: a neural network solution

Randall S. Sexton a,∗, Shannon McMurtrey b, Joanna O. Michalopoulos c, Angela M. Smith c

a Computer Information Systems, Southwest Missouri State University, 901 South National, Springfield, MO 65804, USA
b Graduate School of Computer and Information Sciences, Nova Southeastern University, Fort Lauderdale, FL 33314, USA
c Business and Administration, Southwest Missouri State University, 901 South National, Springfield, MO 65804, USA

∗ Corresponding author. Tel.: +1-417-386-6453; fax: +1-417-836-6907.
E-mail addresses: [email protected] (R.S. Sexton), [email protected] (S. McMurtrey), [email protected] (J.O. Michalopoulos), [email protected] (A.M. Smith).
Available online 31 July 2004
Abstract

In today's working environment, a company's human resources are truly the only sustainable competitive advantage. Product innovations can be duplicated, but the synergy of a company's workforce cannot be replicated. It is for this reason that not only attracting talented employees but also retaining them is imperative for success. The study of employee turnover has attempted to explain why employees leave and how to prevent the drain of employee talent. This paper focuses on using a neural network (NN) to predict turnover. If turnover proves predictable, the identification of at-risk employees will allow us to focus on their specific needs or concerns in order to retain them in the workforce. By using a modified genetic algorithm to train the NN, we can also identify the relevant predictors or inputs, which can tell us how to improve the work environment as a whole. This research found that an NNSOA-trained NN in a 10-fold cross-validation experimental design can predict with a high degree of accuracy the turnover rate for a small Midwestern manufacturing company.
© 2004 Elsevier Ltd. All rights reserved.

Keywords: Artificial intelligence; Genetic algorithm; Neural networks; Parsimonious; Employee turnover
1. Introduction

To understand the nature of employee turnover, it is necessary to first define the terminology. While there are many definitions of employee turnover, for the purposes of this paper turnover is defined as
"the movement of workers in and out of employment with respect to a given company" [1]. This movement is usually considered voluntary; involuntary separations are also of concern, but they are not the focus of this research. According to "The Nature of Employee Turnover" [1], there are four distinct categories of turnover that a company must consider:

• Voluntary separations: termination of the employment relationship initiated by the employee.
• Layoffs: suspensions from payroll initiated by the employer due to an economic slowdown.
• Discharges: permanent termination of employment for disciplinary reasons.
• Other: retirement, death, and permanent disability.
Of these four categories, voluntary separations are the most problematic for companies because the employee controls the separation, and oftentimes the company's investment in the employee is lost to one of its competitors [1].

When calculating a company's turnover rate, it should first be determined which employee separations will be included in the calculation. Many times unavoidable separations, or separations that the company could not control, are not included in the rate. Unavoidable separations differ from voluntary separations, in which the company does play a role in retaining the employee. Examples of unavoidable separations include retirement, death, permanent disability, or a spouse changing jobs to a different community [2]. Since employee turnover generally focuses on the motivation of employees to maintain the employment relationship, these unavoidable terminations are not always factored into the employee turnover rate, and they are excluded from this study.

The Society for Human Resource Management Research Committee, author of "Employee Turnover: Analyzing Employee Movement Out of the Organization," states that "critical characteristics of employees (obtained by reviewing the distributions of their ages, employment times, salaries, and recruitment sources) can be used to describe those who leave in 6 months versus those who leave in 12, 24, or 36 months." While turnover rates vary according to industry, organization, geographic location, and employee characteristics, the rates of specific groups of employees can help the company determine root causes of employee turnover [3].

1.1. Impact of employee turnover

Employee turnover is a concern for any organization due to the major impact it has on the bottom line. However, turnover does not always bring negative consequences; there are positive aspects of turnover for both the organization and the exiting employee. Whether employee turnover impacts the organization in a positive or negative way depends on the type of turnover that is experienced, either functional or dysfunctional. Functional turnover occurs when poor performers leave and good performers stay; this often occurs when the organization terminates the employment relationship. When good performers leave and poor performers stay, the organization experiences dysfunctional turnover. When looking to reduce turnover, a company focuses on dysfunctional turnover because of its negative impact on the organization.

The most pressing and often overlooked impact of turnover is the loss of productivity experienced immediately after the loss of an employee. Service firms recognize that delivery of services and customer loyalty decline when employees leave, and that overall firm productivity decreases significantly due to the
lack of manpower to accomplish a constant or increasing workload. The morale of the remaining employees also declines, because many of them lose the friendships of the exiting employees. Turnover affects the delivery of service and the retention of customers, which are also dimensions of organizational productivity. Employee turnover "interrupts the transmission of service values and norms, which are essential underpinnings of high quality service to successive generations of employees" [2]. For example, most entry-level employees begin working in low-paying, dead-end jobs. These working conditions create poor job attitudes and high turnover, which eventually lead to poor customer service. Dissatisfied employees deliver poor service, and the loss of employees decreases the quality of service. As service declines, customers become dissatisfied, which further compounds the employees' frustration because the customers are complaining, and those customers are inclined to make fewer purchases. Understaffed offices cannot provide adequate customer service, and offices with new hires provide less than adequate service because a new hire knows less than the exiting employee did. Client retention is significantly reduced, and the loss of talented employees may easily endanger a firm's future opportunities in the marketplace [2].

1.2. Costs of employee turnover

Employee turnover costs US companies billions each year, not only in direct replacement costs but also in lost productivity. High turnover reduces productivity and drains corporate profits [4]. Steve Racz [5] states that the direct costs of employee turnover constitute only about 15–30% of the total costs associated with lost employees; the other 70–85% are hidden costs such as lost productivity and opportunity costs [5]. One example of the productivity gains from reduced turnover is evident in the profit margins of one large fast food company, which determined that stores with lower turnover rates could have profit margins more than 50% higher than those of stores with higher turnover rates [6]. This statistic is telling when one considers that the four highest-turnover industries (specialty retail, call center services, high tech, and fast food) spend more than $75 billion to replace more than 6.5 million employees annually. Given this staggering cost, it is clear that cutting turnover is imperative for organizational success.

The United States Department of Labor estimates that it costs one-third of a new hire's salary to fill a vacant position. If this estimate were applied to a position paying only $7 per hour, a company would spend over $4300 to replace a lost worker. The Department of Labor also estimates that the cost of replacing a manager, supervisor, or technical employee can range from 50% to 300% of the position's annual salary [6]. These costs include the following: advertising fees; headhunter fees; management's time to make decisions; HR's time spent recruiting, selecting, and training; overtime expenses from other employees needed to pick up slack; lost productivity; lost sales; decreased employee morale; decreased stock prices; and disgruntled customers [7]. While this list includes many direct and indirect costs, it is by no means a complete list of the costs associated with employee turnover.

1.3. Causes of employee turnover

Employee turnover has a large impact on the organization, not only in the form of direct monetary costs but also in lost productivity.
It is for these reasons that it is imperative for organizations to understand the causes of turnover and work to correct them.
By calculating turnover rates for various groups within an organization, a company can determine who is leaving and at what stage of employment. Studies have "consistently linked a number of variables to turnover. Those with the strongest empirical correlations are age, tenure, job content, and job satisfaction. Other factors have shown varying correlations including skill level, type of occupation, and education" [8]. Researchers have determined that age is one of the characteristics correlating most strongly with turnover [8]. Studies indicate that the highest rate of turnover usually occurs in the first few months of employment, and poor management only complicates this infant period of employment, often frustrating the new employee to the point of leaving the organization. It is therefore critical to provide proper guidance and assimilation to a new employee upon entering the workplace.
2. Neural networks

This research attempts to predict employee turnover, specifically voluntary separations that will occur within one year. To do this, we utilize a neural network (NN), which has been found to be a successful prediction tool for business problems as well as for many other fields, such as technology, medicine, agriculture, engineering, and education. A simple Internet search using ArticleFirst produced over 11,000 articles on NNs.

The NN used in this study incorporates the Neural Network Simultaneous Optimization Algorithm (NNSOA), a modified genetic algorithm, as its search technique. The genetic algorithm (GA) used as the base algorithm for the NNSOA has been shown, in comparisons with gradient search algorithms (variations of backpropagation), to outperform them on computer-generated problems as well as on several real-world classification problems [9–13]. Modifications to the GA improved the algorithm's ability to generalize to data on which it was not trained, and gave it the ability to distinguish relevant from irrelevant input variables (NNSOA). This has the advantage of giving the researcher or manager additional information about the problem itself. In our case, we will be able to see which of the inputs included in our data set actually help predict turnover. By doing so, we are one step closer to solving the turnover problem. The NNSOA was shown to outperform the GA on which it was based, as well as several backpropagation variations [14]. Also included in the NNSOA is the automatic determination of the optimal number of hidden nodes in the NN architecture. This feature alone saved us considerable time and effort compared with the trial-and-error techniques usually employed by other NN programs to find optimal architectures.

2.1. Generalization

By using a search algorithm that identifies unneeded weights in a solution, that solution can then be applied to out-of-sample data with confidence that additional error will not be introduced into the estimates. In current NN practice, every available input that could possibly contribute to the prediction is included in the model. While this method can result in fairly good models, it has some obvious limitations. During the training process, if connections (or weights) are not actually needed for prediction, the NN is required by its derivative nature to find nonzero weights that essentially zero each other out for the training data. However, once this solution is applied to data not seen in the training set (out-of-sample), the unneeded weights are unlikely to zero each other out and therefore add error to the estimate. This is a generalization problem. By using a search algorithm that is not based on derivatives, such as the NNSOA, we can allow weights in our model to be hard
zeros, and we can modify the objective function to add a penalty for every weight that is not a hard zero. In actuality, weights are never removed, only replaced with hard zeros, which in effect removes them from the solution. In doing so, when the solution is applied to any data, whether training or testing, these weights can have no net effect on the estimates.

With the NNSOA, weights can be added and removed automatically at each stage of the optimization process. As weights are added or eliminated during the optimization process, discontinuities are introduced into the objective function. This precludes using search algorithms that require a differentiable objective function and, in fact, precludes most standard hill-climbing algorithms. Previous studies have explored gradient techniques that allow some of the weights to decay to zero or that reduce the size of the network during training [15–19,28]. These methods were found to have limited usefulness. Another alternative is to remove active weights or hidden nodes and then evaluate the impact. This method of weight reduction is basically trial-and-error and requires the user to retrain after every modification to the network. The NNSOA, on the other hand, is based on the genetic algorithm, which does not require a differentiable objective function and can handle discontinuities such as a penalty value for each nonzero weight in the solution. The improvement of generalization has been the topic of much research [14,16–24,28].

2.2. Identification of relevant inputs

An additional benefit of being able to set unneeded weights to zero is the identification of relevant inputs in the NN model. After a solution has been found, these weights can be examined to determine whether any input variable has all of its weights set to zero. If a particular input has all of its input weights set to zero, we can conclude that this variable is irrelevant to the NN model, since it can have no effect on the estimate. This is not to say the input contains no relevant information; it just means that the NN, for this particular solution, found no value in this input for helping with the prediction. This could mean one of two things. First, the variable has no value in predicting the output. In this case, if several different runs were conducted (changing the random seed that initializes the network's starting points), it is likely that this input would be identified as irrelevant every time. The second case is less clear-cut: after several runs, the input may or may not be included in the final solution. In this case it is likely, and makes intuitive sense, that the information contained in this input is duplicated in other input variables. For example, let's say Inputs 1 and 2 contain some of the same information. In one NN run, Input 1 is eliminated from the model. However, in a second run Input 2 is eliminated. A third run might include both variables as relevant, capturing some of the relevant information from each. In either case, more information is gathered by this method, which gives the researcher or manager a better understanding of the problem. By determining the relevant inputs in the model, a manager can better understand the problem and will be better equipped to make decisions.

Section 3 describes the NNSOA. This is followed by the Monte Carlo study, results, and conclusions.
3. The neural network simultaneous optimization algorithm

The following is a simple outline of the NNSOA. The NNSOA is used only to search for the input weights. Prior research has found that using ordinary least squares (OLS) for determining the output
weights is more efficient and effective [10,20,24–26]. A formal description of the basic GA can be found in [27]. Unlike backpropagation (BP), which moves from one point to another based on gradient information, the NNSOA simultaneously searches in many directions, which enhances the probability of finding the global optimum. The following is an outline of the NNSOA used in this study.

3.1. The NNSOA outline

3.1.1. Initialization
A population of 12 solutions is created by drawing random real values from a uniform distribution [−1, 1] for the input weights. This happens only once during the training process. The output weights are determined by OLS.

3.1.2. Evaluation
Each member of the current population is evaluated by an objective function based on its sum-of-squared error (SSE) value, in order to assign each solution a probability of being redrawn in the next generation. In order to search for a parsimonious solution, a penalty value is added to the SSE for each nonzero weight (or active connection). The objective function used in this study is

$$\operatorname{Min} E = \sum_{i=1}^{N} (O_i - \hat{O}_i)^2 + C\sqrt{\frac{\sum_{i=1}^{N} (O_i - \hat{O}_i)^2}{N}}. \tag{1}$$
Here N is the number of observations in the data set, O_i the observed value of the dependent variable, Ô_i the NN estimate, and C the number of nonzero weights in the network. The penalty for keeping an additional weight varies during the search and is equal to the current value of the root mean squared error (RMSE). Based on this objective function, each of the 12 solutions in the population is evaluated. The probability of being drawn in the next generation is calculated by dividing the distance of the current solution's objective value from the worst objective value in the generation by the sum of all such distances in the current generation.

3.1.3. Reproduction
A mating pool is created by selecting solutions from the current population based on their assigned probabilities. This is repeated until the entire new generation, containing 12 solutions, is drawn. The new generation contains only solutions that were in the previous generation; the only difference is that some solutions (those with higher probabilities) may appear more than once, and the poorer solutions (those with lower probabilities) may not appear at all.

3.1.4. Crossover
Once reproduction has produced some combination of solutions from the previous generation, the 12 solutions are randomly paired, constructing 6 sets of parent solutions. A point is randomly selected for each pair, and the parent solutions switch the weights above that point, generating 12 new solutions, the next generation.
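To make the evaluation, reproduction, and crossover steps concrete, the following Python sketch implements them. The authors' implementation was written in FORTRAN; the function names, the `predict` callback, and the use of NumPy are our own illustrative choices.

```python
import numpy as np

def objective(weights, X, y, predict):
    # Eq. (1): SSE plus a penalty equal to the current RMSE
    # for each nonzero weight (active connection).
    residuals = y - predict(weights, X)
    sse = np.sum(residuals ** 2)
    rmse = np.sqrt(sse / len(y))
    return sse + np.count_nonzero(weights) * rmse

def selection_probabilities(errors):
    # Distance of each solution's objective value from the worst,
    # divided by the sum of all distances in the generation.
    errors = np.asarray(errors, dtype=float)
    distances = errors.max() - errors
    total = distances.sum()
    if total == 0.0:                       # degenerate case: all solutions equal
        return np.full(len(errors), 1.0 / len(errors))
    return distances / total

def reproduce_and_crossover(population, probs, rng):
    # Reproduction: draw a mating pool of 12 using the assigned probabilities.
    idx = rng.choice(len(population), size=len(population), p=probs)
    pool = [population[i].copy() for i in idx]
    # Crossover: pair the 12 solutions into 6 parent pairs and swap the
    # weights above a randomly selected point.
    rng.shuffle(pool)
    for a, b in zip(pool[0::2], pool[1::2]):
        point = rng.integers(1, a.size)
        a[point:], b[point:] = b[point:].copy(), a[point:].copy()
    return pool
```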
3.1.5. Mutation
For each weight in the population a random number is drawn; if the random value is less than .05, the weight is replaced by a randomly drawn value from the entire weight space. By doing this, the entire weight space is searched globally, enhancing the algorithm's ability to find the global solution, or at least the global valley.
3.1.6. Mutation2
For each weight in a generation a random number is drawn; if the random value is less than .05, a hard zero replaces the weight. By doing this, unneeded weights are identified as the search for the optimal solution continues. After this operator is performed, the new generation of 12 solutions begins again with evaluation, and the cycle continues until 70% of the maximum number of generations is reached.
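A minimal sketch of the two mutation operators, under the same assumptions as the sketch above; we read "the entire weight space" as the [−1, 1] initialization range, which is an assumption.

```python
def mutation(population, rng, rate=0.05, low=-1.0, high=1.0):
    # Mutation: each weight is replaced, with probability `rate`, by a
    # value drawn from the weight space, keeping the search global.
    for sol in population:
        mask = rng.random(sol.shape) < rate
        sol[mask] = rng.uniform(low, high, size=int(mask.sum()))

def mutation2(population, rng, rate=0.05):
    # Mutation2: each weight is replaced, with probability `rate`, by a
    # hard zero, flagging unneeded connections as the search continues.
    for sol in population:
        mask = rng.random(sol.shape) < rate
        sol[mask] = 0.0
```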
3.1.7. Convergence enhancement
Once 70% of the maximum number of generations has been completed, the best solution found so far replaces all the strings in the current generation. Each weight in the population of strings is then varied by a small random amount. These random amounts decrease to an arbitrarily small value as the number of generations approaches its set maximum.
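The convergence-enhancement step might look like the following sketch. The initial perturbation size and its decay schedule are not specified in the text, so the values below are assumptions.

```python
def convergence_enhancement(best, pop_size, gen, max_gen, rng, scale0=0.1):
    # Replace every string with the best solution so far, each weight
    # varied by a small random amount that shrinks toward zero as the
    # generation count approaches its maximum.
    scale = scale0 * (max_gen - gen) / max_gen   # assumed decay schedule
    return [best + rng.uniform(-scale, scale, size=best.shape)
            for _ in range(pop_size)]
```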
3.1.8. Termination
The algorithm terminates after a user-specified number of generations.
4. Hidden node search

The number of hidden nodes included in each NN is determined automatically in the following manner. Each NN begins with 1 hidden node and is trained for a user-defined number of generations, MAXHID. After every MAXHID generations, the best solution at that point is saved as the BEST solution and an additional hidden node is added to the NN architecture. The NN is reinitialized using a different random seed for drawing the initial weights and trained again for MAXHID generations. The BEST solution is also included in this new generation by replacing the first solution with its weights. Since an additional hidden node creates more weights than are found in the BEST solution, these extra weights are set to hard zeros. This way, we keep what was learned in previous generations. Upon completion of this training, the best solution for this architecture is compared with the BEST solution. If it is better, it becomes the BEST solution and is saved for future evaluation. This process continues until a hidden node addition finds no solution better than the BEST solution. Once this occurs, the BEST solution and its corresponding architecture are trained for an additional user-defined number of generations, MAXGEN, which completes the training process. Although two solutions may achieve the same value for the objective function, they may differ in their architecture. Dorsey et al. [20,24] demonstrated that the NN can have a variety of structures that reduce to the same equivalent structure.
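The hidden-node search can be summarized in pseudocode-like Python. Here `train` and `pad_with_zeros` are hypothetical helpers standing in for an NNSOA training run and for extending a saved solution with hard zeros for a new node's weights; `train(n_hidden, generations, seed, rng)` is assumed to return a (weights, error) pair.

```python
import numpy as np

def hidden_node_search(train, pad_with_zeros, maxhid, maxgen, rng):
    # Grow the network one hidden node at a time, carrying the BEST
    # solution forward, until an added node yields no improvement.
    best_w, best_err, best_n = None, np.inf, 0
    n_hidden = 0
    while True:
        n_hidden += 1
        # Seed the reinitialized population with BEST, padded with hard
        # zeros for the weights introduced by the new hidden node.
        seed = None if best_w is None else pad_with_zeros(best_w, n_hidden)
        w, err = train(n_hidden, maxhid, seed, rng)   # MAXHID generations
        if err < best_err:
            best_w, best_err, best_n = w, err, n_hidden
        else:
            break            # no improvement: stop adding hidden nodes
    # Train the BEST architecture for an additional MAXGEN generations.
    return train(best_n, maxgen, best_w, rng)
```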
5. Classification problem and experiment

The objective of this study is to determine the effectiveness of using an NNSOA-trained NN to predict employee turnover, specifically whether an employee will leave the organization in the current year. For this problem, data were collected from a small (fewer than 100 employees), family-owned, Midwestern stainless-steel fabrication company. The company is less than 30 years old and, after experiencing a large growth surge in the late 1990s and early 2000, is beginning to feel the effects of employee turnover. Data were used for all employees who were employed between 1992 and 2002 and who were either currently employed or had left the organization voluntarily. Employees who were terminated by the company were not included in the study.

Reviewing personnel files for the selected company, we collected a total of 29 inputs. The input categories chosen were age, marital status, starting salary, ending salary, gender, relation to the owners of the company, unemployment index, consumer price index, level of employee in the organization, shift worked, Fair Labor Standards Act status, race, tenure, distance to worksite, full-time status, and number of dependents. To capture the information for some categories we needed multiple inputs. For example, the level of employee in the organization had four classifications: Manager, Supervisor, Lead Employee, and Regular Employee. The output for each observation was set to 1 if the employee left that year and 0 if they stayed. These inputs are shown in Table 1.

Since we wanted to predict whether an employee was going to leave within a given year, we needed to collect yearly information. So, if an employee had been working at the business for 5 years, there would be 5 records for this employee, one for each year of service. This seems appropriate, since many of the attributes we collected changed frequently: an employee might get a raise or a promotion, move, change marital status, or change the number of dependents. The number of observations collected totaled 447. Out of the 447 observations, 412 were of employees staying in a specific year and 35 were of employees leaving of their own accord.

A ten-fold cross-validation was conducted in order to add rigor to our study. We made 10 training sets and 10 corresponding test sets out of the 447 observations. We did this by first randomizing the order of the observations and then taking off the last 44 observations and saving them as a test file; the remaining 403 observations were saved as a training file. To make the next training and test files, we took the 44 test observations from the previous data set and put them at the top of the previous training observations. We then took off the last 44 observations and saved them as the second test file, saving the remaining 403 observations as the second training file. We did this for 9 data sets; for the 10th data set we had to change the numbers in the training and testing sets, because the total number of observations was not divisible by ten and we wanted every observation to appear exactly once in a test set. The last training and testing sets included 396 observations for training and 51 for testing.

To add rigor to our study, we included several other analysis techniques using the exact same data sets for comparison against the NNSOA.
These included the original GA-trained NN, two commercial BP-based NNs, and discriminant analysis (DA). All five methods were compared using the overall classification error percentage as well as the Type I and Type II classification error percentages. The NNSOA and the GA NN were written in FORTRAN with a Visual Basic interface; the other three methods were commercial programs. All five methods were run on a 1.5 GHz machine running the Windows XP operating system.
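The rotating construction of the ten training/test splits described above can be expressed as a short sketch. This is a reconstruction of the procedure, not the authors' code; the random seed is arbitrary.

```python
import numpy as np

def rotating_ten_fold(data, fold_size=44, seed=0):
    # Shuffle once, then repeatedly peel the last `fold_size` rows off as a
    # test set and rotate them to the top for the next split.  The tenth
    # split absorbs the remainder (51 test rows for 447 observations) so
    # every observation appears in exactly one test set.
    rng = np.random.default_rng(seed)
    rows = rng.permutation(data)
    splits = []
    for _ in range(9):                      # splits 1-9: 403 train / 44 test
        train, test = rows[:-fold_size], rows[-fold_size:]
        splits.append((train, test))
        rows = np.vstack([test, train])     # move test rows to the top
    remainder = len(rows) - 9 * fold_size   # 447 - 396 = 51
    splits.append((rows[:-remainder], rows[-remainder:]))
    return splits
```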
Table 1
Input variables

Input  Abbreviation  Description
1      AGE           Age of employee
2      SING          Single employee
3      MARD          Married employee
4      BEG$          Starting salary
5      END$          Ending salary
6      MALE          Male employee
7      FMALE         Female employee
8      INFAM         In owning family
9      NFAM          Not in owning family
10     UI            Unemployment index
11     CPI           Consumer Price Index
12     MGR           Manager
13     SUPR          Supervisor
14     LEAD          Lead employee
15     EE            Regular employee
16     1SHIFT        Works first shift
17     2SHIFT        Works second shift
18     EXEMPT        Exempt employee as determined by the FLSA
19     NEXEMPT       Nonexempt employee as determined by the FLSA
20     WHT           White
21     BLK           Black
22     HSP           Hispanic
23     TNRJAN        Tenure of employee on January 1
24     TNRDEC        Tenure of employee on December 31
25     MILES         Distance employee must travel to worksite
26     FT            Full-time employee (40 hours)
27     PT            Part-time employee (less than 40 hours)
28     DEPDTJAN      Number of dependents on January 1
29     DEPDTDEC      Number of dependents on December 31
5.1. Training with the NNSOA

The NNSOA has only two user-defined parameters: MAXHID and MAXGEN, which were discussed earlier. For each of the ten training sets, the hidden node search was conducted using 100 generations (MAXHID = 100). After the appropriate structure was found, an additional 1000 generations were performed (MAXGEN = 1000).
5.2. Training with the original GA neural networks

This algorithm was trained using the same parameters as the NNSOA above, the only differences being that it did not use the Mutation2 operator described earlier and that it applied no penalty for nonzero weights in the solutions.
5.3. Training with commercial BP neural networks

To benchmark the results against BP-trained NNs, we included comparisons with two commercial versions of BP-based software: NeuralWorks® Professional II/Plus (NW) from NeuralWare and NeuroShell Classifier® (NS) from Ward Systems Group®, Inc. NW has an option called Cascade Correlation that automatically determines the number of hidden nodes used in the NN. We included 80 candidate hidden nodes in all 10 models and let the algorithm determine the optimal number for each data set. Since the NNSOA automatically determines the NN architecture, we used this option for this software, along with a form of BP called QuickProp to search for the weights. The NS software uses a form of BP called TurboProp 2. It also automatically determines the number of hidden nodes in the NN architecture, and we again included 80 candidate hidden nodes. Although both of these BP-based NNs included methods for building and determining the optimal architecture of the NN models from 1 to 80 hidden nodes, they had no methods for automatically eliminating unneeded weights beyond simple pruning of weights that were already close to zero.

5.4. Discriminant analysis

We also included discriminant analysis (DA), a standard classification method. SPSS® version 11 was used to run this analysis with the stepwise option, in order to eliminate unnecessary variables from the model.
6. Results

Once the 10 training sets finished training, we tested the solutions on their respective test sets. Since the number of employees staying in any given year far outnumbered the employees who left, we needed to set the cutoff point to take the skewed data into account. In the entire data set of 447 observations, there were 35 observations of employees leaving (coded as 1s). Dividing this number by the total number of observations gives the value .0783: any NN estimate below .0783 is classified as staying, and anything above .0783 is classified as leaving. Although this cutoff makes intuitive sense for the NNSOA and GA NN models because of the simple way they deal with skewed data, the other classification methods produced erratic results using this cutoff point, probably because of how each of these methods deals with unequal classification groups. To correct for this, we also used a cutoff of .5 and evaluated the outcomes for all the classification methods. The commercial BP software and the DA performed much better using this cutoff. To have a fair comparison between the methods, we used the best cutoff for each algorithm. It should be noted that, regardless of the cutoff used, the NNSOA outperformed the other four methods. To test for significance, we used the Wilcoxon matched-pairs ranks test on the estimates from the best solutions for each algorithm; the NNSOA estimates were significantly different from the GA, NW, NS, and DA estimates at the 99% level.

The classification error percentage was used in evaluating the performance of the NNSOA-trained NNs as well as in the comparisons with the original GA-trained NN, the two commercial BP-trained NNs, and DA.
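The cutoff rule and the error measures reported in Tables 2–4 reduce to a few lines; Type I and Type II errors are defined in the discussion below, and the function names here are our own.

```python
import numpy as np

CUTOFF = 35 / 447            # share of leavers in the data set, ~.0783

def classify(estimates, cutoff=CUTOFF):
    # Estimates above the cutoff are classified as leaving (1),
    # those below as staying (0).
    return (np.asarray(estimates) > cutoff).astype(int)

def error_percentages(y_true, y_pred):
    # Overall error, Type I (classified as leaving but actually stayed),
    # and Type II (classified as staying but actually left), in percent.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    overall = 100.0 * np.mean(y_true != y_pred)
    stayed, left = (y_true == 0), (y_true == 1)
    type1 = 100.0 * np.mean(y_pred[stayed] == 1)
    type2 = 100.0 * np.mean(y_pred[left] == 0)
    return overall, type1, type2
```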
Table 2
Overall classification error percentages for the 10 NN runs

RUN      NNSOA (%)  GA (%)  NS (%)  NW (%)  DA (%)
1        0.00       9.09    2.27    9.09    25.00
2        0.00       2.27    0.00    2.27    20.45
3        0.00       9.09    2.27    11.36   29.55
4        0.00       6.82    2.27    6.82    31.82
5        0.00       4.55    2.27    4.55    45.45
6        0.00       11.36   2.27    11.36   20.45
7        2.27       4.55    4.55    6.82    34.09
8        4.55       9.09    4.55    9.09    34.09
9        0.00       4.55    2.27    4.55    18.18
10       0.00       11.76   0.00    11.76   33.33
Average  0.68       7.31    2.27    7.77    29.24
Since the main objective of prediction is to produce estimates for observations that have not been seen during training or analysis, we report only the results for the 10 test sets for each classification method. Table 2 shows the overall classification error percentages for all 10 test sets for each method.

As can be seen in Table 2, the NNSOA found exceptionally good solutions for predicting turnover for this problem and was clearly the winner in producing estimation models that outperformed the other classification methods. In 8 out of 10 runs the NNSOA produced models that correctly classified all test observations, and for every run the NNSOA achieved or tied for the best error rate. The NeuroShell NN was the only other classification method that produced models equaling the performance of the NNSOA; this happened three times, for runs 2 and 10 at 0.00% error and run 8 at 4.55% error. The DA was by far the worst of the five methods.

We have demonstrated that the NNSOA performs well for overall classification, but we have yet to show that its models hold up for Type I and Type II error percentages. For this paper, a Type I error occurs when we misclassify an employee as leaving when they actually stay; a Type II error occurs when we misclassify an employee as staying when they actually leave. Since 92% of the cases are classified as staying, a naïve model could classify all employees as staying and be correct 92% of the time. Obviously this model would not help the employer identify employees at risk of leaving in the near future, but it is a baseline we can compare against. Table 3 shows the Type I error percentage comparisons for the 10 test runs.

Out of the five classification methods, the neural models performed exceptionally well for Type I error, with average errors over all 10 runs under 1%. NW was the overall winner, producing NN models that predicted with 100% accuracy for all 10 runs. The NNSOA and the GA models tied for second, and the NS was a close third. Since 92% of the employees stayed over the 10-year span of the data, these models could simply be predicting that everyone stays, which would account for the good Type I results. The Type II error percentages in Table 4 help us determine whether this was the case.

It is apparent from Table 4 that the NW models were predicting all employees as staying, thereby defaulting to the naïve model. The table also shows that the original GA NN, with the exception of just a few correct classifications, also predominantly predicted employees as staying. Since the GA NN did not have
Table 3
Type I classification error percentages for the 10 NN runs

RUN      NNSOA (%)  GA (%)  NS (%)  NW (%)  DA (%)
1        0.00       2.50    0.00    0.00    27.50
2        0.00       0.00    0.00    0.00    20.93
3        0.00       0.00    0.00    0.00    30.77
4        0.00       0.00    0.00    0.00    31.71
5        0.00       0.00    2.38    0.00    47.62
6        0.00       0.00    0.00    0.00    20.51
7        0.00       0.00    0.00    0.00    36.59
8        2.50       0.00    2.50    0.00    35.00
9        0.00       0.00    0.00    0.00    19.05
10       0.00       0.00    0.00    0.00    31.11
Average  0.25       0.25    0.49    0.00    30.08
Table 4
Type II classification error percentages for the 10 NN runs

RUN      NNSOA (%)  GA (%)   NS (%)  NW (%)   DA (%)
1        0.00       75.00    25.00   100.00   0.00
2        0.00       100.00   0.00    100.00   0.00
3        0.00       80.00    20.00   100.00   20.00
4        0.00       100.00   33.33   100.00   33.33
5        0.00       100.00   0.00    100.00   0.00
6        0.00       100.00   20.00   100.00   20.00
7        33.33      66.67    66.67   100.00   0.00
8        25.00      100.00   25.00   100.00   25.00
9        0.00       100.00   50.00   100.00   0.00
10       0.00       100.00   0.00    100.00   50.00
Average  5.83       92.17    24.00   100.00   14.83
the added advantage of eliminating unnecessary weights, it had the difficult task of finding values for these weights that essentially zeroed out their net effect on the estimate for the training data sets. Also, since the same number of generations was used as for the NNSOA, there may not have been enough time to search for solutions that could do this effectively.

The NNSOA correctly predicted 94% of the time (5.83% error) that employees would leave within a year. Coupled with its 99.75% accuracy (.25% error) for employees staying, the NNSOA clearly outperformed all the classification methods used in this comparison. The next best Type II error percentage came from the DA at 14.83%; however, combined with its 30.08% Type I error, its ability to discriminate becomes much less useful. The algorithm that came closest to the NNSOA was Ward Systems Group®'s NeuroShell® Classifier, which on average correctly classified employees as leaving 76% of the time (24% error).

What is interesting is the ability of the NNSOA to collapse the networks down to a parsimonious architecture while still outperforming the other methods. Table 5 shows, for each method, the number of hidden nodes found to be optimal for each model, and Table 6 shows the number of nonzero input weights.
Table 5
Optimal hidden nodes

RUN      NNSOA  GA   NS    NW
1        2      5    68    80
2        2      5    49    33
3        2      3    67    30
4        2      3    80    12
5        3      3    78    7
6        3      3    75    27
7        3      3    64    61
8        3      3    80    38
9        3      3    56    57
10       2      3    79    6
Average  2.5    3.4  69.6  35.1
Table 6
Optimal nonzero weights

RUN      NNSOA  GA   NS    NW
1        3      150  2040  2400
2        5      150  1470  990
3        3      90   2010  900
4        3      90   2400  360
5        5      90   2340  210
6        6      90   2250  810
7        4      90   1920  1830
8        6      90   2400  1140
9        4      90   1680  1710
10       3      90   2370  180
Average  4.2    102  2088  1053
Intuitively, since all the NNSOA networks reduced unneeded weights significantly, the chances of introducing additional error into the estimates were also reduced. This is possibly the reason why the NNSOA-trained NNs performed so well on test observations. On average, 94% of the input weights across the 10 runs were eliminated. The degree of reduction for each network indicates that there were likely several irrelevant variables in our data.

Next we want to identify exactly which of the 29 inputs were eliminated from our models and, more importantly, which input variables consistently stayed in, giving us our high predictability for turnover. By doing this, we can begin to understand what we need to do in order to get employees to stay and stop unnecessary turnover. Table 7 lists the input variables that were found to have at least one nonzero weight connecting to a hidden node, in order of how many times a specific input variable showed up with one or more nonzero weights in the final solution. If all the weights for a particular network and input were set to zero, we can infer that the network did not need this input for prediction.
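Reading irrelevance off a trained network is then a one-line test per input: an input whose every connection to the hidden layer is a hard zero cannot affect the estimate. A sketch, assuming the input-to-hidden weights are stored as an (n_inputs × n_hidden) array:

```python
import numpy as np

def irrelevant_inputs(input_weights, names):
    # An input is irrelevant to this solution when all of its weights
    # into the hidden layer are hard zeros.
    return [name for name, w in zip(names, np.asarray(input_weights))
            if not np.any(w)]
```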
Table 7
Frequency of found relevant variables for the 10 runs

Frequency  Relevant variables
10         TNRJAN, TNRDEC
5          END$
4          FT, PT
0          AGE, SING, MARD, BEG$, MALE, FMALE, INFAM, NFAM, UI, CPI, MGR, SUPR, LEAD, EE, 1SHIFT, 2SHIFT, EXEMPT, NEXEMPT, WHT, BLK, HSP, MILES, DEPDTJAN, DEPDTDEC
From this table it can be seen that the TENURE variables (TNRJAN and TNRDEC) were relevant in all 10 runs and that ENDING SALARY was relevant in 5 of the 10 runs. These predictive variables correspond with the turnover literature and make intuitive sense. The FULL TIME and PART TIME employee inputs were found to be relevant 4 out of 10 times. It should be noted that multicollinearity could affect which inputs are included in each network: if FULL TIME and/or PART TIME were picked for one network, it is possible that the same, or some of the same, information could be derived from ending salary.

A sensitivity analysis can give us more insight into how much a particular input changes the actual estimate. To do this, we constructed a test set that would indicate the degree and direction of change, in the following manner. Two artificial observations were constructed for each of the 29 inputs. We want to see how the estimate changes as a specific input goes from its smallest value to its largest value, so we put the input's minimum value across the entire data set in one observation and its maximum value in the other. The other 28 input values in both observations were set to their average values across the entire data set, so that the only thing that changed from the first observation to the second was the value of the input of interest. Once we run the two observations through the networks, we can look at the two estimates produced by the NN and see how much the estimate has changed and in what direction. Fig. 1 shows the changes in the estimates for all the variables that were found to be relevant in the 10 runs. As the values of these variables grow, the estimate decreases for ending salary, tenure (January and December work together in all networks, leaving TENURE on the graph as the net result), full-time employment, and part-time employment. These results all make sense; however, it should be noted that this analysis does not take interaction effects between variables into account and is just a guideline for identifying trends.
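The construction of the sensitivity-analysis test set can be sketched as follows; this is our reconstruction of the procedure just described.

```python
import numpy as np

def sensitivity_pairs(X):
    # For each of the 29 inputs, build two artificial observations: one
    # with that input at its data-set minimum and one at its maximum,
    # with every other input held at its mean.  Differencing the NN
    # estimates for each pair gives the net changes plotted in Fig. 1.
    mins, maxs, means = X.min(axis=0), X.max(axis=0), X.mean(axis=0)
    pairs = []
    for j in range(X.shape[1]):
        lo, hi = means.copy(), means.copy()
        lo[j], hi[j] = mins[j], maxs[j]
        pairs.append((lo, hi))
    return pairs
```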
7. Conclusions

The NNSOA is shown to perform exceedingly well at optimizing an NN while simultaneously eliminating unnecessary weights in the NN structure during training for our employee turnover problem. This results in a parsimonious NN architecture that performs well on testing data sets. By shrinking the structure of the NN, generalization is likely to improve, because the shortage of weights forces the algorithm to develop general rules to discriminate between the input patterns instead of memorizing special cases.
Fig. 1. Sensitivity analysis: net change on the estimate for each relevant variable (END$, TENURE, FT, PT); the plotted changes are all negative (axis range 0 to −0.5).
An added benefit of identifying unneeded weights in the solution is the identification of irrelevant variables in the NN model. This identification gives researchers additional information about the behavior of the problem. Since unnecessary weights are removed from the model, the NN is more efficient. Also, the NNSOA's hidden node search allows the researcher to simply plug in the data set and let the algorithm converge to a final solution, without the cumbersome trial-and-error methods of hidden node determination. Overall, the NNSOA performed well on the turnover problem and is easier to use than the majority of NN software available to researchers. However, the NeuroShell software by Ward Systems Group was found to be just as easy to use and produced the second best results in this study.

This study has limitations in two areas where future research is warranted. Additional studies should be conducted on larger firms in order to get a better grasp of the turnover problem. Although we predicted with good accuracy for this specific problem set, we need to continue collecting more data to see how well we can generalize to other companies, and even industries, as well as to include other inputs that could possibly improve our accuracy.

References

[1] The Nature of Employee Turnover. CCH-EXP, HRM-Personnel, 2002. 26 Oct. 2002. http://80-health.cch.com.
[2] Griffeth H. Employee turnover. Cincinnati: South-Western College; 1995.
[3] Society for Human Resource Management. Employee turnover: analyzing employee movement out of the organization. SHRM White Papers, June 1993. http://www.shrm.org.
[4] Leonard B. Turnover at the top. HR Magazine 2001;46:46.
[5] Racz S. Finding the right talent through sourcing and recruiting. Strategic Finance 2000;82:38.
[6] Galbreath R. Employee turnover hurts small and large company profitability. SHRM White Papers, 2000. http://www.shrm.org.
[7] Hampton L. Five ways to really irritate your employees. Business Credit 2001;103:20.
[8] Bai B, Ghiselli R, LaLopa J. Job satisfaction, life satisfaction and turnover intent. Cornell Hotel & Restaurant Administration Quarterly 2001;42:28.
[9] Gupta JND, Sexton RS, Tunc EA. Selecting scheduling heuristics using neural networks. INFORMS Journal on Computing 2000;12(2):150–62.
[10] Sexton RS, Dorsey RE, Johnson JD. Toward a global optimum for neural networks: a comparison of the genetic algorithm and backpropagation. Decision Support Systems 1998;22:171–85.
[11] Sexton RS, Dorsey RE. Reliable classification using neural networks: a genetic algorithm and backpropagation comparison. Decision Support Systems 2000;30:11–22.
[12] Sexton RS, Gupta JND. Comparative evaluation of genetic algorithm and backpropagation for training neural networks. Information Sciences 2000;129:45–59.
[13] Gupta JND, Sexton RS. Comparing backpropagation with a genetic algorithm for neural network training. OMEGA, The International Journal of Management Science 1999;27:679–84.
[14] Sexton RS, Dorsey RE, Sikander NA. Simultaneous optimization of neural network function and architecture algorithm. Decision Support Systems 2004;36(3):283–96.
[15] Baum EB, Haussler D. What size net gives valid generalization? Neural Computation 1989;1:151–60.
[16] Burkitt AN. Optimisation of the architecture of feed-forward neural nets with hidden layers by unit elimination. Complex Systems 1991;5:371–80.
[17] Fogel DB. A comparison of evolutionary programming and genetic algorithms on selected constrained optimization problems. Simulation 1995;64(6):399–406.
[18] Kamimura R. Internal representation with minimum entropy in recurrent neural networks: minimizing entropy through inhibitory connections. Network: Computation in Neural Systems 1993;4:423–40.
[19] Prechelt L. A study of experimental evaluations of neural network learning algorithms: current research practice. Technical report 19/94, Fakultät für Informatik, Universität Karlsruhe, D-76128 Karlsruhe, Germany, 1994. Anonymous FTP: /pub/papers/techreports/1994/1994-19.ps.Z on ftp.ira.uka.de.
[20] Dorsey RE, Johnson JD, Mayer WJ. A genetic algorithm for the training of feedforward neural networks. In: Johnson JD, Whinston AB, editors. Advances in artificial intelligence in economics, finance, and management, vol. 1. Greenwich, CT: JAI Press; 1994. pp. 93–111.
[21] Drucker H, LeCun Y. Improving generalisation performance using double backpropagation. IEEE Transactions on Neural Networks 1992;3:991–7.
[22] Karmin ED. A simple procedure for pruning back-propagation trained networks. IEEE Transactions on Neural Networks 1990;1:239–42.
[23] Kruschke JK. Distributed bottlenecks for improved generalization in back-propagation networks. International Journal of Neural Networks Research and Applications 1989;1:187–93.
[24] Dorsey RE, Johnson JD, Van Boening MV. The use of artificial neural networks for estimation of decision surfaces in first price sealed bid auctions. In: Cooper WW, Whinston AB, editors. New directions in computational economics. Netherlands: Kluwer Academic Publishers; 1994. pp. 19–40.
[25] Dorsey RE, Mayer WJ. Genetic algorithms for estimation problems with multiple optima, non-differentiability, and other irregular features. Journal of Business and Economic Statistics 1995;13(1):53–66.
[26] Sexton RS, Sriram RS, Etheridge H. Improving decision effectiveness of artificial neural networks: a modified genetic algorithm approach. Decision Sciences 2003;34(3):421–42.
[27] Dorsey RE, Johnson JD. Evolution of dynamic reconfigurable neural networks: energy surface optimality using genetic algorithms. In: Levine DS, Elsberry WR, editors. Optimality in biological and artificial networks. Hillsdale, NJ: Lawrence Erlbaum Associates; 1997. pp. 185–202.
[28] Chan WS, Tong H. On tests for non-linearity in time series analysis. Journal of Forecasting 1986;5:217–28.
Further reading

[1] Cottrell M, Girard B, Girard Y, Mangeas M. Time series and neural network: a statistical method for weight elimination. In: Verleysen M, editor. European Symposium on Artificial Neural Networks. Brussels: D facto; 1993. pp. 157–64.
[2] Fahlman SE, Lebiere C. The cascade-correlation learning architecture. Neural Information Processing Systems 1990;2:524–32.
[3] Lendaris GG, Harls IA. Improved generalization in ANNs via use of conceptual graphs: a character recognition task as an example case. In: Proceedings of the IJCNN-90. Piscataway, NJ: IEEE; 1990. pp. 551–6.
[4] Romaniuk SG. Pruning divide and conquer networks. Network: Computation in Neural Systems 1993;4:481–94.