Applied Mathematics and Computation 218 (2011) 3539–3552
Prediction of work-related accidents according to working conditions using support vector machines
A. Suárez Sánchez a, P. Riesgo Fernández a, F. Sánchez Lasheras b, F.J. de Cos Juez c, P.J. García Nieto d,⇑
a Department of Business Management, University of Oviedo, 33004 Oviedo, Spain
b Department of Construction and Manufacturing Engineering, University of Oviedo, 33204 Gijón, Spain
c Mining Exploitation and Prospecting Department, University of Oviedo, 33004 Oviedo, Spain
d Department of Mathematics, University of Oviedo, 33007 Oviedo, Spain
Keywords: Work-related accidents; Working conditions; Support vector machine; Regression analysis; Statistical learning
Abstract
Support vector machines (SVMs), a kind of statistical learning method, were successfully applied in this research work to predict occupational accidents. First, semi-parametric principal component analysis (SPPCA) was used to perform a dimensional reduction, but no satisfactory results were obtained. Next, a dimensional reduction was carried out with good results using a regression algorithm known as multivariate adaptive regression splines (MARS). The variables selected as important by the MARS model were taken as input variables for an SVM model. This SVM technique was able to classify, according to their working conditions, those workers who had suffered a work-related accident in the last 12 months and those who had not. The SVM technique does not over-fit the experimental data and performs better than back-propagation neural network models. Finally, the results and conclusions of this study are presented. © 2011 Elsevier Inc. All rights reserved.
1. Introduction
Workplace accidents and occupational injuries can mean pain and disability and can affect the worker's life, both in and out of work. They represent a considerable economic burden to employers, employees and society as a whole. The International Labor Organization [1] estimates that each year about 2.3 million men and women die from work-related accidents and diseases: nearly 360,000 fatal accidents and an estimated 1.95 million fatal work-related diseases. Approximately 337 million accidents occur on the job annually [1]. In economic terms, it is estimated that roughly 4% of the annual global Gross Domestic Product, or US$1.25 trillion, is siphoned off by direct and indirect costs of occupational accidents and diseases such as lost working time, workers' compensation, the interruption of production and medical expenses [1]. In the European Union, more than 5700 people die annually as a consequence of work-related accidents, according to EUROSTAT figures [2]. Every year 3.2% of workers in the EU-27 have an accident at work, which corresponds to almost 7 million workers. Accidents at work were estimated to cost 55 billion euros a year in the EU-15 in 2004 [3]. In Spain, more than 800 people die every year in work-related accidents [4]. On average, approximately 850,000 accidents at work are registered annually.
⇑ Corresponding author: P.J. García Nieto.
0096-3003/$ - see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.amc.2011.08.100
Whilst occupational safety is regulated under various national legislative schemes, analysis of workplace accidents is not always a priority. A potential contributory factor to the problem is the presence of a fatalistic belief about the inevitability of accidents, i.e., 'accidents will happen'. An unfortunate reaction to this position would be a relaxation of efforts to reduce the frequency of accidents [5]. Fontaneda and Manzanedo [6] demonstrated that an improvement in working conditions leads to a decrease in the probability of an accident occurring. Therefore, the development of an accident model identifying the main factors in accident occurrence can be a helpful tool both to dismiss this belief and to implement appropriate preventive measures. The aim of this study is the development of an SVM model capable of classifying workers correctly into two groups, those who have suffered an occupational accident in the last twelve months and those who have not, according to their responses to the Sixth National Survey on Working Conditions and without taking into account the item in which they report whether they have suffered an accident. The model obtained can therefore be considered a predictive model. A secondary objective is to determine the subset of variables on employment status and working conditions that is relevant to classifying workers. From this subset of variables, we can assess the risk profile of different professions, as well as identify the factors that should be controlled in order to prevent occupational accidents. In the present research work, semi-parametric principal component analysis (SPPCA) and the multivariate adaptive regression splines (MARS) method are used to identify those variables that are relevant for the construction of an SVM classification model. SVM models are generalized linear classifiers and can be considered an extension of the perceptron.
SVMs have a special property, in that they simultaneously minimize the empirical classification error and maximize the geometric margin.
2. Background
Diverse studies have been published relating specific factors to accident occurrence and accident proneness in different economic sectors. Visser et al. [7] carried out a review of the literature on accident proneness in specific contexts, including work. Although they could not give an overall prevalence rate of accident proneness due to the large variety in operationalizations, they were able to identify certain evidence of the existence of accident proneness. This is consistent with the results obtained by Gauchard et al. [8], who associated accident proneness with work status (short service in the job) and with personal physical and behavioral factors. On the other hand, Kirschenbaum et al. [9] suggest that personal factors do not affect accident proneness as directly as organizational factors and work relations. According to this work, being employed in a subcontracting status and the interaction of longer hours of work and level of wages are two important factors that have a major influence on accident proneness. The relevance of work relations is also highlighted by Fabiano et al. [10], who show that temporary workers suffer a higher accident frequency than in-house workers employed in the same activities. Moreover, overtime has been revealed to be another crucial organizational factor in accident occurrence in many industries [11–14]. This point is confirmed by Lindroos et al. [15], whose results show that the amount of time worked is the most important factor affecting accident rates. Finally, the link between work injuries and incentive wages is also mentioned by Sundstrom-Frisk [16]. Some additional studies highlight the importance of organizational and management aspects of work. For example, Stave and Törner [17] identify specific organizational preconditions that increase the risk of occupational accidents: insufficient learning and communication, conflicting goals and differences between procedure and practice.
Other authors [18,19] confirm the negative effect of inadequacies in training and poor working procedures, and emphasize the pertinence of some other conditions, such as insufficient or superficial assessment of risks. Besides this, Varonen [20] stresses the importance of safety climate, showing that the more safety-conscious the climate of the company, the lower the accident rate. Another group of studies focuses on the influence of work-specific hazards on accident occurrence. Picard et al. [21] verified the association between occupational noise exposure, noise-induced hearing loss and the risk of industrial accidents. The influence of noise exposure is also mentioned by Holcroft and Punnett [22], together with other occupational factors such as dangerous working methods and materials, hard physical demands (fast work pace, heavy lifting demands and postural stress) and psychosocial variables (low decision latitude and low social support at work). Very few works have tried to connect the diverse types of factors identified in the literature in a holistic model, and those that exist are oriented to specific sectors. Attwood et al. [23] conducted a comprehensive review of the diverse accident prediction models found in the literature. Starting from that point, Attwood et al. [5] developed a model to predict the frequency and associated costs of occupational accidents in the offshore oil and gas industry. This model was based on three groups of elements: external elements (mainly derived from the market), corporate elements (safety culture, safety training programmes and safety procedures) and individual elements (individual behavior, individual mental and physical capability). Breslin et al. [24] carried out a study examining the relative contribution of sociodemographic and work factors to work-related injury/illness, with an emphasis on job tenure as a predictor.
Moreover, Paul and Maiti [25] developed a structural model for mines showing that social support, work hazards and safety environment have a significant effect, either direct or indirect, on work injuries, safe work behavior and job dissatisfaction. Finally, in their study focused on the wood-processing industry, Holcroft and Punnett [22] showed that the main variables associated with injury risk were heavy physical workload, machine-paced work or inability to take a break when tired, lack of training, absence of a lockout program, low seniority and gender.
3. Material and methods
3.1. The survey
The Sixth National Survey on Working Conditions was published in 2007 by the Spanish National Institute of Safety and Hygiene in the Workplace, a subsidiary body of the Spanish Ministry of Labor and Immigration [4]. The main goal of the study was to assess working conditions and provide an overview of health and safety conditions in Spanish workplaces with the following specific objectives:
To discover those issues in the workplace that have an influence on workers' health and the extent to which workers are exposed to them.
To identify existing preventive structures and assess companies' preventive activities based on practical measures undertaken.
To establish trends in working conditions in the Spanish labor market.
The National Survey on Working Conditions examines such issues as: effects on health, risk prevention, workplace safety and environmental conditions, workplace design, physical and mental demands, and psychosocial factors. The population or universe consisted of 18,518,444 individuals and was composed of workers employed in all economic activities from the entire Spanish territory. Fieldwork was carried out between December 2006 and April 2007. Interviews were conducted in the homes of workers. A total of 11,054 workers responded to a questionnaire consisting of 78 variables, including classification data. The technique used was personal interview: the interviewer asked the questions and recorded the worker's answers on the questionnaire. The sampling procedure was multistage, stratified cluster sampling, with random selection of primary sampling units (municipalities) and secondary sampling units (sections), and selection of workers by random routes. For a confidence level of 95.5% and P = Q, the error for the overall sample is ±0.95%.
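As a rough check of the quoted sampling error, the figure can be reproduced with the usual formula for a proportion, e = k·sqrt(PQ/n), taking k = 2 for the 95.5% confidence level and P = Q = 0.5. This is only a sketch under those assumptions: it treats the sample as a simple random sample, ignoring the finite population correction and any design effect of the multistage cluster sampling, and the function name `sampling_error` is ours rather than part of the survey documentation.

```python
import math


def sampling_error(n: int, p: float = 0.5, k: float = 2.0) -> float:
    """Margin of error k * sqrt(p * (1 - p) / n) for a proportion.

    Assumes simple random sampling (no finite population correction,
    no design effect); k = 2 corresponds to a 95.5% confidence level.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return k * math.sqrt(p * (1.0 - p) / n)


# Sample size reported for the survey: 11,054 workers.
error = sampling_error(11054)
print(f"{100 * error:.2f}%")  # prints "0.95%"
```

Under these simplifying assumptions the result agrees with the ±0.95% stated for the overall sample.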
The 78 items in the questionnaire ranged from working conditions to damage to health, through questions on the preventive resources of the company, as well as personal characteristics. The model was restricted to the external parameters that defined the labor conditions and working environment, which were structured in the following groups:
Labor relation and type of contract: this section also includes items concerning subcontracting, part-time and temporary work.
Information from the company and work center: the items in this section focus on the sector and activity of the worker's company.
Type of work: this group of items includes type of occupation and seniority of the worker.
Thermal environment: this section studies the thermal conditions of the work.
Physical agents: the items in this group range from noise to vibration and electromagnetic radiation exposure.
Chemical and biological agents: this section focuses on exposure to chemicals and biological agents.
Safety hazards and conditions: the items in this section investigate the presence of specific workplace hazards such as falls, cuts, fire, electricity, etc.
Workplace design, physical demands and psychosocial factors: this section includes items on ergonomic factors, physical and mental workload, as well as psychosocial variables such as social support or autonomy.
Health and safety management and resources: this group of items focuses on health and safety services.
Working hours: the items in this section include working hours, time flexibility, overtime, etc.
Health and safety activities: this section focuses on the type and extent of health and safety activities carried out by the worker's company.
Violent behavior at work: the items of this section investigate the presence of violence and discrimination in the workplace.
Work-related accidents and damage to health: this section investigates the incidence of accidents and the occurrence of ill-health and its causes.
3.2. Principal components analysis (PCA)
Principal component analysis (PCA) is widely used for dimensionality reduction, with applications ranging from pattern recognition and time series prediction to visualization. PCA involves a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. PCA was developed by Pearson [26] and Hotelling [27]. Nowadays it is considered useful for performing exploratory data analysis and as a complementary tool for building predictive models. It must be remarked that the classic PCA algorithm is not based on a probability model [28,29]. A probabilistic formulation of PCA offers the following advantages:
it allows for statistical testing and the application of Bayesian inference methods, and it naturally accommodates missing values [28]. Probabilistic PCA (PPCA) [28] borrows from one such popular model, called factor analysis, to propose a probabilistic alternative to PCA. A key feature of this probabilistic model is that the latent distribution $P(\theta)$ is also assumed to be Gaussian, since this leads to simple and fast model estimation, i.e., the density of $x$ is approximated by a Gaussian distribution whose covariance matrix is aligned along a lower dimensional subspace. This may be a good approximation when data is drawn from a single population and the goal is to explain the data in terms of a few variables. The PCA algorithm used in the current article follows an alternative probabilistic formulation, called semi-parametric PCA (SPPCA) [29], where no assumptions are made about the distribution of the latent random variable $\theta$. Non-parametric latent distribution estimation allows us to approximate the data density better than previous schemes and hence gives better low dimensional representations. In particular, multi-modality of the high dimensional density is better preserved in the projected space. To make this method suitable for special data types, we allow the conditional distribution $P(x \mid \theta)$ to be any member of the exponential family of distributions. The use of exponential family distributions for $P(x \mid \theta)$ is common in statistics, where it is known as latent trait analysis, and they have also been used in several recently proposed dimensionality reduction schemes [30,31]. Lindsay's non-parametric maximum likelihood estimation theorem is used to reduce the estimation problem to one with a large enough discrete prior. It turns out that this choice gives us a prior which is 'conjugate' to all exponential family distributions, allowing us to give a unified algorithm for all data types.
This choice also makes it possible to efficiently estimate the model even in cases where different components of the data vector are of different types. It is assumed that $P(\theta)$ adequately models the dependencies among the components of $x$ and hence that the components of $x$ are independent when conditioned upon $\theta$, i.e., $P(x \mid \theta) = \prod_j P(x_j \mid \theta_j)$, where $x_j$ and $\theta_j$ are the $j$th components of $x$ and $\theta$. As has been described previously, using Gaussian means and constraining them to a lower dimensional subspace of the data space is equivalent to using Euclidean distance as a measure of similarity. This Gaussian model may not be appropriate for other data types; for instance, the Bernoulli distribution may be better for binary data and the Poisson distribution for integer data. These three distributions, along with several others, belong to a family of distributions known as the exponential family. Any member of this family can be written in the form:
$$\log P(x \mid \theta) = \log P_0(x) + x\theta - G(\theta) \tag{1}$$
where $\theta$ is called the natural parameter and $G(\theta)$ is a function that ensures that the probabilities sum to one. An important property of this family is that the mean $\mu$ of a distribution and its natural parameter $\theta$ are related through a monotone invertible nonlinear function $\mu = G'(\theta) = g(\theta)$. It can be shown that the negative log-likelihoods of exponential family distributions can be written as Bregman distances (ignoring constants), which are a family of generalized metrics associated with convex functions [32]. The use of different distributions for the various components of $x$ allows us to handle mixed data types.
3.3. Multivariate adaptive regression splines method (MARS)
Multivariate adaptive regression splines (MARS) is a multivariate nonparametric classification/regression technique introduced by Friedman [33]. Its main purpose is to predict the values of a continuous dependent variable, $\vec{y}$ ($n \times 1$), from a set of independent explanatory variables, $\vec{X}$ ($n \times p$). The MARS model can be represented as:
$$\vec{y} = f(\vec{X}) + \vec{e} \tag{2}$$
where $f$ is a weighted sum of basis functions that depend on $\vec{X}$, and $\vec{e}$ is an error vector of dimension ($n \times 1$). MARS can be considered as a generalization of 'classification and regression trees' (CART) [34] and is able to overcome some of the limitations of CART. As is well-known [35], MARS does not require any a priori assumptions about the underlying functional relationship between dependent and independent variables. Instead, this relation is uncovered from a set of coefficients and piecewise polynomials of degree $q$ (basis functions) that are entirely 'driven' from the regression data $(\vec{X}, \vec{y})$. The MARS regression model is constructed by fitting basis functions to distinct intervals of the independent variables. Generally, piecewise polynomials, also called splines, have pieces smoothly connected together. In MARS terminology, the joining points of the polynomials are called knots, nodes or breakdown points. These will be denoted by the small letter $t$. For a spline of degree $q$ each segment is a polynomial function. MARS uses two-sided truncated power functions as spline basis functions, described by the following equations [36]:
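$$[-(x - t)]_+^q = \begin{cases} (t - x)^q & \text{if } x < t \\ 0 & \text{otherwise} \end{cases} \tag{3}$$
$$[+(x - t)]_+^q = \begin{cases} (x - t)^q & \text{if } x \ge t \\ 0 & \text{otherwise} \end{cases} \tag{4}$$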
where $q$ ($\ge 0$) is the power to which the splines are raised and which determines the degree of smoothness of the resultant function estimate. When $q = 1$, which is the case in this study, only simple linear splines are considered.
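The pair of truncated power functions in Eqs. (3) and (4) can be illustrated with a short sketch. The function names `hinge_left` and `hinge_right` are hypothetical labels chosen here for illustration; they are not taken from any MARS implementation.

```python
def hinge_left(x: float, t: float, q: int = 1) -> float:
    """Eq. (3): (t - x)^q when x lies below the knot t, else 0."""
    return float((t - x) ** q) if x < t else 0.0


def hinge_right(x: float, t: float, q: int = 1) -> float:
    """Eq. (4): (x - t)^q when x is at or above the knot t, else 0."""
    return float((x - t) ** q) if x >= t else 0.0


# With q = 1 (linear splines, as in this study) and a knot at t = 3,
# exactly one member of the pair is non-zero away from the knot.
for x in (1.0, 3.0, 5.0):
    print(x, hinge_left(x, 3.0), hinge_right(x, 3.0))
```

Each candidate knot thus contributes a mirrored pair, one function active on each side of $t$, which is what allows the fitted model to change slope at the knot.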
The MARS model of a dependent variable $\vec{y}$ with $M$ basis functions (terms) can be written as [37]:
$$\hat{\vec{y}} = \hat{f}_M(\vec{x}) = c_0 + \sum_{m=1}^{M} c_m B_m(\vec{x}) \tag{5}$$
where $\hat{\vec{y}}$ is the dependent variable predicted by the MARS model, $c_0$ is a constant, $B_m(\vec{x})$ is the $m$th basis function, which may be a single spline basis function, and $c_m$ is the coefficient of the $m$th basis function. Both the variables to be introduced into the model and the knot positions for each individual variable have to be optimized. For a data set $\vec{X}$ containing $n$ objects and $p$ explanatory variables, there are $N = n \times p$ pairs of spline basis functions, given by Eqs. (3) and (4), with knot locations $x_{ij}$ ($i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, p$). A two-step procedure is followed to construct the final model. First, in order to select the consecutive pairs of basis functions of the model, a two-at-a-time forward stepwise procedure is implemented [36]. This forward stepwise selection of basis functions leads to a very complex and overfitted model. Such a model, although it fits the data well, has poor predictive abilities for new objects. To improve the prediction, the redundant basis functions are removed one at a time using a backward stepwise procedure. To determine which basis functions should be included in the model, MARS uses generalized cross-validation (GCV) [35]. The GCV is the mean squared residual error divided by a penalty dependent on the model complexity. The GCV criterion is defined in the following way:
$$\mathrm{GCV}(M) = \frac{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{f}_M(\vec{x}_i) \right)^2}{\left( 1 - C(M)/n \right)^2} \tag{6}$$
where C(M) is a complexity penalty that increases with the number of basis functions in the model and which is defined as:
$$C(M) = (M + 1) + dM \tag{7}$$
where $M$ is the number of basis functions in Eq. (5), and the parameter $d$ is a penalty for each basis function included in the model. It can also be regarded as a smoothing parameter. Large values of $d$ lead to fewer basis functions and therefore smoother function estimates. For more details about the selection of the $d$ parameter, see Ref. [33]. In our studies, the parameter $d$ equals 2, and the maximum interaction level of the spline basis functions is restricted to 3. The main steps of the MARS algorithm as applied here can be summarized as follows:
1. Select the maximal allowed complexity of the model and define the $d$ parameter.
Forward stepwise selection:
2. Start with the simplest model, i.e. with the constant coefficient only.
3. Explore the space of the basis functions for each explanatory variable.
4. Determine the pair of basis functions that minimizes the prediction error and include them in the model.
5. Go to step 3 until a model with the predetermined complexity is derived.
Backward stepwise selection:
6. Search the entire set of basis functions (excluding the constant) and delete from the model the one that contributes least to the overall goodness of fit according to the GCV criterion.
7. Repeat step 6 until GCV reaches its minimum.
The predetermined complexity of the MARS model in step 1 should be considerably larger than the optimal (minimal GCV) model size $M^*$, so choosing the predetermined complexity of the model as more than $2M^*$ is generally enough [36]. It is possible to analyze a MARS model using surface plots that visualize the interactions and effects between the basis functions. To illustrate this, some definitions will be introduced. Let $f_i(\vec{x}_i)$ be the set of all single-variable basis functions, i.e. basis functions that contain only $\vec{x}_i$.
Similarly, let $f_{ij}(\vec{x}_i, \vec{x}_j)$ be the set of all two-variable basis functions that contain the pairs of variables $\vec{x}_i$ and $\vec{x}_j$, and $f_{ijk}(\vec{x}_i, \vec{x}_j, \vec{x}_k)$ the set of all three-variable basis functions that contain the triplets of variables $\vec{x}_i$, $\vec{x}_j$ and $\vec{x}_k$. The MARS model can be rewritten in the following form:
$$\hat{f}(\vec{X}) = c_0 + \sum f_i(\vec{x}_i) + \sum f_{ij}(\vec{x}_i, \vec{x}_j) + \sum f_{ijk}(\vec{x}_i, \vec{x}_j, \vec{x}_k) \tag{8}$$
where the first sum is over all single-variable basis functions, the second sum is over all strictly two-variable basis functions, and the third sum represents all three-variable basis functions. Eq. (8) is called the ANOVA decomposition due to its similarity to the decomposition by ANOVA of experimental design [35]. The two-variable interaction of a MARS model, $If_{ij}(\vec{x}_i, \vec{x}_j)$, is given by:
$$If_{ij}(\vec{x}_i, \vec{x}_j) = f_i(\vec{x}_i) + f_j(\vec{x}_j) + f_{ij}(\vec{x}_i, \vec{x}_j) \tag{9}$$
Higher level interactions can be defined in a similar way. The graphical presentation of the ANOVA decomposition facilitates the interpretation of the MARS model. The effect of a one-variable basis function can be viewed by plotting $f_i(\vec{x}_i)$ against $\vec{x}_i$. A two-variable interaction can be viewed by plotting $If_{ij}(\vec{x}_i, \vec{x}_j)$ against $\vec{x}_i$ and $\vec{x}_j$ in a surface plot.
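To make the role of the GCV criterion in the backward pruning step more concrete, Eqs. (6) and (7) can be sketched as below. This is a minimal illustration, not the implementation used in this study: the function names `complexity_penalty` and `gcv` are hypothetical, the default $d = 2$ follows the value stated in the text, and the observations in the usage lines are made-up numbers.

```python
from typing import Sequence


def complexity_penalty(n_basis: int, d: float = 2.0) -> float:
    """C(M) = (M + 1) + d * M, as in Eq. (7)."""
    return (n_basis + 1) + d * n_basis


def gcv(y: Sequence[float], y_hat: Sequence[float],
        n_basis: int, d: float = 2.0) -> float:
    """GCV(M) of Eq. (6): mean squared residual over (1 - C(M)/n)^2."""
    n = len(y)
    if n == 0 or len(y_hat) != n:
        raise ValueError("y and y_hat must be non-empty and of equal length")
    c = complexity_penalty(n_basis, d)
    if c >= n:
        raise ValueError("C(M) must be smaller than the number of objects")
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / n
    return mse / (1.0 - c / n) ** 2


# Hypothetical data: ten objects whose predictions are all off by one unit.
y = [float(i) for i in range(1, 11)]
y_hat = [yi + 1.0 for yi in y]
print(gcv(y, y_hat, n_basis=1))  # residual error inflated by the penalty
print(gcv(y, y_hat, n_basis=2))  # same residuals, more terms, larger GCV
```

For equal residual error, the model with more basis functions receives the larger GCV, which is why pruning stops at the model size that minimizes this criterion.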
3.4. Prediction ability of the MARS model
The prediction ability of the MARS model can be evaluated in terms of the 'root mean squared error of cross-validation' (RMSECV) and the squared leave-one-out correlation coefficient ($q^2$). To compute RMSECV, one object is left out from the data set and the model is constructed for the remaining $n - 1$ objects. Then the model is used to predict the value for the object left out. When all objects have been left out once, RMSECV is given by [36,38]:
$$\mathrm{RMSECV} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}} \tag{10}$$
where $y_i$ is the value of the dependent variable of the $i$th object and $\hat{y}_i$ is the predicted value of the dependent variable of the $i$th object with the model built without the $i$th object. The value of $q^2$ is given as:
$$q^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{11}$$
where $\bar{y}$ is the mean value of the dependent variable for all $n$ objects.
3.5. The importance of the variables in the MARS model
Once the MARS model is constructed, it is possible to evaluate the importance of the explanatory variables used to construct the basis functions. Since each explanatory variable can be incorporated into different basis functions, the importance of the variable is expressed as its contribution to the goodness of fit of the model. The scoring of the importance of variables in the MARS model is similar to the leave-one-out cross-validation concept. To calculate variable importance scores, MARS refits the model after deleting all terms involving the variable at issue and calculates the reduction in goodness of fit. The importance of the variables is a relative measure, scaled between 0 and 1. The most important variable is the one that, when dropped, decreases the model fit the most; it receives the highest score, i.e. 1. The less important variables receive lower scores, given by the ratio of the reduction in goodness of fit for these variables to that of the most important variable.
3.6. Support vector machines (SVM)
The SVM is a learning method with a theoretical root in statistical learning theory [39,40]. The SVM was originally developed for classification, and was later generalized to solve regression problems [41]. The model produced by support vector classification only depends on a subset of the training data, because the cost function for building the model does not take into account training points that lie beyond the margin. Analogously, the model produced by support vector regression only depends on a subset of the training data, because the cost function for building the model ignores any training data that are close (within a threshold ε) to the model prediction. In our research, SVM is employed as a classification method. The basic idea of SVM for pattern classification is briefly described here.
First, the input vectors are mapped into a feature space of higher dimension, linearly or non-linearly depending on the selection of the kernel function. Next, an optimized linear division is sought, that is to say, a hyperplane is built which separates the two classes. This method can also be extended to multi-class problems. SVM training always seeks a global optimal solution and avoids over-fitting, which allows it to deal with a large number of features [39,42]. Let us suppose that a set of samples is given, that is to say, a series of training input vectors $x_i \in \mathbb{R}^d$ for $i = 1, 2, \ldots, N$, which are independent and identically distributed data with sample size $N$, with corresponding training output labels $y_i \in \{+1, -1\}$ for $i = 1, 2, \ldots, N$. The output values $+1$ and $-1$ denote the two classes. The $\nu$-SVM maps $x_i \in \mathbb{R}^d$ into a feature space $\mathbb{R}^m$ of dimension $m > d$ through a function $\Phi(x_i)$, which linearizes the non-linear relationship between $x_i$ and $y_i$. The estimated value of the output variable $y_i$ is given by [41,43,44]:
$$\hat{y}_i = a^T \Phi(x_i) + b \tag{12}$$
where $a \in \mathbb{R}^m$ and $b \in \mathbb{R}$ are coefficients. These coefficients are obtained by solving the following optimization problem [41,43,44]:
$$\min \; T(a, \varepsilon, \eta_i, \tilde{\eta}_i) = \frac{1}{2} a^T a + C \left\{ \frac{1}{N} \sum_{i=1}^{N} (\eta_i + \tilde{\eta}_i) + \nu \varepsilon \right\} \tag{13}$$
with the constraints:
$$a^T \Phi(x_i) + b - y_i \le \varepsilon + \eta_i, \quad \forall i = 1, 2, \ldots, N \tag{14}$$
$$y_i - a^T \Phi(x_i) - b \le \varepsilon + \tilde{\eta}_i, \quad \forall i = 1, 2, \ldots, N \tag{15}$$
where:
$\eta_i$ and $\tilde{\eta}_i$ are the slack variables, with $\eta_i, \tilde{\eta}_i \ge 0 \;\; \forall i = 1, 2, \ldots, N$.
$C$ is the regularization parameter.
$\varepsilon$ is the allowable error for each $x_i$, with $\varepsilon \ge 0$.
$\nu$ is a second parameter.
The slack variables capture errors above $\varepsilon$ and are penalized in the objective function through the constant $C$ [41]. The estimation function $\hat{y}_i$ becomes [43]:
$$\hat{y}_i = F(x) = \sum_{i=1}^{N} (\tilde{\beta}_i - \beta_i) \, \Phi(x_i)^T \Phi(x) + b = \sum_{i=1}^{N} (\tilde{\beta}_i - \beta_i) \, K(x_i, x) + b \tag{16}$$
where:
$\beta_i$ and $\tilde{\beta}_i$ are the Lagrange multipliers for the constraints given by Eqs. (14) and (15).
$K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$ is the kernel function.
In this research work, a radial basis kernel function is used, which is defined as follows:
$$K(x_i, x_j) = \exp\{ -\gamma \| x_i - x_j \|^2 \} \tag{17}$$
where $\gamma$ is another parameter. Thus, taking the radial basis function as the kernel function, the $\nu$-SVM has three parameters to be determined: $C$, $\nu$ and $\gamma$. The resulting quadratic programming problem always has a global optimal solution for $a$ and $b$ once the three previous parameters ($C$, $\nu$, $\gamma$) are fixed [43,44]. In summary, in order to use an SVM to solve a classification problem on data that is not linearly separable, we first need to choose a kernel and relevant parameters which we expect might map the non-linearly separable data into a feature space where it is linearly separable. This is more of an art than an exact science and can be achieved empirically, e.g. by trial and error. Sensible kernels to start with are the radial basis, polynomial and sigmoidal kernels. Therefore, the first step consists of choosing our kernel and hence the mapping $x \mapsto \Phi(x)$. In this work, a radial basis kernel function was used with success.
3.7. Statistical measures of the performance of a classification model
In a scenario where people are checked for a work-related accident, the test outcome can be positive (accident) or negative (no accident), while the actual work-related accident status of the person may be different. Sensitivity and specificity are statistical measures of the performance of a binary classification test. Sensitivity measures the proportion of actual positives that are correctly identified, and specificity measures the proportion of actual negatives that are correctly identified, where tp and tn denote the numbers of true positives and true negatives:
$$\text{Sensitivity} = \frac{tp}{tp + fn} \qquad (18)$$

$$\text{Specificity} = \frac{tn}{fp + tn} \qquad (19)$$
where fp denotes false positives and fn false negatives. Supervised Machine Learning (SML) has several ways of evaluating the performance of learning algorithms and the classifiers they produce. Measures of the quality of classification are built from a confusion matrix, which records correctly and incorrectly recognized examples for each class. Table 1 presents a confusion matrix for binary classification where, as explained above, tp, fp, fn and tn are the true positive, false positive, false negative and true negative counts, respectively. Accuracy assesses the overall effectiveness of the algorithm. It is given by:
$$\text{Accuracy} = \frac{tp + tn}{tp + fp + fn + tn} \qquad (20)$$
Table 1
Confusion matrix for binary classification.

Class/recognized    As positive    As negative
Positive            tp             fn
Negative            fp             tn
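As a worked illustration of Eqs. (18)–(20), the following minimal Python sketch computes the three measures from the four counts of a confusion matrix laid out as in Table 1. The function names are illustrative and are not taken from the study. For concreteness, the counts used are the mean validation values reported in Table 6, read under the assumption that Table 6 follows the same row/column layout as Table 1.

```python
def sensitivity(tp, fn):
    """Eq. (18): share of actual positives recognized as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Eq. (19): share of actual negatives recognized as negative."""
    return tn / (fp + tn)


def accuracy(tp, fp, fn, tn):
    """Eq. (20): share of all cases that are recognized correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Counts arranged as in Table 1 (rows: actual class, columns: recognized class).
# Values are the mean validation counts of Table 6, ASSUMING the same layout.
tp, fn = 267, 7
fp, tn = 0, 2747

print(f"sensitivity = {sensitivity(tp, fn):.4f}")
print(f"specificity = {specificity(tn, fp):.4f}")
print(f"accuracy    = {accuracy(tp, fp, fn, tn):.4f}")
```

Under that reading of Table 6, the three values round to 97.45%, 100% and 99.77%, which agree with the sensitivity, specificity and accuracy quoted in Section 4.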
4. Results

As explained in the materials and methods section, the survey blocks initially considered in order to determine whether or not a worker has suffered an accident in the last year comprised 78 variables. Before using any machine learning technique, it is frequently necessary to apply a dimensional reduction technique that removes the variables that are redundant for the model to be built, leaving only the minimum number of variables necessary to represent the data variability. In the present research, the semi-parametric Probabilistic Principal Components Analysis (SPPCA) method was applied before building the predictive model. The results obtained by the SPPCA were not satisfactory, as no subset of variables could be found that was sufficiently representative of the data variability. Fig. 1 represents the scree plot of the eigenvalues as a function of the number of components.

Owing to the high number of variables needed to represent the data variability, as shown by the SPPCA method, it was decided to use a MARS model in order to find out which of the input variables were most relevant in relation to the output variable. According to the results of the MARS model, there are only 18 relevant variables for determining whether any of the polled workers have suffered an accident in the last year. Table 2 shows the coefficients of the terms of the MARS model arranged by their magnitude in absolute value (variables are represented by the names officially assigned by the Spanish National Institute of Safety and Hygiene in the Workplace). The meaning of each of the variables is explained in Table 3, while Table 4 shows their importance. Please note that although the names of the variables seem complicated, they are the official codes used by the Sixth National Survey on Working Conditions.
The generalized cross-validation (GCV) value of the MARS model obtained is 0.08529969, while the raw residual sum-of-squares (RSS) is 930.3035. Fig. 2 shows some of the splines that are part of the MARS model. The six variables represented in this figure have been chosen from among the 18 identified in the MARS model according to their relevance in terms of occupational health and safety: the labor relation with the company highlights the importance of organizational factors, the type of activity and occupation define the kind of tasks carried out, and the type of hazards present at work and the physical demands of the job describe the working conditions. This figure not only shows some of the relevant variables but also allows the reader to see which answer values have more influence on accident occurrence. Please note that the x-axis shows the range of values taken by each of the explanatory variables (P2, P10, P10_ABI, etc.), while the y-axis shows how much these variables affect the output variable (having suffered an occupational accident in the last twelve months). Thus, for example, in the case of variable P2 (Employment status), it may be observed that the maximum number of injuries is reported by employees, while the minimum is reported by the self-employed. Following this reasoning, in the case of variable P10 (Major occupational group) the category "Farmers, ranchers, fishermen and sailors" is the one that reports the lowest number of accidents, while in variable P10_ABI (Occupation-detail) the workers who report the most accidents are "Fixed machine operators: oven, press, saw, etc." As for variable P27_1 (Main accident hazards in the workplace), safety hazards such as falls, cuts and entrapment are the most frequent in the working environment of injured workers.
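As a reading aid for Table 2, each term h(·) of a MARS model is a hinge function, h(z) = max(0, z) [33]. The following minimal Python sketch evaluates an additive model built from such terms. The intercept, the coefficients and the example answers are hypothetical placeholders chosen for illustration only; they are not the fitted values of Table 2, and the helper names (hinge, mars_predict) are likewise assumptions.

```python
def hinge(z):
    """Hinge function used by MARS terms: h(z) = max(0, z)."""
    return max(0.0, z)


def mars_predict(intercept, terms, answers):
    """Evaluate an additive MARS model for one respondent.

    Each term is a tuple (coefficient, variable, knot, direction):
      direction = +1 encodes h(variable - knot)
      direction = -1 encodes h(knot - variable)
    """
    total = intercept
    for coefficient, variable, knot, direction in terms:
        total += coefficient * hinge(direction * (answers[variable] - knot))
    return total


# Hypothetical model with two mirrored terms on one survey variable.
# The numbers are placeholders, NOT the coefficients reported in Table 2.
example_terms = [
    (0.02, "P30_10", 4, +1),  # h(P30_10 - 4)
    (0.03, "P30_10", 4, -1),  # h(4 - P30_10)
]

for answer in (2, 4, 5):
    value = mars_predict(0.5, example_terms, {"P30_10": answer})
    print(answer, round(value, 3))
```

Because the two mirrored terms share the same knot, their joint contribution is piecewise linear with a change of slope at the knot, which is how curves such as those plotted in Fig. 2 can arise.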
Fig. 1. Scree plot graph of the eigenvalues versus the number of components.

Table 2
Coefficients of the variables of the MARS model (arranged by importance in absolute value).

Function          Coefficients
Intercept         5.64 × 10^-1
h(2-P32_3)        4.55 × 10^-2
h(P45-4)          4.31 × 10^-2
h(2-P6)           3.96 × 10^-2
h(P2-3)           3.90 × 10^-2
h(2-P36_7)        3.62 × 10^-2
h(2-P49)          3.48 × 10^-2
h(2-P43)          3.13 × 10^-2
h(P45-2)          2.83 × 10^-2
h(4-P30_4)        2.50 × 10^-2
h(3-P2)           1.98 × 10^-2
h(P30_10-4)       1.89 × 10^-2
h(8-P10)          1.66 × 10^-2
h(P34_2-5)        1.16 × 10^-2
h(4-P30_10)       1.10 × 10^-2
h(5-P30_12)       1.09 × 10^-2
h(P34_11-2)       8.29 × 10^-3
h(5-P34_2)        7.73 × 10^-3
h(7-P13)          7.33 × 10^-3
h(9-P27_5)        7.21 × 10^-3
h(P49-2)          5.18 × 10^-3
h(P32_3-2)        5.09 × 10^-3
h(P10_ABI-97)     2.38 × 10^-3
h(P10-8)          2.02 × 10^-3
h(97-P10_ABI)     1.59 × 10^-3
h(P27_1-16)       4.80 × 10^-4
h(98-P31_1)       2.77 × 10^-4

Table 3
Name of the variables considered important for the MARS model.

P2         Employment status
P6         Employee of the main company or subcontract
P10_ABI    Occupation-detail
P10        Major occupational group
P13        Seniority in the company
P27_1      Main accident hazards I
P27_5      Main accident hazards II
P30_4      Physical demands/Workplace design: important physical effort
P30_10     Physical demands/Workplace design: uncomfortable seat
P30_12     Physical demands/Workplace design: unstable or uneven work surface
P31_1      Body part discomfort due to physical strain
P32_3      Work demands: strict deadlines
P34_2      Psychosocial possibilities: can get help from boss if asked
P34_11     Psychosocial possibilities: considers job emotionally demanding
P36_7      Work rhythm determining factors: traffic
P43        Extended working day
P45        Fit between working hours and family or social commitments
P49        Recent workplace risk assessment

After the MARS model was applied, support vector machines (SVM) were trained using as the input variables those selected by the MARS model and the variable accident occurrence as output (0 no accident; 1 accident) of the model. The details of the topology selected for the SVM model are listed in Table 5. The SVM model has been applied over the data base five times using different subsets each time for training and validation. The selection of data was randomly performed, each
time 8000 cases (72.60% of the total) were employed for training and 3021 for validation (27.40%). In order to obtain the parameters of the topology listed in Table 5, several configurations were tested with different kernel types (radial basis function, polynomial and hyperbolic tangent) and parameter values. These tests were performed following the methodology proposed by Sánchez Lasheras et al. [45].

Table 4
Importance of the explanatory variables in the MARS model.

Variable    Nsubsets    GCV        RSS
P30_4       28          100.000    100.000
P27_1       25          40.458     44.905
P27_5       24          29.874     34.965
P32_3       23          23.736     29.063
P49         22          19.836     25.193
P45         20          16.037     21.083
P43         19          13.552     18.499
P13         18          11.606     16.406
P2          17          10.446     15.024
P36_7       16          8.991      13.376
P30_10      15          7.806      11.975
P31_1       14          6.817      10.750
P30_12      12          4.735      8.209
P6          10          3.023      6.005
P10_ABI     9           2.324      5.046
P34_11      8           1.726      4.178
P34_2       7           1.057      3.247
P10         5           0.525      2.111

Fig. 2. Some of the important variables according to the MARS model (0 no accident; 1 accident): (a) labor relations with company; (b) major occupational group; (c) occupation (detail); (d) main accident risk (I); (e) main accident risk (II); (f) physical demand/workplace design: important physical effort.

Table 5
Parameters of the SVM model.

Parameter                          Value
Type                               ν-classification
Kernel                             Radial
ν                                  0.05
Tolerance of stopping criterion    0.001
ε                                  0.1

Table 6
Confusion matrix: mean values of the validation results of five different SVM models trained.

Class/recognized          As positive    As negative
Positive (accident)       267            7
Negative (no accident)    0              2747

Fig. 3. ROC curve corresponding to one of the five SVM models trained.

Table 6 presents a confusion matrix showing the mean values obtained from the validation of the five SVM models trained. As in Table 1, the rows represent the real category of each case, while the columns reflect the predicted category. For example, the average number of people that did not suffer a work-related accident and were detected
by the model was 2747. The ROC curve of one of the five SVM models trained can be observed in Fig. 3. Please note that the average area under the ROC curve of the five models trained is 0.98725. Finally, according to the information contained in Table 6, the specificity of the model is 100%: it correctly identifies all those workers who have not suffered an accident. The model sensitivity is 97.45%: it detects about 97.45% of all those workers who have suffered a work-related accident in the last 12 months. The accuracy achieved was 99.77%. All these performance measurements refer to the validation data set.

5. Discussion and conclusions

In the present work a classification model of workers by means of SVM was developed. This model presents high rates of specificity and sensitivity; therefore, it is able to classify correctly most of the workers who answered the survey. The variables considered of interest for the model are those listed in Table 3. According to the model (variable P2, employment status), the highest number of accidents is reported by employees covered by social security insurance, whereas the lowest is reported by the self-employed. This is consistent with the results obtained by Lindroos and Burström [46], who suggest that the legal requirement to report occupational accidents to the authorities is probably frequently violated by the self-employed, deliberately or unconsciously, because if the injury does not result in financial compensation, there is no incentive to report it to the social security system. The importance of variable P6 (Employee of the main company or subcontract), and specifically the higher accident ratio among subcontracted workers, has already been mentioned by Kirschenbaum et al. [9] and by Fabiano et al. [10].
As for workplace hazards (P27_1 and P27_5), falls (people or materials), cuts, entrapment, etc., as well as flying particles, are the most frequently mentioned by injured workers. All of them are important factors recorded by Camino López et al. [47] as accident causes in their study, which was focused on the construction industry, and can be extrapolated to virtually all industrial sectors. The developed model demonstrates that high physical demands (variable P30_4) can predict accident potential, as this variable was found to be the most important in the model (see Table 4). This point has also been proven by Holcroft and Punnett [22]. The importance of working to strict deadlines (P32_3) and overtime (P43) revealed by the model has also been suggested by various authors [11–14]. The model also shows that workers whose work rhythm depends on traffic (P36_7) tend to suffer occupational accidents more frequently. The database does not provide information on the type of occupational accident suffered by the subjects, but it is necessary to take into account that traffic accidents during work and during the commute to and from work (which are considered occupational accidents) account for up to approximately 25% of all occupational accidents in Spain. The connection between traffic and occupational accidents has been studied in various recent works [48–51].

A surprising result of the study is the fact that those workers whose job risks have not been assessed in the last year are the ones who report the lowest accident frequency. This result contradicts the findings made by Antao et al. [18] and Jacinto et al. [19]. It is necessary to take into account that the assessment and the control of occupational risks are legal duties of employers, but once a risk assessment has been carried out it does not need to be updated unless there are changes in the tasks. A possible explanation for the apparent contradiction could be that risk assessments are refreshed more frequently in rapidly changing and risky jobs. Thus, those workers whose risks are assessed more frequently are probably also exposed to riskier working conditions.
From our point of view, the presence of variables P30_10 (Physical demand/Workplace design: uncomfortable seat) and P31_1 (Body part discomfort due to physical strain) among those considered of interest by the model was not initially expected. According to the model, those workers who have a very uncomfortable seat are more frequently victims of an occupational accident, as are those who report neck or upper back discomfort due to physical strain. This point can be explained by the connection between physical discomfort, fatigue, risk and, finally, accident occurrence, as has been shown in different contexts by various authors [52–58].

Another group of interesting variables are those related to psychosocial or stress factors: P34_2 (Psychosocial possibilities: can get help from boss if asked), P34_11 (Psychosocial possibilities: considers job emotionally demanding) and P45 (Fit between working hours and family or social commitments). The connection of safety to social relations has been proven by Luria [59], who demonstrates that relationships of trust between leaders and subordinates are related to safety outcomes. Similar conclusions are obtained by Kath et al. [60]. The demands of the job (including emotional demands) have been negatively connected to occupational injuries by Kirschenbaum et al. [9], Swaen et al. [61] and Anderson Snyder et al. [62]. Finally, Martín-Fernández et al. [63] developed a study that suggested the influence of work-family strain on accident occurrence among women. In general, all these variables can be connected to job dissatisfaction, whose link to accident occurrence has been strongly supported in the empirical literature [64]. Job dissatisfaction will affect occupational injuries directly, as well as through its immediate consequences, such as distractions, lower safety motivation, etc.
There are some other variables included in the Spanish Sixth National Survey on Working Conditions that were initially expected to be relevant, such as the type of contract, the size of the company (measured by number of workers), a noisy work environment, job training and safety training. For example, Benavides et al. [65] and Fabiano et al. [10] detected a higher accident frequency among temporary workers. As for company size, diverse studies have shown that the size of the company maintains an inverse relation with accident frequency rates in all productive sectors [47,66–68]. On the other hand, the association between occupational noise exposure and the risk of industrial accidents has been suggested by Picard et al. [21] and by Holcroft and Punnett [22]. Jacinto et al. [19] confirm the negative influence of the inexperience of the worker with the task or technology on accident frequency. In the model developed by Attwood et al. [5], safety training appears as having a crucial relevance on accident occurrence. Although these previous works have highlighted the influence of those variables on accident occurrence, there was no such indication in the current study.

It is important to remark that, from the mathematical point of view, the current research paper has used the following techniques: SPPCA, MARS and SVM. In the first place, SPPCA was used in order to perform a dimensional reduction, but no satisfactory results were obtained. Secondly, a dimensional reduction was performed using a MARS model with success. Finally, the variables selected as important by the MARS model were used as input variables for an SVM model that was able to distinguish those workers that have suffered an occupational accident in the last twelve months from those that have not. This research study has thus shown that the SVM technique is able to predict occupational accidents with success similar to that achieved in other complex problems [44,45,69,70].
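For readers who wish to experiment with the final step, the way an SVM with a radial basis kernel assigns a class can be sketched in a few lines of Python. The sketch below evaluates the kernel of Eq. (17) and the estimation function of Eq. (16), and assumes that the predicted class is obtained by thresholding F(x) at zero. The support vectors, multiplier differences, bias and γ are hypothetical toy values, not the fitted model of this study, and the function names are illustrative.

```python
import math


def rbf_kernel(xi, xj, gamma):
    """Radial basis kernel of Eq. (17): exp(-gamma * ||xi - xj||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)


def estimation_function(x, support_vectors, coefficients, bias, gamma):
    """F(x) of Eq. (16); each coefficient stands for (beta~_i - beta_i)."""
    weighted_kernels = (
        c * rbf_kernel(sv, x, gamma)
        for sv, c in zip(support_vectors, coefficients)
    )
    return sum(weighted_kernels) + bias


def classify(x, support_vectors, coefficients, bias, gamma):
    """ASSUMED decision rule: class 1 (accident) if F(x) >= 0, else class 0."""
    value = estimation_function(x, support_vectors, coefficients, bias, gamma)
    return 1 if value >= 0 else 0


# Hypothetical toy model with two support vectors in a 2-D input space.
toy_vectors = [(0.0, 0.0), (2.0, 2.0)]
toy_coefficients = [-1.0, 1.0]
toy_bias = 0.0
toy_gamma = 0.5

for point in [(0.1, -0.1), (1.9, 2.1)]:
    label = classify(point, toy_vectors, toy_coefficients, toy_bias, toy_gamma)
    print(point, label)
```

With a larger γ the kernel decays faster with distance, so each support vector influences a smaller neighbourhood; this is one reason why γ has to be tuned together with the other SVM parameters.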
From an occupational health and safety point of view, the most important contribution of this work is perhaps the development of a prediction model which is independent of the specific sector. According to our model, the type of occupation is in fact relevant in the occurrence of accidents, but several other factors have been identified that have an important influence as well, no matter what the occupation of the worker is. This is a very significant contribution, since similar works published to date [5,22,25] were strongly targeted at specific sectors and their results could not be extrapolated. The authors of this work are confident that the results obtained in this research will be useful in promoting further works along this line, developing innovative methodologies for work-related accident prediction. According to the results of the models, and from our point of view, the easiest way to apply the model in enterprises in order to reduce the incidence of occupational accidents would be the implementation of policies focused on the reduction of those situations considered risky by our model (that is, those with a greater influence on the output value of the model). For example, it would be useful
to develop policies focused on the improvement of the employment status of workers in each company, to avoid frequent changes of subcontracted personnel, to perform specific programmes focused on certain occupational groups, to improve the means available to workers in order to reduce the physical demands of jobs, to promote the ergonomic design of workplaces, etc.

Acknowledgements

The authors wish to acknowledge the Spanish National Institute of Safety and Hygiene in the Workplace [4] for providing us with the results of the survey, as well as the computational support provided by Tecniproject Ltd and the Department of Mathematics at the University of Oviedo. We would like to thank Anthony Ashworth for his revision of the manuscript.

References

[1] International Labour Organization (ILO), Promoting safe and healthy jobs, The ILO Global Programme on Safety, Health and the Environment (Safework), in: World of Work, vol. 63, 2008, pp. 4–11.
[2] EUROSTAT, Labour Force Survey 2007 ad hoc module on accidents at work and work-related health problems, in: European Communities, 2009.
[3] European Commission, Directorate-General for Employment, Social Affairs and Equal Opportunities, The Social Situation in the European Union 2007, in: European Communities, 2007.
[4] Spanish Ministry of Labour and Immigration, in: Bulletin of Labour Statistics, Government of Spain, 2010.
[5] D. Attwood, F. Khan, B. Veitch, Can we predict occupational accident frequency?, Process Saf. Environ. 84 (3) (2006) 208–221.
[6] I. Fontaneda, M.A. Manzanedo, Working conditions in Spain after the adoption of Law 31/95 on prevention of occupational risks and their evolution, University of Burgos, Spain, 2005.
[7] E. Visser, Y.J. Pijl, R.P. Stolk, J. Neeleman, J.G.M. Rosmalen, Accident proneness, does it exist? A review and meta-analysis, Accident Anal. Prev. 39 (2007) 556–564.
[8] G.C. Gauchard, J.M. Mur, C. Touron, L. Benamghar, D. Dehaene, Determinants of accident proneness: a case-control study in railway workers, Occup. Med. 56 (3) (2006) 187–190.
[9] A. Kirschenbaum, L. Oigenblick, A.I. Goldberg, Well being, work environment and work accidents, Soc. Sci. Med. 50 (5) (2000) 631–639.
[10] B. Fabiano, F. Curro, A.P. Reverberi, R. Pastorino, A statistical study on temporary work and occupational accidents: specific risk factors and risk management strategies, Safety Sci. 46 (2008) 535–544.
[11] R. Lilley, A.M. Feyer, P. Kirk, P. Gander, A survey of forest workers in New Zealand: do hours of work, rest, and recovery play a role in accidents and injury?, J. Safety Res. 33 (1) (2002) 53–71.
[12] A.E. Dembe, J.B. Erickson, R.G. Delbos, S.M. Banks, The impact of overtime and long work hours on occupational injuries and illnesses: new evidence from the United States, Occup. Environ. Med. 62 (9) (2005) 588–597.
[13] C.C. Caruso, T. Bushnell, D. Eggerth, A. Heitmann, B. Kojola, Long working hours, safety, and health: toward a National Research Agenda, Am. J. Ind. Med. 49 (11) (2006) 930–942.
[14] S. Folkard, D.A. Lombardi, Modeling the impact of the components of long work hours on injuries and accidents, Am. J. Ind. Med. 49 (11) (2006) 953–963.
[15] O. Lindroos, W. Aspman, G. Lidestav, G. Neely, Accidents in family forestry's firewood production, Accident Anal. Prev. 40 (2008) 877–886.
[16] C. Sundstrom-Frisk, Behavioural control through piece-rate wages, J. Occup. Accid. 6 (1984) 49–59.
[17] C. Stave, M. Törner, Exploring the organisational preconditions for occupational accidents in food industry: a qualitative approach, Safety Sci. 45 (2007) 355–371.
[18] P. Antao, T. Almeida, C. Jacinto, C. Guedes Soares, Causes of occupational accidents in the fishing sector in Portugal, Safety Sci. 46 (2008) 885–899.
[19] C. Jacinto, M. Canoa, C. Guedes Soares, Workplace and organisational factors in accident analysis within the food industry, Safety Sci. 47 (2009) 626–635.
[20] U. Varonen, M. Mattila, The safety climate and its relationship to safety practices, safety of the work environment and occupational accidents in eight wood-processing companies, Accident Anal. Prev. 32 (2000) 761–769.
[21] M. Picard, S.A. Girard, M. Simard, R. Larocque, T. Leroux, Association of work-related accidents with noise exposure in the workplace and noise-induced hearing loss based on the experience of some 240,000 person-years of observation, Accident Anal. Prev. 40 (2008) 1644–1652.
[22] C.A. Holcroft, L. Punnett, Work environment risk factors for injuries in wood processing, J. Safety Res. 40 (2009) 247–255.
[23] D. Attwood, F. Khan, B. Veitch, Occupational accident models—where have we been and where are we going?, J. Loss Prevent. Proc. 19 (2006) 664–682.
[24] F.C. Breslin, E. Tompa, R. Zhao, J.D. Pole, The relationship between job tenure and work disability absence among adults: a prospective study, Accident Anal. Prev. 40 (1) (2008) 368–375.
[25] P.S. Paul, J. Maiti, The synergic role of sociotechnical and personal characteristics on work injuries in mines, Ergonomics 51 (5) (2008) 737–767.
[26] K. Pearson, On lines and planes of closest fit to systems of points in space, Philos. Mag. 2 (6) (1901) 559–572.
[27] H. Hotelling, Analysis of a complex of statistical variables into principal components, J. Educ. Psychol. 24 (6–7) (1933) 417–441, 498–520.
[28] T. Chen, E. Martin, G. Montague, Robust probabilistic PCA with missing data and contribution analysis for outlier detection, Comput. Stat. Data An. 53 (12) (2009) 3706–3716.
[29] N.P. Sajama, A. Orlitsky, Semi-parametric exponential family PCA, Adv. Neur. In. 17 (2004) 1177–1184.
[30] A. Kaban, M. Girolami, A combined latent class and trait model for the analysis and visualization of discrete data, IEEE T. Pattern Anal. 23 (8) (2001) 859–872.
[31] M. Collins, S. Dasgupta, R.E. Schapire, A generalization of principal components analysis to the exponential family, in: T.G. Dietterich, S. Becker, Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems, vol. 14, 2002.
[32] H.H. Bauschke, X. Wang, J. Ye, X. Yuan, Bregman distances and Chebyshev sets, J. Approx. Theory 159 (1) (2009) 3–25.
[33] J.H. Friedman, Multivariate adaptive regression splines (with discussion), Ann. Stat. 19 (1991) 1–141.
[34] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer-Verlag, New York, 2003.
[35] Q.S. Xu, M. Daszykowski, B. Walczak, F. Daeyaert, M.R. de Jonge, J. Heeres, L.M.H. Koymans, P.J. Lewi, H.M. Vinkers, P.A. Janssen, D.L. Massart, Multivariate adaptive regression splines—studies of HIV reverse transcriptase inhibitors, Chemometr. Intell. Lab. 72 (1) (2004) 27–34.
[36] J.H. Friedman, C.B. Roosen, An introduction to multivariate adaptive regression splines, Stat. Methods Med. Res. 4 (1995) 197–217.
[37] S.S. Sekulic, B.R. Kowalski, MARS: a tutorial, J. Chemometr. 6 (1992) 199–216.
[38] F.J. De Cos Juez, F. Sánchez Lasheras, P.J. García Nieto, M.A. Suárez Suárez, A new data mining methodology applied to the modelling of the influence of diet and lifestyle on the value of bone mineral density in post-menopausal women, Int. J. Comput. Math. 86 (10) (2009) 1878–1887.
[39] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
[40] I. Steinwart, A. Christmann, Support Vector Machines, Springer, New York, 2008.
[41] B. Schölkopf, A.J. Smola, R.C. Williamson, P.L. Bartlett, New support vector algorithms, Neural Comput. 12 (2000) 1207–1245.
[42] V. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, 1998.
[43] X. Li, D. Lord, Y. Zhang, Y. Xie, Predicting motor vehicle crashes using support vector machine models, Accident Anal. Prev. 40 (2008) 1611–1618.
[44] F.J. de Cos Juez, P.J. García Nieto, J. Martínez Torres, J. Taboada, Analysis of lead times of metallic components in the aerospace industry through a supported vector machine model, Math. Comput. Model. 52 (7–8) (2010) 1177–1184.
[45] F. Sánchez Lasheras, J.A. Vilán Vilán, P.J. García Nieto, J.J. del Coz Díaz, The use of design of experiments to improve a neural network model in order to predict the thickness of the chromium layer in a hard chromium plating process, Math. Comput. Model. 52 (7–8) (2010) 1169–1176.
[46] O. Lindroos, L. Burström, Accident rates and types among self-employed private forest owners, Accident Anal. Prev. 42 (6) (2010) 1729–1735.
[47] M.A. Camino López, D.O. Ritzel, I. Fontaneda, O.J. Gonzalez Alcantara, Construction industry accidents in Spain, J. Safety Res. 39 (5) (2008) 497–507.
[48] S. Salminen, Traffic accidents during work and work commuting, Int. J. Ind. Ergonom. 26 (1) (2000) 75–85.
[49] B.P. McCall, I.B. Horwitz, Occupational vehicular accident claims: a workers' compensation analysis of Oregon truck drivers 1990–1997, Accident Anal. Prev. 37 (4) (2005) 767–774.
[50] S. Boufous, A. Williamson, Work-related traffic crashes: a record linkage study, Accident Anal. Prev. 38 (1) (2006) 14–21.
[51] B. Charbotel, J.L. Martin, M. Chiron, Work-related versus non-work-related road accidents, developments in the last decade in France, Accident Anal. Prev. 42 (2) (2010) 604–611.
[52] T.M. Nelson, Fatigue, mindset and ecology in the hazard dominant environment, Accident Anal. Prev. 29 (4) (1997) 409–415.
[53] G. Maycock, Sleepiness and driving: the experience of UK car drivers, Accident Anal. Prev. 29 (4) (1997) 453–462.
[54] P.K. Nag, V.G. Patel, Work accidents among shift workers in industry, Int. J. Ind. Ergonom. 21 (1998) 275–281.
[55] Z. Li, K. Jiao, M. Chen, C. Wang, Reducing the effects of driving fatigue with magnitopuncture stimulation, Accident Anal. Prev. 36 (4) (2004) 501–505.
[56] T. Oron-Gilad, A. Ronen, D. Shinar, Alertness maintaining tasks (AMTs) while driving, Accident Anal. Prev. 40 (2008) 851–860.
[57] S. DeArmond, P.Y. Chen, Occupational safety: the role of workplace sleepiness, Accident Anal. Prev. 41 (5) (2009) 976–984.
[58] S. Niu, Ergonomics and occupational safety and health: an ILO perspective, Appl. Ergon. 41 (6) (2010) 744–753.
[59] G. Luria, The social aspects of safety management: trust and safety climate, Accident Anal. Prev. 42 (4) (2010) 1288–1295.
[60] L.M. Kath, V.J. Magley, M. Marmet, The role of organizational trust in safety climate's influence on organizational outcomes, Accident Anal. Prev. 42 (5) (2010) 1488–1497.
[61] G.M.H. Swaen, L.P.G.M. van Amelsvoort, U. Bültmann, J.J.M. Slangen, I.J. Kant, Psychosocial work characteristics as risk factors for being injured in an occupational accident, J. Occup. Environ. Med. 46 (6) (2004) 521–527.
[62] L. Anderson Snyder, A.D. Krauss, P.Y. Chen, S. Finlinson, Y. Huang, Occupational safety: application of the job demand-control-support model, Accident Anal. Prev. 40 (5) (2008) 1713–1723.
[63] S. Martín-Fernández, I. de los Ríos, A. Cazorla, E. Martínez-Falero, Pilot study on the influence of stress caused by the need to combine work and family on occupational accidents in working women, Safety Sci. 47 (2009) 192–198.
[64] J. Barling, E.K. Kelloway, R.D. Iverson, High quality work, job satisfaction and occupational injuries, J. Appl. Psychol. 88 (2) (2003) 276–283.
[65] F.G. Benavides, J. Benach, C. Muntaner, G.L. Delclos, N. Catot, M. Amable, Associations between temporary employment and occupational injury: what are the mechanisms?, J. Occup. Environ. Med. 63 (6) (2006) 416–421.
[66] B.Y. Jeong, Comparisons of variables between fatal and nonfatal accidents in manufacturing industry, Int. J. Ind. Ergonom. 23 (5–6) (1999) 565–572.
[67] B. Fabiano, F. Curro, R. Pastorino, A study of the relationship between occupational injuries and firm size and type in the Italian industry, Safety Sci. 42 (7) (2004) 587–600.
[68] C.F. Chi, C.C. Yang, Z.L. Chen, In-depth accident analysis of electrical fatalities in the construction industry, Int. J. Ind. Ergonom. 39 (4) (2009) 635–644.
[69] J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, J. Vandewalle, Least Squares Support Vector Machines, World Scientific Publishing Company, New York, 2002.
[70] Y. Zhang, Y. Xie, Forecasting of short-term freeway volume with support vector machines, Transportation Research Record, in: Journal of the Transportation Research Board, No. 2024, TRB, National Research Council, Washington, DC, 2007, pp. 92–99.