A knowledge discovery and reuse method for time estimation in ship block manufacturing planning using DEA


Full length article

Jinghua Li a,1, Miaomiao Sun a,⁎,1, Duanfeng Han a, Jiaxuan Wang a, Xuezhang Mao a, Xiaoyuan Wu b

a College of Shipbuilding Engineering, Harbin Engineering University, Harbin 150001, China
b Shanghai Waigaoqiao Shipbuilding Co., Ltd, Shanghai 200137, China

ARTICLE INFO

Keywords: Estimation; Shipbuilding; Block manufacturing; DEA; K-Means; Neural network

ABSTRACT

Rational and precise time estimation in a manufacturing plan is critical to the success of a shipbuilding project. However, due to the large number and variety of ship blocks, existing means are often inadequate to produce the expected estimation. This paper proposes a novel three-stage method to discover and reuse the knowledge about how much duration and slack time are essential for manufacturing a specific ship block. An efficient arrangement of the duration and the slack time means that the activity is likely to be finished within the allocated duration, or, if not, that the extra consumed time does not exceed the given slack time, which is kept at its lowest level. With such knowledge, planners can rapidly estimate the time allocation of all the manufacturing activities in the planning stage, which raises the possibility of successful execution within the limited budget. Different from previous studies, this research utilizes the execution data to find efficiency frontiers of the planned time arrangement (the duration and the slack time). For the sake of evaluation validity, ship blocks are first clustered according to their features using the K-Means algorithm. In the second stage, an adapted data envelopment analysis (DEA) model is presented to evaluate the planned time arrangement. By processing the results, efficient time arrangements for manufacturing all the blocks can be obtained, forming a data basis to boost the time estimation accuracy. In the last stage, genetic algorithm-backpropagation neural network (GA-BPNN) models are trained to capture the knowledge for further reuse by planners. Verified through experiments, this research outperforms several peer methods in terms of precision in nearly all cases.

1. Introduction

For a very large project, such as the manufacture of a new ship, it is very important to conduct proper and precise time estimation of the activities in the planning phase, since the results highly affect both the resource allocation and the overall project performance. It is believed that if a project timetable is too loose or too tight, it might undermine productivity [1]. In the case of a loose plan, there would be additional costs caused by machine idleness and intermediate storage. However, if the plan is tight, labor fatigue and unforeseen risks may incur project tardiness [2]. Different from scheduling, the work planning tends to be intuitive, with considerable reliance on expert experience [3]. Planners have to provide the list of activities as well as the corresponding time estimation to instruct scheduling [4]. A shipbuilding project is characterized by its lengthy lifecycle and the associated very large number of activities that need multi-disciplinary collaborations, such as structure, outfitting, piping, machinery, to name a few. Despite the high degree of product customization, all types of ships need to be hierarchically divided into grand sections, sections (blocks), and smaller assembly units for production. Initially, the intermediate products are produced and pre-outfitted separately, and then transported to the dock for erection and on-board outfitting, thus forming the whole ship. This is known as the "block construction" method, which has been adopted by the majority of shipyards [5]. It is worth noting that the block manufacturing activities account for nearly half of the project work. At present, a block manufacturing process is basically standardized: it is sequentially completed through pretreatment, laying-off & cutting, manufacture of small and medium sections, construction on the pin-jig site, integrity checking & pre-outfitting, painting and erection, respectively [6,7]. The process is shown in Fig. 1. Accordingly, the block manufacturing plan, which is a major part of the shipbuilding project schedule, has a fixed structure containing certain activities and logical relations. Time arrangements of previous

⁎ Corresponding author at: Room 416, Chuanhai Building, Harbin Engineering University, Harbin, Heilongjiang Province, PR China. E-mail address: [email protected] (M. Sun).
1 These two authors contributed equally to this work.

https://doi.org/10.1016/j.aei.2018.11.005 Received 13 March 2018; Received in revised form 10 October 2018; Accepted 22 November 2018 1474-0346/ © 2018 Elsevier Ltd. All rights reserved.


Fig. 1. Block manufacturing process.

projects, which include the duration and the slack time of all the block manufacturing activities, have been processed and stored as templates in order to rapidly generate the block manufacturing schedule for new projects. In the context of shipbuilding, the slack time is essentially placed between the different activities in order to reduce the risks of potential delays over the lengthy lifecycle. The slack time allows activities to be completed later than the initially anticipated deadline, thus avoiding frequent changes in the plan. The less a plan is modified during execution, the better the plan is, which also implies planning work of higher quality. At present, planners are able to acquire the time arrangement for the blocks of a new ship by referring to the template of a parent ship. In most cases, the time arrangement works directly as the formal

information for controlling execution. After completing a project, the planners modify the plan template in light of the practical execution data, which involves the application of the linear fitting and average methods. Using a template is a very convenient way of planning in the shipbuilding industry, where thousands of block manufacturing activities exist within a single project. For instance, the manufacturing project of a cruise ship weighing 133,500 tons contains more than 4000 blocks and over 28,000 manufacturing activities. It is impossible to accurately estimate the time of each activity for every block manually. Still, the above described way of reusing a whole ship (the parent ship) manufacturing plan inevitably incorporates more imprecision. One main cause is that blocks of dissimilar complexities require different amounts


of time to complete the same manufacturing activity. As a consequence, this existing practice usually bears significant execution deviations. Hence, a more accurate estimation approach is urgently demanded.

Existing research about estimating the activity duration can be divided into two groups. The first group includes the statistical models and the estimation rules, but pays less attention to the correlation of impact factors. As a consequence, the precision is often unsatisfactory, not to mention the inconvenience in estimating a large number of activities. The other group trains neural networks to learn relationships between the duration and the potential influencing factors. However, nearly all these studies directly utilize the actual execution data to train estimation models, which is questionable. The reason is that nowadays most execution data is still fed by on-site workers and a large volume of the collected data is incomplete or incorrect [8]. Hence, a time arrangement derived directly from such data will not be very efficient for a new project. Besides, without comparing to the planned item, it is difficult to figure out whether the manufacturing practice is reasonable or not. In other words, knowledge from the experienced planners is ignored and the planning process is completely driven by practice. Consequently, the company runs the long-term risk of lower productivity.

The main purpose of this paper is to improve the time estimation precision in the planning phase by employing the knowledge between the block features and the efficient time arrangements for the manufacturing activities. An efficient time arrangement of the duration and slack time means that the activity is likely to be finished within the allocated time, or, if not, that the extra consumed time does not exceed the given slack time, while the slack time is at its lowest level. To find such efficient arrangements, actual production data is used to evaluate the planned practice and then neural networks are trained to learn the knowledge for further reuse. Through this method, planners can rapidly obtain a more appropriate time arrangement for a new project. Taking blocks as the reference object, this research is considerably aligned with the global manufacturing trend in which several shipyards collaborate to build one ship on the condition of optimal resource utilization. It estimates the time required by an activity in the enterprise's production circumstances and makes the manufacturing schedule much closer to the actual situation. Additionally, this method overcomes the accuracy barrier faced by the parent ship based method.

The remainder of this paper is organized as follows. Section 2 reviews some related works and Section 3 describes the knowledge-based planning concept designed for the shipbuilding industry. Section 4 introduces a three-stage method for improving the planning precision. Section 5 verifies the feasibility and practicability of this research by a real case along with the estimation results being compared to several peer methods. Lastly, Section 6 summarizes the paper with the future work.

2. Related works

2.1. Activity duration estimation

Researchers' attention has been drawn to designing statistical models and calculation rules for estimating the activity duration. In order to help planners to quickly determine job durations, a hierarchical rule-based activity estimation method was designed [3]. This approach combined parametric statistical models with working condition rules. Planners selected the working conditions and then the activity duration was calculated through aggregation of the time associated with each working condition. Nonetheless, the method required a systematic representation of every possible condition, impeding its application in the complex manufacturing context. With the aim to estimate the manufacturing duration, calculation rules based on the average method, percentage method and equivalent daily workload method were separately designed for the key activities to build the distinct intermediates [9]. However, the result was rather rough and merely adequate in the very early stage of the project. Similarly, a knowledge-based estimation method was proposed, combining the statistical models of individual procedures and the aggregation rules defined by experts [10]. Although it was advantageous, correlations amongst the factors as well as the corresponding influences on the activity duration were not considered.

Till now, several methods have been applied to simulate the relationship between the factors and the required working time, amongst which regression analysis and neural networks remain the most popular ones. The former is cost-effective but unsuitable for dealing with multiple variables. Through tests in the estimation of assembly time in shipbuilding, it has been verified that regression models were less effective than the neural network [11]. Easy to implement and capable of providing good results, the backpropagation neural network (BPNN) has been applied to estimate the duration of activities in different fields. For instance, BPNN models were trained to predict the duration of major activities involved in constructing concrete frame buildings [1]. The models had the single output of the activity duration and nine to ten inputs for different intermediate products. By this means, planners could efficiently determine the construction duration of any arbitrary beam or column. In shipbuilding, a BPNN model was trained for estimating the block assembly time [10]. This model took block weight, assembly length, block type, small assembly count and subassembly count as the inputs and the assembly man hour as the output. The precision was not good enough, so man-hour calculation models were additionally introduced for each basic activity. Further, similar blocks were retrieved and clustered prior to the training of the BPNN model [12]. This has proven to be very effective in improving the estimation precision.

Some researchers have tried to apply data mining methods for excavating knowledge hidden in the data from enterprises. By definition of the RFID-Cuboid and the Map table, an RFID logistics data warehouse was built for discovering the logistics knowledge that could help to improve the planning precision and the scheduling performance, respectively [2]. The knowledge included the learning curves of workers, the maximum machine utilization as well as the most efficient logistics trajectory. After cleansing, compression and classification of RFID-Cuboids, spatial-temporal patterns were recognized and the knowledge was interpreted for further reuse. Besides, suitable standard operation times (SOTs), as well as the influence of various impact factors, have been mined in the RFID-enabled shop floor production [13]. To realize this, the support vector machine (SVM) was used for data clustering and the least square polynomial fitting method was then applied for reusing the knowledge. Nonetheless, these studies have directly used the actual data to formulate the duration base for planning and little attention has been paid to the impact of execution deviations. In the current manufacturing background, it is hard to create the ubiquitous RFID environment where all data is collected by machines. The majority of the execution data is still fed by on-site workers and dispersive collectors. Therefore, it is very likely to collect incomplete, inconsistent and even fake practical production data [8]. Hence, a time arrangement derived directly from the practical data will not be very effective for new projects. Furthermore, without comparing to the planned item, it is difficult to figure out whether the manufacturing practice is reasonable or not. The in-depth analysis of both the practical execution data and the planning data has been identified as a key aspect for quickly making a proper manufacturing plan [14].

2.2. Data envelopment analysis (DEA) for evaluation and improvements

In order to discover execution deficiencies, the data envelopment analysis (DEA) method was used to evaluate the block manufacturing process [15]. Meaningful as it was, the study did not show its implications for improving the planning accuracy. On the other hand, a three-stage sequential process model, employing DEA and BPNN, was


designed to realize incremental increases in production [16]. At first, Decision Making Units (DMUs) were stratified based on the DEA result, and then for each efficiency tier of DMUs, BPNNs learned the input/output pattern. With the trained BPNNs, incremental improvements of the inefficient DMUs towards different tiers could be predicted. In this way, managers could easily devise the most promising improvement strategy for their production. Illuminated by these two studies, this paper tries to use DEA to evaluate the efficiency of time arrangements for block manufacturing and, moreover, to find improvements for the next planning cycles.

Data Envelopment Analysis (DEA) is an objective method suitable for the evaluation of units with multiple inputs and multiple outputs [17]. It characterizes the input-output relationship by enveloping the observed data to determine the best practice frontier [18]. Applied to evaluate production performance, DEA helps benchmark the production area or process which has lower inputs (like material, finance, personnel, etc.) and higher outputs (like products, benefits, etc.) [16,19]. In DEA, the target object is coded with a basic element called the Decision Making Unit (DMU). A DMU can represent a workshop, a department, a process, an economic area or any other thing in the concrete research domain. Each DMU gets an efficiency score through the DEA. In general, DMUs with a higher output level and a lower input level (an efficiency score greater than or equal to one) are regarded as the most efficient practice while the others are marked as inefficient DMUs [20].

As a non-parametric method, DEA outweighs peer approaches in two aspects. On one hand, it does not demand a definite functional relationship between inputs and outputs. As a result, there is no need to conduct complex data analysis to model the functional equations (sometimes the equations may even be invalid). On the other hand, DEA is insensitive to either the input dimensions or the input sequences. These advantages make it simple to implement and widen its usage in various domains, such as the healthcare industry, manufacturing, supply chain management and so forth [16,17,21]. With the development in recent years, the DEA method is able to provide decision makers with improvements for the inefficient DMUs [22]. Slack-based measurement (SBM), an adapted DEA model, is a remarkable representative in this category [23]. By using SBM, a DMU acquires a set of slack variables on its inputs and outputs, in addition to its performance score. Besides, SBM uses the slack variables to gauge the depth of inefficiency per se, making it distinguished amongst peer methods. In the context of SBM, DMUs are considered efficient when all the slack variables are equal to zero. If the slack variable of an input (or output) is a non-zero number, it means that the evaluated unit has excessive input (or an output shortfall). Thus, the managers are recommended to find the reasons behind this inefficiency and to take corresponding measures during the next production season.
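To make the SBM idea above concrete, the following minimal sketch computes the standard (equal-weight) SBM score of one DMU as a linear program with SciPy, using the usual Charnes-Cooper transformation. The toy data, the function name and the equal weighting are illustrative assumptions; the paper's adapted, entropy-weighted model is described later in Section 4.3.

import numpy as np
from scipy.optimize import linprog

def sbm_efficiency(X, Y, o):
    """Standard SBM efficiency of DMU o (rho = 1 and zero slacks means efficient).

    X: (n_dmus, m) inputs, Y: (n_dmus, s) outputs, all strictly positive.
    Variable vector of the LP: [t, lambda_1..lambda_n, S-_1..S-_m, S+_1..S+_s].
    """
    n, m = X.shape
    s = Y.shape[1]
    xo, yo = X[o], Y[o]
    # Objective: minimize t - (1/m) * sum_i S-_i / x_oi
    c = np.concatenate(([1.0], np.zeros(n), -1.0 / (m * xo), np.zeros(s)))
    A_eq, b_eq = [], []
    # t + (1/s) * sum_r S+_r / y_or = 1
    A_eq.append(np.concatenate(([1.0], np.zeros(n), np.zeros(m), 1.0 / (s * yo))))
    b_eq.append(1.0)
    # x_oi * t - sum_j lambda_j x_ji - S-_i = 0 for every input i
    for i in range(m):
        A_eq.append(np.concatenate(([xo[i]], -X[:, i], -np.eye(m)[i], np.zeros(s))))
        b_eq.append(0.0)
    # y_or * t - sum_j lambda_j y_jr + S+_r = 0 for every output r
    for r in range(s):
        A_eq.append(np.concatenate(([yo[r]], -Y[:, r], np.zeros(m), np.eye(s)[r])))
        b_eq.append(0.0)
    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(1e-9, None)] + [(0, None)] * (n + m + s), method="highs")
    t = res.x[0]
    return res.fun, res.x[1 + n:1 + n + m] / t, res.x[1 + n + m:] / t

# Toy data: four activities (DMUs) with inputs [MH, PD, st] and outputs [L, W].
X = np.array([[100, 10, 2], [120, 12, 3], [90, 9, 2], [150, 14, 4]], dtype=float)
Y = np.array([[300, 50], [320, 55], [310, 52], [330, 60]], dtype=float)
print([round(sbm_efficiency(X, Y, o)[0], 3) for o in range(len(X))])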
Most DEA models presume that each input and output contribute equally to the DMU's performance. However, each DMU seeks the more preferable weights for its data, ignoring the real differences in the importance of these indicators [21]. As a consequence, the evaluation result may be unreasonable and even invalid. Besides, decision-makers may stress some indicators in the evaluation process, thus attaching unequal weights to the inputs and outputs. To deal with this issue, information entropy was introduced into the SBM evaluation. It utilizes the law of the data to assign a weight to each input and output, thus avoiding unexpected influences. In the case of differentiating things, if the values of an indicator change significantly, meaning that it contributes more to the result, the corresponding information entropy is large [24,25]. Verified by tests, the entropy-based SBM has proven to be much more reliable and closer to the real world [21]. By now, information entropy has been widely applied to measure the uncertainty or importance in many fields, such as mathematics, energy analysis and operational research [26].
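To make the entropy-weighting idea concrete, the minimal sketch below (Python with NumPy, not from the original paper; the indicator matrix and the rescaling to [1, 2] are illustrative assumptions) derives an entropy value and a normalized weight for each indicator observed over a set of DMUs.

import numpy as np

def entropy_weights(indicators: np.ndarray) -> np.ndarray:
    """Entropy-based weights for a (z indicators x n DMUs) matrix.

    Each row is one indicator observed over n DMUs. Indicators whose values
    barely vary get a large entropy; weights are derived from the entropies
    and normalized to sum to one (a sketch of the scheme discussed in the text).
    """
    z, n = indicators.shape
    # Shift every row into [1, 2] so that logarithms are well defined.
    lo = indicators.min(axis=1, keepdims=True)
    hi = indicators.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)
    scaled = (indicators + hi - 2 * lo) / span
    scaled = np.where(hi > lo, scaled, 1.0)      # constant rows map to all ones

    p = scaled / scaled.sum(axis=1, keepdims=True)   # row-wise share of each DMU
    k = 1.0 / np.log(n)                              # normalization constant
    e = -k * (p * np.log(p)).sum(axis=1)             # entropy per indicator, in [0, 1]
    return e / e.sum()                               # weights summing to one

if __name__ == "__main__":
    # Three toy indicators (e.g. man hour, planned duration, slack time) over five DMUs.
    X = np.array([[120, 80, 95, 110, 100],
                  [10, 10, 11, 10, 10],
                  [2, 5, 1, 4, 3]], dtype=float)
    print(entropy_weights(X))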

2.3. Research gaps to be filled

In short, there are four gaps between the past research and the current problem:

(1) Previous deterministic time estimation models are inapt to precisely capture the complex non-linear relationships between the block features and the time required by each manufacturing activity. Thus, the precision of time estimation has not been satisfactory.
(2) Although the BPNN trained with classified blocks has proven to be effective in time estimation, studies performed until now have not radically addressed the problem that the execution always deviates from the schedule in shipyards. One main reason is that most studies only focused on individual activities, ignoring their logical connections.
(3) Data mining methods have been applied to discover the implicit planning knowledge in the manufacturing process. However, existing research ignores the fact that much construction work is done in the open space and the collected data is more likely to be erratic and even incorrect. Without evaluation of this data, the ultimate estimation might be unexpected. In this study, this conclusion will be verified through experiments.
(4) The DEA method is suitable for evaluation problems with multiple inputs and outputs. Moreover, SBM in the DEA series can additionally convert the inefficient items into efficient ones, implying more useful data available for business value creation. Nonetheless, limited effort has been made in using SBM as a means to serve the shipbuilding industry.

3. Knowledge-based activity time estimation concept

Planning is a complicated process that highly depends on the experts' knowledge [3]. It includes the knowledge about how to decompose the end product into small units for manufacturing, about which types of processes are needed, and how to estimate the time necessary for manufacturing activities or operations [13]. If this knowledge is captured for reuse with the aid of computers, planners can easily make a more proper and precise plan within a shorter time period. This research specifically focuses on the estimation knowledge that resides in the relationship between the block features and the efficient time arrangement of block manufacturing activities. In this context, the efficient time arrangement includes the most probable activity duration and the smallest slack time which just allows an activity to be finished without postponing the subsequent works. Based on this knowledge, it is possible to more appropriately estimate the required time of manufacturing activities for new blocks. In order to discover and reuse the knowledge, this paper comes up with a three-stage method whose framework is shown in Fig. 2. The three stages are described as follows:

Stage 1, clustering: data is assembled from various enterprise databases, providing a unique and consistent information backbone. Noticing that blocks with the same weight but different shapes or locations would require a different amount of manufacturing time, it is unreasonable to find the efficiency frontier over the scope of all blocks. Therefore, blocks are clustered based on their features. Outcomes of this stage are the clustered blocks and the cluster centers. The latter will be used to classify new blocks so that the discovered knowledge can be reused accordingly.

Stage 2, evaluating: for each block cluster, the planned and executed information is organized to conduct the DEA evaluation.
After analysis, a block manufacturing activity is regarded as a DMU with [man hour, planned duration, slack time]^T and [Length, Weight]^T as the inputs and the outputs, respectively. To differentiate the contribution of each input and output to the evaluation result, the respective weights



Fig. 2. Framework of the proposed method.

are calculated based on information entropy theory. As regards the specific evaluation model, slack-based measurement (SBM) is selected because it not only figures out the efficient planned duration and slack time, but also enables the conversion of inefficient units into efficient ones. Indeed, combining the entropy weight with SBM has been verified in previous studies [21,24,26]. The final outcome of this stage is a whole dataset of the efficient time arrangements for all blocks, indicating that every activity involved in the manufacture of each block is associated with the efficient duration and slack time.

Stage 3, training: neural network (NN) models are used to simulate the relationship between block features and the efficient time arrangements. The NN method is applied because, by changing the structure of the hidden layers, a NN is capable of representing the real conditions as closely as possible. Besides, NN is very apt to capture the non-linear relationship among multiple factors. In this stage, the vector [Length, Weight]^T is the input and the vector [man hour, planned duration, slack time]^T is the output of each NN model. This is because the ultimate objective is to support planners with the estimated time arrangements in the planning phase. Outcomes of this stage are trained NNs.

Finally, the time estimation knowledge is discovered and learned by the respective NNs. Regarding the way to reuse it, a block is initially classified into the nearest cluster according to the measured distance based on its features (listed in Table 1). After that, the corresponding NNs are used to generate a combination of the activity duration and the slack time for all the manufacturing activities.
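As a minimal illustration of the reuse step just described, the sketch below (Python; the function name, the stored cluster centers and the per-activity models are hypothetical) classifies a new block to its nearest cluster center and then queries the corresponding per-activity models for the estimated time arrangement.

import numpy as np

def estimate_block_plan(features, length, weight, cluster_centers, models):
    """Reuse sketch: nearest-center classification, then per-activity estimation.

    features        : 1-D array of the block's (normalized) clustering features
    length, weight  : welding length L and block weight W of the block
    cluster_centers : (k, d) array of centers obtained in Stage 1
    models          : models[cluster][activity] -> object with .predict([[L, W]])
                      returning [man_hour, planned_duration, slack_time]
    """
    cluster = int(np.argmin(np.linalg.norm(cluster_centers - features, axis=1)))
    plan = {}
    for activity, model in models[cluster].items():
        man_hour, duration, slack = model.predict([[length, weight]])[0]
        plan[activity] = {"man_hour": man_hour, "duration": duration, "slack": slack}
    return cluster, plan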

4. Proposed method

4.1. Integrated block information model for data preparation

Information islands have impeded the discovery of data value for practical implications, posing a great challenge to the shipbuilding company. In order to acquire the planning knowledge, particularly the relationship between the product and its required manufacturing time, this study puts forward a generalized integrated block information model with four aspects that organize data from distributed databases. The model, abbreviated hereafter as IBIM, is depicted in Fig. 3. The IBIM of a certain block is expressed as

IBIM = < blockId, StructureAspect, FinalPlanAspect, RealAspect, KnowledgeAspect >

amongst which different business objectives map the information from the StructureAspect, the FinalPlanAspect, and the RealAspect to the elements in the KnowledgeAspect. All the information is identified by the blockId. IBIM incorporates three kinds of original business aspects: the StructureAspect, the FinalPlanAspect and the RealAspect. StructureAspect contains geometry parameters, structural components, the welding length, the projection area and so on. This kind of data mainly resides in design databases and usually remains unchanged during the project unless the ship design is revised. FinalPlanAspect comprises template remarks, activity content, logical activity sequences, planned duration,


Table 1
Potential block features for clustering.

No.  Feature name                   Data type
1    Block weight                   Numeric
2    Parts count                    Numeric
3    Welding length                 Numeric
4    Small assembly units count     Numeric
5    Medium assembly units count    Numeric
6    Zone type                      Enumeration (such as weather deck zone, bow zone, etc.)
7    Block type                     Enumeration (such as curved block, flat block, two-dimensional unit, volume surface section, and so forth)
planned start/end time, the slack time (buffer time), workplace, the person in charge and other attributes about the production arrangement. This data is extracted from the project management software. Concerning the rather long period of shipbuilding projects, the manufacturing plan data is very sensitive to time. Actually, there might be several versions of plans from the start to the end of the project. The first edition of the plan, supported by planning knowledge, is considered to be the most favorable arrangement. In contrast, the final edition (usually the one implemented in the project completion stage) includes the influences of the various production disturbances. To this point, the final edition plan in IBIM is chosen for later knowledge discovery. RealAspect consists of the referenced plan remark, the actual start/end time, the working duration, the quality inspection result and so forth. Different from the above three aspects, KnowledgeAspect includes information derived from the analysis of the original business data. For instance, the preferable activity duration, the optimal time buffers, the most effective logistics trajectory, high efficient production units and so forth. Particularly, all the processing means, including the scheduling optimization models, time estimation models and evaluation methods, pertain to this aspect. On account of the length of this paper, the presentation format of knowledge is not discussed here.
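To make the IBIM structure tangible, the following sketch (Python dataclasses; the concrete field list is an illustrative assumption distilled from the aspects described above, not the paper's exact schema) shows how the four aspects could be assembled around a blockId.

from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class StructureAspect:
    block_weight: float            # tons
    parts_count: int
    welding_length: float          # meters
    small_assembly_count: int
    medium_assembly_count: int
    zone_type: str                 # e.g. "bow zone"
    block_type: str                # e.g. "curved block"

@dataclass
class FinalPlanAspect:
    activities: List[str]                  # ordered activity names, A1..A7
    planned_duration: Dict[str, float]     # days per activity
    planned_end_time: Dict[str, str]       # ISO dates per activity
    slack_time: Dict[str, float]           # buffer days per activity

@dataclass
class RealAspect:
    actual_duration: Dict[str, float]
    actual_end_time: Dict[str, str]
    man_hour: Dict[str, float]

@dataclass
class KnowledgeAspect:
    efficient_duration: Dict[str, float] = field(default_factory=dict)
    efficient_slack: Dict[str, float] = field(default_factory=dict)

@dataclass
class IBIM:
    block_id: str
    structure: StructureAspect
    final_plan: FinalPlanAspect
    real: RealAspect
    knowledge: Optional[KnowledgeAspect] = None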

4.2. Block clustering to partition the data space

In the manufacturing industries, it is the (semi-)product features that exert the greatest influence on the production activity [27]. Ship blocks with different features would see a big difference in the manufacturing time. Taking blocks of a Bulk Carrier as an example, a bow block would consume around two or three times the manufacturing time of a weight-equivalent block in the parallel middle body. That is to say, complex blocks require more inputs (time, labor) than less complex blocks while the output (product weight) is the same. Therefore, it is unreasonable to evaluate the time arrangements of all blocks as a whole. However, a ship block is currently only roughly classified by its location zone and physical geometry, which is insufficient to investigate the relationship between different kinds of blocks and the time necessary for the manufacturing activities.

Clustering is one main task of data mining, which divides things into groups according to their similarities [28]. It can be used as an independent tool to obtain the distribution of data, to observe the characteristics of each cluster, and to provide a reference for the classification of new things. Particularly, clustering helps to reduce the problem space, which is of great importance when the research domain has a very large number of elements [29]. Prevalent clustering methods include K-Means, K-MEDOIDS and the agglomerative hierarchical clustering method. The K-Means algorithm is simple to implement and often has good effects, making it the most widely used. By contrast, K-MEDOIDS is robust to data noise at the cost of much more computing time. In consideration of the large volume of ship block data, K-Means is preferable. On the other hand, while K-Means and K-MEDOIDS belong to the partitioning clustering methods, agglomerative hierarchical clustering is one kind of the systematic clustering methods that are deterministic and suitable for samples with a certain number of categories [28]. However, due to a lack of knowledge about fixed classification layers of ship blocks [7], this paper applies K-Means, the non-hierarchical fast clustering method, to the research problem.

Fig. 3. The integrated block information model.


Define features for clustering: previous studies indicated that a limited number of parameters were able to represent the essential characteristics of activities [1,3]. There are mainly four kinds of features identified in the shipbuilding industry that affect the activity duration: the zone type, the block type, the product weight and the welding length [30,31]. In some cases, the count of small assembly units and that of medium assembly units are also selected as inputs for time estimation [10]. In this research, seven block features are initially selected from StructureAspect as the feature candidates for clustering. Details of these features are listed in Table 1. After normalization by decimal scaling, all the features are compressed using principal component analysis (PCA). In this way, the feature correlation is reduced and thus the clustering process is accelerated [32]. Processing details can be found in the text labeled "Block clustering algorithm".

Way to find the preferable clustering result: despite its advantages, K-Means is rather sensitive to the initial value [33]. Typical initial value selection methods include 'sample', 'uniform' and so forth. 'Sample' randomly selects centroids while 'uniform' evenly generates random centroids in the distribution of the input data. Besides, there are several distance measurements widely used in cluster analysis, including the Euclidean distance and the cosine method, to name a few. Thus, it is hard to judge which combination of the distance measurement and the initial value selection method is better than the others without conducting tests in the problem context. Additionally, the number of clusters, hardly known in advance, is also critical to the clustering result. It is believed that the cluster number does not exceed 2% of the input data size. In this situation, this study conducts experiments with different parameters to find the preferable clustering result. Let k, clusterMax, distanceMethod and initialValueSelectionMethod represent the number of clusters, the maximum number of clusters, the distance measurement and the initial value selection method, respectively. Then a parameter strategy defined in this research is a combination of k, distanceMethod and initialValueSelectionMethod. Mark the number of distanceMethods and initialValueSelectionMethods with dm and im, respectively; then the parameter strategy space size is (clusterMax − 1) × dm × im. Generally, a clustering result is appreciated if the clusters have the minimum distance inside and the maximum distance outside. So a criterion designed here is

clUtility = ω × dis_in + (1 − ω) × (1 / dis_out), 0 ≤ ω ≤ 1

where dis_in and dis_out are the sum of the within-cluster distances and the sum of the distances away from other clusters, respectively, and ω is the weight provided by the planning experts. The smaller the clUtility, the better the parameter strategy. dis_in / dis_out is not chosen because it is prone to be less effective when there is a big variance in the data space. Suppose that dis_out is much greater than dis_in; it becomes complicated to figure out which is smaller among numbers approaching 0. Conversely, dis_out / dis_in is also inappropriate. What is more, dis_in / dis_out (or dis_out / dis_in) is unable to incorporate experts' opinions. Consider that 10 replications of K-Means are enough for each strategy. Then for a given ω, the calculation of finding the preferable clusters runs at most (clusterMax − 1) × dm × im × 10 times. During the process, the minimum clUtility is recorded together with the corresponding parameter strategy and the result (indexes of the target cluster and cluster centers' points). Each clUtility less than the current utilityThreshold is regarded as the new utilityThreshold for subsequent calculations. If the difference between the new utilityThreshold and the historical minimum utilityThreshold is less than 0.1, the seeking process is terminated.

Block clustering algorithm: the inputs of the clustering stage are the features listed in Table 1 and the outputs include the preferable cluster count, the cluster centers and the clustered blocks, respectively. The algorithm in pseudocode is given as follows:

(1) Input the weight ω and the feature data:

data_s = [x_ij]_{m×n} =
  | x_11 ... x_1n |
  |  ...      ... |
  | x_m1 ... x_mn |

where m is the number of blocks and n is the number of features;

(2) Transform the enumeration type data into natural integers: suppose that the enumeration type dataset of feature F is {et_1, et_2, ..., et_p, ..., et_f}, where et_p is the p-th kind of value that F can adopt and f is the total number of kinds. Then the dataset is coded by a sequence of natural numbers 1, 2, ..., p, ..., f and all the values x_iF in data_s are replaced correspondingly;

(3) Normalize data_s using decimal scaling in order to avoid influences of the order of magnitude: let k_j = ⌈log10(max_i(x_ij))⌉ and y_ij = x_ij / 10^{k_j}, i = 1, 2, ..., m; then data_s = [y_ij]_{m×n};

(4) Orthogonalize and convert data_s into data_pca using PCA, thus eliminating the correlation effect of different features:
(a) Calculate the incidence matrix R = [r_ij]_{n×n} by

r_ij = Σ_{k=1}^{m} (x_ki − x̄_i)(x_kj − x̄_j) / sqrt( Σ_{k=1}^{m} (x_ki − x̄_i)² × Σ_{k=1}^{m} (x_kj − x̄_j)² ), with r_ij = r_ji and r_ii = 1;

(b) Solve the equation det(R − λE) = 0 and obtain the eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_n > 0, which indicate the contribution of each principal component;
(c) Determine the number of principal components p: Σ_{i=1}^{p} λ_i ≥ η × Σ_{i=1}^{n} λ_i, where η is the required representative level of the original data, here η = 0.94;
(d) Let β_1 = (β_11, β_21, ..., β_n1)^T, β_2 = (β_12, β_22, ..., β_n2)^T, ..., β_p = (β_1p, β_2p, ..., β_np)^T represent the eigenvectors of the first p principal components, respectively; then data_pca = β_1i·Y_1 + β_2i·Y_2 + ... + β_ni·Y_n, (i = 1, 2, ..., p);

(5) Pre-initialization of K-Means: set k = 2, clusterMax = ⌈0.02 × rowCount(data_s)⌉, and utilityThreshold = 1000;

(6) For k = 2 : clusterMax

  For dmi = 1 : dm
    For imi = 1 : im
      Select distanceMethod[dmi] and initialValueSelectionMethod[imi] to initialize K-Means;
      Acquire centers and indexes of the target clusters by calling the K-Means function provided by MATLAB;
      Calculate clUtility;
      If clUtility < utilityThreshold, then bestClusterCount = k, utilityThreshold = clUtility, and record the centers' coordinates and the clustered blocks;
    End imi
  End dmi
  If |historical minimum utilityThreshold − utilityThreshold| < 0.1
    Break;
  End if
End k
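A compact re-implementation of this parameter search is sketched below in Python with scikit-learn (the paper uses MATLAB's kmeans; the clUtility weighting, the 2% cluster cap and ω = 0.1 follow the text, while restricting to Euclidean distance, treating the feature matrix as already decimal-scaled, and the exact reading of dis_in/dis_out are simplifying assumptions). The early-stopping threshold on utilityThreshold is omitted for brevity.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from scipy.spatial.distance import cdist

def cl_utility(X, centers, labels, w=0.1):
    """clUtility = w * dis_in + (1 - w) * (1 / dis_out); smaller is better.

    dis_in : sum of distances from each point to its own cluster center.
    dis_out: sum of distances from each point to the other clusters' centers.
    """
    d = cdist(X, centers)                       # (samples x clusters) distances
    dis_in = d[np.arange(len(X)), labels].sum()
    dis_out = d.sum() - dis_in
    return w * dis_in + (1 - w) * (1.0 / dis_out)

def search_clustering(features, w=0.1, replications=10, seed=0):
    """Grid search over k and centroid initialization, as in the algorithm above."""
    X = PCA(n_components=0.94, svd_solver="full").fit_transform(features)
    cluster_max = max(2, int(0.02 * len(X)))
    best, rng = None, np.random.RandomState(seed)
    for k in range(2, cluster_max + 1):
        for init in ("random", "k-means++"):    # stands in for 'sample' / 'uniform'
            km = KMeans(n_clusters=k, init=init, n_init=replications,
                        random_state=rng.randint(1 << 30)).fit(X)
            u = cl_utility(X, km.cluster_centers_, km.labels_, w)
            if best is None or u < best[0]:
                best = (u, k, init, km.cluster_centers_, km.labels_)
    return best  # (utility, k, init, centers, labels)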

4.3. DEA-based evaluation to obtain the efficient time arrangements

Efficient time arrangement in the block manufacturing plan: in the shipbuilding industry, completion on time is more appreciated than being ahead of schedule. This is because docks, which are the critical resource of a shipyard, have a limited production capability and the timetable of their usage is tied to the company's three-year development strategy, and is thus unlikely to be modified. To this point, the



time to transfer a specific block into the dock for erection is determined in the planning stage, and is then used to instruct the block manufacturing plan. If a block is completed before the predetermined time point, there would be extra costs for both storing and transporting it. In addition, many production units work sequentially to complete one (semi-)block. Since these units usually have their own production resources, the time spent on a manufacturing activity varies depending on both the complexity of the blocks and the current workloads. Slack time is usually placed in the plan to mitigate the differences in production pace. However, the early completion of the previous activity will incur extra holding cost because at least one of the next activities cannot finish its task ahead of schedule, otherwise the production cycle time could have been reduced in the planning phase. On the contrary, the tardiness of the previous activity makes the subsequent process unable to be implemented as planned. To make things worse, if the tardiness is very serious, there is a need to work overtime in the processes that follow. In such a case, there is no doubt that simply using the average method to revise the plan template will only make the planning less accurate. An efficient arrangement of the duration and the slack time means that the activity is more likely to be finished within the allocated duration, or, if not, that the extra consumed time does not exceed the planned slack time, while the slack time is at its lowest level. The fewer the deviations during the execution, the more efficient the time arrangement. This research attempts to apply data envelopment analysis (DEA) to discover the efficient duration and slack time for manufacturing each block.

Define the DMU: DEA is a concise evaluation method that assists managers to rapidly recognize the efficient input-output practice. In DEA, the object being evaluated is coded with a basic element called the Decision Making Unit (DMU). Generally, each DMU will get a relative efficiency score ranging from 0 to 1. The efficiency score will be high if a DMU has more output and less input. DMUs with an efficiency score equal to 1 will be regarded as the efficient practice while the others will be marked as inefficient DMUs [20]. All the efficient DMUs constitute the efficiency frontier. A DMU in this research represents an activity to manufacture a certain block. To find the efficient time arrangement, both the planned and executed information is used to describe the DMU. A similar study adopted attributes like the blockId, the schedule and the actual working time, etc. to evaluate the block manufacturing process performance [15]. However, it assumed that the planned timetable was reasonable and that the execution accounted for all deviations from the schedule, which is deemed to be inconsistent with the actual situation. Both the plan and the execution data might be imprecise. Hence, it is plausible to use product features, which can represent the activity complexity, as a neutral judgement. On the other hand, similar blocks could demand a different number of workers or extra shifts of work, so some activities would consume the same number of days but with different man hours per day [13]. For this sake, it is necessary to take the man hour (MH) into consideration. Thus, this paper selects the blockId, part of the planning data and the execution data, as well as the key material volume as candidates to construct a DMU. The data is assembled through IBIM and the details are given in Table 2. As mentioned before, the block manufacturing process contains

relatively steady activity contents and activity sequences. Suppose that each block BL_j experiences at most L manufacturing activities, denoted as A_1, A_2, ..., A_l, ..., A_L, respectively, and that each activity A_i of BL_j possesses a planned end time PET_ji, a planned duration PD_ji, an actual end time AET_ji, and an actual duration AD_ji. Besides, each A_i of BL_j is attached with a slack time st_ji to reduce the risk of delay. If the block does not go through activity A_i, PD_ji equals 0. Then the end time balance ED_ji = AET_ji − PET_ji will indicate whether an activity is completed on time or not, while the duration balance DD_ji = AD_ji − PD_ji will tell whether the planned duration matches the practical needs. By the time an activity A_i is finished, there will be an aggregated slack time ast_ji = Σ_{k=1}^{i} st_jk left as planned, in contrast to the actual residual slack time rast_ji = PET_ji + st_ji − AET_ji = st_ji − ED_ji. The end time parameters (PET, AET and ED) are not suitable for evaluating the DMUs.

Fig. 4 illustrates a sketch of the relationship between these parameters. It can be seen that if A_i is not completed on time, the slack time that the subsequent activity A_{i+1} consumes is rast_ji + st_j,i+1 − rast_j,i+1, or ED_j,i+1 + st_ji − ED_ji, rather than st_j,i+1 − ED_j,i+1. It implies that whether an activity can be completed on time or not is related to the completion state of its precedent activity and its associated slack time. Particularly, A_{i+2} will be completed as planned even if the actual duration might be longer than the planned one. The value of st_j,i+2 would be accepted in previous studies because the activity is accomplished on time. However, in the view of lean manufacturing, st_j,i+2 is not the actual effective slack time. It can be seen from Fig. 4 that DD_j,i+1 equals ED_j,i+1 + st_ji − ED_ji, indicating that DD can represent how much slack time is essentially required by a specific activity. Hence, the balance of the slack time for a single activity A_i can be calculated by SD_ji = DD_ji − st_ji.

While DD represents the inconsistency between the planned duration and the real requirement [15], SD indicates the inappropriate estimation of fluctuations of the activity time. Specifically, a positive SD (less buffer time given than needed) may induce a delay in the subsequent activities and even pose a risk of missing the ultimate deadline. On the opposite side, a negative SD may cause extra inventory cost and undermine the production efficiency. Both DD and SD stand for deviations from the plan. The greater the absolute values of these two factors are, the less effective is the planned time arrangement. As regards a concrete category of blocks, if there is a large quantity of non-zero DD (or SD) for an activity, managerial actions will have to be taken in two circumstances. Firstly, if these non-zero values vary sharply, it means that the activity incorporates many uncertainties; then managers should take some on-site control measures to help improve the schedule. Otherwise, if these non-zero values do not change much, it implies that by modifying the previous template it will be possible to more accurately estimate the time required by an activity. To summarize, DD and SD illustrate the inappropriateness degree of PD and st, respectively. Furthermore, a wider range of the value of DD (or SD) implies less reliability of PD (or st), so the contribution of PD (or st) to the efficiency evaluation will be smaller.
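The deviation quantities defined above translate directly into code; the sketch below (Python; the per-activity record layout and the made-up numbers are illustrative assumptions) computes ED, DD, the residual slack rast, and SD for one block.

def plan_deviations(activities):
    """Compute ED, DD, rast and SD per activity of one block.

    activities: list of dicts with keys
        'PET', 'AET' (planned/actual end, in days from project start),
        'PD', 'AD'   (planned/actual duration, days),
        'st'         (planned slack time, days).
    Follows ED = AET - PET, DD = AD - PD, rast = st - ED, SD = DD - st.
    """
    out = []
    for a in activities:
        ed = a["AET"] - a["PET"]
        dd = a["AD"] - a["PD"]
        out.append({
            "ED": ed,
            "DD": dd,
            "rast": a["st"] - ed,   # actual residual slack
            "SD": dd - a["st"],     # slack-time balance
        })
    return out

# Example with two consecutive activities of one block (made-up numbers).
example = plan_deviations([
    {"PET": 10, "AET": 12, "PD": 6, "AD": 8, "st": 3},
    {"PET": 18, "AET": 19, "PD": 5, "AD": 5, "st": 2},
])
print(example)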
Based on the above analysis, [MH, PD, st]^T and [L, W]^T are chosen as the inputs and the outputs of the DMU (a manufacturing activity), respectively. In addition, DD and SD are chosen to express the contribution degree of PD and st to the efficiency evaluation, respectively. The two reasons why AD is not selected are as follows: first, the purpose of this research is to find the efficient time arrangement for activities to manufacture different blocks, namely the duration and the slack time, rather than the duration alone. It is difficult to guarantee that an activity is absolutely completed in a concrete and tight duration. This is particularly true in the shipbuilding industry, where an activity is composed of many process tasks and the semi-products produced by the precedent activity need to be transported to the place where the subsequent activity is carried out. Secondly, DD is more valuable than AD as DD connects the prior knowledge of planners and the real practice. It is very dangerous to merely use AD as the instructor for planning

Table 2
Candidate information for the DMU (obtained through IBIM).

Identifier        blockId
Planning          Planned end time (PET); Planned slack time (st); Planned activity duration (PD)
Execution         Actual end time (AET); Actual activity duration (AD); Man hour (MH)
Material volume   Welding length (L); Block weight (W)



Fig. 4. Sketch of the relationship among time parameters.

because workers would cheat by a low-level production rate.

Decide the weight for each input and output: as explained before, not all the inputs and outputs contribute equally to a DMU's efficiency. To distinguish the importance of the different indicators (inputs and outputs), information entropy is introduced to assign weights [21]. In this context, if the value of an indicator varies significantly, the information entropy weight will be small; otherwise, the entropy weight of this indicator will be large. It is clear that only the same activity to manufacture blocks in the same cluster is comparable, so the weight decided by the entropy is calculated by the following three steps:

(1) Form the weight calculation matrix. Suppose that there are z indicators, among which m are for inputs and s = z − m are for outputs. For each indicator X_i, the values will be formalized into the interval [1, 2] by

x′^cl_ij = (x^cl_ij + max_{1≤j≤nc}(x^cl_ij) − 2 min_{1≤j≤nc}(x^cl_ij)) / (max_{1≤j≤nc}(x^cl_ij) − min_{1≤j≤nc}(x^cl_ij)), j = 1, 2, ..., nc

where nc is the number of blocks in a block cluster c and x^cl_ij stands for the value of the l-th activity for the j-th block. If max_{1≤j≤nc}(x^cl_ij) = min_{1≤j≤nc}(x^cl_ij), then x′^cl_ij = 1, j = 1, 2, ..., nc. Then the weight calculation matrix will be as follows:

Mw^cl = [x′^cl_ij]_{z×nc} =
  | x′^cl_11 ... x′^cl_1j ... x′^cl_1nc |
  | x′^cl_21 ... x′^cl_2j ... x′^cl_2nc |
  |   ...          ...          ...     |
  | x′^cl_z1 ... x′^cl_zj ... x′^cl_znc |

(2) Calculate the information entropy. Normalize Mw^cl by rows and obtain:

[p_ij]_{z×nc} = [x′^cl_ij / Σ_{j=1}^{nc} x′^cl_ij]_{z×nc}

Then the information entropy of indicator X_i will be

e_i = −k Σ_{j=1}^{nc} p_ij ln(p_ij), i ∈ {1, 2, ..., z}

where k = 1/ln(nc) is the Boltzmann constant and 0 ≤ e_i < 1.

(3) Calculate the weight for each indicator. The weights are calculated as r_i = e_i / Σ_{i=1}^{m} e_i, i = 1, ..., m, which is the i-th input's weight, and o_{i−m} = e_i / Σ_{i=m+1}^{z} e_i, i = m + 1, ..., z, which is the output's weight. Obviously, Σ_{i=1}^{m} r_i = 1 and Σ_{i=1}^{s} o_i = 1. It can be known that, for the indicator X_i, if all the blocks have the same value, then e_i will be maximum; otherwise, the greater the range is, the smaller e_i will be.

Establish the DEA model: based on the analysis above, the entropy-based SBM model, one kind of adapted DEA, is applied and shown as follows [21]:

min ρ_o = (1 − Σ_{j=1}^{m} r_j L_j^− / x_oj) / (1 + Σ_{k=1}^{s} o_k L_k^+ / y_ok)
s.t. Σ_{i=1}^{n} x_ij λ_i + L_j^− = x_oj, j = 1, 2, ..., m
     Σ_{i=1}^{n} y_ik λ_i − L_k^+ = y_ok, k = 1, 2, ..., s
     Σ_{j=1}^{m} r_j = 1, Σ_{k=1}^{s} o_k = 1
     λ_i ≥ 0, L_j^− ≥ 0, L_k^+ ≥ 0

Amongst them, ρ_o represents the efficiency score of DMU_o (the DMU to be evaluated) and x_oj, y_ok indicate its j-th input and k-th output, respectively. In the same way, x_ij and y_ik are the j-th input and the k-th output of DMU_i. λ_i is the weight of DMU_i, and r_j and o_k stand for the j-th input weight and the k-th output weight. L_j^− is the input redundancy and L_k^+ is the output deficiency. If all slack variables L_j^− and L_k^+ equal 0, then the DMU is efficient; otherwise, the DMU is inefficient. By removing the input redundancies L_j^− from x_oj and adding the output deficiencies L_k^+ to y_ok, the inefficient DMUs can be transformed into virtual efficient DMUs [16]. Different from the normal entropy-based methods that adopt a self-weighted scheme [21,24,25], this paper takes DD and SD as the weight indicators for PD and st, respectively. Besides, the weight is thereby associated with a practical meaning. Fig. 5 illustrates the proposed entropy SBM model.

Fig. 5. The entropy SBM model.

4.4. Neural network models trained for knowledge reuse

Fig. 6. Diagram of BPNN.

With all the efficient DMUs, estimation models can be established to simulate relationships between the product features and the efficient time arrangement. Each estimation model concerns the time estimation for an activity to manufacture a block in a certain cluster. Thus, in this stage, the input and the output are [L, W]^T and [MH, PD, st]^T, respectively. Since [L, W]^T alone is inapt to reflect the complexity of a block, the realm of the block cluster is considered. So if there are nc clusters and AL manufacturing activities, nc × AL estimation models will be used to cover the time arrangement knowledge. Regarding the way to reuse the knowledge, blocks are initially classified into specific block clusters based on their structural features (using the cluster centers' points

resulted from Stage 1), and after that the estimation model for each activity is applied consecutively. Finally, the man hour, the estimated duration and the slack time for all activities are obtained.

Neural networks are capable of mimicking multivariate non-linear relationships, which provides a new way to solve complex problems [12]. A very popular type of neural network is the backpropagation neural network (BPNN), which uses the gradient descent optimization algorithm to adjust the weights of neurons based on the prediction error [11,31]. Fig. 6 outlines the classical BPNN with three layers: the input layer, the hidden layer and the output layer. Each layer may have more than one neuron that sums the weighted information from neurons in the previous layer. In addition, except for neurons in the input layer, every neuron possesses an activation function to filter the summed data, making sure that the learned relationship is correct. The learning process of a BPNN can be described as dynamically adjusting the weights and thresholds to minimize the prediction error. It has been proven that a three-layer BPNN has the ability to simulate various non-linear relationships with the required accuracy [34]. However, the BPNN cannot handle discontinuous optimization criteria and is prone to getting stuck in local minima [35]. To overcome these disadvantages, researchers have introduced the genetic algorithm (GA) into BPNN for its abilities of global optimization, non-continuous function processing and converging in a short time [36]. Briefly, GA is called in each training loop to find the optimal weight and bias for each neuron through training additional BPNNs. The optimization is achieved by encoding the weights and biases of a BPNN as a chromosome whose fitness is the reciprocal of the prediction error; then, through crossover, mutation and selection in the evolution, the global best chromosome (with the minimum prediction error) is sought. Afterwards, the best chromosome (weights and biases) is used as the initial setting to train the wanted neural network. GA-BPNN avoids falling into local minimum traps, standing in remarkable contrast to the traditional BPNN. For more details on GA-BPNN, please refer to [35,36].

For the sake of accuracy, GA-BPNN models are trained in this context. In each model, there are two neurons (for [L, W]^T) in the input layer and three neurons (for [MH, PD, st]^T) in the output layer. Each model employs one hidden layer. Since the number of neurons in this layer impacts the final precision of the model, the neuron number neuronnum is chosen within [√(m + s)] ≤ neuronnum ≤ [√(m + s)] + b, where m and s are the numbers of neurons in the input layer and the output layer, respectively, and b is a random integer between 1 and 10 [10]. The final neuron number in the hidden layer is determined by the minimum of the mean absolute percentage error after running 50 times for each neuronnum.
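A simplified version of this hidden-layer sizing search is sketched below (Python with scikit-learn; it uses plain random restarts of an MLP instead of the paper's GA-based weight initialization, and the data arrays are placeholders), selecting the neuron count that minimizes MAPE over repeated runs.

import numpy as np
from sklearn.neural_network import MLPRegressor

def mape(y_true, y_pred):
    return np.mean(np.abs(y_pred - y_true) / np.abs(y_true))

def select_hidden_size(X, Y, runs=50, b=10):
    """Try hidden sizes in [ceil(sqrt(m+s)), ceil(sqrt(m+s)) + b] and keep the best model.

    X: (samples, 2) block features [L, W]; Y: (samples, 3) targets [MH, PD, st].
    GA-based weight initialization is replaced here by repeated random restarts.
    """
    m, s = X.shape[1], Y.shape[1]
    base = int(np.ceil(np.sqrt(m + s)))
    best = (np.inf, None, None)
    for hidden in range(base, base + b + 1):
        for run in range(runs):
            net = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=1000,
                               learning_rate_init=0.01, random_state=run)
            net.fit(X, Y)
            err = mape(Y, net.predict(X))
            if err < best[0]:
                best = (err, hidden, net)
    return best  # (training MAPE, hidden neuron count, fitted model)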


Fig. 7. Framework for the validation.

5. Experimental evaluation

To validate the proposed method, experiments were conducted based on real data from a partner shipyard in mainland China. The shipyard runs the main business of design and construction of marine vessels and offshore products, including Bulk Carriers, Oil Tankers and Container Ships. Key resources in the shipyard are outfitting quays, dry docks, and 800-ton and 600-ton gantry cranes, so the weight of a grand block does not exceed 600 or 800 tons. At present, there are seven fixed activities in the block manufacturing process, including pre-processing, laying-off & cutting, manufacture of small and medium sections, construction on the pin-jig site, integrity checking & pre-outfitting, painting, and erection (marked as A1, A2, ..., A7 below), respectively. The company has accumulated a pile of planning and scheduling data for standard engineering. However, the precision of the parent ship based method does not satisfy the current need.

Fig. 7 presents the framework for the validation. Firstly, data was gathered from several business databases through IBIM and, after checking completeness and reliability, the IBIMs of 489 blocks were finally selected. Then the IBIMs were randomly divided into two sets following the 8–2 principle: 391 IBIMs as the training set and 98 IBIMs as the evaluation set. The training set aimed at discovering the efficient time arrangement for all activities and learning their relationships with the blocks' features. In this set, the block feature data, the planned information and the actual execution data were mainly used. The evaluation set was applied to evaluate the effectiveness of the method based on the comparison of the estimated time with the actual activity duration. In this set, the blocks' features worked as the input of the trained models and the set of real activity durations was the evaluation baseline. Since the proposed method is completely new, it was necessary to conduct comparative experiments to check whether the three-stage method was advantageous. Based on this recognition, methods in this series without GA (NOGA), with the traditional SBM, with the normal entropy SBM (NENSBM) and without clustering (NOC), together with the actual duration trained scenario (ADT) and the parent ship based method (PSB), were tested, respectively. All experiments were run on a computer equipped with an Intel Core i7-4790 3.60 GHz processor and 8 GB of RAM. Details of the validation are given below.

5.1. Accuracy indicators

Before the validation, the accuracy indicators should be clarified. Prevalent accuracy parameters include the mean absolute percentage error (MAPE) [1,9], the mean sum of squared errors (MSSE) [12], and the mean balance relative error (MBRE) [37]. The closer these indicators are to zero, the more precise the estimation. Since MSSE has a similar function to MAPE, this paper adopts MAPE and MBRE as the accuracy indicators. MAPE tells the average error degree regardless of whether the estimated time is longer or shorter than the actual, while MBRE represents the average deviation from the actual situation. Additionally, if an estimation model is good enough, the estimation error should follow a normal distribution with a zero mean [11]. So the error mean (ē) should be one of the evaluation indicators, too. Upon comparing this research with other methods, it is rational to use the joint hypotheses test (F-test) based on the variances of errors (S²) of two estimation models to ascertain whether the precisions of the different methods show a remarkable difference. If the observed value of F is greater than the corresponding critical value F_α under a confidence coefficient α, it indicates that the two models have obviously different accuracies. Calculations of these parameters are shown as follows:

e_i = D_i^{Estimated} - D_i^{Actual}

\bar{e} = \frac{1}{n} \sum_{i=1}^{n} e_i

MAPE = \frac{1}{n} \sum_{i=1}^{n} \frac{|e_i|}{D_i^{Actual}}

MBRE = \frac{1}{n} \sum_{i=1}^{n} \frac{e_i}{\min(D_i^{Actual}, D_i^{Estimated})}

F = \frac{S_1^2}{S_2^2} = \left( \frac{1}{n_1 - 1} \sum_{i=1}^{n_1} (e_i - \bar{e})^2 \right) \Big/ \left( \frac{1}{n_2 - 1} \sum_{j=1}^{n_2} (e_j - \bar{e}')^2 \right)
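As a rough illustration (not the paper's MATLAB implementation), the indicators above might be computed as follows, assuming NumPy and SciPy; all variable names are illustrative:

```python
import numpy as np
from scipy import stats

def accuracy_indicators(d_estimated, d_actual):
    """Return the signed errors e_i, the error mean, MAPE and MBRE."""
    d_est = np.asarray(d_estimated, dtype=float)
    d_act = np.asarray(d_actual, dtype=float)
    e = d_est - d_act                                  # e_i
    e_bar = e.mean()                                   # error mean
    mape = np.mean(np.abs(e) / d_act)                  # mean absolute percentage error
    mbre = np.mean(e / np.minimum(d_act, d_est))       # mean balance relative error
    return e, e_bar, mape, mbre

def f_test(errors_1, errors_2, alpha=0.95):
    """Variance-ratio F statistic of two error samples plus its critical bounds."""
    e1, e2 = np.asarray(errors_1, float), np.asarray(errors_2, float)
    f_obs = np.var(e1, ddof=1) / np.var(e2, ddof=1)    # S1^2 / S2^2
    lower = stats.f.ppf(1 - alpha, len(e1) - 1, len(e2) - 1)
    upper = stats.f.ppf(alpha, len(e1) - 1, len(e2) - 1)
    return f_obs, (lower, upper)
```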

It is noticeable that the estimation result of this research comprises two time attributes ([PD, st]^T), while the comparison is made only against the actual duration (AD). Thus D_i^{Estimated} has two possible forms: (1) the estimated duration only, and (2) the sum of the duration and the slack time. As mentioned before, in the context of shipbuilding it is difficult to guarantee that an activity is completed strictly within a concrete and lean duration; the slack time is therefore necessary in the plan to absorb execution fluctuations. In this sense, the latter form is adopted here, namely D_i^{Estimated} = PD_i + st_i and D_i^{Actual} = AD_i. On the other hand, for a holistic view of the block manufacturing process, it is helpful to evaluate the estimation of all the activities as a whole. Thus, for the manufacturing process of a single block i, D_i^{Estimated} = \sum_{a=1}^{l}(PD_a + st_a) and D_i^{Actual} = \sum_{a=1}^{l} AD_a. As l = 7 in this case, the evaluation ultimately yields 7 result sets for the respective activities and 1 set for the holistic view.
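A minimal sketch of the holistic aggregation defined above, assuming each activity record carries its planned duration, slack time and actual duration (the field names are illustrative, not from the paper):

```python
def process_level_times(activities):
    """Aggregate activity-level figures into process-level totals for one block.

    `activities` is an illustrative list of dicts with keys 'PD' (planned duration),
    'st' (slack time) and 'AD' (actual duration), one entry per activity A1..A7.
    """
    d_estimated = sum(a["PD"] + a["st"] for a in activities)   # sum of duration plus slack
    d_actual = sum(a["AD"] for a in activities)                # sum of actual durations
    return d_estimated, d_actual
```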

5.2. Effectiveness of the proposed method

With the training set, the proposed K-Means clustering method was conducted first. The experts suggested that the inter-cluster distance term should dominate and set the corresponding parameter to 0.1. Experiments were performed using the algorithm shown in Section 4.2. Finally, the combination of three clusters, the ‘Correlation Distance’ metric and the ‘Sample’ start method was chosen because its clUtility was the minimum (0.67 vs. a maximum of 11.99). Table 3 lists the information of each block cluster.

Table 3. Block clustering result with the clustering parameter set to 0.1.

|                              | Cluster 1 | Cluster 2 | Cluster 3 |
|------------------------------|-----------|-----------|-----------|
| Count                        | 88        | 134       | 169       |
| Inner average distance (IAD) | 0.029     | 0.016     | 0.011     |
| Outer average distance (OAD) | 0.400     | 0.230     | 0.275     |

Then, the planning data and the production data of the clustered blocks were processed to conduct the DEA evaluation. Each entropy SBM model evaluated the time arrangement of a specific activity to manufacture a certain cluster of blocks. As there were 7 activities and 3 block clusters in this case, 3 × 7 instances of the proposed DEA model were built. Inside each model, a DMU was evaluated by the time data of an individual activity of a block: [MH, PD, st]^T and [W, L]^T were the inputs and the outputs, respectively, and [MH, DD, SD, W, L]^T was used for weight calculation with the algorithm stated in Section 4.3. Note that if an output slack variable exceeded 0.5, the given input in the plan could produce far more than the expected outputs, and such a planning practice would be too unrealistic to be allowed in a real circumstance. For this reason, a limit of 0.5 was imposed on the slack variables. Consequently, (7, 10, 20, 10, 14, 11, 24) out of 88 DMUs in block cluster 1, (8, 16, 11, 14, 15, 17, 42) out of 134 DMUs in block cluster 2, and (9, 20, 22, 12, 26, 17, 29) out of 169 DMUs in block cluster 3 were identified as efficient time arrangements for the seven respective activities in the shipyard. This also meant that, from the perspective of a single activity, (24, 46, 53, 36, 55, 45, 95) out of 391 DMUs enjoyed a preferable time arrangement. For the remaining inefficient DMUs of each activity, the corresponding slack variables were added to the inputs and outputs in proportion to the respective scale. Ultimately, a set of efficient time arrangements for every activity, including the planned duration (PD), slack time (st), welding length (L) and block weight (W) of all 391 training IBIMs, was obtained.

Using this result, GA-BPNN models were trained for further time estimation. In accordance with the previous step, there were 3 × 7 GA-BPNN models in total. For every GA-BPNN model, the input was [W, L]^T and the output was [MH, PD, st]^T of the activity to manufacture a certain type of block. The parameters used for GA-BPNN were as follows: a data division rule of 8:1:1 (training set : validation set : test set), a learning rate of 0.01, a maximum of 1000 iteration epochs, a precision goal of 1e−7, a maximum GA generation (maxgen) of 10, a GA population size (sizepop) of 30, a GA mutation probability of 0.1, and a GA crossover probability of 0.3. The fitness function was the inverse of MAPE. After repetitively running 50 times for each neuronnum, the GA-BPNN model with the best performance (the minimum MAPE) was selected. Fig. 8 exhibits the final MAPE of each selected estimation model. Even the largest MAPE was less than 0.01, so the models were apt for use.

Fig. 8. MAPE of the GA-BPNN models.

To evaluate the accuracy, the test set (98 IBIMs) was first classified into the three clusters based on the block features. Consequently, 85 of the 98 blocks fell into the first cluster, in marked contrast to the second cluster (only 3 blocks) and the third cluster (10 blocks). Next, the duration and the slack time of each activity of every block were estimated using the respective GA-BPNN model. After that, estimation errors were calculated according to the accuracy indicators described in Section 5.1. Table 4 shows the estimation accuracy of this research. It can be seen that the time estimation for A1 was the best, with the highest correct rate of 42.86%. Its MAPE was less than 0.5, indicating that the improper estimation basically did not exceed half of the needed time. Meanwhile, the MBRE was slightly less than the MAPE, which implied that some of the estimated time was less than needed (e_i < 0) but the shortage was not predominant. In short, the estimated result was unlikely to cause execution delays; if more preferable time arrangements were needed, managers only had to modify the estimated result slightly, otherwise the result was reliable enough for direct use. On the other hand, A4 had the minimum MAPE and MBRE, namely all individual errors were quite small. Even though the correct rate was not high, this method was suitable for the time estimation of A4. A6 was also acceptable, as ē was negligible and the average estimation deviation was less than 3 days. Despite a slightly lower accuracy, the estimation for A2 and A3, compared with the result of PSB (see Fig. 9), was clearly preferable. It was also observed that the estimation for A7 was amongst the best results of the different methods. However, the time estimation for A5, namely integrity checking & pre-outfitting, was poor. The MAPE reached 0.50 and the error mean was 4.30 days, which accounted for 18.86% of the average actual manufacturing time. This indicated that the estimated result was heavily unreliable. Nevertheless, as seen from Fig. 8, the GA-BPNN for A5 was well trained, with a MAPE of less than 0.01. So the reasons for the imprecision might be twofold: (1) the training data set could not well represent the inner law of how much time the activity required for lean manufacturing; (2) the execution of this activity contained many uncertainties. To find out the real reason, the Kolmogorov–Smirnov test was conducted, and it confirmed that the training set and the test set did not obey the same distribution.
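The distribution check mentioned above can be carried out with a two-sample Kolmogorov–Smirnov test; a brief sketch assuming SciPy, where the duration arrays are placeholders rather than the paper's data:

```python
import numpy as np
from scipy import stats

def same_distribution(train_durations, test_durations, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test: do both duration samples share one distribution?"""
    result = stats.ks_2samp(np.asarray(train_durations), np.asarray(test_durations))
    return result.pvalue >= alpha, result.statistic, result.pvalue

# Illustrative call for activity A5 (placeholder arrays, not the paper's data):
# ok, ks_stat, p = same_distribution(a5_train_durations, a5_test_durations)
```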

Table 4. Estimation accuracy of this research with n = 98.

| Activity                  | MAPE | MBRE  | ē (day) | Number of e_i = 0 | Correct rate | Average D_i^{Actual} (day) |
|---------------------------|------|-------|---------|-------------------|--------------|----------------------------|
| Activity 1 (A1)           | 0.32 | 0.25  | 0.44    | 42                | 42.86%       | 2.70                       |
| Activity 2 (A2)           | 0.34 | 0.31  | 2.66    | 9                 | 9.18%        | 11.0                       |
| Activity 3 (A3)           | 0.50 | 0.29  | 2.29    | 2                 | 2.04%        | 17.8                       |
| Activity 4 (A4)           | 0.17 | −0.03 | −1.29   | 3                 | 3.06%        | 30.8                       |
| Activity 5 (A5)           | 0.50 | 0.35  | 4.30    | 3                 | 3.06%        | 22.8                       |
| Activity 6 (A6)           | 0.29 | 0.05  | 0.15    | 10                | 10.20%       | 9.60                       |
| Activity 7 (A7)           | 0.39 | 0.03  | −1.00   | 4                 | 4.08%        | 42.9                       |
| The manufacturing process | 0.15 | 0.07  | 7.54    | 1                 | 1.02%        | 137.6                      |

By further investigating the actual activity durations of different blocks, the sample variances of the training data set and the test data set were found to be 40.32 and 70.84, respectively, while the mean durations were only 26.10 and 22.80 days. Therefore, the second reason was also true. To overcome these problems, a larger set of good training data should be obtained for better precision, and more managerial attention should be paid to A5 in order to reduce the fluctuation in its manufacturing time. From the general view of the whole manufacturing process, the MAPE and MBRE were rather small, and although the error mean was as high as 7.54 days, the additional allotted time was only about 5% of the average actual manufacturing duration (137.6 days).
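To make the training stage of this section more concrete, the following is a minimal NumPy sketch of the GA segment that searches initial weights and thresholds for one BPNN with fitness 1/MAPE (population 30, 10 generations, crossover probability 0.3, mutation probability 0.1). It is an illustrative re-implementation, not the authors' MATLAB code, and the subsequent backpropagation fine-tuning (learning rate 0.01, up to 1000 epochs, precision goal 1e−7) is omitted:

```python
import numpy as np

def unpack(vec, n_in, n_hid, n_out):
    """Split a flat parameter vector into layer weights and thresholds (biases)."""
    i = 0
    w1 = vec[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    b1 = vec[i:i + n_hid]; i += n_hid
    w2 = vec[i:i + n_hid * n_out].reshape(n_hid, n_out); i += n_hid * n_out
    b2 = vec[i:i + n_out]
    return w1, b1, w2, b2

def forward(vec, X, n_in, n_hid, n_out):
    """BPNN forward pass: one tanh hidden layer, linear output layer."""
    w1, b1, w2, b2 = unpack(vec, n_in, n_hid, n_out)
    return np.tanh(X @ w1 + b1) @ w2 + b2

def fitness(vec, X, Y, n_in, n_hid, n_out):
    """Fitness of an individual: the inverse of MAPE on the training data."""
    pred = forward(vec, X, n_in, n_hid, n_out)
    mape = np.mean(np.abs(pred - Y) / np.maximum(np.abs(Y), 1e-9))
    return 1.0 / (mape + 1e-9)

def ga_search(X, Y, n_hid, sizepop=30, maxgen=10, p_cross=0.3, p_mut=0.1, seed=0):
    """Evolve candidate initial weight/threshold vectors for one BPNN."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    dim = n_in * n_hid + n_hid + n_hid * n_out + n_out
    pop = rng.normal(scale=0.5, size=(sizepop, dim))
    for _ in range(maxgen):
        fit = np.array([fitness(ind, X, Y, n_in, n_hid, n_out) for ind in pop])
        pop = pop[np.argsort(fit)[::-1]]          # fitter individuals first (elitism)
        children = pop[: sizepop // 2].copy()     # offspring derived from the better half
        for child in children:
            if rng.random() < p_cross:            # arithmetic crossover with a random parent
                mate = pop[rng.integers(sizepop // 2)]
                a = rng.random()
                child[:] = a * child + (1 - a) * mate
            mask = rng.random(dim) < p_mut        # Gaussian mutation on a few genes
            child[mask] += rng.normal(scale=0.1, size=mask.sum())
        pop[sizepop // 2:] = children             # replace the worse half
    fit = np.array([fitness(ind, X, Y, n_in, n_hid, n_out) for ind in pop])
    return pop[np.argmax(fit)]                    # seed for backpropagation fine-tuning

# Illustrative usage: inputs [W, L] (2 features), outputs [MH, PD, st] (3 targets), 8 hidden neurons.
# best_init = ga_search(X_train, Y_train, n_hid=8)
```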

5.3. Comparison with other methods

To verify that the proposed approach was advantageous, comparative experiments with different methods were performed under the same circumstances. The experiments included the proposed method with an inner component replaced, namely a substitute for GA-BPNN (NOGA), two substitutes for the adapted entropy-based SBM (SBM and NENSBM), and omission of the clustering and evaluating stage (NOC). In addition, the estimation values obtained by directly training on the actual data (ADT) and by the parent-ship-based method (PSB) were also acquired. Fig. 9 exhibits the accuracy indicators of these methods for the seven activities and the whole block manufacturing process. It can be seen that the accuracy of the proposed method was almost always the highest, except for A3, A5 and the whole process. By contrast, ADT only outperformed for A3. Further, it was striking that the estimation accuracy differed from one activity to another even when the same method was applied. Almost all the methods achieved a good estimation for A1, with the average error mean being less than 0.5 day, whereas for A2 even the best performance had an error mean of 2.66 days.

Fig. 9. Comparison of errors resulting from different methods.

This verified that activities of different complexities would result in different estimation precisions. By adopting a confidence coefficient of 0.95, the critical values F_{0.95,97,97} = 1.3989 and F_{0.05,97,97} = 0.7767 of the F-distribution were obtained. If the observed value F was not within [0.7767, 1.3989], there was a significant discrepancy between the accuracies of the two methods. The comparison of these methods in terms of solution quality and speed is discussed below.

Firstly, the influence of applying GA was examined. The total time of the proposed method amounted to 2486 s, of which training the GA-BPNNs occupied over 76% (1910 s). In extreme cases, the additional GA segment ran b × maxgen × (sizepop + 1) times in order to find the optimal weights and thresholds for an individual BPNN; in this research, that number was 10 × 10 × (30 + 1) = 3100. Meanwhile, because the iteration epoch for training a single BPNN was set to 1000 and this research needed 21 GA-BPNNs, there were 3100 × 1000 × 21 = 6.5 × 10^7 more calculation loops in total, which resulted in a significant increase in computing time. Indeed, without the optimization procedure of GA (NOGA), the time was remarkably cut down to 42 s, nearly one-fiftieth of the former. However, it is clearly seen from Fig. 9 that, except for A5 with an error mean of 0.66 day, NOGA showed no advantage over the present method. Particularly for A2, A4 and A6, the error mean of NOGA was almost twice that of the proposed method. As regards the whole manufacturing process, the error mean, MAPE and MBRE of NOGA were all larger than those of the proposed method. Meanwhile, F equaled 2.348, so the accuracy of the proposed method was remarkably higher than that of NOGA.

Next, the different SBM models were examined. The time consumed by the SBM method, the normal entropy SBM [21] and this research differed only slightly (526 s vs. 528 s vs. 533 s). Furthermore, the number of efficient DMUs for each activity was also the same. However, scrutinizing the results revealed differences in the slack variables and the efficiency values. Because of data standardization, even 0.1 of a slack variable might represent 100 in the original measurement; as a result, the difference in slack variables affected the estimation precision, which is obvious in terms of the MBRE and the error mean in Fig. 9. In contrast to NENSBM, the proposed method was superior in all the activities except A5. This result verified the precondition of this research: if the manufacturing time varies sharply, the contribution degree to benchmarking should be lower rather than higher. On the other hand, the proposed method had a subtle advantage over SBM in single-activity time estimation, except that SBM was superior to all the other methods for A5. However, SBM seemed more suitable for the estimation of the whole manufacturing process, as its error mean was the smallest (less than 1 day) and its correct rate was the highest, reaching 3.06%. It is worth noting that SBM introduced an error mean of −7.30 days for A7, which contributed substantially to the final error mean of the whole manufacturing process.

Nonetheless, the error mean of our method for A7 was only −1 day and F was 2.35, indicating that our method was much better than SBM. Therefore, we could only conclude that SBM was the most advantageous method for the time estimation of A5, which saw strong fluctuations in manufacturing time.

Then, the method without clustering and evaluating (NOC) was investigated. Not partitioning the data space, NOC only trained 7 GA-BPNNs, which consumed 193 s. As analyzed before, the calculation of a GA-BPNN required 3.1 × 10^7 more loops than that of a normal BPNN in the worst case, so there was no doubt that training 7 GA-BPNNs was far faster than training the 21 GA-BPNNs of the proposed method. However, it can be seen from Fig. 9 that the precision of NOC was much lower than that of the proposed method.

In addition to the series of methods in line with this research, ADT and PSB were also tested. As drawn from Fig. 9, the estimation errors of ADT were the largest amongst the peer methods. Through analysis, one critical reason was that the law of the actual data was hardly captured with a relatively small sample, especially in consideration of potentially false data. If the actual manufacturing time of an activity varied a lot and the training data happened to be not good enough, there was no doubt that the performance of the neural networks would be unsatisfactory; in such a circumstance, other statistical methods were more attractive. In contrast, PSB gave more reliable results in spite of its noticeable imprecision.

Lastly, attention was paid to the clustering step, which would introduce some imprecision. K-Means, an unsupervised clustering method, was used since we intended to group intermediate products by structural features; in other words, the law of the data was more decisive than the expertise. A main drawback is that the correctness of the clustering result is not guaranteed with certainty. To avoid this as much as possible, an algorithm was designed to find the preferable clustering result for a concrete parameter value given by the experts. Consequently, block clusters with the minimum clUtility were eventually discovered, but at the cost of computing time. In extreme cases, using this algorithm consumed at most (clusterMax − 1) × dm × im × 10 − 10 = ([0.02 × 391] − 1) × 4 × 3 × 10 − 10 = 830 more runs and at least 2 × dm × im × 10 − 10 = 230 more runs than the non-use condition. In fact, the observed total consumption time was 0.80 s with the parameter set to 0.1, which was considered acceptable. Since this parameter depends on the experts, if it is not explicitly given there are two means to get the preferable clustering result: (1) traversing the parameter to find an objective result, or (2) discarding the parameter and adopting the minimum disin/disout as the criterion.

To examine the first means, a sensitivity analysis of the parameter was conducted. As clUtility was biased, disout/disin was additionally calculated for the preferable clustering result of each parameter value. The result is shown in Table 5.

Table 5. Sensitivity analysis of the clustering parameter.

| Parameter value | Count of clusters | Count of each cluster | Minimum clUtility | Distance ratio (disout/disin) | Time consumption (s) |
|-----------------|-------------------|-----------------------|-------------------|-------------------------------|----------------------|
| 0               | 3                 | 94; 135; 162          | <0.01             | 4.15                          | 1.11                 |
| 0.1             | 3                 | 88; 134; 169          | 0.67              | 15.98                         | 0.80                 |
| 0.2             | 3                 | 80; 129; 182          | 0.59              | 3.73                          | 3.38                 |
| 0.3             | 4                 | 40; 87; 130; 134      | 0.77              | 6.32                          | 4.02                 |
| 0.4             | 4                 | 40; 87; 130; 134      | 1.02              | 7.69                          | 3.96                 |
| 0.5             | 4                 | 81; 83; 96; 131       | 1.28              | 9.81                          | 3.93                 |
| 0.6             | 4                 | 36; 86; 131; 138      | 1.53              | 4.72                          | 4.14                 |
| 0.7             | 4                 | 81; 83; 99; 128       | 1.78              | 8.86                          | 4.05                 |
| 0.8             | 4                 | 74; 82; 112; 123      | 2.04              | 9.36                          | 4.04                 |
| 0.9             | 4                 | 81; 83; 96; 131       | 2.30              | 5.00                          | 4.20                 |
| 1               | 4                 | 40; 87; 130; 134      | 2.55              | 3.37                          | 4.03                 |

Total time consumed: 37.66 s

It can be seen that the preferable clustering result was only slightly sensitive to the parameter, ranging from 3 to 4 clusters as its value ascended from 0 to 1 in steps of 0.1. A parameter value of 0 with clUtility < 0.01 meant that there existed a clustering result with a maximum disout of more than 100, while a value of 1 with clUtility = 2.55 meant that a preferable clustering result could have a minimum disin of 2.55. So the ideal (upper-limit) distance ratio was more than 39.22. However, the highest observed distance ratio was only 15.98, which implies that the clustering process still has room for improvement. The second means was tested under the same circumstance, namely the same algorithm with disin/disout substituted for clUtility. The result was interesting: the preferable cluster number was also 3, with 169, 88 and 134 blocks classified into the respective clusters. The largest distance ratio was the same as that obtained using clUtility, but the computing time was 0.67 s, only about 1.8% of that consumed when using clUtility. Therefore, it is plausible to use the algorithm with disin/disout to rapidly acquire the preferable result if the experts do not give any suggestion for the clustering process.
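A sketch of the second means, assuming scikit-learn and SciPy; note that sklearn's KMeans uses Euclidean distance internally, so this only approximates the correlation-distance clustering used in the paper, and the dis_in/dis_out computations here are simplified stand-ins for the definitions of Section 4.2:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def select_clustering(X, k_candidates=(2, 3, 4, 5), metric="correlation", seed=0):
    """Pick the cluster count minimising dis_in / dis_out (tight, well-separated clusters)."""
    best = None
    for k in k_candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        centres = np.vstack([X[labels == c].mean(axis=0) for c in range(k)])
        # dis_in: average distance from each sample to its own cluster centre.
        dis_in = np.mean([cdist(X[labels == c], centres[c:c + 1], metric=metric).mean()
                          for c in range(k)])
        # dis_out: average pairwise distance between cluster centres.
        dis_out = cdist(centres, centres, metric=metric)[np.triu_indices(k, 1)].mean()
        ratio = dis_in / dis_out
        if best is None or ratio < best[1]:
            best = (k, ratio, labels)
    return best  # (cluster count, dis_in / dis_out, labels)
```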

5.4. Managerial implications

Key findings and experimental observations can be summarized as managerial implications that are beneficial for shipbuilding planners. Firstly, a more reliable time arrangement is obtained by taking the ship block, rather than the parent ship, as the reference object, so managers are encouraged to change their traditional working approach when making the block manufacturing schedule. Secondly, grouping the blocks into small clusters makes it easier to estimate the time accurately; planners will therefore benefit more if they integrate their classification expertise with the present clustering process, which is based on objective data. Thirdly, prior to seeking different estimation models for higher precision, activities (such as A5) that see significant time fluctuations require control measures to be taken in the execution environment. Finally, planners should understand that the way the historical data is analyzed (or processed) does affect the time estimation precision; this was verified by comparing the proposed method with NENSBM, NOGA, and NOC, respectively. Therefore, it is apt to compare the different results under the practical circumstance before making the final decision.

6. Conclusions

The large number of ship blocks poses a great challenge to previous activity time estimation methods. To overcome it, this paper presented a three-stage method to discover and reuse the knowledge of making efficient time arrangements for the manufacturing activities while considering the ship blocks’ features. The knowledge (the complex non-linear relationship between the block features and the efficient time arrangement) was excavated from the historical data and was then stored in neural network models, in contrast to the deterministic mathematical models or descriptive rules adopted in other studies. By using these neural network models, planners will be able to rapidly make a more appropriate schedule for the manufacture of a new ship, which reduces resource wastage and the risk of delivery delay. It has been verified that the proposed method outperformed several peer methods in terms of accuracy, except for those activities of high uncertainty.

Despite its potential, there are still some limitations that require further research. First of all, this method is designed for the block manufacturing process, which is relatively fixed and in which the slack time between two manufacturing activities is necessary; hence, its practicality in other fields remains to be seen. Secondly, it will be valuable to improve the clustering and training performance in following research. One direction is to introduce supervised learning methods and more expert experience into the proposed method, in addition to enlarging the training set with more valid practical data. For the sake of speed, revisions of the algorithms, particularly concerning the number of clusters, are in urgent demand. In addition, convenient ways to use this method should be studied, such as developing user-friendly software or combining the MATLAB files with existing planning tools.

Acknowledgement

The research programs “Large Cruise Research and Development Project (2017)” and “Research on Key Common Technologies towards Smart Manufacturing in Shipbuilding Industry” are funded by the Ministry of Industry and Information Technology of the People’s Republic of China [grant number 2016543], and this paper is also supported by the National Natural Science Foundation of China [grant number 51679059]. The authors are responsible for the contents of this publication. Besides, the authors would like to thank VISHWANATH POONEETH for his contribution to the writing quality.

Appendix A. Supplementary material

Supplementary data to this article can be found online at https://doi.org/10.1016/j.aei.2018.11.005.

References

[1] H. Golizadeh, et al., Automated tool for predicting duration of construction activities in tropical countries, KSCE J. Civil Eng. 20 (1) (2016) 12–22.
[2] R.Y. Zhong, et al., A big data approach for logistics trajectory discovery from RFID-enabled production data, Int. J. Prod. Econ. 165 (2015) 260–272.
[3] C. Hendrickson, D. Martinelli, D. Rehak, Hierarchical rule-based activity duration estimation, J. Constr. Eng. Manage. 113 (2) (1987) 288–301.
[4] K.H.R.P. Reviewer, A guide to the project management body of knowledge (PMBOK® guide)—fifth edition, Project Manage. J. 44 (3) (2013) 642–651.
[5] S.K. Lee, et al., Mining transportation logs for understanding the after-assembly block manufacturing process in the shipbuilding industry, Expert Syst. Appl. 40 (1) (2013) 83–95.
[6] C.D. Rose, J.M.G. Coenen, Automatic generation of a section building planning for constructing complex ships in European shipyards, Int. J. Prod. Res. 54 (22) (2016) 6848–6859.
[7] D. Lee, et al., Clustering and Operation Analysis for Assembly Blocks Using Process Mining in Shipbuilding Industry, Springer International Publishing, 2013, pp. 67–80.
[8] C. Reuter, et al., Improving data consistency in production control by adaptation of data mining algorithms, Procedia CIRP 41 (2016) 51–56.
[9] K.J. Kim, W.G. Yun, I.K. Kim, Estimating approximate construction duration of CFRD in the planning stage, KSCE J. Civil Eng. 20 (7) (2016) 1–10.
[10] J.Z. Qu Shipeng, Man-hour calculation methods of the block assembly for shipbuilding, J. Harbin Eng. Univ. 33 (5) (2012) 550–555.
[11] B. Liu, Z.H. Jiang, The Man-hour Estimation Models & Its Comparison of Interim Products Assembly for Shipbuilding, 2004, vol. 2, pp. 9–14.
[12] B. Liu, Z.H. Jiang, The intelligent man-hour estimate technique of assembly for shipbuilding, J. Shanghai Jiao Tong Univ. 39 (12) (2005) 1979–1983.
[13] R.Y. Zhong, et al., Mining SOTs and dispatching rules from RFID-enabled real-time shopfloor production data, J. Intell. Manuf. 25 (4) (2014) 825–843.
[14] P. Bubenik, F. Horak, Knowledge-based systems to support production planning, Tehnicki Vjesnik 21 (3) (2014) 505–509.
[15] J. Park, D. Lee, J. Zhu, An integrated approach for ship block manufacturing process performance evaluation: case from a Korean shipbuilding company, Int. J. Prod. Econ. 156 (2014) 214–222.
[16] H.B. Kwon, J.H. Marvel, J.J. Roh, Three-stage performance modeling using DEA–BPNN for better practice benchmarking, Expert Syst. Appl. 71 (2016) 429–441.
[17] A. Azadeh, R. Kokabi, Z-number DEA: a new possibilistic DEA in the context of Z-numbers, Adv. Eng. Inf. 30 (3) (2016) 604–617.
[18] R. Lin, Allocating fixed costs or resources and setting targets via data envelopment analysis, Appl. Math. Comput. 217 (13) (2011) 6349–6358.
[19] A. Charnes, W.W. Cooper, E. Rhodes, Measuring the efficiency of decision making units, Eur. J. Oper. Res. 2 (6) (1978) 429–444.
[20] S. Lim, J. Zhu, Incorporating performance measures with target levels in data envelopment analysis, Eur. J. Oper. Res. 231 (3) (2013) 790.
[21] H. Zhang, J. Yang, X. Su, Using entropy-based super SBM model for evaluating flexible manufacturing systems, Modern Manuf. Eng. 34 (7) (2014) 16–20.
[22] A. Amirteimoori, et al., Production planning in data envelopment analysis without explicit inputs, RAIRO Recherche Operationnelle 47 (3) (2013) 273–284.
[23] K. Tone, A slacks-based measure of efficiency in data envelopment analysis, Eur. J. Oper. Res. 130 (3) (2001) 498–509.



[24] Y. Zhou, et al., Environmental efficiency analysis of power industry in China based on an entropy SBM model, Energy Policy 57 (7) (2013) 68–75.
[25] M. Soleimani-Damaneh, M. Zarepisheh, Shannon’s entropy for combining the efficiency results of different DEA models: method and application, Expert Syst. Appl. 36 (3) (2009) 5146–5150.
[26] Y.W. Bian, Y. Feng, Resource and environment efficiency analysis of provinces in China: a DEA approach based on Shannon's entropy, Energy Policy 38 (4) (2010) 1909–1917.
[27] R. Kretschmer, et al., Knowledge-based design for assembly in agile manufacturing by using Data Mining methods, Adv. Eng. Inf. 33 (2017) 285–299.
[28] P. Geyer, A. Schlüter, S. Cisar, Application of clustering for the development of retrofit strategies for large building stocks, Adv. Eng. Inf. 31 (2017) 32–47.
[29] W.H. Hung, S.C.J. Kang, Automatic clustering method for real-time construction simulation, Adv. Eng. Inf. 28 (2) (2014) 138–152.
[30] Z.H. Cai, L.J. Feng, Rapid interim product man-hour ration estimation with artificial neural network method, Journal of East China Shipbuilding Institute 17 (2) (2003) 23–28.

[31] M. Chen, B. Pan, J. Liu, Man-hour calculation of working package using error backpropagation artificial neural network, Shipbuilding of China 10 (2) (2003) 65–73.
[32] S. Wold, K. Esbensen, P. Geladi, Principal component analysis, Chemometr. Intell. Lab. Syst. 2 (1) (1987) 37–52.
[33] P.S. Bradley, U.M. Fayyad, Refining initial points for K-means clustering, in: Fifteenth International Conference on Machine Learning, 1998.
[34] K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Networks 2 (5) (1989) 359–366.
[35] D.J. Montana, L. Davis, Training feedforward neural networks using genetic algorithms, in: Proc. of International Joint Conference on Artificial Intelligence, 1989.
[36] Z. Fu, et al., Using genetic algorithm-back propagation neural network prediction and finite-element model simulation to optimize the process of multiple-step incremental air-bending forming of sheet metal, Mater. Des. 31 (1) (2010) 267–277.
[37] C.H. Tan, et al., Application of fuzzy inference rules to early semi-automatic estimation of activity duration in software project management, IEEE Trans. Hum.-Mach. Syst. 44 (5) (2014) 678–688.
