Input data management in simulation – Industrial practices and future trends

A. Skoogh (a,*), T. Perera (b), B. Johansson (a)

a Chalmers University of Technology, Gothenburg, Sweden
b Sheffield Hallam University, Sheffield, UK

Simulation Modelling Practice and Theory 29 (2012) 181–192

Article history: Received 6 November 2011; Received in revised form 17 July 2012; Accepted 18 July 2012; Available online 6 September 2012.

Keywords: Simulation; Input data management; Data collection; Integration; Interface; Enterprise Resource Planning (ERP)

Abstract

Discrete Event Simulation has been acknowledged as a strategically important tool in the development and improvement of production systems. However, it appears that companies are failing to reap the full benefits of this powerful technology as the maintenance of simulation models has become very time-consuming, particularly due to the vast amounts of data to be handled. Hence, an increased level of automation of input data handling is highly desirable. This paper presents the current practices relating to input data management and identifies the further research and development required to achieve high levels of automation. A survey of simulation users shows that there has been progress in the use of automated solutions compared to a similar study presented by Robertson and Perera in 2002. The results, however, reveal that around 80% of the users still rely on highly manual work procedures in input data management.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Discrete Event Simulation, referred to as simulation hereafter, has proven to be an excellent modelling tool for analyzing and improving the performance of manufacturing systems. Jahangirian et al. [1] state that "Over 60 years of simulation presence in the areas of manufacturing and business, has led to a wide spectrum of successful applications in different areas such as design, planning and control, strategy making, resource allocation, training, etc.". The proliferation of affordable and user-friendly simulation systems has immensely contributed to this rapid growth of applications. Ever-increasing competitiveness and the need to reduce costs and lead times continue to drive the wider use of simulation.

Producing credible simulation outputs within acceptable timescales is a key challenge. In order to ensure that all key steps are followed, various simulation project management frameworks have been produced [2–4]. Although there are some slight variations, all simulation project management frameworks embody key steps such as input data collection and analysis, model building, validation, and verification. Most of these steps interact with either input or output data. Consequently, management of data within simulation projects often becomes a major challenge. Multiple scenario analysis, a typical use of simulation models, further escalates this problem as further data sets are added.

Within the context of this data management problem, collecting, analyzing and systematically recording simulation input data are vitally important. As the driver of simulation models, input data sets must be complete and accurate. If simulation models are to be re-used, it is also necessary to keep the data sets up-to-date. This is a time-consuming process and, consequently, the re-use of simulation models is often abandoned [5].

* Corresponding author. Address: Department of Product and Production Development, Chalmers University of Technology, 412 96 Gothenburg, Sweden. Tel.: +46 (0)31 772 48 06. E-mail address: [email protected] (A. Skoogh).
1569-190X/$ - see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.simpat.2012.07.009


Previous publications in the area have mainly focused on increasing efficiency in separate steps of the input data management procedure. Examples of such areas are: identification of input parameters [6], collection of data samples [7], and interoperability between data sources and simulation models [8]. However, there is also literature outlining efficient work procedures connecting these steps on a project level [9], and other publications describing test implementations increasing the level of automation in input data management [10]. Robertson and Perera [11] extensively discuss the issues involved in handling input data and explore options available to gather and record input data. Their study also included a survey of data management practices and is one of few publications taking a comprehensive approach to the input data management procedure, categorizing different data input methodologies such as those described in the previous paragraph. Since then there have been major shifts in simulation-related software in terms of managing simulation data. Therefore, it is timely to review whether those shifts have made an impact on practices. This paper aims to identify and discuss the changes in practices resulting from advances in the input data management process itself and in associated support systems such as manufacturing databases and simulation software.

The paper is structured as follows. Section 2 reviews previous literature related to input data management in simulation. The focus is on describing a paper by Robertson and Perera [11], which presents a reference survey on industrial practices in data collection during the late 1990s. In addition, new approaches and recent developments in simulation software packages are reviewed. Section 3 presents the design of a survey which the authors handed out during the Winter Simulation Conference 2010. The aim of this survey was to map current practices in input data management and identify possible progress since the reference study from 2002 [11]. Section 4 summarizes the survey results and Section 5 discusses and compares findings from the reference survey and the survey presented in this paper. Section 6 contains the authors' suggestions for further research and development within the area of input data management for simulation and Section 7 concludes the paper.

2. Input data management

In this paper, the term data is defined as quantitative facts about events, such as durations describing a processing time at an assembly station. A closely related term is information, which here refers to data further processed for use in simulation models (e.g. contextualized, categorized, corrected, calculated, and condensed [12]). In our simulation-adapted view (the original reference does not specifically address simulation issues), the sender of the information is either a person (the model builder or someone else in the project team) or a computerized data source, depending on the data input methodology. The receiver of information is here the simulation model, which requires categorized data in a specific format to "understand" how to use them during simulation. These terms are, however, used in slightly different ways by various researchers and practitioners, which is further discussed in Section 5.

As shown in Fig. 1, input data may come from a variety of sources. Corporate Business Systems (CBSs) such as Enterprise Resource Planning (ERP) or Manufacturing Execution Systems (MESs) typically host most of the operational data. As an example, ERP systems deployed in manufacturing environments can provide key operational data such as machining times, set-up times, and bills-of-materials. Simulation models may also use project specific data, which typically come from the simulation project team (e.g. including model builders and industrial engineers).

Fig. 1. Input data sources.


They can include data items such as sales forecasts and future manpower levels. There are also instances where data need to be gathered from external reference systems. Data related to new machinery, for example, may have to be obtained from machine tool manufacturers. The three sources discussed so far may not provide all necessary input data. In such situations, model builders need to observe processes and gather relevant data (collected data). For example, it may be necessary to collect a large number of machining time samples to generate credible statistical input distributions or to have a sufficient number of samples for other possible input representations (e.g. traces, bootstraps or empirical distributions).

Based on the definitions of data and information, it should be mentioned that data can be collected from all sources included in Fig. 1. Primary data are naturally related to collected data in the same figure. Secondary data, initially collected by others than the project team, can be found either in the CBS or in external reference systems, or collected as project specific data. However, under favorable circumstances, it is also possible to find information in the same sources as for secondary data. The distinction here is that secondary data need further processing, whereas information is ready to use in a simulation model; compare the definition of information at the beginning of this section. For example, in some companies, Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR) for machines may already be collected, processed, and thereby ready to use (information) from maintenance systems. However, in most organizations, the same information needs to be prepared from breakdown samples in automated collection systems or MES (secondary data).

2.1. Data issues

The existence of multiple data sources, in combination with several inherent difficulties in the data collection process, presents a number of challenges [11–14]:

- Data accuracy – as data come from a variety of internal and external sources, the accuracy of data may be questionable. This means extra efforts are necessary to investigate the data sources and the format of collected data in order to ensure data accuracy.
- Data correctness – there are several possible reasons for questioning whether correct data have been collected. One example is possible communication problems in automated data collection systems; another is incorrectly labeled events in collection systems based on human involvement.
- Data duplication – the same data items may come from different sources, sometimes also referred to as data redundancy. For example, the machining time for a specific job may come from ERP, but it is quite possible that machine operators have their own records of machining times, possibly more detailed, updated, or accurate than in the ERP system. When data duplication occurs, model builders need to make a judgment on the most reliable source of data.
- Data consistency – data describing the same event in a simulation model may not always be consistent if they are present in different data sources; compare data duplication or data redundancy above. If there is a big difference between these data, the model builder's judgment on the most reliable data source is even more important. Further data analysis is needed to identify the root cause of the inconsistencies.
- Data timeliness – different data from various sources are required depending on model purpose. A simulation model, continuously used for production systems development, can live through several iterations in both systems design and systems management. Therefore, model builders have to collect data on multiple occasions from different systems. For example, a machining time may be estimated by the machine vendor and available in the ERP system during systems design, or gathered directly from the shop floor for systems management purposes.
- Data validity – for various reasons, collected data may not describe the correct behavior of the real-world system. Because data are often samples of historical events, validity checks are necessary to make sure that the data are not obsolete, describing previous system states.
- Data reliability – similar to data validity, but refers more to being reliable in the eyes of the model builder and other stakeholders.
- Data completeness – in contrast to problems related to data duplication and data consistency, it is also common not to find all necessary data in existing sources. This usually leads to additional gathering or even assumptions from process experts.

2.2. Data input methodologies

Once the necessary data items are identified and located, the next challenge is to find the best way to process and store data. As shown in Fig. 2, four possible methodologies can be used by simulation teams according to Robertson and Perera [11].

Fig. 2. Alternative data input methodologies.

Methodology A has for many years been the most popular approach [11]. The project team, and especially the model builder, manually compiles and processes data from several sources. This work typically includes gathering of primary data from the shop floor, collecting data from computer-based systems, and interviewing individual domain experts. After further processing, the information (processed data) is directly recorded within the simulation model. Although this is a simple approach, it presents a range of problems. Information is typically scattered within the simulation model; hence, locating specific items can be very time-consuming. It is also difficult to spot mistyped values, and flexibility for updating data between model iterations is limited. In addition to its simplicity, the major benefit of Methodology A is that data are verified by the model builder continuously throughout the entire process.
Note that there have been substantial advances in simulation software packages during the last decade. Hence, some of the disadvantages listed above have been mitigated or eliminated since the methodology was initially described [11]. These advances are described further in Section 2.4.

Methodology B overcomes some of the limitations of Methodology A. Instead of direct entry, data and information are stored in an external source, normally a spreadsheet. This makes data processing and verification much easier, which increases flexibility and facilitates model updates. Information is transferred to the simulation model via direct links or VBA (Visual Basic for Applications) routines. However, the model builder and/or the project team still perform the collection and processing of data manually. Methodology B is currently a very popular approach as it enables simulation model users to process and modify input data and information through a spreadsheet-based interface. Moreover, models can be run without specific knowledge and experience in model building.

Methodology C is an extension of Methodology B where the external data sources are linked to the model data storage in order to enable automated updates. The external data sources are typically databases within the CBS, and the model data are usually stored in an intermediary simulation database. Since the data are stored externally to the model, the same flexibility as for Methodology B applies. Thus, the intermediary step enables data processing and provides the possibility to set up what-if scenarios despite the close integration with external sources. The increased level of automation (compared with Methodologies A and B) holds substantial potential to reduce the time taken to manage the input data. The major difficulty with increased integration and automation is to ensure data quality and convince the stakeholders that the results are credible. Despite the advantages outlined above, only one industrial test implementation was identified by Robertson and Perera [11].

Methodology D eliminates the need for an intermediate data source because the simulation model is directly linked to the relevant data sources. During model development, system entities are referred to sources within the CBS. This automated connection dramatically reduces time, effort, and errors, given that all necessary data are correct and available. The primary drawback is the limited availability of detailed simulation data in major databases (e.g. ERP) [15]. Consequently, Methodology D implementations tend to be extensive and complex. Additionally, there is a risk of data duplication due to the substantial number of connections to various data sources. Robertson and Perera [11] identified one real-world case of this scenario. The implementation was intended for systems management purposes.
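To make the contrast between Methodologies A and B concrete, the following minimal Python sketch compares hard-coded model parameters with parameters loaded from an external spreadsheet export. It is an illustration only and not taken from the paper or any specific simulation package; the file name process_parameters.csv, its column layout, and the station identifiers are assumptions.

```python
import csv
import random

# Methodology A (illustrative): information is typed directly into the model
# code. Literals like this one end up scattered across the model and are hard
# to locate, verify, and update between model iterations.
CYCLE_TIME_STATION_1_S = 42.0  # seconds, entered manually by the model builder

def load_parameters(path="process_parameters.csv"):
    """Methodology B (illustrative): information is kept in an external
    spreadsheet export and read into the model at initialisation.

    Assumed file layout (one row per station):
        station,cycle_time_s,mttr_min,mtbf_min
        station_1,42.0,12.5,480.0
    """
    parameters = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            parameters[row["station"]] = {
                "cycle_time_s": float(row["cycle_time_s"]),
                "mttr_min": float(row["mttr_min"]),
                "mtbf_min": float(row["mtbf_min"]),
            }
    return parameters

def sample_cycle_time(parameters, station):
    """Example of how the model could use the loaded information, here as an
    exponentially distributed processing time with the stored mean."""
    return random.expovariate(1.0 / parameters[station]["cycle_time_s"])

if __name__ == "__main__":
    params = load_parameters()          # the spreadsheet acts as the interface
    print(sample_cycle_time(params, "station_1"))
```

In a Methodology C or D set-up, load_parameters would instead query an intermediary database or the CBS directly, while the model logic itself could remain unchanged.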


2.3. Recent development and trends

One emerging area in input data management is the development and implementation of Manufacturing Data Acquisition (MDA) solutions. MDA solutions usually include the necessary equipment for raw data collection, integrated with an intermediary database for data storage and basic data processing functionality; see for example [16]. This combination, providing close control over the entire input data management procedure, is one recent example that can be classified as Methodology C described above. A similar approach to MDA, though excluding the raw data collection, is the GDM-Tool (Generic Data Management tool), which automates the collection of secondary data, data processing, and the supply of data to simulation models [17].

Another area, constantly in progress, is the integration of engineering tools in Product Lifecycle Management (PLM) packages [18]. Such closely integrated software environments provide very close connections between data sources and simulation models and can therefore be categorized as Methodology D. Recent developments in utilizing PLM software are, for example, described and elaborated on for twelve different packages in [19]. However, these developments foremost consider geometrical data and transitions of information related to product geometries. There are few previous publications focusing on the manufacturing shop floor data needed for the type of simulation described here. One such approach is described in Kim et al. [20], and their scope and implementation of a PLM environment for simulation is very interesting but not widely spread. This conforms to the relatively low dissemination of Methodology D solutions reported in Robertson and Perera [11].

Most other advances in input data management are related to separate steps of the input data management procedure. Some examples are the use of RFID (Radio-Frequency IDentification) [21] and Wikis [22] in data collection, and the development of standards facilitating the interoperability between data sources and simulation software packages [8].

2.4. Support for input data management in simulation software packages

The support for input data management provided by commercial simulation software packages has for years been described as insufficient. Perera and Liyanage [6] reported that simulation practitioners consider the "limited facilities in simulation software to organize and manipulate data" as one of the major pitfalls. Such missing facilities, required in data processing, include: extraction and categorization of data points, identification and removal of erroneous samples, correction of data formats or individual values, calculations (e.g. the time between failures), condensation to a suitable representation, etc. In addition to these common data processing operations, Robertson and Perera [11] argue for better integration between simulation software and ERP systems to facilitate complete automation of data collection (Methodology D).

Fortunately, there have been advances during the last decade. Several simulation software packages now provide solutions to facilitate the bi-directional transfer of data with external sources. The embedding of VBA is one example, and some vendors also provide direct links to spreadsheets and databases without the need for VBA. As a result, the number of case studies describing simulation models fed with deterministic data from ERP systems or other external sources has increased.
It is, however, more difficult to find similar implementations covering the more extensive handling of data for stochastic parameters. Mertins et al. [23] exemplify that it is often considered more appropriate to use an intermediary application (e.g. a database or spreadsheet) for compiling data from different sources and for performing the required data processing operations. It should be repeated here that ERP systems seldom contain all necessary data for simulation [15]. Another substantial improvement is that most simulation packages have implemented support for distribution fitting [24], either by means of self-developed analysis functionality or in cooperation with special purpose software, e.g. Stat::Fit® [25] and ExpertFit® [26]. However, it is still a complex task to automate the complete chain of extraction, correction, calculations and condensation within the simulation software, especially for stochastic representation of varying processing times or breakdown patterns when the raw data contain an extensive number of samples. Therefore, the distribution fitting functionality is frequently applied as a separate step.

Altogether, these advances facilitate extraction of data from external sources, basic data processing, and storage of data and information in centralized structures within simulation models. These improvements enable experienced simulation engineers to apply approaches with substantial manual involvement (e.g. Methodology A) more successfully than described by Robertson and Perera in 2002 [11]. The same advances also increase the possibilities to implement more automated solutions (Methodologies C and D), relying on continuous exchange of data with external sources.
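As a simplified illustration of the processing chain discussed above – extraction, calculation, and condensation – the sketch below turns a raw breakdown log (secondary data) into MTBF and MTTR figures and exponential-distribution parameters ready for a simulation model. The log format and timestamps are assumptions, and in practice the fitting step would typically be handled by the simulation package or by tools such as Stat::Fit or ExpertFit, which also test alternative distribution families.

```python
from datetime import datetime
from statistics import mean

# Raw breakdown log as it might be exported from an automated collection
# system or MES (secondary data); format and values are assumed for illustration.
breakdown_log = [
    ("2012-03-01 06:12", "2012-03-01 06:41"),  # (failure start, repair finished)
    ("2012-03-01 14:03", "2012-03-01 14:19"),
    ("2012-03-02 09:55", "2012-03-02 10:47"),
]

def parse(stamp):
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M")

# Extraction and calculation: time-to-repair (TTR) per stoppage, and
# time-between-failures (TBF) from the end of one repair to the next failure.
events = [(parse(start), parse(end)) for start, end in breakdown_log]
ttr_minutes = [(end - start).total_seconds() / 60 for start, end in events]
tbf_minutes = [
    (events[i + 1][0] - events[i][1]).total_seconds() / 60
    for i in range(len(events) - 1)
]

# Condensation: MTBF and MTTR as ready-to-use information for the model.
mtbf = mean(tbf_minutes)
mttr = mean(ttr_minutes)

# One possible stochastic representation: for an exponential distribution the
# maximum-likelihood rate is simply 1/mean, so the parameters follow directly.
failure_rate_per_min = 1.0 / mtbf
repair_rate_per_min = 1.0 / mttr

print(f"MTBF = {mtbf:.1f} min, MTTR = {mttr:.1f} min")
print(f"Exponential rates: failure {failure_rate_per_min:.4f}/min, repair {repair_rate_per_min:.4f}/min")
```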


3. Survey design

Given the developments in simulation software and other support systems for input data management, the authors wanted to investigate what impact these advances have made on simulation projects. It was therefore decided to repeat a survey published in 2002 [11]. The new survey was initiated during the Winter Simulation Conference (WSC) in Baltimore, Maryland, USA, in December 2010. WSC is one of the world's major forums for simulation specialists representing industry, academia and government. A questionnaire was given to conference attendees, and the industrial representatives were asked to answer 12 questions (see Table 1) about the simulation procedures at their specific companies, mainly focused on input data management. Researchers with a close connection to industry (i.e. through a recent case study) were also asked to complete the form with information obtained at the case study company.

Table 1. Questions included in the survey.

1. Please specify your major areas of application for DES.
2. What makes you to use simulation in your business?
3. Do you apply a structured approach to input data management?
4. Which is the main source of input data to DES models?
5. Which sources of input data are commonly used?
6. What is your major approach for selection between duplicate data sources (if you have multiple sources for the same data item)?
7. How is data accuracy, reliability and validity mainly assured?
8. Models develop and evolve, how is data validity maintained?
9. How is data (information) supplied to the simulation model?
10. Where is the majority of data (information) held, i.e. where does the processed data reside?
11. Consider the entire input data management process, which is the most common methodology? Please explain your answer!
12. Which methodology do you think will be used in ten years? Please explain your answer!

Reminders were sent out by e-mail containing a link to a web questionnaire (an exact copy of the original form). Responses from 86 companies were collected, covering different business areas such as manufacturing, logistics, health care, and military applications. Data were analyzed using descriptive statistics to show how many companies use the different approaches to automated input data management. Note that there are significant similarities between the questions designed by Robertson and Perera [11] and the questionnaire used in this study. This overlap enables comparison between the studies in order to map the progress of input data management during the last decade.

4. Survey results

The main data collected from the survey are summarized in Table 2. Additional reflections on interesting findings are also presented to prepare for further elaboration in the discussion section. Some of the questions in the survey are stated in the same or a very similar way as was done by Robertson and Perera [11] around ten years ago. The correlation and development during the last decade regarding industrial use of simulation are then presented. Additionally, development trends and future outlooks for the use of different data input methodologies in industry are presented based on the answers to questions 11 and 12 in Table 1.

In this survey, as well as in Robertson and Perera [11], most respondents represent companies from the manufacturing area. Manufacturing applications are therefore presented separately in Table 2. However, there are also many answers from areas such as health care, logistics and military. The reader can easily see that the difference in methodologies and work procedures between the various application areas is limited.

Some findings from Table 2 need to be highlighted and further described. The results of question 3 show that almost two out of three companies lack structured procedures for input data management and, thus, adapt their work procedure to specific model characteristics. Among companies using structured approaches, the most common are data templates and checklists. Another interesting finding is that the main source of data is reported to be a computer-based system in 65% of the companies (74% for manufacturing), which is promising for automated input data management. However, it is also reported that several sources are required to find all necessary data for simulation. A majority of companies is dependent on manual gathering and people-based systems (e.g. interviewing domain experts). A detailed data analysis shows that 80% of the companies reported that more than one type of data source is needed to satisfy the extensive data requirements in dynamic simulations.

Table 3 shows the most common motivations for using Methodologies A–D, stemming from the survey results. Comments on the future use of these methodologies show drawbacks as well as positive aspects, as presented in Table 4. Both Tables 3 and 4 show actual comments from the respondents, which means that statements can contradict each other. The variations are typically dependent on how the respondent is utilizing simulation and on the software choices.

Fig. 3 shows the trend in data input methodologies used in 2000 and 2010, and the prediction for 2020. The data from 2000 are described in Robertson and Perera [11].
The other two data sets are collected from the questionnaire results, namely questions 11 and 12; see the questions in Table 1 and the results in Table 2. In 2000, Methodology A (manual input directly into the model) was the most used approach, at about 60%. Ten years later, the most frequently used approach is Methodology B (spreadsheet connected to the model), which is in use by just above 60% of the practitioners. The prediction for the future shows that increasingly automated data treatment is to be expected. Methodology C (intermediary database connected to the model) is predicted to be the most popular, with just above 40% of the practitioners. The increasingly automated data management does, however, have some disadvantages expected by some practitioners; see Table 4. These problems create a more doubtful future scenario, which is further discussed in Section 5.

Another interesting finding from the comments in the survey results is the need for automated connections to public databases and external systems for input data.


Table 2. Complete questionnaire results (Mfg. = Manufacturing companies). Values are given as All / Mfg.

1. Application area (86 responses, of which 35 manufacturing):
   35 Manufacturing, 14 Health care, 11 Logistics, 9 Military applications, 4 Finance and business, 4 Academia, 3 Human resources, 2 Energy, 4 Other

2. Why use simulation?
   35% / 40% – Simulation is used to address a specific business need such as design of a new factory. Model is not re-used once the project is completed
   57% / 54% – Simulation is regularly used to improve business operations. Models are often re-used
   8% / 6% – Use of simulation is mandatory within the business in every improvement project

3. Structured approach to input data management?
   44% / 37% – Yes (if yes, please specify what is used)
   56% / 63% – No

4. Main source of input data.
   18% / 14% – Manual gathering (e.g. stop watch, movie recording)
   15% / 11% – People based systems (e.g. interviews, expert knowledge)
   2% / 0% – Paper based systems (brochures, etc.)
   32% / 34% – Local computer based systems (e.g. spreadsheets)
   33% / 40% – Computer based corporate business systems (e.g. ERP, MES, PLM)

5. Common sources of input data.
   53% / 57% – Manual gathering (e.g. stop watch, movie recording)
   66% / 74% – People based systems (e.g. interviews, expert knowledge)
   16% / 20% – Paper based systems (brochures, etc.)
   67% / 80% – Local computer based systems (e.g. spreadsheets)
   58% / 77% – Computer based corporate business systems (e.g. ERP, MES, PLM)

6. Approach for selection between duplicate data sources.
   8% / 3% – Data duplication is never encountered
   19% / 29% – Select the most recent data
   9% / 9% – Base the selection on personal experience
   18% / 20% – Combination of data sources
   33% / 37% – Base the selection on team knowledge
   11% / 3% – Select data most local to the source/origin
   2% / 0% – Other

7. Methods for assuring data accuracy, reliability and validity.
   19% / 29% – Interviewing area experts
   13% / 9% – Basic "sanity" checks
   7% / 3% – Personal experience
   5% / 3% – The internal or external customer's responsibility
   52% / 54% – Model validation runs
   4% / 3% – Other

8. How is data validity maintained?
   21% / 9% – Continuous manual efforts for data collection
   27% / 31% – Manual efforts for data collection, only initiated when the model will be used
   17% / 9% – Automated collection for parts of the data
   19% / 14% – Continuous automated collection of all necessary data
   15% / 20% – Models are not maintained and re-used

9. Supply of data to the simulation model?
   23% / 14% – Manually written in the model code
   48% / 57% – Via an external spreadsheet (automatically connected to the model) or similar
   23% / 23% – An off-line database automatically connected to the model
   5% / 3% – Direct link between corporate business systems and simulation model
   1% / 3% – Other

10. Where is the majority of data stored?
   24% / 20% – In the simulation model
   2% / 0% – In a paper based system
   63% / 74% – In a local computer based system (e.g. a spreadsheet)
   10% / 6% – A computer based corporate business system (e.g. ERP, MES)
   1% / 0% – Other

11. Current input data management methodology.
   17% / 17% – Methodology A
   61% / 63% – Methodology B
   21% / 17% – Methodology C
   1% / 3% – Methodology D

12. Input data management methodology used in 10 years.
   4% / 3% – Methodology A
   20% / 18% – Methodology B
   40% / 41% – Methodology C
   37% / 38% – Methodology D


Table 3. Comments on present methodology, collected from question 11 (used for/since).

Methodology A:
– The connectivity is not easy in other methodologies
– Small scale models
– Few data items needed
– Elementary
– More accurate
– Simpler projects

Methodology B:
– Common business practice
– Analyze data very clearly
– Expert providing interface for others
– Lack of experience in automated tools
– Effortless in terms of computer knowledge
– Supported by simulation software
– Unavailable or inconsistent data in ERP systems
– Security reasons

Methodology C:
– Reduce risk to tamper data
– Easy to implement and update data
– Presently the best technology available
– Most comprehensive solution
– Most automatic that is feasible
– Possible to set up what-if scenarios
– Extensive amounts of data

Methodology D:
– All the software modules within a company can communicate with each other. This also includes DES software

Table 4. Comments on the use of future methodologies, collected from question 12.

Pros:
– Methodology A: Elementary; Manual methods will still be used
– Methodology B: Easy input/output simulation as black box; Easy to use for the masses; Convenience; Workability
– Methodology C: Less time consuming; ERP, MES and digital factory systems will merge into Enterprise Lifecycle Systems; Data not tampered
– Methodology D: Faster and more transparent; Data not tampered

Cons:
– Methodology A: Too much work; Cumbersome; Danger to tamper data; No support for data processing
– Methodology B: No real live update of most recent data; Ad hoc; Unstructured
– Methodology C: Intermediate database necessary; Data processing required; Too difficult
– Methodology D: Too difficult; No need to automate all steps; Manual interaction necessary; Data availability

Fig. 3. Trends in data input methodologies applied in 2000, 2010 and prediction for 2020.

No question on this issue was included in the questionnaire; however, the need for simulation data in external databases, e.g. nationwide healthcare databases and open web data, was raised by a few respondents. Examples of other such public databases are ELCD (European reference Life Cycle Database) [27], UPLCI (Unit Process Life Cycle Inventory) [28], and EcoInvent [29], which contain data traditionally used in Life Cycle Assessment (LCA) studies.

5. Discussion

The aim of this paper is to map the current practices in input data management for production simulation. Experiences from 86 companies worldwide were collected using a questionnaire handed out during the Winter Simulation Conference 2010 (WSC'10).


The results indicate some advances but also highlight several problems with the integration of major business systems with simulation models. Initially, the authors' main focus was to provide an update of solutions for automated input data management in manufacturing simulations. However, the results of the survey consistently show that there is no substantial difference between the situation in the manufacturing industry and the other application areas represented at WSC'10. This means that the results may well be of interest for increasing efficiency in, for example, health care, logistics and military simulations.

The main findings show that manual involvement is still significant in input data management. 80% of the companies rely on Methodologies A and B, which include manual data collection and processing. These methodologies are mostly selected because they are efficient and straightforward for small-scale models in companies not using simulation as a desktop resource. Methodology B is also popular in cases where simulation experts are building models with spreadsheet interfaces for internal or external customers. These needs, together with the desire for continuous and transparent data verification, will certainly remain. Therefore, the authors expect Methodologies A and B to be popular for many years to come, which is supported by the fact that around 20% of the companies plan to keep their current manual approach. However, 20% of the companies have already implemented automated connections to the required data sources, and the trend is increasing (Fig. 3).

Companies have several alternatives when selecting a suitable solution for increasing efficiency in simulation studies. The survey, however, shows that Methodology C implementations hold the highest potential to succeed given present circumstances. Dynamic simulation models require very detailed data, seldom found in major business systems, according to both this survey and previous research [11,15]. Consequently, there is a need for combining sources within the CBS with local systems providing detailed processing times, stop times, etc.; see for example the diversity of sources reported in Table 2. Moreover, the respondents of this survey also highlight the possibility to process and modify the data before supplying them to the simulation model as a major advantage of Methodology C. This is often required to extract the correct information from raw data, but also in order to set up what-if scenarios for simulation analysis. Additionally, the intermediary step can also be utilized for security reasons, ensuring that interoperability problems affect neither data essential for other engineering applications nor the flow of critical information on the shop floor.

Looking further into Methodology C (Fig. 4), there are several alternative solutions presented in recent literature. MDA [16], for example, is a popular solution often including the technical equipment for the actual raw data collection. Another example, focusing on the extraction and processing of already available raw data, is the GDM-Tool presented by Skoogh et al. [17]. It should also be mentioned that there have been advances in the data management support provided by commercial simulation software packages, mainly in the interfaces to common database formats as well as in data processing and analysis. These features can be utilized in the set-up of both Methodology C and D applications.
However, close connections to major ERP vendors have not been established, which is also highlighted in the results of this survey. Therefore, these features are, according to the authors' experience, more often used for stand-alone purposes in work procedures categorized as Methodology A or B.

Fig. 4. An example of Methodology C with an intermediary database extracting the necessary data tables (for a specific model) from an ERP system and a complementary local data source.
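A minimal sketch of the set-up in Fig. 4 is given below, using SQLite as the intermediary simulation database. The table names, columns, and in-line rows are assumptions chosen to keep the example self-contained; in an industrial implementation the rows would be extracted from the ERP system and the local data source by scheduled jobs, and the processing step could be considerably more elaborate.

```python
import sqlite3

# Intermediary simulation database (cf. Fig. 4), kept in memory for the sketch.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE erp_operations (part TEXT, station TEXT, cycle_time_s REAL);
    CREATE TABLE local_stop_log (station TEXT, mttr_min REAL, mtbf_min REAL);

    -- Rows that scheduled extraction jobs would normally pull from the sources.
    INSERT INTO erp_operations VALUES ('part_A', 'station_1', 42.0);
    INSERT INTO erp_operations VALUES ('part_A', 'station_2', 55.5);
    INSERT INTO local_stop_log VALUES ('station_1', 12.5, 480.0);
    INSERT INTO local_stop_log VALUES ('station_2', 8.0, 610.0);

    -- Data processing inside the intermediary step: merge the two sources into
    -- one model-specific input table that the simulation model reads directly.
    CREATE TABLE model_input AS
    SELECT o.station,
           AVG(o.cycle_time_s) AS cycle_time_s,
           s.mttr_min          AS mttr_min,
           s.mtbf_min          AS mtbf_min
    FROM erp_operations o
    JOIN local_stop_log s ON s.station = o.station
    GROUP BY o.station, s.mttr_min, s.mtbf_min;
""")

for row in db.execute("SELECT * FROM model_input ORDER BY station"):
    print(row)  # e.g. ('station_1', 42.0, 12.5, 480.0)
```

What-if scenarios, as mentioned by the respondents, can then be prepared as alternative versions of the model_input table without touching the original sources.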


The categorization of input data management methodologies provided by Robertson and Perera [11] can be divided into sub-categories, for example the above-mentioned MDA implementations classified as Methodology C. Another sub-area is the close integration between simulation models and data sources provided by PLM packages [18–20] (Methodology D). It should also be mentioned that the methodologies are not necessarily mutually exclusive in all companies. The bulk of the data may come from the CBS but need to be complemented by manual collection of single simulation parameters. This survey is not detailed enough to map all such possible mixes but aims to describe how the majority of the data is managed.

As an additional comment on the survey design, the questionnaire was handed out to around 700 simulation practitioners and researchers at WSC'10, and 86 responses were collected in total. This response rate might appear limited, but the reader should keep in mind that the survey was mainly aimed at representatives who had performed a recent industrial simulation study. It is therefore most likely that many of the people declining to submit an answer were researchers without recent case studies in industry. Further analysis of the participating companies at WSC'10 shows that many respondents represent large organizations. Thus, the possible under-representation of Small and Medium-sized Enterprises (SMEs) implies that the actual use of automated solutions in input data management might be slightly lower in general.

Another comment related to the survey design concerns how the terms "data" and "information" are used. Section 2 in this paper defines data as quantitative facts about events (for example processing times) needed in the simulation model, while information means data which are further processed for use in simulation models (e.g. categorized, corrected, condensed). The authors are aware that these definitions vary slightly between companies but argue that it is impossible to completely align all possible views using the selected questionnaire design. The descriptions of the four input methodologies [11] (provided in the questionnaire) are assumed to help experienced simulation users understand the questions correctly.

Despite the current majority of work procedures categorized as Methodology B, companies are very interested in and motivated to increase the level of automation. Results show that 77% expect to implement solutions with automated connections to the required data sources within a 10-year period. There is naturally a significant potential to reduce the time consumption in input data management, and consequently in entire simulation studies, thanks to the reduction of manual involvement. To maintain model credibility, there are of course important issues associated with automated handling of large data sets. Survey results show that there is a need for involving area experts and team knowledge when selecting between duplicate data sources and for assuring data accuracy, reliability, and validity. However, the authors argue that such investigation of data sources is unnecessary at every data extraction. This means that combining strategies for regular investigations of data sources with clever algorithms for data cleansing and analysis makes it possible to reach a higher and more repeatable data quality compared to manual input data management.
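As an example of what such data cleansing and analysis algorithms could look like, the sketch below shows two simple automated checks that could run at every extraction: removal of outlying samples before condensation, and flagging of inconsistencies between duplicate sources. The threshold values and the example figures are assumptions; in practice the rules would be agreed with area experts during the regular investigations of the data sources.

```python
from statistics import mean, stdev

def remove_outliers(samples, z_limit=3.0):
    """Drop samples further than z_limit standard deviations from the mean,
    e.g. mislabeled events or communication glitches in automated collection."""
    if len(samples) < 3:
        return list(samples)
    mu, sigma = mean(samples), stdev(samples)
    if sigma == 0:
        return list(samples)
    return [x for x in samples if abs(x - mu) <= z_limit * sigma]

def check_consistency(source_a, source_b, tolerance=0.10):
    """Compare duplicate sources (e.g. ERP versus operator records) and flag
    parameters whose values differ by more than the relative tolerance."""
    flagged = {}
    for key in source_a.keys() & source_b.keys():
        a, b = source_a[key], source_b[key]
        if abs(a - b) > tolerance * max(abs(a), abs(b)):
            flagged[key] = (a, b)
    return flagged

# Hypothetical machining times (seconds) for the same operations in two sources.
erp_times = {"op_10": 42.0, "op_20": 55.5, "op_30": 30.0}
operator_times = {"op_10": 43.1, "op_20": 71.0, "op_30": 30.2}

print(check_consistency(erp_times, operator_times))  # {'op_20': (55.5, 71.0)}
print(remove_outliers([41.8, 42.3, 42.0, 41.9, 42.1, 95.0], z_limit=2.0))
```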
It should also be mentioned that automation may not be possible for all input parameters, especially not in large models. However, there is substantial potential for increasing efficiency by reducing the manual involvement for the bulk of the data and, most specifically, for the parameters with high variability requiring continuous updates.

6. Suggestions for further development

The discussion provided above invites further development in two possible directions. The first option is to develop Methodology C solutions, because of their strength in dealing with problems such as the combination of data from different sources and the extensive need for data processing. The second option is to influence the development of major business systems (ERP, MES, PLM, etc.) to include the detailed raw data necessary for dynamic simulations. This alternative, leading to complete integration of data sources with simulation models, holds strong potential because it completely eliminates the need for human assistance. However, it will face more problems and require extensive research and development along the way compared to the first alternative. Such development will probably be given low priority by major ERP vendors since their simulation-related businesses are relatively limited. This is a chief reason why the authors argue for Methodology C instead of Methodology D given present circumstances.

Methodology C was the main suggestion already a decade ago [11] and, due to the limited advances in support systems for input data management, it is still envisaged as the main alternative. The intermediary database provides the possibility to merge data from major business systems, local data sources, people-based systems and external reference systems, which is a prerequisite due to the lack of comprehensive and detailed data in single sources. An additional argument is the increasing dependency on external reference systems identified in this survey. Such sources are nowadays important, partly as a result of the sustainability analyses integrated in traditional manufacturing simulations. Data necessary for such purposes are often extracted from public databases, e.g. ELCD [27], UPLCI [28], and EcoInvent [29].

The development of required solutions for direct integration of major business sources and manufacturing simulation applications (Methodology D) has not been sufficient during the last decade. This statement is supported by the finding in the presented survey that only 3% of the companies have implemented such a solution. However, although the authors argue for Methodology C solutions, they encourage further research and development facilitating Methodology D implementations. The key factor is to collect, structure, and store detailed data in major business systems, which most likely includes standards and interoperability research to maintain data accuracy, consistency, and completeness.


The bottom line is that an increased level of automation is important regardless of whether Methodology C or D is most appropriate for a specific company. High efficiency in input data management has a significant impact on the usability and profitability of manufacturing simulations. Succeeding with one of these solutions is therefore a prerequisite for increasing the use of simulation on a regular basis, from the 65% reported in the results of this survey; see question 2 in Table 2.

7. Conclusion

Robertson and Perera [11] identified four methodologies of input data management:

(a) Manual data collection and processing. The information (processed data) is manually recorded into the simulation model.
(b) Manual data collection and processing. The information is automatically transferred to the simulation model via a spreadsheet interface.
(c) Automated connection between data sources and simulation model using an intermediary database.
(d) Direct link between the CBS and the simulation model.

Since their study, published in 2002, there have been advances in the input data management process itself as well as in support systems such as data collection systems, databases, and simulation software. Therefore, this paper presents an update of industrial practice in input data management in order to identify and describe possible progress. The categorization above has served as a foundation and the results show:

- The most common input data management procedure still includes significant manual involvement in data processing and utilizes a separate spreadsheet interface to the simulation model. 61% of all companies use such an approach (Methodology B).
- During the last decade there have been advances in automating the input data management procedure. Going from very few industrial examples 10 years ago [11], 22% of the companies have now implemented the more automated Methodologies C and D.
- The vast majority of this subgroup (C and D) prefers an intermediary database between the data sources and the simulation model to handle their dependency on multiple data sources and extensive need for data processing.
- Another argument for using the intermediary database is the lack of sufficient data processing features in commercial simulation software packages, despite some advances such as the integration of distribution-fitting functionalities.
- There is an increasing need for collection and processing of data from external reference systems such as public LCA databases. In manufacturing, this increasing need is most likely due to the integration of sustainability analyses in manufacturing simulation studies.

Despite the progress identified in this paper, many companies ask for further support in elevating the level of automation in input data management. Almost 80% of all participating simulation users expect their companies to implement Methodologies C or D within 10 years. Researchers and industrial developers should focus on increasing the availability of detailed raw data in major business systems and on providing efficient solutions for data processing, e.g. in intermediary databases.

References

[1] M. Jahangirian, T. Eldabi, A. Naseer, L.K. Stergioulas, T. Young, Simulation in manufacturing and business: a review, European Journal of Operational Research 203 (2010) 1–13.
[2] J. Banks, J.S. Carson, B.L. Nelson, Discrete-Event System Simulation, second ed., Prentice-Hall, Upper Saddle River, 1996.
[3] A.M. Law, Simulation Modeling and Analysis, fourth ed., McGraw-Hill, New York, 2007.
[4] M. Rabe, S. Spieckermann, S. Wenzel, A new procedure model for verification and validation in production and logistics simulation, in: Proceedings of the 2008 Winter Simulation Conference, 2008, pp. 1717–1726.
[5] A. Skoogh, B. Johansson, Mapping of time-consumption during input data management activities, Simulation News Europe 19 (2009) 39–46.
[6] T. Perera, K. Liyanage, Methodology for rapid identification of input data in the simulation of manufacturing systems, Simulation Practice and Theory 7 (2000) 645–656.
[7] A. Ingemansson, T. Ylipää, G.S. Bolmsjö, Reducing bottle-necks in a manufacturing system with automatic data collection and discrete event simulation, Journal of Manufacturing Technology Management 16 (2005) 615–628.
[8] Y.-T.T. Lee, F.H. Riddick, B. Johansson, Core manufacturing simulation data – a manufacturing simulation integration standard: overview and case studies, International Journal of Computer Integrated Manufacturing 24 (8) (2011) 689–709.
[9] J. Bernhard, S. Wenzel, Information acquisition for model based analysis of large logistics networks, in: Proceedings of the 19th European Conference on Modelling and Simulation, 2005, pp. 37–42.
[10] L.G. Randell, G.S. Bolmsjö, Database driven factory simulation: a proof-of-concept demonstrator, in: Proceedings of the 2001 Winter Simulation Conference, 2001, pp. 977–983.
[11] N. Robertson, T. Perera, Automated data collection for simulation?, Simulation Practice and Theory 9 (2002) 349–364.
[12] T.H. Davenport, L. Prusak, Working Knowledge: How Organizations Manage What They Know, Harvard Business School Press, Boston, MA, 1998.
[13] J.W. Fowler, O. Rose, Grand challenges in modeling and simulation of complex manufacturing systems, Simulation 80 (9) (2004) 469–476.
[14] S. Robinson, Simulation: The Practice of Model Development and Use, John Wiley & Sons Ltd., Chichester, 2004.
[15] Y.B. Moon, D. Phatak, Enhancing ERP system's functionality with discrete event simulation, Industrial Management and Data Systems 105 (2005) 1206–1224.


[16] M. Aufenanger, A. Blecken, C. Laroque, Design and implementation of an MDA interface for flexible data capturing, Journal of Simulation 4 (2010) 232–241.
[17] A. Skoogh, B. Johansson, J. Stahre, Automated input data management: evaluation of a concept for reduced time-consumption in discrete event simulation, SIMULATION: Transactions of the Society for Modeling and Simulation International, 2012.
[18] W. Kühn, Digital factory – simulation enhancing the product and production engineering process, in: Proceedings of the 2006 Winter Simulation Conference, 2006, pp. 1899–1906.
[19] G. Draghici, A. Draghici, Collaborative product development in PLM multisite platform, in: Advances in Manufacturing Engineering, Quality and Production Systems, vol. II, WSEAS Press, 2009, pp. 327–332.
[20] G.-Y. Kim, J.-Y. Lee, H.S. Kang, S.D. Noh, Digital factory wizard: an integrated system for concurrent digital engineering in product lifecycle management, International Journal of Computer Integrated Manufacturing 23 (11) (2010) 1028–1045.
[21] D. Diep, F. Pfister, A. Candeias, Integration of agents and RFID in a manufacturing simulation, in: IEEE International Conference on Industrial Informatics (INDIN), 2009, pp. 892–897.
[22] P. Dungan, C. Heavey, Proposed visual Wiki system for gathering knowledge about discrete event systems, in: Proceedings of the 2010 Winter Simulation Conference, 2010, pp. 513–521.
[23] K. Mertins, M. Rabe, P. Gocev, Integration of factory planning and ERP/MES systems: adaptive simulation models, in: T. Koch (Ed.), IFIP International Federation for Information Processing – Lean Business Systems and Beyond, Springer, Boston, 2008, pp. 185–193.
[24] J.J. Swain, Software survey: simulation – back to the future, ORMS Today 38 (2011).
[25] Geer Mountain Software Corporation, Stat::Fit commercial webpage (accessed 22.10.11).
[26] A.M. Law, M.G. McComas, How the ExpertFit distribution-fitting software can make your simulation models more valid, in: Proceedings of the 2003 Winter Simulation Conference, 2003, pp. 169–174.
[27] Institute for Environment and Sustainability, European Reference Life Cycle Database (accessed 11.08.11).
[28] M. Overcash, J. Twomey, D. Kalla, Unit process life cycle inventory for product manufacturing operations, in: ASME Conference Proceedings MSEC2009, 2009.
[29] Swiss Centre for Life Cycle Inventories, The EcoInvent Database (accessed 11.08.11).