Available online at www.sciencedirect.com

ScienceDirect

www.elsevier.com/locate/procedia
Procedia Computer Science 119 (2017) 8–15
6th International Young Scientists Conference in HPC and Simulation, YSC 2017, 1-3 November 2017, Kotka, Finland
Pitfalls in Modeling and Simulation
Matti Koivisto*

The South-Eastern Finland University of Applied Sciences, Mikkeli 50100, Finland
Abstract

Scholars use models and modeling e.g. to examine, explain or demonstrate ideas or phenomena. Modeling combines discipline-specific traditions and general methods of modeling together. The interdisciplinary nature of modeling and simulation can sometimes cause challenges to a scientist. This paper identifies eight typical pitfalls a researcher may encounter in a modeling study. The study explains the pitfalls and connects them to the different phases of the modeling cycle. The purpose of the article is to provide some guidance for scientists on how to avoid the general traps of modeling and simulation.

© 2018 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the scientific committee of the 6th International Young Scientists Conference in HPC and Simulation.

Keywords: Modeling, Simulation, Modeling cycle, Pitfalls of modeling
1. Introduction

Modeling is a powerful tool for developing and testing theories, and it is applied in many different fields of study. Modeling, together with simulation, combines general methodological developments with many specific application fields. The interdisciplinary nature of modeling and simulation offers huge possibilities, but it can also cause challenges to scientists, who should be able to combine mathematical, computer science, and modeling and simulation traditions with their own field of study.

This paper concentrates on possible pitfalls in modeling and simulation research. The aim of the paper is to provide some guidance for young scientists planning and implementing their scientific work based on modeling and simulation.

* Corresponding author. Tel.: +358 50 3124 999. E-mail address: matti.koivisto(a)xamk.fi.
1877-0509 © 2018 The Authors. Published by Elsevier B.V.
10.1016/j.procs.2017.11.154
The paper starts in Section 2 with a short introduction to the principles of models and modeling in science. Section 3 concentrates on developing and applying scientific models; special attention is paid to the iterative nature of model development. Section 4 discusses the identified pitfalls of modeling and simulation, organized according to the modeling cycle developed and explained in the previous section. Section 5 contains the conclusions of the study.

2. Models and Modeling

Scientists attempt to produce knowledge about the world via numerous methods, and models are of central importance in that work. A large number of things are commonly referred to as models, including but not limited to physical objects, fictional objects, set-theoretic structures, descriptions, equations, or combinations of some of these [14]. Teller [19] goes even further when he points out that "in principle, anything can be a model, and that what makes a thing a model is the fact that it is regarded or used as a representation of something by the model users".

Models can be classified into two major categories: physical and symbolic [18]. A physical model can be e.g. a scale model of an airplane or a building. Symbolic models, in contrast, are typically based on natural or formal languages or on a set of mathematical equations. In this paper, the discussion is limited to symbolic scientific models and their implementations.

In the real world, we observe various phenomena and behaviors, which can be either natural in origin or produced by artifacts. With symbolic scientific models, we are able to move from this real world to the conceptual world. The conceptual world is the world of the mind; there we try to understand what is going on in our real, external world [11]. Fig. 1 describes the relationship between the real and the conceptual worlds. We first observe events in the real world.
Then, in the modeling phase, we analyze the observations and create models, typically either to explain the observed results or to predict future results.
Fig. 1. The real and conceptual worlds of modeling (Modified from [11]).
2.1. Simplification and modeling

The conceptual world is always a simplified version of the real world. Simplification is an essential part of modeling, a point clearly highlighted by Jorge Luis Borges' classic example of a map: a useful map must be a simplified geographical model, and a map at a scale of 1:1 is totally useless [5].

There are many reasons for simplification. First, we must bear in mind that the purpose of a scientific model is to promote understanding. Simple models enable scientists and other stakeholders to understand the links between inputs, assumptions and outputs, while additional complexity can prevent us from seeing the essential components of the system and the relationships between them. Second, a model has to provide tractability [9]; otherwise, verification and validation of the model are jeopardized. There are many further reasons for simplification, including lack of computing power, efficient use of resources, and missing data, to mention just a few.
2.2. Why do we use models?

As mentioned earlier, models are typically used either to explain or to predict. However, there are of course also other reasons to model. For example, Epstein [12] listed all in all 17 reasons to use models in science, as shown in Table 1.

Table 1. Reasons to build a model [12].

Models are used to:
- Predict
- Explain (very distinct from predict)
- Guide data collection
- Illuminate core dynamics
- Suggest dynamical analogies
- Discover new questions
- Promote a scientific habit of mind
- Illuminate core uncertainties
- Train practitioners
- Offer crisis options in near-real time
- Demonstrate tradeoffs / suggest efficiencies
- Challenge the robustness of prevailing theory
- Expose prevailing wisdom as incompatible with data
- Discipline the policy dialogue
- Bound (bracket) outcomes to plausible ranges
- Educate the general public
- Reveal the apparently simple to be complex
Although there are many reasons to use models, discussion in the philosophy of science has mainly concentrated on explanatory and predictive models (see e.g. [10]). According to Breiman [7], the difference between these two principles can be described as follows. A model can be considered a black box into which input variables go and out of which response variables come (see Fig. 2). In predictive modeling, we try to predict the responses to future input variables. In explanatory modeling, in contrast, we try to understand what happens inside the black box, i.e. to explain the relationships between the input and output variables.
Fig. 2. A black box model (Modified from [11]).
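Breiman's black box contrast can be made concrete with a small sketch. The `fit_line` helper, the data and all the numbers below are invented for illustration, not taken from [7]; the point is only that one and the same fitted model is "opened" for explanation and "kept closed" for prediction.

```python
# A minimal illustration of the black box view: the same fitted model
# supports explanation (inspecting its parameters) and prediction
# (computing responses for new inputs). Data and names are hypothetical.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b in pure Python."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope, intercept

# Observed inputs and responses (noise-free for clarity): y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

a, b = fit_line(xs, ys)

# Explanatory use: open the box and read the relationship itself.
print(f"each unit of x adds {a:.1f} to y (baseline {b:.1f})")

# Predictive use: treat the box as closed and ask for a future response.
print(f"predicted y at x=10: {a * 10 + b:.1f}")
```

With clean data the fit recovers the generating relationship exactly; with real, noisy data the same two uses of the box diverge in reliability, which is precisely Shmueli's point below.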
Although the academic community today seems to agree in theory that explaining and predicting are two different and equally important areas, in practice things might be different. For example, Shmueli [17] has analyzed how statistical models are used in different scientific disciplines. He points out that in many fields of study models are used almost exclusively for causal explanation, and models that possess high explanatory power are often expected to have predictive power as well. For people outside the research community, the difference between explanation and prediction is even more unclear, as people reflexively presume that prediction is the ultimate goal of all scientific models [12].

3. Research Process and Modeling Cycle

Still today, many textbooks describe research work as a one-way process in which predefined steps follow each other in a well-organized manner. The number of steps in these descriptions varies, but typically elements such as topic selection, research question creation, method selection, data collection, and data analysis and interpretation are present. In the ideal world, research work follows this clear path nicely, but in real life it seldom does.

Young scientists might find it stressful when things do not go as planned and do not follow the theoretical one-way path. Instead, they have to go back and forth between the different phases of the research process. With more experience,
a researcher understands the iterative nature of research work. Modeling in particular is an iterative process of trial and error in which the complexity of the model is gradually either increased or decreased [13].

3.1. The Art of Model Creation

In modeling and simulation, scholars should always bear in mind their focus. In the scientific world, models should be constructed with certain questions and experiments in mind [20]. In most cases, models are useful only in the context in which they were created. Therefore, a model providing answers to some questions might be useless for other, even closely related, questions.

In the research literature, the modeling process is often seen as a cycle that starts with a problem situation in real life, followed by a translation of the problem into mathematical terms and solutions [16]. Scholars have created many versions of this modeling cycle. For example, Blum and Leiß [4] listed the following stages in their cycle: understanding, constructing, simplifying, structuring, mathematising, working mathematically, interpreting, validating and exposing. Augusiak et al. [1] instead distinguish the following elements in their cycle: data evaluation, conceptual model evaluation, implementation verification, model output verification, model analysis, and model output corroboration. A third example, provided by Barth et al. [3], consists of question formulation, identification of relevant elements of target systems, choosing the model structure, model implementation, running and analyzing the model, and communicating results.

Fig. 3 shows the modeling cycle used in this paper. It has its roots in the previously introduced examples, but it emphasizes the idea of three different domains: the real, the conceptual and the formal world. The first two, the real and conceptual worlds, were discussed in Section 2. The formal world is the world of mathematics and computers.
The separation of the conceptual and formal worlds underlines the difference between a conceptual model and its formal implementation, typically in the form of a computer program. Many scholars have emphasized this separation of the conceptual and computerized models [2].
Fig. 3. The modeling cycle.
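Read as pseudocode, the cycle in Fig. 3 can be sketched as a loop. Everything below, the function names, the toy "model" and the acceptance test, is an illustrative assumption of this sketch, not a definition taken from the cycle itself.

```python
# A schematic sketch of the modeling cycle as an iterative loop:
# conceptual model -> computerized model -> formal results -> interpretation,
# with the model revised until it answers the original question acceptably.

def modeling_cycle(observations, build, simulate, interpret, acceptable,
                   max_rounds=10):
    """Iterate: (re)build the model, run it, interpret, validate."""
    model = build(observations)                         # conceptual -> computerized
    for round_no in range(1, max_rounds + 1):
        formal_results = simulate(model)                # formal world: run the model
        conceptual_results = interpret(formal_results)  # back to the conceptual world
        if acceptable(conceptual_results):              # compare with the real problem
            return model, round_no
        model = build(observations, previous=model)     # revise and iterate
    raise RuntimeError("no acceptable model; revisit the real-world problem")

# Toy demonstration: each round the "model" uses one more observation,
# so its estimate of the underlying level gradually improves.
def build(observations, previous=None):
    k = 1 if previous is None else previous["k"] + 1
    return {"k": k, "estimate": sum(observations[:k]) / k}

observations = [1.0, 3.0, 5.0, 3.0]
model, rounds = modeling_cycle(
    observations,
    build,
    simulate=lambda m: m["estimate"],
    interpret=lambda x: x,
    acceptable=lambda est: abs(est - 3.0) < 0.5,
)
print(rounds, model)  # accepted only after several refinement rounds
```

The loop makes the point of Section 3 mechanical: the first pass through the cycle rarely satisfies the validation step, and revisiting earlier phases is the normal path, not a failure.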
4. Pitfalls in Modeling

In the previous sections, the following key issues related to modeling were identified: the major reasons to model, the need for simplification, and the three domains of operation (the real, conceptual and formal worlds). In this section, the major pitfalls of modeling are discussed, organized according to the modeling cycle presented in Section 3 above.

4.1. Pitfalls in Problem Specification

The heart of every research project is the problem or question the researcher wants to address [15]. This general rule of scientific research naturally applies to modeling as well. To be able to define your problem or research question with clarity and precision, you must understand the real situation and its relationships well. Unfortunately, far too often
researchers move from the real world to the conceptual world before they have gained a clear understanding of the problem. In this paper, we call this the rushing pitfall. Researchers move forward too early for both internal and external reasons. Far too often, a researcher wants to start the "real" research and spends too little time studying the real-world problem at the beginning of the project. In a way, this is understandable, because many researchers are more familiar with the rules and methods of the conceptual and formal worlds than with those of the real world. External reasons to rush can have multiple sources, including supervisors' expectations, funding requirements and timetable issues, just to name a few. No matter where the pressure to rush comes from, a researcher should try to be patient. Time spent thinking and planning at the beginning of a project typically pays back handsomely.

4.2. Pitfalls in Conceptual and Computerized Models

In our modeling cycle, there is a clear separation between the conceptual and the computerized model. The conceptual model consists of assumptions on system components, the interactions between them, as well as input parameters and data assumptions [2]. At this point, a researcher is in danger of falling into the complexity pitfall. In this phase, it is essential to reduce the number of components and their relationships. According to Barth et al. [3], the main reason for the complexity pitfall is a false understanding of realism. The main goal of a scientific model is not to be as realistic as possible but to provide a better understanding of the studied system. This can quite often be achieved by omitting all but the most essential variables of the system. So, in modeling, the famous phrase "less is more" should be followed whenever possible.

A conceptual model is typically expressed in natural language or diagrams. Turning it into computerized form quite often requires knowledge outside the researcher's field of study (e.g. mathematics, programming or statistical methods). Far too often, researchers use statistical methods like witchcraft: following certain steps, they get some results, but they know little about what they did and why. According to Westfall and Henning [22], at worst researchers can be like trained parrots, only able to recite statistical jargon instead of understanding what they are doing. In this paper, this limited understanding of the methods used is called the lack of skills pitfall. Another form of this pitfall occurs when a researcher uses a less suitable method or software tool instead of a better one simply because he or she does not know about the better option. It is very easy to fall into this trap, because the selection of tools and methods is naturally guided by earlier experience.

4.3. Pitfalls in Interpretation and Results

The formal results provided by the computerized model are typically just numbers. A researcher must interpret them in order to obtain conceptual results. One of the major threats here is interpreting evidence in ways that are partial to our expectations or to the hypothesis at hand. Researchers may come so close to their own model that they lose critical distance from their work [3]. This result of confirmation bias is called the interpretation pitfall.

When applying a scientific model, a researcher should bear in mind that a model is a simplified representation of reality. It is designed to provide answers to specific questions in selected circumstances. Therefore, it can provide valid answers only in the context for which it was designed. Using a model in a wrong context is called here the context pitfall. A classic example of the context pitfall is using an explanatory model for making predictions. I want to emphasize that I do not mean that scientists should not learn from other fields of study and the models used in different disciplines.
I just want to point out that models must be verified and validated properly before they are used in a new context.

Typically, at the end of a study, the real results of the study are presented to stakeholders. The audience could include colleagues, participants of a scientific conference, readers of a journal, the management of the research institution, funders, etc. To avoid the presentation pitfall, a scientist must first think about the needs of the audience. Different audiences have different expectations, and they are served with different kinds of information and presentation. No matter what the audience is, a good presentation is neither an endless flow of statistics nor a fancy simulation without relevant content.
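The context pitfall can be illustrated with a deliberately artificial sketch. A hypothetical saturating process (all numbers invented) is observed only in its linear regime; the explanatory model fitted there is flawless inside its context and absurd far outside it.

```python
# Hypothetical illustration of the context pitfall: a model that explains
# one regime perfectly gives absurd predictions outside that regime.

def true_process(x):
    return min(x, 5.0)   # the real system saturates at 5

# Fit y = a*x + b by least squares inside the observed context (x = 0..4),
# where the process is still purely linear.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [true_process(x) for x in xs]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

inside = a * 3.0 + b        # within the fitted context: exact
outside = a * 100.0 + b     # far outside: ignores the saturation entirely

print(inside, true_process(3.0))     # 3.0 vs 3.0
print(outside, true_process(100.0))  # 100.0 vs 5.0
```

Within the observed range the model "explains" the system perfectly; at x = 100 it is off by a factor of twenty, because the saturation it never saw is not part of its context.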
4.4. Pitfalls in Data and Iteration

A good model and reasonable data allow a separation of information from noise. Information here means both the structure of the relationships and the estimates of the model parameters; noise relates to the unexplained variation [8]. The quality of data is one of the major concerns in doing science, and a good rule is to anticipate having at least some problems with data. Data pitfalls are common in all research, and therefore the topic is widely covered in many textbooks and studies.

As mentioned earlier, in many cases simple models are preferred to more complicated ones. Finding a simple and more general model typically requires several iterations and substantial changes to the original model [21]. Iterations naturally take time and effort but, as an old proverb says, "the dictionary is the only place where success comes before work". During the iterations, a researcher should also revisit the real world and the real problem. This way he or she is able to avoid what Banks & Chwif [2] call "a type III error", which occurs when a scientist develops an elegant solution to the wrong problem, even with good data. In this paper, this kind of danger is called the lack of iteration pitfall.

4.5. Summary of Pitfalls

Table 2 summarizes the pitfalls identified in this paper. It is important to notice that, due to the iterative nature of modeling and simulation, a researcher has a chance to fall into these traps more than once during his or her research.

Table 2. Identified pitfalls and their phases in the modeling cycle.

Name of the pitfall          Phase in the modeling cycle
Rushing pitfall              Problem specification
Complexity pitfall           Conceptual model
Lack of skills pitfall       Computerized model
Interpretation pitfall       Formal and conceptual results
Context pitfall              All phases
Presentation pitfall         Real results
Data pitfall                 All phases
Lack of iteration pitfall    All phases
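The separation of information from noise discussed in Section 4.4 can be illustrated with a toy holdout comparison (all data invented for the sketch): a one-parameter model captures the information, while a model complex enough to memorize the training set chases the noise and fares worse on new data.

```python
# Toy separation of information (the stable level around 10) from noise:
# a simple model generalizes, while a model that memorizes the training
# data reproduces its noise. All numbers are invented.

train = [(0, 12.0), (1, 8.0), (2, 11.0), (3, 9.0), (4, 10.0)]
test  = [(0.5, 10.5), (1.5, 9.5), (2.5, 10.2), (3.5, 9.8)]

# Simple model: a single parameter, the mean of the training responses.
mean_y = sum(y for _, y in train) / len(train)

def simple(x):
    return mean_y

# Over-complex model: memorize training points (1-nearest-neighbour lookup).
def memorizing(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mse(model):
    """Mean squared error on the held-out test points."""
    return sum((model(x) - y) ** 2 for x, y in test) / len(test)

print(mse(simple), mse(memorizing))  # the simple model wins on new data
```

The memorizing model is perfect on its own training points, yet its holdout error is roughly ten times larger: the extra "complexity" has encoded noise, not structure, which is exactly why the complexity and lack of iteration pitfalls go hand in hand.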
Scientific modeling is a complex exercise. Therefore, it is not possible to provide a simple checklist to guarantee that a researcher has avoided all the pitfalls identified in this paper. However, Table 3 below provides some practical advice for young scientists on how to avoid the most typical pitfalls. Most of the advice has already been discussed above, but it is still worth emphasizing the following three points. First, it is practically impossible to develop a solid model without a deep knowledge and understanding of the studied phenomena. Understanding is not only an essential starting point for the model creation process but also vital for model simplification.

Table 3. Some practical advice for young scholars to avoid modeling pitfalls.

Name of the pitfall          Advice for young scientists
Rushing pitfall              Be patient and gain understanding of the studied phenomena.
Complexity pitfall           Identify the essential elements and remember that less is more.
Lack of skills pitfall       Know and understand your tools and techniques.
Interpretation pitfall       Present your early interpretations to colleagues and other stakeholders for second opinions.
Context pitfall              Be open to new ideas but do not copy methods or models without validation and verification.
Presentation pitfall         Remember the needs and expectations of the audience.
Data pitfall                 Anticipate having at least some problems with data.
Lack of iteration pitfall    Be patient again; the development of a valid model typically requires multiple iterations.
Second, there is a real danger that a researcher becomes too familiar with his or her own model and loses critical distance from its outcomes. The key to avoiding biased interpretations is the second opinion. It is extremely
important that a young scientist understands that academic research is a team effort. It is highly recommended that a scientist share his or her early ideas and interpretations with colleagues, supervisors and other members of the research community. Fellow researchers' opinions and views can open new paths to more objective interpretations.

Finally, and importantly, it is essential to stress the importance of patience in academic research. It is widely accepted that high-quality studies require both hard work and time. Unfortunately, it is far from easy to be patient when you feel pressure from multiple sources to publish more. However, in the long run I truly believe that in science quality will outperform quantity.

5. Conclusions

The paper aimed to provide some insight into the challenges related to the use of models in academic studies. Like all papers, this one has its limitations. First, it is not possible to cover all the possible pitfalls of modeling in one paper. All studies have their own characteristics, and the areas where things can go wrong vary. The focus of this paper has been on the most common traps a researcher encounters in modeling and simulation. Second, the paper does not give a simple checklist for a researcher to verify that he or she has avoided the major pitfalls. The reason for this is obvious: high-quality modeling requires more than some simple rules or checklists. Instead, the paper contains some general guidelines, which hopefully provide advice for avoiding the major flaws in modeling.

The common issue in all pitfalls of modeling is a lack of understanding. Great models and fancy methods are not a substitute for intelligent thinking, or, as Banks & Chwif [2] have stated, the most critical component of a modeling and simulation project is not software or hardware but humanware.
In the end, a scientist should bear in mind that every model, no matter how carefully built, is simply a tool for extracting information of interest from selected data. The truth is infinitely complex, and a model at its best is just a good approximation of the truth. Or, as Box and Draper [6] have famously pointed out: "Essentially, all models are wrong, but some are useful".

References

[1] Augusiak, J., van den Brink, P.J. and Grimm, V. (2014). "Merging validation and evaluation of ecological models to 'evaludation': a review of terminology and a practical approach". Ecological Modelling.
[2] Banks, J. and Chwif, L. (2010). "Warnings about simulation". Journal of Simulation.
[3] Barth, R., Meyer, M. and Spitzner, J. (2012). "Typical Pitfalls of Simulation Modeling - Lessons Learned from Armed Forces and Business". Journal of Artificial Societies and Social Simulation 15 (2).
[4] Blum, W. and Leiß, D. (2007). "How do students and teachers deal with modelling problems?" In C. Haines, P. Galbraith, W. Blum, & S. Khan (Eds.), Mathematical Modelling: Education, Engineering and Economics. Horwood Publishing.
[5] Borges, J. L. (1999). Collected Fictions, translated by Hurley, A. Penguin Books.
[6] Box, G. and Draper, N. (1987). Empirical Model-Building and Response Surfaces. John Wiley & Sons.
[7] Breiman, L. (2001). "Statistical Modeling: The Two Cultures". Statistical Science 16 (3).
[8] Burnham, K. and Anderson, D. (2002). Model Selection and Multimodel Inference - A Practical Information-Theoretic Approach. Springer-Verlag.
[9] Burton, R. and Obel, B. (1995). "The Validity of Computational Models in Organization Science: From Model Realism to Purpose of the Model". Computational and Mathematical Organization Theory 1 (1).
[10] Downes, S. (2011). "Scientific Models". Philosophy Compass 6 (11).
[11] Dym, C. (2004). Principles of Mathematical Modeling, 2nd Edition. Academic Press.
[12] Epstein, J. M. (2008). "Why Model?". Journal of Artificial Societies and Social Simulation 11 (4).
[13] Ford, A. (2009). Modeling the Environment, 2nd Edition. Island Press.
[14] Frigg, R. and Hartmann, S. "Models in Science". The Stanford Encyclopedia of Philosophy (Spring 2017 Edition), Edward N. Zalta (ed.).
[15] Leedy, P. and Ormrod, J. (2014). Practical Research: Planning and Design, 10th Edition. Pearson Education Ltd.
[16] Perrenet, J. and Zwaneveld, B. (2012). "The many faces of the mathematical modeling cycle". Journal of Mathematical Modelling and Application 1 (6).
[17] Shmueli, G. (2010). "To Explain or to Predict?". Statistical Science 25 (3).
[18] Stockburger, D. (2001). Introductory Statistics: Concepts, Models, and Applications, 2nd Edition. Atomic Dog Publishing.
[19] Teller, P. (2001). "Twilight of the Perfect Model". Erkenntnis 55 (1).
[20] Uhrmacher, A. (2012). "Seven pitfalls in modeling and simulation research". In: Winter Simulation Conference, WSC '12, Berlin, Germany.
[21] Varian, H. (1997). "How to build an economic model in your spare time". The American Economist 41 (2).
[22] Westfall, P. and Henning, S. (2013). Understanding Advanced Statistical Methods. Taylor & Francis.