Documented decision-making strategies and decision knowledge in open source projects: An empirical study on Firefox issue reports


Information and Software Technology 79 (2016) 36–51

Journal homepage: www.elsevier.com/locate/infsof

Tom-Michael Hesse a,∗, Veronika Lerche b, Marcus Seiler a, Konstantin Knoess b, Barbara Paech a

a Institute of Computer Science, Heidelberg University, Im Neuenheimer Feld 205, 69120 Heidelberg, Germany
b Institute of Psychology, Heidelberg University, Hauptstrasse 47–51, 69117 Heidelberg, Germany

Article info

Article history: Received 1 September 2015; Revised 23 June 2016; Accepted 27 June 2016; Available online 29 June 2016

Keywords: Decision-making strategy; Naturalistic decision-making; Rational decision-making; Decision knowledge; Decision documentation; Design decision; Software development decision; Empirical study; Issue tracking system

Abstract

Context: Decision-making is a vital task during software development. Typically, issue tracking systems are used to document decisions in large open source projects where developers are spread across the world. While most decision documentation approaches assume that developers use rational decision strategies, in practice also naturalistic strategies are employed. However, quantitative studies of the distribution of decision strategies and related knowledge are missing.

Objective: Our overall goal is to provide insights and ideas for further research to systematically support and document decision-making during software development in open source projects. In this paper, we analyze decisions documented in comments to issue reports in order to understand the documentation of decision-making in detail.

Method: We coded the comments of 260 issue reports of the open source project Firefox for decision-making strategies and knowledge on decisions. Then, we statistically analyzed the coded data with regard to the dominant decision strategy, the distribution of decision strategies and knowledge, and the relations between strategy and knowledge.

Results: The vast majority of documented decision-making strategies was naturalistic. Interestingly, for feature requests the percentage of rational decision-making strategies was higher than for bugs. Documented knowledge mostly concerned the decision context. More solutions were documented together with a higher amount of naturalistic decision-making. However, solutions were negatively correlated with the assessment of the situation. So, developers are likely to exploit and document decision problems and solutions in an imbalanced way.

Conclusion: Our analysis revealed important insights on how decision-making and its related knowledge is documented during software development in open source projects. For instance, we found naturalistic decision-making to play an important role for development decisions. Our coding tables can be used by other researchers to further investigate our results. The study insights should be reflected in decision support systems to improve their effectiveness and acceptance by developers.

© 2016 Elsevier B.V. All rights reserved.

1. Introduction

1.1. Research problem and goal



∗ Corresponding author.
http://dx.doi.org/10.1016/j.infsof.2016.06.003
0950-5849/© 2016 Elsevier B.V. All rights reserved.

Decision-making is a vital task within the software engineering process [40], as the software system under construction depends on the decisions made during different development activities. In open source software projects, decisions are typically discussed and documented in issue tracking systems where issue reports are collected and processed during software development. For instance, Ko and Chilana [27] investigate decisions documented in issue tracking systems for the Firefox and the Linux kernel projects. Issues are reported by system users or developers to introduce and document a specific concern in the development. Then, the issue reports are discussed and processed by developers, which may include further questions to the initial reporter. Issue reports can be divided into bug reports, which describe errors and unintended behavior, and feature requests, which describe functionality extensions. In most cases, issue reports indicate a specific need for adaptation or improvement of a given software system. They may concern general system functions or only particular components. Thus, decisions made by the development team to address these needs are rather topical and often span across different development activities.

When development teams are globally distributed and make issue-related decisions, for instance in the large open source project Firefox, they need a place to communicate and share their knowledge. Then, this knowledge is typically documented within the comments of an issue report.

Regarding the decision-making process, two major types of strategies can be distinguished: rational decision-making (RDM) and naturalistic decision-making (NDM) [56]. For developers, making a decision means solving a decision problem by analyzing a set of alternatives with different criteria in order to choose a solution [35]. The criterion-guided analysis is a rational decision-making approach. Note that the common understanding is that developers should apply rational decision-making to identify the optimal solution for the given decision problem. For instance, RDM is encouraged for prioritizing requirements [4] or making architectural design decisions [13]. RDM basically means that all possible alternatives to solve the problem are carefully determined and examined by using a complete set of criteria. Therefore, all necessary information has to be collected by the developers.
However, in practice, resource and time limitations impede developers in the complete collection and exploitation of information. Thus, a thorough application of RDM is often hindered. Interviews with developers indicate that also naturalistic decision-making is applied in practice [56,57]. NDM is based on problem recognition and comparison between current and former situations to find an applicable solution in time. Therefore, solutions are used repeatedly for current decision problems, if these solutions have been successfully applied in similar situations before. In consequence, NDM is restricted by the decision makers’ experience within the actual context and the knowledge of rules for recognizing and matching similar situations. We will refer to major aspects of these decision-making strategies, such as searching for an optimal solution or matching situations, as decision-making strategy elements. Making a decision requires developers to collect and evaluate different kinds of knowledge on decision problems. In this paper, such knowledge is called decision knowledge. It consists of questions and context to describe the decision problem, alternatives to solve the problem and rationales to justify the choice [21]. We refer to these contents as decision knowledge elements. The overall goal of this paper is described using the Goal Question Metric (GQM) approach [3]. It highlights the addressed matter, the investigated objects, the purpose of investigation, the study context, and the viewpoint from which the investigation is performed. Using GQM, the goal of our study can be formulated as:

Determine significant quantitative effects
• with respect to documentation of decision-making strategies and their related knowledge
• for the purpose of improving the knowledge management
• in the context of development decisions in comments to issue reports for the open source project Firefox
• from the viewpoint of researchers.


The scope of our study is to investigate decision documentation without being restricted to a particular kind of decision-making process or decision knowledge. Therefore, we require access to a detailed documentation of decisions for large software development projects. Thus, we have chosen to investigate discussions in comments to issue reports of an open source project. Most open source projects, and Firefox in particular, do not enforce a specific documentation technique or style for decision knowledge (cf. study results of Ko and Chilana [27]), but offer a documentation of realistic and complex development activities and their related decisions. This documentation is explicit and available within issue tracking systems. The documentation is explicit, as development teams of huge open source projects usually are spread across the world. Therefore, they need to make their decisions and the corresponding discussion processes visible to the other team members on a common platform. The documentation is available, as typically open source projects try to encourage new developers to participate in the project. In consequence, all developers are interested in making the project and its decisions comprehensive and exploitable. Therefore, they share their current and previous decisions in issue tracking systems. Current studies and documentation approaches typically address specific kinds of decisions (cf. Section 2), such as decisions on architecture and design. We identified two important characteristics within this existing work. First, existing studies either focus on observing the developers’ decision-making behavior [45] or perform interviews to examine decisions in retrospect [56,57]. In consequence, many existing studies do not provide quantitative results for decision-making. Second, decision knowledge is often investigated in relation to given knowledge models (cf. [8,29,31,52]) and their tool support (cf. [28,48]). 
Also, links between decisions and related artifacts like requirement specifications [1,5], architecture descriptions [22,58], or code files [7,20] are investigated. All these approaches address software development projects in general and are typically applied in academic example projects or well-defined industry case studies. In consequence, developers are typically requested to apply a particular style of documentation [32], or they are biased by other project-specific documentation constraints. In addition, access to realistic and detailed data with different development iterations and multiple developers is limited.

Regarding these characteristics, our study complements the insights from existing studies by performing a quantitative analysis of decision documentation in issue tracking systems of an open source project. We expect that our study also provides general insights for software development in open source projects, which should be subject to further research.

However, it is important to note that the study does not aim at assessing the quality or outcome of a decision. Moreover, we only cover documented decision-making strategies and decision knowledge. When developers think about making a decision, they may follow one strategy and document their thinking according to another one. For instance, a developer's rational weighing of different alternatives could result in a naturalistic documentation that favors one solution and omits the process behind this claim. For other developers, however, only the documentation of this decision process is available, and it is therefore crucial for their comprehension. Consequently, our approach focuses only on decision documentation, and our analyses and results apply only to documented decision-making strategies and knowledge.

1.2. Research questions and contributions

We address the aforementioned goal by investigating three different research questions that are outlined in the following paragraphs.


Quantitative analysis of decision-making strategies. According to conceptual models of NDM [25,30] and previous study results [56,57], NDM and RDM are mixed and applied together by developers in practice. We expect that developers mainly document the decision-making strategy that is intuitive to them. However, there are no studies quantitatively investigating which parts of each strategy are documented by developers, in which proportion, and on which factors this proportion depends. Insights into these proportions and a possible dominant strategy help to understand how developers document their decision-making strategies in practice. Therefore, we investigate whether one kind of strategy is dominant in the documentation. This is valuable input for improving knowledge management systems for decisions. The resulting research question RQ1 is: How are different decision-making strategies distributed within the decision documentation for open source projects?

Quantitative analysis of decision knowledge. Argumentation structures in discussions in issue comments have already been investigated [27]. However, it is currently unknown what decision knowledge is documented in such comments in detail. Also, it is not clear which parts of documented knowledge are related to an increased overall amount of documentation. Therefore, an analysis of the distribution of decision knowledge elements and their intercorrelations is required. Analyzing this distribution and these relationships helps to align knowledge models and templates for decision documentation to the actually given decision knowledge. The resulting research question RQ2 is: How are decision knowledge elements distributed and correlated in open source projects?

Relation of decision-making strategy and decision knowledge. Currently, no study could be found that investigates the relations between documented decision-making strategies and decision knowledge. However, such relationships are stipulated by theory. For instance, the quantity of naturalistic decision-making strategy elements is supposed to correlate with the quantity of context descriptions, the occurrence of only one stated solution and a supporting argument based on personal experiences [6]. Exploring such relations within realistic data helps to better align the documentation of decision-making strategies with their related knowledge. The resulting research question RQ3 is: How are decision-making strategies and decision knowledge elements intercorrelated?

To answer these research questions, we have performed an empirical study on the comments to issue reports for two different releases of the Firefox project. We used the Firefox project, as it is a well-documented development project with a broad acceptance as study subject [27,43,55]. All described analyses were performed in retrospect based on the given issue reports. We investigated the comments to issue reports, and not the issue description in the report itself, with regard to the documented decision-making strategies and decision knowledge elements. We did so because the Firefox project prescribes a default structure for such descriptions, asking for observed and expected behavior as well as for steps to reproduce the described situation. The application of this structure was enforced by the lead developers of the project. In consequence, generic decision knowledge elements, like questions or constraints, were given for the description by default. However, we did investigate the issue description to determine the issue type.

1.3. Structure of the paper

In Section 2, the theories and related work for RDM, NDM, decision knowledge, and decisions in issue tracking systems are introduced in detail. In Section 3, the research process of this study is described. Section 4 presents the findings of the study which are discussed in Section 5. Possible threats to validity are assessed in Section 6. Finally, Section 7 concludes the paper and presents ideas for future work.

2. Background and related work

In the following subsections, an introduction to decision problems, RDM, and NDM, as well as our notion of decision knowledge are given. These models of decision-making and decision knowledge are the basis for the coding tables which we used for the analysis of the comments on Firefox issue reports. Moreover, as related work, we discuss other empirical studies on decisions in issue tracking systems of open source projects and on rationale and decision-making in software design.

Decision problems. For introducing decision-making strategies and decision knowledge, it is important to briefly sketch the term decision problem, as decision-making is a problem solving activity according to Zannier et al. [56]. This means that developers have to understand and structure a decision problem and its related knowledge before the actual decision on a solution can be taken. Typically, two different kinds of problems are distinguished: well-structured problems, which offer criteria to determine appropriate solutions for the given problem, and ill-structured problems, for which such criteria are not transparent and need to be defined first [37]. For well-structured problems, the problem-solving process can also be approached in a structured and managed way. In contrast, for ill-structured problems, there is no stopping rule for solution search and evaluation, which makes the problem-solving more complicated. These types of problems are closely related to the applied decision-making strategy: Each kind of problem may promote a particular decision-making strategy, but strategies are also mixed and used intertwined for the same decision problem in practice [56].

Decision problems may arise during different development activities, such as requirements engineering [35], design [22], or implementation [7]. Thus, different types of decisions are distinguished, such as decisions on prioritizing requirements, or design decisions. Decision types may consist of further sub-types, as demonstrated by Kruchten with an ontology for types of architectural design decisions [28]. However, decisions typically span across multiple development activities. For instance, architectural design decisions are related to both requirements engineering and design [9].

2.1. Strategies for decision-making

Rational decision-making (RDM). The term rational decision-making subsumes a variety of different decision-making theories and research streams. For instance, Lipshitz et al. distinguish classical decision-making, behavioral decision theory, judgement and decision-making, and organizational decision-making as RDM approaches. All RDM approaches share the major assumption that humans are rational decision-makers searching systematically for relevant information and weighing each alternative with respect to this information [23]. Accordingly, three characteristics of RDM approaches can be identified [56]. First, they aim at choosing the optimal solution to the given decision problem. Thus, the goal of RDM is optimizing the problem's solution according to the given criteria. Second, there is a focus on the input and output of the decision, as determining the optimal solution requires a criterion evaluation for all given alternatives and criteria. Third, RDM often relies on a formalism in developing context-free and abstract models of the decision [37]. A prominent example is the Questions, Options, and Criteria (QOC) model by MacLean et al. [31]. RDM thereby implies a consequential choice of the solution according to the given information.
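To make the criterion-guided evaluation concrete, the following minimal sketch scores every alternative against every criterion and picks the best one. It is an illustration only: the weighted-sum aggregation, the function name choose_rational, and the example criteria and scores are assumptions made here and are not taken from the study or from the RDM literature cited above.

```python
from dataclasses import dataclass


@dataclass
class Alternative:
    name: str
    scores: dict  # criterion name -> score in [0, 1] (hypothetical scale)


def choose_rational(alternatives, weights):
    """Return the alternative with the highest weighted sum over all criteria.

    The weighted sum is an assumed aggregation rule; RDM only requires that
    every alternative is evaluated against every criterion.
    """
    if not alternatives:
        raise ValueError("RDM needs at least one alternative")

    def total(alt):
        missing = set(weights) - set(alt.scores)
        if missing:
            # RDM presumes complete information, so a gap is treated as an error.
            raise ValueError(f"{alt.name} lacks scores for {sorted(missing)}")
        return sum(weights[c] * alt.scores[c] for c in weights)

    return max(alternatives, key=total)


# Hypothetical criteria and alternatives for an issue-related decision.
weights = {"effort": 0.5, "risk": 0.3, "user_benefit": 0.2}
alternatives = [
    Alternative("patch_in_place", {"effort": 0.9, "risk": 0.4, "user_benefit": 0.5}),
    Alternative("rewrite_component", {"effort": 0.2, "risk": 0.6, "user_benefit": 0.9}),
]
best = choose_rational(alternatives, weights)
print(best.name)
```

The explicit error on missing scores mirrors the RDM assumption that thorough information on all alternatives and criteria is available before the choice is made.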


However, RDM assumes that there is thorough information on the set of alternatives and their assessment with respect to the given criteria. Moreover, the decision maker is expected to actually strive to find the optimal solution. Finally, time is not relevant for the decision maker when evaluating the alternatives and calculating their outcomes. This led to doubts and criticism concerning RDM. For instance, Tversky and Kahneman [51] showed that decision makers often fail to make the rational decisions that RDM theories would predict. Instead, decision makers seemed to use shortcuts and heuristics which are subject to biases and errors. Moreover, decision makers often have the goal to satisfice rather than to optimize their decisions. Then, they invest only as much cognitive energy as is necessary for a satisfactory decision which is not necessarily optimal [23,56]. Finally, Klein et al. [25,26] argue that RDM does not contribute to the understanding of real world decision-making, because RDM is often tested or evaluated within laboratory settings. As laboratory settings need to be standardized, they cannot incorporate contextual or situational factors. Therefore, such settings are not significant to participants of a study as they have no further impact on their life or work. This criticism helped a second type of strategies to emerge: naturalistic decision-making.

Naturalistic decision-making (NDM). Like RDM, naturalistic decision-making is an umbrella term for different decision-making strategies sharing the same assumptions and characteristics. For instance, the cognitive task analysis and the critical decision method are based on NDM [44] and the recognition-primed decision-making strategy is a typical NDM strategy [25]. NDM shifts the goal of the decision-making process from finding the optimal solution to determining a sufficient one. It relies on the experiences, heuristics, and individual knowledge of the decision maker. This makes the consideration of different alternatives unlikely for NDM. Instead, often only a singular evaluation of a formerly experienced decision situation and its solution takes place. Therefore, past situations have to be assessed and matched to the current one [25]. This process is a vital part of the recognition-primed decision-making strategy as depicted in Fig. 1. To allow the matching of situations, NDM strategies focus on describing the prerequisites and rules for matching situations and accept informal models with incomplete information [37]. These characteristics of NDM suit decision makers who aim to react in real-time to a decision problem under dynamically changing conditions, time stress, and potentially far-reaching consequences [36]. Thus, the decision maker requires expert knowledge to match situations and evaluate the prerequisites for applying former solutions in the current situation [30]. However, NDM is criticized as well. For instance, Gore et al. argue that the methodology of NDM is largely based on qualitative field studies [18].

In summary, it should be noted that both types of strategies have particular prerequisites, strengths, and shortcomings. Depending on the type of decision problem, its context, and its environmental factors, decision makers may tend to use a mix of both strategies or favor one of them [56].

2.2. Decision knowledge

A decision problem can consist of many different knowledge elements, which we refer to as decision knowledge. In general, a decision problem at least consists of the problem description itself, a set of alternatives, and a set of criteria to evaluate these alternatives [35]. However, context information should also be considered as decision knowledge, such as assumptions about the problem, implications of an alternative, or constraints for choosing the solution. In addition, there are rationales for choosing a solution or matching a situation. Often, they come in the form of arguments or positions supporting or challenging an alternative [8,52].
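The knowledge elements listed above can be pictured as a small data structure that grows as comments are added to an issue. The sketch below is a hypothetical illustration: the names Kind, KnowledgeElement, and DecisionRecord, as well as the example comments, are assumptions introduced here and do not reproduce the documentation model or coding tables used in the study.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    PROBLEM = "problem"
    ALTERNATIVE = "alternative"
    CRITERION = "criterion"
    CONTEXT = "context"      # assumptions, implications, constraints
    RATIONALE = "rationale"  # arguments supporting or challenging an alternative


@dataclass
class KnowledgeElement:
    kind: Kind
    text: str
    author: str = "unknown"


@dataclass
class DecisionRecord:
    title: str
    elements: list = field(default_factory=list)

    def add(self, kind, text, author="unknown"):
        """Append one knowledge element, e.g. extracted from a single comment."""
        self.elements.append(KnowledgeElement(kind, text, author))

    def count_by_kind(self):
        """Count documented elements per kind, including kinds with zero entries."""
        counts = {kind: 0 for kind in Kind}
        for element in self.elements:
            counts[element.kind] += 1
        return counts


# Hypothetical issue discussion, invented for illustration.
record = DecisionRecord("Toolbar icon is blurry on high-DPI screens")
record.add(Kind.PROBLEM, "Which icon format should the toolbar use?", "alice")
record.add(Kind.CONTEXT, "Only affects displays with a scale factor above 1.", "alice")
record.add(Kind.ALTERNATIVE, "Ship SVG icons.", "bob")
record.add(Kind.RATIONALE, "SVG scales without extra assets.", "bob")
counts = record.count_by_kind()
print(counts[Kind.CONTEXT], counts[Kind.CRITERION])
```

Counting elements per kind is the sort of tally that a quantitative analysis of documented knowledge could start from; the actual coding scheme of the study is not reproduced here.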


Fig. 1. Model of recognition-primed decision-making according to [25].
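The matching of a current situation against formerly experienced ones can be sketched as follows. This is a simplified illustration and not the model of [25]: representing situations as feature sets, the Jaccard similarity, the threshold of 0.4, and all names and example cases are assumptions made here.

```python
from dataclasses import dataclass


@dataclass
class PastCase:
    features: frozenset  # observable cues of a formerly experienced situation
    solution: str        # the solution that worked in that situation


def jaccard(a, b):
    """Share of common features; an assumed stand-in for expert situation matching."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def choose_naturalistic(current, experience, threshold=0.4):
    """Return the solution of the first sufficiently similar past case, else None.

    No alternatives are compared: the search stops at the first match, which
    reflects the aim of a sufficient rather than an optimal solution.
    """
    for case in experience:
        if jaccard(current, case.features) >= threshold:
            return case.solution
    return None


# Hypothetical experience base of one developer.
experience = [
    PastCase(frozenset({"crash", "shutdown", "linux"}), "reset_profile"),
    PastCase(frozenset({"crash", "startup", "macos"}), "disable_addon"),
]
current = frozenset({"crash", "startup", "windows"})
print(choose_naturalistic(current, experience))
```

If no past case is similar enough, the function returns None, which corresponds to the limits that experience places on NDM.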

Various models exist to document all this knowledge during software development [37], especially for architectural design decisions (cf. the approaches in [8,28,48,52] and the mapping study of Tofan et al. [50]). However, decisions in issue tracking systems of open source projects are rarely design decisions. In addition, the decisions are developed and refined over time. Therefore, we decided to use our own documentation model. It supports the iterative documentation of decision knowledge by integrating and further improving many of the given approaches [21]. The model offers a variety of different knowledge elements for decision knowledge as depicted in Fig. 2. The basic element is the Decision containing the decision knowledge for one decision as aggregated DecisionComponents. We distinguish different kinds of DecisionComponents to describe the decision's Problem and Solution, any relevant Context and Rationale for the decision. In contrast to other knowledge models for decisions, this documentation model does not enforce the usage of a static template to describe decisions. Instead, the available knowledge can be aggregated over time by adding further DecisionComponents to a given Decision iteratively. This can be done by different persons and even during different activities, e.g., during software design, implementation, and maintenance. Overall, this suits our needs to analyze issue-related decisions well, as these decisions are typically documented in multiple comments over time by different project collaborators.

Fig. 2. Elements of decision knowledge according to [21].

2.3. Knowledge management for decisions in issue tracking systems

Knowledge on both decision-making strategies and decisions is managed in decision support systems. For instance, such systems can be realized as negotiation support systems to support group decision-making or as knowledge management systems to support knowledge transfer, storage, and retrieval [2]. In particular for knowledge on design decisions, many knowledge management systems exist. An overview can be found in the comparative studies of Ali Babar et al. [1] and Tang et al. [47].

Issue tracking systems can be used as knowledge management systems for decisions [27]. We study decisions documented in issue tracking systems to learn about decision-making in software development in open source projects. This is suitable, as many popular open source projects use an issue tracking system to support development activities [16]. Typically, issues contain many types of information which either address the development or the administrative perspective of the project. From the development perspective, issues may contain descriptions of requirements, concrete design proposals, development tasks, or bug fixing and refactoring requests. From an administrative perspective, issues comprise, among others, information on who created the issue report or the comment and when, a classification of priority and importance, the implementation status of the issue, as well as the addressed technical component.

According to the concern of an issue, it can be classified as either a bug report or a feature request. In the first case, an erroneous functionality or unintended behavior of the system under development is filed in order to be fixed. In the latter case, a new or extended functionality is requested in order to improve the system. In both cases, a decision needs to be made as to whether the issue shall be addressed in development and how. Such decisions on issue reports are typically accompanied by comment discussions of varying length and intensity on what to decide.

2.4. Related empirical studies

Several empirical studies have already addressed decision-making strategies and decision knowledge in software development. Ko and Chilana investigated discussions in the comments of Firefox issue reports [27]. Their quantitative study analyzes the rhetorical argumentation structure of issue discussions by coding the comments according to their contribution. They distinguish between a stated idea or rationale, the decision itself, and whether the scope, dimension, or process was sharpened. The findings of this study indicate that decision discussions for issue reports mostly have the aim to explore the design space of potential issue solutions. Within this process, a struggle for the intended use of the solution and its quality can be observed. However, Ko and Chilana focus on the argumentation structure to investigate the outcome of issue discussions. Their codes and analyses do not aim to understand the decision-making process and the involved knowledge in detail. In contrast, our study employs fine-grained codes specifically designed to capture the different decision-making strategies and decision knowledge elements.

Many other related studies investigate software architecture and design, as decisions in these activities have wide-ranging consequences for the project. One of the first qualitative studies on how the different decision-making strategies are applied in software projects was presented by Zannier et al. [56,57]. They interviewed 25 professionals and found that decision-making strategies are often mixed within one decision problem, particularly for agile projects [56]. Mentis et al. investigated the rational decision-making process for planning tasks with 36 undergraduate and graduate participants grouped in twelve teams [34]. The statements within the team discussions were categorized into information sharing, interpretation, summarizing, and arguing. As a result, they found decisions to be dominated by information sharing in newly built teams, whereas established teams mostly argued. Also, information sharing dominated the beginning of the decision process, whereas arguing statements were found to be dominant at the end [34]. Tang et al. observed the decision-making process in two teams of software designers with the same design task and a time limit of two hours [45]. The decision-making activities of both teams were recorded and analyzed with regard to the co-evolution of problems and solutions in decision problems.
It was found that both teams approached the identical task in very different ways and on different abstraction levels. One team relied on a solution-driven decision process without defined structures and the other followed a problem-based design space exploration [45]. In addition, Tang et al. also investigated the capturing and usage of design rationale among 81 practitioners in another study [46]. The study revealed that concrete rationales are more often documented than generic ones. Moreover, designers tend to document positive rather than negative arguments. Recently, Tang and van Vliet presented a study on how software designers approach the reasoning process to find a solution for a design problem [49]. They found that software designers mostly select satisficing solutions instead of searching for the optimal solution when deciding on how to solve the design problem. However, none of these studies investigated the applied decision-making strategies quantitatively and in a fine-grained manner. For all studies, only a qualitative summary of the identified strategy components is given. Except for the studies of Tang et al. and Tang and van Vliet, the considered decision knowledge is not presented in detail and relations between decision-making strategy and knowledge are not evaluated explicitly. Also in the studies of Tang et al. and van Vliet, with the focus on rationale, only a part of the relevant decision knowledge was examined. In addition, only the study of Zannier et al. investigated decisions in practice, whereas Mentis et al., Tang et al., and Tang and van Vliet used an experimental setup with a prepared example scenario for decision-making. However, in the study of Zannier et al., the developers were interviewed in retrospect to extract the decisions. Thus, there is a risk that information on decision-making is missing or erroneous. In contrast, our study relies on documented knowledge within issue comments. 
All investigated issues originate from the original Firefox issue tracking system and consequently contain complex and detailed decisions. Both decision-making strategies and decision knowledge are considered.


Fig. 3. Applied research process with phases and activities.

3. Research method and process

This section describes the research method and process we applied in our study. We performed case study research and followed the guidelines of Runeson et al. [41]. Accordingly, the research method consists of a process divided into three phases: preparation of the study and data under investigation, coding of the data, and data analysis. Each phase comprises several steps and is described in more detail in one of the following subsections. An overview of the entire process is depicted in Fig. 3.

We first performed a pilot study as part of a master's thesis, which investigated one Firefox branch and a smaller set of issues. This helped us to further define our research method and the study setup. Based on the insights from this pilot study, we derived two hypotheses which we tested in our main study. However, we also performed further exploratory analyses of the data. Then, the coding table for the issue comments was derived and the issues to be investigated were extracted.

In the next phase, issues were coded by two coders in three steps: First, a coding training was performed where both coders coded two sets of 50 issues. For each set, the quality of the codings was assessed by calculating the intercoder reliability and both coders aligned their understanding of the coding table. Then, the types of all issues were determined (whether they were bug reports or feature requests) and the final set of investigated issues was selected. Afterwards, one coder coded the final set of 100 issues from branch 6 and the other one the final set of 160 issues from branch 27. Each set was coded by only one coder because of the large number of comments per issue. Note that this procedure is justified by the good intercoder reliability values. Finally, the coded data was analyzed statistically.

3.1. Preparation phase

Development of hypotheses.
Although most of our study investigations are based on exploratory data analyses, we developed and tested two hypotheses for RQ1. They are based on the background and related work for decision-making strategies as well as on the results of our pilot study. Our first hypothesis is based on the studies of Zannier et al. [56,57]. In interviews with developers, they found both NDM and RDM to be used in software development practice in general and in software design in particular. This contradicts an engineering approach to software design which should rely on rational decision-making only, as stated by Falessi et al. [13]. However, we expect that in practice NDM strategies are likely to dominate


documented decision-making processes. First, Zannier et al. point out that it is likely to observe a dominance of one decision-making strategy over the other instead of a balanced mixture of both strategies for design decisions [56]. Second, in real-world settings NDM occurs more often than RDM [25,36]. This may result from situational factors, which contribute to using and documenting NDM. For instance, different developers take part in issue discussions (“multiple deciders”, see [36]) and issue reports often contain incomplete or changing information (“uncertain environments”, see [36]). Such settings are typical for open source software development projects with various, globally spread developers and short release cycles to cope with time pressure and changing requirements [24]. In addition, open source projects often have experts in their development teams who will use their expertise and thereby fulfill a main prerequisite for NDM (cf. Section 2.1). Accordingly, we expect more NDM than RDM in the investigated issue comments (hypothesis H1a).

Our second hypothesis concerns the proportion of NDM to RDM for bug reports and feature requests respectively, and thereby refines H1a. According to the findings of Ko and Chilana [27], decision discussions struggled to explore the design space (cf. Section 2.3). This could be due to a dominance of NDM. However, Ko and Chilana did not differentiate between bug reports and feature requests. Bug reports often require developers to improve a given functionality, so that the original solution only has to be further refined or adapted. Instead of an extensive design space exploration, developers are required to perform structured and defined development activities, such as testing and debugging of the current implementation. This procedure of analyzing the given situation in order to find an applicable solution is in line with the definitions of NDM [25].
In consequence, there should be a smaller need for further problem and work structuring, resulting in a high percentage of NDM. In contrast, we expect the exploration of the design space to be more prominent for feature requests as new functionality has to be developed or given functionality has to be extended significantly. Therefore, new design and implementation solutions are required. For feature requests, the procedure to acquire a solution is often not clearly defined beforehand, as typically different implementations of a feature exist to address a given request. These implementations have different qualities which are often not fully comparable by means of one single criterion. In consequence, developers need to clarify the decision problem further in order to determine relevant criteria. Depending on how different alternatives perform according to these criteria, developers can then choose the best solution. This procedure is in line with the definitions of RDM [56], so that the percentage of NDM should be lower than for bug reports. In summary, we expect a higher percentage of NDM for bug reports than for feature requests (hypothesis H1b).

Creation of coding tables.

We developed two different coding tables, one for decision-making strategies (see Table 1) and one for decision knowledge (see Table 2). The development of the coding tables was based on the principles of content analysis according to Mayring [33] and the fundamentals presented in Section 2. We identified major characteristics of decision-making strategies and decision knowledge based on current theories and approaches. These characteristics are the basis for the derived codes of our initial coding table, which was further refined during the coding training in the second phase. Thereby, we merged codes with strong overlaps. For instance, the codes Claim and Alternative were initially derived from the knowledge model (cf. Section 2.2).
However, both are often intertwined as a claim can easily contain an alternative and vice versa. So, these codes were merged to the more general Solution.


Table 1
Coding table for decision-making strategy elements.

| Strategy | Source | Code | Definition | Comment example |
|---|---|---|---|---|
| Naturalistic decision-making | Klein [25], Zannier et al. [56] | Satisficing | Alternatives are better or worse, a satisfactory/workable solution is aspired. | “[...] In the mean time, this user style appears to work around the problem [...]” |
| Naturalistic decision-making | Klein [25], Zannier et al. [56] | Singular Evaluation | Further alternatives are not considered or they are evaluated serially. | “It works on today’s Nightly, so I guess the problem was fixed in the meantime.” |
| Naturalistic decision-making | Klein [25] | Situation Assessment | Retrieval of attributes that make the current situation comparable to others. | “[...] are you saying that this only happens when the 3 of them are installed at the same time?” |
| Naturalistic decision-making | Klein [25] | Matching | The current decision problem or situation is linked to another situation. | “In my case [...] the back button is not activated, so there is no way to restore the session. See bug 928626. [...]” |
| Rational decision-making | Klein [25], Zannier et al. [56] | Optimizing | Alternatives are right or wrong, the best possible solution is desired. | “[...] Hence I think there is probably a better way to solve this - how about using Firebug’s ability to break on DOM events to watch what’s going on?” |
| Rational decision-making | Klein [25], Zannier et al. [56] | Consequential Choice | Further alternatives are considered, one option is selected from a list of others or options are evaluated concurrently. | “[...] For Beta with irregular builds and minimal functional changes it seems a lot less important. [...]” |
| Rational decision-making | Lipshitz et al. [30], Zannier et al. [56] | Criterion Evaluation | Criteria linked to alternatives are considered, reasons/rationales for choosing an alternative are provided. | “[...] Also there are probably cases where what we really need is the state at onload or some other milestone rather than what goes across the wire. [...]” |

Table 2
Coding table for decision knowledge elements. Source for all codes: Hesse and Paech [21], Tyree and Akerman [52].

| Code | Definition | Comment example |
|---|---|---|
| Question | Description of a concern or goal for the decision | “Also the pinch-in, pinch-out zoom feature is not supported by default [...]” |
| Solution | A proposed alternative or claim in order to solve the decision problem | “[...] And I solve the problem by temporarily setting gfx.content.azure.enabled to false.” |
| Context | General context information for the current decision | “[...] As far as I know, it’s only in FF6 that cookie preferences are actively being deleted/reset [...]” |
| Assumption | A belief or expectation concerning decision-related knowledge (like requirements or resources) | “[...] I believe this is intentional, so that beta user’s UA is identical to that used for final release [...]” |
| Constraint | Restriction for another decision knowledge element originating from either the decision problem or its environment | “[...] there is no place for it in a max 30px tab [...]” |
| Implication | Consequence or outcome of another decision knowledge element | “[...] since it’s internal API [...] it can’t be used and will have to be reimplemented in libcubeb [...]” |
| pro-/con-Argument Question | An argument supporting or attacking a given question | “OK I started in safe mode, still the same problems exist” |
| pro-/con-Argument Solution | An argument supporting or attacking a given solution | “After pointing Ubuntu to XP driver for the wi-fi card, all seems OK. [...]” |

Also, some codes had to be refined. For instance, the Argument code was differentiated into four different subtypes to cover attacking or supporting relations of arguments for either questions or solutions. We refer to these codes as pro-/contra-arguments for questions and solutions. In addition, some codes were not applicable to the data in the coding training and therefore had to be excluded. For example, the code Decision could not be applied as the overall decision was a result of all comments to an issue. Moreover, we did not create codes for relations between decisions. Various relations may exist between decisions, e.g., in the context of design decisions [28,53]. For instance, decisions can depend on or exclude each other at different granularity levels. We omitted such codes for two reasons. First, the information given in issue descriptions and comments is typically not sufficient to decide whether decisions are related. Second, covering these relations with additional codes would have significantly increased the overall number of codes. Large numbers of codes are more difficult for the coders to apply [33].

Data selection, extraction, and cleaning.

We performed our study with data from two different branches of the Firefox project to ensure that our statistical analysis of the

coded data is not biased by any branch-specific events or focus during the development. In particular, we selected the branches Firefox 6, developed in 2011, and Firefox 27, developed in 2014. We will refer to them as branch 6 and branch 27. These branches were selected because they provide a suitable number of issues for analysis: 642 issues were linked to branch 6 and 559 issues to branch 27. In addition, they were developed under the same release model, but clearly differ concerning development time and functionality. On the one hand, the length of release cycles changed. Whereas traditional release cycles were applied for branches 1 to 4, shorter cycles of 6 weeks were applied in a rapid release model. This impacts the total amount of issues and their variety of addressed topics within a branch [24]. Therefore, we selected two branches developed according to the rapid release model to avoid differences caused by different release models. On the other hand, branch 27 was developed three years after branch 6. This helps to ensure that the number of similar decisions within both branches is minimal. In total, 1201 issues could be retrieved from the issue tracking system of Firefox [15]. We exported all issues and their comments and imported this raw issue data to Excel. Then, the data was formatted to distinguish between the initial issue report, its metadata, and each comment attached to an issue. Any text or formatting artifacts from the export process were removed manually to avoid data loss and misinterpretation during coding.

3.2. Coding phase

Coder training.

Before coding all issue comments in detail, both coders practiced using the two coding tables in two training rounds with one set of 50 randomly selected issues per round. The aim of the training was twofold: First, both coders should reach a high agreement on whether a specific code should be set for an issue comment. Second, the feedback of the coders after each round was evaluated to further improve the coding tables. The first training round was performed before the coding of issue types, the second one took place afterwards. Each coder coded the set of 50 issues independently. After the first round, the coders discussed each other’s codes to compare differently coded issues and resolve ambiguities. The second round was performed to assess the intercoder reliability with an issue set with the same proportion of bug reports to feature requests as the final data set. Due to the good intercoder reliability results, no further training round was performed.

Intercoder reliability assessment.

To measure the quality of the codings, we assessed the intercoder reliability using the intraclass correlation coefficient (ICC) [42] for the number of each code per issue. This assessment was performed on the coding results of the second training round. Specifically, we calculated the ICC(2,1) (i.e., the two-way random single measures ICC) for agreement between ratings using icc from the irr R package [17,38]. According to Cicchetti and Sparrow [10], ICC values below .40 are “poor”, between .40 and .59 “fair”, between .60 and .74 “good”, and above .74 “excellent”.

Table 3
Intraclass correlation coefficients (ICC) of decision-making strategies and decision knowledge elements.

| Decision-making strategy elements | ICC |
|---|---|
| NDM: Satisficing | .96 |
| NDM: Singular Evaluation | .88 |
| NDM: Situation Assessment | .86 |
| NDM: Matching | .78 |
| RDM: Optimizing | .61 |
| RDM: Consequential Choice | .66 |
| RDM: Criterion Evaluation | .61 |

| Decision knowledge elements | ICC |
|---|---|
| Question | –ᵃ |
| Solution | .86 |
| Context | .89 |
| Assumption | .82 |
| Constraint | .90 |
| Implication | 1 |
| Argument pro Question | .94 |
| Argument pro Solution | .65 |
| Argument contra Question | .84 |
| Argument contra Solution | .79 |

ᵃ For the element “Question”, no intercoder reliability was calculated because the variance in the subsample was too small, as both coders assigned the Question code only once.
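The ICC(2,1) computation described above can be sketched in a few lines. The study itself used `icc` from the R package irr; the Python version below is only an illustrative re-implementation of the standard two-way random, absolute-agreement, single-measures formula, and the function names `icc_2_1` and `interpret_icc` as well as the toy counts are assumptions made for this sketch, not material from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` holds one row per rated subject (here: one per issue) and
    one column per rater (here: how often a coder assigned a given code).
    Note: if all rows are identical, the variance is zero and the
    coefficient is undefined (division by zero).
    """
    n = len(ratings)       # number of subjects (issues)
    k = len(ratings[0])    # number of raters (coders)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA decomposition of the total sum of squares.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def interpret_icc(value):
    """Verbal label following the Cicchetti and Sparrow thresholds quoted above."""
    if value < 0.40:
        return "poor"
    if value < 0.60:
        return "fair"
    if value < 0.75:
        return "good"
    return "excellent"


# Hypothetical counts of one code per issue for coder A and coder B.
toy_counts = [[2, 2], [0, 1], [5, 4], [1, 1], [3, 3]]
value = icc_2_1(toy_counts)
print(round(value, 2), interpret_icc(value))
```

The zero-variance caveat in the docstring is the same situation the study reports for the Question code, where no coefficient could be calculated.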
As shown in Table 3, the intraclass correlation coefficients of both decision-making strategies and decision knowledge elements are good or even excellent. It should be noted that the ICC values for NDM and decision knowledge are slightly better than for RDM. This partly results from a lower total number of RDM elements identified by the coders, so that differences between them carry more weight. However, the ICC values for RDM are still good and therefore justify that only one coder codes the entire final issue set for each branch.

Coding of Issue Type and final data selection.

All issues were coded by their type as either “bug” for bug report, “feature” for feature request, or “spam” marking issue reports without reasonable content. All issues belonging to one branch were coded by one coder. For this coding the headline and description of the issues were evaluated. Whereas bugs were typically indicated by an error description, feature requests typically contained a question or claim to extend or create a certain functionality. However, as a default structure for the issue description had to be used by the reporters, the content of the issue description was not included for coding decision-making strategy or decision knowledge elements. Otherwise codes like “question” or “context” would have been set for every issue description and could have diluted the coding results. The classification according to the issue type was used to select the final set of issues to be coded. We first excluded all “spam” issues (about 4%), all issues with no comments (about 9%), and further issues (about 2%) to which no code could be assigned, because the issue was marked as irrelevant by a commenter, which prevented any further discussion. In both branch 6 and 27, the number of bugs was much higher compared to the number of features. In order to have a sufficient number of features to analyze the influence of the issue type on the percentage of NDM to RDM (hypothesis H1b), we extracted all features. We further selected four times as many bugs as features for each branch to have a larger total sample for the analyses regardless of the issue type. The bugs were selected randomly from the entire corresponding branches. The final set of issues comprised 260 issues with 52 features and 208 bugs. One hundred issues originated from branch 6 and 160 from branch 27.

Coding of issue comment content.

Each coder analyzed the issues of one branch. If a comment contained links to external resources, e.g., screenshots or source code, we also considered these resources to determine the codes for this comment.

3.3. Evaluation phase

Preparation and execution of statistical analyses.
For each issue, we recorded several variables from the issue meta-data in order to check if they have an influence on our main dependent variables (i.e., the decision-making strategy and decision knowledge elements). In the following, we call these variables issue dimensions. The most important issue dimensions are the Issue Type (bug report or feature request) and the Branch of the issue (6 or 27). Additionally, we recorded the related technical Component from the issue header, the Number of Comments of each issue, as well as the Issue Duration as the difference between the date of the last comment in the issue discussion and the date of the creation of the issue report. We selected these issue dimensions, as they have a broad range and sufficient frequency in their values to suit our analyses. In addition, they provide important information on the environmental factors of the documented decision-making process.
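As a small illustration of how two of these issue dimensions can be derived from raw timestamps, the following sketch computes the Issue Duration (last comment minus creation date) and the Number of Comments for one issue. The function name `issue_dimensions` and the example timestamps are hypothetical; the study does not describe the tooling it used for this step.

```python
from datetime import datetime


def issue_dimensions(created, comment_times):
    """Return (issue duration in days, number of comments) for one issue.

    Assumes at least one comment, which holds for the final data set
    because issues without comments were excluded.
    """
    last_comment = max(comment_times)
    duration_days = (last_comment - created).total_seconds() / 86400
    return duration_days, len(comment_times)


# Hypothetical issue: created on 2 May 2011, last comment ten days later.
created = datetime(2011, 5, 2, 9, 0, 0)
comments = [
    datetime(2011, 5, 2, 9, 1, 5),
    datetime(2011, 5, 12, 9, 0, 0),
]
duration, n_comments = issue_dimensions(created, comments)
print(duration, n_comments)
```

Working in seconds and converting to days keeps very short discussions, such as the 65 s minimum reported in the study, from being rounded to zero.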


Table 4
Size of relationships between issue dimensions.

| Issue dimensions | Issue Type | Branch | Component | Issue Duration | Number of Comments |
|---|---|---|---|---|---|
| Issue Type | – | no | small | small | no |
| Branch | | – | large | large | small |
| Component | | | – | large | small |
| Issue Duration | | | | – | small |
| Number of Comments | | | | | – |

The 36 different values of Component were divided into three categories: Issues related to specific components that have been named (in the following: “specific”, n = 114; e.g., “Location Bar”), issues related to specific but unnamed components (“untriaged”, n = 80), and issues related to general components (“general”, n = 66). The Issue Duration ranges from 65 s to more than 4 years, with a mean duration of 268 days. The starting time of 28% of all issues is before the initiation of their respective branch and 76% of all issues extend beyond their respective branch. This does not necessarily mean that the discussion is still ongoing, as we are unable to definitely determine the actual end of a discussion. Of course, an issue status “closed” or “resolved” is an indicator for the discussion end. However, we observed many issues with many comments and a longer duration for which this status was not set. Moreover, the discussion may start again over time when former solutions turn out to be erroneous or insufficient under changing environmental conditions of the system. The Number of Comments for each issue ranged from 1 to 48 with a mean of 5.53 comments per issue.

Then, we applied both descriptive and inferential statistics on our coded data. Among the descriptive statistics are means (M), standard deviations (SD), and relative frequencies. Among the inferential statistics are χ² tests (for the analysis of relationships between two categorical variables), Pearson correlation coefficients (for the analysis of relationships between two metric variables), and t-tests or ANOVAs (for the analysis of mean differences between two or more groups). All tests reported are two-sided. With high sample sizes even relatively small effects can lead to significant results. Therefore, for each significant test we additionally report an effect size measure (r, η², ηp² or Cramer’s V, depending on the type of test).
The conventions for r and Cramer’s V (df = 1) according to Cohen [11] are .1 for a small, .3 for a medium-sized, and .5 for a large effect. For η² and ηp², .01 corresponds to a small, .06 to a medium, and .14 to a large effect.

4. Results

4.1. Findings for issue dimensions

The relationships between the issue dimensions were analyzed and the results are briefly summed up in Table 4. Significant relationships are classified as “no”, “small”, “medium”, or “large” according to the effect size conventions presented in Section 3.3. More details on the direction and size of each significant effect for the issue dimensions are given in the following paragraphs.

Issue Type and Branch are completely independent due to our method of data selection. The Component is related to the issue type (χ²[2] = 20.54, p < .001, V = 0.28): For features, the specific components predominate over the other two categories more than for bugs. Besides, the Issue Duration (in days) is significantly higher for bugs (M = 296.16, SD = 366.72) than for features (M = 156.71, SD = 297.87, t[258] = 2.54, p < .05, r = .16).

We found a significant relationship between the Branch and the Component (χ² = 90.39, p < .001, V = 0.59). Branch 6 contains mostly issues with general (54%) and specific components (42%), but very few untriaged issues (4%), while in branch 27 untriaged (48%) and specific (45%) components are predominant. Furthermore, there is a significant effect of the Branch on the Issue Duration (t[258] = 16.23, p < .001, r = .71) and the Number of Comments (t[258] = 2.01, p < .05, r = .12): Branch 6 has on average longer issue durations (in days) (M = 589.38, SD = 395.59 vs. M = 67.57, SD = 76.19) and a higher number of comments than branch 27 (M = 6.42, SD = 6.51 vs. M = 4.97, SD = 5.06).

ANOVAs revealed a significant effect of the Component on the Issue Duration (F[2, 257] = 49.48, p < .001, η² = .28) and the Number of Comments (F[2, 257] = 4.60, p < .05, η² = .03). Issue discussions about general components take longest (in days) (M = 565.98, SD = 408.42) and comprise the highest number of comments (M = 6.82, SD = 7.48), followed by specific (Issue Duration: M = 237.69, SD = 336.21; Number of Comments: M = 5.81, SD = 5.28), and untriaged issues (Issue Duration: M = 66.22, SD = 65.03; Number of Comments: M = 4.06, SD = 4.10). Not surprisingly, the Issue Duration and the Number of Comments correlate positively (r = .24, p < .001): the higher the number of comments, the longer the issue duration.
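The effect sizes reported in this subsection can be related to their test statistics with standard textbook conversions. The sketch below is an assumption about how such values are commonly obtained; the paper does not state which formulas or software produced its effect sizes, and the helper names (`r_from_t`, `eta_squared_from_f`, `cramers_v`, `label`) are made up for illustration.

```python
import math


def r_from_t(t, df):
    """Effect size r for a t-test: sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t * t / (t * t + df))


def eta_squared_from_f(f, df_effect, df_error):
    """Eta squared of a one-way ANOVA, recovered from F and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


def cramers_v(chi2, n, smaller_dim):
    """Cramer's V; smaller_dim is min(rows, columns) of the contingency table."""
    return math.sqrt(chi2 / (n * (smaller_dim - 1)))


def label(value, small, medium, large):
    """Map an effect size onto the verbal conventions given in Section 3.3."""
    if value >= large:
        return "large"
    if value >= medium:
        return "medium"
    if value >= small:
        return "small"
    return "below small"


# Test statistics taken from the text above; n = 260 issues in total.
print(round(r_from_t(2.54, 258), 2))                 # Issue Type vs. Issue Duration
print(round(r_from_t(16.23, 258), 2))                # Branch vs. Issue Duration
print(round(eta_squared_from_f(49.48, 2, 257), 2))   # Component vs. Issue Duration
print(round(cramers_v(20.54, 260, 2), 2))            # Issue Type vs. Component
print(round(cramers_v(90.39, 260, 2), 2))            # Branch vs. Component
print(label(r_from_t(16.23, 258), 0.1, 0.3, 0.5))
```

Under these assumed conversions, the printed values round to the effect sizes reported above (.16, .71, .28, .28, and .59), which suggests that the reported statistics are internally consistent.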

4.2. Findings for RQ1: decision-making strategies

Frequencies and intercorrelations of decision-making strategies.

First, we tested our hypothesis H1a. Then, intercorrelations between decision-making strategies were analyzed. Table 5 shows the mean frequencies of each decision-making strategy element per issue. In addition, we divided the mean frequency of each element per issue by the mean total number of decision-making strategy elements per issue. These proportions are also presented in Table 5 and reveal a clear predominance of NDM elements compared to RDM elements. In a t-test for dependent samples, we compared the mean number of NDM elements with the mean number of RDM elements. Consistent with our hypothesis H1a, we observed a significantly higher number of NDM (M = 5.47, SD = 5.94) than RDM elements (M = 0.18, SD = 0.57, t[259] = 15.06, p < .001, r = .68).

Unsurprisingly, the number of codes of an issue depends significantly on the Number of Comments of the respective issue (r = .94, p < .001). Therefore, for the correlational analyses reported in Table 5, we analyzed the proportion of each code per comment instead of the absolute frequencies. Interestingly, there are small or even negative correlations between the codes of one decision type. A positive correlation sign indicates that the higher one variable, the higher is the other variable. A negative correlation, on the other hand, reveals that the higher one variable, the smaller is the other one. For example, the higher the proportion of singular evaluation codes, the smaller is the proportion of situation assessment codes (r = –.26, p < .001). To facilitate the understanding of our results, we report the analyses based on a combined measure of codes, the sum of NDM codes divided by the total number of decision type codes (also called Percentage of NDM), in the remainder of this paper.

Influence of issue dimensions on decision-making strategies.

First, we analyzed our hypothesis H1b.
Second, we conducted several exploratory analyses on the influence of issue dimensions on the distribution of decision-making strategy elements.


Table 5
Means and standard deviations of the frequency of each decision-making strategy element per issue, proportions of decision-making strategy elements, and intercorrelations between the frequencies of decision-making strategy elements per comment.

| Decision-making strategy element | M (SD) | Proportion (in %) | Sat | SE | SA | Mat | Opt | CC | CE |
|---|---|---|---|---|---|---|---|---|---|
| Satisficing (Sat) | 0.63 (0.99) | 12.15 | – | .02 | –.23∗∗∗ | –.10 | .06 | .01 | –.09 |
| Singular Evaluation (SE) | 0.82 (1.41) | 13.80 | | – | –.26∗∗∗ | –.25∗∗∗ | .14∗ | .17∗∗ | .04 |
| Situation Assessment (SA) | 2.88 (3.89) | 47.88 | | | – | –.50∗∗∗ | .00 | .00 | –.07 |
| Matching (Mat) | 1.13 (1.47) | 24.33 | | | | – | .01 | .01 | .01 |
| Optimizing (Opt) | 0.03 (0.19) | 0.35 | | | | | – | .14∗ | .21∗∗∗ |
| Consequential Choice (CC) | 0.06 (0.26) | 0.72 | | | | | | – | .21∗∗∗ |
| Criterion Evaluation (CE) | 0.08 (0.34) | 0.76 | | | | | | | – |

Legend: ∗ p < .05; ∗∗ p < .01; ∗∗∗ p < .001

Fig. 4. Percentage of NDM as a function of Branch and Issue Type. Note: Bars represent the means and error bars the 95% confidence intervals of the means.
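The combined measure used in this subsection, the Percentage of NDM, and the per-comment proportions used for the correlational analyses can be sketched as follows. The code names follow Table 1; the function names, the dictionary layout of the counts, and the choice to return `None` for an issue without any strategy codes are assumptions made for this illustration, since the paper does not describe how such issues were handled.

```python
NDM_CODES = ("Satisficing", "Singular Evaluation", "Situation Assessment", "Matching")
RDM_CODES = ("Optimizing", "Consequential Choice", "Criterion Evaluation")


def percentage_of_ndm(code_counts):
    """Sum of NDM codes divided by the total number of strategy codes of an issue."""
    ndm = sum(code_counts.get(code, 0) for code in NDM_CODES)
    rdm = sum(code_counts.get(code, 0) for code in RDM_CODES)
    total = ndm + rdm
    if total == 0:
        return None  # assumption: undefined when no strategy code was assigned
    return ndm / total


def per_comment_proportions(code_counts, number_of_comments):
    """Relative frequency of each code, normalized by the issue's comment count."""
    return {code: count / number_of_comments for code, count in code_counts.items()}


# Hypothetical issue with four comments and five strategy codes.
counts = {"Satisficing": 1, "Situation Assessment": 3, "Optimizing": 1}
print(percentage_of_ndm(counts))
print(per_comment_proportions(counts, 4))
```

Normalizing by the Number of Comments reflects the reasoning given in the text: absolute code counts grow with the length of a discussion, so raw frequencies would mostly measure how long an issue was discussed.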

We conducted an ANOVA with Issue Type and Branch as between-subject factors and the Percentage of NDM as dependent variable. The branch was included in order to check if the pattern of results is similar for the two branches. This 2 (bug vs. feature) by 2 (branch 6 vs. branch 27) ANOVA yielded a main effect of Issue Type (F[1,256] = 24.10, p < .001, ηp² = .09). The Percentage of NDM for bugs (M = .99, SD = .03) is significantly higher than for features (M = .94, SD = .13), which confirms H1b. There is no main effect of Branch (F[1,256] = 1.31, p = .25), indicating that the two branches do not differ regarding the percentage of NDM elements. Neither is there an interaction of Issue Type and Branch (F < 1), which reveals that the pattern (higher percentage of NDM elements for bugs than for features) is independent of the branch (see also Fig. 4).

Exploratory analyses showed that the Issue Duration has no influence on the Percentage of NDM (r = –.01, p = .88). The Issue Component influences the Percentage of NDM (F[2,257] = 4.62, p < .05). Hereby, specific components have the lowest NDM proportion (.97), followed by general (.98) and untriaged issues (1.00). As bugs and features differ in the issue component, the two variables are confounded and the observed effect on Percentage of NDM might be due to only one of these variables. Therefore, we conducted a 2 (bug vs. feature) by 3 (specific vs. untriaged vs. general) ANOVA and found a main effect of Issue Type (F[1,254] = 24.47, p < .001, ηp² = .09). This shows that the effect of Issue Type on Percentage of NDM is stable and not (solely) due to the issue component. The main effect of the issue component, on the other hand, was now no longer significant (F[2,254] = 1.94, p = .15). Furthermore, there is a small effect of the Number of Comments (r = –.12, p < .05) on the NDM proportion: the higher the number of comments contained within an issue, the smaller is the proportion of NDM.

4.3. Findings for RQ2: decision knowledge elements

Frequencies and intercorrelations of decision knowledge elements.

As can be seen from Table 6, the decision knowledge elements are not equally distributed. For example, in each issue on average 2 Context elements and 1 Solution element appear. The number of other elements is considerably smaller. As we are mostly interested in the type of argumentation of the discussants, we tested whether the number of arguments in favor of something (summed over Question and Solution elements) exceeds the number of contra arguments, but found no significant difference (t[259] = 1.31, p = .19). However, the number of Question elements (M = 0.13, SD = 0.50) is significantly smaller than the number of Solution elements (M = 1.00, SD = 1.27; t[259] = 4.52, p < .001, r = .27). Also, two small but significant effects are noteworthy: Solutions are negatively correlated with the number of Context elements (r = –.27, p < .001), but are positively correlated with arguments attacking a solution (r = .21, p < .001). In addition, the number of all arguments is correlated with the total number of all other decision knowledge elements (r = .70). The higher the number of comments, the higher the number of decision knowledge elements (r = .90, p < .001). Therefore, we calculated the relative frequencies of each decision knowledge element dividing it by the Number of Comments. The intercorrelations of these relative frequencies are also given in Table 6.

Influence of issue dimensions on decision knowledge.

In several exploratory analyses, we examined whether the issue dimensions have an influence on the decision knowledge elements. Thereby, we found that the difference between the number of Question and Solution elements depends on the Issue Type (t[258] = 3.37, p < .001, ηp² = .21). Only for features is the number of Question elements higher than the number of Solution elements (M = 0.21, SD = 1.50); for bugs it is the other way round (M = –0.63, SD = 1.63). The difference between Question and Solution elements does not depend on any other issue dimension.
Additionally, for features, there are generally more decision knowledge elements than for bugs (t[258] = 2.20, p < .05, r = .14). Again, none of the other issue dimensions is related to the total number of decision knowledge elements. 4.4. Findings for RQ3: relations of decision-making strategies and decision knowledge elements Table 7 shows the intercorrelations between the frequency of decision-making strategy elements and the frequency of decision knowledge elements per comment. Additionally, the intercorrelation between the frequency of each decision knowledge element per comment and the Percentage of NDM is given. We will only report interesting medium- and large-sized effects (r >= .3). We found a large positive correlation between Solution elements and the NDM elements Satisficing (r = .62) and Singular Evaluation (r = .39). In addition, solutions are negatively correlated with Situation Assessment (r = –.33). In contrast, no significant correlations were uncovered for Solution and RDM elements. To ensure that these correlations hold for both branches, we also analyzed the intercorrelations between decision-making strategy elements and decision knowledge elements individually for each branch. The results reveal that the correlation coefficients presented above have the same sign in both branchess (Solution


Table 6
Means, standard deviations, and proportions of decision knowledge elements and intercorrelations between the frequencies of decision knowledge elements per comment.

Decision knowledge element        M (SD)         Proportion (in %)
Question (Qst)                    0.13 (0.50)     2.82
Solution (Sol)                    1.00 (1.27)    24.10
Context (Ctx)                     2.14 (3.18)    38.05
Assumption (Asp)                  0.33 (0.67)     6.21
Constraint (Cst)                  0.06 (0.23)     0.88
Implication (Imp)                 0.05 (0.24)     0.76
Argument pro Question (ApQ)       0.52 (0.89)     8.65
Argument pro Solution (ApS)       0.28 (0.64)     4.45
Argument contra Question (AcQ)    0.45 (0.83)    10.90
Argument contra Solution (AcS)    0.24 (0.65)     3.19

Intercorrelations
     Qst      Sol      Ctx      Asp      Cst      Imp      ApQ      ApS      AcQ      AcS
Qst  –        –.15     –.01     –.07     .04      –.03     .01      –.06     –.09     –.03
Sol           –        –.27∗∗∗  .07      –.06     .17∗∗    –.19∗∗   .03      –.02     .21∗∗∗
Ctx                    –        –.10     .12      –.02     .13∗     .04      –.05     –.01
Asp                             –        –.04     –.03     .06      .06      –.08     .05
Cst                                      –        –.02     –.01     –.03     –.03     .04
Imp                                               –        –.03     .07      .19∗∗    –.04
ApQ                                                        –        –.08     –.09     –.09
ApS                                                                 –        –.10     .09
AcQ                                                                          –        .01
AcS                                                                                   –

Legend: ∗ p < .05; ∗∗ p < .01; ∗∗∗ p < .001

Table 7
Intercorrelations of decision-making strategy elements and decision knowledge elements.

Column key: NDM elements: Sat = Satisficing, SgE = Singular Evaluation, SiA = Situation Assessment, Mat = Matching; RDM elements: Opt = Optimizing, CoC = Consequential Choice, CrE = Criterion evaluation; %NDM = Percentage of NDM.

Decision knowledge element  Sat      SgE      SiA      Mat      Opt      CoC      CrE      %NDM
Question                    –.11     –.01     .04      .00      –.03     .01      .17∗∗    –.08
Solution                    .62∗∗∗   .39∗∗∗   –.33∗∗∗  –.07     .17∗∗    .08      –.04     –.06
Context                     –.12     –.05     .26∗∗∗   .10      –.08     .10      .06      –.03
Assumption                  –.02     .16∗     .06      –.06     .17∗∗    .04      –.01     –.04
Constraint                  –.06     –.03     .06      –.05     .12      .10      .10      –.19∗∗
Implication                 .05      .21∗∗∗   .16∗     –.09     –.01     .29∗∗∗   –.02     –.09
Argument pro Question       –.11     –.16∗    .20∗∗    .02      –.02     .01      –.01     .01
Argument pro Solution       .00      .21∗∗∗   .07      –.06     .20∗∗    .23∗∗∗   .04      –.15∗
Argument contra Question    .01      .10      .15∗     –.09     –.05     –.05     –.06     .09
Argument contra Solution    .00      .26∗∗∗   –.05     –.07     .28∗∗∗   .11      .12      –.19∗∗

Legend: ∗ p < .05; ∗∗ p < .01; ∗∗∗ p < .001

and Satisficing: r = .92 for branch 6, r = .56 for branch 27; Solution and Singular Evaluation: r = .21 for branch 6, r = .43 for branch 27; Solution and Situation Assessment: r = –.4 for branch 6, r = –.26 for branch 27).

5. Discussion

In this section, we discuss our study results with respect to background and related work (within the paragraphs termed “Interpretation of Results”) together with potential implications for software development in open source projects in general. However, these implications require further investigation to be evaluated within other open source projects. Therefore, we also describe implications of our findings for research (within the paragraphs termed “Ideas for Further Research”). Specific ideas for further investigations and improvements of tool support are presented for knowledge management systems in general and issue tracking systems in particular. Therefore, we discuss the quantitative distribution of decision strategy and knowledge elements as well as notable significant medium- or large-sized correlations for issue dimensions, decision-making strategies, and decision knowledge. In addition, small-sized correlations that we find interesting are highlighted. Our quantitative analysis enables dedicated improvement suggestions, particularly involving data mining and recommendation.

5.1. Findings of explorative analyses on issue dimensions

Before we discuss our results for the research questions, we briefly highlight the relationships of the investigated issue dimensions. We found three large relationships which are depicted in Fig. 5.

Interpretation of results: The relationships between Branch and Component and between Component and Issue Duration are probably due to the different ages of the branches. It should be noted that our notion of Issue Duration only covers the last measurable activity in a discussion, but this does not necessarily mean that the issue is actually solved or finished (cf. Section 3.2). As branch 6 is older than branch 27, it was likely not yet as well structured as the younger branch 27. This is backed up by the fact that for branch 6 mostly “general” was found as value for Component, whereas it was mostly “specific” for branch 27. Furthermore, the longer issue duration for branch 6 (cf. Section 4.1) can also be due to unspecific issue reports, which required more discussion. This is in line with the slightly higher Number of Comments for branch 6 (M = 6.42) than for branch 27 (M = 4.97).

Ideas for further research: For different kinds of open source projects it should be investigated whether issue reports tend to be more general with longer discussions in earlier branches. This could be reflected in issue tracking systems with specialized issue report templates for early development stages. For instance, issue reporters could be encouraged to propose new components for features instead of using the unspecific “general” classification.

5.2. Findings for decision-making strategies (RQ1)

5.2.1. Findings for RQ1, hypothesis H1a

Our results clearly confirm hypothesis H1a: Many more NDM elements (about 98%) than RDM elements (less than 2%) were found. In detail, for NDM mostly Situation Assessment elements were found (about 47%), followed by Matching (24%), Singular Evaluation (13%), and Satisficing (12%) (cf. Section 4.2). For RDM, Criterion Evaluation was found most often (0.76%), followed by Consequential Choice (0.72%), and Optimizing (0.35%).
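To make the notion of a share of NDM elements concrete, the following Python sketch computes the percentage of NDM codes among all strategy codes of a single issue. It assumes that the Percentage of NDM is the number of NDM elements divided by the number of all coded strategy elements; this definition, the function name, and the example counts are illustrative assumptions and are not taken from the study data.

```python
from typing import Mapping

# Strategy element names as used in the coding tables
NDM_ELEMENTS = ("Situation Assessment", "Matching", "Singular Evaluation", "Satisficing")
RDM_ELEMENTS = ("Criterion Evaluation", "Consequential Choice", "Optimizing")


def percentage_ndm(counts: Mapping[str, int]) -> float:
    """Return the share of NDM codes among all strategy codes, in percent."""
    ndm = sum(counts.get(name, 0) for name in NDM_ELEMENTS)
    rdm = sum(counts.get(name, 0) for name in RDM_ELEMENTS)
    total = ndm + rdm
    if total == 0:
        raise ValueError("no decision-making strategy elements were coded")
    return 100.0 * ndm / total


# Hypothetical code counts for one issue report (invented for illustration)
example_issue = {
    "Situation Assessment": 5,
    "Matching": 2,
    "Satisficing": 1,
    "Criterion Evaluation": 2,
}
print(percentage_ndm(example_issue))  # 80.0
```

An issue without any coded strategy element has no defined percentage, which is why the sketch raises an error in that case instead of returning zero.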


Fig. 5. Relationships for issue dimensions.

Interpretation of results: According to the percentages of Situation Assessment for NDM and Criterion Evaluation for RDM, developers seem to have a need for analyzing given decision problems in order to understand them. This is in line with the observation of Zannier et al. [56] that developers are mostly concerned with problem structuring when solving a decision problem. Our findings indicate that developers in open source projects do not spend additional effort to add more RDM elements. It should be noted that the developers were not constrained by prescribed documentation methods in their comments to issues. Thus, they could have added more comments with additional RDM elements at any time. Two different explanations for them not doing so are plausible: First, developers might have found the given NDM documentation to be sufficient. This would imply that for most decisions related to issue reports no extensive RDM-based documentation is requested by the developers. In this case, RDM documentation should not be enforced for minor decisions. Second, developers might have wanted to document more RDM, but were hindered by high documentation effort. This would imply that better support for documenting RDM is required. For software development in open source projects in general, our findings reinforce the study results of Zannier et al. [56] and Tang et al. [45], which identified the application of NDM in software design decisions. This application of NDM challenges the common understanding of software engineering as an engineering task being related to RDM by definition [13]. However, the impact of an extensive NDM documentation on the outcome and quality of the system under development may be positive, neutral, or negative.

Ideas for further research: In consequence, it should be investigated whether the dominance of documenting mainly NDM has a negative impact on the development outcome. Therefore, the quality of documented RDM decisions in comparison with NDM decisions and the impact of less RDM documentation need to be evaluated. As this is difficult to investigate, we propose two different directions of further research. First, it should be investigated whether the percentage of NDM within the documentation is related to the quality of the decision outcome. Developers could be interviewed about the quality and comprehensiveness of decision documentation in issue reports in relation to the decision outcome during multiple development iterations of the same branch. Falessi et al. performed a similar kind of study for design decisions (cf. [12,14]). Also, the impact of decision documentation on the project success could be investigated, as in the study of Van Heesch et al. [54] on architectural design decisions. However, they focused on RDM documentation, so that an investigation of the specific effects of documenting mostly NDM on the development outcome is still missing. Second, our study should be replicated with issue discussions of other open source projects in order to confirm the dominance of NDM. For architectural design decisions, it is known that RDM documentation helps to improve the comprehensiveness of major decisions (cf. [12,14]). Thus, it could be useful to support RDM documentation in open source projects by reflecting and extending the given NDM documentation. Razavian et al. argue that the intentional reflection of NDM is beneficial to improve the overall quality of decisions and their documentation [39]. To support this reflection, knowledge management systems

could be enhanced by rules for evaluating given NDM documentation. For instance, arguments related to claims could be used to stimulate further RDM documentation in comments within issue tracking systems. Regarding the high percentage of situation analysis in documentation, it could be investigated which situational factors might promote the documentation of NDM. For instance, development projects with different degrees of uncertainty regarding the system requirements could be investigated.

5.2.2. Findings for RQ1, hypothesis H1b

Our results show that there is a significantly higher percentage of NDM elements for bug reports than for feature requests, which confirms hypothesis H1b. This is depicted as (1) in Fig. 6.

Interpretation of results: We expected this finding, as in our view there are different decision situations and thereby different development activity structures for the different issue types. On the one hand, developers generally tend to document decision problems in a naturalistic way when they are concerned with fixing errors and follow well-known activities. On the other hand, developers document slightly more rational decision-making when they initially develop features by exploring the related design space (see derivation of hypotheses in Section 3.1). Both aspects might indicate different levels of reflection for decisions on bug reports and feature requests by developers. The developers might use more RDM elements for decisions on new features in order to make these new decisions comprehensible to themselves and to others. This reasoning is backed by the small-sized correlation between Issue Type and Issue Duration. In detail, a longer Issue Duration for bug reports than for feature requests was observed. This is in line with our argumentation that feature requests are likely to have better substantiated solutions.
In consequence, it should take developers less time to discuss and implement them, so that the issue discussion is finished faster.

Ideas for further research: Whereas the observed difference is significant, it is not large in size as NDM clearly predominated over RDM. In other settings (e.g., with higher degrees of uncertainty and less involved experts), this difference might be more pronounced. Thus, follow-up studies should investigate this finding in open source projects with different user communities and development structures.

5.2.3. Findings for RQ1, explorative analysis

In an explorative analysis, we found a negative correlation between Situation Assessment and Matching (cf. part 2 in Fig. 6). For this finding, two different explanations are reasonable. First, it might be that an assessment of the decision situation is documented more often if the given situation cannot be matched to former situations in order to identify a possible solution. Second, it is also possible that more matching with former situations is documented when the developers feel they have clarified the situation sufficiently to compare it to others. To address these explanations in knowledge management systems, support for matching decision problems and solutions by patterns for solution matching could be integrated. This has already been applied for architectural design decisions [19]. However, further studies should investigate which of the explanations is more common in practice. Then, also the focus of the tool support can be determined: Whereas the


Fig. 6. Relationships for decision-making strategy elements.

Fig. 7. Relationships for decision knowledge elements.

first explanation requires support for facilitating the matching using patterns, the second is addressed by supporting the derivation of patterns.

5.3. Findings of explorative analysis for decision knowledge (RQ2)

We found two large relationships for decision knowledge elements, as depicted in Fig. 7.

Interpretation of results: A notable large relationship exists between the difference in the numbers of Solution and Question elements and the Issue Type (cf. part 1 in Fig. 7). The number of documented solutions in comparison with the number of questions was found to be higher for bug reports than for feature requests. It is likely that this difference also results from the characteristics of the different issue types. According to our hypothesis H1b, feature requests might require further design space exploration. During this exploration, developers might use questions [27]. In addition, the number of arguments within the comments is positively correlated with the total number of all other identified decision knowledge elements (cf. part 2 in Fig. 7). We argue that this correlation might indicate that arguments promote the discussion and foster the documentation of further decision-related knowledge. However, this finding does not indicate whether the quality of the documented knowledge is increased by a higher quantity of knowledge. The quantities of all decision knowledge elements indicate a focus on general context knowledge with the element Context (about 38%). Least found were Implications (0.76%) and Constraints (0.88%). Therefore, correlations for these two elements might not be very reliable and are not considered in the discussion.

Ideas for further research: The reasons for the small amounts of constraints and implications within the documentation should be investigated further. It might be that developers find it difficult to distinguish them within the general context information or that they are difficult to identify during the coding. Regarding the documented arguments, one benefit for decision documentation might be that they motivate developers to document further knowledge on decisions.
This should be investigated and reflected in issue tracking systems, as they can be adapted easily. For instance, comments could be extended with explicit relations for supporting or challenging solutions described in other comments.
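The suggestion of explicit relations between comments can be illustrated with a small data structure. The following Python sketch is purely hypothetical: the class names, fields, and example comments are invented for illustration and do not describe the data model of Bugzilla or of any other existing issue tracking system.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Relation(Enum):
    SUPPORTS = "supports"
    CHALLENGES = "challenges"


@dataclass
class Link:
    target_id: int      # id of the comment that describes the solution
    relation: Relation  # whether this comment supports or challenges it


@dataclass
class Comment:
    comment_id: int
    text: str
    links: List[Link] = field(default_factory=list)


def tally(comments: List[Comment]) -> Dict[int, Dict[str, int]]:
    """Count supporting and challenging links per referenced comment."""
    result: Dict[int, Dict[str, int]] = {}
    for comment in comments:
        for link in comment.links:
            entry = result.setdefault(link.target_id, {"supports": 0, "challenges": 0})
            entry[link.relation.value] += 1
    return result


# Invented discussion thread: comment 1 proposes a solution, 2 and 3 react to it
thread = [
    Comment(1, "Proposed solution: cache the parsed preferences."),
    Comment(2, "Works for me on a slow profile.", [Link(1, Relation.SUPPORTS)]),
    Comment(3, "This breaks when preferences change at runtime.", [Link(1, Relation.CHALLENGES)]),
]
print(tally(thread))  # {1: {'supports': 1, 'challenges': 1}}
```

With such typed links, the arguments for and against a documented solution could be counted or listed per solution without manual coding of the comment text.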

5.4. Findings of explorative analysis for relations between decision-making strategies and decision knowledge (RQ3)

We found a set of medium- to large-sized correlations for Solution with three NDM elements. This set is depicted in Fig. 8. Although both branches covered a broad range of different development concerns and issue comment contents, a set of similar correlations between decision-making strategy and knowledge elements was found. We observed a medium-sized negative correlation between Solution and Situation Assessment, and positive correlations of Solution to Satisficing and Singular Evaluation.

Interpretation of results: We argue that two different explanations are reasonable. On the one hand, different developers can only evaluate their solutions in terms of Satisficing and Singular Evaluation when they are stated explicitly. Then, developers might spend less effort in documenting information about the decision problem. On the other hand, a thorough investigation of the decision problem is only needed when solutions are not obvious to developers. Then, fewer solutions are named. As our study has a correlational design, these different explanations cannot be disentangled. However, both explanations indicate an imbalance in the developers’ exploitation and documentation of the decision problem and its solutions. This imbalance might be mitigated over time in long-lasting issue discussions, when solution knowledge is added. In addition, our interpretation is backed up by two small-sized correlations. First, there is a negative correlation between Solution and Context elements. It might indicate that developers name more alternatives as possible solutions when a lack of context knowledge hinders a proper situation analysis (see results of Ko and Chilana [27]). However, it is also possible that developers focus on context knowledge in order to clarify a decision problem, so that fewer alternatives have to be considered. Second, there is a positive correlation between Context and Situation Assessment. This supports our assumption that context knowledge is derived and documented when the decision situation is assessed. In addition, the correlation could also indicate that developers use documented context knowledge as the basis for a thorough situation analysis (see results of Zannier et al. [56]).

Ideas for further research: For open source projects, it should be investigated whether developers focus mostly on the assessment of a given decision problem when something hinders them from proposing solutions directly. If so, knowledge management systems could focus on supporting the decision problem exploitation when no or very few solutions are given. Therefore, it could be beneficial to enable developers to categorize the content of a


Fig. 8. Relationships for decision-making strategy and knowledge elements.

comment explicitly, such as “problem” or “solution”. Then, issue tracking systems could help developers with identifying similar decision problems and their related context and criteria. In contrast, it could also be useful to promote further documentation of the decision problem, when many solutions are given without much assessment of the decision situation. Both would support developers in adding decision knowledge over time instead of asking them to provide large amounts of knowledge at once. For software development in open source projects in general, our findings highlight that the incremental documentation of decisions appears to be beneficial. This also reinforces the requests for supporting the incremental documentation of design decisions (cf. [32,47,50]).

6. Threats to validity

According to Runeson et al. [41], four different categories of threats to validity have to be considered for our study. They are described and discussed in the following paragraphs.

Internal validity is concerned with the correlation between the investigated factors and other factors [41]. Some of the issues we retrieved for the two Firefox branches might have been linked to their respective branch by mistake. For instance, this might have happened due to inconclusive or misleading issue descriptions. We tried to minimize this problem by checking the content of all issue descriptions manually when the Issue Type was determined (cf. Section 3.2). In addition, issues that belong to one of the branches could have been missed, if they were not properly linked to the branches within the issue tracking system. We addressed this problem by randomly selecting only a subset of the raw data for investigation. However, the actual ratio of bugs and features might deviate from our calculated value. Issues could have been reported in different ways according to the role and experience of the reporter, i.e., whether the reporter was a Firefox developer or an end user.
In consequence, the issue discussion could have been determined by the reporter, affecting the type, dimensions, and discussion for such an issue and thereby our coding results. However, this threat was mitigated by guidelines of the Firefox project for its issue tracking system, as every newly created issue needs to describe a number of standardized attributes (for instance, steps to reproduce, actual results, expected results). Finally, not all relevant decision aspects might be documented within the issue tracking system. Issue-related information could have been documented in other sources than the issue tracker, for instance via mailing lists or in forums. However, if there had been extensive external documentation of issue-related knowledge, discussion participants likely would have linked or copied it as a comment into the issue report. Only then, the information would have been made accessible to others to prevent discussions from becoming ineffective. We found some of those references to external documents and considered them for the coding of the related comment.

Construct validity is concerned with any gaps between intended and actual observations of the researchers [41]. The coding tables for decision-making strategies and decision knowledge could have identified something other than decision strategy or decision knowledge elements. We addressed this threat by deriving our coding tables from a comprehensive overview of the theories and fundamental approaches for decision-making strategies and decision knowledge.

Reliability is concerned with the degree to which data and analyses of a study are dependent on specific researchers [41]. As all codings of issue comments were performed manually by the authors, there is the possibility that codes were missed or set inappropriately. Moreover, as the two coders coded one branch each, discrepancies between two coders could have existed, but were not revealed. We tackled both issues by performing two rounds of coding training, where both coders coded the same issues independently, and discussed any differences to mitigate these threats. This is supported by our calculated values for the intercoder reliability which range from good to excellent.

External validity is concerned with the degree to which the results of our study can be generalized [41]. First, we only investigated issue reports from one open source software development project. Therefore, our findings might not be comparable to the results gathered from other open source projects. We addressed this threat by explicitly choosing the Firefox project, as it is a well-established subject considered in many other issue-related studies [27,43,55]. In consequence, our results should be applicable to other open source projects with similar project size and team distribution. Second, the branches 6 and 27 might not be representative for the majority of issue reports in all Firefox branches. This problem exists for all subsets of branches, unless every branch is investigated. We mitigated this issue by choosing two branches, so that the earlier project stages as well as the later ones are covered. So, it is rather unlikely that important shifts or changes during issue documentation have been missed.
However, we only analyzed a small subset of all issues registered in the issue tracking system (there were still more than 150k issues in July 2015). But our issue sample size of N = 260 is large from a statistical point of view, so that the identified relationships are well grounded.
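The role of the sample size can be illustrated by asking how large a correlation has to be to reach significance with N = 260. The following Python sketch converts a correlation coefficient into a t statistic via t = r · sqrt(N – 2) / sqrt(1 – r²) and approximates the two-sided p-value with the standard normal distribution instead of Student's t distribution. This approximation is an assumption made here to keep the example free of external libraries; it is close for 258 degrees of freedom but is not the procedure used in the study, and the function names are hypothetical.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation r from n observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def approx_two_sided_p(t: float) -> float:
    """Two-sided p-value, using the standard normal as a stand-in for Student's t."""
    return math.erfc(abs(t) / math.sqrt(2.0))


for r in (0.12, 0.13, 0.30):
    t = t_from_r(r, 260)
    # prints r, the rounded t statistic, and whether p < .05 under the approximation
    print(r, round(t, 2), approx_two_sided_p(t) < 0.05)
```

Under this approximation, a correlation of .13 is significant at the 5% level with N = 260, whereas .12 is not. This matches the pattern of marked and unmarked coefficients in Tables 6 and 7.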

7. Conclusion and future work

In this paper, we presented an empirical study analyzing the documented decision strategies and decision knowledge within open source software projects. In detail, we investigated the percentage of documented naturalistic and rational decision-making and the percentage of different decision knowledge elements, like Context information, Solutions, or Arguments, in the comments of 260 Firefox issue reports. Therefore, two different coding tables were created with regard to existing decision-making theories and knowledge documentation approaches. For each issue, we also recorded several additional variables describing the technical and project context of the issue report as issue dimensions. We coded all comments according to these tables and statistically analyzed the resulting codes. In our analyses, we were interested in significant correlations among decision-making strategy elements, decision knowledge elements, issue dimensions, and intercorrelations between them.


Several major results were found for the open source project Firefox. We found NDM to be dominant over RDM for both bug reports and feature requests. Interestingly, the proportion of NDM to RDM was slightly higher for bug reports. We found mostly Context and Solution elements as documented decision knowledge. In addition, we observed a correlation between the total amount of decision knowledge elements and the amount of given arguments. Solution elements were found to be positively correlated with the NDM strategy elements Singular Evaluation and Satisficing. However, there is also a negative correlation between Solution and Situation Assessment. We suggest that the impact, comprehensiveness, and support of NDM documentation should be further investigated for open source projects and industrial projects. The coding tables that we developed for our approach enable researchers to investigate documented decision-making strategies and decision knowledge in issue comments. To the best of our knowledge, the coding tables are the first ones to cover these two aspects of decisions during software development.

Based on these tables and our experiences from this study, future research should follow two different directions. First, our study should be replicated with other open source projects of different size, type, and domain as well as with data from industry projects. This will show to what extent our method and findings can be generalized or how much they depend on the characteristics of a particular development project. To replicate our study, the coding tables can be further improved based on the experiences described in this paper. For instance, we found that the fine-grained distinction of Assumptions, Constraints, and Implications codes was not used very often by the coders. In consequence, these elements may be omitted or replaced in future studies.
Second, the setup of replicated studies can be further improved based on the results and experiences described in our paper. Additional communication channels, like mailing lists, forums, repository commit messages, or the source code could be included to consider as many sources of decision-relevant information as possible and thereby further strengthen the reliability of the results.

Acknowledgments

This work was partially supported by the DFG (German Research Foundation) under the Priority Programme SPP1593: Design For Future – Managed Software Evolution. We thank our colleagues Paul Huebner, Doris Keidel-Mueller, and Christian Kuecherer for providing valuable feedback on the paper.

References

[1] M. Ali Babar, I. Gorton, A tool for managing software architecture knowledge, in: Proceedings of the Second Workshop on Sharing and Reusing Architectural Knowledge - Architecture, Rationale, and Design Intent (SHARK/ADI’07: ICSE Workshops 2007), IEEE, 2007, pp. 11–17.
[2] D. Arnott, G. Pervan, Eight key issues for the decision support systems discipline, Decis. Support Syst. 44 (3) (2008) 657–672.
[3] V.R. Basili, G. Caldiera, D.H. Rombach, Goal question metric paradigm, in: J.J. Marciniak (Ed.), Encyclopedia of Software Engineering, Wiley-Interscience, New York, 1994, pp. 528–532.
[4] P. Berander, A. Andrews, Requirements prioritization, in: A. Aurum, C. Wohlin (Eds.), Engineering and Managing Software Requirements, Springer, Berlin, Heidelberg, 2005, pp. 69–94.
[5] J.E. Burge, D.C. Brown, An integrated approach for software design checking using design rationale, in: Proceedings of the First International Conference of Design Computing and Cognition, Springer, Berlin, Heidelberg, 2004, pp. 557–576.
[6] J.E. Burge, J.M. Carroll, R. McCall, I. Mistrík, Rationale-Based Software Engineering, first edition, Springer, Berlin, Heidelberg, 2008.
[7] G. Canfora, G. Casazza, A. De Lucia, A design rationale based environment for cooperative maintenance, Int. J. Softw. Eng. Knowl. Eng. 10 (5) (2000) 627–645.
[8] R. Capilla, F. Nava, J.C. Duenas, Modeling and documenting the evolution of architectural design decisions, in: Proceedings of the Second Workshop on Sharing and Reusing Architectural Knowledge - Architecture, Rationale, and Design Intent (SHARK/ADI’07: ICSE Workshops 2007), IEEE, 2007, p. 9.

[9] L. Chen, M. Ali Babar, B. Nuseibeh, Characterizing architecturally significant requirements, Software 30 (2) (2013) 38–45.
[10] D.V. Cicchetti, S.A. Sparrow, Developing criteria for establishing interrater reliability of specific items: Applications to assessment of adaptive behavior, Am. J. Mental Defic. 86 (2) (1981) 127–137.
[11] J. Cohen, Statistical Power Analysis for the Behavioral Sciences, second edition, Erlbaum Associates, Hillsdale, 1988.
[12] D. Falessi, G. Cantone, M. Becker, Documenting design decision rationale to improve individual and team design decision making, in: Proceedings of the 2006 ACM/IEEE International Symposium on Empirical Software Engineering ISESE ’06, ACM, 2006, pp. 134–143.
[13] D. Falessi, G. Cantone, R. Kazman, P. Kruchten, Decision-making techniques for software architecture design, ACM Comput. Surv. 43 (4) (2011) 1–28.
[14] D. Falessi, R. Capilla, G. Cantone, A value-based approach for documenting design decisions rationale, in: Proceedings of the 3rd International Workshop on Sharing and Reusing Architectural Knowledge - SHARK ’08, ACM, 2008, pp. 63–69.
[15] Firefox project, Issue tracking system Bugzilla, 2015. (accessed 2015-08-30).
[16] C. Francalanci, F. Merlo, Empirical analysis of the bug fixing process in open source projects, in: Open Source Development, Communities and Quality, IFIP 20th World Computer Congress, Working Group 2.3 on Open Source Software, Springer US, 2008, pp. 187–196.
[17] M. Gamer, J. Lemon, I. Fellows, P. Singh, irr: Various Coefficients of Interrater Reliability and Agreement. R package version 0.84, 2012. (accessed 2015-08-30). URL http://CRAN.R-project.org/package=irr.
[18] J. Gore, A. Banks, L. Millward, O. Kyriakidou, Naturalistic decision making and organizations: Reviewing pragmatic science, Org. Stud. 27 (7) (2006) 925–942.
[19] N.B. Harrison, P. Avgeriou, U. Zdun, Using patterns to capture architectural decisions, Software 24 (4) (2007) 38–45.
[20] T.-M. Hesse, A. Kuehlwein, B. Paech, T. Roehm, B. Bruegge, Documenting implementation decisions with code annotations, in: Proceedings of the 27th International Conference on Software Engineering and Knowledge Engineering, KSI Research, 2015, pp. 152–157.
[21] T.-M. Hesse, B. Paech, Supporting the collaborative development of requirements and architecture documentation, in: Proceedings of the 3rd International Workshop on the Twin Peaks of Requirements and Architecture (TwinPeaks’13), IEEE, 2013, pp. 22–26.
[22] A. Jansen, J. Bosch, Software architecture as a set of architectural design decisions, in: Proceedings of the 5th Working IEEE/IFIP Conference on Software Architecture (WICSA’05), IEEE, 2005, pp. 109–120.
[23] D. Jonassen, Designing for decision making, Edu. Technol. Res. Dev. 60 (2) (2012) 341–359.
[24] F. Khomh, T. Dhaliwal, Y. Zou, B. Adams, Do faster releases improve software quality? An empirical case study of Mozilla Firefox, in: Proceedings of the 9th IEEE Working Conference on Mining Software Repositories (MSR), IEEE, 2012, pp. 179–188.
[25] G. Klein, Naturalistic decision making, Human Factors 50 (3) (2008) 456–460.
[26] G. Klein, R. Calderwood, A. Clinton-Cirocco, Rapid Decision Making on the Fire Ground: The Original Study Plus a Postscript, J. Cognit. Eng. Decis. Mak. 4 (3) (2010) 186–209.
[27] A.J. Ko, P.K. Chilana, Design, discussion, and dissent in open bug reports, in: Proceedings of the 2011 iConference, ACM, 2011, pp. 106–113.
[28] P. Kruchten, P. Lago, H.V. Vliet, Building up and reasoning about architectural knowledge, in: C. Hofmeister, I. Crnkovic, R. Reussner (Eds.), Quality of Software Architectures, Lecture Notes in Computer Science, 4214, Springer, Berlin, Heidelberg, 2006, pp. 43–58.
[29] J. Lee, Extending the Potts and Bruns model for recording design rationale, in: Proceedings of the 13th International Conference on Software Engineering, IEEE, 1991, pp. 114–125.
[30] R. Lipshitz, G. Klein, J. Orasanu, E. Salas, Taking stock of naturalistic decision making, J. Behav. Decis. Mak. 14 (5) (2001) 331–352.
[31] A. MacLean, R.M. Young, M.E. Victoria, T.P. Moran, Questions, options, and criteria: Elements of design space analysis, Human-Comput. Interact. 6 (3-4) (1991) 201–250.
[32] C. Manteuffel, D. Tofan, H. Koziolek, T. Goldschmidt, P. Avgeriou, Industrial implementation of a documentation framework for architectural decisions, in: Proceedings of the 11th Working IEEE/IFIP Conference on Software Architecture (WICSA’14), IEEE, 2014, pp. 225–234.
[33] P. Mayring, Qualitative Inhaltsanalyse, in: G. Mey, K. Mruck (Eds.), Handbuch Qualitative Forschung in der Psychologie, VS Verlag für Sozialwissenschaften, Wiesbaden, 2010, pp. 601–613.
[34] H.M. Mentis, P.M. Bach, B. Hoffman, M.B. Rosson, J.M. Carroll, Development of decision rationale in complex group decision making, in: Proceedings of the 27th international conference on Human factors in computing systems - CHI 09, ACM, 2009, pp. 1341–1350.
[35] T. Ngo, G. Ruhe, Decision support in requirements engineering, in: A. Aurum, C. Wohlin (Eds.), Engineering and Managing Software Requirements, Springer, Berlin, Heidelberg, 2005, pp. 267–286.
[36] J. Orasanu, T. Connolly, The reinvention of decision making, in: G. Klein, J. Orasanu, R. Calderwood, C.E. Zsambok (Eds.), Decision Making in Action: Models and Methods, Ablex Publishing, Westport, 1993, pp. 3–20.
[37] B. Paech, A. Delater, T.-M. Hesse, Supporting project management through integrated management of system and project knowledge, in: G. Ruhe, C. Wohlin (Eds.), Software Project Management in a Changing World, Springer, Berlin, Heidelberg, 2014, pp. 157–192.

[38] R Core Team, R: A language and environment for statistical computing, 2014. (accessed 2015-08-30) URL http://www.R-project.org/.
[39] M. Razavian, A. Tang, R. Capilla, P. Lago, et al., In two minds: How reflections influence software design thinking, Technical Report, VU University Amsterdam, 2015.
[40] G. Ruhe, Software engineering decision support – A new paradigm for learning software organizations, in: S. Henninger, F. Maurer (Eds.), Advances in Learning Software Organizations, Lecture Notes in Computer Science, 2640, Springer, Berlin, Heidelberg, 2003, pp. 104–113.
[41] P. Runeson, M. Höst, A. Rainer, B. Regnell, Case study research in software engineering. Guidelines and examples, first edition, Wiley, Hoboken, 2012.
[42] P.E. Shrout, J.L. Fleiss, Intraclass correlations: Uses in assessing rater reliability, Psychol. Bullet. 86 (2) (1979) 420–428.
[43] R. Souza, C. Chavez, R. Bittencourt, Do rapid releases affect bug reopening? A case study of firefox, in: Brazilian Symposium on Software Engineering (SBES), IEEE, 2014, pp. 31–40.
[44] N.A. Stanton, B.L.W. Wong, Editorial: Explorations into naturalistic decision making with computers, Int. J. Human Comput. Interact. 26 (2-3) (2010) 99–107.
[45] A. Tang, A. Aleti, J. Burge, H. van Vliet, What makes software design effective? Des. Stud. 31 (6) (2010a) 614–640.
[46] A. Tang, M. Ali Babar, I. Gorton, J. Han, A survey of architecture design rationale, J. Syst. Softw. 79 (12) (2006) 1792–1804.
[47] A. Tang, P. Avgeriou, A. Jansen, R. Capilla, M. Ali Babar, A comparative study of architecture knowledge management tools, J. Syst. Softw. 83 (3) (2010b) 352–370.
[48] A. Tang, Y. Jin, J. Han, A rationale-based architecture model for design traceability and reasoning, J. Syst. Softw. 80 (6) (2007) 918–934.
[49] A. Tang, H. van Vliet, Software designers satisfice, in: D. Weyns, R. Mirandola, I. Crnkovic (Eds.), Software Architecture: 9th European Conference, ECSA 2015, Dubrovnik/Cavtat, Croatia, September 7-11, 2015. Proceedings, Springer International Publishing, Cham, 2015, pp. 105–120.


[50] D. Tofan, M. Galster, P. Avgeriou, W. Schuitema, Past and future of software architectural decisions - A systematic mapping study, Inf. Softw. Technol. 56 (8) (2014) 850–872.
[51] A. Tversky, D. Kahneman, Judgment under uncertainty: Heuristics and biases, Science 185 (4157) (1974) 1124–1131.
[52] J. Tyree, A. Akerman, Architecture decisions: Demystifying architecture, Software 22 (2) (2005) 19–27.
[53] U. van Heesch, P. Avgeriou, R. Hilliard, A documentation framework for architecture decisions, J. Syst. Softw. 85 (4) (2012) 795–820.
[54] U. van Heesch, P. Avgeriou, A. Tang, Does decision documentation help junior designers rationalize their decisions? A comparative multiple-case study, J. Syst. Softw. 86 (6) (2013) 1545–1565.
[55] S. Zaman, B. Adams, A.E. Hassan, Security versus performance bugs: A case study on firefox, in: Proceedings of the 8th Working Conference on Mining Software Repositories (MSR), ACM, 2011, pp. 93–102.
[56] C. Zannier, M. Chiasson, F. Maurer, A model of design decision making based on empirical results of interviews with software designers, Inf. Softw. Technol. 49 (6) (2007) 637–653.
[57] C. Zannier, F. Maurer, Foundations of agile decision making from agile mentors and developers, in: P. Abrahamsson, M. Marchesi, G. Succi (Eds.), Extreme Programming and Agile Processes in Software Engineering, Lecture Notes in Computer Science, volume 4044, Springer, Berlin, Heidelberg, 2006, pp. 11–20.
[58] O. Zimmermann, T. Gschwind, J. Küster, F. Leymann, N. Schuster, Reusable architectural decision models for enterprise application development, in: S. Overhage, C. Szyperski, R. Reussner, J.A. Stafford (Eds.), Software Architectures, Components, and Applications, Lecture Notes in Computer Science, volume 4880, Springer, Berlin, Heidelberg, 2007, pp. 15–32.