Debugging hardware designs using dynamic dependency graphs


Jan Malburg, Alexander Finder, Görschwin Fey

University of Bremen, Bibliothekstr. 1 (MZH), 28359 Bremen, Germany
German Aerospace Center, Robert-Hooke-Str. 7, 28359 Bremen, Germany

Article history: Received 6 August 2015; revised 20 May 2016; accepted 19 October 2016; available online xxx.

Keywords: Dynamic dependency graphs; RTL; Fault localization; Debugging

Abstract

Debugging is a time-consuming task in hardware design. In this paper a new debugging approach based on the analysis of dynamic dependency graphs is presented. Powerful techniques for software debugging, including reverse debugging, dynamic forward and backward slicing, and spectrum-based fault localization, are combined and adapted for hardware designs. A case study on designs with multiple faults demonstrated the power of the proposed debugging methodology, reducing debugging time to 50% in comparison to conventional techniques.

1. Introduction


Consuming more than 46% of the total ASIC development effort, verification already is a bottleneck in the design flow [1]. The fastest growing component of the verification process, and with 60% already the largest one, is debugging, i.e., fault localization and error correction. To a large extent debugging is still a manual task, in which a developer must understand an error based on a sequence of input values for which the design yields incorrect outputs. Afterwards, he has to localize and fix the fault in the design. This is normally done using a simulator which presents the values of the different variables in a waveform. Several techniques for helping the developer in debugging have been proposed. Reverse debugging has already been applied to software, reducing the debugging time to 25% [2], and is, for example, implemented in the commonly used software debugger GDB [3]. Dynamic forward and backward slicing reduces the amount of code the developer has to inspect in order to find and fix a bug [4]. Spectrum-based fault localization uses program spectra of correct and incorrect simulation runs to identify parts of the code which are likely to contain a bug [5]. This gives the developer further guidance while searching for the fault that causes the error. In this paper we present the following debugging techniques for Hardware Description Languages (HDL):


• The use of reverse debugging
• Dynamic program slicing which considers control dependency
• Spectrum-based fault localization based on dynamic dependency graphs

For the implementation of our approach we use dynamic dependency graphs [6], which we additionally annotate with the values of the operands for each vertex of the graph. We implemented a Graphical User Interface (GUI) allowing the developer to investigate single simulation runs of his design. To improve the developer's understanding of a run, the GUI presents the run as a graph with the executed source code as vertices, dependencies as edges, and temporal behavior as a time line. The graph can be shown at different granularities, and the user is able to combine different granularity levels with each other. We conducted a case study on designs with multiple faults in which we compared our approach with a conventional debugging approach. In this case study, our approach reduced debugging time by 50% compared to the conventional approach. The remainder of this paper is organized as follows: Section 2 summarizes related work. Section 3 presents definitions used throughout the paper. In Section 4 our approach is introduced in detail. The evaluation of our approach is given in Section 5, and Section 6 concludes the paper.

2. Related work

So far, there exist several methods to support a developer in debugging a design, both for software [2,4,5,7–10] and for hardware [11–16].


A typical approach for software is to reduce the amount of code a developer has to inspect in order to find a fault. One of these techniques is static program slicing, proposed by Weiser [7]. For a given set of program points (the slicing criterion), static program slicing computes all statements for which an input to the system exists such that those statements influence the slicing criterion (backward slicing) or are affected by the slicing criterion (forward slicing). While debugging, a developer normally knows an execution of the system which reveals an error. Korel and Laski [4] therefore proposed dynamic program slicing, which computes the statements that affect (or are affected by) the slicing criterion under a given input. Zhang et al. [17] describe three different types of dynamic slicing: in data slicing only data dependencies are included; full slicing additionally contains control dependencies; and relevant slicing also considers statements which could change the value of a variable by changing the system's execution path.

Another approach to reduce the amount of code a developer must investigate is automated fault localization, which computes parts of the source code that might contain a fault. In spectrum-based fault localization, program spectra, for example coverage information of failing and succeeding test cases, are used to localize a fault. A well-known tool applying spectrum-based fault localization is the visualization tool Tarantula [5]. The tool colors statements in the source code depending on their suspiciousness of causing an error. The suspiciousness is computed by comparing the numbers of failing and succeeding runs which execute the statements. A technique closely related to spectrum-based fault localization is spectrum-based feature localization [18]. Both techniques use coverage information and heuristics to compute whether some part of the code is related to some behavior of the system: in case of fault localization, the cause of an incorrect behavior; in case of feature localization, the code responsible for an intended behavior. The heuristics used for spectrum-based fault localization can, with only small changes, be applied to spectrum-based feature localization [19].

In [8] Renieres and Reiss present another technique for fault localization. Their approach searches for a successful test case with minimal difference to the failing test case. Those parts of the code are then reported as fault candidates which are covered by the failing run, but not by the successful run. Groce et al. [9] further formalized the approach, using a model checker to generate a successful run with minimal distance to the failing run. Delta Debugging [10] aims at helping a developer find a fault by isolating the possible trigger of the fault. Delta Debugging was originally presented to determine those changes to a software system which cause a regression error. Later, Delta Debugging has been extended to minimize error-revealing inputs to the system and to find the minimal change which must be applied to a failing run such that the result becomes correct [20]. The intuition is that knowing the trigger of an error helps in understanding the error, and that a smaller test input executes less code of the system, so the possible fault locations can be restricted.

In [2] Lewis describes a tool for Java programs called Omniscient Debugger which allows stepping backwards in the execution of a program.
In this approach, first an instrumented version of the program is executed and all events, i.e., assignments to variables, function calls, returns from method calls, etc., are stored. Then a GUI answers questions of the form "Where has the value of this variable been set?" and allows the user to step forward and backward through the program's execution. Clarke et al. [11] developed an adaption of static program slicing for code in HDLs. In their approach they relate HDL constructs to constructs of software languages.

Path tracing [12] follows the controlling inputs of different gates to determine the critical path which is responsible for the value of a given signal. In [13] a description is given of how to apply path tracing at the HDL level. However, the approach only considers data dependency and neglects control dependency. Moreover, all kinds of path tracing neglect forward dependencies. Altogether, path tracing can be understood as dynamic backward program slicing applied to hardware systems.

In [21] Stumptner and Wotawa describe model-based diagnosis [22] for hardware description languages. In their approach they generate different versions of the design based on a set of construction rules to decide whether the incorrect behavior can be fixed by such a change, i.e., whether the inverse of the change explains the incorrect behavior. They use the number of changes which must be applied to the design as a ranking function for their explanations, such that fewer changes are considered better explanations. Often, program slicing is applied in model-based diagnosis to reduce the number of possible changes which have to be considered [23]. Another type of model-based diagnosis for hardware designs is Satisfiability (SAT)-based debugging [14]. Given a set of stimuli to a design which result in the violation of the specification, SAT-based debugging uses a SAT solver to explain the incorrect behavior of the design. The circuit is transformed into a Boolean formula and the SAT solver determines a minimal number of fault candidates which must be corrected to fulfill the specification. However, the SAT-based approach is limited by the capability of the underlying SAT solver, making the application infeasible for larger designs.

In [15] two analyses, "What-if" and "How-can", are presented which are based on dynamic data flow analysis. "What-if" analysis computes how the change of one or several values affects a target value. The idea is similar to a conventional debugger, where a user can change the value of a variable during debugging. "How-can" analysis computes the values for a set of signals such that a target signal gets a desired value. This analysis is similar to the idea of SAT-based debugging. To prevent high runtimes, they limit the set on which the search is conducted to 70 binary variables, which, however, also limits the practical benefit of the analysis.

Beer et al. [16] describe a technique which computes the cause of the violation of a specification given as a Linear Time Logic (LTL) formula. The technique requires the specification and a violating simulation run. However, they do not consider expressions or statements as the cause of a violation; instead they consider the valuation of one or more variables as the cause. Hence, their approach does not give any information why the valuation is wrong at this point: the reason could be that an assignment is missing or that the previous assignment was wrong. Further, their approach does not distinguish between the case that changing the variable found would fix the violation and the case that changing it only pushes the violation to another clock cycle.

In [24] Le, Große, and Drechsler present a technique for SystemC Transaction-Level Modeling (TLM) fault localization. Their approach targets faults at the level of transactions, synchronization, and timing. Thus, they do not consider faults in the form of incorrect assignments to variables, incorrect formulas, or similar.
If a design is erroneous, they create a set of alternative designs based on their fault model. These changes to the design are parametrized. Then a model checker is used to check whether a changed design is fault-free for some concrete parameter value. Those designs which can correct the fault are reported as diagnoses of the error. Compared to our approach, they target different kinds of faults. First of all, they target typical faults introduced at the TLM level, for example synchronization errors between transactions. These are not common faults at the RT level, at which our approach works; in fact, transactions like in TLM do not exist at the RT level.


Further, our approach is not limited to a restricted fault model, i.e., we consider faulty expressions or assignments in general. Chang et al. present a technique to automatically correct an erroneous netlist by resynthesis [25]. The approach requires a set of incorrect and correct execution traces of the design. Based on these traces the approach starts by computing faulty signals in the design with respect to the specification. In the next step, primary input signals and internal signals are searched which can be used as inputs to a function resulting in the correct values. These values are used to create a partial truth table, and a replacement gate list is created satisfying this partial truth table. In a final step a formal verification tool is used to either prove the correctness of the design or create a counterexample. If a counterexample is found, it is used as an additional trace in the next iteration of the approach. The main limitation is that the approach likely hits runtime limits, as it is only guaranteed to create a correct design if an exhaustive set of input traces is provided. Also, the requirement of a formal verification engine limits the size of the designs under consideration. In contrast, the technique presented in this work has no such size limitations.

A tool that supports program-spectrum-based fault localization for HDL designs is ZamiaCAD [26]. ZamiaCAD uses statement coverage, branch coverage, and static dependency graphs for feature localization. This combination includes larger portions of source code, and therefore the localization result is less accurate than with the presented technique based on dynamic dependency graphs. Further, ZamiaCAD provides neither dynamic slicing nor reverse debugging. In [27] Repinski et al. present an automated tool which uses fault localization and mutation testing to suggest error corrections for ESL designs. The approach for fault localization is similar to the computation used for ZamiaCAD, with the difference that only statement coverage is used as coverage metric. However, the coverage is not intersected with the static dependency graph, but with the static program slices for the different assertions used in the test cases. The source code parts with the highest likelihood of being faulty are mutated and the resulting mutants are checked against the design's test suite. If the test suite runs successfully on a mutant, this mutant is reported to the developer. Compared to the technique presented in this paper, their tool works at the ESL level of the design process, where our approach works at the RTL level. Second, if their tool does not find a correction, it provides no information to the user at all.

3. Preliminaries

In this section we introduce definitions used throughout the paper. As running example we use a simple sequential parity computation from a design used in our evaluation.
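The original listing is not reproduced here. A minimal Verilog sketch consistent with that description (the signal names in and tick follow the run described in Section 3.1; the exact code and the input width are our assumptions) could look as follows:

    // Hypothetical sketch of the running example; the original listing is
    // not preserved. A sequential parity computation: whenever tick is high,
    // the parity register folds in the reduction XOR of the input word in.
    module parity_example (
      input  wire       clk,
      input  wire       tick,
      input  wire [7:0] in,
      output reg        parity
    );
      initial parity = 1'b0;
      always @(posedge clk) begin
        if (tick)
          parity <= parity ^ (^in);  // ^in is the reduction XOR of all bits
      end
    endmodule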


3.1. Basic definitions

We call the simulation or emulation of a design under a specific input a run of the design. For our running example we use the run

in which all bits of in are set to 1. In the first clock cycle tick is equal to 0 and in the second clock cycle tick is equal to 1. A test case of a design is a use case which is additionally annotated with the expected behavior of the design. The expected behavior is defined by the design's specification. If the design violates the expected behavior, the test case fails; otherwise, the test case succeeds. A design has an error if there exists a run which violates the design's specification. A fault is the root cause of an error. The statements in the test case which check whether the design conforms to the specification are called assertions. The sequence of signal valuations used by assertions when applying a test case t to the design H is denoted by $a_H[t]$. The sequence of values at the primary outputs of the design H produced by the test case t is denoted by $o_H[t]$. In general, assertions can consider any combination of primary outputs and internal signals. Let $E = \{e_1, e_2, \ldots, e_n\}$ be a set of faults. Further, let the operator $\oplus$ denote a design being changed by a set of faults, such that if H is the correct design and B is the design which contains all faults of E, this is described by the formula $B = H \oplus E$. We call an operand of an expression or statement a controlling operand if it belongs to at least one set of operands with the following properties:

1. Changing all operands in the set results in a different result of the expression or statement.
2. There is no set including fewer operands and fulfilling property 1.

For example, given the expression a || b: if a = 1 and b = 0, only a is a controlling operand. If a = 0 and b = 0, both are controlling operands, because there are two minimal sets, one for each operand. If a = 1 and b = 1, both operands are again controlling operands, in this case, however, because there is a single minimal set containing both operands.
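To make the definition concrete, the following self-contained sketch (ours, not from the paper) brute-forces the minimal sets for the expression a || b and prints, for every valuation, which operands are controlling; it reproduces exactly the three cases discussed above.

    // Brute-force check of the controlling-operand definition for a || b
    // (illustrative code, not from the paper).
    module controlling_demo;
      // Does flipping the selected operands change the result of a || b?
      function automatic bit flips_result(input bit a, input bit b,
                                          input bit pick_a, input bit pick_b);
        bit a2, b2;
        a2 = pick_a ? ~a : a;
        b2 = pick_b ? ~b : b;
        return (a2 || b2) != (a || b);
      endfunction

      initial begin
        bit a, b, a_ctrl, b_ctrl;
        for (int v = 0; v < 4; v++) begin
          a = v[1];
          b = v[0];
          // a singleton set is always minimal if it satisfies property 1
          a_ctrl = flips_result(a, b, 1'b1, 1'b0);
          b_ctrl = flips_result(a, b, 1'b0, 1'b1);
          // the pair {a, b} is minimal only if neither singleton works
          if (!a_ctrl && !b_ctrl && flips_result(a, b, 1'b1, 1'b1)) begin
            a_ctrl = 1'b1;
            b_ctrl = 1'b1;
          end
          $display("a=%0d b=%0d: a controlling=%0d, b controlling=%0d",
                   a, b, a_ctrl, b_ctrl);
        end
      end
    endmodule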

3.2. Static dependency graph

Fig. 1. The static dependency graph of our running example.

A static dependency graph is a directed graph. For each statement or expression in the design, the graph contains a vertex. Two vertices are connected by an edge if there exists an input to the system such that the starting point $v_1$ of the edge is directly affected by the ending point $v_2$ of the edge, i.e., $v_1$ depends on $v_2$. The dependency graph can be computed statically from the source code of the system. Fig. 1 shows the static dependency graph of our running example. In this figure, statements and expressions are denoted by a single circle, inputs by a double circle, and dependencies by arrows.


Fig. 2. The dynamic dependency graph of our running example.

3.3. Dynamic dependency graph

During a run r of an HDL design H, an expression or statement in the source code of the system can be executed several times. Multiple executions of a statement or expression may have different reasons:

• It is executed several times at different clock ticks.
• It is instantiated several times.
• It is inside a loop.
• It is triggered several times during a clock cycle.
• A combination of those causes.

We call a single execution of a statement or an expression an execution point x. The execution point x is defined by x's position in the source code, its instantiation path, its time of execution, and the number of times x has already been triggered for that clock cycle and instantiation path. Depending on whether the corresponding source code is an assignment, an expression, or a control-flow statement, we call it an assignment execution point, an expression execution point, or a control execution point, respectively. Let X be the set of all execution points in r. We define a data point d as the value of a constant, the initial value of an instantiated variable, the value of an instantiated variable after an assignment to this variable, the result of an expression, or the value of a primary input after a value change at the input. Let D be the set of all data points in r. A dynamic dependency graph $G = (V, E)$ is an acyclic, directed graph which can be computed from a design H and a run r of the design. The vertex set $V = D \cup X$ of G contains the execution points X and the data points D of r. In contrast to dynamic dependency graphs for software systems [6], the described dynamic dependency graph also includes data points. This is because hardware systems have no designated statements to read input data; instead, the input is given by assignments to the primary input pins. Without the data points, the inputs would therefore not be included in the dynamic dependency graph. Fig. 2 shows the dynamic dependency graph of our running example. In this figure the vertices of the graph are represented as circles; assignment execution points are green, expression execution points are blue, control execution points are red, and data points are black. The primary input values are marked with a double circle. An edge $e = (v_1, v_2)$ of G between $v_1 \in V$ and $v_2 \in V$ is a directed connection from $v_1$ to $v_2$. The edge set E of G is defined by the direct-dependence relations of the vertices. There exist two direct-dependence relations, direct data dependency and direct control dependency. A data point $d \in D$ is direct-data-dependent on an execution point $x \in X$ if either x is the expression execution point for which d is the result, or x is the assignment execution point which originally set d.

Additionally, x is direct-data-dependent on d if d is an operand of x. Given an execution point $x_1 \in X$ and a control execution point $x_2 \in X$ such that $x_2$ controls the execution of $x_1$, then we say $x_1$ is direct-control-dependent on $x_2$. A vertex $v_1 \in V$ is direct-dependent on a vertex $v_2 \in V$ if $v_1$ is direct-control-dependent or direct-data-dependent on $v_2$. In Fig. 2, direct-control-dependence edges are shown as red arrows and direct-data-dependence edges as black arrows. For a vertex $v \in V$, backwardtrace(v) ⊆ V is the set of all vertices which are reachable in G from v following the direction of the edges. Correspondingly, forwardtrace(v) ⊆ V is the set of all vertices which are reachable in G from v following the edges in reverse direction. For each assignment execution point and each expression execution point there exists exactly one vertex which is direct-dependent on this execution point; this vertex is always a data point. Furthermore, each data point has at most one vertex on which it is direct-dependent; this vertex is either an assignment execution point or an expression execution point. Further, all data points with no predecessors are either initial values, primary inputs, or constants. Therefore, the dynamic dependency graph can be reduced by merging the assignment execution points and the expression execution points with their respective data points. In Fig. 2, vertices that can be merged are marked by dotted circles around them. Fig. 3 shows the corresponding reduced dynamic dependency graph for our running example. The dynamic dependency graphs used in this work are annotated with all operand values of the vertices of the graph. Additionally, for assignment execution points and control execution points we annotate the simulation time of the execution. Whenever a primary input is read or an expression execution point is executed, this happens as part of either a control or an assignment statement; its simulation time is therefore identical to the simulation time of the corresponding control or assignment execution point. Consequently, it suffices to annotate the time for the control and assignment statements. In the remainder of this paper we only consider annotated reduced dynamic dependency graphs.
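The paper does not show the tool's data structures. As a purely hypothetical illustration, one vertex of the annotated reduced graph could be stored as a record like the following, with the annotations described above as fields:

    // Hypothetical record for one vertex of the annotated reduced dynamic
    // dependency graph; the paper does not give its data structures. The
    // fields follow the definitions of Section 3.3.
    typedef enum { ASSIGN_EP, EXPR_EP, CONTROL_EP, DATA_POINT } vertex_kind_e;

    typedef struct {
      vertex_kind_e kind;
      string        src_pos;        // position in the source code (file:line)
      string        inst_path;      // instantiation path
      int           trigger_cnt;    // triggers in this cycle and instance
      longint       sim_time;       // annotated for assignment/control points
      string        operand_vals[]; // annotated operand values
      int           deps[];         // direct dependencies (edges of G)
      int           rdeps[];        // reverse edges, used for forward traces
    } ddg_vertex_t;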

4. Graph-based debugging

In this section we present our approach. Fig. 4 gives an overview of the different parts of our approach. The basis is formed by annotated reduced dynamic dependency graphs, which are used for all other computations and visualizations. We compute annotated reduced dynamic dependency graphs by executing an instrumented version of our design. A GUI is utilized to present the dynamic dependency graphs as well as the results of all other computations to the user. Several techniques have been integrated into our debugging tool to support debugging the design.

Fig. 3. The reduced dynamic dependency graph of our running example.

Listing 1. An example to present the instrumentation of designs.

Fig. 4. A schematic overview of the relations between the different parts of our approach.

Reverse debugging allows the user to step back and forth within a run of the design. Dynamic slicing computes the backward and forward slices for vertices in the graph. Spectrum-based fault localization computes expressions and statements which are likely to be faulty. In the following we describe the single components of our approach and how they complement each other.

4.1. Annotated reduced dynamic dependency graphs

Listing 2. The instrumented version of the example.

In order to compute the annotated reduced dynamic dependency graphs, the design under test is instrumented. The idea of this instrumentation is to capture the information flow within the design. For the instrumentation a fully automatic tool is provided. The instrumentation changes the original design as follows:

• All signals and registers of the design are replaced with SystemVerilog objects. These objects contain the original value, a unique identification number, and a few other attributes to handle the difference between blocking and non-blocking assignments.
• Every expression is replaced by a call to a SystemVerilog task which computes the result of the original expression. Each result gets a unique identifier number assigned. Further, the task writes the values of all operands, the unique identifier of the result, the type of the expression, the identifier numbers of the controlling operands, and the position of the expression in the source code to disk.
• A stack is introduced which contains the information about the control dependencies. Whenever a block guarded by a control statement is entered, an element representing that control statement is pushed onto the stack; it is popped from the stack when the block is left. The element is stored to disk along with the identification number of the conditional expression and the identification number of the previous top stack element. Whenever a task or function is called, the stack is added to the parameters of that task or function.
• Assignment statements are replaced by function calls. Whenever an assignment is executed, its left-hand side gets a new identification number assigned. As in the previous cases, the unique identifier, the identifier of the right-hand side, the identifier of the top element of the stack, and the simulation time are stored to disk.

In the following, a small example of such an instrumentation is shown. As an example, the module in Listing 1 is used; the corresponding instrumented version is shown in Listing 2. In lines 2–4 (red code) the signals of the module are defined. As mentioned above, each signal is replaced by an object of a SystemVerilog class. As the signal output is defined as a register, the current module is responsible for that object; in particular, it must ensure that the difference between blocking and non-blocking assignments is honored. This is done by the always process in lines 5–11. The always process from the original design is changed into the always process in lines 13–21. The two variables defined in lines 15 and 16 store the control dependencies. The always process will not


only execute if the value of a signal changes (.value), but also when its dependencies change (.h). The original statement (blue) consists of an assignment and an index-access operation (green). Correspondingly, the instrumented statement calls the SystemVerilog tasks set and indexSingle from the instrumentation class.

4.2. GUI

Fig. 5. A screenshot of our GUI with the three parts (A)–(C).

Fig. 5 shows a screenshot of the GUI. The GUI has three parts. The first part (A) is the graph view, where the dynamic dependency graph or a dynamic slice is visualized. This view shows the data and control flow within the design and how the different parts of the design depend on each other. The execution time of each node is also marked on a time axis, showing when which part of the design was executed during the simulation. The graph can be shown at different levels of detail. Those levels are:

• The file level, merging all vertices which correspond to code in a certain file into a single vertex.
• The module level, merging all vertices from one module.
• The process level, combining all statements which are in the same process into a single vertex.
• The basic block level, merging all statements which are in the same basic block.
• The statement level, merging the expression execution points with the control and assignment execution points to which they belong.
• The expression level, showing the complete graph.

Different vertices may represent the design at different levels of detail, such that the user can zoom into parts of the graph while keeping the less interesting parts at a more abstract level. For the file level, module level, and process level we do not abstract the temporal behavior, even though it would be possible; for example, if a process spans several clock ticks, there is a process node for each of those clock ticks. The second view (B) is the source code view, showing the source code corresponding to the currently selected vertex. The last view (C) shows the values of the different operands for that vertex. For each operator a default representation, decimal or binary, is defined in which its operand values are shown, for example decimal representation for arithmetic operators and binary representation for bitwise operators. However, there exist some operators, like the equality operator, for which the best representation depends on the intended behavior of the code. Therefore, the user can choose another representation for the operand values.

4.3. Reverse debugging

In reverse debugging a developer can step backwards in time to answer the question "How did this variable get its value?". With our GUI he can do so by following the dependency edges of the graph. This enables the developer to easily trace an incorrect value to its root cause. This approach has several advantages over the standard waveform view usually used in debugging hardware designs. First, in a waveform view the developer only sees what value a signal has at a given time; our visualization, however, shows when a signal was set. This is especially useful because in a waveform view the developer does not see that an assignment to a signal was made when the new value and the old value are identical. Second, with our approach a developer does not only see that an assignment was made, but also at which position in the source code it happened. Third, the developer can also backtrace the control flow and is therefore able to see why a signal was set by exactly this particular statement and not by another one. Finally, our approach also shows intermediate results of large expressions.

4.4. Dynamic slicing

With the dynamic dependency graphs, the dynamic execution information is available. Hence, we can easily compute dynamic backward and dynamic forward slices. Dynamic backward slicing shows a developer those positions in the source code which affect a given slicing criterion. If we use as criterion the position in the execution where we observed the error, the fault is part of the dynamic slice, except for some special cases like an error due to missing code [17]. Similarly, the dynamic forward slice shows which positions in the execution are affected by a code position and thus allows estimating where side effects of a change can occur. Given a set of data or execution points C ⊆ V as slicing criterion, the dynamic backward slice $S_b$ is computed as the union of the backwardtraces of the elements of C:

$$S_b = \bigcup_{c \in C} \mathrm{backwardtrace}(c)$$

Correspondingly, the dynamic forward slice $S_f$ is computed as the union of the forwardtraces:

$$S_f = \bigcup_{c \in C} \mathrm{forwardtrace}(c)$$
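The traces themselves are plain graph traversals. The following sketch (ours, not the authors' implementation) computes a slice as the union of the traces of all criterion vertices: assuming each vertex stores its direct dependencies and the corresponding reverse edges, a breadth-first search along the edges yields the backward slice, and along the reverse edges the forward slice.

    // Sketch of slice computation as the union of traces.
    package slice_pkg;
      typedef struct {
        int deps[$];   // direct dependencies of the vertex
        int rdeps[$];  // reverse edges
      } vertex_t;

      vertex_t graph[int];  // vertex id -> adjacency information

      // Computes S_b (forward == 0) or S_f (forward == 1) for criterion C.
      function automatic void slice(input int criterion[$], input bit forward,
                                    ref bit in_slice[int]);
        int q[$];
        q = criterion;
        while (q.size() > 0) begin
          int v;
          vertex_t vtx;
          v = q.pop_front();
          if (in_slice.exists(v)) continue;  // vertex already visited
          in_slice[v] = 1'b1;
          vtx = graph[v];
          if (forward)
            foreach (vtx.rdeps[i]) q.push_back(vtx.rdeps[i]);
          else
            foreach (vtx.deps[i]) q.push_back(vtx.deps[i]);
        end
      endfunction
    endpackage

Calling slice once per direction, with the data points of a failing output as criterion, yields exactly the backward slice used for fault localization below.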

4.5. Spectrum-based fault localization

Spectrum-based fault localization computes the likelihood for a part of the code to contain a fault. For this purpose spectrum-based fault localization uses program spectra, typically a coverage metric. In case of software systems, the most commonly used program spectrum is statement coverage. One problem when using statement coverage as spectrum for hardware designs is that much code is always executed but does not necessarily influence the computation of the design. Therefore, we use the set of source code positions which correspond to vertices in the dynamic dependency graph or a dynamic slice as program spectrum for our computation. This code likely has an effect on the behavior of the design, or, in case of a slice, on the slicing criterion. For spectrum-based fault localization a heuristic is required which computes whether a source code position is likely to be faulty. Two often-used heuristics are the Tarantula heuristic [5] and the Ochiai heuristic [28]. Both heuristics compute for each source code position a likelihood of whether the source code position is faulty, and a confidence describing the quality of the data which has been used to compute the likelihood. Let us define three sets containing dynamic dependency graphs or dynamic slices.


We denote the set of dynamic dependency graphs or dynamic slices which include a source code position p as covers(p). Further, we denote the set of dynamic dependency graphs or dynamic slices corresponding to failing test cases as failures and the set corresponding to the succeeding test cases as successes. Given these definitions, the likelihood computation of the Tarantula heuristic $\mathit{like}_T(p)$ is defined by:

$$\mathit{like}_T(p) = \begin{cases} \dfrac{\frac{|\mathit{failures} \,\cap\, \mathit{covers}(p)|}{|\mathit{failures}|}}{\frac{|\mathit{failures} \,\cap\, \mathit{covers}(p)|}{|\mathit{failures}|} + \frac{|\mathit{successes} \,\cap\, \mathit{covers}(p)|}{|\mathit{successes}|}} & \text{if } \mathit{covers}(p) \neq \emptyset \\[1ex] 0 & \text{otherwise} \end{cases}$$

and the likelihood computation for the Ochiai heuristic $\mathit{like}_O(p)$ is defined by:

$$\mathit{like}_O(p) = \begin{cases} \dfrac{|\mathit{failures} \,\cap\, \mathit{covers}(p)|}{\sqrt{|\mathit{failures}| \cdot |\mathit{covers}(p)|}} & \text{if } \mathit{covers}(p) \neq \emptyset \\[1ex] 0 & \text{otherwise} \end{cases}$$

For both computations the resulting values are in the range 0.0–1.0, where 0.0 means it is very unlikely that the corresponding source code position is faulty and 1.0 means the corresponding source code position very likely contains a fault. Further, both heuristics compute a confidence value describing the quality of the information base used to compute the likelihood. This confidence computation conf(p) is identical for both heuristics and is defined by:

$$\mathit{conf}(p) = \begin{cases} \max\left( \dfrac{|\mathit{successes} \,\cap\, \mathit{covers}(p)|}{|\mathit{successes}|},\; \dfrac{|\mathit{failures} \,\cap\, \mathit{covers}(p)|}{|\mathit{failures}|} \right) & \text{if } \mathit{covers}(p) \neq \emptyset \\[1ex] 0 & \text{otherwise} \end{cases}$$

The range of this function is from 0.0 to 1.0 as well: 0.0 means there is no useful information to decide whether the source code position is faulty or not, i.e., the corresponding source code position is not included in any dynamic dependency graph, while a confidence value of 1.0 means a very high confidence in the computed likelihood value. Spectrum-based fault localization is only applicable when there are several simulation runs for a design, some of which fail while others fulfill the specification. In cases where this technique is applicable, however, the developer receives good guidance towards the fault. The results of the spectrum-based fault localization are shown to the user in the editor view as well as in the graph view using color encoding. By default we use the same encoding as the Tarantula tool: the hue of an expression or statement, in case of the editor view, or of a node, in case of the graph view, is determined by the computed likelihood. In case of likelihood 1.0 it is colored red, in case of likelihood 0.5 yellow, and in case of 0.0 green; the values in between are interpolated. The confidence is shown as brightness, where a high confidence is shown as a bright color and a low confidence as a dark color. In case nodes in the graph are shown at a level more abstract than the expression level, i.e., they have more than one likelihood and confidence value pair assigned to them, they are colored corresponding to the pair with the highest likelihood. This allows the user to see the interesting parts even when abstraction is used for the graph; the user may then zoom into the corresponding nodes. Fig. 6(a) shows an example of the editor view with color coding. At the top, file names with numbers beside them denote files including suspicious code; these numbers, from low to high, provide a ranking from files which contain very suspicious code to files containing less suspicious code. Fig. 6(b) shows an example of a graph with color-coded nodes; data points of primary input pins are shown as black nodes. Spectrum-based fault localization is not limited to a single-fault assumption but can handle multiple faults as well; in fact, it does not require any knowledge about the number of faults in the design. In the multiple-fault case, however, the localization quality can suffer. To reach a good localization quality, the spectra of the test cases revealing the different faults should not only differ from the successful test cases, but also from test cases revealing other faults.
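As an illustration, the following sketch implements the two likelihood heuristics and the confidence value exactly as defined above. The code is ours (the paper does not show its implementation), and the per-position coverage counts are assumed to have been extracted from the dynamic dependency graphs of the test cases beforehand.

    // Sketch of the heuristics; only the formulas come from the text.
    package fl_pkg;
      // fail_cov = |failures ∩ covers(p)|, succ_cov = |successes ∩ covers(p)|
      function automatic real like_tarantula(int fail_cov, int fail_total,
                                             int succ_cov, int succ_total);
        real f, s;
        if (fail_cov + succ_cov == 0) return 0.0;  // covers(p) is empty
        f = real'(fail_cov) / real'(fail_total);
        s = real'(succ_cov) / real'(succ_total);
        return f / (f + s);
      endfunction

      function automatic real like_ochiai(int fail_cov, int fail_total,
                                          int succ_cov, int succ_total);
        if (fail_cov + succ_cov == 0) return 0.0;
        return real'(fail_cov)
               / $sqrt(real'(fail_total) * real'(fail_cov + succ_cov));
      endfunction

      // conf(p) is identical for both heuristics.
      function automatic real conf(int fail_cov, int fail_total,
                                   int succ_cov, int succ_total);
        real f, s;
        if (fail_cov + succ_cov == 0) return 0.0;
        f = real'(fail_cov) / real'(fail_total);
        s = real'(succ_cov) / real'(succ_total);
        return (f > s) ? f : s;
      endfunction
    endpackage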

Fig. 6. Example for the color coding used by feature localization for the editor view (a) and the graph view (b). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

5. Evaluation

In this section we investigate to which extent the presented approach speeds up debugging compared to conventional techniques, i.e., using waveform view, variable inspection, and step-by-step execution of the test case. First, we evaluate how much abstraction is achieved using the different levels of detail. Next, we measure how much code must be inspected until a fault is found; for this evaluation we use different traversing rules, based on fault localization and dynamic slicing, defining the order in which the statements of the design are inspected. Finally, we present a case study which shows how much time a developer can save using the techniques presented in this paper.

5.1. Abstraction

First, we evaluate how much abstraction can be achieved with the different levels of detail. For this we apply the approach to two designs and their test suites. The first design is a parallel-to-serial signal converter. The design supports different modes for the conversion. These modes are:

• Binary mode, where a one is sent as a high signal and a zero as a low signal.
• Parity mode, which is similar to the binary mode, but a parity bit is added for every 8 bits of data.
• Return-to-zero mode, where a zero is added after each data bit.
• Hamming mode, where a (7,4) Hamming code is used.

The design has 271 non-comment, non-empty Lines of Code (LoC) in five modules and its test suite has 3045 non-comment,




non-empty LoC. The complete test suite takes 307 clock cycles. The full dependency graph of the complete test suite consists of 13,401 vertices. When using the file or module level³ the graph is reduced to 1315 vertices, a reduction by approximately a factor of ten. The second design is the double_fpu_verilog design available from OpenCores.org. The design supports four arithmetic operations:

• Addition
• Subtraction
• Multiplication
• Division

It also supports four rounding modes:

• Round to +Infinite
• Round to -Infinite
• Round to nearest even
• Round to Zero


For each operation of the design exactly one rounding mode and one arithmetic operation must be chosen. The design consists of 1710 non-comment, non-empty LoC and has seven modules. For the evaluation the original test suite is used. This test suite takes 4102 clock cycles to run and consists of 649 LoC. For the double_fpu_verilog design, a reduction of the number of vertices by a factor of over 66 between expression level and module level is achieved. Table 1 gives the average size of the graph per clock cycle for each level of abstraction for both designs.

Table 1. Average number of vertices per time cycle for the different abstraction levels.

Design    | Expression | Statement | Basic block | Process | Module/file³
Converter | 43.7       | 14.9      | 14.4        | 5.6     | 4.3
FPU       | 205.9      | 75.2      | 30.6        | 12.3    | 3.1

³ As each file contains exactly one module, the numbers of vertices for module and file level are identical.

In Fig. 7 we see an excerpt of the dynamic dependency graph for the double_fpu_verilog design. The figure compares the graph at expression level and at file level.

Fig. 7. Comparison of a dynamic slice for double_fpu_verilog at the expression level (a) and the file level (b).

5.2. Localization power

In this evaluation we measure the ability of fault localization, dynamic slicing, and combinations of those techniques to reduce the amount of code a developer has to consider to find a fault. The measurement is conducted for the single-fault case and the multiple-fault case. In case of the converter design the original test suite has been used; this test suite reaches 100 percent branch coverage on the original design. In case of the double_fpu_verilog design the original test suite consists of one single test case containing 50 different operations. As our approach requires several test cases, each of these operations has been split into its own test case. Further, a second test suite for the double_fpu_verilog is used; it consists of 2000 (constrained) random use cases, each executing a single operation. Neither test suite of the double_fpu_verilog design reaches full statement or toggle coverage. All test cases check the outputs of the design against the non-faulty design.

5.2.1. Traversing rules

For the evaluation we define five different approaches to traverse the design while searching for the faults. Each approach defines an ordering in which the parts of the design code are investigated. For each approach we estimate the amount of code that must be visited until the faults are reached; obviously, an approach performs better if the amount of code that must be inspected in order to find the fault is smaller. The approaches are described in the following:




• Pure trace inspection (Tr): Starting from the execution points corresponding to primary outputs which caused assertions to fail, a Breadth First Search (BFS) is used to create the ordering of source code positions to inspect.
• Pure fault localization (FL): The Tarantula heuristic or Ochiai heuristic is used to compute the likelihood of the different source code positions being faulty, as well as the corresponding confidence value. The positions in the code are ordered with respect to their likelihood of being faulty; in case of equal likelihood, the positions are further ordered with respect to their confidence.
• Fault localization with forward slicing (FS): First, source code positions are ranked as in the pure fault localization approach. Then all execution points in the traces belonging to failing tests are gathered which correspond to source code positions with the highest ranking with respect to FL. Next, the forward slices are computed using these positions as slicing criterion. A BFS is used on the slices with the slicing criteria as starting points. If all elements in the slices are visited, but not all source code positions in the failing traces, then the set of source code positions which are ranked highest by the fault localization and not yet included in the ordering is computed. These positions are then used as the next slicing criterion. This is repeated until all source code positions contained in the traces of failing tests are included in the ordering.
• Fault localization with backward slicing (BS): The ordering is computed analogously to FS, except that backward slicing is used instead of forward slicing.


• Fault localization with unidirectional search (US): As before, the source code positions are ranked as in the pure fault localization approach. Then all execution points are gathered which correspond to source code positions with the highest ranking. Next, the source code positions are ordered based on their distance from these execution points; for the distance computation the direction of the edges is not considered. Again, if not all source code positions included in traces of failing tests are ordered, the computation of the ordering is continued with a new starting set based on the fault localization method.

In addition we also compare the results to spectrum-based feature localization using statement coverage as program spectrum [5], using the Tarantula heuristic and the Ochiai heuristic. In previous work we presented a combination of toggle and statement coverage for coverage-based analysis [19]; we also evaluate spectrum-based feature localization using this combination as program spectrum. For the classical feature localization the pure fault localization ordering is used. We present the results as the percentage of statements, relative to the total number of statements, which must be inspected until the fault is reached. For all statements in the source code, only the first appearance is considered. In case the ordering of several statements is not unique, we give the worst-case estimate. For example, if the trace contains 100 different statements, of which five statements have the highest likelihood value and equal confidence, and one of these five statements contains the fault, then the resulting relative amount is 5%. Note that this evaluation is based on strict rules for traversing the resulting dynamic slices. However, as the generated dynamic dependency graphs also contain the operand values, a developer may see which operand of an expression is incorrect and only follow the corresponding part of the dynamic dependency graph. Thus, the amount of code a developer has to inspect may be lower than the values given in this evaluation. As such effects strongly depend on the experience of the developer and his knowledge about the design, they cannot be measured in absolute numbers.

5.2.2. Fault model

In this evaluation we use the following types of faults: change of operand (opa), change of operator (opr), and extra code (ec). We consider designs including a single fault and designs including multiple faults. Faulty designs are only considered if they fulfill a set of requirements. Let E be the set of faults in the faulty design, T the set of test cases, and H the correct design; correct design here means ∀t ∈ T, t succeeds on H. Requirement (1) is:

$$(\exists s \in T : s \text{ succeeds on } H \oplus E) \land (\exists f \in T : f \text{ fails on } H \oplus E)$$

Thus, there is at least one test case which fails on the faulty design and at least one test case which runs successfully on it. This requirement is needed for the fault localization technique to be applicable. Requirement (2) is:

$$\forall e \in E, \exists t \in T : t \text{ fails on } H \oplus \{e\}$$

Each fault causes at least one test case to fail on its own. This requirement ensures that a fault is not an equivalent mutant. Requirement (3) is:

$$\forall e \in E, \exists t \in T : \left(o_{H \oplus E}[t] \neq o_{H \oplus (E \setminus \{e\})}[t]\right) \lor \left(a_{H \oplus E}[t] \neq a_{H \oplus (E \setminus \{e\})}[t]\right)$$

This restriction ensures that each fault has an effect on the behavior of the design. Note that in case of a single fault, i.e., |E| = 1, requirement (1) implies requirement (2).

Table 2. Relative amount of statements (in percent) which must be inspected until the fault is found. The first four columns (S, ST) are the classical approaches; the remaining columns are the dynamic dependency graph and dynamic slicing based approaches.

Design     | S_T | S_O | ST_T | ST_O | FL_T | FL_O | Tr | BS_T | BS_O | FS_T | FS_O | US_T | US_O
C1 (first) | 8   | 16  | 11   | 16   | 8    | 23   | 8  | 8    | 33   | 8    | 24   | 8    | 39
C1 (all)   | 98  | 98  | 98   | 98   | 61   | 65   | 44 | 49   | 44   | 57   | 56   | 65   | 52
C2 (first) | 23  | 25  | 30   | 44   | 3    | 4    | 8  | 3    | 15   | 3    | 11   | 3    | 28
C2 (all)   | 31  | 92  | 36   | 54   | 33   | 36   | 55 | 59   | 44   | 63   | 63   | 67   | 51
C3 (first) | 86  | 60  | 32   | 46   | 8    | 16   | 3  | 25   | 31   | 19   | 19   | 25   | 25
C3 (all)   | 89  | 60  | 67   | 85   | 41   | 41   | 33 | 44   | 44   | 29   | 29   | 32   | 32
C4         | 25  | 25  | 27   | 23   | 12   | 12   | 7  | 12   | 12   | 12   | 12   | 12   | 12
C5         | 66  | 66  | 46   | 66   | 5    | 5    | 22 | 5    | 5    | 5    | 5    | 5    | 5
C6         | 25  | 25  | 26   | 23   | 9    | 9    | 7  | 9    | 9    | 9    | 9    | 9    | 9
C7         | 8   | 8   | 13   | 8    | 8    | 8    | 29 | 8    | 8    | 8    | 8    | 8    | 8
C8         | 25  | 25  | 26   | 23   | 11   | 11   | 6  | 11   | 11   | 11   | 11   | 11   | 11
C9         | 28  | 5   | 35   | 24   | 26   | 6    | 12 | 39   | 6    | 14   | 6    | 26   | 6
F1         | 2   | 2   | 2    | 2    | 4    | 3    | 17 | 4    | 4    | 7    | 6    | 8    | 8
F1′        | 2   | 2   | 6    | 2    | 3    | 3    | 17 | 5    | 3    | 8    | 3    | <1   | 3
F2         | 63  | 63  | 7    | 63   | 4    | 4    | 14 | 4    | 4    | 4    | 4    | 4    | 4
F2′        | 63  | 63  | 7    | 5    | 3    | 3    | 13 | 3    | 3    | 3    | 3    | 3    | 3
F3         | 78  | 61  | 3    | 61   | <1   | <1   | 18 | 15   | 15   | 2    | 2    | 7    | 7
F3′        | 71  | 61  | 19   | 61   | 10   | 1    | 15 | 23   | 9    | 18   | 1    | 16   | 18
F4         | 7   | 6   | 13   | 4    | <1   | <1   | 10 | 7    | <1   | 5    | <1   | 6    | <1
F4′        | 6   | 6   | 4    | 2    | 1    | 1    | 9  | 8    | 8    | 5    | 5    | 6    | 6
F5         | 3   | 3   | 1    | 3    | 6    | 5    | 16 | 10   | 10   | 8    | 7    | 15   | 15
F6         | 76  | 64  | 17   | 64   | <1   | <1   | 15 | 14   | <1   | 10   | <1   | 17   | <1
F7 (first) | 76  | 61  | 1    | 4    | <1   | 12   | 9  | <1   | 7    | <1   | <1   | <1   | <1
F7 (all)   | 76  | 61  | 28   | 18   | 5    | 23   | 36 | 20   | 29   | 13   | 25   | 35   | 30
…          | …   | …   | …    | …    | …    | …    | …  | …    | …    | …    | …    | …    | …
∅          | 42.9| 43.3| 27.4 | 36.0 | 10.9 | 13.4 | 17.4 | 14.0 | 14.5 | 13.3 | 15.0 | 16.0 | 15.9

For the designs and their test suites used in this evaluation, the assertions of the test cases only consider the primary outputs of the design; therefore

$$\left(a_{H \oplus E}[t] \neq a_{H \oplus (E \setminus \{e\})}[t]\right) \rightarrow \left(o_{H \oplus E}[t] \neq o_{H \oplus (E \setminus \{e\})}[t]\right).$$

Because assertions may consider internal signals, this relation does not hold for arbitrary design/test-suite combinations. Further, faults of the type extra code are rejected if they are equivalent to a changed-operand or changed-operator fault. This last rule is used to ensure more diverse fault effects. Applying these requirements, a large portion of the generated extra code faults were rejected, in most cases because they were equivalent to an operand change or because they caused all test cases to fail. Two extra code mutants were rejected because they were equivalent to the original design. The extra code faults we used for the evaluation consist of additional if-branches and additional sub-expressions in formulas.

5.2.3. Results

The traces used for the evaluation are created as follows: for each test case one single trace is computed. In case of a successful test, the ending set consists of all values observed at the primary outputs. In case of failing tests, the ending set consists of all output values which are checked by failing assertions. In Table 2 we give the relative amount of statements which must be inspected until the faults are found. The top row indicates which approach is used. The classical coverage-metric-based feature localization using only statement coverage is indicated by an "S" and the variant using statement coverage and toggle coverage by "ST". For approaches which include fault localization, the use of the Ochiai heuristic is indicated by a subscript O and the use of the Tarantula heuristic by a subscript T. In case of multiple faults in a faulty design we give the values to find the first fault and to find all faults. The faulty versions of the converter design start with a "C", those of the double_fpu_verilog design with an "F". The faulty design F10


contains four faults; all other designs with multiple faults contain three. For the double_fpu_verilog design, the evaluations using the original test suite are marked with a prime (′). For the designs F5 and F6 the original test suite does not include any failing test case, and for F10 one fault is only covered by successful test cases of the original test suite; in those cases our requirements are not fulfilled. The best result in each row is indicated by a gray background in the original table. Note that in some cases, for example F4, due to rounding the difference between two approaches is not visible in the numbers given in the table, especially where less than one percent of the design has to be inspected.

Our approach using dynamic dependency graphs in combination with the Tarantula heuristic and the pure fault localization traversing order yields the best result, with an average of about 10.9% of the design to inspect in order to find the faults. Further, in 15 cases it is among the approaches yielding the best result, and in five cases it yields better results than all other approaches. We also see that every dynamic dependency graph or dynamic slicing based approach yields a better average result than the best classical approach, and that these approaches are more often among those yielding the best localization than any classical coverage-based approach. The pure trace inspection ordering yields the worst results among the dynamic dependency graph based approaches, but still better results than the coverage-metric-based approaches. Remarkably, in four cases pure trace inspection yields better results than any other approach; however, in only six cases does it yield a localization which is among the best. This result is rather expected, as pure trace inspection, in contrast to all other approaches, is not tuned towards finding faults; its quality only depends on the distance of the fault from the primary outputs.

Comparing the approaches using backward and forward slicing, we see that they yield similar average results. The backward-slicing-based approaches, however, yield results among the best in 17 cases, compared to 13 for the forward-slicing-based approaches. With six cases versus two, the backward-slicing-based approaches also more often yield the overall best result. The approaches using unidirectional search yield worse results, considering the average amount of statements which must be inspected as well as the number of cases for which they provide a localization among the best (11 times) and the cases where they yield better results than any other approach (only once).

Comparing the two heuristics: on average the Tarantula heuristic includes 20.7% of the design, or 13.5% if the coverage-metric-based approaches are not considered; the Ochiai heuristic includes 23.0% or 14.7%, respectively. Only in case of the unidirectional search does the Ochiai heuristic yield a better average result than the Tarantula heuristic. In 24 cases (or 21 cases if only the approaches using dynamic dependency graphs are considered) the Tarantula heuristic is among the best results, and in 15 (12) cases an approach using the Tarantula heuristic is better than any approach not using it. The Ochiai heuristic is among the best results in only 18 (15) cases and yields the best result in 8 (6) cases.
Thus, for the designs considered in this evaluation we can conclude that the Tarantula heuristic yields the better localization.
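
For reference, both heuristics assign every statement s a suspiciousness score computed from the test spectra. With f(s) the number of failing and p(s) the number of passing runs that cover s, and F and P the total numbers of failing and passing runs, the standard definitions (cf. [5,28]) are

    susp_Tarantula(s) = (f(s)/F) / (f(s)/F + p(s)/P)
    susp_Ochiai(s)    = f(s) / sqrt(F * (f(s) + p(s)))

so a statement covered by many failing but few passing runs receives a score close to 1 under both heuristics.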

5.2.4. Limitations

Finally, during the evaluation a few faults were found that are executed by the test cases and result in failing test cases, but that are not included in the slices. Given the traversing rules of the different approaches, this means 100% of the design is inspected before such a fault is found. The corresponding faults cause a branch condition to evaluate to false for every test case, which in turn results in assignments to variables not being executed.

The following simplified code shows that problem (a minimal Verilog sketch; the signal names are illustrative):
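    always @(posedge clk) begin
        a <= 1'b0;
        if (cond) begin     // fault: the condition always evaluates to false
            a <= 1'b1;      // never executed, so a keeps the value 0
        end
    end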

If the condition is faulty in a way that it always evaluates to false, the inner assignment is not executed and, correspondingly, a keeps the value 0. As the content of the if-branch is never executed, there is no data- or control-dependency from the if-condition to the output, and thus the if-condition is not included in the trace.

This shows the difference between full slicing and relevant slicing. In our work the slices are computed with respect to their control- and dataflow, which corresponds to the definition of full slicing. If a relevant-slicing based approach were used, this type of fault could be found as well. However, relevant slicing increases the computational effort during the generation of the slices and in most cases increases the size of the slices, which in turn reduces the localization quality.

In case of missing code, neither the fault localization part nor the slicing part provides help, as both search for faults in existing code. However, we believe that the reverse debugging part of our approach is still helpful in those cases. The assumption behind this is that the developer has some idea of the computation which must be executed and can recognize that parts of it are missing.

5.3. Debugging with the presented approach

In the previous section we measured the amount of code a developer has to inspect until he reaches the faulty position in the source code, using fixed rules for traversing the design. As already discussed, in real debugging a developer is not required to follow such strict traversing rules. He may, for example, be able to identify the incorrect operands of an expression and only follow the corresponding dataflow. Furthermore, inspecting the faulty part of the code is not enough to fix a bug: the developer must also identify the code as faulty and replace it with a correct version. Therefore, the quality of debugging techniques is mostly defined with respect to the amount of time saved by using them.

In this subsection we present a case study which measures the time a developer needs to fix a bug using a conventional approach and the time he needs using our presented approach. For this evaluation the converter design is used again. Here, the conventional approach means: waveform view, executions with breakpoints, single-step simulation, and variable inspection. The user in this case study is the original developer of the converter and of the converter's test suite.

For the evaluation, two sets of incorrect design versions were created. Each set contains five faulty versions of the design, where each of those versions contains two to four faults. The faulty versions have been created using the three fault types used in the previous evaluation: change of operator (opr), change of operand (opa), and extra code (ec). Additionally, we used the following fault types: missing code (mc) and change of module parameters (pc). Further, most of the requirements from the previous evaluation have been lifted for this case study; the only restriction left is that for a faulty version of the design at least one test case must fail. We tuned the creation procedure of the two sets such that each set contains comparable faulty versions: for each faulty version in one set, there exists a faulty version in the other set created with the same types of mutation operators.

The evaluation user uses ModelSim [29] on a regular basis at his work.



To get used to our tool, the evaluation user received an additional set of four faulty versions of his design. He adopted the following debug flow for using our tool: as input for the spectrum-based fault localization he used the complete dynamic dependency graph in case of a successful run; as heuristic for fault localization the Tarantula heuristic is used. In case of failing runs, however, he used dynamic backward slices, where the slicing criteria are the outputs of the design at the clock ticks at which an assertion in the test case failed. Correspondingly, he only investigated backward slices in the graph view. He then first checked the source code positions considered suspicious by the spectrum-based fault localization and their predecessors in the dynamic dependency graph. In case of faulty operands and faulty operators this often already revealed the fault, and the design could be fixed immediately. If this first step did not reveal a fault, or if there was no suspicious code, the user chose one single erroneous output and traced it back within the graph. Missing-code mutants and wrong module parameters were normally revealed because expected code was not part of the graph. In the end it was checked whether the design was corrected.

After the training period, the time the developer needed to fix the bugs was measured: in one set using the conventional technique and in the other set using our tool. For both approaches, the user had a report stating which test cases succeeded, which test cases failed, and, if test cases failed, which assertions were violated. The number of bugs in the different versions of the design was unknown to the user. The design was considered fixed when the test suite ran successfully on the design.

The results of our evaluation are shown in Table 3. For readability, the faulty versions are sorted such that the corresponding errors are in the same row; during the evaluation the user did not know about such correspondences.

Table 3
Time comparison between debugging with the conventional approach and our approach. "Speed up" is the time saved; the factor is the conventional time divided by the time with our approach.

Version   Fault types   Conventional   Our approach   Speed up      Factor
1         opa           7 min 24 s     1 min 54 s     5 min 30 s    3.89
2         opr,opa       18 min 33 s    2 min 46 s     15 min 47 s   6.70
3         opr           6 min 20 s     7 min 52 s     -1 min 32 s   0.81
4         pc,ec,mc      13 min 32 s    9 min 19 s     4 min 13 s    1.45
5         ec,op         12 min 10 s    7 min 52 s     4 min 18 s    1.55
Sum                     57 min 59 s    29 min 45 s    28 min 14 s   1.95
Avg                     11 min 35 s    5 min 57 s     5 min 38 s    1.95

Running the test suite required 2 min 48 s for the uninstrumented version and 2 min 57 s for the instrumented version, respectively. However, a fault can mask other faults. For example, in one version the first fault causes a constant output, the second fault is a missing module instantiation, and the third fault is located in the corresponding module (a simplified sketch of such a masking chain is given at the end of this subsection). Hence, in this case it is necessary to execute the test suite at least three times to find all three faults, whereas in other versions one execution of the test suite is sufficient to correct all faults. Therefore, the time shown is the time required for fixing the bugs, excluding the time required to run the test suite; otherwise, the time would be dominated by running the test suite in the cases where the faults mask each other.

In this evaluation the tool improves the debugging time by approximately a factor of two. Comparing the different fault types, the tool is especially effective in finding faulty operands, in most cases immediately highlighting the corresponding code. In comparison, the improvement in finding missing code is only small. Missing-code faults, as well as faults as described in Section 5.2.4, were found because expected code was not included in the dynamic dependency graphs. This is also the reason for the large debug time for version 3: here, one of the mutations turns a conditional statement into dead code. The developer interpreted this as additional code and removed it. After that, a large portion of the debug time was spent reconstructing the now missing code.
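To illustrate the masking chain mentioned above, a hypothetical Verilog sketch (module and signal names are illustrative and not taken from the converter design):

    module top (
        input            clk,
        input      [7:0] in,
        output reg [7:0] out
    );
        wire [7:0] stage_out;

        // Fault 2: the instantiation of "stage" is missing, so Fault 3
        // (a bug inside "stage") cannot be observed at all:
        // stage u_stage (.clk(clk), .in(in), .out(stage_out));

        always @(posedge clk) begin
            out <= 8'h00;   // Fault 1: constant output, masking the faults below
            // intended: out <= stage_out;
        end
    endmodule

Only after Fault 1 is fixed does the missing instantiation become observable, and only after that can the fault inside the instantiated module fail a test case.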


6. Conclusion

In this paper we showed the use of reverse debugging, dynamic backward and forward slicing considering control dependencies, and spectrum-based fault localization for hardware designs given in an HDL description. In our case study the presented approach reduced the debugging time to 50%. The evaluation shows that fault localization based on dynamic dependency graphs yields more precise results than fault localization based on coverage metrics.

Our approach is scalable to large designs because run time and memory usage depend only on the simulation and are thus linear in the design size and the number of clock cycles simulated by the test suite. In addition, our approach uses several layers of abstraction, reducing the complexity. Experiments with a larger design showed that the effect of this abstraction increases with the size of the design under consideration, such that the user can navigate to the fault location even in large designs.

Acknowledgment

Funding: This work was supported in part by the German Research Foundation (DFG) under grant number FE 797/6-1 and by the European Union under grant number 644905.

References

[1] H. Foster, Assertion-based verification: Industry myths to realities (invited tutorial), in: Computer Aided Verification, Vol. 5123 of Lecture Notes in Computer Science, 2008, pp. 5–10, doi:10.1007/978-3-540-70545-1_3.
[2] B. Lewis, Debugging backwards in time, in: International Workshop on Automated and Algorithmic Debugging, 2003, pp. 225–235.
[3] Free Software Foundation, Inc., GDB: The GNU Project Debugger. Access date: 01.03.2015. URL http://www.gnu.org/software/gdb/.
[4] B. Korel, J. Laski, Dynamic program slicing, Inf. Process. Lett. 29 (1988) 155–163, doi:10.1016/0020-0190(88)90054-3.
[5] J.A. Jones, M.J. Harrold, J.T. Stasko, Visualization for fault localization, in: ICSE Workshop on Software Visualization, 2001, pp. 71–75.
[6] H. Agrawal, J.R. Horgan, Dynamic program slicing, ACM SIGPLAN Notices 25 (6) (1990) 246–256, doi:10.1145/93548.93576.
[7] M. Weiser, Program slicing, in: International Conference on Software Engineering, 1981, pp. 439–449.
[8] M. Renieres, S. Reiss, Fault localization with nearest neighbor queries, in: International Conference on Automated Software Engineering, 2003, pp. 30–39, doi:10.1109/ASE.2003.1240292.
[9] A. Groce, S. Chaki, D. Kroening, O. Strichman, Error explanation with distance metrics, Int. J. Software Tools Technol. Trans. 8 (2006) 229–247, doi:10.1007/s10009-005-0202-0.
[10] A. Zeller, Yesterday, my program worked. Today, it does not. Why? in: European Software Engineering Conference, Vol. 1687 of Lecture Notes in Computer Science, Springer, 1999, pp. 253–267, doi:10.1145/318773.318946.
[11] E. Clarke, M. Fujita, S. Rajan, T. Reps, S. Shankar, T. Teitelbaum, Program slicing of hardware description languages, in: Correct Hardware Design and Verification Methods, Vol. 1703 of Lecture Notes in Computer Science, 1999, pp. 298–313, doi:10.1007/3-540-48153-2_22.
[12] M. Abramovici, P.R. Menon, D.T. Miller, Critical path tracing - an alternative to fault simulation, in: Design Automation Conference, 1983, pp. 214–220, doi:10.1109/DAC.1983.1585651.
[13] M.-C. Lai, C.-H. Lee, B.-H. Ho, J.-S. Tsai, Active trace debugging for hardware description languages. US Patent No. 6546526, 2003.



[14] A. Smith, A. Veneris, M. Ali, A. Viglas, Fault diagnosis and logic debugging using Boolean satisfiability, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 24 (10) (2005) 1606–1621, doi:10.1109/TCAD.2005.852031.
[15] Y.-C. Hsu, B. Tabbara, Y.-A. Chen, F. Tsai, Advanced techniques for RTL debugging, in: Design Automation Conference, 2003, pp. 362–367, doi:10.1109/DAC.2003.1219025.
[16] I. Beer, S. Ben-David, H. Chockler, A. Orni, R. Trefler, Explaining counterexamples using causality, in: Computer Aided Verification, Vol. 5643 of Lecture Notes in Computer Science, 2009, pp. 94–108, doi:10.1007/978-3-642-02658-4_11.
[17] X. Zhang, H. He, N. Gupta, R. Gupta, Experimental evaluation of using dynamic slices for fault location, in: International Symposium on Automated and Analysis-Driven Debugging, 2005, pp. 33–42, doi:10.1145/1085130.1085135.
[18] N. Wilde, M.C. Scully, Software reconnaissance: Mapping program features to code, J. Software Maint. 7 (1) (1995) 49–62, doi:10.1002/smr.4360070105.
[19] J. Malburg, A. Finder, G. Fey, A simulation based approach for automated feature localization, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 33 (2014) 1886–1899, doi:10.1109/TCAD.2014.2360462.
[20] A. Zeller, R. Hildebrandt, Simplifying and isolating failure-inducing input, IEEE Trans. Software Eng. 28 (2) (2002) 183–200, doi:10.1109/32.988498.
[21] M. Stumptner, F. Wotawa, Model-based diagnosis of hardware description languages, in: Computational Engineering in Systems Applications, 1996.
[22] R. Reiter, A theory of diagnosis from first principles, Artif. Intell. 32 (1) (1987) 57–95, doi:10.1016/0004-3702(87)90062-2.

[23] A. Sülflow, G. Fey, C. Braunstein, U. Kühne, R. Drechsler, Increasing the accuracy of SAT-based debugging, in: Design, Automation and Test in Europe, 2009, pp. 1326–1331, doi:10.1109/DATE.2009.5090870.
[24] H. Le, D. Grosse, R. Drechsler, Automatic TLM fault localization for SystemC, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 31 (8) (2012) 1249–1262, doi:10.1109/TCAD.2012.2188800.
[25] K.-H. Chang, I. Markov, V. Bertacco, Fixing design errors with counterexamples and resynthesis, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 27 (1) (2008) 184–188, doi:10.1109/TCAD.2007.907257.
[26] A. Tepurov, V. Tihhomirov, M. Jenihhin, J. Raik, G. Bartsch, J.H.M. Escobar, H.D. Wuttke, Localization of bugs in processor designs using zamiaCAD framework, in: International Workshop on Microprocessor Test and Verification, 2012, pp. 41–47, doi:10.1109/MTV.2012.20.
[27] U. Repinski, H. Hantson, M. Jenihhin, J. Raik, R. Ubar, G. Di Guglielmo, G. Pravadelli, F. Fummi, Combining dynamic slicing and mutation operators for ESL correction, in: IEEE European Test Symposium, 2012, pp. 1–6, doi:10.1109/ETS.2012.6233020.
[28] R. Abreu, P. Zoeteweij, A.J.C. van Gemund, An evaluation of similarity coefficients for software fault localization, in: Pacific Rim International Symposium on Dependable Computing, 2006, pp. 39–46, doi:10.1109/PRDC.2006.18.
[29] Mentor Graphics, ModelSim. Access date: 01.03.2015. URL http://model.com/.



Jan Malburg received his diploma in computer science from the Hochschule für Technik und Wirtschaft des Saarlandes, Germany in 2009, his master's degree from Saarland University, Germany in 2011, and his Dr.-Ing. degree from the University of Bremen, Germany in 2015. From 2011 to 2012 he was with the Group of Computer Architecture headed by Rolf Drechsler. From 2012 to 2015 he was a member of the Group of Reliable Embedded Systems led by Görschwin Fey. Since 2016 he has been a researcher at the German Aerospace Center. His research interests are in debugging, design understanding, and feature localization.

Alexander Finder received his diploma in computer science from the University of Bremen, Germany in 2009. From 2009 to 2012 he was with the Group of Computer Architecture headed by Rolf Drechsler. From 2012 to 2013 he was a researcher in the Group of Reliable Embedded Systems led by Görschwin Fey. His research focused on the automation of debugging methods across the whole circuit design flow using formal and semi-formal methods. Since 2013 he has been the responsible software architect for autonomous driving at Daimler AG.

Görschwin Fey received his diploma in computer science from the Martin-Luther Universität, Halle-Wittenberg, Germany in 2001 and his Dr.-Ing. degree from the University of Bremen in 2006. Since 2012 he has headed the Reliable Embedded Systems Group at the University of Bremen and led the Department of Avionics Systems in the Institute of Space Systems of the German Aerospace Center. His research interests are in testing and formal verification of circuits and systems, with a particular focus on debugging and diagnosis.
