Machine learning and structural characteristics for reverse engineering


Journal Pre-proof

Johanna Baehr, Alessandro Bernardini, Georg Sigl, Ulf Schlichtmann

PII: S0167-9260(19)30304-9
DOI: https://doi.org/10.1016/j.vlsi.2019.10.002
Reference: VLSI 1633
To appear in: Integration
Received Date: 14 June 2019
Revised Date: 1 October 2019
Accepted Date: 3 October 2019

Please cite this article as: J. Baehr, A. Bernardini, G. Sigl, U. Schlichtmann, Machine learning and structural characteristics for reverse engineering, Integration (2019), doi: https://doi.org/10.1016/j.vlsi.2019.10.002.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier B.V.

Machine Learning and Structural Characteristics for Reverse Engineering ✩ Johanna Baehr, Alessandro Bernardini, Georg Sigl, Ulf Schlichtmann Technical University of Munich, Munich, Germany

Abstract

In the past years, much of the research into hardware reverse engineering has focused on the abstraction of gate level netlists to a human readable form. However, none of the proposed methods consider a realistic reverse engineering scenario, where the netlist is physically extracted from a chip. This paper analyzes the impact of errors caused by this extraction and the later partitioning of the netlist on the ability to identify the functionality. Current formal verification based methods which compare against golden models are incapable of dealing with such erroneous netlists. Two methods focusing on the idea that structural similarity implies functional similarity solve this problem: The first new approach uses fuzzy structural similarity matching to compare the structural characteristics of an unknown design against designs in a golden model library. The second new approach proposes a method for inexact graph matching using fuzzy graph isomorphisms, based on the functionalities of gates used within the design. In addition, past attacks on obfuscation methods such as logic locking have required access to an activated chip to compare the obfuscated netlist to a functionally equivalent model. The proposed methods can also find a golden model without the need of an activated chip, so that attacks can occur even before production and activation of the chip. Experiments show that for simple logic locking the approaches identify a suitable golden model in more than 80% of all cases. For realistic error percentages, both approaches can match more than 90% of designs correctly. This is an important first step for hardware reverse engineering methods beyond formal verification based equivalence matching.

Keywords: Netlist Reverse Engineering, Netlist Partitioning, Structural Similarity, Malicious Design Modifications, IP Infringement, Logic Obfuscation

✩ This research was supported by the Fraunhofer High Performance Center Connected Secure Systems Munich.
Email addresses: [email protected] (Johanna Baehr), [email protected] (Alessandro Bernardini), [email protected] (Georg Sigl), [email protected] (Ulf Schlichtmann)
Preprint submitted to Elsevier, October 31, 2019

1. Introduction

Within the past years, research into the reverse engineering of unknown designs has become more and more common, whether to verify designs against malicious logic, or to find cases of patent violation. On the other hand, protection methods have also become more sophisticated, to counteract the increase in threats from Integrated Circuit (IC) overbuilding and design piracy.

The reverse engineering work flow of an IC, starting from an unknown chip, is a complex process. As illustrated in Figure 1, the chip must first be depackaged, delayered and imaged, which are physical processes. Image processing methods are used to stitch the resulting images, and to identify interconnects and standard cells, to reach a gate level netlist [1].

Figure 1: Reverse Engineering Work Flow. [Figure omitted. Stages shown: Production (Foundry, IC); Physical Process (Depackaging, Delayering, Imaging), yielding Chip Images; Image Processing (Stitching, Interconnect Identification, Standard Cell Identification); Netlist Reverse Engineering (Netlist Abstraction), yielding a High Level Description.]

The last step, in which a gate-level netlist is abstracted into a high level description, i.e. into a human readable form, has proven to be particularly challenging. Here, a two stage process is required to first partition a large gate-level netlist into smaller subcircuits, and secondly, identify the functionality of each subcircuit by comparison to a known golden model. Such a golden model library contains a set of functional blocks, such as processing and arithmetic modules, cryptographic elements,
memories or interfaces. Formal verification methods are commonly used to compare a set of golden models to the unknown design [2]. However, the effectiveness of these methods is highly dependent on the existence of a known, functionally equivalent golden model within the golden model library. Netlists containing errors require a golden model library which includes a golden model with exactly the same errors to find a match. A single error or change within the unknown netlist will mean nonequivalence to an otherwise equivalent component.

Modern methods to prevent reverse engineering or overbuilding of ICs are based on the idea of encrypting or locking an IC with a set of key gates [3],[4],[5]. In general, key gates are added or replace other gates, such that only the correct combination of key inputs, i.e. the correct key, unlocks the correct functionality of the chip. Many of these obfuscation methods are considered broken, because key sensitization or a satisfiability solver can be used to compare the output of a golden model to the obfuscated design. Both of these attacks, the SAT attack [6] and the Key Sensitization attack [4], are only possible when a golden model is available, for example with access to an activated chip with internal testing structures, such as an unsecured or broken scan chain [6]. Both attacks are extremely powerful in identifying the correct key and thus the correct functionality. However, this leads to the assumption that simple obfuscation methods, such as logic locking, can be secure without access to a golden model, for example on a chip with a secure scan path, or no access to the testing structures. This assumption becomes invalid if the scan path security is compromised, or if new attacks no longer require an activated chip, and thus poses a significant risk. Furthermore, and more importantly, when using new attacks which do not require an activated chip, the obfuscation can be broken much sooner in the design flow, for example by an external design house or in the foundry, even before the first chip is produced.

Both tasks, reverse engineering and breaking obfuscation, require the existence of a functionally equivalent golden model, which may not exist, either because of errors within the reconstructed netlists or because no activated chip is accessible. In this paper, we seek to identify the most correct and usable golden model from a large library of known designs. This can be used to correctly identify the functionality of a reverse engineered netlist to test for malicious logic, but also to show that key based obfuscation methods do not require an activated chip to be broken.

1.1. Errors

Errors can occur in each step of the reverse engineering flow and will affect each subsequent step. Errors during delayering will affect image quality, which increases the difficulty of stitching. An ineffective stitching algorithm will cause misplacement of connections and gates, and cause errors during the standard cell extraction. While it may be possible to compensate for previous errors with plausibility tests, some errors will remain due to the complexity of each step. Even with an expensive and professional setup, it can be assumed that up to 1% of the resulting netlist may be affected by an error [1]. A very general classification of these errors is:

1. Net identification errors (NetIDError): A net between two gates may be misidentified, not identified, or added.
2. Gate identification errors (GateIDError): A gate may be misidentified, not identified, or added.

However, the steps taken to extract a gate-level netlist from a chip are not the only steps that can introduce errors. Once a gate level netlist has been extracted, the netlist is partitioned into its subcircuits. Then, a divide and conquer approach is used to identify every subcircuit. While previous partitioning approaches have delivered good results, no method has been found which provides a perfect, error free partitioning.

Early approaches use similarity score based methods to identify words of data, which are then used to carve out possible functional subcircuits [2]. Recent work has shown that, even with improvements, the method cannot reach a perfect partitioning [7]. Structural based methods were first proposed in 2016, where different clustering algorithms were used to partition circuits [8]. A 2018 paper proposes the use of the eigendecomposition of a circuit as a feature to partition designs, and compares designs synthesized with two different cell libraries [9]. Using only this feature and an input / output comparison, a partitioning accuracy of approximately 90% is reached. Thus, both with similarity score based methods and with structural methods, the identification of exact boundaries between subcircuits has proven difficult. The following errors may occur:

1. Missing gates (SensitivityError): Gates which belong to a subcircuit are excluded from the partition.
2. Added gates (OverheadError): Gates which do not belong to a subcircuit are included in the partition.

Generally, the steps from an unknown device to a final, high-level design are often considered separately, without taking into account the effects they may have on each other. In particular, the effect of the mentioned errors, whether caused during the imaging process or during partitioning, has not been considered during the later functionality identification. Instead, a perfect netlist is assumed when matching against golden models. However, these errors make the use of the currently proposed formal methods to find functional equivalence impossible.

1.2. Breaking Obfuscation

With an increasing fear of Intellectual Property (IP) Infringement, new methods to protect IP from threats in the supply chain were developed. One of the earliest ideas was to obfuscate the design by introducing so called key gates into the design, such that the functionality of the design is only correct if the correct key inputs are applied. In the first versions, published in 2008, key gates were added randomly [3]. It was soon found that this type of obfuscation, also known as logic locking, can easily be broken by sensitizing key gates to an output using fault analysis tools and comparing this output with an unlocked design [4]. The same paper offered a solution, by inserting only nonmutable key gates, which cannot be sensitized to an output. The second major attack on obfuscation, dubbed SAT Attack, was introduced in the same year [6]. This attack uses an activated chip and an obfuscated netlist to identify special input patterns which can rule out wrong key combinations from the key space. New methods to counteract this were introduced more recently, with the AntiSAT method [5], or BDD-based methods [10]. This cycle will likely continue, with new attacks followed by new obfuscation methods.

However, to date, only two attack methods have been published which do not require a golden model to break an obfuscation scheme. The first of these uses a machine learning based approach to predict which structural changes will be made by the addition of the key gates, and then recovers these changes [11]. It recovers on average 84%, and at best 95% of all changes. The second attack which does not require a golden model focuses on breaking stripped-functionality logic locking, which is a logic approach not considered in this paper [12],[13].

In contrast to these approaches, this paper seeks to provide an alternate method to break the obfuscation of a gate-level netlist without an activated chip, by selecting a functionally similar circuit to use as a golden model from a pre-existing component library of known components. Current state of the art methods to break obfuscation schemes which do rely on a golden model can then be used to recover the key.
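The role of key gates and of the golden model can be made concrete with a toy example. The following is a minimal sketch, not taken from the paper: the three-input circuit, the placement of the two key gates, and all function names are hypothetical. It shows that an exhaustive comparison against an unlocked reference singles out the one key for which the locked circuit behaves correctly, which is why the attacks discussed above depend on having such a reference.

```python
from itertools import product


def original(a, b, c):
    """Unlocked reference circuit (hypothetical example): (a AND b) OR c."""
    return (a & b) | c


def locked(a, b, c, k0, k1):
    """Same circuit with two key gates inserted on internal wires."""
    w = (a & b) ^ k0         # XOR key gate: transparent only for k0 = 0
    y = w | c
    return 1 - (y ^ k1)      # XNOR key gate: transparent only for k1 = 1


def keys_matching_reference():
    """List all keys for which the locked circuit equals the reference."""
    inputs = list(product((0, 1), repeat=3))
    return [
        key
        for key in product((0, 1), repeat=2)
        if all(locked(*x, *key) == original(*x) for x in inputs)
    ]


correct_keys = keys_matching_reference()
print(correct_keys)
```

Exhaustive search only works for such a tiny key space; the point of the sketch is merely that the comparison needs `original`, i.e. some functionally equivalent model.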


1.3. Structural Similarity

This paper proposes two new identification approaches to find a match between an unknown, erroneous or obfuscated netlist and a library of golden models. To test both approaches, netlists containing errors, netlists incorrectly partitioned and obfuscated netlists were created and tested against a golden model library. Both approaches are based on the hypothesis that structural characteristics can act as an identifying feature for functionality.

Structural similarity matching is not a new problem in graph theory. Graph isomorphism, inexact graph matching, feature extraction based similarity and iterative methods for graph similarity have been studied thoroughly in recent years [14, 15]. In 2018, it was first proposed to use structural features as a basis for design interpretation, with the analysis of the structure of cryptographic circuits [1]. There, a reverse engineered design could be identified using only structural features, through visual comparison with a functionally similar, known design. This paper seeks to improve this method by extracting such structural features and using machine learning to identify similar circuits. Furthermore, a new fuzzy graph isomorphism, or inexact graph matching algorithm is proposed, which calculates a structural hash based on the functionality of the gates and uses this to compare unknown netlists to golden models.

These two new methods can give insight into the functionality of a design, without the requirement of an error free or unlocked gate-level netlist. Accordingly, it is possible to identify designs which result from a chip to gate-level netlist reverse engineering flow through the use of structural characteristics.

2. Methodology

2.1. Golden Model Designs

To examine the efficiency and correctness of both approaches, a set of designs from OpenCores [16], and from the IWLS benchmarks [17], were synthesized using QFlow [18] with the open source Oklahoma State University 35nm cell library (osu035) [19]. This set contains 628 netlists, ranging from 800 to 55,000 gates in size. Smaller netlists are not included, as they are often trivial to match. The designs include, but are not limited to, cryptographic circuits, microcontrollers, interfaces, memories, and hardware accelerators. This variance in composition is important to test the proposed approaches not only for one type of design functionality, but rather a broad range of different functionalities. Furthermore, particular care was taken to also include many different implementations of a similar design type. This ensures that the proposed approaches are capable of differentiating between similar design functionalities. This set of netlists is the golden model library for all subsequent tests.

2.2. Netlist Extraction Errors

To simulate errors that may occur due to the netlist extraction process, half of the golden model library netlists, chosen randomly, were artificially degraded for three error types:

• Connection (random): NetIDError - Random edges are selected and reassigned to a random node within the netlist.
• Connection (layout): NetIDError - Random edges are selected, and reassigned to a node within the netlist which is located within the vicinity of the node. Without available layout data, close nodes are defined as all nodes within a certain small radius of the selected random edge in the graph. It is assumed that nodes connected within a graph are also close within the layout; an assumption based on the use of partitioning algorithms during the design phase.
• Speckled: GateIDError - Gates are removed from a random number of areas, each of a random size.
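The error types listed above can be illustrated on a netlist stored as a list of directed (driver, sink) edges. The sketch below is one possible reading, not the authors' implementation: it assumes that "reassigning" an edge means redirecting its sink, and that a speckle is a small connected area grown by breadth-first search. The function names, the `max_area` parameter, and the synthetic 200-gate netlist are all hypothetical. The layout variant would differ only in restricting the candidate nodes to a graph neighbourhood of the selected edge.

```python
import random


def reassign_random_edges(edges, nodes, fraction, rng):
    """Connection (random): redirect the sink of a fraction of all edges."""
    edges = list(edges)
    count = max(1, round(fraction * len(edges)))
    for idx in rng.sample(range(len(edges)), count):
        src, old_dst = edges[idx]
        candidates = [n for n in nodes if n not in (src, old_dst)]
        edges[idx] = (src, rng.choice(candidates))
    return edges


def remove_speckles(edges, nodes, fraction, rng, max_area=3):
    """Speckled: delete small connected areas until the gate budget is used."""
    budget = max(1, round(fraction * len(nodes)))
    neighbours = {n: set() for n in nodes}
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    removed = set()
    while len(removed) < budget:
        seed = rng.choice([n for n in nodes if n not in removed])
        area_size = min(rng.randint(1, max_area), budget - len(removed))
        frontier, area = [seed], set()
        while frontier and len(area) < area_size:
            node = frontier.pop(0)           # breadth-first growth of the area
            if node in removed or node in area:
                continue
            area.add(node)
            frontier.extend(sorted(neighbours[node]))
        removed |= area
    kept_nodes = [n for n in nodes if n not in removed]
    kept_edges = [(s, d) for s, d in edges if s not in removed and d not in removed]
    return kept_nodes, kept_edges


# Synthetic stand-in for a netlist: 200 gates, 400 connections.
rng = random.Random(0)
nodes = [f"g{i}" for i in range(200)]
edges = [(f"g{i}", f"g{(i + 1) % 200}") for i in range(200)]
edges += [(f"g{i}", f"g{(i + 7) % 200}") for i in range(200)]

noisy_edges = reassign_random_edges(edges, nodes, 0.01, rng)   # 1% of edges
kept_nodes, kept_edges = remove_speckles(edges, nodes, 0.03, rng)  # 3% of gates
```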

This was done for 0.05%, 0.1%, 0.5%, 1%, 2% and 3% of all nodes (gates) or edges (connections) of the netlist, resulting in 5,652 error netlists. Error percentages of 2% or 3% are not realistic; however, they are included to test the capabilities of the proposed methods.

Both the reassignment of edges and the removal of gates within a specific area (Connection (layout), Speckled) can occur due to errors in the images, for example, caused by dust on the chip. Random reassignments of edges are usually caused by low image quality, or imperfect image processing software. It is assumed that missing connections are identified during the image processing steps, in particular during standard cell identification, based on the fact that some gates do not have the required number of inputs. These missing connections may be reassigned by hand, or reassigned automatically using a rule set. As this assignment may not necessarily be correct, these connections can, for the purpose of classification in this paper, be interpreted as the incorrect assignment of an edge.

2.3. Faulty Partitions

A faulty partition can be classified according to a measurement of the sensitivity and the overhead compared to an ideal functional partition:

\[ \text{Sensitivity} = \frac{\#\,\text{correct gates in partition}}{\#\,\text{gates in ideal partition}} \tag{1} \]

\[ \text{Overhead} = \frac{\#\,\text{incorrect gates in partition}}{\#\,\text{gates in ideal partition}} \tag{2} \]
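Equations (1) and (2) translate directly into set operations when each partition is given as a collection of gate identifiers. The following is a minimal sketch under that assumption; the function name and the gate names are hypothetical.

```python
def partition_quality(partition, ideal):
    """Return (sensitivity, overhead) of a partition relative to the ideal one."""
    partition, ideal = set(partition), set(ideal)
    if not ideal:
        raise ValueError("ideal partition must not be empty")
    correct = len(partition & ideal)     # gates that belong to the subcircuit
    incorrect = len(partition - ideal)   # gates that do not belong to it
    return correct / len(ideal), incorrect / len(ideal)


# Ideal partition of ten gates; the found partition misses two of them
# and contains three gates from the surrounding logic.
ideal = {f"g{i}" for i in range(10)}
found = {f"g{i}" for i in range(2, 10)} | {"x0", "x1", "x2"}
sensitivity, overhead = partition_quality(found, ideal)
print(sensitivity, overhead)
```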

A sensitivity of 100% and an overhead of 0% represent the ideal and error free case, i.e. the partition matches the ideal partition. An overhead of 100% and a sensitivity of 100% mean that the partition is twice as large as the ideal. It is expected that a structural matching is effective for a high sensitivity and low overhead, and that the matching capabilities degrade both with a lower sensitivity, and a higher overhead. To examine the effect of errors caused by partitioning, two approaches to generate faulty partitions will be considered.

2.3.1. Simulated Partitions

To create a set of controllable test cases, a set of errors was simulated using the designs within the golden model library. This was done by randomly and iteratively deleting gates at the borders, i.e. at the inputs and outputs, of each design, up to a certain percentage. Again, half of the designs were artificially degraded, resulting in 1,884 netlists for the simulated faulty partitions. The same number of gates is deleted as in the “Speckled” test case; however, the gates are always removed from the border, and never from within the design. While the design size is reduced, the internal structure tends to remain unchanged, preserving more of the structural features. The percentage of introduced errors correlates directly to a reduced sensitivity, i.e. for a simulated partition with an error of 3%, the sensitivity is 97%.

This simulation of faulty partitions only represents a reduction of the sensitivity. Compared to the design in the golden model library (in this case the ideal partition), fewer gates are included. This makes an evaluation of the structural matching algorithms with regard to overhead difficult, as it is not possible to simulate a realistic overhead. It would mean adding gates to the design, which are not included in the golden model. However, it is difficult to predict which gates simulate a realistic surrounding, and thus should be added.

2.3.2. Realistic Partitions

To examine more realistic partitions, which have both a reduction in sensitivity, and a realistic overhead, an actual partitioning of the designs in the golden model must be performed. However, without an ideal partition, the comparison regarding sensitivity and overhead is impossible. To create ideal partitions of the designs in the golden model library, the Register Transfer Level (RTL) code of all golden models was analyzed. For 296 of these designs, the RTL code contained one or more functional subcircuits. We assume that these subcircuits represent ideal functional partitions. Each of these subcircuits was then synthesized, resulting in 366 ideal partitions. The other designs were not written in a way where the extraction of functional subcircuits from the RTL code is easily possible, and thus were not considered during this step.

Finally, the designs in the golden model library were partitioned using state of the art methods. Both Louvain [20] and the Markov Cluster Algorithm (MCL) [21] were used as underlying clustering algorithms. Each design was partitioned once using Louvain, and with ten parameter settings for MCL. The resulting partitions range in size from 80 to 25,000 gates. However, only those partitions which contain more than 80% of the gates of an ideal partition are considered. Past work has shown that a sensitivity of more than 80% is realistic [7, 9]. A lower sensitivity should not be compensated by a more fuzzy matching algorithm, but rather requires a better partitioning algorithm. The resulting 1,892 netlists are all within a sensitivity range of 80% to 100%, and an overhead from 0% to 100%, compared to an ideal partition. These test netlists will be referred to as realistic partitions.

2.4. Obfuscation

To show that a golden model can also be identified for obfuscated designs, all golden model netlists were obfuscated using random logic locking with key gates, as proposed in [3]. This was done by introducing 0.05%, 0.1%, 0.5%, 1%, 2%, 3%, 5%, 10%, 15%, 20%, 25% and 30% key gates to each netlist, if the netlist is large enough to warrant a small fraction of key gates. According to [6], 10% obfuscation overhead is feasible, while more may be considered too much of an overhead. On average, 10% obfuscation overhead corresponds to a 650 bit key. However, higher numbers of key gates are considered to test the scalability of this approach. This results in 7,536 netlists, which will be referred to as the obfuscated designs.

Random logic locking is one of the most basic obfuscation methodologies. It can be argued that the overall structure of the netlist is not strongly affected by a random addition of gates. To evaluate the effectiveness of more complex obfuscation approaches, benchmarks from [6] were also analyzed. While other benchmark suites exist [22], many of the netlists are not large enough to give a fair result. Ten designs of the Microelectronics Center of North Carolina (MCNC) benchmarks with more than 800 gates, along with their locked counterparts are considered.
The following obfuscation methods are of interest:

• DTC10LUT [23]: Look up Table Insertion
• DAC12 [24]: Strong Logic Locking
• IOLTS [25]: Minimization of Signals with Low Controllability
• RND [3]: Random XOR/XNOR Insertion

Each of these obfuscation methods is considered easily breakable using a SAT Attack, and thus also attackable if an appropriate golden model can be found, even without access to an activated chip. The ten original netlists are added to the golden model designs, resulting in 638 golden model designs. For the methods DAC12, IOLTS and RND, each netlist was obfuscated with an overhead of 5%, 10%, 25%, and 50% [6], resulting in 40 benchmark designs per obfuscation method. For DTC10LUT, only ten obfuscated netlists were provided. As the obfuscation methodologies are based on insertion at a gate-level, the designs were not synthesized using the above mentioned synthesis tool flow, but used as is. The effect on the success of matching will be further discussed in Section 3.3.3.

2.5. Approach 1: Fuzzy Structural Matching

Fundamentally, gate level netlists can be viewed as directed graphs, where structural similarities indicate functional similarities. The shape or abstraction of a graph can be characteristic of a specific type of design, in particular for netlists with many functional subcircuits. Such an abstraction may be described by the number of partitions, how these are connected, or the number of nodes with many connections to other nodes. Consider an implementation of an Advanced Encryption Standard (AES) algorithm, which contains rounds, consisting of a number of Substitution Boxes (S-boxes), as well as a key expansion module. These subcircuits are connected in a way specific only to circuits implementing the AES functionality, which is mirrored in the structure.

The first approach seeks to leverage this information to find a functional match between an unknown, erroneous netlist and a library of golden models. This fuzzy matching is based on a set of structural characteristics. Metrics which describe the relationship of nodes to each other, but also the structure of the design as a whole, are useful.


2.5.1. Structural Features

The following characteristics may be distinct for a given design:

• The ratio of gate functionalities, i.e. the ratio of the number of gates of a certain functionality compared to all gates. Similar designs may be synthesized using similar numbers of gates of this functionality. For example, an analysis of different types of multipliers shows that NAND gates are most common, and FlipFlops or Buffer gates are almost never part of the design. For FIFOs, OAI and FlipFlop gates are often most common, followed by NAND gates.
• A set of centrality measures for each node (Closeness Centrality [26], Betweenness Centrality [27], Eigenvector Centrality [28], Degree Centrality [28], Edge Katz Centrality [28], Load Centrality [27] and Pagerank [29]), which give an overview of the cohesiveness and spread of the graph, and of how nodes affect the transfer of data within the graph. The average over the circuit, but also the distribution in the form of the bins of a histogram, as well as the entropy of the list of values for each node are calculated.
• The degree, including in-degree and out-degree for each node. Again, the average over all nodes, the distribution and the entropy of the list of values for each node are calculated. The distribution is of particular interest, as it can help to distinguish between designs with very uniform connection distributions, and those which have a number of nodes with a high out-degree, for example due to control logic.
• The density of the graph, which gives an indication of the interconnectedness between all nodes in the graph. Highly interconnected graphs will have a high density, while sparsely connected graphs will have a low density. For example, in an AES S-box, many of the nodes are connected to each other, resulting in a high density, while a fully parallel AES implementation, consisting of many subcircuits without interconnections, has a very low density.
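A few of the features listed above can be computed from a gate-type map and an edge list alone. The sketch below is illustrative only: the feature names, the four-bin equal-width histogram, and the choice to compute the entropy over the histogram bins are assumptions, and the centrality measures are left out because they would normally come from a graph library.

```python
import math
from collections import Counter


def histogram(values, bins=4):
    """Equal-width histogram counts over [min, max] of the values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0      # avoid a zero width for constant lists
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def entropy(values, bins=4):
    """Shannon entropy in bits of the histogram of a list of values."""
    hist = histogram(values, bins)
    total = sum(hist)
    return -sum(c / total * math.log2(c / total) for c in hist if c)


def structural_features(gate_types, edges, bins=4):
    """gate_types: {gate: type}; edges: list of (driver, sink) pairs."""
    n = len(gate_types)
    features = {}
    for gate_type, count in Counter(gate_types.values()).items():
        features[f"ratio_{gate_type}"] = count / n
    in_deg = Counter(dst for _, dst in edges)
    out_deg = Counter(src for src, _ in edges)
    for name, deg in (("in", in_deg), ("out", out_deg)):
        values = [deg[g] for g in gate_types]
        features[f"{name}_degree_mean"] = sum(values) / n
        features[f"{name}_degree_entropy"] = entropy(values, bins)
        for i, c in enumerate(histogram(values, bins)):
            features[f"{name}_degree_bin{i}"] = c / n
    # Density of a directed graph without self-loops: m / (n * (n - 1)).
    features["density"] = len(edges) / (n * (n - 1)) if n > 1 else 0.0
    return features


gate_types = {"g0": "NAND", "g1": "NAND", "g2": "INV", "g3": "DFF"}
edges = [("g0", "g2"), ("g1", "g2"), ("g2", "g3"), ("g3", "g0")]
features = structural_features(gate_types, edges)
```

Each histogram bin becomes its own feature, which is how a handful of measures can expand into a feature vector with hundreds of entries.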
This results in 426 single features for each design, where, for example, one feature may represent the entropy of a list, or the number of nodes with a degree within a specific histogram bin. Note that no preprocessing, such as removal of buffers or transformation of the netlist occurs, either for the golden model or for the unknown netlist. Preprocessing the unknown design is generally not possible, when the netlist is extracted from a chip. Furthermore, removing structural features will remove exactly those identifying features used to classify the design. Preprocessing golden models, but not the unknown design will bias the matching probability.

2.5.2. Structural Similarity Matching by Feature Extraction

Generally, designs with similar structural features are alike. These features can be represented in an n-dimensional feature space. To find a match between an unknown design and a set of golden models, the golden model in the feature space with features close to those of the unknown design should be found. This can be done by K-Nearest-Neighbor (KNN) Classification, a Machine Learning algorithm where features of the golden model library are used as a training set to build the feature space, and features of the unknown design are used as test data. Figure 2 illustrates the steps taken.

First, the structural features are extracted for all designs within the golden model library. Different graph algorithms are applied to each netlist, resulting in a set of data points for each design. Simpler features such as the ratios of gate functionalities are also calculated here. In general, the scale of the features should not affect the classification result. However, the numeric value of one feature may be thousands of times higher than that of another feature, without any correlation to the importance of this feature for matching. Some features represent ratios, some represent distributions and others are absolute values. This means that a feature with a range of values larger than one has a stronger effect on the correct match than a feature with values always within [0, 1]. Figure 3 further illustrates this effect. Both panels show the relationship between two variables, in this case the “Maximum Degree” of the graph, and the “Ratio of NAND gates” compared to all gates. The largest value for “Maximum Degree” is 8200, while the largest value for “Ratio of NAND gates” is 0.8.

Figure 2: Fuzzy Structural Similarity Matching by Feature Extraction. [Figure omitted. Training path, done only once: Set of Golden Models, Feature Extraction, Training Data, Train Scaler and Scale Data, Scaled Training Data, Train PCA and Project Data, Final Training Data, Train KNN. Test path: Unknown Design, Feature Extraction, Test Data, Scale Data, Scaled Test Data, Project Data, Final Test Data.]

Given the data from Figure 3(a), the KNN algorithm will grossly overestimate the importance of the feature “Maximum Degree”, based on the fact that the distance between points can be very large. When a new, unknown data point is inserted, the matching to the next nearest known data point will occur based on this variable, rather than on the feature ”Ratio of NAND gates”. Figure 3(b) shows the scaled version of the same data, and now, a new unknown data point added to this data set can be matched to the nearest golden model data point more fairly. As the KNN Classifier is distance based, it is particularly dependent on scaled data. Without the use of a scaler, the machine learning algorithm uses only on the largest ranged features; smaller ranged features will affect the matching much less, even though these may contribute the same, if not more information. As it is unknown which features provide relevant information and which features provide only noise, the use of unscaled data greatly reduces the quality of the matching algorithm, as much of the gathered information is not being used. To eliminate this effect, a scaler is fitted with the training data, and the data is scaled. Eight scalers were tested, but due to the large differences of the range of the values of the features, the focus was placed on scalers able to handle outliers well. The Uniform Quantile Transformer scaler provides the best results; it maps each feature to a uniform distribution ranging between [0, 1], making it especially robust to outliers [30]. (a) Unscaled Data

Figure 3: Effect of scaling on two features. Panels: (a) Unscaled Data, Maximum Degree (0 to 8000) against Ratio of NAND gates (0 to 0.8); (b) Scaled Data, Scaled Maximum Degree against Scaled Ratio of NAND gates (both 0.0 to 1.0).
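The effect shown in Figure 3 can be reproduced on a toy example. The sketch below uses only the Python standard library; `fit_quantiles` and `to_uniform` are hypothetical helpers that imitate a uniform quantile mapping by interpolating between training ranks, and the five golden designs and the unknown design are made-up values chosen for illustration.

```python
from bisect import bisect_right
from math import dist


def fit_quantiles(column):
    """Remember the sorted training values of one feature."""
    return sorted(column)


def to_uniform(value, sorted_values):
    """Map a value to [0, 1] by interpolating between training ranks."""
    if value <= sorted_values[0]:
        return 0.0
    if value >= sorted_values[-1]:
        return 1.0
    hi = bisect_right(sorted_values, value)  # first training value > value
    lo = hi - 1
    span = sorted_values[hi] - sorted_values[lo]
    return (lo + (value - sorted_values[lo]) / span) / (len(sorted_values) - 1)


def nearest(query, points):
    """Index of the point with the smallest Euclidean distance to query."""
    return min(range(len(points)), key=lambda i: dist(query, points[i]))


# Illustrative values only: (maximum degree, ratio of NAND gates).
golden = [(100, 0.10), (150, 0.70), (3000, 0.40), (8000, 0.20), (400, 0.55)]
unknown = (120, 0.69)

# Unscaled: the distance is dominated by the maximum degree.
unscaled_match = nearest(unknown, golden)

# Scaled: both features contribute on the same [0, 1] range.
columns = [fit_quantiles(col) for col in zip(*golden)]


def scale(point):
    return tuple(to_uniform(v, c) for v, c in zip(point, columns))


scaled_match = nearest(scale(unknown), [scale(g) for g in golden])
```

Without scaling the unknown design is assigned to design 0, whose maximum degree is slightly closer. After scaling it is assigned to design 1, which agrees closely in both features.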

After scaling, the high dimensionality of the data set is reduced using a Principal Component Analysis (PCA). The PCA is fitted with the training data, and the data is transformed. For the above mentioned scaled feature data, the first 67 principal components conserve 95% of the data variance. Using these components reduces the number of features to be trained by ≈ 85%. This also reduces the number of features which provide only noise. The resulting data is the final training data. It can be stored and reused for different test data. The creation of this training data must be done only once, as indicated by the gray shading in Figure 2. If additional designs are added to the golden model library, feature extraction must take place for each new design, and the scaler and PCA must be refitted.

Similar to the training data, the features of the test data must also be extracted. Then, the test data is scaled using the scaler fitted with the training data. This ensures that the scaler is independent of the test data, and thus valid for each new set of test data. The same applies to the PCA: the scaled test data is mapped onto the same 67 principal components identified by the PCA on the training data.

Finally, a KNN classification finds the design with the features that most closely match the reduced and scaled test data. The design closest to the test data is considered to be the best match; however, the next closest designs are also taken into account. While a perfect match is preferable, a match in the top five is considered successful, as it severely reduces the exploration space.

2.6. Approach 2: Fuzzy Graph Isomorphism

The second approach to structural matching is focused on finding a solution for a fuzzy graph isomorphism, to match a netlist to a golden model library. It takes into account not only the structure of the netlist, but also the functionality of each gate.
Formally, a digital circuit netlist N can be represented by a directed graph G = (V, E), where V is the set of gates forming the circuit and the set of edges E ⊆ V × V describes the connectivity of the circuit. More precisely, for two gates v1, v2 ∈ V, (v1, v2) ∈ E if and only if the output of gate v1 drives one input of v2. Moreover, each node v ∈ V is labeled by the functionality of the gate represented by v. Thus we define a labeling function T : V → 𝒯, with 𝒯 denoting the set of all possible functionalities of gates in the circuit. For a node v ∈ V, T(v) is simply the functionality of the gate represented by v, for example a 2-input NAND gate.

Given two netlists N1 and N2, represented by the two graphs G1 = (V1, E1) and G2 = (V2, E2), we observe that if netlist N1 describes a subcircuit present in netlist N2, then there will exist an isomorphism between G1 and a subgraph M of G2. In symbols: G1 ≅ M with M ⊆ G2, where ⊆ denotes the subgraph relation and ≅ denotes an edge and label preserving isomorphism. The problem of finding such an edge and label preserving subgraph isomorphism, i.e. the problem of finding M ⊆ G2 with G1 ≅ M, is NP-complete. Without loss of generality we can assume |V1| ≤ |V2|. In symbols we will write

\[ I(G_1, G_2) := \exists\, M \subseteq G_2 : G_1 \cong M \tag{3} \]

for the subgraph isomorphism problem. Thus, given G1 and G2, we have to determine whether the predicate I in (3) is true. Not only is this problem NP-complete, but any error in G1 or G2 can also result in a wrong or missed answer, i.e. in a wrong truth value for the predicate I in (3). We use a heuristic for determining whether I(G1, G2) is likely true or false in the presence of errors in G1 and G2. The heuristic should ideally be robust against errors in the graphs, i.e. it should produce the correct answer when considering perturbed graphs. Furthermore, it should produce a reasonably low number of false results in practice, i.e. a low number of false truth values for I, while the runtime should be acceptable. The main idea is to “hash” nodes in G1 and G2 and to see if hash values can be matched. A hash value captures information about the functionality of a node, together with the connectivity and the functionalities of the nodes forming its predecessors up to some depth. Details are shown in Algorithm 1.

Algorithm 1: HashNode(G, v, l, H)
Input: Labeled graph G = (V, E) with label function T : V → 𝒯, node v ∈ V, hash depth l ∈ ℕ, and a function H : 𝒯 → ℕ.
Output: Hash value, either NIL or a value in ℕ.

    if l == 0 then
        return H(T(v))
    else
        if Pred(v) == ∅ then
            return NIL
        else
            h ← 0
            for p ∈ Pred(v) do
                hp ← HashNode(G, p, l − 1, H)
                if hp == NIL then
                    return NIL
                else
                    h ← HashCombine(h, hp, l)
            return h
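Algorithm 1 translates almost directly into Python. The following is a sketch under stated assumptions: the graph is stored as a dictionary of predecessor lists plus a dictionary of gate types, `H` is a small lookup table, and `hash_combine` is modelled as a weighted sum whose weight `31**level` is an arbitrary choice, since the concrete weights are not specified here. The helper `build_inverse_map` is hypothetical and simply groups nodes by their hash value.

```python
from collections import defaultdict

NIL = None  # stands for the NIL result of Algorithm 1

GATE_CODES = {"IN": 1, "NAND": 2, "NOR": 3}  # assumed encoding for H


def H(gate_type):
    """Map a gate functionality to a natural number."""
    return GATE_CODES[gate_type]


def hash_combine(h, hp, level):
    """Weighted sum; the weight 31**level is an assumption."""
    return (h + (31 ** level) * hp) % (1 << 32)


def hash_node(preds, labels, v, level, H):
    """Hash node v from the gate types found `level` steps upstream."""
    if level == 0:
        return H(labels[v])
    if not preds[v]:
        return NIL  # no predecessors at the required depth
    h = 0
    for p in preds[v]:
        hp = hash_node(preds, labels, p, level - 1, H)
        if hp is NIL:
            return NIL
        h = hash_combine(h, hp, level)
    return h


def build_inverse_map(preds, labels, level, H):
    """Group all hashable nodes by their hash value."""
    nodes = defaultdict(set)
    for v in labels:
        h = hash_node(preds, labels, v, level, H)
        if h is not NIL:
            nodes[h].add(v)
    return nodes


# Toy netlist: two NAND gates on inputs a and b feed one NOR gate.
preds = {"a": [], "b": [], "g1": ["a", "b"], "g2": ["a", "b"], "g3": ["g1", "g2"]}
labels = {"a": "IN", "b": "IN", "g1": "NAND", "g2": "NAND", "g3": "NOR"}
nodes = build_inverse_map(preds, labels, 1, H)
```

Because the combination is a sum, the result does not depend on the order in which predecessors are visited. In the toy netlist both NAND gates receive the same hash, while the primary inputs have no predecessors and are therefore not hashed at depth 1.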

We observe that HashNode is a recursive function. HashCombine simply computes a weighted sum with weights dependent on the level l. We then hash all nodes of G1 and of G2 and in the process construct an inverse map Nodes(h, H, l, G) such that

\[ \mathrm{Nodes}(h, H, l, G_i) = \{\, v \in G_i \mid h = \mathrm{HashNode}(G_i, v, l, H) \,\} \tag{4} \]

for i = 1, 2. Finally, Algorithm 2 is used to heuristically solve the subgraph isomorphism problem (3) in the presence of errors in the graphs G1 and/or G2. The algorithm iterates over the nodes in G1 and checks whether a node in G2 exists with the same hash value, and whether a hash match can be obtained for the direct successors. It runs in polynomial time with a complexity of O(|V|²) and depends on the choice of the function H and the depth l. In practice, the process is repeated for multiple iterations i with different choices for H and l, and the outcomes are averaged. However, a change in parameter settings requires a recalculation of the hashes of the golden model library. Thus, if the algorithm iterates over several values for H and l, the hash of each golden model is recalculated in each iteration.

This approach does not require precomputed features for the golden model library. This allows for a fast update of the golden model library, but is less efficient for a large number of test sets and a large number of designs within the golden model library. Furthermore, it is observed that a high number of true positives is usually accompanied by false positives. That is, to avoid excluding the correct match, the parameters must be chosen in such a way that the number of false positives is not zero. Both this effect and the required recalculation of the hashes for each design in the golden model library in every iteration during runtime mean that this approach should not be used as a standalone solution. Rather, it can be used to refine other methods, as it is particularly strong in identifying true negatives.

3. Results

The effectiveness and correctness of the methods proposed above are evaluated for the test cases of errors in the netlist (Section 3.1), partitioning errors (Section 3.2) and obfuscated netlists (Section 3.3). Significant results are also summarized in Table 1, including details on the number of designs and the size of the designs under test, as well as the percentage of “change” to each design.

Algorithm 2: I(G1, G2, l, H)
Input: Graphs G1, G2, hash depth l, function H as in Algorithm 1.
Output: (heuristic) true if (3) is likely true, false if (3) is likely false, and unclassified otherwise.

    Call HashNode for all nodes in G1 and G2 and construct the inverse map Nodes(h, H, l, Gi) for i = 1, 2 as in (4).
    c ← 0 // count hashed nodes
    m ← 0 // count matches
    for v ∈ V1 do
        h ← HashNode(G1, v, l, H)
        if h != NIL then
            c ← c + 1
            match ← true
            if Nodes(h, H, l, G2) != ∅ then
                S ← Succ(v)
                for s ∈ S do
                    hs ← HashNode(G1, s, l, H)
                    if hs != NIL && Nodes(hs, H, l, G2) == ∅ then
                        match ← false
            else
                match ← false
            if match then
                m ← m + 1
    if TooFewNodesHashed(c, |V1|) then
        return unclassified
    if EnoughNodesMatch(m, |V1|) then
        return true
    else
        return false
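A compact Python rendering of Algorithm 2 is given below as an illustration only. Several details are assumptions: the weighted sum inside `hash_node` uses the arbitrary weight `31**level`, and the two decision rules TooFewNodesHashed and EnoughNodesMatch are modelled as simple fractions of |V1| (`min_hashed` and `min_match`), which are not values taken from the experiments. Each graph is a pair of a predecessor dictionary and a gate type dictionary.

```python
from collections import defaultdict

CODES = {"IN": 1, "NAND": 2, "NOR": 3, "XOR": 4}  # assumed encoding for H


def H(gate_type):
    return CODES[gate_type]


def hash_node(preds, labels, v, level, H):
    """Algorithm 1 with an assumed weighted-sum HashCombine."""
    if level == 0:
        return H(labels[v])
    if not preds[v]:
        return None
    total = 0
    for p in preds[v]:
        hp = hash_node(preds, labels, p, level - 1, H)
        if hp is None:
            return None
        total = (total + (31 ** level) * hp) % (1 << 32)
    return total


def successors(preds):
    """Invert the predecessor lists to obtain direct successors."""
    succ = defaultdict(list)
    for v, ps in preds.items():
        for p in ps:
            succ[p].append(v)
    return succ


def fuzzy_subgraph_match(g1, g2, level, H, min_hashed=0.5, min_match=0.5):
    """Return True, False, or None (unclassified) for I(G1, G2)."""
    preds1, labels1 = g1
    preds2, labels2 = g2
    # Hash values occurring in G2, i.e. the keys of its inverse map.
    hashes2 = {hash_node(preds2, labels2, v, level, H) for v in labels2}
    hashes2.discard(None)
    succ1 = successors(preds1)
    hashed = matched = 0
    for v in labels1:
        h = hash_node(preds1, labels1, v, level, H)
        if h is None:
            continue
        hashed += 1
        match = h in hashes2
        if match:
            for s in succ1[v]:
                hs = hash_node(preds1, labels1, s, level, H)
                if hs is not None and hs not in hashes2:
                    match = False
        if match:
            matched += 1
    if hashed < min_hashed * len(labels1):  # TooFewNodesHashed (assumed rule)
        return None
    return matched >= min_match * len(labels1)  # EnoughNodesMatch (assumed rule)


full = (
    {"a": [], "b": [], "c": [], "n1": ["a", "b"], "n2": ["b", "c"],
     "o": ["n1", "n2"], "x": ["o", "c"]},
    {"a": "IN", "b": "IN", "c": "IN", "n1": "NAND", "n2": "NAND",
     "o": "NOR", "x": "XOR"},
)
sub = (
    {"a": [], "b": [], "c": [], "n1": ["a", "b"], "n2": ["b", "c"],
     "o": ["n1", "n2"]},
    {"a": "IN", "b": "IN", "c": "IN", "n1": "NAND", "n2": "NAND", "o": "NOR"},
)
other = (
    {"a": [], "b": [], "g": ["a", "b"], "h": ["g", "a"]},
    {"a": "IN", "b": "IN", "g": "XOR", "h": "XOR"},
)
inputs_only = ({"a": [], "b": []}, {"a": "IN", "b": "IN"})
```

With these toy graphs, `sub` is reported as a likely subcircuit of `full`, `other` is rejected because the hash of its second XOR gate does not occur in `full`, and `inputs_only` stays unclassified because none of its nodes can be hashed at depth 1.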

To evaluate the correctness of the fuzzy structural matching, two metrics are considered: whether the correct match is found as the first match, i.e. the correct design is the closest within the feature space, and whether a top five match is found, i.e. the correct design is among the five designs closest in the feature space. It is assumed that further verification and backtracing can take place for a small sample set; this is discussed in Section 4.1.

3.1. Netlist Extraction Errors

The first experiment evaluates the difficulty of equivalence matching a golden model to an unknown netlist with errors caused during the netlist extraction process from the chip to the netlist. The golden model library consists of 628 designs, while the tests are carried out for each of the 5,652 error designs.

3.1.1. Fuzzy Structural Matching

The percentage of error designs which could not be matched for both metrics, for each error percentage, is illustrated in Figure 4(a)-(c). Since the error insertion is random, some noise is expected in the results and a trend line is approximated. As expected, a higher percentage of errors, i.e. an increased number of erroneous edges or gates, leads to a decrease in the ability to match, with more unmatched designs. However, for wiring errors this effect is less pronounced than expected (Figures 4(a), (b)). Even for an error of 3%, which could mean hundreds of misconnected wires, the correct circuit is nearly always within the top five matched designs. A completely correct identification fails for, at worst, 3.5% of all designs.
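The two metrics used throughout this section, first match and top five, can be computed directly from the ranked output of the nearest neighbor search. The snippet below is a small sketch; the function name `unmatched_rates` and the example rankings are hypothetical.

```python
def unmatched_rates(rankings, truth, k=5):
    """Percentage of designs missed as first match and within the top k.

    rankings[i] lists golden design ids for test design i, closest first;
    truth[i] is the id of the correct golden design.
    """
    n = len(truth)
    first_miss = sum(r[0] != t for r, t in zip(rankings, truth))
    topk_miss = sum(t not in r[:k] for r, t in zip(rankings, truth))
    return 100.0 * first_miss / n, 100.0 * topk_miss / n


# Made-up rankings for four test designs.
rankings = [
    [3, 1, 2, 7, 9, 4],
    [5, 0, 8, 6, 2, 1],
    [4, 9, 0, 1, 3, 2],
    [8, 7, 6, 5, 4, 2],
]
truth = [3, 0, 2, 2]
first_unmatched, top5_unmatched = unmatched_rates(rankings, truth)
```

In this example three of the four designs are not found as the first match and two are not found within the top five, which corresponds to the two curves plotted for each error type.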

Figure 4: Percentage of Unmatched Designs for Erroneous Designs and Simulated Partitions (Fuzzy Structural Matching). Panels: (a) Connection (random), (b) Connection (layout), (c) Speckled, (d) Simulated Partitions; x-axis: Errors (0% to 3%); y-axis: Unmatched; series: Correct, Top 5.

As illustrated in Figure 4(c), the performance of the approach is reduced when entire gates are removed from the designs. Here, the percentages of unmatched designs, both for a correct match and for the top five, increase significantly with a larger number of errors. However, for a small error percentage of up to 1.0%, the results are acceptable, with a correct identification failing in only 10.83% of all cases, and a top five match failing for 5.1% of all designs. As previously discussed, a high error percentage is not entirely realistic, in particular for this type of error. In a reverse engineering setup, an error percentage of 3% could mean hundreds of missing gates in a medium sized design. This no longer corresponds to bad image quality, or dust on the image, but rather to entirely missing image slices. An error affecting such a large number of gates should be corrected at an earlier step of the chip reverse engineering process.

3.1.2. Fuzzy Graph Isomorphism

As discussed in Section 2.6, the fuzzy graph isomorphism based approach can be used as a refinement step following fuzzy structural matching. The approach is particularly effective at eliminating true negatives from a small set of possible matches. Thus, the focus is placed on those test cases where the number of designs which are correctly identified with the first match and the number of designs where a match was found within the top five differ significantly, namely for the higher error percentages of 1%, 2%, and 3%. This means that the fuzzy structural matching failed to identify an exact match, but did identify a small set of possible matches, which may or may not include the correct match. Thus, all designs for which a first correct match was not found are examined, including those where the correct match is not within the top five. This results in 201 designs, predominantly for the “Speckled” error type.
To test the capabilities of the method, fuzzy structural matching is first used to identify the top ten designs. In the ideal case, the refinement should identify one true positive and nine true negatives for those designs where the match is within the top ten, and ten true negatives when the correct design is not within the top ten designs. The refinement is successful if it is able to eliminate more incorrect designs than fuzzy structural matching alone. Thus, it is then evaluated how often fuzzy graph isomorphism is able to eliminate more than five designs from this set. If this occurs more often than in Figure 4, it can be seen as successful. As the golden model library is reduced to the top ten designs, the calculation is inherently faster than considering the entire golden model library, as the hash is only calculated for the reduced golden model library for each iteration.

These results are shown in Figure 5(a)-(c), for the parameter set l=8 and 32 iterations. Similarly to the first approach, for the connection based errors, the refinement identifies a correct match in nearly all cases. However, it must be noted that the number of designs examined is very small. Using just the first approach would be sufficient for this error type. When large numbers of gates are removed, the second approach is able to refine the results in all cases, with an improvement of up to 2%. Furthermore, it may occur that more than five incorrect designs are eliminated, which is not considered in Figure 5. On average, ≈ 7 designs can be eliminated from the original ten, if a correct match is found.

Figure 5: Percentage of Unmatched Designs for Erroneous Designs and Simulated Partitions (Fuzzy Graph Isomorphism). Panels: (a) Connection (random), (b) Connection (layout), (c) Speckled, (d) Simulated Partitions; x-axis: Errors (1% to 3%); y-axis: Unmatched.

3.2. Faulty Partitions

The second experiment is concerned with finding a match between a faulty partition of a design and a functionally equivalent golden model.

3.2.1. Simulated Partitions

As discussed in Section 2.3, to better understand the effect an imperfect partition may have on designs in a controllable setup, a fourth type of error was simulated, in which random gates at the border of the design are removed up to a specific error percentage. The results for this error type for fuzzy structural matching are summarized in Figure 4(d). While the percentage of incorrect matches increases strongly with more errors in the design, the number of designs for which the correct match is not within the top five remains relatively low, with 5.4% of designs not identified at an error of 3%. Again, the results were refined by applying the fuzzy graph isomorphism matching to those 80 designs which could not be matched as a first match. Again, the golden model library is composed of the top ten designs found using the KNN classification.
The percentage of cases where the correct design is not found, and thus at least five incorrect designs cannot be eliminated, can be seen in Figure 5(d). The approach is again able to refine the results, and is overall able to find a correct match more often than for the Speckled test case, where the same number of gates is removed. This also suggests that removing nodes from the border of a design degrades the matching significantly less, for both approaches, than removing nodes from random areas within the netlist, as can be seen when comparing Figure 4(c) and (d) to Figure 5(c) and (d).

3.2.2. Realistic Partitions

When attempting to match a realistic partition, two possible outcomes are viewed as positive. For one, a match between a realistic partition and the ideal partition is desirable. This means that the 366 ideal partitions should be added to the golden model library, resulting in 994 designs in the golden model library. However, a match with the design it was originally partitioned from is also considered a successful match. If an AES S-Box was partitioned from an AES Round, a match against an ideal AES S-Box, as well as a match against an AES Round, is considered a correct match.

Figure 6: Distribution of Partitioned Designs and Match Index. Panels: (a) Percentage of Designs in Top Ten, (b) Match Index of Designs; x-axis: Sensitivity (0.85 to 1); y-axis: Overhead.

For fuzzy structural matching, of the 1,892 realistic partitions, 1,742 (92.07%) correctly matched one of the designs in the golden model library within the top five results, and for 1,297 (68.55%), the first match was the correct match. Figures 6(a) and (b) display the results for these realistic partitions with respect to sensitivity and overhead. Designs with a sensitivity of more than 80% and an overhead of 0% to 100% are shown. For both images, brighter colors signify a better result. The percentage of designs for which a correct match was found within the top ten is summarized in Figure 6(a). As expected, the designs with a low overhead and a high sensitivity were more commonly correctly identified within the top ten matches. An area with a high matching percentage also occurs for a high overhead and medium sensitivity; however, only a few of the designs fall into this range. Thus, the number of partitioned designs for each overhead and sensitivity range must also be considered. More than 40% of all designs are located within the lower right hand corner, while the upper left corner contains < 1%. This is illustrated in Figure 6(b), which shows the distribution of the designs that can be identified and their associated match index up to ten. The match index specifies when the correct match is found, i.e. a match index of one signifies that the correct match was found as the top match by the KNN algorithm, while an index of ten signifies that the correct match is only just part of the top ten matches. The group of data in the lower right hand corner suggests that the chosen underlying clustering algorithms are good, as many designs within the chosen sensitivity and overhead range tend towards the perfect case. As expected, the designs in the lower right hand corner are more often correctly identified than those with large overheads or a lower sensitivity.
However, even with a low sensitivity and large overhead, many of the designs can still be identified easily. Furthermore, the data suggests that a high sensitivity may compensate for a low overhead and vice versa. Nevertheless, a lower sensitivity leads to more good matches than a low overhead, even considering the varying numbers of designs.

The results are again refined using the fuzzy graph isomorphism approach. As the approach identifies possible graph isomorphisms between two designs, it does not require the existence of an ideal partition in the golden model library. Rather, it is well suited for finding partial matches, and can identify whether one design is a partition of another. For the refinement step, 595 designs were examined. For 150 of these, the correct match is not within the top ten matches and the structural hash should identify ten true negatives. For the parameter set l=8 and 32 iterations, the algorithm correctly found 100 cases with ten true negatives. Furthermore, it identified 337 true positives, which corresponds to 75.73% of all cases where this should occur. An elimination of incorrect designs was possible in 44.80% of all cases, with an average of ≈ 5 designs removed.

3.3. Obfuscation

The final tests aim to evaluate the feasibility of identifying a usable golden model from a large library of golden model designs for an unknown obfuscated design. The golden model library consists of the 628 designs, and, for the benchmark designs, a further ten original benchmark designs.

3.3.1. Random Logic Locking

As for netlist extraction errors, two metrics for the success of the fuzzy structural matching for obfuscated designs are considered: whether the correct match is the first match, and whether the correct design is within the top five found matches. The percentage of obfuscated designs which could not be matched correctly or within the top five designs is shown in Figure 7. As expected, a higher number of introduced key gates increases the difficulty in finding a correct match. However, for a typical number of key gates, for example 10%, the results are acceptable, with only 6.55% of designs unmatched within the top five. Interestingly, a random addition of gates has a smaller effect on the correctness of the matching algorithm than a random rewiring of gates, or a removal of gates at the border. At 3% key gates, only 1.44% of all designs cannot be matched with a first correct match, and only 0.32% are not within the top five matches. This result is only surpassed for a layout based rewiring of gates. However, as mentioned in Section 2.4, random logic locking is a basic method; more sophisticated methods may add key gates in a way more similar to the “Speckled” test case. This may lead to worse results, closer to those in the “Speckled” tests.

Figure 7: Percentage of Unmatched Designs for Random Logic Locking. x-axis: Obfuscation Overhead (0% to 30%); y-axis: Unmatched (0% to 50%); series: Correct, Top 5.

3.3.2. Obfuscation Benchmarks

To further analyze the effect of more complex insertion methods, the obfuscation benchmark designs were evaluated. As only 130 designs exist, the test gives more of an indication than a supportable result.
This is also reflected in Figure 8, which shows the number of matched designs (out of 10) for each obfuscation method and percentage. No real pattern can be discerned regarding the effectiveness of the matching method. Of the ten designs for the method dtc10lut, three could be matched correctly and for eight a match could be found in the top five, which is similar to the results of the other three methods. Generally, a percentage of 10% was easiest to match, but even 50% obfuscation overhead did not significantly decrease the ability to match. However, in addition to the small sample size, this inconclusive result can be explained by the fact that the benchmark designs were not created using the same synthesis tool, parameters and cell library as the original golden model library designs. A different cell library or synthesis flow may have a major effect on the structure of the design. This makes the RND benchmark designs especially interesting, as they are similar to the above evaluated obfuscated designs, but created with an unknown process, using an unknown cell library. At 25% obfuscation overhead, they perform considerably worse than the above evaluated obfuscation designs, with, at best, a top five match of 50%.

Figure 8: Number of Matched Designs for Obfuscated Designs. Panels: (a) RND, (b) DAC12, (c) IOLTS; x-axis: Obfuscation Overhead (5%, 10%, 25%, 50%); y-axis: Matched (0 to 10); series: Correct, Top 5.

3.3.3. Cell Library Comparison

To evaluate whether the differences in the obfuscation results could result from the difference in cell library, the effect of the cell library on fuzzy structural matching is examined. First, the golden model designs were also synthesized using the open source NanGate FreePDK45 Generic Open cell library (nangate) [31]. As expected, designs decrease in gate count when using this more comprehensive cell library. The gate count of a nangate design is on average only 60% of that of a design synthesized with the osu035 cell library. This second golden model library is matched against the original, to better understand the effect of the cell library on the success of matching. 49.29% of the designs could be matched to their osu035 equivalents as a first match, and 78.93% were found within the top five matches. This would support the hypothesis that the effect of the choice of cell library is not highly significant for fuzzy structural matching, as the structure of many designs remains similar even after synthesis with a different cell library. Thus, the difference in results for the obfuscation benchmarks is caused either by the use of a different synthesis tool flow, or by differences in the implementation of the logic locking methods.

4. Future Work

4.1. Limitations of Fuzzy Matching

One limitation of a fuzzy matching algorithm compared to a formal verification based approach is that the result is not easily verifiable. A functional equivalence matching between two designs gives a complete and conclusive result, with the certainty that no other functionality is contained. By design, a fuzzy method does not give this certainty. This poses a challenge both for realistic reverse engineering and for the identification of a golden model for an obfuscated design.
However, finding a matching golden model for a specific netlist or partition is an important first step for reverse engineering. This must be followed up by backtracing to verify the match and remove potential errors. If these are caused by the reverse engineering process, it may be possible to identify during which step, and for which reason, the error occurred. For example, an error suspected to be caused by a NetIDError should be followed up by considering the original images, and deciding whether the stitching or wire extraction algorithms provided incorrect results. Errors in partitions are simpler to identify, as the missing or extra gates are inherently contained in the unknown design, and thus previous steps need not be examined. It becomes a problem of identifying whether the proposed golden model is a possible partition of the design. In the case of obfuscated designs, it is generally possible to remove the obfuscation overhead, which gives an insight into the correct key. However, it must be examined in future work how well the correct key can be extracted without an exact functional equivalent.

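| Test | Type | No. of Gates | No. of Designs | Change | Top Five | Exact |
|---|---|---|---|---|---|---|
| Errors | Connection (random) | 800 - 55,000 | 314 | 0.1% | 99.68% | 99.68% |
| | | | 314 | 0.5% | 100% | 99.04% |
| | | | 314 | 1% | 99.68% | 98.41% |
| | | | 314 | 2% | 100% | 97.45% |
| | | | 314 | 3% | 99.04% | 96.50% |
| | Connection (layout) | 800 - 55,000 | 314 | 0.1% | 100% | 100% |
| | | | 314 | 0.5% | 100% | 100% |
| | | | 314 | 1% | 99.68% | 100% |
| | | | 314 | 2% | 100% | 99.68% |
| | | | 314 | 3% | 99.68% | 99.04% |
| | Speckled | 800 - 55,000 | 314 | 0.1% | 100% | 100% |
| | | | 314 | 0.5% | 98.09% | 96.82% |
| | | | 314 | 1% | 94.90% | 89.17% |
| | | | 314 | 2% | 88.85% | 81.85% |
| | | | 314 | 3% | 80.89% | 74.52% |
| Partitions | Simulated | 800 - 55,000 | 314 | 0.1% | 99.68% | 98.73% |
| | | | 314 | 0.5% | 98.41% | 96.18% |
| | | | 314 | 1% | 98.09% | 92.99% |
| | | | 314 | 2% | 96.50% | 92.04% |
| | | | 314 | 3% | 94.59% | 89.49% |
| | Realistic | 800 - 25,000 | 1,892 | n.a. | 92.07% | 68.55% |
| Obfuscation | Random | 800 - 55,000 | 628 | 2% | 99.85% | 99.04% |
| | | | 628 | 3% | 99.68% | 98.56% |
| | | | 628 | 5% | 99.04% | 95.53% |
| | | | 628 | 10% | 93.45% | 83.39% |
| | | | 628 | 15% | 92.00% | 80.00% |
| | | | 628 | 20% | 80.03% | 66.13% |
| | | | 628 | 25% | 75.00% | 60.00% |
| | | | 628 | 30% | 64.85% | 51.70% |
| | RND | 800 - 10,000 | 10 | 5% | 60.00% | 30.00% |
| | | | 10 | 10% | 10.00% | 40.00% |
| | | | 10 | 25% | 50.00% | 20.00% |
| | | | 10 | 50% | 60.00% | 30.00% |
| | DAC12 | 800 - 10,000 | 10 | 5% | 50.00% | 30.00% |
| | | | 10 | 10% | 90.00% | 50.00% |
| | | | 10 | 25% | 50.00% | 20.00% |
| | | | 10 | 50% | 50.00% | 40.00% |
| | IOLTS | 800 - 10,000 | 10 | 5% | 50.00% | 40.00% |
| | | | 10 | 10% | 80.00% | 40.00% |
| | | | 10 | 25% | 50.00% | 20.00% |
| | | | 10 | 50% | 50.00% | 30.00% |
| | DTC10LUT | 800 - 10,000 | 10 | n.a. | 80.00% | 30.00% |
| Cell Library | Nangate | 800 - 55,000 | 628 | n.a. | 78.93% | 49.29% |

Table 1: Overview of Significant Results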

4.2. Future Research Directions

Future work must investigate methods to identify the location of an error or key gate to allow for backtracing to verify that the found match is correct. If a backtracing fails, the matching approaches can also be improved. For the fuzzy structural matching, this could mean either a larger and more comprehensive golden model library, or more advanced classification algorithms for pattern matching. More, fewer, or other structural features may also provide better results. Data preprocessing is another topic which should be examined more thoroughly. The structural hashing approach could also be refined by considering a match between direct predecessors and successors; this would increase the performance, but also the complexity of the approach. Furthermore, the ability of the algorithm to partition graphs should also be scrutinized, with a focus on matching without an ideal partition in the golden model library.

When it comes to obfuscated designs, backtracing is not so simple. Nevertheless, it may be possible to find the correct key for a part of the design, and reduce the key space to one that can be broken using a brute force approach. This should be further investigated in future work. Furthermore, the proposed approaches should be tested for new and more complex hardware obfuscation methods. Unfortunately, many newer combinational logic locking methods are not yet easily applicable to cyclic and sequential circuits, and while it is claimed that obfuscation is possible, to the authors’ knowledge, no benchmarks have been published. On the other hand, the robustness of sequential obfuscation methods against this kind of attack must be evaluated. Finally, to further understand the effect of different synthesis tools and different cell libraries, more thorough testing is needed.
Moreover, while it can be assumed that the cell library can be reconstructed from a reverse engineered chip, identifying the employed synthesis tool and parameters presents more of a challenge. This must also be evaluated in future work.

5. Conclusion

This paper has highlighted the difficulties that errors caused by netlist extraction and netlist partitioning will bring to formal verification based hardware reverse engineering methods. It shows that new, fuzzy methods are required, and proposes two methods which leverage the inherent relationship between structural and functional similarity. It has also shown that logic obfuscation methods are inherently insecure, not only with an activated chip or unsecured testing structures. It introduces a number of error types, and shows that even for a large percentage of errors, the correct design can be identified from a golden model library using structural similarities to find a functional correspondence. The same designs are also obfuscated using a simple random logic locking, and again, the golden model can be identified. In the future, both proposed methods should be tested against a larger golden model library, and for test cases resulting from a real world IC reverse engineering flow. Furthermore, the methods should be evaluated more fully for newer and more advanced obfuscation methods. Finally, the effects of different cell libraries and synthesis flows must be investigated.

6. References

[1] M. Werner, B. Lippmann, J. Baehr, H. Gräb, Reverse engineering of cryptographic cores by structural interpretation through graph analysis, in: 2018 IEEE 3rd International Verification and Security Workshop (IVSW), 2018, pp. 13–18.
[2] P. Subramanyan, N. Tsiskaridze, K. Pasricha, D. Reisman, A. Susnea, S. Malik, Reverse engineering digital circuits using functional analysis, in: Design, Automation Test in Europe Conference Exhibition (DATE), 2013, pp. 1277–1280.
[3] J. A. Roy, F. Koushanfar, I. L. Markov, EPIC: Ending Piracy of Integrated Circuits, in: 2008 Design, Automation and Test in Europe, 2008, pp. 1069–1074. doi:10.1109/DATE.2008.4484823.
[4] M. Yasin, J. J. Rajendran, O. Sinanoglu, R. Karri, On Improving the Security of Logic Locking, in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 35, 2016, pp. 1411–1424. doi:10.1109/TCAD.2015.2511144.
[5] Y. Xie, A. Srivastava, Anti-SAT: Mitigating SAT Attack on Logic Locking, 2017. URL http://eprint.iacr.org/2017/761
[6] P. Subramanyan, S. Ray, S. Malik, Evaluating the security of logic encryption algorithms, in: 2015 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), 2015, pp. 137–143. doi:10.1109/HST.2015.7140252.
[7] T. Meade, K. Shamsi, T. Le, J. Di, S. Zhang, Y. Jin, The Old Frontier of Reverse Engineering: Netlist Partitioning, Journal of Hardware and Systems Security (2018) 201–213.
[8] J. Couch, E. Reilly, M. Schuyler, B. Barrett, Functional block identification in circuit design recovery, in: 2016 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), 2016, pp. 75–78.
[9] B. Cakir, S. Malik, Reverse engineering digital ICs through geometric embedding of circuit graphs, ACM Trans. Des. Autom. Electron. Syst. 23 (4) (2018) 50:1–50:19.
[10] X. Xu, B. Shakya, M. M. Tehranipoor, D. Forte, Novel Bypass Attack and BDD-based Tradeoff Analysis Against all Known Logic Locking Attacks, 2017. URL http://eprint.iacr.org/2017/621
[11] P. Chakraborty, J. Cruz, S. Bhunia, SAIL: Machine learning guided structural analysis attack on hardware obfuscation (2018) 56–61. doi:10.1109/AsianHOST.2018.8607163.
[12] D. Sirone, P. Subramanyan, Functional analysis attacks on logic locking, in: 2019 Design, Automation Test in Europe Conference Exhibition (DATE), 2019, pp. 936–939. doi:10.23919/DATE.2019.8715163.
[13] L. Alrahis, M. Yasin, H. Saleh, B. Mohammad, M. Al-Qutayri, Functional reverse engineering on SAT-attack resilient logic locking, in: 2019 IEEE International Symposium on Circuits and Systems (ISCAS), 2019, pp. 1–5. doi:10.1109/ISCAS.2019.8702704.
[14] D. Koutra, A. Ramdas, J. Xiang, Algorithms for graph similarity and subgraph matching, 2011.
[15] V. Carletti, Exact and Inexact Methods for Graph Similarity in Structural Pattern Recognition, PhD thesis, Université de Caen; Universita degli studi di Salerno (Apr. 2016). URL https://hal.archives-ouvertes.fr/tel-01315389
[16] Oliscience, Opencores (2018). URL https://opencores.org/projects
[17] C. Albrecht, IWLS 2005 benchmarks, in: International Workshop for Logic Synthesis (IWLS), 2005. URL http://www.iwls.org
[18] Opencircuitdesign, Qflow 1.1: An open-source digital synthesis flow (2018). URL http://opencircuitdesign.com/qflow/
[19] O. S. University, Oklahoma State University system on chip (SoC) design flows. URL https://vlsiarch.ecen.okstate.edu/flow/
[20] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment 2008 (10).
[21] S. van Dongen, C. Abreu-Goodger, Using MCL to Extract Clusters from Networks, Springer New York, New York, NY, 2012, pp. 281–295.
[22] S. Amir, B. Shakya, X. Xu, Y. Jin, S. Bhunia, M. Tehranipoor, D. Forte, Development and Evaluation of Hardware Obfuscation Benchmarks, Journal of Hardware and Systems Security 2 (2018) 142–161. URL https://doi.org/10.1007/s41635-018-0036-3
[23] A. Baumgarten, A. Tyagi, J. Zambreno, Preventing IC piracy using reconfigurable logic barriers, IEEE Design Test of Computers 27 (1) (2010) 66–75. doi:10.1109/MDT.2010.24.
[24] J. Rajendran, Y. Pino, O. Sinanoglu, R. Karri, Security analysis of logic obfuscation, in: 2012 49th ACM/EDAC/IEEE Design Automation Conference (DAC), 2012, pp. 83–89. doi:10.1145/2228360.2228377.
[25] S. Dupuis, P. Ba, G. Di Natale, M. Flottes, B. Rouzeyre, A novel hardware logic encryption technique for thwarting illegal overproduction and hardware trojans, in: 2014 IEEE 20th International On-Line Testing Symposium (IOLTS), 2014, pp. 49–54. doi:10.1109/IOLTS.2014.6873671.
[26] S. Wasserman, K. Faust, Social network analysis: Methods and applications, Vol. 8, Cambridge University Press, 1994.
[27] U. Brandes, On variants of shortest-path betweenness centrality and their generic computation, Social Networks 30 (2008) 136–145.
[28] M. Newman, Networks: An Introduction, Oxford University Press, 2010.
[29] L. Page, S. Brin, R. Motwani, T. Winograd, The PageRank citation ranking: Bringing order to the web, Technical report, Stanford InfoLab (November 1999).
[30] F. Pedregosa, G.
Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830. [31] Si2, Nangate freepdk45 generic open cell library. URL http://projects.si2.org/openeda.si2.org/projects/nangatelib

19
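The idea summarized in the conclusion, that an erroneous netlist can still be matched to a golden model through its structural characteristics, can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the netlist representation (a dictionary of gates), the chosen features (gate-type fractions, mean fan-in, mean fan-out), the cosine similarity measure, and all names such as `structural_features` and `best_match` are assumptions made purely for illustration.

```python
import math
from collections import Counter

# Hypothetical netlist model: {gate_name: (gate_type, [names of driving signals])}.
GATE_TYPES = ("AND", "OR", "XOR", "NOT", "DFF")


def structural_features(netlist):
    """Gate-type fractions, mean fan-in and mean gate fan-out (illustrative features)."""
    n = len(netlist)
    type_counts = Counter(gate_type for gate_type, _ in netlist.values())
    fan_out = Counter(src for _, fan_in in netlist.values() for src in fan_in)
    features = [type_counts[t] / n for t in GATE_TYPES]
    features.append(sum(len(fan_in) for _, fan_in in netlist.values()) / n)
    features.append(sum(fan_out[g] for g in netlist) / n)
    return features


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(unknown, library):
    """Return (name, score) of the library design most similar to the unknown netlist."""
    target = structural_features(unknown)
    scores = {
        name: cosine_similarity(target, structural_features(ref))
        for name, ref in library.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


# Tiny, made-up golden model library.
library = {
    "half_adder": {"s": ("XOR", ["a", "b"]), "c": ("AND", ["a", "b"])},
    "mux2": {
        "ns": ("NOT", ["s"]),
        "t0": ("AND", ["a", "ns"]),
        "t1": ("AND", ["b", "s"]),
        "y": ("OR", ["t0", "t1"]),
    },
    "toggle": {"q": ("DFF", ["d"]), "d": ("XOR", ["q", "en"])},
}

# The multiplexer with two simulated extraction errors:
# gate t1 is misread as OR, and the connection from t1 to y is lost.
mux_err = {
    "ns": ("NOT", ["s"]),
    "t0": ("AND", ["a", "ns"]),
    "t1": ("OR", ["b", "s"]),
    "y": ("OR", ["t0"]),
}

name, score = best_match(mux_err, library)
print(name, round(score, 3))  # the multiplexer is selected despite both errors
```

A formal equivalence check would reject `mux_err` outright, since it no longer implements the multiplexer function, whereas the similarity score degrades gradually with the number of errors. The features used here are far too coarse for real designs and serve only to show the principle.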

Errors in the reverse engineering process strongly affect the final abstraction of gate-level netlists.
Formal method based netlist abstraction is not viable.
Structural similarity can imply functional similarity.
New fuzzy methods can identify the functionality of erroneous netlists beyond formal methods.
Applicable to deobfuscation of netlists without the need for an activated chip.

Conflict of Interest and Authorship Confirmation Form

All authors have participated in (a) conception and design, or analysis and interpretation of the data; (b) drafting the article or revising it critically for important intellectual content; and (c) approval of the final version.
This manuscript has not been submitted to, nor is under review at, another journal or other publishing venue.
The authors have no affiliation with any organization with a direct or indirect financial interest in the subject matter discussed in the manuscript.