Knowledge-Based Systems 179 (2019) 108–116
Adversary resistant deep neural networks via advanced feature nullification
Keji Han, Yun Li (corresponding author), Jie Hang
School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Wenyuanlu 9, Nanjing 210023, China
Article history: Received 12 November 2018; Received in revised form 6 May 2019; Accepted 8 May 2019; Available online 15 May 2019
Keywords: Adversarial machine learning; Deep learning; Feature nullification; Hadamard product
Abstract

Deep neural networks (DNNs) have been achieving excellent performance in many learning tasks. However, recent studies reveal that DNNs are vulnerable to adversarial examples. Fortunately, a random feature nullification (RFN) algorithm has been proposed to improve the robustness of DNNs against gradient-based adversarial examples; however, experimental results demonstrate that RFN ruins the availability of DNNs in some cases. To explore more efficient feature nullification (FN) algorithms, we theoretically prove that FN can improve the robustness of DNNs. Moreover, sliding window feature nullification (SWFN) and fixed stride feature nullification (FSFN) algorithms are proposed to improve the robustness of DNNs. The experimental results demonstrate that, compared to RFN, the proposed algorithms can maintain the availability of DNNs without decreasing their robustness against gradient-based attacks.
1. Introduction

Deep neural networks (DNNs) have been achieving impressive performance in a number of learning tasks, such as computer vision [1–3] and nonlinear systems [4–6]. However, Refs. [7–9] reveal that DNNs are vulnerable to adversarial examples and are increasingly security-sensitive. An adversarial example is usually defined as an example crafted by an attacker to fool the target model [10], and many algorithms have been proposed to generate such examples. According to the knowledge of the attacker, attack algorithms can be divided into two categories: white-box attacks and black-box attacks. In a white-box attack, the attacker has perfect knowledge of the victim DNN, such as its architecture, parameters, training set and even the adopted defense algorithm, while in a black-box attack the attacker only has limited knowledge. Moreover, attack algorithms fall into three categories according to how the adversarial perturbations, which are added to legitimate examples to craft adversarial examples, are obtained: gradient-based, optimization-based and search-based algorithms. Gradient-based attack algorithms add gradient information to the corresponding legitimate examples, such as the fast gradient sign method (FGSM) [11], Papernot's attack [10] and local distribution smoothness [12]. Optimization-based algorithms formulate the process of exploring adversarial perturbations as an optimization problem, such as the iterative gradient sign method [13], the Carlini attack [14] and DeepFool [15]. Search-based algorithms apply a search strategy to explore adversarial perturbations without using gradient information, such as differential evolution (DE) [16] (one-pixel or few-pixel attacks). Adversarial examples crafted by FGSM on MNIST [17], a hand-written digit recognition dataset, and GTSRB [18], a traffic sign recognition dataset, are shown in Figs. 1 and 2.

To mitigate adversarial attacks, several defense methods have been proposed as well. MagNet [19] employs a set of detectors, usually autoencoders, to check whether a given input is an adversarial example, and a reformer to reconstruct the input so that it lies closer to the manifold of the legitimate example distribution. If the given example is flagged as adversarial by any detector, it is rejected and not fed into the target model; otherwise, the input is passed through the reformer and then to the target model. Distillation [20] employs the outputs (probability vectors) of the target model as soft labels to train a supplementary model based on distillation theory [21]. Adversarial training retrains DNNs with corresponding adversarial examples [7]. DataGrad [22] adds a regularization term containing gradient information of each example to the loss function to retrain the target model. As introduced above, most defense algorithms need to train supplementary models or retrain the target models themselves, and retraining a DNN is resource-consuming when the architecture of the DNN or the training set is complex.
Fig. 1. Legitimate and corresponding adversarial examples for MNIST. Adversarial examples are generated by FGSM. The numbers at the bottom of the figure represent the attack intensity φ; 0 means that the examples in that column are legitimate examples.
In [23], the random feature nullification (RFN) algorithm is proposed to improve the robustness of DNNs by adding FN layers to deep models, which does not require training a supplementary model or retraining the target model. However, experimental results demonstrate that, in some cases, RFN reduces the availability of the target model, i.e., its accuracy on legitimate examples. Moreover, since the RFN layer is added to the target model during the training phase, RFN cannot improve the robustness of pretrained models. Therefore, in this paper, based on the derivative of the matrix Hadamard product, we theoretically show that the feature nullification (FN) algorithm is effective in defending against gradient-based adversarial attacks. Furthermore, two feature nullification algorithms, namely sliding window feature nullification (SWFN) and fixed stride feature nullification (FSFN), are proposed to solve the aforementioned issues of the original RFN. The experimental results show that SWFN and FSFN improve robustness and classification accuracy compared to the original RFN. Our contributions can be summarized as follows:
• We theoretically prove the effectiveness of FN in improving the robustness of DNNs.
• We propose two new FN defense algorithms, namely, SWFN and FSFN.
• We explore the relationship between the defense performance of FN algorithms and their corresponding mask matrices.
• We experimentally demonstrate that FN can achieve better defense performance by paying more attention to the semantic regions of image data.
• We evaluate the defense algorithms mentioned in this paper on a real-world dataset, GTSRB, and a toy dataset, MNIST.

The rest of the paper is organized as follows. In Section 2, we introduce related work. A theoretical proof of the validity of FN is given in Section 3, where the two novel feature nullification algorithms are also proposed. The experiments are presented in Section 4, and the paper ends with a discussion and conclusions in Section 5.
Fig. 2. Legitimate and corresponding adversarial examples for GTSRB. Adversarial examples are generated by FGSM. The numbers at the bottom of the figure represent the attack intensity φ; 0 means that the examples in that column are legitimate examples.
2. Related work

Dropout [24], which is a nonparametric layer of DNNs, is an effective method to mitigate overfitting by pruning features according to a Bernoulli distribution. In each training case, a DNN with a dropout layer amounts to sampling a submodel from the original model to update the parameters, while all submodels share weights across training cases. Since there are a huge number of neural node combinations, namely submodels, each individual submodel is rarely trained. For instance, when a dropout layer consists of n neurons, the number of submodels is approximately 2^n, and the probability that a specific submodel is sampled in a given training case is roughly 1/2^n. That is the reason why dropout is able to mitigate overfitting. Moreover, when dropout is added as the first layer of a DNN, some elements of the original input are set to zero, and the corresponding elements of the derivative matrix become zero as well. Therefore, dropout masks part of the gradient of the input, and in this way it can be employed as a defense for DNNs against gradient-based attacks.

Inspired by dropout, the RFN [23] algorithm was proposed to make deep models more robust against gradient-based attacks. Feature nullification generates a random mask matrix L, which is a binary (0/1) matrix, and outputs the Hadamard product of the given example x and the mask matrix L. To be specific, each element of the mask matrix is randomly set to 0 or 1 with a fixed probability p. For an image example, the pixels whose positions correspond to the zero elements of the mask matrix are set to zero. Like dropout, RFN masks the gradient of the input of the DNN and can therefore be employed to improve the robustness of DNNs. Using RFN as an additional layer endows DNNs with some randomness, such as the number of nullified features and the specific features to be nullified; therefore, RFN is also an effective defense against white-box attacks.
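To make the mechanism concrete, below is a minimal PyTorch sketch of an RFN-style nullification layer as described above. The class name, the fixed rate p, and the sampling of a fresh mask per forward pass are our illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class RandomFeatureNullification(nn.Module):
    """Illustrative RFN-style layer: each input feature is zeroed with probability p
    via a Hadamard product with a freshly sampled 0/1 mask."""

    def __init__(self, p=0.2):
        super().__init__()
        self.p = p  # feature nullification rate

    def forward(self, x):
        # Sample a binary mask L with P(element = 0) = p and return x o L.
        mask = (torch.rand_like(x) >= self.p).float()
        return x * mask

# Usage: place the layer in front of an existing classifier, e.g.
# defended_model = nn.Sequential(RandomFeatureNullification(p=0.2), base_model)
```

Placed as the first layer of a DNN, the mask also multiplies the gradient with respect to the input, which is the property exploited for defense in the next section.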
Fig. 3. DNN model with an FN layer; the red border region in L represents a window. The solid lines represent the ground-truth propagation, while the dashed lines are introduced to illustrate the process of SWFN. Each row of the mask matrix is a sliding window, and the gray and white grids are 0 and 1, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
3. Theoretical analysis of FN

As introduced in [23], RFN is a simple and effective way to improve the robustness of DNNs. In this section, we prove the validity of the feature nullification algorithm for improving the robustness of DNNs theoretically; in detail, the derivative of the matrix Hadamard product is employed for the proof.

To analyze the validity of FN, it is necessary to introduce the generation process of adversarial examples. Generally, the process of generating an adversarial example can be formalized as the following optimization problem [8,9]:

$$\arg\min_{\delta_x} \|\delta_x\| \quad \text{s.t.}\quad F(x+\delta_x)=y^* \tag{1}$$

where F denotes the DNN model, δ_x is the adversarial perturbation added to example x, and the output y* is a label different from the ground-truth label of x. For a gradient-based attack, we employ ℑ(θ, x, y), usually the cross-entropy loss, to denote the loss function of the corresponding DNN, where θ is the parameter set of F, and x and y represent the example and its corresponding label, respectively. Then δ_x can be formulated as

$$\delta_x = \phi\cdot\frac{\partial \Im(\theta, x, y)}{\partial x}, \qquad (x, y)\in\Omega \tag{2}$$

where Ω and φ denote the example space and the scale of the gradient added to the original example, respectively. The adversarial example is formulated as x' = x + δ_x.

As mentioned above, the output of FN is the Hadamard product of an elaborately crafted mask matrix L and the original example x. Therefore, it is necessary to introduce the definition of the Hadamard product. As introduced in [25], the Hadamard product is an element-wise product of two matrices:

$$(A\circ B)_{i,j} = A_{i,j}\cdot B_{i,j} \tag{3}$$

According to Eq. (3), the derivative of the Hadamard product is

$$\frac{\partial (A\circ B)_{i,j}}{\partial A_{i,j}} = \frac{\partial\, A_{i,j}B_{i,j}}{\partial A_{i,j}} = B_{i,j}, \qquad i\in[1,M],\; j\in[1,N] \tag{4}$$

When the FN layer is added to a DNN, m(x) denotes the output of FN, i.e., m(x) = x ∘ L. According to Eq. (2), δ_x for a DNN with FN can be calculated as

$$\delta_x^{FN} = \phi\cdot\frac{\partial \Im(\theta, m(x), y)}{\partial x} = \phi\cdot\left(\frac{\partial \Im(\theta, m(x), y)}{\partial m(x)}\cdot\frac{\partial m(x)}{\partial x}\right) \tag{5}$$

According to Eqs. (4) and (5), since

$$\frac{\partial m(x)}{\partial x} = \frac{\partial (x\circ L)}{\partial x} = L \tag{6}$$

the perturbation can also be formulated as

$$\delta_x^{FN} = \phi\cdot\frac{\partial \Im(\theta, m(x), y)}{\partial m(x)}\cdot L \tag{7}$$

According to the first-order differential form invariance theorem, the adversarial examples crafted by gradient-based methods for the original model and for the model with the FN layer can be formulated as

$$\begin{cases} x' = x + \delta_x \\ x'_{FN} = x + \delta_x^{FN} \end{cases} \;\Rightarrow\; \begin{cases} x' = x + \delta_x \\ x'_{FN} = x + \delta_x\cdot L \end{cases} \tag{8}$$

Eq. (8) demonstrates that the gradient of example x is masked by the mask matrix L; therefore, FN is effective in making DNNs more robust against gradient-based attacks. Moreover, Ref. [26] argues that an efficient defense algorithm should block the gradient flow, and, as shown in Eq. (8), since the mask matrix L of FN changes the original gradient, FN can be applied to block the gradient flow. In summary, FN algorithms are effective in improving the robustness of DNNs because they mask the original gradient.
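To make Eqs. (5)–(8) concrete, the following sketch compares the input gradients of a model with and without an FN layer; the tiny placeholder classifier, the random mask, and the loss are assumptions chosen only to show that the gradient of the FN-protected model equals the original gradient multiplied elementwise by L.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))   # placeholder classifier
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 1, 28, 28)
y = torch.tensor([3])
L = (torch.rand_like(x) >= 0.2).float()    # assumed 0/1 mask, nullification rate 0.2
phi = 0.1                                  # attack intensity (phi in Eq. (2))

# Gradient w.r.t. the nullified input m(x) = x o L, i.e. the factor in Eq. (7).
m = (x * L).clone().requires_grad_(True)
grad_wrt_m = torch.autograd.grad(loss_fn(model(m), y), m)[0]

# Gradient w.r.t. x when the model is preceded by feature nullification (Eq. (5)).
x_req = x.clone().requires_grad_(True)
grad_wrt_x = torch.autograd.grad(loss_fn(model(x_req * L), y), x_req)[0]

# Eqs. (6)-(8): the input gradient equals the masked gradient, so nullified
# positions receive no perturbation at all.
assert torch.allclose(grad_wrt_x, grad_wrt_m * L, atol=1e-6)
x_adv_fn = x + phi * grad_wrt_x.sign()     # FGSM-style adversarial example against the FN model
```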
According to the analysis above, it can be concluded that the effectiveness of FN stems from the mask matrix, and an elaborately designed mask matrix will endow FN with better performance. Therefore, two new FN algorithms, called sliding window feature nullification (SWFN) and fixed stride feature nullification (FSFN), are proposed.

3.1. Sliding window feature nullification

Sliding window feature nullification (SWFN) divides the feature space into small, successive and equal-sized local fractions without overlap, and each fraction is called a sliding window. For an image example x ∈ X^{D×M×N}, the pixel value matrix of each channel (i.e., B, G, R) is divided into uniform grid regions, as shown in Fig. 3. Here, X is the example set, and D, M and N denote the number of channels, the width of x and the height of x, respectively.
Fig. 4. The specific process used by FSFN to generate the mask matrix. To make it easier to follow, it is assumed that the mask matrix is divided into two windows: the green box region and the rest. The seed is a randomly selected pixel in each window, while ''Head'' and ''Tail'' represent the first and last elements of the window. s denotes the stride used to nullify features in each window. The workflow moves from top to bottom. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
In Fig. 3, the solid lines represent the ground-truth data propagation of the DNN, while the dashed lines are introduced to demonstrate the function of FN. Moreover, the hidden layers and the output layer together make up the original DNN.

The procedure of SWFN can be described as follows: according to the number of features of an example and the predefined nullification rate r, a suitable window length w is selected and the number of features n to be nullified in each window is calculated; then, the features are divided into windows; finally, the specified number of features in each window is randomly nullified. To be specific, r and w are predefined, and n is the product of r and w. The pseudocode of SWFN is illustrated in Algorithm 1.
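The following NumPy sketch generates an SWFN-style mask for a flattened image; the handling of a final partial window and the function names are our assumptions based on the description above (the paper's Algorithm 1 is not reproduced here).

```python
import numpy as np

def swfn_mask(num_features, r, w, rng=None):
    """Sliding window feature nullification mask (1 = keep, 0 = nullify).

    The flattened feature vector is split into consecutive, non-overlapping
    windows of length w; in each window, round(r * window_length) randomly
    chosen positions are set to zero."""
    rng = np.random.default_rng() if rng is None else rng
    mask = np.ones(num_features, dtype=np.float32)
    for start in range(0, num_features, w):
        end = min(start + w, num_features)          # the last window may be shorter than w
        n = int(round(r * (end - start)))           # number of features to nullify here
        idx = rng.choice(np.arange(start, end), size=n, replace=False)
        mask[idx] = 0.0
    return mask

# Example: a 28 x 28 MNIST image flattened to 784 features, rate 0.3, window length 28.
L = swfn_mask(784, r=0.3, w=28).reshape(28, 28)
```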
3.2. Fixed stride feature nullification

To make the nullified features scatter as uniformly as possible while maintaining the randomness of FN, we propose the fixed stride feature nullification (FSFN) algorithm. Both intuitively and experimentally, the more dispersed the nullified features are, the better the model accuracy is maintained. FSFN adopts a one-step random feature nullification strategy: it nullifies the features of the input randomly, but the nullification rate is fixed to maintain the availability of DNNs, which is different from RFN. The process of FSFN is described in Algorithm 2 and Fig. 4.

To be specific, for each channel of an input x ∈ R^{D×M×N}, one matrix L_i ∈ R^{M×N} is generated and reshaped into a 1-d vector L'_i ∈ R^{1×M·N}. Second, according to the size of the example and the predefined nullification rate r, a suitable window length w is selected, while the stride s is the floor of the inverse of the feature nullification rate. Third, FSFN divides L'_i into uniform nonoverlapping windows of length w. Then, it randomly selects one feature in each window as a seed. Starting at the seed, features are nullified forward and backward in every window with stride s: forward means that features are nullified with a fixed stride until the head of the window is reached, while backward means until the tail of the window is reached. Moreover, when the seed is the head feature of the window, only the backward process is executed.
When the distances from the seed to both the head and the tail of the window are greater than s, both the forward and backward processes are executed, while when the seed is the tail feature, only the forward process is executed. To ensure that the specified number of features in each window is nullified, a counter is introduced, which is initialized to zero and incremented by one each time a feature is nullified. The length of the last window may be smaller than w, so the number of features to be nullified there is recalculated as the product of the length of the last window and the feature nullification rate r. Finally, the concatenation operation means that the channel masks L_i ∈ R^{M×N}, i ∈ [1, D], are concatenated into a tensor L ∈ R^{D×M×N}. FSFN is a general method that retains the integrity of the local features to a large extent. Given a legitimate example x ∈ X^{D×M×N}, where X, D and M × N denote the original dataset, the number of channels and the image size, respectively, FSFN generates the mask L as described above (see Fig. 4).
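A rough NumPy sketch of the per-channel FSFN procedure described above follows; the stride s = floor(1/r), the per-window budget, and the counter-based stopping rule are modeled on the text, but details such as whether the seed itself is nullified and the treatment of the final partial window are our assumptions (Algorithm 2 is not reproduced here).

```python
import numpy as np

def fsfn_mask_1d(num_features, r, w, rng=None):
    """Fixed stride feature nullification mask for one flattened channel (1 = keep, 0 = nullify)."""
    rng = np.random.default_rng() if rng is None else rng
    s = max(1, int(np.floor(1.0 / r)))              # stride: floor of the inverse nullification rate
    mask = np.ones(num_features, dtype=np.float32)
    for start in range(0, num_features, w):
        end = min(start + w, num_features)          # the last window may be shorter than w
        budget = int(round(r * (end - start)))      # number of features to nullify in this window
        seed = int(rng.integers(start, end))        # randomly selected seed position
        count = 0
        # Walk from the seed towards the head and towards the tail of the window
        # with stride s, stopping once the per-window budget is reached.
        for pos in list(range(seed, start - 1, -s)) + list(range(seed + s, end, s)):
            if count >= budget:
                break
            mask[pos] = 0.0
            count += 1
    return mask

def fsfn_mask(shape, r, w, rng=None):
    """Stack independent per-channel masks into a D x M x N tensor, as described for FSFN."""
    d, m, n = shape
    rng = np.random.default_rng() if rng is None else rng
    return np.stack([fsfn_mask_1d(m * n, r, w, rng).reshape(m, n) for _ in range(d)])

# Example: a 3-channel 32 x 32 GTSRB-sized input, rate 0.25 (stride 4), window length 32.
L = fsfn_mask((3, 32, 32), r=0.25, w=32)
```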
3.3. Analysis of FNs

FN layers, as nonparameter layers, are added to DNNs to improve their robustness by masking the gradients of the inputs. As observed in [16], the features of image data are redundant, and only a few features (pixels) are strongly correlated with the classification task. Therefore, it can be speculated that feature nullification has little effect on classification accuracy when the feature nullification rate is not high, i.e., usually no more than 0.3. The ground-truth nullification performance is shown in Fig. 5, whose columns from left to right represent the original examples and the corresponding examples nullified by Dropout, RFN, FSFN and SWFN, respectively. Under the same feature nullification rate, SWFN and FSFN maintain more features in semantic regions than Dropout and RFN. Therefore, it is speculated that Dropout and RFN are not very efficient at maintaining the integrity of the local features of the original examples, while FSFN and SWFN can alleviate this issue.

Fig. 5. Visualization of ground-truth nullification performance of FNs under feature nullification rate 0.5. The original examples are ''0'' and ''9'' of MNIST.

The numbers of feature nullification plans, i.e., the numbers of distinct ways in which specific features can be nullified, for FSFN, SWFN and RFN can be formulated as shown in (9)–(11), respectively:

$$sn = \prod_{p=1}^{D} s_p, \qquad s_p = \prod_{i=1}^{D} \frac{1}{r_p} \tag{9}$$

$$sn = \prod_{q=1}^{D\cdot M\cdot N/w} (C_w^n)_q \tag{10}$$

$$sn = C_{D\cdot M\cdot N}^{\,r\cdot D\cdot M\cdot N} \tag{11}$$

where sn, D, s_p, r_p and (C_w^n)_q denote the number of possible final nullification results, the number of input channels, the feature nullification stride for the pth channel, the feature nullification rate and the qth combinatorial number, respectively. For an MNIST example with a feature nullification rate of 0.5, RFN supplies about 10^234 nullification strategies, while SWFN supplies about 10^189. Ultimately, r_p can be different for each channel and even for each window of an example, although the aforementioned feature nullification rate is fixed for an example; additionally, the feature nullification rate can be adjusted according to the attack strength.
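As a rough sanity check on the orders of magnitude quoted above, the counts in Eqs. (10) and (11) can be evaluated directly; the window length w = 16 used below is only an assumed value for illustration, since the paper does not report it for this example.

```python
import math

D, M, N = 1, 28, 28               # MNIST: one channel of 28 x 28 pixels
r = 0.5                           # feature nullification rate
F = D * M * N                     # total number of features (784)

# Eq. (11): RFN chooses any r*F of the F features.
rfn_count = math.comb(F, int(r * F))
print(f"RFN strategies:  ~10^{len(str(rfn_count)) - 1}")    # about 10^234

# Eq. (10): SWFN chooses r*w features independently inside each window of length w.
w = 16                            # assumed window length; the exact count depends on this choice
n = int(r * w)
swfn_count = math.comb(w, n) ** (F // w)
print(f"SWFN strategies: ~10^{len(str(swfn_count)) - 1}")
```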
Moreover, the time complexity of our FNs can be analyzed as follows. RFN and dropout select the specified number of features from the given example in a single program step, so their time complexity is O(1). For SWFN and FSFN, the time complexity depends on the number of windows, which is chosen according to the number of features n; therefore, the time complexity of SWFN and FSFN is O(n). Moreover, when the number of windows in SWFN is set to 1, SWFN degenerates into RFN.

4. Experiments

4.1. Experiment settings

MNIST [17] and GTSRB [18] are employed to evaluate the performance of the feature nullification algorithms. MNIST is a hand-written digit dataset: its training set consists of 60 000 gray-scale images of size 28 × 28, and its testing set includes 10 000 gray-scale images of the same size. GTSRB is a traffic sign recognition dataset: its training set consists of 39 209 color images, and its testing set consists of 12 630 color images; since the sizes of the GTSRB examples differ, we scale them to 32 × 32. The architectures of the DNNs used for the classification tasks on MNIST and GTSRB are shown in Table 1. The DNNs are implemented with the deep learning framework PyTorch, and batch normalization (BN) layers [27] are added to both DNNs to accelerate convergence. For both datasets, adversarial examples generated by FGSM with the attack intensity ϵ covering the range [0, 0.3] are used to evaluate the defense performance of the different feature nullification algorithms.

4.2. Experiments to compare performance of FNs

We first conduct experiments to evaluate the classification accuracy of the original models equipped with the different FN algorithms on MNIST and GTSRB. The experimental results are shown in Tables 2 and 3, respectively; the Mean and Variance values are computed over the model accuracies of 5 independent experiments.
Fig. 6. Defense performance of the model with variant FNs for MNIST.
Fig. 7. Defense performance of the model with variant FNs for GTSRB.

Table 1
DNN architectures implemented for MNIST and GTSRB.

MNIST                      GTSRB
Conv(1,10,5)+RELU          Conv(3,10,5)+RELU
BN                         BN
MAX_POOL(2)                MAX_POOL(2)
Conv(10,20,5)+RELU         Conv(10,20,5)+RELU
MAX_POOL(2)                MAX_POOL(2)
FC(320,50)                 FC(500,172)
FC(50,10)                  FC(172,43)
Softmax                    Softmax

Table 2
Means and variances of model accuracy on MNIST (under feature nullification rate 0.4).

            RFN         Dropout     SWFN        FSFN
Mean        0.8519      0.7468      0.7975      0.9327
Variance    7.620e−6    1.316e−5    5.406e−6    2.704e−6

Table 3
Means and variances of model accuracy on GTSRB (under feature nullification rate 0.2).

            RFN         Dropout     SWFN        FSFN
Mean        0.6704      0.6708      0.6944      0.7410
Variance    1.326e−5    9.770e−6    9.319e−6    6.024e−6
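For reference, the MNIST architecture in Table 1 can be written in PyTorch roughly as follows; the layer ordering follows the table, while details such as the absence of padding and where the softmax is applied are assumptions about the authors' implementation.

```python
import torch.nn as nn

# Sketch of the Table 1 MNIST network: Conv(1,10,5)+ReLU, BN, MaxPool(2),
# Conv(10,20,5)+ReLU, MaxPool(2), FC(320,50), FC(50,10), Softmax.
mnist_net = nn.Sequential(
    nn.Conv2d(1, 10, kernel_size=5), nn.ReLU(),
    nn.BatchNorm2d(10),
    nn.MaxPool2d(2),
    nn.Conv2d(10, 20, kernel_size=5), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),                    # 20 x 4 x 4 = 320 features for 28 x 28 inputs
    nn.Linear(320, 50),
    nn.Linear(50, 10),
    nn.Softmax(dim=1),
)

# The GTSRB variant in Table 1 swaps in Conv2d(3, 10, 5) for the first layer and
# Linear(500, 172), Linear(172, 43) for the fully connected layers (32 x 32 inputs).
```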
Mean Variance
RFN
Dropout
SWFN
FSFN
0.6704 1.326e−5
0.6708 9.770e−6
0.6944 9.319e−6
0.7410 6.024e−6
As shown in Tables 2 and 3, FSFN and SWFN maintain the original model accuracy better than Dropout and RFN on both MNIST and GTSRB. The variances in Tables 2 and 3 can be viewed as indicators of the randomness of the FNs; it can therefore be concluded that SWFN and FSFN achieve higher model accuracy at the cost of some randomness.

After evaluating the effect of the FNs on model accuracy, we conduct experiments to evaluate the defense performance of the four FNs. The experimental results are shown in Figs. 6 and 7 for the MNIST and GTSRB datasets, respectively. We explore the defense performance of the FNs with ϵ covering [0, 0.3] under different feature nullification rates.
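The curves in Figs. 6 and 7 can be produced with an evaluation loop along the lines sketched below; `model`, `fn_layer` and `test_loader` are assumed to be defined elsewhere (e.g., the Table 1 network, one of the FN layers sketched earlier, and a standard DataLoader), so this is only an outline of the protocol, not the authors' script.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Craft FGSM adversarial examples with attack intensity eps."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def accuracy_under_attack(model, fn_layer, loader, eps):
    correct = total = 0
    for x, y in loader:
        if eps > 0:
            # Assumption: the attacker uses the gradient of the undefended model;
            # a fully white-box attacker could differentiate through fn_layer instead.
            x = fgsm(model, x, y, eps)
        if fn_layer is not None:
            x = fn_layer(x)                # nullify features before classification
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

# for eps in (0.0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3):
#     print(eps, accuracy_under_attack(model, fn_layer, test_loader, eps))
```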
Fig. 8. Defense performance of four FNs with a high feature nullification rate for MNIST. Specifically, four lines from the top down denote the defense performance of FNs under feature nullification rates of 0.6, 0.7, 0.8 and 0.9, respectively.
In Figs. 6 and 7, RFN_adv, FSFN_adv, Dropout_adv and SWFN_adv denote the accuracies of the models using the corresponding FNs under different attack intensities, while ori_adv denotes the accuracy of the original model without any defense algorithm. Figs. 6 and 7 show that FNs are efficient at mitigating gradient-based attacks. Among the four FNs, SWFN achieves the best defense performance, while FSFN is the most efficient at retaining model accuracy.

Moreover, to explore the upper bound of the feature nullification rate, we conduct experiments to examine the performance of the four FNs under high feature nullification rates. In Figs. 8 and 9, ori represents the accuracies of the original model under different attack intensities, while RFN_0.6 to RFN_0.9, Dropout_0.6 to Dropout_0.9, SWFN_0.6 to SWFN_0.9 and FSFN_0.6 to FSFN_0.9 represent the accuracies of the original model adopting RFN, Dropout, SWFN and FSFN, respectively, as the defense algorithm under feature nullification rates of 0.6, 0.7, 0.8 and 0.9.
As shown in Fig. 9, when the feature nullification rate reaches 0.9, the model accuracy remains stable, which means that FGSM attacks are no longer effective against any of the four FNs; Fig. 8 shows that SWFN can likewise keep the model accuracy stable under a feature nullification rate of 0.9. In conclusion, there is no upper bound on the feature nullification rate: the more features are nullified, the better the defense performance the model achieves. When the feature nullification rate approaches 1, all gradient-based attacks fail, because, as shown in Eq. (8), the perturbation δ_x^FN tends to 0. However, nullifying more features also causes DNNs to lose more accuracy on legitimate examples.

4.3. Experimental analysis of L

As mentioned in Section 3, the mask matrix L is a vital factor for the performance of FN. Therefore, we conducted experiments to explore the relationship between the properties of the mask matrix and the defense performance of the corresponding FN. The 2-norm of a matrix can be applied as a metric for the distance between two matrices [28]; therefore, the 2-norm of L can be applied as a measure of the distance between the original gradient and the gradient protected by FN.
Fig. 9. Defense performance of four FNs for a high feature nullification rate for GTSRB. Specifically, four lines from the top down denote the defense performance of FNs under feature nullification rates of 0.6, 0.7, 0.8 and 0.9, respectively.
Moreover, the closer the gradient protected by FN is to the original gradient, the more effective the crafted perturbation remains, and the poorer the classification performance the defended model achieves, as discussed in Section 3. The 2-norms of the mask matrices of the FNs for MNIST are shown in Table 4. According to Tables 2 and 4, the model accuracy is consistent with the magnitude of the 2-norm (FSFN > RFN > SWFN > Dropout). According to Tables 2–4 as well as Figs. 6 and 7, we can conclude that if an FN is better at maintaining model accuracy, its corresponding mask matrix has a larger 2-norm value; inversely, if an FN achieves better defense performance, the 2-norm value of its mask matrix is smaller. Therefore, it is speculated that the 2-norm can be employed to evaluate the performance of FNs and to guide the design of an efficient feature nullification strategy.

As shown in Fig. 1, the edge regions of the MNIST examples contain many zero-valued pixels (black points), which carry no features related to the classification task; in Fig. 2, the border regions of the GTSRB examples likewise contain no features related to the classification task. Therefore, it makes little sense to nullify features in these border regions. As mentioned in Section 3, prior knowledge can be utilized to craft the mask matrix L, and it is speculated that nullifying, and thereby protecting, the semantic regions may be more significant.
Table 4
2-norm of the mask matrices of FNs.

           Feature nullification rate
FNs        0.2      0.3      0.4      0.5
FSFN       23.61    20.89    17.88    16.31
SWFN       22.51    19.76    17.05    14.35
RFN        22.62    19.89    17.20    14.82
Dropout    22.59    19.88    17.19    14.47
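The magnitudes in Table 4 can be approximated numerically as a quick check; the snippet below assumes the 2-norm in question is the induced (spectral) matrix 2-norm of a single 28 × 28 mask, which is consistent with the reported values, but this interpretation is our assumption (use the Frobenius norm instead by dropping `ord=2` if that is what is meant).

```python
import numpy as np

rng = np.random.default_rng(0)

def rfn_mask(shape, r, rng):
    """Random 0/1 mask with nullification rate r (RFN/Dropout-style)."""
    return (rng.random(shape) >= r).astype(float)

for r in (0.2, 0.3, 0.4, 0.5):
    L = rfn_mask((28, 28), r, rng)
    # np.linalg.norm(..., ord=2) returns the largest singular value of the matrix.
    print(f"rate {r}: ||L||_2 ~ {np.linalg.norm(L, ord=2):.2f}")
```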
Therefore, we conducted experiments to validate the defense performance of FN with and without distinguishing the main semantic regions, under the same feature nullification rate. The experimental results are shown in Fig. 10, in which ori, with and without represent the original model without any defense algorithm, the original model with FN nullifying features only in the semantic regions, and the original model with FN nullifying features over the whole feature region, respectively. The model with an FN layer that nullifies features only in the semantic regions achieves the best robustness of the three cases under the same feature nullification rate. Therefore, it is concluded that an FN that pays more attention to protecting the semantic regions will achieve the best defense performance.
Fig. 10. Performance of the feature nullification algorithm with and without semantic region protection.
Since the main semantic regions vary across datasets and even across individual examples, the design of a suitable mask matrix may require some prior knowledge.

5. Discussion and conclusion

5.1. Discussion

As demonstrated above, FNs can be employed to improve the robustness of DNNs against gradient-based attacks. Dropout is added to DNNs only in the training phase and is removed in the deployment phase, and RFN must be added to DNNs in the training phase; in this paper, the experimental results show that SWFN and FSFN can also be added to DNNs in the inference phase. Compared to RFN, SWFN and FSFN aim to lower the randomness of the model, and thereby maintain model availability, by preserving local feature integrity. Moreover, it is experimentally demonstrated that it is the mask matrix's treatment of the semantic regions of the example, rather than the randomness of the matrix itself, that improves the robustness of DNNs against gradient-based adversarial examples. In addition, Ref. [29] suggests that the DNN architecture also impacts the performance of FNs; further experiments will be conducted to explore this issue. Since image features are redundant, a feature selection algorithm [30] will also be applied to design more effective FNs.

5.2. Conclusion

This paper theoretically proves that feature nullification (FN) can improve the robustness of DNNs. Based on this theoretical analysis, the SWFN and FSFN algorithms are proposed. The experimental results show that SWFN and FSFN are more effective at maintaining availability and improving the robustness of the original model. Exploring the relationship between the performance of FN and the mask matrix provides additional ways to measure algorithm performance. In summary, FNs are simple and effective algorithms for improving the robustness of DNNs.

Acknowledgment

This work was partially supported by the Natural Science Foundation of China (Nos. 61603197, 61772284, 61876091, 61802205).
References

[1] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015).
[2] C. Szegedy, S. Ioffe, V. Vanhoucke, Inception-v4, Inception-ResNet and the impact of residual connections on learning, arXiv:1602.07261.
[3] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Adv. Neural Inf. Process. Syst. 25 (2) (2012).
[4] L. Liu, Y.-J. Liu, S. Tong, Fuzzy based multi-error constraint control for switched nonlinear systems and its applications, IEEE Trans. Fuzzy Syst. (2018).
[5] X. Yao, Z. Wang, H. Zhang, Identification method for a class of periodic discrete-time dynamic nonlinear systems based on sinusoidal ESN, Neurocomputing 275 (2018) 1511–1521.
[6] Z. Wang, L. Liu, H. Zhang, Neural network-based model-free adaptive fault-tolerant control for discrete-time nonlinear systems with sensor fault, IEEE Trans. Syst. Man Cybern. (2017) 1–12.
[7] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, arXiv:1312.6199, 2013.
[8] N. Papernot, P. McDaniel, A. Sinha, M. Wellman, SoK: Towards the science of security and privacy in machine learning, arXiv:1611.03814, 2016.
[9] M. Barreno, B. Nelson, A.D. Joseph, J.D. Tygar, The security of machine learning, Mach. Learn. 81 (2) (2010) 121–148.
[10] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z.B. Celik, A. Swami, The limitations of deep learning in adversarial settings, in: 2016 IEEE European Symposium on Security and Privacy (EuroS&P), IEEE, 2016, pp. 372–387.
[11] I.J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, arXiv:1412.6572, 2014.
[12] T. Miyato, S. Maeda, M. Koyama, Distributional smoothing by virtual adversarial examples, arXiv:1507.00677, 2015.
[13] A. Kurakin, I.J. Goodfellow, S. Bengio, Adversarial examples in the physical world, arXiv:1607.02533, 2016.
[14] N. Carlini, D. Wagner, Towards evaluating the robustness of neural networks, in: IEEE Symposium on Security and Privacy, 2017.
[15] S.-M. Moosavi-Dezfooli, A. Fawzi, P. Frossard, DeepFool: a simple and accurate method to fool deep neural networks, arXiv:1511.04599, 2015.
[16] J. Su, D.V. Vargas, K. Sakurai, One pixel attack for fooling deep neural networks, arXiv:1710.08864, 2017.
[17] R. Cox, B. Haskell, Y. LeCun, B. Shahraray, L. Rabiner, On the application of multimedia processing to telecommunications, Proc. IEEE 86 (5) (1998) 755–824.
[18] P. Sermanet, Y. LeCun, Traffic sign recognition with multi-scale convolutional networks, in: Proceedings of the IEEE International Joint Conference on Neural Networks, IEEE Press, 2011, pp. 2809–2813.
[19] D. Meng, H. Chen, MagNet: a two-pronged defense against adversarial examples, in: CCS '17, 2017.
[20] N. Papernot, P. McDaniel, X. Wu, S. Jha, A. Swami, Distillation as a defense to adversarial perturbations against deep neural networks, in: The 37th IEEE Symposium on Security & Privacy, IEEE, 2016.
[21] G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network, in: Deep Learning and Representation Learning Workshop at NIPS 2014, arXiv:1503.02531, 2014.
[22] A.G. Ororbia II, D. Kifer, C. Lee Giles, Unifying adversarial training algorithms with flexible deep data gradient regularization, arXiv:1601.07213.
[23] Q. Wang, W. Guo, K. Zhang, A.G. Ororbia II, X. Xing, C. Lee Giles, X. Liu, Adversary resistant deep neural networks with an application to malware detection, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '17), Halifax, Canada, 2017.
[24] N. Srivastava, G.E. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. 15 (1) (2014) 1929–1958.
[25] E. Million, The Hadamard product, 2007.
[26] Q. Wang, W. Guo, K. Zhang, A.G. Ororbia II, X. Xing, C. Lee Giles, X. Liu, Learning adversary-resistant deep neural networks, arXiv:1612.01401, 2016.
[27] S. Ioffe, C. Szegedy, Batch normalization: accelerating deep network training by reducing internal covariate shift, arXiv:1502.03167, 2015.
[28] E. Prugovečki, Quantum Mechanics in Hilbert Space, second ed., Academic Press, 1981, p. 20.
[29] C. Zhang, J. Luo, X. Wei, J. Wu, In defense of fully connected layers in visual representation transfer, in: PCM 2017.
[30] Y. Peng, B.L. Lu, Discriminative extreme learning machine with supervised sparsity preserving for image classification, Neurocomputing 261 (2017) 242–252.