Neurocomputing 174 (2016) 1107–1115
Local learning-based feature weighting with privacy preservation

Yun Li a,*, Jun Yang a, Wei Ji b,c

a College of Computer Science and Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing University of Posts and Telecommunications, Nanjing, China
b College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China
c Key Laboratory of Cloud Computing & Complex System, Guilin University of Electronic Technology, Guilin, China
Article history: Received 4 March 2015; received in revised form 14 August 2015; accepted 12 October 2015; available online 20 October 2015. Communicated by Feiping Nie.

Keywords: Local learning; Feature weighting; Privacy preservation

Abstract

Privacy-preserving data analysis has gained significant interest across several research communities, with current research focusing mainly on privacy-preserving classification and regression. Feature selection is likewise one of the key problems in data mining and machine learning, yet few papers address privacy-preserving feature selection. In this paper, a local learning-based feature weighting framework is introduced. In order to preserve data privacy during local learning-based feature selection, the objective perturbation and output perturbation strategies are used to produce local learning-based feature selection algorithms with privacy preservation, and we analyze their privacy-preserving properties in depth under the differential privacy model. Experiments are conducted on benchmark data sets. The results show that our algorithms preserve data privacy to some extent, and that objective perturbation consistently achieves higher classification performance than output perturbation at the same privacy-preserving degree.
1. Introduction

Feature selection is one of the key problems in machine learning and data mining [1,2]; it brings the immediate benefits of speeding up a machine learning or data mining algorithm, improving learning accuracy, and enhancing model comprehensibility. Various studies show that features can be removed without performance deterioration [3]. Roughly speaking, a feature selection algorithm involves two important aspects: the search strategy and the evaluation criterion. According to the criterion, algorithms can be categorized into filter, wrapper and embedded models [1,2]. If the categorization is instead based on output style, feature selection algorithms can be divided into feature weighting/ranking algorithms and subset selection algorithms [3]. The output of the feature selection algorithms discussed in this paper is a feature weighting. A comprehensive survey of existing feature selection techniques and a general framework for their unification can be found in [1–3]. Current feature selection research focuses on the classification accuracy and stability of the selected features; however, the privacy preservation property is also very important for feature selection. Privacy preservation means that the selected features do not leak private information from the data, where private information is sensitive information that the data owner is reluctant to disclose. Private
information has been a growing concern in medical records, financial records, web search histories, social network data, etc. Privacy-preserving classification and regression [4–7] have been analyzed in depth, but privacy-preserving feature selection algorithms remain very rare. In this paper, we present work on privacy-preserving feature selection. Concretely, two strategies, output perturbation and objective perturbation, are adopted to add a privacy-preserving property to a local learning-based feature selection algorithm, and ε-differential privacy [8] is chosen as the privacy model. For the local learning-based feature selection, the logistic loss with an L2 regularizer is used to design the evaluation criterion. This paper extends our previous work [9] and is organized as follows. The feature weighting algorithm based on local learning, FWELL, is introduced in Section 2. Section 3 presents the privacy model. Section 4 describes the differentially private feature selection algorithm based on output perturbation, Output-FWELL, and Section 5 the one based on objective perturbation, Objective-FWELL. Experimental results on benchmark data sets are shown in Section 6, and the paper concludes in Section 7.
2. Feature weighting algorithm based on local learning

For feature weighting, we are given a training set D of n samples, D = {X, Y} = {x_i, y_i}_{i=1}^n, where x_i = (x_{i1}, x_{i2}, …, x_{id}) ∈ R^d is the input of the ith training sample and y_i is the corresponding label.
Based on local learning, a sample x_i should be close to its nearest neighbor with the same label (the near hit, NH(x_i)) and far from its nearest neighbor with a different label (the near miss, NM(x_i)) [10]. For the purposes of this paper, we use the Manhattan distance to find the nearest neighbors NH(x_i) and NM(x_i) and to define their closeness, although other standard distance definitions may also be used. The logistic regression loss is adopted to model the fit of the data for its simplicity and effectiveness; in addition, the logistic loss is twice differentiable and strongly convex, which allows faster optimization [11]. For any sample x_i, the logistic loss function is defined as

L(w^T z_i) = log(1 + exp(−w^T z_i)).  (1)
In Eq. (1), T denotes the transpose, w is the feature weight vector, z_i = |x_i − NM(x_i)| − |x_i − NH(x_i)|, and |·| is the element-wise absolute value operator. z_i can be considered the mapped point of x_i. The quantity w^T z_i is the local margin for x_i, which is a hypothesis margin [12]: intuitively, it measures how much the features of x_i can be corrupted by noise (or how much x_i can "move" in feature space) before x_i is misclassified [10]. In other words, feature weighting based on local learning scales each feature so as to obtain a weighted feature space, parameterized by a vector w, in which a local margin-based loss function is minimized. By large margin theory [13], a classifier trained in a weighted feature space that minimizes a margin-based loss function usually generalizes well on unseen test data. Moreover, regularization is used to prevent overfitting. Thus, the evaluation criterion for feature weighting on the training set D is defined as

L(w, D) = (1/n) Σ_{i=1}^n L(w^T z_i) + λ R(w),  (2)
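As a concrete illustration of this mapping, the following minimal NumPy sketch computes the near hit, the near miss and the mapped point z_i under the Manhattan distance, and evaluates the logistic loss of the local margin. The function names and the conventions (a sample matrix X, a label vector y) are our own assumptions, not the authors' code.

import numpy as np

def nearest_hit_miss(X, y, i):
    # Near hit NH(x_i) and near miss NM(x_i) under the Manhattan (L1) distance.
    dists = np.abs(X - X[i]).sum(axis=1)
    dists[i] = np.inf                                # exclude x_i itself
    same = (y == y[i])
    nh = X[np.where(same, dists, np.inf).argmin()]   # nearest same-label sample
    nm = X[np.where(~same, dists, np.inf).argmin()]  # nearest other-label sample
    return nh, nm

def mapped_point(X, y, i):
    # z_i = |x_i - NM(x_i)| - |x_i - NH(x_i)|, element-wise absolute values.
    nh, nm = nearest_hit_miss(X, y, i)
    return np.abs(X[i] - nm) - np.abs(X[i] - nh)

def logistic_loss(margin):
    # Eq. (1): L(w^T z) = log(1 + exp(-w^T z)), computed stably.
    return np.logaddexp(0.0, -margin)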
where λ is a cost parameter balancing the importance of the two terms and R(w) in (2) is a regularization term. Feature selection then aims to find the target model w that minimizes the loss function in Eq. (2), which yields the local learning-based feature selection method shown in Algorithm 1. Note that, as an example, gradient descent is used to illustrate the minimization of the evaluation function in Eq. (2); the optimal feature weights can of course be found by many other optimization approaches.

Algorithm 1. Feature WEighting algorithm based on Local Learning (FWELL).

Step 1. Input: training data set D = {x_i, y_i}_{i=1}^n with x_i ∈ R^d, and regularization parameter λ in Eq. (2).
Step 2. Initialize w = (1, 1, …, 1) ∈ R^d.
Step 3. For i = 1, 2, …, n:
(a) Given x_i, find NH(x_i) and NM(x_i).
(b) Use Eq. (1) to obtain L(w^T z_i).
(c) ∇ = (1/n) ∂L(w^T z_i)/∂w + λ ∂R(w)/∂w.
(d) w = w − ∇/‖∇‖_2.
Step 4. Output the feature weighting vector w.

In the following analysis and experiments, the L2 regularizer is used as R(w) in Eq. (2) for its rotational invariance and strong stability [14]. The concrete evaluation criterion considered in this paper is therefore

L(w, D) = (1/n) Σ_{i=1}^n L(w^T z_i) + λ‖w‖²,  (3)

and gradient descent is used to minimize the evaluation function (3) to obtain the feature weights, as described in Algorithm 1; the resulting algorithm is named FWELL.
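A minimal sketch of Algorithm 1 follows, reusing mapped_point from the sketch above. The per-sample normalized step mirrors Steps 3(c)–(d); the n_epochs knob is our addition, since the pseudocode describes a single sweep over the samples.

import numpy as np

def fwell(X, y, lam, n_epochs=1):
    # Algorithm 1 (FWELL): normalized gradient steps on Eq. (3).
    n, d = X.shape
    w = np.ones(d)                                   # Step 2
    for _ in range(n_epochs):
        for i in range(n):                           # Step 3
            z = mapped_point(X, y, i)                # (a): z_i from NH/NM
            # (c): dL/dw = -z / (1 + exp(w^T z)); d(lam * ||w||^2)/dw = 2 * lam * w
            grad = -z / (n * (1.0 + np.exp(w @ z))) + 2.0 * lam * w
            norm = np.linalg.norm(grad)
            if norm > 0.0:
                w = w - grad / norm                  # (d): unit-norm step
    return w                                         # Step 4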
3. Privacy model

As the privacy measure, we adopt the ε-differential privacy model [8], which quantifies the privacy risk associated with computing functions of sensitive data. A statistical procedure satisfies ε-differential privacy if changing a single data point does not shift the output distribution by too much; therefore, it is difficult to infer the value of any particular data point from the output of the algorithm [5]. The ε-differential privacy model is robust to known attacks, such as those involving side information [15]. It is a strong, cryptographically motivated definition of privacy that has recently received a significant amount of research attention, for example in differentially private empirical risk minimization for classification and regression [4–7].

Definition 1. A randomized mechanism A provides ε-differential privacy if, for all data sets D and D′ that differ in at most one element, and for all output subsets S ⊆ Range(A),

Pr[A(D) ∈ S] ≤ exp(ε) · Pr[A(D′) ∈ S].  (4)

The probability Pr is taken over the coin tosses of A, and Range(A) denotes the output range of A. The privacy parameter ε measures the disclosure. When two data sets that are identical except for a single entry are input to the algorithm A, the two distributions on the algorithm's output are close; that is, no single entry of the data set noticeably affects the output. This means that an adversary who knows all but one entry of the data set cannot gain much additional information about that entry by observing the output of the algorithm, so the privacy of this entry is preserved. In other words, let D = {(x_1, y_1), …, (x_n, y_n)} and D′ = {(x_1, y_1), …, (x_n′, y_n′)} be two data sets that differ in the value of the nth individual. The two distributions on the differentially private algorithm A's output are close, so an adversary who knows all but the nth entry cannot gain much additional information about it by observing the output.

4. Differentially private feature selection based on output perturbation

4.1. Sensitivity analysis

To obtain privacy-preserving versions of FWELL in terms of the differential privacy definition in Eq. (4), we adopt the output perturbation and objective perturbation strategies. In this section we present the differentially private FWELL with output perturbation. This algorithm depends on FWELL's sensitivity, which is generally defined as follows [5,16].

Definition 2. For any function A with n inputs, the sensitivity ΔQ is the maximum, over all inputs, of the difference in the value of A when one input of A is changed. That is,

ΔQ = max_{D, D′} ‖A(D) − A(D′)‖,  (5)

where the data sets D and D′ differ in at most one element. According to Definition 2, we can analyze the sensitivity of FWELL with the L2 regularizer and obtain Corollary 1.

Corollary 1. The feature weighting algorithm described in Algorithm 1 (FWELL) with the L2 regularizer has sensitivity 2/(λn).

Proof. Let D = {(x_1, y_1), …, (x_n, y_n)} and D′ = {(x_1, y_1), …, (x_n′, y_n′)} be two data sets that differ in the value of the nth individual. Suppose w_1 and w_2 are the solutions to FWELL when the
data sets are D and D′:

w_1 = argmin_w L(w, D),  w_2 = argmin_w L(w, D′).  (6)

Then, according to Definition 2, the sensitivity of FWELL is the upper bound of ‖w_1 − w_2‖. We define a function ℓ(w) as

ℓ(w) = L(w, D′) − L(w, D).  (7)

Because the logistic loss and the L2 regularizer are adopted, the functions L and ℓ are continuous and differentiable. Since w_1 and w_2 are the minimizers of L(w, D) and L(w, D′), respectively, and they are obtained through gradient descent, their gradients are equal to zero, i.e., ∇L(w_1, D) = 0 and ∇L(w_2, D′) = 0. Based on Eq. (7), we obtain ℓ(w_2) = L(w_2, D′) − L(w_2, D), and hence

∇L(w_2, D) + ∇ℓ(w_2) = 0.  (8)

Since the logistic loss and the L2 regularizer are 1-strongly convex, L is λ-strongly convex [5]. We obtain

(∇L(w_1, D) − ∇L(w_2, D))^T (w_1 − w_2) ≥ λ‖w_1 − w_2‖².  (9)

Meanwhile, based on Eq. (8) and ∇L(w_1, D) = 0,

(∇L(w_1, D) − ∇L(w_2, D))^T (w_1 − w_2) = (w_1 − w_2)(∇ℓ(w_2))^T.  (10)

According to the Cauchy–Schwartz inequality,

‖w_1 − w_2‖ · ‖∇ℓ(w_2)‖ ≥ (w_1 − w_2)(∇ℓ(w_2))^T.  (11)

Combining Eqs. (9)–(11),

‖w_1 − w_2‖ · ‖∇ℓ(w_2)‖ ≥ λ‖w_1 − w_2‖².  (12)

Namely,

‖w_1 − w_2‖ ≤ (1/λ) ‖∇ℓ(w_2)‖.  (13)

Suppose that z_n and z_n′ only depend on x_n and x_n′, respectively; then, based on Eqs. (3) and (7), for any w we can approximately obtain

ℓ(w) = (1/n)(L(w^T z_n′) − L(w^T z_n)).  (14)

According to Eq. (1), for any point z,

L(w^T z) = log(1 + exp(−w^T z)).  (15)

Then, for normalized ‖z‖ ≤ 1,

‖∇L(w^T z)‖ = ‖z‖/(1 + exp(w^T z)) ≤ ‖z‖ ≤ 1.  (16)

Based on Eqs. (14) and (16), we achieve

‖∇ℓ(w)‖ = (1/n)‖∇L(w^T z_n′) − ∇L(w^T z_n)‖ ≤ (1/n)(‖∇L(w^T z_n′)‖ + ‖∇L(w^T z_n)‖) ≤ 2/n.  (17)

So ‖∇ℓ(w_2)‖ is at most 2/n and, based on Eq. (13), we obtain

‖w_1 − w_2‖ ≤ 2/(λn).  (18)  □

4.2. Differentially private feature selection Output-FWELL

We are now in a position to present the differentially private feature selection algorithm for FWELL based on output perturbation, which is named Output-FWELL and described in Algorithm 2.

Algorithm 2. Differentially private feature selection Output-FWELL.

Step 1. Input: data set D = {x_i, y_i}_{i=1}^n, regularization parameter λ in Eq. (2), privacy parameter ε and normalizing parameter a.
Step 2. Obtain w according to Algorithm 1.
Step 3. Draw a random noise vector b with density v(b) = (1/a) e^{−(nελ/2)‖b‖}, based on the sensitivity analysis in Corollary 1.
Step 4. Compute w′ = w + b.
Step 5. Output w′.

For Output-FWELL, the following Theorem 1 is obtained.

Theorem 1. Output-FWELL is ε-differentially private.

Proof. Let D and D′ be any two data sets that differ in one individual. For w′ derived from Output-FWELL on D and D′, we have w′ = w_1 + b_1 and w′ = w_2 + b_2, where w_1 and w_2 are the unique outputs of FWELL on D and D′, respectively, and b_1 and b_2 are the corresponding noise vectors in Output-FWELL for D and D′. Then

Pr(w′|D)/Pr(w′|D′) = v(b_1)/v(b_2) = e^{(nελ/2)(‖b_2‖ − ‖b_1‖)},  (19)

where Pr(w′|D) (resp. Pr(w′|D′)) is the probability that Output-FWELL outputs w′ when the input is the data set D (resp. D′). Since w_1 + b_1 = w_2 + b_2 implies b_2 − b_1 = w_1 − w_2, the triangle inequality gives

‖b_2‖ − ‖b_1‖ ≤ ‖b_2 − b_1‖ = ‖w_1 − w_2‖.  (20)

Combining Eqs. (18)–(20), we achieve

Pr(w′|D)/Pr(w′|D′) ≤ e^ε.  (21)

Therefore, Output-FWELL provides ε-differential privacy in terms of Definition 1. □
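A sketch of Output-FWELL under the stated assumptions follows, reusing fwell from above. To draw b with density v(b) ∝ exp(−(nελ/2)‖b‖), we use the standard decomposition into a uniformly random direction and a Gamma(d, 2/(nελ))-distributed norm; sample_noise and the default generator are our choices, not the paper's.

import numpy as np

def sample_noise(d, beta, rng):
    # Draw b in R^d with density proportional to exp(-beta * ||b||):
    # uniform direction, norm distributed as Gamma(shape=d, scale=1/beta).
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    return rng.gamma(shape=d, scale=1.0 / beta) * direction

def output_fwell(X, y, lam, eps, rng=None):
    # Algorithm 2 (Output-FWELL): run FWELL, then perturb its output.
    rng = rng or np.random.default_rng()
    n, d = X.shape
    w = fwell(X, y, lam)                                     # Step 2
    b = sample_noise(d, beta=n * eps * lam / 2.0, rng=rng)   # Step 3 (Corollary 1)
    return w + b                                             # Steps 4-5: w' = w + b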
5. Differentially private feature selection based on objective perturbation

Furthermore, we propose another privacy-preserving feature selection algorithm in which the noise is added to the objective function of FWELL. That is, we minimize

L(w, D) = (1/n) Σ_{i=1}^n L(w^T z_i) + λ‖w‖² + (1/n) b^T w  (22)

to obtain the privacy-preserving feature selection algorithm Objective-FWELL, described in Algorithm 3. Note that, as an example, gradient descent is again used to illustrate the minimization of the evaluation function in Eq. (22).

Algorithm 3. Differentially private feature selection Objective-FWELL.

Step 1. Input: data set D = {x_i, y_i}_{i=1}^n, regularization parameter λ in Eq. (2) and privacy degree ε.
Step 2. Initialize w = (1, 1, …, 1) ∈ R^d.
Step 3. Draw a random noise vector b with density v(b) = e^{−(ε/2)‖b‖}.
Step 4. For i = 1, 2, …, n:
(a) Given x_i, find NH(x_i) and NM(x_i).
(b) Use Eq. (1) to obtain L(w^T z_i).
(c) ∇ = (1/n) ∂L(w^T z_i)/∂w + 2λw + b/n.
(d) w = w − ∇/‖∇‖_2.
Step 5. Output the feature weighting vector w′ = w.

For Objective-FWELL, we also obtain Theorem 2.

Theorem 2. Objective-FWELL is ε-differentially private.

Proof. Let D = {(x_1, y_1), …, (x_n, y_n)} and D′ = {(x_1, y_1), …, (x_n′, y_n′)} be two data sets that differ in the value of the nth individual. For w′ derived from Objective-FWELL on D and D′, there exist unique noise vectors b_1 and b_2 on D and D′, respectively, that map the inputs to the output w′. This uniqueness holds because both the regularization function and the loss function are differentiable everywhere. Then

Pr(w′|D)/Pr(w′|D′) = v(b_1)/v(b_2) = e^{(ε/2)(‖b_2‖ − ‖b_1‖)}.  (23)

In our case, L(w^T z_i) is defined in Eq. (1). Since w′ minimizes the evaluation function (22) on D and on D′, the derivatives of both objective functions at w′ are 0, i.e., ∇L(w′, D) = 0 and ∇L(w′, D′) = 0. This implies that

b_1 − b_2 = z_n/(1 + exp(w′^T z_n)) − z_n′/(1 + exp(w′^T z_n′)).  (24)

Then, for normalized ‖z‖ ≤ 1 and based on Eq. (16),

‖b_1 − b_2‖ ≤ 2.  (25)

Now we use a triangle inequality:

‖b_1‖ − 2 ≤ ‖b_2‖ ≤ ‖b_1‖ + 2.  (26)

Therefore, combining Eqs. (23) and (26), we obtain

Pr(w′|D)/Pr(w′|D′) ≤ e^ε.  (27)

So our algorithm Objective-FWELL is ε-differentially private in terms of Definition 1. □
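In the same spirit, a sketch of Objective-FWELL, reusing mapped_point and sample_noise from the earlier sketches (n_epochs is again our addition). The closing loop is a quick numeric sanity check, also our addition, of the bound ‖∇L(w^T z)‖ ≤ ‖z‖ ≤ 1 from Eq. (16) that yields Eq. (25) in the proof.

import numpy as np

def objective_fwell(X, y, lam, eps, n_epochs=1, rng=None):
    # Algorithm 3 (Objective-FWELL): minimize the perturbed objective, Eq. (22).
    rng = rng or np.random.default_rng()
    n, d = X.shape
    w = np.ones(d)                                   # Step 2
    b = sample_noise(d, beta=eps / 2.0, rng=rng)     # Step 3: v(b) ~ exp(-(eps/2)||b||)
    for _ in range(n_epochs):
        for i in range(n):                           # Step 4
            z = mapped_point(X, y, i)
            grad = -z / (n * (1.0 + np.exp(w @ z))) + 2.0 * lam * w + b / n  # (c)
            norm = np.linalg.norm(grad)
            if norm > 0.0:
                w = w - grad / norm                  # (d)
    return w                                         # Step 5: w' = w

# Numeric sanity check of Eq. (16): for ||z|| <= 1, ||grad L(w^T z)|| <= 1.
rng = np.random.default_rng(0)
for _ in range(1000):
    w, z = rng.standard_normal(20), rng.standard_normal(20)
    z /= max(1.0, np.linalg.norm(z))                 # normalize so ||z|| <= 1
    assert np.linalg.norm(-z / (1.0 + np.exp(w @ z))) <= 1.0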
Fig. 1. Experimental results on the 3NN classifier for different privacy degrees. [Six panels — BASEHOCK, Breast, Sonar, Soybean, Waveform, WDBC — plot classification accuracy against the privacy parameter log(ε) for Objective-FWELL, Output-FWELL and FWELL.]
Fig. 2. Experimental results on the SVM classifier for different privacy degrees. [Same layout as Fig. 1: classification accuracy versus log(ε) on the six data sets for Objective-FWELL, Output-FWELL and FWELL.]
Table 2. F-Measure values for different data sets on the 3NN classifier.

Data sets    Output-FWELL    Objective-FWELL
BASEHOCK     0.54            0.53
Breast       0.56            0.56
Soybean      0.57            0.58
Sonar        0.54            0.51
Waveform     0.54            0.55
WDBC         0.57            0.57

Table 3. F-Measure values for different data sets on the SVM classifier.

Data sets    Output-FWELL    Objective-FWELL
BASEHOCK     0.56            0.56
Breast       0.56            0.56
Soybean      0.57            0.57
Sonar        0.51            0.51
Waveform     0.55            0.54
WDBC         0.57            0.57
Fig. 3. Experimental results on the 3NN classifier for different numbers of selected features. [Six panels — BASEHOCK, Breast, Sonar, Soybean, Waveform, WDBC — plot classification accuracy against the number of selected features for Objective-FWELL, Output-FWELL and FWELL.]
6. Experiments

In this section, we report experimental results for the proposed differentially private feature selection algorithms Output-FWELL and Objective-FWELL. Since there exist few privacy-preserving feature selection algorithms, we only show results for the proposed algorithms. The experiments consist of three parts: the first validates the effect of the privacy parameter ε, the second evaluates the trade-off between privacy preservation and classification accuracy, and the third shows the classification performance of different selected feature subsets under a given privacy degree. The classifiers used in the experiments are a linear support vector machine (SVM) with C = 1 and the 3-nearest neighbor classifier (3NN). Classification accuracy is assessed using 10-fold cross-validation. For each fold, the proposed FWELL, Output-FWELL and Objective-FWELL algorithms are applied to the training part of the data to obtain the feature weights, the features are ranked in descending order of weight, and classifiers are built on increasing numbers of the top-ranked features. In all experiments, the parameter λ of our methods is tuned by cross-validation. Six benchmark data sets from the UCI repository (http://archive.ics.uci.edu/ml/datasets.html) are used; they are summarized in Table 1.

Table 1. Description of experimental data sets.

Data sets    No. samples    No. features
BASEHOCK     1993           4863
Breast       286            9
Soybean      307            34
Sonar        208            60
Waveform     3343           20
WDBC         569            30
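A sketch of this evaluation protocol under the stated setup (10-fold cross-validation, features ranked by weight, 3NN and linear SVM with C = 1); scikit-learn is our choice of tooling, and weight_fn stands for any of the three weighting algorithms sketched earlier.

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

def evaluate(X, y, weight_fn, k):
    # Rank features by learned weight, keep the top k, and report the
    # 10-fold cross-validated accuracy of 3NN and linear SVM (C = 1).
    accs = {"3NN": [], "SVM": []}
    for tr, te in StratifiedKFold(n_splits=10, shuffle=True).split(X, y):
        w = weight_fn(X[tr], y[tr])              # FWELL / Output- / Objective-FWELL
        top = np.argsort(w)[::-1][:k]            # descending weight order
        for name, clf in (("3NN", KNeighborsClassifier(n_neighbors=3)),
                          ("SVM", LinearSVC(C=1.0))):
            clf.fit(X[tr][:, top], y[tr])
            accs[name].append(clf.score(X[te][:, top], y[te]))
    return {name: float(np.mean(v)) for name, v in accs.items()}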
Fig. 4. Experimental results on the SVM classifier for different numbers of selected features. [Same layout as Fig. 3: classification accuracy versus the number of selected features on the six data sets for Objective-FWELL, Output-FWELL and FWELL.]
6.1. Experiments for parameter ε

In the first part of our experiments, we study the privacy-preserving degree ε. The number of selected features used to train the classifiers is 10 percent of the original feature dimension. The privacy degree of Output-FWELL and Objective-FWELL is quantified by the value of ε: increasing ε permits a larger change in the adversary's belief when one entry in D changes, and thus corresponds to weaker privacy preservation. The experimental results on the six data sets are shown in Fig. 1 for the 3NN classifier and in Fig. 2 for the SVM. The X-axis is the log value of the privacy parameter ε and the Y-axis is the classification accuracy of the respective classifier. From the results, we observe that in the case of the weakest privacy preservation considered, i.e., ε = 1, Output-FWELL and Objective-FWELL consistently obtain classification accuracy similar to FWELL's. The performance of Output-FWELL and Objective-FWELL drops as ε declines, i.e., as the degree of privacy preservation increases. These results are consistent with intuition.
6.2. Evaluation of the trade-off between privacy degree and accuracy

In the second part of our experiments, we study the trade-off between the privacy-preserving degree and the classification accuracy using the F-Measure, which was used in [17] to measure the trade-off between stability and classification accuracy of a feature selection algorithm. Specifically, we define F-Measure = 2 · privacy · accuracy / (privacy + accuracy), where privacy is represented by the normalized absolute log value of the selected privacy parameter ε and accuracy is the classification accuracy obtained with the selected features. For instance, privacy 0.5 and accuracy 0.9 give an F-Measure of 2 · 0.5 · 0.9 / 1.4 ≈ 0.64. The number of selected features used to train the classifiers is again 10 percent of the original feature dimension. The average F-Measure values over the different privacy degrees on the six data sets are shown in Tables 2 and 3. The results show that Output-FWELL and Objective-FWELL exhibit a similar trade-off between privacy degree and classification accuracy across classifiers and data sets.

6.3. Experiments for differentially private feature selection

In the third part of our experiments, we study the classification accuracy when the classifiers are trained with different numbers of features taken from the ranked feature set, with the privacy parameter of Output-FWELL and Objective-FWELL held constant at ε = 0.01. The results for FWELL, Output-FWELL and Objective-FWELL on the six data sets are shown in Fig. 3 for the 3NN classifier and in Fig. 4 for the SVM classifier. The X-axis is the number of selected features and the Y-axis is the classification accuracy of the respective classifier. We observe that the performance of the privacy-preserving algorithms Output-FWELL and Objective-FWELL is very close to that of the non-private algorithm FWELL in most cases. Our differentially private feature selection algorithms therefore achieve classification performance close to that of the non-private algorithm under a given privacy constraint such as ε = 0.01, which means Output-FWELL and Objective-FWELL are effective and efficient. Furthermore, we observe that under the same privacy constraint the performance of Objective-FWELL is closer to FWELL's than that of Output-FWELL in most cases.

7. Conclusions

In this paper, we study the problem of privacy-preserving feature selection; two privacy-preserving feature selection algorithms based on local learning and differential privacy are proposed and analyzed theoretically. We also conduct experiments to validate their performance on various benchmark data sets and classifiers, reporting results for different privacy-preserving degrees ε and for different numbers of selected features under a privacy constraint. Our experimental and theoretical results indicate that, in general, the proposed Output-FWELL and Objective-FWELL obtain high performance under a given privacy constraint and preserve data privacy to some extent. In this paper we aim to produce feature selection algorithms that preserve the privacy of individual entries in the training data set, not to choose feature subsets with low privacy leakage and high classification performance; the latter requires prior or domain knowledge to determine a privacy degree for each feature. Combining the two is left as future work.

Acknowledgment

This research was partially supported by the National Natural Science Foundation of China (NSFC 61073114, 61300165 and 61300164), the Natural Science Foundation of Jiangsu Province (BK20131378, BK20140885), the Post-doctoral Foundation of Jiangsu Province (1401045C) and the Foundation of the Key Laboratory of Cloud Computing & Complex System (15206).
Appendix A. Supplementary data

Supplementary data associated with this paper can be found in the online version at http://dx.doi.org/10.1016/j.neucom.2015.10.038.
References

[1] H. Liu, L. Yu, Toward integrating feature selection algorithms for classification and clustering, IEEE Trans. Knowl. Data Eng. 17 (4) (2005) 491–502.
[2] M. Dash, H. Liu, Feature selection for classification, Intell. Data Anal. 1 (1997) 131–156.
[3] Z. Zhao, Spectral Feature Selection for Mining Ultrahigh Dimensional Data (Ph.D. dissertation), Arizona State University, 2010.
[4] K. Chaudhuri, C. Monteleoni, Privacy-preserving logistic regression, in: Advances in Neural Information Processing Systems, 2008, pp. 289–296.
[5] K. Chaudhuri, C. Monteleoni, A.D. Sarwate, Differentially private empirical risk minimization, J. Mach. Learn. Res. 12 (2011) 1069–1109.
[6] D. Kifer, A. Smith, A. Thakurta, Private convex empirical risk minimization and high-dimensional regression, J. Mach. Learn. Res. 23 (2012) 1–40.
[7] P. Jain, A. Thakurta, (Near) dimension independent risk bounds for differentially private learning, in: Proceedings of the 31st International Conference on Machine Learning, 2014, pp. 1–9.
[8] C. Dwork, Differential privacy, in: International Colloquium on Automata, Languages and Programming, 2006, pp. 1–12.
[9] J. Yang, Y. Li, Differentially private feature selection, in: International Joint Conference on Neural Networks, 2014, pp. 4182–4189.
[10] Y.J. Sun, S. Todorovic, S. Goodison, Local learning based feature selection for high dimensional data analysis, IEEE Trans. Pattern Anal. Mach. Intell. (2010) 1610–1626.
[11] M.K. Tan, I.W. Tsang, L. Wang, Minimax sparse logistic regression for very high dimensional feature selection, IEEE Trans. Neural Netw. Learn. Syst. 24 (2013) 1609–1622.
[12] K. Crammer, R. Gilad-Bachrach, A. Navot, N. Tishby, Margin analysis of the LVQ algorithm, in: Advances in Neural Information Processing Systems, La Jolla, CA, 2002.
[13] R.E. Schapire, Y. Freund, P. Bartlett, W.S. Lee, Boosting the margin: a new explanation for the effectiveness of voting methods, Ann. Stat. 26 (1998) 1651–1686.
[14] A.Y. Ng, Feature selection, L1 vs. L2 regularization, and rotational invariance, in: Proceedings of the International Conference on Machine Learning, Banff, Canada, 2004.
[15] S.R. Ganta, S.P. Kasiviswanathan, A. Smith, Composition attacks and auxiliary information in data privacy, in: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 265–273.
[16] C. Dwork, F. McSherry, K. Nissim, A. Smith, Calibrating noise to sensitivity in private data analysis, in: Proceedings of the Third Conference on Theory of Cryptography, 2006, pp. 265–284. [17] Y. Saeys, T. Abeel, Y.V. de Peer, Robust feature selection using ensemble feature selection techniques, in: Proceedings of the European Conference on Machine Learning, 2008, pp. 313–325.
Yun Li received the Ph.D. degree in Computer Science from Chongqing University, Chongqing, China. He is a Professor in the College of Computer Science, Nanjing University of Posts and Telecommunications, China. Prior to that, he was a postdoctoral fellow in the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China. His research mainly focuses on machine learning, data mining and parallel computing.
Jun Yang is a Master's degree student in Computer Science, Nanjing University of Posts and Telecommunications, China. His research interest is in machine learning.
Wei Ji received the Ph.D. degree in Communication and Information Systems from Shanghai Jiao Tong University, China, in 2009. She is an Associate Professor in the College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, China. Her research interests are in machine learning and its applications in signal processing.