Neurocomputing 174 (2016) 1107–1115
Local learning-based feature weighting with privacy preservation

Yun Li a,*, Jun Yang a, Wei Ji b,c

a College of Computer Science and Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing University of Posts and Telecommunications, Nanjing, China
b College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China
c Key Laboratory of Cloud Computing & Complex System, Guilin University of Electronic Technology, Guilin, China
Article history: Received 4 March 2015; received in revised form 14 August 2015; accepted 12 October 2015; available online 20 October 2015. Communicated by Feiping Nie.

Keywords: Local learning; Feature weighting; Privacy preservation

Abstract

Privacy-preserving data analysis has gained significant interest across several research communities, with current research focusing mainly on privacy-preserving classification and regression. Feature selection is likewise one of the key problems in data mining and machine learning, yet few papers address privacy-preserving feature selection. In this paper, a local learning-based feature weighting framework is introduced. In order to preserve data privacy during local learning-based feature selection, the objective perturbation and output perturbation strategies are used to produce local learning-based feature selection algorithms with privacy preservation, and we analyze their privacy-preserving properties in depth under the differential privacy model. Experiments are conducted on benchmark data sets. The results show that our algorithms preserve data privacy to some extent, and that objective perturbation consistently achieves higher classification performance than output perturbation at the same privacy-preserving degree.
1. Introduction

Feature selection is one of the key problems in machine learning and data mining [1,2]; it brings the immediate benefits of speeding up a machine learning or data mining algorithm, improving learning accuracy, and enhancing model comprehensibility. Various studies show that features can be removed without performance deterioration [3]. Roughly speaking, a feature selection algorithm involves two important aspects: the search strategy and the evaluation criterion. According to the criterion, algorithms can be categorized into filter, wrapper and embedded models [1,2]. If the categorization is instead based on output style, feature selection algorithms can be divided into feature weighting/ranking algorithms and subset selection algorithms [3]. The output of the feature selection algorithms discussed in this paper is a feature weighting. A comprehensive survey of existing feature selection techniques and a general framework for their unification can be found in [1–3]. Current feature selection research focuses on the classification accuracy and stability of the selected features; however, the privacy preservation property is also very important for feature selection. Privacy preservation means that the selected features do not leak private information from the data, where private information is sensitive information that the data owner is reluctant to disclose. Private
information has been a growing concern in medical records, financial records, web search histories, social network data, etc. Privacy-preserving classification and regression [4–7] have been analyzed in depth, but privacy-preserving feature selection algorithms remain very rare. In this paper, we present work on privacy-preserving feature selection. Concretely, two strategies, output perturbation and objective perturbation, are adopted to add a privacy-preserving property to a local learning-based feature selection algorithm, and ε-differential privacy [8] is chosen as the privacy model. For the local learning-based feature selection, the logistic loss with an L2 regularizer is used to design the evaluation criterion. This paper extends our previous work [9] and is organized as follows. The feature weighting algorithm based on local learning, FWELL, is introduced in Section 2. Section 3 presents the privacy model. Section 4 describes the differentially private feature selection algorithm based on output perturbation, Output-FWELL, and Section 5 the one based on objective perturbation, Objective-FWELL. Experimental results on benchmark data sets are shown in Section 6, and the paper concludes in Section 7.
2. Feature weighting algorithm based on local learning

For feature weighting, we are given a training set D of n samples, D = {X, Y} = {x_i, y_i}_{i=1}^n, where x_i = (x_{i1}, x_{i2}, …, x_{id}) ∈ R^d is the input of the ith training sample and y_i is the corresponding label.
Based on local learning, a sample x_i should be close to its nearest neighbor with the same label (the near hit, NH(x_i)) and far from its nearest neighbor with a different label (the near miss, NM(x_i)) [10]. For the purposes of this paper, we use the Manhattan distance to find the nearest neighbors NH(x_i) and NM(x_i) and to define their closeness, although other standard distance definitions may also be used. The logistic regression loss is adopted to model the fit of the data for its simplicity and effectiveness; in addition, the logistic loss is twice differentiable and strongly convex, which allows faster optimization [11]. For any sample x_i, the logistic loss function is defined as

L(w^T z_i) = log(1 + exp(−w^T z_i)).  (1)
In Eq. (1), T denotes the transpose, w is the feature weight vector, z_i = |x_i − NM(x_i)| − |x_i − NH(x_i)|, and |·| is the element-wise absolute value operator. z_i can be considered the mapped point of x_i. The quantity w^T z_i is the local margin for x_i, which is a hypothesis margin [12]: intuitively, it measures how much the features of x_i can be corrupted by noise (or how much x_i can "move" in feature space) before x_i is misclassified [10]. In other words, feature weighting based on local learning scales each feature so as to obtain a weighted feature space, parameterized by a vector w, in which a local margin-based loss function is minimized. By large margin theory [13], a classifier trained in a weighted feature space that minimizes a margin-based loss function usually generalizes well on unseen test data. Moreover, regularization is used to prevent overfitting. Thus, the evaluation criterion for feature weighting on the training set D is defined as

L(w, D) = (1/n) Σ_{i=1}^n L(w^T z_i) + λ R(w),  (2)
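As a concrete illustration of this mapping, the following minimal NumPy sketch computes the near hit, the near miss and the mapped point z_i under the Manhattan distance, and evaluates the logistic loss of the local margin. The function names and the conventions (a sample matrix X, a label vector y) are our own assumptions, not the authors' code.

import numpy as np

def nearest_hit_miss(X, y, i):
    # Near hit NH(x_i) and near miss NM(x_i) under the Manhattan (L1) distance.
    dists = np.abs(X - X[i]).sum(axis=1)
    dists[i] = np.inf                                # exclude x_i itself
    same = (y == y[i])
    nh = X[np.where(same, dists, np.inf).argmin()]   # nearest same-label sample
    nm = X[np.where(~same, dists, np.inf).argmin()]  # nearest other-label sample
    return nh, nm

def mapped_point(X, y, i):
    # z_i = |x_i - NM(x_i)| - |x_i - NH(x_i)|, element-wise absolute values.
    nh, nm = nearest_hit_miss(X, y, i)
    return np.abs(X[i] - nm) - np.abs(X[i] - nh)

def logistic_loss(margin):
    # Eq. (1): L(w^T z) = log(1 + exp(-w^T z)), computed stably.
    return np.logaddexp(0.0, -margin)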
where λ is a cost parameter balancing the importance of the two terms and R(w) in (2) is a regularization term. Feature selection then aims to find the target model w that minimizes the loss function in Eq. (2), which yields the local learning-based feature selection method shown in Algorithm 1. Note that, as an example, gradient descent is used to illustrate the minimization of the evaluation function in Eq. (2); the optimal feature weights can of course be found by many other optimization approaches.

Algorithm 1. Feature WEighting algorithm based on Local Learning (FWELL).

Step 1. Input: training data set D = {x_i, y_i}_{i=1}^n with x_i ∈ R^d, and regularization parameter λ in Eq. (2).
Step 2. Initialize w = (1, 1, …, 1) ∈ R^d.
Step 3. For i = 1, 2, …, n:
(a) Given x_i, find NH(x_i) and NM(x_i).
(b) Use Eq. (1) to obtain L(w^T z_i).
(c) ∇ = (1/n) ∂L(w^T z_i)/∂w + λ ∂R(w)/∂w.
(d) w = w − ∇/‖∇‖_2.
Step 4. Output the feature weighting vector w.

In the following analysis and experiments, the L2 regularizer is used as R(w) in Eq. (2) for its rotational invariance and strong stability [14]. The concrete evaluation criterion considered in this paper is therefore

L(w, D) = (1/n) Σ_{i=1}^n L(w^T z_i) + λ‖w‖²,  (3)

and gradient descent is used to minimize the evaluation function (3) to obtain the feature weights, as described in Algorithm 1; the resulting algorithm is named FWELL.
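A minimal sketch of Algorithm 1 follows, reusing mapped_point from the sketch above. The per-sample normalized step mirrors Steps 3(c)–(d); the n_epochs knob is our addition, since the pseudocode describes a single sweep over the samples.

import numpy as np

def fwell(X, y, lam, n_epochs=1):
    # Algorithm 1 (FWELL): normalized gradient steps on Eq. (3).
    n, d = X.shape
    w = np.ones(d)                                   # Step 2
    for _ in range(n_epochs):
        for i in range(n):                           # Step 3
            z = mapped_point(X, y, i)                # (a): z_i from NH/NM
            # (c): dL/dw = -z / (1 + exp(w^T z)); d(lam * ||w||^2)/dw = 2 * lam * w
            grad = -z / (n * (1.0 + np.exp(w @ z))) + 2.0 * lam * w
            norm = np.linalg.norm(grad)
            if norm > 0.0:
                w = w - grad / norm                  # (d): unit-norm step
    return w                                         # Step 4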
3. Privacy model

As the privacy measure, we adopt the ε-differential privacy model [8], which quantifies the privacy risk associated with computing functions of sensitive data. A statistical procedure satisfies ε-differential privacy if changing a single data point does not shift the output distribution by too much; therefore, it is difficult to infer the value of any particular data point from the output of the algorithm [5]. The ε-differential privacy model is robust to known attacks, such as those involving side information [15]. It is a strong, cryptographically motivated definition of privacy that has recently received a significant amount of research attention, for example in differentially private empirical risk minimization for classification and regression [4–7].

Definition 1. A randomized mechanism A provides ε-differential privacy if, for all data sets D and D′ that differ in at most one element, and for all output subsets S ⊆ Range(A),

Pr[A(D) ∈ S] ≤ exp(ε) · Pr[A(D′) ∈ S].  (4)

The probability Pr is taken over the coin tosses of A, and Range(A) denotes the output range of A. The privacy parameter ε measures the disclosure. When two data sets that are identical except for a single entry are input to the algorithm A, the two distributions on the algorithm's output are close; that is, no single entry of the data set noticeably affects the output. This means that an adversary who knows all but one entry of the data set cannot gain much additional information about that entry by observing the output of the algorithm, so the privacy of this entry is preserved. In other words, let D = {(x_1, y_1), …, (x_n, y_n)} and D′ = {(x_1, y_1), …, (x_n′, y_n′)} be two data sets that differ in the value of the nth individual. The two distributions on the differentially private algorithm A's output are close, so an adversary who knows all but the nth entry cannot gain much additional information about it by observing the output.

4. Differentially private feature selection based on output perturbation

4.1. Sensitivity analysis

To obtain privacy-preserving versions of FWELL in terms of the differential privacy definition in Eq. (4), we adopt the output perturbation and objective perturbation strategies. In this section we present the differentially private FWELL with output perturbation. This algorithm depends on FWELL's sensitivity, which is generally defined as follows [5,16].

Definition 2. For any function A with n inputs, the sensitivity ΔQ is the maximum, over all inputs, of the difference in the value of A when one input of A is changed. That is,

ΔQ = max_{D, D′} ‖A(D) − A(D′)‖,  (5)

where the data sets D and D′ differ in at most one element. According to Definition 2, we can analyze the sensitivity of FWELL with the L2 regularizer and obtain Corollary 1.

Corollary 1. The feature weighting algorithm described in Algorithm 1 (FWELL) with the L2 regularizer has sensitivity 2/(λn).

Proof. Let D = {(x_1, y_1), …, (x_n, y_n)} and D′ = {(x_1, y_1), …, (x_n′, y_n′)} be two data sets that differ in the value of the nth individual. Suppose w_1 and w_2 are the solutions to FWELL when the
data sets are D and D′:

w_1 = argmin_w L(w, D),  w_2 = argmin_w L(w, D′).  (6)

Then, according to Definition 2, the sensitivity of FWELL is the upper bound of ‖w_1 − w_2‖. We define a function ℓ(w) as

ℓ(w) = L(w, D′) − L(w, D).  (7)

Because the logistic loss and the L2 regularizer are adopted, the functions L and ℓ are continuous and differentiable. Since w_1 and w_2 are the minimizers of L(w, D) and L(w, D′), respectively, and they are obtained through gradient descent, their gradients are equal to zero, i.e., ∇L(w_1, D) = 0 and ∇L(w_2, D′) = 0. Based on Eq. (7), we obtain ℓ(w_2) = L(w_2, D′) − L(w_2, D), and hence

∇L(w_2, D) + ∇ℓ(w_2) = 0.  (8)

Since the logistic loss and the L2 regularizer are 1-strongly convex, L is λ-strongly convex [5]. We obtain

(∇L(w_1, D) − ∇L(w_2, D))^T (w_1 − w_2) ≥ λ‖w_1 − w_2‖².  (9)

Meanwhile, based on Eq. (8) and ∇L(w_1, D) = 0,

(∇L(w_1, D) − ∇L(w_2, D))^T (w_1 − w_2) = (w_1 − w_2)(∇ℓ(w_2))^T.  (10)

According to the Cauchy–Schwartz inequality,

‖w_1 − w_2‖ · ‖∇ℓ(w_2)‖ ≥ (w_1 − w_2)(∇ℓ(w_2))^T.  (11)

Combining Eqs. (9)–(11),

‖w_1 − w_2‖ · ‖∇ℓ(w_2)‖ ≥ λ‖w_1 − w_2‖².  (12)

Namely,

‖w_1 − w_2‖ ≤ (1/λ) ‖∇ℓ(w_2)‖.  (13)

Suppose that z_n and z_n′ only depend on x_n and x_n′, respectively; then, based on Eqs. (3) and (7), for any w we can approximately obtain

ℓ(w) = (1/n)(L(w^T z_n′) − L(w^T z_n)).  (14)

According to Eq. (1), for any point z,

L(w^T z) = log(1 + exp(−w^T z)).  (15)

Then, for normalized ‖z‖ ≤ 1,

‖∇L(w^T z)‖ = ‖z‖/(1 + exp(w^T z)) ≤ ‖z‖ ≤ 1.  (16)

Based on Eqs. (14) and (16), we achieve

‖∇ℓ(w)‖ = (1/n)‖∇L(w^T z_n′) − ∇L(w^T z_n)‖ ≤ (1/n)(‖∇L(w^T z_n′)‖ + ‖∇L(w^T z_n)‖) ≤ 2/n.  (17)

So ‖∇ℓ(w_2)‖ is at most 2/n and, based on Eq. (13), we obtain

‖w_1 − w_2‖ ≤ 2/(λn).  (18)  □

4.2. Differentially private feature selection Output-FWELL

We are now in a position to present the differentially private feature selection algorithm for FWELL based on output perturbation, which is named Output-FWELL and described in Algorithm 2.

Algorithm 2. Differentially private feature selection Output-FWELL.

Step 1. Input: data set D = {x_i, y_i}_{i=1}^n, regularization parameter λ in Eq. (2), privacy parameter ε and normalizing parameter a.
Step 2. Obtain w according to Algorithm 1.
Step 3. Draw a random noise vector b with density v(b) = (1/a) e^{−(nελ/2)‖b‖}, based on the sensitivity analysis in Corollary 1.
Step 4. Compute w′ = w + b.
Step 5. Output w′.

For Output-FWELL, the following Theorem 1 is obtained.

Theorem 1. Output-FWELL is ε-differentially private.

Proof. Let D and D′ be any two data sets that differ in one individual. For w′ derived from Output-FWELL on D and D′, we have w′ = w_1 + b_1 and w′ = w_2 + b_2, where w_1 and w_2 are the unique outputs of FWELL on D and D′, respectively, and b_1 and b_2 are the corresponding noise vectors in Output-FWELL for D and D′. Then

Pr(w′|D)/Pr(w′|D′) = v(b_1)/v(b_2) = e^{(nελ/2)(‖b_2‖ − ‖b_1‖)},  (19)

where Pr(w′|D) (resp. Pr(w′|D′)) is the probability that Output-FWELL outputs w′ when the input is the data set D (resp. D′). Since w_1 + b_1 = w_2 + b_2 implies b_2 − b_1 = w_1 − w_2, the triangle inequality gives

‖b_2‖ − ‖b_1‖ ≤ ‖b_2 − b_1‖ = ‖w_1 − w_2‖.  (20)

Combining Eqs. (18)–(20), we achieve

Pr(w′|D)/Pr(w′|D′) ≤ e^ε.  (21)

Therefore, Output-FWELL provides ε-differential privacy in terms of Definition 1. □
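A sketch of Output-FWELL under the stated assumptions follows, reusing fwell from above. To draw b with density v(b) ∝ exp(−(nελ/2)‖b‖), we use the standard decomposition into a uniformly random direction and a Gamma(d, 2/(nελ))-distributed norm; sample_noise and the default generator are our choices, not the paper's.

import numpy as np

def sample_noise(d, beta, rng):
    # Draw b in R^d with density proportional to exp(-beta * ||b||):
    # uniform direction, norm distributed as Gamma(shape=d, scale=1/beta).
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    return rng.gamma(shape=d, scale=1.0 / beta) * direction

def output_fwell(X, y, lam, eps, rng=None):
    # Algorithm 2 (Output-FWELL): run FWELL, then perturb its output.
    rng = rng or np.random.default_rng()
    n, d = X.shape
    w = fwell(X, y, lam)                                     # Step 2
    b = sample_noise(d, beta=n * eps * lam / 2.0, rng=rng)   # Step 3 (Corollary 1)
    return w + b                                             # Steps 4-5: w' = w + b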
5. Differentially private feature selection based on objective perturbation

Furthermore, we propose another privacy-preserving feature selection algorithm in which the noise is added to the objective function of FWELL. That is, we minimize

L(w, D) = (1/n) Σ_{i=1}^n L(w^T z_i) + λ‖w‖² + (1/n) b^T w  (22)

to obtain the privacy-preserving feature selection algorithm Objective-FWELL, described in Algorithm 3. Note that, as an example, gradient descent is again used to illustrate the minimization of the evaluation function in Eq. (22).

Algorithm 3. Differentially private feature selection Objective-FWELL.

Step 1. Input: data set D = {x_i, y_i}_{i=1}^n, regularization parameter λ in Eq. (2) and privacy degree ε.
Step 2. Initialize w = (1, 1, …, 1) ∈ R^d.
Step 3. Draw a random noise vector b with density v(b) = e^{−(ε/2)‖b‖}.
Step 4. For i = 1, 2, …, n:
(a) Given x_i, find NH(x_i) and NM(x_i).
(b) Use Eq. (1) to obtain L(w^T z_i).
(c) ∇ = (1/n) ∂L(w^T z_i)/∂w + 2λw + b/n.
(d) w = w − ∇/‖∇‖_2.
Step 5. Output the feature weighting vector w′ = w.

For Objective-FWELL, we also obtain Theorem 2.

Theorem 2. Objective-FWELL is ε-differentially private.

Proof. Let D = {(x_1, y_1), …, (x_n, y_n)} and D′ = {(x_1, y_1), …, (x_n′, y_n′)} be two data sets that differ in the value of the nth individual. For w′ derived from Objective-FWELL on D and D′, there exist unique noise vectors b_1 and b_2 on D and D′, respectively, that map the inputs to the output w′. This uniqueness holds because both the regularization function and the loss function are differentiable everywhere. Then

Pr(w′|D)/Pr(w′|D′) = v(b_1)/v(b_2) = e^{(ε/2)(‖b_2‖ − ‖b_1‖)}.  (23)

In our case, L(w^T z_i) is defined in Eq. (1). Since w′ minimizes the evaluation function (22) on D and on D′, the derivatives of both objective functions at w′ are 0, i.e., ∇L(w′, D) = 0 and ∇L(w′, D′) = 0. This implies that

b_1 − b_2 = z_n/(1 + exp(w′^T z_n)) − z_n′/(1 + exp(w′^T z_n′)).  (24)

Then, for normalized ‖z‖ ≤ 1 and based on Eq. (16),

‖b_1 − b_2‖ ≤ 2.  (25)

Now we use a triangle inequality:

‖b_1‖ − 2 ≤ ‖b_2‖ ≤ ‖b_1‖ + 2.  (26)

Therefore, combining Eqs. (23) and (26), we obtain

Pr(w′|D)/Pr(w′|D′) ≤ e^ε.  (27)

So our algorithm Objective-FWELL is ε-differentially private in terms of Definition 1. □
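In the same spirit, a sketch of Objective-FWELL, reusing mapped_point and sample_noise from the earlier sketches (n_epochs is again our addition). The closing loop is a quick numeric sanity check, also our addition, of the bound ‖∇L(w^T z)‖ ≤ ‖z‖ ≤ 1 from Eq. (16) that yields Eq. (25) in the proof.

import numpy as np

def objective_fwell(X, y, lam, eps, n_epochs=1, rng=None):
    # Algorithm 3 (Objective-FWELL): minimize the perturbed objective, Eq. (22).
    rng = rng or np.random.default_rng()
    n, d = X.shape
    w = np.ones(d)                                   # Step 2
    b = sample_noise(d, beta=eps / 2.0, rng=rng)     # Step 3: v(b) ~ exp(-(eps/2)||b||)
    for _ in range(n_epochs):
        for i in range(n):                           # Step 4
            z = mapped_point(X, y, i)
            grad = -z / (n * (1.0 + np.exp(w @ z))) + 2.0 * lam * w + b / n  # (c)
            norm = np.linalg.norm(grad)
            if norm > 0.0:
                w = w - grad / norm                  # (d)
    return w                                         # Step 5: w' = w

# Numeric sanity check of Eq. (16): for ||z|| <= 1, ||grad L(w^T z)|| <= 1.
rng = np.random.default_rng(0)
for _ in range(1000):
    w, z = rng.standard_normal(20), rng.standard_normal(20)
    z /= max(1.0, np.linalg.norm(z))                 # normalize so ||z|| <= 1
    assert np.linalg.norm(-z / (1.0 + np.exp(w @ z))) <= 1.0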
Fig. 1. Experimental results on the 3NN classifier for different privacy degrees. [Six panels — BASEHOCK, Breast, Sonar, Soybean, Waveform, WDBC — plot classification accuracy against the privacy parameter log(ε) for Objective-FWELL, Output-FWELL and FWELL.]
Fig. 2. Experimental results on the SVM classifier for different privacy degrees. [Same layout as Fig. 1: classification accuracy versus log(ε) on the six data sets for Objective-FWELL, Output-FWELL and FWELL.]
Table 2. F-Measure values for different data sets on the 3NN classifier.

Data sets    Output-FWELL    Objective-FWELL
BASEHOCK     0.54            0.53
Breast       0.56            0.56
Soybean      0.57            0.58
Sonar        0.54            0.51
Waveform     0.54            0.55
WDBC         0.57            0.57

Table 3. F-Measure values for different data sets on the SVM classifier.

Data sets    Output-FWELL    Objective-FWELL
BASEHOCK     0.56            0.56
Breast       0.56            0.56
Soybean      0.57            0.57
Sonar        0.51            0.51
Waveform     0.55            0.54
WDBC         0.57            0.57
Fig. 3. Experimental results on the 3NN classifier for different numbers of selected features. [Six panels — BASEHOCK, Breast, Sonar, Soybean, Waveform, WDBC — plot classification accuracy against the number of selected features for Objective-FWELL, Output-FWELL and FWELL.]
6. Experiments

In this section, we report experimental results for the proposed differentially private feature selection algorithms Output-FWELL and Objective-FWELL. Since there exist few privacy-preserving feature selection algorithms, we only show results for the proposed algorithms. The experiments consist of three parts: the first validates the effect of the privacy parameter ε, the second evaluates the trade-off between privacy preservation and classification accuracy, and the third shows the classification performance of different selected feature subsets under a given privacy degree. The classifiers used in the experiments are a linear support vector machine (SVM) with C = 1 and the 3-nearest neighbor classifier (3NN). Classification accuracy is assessed using 10-fold cross-validation. For each fold, the proposed FWELL, Output-FWELL and Objective-FWELL algorithms are applied to the training part of the data to obtain the feature weights, the features are ranked in descending order of weight, and classifiers are built on increasing numbers of the top-ranked features. In all experiments, the parameter λ of our methods is tuned by cross-validation. Six benchmark data sets from the UCI repository (http://archive.ics.uci.edu/ml/datasets.html) are used; they are summarized in Table 1.

Table 1. Description of experimental data sets.

Data sets    No. samples    No. features
BASEHOCK     1993           4863
Breast       286            9
Soybean      307            34
Sonar        208            60
Waveform     3343           20
WDBC         569            30
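A sketch of this evaluation protocol under the stated setup (10-fold cross-validation, features ranked by weight, 3NN and linear SVM with C = 1); scikit-learn is our choice of tooling, and weight_fn stands for any of the three weighting algorithms sketched earlier.

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

def evaluate(X, y, weight_fn, k):
    # Rank features by learned weight, keep the top k, and report the
    # 10-fold cross-validated accuracy of 3NN and linear SVM (C = 1).
    accs = {"3NN": [], "SVM": []}
    for tr, te in StratifiedKFold(n_splits=10, shuffle=True).split(X, y):
        w = weight_fn(X[tr], y[tr])              # FWELL / Output- / Objective-FWELL
        top = np.argsort(w)[::-1][:k]            # descending weight order
        for name, clf in (("3NN", KNeighborsClassifier(n_neighbors=3)),
                          ("SVM", LinearSVC(C=1.0))):
            clf.fit(X[tr][:, top], y[tr])
            accs[name].append(clf.score(X[te][:, top], y[te]))
    return {name: float(np.mean(v)) for name, v in accs.items()}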
Fig. 4. Experimental results on the SVM classifier for different numbers of selected features. [Same layout as Fig. 3: classification accuracy versus the number of selected features on the six data sets for Objective-FWELL, Output-FWELL and FWELL.]
6.1. Experiments for parameter ε

In the first part of our experiments, we study the privacy-preserving degree ε. The number of selected features used to train the classifiers is 10 percent of the original feature dimension. The privacy degree of Output-FWELL and Objective-FWELL is quantified by the value of ε: increasing ε permits a larger change in the adversary's belief when one entry in D changes, and thus corresponds to weaker privacy preservation. The experimental results on the six data sets are shown in Fig. 1 for the 3NN classifier and in Fig. 2 for the SVM. The X-axis is the log value of the privacy parameter ε and the Y-axis is the classification accuracy of the respective classifier. From the results, we observe that in the case of the weakest privacy preservation considered, i.e., ε = 1, Output-FWELL and Objective-FWELL consistently obtain classification accuracy similar to FWELL's. The performance of Output-FWELL and Objective-FWELL drops as ε declines, i.e., as the degree of privacy preservation increases. These results are consistent with intuition.
6.2. Evaluation of the trade-off between privacy degree and accuracy

In the second part of our experiments, we study the trade-off between the privacy-preserving degree and the classification accuracy using the F-Measure, which was used in [17] to measure the trade-off between stability and classification accuracy of a feature selection algorithm. Specifically, we define F-Measure = 2 · privacy · accuracy / (privacy + accuracy), where privacy is represented by the normalized absolute log value of the selected privacy parameter ε and accuracy is the classification accuracy obtained with the selected features. For instance, privacy 0.5 and accuracy 0.9 give an F-Measure of 2 · 0.5 · 0.9 / 1.4 ≈ 0.64. The number of selected features used to train the classifiers is again 10 percent of the original feature dimension. The average F-Measure values over the different privacy degrees on the six data sets are shown in Tables 2 and 3. The results show that Output-FWELL and Objective-FWELL exhibit a similar trade-off between privacy degree and classification accuracy across classifiers and data sets.

6.3. Experiments for differentially private feature selection

In the third part of our experiments, we study the classification accuracy when the classifiers are trained with different numbers of features taken from the ranked feature set, with the privacy parameter of Output-FWELL and Objective-FWELL held constant at ε = 0.01. The results for FWELL, Output-FWELL and Objective-FWELL on the six data sets are shown in Fig. 3 for the 3NN classifier and in Fig. 4 for the SVM classifier. The X-axis is the number of selected features and the Y-axis is the classification accuracy of the respective classifier. We observe that the performance of the privacy-preserving algorithms Output-FWELL and Objective-FWELL is very close to that of the non-private algorithm FWELL in most cases. Our differentially private feature selection algorithms therefore achieve classification performance close to that of the non-private algorithm under a given privacy constraint such as ε = 0.01, which means Output-FWELL and Objective-FWELL are effective and efficient. Furthermore, we observe that under the same privacy constraint the performance of Objective-FWELL is closer to FWELL's than that of Output-FWELL in most cases.

7. Conclusions

In this paper, we study the problem of privacy-preserving feature selection; two privacy-preserving feature selection algorithms based on local learning and differential privacy are proposed and analyzed theoretically. We also conduct experiments to validate their performance on various benchmark data sets and classifiers, reporting results for different privacy-preserving degrees ε and for different numbers of selected features under a privacy constraint. Our experimental and theoretical results indicate that, in general, the proposed Output-FWELL and Objective-FWELL obtain high performance under a given privacy constraint and preserve data privacy to some extent. In this paper we aim to produce feature selection algorithms that preserve the privacy of individual entries in the training data set, not to choose feature subsets with low privacy leakage and high classification performance; the latter requires prior or domain knowledge to determine a privacy degree for each feature. Combining the two is left as future work.

Acknowledgment

This research was partially supported by the National Natural Science Foundation of China (NSFC 61073114, 61300165 and 61300164), the Natural Science Foundation of Jiangsu Province (BK20131378, BK20140885), the Post-doctoral Foundation of Jiangsu Province (1401045C) and the Foundation of the Key Laboratory of Cloud Computing & Complex System (15206).
Appendix A. Supplementary data

Supplementary data associated with this paper can be found in the online version at http://dx.doi.org/10.1016/j.neucom.2015.10.038.
References

[1] H. Liu, L. Yu, Toward integrating feature selection algorithms for classification and clustering, IEEE Trans. Knowl. Data Eng. 17 (4) (2005) 491–502.
[2] M. Dash, H. Liu, Feature selection for classification, Intell. Data Anal. 1 (1997) 131–156.
[3] Z. Zhao, Spectral Feature Selection for Mining Ultrahigh Dimensional Data (Ph.D. dissertation), Arizona State University, 2010.
[4] K. Chaudhuri, C. Monteleoni, Privacy-preserving logistic regression, in: Advances in Neural Information Processing Systems, 2008, pp. 289–296.
[5] K. Chaudhuri, C. Monteleoni, A.D. Sarwate, Differentially private empirical risk minimization, J. Mach. Learn. Res. 12 (2011) 1069–1109.
[6] D. Kifer, A. Smith, A. Thakurta, Private convex empirical risk minimization and high-dimensional regression, J. Mach. Learn. Res. 23 (2012) 1–40.
[7] P. Jain, A. Thakurta, (Near) dimension independent risk bounds for differentially private learning, in: Proceedings of the 31st International Conference on Machine Learning, 2014, pp. 1–9.
[8] C. Dwork, Differential privacy, in: International Colloquium on Automata, Languages and Programming, 2006, pp. 1–12.
[9] J. Yang, Y. Li, Differentially private feature selection, in: International Joint Conference on Neural Networks, 2014, pp. 4182–4189.
[10] Y.J. Sun, S. Todorovic, S. Goodison, Local learning based feature selection for high dimensional data analysis, IEEE Trans. Pattern Anal. Mach. Intell. (2010) 1610–1626.
[11] M.K. Tan, I.W. Tsang, L. Wang, Minimax sparse logistic regression for very high dimensional feature selection, IEEE Trans. Neural Netw. Learn. Syst. 24 (2013) 1609–1622.
[12] K. Crammer, R. Gilad-Bachrach, A. Navot, N. Tishby, Margin analysis of the LVQ algorithm, in: Advances in Neural Information Processing Systems, La Jolla, CA, 2002.
[13] R.E. Schapire, Y. Freund, P. Bartlett, W.S. Lee, Boosting the margin: a new explanation for the effectiveness of voting methods, Ann. Stat. 26 (1998) 1651–1686.
[14] A.Y. Ng, Feature selection, L1 vs. L2 regularization, and rotational invariance, in: Proceedings of the International Conference on Machine Learning, Banff, Canada, 2004.
[15] S.R. Ganta, S.P. Kasiviswanathan, A. Smith, Composition attacks and auxiliary information in data privacy, in: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 265–273.
[16] C. Dwork, F. McSherry, K. Nissim, A. Smith, Calibrating noise to sensitivity in private data analysis, in: Proceedings of the Third Conference on Theory of Cryptography, 2006, pp. 265–284. [17] Y. Saeys, T. Abeel, Y.V. de Peer, Robust feature selection using ensemble feature selection techniques, in: Proceedings of the European Conference on Machine Learning, 2008, pp. 313–325.
Yun Li received the Ph.D. degree in Computer Science from Chongqing University, Chongqing, China. He is a Professor in the College of Computer Science, Nanjing University of Posts and Telecommunications, China. Prior to that, he was a postdoctoral fellow in the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China. His research mainly focuses on machine learning, data mining and parallel computing.
Jun Yang is a Master's degree student in Computer Science, Nanjing University of Posts and Telecommunications, China. His research interest is in machine learning.
Wei Ji received the Ph.D. degree in Communication and Information Systems from Shanghai Jiao Tong University, China, in 2009. She is an Associate Professor in the College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, China. Her research interests are in machine learning and its applications in signal processing.