Accepted Manuscript

A new k-harmonic nearest neighbor classifier based on the multi-local means
Zhibin Pan, Yidi Wang, Weiping Ku

PII: S0957-4174(16)30515-2
DOI: 10.1016/j.eswa.2016.09.031
Reference: ESWA 10895

To appear in: Expert Systems With Applications

Received date: 22 June 2016
Revised date: 18 September 2016

Please cite this article as: Zhibin Pan, Yidi Wang, Weiping Ku, A new k-harmonic nearest neighbor classifier based on the multi-local means, Expert Systems With Applications (2016), doi: 10.1016/j.eswa.2016.09.031

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Highlights

• K different multi-local means based on the k nearest neighbors are employed.
• The harmonic mean distance is introduced to the KNN classification problem for the first time.
• The classification error rates can be significantly reduced.
• Less sensitive to the choice of the neighborhood size k.
• Easily designed in practice with only a little extra computational complexity.
A new k-harmonic nearest neighbor classifier based on the multi-local means

Zhibin Pan*, Yidi Wang, Weiping Ku

Affiliation address:
School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710049, P.R. China

E-mail addresses:
[email protected] (Zhibin Pan)
[email protected] (Yidi Wang)
[email protected] (Weiping Ku)

Complete postal address:
School of Electronic and Information Engineering
Xi’an Jiaotong University
No.28, Xianning West Road
Xi’an, 710049
P.R. China

Correspondence information:
Zhibin Pan
E-mail address: [email protected] (Zhibin Pan)
Telephone number: +86-2982665459 (Office) / +86-13201718936 (Mobile)
Complete postal address:
School of Electronic and Information Engineering
Xi’an Jiaotong University
No.28, Xianning West Road
Xi’an, 710049
P.R. China
September 18, 2016

List of Responses

Dear Editor and Reviewers:

Thank you very much for your notification letter and for the reviewers’ comments concerning our manuscript entitled “A new k-harmonic nearest neighbor classifier based on the multi-local means” (ESWA-D-16-02320R1). These comments are all very valuable and very helpful for revising and improving our paper. We have considered the comments carefully and have made corrections accordingly, which we hope meet the requirements of Expert Systems with Applications. The revised portions are marked in red in the paper. The corrections and the responses to the reviewers’ comments are as follows:
Reviewer #1:

1. Comment: The following is my second review of the paper titled "A new k-harmonic nearest neighbor classifier based on the multi-local means" (ESWA-D-16-02320R1). The authors have addressed my previous concerns. However, the language and grammar also require some work, and I noted a large number of typographical errors. The paper needs a linguistic check, preferably by a native speaker.

Response:

We thank the reviewer for his/her recognition. We have completed the manuscript revision and corrections, including English usage, and have had the manuscript proof-read by a native English-speaking colleague at University College London. The revised portions are marked in red in the paper. A letter from Mr. P.V. Amadori, who assisted us with the English editing and suggested extensive modifications to improve the clarity and readability of our manuscript, is attached below on Page 4.
Reviewer #2:

2. Comment: In the revised manuscript, much of the comments addressed in the earlier revision cycle are properly addressed. However, the paper still requires a language editing. I still realize several typos through the text (For instance, "In this paper, in order to design a local men-based nearest neighbor " (in page 8). So, please carefully check the paper once again.

Response: We thank the reviewer for his/her recognition. Considering the reviewer’s comment very carefully, we have checked the full manuscript once again for typos and grammatical errors in this revised version of the paper. For example, the error mentioned in the comment (on Page 8) has been corrected, and the sentence now reads "In this paper, in order to design a local mean-based nearest neighbor ".

Furthermore, to achieve better readability and clarity of the manuscript, we have had it linguistically checked and proof-read by a native English-speaking colleague, Mr. P.V. Amadori, at University College London. Following his extensive suggestions and modifications for the language editing of our manuscript, we have completed the revision, and the revised portions are marked in red in the paper. A letter from Mr. P.V. Amadori is also attached below on Page 4.

We earnestly appreciate the work of the editor and reviewers, and we hope that the corrections in this revised version will meet the requirements of Expert Systems with Applications. Once again, thank you very much for your comments and suggestions.

Sincerely yours,
Zhibin Pan, Yidi Wang and Weiping Ku
A new k-harmonic nearest neighbor classifier based on the multi-local means

Zhibin Pan*, Yidi Wang, Weiping Ku
School of Electronic and Information Engineering, Xi’an Jiaotong University
Xi’an, 710049, P.R. China
Abstract

The k-nearest neighbor (KNN) rule is a classical and yet very effective nonparametric technique in pattern classification, but its classification performance is severely affected by outliers. The local mean-based k-nearest neighbor classifier (LMKNN) was first introduced to achieve robustness against outliers by computing the local mean vector of the k nearest neighbors in each class. However, its performance suffers from the choice of a single value of k for each class and a uniform value of k across different classes. In this paper, we propose a new KNN-based classifier, called the multi-local means-based k-harmonic nearest neighbor (MLM-KHNN) rule. In our method, the k nearest neighbors in each class are first found and then used to compute k different local mean vectors, which are employed to compute their harmonic mean distance to the query sample. Finally, MLM-KHNN classifies the query sample into the class with the minimum harmonic mean distance. Experimental results on twenty real-world datasets from the UCI and KEEL repositories demonstrate that the proposed MLM-KHNN classifier achieves a lower classification error rate and is less sensitive to the parameter k than nine related competitive KNN-based classifiers, especially in small training sample size situations.

Key words: Local mean; k-nearest neighbor; Harmonic mean distance; Pattern classification; Small training sample size
1. Introduction
The k-nearest neighbor (KNN) rule (Cover & Hart, 1967) is one of the most famous classification techniques due to its simplicity, effectiveness and intuitiveness (Xia, Mita, & Shibata, 2016; Xu et al., 2013; Jiang, Pang, Wu, & Kuang, 2012). It assigns to the query sample the class label that appears most frequently among its k nearest neighbors through a majority vote. The KNN classifier has been widely studied and extensively applied in practice (Rodger, 2014, 2015), thanks to its several attractive properties. In fact, the KNN rule works as a nonparametric technique, which does not require a priori knowledge about the probability distribution of the classification problem (Li, Chen, & Chen, 2008). This property is particularly important for cases where Gaussian distributions of the samples are difficult to assume, as in small training sample size situations (Zhang, Yang, & Qian, 2012). Additionally, the classification performance of the KNN rule only relies on the distance metric used and on one parameter k (Garcia-Pedrajas, Del-Castillo, & Cerruela-Garcia, 2015; Hu, Yu, & Xie, 2008), which represents the neighborhood size of the query sample. Finally, it has been proven that the KNN rule can asymptotically approach the optimal classification performance achieved by the Bayes method under the constraint of k/N → 0 (Wagner, 1971), where N is the total number of training samples.

While the KNN rule has many significant advantages, some problems still exist. The first problem is that all the k nearest neighbors are considered equally when assigning a class label to the query sample via a simple majority vote. Obviously, this is not reasonable when the k nearest neighbors differ greatly in their distances to the query sample and some closer nearest neighbors seem to be more important (Bailey & Jain, 2010). In order to tackle this problem, several distance-weighted voting methods for the KNN rule have been developed (Mateos-García, García-Gutiérrez, & Riquelme-Santos, 2016; Gou, Xiong, & Kuang, 2011), where larger weights are given to the closer nearest neighbors. However, such an approach is not always correct, as some farther neighbors may be more important for classification. Accordingly, a number of adaptive metric nearest neighbor classifiers were developed (Yang, Wei, & Tao, 2013; Weinberger & Saul, 2009). The second problem is that the KNN rule cannot properly classify the query sample when its attribute data are similar to training samples from different classes (Liu, Pan, & Dezert, 2013). In fact, the numbers of nearest neighbors from different classes may be similar among the k nearest neighbors of the query, causing the KNN rule to incorrectly assign a class label. Accordingly, a number of fuzzy classifications and belief-based classifications have been derived in order to allow the queries to belong to different classes with masses of belief, hence reducing the classification errors (Liu, Pan, Dezert, & Mercier, 2014; Sarkar, 2007). The third problem is that the classification performance of the KNN rule severely relies on the distance function used to compute the distance between the query sample and the training samples. As a result, several global and local feature weighting-based distance metric learning methods (Chen & Guo, 2015; Lin, Li, Lin, & Chen, 2014) were proposed to improve the performance of the KNN classifier.

It is also known that nonparametric classifiers suffer from outliers, especially in small training sample size situations (Zhang, Yang, & Qian, 2012; Mitani & Hamamoto, 2006). That is one reason why the classification performance of the KNN rule is heavily influenced by the neighborhood size k (Bhattacharya, Ghosh, & Chowdhury, 2015; Wang, Neskovic, & Cooper, 2006). In fact, if k is very small, the classification decision can be poor due to noisy and imprecise samples (Liu et al., 2013). On the contrary, a large value of k can lead to degradation in the classification performance because of outliers among the k nearest neighbors that come from the wrong classes. In order to design a practical classifier that is more robust to outliers, a simple nonparametric classifier named the local mean-based k-nearest neighbor (LMKNN) classifier was proposed in (Mitani et al., 2006). Because of the effectiveness and easy design of the LMKNN classifier, its core idea has been successfully applied in many other improved methods (Gou et al., 2014; Samsudin & Bradley, 2010; Zeng, Yang, & Zhao, 2009, 2010).

Even though the LMKNN classifier can be easily designed in practice with a good classification performance, it generally suffers from the choice of a single value of k for each class and a uniform value of k for different classes. Thus, in order to design a classifier with better classification performance and less sensitivity to the neighborhood size k, we propose a new KNN-based classifier, which is called the multi-local means-based k-harmonic nearest neighbor (MLM-KHNN) classifier. The MLM-KHNN classifier mainly holds two key properties when compared with the LMKNN classifier:

1) The MLM-KHNN classifier employs as many as k multi-local means for each class instead of a single local mean, as in the LMKNN classifier, to reduce the sensitivity to the choice of the neighborhood size k.

2) The MLM-KHNN classifier introduces the harmonic mean distance as the similarity measure for the first time, hence reducing the error rate by focusing on the more reliable local means in different classes instead of using a uniform value of k in all classes, as in the LMKNN classifier.

The rest of the paper is organized as follows. In Section 2, we briefly describe the state-of-the-art k-nearest neighbor classifiers, focusing on the approaches based on the LMKNN rule. The preliminaries of both the LMKNN rule and the harmonic mean distance for classification problems are given in Section 3. In Section 4, we propose our MLM-KHNN classifier. Extensive experiments comparing the proposed approach with other competitive KNN-based classifiers on UCI (Merz & Murphy, 1996) and KEEL (Alcalá-Fdez et al., 2011) real-world datasets are conducted in Section 5. Finally, discussions and conclusions are given in Section 6 and Section 7, respectively.
2. Related work

In this section, we briefly review some related state-of-the-art KNN-based classifiers, with a particular focus on the LMKNN classifier (Mitani et al., 2006) and its extensions.

The traditional KNN classifier is a simple and powerful technique in pattern classification, but one major problem is that its performance is severely affected by outliers. The LMKNN classifier was first introduced to tackle this problem and enhance the classification performance. In the LMKNN rule, the local mean vector is first computed from the k nearest neighbors in each class, and then the query sample is assigned to the class with the minimum Euclidean distance between the query and the local mean vector. However, the LMKNN classifier selects a single value of k for each class and a uniform value of k for all classes, which may lead to misclassification.

To further improve the classification performance of the LMKNN classifier, several different local mean-based k-nearest neighbor classifiers have been proposed. The pseudo k-nearest neighbor classifier (PNN) (Zeng, Yang, & Zhao, 2009) was developed to address the problem caused by the choice of a single value of k. In the PNN classifier, the sum of the weighted distances between the query sample and its k nearest neighbors in each class is used as the similarity measure between the query and the pseudo training sample. Based on the basic ideas of both the LMKNN classifier and the PNN classifier, the local mean-based pseudo k-nearest neighbor classifier (LMPNN) was proposed in (Gou et al., 2014) and showed promising classification performance. In the LMPNN classifier, the pseudo distance is obtained by combining the weighted distances between the query sample and each local mean vector in the different classes. Additionally, some other LMKNN-based classifiers were also successfully applied to different areas, such as the local mean and class mean-based k-nearest neighbor classifier (Zeng, Yang, & Zhao, 2010) and the nearest neighbor group-based classifier (Samsudin & Bradley, 2010).

Additional approaches were also developed in order to address the problems caused by outliers. The mutual k-nearest neighbor classifier (MKNN) (Liu & Zhang, 2012) represents a simple and powerful technique to find more reliable nearest neighbors through a noisy data elimination procedure. In the MKNN rule, only a training sample that also regards the query as one of its own k nearest neighbors is selected as a mutual nearest neighbor. The coarse to fine k-nearest neighbor classifier (CFKNN) (Xu et al., 2013) is another successful method for selecting the nearest neighbors from the viewpoint of optimally representing the query sample. It first coarsely selects a small number of close training samples, and then finely determines the k nearest neighbors of the query sample. Through this coarse-to-fine procedure, the CFKNN classifier can classify accurately with less redundant information and fewer outliers in the obtained training dataset.

Another popular research direction of the KNN rule resides in the study of fuzzy k-nearest neighbor-based algorithms, which led to several works (Rodger, 2014; Liu, Pan, Dezert, & Mercier, 2014; Sarkar, 2007; Keller, Gray, & Givens, 1985) that exploit fuzzy uncertainty to enhance the classification result of the KNN rule. Among them, the fuzzy-rough nearest neighbor classifier (FRNN) (Sarkar, 2007) was shown to be able to obtain richer class confidence values with promising classification results, without the need to know the optimal value of k. Moreover, a hybrid k-nearest neighbor classifier (HBKNN) was proposed in (Yu, Chen, Liu, & You, 2016), proving the suppleness and effectiveness of combining the fuzzy membership of the fuzzy KNN classifier with the local information of the LMKNN classifier.

In this paper, in order to design a local mean-based nearest neighbor classifier with better classification performance and less sensitivity to the neighborhood size k, we propose a new KNN-based classifier, which is called the multi-local means-based k-harmonic nearest neighbor (MLM-KHNN) rule. Instead of employing only one local mean vector based on a fixed single value of k in each class as in the LMKNN rule, we compute all k different local mean vectors based on the k nearest neighbors in each class. Additionally, as the local sample distribution in each class is different, the value of k that yields the nearest local mean vector is not always the same for all classes. To take into account the importance of the nearer local mean vectors to the query sample under different values of k in each class, the harmonic mean distance of the k multi-local mean vectors to the query sample is introduced for the first time.
3. Preliminaries

In this section, we briefly describe the LMKNN classifier and the harmonic mean distance for classification problems.
3.1 The LMKNN classifier

The local mean-based k-nearest neighbor (LMKNN) rule (Mitani et al., 2006) is a simple, effective and robust nonparametric classifier. Mitani and Hamamoto have demonstrated that it can improve the classification performance and also reduce the influence of existing outliers, especially in small training sample size situations.

Given a training sample set $Tr = \{(y_i, c_i)\}_{i=1}^{N_{tr}}$ with $N_{tr}$ training samples $y_i \in R^D$ in a D-dimensional feature space from M classes, $c_i$ is the corresponding class label of $y_i$, where $c_i \in \{\omega_1, \omega_2, \cdots, \omega_M\}$. Let $Tr_j = \{(y_i^j, c_i^j)\}_{i=1}^{N_j}$ denote the training sample set of class $\omega_j$, where $N_j$ is the number of training samples in class $\omega_j$ and $\bigcup_{j=1}^{M} Tr_j = Tr$. Instead of finding the k nearest neighbors in the whole training dataset Tr as in the KNN rule, the LMKNN rule is designed to employ the local mean vector of the k nearest neighbors in each class set $Tr_j$ to classify the query sample. In the LMKNN rule, a given query sample $x \in R^D$ is classified into class $\omega_c$ by the following steps:

Step 1. Find the k nearest neighbors of x from the set $Tr_j$ of each class $\omega_j$. Let $NN_{\omega_j}^{k}(x) = \{(y_{i,j}^{NN}, c_{i,j}^{NN})\}_{i=1}^{k}$ denote the set of the k nearest neighbors of the query sample x in class $\omega_j$, where $(y_{i,j}^{NN}, c_{i,j}^{NN})$ is taken from $Tr_j$ for each class and arranged in ascending order according to the Euclidean distance measure, i.e., $d(x, y_{i,j}^{NN}) = \sqrt{(x - y_{i,j}^{NN})^T (x - y_{i,j}^{NN})}$.

Step 2. Compute the local mean vector $m_{\omega_j}^{k}$ of class $\omega_j$ by using the k nearest neighbors in the set $NN_{\omega_j}^{k}(x)$:

$$m_{\omega_j}^{k} = \frac{1}{k} \sum_{i=1}^{k} y_{i,j}^{NN} \qquad (1)$$

Step 3. Classify x into the class $\omega_c$ whose local mean vector $m_{\omega_j}^{k}$ has the minimum Euclidean distance to x among the M classes:

$$\omega_c = \arg\min_{\omega_j} d(x, m_{\omega_j}^{k}), \quad j = 1, 2, \ldots, M \qquad (2)$$

Note that the LMKNN classifier is equivalent to the 1-NN classifier when k=1, and that the meaning of the parameter k is totally different from that in the KNN rule. In the KNN rule, the k nearest neighbors are chosen from the whole training dataset Tr, while the LMKNN rule employs the local mean vector of the k nearest neighbors in each class set $Tr_j$. Instead of a majority vote among the k nearest neighbors, the LMKNN rule aims at finding the class with the closest local region to the query sample, which can effectively overcome the negative effect of outliers by computing the local mean vector of the k nearest neighbors for each class.
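To make the three steps above concrete, the following minimal Python sketch (our own illustration, not the authors' implementation; it assumes a NumPy array X_train of shape (N, D), an array y_train of class labels and a query vector x) realizes the LMKNN decision rule of Eqs. (1)-(2):

import numpy as np

def lmknn_predict(x, X_train, y_train, k):
    """LMKNN rule: per class, average the k nearest neighbors of x (Eq. (1))
    and return the class whose local mean is closest to x (Eq. (2))."""
    best_class, best_dist = None, np.inf
    for cls in np.unique(y_train):
        Xc = X_train[y_train == cls]            # training samples of class omega_j
        d = np.linalg.norm(Xc - x, axis=1)      # Euclidean distances to the query
        nn = Xc[np.argsort(d)[:k]]              # k nearest neighbors within the class
        local_mean = nn.mean(axis=0)            # local mean vector, Eq. (1)
        dist = np.linalg.norm(x - local_mean)   # distance used in Eq. (2)
        if dist < best_dist:
            best_class, best_dist = cls, dist
    return best_class

With k = 1 every local mean coincides with the single nearest neighbor of its class, so the sketch reduces to the 1-NN rule, in line with the remark above.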
3.2 The harmonic mean distance

The harmonic mean distance is introduced to measure the distance between a pair of point groups. We first briefly describe the concept of the harmonic average, in order to clarify the rationale behind its usage in the proposed method.

Given a dataset with k elements $\{y_1, y_2, \cdots, y_k\}$, its harmonic average is defined as

$$HA(\{y_1, y_2, \cdots, y_k\}) = \frac{k}{\sum_{i=1}^{k} \frac{1}{y_i}} \qquad (3)$$

Note that if one element in $y_1, y_2, \cdots, y_k$ is small, the harmonic average can be very small. In other words, the value of $HA(\{y_1, y_2, \cdots, y_k\})$ relies more on the elements with smaller values in $\{y_1, y_2, \cdots, y_k\}$. For example, the harmonic average of $\{1, 10, 10\}$ is $3/(1/1 + 1/10 + 1/10) = 2.5$, much closer to the smallest element than the arithmetic mean of 7.
The basic idea of the harmonic mean distance is to take the harmonic average of the Euclidean distances between one given data point and each data point in another point group. In this paper, we apply the harmonic mean distance to the KNN-based classification problem for the first time. The harmonic mean distance, denoted as $HMD(\cdot)$, is used to measure the distance between a query sample x and its related training sample group. For example, given a query sample x and its k nearest neighbor set $NN_k(x) = \{(y_i^{NN}, c_i^{NN})\}_{i=1}^{k}$ from the training sample set $Tr = \{(y_i, c_i)\}_{i=1}^{N_{tr}}$, the harmonic mean distance $HMD(x, \{y_i^{NN}\}_{i=1}^{k})$ between x and the set $NN_k(x)$ can be obtained as

$$HMD(x, \{y_i^{NN}\}_{i=1}^{k}) = \frac{k}{\sum_{i=1}^{k} \frac{1}{d(x, y_i^{NN})}} \qquad (4)$$

In order to highlight the differences between the arithmetic mean distance and the harmonic mean distance used in the proposed method, a comparison is given in the following.

The arithmetic mean distance between x and its related k nearest neighbors $\{y_i^{NN}\}_{i=1}^{k}$, denoted as $AMD(x, \{y_i^{NN}\}_{i=1}^{k})$, can be expressed as the weighted sum of $d(x, y_i^{NN})$, $i = 1, 2, \cdots, k$, as in Eq. (5), while the value of $\partial AMD(x, \{y_i^{NN}\}_{i=1}^{k}) / \partial d(x, y_i^{NN})$ represents the weight of $d(x, y_i^{NN})$ in computing the final value of $AMD(x, \{y_i^{NN}\}_{i=1}^{k})$. As shown in Eq. (6), it can be proven that $\partial AMD(x, \{y_i^{NN}\}_{i=1}^{k}) / \partial d(x, y_i^{NN}) = 1/k$ always holds, which means that the distances $d(x, y_i^{NN})$, $i = 1, 2, \cdots, k$, are considered equally important in the arithmetic mean distance.

$$AMD(x, \{y_i^{NN}\}_{i=1}^{k}) = \frac{\sum_{i=1}^{k} d(x, y_i^{NN})}{k} \qquad (5)$$

$$\frac{\partial AMD(x, \{y_i^{NN}\}_{i=1}^{k})}{\partial d(x, y_i^{NN})} = \frac{\partial \left[ \frac{1}{k}\sum_{i=1}^{k} d(x, y_i^{NN}) \right]}{\partial d(x, y_i^{NN})} = \frac{1}{k} \qquad (6)$$

However, the value of $\partial HMD(x, \{y_i^{NN}\}_{i=1}^{k}) / \partial d(x, y_i^{NN})$, which represents the weight of $d(x, y_i^{NN})$ in computing $HMD(x, \{y_i^{NN}\}_{i=1}^{k})$, is totally different, because it is inversely proportional to the value of $d^2(x, y_i^{NN})$, as shown in Eq. (7). Compared with the arithmetic mean distance, the harmonic mean distance therefore focuses more on the influence of the samples that have closer distances to the query sample x. Moreover, it follows that if some $y_i^{NN}$ in $\{y_i^{NN}\}_{i=1}^{k}$ has a small distance to x, the value of $HMD(x, \{y_i^{NN}\}_{i=1}^{k})$ will be small.

$$\frac{\partial HMD(x, \{y_i^{NN}\}_{i=1}^{k})}{\partial d(x, y_i^{NN})} = \frac{\partial \left[ \frac{k}{\sum_{i=1}^{k} \frac{1}{d(x, y_i^{NN})}} \right]}{\partial d(x, y_i^{NN})} = k \times \left( \frac{1}{d(x, y_i^{NN}) \times \sum_{i=1}^{k} \frac{1}{d(x, y_i^{NN})}} \right)^2 = \frac{1}{k} \times \frac{HMD^2(x, \{y_i^{NN}\}_{i=1}^{k})}{d^2(x, y_i^{NN})} \qquad (7)$$
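As a small numerical illustration of the contrast between Eq. (5) and Eq. (4) (a sketch of ours, not part of the original text), the snippet below evaluates both measures on two groups of neighbor distances; the harmonic mean distance drops sharply as soon as one close neighbor appears, whereas the arithmetic mean distance changes only mildly:

import numpy as np

def arithmetic_mean_distance(dists):
    """AMD of Eq. (5): every distance receives the same weight 1/k."""
    return float(np.mean(dists))

def harmonic_mean_distance(dists):
    """HMD of Eq. (4): k / sum(1/d_i), dominated by the smallest distance."""
    dists = np.asarray(dists, dtype=float)
    return len(dists) / float(np.sum(1.0 / dists))

d_far  = [5.0, 5.0, 5.0]   # three equally distant neighbors
d_near = [0.5, 5.0, 5.0]   # the same group, but one neighbor is much closer

print(arithmetic_mean_distance(d_far),  harmonic_mean_distance(d_far))   # 5.0  5.0
print(arithmetic_mean_distance(d_near), harmonic_mean_distance(d_near))  # 3.5  1.25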
4. The proposed MLM-KHNN classifier
In this section, we describe the proposed multi-local means-based k-harmonic nearest neighbor (MLM-KHNN) classifier. The goal of the MLM-KHNN rule is to improve the classification performance and to reduce the sensitivity to the single value of the neighborhood size k for each class and the uniform value of k for all classes in the LMKNN rule.

The mechanism of the LMKNN rule is to assign to the query sample the class label of its most similar local subclass among the different classes, where the local mean vector is used as the representation of the local subclass. Obviously, the local subclasses are obtained based on the k nearest neighbors in each class. Thus, the choice of the parameter k is of great importance to generate a local mean vector that can better represent its own class. However, there are two main problems in the choice of the parameter k in the LMKNN rule, which may lead to misclassification. Firstly, only a fixed single value of k is employed in each class, which may make the local mean highly sensitive to the value of k. If k is too small, the useful classification information may be insufficient, whereas a large value of k can easily lead to outliers being included in the k nearest neighbors of the true class (Gou et al., 2014). Secondly, a uniform value of k is employed in all classes. Since the local sample distributions in different classes are quite different, the value of k for selecting the most similar local subclass of the query in each class is usually very different as well. Therefore, it is unreasonable to use the same value of k for all classes, as in the LMKNN rule. From Eq. (1), we have:

$$d(x, m_{\omega_j}^{k}) = \sqrt{(x - m_{\omega_j}^{k})^T (x - m_{\omega_j}^{k})} = \sqrt{\left(x - \frac{1}{k}\sum_{i=1}^{k} y_{i,j}^{NN}\right)^T \left(x - \frac{1}{k}\sum_{i=1}^{k} y_{i,j}^{NN}\right)} = \frac{1}{k}\sqrt{\left\|\sum_{i=1}^{k}(x - y_{i,j}^{NN})\right\|^2} = \frac{1}{k}\left\|(x - y_{1,j}^{NN}) + \cdots + (x - y_{k,j}^{NN})\right\| \qquad (8)$$

where $\|\cdot\|$ denotes the Euclidean norm. From Eq. (8), it can be seen that the difference vector between x and each sample $y_{i,j}^{NN}$ in $NN_{\omega_j}^{k}(x)$ is considered when computing the distance between x and the local mean vector $m_{\omega_j}^{k}$. Taking Fig. 1(a) as an example, when $\|(x - y_{1,2}^{NN}) + (x - y_{2,2}^{NN}) + (x - y_{3,2}^{NN})\| < \|(x - y_{1,1}^{NN}) + (x - y_{2,1}^{NN}) + (x - y_{3,1}^{NN})\|$ and $d(x, y_{3+i,2}^{NN}) > d(x, y_{3+i,1}^{NN})$, $i = 1, 2, \cdots$, it is obvious that the classification result is very sensitive when a fixed single value of k=3 is used. In this example, the query x will be correctly classified into class $\omega_1$ when k=4, but it will obtain the wrong classification result when k=3. Actually, class $\omega_2$ only has very few samples with quite close distances to the query sample x. A similar example is shown in Fig. 1(b), where a large value of k can easily lead to misclassification due to outliers existing among the k nearest neighbors in the true class of the query sample. As shown in Fig. 1(b), the outlier samples $y_{7,1}^{NN}$ and $y_{8,1}^{NN}$ in class $\omega_1$ pull its local mean vector $m_{\omega_1}^{k}$ away from x due to the unsuitable selection of the neighborhood size k. Thus, it can be inferred that a fixed single value of k in the LMKNN rule may limit its classification performance.

Additionally, owing to the diversity of the local sample distributions in different classes, the value of k that yields the nearest local mean vector of the query sample x may be quite different for each class. However, the LMKNN classifier chooses a uniform value of k for all classes, which may result in selecting extra outliers in some classes and an inadequate number of nearest neighbors in others. In other words, it may not always select an effective subclass to represent each class, because the uniform value of k ignores the difference of the local sample distributions in different classes, hence leading to misclassification.
(a) The query pattern x from class $\omega_1$ is misclassified into $\omega_2$ when k is unsuitably small in the LMKNN rule
(b) The query pattern x from class $\omega_1$ is misclassified into $\omega_2$ when k is unsuitably large in the LMKNN rule

Fig. 1. Misclassification examples of a two-class classification case in the LMKNN rule
In order to solve this problem, we propose a new KNN-based classifier, called the multi-local means-based k-harmonic nearest neighbor (MLM-KHNN) rule. In the proposed method, k different local mean vectors based on the k nearest neighbors in each class are first computed. Clearly, the k multi-local mean vectors in each class have different distances to the query sample, and the nearer local mean vectors are more important for representing their own class in the classification. In other words, we want to focus more on the values of k that find a closer local subclass, where these values of k can be totally different in each class. To do so, we introduce the harmonic mean distance between the group of multi-local mean vectors in each class and the query sample x to measure their similarity, and we finally classify the query sample into the class with the minimum harmonic mean distance. The proposed MLM-KHNN classifier achieves a lower classification error rate and is less sensitive to the neighborhood size k when compared with LMKNN and other related KNN-based classifiers.

The MLM-KHNN classifier, as a new version of the KNN rule and the LMKNN rule, is presented below. The proposed method essentially shows two significant advantages when compared with the LMKNN rule. First, after the k nearest neighbors have been found in each class, the k local mean vectors based on the top $r$ $(1 \le r \le k)$ nearest neighbors are all computed and used; these are called the multi-local means. Second, the harmonic mean distance between the group of k multi-local mean vectors and the query sample in each class is introduced to measure similarity for a more accurate classification, which takes into account the influence of different values of k on the classification of different classes.

Let $Tr = \{(y_i, c_i)\}_{i=1}^{N_{tr}}$ be a training sample set with $N_{tr}$ training samples from M classes, where $y_i \in R^D$ and $c_i$ is the corresponding class label of $y_i$, $c_i \in \{\omega_1, \omega_2, \cdots, \omega_M\}$. For class $\omega_j$, suppose that $Tr_j = \{(y_i^j, c_i^j)\}_{i=1}^{N_j}$ denotes the training sample set of class $\omega_j$ with $N_j$ training samples. In the MLM-KHNN rule, the class label $\omega_c$ of a given query sample $x \in R^D$ is obtained as follows:

Step 1. For each class $\omega_j$, find the k nearest neighbors of x, denoted by $NN_{\omega_j}^{k}(x) = \{(y_{i,j}^{NN}, c_{i,j}^{NN})\}_{i=1}^{k}$, from $Tr_j$; they are sorted in ascending order according to their Euclidean distances to x.

Step 2. Compute the k multi-local mean vectors based on the top r $(1 \le r \le k)$ nearest neighbors of x from $Tr_j$ in each class $\omega_j$. For class $\omega_j$, let $m_{\omega_j}^{k}(x) = \{\bar{m}_{\omega_j}^{r}\}_{r=1}^{k}$ denote the k multi-local mean vectors, where

$$\bar{m}_{\omega_j}^{r} = \frac{1}{r} \sum_{i=1}^{r} y_{i,j}^{NN} \qquad (9)$$

Note that $\bar{m}_{\omega_j}^{r} \in R^D$ and that their corresponding Euclidean distances to x are denoted by $d(x, \bar{m}_{\omega_j}^{1}), d(x, \bar{m}_{\omega_j}^{2}), \cdots, d(x, \bar{m}_{\omega_j}^{k})$.

Step 3. For each class $\omega_j$, compute the harmonic mean distance between x and the k multi-local mean vectors $m_{\omega_j}^{k}(x) = \{\bar{m}_{\omega_j}^{r}\}_{r=1}^{k}$ obtained in Step 2:

$$HMD(x, m_{\omega_j}^{k}(x)) = \frac{k}{\sum_{r=1}^{k} \frac{1}{d(x, \bar{m}_{\omega_j}^{r})}} \qquad (10)$$

Step 4. Assign x to the class $\omega_c$ that has the minimum harmonic mean distance to x, in terms of Eq. (11):

$$\omega_c = \arg\min_{\omega_j} HMD(x, m_{\omega_j}^{k}(x)) \qquad (11)$$

Note that when k=1, $m_{\omega_j}^{1}(x)$ only contains one local mean vector, which is equal to $y_{1,j}^{NN}$, and the harmonic mean distance between x and $m_{\omega_j}^{1}(x)$ is computed as $HMD(x, m_{\omega_j}^{1}(x)) = d(x, \bar{m}_{\omega_j}^{1}) = d(x, y_{1,j}^{NN})$. Thus, when k=1 the MLM-KHNN rule degrades into the LMKNN rule and the KNN rule, and all of them have the same classification performance as the 1-NN classifier. The pseudo code of the MLM-KHNN classifier is summarized in detail in Algorithm 1.

Algorithm 1. The proposed MLM-KHNN classifier

Input:
  x: query sample;
  k: the neighborhood size;
  $Tr = \{(y_i, c_i)\}_{i=1}^{N_{tr}}$: the training dataset;
  $Tr_j = \{(y_i^j, c_i^j)\}_{i=1}^{N_j}$: the training dataset of class $\omega_j$ with $N_j$ training samples;
  M: the number of classes;
  $\omega_1, \omega_2, \cdots, \omega_M$: the class labels.
Output:
  $\omega_c$: the classification result of the query sample x.
Procedure:
Step 1: Find the k nearest neighbors of x in each class $\omega_j$, sorted in ascending order according to their Euclidean distances to x, as:
    $NN_{\omega_j}^{k}(x) = \{(y_{i,j}^{NN}, c_{i,j}^{NN})\}_{i=1}^{k}$
Step 2: Compute the Euclidean distance $d(x, \bar{m}_{\omega_j}^{r})$ between x and the local mean vector $\bar{m}_{\omega_j}^{r}$ of the top r nearest neighbors of x in $NN_{\omega_j}^{k}(x)$ for each class $\omega_j$ as:
    for j = 1 to M do
      for r = 1 to k do
        $\bar{m}_{\omega_j}^{r} = \frac{1}{r} \sum_{i=1}^{r} y_{i,j}^{NN}$
        $d(x, \bar{m}_{\omega_j}^{r}) = \sqrt{(x - \bar{m}_{\omega_j}^{r})^T (x - \bar{m}_{\omega_j}^{r})}$
      end for
    end for
Step 3: Compute the harmonic mean distance $HMD(x, m_{\omega_j}^{k}(x))$ between x and the k multi-local mean vectors in $m_{\omega_j}^{k}(x) = \{\bar{m}_{\omega_j}^{r}\}_{r=1}^{k}$ for each class $\omega_j$ as:
    for j = 1 to M do
      $HMD(x, m_{\omega_j}^{k}(x)) = \frac{k}{\sum_{r=1}^{k} \frac{1}{d(x, \bar{m}_{\omega_j}^{r})}}$
    end for
Step 4: Assign x to the class $\omega_c$ with the minimum harmonic mean distance as:
    $\omega_c = \arg\min_{\omega_j} HMD(x, m_{\omega_j}^{k}(x))$
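The following Python sketch (our own rendering of Algorithm 1 for illustration, not the authors' released code; it uses the same array conventions as the LMKNN sketch in Section 3.1) puts the four steps together:

import numpy as np

def mlm_khnn_predict(x, X_train, y_train, k):
    """MLM-KHNN rule: per class, build the k multi-local means (Eq. (9)),
    take the harmonic mean distance to x (Eq. (10)) and pick the class
    with the smallest value (Eq. (11))."""
    best_class, best_hmd = None, np.inf
    for cls in np.unique(y_train):
        Xc = X_train[y_train == cls]
        d = np.linalg.norm(Xc - x, axis=1)
        nn = Xc[np.argsort(d)[:k]]                       # Step 1: k NNs of x in this class
        r = np.arange(1, len(nn) + 1).reshape(-1, 1)
        multi_means = np.cumsum(nn, axis=0) / r          # Step 2: local means of the top r NNs
        dists = np.linalg.norm(x - multi_means, axis=1)  # d(x, m_bar^r), r = 1..k
        # Step 3: harmonic mean distance, Eq. (10); a zero distance would make it zero,
        # so in practice a tiny epsilon can be added to the distances.
        hmd = len(dists) / np.sum(1.0 / dists)
        if hmd < best_hmd:                               # Step 4: keep the class with minimum HMD
            best_class, best_hmd = cls, hmd
    return best_class

Because all k running means come from a single cumulative sum, the extra work over the LMKNN rule remains small, which is quantified in Section 6.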
5. Experiments

In contrast with the LMKNN classifier (Mitani et al., 2006) and the classical KNN classifier (Cover & Hart, 1967), the proposed MLM-KHNN classifier does not choose the local mean vectors using a fixed single value of the neighborhood size k. Instead, as many as k multi-local mean vectors based on the k nearest neighbors in each class are employed, and the harmonic mean distance between the group of multi-local mean vectors in each class and the query sample is utilized to emphasize the importance of the closer local mean vectors in the classification.

To validate the classification performance of the proposed method, we compare the MLM-KHNN classifier with the standard KNN classifier (Cover & Hart, 1967) and eight other competitive KNN-based classifiers: WKNN (Dudani, 1976), LMKNN (Mitani et al., 2006), PNN (Zeng et al., 2009), LMPNN (Gou et al., 2014), MKNN (Liu & Zhang, 2012), CFKNN (Xu et al., 2013), FRNN (Sarkar, 2007) and HBKNN (Yu et al., 2016). The comparative experiments are extensively conducted on ten UCI (Merz & Murphy, 1996) and ten KEEL (Alcalá-Fdez et al., 2011) real-world datasets in terms of the error rate, which is one of the most important measures in pattern classification. In our experiments, the parameter k is optimized by the cross-validation (CV) approach (Toussaint, 1974) for each classifier, following the same approach described in (Mitani et al., 2006; Zeng et al., 2009; Gou et al., 2014). In order to better evaluate the sensitivity of the proposed classifier to the neighborhood size k, comparative experiments of the classification performance with varying neighborhood size k are also conducted.
5.1 Datasets

In this subsection, we briefly summarize the datasets considered in our experiments. The twenty real-world datasets are all taken from the UCI machine-learning repository and the KEEL repository: Air, Balance, German, Glass, Ionosphere, Landsat, Monk-2, Optigits, Phoneme, Ring, Saheart, Spambase, Segment, Tae, Texture, Thyroid, Vehicle, Vote, Vowel and Wine. These twenty real-world datasets have quite different characteristics in terms of the numbers of samples, attributes and classes, which are listed in Table 1. The numbers of samples of these datasets cover a wide range, varying from 151 to 7400, in order to comprehensively validate the proposed method.

Since we are interested in the classification performance of the KNN-based classifiers in small training sample size situations, the size of the training set needs to be determined in advance. The same approach as in (Mitani et al., 2006; Zeng et al., 2009; Gou et al., 2014) is adopted in our experiments. For each dataset, we randomly choose a training set that contains approximately 30% of the data, and the remaining samples are used as testing data, as shown in Table 1. The experiments are repeated 50 times, and the average error rate with its 95% confidence interval over these 50 repetitions is reported.
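A minimal sketch of this evaluation protocol (our illustration only; mlm_khnn_predict is the sketch from Section 4, the split is a plain random 30%/70% partition as described above, and the 95% interval uses a normal approximation):

import numpy as np
from sklearn.model_selection import train_test_split

def holdout_error_rate(X, y, classifier, k, n_trials=50, train_fraction=0.3, seed=0):
    """Average error rate (%) and half-width of its 95% confidence interval
    over repeated random training/testing splits."""
    rng = np.random.RandomState(seed)
    errors = []
    for _ in range(n_trials):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_fraction, random_state=rng.randint(1 << 30))
        preds = np.array([classifier(x, X_tr, y_tr, k) for x in X_te])
        errors.append(100.0 * np.mean(preds != y_te))
    errors = np.array(errors)
    half_width = 1.96 * errors.std(ddof=1) / np.sqrt(n_trials)
    return errors.mean(), half_width

# Example usage (hypothetical arrays X, y):
# mean_err, ci = holdout_error_rate(X, y, mlm_khnn_predict, k=7)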
Table 1
Dataset description of twenty real-world datasets from the UCI and KEEL repositories.

Dataset      Database   Samples   Attributes   Classes   Training set
Air          UCI        359       64           3         120
Balance      UCI        625       4            3         200
German       UCI        1000      24           2         300
Glass        UCI        214       9            6         70
Ionosphere   UCI        351       34           2         117
Landsat      UCI        2000      36           6         700
Monk-2       KEEL       432       6            2         144
Optigits     KEEL       5620      64           10        1900
Phoneme      KEEL       5404      5            2         1800
Ring         KEEL       7400      20           2         2400
Saheart      KEEL       462       9            2         154
Segment      UCI        2310      18           7         770
Spambase     KEEL       4597      57           2         1500
Tae          KEEL       151       5            3         50
Texture      KEEL       5500      40           11        1800
Thyroid      KEEL       7200      21           3         2400
Vehicle      UCI        846       18           4         282
Vote         UCI        435       16           2         145
Vowel        KEEL       990       13           11        330
Wine         UCI        178       13           3         60
5.2 Experiments on real-world datasets

5.2.1 Results of the classification performance

As previously stated, the classification performance of the proposed MLM-KHNN classifier is compared to the KNN, WKNN, LMKNN, PNN, LMPNN, MKNN, CFKNN, FRNN and HBKNN rules over twenty real-world datasets from UCI and KEEL by means of the error rate. The WKNN rule is a well-known distance-weighted k-nearest neighbor classifier in which larger weights are given to the neighbors with smaller distances to the query sample. The LMKNN rule employs the local mean vector in each class to classify the query sample, as described in Section 3.1. Based on the LMKNN rule, the PNN and LMPNN rules were both successfully developed in order to obtain a better classification performance. The PNN rule first attempts to obtain the pseudo nearest neighbor in each class, and then assigns the label of the closest pseudo nearest neighbor to the query sample, whereas the LMPNN rule integrates the ideas of both the LMKNN and PNN methods. The MKNN rule is based on the application of the mutual nearest neighbor to obtain more reliable nearest neighbors. The CFKNN rule uses a coarse-to-fine strategy to obtain a revised training dataset that can better represent the query sample. The FRNN rule is a well-known fuzzy-based k-nearest neighbor classifier with richer class confidence values based on the fuzzy-rough ownership function. Finally, the HBKNN rule combines the fuzzy membership of the fuzzy k-nearest neighbor classifier with the local information of the LMKNN classifier.

The experiments are carried out by a 50-trial holdout procedure, where for each trial the dataset is randomly divided into training and testing samples with the sizes shown in Table 1. In our experiments, the value of the neighborhood size k for each classifier is first optimized by the CV approach on the training set, and then the average error rate with its 95% confidence interval under the optimized parameter k is used for the final classification performance evaluation.
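A hedged sketch of this model-selection step is given below (our illustration; the number of CV folds is not stated here, so the 5-fold setting is an assumption, and classifier can be any of the predict functions sketched earlier):

import numpy as np
from sklearn.model_selection import KFold

def select_k_by_cv(X_train, y_train, classifier, k_values=range(1, 16), n_folds=5, seed=0):
    """Return the neighborhood size k with the lowest cross-validated error on the training set."""
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    cv_error = {}
    for k in k_values:
        fold_errors = []
        for tr_idx, va_idx in kf.split(X_train):
            X_tr, y_tr = X_train[tr_idx], y_train[tr_idx]
            X_va, y_va = X_train[va_idx], y_train[va_idx]
            preds = np.array([classifier(x, X_tr, y_tr, k) for x in X_va])
            fold_errors.append(np.mean(preds != y_va))
        cv_error[k] = np.mean(fold_errors)
    return min(cv_error, key=cv_error.get)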
Table 2
The error rate (%) of each classifier with the corresponding 95% confidence interval on twenty real-world datasets.

Data        KNN          WKNN         LMKNN        PNN          LMPNN        MKNN         CFKNN        FRNN         HBKNN        MLM-KHNN
Air         21.44±1.32   21.44±1.32   20.50±1.06   21.44±1.32   18.95±1.14   20.73±1.37   24.54±1.25   27.33±1.06   22.36±1.54   18.66±1.11
Balance     13.11±0.72   13.26±0.69   11.20±0.80   12.95±0.67   12.54±0.64   13.02±0.73   12.02±0.64   13.33±0.81   13.77±0.65   10.71±0.74
German      30.74±0.42   30.42±0.54   30.90±0.61   30.75±0.50   31.10±0.66   30.56±0.57   29.64±0.63   28.58±0.77   31.50±0.65   30.33±0.77
Glass       35.80±1.43   34.38±1.65   35.80±1.43   34.72±1.59   34.03±1.26   33.92±1.42   36.83±1.28   36.49±2.05   33.49±1.58   33.03±1.42
Ionosphere  16.35±1.04   15.62±1.00   14.06±1.26   14.83±1.08   14.23±0.96   14.64±1.04   12.09±0.67   16.54±1.00   15.17±1.23   13.03±1.05
Landsat     13.36±0.33   12.87±0.37   12.18±0.38   12.74±0.34   11.56±0.36   11.94±0.26   17.50±0.45   16.34±0.36   12.15±0.40   11.43±0.40
Monk-2      5.17±0.75    6.09±1.01    4.62±0.49    4.60±0.80    4.44±0.54    4.65±0.77    8.35±1.25    7.21±1.65    2.88±0.52    4.42±0.52
Optigits    1.99±0.36    1.86±0.51    1.49±0.37    1.88±0.59    1.60±0.28    1.78±0.51    1.96±0.56    2.95±0.67    2.48±0.59    1.44±0.31
Phoneme     13.15±0.28   13.01±0.34   13.18±0.34   12.99±0.35   12.68±0.32   13.06±0.43   26.60±0.45   17.39±0.28   12.65±0.35   12.43±0.27
Ring        28.08±0.39   28.08±0.39   9.59±0.42    28.08±0.39   8.71±0.41    17.65±0.32   23.49±0.46   31.16±0.54   31.08±0.42   7.93±0.35
Saheart     8.30±1.80    8.12±1.74    8.30±1.80    8.30±1.80    7.66±1.79    8.05±1.68    12.48±2.12   7.41±1.81    12.37±1.69   7.41±1.68
Segment     6.83±0.27    6.83±0.27    6.59±0.26    6.72±0.32    6.00±0.27    6.83±0.27    11.62±0.35   6.33±0.30    5.93±0.30    5.80±0.27
Spambase    22.76±0.45   21.74±0.33   20.72±0.25   21.13±0.23   19.13±0.24   22.35±0.43   21.36±0.27   21.38±0.27   21.25±0.24   17.84±0.32
Tae         56.44±2.37   56.38±2.48   56.32±2.20   56.28±2.06   55.14±2.50   56.43±2.37   60.40±2.01   54.45±2.46   55.74±3.00   53.16±2.39
Texture     2.16±0.13    2.08±0.20    1.84±0.20    1.97±0.13    1.65±0.17    1.84±0.17    11.97±0.28   4.81±0.14    3.19±0.21    1.57±0.15
Thyroid     6.54±0.19    6.37±0.21    6.23±0.25    6.42±0.23    6.10±0.20    6.18±0.21    8.33±0.31    7.39±0.27    6.77±0.25    6.02±0.20
Vehicle     34.01±0.68   33.08±0.79   30.73±0.73   33.08±0.85   29.50±0.75   32.61±0.78   29.57±0.76   34.45±1.08   31.81±0.86   28.01±0.73
Vote        8.88±0.52    8.93±0.60    7.22±0.63    9.00±0.65    7.21±0.53    8.38±0.45    7.60±0.66    8.01±0.68    8.03±0.65    6.69±0.57
Vowel       16.56±0.76   16.56±0.76   16.56±0.76   16.56±0.76   16.56±0.76   16.38±0.77   18.02±0.94   22.83±0.95   15.36±0.79   16.09±0.86
Wine        23.33±2.34   23.33±2.34   23.33±2.34   23.33±2.34   23.27±2.47   23.22±2.34   22.78±2.80   15.89±1.50   22.97±1.46   21.39±2.77
Average     18.25±0.83   18.02±0.88   16.57±0.83   17.89±0.90   16.10±0.86   17.21±0.84   19.85±0.91   18.41±0.93   18.04±0.87   15.36±0.84
Table 2 collects the comparison results in terms of the classification performance of each classifier by giving the average error rate under the optimized value of the neighborhood size k; the corresponding 95% confidence interval is also listed. From the experimental results in Table 2, it can be observed that the proposed MLM-KHNN classifier outperforms all nine state-of-the-art KNN-based methods on almost all of the twenty real-world datasets. In particular, the MLM-KHNN classifier significantly reduces the error rate of the standard LMKNN rule by introducing the concepts of multi-local mean vectors and harmonic mean distance similarity, as it can focus on the more reliable local mean vectors with smaller distances to the query sample in each class. With regard to the CFKNN, FRNN and HBKNN classifiers, although they achieve quite low error rates on a few datasets, the proposed MLM-KHNN classifier is still able to outperform them on most of the datasets.

In order to further validate the classification performance of the proposed MLM-KHNN classifier under different values of the parameter k, the average classification error rates over all twenty datasets of the KNN, WKNN, LMKNN, PNN, LMPNN, MKNN, CFKNN and MLM-KHNN classifiers for k = 1~15 with a step size of 2 are listed in Table 3. Note that the average classification error rates under different choices of the neighborhood size k of the FRNN and HBKNN classifiers are not listed in Table 3. This is due to the fact that the FRNN classifier employs all training samples for classification and does not need to obtain an optimal value of k, while the HBKNN classifier empirically uses a default value of k. Thus, their error rates on each dataset are constant and independent of the parameter k.

Table 3
The average error rate (%) on twenty real-world datasets of each classifier with the corresponding 95% confidence interval for k = 1~15 with a step size of 2.

Data   KNN          WKNN         LMKNN        PNN          LMPNN        MKNN         CFKNN        MLM-KHNN
k=1    19.49±0.88   19.49±0.88   19.49±0.88   19.49±0.88   19.49±0.88   19.49±0.88   23.44±1.07   19.49±0.88
k=3    20.62±0.85   19.35±0.93   18.29±0.79   19.27±0.89   17.56±0.86   19.36±0.85   22.45±0.93   16.79±0.85
k=5    21.79±0.83   19.23±0.88   18.72±0.77   19.45±0.88   17.18±0.86   19.61±0.84   22.79±0.90   16.18±0.88
k=7    22.53±0.91   19.46±0.87   19.21±0.80   19.67±0.86   17.01±0.86   19.84±0.77   22.79±0.88   16.03±0.87
k=9    23.30±0.88   19.71±0.82   19.98±0.80   20.02±0.84   16.91±0.83   20.01±0.79   22.59±0.79   16.04±0.86
k=11   24.14±0.96   19.95±0.86   20.46±0.82   20.27±0.90   16.93±0.81   20.21±0.78   22.52±0.83   16.05±0.88
k=13   24.90±0.94   20.27±0.86   20.04±0.80   20.50±0.90   16.93±0.80   20.46±0.78   22.61±0.88   16.08±0.87
k=15   25.51±0.95   20.53±0.87   20.48±0.77   20.78±0.89   16.96±0.79   20.74±0.80   22.50±0.88   16.14±0.87
From the results in Table 3, it can be seen that the MLM-KHNN classifier achieves the best classification performance in all cases, especially when compared with the other LMKNN-based methods. As discussed in Section 4, an inappropriate value of k can easily lead to a higher error rate due to existing outliers; however, the results in Table 3 indicate that the proposed method is less sensitive to outliers and can reduce the error rate largely independently of the neighborhood size k.
5.2.2 Results of the sensitivity to the neighborhood size k
To evaluate the sensitivity of the classification performance to the parameter k, the error rates of the different classifiers are also compared while varying the neighborhood size k.

The experimental comparisons of the error rate with the parameter k changing from 1 to 15 on the twenty real-world datasets are shown in Fig. 2. From the results shown in Fig. 2, it can be seen that the MLM-KHNN classifier has a lower error rate than the other methods for different values of k in most cases. In particular, when the value of the neighborhood size k is relatively large, the improvements in classification performance of MLM-KHNN over the KNN, WKNN, PNN, LMPNN, MKNN and CFKNN classifiers can be very significant. As previously stated, the classification performances of the FRNN and HBKNN classifiers do not rely on the choice of k, hence they are not shown in Fig. 2.

Moreover, the error rate curve of the proposed MLM-KHNN method is smoother and flatter when compared with the other classifiers that are based on the LMKNN rule, such as the LMKNN, PNN and LMPNN classifiers. In general, we can see that the proposed MLM-KHNN both improves the classification performance and is less sensitive to the neighborhood size k.
[Fig. 2 consists of twenty panels, one per dataset, plotting the error rate (%) of the KNN, WKNN, LMKNN, PNN, LMPNN, MKNN, CFKNN and MLM-KHNN classifiers against the neighborhood size k (from 1 to 15): (a) Air, (b) Balance, (c) German, (d) Glass, (e) Ionosphere, (f) Landsat, (g) Monk-2, (h) Optigits, (i) Phoneme, (j) Ring, (k) Saheart, (l) Segment, (m) Spambase, (n) Tae, (o) Texture, (p) Thyroid, (q) Vehicle, (r) Vote, (s) Vowel, (t) Wine.]

Fig. 2. The error rates of each classifier when changing the value of k on twenty real-world datasets
6. Discussions

In pattern classification, computational complexity is an important issue when designing an effective classifier for practical applications. To better explain the merits of the proposed MLM-KHNN classifier, a comparison of the computational complexities of the LMKNN and MLM-KHNN classifiers is further discussed in this section. We only focus on the complexity of the online computations in the classification stage.

Let $N_{tr}$ denote the number of training samples, $N_j$ the number of training samples in class $\omega_j$, p the feature dimensionality, M the number of classes and k the neighborhood size.

For the traditional LMKNN classifier, the classification stage mainly contains three steps. The first step is to search for the k nearest neighbors in each class based on the Euclidean distance; the multiplications and sum operations are both $O(N_1 p + N_2 p + \cdots + N_M p)$, which can be abbreviated to $O(N_{tr} p)$. Additionally, the comparisons are $O(N_1 k + N_2 k + \cdots + N_M k)$, which is equal to $O(N_{tr} k)$. The second step is to compute the local mean vector of each class, which requires $O(Mpk)$ sum operations. The third and final step assigns the query sample to the class with the smallest distance between its local mean and the given query; it requires $O(Mp)$ multiplications and sum operations, and the class label is determined with $O(M)$ comparisons. Thus, the total computational complexity of the LMKNN rule is $O(2N_{tr}p + N_{tr}k + Mpk + 2Mp + M)$.

For the proposed MLM-KHNN classifier, the classification stage consists of four steps. The first step is the same as in the LMKNN rule. In the second step, the MLM-KHNN method obtains the k multi-local mean vectors. It can easily be seen from the steps in Section 4 that $m_{\omega_j}^{r}(x) = m_{\omega_j}^{r-1}(x) \cup \{\bar{m}_{\omega_j}^{r}\}$, $r = 1, 2, \cdots, k$, and that $m_{\omega_j}^{r-1}(x)$ can be reused from the former computations as r increases. Thus, the sum operations are equal to $O(Mpk)$. In the third step, the harmonic mean distance between the query x and the k multi-local mean vectors is calculated for each class, which requires $O(Mpk)$ multiplications and $O(Mpk + Mk)$ sum operations, as illustrated in Eq. (10). Then, in the final step, the proposed method classifies the query sample into the class with the minimum harmonic mean distance to the given query with $O(M)$ comparisons. Thus, the total computational complexity of the MLM-KHNN rule is $O(2N_{tr}p + N_{tr}k + 3Mpk + Mk + M)$.

From the above analysis, it can be seen that the increased computation cost of the proposed method is $O(2Mpk + Mk - 2Mp) = O(2Mp(k-1) + Mk) > 0$. Since the number of classes M and the neighborhood size k are usually much smaller than the training sample size $N_{tr}$, the computational differences are rather small.
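A short sketch of the reuse exploited in the second step (ours, not from the paper): each local mean can be updated from the previous one as $\bar{m}^{r} = \bar{m}^{r-1} + (y_r - \bar{m}^{r-1})/r$, so producing all k multi-local means of one class costs $O(pk)$ sum operations rather than the $O(pk^2)$ cost of recomputing each mean from scratch:

import numpy as np

def multi_local_means(neighbors):
    """Running means of the top r neighbors, r = 1..k, computed incrementally.
    neighbors: (k, p) array of the k nearest neighbors of the query in one class,
    already sorted by distance; the r-th row of the result is m_bar^r."""
    k, p = neighbors.shape
    means = np.empty((k, p))
    running = np.zeros(p)
    for r, y in enumerate(neighbors, start=1):
        running = running + (y - running) / r   # incremental update of the mean
        means[r - 1] = running
    return means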
7. Conclusions
In this paper, we proposed a new KNN-based classifier, called the multi-local means-based k-harmonic nearest neighbor (MLM-KHNN) classifier. The proposed method aims at improving the classification performance of KNN-based schemes, especially the local mean-based KNN classification approaches, such as the LMKNN, PNN and LMPNN classifiers. To overcome the negative influence of the single and uniform value of k used in the local mean-based KNN classifiers, the proposed method significantly enhances the classification performance mainly through two key factors. The first improvement resides in the fact that in the MLM-KHNN rule, as many as k different local mean vectors, which we call multi-local mean vectors, are computed in each class to achieve a more robust classification performance with a lower error rate. The second improvement resides in the fact that we applied the harmonic mean distance to measure the similarity for the first time, which allows the method to focus on the more reliable local mean vectors that have smaller distances to the query sample.

To evaluate the classification performance of the proposed MLM-KHNN rule, nine state-of-the-art KNN-based approaches have been compared: the KNN, WKNN, LMKNN, PNN, LMPNN, MKNN, CFKNN, FRNN and HBKNN classifiers. Among them, in addition to the standard KNN classifier and three local mean-based KNN classifiers, we also compared our method with five other nearest neighbor classifiers related to different research topics of the KNN rule, such as distance weighting, noisy data elimination and fuzzy nearest neighbor classification. Experimental results on twenty real-world datasets from the UCI and KEEL repositories demonstrated that the proposed MLM-KHNN classifier achieves a lower classification error rate and is less sensitive to outliers under different choices of the neighborhood size k.

Furthermore, it was shown that, compared with the traditional LMKNN rule, the additional computations required by the proposed MLM-KHNN method are only related to the number of classes M and the neighborhood size k, which are usually much smaller than the training sample size $N_{tr}$. Therefore, the computational differences between the MLM-KHNN classifier and the LMKNN classifier are very small.
Acknowledgement

This work was supported in part by the Major Programs of the National Natural Science Foundation of China (Grant No. 41390454), in part by the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20130201110071), in part by the Open Project Program of the National Laboratory of Pattern Recognition (Grant No. 201407370), and in part by the Open Project Program of the State Key Lab of CAD&CG (Grant No. A1512).
References
Alcalá-Fdez, J., Fernández, A., Luengo, J., Derrac, J., & García, S. (2011). Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. Journal of Multiple-Valued Logic & Soft Computing, 17, 255-287.

Bailey, T., & Jain, A. K. (2010). A note on distance-weighted k-nearest neighbor rules. IEEE Transactions on Systems, Man and Cybernetics, 8(4), 311-313.

Bhattacharya, G., Ghosh, K., & Chowdhury, A. S. (2015). A probabilistic framework for dynamic k estimation in kNN classifiers with certainty factor. Eighth International Conference on Advances in Pattern Recognition, 1-5.

Chen, L., & Guo, G. (2015). Nearest neighbor classification of categorical data by attributes weighting. Expert Systems with Applications, 42(6), 3142-3149.

Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13, 21-27.

Dudani, S. A. (1976). The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man & Cybernetics, SMC-6(4), 325-327.

Garcia-Pedrajas, N., Del-Castillo, J. A., & Cerruela-Garcia, G. (2015). A proposal for local k values for k-nearest neighbor rule. IEEE Transactions on Neural Networks & Learning Systems.

Gou, J., Zhan, Y., Rao, Y., Shen, X., Wang, X., & He, W. (2014). Improved pseudo nearest neighbor classification. Knowledge-Based Systems, 70(C), 361-375.

Hu, Q., Yu, D., & Xie, Z. (2008). Neighborhood classifiers. Expert Systems with Applications, 34, 866-876.

Jiang, S., Pang, G., Wu, M., & Kuang, L. (2012). An improved K-nearest neighbor algorithm for text categorization. Expert Systems with Applications, 39, 1503-1509.

Keller, J. M., Gray, M. R., & Givens, J. A. (1985). A fuzzy k-nearest neighbor algorithm. IEEE Transactions on Systems, Man & Cybernetics, SMC-15(4), 580-585.

Li, B., Chen, Y., & Chen, Y. (2008). The nearest neighbor algorithm of local probability centers. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, 38, 141-154.

Lin, Y., Li, J., Lin, M., & Chen, J. (2014). A new nearest neighbor classifier via fusing neighborhood information. Neurocomputing, 143(16), 164-169.

Liu, Z. G., Pan, Q., & Dezert, J. (2013). A new belief-based k-nearest neighbor classification method. Pattern Recognition, 46(3), 834-844.

Liu, Z. G., Pan, Q., Dezert, J., & Mercier, G. (2014). Fuzzy-belief K-nearest neighbor classifier for uncertain data. Fusion 2014: International Conference on Information Fusion, 1-8.

Liu, H., & Zhang, S. (2012). Noisy data elimination using mutual k-nearest neighbor for classification mining. Journal of Systems & Software, 85(5), 1067-1074.

Mateos-García, D., García-Gutiérrez, J., & Riquelme-Santos, J. C. (2016). An evolutionary voting for k-nearest neighbours. Expert Systems with Applications, 43, 9-14.

Merz, C., & Murphy, P. (1996). UCI repository of machine learning databases.

Mitani, Y., & Hamamoto, Y. (2006). A local mean-based nonparametric classifier. Pattern Recognition Letters, 27(10), 1151-1159.

Rodger, J. A. (2014). A fuzzy nearest neighbor neural network statistical model for predicting demand for natural gas and energy cost savings in public buildings. Expert Systems with Applications, 41(4), 1813-1829.

Rodger, J. A. (2015). Discovery of medical big data analytics: improving the prediction of traumatic brain injury survival rates by data mining patient informatics processing software hybrid hadoop hive. Informatics in Medicine Unlocked, 1, 17-26.

Samsudin, N. A., & Bradley, A. P. (2010). Nearest neighbour group-based classification. Pattern Recognition, 43(10), 3458-3467.

Sarkar, M. (2007). Fuzzy-rough nearest neighbor algorithms in classification. Fuzzy Sets & Systems, 158(19), 2134-2152.

Toussaint, G. (1974). Bibliography on estimation of misclassification. IEEE Transactions on Information Theory, 20(4), 472-479.

Wagner, T. J. (1971). Convergence of the nearest neighbor rule. IEEE Transactions on Information Theory, 17(5), 566-571.

Wang, J., Neskovic, P., & Cooper, L. N. (2006). Neighborhood size selection in the k-nearest-neighbor rule using statistical confidence. Pattern Recognition, 39(3), 417-423.

Xia, W., Mita, Y., & Shibata, T. (2016). A nearest neighbor classifier employing critical boundary vectors for efficient on-chip template reduction. IEEE Transactions on Neural Networks & Learning Systems, 27(5), 1094-1107.

Xu, Y., Zhu, Q., Fan, Z., Qiu, M., Chen, Y., & Liu, H. (2013). Coarse to fine k nearest neighbor classifier. Pattern Recognition Letters, 34(9), 980-986.

Yang, M., Wei, D., & Tao, D. (2013). Local discriminative distance metrics ensemble learning. Pattern Recognition, 46(8), 2337-2349.

Yu, Z., Chen, H., Liu, J., & You, J. (2016). Hybrid k-nearest neighbor classifier. IEEE Transactions on Cybernetics, 46(6), 1263-1275.

Zeng, Y., Yang, Y., & Zhao, L. (2009). Pseudo nearest neighbor rule for pattern classification. Expert Systems with Applications, 36(2), 3587-3595.

Zeng, Y., Yang, Y. P., & Zhao, L. (2009). Nearest neighbour classification based on local mean and class mean. Expert Systems with Applications, 36(4), 8443-8448.

Zhang, N., Yang, J., & Qian, J. J. (2012). Component-based global k-NN classifier for small sample size problems. Pattern Recognition Letters, 33(13), 1689-1694.