
Accepted Manuscript

A new k-harmonic nearest neighbor classifier based on the multi-local means

Zhibin Pan, Yidi Wang, Weiping Ku

PII: S0957-4174(16)30515-2
DOI: 10.1016/j.eswa.2016.09.031
Reference: ESWA 10895

To appear in: Expert Systems With Applications

Received date: 22 June 2016
Revised date: 18 September 2016

Please cite this article as: Zhibin Pan, Yidi Wang, Weiping Ku, A new k-harmonic nearest neighbor classifier based on the multi-local means, Expert Systems With Applications (2016), doi: 10.1016/j.eswa.2016.09.031

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Highlights

• K different multi-local means based on the k-nearest neighbors are employed.
• The harmonic mean distance is introduced into KNN classification problems for the first time.
• The classification error rates can be significantly reduced.
• Less sensitive to the choice of neighborhood size k.
• Easily designed in practice with little extra computational complexity.


A new k-harmonic nearest neighbor classifier based on the multi-local means

Zhibin Pan*, Yidi Wang, Weiping Ku

Affiliation address: School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710049, P.R. China.

E-mail address: [email protected] (Zhibin Pan), [email protected] (Yidi Wang), [email protected] (Weiping Ku)

Complete postal address:
School of Electronic and Information Engineering
Xi’an Jiaotong University
No.28, Xianning West Road
Xi’an, 710049
P.R. China.

Correspondence information:
Zhibin Pan
E-mail address: [email protected] (Zhibin Pan)
Telephone number: +86-2982665459 (Office) / +86-13201718936 (Mobile)
Complete postal address:
School of Electronic and Information Engineering
Xi’an Jiaotong University
No.28, Xianning West Road
Xi’an, 710049
P.R. China.

September 18, 2016

List of Responses

Dear Editor and Reviewers:

Thank you very much for your letter and for the reviewers’ comments concerning our manuscript entitled “A new k-harmonic nearest neighbor classifier based on the multi-local means” (ESWA-D-16-02320R1). These comments are all valuable and very helpful for revising and improving our paper. We have considered the comments carefully and made corrections accordingly, which we hope meet the requirements of Expert Systems with Applications. The revised portions are marked in red in the paper. The corrections and the responses to the reviewers’ comments are as follows:

Reviewer #1:

1. Comment: The following is my second review of the paper titled, "A new k-harmonic nearest neighbor classifier based on the multi-local means"-ESWA-D-16-02320R1. The authors have addressed my previous concerns. However, the language and grammar also require some work, and I noted a large number of typographical errors. The paper needs a linguistic check, preferably by a native speaker.

Response:

We thank the reviewer for his/her recognition. We have completed the manuscript revision, including corrections to English usage, and have had the manuscript proof-read by a native English-speaking colleague at University College London. The revised portions are marked in red in the paper. A letter from Mr. P.V. Amadori, who assisted us with the English editing and suggested extensive modifications to improve the clarity and readability of our manuscript, is attached on Page 4.


Reviewer #2:

2. Comment: In the revised manuscript, much of the comments addressed in the earlier revision cycle are properly addressed. However, the paper still requires a language editing. I still realize several typos through the text (For instance, "In this paper, in order to design a local men-based nearest neighbor " (in page 8). So, please carefully check the paper once again.

Response: We thank the reviewer for his/her recognition. Following the reviewer’s comment, we have carefully checked the full manuscript for typos and grammatical errors in this revised version of the paper. For example, the error mentioned in the comment (on Page 8) has been corrected, and the sentence now reads "In this paper, in order to design a local mean-based nearest neighbor ".

Furthermore, to improve the readability and clarity of the manuscript, we have had it linguistically checked and proof-read by a native English-speaking colleague, Mr. P.V. Amadori, at University College London. Following his extensive suggestions and modifications for language editing, we have completed the manuscript revision, and the revised portions are marked in red in the paper. A letter from Mr. P.V. Amadori is also attached on Page 4.

We sincerely appreciate the work of the editor and reviewers, and hope that the corrections in this revised version will meet the requirements of Expert Systems with Applications. Once again, thank you very much for your comments and suggestions.

Sincerely yours,
Zhibin Pan, Yidi Wang and Weiping Ku


A new k-harmonic nearest neighbor classifier based on the multi-local means

Zhibin Pan*, Yidi Wang, Weiping Ku

School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, 710049, P.R. China

Abstract

The k-nearest neighbor (KNN) rule is a classical and yet very effective nonparametric technique in pattern classification, but its classification performance is severely affected by outliers. The local mean-based k-nearest neighbor classifier (LMKNN) was first introduced to achieve robustness against outliers by computing the local mean vector of the k nearest neighbors in each class. However, its performance suffers from the choice of a single value of k for each class and a uniform value of k for different classes. In this paper, we propose a new KNN-based classifier, called the multi-local means-based k-harmonic nearest neighbor (MLM-KHNN) rule. In our method, the k nearest neighbors in each class are first found and then used to compute k different local mean vectors, which are in turn used to compute their harmonic mean distance to the query sample. Finally, MLM-KHNN classifies the query sample into the class with the minimum harmonic mean distance. Experimental results on twenty real-world datasets from the UCI and KEEL repositories demonstrate that the proposed MLM-KHNN classifier achieves a lower classification error rate and is less sensitive to the parameter k than nine related competitive KNN-based classifiers, especially in small training sample size situations.

Key words: Local mean; k-nearest neighbor; Harmonic mean distance; Pattern classification; Small training sample size

1. Introduction

The k-nearest neighbor (KNN) rule (Cover, & Hart, 1967) is one of the most famous classification techniques due to its simplicity, effectiveness and intuitiveness (Xia, Mita, & Shibata, 2016; Xu et al., 2013; Jiang, Pang, Wu, & Kuang, 2012). It assigns to the query sample the class label that appears most frequently among its k nearest neighbors through a majority vote. The KNN classifier has been widely studied and extensively applied in practice (Rodger, 2014, 2015), thanks to its several attractive properties. In fact, the KNN rule works as a nonparametric technique, which does not require a priori knowledge about the probability distribution of the classification problem (Li, Chen, & Chen, 2008). Such a property is particularly important for cases where Gaussian distributions of the samples are difficult to assume, as in small training sample size situations (Zhang, Yang, & Qian, 2012). Additionally, the classification performance of the KNN rule relies only on the chosen distance metric and a single parameter k (Garcia-Pedrajas, Del-Castillo, & Cerruela-Garcia, 2015; Hu, Yu, & Xie, 2008), which represents the neighborhood size of the query sample. Finally, it has been proven that the KNN rule can asymptotically approach the


optimal classification performance achieved by the Bayes method under the constraint $k/N \to 0$ (Wagner, 1971), where N is the total number of training samples.

While the KNN rule has many significant advantages, some problems still exist. The first problem is that all the k nearest neighbors are considered equally when assigning a class label to the query sample via a simple majority vote. Obviously, this is not reasonable when the k nearest neighbors differ greatly in their distances to the query sample and some closer nearest neighbors are more important (Bailey, & Jain, 2010). In order to tackle this problem, several distance-weighted voting methods for the KNN rule (Mateos-García, García-Gutiérrez, & Riquelme-Santos, 2016; Gou, Xiong, & Kuang, 2011) have been developed, where larger weights are given to the closer nearest neighbors. However, such an approach is not always correct, as some farther neighbors may be more important for classification. Accordingly, a number of adaptive metric nearest neighbor classifiers were developed (Yang, Wei, & Tao, 2013; Weinberger, & Saul, 2009). The second problem is that the KNN rule cannot properly classify the query sample when its attribute data are similar to training samples from different classes (Liu, Pan, & Dezert, 2013). In fact, the numbers of nearest neighbors from different classes may be similar within the k nearest neighbors of the query, causing the KNN rule to assign an incorrect class label. Accordingly, a number of fuzzy and belief-based classifiers have been derived in order to allow the queries to belong to different classes with masses of belief, hence reducing the classification errors (Liu, Pan, Dezert, & Mercier, 2014; Sarkar, 2007). The third problem is that the classification performance


of the KNN rule severely relies on the distance function used to compute the distance between the query sample and the training samples. As a result, several global and local feature weighting-based distance metric learning methods (Chen, & Guo, 2015; Lin, Li, Lin, & Chen, 2014) were proposed to improve the performance of the KNN classifier.

It is also known that nonparametric classifiers suffer from outliers, especially in small training sample size situations (Zhang, Yang, & Qian, 2012; Mitani, & Hamamoto, 2006). That is one reason why the classification performance of the KNN rule is heavily influenced by the neighborhood size k (Bhattacharya, Ghosh, & Chowdhury, 2015; Wang, Neskovic, & Cooper, 2006). In fact, if k is very small, the classification decision can be poor due to noisy and imprecise samples (Liu et al., 2013). On the contrary, a large value of k can degrade the classification performance because of outliers from the wrong classes appearing among the k nearest neighbors. In order to design a practical classifier that is more robust to outliers, a simple nonparametric classifier named the local mean-based k-nearest neighbor (LMKNN) classifier was proposed in (Mitani et al., 2006). Because of the effectiveness and easy design of the LMKNN classifier, its core idea has been successfully applied in many other improved methods (Gou et al., 2014; Samsudin, & Bradley, 2010; Zeng, Yang, & Zhao, 2009, 2010). Even though the LMKNN classifier can be easily designed in practice and achieves good classification performance, it generally suffers from the choice of a single value of k for each class and a uniform value of k for different classes. Thus, in order to design


a classifier with better classification performance and less sensitivity to the neighborhood size k, we propose a new KNN-based classifier, called the multi-local means-based k-harmonic nearest neighbor (MLM-KHNN) classifier. The MLM-KHNN classifier has two key properties when compared with the LMKNN classifier:

1) The MLM-KHNN classifier employs as many as k multi-local means for each class instead of the single local mean used in the LMKNN classifier, which reduces the sensitivity to the choice of the neighborhood size k.

2) The MLM-KHNN classifier introduces the harmonic mean distance as the similarity measure for the first time, which reduces the error rate by focusing on the more reliable local means in each class rather than relying on a uniform value of k for all classes, as in the LMKNN classifier.

The rest of the paper is organized as follows. In Section 2, we briefly describe the state-of-the-art k-nearest neighbor classifiers, focusing on the approaches based on the LMKNN rule. The preliminaries of both the LMKNN rule and the harmonic mean distance for classification problems are given in Section 3. In Section 4, we propose our MLM-KHNN classifier. Extensive experiments comparing the proposed approach with other competitive KNN-based classifiers on UCI (Merz & Murphy, 1996) and KEEL (Alcalá-Fdez et al., 2011) real-world datasets are conducted in Section 5. Finally, discussions and conclusions are given in Section 6 and Section 7, respectively.


2. Related work

In this section, we briefly review some related state-of-the-art KNN-based classifiers, with a particular focus on the LMKNN classifier (Mitani et al., 2006) and its extensions.

The traditional KNN classifier is a simple and powerful technique in pattern classification, but one major problem is that its performance is severely affected by outliers. The LMKNN classifier was first introduced to tackle this problem and enhance the classification performance. In the LMKNN rule, the local mean vector is first computed from the k nearest neighbors in each class, and then the query sample is assigned to the class with the minimum Euclidean distance between the query and the local mean vector. However, the LMKNN classifier selects a single value of k for each class and a uniform value of k for all classes, which may lead to misclassification.

To further improve the classification performance of the LMKNN classifier, several different local mean-based k-nearest neighbor classifiers have been proposed. The pseudo k-nearest neighbor classifier (PNN) (Zeng, Yang, & Zhao, 2009) was developed to address the problem caused by the choice of a single value of k. In the PNN classifier, the sum of the weighted distances between the query sample and its k nearest neighbors in each class is used as the similarity measure between the query and the pseudo training sample. Based on the ideas of both the LMKNN classifier and the PNN classifier, the local mean-based pseudo k-nearest neighbor classifier (LMPNN) was proposed in (Gou et al., 2014) and showed promising


classification performance. In the LMPNN classifier, the pseudo distance is obtained by combining the weighted distances between the query sample and each local mean vector in the different classes. Additionally, some other LMKNN-based classifiers were also successfully applied in different areas, such as the local mean and class mean-based k-nearest neighbor classifier (Zeng, Yang, & Zhao, 2010) and the nearest neighbor group-based classifier (Samsudin, & Bradley, 2010).

Additional approaches were also developed in order to address the problems caused by outliers. The mutual k-nearest neighbor classifier (MKNN) (Liu, & Zhang, 2012) represents a simple and powerful technique to find more reliable nearest neighbors through a noisy data elimination procedure. In the MKNN rule, only a training sample that also regards the query as one of its k nearest neighbors will be selected as a mutual nearest neighbor. The coarse to fine k-nearest neighbor classifier (CFKNN) (Xu et al., 2013) is another successful method for selecting the nearest neighbors from the viewpoint of optimally representing the query sample. It first coarsely selects a small number of close training samples, and then finely determines the k nearest neighbors of the query sample. Through this coarse-to-fine procedure, the CFKNN classifier can classify accurately with less redundant information and fewer outliers in the obtained training dataset.

Another popular research direction of the KNN rule resides in the study of fuzzy

k-nearest neighbor-based algorithms, which has led to several works (Rodger, 2014; Liu, Pan, Dezert, & Mercier, 2014; Sarkar, 2007; Keller, Gray, & Givens, 1985) that exploit fuzzy uncertainty to enhance the classification result of the KNN rule. Among


them, the fuzzy-rough nearest neighbor classifier (FRNN) (Sarkar, 2007) was shown to obtain richer class confidence values with promising classification results without the need to know the optimal value of k. Moreover, a hybrid k-nearest neighbor classifier (HBKNN) was proposed in (Yu, Chen, Liu & You, 2016), demonstrating the flexibility and effectiveness of combining the fuzzy membership of the fuzzy KNN classifier and the local information of the LMKNN classifier.

In this paper, in order to design a local mean-based nearest neighbor classifier with better classification performance and less sensitivity to the neighborhood size k, we propose a new KNN-based classifier, called the multi-local means-based k-harmonic nearest neighbor (MLM-KHNN) rule. Instead of employing only one local mean vector based on a fixed single value of k in each class as in the LMKNN rule, we compute all k different local mean vectors based on the k nearest neighbors in each class. Additionally, as the local sample distribution in each class is different, the value of k that yields the nearest local mean vector is not always the same for all classes. To take into account the importance of the nearer local mean vectors to the query sample under different values of k in each class, the harmonic mean distance of the k multi-local mean vectors to the query sample is introduced for the first time.

3. Preliminaries

In this section, we briefly describe the LMKNN classifier and the harmonic mean distance for classification problems.


3.1 The LMKNN classifier

The local mean-based k-nearest neighbor (LMKNN) rule (Mitani et al., 2006) is a simple, effective and robust nonparametric classifier. Mitani and Hamamoto have demonstrated that it can improve the classification performance and also reduce the influence of existing outliers, especially in small training sample size situations.

Given a training sample set $Tr = \{(y_i, c_i)\}_{i=1}^{N_{tr}}$ with $N_{tr}$ training samples $y_i \in R^D$ in a D-dimensional feature space from M classes, $c_i$ is the corresponding class label of $y_i$, where $c_i \in \{\omega_1, \omega_2, \cdots, \omega_M\}$. Let $Tr_j = \{(y_i^j, c_i^j)\}_{i=1}^{N_j}$ denote the training sample set of class $\omega_j$, where $N_j$ is the number of training samples in class $\omega_j$ and $\bigcup_{j=1}^{M} Tr_j = Tr$. Instead of finding the k nearest neighbors in the whole training dataset $Tr$ as in the KNN rule, the LMKNN rule is designed to employ the local mean vector of the k nearest neighbors in each class $Tr_j$ to classify the query sample. In the LMKNN rule, a given query sample $x \in R^D$ is classified into class $\omega_c$ by the following steps:

Step 1. Find the k nearest neighbors of x from the set $Tr_j$ of each class $\omega_j$. Let $NN_{\omega_j}^k(x) = \{(y_{i,j}^{NN}, c_{i,j}^{NN})\}_{i=1}^{k}$ denote the set of the k nearest neighbors of the query sample x in class $\omega_j$, where $(y_{i,j}^{NN}, c_{i,j}^{NN})$ is computed from $Tr_j$ for each class and arranged in ascending order according to the Euclidean distance measure, i.e., $d(x, y_{i,j}^{NN}) = \sqrt{(x - y_{i,j}^{NN})^T (x - y_{i,j}^{NN})}$.

Step 2. Compute the local mean vector $m_{\omega_j}^k$ of class $\omega_j$ by using the k nearest neighbors in the set $NN_{\omega_j}^k(x)$:

$$m_{\omega_j}^k = \frac{1}{k} \sum_{i=1}^{k} y_{i,j}^{NN} \quad (1)$$

Step 3. Classify x into the class $\omega_c$ whose local mean vector $m_{\omega_j}^k$ has the minimum Euclidean distance to x among the M classes:

$$\omega_c = \arg\min_{\omega_j} d(x, m_{\omega_j}^k), \quad j = 1, 2, \ldots, M \quad (2)$$

Note that the LMKNN classifier is equivalent to the 1-NN classifier when k=1, and the meaning of the parameter k is totally different from that in the KNN rule. In the KNN rule, the k nearest neighbors are chosen from the whole training dataset $Tr$, while the LMKNN rule employs the local mean vector of the k nearest neighbors in each class $Tr_j$. Instead of the majority vote among the k nearest neighbors, the LMKNN rule aims at finding the class whose local region is closest to the query sample, which can effectively overcome the negative effect of outliers by computing the local mean vector of the k nearest neighbors for each class.
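To make Steps 1-3 concrete, the following is a minimal NumPy sketch of the LMKNN decision rule; the function and array names (lmknn_predict, train_X, train_y) are our own illustrative choices and are not part of the original paper.

```python
import numpy as np

def lmknn_predict(query, train_X, train_y, k):
    """Sketch of the LMKNN rule: choose the class whose local mean of the
    k nearest neighbors is closest to the query (Eqs. (1)-(2))."""
    best_class, best_dist = None, np.inf
    for c in np.unique(train_y):
        Xc = train_X[train_y == c]                  # Tr_j: training samples of class c
        d = np.linalg.norm(Xc - query, axis=1)      # Euclidean distances to the query
        nn = Xc[np.argsort(d)[:k]]                  # k nearest neighbors within class c
        local_mean = nn.mean(axis=0)                # Eq. (1): local mean vector
        dist = np.linalg.norm(query - local_mean)   # d(x, m_c^k)
        if dist < best_dist:                        # Eq. (2): minimum over the M classes
            best_class, best_dist = c, dist
    return best_class
```

With k=1 this sketch reduces to the 1-NN rule, consistent with the note above.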

3.2 The harmonic mean distance

The harmonic mean distance is introduced to measure the distance between a pair of point groups. We briefly describe the concept of the harmonic average, in order to clarify the rationale behind its usage in the proposed method.

Given a dataset with k elements $\{y_1, y_2, \cdots, y_k\}$, its harmonic average is defined as

$$HA(\{y_1, y_2, \cdots, y_k\}) = \frac{k}{\sum_{i=1}^{k} \frac{1}{y_i}} \quad (3)$$

Note that if one element in $y_1, y_2, \cdots, y_k$ is small, their harmonic average can be very small. In other words, the value of $HA(\{y_1, y_2, \cdots, y_k\})$ relies more on the elements with smaller values in $\{y_1, y_2, \cdots, y_k\}$.
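As a quick numerical illustration of this property (our own example, not from the paper), the snippet below compares the harmonic average of Eq. (3) with the arithmetic average on a small set containing one small element:

```python
import numpy as np

def harmonic_average(values):
    """Harmonic average of Eq. (3): k divided by the sum of reciprocals."""
    values = np.asarray(values, dtype=float)
    return len(values) / np.sum(1.0 / values)

sample = [0.5, 4.0, 5.0, 6.0]      # one element is much smaller than the others
print(harmonic_average(sample))    # ~1.53, dominated by the small element
print(np.mean(sample))             # 3.875, the arithmetic average
```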

The basic idea of the harmonic mean distance is to take the harmonic average of the Euclidean distances between one given data point and each data point in another point group. In this paper, we apply the harmonic mean distance to the KNN-based classification problem for the first time. The harmonic mean distance, which is denoted as $HMD(\cdot)$, is used to measure the distance between a query sample x and its related training sample group. For example, given a query sample x and its k nearest neighbor set $NN_k(x) = \{(y_i^{NN}, c_i^{NN})\}_{i=1}^{k}$ from the training sample set $Tr = \{(y_i, c_i)\}_{i=1}^{N_{tr}}$, the harmonic mean distance $HMD(x, \{y_i^{NN}\}_{i=1}^{k})$ between x and the set $NN_k(x)$ can be obtained as

$$HMD(x, \{y_i^{NN}\}_{i=1}^{k}) = \frac{k}{\sum_{i=1}^{k} \frac{1}{d(x, y_i^{NN})}} \quad (4)$$

In order to highlight the differences between the arithmetic mean distance and the harmonic mean distance used in the proposed method, their comparison is given in the following.

The arithmetic mean distance between x and its related k nearest neighbors $\{y_i^{NN}\}_{i=1}^{k}$, which is denoted as $AMD(x, \{y_i^{NN}\}_{i=1}^{k})$, can be expressed as the weighted sum of $d(x, y_i^{NN})$, $i = 1, 2, \cdots, k$, as in Eq. (5), while the value of $\partial AMD(x, \{y_i^{NN}\}_{i=1}^{k}) / \partial d(x, y_i^{NN})$ represents the weight of $d(x, y_i^{NN})$ in computing the final value of $AMD(x, \{y_i^{NN}\}_{i=1}^{k})$. As shown in Eq. (6), it can be proven that $\partial AMD(x, \{y_i^{NN}\}_{i=1}^{k}) / \partial d(x, y_i^{NN}) = \frac{1}{k}$ always holds, which means that the $d(x, y_i^{NN})$, $i = 1, 2, \cdots, k$, are considered equally important in the arithmetic mean distance.

$$AMD(x, \{y_i^{NN}\}_{i=1}^{k}) = \frac{\sum_{i=1}^{k} d(x, y_i^{NN})}{k} \quad (5)$$

$$\frac{\partial AMD(x, \{y_i^{NN}\}_{i=1}^{k})}{\partial d(x, y_i^{NN})} = \frac{\partial \left[ \frac{\sum_{i=1}^{k} d(x, y_i^{NN})}{k} \right]}{\partial d(x, y_i^{NN})} = \frac{1}{k} \quad (6)$$

However, the value of $\partial HMD(x, \{y_i^{NN}\}_{i=1}^{k}) / \partial d(x, y_i^{NN})$, which represents the weight of $d(x, y_i^{NN})$ in computing $HMD(x, \{y_i^{NN}\}_{i=1}^{k})$, is totally different, because $\partial HMD(x, \{y_i^{NN}\}_{i=1}^{k}) / \partial d(x, y_i^{NN})$ is inversely proportional to the value of $d^2(x, y_i^{NN})$, as shown in Eq. (7). Compared with the arithmetic mean distance, the harmonic mean distance focuses more on the influence of the samples that have a closer distance to the query sample x. Moreover, if some $y_i^{NN}$ in $\{y_i^{NN}\}_{i=1}^{k}$ has a small distance to x, the value of $HMD(x, \{y_i^{NN}\}_{i=1}^{k})$ will be small.

$$\frac{\partial HMD(x, \{y_i^{NN}\}_{i=1}^{k})}{\partial d(x, y_i^{NN})} = \frac{\partial \left[ \frac{k}{\sum_{i=1}^{k} \frac{1}{d(x, y_i^{NN})}} \right]}{\partial d(x, y_i^{NN})} = k \times \left( \frac{1}{d(x, y_i^{NN}) \times \sum_{i=1}^{k} \frac{1}{d(x, y_i^{NN})}} \right)^2 = \frac{HMD^2(x, \{y_i^{NN}\}_{i=1}^{k})}{k \times d^2(x, y_i^{NN})} \quad (7)$$
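The contrast between Eqs. (6) and (7) can also be checked numerically. The short sketch below (illustrative code, not from the paper) perturbs one distance at a time: the arithmetic mean distance changes by the same amount 1/k for every neighbor, while the harmonic mean distance is far more sensitive to a change in the smallest distance.

```python
import numpy as np

def amd(d):               # arithmetic mean distance, Eq. (5)
    return np.mean(d)

def hmd(d):               # harmonic mean distance, Eq. (4)
    return len(d) / np.sum(1.0 / d)

d = np.array([0.5, 2.0, 4.0, 8.0])          # distances d(x, y_i^NN) to the query
eps = 1e-6
for i in range(len(d)):
    dp = d.copy()
    dp[i] += eps                            # perturb only the i-th distance
    w_amd = (amd(dp) - amd(d)) / eps        # numerical weight, always 1/k = 0.25
    w_hmd = (hmd(dp) - hmd(d)) / eps        # much larger for the smallest distance
    print(i, round(w_amd, 3), round(w_hmd, 3))
```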

4. The proposed MLM-KHNN classifier

In this section, we describe the proposed multi-local means-based k-harmonic nearest neighbor (MLM-KHNN) classifier. The goal of the MLM-KHNN rule is to improve the classification performance and to reduce the sensitivity to the single value of the neighborhood size k for each class and the uniform value of k for all classes in the LMKNN rule.

The mechanism of the LMKNN rule is to assign to the query sample the class label

of its most similar local subclass among the different classes, where the local mean vector is used as the representation of the local subclass. Obviously, the local subclasses are obtained based on the k nearest neighbors in each class. Thus, the choice of the parameter k is of great importance to generate a local mean vector that can better represent its own class. However, there are two main problems with the choice of the parameter k in the LMKNN rule, which may lead to misclassification. Firstly, only a fixed single value of k is employed in each class, which may lead to a high sensitivity of the local mean to the value of k. If k is too small, the useful classification information may be insufficient, whereas a large value of k can easily lead to outliers being included in the k nearest neighbors of the true class (Gou et al., 2014). Secondly, a uniform value of k is employed in all classes. Since the local sample distributions in different classes are quite different, the value of k for selecting the most similar local subclass of the query for each class is usually very different as well. Therefore, it is unreasonable to use the same value of k for all classes, as in the LMKNN rule.

From Eq. (1), we have:

$$d(x, m_{\omega_j}^k) = \sqrt{(x - m_{\omega_j}^k)^T (x - m_{\omega_j}^k)} = \sqrt{\left(x - \frac{1}{k}\sum_{i=1}^{k} y_{i,j}^{NN}\right)^T \left(x - \frac{1}{k}\sum_{i=1}^{k} y_{i,j}^{NN}\right)} = \frac{1}{k}\sqrt{\left\|\sum_{i=1}^{k}(x - y_{i,j}^{NN})\right\|^2} = \frac{1}{k}\left\|(x - y_{1,j}^{NN}) + \cdots + (x - y_{k,j}^{NN})\right\| \quad (8)$$

where $\|\cdot\|$ denotes the Euclidean norm. From Eq. (8), it can be seen that the difference vector between x and each sample $y_{i,j}^{NN}$ in $NN_{\omega_j}^k(x)$ is considered when computing the distance between x and the local mean vector $m_{\omega_j}^k$. Taking Fig. 1(a) as an example, when $\|(x - y_{1,2}^{NN}) + (x - y_{2,2}^{NN}) + (x - y_{3,2}^{NN})\| < \|(x - y_{1,1}^{NN}) + (x - y_{2,1}^{NN}) + (x - y_{3,1}^{NN})\|$ and $(x - y_{3+i,2}^{NN}) > (x - y_{3+i,1}^{NN})$, $i = 1, 2, \cdots$, it is obvious that the classification result is very sensitive when using a fixed single value of k=3. In this example, the query x will be correctly classified into class $\omega_1$ when k=4, but it will obtain the wrong classification result when k=3. Actually, class $\omega_2$ only has a very few samples with quite close distances to the query sample x. A similar example is shown in Fig. 1(b), where a large value of k can easily lead to misclassification due to the outliers existing in the k nearest neighbors of the true class of the query sample. As shown in Fig. 1(b), the outlier samples $y_{7,1}^{NN}$ and $y_{8,1}^{NN}$ in class $\omega_1$ pull its local mean vector $m_{\omega_1}^k$ away from x due to the unsuitable selection of the neighborhood size k. Thus, it can be inferred that a fixed single value of k in the LMKNN rule may limit its classification performance.

Additionally, owing to the diversity of the local sample distributions in different classes, the value of k that yields the nearest local mean vector of the query sample x for each class may be quite different. However, the LMKNN classifier chooses a uniform value of k for all classes, which may result in selecting extra outliers in some classes and an inadequate number of nearest neighbors in other classes. In other words, it may not always select an effective subclass to represent each class, because the uniform value of k ignores the differences of the local sample distributions in different classes, hence leading to misclassification.

(a) The query pattern x from class $\omega_1$ is misclassified to $\omega_2$ when k is unsuitably small in the LMKNN rule
(b) The query pattern x from class $\omega_1$ is misclassified to $\omega_2$ when k is unsuitably large in the LMKNN rule

Fig. 1. Misclassification examples for a two-class classification case in the LMKNN rule

In order to solve this problem, we propose a new KNN-based classifier, called the multi-local means-based k-harmonic nearest neighbor (MLM-KHNN) rule. In the proposed method, k different local mean vectors based on the k nearest neighbors in

each class are first computed. Clearly, the k multi-local mean vectors in each class have different distances to the query sample, and a nearer local mean vector is more important for representing its own class in classification. In other words, we want to focus more on the values of k that can find a closer local subclass, where the values of k in each class can be totally different. To do so, we introduce the harmonic mean distance between the group of multi-local mean vectors in each class and the query sample x to measure their similarity, and finally classify the query sample into the class with the minimum harmonic mean distance. The proposed MLM-KHNN classifier achieves a lower classification error rate and is less sensitive to the neighborhood size k when compared with the LMKNN and other related KNN-based classifiers.

The MLM-KHNN classifier, as a new version of the KNN rule and the LMKNN rule, is presented below. The proposed method essentially shows two significant advantages when compared with the LMKNN rule. First, the k local mean vectors based on the top $r$ ($1 \le r \le k$) nearest neighbors are all computed and used after the k nearest neighbors have been found in each class; these are called the multi-local means. Second, the harmonic mean distance between the group of k multi-local mean vectors and the query sample in each class is introduced to measure similarity for a more accurate classification, which takes into account the influence of different values of k in different classes.

Let $Tr = \{(y_i, c_i)\}_{i=1}^{N_{tr}}$ be a training sample set with $N_{tr}$ training samples from M classes, where $y_i \in R^D$ and $c_i$ is the corresponding class label of $y_i$, $c_i \in \{\omega_1, \omega_2, \cdots, \omega_M\}$. For class $\omega_j$, suppose that $Tr_j = \{(y_i^j, c_i^j)\}_{i=1}^{N_j}$ denotes the training sample set of class $\omega_j$ with $N_j$ training samples. In the MLM-KHNN rule, the class label $\omega_c$ of a given query sample $x \in R^D$ is obtained as follows:

Step 1. For each class $\omega_j$, find the k nearest neighbors of x, denoted by $NN_{\omega_j}^k(x) = \{(y_{i,j}^{NN}, c_{i,j}^{NN})\}_{i=1}^{k}$, from $Tr_j$, which are then sorted in ascending order according to their Euclidean distances to x.

Step 2. Compute k multi-local mean vectors based on the top r ($1 \le r \le k$) nearest neighbors of x from $Tr_j$ in each class $\omega_j$. For class $\omega_j$, let $m_{\omega_j}^k(x) = \{\bar{m}_{\omega_j}^r\}_{r=1}^{k}$ denote the k multi-local mean vectors:

$$\bar{m}_{\omega_j}^r = \frac{1}{r} \sum_{i=1}^{r} y_{i,j}^{NN} \quad (9)$$

Note that $\bar{m}_{\omega_j}^r \in R^D$, and their corresponding Euclidean distances to x are denoted by $d(x, \bar{m}_{\omega_j}^1), d(x, \bar{m}_{\omega_j}^2), \cdots, d(x, \bar{m}_{\omega_j}^k)$.

Step 3. For each class $\omega_j$, compute the harmonic mean distance between x and the k multi-local mean vectors $m_{\omega_j}^k(x) = \{\bar{m}_{\omega_j}^r\}_{r=1}^{k}$ obtained from Step 2:

$$HMD(x, m_{\omega_j}^k(x)) = \frac{k}{\sum_{r=1}^{k} \frac{1}{d(x, \bar{m}_{\omega_j}^r)}} \quad (10)$$

Step 4. Assign x to the class $\omega_c$ that has the minimum harmonic mean distance to x in terms of Eq. (11):

$$\omega_c = \arg\min_{\omega_j} HMD(x, m_{\omega_j}^k(x)) \quad (11)$$

Note that when k=1, $m_{\omega_j}^1(x)$ only has one local mean vector, which is equal to $y_{1,j}^{NN}$, and the harmonic mean distance between x and $m_{\omega_j}^1(x)$ is computed as $HMD(x, m_{\omega_j}^1(x)) = d(x, \bar{m}_{\omega_j}^1) = d(x, y_{1,j}^{NN})$. Thus, the MLM-KHNN rule degrades into the LMKNN rule and the KNN rule when k=1, and all of them have the same classification performance as the 1-NN classifier. The pseudocode of the MLM-KHNN classifier is summarized in detail in Algorithm 1.

Algorithm 1. The proposed MLM-KHNN classifier

Input:
  x: the query sample;
  k: the neighborhood size;
  $Tr = \{(y_i, c_i)\}_{i=1}^{N_{tr}}$: the training dataset;
  $Tr_j = \{(y_i^j, c_i^j)\}_{i=1}^{N_j}$: the training dataset of class $\omega_j$ with $N_j$ training samples;
  M: the number of classes;
  $\omega_1, \omega_2, \cdots, \omega_M$: the class labels.
Output:
  $\omega_c$: the classification result of the query sample x.
Procedure:
Step 1: Find the k nearest neighbors of x in each class $\omega_j$, sorted in ascending order according to their Euclidean distances to x, as:
    $NN_{\omega_j}^k(x) = \{(y_{i,j}^{NN}, c_{i,j}^{NN})\}_{i=1}^{k}$
Step 2: Compute the Euclidean distance $d(x, \bar{m}_{\omega_j}^r)$ between x and the local mean vector $\bar{m}_{\omega_j}^r$ of the top r nearest neighbors of x in $NN_{\omega_j}^k(x)$ for each class $\omega_j$ as:
    for j = 1 to M do
      for r = 1 to k do
        $\bar{m}_{\omega_j}^r = \frac{1}{r} \sum_{i=1}^{r} y_{i,j}^{NN}$
        $d(x, \bar{m}_{\omega_j}^r) = \sqrt{(x - \bar{m}_{\omega_j}^r)^T (x - \bar{m}_{\omega_j}^r)}$
      end for
    end for
Step 3: Compute the harmonic mean distance $HMD(x, m_{\omega_j}^k(x))$ between x and the k multi-local mean vectors in $m_{\omega_j}^k(x) = \{\bar{m}_{\omega_j}^r\}_{r=1}^{k}$ for each class $\omega_j$ as:
    for j = 1 to M do
      $HMD(x, m_{\omega_j}^k(x)) = \frac{k}{\sum_{r=1}^{k} \frac{1}{d(x, \bar{m}_{\omega_j}^r)}}$
    end for
Step 4: Assign x to the class $\omega_c$ with the minimum harmonic mean distance as:
    $\omega_c = \arg\min_{\omega_j} HMD(x, m_{\omega_j}^k(x))$
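For reference, here is a compact NumPy sketch of Algorithm 1; the function and variable names (mlm_khnn_predict, train_X, train_y) are our own and are not defined in the paper.

```python
import numpy as np

def mlm_khnn_predict(query, train_X, train_y, k):
    """Sketch of Algorithm 1: classify `query` by the minimum harmonic mean
    distance between the query and the k multi-local means of each class."""
    best_class, best_hmd = None, np.inf
    for c in np.unique(train_y):
        Xc = train_X[train_y == c]                        # Tr_j: samples of class c
        order = np.argsort(np.linalg.norm(Xc - query, axis=1))
        nn = Xc[order[:k]]                                # Step 1: sorted k nearest neighbors
        # Step 2: local means of the top r neighbors, r = 1..k (running means, Eq. (9))
        multi_means = np.cumsum(nn, axis=0) / np.arange(1, len(nn) + 1)[:, None]
        dists = np.linalg.norm(multi_means - query, axis=1)
        hmd = len(dists) / np.sum(1.0 / dists)            # Step 3: Eq. (10)
        if hmd < best_hmd:                                # Step 4: minimum HMD over classes
            best_class, best_hmd = c, hmd
    return best_class
```

Setting k=1 makes the single multi-local mean equal to the nearest neighbor itself, so the sketch then behaves like the 1-NN rule, as stated above.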


5. Experiments

In contrast with the LMKNN classifier (Mitani et al., 2006) and the classical KNN classifier (Cover et al., 1967), the proposed MLM-KHNN classifier does not choose the local mean vectors using a fixed single value of the neighborhood size k. Instead, as many as k multi-local mean vectors based on the k nearest neighbors in each class are employed, and the harmonic mean distance between the group of multi-local mean vectors in each class and the query sample is utilized to emphasize the importance of closer local mean vectors in classification.

To validate the classification performance of the proposed method, we compare the MLM-KHNN classifier with the standard KNN classifier and eight other competitive KNN-based classifiers: KNN (Cover et al., 1967), WKNN (Dudani, 1976), LMKNN (Mitani et al., 2006), PNN (Zeng et al., 2009), LMPNN (Gou et al., 2014), MKNN (Liu & Zhang, 2012), CFKNN (Xu et al., 2013), FRNN (Sarkar, 2007) and HBKNN (Yu et al., 2016). The comparative experiments are conducted extensively on ten UCI (Merz & Murphy, 1996) and ten KEEL (Alcalá-Fdez et al., 2011) real-world datasets in terms of error rate, which is one of the most important measures in pattern classification. In our experiments, the parameter k is optimized by the cross-validation (CV) (Toussaint, 1974) approach for each classifier, with the same approach described

in (Mitani et al., 2006; Zeng et al., 2009; Gou et al., 2014). In order to better evaluate the sensitivity of the proposed classifier to the neighborhood size k, comparative experiments on the classification performance with a varying neighborhood size k are also conducted.

5.1 Datasets

In this subsection, we briefly summarize the selected datasets considered in our

experiments. The twenty real-world datasets are all taken from the UCI machine-learning repository and the KEEL repository: Air, Balance, German, Glass, Ionosphere, Landsat, Monk-2, Optigits, Phoneme, Ring, Saheart, Spambase, Segment, Tae, Texture, Thyroid, Vehicle, Vote, Vowel and Wine. These twenty real-world datasets have quite different characteristics in terms of the numbers of samples, attributes and classes, which are listed in Table 1. The numbers of samples of these datasets span a wide range, varying from 151 to 7400, in order to comprehensively validate the proposed method.

Since we are interested in the classification performance of the KNN-based classifiers for small training sample sizes, the size of the training set needs to be determined in advance. The same approach as in (Mitani et al., 2006; Zeng et al., 2009; Gou et al., 2014) is adopted in our experiments. For each dataset, we randomly chose a training set that contains approximately 30% of the data, and the remaining samples are used as testing data, as shown in Table 1. The experiments are repeated 50 times, and the average error rate with a 95% confidence interval over these 50 repetitions is reported.
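A rough sketch of this evaluation protocol is given below (our own illustration: the synthetic data, the fixed k and the confidence-interval formula are assumptions, and the per-classifier CV selection of k is omitted). It reuses the mlm_khnn_predict sketch from Section 4.

```python
import numpy as np

def holdout_error_rate(X, y, n_train, predict_fn, k, rng):
    """One holdout trial: random training subset, error rate on the remaining samples."""
    idx = rng.permutation(len(X))
    tr, te = idx[:n_train], idx[n_train:]
    wrong = sum(predict_fn(X[i], X[tr], y[tr], k) != y[i] for i in te)
    return wrong / len(te)

# Synthetic two-class data standing in for a UCI/KEEL dataset (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (150, 5)), rng.normal(1.5, 1.0, (150, 5))])
y = np.array([0] * 150 + [1] * 150)

# 50 random trials with roughly 30% of the data used for training.
rates = np.array([holdout_error_rate(X, y, 90, mlm_khnn_predict, k=7, rng=rng)
                  for _ in range(50)])
half_width = 1.96 * rates.std(ddof=1) / np.sqrt(len(rates))   # assumed 95% CI formula
print(f"error rate: {100 * rates.mean():.2f}% +/- {100 * half_width:.2f}%")
```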

Table 1
Dataset description of the twenty real-world datasets from the UCI and KEEL repositories.

Dataset      Database   Samples   Attributes   Classes   Training set
Air          UCI        359       64           3         120
Balance      UCI        625       4            3         200
German       UCI        1000      24           2         300
Glass        UCI        214       9            6         70
Ionosphere   UCI        351       34           2         117
Landsat      UCI        2000      36           6         700
Monk-2       KEEL       432       6            2         144
Optigits     KEEL       5620      64           10        1900
Phoneme      KEEL       5404      5            2         1800
Ring         KEEL       7400      20           2         2400
Saheart      KEEL       462       9            2         154
Segment      UCI        2310      18           7         770
Spambase     KEEL       4597      57           2         1500
Tae          KEEL       151       5            3         50
Texture      KEEL       5500      40           11        1800
Thyroid      KEEL       7200      21           3         2400
Vehicle      UCI        846       18           4         282
Vote         UCI        435       16           2         145
Vowel        KEEL       990       13           11        330
Wine         UCI        178       13           3         60

5.2 Experiments on real-world datasets

5.2.1 Results of the classification performance

As previously stated, the classification performance of the proposed MLM-KHNN classifier is compared to the KNN, WKNN, LMKNN, PNN, LMPNN, MKNN, CFKNN, FRNN and HBKNN rules over twenty real-world datasets from UCI and KEEL by means of the error rate. The WKNN rule is a well-known distance-weighted k-nearest neighbor classifier in which larger weights are given to the neighbors with a smaller distance to the query sample. The LMKNN rule employs the local mean vector in each class to classify the query sample, as described in Section 3.1. Based on the LMKNN rule, the PNN and LMPNN rules were both successfully developed in order to obtain a better classification performance. The PNN rule first obtains the pseudo nearest neighbor in each class, and then assigns the label of the closest pseudo nearest neighbor to the query sample, whereas the LMPNN rule integrates the ideas of both the LMKNN and PNN methods. The MKNN rule is based on the application of the mutual nearest neighbor to obtain more reliable nearest neighbors. The CFKNN rule uses the coarse-to-fine strategy to obtain a revised training dataset that can better represent the query sample. The FRNN rule is a well-known fuzzy-based k-nearest neighbor classifier with richer class confidence values based on the fuzzy-rough ownership function. Finally, the HBKNN rule combines the fuzzy membership of the fuzzy k-nearest neighbor classifier and the local information of the LMKNN classifier.

The experiments are carried out by a 50-trial holdout procedure, where for each trial the dataset is randomly divided into testing samples and training samples with the sizes shown in Table 1. In our experiments, the value of the neighborhood size k for each classifier is first optimized by the CV approach on the training set, and then the average error rate with a 95% confidence interval under the optimized parameter k is used for the final classification performance evaluation.

Table 2
The error rate (%) of each classifier with the corresponding 95% confidence interval on twenty real-world datasets.

Data        KNN          WKNN         LMKNN        PNN          LMPNN        MKNN         CFKNN        FRNN         HBKNN        MLM-KHNN
Air         21.44±1.32   21.44±1.32   20.50±1.06   21.44±1.32   18.95±1.14   20.73±1.37   24.54±1.25   27.33±1.06   22.36±1.54   18.66±1.11
Balance     13.11±0.72   13.26±0.69   11.20±0.80   12.95±0.67   12.54±0.64   13.02±0.73   12.02±0.64   13.33±0.81   13.77±0.65   10.71±0.74
German      30.74±0.42   30.42±0.54   30.90±0.61   30.75±0.50   31.10±0.66   30.56±0.57   29.64±0.63   28.58±0.77   31.50±0.65   30.33±0.77
Glass       35.80±1.43   34.38±1.65   35.80±1.43   34.72±1.59   34.03±1.26   33.92±1.42   36.83±1.28   36.49±2.05   33.49±1.58   33.03±1.42
Ionosphere  16.35±1.04   15.62±1.00   14.06±1.26   14.83±1.08   14.23±0.96   14.64±1.04   12.09±0.67   16.54±1.00   15.17±1.23   13.03±1.05
Landsat     13.36±0.33   12.87±0.37   12.18±0.38   12.74±0.34   11.56±0.36   11.94±0.26   17.50±0.45   16.34±0.36   12.15±0.40   11.43±0.40
Monk-2      5.17±0.75    6.09±1.01    4.62±0.49    4.60±0.80    4.44±0.54    4.65±0.77    8.35±1.25    7.21±1.65    2.88±0.52    4.42±0.52
Optigits    1.99±0.36    1.86±0.51    1.49±0.37    1.88±0.59    1.60±0.28    1.78±0.51    1.96±0.56    2.95±0.67    2.48±0.59    1.44±0.31
Phoneme     13.15±0.28   13.01±0.34   13.18±0.34   12.99±0.35   12.68±0.32   13.06±0.43   26.60±0.45   17.39±0.28   12.65±0.35   12.43±0.27
Ring        28.08±0.39   28.08±0.39   9.59±0.42    28.08±0.39   8.71±0.41    17.65±0.32   23.49±0.46   31.16±0.54   31.08±0.42   7.93±0.35
Saheart     8.30±1.80    8.12±1.74    8.30±1.80    8.30±1.80    7.66±1.79    8.05±1.68    12.48±2.12   7.41±1.81    12.37±1.69   7.41±1.68
Segment     6.83±0.27    6.83±0.27    6.59±0.26    6.72±0.32    6.00±0.27    6.83±0.27    11.62±0.35   6.33±0.30    5.93±0.30    5.80±0.27
Spambase    22.76±0.45   21.74±0.33   20.72±0.25   21.13±0.23   19.13±0.24   22.35±0.43   21.36±0.27   21.38±0.27   21.25±0.24   17.84±0.32
Tae         56.44±2.37   56.38±2.48   56.32±2.20   56.28±2.06   55.14±2.50   56.43±2.37   60.40±2.01   54.45±2.46   55.74±3.00   53.16±2.39
Texture     2.16±0.13    2.08±0.20    1.84±0.20    1.97±0.13    1.65±0.17    1.84±0.17    11.97±0.28   4.81±0.14    3.19±0.21    1.57±0.15
Thyroid     6.54±0.19    6.37±0.21    6.23±0.25    6.42±0.23    6.10±0.20    6.18±0.21    8.33±0.31    7.39±0.27    6.77±0.25    6.02±0.20
Vehicle     34.01±0.68   33.08±0.79   30.73±0.73   33.08±0.85   29.50±0.75   32.61±0.78   29.57±0.76   34.45±1.08   31.81±0.86   28.01±0.73
Vote        8.88±0.52    8.93±0.60    7.22±0.63    9.00±0.65    7.21±0.53    8.38±0.45    7.60±0.66    8.01±0.68    8.03±0.65    6.69±0.57
Vowel       16.56±0.76   16.56±0.76   16.56±0.76   16.56±0.76   16.56±0.76   16.38±0.77   18.02±0.94   22.83±0.95   15.36±0.79   16.09±0.86
Wine        23.33±2.34   23.33±2.34   23.33±2.34   23.33±2.34   23.27±2.47   23.22±2.34   22.78±2.80   15.89±1.50   22.97±1.46   21.39±2.77
Average     18.25±0.83   18.02±0.88   16.57±0.83   17.89±0.90   16.10±0.86   17.21±0.84   19.85±0.91   18.41±0.93   18.04±0.87   15.36±0.84

Table 2 collects the comparison results in terms of the classification performance of each classifier by giving the average error rate under the optimized value of the neighborhood size k; the corresponding 95% confidence interval is also listed. From the experimental results in Table 2, it can be observed that the proposed MLM-KHNN classifier outperforms all nine state-of-the-art KNN-based methods on almost all of the twenty real-world datasets. In particular, the MLM-KHNN classifier significantly reduces the error rate of the standard LMKNN rule by introducing the concepts of multi-local mean vectors and harmonic mean distance similarity, as it can focus on the more reliable local mean vectors with smaller distances to the query sample in each class. With regard to the CFKNN, FRNN and HBKNN classifiers, although they achieve quite low error rates on a few datasets, the proposed MLM-KHNN classifier is still able to outperform them on most of the datasets.

In order to further validate the classification performance of the proposed MLM-KHNN classifier under different values of the parameter k, the average classification error rates over all twenty datasets of the KNN, WKNN, LMKNN, PNN, LMPNN, MKNN, CFKNN and MLM-KHNN classifiers for k=1~15 with a step size of 2 are listed in Table 3. Note that the average classification error rates under different choices of the neighborhood size k of the FRNN and the HBKNN classifiers are not listed in Table 3. This is due to the fact that the FRNN classifier employs all training samples for classification and does not need to obtain the optimal value of k, while the HBKNN classifier empirically uses a default value of k. Thus, their error rates on each dataset are constant and independent of the parameter k.

the parameter k.

The average error rate (%) on twenty real-world datasets of each classifier with the corresponding 95%

Data

CE

confidence interval when k=1~15 with the step size of 2. KNN

WKNN

LMKNN

PNN

LMPNN

MKNN

CFKNN

MLM-KHNN

19.49±0.88

19.49±0.88

19.49±0.88

19.49±0.88

19.49±0.88

19.49±0.88

23.44±1.07

19.49±0.88

20.62±0.85

19.35±0.93

18.29±0.79

19.27±0.89

17.56±0.86

19.36±0.85

22.45±0.93

16.79±0.85

k=5

21.79±0.83

19.23±0.88

18.72±0.77

19.45±0.88

17.18±0.86

19.61±0.84

22.79±0.90

16.18±0.88

k=7

22.53±0.91

19.46±0.87

19.21±0.80

19.67±0.86

17.01±0.86

19.84±0.77

22.79±0.88

16.03±0.87

k=9

23.30±0.88

19.71±0.82

19.98±0.80

20.02±0.84

16.91±0.83

20.01±0.79

22.59±0.79

16.04±0.86

k=11

24.14±0.96

19.95±0.86

20.46±0.82

20.27±0.90

16.93±0.81

20.21±0.78

22.52±0.83

16.05±0.88

k=13

24.90±0.94

20.27±0.86

20.04±0.80

20.50±0.90

16.93±0.80

20.46±0.78

22.61±0.88

16.08±0.87

k=15

25.51±0.95

20.53±0.87

20.48±0.77

20.78±0.89

16.96±0.79

20.74±0.80

22.50±0.88

16.14±0.87

k=1

AC

k=3

24

ACCEPTED MANUSCRIPT

From the results in Table 3, it can be found that the MLM-KHNN classifier achieves the best classification performance in all cases, especially when compared with the other LMKNN-based methods. As discussed in Section 4, an inappropriate value of k can easily lead to a higher error rate due to existing outliers; however, the results in Table 3 indicate that the proposed method is less sensitive to outliers and can potentially reduce the error rate independently of the neighborhood size k.

5.2.2 Results of the sensitivity to the neighborhood size k

To evaluate the sensitivity of the classification performance to the parameter k, the error rates of the different classifiers with a varying neighborhood size k are also compared.

The experimental comparisons of the error rate with the parameter k changing from 1 to 15 on the twenty real-world datasets are shown in Fig. 2. From the results shown in Fig. 2, it can be seen that the MLM-KHNN classifier has a lower error rate than the other methods under different values of k in most cases. Particularly, when the value of the neighborhood size k is relatively large, the improvements in classification performance of the MLM-KHNN over the KNN, WKNN, PNN, LMPNN, MKNN and CFKNN can be very significant. As previously stated, the classification performances of the FRNN and the HBKNN classifiers do not rely on the choice of k, hence they are not shown in Fig. 2.

Moreover, the error rate curve of the proposed MLM-KHNN method is

shown to be smoother and flatter when compared with the other classifiers that are based on the LMKNN rule, such as the LMKNN, PNN and LMPNN classifiers. In general, we can see that the proposed MLM-KHNN can both improve the classification performance and be less sensitive to the neighborhood size k.

[Fig. 2 consists of twenty panels, (a) Air, (b) Balance, (c) German, (d) Glass, (e) Ionosphere, (f) Landsat, (g) Monk-2, (h) Optigits, (i) Phoneme, (j) Ring, (k) Saheart, (l) Segment, (m) Spambase, (n) Tae, (o) Texture, (p) Thyroid, (q) Vehicle, (r) Vote, (s) Vowel and (t) Wine, each plotting the error rate (%) against the neighborhood size k for the KNN, WKNN, LMKNN, PNN, LMPNN, MKNN, CFKNN and MLM-KHNN classifiers.]

Fig. 2. The error rates of each classifier by changing the value of k on twenty real-world datasets

6. Discussions

In pattern classification, computational complexity is an important issue when designing an effective classifier for practical applications. To better explain the merits of the proposed MLM-KHNN classifier, comparisons of the computational complexities of the LMKNN and MLM-KHNN classifiers are further discussed in this section. We only focus on the complexity of the online computations in the classification stage.

Let $N_{tr}$ denote the number of training samples, $N_j$ denote the number of training samples in class $\omega_j$, p represent the feature dimensionality, M denote the number of classes and k represent the neighborhood size.

For the traditional LMKNN classifier, the classification stage mainly contains three steps. The first step is to search for the k nearest neighbors in each class based on the Euclidean distance; the multiplications and sum operations are both equal to $O(N_1 p + N_2 p + \cdots + N_M p)$, which can also be abbreviated to $O(N_{tr} p)$. Additionally, the comparisons are $O(N_1 k + N_2 k + \cdots + N_M k)$, which is equal to $O(N_{tr} k)$. The second step is to compute the local mean vector of each class, which requires $O(Mpk)$ sum operations. The third and final step assigns the query sample to the class with the smallest distance between its local mean and the given query, and is characterized by $O(Mp)$ multiplications and sum operations, whereas the class label is determined with $O(M)$ comparisons. Thus, the total computational complexity of the LMKNN rule is $O(2N_{tr} p + N_{tr} k + Mpk + 2Mp + M)$.

For the proposed MLM-KHNN classifier, the classification stage consists of four

steps. The first step is the same as in the LMKNN rule. During the second step, the MLM-KHNN method obtains the k multi-local mean vectors. It can be easily shown from the steps in Section 4 that $m_{\omega_j}^r(x) = m_{\omega_j}^{r-1}(x) \cup \{\bar{m}_{\omega_j}^r\}$, $r = 1, 2, \cdots, k$, and $m_{\omega_j}^{r-1}(x)$ can be reused from the former computations as r varies. Thus, the sum operations are equal to $O(Mpk)$. In the third step, the harmonic mean distance between the query x and the k multi-local mean vectors is calculated for each class, which requires $O(Mpk)$ multiplications and $O(Mpk + Mk)$ sum operations, as illustrated in Eq. (10). Then, in the final step, the proposed method classifies the query sample into the class with the minimum harmonic mean distance to the given query with $O(M)$ comparisons. Thus, the total computational complexity of the MLM-KHNN rule is $O(2N_{tr} p + N_{tr} k + 3Mpk + Mk + M)$.
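The reuse of $m_{\omega_j}^{r-1}(x)$ mentioned above amounts to maintaining a running sum, so that each additional local mean costs only O(p) extra additions. A small illustrative NumPy snippet (our own, not from the paper):

```python
import numpy as np

def multi_local_means(nn_sorted):
    """Local means of the top r neighbors for r = 1..k, computed by reusing the
    running sum of the first r-1 neighbors (O(kp) additions in total, Eq. (9))."""
    running_sum = np.zeros(nn_sorted.shape[1])
    means = []
    for r, neighbor in enumerate(nn_sorted, start=1):
        running_sum += neighbor          # reuse the previous partial sum
        means.append(running_sum / r)    # local mean of the top r neighbors
    return np.asarray(means)
```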

From the above analysis, it can be seen that the increased computational costs of the proposed method are $O(2Mpk + Mk - 2Mp) = O(2Mp(k-1) + Mk) > 0$. Since the number of classes M and the neighborhood size k are usually much smaller than the training sample size $N_{tr}$, the computational differences are rather small.
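As a rough numerical illustration of this point (the problem sizes below are assumed for the example and are not taken from the paper), the extra cost is a small fraction of the distance computations that both rules share:

```python
# Dominant operation counts from the complexity expressions above (illustrative values).
N_tr, p, M, k = 1900, 64, 10, 15           # training size, dimensionality, classes, neighbors

shared = 2 * N_tr * p + N_tr * k           # O(2*N_tr*p + N_tr*k): common to LMKNN and MLM-KHNN
extra = 2 * M * p * (k - 1) + M * k        # O(2*M*p*(k-1) + M*k): additional cost of MLM-KHNN

print(shared, extra)                       # 271700 vs. 18070: the overhead is under 7%
```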

7. Conclusions

CE

In this paper, we proposed a new KNN-based classifier, which is called

AC

multi-local means-based k-harmonic nearest neighbor (MLM-KHNN) classifier. The proposed method aims at improving the classification performance of the KNN-based schemes, especially the local mean-based KNN classification approaches, such as the LMKNN, PNN and LMPNN classifiers. To overcome the negative influence of the single and uniform value of k used in the local mean-based KNN classifiers, the proposed method significantly enhances the classification performance mainly from

30

ACCEPTED MANUSCRIPT

two key factors. The first improvement resides in the fact that in the MLM-KHNN rule, as many as k different local mean vectors that we call multi-local mean vectors are computed in each class to achieve a more robust classification performance with lower error rate. The second improvement resides in the fact that we first applied the

CR IP T

harmonic mean distance to measure the similarity, which allows it to focus on the more reliable local mean vectors that have smaller distances to the query sample.

To evaluate the classification performance of the proposed MLM-KHNN rule,

AN US

nine state-of-the-art KNN-based approaches have been compared: KNN, WKNN, LMKNN, PNN, LMPNN, MKNN, CFKNN, FRNN and HBKNN classifiers. Among them, in addition to the standard KNN classifier and three local mean-based KNN

M

classifiers, we also compared our method with five other nearest neighbor classifiers related to different research topics of the KNN rule, such as the distance weighting,

ED

noisy data elimination and fuzzy nearest neighbor classifier. Experimental results on

PT

twenty real-world datasets of the UCI and KEEL repository demonstrated that the proposed MLM-KHNN classifier achieves lower classification error rate and is less

CE

sensitive to outliers under different choices of neighborhood size k.

AC

Furthermore, it was shown that when compared with the traditional LMKNN rule, the increased computations required by the proposed MLM-KHNN method are only related to the number of classes M and the neighborhood size k, which are usually much smaller than the value of the training sample size 𝑁𝑡𝑟 . Therefore, the computational differences between the MLM-KHNN classifier and the LMKNN classifier are very small.

31

ACCEPTED MANUSCRIPT

Acknowledgement This work was supported in part by the Major Programs of National Natural Science Foundation of China (Grant No. 41390454), in part by the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No.

CR IP T

20130201110071), in part by the Open Project Program of the National Laboratory of Pattern Recognition (Grant No. 201407370) and in part by the Open Project Program

AN US

of the State Key Lab of CAD&CG (Grant No. A1512).

References

Alcalá-Fdez, J., Fernández, A., Luengo, J., Derrac, J., & García, S. (2011). Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. Journal of Multiple-Valued Logic & Soft Computing, 17, 255-287.

Bailey, T., & Jain, A. K. (2010). A note on distance-weighted k-nearest neighbor rules. IEEE Transactions on Systems Man and Cybernetics, 8(4), 311-313.

Bhattacharya, G., Ghosh, K., & Chowdhury, A. S. (2015). A probabilistic framework for dynamic k estimation in kNN classifiers with certainty factor. Eighth International Conference on Advances in Pattern Recognition, 1-5.

Chen, L., & Guo, G. (2015). Nearest neighbor classification of categorical data by attributes weighting. Expert Systems with Applications, 42(6), 3142-3149.

Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13, 21-27.

Dudani, S. A. (1976). The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems Man & Cybernetics, SMC-6(4), 325-327.

Garcia-Pedrajas, N., Del-Castillo, J. A., & Cerruela-Garcia, G. (2015). A proposal for local k values for k-nearest neighbor rule. IEEE Transactions on Neural Networks & Learning Systems.

Gou, J., Zhan, Y., Rao, Y., Shen, X., Wang, X., & He, W. (2014). Improved pseudo nearest neighbor classification. Knowledge-Based Systems, 70(C), 361-375.

Hu, Q., Yu, D., & Xie, Z. (2008). Neighborhood classifiers. Expert Systems with Applications, 34, 866-876.

Jiang, S., Pang, G., Wu, M., & Kuang, L. (2012). An improved K-nearest neighbor algorithm for text categorization. Expert Systems with Applications, 39, 1503-1509.

Keller, J. M., Gray, M. R., & Givens, J. A. (1985). A fuzzy k-nearest neighbor algorithm. IEEE Transactions on Systems Man & Cybernetics, SMC-15(4), 580-585.

Li, B., Chen, Y., & Chen, Y. (2008). The nearest neighbor algorithm of local probability centers. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, 38, 141-154.

Lin, Y., Li, J., Lin, M., & Chen, J. (2014). A new nearest neighbor classifier via fusing neighborhood information. Neurocomputing, 143(16), 164-169.

Liu, Z. G., Pan, Q., & Dezert, J. (2013). A new belief-based k-nearest neighbor classification method. Pattern Recognition, 46(3), 834-844.

Liu, Z. G., Pan, Q., Dezert, J., & Mercier, G. (2014). Fuzzy-belief K-nearest neighbor classifier for uncertain data. Fusion 2014 Int Conf on Information Fusion, 1-8.

Liu, H., & Zhang, S. (2012). Noisy data elimination using mutual k-nearest neighbor for classification mining. Journal of Systems & Software, 85(5), 1067-1074.

Mateos-García, D., García-Gutiérrez, J., & Riquelme-Santos, J. C. (2016). An evolutionary voting for k-nearest neighbours. Expert Systems with Applications, 43, 9-14.

Merz, C., & Murphy, P. (1996). UCI repository of machine learning databases.

Mitani, Y., & Hamamoto, Y. (2006). A local mean-based nonparametric classifier. Pattern Recognition Letters, 27(10), 1151-1159.

Rodger, J. A. (2014). A fuzzy nearest neighbor neural network statistical model for predicting demand for natural gas and energy cost savings in public buildings. Expert Systems with Applications, 41(4), 1813-1829.

Rodger, J. A. (2015). Discovery of medical big data analytics: improving the prediction of traumatic brain injury survival rates by data mining patient informatics processing software hybrid hadoop hive. Informatics in Medicine Unlocked, 1, 17-26.

Samsudin, N. A., & Bradley, A. P. (2010). Nearest neighbour group-based classification. Pattern Recognition, 43(10), 3458-3467.

Sarkar, M. (2007). Fuzzy-rough nearest neighbor algorithms in classification. Fuzzy Sets & Systems, 158(19), 2134-2152.

Toussaint, G. (1974). Bibliography on estimation of misclassification. IEEE Transactions on Information Theory, 20(4), 472-479.

Wagner, T. J. (1971). Convergence of the nearest neighbor rule. IEEE Transactions on Information Theory, 17(5), 566-571.

Wang, J., Neskovic, P., & Cooper, L. N. (2006). Neighborhood size selection in the k-nearest-neighbor rule using statistical confidence. Pattern Recognition, 39(3), 417-423.

Xia, W., Mita, Y., & Shibata, T. (2016). A nearest neighbor classifier employing critical boundary vectors for efficient on-chip template reduction. IEEE Transactions on Neural Networks & Learning Systems, 27(5), 1094-1107.

Xu, Y., Zhu, Q., Fan, Z., Qiu, M., Chen, Y., & Liu, H. (2013). Coarse to fine k nearest neighbor classifier. Pattern Recognition Letters, 34(9), 980-986.

Yang, M., Wei, D., & Tao, D. (2013). Local discriminative distance metrics ensemble learning. Pattern Recognition, 46(8), 2337-2349.

Yu, Z., Chen, H., Liu, J., & You, J. (2016). Hybrid k-nearest neighbor classifier. IEEE Transactions on Cybernetics, 46(6), 1263-1275.

Zeng, Y., Yang, Y., & Zhao, L. (2009). Pseudo nearest neighbor rule for pattern classification. Expert Systems with Applications, 36(2), 3587-3595.

Zeng, Y., Yang, Y. P., & Zhao, L. (2009). Nearest neighbour classification based on local mean and class mean. Expert Systems with Applications, 36(4), 8443-8448.

Zhang, N., Yang, J., & Qian, J. J. (2012). Component-based global k-nn classifier for small sample size problems. Pattern Recognition Letters, 33(13), 1689-1694.