A fitting model based intuitionistic fuzzy rough feature selection

Engineering Applications of Artificial Intelligence 89 (2020) 103421

Pankhuri Jain (MCA) a, Anoop Kumar Tiwari (PhD) b,∗, Tanmoy Som (PhD) a

a Department of Mathematical Sciences, IIT (BHU), Varanasi, 221005, India
b Department of Computer Science, BHU, Varanasi, 221005, India

ARTICLE INFO

Keywords: Rough set; Fuzzy set; Intuitionistic fuzzy set; Degree of dependency; Feature selection

ABSTRACT

Feature subset selection is an essential machine learning approach aimed at reducing the dimensionality of the input space. By removing irrelevant and/or redundant variables, it not only enhances model performance but also facilitates improved interpretability. Fuzzy set theory and rough set theory are two different but complementary theories, and their combination applies the fuzzy rough dependency as a criterion for performing feature subset selection. However, this criterion only aims to keep the dependency function maximal; it cannot adequately describe the differences in object classification and does not fit a particular dataset well. This problem was handled by using a fitting model for feature selection with fuzzy rough sets. However, intuitionistic fuzzy set theory can deal with uncertainty much better than fuzzy set theory, as it simultaneously considers the positive, negative, and hesitancy degrees of an object belonging to a particular set. Therefore, in the current study, a novel intuitionistic fuzzy rough set model is proposed for handling the above-mentioned problems. This model fits the data well and prevents misclassification. Firstly, the intuitionistic fuzzy decision of a sample is introduced using the neighborhood concept. Then, intuitionistic fuzzy lower and upper approximations are constructed using the intuitionistic fuzzy decision and a parameterized intuitionistic fuzzy granule. Furthermore, a new dependency function is established. Moreover, a greedy forward algorithm based on the proposed concept is given to calculate the reduct set. Finally, this algorithm is applied to benchmark datasets and a comparative study with the existing algorithm is presented. From the experimental results, it can be observed that the proposed model provides a more accurate reduct set than the existing model.

1. Introduction

Massive volumes of data are generated in multiple scenarios, including weather, census, health care, government, social networking, production, business, and scientific research. Such high-dimensional data may degrade the efficiency of classifiers, as they often possess several irrelevant or redundant features. Therefore, it is necessary to preprocess the dataset before applying any classification algorithm. Feature selection is a preprocessing step that removes irrelevant and/or redundant features and offers more concise and explicit descriptions of the data. Feature selection has found wide application in data mining, signal processing, bioinformatics, machine learning, etc. (Iannarilli and Rubin, 2003; Jaeger et al., 2002; Jain et al., 2000; Kohavi and John, 1997; Kwak and Chong-Ho, 2002; Langley, 1994; Webb and Copsey, 2011; Xiong et al., 2001). Rough set based feature selection (as introduced by Pawlak, 1982, 2012; Pawlak et al., 1995) utilizes the information present in the data alone and successfully produces the reduct set without using any additional information. It deals with indiscernibility between

attributes. In this model, the dependency between the conditional and decision attributes is determined to evaluate the classification ability of attributes. However, data need to be discretized before the rough set based feature selection technique can be applied, which frequently leads to information loss. Fuzzy rough set based feature selection overcomes this problem of information loss. Fuzzy rough set theory (as proposed by Dubois and Prade, 1990, 1992) deals with the concepts of vagueness and indiscernibility by combining fuzzy set theory (Klir and Yuan, 1995; Zadeh, 1965) and rough set theory (Pawlak, 1982; Pawlak et al., 1995). In fuzzy rough set theory, a similarity relation is defined between the samples, and lower as well as upper approximations are constructed on the basis of this relation. The union of the lower approximations gives the positive region of the decision. The greater the membership to the positive region, the higher the possibility of the sample belonging to a particular category. Using the dependency function (Chen et al., 2011, 2012a,b; Degang and Suyun, 2010; Hu et al., 2006, 2010; Jensen and Shen, 2004a,b, 2005, 2007, 2008, 2009; Kumar et al., 2011; Suyun et al., 2009; Tsang

✩ No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.engappai.2019.103421.
∗ Corresponding author.
E-mail addresses: [email protected] (P. Jain), [email protected] (A.K. Tiwari), [email protected] (T. Som).

https://doi.org/10.1016/j.engappai.2019.103421
Received 7 April 2019; Received in revised form 10 July 2019; Accepted 5 December 2019
0952-1976/© 2019 Published by Elsevier Ltd.


et al., 2008; Wang et al., 2019a, 2016, 2019b), the significance of a subset of features is evaluated. The conditional entropy measure is also used in Wang et al. (2017, 2019) to find the reduct set for homogeneous and heterogeneous datasets, respectively. However, this may lead to misclassification of samples when there is a large degree of overlap between different categories of data (Wang et al., 2017b). Moreover, it deals only with the membership of a sample to a set. Hence, there is a need for a different kind of model that can both fit the data well and handle the uncertainty arising from non-membership, as uncertainty is found not only in judgment but also in identification. The intuitionistic fuzzy (IF) set (Atanasov, 1999; Atanassov, 1986, 1989) handles uncertainty by considering both membership and non-membership of a sample to a set. In spite of the fact that rough and IF sets both capture specific aspects of the same idea, imprecision, the combination of IF set theory and rough set theory is rarely discussed by researchers. Jena et al. (2002) demonstrated that the lower and upper approximations of IF rough sets are again IF sets. In the last few years, several IF rough set models have been established (Chakrabarty et al., 1998; Cornelis et al., 2003; De et al., 1998; Huang et al., 2013; Jena et al., 2002; Nanda and Majumdar, 1992; Rizvi et al., 2002; Samanta and Mondal, 2001; Zhang et al., 2019, 2012) and applied to various decision making problems. Çoker (1998) discussed the relationship between rough sets and IF sets and revealed that the fuzzy rough set is admittedly an intuitionistic L-fuzzy set. Huang et al. (2013) established dominance in intuitionistic fuzzy rough sets and presented its various applications. Moreover, some recently published research articles have presented intuitionistic fuzzy rough set based feature selection or attribute reduction techniques (Chen and Yang, 2011; Esmail et al., 2013; Huang et al., 2012; Lu et al., 2009; Shreevastava et al., 2018; Tiwari et al., 2018a,b; Zhang, 2016). Lu et al. (2009) established a genetic algorithm for performing attribute reduction of the intuitionistic fuzzy information system (IFIS). An intuitionistic fuzzy rough set model was presented by Huang et al. (2012) using a distance function; furthermore, they generalized it for attribute reduction. An approach for attribute reduction based on the discernibility matrix concept was given by Zhang (2016). Chen and Yang (2011) presented a novel attribute reduction algorithm by combining intuitionistic fuzzy rough sets with information entropy. Esmail et al. (2013) discussed the structure of the intuitionistic fuzzy rough set model as well as its properties and presented concepts of attribute reduction and rule extraction. Tan et al. (2018) established an intuitionistic fuzzy rough set model and applied it to attribute subset selection. Tiwari et al. (2018a,b) and Shreevastava et al. (2019, 2018) established different intuitionistic fuzzy rough set models and developed various feature subset selection techniques for supervised as well as semi-supervised datasets. Li et al. (2019) proposed a novel intuitionistic fuzzy clustering algorithm using feature selection for tracking multiple objects. In recent years, various research articles (Boran et al., 2009; Revanasiddappa and Harish, 2018; Singh et al., 2019; Tiwari et al., 2019) have presented IF rough set models with applications to feature selection.
However, none of the above studies fit a given dataset well or can ideally illustrate the differences in sample classification (Wang et al., 2017b). In the current work, a new intuitionistic fuzzy rough set model is proposed. It fits the data well and prevents misclassification. Although a model for feature selection that fits the data well and prevents misclassification is presented in Sheeja and Kuriakose (2018), a fitting model based on intuitionistic fuzzy rough sets has not yet been considered. Our proposed model can handle uncertainty, vagueness, and imprecision by combining intuitionistic fuzzy sets and rough sets for feature subset selection. The intuitionistic fuzzy decision of a sample is defined using the neighborhood concept. Then, we construct intuitionistic fuzzy lower and upper approximations based on the intuitionistic fuzzy decision and a parameterized intuitionistic fuzzy relation. Furthermore, a dependency function is presented to calculate the reduct set. Moreover, a greedy forward algorithm based on the proposed concept is introduced.

Table 1
Example dataset.

        a1      a2      a3      a4      a5      Class
x1      0.08    0.08    0.1     0.24    0.9     4
x2      0.06    0.06    0.05    0.25    0.33    2
x3      0.1     0.1     0.15    0.65    0.3     3
x4      0.08    0.08    0.08    0.98    0.24    2
x5      0.09    0.15    0.4     0.1     0.66    3
x6      0.15    0.02    0.34    0.4     0.01    1
x7      0.24    0.75    0.32    0.18    0.86    4
x8      0.276   0.225   0.81    0.27    0.33    2

Finally, this algorithm of fitting model based intuitionistic fuzzy rough feature selection (FMIFRFS) is applied to benchmark datasets and the results are compared with those of the existing algorithm.

This paper is organized as follows. In Section 2, some preliminaries are given to introduce the basic concepts of intuitionistic fuzzy rough set theory. In Section 3, a fitting intuitionistic fuzzy rough set model is developed. The algorithm for feature selection is presented in Section 4. Experimental results are shown in Section 5. Finally, Section 6 concludes the entire work.

2. Preliminaries

Let $(U, C, \ddot{D})$ be an information system, where $U$ is a non-empty finite collection of objects $x_1, x_2, \ldots, x_n$, $C$ is the non-empty finite set of features, and $\ddot{D}$ is the decision attribute. An intuitionistic fuzzy set $A$ in $U$ is a collection of objects represented in the form
$$A = \{[x, \mu_A(x), \nu_A(x)] \mid x \in U\}$$
where $\mu_A : U \to [0, 1]$ and $\nu_A : U \to [0, 1]$ are called the degree of membership and the degree of non-membership of the element $x$, respectively, satisfying
$$0 \le \mu_A(x) + \nu_A(x) \le 1, \quad \forall x \in U,$$
$$\pi_A(x) = 1 - \mu_A(x) - \nu_A(x),$$
where $\pi_A(x)$ represents the degree of hesitancy of $x$ to $A$. It is obvious from the above that $0 \le \pi_A(x) < 1$, $\forall x \in U$.

The cardinality of an intuitionistic fuzzy set $A$ is defined by Iancu (2014):
$$|A| = \sum_{x \in U} \frac{1 + \mu_A(x) - \nu_A(x)}{2}$$
where 1 is added to ensure that each term is positive, and the division by 2 makes each term vary between 0 and 1.

Let $R_A(x, y) = [\mu_R(x, y), \nu_R(x, y)]$ be an intuitionistic fuzzy relation induced on the system. $R_A(x, y)$ is an intuitionistic fuzzy similarity relation if it satisfies:
(1) Reflexivity: $\mu_R(x, x) = 1$, $\nu_R(x, x) = 0$, $\forall x \in U$;
(2) Symmetry: $\mu_R(x, y) = \mu_R(y, x)$, $\nu_R(x, y) = \nu_R(y, x)$, $\forall x, y \in U$.

The intuitionistic fuzzy neighborhood of an instance $x \in U$ is the intuitionistic fuzzy similarity class $[x]_A(y) = [\mu_{[x]_A}, \nu_{[x]_A}]$ associated with $x$ and $R_A$, where
$$\mu_{[x]_A}(y) = \mu_{R_A}(x, y), \quad \nu_{[x]_A}(y) = \nu_{R_A}(x, y), \quad y \in U.$$

Let $\ddot{D}$ partition $U$ into $r$ crisp equivalence classes $U/\ddot{D} = \{\ddot{D}_1, \ddot{D}_2, \ldots, \ddot{D}_r\}$. Then, the intuitionistic fuzzy decision of $x$ is defined as follows:
$$\tilde{\ddot{D}}_i(x) = \left[ \frac{\left|\mu_{[x]_A} \cap \ddot{D}_i\right|}{\left|\mu_{[x]_A}\right|}, \; \frac{\left|\nu_{[x]_A} \cap \ddot{D}_i\right|}{\left|\nu_{[x]_A}\right|} \right], \quad i = 1, 2, \ldots, r, \; x \in U,$$
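To make these preliminaries concrete, the following minimal Python sketch (our own illustration; the array names are assumptions, not the authors' code) stores an intuitionistic fuzzy set as paired membership/non-membership arrays and computes the hesitancy degree and the Iancu (2014) cardinality defined above.

```python
import numpy as np

# Minimal sketch of the Section 2 definitions; names are ours.
# An IF set A over U = {x_1, ..., x_n} is stored as two arrays:
#   mu[i] = mu_A(x_i), nu[i] = nu_A(x_i), with mu + nu <= 1 elementwise.

def hesitancy(mu, nu):
    """pi_A(x) = 1 - mu_A(x) - nu_A(x)."""
    assert np.all(mu + nu <= 1.0 + 1e-9), "IF constraint violated"
    return 1.0 - mu - nu

def if_cardinality(mu, nu):
    """|A| = sum over x of (1 + mu_A(x) - nu_A(x)) / 2 (Iancu, 2014)."""
    return np.sum((1.0 + mu - nu) / 2.0)

# Toy IF set over three objects:
mu = np.array([0.6, 0.2, 0.9])
nu = np.array([0.3, 0.5, 0.0])
print(hesitancy(mu, nu))       # -> [0.1 0.3 0.1]
print(if_cardinality(mu, nu))  # -> (1.3 + 0.7 + 1.9) / 2 = 1.95
```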


Table 2
Dataset characteristics and reduct size.

                                                    FMFRFS                      Proposed algorithm
Dataset                   Instances  Attributes    Full train.  10-fold CV     Full train.  10-fold CV
Wine                      178        13            2            11.9           6            12.9
Heart                     267        13            2            6.6            7            11.4
Ionosphere                351        34            2            11.3           9            8.7
Balloon scale+stretch     20         4             2            3.3            2            2
Balloon scale-stretch     20         4             2            2.9            2            2
Db-world-bodies-stemmed   64         3721          4            3.7            6            5.2
Db-world-bodies           64         4702          4            3.7            6            17
Cardiotocography-3 class  2126       35            2            2              6            12
Lung cancer               32         56            9            9.5            5            5.3
Soyabean small            47         21            4            4              2            2
Trains                    10         26            6            3              3            3
Zoo                       10         16            2            8.2            8            6.8

Table 3
Comparison of classification accuracies for original datasets and reduced datasets by the proposed model and FMFRFS, using the full training set.

                          Original                                       FMFRFS                                          Proposed model
Dataset                   PART           JRip           J48             PART           JRip           J48              PART           JRip           J48
Wine                      92.82 ± 5.79   92.16 ± 6.36   93.90 ± 6.00    78.52 ± 10.10  75.32 ± 9.91   79.08 ± 9.63     94.66 ± 4.82   93.82 ± 5.17   95.86 ± 4.73
Heart                     77.63 ± 7.55   79.27 ± 7.45   79.73 ± 6.40    64.32 ± 8.05   66.19 ± 8.48   64.66 ± 7.87     80.38 ± 7.82   81.73 ± 7.21   82.03 ± 7.21
Ionosphere                90.83 ± 4.66   89.55 ± 4.82   89.74 ± 4.38    74.93 ± 4.91   74.93 ± 4.91   74.93 ± 4.91     90.89 ± 4.89   91.34 ± 4.85   91.97 ± 4.26
Balloon scale+stretch     100.0 ± 0.00   100.0 ± 0.00   100.0 ± 0.00    56.00 ± 21.65  60.00 ± 20.10  60.00 ± 20.10    100.0 ± 0.00   100.0 ± 0.00   100.0 ± 0.00
Balloon scale-stretch     100.0 ± 0.00   100.0 ± 0.00   100.0 ± 0.00    55.00 ± 21.90  60.00 ± 20.10  60.00 ± 20.10    100.0 ± 0.00   100.0 ± 0.00   100.0 ± 0.00
Db-world-bodies-stemmed   80.00 ± 15.7   79.88 ± 15.3   80.93 ± 15.7    54.52 ± 5.29   54.52 ± 5.29   54.52 ± 5.29     89.86 ± 11.2   90.98 ± 10.83  89.86 ± 11.2
Db-world-bodies           77.81 ± 14.3   80.57 ± 14.1   76.71 ± 15.1    53.60 ± 6.20   53.14 ± 6.66   53.60 ± 6.20     87.40 ± 10.7   88.98 ± 10.58  87.40 ± 10.7
Cardiotocography-3class   98.50 ± 0.91   98.63 ± 0.83   98.67 ± 0.85    78.89 ± 2.09   78.48 ± 1.39   79.54 ± 1.99     98.40 ± 0.90   98.40 ± 0.90   98.40 ± 0.90
Lung cancer               47.08 ± 27.7   50.58 ± 23.7   44.75 ± 23.9    44.58 ± 23.85  45.50 ± 20.43  37.42 ± 21.2     66.23 ± 23.4   61.17 ± 23.75  67.08 ± 24.9
Soyabean small            100.0 ± 0.00   97.65 ± 7.12   97.65 ± 7.12    83.60 ± 15.09  81.85 ± 12.82  84.05 ± 14.3     100.0 ± 0.00   100.0 ± 0.00   100.0 ± 0.00
Trains                    90.0 ± 30.15   51.0 ± 50.24   90.0 ± 30.15    60.49 ± 49.24  57.00 ± 49.76  60.0 ± 49.24     90.0 ± 30.15   82.00 ± 38.61  90.0 ± 30.15
Zoo                       93.41 ± 7.28   89.81 ± 8.37   92.61 ± 7.33    71.32 ± 5.83   60.43 ± 3.06   71.32 ± 5.83     97.05 ± 4.76   92.59 ± 7.39   97.05 ± 4.76

Table 4
Comparison of classification accuracies for original datasets and reduced datasets by the proposed model and FMFRFS, using 10-fold cross validation.

                          Original                        FMFRFS                          Proposed model
Dataset                   3NN            SVM             3NN            SVM              3NN            SVM
Wine                      95.29 ± 5.40   95.88 ± 4.84    94.11 ± 5.54   94.11 ± 6.20     97.05 ± 4.99   96.47 ± 7.44
Heart                     79.61 ± 8.31   72.69 ± 13.74   73.07 ± 8.10   56.53 ± 7.48     82.30 ± 8.91   71.15 ± 7.53
Ionosphere                84.57 ± 3.35   88.00 ± 8.05    89.71 ± 8.21   87.42 ± 8.43     88.85 ± 5.62   84.28 ± 6.49
Balloon scale+stretch     100.0 ± 0.00   100.0 ± 0.00    70.00 ± 48.30  70.00 ± 48.30    100.00 ± 0.00  100.00 ± 0.00
Balloon scale-stretch     100.0 ± 0.00   100.0 ± 0.00    70.00 ± 42.16  70.00 ± 42.16    100.00 ± 0.00  100.00 ± 0.00
Db-world-bodies-stemmed   58.33 ± 18.00  86.67 ± 13.14   60.00 ± 11.66  53.33 ± 10.54    70.00 ± 15.31  78.33 ± 11.24
Db-world-bodies           53.33 ± 23.30  86.67 ± 13.14   53.33 ± 20.48  53.33 ± 20.48    70.00 ± 23.03  80.00 ± 23.30
Cardiotocography-3class   98.86 ± 0.71   92.07 ± 10.37   75.33 ± 3.35   41.32 ± 8.97     98.34 ± 1.13   98.39 ± 0.77
Lung cancer               43.33 ± 22.49  46.67 ± 32.20   53.33 ± 32.30  53.33 ± 32.30    63.33 ± 10.54  53.33 ± 32.20
Soyabean small            100.0 ± 0.00   100.0 ± 0.00    85.00 ± 17.48  92.50 ± 12.07    100.00 ± 0.00  100.00 ± 0.00
Trains                    50.00 ± 52.70  70.00 ± 48.30   70.00 ± 48.30  50.00 ± 52.70    70.00 ± 48.30  90.00 ± 31.62
Zoo                       93.00 ± 6.74   95.00 ± 9.71    85.00 ± 13.54  84.00 ± 14.29    94.00 ± 8.43   92.00 ± 9.18

where $\tilde{\ddot{D}}_i(x)$ is an intuitionistic fuzzy set indicating the degrees of membership and non-membership of $x$ to the decision class $\ddot{D}_i$. Each decision class is represented as an IF set given by:
$$\ddot{D}_i(x) = [x, \mu_{\ddot{D}_i}(x), \nu_{\ddot{D}_i}(x)], \quad \text{where } [\mu_{\ddot{D}_i}(x), \nu_{\ddot{D}_i}(x)] = \begin{cases} [1, 0], & x \in \ddot{D}_i \\ [0, 1], & x \notin \ddot{D}_i \end{cases}$$

Obviously, $\{\tilde{\ddot{D}}_1(x), \tilde{\ddot{D}}_2(x), \ldots, \tilde{\ddot{D}}_r(x)\}$ is an intuitionistic fuzzy partition of $U$. The intuitionistic fuzzy lower and upper approximations are defined as follows:
$$\underline{R_A}(\ddot{D}_i)(y) = \left[ \min_{x \in U} \max\{1 - \mu_{R_A}(x, y), \mu_{\tilde{\ddot{D}}_i}(x)\}, \; \max_{x \in U} \min\{1 - \nu_{R_A}(x, y), \nu_{\tilde{\ddot{D}}_i}(x)\} \right]$$

Table 5
Performance metrics for different classifiers for the reduced Ionosphere dataset by the proposed model using the full training set.

Model                    Learning algorithm   Sensitivity  Specificity  Accuracy  AUC    MCC
Original                 PART                 84.1         96.0         91.7      91.6   81.9
                         JRip                 88.1         89.8         89.2      89.4   76.9
                         J48                  82.5         96.4         91.5      89.2   81.3
Fitting model using FR   PART                 30.2         100          74.9      61.4   46.6
                         JRip                 30.2         100          74.9      61.4   46.6
                         J48                  30.2         100          74.9      61.4   46.6
Proposed model           PART                 84.1         92.9         89.7      89.4   77.6
                         JRip                 85.7         94.7         91.5      91.7   81.3
                         J48                  84.1         96.4         92.0      89.0   82.5

$$\overline{R_A}(\ddot{D}_i)(y) = \left[ \max_{x \in U} \min\{\mu_{R_A}(x, y), \mu_{\tilde{\ddot{D}}_i}(x)\}, \; \min_{x \in U} \max\{\nu_{R_A}(x, y), \nu_{\tilde{\ddot{D}}_i}(x)\} \right]$$

The membership of an object $y \in U$ to the intuitionistic fuzzy positive region is given by:
$$Pos_A(y) = \left[ \max_i \mu_{\underline{R_A}(\ddot{D}_i)}(y), \; \min_i \nu_{\underline{R_A}(\ddot{D}_i)}(y) \right]$$

Using the intuitionistic fuzzy positive region, the dependency function can be computed using the formula:
$$\Upsilon_A = \frac{\sum_{y \in U} |Pos_A(y)|}{|U|}$$

The dependency function is thus defined as the ratio of the size of the positive region to the number of samples in the feature space. However, this model can result in misclassification of training samples, as illustrated for its fuzzy counterpart (Sheeja and Kuriakose, 2018).

3. A fitting model based on intuitionistic fuzzy rough set

The information system is denoted by $(U, C, \ddot{D})$, where $C = \{a_1, a_2, \ldots, a_m\}$ is the set of conditional attributes and $\ddot{D}$ is the decision of the system. $U = \{x_1, x_2, \ldots, x_n\}$ comprises the set of samples. The decision partitions the samples into $r$ crisp equivalence classes $U/\ddot{D} = \{\ddot{D}_1, \ddot{D}_2, \ldots, \ddot{D}_r\}$, and $\{\tilde{\ddot{D}}_1, \tilde{\ddot{D}}_2, \ldots, \tilde{\ddot{D}}_r\}$ is the intuitionistic fuzzy decision of the samples induced by $\ddot{D}$ and $C$. Let $R_a$ be the intuitionistic fuzzy similarity relation on the samples induced by attribute $a$; then for any set $A \subseteq C$, the IF relation is given by
$$R_A(x, y) = \left[ \bigcap_{a \in A} R_a(x, y), \; \bigcup_{a \in A} R_a(x, y) \right].$$

Different levels of granularity, acquired from every intuitionistic fuzzy similarity, lead to more classification information. The optimal feature subset is obtained by choosing the granularity (Chen et al., 2011; Hu et al., 2007) that leads to optimized accuracy. $R_A(x, y)$ denotes the similarity between samples $x$ and $y$ based on their membership values and the dissimilarity between their non-membership values. To remove the impact of noise, low values of $R_A$ can be equated to zero, considering that small values may have resulted from noise. A parameterized intuitionistic fuzzy granule is constructed to achieve this, by introducing $\varepsilon \in [0, 1)$ as follows:
$$\mu_{[x]^{\varepsilon}_A}(y) = \begin{cases} 0, & \mu_{R_A}(x, y) < \varepsilon \\ \mu_{R_A}(x, y), & \mu_{R_A}(x, y) \ge \varepsilon \end{cases}, \quad y \in U$$
and
$$\nu_{[x]^{\varepsilon}_A}(y) = \begin{cases} 0, & \nu_{R_A}(x, y) < \varepsilon \\ \nu_{R_A}(x, y), & \nu_{R_A}(x, y) \ge \varepsilon \end{cases}, \quad y \in U$$

Clearly, $\varepsilon$ impacts the size of the intuitionistic fuzzy granule. Therefore, the intuitionistic fuzzy similarity is denoted by $R^{\varepsilon}_A$. It can be derived from the above that (Wang et al., 2017b):

Proposition 1. If $A \subseteq B$, then $R^{\varepsilon}_B \subseteq R^{\varepsilon}_A$.

The lower and upper approximations of the decision $\ddot{D}$ with respect to the attribute subset $A$ are given by:
$$\underline{R^{\varepsilon}_A}(\ddot{D}_i)(y) = \begin{cases} \left[ \min_{x \in U} \max\{1 - \mu_{R^{\varepsilon}_A}(x, y), \mu_{\tilde{\ddot{D}}_i}(x)\}, \; \max_{x \in U} \min\{1 - \nu_{R^{\varepsilon}_A}(x, y), \nu_{\tilde{\ddot{D}}_i}(x)\} \right], & y \in \ddot{D}_i \\ [0, 1], & \text{otherwise} \end{cases}$$

$$\overline{R^{\varepsilon}_A}(\ddot{D}_i)(y) = \begin{cases} \left[ \max_{x \in U} \min\{\mu_{R^{\varepsilon}_A}(x, y), \mu_{\tilde{\ddot{D}}_i}(x)\}, \; \min_{x \in U} \max\{\nu_{R^{\varepsilon}_A}(x, y), \nu_{\tilde{\ddot{D}}_i}(x)\} \right], & y \in \ddot{D}_i \\ [0, 1], & \text{otherwise} \end{cases}$$

Similar to classical rough sets, $\underline{R^{\varepsilon}_A}(\ddot{D}_i)(y)$ denotes the degree of certainty with which sample $y$ belongs to category $i$, and $\overline{R^{\varepsilon}_A}(\ddot{D}_i)(y)$ indicates the possibility of $y$ belonging to category $i$. The intuitionistic fuzzy positive region is calculated using the above lower approximation:
$$Pos^{\varepsilon}_A(y) = \left[ \max_i \mu_{\underline{R^{\varepsilon}_A}(\ddot{D}_i)}(y), \; \min_i \nu_{\underline{R^{\varepsilon}_A}(\ddot{D}_i)}(y) \right].$$

The greater the size of the positive region, the more sample $y$ depends on the feature subset $A$ for its classification. Thereby, the dependency degree of attribute subset $A$ is obtained using the formula:
$$\Upsilon^{\varepsilon}_A = \frac{\sum_{y \in U} |Pos^{\varepsilon}_A(y)|}{|U|}$$

The aim is to find the feature subset with maximum dependency degree, as the misclassification error is smaller in such a case.

Theorem 1. Given $\langle U, C, D \rangle$ and $0 < \varepsilon < 1$, if $A_1 \subseteq A_2 \subseteq C$, then $Pos^{\varepsilon}_{A_1}(\ddot{D}) \subseteq Pos^{\varepsilon}_{A_2}(\ddot{D})$.

Proof. From Proposition 1, $R^{\varepsilon}_{A_2} \subseteq R^{\varepsilon}_{A_1}$ whenever $A_1 \subseteq A_2$, which implies $1 - \mu_{R^{\varepsilon}_{A_2}}(x, y) \ge 1 - \mu_{R^{\varepsilon}_{A_1}}(x, y)$ and $1 - \nu_{R^{\varepsilon}_{A_2}}(x, y) \le 1 - \nu_{R^{\varepsilon}_{A_1}}(x, y)$, $\forall x \in U$ and $y \in \ddot{D}_i$. Hence $\mu_{\underline{R^{\varepsilon}_{A_1}}(\ddot{D})}(y) \le \mu_{\underline{R^{\varepsilon}_{A_2}}(\ddot{D})}(y)$ and $\nu_{\underline{R^{\varepsilon}_{A_1}}(\ddot{D})}(y) \ge \nu_{\underline{R^{\varepsilon}_{A_2}}(\ddot{D})}(y)$; then, from the definition of the lower approximation, $\underline{R^{\varepsilon}_{A_1}}(\ddot{D})(y) \le \underline{R^{\varepsilon}_{A_2}}(\ddot{D})(y)$, and therefore $Pos^{\varepsilon}_{A_1}(\ddot{D}) \subseteq Pos^{\varepsilon}_{A_2}(\ddot{D})$.

Theorem 2. Given $\langle U, C, D \rangle$ and $0 < \varepsilon < 1$, if $A_1 \subseteq A_2 \subseteq \cdots \subseteq A_m \subseteq C$, then $\Upsilon^{\varepsilon}_{A_1}(\ddot{D}) \le \Upsilon^{\varepsilon}_{A_2}(\ddot{D}) \le \cdots \le \Upsilon^{\varepsilon}_{A_m}(\ddot{D})$.

Proof. Obvious from the above.

The above theorem shows that dependency increases with the size of the subset. This guarantees that adding an attribute to an existing feature set cannot decrease the dependency of the new subset obtained. If the dependency does not increase on adding an attribute F to the feature subset, then F is redundant and can be removed as a superfluous attribute; otherwise F is indispensable and cannot be removed. A feature subset T is a reduct set if it has the same dependency as the whole set of attributes and removing any attribute from T decreases its dependency.
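Before turning to the search algorithm, it may help to see this pipeline end to end in code. The sketch below is our own illustration under stated assumptions (symmetric $n \times n$ arrays `mu_R`, `nu_R` for the IF similarity and an integer label vector `D`; the intersection cardinalities in the IF decision are read as sigma-counts, one plausible interpretation). It zeroes out sub-ε entries, forms the IF decision of each sample, takes the ε-lower approximation of each class, pools the results into the positive region, and averages the IF cardinality terms into the dependency degree.

```python
import numpy as np

# Sketch of the Section 3 constructs; all names are ours, not the paper's.
#   mu_R, nu_R : n x n membership / non-membership of the IF relation R_A
#   D          : length-n integer array of decision labels

def epsilon_granule(mu_R, nu_R, eps):
    """Parameterized IF granule: entries below eps are treated as noise."""
    return np.where(mu_R < eps, 0.0, mu_R), np.where(nu_R < eps, 0.0, nu_R)

def dependency_degree(mu_R, nu_R, D, eps):
    """Y^eps_A = (1/|U|) * sum_y (1 + mu_Pos(y) - nu_Pos(y)) / 2."""
    n = len(D)
    mu, nu = epsilon_granule(mu_R, nu_R, eps)
    pos_mu, pos_nu = np.zeros(n), np.ones(n)
    for c in np.unique(D):
        in_c = (D == c).astype(float)
        # IF decision of each x w.r.t. class c (Section 2 definition,
        # with |.| read as a sigma-count):
        dec_mu = (mu * in_c).sum(axis=1) / np.maximum(mu.sum(axis=1), 1e-12)
        dec_nu = (nu * in_c).sum(axis=1) / np.maximum(nu.sum(axis=1), 1e-12)
        # eps-lower approximation of class c, evaluated at every y:
        low_mu = np.min(np.maximum(1.0 - mu, dec_mu[:, None]), axis=0)
        low_nu = np.max(np.minimum(1.0 - nu, dec_nu[:, None]), axis=0)
        # The "otherwise [0, 1]" clause: only samples y in class c count.
        member = (D == c)
        pos_mu[member] = np.maximum(pos_mu[member], low_mu[member])
        pos_nu[member] = np.minimum(pos_nu[member], low_nu[member])
    # IF cardinality of the positive region, averaged over U:
    return np.sum((1.0 + pos_mu - pos_nu) / 2.0) / n
```

When run on the worked example in Section 4 below (attribute $a_1$, $\varepsilon = 0.7$), this sketch yields a dependency close to the reported $\Upsilon_{a_1} = 0.1441$, up to the two-decimal rounding of the displayed matrices.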


4. Algorithm for reduct computation

A greedy forward algorithm for feature selection is proposed. The algorithm begins with the empty set and iteratively adds the attribute that yields the maximum dependency, until the dependency no longer increases.

Algorithm. The heuristic algorithm based on FMIFRFS can be given as follows:

1. Input the IF information system
2. Find the IF decision classes $U/\tilde{\ddot{D}} = \{\tilde{\ddot{D}}_1, \tilde{\ddot{D}}_2, \ldots, \tilde{\ddot{D}}_r\}$
3. Initialize $C \leftarrow \{a_1, a_2, \ldots, a_m\}$, $T \leftarrow \emptyset$
4. Do
5. Set $K \leftarrow T$
6. For every $a \in C - K$
7. Compute the IF similarity $R^{\varepsilon}_{T \cup \{a\}}$
8. Compute the lower approximation $\underline{R^{\varepsilon}_{T \cup \{a\}}}(\ddot{D}_i)(y)$ for each $y \in U$
9. Calculate the degree of dependency $\Upsilon_{T \cup \{a\}}(\ddot{D})$
10. End-for
11. Find the attribute $a \in C - T$ with the greatest $\Upsilon_{T \cup \{a\}}(\ddot{D})$ and set $T \leftarrow K \cup \{a\}$
12. Until $\Upsilon_T(\ddot{D}) = 1$ or $\Upsilon_K(\ddot{D}) = \Upsilon_T(\ddot{D})$

Return $T$

The proposed algorithm is illustrated using the example dataset of Table 1. Firstly, the dataset is normalized into the interval [0, 1]; then the normalized values are converted into intuitionistic fuzzy values. Finally, the intuitionistic fuzzy similarity $r_{ij}$ between $x$ and $y$ is obtained using the formula
$$r_{ij}(x, y) = \left[\, 1 - \sqrt{\frac{1}{m} \sum_{i=1}^{m} (\mu(x) - \mu(y))^2}, \;\; \sqrt{\frac{1}{m} \sum_{i=1}^{m} (\nu(x) - \nu(y))^2} \,\right]$$
where $\mu(x)$ and $\nu(x)$ are the membership and non-membership degrees, respectively, of an instance $x$ to the attribute set $A$, and $m$ is the number of attributes in $A$. Hence,
$$R_{a_1} = \begin{bmatrix}
[1,0] & [.85,.15] & [.88,.19] & [1,0] & [.93,.09] & [.81,.41] & [.63,.30] & [.42,.37] \\
[.85,.15] & [1,0] & [.74,.34] & [.85,.15] & [.79,.24] & [.67,.56] & [.77,.14] & [.56,.22] \\
[.88,.19] & [.74,.34] & [1,0] & [.88,.19] & [.94,.10] & [.93,.21] & [.51,.49] & [.30,.56] \\
[1,0] & [.85,.15] & [.88,.19] & [1,0] & [.93,.09] & [.81,.41] & [.63,.30] & [.42,.37] \\
[.93,.09] & [.79,.24] & [.94,.10] & [.93,.09] & [1,0] & [.87,.32] & [.56,.39] & [.35,.46] \\
[.81,.41] & [.67,.56] & [.93,.21] & [.81,.41] & [.87,.32] & [1,0] & [.44,.71] & [.23,.78] \\
[.63,.30] & [.77,.14] & [.51,.49] & [.63,.30] & [.56,.39] & [.44,.71] & [1,0] & [.78,.07] \\
[.42,.37] & [.56,.22] & [.30,.56] & [.42,.37] & [.35,.46] & [.23,.78] & [.78,.07] & [1,0]
\end{bmatrix}$$

Thereby, using $\varepsilon = 0.7$, the granule is obtained as:
$$[x]^{\varepsilon}_{a_1} = \begin{bmatrix}
[1,0] & [.85,0] & [.88,0] & [1,0] & [.93,0] & [.81,0] & [0,0] & [0,0] \\
[.85,0] & [1,0] & [.74,0] & [.85,0] & [.79,0] & [0,0] & [.77,0] & [0,0] \\
[.88,0] & [.74,0] & [1,0] & [.88,0] & [.94,0] & [.93,0] & [0,0] & [0,0] \\
[1,0] & [.85,0] & [.88,0] & [1,0] & [.93,0] & [.81,0] & [0,0] & [0,0] \\
[.93,0] & [.79,0] & [.94,0] & [.93,0] & [1,0] & [.87,0] & [0,0] & [0,0] \\
[.81,0] & [0,0] & [.93,0] & [.81,0] & [.87,0] & [1,0] & [0,.71] & [0,.78] \\
[0,0] & [.77,0] & [0,0] & [0,0] & [0,0] & [0,.71] & [1,0] & [.78,0] \\
[0,0] & [0,0] & [0,0] & [0,0] & [0,0] & [0,.78] & [.78,0] & [1,0]
\end{bmatrix}$$

The decision attribute partitions the samples into four decision classes:
$$\ddot{D}_1 = \{[x_1,0,1], [x_2,0,1], [x_3,0,1], [x_4,0,1], [x_5,0,1], [x_6,1,0], [x_7,0,1], [x_8,0,1]\}$$
$$\ddot{D}_2 = \{[x_1,0,1], [x_2,1,0], [x_3,0,1], [x_4,1,0], [x_5,0,1], [x_6,0,1], [x_7,0,1], [x_8,1,0]\}$$
$$\ddot{D}_3 = \{[x_1,0,1], [x_2,0,1], [x_3,1,0], [x_4,0,1], [x_5,1,0], [x_6,0,1], [x_7,0,1], [x_8,0,1]\}$$
$$\ddot{D}_4 = \{[x_1,1,0], [x_2,0,1], [x_3,0,1], [x_4,0,1], [x_5,0,1], [x_6,0,1], [x_7,1,0], [x_8,0,1]\}$$

The intuitionistic fuzzy decision matrix is obtained as:
$$\tilde{\ddot{D}} = \left[\tilde{\ddot{D}}_1, \tilde{\ddot{D}}_2, \tilde{\ddot{D}}_3, \tilde{\ddot{D}}_4\right] = \begin{bmatrix}
[0.14, na] & [0.33, na] & [0.33, na] & [0.18, na] \\
[0.37, na] & [0.36, na] & [0.30, na] & [0.32, na] \\
[0.17, na] & [0.30, na] & [0.36, na] & [0.16, na] \\
[0.14, na] & [0.33, na] & [0.33, na] & [0.18, na] \\
[0.15, na] & [0.31, na] & [0.35, na] & [0.17, na] \\
[0.22, 0] & [0.18, 0.52] & [0.40, 0] & [0.18, 0.47] \\
[0, 1] & [0.60, 0] & [0, 0] & [0.39, 0] \\
[0, 1] & [0.55, 0] & [0, 0] & [0.44, 0]
\end{bmatrix}$$

Some of the non-membership values of the decision matrix are 'na', since the corresponding non-membership values in $[x]^{\varepsilon}_{a_1}$ are 0 and the defining ratio is undefined. Thereby, the lower approximation is obtained as:

        R^ε_A(D̈_1)   R^ε_A(D̈_2)   R^ε_A(D̈_3)   R^ε_A(D̈_4)
x1      [0, 1]       [0, 1]       [0, 1]       [0.16, 1]
x2      [0, 1]       [0.30, 1]    [0, 1]       [0, 1]
x3      [0, 1]       [0, 1]       [0.30, 1]    [0, 1]
x4      [0, 1]       [0.18, 1]    [0, 1]       [0, 1]
x5      [0, 1]       [0, 1]       [0.30, 1]    [0, 1]
x6      [0.15, 1]    [0, 1]       [0, 1]       [0, 1]
x7      [0, 1]       [0, 1]       [0, 1]       [0.32, 1]
x8      [0, 1]       [0.55, 1]    [0, 1]       [0, 1]

Now, the degree of dependency of the decision attribute on $a_1$ is calculated by the proposed concept, and we obtain
$$\Upsilon_{a_1} = 0.1441$$
Similarly, the degrees of dependency of the decision attribute on the other conditional attributes are:

$$\Upsilon_{a_2} = 0.1928, \quad \Upsilon_{a_3} = 0.1676, \quad \Upsilon_{a_4} = 0.1751, \quad \Upsilon_{a_5} = 0.2621$$

Therefore $a_5$, having the greatest degree of dependency, is selected as the first attribute of the potential reduct set. Combining it with the remaining attributes, this process iterates, and after termination of the algorithm we obtain the reduct set $\{a_1, a_4, a_5\}$.
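As a companion to the pseudocode and the worked example above, here is a hedged Python sketch of the greedy forward search. It reuses the `dependency_degree` function sketched in Section 3, and builds the attribute-subset similarity with the averaged-Euclidean formula above; the function names and array layout (`mu_X`, `nu_X` as n × m membership/non-membership matrices) are our assumptions, not the authors' Matlab implementation.

```python
import numpy as np

def if_similarity(mu_X, nu_X, subset):
    """r_ij over an attribute subset, per the formula above:
    mu part: 1 - sqrt(mean squared membership difference),
    nu part:     sqrt(mean squared non-membership difference)."""
    mu, nu = mu_X[:, subset], nu_X[:, subset]
    d_mu = mu[:, None, :] - mu[None, :, :]
    d_nu = nu[:, None, :] - nu[None, :, :]
    return (1.0 - np.sqrt(np.mean(d_mu ** 2, axis=2)),
            np.sqrt(np.mean(d_nu ** 2, axis=2)))

def fmifrfs_reduct(mu_X, nu_X, D, eps=0.7):
    """Greedy forward search of Section 4 (our sketch)."""
    m = mu_X.shape[1]
    T, best = [], -1.0
    while True:
        cands = set(range(m)) - set(T)
        if not cands:                 # all attributes already selected
            return T
        # Score every candidate attribute added to the current subset T.
        gains = {a: dependency_degree(*if_similarity(mu_X, nu_X, T + [a]),
                                      D, eps)
                 for a in cands}
        a_star = max(gains, key=gains.get)
        if gains[a_star] <= best:     # dependency stopped increasing
            return T
        T, best = T + [a_star], gains[a_star]
        if best >= 1.0:               # full dependency reached
            return T
```

On the Table 1 data, the first iteration scores each singleton attribute exactly as in the example above, so the search would start from the attribute with the greatest dependency degree and then grow the subset greedily.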

Fig. 1. Variation of classification accuracy and reduct size with epsilon by the proposed method.

5. Results and discussion

In the current study, the performance of the proposed model is evaluated and compared with the existing fitting model based fuzzy-rough feature selection (FMFRFS) (Wang et al., 2017b). All the algorithms are implemented in Matlab 2018a and classification is done using WEKA (Hall et al., 2009). Firstly, the dataset is fuzzified using a simple algorithm. Then, the fuzzified data is converted to an intuitionistic fuzzy dataset. Both algorithms employ forward search to obtain the optimal feature subset. The intuitionistic fuzzy similarity $r_{ij}$ between instances $x$ and $y$ is computed by:
$$r_{ij}(x, y) = \left[ 1 - |\mu(x) - \mu(y)|, \; |\nu(x) - \nu(y)| \right]$$
where $\mu(x)$ and $\nu(x)$ are the membership and non-membership degree, respectively, of an instance $x$ to a set.

Further, the choice of $\varepsilon$ depends on the amount of noise present in the dataset. The value of $\varepsilon$ is varied from 0.1 to 0.9 in small intervals, and the value of $\varepsilon$ giving the highest classification accuracy is selected. The following experimental setup is used to conduct the experiments:

5.1. Dataset

Twelve benchmark datasets from the University of California, Irvine, Machine Learning Repository (Blake and Merz, 1998) are used to assess the performance of our proposed approach. The details of these datasets are given in Table 2. The dimensions of the datasets indicate that these are small to medium size datasets, as the number of instances ranges from 10 to 2126 and the number of attributes ranges from 4 to 4702.

Fig. 2. AUC of four machine learning algorithms for reduced datasets by the proposed method.

5.2. Classifiers

Three different machine learning algorithms, available under the rules and trees categories, are employed to demonstrate performance on the reduced datasets. PART (Frank and Witten, 1998), JRip (Cohen, 1995) and J48 (Ross Quinlan, 1993) are used for evaluating classification accuracy using the full training set, while kNN (k = 3) and SVM are employed to test performance using 10-fold cross validation. Furthermore, we perform a comparative study of the proposed model with the existing fitting model for feature selection using fuzzy rough sets, by observing the change in the overall accuracies of different classifiers, along with standard deviations, for the reduced datasets.

5.3. Dataset split

When the full training set is used for feature selection, accuracy is evaluated based on 10-fold cross validation; that is, the dataset is randomly divided into ten subsets, of which one is used for testing and the remaining nine form the training set. After ten such rounds, the average accuracy is used as the final performance. When 10-fold cross validation is instead performed for feature selection, the dataset is randomly divided into ten subsets, nine of which are used for feature selection; the whole reduced dataset is then employed to evaluate classification accuracy. After ten such iterations, the average accuracy is used as the final performance.
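Note that the experimental setup differs from the Section 4 illustration in two ways: the similarity is the simpler absolute-difference form above, and ε is tuned by a sweep. A minimal sketch of both follows (our code, not the authors' Matlab; `score_fn` stands in for the WEKA classifier evaluation, and `fmifrfs_reduct` is the Section 4 sketch):

```python
import numpy as np

def if_similarity_abs(mu, nu):
    """Section 5 similarity: r_ij = [1 - |mu(x)-mu(y)|, |nu(x)-nu(y)|],
    here for a single attribute's membership/non-membership vectors."""
    return (1.0 - np.abs(mu[:, None] - mu[None, :]),
            np.abs(nu[:, None] - nu[None, :]))

def sweep_epsilon(mu_X, nu_X, D, score_fn, grid=np.arange(0.1, 0.95, 0.1)):
    """Try each eps, run the reduct search, and keep the eps whose reduct
    scores best under the injected classifier-accuracy callback."""
    best = None
    for eps in grid:
        reduct = fmifrfs_reduct(mu_X, nu_X, D, eps=eps)
        acc = score_fn(reduct)  # e.g. cross-validated accuracy on reduct
        if best is None or acc > best[2]:
            best = (eps, reduct, acc)
    return best  # (eps, reduct, accuracy)
```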

Fig. 3. AUC of four machine learning algorithms for reduced datasets by FMFRFS.

5.4. Performance evaluation metrics

The prediction performances of the three machine learning algorithms are evaluated using threshold-dependent and threshold-independent parameters. These parameters are determined using true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). TP is the number of correctly classified positive instances and TN is the number of correctly classified negative instances; FN is the number of incorrectly classified positive instances, while FP is the number of incorrectly classified negative instances.

Sensitivity: the percentage of correctly classified positive instances, represented as:
$$Sensitivity = \frac{TP}{TP + FN} \times 100$$

Specificity: the percentage of correctly classified negative instances, represented as:
$$Specificity = \frac{TN}{TN + FP} \times 100$$

Accuracy: the percentage of correctly classified instances (both positive and negative), calculated as:
$$Accuracy = \frac{TP + TN}{TP + FP + TN + FN} \times 100$$

AUC: the area under the receiver operating characteristic (ROC) curve; the closer the value to 1, the more accurate the classifier. It is robust to heterogeneity of the dataset and is widely used for evaluating performance (Jensen and Shen, 2004b).

MCC: the Matthews correlation coefficient, a performance parameter mostly used for binary classification, calculated as:
$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
The best performance corresponds to an MCC value of 1. All these performance parameters are evaluated using the open source Java based machine learning platform WEKA (Hall et al., 2009).

5.5. Experimental results

In Table 2, we present the characteristics of the datasets and the size of the reduct set produced by FMFRFS as well as FMIFRFS, using the full training set and the 10-fold cross validation technique respectively. Overall classification accuracies with standard deviations, evaluated using PART, JRip and J48 for both the original datasets and the reduced datasets produced by FMFRFS and FMIFRFS on full training sets, are recorded in Table 3. Moreover, average classification accuracies with standard deviations, evaluated using 3NN and SVM for both the original datasets and the reduced datasets produced by FMFRFS and FMIFRFS under 10-fold cross validation, are depicted in Table 4. From the experimental results, it can be observed that our proposed technique usually provides a smaller subset of features than the existing method. For some of the datasets, FMIFRFS produces larger subsets than FMFRFS, but these reduct sets are more accurate, as the performance of the different learning algorithms on them (Tables 3–4) is better than on the FMFRFS based reduct sets. From the experiments, it can also be observed that the average accuracies of different classifiers for the reduced datasets produced by FMIFRFS are almost always higher than those for the reduced datasets produced by FMFRFS, with correspondingly lower standard deviations. This establishes that our proposed model can provide more relevant and less redundant features than the existing approach. Therefore, FMIFRFS provides more accurate results than the existing FMFRFS. Wang et al. (2017b) revealed that FMFRFS performs better than other existing feature selection techniques; therefore, our proposed approach compares favorably with the existing approaches. For better illustration, we present further experimental results in Table 5: the values of sensitivity, specificity, accuracy, AUC and MCC, obtained using the full training set, for the PART, JRip and J48 classifiers on the reduced Ionosphere dataset produced by the FMFRFS and FMIFRFS approaches, as well as on the original dataset. By observing these performance evaluation metrics, it is evident that our approach produces better results not only in terms of the accuracies of the various learning algorithms but also in terms of the other performance parameters. The variation of classification accuracy and reduct size with the noise parameter $\varepsilon$ is depicted in Fig. 1, obtained through a series of experiments using 10-fold cross validation. Further, for the full dataset, a convenient way to observe the overall performance of different classifiers at different decision thresholds is the receiver operating characteristic (ROC) curve, which gives a visual representation of classifier performance. The ROC curves for different classifiers, namely PART, JRip, J48 and Random Forest (Breiman, 1996, 2001), on the various datasets reduced by FMFRFS and FMIFRFS are depicted in Figs. 2–3. These figures clearly indicate that our proposed algorithm is superior to the existing algorithm.

6. Conclusion

Feature selection is an optimization process for selecting the most informative features from various alternatives to facilitate classification or data mining problems. It is one of the dimensionality reduction techniques, offering several benefits in terms of reduced storage cost, reduced data collection effort, lower model building and execution time, and improved model interpretability. Fuzzy rough set theory has been successfully applied in the field of feature selection, overcoming the insufficiencies of classical rough set based techniques in various aspects. However, the traditional fuzzy-rough dependency cannot adequately reveal the learning ability of a subset of attributes or features, as it only tries to keep the fuzzy positive region maximal and cannot suitably fit the data. Wang et al. (2017b) handled this problem by introducing a fitting model for feature selection with fuzzy rough sets. However, fuzzy set theory has certain limitations and cannot handle uncertainty in cases where it lies not only in judgment but also in identification. The human decision-making process and its activities require human expertise and knowledge, which are inevitably imprecise or not totally reliable; this can be simulated using the intuitionistic fuzzy set concept, as it considers membership, non-membership and hesitancy functions simultaneously. In this paper, we introduced a novel intuitionistic fuzzy rough set model to cope with the above-mentioned problems. This model fits the data well and avoids misclassification. Firstly, the intuitionistic fuzzy decision of an object was established using the neighborhood concept. Then, intuitionistic fuzzy lower and upper approximations were introduced using the intuitionistic fuzzy decision along with a parameterized intuitionistic fuzzy granule. Furthermore, an intuitionistic fuzzy dependency function was presented. Moreover, a heuristic greedy forward algorithm was presented, based on the proposed model, to compute the reduct set. Finally, our proposed technique was applied (based on 10-fold cross validation and the full training set) to benchmark datasets and a comparative study with the existing model was presented. From the experimental results, we observed that the presented algorithm provides a more accurate reduct set than the existing algorithm, especially for those information systems in which the categories have a large degree of overlap. In the future, we may propose a discernibility matrix based approach to find all possible reduct sets using our proposed model. Furthermore, we may extend this concept to more refined intuitionistic fuzzy rough set models, such as the variable precision intuitionistic fuzzy rough set model and the type-2 intuitionistic fuzzy rough set model, which can handle uncertainty in a much better way. Moreover, we intend to explore how the proposed model can be applied to construct rule-based classifiers.

Acknowledgment

This research work is funded by UGC Research Fellowship, India.

References

Atanasov, K.T., 1999. Intuitionistic Fuzzy Sets: Theory and Applications. Studies in Fuzziness and Soft Computing. Physica-Verlag, Heidelberg, p. 35.
Atanassov, K.T., 1986. Intuitionistic fuzzy sets. Fuzzy Sets and Systems 20 (1), 87–96.
Atanassov, K.T., 1989. More on intuitionistic fuzzy sets. Fuzzy Sets and Systems 33 (1), 37–45.
Blake, C., Merz, C., 1998. UCI repository of machine learning databases.
Boran, F.E., Genç, S., Kurt, M., Akay, D., 2009. A multi-criteria intuitionistic fuzzy group decision making for supplier selection with TOPSIS method. Expert Syst. Appl. 36 (8), 11363–11368.
Breiman, L., 1996. Bagging predictors. Mach. Learn. 24 (2), 123–140.
Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32.
Chakrabarty, K., Gedeon, T., Koczy, L., 1998. Intuitionistic fuzzy rough set. In: Proceedings of the 4th Joint Conference on Information Sciences, JCIS, Durham, NC.
Chen, D., Hu, Q., Yang, Y., 2011. Parameterized attribute reduction with Gaussian kernel based fuzzy rough sets. Inform. Sci. 181 (23), 5169–5179.
Chen, D., Kwong, S., He, Q., Wang, H., 2012a. Geometrical interpretation and applications of membership functions with fuzzy rough sets. Fuzzy Sets and Systems 193, 122–135.
Chen, D., Zhang, L., Zhao, S., Hu, Q., Zhu, P., 2012b. A novel algorithm for finding reducts with fuzzy rough sets. IEEE Trans. Fuzzy Syst. 20 (2), 385–389.
Chen, H., Yang, H., 2011. One new algorithm for intuitionistic fuzzy-rough attribute reduction. J. Chin. Comput. Syst. 32 (3), 506–510.
Çoker, D., 1998. Fuzzy rough sets are intuitionistic L-fuzzy sets. Fuzzy Sets and Systems 96 (3), 381–383.
Cohen, W.W., 1995. Fast effective rule induction. In: Machine Learning Proceedings 1995. Elsevier, pp. 115–123.
Cornelis, C., De Cock, M., Kerre, E.E., 2003. Intuitionistic fuzzy rough sets: at the crossroads of imperfect knowledge. Expert Syst. 20 (5), 260–270.
De, S.K., Biswas, R., Roy, A.R., 1998. Intuitionistic fuzzy database. In: Second International Conference on IFS, NIFS.
Degang, C., Suyun, Z., 2010. Local reduction of decision system with fuzzy rough sets. Fuzzy Sets and Systems 161 (13), 1871–1883.
Dubois, D., Prade, H., 1990. Rough fuzzy sets and fuzzy rough sets. Int. J. Gen. Syst. 17 (2–3), 191–209.
Dubois, D., Prade, H., 1992. Putting rough sets and fuzzy sets together. In: Intelligent Decision Support. Springer, Netherlands, pp. 203–232.
Esmail, H., Maryam, J., Habibolla, L., 2013. Rough set theory for the intuitionistic fuzzy information systems. Int. J. Mod. Math. Sci. 6 (3), 132–143.
Frank, E., Witten, I.H., 1998. Generating accurate rule sets without global optimization.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H., 2009. The WEKA data mining software. ACM SIGKDD Explor. Newsl. 11 (1), 10.
Hu, Q., Xie, Z., Yu, D., 2007. Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation. Pattern Recognit. 40 (12), 3509–3521.
Hu, Q., Yu, D., Xie, Z., 2006. Information-preserving hybrid data reduction based on fuzzy-rough techniques. Pattern Recognit. Lett. 27 (5), 414–423.
Hu, Q., Zhang, L., Chen, D., Pedrycz, W., Yu, D., 2010. Gaussian kernel based fuzzy rough sets: Model, uncertainty measures and applications. Internat. J. Approx. Reason. 51 (4), 453–471.
Huang, B., Li, H.-x., Wei, D.-k., 2012. Dominance-based rough set model in intuitionistic fuzzy information systems. Knowl.-Based Syst. 28, 115–123.
Huang, B., Zhuang, Y.-l., Li, H.-x., Wei, D.-k., 2013. A dominance intuitionistic fuzzy-rough set approach and its applications. Appl. Math. Model. 37 (12–13), 7128–7141.
Iancu, I., 2014. Intuitionistic fuzzy similarity measures based on Frank t-norms family. Pattern Recognit. Lett. 42, 128–136.
Iannarilli, F.J., Rubin, P.A., 2003. Feature selection for multiclass discrimination via mixed-integer linear programming. IEEE Trans. Pattern Anal. Mach. Intell. 25 (6), 779–783.
Jaeger, J., Sengupta, R., Ruzzo, W.L., 2002. Improved gene selection for classification of microarrays. In: Biocomputing 2003.
Jain, A.K., Duin, P.W., Jianchang, M., 2000. Statistical pattern recognition: a review. IEEE Trans. Pattern Anal. Mach. Intell. 22 (1), 4–37.
Jena, S., Ghosh, S., Tripathy, B., 2002. Intuitionistic fuzzy rough sets. Notes Intuit. Fuzzy Sets 8 (1), 1–18.
Jensen, R., Shen, Q., 2004a. Fuzzy-rough attribute reduction with application to web categorization. Fuzzy Sets and Systems 141 (3), 469–485.
Jensen, R., Shen, Q., 2004b. Semantics-preserving dimensionality reduction: rough and fuzzy-rough-based approaches. IEEE Trans. Knowl. Data Eng. 16 (12), 1457–1471.
Jensen, R., Shen, Q., 2005. Fuzzy-rough data reduction with ant colony optimization. Fuzzy Sets and Systems 149 (1), 5–20.
Jensen, R., Shen, Q., 2007. Fuzzy-rough sets assisted attribute selection. IEEE Trans. Fuzzy Syst. 15 (1), 73–89.
Jensen, R., Shen, Q., 2008. Computational Intelligence and Feature Selection: Rough and Fuzzy Approaches, Vol. 8. John Wiley & Sons.
Jensen, R., Shen, Q., 2009. New approaches to fuzzy-rough feature selection. IEEE Trans. Fuzzy Syst. 17 (4), 824–838.
Klir, G.J., Yuan, B., 1995. Fuzzy Sets and Fuzzy Logic: Theory and Applications, Vol. 574. Prentice Hall PTR, New Jersey.
Kohavi, R., John, G.H., 1997. Wrappers for feature subset selection. Artificial Intelligence 97 (1–2), 273–324.
Kumar, P., Vadakkepat, P., Poh, L.A., 2011. Fuzzy-rough discriminative feature selection and classification algorithm, with application to microarray and image datasets. Appl. Soft Comput. 11 (4), 3429–3440.
Kwak, N., Chong-Ho, C., 2002. Input feature selection by mutual information based on Parzen window. IEEE Trans. Pattern Anal. Mach. Intell. 24 (12), 1667–1671.
Langley, P., 1994. Selection of Relevant Features in Machine Learning. Defense Technical Information Center.
Li, L.-q., Wang, X.-l., Liu, Z.-x., Xie, W.-x., 2019. A novel intuitionistic fuzzy clustering algorithm based on feature selection for multiple object tracking. Int. J. Fuzzy Syst.
Lu, Y.L., Lei, Y.J., Hua, J.X., 2009. Attribute reduction based on intuitionistic fuzzy rough set. Control Decis. 3, 003.
Nanda, S., Majumdar, S., 1992. Fuzzy rough sets. Fuzzy Sets and Systems 45 (2), 157–160.
Pawlak, Z., 1982. Rough sets. Int. J. Comput. Inf. Sci. 11 (5), 341–356.
Pawlak, Z., 2012. Rough Sets: Theoretical Aspects of Reasoning About Data, Vol. 9. Springer Science & Business Media.
Pawlak, Z., Grzymala-Busse, J., Slowinski, R., Ziarko, W., 1995. Rough sets. Commun. ACM 38 (11), 88–95.
Revanasiddappa, M., Harish, B., 2018. A new feature selection method based on intuitionistic fuzzy entropy to categorize text documents. Int. J. Interact. Multimedia Artif. Intell. 5 (3).
Rizvi, S., Naqvi, H.J., Nadeem, D., 2002. Rough intuitionistic fuzzy sets. In: JCIS.
Ross Quinlan, J., 1993. C4.5: Programs for machine learning. Mach. Learn. 16 (3), 235–240.
Samanta, S., Mondal, T., 2001. Intuitionistic fuzzy rough sets and rough intuitionistic fuzzy sets. J. Fuzzy Math. 9 (3), 561–582.
Sheeja, T., Kuriakose, A.S., 2018. A novel feature selection method using fuzzy rough sets. Comput. Ind. 97, 111–121.
Shreevastava, S., Tiwari, A.K., Som, T., 2018. Intuitionistic fuzzy neighborhood rough set model for feature selection. Int. J. Fuzzy Syst. Appl. 7 (2), 75–84.
Shreevastava, S., Tiwari, A., Som, T., 2019. Feature subset selection of semi-supervised data: An intuitionistic fuzzy-rough set-based concept. In: Proceedings of International Ethical Hacking Conference 2018.
Singh, S., Shreevastava, S., Som, T., Jain, P., 2019. Intuitionistic fuzzy quantifier and its application in feature selection. Int. J. Fuzzy Syst. 21 (2), 441–453.
Suyun, Z., Tsang, E., Degang, C., 2009. The model of fuzzy variable precision rough sets. IEEE Trans. Fuzzy Syst. 17 (2), 451–467.
Tan, A., Wu, W.-Z., Qian, Y., Liang, J., Chen, J., Li, J., 2018. Intuitionistic fuzzy rough set-based granular structures and attribute subset selection. IEEE Trans. Fuzzy Syst. 27 (3), 527–539.
Tiwari, A.K., Shreevastava, S., Shukla, K.K., Subbiah, K., 2018a. New approaches to intuitionistic fuzzy-rough attribute reduction. J. Intell. Fuzzy Systems 34 (5), 3385–3394.
Tiwari, A.K., Shreevastava, S., Som, T., Shukla, K.K., 2018b. Tolerance-based intuitionistic fuzzy-rough set approach for attribute reduction. Expert Syst. Appl. 101, 205–212.
Tiwari, A.K., Shreevastava, S., Subbiah, K., Som, T., 2019. An intuitionistic fuzzy-rough set model and its application to feature selection. J. Intell. Fuzzy Systems 36 (5), 4969–4979.
Tsang, E.C., Degang, C., Yeung, D.S., Xi-Zhao, W., Lee, J., 2008. Attributes reduction using fuzzy rough sets. IEEE Trans. Fuzzy Syst. 16 (5), 1130–1141.
Wang, C., He, Q., Shao, M., Xu, Y., Hu, Q., 2017a. A unified information measure for general binary relations. Knowl.-Based Syst. 135, 18–28.
Wang, C., Huang, Y., Shao, M., Chen, D., 2019a. Uncertainty measures for general fuzzy relations. Fuzzy Sets and Systems 360, 82–96.
Wang, C., Huang, Y., Shao, M., Fan, X., 2019b. Fuzzy rough set-based attribute reduction using distance measures. Knowl.-Based Syst. 164, 205–212.
Wang, C., Qi, Y., Shao, M., Hu, Q., Chen, D., Qian, Y., Lin, Y., 2017b. A fitting model for feature selection with fuzzy rough sets. IEEE Trans. Fuzzy Syst. 25 (4), 741–753.
Wang, C., Shao, M., He, Q., Qian, Y., Qi, Y., 2016. Feature subset selection based on fuzzy neighborhood rough sets. Knowl.-Based Syst. 111, 173–179.
Wang, C., Shi, Y., Fan, X., Shao, M., 2019c. Attribute reduction based on k-nearest neighborhood rough sets. Internat. J. Approx. Reason. 106, 18–31.
Webb, A.R., Copsey, K.D., 2011. Performance assessment. In: Statistical Pattern Recognition. John Wiley & Sons, Ltd, pp. 404–432.
Xiong, M., Fang, X., Zhao, J., 2001. Biomarker identification by feature wrappers. Genome Res. 11 (11), 1878–1887.
Zadeh, L.A., 1965. Fuzzy sets. Inf. Control 8 (3), 338–353.
Zhang, Z., 2016. Attributes reduction based on intuitionistic fuzzy rough sets. J. Intell. Fuzzy Systems 30 (2), 1127–1137.
Zhang, L., Zhan, J., Xu, Z., Alcantud, J.C.R., 2019. Covering-based general multigranulation intuitionistic fuzzy rough sets and corresponding applications to multi-attribute group decision-making. Inform. Sci. 494, 114–140.
Zhang, X., Zhou, B., Li, P., 2012. A general frame for intuitionistic fuzzy rough sets. Inform. Sci. 216, 34–49.