Applying the JBOS reduction method for relevant knowledge extraction


Expert Systems with Applications 40 (2013) 1880–1887

Contents lists available at SciVerse ScienceDirect

Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa

Sérgio M. Dias a,b,*, Newton J. Vieira a

a Department of Computer Science, Federal University of Minas Gerais (UFMG), Av. Antônio Carlos 6627-ICEx, Pampulha, 31270-010 Belo Horizonte, Minas Gerais, Brazil. b Federal Service of Data Processing (SERPRO), Av. José Cândido da Silveira, 1.200 Cidade Nova, 31.035-536, Belo Horizonte, MG, Brazil.

Keywords: Formal concept analysis; JBOS method; Formal context reduction; Lattice reduction

Abstract. This work presents results from an experiment used to assess the JBOS (junction based on objects similarity) reduction method. Two reductions were made of a formal context about patients having symptoms in a tuberculosis data base. The first reduction used the knowledge expressed in the original formal context and the second used the knowledge expressed in expert rules. The assessment was made, in the first case, by comparing the performance of the sets of extracted rules (stem bases) before and after the reduction, and in the second case, by comparing the performance of the set of rules extracted after reduction with that of the expert rules. The performance in the first case was exactly the same as before reduction. In the second case the performance even improved, showing that the weighting process, besides incorporating the expert knowledge, resulted in rules well adjusted to the knowledge expressed in the original formal context. So, both reductions resulted in rule sets absolutely consistent with the original ones. The expert rules, the FCA rules and both sets of rules obtained after reduction were also used to classify patients of a validation set. In this case, the results showed that the performance was the same before and after reduction. Therefore, it was shown that by means of an appropriate assignment of attribute weights it is possible, with the JBOS method, to achieve a suitable level of performance in a specific task after reduction. © 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Formal concept analysis (FCA) is a technique based on the mathematization of the notion of concept as consisting of intension and extension, and on the organization of the concepts into a conceptual hierarchy (Ganter & Wille, 1999). It was born in 1982 with the work of Wille (1982). Currently there is a growing interest in applications of FCA in several areas, for example information retrieval (Koester, 2006), data mining (Falk & Gardent, 2011; Poelmans, Elzinga, Viaene, & Dedene, 2010; Valtchev, Missaoui, & Godin, 2004), neural networks (Zárate & Dias, 2009), social networks (Jay, Kohler, & Napoli, 2008; Riadh, Le Grand, Aufaure, & Soto, 2009), software engineering (Codocedo, Taramasco, & Astudillo, 2011), etc. The initial data in FCA is supposed to be supplied in the form of a binary relation between a set of objects and a set of attributes, called a formal context. The great potential of FCA is provided by the organization of the knowledge present in the formal context, essentially a set of (formal) concepts, into a conceptual hierarchy termed the concept

* Corresponding author at: Department of Computer Science, Federal University of Minas Gerais (UFMG), Av. Antônio Carlos 6627-ICEx, Pampulha 31270-010, Belo Horizonte, Minas Gerais, Brazil. E-mail addresses: [email protected], [email protected] (S.M. Dias), [email protected] (N.J. Vieira). 0957-4174/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.eswa.2012.10.010

lattice. In fact, the main applications make use of the concept lattice represented by means of a line diagram, a nested line diagram, a tree diagram, etc. The cost of generating and hierarchically organizing the set of all formal concepts is exponential in the worst case (Kuznetsov, 2001). Although this worst case is rarely found in practice (Godin, Saunders, & Gecsei, 1986), the computational cost can still be prohibitive for many applications. Besides, the resulting number of concepts and/or the complexity of the hierarchy can hinder a proper analysis of the underlying knowledge (Rice & Siff, 2001). In particular, the essential aspects, those which one is effectively looking for, can be immersed in a myriad of irrelevant details. In many applications, for example knowledge extraction, ontology construction and machine learning, it is important to be able to construct a higher-level, abstracted version of the concept lattice. In fact, the problem of obtaining a concept lattice of a size and complexity that make explicit the relevant aspects of the intended application is one of the most important problems of FCA (Belohlavek & Macko, 2011; Konecny & Krupka, 2011; Kuznetsov, 1990; Liu & Mi, 2008; Pei & Mi, 2011; Roth, Obiedkov, & Kourie, 2006; Wang & Zhang, 2008b). In Dias and Vieira (2010) we proposed a new method for reducing the complexity of the concept lattice, called junction based on objects similarity (JBOS), which seeks to replace groups of similar objects by representative ones. The similarity is


S.M. Dias, N.J. Vieira / Expert Systems with Applications 40 (2013) 1880–1887

measured not on the simple presence/absence of attributes, as done in several previous works (Kuznetsov, 2007; Li, Mei, & Lv, 2011a; Pei & Mi, 2011; Rice & Siff, 2001; Stumme, Taouil, Bastide, Pasquier, & Lakhal, 2002; Wenxiu, Ling, & Jianjun, 2005), but on the semantic relevance of the attributes. Even though there are other methods that also take into account the importance of attributes, such as (Bélohlávek, Sklenár, & Zacpal, 2004; Belohlavek & Macko, 2011; Belohlavek & Vychodil, 2009), our method has a feature that can make it more suitable for many applications: the grouping of similar objects based on a relevance measure directly put forward by the user. Here we show that our method can effectively be used to simplify a formal context in order to expose relevant knowledge in the final concept lattice. Moreover, it does so without losing information that could be important for the intended application. This result shows that through careful attribute weighting one can obtain, in practice, a concept lattice that exhibits more clearly the really important relationships required for an actual application. The problem of evaluating the different approaches for concept lattice reduction is very important. Kuznetsov and Obiedkov (2002) remark that a well characterized set of data bases is needed in order to assess the quality of new algorithms for FCA. In the context of complexity reduction the problem is even more evident: how does one measure the quality of the resulting formal context and/or concept lattice after reduction? General methods such as those proposed in Riadh et al. (2009), Codocedo et al. (2011), Soldano, Ventos, Champesme, and Forge (2010), King (2004), Bélohlávek et al. (2004), Belohlavek and Vychodil (2009) and Belohlavek and Macko (2011) give an idea of the information or descriptive losses implied by the reduction, but actual tests using specific data bases are still very important.
In our method different attribute weightings give different reductions; a way to assess the quality of a specific one is to have it evaluated by a domain expert, as is usually done for expert systems in general (Jay et al., 2008; Riadh et al., 2009; Codocedo et al., 2011; Soldano et al., 2010; King, 2004; Falk & Gardent, 2011; Gaillard, Lieber, & Nauer, 2011). To demonstrate the effectiveness of JBOS for its intended purpose, the exposure of relevant knowledge, we selected a data base in health care, more specifically one about tuberculosis (TB). The TB data base was taken from Horner (2007), where it was used in the development of an expert system. Later it was used by Kumar and Srinivas (2010) and Kumar (2011) for mining association rules via FCA after applying a reduction method based on SVD (singular value decomposition). The TB data base is presented as a formal context in which the objects are patients and the attributes are symptoms. The relevant knowledge to be extracted is a simplified concept lattice that, in spite of having fewer details than the original one, allows the extraction of a good set of rules for tuberculosis diagnosis. Along with the TB base, Horner (2007) supplies a set of expert rules, making possible a comparison with the rules obtained using FCA with and without reduction. Two different reductions were applied, one in which the attribute weights were based on the frequency of symptoms in those patients actually having TB, and the other based on the frequency of symptoms in the expert rules. In the first case, after a reduction of 33% in the number of concepts, surprisingly all rules extracted before the reduction were also extracted from the reduced lattice and exactly the same number of correct diagnoses was achieved.
In the second case, after a reduction of 78%, all rules extracted before the reduction were subsumed by the new rules and the performance increased from 81% to 90% correct diagnoses, showing that even after the reduction process the resulting set of rules became better adjusted to the data of the original formal context. In the next section the basic concepts of FCA sufficient for understanding the results of this paper are reviewed and, at the

Table 1. Formal context of the TB dataset. [Cross table: 21 patients (objects 1–21) × the 12 symptoms of Table 2 plus the TB diagnosis; the cross layout was lost in extraction.]

same time, the TB formal context and lattice are presented together with the FCA rules (the stem base) and the expert rules for TB diagnosis. Next, in Section 3, the application of the JBOS reduction method to the TB formal context is explored. Related works are analyzed in Section 4 and conclusions are drawn in Section 5.

2. Formal concept analysis

Formal concept analysis is a field of mathematics which was born in the early eighties (Wille, 1982). Its main characteristic is knowledge representation by means of a (concept) lattice, usually presented in the form of a line diagram (or Hasse diagram). In FCA, the initial data is presented as a formal context. A formal context is a triplet (G, M, I), where G is a set of elements called objects, M is a set of elements called attributes and I ⊆ G × M is called an incidence relation. If (g, m) ∈ I, one says that "the object g has the attribute m". A formal context is usually presented as a cross table where the objects are row headers, the attributes are column headers and there is a cross in row g and column m if and only if (g, m) ∈ I. Table 1 shows the formal context for a TB data set (Horner, 2007). The objects are patients and the attributes are symptoms. The abbreviations used for symptoms are as shown in Table 2. Given a set of objects A ⊆ G from a formal context (G, M, I), the set of attributes which are common to all those objects is termed A′. Similarly, for a set B ⊆ M, B′ is the set of objects that have all the attributes from B. That is to say, A′ = {m ∈ M | ∀g ∈ A: (g, m) ∈ I} and B′ = {g ∈ G | ∀m ∈ B: (g, m) ∈ I}. By using such derivation operators, the notion of formal concept is defined as a pair (A, B) ∈ P(G) × P(M) such that A′ = B and B′ = A, where A is called the extent and B the intent of the concept. For example, from the formal context of Table 1, it can be seen that the pair

({1, 7}, {PC, SP, WL, CP, SB, TB}) is a formal concept with extent {1, 7} and intent {PC, SP, WL, CP, SB, TB}. The set of formal concepts can be ordered by the partial order ≤ such that for any two formal concepts (A1, B1) and (A2, B2), (A1, B1) ≤ (A2, B2) if and only if A1 ⊆ A2 (equivalently, B2 ⊆ B1). The set of concepts ordered by ≤ constitutes a complete lattice (Davey & Priestley, 1990), the so-called concept lattice. The concept lattice obtained from a formal context (G, M, I) is denoted by B(G, M, I). Fig. 1

Table 2. Abbreviations of TB symptoms.

Symptom                          Abbreviation
Persistent cough                 PC
Sputum production                SP
Sputum prod. is muco-purulent    MC
Sputum bloody                    BS
Clear sputum                     CS
Weight loss                      WL
Extreme night sweats             NS
No appetite                      NA
Chest pain                       CP
Shortness of breath              SB
Tuberculosis contact             TC
Tiredness                        TN

Fig. 1. Concept lattice originated from the formal context of TB dataset.

presents the line diagram (Ganter & Wille, 1999) of the concept lattice originated from the formal context of Table 1.¹ Each node in the line diagram represents a formal concept. The objects are shown inside white boxes drawn below some concept nodes and the attributes inside gray boxes drawn above some concept nodes. The boxes are distributed in such a way that the extent of a concept is obtained by collecting all objects from the concept node down to the lattice infimum, and its intent by collecting all attributes from the concept node up to the lattice supremum. The labeling can also be explained by using the notions of object concept and attribute concept. Given an object g, γg is the object concept (g″, g′), and μm, for an attribute m, is the attribute concept (m′, m″). Then, the labeling of B(G, M, I) proceeds as follows: for each object g the formal concept γg is labeled g, and for each attribute m the formal concept μm is labeled m. The line diagram of Fig. 1 has 101 concept nodes. It must be noticed that in spite of the small number of objects in Table 1, the

¹ All line diagrams in this paper were drawn using the Conexp software (Yevtushenko, 2000).

resulting lattice is somewhat complex and its visual analysis is not simple, thus justifying approaches for lattice reduction and simplification without loss of relevant information. The first part of the basic theorem on concept lattices (Wille, 1982) says that a concept lattice B(G, M, I) is a complete lattice in which, for any arbitrary set C ⊆ B(G, M, I), the infimum and supremum are given by ∧C = (∩X, (∪Y)″) and ∨C = ((∪X)″, ∩Y), where X = {A | (A, B) ∈ C} and Y = {B | (A, B) ∈ C}. The knowledge embodied by a concept lattice can be used to derive rules P → Q, where P and Q are sets of attributes, which express that P′ ⊆ Q′, or in words: if an object has all the attributes in P, then it has all those of Q (equivalently, Q ⊆ P″) (Ganter & Wille, 1999). For instance, the rule NS, CP → TB, meaning that those who have NS (extreme night sweats) and CP (chest pain) have TB (tuberculosis), follows from the lattice of Fig. 1, although it is not immediately visible there. Visually, a rule like NS, CP → TB is seen to be valid if there is an ascending path from the formal concept μNS ∧ μCP to μTB. A set of rules is said to be redundant if it has a rule that follows from the others. The set of all rules for a formal context usually has a high degree of redundancy and its extraction can be impractical. A set of rules C is said to be complete if any rule of the set of all rules follows from C. The minimum complete non-redundant rule set is known as the stem base or Duquenne–Guigues base (Duquenne & Guigues, 1986). In general it may be difficult to compute (Kuznetsov & Obiedkov, 2008; Distel, 2011; Distel & Sertkaya, 2011), but that is not the case for the TB example. In cases where the stem base is not a proper option, other less strict restrictions can be used, for example a set of left-reduced implications (Carpineto & Romano, 2003). Table 3(a) shows all rules concluding TB taken from the stem base of the TB data base.
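To make the definitions concrete, the following sketch (illustrative Python, not from the paper; the three-patient mini context is invented) implements the derivation operators and checks an implication P → Q the way the rules above are read, i.e. P → Q holds if and only if P′ ⊆ Q′:

```python
def derive_attrs(A, I, M):
    """A' : the attributes common to all objects in A."""
    return {m for m in M if all((g, m) in I for g in A)}

def derive_objs(B, I, G):
    """B' : the objects having all attributes in B."""
    return {g for g in G if all((g, m) in I for m in B)}

def implication_holds(P, Q, I, G, M):
    """P -> Q holds iff every object with all of P also has all of Q (P' is a subset of Q')."""
    return derive_objs(P, I, G) <= derive_objs(Q, I, G)

# Invented mini context: every patient with both NS and CP also has TB.
G = {1, 2, 3}
M = {"NS", "CP", "TB"}
I = {(1, "NS"), (1, "CP"), (1, "TB"), (2, "NS"), (3, "CP"), (3, "TB")}

print(derive_attrs({1, 2}, I, M))                        # {'NS'}
print(implication_holds({"NS", "CP"}, {"TB"}, I, G, M))  # True
print(implication_holds({"CP"}, {"NS"}, I, G, M))        # False: patient 3 has CP but not NS
```

A pair (A, B) with derive_attrs(A) == B and derive_objs(B) == A is then exactly a formal concept in the sense defined in Section 2.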
A set of expert rules, taken from Horner (2007), is presented alongside in Table 3(b). The first important thing to note is that, in general, the extracted rules have fewer attributes in the antecedent than the expert rules, so TB is diagnosed based on fewer symptoms in the former case. Another important feature is that the expert rules are all subsumed by the extracted rules, meaning that, from the logical point of view, all the expert's rules are consequences of the extracted rules. Table 4 shows the subsumption information. Implications 1 and 2 are new; there is no corresponding (subsumed) expert rule. It seems that the high number of subsumptions may be explained by the fact that the data in Table 1 is mostly correct (i.e., consistent with the expert knowledge), while the set of recorded patients is relatively small (only 21). Nonetheless, there is no way to know with certainty whether a rule constructed by FCA is worth retaining as a good rule (substituting a subsumed one, when that is the case), other than by consulting human experts. Another strategy would be to carefully construct a much larger formal context than that of Table 1, and only then extract the set of rules. Our goal in this paper is to achieve the same kind of consistency that the extracted FCA rules have relative to the expert rules (corroborated by the high number of subsumptions). The idea is

Table 3. (a) TB FCA rules. (b) TB expert rules.

b

#

FCA rules

#

Expert rules

1 2 3 4 5 6 7 8

NS CP ?TB WL TN ?TB NA CP ?TB BS ?TB PC SP NS ?TB WL NS ?TB WL CP ?TB TC ?TB

1 2 3 4 5 6 7 9

PC PC PC PC PC PC PC PC

SP BS WL ?TB SP BS NS ?TB WL NS ?TB SP BS TC ?TB SP BS CP NA ?TB SP BS SB ?TB WL CP SB ?TB SP BS CP TN ?TB

Table 4. Expert rules subsumed by the FCA rules.

FCA rule #    Expert rule #
1             –
2             –
3             5
4             1, 2, 4, 5, 6, 8
5             2
6             3
7             7
8             4

to reduce the lattice by means of an appropriate attribute weighting, yet afterwards producing rules highly consistent with those produced before reduction.

3. Applying JBOS to the TB data base

The JBOS method (Dias, 2010; Dias & Vieira, 2010) is based on the replacement of groups of "similar" objects by representative ones. Naturally, the key notion here is that of similarity. For its definition, a weight is associated to each attribute of the formal context, which seeks to represent the attribute's relevance. As the structure of the concept lattice is derived from the arrangements of the attributes of objects, it is expected that the junction of similar objects will preserve, to a certain degree (depending on the specific assigned weights), a portion of the original structure, while providing a certain degree of simplification. In the rest of this section, let (G, M, I) be a formal context from which a reduced formal context must be produced. JBOS demands that to each m ∈ M is assigned a weight wm such that 0 ≤ wm ≤ 1. The purpose of such a number is to express the significance of the attribute, from 0 (no relevance) to 1 (absolute relevance). It must be defined by the user based on the attribute's importance for the specific task the user has in mind, because attributes can have different relevance for different tasks. Two different sets of weights will be used for the task of TB diagnosis; one based on the frequency of appearance of each symptom in those patients actually having TB, and the other based on the frequency of each symptom in the expert rules. The numbers computed from Tables 1 and 3(b) are shown in Table 5, where attribute TB has a weight of 1 in both cases. The similarity between two objects g, h ∈ G is also expressed by a real number between 0 (completely dissimilar) and 1 (completely indistinguishable):

sim(g, h) = ( Σ_{m∈M} l(g, h, m) ) / ( Σ_{m∈M} w_m )    (1)

where l(g, h, m) = w_m if (g, m) ∈ I ↔ (h, m) ∈ I, and l(g, h, m) = 0 otherwise.

That is, the similarity between g and h is given by the sum of the weights of the attributes on which the two objects agree (both having them or both lacking them), normalized by the total weight. The JBOS algorithm uses the similarity matrix as a starting point for the construction of clusters of "similar" objects. It has two parameters: a similarity index ε and a maximum number of similar elements a, both defined by the user. Two objects g, h ∈ G are considered similar by the algorithm, and therefore are likely to stay in the same group, if and only if sim(g, h) ≥ ε. The parameter a is just one more cutting option for the user: it specifies that each group can contain at most a elements. Even though sim actually induces a tolerance relation, the JBOS algorithm computes maximal groups such that two objects g and h are in the same group only if sim(g, h) ≥ ε. The actual equivalence relation is not completely specified by the algorithm (it is non-deterministic): if sim(g, h) ≥ ε and sim(h, i) ≥ ε, but sim(g, i) < ε, then h can be either in the same group as g or in the same group as i; the algorithm does not specify which group is selected, as is usual in clustering algorithms (Jain, Murty, & Flynn, 1999; Xu, Wunsch, & Donald, 2005). Let c be the set of all clusters of objects constructed from G by the JBOS approach. In the reduced formal context each set H ∈ c will be considered an object, and such an object will have the attributes of M which are common to all objects in H. That is, the attributes of H will be those in ∩{g′ | g ∈ H}, where the derivation operator is applied with respect to I. Therefore, by the JBOS method the reduced formal context (Gr, Mr, Ir) is such that: Gr = c; Mr = ∪{∩{g′ | g ∈ H} | H ∈ c}; and (H, m) ∈ Ir if, and only if, m ∈ ∩{g′ | g ∈ H}. Table 6 presents the reduced formal context obtained by the application of JBOS to the formal context of Table 1 using the attribute weights of the second column of Table 5, ε = 0.65 and a = 3. Each row is headed by a set H of similar objects, as grouped by JBOS, and its crosses mark the attributes in ∩{g′ | g ∈ H}, as explained above. For example, the row headed by {17, 18} indicates that the objects 17 and 18 of the original formal context were grouped into one object in the reduced formal context, with attributes 17′ ∩ 18′ = {PC, WL, NS, TN, TB}. Since the TB base is relatively small and focused on a very specific kind of task, the choices of ε and a were made in order to achieve around 30% of reduction of the set of formal concepts. Larger data bases could require greater reductions, possibly to allow focus on a specific type of encoded knowledge. The choice of those two parameters is empirical, obtained through one or more experiments guided by the characteristics of the data and task. The reduced formal context of Table 6 results in the concept lattice of Fig. 2, which has 68 formal concepts, 33 fewer than the lattice of Fig. 1.
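Read this way, Eq. (1) can be sketched in a few lines of Python (an illustrative sketch, not the authors' code; the toy attributes, weights and objects below are invented):

```python
def similarity(g, h, I, M, w):
    """Eq. (1): weighted agreement of g and h on attribute presence/absence."""
    agree = sum(w[m] for m in M if ((g, m) in I) == ((h, m) in I))
    return agree / sum(w[m] for m in M)

# Invented toy context with three weighted attributes.
M = {"PC", "WL", "BS"}
w = {"PC": 0.8, "WL": 0.9, "BS": 0.1}
I = {(1, "PC"), (1, "WL"), (2, "PC"), (2, "BS")}

s = similarity(1, 2, I, M, w)
print(round(s, 3))  # objects agree only on PC: 0.8 / 1.8 ≈ 0.444
# With a similarity index of 0.65, objects 1 and 2 would not be grouped.
```

The normalization by the total weight makes the value comparable across pairs: an object pair agreeing only on low-weight attributes ends up with a low similarity even if it agrees on many of them.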

Table 5. Weights of attributes.

Attribute   Based on the formal context   Based on the expert rules
PC          0.79                          1
SP          0.57                          0.75
MC          0.36                          0
BS          0.07                          0.75
CS          0.14                          0
WL          0.93                          0.38
NS          0.79                          0.25
NA          0.71                          0.13
CP          0.71                          0.38
SB          0.43                          0.25
TC          0.21                          0.13
TN          0.79                          0.13
TB          1                             1

Table 6. Reduced formal context for ε = 0.65 and a = 3. [Cross table: each row is headed by a group of similar objects produced by JBOS (among them {2, 7}, {8, 9} and {17, 18}) and its crosses mark the attributes common to all objects of the group; the cross layout was lost in extraction.]


The stem base derived from Table 6 surprisingly has exactly the same rules as the stem base of Table 3 minus rules 4 and 5. Even without those two rules, all objects receive the same classification; that is, those rules are actually unnecessary for the purpose of TB classification. In conclusion, even after the reduction specified above, exactly the same number of correct classifications is achieved as without reduction: 95.2%, as 20 objects of the original formal context (Table 1) are correctly classified as having or not having TB; only object 9 is not correctly classified. It is important to emphasize that the above reduction, despite simplifying the formal context and the concept lattice, resulted in rules with exactly the same level of performance as the original ones. The rate of successful applications of the set of rules to the objects of the original formal context is called fidelity in Dias and Vieira (2010); it is given by:



F = ( Σ_{i=1}^{k} (1 − Nf_i / |G|) ) / k    (2)

where Nf_i is the total number of failures of rule r_i (i.e., the number of objects in G for which the rule fails) and k is the number of rules. Thus, F goes from 0 (all rules fail for all objects) to 1 (no rule fails for any object). The value of F for the above reduction is, therefore, 1. As the set of attributes of an object in the reduced formal context is the intersection of the sets of attributes of the constituent original objects, there are some losses of attributes for some objects. This is measured by the so-called descriptive loss defined in Dias and Vieira (2010):

DL = ( Σ_{g∈G} (1 − |m(g)′| / |g′|) ) / |G|    (3)

where m(g) is the group containing g, which is an object of the reduced formal context, |m(g)′| is the number of attributes of the reduced object m(g) and |g′| is the number of attributes of the original object g. The descriptive loss for the reduced formal context of the TB data is DL = 0.11, indicating that few attributes were disregarded as unimportant. The high fidelity shows that such a loss is not important for the present purpose.
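Equations (2) and (3) have a direct reading in code. The sketch below (illustrative Python, not the authors' implementation; the failure counts and attribute sets are invented) computes both indices:

```python
def fidelity(failure_counts, n_objects):
    """Eq. (2): mean over the k rules of 1 - Nf_i / |G|."""
    k = len(failure_counts)
    return sum(1 - nf / n_objects for nf in failure_counts) / k

def descriptive_loss(orig_attrs, reduced_attrs):
    """Eq. (3): mean over objects g of 1 - |m(g)'| / |g'|.

    orig_attrs[g]    -- attributes of g in the original context (g')
    reduced_attrs[g] -- attributes of g's representative object (m(g)')
    """
    n = len(orig_attrs)
    return sum(1 - len(reduced_attrs[g]) / len(orig_attrs[g])
               for g in orig_attrs) / n

print(fidelity([0, 0, 0], 21))   # 1.0: no rule fails on any object

orig = {1: {"PC", "WL"}, 2: {"PC"}}
red  = {1: {"PC"}, 2: {"PC"}}    # object 1 lost WL when merged
print(descriptive_loss(orig, red))  # (0.5 + 0.0) / 2 = 0.25
```

Both values live in [0, 1], so a reduction can be summarized by the pair (F, DL) exactly as done in Tables 7 and 10.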

Table 7 summarizes the above indices, where all numbers go from 0 (lowest) to 1 (highest). Another JBOS reduction was produced based on the attribute weights of the third column of Table 5, which were derived from the knowledge present in the expert rules. The reduction used the same value of ε (0.65) as before and a = 5. Table 8 presents the reduced formal context and Fig. 3 the respective concept lattice. The number of objects decreased from 21 to 11, a 52% reduction, and the number of concepts from 101 to 22, a high reduction of 78%. It must be emphasized that two attributes are eliminated by JBOS (CS and TC), as in the reduced formal context they would not be marked for any object, and consequently the infimum of the reduced formal context would be ({}, {CS, TC}). Table 9(a) shows the stem base extracted from the reduced lattice. The number of rules thus decreased from 8 (see Table 3(b)) to 5. However, Table 9(b) shows that rules 1, 4 and 5 subsume all eight expert rules, showing that, even though different rules were produced (which was not the case in the former reduction, based on the formal context knowledge), the resulting rules are still consistent with the original expert rules as far as the knowledge in the original formal context is concerned. The new set of rules correctly classifies more patients from the original formal context than the original set of expert rules! In particular, objects 4 and 5, which are not correctly classified by the expert rules, are now classified by two of the new rules of Table 9(a): rules 2 (object 4) and 3 (object 5). As a consequence, the number of correct classifications grew from 81% to 90%. The fidelity index for the present reduction is 0.96 and the descriptive loss 0.34. Fidelity is no longer 1 and the descriptive loss is relatively high, as consequences of the greater reduction (even with the same ε as in the former reduction). Table 10 summarizes these numbers.
To conclude, both reductions, based on weights set in accordance with previous knowledge (in one case the knowledge of the actual patients submitted to analysis, in the other the knowledge expressed by an expert), resulted in rule sets very consistent with the original ones. In the first case, after a relatively small reduction of 33% in the number of concepts, basically the same set of rules was obtained as before reduction, a very impressive result indeed! In the second case, after a big reduction of 78% in the number of concepts, the classification provided by the new rules became adjusted to the original formal context in order to

Table 7. Indices for the reduction based on the original formal context.

Index                      Value
# of objects reduction     0.11
# of concepts reduction    0.33
Correct classification     0.95
Fidelity                   1.00
Description loss           0.11

Table 8. Reduced formal context using weights based on the expert rules.

Fig. 2. Concept lattice extracted from the reduced formal context.

[Cross table for Table 8: each row is headed by a group of similar objects produced by JBOS (among them {5, 8, 9}); attributes CS and TC no longer appear; the cross layout was lost in extraction.]

give better classification of its data, with 2 new rules covering two previously unclassified patients.

Table 11, taken from Horner (2007), contains a validation set. Using the expert rules presented in Table 3(b), 80% of the patients of the validation set were correctly classified. Using the rules extracted after reduction with weights based on the expert rules (Table 9(a)) gave the same result. The FCA rules shown in Table 3(a) and those extracted from the reduced formal context correctly classified the same number of patients (90%). Thus, even after reduction the rules obtained were able to correctly classify a significant portion of data not previously seen.

Fig. 3. Concept lattice extracted from the reduced context using weights based on the expert rules.

Table 9. (a) Rules after reduction using weights based on the expert rules. (b) Expert rules subsumed.

(a)
#  Extracted rule
1  PC, CP → TB
2  PC, WL, TN → TB
3  NS, CP → TB
4  BS → TB
5  PC, SP, TN → TB

(b)
Extracted rule #   Expert rule #
1                  5, 7, 8
2                  –
3                  –
4                  1, 2, 4, 5, 6, 8
5                  8

Table 10. Indices for the reduction based on the expert rules knowledge.

Index                      Value
# of objects reduction     0.52
# of concepts reduction    0.78
Correct classification     0.90
Fidelity                   0.96
Description loss           0.34

Table 11. Formal context of the TB dataset validation set. [Cross table: patients 22–31 × symptoms plus TB; the cross layout was lost in extraction.]

4. Related works

The notion of reduced context in this work is different from that defined in Ganter and Wille (1999), where reduction is based on eliminating objects and attributes while still producing a concept lattice isomorphic to that derived from the original formal context; that reduced concept lattice is unique up to isomorphism. In particular, the clarification of a context, that is, the merging of objects with the same intents and of attributes with the same extents, is not necessarily accomplished by JBOS, as JBOS does not consider that attributes with the same extents are to be merged. Several approaches try to choose "important" concepts of the original concept lattice, thus forming a sublattice of the original one (Belohlavek & Macko, 2011; Belohlavek & Vychodil, 2009; Godin & Mili, 1993; Klimushkin, Obiedkov, & Roth, 2010; Kuznetsov, 1990; Rice & Siff, 2001; Stumme et al., 2002). Each of those approaches is based on a specific measure of importance, besides also trying to produce more relevant concepts. The JBOS approach differs from all of them mainly in two aspects: the reduced lattice in JBOS is a complete lattice generated from a reduced formal context, not necessarily a sublattice of the original, and JBOS does not have to construct the set of all concepts of the original lattice in order to choose the more important ones. The intention of JBOS is to produce a reduced formal context from which a whole lattice of "important" formal concepts is produced. A possible way to choose between concepts, as in Stumme et al. (2002), is to select frequent formal concepts, an adaptation to FCA of the notion of frequent itemsets. In this setting, the support of a particular formal concept is defined as the number of objects which have all the attributes of the formal concept's intent; a formal concept is considered frequent if its support is not less than a specified minimum.
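The support measure just described has a one-line reading in code (illustrative sketch with an invented mini context, not tied to any of the cited implementations):

```python
def support(intent, I, G):
    """Number of objects having every attribute of the concept's intent."""
    return sum(1 for g in G if all((g, m) in I for m in intent))

# Invented mini context with three patients.
G = {1, 2, 3}
I = {(1, "PC"), (1, "TB"), (2, "PC"), (3, "TB")}

print(support({"PC"}, I, G))        # 2
print(support({"PC", "TB"}, I, G))  # 1: frequent only if the minimum support is <= 1
```

A frequency-based filter then simply discards every concept whose support falls below the chosen minimum.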
Rice and Siff (2001) use the idea of cluster analysis, applying a distance function to formal concept extents in order to select the concepts to retain. Their algorithm efficiently computes a tractable subset of extents based on the distance function. With respect to the proposed distance function, there are two conceptual differences between that proposal and the similarity function of JBOS: in that work all attributes are considered to have exactly the same importance, while JBOS uses weighting; secondly, the distance function is measured relative only to the union of the attributes of both concepts’ intents, while in JBOS an object not having an attribute is considered as important as having it. Another technique is based on the concept of stability proposed by Kuznetsov (1990). Unlike other reduction methods, the stability method seeks to create an index for concepts that indicates how much the intent of a concept depends on its set of objects. Using this index and a specific threshold, all formal concepts with lower values are removed. However, stability has to be calculated for all formal concepts that can be generated from the original formal context. In Belohlavek and Vychodil (2009) the user’s knowledge is used to create constraints on attributes called AD-formulas (attribute-dependency formulas); in the generation of formal concepts and of the concept lattice, only formal concepts conforming to the AD-formulas are retained.
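The contrast drawn above can be made concrete with two small functions. Both formulas below are illustrative assumptions, not the published definitions: the extent distance is instantiated as a symmetric-difference ratio in the spirit of Rice and Siff (2001), and the object similarity is a JBOS-style sketch in which agreement on the absence of an attribute counts as much as agreement on its presence, scaled by attribute weights.

```python
def extent_distance(a, b):
    """Unweighted set distance between two concept extents: |A ^ B| / |A | B|.
    Illustrative choice; all elements weigh the same, as in Rice & Siff."""
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0

def weighted_similarity(obj1, obj2, all_attrs, weights):
    """JBOS-style object similarity (sketch): an attribute contributes its
    weight whenever the two objects agree on it, whether both HAVE it or
    both LACK it."""
    agree = sum(weights[m] for m in all_attrs if (m in obj1) == (m in obj2))
    total = sum(weights[m] for m in all_attrs)
    return agree / total

# Hypothetical attributes and weights.
attrs = {"cough", "fever", "weight_loss"}
w = {"cough": 1.0, "fever": 2.0, "weight_loss": 1.0}
p1, p2 = {"cough", "fever"}, {"cough"}

# Agreement on "cough" (both have) and "weight_loss" (both lack): 2.0 / 4.0.
print(weighted_similarity(p1, p2, attrs, w))  # 0.5
```

Note how `weight_loss`, absent from both objects, still contributes to the similarity; an extent-only distance would ignore it entirely.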

Gajdos, Moravec, and Snásel (2004), Cheung and Vogel (2005), King (2004) and Kumar and Srinivas (2010) propose the use of SVD (singular value decomposition), a technique capable of reducing the dimensionality of large data sets. Interestingly, in the approach of King (2004), when similar objects are merged, the resulting object has the attributes of the union of their intents, while in JBOS the resulting object has the attributes of the intersection of their intents. This reflects the fact that JBOS was designed with applications in mind that are intrinsically different from the one considered in that work. It also illustrates that different kinds of applications, even when all of them are approached by means of FCA, may require different approaches based on the specific meanings associated with ‘‘object’’, ‘‘attribute’’, etc. Belohlavek and Macko (2011) explore the use of attribute weights in order to assign an importance index to concepts. Here, again, in contrast with JBOS, all formal concepts have to be generated in order to choose those above a specified threshold. As in JBOS, that work shows that it is possible to direct the attention to concepts that are important for a particular application by means of an adequate assignment of attribute weights. In some extensions of FCA that introduce the notions of fuzzy formal context or decision formal context, there are other approaches to reduction, such as Konecny and Krupka (2011) and Pei, Li, and Mi (2011) (for fuzzy contexts) and Wang and Zhang (2008a) and Li, Mei, and Lv (2011b) (for decision formal contexts).
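The union-versus-intersection contrast in object merging is small enough to state directly in code. This is a sketch of the two policies only; the symptom names are hypothetical.

```python
# When two similar objects are merged, which attributes should the merged
# object keep? King (2004) takes the union of their intents; JBOS takes
# the intersection.

def merge_union(intent1, intent2):
    return intent1 | intent2   # merged object gets every attribute either had

def merge_intersection(intent1, intent2):
    return intent1 & intent2   # merged object keeps only shared attributes

a = {"cough", "fever"}
b = {"cough", "weight_loss"}

print(sorted(merge_union(a, b)))         # ['cough', 'fever', 'weight_loss']
print(sorted(merge_intersection(a, b)))  # ['cough']
```

The intersection policy is the conservative one: the merged object asserts only what both originals asserted, which suits a diagnostic setting where spurious attributes are worse than missing ones.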

5. Conclusions

The JBOS method of reduction was assessed in this paper through experiments on the problem of diagnosis from a TB formal context. The main goal was to show that, by means of attribute weight assignment, it is possible to achieve a suitable level of performance in a specific task after reduction. Two reductions, based on different approaches for setting attribute weights, were made. The first used the knowledge embodied in the original formal context, by choosing as the weight of an attribute the frequency of the respective symptom among the patients with TB, while the second used knowledge expressed in expert rules, by assigning as the weight of an attribute the frequency of the symptom in those rules. The assessment was made, in the first case, by comparing the performances of the sets of extracted rules (stem bases) before and after the reduction, and, in the second case, by comparing the performance of the set of rules extracted after reduction with that of the expert rules on which the weights were based.

The two tests revealed interesting results. The first achieved a reduction of 33% in the number of concepts, yet the set of generated rules included all rules extracted before reduction, and the performance in the task of TB diagnosis was exactly the same before and after reduction. The second test resulted in a reduction of 78% in the number of concepts, and even then all the expert rules were subsumed by the extracted rules, and the performance improved from 81% to 90% correct diagnoses. This last result can be explained by noticing that the weighting process, besides incorporating the expert knowledge, resulted in rules well adjusted to the knowledge expressed in the original formal context, from which the reduced formal context was constructed. To conclude, both reductions resulted in rule sets absolutely consistent with the original ones (those extracted from the original formal context and from the expert rules, respectively).
After the tests with the original data, the expert rules, the FCA rules, and both sets of rules extracted after reduction were used to classify objects of a validation set. This last test has shown that

after reduction, in both cases, the performance was exactly the same as that before reduction.

References

Belohlavek, R., & Macko, J. (2011). Selecting important concepts using weights. In Proceedings of the 9th international conference on formal concept analysis, ICFCA’11 (pp. 65–80). Heidelberg: Springer-Verlag.
Bélohlávek, R., Sklenár, V., & Zacpal, J. (2004). Formal concept analysis with hierarchically ordered attributes. International Journal of General Systems, 33, 383–394.
Belohlavek, R., & Vychodil, V. (2009). Formal concept analysis with background knowledge: Attribute priorities. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 39, 399–409.
Carpineto, C., & Romano, G. (2003). Mining short-rule covers in relational databases. Computational Intelligence, 19, 215–234.
Cheung, K. S. K., & Vogel, D. (2005). Complexity reduction in lattice-based information retrieval. Information Retrieval, 8, 285–299.
Codocedo, V., Taramasco, C., & Astudillo, H. (2011). Cheating to achieve formal concept analysis over a large formal context. In The eighth international conference on concept lattices and their applications – CLA 2011, Nancy, France (pp. 349–362).
Davey, B., & Priestley, H. (1990). Introduction to lattices and order. Cambridge, England: Cambridge University Press.
Dias, S. M. (2010). Algoritmos para geração de reticulados conceituais. Master’s thesis, Federal University of Minas Gerais (UFMG), Institute of Exact Sciences, Department of Computer Science, Belo Horizonte, Minas Gerais, Brazil [in Portuguese].
Dias, S. M., & Vieira, N. J. (2010). Reducing the size of concept lattices: The JBOS approach. In Proceedings of the 7th international conference on concept lattices and their applications (Vol. 672, pp. 80–91). Seville, Spain.
Distel, F. (2011). Some complexity results about essential closed sets. In P. Valtchev & R. Jäschke (Eds.), International conference on formal concept analysis (LNCS Vol. 6628, pp. 81–92).
Distel, F., & Sertkaya, B. (2011). On the complexity of enumerating pseudo-intents. Discrete Applied Mathematics, 159, 450–466.
Duquenne, V., & Guigues, J. L. (1986). Familles minimales d’implications informatives resultant d’un tableau de données binaires. Mathématiques et Sciences Humaines, 5–18.
Falk, I., & Gardent, C. (2011). Combining formal concept analysis and translation to assign frames and thematic role sets to French verbs. In The eighth international conference on concept lattices and their applications – CLA 2011, Nancy, France (pp. 87–99).
Gaillard, E., Lieber, J., & Nauer, E. (2011). Adaptation knowledge discovery for cooking using closed itemset extraction. In The eighth international conference on concept lattices and their applications – CLA 2011, Nancy, France (pp. 87–99).
Gajdos, P., Moravec, P., & Snásel, V. (2004). Concept lattice generation by singular value decomposition. In International conference on concept lattices and their applications – CLA (pp. 102–110).
Ganter, B., & Wille, R. (1999). Formal concept analysis: Mathematical foundations. Germany: Springer-Verlag.
Godin, R., & Mili, H. (1993). Building and maintaining analysis-level class hierarchies using Galois lattices. SIGPLAN Notices, 28, 394–410.
Godin, R., Saunders, E., & Gecsei, J. (1986). Lattice model of browsable data spaces. Information Science, 40, 89–116.
Horner, V. (2007). Developing a consumer health informatics decision support system using formal concept analysis. Master’s thesis, University of Pretoria.
Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: A review. ACM Computing Surveys, 31, 264–323.
Jay, N., Kohler, F., & Napoli, A. (2008). Analysis of social communities with iceberg and stability-based concept lattices. In Proceedings of the 6th international conference on formal concept analysis, ICFCA’08 (pp. 258–272). Berlin, Heidelberg: Springer-Verlag.
King, C. S. (2004). Complexity reduction in lattice-based information retrieval: Theory, prototype development, and evaluation. Ph.D. thesis, City University of Hong Kong.
Klimushkin, M., Obiedkov, S., & Roth, C. (2010). Approaches to the selection of relevant concepts in the case of noisy data. In L. Kwuida & B. Sertkaya (Eds.), 8th international conference ICFCA 2010 (pp. 255–266). Berlin/Heidelberg: Springer.
Koester, B. (2006). FooCA – Web information retrieval with formal concept analysis. Verlag Allgemeine Wissenschaft.
Konecny, J., & Krupka, M. (2011). Block relations in fuzzy setting. In International conference on concept lattices and their applications (pp. 115–130).
Kumar, C. A. (2011). Mining association rules using non-negative matrix factorization and formal concept analysis. In 5th international conference on information processing – ICIP (pp. 31–39).
Kumar, C. A., & Srinivas, S. (2010). Mining associations in health care data using formal concept analysis and singular value decomposition. Journal of Biological Systems, 18, 787–807.
Kuznetsov, S. (1990). Stability as an estimate of the degree of substantiation of hypotheses derived on the basis of operational similarity. Nauchno-Tekhnicheskaya Informatsiya, 21–29.
Kuznetsov, S. (2001). On computing the size of a lattice and related decision problems. Order, 18, 313–321.
Kuznetsov, S. (2007). On stability of a formal concept. Annals of Mathematics and Artificial Intelligence, 49, 101–115.
Kuznetsov, S., & Obiedkov, S. (2002). Comparing performance of algorithms for generating concept lattices. Journal of Experimental & Theoretical Artificial Intelligence, 14, 189–216.
Kuznetsov, S. O., & Obiedkov, S. (2008). Some decision and counting problems of the Duquenne–Guigues basis of implications. Discrete Applied Mathematics, 156, 1994–2003.
Li, J., Mei, C., & Lv, Y. (2011a). A heuristic knowledge-reduction method for decision formal contexts. Computers and Mathematics with Applications, 61, 1096–1106.
Li, J., Mei, C., & Lv, Y. (2011b). Knowledge reduction in decision formal contexts. Knowledge-Based Systems, 24, 709–715.
Liu, J., & Mi, J. S. (2008). A novel approach to attribute reduction in formal concept lattices. In Rough sets and knowledge technology. Lecture Notes in Computer Science (Vol. 5009, pp. 426–433). Berlin/Heidelberg: Springer.
Pei, D., Li, M. Z., & Mi, J. S. (2011). Attribute reduction in fuzzy decision formal contexts. In 2011 international conference on machine learning and cybernetics (ICMLC) (Vol. 1, pp. 204–208).
Pei, D., & Mi, J. S. (2011). Attribute reduction in decision formal context based on homomorphism. International Journal of Machine Learning and Cybernetics, 2, 289–293.
Poelmans, J., Elzinga, P., Viaene, S., & Dedene, G. (2010). Formal concept analysis in knowledge discovery: A survey. In ICCS (pp. 139–153).
Riadh, T. M., Le Grand, B., Aufaure, M. A., & Soto, M. (2009). Conceptual and statistical footprints for social networks’ characterization. In Proceedings of the 3rd workshop on social network mining and analysis, SNA-KDD ’09 (pp. 1–8). New York, NY, USA: ACM.
Rice, M. D., & Siff, M. (2001). Clusters, concepts, and pseudometrics. Electronic Notes in Theoretical Computer Science, 40, 323–346.
Roth, C., Obiedkov, S., & Kourie, D. G. (2006). Towards concise representation for taxonomies of epistemic communities. In Proceedings of the international conference on concept lattices and their applications (pp. 205–218).
Soldano, H., Ventos, V., Champesme, M., & Forge, D. (2010). Incremental construction of alpha lattices and association rules. In Knowledge-based and intelligent information and engineering systems (pp. 351–360).
Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., & Lakhal, L. (2002). Computing iceberg concept lattices with TITANIC. Data and Knowledge Engineering, 42, 189–222.
Valtchev, P., Missaoui, R., & Godin, R. (2004). Formal concept analysis for knowledge discovery and data mining: The new challenges. In Concept lattices (Vol. 2961). Berlin/Heidelberg: Springer.
Wang, H., & Zhang, W. X. (2008a). Approaches to knowledge reduction in generalized consistent decision formal context. Mathematical and Computer Modelling, 48, 1677–1684.
Wang, X., & Zhang, W. (2008b). Relations of attribute reduction between object and property oriented concept lattices. Knowledge-Based Systems, 21, 398–403.
Wenxiu, Z., Ling, W., & Jianjun, Q. (2005). Attribute reduction theory and approach to concept lattice. Science in China Series F: Information Sciences, 48, 713–726.
Wille, R. (1982). Restructuring lattice theory: An approach based on hierarchies of concepts. In I. Rival (Ed.), Ordered sets (pp. 445–470).
Xu, R., & Wunsch, D. (2005). Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16, 645–678.
Yevtushenko, S. A. (2000). System of data analysis ‘‘Concept Explorer’’. In Proceedings of the 7th national conference on artificial intelligence KII-2000, Russia (pp. 127–134).
Zárate, L. E., & Dias, S. M. (2009). Qualitative behavior rules for the cold rolling process extracted from trained ANN via the FCANN method. Engineering Applications of Artificial Intelligence, 22, 718–731.