Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 145 (2018) 312–318
10.1016/j.procs.2018.11.077

www.elsevier.com/locate/procedia
Postproceedings of the 9th Annual International Conference on Biologically Inspired Cognitive Architectures, BICA 2018 (Ninth Annual Meeting of the BICA Society)
Sigma-Pi Neural Networks: Error Correction Methods

L.A. Lyutikova*

Institute of Applied Mathematics and Automation, 360000, KBR, Nalchik, st. Shortanova 89a
[email protected]
Abstract

This paper considers the application of logical error correction methods for ΣΠ-artificial neural networks within classification problems. Logical methods are widely used in the detection of implicit regularities in a domain under study. We propose a technique that reveals such regularities according to the structure of the ΣΠ-neuron; this notably improves the adaptive features of the recognition system. We assert that a combined approach contributes to the recognition system's performance: in case of an incorrect response of the ΣΠ-neuron, it allows indicating, as a solution, the objects closest to the requested ones by the characteristic features identified with a logical correction method.

© 2018 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the 9th Annual International Conference on Biologically Inspired Cognitive Architectures.

Keywords: logical analysis, data analysis, algorithm, ΣΠ-neuron, training set, decision trees, error correction.
1. Introduction

Nowadays, Pattern Recognition is presented in Theoretical Computer Science by a whole section in which principles and methods are developed to classify and recognize objects, phenomena, processes, signals, and situations with the help of a certain set of characteristics or properties that describe an object [1].

There are a number of techniques and methods for pattern recognition, and each of them has its advantages and drawbacks [2].

The combined approach is applied when it is required to improve the performance of several different algorithms, each of which correctly classifies only a part of the objects in the learning set. The purpose of the error correction method is to make sure that the errors of some algorithms are compensated by other algorithms, so that the efficiency of the resulting algorithm is higher than the efficiency of any of the basic algorithms [1,3,4].
* Corresponding author. Tel.: +7-963-166-40-14.
E-mail address: [email protected]
Indeed, in practice we know a variety of approaches to the construction of combined corrective algorithms that are very productive [4-9]. This paper deals with an approach based on the logical analysis of data, which is employed for corrective actions of the ΣΠ-neural network.

The ΣΠ-neuron (sigma-pi neuron) is a model generalizing the classical formal neuron with a linear summation of the input signals sp(x1, ..., xn). It can be assumed that a ΣΠ-neural network stores rules for object recognition in its weights, but these rules are implicit, and it is sometimes very difficult to detect the reason that causes an error.

We propose creating functions of control and error correction for the ΣΠ-neural network using logical analysis methods. These methods allow reconstructing the training set from the weights, analyzing it, building the knowledge base and minimizing it, and modifying the result with respect to the designed rules in case of the ΣΠ-neuron's erroneous operation.

The article deals with a method for detecting previously unknown, practically useful, accessible knowledge in the structure of the ΣΠ-neuron required for decision-making.

2. Formulation of the problem

Assume X = {x1, x2, ..., xn}, xi ∈ {0, 1, ..., ki − 1}, where ki ∈ [2, ..., N], is a set of characteristics, and Y = {y1, y2, ..., ym} is the set of objects; each object yi is described by the corresponding set of characteristics x1(yi), ..., xn(yi): yi = f(x1(yi), ..., xn(yi)).

Also assume X = {x1, x2, ..., xn}, where xi ∈ {0, 1, ..., kr − 1}, kr ∈ [2, ..., N], N ∈ Z, are the processed input data, and Xi = {x1(yi), x2(yi), ..., xn(yi)}, i = 1, ..., n, yi ∈ Y, Y = {y1, y2, ..., ym}, are the output data:

x1(y1)  x2(y1)  ...  xn(y1)        y1
x1(y2)  x2(y2)  ...  xn(y2)   →    y2
  ...     ...   ...    ...         ...
x1(ym)  x2(ym)  ...  xn(ym)        ym

The function type Y = f(X) is not defined. Let the ΣΠ-neuron have the following structure:

sp(x1, ..., xn) = Σ_k w_k Π_{i∈I_k} x_i,
where {w1, w2, ..., wk} is the set of weights of the given ΣΠ-neuron (I_k denoting the set of indices of the variables entering the k-th term) that recognizes an element of the given domain Y = {y1, y2, ..., yk} obtained using the set of characteristics {X1, ..., Xk}. And let there be a set of characteristics Xl for which the ΣΠ-neuron responds to the query incorrectly.

Let us say the ΣΠ-neuron's response is incorrect if:

1) the requested element is not identified: sp(Xi) ≠ yi;
2) the identified object does not belong to the given subject domain, i.e. sp(Xi) = yl, yl ∉ {y1, ..., yn};
3) the identified element Xj ∈ {X1, X2, ..., Xk} belongs to the domain, but the training set contains elements with more matching features, i.e. sp(Xi) = yl, yl ∈ {y1, ..., yn}, and there is yj with the corresponding characteristic vector Xj such that

dim|Xi − Xj| < dim|Xi − Xl|,

where dim|Xl − Xi| is the number of mismatches and Xl is the characteristic vector of yl.

In case of the ΣΠ-neuron's error, there is a need to construct a function that selects the most closely matching element or class of elements in the training set:

F(Xl): F(Xl) = yi, yi ∈ Y, dim|Xl − Xi| → min.
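To make the correction criterion concrete, the following sketch implements the mismatch count dim|·| and the selector F in Python. It is only an illustration: the names (mismatches, nearest_objects, training_set) are not from the paper, and the table anticipates the example of Section 3.

```python
# A minimal sketch of the selector F: count feature mismatches (dim|X_l - X_i|)
# and return the training objects closest to the query. All names here are
# illustrative; the table anticipates the example given in Section 3.

def mismatches(a, b):
    """dim|a - b|: number of positions where two characteristic vectors differ."""
    return sum(ai != bi for ai, bi in zip(a, b))

def nearest_objects(query, training_set):
    """F(X_l): all objects y_i whose vector X_i minimizes dim|X_l - X_i|."""
    best = min(mismatches(query, x) for x, _ in training_set)
    return [y for x, y in training_set if mismatches(query, x) == best]

training_set = [((0, 0, 1), 2), ((0, 1, 0), 4), ((0, 1, 1), 8), ((1, 1, 1), 128)]
print(nearest_objects((1, 1, 0), training_set))  # -> [4, 128]
```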
3. Basic knowledge of ΣΠ-neural networks

The ΣΠ-neuron is an algebraic model of the neuron that reflects information processing in the axon-dendritic neuron system [10-12], providing the best approximation abilities. One classical ΣΠ-neuron contains the multilinear function of the sum signal

sp(X) = out(θ + Σ_k w_k Π_{i∈I_k} x_i),

where θ is a constant and out is an output function.

It is possible to train a logic-arithmetic neuron in a single pass through the training set. During training, a sequence of logic-arithmetic ΣΠ-neurons {spn_k(X)}, spn(X) = sgn ∘ sp(X), is constructed. The constructing process is recurrent:

1) at the beginning, sp_0(X) = 0;
2) at the k-th step, sp_k(X) = sp_{k−1}(X) + w_k Π_{i∈i0_k} x_i, where i0_k = {i : x_{ki} = 1}.

We get the weight w_k as follows:

w_k = s_k − sp_{k−1}(X_k), if spn_{k−1}(X_k) ≠ y_k;
w_k = 0, if spn_{k−1}(X_k) = y_k,
where s_k is an arbitrary value such that sp(s_k) = y_k. Once N steps have been passed, the desired logic-arithmetic ΣΠ-neuron is constructed. This very simple learning algorithm, in the case where out is a threshold function, was originally proposed by A.V. Timofeev [13].

EXAMPLE. Let the following subject area be given:

Table 1.
x1   x2   x3   y
0    0    1    2
0    1    0    4
0    1    1    8
1    1    1    128
The set of characteristics X is presented by the following values: X = {X1 = (0, 0, 1), X2 = (0, 1, 0), X3 = (0, 1, 1), X4 = (1, 1, 1)}, and the set of objects by the values Y = {2, 4, 8, 128}. As a result of training the ΣΠ-neuron according to the above table we have:

sp(x1, x2, x3) = 4x2 + 2x3 + 2x2x3 + 120x1x2x3.

Enter the query criteria (1, 1, 0), which do not match any of the given sets. The trained ΣΠ-neuron results in sp(1, 1, 0) = 4. Indeed, object 4 of the set Y matches the query at two points, but object 128 also matches the query at two points, and 128 is singled out in the whole sample by the value of the variable x1. Such errors of the ΣΠ-neuron can be eliminated through the logical analysis of data.
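The single-pass scheme above is easy to express in code. The sketch below is a hedged reconstruction, not the paper's implementation: out is taken to be the identity, weights are keyed by the index set {i : x_i = 1} of each training vector, and the names are ours.

```python
# A sketch of the single-pass training scheme with out taken as the identity.
# Weights are keyed by the index set {i : x_i = 1} of the training vector.

def train_sp(samples):
    """At step k, add w_k = y_k - sp_{k-1}(X_k) on the monomial over
    {i : x_i = 1} whenever the current neuron misclassifies X_k."""
    weights = {}

    def current(x):
        # Evaluate sp_{k-1}(x): a monomial fires iff all its variables equal 1.
        return sum(w for idx, w in weights.items() if all(x[i] for i in idx))

    for x, y in samples:
        if current(x) != y:
            idx = frozenset(i for i, v in enumerate(x) if v == 1)
            weights[idx] = weights.get(idx, 0) + (y - current(x))
    return weights

samples = [((0, 0, 1), 2), ((0, 1, 0), 4), ((0, 1, 1), 8), ((1, 1, 1), 128)]
w = train_sp(samples)  # recovers 4*x2 + 2*x3 + 2*x2*x3 + 120*x1*x2*x3
print(sum(v for idx, v in w.items() if all((1, 1, 0)[i] for i in idx)))  # -> 4
```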
4. A logical approach in the study of the training set for the decision function design

This section discusses the data analysis methods: we propose a construction method for a classifier function, investigate the properties of this function [14,15], and consequently analyze the resulting outputs.

The data we have to deal with when solving recognition problems are, as is known, incomplete, inaccurate, and equivocal. However, the resulting solutions should comply with the explicit and implicit regularities of the subject domain. Logical methods are well suited for evaluating data, distinguishing the essential characteristics from the non-essential ones, and identifying the minimum set of rules necessary for completely restoring the source regularities. As a result, we may achieve a more compact and reliable representation of the source information, which can be processed more safely and faster. The constructed system of rules is considered complete if it ensures finding all solutions in the domain under consideration.

Objects grouped according to a certain characteristic (or characteristics) are called a class. Each object can represent one or several classes; objects that share similar characteristics are grouped in classes.

To explain the logical connection between the subject domain concepts, we use the product rule proposed by E. Post. In our case, the product rule represents the substitution

x1(yi) & x2(yi) & ... & xn(yi) → P(yi),

where

P(y) = 1 if y = yi, and P(y) = 0 if y ≠ yi,

and x1(yi), x2(yi), ..., xn(yi) is a finite number of characteristics describing the element yi. This allows expressively representing the relationship between an object and its characteristics:

&_{j=1}^{m} x_j(y_i) → P(y_i),   i = 1, ..., l;   x_j(y_i) ∈ {0, 1, ..., k − 1},

where the predicate P(yi) becomes true, i.e. P(yi) = 1, in case y = yi, and P(yi) = 0 when y ≠ yi. Alternatively, each rule takes the form

∨_{i=1}^{n} x̄_i(y_j) ∨ P(y_j),   j ∈ [1, ..., m].

Let the solving function be the conjunction of all decisive rules:

f(X) = &_{j=1}^{m} ( ∨_{i=1}^{n} x̄_i(y_j) ∨ P(y_j) ).   (1)

Function (1) can be interpreted as follows. If we describe the training set consisting of k elements using the Boolean function F(x1(yi), ..., xn(yi), P^σ(y1), ..., P^σ(yn)), where

P^σ(yi) = P̄(yi) as σ = 0, and P^σ(yi) = P(yi) as σ = 1,

then this function takes the value 0 on the sets x1(yi), ..., xn(yi), P^σ(y1), ..., P̄(yi), ..., P^σ(yn) and the value 1 on all other sets; i.e., the function allows any relationships between characteristics and objects except for the negation of an object over its own set of characteristic features within the training set.

Function (1) expresses the relationships between features, as well as between features and objects; it reveals all the classes available in the given domain, even object classes grouped by a single feature; it allows including new product rules (the function is modifiable); and it recognizes an object by any entered value from the study domain. If the entered data are not exactly identified in the domain of the function, then the function identifies the most appropriate object or class of objects with respect to the input data. All characteristic features of function (1) are considered in detail in [16].

Since the function is a disjunction of conjunctions of variables of different lengths, it can be reduced. From a logical point of view, a class Kj is a disjunct of decision function (1) containing a set of predicates that detect the presence or absence of certain elements, together with variables that are common features of these elements. Therefore, function (1) consists of variables whose combinations are not characteristic features of any of the classes or individual objects, and of an object part (disjuncts that contain predicates of objects). The algorithm for constructing the object part of the decision function is considered in [16].
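As an illustration of how the rule system (1) can be materialized, the snippet below builds one clause per training object — the negated feature literals plus the object predicate P(y) — from the table of Section 3. The string encoding of literals and all names are our own assumptions, not the paper's implementation.

```python
# A minimal sketch of building the rule system (1) from a training table:
# one clause (x1 != v1) v ... v (xn != vn) v P(y) per object y with recorded
# features (v1, ..., vn); the conjunction of all clauses is f(X).

def build_clauses(training_set):
    clauses = []
    for features, y in training_set:
        literals = [f"x{i + 1}!={v}" for i, v in enumerate(features)]
        clauses.append(literals + [f"P({y})"])
    return clauses

training_set = [((0, 0, 1), 2), ((0, 1, 0), 4), ((0, 1, 1), 8), ((1, 1, 1), 128)]
for clause in build_clauses(training_set):
    print("(" + " v ".join(clause) + ")")
# Prints the four clauses of Example 1 below, in the order of Table 1.
```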
Example 1. The decision function has the following form:

f(X) = (x̄1 ∨ x̄2 ∨ x̄3 ∨ P(128)) & (x1 ∨ x̄2 ∨ x̄3 ∨ P(8)) & (x1 ∨ x2 ∨ x̄3 ∨ P(2)) & (x1 ∨ x̄2 ∨ x3 ∨ P(4)) =
= x̄2x̄3 ∨ x1x̄2 ∨ x1x̄3 ∨ x̄3P(4) ∨ x̄2P(2) ∨ x1P(128) ∨ P(8)x2x3 ∨ P(128)P(8)P(2)x3 ∨ P(128)P(8)P(4)x2 ∨ P(4)P(8)P(2)x̄1.

We represent the object part of the function through the disjuncts

f*(X) = x̄3P(4) ∨ x̄2P(2) ∨ x1P(128) ∨ P(8)x2x3.

When running a query with the characteristic vector (1, 1, 0), the logical function results in f(1, 1, 0) = P(4) ∨ P(128). In contrast to the neural network, this function proposes both objects, 4 and 128, in response to the query. As a result of applying this algorithm, as well as function (1), we do not get a single answer but a set of possible options that suit the given query according to the relevant characteristics. The results obtained can be studied using frequency analysis, a weighted voting system, or an expert evaluation. An expert can identify several thresholds and correspondingly rank the obtained regularities with respect to their reliability.

5. Extracting rules from trained neural networks

The result of training the ΣΠ-neural network is a polynomial of the form

sp(x1, ..., xn) = w0 + w1x1 + w2x2 + ... + wp·x1x2···xn,

where {w1, w2, ..., wk} is the set of weights of the given ΣΠ-neuron. The training set to which the weights were adjusted may, generally speaking, be unknown. Thus, we need to restore the training set, discover the logical regularities, and use them to correct errors of the original ΣΠ-neuron. Indeed, restoring the training set from the polynomial is the inverse of the learning procedure.

The lower-level variables are {x1, x2, ..., xn}. All the coefficients of the polynomial (the weights) are divided into levels according to the number of variables: the first weight level has the smallest number of variables, the second level includes one more variable, and so on; thus, the upper level contains the coefficients with the greatest number of variables. Each upper-level element is associated with the corresponding lower-level elements by common variables.
Fig. 1.
The weights of the first level {w1, w2, ..., wr} are, respectively, the objects {y1, y2, ..., yr}; at each subsequent level, y_{k+1} = w_{k+1} + Σ y_i, where i runs over the indices of the corresponding objects whose variables are included as factor variables in the element at level y_{k+1}.
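Equivalently, the object recognized by an index set of variables is sp evaluated at the indicator vector of that set, i.e. the sum of the weights of all index sets it contains. The sketch below uses this reading of the level-by-level rule; the representation and names are our assumptions, not the paper's code.

```python
# A sketch of restoring training objects from the weight levels: the object
# for an index set I is sp at the indicator vector of I, i.e. the sum of the
# weights of every index set J contained in I.

def restore_objects(weights):
    """weights: {frozenset of 0-based variable indices: w}; returns {I: y}."""
    return {idx: sum(w for j, w in weights.items() if j <= idx)  # j subset of idx
            for idx in sorted(weights, key=len)}

# The example neuron 4*x2 + 2*x3 + 2*x2*x3 + 120*x1*x2*x3 (0-based indices):
weights = {frozenset({1}): 4, frozenset({2}): 2,
           frozenset({1, 2}): 2, frozenset({0, 1, 2}): 120}
for idx, y in restore_objects(weights).items():
    print(sorted(i + 1 for i in idx), "->", y)
# [2] -> 4, [3] -> 2, [2, 3] -> 8, [1, 2, 3] -> 128: the training set of Table 1
```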
EXAMPLE. Let the trained ΣΠ-neuron be of the following form:

sp(x1, x2, x3) = 4x2 + 2x3 + 2x2x3 + 120x1x2x3.

Restore the objects of the training set and find the generalizing logic rules (Fig. 1).

Implicit rules formation: the decision tree generates the rules. In our case, to convert decision trees into implicit rules, we consider paths from each vertex of the graph y_k to each variable x_i. If such a path exists, then the set of vertices belonging to the corresponding path of the graph, i.e. the set of objects identified by the variable x_i, creates a class. Such a path is representable in the form of a logical formula by means of the operation of conjunction; this implies P(y_k) & P(y_{k−1}) & ... & P(y_i) & x_i. If a path from the given vertex y_k to the vertex x_i does not exist, then such a path (object class) is identified by the corresponding variable negation. The aggregate of all possible paths, both existing and non-existing, is represented in a disjunctive form (Fig. 2). Therefore, the object part of logical function (1) can be completely restored; see the example in Fig. 2.
Fig. 2.
A dash-dotted line indicates links with logical negation:

f(X) = x̄3P(4) ∨ x̄2P(2) ∨ x1P(128) ∨ P(128)P(8)P(2)x3 ∨ P(128)P(8)P(4)x2 ∨ P(4)P(8)P(2)x̄1,

and this completely coincides with function (1) in its object part.

THEOREM. Assume X = {x1, x2, ..., xn}, xi ∈ {0, 1, ..., ki − 1}, where ki ∈ [2, ..., N], N ∈ N, is the set of characteristics, and Y = {y1, y2, ..., ym} is the set of objects, each object yi being described by the corresponding set of characteristics x1(yi), ..., xn(yi); assume also that sp(X) = Y. Then, according to the corresponding structure of sp(X), we can get all possible solutions in the given data space.

As illustrated above, it is possible to reconstruct the training set from the neuron structure and to construct for it a logical decision function (1) that builds a complete rule system for the given domain. The proposed procedure for creating implicit rules simplifies the problem. Since a complete analysis of the training set and the discovery of all connections are possible through logical function (1), the object part of the logical function can act as a corrective mechanism of the ΣΠ-neuron. Using the algorithm proposed above, such a function can be constructed by relying on the structure of the ΣΠ-neuron, even without a training sample.

The final form of the ΣΠ-neuron model, including the error corrective mechanism, can be the following:

Y = P(sp(X)) ∨ f(X),

where f(X) is the neuron's corrective tool. Thus, we can choose the best solution resulting from either the neuron's operation or the corrective tool. If the neuron's response is correct and the object did not previously belong to the training set, the corrective tool accepts a new rule by virtue of the modifiability of the function f(X), thus expanding the knowledge base.
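To close the loop, here is a hedged sketch of the combined model Y = P(sp(X)) ∨ f(X) on the running example: the neuron's answer is accepted when it names a known object consistent with the query, and the object part of (1) is used otherwise. The encoding of the disjuncts and all names are our illustration, not the paper's code.

```python
# A sketch of the combined model Y = P(sp(X)) v f(X): accept the neuron's
# answer when it is a known object consistent with the query, otherwise fall
# back to the corrective logical function (the object part of (1)).

KNOWN = {2: (0, 0, 1), 4: (0, 1, 0), 8: (0, 1, 1), 128: (1, 1, 1)}

def sp(x):
    """The trained SP-neuron of the running example."""
    return 4 * x[1] + 2 * x[2] + 2 * x[1] * x[2] + 120 * x[0] * x[1] * x[2]

def f_corrective(x):
    """Object part of (1): not(x3)P(4) v not(x2)P(2) v x1 P(128) v x2 x3 P(8)."""
    terms = [(lambda q: not q[2], 4), (lambda q: not q[1], 2),
             (lambda q: bool(q[0]), 128), (lambda q: bool(q[1] and q[2]), 8)]
    return [y for cond, y in terms if cond(x)]

def answer(x):
    y = sp(x)
    if KNOWN.get(y) == tuple(x):   # P(sp(X)) holds: the response is correct
        return [y]
    return f_corrective(x)         # erroneous response: use the corrective tool

print(answer((0, 1, 1)))  # -> [8]: the neuron's answer is accepted
print(answer((1, 1, 0)))  # -> [4, 128]: corrected, as in Example 1
```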
Conclusion

The paper considers the search for logical regularities with respect to the structure of the trained ΣΠ-neuron. The combined application of artificial neural networks and inference methods is proposed as a facility for identifying logical regularities and for obtaining more accurate results in recognition tasks. We have also investigated the advantages of logical data analysis and constructed the logical function that reveals all regularities within the subject domain.

We have developed a procedure for building a decision tree based on a trained ΣΠ-neuron that does not impose any requirements on the architecture, learning algorithm, input or output data, or other network parameters. The tree is constructed with respect to the structure of the neuron.

To sum up, a number of implicit logical regularities have been revealed, and the logical function has been built for the inaccurate or noisy data the neuron yields, thus providing the most plausible (close to the model) responses to the request made. The function is easily modifiable in case of an unforeseen correct neuronal response. All this contributes to the efficiency of the automated solution of intellectual tasks, improves operational reliability, and ensures solution accuracy through the use of the most effective systems for analyzing the initial data and the development of more precise methods for their processing.

References

[1] Zhuravlev Yu.I. (1978) "On an algebraic approach to the solution of recognition or classification problems." Problems of Cybernetics 33: 5-68.
[2] Flach P. (2015) Machine Learning: The Science and Art of Constructing Algorithms that Extract Knowledge from Data. Moscow: MDK Press, 400 pp.
[3] Vorontsov K.V. (2000) "Optimization methods for linear and monotone correction in the algebraic approach to the recognition problem." Journal of Computational Mathematics and Mathematical Physics 40 (1): 166-176.
[4] Ablameiko S.V., Biryukov A.S., Dokukin A.A., Dyakonov A.G., Zhuravlev Yu.I., Krasnoproshin V.V., Obraztsov V.A., Romanov M.Yu., Ryazanov V.V. (2014) "Practical algorithms of algebraic and logical correction in problems of recognition by precedents." Journal of Computational Mathematics and Mathematical Physics 54 (12): 1979.
[5] Timofeev A.V., Kosovskaya T.M. (2013) "Neural network methods for logical description and complex images recognition." Proceedings of SPIIRAS 27: 144-155.
[6] Dyukova E.V., Zhuravlev Yu.I., Prokofiev P.A. (2015) "Methods for increasing the efficiency of error-correction." Machine Learning and Data Analysis 1 (11): 1555-1583.
[7] Gridin V.N., Solodovnikov V.I., Evdokimov I.A., Filippov S.V. (2013) "Building decision trees and extracting rules from trained neural networks." Artificial Intelligence and Decision Making 4: 26-33.
[8] Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, Eric Xing (2016) "Harnessing deep neural networks with logic rules." Pp. 2410-2420. arXiv:1603.06318.
[9] Alex Graves, Greg Wayne, Malcolm Reynolds, Tim Harley, Ivo Danihelka, Agnieszka Grabska-Barwinska, Sergio Gómez Colmenarejo, Edward Grefenstette, Tiago Ramalho, John Agapiou, et al. (2016) "Hybrid computing using a neural network with dynamic external memory." Nature 538 (7626): 471-476.
[10] Shibzukhov Z.M. (2013) "Pointwise correct operations on recognition and prediction algorithms." Reports of the Russian Academy of Sciences 450 (1): 24-27.
[11] Shibzukhov Z.M. (2014) "Correct aggregation operations with algorithms." Pattern Recognition and Image Analysis 24 (3): 377-382.
[12] Shibzukhov Z.M. (2010) "On some constructive and well-posed classes of algebraic ΣΠ-algorithms." Doklady RAS 432 (4): 465-468.
[13] Timofeev A.V., Pribikhov V.Kh. (1974) "Algorithms for learning and minimizing the complexity of polynomial recognition systems." Izvestiya AN SSSR, Technical Cybernetics 7: 214-217.
[14] Timofeev A.V., Lyutikova L.A. (2005) "Development and application of multi-valued logic and network flows in intelligent systems." Proceedings of SPIIRAS 2: 114-126.
[15] Lyutikova L.A., Shmatova E.V. (2016) "Analysis and synthesis of pattern recognition algorithms using variable-valued logic." Information Technologies 22 (4): 292-297.
[16] Lyutikova L.A. (2008) "Application of mathematical variable-valued logic to the knowledge systems modeling." Bulletin of the Samara State University, Natural Science Series 6 (65): 20-27.