Neurocomputing 74 (2011) 2052–2061
Rough Neuron based on Pattern Space Partitioning

Sandeep Chandana (a), Rene V. Mayorga (b,*)

(a) Faculty of Engineering, University of Calgary, Canada
(b) Faculty of Engineering, University of Regina, Canada
Article history: Received 30 September 2009; received in revised form 8 December 2010; accepted 13 December 2010; available online 21 March 2011. Communicated by S. Mitra.

Abstract
This paper proposes a novel method of combining Rough concepts with Neural Computation. The proposed New Rough Neuron consists of a Lower Bound Neuron and a Boundary Neuron. The combination is designed so that the Boundary Neuron deals only with the random and unpredictable part of the applied signal. This division yields an improved rate of error convergence in the back propagation of the neural network, along with improved parameter approximation during the network learning process. Testing results are presented, and performance is compared with some of the prevalent designs. © 2011 Elsevier B.V. All rights reserved.
Keywords: Artificial neural networks; Rough sets; Boundary Neuron; Pattern Space Partitioning; Hybrid learning
1. Introduction

The world of neural networks has come a long way from its inception, continuously absorbing various concepts and theories into the fundamental model of the human brain. The present work is one such case: concepts of Rough Set Theory are introduced into Neural Computation in pursuit of specific interests. Real-time modeling and analysis inherently demand the ability to deal with uncertainties and noise, and this work attempts to address the issue of dealing with uncertainty and noise. Generally, noise is entangled with the information that may be known about a system; hence, it is important to deal with it properly. In this context, the terms noise and uncertainty are treated as synonyms, given that both are undesirable. From Rosenblatt's Perceptron [1] to Churchland's famous mine-rock example [2], the approach to memory taken here is that no direct mapping of stimulus to response exists; rather, the information about a certain activity is stored as a preference for a particular response, i.e. information is stored in associative clusters rather than topographic representations. Consider the case of understanding an accented dialect: hearing a certain phoneme immediately results in recognition or classification of the word itself. It is here that humans display the capability to prune
* Corresponding author. Tel.: +1 306 585 4726; fax: +1 306 585 4855.
E-mail address: [email protected] (R.V. Mayorga).
0925-2312/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.neucom.2010.12.024
through the available information in order to filter out noise, retaining only partial but certain information. Further, let us consider a neuro-biological perspective. What stemmed from the Hebbian theory [3] is that memory lies within the distributed synaptic weights of neuronal networks. Memory is also characterized by the fact that the individual neurons of specific networks are arranged into coordinated clusters or cell assemblies [4], each containing the knowledge required for a certain cognitive function. Explicitly, this means that when a human is presented with certain information, he tends to categorize it into various classes or clusters before processing or learning it.

Rationale: Following Pawlak [5], Bargiela and Pedrycz [6] note that, consciously or not, all humans indulge in the act of Granulation. It has also been observed that such Granulation helps improve the succeeding cognitive behavior. The granular approach is a proven technique for dealing with complex problems, widely practiced under the auspices of interval analysis, quantization, Dempster-Shafer theory, cluster analysis, etc. [7]. Such an approach is extremely beneficial for systems involving vague and uncertain information, because Granular Computing provides for an approximate inclusion of a concept when definite entities cannot be identified [5,6]. This research aims to present a computational analogy to the granular approach adopted by humans for complex problem solving. Taking its cue from the aforementioned aspects of granular computing and of cognitive behavior, this research proceeds with the underlying aim of incorporating granular structure into neural network based computing. This work may be summarized
as an attempt to improve a conventional architecture, enhancing its approximation capability and its speed of error convergence by incorporating the concepts of parameter space partitioning [8]. The partitioning is done based on indistinguishability; precisely, based on a given object's belongingness to the concepts of useful information and useless information (noise). Specifically, this paper describes a new neural network based on the concept of partitioning the applied signal into random and predictable parts. The result is the New Rough Neuron, made up of a Lower Bound Neuron and a Boundary Neuron.

The paper is divided into nine sections. The second section presents preliminary concepts and theories about the methodologies involved. Section three outlines the philosophy, design and working of Rough Neural Networks. Section four introduces the new Rough Neuron. Section five presents the node functions. Section six presents the Partition Mapping. Section seven describes the learning algorithm. Section eight outlines the test bed on which the proposed model has been benchmarked and presents some interpretations of, and discussions about, the testing results. Finally, section nine provides a discussion and some conclusions on the proposed network.
2. Underlying theory—Rough Sets

The aim of this section is to make the paper self-contained and help the reader obtain a better understanding of the concepts described in the succeeding sections. Rough Set Theory was first developed in Poland in the early 1980s by Pawlak [5,9–11]; the intention was to create a methodology capable of performing classificatory analysis on imprecise, uncertain or incomplete information. Assuming that every object in the universe has information associated with it, objects characterized by the same information are said to be indiscernible, in view of the available information about them. Such indiscernibility relations form the basis of the mathematical framework of Rough Sets. Pawlak's methodology deals with vagueness by replacing the vague/imprecise class with a pair of precise classes, called the Upper and the Lower Approximation. The Lower Approximation consists of all objects which can definitely be categorized as belonging to an indiscernible class, whereas the Upper Approximation consists of all objects which have a probabilistic belongingness to the class. Such a division of the vague concept is designed to isolate the vagueness into a controlled region called the Boundary Region.
2.1. Basic notions

Rough Set Theory adopts a knowledge representation schema called the Information System, a table containing the pair A = (U, A), where U is a set of objects and A a set of attributes. Each a \in A is a function a: U \to V_a, where V_a is the value set of the attribute a. Two objects (x, y) form an indiscernible group over the attribute a iff a(x) = a(y); moreover, if (x, y) \in R_a, then R_a is called the a-indiscernibility relation. Consider X \subseteq U, and let R be an equivalence relation, that is, a binary relation R \subseteq X \times X which is reflexive, symmetric, and transitive. Then for any A there is an associated equivalence relation IND_A(R) called the R-indiscernibility relation; the equivalence classes of the R-indiscernibility relation are denoted [x]_R [12]. Next, we can say that X is R-definable (or R-exact) if X is the union of some R-basic categories; otherwise X is R-undefinable (or R-inexact). Rough Sets provide a framework to approximate the inexact region through two exact sets, referred to as the Lower and the Upper Approximation sets. We create the following two (Upper and Lower) subsets out of the equivalence
relation R, i.e. RX ¼ [ fY A U=R : Y D Xg
ð2:1Þ
RX ¼ [ fY A U=R : Y \ X a fg
ð2:2Þ
where RX-Lower Approximation and RX-Upper Approximation. They can also be represented as RX ¼ fx A U : ½xR DXg
ð2:3Þ
RX ¼ fx A U : ½xR \ X a fg
ð2:4Þ
Also set BNR ðXÞ ¼ RXRX
ð2:5Þ
Thus, Lower Approximation (R X) consists of all objects which can certainly be classified as elements of X, given the knowledge R; and Upper Approximation (R X) consists of all objects which can possibly be classified as elements of X. The Boundary (BNR (X)) may be defined as the objects which can neither be classified as those of X nor of –X. Rough Approximations based on certainty aspects are denoted as Positive, Negative, and Boundary regions. Positive region, earlier introduced as the Lower region, consists of all objects which can be classified with full certainty as members of the set X, using knowledge R. The Negative region consists of all objects which can be classified with full certainty as members of the set complement of X, using knowledge R. Definition of the Boundary region remains consistent even through this form of the notation i.e. this region holds all of the unclassifiable objects or in other words the Boundary region consists of the uncertain elements of the Universe. The certainty based regions may be denoted as POSR ðXÞ ¼ RX
ð2:6Þ
NEGR ðXÞ ¼ URX
ð2:7Þ
BNR ðXÞ ¼ RXRX
ð2:8Þ
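To make the approximations concrete, the regions of Eqs. (2.3)–(2.8) can be sketched in Python. The toy information system below (six objects described by a single attribute) and the target set X are invented for illustration, not taken from the paper.

```python
def approximations(universe, ind, X):
    """Pawlak approximations of X under indiscernibility: objects with
    equal ind() values are R-indiscernible (cf. Eqs. (2.3)-(2.8))."""
    classes = {}
    for x in universe:                     # build equivalence classes [x]_R
        classes.setdefault(ind(x), set()).add(x)
    lower, upper = set(), set()
    for eq_class in classes.values():
        if eq_class <= X:                  # [x]_R subset of X   -> Eq. (2.3)
            lower |= eq_class
        if eq_class & X:                   # [x]_R intersects X  -> Eq. (2.4)
            upper |= eq_class
    boundary = upper - lower               # Eq. (2.5) / (2.8)
    negative = set(universe) - upper       # Eq. (2.7)
    return lower, upper, boundary, negative

# Toy information system: six objects described by one attribute.
U = {1, 2, 3, 4, 5, 6}
attr = {1: 'p', 2: 'p', 3: 'q', 4: 'q', 5: 'r', 6: 'r'}
X = {1, 2, 3}                              # the vague concept to approximate
low, up, bnd, neg = approximations(U, attr.get, X)
print(low, up, bnd, neg)
```

Here objects 3 and 4 are indiscernible yet only 3 belongs to X, so both land in the boundary region; objects 5 and 6 fall in the negative region.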
3. Review of Rough-Neuro nets

Since this research is not the first attempt to bring the two methodologies together, the following subsections may be regarded as brief accounts of various other models and techniques that have been developed. Only the important models and applications are touched upon. The latter part describes the Lingras [13] Rough Neural Network in some detail.
3.1. Combining rough sets and neural networks

Most of the work done within the scope of rough-neurocomputing takes a granular approach to information processing. Only a limited number of neural network based hybridizations exist; Lingras [13,14], Peters et al. [15–17], Slezak et al. [18–20], Chandana–Mayorga [21], and Slezak–Szczuka [22] are typical systems in which the basic structure of a neural network is retained. On the contrary, many of the studies of neural networks with reference to Rough Sets have been based on information granules [23]. Another mainstream method of combining the two theories [24–28] is to apply one before the other, i.e. either by applying rough sets as a preprocessor for the data followed by neural learning, or vice versa.
Fig. 3.1. Lingras Rough Neuron connections. Source: Ref. [13], p. 4.
3.2. Integrated rough-neural-computing

Four landmark models exemplify integrated rough-neural-computing, viz. the Rough Neuron (lower/upper) model developed by Lingras [13], the Rough Neuron (positive/negative) model by Slezak–Szczuka [22], the Normalizing Neural Networks developed by Slezak et al. [18–20], and the Rough (Petri net) Neuron developed by Peters et al. [15–17].

Lingras [13] Rough Neuron: This model was developed with the aim of classifying a set of objects into three parts based on a given condition, i.e. into the lower, the upper and the negative regions. The Lingras Rough Neural Network consists of both conventional and Rough Neurons in a fully connected fashion. The Rough Neuron consists of two individual neurons, called the upper bound neuron and the lower bound neuron, which share information. An important variation from a conventional neural network lies in the connections between two rough neurons. With reference to Fig. 3.1, the two neurons (including their constituents) are fully connected to start with (cf. Fig. 3.1(A)), but through the course of learning the linkages take either an inhibitory or an excitatory form. An inhibitory connection is established when an increase in the output of a neuron (r) causes a decrease in the output of the succeeding neuron (s); the connection is characterized by links between opposites, i.e. s-lower is connected to r-upper and s-upper is connected to r-lower (cf. Fig. 3.1(C)). An excitatory connection is established when an increase in the output of a neuron also results in an increase in the output of the succeeding neuron; in such a case the likes get connected, i.e. s-lower is connected to r-lower and s-upper is connected to r-upper (cf. Fig. 3.1(B)). Though both neurons (lower and upper) have a conventional sigmoid transfer, the actual outputs of the lower and upper neurons are given by the following functions:

\overline{output}_r = \max( transfer(\underline{x}_r),\ transfer(\overline{x}_r) )    (3.1)

\underline{output}_r = \min( transfer(\underline{x}_r),\ transfer(\overline{x}_r) )    (3.2)

where

input_i = \sum_{j:\ j\ \mathrm{connected\ with}\ i} w_{ij} \cdot output_j    (3.3)
This Rough Neural Network employs a conventional back propagation learning algorithm. The architecture was validated through the approximation of urban traffic data; the resulting training error was found to be reduced to 40% of that of a conventional neural network, and the testing error was reduced by about 10%. An interesting alternative to the Lingras (lower/upper) model is the (positive/negative) version presented by Slezak–Szczuka [22]; please refer to Refs. [13,22] for further details.
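The output rule of the Lingras rough neuron, Eqs. (3.1)–(3.3), can be sketched as follows; the sigmoid transfer and the sample weights are illustrative assumptions, not values from the paper.

```python
import math

def transfer(x):
    # Conventional sigmoid transfer used by both constituent neurons
    return 1.0 / (1.0 + math.exp(-x))

def net_input(weights, outputs):
    # Eq. (3.3): weighted sum over all connected predecessor outputs
    return sum(w * o for w, o in zip(weights, outputs))

def lingras_rough_output(x_lower, x_upper):
    # Eqs. (3.1)-(3.2): the upper output takes the max of the two
    # transfers and the lower output the min, so lower never exceeds upper.
    t_low, t_up = transfer(x_lower), transfer(x_upper)
    return min(t_low, t_up), max(t_low, t_up)

x = net_input([0.5, -0.2], [1.0, 2.0])      # 0.5*1.0 - 0.2*2.0 = 0.1
o_lower, o_upper = lingras_rough_output(x, x + 0.3)
print(o_lower, o_upper)
```

The min/max exchange is the information sharing between the two constituent neurons: whichever transfer is larger is routed to the upper bound output.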
4. Proposed Rough Neuron

In this section we describe the proposed New Rough Neuron and its constituents. The proposed architecture primarily functions to partition the applied signal into certain and uncertain parts. These partitions are initially created through a user-specified heuristic, but they are propagated through the layers based upon an entropy mapping, where the entropy is measured through a co-variance matrix. The Rough Neurons themselves consist of two individual neurons strategically linked such that the noise in the training data decreases across the layers of the neural network. The New Rough Neuron has been designed taking into account the concept of Pattern Space Partitioning [8]. The partitioning of the signal is done based on Rough regions and boundaries. Such a partition helps separate the dynamic variances in the data from the important patterns. Once such segregation is in place, specialist neurons deal with the individual partitions, and thus the issue of random data causing inaccurate modeling is avoided.

4.1. Structure

The New Rough Neuron is made up of a combination of two individual neurons working in a pseudo-deterministic fashion (explained in the succeeding sections). One of the constituent neurons, called the Lower Bound Neuron, is responsible for the output approximation (of the overall Rough Neuron) based on the lower bound of the applied input signal, i.e. the lower bound neuron deals only with the definite or certain part of the signal and produces its share of the total output. The second and more prominent half of the Rough Neuron, called the Boundary Neuron, is designed such that it processes only the boundary signal, i.e. the Boundary Neuron deals only with the random or uncertain part of the signal. The randomness in the applied signal may be categorized either as noise present in the data or as data content (and patterns) that the neurons have not yet been introduced to.
This interpretation of randomness applies only to the learning/training stage of the neural network.

4.2. Linking Rough Neurons

A basic rough neural network design is described in this subsection. Let us first consider the linkages between four Rough Neurons in a network (illustrated in Fig. 4.1). As mentioned earlier, the boundary and lower bound parts of the M_i neuron partition the applied input signal into random and predictable sections. They then respectively process their parts of the input signal and produce outputs which are collated (as per the details in the next subsection). The output signals thus produced by the neurons of layer M (which are now input signals into layer N) are once again partitioned into the two classes. This process repeats until the penultimate layer (the last hidden layer), after which the outputs are simply passed through a summation operator. The deviation of the obtained output from the desired output (after each complete iteration) dictates all further training and modification of the linkages. With reference to Fig. 4.1, note that Fig. 4.1(B) depicts the implicit connections between the four constituent neurons of the two connected Rough Neurons. Despite not being physically connected to an array of other neurons, the presented model establishes a qualitative linkage between every pair of non-homogeneous neurons;
a pair of constituent (Boundary and Lower Bound) neurons belonging to the same Rough Neuron forms a homogeneous set. The qualitative implicit linkage is the result of the rough region mapping adopted between the layers of the network.

Fig. 4.1. Linkages among the Rough Neurons. (A) Four connected neurons and (B) constituent neurons of the two connected neurons.

5. Node functions

The Lower Bound Neuron and the Boundary Neuron have both been designed with a sigmoid transfer function. A generalized sigmoid function (Eq. (5.1)) has been chosen in order to accommodate any non-linearity encountered during the modeling process:

F(x) = \frac{a}{1 + e^{-b \cdot x}} + z    (5.1)

where x represents the signal applied to the node and {a, b, z} are the node parameters. The network structure is based upon the nodal transfer functions and the behavior of the nodal outputs. The nodal outputs in turn depend upon the Output Excitation Factor (OEF; e), defined as the ratio of the change in the (resultant) output magnitude to the change in the (applied) input magnitude. The Output Excitation Factor for the Boundary Neuron is given as

e_{\overline{R}} = \frac{\Delta(o_{mi})}{\Delta(x_{mi})}    (5.2)

The Output Excitation Factor for the Lower Bound Neuron is given as

e_{\underline{R}} = \frac{\Delta(o_{mi})}{\Delta(x_{mi})}    (5.3)

where \Delta(x) represents the change in the applied input x, \Delta(o) represents the resulting change in the output o, and mi denotes the ith node in the mth layer of the neural network. Computing the excitation factor helps determine a functional-behavior trait, which has been adopted to reduce randomness in the signal as it flows forward. This reduction may be explained through the following nodal functions. The overall output of the Rough Neuron is

Output(R) = O_{\overline{R}} + O_{\underline{R}}    (5.4)

Further, the output of the Boundary Neuron is given as

O_{\overline{R}} = \frac{transfer(x_{\underline{R}})}{e_{\underline{R}}} + \frac{transfer(x_{\overline{R}})}{e_{\overline{R}}}    (5.5)

And the output of the Lower Bound Neuron is given as

O_{\underline{R}} = \min\{ transfer(x_{\underline{R}}),\ transfer(x_{\overline{R}}) \}    (5.6)

One should take note that the excitation factor in reality has a negating effect on the output of the Boundary Neuron (refer to Eqs. (5.2) and (5.5)); with reference to Eq. (5.4), such an effect is in turn propagated to the overall Rough Neuron output. Since the Boundary Neuron processes the random part of the applied input signal, over a few layers we see a considerable reduction in the randomness. This is one half of the mechanism for reducing uncertainty; the second half is taken up in the subsection describing the learning algorithm. As previously mentioned, the lower bound neuron also tends to produce an output with some randomness in it. But a negating operator is not applied to its output excitation factor, since the generated random signal will be dealt with by a Boundary Neuron in the succeeding layer. Such a mechanism, applied across the Rough Neuron R_i (refer to Fig. 4.2), boosts the lower bound signal and simultaneously inhibits the boundary signal. Thus, with reference to Fig. 4.2, the output of the full Rough Neuron may be given as

o_{R_i} = o_{\overline{R}} + o_{\underline{R}}    (5.7)

Further, with reference to Fig. 4.1(A), the input to a succeeding layer (N) can be given as a linear combination of all outputs of neurons belonging to the preceding layer (M). The linear combination is based on the weighted connections between the respective neurons, i.e.

input_{N_p} = \sum_{i=1}^{k} (w_{M_i N_p})(o_{M_i})    (5.8)

Finally, the output layer of the Rough Neural Network should consist of either a singular Rough Neuron or a conventional neuron with a summation-equivalent transfer function. All the above transactions are based upon the partitioned parameter (or search) space. It can also be concluded that an effective approach to partitioning is crucial for improving and expediting neural-information processing here. The following section details the internal space partitioning mechanism that has been adopted.

Fig. 4.2. One New Rough Neuron.
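A minimal sketch of one New Rough Neuron's forward pass under Eqs. (5.1)–(5.6): the parameter values, the finite-difference estimate of the excitation factors, and the sample inputs are assumptions made for illustration, not the authors' implementation.

```python
import math

def sigmoid(x, a=1.0, b=1.0, z=0.0):
    # Generalized sigmoid transfer, Eq. (5.1): F(x) = a / (1 + e^(-b*x)) + z
    return a / (1.0 + math.exp(-b * x)) + z

def excitation_factor(x, dx=1e-3):
    # OEF, Eqs. (5.2)-(5.3): ratio of output change to input change,
    # estimated here by a finite difference (an illustrative choice).
    return (sigmoid(x + dx) - sigmoid(x)) / dx

def rough_neuron(x_lower, x_boundary):
    """One New Rough Neuron: the Boundary Neuron output is scaled by the
    excitation factors (Eq. (5.5)); the Lower Bound Neuron takes the min
    of the two transfers (Eq. (5.6)); their sum is the overall output,
    Eq. (5.4)."""
    e_low = excitation_factor(x_lower)
    e_bnd = excitation_factor(x_boundary)
    o_boundary = sigmoid(x_lower) / e_low + sigmoid(x_boundary) / e_bnd
    o_lower = min(sigmoid(x_lower), sigmoid(x_boundary))
    return o_boundary + o_lower            # Eq. (5.4)

print(rough_neuron(0.4, 1.2))
```

In a full network the two partitions of each neuron's output would feed the corresponding partitions of the next layer, as described in Section 4.2.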
6. Partition Mapping

The proposed architecture primarily functions to partition the applied signal into certain and uncertain parts. One of the prime features of the architecture is its capability to propagate a relational mapping of the rough regions. As the signal propagates through the layers of the Rough Neural Network, the heuristically created rough regions are mapped using abstract space approximation [29]; the approximation is based on the co-variance matrix. This section describes the functionality of the internal mapping mechanism: the first subsection presents a couple of definitions; it is followed by a subsection introducing the co-variance mapping, and then by a subsection presenting some theoretical notions on the bounding of the regions; the section concludes with a subsection discussing the propagation scheme.

6.1. Definitions

First, it is necessary to state the following definitions.

Definition 1. Rough region based (lower and upper) sets may be defined through the basic notions of Rough Set Theory as

\underline{R}(X) = \{ x \in U : R(x) \subseteq X \}

\overline{R}(X) = \{ x \in U : R(x) \cap X \neq \emptyset \}

and also \underline{R}(X) \neq \emptyset, where, as previously mentioned in Section 2, U is the Universe of discourse, R is an equivalence relation, and A is the set of attributes, i.e. A = \{ a : U \to V_a;\ \forall a \in A \}, with V_a an attribute value set.

From our understanding, the Upper and the Lower Approximation sets may be described as collections of elements held together by the relation R. Such a collection may also be regarded as a partially ordered set (referred to as a poset), based upon which the following may be said [29]:

Definition 2. An abstract approximation space may be defined as \mathring{U} = (X, \underline{R}(X), \overline{R}(X)), where

1. (X, \leq, \max(\mathring{U}), \min(\mathring{U})) is a poset with respect to the partial order relation '\leq', bounded by the max (respectively min) value of the Universe at the upper (respectively lower) end;
2. \underline{R}(X) is regarded as a sub-poset of X consisting of all available inner definable elements, i.e. (\underline{R}(X), \leq, \overline{x}_d, \underline{x}_d), where \overline{x}_d and \underline{x}_d are the upper and lower ends of the lower sub-poset;
3. \overline{R}(X) is regarded as a sub-poset of X consisting of all available outer definable elements, i.e. (\overline{R}(X), \leq, \overline{z}_d, \underline{z}_d), where \overline{z}_d and \underline{z}_d are the upper and lower ends of the upper sub-poset.

6.2. Co-variance based mapping

The proposed mapping mechanism aims to approximate abstract spaces (partitions in this case) based on a co-variance measure; the mapping aims to propagate the heuristically defined Lower, Upper and Boundary regions through every layer in a qualitative fashion. The mapping is based upon the assumption that the rough regions co-vary, i.e. the change in the extent of the regions in a given dimension would more or less remain the same. That is, the co-variant behavior controls the outer bounds, and thus if the boundary region increases then the lower region shall decrease, and vice versa. It should be noted that, owing to the design of the neural network, the parameter space varies considerably through the layers; thus, further scaling of the distance between the (heuristic) Upper and Lower Approximation ends can be done based upon a normalized co-variance matrix (also known as a correlation matrix). The co-variance for a distribution or an ordered set of variables x_i, x_j \in X may be given as [30]

cov\{x_i, x_j\} = E\{ [x_i - m_i] \cdot [x_j - m_j] \}    (6.1)

where m_i, m_j denote the means of the variables. The co-variance measure can be used to determine the co-variant behavior on two consecutive layers, i.e.

cov\{\underline{R}\} = E\{ [\underline{x}_m - \underline{m}_m] \cdot [\underline{x}_n - \underline{m}_n] \}    (6.2)

where m, n denote the layers of the neural network and \underline{x}, \underline{m} denote the variate and mean values relevant to the Lower Approximation. And

cov\{\overline{R}\} = E\{ [\overline{x}_m - \overline{m}_m] \cdot [\overline{x}_n - \overline{m}_n] \}    (6.3)

where \overline{x}, \overline{m} denote the variate and mean values relevant to the Upper Approximation. Similarly, the correlations of the Upper and Lower Approximation regions are given as

\rho_{\underline{R}} = \frac{cov\{\underline{R}\}}{(\sigma\{\underline{R}\})^2}    (6.4)

and

\rho_{\overline{R}} = \frac{cov\{\overline{R}\}}{(\sigma\{\overline{R}\})^2}    (6.5)

where \sigma\{\overline{R}\} (respectively \sigma\{\underline{R}\}) denotes the standard deviation of the Upper (respectively Lower) Approximation. Although the above expressions can establish the co-variant behavior between two consecutive layers, they do not guarantee that in subsequent layers the uncertainty region would not increase, or that the certainty region would not decrease. In order to ensure proper bounding of the uncertainty and certainty regions, the following notions are necessary.

6.3. Regions bounding

This subsection provides the mathematical framework adopted to implement the rough region propagation schema. Axiom 3 restricts any decrease in the extent of the lower (certainty) region, whereas Axiom 4 restricts any increase in the extent of the boundary (uncertainty) region.

Axiom 3 (adopted from Ref. [29]). For any approximable element x \in X, there exists at least one element i(x) such that

i(x) \in \underline{R}(X); \quad i(x) \leq x; \quad \mathrm{and} \quad \forall a \in \underline{R}(X),\ (a \leq x \Rightarrow a \leq i(x))    (6.6)

where i(x) is an inner definable element which is not only the Lower Approximation of x but also the best approximation of the vague element x from the bottom by the inner definable elements, and a is an arbitrary (i.e. a relative median) element between the ends of the Lower Approximation. For such a scheme of elements, it is possible to introduce the mapping i : X \to \underline{R}(X), called the inner approximation mapping, associated with any x \in X as

i(x) \triangleq \max\{ a \in \underline{R}(X) : a \leq x \}    (6.7)

Axiom 4. Similarly, for any approximable element x \in X, there exists at least one element o(x) such that

o(x) \in \overline{R}(X); \quad x \leq o(x); \quad \mathrm{and} \quad \forall g \in \overline{R}(X),\ (x \leq g \Rightarrow o(x) \leq g)    (6.8)

where o(x) is an outer definable element which is not only the Upper Approximation of x but also the best approximation of the vague element x from the top by the outer definable elements, and g is an arbitrary (i.e. a relative median) element between the ends of the Upper Approximation. For such a scheme of elements, it is possible to introduce the mapping o : X \to \overline{R}(X), called the outer approximation mapping, associated with any x \in X as

o(x) \triangleq \min\{ g \in \overline{R}(X) : x \leq g \}    (6.9)

The performance of the partitioning scheme depends mainly on the initial calculation of \underline{m}, \overline{m} at each layer, and on the computation of cov\{\underline{R}\}, cov\{\overline{R}\} and the bounding parameters a_1, g_1 in the first layer. Afterwards, the proposed partition scheme propagates the bounds a, g on each layer, as discussed in the propagation scheme (Section 6.4). Notice that the computation of \underline{m}, \overline{m}, cov\{\underline{R}\}, cov\{\overline{R}\} and a_1, g_1 is rather simple; still, some special attention should be given to the computation of a_1, g_1 in order to ensure good performance.
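The layer-to-layer co-variance statistics of Eqs. (6.1)–(6.5) can be sketched as below. The sample region extents are invented for illustration, and the correlation here is normalized by both standard deviations (the standard definition), a slight variation on the single-region normalization written in Eqs. (6.4)–(6.5).

```python
import math

def covariance(xs, ys):
    # Eq. (6.1): cov{x, y} = mean of [x - m_x] * [y - m_y]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    # Normalized co-variance, cf. Eqs. (6.4)-(6.5); divided here by the
    # product of both standard deviations (the usual correlation form).
    return covariance(xs, ys) / math.sqrt(
        covariance(xs, xs) * covariance(ys, ys))

# Invented lower-approximation extents observed on two consecutive layers.
lower_m = [0.40, 0.45, 0.42, 0.48, 0.44]
lower_n = [0.41, 0.46, 0.43, 0.49, 0.45]
rho = correlation(lower_m, lower_n)
print(rho)  # near 1: the region extents co-vary across the layers
```

A correlation near 1 indicates that the two layers' region extents move together, which is exactly the co-variance assumption the mapping relies on.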
6.4. Propagation scheme

Notice that the co-variance measure can be used to determine the relative medians for the aforementioned approximations. That is, given the correlation measures (i.e. cov\{\underline{R}\}, cov\{\overline{R}\}) for both approximation regions (i.e. a_m, g_m) at layer 'm', the relevant region bounds (i.e. a_n, g_n) for layer 'n = m + 1' can be computed from

cov\{\underline{R}\} = E\{ [a_m - \underline{m}_m] \cdot [a_n - \underline{m}_n] \}    (6.10)

cov\{\overline{R}\} = E\{ [g_m - \overline{m}_m] \cdot [g_n - \overline{m}_n] \}    (6.11)

where the Upper and Lower distribution means (\overline{m}, \underline{m}) are obtained from the input (output) values to layer 'n' ('m') of the neural network. Thus it can be concluded that the co-variance assumption, coupled with Axioms 3 and 4, ensures that the uncertainty bandwidth does not increase (although it can decrease) and that the lower or certainty bandwidth does not decrease (although it can increase). As previously mentioned, the mapping provides for a qualitative propagation of the rough regions. With such a mapping scheme in place, the nodal connections are altered (cf. Fig. 6.4(B)). Since such a scheme processes all uncertainty at any stage in the neural network through the individual specialized neurons, fewer weighted connections suffice for effective uncertainty reduction.

7. Learning—backpropagation

In conclusion, it can be stated that through the layered search space partitioning mechanism, both the number of links between the neurons and the computational complexity of an inverse learning algorithm have been reduced. The change is reflected only at the processing end, i.e. the Partition Mapping reduces the number of links (cf. Fig. 6.4) and the number of iterations before convergence by a sizeable proportion (cf. the Results section); an improvement in the approximation capabilities of the neural network should not be expected.

7.1. Learning algorithm
The standard backpropagation based learning algorithm has been implemented to meet the learning requirements of the Rough Neural Network. The learning algorithm iteratively approximates the network parameters based on the supplied training data set and the preliminary network design parameters (for example, the sigmoid function parameters). The learning algorithm is described next; the notation has been adopted from Ref. [31]. Given a network of m layers, each with N neurons, the error function is [31]

e_i = \sum_{j=1}^{N} (d_{i,j} - o_{i,j})^2    (7.1)

where e_i is the calculated sum of the squared errors for a singular neuron, d_{i,j} is the desired output for the said neuron, and o_{i,j} the obtained output. Extending the above expression to the overall error of the output layer, we get

E = \sum_{j=1}^{N} (D_j - O_j)^2    (7.2)

The error rate for the output layer may be determined by computing the ordered derivative of the error function:

\frac{\partial E}{\partial O_m} = -2(D_m - O_m)    (7.3)
Fig. 6.4. Four connected Rough Neurons. (A) Without partition mapping and (B) With partition mapping.
This error rate may be transformed to that of an internal node through the chain-rule derivation of Eq. (7.3):

\frac{\partial E}{\partial O_{m-1}} = \sum \frac{\partial e_i}{\partial O_m} \cdot \frac{\partial O_m}{\partial O_{m-1}}    (7.4)

Eq. (7.4) can be extended to determine the error rate with respect to a network parameter a:

\frac{\partial E}{\partial a} = \sum \frac{\partial e_i}{\partial O_j} \cdot \frac{\partial O_j}{\partial a}    (7.5)

where j \in [1, N]. As per Eq. (7.5), the parameter update formula is given as [31]

\Delta a = -\eta \cdot \frac{\partial E}{\partial a}    (7.6)

where \eta is called the learning rate and is given as

\eta = \frac{k}{\sqrt{\sum_a \left( \frac{\partial E}{\partial a} \right)^2}}    (7.7)

where k is the step size governing the speed of convergence towards an optimal solution.

Table 8.1
Training data for the Rough Neural Network.

Data set                           Training   Testing   Noise (%)
(a) Larger training data
  Deceleration distance            500        100       15
  Deceleration time                500        100       15
(b) Smaller training data
  Deceleration distance            200        100       20
  Deceleration time                200        100       20

Table 8.2
Deceleration Distance network training details.

                                      Training error   Epochs
(a) Case 1: Larger training data
  New Rough Neural Network            1.19E-6          70
  Lingras Rough Neural Network        1.11E-5          224
  Adaptive Neuro-Fuzzy Network        1.35E-5          477
  Back Propagation Neural Network     6E-2             20,000
(b) Case 2: Smaller training data
  New Rough Neural Network            1.19E-6          185
  Lingras Rough Neural Network        4E-4             207
  Adaptive Neuro-Fuzzy Network        4.78E-4          550
  Back Propagation Neural Network     8.02E-2          20,000
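The update rule of Eqs. (7.6) and (7.7) can be sketched as follows; the quadratic toy error surface and the step size k = 0.1 are assumptions for illustration, not the networks or data used in the paper.

```python
import math

def update_params(params, grads, k=0.1):
    # Eqs. (7.6)-(7.7): eta = k / sqrt(sum of squared gradients),
    # and each parameter moves by delta_a = -eta * dE/da.
    eta = k / math.sqrt(sum(g * g for g in grads))
    return [a - eta * g for a, g in zip(params, grads)]

# Toy error surface E = a1^2 + a2^2, so dE/da = 2a for each parameter.
params = [1.0, -2.0]
for _ in range(50):
    grads = [2.0 * a for a in params]
    params = update_params(params, grads)
print(params)  # both parameters end up close to the minimum at zero
```

Because the gradient is normalized by its own magnitude, each update takes a step of fixed length k, so k directly governs the convergence speed as stated after Eq. (7.7).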
where k is the step size governing the speed of convergence towards an optimal solution. 7.2. Uncertainty reduction The learning algorithm in the backward direction is applied in two tiers. Specifically, two parallel and simultaneously operating algorithms indulge in parameter approximation: one algorithm runs through the boundary neurons and the other runs through the lower bound neurons. This type of learning has been implemented in order to segregate the differential behaviors of the two tracks and thus induce a faster convergence rate. Moreover, the algorithm is implemented in its off-line learning form to achieve improved performance. All these characteristics of the learning algorithm have resulted in a reduction of uncertainty.
8. Performance testbed and results

This section details some of the testing done on the New Rough Neural Network. Certain complex aerodynamic equations have been modeled, and the performance compared with a conventional neural network and the Rough Neural Network proposed in [13].

8.1. Modeling the aerodynamic equations

A random higher order complex aerodynamic equation was chosen to evaluate the proposed Rough Neural Network. Due to space constraints, we only provide the structural equation with a reference; please refer to the Appendix for the exact equations.

8.2. Training and testing data

Using the said aerodynamic equations [32], random data sets were generated within specific intervals. The intervals were chosen based on the normal flying conditions of the McDonnell Super Hornet. Two sets of training and testing data for each of the equations were numerically generated, to produce two different training and modeling environments. Considering the fact that the data was numerically generated, noise (random numbers) was deliberately introduced; the noise percentage is inclusive in the training data size.

Four (4) neural networks, viz. the proposed New Rough Neural Network, a conventional Back Propagation Network, an Adaptive Neuro-Fuzzy Inference System [31], and the Rough Neuron based network developed by Lingras [13], have been used in the comparative study. Two such sets of four networks have been designed, one for each of the aerodynamic equations. All of the neural networks have been designed with 5 layers (including the input and output layers), each layer having 8 neurons. The Rough Neural Networks (NRNN and RNN) actually have 4 Rough Neurons (i.e. 4 × 2 constituent neurons). Such a design has been adopted to ensure equivalent processing capabilities across all the networks.

8.3. Training details
The four neural networks (for one equation) have been trained over two data sets (varying in the number of data points in each set; refer to Table 8.1). The relevant training details are tabulated in Tables 8.2–8.5. In these tables, Epochs is the total number of epochs trained before reaching error stability in the network, and the testing error is the normalized approximation error of the network determined over a set of 500 test cases.

Table 8.3
Deceleration Time network training details.

                                    Training error   Epochs
A: Larger training data
  New Rough Neural Network          1.0E-3           103
  Lingras Rough Neural Network      9.24E-2          550
  Adaptive Neuro-Fuzzy Network      9.5E-1           467
  Back Propagation Neural Network   5.4528           20,000
B: Smaller training data
  Rough Neural Network              3.2E-3           190
  Lingras Rough Neural Network      0.9975           556
  Adaptive Neuro-Fuzzy Network      1.562            520
  Back Propagation Neural Network   5.4528           20,000

Table 8.4
Deceleration Distance network testing/validation.

                                    Testing error    Runs
A: Larger training data
  Rough Neural Network              9.32E-4          500
  Lingras Rough Neural Network      3.5E-5           500
  Adaptive Neuro-Fuzzy Network      8.99E-5          500
  Back Propagation Neural Network   1.39E-2          500
B: Smaller training data
  Rough Neural Network              5.84E-2          500
  Lingras Rough Neural Network      7.11E-2          500
  Adaptive Neuro-Fuzzy Network      1.87E-2          500
  Back Propagation Neural Network   1.56E-2          500

Table 8.5
Deceleration Time network testing/validation.

                                    Testing error    Runs
A: Larger training data
  Rough Neural Network              0.6776           500
  Lingras Rough Neural Network      3.12             500
  Adaptive Neuro-Fuzzy Network      1.93             500
  Back Propagation Neural Network   21.22            500
B: Smaller training data
  Rough Neural Network              0.4511           500
  Lingras Rough Neural Network      4.62             500
  Adaptive Neuro-Fuzzy Network      6.04             500
  Back Propagation Neural Network   19.34            500

8.4. Results discussion

This section provides insight into the results from the earlier sections; it may also be regarded as the space used to provide a reasonable justification for certain trends that arose out of the problem solving and the subsequent analysis.

It has been clearly observed that the number of training epochs or iterations required has drastically reduced for the new rough approximation based neural network. The epoch reduction due to the use of rough neuro methods may not be predicted accurately, but by and large one can expect at least a 50% reduction. Approximation accuracy has been observed to improve considerably over the conventional neural, neuro-fuzzy, and Lingras networks. The experiments with smaller data sets indicate that the rough sets based models tend to perform better than the conventional model; this gives the rough approximation based models an edge in cases where extensive data is either unavailable or expensive to gather. The speed of error convergence has deteriorated for smaller training data sets, but the compromise might be worthwhile on more than a few occasions.

The initial opinion that the proposed rough-neuro architecture can consistently work around noise in the data is questioned by the experiments in modeling the Deceleration Time. With reference to Table 8.4A, the adaptive neuro-fuzzy network and the rough-neuro network have very similar approximation capabilities apart from the speed of learning. This should not be regarded as a fallacy of the model, but rather as a lean slot in the algorithmic efficiency; one should consider that the idea of introducing rough inclusion into neural networks is not only to improve accuracy but primarily to expedite the learning curve. Evidently, despite the similar accuracies, the rough-neuro network is 5 times faster than the neuro-fuzzy one.

One of the prime aspects influencing the efficiency of a neural network is its computational speed. Given the complexity of the involved transfer functions, computational speed becomes an even more important index. Iterating on a conventional 1.86 GHz Centrino (1 MB cache; 433 MHz front side bus) supported with 512 MB of memory, each iteration took approximately 1.04 s.

One may not discredit the conventional neural network architectures, for two reasons: one, it is predominantly the conventional neural architecture that has been redesigned to arrive at the rough-neuro-fuzzy network; and two, these architectures are very robust. Note the approximation accuracy of the back propagation neural network (BPNN) in Tables 8.4B and 8.5B: despite the reduced training data size, the conventional architecture maintains proximity in the respective error rates, which cannot be said of the modified methods. These results have been presented only to provide the reader with a better estimate of the methods. The presented trends are by no means comprehensive, so only preliminary implications about the quality of the techniques should be inferred from them. Moreover, the proposed methodologies and refinements are effective only for a certain class of noisy systems and may not be efficient for all possible scenarios.

9. Conclusions

9.1. The Achilles Heel

This subsection lists the observed weaknesses of the architectures presented here. One of the major weaknesses is the increased complexity of building the neural network. Moreover, the proposed rough-neuro algorithm can be beneficial only under certain conditions: such an architecture remains effective only for cases with inherently imprecise or uncertain information, and is redundant for all others. The proposed network carries the assumption that the certainty boundary for the data is known to the user, which might not be the case for all instances of noise. Such a system is fundamentally different from a conventional neural network, since the network learning warrants human input, and of a prominent nature. The mapping schema used to propagate the partitions forward through the layers rests on the concept of covariance, i.e. the variation in the boundary region is brought about through correlated variance calculated from the layer-1 partitions and the partitions of the preceding layer. Such a mapping mechanism ensures that the error rate of the neural network does not increase through the iterations, but it also tends to restrict sudden drops in the error rate while the network learns.

9.2. Final remarks

This paper has put forward a new approach to approximation and information processing. An attempt to achieve human-like learning (learning in parts) has resulted in a novel Rough Neuron and a sophisticated architecture for the Rough Neural Network. The work primarily amalgamates the methodologies of Artificial Neural Networks and Rough Sets into what may be termed a noise processing neural network. Based on the principle of partitioning the parameter space, and thus learning efficiently and faster, this work proposes the New Rough Neuron. This new neuron is actually made up of two individual simpler neurons, a Lower Neuron and a Boundary Neuron, designed to deal with certain and uncertain information, respectively. A heuristic provided by the user is used to initially partition the signal width into clean and noisy regions. Further, a covariance based mapping schema is used to propagate this partitioning through the layers of the neural network. A back propagation learning paradigm has also been implemented.

An extensive experiment has been conducted in order to test the competency of the modified neural network architecture. The results have been provided, and an effort has been made to highlight subtle but important features. The developed Rough Neural Network has been found to possess superior approximation capabilities as well as the ability to converge faster. The trade-off between the complexity of the neural network and its approximation capabilities is of a desirable nature. The computational time is equivalent to that of any other neural network, since the proposed neural network is found to require only a fraction of the neurons found in conventional networks.

From the experiments and the analysis of various Rough Neural Network parameters, it can be concluded that creating patterns through relational classification has helped enhance the learning and operational phases of artificial neural systems. Such an effect is similar to the way humans reason: by processing certain knowledge first and then logically assigning truth values to the random information that they possess. It can be stated that an iterative rough(-truth) deduction process has been implemented in order to process noisy, random and uncertain information.
Acknowledgement

The authors would like to thank Luis H. Garcia Islas for proofreading the revised version of this paper.
Appendix A. Aerodynamic equations

Complex mathematical functions related to supersonic aerodynamic control have been chosen; in particular, the problem of approximating the landing parameters of a supersonic jet. Emphasis was laid on determining the effect of rough inclusion on the aspects of training time, training epochs, error of approximation and error propagation. Two neural networks, one for the deceleration distance(1) and the other for the deceleration time(2), have been devised. The following are the governing equations for each of the non-linear systems being modeled; the equations have been adopted from [32].

Deceleration distance:

$$X = \frac{W}{2\rho g S C_{DO}}\left[\log\!\left(1+\frac{V_i^{4}\rho^{2}S^{2}C_{DO}}{4W^{2}K^{2}}\right)-\log\!\left(1+\frac{V_f^{4}\rho^{2}S^{2}C_{DO}}{4W^{2}K^{2}}\right)\right] \qquad (A.1)$$

Deceleration time:

$$T = \sqrt{\frac{W}{2g^{2}\rho S\sqrt{KC_{DO}^{3}}}}\,\bigl[f(V_i)-f(V_f)\bigr] \qquad (A.2)$$

where

$$f(V) = \frac{1}{\sqrt{2}}\arctan\!\left(\frac{2V\sqrt{\rho S W K\sqrt{C_{DO}}}}{2WK-V^{2}\rho S\sqrt{C_{DO}}}\right)-\frac{1}{2\sqrt{2}}\log\!\left(\frac{2WK+V^{2}\rho S\sqrt{C_{DO}}+2V\sqrt{\rho S W K\sqrt{C_{DO}}}}{2WK+V^{2}\rho S\sqrt{C_{DO}}-2V\sqrt{\rho S W K\sqrt{C_{DO}}}}\right)$$

Total drag experienced by the aircraft:

$$D = \frac{C_L^{2}K^{2}\rho S+V^{2}\rho S C_{DO}}{2\sqrt{K}} \qquad (A.3)$$

(1) Deceleration distance is the point from which an aircraft should start decelerating from its full flying speed in order to be able to land accurately within specific constraints of space. Such a distance is inclusive of the altitude-drop travel.
(2) Deceleration time is the total time in flight while decelerating before which the aircraft touches down. Such a time is exclusive of the runway travel time.
where Vi is the initial flying velocity; Vf is the touchdown velocity; V is the average velocity; r is the air density; S is the wing surface area; K is the induced drag coefficient; CDO is the zero drag coefficient; W is the aircraft weight; CL is the lift coefficient; and g is the gravitational acceleration.
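As an illustrative check, Eq. (A.1) can be evaluated directly. The sketch below assumes the form X = W/(2ρgSC_DO)·[log(1 + V_i⁴ρ²S²C_DO/(4W²K²)) − log(1 + V_f⁴ρ²S²C_DO/(4W²K²))]; the numeric parameter values are placeholders for illustration only, not the Super Hornet flight data used in the paper.

```python
import math

def deceleration_distance(Vi, Vf, rho, S, C_DO, K, W, g=9.81):
    """Deceleration distance X of Eq. (A.1).

    Vi, Vf: initial and touchdown velocities; rho: air density;
    S: wing surface area; C_DO: zero drag coefficient;
    K: induced drag coefficient; W: aircraft weight.
    """
    # Common factor rho^2 * S^2 * C_DO / (4 W^2 K^2) inside both logarithms.
    c = rho ** 2 * S ** 2 * C_DO / (4.0 * W ** 2 * K ** 2)
    # log1p(x) = log(1 + x), numerically stable for small arguments.
    return W / (2.0 * rho * g * S * C_DO) * (
        math.log1p(Vi ** 4 * c) - math.log1p(Vf ** 4 * c)
    )

# Placeholder parameters (illustrative only).
X = deceleration_distance(Vi=300.0, Vf=70.0, rho=1.225, S=46.0,
                          C_DO=0.02, K=0.12, W=2.1e5)
```

Two sanity properties follow from the formula: X vanishes when Vi equals Vf, and X grows with the initial velocity Vi, consistent with a longer deceleration run at higher approach speed.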
References

[1] F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain, Psychological Review 65 (6) (1958) 386–408.
[2] P.M. Churchland, On the nature of theories: a neurocomputational perspective, in: A. Clark, J. Toribio (Eds.), Cognitive Architectures in Artificial Intelligence, Garland Publishing, Taylor and Francis, 1998, pp. 139–182.
[3] D.O. Hebb, The Organization of Behavior: A Neuropsychological Theory, John Wiley, 1949.
[4] R.J. MacGregor, Neural and Brain Modeling, Academic Press, 1987.
[5] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, 1991.
[6] A. Bargiela, W. Pedrycz, Granular Computing: An Introduction, Kluwer Academic Publishers, Boston, 2002.
[7] Y.Y. Yao, A partition model of granular computing, LNCS Transactions on Rough Sets 1 (2004) 232–253.
[8] T.O. Jackson, Data input and output representations, in: E. Fiesler, R. Beale (Eds.), Handbook of Neural Computation, Oxford University Press, 1997, pp. B4.1–B4.11.
[9] Z. Pawlak, Rough sets, International Journal of Computer and Information Sciences 11 (1982) 341–356.
[10] J. Komorowski, L. Polkowski, A. Skowron, Rough sets: a tutorial, in: S.K. Pal, A. Skowron (Eds.), Rough Fuzzy Hybridization—A New Trend in Decision Making, Springer, 1999, pp. 3–98.
[11] L. Polkowski, A. Skowron (Eds.), Rough Sets in Knowledge Discovery I & II: Methodology and Applications, Physica-Verlag, 1998.
[12] A. Skowron, N. Zhong, Rough sets in KDD, in: Tutorial Notes, Proceedings of the Fourth Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kyoto, Japan, April 18–20, 2000.
[13] P. Lingras, Rough neural networks, in: Proceedings of the Sixth International Conference on Information Processing and Management of Uncertainty, Granada, 1996, pp. 1445–1450.
[14] P. Lingras, Comparison of neofuzzy and rough neural networks, Information Sciences 110 (3) (1998) 207–215.
[15] J.F. Peters, A. Skowron, Z. Suraj, L. Han, S. Ramanna, Design of rough neurons: rough set foundation and Petri net model, in: Z.W. Ras, S. Ohsuga (Eds.), Foundations of Intelligent Systems, 12th International Symposium, Charlotte, NC, October 11–14, 2000, LNAI 1932, Springer-Verlag, Berlin Heidelberg, 2000, pp. 283–291.
[16] J.F. Peters, L. Han, S. Ramanna, Rough neural computing in signal analysis, Computational Intelligence 17 (3) (2001) 493–513.
[17] J.F. Peters, S. Ramanna, Z. Suraj, M. Borkowski, Rough neurons: Petri net models and applications, in: S.K. Pal, L. Polkowski, A. Skowron (Eds.), Rough-Neural Computing: Techniques for Computing with Words, Springer, 2004, pp. 471–490.
[18] D. Slezak, J. Wroblewski, M. Szczuka, Constructing extensions of Bayesian classifiers with use of normalizing neural networks, in: N. Zhong, Z.W. Ras, S. Tsumoto, E. Suzuki (Eds.), International Symposium on Methodologies for Intelligent Systems (ISMIS), Maebashi, Japan, October 28–31, 2003, LNCS, Springer, 2003, pp. 408–416.
[19] D. Slezak, J. Wroblewski, M. Szczuka, Neural network architecture for synthesis of the probabilistic rule based classifiers, Electronic Notes in Theoretical Computer Science 82 (4) (2003) 1–12.
[20] D. Slezak, J. Wroblewski, M. Szczuka, Feedforward concept networks, in: Proceedings of the International Workshop on Monitoring, Security and Rescue Techniques in Multiagent Systems, Plock, Poland, June 7–9, 2004, pp. 148–150.
[21] S. Chandana, R.V. Mayorga, Rough approximation based neuro-fuzzy inference system, in: Proceedings of the IEEE International Conference on Hybrid Intelligent Systems, Rio de Janeiro, Brazil, 2005, pp. 518–521.
[22] D. Slezak, M. Szczuka, Rough neural networks for complex concepts, in: A. An, et al. (Eds.), RSFDGrC 2007, LNAI 4482, Springer-Verlag, Berlin Heidelberg, 2007, pp. 574–582.
[23] S.K. Pal, L. Polkowski, A. Skowron (Eds.), Rough-Neural Computing: Techniques for Computing with Words, Springer, 2004.
[24] J. Jelonek, K. Krawiec, R. Slowinski, J. Stefanowski, J. Szymas, Rough sets as an intelligent front-end for the neural network, in: Proceedings of the First National Conference on Neural Networks and their Applications, vol. 2, Kule, Poland, 1994, pp. 116–122.
[25] J. Jelonek, K. Krawiec, R. Slowinski, Rough sets reduction for attributes and their domains for neural networks, Computational Intelligence 11 (2) (1995) 339–347.
[26] M. Szczuka, Rough sets methods for constructing artificial neural networks, in: Proceedings of the Third Biennial European Joint Conference on Engineering Systems Design and Analysis, Montpellier, 1996, pp. 9–14.
[27] M. Szczuka, Rough sets and artificial neural networks, in: L. Polkowski, A. Skowron (Eds.), Rough Sets in Knowledge Discovery I & II: Methodology and Applications, Physica-Verlag, 1998, pp. 449–470.
[28] R. Swiniarski, F. Hunt, D. Chalvet, D. Pearson, Prediction system based on neural networks and rough sets in a highly automated production process, in: Proceedings of the 12th System Science Conference, Wroclaw, Poland, 1995.
[29] G. Cattaneo, Abstract approximation spaces for rough theories, in: L. Polkowski, A. Skowron (Eds.), Rough Sets in Knowledge Discovery I & II: Methodology and Applications, Physica-Verlag, 1998, pp. 59–98.
[30] G.W. Snedecor, W.G. Cochran, Statistical Methods, 7th edition, Iowa State Press, 1980, p. 180.
[31] J.S.R. Jang, ANFIS: adaptive network based fuzzy inference system, IEEE Transactions on Systems, Man and Cybernetics 23 (3) (1993) 665–685.
[32] A. Miele, Flight Mechanics: Theory of Flight Paths, Addison-Wesley, 1962.
Sandeep Chandana is finishing his Ph.D. in Electrical and Computer Engineering at the University of Calgary. He received his B.E. in Mechanical Engineering from Osmania University, India; and M.A.Sc. in Industrial Systems Engineering from University of Regina, Canada. His current research work involves high level information fusion and graphical models applied to distributed sensor networks.
Rene V. Mayorga is currently an Associate Professor in the Faculty of Engineering at the University of Regina. He is interested in the development of Artificial/ Computational Sapience [Wisdom] and MetaBotics as new disciplines. He is involved in the development of Sapient [Wise] Decision & Control, Intelligent & Sapient (Wise) systems, and MetaBots and their application to robotics, automation, manufacturing, augmented perception, human-computer interaction & interface, and software engineering. He has published widely in scientific journals, international conference proceedings, books, and monographs; and has edited several international conference proceedings. He is the Editor in Chief of the Applied Bionics and Biomechanics journal, and Associate Editor for: the Journal of Control and Intelligent Systems, the International Journal of Robotics and Automation, the Journal of Intelligent Service Robotics, and the Journal of Robotics. Recently, he has co-edited a Book on Artificial/Computational Sapience (Wisdom), published by Springer in December 2007. Over the years he has served in several occasions as General Chair of International Conferences. He has also served as a member of the Program Committee, and organized Special Sessions, Workshops, and Tutorials, in many International Conferences.