Journal of Economic Dynamics and Control 17 (1993) 523-529. North-Holland
Book review
Neural networks and fuzzy systems
Stefano Zambelli
AUC, 9220 Aalborg, Denmark
‘Thinking about learning is a real headache.’ Amit (1989, p. 428)
Bart Kosko, Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence (Prentice Hall, Englewood Cliffs, NJ, 1992).
The introductory book on neural network and fuzzy systems architecture¹ by Kosko is indispensable for both the undergraduate student and the experienced researcher. In limited space (425 pp.) and at a small cost ($35) the reader is led to explore, at different levels of depth, the fascinating world of neural networks technology and fuzzy sets philosophy in a quite accurate and enjoyable way. The book also comes with a diskette that allows the scholar to verify theories
¹ For the unaware reader who wants to know what artificial neural networks are about, countless books apart from this one are available. Even though fuzzy set theory is rather new, several books about this topic are also available. But, as far as I know, this is the first book that couples fuzzy sets and neural networks in such a straightforward manner. In ‘defining’ neural networks we can follow Arbib (1987, pp. 19-20) and say that a neural net is a ‘collection of McCulloch-Pitts neurons, each with the same time scale, interconnected by splitting the output of any neuron into a number of lines and connecting some or all of these to the inputs of other neurons. An output may thus lead to any number of inputs, but an input may come from at most one output’. The McCulloch-Pitts neuron is defined as an element with m inputs, x_1, ..., x_m, and one output. It is characterized by m + 1 numbers, its threshold and weights w_1, ..., w_m, where w_i is associated with x_i. For a well-known presentation and critique of the McCulloch-Pitts neuron a good reading is Minsky and Papert (1988). In defining fuzzy sets we can follow what was written by one of the precursors of fuzzy set systems [Zadeh (1965)]. In his original article Zadeh writes: ‘Let X be a space of points (objects), with a generic element of X defined by x. Thus, X = {x}. A fuzzy set (class) A in X is characterized by a membership (characteristic) function f_A(x) which associates with each point in X a real number in the interval [0, 1], with the value of f_A representing the ‘grade’ of membership of x in A’ [Zadeh (1965, p. 339)].
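The McCulloch-Pitts element described in this note can be sketched in a few lines. The sketch below is only illustrative: the function name, the convention that the unit fires when the weighted sum reaches the threshold, and the two example parameter sets are assumptions made here, not taken from Arbib or Kosko.

```python
from typing import Sequence


def mcculloch_pitts(inputs: Sequence[int], weights: Sequence[float], threshold: float) -> int:
    """Return 1 if the weighted sum of the inputs reaches the threshold, else 0.

    Assumption: 'reaches' means >=; other texts may use a strict inequality.
    """
    if len(inputs) != len(weights):
        raise ValueError("one weight per input is required")
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


# Hypothetical parameter choices (m = 2 inputs, so m + 1 = 3 numbers each).
AND_GATE = dict(weights=(1.0, 1.0), threshold=2.0)
OR_GATE = dict(weights=(1.0, 1.0), threshold=1.0)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, mcculloch_pitts(x, **AND_GATE), mcculloch_pitts(x, **OR_GATE))
```

With these assumed parameters the same element behaves as a logical AND or a logical OR, which shows how the m + 1 numbers alone fix the unit's behaviour.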
0165-1889/93/$05.00 © 1993 Elsevier Science Publishers B.V. All rights reserved
and her/his own understanding of them through well-organized computer simulations.
A modern economist will find the reading of the book relatively smooth because the basic theory uses mostly elementary calculus, linear algebra, simple dynamical systems, and probability theory as studied in the upper-division undergraduate curricula of economics departments, while some applications use more advanced, but still well-known, techniques such as digital signal processing, random processes, estimation, and control engineering.
One of the main themes of the book is that ‘Intelligent systems adaptively estimate continuous functions from data without specifying mathematically how outputs depend on inputs’. This type of model-free approach is now familiar to many economists, who would also agree that ‘‘Learning’ and ‘adaptation’ are linguistic gifts from antiquity. They simply mean parameter change’ (p. 22). In reading Kosko’s book we also find out that ‘Ultimately learning provides only a means to some computational end. Neural networks learn patterns or functions or probability distributions to recognize future patterns, filter future input streams of data, or solve future combinatorial optimization problems. [...] We care how the learned parameter performs in some computational system, not how it was learned, just as we applaud the piano recitals and not the practice sessions’ (p. 23).
But how does learning actually occur in a neural network? How can a neural network be trained to perform a piano recital? The process of ‘learning’ is well explained in the first five chapters of the book, where learning is described as changes in synaptic dynamics which in turn determine changes in the neuronal state of the system. A neural network has learned a function f: R^n → R^p when a vector x belonging to R^n is mapped through the network to generate an output y belonging to R^p, which is described by the functional law y’ = f(x). f(x) is considered to be properly computed by the neural network when the network output y (described by a certain state of the artificial neurons) adequately approximates the functional value y’ = f(x). A neural network may be trained to learn input-output associations (x_i, y_i) and to remember what is ‘learned’ in a manner which is strictly dependent on the way in which the neural network has been constructed. Moreover, learning may be ‘supervised’ or ‘unsupervised’ according to whether a ‘reward’ system exists or not. The reward system is described by a set of weights that influence the synaptic adaptation according to whether the response of the neural network to the input x has been the proper y or not.²
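To make the idea of learning f from training couples (x_i, y_i) concrete, here is a minimal sketch of supervised learning with the delta (LMS) rule on a single linear unit. It is not Kosko's code: the target function, the learning rate, the number of epochs, and all names are hypothetical choices made for illustration, and the linear case is deliberately the simplest one.

```python
def train_linear_unit(pairs, epochs=200, rate=0.05):
    """Fit y ~ w*x + b to training pairs with the delta (LMS) rule."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y_target in pairs:
            error = y_target - (w * x + b)  # supervised error signal
            w += rate * error * x           # parameter change = 'learning'
            b += rate * error
    return w, b


def mean_squared_error(pairs, w, b):
    """Average squared gap between the unit's output and the targets."""
    return sum((y - (w * x + b)) ** 2 for x, y in pairs) / len(pairs)


# Hypothetical training couples generated by f(x) = 2x + 1.
# The learner sees only the couples, never the functional form.
PAIRS = [(k / 4.0, 2.0 * (k / 4.0) + 1.0) for k in range(-4, 5)]

w, b = train_linear_unit(PAIRS)
print(w, b, mean_squared_error(PAIRS, w, b))
```

The unit is never told that the data come from a straight line with slope 2 and intercept 1; it only adjusts two parameters until its output approximates the targets, which is the sense in which the estimation is model-free.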
² Learning (changes in synaptic weights) may occur following deterministic or stochastic laws. Kosko presents several learning procedures: the signal Hebbian learning law, the competitive learning law, the differential Hebbian learning law, and the differential competitive learning law. The four laws are discussed and presented with several subvarieties, in both deterministic and stochastic form.
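As an illustration of the first of these laws, the sketch below integrates a deterministic signal Hebbian law with Euler steps. The continuous-time form used, dm/dt = -m + S_pre * S_post (passive decay plus the product of the two neuronal signals), is the commonly cited one and should be read here as an assumption rather than a quotation from the book; the step size and function names are likewise hypothetical.

```python
def hebbian_step(m, signal_pre, signal_post, dt=0.1):
    """One Euler step of the assumed law dm/dt = -m + signal_pre * signal_post."""
    return m + dt * (-m + signal_pre * signal_post)


def run_hebbian(m0, signal_pre, signal_post, steps=200, dt=0.1):
    """Iterate the synaptic weight under constant pre- and post-synaptic signals."""
    m = m0
    for _ in range(steps):
        m = hebbian_step(m, signal_pre, signal_post, dt)
    return m


# Both neurons active: the weight grows toward the signal product (1.0).
print(run_hebbian(0.0, 1.0, 1.0))
# Post-synaptic neuron silent: an existing weight decays toward 0.
print(run_hebbian(1.0, 1.0, 0.0))
```

Under constant signals the weight settles at the product of the two signals, so correlated activity is ‘remembered’ and uncorrelated activity is forgotten.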
If this is the case, it can be objected that there are no specific reasons why economists should learn about neural networks, since any good econometrician is well-versed in elaborating models of vector associations (x_i, y_i). But, while an econometrician aims at estimating the actual functional form f: R^n → R^p, an expert on neural network theory can avoid this kind of knowledge; it is sufficient to model the knowledge about training couples (x_i, y_i).³ Unfortunately almost all neural network synaptic weighting schemes, for example backpropagation, are very slow in learning, so that many training couples are required, and it is not very clear whether other estimating schemes may perform better, depending on the problem at hand. Neural networks are not free from problems. In fact they may be structurally unstable, reach saturation conditions, find local instead of global minima, never converge, generate spurious results, and so on. This is also well documented and acknowledged by Kosko.
The innovative part of the book, as can be guessed from the title, is the association between neural network theory and fuzzy set systems. The basic idea is that what thinking machines are learning about is bound to be, in most cases, fuzzy, i.e., imprecise and not clearly defined. The idea is that membership in a class is subject to degrees: How green is green? When is a mountain a mountain, and when is it no longer one? Kosko seems to avoid, rather than solve as he implies, the Kantian-like question of what green is, by measuring in degrees the quality of the uncertain element that he is going, despite its uncertainty, to define. ‘Fuzzy theory holds that all things are matters of degree. [...] Fuzzy theory also reduces black-white logic and mathematics to special limiting cases of gray relationships. Along the way it violates black-white ‘laws of logic’, in particular the law of noncontradiction not-(A and not-A) and the law of the excluded middle either A or not-A, and yet resolves the paradoxes or antinomies that these laws generate’ (p. 3). In other terms, the set-theoretic features A ∩ A^c = ∅ and A ∪ A^c = X do not hold. That is: while it is true that Mount Everest is a mountain (and therefore has truth value 1), we have deterministic physical evidence that the mountain K2 is not quite as high as Mount Everest, so it is not quite a mountain (and therefore has truth value less than 1, but very close to 1). Here the declared purpose of solving paradoxes and logical antinomies is not achieved, if we consider that not only the criterion of measure and its degrees, but the element itself (the green, the mountain) is defined by connections to other predetermined elements. So to answer the question ‘Is K2 a mountain?’ we must relate to another nonfuzzy element, Everest, as a superior criterion to define
³ In the case of linear associations the traditional econometric tools would probably perform at least as well as neural networks. Estimating functions that generate nonlinear associations is a more difficult task, and in that case neural networks may turn out to be far superior to traditional econometric tools.
a mountain by its measure, while we have already decided that K2 is a fuzzy element. On which basis? The most obvious basis could be that we already know what a mountain is (mountain = Everest = 1), and so we define in a precise way the degrees we can use to determine what a mountain is not. Inevitably, far from solving logical antinomies, Kosko has to deal with them, because in defining a fuzzy element he chooses an A element rather than a non-A element, not solving but simply avoiding the problem, and finally staying inside the logic of the excluded middle that in fuzzy theory he wants to reject. The attempt at overcoming traditional logic paradoxically runs into its aporias when the ‘alternative logic’ partially accepts (the distinction between a ‘mountain’ and a ‘not quite a mountain’) and partially rejects (the ‘not quite a mountain’ is not a ‘nonmountain’) the rules of that logic.
In the case of an individual who tells a lie one time and the truth another time, his membership in the set of liars is measured to be 1/2. Accordingly, another individual who tells a lie nine times out of ten has measure 9/10, i.e., he is almost a liar but not quite. Moreover, an individual who always tells half-truths is also half a liar and his membership measure is 1/2. But Kosko goes even further and claims that paradoxes may as well be assigned a value of 1/2 because, in the case of the liar paradox, we cannot decide whether the individual is telling a lie or not when he says he is lying. The event of somebody telling lies half the time is different in nature from an event the truth of which cannot be decided. Unfortunately, contrary to what Kosko seems to want us to believe, paradoxes and antinomies are not ‘resolved’ simply by assigning to them membership value 1/2.
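The arithmetic behind these membership values can be checked with a short sketch. It assumes the standard operators from Zadeh (1965), complement as 1 - m, intersection as min, and union as max; the function names are hypothetical, and exact fractions are used so that values such as 1/2 and 9/10 are not blurred by floating-point rounding.

```python
from fractions import Fraction


def complement(m):
    """Membership grade in the complement set (assumed operator: 1 - m)."""
    return 1 - m


def intersection(a, b):
    """Membership grade in the intersection (assumed operator: min)."""
    return min(a, b)


def union(a, b):
    """Membership grade in the union (assumed operator: max)."""
    return max(a, b)


def overlap_and_underlap(m):
    """Return the grades of x in (A and not-A) and in (A or not-A)."""
    return intersection(m, complement(m)), union(m, complement(m))


for grade in (Fraction(1), Fraction(9, 10), Fraction(1, 2)):
    print(grade, overlap_and_underlap(grade))
```

Only the crisp grade 1 gives back the classical values 0 and 1. The liar who lies nine times out of ten belongs to ‘liar and not-liar’ with grade 1/10, and at grade 1/2 the two expressions coincide. The computation is fully effective in every case, which is the sense in which assigning 1/2 to an undecidable statement is a different matter from measuring a frequency.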
The problem of decidability is very different from the problem of degree of membership.⁴ In the examples above the distinction between the liar paradox and a liar who tells the truth half the time is precisely a distinction between something that our knowledge (formal construction) cannot solve (decide) and something the truth of which can easily be decided (we simply need to discriminate between something we provisionally accept to be true and something we provisionally accept to be false). In this fuzzy set algebra, for a given ‘point of discourse’ x there exists an effective procedure that we can use to decide whether or not it is a member of the well-defined set X (‘universe of discourse’) and to compute its membership value (for instance, whether it has the property of being green but not too green). In the situation in which we cannot decide on the membership measure of the element x the question is still
⁴ On p. 268 Kosko observes: ‘The only subsets of the universe that are not in principle fuzzy are the constructs of classical mathematics’. After having discussed at length the questions of paradoxes, it is surprising that Kosko regards classical mathematics as free from paradoxes, i.e., nonfuzzy. The work of Russell and Whitehead was devoted to the construction of formal systems (universes of discourse) that would be free from paradoxes. The dream of constructing such formal systems was destroyed by Gödel and Turing, who showed that it is impossible to construct formal systems in which all propositions may be decided to be either true or false (truth value 1 or 0). It is therefore ‘paradoxical’ that Kosko regards classical mathematics as nonfuzzy. Inside a formal system a proposition may be decided to be true or false, or may turn out to be undecidable.
open and therefore is undecided or undefined. If we can describe events in some effective way (as in the case in which degrees of membership are ‘deterministically’ defined) there is nothing fuzzy about it. In fact, if the universe of discourse X is made of countable couples ({x_i}, {m_i}), where {m_i} is the degree of membership of {x_i}, it is difficult to see how the law of the excluded middle would not hold, as Brouwer (1923) has, I think, correctly maintained.
It is therefore difficult to understand the bitterness in Kosko’s argument (pp. 287-293) against subjective probability theory. In the case of subjective probability theory à la de Finetti (1970) (where we provisionally, as long as other information does not lead us to change our opinion, define a mountain as ‘a mountain’ and something less than a mountain as ‘something less than a mountain’), probability is viewed as a measure of the degree of confidence of our state of knowledge about an event. Subjective probability theory really implies that everything is ‘fuzzy’, so that we take a mountain to be a mountain, green to be green, only as a provisional fact. In this sense, as measures of our opinions, there may be a similarity between subjective probability and fuzzy theory à la Novak when he states that ‘the grade of membership can also be viewed as a degree of our certainty (belief) that the element belongs to the given fuzzy set’ [Novak (1989, p. 28)]. The difference from Kosko’s view is that he would claim that ‘Fuzziness is a type of deterministic uncertainty’ (p. 267).
Apart from relevant theoretical questions there is no doubt about the success of the empirical applications of fuzzy algebra. As the extensive literature on the topic testifies, its power stems from the fact that describing a fuzzy membership degree function requires the specification of if-then rules, which can be formulated only when strong knowledge about the categorization is available and is assumed to be found in ‘real world’ descriptions.
This is well documented in the computer simulations where fuzzy adaptive systems are applied. The performance of the fuzzy associative memory in controlling ‘an inverted pendulum’ and the ‘backing up of a truck’ is shown to be superior to that of other (neural network) algorithms. The power of ‘fuzzy’ encoding stems from the fact that the controlling rules are programmed on the basis of the ‘common sense’ knowledge of the programmer. But programming with the (threshold) if-then rule is not a prerogative of fuzzy logic. The if-then rule is in fact a logic operator which is at the basis of the most traditional bivalent truth tables (if A, then B).⁵
The book deals with so many topics related to neural network and fuzzy theory applications that it is almost inevitable that some points are, in such a short space, not discussed at length. From the technical point of view the reader may feel uneasy, for example, about the way in which the continuity concept
⁵ This is not, of course, to deny the success of the applications of ‘fuzzy’ systems in controlling subways, focusing cameras, tuning color televisions, controlling automobile transmissions and traffic lights, and so on. On the contrary, the success of the applications is to be attributed to the specification of thresholds. But is this a prerogative of fuzzy logic only?
is treated. And in general the reader might wish to know more about the philosophical questions and economic issues that Kosko often raises.
In the case of continuity many results are shown, throughout the book, to hold for continuous functions, but in fact the applications are all defined in terms of discrete numerical values (and therefore assumed to be defined on the range of rational numbers). Kosko does not discuss the relation existing between continuous and discrete domains. For example, while on p. 149 Kosko states that ‘The continuous stochastic-differential formulation allows us to prove not only that average synaptic vectors converge to centroids, but that they converge exponentially quickly to centroids’, on the next page he conducts the analysis using a stochastic difference-equation algorithm, so that the reader may be in doubt about the fast convergence properties.
The use of philosophical references also often stimulates curiosities that remain unsatisfied. For instance, Kosko writes: ‘The world, as Wittgenstein (1922) observed, is everything that is the case. In this spirit we can summarize the ontological case for fuzziness: The universe consists of all the subsets of the universe’ (p. 268). And this is the only mention of or reference to Wittgenstein’s work that occurs in the book. In the light of the fact that the ‘Tractatus logico-philosophicus’ is one of the first works in which bivalent truth tables are presented, the reader, stimulated by Kosko’s comment, may want to know more about the announced connection between fuzzy theory and the work of the young Wittgenstein.
Relevant questions for an economist are also spread all over the book; they range from model-free estimation techniques to the comparison between Kalman filter controls and fuzzy association rules, from knowledge retrieval to the dynamics of learning, and so on.
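The discrete side of the centroid claim quoted above can be illustrated with a stochastic difference equation of the competitive-learning type. This is a sketch under strong simplifying assumptions that are not taken from the book: a single synaptic vector that always ‘wins’, a gain of 1/t at step t, and synthetic Gaussian samples with a fixed seed. All names are hypothetical.

```python
import random


def competitive_centroid(samples):
    """Move one synaptic vector toward each sample with gain 1/t."""
    m = [0.0] * len(samples[0])
    for t, x in enumerate(samples, start=1):
        gain = 1.0 / t  # assumed decreasing learning coefficient
        m = [mi + gain * (xi - mi) for mi, xi in zip(m, x)]
    return m


def centroid(samples):
    """Sample mean of the pattern vectors, computed directly."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


rng = random.Random(0)  # fixed seed so the run is reproducible
SAMPLES = [(rng.gauss(2.0, 1.0), rng.gauss(-1.0, 0.5)) for _ in range(500)]

print(competitive_centroid(SAMPLES))
print(centroid(SAMPLES))
```

With this particular gain the synaptic vector equals the running sample mean at every step, so it tracks the sample centroid exactly. The sketch says nothing about how quickly that centroid approaches the underlying population centroid, which is the convergence-rate question left open when the analysis moves from the differential to the difference formulation.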
Kosko sometimes suggests a strict relation between the description of an economic system and trained neural nets. The analogy is that a neural network system performs a function as a whole that single units (agents as neurons) are ‘unaware of’. This ‘self-organizing’ feature of neural networks allows Kosko to make reference to ‘the invisible hand’ (p. 91) philosophy. While this type of analogy is very appealing, one should keep in mind that in the case of the ‘invisible hand’ the agents do not realize that their ‘greedy’ behaviour is the reason for the stability and welfare of the overall system and therefore of the other agents and themselves. In the case of the neurons, I don’t think that the analogy is very powerful, because the fact that the state of the neuron is 0 or 1 does not tell us anything about the neuron’s welfare.⁶
⁶ The situation might be very different when we consider a collection of neurons performing functions which are analogous to the description of some plausible economic behaviour. Still, in this case each individual collection of neurons has to be ‘trained’ in order to perform a certain function. Whether the interaction of these trained neural networks will give rise to ‘invisible hand’-like results is a question that would have to be faced in specific contexts and according to the specific properties of the overall neural net.
Despite these critical points, the depth and lucidity of the presentation and the wide range of topics discussed make Kosko’s book a fascinating work that opens up a new perspective on the world of neural networks and fuzzy systems, arouses curiosity, and offers much intellectual stimulation. This is certainly a big achievement for an introductory book.
References
Amit, D.J., 1989, Modelling brain function (Cambridge University Press, Cambridge).
Arbib, M.A., 1987, Brains, machines, and mathematics (Springer Verlag, New York, NY).
Brouwer, L., 1923, On the significance of the principle of the excluded middle in mathematics, especially in function theory, Reprinted in van Heijenoort (1967).
de Finetti, B., 1970, Teoria delle probabilità (Einaudi, Torino).
van Heijenoort, 1967, From Frege to Gödel (Harvard College, Harvard).
Kosko, B., 1990, Fuzziness vs. probability, International Journal of General Systems 17, 211-240.
Kosko, B., 1991, Neural networks for signal processing (Prentice Hall, Englewood Cliffs, NJ).
Novak, V., 1989, Fuzzy sets and their applications (Adam Hilger, Philadelphia, PA).
Wittgenstein, L., 1922, Tractatus logico-philosophicus (Routledge & Kegan Paul, London).
Zadeh, L., 1965, Fuzzy sets, Information and Control 8, 338-353.