Neurocomputing 48 (2002) 741–762
www.elsevier.com/locate/neucom
Learning of associative prediction by experience

Andrzej Wichert∗

Department of Neural Information Processing, University of Ulm, Oberer Eselsberg, D-89069 Ulm, Germany

Received 25 January 2001; accepted 13 August 2001
Abstract

We introduce a neuronal model which learns associative prediction by experience during problem solving. The model uses picture representation rather than symbolic representation to perform problem solving. Consequently, the computational task corresponds to the manipulation of pictures. A computation is performed with the aid of associations by the transformation from an initial state represented as a picture to a desired state represented as a picture. Picture representation enables learning from examples through the definition of similarity between different problems. The solved problems are reused to speed up the search for related or similar problems. The model learns by experience of failures and successes through an associative memory in which pictorial sequences of states describing plans are stored. Learning with different strategies is demonstrated by empirical experiments in the block world. It is shown that, depending on the learning strategy, learning improves the behavior of the model in a significant manner. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Agents; Associative memory; Distributed representation; Learning; Production system; Problem solving
1. Introduction

1.1. Motivation

Currently, artificial neural networks are used in many different domains. But are neural networks also suitable for modeling problem solving and learning of an agent [8], a domain which is traditionally reserved for the symbolic approach?
∗ Tel.: +49-731-502-4257; fax: +49-731-502-4156.
E-mail address: [email protected] (A. Wichert).
0925-2312/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0925-2312(01)00671-3
This question is answered in this work. It is affirmed by a corresponding neural network model of an agent in a block world. The model is composed of neural associative memories. It has the same behavior as symbolic agents. However, learning from examples, resulting from the distributed representation of knowledge, is also possible. Different learning strategies with the aid of a neural associative memory are examined. It is shown by empirical experiments that learning leads to a significant algorithmic improvement.

1.2. Summary

Equivalent to a production system, our neural model forms a sequence of actions which leads to the solution of a problem. It is composed of two kinds of associative memories. The first represents the associations which describe the possible actions. The second learns by experience of failures and successes during problem solving, to speed up the search for related or similar problems. A new structured binary vector representation which allows the definition of similarity is introduced. Structuring is used by the first associative memory during recognition and execution of the associations. Similarity is needed by the second associative memory to perform the learning from experience. Three different learning strategies in the block world demonstrate how to learn from experience. Additionally, the problem of catastrophic interference is examined. Catastrophic interference [21,32,9] is present when newly learned information interferes radically with the previously stored information.

2. Symbolic problem solving by an agent

Problem solving by an agent can be modeled by a production system which implements a search algorithm [19]. Production system theory describes how to form a sequence of actions which leads to a goal, and offers a computational theory of how humans solve problems [2]. Production systems are composed of if-then rules which are also called productions. The complete set of productions constitutes the long-term memory. Productions are triggered by specific combinations of symbols which describe items. These items represent a state and are stored in short-term memory. A computation is performed with the aid of productions by the transformation from an initial state in the short-term memory to a desired state. By allowing backtracking [19] and the exclusion of loops, a search from the initial state to the desired state is executed. The search defines a state space; problems are solved by searching in this space, whose states include the initial situation and the desired situation [19]. A production system by itself does not adequately explain learning from experience, where the solved problems are reused to speed up the search for related or similar problems. The definition of similarity between different problems is very difficult. The difficulties arise because the states are represented by symbols, which are used to denote or refer to something other than themselves, namely to other things in the world. In this context symbols do not by themselves
represent any utilizable knowledge. They cannot be used for a definition of similarity criteria between themselves.

2.1. Learning search control knowledge

There are different kinds of learning laws which can help a problem solver to learn from past experience [6,25]. "Chunking" and "learning by explanation" are learning laws which were symbolically described, implemented, and integrated into a symbolic problem solver. Chunking is a learning method which is used in the SOAR system [18,27]. It operates by summarizing the information examined while processing a subgoal. If a state causes an impasse, a new rule is learned which avoids it. An impasse is present, for example, when no valid succeeding state exists. By the generalization of new rules, future impasses are avoided. Explanation-based learning can produce control knowledge by analyzing the trace of problem solving examples. The system explains why the choices were made, and the explanation identifies the relevant features of the example [26,24]. A strong body of domain knowledge should be present, as it is useful in explaining both the problem and the generalization of that explanation. After learning, a description is present which is a generalization of the example and which helps to control the search of other examples. Both methods have weaknesses: with continuing learning the storage and time resources grow, and a parallel implementation of the learning methods is difficult. Associative memory, which models human memory [30,7,10,34], eliminates those weaknesses [29].

3. Subsymbolic problem solving by an agent

In subsymbolic representation, items are represented by vectors rather than by symbols. Vectors which represent items define a space in which the similarity between themselves can be computed. Due to that, computation with vectors is a geometrical computation in space [1,11]. Vectorial representation is primarily used in the domain of pattern recognition rather than in the domain of problem solving. This work examines the use of vector or pattern-based representation and the resulting consequences in the domain of problem solving.

3.1. Associative prediction

To ease comprehension for the human reader, additional pictures like two-dimensional binary sketches are often used beside the symbolical representation (see Fig. 1). Pictures correspond to vectors. Vectors define a space in which the similarity between themselves can be computed. Sequences of vectors can be learned and recalled by the associative memory [15]. Günther Palm suggested the usage of the associative memory to associate a situation with a list of moves, including in the association the relative value of those moves [29]. This idea was realized in the domain of game playing. A heuristic
Fig. 1. Additional pictures like two-dimensional binary sketches are often used beside the symbolical representation.
for checkers and chess was learned with the aid of the associative memory [20,3]. In checkers, a game position represented the question. The succeeding position and its value computed by the MINIMAX algorithm formed the answer vector. The associative memory learned game states and their succeeding game states (with their relative values) from diverse example games. After training, the associative memory was used to retrieve the learned succeeding positions with their values during a new game. A retrieved succeeding position reduced the needed search, so that an improvement in the game playing was observed. The game's positions were coded by feature vectors, and a figure was coded by a "one" at a certain position. A vector was composed of vectors which indicated, for each position, the figure which occupied it. The numeric values resulting from the MINIMAX algorithm were coded by a "one" at certain positions. The analogous method was used for chess play [3]. It led to the reduction of the size of the search trees. Nils Nilsson [28] also suggests the usage of a neural network to predict the value of a state represented by a feature vector. After training, the prediction network can be used to compute the feature vectors that would result from various actions. "These in turn could be used as new inputs to the network to predict the feature vector two steps ahead, and so on" [28, p. 123]. A sequence of pictorial states describing a plan can be stored in an associative memory. The sequence commences with the initial state and ends with the penultimate state before the desired state. After the storage of some plans, a succeeding state of one stored plan can be recalled, given the current and the desired state. The current state together with the desired state represents the question, the next state the answer. A question vector is composed by concatenation of the pattern representations of the current state and the desired state. The answer vector corresponds to the pattern representation of the next state. Both vectors can be stored in the associative memory. From each current state together with the desired state a question vector is formed and stored together with the answer vector corresponding to the next state. After "learning", the sequence can be recalled.
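A sketch of this storage and recall scheme is given below. Here, the memory object stands for an associative memory with learn/retrieve operations (a concrete sketch of such a memory follows in Section 3.1.1); all names are assumptions of this illustration, not the original implementation.

import numpy as np

def store_plan(memory, plan_states, desired):
    # Each question vector is the current state concatenated with the desired
    # state; the answer vector is the pattern of the next state.
    for current, nxt in zip(plan_states, plan_states[1:]):
        memory.learn(np.concatenate([current, desired]), nxt)

def recall_plan(memory, initial, desired, max_steps=20):
    # Feedback loop: the retrieved answer becomes the state part of the next
    # question, while the desired-state part is held fixed.
    state, sequence = initial, []
    for _ in range(max_steps):
        state = memory.retrieve(np.concatenate([state, desired]))
        sequence.append(state)
    return sequence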
Fig. 2. The sequence begins with the question vector which is formed by the concatenation of the vectors representing the initial and the desired state. Via the feedback connections a new question vector is formed out of the concatenation of the answer vector and the vector representing the desired state. The corresponding answer vector is the next state of the sequence.
By posing the question vector which is composed of the initial state, holding the desired state part fixed, and by feeding back the answer vector into the question vector, the state sequence is determined (see Fig. 2 and [29]). Owing to the ability of the associative memory to determine the stored pattern most similar to a pattern that is not stored, similarity between different problems can be defined by the pictorial state representation. An associative memory in which sequences of states describing plans are stored is called the prediction associative memory.

3.1.1. Associative memory

Binary represented states can be represented by the associative memory. The associative memory [35,13] is composed of a cluster of units which represent a simple model of a real biological neuron. A unit is composed of weights which correspond to the synapses and dendrites of the real neuron. They are described by $w_{ij}$ in Fig. 3; $T$ is the threshold of the unit. Pairs of binary vectors are associated; this process of association is called learning. The first of the two vectors is called the question vector and the second, the answer vector. After learning, the question vector is presented to the associative memory and the corresponding answer vector is determined.

Learning and forgetting: In the initialization phase of the associative memory no information is stored. Because the information is represented in the weights, they are all initially set to zero. In the learning phase, binary vector pairs are associated. Let $\vec{x}$ be the question vector and $\vec{y}$ the answer vector, so that the learning rule for changing the weights $w_{ij}$ is

$$w_{ij}^{new} = w_{ij}^{old} + y_i x_j.$$

This rule is called the binary unclipped Hebb rule [29]. Every time a pair of binary vectors is stored this rule is used. Therefore, each weight $w_{ij}$ of the associative memory stores the frequency of the correlation between the components of the vectors.
Fig. 3. The associative memory is composed of a cluster of units.
This is done to ensure the capability to "forget" vectors which were once learned. In this case, the following binary anti-Hebb rule [37] is used:

$$w_{ij}^{new} = \begin{cases} w_{ij}^{old} - y_i x_j & \text{if } w_{ij}^{old} > 0, \\ w_{ij}^{old} & \text{if } w_{ij}^{old} = 0. \end{cases}$$

The anti-Hebb rule is used to prevent an overloading of the associative memory during learning from experience.

Retrieval: In the retrieval phase of the associative memory, a fault-tolerant answering mechanism recalls the appropriate answer vector for a question vector $\vec{x}$. For the presented question vector $\vec{x}$, the most similar learned question vector $\vec{x}_l$ regarding the Hamming distance is determined, and the appropriate answer vector $\vec{y}$ is identified. For the retrieval rule, the knowledge about the correlation of the components is sufficient; the knowledge about the frequency of the correlation is not used. The retrieval rule for the determination of the answer vector $\vec{y}$ is

$$y_i = \begin{cases} 1 & \text{if } \sum_{j=1}^{n} \varphi(w_{ij})\, x_j \geq T, \\ 0 & \text{otherwise} \end{cases}$$

with the function

$$\varphi(x) = \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{if } x = 0. \end{cases}$$

The threshold $T$ of the unit is set to the maximum sum

$$T := \max_i \sum_{j=1}^{n} \varphi(w_{ij})\, x_j.$$
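To make the rules of this subsection concrete, a minimal NumPy sketch of such a memory follows, covering the unclipped Hebb rule, the anti-Hebb rule, threshold retrieval, and the backward-projection reliability check described below. The class and method names are ours, not the original implementation.

import numpy as np

class BinaryAssociativeMemory:
    def __init__(self, n_question, n_answer):
        # Initially no information is stored: all weights are zero.
        self.W = np.zeros((n_answer, n_question), dtype=int)

    def learn(self, x, y):
        # Binary unclipped Hebb rule: w_ij <- w_ij + y_i * x_j; the weights
        # count the frequency of the correlation between components.
        self.W += np.outer(y, x)

    def forget(self, x, y):
        # Binary anti-Hebb rule: decrement only weights that are positive.
        self.W -= np.outer(y, x) * (self.W > 0)

    def retrieve(self, x):
        # Only the presence of a correlation is used, not its frequency;
        # the threshold is set to the maximum sum.
        sums = (self.W > 0).astype(int) @ x
        return (sums >= sums.max()).astype(int)

    def reliability(self, x, y):
        # Backward projection with the transposed matrix recovers the stored
        # question vector x_l; the smaller its Hamming distance to x, the
        # more reliable the answer y (here normalized to [0, 1]).
        sums = (self.W > 0).astype(int).T @ y
        x_l = (sums >= sums.max()).astype(int)
        return 1.0 - np.abs(x_l - x).sum() / len(x)

# Example: store one pair and recall it.
mem = BinaryAssociativeMemory(6, 4)
x = np.array([1, 0, 1, 0, 1, 0]); y = np.array([0, 1, 1, 0])
mem.learn(x, y)
assert (mem.retrieve(x) == y).all()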
Reliability of the answer: Once an answer vector is determined, it would be useful to know how reliable it is. Let $\vec{x}$ be the question vector and $\vec{y}$ the answer vector that was determined by the associative memory. First, the vector $\vec{x}_l$ which belongs to the vector $\vec{y}$ is determined. The vector $\vec{x}_l$ is determined by a backward projection of the vector $\vec{y}$: the synaptic matrix used in the backward projection is the transpose of the matrix $W$ which is used for the forward projection. In the second step, the similarity of the stored question vector $\vec{x}_l$ to the actually presented vector $\vec{x}$ is determined. The greater the similarity of the vector $\vec{x}_l$ to the vector $\vec{x}$, the more reliable the answer vector $\vec{y}$.

3.2. Distributed representation in the blocks world domain

The domain of the "colored" or "marked" blocks world was chosen to examine the learning of associative prediction because it is well known and extensively studied in the artificial intelligence community [28,5,4]. In this blocks world, the blocks differ by color, but not by form [38,8,33,36,19]. In this example, blocks can be placed in three different positions and picked up and set down by a robot arm. There are three different types of blocks. They differ by attributes such as color (red, green, blue) or marks, but not by form. In AI they are traditionally called A, B, C blocks [19]. In our binary picture representation the A, B, C marks correspond to the marks at the corner of the contour representing the blocks (see Fig. 4). Both an empty robot arm, which is represented by the right corner, and a "clear" position are represented by a dot. Sixty different states are possible in this blocks world example.
Fig. 4. A state in the marked blocks world.
Fig. 5. A cognitive entity.
3.2.1. Structured distributed representation

The computational task concerning problem solving corresponds to the manipulation of pictures, but how can this be done? A structured state representation by pictures is needed, so that objects in the picture can be manipulated. Gross and Mishkin [12] suggest that the brain includes two mechanisms for visual categorization: one for the representation of the object and the other for the representation of the localization [16,31]. The first mechanism is called the "what" pathway and is located in the temporal lobe. The second mechanism is called the "where" pathway and is located in the parietal lobe [16,31]. According to this division, the identity of a visual object can be coded apart from the location and the size of the object. A visual state represented by a vector can also be represented by meaningful pieces of the vector [14,1]. Pieces which represent objects of the scene are called cognitive entities. Each cognitive entity represents the identity of the object and its position by coordinates. The identity of an object is represented by a binary pattern which is normalized for size and orientation. Its location corresponding to the abscissa is represented by a binary vector of the dimension of the abscissa of the picture representing the state. The location corresponding to the ordinate is likewise represented by a binary vector of the dimension of the ordinate of the picture representing the state. In each of those vectors, a binary bar of the size and position of the object in the picture of the state represents the location and size. A block world state can be represented by cognitive entities (see Figs. 5 and 6).
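As an illustration, a cognitive entity can be assembled by concatenating the identity pattern with the two location bars. The picture dimensions and all names below are assumptions of this sketch.

import numpy as np

def cognitive_entity(identity, x_pos, x_size, y_pos, y_size, width, height):
    # "What" field: binary identity pattern, normalized for size and orientation.
    # "Where" fields: binary bars encoding location and size along each axis.
    x_field = np.zeros(width, dtype=int)
    x_field[x_pos:x_pos + x_size] = 1
    y_field = np.zeros(height, dtype=int)
    y_field[y_pos:y_pos + y_size] = 1
    return np.concatenate([identity, x_field, y_field])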
Fig. 6. The block world state (see Fig. 4) represented by seven cognitive entities.
3.2.2. Associations

Cognitive entities can represent associations which represent transitions between states. The first pattern, represented by cognitive entities, describes the state which should be present before the transition (the premise). The second pattern describes the world state after the transition (the conclusion). In order to preserve the equal number of cognitive entities in the premise and in the conclusion pattern, a notation for an empty cognitive entity is used (see Fig. 7). In Fig. 7, an example from the block world is shown. Both an empty robot arm, which is represented by the right corner, and a "clear" position are represented by a dot. The cognitive entities of the premise pattern are replaced by the conclusion pattern in case the similarity between the premise pattern and the corresponding part of the state picture is sufficient. In our "ABC" blocks world example the possible moves are described by 55 associations.
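A minimal data-structure sketch of such an association follows; the empty cognitive entity preserves the equal length of premise and conclusion, and all names are ours.

from dataclasses import dataclass
from typing import Optional, Tuple
import numpy as np

Entity = Optional[np.ndarray]   # None marks an empty cognitive entity

@dataclass
class Association:
    premise: Tuple[Entity, ...]     # cognitive entities required before the move
    conclusion: Tuple[Entity, ...]  # cognitive entities present after the move

    def inverse(self) -> "Association":
        # In the inverse association, premise and conclusion are interchanged
        # (see the caption of Fig. 7).
        return Association(self.conclusion, self.premise)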
Fig. 7. Representation of the association: if block "A" is at a certain position, the position above it is clear, and the gripper is empty, then the block "A" is grasped by the gripper. The old position of the block is marked as clear, avoiding the frame problem [19]. One cognitive entity of the conclusion pattern is not used. In the inverse association, the premise pattern is interchanged with the conclusion pattern.
3.3. Associative computation

The states correspond to binary pictures represented by cognitive entities. A computation is performed with the aid of associations by the transformation from an initial state into a desired state. Associations can be represented by the associative memory. A cognitive entity is represented by a binary vector formed by the concatenation of the three binary vectors which represent the three associative fields (see Fig. 5). A premise or conclusion is represented by a binary vector formed by the concatenation of the three binary vectors which represent the cognitive entities (see Fig. 7).

3.3.1. State transitions

A state is represented by cognitive entities. Associations represent transitions between the states representing pictures. The premise of an association is represented by cognitive entities which describe a correlation of objects which should be present (see Figs. 8 and 7). If present, they are replaced by the cognitive entities of the conclusion. In the blocks world example, the number of cognitive entities of a state is $n = 7$ and of a premise $k = 3$. Generally, the premise is described by fewer cognitive entities than the state, $k \le n$.
In the recognition phase, all possible $k$-permutations of the $n$ cognitive entities should be composed to test whether the premise of an association is valid:

$$P(n,k) = \frac{n!}{(n-k)!}.$$

This is done because the premise can describe any correlation between the cognitive entities. In the retrieval phase after learning, $P(n,k)$ permutations are formed. Each permutation represents a question vector $\vec{x}_i$, $i \in \{1, \dots, P(n,k)\}$. For each question vector $\vec{x}_i$ an answer vector $\vec{y}_i$ with its quality criterion is determined. If the reliability value of this answer vector is above a certain threshold, the association can be executed.
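In code, the recognition phase might be sketched as follows, reusing the hypothetical BinaryAssociativeMemory of Section 3.1.1; the threshold value and all helper names are assumptions.

import itertools
import numpy as np

def recognize(perm_memory, entities, k=3, threshold=0.9):
    # Test every k-permutation of the state's cognitive entities as a
    # candidate premise; P(n, k) question vectors are formed.
    matches = []
    for perm in itertools.permutations(entities, k):
        x = np.concatenate(perm)                        # question vector
        y = perm_memory.retrieve(x)                     # candidate conclusion pattern
        if perm_memory.reliability(x, y) >= threshold:  # backward-projection check
            matches.append((perm, y))                   # this association may fire
    return matches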
Fig. 8. A copy of the state representation is formed and the corresponding cognitive entities are replaced by the conclusion pattern. Objects are represented for simplicity by geometrical figures. (n is seven and k is three.)
A copy of the state representation is formed and the corresponding cognitive entities are replaced by the conclusion pattern (see Fig. 8). The associative memory which performs this task is called the permutation associative memory. Given a state, if the permutation associative memory recognizes $s$ question vectors, $s$ copies of the state representation are formed. In the block world example this would correspond, for example, to grasping from different positions by the gripper. The cognitive entities which form the question vectors (the premise) are replaced by the cognitive entities of the answer vectors (see Fig. 8).

3.3.2. Search and associative prediction

The state space is represented by a chain of units in which values are propagated by local spreading activation. After the temporary parallel execution of a chosen state, $s$ new states emerge. From the $s_t$ states, one state is chosen and the new $s_{t+1}$ states are determined. A state can cause an impasse when no valid transition to a succeeding state exists. In this case, backtracking to the previous state is performed. Another state can be chosen, if possible, or backtracking is repeated. The resulting search strategy is the depth-first search strategy. After learning sequences of states describing plans, associative prediction can be performed. The permutation associative memory determines the $s$ succeeding states $b(i)$ of a chosen state. At the same time, the prediction associative memory determines, for this chosen state together with the desired state, the answer pattern Pred. The state $b(i)$ whose pattern is most similar to Pred is chosen (see Fig. 9) and the search is continued. During the determination of similarity the contours of the blocks are hidden; only the marks at the corners are considered (see Fig. 10). At the beginning of a learning session the prediction memory is empty. The associative prediction is improved during learning. After a solution of a problem was found using the associative prediction which guides the search, it is stored in the prediction associative memory. The prediction associative memory "learns" by
Fig. 9. The permutation associative memory determines the s succeeding states b(i) of a chosen state (in this example s = 4). At the same time, the prediction associative memory determines the answer pattern Pred for this chosen state together with the desired state. The state b(i) which is most similar to Pred is chosen.
Fig. 10. The contour of the block is hidden.
experience of failures and successes in the presented examples. The resulting search strategy is the hill climbing search strategy [38]. The first example is learned, and simultaneously the associative prediction is done by the prediction associative memory. During the next example the knowledge which was learned from the previous example is reused. This procedure is repeated with the remaining examples. To prevent an overloading of the prediction associative memory, states which lead to an impasse state, and the impasse state itself, are forgotten by the anti-Hebb rule. The sequence of states describing the plan is learned and the incorrect state sequences are forgotten. But how should one learn in a domain from experience? In which order should the examples be presented? Is there a difference between a fixed and a random order of presentation?
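A sketch of the prediction-guided choice and of the anti-Hebb forgetting of impasse states follows. The similarity function, which would hide the block contours as in Fig. 10, is passed in as a parameter; all names are assumptions.

import numpy as np

def choose_next_state(pred_memory, successors, current, desired, similarity):
    # The prediction memory answers with the pattern Pred for the question
    # "current state ++ desired state"; hill climbing picks the successor
    # b(i) most similar to Pred.
    pred = pred_memory.retrieve(np.concatenate([current, desired]))
    return max(successors, key=lambda b: similarity(b, pred))

def forget_impasse(pred_memory, path_states, impasse, desired):
    # States which lead to an impasse, and the impasse itself, are forgotten
    # by the anti-Hebb rule to prevent overloading of the memory.
    states = path_states + [impasse]
    for state, nxt in zip(states, states[1:]):
        pred_memory.forget(np.concatenate([state, desired]), nxt)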
3.4. Experiments

To clarify the previous questions, experiments concerning learning by experience of how to build a CBA tower (Fig. 11) were performed. Fifty-nine different initial states are possible in our blocks world. Of the 59 states, 34 initial states were chosen and the remaining 25 states were not considered (Fig. 12). There were two reasons for not including a state. First, some of the states represent trivial problems, for example only one move from the desired state. The other reason was that some states were already nearly described by other states, for example different combinations of towers in the middle position versus different combinations of towers in the left position.
Fig. 11. The desired state, CBA tower.
Fig. 12. The first six of the 34 different initial states.
Fig. 13. Required steps to build the CBA tower during learning. The learning procedure converges to stabilization after the tenth learning interval.
3.4.1. Learning from experience in a fixed order and the catastrophic interference

The prediction associative memory "learned" by experience of failures and successes in the 34 examples. The first example is learned, and simultaneously the associative prediction is done by the prediction associative memory. During the next example the knowledge which was learned from the previous example is reused. This procedure is repeated with the remaining examples. The computer resources during the experiments were restricted to a quantity of 600 steps. At the beginning of the learning the prediction memory was empty. A stage consists of the tasks I1 → D1, I2 → D1, ..., I34 → D1 in a fixed order. The stages are repeated sequentially another 15 times. A three-dimensional bar chart illustrates the development of learning from experience (see Fig. 13). At the third time interval, the third stage, a strong interference of the task I17 → D1 with other tasks occurred (see Fig. 13). This is an example of the problem called catastrophic interference [21,32,9]. Since the associative memory stores patterns in a single set of weights, when new patterns are learned, the new information may radically interfere with previously stored patterns. Despite the fixed arrangement of the tasks, the interference of the task I17 → D1 could be eliminated at time interval 10. The elimination of the catastrophic interference is linked with the forgetting of the corresponding patterns by the anti-Hebb rule, as seen in Fig. 14. The results are not persuasive, despite the significant improvement over blind search (see Table 1). Is an improvement of the results possible? Could we overcome catastrophic interference by the detachment of particular solutions?
Fig. 14. The number of synapses of the prediction associative memory which are not zero during learning. The learning procedure converges to stabilization after the tenth learning interval. After learning, 167,971 of 1.991 × 10^8 synapses are not zero (0.0844%).

Table 1
Comparison of blind search versus associative prediction, fixed order of presented stages (a)

                            Blind search    After learning, fixed order    p
Mean steps                  62              36.88                          0.0019
Mean plan length            37.53           29.29                          0.00001
Mean backtracking steps     12.24           3.79                           0.000012

(a) The p values were determined by the paired sample t-test. Significant for p < 0.05 by convention.
3.5. Context information

A problem can have diverse solutions. The same state in different sequences can lead to different following states despite the same desired state. These different states are dependent on the initial state. This case can only occur when the solutions are not optimal. Nevertheless, appending the initial state to the question vector could lead to an improvement of learning from experience. The improvement could result from the detachment of particular solutions; the catastrophic interference could be avoided. This kind of prediction is called the "context prediction". The prediction associative memory learns and is used as before, however, this time with the addition of the context. The question vectors are composed by concatenation of the pattern representations of the state, the initial state, and the desired state. The answer vectors correspond to the patterns of the next states (see Fig. 15). At the second time interval, the second stage, there is strong interference with other tasks (see Fig. 16); it disappears in the succeeding step. The improvement comes along with the reduction of the weights of the prediction associative memory (see Fig. 17). Context information nearly resolved the problem of the interference (see Table 2). But still the question remains: can we prevent the interference altogether? Could the order of the presented tasks play an important role in the learning of associative prediction from experience?
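The only change for context prediction is the composition of the question vector; a one-line sketch (names assumed):

import numpy as np

def context_question(state, initial, desired):
    # Context prediction: the question vector additionally carries the
    # initial state, detaching the solutions of different problems.
    return np.concatenate([state, initial, desired])

# As before, the answer vector is the pattern of the next state:
# pred_memory.learn(context_question(s, s0, goal), next_state)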
Fig. 15. The question vectors are composed by concatenation of the pattern representations of the state, the initial state, and the desired state.
Fig. 16. Required steps to build the CBA tower with context during learning. The learning procedure converges to stabilization after the seventh learning interval.
3.6. Generate and test

The assumption that the order of the presented examples could play an important role in the learning of associative prediction from experience is examined in the following experiment. In this experiment there are no stages or fixed orders of presentation. A task consists of building a CBA tower starting from a random
Fig. 17. The number of synapses which are not zero during learning. The learning procedure converges to stabilization after the seventh learning interval. After learning, 308,169 of 2.988 × 10^8 synapses are not zero (0.1031%).
Table 2
Comparison of blind search versus associative prediction, fixed order of presented stages with context

                            Blind search    After learning, with context    p
Mean steps                  62              33.18                           9.93 × 10^-6
Mean plan length            37.53           27.24                           0.0016
Mean backtracking steps     12.24           2.97                            9.55 × 10^-7
state, which is chosen from the 34 different initial states. The tasks are repeated 340 times, whereby each time another initial state is randomly determined:

$$I_t \to D1, \quad I_t \in \{I1, \dots, I34\}, \quad t \in \{1, \dots, 340\}.$$

At the beginning of the learning the prediction memory is empty; the first example of a task is learned, and simultaneously the associative prediction is performed. During the next example the knowledge which was learned from the previous example is reused. This procedure is repeated with the remaining examples, 340 times in total. After learning of the 340 examples, the weights of the prediction associative memory were frozen, and its quality is determined by the mean values of the required steps and the backtracking steps for the 34 possible different tasks. This experiment is repeated 20 times, whereby at the beginning of each procedure the prediction memory is empty. For each procedure different random sequences are used; the results are shown in Table 3. A division of the learning quality into perfectly learned, learned, badly learned, and not learned has been carried out, depending on the performed steps and backtracking steps. The prediction associative memory learned perfectly from experience to build a CBA tower starting from the 34 different initial states if the number of the required backtracking steps was zero or nearly zero. This is because in the learning strategy backtracking steps should be avoided. Fig. 18 shows the best possible solution of the task "unstack the ABC tower and stack the CBA tower" after such perfect learning.
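The overall procedure can be summarized by the following sketch. The training and evaluation routines are passed in as parameters, since they stand for the experiment machinery described above; all names are assumptions.

import copy
import random

def generate_and_test(make_empty_memory, run_task, evaluate, initial_states,
                      n_runs=20, n_tasks=340):
    # Repeated random sequences of examples are learned; after each run the
    # quality of the model is determined and the best weights are kept.
    best_memory, best_quality = None, float("-inf")
    for _ in range(n_runs):
        memory = make_empty_memory()              # prediction memory starts empty
        for _ in range(n_tasks):                  # 340 tasks in random order
            run_task(memory, random.choice(initial_states))
        quality = evaluate(memory)                # frozen weights, e.g. negative mean steps
        if quality > best_quality:                # secure the best-quality weights
            best_memory, best_quality = copy.deepcopy(memory), quality
    return best_memory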
Table 3
Mean steps and mean backtracking steps of the 20 experiments. The division into perfectly learned, learned, badly learned, and not learned is performed depending on the performed backtracking steps

                    Steps      Backtracking
Perfectly learned   15.5882    0.117647
                    15.6471    0
                    15.9412    0
                    16.2941    0
                    16.5294    0
                    17.4706    0.117647
                    17.8235    0.117647
                    18.2941    0
Learned             28.9412    2.44118
                    30.8235    3.29412
                    33.1765    3.29412
                    34.7647    3.82353
                    36.0000    4.97059
Badly learned       41.8824    6.73529
                    43.2353    6.76471
                    43.8235    8.73529
                    45.8235    7.91176
                    47.1765    10.9118
                    48.0588    8.88235
Not learned         53.7059    13.0882
Obviously the order of the presented tasks plays an important role in the learning of associative prediction from experience. With the right order the catastrophic interference [21,32,9] problem can be prevented. With the wrong order, the prediction associative memory cannot recover from the catastrophic interference, despite the anti-Hebb rule. The heuristic procedure used here is the generate and test method [38]: a procedure to find the right order of examples. Repeated random sequences of examples are learned. After each experiment the quality of the learned model is determined and the weights of the prediction associative memory are saved. After a sufficient number of experiments, the weights corresponding to the best quality of the model are chosen.

4. Conclusion

4.1. Implications

The division of the behavior of the model into basic behavior and additional property behavior resulting from the pattern and prediction heuristic corresponds
Fig. 18. Planning of the task unstack the ABC tower and stack the CBA tower. The best possible sequence of moves was found after perfect learning. Initial state ABC tower, desired state CBA tower. Planning sequence is shown line by line from left to right. Fourteen steps were needed.
to Michalski's two-tiered philosophy of concept meaning [22,23,17]. The basic behavior of the model corresponds to the behavior of a symbolic production system which performs a depth-first search strategy. It differs significantly from the symbolic approach by learning using the associative memory and distributed representation. It was shown how learning from experience is performed. Many states which differ only slightly describe different problems. Despite this fact, the prediction associative memory extracts the relevant knowledge. Due to this, many learning sequences are needed until the stabilization of the behavior of the model. The order of the presented tasks plays an important role in the learning of associative prediction from experience. With the right order the catastrophic interference problem can be prevented.
Table 4
Thirty-four examples were presented in the block world (a)

Examples    dfs    fo        foc       gt
34          62     40.52%    46.48%    74.87%

(a) The depth-first search strategy (dfs), which corresponds to the behavior of the symbolic problem solver, needed on average 62 steps. Learning with fixed order (fo) brought a significant improvement of 40.52%. Learning with fixed order and context (foc) brought a significant improvement of 46.48%. The best result of the generate and test (gt) procedure brought a significant improvement of 74.87%.
4.2. Power analysis

The characteristics of the introduced model, which is composed of two associative memories, are summarized by the following features:
• Representation: allows the structured representation and the definition of similarity.
• Permutation associative memory: shows how to store and execute associations which describe actions in the block world.
• Prediction associative memory: learns from experience to speed up the search significantly.
• Catastrophic interference: can occur during the learning of the prediction associative memory. This problem can be weakened by context.
• Generate and test [38]: a heuristic procedure which finds the right order of examples. With the right order the catastrophic interference problem can be prevented.
4.3. Results

Learning from experience results from the distributed representation. It was shown by experiments that learning from experience needs significantly fewer steps than the symbolic depth-first search strategy (see Table 4).

4.4. Epilogue

The computational task concerning problem solving in our model corresponds to the manipulation of pictures. A problem is described by the associations in the long-term memory, which is represented by the permutation associative memory, by the initial state, and by the desired state. The solution to the problem is represented by a chain of associations which successively change the state from the initial state to the desired state. The basic behavior of our model corresponds to the behavior of a symbolic production system which performs a depth-first search strategy. However, the representation of states through pictures enables the access of knowledge which was formed by learned experience during problem solving. The learned knowledge speeds up the search for related problems significantly, but only when the examples are presented in the right order during learning. Context can soften the
problem of the catastrophic interference a little, but catastrophic interference can only be prevented by the right order of the presented examples.

Acknowledgements

The author would like to gratefully acknowledge the reviewers of Neurocomputing for their valuable suggestions.

References

[1] J.A. Anderson, An Introduction to Neural Networks, The MIT Press, Cambridge, MA, 1995.
[2] J.R. Anderson, Cognitive Psychology and its Implications, 4th Edition, W.H. Freeman and Company, San Francisco, CA, 1995.
[3] G. Andreadakis, Neuronale Assoziativspeicher als plausible Zuggeneratoren, Master's Thesis, Universität Ulm, Ulm, Germany, 1994.
[4] A. Bieszczad, Neurosolver: a neural network based on a cortical column, in: Proceedings of the World Congress on Neural Networks WCNN'94, San Diego, CA, INNS Press, Lawrence Erlbaum Associates, London, 1994, pp. 756–761.
[5] A. Bieszczad, Neuromorphic distributed general problem solvers, Ph.D. Thesis, Carleton University, Ottawa, Ont., Canada, 1996.
[6] L. Bolc, G. Bradshaw, P. Langley, R. Michalski, S. Ohlsson, L. Rendell, H. Simon, J. Wolf (Eds.), Computational Models of Learning, Springer, Berlin, 1987.
[7] P.S. Churchland, T.J. Sejnowski, The Computational Brain, The MIT Press, Cambridge, MA, 1994.
[8] J. Ferber, Les Systèmes Multi-Agents: Vers une intelligence collective, InterEditions, Paris, 1995.
[9] R.M. French, Catastrophic forgetting in connectionist networks, Trends Cognitive Sci. 3 (4) (1999) 128–135.
[10] J. Fuster, Memory in the Cerebral Cortex, The MIT Press, Cambridge, MA.
[11] P. Gärdenfors, Conceptual Spaces, the Geometry of Thought, The MIT Press, Cambridge, MA, 2000.
[12] C. Gross, M. Mishkin, The neural basis of stimulus equivalence across retinal translation, in: S. Harnad, R. Dorty, J. Jaynes, L. Goldstein, G. Krauthamer (Eds.), Lateralization in the Nervous System, Academic Press, New York, 1977.
[13] R. Hecht-Nielsen, Neurocomputing, Addison-Wesley, Reading, MA, 1989.
[14] W. James, Psychology, the Briefer Course, University of Notre Dame Press, Notre Dame, Indiana, 1985 (originally published in 1892).
[15] T. Kohonen, Self-Organization and Associative Memory, 3rd Edition, Springer, Berlin, 1989.
[16] S.M. Kosslyn, Image and Brain, The Resolution of the Imagery Debate, The MIT Press, Cambridge, MA, 1994.
[17] M. Kubat, Second tier for decision trees, in: Proceedings of the 13th International Conference on Machine Learning, ICML'96, Morgan Kaufmann, Los Altos, CA, 1996.
[18] J.F. Laird, A. Newell, P.S. Rosenbloom, SOAR: an architecture for general intelligence, Artif. Intell. 40, 1987.
[19] G.F. Luger, W.A. Stubblefield, Artificial Intelligence, Structures and Strategies for Complex Problem Solving, 3rd Edition, Addison-Wesley, Reading, MA, 1998.
[20] M. Marcinowski, Codierungsprobleme beim Assoziativen Speichern, Master's Thesis, Fakultät für Physik der Eberhard-Karls-Universität Tübingen, 1987.
[21] M. McCloskey, N. Cohen, Catastrophic interference in connectionist networks: the sequential learning problem, Psychol. Learning Motivation 24 (1989) 109–165.
[22] R. Michalski, How to learn imprecise concepts: a method employing a two-tiered knowledge representation for learning, in: Proceedings of the Fourth International Workshop on Machine Learning, Irvine, CA, 1987, pp. 50–58.
[23] R. Michalski, Two-tiered concept meaning, inferential matching and conceptual cohesiveness, in: S. Vosniadou, A. Ortony (Eds.), Similarity and Analogy, Cambridge University Press, Cambridge, 1989.
[24] S. Minton, Learning Search Control Knowledge: An Explanation-Based Approach, Kluwer Academic Publishers, Boston, 1988.
[25] T. Mitchell, Machine Learning, McGraw-Hill, New York, 1997.
[26] T. Mitchell, R. Keller, S. Kedar-Cabelli, Explanation-based generalization: a unifying view, Machine Learning 1 (1) (1986) 47–80.
[27] A. Newell, Unified Theories of Cognition, Harvard University Press, Cambridge, MA, 1990.
[28] N.J. Nilsson, Artificial Intelligence: A New Synthesis, Morgan Kaufmann, Los Altos, CA, 1998.
[29] G. Palm, Neural Assemblies, an Alternative Approach to Artificial Intelligence, Springer, Berlin, 1982.
[30] G. Palm, Assoziatives Gedächtnis und Gehirntheorie, in: Gehirn und Kognition, Spektrum der Wissenschaft, 1990, pp. 164–174.
[31] M.I. Posner, M.E. Raichle, Images of Mind, Scientific American Library, New York, 1994.
[32] R. Ratcliff, Connectionist models of recognition memory: constraints imposed by learning and forgetting functions, Psychol. Rev. 97 (1990) 285–308.
[33] S.J. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, Prentice-Hall, Englewood Cliffs, NJ, 1995.
[34] L.R. Squire, E.R. Kandel, Memory: From Mind to Molecules, Scientific American Library, New York, 1999.
[35] K. Steinbuch, Die Lernmatrix, Kybernetik 1 (1961) 36–45.
[36] D.G. Stork, Scientist on the set: an interview with Marvin Minsky, in: D.G. Stork (Ed.), HAL's Legacy: 2001's Computer as Dream and Reality, The MIT Press, Cambridge, MA, 1997 (Chapter 2).
[37] J. van Hemmen, Hebbian learning and unlearning, in: W. Theumann (Ed.), Neural Networks and Spin Glasses, World Scientific, Singapore, 1990.
[38] P.H. Winston, Artificial Intelligence, 3rd Edition, Addison-Wesley, Reading, MA, 1992.

Andrzej M. Wichert studied computer science at the University of Saarland, where he graduated in 1993. Afterwards, he was a Ph.D. student at the Department of Neural Information Processing, University of Ulm. He received a Ph.D. in computer science in 2000. He is now a researcher in an interdisciplinary group of the Department of Psychiatry III and the Department of Neural Information Processing, University of Ulm.