BioSystems 35 (1995) 219-222

The information generated in a man-to-man game called "Renju (Go-bang)"

Takashi Nakamura*a, Yukio-Pegio Gunji b

a Graduate School of Science and Technology, Kobe University, Nada, Kobe 657, Japan
b Department of Earth-Planetary Science, Faculty of Science, Kobe University, Nada, Kobe 657, Japan
Abstract
A learning experiment was designed using "Renju (Go-bang)". The matches could proceed with prior indefiniteness, distinct from probability, as under a finite VOP (velocity of observation propagation). The information of each situation in the first game, S_SG(i), and in the replay, S_RP(i), was investigated with a basic strategy derived from the "Renju" rules. The behavior of the difference (S_RP(i) - S_SG(i)) suggested that prior indefiniteness turned into definiteness, that perpetual decision change occurred, and that this perpetually allowed observers to construct higher levels of hierarchical learning logic.

Keywords: Games; Learning; Observation
1. Introduction
We designed a learning experiment using "Renju (Go-bang)" ("Renju"), one of the most familiar games, like Chess, Othello or Go, and based the experiment on the dependency of the driving force of learning on the traced past orbit (Nakamura and Gunji, 1993, 1994). We defined the generated information as the appearance of moves that was previously indefinite. Our aim was to estimate and quantify the degree to which decision making proceeds at a finite
* Corresponding author.
0303-2647/95/$09.50 © 1995 Elsevier Science Ireland Ltd. All rights reserved.
SSDI 0303-2647(94)01518-C
VOP (velocity of observation propagation) (Matsuno, 1992). We further discuss the information hidden in decision making. The number of elementary events is too enormous for us to count when we play "Renju" and decide on a move. Players have to make their moves without evaluation and/or enumeration of all possible orbits, and so make moves by wild guess. Their decision goes with the indefiniteness resulting from the fact that decision making is realized at a finite VOP (Gunji, 1993). Because one cannot prescribe a definite distribution of possible moves, the indefiniteness appearing in "Renju" is not expressed with probability (Gunji et al., 1993). The features of "Renju" are
(1) the proceeding of the game is completely open to both players and external observers; (2) no probabilistic choice (e.g. the use of dice) intervenes in the game; and (3) all possible moves can be predicted in their own right, but they are too numerous to enumerate in finite time intervals.

The experimental recipe is: (1) Two players play one match to the end. This match is called the Sample Game (SG). The SG terminal step is the Nth step. (2) The same players replay matches starting from the 1st SG situation K times. Next, they replay starting from the 2nd SG situation K times. Proceeding similarly, they replay K times starting from the ith SG situation (K >= 1, 1 <= i <= N - 1). In this work K = 1.

In almost all cases of the experiment, the SG matches ended usually in about 20 steps and at most in 60 steps. The progress was irrelevant to the boundary of the board; the games proceeded whether or not there was a boundary. The matches proceeded with the possibility that they could continue infinitely, so the number of orbits could be approximated as infinite. Even if someone could evaluate all possible moves before making a move, neither the players nor external observers could ever evaluate 'all' of the orbits. The decision for making moves was accompanied by indefiniteness beforehand, since a person could neither identify the number of all orbits nor trace them. The prior indefiniteness turned into posterior definiteness after the moves were made. So the prior indefiniteness and learning (hierarchy) supplemented each other.

2. Information in the sample game and the replay

The information amount of the ith situation in SG, S_SG(i), and in the replay, S_RP(i), is expressed as

S_SG(i) = sum_{j=0}^{m} log2 B(i + j),
S_RP(i) = sum_{j=0}^{m} log2 D(i + j),

where B(i + j) and D(i + j) denote the number of branches in the (i + j)th situation in SG and in the
replay starting from the ith situation of SG, respectively. S_SG(i) and S_RP(i) approximately denote the information of the ith situation needed to realize the (i + m + 1)th situation before and after SG, respectively. The analysis was done at m = 3.

B or D is counted with a basic strategy. After a player makes the xth move, the priority of the (x + 1)th move is as follows: (1) if the player does not make a defensive move in the xth step against the (x - 1)th move which forms '4-ren', the playmate makes a move to form '5-ren' in the (x + 1)th step (the number of branches is always one); (2) if the player made '4-ren' in the xth step, the playmate makes a defensive move against it (the number of branches is one); (3) if the player does not make a defensive move in the xth step against the (x - 1)th move which forms 'Misete' or '3-ren', the playmate forms a '4-3 formation' or '4-ren' (the number is one or two); (4) if the player made '3-ren' or 'Misete' in the xth step, the playmate makes a defensive move against it (the number is 2 to 5); (5) if some points are available to make '3-ren', the playmate takes a point from them (the number depends on the situation, but it is not so large, since they are made invalid one after another); (6) if some points are available to make '2-ren', the playmate takes one of the points (the number depends on the situation and is usually more than 8). The number for the first condition in the strategy that applies is adopted as the number of branches for the next step. The strategy is unspecified but necessary if the players want to win. Such a strategy (or an approximate way to identify the number of branches) can be found in other games.

The upper figures of Fig. 1 show three examples of the variation of S_SG(i) through i, where i is the time step in SG (1 <= i <= N). In the first stage of SG (i = 2 to 6), S_SG is large, since the points to make '3-ren' do not yet appear and the number of branches is counted under Condition 6 of the basic strategy.
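The priority ordering above can be sketched as a first-match-wins rule. This is a minimal illustration, not the authors' implementation: the fields of `Situation` are hypothetical placeholders standing in for real board inspection, and only the priority ordering and the branch counts follow the six conditions in the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Situation:
    # Hypothetical summary of a board position; each flag says whether
    # the corresponding strategy condition applies.
    undefended_four: bool = False        # condition 1
    opponent_four: bool = False          # condition 2
    undefended_three: bool = False       # condition 3
    extension_moves: int = 1             # one or two, under condition 3
    opponent_three: bool = False         # condition 4
    defence_moves: int = 2               # two to five, under condition 4
    three_ren_points: List[Tuple[int, int]] = field(default_factory=list)
    two_ren_points: List[Tuple[int, int]] = field(default_factory=list)

def count_branches(s: Situation) -> int:
    """Number of candidate moves for the next step: the first strategy
    condition that applies fixes the branch count."""
    if s.undefended_four:            # 1: complete '5-ren'
        return 1
    if s.opponent_four:              # 2: forced defence
        return 1
    if s.undefended_three:           # 3: form '4-3' or '4-ren'
        return s.extension_moves
    if s.opponent_three:             # 4: defend '3-ren'/'Misete'
        return s.defence_moves
    if s.three_ren_points:           # 5: take a '3-ren' point
        return len(s.three_ren_points)
    return len(s.two_ren_points)     # 6: take a '2-ren' point
```

The count returned for the earliest applicable condition is what enters B(i) or D(i) for the next step.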
In the middle stage, S_SG becomes a little smaller, since it is counted under Condition 4, 5 or 6. In the final stage (after the step with the right broken line) in SG, S_SG is saturated since the
Fig. 1. Three examples of S_SG(i), S_RP(i) and S_RP(i) - S_SG(i). The horizontal axis is the time step of SG, and N is the terminal step. The right broken line denotes the fatal move step in SG and the left broken line was drawn three steps before the right one. One peak emerges a few steps before the left line and the other is within the two lines.
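A minimal sketch of the information measure, assuming (as the text indicates) that the information at each step is the base-2 logarithm of the branch count, summed over the window i to i + m with m = 3 as in the analysis. The branch-count sequences below are illustrative only, not data from the paper.

```python
from math import log2

def information(branches, i, m=3):
    """S(i) = sum over j = 0..m of log2(branches[i + j]), 1-based steps."""
    return sum(log2(branches[i + j - 1]) for j in range(m + 1))

# Illustrative branch counts only (many early branches, forced late moves):
B = [9, 8, 8, 5, 4, 3, 2, 1, 1, 1]   # sample game, B(i)
D = [9, 8, 8, 6, 5, 4, 3, 2, 2, 1]   # replay,      D(i)

m = 3
for i in range(1, len(B) - m + 1):
    s_sg = information(B, i, m)
    s_rp = information(D, i, m)
    print(i, round(s_sg, 2), round(s_rp, 2), round(s_rp - s_sg, 2))
```

With such sequences the difference S_RP(i) - S_SG(i) grows as the window begins to cover the forced steps of SG, mirroring the positive values discussed in Section 3.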
number of branches is so small under Condition 1, 2 or 3. In almost all of the matches we see the tendency that S_SG is large in the first stage, not so large in the middle stage and saturated in the final stage. The middle figures of Fig. 1 show the variation of S_RP(i), which is estimated from the replay of the SG used in the upper figures. The tendency is approximately the same as for S_SG(i). The lower figures of Fig. 1 show (S_RP(i) - S_SG(i)) through i. In the experiment, 11 examples of SG and the replay were obtained, and in almost all cases we can see a similar tendency.

3. Discussion

There were two remarkable points for S_RP(i) - S_SG(i). One was the large increase and decrease in the positive values within the time steps bounded by the two broken lines. The other was the existence of large peaks a few time steps before the left broken line's time step, indicated by arrows. The right broken lines show the time step in SG when the SG winner made so fatal a move that the loser had to continue to defend after it and was defeated in the end. The SG orbit was certainly taken into a local solution. After the fatal time step, the number of branches was counted with the 2nd priority of the basic strategy in the loser's time steps and the 1st or 3rd priority in the winner's time steps. So the number of branches B(i) was necessarily 1 or 2 until the end step. S_SG(i) is the summation from the ith step to the (i + m)th step in SG, where m = 3. Whether or not the SG orbit was already saturated in the time steps bounded by the two broken lines, S_SG decreased through the bounded time steps, since the summation was done beyond the step with the fatal move. S_RP did not decrease greatly in the bounded time steps, since the replay started before the fatal step. The number of branches was usually counted with the 4th to 6th strategy priority and was more than two, if the SG loser tried to prevent the fatal move in the replay.
S_RP rapidly decreased beyond the step of the right broken line, since the orbit was already saturated. The variation of (S_RP(i) - S_SG(i)) suggests that the driving force preventing the SG orbit was always active in the replay. The peak a few steps before the bounded time
steps occurred because the SG loser tried to prevent the approach to the local solution actualized in SG. Usually the bifurcation step toward a local solution was some steps before the fatal move step. After the SG orbit was traced, the bifurcation step was somehow identified, though the identification generally needed fairly large amounts of computation and was often performed by wild guess within finite intervals. The peaks with arrows suggested that the information was generated by the trial to identify the bifurcation step and to prevent or to cause the realization of local solutions. Even if the driving force of learning meant some perturbation, its amplitude was adjusted a posteriori with reference to the SG orbit.

Our previous work showed that perpetual decision change could occur as the SG proceeded and that it resulted in the tables being turned by the less advantageous player (not the loser). The present study gave a way to see that the generated information emerged especially (and remarkably) some steps before saturation into the SG local solution. The second peak showed the force preventing the SG local solution, and the first one showed both the force preventing the approach to the solution and the force leading to it. So we conjectured that the force preventing the approach to the solution was active still more steps before, and so on, though this was not clear far from the local solution. The conjecture suggested that perpetual decision change was occurring. The decision on the revised move was also performed with prior indefiniteness, since the number of orbits was still infinite. Perpetual decision change could mean that observers continued to construct higher levels of hierarchical logic.

References

Gunji, Y.-P., 1993, Form of life: unprogrammability constitutes the outside of a system and its autonomy. Appl. Math. Comp. 51, 19-76.
Gunji, Y.-P., Shinohara, S. and Konno, N., 1993, Learning processes based on incomplete identification and information generation. Appl.
Math. Comp. 55, 219-253.
Matsuno, K., 1992, The uncertainty principle as an evolutionary engine. BioSystems 27, 63-76.
Nakamura, T. and Gunji, Y.-P., 1994, Biological feature of the learning process in a man-to-man game, submitted to BioSystems.
Nakamura, T. and Gunji, Y.-P., 1993, Learning process with decision change a posteriori. Proceedings of NOLTA'93, Vol. 1, 361-366.