Artificial Intelligence in Engineering 10 (1996) 61-70
Copyright © 1995 Elsevier Science Limited. Printed in Great Britain. All rights reserved. 0954-1810/96/$15.00
0954-1810(95)00016-X
ELSEVIER
A high school project on artificial intelligence in robotics

S. C. Fok & E. K. Ong

School of Mechanical and Production Engineering, Nanyang Technological University, Nanyang Avenue, Singapore
(Received 3 October 1994; accepted 6 June 1995)
In this paper, we describe the development of an artificial neural network strategy for an industrial robot to play the game of Tic-Tac-Toe. The project was undertaken by two high school students during the university's technology and engineering research programme. The strategy is based on feedforward multi-layered neural networks with backpropagation of error training. The performance of the strategy is evaluated by its accomplishment against human opponents. The results indicate that the neural network strategy developed will almost always win if given the opportunity and at most draw if not. The neural network strategy has been successfully interfaced with a Scorbot-ER VII robot via an in-house designed electronic game board.

Key words: neural network, Tic-Tac-Toe, backpropagation of error training.
was ‘a game playing robot using neural network’. This
1 INTRODUCTION
project concerns the application of artificial intelligence to robotics. This is an area of growing importance. To meet new levels of productivity and quality in manufacturing, there is a growing demand for smart robots that can maintain satisfactory performance under a changing environment. In such cases, it is highly desirable to equip robots with intelligence so that they can manipulate through the complex uncertainties generated by the changing environment. The objective of ‘a game playing robot using neural network’ is to develop a robotic system using the neural network approach so that the manipulator can play TicTat-Toe against a human opponent. This project was undertaken by two high school students. In this paper, we present the development of this robotics project using only feedforward multi-layered neural networks. The paper first gives an overview of the characteristics of Tic-Tat-Toe. This is followed by a presentation on the development of the neural network strategies for playing Tic-Tat-Toe. Finally, the implementation and the performances of the robotic system developed are examined and discussed.
Since 1988, Nanyang Technological University has been organizing a technological and engineering camp (twice
a year for high school students. These camps were part of the university’s continuing effort to promote engineering as a career among high school students in Singapore. In 1994, the scope of the technological and engineering camp was enlarged to the technology and engineering research programme for high school students. The main focus of the research programme is to nurture local talent by raising the level of intellectual training for selected students. This programme spans from April to September in which participants were first required to build up the necessary base-line knowledge on a chosen area of study by having discussions with the. relevant staff and being shown how to tap into the existing literature and data-bases on the subject. During the June school vacation, the participants attended a 3week intensive residential camp in which they performed the detailed investigative, experimental and analytical work in the engineering science laboratories. Through this arrangement, the programme aims to give high school students first hand experience in conducting real life engineering research and expose them to the research methodology, which ranges from an understanding of the simple physical theories to the excitement of discovering their application in practice. One of the projects offered in the programme of 1994
2 TIC-TAC-TOE

Game playing programs have always played an important role in the development of artificial intelligence.1,2
Tic-Tac-Toe is a game which can be easily learned by children. To win, lose or draw, a player has to make the right decisions based on the opponent's moves. The game requires two players and is played on a 3 x 3 game board with nine squares. Each player takes turns to occupy a desired square on the board. Noughts and crosses are used to differentiate the occupied squares between the two players. Each square can be occupied by only one player. The objective of the game is to form a straight line with three similar markers in any direction (i.e. horizontally, vertically, or diagonally). Once this goal is achieved, the player wins and his opponent loses. If this goal cannot be achieved after all the squares are occupied, the game is declared a draw. Tic-Tac-Toe has the following characteristics:

(1) There is a definite rule defining the end of the game; it stops when all nine squares are occupied or whenever a player wins the game.
(2) The game must end after a finite number of moves. With a 3 x 3 game board, the game must end after the two players have made a combined total of nine moves.
(3) The outcome of the game is deterministic and it can be a win, a loss or a draw.
(4) There are no decisions in the game which are determined by chance, e.g. the players do not make a decision based on the outcome of a roll of dice.
(5) Each player has perfect information about the state of the game to date, i.e. neither player can hide anything about decisions he has already made.

Tic-Tac-Toe can therefore be classified as a finite, no chance, perfect information game. The simplicity of the game together with its rudimentary characteristics makes Tic-Tac-Toe suitable for illustrating the basic concepts of artificial intelligence. In this respect, the high school students are not hindered by any complexities in the game. Hence, they could focus their attention on the decision-making process of the neural network techniques. This makes the project ideal as a test bench for students to conduct a hands-on exercise so as to appreciate the fundamentals of artificial intelligence.
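For illustration, the game-end rule of characteristic (1) reduces to a check over the eight possible winning lines. The following C sketch is ours, not taken from the students' program; the board encoding (0 empty, 1 cross, 2 nought) is one possible convention.

/* Square contents: 0 = empty, 1 = cross (robot), 2 = nought (player). */
/* The eight possible winning lines: three rows, three columns, two diagonals. */
const int lines[8][3] = {
    {0, 1, 2}, {3, 4, 5}, {6, 7, 8},   /* rows      */
    {0, 3, 6}, {1, 4, 7}, {2, 5, 8},   /* columns   */
    {0, 4, 8}, {2, 4, 6}               /* diagonals */
};

/* Returns 1 or 2 if that side has formed a line, -1 for a draw
   (all nine squares occupied, no line), and 0 if play continues. */
int game_state(const int board[9])
{
    int i, full = 1;
    for (i = 0; i < 8; i++) {
        int a = board[lines[i][0]];
        if (a != 0 && a == board[lines[i][1]] && a == board[lines[i][2]])
            return a;
    }
    for (i = 0; i < 9; i++)
        if (board[i] == 0)
            full = 0;
    return full ? -1 : 0;
}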
3 NEURAL NETWORK APPROACH
One of the artificial intelligence techniques that has gained much attention in the last decade is the artificial neural network.3,4 The artificial neural network is specially developed to imitate the way in which a human brain works. After all, the ability to decide has always been credited to the human brain. Like the human brain, the basic building block of an artificial neural network is the processing unit shown in Fig. 1.
Fig. 1. Processing unit.
The operations of the biological neuron and the processing unit are very similar. A processing unit accepts a set consisting of $n$ inputs, $X = \{x_1, \ldots, x_n\}$. Each input is multiplied by a corresponding weight, which together form a set $W = \{w_1, \ldots, w_n\}$. All the weighted inputs are then summed to form the weighted combination $Z$, where

$$Z = w_1 x_1 + \cdots + w_n x_n = \sum_{i=1}^{n} w_i x_i = X^T W$$

The summation is then further processed by an activation function $F(Z)$ to produce only one output signal $y$, i.e.

$$y = F(Z) = F\left(\sum_{i=1}^{n} w_i x_i\right) = F(X^T W)$$
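In code, a single processing unit with a sigmoid activation follows directly from the two formulas above. This short C sketch is illustrative; the function names are ours.

#include <math.h>

/* Sigmoid activation F(Z) = 1 / (1 + exp(-Z)). */
double sigmoid(double z)
{
    return 1.0 / (1.0 + exp(-z));
}

/* One processing unit: weighted sum of n inputs followed by F(Z). */
double process(const double x[], const double w[], int n)
{
    double z = 0.0;
    int i;
    for (i = 0; i < n; i++)
        z += w[i] * x[i];       /* Z = sum of w_i * x_i */
    return sigmoid(z);          /* y = F(Z) */
}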
Commonly used activation functions include the linear and sigmoid functions. Many processing units can be inter-connected to form an artificial neural network. An artificial neural network can exhibit a surprising number of characteristics that the brain possesses. It can generalize from past experience, recall knowledge from memory, and extract essential information from noisy inputs. The behaviour of an artificial neural network depends on the architecture of the overall network, and its properties are closely associated with the type, quantity and arrangement of the processing units. The weights in a processing unit can be fixed or adaptive. Neural networks can be classified based upon the adaptation of the weights. Supervised learning occurs when the network is supplied with both the input set and the correct output values, and it adjusts its weights to eliminate the error between the actual and the correct outputs. Unsupervised learning occurs when the network receives only the inputs and must self-organize by adjusting its weights according to a well-defined algorithm. There is a
variety of 'learning laws' associated with both supervised and unsupervised learning.

Backpropagation of error is the most widely used learning method for feedforward multi-layered networks. A feedforward network typically consists of at least one hidden layer, as illustrated in Fig. 2. In a feedforward network, the input layer of processing units is connected with every processing unit in the next layer, and so on until the output layer produces the network's outputs. There are no interconnections between processing units within a layer. The backpropagation of error learning law is simply a gradient descent attempting to minimize the mean squared error with respect to the weights. The supervised learning proceeds as follows:

Fig. 2. Feedforward multi-layered neural network.

(1) Initialize the weights to small random values.
(2) Present a training input vector $X^0 = \{x_1^0, \ldots, x_q^0\}$ to the input layer and its associated desired output vector $Y^T = \{y_1^T, \ldots, y_p^T\}$ at the output layer, assuming that there are $m$ layers.
(3) Propagate the input signals forward through the network, layer by layer. For the $i$th processing unit in the $n$th layer, where $n = 1, \ldots, m$,

$$x_i^n = F\left(\sum_j x_j^{n-1} w_{ij}^n\right)$$

where $w_{ij}^n$ is the weight from node $j$ in the $(n-1)$th layer to node $i$ in the $n$th layer, and $F(\cdot)$ is the activation function.
(4) Obtain the set of final outputs $X^m = \{x_1^m, \ldots, x_p^m\}$ at the output layer.
(5) Compute the error for each processing unit in the output layer, i.e.

$$e_j^m = y_j^T - x_j^m \qquad (j = 1, \ldots, p)$$
(6) Compute the deltas for the output layer,

$$\delta_j^m = e_j^m F'(Z_j^m)$$

where $Z_i^n$ denotes the weighted sum feeding unit $i$ of layer $n$ and $F'(\cdot)$ is the derivative of the activation function with respect to its net input.
(7) Propagate the errors backwards by finding the deltas for the preceding layers, i.e. for $n = m, m-1, \ldots, 2$, find

$$\delta_i^{n-1} = F'(Z_i^{n-1}) \sum_j \delta_j^n w_{ji}^n$$
(8) Modify each weight as follows:

$$w_{ij}^n = w_{ij}^n + \Delta w_{ij}^n, \qquad \Delta w_{ij}^n = C \delta_i^n x_j^{n-1}$$

where $C$ is called the learning rate.
(9) Repeat the procedure for the next training pattern.
A summary and critical analysis of neural network applications in robotics can be found in the work of Horne et al.5 Utilization of the artificial neural network methodology for Tic-Tac-Toe can be found in the work of Rumelhart and McClelland.6 The concept behind the work of Rumelhart and McClelland6 is based on the notion of thought as a simulation in a human brain. This is self-explanatory if one tries to play the game of Tic-Tac-Toe. The very first step the player would take is to try to win the game with the next move. If that is not possible, then the player would simulate the opponent's future move and check for any possibility of losing the game. If found, the player would try to block his opponent's path. Otherwise, the player would make another move so as to create as many winning opportunities as possible.

The above concept was adopted for implementation ('method 1'). Two conceptual game boards B1 and B2 are used in this approach. The board B1 is used to track the player's moves while B2 is used to record the robot's play. Each conceptual board is joined to a set of eight pattern detectors (Fig. 3(a)). Each pattern detector consists of a feedforward neural network with three inputs, a hidden layer with 16 hidden neurons, and one output. The set of three inputs to the neural network corresponds to the set of three squares on B1 or B2 that constitute a winning condition (i.e. three consecutive squares, either vertical, horizontal or diagonal). Input values are either 1 or 0, which respectively represent an occupied or unoccupied square. The single output is an assigned value (ranging from 0 to 1) to be allocated to all the three input squares. It is called the priority value as it gives an indication of the priority for occupying any of the three squares. The pattern detectors are trained
Fig. 3(a). 'Method 1' conceptual boards to pattern detectors.
off line with the backpropagation of error learning rule based on the training data set shown in Table 1. The number of pattern detectors connected to a square on B1 (or B2) corresponds to the number of possible winning states associated with that square (Fig. 3(b)). For example, the centre square will be connected to four pattern detectors while each corner square is only connected to three pattern detectors. For the robot to decide on its next move, the total priority values associated with each square on the physical game board (obtained by combining B1 and B2) have to be considered. To avoid making the next move on an occupied square, an inhibitory signal (i.e. a bias with a very large negative weight) is employed to penalize the total priority for that square. The program compares the final priority values for all squares and will make its next move on the square with the highest priority.

The advantage of 'method 1' is that the training set for the pattern detectors consists of only eight possible situations. Furthermore, the last situation shown in Table 1 need not be considered as it means that either the robot or the player has won. With this small training data set, the mapping with a neural network is comparatively easy. The main disadvantage of the approach is that the physical board is divided into two conceptual boards. As such, the robot may lose sight of the overall situation and will not be able to make the best possible move.
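The move-selection rule just described can be sketched as follows. Here detector_output() stands for the forward pass of a trained pattern detector (a hypothetical helper introduced for this sketch), lines[][] is the table of eight winning lines from the earlier sketch, and the inhibitory signal is modelled as a large negative constant; for simplicity one forward-pass routine serves the detectors of both conceptual boards.

#define NSQ     9
#define NDET    8
#define INHIBIT -100.0   /* very large negative penalty for occupied squares */

extern const int lines[NDET][3];                           /* eight winning lines */
extern double detector_output(int d, const double in[3]);  /* trained detector d  */

/* b1[i] is 1.0 if square i holds the player's mark, b2[i] if the robot's. */
int select_move(const double b1[NSQ], const double b2[NSQ])
{
    double priority[NSQ] = {0.0};
    int d, k, best = -1;

    for (d = 0; d < NDET; d++) {
        double in1[3], in2[3], p;
        for (k = 0; k < 3; k++) {
            in1[k] = b1[lines[d][k]];
            in2[k] = b2[lines[d][k]];
        }
        /* Each detector allocates its priority value to all three
           squares of its line, on both conceptual boards.          */
        p = detector_output(d, in1) + detector_output(d, in2);
        for (k = 0; k < 3; k++)
            priority[lines[d][k]] += p;
    }
    for (k = 0; k < NSQ; k++) {
        if (b1[k] > 0.0 || b2[k] > 0.0)
            priority[k] += INHIBIT;   /* inhibitory signal */
        if (best < 0 || priority[k] > priority[best])
            best = k;
    }
    return best;  /* square with the highest total priority */
}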
Fig. 3(b). Possible winning states.

Table 1. Training data set for 'method 1'

Inputs       Output
0  0  0      0.1
1  0  0      0.2
0  1  0      0.2
0  0  1      0.2
1  1  0      0.9
1  0  1      0.9
0  1  1      0.9
1  1  1      0
Fig. 4(a). 'Method 2' game board label (squares numbered 1-9).
With the shortcomings of 'method 1', alternative strategies using a multi-layered feedforward neural network were investigated. Initial investigation centred around the following approach, called 'method 2'. The game board is divided into nine squares as shown in Fig. 4(a). Each square now acts as an input to a neural network. Corresponding to the nine squares, there are nine inputs to the neural network. Input values can be either 1.0, 0.5 or 0.0, where

• 1.0 represents a cross, i.e. the square has been occupied by the robot;
• 0.5 represents a nought, i.e. the square has been occupied by the player;
• 0.0 represents an empty square.
The neural network has one hidden layer and one output. An output value, which ranges from 0 to 1 (in steps of 0.1), represents the appropriate square for the robot's next move. For example, in the game board situation described in Fig. 4(b) with the robot to make the next move, it is obvious that the robot should occupy square number 3 to prevent the player from winning. To represent that the next move is to be made on square number 3, the desired output value is 0.3.

Fig. 4(b). A game situation for 'method 2'.

Based on the above assignment of input and output combinations, a training data set was generated. The network was simulated using the commercial software NeuralWorks Professional Plus on a Sun workstation. The network was examined by varying the number of hidden neurons from 9 to 12. For all cases, the weights were first randomized to values ranging between 0.5 and -0.5 and then trained using the backpropagation of error learning rule. Based on the training time, the RMS error computed between the actual and desired outputs, and the performance of the trained network, it was found that 'method 2' is not suitable for implementation. Evaluation of the trained network showed that it can memorize but not generalize. The main reason for this is that, for 'method 2' to generalize, we would need to use as many training data as possible. There are a total of 3^9 possible combinations of training data. Obviously, generating all these possible combinations is highly inefficient. Nevertheless, 'method 2' is attractive as it retains the general structure of the physical board.

Since 'method 2' was deemed unsuitable, 'method 3' was devised. It is based on combining 'method 2' with 'method 1'. Instead of dividing the game board into two conceptual boards, 'method 3' uses only one board. It utilizes the pattern detector concept of 'method 1', i.e. the board is covered by eight pattern detectors. Each pattern detector consists of a feedforward neural network with three inputs, a hidden layer, and one output. The three inputs to the neural network correspond to the three squares on the physical board that constitute a winning condition (i.e. three consecutive squares, either vertical, horizontal or diagonal). Input values are either 1.0, 0.5 or 0.0, where

• 1.0 represents a cross, i.e. the square has been occupied by the robot;
• 0.5 represents a nought, i.e. the square has been occupied by the player;
• 0.0 represents an empty square.
Table 2. Training data set for 'method 3'

Inputs            Output    Mapping
1    0    1       1         1
1    1    0       1         1
0    1    1       1         1
0.5  0    0.5     0.5       0.5
0.5  0.5  0       0.5       0.5
0    0.5  0.5     0.5       0.5
1    0    0       0.2       0.2
0    1    0       0.2       0.2
0    0    1       0.2       0.2
0    0    0       0.1       0.1
0.5  0    0       0.05      -0.05
0    0.5  0       0.05      -0.05
0    0    0.5     0.05      -0.05
1    0.5  0       0         -0.1
0.5  1    0       0         -0.1
1    0    0.5     0         -0.1
0.5  0    1       0         -0.1
0    0.5  1       0         -0.1
0    1    0.5     0         -0.1
1    1    0.5     0         -0.1
0.5  0.5  1       0         -0.1
1    0.5  1       0         -0.1
0.5  1    0.5     0         -0.1
0.5  1    1       0         -0.1
1    0.5  0.5     0         -0.1
Fig. 5. 'Method 1' case 1.

Fig. 6. 'Method 3' case 1.
The single output or priority value is an assigned value (ranging from 0 to 1) to be allocated to all the three input squares. The pattern detectors are trained off line with the backpropagation of error learning rule based on the training set shown in Table 2. There are altogether 27 possible different training data, but the last two are redundant as they would mean the end of the Tic-Tac-Toe game. The priority values of those situations whereby the robot would draw and cannot win are further mapped onto their negative equivalents after having been processed by the neural network. This form of penalty ensures that the robot's objective is to win the game. If it cannot win yet, it will try to block its opponent's sure-win move first before going on to create winning opportunities. It will only settle for a draw if all else fails. Similar to 'method 1', the number of pattern detectors connected to a square corresponds to the number of possible winning states associated with that square. Hence, the robot has to compute the total priority value for each square. Inhibitory signals are again used to prevent moves on already occupied squares. The robot will always take the highest priority square as its next move.
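Put together, 'method 3' differs from the 'method 1' sketch only in its board encoding and in the negative mapping applied to each detector's output. The thresholds in map_priority() below are our reading of the Mapping column of Table 2 (outputs near 0.05 become -0.05, outputs near 0 become -0.1), not values given in the paper; detector_output() and lines[][] are carried over from the earlier sketches.

#define NSQ     9
#define NDET    8
#define INHIBIT -100.0

extern const int lines[NDET][3];
extern double detector_output(int d, const double in[3]);

/* Map the priority of lines in which the robot can no longer win
   onto a penalty (our interpretation of Table 2's Mapping column). */
double map_priority(double p)
{
    if (p < 0.025) return -0.1;
    if (p < 0.075) return -0.05;
    return p;
}

/* board[i]: 1.0 robot's cross, 0.5 player's nought, 0.0 empty. */
int select_move3(const double board[NSQ])
{
    double priority[NSQ] = {0.0};
    int d, k, best = -1;

    for (d = 0; d < NDET; d++) {
        double in[3], p;
        for (k = 0; k < 3; k++)
            in[k] = board[lines[d][k]];
        p = map_priority(detector_output(d, in));
        for (k = 0; k < 3; k++)
            priority[lines[d][k]] += p;
    }
    for (k = 0; k < NSQ; k++) {
        if (board[k] != 0.0)
            priority[k] += INHIBIT;   /* inhibit occupied squares */
        if (best < 0 || priority[k] > priority[best])
            best = k;
    }
    return best;
}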
Table 3. Overview of the three methods

                                   'Method 1'         'Method 2'       'Method 3'
No. of neural networks             16 (for 2 boards)  1 (for 1 board)  8 (for 1 board)
No. of inputs to each network      3                  9                3
Input values                       {0, 1}             {0, 0.5, 1}      {0, 0.5, 1}
No. of outputs from each network   1                  1                1
Output value range                 0-1                0-1              0-1
Output represents                  Priority value     Position for     Priority value
                                   for square         next move        for square
Mapping after neural network       No                 No               Yes
No. of hidden layers used          1                  1                1
No. of hidden neurons              8-16               4-16             5-12
Overall effectiveness              Satisfactory       Bad              Good
Fig. 7. ‘Method 1’ case 2.
Fig. 8. ‘Method 3’ case 2.
Table 4. ‘Method 2’ training remIts Weights initialized at f0.3 No. of hidden neurons
4
6
8
10
12
14
16
RMS error after 250 000 cycles
0.13
0.15
0.14
0.15
0.13
0.15
0.14
Fixed no. of hidden neurons
= 12
Initialized weights
f0.1
f0.15
f0.2
f0.25
f0.3
f0.35
f0.4
RMS error after 250 000 cycles
0.14
0.17
0.13
0.15
0.15
0.13
0.13
4 DISCUSSION AND IMPLEMENTATION

Table 3 gives an overview of all three methods presented. There was no optimum number of hidden neurons for the neural networks of 'method 1'. The training time and the RMS error computed for 9-12 hidden neurons were similar. Furthermore, the overall performance of 'method 1' does not seem to be affected by varying the number of hidden neurons.

Table 4 shows some results for 'method 2'. From the evaluation of its performance, 'method 2' was found to be unsatisfactory and unsuitable for use. From 4 to 16 hidden neurons, the neural network of 'method 2' had trouble converging to an RMS error of less than 0.13 after about 250 000 training cycles. Restricting the network to 12 hidden neurons but with different ranges of weight initialization (from ±0.1 to ±0.4) showed no sign of lowering the RMS error after 250 000 training cycles. Moreover, the poorly trained network could only memorize but not generalize. This, as indicated earlier, may be overcome by increasing the number of training data (which is not economical).

Fig. 9. Schematic diagram of implementation.
Fig. 10. Input interfacing circuit.
Table 5. 'Method 3' training results

Weights initialized at ±0.5
No. of hidden neurons                          5         7        8        10       12
No. of cycles to reach RMS error of 0.00005    136 932   75 849   74 586   80 155   101 775
The performance of ‘method 3’ was evaluated with respect to the number of hidden neurons (5-10) in the neural networks. Table 5 shows some training results for weights initialized in the range of 410.5. Evaluation in terms of the number of training cycles before achieving an RMS error of less than 0.00005 shows that eight hidden neurons give the best results. To evaluate the performances of ‘method 1’ and ‘method 3’, test programs were written in turbo C. The programs employed the feedforward multi-layered neural network using the backpropagation of error learning. Results of the programs indicated that ‘method 3’ is feasible for implementation. The overall effectiveness of ‘method 1’ and ‘method 3’ were evaluated by their accomplishment against human opponents. Many games were played and all methods , +5v
+5v
+vs
I
were thoroughly tested. In all cases, ‘method 3’ had the best accomplishment. It will almost always win if given the opportunity and at most draw if not. No combination of moves tested had resulted in a defeat for ‘method 3’. Comparing ‘method 3’ with ‘method l’, the results indicate that ‘method 3’ performs better. Figures 5 with 6 and Figs 7 with 8 show two such cases for comparison. Figure 5(a)-(e) shows the series of moves played using ‘method 1’ while Figs 6(a)-6(e) show the same play with ‘method 3’. Figure 5(a) shows the player making the first move on square number 1. Using ‘method l’, the robot would reply by occupying the centre square (Fig. 5(b)). Next, the player put a nought on square number 9 as shown in Fig. 5(c). With this series of play, the next move by the robot is vital. If the robot will occupy either square number 3 or 7, it will definitely lose the game. +vs
I RS 307-064
RS 307AX4
I
SWI
4
L
SW3 sw*
loo0 ghms c
SW8
Fig. 11. Output interfacing ctrctut.
swl and sw2 arc interlocking signals from robot controller. Refer to controller manual for detailed connections
Figure 5(d) shows that this is the case with 'method 1', i.e. the robot moves to square number 3. Figure 5(e) shows the player creating a sure-win situation by occupying square number 6. Figures 5(a)-(e) show that 'method 1' is not intelligent enough to force a draw. Figures 6(a)-(e) show that if the same course of play is applied to 'method 3', the robot is intelligent enough to prevent the opponent from gaining the upper hand. The robot will force a draw if play is continued after the situation shown in Fig. 6(e).

Figures 7 and 8 show another series of play demonstrating that 'method 3' is indeed better than 'method 1'. Figure 7(a) shows the robot having the first move on the centre square using 'method 1'. The player then moves onto square number 8. Figure 7(c) shows that the next move by the robot is on square number 7. In this case, the player has to block by occupying square number 3. The player has committed a mistake in the opening moves. If the robot's next move were to occupy square number 1, it would create a sure-win situation. However, 'method 1' is unable to capitalize on the mistake. Instead, Fig. 7(d) shows that the robot's next move is onto square number 9. The game ends in a draw. Figures 8(a)-(e) show that the robot is able to keep the upper hand and go for a win if given the same series of play.

'Method 3' was eventually implemented and interfaced to a Scorbot-ER VII robot via an in-house designed electronic Tic-Tac-Toe game board. Figure 9 shows the schematics of the implementation. The communication between the neural network computer and the robot controller is accomplished by means of the PCL-720 digital input/output and counter card. This PC add-on card offers 32 digital input channels, 32 digital output channels and three timer counters. All digital input and output channels are TTL compatible. Each digital input or output channel corresponds to a certain bit of the I/O port of the PC. The I/O port base address is selectable by setting a DIP switch. Valid addresses range from hex 200 to hex 3F8. The base address used for this project is set at hex 2A0. Four of the I/O port addresses starting from the base address are reserved for accessing the I/O channels.

To ensure smooth flow of the game, the electronic game board must handle communications between the user, the neural network program and the robot controller. The player must tell the computer that he has finished his move. The computer must be able to sense where the player has made his move on the electronic game board. This information must be relayed to the neural network program, which will then respond by instructing the robot controller to make the counter moves. All these necessary handshaking and information flows are performed by the PCL-720 together with the game board input and output interfacing circuits. These circuits are shown in Figs 10 and 11. The input interface senses the status of the electronic game board using micro-limit switches. The signals are transferred to the computer via an optoisolator to eliminate the danger of high potentials. One advantage of using the nine limit switches is that the control program can check the status of these switches at the start of the game to ensure that all the seeds have been cleared. Two additional switches on the game board have been installed to allow the program to communicate with the user. During a game, interlocking between the computer and the robot controller is essential; this is accomplished with two switches, which control the robot ready signal and the robot read ready signal respectively.
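As an illustration of this handshaking, accessing the card from a Turbo C program comes down to port reads and writes at the DIP-switch selected base address, using Turbo C's inportb()/outportb(). The assignment of switches and handshake lines to particular bits in the sketch below is hypothetical, as the paper does not give the bit map.

#include <dos.h>    /* Turbo C inportb() / outportb() */

#define BASE 0x2A0  /* DIP-switch selected base address of the PCL-720 */

/* Read the game board status. The nine micro-limit switches are
   assumed (hypothetically) to occupy bits 0-7 of the first input
   port and bit 0 of the second.                                   */
unsigned int read_board_switches(void)
{
    unsigned int low  = inportb(BASE);
    unsigned int high = inportb(BASE + 1);
    return ((high & 0x01) << 8) | (low & 0xFF);
}

/* Drive the handshake line telling the controller that a move
   command is ready (hypothetical bit 0 of the first output port). */
void signal_robot_ready(int ready)
{
    outportb(BASE, ready ? 0x01 : 0x00);
}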
5 CONCLUSIONS

In this paper, we have described the development of an artificial neural network strategy for an industrial robot to play the game of Tic-Tac-Toe. The performance of the strategy was evaluated by its accomplishment against human opponents. The results indicate that the neural network strategy developed will almost always win if given the opportunity and at most draw if not. Although the work presented in this paper is aimed towards a game playing robot, the overall neural network strategy developed has potential for implementation in manufacturing processes with a similar environment, i.e. conditions whereby the process is governed by fixed rules but with uncertain inputs.
REFERENCES

1. Levy, D. N. L. Computer Games II. Springer-Verlag, Berlin, 1988.
2. Samuel, A. L. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 1959, 3, 211-29.
3. Eberhart, R. C. & Dobbins, R. W. Neural Network PC Tools: A Practical Guide. Academic Press, London, 1990.
4. Simpson, P. K. Artificial Neural Systems: Foundations, Paradigms, Applications and Implementations. Pergamon Press, Oxford, 1990.
5. Horne, B., Jamshidi, M. & Vadiee, N. Neural networks in robotics: a survey. Journal of Intelligent and Robotic Systems, 1990, 3, 51-66.
6. Rumelhart, D. E. & McClelland, J. L. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 2. MIT Press, Cambridge, MA, 1988.