Nested Monte-Carlo Search with simulation reduction


Knowledge-Based Systems 34 (2012) 12–20

Contents lists available at SciVerse ScienceDirect

Knowledge-Based Systems journal homepage: www.elsevier.com/locate/knosys

Haruhiko Akiyama a,⇑, Kanako Komiya b, Yoshiyuki Kotani b

a Department of Computer and Information Sciences, Tokyo University of Agriculture and Technology, Tokyo, Japan
b Institute of Engineering, Tokyo University of Agriculture and Technology, Tokyo, Japan

Article info

Article history: Available online 20 November 2011

Keywords: Nested Monte-Carlo Search; All-Moves-As-First (AMAF); Single-player games; Morpion Solitaire; Random simulation

Abstract

The execution time of Nested Monte-Carlo Search for Morpion Solitaire, a single-player game, increases exponentially with the level of the nested search. We investigated two methods for reducing the execution time in order to enable a deeper nested search: simply reducing the number of lower level searches by a constant rate, and using the All-Moves-As-First heuristic to control the reduction in the number of lower level searches. Testing showed that the latter is more effective. Using it, we achieved a new world record of 146 moves for a computer search for the touching version of Morpion Solitaire. © 2011 Elsevier B.V. All rights reserved.

1. Introduction

Monte-Carlo search, which uses random simulation to identify the apparently best move, has been successfully applied to many games, including Go [1], and to single-agent problems [2–4]. Kocsis and Szepesvári proposed a search algorithm called upper confidence bounds applied to trees (UCT) and successfully applied it to several single-agent problems [2]. Chaslot et al. applied Monte-Carlo Tree Search (MCTS) to production management problems [3]. Schadd et al. proposed Single-Player MCTS (SP-MCTS), a modified version of UCT, and applied it to the SameGame puzzle [4]. The objective in single-agent problems is not to defeat an opponent but to find the most suitable sequence of moves.

Morpion Solitaire is a single-agent problem with a state space as large as that of Go, so a good solution is hard to find using existing search algorithms such as UCT [5]. However, Nested Monte-Carlo Search [6] produces good solutions for this game. Nested Monte-Carlo Search is a Monte-Carlo method with a nested structure: its simulations are played not by a random player but by a Monte-Carlo player. The depth of the recursive calls is called the “level”. Games played using a higher level search produce better results because their simulations use a multiply nested Monte-Carlo player. Tristan Cazenave set a world record for the “disjoint” version of Morpion Solitaire by using a level 4 Nested Monte-Carlo Search. Nested Monte-Carlo Search has also been successful for many other problems: SameGame, Sudoku [6], general game playing [7], the prime generating polynomials problem, and the finite algebra problem [8].

⇑ Corresponding author. E-mail addresses: [email protected] (H. Akiyama), [email protected] (K. Komiya), [email protected] (Y. Kotani).
0950-7051/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.knosys.2011.11.015

However, higher level searches take much longer to execute because many more lower level searches are performed. Much greater resources are therefore needed for searches above level 4. Furthermore, the larger the search space, the faster the computational complexity grows. There have been no reports of Nested Monte-Carlo Search being used to set a world record in the “touching” version of Morpion Solitaire, which has a much larger search space than the disjoint version.

Reducing the execution time for a game would enable the use of a higher level search. We investigated two methods for reducing the execution time of Nested Monte-Carlo Search: simply reducing the number of lower level searches by a constant rate, and using the All-Moves-As-First (AMAF) heuristic to control the reduction in the number of lower level searches. The second method was much better. With it, we were able to complete a level 5 Nested Monte-Carlo Search for the touching version of Morpion Solitaire and achieved a world record of 146 moves for a computer search.

This research makes three contributions: (1) application of the AMAF heuristic to Nested Monte-Carlo Search, (2) use of the AMAF heuristic to reduce the number of lower level searches (not to increase them), and (3) the first application of the AMAF heuristic to a puzzle game.

2. Morpion Solitaire and Nested Monte-Carlo Search

2.1. Morpion Solitaire

Morpion Solitaire is a single-player game played on a square grid of assumed unlimited size. The game starts with 36 marks on the grid, forming a Greek cross. The player adds a mark to create a new line of five marks in a row, with the goal of creating as many new lines as possible. The initial position of Morpion Solitaire is shown in Fig. 1. There are five basic rules.

(1) A move consists of two steps: adding a “mark” and drawing a “line” on the board.
(2) The “line” must be drawn through five contiguous marks aligned vertically, horizontally, or along a diagonal of 45°, and one of the marks must be the mark added in that move.
(3) There is only one type of “mark” (for example, a black dot), and a mark can be added at any grid point (an intersection of two grid lines) that does not already have a mark.
(4) The line cannot overlap a segment of a line already drawn.
(5) The game ends when there are no possible moves, and the total number of moves is the score.

The basic game described above is called the “touching” version. An example of a move according to these rules is shown in Fig. 2. The first published world record by hand for the touching version was 149 moves [9], and the record by hand as of July 12, 2010 is 170 moves [10]. The previous published record by computer was 143 moves [11], and the record prior to the ones reported here was 144 moves [9], which is less than the record by hand. The state space from the 1st move to the 22nd move contains approximately 7.3 × 10^11 positions [9].

H. Akiyama et al. / Knowledge-Based Systems 34 (2012) 12–20

Fig. 1. Initial position of Morpion Solitaire.

Fig. 2. Example move from initial position.

Fig. 3. Legal move in the touching version but not in the disjoint version.
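The five rules of the touching version can be sketched in a few lines of Python. This is only an illustration, not the authors' program: the coordinate layout of the cross, the representation of a drawn line as four unit segments, and the names `initial_cross`, `is_legal`, and `play` are all assumptions made for this sketch.

```python
# Four line directions as (row step, column step):
# horizontal, vertical, and the two 45-degree diagonals.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def initial_cross():
    """36 marks forming a Greek cross, as (row, col) pairs (assumed layout)."""
    rows = {
        0: [3, 4, 5, 6], 1: [3, 6], 2: [3, 6],
        3: [0, 1, 2, 3, 6, 7, 8, 9], 4: [0, 9], 5: [0, 9],
        6: [0, 1, 2, 3, 6, 7, 8, 9], 7: [3, 6], 8: [3, 6],
        9: [3, 4, 5, 6],
    }
    return {(r, c) for r, cols in rows.items() for c in cols}

def line_points(start, direction):
    """The five grid points of a line beginning at `start`."""
    (r, c), (dr, dc) = start, direction
    return [(r + i * dr, c + i * dc) for i in range(5)]

def line_segments(start, direction):
    """The four unit segments of the line, each named by its first point."""
    return {(p, direction) for p in line_points(start, direction)[:4]}

def is_legal(marks, used_segments, new_mark, start, direction):
    """Check rules (2)-(4) for adding `new_mark` and drawing the given line."""
    points = line_points(start, direction)
    if new_mark in marks or new_mark not in points:
        return False          # rule (3): point must be empty; rule (2): it lies on the line
    if any(p != new_mark and p not in marks for p in points):
        return False          # rule (2): the other four points are already marked
    # rule (4): no unit segment may be reused; sharing an endpoint is allowed
    return not (line_segments(start, direction) & used_segments)

def play(marks, used_segments, new_mark, start, direction):
    """Return the position after the move; the score is the number of moves played."""
    return marks | {new_mark}, used_segments | line_segments(start, direction)
```

Because a line is stored as unit segments rather than as points, two lines in the same direction may share an endpoint, which is exactly what distinguishes the touching version from the disjoint version.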

Fig. 4. World record 80 moves for the disjoint version [6].

One variation of the game is called the “disjoint” version, which has an additional rule: (6) the endpoints of lines in the same direction must not touch. This reduces the number of possible moves and the size of the state space. An example of a move that is legal in the touching version but not in the disjoint version is shown in Fig. 3. The world record by computer as of July 12, 2010 for the disjoint version, 80 moves, is shown in Fig. 4. The record by hand is 68 moves [9].

2.2. Nested Monte-Carlo Search

As described above, Nested Monte-Carlo Search is a Monte-Carlo method with a nested structure. The Monte-Carlo method is a technique that simulates playouts for each possible move by using a random player at each position, compares the results, and identifies the apparently best move. Now suppose a method that simulates the playouts by assuming that the player chooses each move using the Monte-Carlo method, and call it the “Meta-Monte-Carlo method”. A “Meta-Meta-Monte-Carlo method” can then be constructed by using the Meta-Monte-Carlo method in the simulation. Nested Monte-Carlo Search uses the results of playouts by a lower level Monte-Carlo player to evaluate the possible moves. A random playout is called a level 0 game, a Monte-Carlo search is called a level 1 game, and a Meta-Monte-Carlo search is called a level 2 game. That is, the nesting depth is called the “level”. The higher the search level, the more accurate the search and the higher the attainable score. A level 2 Nested Monte-Carlo Search is illustrated in Fig. 5.

In a two-player game, Monte-Carlo search evaluates moves by winning rate, whereas in Morpion Solitaire the evaluation is based on the score. The sequence of moves with the highest score found so far is always stored together with its score. With this method, the results for higher level searches are certainly better than those for lower level searches.

Fig. 5. Level 2 Nested Monte-Carlo Search.

3. Reducing the number of simulations

3.1. Computational complexity and general idea for improvement

As described above, using a higher level Nested Monte-Carlo Search results in a higher score. However, computational complexity and execution time increase explosively with the level because many lower level searches are performed at each position. The computational complexity of a level n search is approximately 200^n times that of the base game in the disjoint version of Morpion Solitaire.
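The nesting described in Section 2.2, and the reason the cost multiplies with each level, can be illustrated with a minimal Python sketch. Morpion Solitaire is replaced here by a hypothetical toy game (placing segments of two or three cells on a row of 12 cells, with the number of segments placed as the score); the game, the function names, and the `counter` argument are assumptions made for this sketch and do not come from the paper.

```python
import random

N_CELLS = 12        # toy board: a row of 12 cells (assumption for this sketch)
LENGTHS = (2, 3)    # a move covers 2 or 3 adjacent free cells

def legal_moves(used):
    """All (start, length) segments whose cells are still free."""
    return [(s, k) for k in LENGTHS for s in range(N_CELLS - k + 1)
            if all(c not in used for c in range(s, s + k))]

def play(used, move):
    """Return the set of occupied cells after playing `move`."""
    start, length = move
    return used | frozenset(range(start, start + length))

def nested(used, level, rng, counter):
    """Play one game from `used`; return the moves played (score = their number).

    counter[0] is incremented once per level 0 (random) game.
    """
    if level == 0:
        counter[0] += 1
    played, best = [], []     # best: best known continuation from the current position
    moves = legal_moves(used)
    while moves:
        if level == 0:
            move = rng.choice(moves)          # random player
        else:
            for m in moves:                   # one lower level game per possible move
                candidate = [m] + nested(play(used, m), level - 1, rng, counter)
                if len(candidate) > len(best):
                    best = candidate          # keep the best sequence found so far
            move, best = best[0], best[1:]    # follow the stored best sequence
        played.append(move)
        used = play(used, move)
        moves = legal_moves(used)
    return played
```

Calling `nested(frozenset(), level, random.Random(0), counter)` with a fresh `counter = [0]` for levels 0, 1, and 2 shows the number of random games growing by a large factor per level even on this tiny board, which is the effect the 200^n estimate describes for Morpion Solitaire.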
In the touching version, the execution time is approximately 400^n times that of the base game because there are more possible moves in the touching version. While Nested Monte-Carlo Search can be parallelized efficiently [12], the computational complexity increases at a higher rate than the available resources (CPUs or execution time). Therefore, it is difficult to complete a game using a level

greater than 4 in the disjoint version or greater than 3 in the touching version due to the time required.

One way to reduce the execution time is to reduce the number of lower level searches at each position. The basic idea is as follows. Suppose there is a Nested Monte-Carlo Search S′ that performs fewer lower level searches than the original search S at the same level. If the processing time of S′ is less than that of S and the efficiency of S′ is higher than that of S, then the processing time of a higher level search using S′ is also less than that of a higher level search using S, so S′ remains more efficient at the higher level. Therefore, a complete search can be made more efficiently when such an S′ exists and its computational complexity is small enough for the game to be completed by computer. In terms of the game tree, this method trades a precise search near the end of the tree for a wider search, because a higher level search can be completed.

3.2. Simple reduction in number of lower level searches

In Nested Monte-Carlo Search, every possible move is evaluated based on the result of a search one level lower. Therefore, the search can be sped up by reducing the number of lower level searches at each position by a certain percentage, as illustrated in Fig. 6. A move without a score is selected at random from the possible moves and played out. At each level, the simulation at a position ends when a certain percentage of all possible moves have received a score. We tested three reduction rates: 1/4, 2/4, and 3/4. While reducing the number of lower level searches can reduce the score of a single search, this is not a problem if a higher level search can be completed as a result. Another potential problem is the effect of the random selection: a move that has not been tested cannot be selected even though it may have a higher score than the moves that have been tested. This problem can be alleviated by using the AMAF heuristic, as described in the next section.

3.3. Reduction in number of lower level searches using All-Moves-As-First heuristic

All-Moves-As-First (AMAF) is a useful heuristic for evaluation in Monte-Carlo Go. Brügmann [13] proposed this idea in 1993, and


Fig. 6. Use of simple method to reduce number of lower level searches in level 2 search.
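The simple reduction shown in Fig. 6 can be sketched as follows. As before, this is a hedged illustration on a hypothetical toy game (segments of two or three cells on a row of 12 cells), not the authors' code. The parameter `keep` is taken here to be the fraction of possible moves that receive a score at each position, which is one reading of the paper's reduction rates of 1/4, 2/4, and 3/4.

```python
import math
import random

N_CELLS, LENGTHS = 12, (2, 3)   # hypothetical toy game: segments on a row of cells

def legal_moves(used):
    """All (start, length) segments whose cells are still free."""
    return [(s, k) for k in LENGTHS for s in range(N_CELLS - k + 1)
            if all(c not in used for c in range(s, s + k))]

def play(used, move):
    """Return the set of occupied cells after playing `move`."""
    return used | frozenset(range(move[0], move[0] + move[1]))

def reduced_nested(used, level, keep, rng, counter):
    """Nested search that scores only a fraction `keep` of the moves per position.

    counter[0] is incremented once per level 0 (random) game.
    """
    if level == 0:
        counter[0] += 1
    played, best = [], []       # best: best known continuation from here
    moves = legal_moves(used)
    while moves:
        if level == 0:
            move = rng.choice(moves)
        else:
            # Stop once the chosen fraction of moves has a score (at least one move).
            quota = max(1, math.ceil(keep * len(moves)))
            for m in rng.sample(moves, quota):     # random moves, each scored once
                candidate = [m] + reduced_nested(play(used, m), level - 1,
                                                 keep, rng, counter)
                if len(candidate) > len(best):
                    best = candidate
            move, best = best[0], best[1:]         # untested moves can never be chosen
        played.append(move)
        used = play(used, move)
        moves = legal_moves(used)
    return played
```

With `keep=1.0` the function behaves like the original nested search, so the two settings can be compared directly by inspecting `counter[0]` after a game.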

Gelly and Silver [1] used it for “rapid action value estimation” in the UCT search algorithm. The basic idea of this method is as follows. Assume that a playout has been made from possible move m at position P. If the result of the game is the same when the moves are played in a different order, each possible move at P that is played in that playout can receive the same evaluation as m. Therefore, by using this method, we can evaluate many possible moves with fewer playouts. The AMAF heuristic is thus used for evaluation at the beginning of Monte-Carlo search, because the value of a move depends not on the order of play but on the position of the mark. Note that evaluation using the AMAF heuristic may be less accurate than an ordinary playout in a two-player game because a player's order of moves depends heavily on that of the other player.

In Nested Monte-Carlo Search, every possible move is evaluated based on the result of a search one level lower, and in many games a possible move at a position is also a possible move at later positions. Therefore, the number of branches in the game tree that contain the same moves in a different order increases with the level. In a two-player game, each player wants to make a better move than the opponent, so the move selected by a player depends on that of the other player. The incidence rates for sets of moves in different orders are therefore usually different. In a single-player game like Morpion Solitaire, on the other hand, when the sequence of moves in which two moves are swapped still follows the rules, the rates are often the same, and they tend to depend on the number of possible moves at each position.

We used the AMAF heuristic as an evaluation method for controlling the reduction in the number of lower level searches. Suppose that move n, played out after possible move m at position P, is also a possible move at P and that these two moves can be swapped.
The evaluation of move m, together with the sequence of moves in which m is replaced with n, is then given to move n. The simulation of lower level searches ends when every possible move has received one or more AMAF evaluations. In this way, every possible move always receives one or more scores and a sequence of moves, so the number of lower level searches is greatly reduced. This method can be used because the sequence of moves in which move

m is replaced with n follows the rules in Morpion Solitaire. The use of the AMAF evaluation method in a level 2 game is illustrated in Fig. 7. It is actually applied to searches at all levels, but only the evaluation at level 2 is shown here.

3.4. Selection of move when AMAF evaluation is used

In Nested Monte-Carlo Search, the move with the highest evaluation score is simply selected because each possible move is evaluated only once. However, when the AMAF evaluation method is used, a possible move may be evaluated more than once before all possible moves have been evaluated, because every move in a simulation is evaluated. Therefore, more than one AMAF evaluation might be available for selecting a move. If two or more possible moves have the highest score, a method is needed for selecting one. We tested three selection methods.

• Rand: Select a move at random.
• Ave: Select the move with the highest average score.
• Freq: Select the move appearing most frequently in the evaluations.

With the Ave and Freq methods, if there are multiple best moves, one is selected at random. Of the three methods, Rand is thought to search the widest area of the game tree, because the move selected at random at each search step does not depend on the evaluation of the small part of the tree then being searched. While the effect of these methods at each level might not be large, their effect in a higher level search cannot be disregarded because it is the product of the effects at each level.

4. Evaluation and discussion

We used the touching version of Morpion Solitaire for our evaluation. Level 2 and level 3 Nested Monte-Carlo Searches were performed for each execution time reduction method, for each


Fig. 7. Use of AMAF evaluation to reduce number of lower level searches in level 2 search.
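The AMAF-based reduction of Fig. 7 and the three selection methods of Section 3.4 can be sketched together. This is again a hypothetical toy game (segments on a row of 12 cells), chosen because any reordering of a legal sequence is still legal, so every move of a lower level game can be moved to the front. The names `pick` and `amaf_nested`, and the reading of Rand, Ave, and Freq as tie-breakers among the moves sharing the highest score, are assumptions of this sketch rather than the authors' implementation.

```python
import random

N_CELLS, LENGTHS = 12, (2, 3)   # hypothetical toy game: segments on a row of cells

def legal_moves(used):
    """All (start, length) segments whose cells are still free."""
    return [(s, k) for k in LENGTHS for s in range(N_CELLS - k + 1)
            if all(c not in used for c in range(s, s + k))]

def play(used, move):
    """Return the set of occupied cells after playing `move`."""
    return used | frozenset(range(move[0], move[0] + move[1]))

def pick(scores, how, rng):
    """Among moves whose best AMAF score is the overall best, break ties by `how`."""
    top = max(max(v) for v in scores.values())
    tied = [m for m, v in scores.items() if max(v) == top]
    if how == "ave":                       # highest average score
        rank = {m: sum(scores[m]) / len(scores[m]) for m in tied}
    elif how == "freq":                    # most frequently evaluated
        rank = {m: len(scores[m]) for m in tied}
    else:                                  # "rand": all tied moves are equal
        rank = {m: 0 for m in tied}
    best_rank = max(rank.values())
    return rng.choice([m for m in tied if rank[m] == best_rank])

def amaf_nested(used, level, how, rng, counter):
    """Nested search in which one lower level game scores every move it contains."""
    if level == 0:
        counter[0] += 1                    # one more level 0 (random) game
    played, best = [], []                  # best: best known continuation from here
    moves = legal_moves(used)
    while moves:
        if level == 0:
            move = rng.choice(moves)
        else:
            scores, seqs = {}, {}          # per move: AMAF scores, best reordered game
            order = moves[:]
            rng.shuffle(order)
            for m in order:
                if m in scores:            # already has an AMAF evaluation: skip search
                    continue
                game = [m] + amaf_nested(play(used, m), level - 1, how, rng, counter)
                for n in game:
                    if n in moves:         # n is possible here, so it can be played first
                        scores.setdefault(n, []).append(len(game))
                        if len(game) > len(seqs.get(n, [])):
                            seqs[n] = [n] + [x for x in game if x != n]
            choice = pick(scores, how, rng)
            if len(seqs[choice]) > len(best):
                best = seqs[choice]
            move, best = best[0], best[1:]
        played.append(move)
        used = play(used, move)
        moves = legal_moves(used)
    return played
```

In the toy game the test `n in moves` always succeeds; in Morpion Solitaire it would not, because a later move may depend on a mark added earlier in the same game, and a real implementation would also need to check that the reordered sequence is legal.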

Fig. 8. Distributions of move frequency for simple reduction method.

reduction rate and for each selection method for ten hours. Level 1 and level 2 original Nested Monte-Carlo Searches were performed for comparison. Since only one level 3 game could be completed within the ten-hour time limit with the original method, its scores are not shown in the graphs.

The results for the simple reduction method (1/4, 2/4, and 3/4) and the original method are shown in Figs. 8–11. The distributions of the move frequency are shown in Fig. 8. Fig. 9 shows the score incidence rates for n or more moves. Note that, in Morpion Solitaire, the higher the score, the better the result. The minimum, average, and maximum scores are shown in Fig. 10. The average execution times for a game are shown in Fig. 11. The higher level searches resulted in a higher score, but the distribution of the moves with a higher score was not clear in Fig. 8. In

Fig. 9, the performance of each method is clear. The higher level searches resulted in a higher score, and the searches with less reduction in the number of searches obtained better results. Fig. 10 shows that the greater the number of searches, the higher the minimum, average, and maximum scores. As shown in Fig. 11, the execution time increased with the number of searches and with the level. Although the methods with a higher reduction rate (those using more searches) performed better, it is difficult to complete higher level searches with them because they take longer to execute.

The results for the AMAF heuristic evaluation method and the original method are shown in Figs. 12–15. The distributions of the move frequency are shown in Fig. 12. Fig. 13 shows the score incidence rates for n or more moves. The minimum, average, and maximum scores are shown in Fig. 14. The average execution


Fig. 9. Score incidence rates for n or more moves for simple reduction method.

Fig. 10. Minimum, average, and maximum scores for simple reduction method.

times for one game are shown in Fig. 15. As shown in Fig. 12, the distributions of the move frequency for Rand level 2 and Freq level 2 overlap. The use of Ave resulted in the worst scores. Although the original method performed better than the AMAF heuristic evaluation method at the same search level, the AMAF method can use a higher level search and thus achieves a much higher score. In other words, the AMAF method enables higher level searches, resulting in better performance. As shown in Fig. 14, the higher the search level, the higher the minimum, average, and maximum scores. The use of Rand for selection resulted in the highest maximum score at every level. The differences in average score among the selection methods were small, but the use of Freq resulted in the best average score. As shown in Fig. 15, for level 2 searches, the average execution times with Rand, Ave, and Freq were 15.84, 11.62, and 13.92, which are 0.085, 0.062, and 0.075 times that of the original method (186.17). That is, each selection method resulted in execution that was at least ten times faster than that of the original

method. For level 3 searches, the execution times were 968.63, 744.51, and 1386.16, which are 0.014, 0.011, and 0.021 times that of the original method (67219.34); i.e., they were roughly 50 to 100 times faster. The finding that the speedup grows with the search level means that we can expect an exponential speedup with higher level searches. The scores with Rand and Freq were better than those with a simple search number reduction of 1/4, and the execution time of Ave grew more slowly than that of the simple search number reduction of 1/4. The execution time for a level 4 search using Ave is predicted to be dozens of hours, and that for a level 5 search approximately one month. Therefore, it should be possible to complete these high level searches in a general resource environment. These results show that the AMAF reduction method makes the search faster for the touching version of Morpion Solitaire and that a higher score is obtained due to the use of a higher level search.

Given these results and using a PC with a 3 GHz CPU, we performed a level 5 search for the touching version of Morpion


Fig. 11. Average execution times for one game for simple reduction method.

Fig. 12. Distributions of move frequency for AMAF reduction method.

Fig. 13. Score incidence rates for n or more moves for AMAF reduction method.


Fig. 14. Minimum, average, and maximum scores for AMAF reduction method.

Fig. 15. Average execution times for one game for AMAF reduction method.
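As a rough check on the predicted running times, the averages reported for Ave (11.62 at level 2 and 744.51 at level 3) can be extrapolated under the assumption that the growth factor between consecutive levels stays constant. The constant factor, the symbols T_n and r, and the conversion from seconds are assumptions of this worked example; the text does not state the unit of the reported times.

```latex
\begin{align*}
r   &= \frac{T_3}{T_2} = \frac{744.51}{11.62} \approx 64.1, \\
T_4 &\approx r \, T_3 \approx 64.1 \times 744.51 \approx 4.8 \times 10^{4}, \\
T_5 &\approx r \, T_4 \approx 64.1 \times 4.8 \times 10^{4} \approx 3.1 \times 10^{6}.
\end{align*}
```

If the times are in seconds, T_4 corresponds to roughly 13 hours and T_5 to roughly 35 days, which is close to the one-month figure quoted for a level 5 search.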

Solitaire using the AMAF reduction method with Ave selection, which had the lowest execution times for a higher level search (see Figs. 11 and 15). On the basis of the calculated execution times for the lower level searches, we estimated that the total execution time would be approximately 36 days. We achieved a 145-move record in approximately 20 days and 22 h and a 146-move record in approximately 33 days and 16 h. The program finished after approximately 35 days and 13 h. These records date from February 4 and 16, 2010. As mentioned above, the previous record by computer was 144 moves [9]. The grid for the 146-move record is shown in Fig. 16.

We also performed a level 3 original Nested Monte-Carlo Search for the touching version of Morpion Solitaire for approximately 35 days and 13 h, the same time as for the level 5 AMAF/Ave search. The maximum, average, and minimum scores for the 103 games completed, shown in Table 1, confirm that the AMAF method has

better performance. A level 4 search using the original method could not finish, and we stopped the program at 22 days. The highest score obtained was 136, and the execution time for that game was longer than for our 145-move game.

The AMAF method should also be useful for other games that have a large state space and that allow the order of moves in a sequence to be changed. Its effectiveness would depend on the properties of the game and the resources used, as evidenced by the fact that a record was not achieved with this method for the disjoint version of Morpion Solitaire.

Reducing the number of searches in Nested Monte-Carlo Search by using the AMAF heuristic for evaluation speeds up the search. We plan to investigate other possible ways of doing so. For example, we could try to combine our two methods or use transposition tables. In addition, we plan to test this method in other domains such as SameGame or Sudoku to show that it is domain independent.


Fig. 16. Grid for 146-moves record for the touching version of Morpion Solitaire set using AMAF method with Ave selection and level 5 search.

Table 1. Minimum, average, and maximum scores for 103 games completed using level 3 original Nested Monte-Carlo Search for the touching version of Morpion Solitaire in 35 days and 13 h.

Search          Statistic   Score
Level 5, AMAF   Average     146
Level 3, Orig   Minimum     111
Level 3, Orig   Average     125
Level 3, Orig   Maximum     141

5. Conclusion

We investigated two methods for reducing the number of lower level searches in Nested Monte-Carlo Search: simply reducing the number of lower level searches by a constant rate, and using the All-Moves-As-First (AMAF) heuristic to control the reduction in the number of lower level searches. Both methods sped up Nested Monte-Carlo Search for Morpion Solitaire, a single-player game with a relatively large state space. The performance improvement was better with the AMAF method. With it, we completed a level 5 search in approximately one month and achieved a world record by computer of 146 moves for the touching version of Morpion Solitaire.

This method should be useful for other problems that have a large state space. Its effectiveness depends on the properties of the game tree, as shown by the fact that a record was not achieved with this method for the disjoint version of Morpion Solitaire. We plan to investigate other possible ways to reduce the number of searches in Nested Monte-Carlo Search. For example, we could try to combine our two methods or use transposition tables. In addition, we plan to test this method in other domains such as SameGame or Sudoku to show that it is domain independent.

References

[1] Sylvain Gelly, David Silver, Combining online and offline knowledge in UCT, in: Proceedings of the 24th International Conference of Machine Learning (ICML 2007), pp. 273–280, June 2007.
[2] Levente Kocsis, Csaba Szepesvári, Bandit based Monte-Carlo planning, in: The 15th European Conference on Machine Learning (ECML), pp. 282–293, September 2006.
[3] Guillaume Chaslot, Steven de Jong, Jahn-Takeshi Saito, Jos Uiterwijk, Monte-Carlo tree search in production management problems, in: Proceedings of the 18th BeNeLux Conference on Artificial Intelligence, pp. 91–98, October 2006.
[4] Maarten P.D. Schadd, Mark H.M. Winands, Mandy J.W. Tak, Jos W.H.M. Uiterwijk, Single-player Monte-Carlo tree search for SameGame, Knowledge-Based Systems, Available online 27 August 2011, in press.
[5] Tristan Cazenave, Reflexive Monte-Carlo Search, in: Proceedings of Computer Games Workshop 2007 (CGW 2007), pp. 165–173, June 2007.
[6] Tristan Cazenave, Nested Monte-Carlo Search, in: Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI 2009), pp. 456–461, July 2009.
[7] Jean Méhat, Tristan Cazenave, Combining UCT and Nested Monte-Carlo Search for single-player general game playing, IEEE Transactions on Computational Intelligence and AI in Games 2 (4) (2010) 271–277.
[8] Tristan Cazenave, Nested Monte-Carlo Expression Discovery, in: 19th European Conference on Artificial Intelligence (ECAI 2010), pp. 1057–1058, August 2010.
[9] Morpion Solitaire, http://www.morpionsolitaire.com/.
[10] Erik D. Demaine, Martin L. Demaine, Arthur Langerman, Stefan Langerman, Morpion Solitaire, Theory of Computing Systems 39 (3) (2006) 439–453.
[11] Heikki Hyyrö, Timo Poranen, New Heuristics for Morpion Solitaire, Technical Report, University of Tampere, 2007.
[12] Tristan Cazenave, Nicolas Jouandeau, Parallel Nested Monte-Carlo Search, in: The 12th International Workshop on Nature Inspired Distributed Computing (NIDISC 2009), pp. 456–461, May 2009.
[13] Bernd Brügmann, Monte Carlo Go, Technical Report, Physics Department, Syracuse University, 1993.