AI Search Strategies for Classic Board Game Mastery

Repo: Gomoku Genius: AI Search Strategies for Classic Board Game Mastery

Abstract

In this project, we explore five artificial intelligence game-playing algorithms: minimax search, alpha-beta pruning, truncated (cutoff) search, naive Monte Carlo Tree Search (MCTS), and an AlphaZero-style search that combines MCTS with an evaluation function. These algorithms address, to varying degrees, the two central issues of search complexity and evaluation function design. We implemented all of them and conducted a comparative analysis, evaluating their performance in terms of both match outcomes and the quality of their moves.

Problem 1: Implementation of Minimax Algorithm in Tic-Tac-Toe

Problem Description

Tic-Tac-Toe is a simple two-player zero-sum game in which the players alternate placing their pieces on a 3x3 board. The objective is to place three of one's own pieces in a row, column, or diagonal; the first player to do so wins. If the board fills up without either player achieving this, the game ends in a draw. Because the state space of Tic-Tac-Toe is small, we can perform a complete search (traversing all possible game states) to find the optimal strategy. Minimax search is therefore very effective here: it can quickly enumerate every possible continuation and always select the best move.

Algorithm Analysis

The minimax algorithm is a tree search that starts from the current state and considers all possible actions. It assigns the roles of maximizer and minimizer to the two players: one aims to maximize the final value, the other to minimize it. In our setup, the AI plays as the maximizer. The minimax function first checks whether the game has ended; if so, it assigns a value to the state according to which side won. Otherwise, it iterates over all legal actions, recursively calling itself to compute the value of each resulting sub-state, and updates the node value and best action according to whether the current player is the maximizer or the minimizer.
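To make the procedure concrete, the following is a minimal, self-contained sketch of minimax for Tic-Tac-Toe. It is not the project's actual implementation: the board is a flat list of 9 cells and the helper names are illustrative.

```python
from math import inf

# All eight winning lines on the 3x3 board.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if a line is complete, 'draw' if the board is full, else None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return 'draw' if all(v is not None for v in board) else None

def minimax(board, player, maximizer):
    """Return (value, best_move) of the position from the maximizer's point of view."""
    result = winner(board)
    if result is not None:                      # terminal state: score it directly
        return (0, None) if result == 'draw' else \
               ((1, None) if result == maximizer else (-1, None))
    best_value = -inf if player == maximizer else inf
    best_move = None
    for move in range(9):
        if board[move] is not None:
            continue
        board[move] = player                    # try the move
        value, _ = minimax(board, 'O' if player == 'X' else 'X', maximizer)
        board[move] = None                      # undo it
        if player == maximizer and value > best_value:
            best_value, best_move = value, move
        elif player != maximizer and value < best_value:
            best_value, best_move = value, move
    return best_value, best_move

# With optimal play from the empty board the value is 0 (a draw).
print(minimax([None] * 9, 'X', 'X'))
```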

Running Results

AI Performance: In human vs. AI matches, the minimax AI is effectively unbeatable regardless of who moves first. If both sides play optimally, the game always ends in a draw; against suboptimal play, the AI either wins or draws, so a human player can at best force a tie against the minimax-based AI.

Problem 2: Implementation and Acceleration of Alpha-Beta Search Algorithm

Alpha-beta pruning is a method for reducing the search space of minimax search, thereby improving its efficiency.

The essence of pruning is to use information from already-explored nodes to decide whether exploring the remaining nodes can still affect the result. During the search, each node maintains two values, $\alpha$ and $\beta$, representing the best values guaranteed so far for the maximizer and the minimizer, respectively; whenever $\alpha \geq \beta$, the remaining children of that node can be skipped. In the best case, the time complexity drops to $O(b^{\frac{m}{2}})$, where $b$ is the branching factor and $m$ is the search depth, which effectively lets the search look twice as deep for the same cost. In practice, however, how much is pruned depends on the order in which nodes are evaluated: with an ideal move ordering the best case is achieved, while with the worst ordering nothing is pruned and the complexity remains $O(b^m)$.
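As an illustration, the minimax sketch above can be extended with alpha-beta pruning as follows; this is again a sketch rather than the project's code, and it reuses the winner() helper and board representation from the previous example.

```python
from math import inf

def alphabeta(board, player, maximizer, alpha=-inf, beta=inf):
    """Minimax with alpha-beta pruning; same interface as the minimax sketch above."""
    result = winner(board)                      # winner() from the minimax sketch
    if result is not None:
        return (0, None) if result == 'draw' else \
               ((1, None) if result == maximizer else (-1, None))
    best_value = -inf if player == maximizer else inf
    best_move = None
    for move in range(9):
        if board[move] is not None:
            continue
        board[move] = player
        value, _ = alphabeta(board, 'O' if player == 'X' else 'X',
                             maximizer, alpha, beta)
        board[move] = None
        if player == maximizer:
            if value > best_value:
                best_value, best_move = value, move
            alpha = max(alpha, best_value)      # raise the maximizer's guarantee
        else:
            if value < best_value:
                best_value, best_move = value, move
            beta = min(beta, best_value)        # lower the minimizer's guarantee
        if alpha >= beta:                       # remaining siblings cannot change the result
            break
    return best_value, best_move

# Same optimal value as plain minimax, reached with far fewer node expansions.
print(alphabeta([None] * 9, 'X', 'X'))
```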

In our experiment implementing alpha-beta pruning, move computation was dramatically faster than with naive minimax; the measured running times are as follows:

Step | AlphaBetaSearchPlayer (s) | MinimaxSearchPlayer (s)
1    | 2.095                     | 7675.695
2    | 1.408                     | 646.496
3    | 0.359                     | 65.328
4    | 0.187                     | 5.342
5    | 0.016                     | 0.360
6    | 0.000                     | 0.078
7    | 0.000                     | 0.000

The branching factor varies at each level, and the remaining search depth is not simply $12 - (\text{number of moves played}) + 1$, but the results clearly show that alpha-beta search is dramatically faster than naive minimax.

Problem 3: Truncated Search and Evaluation Function Design

In full-scale Gomoku the search tree is so deep that even the alpha-beta AI cannot compute its first move within a reasonable time. We therefore employ truncated (cutoff) search: when a shallow search can already determine the outcome, it returns the exact game value; otherwise, the search is cut off at a fixed depth and an evaluation function scores the current state. The evaluation function considers:

  • The counts of live fours, straight fours, live threes, sleeping threes, live twos
  • The distance of the furthest piece from the board center, normalized to the range [0, 1]

Weights are assigned according to the threat level of each pattern, with higher weights for patterns closer to winning. Scoring is asymmetric between the current player and the opponent: the opponent's score is subtracted from the player's, and the weights for the two sides differ, reflecting the strategic importance of forming key patterns while prioritizing central control and offense.
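The sketch below shows one way such a cutoff search and pattern-based evaluation could be structured. The state interface (count_patterns, center_score, legal_moves, play, and so on) and all numeric weights are illustrative placeholders, not the project's actual values.

```python
from math import inf

# Illustrative pattern weights only; the project's actual weights are tuned so
# that near-winning patterns dominate, and they differ for player and opponent.
PATTERN_WEIGHTS = {'live_four': 10000, 'straight_four': 5000,
                   'live_three': 1000, 'sleeping_three': 200, 'live_two': 50}
OPPONENT_SCALE = 1.1   # illustrative asymmetry between own and opponent patterns
CENTER_WEIGHT = 10     # weight of the center-control term in [0, 1]

def evaluate(state, player, opponent):
    """Heuristic score: player's patterns minus the (asymmetrically weighted)
    opponent's patterns, plus a center-control term."""
    score = 0.0
    for pattern, weight in PATTERN_WEIGHTS.items():
        score += weight * state.count_patterns(player, pattern)
        score -= OPPONENT_SCALE * weight * state.count_patterns(opponent, pattern)
    return score + CENTER_WEIGHT * state.center_score(player)

def cutoff_alphabeta(state, depth, player, maximizer, opponent,
                     alpha=-inf, beta=inf):
    """Alpha-beta search truncated at `depth`; evaluate() stands in for the
    exact game value below the cutoff."""
    if state.is_terminal():
        return state.terminal_value(maximizer), None
    if depth == 0:                               # cutoff: score the position heuristically
        return evaluate(state, maximizer, opponent), None
    best_value = -inf if player == maximizer else inf
    best_move = None
    for move in state.legal_moves():
        child = state.play(move, player)
        value, _ = cutoff_alphabeta(child, depth - 1, state.next_player(player),
                                    maximizer, opponent, alpha, beta)
        if player == maximizer:
            if value > best_value:
                best_value, best_move = value, move
            alpha = max(alpha, best_value)
        else:
            if value < best_value:
                best_value, best_move = value, move
            beta = min(beta, best_value)
        if alpha >= beta:
            break
    return best_value, best_move
```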

In the recorded match results, the player using truncated alpha-beta search (CuttingOffAlphaBetaSearchPlayer) significantly outperforms the naive MCTS player (MCTSPlayer). The results of 10 matches per pairing are as follows:

Player 1                        | Player 2                        | Player 1 Wins | Player 2 Wins | Ties
MCTSPlayer                      | CuttingOffAlphaBetaSearchPlayer | 1             | 8             | 1
CuttingOffAlphaBetaSearchPlayer | MCTSPlayer                      | 9             | 1             | 0

The results indicate that the truncated alpha-beta search achieves a notably higher win rate and makes more rational moves. The naive MCTS, by contrast, relies on random playouts to evaluate newly expanded nodes; this randomness and purely exploratory behavior lower its search efficiency and hurt its match results.

From the gameplay perspective, MCTS's moves are highly exploratory. Because its policy is not driven by an evaluation of the current position but by balancing exploration and exploitation over many random playouts, MCTS often tries unconventional and seemingly irrational moves. It frequently fails to respond to threatening patterns, does not consistently contest the center, and sometimes only blocks once the opponent already has a live four. In Gomoku, therefore, naive MCTS with purely random simulations is not an effective playout method.
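For reference, a minimal sketch of naive MCTS with purely random playouts is given below. As before, the game-state interface (legal_moves, play, is_terminal, result, current_player) is a hypothetical stand-in for the project's board class.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children = []
        self.untried = list(state.legal_moves())
        self.visits, self.wins = 0, 0.0

    def uct_child(self, c=1.4):
        # Balance exploitation (win rate) against exploration (visit count).
        return max(self.children, key=lambda n: n.wins / n.visits +
                   c * math.sqrt(math.log(self.visits) / n.visits))

def mcts(root_state, n_simulations=1000):
    root = Node(root_state)
    for _ in range(n_simulations):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCT.
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: add one child for a previously untried move.
        if node.untried:
            move = node.untried.pop(random.randrange(len(node.untried)))
            node.children.append(Node(node.state.play(move), node, move))
            node = node.children[-1]
        # 3. Simulation: random playout to the end of the game.
        state = node.state
        while not state.is_terminal():
            state = state.play(random.choice(list(state.legal_moves())))
        # For brevity, results are scored from the root player's perspective
        # (1 win, 0.5 draw, 0 loss); a full implementation flips the sign per level.
        result = state.result(root_state.current_player())
        # 4. Backpropagation: update statistics along the selected path.
        while node is not None:
            node.visits += 1
            node.wins += result
            node = node.parent
    # Play the most-visited move at the root.
    return max(root.children, key=lambda n: n.visits).move
```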

Problem 5: Comparison Between MCTS and AlphaZero

In our implementation, the AlphaZero-style player incorporates the hand-crafted evaluation function into MCTS. According to the match results, AlphaZeroPlayer has a clear advantage, defeating MCTSPlayer in almost every game, with only a few draws or losses. This indicates that the AlphaZero-style improvements indeed make the AI's moves more rational.

To further judge whether AlphaZero moves more rationally, we analyzed the games themselves. Naive MCTS relies on random playouts to evaluate positions, which leads to irrational moves. By introducing the hand-crafted evaluation function from Problem 3 into MCTS, AlphaZero significantly improves the accuracy of position assessment, making the search more effective and better balancing exploration and exploitation. During search it favors moves with high evaluation values, so it can identify the key points of the game: defending against potential threats, attacking the opponent's weaknesses, and forming advantageous patterns. This evaluation-guided search makes AlphaZero's moves more strategic and targeted, which is why it gains the advantage in matches.
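One plausible way to realize this combination, assuming the evaluation function simply replaces the random playout at leaf nodes (a sketch, not necessarily the project's exact design), is:

```python
import math

def evaluate_leaf(state, root_player):
    """Replace step 3 (the random playout) in the MCTS sketch above: score the
    leaf with the Problem 3 evaluate() heuristic and squash it into [0, 1] so it
    can be backed up like a simulated game result. state.opponent() and the
    scaling constant are illustrative assumptions."""
    if state.is_terminal():
        return state.result(root_player)
    score = evaluate(state, root_player, state.opponent(root_player))
    return 1.0 / (1.0 + math.exp(-score / 1000.0))
```

Selection, expansion, and backpropagation stay exactly as in the naive MCTS sketch; only the playout step changes, which is what makes the position estimates far less noisy than random rollouts.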

Player 1        | Player 2        | Player 1 Wins | Player 2 Wins | Ties
MCTSPlayer      | AlphaZeroPlayer | 1             | 8             | 1
AlphaZeroPlayer | MCTSPlayer      | 10            | 0             | 0

In conclusion, by introducing the hand-crafted evaluation function from Problem 3, AlphaZero achieves higher accuracy and efficiency in the search process, making the AI's moves more rational. Both the match results and the game-by-game analysis support this conclusion. Although this AlphaZero-style player does not reach the level of AlphaGo Zero trained with deep learning, its performance as a Gomoku AI is commendable.

Conclusion

In this project, we successfully implemented five game-playing algorithms and evaluated them in terms of match outcomes and move quality. We found that alpha-beta pruning combined with truncated search reduces search complexity to some extent, but remains limited by the design of the evaluation function. Naive MCTS escapes the constraints of the evaluation function, yet evaluating positions through random playouts makes its moves slower and less effective.

In contrast, the AlphaZero-style algorithm, which introduces the evaluation function into MCTS, makes move selection more rational and effective. Note, however, that the evaluation function in this project is built from hand-crafted features and therefore has inherent limitations; in practice, a learned evaluation function (for example, a deep neural network as in AlphaGo Zero) could be used to overcome them.