Search: From Alpha-Beta to PVS
Alpha-Beta Pruning
Alpha-beta pruning is the foundational optimization over naive minimax. The idea: maintain a window representing the best scores each player can already guarantee. If a node's score can't possibly influence the final decision — because it falls outside this window — prune the remaining children.
The cutoff condition is straightforward: if α ≥ β, stop searching siblings. With perfect move ordering, this reduces the node count from O(b^d) to O(b^(d/2)) (equivalently, the effective branching factor drops from b to √b) — searching to depth 10 with the same cost as naive minimax at depth 5.
This was the first search algorithm implemented. It improved over naive minimax dramatically, but wasn't enough for depth 10 on a 19x19 board in reasonable time. The key bottleneck wasn't the algorithm — it was move ordering. Alpha-beta only prunes well when the best move is examined first.
Implementation note: the engine uses fail-soft alpha-beta (returned scores can fall outside the current window), while the diagram below intentionally presents a fail-hard walkthrough for conceptual clarity.
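The cutoff logic can be sketched on a toy game tree. This is a minimal fail-soft illustration in the spirit of the engine's search, not its actual code; `Node` is an invented stand-in for a real board position:

```cpp
#include <algorithm>
#include <vector>

// Toy game tree: a leaf carries a score; an inner node carries children.
struct Node {
    int score;                     // used only when children is empty
    std::vector<Node> children;
};

// Fail-soft alpha-beta: the returned value may fall outside [alpha, beta].
int alphaBeta(const Node &n, int depth, int alpha, int beta, bool maximizing) {
    if (depth == 0 || n.children.empty())
        return n.score;
    int best = maximizing ? -1000000 : 1000000;
    for (size_t i = 0; i < n.children.size(); ++i) {
        int s = alphaBeta(n.children[i], depth - 1, alpha, beta, !maximizing);
        if (maximizing) {
            best = std::max(best, s);
            alpha = std::max(alpha, s);
        } else {
            best = std::min(best, s);
            beta = std::min(beta, s);
        }
        if (alpha >= beta)         // cutoff: remaining siblings cannot matter
            break;
    }
    return best;
}
```

Note that the window tightens as the loop progresses, so a well-ordered first child makes later cutoffs far more likely, which is exactly why move ordering dominates performance.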
Move Ordering
Alpha-beta's effectiveness depends entirely on examining the best move first. The engine uses a three-tier ordering system:
- TT hash move: If a previous search found a best move for this position (stored in the transposition table), try it first. This is often the principal variation move from a shallower iteration, causing immediate cutoffs.
- Killer moves: Two moves per depth level that caused beta cutoffs at sibling nodes. The heuristic: a move that refutes one sibling often refutes others. Stored in killerMoves[MAX_DEPTH+1][2], updated on every cutoff:
Source: minimax/src/gomoku/search/Minimax.cpp:359-362
```cpp
// minimax/src/gomoku/search/Minimax.cpp:359-362
if (killerMoves[depth][0] != mv && killerMoves[depth][1] != mv) {
    killerMoves[depth][1] = killerMoves[depth][0];
    killerMoves[depth][0] = mv;
}
```
Just four lines — shift the old killer out, insert the new one — but this simple heuristic provides substantial pruning improvements by ensuring refutation moves are tried early.
- Evaluation-sorted: Remaining moves are scored by the evaluation function and sorted to maximize cutoffs — descending for MAX nodes (best-for-maximizer first), ascending for MIN nodes (best-for-minimizer first). Expensive per-node, but the pruning it enables more than compensates.
This ordering is what makes depth 10 feasible. Without it, even alpha-beta can't overcome the 200+ branching factor of a 19x19 board.
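The three tiers can be sketched as a single sort pass. This is an illustrative reconstruction, not the engine's code: `ScoredMove`, `orderMoves`, and the `tier` field are invented names, and the eval-descending comparison corresponds to a MAX node.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

typedef std::pair<int, int> Move;

// Illustrative record: `eval` would come from the evaluation function.
struct ScoredMove {
    Move move;
    int eval;
    int tier;                      // 0 = TT hash move, 1 = killer, 2 = rest
};

// Three-tier ordering: hash move first, killers next, everything else by
// evaluation (descending here, i.e. as at a MAX node).
void orderMoves(std::vector<ScoredMove> &moves, const Move &hashMove,
                const Move killers[2]) {
    for (size_t i = 0; i < moves.size(); ++i) {
        ScoredMove &sm = moves[i];
        if (sm.move == hashMove)
            sm.tier = 0;
        else if (sm.move == killers[0] || sm.move == killers[1])
            sm.tier = 1;
        else
            sm.tier = 2;
    }
    std::sort(moves.begin(), moves.end(),
              [](const ScoredMove &a, const ScoredMove &b) {
                  if (a.tier != b.tier) return a.tier < b.tier;
                  return a.eval > b.eval;   // best-for-maximizer first
              });
}
```

At a MIN node the final comparison would flip to ascending, matching the evaluation-sorted tier described above.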
Transposition Table
Positions can be reached via different move orders but produce identical game states. The transposition table (TT) stores previously evaluated positions to avoid redundant work:
Source: minimax/src/gomoku/search/Minimax.cpp:260-285
```cpp
// minimax/src/gomoku/search/Minimax.cpp:260-285
inline bool probeTT(Board *board, int depth, int &alpha, int &beta,
                    std::pair<int, int> &bestMove, int &scoreOut) {
    uint64_t h = board->getHash();
    boost::unordered_map<uint64_t, TTEntry>::iterator it = transTable.find(h);
    if (it == transTable.end()) return false;
    const TTEntry &e = it->second;
    bestMove = e.bestMove;
    if (e.depth < depth) return false; // only ordering info
    scoreOut = e.score;
    if (e.flag == EXACT) {
        board->flushCaptures();
        return true;
    }
    if (e.flag == LOWERBOUND)
        alpha = std::max(alpha, scoreOut);
    else /* UPPERBOUND */
        beta = std::min(beta, scoreOut);
    if (alpha >= beta) {
        board->flushCaptures();
        return true;
    }
    return false;
}
```
Each entry stores {score, depth, bestMove, boundType}, keyed by Zobrist hash. Three bound types are used:
- EXACT: score was within the alpha-beta window — the true minimax value
- LOWERBOUND: score ≥ β, caused a cutoff — we know the score is at least this high
- UPPERBOUND: score ≤ α, all moves failed low — we know the score is at most this high
On lookup, the stored bound narrows the current window. If the narrowed window causes α ≥ β, the position is resolved without searching. Even when the stored depth is too shallow for a full cutoff, the best move is still used for move ordering — providing the hash move for tier 1.
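The store side that produces these three bound types can be sketched as follows. This is a simplified stand-in for the engine's `storeTT` (using `std::map` instead of the engine's `boost::unordered_map`, and omitting any replacement policy); the key idea is classifying the score against the window that was in effect when the node was entered.

```cpp
#include <cstdint>
#include <map>
#include <utility>

enum Bound { EXACT, LOWERBOUND, UPPERBOUND };

struct TTEntry {
    int score;
    int depth;
    std::pair<int, int> bestMove;
    Bound flag;
};

// Classify against the entry window (alphaOrig) and the final beta:
//  - score <= alphaOrig : every move failed low  -> UPPERBOUND
//  - score >= beta      : a move caused a cutoff -> LOWERBOUND
//  - otherwise          : searched inside window -> EXACT (true minimax value)
void storeTT(std::map<std::uint64_t, TTEntry> &table, std::uint64_t hash,
             int depth, std::pair<int, int> bestMove, int score,
             int alphaOrig, int beta) {
    TTEntry e;
    e.score    = score;
    e.depth    = depth;
    e.bestMove = bestMove;
    if (score <= alphaOrig)
        e.flag = UPPERBOUND;
    else if (score >= beta)
        e.flag = LOWERBOUND;
    else
        e.flag = EXACT;
    table[hash] = e;
}
```

Note the use of the original alpha: by the time the node returns, alpha may have been raised by its own children, so comparing against the updated value would misclassify EXACT scores as fail-lows.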
Iterative Deepening
Rather than jumping straight to the target depth, the engine searches depth 1, then depth 2, then depth 3... up to maxDepth or a time limit. This might seem wasteful — repeating shallower searches — but each iteration populates the transposition table, so deeper iterations benefit from much better move ordering. The TT's best moves from the depth-N iteration become the hash moves at depth N+1, dramatically improving pruning.
Medium difficulty uses iterative deepening with a 0.4-second time limit. On timeout, the engine returns the best move from the deepest completed iteration — ensuring it always has a valid response regardless of how deep it managed to search.
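The driver loop can be sketched as below. `searchToDepth` is a hypothetical stand-in for a full fixed-depth alpha-beta search; the real engine also checks the clock inside the search, not just between iterations.

```cpp
#include <chrono>
#include <utility>

typedef std::pair<int, int> Move;

// Iterative-deepening driver (sketch): keep the result of the deepest
// completed iteration, stopping once the time budget is spent.
Move iterativeDeepening(int maxDepth, double timeLimitSec,
                        Move (*searchToDepth)(int depth)) {
    using Clock = std::chrono::steady_clock;
    Clock::time_point start = Clock::now();
    Move best(-1, -1);
    for (int depth = 1; depth <= maxDepth; ++depth) {
        std::chrono::duration<double> elapsed = Clock::now() - start;
        if (elapsed.count() >= timeLimitSec)
            break;                       // keep last completed iteration
        best = searchToDepth(depth);     // also warms the TT for depth + 1
    }
    return best;
}
```

With a 0.4-second budget this typically completes only a few iterations on a dense board, but each one leaves hash moves behind for the next.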
Quiescence Search (Scoped by Mode)
Quiescence is not a universal minimax-engine feature in this project. It depends on the search path selected by difficulty:
Source: minimax/src/ws/request_handlers.cpp:22-26, minimax/src/gomoku/search/Minimax.cpp:389-393, minimax/src/gomoku/search/Minimax.cpp:593-596
| Path | Mode | Quiescence at depth 0? |
|---|---|---|
| Alpha-Beta (getBestMove) | easy | Yes, but only when capture rules are enabled (board->getEnableCapture()) |
| Iterative Deepening (iterativeDeepening) | medium | Same as the alpha-beta path |
| PVS (getBestMovePVS) | hard | No, returns static evaluation immediately |
For the alpha-beta/iterative-deepening path, depth 0 can continue with capture-only moves. This reduces horizon-effect blunders in capture-heavy positions.
The stand-pat score (static evaluation at depth 0) serves as a lower bound: the current player can always choose not to capture. If the stand-pat already causes a cutoff (≥ β for MAX, ≤ α for MIN), the position is returned immediately. Otherwise, only capture moves that improve the score are searched deeper. When no captures remain, the position is "quiet" and the static evaluation is reliable.
The quiescence search uses fail-soft semantics — consistent with the main alpha-beta and PVS search paths. Stand-pat cutoffs return the actual evaluation score (not the clamped bound), giving the transposition table more accurate entries.
This matters particularly when captures are enabled: a position that looks equal statically might have a forced capture sequence that swings the evaluation dramatically.
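The stand-pat logic can be sketched in the engine's non-negamax style. `Position` and its `captures` list are illustrative stand-ins for the Board API; the shape of the cutoffs and the fail-soft returns follow the description above.

```cpp
#include <algorithm>
#include <vector>

// Illustrative stand-in for the Board API: a static evaluation plus the
// positions reachable by capture moves only.
struct Position {
    int staticEval;
    std::vector<Position> captures;
};

// Capture-only quiescence, non-negamax, fail-soft.
int quiesce(const Position &p, int alpha, int beta, bool maximizing) {
    int standPat = p.staticEval;   // the side to move may decline to capture
    if (maximizing && standPat >= beta)
        return standPat;           // fail-soft: real eval, not the clamped bound
    if (!maximizing && standPat <= alpha)
        return standPat;
    if (maximizing) alpha = std::max(alpha, standPat);
    else            beta  = std::min(beta, standPat);
    int best = standPat;
    for (size_t i = 0; i < p.captures.size(); ++i) {
        int s = quiesce(p.captures[i], alpha, beta, !maximizing);
        if (maximizing) {
            best = std::max(best, s);
            alpha = std::max(alpha, s);
        } else {
            best = std::min(best, s);
            beta = std::min(beta, s);
        }
        if (alpha >= beta)
            break;
    }
    return best;
}
```

A position with no captures simply returns its static evaluation, which is the "quiet" base case.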
Principal Variation Search (PVS)
PVS builds on a key insight: with good move ordering, the first child at each node is usually the best move (the "principal variation"). Instead of searching every child with the full window, PVS searches the first child normally, then uses a probe window for all subsequent children — essentially asking "is this move better than my current best?"
In this implementation, PVS (hard) does not apply quiescence at depth 0. It returns the static evaluation directly at the depth cutoff. The hard evaluation's pattern-counted scoring at depth 10 compensates — the deeper search depth resolves most tactical sequences that quiescence would otherwise need to handle.
Null-window adaptation: The idea is simple — after finding a good move, test each remaining move with the cheapest possible search to confirm it's worse. Only if the cheap test says "this might actually be better" do we spend time on a full re-search.
The standard PVS formulation (in negamax) uses a width-1 null window (-alpha-1, -alpha), i.e. (α, α+1) from the parent's perspective. This engine uses non-negamax minimax with explicit isMaximizing, and passes (alpha+1, alpha+1) — a width-0 window where α = β. In this configuration, the probe subtree cuts off after evaluating a single best-ordered move at each ply, since α ≥ β is satisfied as soon as the first child returns. This makes each probe cheaper (fewer nodes visited) at the cost of a less precise bound; the re-search with the full window corrects any imprecision when needed. The standard width-1 window was tested but is incompatible with the non-negamax sign conventions used in the rest of the search stack. In non-negamax minimax, maximizing and minimizing players maintain separate alpha/beta update logic; a width-1 window causes asymmetric behavior depending on whether the probe subtree root is maximizing or minimizing — the bound tightening works correctly for one side but produces incorrect cutoffs for the other. The width-0 window sidesteps this by making the cutoff symmetric: α ≥ β triggers identically regardless of which side moves first in the subtree.
If a probe returns a score s with α < s < β (the parent's full window), the PV assumption was wrong: this child is better than expected. The engine re-searches with the full window to get the exact score. In practice, re-searches are rare when move ordering is good, making PVS faster than standard alpha-beta.
Source: minimax/src/gomoku/search/Minimax.cpp:580-652
```cpp
// minimax/src/gomoku/search/Minimax.cpp:580-652
int pvs(Board *board, int depth, int alpha, int beta, int currentPlayer,
        int lastX, int lastY, bool isMaximizing, EvalFn evalFn) {
    // ... TT probe, terminal check ...
    bool firstChild = true;
    for (size_t i = 0; i < scored.size(); ++i) {
        const std::pair<int, int> &mv = scored[i].move;
        UndoInfo ui = board->makeMove(mv.first, mv.second);
        int next = board->getNextPlayer();
        int score;
        if (firstChild) {
            // full window
            score = pvs(board, depth - 1, alpha, beta, next,
                        mv.first, mv.second, !isMaximizing, evalFn);
            firstChild = false;
        } else {
            // null window (width-0: alpha = beta = alpha + 1)
            score = pvs(board, depth - 1, alpha + 1, alpha + 1, next,
                        mv.first, mv.second, !isMaximizing, evalFn);
            // if it produced something interesting, re-search
            if (score > alpha && score < beta) {
                score = pvs(board, depth - 1, alpha, beta, next,
                            mv.first, mv.second, !isMaximizing, evalFn);
            }
        }
        board->undoMove(ui);
        updateBestAndBounds(isMaximizing, score, mv, bestEval, bestMove, alpha, beta);
        if (alpha >= beta) {
            // update killer moves on cutoff
            if (!isKillerMove(depth, mv)) {
                killerMoves[depth][1] = killerMoves[depth][0];
                killerMoves[depth][0] = mv;
            }
            break;
        }
    }
    storeTT(hash, depth, bestMove, bestEval, alphaOrig, beta);
    return bestEval;
}
```
PVS was implemented with help from the chessprogramming.org PVS article for the null-window re-search logic. The reference pseudocode uses negamax with (-alpha-1, -alpha); this engine's non-negamax minimax form passes (alpha+1, alpha+1) directly — the analogous null window, narrowed to width 0 and without the negation convention.
Result
Benchmarks were measured on an École 42 Paris lab Dell OptiPlex 7400 AIO workstation (12th Gen Intel Core i7-12700, 12 cores / 20 threads, 15 GiB RAM), running Ubuntu 22.04.4 LTS (kernel 5.15.0-170-generic), with g++ 10.5.0 and CPU governor powersave. Measurements used the release build (-O2) and single-thread execution (./search_benchmark).
Across repeated runs on the opening scenario (~200 legal moves), plain alpha-beta at depth 5 (easy) took about 4.07s, while PVS at depth 10 (hard) took about 2.33s. medium (iterative deepening with a 0.4s time limit) stayed near 0.40s. As the board gets denser, easy scales poorly (13.6 to 29.2s) because plain alpha-beta at depth 5 has no iterative deepening to warm the transposition table and no killer-move ordering. hard remains much lower (0.92 to 4.71s) despite searching to depth 10, because PVS benefits from TT-warmed move ordering and killer moves that drastically improve pruning. These benchmarks are directly reproducible by running make benchmark && ./search_benchmark in minimax/.
Observed ranges from repeated runs (iterations=1, warmup=0):
| Scenario | easy (ms) | medium (ms) | hard (ms) |
|---|---|---|---|
| opening | 4066.57 - 4077.10 | 401.13 - 405.25 | 2330.71 - 2331.39 |
| midgame | 13607.93 - 13796.55 | 404.99 - 409.15 | 920.02 - 932.39 |
| late_midgame | 28588.11 - 29238.21 | 437.79 - 440.14 | 4706.83 - 4711.01 |