AlphaZero Training and Config Guide
Training (Entry Point + Local Commands)
Training entry point:
python -m gomoku.scripts.train
CLI arguments:
- --config (required): YAML config path
- --mode (required): sequential | vectorize | mp | ray
- --device (optional): auto | cpu (default: auto)
Requires Python 3.13+.
cd alphazero
# If needed
# python -m venv .venv
# source .venv/bin/activate
# CPU dependencies
pip install -e ".[torch-cpu]" --extra-index-url https://download.pytorch.org/whl/cpu
# Sequential
python -m gomoku.scripts.train --config configs/config_alphazero_test.yaml --mode sequential --device cpu
# Vectorized (single process, multiple game slots)
python -m gomoku.scripts.train --config configs/config_alphazero_vectorize_test.yaml --mode vectorize --device cpu
# Multiprocessing
python -m gomoku.scripts.train --config configs/config_alphazero_mp_test.yaml --mode mp --device cpu
# Ray
pip install -e ".[ray,torch-cpu]" --extra-index-url https://download.pytorch.org/whl/cpu
python -m gomoku.scripts.train --config configs/5x5_local_test.yaml --mode ray --device cpu
Output Paths and Resume Behavior
{paths.run_prefix}/
{paths.run_id}/
ckpt/
iteration_0000.pt
iteration_0000.pt.optim
iteration_0001.pt
iteration_0001.pt.optim
...
replay/
shard-iter0001-....parquet
shard-iter0002-....parquet
...
eval_logs/
...
manifest.json
- First run: create manifest.json and save the initial champion checkpoint
- Resume: continue from manifest.json; restore optimizer state from *.optim when available
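The first-run/resume decision above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code; in particular, `latest_iteration` is an assumed manifest field name, not necessarily the real schema:

```python
import json
import os

def resolve_resume_state(run_dir):
    """Decide whether to start fresh or resume from manifest.json.

    Returns (start_iteration, checkpoint_path, optimizer_path).
    """
    manifest_path = os.path.join(run_dir, "manifest.json")
    if not os.path.exists(manifest_path):
        # First run: the caller creates manifest.json and saves the
        # initial champion checkpoint.
        return 0, None, None
    with open(manifest_path) as f:
        manifest = json.load(f)
    it = manifest["latest_iteration"]  # assumed field name (illustrative)
    ckpt = os.path.join(run_dir, "ckpt", f"iteration_{it:04d}.pt")
    optim = ckpt + ".optim"
    # Optimizer state is restored only when the .optim file is present.
    return it + 1, ckpt, optim if os.path.exists(optim) else None
```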
Configuration Template
This template shows the most common fields. Real configs in alphazero/configs/ may include additional options such as priority_replay, opponent_rates, elo_k_factor, and resign controls (resign_threshold, resign_enabled, min_moves_before_resign).
Section quick guide:
- board: Board size and game-rule toggles used by self-play and evaluation.
- model: Neural network width/depth settings that control model capacity.
- training: Iteration counts, optimization schedule, replay sampling, and data-loader behavior.
- mcts: Search behavior during play; C is the UCB exploration weight, num_searches is simulations per move, exploration_turns keeps early-game exploration higher, dirichlet_epsilon / dirichlet_alpha control the root-noise mix/shape, and batch_infer_size controls inference batch size.
- evaluation: Promotion gate and periodic benchmark settings for challenger vs. champion.
- parallel: Worker/process counts for vectorize, mp, and local ray modes.
- paths: Output location and run identifier (run_prefix, run_id, local vs. GCS).
- io: Replay shard sizing and local replay cache behavior.
- runtime: Optional Ray actor CPU/GPU allocation and async self-play/inference topology.
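As a concrete illustration of the root-noise mix mentioned for the mcts section, the standard AlphaZero formulation blends the network priors with Dirichlet noise as p' = (1 - ε)·p + ε·Dir(α). The sketch below shows that formula using only the standard library; it is illustrative, not the project's actual implementation:

```python
import random

def mix_root_noise(priors, dirichlet_epsilon, dirichlet_alpha):
    """Blend Dirichlet noise into root priors: (1 - eps) * p + eps * noise."""
    # Sample Dirichlet(alpha) by normalizing independent Gamma(alpha, 1) draws.
    gammas = [random.gammavariate(dirichlet_alpha, 1.0) for _ in priors]
    total = sum(gammas)
    noise = [g / total for g in gammas]
    return [(1 - dirichlet_epsilon) * p + dirichlet_epsilon * n
            for p, n in zip(priors, noise)]
```

With dirichlet_epsilon scheduled down over iterations (as in the template below), the noise contribution shrinks as training matures.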
# For schedulable config fields, you can use either fixed numeric values
# or scheduled params in the form: { until: ..., value: ... }.
# Example: learning_rate: 0.001 OR learning_rate: [{ until: 20, value: 0.002 }, { until: 60, value: 0.001 }]
# Scheduled values must be a list covering all iterations up to num_iterations.
board:
num_lines: 19
enable_doublethree: true
enable_capture: true
capture_goal: 5
gomoku_goal: 5
history_length: 5 # 5 by default
model:
num_hidden: 128
num_resblocks: 12
# num_planes / policy_channels / value_channels are fixed/derived in code
training:
num_iterations: 60 # total iterations
# note: for scheduled fields, the final `until` must match `num_iterations`
num_selfplay_iterations:
- { until: 20, value: 1200 }
- { until: 40, value: 1800 }
- { until: 60, value: 2400 }
num_epochs: 2
batch_size: 512
learning_rate:
- { until: 20, value: 0.0020 }
- { until: 40, value: 0.0010 }
- { until: 60, value: 0.0005 }
weight_decay: 0.0001
temperature:
- { until: 20, value: 1.0 }
- { until: 40, value: 0.7 }
- { until: 60, value: 0.4 }
replay_buffer_size: 500000
min_samples_to_train: 10000
random_play_ratio:
- { until: 20, value: 0.03 }
- { until: 40, value: 0.02 }
- { until: 60, value: 0.01 }
dataloader_num_workers: 4
dataloader_prefetch_factor: 2
enable_tf32: true
use_channels_last: true
mcts:
C: 2.0
num_searches:
- { until: 20, value: 400 }
- { until: 40, value: 800 }
- { until: 60, value: 1200 }
exploration_turns: 20
dirichlet_epsilon:
- { until: 20, value: 0.25 }
- { until: 40, value: 0.15 }
- { until: 60, value: 0.05 }
dirichlet_alpha: 0.3
batch_infer_size: 32
max_batch_wait_ms: 5
min_batch_size: 1
use_native: true
evaluation:
num_eval_games: 40
eval_every_iters: 2
promotion_win_rate:
- { until: 30, value: 0.55 }
- { until: 60, value: 0.58 }
num_baseline_games: 0
blunder_threshold: 0.5
initial_blunder_rate: 0.0
initial_baseline_win_rate: 0.0
blunder_increase_limit: 1.0
baseline_wr_min: 0.0
random_play_ratio: 0.0
eval_num_searches:
- { until: 30, value: 600 }
- { until: 60, value: 900 }
baseline_num_searches: 0
use_sprt: false
fast_eval:
enabled: false
num_games: 0
num_searches: 0
promote_threshold: 0.0
reject_threshold: 0.0
parallel:
num_parallel_games: 8 # only for 'vectorize' mode
mp_num_workers: 4 # only for 'mp' mode
ray_local_num_workers: 8 # only for 'ray' mode - local worker count
paths:
use_gcs: false # use local filesystem
run_prefix: runs # default when use_gcs=false
run_id: exp_20260217 # run identifier
# use_gcs=true example:
# use_gcs: true
# run_prefix: gomoku-prod-bucket # put your bucket name here (no gs:// prefix, no slash)
# run_id: exp_20260217
io:
initial_replay_shards: null
initial_replay_iters: null
max_samples_per_shard: 5000
local_replay_cache: /tmp/gmk_replay_cache
runtime: null # null is valid for local/simple runs
# ray mode example (CPU-only local):
# runtime:
# selfplay:
# actor_num_cpus: 1.0
# games_per_actor: 16
# inflight_per_actor: 16
# inference:
# actor_num_gpus: 0.0
# num_actors: 1
# actor_num_cpus: 1.0
# use_local_inference: true
# evaluation:
# num_workers: 2
# actor_num_cpus: 1.0
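The scheduled-parameter convention described at the top of the template can be resolved per iteration roughly as below. This is a sketch that assumes `until` is an inclusive upper bound on a 1-based iteration index; check the actual resolver in the code for the exact semantics:

```python
def resolve_schedule(field, iteration):
    """Return a config field's value at a given iteration.

    Handles both fixed values and schedules of the form
    [{"until": N, "value": v}, ...].
    """
    if not isinstance(field, list):
        return field  # plain fixed value
    for stage in field:
        if iteration <= stage["until"]:
            return stage["value"]
    raise ValueError(f"schedule does not cover iteration {iteration}")

learning_rate = [{"until": 20, "value": 0.002}, {"until": 60, "value": 0.001}]
resolve_schedule(learning_rate, 10)   # -> 0.002
resolve_schedule(learning_rate, 45)   # -> 0.001
resolve_schedule(0.0005, 99)          # -> 0.0005 (fixed value, any iteration)
```

This also shows why the final `until` must match num_iterations: an iteration past the last stage has no defined value.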
Ray/GCP Cluster Files
- alphazero/infra/cluster/cluster_elo1800.yaml - Ray cluster template used for GCP deployment.
- alphazero/infra/cluster/restart_cluster.sh - Cluster restart/redeploy script.
  - Renders the cluster config and runs ray up.
  - All required GCP_* variables must be set in the repo-root .env (the script uses :?required checks and exits early if any are missing).
  - For the full list, set every GCP_* entry from .env.example (including GCP_PROJECT, GCP_REGION, GCP_ZONE, GCP_REPO, GCP_CLUSTER_NAME, GCP_SSH_USER, GCP_CONTAINER_NAME, GCP_GPU_TAG, GCP_CPU_TAG, GCP_SA_NAME, GCP_HEAD_RESERVATION, GCP_SSH_PRIVATE_KEY, GCP_USER_EMAIL, GCP_BUCKET_NAME).
  - You can override any of them with exported env vars before running the script.
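The fail-fast `:?` checks mentioned above follow the standard bash parameter-expansion pattern. A minimal self-contained sketch (the default values and messages are illustrative; the real script reads these from .env):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Simulate values that would normally come from .env (illustrative defaults):
export GCP_PROJECT="${GCP_PROJECT:-demo-project}"
export GCP_REGION="${GCP_REGION:-us-central1}"

# The fail-fast pattern: each expansion aborts the script with a message
# if the variable is unset or empty, mirroring restart_cluster.sh.
: "${GCP_PROJECT:?set GCP_PROJECT in .env or export it}"
: "${GCP_REGION:?set GCP_REGION in .env or export it}"

echo "required GCP_* variables are set"
```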
# from repo root
# optional overrides (examples)
# export GCP_PROJECT=my-gcp-project
# export GCP_REGION=us-central1
# export DO_BUILD=true
# export DO_RESTART=true
bash alphazero/infra/cluster/restart_cluster.sh