qibotn/docs/contest_runners.md
jaunatisblue 915c24dc7b
Pre-contest stable version
2026-05-15 09:32:26 +08:00


Contest Runners

The tools/ directory contains two self-contained contest entrypoints:

  • tools/tn_contest_runner.py: general tensor-network path search and contraction.
  • tools/mps_contest_runner.py: Vidal/MPS multi-node expectation runner.

Both scripts keep circuit and observable definitions inside the script so a contest case can be edited in one place.

Environment

Run commands from the repository root:

cd /home/yx/qibotn

For Intel MPI on two nodes, use the launch form known to work (4 ranks in total, 2 per host):

mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 ...

Set TCM_ENABLE=1 for CPU runs:

export TCM_ENABLE=1

TN Workflow

List built-in TN contest cases:

python -u tools/tn_contest_runner.py list

TN path search uses dask by default. Without --dask-address, the script starts a local dask cluster. For multiple servers, start one scheduler and workers with the helper script, then pass the scheduler address to the search command.

Start the default two-node dask cluster:

cd /home/yx/qibotn
tools/manage_tn_dask_cluster.sh start

Check status:

cd /home/yx/qibotn
tools/manage_tn_dask_cluster.sh status

Stop the cluster:

cd /home/yx/qibotn
tools/manage_tn_dask_cluster.sh stop

The helper defaults are:

SCHEDULER_HOST=10.20.1.103
WORKER_HOSTS="10.20.1.103 10.20.1.102"
NWORKERS=48
NTHREADS=1
ROOT_DIR=/home/yx/qibotn
PYTHON_BIN=.venv/bin/python
DASK_WORKER_TTL="24 hours"
DASK_TICK_LIMIT="30 minutes"
DASK_LOST_WORKER_TIMEOUT="30 minutes"

Override them inline if needed:

WORKER_HOSTS="10.20.1.103 10.20.1.102" NWORKERS=48 \
  tools/manage_tn_dask_cluster.sh restart

Check that both nodes are connected by adding --tn-debug-trials to a small search. The output should include qibotn_dask_workers with both hosts.
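
The connectivity check above amounts to grouping worker addresses by host. A minimal sketch of that grouping follows. The exact format of the qibotn_dask_workers output is not documented here, so the helper names and the example addresses are hypothetical; the sketch takes plain dask-style worker addresses as input.

```python
from collections import Counter
from urllib.parse import urlsplit


def workers_per_host(worker_addresses):
    """Count workers per host from addresses like 'tcp://10.20.1.103:40001'."""
    return Counter(urlsplit(addr).hostname for addr in worker_addresses)


def missing_hosts(worker_addresses, expected_hosts):
    """Return the expected hosts that have no connected worker."""
    seen = workers_per_host(worker_addresses)
    return [host for host in expected_hosts if seen[host] == 0]


# Example addresses, made up for illustration (ports are arbitrary).
addresses = [
    "tcp://10.20.1.103:40001",
    "tcp://10.20.1.103:40002",
    "tcp://10.20.1.102:40001",
]
print(dict(workers_per_host(addresses)))
print(missing_hosts(addresses, ["10.20.1.103", "10.20.1.102"]))
```

An empty list from missing_hosts means every expected host contributed at least one worker.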

tools/tn_contest_runner.py search stops the external dask cluster after the search phase by default. Pass --keep-dask if you want to reuse the same dask cluster for several searches.

Use enough trials to fill the cluster. With the default two-node setup there are 96 worker slots, so --tn-search-repeats should be at least 96. The contest runner default is 2048.
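
The slot count follows from the helper defaults. The sketch below reproduces the arithmetic, assuming NWORKERS is counted per host (which is what the figure of 96 implies); the function names are illustrative and not part of the runner.

```python
def worker_slots(worker_hosts, nworkers, nthreads=1):
    """Total concurrent trial slots: hosts x workers per host x threads per worker."""
    return len(worker_hosts.split()) * nworkers * nthreads


def fills_cluster(search_repeats, slots):
    """True if there are at least as many trials as worker slots."""
    return search_repeats >= slots


# Values taken from the helper defaults listed above.
slots = worker_slots("10.20.1.103 10.20.1.102", nworkers=48, nthreads=1)
print(slots)                       # 96
print(fills_cluster(2048, slots))  # True
print(fills_cluster(8, slots))     # False
```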

Cotengra trials are CPU-bound and can hold the Python GIL long enough for dask to report "Event loop was unresponsive". Dask's defaults are far stricter than a long search needs: distributed.scheduler.worker-ttl is 5 minutes, distributed.admin.tick.limit is 3s, and distributed.deploy.lost-worker-timeout is 15s. The helper script raises these limits (DASK_WORKER_TTL, DASK_TICK_LIMIT, and DASK_LOST_WORKER_TIMEOUT above) so that dask does not kill workers during a search. The search is meant to be bounded by --tn-search-time instead; once that time has elapsed, the runner stops the external dask cluster unless --keep-dask is given.
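
For reference, dask reads configuration from environment variables with a DASK_ prefix, where a double underscore marks nesting. The sketch below derives those variable names for the three limits, assuming they correspond to dask's distributed.* configuration keys. How the helper script actually applies its DASK_WORKER_TTL, DASK_TICK_LIMIT, and DASK_LOST_WORKER_TIMEOUT settings is not shown in this document, so treat this as one possible mechanism rather than a description of the helper.

```python
def dask_env_name(config_key):
    """Map a dotted dask config key to the environment variable dask reads.

    Dots become double underscores; hyphens become single underscores.
    """
    parts = config_key.split(".")
    return "DASK_" + "__".join(p.replace("-", "_").upper() for p in parts)


# Raised values copied from the helper defaults in this document.
RAISED_LIMITS = {
    "distributed.scheduler.worker-ttl": "24 hours",
    "distributed.admin.tick.limit": "30 minutes",
    "distributed.deploy.lost-worker-timeout": "30 minutes",
}

for key, value in RAISED_LIMITS.items():
    print(f'export {dask_env_name(key)}="{value}"')
```

The variables must be set in the environment of the scheduler and of every worker process to take effect.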

Small correctness check against statevector:

python -u tools/tn_contest_runner.py validate \
  --case main1 \
  --nqubits 8 \
  --nlayers 2 \
  --torch-threads 4 \
  --tn-search-repeats 8 \
  --tn-search-time 5

Search and save contraction trees:

TCM_ENABLE=1 python -u tools/tn_contest_runner.py search \
  --case main1 \
  --torch-threads 48 \
  --dtype complex64 \
  --dask-address tcp://10.20.1.103:8786 \
  --tn-search-repeats 2048 \
  --tn-search-time 300

Contract using the saved tree on one node:

TCM_ENABLE=1 mpirun -np 2 python -u tools/tn_contest_runner.py contract \
  --mpi \
  --case main1 \
  --torch-threads 48 \
  --dtype complex64

Contract using the saved tree on two nodes:

TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
  python -u tools/tn_contest_runner.py contract \
  --mpi \
  --case main1 \
  --torch-threads 48 \
  --dtype complex64

Run search and contract in one command:

TCM_ENABLE=1 python -u tools/tn_contest_runner.py all \
  --case main1 \
  --torch-threads 48 \
  --dtype complex64 \
  --dask-address tcp://10.20.1.103:8786 \
  --tn-search-repeats 2048 \
  --tn-search-time 300
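
When the two phases need to be scripted separately (for example, to search with --keep-dask and contract later under mpirun), the command lines can be assembled programmatically. The sketch below only builds and prints argument lists using flags that appear in this document; the function names and defaults are illustrative, and nothing is executed.

```python
import shlex

RUNNER = "tools/tn_contest_runner.py"


def search_cmd(case, dask_address, repeats=2048, search_time=300, keep_dask=False):
    """Argument list for the path-search phase."""
    cmd = [
        "python", "-u", RUNNER, "search",
        "--case", case,
        "--dask-address", dask_address,
        "--tn-search-repeats", str(repeats),
        "--tn-search-time", str(search_time),
    ]
    if keep_dask:
        # Leave the external dask cluster running for the next search.
        cmd.append("--keep-dask")
    return cmd


def contract_cmd(case, nprocs=2, torch_threads=48, dtype="complex64"):
    """Argument list for the MPI contraction phase on one node."""
    return [
        "mpirun", "-np", str(nprocs),
        "python", "-u", RUNNER, "contract",
        "--mpi",
        "--case", case,
        "--torch-threads", str(torch_threads),
        "--dtype", dtype,
    ]


print(shlex.join(search_cmd("main1", "tcp://10.20.1.103:8786", keep_dask=True)))
print(shlex.join(contract_cmd("main1")))
```

To run the commands, pass each list to subprocess.run with check=True and TCM_ENABLE=1 added to the environment.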

Run only selected observables:

python -u tools/tn_contest_runner.py search \
  --case main2 \
  --observables open_zz

Tree files are written to trees/contest_tn/ by default. The tree filename contains case, observable, qubit count, layer count, and target slice count. If any of these change, search again.
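
The rule above can be pictured as a lookup keyed by those five fields. The sketch below uses a hypothetical naming pattern and file extension; the real runner may format filenames differently, so only the idea carries over: changing any field yields a different file, which means a new search is needed.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TreeKey:
    case: str
    observable: str
    nqubits: int
    nlayers: int
    target_slices: int

    def filename(self):
        # Hypothetical pattern and extension, for illustration only.
        return (
            f"{self.case}_{self.observable}"
            f"_q{self.nqubits}_l{self.nlayers}_s{self.target_slices}.pkl"
        )


def needs_search(key, tree_dir="trees/contest_tn"):
    """True if no saved tree file exists for this exact key."""
    return not (Path(tree_dir) / key.filename()).exists()


# Example values; only the case and observable names come from this document.
small = TreeKey("main2", "open_zz", nqubits=8, nlayers=2, target_slices=1)
deeper = TreeKey("main2", "open_zz", nqubits=8, nlayers=3, target_slices=1)
print(small.filename())
print(small.filename() != deeper.filename())
```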

Edit TN contest cases in tools/tn_contest_runner.py:

  • CASES: case name, circuit kind, observable list, default scale.
  • build_circuit: circuit definitions.
  • pauli_sum_observable: observable definitions.

MPS Workflow

List built-in Vidal/MPS contest cases:

python -u tools/mps_contest_runner.py list

Small correctness check against statevector:

mpirun -np 2 python -u tools/mps_contest_runner.py validate \
  --case main1 \
  --nqubits 8 \
  --nlayers 2 \
  --bond 64 \
  --torch-threads 4

Run one MPS case on one node:

TCM_ENABLE=1 mpirun -np 2 python -u tools/mps_contest_runner.py run \
  --case main1 \
  --torch-threads 48

Run one MPS case on two nodes:

TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
  python -u tools/mps_contest_runner.py run \
  --case main1 \
  --torch-threads 48

Run only one observable:

TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
  python -u tools/mps_contest_runner.py run \
  --case main1 \
  --observables ring_xz \
  --torch-threads 48

Override scale:

TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
  python -u tools/mps_contest_runner.py run \
  --case main1 \
  --nqubits 128 \
  --nlayers 24 \
  --bond 1024 \
  --torch-threads 48

Edit MPS contest cases in tools/mps_contest_runner.py:

  • CASES: case name, circuit kind, observable list, default scale and bond.
  • build_circuit: circuit definitions.
  • observable: observable definitions, including dense local terms.

Notes

  • TN uses path search plus contraction. Reuse tree files only for the exact same circuit, observable, qubit count, layer count, seed, and slicing setup.
  • TN path search defaults to dask. Use --tn-search-backend processpool only for fallback/debugging.
  • Prefer the default memory target, --tn-target-size 4294967296 (2^32). Do not force --tn-target-slices unless you have already verified that cotengra can find valid trees for that exact setting.
  • MPS/Vidal does not use contraction-tree search. It runs the circuit directly and reports trunc_sum and trunc_max.
  • Default TN contraction is the stable torch/quimb path. Do not pass --tn-contract-implementation cpp for contest runs.
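
To put the default memory target in perspective, the sketch below converts a target size into the memory of the largest intermediate tensor. It assumes that --tn-target-size counts tensor elements, as cotengra's target_size does; if the runner interprets the flag differently, the numbers do not apply.

```python
# Bytes per element for the complex dtypes used in this document.
ITEMSIZE = {"complex64": 8, "complex128": 16}


def largest_tensor_gib(target_size, dtype):
    """Memory of one tensor with target_size elements, in GiB."""
    return target_size * ITEMSIZE[dtype] / 2**30


print(largest_tensor_gib(4294967296, "complex64"))   # 32.0
print(largest_tensor_gib(4294967296, "complex128"))  # 64.0
```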