Contest Runners
This directory contains two self-contained contest entrypoints:
- tools/tn_contest_runner.py: general tensor-network path search and contraction.
- tools/mps_contest_runner.py: Vidal/MPS multi-node expectation runner.
Both scripts keep circuit and observable definitions inside the script so a contest case can be edited in one place.
Environment
Run commands from the repository root:
cd /home/yx/qibotn
For Intel MPI on two nodes, use the invocation form that is known to work (four ranks in total, two per host):
mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 ...
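To make the effect of -np 4 -perhost 2 concrete, the sketch below reproduces the placement rule that Intel MPI documents for -perhost: consecutive ranks are assigned to each host in round-robin order. The function name and the host labels are illustrative assumptions; the real host names come from the hostfile.

```python
def place_ranks(nprocs, perhost, hosts):
    """Map each MPI rank to a host: `perhost` consecutive ranks per host, round-robin."""
    return {rank: hosts[(rank // perhost) % len(hosts)] for rank in range(nprocs)}


# Placeholder host labels (assumption); the hostfile defines the real ones.
placement = place_ranks(nprocs=4, perhost=2, hosts=["node0", "node1"])
for rank, host in placement.items():
    print(f"rank {rank} -> {host}")
```

With two hosts this puts ranks 0 and 1 on the first host and ranks 2 and 3 on the second, which matches the two-node commands used throughout this file.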
Set TCM_ENABLE=1 for CPU runs:
export TCM_ENABLE=1
TN Workflow
List built-in TN contest cases:
python -u tools/tn_contest_runner.py list
TN path search uses dask by default. Without --dask-address, the script starts
a local dask cluster. For multiple servers, start one scheduler and workers
with the helper script, then pass the scheduler address to the search command.
Start the default two-node dask cluster:
cd /home/yx/qibotn
tools/manage_tn_dask_cluster.sh start
Check status:
cd /home/yx/qibotn
tools/manage_tn_dask_cluster.sh status
Stop the cluster:
cd /home/yx/qibotn
tools/manage_tn_dask_cluster.sh stop
The helper defaults are:
SCHEDULER_HOST=10.20.1.103
WORKER_HOSTS="10.20.1.103 10.20.1.102"
NWORKERS=48
NTHREADS=1
ROOT_DIR=/home/yx/qibotn
PYTHON_BIN=.venv/bin/python
DASK_WORKER_TTL="24 hours"
DASK_TICK_LIMIT="30 minutes"
DASK_LOST_WORKER_TIMEOUT="30 minutes"
Override them inline if needed:
WORKER_HOSTS="10.20.1.103 10.20.1.102" NWORKERS=48 \
tools/manage_tn_dask_cluster.sh restart
Check that both nodes are connected by adding --tn-debug-trials to a small
search. The output should include qibotn_dask_workers with both hosts.
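The exact format of the qibotn_dask_workers output is not reproduced here, so the following is only a sketch of the check itself. It assumes you have a list of worker addresses in dask's usual tcp://host:port form; the helper names and the sample addresses are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def workers_per_host(worker_addresses):
    """Count workers per host from addresses such as 'tcp://10.20.1.103:40001'."""
    return Counter(urlparse(addr).hostname for addr in worker_addresses)


def missing_hosts(worker_addresses, expected_hosts):
    """Return the expected hosts that contributed no worker at all."""
    seen = workers_per_host(worker_addresses)
    return [host for host in expected_hosts if seen[host] == 0]


# Made-up sample addresses (assumption); ports are arbitrary.
sample = [
    "tcp://10.20.1.103:40001",
    "tcp://10.20.1.103:40002",
    "tcp://10.20.1.102:40001",
]
counts = workers_per_host(sample)
print(dict(counts))
print(missing_hosts(sample, ["10.20.1.103", "10.20.1.102"]))
```

An empty list from missing_hosts means every expected host has at least one connected worker.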
tools/tn_contest_runner.py search stops the external dask cluster after the
search phase by default. Pass --keep-dask if you want to reuse the same dask
cluster for several searches.
Use enough trials to fill the cluster. With the default two-node setup there are
96 worker slots, so --tn-search-repeats should be at least 96. The contest
runner default is 2048.
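The 96-slot figure follows from the helper defaults. A minimal sketch of the arithmetic, assuming NWORKERS counts workers per host and each worker thread is one slot (both assumptions are consistent with the numbers above but are not confirmed by the helper's source):

```python
import math


def worker_slots(worker_hosts, nworkers, nthreads=1):
    """Total trial slots: hosts x workers per host x threads per worker."""
    return len(worker_hosts.split()) * nworkers * nthreads


slots = worker_slots("10.20.1.103 10.20.1.102", nworkers=48, nthreads=1)
repeats = 2048  # contest runner default for --tn-search-repeats

print("slots:", slots)
print("fills cluster:", repeats >= slots)
# Number of rounds needed if every trial took the same time (idealized).
print("rounds:", math.ceil(repeats / slots))
```

If you override WORKER_HOSTS or NWORKERS, recompute the slot count and keep --tn-search-repeats at or above it.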
Cotengra trials are CPU-bound and can hold the Python GIL long enough for dask
to report "Event loop was unresponsive". The dask defaults are much stricter
than the helper's: scheduler.worker-ttl=5 minutes, admin.tick.limit=3s, and
deploy.lost-worker-timeout=15s. The helper script raises these limits so that
dask does not kill workers during a search. The search is bounded by
--tn-search-time instead; after that, the runner stops the external dask cluster.
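How the helper script passes these limits to dask is not shown here. One mechanism dask itself supports is environment variables: a dotted config key maps to a DASK_-prefixed name with nesting written as a double underscore. The sketch below applies that convention to the helper's default values; the full key paths under the distributed namespace and the function name are assumptions for illustration.

```python
def dask_env_name(config_key):
    """Convert a dotted dask config key to its DASK_* environment variable name."""
    return "DASK_" + config_key.upper().replace(".", "__").replace("-", "_")


# Values taken from the helper defaults listed earlier in this file.
raised_limits = {
    "distributed.scheduler.worker-ttl": "24 hours",
    "distributed.admin.tick.limit": "30 minutes",
    "distributed.deploy.lost-worker-timeout": "30 minutes",
}

for key, value in raised_limits.items():
    print(f'export {dask_env_name(key)}="{value}"')
```

This is useful mainly when starting a scheduler or workers by hand; when using tools/manage_tn_dask_cluster.sh, prefer its own variables (DASK_WORKER_TTL, DASK_TICK_LIMIT, DASK_LOST_WORKER_TIMEOUT).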
Small correctness check against statevector:
python -u tools/tn_contest_runner.py validate \
--case main1 \
--nqubits 8 \
--nlayers 2 \
--torch-threads 4 \
--tn-search-repeats 8 \
--tn-search-time 5
Search and save contraction trees:
TCM_ENABLE=1 python -u tools/tn_contest_runner.py search \
--case main1 \
--torch-threads 48 \
--dtype complex64 \
--dask-address tcp://10.20.1.103:8786 \
--tn-search-repeats 2048 \
--tn-search-time 300
Contract using the saved tree on one node:
TCM_ENABLE=1 mpirun -np 2 python -u tools/tn_contest_runner.py contract \
--mpi \
--case main1 \
--torch-threads 48 \
--dtype complex64
Contract using the saved tree on two nodes:
TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
python -u tools/tn_contest_runner.py contract \
--mpi \
--case main1 \
--torch-threads 48 \
--dtype complex64
Run search and contract in one command:
TCM_ENABLE=1 python -u tools/tn_contest_runner.py all \
--case main1 \
--torch-threads 48 \
--dtype complex64 \
--dask-address tcp://10.20.1.103:8786 \
--tn-search-repeats 2048 \
--tn-search-time 300
Run only selected observables:
python -u tools/tn_contest_runner.py search \
--case main2 \
--observables open_zz
Tree files are written to trees/contest_tn/ by default. The tree filename
contains case, observable, qubit count, layer count, and target slice count.
If any of these change, search again.
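The reason for searching again is that the filename acts as a cache key. The sketch below uses a purely hypothetical naming scheme (the runner's real filename format may differ) to show that changing any one parameter yields a different file, so an old tree is never picked up for a new setting.

```python
from pathlib import Path


def tree_path(case, observable, nqubits, nlayers, target_slices,
              root="trees/contest_tn"):
    """Hypothetical naming scheme; the runner's actual format is not shown here."""
    name = f"{case}_{observable}_q{nqubits}_l{nlayers}_s{target_slices}.tree"
    return Path(root) / name


# Illustrative parameter values only.
first = tree_path("main2", "open_zz", nqubits=8, nlayers=2, target_slices=1)
second = tree_path("main2", "open_zz", nqubits=8, nlayers=3, target_slices=1)

print(first)
print(second)
print("same file:", first == second)
```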
Edit TN contest cases in tools/tn_contest_runner.py:
- CASES: case name, circuit kind, observable list, default scale.
- build_circuit: circuit definitions.
- pauli_sum_observable: observable definitions.
MPS Workflow
List built-in Vidal/MPS contest cases:
python -u tools/mps_contest_runner.py list
Small correctness check against statevector:
mpirun -np 2 python -u tools/mps_contest_runner.py validate \
--case main1 \
--nqubits 8 \
--nlayers 2 \
--bond 64 \
--torch-threads 4
Run one MPS case on one node:
TCM_ENABLE=1 mpirun -np 2 python -u tools/mps_contest_runner.py run \
--case main1 \
--torch-threads 48
Run one MPS case on two nodes:
TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
python -u tools/mps_contest_runner.py run \
--case main1 \
--torch-threads 48
Run only one observable:
TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
python -u tools/mps_contest_runner.py run \
--case main1 \
--observables ring_xz \
--torch-threads 48
Override scale:
TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
python -u tools/mps_contest_runner.py run \
--case main1 \
--nqubits 128 \
--nlayers 24 \
--bond 1024 \
--torch-threads 48
Edit MPS contest cases in tools/mps_contest_runner.py:
- CASES: case name, circuit kind, observable list, default scale and bond.
- build_circuit: circuit definitions.
- observable: observable definitions, including dense local terms.
Notes
- TN uses path search plus contraction. Reuse tree files only for the exact same circuit, observable, qubit count, layer count, seed, and slicing setup.
- TN path search defaults to dask. Use --tn-search-backend processpool only for fallback/debugging.
- Prefer the default --tn-target-size 4294967296 memory target. Do not force --tn-target-slices unless you have already verified that cotengra can find valid trees for that exact setting.
- MPS/Vidal does not use contraction-tree search. It runs the circuit directly and reports trunc_sum and trunc_max.
- Default TN contraction is the stable torch/quimb path. Do not pass --tn-contract-implementation cpp for contest runs.
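As a rough guide to what the default target size means in memory, the sketch below converts it to bytes. It assumes the value counts tensor elements of the largest intermediate (which is how cotengra defines a target size) and that the runner passes it through unchanged; neither is confirmed by this file.

```python
# Bytes per element for the complex dtypes relevant to --dtype.
BYTES_PER_ELEMENT = {"complex64": 8, "complex128": 16}


def target_bytes(target_size, dtype):
    """Memory of one tensor with `target_size` elements of the given dtype."""
    return target_size * BYTES_PER_ELEMENT[dtype]


target_size = 4294967296  # default --tn-target-size, equal to 2**32

for dtype in BYTES_PER_ELEMENT:
    gib = target_bytes(target_size, dtype) / 2**30
    print(f"{dtype}: {gib:.0f} GiB per largest intermediate")
```

Under these assumptions the default corresponds to 32 GiB per largest intermediate tensor with complex64, and twice that with complex128.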