# Contest Runners

This directory contains two self-contained contest entrypoints:

- `tools/tn_contest_runner.py`: general tensor-network path search and contraction.
- `tools/mps_contest_runner.py`: Vidal/MPS multi-node expectation runner.

Both scripts keep circuit and observable definitions inside the script so a contest case can be edited in one place.

## Environment

Run commands from the repository root:

```bash
cd /home/yx/qibotn
```

For Intel MPI on two nodes, use the known working style:

```bash
mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 ...
```

Set `TCM_ENABLE=1` for CPU runs:

```bash
export TCM_ENABLE=1
```

## TN Workflow

List built-in TN contest cases:

```bash
python -u tools/tn_contest_runner.py list
```

TN path search uses dask by default. Without `--dask-address`, the script starts a local dask cluster. For multiple servers, start one scheduler and workers with the helper script, then pass the scheduler address to the search command.

Start the default two-node dask cluster:

```bash
cd /home/yx/qibotn
tools/manage_tn_dask_cluster.sh start
```

Check status:

```bash
cd /home/yx/qibotn
tools/manage_tn_dask_cluster.sh status
```

Stop the cluster:

```bash
cd /home/yx/qibotn
tools/manage_tn_dask_cluster.sh stop
```

The helper defaults are:

```bash
SCHEDULER_HOST=10.20.1.103
WORKER_HOSTS="10.20.1.103 10.20.1.102"
NWORKERS=48
NTHREADS=1
ROOT_DIR=/home/yx/qibotn
PYTHON_BIN=.venv/bin/python
DASK_WORKER_TTL="24 hours"
DASK_TICK_LIMIT="30 minutes"
DASK_LOST_WORKER_TIMEOUT="30 minutes"
```

Override them inline if needed:

```bash
WORKER_HOSTS="10.20.1.103 10.20.1.102" NWORKERS=48 \
  tools/manage_tn_dask_cluster.sh restart
```

Check that both nodes are connected by adding `--tn-debug-trials` to a small search. The output should include `qibotn_dask_workers` with both hosts.

`tools/tn_contest_runner.py search` stops the external dask cluster after the search phase by default.
Pass `--keep-dask` if you want to reuse the same dask cluster for several searches.

Use enough trials to fill the cluster. With the default two-node setup there are 96 worker slots, so `--tn-search-repeats` should be at least 96. The contest runner default is 2048.

Cotengra trials are CPU-bound and can hold the Python GIL long enough for dask to report `Event loop was unresponsive`. Dask defaults are much more aggressive: `scheduler.worker-ttl=5 minutes`, `admin.tick.limit=3s`, and `deploy.lost-worker-timeout=15s`. The helper script raises these limits so workers are not killed by dask during search. The intended timeout is `--tn-search-time`; after that, the runner stops the external dask cluster.

Small correctness check against statevector:

```bash
python -u tools/tn_contest_runner.py validate \
  --case main1 \
  --nqubits 8 \
  --nlayers 2 \
  --torch-threads 4 \
  --tn-search-repeats 8 \
  --tn-search-time 5
```

Search and save contraction trees:

```bash
TCM_ENABLE=1 python -u tools/tn_contest_runner.py search \
  --case main1 \
  --torch-threads 48 \
  --dtype complex64 \
  --dask-address tcp://10.20.1.103:8786 \
  --tn-search-repeats 2048 \
  --tn-search-time 300
```

Contract using the saved tree on one node:

```bash
TCM_ENABLE=1 mpirun -np 2 python -u tools/tn_contest_runner.py contract \
  --mpi \
  --case main1 \
  --torch-threads 48 \
  --dtype complex64
```

Contract using the saved tree on two nodes:

```bash
TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
  python -u tools/tn_contest_runner.py contract \
    --mpi \
    --case main1 \
    --torch-threads 48 \
    --dtype complex64
```

Run search and contract in one command:

```bash
TCM_ENABLE=1 python -u tools/tn_contest_runner.py all \
  --case main1 \
  --torch-threads 48 \
  --dtype complex64 \
  --dask-address tcp://10.20.1.103:8786 \
  --tn-search-repeats 2048 \
  --tn-search-time 300
```

Run only selected observables:

```bash
python -u tools/tn_contest_runner.py search \
  --case main2 \
  --observables open_zz
```

Tree files are
written to `trees/contest_tn/` by default. The tree filename contains case, observable, qubit count, layer count, and target slice count. If any of these change, search again.

Edit TN contest cases in `tools/tn_contest_runner.py`:

- `CASES`: case name, circuit kind, observable list, default scale.
- `build_circuit`: circuit definitions.
- `pauli_sum_observable`: observable definitions.

## MPS Workflow

List built-in Vidal/MPS contest cases:

```bash
python -u tools/mps_contest_runner.py list
```

Small correctness check against statevector:

```bash
mpirun -np 2 python -u tools/mps_contest_runner.py validate \
  --case main1 \
  --nqubits 8 \
  --nlayers 2 \
  --bond 64 \
  --torch-threads 4
```

Run one MPS case on one node:

```bash
TCM_ENABLE=1 mpirun -np 2 python -u tools/mps_contest_runner.py run \
  --case main1 \
  --torch-threads 48
```

Run one MPS case on two nodes:

```bash
TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
  python -u tools/mps_contest_runner.py run \
    --case main1 \
    --torch-threads 48
```

Run only one observable:

```bash
TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
  python -u tools/mps_contest_runner.py run \
    --case main1 \
    --observables ring_xz \
    --torch-threads 48
```

Override scale:

```bash
TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
  python -u tools/mps_contest_runner.py run \
    --case main1 \
    --nqubits 128 \
    --nlayers 24 \
    --bond 1024 \
    --torch-threads 48
```

Edit MPS contest cases in `tools/mps_contest_runner.py`:

- `CASES`: case name, circuit kind, observable list, default scale and bond.
- `build_circuit`: circuit definitions.
- `observable`: observable definitions, including dense local terms.

## Notes

- TN uses path search plus contraction. Reuse tree files only for the exact same circuit, observable, qubit count, layer count, seed, and slicing setup.
- TN path search defaults to dask. Use `--tn-search-backend processpool` only for fallback/debugging.
- Prefer the default `--tn-target-size 4294967296` memory target. Do not force `--tn-target-slices` unless you have already verified that cotengra can find valid trees for that exact setting.
- MPS/Vidal does not use contraction-tree search. It runs the circuit directly and reports `trunc_sum` and `trunc_max`.
- Default TN contraction is the stable torch/quimb path. Do not pass `--tn-contract-implementation cpp` for contest runs.
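
The TN Workflow section sizes `--tn-search-repeats` from the number of dask worker slots (two hosts with 48 single-threaded workers each gives 96). A minimal sketch of that arithmetic is shown here, assuming slots are simply hosts times workers times threads. The function `worker_slots` and the variable `REPEATS` are hypothetical names used only for illustration and are not part of `tools/manage_tn_dask_cluster.sh`.

```bash
#!/usr/bin/env bash
# Sketch: estimate dask worker slots and compare with the planned trial count.
# worker_slots and REPEATS are illustrative names, not part of the helper script.

worker_slots() {
  # $1: space-separated host list, $2: workers per host, $3: threads per worker
  local hosts="$1" nworkers="$2" nthreads="$3"
  # Left unquoted on purpose so the host list splits on whitespace.
  set -- $hosts
  echo $(( $# * nworkers * nthreads ))
}

# Fall back to the helper defaults quoted in this README when unset.
WORKER_HOSTS="${WORKER_HOSTS:-10.20.1.103 10.20.1.102}"
NWORKERS="${NWORKERS:-48}"
NTHREADS="${NTHREADS:-1}"
# Assumed to be the value you plan to pass as --tn-search-repeats.
REPEATS="${REPEATS:-2048}"

slots=$(worker_slots "$WORKER_HOSTS" "$NWORKERS" "$NTHREADS")
echo "worker slots: $slots, planned repeats: $REPEATS"

if [ "$REPEATS" -lt "$slots" ]; then
  echo "warning: --tn-search-repeats $REPEATS leaves some of the $slots slots idle" >&2
fi
```

With the defaults above the sketch reports 96 slots. Setting the same environment variables you pass to the helper script (for example a longer `WORKER_HOSTS` list) before running it gives the matching lower bound for `--tn-search-repeats`.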