Supplement

2026-05-15 11:11:20 +08:00
parent 915c24dc7b
commit 4c7a10d026
8 changed files with 211 additions and 254 deletions
@@ -1,254 +1,53 @@
-# Contest Runners
-
-This directory contains two self-contained contest entrypoints:
-
-- `tools/tn_contest_runner.py`: general tensor-network path search and contraction.
-- `tools/mps_contest_runner.py`: Vidal/MPS multi-node expectation runner.
-
-Both scripts keep circuit and observable definitions inside the script so a
-contest case can be edited in one place.
-
-## Environment
-
-Run commands from the repository root:
-
+# TN
 ```bash
-cd /home/yx/qibotn
-```
+# Run from the qibotn directory
+I_MPI_FABRICS=shm:ofi \
+I_MPI_OFI_PROVIDER=tcp \
+FI_PROVIDER=tcp \
+CASE=main1 \
+OBSERVABLES=long_z_string \
+NQUBITS=34 \
+NLAYERS=20 \
+TORCH_THREADS=48 \
+SEARCH_REPEATS=2048 \
+SEARCH_TIME=300 \
+SCHEDULER_HOST=10.20.1.103 \
+WORKER_HOSTS="10.20.1.103 10.20.6.101" \
+DASK_ADDRESS="tcp://10.20.1.103:8786" \
+NWORKERS=84 \
+NTHREADS=1 \
+MPIEXEC_FULL="mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2" \
+tools/run_tn_dask_mpi_all.sh

-For Intel MPI on two nodes, use the known working style:
+# Run only the contraction (contract) step

-```bash
-mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 ...
-```
-
-Set `TCM_ENABLE=1` for CPU runs:
-
-```bash
-export TCM_ENABLE=1
-```
-
-## TN Workflow
-
-List built-in TN contest cases:
-
-```bash
-python -u tools/tn_contest_runner.py list
-```
-
-TN path search uses dask by default. Without `--dask-address`, the script starts
-a local dask cluster. For multiple servers, start one scheduler and workers
-with the helper script, then pass the scheduler address to the search command.
-
-Start the default two-node dask cluster:
-
-```bash
-cd /home/yx/qibotn
-tools/manage_tn_dask_cluster.sh start
-```
-
-Check status:
-
-```bash
-cd /home/yx/qibotn
-tools/manage_tn_dask_cluster.sh status
-```
-
-Stop the cluster:
-
-```bash
-cd /home/yx/qibotn
-tools/manage_tn_dask_cluster.sh stop
-```
-
-The helper defaults are:
-
-```bash
-SCHEDULER_HOST=10.20.1.103
-WORKER_HOSTS="10.20.1.103 10.20.1.102"
-NWORKERS=48
-NTHREADS=1
-ROOT_DIR=/home/yx/qibotn
-PYTHON_BIN=.venv/bin/python
-DASK_WORKER_TTL="24 hours"
-DASK_TICK_LIMIT="30 minutes"
-DASK_LOST_WORKER_TIMEOUT="30 minutes"
-```
-
-Override them inline if needed:
-
-```bash
-WORKER_HOSTS="10.20.1.103 10.20.1.102" NWORKERS=48 \
-  tools/manage_tn_dask_cluster.sh restart
-```
-
-Check that both nodes are connected by adding `--tn-debug-trials` to a small
-search. The output should include `qibotn_dask_workers` with both hosts.
-
-`tools/tn_contest_runner.py search` stops the external dask cluster after the
-search phase by default. Pass `--keep-dask` if you want to reuse the same dask
-cluster for several searches.
-
-Use enough trials to fill the cluster. With the default two-node setup there are
-96 worker slots, so `--tn-search-repeats` should be at least 96. The contest
-runner default is 2048.
-
-Cotengra trials are CPU-bound and can hold the Python GIL long enough for dask
-to report `Event loop was unresponsive`. Dask defaults are much more aggressive:
-`scheduler.worker-ttl=5 minutes`, `admin.tick.limit=3s`, and
-`deploy.lost-worker-timeout=15s`. The helper script raises these limits so
-workers are not killed by dask during search. The intended timeout is
-`--tn-search-time`; after that, the runner stops the external dask cluster.
-
-Small correctness check against statevector:
-
-```bash
-python -u tools/tn_contest_runner.py validate \
-  --case main1 \
-  --nqubits 8 \
-  --nlayers 2 \
-  --torch-threads 4 \
-  --tn-search-repeats 8 \
-  --tn-search-time 5
-```
-
-Search and save contraction trees:
-
-```bash
-TCM_ENABLE=1 python -u tools/tn_contest_runner.py search \
-  --case main1 \
-  --torch-threads 48 \
-  --dtype complex64 \
-  --dask-address tcp://10.20.1.103:8786 \
-  --tn-search-repeats 2048 \
-  --tn-search-time 300
-```
-
-Contract using the saved tree on one node:
-
-```bash
-TCM_ENABLE=1 mpirun -np 2 python -u tools/tn_contest_runner.py contract \
+I_MPI_FABRICS=shm:ofi \
+I_MPI_OFI_PROVIDER=tcp \
+FI_PROVIDER=tcp \
+mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
+  .venv/bin/python -u tools/tn_contest_runner.py contract \
  --mpi \
  --case main1 \
+  --nqubits 34 \
+  --nlayers 20 \
+  --observables long_z_string \
+  --tree-dir trees/contest_tn \
  --torch-threads 48 \
  --dtype complex64
 ```

-Contract using the saved tree on two nodes:
-
-```bash
-TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
-  python -u tools/tn_contest_runner.py contract \
-  --mpi \
-  --case main1 \
-  --torch-threads 48 \
-  --dtype complex64
+# MPS
 ```
+cd /home/yx/qibotn

-Run search and contract in one command:
-
-```bash
-TCM_ENABLE=1 python -u tools/tn_contest_runner.py all \
-  --case main1 \
-  --torch-threads 48 \
-  --dtype complex64 \
-  --dask-address tcp://10.20.1.103:8786 \
-  --tn-search-repeats 2048 \
-  --tn-search-time 300
-```
-
-Run only selected observables:
-
-```bash
-python -u tools/tn_contest_runner.py search \
-  --case main2 \
-  --observables open_zz
-```
-
-Tree files are written to `trees/contest_tn/` by default. The tree filename
-contains case, observable, qubit count, layer count, and target slice count.
-If any of these change, search again.
-
-Edit TN contest cases in `tools/tn_contest_runner.py`:
-
-- `CASES`: case name, circuit kind, observable list, default scale.
-- `build_circuit`: circuit definitions.
-- `pauli_sum_observable`: observable definitions.
-
-## MPS Workflow
-
-List built-in Vidal/MPS contest cases:
-
-```bash
-python -u tools/mps_contest_runner.py list
-```
-
-Small correctness check against statevector:
-
-```bash
-mpirun -np 2 python -u tools/mps_contest_runner.py validate \
-  --case main1 \
-  --nqubits 8 \
-  --nlayers 2 \
-  --bond 64 \
-  --torch-threads 4
-```
-
-Run one MPS case on one node:
-
-```bash
-TCM_ENABLE=1 mpirun -np 2 python -u tools/mps_contest_runner.py run \
-  --case main1 \
-  --torch-threads 48
-```
-
-Run one MPS case on two nodes:
-
-```bash
-TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
-  python -u tools/mps_contest_runner.py run \
-  --case main1 \
-  --torch-threads 48
-```
-
-Run only one observable:
-
-```bash
-TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
-  python -u tools/mps_contest_runner.py run \
-  --case main1 \
-  --observables ring_xz \
-  --torch-threads 48
-```
-
-Override scale:
-
-```bash
-TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
-  python -u tools/mps_contest_runner.py run \
-  --case main1 \
-  --nqubits 128 \
-  --nlayers 24 \
-  --bond 1024 \
-  --torch-threads 48
-```
-
-Edit MPS contest cases in `tools/mps_contest_runner.py`:
-
-- `CASES`: case name, circuit kind, observable list, default scale and bond.
-- `build_circuit`: circuit definitions.
-- `observable`: observable definitions, including dense local terms.
-
-## Notes
-
-- TN uses path search plus contraction. Reuse tree files only for the exact same
-  circuit, observable, qubit count, layer count, seed, and slicing setup.
-- TN path search defaults to dask. Use `--tn-search-backend processpool` only
-  for fallback/debugging.
-- Prefer the default `--tn-target-size 4294967296` memory target. Do not force
-  `--tn-target-slices` unless you have already verified that cotengra can find
-  valid trees for that exact setting.
-- MPS/Vidal does not use contraction-tree search. It runs the circuit directly
-  and reports `trunc_sum` and `trunc_max`.
-- Default TN contraction is the stable torch/quimb path. Do not pass
-  `--tn-contract-implementation cpp` for contest runs.
+I_MPI_FABRICS=shm:ofi \
+I_MPI_OFI_PROVIDER=tcp \
+FI_PROVIDER=tcp \
+MPIEXEC_FULL="mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2" \
+TORCH_THREADS=48 \
+OBS_FILTER=ring_xz \
+MAIN1_NQ=128 \
+MAIN1_LAYERS=24 \
+MAIN1_BOND=1024 \
+tools/run_vidal_mpi_contest_cases.sh main1
+```