Supplement
Some checks failed
Build wheels / build (ubuntu-latest, 3.11) (push) Has been cancelled
Build wheels / build (ubuntu-latest, 3.12) (push) Has been cancelled
Build wheels / build (ubuntu-latest, 3.13) (push) Has been cancelled
Tests / check (push) Has been cancelled
Tests / build (ubuntu-latest, 3.11) (push) Has been cancelled
Tests / build (ubuntu-latest, 3.12) (push) Has been cancelled
Tests / build (ubuntu-latest, 3.13) (push) Has been cancelled
This commit is contained in:
@@ -1,254 +1,53 @@
# Contest Runners

This directory contains two self-contained contest entrypoints:

- `tools/tn_contest_runner.py`: general tensor-network path search and contraction.
- `tools/mps_contest_runner.py`: Vidal/MPS multi-node expectation runner.

Both scripts keep circuit and observable definitions inside the script so a
contest case can be edited in one place.

## Environment

Run commands from the repository root:

# TN
```bash
cd /home/yx/qibotn
```
# In the qibotn directory
I_MPI_FABRICS=shm:ofi \
I_MPI_OFI_PROVIDER=tcp \
FI_PROVIDER=tcp \
CASE=main1 \
OBSERVABLES=long_z_string \
NQUBITS=34 \
NLAYERS=20 \
TORCH_THREADS=48 \
SEARCH_REPEATS=2048 \
SEARCH_TIME=300 \
SCHEDULER_HOST=10.20.1.103 \
WORKER_HOSTS="10.20.1.103 10.20.6.101" \
DASK_ADDRESS="tcp://10.20.1.103:8786" \
NWORKERS=84 \
NTHREADS=1 \
MPIEXEC_FULL="mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2" \
tools/run_tn_dask_mpi_all.sh

For Intel MPI on two nodes, use the known working style:
# Standalone contraction (contract) run

```bash
mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 ...
```
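
The `-np 4 ... -perhost 2` style above implies two hosts with two ranks each. The following is a minimal sketch of a matching hostfile and the rank arithmetic. The host names `node-a` and `node-b` and the temporary file are placeholders for illustration, not values from this repository.

```bash
# Hypothetical hosts: replace node-a/node-b with the real node addresses.
hostfile="$(mktemp)"
printf '%s\n' node-a node-b > "$hostfile"   # one host per line

perhost=2                                   # ranks per host, as in -perhost 2
nhosts=$(grep -c . "$hostfile")             # count non-empty lines
np=$(( nhosts * perhost ))                  # total ranks to pass to -np

# Print the resulting launch prefix instead of running it.
echo "mpirun -np $np -hostfile $hostfile -perhost $perhost ..."
```

If the host count or `-perhost` changes, `-np` should change with it so that every host receives the intended number of ranks.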

Set `TCM_ENABLE=1` for CPU runs:

```bash
export TCM_ENABLE=1
```

## TN Workflow

List built-in TN contest cases:

```bash
python -u tools/tn_contest_runner.py list
```

TN path search uses dask by default. Without `--dask-address`, the script starts
a local dask cluster. For multiple servers, start one scheduler and workers
with the helper script, then pass the scheduler address to the search command.

Start the default two-node dask cluster:

```bash
cd /home/yx/qibotn
tools/manage_tn_dask_cluster.sh start
```

Check status:

```bash
cd /home/yx/qibotn
tools/manage_tn_dask_cluster.sh status
```

Stop the cluster:

```bash
cd /home/yx/qibotn
tools/manage_tn_dask_cluster.sh stop
```

The helper defaults are:

```bash
SCHEDULER_HOST=10.20.1.103
WORKER_HOSTS="10.20.1.103 10.20.1.102"
NWORKERS=48
NTHREADS=1
ROOT_DIR=/home/yx/qibotn
PYTHON_BIN=.venv/bin/python
DASK_WORKER_TTL="24 hours"
DASK_TICK_LIMIT="30 minutes"
DASK_LOST_WORKER_TIMEOUT="30 minutes"
```

Override them inline if needed:

```bash
WORKER_HOSTS="10.20.1.103 10.20.1.102" NWORKERS=48 \
tools/manage_tn_dask_cluster.sh restart
```

Check that both nodes are connected by adding `--tn-debug-trials` to a small
search. The output should include `qibotn_dask_workers` with both hosts.

`tools/tn_contest_runner.py search` stops the external dask cluster after the
search phase by default. Pass `--keep-dask` if you want to reuse the same dask
cluster for several searches.

Use enough trials to fill the cluster. With the default two-node setup there are
96 worker slots (2 hosts, 48 workers each), so `--tn-search-repeats` should be
at least 96. The contest runner default is 2048.
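
The slot count above can be recomputed whenever the helper settings change. This is a small sketch that assumes `NWORKERS` is a per-host count, which is consistent with the 96 slots quoted for two hosts; the variable names `nhosts`, `slots`, and `repeats` are illustrative only.

```bash
# Values copied from the helper defaults.
WORKER_HOSTS="10.20.1.103 10.20.1.102"
NWORKERS=48
NTHREADS=1

# Count hosts in the space-separated list (unquoted on purpose).
nhosts=$(( $(echo $WORKER_HOSTS | wc -w) ))
slots=$(( nhosts * NWORKERS * NTHREADS ))

repeats=2048   # contest runner default for --tn-search-repeats
echo "worker slots: $slots"
if [ "$repeats" -lt "$slots" ]; then
  echo "warning: --tn-search-repeats $repeats leaves some workers idle"
fi
```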

Cotengra trials are CPU-bound and can hold the Python GIL long enough for dask
to report `Event loop was unresponsive`. Dask's own defaults are much stricter
than the helper's values: `distributed.scheduler.worker-ttl=5 minutes`,
`distributed.admin.tick.limit=3s`, and
`distributed.deploy.lost-worker-timeout=15s`. The helper script raises these
limits so workers are not killed by dask during search. The intended timeout is
`--tn-search-time`; after that, the runner stops the external dask cluster.
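
Dask reads configuration from environment variables prefixed with `DASK_`, using a double underscore between nesting levels. The sketch below raises the three limits that way. How the helper script actually applies its `DASK_WORKER_TTL`, `DASK_TICK_LIMIT`, and `DASK_LOST_WORKER_TIMEOUT` settings is not shown here, so treat this mapping as an assumption rather than a description of the script.

```bash
# Assumed mapping onto dask's config keys under the "distributed" namespace.
export DASK_DISTRIBUTED__SCHEDULER__WORKER_TTL="24 hours"
export DASK_DISTRIBUTED__ADMIN__TICK__LIMIT="30 minutes"
export DASK_DISTRIBUTED__DEPLOY__LOST_WORKER_TIMEOUT="30 minutes"

# List the overrides that are now visible to child processes.
env | grep '^DASK_DISTRIBUTED__' | sort
```

These variables only take effect if they are set in the environment of the scheduler and of every worker process before those processes start.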

Small correctness check against statevector:

```bash
python -u tools/tn_contest_runner.py validate \
--case main1 \
--nqubits 8 \
--nlayers 2 \
--torch-threads 4 \
--tn-search-repeats 8 \
--tn-search-time 5
```

Search and save contraction trees:

```bash
TCM_ENABLE=1 python -u tools/tn_contest_runner.py search \
--case main1 \
--torch-threads 48 \
--dtype complex64 \
--dask-address tcp://10.20.1.103:8786 \
--tn-search-repeats 2048 \
--tn-search-time 300
```

Contract using the saved tree on one node:

```bash
TCM_ENABLE=1 mpirun -np 2 python -u tools/tn_contest_runner.py contract \
I_MPI_FABRICS=shm:ofi \
I_MPI_OFI_PROVIDER=tcp \
FI_PROVIDER=tcp \
mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
.venv/bin/python -u tools/tn_contest_runner.py contract \
--mpi \
--case main1 \
--nqubits 34 \
--nlayers 20 \
--observables long_z_string \
--tree-dir trees/contest_tn \
--torch-threads 48 \
--dtype complex64
```

Contract using the saved tree on two nodes:

```bash
TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
python -u tools/tn_contest_runner.py contract \
--mpi \
--case main1 \
--torch-threads 48 \
--dtype complex64
# MPS
```
cd /home/yx/qibotn

Run search and contract in one command:

```bash
TCM_ENABLE=1 python -u tools/tn_contest_runner.py all \
--case main1 \
--torch-threads 48 \
--dtype complex64 \
--dask-address tcp://10.20.1.103:8786 \
--tn-search-repeats 2048 \
--tn-search-time 300
```

Run only selected observables:

```bash
python -u tools/tn_contest_runner.py search \
--case main2 \
--observables open_zz
```

Tree files are written to `trees/contest_tn/` by default. The tree filename
contains case, observable, qubit count, layer count, and target slice count.
If any of these change, search again.

Edit TN contest cases in `tools/tn_contest_runner.py`:

- `CASES`: case name, circuit kind, observable list, default scale.
- `build_circuit`: circuit definitions.
- `pauli_sum_observable`: observable definitions.

## MPS Workflow

List built-in Vidal/MPS contest cases:

```bash
python -u tools/mps_contest_runner.py list
```

Small correctness check against statevector:

```bash
mpirun -np 2 python -u tools/mps_contest_runner.py validate \
--case main1 \
--nqubits 8 \
--nlayers 2 \
--bond 64 \
--torch-threads 4
```

Run one MPS case on one node:

```bash
TCM_ENABLE=1 mpirun -np 2 python -u tools/mps_contest_runner.py run \
--case main1 \
--torch-threads 48
```

Run one MPS case on two nodes:

```bash
TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
python -u tools/mps_contest_runner.py run \
--case main1 \
--torch-threads 48
```

Run only one observable:

```bash
TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
python -u tools/mps_contest_runner.py run \
--case main1 \
--observables ring_xz \
--torch-threads 48
```

Override scale:

```bash
TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \
python -u tools/mps_contest_runner.py run \
--case main1 \
--nqubits 128 \
--nlayers 24 \
--bond 1024 \
--torch-threads 48
```

Edit MPS contest cases in `tools/mps_contest_runner.py`:

- `CASES`: case name, circuit kind, observable list, default scale and bond.
- `build_circuit`: circuit definitions.
- `observable`: observable definitions, including dense local terms.

## Notes

- TN uses path search plus contraction. Reuse tree files only for the exact same
  circuit, observable, qubit count, layer count, seed, and slicing setup.
- TN path search defaults to dask. Use `--tn-search-backend processpool` only
  for fallback/debugging.
- Prefer the default `--tn-target-size 4294967296` memory target. Do not force
  `--tn-target-slices` unless you have already verified that cotengra can find
  valid trees for that exact setting.
- MPS/Vidal does not use contraction-tree search. It runs the circuit directly
  and reports `trunc_sum` and `trunc_max`.
- Default TN contraction is the stable torch/quimb path. Do not pass
  `--tn-contract-implementation cpp` for contest runs.
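
The default `--tn-target-size` value noted above equals 2^32. Assuming the flag counts tensor elements, as cotengra's `target_size` slicing option does (how the runner forwards the value is not shown here), the largest intermediate tensor at `complex64` works out as follows:

```bash
target_size=4294967296     # default --tn-target-size, equal to 2^32
bytes_per_element=8        # complex64 stores two 32-bit floats

bytes=$(( target_size * bytes_per_element ))
gib=$(( bytes / 1024 / 1024 / 1024 ))
echo "largest intermediate: ${gib} GiB"   # prints: largest intermediate: 32 GiB
```

Under the same assumption, switching to `complex128` would double this figure, so the target size and `--dtype` are best chosen together.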

I_MPI_FABRICS=shm:ofi \
I_MPI_OFI_PROVIDER=tcp \
FI_PROVIDER=tcp \
MPIEXEC_FULL="mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2" \
TORCH_THREADS=48 \
OBS_FILTER=ring_xz \
MAIN1_NQ=128 \
MAIN1_LAYERS=24 \
MAIN1_BOND=1024 \
tools/run_vidal_mpi_contest_cases.sh main1
```