diff --git a/docs/contest_runners.md b/docs/contest_runners.md index 28c0d33..8db6ddf 100644 --- a/docs/contest_runners.md +++ b/docs/contest_runners.md @@ -1,254 +1,53 @@ -# Contest Runners - -This directory contains two self-contained contest entrypoints: - -- `tools/tn_contest_runner.py`: general tensor-network path search and contraction. -- `tools/mps_contest_runner.py`: Vidal/MPS multi-node expectation runner. - -Both scripts keep circuit and observable definitions inside the script so a -contest case can be edited in one place. - -## Environment - -Run commands from the repository root: - +# TN ```bash -cd /home/yx/qibotn -``` +# Run from the qibotn directory +I_MPI_FABRICS=shm:ofi \ +I_MPI_OFI_PROVIDER=tcp \ +FI_PROVIDER=tcp \ +CASE=main1 \ +OBSERVABLES=long_z_string \ +NQUBITS=34 \ +NLAYERS=20 \ +TORCH_THREADS=48 \ +SEARCH_REPEATS=2048 \ +SEARCH_TIME=300 \ +SCHEDULER_HOST=10.20.1.103 \ +WORKER_HOSTS="10.20.1.103 10.20.6.101" \ +DASK_ADDRESS="tcp://10.20.1.103:8786" \ +NWORKERS=84 \ +NTHREADS=1 \ +MPIEXEC_FULL="mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2" \ +tools/run_tn_dask_mpi_all.sh -For Intel MPI on two nodes, use the known working style: +# Run only the contraction (contract) step -```bash -mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 ... -``` - -Set `TCM_ENABLE=1` for CPU runs: - -```bash -export TCM_ENABLE=1 -``` - -## TN Workflow - -List built-in TN contest cases: - -```bash -python -u tools/tn_contest_runner.py list -``` - -TN path search uses dask by default. Without `--dask-address`, the script starts -a local dask cluster. For multiple servers, start one scheduler and workers -with the helper script, then pass the scheduler address to the search command. 
- -Start the default two-node dask cluster: - -```bash -cd /home/yx/qibotn -tools/manage_tn_dask_cluster.sh start -``` - -Check status: - -```bash -cd /home/yx/qibotn -tools/manage_tn_dask_cluster.sh status -``` - -Stop the cluster: - -```bash -cd /home/yx/qibotn -tools/manage_tn_dask_cluster.sh stop -``` - -The helper defaults are: - -```bash -SCHEDULER_HOST=10.20.1.103 -WORKER_HOSTS="10.20.1.103 10.20.1.102" -NWORKERS=48 -NTHREADS=1 -ROOT_DIR=/home/yx/qibotn -PYTHON_BIN=.venv/bin/python -DASK_WORKER_TTL="24 hours" -DASK_TICK_LIMIT="30 minutes" -DASK_LOST_WORKER_TIMEOUT="30 minutes" -``` - -Override them inline if needed: - -```bash -WORKER_HOSTS="10.20.1.103 10.20.1.102" NWORKERS=48 \ - tools/manage_tn_dask_cluster.sh restart -``` - -Check that both nodes are connected by adding `--tn-debug-trials` to a small -search. The output should include `qibotn_dask_workers` with both hosts. - -`tools/tn_contest_runner.py search` stops the external dask cluster after the -search phase by default. Pass `--keep-dask` if you want to reuse the same dask -cluster for several searches. - -Use enough trials to fill the cluster. With the default two-node setup there are -96 worker slots, so `--tn-search-repeats` should be at least 96. The contest -runner default is 2048. - -Cotengra trials are CPU-bound and can hold the Python GIL long enough for dask -to report `Event loop was unresponsive`. Dask defaults are much more aggressive: -`scheduler.worker-ttl=5 minutes`, `admin.tick.limit=3s`, and -`deploy.lost-worker-timeout=15s`. The helper script raises these limits so -workers are not killed by dask during search. The intended timeout is -`--tn-search-time`; after that, the runner stops the external dask cluster. 
- -Small correctness check against statevector: - -```bash -python -u tools/tn_contest_runner.py validate \ - --case main1 \ - --nqubits 8 \ - --nlayers 2 \ - --torch-threads 4 \ - --tn-search-repeats 8 \ - --tn-search-time 5 -``` - -Search and save contraction trees: - -```bash -TCM_ENABLE=1 python -u tools/tn_contest_runner.py search \ - --case main1 \ - --torch-threads 48 \ - --dtype complex64 \ - --dask-address tcp://10.20.1.103:8786 \ - --tn-search-repeats 2048 \ - --tn-search-time 300 -``` - -Contract using the saved tree on one node: - -```bash -TCM_ENABLE=1 mpirun -np 2 python -u tools/tn_contest_runner.py contract \ +I_MPI_FABRICS=shm:ofi \ +I_MPI_OFI_PROVIDER=tcp \ +FI_PROVIDER=tcp \ +mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \ + .venv/bin/python -u tools/tn_contest_runner.py contract \ --mpi \ --case main1 \ + --nqubits 34 \ + --nlayers 20 \ + --observables long_z_string \ + --tree-dir trees/contest_tn \ --torch-threads 48 \ --dtype complex64 ``` -Contract using the saved tree on two nodes: - -```bash -TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \ - python -u tools/tn_contest_runner.py contract \ - --mpi \ - --case main1 \ - --torch-threads 48 \ - --dtype complex64 +# MPS ``` +cd /home/yx/qibotn -Run search and contract in one command: - -```bash -TCM_ENABLE=1 python -u tools/tn_contest_runner.py all \ - --case main1 \ - --torch-threads 48 \ - --dtype complex64 \ - --dask-address tcp://10.20.1.103:8786 \ - --tn-search-repeats 2048 \ - --tn-search-time 300 -``` - -Run only selected observables: - -```bash -python -u tools/tn_contest_runner.py search \ - --case main2 \ - --observables open_zz -``` - -Tree files are written to `trees/contest_tn/` by default. The tree filename -contains case, observable, qubit count, layer count, and target slice count. -If any of these change, search again. 
- -Edit TN contest cases in `tools/tn_contest_runner.py`: - -- `CASES`: case name, circuit kind, observable list, default scale. -- `build_circuit`: circuit definitions. -- `pauli_sum_observable`: observable definitions. - -## MPS Workflow - -List built-in Vidal/MPS contest cases: - -```bash -python -u tools/mps_contest_runner.py list -``` - -Small correctness check against statevector: - -```bash -mpirun -np 2 python -u tools/mps_contest_runner.py validate \ - --case main1 \ - --nqubits 8 \ - --nlayers 2 \ - --bond 64 \ - --torch-threads 4 -``` - -Run one MPS case on one node: - -```bash -TCM_ENABLE=1 mpirun -np 2 python -u tools/mps_contest_runner.py run \ - --case main1 \ - --torch-threads 48 -``` - -Run one MPS case on two nodes: - -```bash -TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \ - python -u tools/mps_contest_runner.py run \ - --case main1 \ - --torch-threads 48 -``` - -Run only one observable: - -```bash -TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \ - python -u tools/mps_contest_runner.py run \ - --case main1 \ - --observables ring_xz \ - --torch-threads 48 -``` - -Override scale: - -```bash -TCM_ENABLE=1 mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2 \ - python -u tools/mps_contest_runner.py run \ - --case main1 \ - --nqubits 128 \ - --nlayers 24 \ - --bond 1024 \ - --torch-threads 48 -``` - -Edit MPS contest cases in `tools/mps_contest_runner.py`: - -- `CASES`: case name, circuit kind, observable list, default scale and bond. -- `build_circuit`: circuit definitions. -- `observable`: observable definitions, including dense local terms. - -## Notes - -- TN uses path search plus contraction. Reuse tree files only for the exact same - circuit, observable, qubit count, layer count, seed, and slicing setup. -- TN path search defaults to dask. Use `--tn-search-backend processpool` only - for fallback/debugging. -- Prefer the default `--tn-target-size 4294967296` memory target. 
Do not force - `--tn-target-slices` unless you have already verified that cotengra can find - valid trees for that exact setting. -- MPS/Vidal does not use contraction-tree search. It runs the circuit directly - and reports `trunc_sum` and `trunc_max`. -- Default TN contraction is the stable torch/quimb path. Do not pass - `--tn-contract-implementation cpp` for contest runs. +I_MPI_FABRICS=shm:ofi \ +I_MPI_OFI_PROVIDER=tcp \ +FI_PROVIDER=tcp \ +MPIEXEC_FULL="mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2" \ +TORCH_THREADS=48 \ +OBS_FILTER=ring_xz \ +MAIN1_NQ=128 \ +MAIN1_LAYERS=24 \ +MAIN1_BOND=1024 \ +tools/run_vidal_mpi_contest_cases.sh main1 +``` \ No newline at end of file diff --git a/hostfile b/hostfile index e596b93..19358eb 100644 --- a/hostfile +++ b/hostfile @@ -1,2 +1,2 @@ 10.20.1.103:2 -10.20.1.102:2 +10.20.6.101:2 diff --git a/src/qibotn/backends/cpu.py b/src/qibotn/backends/cpu.py index 5259fe0..27db0b5 100644 --- a/src/qibotn/backends/cpu.py +++ b/src/qibotn/backends/cpu.py @@ -41,11 +41,19 @@ def _bind_numa_node(rank): Returns the NUMA domain that was selected, or ``None`` if the binding could not be determined. """ + current_affinity = os.sched_getaffinity(0) + online_cpus = set(range(os.cpu_count() or 1)) + if current_affinity and current_affinity != online_cpus: + # MPI launchers such as Intel MPI often pin local ranks correctly + # before Python starts. Do not narrow that placement further. 
+ return None + local_rank = rank for name in ( "OMPI_COMM_WORLD_LOCAL_RANK", "MV2_COMM_WORLD_LOCAL_RANK", "MPI_LOCALRANKID", + "I_MPI_LOCAL_RANK", "SLURM_LOCALID", ): try: @@ -54,13 +62,27 @@ def _bind_numa_node(rank): except (KeyError, ValueError): pass - domain = local_rank % 2 - cpulist = f"/sys/devices/system/node/node{domain}/cpulist" + domains = _available_numa_domains() + if not domains: + return None + + local_size = _local_world_size() + assigned_domains = domains[local_rank::local_size] + if not assigned_domains: + assigned_domains = [domains[local_rank % len(domains)]] + + domain = assigned_domains[0] + cpus = set() + for selected in assigned_domains: + cpulist = f"/sys/devices/system/node/node{selected}/cpulist" + try: + cpus.update(_parse_cpu_list(open(cpulist, encoding="utf-8").read().strip())) + except (FileNotFoundError, OSError): + pass try: - cpus = _parse_cpu_list(open(cpulist, encoding="utf-8").read().strip()) if cpus: os.sched_setaffinity(0, cpus) - except (FileNotFoundError, OSError): + except OSError: pass try: @@ -76,6 +98,38 @@ def _bind_numa_node(rank): return domain +def _available_numa_domains(): + nodes = [] + base = Path("/sys/devices/system/node") + try: + for path in base.glob("node[0-9]*"): + try: + nodes.append(int(path.name[4:])) + except ValueError: + pass + except OSError: + return [] + return sorted(nodes) + + +def _local_world_size(): + for name in ( + "OMPI_COMM_WORLD_LOCAL_SIZE", + "MV2_COMM_WORLD_LOCAL_SIZE", + "MPI_LOCALNRANKS", + "I_MPI_LOCAL_SIZE", + "SLURM_NTASKS_PER_NODE", + ): + value = os.environ.get(name) + if not value: + continue + try: + return max(1, int(str(value).split("(", 1)[0])) + except ValueError: + pass + return 1 + + def _parse_cpu_list(text): cpus = set() for item in text.split(","): diff --git a/src/qibotn/parallel.py b/src/qibotn/parallel.py index 2603b4f..46ecc53 100644 --- a/src/qibotn/parallel.py +++ b/src/qibotn/parallel.py @@ -745,6 +745,12 @@ def _contract_mpi( is_torch = backend == "torch" 
nslices = int(getattr(tree, "multiplicity", 1)) stats = SlicedContractStats(rank, size, nslices, 0, assignment) + nslices_by_rank = comm.allgather(nslices) + if len(set(nslices_by_rank)) != 1: + raise RuntimeError( + "Inconsistent contraction tree slices across MPI ranks: " + f"{nslices_by_rank}. Ensure all nodes load the same tree file." + ) if not set(getattr(tree, "sliced_inds", ())).isdisjoint(set(getattr(tree, "output", ()))): raise NotImplementedError( diff --git a/tools/manage_tn_dask_cluster.sh b/tools/manage_tn_dask_cluster.sh index 2fb7446..b91cd84 100755 --- a/tools/manage_tn_dask_cluster.sh +++ b/tools/manage_tn_dask_cluster.sh @@ -5,7 +5,7 @@ set -euo pipefail # # Defaults target two servers: # scheduler: 10.20.1.103:8786 -# workers: 10.20.1.103, 10.20.1.102 +# workers: 10.20.1.103, 10.20.6.101 # # Usage: # tools/manage_tn_dask_cluster.sh start @@ -14,7 +14,7 @@ set -euo pipefail # # Common overrides: # SCHEDULER_HOST=10.20.1.103 -# WORKER_HOSTS="10.20.1.103 10.20.1.102" +# WORKER_HOSTS="10.20.1.103 10.20.6.101" # NWORKERS=48 # NTHREADS=1 # ROOT_DIR=/home/yx/qibotn @@ -25,8 +25,8 @@ PYTHON_BIN="${PYTHON_BIN:-.venv/bin/python}" SCHEDULER_HOST="${SCHEDULER_HOST:-10.20.1.103}" SCHEDULER_PORT="${SCHEDULER_PORT:-8786}" DASHBOARD_ADDRESS="${DASHBOARD_ADDRESS:-:8787}" -WORKER_HOSTS="${WORKER_HOSTS:-10.20.1.103 10.20.1.102}" -NWORKERS="${NWORKERS:-48}" +WORKER_HOSTS="${WORKER_HOSTS:-10.20.1.103 10.20.6.101}" +NWORKERS="${NWORKERS:-84}" NTHREADS="${NTHREADS:-1}" MEMORY_LIMIT="${MEMORY_LIMIT:-0}" LOCAL_DIRECTORY="${LOCAL_DIRECTORY:-/tmp/qibotn-dask}" diff --git a/tools/run_tn_dask_mpi_all.sh b/tools/run_tn_dask_mpi_all.sh new file mode 100755 index 0000000..c273534 --- /dev/null +++ b/tools/run_tn_dask_mpi_all.sh @@ -0,0 +1,93 @@ +#!/usr/bin/env bash +set -euo pipefail + +ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." 
&& pwd)" +cd "$ROOT_DIR" + +CASE="${CASE:-main1}" +OBSERVABLES="${OBSERVABLES:-long_z_string}" +NQUBITS="${NQUBITS:-34}" +NLAYERS="${NLAYERS:-20}" +TORCH_THREADS="${TORCH_THREADS:-48}" +SEARCH_REPEATS="${SEARCH_REPEATS:-2048}" +SEARCH_TIME="${SEARCH_TIME:-300}" +TN_TARGET_SIZE="${TN_TARGET_SIZE:-8589934592}" +TN_TARGET_SLICES="${TN_TARGET_SLICES:-}" + +PYTHON_BIN="${PYTHON_BIN:-.venv/bin/python}" +DTYPE="${DTYPE:-complex64}" +TREE_DIR="${TREE_DIR:-trees/contest_tn}" +DASK_ADDRESS="${DASK_ADDRESS:-tcp://10.20.1.103:8786}" +MPIEXEC_FULL="${MPIEXEC_FULL:-mpirun -np 4 -hostfile /home/yx/qibotn/hostfile -perhost 2}" +SYNC_TREES="${SYNC_TREES:-1}" +SYNC_HOSTS="${SYNC_HOSTS:-${WORKER_HOSTS:-}}" +SSH_BIN="${SSH_BIN:-ssh}" + +export TCM_ENABLE="${TCM_ENABLE:-1}" + +tn_slice_args=(--tn-target-size "$TN_TARGET_SIZE") +if [[ -n "$TN_TARGET_SLICES" ]]; then + tn_slice_args+=(--tn-target-slices "$TN_TARGET_SLICES") +fi + +is_local_host() { + local host="$1" + [[ "$host" == "localhost" || "$host" == "127.0.0.1" ]] && return 0 + [[ "$host" == "$(hostname)" ]] && return 0 + [[ "$host" == "$(hostname -f 2>/dev/null || true)" ]] && return 0 + hostname -I 2>/dev/null | tr ' ' '\n' | grep -qx "$host" +} + +sync_trees_to_hosts() { + [[ "$SYNC_TREES" == "1" ]] || return 0 + [[ -n "$SYNC_HOSTS" ]] || return 0 + + local src_dir="$TREE_DIR" + local dst_dir="$TREE_DIR" + if [[ "$TREE_DIR" != /* ]]; then + src_dir="$ROOT_DIR/$TREE_DIR" + dst_dir="$ROOT_DIR/$TREE_DIR" + fi + + for host in $SYNC_HOSTS; do + is_local_host "$host" && continue + echo "Sync tree dir to $host:$dst_dir" + "$SSH_BIN" "$host" "mkdir -p $(printf '%q' "$dst_dir")" + if command -v rsync >/dev/null 2>&1; then + rsync -a "$src_dir/" "$host:$dst_dir/" + else + scp -q "$src_dir"/*.pkl "$host:$dst_dir/" + fi + done +} + +tools/manage_tn_dask_cluster.sh start + +echo "Search with dask: $DASK_ADDRESS" +"$PYTHON_BIN" -u tools/tn_contest_runner.py search \ + --case "$CASE" \ + --nqubits "$NQUBITS" \ + --nlayers "$NLAYERS" \ + 
--observables $OBSERVABLES \ + --tree-dir "$TREE_DIR" \ + --dask-address "$DASK_ADDRESS" \ + --torch-threads "$TORCH_THREADS" \ + --dtype "$DTYPE" \ + --tn-search-repeats "$SEARCH_REPEATS" \ + --tn-search-time "$SEARCH_TIME" \ + "${tn_slice_args[@]}" + +sync_trees_to_hosts + +echo "Contract with MPI: $MPIEXEC_FULL" +read -r -a mpi_prefix <<< "$MPIEXEC_FULL" +"${mpi_prefix[@]}" "$PYTHON_BIN" -u tools/tn_contest_runner.py contract \ + --mpi \ + --case "$CASE" \ + --nqubits "$NQUBITS" \ + --nlayers "$NLAYERS" \ + --observables $OBSERVABLES \ + --tree-dir "$TREE_DIR" \ + --torch-threads "$TORCH_THREADS" \ + --dtype "$DTYPE" \ + "${tn_slice_args[@]}" diff --git a/tools/tn_contest_runner.py b/tools/tn_contest_runner.py index 680cecd..40de960 100644 --- a/tools/tn_contest_runner.py +++ b/tools/tn_contest_runner.py @@ -199,7 +199,7 @@ def build_parallel_opts(args, tree_file=None, search_only=False): "search_workers": args.tn_search_workers or args.torch_threads, "max_repeats": args.tn_search_repeats, "max_time": args.tn_search_time, - "print_stats": not args.no_tn_stats, + "print_stats": False, } if args.tn_search_backend is not None: opts["search_backend"] = args.tn_search_backend @@ -303,7 +303,7 @@ def run_one(args, case_name, obs_name, mode): f"failed_trials={search_stats.get('failed_trials', 'na')} " f"requested_trials={search_stats.get('requested_trials', 'na')} " f"best_score={search_stats.get('best_score', float('nan')):.6g} " - f"slices={cost.get('slices')} " + f"slices={cost.get('nslices')} " f"log10_flops={cost.get('log10_flops', float('nan')):.3f} " f"log10_write={cost.get('log10_write', float('nan')):.3f} " f"log2_size={cost.get('log2_size', float('nan')):.3f} " @@ -337,6 +337,11 @@ def apply_case_defaults(args): def stop_dask_cluster(args): if args.keep_dask or args.tn_search_backend != "dask" or not args.dask_address: return + if args.mpi: + from mpi4py import MPI + + if MPI.COMM_WORLD.Get_rank() != 0: + return script = ROOT / "tools" / 
"manage_tn_dask_cluster.sh" if not script.exists(): print(f"dask_stop_skipped reason=missing_script path={script}", flush=True) diff --git a/trees/contest_tn/main1_long_z_string_34q20l_auto.pkl b/trees/contest_tn/main1_long_z_string_34q20l_auto.pkl index 76eeedd..55ac205 100644 Binary files a/trees/contest_tn/main1_long_z_string_34q20l_auto.pkl and b/trees/contest_tn/main1_long_z_string_34q20l_auto.pkl differ