Frequently Asked Questions
Network
1. How profitable is running a prover?
Profitability depends on hardware, electricity costs, and competition on the network. The minimal implementation provided is not expected to be competitive for generating significant revenue. It serves primarily as a reference for developers building optimized proving implementations.
2. How much $PROVE do I need to stake?
You need to stake at least 1000 $PROVE tokens to be eligible to bid for proofs on the network. However, the staking requirements may change over time.
3. What happens if I can't complete a proof in time?
If your prover accepts a request but fails to deliver a proof before the deadline, a portion of your stake may be slashed as a penalty. This ensures the reliability of the network.
Slashing is not currently enabled.
4. What happens if my prover goes offline?
If your prover goes offline, it won't be able to bid for or complete proofs. There's no penalty for simply being offline, but if your prover was in the middle of generating a proof and fails to deliver it before the deadline, a portion of your stake may be slashed as a penalty. This ensures the reliability of the network.
Slashing is not currently enabled.
5. How can I unstake my $PROVE tokens?
You can unstake your tokens through the frontend or by directly interacting with the SuccinctStaking contract. Note that there may be a cooldown period before tokens can be withdrawn after unstaking.
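As an illustrative sketch only, direct interaction might look like the following with Foundry's `cast`. The function signature, amount, and addresses here are assumptions, not the actual SuccinctStaking ABI; check the deployed contract and the cooldown/withdraw flow before sending anything.

```bash
# Hypothetical sketch: verify the real function name and arguments in the
# SuccinctStaking contract's ABI before using. Amount assumes 18 decimals (1000 PROVE).
cast send "$SUCCINCT_STAKING_ADDRESS" "unstake(uint256)" 1000000000000000000000 \
  --rpc-url "$RPC_URL" \
  --private-key "$PROVER_PRIVATE_KEY"
```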
Cluster
6. I have >50 GPUs in my cluster and I am adding even more; how come a single Fibonacci proof is not getting any faster?
Because execution is a bottleneck, the # of GPUs used for a single proof is limited to ~30-40 GPUs.
SP1 proof generation is bottlenecked by the executor that executes the VM program and creates checkpoints to prove from. Therefore, there is an upper limit on how many GPUs can be utilized by a single proof. In other words, the executor is only able to create work for GPU nodes at a certain rate. This upper limit on GPUs is around 50 from our testing. This bottleneck can be alleviated by running the CPU node on a CPU machine with better single-core clock speed. Future SP1 upgrades will also improve upon this bottleneck by optimizing the executor code.
Adding additional GPUs should still improve the total throughput of your cluster, but it may not be noticeable without multiple concurrent proofs. Therefore, if you are trying to test the throughput of your cluster with more than 50 GPUs, you should run multiple proofs at once in order to measure the overall throughput.
7. Why is my GPU VRAM usage so low?
SP1 proof generation on GPUs is mainly compute/bandwidth bound rather than memory bound, so low VRAM usage does not always mean the GPU is underutilized. As of SP1 v5, at least 24 GB of VRAM is recommended to prevent GPU OOM errors. Significantly more VRAM does not necessarily improve performance.
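To check whether a GPU is actually busy, watch compute utilization rather than VRAM, for example with `nvidia-smi`:

```bash
# utilization.gpu is the number to watch; memory.used staying well below the
# card's total VRAM is normal and not a sign of a problem.
nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv -l 1
```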
8. How can I tell if I am getting good performance out of my cluster? The CPU/GPU utilization seems low.
Ensuring the cluster is performant is a complex problem. There are many factors that could affect performance.
Measuring performance
First, it's important to get an accurate measurement of your cluster's performance. You can do this using the cluster CLI:
Docker Compose:

```bash
docker run --rm -it --network=infra_default ghcr.io/succinctlabs/sp1-cluster:base-latest \
  /cli bench fibonacci 300 -c 2 \
  --cluster-rpc http://api:50051 \
  --redis-nodes redis://:redispassword@redis:6379/0
```

Kubernetes:

```bash
kubectl run cli --image=ghcr.io/succinctlabs/sp1-cluster:base-latest \
  --rm -it -n sp1-cluster-test /bin/bash
```

Then, inside the pod:

```bash
/cli bench fibonacci 300 --count 2 \
  --cluster-rpc http://api-grpc:50051 \
  --redis-nodes redis://:redispassword@redis-master:6379/0
```
You should increase `--count` based on how many GPUs you have. As a starting point you could use `# GPUs / 30`, and try increasing it by 1 until aggregate throughput no longer improves.
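For example, a cluster with roughly 90 GPUs might start at `--count 3`, reusing the Docker Compose command above:

```bash
docker run --rm -it --network=infra_default ghcr.io/succinctlabs/sp1-cluster:base-latest \
  /cli bench fibonacci 300 --count 3 \
  --cluster-rpc http://api:50051 \
  --redis-nodes redis://:redispassword@redis:6379/0
```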
This should output something like:
```
2025-09-10T21:03:55.029021Z INFO crates/common/src/logger.rs:110: logging initialized
2025-09-10T21:03:55.029688Z INFO bin/cli/src/commands/bench.rs:71: Running 1x Fibonacci Compressed for 5 million cycles...
...
2025-09-10T21:03:56.373132Z INFO bin/cli/src/commands/bench.rs:150: base_id: cli_1757538236373
2025-09-10T21:04:02.425794Z INFO bin/cli/src/commands/bench.rs:215: Completed after 5.969402833s
2025-09-10T21:04:02.425818Z INFO bin/cli/src/commands/bench.rs:217: Total Cycles: 5000000 | Aggregate MHz: 0.84
```
The output `Aggregate MHz` is the aggregate proving throughput of your cluster observed from the test.

In our experience, we tend to observe ~2 million PGU/s of proving throughput per GPU node for higher-performance GPUs (ex. RTX 4090/5090) and closer to 1 million PGU/s for less powerful GPUs (ex. L4). Other factors such as GPU bandwidth / PCIe lane count, CPU cores/clock speed, CPU memory speed, and network bandwidth can also affect overall performance.
Potential bottlenecks
There are a few likely bottlenecks that can affect performance:
- Uploading/downloading circuit artifacts takes too much time (>300ms).
  - You can verify whether this is the case by inspecting the CPU/GPU node logs for download/upload span times (ex. `download: close time.busy=10ms time.idle=1200ms`) or by setting up distributed tracing and inspecting the traces for unusually large spans.
  - Redis nodes could be bound by compute, memory, or networking. You can scale the artifact store load horizontally by simply setting up additional Redis instances and adding them (comma-separated) to the `REDIS_NODES` env var (ex. `REDIS_NODES=redis://:redispassword@redis1:6379/0,redis://:redispassword@redis2:6379/0`); see the sketch after this list. Also ensure that persistence is disabled for your Redis instances (`persistence.enabled: false` or `REDIS_AOF_ENABLED=no`) as it is not needed. You can also try tuning other Redis parameters such as `io-threads` and `hz`.
  - You should not use S3 as the artifact store unless you are running in AWS, due to the increased data transfer cost and latency.
- GPU nodes are not tuned properly, meaning the node is not able to provide enough work to the GPU itself to keep it busy.
  - You can increase the max weight override to run more tasks in parallel (ex. `WORKER_MAX_WEIGHT_OVERRIDE=32`). Weight is measured such that 1 weight is equivalent to 1 GB of RAM (not VRAM) available to the node. Setting the parameter too high can cause the node to run too many tasks concurrently, run out of memory, and crash.
  - You can measure this by setting up Prometheus metrics and the provided Grafana dashboard and inspecting the "Active GPUs" chart while a proof runs for several minutes. This chart has "Estimated Active GPUs", which reflects total GPU utilization over time. A low value relative to the # of total GPUs means some GPUs were idle for a majority of the interval. You can also measure GPU utilization using `nvidia-smi` or similar tools.
- Network operations between the coordinator and nodes have too much latency.
  - These should all be running on the same internal network, ideally with at least 10 Gbps bandwidth.
- There are not enough CPU nodes to run proofs in parallel.
  - If you are running many proofs at once (ex. `BIDDER_MAX_CONCURRENT_PROOFS` > 10), you should have up to 1 CPU node per proof, ideally with at least 32 worker weight per CPU node. You can get away with fewer CPU nodes (as little as 4 worker weight per concurrent proof), but you may notice decreased cluster utilization when there are Plonk or Groth16 tasks running.
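As a rough sketch of the horizontal Redis scaling mentioned above, assuming the Docker Compose setup from the bench command earlier (the container name, image, and password are assumptions; adapt them to your deployment):

```bash
# Start a second Redis instance on the same Docker network as the cluster,
# with persistence disabled (Bitnami-style image assumed).
docker run -d --name redis2 --network infra_default \
  -e REDIS_PASSWORD=redispassword \
  -e REDIS_AOF_ENABLED=no \
  bitnami/redis:latest

# Then list both instances, comma-separated, in the nodes' environment:
export REDIS_NODES=redis://:redispassword@redis:6379/0,redis://:redispassword@redis2:6379/0
```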
Miscellaneous issues
- You should not set `WORKER_TYPE=ALL` on CPU machines if your cluster has GPUs, since doing so will cause the CPU node to run proving tasks, and the CPU prover is much less performant than the GPU prover.

Note that if you are using 5090s, there is currently a driver bug which requires you to set the env var `MOONGATE_DISABLE_GRIND_DEVICE=true` on the GPU node to enable a workaround. Not setting this will greatly reduce proving performance.
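For example, the environment-variable takeaways from the points above might look like this on a Docker-based deployment (a sketch; where these variables are set depends on your setup):

```bash
# On GPU nodes with RTX 5090s: driver-bug workaround, required for full performance.
export MOONGATE_DISABLE_GRIND_DEVICE=true

# On CPU nodes in a cluster that has GPUs: avoid WORKER_TYPE=ALL so the much
# slower CPU prover does not pick up proving tasks (use the CPU-only worker
# type from your deployment config instead).
```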
9. Why are my Groth16/Plonk proofs timing out / failing?
If your prover is consistently failing or timing out on Groth16 or Plonk proofs, it's likely that circuit artifacts were not downloaded properly or your CPU node is not performant enough.
Artifacts setup
Check your CPU node logs for messages like the following:
```
2025-08-12T00:44:40.415405Z INFO task: /app/crates/worker/src/tasks/finalize.rs:55: Waiting for circuit artifacts to be ready proof_id="28cf80a36990e85a064da2d9ccf82a7dfc7aeecd57a95aeb3353143f14b5102c" task_id="task_01k2dtppn8fe9v7az1mek4rnvg" otel.name="PLONK_WRAP"
2025-08-12T00:44:45.415810Z INFO task: /app/crates/worker/src/tasks/finalize.rs:55: Waiting for circuit artifacts to be ready proof_id="28cf80a36990e85a064da2d9ccf82a7dfc7aeecd57a95aeb3353143f14b5102c" task_id="task_01k2dtppn8fe9v7az1mek4rnvg" otel.name="PLONK_WRAP"
2025-08-12T00:44:50.417412Z INFO task: /app/crates/worker/src/tasks/finalize.rs:55: Waiting for circuit artifacts to be ready proof_id="28cf80a36990e85a064da2d9ccf82a7dfc7aeecd57a95aeb3353143f14b5102c" task_id="task_01k2dtppn8fe9v7az1mek4rnvg" otel.name="PLONK_WRAP"
```
This suggests the CPU node was unable to download artifacts properly, likely due to directory permissions if you are mounting the artifacts directory as a volume. By default, the artifacts are downloaded to `/root/.sp1/circuits` in the CPU node image. Look for logs like this when the CPU node starts:
```
2025-08-12T00:54:29.242040Z INFO bin/node/src/main.rs:207: worker type: Cpu
2025-08-12T00:54:29.242178Z INFO bin/node/src/main.rs:215: downloading circuit artifacts
thread 'tokio-runtime-worker' panicked at /usr/local/cargo/git/checkouts/sp1-9091391fc1cd5ab7/6544380/crates/sdk/src/install.rs:85:41:
failed to create build directory: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
2025-08-12T00:54:29.244128Z INFO bin/node/src/main.rs:123: Not creating circuits dir: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }
2025-08-12T00:54:29.244158Z INFO bin/node/src/main.rs:133: Not creating temp dir: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }
```
When the artifacts are correctly set up, you'll see a message like `circuit artifacts ready after 3 min`.
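If you mount a host directory over the default artifacts path, one way to avoid the `PermissionDenied` panic is to make the mount writable by the user the CPU node container runs as. A minimal sketch, assuming a host path of `/data/sp1-circuits` and a generic image placeholder (check your image's actual user and your own compose/Helm config):

```bash
# Create the host directory and hand it to the UID the container runs as.
# (If the container runs as root, ownership usually isn't an issue.)
mkdir -p /data/sp1-circuits
sudo chown -R 1000:1000 /data/sp1-circuits   # example UID:GID, check your image

# Mount it over the default in-container path, /root/.sp1/circuits,
# alongside whatever flags and env vars your CPU node normally uses.
docker run -v /data/sp1-circuits:/root/.sp1/circuits <cpu-node-image>
```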
CPU wrap proof is too slow
If you do not see circuit artifact issues but are still seeing Groth16/Plonk proofs timing out, it's likely that your CPU node does not have enough RAM or CPU resources. Having more cores and at least 32 GB RAM should result in better wrapping performance. If this is not possible, you can:
- Increase time buffers: `BIDDER_BUFFER_SEC` (base), `BIDDER_GROTH16_BUFFER_SEC`, and `BIDDER_PLONK_BUFFER_SEC` to require more slack before bidding.
- Temporarily disable a mode: set `BIDDER_GROTH16_ENABLED=false` or `BIDDER_PLONK_ENABLED=false` to avoid bidding on that proof type.
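For example, a minimal sketch of loosening the bidder while the CPU node is underpowered (the values are illustrative; set them wherever your bidder's environment is configured):

```bash
# Require two extra minutes of slack before bidding on Groth16 requests.
export BIDDER_GROTH16_BUFFER_SEC=120

# Stop bidding on Plonk requests entirely until the CPU node is upgraded.
export BIDDER_PLONK_ENABLED=false
```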
10. Why is my Redis node using so much disk space?
Make sure persistence is disabled for Redis (`persistence.enabled: false` for the Helm chart or `REDIS_AOF_ENABLED=no` for the Docker image) as it is not needed. This should prevent Redis from using any disk space. You can also try tuning other Redis parameters such as `io-threads` and `hz`.
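A quick sketch of both options, assuming the Bitnami Redis image and Helm chart these settings refer to (value paths and release names may differ for your chart version):

```bash
# Docker: run Redis with append-only-file persistence disabled.
docker run -d --name redis -e REDIS_PASSWORD=redispassword -e REDIS_AOF_ENABLED=no bitnami/redis:latest

# Helm: disable persistence on an existing release (assumes the bitnami repo is configured).
helm upgrade redis bitnami/redis --reuse-values --set persistence.enabled=false
```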
11. What's the optimal configuration for my cluster hardware?
For optimal performance, we recommend the following:
- CPU nodes
  - At least 32 GB RAM per concurrent proof is recommended (ex. 4 concurrent proofs = at least 128 GB RAM, or two nodes with 64 GB RAM each).
  - A high CPU core count helps Groth16/Plonk proofs.
  - You don't need that many CPU nodes: at the very most, 1 per concurrent proof. Any more than that has no benefit.
- GPU nodes
  - RTX 5090/4090 are best; other GPUs with >=24 GB VRAM are okay as well. More than 24 GB VRAM is not beneficial. x16 PCIe lanes per GPU are recommended for best memory throughput.
  - For system RAM (non-VRAM), DDR5 is recommended, with 32 GB per GPU.
  - At least 4 CPU cores per GPU is recommended.
- 10 Gbps internal networking is recommended.
- If running more than 8 GPUs, multiple Redis nodes are recommended (see FAQ #8).
12. How should I configure the bidder?
TL;DR:
- You should set `BIDDER_THROUGHPUT_MGAS` to the throughput observed from running the CLI bench tool.
- You should set `BIDDER_MAX_CONCURRENT_PROOFS` to `throughput / 8`.
- If your cluster is timing out on many proofs, you should try lowering these variables or increasing `BIDDER_BUFFER_SEC`.
- If your cluster is timing out on Groth16/Plonk proofs, you can increase the mode-specific buffer (`BIDDER_GROTH16_BUFFER_SEC` / `BIDDER_PLONK_BUFFER_SEC`) or disable that mode entirely.
Environment variables
The bidder has three main configuration env vars: `BIDDER_MAX_CONCURRENT_PROOFS`, `BIDDER_THROUGHPUT_MGAS`, and `BIDDER_BID_AMOUNT`.
- `BIDDER_THROUGHPUT_MGAS` should be set to the estimated total throughput of the cluster in million PGUs per second. This should be configured to the observed total throughput from running a large enough Fibonacci test on the cluster. See FAQ #6.
- `BIDDER_MAX_CONCURRENT_PROOFS` controls the maximum number of assigned proofs the bidder will try to maintain.
  - The bidder determines how much throughput each proof gets based on `BIDDER_THROUGHPUT_MGAS / BIDDER_MAX_CONCURRENT_PROOFS`, and uses this per-proof throughput to estimate whether or not it has enough time to fulfill open proof requests. (The cluster coordinator assigns tasks evenly by proof, so each proof gets a roughly equal amount of throughput at any given time.) Setting `BIDDER_MAX_CONCURRENT_PROOFS` to a lower value will allow the bidder to bid on proofs that have more aggressive deadlines, but if the value is too low, it may not utilize the cluster effectively.
  - Depending on how aggressive the deadlines of requested proofs are, this value could for example be set somewhere in the range of `BIDDER_THROUGHPUT_MGAS / 10` to `BIDDER_THROUGHPUT_MGAS / 2`, allowing the bidder to bid on proofs whose deadlines require up to 10 MGas/sec or 2 MGas/sec of per-proof throughput, respectively.
- `BIDDER_BID_AMOUNT` should be set to the amount of PROVE (in wei) per PGU the bidder will bid at. For example, if this is set to `200_000_000`, that would be equivalent to 0.20 PROVE per billion PGUs.
Additional vars control time buffers and proof modes:
- `BIDDER_BUFFER_SEC` (default 30): Base safety buffer applied to all proofs.
- `BIDDER_GROTH16_BUFFER_SEC` (default 30): Extra buffer for Groth16 proofs.
- `BIDDER_GROTH16_ENABLED` (default true): Disable to avoid bidding on Groth16.
- `BIDDER_PLONK_BUFFER_SEC` (default 80): Extra buffer for Plonk proofs.
- `BIDDER_PLONK_ENABLED` (default true): Disable to avoid bidding on Plonk.
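Putting it together, a minimal sketch of the three main variables for a cluster that benchmarked at roughly 24 aggregate MHz (the numbers are illustrative; where these variables live depends on how you deploy the bidder):

```bash
# Observed aggregate throughput from the bench tool, in million PGUs per second.
export BIDDER_THROUGHPUT_MGAS=24

# TL;DR rule of thumb: throughput / 8, i.e. 24 / 8 = 3, giving each proof
# roughly 8 MGas/s of headroom.
export BIDDER_MAX_CONCURRENT_PROOFS=3

# 200_000_000 wei of PROVE per PGU, i.e. 0.20 PROVE per billion PGUs.
export BIDDER_BID_AMOUNT=200000000
```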