CUDA
WARNING: CUDA proving is still an experimental feature and may be buggy.
SP1 supports CUDA acceleration, which can provide significantly better latency and cost compared to the CPU prover, even with AVX acceleration enabled.
Software Requirements
The CUDA prover runs inside a Docker image, so please make sure you have Docker installed before using it.
Hardware Requirements
- CPU: We recommend having at least 8 CPU cores with 32GB of RAM available to fully utilize the GPU.
- GPU: 24GB or more of VRAM for core/compressed proofs; 40GB or more for shrink/wrap proofs.
Usage
To use the CUDA prover, you have two options:
- Set the `SP1_PROVER` environment variable to `cuda` and use `ProverClient::from_env` to build the client.
- Use `ProverClient::builder().cuda().build()` to build the client.
Then, use your standard methods on the `ProverClient` to generate proofs.
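The environment-variable route can be sketched as follows; the `cargo run` invocation is a hypothetical example of launching your own proving binary:

```shell
# Tell ProverClient::from_env to select the CUDA prover for this process.
export SP1_PROVER=cuda
# Then run your proving binary as usual, for example:
#   cargo run --release
echo "SP1_PROVER=$SP1_PROVER"
```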
Recommended Workflow
Currently, the CUDA prover relies on a Docker image that contains state and only utilizes the 0th GPU.
In general, the best practice is to keep an instance of a `ProverClient` in an `Arc`, since clients have initialization overhead. However, the CUDA prover is "single threaded": calling its methods concurrently will cause unexpected behavior, and attempting to initialize a CUDA prover twice in the same process (even if the previous one was dropped) will panic. To avoid these issues, store the CUDA prover in an `Arc<Mutex<_>>`.
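A minimal sketch of the `Arc<Mutex<_>>` pattern, using a hypothetical stand-in `CudaProver` struct in place of the real sp1-sdk client (which you would build once via `ProverClient::builder().cuda().build()`):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for the SP1 CUDA prover client; in real code this
// would be the client returned by ProverClient::builder().cuda().build().
struct CudaProver;

impl CudaProver {
    // Placeholder "proof" so the sharing pattern can be demonstrated.
    fn prove(&self, input: u32) -> u32 {
        input + 1
    }
}

// Share one prover across threads; the Mutex serializes proof generation,
// matching the CUDA prover's single-threaded constraint.
fn run_proofs(prover: Arc<Mutex<CudaProver>>, inputs: Vec<u32>) -> Vec<u32> {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|i| {
            let prover = Arc::clone(&prover);
            thread::spawn(move || {
                // Only one thread can hold the lock, so proofs never overlap.
                prover.lock().unwrap().prove(i)
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    // Build the prover exactly once per process.
    let prover = Arc::new(Mutex::new(CudaProver));
    let results = run_proofs(prover, vec![0, 1, 2, 3]);
    println!("{:?}", results); // prints [1, 2, 3, 4]
}
```

The key point is that the prover is constructed a single time and every thread goes through the same `Mutex`, so proofs are generated one at a time and the process never initializes a second CUDA prover.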