Optimize Your Prover
We are actively seeking partnerships with teams interested in building competitive provers on the prover network. Learn more here.
If you want to be a competitive prover on the network, you'll need to do a lot more than just run the reference implementation. Here's what you need to know:
Multi-GPU Parallelization
The reference implementation doesn't use multiple GPUs, which means you won't hit the SDK's latency targets. Competitive provers regularly use 20-40 GPUs per proving request to minimize per-request latency. This requires:
- Writing custom code to split proving work across multiple GPUs
- Coordinating the RISC-V executor across the GPU cluster
- Creating checkpoints of the virtual machine state
- Distributing execution artifacts to different GPUs for parallel proving
- Orchestrating the recursion prover to combine sharded proofs into a single final proof
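The steps above can be sketched roughly as follows. This is a minimal illustration, not the reference implementation: `Checkpoint`, `prove_shard`, and `combine` are hypothetical placeholders, threads stand in for GPU workers, and strings stand in for proofs. It only shows the shape of the pipeline: cut execution into shards at checkpoints, prove shards in parallel, then fold the shard proofs pairwise into one.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical snapshot of VM state covering one shard of execution."""
    shard_index: int
    start_cycle: int
    end_cycle: int


def make_checkpoints(total_cycles: int, shard_size: int) -> list:
    """Cut the execution trace into fixed-size shards, one checkpoint each."""
    return [
        Checkpoint(i, start, min(start + shard_size, total_cycles))
        for i, start in enumerate(range(0, total_cycles, shard_size))
    ]


def prove_shard(checkpoint: Checkpoint, gpu_id: int) -> str:
    """Placeholder for the real per-shard GPU prover."""
    return f"proof[{checkpoint.start_cycle}:{checkpoint.end_cycle}]@gpu{gpu_id}"


def combine(left: str, right: str) -> str:
    """Placeholder for the recursion prover that merges two proofs."""
    return f"({left}+{right})"


def prove(total_cycles: int, shard_size: int, num_gpus: int) -> str:
    checkpoints = make_checkpoints(total_cycles, shard_size)
    if not checkpoints:
        raise ValueError("nothing to prove")
    # Assign shards to GPUs round-robin and prove them in parallel.
    with ThreadPoolExecutor(max_workers=num_gpus) as pool:
        futures = [
            pool.submit(prove_shard, cp, cp.shard_index % num_gpus)
            for cp in checkpoints
        ]
        proofs = [f.result() for f in futures]
    # Fold shard proofs pairwise until a single proof remains.
    while len(proofs) > 1:
        merged = [
            combine(proofs[i], proofs[i + 1])
            for i in range(0, len(proofs) - 1, 2)
        ]
        if len(proofs) % 2:
            merged.append(proofs[-1])  # odd proof carries to the next round
        proofs = merged
    return proofs[0]
```

In a real deployment the merge rounds can themselves run in parallel, and the executor can stream checkpoints to GPUs while it is still running rather than producing them all up front.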
Cost-Effective GPU Infrastructure
Running on AWS or GCP will probably eat up all your profits. You'll need to get creative:
- Build autoscaling infrastructure for on-demand GPU usage
- Source long-term GPU reservations from alternative cloud providers
- Buy GPUs in bulk directly from suppliers
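To compare these options, it can help to reduce each one to a cost per proof. The sketch below is a simplified model under stated assumptions: on-demand capacity is billed only while a proof is running, reserved capacity is billed around the clock, and all function names and inputs are hypothetical. Plug in your own quotes; no real prices are implied.

```python
def cost_per_proof_on_demand(gpu_hourly_usd: float, gpus: int,
                             proof_seconds: float) -> float:
    """Cost of one proof when you pay only for the GPU time you use."""
    return gpu_hourly_usd * gpus * proof_seconds / 3600


def cost_per_proof_reserved(gpu_hourly_usd: float, gpus: int,
                            proof_seconds: float, utilization: float) -> float:
    """Cost of one proof on reserved GPUs that are busy `utilization` of the time.

    Idle hours are still paid for, so the effective cost grows as
    utilization drops.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return gpu_hourly_usd * gpus * proof_seconds / 3600 / utilization


def break_even_utilization(on_demand_hourly_usd: float,
                           reserved_hourly_usd: float) -> float:
    """Utilization above which reserved capacity is cheaper than on-demand."""
    return reserved_hourly_usd / on_demand_hourly_usd
```

Under this model, a reservation priced at half the on-demand rate only pays off if you keep the GPUs busy more than half the time, which is why autoscaling and reservations are usually combined.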
Software and Hardware Optimizations
The prover is primarily bottlenecked by three operations: hashing, field operations, and memory access. You'll need to optimize at both the software and hardware levels:
Software Optimizations
- Optimize the assembly for field operations and hashing, which are commonly used primitives in the prover
- Use AVX2, AVX-512, or NEON to vectorize certain operations
- Optimize memory usage and memory access to minimize cache misses
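As one example of what optimizing field operations can look like, the sketch below shows Montgomery multiplication, a standard technique that replaces the expensive division in modular multiplication with shifts and masks. Two assumptions: the modulus is the 31-bit BabyBear prime (this page does not name the prover's field), and Python is used only to show the arithmetic. A production version would be written in assembly or with SIMD intrinsics.

```python
# Assumption: a 31-bit prime field; the modulus below is the BabyBear prime.
P = 2013265921                      # 15 * 2**27 + 1
R_BITS = 32
R = 1 << R_BITS                     # Montgomery radix, R > P
R_MASK = R - 1
NEG_P_INV = (-pow(P, -1, R)) % R    # -P^{-1} mod R (needs Python 3.8+)


def mont_reduce(t: int) -> int:
    """Return t * R^{-1} mod P for 0 <= t < P * R, using no division."""
    m = ((t & R_MASK) * NEG_P_INV) & R_MASK
    u = (t + m * P) >> R_BITS       # t + m*P is divisible by R by construction
    return u - P if u >= P else u   # u < 2P, so one subtraction suffices


def to_mont(a: int) -> int:
    """Map a into Montgomery form (a * R mod P)."""
    return (a * R) % P


def from_mont(a: int) -> int:
    """Map a value out of Montgomery form."""
    return mont_reduce(a)


def mont_mul(a: int, b: int) -> int:
    """Multiply two values that are already in Montgomery form."""
    return mont_reduce(a * b)
```

Values stay in Montgomery form across a whole computation, so the conversion cost is paid once on the way in and once on the way out.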
GPU Optimizations
- Identify massively parallel parts of the prover, such as Merkle tree root calculation and quotient calculation, and implement CUDA kernels for them
- Optimize memory transfer between CPU and GPU
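To see why Merkle root calculation maps well onto a GPU, consider the CPU sketch below. It builds the tree one layer at a time, and every hash within a layer is independent of the others, so a kernel could assign one thread per pair. SHA-256 is a stand-in chosen for illustration; this page does not specify the prover's hash function, and the function names are hypothetical.

```python
import hashlib


def hash_pair(left: bytes, right: bytes) -> bytes:
    """Hash two child digests into their parent digest."""
    return hashlib.sha256(left + right).digest()


def merkle_root(leaves: list) -> bytes:
    """Compute a Merkle root over a power-of-two number of byte-string leaves."""
    n = len(leaves)
    if n == 0 or n & (n - 1):
        raise ValueError("number of leaves must be a nonzero power of two")
    # Leaf layer: every leaf is hashed independently.
    layer = [hashlib.sha256(leaf).digest() for leaf in leaves]
    # Each pass halves the layer; all pairs in a pass can run in parallel.
    while len(layer) > 1:
        layer = [
            hash_pair(layer[i], layer[i + 1])
            for i in range(0, len(layer), 2)
        ]
    return layer[0]
```

On a GPU, keeping every layer in device memory and copying back only the root is one way to limit the CPU-GPU transfers mentioned above.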
Hardware Optimizations
- Source CPU and GPU hardware that offers the best balance of cost and latency
- Use more advanced hardware such as FPGAs or ASICs to accelerate the prover
Reliability and Performance Monitoring
You'll need to invest in robust monitoring and testing to keep your prover reliable and running at full capacity:
- Test your proving node thoroughly so that you miss proof deadlines as rarely as possible
- Integrate extensive monitoring, metrics collection, health checks, and paging so that failures and capacity drops are caught quickly
- Benchmark your performance against other provers on the network
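Two metrics worth tracking from day one are your deadline miss rate and your tail latency. The sketch below computes both from a list of completed proofs. `ProofRecord` and its fields are hypothetical; in practice you would feed these values from your own request logs or metrics pipeline.

```python
import math
from dataclasses import dataclass


@dataclass
class ProofRecord:
    """Hypothetical record of one completed proving request."""
    latency_s: float   # time from request assignment to proof submission
    deadline_s: float  # time the request allowed


def deadline_miss_rate(records: list) -> float:
    """Fraction of proofs that finished after their deadline."""
    if not records:
        return 0.0
    missed = sum(1 for r in records if r.latency_s > r.deadline_s)
    return missed / len(records)


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=95 for p95 latency."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

Alerting on p95 or p99 latency approaching the deadline, rather than on the miss rate alone, gives you a warning before proofs actually start to fail.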