Why train a custom model instead of using an API?
API-based models are excellent for general-purpose tasks and rapid prototyping. Custom training makes sense when you need: specialised performance on your domain data, predictable costs at high volume, data privacy (no data sent to third parties), offline capability, or a model optimised for your specific latency and throughput requirements. Many teams use a hybrid approach — API for complex reasoning, custom models for high-volume or sensitive tasks.
Is Burn production-ready for training?
Yes. Burn v0.20 (January 2026) is actively used in production by teams training and deploying models. It supports automatic differentiation, multiple optimisers (Adam, SGD, AdamW), learning rate scheduling, gradient checkpointing, distributed training via NCCL, and a built-in terminal dashboard for training metrics. The CubeCL kernel system delivers performance competitive with LibTorch. The ecosystem is still newer than PyTorch's, but core training workflows are solid and well-tested.
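Learning rate scheduling is easiest to see in a framework-independent form. The sketch below implements linear warmup followed by cosine decay. This is one common schedule, chosen here only for illustration. It is plain std-only Rust, not Burn's scheduler API, and the function name and parameters (warmup_cosine_lr, peak_lr, min_lr) are hypothetical.

```rust
use std::f64::consts::PI;

/// Learning rate at `step` for a hypothetical linear-warmup + cosine-decay schedule.
/// The rate ramps linearly up to `peak_lr` over `warmup_steps`, then follows a
/// half cosine down to `min_lr` at `total_steps`, and stays at `min_lr` afterwards.
fn warmup_cosine_lr(
    step: usize,
    warmup_steps: usize,
    total_steps: usize,
    peak_lr: f64,
    min_lr: f64,
) -> f64 {
    if warmup_steps > 0 && step < warmup_steps {
        // Linear ramp: peak_lr / warmup_steps at step 0, peak_lr at the last warmup step.
        return peak_lr * (step + 1) as f64 / warmup_steps as f64;
    }
    if step >= total_steps {
        return min_lr;
    }
    // Here warmup_steps <= step < total_steps, so the subtractions cannot underflow.
    let decay_steps = (total_steps - warmup_steps) as f64;
    let progress = (step - warmup_steps) as f64 / decay_steps; // in [0, 1)
    let cosine = 0.5 * (1.0 + (PI * progress).cos()); // 1.0 -> 0.0
    min_lr + (peak_lr - min_lr) * cosine
}
```

With warmup_steps = 10, total_steps = 110 and peak_lr = 1e-3, the rate climbs to 1e-3 over the first 10 steps. It reaches the midpoint between peak_lr and min_lr at step 60 and stays at min_lr from step 110 onward. In a real training loop, a value like this would be handed to the optimiser at each step.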
How much data do I need to train a model?
It depends on the approach. Fine-tuning a pre-trained model (via LoRA or QLoRA) can work with as few as 500–1,000 high-quality examples. Training from scratch requires substantially more data: typically tens of thousands to millions of examples, depending on model size and task complexity. During the assessment phase, we evaluate your data volume and quality and recommend the most data-efficient approach.
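One reason LoRA fine-tuning needs fewer examples is that it trains far fewer parameters. The pre-trained weight matrix W stays frozen, and only a low-rank update B·A is learned. The arithmetic below compares the two parameter counts for a single weight matrix. The 4096 × 4096 shape and rank 8 are illustrative assumptions, and parameter count is only one of several factors that determine how much data a task needs.

```rust
/// Trainable parameters when fully fine-tuning a dense `d_out x d_in` weight matrix.
fn full_params(d_out: u64, d_in: u64) -> u64 {
    d_out * d_in
}

/// Trainable parameters for a rank-`r` LoRA adapter on the same matrix.
/// W stays frozen; the update is factored as B (d_out x r) times A (r x d_in),
/// and only A and B are trained.
fn lora_params(d_out: u64, d_in: u64, r: u64) -> u64 {
    r * (d_out + d_in)
}

/// Fraction of the full parameter count that the LoRA adapter trains.
fn lora_fraction(d_out: u64, d_in: u64, r: u64) -> f64 {
    lora_params(d_out, d_in, r) as f64 / full_params(d_out, d_in) as f64
}
```

For a 4096 × 4096 projection, full fine-tuning trains 16,777,216 parameters. A rank-8 adapter trains 8 × (4096 + 4096) = 65,536 parameters, which is exactly 1/256 (about 0.39%) of the full count.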
What hardware do I need for training?
For fine-tuning (LoRA/QLoRA), a single consumer GPU with 16–24 GB VRAM (RTX 3090/4090) is often sufficient. Training larger models or training from scratch benefits from professional GPUs (A100, H100, AMD MI250) or multi-GPU setups. Burn supports CUDA, ROCm, and Metal, so you can use NVIDIA, AMD, or Apple hardware. We help you select the right infrastructure — cloud GPU instances are a great starting point before committing to on-prem hardware.
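A back-of-the-envelope estimate shows why LoRA/QLoRA fits on a 24 GB card while full fine-tuning usually does not. The sketch uses a common rule of thumb: full fine-tuning with Adam in fp32 needs about 16 bytes per parameter (weights, gradients and two optimiser moments at 4 bytes each). It counts only model and optimiser state. Activations, adapter weights and framework overhead are ignored, so real usage is higher. The 7-billion-parameter model used below is a hypothetical example.

```rust
/// Approximate memory in GB (1 GB = 1e9 bytes) for model and optimiser state only.
/// Activations, adapter weights and framework overhead are NOT included.
fn state_memory_gb(num_params: f64, bytes_per_param: f64) -> f64 {
    num_params * bytes_per_param / 1e9
}

/// Full fine-tuning with Adam in fp32: weights + gradients + two Adam moments, 4 bytes each.
const FULL_FINETUNE_ADAM_FP32: f64 = 4.0 + 4.0 + 4.0 + 4.0;

/// LoRA: frozen base weights kept in fp16/bf16 (2 bytes); no gradients or optimiser state for them.
const LORA_FROZEN_FP16: f64 = 2.0;

/// QLoRA: frozen base weights quantised to 4 bits (0.5 bytes per parameter).
const QLORA_FROZEN_4BIT: f64 = 0.5;
```

For a hypothetical 7B-parameter model, this gives about 112 GB of state for full fine-tuning. Under LoRA, the frozen fp16 weights need about 14 GB, and under QLoRA the 4-bit frozen weights need about 3.5 GB. Only the last two leave room for activations on a 16–24 GB consumer GPU.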
Can I deploy the trained model without Python?
Yes, and this is the primary advantage of Burn. The trained model is serialised and loaded by a Rust application that compiles to a single self-contained binary (~8 MB). No Python runtime, no virtual environment, no dependency conflicts. The application can be deployed on bare metal, in a Docker container or on a serverless platform, or compiled to WebAssembly for browser-based inference.
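The deployment model described above (train, serialise, then load from a plain Rust binary) can be illustrated with the standard library alone. Burn provides its own recorder formats for saving and loading modules. The sketch below instead uses a hypothetical raw format (a u64 length followed by little-endian f32 values) and a one-neuron linear model. Its only purpose is to show that loading weights and running inference need nothing beyond compiled Rust. The names save_weights, load_weights and predict are illustrative and are not Burn APIs.

```rust
use std::io::{self, Read, Write};

/// Writes `weights` as a little-endian u64 length prefix followed by
/// each value as a little-endian f32 (a hypothetical format for illustration).
fn save_weights<W: Write>(mut out: W, weights: &[f32]) -> io::Result<()> {
    out.write_all(&(weights.len() as u64).to_le_bytes())?;
    for w in weights {
        out.write_all(&w.to_le_bytes())?;
    }
    Ok(())
}

/// Reads weights in the format produced by `save_weights`.
fn load_weights<R: Read>(mut input: R) -> io::Result<Vec<f32>> {
    let mut len_bytes = [0u8; 8];
    input.read_exact(&mut len_bytes)?;
    let len = u64::from_le_bytes(len_bytes);
    let mut weights = Vec::new();
    let mut value_bytes = [0u8; 4];
    for _ in 0..len {
        input.read_exact(&mut value_bytes)?;
        weights.push(f32::from_le_bytes(value_bytes));
    }
    Ok(weights)
}

/// One-neuron linear model: the last weight is the bias, the rest multiply `x`.
/// Assumes `weights.len() == x.len() + 1`.
fn predict(weights: &[f32], x: &[f32]) -> f32 {
    let (w, bias) = weights.split_at(weights.len() - 1);
    let dot: f32 = w.iter().zip(x).map(|(a, b)| a * b).sum();
    dot + bias[0]
}
```

The reader and writer are generic over io::Read and io::Write, so the same functions work with a File, an in-memory buffer or a byte slice embedded in the binary.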
How do you handle experiment tracking?
Burn includes a built-in metrics dashboard for tracking training progress: loss curves, accuracy, learning rate and custom metrics. For more advanced experiment management, we integrate with external tools via Burn's logging API. We also implement checkpointing so training can resume from the last saved checkpoint, and we version both models and datasets for reproducibility.
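Burn's Learner has checkpointing support of its own. The std-only sketch below only illustrates the resume pattern described above: persist the epoch, learning rate and weights, and on start-up load the last checkpoint if one exists. The Checkpoint struct, its text format and the function names are hypothetical. Each save writes to a temporary file and then renames it over the real checkpoint. On filesystems where rename is atomic (typical POSIX filesystems), a crash mid-write therefore leaves the previous checkpoint intact.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Minimal state needed to resume training (hypothetical layout).
#[derive(Debug, Clone, PartialEq)]
struct Checkpoint {
    epoch: u32,
    learning_rate: f64,
    weights: Vec<f64>,
}

impl Checkpoint {
    /// Serialises to a simple line-based `key=value` text format.
    /// f64 Display output parses back to the same value.
    fn to_text(&self) -> String {
        let weights: Vec<String> = self.weights.iter().map(|w| w.to_string()).collect();
        format!("epoch={}\nlr={}\nweights={}\n", self.epoch, self.learning_rate, weights.join(","))
    }

    /// Parses `to_text` output; returns None if a field is missing or malformed.
    fn from_text(text: &str) -> Option<Checkpoint> {
        let mut epoch: Option<u32> = None;
        let mut learning_rate: Option<f64> = None;
        let mut weights: Option<Vec<f64>> = None;
        for line in text.lines() {
            let (key, value) = line.split_once('=')?;
            match key {
                "epoch" => epoch = value.parse().ok(),
                "lr" => learning_rate = value.parse().ok(),
                // Collecting Option<f64> into Option<Vec<f64>> gives None if any value fails to parse.
                "weights" => {
                    weights = value
                        .split(',')
                        .filter(|s| !s.is_empty())
                        .map(|s| s.parse::<f64>().ok())
                        .collect()
                }
                _ => {}
            }
        }
        Some(Checkpoint { epoch: epoch?, learning_rate: learning_rate?, weights: weights? })
    }
}

/// Writes to a temporary file first, then renames it over `path`.
fn save_checkpoint(path: &Path, ckpt: &Checkpoint) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, ckpt.to_text())?;
    fs::rename(&tmp, path)
}

/// Resumes from `path` if it exists and parses; otherwise starts from `initial`.
fn load_or_init(path: &Path, initial: Checkpoint) -> Checkpoint {
    fs::read_to_string(path)
        .ok()
        .and_then(|text| Checkpoint::from_text(&text))
        .unwrap_or(initial)
}
```

A training loop would call save_checkpoint at the end of each epoch and load_or_init once at start-up, continuing from checkpoint.epoch + 1.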