General purpose
Candle
Minimalist ML framework from Hugging Face, written in Rust. A strong fit for most self-hosted inference scenarios: small binary (~8 MB), broad model support (Llama, Mistral, Phi, Gemma, Stable Diffusion, Whisper, YOLO), GPU acceleration via CUDA and Metal, and first-class WASM support for browser or edge deployment.
- 20k+ GitHub stars and 260+ contributors: a large, active community
- Quantization support via GGUF, so models can run on consumer GPUs
- Apache 2.0 licensed, fully open source