By manually deriving all compute-heavy maths steps and handwriting GPU kernels, Unsloth makes training faster without any hardware changes.
Training is 10x faster on a single GPU and up to 32x faster on multi-GPU systems compared to Flash Attention 2 (FA2).
We support NVIDIA GPUs from Tesla T4 to H100, and we’re portable to AMD and Intel GPUs.
Why not try our free, open-source version? Finetune 2x faster on a single NVIDIA GPU for free on Google Colab or Kaggle Notebooks.
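
To give a feel for what "manually deriving" a maths step means, here is a minimal sketch in plain Python. It is an illustration only, not Unsloth's actual kernels: the function names (`manual_grad`, `numeric_grad`) are hypothetical, and softmax cross-entropy is chosen simply because its closed-form gradient is well known. The hand-derived gradient needs one softmax pass, and the finite-difference check confirms it is correct.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    # Loss for a single example: -log(probability of the target class).
    return -math.log(softmax(logits)[target])


def manual_grad(logits, target):
    # Hand-derived result: d(loss)/d(logit_i) = softmax_i - 1[i == target].
    # (Hypothetical helper for illustration; not part of any library.)
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]


def numeric_grad(logits, target, eps=1e-6):
    # Central finite differences, used only to check the derivation.
    grads = []
    for i in range(len(logits)):
        up = list(logits)
        down = list(logits)
        up[i] += eps
        down[i] -= eps
        diff = cross_entropy(up, target) - cross_entropy(down, target)
        grads.append(diff / (2 * eps))
    return grads


if __name__ == "__main__":
    logits, target = [2.0, -1.0, 0.5], 0
    print("manual :", manual_grad(logits, target))
    print("numeric:", numeric_grad(logits, target))
```

The closed form avoids storing and replaying intermediate operations, which is the general idea behind replacing generic automatic differentiation with a derived formula inside a fused GPU kernel.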