70% + 20% VRAM reduction

Mar 19, 2024 • By Daniel & Michael


| Benchmark | Hardware | Speedup | VRAM |
| --- | --- | --- | --- |
| TinyLlama | Colab T4 | 387% faster | -74% |
| DPO Zephyr | 1x A100 | 188% faster | -11.6% |

Hey readers! It's been a month since our Gemma bug fixes, and today you can reduce memory even further. By using Unsloth's new checkpointing (sketched below), Unsloth now cuts VRAM use by an extra 25% with no extra overhead (well, 1% if you want specifics), on top of the 70% reduction Unsloth already gave you. Also, the longer the context, the bigger the VRAM reduction! See below for tables on the new minimum requirements for models.
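For the curious, here is a minimal Python sketch of how the new checkpointing is switched on when attaching LoRA adapters. The model name, rank, and the `use_gradient_checkpointing = "unsloth"` argument below are illustrative assumptions; check the notebooks for the exact settings.

```python
from unsloth import FastLanguageModel

# Load a 4-bit TinyLlama base model (illustrative choice).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/tinyllama-bnb-4bit",
    max_seq_length = 4096,  # extended context window
    load_in_4bit = True,
)

# Attach rank-16 LoRA adapters to the attention and MLP projections.
# The "unsloth" setting turns on the new memory-saving checkpointing.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing = "unsloth",
)
```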
  • You can finetune TinyLlama 387% faster + use 74% less memory on 1 epoch of Alpaca's 52K dataset in 84 minutes on a free Google Colab instance with packing support! We also extended the context window from 2048 to 4096 tokens automatically! Notebook
  • With packing support through 🤗Hugging Face, TinyLlama is not just 387% faster but a whopping 6,700% faster than without packing (see the sketch after this list)! Shocking!
  • P.S. Don't forget to ⭐Star us on GitHub and join our Discord server ❤️
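Under the hood, packing is a 🤗 TRL SFTTrainer feature. Here is a rough, self-contained sketch of what a packed TinyLlama run looks like; the dataset repo, prompt format, and hyperparameters are illustrative choices rather than the notebook's exact settings.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Alpaca-style 52K dataset (illustrative repo id).
dataset = load_dataset("yahma/alpaca-cleaned", split = "train")

def to_text(example):
    # Collapse instruction / input / output into a single training string.
    return {"text": f"### Instruction:\n{example['instruction']}\n\n"
                    f"### Input:\n{example['input']}\n\n"
                    f"### Response:\n{example['output']}"}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model = model,                # the PEFT model from the sketch above
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 4096,
    packing = True,               # pack short examples together for the big speedup
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        num_train_epochs = 1,
        output_dir = "outputs",
    ),
)
trainer.train()
```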
In case you missed it, we've also written a blog post on Hugging Face. By directly integrating Unsloth, users can now achieve 2x faster finetuning and use 50% less memory by installing our package. A huge thanks to the Hugging Face team and Younes Belkada for making this possible. We look forward to more collabs in the future! We're also in 🤗Hugging Face's docs!
Unsloth was benchmarked across 59 runs using 4 datasets on Tesla T4 and A100 Google Colab instances. QLoRA was applied to all linear layers (attention and MLP) with a rank of 16, and gradient checkpointing was on. Tested against the latest Transformers version (4.36), which natively integrates SDPA if you have PyTorch 2.1.1, Unsloth is up to 2.7x faster and uses up to 74% less memory. We also tested Unsloth on a free Google Colab instance (low RAM, 1x T4 GPU, PyTorch 2.1.0, CUDA 12.1). All 59 notebooks are provided for full reproducibility, and more details are in Unsloth's benchmarking details here.
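For reference, that benchmark description maps roughly onto the following plain Hugging Face setup. This is a sketch using standard Transformers, PEFT, and bitsandbytes APIs, with the base model, quantization dtype, and lora_alpha assumed rather than copied from the benchmark scripts.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit QLoRA base model (Mistral 7b chosen as an example).
bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_compute_dtype = torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config = bnb_config,
    device_map = "auto",
)

# Rank-16 LoRA on all attention and MLP linear layers (lora_alpha is assumed).
lora_config = LoraConfig(
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    task_type = "CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Gradient checkpointing was on for every benchmark run.
model.gradient_checkpointing_enable()
```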

Unsloth Checkpoint benchmarks

|  | Unsloth + Checkpointing | Unsloth Old | Hugging Face + Flash Attention 2 | Hugging Face | Unsloth Old |
| --- | --- | --- | --- | --- | --- |
| Speed boost | 2x | 43455 | 455 | 2x | 2x |
| Gemma 7b | 2x | 43455 | 455 | 2x | 2x |
| Mistral 7b | 2x | 43455 | 455 | 2x | 2x |
| Stable Diffusion | 2x | 43455 | 455 | 2x | 2x |
Other important updates
  • Kaggle Notebooks should now be fully fixed. No more bugs.
  • The Gemma bugs we found and fixed are now fully integrated into Unsloth.
  • Bugs when saving models have been resolved.
  • We've also enabled native text streaming in all notebooks, so you can watch outputs token by token as they are generated (see the sketch below).
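Text streaming in 🤗 Transformers is handled by TextStreamer. Here is a minimal sketch, assuming `model` and `tokenizer` are already loaded as above and using a made-up prompt:

```python
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt = True)

# Tokens are printed to stdout as soon as they are generated,
# instead of appearing all at once at the end.
inputs = tokenizer("Why are sloths slow?", return_tensors = "pt").to(model.device)
_ = model.generate(**inputs, streamer = streamer, max_new_tokens = 64)
```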
Support us! 💕 
Feel free to support us via our Ko-fi donation page. Huge shout out to: Rajesh, 007ok, Netrve, Goblin, pacozaa, Datta Nimmaturi, Hamel Husain, Ratish, Chris, Steffen, Remek, Anthony, Richard, Chrismcmaster, Trelis Research, preemware and Nam who are new supporters! 🙏

As always, be sure to join our Discord server for help or just to show your support! You can also follow us on Twitter and Substack.
Thank you for reading!
Daniel & Michael Han 🦥
19 March 2024

Unsloth Studio coming soon!

Join Our Discord