Hey readers! It's been a month since our Gemma bug fixes, and today you can reduce memory even further. By using Unsloth's new gradient checkpointing, Unsloth now reduces VRAM use by an extra 25% with no extra overhead (well, 1% if you want specifics). Previously, Unsloth already reduced VRAM use by 70%; this update adds an extra 25% reduction on top. Also, the longer the context, the more VRAM you save! See below for tables on the new minimum requirements for models.
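Turning it on takes one flag when you attach LoRA adapters. Here's a minimal sketch, assuming the `use_gradient_checkpointing = "unsloth"` setting in `FastLanguageModel.get_peft_model` (the model name is just an example):

```python
from unsloth import FastLanguageModel

# Load a 4-bit base model as usual.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",  # example model
    max_seq_length = 4096,
    load_in_4bit = True,
)

# "unsloth" selects the new gradient checkpointing algorithm, which buys
# the extra ~25% VRAM reduction for roughly 1% overhead.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    use_gradient_checkpointing = "unsloth",
)
```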
You can finetune TinyLlama 387% faster + use 74% less memory on 1 epoch of Alpaca's 52K dataset in 84 minutes on a free Google Colab instance with packing support! We also automatically extended the context window from 2048 to 4096 tokens! Try it in our Notebook.
With packing support through 🤗Hugging Face, TinyLlama is not just 387% faster but a whopping 6,700% faster than training without packing! Shocking!
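As a rough sketch of what the notebook does (the Notebook linked above is the canonical version; the dataset and model names here are illustrative), finetuning TinyLlama on Alpaca with packing enabled looks like:

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# TinyLlama, with the context window auto-extended from 2048 to 4096 tokens.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/tinyllama-bnb-4bit",
    max_seq_length = 4096,
    load_in_4bit = True,
)
model = FastLanguageModel.get_peft_model(model, r = 16)

# Alpaca's 52K dataset; this copy ships a preformatted "text" column.
dataset = load_dataset("tatsu-lab/alpaca", split = "train")

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 4096,
    packing = True,  # pack short examples into full 4096-token sequences
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        num_train_epochs = 1,
        output_dir = "outputs",
    ),
)
trainer.train()
```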
In case you missed it, we've also written up a blog post on Hugging Face. By directly integrating Unsloth, users can now achieve 2x faster finetuning and use 50% less memory by installing our package. A huge thanks to the Hugging Face team and Younes Belkada for making this possible. We look forward to more collabs in the future! We're also in 🤗Hugging Face's docs!
Unsloth was benchmarked across 59 runs using 4 datasets on Tesla T4 and A100 Google Colab instances. QLoRA was applied to all linear layers (attention and MLP) with a rank of 16, and gradient checkpointing was on. Tested against the latest Transformers version (4.36), which has SDPA natively integrated if you have PyTorch 2.1.1, Unsloth is up to 2.7x faster and uses up to 74% less memory. We also tested Unsloth on a free Google Colab instance (low RAM, 1 T4 GPU, PyTorch 2.1.0, CUDA 12.1). All 59 notebooks are provided for full reproducibility, and more details are in Unsloth's benchmarking details here.
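Concretely, that benchmark setup corresponds to a config like this (module names follow Llama-style architectures; the model name is just an example):

```python
from unsloth import FastLanguageModel

# QLoRA: 4-bit quantized base weights plus trainable LoRA adapters.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",  # example model
    max_seq_length = 2048,
    load_in_4bit = True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # rank used across all 59 benchmark runs
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    lora_alpha = 16,
    use_gradient_checkpointing = True,  # gradient checkpointing was on
)
```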
Unsloth Checkpoint benchmarks

| Model | Unsloth + Checkpointing | Unsloth Old | Hugging Face + Flash Attention 2 | Hugging Face | Speed boost |
| --- | --- | --- | --- | --- | --- |
| Gemma 7b | 2x | 43455 | 455 | 2x | 2x |
| Mistral 7b | 2x | 43455 | 455 | 2x | 2x |
| Stable Diffusion | 2x | 43455 | 455 | 2x | 2x |
Other important updates
Kaggle Notebooks should now be fully fixed. No more bugs.
The Gemma bugs which we found + fixed are now fully integrated into Unsloth.
Bugs when saving models are also solved.
We’ve also enabled native text streaming in all notebooks, so you can watch your model's output appear token by token during inference; see the sketch below.
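A minimal sketch using 🤗Hugging Face's TextStreamer (assuming a `model` and `tokenizer` already loaded via FastLanguageModel, as in the snippets above):

```python
from transformers import TextStreamer

# Assumes `model` and `tokenizer` were loaded with FastLanguageModel above.
inputs = tokenizer("Write a haiku about llamas.", return_tensors = "pt").to("cuda")

# skip_prompt = True prints only the newly generated tokens, as they arrive.
streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(**inputs, streamer = streamer, max_new_tokens = 64)
```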
Support us! 💕
Feel free to support us via our Ko-fi donation page. A huge shout out to our new supporters: Rajesh, 007ok, Netrve, Goblin, pacozaa, Datta Nimmaturi, Hamel Husain, Ratish, Chris, Steffen, Remek, Anthony, Richard, Chrismcmaster, Trelis Research, preemware and Nam! 🙏
As always, be sure to join our Discord server for help or just to show your support! You can also follow us on Twitter and Substack.