# Unsloth Updates

To use the latest changes, update Unsloth via `unsloth studio update`.

{% updates format="full" %}
{% update date="2026-04-11" tags="model-release" %}

## **Gemma 4 Update + MiniMax-M2.7**

[Gemma 4 GGUFs](https://huggingface.co/collections/unsloth/gemma-4) are now updated with Google's official chat template fixes (which fix and improve tool-calling), along with the latest llama.cpp fixes. Update to the latest llama.cpp and re-download the quants, and you shouldn't see `unused token` issues anymore.\
\
[MiniMax-M2.7](https://unsloth.ai/docs/models/minimax-m27) is out now! You can run the model locally with our GGUFs in 4-bit quantization on 128GB RAM / unified memory. [**MiniMax-M2.7 GGUF**](https://huggingface.co/unsloth/MiniMax-M2.7-GGUF)
{% endupdate %}

{% update date="2026-04-08" tags="new-releases,v0.1.36-beta" %}

## **Gemma 4 Fixes**

We’ve updated Gemma 4 [with many fixes](https://unsloth.ai/docs/models/gemma-4/train). These bugs were universal, affecting all training packages and implementations, and **did not originate from Unsloth**. We identified the bugs, fixed them, and Gemma 4 training now works properly in Unsloth.

You only need **8GB VRAM** to train **Gemma-4-E2B** locally. Unsloth trains Gemma 4 **\~1.5x faster while using \~60% less VRAM** than FA2 setups. For the full guide and notebooks on Gemma 4 training, [see our blog](https://unsloth.ai/docs/models/gemma-4/train).

#### Gemma 4 Training Fixes

1. **Gradient accumulation** no longer causes loss explosions. Previously, losses could spike to **300–400**; expected loss is around **10–15**.
2. Fixed the **IndexError** affecting **26B** and **31B** inference in `transformers`.
3. Fixed gibberish outputs for **E2B/E4B** when `use_cache=False`. See [issue #45242](https://github.com/huggingface/transformers/issues/45242).
4. Fixed **float16 audio** overflow from `-1e9` values.

If you see losses above **13–15** (for example, **100** or **300**), gradient accumulation is likely being handled incorrectly. This is fixed in both **Unsloth** and **Unsloth Studio**.

#### Gemma 4 Quant Re-uploads

We also updated our Gemma 4 GGUFs, so you will need to re-download them. Again, these quant issues are **not related to or caused by Unsloth**:

1. CUDA: check for buffer overlap before fusing - critical fix for `<unused24>` tokens - [PR #21566](https://github.com/ggml-org/llama.cpp/pull/21566)
2. `kv-cache`: support attention rotation for heterogeneous iSWA - [PR #21513](https://github.com/ggml-org/llama.cpp/pull/21513)
3. `vocab`: add byte token handling to BPE detokenizer for Gemma 4 - [PR #21488](https://github.com/ggml-org/llama.cpp/pull/21488)
4. `convert`: set `"add bos" == True` for Gemma 4 - [PR #21500](https://github.com/ggml-org/llama.cpp/pull/21500)
5. `common`: add Gemma 4 specialized parser - [PR #21418](https://github.com/ggml-org/llama.cpp/pull/21418)
6. `llama-model`: read `final_logit_softcapping` for Gemma 4 - [PR #21390](https://github.com/ggml-org/llama.cpp/pull/21390)
7. `llama`: add custom newline split for Gemma 4 - [PR #21406](https://github.com/ggml-org/llama.cpp/pull/21406)

#### Unsloth Studio Updates

* Add **speculative decoding** support (ngram-mod, on by default)
* llama.cpp updated to the latest version with all Gemma 4 fixes
* Fix Qwen3.5 and Gemma 4 training issues
* Enable exporting and saving of Gemma 4 models
* Harden sandbox security for terminal and python tools
* Let recipes use the model loaded in Chat
* Fix empty chat threads on navigation (and whenever switching tabs) and stabilize new chat flow
* Allow non-LLM recipes to run, and move the Data tab first in executions
* Reuse HF cached repo casing to prevent duplicate downloads
  {% endupdate %}

{% update date="2026-04-03" tags="new-releases,v0.1.36-beta" %}

## **Google - Gemma 4**

* You can now run and train the [Gemma 4](https://unsloth.ai/docs/models/gemma-4) models in Unsloth.
* Intel Macs now work
* Pre-compiled llama.cpp binaries with 2 Gemma 4 fixes:
  * vocab: fix Gemma4 tokenizer ([#21343](https://github.com/ggml-org/llama.cpp/pull/21343))
  * fix: gemma 4 template ([#21326](https://github.com/ggml-org/llama.cpp/pull/21326))
* Tool calls for smaller models are now more stable and no longer cut off
* Pre-compiled binaries for Windows, Linux, Mac, WSL devices - CPU and GPU
* Speculative decoding added for non-vision models (sadly, Gemma 4 and Qwen3.5 are vision models)
* Context length is now properly applied.
* Web search now actually gets web content and not just summaries
* HF API calls reduced by 90% - fewer rate limits
  {% endupdate %}

{% update date="2026-03-31" tags="new-releases,improvements" %}

## **+50% tool call accuracy + more support**

* Tool calls for all models are now **+30% to +80% more accurate.**
* Web search now actually gets web content and not just summaries
* The number of tool calls allowed is increased from 10 to 25
* Tool calls now terminate much better, so looping/repetition is reduced
* More **tool call healing** and de-duplication logic stops tool calls from leaking XML
* Tested with `unsloth/Qwen3.5-4B-GGUF` (`UD-Q4_K_XL`), web search + code execution + thinking enabled.

| Metric                       | Before | After     |
| ---------------------------- | ------ | --------- |
| XML leaks in response        | 10/10  | 0/10      |
| URL fetches used             | 0      | 4/10 runs |
| Runs with correct song names | 0/10   | 2/10      |
| Avg tool calls               | 5.5    | 3.8       |
| Avg response time            | 12.3s  | 9.8s      |

#### New features

* Added **custom folders** so you can use any GGUFs in any folder - for now, accessible in Chat under Advanced Settings and Custom Folders
* **Update button** now visible
* Install script styling all updated!
* Preliminary **automatic multi-GPU support for inference and training** - useful for large models that don't fit on 1 GPU - Studio will auto-allocate GPU resources
* Intel Macs should work out of the box

### Much smoother and faster Studio

* **Fixed download timeouts for large models** - no more timeouts.
* **Fixed Hugging Face rate limiting - HF API calls reduced by 90%**
* Fixed bun on Windows and faster installs
  {% endupdate %}

{% update date="2026-03-27" tags="new-releases,fixes,improvements" %}

## **New Important Updates**

It’s only been 2 days since our previous release, but we’ve got more important updates:

* **Inference is now 20–30% faster.** Previously, tool-calling and repeat penalty could slow inference below normal speeds. Inference tokens/s should now perform the same as `llama-server` / `llama.cpp`.
* **Now auto-detects older or pre-existing models** downloaded from **LM Studio, Hugging Face,** and similar sources.
* **Inference token/s speed is now calculated correctly.** Previously, tokens/s included startup time, which made the displayed speed look slower than it actually was. It should now reflect 'true' inference speed.
* **CPU usage no longer spikes.** Previously, inline querier identity changed every render, causing `useLiveQuery` to resubscribe continuously.
* **Unsloth Studio now has a shutdown (x) button and shuts down properly.** Previously, closing it after opening from the desktop icon would not close it properly. Now, launching from the shortcut also opens the terminal, and closing that terminal fully exits Unsloth Studio. If you still have it open from a previous session, you can restart your computer or run `lsof -i :8888` then `kill -9 <PID>`.
* **Even better tool-calling and websearch** with reduced errors.
* Updated documentation with lots of new info on [deleting models, uninstalling](https://unsloth.ai/docs/studio/install#uninstall) etc.
* **Cleaner, smarter install and setup logging across Windows and Linux.** Output is now easier to read with consistent formatting, quieter by default for a smoother experience, and supports richer `--verbose` diagnostics when you want full technical detail.
* You can now view your training history!
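The manual port cleanup in the shutdown note above can be sketched as a small POSIX shell helper (the `free_port` function name and PID handling are illustrative, not part of Unsloth Studio; 8888 is the default port mentioned above):

```shell
# Free a port still held by a stale Unsloth Studio session.
# Sketch only: combines the `lsof` and `kill -9` steps from the note above.
free_port() {
  port="${1:-8888}"
  # -t prints only the PID(s); stderr is silenced in case nothing is found
  pid="$(lsof -ti :"$port" 2>/dev/null)"
  if [ -n "$pid" ]; then
    echo "Killing process $pid on port $port"
    kill -9 $pid
  else
    echo "No process listening on port $port"
  fi
}

# Example usage (uncomment to run against the default Studio port):
# free_port 8888
```

Restarting the machine remains the simplest fallback if you'd rather not kill processes by hand.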
  {% endupdate %}

{% update date="2026-03-25" tags="new-releases,fixes,improvements" %}

## First Release post Unsloth Studio

Hey guys, this is our first release since we launched Unsloth Studio. Lots of new features and fixes:

* **You can now update Unsloth Studio!** Please update via: `unsloth studio update`
* **Windows** CPU or GPU now works seamlessly. Please reinstall!
* **App shortcuts**. Once installed, you can launch on Windows, macOS, and Linux via a shortcut icon in the Start menu / launcher and on the desktop.
* **Pre-compiled `llama.cpp` binaries** and `mamba_ssm` - 6x faster installs! Also <300MB in size for binaries.
* **50% reduced installation sizes** (-7GB or more savings), 2x faster installs and faster resolving. 50% smaller pypi sizes.
* **Tool calling improved.** Better llama.cpp parsing, no raw tool markup in chat, faster inference, a new Tool Outputs panel, timers.
* MacOS and CPU now have [Data Recipes](https://unsloth.ai/docs/new/studio/data-recipe) enabled with multi-file uploading.
* **AMD support preliminary for Linux** only machines - auto detects.
* **Settings sidebar redesign.** Settings are now grouped into **Model, Sampling, Tools, and Preferences**
* **Context length** now adjustable. Keep in mind this is usually not needed, as llama.cpp smartly uses the exact context you need via `--fit on`
* **Multi-file upload.** Data recipes now support multiple drag-and-drop uploads for PDF, DOCX, TXT, and MD, with backend extraction, saved uploads, and improved previews.
* **Colab** with free T4 GPUs with Unsloth Studio now fixed! [Try it here](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb). Due to pre-compiled binaries, it's also 20x faster!
* **Better chat observability.** Studio now shows `llama-server` timings and usage, a context-window usage bar, and richer source hover cards.
* **Better UX overall** - clickable links, better LaTeX parsing, tool / code / web tooltips for default cards and much more!
* **LiteLLM -** Unsloth Studio and Unsloth were **NOT** affected by the recent LiteLLM compromise. Nemo Data Designer used LiteLLM only up to `1.80`, not the affected `1.82.7` or `1.82.8`, and has since removed it entirely.
* We now have a new one-line install command, just run:

  <pre class="language-bash" data-overflow="wrap" data-expandable="true"><code class="lang-bash">curl -fsSL https://unsloth.ai/install.sh | sh
  </code></pre>

#### **Fixes:**

* **Windows/setup improvements.** Fixed silent Windows exits, Anaconda/conda-forge startup crashes, broken non-NVIDIA Windows installs, and missing early CUDA/stale-venv setup checks.
* **System prompts fixed.** They work again for non-GGUF text and vision inference.
* **Persistent system prompts and presets.** Custom system prompts and chat presets now persist across reloads and page changes.
* **GGUF export expanded.** Full fine-tunes, not just LoRA/PEFT, can now export to GGUF. Base model resolution is more reliable, and unsupported export options are disabled in the UI.
* **Chat scroll/layout fixes.** Fixed scroll-position issues during generation, thinking-panel layout shift, and viewport jumps when collapsing reasoning panels.
* **Smarter port conflict detection.** Studio now detects loopback conflicts, can identify the blocking process when possible, and gives clearer fallback-port messages.
  {% endupdate %}

{% update date="2026-03-17" tags="fixes,improvements" %}

## New Tool calling + Windows Stability

* Claude Artifacts now work, so HTML (e.g. a snake game) can be executed inside the chat
* +30% more accurate tool calls, especially for small models + timer for tool calls
* Tool + Web Search outputs can be saved + Toggle auto healing tool on/off
* Many bug fixes - Windows CPU works, Mac more seamless, faster and smaller installs
  {% endupdate %}
  {% endupdates %}
