> For the complete documentation index, see [llms.txt](https://unsloth.ai/docs/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://unsloth.ai/docs/new/changelog.md).

# Unsloth Updates

To use the latest changes, [update Unsloth](/docs/new/studio/install.md#update-unsloth-studio).

{% updates format="full" %}
{% update date="2026-06-12" tags="new-releases,v0.1464-beta" %}

## DiffusionGemma + Gemma 4 MTP

Ensure you install the latest [`v0.1.464-beta`](https://github.com/unslothai/unsloth/tree/v0.1.462-beta) or `2026.6.7`. [DiffusionGemma](https://unsloth.ai/docs/models/diffusiongemma), [Gemma 4 MTP](https://unsloth.ai/docs/models/mtp) and [**MiniMax-M3**](https://unsloth.ai/docs/models/minimax-m3) are all now supported.

* Run and train [DiffusionGemma](https://unsloth.ai/docs/models/diffusiongemma) via [Unsloth Studio](https://unsloth.ai/docs/new/studio).
* [Gemma 4 MTP](https://unsloth.ai/docs/models/mtp) is here! Run [Gemma 4](https://unsloth.ai/docs/models/gemma-4) \~2x faster with MTP.
* Audio chat is now supported for Gemma 4 (`wav`, `mp3`, `m4a`, `flac`, `webm`).
* Preserve Think added to Gemma 4.

<figure><img src="/files/g9BbDrR1I207d1vKuMv6" alt="" width="375"><figcaption></figcaption></figure>

#### Hub + Download Manager (Experimental)

* Added a new **Hub** page for browsing, downloading, and managing Hugging Face models and datasets.
* Unsloth can now detect models and datasets already on your machine and show them alongside downloaded assets.
* Downloaded [GGUF models](https://unsloth.ai/docs/basics/inference-and-deployment/saving-to-gguf) now have direct **Run / New Chat** actions.

#### RAG / Chat with Files (Experimental)

* Added [**Chat with Files**](https://unsloth.ai/docs/new/studio/chat) in Studio, letting you ask questions over your own documents and knowledge bases.
* Supports hybrid search, citations, PDF previews, per-thread documents, and a built-in `search_knowledge_base` tool.

#### New Update Button + Hardware Support

* Unsloth now uses constant fresh up to date [llama.cpp prebuilts](https://unsloth.ai/docs/new/changelog) across CUDA, ROCm, Windows, Linux, and macOS.
* Added an in-app **Update llama.cpp** button so users can update the local backend without reinstalling Studio.
* Improved Windows / WSL AMD support, [Strix Halo ROCm support](https://unsloth.ai/docs/get-started/install/amd), [Blackwell CUDA selection](https://unsloth.ai/docs/blog/fine-tuning-llms-with-blackwell-rtx-50-series-and-unsloth), and clearer installer messages.

#### Local Chat, Tools & API Compatibility

* Local [tool calling](https://unsloth.ai/docs/basics/tool-calling-guide-for-local-llms) is more reliable, with better ordering of tool cards, fewer duplicate tool loops, and support for tool use with GGUF vision models.
* Improved [OpenAI-compatible API](https://unsloth.ai/docs/basics/inference-and-deployment/llama-server-and-openai-endpoint) and Anthropic-compatible API behavior for local Studio servers, including better errors, token usage, stop reasons, and [Claude Code compatibility](https://unsloth.ai/docs/basics/claude-code).

#### Training & Fixes

* Improved [MLX support](https://unsloth.ai/docs/new/studio/install) with better model labels, generation speed stats, and fixes for [VLM training](https://unsloth.ai/docs/basics/vision-fine-tuning).
* Fixed several [training](https://unsloth.ai/docs/get-started/fine-tuning-llms-guide) and [dataset](https://unsloth.ai/docs/get-started/fine-tuning-llms-guide/datasets-guide) edge cases, including non-writable Hugging Face caches and custom dataset mappings.
* Added many UI polish fixes across chat, menus, model picker, dark mode, import/export, and settings.

To update Unsloth or install a new Unsloth Studio, you must use:

**macOS, Linux, WSL:**

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
```

**Windows:**

```powershell
irm https://unsloth.ai/install.ps1 | iex
```

{% endupdate %}

{% update date="2026-06-03" tags="new-releases,v0.1.44-beta" %}

## Gemma 4 12B, New UI, MCP, Projects

This update focuses mainly on Gemma 4 12B, MCP, Projects, Canvas, CUDA 13.3 and the new chat UI. Next week we'll have an even bigger update.

<div data-with-frame="true"><figure><img src="/files/fkVNBWBa27CUpCbEyesl" alt="" width="375"><figcaption></figcaption></figure></div>

#### Gemma 4 12B

Google releases [Gemma 4 12B](https://unsloth.ai/docs/models/gemma-4), a new model that runs locally on 8GB RAM. [GGUF](https://huggingface.co/unsloth/gemma-4-12b-it-GGUF) / [Guide](https://unsloth.ai/docs/models/gemma-4)

Gemma 4 12B Unified supports image, audio and 256K context. Run and train the model via Unsloth Studio.

#### MCP

* Remote `MCP` server support, including custom headers and OAuth
* Local command-based `MCP` server support
* `MCP` can now be turned on from the chat composer
* Built-in presets for common `MCP` servers

#### New Chat UI

* Projects, Canvas, `MCP`, RAG and Compare controls now live in the plus menu
* Search and Code controls are easier to access from the composer
* Menus, overlays, icons and clickable controls are more consistent across Studio

#### Projects

* Organize related chats into dedicated project workspaces
* Move existing chats into projects
* Create and manage projects directly from the sidebar

#### Experimental Canvas / Artifacts

* Opens generated HTML in a dedicated canvas panel inside Unsloth Studio
* Supports interactive outputs, including browser based visualizations and CDN-loaded packages
* Lets you switch between rendered preview and source code

#### Install, Runtime and Hardware

* Windows prebuilt installs no longer require the early `CUDA Toolkit` check
* Linux `llama.cpp` prebuilts now match the detected runtime `cudart` major
* `ROCm` gfx detection is forwarded into prebuilt selection
* `Blackwell`, `B300` and `ARM64` Linux support updates

To update Unsloth or install a new Unsloth Studio, you must use:

**macOS, Linux, WSL:**

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
```

**Windows:**

```powershell
irm https://unsloth.ai/install.ps1 | iex
```

{% hint style="warning" %}
**DO NOT USE `unsloth studio update` anymore since packaging will not get the latest updates!**
{% endhint %}
{% endupdate %}

{% update date="2026-05-31" tags="new-releases,v0.1.43-beta" %}

## CUDA 13.3, Windows, Mac

**To update Unsloth or install a new Unsloth Studio, you must use:**

**macOS, Linux, WSL:**

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
```

**Windows:**

```powershell
irm https://unsloth.ai/install.ps1 | iex
```

{% hint style="warning" %}
**DO NOT USE `unsloth studio update` anymore since packaging will not get the latest updates!**
{% endhint %}

#### Mac Updates

* Re-enabled `llama.cpp` prebuilt binaries for Apple Silicon (M1-M4) - Mac OS 14 / 15 / 26 (Tahoe)
* Apple Silicon Mac OS 13 (Ventura) is source build
* Intel (x86\_64) for Mac OS 13.3 / 14 / 15 / 26 (Tahoe) uses `llama.cpp` prebuilt binaries
* Intel for Max 13.0 - 13.2 is source build

#### Windows Updates

* CUDA 13.3 `llama.cpp` prebuilt binaries now work for Windows
* For CUDA 13.2, CUDA 13.1 and below, Windows devices uses CUDA 12.4 fallback - we'll work on CUDA 13.1 binaries soon.

#### CUDA 13.3 Update

* CUDA 13.3 non Linux binaries work. We'll still use CUDA 13.1 for now
* CUDA 13.3 solves the CUDA 13.2 gibberish problem - see <https://github.com/unslothai/unsloth/issues/4849>

#### Blackwell GPUs Update

* For now Blackwell will have delayed releases of `llama.cpp` prebuilt binaries sine CUDA 12.4 does not work - we are working to resolve this soon.
  {% endupdate %}

{% update date="2026-05-26" tags="new-releases,v0.1.42-beta" %}

## An update before Revamp.

Hey guys, we're doing one more-ish update before a major revamp which is likely coming this week or next week. Our revamp will change a lot of things, especially with new major features and a lot of design changes.

{% embed url="<https://github.com/user-attachments/assets/70456395-e016-4273-8256-35adb206267e>" %}

* NEW: [**API calling support**](https://unsloth.ai/docs/integrations/connections) now with image generation + editing, proper web search, code execution, auto prompt caching. Connect [OpenAI](https://unsloth.ai/docs/integrations/connections/openai), [Anthropic](https://unsloth.ai/docs/integrations/connections/anthropic-claude) and more.
* Proper support for **non-English languages** e.g. Japanese, Chinese, Indian etc.

Many of you may have missed our previous release which only lasted for one day. We introduced:

* Connect to external inference backends: [vLLM](https://unsloth.ai/docs/integrations/connections/vllm), [Ollama](https://unsloth.ai/docs/integrations/connections/ollama), [llama-server](https://unsloth.ai/docs/integrations/connections/connect-llama.cpp-to-unsloth-run-ggufs-with-llama-server)
* **Security improvements**
* **Auto MTP speculative decoding** for MTP GGUFs; get the best settings customized for your hardware.

#### API provider calling & external connections

* You can now connect Unsloth to any API cloud provider (OpenAI, Anthropic, OpenRouter etc.)
* **Built-in web search** for OpenAI, Anthropic, OpenRouter and Kimi
* **Built-in code execution** for OpenAI and Anthropic (Anthropic containers persist and are reused across turns)
* Prompt caching is enabled for OpenAI and Anthropic models saving 50 to 90% of costs.
* Image generation + editing
* API key now optional for local providers (llama.cpp / vLLM / Ollama)
* Auto-load models when adding a cloud provider

#### Other Unsloth Studio updates

* OpenDocument chat attachments
* o3 reasoning summary payload
* Sending/prompting non-English languages (e.g. Japanese, Chinese) now works properly
* IME composer hardening, RTL `dir="auto"`, long log-line truncation fix
* Tool reasoning trace rendering in UI
* Fully offline support: cached GGUF discovery and offline DNS auto-detect for both inference and training

#### Unsloth Studio security improvements

* Authentication rate-limiting, proxy-aware so reverse proxies don't bypass it
* Sandboxed worker with a tightened blocklist (bash, `hf upload`, `NOFILE`)
* Path containment so workers can't escape their in-flight tmp dirs
* Strict schema validation across the Studio API
* Tightened CSP / security headers (only legitimate favicon hosts allowed)
* Removed the `torch.load` fallback on `training_args.bin` so untrusted pickles can never execute on model load
* Hardened Tauri desktop release flow
* Frontend auth: singleflight token refresh, current-password input on changes, working logout, shared 422 helper
* Cancel cleanup now scoped strictly to in-flight tmp dirs so it can never delete user state
  {% endupdate %}

{% update date="2026-05-19" tags="new-releases,v0.1.41-beta" %}

## MTP + Unsloth Fixes

Lots of bug fixes, UI, UX fixes to Studio! To get the latest updates do:

**macOS, Linux, WSL:**

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
```

**Windows:**

```powershell
irm https://unsloth.ai/install.ps1 | iex
```

#### Fixes

1. Fix `unsloth studio update` not working well
2. Fix getting stuck on `reset-password` page
3. More offline mode support
4. Improve MTP not being faster on Macs, CPUs and GPUs - now it's much better!
5. Fix Desktop Shortcut not working after update
6. Many many UI UX bug fixes
   {% endupdate %}

{% update date="2026-05-18" tags="new-releases,model-release,v0.1.405-beta" %}

## Qwen3.6 MTP + API Connections

We've got lots of new updates for Unsloth `v0.1.41-beta`:

* **\~2x faster GGUF inference** with automatically enabled [MTP](/docs/models/qwen3.6.md#mtp-guide)
* [**API calling support**](/docs/integrations/connections.md) for [OpenAI](/docs/integrations/connections/openai.md), [Anthropic](/docs/integrations/connections/anthropic-claude.md) etc. with auto prompt caching, web search, code execution
* Connect to external inference backends: [vLLM](/docs/integrations/connections/vllm.md), [Ollama](/docs/integrations/connections/ollama.md), [llama-server](/docs/integrations/connections/connect-llama.cpp-to-unsloth-run-ggufs-with-llama-server.md)
* Experimental **MLX inference**
* Proper support for **non-English languages**
* **Security** improvements

<a href="/pages/NpuhjPsxi8BKhuS8nnyY#qwen3.6-inference-tutorials" class="button primary">Run Qwen3.6 Tutorials</a><a href="/pages/NpuhjPsxi8BKhuS8nnyY#mtp-guide" class="button primary">MTP Guide</a>

<div data-with-frame="true"><figure><img src="/files/vhlCefN7HOqIdfF84MKz" alt="" width="375"><figcaption></figcaption></figure></div>

#### MTP speculative decoding support 1.4 to 2x faster inference!

* **Auto MTP speculative decoding** for MTP GGUFs; warn when the bundled llama.cpp prebuilt is stale or too old for MTP
* New pre-built llama.cpp binaries for MTP support!

#### API provider calling & external connections

* You can now connect Unsloth to any API cloud provider (OpenAI, Anthropic, OpenRouter etc.)
* **Built-in web search** for OpenAI, Anthropic, OpenRouter and Kimi
* **Built-in code execution** for OpenAI and Anthropic (Anthropic containers persist and are reused across turns)
* Prompt caching is enabled for OpenAI and Anthropic models saving 50 to 90% of costs.
* API key now optional for local providers (llama.cpp / vLLM / Ollama)
* Auto-load models when adding a cloud provider

#### MLX inference (Experimental)

* MLX quants and models now can run locally on your Mac machines!
* We'll be adding thinking, tools and web search soon!

#### Other Unsloth Studio updates

* Sending/prompting non-English languages (e.g. Japanese, Chinese) now works properly
* OpenDocument chat attachments
* o3 reasoning summary payload
* IME composer hardening, RTL `dir="auto"`, long log-line truncation fix
* Tool reasoning trace rendering in UI
* Fully offline support: cached GGUF discovery and offline DNS auto-detect for both inference and training
* Lots of UI/UX polish: dark theme refactor, right sidebar redesign, time-of-day sloth mascot, dismissable copyable toasts, larger chat composer, code-execution config polish, composer action pill styling, narrower Discord button

#### Training updates

* Gemma attention mask fixes
* Multi Image GRPO
* GRPO hidden-state return experiments
* New Continued Pretraining (CPT) training method as a first-class option
* Gemma-4 MoE LoRA extractor registered to fix `grouped_mm` contraction crash
* Opt-in fused `lm_head` + cross-entropy forward, with single-matmul path under `UNSLOTH_RETURN_LOGITS=1`
* Pass batch size for eval
* Eval/training paths now honour `HF_DATASETS_OFFLINE` alongside `HF_HUB_OFFLINE`

#### Unsloth Studio security improvements

* Authentication rate-limiting, proxy-aware so reverse proxies don't bypass it
* Sandboxed worker with a tightened blocklist (bash, `hf upload`, `NOFILE`)
* Path containment so workers can't escape their in-flight tmp dirs
* Strict schema validation across the Studio API
* Tightened CSP / security headers (only legitimate favicon hosts allowed)
* Removed the `torch.load` fallback on `training_args.bin` so untrusted pickles can never execute on model load
* Hardened Tauri desktop release flow
* Frontend auth: singleflight token refresh, current-password input on changes, working logout, shared 422 helper
* Cancel cleanup now scoped strictly to in-flight tmp dirs so it can never delete user state
  {% endupdate %}

{% update date="2026-05-05" tags="new-releases,v0.1.39-beta,v0.1.38-beta" %}

## Unsloth API endpoint

#### ***v0.1.39-beta bug fix*** **May 5th 2026**

Fixes chat history not being shown (existing chat history is not lost) and attachments not attaching correctly. The bug was render-only - use `2026.5.2` or directly call `curl -fsSL https://unsloth.ai/install.sh | sh`  to update

You can use local LLMs with tools like [Claude Code](https://unsloth.ai/docs/basics/claude-code) and [Codex](https://unsloth.ai/docs/basics/codex) by connecting them to Unsloth’s API endpoint. This lets you run models like [Qwen](https://unsloth.ai/docs/models/qwen3.6) and [Gemma](https://unsloth.ai/docs/models/gemma-4) locally, with additional features such as self-healing tool calling, code execution, and web search.

Using Unsloth as an API inference endpoint is beneficial not only because it is easy to setup and fast, but also because Unsloth provides:

* [Self-healing tool calling](https://unsloth.ai/docs/new/studio/chat#auto-healing-tool-calling), which helps reduce broken or malformed tool calls by 50%
* [Code execution](https://unsloth.ai/docs/new/studio/chat#code-execution) support, allowing Bash and Python execution for more accurate code outputs.
* Advanced [Web search](https://unsloth.ai/docs/new/studio/chat#advanced-web-search) that visits and actually reads webpages to gather in-depth info.
* [Automatic inference settings](https://unsloth.ai/docs/new/studio/chat#auto-parameter-tuning) for GGUF models (temp, top-k etc.)

<div data-with-frame="true"><figure><img src="/files/Z3eIk2YCloY1lJy73JHS" alt="" width="375"><figcaption></figcaption></figure></div>

#### New models

We've also got a handful of new models to run including NVIDIA [Nemotron 3 Nano Omni](/docs/models/nemotron-3-nano-omni.md), IBM [Granite 4.1](/docs/models/ibm-granite-4.1.md) and [Mistral 3.5](/docs/models/mistral-3.5.md) Medium. We helped Mistral solve some issues with implementation in transformers and GGUFs.

#### Unsloth Updates

* Stopped Studio training runs can now resume from checkpoints.
* Chat threads now autosave and persist more reliably.
* DPO training hangs in multi-process setups were fixed.
* VLM GRPO support improved with MROPE updates.
* Studio’s stop button now properly stops generation.
* Fix chat template disappearing after browser refresh.
  {% endupdate %}

{% update date="2026-04-23" tags="new-releases,v0.1.37-beta" %}

## Brand New UI Redesign

Hey guys, we revamped the entire Unsloth Studio UI and UX experience to put an emphasis on chat and training:

* Added a collapsible sidebar based on community feedback

<div data-with-frame="true"><figure><img src="/files/vTGOOXiSgQ6qXSrMZMMw" alt="" width="375"><figcaption></figcaption></figure></div>

* You can now delete chats and search past conversations

<div><figure><img src="/files/Y2MPKaKO9MSojMnGHkXl" alt=""><figcaption></figcaption></figure> <figure><img src="/files/gZACythOQsFyjn0UdVh3" alt=""><figcaption></figcaption></figure></div>

* New Preserve Thinking toggle for models that support it like Qwen3.6
* Cleaner, more consistent design with easier navigation
* Expanded Settings page with options to change your profile picture, name, and more

<div data-with-frame="true"><figure><img src="/files/aEMGbiZZ4Lq988UHygfZ" alt="" width="375"><figcaption></figcaption></figure></div>

* No more entering your Hugging Face token twice
* gpt-oss now has low, medium and high thinking toggles.
* Now uses latest llama.cpp prebuilt, even on Linux CUDA
* Lots of bug, consistency and stability fixes
* Kimi-K2.6 can now be run!
* We also added experimental API support. Guides, announcement etc will come next week.

Qwen3.6 was also also previously already supported in Unsloth Studio for running and training. You can train and run Qwen3.6-27B right now!
{% endupdate %}

{% update date="2026-04-22" tags="model-release,new-releases" %}

## **Qwen3.6-27B + Kimi K2.6**

[**Qwen3.6-27B**](/docs/models/qwen3.6.md) can now run (18GB RAM) and be fine-tuned in Unsloth Studio. Kimi K2.6 can also be ran in Unsloth (350GB RAM).

Unsloth Studio received many new updates so please update. Details and writeup coming in the next few days.
{% endupdate %}

{% update date="2026-04-16" tags="model-release,new-releases" %}

## **Qwen3.6**

[**Qwen3.6**](/docs/models/qwen3.6.md) can now run and be fine-tuned in Unsloth Studio. The model runs on 23GB RAM and is the strongest mid-sized LLM on nearly all benchmarks.
{% endupdate %}

{% update date="2026-04-11" tags="model-release" %}

## **Gemma 4 Update + MiniMax-M2.7**

[Gemma 4 GGUFs](https://huggingface.co/collections/unsloth/gemma-4) are now updated with Google's official chat template fixes (which fixed/improved tool-calling), along with the latest llama.cpp fixes. Update to the latest llama.cpp, re-download quants and you shouldn't see `unused token` issues anymore.\
\
[MiniMax-M2.7](/docs/models/tutorials/minimax-m27.md) is out now! You can run the model locally with our GGUFs in 4-bit quantization on 128GB RAM / unified memory. [**MiniMax-M2.7 GGUF**](https://huggingface.co/unsloth/MiniMax-M2.7-GGUF)
{% endupdate %}

{% update date="2026-04-08" tags="new-releases,v0.1.36-beta" %}

## **Gemma 4 Fixes**

We’ve updated Gemma 4 [with many fixes](/docs/models/gemma-4/train.md). These bugs are universal and affected all training packages and implementations and **did not originate from Unsloth**. We identified the bugs, fixed them, and Gemma 4 training now works properly in Unsloth.

You only need **8GB VRAM** to train **Gemma-4-E2B** locally. Unsloth trains Gemma 4 **\~1.5x faster while using \~60% less VRAM** than FA2 setups. For the full guide and notebooks on Gemma 4 training, [see our blog](/docs/models/gemma-4/train.md).

#### Gemma 4 Training Fixes

1. **Gradient accumulation** no longer causes loss explosions. Previously, losses could spike to **300–400**; expected loss is around **10–15**.
2. Fixed the **IndexError** affecting **26B** and **31B** inference in `transformers`.
3. Fixed gibberish outputs for **E2B/E4B** when `use_cache=False`. See [issue #45242](https://github.com/huggingface/transformers/issues/45242).
4. Fixed **float16 audio** overflow from `-1e9` values.

If you see losses above **13–15,** for example **100** or **300** - gradient accumulation is likely being handled incorrectly. This is fixed in both **Unsloth** and **Unsloth Studio**.

#### Gemma 4 Quant Re-uploads

We also updated our Gemma 4 GGUFs so you will need to re-download. Again, these quant issues are **not related to or caused by Unsloth**:

1. CUDA: check for buffer overlap before fusing - critical fix for `<unused24>` tokens - [PR #21566](https://github.com/ggml-org/llama.cpp/pull/21566)
2. `kv-cache`: support attention rotation for heterogeneous iSWA - [PR #21513](https://github.com/ggml-org/llama.cpp/pull/21513)
3. `vocab`: add byte token handling to BPE detokenizer for Gemma 4 - [PR #21488](https://github.com/ggml-org/llama.cpp/pull/21488)
4. `convert`: set `"add bos" == True` for Gemma 4 - [PR #21500](https://github.com/ggml-org/llama.cpp/pull/21500)
5. `common`: add Gemma 4 specialized parser - [PR #21418](https://github.com/ggml-org/llama.cpp/pull/21418)
6. `llama-model`: read `final_logit_softcapping` for Gemma 4 - [PR #21390](https://github.com/ggml-org/llama.cpp/pull/21390)
7. `llama`: add custom newline split for Gemma 4 - [PR #21406](https://github.com/ggml-org/llama.cpp/pull/21406)

#### Unsloth Studio Updates

* Add **speculative decoding** support (ngram-mod, on by default)
* Llama.cpp updated to use latest version with all Gemma 4 Fixes
* Fix Qwen3.5 and Gemma 4 training issues
* Enable exporting and saving of Gemma 4 models
* Harden sandbox security for terminal and python tools
* Let recipes use the model loaded in Chat
* Fix empty chat threads on navigation (and whenever switching tabs) and stabilize new chat flow
* Allow non-LLM recipes to run and move Data tab first in executions
* Reuse HF cached repo casing to prevent duplicate downloads
  {% endupdate %}

{% update date="2026-04-03" tags="new-releases,v0.1.36-beta" %}

## **Google - Gemma 4**

* You can now run and train the [Gemma 4](/docs/models/gemma-4.md) models in Unsloth.
* Intel Mac now works
* Pre-compiled binaries for llama.cpp for 2 Gemma-4 fixes:
  * vocab: fix Gemma4 tokenizer ([#21343](https://github.com/ggml-org/llama.cpp/pull/21343))
  * fix: gemma 4 template ([#21326](https://github.com/ggml-org/llama.cpp/pull/21326))
* Tool calls for smaller models are now more stable and don't cut off anymore
* Pre-compiled binaries for Windows, Linux, Mac, WSL devices - CPU and GPU
* Speculative Decoding added for non vision models (Gemma-4 is vision sadly and Qwen3.5)
* Context length is now properly applied.
* Web search now actually gets web content and not just summaries
* 90% reduced HF API calls - less rate limits
  {% endupdate %}

{% update date="2026-03-31" tags="new-releases,improvements" %}

## **+50% tool call accuracy + more support**

* Tool calls for all models are now **+30% to +80% more accurate.**
* Web search now actually gets web content and not just summaries
* Number of tool calls allowed are increased to 25 from 10
* Tool calls now terminate much better, so looping / repetitions will be reduced
* More **tool call healing** and de-duplication logic to stop tool callings from leaking XML as well
* Tested with `unsloth/Qwen3.5-4B-GGUF` (`UD-Q4_K_XL`), web search + code execution + thinking enabled.

| Metric                       | Before | After     |
| ---------------------------- | ------ | --------- |
| XML leaks in response        | 10/10  | 0/10      |
| URL fetches used             | 0      | 4/10 runs |
| Runs with correct song names | 0/10   | 2/10      |
| Avg tool calls               | 5.5    | 3.8       |
| Avg response time            | 12.3s  | 9.8s      |

#### New features

* Added **custom folders** so you can use any GGUFs in any folder - for now access in Advanced Settings in Chat and Custom Folders
* **Update button** now visible
* Install script styling all updated!
* Preliminary **Automatic Multi GPU support for inference and training** - useful for large models that don't fit on 1 GPU - Studio auto will allocate GPU resources
* Intel Macs should work out of the box

### Much smoother and faster Studio

* **Fixed timeouts of downloads of large models** - no more timeouts seen.
* **Fixed Hugging Face rate limiting - HF API calls reduced by 90%**
* Fixed bun on Windows and faster installs
  {% endupdate %}

{% update date="2026-03-27" tags="new-releases,fixes,improvements" %}

## **New Important Updates**

It’s only been 2 days since our previous release, but we’ve got a more important updates:

* **Inference is now 20–30% faster.** Previously, tool-calling and repeat penalty could slow inference below normal speeds. Inference tokens/s should now perform the same as `llama-server` / `llama.cpp`.
* **Now Auto-detects older or pre-existing models** downloaded from **LM Studio, Hugging Face,** and similar sources.
* **Inference token/s speed is now calculated correctly.** Previously, tokens/s included startup time, which made the displayed speed look slower than it actually was. It should now reflect 'true' inference speed.
* **CPU usage no longer spikes.** Previously, inline querier identity changed every render, causing `useLiveQuery` to resubscribe continuously.
* **Unsloth Studio now has a shutdown x button and shuts down properly.** Previously, closing it after opening from the desktop icon would not close it properly. Now, launching from the shortcut also opens the terminal, and closing that terminal fully exits Unsloth Studio. If you still have it open from a previous session you can restart your computer or run `lsof -i :8888` then `kill -9 <PID>`.
* **Even better tool-calling and websearch** with reduced errors.
* Updated documentation with lots of new info on [deleting models, uninstalling](/docs/new/studio/install.md#uninstall) etc.
* **Cleaner, smarter install and setup logging across Windows and Linux.** Output is now easier to read with consistent formatting, quieter by default for a smoother experience, and supports richer `--verbose` diagnostics when you want full technical detail.
* You can now view your training history!
  {% endupdate %}

{% update date="2026-03-25" tags="new-releases,fixes,improvements" %}

## First Release post Unsloth Studio

Hey guys, this is our first release since we launched Unsloth Studio. Lots of new features and fixes:

* **You can now update Unsloth Studio!** Please update via the same install commands.
* **Windows** CPU or GPU now works seamlessly. Please reinstall!
* **App shortcuts**. Once installed, you can now launch in Windows, MacOS and Linux via a shortcut icon in the Start / Launch and Desktop.
* **Pre-compiled `llama.cpp` binaries** and `mamba_ssm` - 6x faster installs! Also <300MB in size for binaries.
* **50% reduced installation sizes** (-7GB or more savings), 2x faster installs and faster resolving. 50% smaller pypi sizes.
* **Tool calling improved.** Better llama.cpp parsing, no raw tool markup in chat, faster inference, a new Tool Outputs panel, timers.
* MacOS and CPU now have [Data Recipes](/docs/new/studio/data-recipe.md) enabled with multi-file uploading.
* **AMD support preliminary for Linux** only machines - auto detects.
* **Settings sidebar redesign.** Settings are now grouped into **Model, Sampling, Tools, and Preferences**
* **Context length** now adjustable. Keep in mind this is not needed as llama.cpp smartly uses the exact context you need via `--fit on`
* **Multi-file upload.** Data recipes now support multiple drag-and-drop uploads for PDF, DOCX, TXT, and MD, with backend extraction, saved uploads, and improved previews.
* **Colab** with free T4 GPUs with Unsloth Studio now fixed! [Try it here](https://colab.research.google.com/github/unslothai/unsloth/blob/main/studio/Unsloth_Studio_Colab.ipynb). Due to pre-compiled binaries, it's also 20x faster!
* **Better chat observability.** Studio now shows `llama-server` timings and usage, a context-window usage bar, and richer source hover cards.
* **Better UX overall** - clickable links, better LaTeX parsing, tool / code / web tooltips for default cards and much more!
* **LiteLLM -** Unsloth Studio and Unsloth were **NOT** affected by the recent LiteLLM compromise. Nemo Data Designer used LiteLLM only up to `1.80`, not the affected `1.82.7` or `1.82.8`, and has since removed it entirely.
* We now have a new one line install command, just run:&#x20;

  <pre class="language-bash" data-overflow="wrap" data-expandable="true"><code class="lang-bash">curl -fsSL https://unsloth.ai/install.sh | sh
  </code></pre>

#### **Fixes:**

* **Windows/setup improvements.** Fixed silent Windows exits, Anaconda/conda-forge startup crashes, broken non-NVIDIA Windows installs, and missing early CUDA/stale-venv setup checks.
* **System prompts fixed.** They work again for non-GGUF text and vision inference.
* **Persistent system prompts and presets.** Custom system prompts and chat presets now persist across reloads and page changes.
* **GGUF export expanded.** Full fine-tunes, not just LoRA/PEFT, can now export to GGUF. Base model resolution is more reliable, and unsupported export options are disabled in the UI.
* **Chat scroll/layout fixes.** Fixed scroll-position issues during generation, thinking-panel layout shift, and viewport jumps when collapsing reasoning panels.
* **Smarter port conflict detection.** Studio now detects loopback conflicts, can identify the blocking process when possible, and gives clearer fallback-port messages.
  {% endupdate %}

{% update date="2026-03-17" tags="fixes,improvements" %}

## New Tool calling + Windows Stability

* Claude Artifacts works so HTML can be executed like a snake game inside the chat
* +30% more accurate tool calls esp for small models + Timer for tool calls
* Tool + Web Search outputs can be saved + Toggle auto healing tool on/off
* Many bug fixes - Windows CPU works, Mac more seamless, faster and smaller installs
  {% endupdate %}
  {% endupdates %}


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://unsloth.ai/docs/new/changelog.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
