Qwen3.5: Fine-tuning Guide

Learn how to fine-tune Qwen3.5 with Unsloth.

You can now fine-tune the Qwen3.5 model family (27B, 35B‑A3B, 122B‑A10B, 397B‑A17B) with Unsloth. Support includes both vision and text fine-tuning.

  • Qwen3.5‑35B‑A3B - bf16 LoRA works on 74GB VRAM

  • Qwen3.5‑27B - bf16 LoRA works on 56GB VRAM and 4-bit QLoRA on 28GB

  • Supports our recent ~12x faster MoE training update with >35% less VRAM & ~6x longer context

Qwen3.5 fine-tuning Colab notebooks:

  • If you want to preserve reasoning ability, mix reasoning-style examples with direct answers (keep at least 75% reasoning examples). Otherwise you can omit reasoning data entirely.

  • After fine-tuning, you can export to GGUF (for llama.cpp/Ollama/LM Studio/etc.) or vLLM

If you’re on an older version (or fine-tuning locally), update first:

pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo

MoE fine-tuning

For MoE models like Qwen3.5‑35B‑A3B / 122B‑A10B / 397B‑A17B:

  • Best to use bf16 setups (e.g. LoRA or full fine-tuning); MoE QLoRA 4‑bit is not recommended due to BitsAndBytes limitations.

  • Unsloth’s MoE kernels are enabled by default and can use different backends; you can switch with UNSLOTH_MOE_BACKEND.

  • Router-layer fine-tuning is disabled by default for stability.

  • Qwen3.5‑122B‑A10B - bf16 LoRA works on 256GB VRAM. If you're using multiple GPUs, add device_map = "balanced" or follow our multi-GPU guide.
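As a sketch of switching the MoE kernel backend: the UNSLOTH_MOE_BACKEND environment variable must be set before Unsloth is imported. The value "backend_name" below is a placeholder, not a real backend; check your Unsloth version's release notes for the accepted values.

```python
import os

# Select the MoE kernel backend before importing unsloth.
# "backend_name" is a placeholder -- the accepted values depend
# on your Unsloth version (see the release notes).
os.environ["UNSLOTH_MOE_BACKEND"] = "backend_name"

import unsloth  # noqa: E402  (env var must be set before this import)
```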

Quickstart

Below is a minimal SFT recipe (works for “text-only” fine-tuning). See also our vision fine-tuning section.
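A minimal sketch of such a recipe, using Unsloth's FastLanguageModel together with TRL's SFTTrainer. The model id ("unsloth/Qwen3.5-27B"), dataset file, and hyperparameters here are illustrative assumptions; substitute your own.

```python
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

# Model id is an assumption -- substitute the Qwen3.5 checkpoint you want.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3.5-27B",
    max_seq_length = 4096,
    load_in_4bit = True,   # 4-bit QLoRA; set to False for bf16 LoRA
)

# Attach LoRA adapters.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    use_gradient_checkpointing = "unsloth",  # reduces VRAM, extends context
)

# Assumed dataset: a local JSONL file with a "text" column.
dataset = load_dataset("json", data_files = "train.jsonl", split = "train")

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        max_steps = 60,
        learning_rate = 2e-4,
        output_dir = "outputs",
    ),
)
trainer.train()
```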

Qwen3.5 is a causal language model with a vision encoder (a unified VLM), so ensure the usual vision dependencies (torchvision, pillow) are installed if needed, and use the latest version of Transformers.

If you OOM:

  • Drop per_device_train_batch_size to 1 and/or reduce max_seq_length.

  • Keep use_gradient_checkpointing="unsloth" on (it’s designed to reduce VRAM use and extend context length).
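The OOM tips above map onto the loader and trainer arguments roughly as follows (a sketch; the model id is an assumption, and argument names follow Unsloth and TRL conventions):

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3.5-27B",  # assumption: substitute your model
    max_seq_length = 2048,               # reduce further if you still OOM
)
model = FastLanguageModel.get_peft_model(
    model,
    use_gradient_checkpointing = "unsloth",  # keep enabled to save VRAM
)
# Then in SFTConfig: per_device_train_batch_size = 1, and raise
# gradient_accumulation_steps to keep the same effective batch size.
```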

Loader example for MoE (bf16 LoRA):
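A sketch of what that loader might look like; the model id is an assumption, and the key point is load_in_4bit = False so the MoE weights stay in bf16:

```python
from unsloth import FastLanguageModel

# bf16 LoRA: do NOT quantize MoE weights to 4-bit.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3.5-35B-A3B",  # assumption: your MoE checkpoint
    max_seq_length = 4096,
    load_in_4bit = False,     # bf16, not QLoRA
    full_finetuning = False,  # LoRA, not full fine-tuning
)
```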

Once loaded, you’ll attach LoRA adapters and train similarly to the SFT example above.

Vision fine-tuning

Unsloth supports vision fine-tuning for the multimodal Qwen3.5 models. Our Qwen3-VL guide applies here as a reference: use the Qwen3-VL notebooks and change the model name to your desired Qwen3.5 model.

Disabling Vision / Text-only fine-tuning:

When fine-tuning vision models, you can now select which parts of the model to fine-tune: only the vision layers, only the language layers, or the attention / MLP layers. All of them are enabled by default.
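With Unsloth's vision API, that selection looks roughly like the following sketch using FastVisionModel.get_peft_model (the model id is an assumption; all four flags default to True):

```python
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen3.5-27B",  # assumption: use your Qwen3.5 model id
)

model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = False,  # disable to train text-only
    finetune_language_layers   = True,
    finetune_attention_modules = True,
    finetune_mlp_modules       = True,
)
```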

To fine-tune or train Qwen3.5 with multiple images, see our multi-image vision guide.

Saving / export fine-tuned model

You can view our specific inference / deployment guides for llama.cpp, vLLM, llama-server, Ollama, LM Studio or SGLang.

Save to GGUF

Unsloth supports saving directly to GGUF:
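For example (the output directory and quantization method below are illustrative choices):

```python
# Save the merged model directly to GGUF; q4_k_m is one common quant choice.
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
```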

Or push GGUFs to Hugging Face:
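A sketch of pushing to the Hub; "hf_username/model" is a placeholder repo id and "hf_..." stands in for your own write token:

```python
# Push the GGUF to Hugging Face ("hf_username/model" is a placeholder).
model.push_to_hub_gguf(
    "hf_username/model", tokenizer,
    quantization_method = "q4_k_m",
    token = "hf_...",  # your Hugging Face write token
)
```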

If the exported model behaves worse in another runtime, the most common cause is a wrong chat template or EOS token at inference time: you must use the same chat template you trained with.

Save to vLLM

To save to 16-bit for vLLM, use:
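For example (the output directory name is illustrative):

```python
# Merge the LoRA adapters into the base weights and save in 16-bit for vLLM.
model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit")
```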

To save just the LoRA adapters, either use:
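A sketch using the standard Hugging Face-style save (directory name is illustrative):

```python
# Saves only the LoRA adapter weights and the tokenizer, not the base model.
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")
```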

Or use our builtin function:
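Which, as a sketch, looks like:

```python
# Unsloth's builtin saver with save_method = "lora" exports just the adapters.
model.save_pretrained_merged("model", tokenizer, save_method = "lora")
```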

For more details read our inference guides:
