Qwen3.5: Fine-tuning Guide
Learn how to fine-tune Qwen3.5 with Unsloth.
You can now fine-tune the Qwen3.5 model family (27B, 35B‑A3B, 122B‑A10B, 397B‑A17B) with Unsloth. Support includes both vision and text fine-tuning.
- Qwen3.5‑35B‑A3B: bf16 LoRA works on 74GB VRAM.
- Qwen3.5‑27B: bf16 LoRA works on 56GB VRAM, and 4-bit QLoRA on 28GB.
- Supports our recent ~12x faster MoE training update, with >35% less VRAM and ~6x longer context.
Qwen3.5 fine-tuning Colab notebooks:
If you want to preserve reasoning ability, you can mix reasoning-style examples with direct answers (keep a minimum of 75% reasoning). Otherwise you can omit reasoning examples entirely.
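A minimal sketch of that mixing step in plain Python (the 75% ratio is from above; the function name and example structure are illustrative, not part of Unsloth's API):

```python
import random

def mix_datasets(reasoning_examples, direct_examples,
                 reasoning_ratio=0.75, seed=3407):
    """Build a training mix that keeps at least `reasoning_ratio` reasoning rows."""
    rng = random.Random(seed)
    # Cap the number of direct-answer rows so reasoning stays >= the target ratio.
    max_direct = int(len(reasoning_examples) * (1 - reasoning_ratio) / reasoning_ratio)
    mixed = list(reasoning_examples)
    mixed += rng.sample(direct_examples, min(max_direct, len(direct_examples)))
    rng.shuffle(mixed)
    return mixed
```

With 75 reasoning rows and 100 direct rows, this keeps all 75 reasoning rows and samples at most 25 direct ones, preserving the 75% floor.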
If you’re on an older version (or fine-tuning locally), update first:
pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo
MoE fine-tuning
For MoE models like Qwen3.5‑35B‑A3B / 122B‑A10B / 397B‑A17B:
It's best to use bf16 setups (e.g. LoRA or full fine-tuning); MoE QLoRA 4-bit is not recommended due to bitsandbytes limitations.
Unsloth's MoE kernels are enabled by default and can use different backends; you can switch with the `UNSLOTH_MOE_BACKEND` environment variable. Router-layer fine-tuning is disabled by default for stability.
Qwen3.5‑122B‑A10B: bf16 LoRA works on 256GB VRAM. If you're using multiple GPUs, add `device_map = "balanced"` or follow our multi-GPU guide.
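A minimal multi-GPU loader sketch, assuming the `unsloth/Qwen3.5-122B-A10B` repo name (verify the actual name on the Hub before use):

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3.5-122B-A10B",  # assumed name; verify on the Hub
    max_seq_length = 4096,
    load_in_4bit = False,        # bf16 LoRA, not QLoRA
    device_map = "balanced",     # spread layers evenly across available GPUs
)
```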
Quickstart
Below is a minimal SFT recipe (works for “text-only” fine-tuning). See also our vision fine-tuning section.
Qwen3.5 is a causal language model with a vision encoder (a unified VLM), so ensure you have the usual vision dependencies installed (torchvision, pillow) if needed, and use the latest version of Transformers.
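A minimal sketch of such a recipe, assuming the `unsloth/Qwen3.5-27B` repo name and a dataset with a pre-formatted `"text"` column (both are placeholders; adjust to your setup):

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3.5-27B",  # assumed name; verify on the Hub
    max_seq_length = 2048,
    load_in_4bit = True,                 # 4-bit QLoRA; set False for bf16 LoRA
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing = "unsloth",  # reduces VRAM, extends context
)

dataset = load_dataset("json", data_files = "train.jsonl", split = "train")  # placeholder data

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        max_steps = 60,
        learning_rate = 2e-4,
        output_dir = "outputs",
    ),
)
trainer.train()
```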
If you OOM:
- Drop `per_device_train_batch_size` to 1 and/or reduce `max_seq_length`.
- Keep `use_gradient_checkpointing = "unsloth"` on (it's designed to reduce VRAM use and extend context length).
Loader example for MoE (bf16 LoRA):
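A sketch of that loader, assuming the `unsloth/Qwen3.5-35B-A3B` repo name (verify on the Hub before use):

```python
import os
# Optional: pick an MoE kernel backend before importing Unsloth;
# the default backend is used otherwise.
# os.environ["UNSLOTH_MOE_BACKEND"] = "..."

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3.5-35B-A3B",  # assumed name; verify on the Hub
    max_seq_length = 4096,
    load_in_4bit = False,  # MoE QLoRA 4-bit is not recommended; stay in bf16
)
```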
Once loaded, you’ll attach LoRA adapters and train similarly to the SFT example above.
Vision fine-tuning
Unsloth supports vision fine-tuning for the multimodal Qwen3.5 models. You can use our Qwen3-VL guide for reference: take the Qwen3-VL notebooks and change the model names to your desired Qwen3.5 model.
Disabling Vision / Text-only fine-tuning:
To fine-tune vision models, we now allow you to select which parts of the model to fine-tune. You can choose to fine-tune only the vision layers, only the language layers, or just the attention / MLP modules. All are enabled by default!
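These toggles are exposed on Unsloth's vision PEFT call; a sketch, assuming the `unsloth/Qwen3.5-27B` repo name:

```python
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    model_name = "unsloth/Qwen3.5-27B",  # assumed name; verify on the Hub
    load_in_4bit = True,
)
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = False,  # e.g. disable for text-only fine-tuning
    finetune_language_layers   = True,
    finetune_attention_modules = True,
    finetune_mlp_modules       = True,
    r = 16,
    lora_alpha = 16,
)
```

Setting `finetune_vision_layers = False` as above gives you text-only fine-tuning while keeping the vision encoder frozen.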
In order to fine-tune or train Qwen3.5 with multi-images, view our multi-image vision guide.
Saving / export fine-tuned model
You can view our specific inference / deployment guides for llama.cpp, vLLM, llama-server, Ollama, LM Studio or SGLang.
Save to GGUF
Unsloth supports saving directly to GGUF:
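For example (assuming `model` and `tokenizer` from the training step above; `q4_k_m` is one common quantization choice):

```python
# Merge the LoRA adapters and save directly to GGUF.
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
```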
Or push GGUFs to Hugging Face:
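For example (the repo name and token below are placeholders; this requires a Hugging Face write token):

```python
model.push_to_hub_gguf(
    "hf_username/model",               # placeholder repo name
    tokenizer,
    quantization_method = "q4_k_m",
    token = "hf_...",                  # placeholder write token
)
```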
If the exported model behaves worse in another runtime, the most common cause is a wrong chat template or EOS token at inference time: you must use the same chat template you trained with.
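One way to sanity-check this is to render a prompt with the tokenizer you trained with and compare it against what your inference runtime sends (the messages below are illustrative):

```python
messages = [{"role": "user", "content": "Hello!"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True,
)
print(prompt)  # should match the prompt format your deployment runtime produces
```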
Save to vLLM
To save to 16-bit for vLLM, use:
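For example (assuming `model` and `tokenizer` from training; the Hub repo name and token are placeholders):

```python
# Merge the LoRA adapters into 16-bit weights for vLLM.
model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit")

# Or push the merged 16-bit model straight to the Hub:
model.push_to_hub_merged(
    "hf_username/model",               # placeholder repo name
    tokenizer,
    save_method = "merged_16bit",
    token = "hf_...",                  # placeholder write token
)
```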
To save just the LoRA adapters, either use:
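The standard PEFT-style save (assuming `model` and `tokenizer` from training; `"lora_model"` is a placeholder directory):

```python
model.save_pretrained("lora_model")       # adapter weights only
tokenizer.save_pretrained("lora_model")
```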
Or use our built-in function:
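For example, using the same merged-save helper with the LoRA-only option:

```python
model.save_pretrained_merged("model", tokenizer, save_method = "lora")
```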
For more details read our inference guides: