# Qwen3.5 Fine-tuning Guide

You can now fine-tune the [Qwen3.5](/docs/zh/mo-xing/qwen3.5.md) model family (0.8B, 2B, 4B, 9B, 27B, 35B‑A3B, 122B‑A10B) with [**Unsloth**](https://github.com/unslothai/unsloth). Support includes [vision](/docs/zh/mo-xing/qwen3.5/fine-tune.md#vision-fine-tuning), text, and [RL](#reinforcement-learning-rl) fine-tuning. **Qwen3.5‑35B‑A3B** bf16 LoRA runs on **74GB of VRAM.**

* Unsloth makes Qwen3.5 training **1.5x faster** and uses **50% less VRAM** than FA2 setups.
* Qwen3.5 bf16 LoRA VRAM usage: **0.8B**: 3GB • **2B**: 5GB • **4B**: 10GB • **9B**: 22GB • **27B**: 56GB
* Fine-tune **0.8B**, **2B**, and **4B** with bf16 LoRA using our **free Google Colab notebooks**:

| [Qwen3.5-**0.8B**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_5_\(0_8B\)_Vision.ipynb) | [Qwen3.5-**2B**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_5_\(2B\)_Vision.ipynb) | [Qwen3.5-**4B**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_5_\(4B\)_Vision.ipynb) | [Qwen3.5-4B **GRPO**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_5_\(4B\)_Vision_GRPO.ipynb) |
| --------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
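As a rough aid, the bf16 LoRA VRAM figures quoted on this page can be put into a small lookup to see which model sizes are likely to fit on a given GPU. This is a minimal sketch: the helper name `models_that_fit` is made up for illustration, and real usage also depends on sequence length, batch size, and LoRA rank.

```python
# Approximate bf16 LoRA VRAM needs in GB, copied from the figures on this page.
BF16_LORA_VRAM_GB = {
    "Qwen3.5-0.8B": 3,
    "Qwen3.5-2B": 5,
    "Qwen3.5-4B": 10,
    "Qwen3.5-9B": 22,
    "Qwen3.5-27B": 56,
    "Qwen3.5-35B-A3B": 74,
    "Qwen3.5-122B-A10B": 256,
}


def models_that_fit(available_vram_gb: float) -> list:
    """Return the models whose quoted bf16 LoRA footprint fits in the given VRAM."""
    return [name for name, need in BF16_LORA_VRAM_GB.items() if need <= available_vram_gb]


print(models_that_fit(24))  # check a 24GB card
```

Treat the result as a starting point only; leave some headroom rather than picking a model whose quoted figure exactly matches your card.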

* If you want to **preserve reasoning** ability, mix reasoning-style examples with direct answers (keep at least 75% reasoning). Otherwise you can leave reasoning examples out entirely.
* **Full fine-tuning (FFT)** also works. Note that it uses about 4x more VRAM.
* Qwen3.5 is well suited to multilingual fine-tuning, since it supports 201 languages.
* After fine-tuning, you can export to [GGUF](#saving-export-your-fine-tuned-model) (for llama.cpp, Ollama, etc.) or [vLLM](#saving-export-your-fine-tuned-model).
* [Reinforcement learning](/docs/zh/kai-shi-shi-yong/reinforcement-learning-rl-guide.md) (RL) for Qwen3.5, including [VLM RL](/docs/zh/kai-shi-shi-yong/reinforcement-learning-rl-guide/vision-reinforcement-learning-vlm-rl.md), also works via Unsloth inference.
* We have **A100** Colab notebooks for [Qwen3.5‑27B](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen_3_5_27B_A100\(80GB\).ipynb) and [Qwen3.5‑35B‑A3B](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_5_MoE.ipynb).
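The 75% reasoning guideline above can be applied when assembling a training set. The sketch below is one possible way to do it in plain Python, assuming you already have two lists of examples; the helper `mix_reasoning` and the `kind` field are hypothetical names used only for illustration.

```python
import random


def mix_reasoning(reasoning, direct, reasoning_share=0.75, seed=3407):
    """Combine two example lists so at least `reasoning_share` of the result is reasoning-style."""
    if not 0 < reasoning_share <= 1:
        raise ValueError("reasoning_share must be in (0, 1]")
    # Largest number of direct examples that keeps the reasoning share at or above the target.
    max_direct = int(len(reasoning) * (1 - reasoning_share) / reasoning_share)
    rng = random.Random(seed)
    picked_direct = rng.sample(direct, min(max_direct, len(direct)))
    mixed = list(reasoning) + picked_direct
    rng.shuffle(mixed)
    return mixed


# Made-up placeholder examples.
reasoning = [{"text": f"reasoning example {i}", "kind": "reasoning"} for i in range(75)]
direct = [{"text": f"direct example {i}", "kind": "direct"} for i in range(100)]
mixed = mix_reasoning(reasoning, direct)
print(len(mixed))  # 100
```

All reasoning examples are kept and direct answers are sampled down, so the reasoning share never drops below the target.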

If you are on an older version (or fine-tuning locally), update first:

{% columns %}
{% column %}
Unsloth Studio:

{% code expandable="true" %}

```bash
unsloth studio update
```

{% endcode %}
{% endcolumn %}

{% column %}
Code-based Unsloth:

```bash
pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo
```

{% endcolumn %}
{% endcolumns %}

{% hint style="warning" %}
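**Use `transformers v5` for Qwen3.5. Older versions will not work. Unsloth now uses transformers v5 automatically by default (except in Colab environments).**

If training seems **slower than usual**, that is because Qwen3.5 uses custom Mamba Triton kernels. Compiling these kernels can take longer than normal, especially on T4 GPUs.

QLoRA (4-bit) training is not recommended for Qwen3.5 models, whether MoE or dense, because the quantization differences are higher than normal.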
{% endhint %}
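If you are unsure which Transformers version your environment has, a quick check like the following may help. It is a minimal sketch that uses only the Python standard library; the helper names are made up for illustration, and it simply reads the major number from the installed package's version string.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Extract the leading major number from a version string such as '5.0.0'."""
    return int(version_string.split(".")[0])


def transformers_is_v5_or_newer() -> bool:
    """Return True if an installed transformers package reports major version 5 or higher."""
    try:
        return major_version(version("transformers")) >= 5
    except PackageNotFoundError:
        # transformers is not installed in this environment
        return False


print("transformers v5+ installed:", transformers_is_v5_or_newer())
```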

### MoE fine-tuning (35B, 122B)

For MoE models such as **Qwen3.5‑35B‑A3B / 122B‑A10B / 397B‑A17B**:

* You can use our [Qwen3.5‑35B‑A3B (A100)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_5_MoE.ipynb) fine-tuning notebook.
* Our recent [MoE training update](/docs/zh/ji-chu/faster-moe.md) is supported: about 12x faster, >35% less VRAM, and about 6x longer context.
* **It is best to use a bf16 setup (e.g. LoRA or full fine-tuning)** (MoE QLoRA 4-bit is not recommended due to BitsandBytes limitations).
* Unsloth's MoE kernels are enabled by default and can use different backends; you can switch between them with `UNSLOTH_MOE_BACKEND`.
* Router-layer fine-tuning is disabled by default for stability.
* Qwen3.5‑122B‑A10B bf16 LoRA runs on 256GB of VRAM. If you use multiple GPUs, add `device_map = "balanced"` or see our [multi-GPU guide](/docs/zh/ji-chu/multi-gpu-training-with-unsloth.md).
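The multi-GPU and backend notes above could be wired together roughly as follows. This is a sketch under assumptions: `moe_load_kwargs` and `load_moe` are hypothetical helpers, the backend value is left for you to supply (valid names are not listed on this page), and setting the environment variable before importing `unsloth` is an assumption about when it is read.

```python
import os
from typing import Optional


def moe_load_kwargs(model_name: str, multi_gpu: bool = False) -> dict:
    """Build keyword arguments for a bf16 LoRA load of a Qwen3.5 MoE checkpoint."""
    kwargs = {
        "model_name": model_name,
        "max_seq_length": 2048,
        "load_in_4bit": False,   # MoE QLoRA is not recommended
        "load_in_16bit": True,   # bf16 LoRA
        "full_finetuning": False,
    }
    if multi_gpu:
        kwargs["device_map"] = "balanced"  # spread the model across the visible GPUs
    return kwargs


def load_moe(model_name: str, multi_gpu: bool = False, moe_backend: Optional[str] = None):
    """Load the model. Needs unsloth and enough GPU memory, so it is not called here."""
    if moe_backend is not None:
        # Assumption: the variable is read when unsloth is imported, so set it first.
        os.environ["UNSLOTH_MOE_BACKEND"] = moe_backend
    from unsloth import FastModel
    return FastModel.from_pretrained(**moe_load_kwargs(model_name, multi_gpu))


print(moe_load_kwargs("unsloth/Qwen3.5-35B-A3B", multi_gpu=True))
```

Keeping the argument construction separate from the actual load makes it easy to inspect the settings before committing GPU memory.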

### Quickstart

#### 🦥 Unsloth Studio Guide

Qwen3.5 can be run and fine-tuned in [Unsloth Studio](/docs/zh/xin/studio.md), our new open-source web UI for local AI. With Unsloth Studio, you can run models locally on **MacOS, Windows**, and Linux, and:

{% columns %}
{% column %}

* [Train LLMs](/docs/zh/xin/studio.md#no-code-training) 2x faster with 70% less VRAM
* Search, download, and [run GGUF](/docs/zh/xin/studio.md#run-models-locally) and safetensor models
* [**Self-healing** tool calling](/docs/zh/xin/studio.md#execute-code--heal-tool-calling) + **web search**
* [**Code execution**](/docs/zh/xin/studio.md#run-models-locally) (Python, Bash)
* [Automatic inference](/docs/zh/xin/studio.md#model-arena) parameter tuning (temp, top-p, etc.)
* Fast CPU + GPU inference via llama.cpp
  {% endcolumn %}

{% column %}

<div data-with-frame="true"><figure><img src="/files/228db5f011784652639aeba0c62d0e7fc0bf915c" alt=""><figcaption></figcaption></figure></div>
{% endcolumn %}
{% endcolumns %}

{% stepper %}
{% step %}

#### Install Unsloth

Run in your terminal:

**MacOS, Linux, WSL:**

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
```

**Windows PowerShell:**

```powershell
irm https://unsloth.ai/install.ps1 | iex
```

{% hint style="success" %}
**Installation is quick and takes about 1-2 minutes.**
{% endhint %}
{% endstep %}

{% step %}

#### Launch Unsloth

**MacOS, Linux, WSL, and Windows:**

```bash
unsloth studio -H 0.0.0.0 -p 8888
```

**Then open `http://localhost:8888` in your browser.**
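If the page does not load, one way to check whether anything is listening on the port is a plain TCP connection test. The sketch below uses only the Python standard library; `port_is_open` is a hypothetical helper and the default port simply mirrors the `-p 8888` value used in the launch command.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 8888, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


print("Studio reachable:", port_is_open())
```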
{% endstep %}

{% step %}

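#### Train Qwen3.5

On first launch, you need to create a password to secure your account and to sign back in later. You will then see a short onboarding wizard for choosing a model, dataset, and basic settings. You can skip it at any time.

Search for Qwen3.5 in the search bar and select the model and dataset you want. Then adjust your hyperparameters and context length as needed.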

<div data-with-frame="true"><figure><img src="/files/8e814335066f001aac505d8c4d38681117d6fe7d" alt="" width="563"><figcaption></figcaption></figure></div>
{% endstep %}

{% step %}

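#### Monitor training progress

After you click Start Training, you can monitor and watch the model's training progress. The training loss should decrease steadily.\
Once training finishes, the model is saved automatically.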

<div data-with-frame="true"><figure><img src="/files/c784ae65c4e99b93e1c7d82eb9ebdcc48206759a" alt="" width="563"><figcaption></figcaption></figure></div>
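If you log or export the loss values yourself, one simple sanity check is to compare the average of the first few values with the average of the last few. The sketch below uses made-up loss values and a hypothetical helper `loss_is_trending_down`; it is not part of Unsloth Studio.

```python
def loss_is_trending_down(losses, window=10):
    """Compare the mean of the first and last `window` logged losses."""
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    first = sum(losses[:window]) / window
    last = sum(losses[-window:]) / window
    return last < first


# Made-up, steadily falling loss values for illustration.
losses = [2.0 - 0.01 * step for step in range(100)]
print(loss_is_trending_down(losses))  # True
```

In code-based runs, the Hugging Face `Trainer` keeps logged values in `trainer.state.log_history`, which could feed a check like this.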
{% endstep %}

{% step %}

#### Export your fine-tuned model

Once training finishes, Unsloth Studio lets you export the model to GGUF, safetensor, and other formats.

<div data-with-frame="true"><figure><img src="/files/06f06e74240dd5f0b2601598c4f954733de4727f" alt="" width="563"><figcaption></figcaption></figure></div>
{% endstep %}
{% endstepper %}

#### Unsloth Core (code-based) Guide:

Below is a minimal SFT recipe (for "text-only" fine-tuning). See also our [vision fine-tuning](/docs/zh/ji-chu/vision-fine-tuning.md) section.

{% hint style="info" %}
Qwen3.5 is a "causal language model with a vision encoder" (it is a unified VLM), so make sure the usual vision dependencies (`torchvision`, `pillow`) are installed if needed, and use the latest Transformers version for Qwen3.5 support.

**If you want to do** [**GRPO**](/docs/zh/kai-shi-shi-yong/reinforcement-learning-rl-guide.md)**, it works in Unsloth as long as you disable fast vLLM inference and use Unsloth inference instead. See our** [**vision RL**](/docs/zh/kai-shi-shi-yong/reinforcement-learning-rl-guide/vision-reinforcement-learning-vlm-rl.md) **notebook examples.**
{% endhint %}

{% code expandable="true" %}

```python
from unsloth import FastLanguageModel
import torch
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

max_seq_length = 2048  # Start small; scale up once it works

# Example dataset (replace with yours). Needs a "text" column.
url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"
dataset = load_dataset("json", data_files={"train": url}, split="train")

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen/Qwen3.5-27B",
    max_seq_length = max_seq_length,
    load_in_4bit = False,     # QLoRA (4-bit) is not recommended for Qwen3.5
    load_in_16bit = True,     # bf16/16-bit LoRA
    full_finetuning = False,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    # "unsloth" gradient checkpointing helps with very long context and lower VRAM
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    max_seq_length = max_seq_length,
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    tokenizer = tokenizer,
    args = SFTConfig(
        max_seq_length = max_seq_length,
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 100,
        logging_steps = 1,
        output_dir = "outputs_qwen35",
        optim = "adamw_8bit",
        seed = 3407,
        dataset_num_proc = 1,
    ),
)

trainer.train()
```

{% endcode %}
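To see how much data the recipe above actually consumes, it helps to multiply out the batch settings. The arithmetic below is a small sketch that assumes a single GPU and no sequence packing, so the token count is an upper bound rather than an exact figure.

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_steps = 100
max_seq_length = 2048
num_gpus = 1  # assumption: single-GPU run

# Sequences contributing to each optimizer step (the effective batch size).
sequences_per_step = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
# Sequences seen over the whole run.
sequences_seen = sequences_per_step * max_steps
# Upper bound on tokens seen, since each sequence is truncated to max_seq_length.
max_tokens_seen = sequences_seen * max_seq_length

print(sequences_per_step)  # 4
print(sequences_seen)      # 400
print(max_tokens_seen)     # 819200
```

With `max_steps = 100`, the run therefore touches only a few hundred examples, which suits a smoke test more than a full training run.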

{% hint style="info" %}
If you hit OOM:

* Lower `per_device_train_batch_size` to **1** and/or reduce `max_seq_length`.
* Keep `use_`[`gradient_checkpointing`](/docs/zh/bo-ke/500k-context-length-fine-tuning.md#unsloth-gradient-checkpointing-enhancements)`="unsloth"` enabled (it is designed to reduce VRAM use and extend context length).
  {% endhint %}
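The OOM advice above amounts to a simple fallback order: shrink the batch size to 1 first, then shorten the sequence length. The sketch below generates such a sequence of settings to try; the helper `oom_fallbacks` and the 512-token floor are assumptions for illustration, not Unsloth defaults.

```python
def oom_fallbacks(batch_size, max_seq_length, min_seq_length=512):
    """Yield progressively smaller (batch_size, max_seq_length) settings to try after an OOM."""
    # First halve the batch size until it reaches 1.
    while batch_size > 1:
        batch_size = max(1, batch_size // 2)
        yield batch_size, max_seq_length
    # Then halve the sequence length, stopping at the chosen floor.
    while max_seq_length // 2 >= min_seq_length:
        max_seq_length //= 2
        yield batch_size, max_seq_length


print(list(oom_fallbacks(4, 2048)))  # [(2, 2048), (1, 2048), (1, 1024), (1, 512)]
```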

**MoE loading example (bf16 LoRA):**

```python
import os
import torch
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/Qwen3.5-35B-A3B",
    max_seq_length = 2048,
    load_in_4bit = False,     # MoE QLoRA is not recommended
    load_in_16bit = True,     # bf16/16-bit LoRA
    full_finetuning = False,
)
```

Once the model is loaded, attach LoRA adapters and train as in the SFT example above.
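For intuition on why LoRA adapters are cheap to train, the number of parameters added to one linear layer can be worked out directly: LoRA adds a matrix of shape `r x d_in` and one of shape `d_out x r`. The sketch below uses the rank `r = 16` from the SFT example and a hypothetical layer shape chosen only for illustration; it does not reflect Qwen3.5's real dimensions.

```python
def lora_params(d_in: int, d_out: int, r: int = 16) -> int:
    """Trainable parameters LoRA adds to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


# Hypothetical layer shape, for illustration only.
d_in, d_out = 4096, 4096
full = d_in * d_out
added = lora_params(d_in, d_out, r=16)

print(added)         # 131072
print(added / full)  # 0.0078125
```

For this made-up shape, the adapter holds under 1% as many parameters as the frozen weight it sits beside.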

### Vision fine-tuning

Unsloth supports [vision fine-tuning](/docs/zh/ji-chu/vision-fine-tuning.md) for the multimodal Qwen3.5 models. Use the Qwen3.5 notebooks below and change the model name to the Qwen3.5 model you want.

| [Qwen3.5-**0.8B**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_5_\(0_8B\)_Vision.ipynb) | [Qwen3.5-**2B**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_5_\(2B\)_Vision.ipynb) | [Qwen3.5-**4B**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_5_\(4B\)_Vision.ipynb) | Qwen3.5-**9B** |
| --------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- | -------------- |

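* [Qwen3-VL GRPO/GSPO RL notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_VL_\(8B\)-Vision-GRPO.ipynb) (change the model name to Qwen3.5-4B, etc.)

**Disabling vision / text-only fine-tuning:**

When fine-tuning vision models, you can now choose which parts of the model to train: only the vision layers, only the language layers, or the attention / MLP layers. All of them are enabled by default.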

{% code expandable="true" %}

```python
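from unsloth import FastVisionModel

# `model` is assumed to be already loaded with FastVisionModel.from_pretrained
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = True, # False to skip fine-tuning the vision layers
    finetune_language_layers   = True, # False to skip fine-tuning the language layers
    finetune_attention_modules = True, # False to skip fine-tuning the attention layers
    finetune_mlp_modules       = True, # False to skip fine-tuning the MLP layers

    r = 16,                           # Larger = higher accuracy, but may overfit
    lora_alpha = 16,                  # Recommended: alpha at least equal to r
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
    use_rslora = False,               # Rank-stabilized LoRA is supported
    loftq_config = None,              # As is LoftQ
    target_modules = "all-linear",    # Optional now; pass a list if needed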
    modules_to_save=[
        "lm_head",
        "embed_tokens",
    ],
)
```

{% endcode %}
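The four `finetune_*` switches above can be grouped into named presets so that a text-only or vision-only run is a one-word change. This is only a sketch: the helper `finetune_flags` and the mode names are hypothetical, and only the four keyword names come from the example above.

```python
def finetune_flags(mode: str) -> dict:
    """Map a named mode to the four finetune_* switches used in the example above."""
    flags = {
        "finetune_vision_layers": True,
        "finetune_language_layers": True,
        "finetune_attention_modules": True,
        "finetune_mlp_modules": True,
    }
    if mode == "all":
        return flags
    if mode == "text_only":
        flags["finetune_vision_layers"] = False    # leave the vision layers untouched
    elif mode == "vision_only":
        flags["finetune_language_layers"] = False  # leave the language layers untouched
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return flags


print(finetune_flags("text_only"))
```

The result could then be unpacked into the call, for example `FastVisionModel.get_peft_model(model, r = 16, **finetune_flags("text_only"))`.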

To fine-tune or train Qwen3.5 with multiple images, see our [**multi-image vision guide**](/docs/zh/ji-chu/vision-fine-tuning.md#multi-image-training).

### Reinforcement Learning (RL)

You can now run RL on Qwen3.5 with [our free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_5_\(4B\)_Vision_GRPO.ipynb):

{% embed url="<https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_5_(4B)_Vision_GRPO.ipynb>" %}

Even where vLLM does not support Qwen3.5, you can still run Qwen3.5 RL with Unsloth by setting `fast_inference=False` when loading the model:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3.5-4B",
    fast_inference=False,
)
```

### Saving / Export your fine-tuned model

See our dedicated inference / deployment guides for [Unsloth Studio](/docs/zh/xin/studio/export.md), [llama.cpp](/docs/zh/ji-chu/inference-and-deployment/saving-to-gguf.md), [vLLM](/docs/zh/ji-chu/inference-and-deployment/vllm-guide.md), [llama-server](/docs/zh/ji-chu/inference-and-deployment/llama-server-and-openai-endpoint.md), and [Ollama](/docs/zh/ji-chu/inference-and-deployment/saving-to-ollama.md).

#### Saving to GGUF

Unsloth supports saving directly to GGUF:

```python
model.save_pretrained_gguf("directory", tokenizer, quantization_method = "q4_k_m")
model.save_pretrained_gguf("directory", tokenizer, quantization_method = "q8_0")
model.save_pretrained_gguf("directory", tokenizer, quantization_method = "f16")
```

Or push the GGUF to Hugging Face:

```python
model.push_to_hub_gguf("hf_username/directory", tokenizer, quantization_method = "q4_k_m")
model.push_to_hub_gguf("hf_username/directory", tokenizer, quantization_method = "q8_0")
```

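If the exported model behaves worse in another runtime, the most common cause Unsloth flags is **the wrong chat template / EOS token at inference time** (you must use the same chat template you trained with).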
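When both the training-time and serving-time tokenizers are available as Hugging Face tokenizer objects, a quick comparison of their `chat_template` and `eos_token` attributes can surface this kind of mismatch. The sketch below uses stand-in objects with made-up values so that it runs anywhere; `template_mismatches` is a hypothetical helper, and runtimes that embed the template elsewhere (for example inside a GGUF file) need a different check.

```python
from types import SimpleNamespace


def template_mismatches(train_tok, serve_tok):
    """List which of chat_template / eos_token differ between two tokenizer-like objects."""
    fields = ("chat_template", "eos_token")
    return [f for f in fields if getattr(train_tok, f, None) != getattr(serve_tok, f, None)]


# Stand-in objects with made-up values; in practice pass two loaded tokenizers.
train_tok = SimpleNamespace(chat_template="<tpl-A>", eos_token="<eos-A>")
serve_tok = SimpleNamespace(chat_template="<tpl-A>", eos_token="<eos-B>")
print(template_mismatches(train_tok, serve_tok))  # ['eos_token']
```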

#### Saving to vLLM

{% hint style="warning" %}
vLLM version `0.16.0` does not support Qwen3.5. Wait for `0.17.0` or try the nightly build.
{% endhint %}

To save in 16-bit for vLLM, use:

{% code overflow="wrap" %}

```python
model.save_pretrained_merged("finetuned_model", tokenizer, save_method = "merged_16bit")
# Or upload to Hugging Face:
model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")
```

{% endcode %}

To save only the LoRA adapters, use either of the following:

```python
model.save_pretrained("finetuned_lora")
tokenizer.save_pretrained("finetuned_lora")
```

Or use our built-in function:

{% code overflow="wrap" %}

```python
model.save_pretrained_merged("finetuned_model", tokenizer, save_method = "lora")
# Or upload to Hugging Face:
model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")
```

{% endcode %}

For more details, read our inference guides:

{% columns %}
{% column width="50%" %}
{% content-ref url="/pages/9a72670992feb75def412a693565c84a88c8a266" %}
[Inference & Deployment](/docs/zh/ji-chu/inference-and-deployment.md)
{% endcontent-ref %}

{% content-ref url="/pages/b83d88f106d75c3396c46f5342fb401501910093" %}
[GGUF & llama.cpp](/docs/zh/ji-chu/inference-and-deployment/saving-to-gguf.md)
{% endcontent-ref %}
{% endcolumn %}

{% column width="50%" %}
{% content-ref url="/pages/f7c3389bdba9af3050e66a941596d827cdb11e0b" %}
[Model Export](/docs/zh/xin/studio/export.md)
{% endcontent-ref %}

{% content-ref url="/pages/9f0e22d200c9105481e4854b8473aba99ca44835" %}
[vLLM](/docs/zh/ji-chu/inference-and-deployment/vllm-guide.md)
{% endcontent-ref %}
{% endcolumn %}
{% endcolumns %}

