# Troubleshooting Inference

### Running in Unsloth works well, but after exporting and running on other platforms, the results are poor

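Sometimes a model runs well and produces good results in Unsloth, but when you use it on another platform such as Ollama or vLLM, the results are poor: you may see gibberish, endless generation, *or* repeated output.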

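* The most common cause of this error is using an <mark style="background-color:blue;">**incorrect chat template**</mark>. The chat template used when training the model in Unsloth must be the same one used later when running it in another framework such as llama.cpp or Ollama. When running inference from a saved model, applying the correct template is essential.
* You must use the correct `eos token`. Otherwise, longer generations may turn into gibberish.
* The cause may also be that your inference engine adds an unnecessary "start of sequence" token (or, conversely, omits it), so be sure to check both cases!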
* <mark style="background-color:green;">**Use our conversational notebooks to force the chat template; this fixes most issues.**</mark>
  * Qwen-3 14B conversational notebook [**Open in Colab**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_\(14B\)-Reasoning-Conversational.ipynb)
  * Gemma-3 4B conversational notebook [**Open in Colab**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_\(4B\).ipynb)
  * Llama-3.2 3B conversational notebook [**Open in Colab**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_\(1B_and_3B\)-Conversational.ipynb)
  * Phi-4 14B conversational notebook [**Open in Colab**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb)
  * Mistral v0.3 7B conversational notebook [**Open in Colab**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_\(7B\)-Conversational.ipynb)
  * **More notebooks are available in our** [**notebooks repo**](https://github.com/unslothai/notebooks)**.**
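The "start of sequence" check from the list above can be sketched in plain Python. This is a minimal illustration, not part of Unsloth: `diagnose_bos` is a hypothetical helper, the token ids are made up, and it assumes the model expects exactly one BOS token at the start of the prompt (some models use none, in which case the check does not apply).

```python
def diagnose_bos(token_ids, bos_id):
    """Classify how many BOS tokens start a tokenized prompt.

    Returns "missing", "ok", or "duplicate".
    Assumption: the model expects exactly one leading BOS token.
    """
    leading = 0
    for token in token_ids:
        if token != bos_id:
            break
        leading += 1
    if leading == 0:
        return "missing"
    if leading == 1:
        return "ok"
    return "duplicate"


# Made-up token ids for illustration; 1 stands in for the BOS id.
print(diagnose_bos([1, 1, 42, 7], bos_id=1))  # input starts with two BOS tokens
print(diagnose_bos([42, 7], bos_id=1))        # input has no BOS token
```

In practice you would pass in the token ids that your inference engine actually feeds to the model, together with the BOS id from the tokenizer used during training.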

### Saving to `safetensors`, not `bin` format (in Colab)

In Colab we save to `.bin` because it is about 4x faster. To force saving to `.safetensors`, set `safe_serialization = None`, that is, `model.save_pretrained(..., safe_serialization = None)` or `model.push_to_hub(..., safe_serialization = None)`.

### If saving to GGUF or vLLM 16bit crashes

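You can try lowering the maximum GPU usage during saving by changing `maximum_memory_usage`.

The default is `model.save_pretrained(..., maximum_memory_usage = 0.75)`. Reduce it to, say, 0.5 to use 50% of peak GPU memory, or lower. This can reduce OOM crashes during saving.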
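One way to apply this is to retry the save with progressively lower values. The sketch below is an illustration only: `save_with_memory_fallback` is a hypothetical helper, the fallback values after 0.75 are arbitrary, and it assumes the out-of-memory failure surfaces as a catchable `RuntimeError`. A hard crash that kills the process cannot be caught this way; in that case, pass a lower `maximum_memory_usage` directly.

```python
def save_with_memory_fallback(save_fn, fractions=(0.75, 0.5, 0.25)):
    """Call save_fn(maximum_memory_usage=f) for each fraction until one succeeds.

    Returns the fraction that worked. Re-raises the last error if all fail.
    Assumption: an OOM during saving is raised as a RuntimeError.
    """
    if not fractions:
        raise ValueError("fractions must not be empty")
    last_error = None
    for fraction in fractions:
        try:
            save_fn(maximum_memory_usage=fraction)
            return fraction
        except RuntimeError as error:
            last_error = error  # remember the failure and try a lower fraction
    raise last_error
```

With a real model, `save_fn` could be a small wrapper such as `lambda **kw: model.save_pretrained("output_dir", **kw)`, where `"output_dir"` is a placeholder path.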

