> For the complete documentation index, see [llms.txt](https://unsloth.ai/docs/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://unsloth.ai/docs/zh/ji-chu/inference-and-deployment/saving-to-ollama.md). # 将模型保存到 Ollama 请参阅下方指南，了解将模型保存到的完整流程 [Ollama](https://github.com/ollama/ollama): {% content-ref url="/pages/e36b62f99558f2c1e1e644ea9bc15aa9621d449b" %} [Tutorial: Finetune Llama-3 and Use In Ollama](/docs/zh/kai-shi-shi-yong/fine-tuning-llms-guide/tutorial-how-to-finetune-llama-3-and-use-in-ollama.md) {% endcontent-ref %} ### 在 Google Colab 中保存你可以像下面这样将微调后的模型保存为一个名为 LoRA 适配器的小型 100MB 文件。如果你想上传你的模型，也可以改为推送到 Hugging Face hub！记得通过以下链接获取 Hugging Face token：并添加你的 token！

保存模型后，我们又可以使用 Unsloth 来运行模型本身！使用 `FastLanguageModel` 再次调用它进行推理！

### 导出到 Ollama 最后，我们可以将微调后的模型直接导出到 Ollama！首先，我们必须在 Colab 笔记本中安装 Ollama：

然后，我们将手头的微调模型导出为 llama.cpp 的 GGUF 格式，如下所示：

提醒：将 `False` 改为 `True` ，只针对第 1 行，不要把每一行都改成 `True`，否则你会等很久！我们通常建议将第一行设置为 `True`，这样就能快速将微调模型导出为 `Q8_0` 格式（8 位量化）。我们也允许你导出为一整套量化方法，其中较受欢迎的是 `q4_k_m`. 前往了解更多关于 GGUF 的信息。如果你愿意，我们也提供了手动导出为 GGUF 的说明：你会看到一长串如下所示的文本——请等待 5 到 10 分钟！！

最后，在最末尾时，它会如下所示：

然后，我们必须在后台运行 Ollama 本身。我们使用 `subprocess` 因为 Colab 不喜欢异步调用，但通常人们只需在终端/命令提示符中运行 `ollama serve` 。

### 自动 `Modelfile` 创建 Unsloth 提供的技巧是，我们会自动创建一个 `Modelfile` 而 Ollama 需要这个！这只是一组设置，并且包含了我们在微调过程中使用的聊天模板！你也可以像下面这样打印 `Modelfile` 生成的内容：

然后，我们通过使用 `Modelfile`

### Ollama 推理来让 Ollama 创建一个与 Ollama 兼容的模型。现在，如果你想直接调用正在你自己的本地机器上运行的 Ollama 服务器 / 在免费的 Colab 笔记本中后台运行的 Ollama 服务器，就可以进行推理了。记得你可以编辑黄色下划线部分。

### 在 Unsloth 中运行效果很好，但导出并在 Ollama 上运行后，结果很差你有时可能会遇到这样的问题：模型在 Unsloth 上运行并产生良好结果，但当你在另一个平台（如 Ollama）上使用它时，结果很差，或者你可能会得到乱码、无休止/无限生成 *或* 重复输出**.** * 这种错误最常见的原因是使用了 **错误的聊天模板****.** 必须使用与在 Unsloth 中训练模型时相同的聊天模板，并在之后将其用于其他框架，例如 llama.cpp 或 Ollama。对已保存的模型进行推理时，正确应用模板至关重要。 * 你必须使用正确的 `eos 令牌`。否则，在较长的生成中你可能会得到乱码。 * 也可能是因为你的推理引擎添加了一个不必要的“序列开始”令牌（反之亦然缺少该令牌），所以请确保同时检查这两种假设！ * **使用我们的对话式笔记本来强制使用聊天模板——这会修复大多数问题。** * Qwen-3 14B 对话式笔记本 [**在 Colab 中打开**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_\(14B\)-Reasoning-Conversational.ipynb) * Gemma-3 4B 对话式笔记本 [**在 Colab 中打开**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_\(4B\).ipynb) * Llama-3.2 3B 对话式笔记本 [**在 Colab 中打开**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_\(1B_and_3B\)-Conversational.ipynb) * Phi-4 14B 对话式笔记本 [**在 Colab 中打开**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb) * Mistral v0.3 7B 对话式笔记本 [**在 Colab 中打开**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_\(7B\)-Conversational.ipynb) * **更多笔记本请查看我们的** [**笔记本文档**](/docs/zh/kai-shi-shi-yong/unsloth-notebooks.md) --- # Agent Instructions This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com. ## Querying This Documentation If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question. Perform an HTTP GET request on the current page URL with the `ask` query parameter: ``` GET https://unsloth.ai/docs/zh/ji-chu/inference-and-deployment/saving-to-ollama.md?ask= ``` The question should be specific, self-contained, and written in natural language. The response will contain a direct answer to the question and relevant excerpts and sources from the documentation. Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.