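# How to Run Models with Unsloth Studio

[Unsloth Studio](https://unsloth.ai/docs/zh/xin-zeng/studio) lets you run AI models 100% offline on your own computer. It runs model formats such as GGUF and safetensors, loaded from Hugging Face or from local files.

* **Works on all macOS, CPU, Windows, Linux, and WSL setups. No GPU required.**
* **Search + download + run** any model, including GGUF, LoRA adapters, safetensors, and more.
* [**Compare**](#model-arena) the outputs of two different models side by side.
* [**Auto-healing tool calling**](#auto-healing-tool-calling) / web search, [**code execution**](#code-execution), and calls to OpenAI-compatible APIs.
* [**Automatic inference parameter**](#auto-parameter-tuning) tuning (temp, top-p, etc.) and chat template editing.
* Upload images, audio, PDFs, code, DOCX, and other file types to chat with.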

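### Using Unsloth Studio Chat

{% columns %}
{% column %}
#### Search and run models

You can search for and download any model from Hugging Face, or use local files.

Studio supports many model types, including **GGUF**, vision-language, and text-to-speech models. Run the latest models, such as [Qwen3.5](https://unsloth.ai/docs/zh/mo-xing/qwen3.5) or NVIDIA [Nemotron 3](https://unsloth.ai/docs/zh/mo-xing/nemotron-3).

Upload images, audio, PDFs, code, DOCX, and other file types to chat with.
{% endcolumn %}

{% column %}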

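{% endcolumn %}
{% endcolumns %}

{% hint style="success" %}
Unsloth Studio Chat automatically works with **multi-GPU setups** for inference.
{% endhint %}

{% columns %}
{% column %}
#### Code execution

Unsloth Studio lets LLMs run Bash and Python, not just JavaScript. It also sandboxes programs, as Claude Artifacts does, so models can test code, generate files, and verify answers with real computation. This makes the model's answers more reliable and accurate.
{% endcolumn %}

{% column %}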

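{% endcolumn %}
{% endcolumns %}

{% columns %}
{% column %}
#### Auto-healing tool calling

Unsloth Studio not only supports tool calling and web search, it also automatically repairs any errors that occur along the way. This means you always get inference output **without** broken tool calls. For example, Qwen3.5-4B searched more than 20 websites and cited its sources, with the web searches happening inside its thinking trace.
{% endcolumn %}

{% column %}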

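{% endcolumn %}
{% endcolumns %}

{% columns %}
{% column %}
#### Auto parameter tuning

Inference parameters such as **temperature**, **top-p**, and **top-k** are preset automatically for new models like Qwen3.5, so you get the best output without worrying about settings. You can also adjust the parameters manually and edit the system prompt.

Thanks to llama.cpp's smart auto context, you no longer need to adjust the context length: it uses only the context you need and loads nothing extra.
{% endcolumn %}

{% column %}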

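{% endcolumn %}
{% endcolumns %}

{% columns %}
{% column %}
#### Chat workspace

Enter a prompt, attach any document, image (webp, png), code file, txt file, or audio as extra context, and watch the model's reply in real time.

Toggles: thinking + web search.
{% endcolumn %}

{% column %}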

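{% endcolumn %}
{% endcolumns %}

### Model Arena

Studio Chat lets you compare any two models side by side using the same prompt, for example a base model against a LoRA adapter. Inference loads one model first and then the second (parallel inference is in development).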
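
The load-one-then-the-other pattern described for the Model Arena can be pictured with a small sketch. This is an illustration only, not Unsloth Studio's implementation or API: `compare_sequentially`, `load_base`, and `load_finetuned` are hypothetical names, and the "models" are stand-in functions so that the example runs without any weights.

```python
from typing import Callable, Dict

# Hypothetical types: a loader returns a function mapping prompt -> reply.
Generate = Callable[[str], str]
Loader = Callable[[], Generate]


def compare_sequentially(prompt: str, loaders: Dict[str, Loader]) -> Dict[str, str]:
    """Run one prompt through each model, loading the models one at a time."""
    replies: Dict[str, str] = {}
    for name, load in loaders.items():
        generate = load()                 # load only this model
        replies[name] = generate(prompt)  # same prompt for every model
        del generate                      # drop the reference before the next load
    return replies


# Stand-in "models" (assumptions for illustration, not real checkpoints).
def load_base() -> Generate:
    return lambda prompt: f"[base] {prompt}"


def load_finetuned() -> Generate:
    return lambda prompt: f"[fine-tuned] {prompt.upper()}"


results = compare_sequentially("hello", {"base": load_base, "lora": load_finetuned})
for name, reply in results.items():
    print(f"{name}: {reply}")
```

Loading sequentially keeps only one model in memory at a time, which is the trade-off against the parallel mode mentioned as in development.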

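{% columns %}
{% column %}
After training, you can compare the base model and the fine-tuned model side by side with the same prompt to see what changed and whether the results improved. This workflow makes it easy to see how fine-tuning changes the model's replies and whether it improves results for your use case.
{% endcolumn %}

{% column %}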

{% endcolumn %}
{% endcolumns %}

{% hint style="success" %}
Unsloth Studio Chat automatically works with **multi-GPU setups** for inference.
{% endhint %}

### Using old / existing GGUF models

{% columns %}
{% column %}
**April 1 update:** You can now choose an existing folder for Unsloth to detect models from.

**March 27 update:** Unsloth Studio now **automatically detects old / pre-existing models** downloaded from Hugging Face, LM Studio, and similar tools.
{% endcolumn %}

{% column %}

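{% endcolumn %}
{% endcolumns %}

**Manual instructions:** Unsloth Studio detects models downloaded to your Hugging Face Hub cache (`C:\Users\{your_username}\.cache\huggingface\hub`). If you have GGUF models downloaded through LM Studio, note that they are stored in `C:\Users\{your_username}\.cache\lm-studio\models` ***or*** `C:\Users\{your_username}\lm-studio\models`, and llama.cpp cannot see them by default. You need to move or copy those .gguf files into your Hugging Face Hub cache directory (or another path that llama.cpp can access) before Unsloth Studio can load them.

After fine-tuning a model or adapter in Studio, you can export it to GGUF and run local inference directly in Studio Chat through **llama.cpp**. Unsloth Studio is powered by llama.cpp and Hugging Face.

### Adding files as context

Studio Chat supports multimodal input directly in the conversation. You can attach documents, images, or audio as extra context for your prompt.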

This makes it easy to test how a model handles real inputs such as PDFs, screenshots, or reference material. Files are processed locally and provided to the model as context.

### **Deleting model files**

You can delete old model files with the trash icon in the model search, or by removing the relevant cached model folder from the default Hugging Face cache directory. By default, Hugging Face uses:

* **macOS, Linux, WSL:** `~/.cache/huggingface/hub/`
* **Windows:** `%USERPROFILE%\.cache\huggingface\hub\`

If `HF_HUB_CACHE` or `HF_HOME` is set, that location is used instead. On Linux and WSL, `XDG_CACHE_HOME` can also change the default cache root.

### **Unsloth is not detecting or using my GPU**

If a model is not using your GPU, especially in Docker, try the following:

* Pull the latest image manually:

  ```bash
  docker pull unsloth/unsloth:latest
  ```

* Start the container with GPU access:
  * `docker run`: `--gpus all`
  * Docker Compose: `capabilities: [gpu]`
* On Linux, make sure the NVIDIA Container Toolkit is installed.
* On Windows:
  * Check that `nvcc --version` matches the CUDA version shown by `nvidia-smi`.
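
The `nvcc` / `nvidia-smi` comparison from the GPU troubleshooting steps can be scripted. The sketch below is an illustration and not part of Unsloth Studio: the helper names (`parse_nvcc_release`, `parse_smi_cuda`, `cuda_versions`) are hypothetical, and it assumes both tools print their usual version lines (`release X.Y` for `nvcc`, `CUDA Version: X.Y` for `nvidia-smi`). If either tool is missing from `PATH`, it reports `None` for that tool.

```python
from __future__ import annotations

import re
import shutil
import subprocess


def parse_nvcc_release(text: str) -> str | None:
    """Extract 'major.minor' from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def parse_smi_cuda(text: str) -> str | None:
    """Extract 'major.minor' from the `nvidia-smi` header line."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", text)
    return match.group(1) if match else None


def run(cmd: list[str]) -> str | None:
    """Return the command's stdout, or None if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


def cuda_versions() -> tuple[str | None, str | None]:
    """Return (toolkit version from nvcc, CUDA version from nvidia-smi)."""
    nvcc_out = run(["nvcc", "--version"])
    smi_out = run(["nvidia-smi"])
    toolkit = parse_nvcc_release(nvcc_out) if nvcc_out else None
    driver = parse_smi_cuda(smi_out) if smi_out else None
    return toolkit, driver


if __name__ == "__main__":
    toolkit, driver = cuda_versions()
    print(f"nvcc (toolkit) CUDA version: {toolkit}")
    print(f"nvidia-smi CUDA version:     {driver}")
    if toolkit and driver and toolkit != driver:
        print("The two versions differ.")
```

Note that `nvidia-smi` reports the highest CUDA version the installed driver supports, while `nvcc` reports the installed toolkit version, so the script only flags a difference and leaves the interpretation to you.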