# 如何使用 Unsloth Studio 运行模型 [Unsloth Studio](/docs/zh/xin/studio.md) 可让你在电脑上 100% 离线运行 AI 模型。可运行 GGUF 和 safetensors 等模型格式，支持从 Hugging Face 或本地文件加载。 * **可在所有 MacOS、CPU、Windows、Linux、WSL 环境中运行！无需 GPU** * [**自我修复式工具调用**](#auto-healing-tool-calling)**,** 高级 [**网页搜索**](#advanced-web-search), [**代码执行**](#code-execution) * 将 Unsloth 作为与 OpenAI 兼容的推理 [**API 端点**](/docs/zh/ji-chu/api.md) * 搜索 + 下载 + 运行 + [对比](#model-arena) 任何模型，如 GGUF、LoRA 适配器、safetensors 等。 * [**自动推理参数**](#auto-parameter-tuning) 调优（temp、top-p 等）并编辑聊天模板 * 上传图片、音频、PDF、代码、DOCX 及更多文件类型来聊天。

### 使用 Unsloth Studio Chat {% hint style="success" %} Unsloth Studio Chat 可自动在 **多 GPU 配置** 上进行推理。 {% endhint %} {% columns %} {% column %} #### 代码执行 Unsloth Studio 让 LLM 不仅能运行 JavaScript，还能运行 Bash 和 Python。它还会像 Claude Artifacts 一样对程序进行沙箱隔离，因此模型可以测试代码、生成文件，并通过真实计算验证答案。这使模型给出的答案更可靠、更准确。 {% endcolumn %} {% column %}

{% endcolumn %} {% endcolumns %} {% columns %} {% column %} #### 自动修复式工具调用 Unsloth Studio 不仅支持 [工具调用](#id-50-tool-calling-accuracy)，还可将格式错误或损坏的工具调用自动修复 50%。这意味着你始终能获得推理输出 **而不会有** 损坏的工具调用。例如，Qwen3.5-4B 搜索了 20 多个网站并引用了来源，网页搜索发生在其思考轨迹内部。 {% endcolumn %} {% column %}

{% endcolumn %} {% endcolumns %} {% columns %} {% column %} #### 高级网页搜索 Unsloth 的网页搜索会直接访问页面以收集相关信息和数据，而不仅仅是扫描网站摘要。这能提供更准确、更深入的信息和上下文输出。 {% endcolumn %} {% column %}

{% endcolumn %} {% endcolumns %} {% columns %} {% column %} #### 将 Unsloth 作为 API 端点使用你现在可以通过 [Claude Code](/docs/zh/ji-chu/claude-code.md) 和 [Codex](/docs/zh/ji-chu/codex.md) 等工具连接到 Unsloth 的 API 端点，从而使用本地 LLM。这意味着你可以在这些工具中直接运行 Qwen 和 Gemma 模型，并使用 Unsloth 的推理能力，其中包括自我修复式工具调用、网页搜索等功能。 {% endcolumn %} {% column %}

{% endcolumn %} {% endcolumns %} {% columns %} {% column %} #### 搜索并运行模型你可以通过 Hugging Face 搜索并下载任何模型，或使用本地文件。 Studio 支持多种模型类型，包括 **GGUF**、视觉语言模型以及文本转语音模型。可运行最新模型，如 [Qwen3.5](/docs/zh/mo-xing/qwen3.5.md) 或 NVIDIA [Nemotron 3](/docs/zh/mo-xing/nemotron-3.md). 上传图片、音频、PDF、代码、DOCX 及更多文件类型来聊天。 {% endcolumn %} {% column %}

{% endcolumn %} {% endcolumns %} {% columns %} {% column %} #### 自动推理设置推理参数，如 **temperature**, **top-p**, **top-k** 会为新模型（如 Qwen3.5）自动预设，让你无需担心设置即可获得最佳输出。你也可以手动调整参数并编辑系统提示词。借助 llama.cpp 的智能自动上下文，不再需要调整上下文长度，它只会使用你需要的上下文，而不会额外加载内容。 {% endcolumn %} {% column %}

{% endcolumn %} {% endcolumns %} {% columns %} {% column %} #### 聊天工作区输入提示词，附加任何文档、图片（webp、png）、代码文件、txt 或音频作为额外上下文，并实时查看模型的回复。可切换：思考 + 网页搜索。 {% endcolumn %} {% column %}

{% endcolumn %} {% endcolumns %} ### **工具调用准确率 +50%** Unsloth 提供多项独特功能来改进工具调用，包括： * Unsloth 中所有模型的工具调用 **准确率提升 30% 到 80%**. * 网页搜索会检索真实网页内容，而不仅仅是摘要。 * 允许的最大工具调用次数 **超过 25 次。** * 工具调用结束得更可靠，减少循环和重复调用。 * 改进的工具调用修复与去重逻辑有助于防止 XML 泄漏到输出中。查看测试结果： `unsloth/Qwen3.5-4B-GGUF (UD-Q4_K_XL)` 已启用网页搜索、代码执行和思考： | 指标 | 普通工具调用 | Unsloth 工具调用 | | ------------ | ------ | ------------ | | 响应中的 XML 泄漏 | 10/10 | 0/10 | | 使用的 URL 获取次数 | 0 | 4/10 次运行 | | 歌曲名称正确的运行次数 | 0/10 | 2/10 | | 平均工具调用次数 | 5.5 | 3.8 | | 平均响应时间 | 12.3 秒 | 9.8 秒 | ### 模型竞技场 Studio Chat 允许你使用同一个提示词并排比较任意两个模型。例如，比较基础模型和 LoRa 适配器。推理会先加载一个模型，再加载第二个模型（并行推理正在开发中）。

{% columns %} {% column %} 训练完成后，你可以使用相同的提示词并排比较基础模型和微调后的模型，查看发生了什么变化以及结果是否有所改进。这个工作流可以轻松看出你的微调如何改变了模型的回复，以及它是否改善了你的使用场景结果。 {% endcolumn %} {% column %}

{% endcolumn %} {% endcolumns %} {% hint style="success" %} Unsloth Studio Chat 可自动运行于 **多 GPU 配置** 上进行推理。 {% endhint %} ### 使用旧的 / 现有 GGUF 模型 {% columns %} {% column %} **4 月 1 日更新：** 你现在可以选择一个现有文件夹，让 Unsloth 从中检测模型。 **3 月 27 日更新：** Unsloth Studio 现在 **可自动检测旧的 / 预先存在的模型** 这些模型可从 Hugging Face、LM Studio 等处下载。 {% endcolumn %} {% column %}

{% endcolumn %} {% endcolumns %} **手动说明：** Unsloth Studio 会检测下载到 Hugging Face Hub 缓存中的模型 `(C:\Users{your_username}.cache\huggingface\hub)`。如果你通过 LM Studio 下载了 GGUF 模型，请注意这些文件存储在 `C:\Users\{your_username}.cache\lm-studio\models` ***或*** `C:\Users{your_username}\lm-studio\models` 中，且默认情况下 llama.cpp 无法看到它们——你需要将这些 .gguf 文件移动或复制到 Hugging Face Hub 缓存目录（或 llama.cpp 可访问的其他路径），这样 Unsloth Studio 才能加载它们。在 Studio 中对模型或适配器完成微调后，你可以将其导出为 GGUF，并使用 **llama.cpp** 直接在 Studio Chat 中进行本地推理。Unsloth Studio 由 llama.cpp 和 Hugging Face 提供支持。 ### 将文件添加为上下文 Studio Chat 直接支持对话中的多模态输入。你可以附加文档、图片或音频作为提示词的额外上下文。

这使得测试模型如何处理 PDF、截图或参考资料等真实世界输入变得非常容易。文件会在本地处理，并作为模型的上下文包含进去。 ### **删除模型文件** 你可以通过模型搜索中的垃圾桶图标删除旧模型文件，或者从默认的 Hugging Face 缓存目录中移除相应的缓存模型文件夹。默认情况下，Hugging Face 使用 `~/.cache/huggingface/hub/` 在 macOS/Linux/WSL 上，以及 `C:\Users\\.cache\huggingface\hub\` 在 Windows 上。 * **MacOS、Linux、WSL：** `~/.cache/huggingface/hub/` * **Windows：** `%USERPROFILE%\.cache\huggingface\hub\` 如果设置了 `HF_HUB_CACHE` 或 `HF_HOME` ，则改用该位置。在 Linux 和 WSL 上， `XDG_CACHE_HOME` 也可以更改默认缓存根目录。 ### **Unsloth 没有检测到或没有使用我的 GPU** 如果模型没有使用你的 GPU，尤其是在 Docker 中，请尝试：手动拉取最新镜像： ```bash docker pull unsloth/unsloth:latest ``` * 使用 GPU 访问启动容器： * `docker run`: `--gpus all` * Docker Compose： `capabilities: [gpu]` * 在 Linux 上，请确保已安装 NVIDIA Container Toolkit。 * 在 Windows 上： * 检查 `nvcc --version` 是否与 `nvidia-smi` * 中显示的 CUDA 版本一致。 [请参考：](https://docs.docker.com/desktop/features/gpu/) --- # Agent Instructions: Querying This Documentation If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question. Perform an HTTP GET request on the current page URL with the `ask` query parameter: ``` GET https://unsloth.ai/docs/zh/xin/studio/chat.md?ask= ``` The question should be specific, self-contained, and written in natural language. The response will contain a direct answer to the question and relevant excerpts and sources from the documentation. Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.