# How to use Unsloth as an API endpoint

You can use **local LLMs** with tools such as [Claude Code](/docs/zh/ji-chu/claude-code.md) and [Codex](/docs/zh/ji-chu/codex.md) by connecting them to Unsloth's **OpenAI-compatible API endpoint**. This lets you run models such as [Qwen](/docs/zh/mo-xing/qwen3.6.md) and [Gemma](/docs/zh/mo-xing/gemma-4.md) locally for agentic coding. Unsloth also provides self-healing **tool calling**, **code execution**, and **web search**.

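Unsloth makes it easy to deploy a fast API inference endpoint that provides:

* [**Self-healing tool calling**](/docs/zh/xin/studio/chat.md#auto-healing-tool-calling), which helps reduce broken or malformed tool calls by 50%.
* [**Code execution**](/docs/zh/xin/studio/chat.md#code-execution) support, which runs Bash and Python for more accurate code output.
* **Advanced** [**web search**](/docs/zh/xin/studio/chat.md#advanced-web-search) that visits and actually reads web pages to gather in-depth information.
* [**Automatic inference** settings](/docs/zh/xin/studio/chat.md#auto-parameter-tuning) for GGUF models (temp, top-k, and so on).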

{% columns %}
{% column %}
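Models loaded in Unsloth (including GGUF) are served through an **authenticated API** via `llama-server`. For security, Unsloth generates a long API key, similar to the ones OpenAI issues.

Your **local model** can then be used directly in your preferred AI agent, SDK, or chat client. Unsloth supports two protocols on the same port. Both support streaming, tool calling (OpenAI `tools` / Anthropic `tools`), and vision input: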
{% endcolumn %}

{% column %}

<figure><img src="/files/c633f6e5a61522d2d7fa76b1c6c3376b956d223d" alt=""><figcaption></figcaption></figure>
{% endcolumn %}
{% endcolumns %}

* **Anthropic-compatible `/v1/messages`** for Claude Code, OpenClaw, the Anthropic SDK, and any client that expects the Messages API.
* **OpenAI-compatible `/v1/chat/completions`** and **`/v1/responses`** for the OpenAI SDK, OpenCode, Cursor, Continue, Cline, Open WebUI, SillyTavern, and any OpenAI-compatible tool.

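### ⚡ Quick start

1. **Install or update** [**Unsloth Studio**](/docs/zh/xin/studio.md)**.** Then launch Unsloth.
2. **Load a model.** Click **New Chat**, select or search for a model (GGUF), and wait for it to finish loading.
3. **Create an API key.** Click the **Unsloth** avatar in the bottom left → **Settings** → **API** → enter a key name → **Create**. Copy the `sk-unsloth-…` value that appears. Unsloth shows it only once.
4. **Point your client at Unsloth.** Use `http://localhost:PORT` as the base URL and your `sk-unsloth-…` key for authentication. Then skip ahead to the instructions for your tool below.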
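As a minimal sketch of step 4, the Python snippet below builds an authenticated request for `/v1/chat/completions` using only the standard library. The helper names `build_chat_request` and `send` are hypothetical (they are not part of Unsloth), and the port, key, and model ID you pass in are placeholders for your own values.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # the sk-unsloth-… key from Settings → API
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (needs a running Unsloth server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a model loaded, something like `send(build_chat_request("http://localhost:8888", "sk-unsloth-…", "<model id>", "Hello"))` should return an OpenAI-style response whose text sits under `choices[0].message.content`.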

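### 🔑 Create an API key

1. Open the sidebar and click the **Unsloth** avatar in the bottom left.
2. Go to **Settings** → **API** (the globe :globe\_with\_meridians: icon).
3. Enter a friendly name (for example `claude-code-macbook`). Optionally set an expiry time.
4. Click **Create**.
5. **Copy the key.** Unsloth stores only a hash, so you cannot view the key again later.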

<div data-with-frame="true"><figure><img src="/files/7242dbe4fc4c5504066ca1aac435bcfd34e8bb74" alt="" width="375"><figcaption></figcaption></figure></div>

All keys start with the `sk-unsloth-` prefix. You can revoke a key at any time from the same page. Requests that use a revoked key return `401 Unauthorized`.

{% hint style="warning" %}
Treat your API key like a password. Anyone who has the key and network access to your Unsloth instance can send requests to your loaded model.
{% endhint %}
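One way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named `UNSLOTH_API_KEY`, which is a hypothetical name chosen for this example and not something Unsloth defines. It also checks for the `sk-unsloth-` prefix described above.

```python
import os

KEY_PREFIX = "sk-unsloth-"  # documented prefix of every Unsloth API key


def auth_headers(env_var: str = "UNSLOTH_API_KEY") -> dict:
    """Read the API key from the environment and return the Authorization header.

    The variable name is an assumption made for this example.
    """
    key = os.environ.get(env_var, "")
    if not key.startswith(KEY_PREFIX):
        raise ValueError(f"{env_var} is unset or does not start with {KEY_PREFIX!r}")
    return {"Authorization": f"Bearer {key}"}
```

Failing early on a missing or malformed key gives a clearer error than a `401 Unauthorized` from the server.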

### ⏳ Model loading

{% stepper %}
{% step %}

#### Select a model

Before using the API, load a model from the **Select model** dropdown in the top left of the Chat page.

<figure><img src="/files/e7c1a267b0c4f58689066eddfc57a2c2211f1e13" alt=""><figcaption></figcaption></figure>

This guide uses:

`unsloth/gemma-4-26B-A4B-it-GGUF` with the recommended `UD-Q4_K_XL` quantization.
{% endstep %}

{% step %}

#### Test the model

Before connecting a client, send a short message:

<div data-with-frame="true"><figure><img src="/files/d1ef1d199c3aee2da86cc3da46a133801d2683ad" alt="" width="563"><figcaption></figcaption></figure></div>

{% hint style="info" %}
This confirms that the model has loaded correctly and is ready to respond.
{% endhint %}
{% endstep %}

{% step %}

#### **Unsloth API key**

In Studio, open **Settings → API** to view or create your API keys.

<figure><img src="/files/7d43e8763d1ae72290485151822b1ea2e4fce42a" alt=""><figcaption></figcaption></figure>

Treat your API key like a password, and avoid exposing it in screenshots or code repositories.
{% endstep %}
{% endstepper %}

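### <i class="fa-terminal">:terminal:</i> Unsloth run command

1. **Install or update Unsloth Studio.** Earlier versions do not provide the external API. See the installation instructions.
2. **Load a GGUF model.** Use the run command to load a GGUF model. This also starts the UI on the default port. The endpoint URL and API key are printed to the console so you can use them right away in your client of choice.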

   ```bash
   unsloth run --model unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL
   ```

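#### Load a model from the CLI

You can use the `unsloth` CLI tool to load a model and have an API key created for you automatically. Once the model has loaded, the endpoint URL and API key are printed to the console. Copy them into your client of choice to get started.

#### Before you begin

Make sure you are on a recent version of Unsloth Studio, because earlier versions do not provide the external API. See [Installation](/docs/zh/xin/studio/install.md).

#### The quick way

Open a terminal and load a GGUF model: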

```bash
unsloth run --model unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL
```

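This starts the server on the default port, loads the UI, and prints your endpoint URL and API key.

#### How model names work

You can specify a model in a few different ways. Pick whichever you find easiest: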

```bash
# Combined: repository and quantization variant in one string (recommended, most concise)
unsloth run --model unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL

# Separate: repository and variant as two flags (older style, still works)
unsloth run --model unsloth/gemma-4-26B-A4B-it-GGUF --gguf-variant UD-Q4_K_XL

# With -hf / --hf-repo (matches llama.cpp syntax, handy if you are coming from there)
unsloth run -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL
```

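#### Tune run parameters (optional)

A basic load does not need any of these, but if you want more control you can pass extra arguments, which are forwarded directly to the underlying `llama-server`. Your settings override Studio's defaults.

Here are a few examples: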

```bash
# Set a larger context window and pin the thread count
unsloth run --model unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL -c 131072 --threads 32

# Adjust sampling
unsloth run --model unsloth/Qwen3-1.7B-GGUF --top-k 20 --seed 42

# Use a custom chat template
unsloth run --model unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL \
  --chat-template-file /path/to/template.jinja
```

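Most `llama-server` flags work, covering context size, GPU layers, sampling parameters, KV cache type, reasoning settings, and more. See the [llama-server documentation](https://github.com/ggml-org/llama.cpp/tree/master/tools/server) for the full list.

#### **Server-side tool policy**

`unsloth run` controls whether the inference server exposes server-side tools (web search, code execution, and so on). The default depends on the bind address:

* **`127.0.0.1` (localhost)**: tools are **on by default**. Only your machine can reach the server.
* **`0.0.0.0` or any non-loopback address**: tools are **off by default**. On a network-reachable server, a leaked API key could lead to arbitrary code execution on the host.

**Flags:**

* `--enable-tools` / `--disable-tools`: force tools on or off. On `0.0.0.0`, `--enable-tools` shows a y/N safety prompt.
* `--yes` / `-y`: skip the prompt (for automation).

The resolved policy is a hard, process-level override: a single request cannot bypass it by sending `enable_tools=true` in the request body.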
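The default rule above can be modeled in a few lines of Python. This is only an illustration of the documented behavior, not Unsloth's actual implementation. The function names are hypothetical, and the y/N prompt is not modeled.

```python
import ipaddress
from typing import Optional


def is_loopback(host: str) -> bool:
    """Return True when the bind address is only reachable from this machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # unparseable hostnames are treated as network-reachable


def tools_enabled(host: str, flag: Optional[bool] = None) -> bool:
    """Resolve the tool policy for a bind address.

    flag is True for --enable-tools, False for --disable-tools,
    and None when neither flag is given.
    """
    if flag is not None:
        return flag  # an explicit flag wins over the address-based default
    return is_loopback(host)
```

Under this model, `tools_enabled("127.0.0.1")` is true and `tools_enabled("0.0.0.0")` is false, while an explicit flag overrides either default.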

<div data-with-frame="true"><figure><img src="/files/cc987c669523513c4b352410069e817c5d25aeff" alt=""><figcaption></figcaption></figure></div>

### 🌐 **Endpoints**

Studio exposes these endpoints on the port it was started on (usually `http://localhost:8000` or `http://localhost:8888`):

| Endpoint                    | Compatible with             | Used from                                                               |
| --------------------------- | --------------------------- | ----------------------------------------------------------------------- |
| `POST /v1/messages`         | Anthropic Messages API      | Claude Code, Anthropic SDK, OpenClaw, and any Anthropic-capable tool    |
| `POST /v1/chat/completions` | OpenAI Chat Completions API | OpenAI SDK, opencode, Cursor, Continue, Cline, Open WebUI, curl, etc.   |
| `GET /v1/models`            | OpenAI model list           | Lists the models currently loaded in Unsloth                            |

Authenticate with the `Authorization: Bearer sk-unsloth-…` request header.

{% hint style="info" %}
You do not need to run separate servers for the two formats. Studio handles both on the same port.
{% endhint %}
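To illustrate the Anthropic-compatible side, here is a hedged sketch that builds a `/v1/messages` request and extracts the text from a reply. It assumes the standard Anthropic Messages shapes (a required `max_tokens` field and a `content` list of typed blocks). The helper names are hypothetical, and whether Studio expects extra headers such as `anthropic-version` is not stated here, so none are sent.

```python
import json
import urllib.request


def build_messages_request(base_url: str, api_key: str, model: str, prompt: str,
                           max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) an Anthropic-style Messages API request."""
    body = {
        "model": model,
        "max_tokens": max_tokens,  # required by the Anthropic Messages format
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Join the text blocks of an Anthropic-style response body."""
    return "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )
```

The base URL and key are the same ones used for the OpenAI-compatible path. Only the route and the body shape differ.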

### 🖇️ Connect your client

Unsloth lets you run local LLMs through most frameworks, including [Claude Code](/docs/zh/ji-chu/claude-code.md), [Codex](/docs/zh/ji-chu/codex.md), [OpenClaw](/docs/zh/ji-cheng/openclaw.md), [OpenCode](/docs/zh/ji-cheng/opencode.md), and more. Click a tool below to open its guide:

{% columns %}
{% column width="50%" %}
{% content-ref url="/pages/1a707991086189a8e5cd8374f3ce1b81915bc159" %}
[Claude Code](/docs/zh/ji-chu/claude-code.md)
{% endcontent-ref %}

{% content-ref url="/pages/b71ddea7924324c058a771e5e831c3cb6fc75b18" %}
[OpenAI Codex](/docs/zh/ji-chu/codex.md)
{% endcontent-ref %}

{% content-ref url="/pages/4636d45e7e20328c61211d43c235257fdd7ebc1d" %}
[Curl & HTTP](/docs/zh/ji-cheng/jiang-curl-he-http-lian-jie-dao-unsloth.md)
{% endcontent-ref %}
{% endcolumn %}

{% column width="50%" %}
{% content-ref url="/pages/f1eb04d9bdae8f6dbb3d9ed5d64e060dac5a68ff" %}
[OpenClaw](/docs/zh/ji-cheng/openclaw.md)
{% endcontent-ref %}

{% content-ref url="/pages/124bfded8d8412a9fbc1614fa7467985c0af22da" %}
[OpenCode](/docs/zh/ji-cheng/opencode.md)
{% endcontent-ref %}

{% content-ref url="/pages/010e01be868ae39c13b48ffdf9774e645c6a347f" %}
[Python SDK](/docs/zh/ji-cheng/jiang-python-sdk-lian-jie-dao-unsloth.md)
{% endcontent-ref %}
{% endcolumn %}
{% endcolumns %}

### 🧰 Tool calling

Both endpoints support function / tool calling in their native formats, plus an Unsloth-specific shorthand for Studio's built-in tools.

**OpenAI-style tools:** Send `tools` and `tool_choice` to `/v1/chat/completions`, just as you would with OpenAI. Claude Code (via `/v1/messages`), opencode, Cursor, Continue, and Cline all work out of the box.

**Anthropic-style tools:** Send `tools` (with `input_schema`) and `tool_choice` to `/v1/messages`, just as you would with Claude.
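The two formats describe the same tool with different wrappers. The sketch below shows one JSON Schema wrapped both ways, following the public OpenAI and Anthropic tool formats. The `get_weather` tool and the helper names are made up for illustration.

```python
def openai_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap a JSON Schema as an OpenAI-style tool (for /v1/chat/completions)."""
    return {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": parameters},
    }


def anthropic_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap the same JSON Schema as an Anthropic-style tool (for /v1/messages)."""
    return {"name": name, "description": description, "input_schema": parameters}


# Hypothetical tool used only for this example.
WEATHER_SCHEMA = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}

openai_tools = [openai_tool("get_weather", "Look up the weather for a city.", WEATHER_SCHEMA)]
anthropic_tools = [anthropic_tool("get_weather", "Look up the weather for a city.", WEATHER_SCHEMA)]
```

Either list goes in the `tools` field of the request body for the matching endpoint.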

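**Studio server-side tools:** Studio can execute Python, web search, and bash *server-side* and stream the results back as `tool_result` events. Enable this by adding the following extra fields on either endpoint: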

```json
{
  "messages": [{"role": "user", "content": "What is 123 * 456? Use Python."}],
  "stream": true,
  "enable_tools": true,
  "enabled_tools": ["python", "web_search","terminal"],
  "session_id": "my-session"
}
```

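The model sees each tool's output on the next turn. Schemas, streaming events, and chained calls are covered in more depth elsewhere in the documentation.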
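As a small sketch, the extra fields can be layered onto an existing request body with a helper like the one below. The function name `with_studio_tools` is hypothetical. The field names and tool identifiers are the ones shown in the JSON example above.

```python
def with_studio_tools(payload: dict, tools: list, session_id: str) -> dict:
    """Return a copy of a request body with Studio's server-side tool fields added."""
    extended = dict(payload)  # shallow copy so the caller's dict is left untouched
    extended.update(
        {
            "stream": True,
            "enable_tools": True,
            "enabled_tools": list(tools),
            "session_id": session_id,
        }
    )
    return extended


base = {"messages": [{"role": "user", "content": "What is 123 * 456? Use Python."}]}
request_body = with_studio_tools(base, ["python"], "my-session")
```

Remember that the process-level tool policy still applies: if the server was started with tools disabled, these fields do not turn them back on.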

{% hint style="info" %}
If you use the Anthropic `/v1/messages` endpoint, the `tool_choice` mapping is simple: Anthropic `auto` → OpenAI `auto`, Anthropic `any` → OpenAI `required`, Anthropic `{type: "tool", name: "x"}` → OpenAI `{type: "function", function: {name: "x"}}`, Anthropic `none` → OpenAI `none`.
{% endhint %}
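The mapping in the hint can be written out as a small function. This is a sketch of the documented table, not Studio's internal code. It assumes the Anthropic value arrives as a dict such as `{"type": "auto"}`, and also accepts a bare string for the three simple modes.

```python
def anthropic_to_openai_tool_choice(choice):
    """Translate an Anthropic tool_choice value into its OpenAI equivalent."""
    kind = choice.get("type") if isinstance(choice, dict) else choice
    simple = {"auto": "auto", "any": "required", "none": "none"}
    if kind in simple:
        return simple[kind]
    if kind == "tool" and isinstance(choice, dict) and "name" in choice:
        # Forcing one specific tool: OpenAI nests the name under "function".
        return {"type": "function", "function": {"name": choice["name"]}}
    raise ValueError(f"unsupported tool_choice: {choice!r}")
```

For example, `{"type": "tool", "name": "x"}` becomes `{"type": "function", "function": {"name": "x"}}`.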

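### ❔ Troubleshooting

**`401 Unauthorized`**: Either the `Authorization` header is missing or the key is wrong. The key must be passed as `Authorization: Bearer sk-unsloth-…`. If you have lost the key, create a new one from **Settings → API**. Studio does not show existing keys after creation.

**Cannot connect to the model server**: Studio cannot reach the underlying llama.cpp server. Usually the model crashed after loading, or the model's tab was closed in Studio. Reload the model from **New Chat** and try again.

**Claude Code shows the default Anthropic model instead of my local model**: Check that all three environment variables are exported in the **same** shell where you run `claude`: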

```bash
echo $ANTHROPIC_BASE_URL
echo $ANTHROPIC_AUTH_TOKEN
echo $ANTHROPIC_MODEL
```

Then run `/model` inside Claude Code to confirm. In Windows PowerShell, use `$env:ANTHROPIC_BASE_URL` and so on.
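If you launch `claude` from a script, the same check can be done in Python before starting it. This is a minimal sketch, and `missing_claude_env` is a hypothetical helper name.

```python
import os

REQUIRED_VARS = ("ANTHROPIC_BASE_URL", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_MODEL")


def missing_claude_env(env=None) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

An empty list means all three variables are visible to the current process and to any child process it starts.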

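**`stream: true` returns a single JSON blob instead of SSE**: Make sure you are hitting the right path (`/v1/messages` or `/v1/chat/completions`) and that your HTTP client actually consumes the response as a stream rather than buffering it.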
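Consuming the response as a stream means reading it line by line as it arrives. The sketch below parses server-sent event lines from any iterable of byte lines. It assumes the usual `data: …` framing and the OpenAI-style `data: [DONE]` terminator, and `iter_sse_json` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each SSE `data:` line as it arrives."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and `event:` lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # OpenAI-style end-of-stream marker
        yield json.loads(data)
```

The object returned by `urllib.request.urlopen(...)` can be iterated line by line, so passing it directly to `iter_sse_json` processes chunks incrementally instead of buffering the whole body.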

**I cannot find the model name to add to opencode (or OpenClaw / any other client)**: Ask Studio directly. `GET /v1/models` returns the exact model ID to enter in your client's "Model ID" field:

```bash
curl http://localhost:8888/v1/models \
  -H "Authorization: Bearer sk-unsloth-xxxxxxxxxxxx"
```

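You will receive a JSON payload of the form `{"data": [{"id": "gemma-4-26B-A4B-it-GGUF", ...}]}`. Copy the `id` value shown; that is the string expected by opencode's **Model ID** field (left column) and by OpenClaw's `models[].id`. The display name on the right is whatever you want users to see.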
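If you want to pick the ID out programmatically, a sketch like the following works on the decoded payload. `model_ids` is a hypothetical helper, and the sample payload mirrors the shape shown above.

```python
def model_ids(payload: dict) -> list:
    """Return every `id` listed in a /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


# Sample payload in the documented shape.
sample = {"data": [{"id": "gemma-4-26B-A4B-it-GGUF"}]}
ids = model_ids(sample)
```

Here `ids` holds the single string `gemma-4-26B-A4B-it-GGUF`, which is the value to paste into the client's model field.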

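**Tool calls are not being executed**: For client-side tools (`tools` / `tool_choice`), the model itself needs to support tool calling. For Studio's built-in tools, remember to set `enable_tools: true` **and** list the tools you want in `enabled_tools` (for example `["python", "web_search"]`).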

