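# How to Use Unsloth as an API Endpoint

You can run **local LLMs** with tools such as [Claude Code](/docs/zh/ji-chu/claude-code.md) and [Codex](/docs/zh/ji-chu/codex.md) by connecting them to Unsloth's **OpenAI-compatible API endpoint**. This lets you run models such as [Qwen](/docs/zh/mo-xing/qwen3.6.md) and [Gemma](/docs/zh/mo-xing/gemma-4.md) locally for agentic coding. Unsloth also offers useful features such as self-healing **tool calling**, **code execution**, and **web search**.

Unsloth makes it easy to deploy a fast API inference endpoint that provides:

* [**Self-healing tool calling**](/docs/zh/xin/studio/chat.md#auto-healing-tool-calling), which helps reduce broken or malformed tool calls by 50%.
* [**Code execution**](/docs/zh/xin/studio/chat.md#code-execution) support for running Bash and Python, for more accurate code output.
* **Advanced** [**web search**](/docs/zh/xin/studio/chat.md#advanced-web-search) that visits and actually reads web pages to gather in-depth information.
* [**Automatic inference** settings](/docs/zh/xin/studio/chat.md#auto-parameter-tuning) for GGUF models (temp, top-k, and so on).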

{% columns %}
{% column %}
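Models loaded in Unsloth (including GGUF) are exposed as an **authenticated API** through `llama-server`. For security, a long API key is generated, similar to how OpenAI issues keys.

Your **local model** can then be used directly in your preferred AI agent, SDK, or chat client. Unsloth supports two interface styles on the same port. Both support streaming, tool calling (OpenAI `tools` / Anthropic `tools`), and vision input: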
{% endcolumn %}

{% column %}

<figure><img src="/files/c633f6e5a61522d2d7fa76b1c6c3376b956d223d" alt=""><figcaption></figcaption></figure>
{% endcolumn %}
{% endcolumns %}

* **Anthropic-compatible `/v1/messages`**, for Claude Code, OpenClaw, the Anthropic SDK, and any client that expects the Messages API.
* **OpenAI-compatible `/v1/chat/completions`** and **`/v1/responses`**, for the OpenAI SDK, OpenCode, Cursor, Continue, Cline, Open WebUI, SillyTavern, and any OpenAI-compatible tool.

### ⚡ Quick Start

1. **Install or update** [**Unsloth Studio**](/docs/zh/xin/studio.md)**.** Then launch Unsloth.
2. **Load a model.** Click **New Chat**, select or search for a model (GGUF), and wait for it to finish loading.
3. **Create an API key.** Click the **Unsloth** avatar in the bottom-left corner → **Settings** → **API** → enter a key name → **Create**. Copy the `sk-unsloth-…` value that appears. Unsloth shows it only once.
4. **Point your client at Unsloth.** Use `http://localhost:PORT` as the base URL and authenticate with your `sk-unsloth-…` key. Then skip to the guide for your tool below.

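### 🔑 Create an API Key

1. Open the sidebar and click your **Unsloth** avatar in the bottom-left corner.
2. Go to **Settings** → **API** (the globe :globe\_with\_meridians: icon).
3. Enter a friendly name (for example `claude-code-macbook`). Optionally set an expiry.
4. Click **Create**.
5. **Copy the key.** Unsloth stores only a hash, so you will not be able to view it again.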

<div data-with-frame="true"><figure><img src="/files/3432517ab3e4dadf17eb7ea5f319b43cc5902dd2" alt="" width="375"><figcaption></figcaption></figure></div>

All keys start with the `sk-unsloth-` prefix. You can revoke a key at any time from the same page. Requests made with a revoked key fail with `401 Unauthorized`.

{% hint style="warning" %}
Treat your API key like a password. Anyone who has the key and network access to your Unsloth instance can send requests to your loaded model.
{% endhint %}
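Because every key shares the `sk-unsloth-` prefix and is sent as a bearer token, a client can catch a mis-pasted key before any request is made. The following is a minimal sketch, not part of Unsloth: the helper name `auth_headers` is hypothetical and the key shown is a placeholder.

```python
from typing import Dict


def auth_headers(api_key: str) -> Dict[str, str]:
    """Build the Authorization header for a request to Studio."""
    # The prefix check mirrors the documented key format. It catches a
    # pasted OpenAI or Anthropic key early, before the server answers 401.
    if not api_key.startswith("sk-unsloth-"):
        raise ValueError("expected a key starting with 'sk-unsloth-'")
    return {"Authorization": f"Bearer {api_key}"}


# Placeholder value for illustration only, not a real key.
headers = auth_headers("sk-unsloth-example")
print(headers["Authorization"])
```

In practice you would read the key from an environment variable or a secrets manager rather than hard-coding it.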

### ⏳ Loading a Model

{% stepper %}
{% step %}

#### Select a model

Before using the API, load a model from the **Select model** dropdown in the top-left corner of the Chat page.

<figure><img src="/files/e7c1a267b0c4f58689066eddfc57a2c2211f1e13" alt=""><figcaption></figcaption></figure>

This guide uses:

`unsloth/gemma-4-26B-A4B-it-GGUF` with the recommended `UD-Q4_K_XL` quantization.
{% endstep %}

{% step %}

#### Test the model

Before connecting a client, send a short message:

<div data-with-frame="true"><figure><img src="/files/d1ef1d199c3aee2da86cc3da46a133801d2683ad" alt="" width="563"><figcaption></figcaption></figure></div>

{% hint style="info" %}
This confirms that the model loaded correctly and is ready to respond.
{% endhint %}
{% endstep %}

{% step %}

#### **Unsloth API key**

In Studio, open **Settings → API** to view or create your API keys.

<figure><img src="/files/7d43e8763d1ae72290485151822b1ea2e4fce42a" alt=""><figcaption></figcaption></figure>

Treat your API key like a password, and avoid exposing it in screenshots or code repositories.
{% endstep %}
{% endstepper %}

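### <i class="fa-terminal">:terminal:</i> Unsloth Run Command

1. **Install or update Unsloth Studio.** Older versions do not expose the external API. See [Installation](/docs/zh/xin/studio/install.md).
2. **Load a GGUF model.** Use the run command to load a GGUF model. This also starts the UI on the default port. The endpoint URL and API key are printed to the console for use in your client of choice.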

   ```bash
   unsloth run --model unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL
   ```

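#### Load a model from the CLI

You can use the `unsloth` CLI to load a model and create an API key for you automatically. Once the model has loaded, the endpoint URL and API key are printed to your console. Copy them into your client of choice and you are ready to go.

#### Before you start

Make sure you are on a recent version of Unsloth Studio, because earlier versions do not expose the external API. See [Installation](/docs/zh/xin/studio/install.md).

#### Shortcut

Open a terminal and load a GGUF model: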

```bash
unsloth run --model unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL
```

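This starts the server on the default port, loads the UI, and prints your endpoint URL and API key.

#### How model names work

You can specify a model in a few different ways. Pick whichever you find most convenient: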

```bash
# Combined: repository and quantization variant in one string (recommended, simplest)
unsloth run --model unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL

# Separate: repository and variant as two flags (older form, still works)
unsloth run --model unsloth/gemma-4-26B-A4B-it-GGUF --gguf-variant UD-Q4_K_XL

# With -hf / --hf-repo (same spelling as llama.cpp, handy if you have used it there)
unsloth run -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL
```
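The combined and separate forms above carry the same two pieces of information. As a rough illustration of how they relate, the sketch below splits a combined `repo:variant` string and rebuilds the two-flag command line. It is an assumption about the naming scheme only, not the CLI's actual parser, and the helper names are hypothetical.

```python
from typing import List, Optional, Tuple


def split_model_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'repo:variant' into (repo, variant); variant is None if absent."""
    repo, sep, variant = spec.partition(":")
    return repo, (variant if sep and variant else None)


def as_separate_flags(spec: str) -> List[str]:
    """Rewrite a combined spec as the two-flag form of `unsloth run`."""
    repo, variant = split_model_spec(spec)
    args = ["unsloth", "run", "--model", repo]
    if variant is not None:
        args += ["--gguf-variant", variant]
    return args


print(" ".join(as_separate_flags("unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL")))
```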

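### Tune run parameters (optional)

A basic load does not need these, but `unsloth run` supports many llama-server runtime flags for customizing performance, memory usage, context length, generation behavior, networking, and tool access.

Extra flags are forwarded directly to the underlying inference server, and the values you provide override Studio's defaults.

#### Adjust generation behavior

Sampling settings control how creative, focused, or deterministic the model is during generation.

```bash
# Reduce randomness and improve reproducibility
unsloth run \
  --model unsloth/Qwen3-1.7B-GGUF \
  --temp 0.6 \
  --seed 42
```

Lower temperature values usually produce more stable output, while the top-p, top-k, min-p, and repeat-penalty settings further control token selection and repetition.

```bash
# Adjust token selection and repetition behavior
unsloth run \
  --model unsloth/Qwen3-1.7B-GGUF \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.05 \
  --repeat-penalty 1.1
```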

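#### Increase context length and CPU threads

This is useful when you work with large projects, long conversations, or agent workflows that need more memory.

```bash
# Use a larger context window and more CPU threads
unsloth run \
  --model unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL \
  -c 131072 \
  --threads 32
```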

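#### Expose the API on your local network

By default, Unsloth is reachable only from your own machine. You can make it reachable from other devices on your network by binding to `0.0.0.0`.

```bash
# Allow LAN devices to connect
unsloth run \
  --model unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL \
  -H 0.0.0.0 \
  -p 8888
```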

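#### Control reasoning behavior

Some reasoning-capable models support extra flags that control thinking and reasoning behavior.

```bash
# Disable reasoning / thinking output
unsloth run \
  --model unsloth/Qwen3-1.7B-GGUF \
  --reasoning off
```

```bash
# Enable reasoning mode
unsloth run \
  --model unsloth/Qwen3-1.7B-GGUF \
  --reasoning on
```

Reasoning support depends on the model and on backend capabilities.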

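#### Enable or disable server-side tools

Control whether tools such as web search and code execution are exposed by the inference server.

```bash
# Explicitly enable tools
unsloth run \
  --model unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL \
  --enable-tools
```

```bash
# Explicitly disable tools
unsloth run \
  --model unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL \
  --disable-tools
```

Unsloth supports most llama-server runtime flags, including context size, GPU layers, threads, sampling, networking, and tool configuration.

See the [llama-server](https://github.com/ggml-org/llama.cpp/tree/master/tools/server) documentation for the full list of supported runtime flags.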

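#### **Server-side tool policy**

`unsloth run` controls whether the inference server exposes server-side tools (web search, code execution, and so on). The default depends on the bind address:

* **`127.0.0.1` (localhost)**: tools are **on** by default. Only your machine can reach the server.
* **`0.0.0.0` or any non-loopback address**: tools are **off** by default. On a network-reachable server, a leaked API key means arbitrary code execution on the host.

**Flags:**

* `--enable-tools` / `--disable-tools`: force tools on or off. On `0.0.0.0`, `--enable-tools` shows a y/N safety prompt.
* `--yes` / `-y`: skip the prompt (for automation).

The resolved policy is a process-level hard override: an individual request cannot bypass it by setting `enable_tools=true` in the request body.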
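The default described above can be modeled in a few lines. This is only a sketch of the documented behavior, not Unsloth's implementation: the function names are hypothetical, the y/N prompt is not modeled, and treating unknown hostnames as network-reachable is an assumption made here for safety.

```python
import ipaddress
from typing import Optional


def is_loopback(host: str) -> bool:
    """Return True when the bind address is only reachable from this machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Assumption: an unrecognized hostname is treated as network-reachable.
        return False


def tools_enabled(host: str, override: Optional[bool] = None) -> bool:
    """Resolve the tool policy: an explicit flag wins, else loopback decides."""
    if override is not None:  # models --enable-tools / --disable-tools
        return override
    return is_loopback(host)


print(tools_enabled("127.0.0.1"))               # loopback default
print(tools_enabled("0.0.0.0"))                 # network-reachable default
print(tools_enabled("0.0.0.0", override=True))  # forced on by flag
```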

<div data-with-frame="true"><figure><img src="/files/cc987c669523513c4b352410069e817c5d25aeff" alt=""><figcaption></figcaption></figure></div>

### 🌐 **Endpoints**

Studio exposes these endpoints on the port it was started on (usually `http://localhost:8000` or `http://localhost:8888`):

| Endpoint                    | Compatible with             | Usable from                                                              |
| --------------------------- | --------------------------- | ------------------------------------------------------------------------ |
| `POST /v1/messages`         | Anthropic Messages API      | Claude Code, Anthropic SDK, OpenClaw, and any Anthropic-capable client   |
| `POST /v1/chat/completions` | OpenAI Chat Completions API | OpenAI SDK, opencode, Cursor, Continue, Cline, Open WebUI, curl, etc.    |
| `GET /v1/models`            | OpenAI model list           | Lists the models currently loaded in Unsloth                             |

Authenticate every request with an `Authorization: Bearer sk-unsloth-…` header.

{% hint style="info" %}
You do not need to run separate servers for the two formats. Studio handles both on the same port.
{% endhint %}
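To make the shared port concrete, here is a minimal sketch that prepares one request per format using only the Python standard library. The base URL, key, and model ID are placeholders you would replace with your own values, and the `max_tokens` field follows the Anthropic Messages API convention. Nothing is sent until you call `send`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8888"    # assumption: Studio was started on port 8888
API_KEY = "sk-unsloth-xxxxxxxxxxxx"   # placeholder key
MODEL_ID = "gemma-4-26B-A4B-it-GGUF"  # assumption: take the real ID from GET /v1/models


def build_request(path: str, body: dict) -> urllib.request.Request:
    """Prepare an authenticated JSON POST to a Studio endpoint."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request; needs a running Studio with a loaded model."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


messages = [{"role": "user", "content": "Say hello in one sentence."}]

# Same host and port, two wire formats.
openai_request = build_request(
    "/v1/chat/completions", {"model": MODEL_ID, "messages": messages}
)
anthropic_request = build_request(
    "/v1/messages", {"model": MODEL_ID, "max_tokens": 256, "messages": messages}
)
```

Calling `send(openai_request)` or `send(anthropic_request)` against a running instance returns the parsed JSON response.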

### 🖇️ Connect your client

Unsloth lets you run local LLMs through most frameworks, including [Claude Code](/docs/zh/ji-chu/claude-code.md), [Codex](/docs/zh/ji-chu/codex.md), [OpenClaw](/docs/zh/ji-cheng/openclaw.md), [OpenCode](/docs/zh/ji-cheng/opencode.md), and more. Click your tool below to see its guide:

{% columns %}
{% column width="50%" %}
{% content-ref url="/pages/1a707991086189a8e5cd8374f3ce1b81915bc159" %}
[Claude Code](/docs/zh/ji-chu/claude-code.md)
{% endcontent-ref %}

{% content-ref url="/pages/b71ddea7924324c058a771e5e831c3cb6fc75b18" %}
[OpenAI Codex](/docs/zh/ji-chu/codex.md)
{% endcontent-ref %}

{% content-ref url="/pages/4636d45e7e20328c61211d43c235257fdd7ebc1d" %}
[Curl & HTTP](/docs/zh/ji-cheng/jiang-curl-he-http-lian-jie-dao-unsloth.md)
{% endcontent-ref %}
{% endcolumn %}

{% column width="50%" %}
{% content-ref url="/pages/f1eb04d9bdae8f6dbb3d9ed5d64e060dac5a68ff" %}
[OpenClaw](/docs/zh/ji-cheng/openclaw.md)
{% endcontent-ref %}

{% content-ref url="/pages/124bfded8d8412a9fbc1614fa7467985c0af22da" %}
[OpenCode](/docs/zh/ji-cheng/opencode.md)
{% endcontent-ref %}

{% content-ref url="/pages/010e01be868ae39c13b48ffdf9774e645c6a347f" %}
[Python SDK](/docs/zh/ji-cheng/jiang-python-sdk-lian-jie-dao-unsloth.md)
{% endcontent-ref %}
{% endcolumn %}
{% endcolumns %}

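### 🧰 Tool calling

Both endpoints support function / tool calling in their native formats, plus an Unsloth-specific shorthand for Studio's built-in tools.

**OpenAI-style tools:** Send `tools` and `tool_choice` to `/v1/chat/completions`, just as you would with OpenAI. Claude Code (via `/v1/messages`), opencode, Cursor, Continue, and Cline all work out of the box.

**Anthropic-style tools:** Send `tools` (with `input_schema`) and `tool_choice` to `/v1/messages`, just as you would with Claude.

**Studio server-side tools:** Studio can run Python, web search, and bash *server-side* and stream the results back as `tool_result` events. Enable this by adding these extra fields to a request on either endpoint: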

```json
{
  "messages": [{"role": "user", "content": "What is 123 * 456? Use Python."}],
  "stream": true,
  "enable_tools": true,
  "enabled_tools": ["python", "web_search", "terminal"],
  "session_id": "my-session"
}
```

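The model sees each tool's output on the next turn. Schemas, streaming events, and chained calls are covered in more depth elsewhere in the Studio documentation.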
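If you build requests in code, the extra fields can be layered onto an ordinary request body. The sketch below only constructs the payload. The helper name `with_server_tools` is hypothetical, while the field names and tool identifiers are the ones shown in the JSON example above.

```python
import json
from typing import List


def with_server_tools(body: dict, tools: List[str], session_id: str) -> dict:
    """Return a copy of `body` with Studio's server-side tool fields added."""
    extended = dict(body)  # shallow copy so the caller's dict is untouched
    extended.update(
        enable_tools=True,
        enabled_tools=list(tools),
        session_id=session_id,
    )
    return extended


base_body = {
    "messages": [{"role": "user", "content": "What is 123 * 456? Use Python."}],
    "stream": True,
}
tool_body = with_server_tools(
    base_body, ["python", "web_search", "terminal"], "my-session"
)
print(json.dumps(tool_body, indent=2))
```

Keeping the base body separate makes it easy to send the same conversation with and without server-side tools.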

{% hint style="info" %}
If you use the Anthropic `/v1/messages` endpoint, `tool_choice` maps cleanly: Anthropic `auto` → OpenAI `auto`, Anthropic `any` → OpenAI `required`, Anthropic `{type: "tool", name: "x"}` → OpenAI `{type: "function", function: {name: "x"}}`, Anthropic `none` → OpenAI `none`.
{% endhint %}
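The mapping in the hint can be written out as a small function. This is an illustrative sketch of the table, not Studio's code: the function name is hypothetical, and it assumes the Anthropic side arrives as an object with a `type` field, as in the Anthropic Messages API.

```python
from typing import Any, Dict, Union


def anthropic_to_openai_tool_choice(
    choice: Dict[str, Any],
) -> Union[str, Dict[str, Any]]:
    """Translate an Anthropic tool_choice object into its OpenAI equivalent."""
    kind = choice.get("type")
    simple = {"auto": "auto", "any": "required", "none": "none"}
    if kind in simple:
        return simple[kind]
    if kind == "tool":
        # A named tool becomes an OpenAI function selector.
        return {"type": "function", "function": {"name": choice["name"]}}
    raise ValueError(f"unsupported tool_choice type: {kind!r}")


print(anthropic_to_openai_tool_choice({"type": "any"}))
print(anthropic_to_openai_tool_choice({"type": "tool", "name": "x"}))
```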

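### ❔ Troubleshooting

**`401 Unauthorized`**: Either the `Authorization` header is missing or the key is wrong. The key must be passed as `Authorization: Bearer sk-unsloth-…`. If you have lost your key, create a new one from **Settings → API**. Studio does not show old keys after creation.

**`Lost connection to model server`**: Studio cannot reach the underlying llama.cpp server. Usually the model finished loading but then crashed, or the model tab was closed in Studio. Reload the model from **New Chat** and try again.

**Claude Code shows the default Anthropic model instead of my local model**: Check that all three environment variables are set in the **same** shell that runs `claude`: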

```bash
echo $ANTHROPIC_BASE_URL
echo $ANTHROPIC_AUTH_TOKEN
echo $ANTHROPIC_MODEL
```

Then run `/model` in Claude Code to confirm. In Windows PowerShell, use `$env:ANTHROPIC_BASE_URL` and so on.
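The same check can be scripted so it works on any platform. This is a small sketch with a hypothetical helper name. Run it from the shell you use to launch `claude`, since environment variables are per-process.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = ("ANTHROPIC_BASE_URL", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_MODEL")


def missing_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_vars(os.environ)
print("missing: " + ", ".join(missing) if missing else "all three variables are set")
```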

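**`stream: true` returns a single JSON chunk instead of SSE**: Make sure you are requesting the correct path (`/v1/messages` or `/v1/chat/completions`) and that your HTTP client actually consumes the response as a stream rather than buffering it.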
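Consuming a stream means handling the response line by line as it arrives. The sketch below parses `data:` lines from an SSE stream. It assumes the OpenAI Chat Completions streaming shape (JSON chunks with a `delta`, ending in a `[DONE]` sentinel), and the sample lines are made up for illustration rather than captured from Studio.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each SSE `data:` line until `[DONE]`."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and `event:` lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)


# Illustrative sample in the OpenAI streaming shape (an assumption).
sample_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in iter_sse_json(sample_lines)
)
print(text)  # prints "Hello"
```

With a real response, pass an iterator that yields decoded lines as they arrive, so that nothing is buffered before parsing starts.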

**I can't find the model name to add to opencode (or OpenClaw / any other client)**: Ask Studio directly. `GET /v1/models` returns the exact model ID to enter in your client's "Model ID" field:

```bash
curl http://localhost:8888/v1/models \
  -H "Authorization: Bearer sk-unsloth-xxxxxxxxxxxx"
```

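You will receive a JSON payload of the form `{"data": [{"id": "gemma-4-26B-A4B-it-GGUF", ...}]}`. Copy the `id` value that appears; this is what opencode's **Model ID** field (left column) and OpenClaw's `models[].id` expect. The display name on the right can be whatever you want users to see.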
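If you want to pull the IDs out programmatically, the response can be reduced to a list. This sketch works on an already-parsed payload, and the sample mirrors the shape shown above. The helper name is hypothetical.

```python
import json
from typing import List


def model_ids(payload: dict) -> List[str]:
    """Extract every model ID from a /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


sample_payload = json.loads('{"data": [{"id": "gemma-4-26B-A4B-it-GGUF"}]}')
print(model_ids(sample_payload))
```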

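**Tool calls are not executed**: The model must support tool calling for client-side tools (`tools` / `tool_choice`). For Studio's server-side tools, set `enable_tools: true` **and** list the tools you want in `enabled_tools` (for example `["python", "web_search"]`).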

