# Connecting Curl and HTTP to Unsloth

Unsloth serves three OpenAI/Anthropic-compatible wire formats from the same base URL, on the port it was started on. They all require an `Authorization: Bearer sk-unsloth-…` header and return either a JSON body or a Server-Sent Events stream, depending on whether you set `stream`.

This page is grouped by endpoint (`/v1/chat/completions`, `/v1/messages`, `/v1/responses`, `/v1/models`) and ends with a shared section on Unsloth's built-in **server-side tools**, which work across all chat endpoints.

{% hint style="info" %}
If you are not sure which URL / key / model name to use, read the API overview first. It walks you through starting Unsloth, loading a model, and creating an `sk-unsloth-…` key.
{% endhint %}

### 🔑 Authentication

Every request requires an `Authorization` header:

```
Authorization: Bearer sk-unsloth-xxxxxxxxxxxx
```

To keep the key out of your shell history, export it once and reference the environment variable afterwards:

```bash
export UNSLOTH_STUDIO_AUTH_TOKEN=sk-unsloth-xxxxxxxxxxxx
```

The examples below write the key inline as `sk-unsloth-xxxxxxxxxxxx` for clarity. In real use, replace it with `$UNSLOTH_STUDIO_AUTH_TOKEN`.

### 📋 List loaded models

```bash
curl http://localhost:8888/v1/models \
  -H "Authorization: Bearer sk-unsloth-xxxxxxxxxxxx"
```

Response:

```json
{
  "object": "list",
  "data": [
    {"id": "unsloth/gemma-3-27b-it-GGUF", "object": "model", "owned_by": "local"}
  ]
}
```
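To pull just the model IDs out of that response without extra tooling, a small shell filter is enough. The sketch below is an illustration, not part of Unsloth: `list_model_ids` is a hypothetical helper, it assumes the response shape shown above, and the `grep`/`sed` pattern is a convenience that a real JSON parser such as `jq` would handle more robustly.

```bash
# Hypothetical helper: print each "id" value found in a /v1/models response
# read from stdin. Pattern-based, so it assumes IDs contain no escaped quotes.
list_model_ids() {
  grep -o '"id": *"[^"]*"' | sed -e 's/^"id": *"//' -e 's/"$//'
}

# Sample response copied from the documented shape (no server needed).
sample='{"object": "list", "data": [{"id": "unsloth/gemma-3-27b-it-GGUF", "object": "model", "owned_by": "local"}]}'
printf '%s\n' "$sample" | list_model_ids
```

Against a running server, the same filter could be fed with `curl -s http://localhost:8888/v1/models -H "Authorization: Bearer $UNSLOTH_STUDIO_AUTH_TOKEN" | list_model_ids`.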

Use the `id` field whenever a request needs a `"model"` value (or when a client such as opencode asks for a **model ID**).

### 💬 Chat Completions (`/v1/chat/completions`)

The OpenAI Chat Completions dialect. It has the broadest compatibility and works with the OpenAI SDK, opencode, Cursor, Continue, Cline, Open WebUI, SillyTavern, and most OpenAI-compatible tools.

#### Basic request

```bash
curl http://localhost:8888/v1/chat/completions \
  -H "Authorization: Bearer sk-unsloth-xxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

#### Streaming

Add `"stream": true` and the response switches to Server-Sent Events (`text/event-stream`). To make `curl` flush output as bytes arrive, use `--no-buffer` (`-N`):

```bash
curl -N http://localhost:8888/v1/chat/completions \
  -H "Authorization: Bearer sk-unsloth-xxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-local",
    "messages": [{"role": "user", "content": "Write a haiku about running LLMs locally."}],
    "stream": true
  }'
```

Each line of the response looks like `data: {"choices":[{"delta":{"content":"..."}}]}`, and the stream ends with `data: [DONE]`.
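Since each event arrives as a `data:` line, the JSON chunks can be separated from the SSE framing with a line filter. The following is a minimal sketch rather than a full SSE parser: `sse_payloads` is a hypothetical helper name, and it assumes single-line `data:` events terminated by `data: [DONE]`, as described above.

```bash
# Hypothetical helper: read an SSE stream on stdin and print only the JSON
# payload of each "data:" line, dropping blank lines and the final [DONE].
sse_payloads() {
  tr -d '\r' | sed -n -e '/^data: \[DONE\]$/d' -e 's/^data: //p'
}

# Simulated stream in the documented shape (no server needed).
printf 'data: {"choices":[{"delta":{"content":"Hi"}}]}\n\ndata: [DONE]\n' | sse_payloads
```

With a live server, the filter would sit at the end of the streaming command, for example `curl -N ... | sse_payloads`.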

#### Images (vision)

Attach images to the user message as `image_url` content parts. The URL can be HTTPS or a base64 `data:` URI:

```bash
# Embed a local file as base64 (truncated for brevity)
IMG=$(base64 -w 0 test.jpg)
cat > /tmp/request.json <
```
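For the base64 route, the image bytes have to become a single-line `data:` URI before they go into the `image_url` part. The sketch below shows one portable way to build it; `to_data_uri` is a hypothetical helper, the MIME type is passed in by the caller rather than detected, and piping through `tr -d '\n'` is used instead of `-w 0` so the same line works with both GNU and BSD `base64`.

```bash
# Hypothetical helper: print a data: URI for a file.
#   $1 = path to the file, $2 = MIME type (for example image/jpeg)
to_data_uri() {
  printf 'data:%s;base64,%s\n' "$2" "$(base64 < "$1" | tr -d '\n')"
}

# Demonstration on a tiny stand-in file instead of a real image.
demo_file=$(mktemp)
printf 'hi' > "$demo_file"
to_data_uri "$demo_file" image/jpeg
rm -f "$demo_file"
```

The printed string is what would go into the `url` field of an `image_url` content part.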

#### Function calling (OpenAI tools)

Pass OpenAI-style `tools` and, optionally, `tool_choice`. Your client executes each tool call and returns the result on the next turn.

```bash
curl http://localhost:8888/v1/chat/completions \
  -H "Authorization: Bearer sk-unsloth-xxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [{"role": "user", "content": "What is the weather like in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }],
    "tool_choice": "required"
  }' | jq '.choices[0].message, .usage'
```

### 📨 Anthropic Messages (`/v1/messages`)

Unsloth's Anthropic-compatible dialect, used by Claude Code, the Anthropic SDK, OpenClaw, and any client that supports the Messages API.

#### Basic request

```bash
curl http://localhost:8888/v1/messages \
  -H "Authorization: Bearer sk-unsloth-xxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

{% hint style="warning" %}
`max_tokens` is **required** on `/v1/messages` (it is optional on `/v1/chat/completions`).
{% endhint %}

#### Streaming

```bash
curl -N http://localhost:8888/v1/messages \
  -H "Authorization: Bearer sk-unsloth-xxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-local",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Explain LoRA in two sentences."}],
    "stream": true
  }'
```

Events follow Anthropic's SSE structure: `message_start`, `content_block_start`, `content_block_delta`, `content_block_stop`, `message_delta`, `message_stop`, plus Unsloth's custom `tool_result` event, which carries server-side tool output.

#### Images (vision)

Anthropic-style image content uses a `source` block containing the base64 data:

```bash
IMG=$(base64 -w 0 test.jpg)
cat > /tmp/request.json <
```

#### Tool calling (Anthropic tools)

```bash
curl http://localhost:8888/v1/messages \
  -H "Authorization: Bearer sk-unsloth-xxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-local",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "What is the weather like in Tokyo?"}],
    "tools": [
      {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
          "type": "object",
          "properties": {
            "city": {"type": "string"}
          },
          "required": ["city"]
        }
      }
    ],
    "tool_choice": {"type": "auto"}
  }'
```

`tool_choice` values correspond to the OpenAI dialect as follows: Anthropic `auto` → OpenAI `auto`, Anthropic `any` → OpenAI `required`, Anthropic `{type: "tool", name: "x"}` → OpenAI `{type: "function", function: {name: "x"}}`, Anthropic `none` → OpenAI `none`.
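The `tool_choice` correspondence is mechanical enough to write down as a lookup. The sketch below is illustrative only and does not claim to be how Unsloth implements it; `anthropic_to_openai_tool_choice` is a hypothetical helper that takes the Anthropic `type` (plus a tool name for `tool`) and prints the OpenAI-dialect value as JSON.

```bash
# Hypothetical helper: translate an Anthropic tool_choice type into the
# equivalent OpenAI tool_choice value, printed as a JSON fragment.
#   $1 = Anthropic type (auto | any | none | tool), $2 = tool name for "tool"
anthropic_to_openai_tool_choice() {
  case "$1" in
    auto) printf '"auto"\n' ;;
    any)  printf '"required"\n' ;;
    none) printf '"none"\n' ;;
    tool) printf '{"type":"function","function":{"name":"%s"}}\n' "$2" ;;
    *)    echo "unknown tool_choice type: $1" >&2; return 1 ;;
  esac
}

anthropic_to_openai_tool_choice any
anthropic_to_openai_tool_choice tool get_weather
```

An unrecognized type returns a non-zero status, so a calling script can stop before sending a malformed request.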

### 🧬 Responses (`/v1/responses`)

Unsloth also supports the newer **OpenAI Responses API**, the protocol used by Codex and other recent OpenAI clients.

```bash
curl http://localhost:8888/v1/responses \
  -H "Authorization: Bearer sk-unsloth-xxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-local",
    "input": "Write a one-sentence greeting."
  }'
```

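Streaming works the same way as for Chat Completions: add `"stream": true` to the body and pass `-N` to `curl`.

### 🧰 Unsloth server-side tools (shorthand)

Beyond client-side function calling, Unsloth can execute **Python**, **bash**, and **web search** on the server and stream the results back as custom `tool_result` events. This is what lets Unsloth behave like a "real" agent out of the box, with no need to round-trip tool calls through your client.

Enable it by passing these extra fields to **either** `/v1/chat/completions` or `/v1/messages`:

| Field             | Type      | Description                                                                  |
| ----------------- | --------- | ---------------------------------------------------------------------------- |
| `enable_thinking` | `boolean` | `false` turns thinking off. Defaults to `true`.                              |
| `enable_tools`    | `boolean` | `true` enables server-side tool execution.                                   |
| `enabled_tools`   | `array`   | Which tools the model may call. Supports `python`, `bash`, `web_search`.     |
| `session_id`      | `string`  | Optional. Persists tool state (for example, the Python kernel) across calls. |

#### Thinking mode

Thinking mode is enabled by default.

```bash
curl http://localhost:8888/v1/chat/completions \
  -H "Authorization: Bearer sk-unsloth-xxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [{"role": "user", "content": "Summarize quantum tunneling in one sentence."}],
    "stream": false
  }'
```

The model thinks first, then gives its answer.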

To turn thinking off, pass `"enable_thinking": false` in your request. The model then answers directly, without thinking first.
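A request with thinking disabled differs from the default one only by that extra field. As a sketch, the hypothetical `thinking_off_body` function below prints such a body for a given prompt; it assumes the prompt contains no characters that need JSON escaping (quotes, backslashes, newlines), and it reuses the `default` model name from the thinking-mode example.

```bash
# Hypothetical helper: print a chat-completions request body with thinking
# turned off. $1 = user prompt (must not need JSON escaping).
thinking_off_body() {
  cat <<EOF
{
  "model": "default",
  "messages": [{"role": "user", "content": "$1"}],
  "enable_thinking": false
}
EOF
}

thinking_off_body "Summarize quantum tunneling in one sentence."
```

The body can then be piped to the endpoint with `-d @-`, which tells `curl` to read the request body from stdin: `thinking_off_body "..." | curl http://localhost:8888/v1/chat/completions -H "Authorization: Bearer $UNSLOTH_STUDIO_AUTH_TOKEN" -H "Content-Type: application/json" -d @-`.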

#### Python execution

```bash
curl http://localhost:8888/v1/chat/completions \
  -H "Authorization: Bearer sk-unsloth-xxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-local",
    "messages": [{"role": "user", "content": "What is 123 * 456? Work it out with code."}],
    "stream": false,
    "enable_tools": true,
    "enabled_tools": ["python"],
    "session_id": "my-session"
  }'
```

#### Web search + Python (streaming)

```bash
# -N keeps curl from buffering the SSE stream
curl -N http://localhost:8888/v1/chat/completions \
  -H "Authorization: Bearer sk-unsloth-xxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-local",
    "messages": [{"role": "user", "content": "Search for the features of Python 3.13"}],
    "stream": true,
    "enable_tools": true,
    "enabled_tools": ["web_search", "python"],
    "session_id": "my-session"
  }'
```

#### On `/v1/messages`

The same shorthand also works on the Anthropic Messages endpoint:

```bash
# -N keeps curl from buffering the SSE stream
curl -N http://localhost:8888/v1/messages \
  -H "Authorization: Bearer sk-unsloth-xxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-local",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Search for the features of Python 3.13"}],
    "stream": true,
    "enable_tools": true,
    "enabled_tools": ["web_search", "python"],
    "session_id": "my-session"
  }'
```

In addition to the standard Anthropic / OpenAI event types, Unsloth streams its own `tool_result` SSE events, and the model sees each tool's output on the next turn.

### ❔ Troubleshooting

**`401 Unauthorized`** - The `Authorization` header is missing or the key is wrong. Double-check `Authorization: Bearer sk-unsloth-…`.

**`curl` streaming request hangs** - Add `-N` (the same as `--no-buffer`). Without it, `curl` buffers the SSE stream and you see nothing until the end.

**Base64 encoding differs between operating systems** - Linux's `base64` wraps lines by default, while macOS / BSD does not. Use `base64 -w 0` on Linux, plain `base64` on macOS, or pipe the output through `tr -d '\n'`.

**JSON escaping in the shell** - Once the request body gets complex, a heredoc or a file (`-d @file.json`) is cleaner than an inline string. Example: `curl ... -d @body.json`.

**`max_tokens` error on `/v1/messages`** - The Anthropic dialect requires it. Add `"max_tokens": 1024` (or whatever limit you want).

For endpoint-level problems (model not loaded, connection dropped, wrong port), see the API overview page.

### Optional: adjust server defaults

You can customize the default behavior when starting the server with `unsloth run`.

```bash
# Start the server with custom defaults
unsloth run \
  --model unsloth/Qwen3-1.7B-GGUF \
  --reasoning off \
  --temp 0.6 \
  -p 8888
```

Use `--reasoning off` to turn thinking off, or `--reasoning on` to turn it on for models that support reasoning.

```bash
# Expose the API on the local network
unsloth run \
  --model unsloth/Qwen3-1.7B-GGUF \
  -H 0.0.0.0 \
  -p 8888
```

This starts the server on `0.0.0.0:8888`, which allows other devices on your local network to connect.

#### Per-request overrides

You can also override generation settings directly in each API request.

```bash
curl http://localhost:8888/v1/chat/completions \
  -H "Authorization: Bearer $UNSLOTH_STUDIO_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [
      {
        "role": "user",
        "content": "Write a short poem about local AI."
      }
    ],
    "temperature": 0.8,
    "top_p": 0.9,
    "max_tokens": 512
  }'
```

Request-level values such as `temperature`, `top_p`, `max_tokens`, and `stream` override the server defaults for that request.