Connect Curl & HTTP to Unsloth

Guide to hitting Unsloth's API with curl (or any HTTP client), complete with copy-pasteable recipes for every endpoint and feature..

Unsloth exposes three OpenAI/Anthropic-compatible wire formats at the same base URL on the port Unsloth started on. All of them take an Authorization: Bearer sk-unsloth-… header and return either JSON or SSE, depending on whether you set stream. This page groups the recipes by endpoint (/v1/chat/completions, /v1/messages, /v1/responses, /v1/models) and ends with a shared section on Unsloth's built-in server-side tools, which work across all the chat endpoints.

If you're not sure what URL / key / model name to use, read the API overview first. It walks you through starting Unsloth, loading a model, and creating an sk-unsloth-… key.

🔑 Authentication

Every request needs an Authorization header:

Authorization: Bearer sk-unsloth-xxxxxxxxxxxx

To keep keys out of your shell history, export the key once and reference the env var:

export UNSLOTH_STUDIO_AUTH_TOKEN=sk-unsloth-xxxxxxxxxxxx

The snippets below inline the key as sk-unsloth-xxxxxxxxxxxx for clarity. In practice, substitute $UNSLOTH_STUDIO_API_KEY.

📋 List loaded models

curl http://localhost:8888/v1/models \
  -H "Authorization: Bearer sk-unsloth-xxxxxxxxxxxx"

Response:

{
  "object": "list",
  "data": [
    {"id": "unsloth/gemma-3-27b-it-GGUF", "object": "model", "owned_by": "local"}
  ]
}

Use the id field whenever a request needs a "model" value (or when a client like opencode asks for a Model ID).

💬 Chat Completions (/v1/chat/completions)

The OpenAI Chat Completions dialect. The broadest compatibility surface. Works with the OpenAI SDK, opencode, Cursor, Continue, Cline, Open WebUI, SillyTavern, and most OpenAI-compatible tools.

Basic request

Streaming

Add "stream": true and the response switches to Server-Sent Events (text/event-stream). Tell curl to flush as bytes arrive with --no-buffer (-N):

Each line of the response looks like data: {"choices":[{"delta":{"content":"..."}}]}, ending with data: [DONE].

Images (vision)

Attach an image as an image_url content part in the user message. The URL can be HTTPS or a base64 data: URI:

The loaded model must be multimodal. If you load a text-only model the request succeeds structurally but the model won't process the image.

Function calling (OpenAI tools)

Pass OpenAI-style tools and (optionally) tool_choice. Your client runs each tool call and returns the result on the next turn.

📨 Anthropic Messages (/v1/messages)

Unsloth's Anthropic-compatible dialect used by Claude Code, the Anthropic SDK, OpenClaw, and any client that speaks the Messages API.

Basic request

Streaming

Events follow Anthropic's SSE shape: message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop, plus Unsloth's custom tool_result event for server-side tool output.

Images (vision)

Anthropic-style image content uses a source block with base64 data:

Tool calling (Anthropic tools)

tool_choice values map as follows to the OpenAI dialect: Anthropic auto → OpenAI auto, Anthropic any → OpenAI required, Anthropic {type: "tool", name: "x"} → OpenAI {type: "function", function: {name: "x"}}, Anthropic none → OpenAI none.

🧬 Responses (/v1/responses)

Unsloth also speaks the newer OpenAI Responses API, the protocol Codex and other recent OpenAI clients have moved to.

Streaming works the same way as Chat Completions. Add "stream": true and pipe with -N.

🧰 Unsloth server-side tools (shorthand)

In addition to client-side function calling, Unsloth can execute Python, bash, and web search server-side and stream the results back as custom tool_result events. This is the feature that makes Unsloth feel like a "real" agent out of the box, no round-tripping tool calls through your client.

Opt in by passing these extra fields to either /v1/chat/completions or /v1/messages:

Field
Type
Notes

enable_thinking

boolean

false to disable thinking. true by default

enable_tools

boolean

true to enable server-side tool execution.

enabled_tools

array<string>

Which tools the model can call. Supports python, bash, web_search.

session_id

string

Optional. Persists tool state (e.g. Python kernel) across calls.

Thinking mode

Thinking mode is enabled by default.

The model will think before providing an answer.

To disable thinking pass enable_thinking: false in your request. The model will provide an answer without thinking first.

Python execution

Web search + Python (streaming)

On /v1/messages

The same shorthand works against the Anthropic Messages endpoint:

Unsloth streams its own tool_result SSE events in addition to the standard Anthropic / OpenAI event types, The model sees each tool's output on its next turn.

❔ Troubleshooting

401 Unauthorized - The Authorization header is missing or the key is wrong. Double-check: Authorization: Bearer sk-unsloth-….

curl hangs on streaming requests - Add -N (same as --no-buffer). Without it, curl buffers the SSE stream and you see nothing until the end.

Base64 encoding differs between OSes - Linux's base64 defaults to wrapping lines, macOS / BSD does not. Use base64 -w 0 on Linux, base64 on macOS, or pipe the output through tr -d '\n'.

JSON escaping in shells - Heredocs (-d @file.json) are cleaner than inline strings once the body gets complex. Example: curl ... -d @body.json.

max_tokens errors on /v1/messages - The Anthropic dialect requires it. Add "max_tokens": 1024 (or whatever limit you want).

For endpoint-level issues (model not loading, connection dropped, wrong port) see the API overview page.

Last updated

Was this helpful?