# How to use Unsloth as an API endpoint

You can now use local LLMs in tools like [Claude Code](/docs/basics/claude-code.md) and [Codex](/docs/basics/codex.md) by connecting them to Unsloth's API endpoint. This means you can run local [Qwen](/docs/models/qwen3.6.md) and [Gemma](/docs/models/gemma-4.md) models directly in those tools through Unsloth Studio, with bonus inference features like self-healing tool calling, code execution, web search, and more.

Models (and GGUFs) you load in Unsloth Studio get exposed as an authenticated API via `llama-server`, so you can use your local models with your favorite agent, SDK, or chat client.

Studio speaks two dialects on the same port:

* **Anthropic-compatible `/v1/messages`** for Claude Code, OpenClaw, the Anthropic SDK, and any client that expects the Messages API.
* **OpenAI-compatible `/v1/chat/completions`** and **`/v1/responses`** for the OpenAI SDK, opencode, Cursor, Continue, Cline, Open WebUI, SillyTavern, and any OpenAI-compatible tool.

Both support streaming, tool calling (OpenAI `tools` / Anthropic `tools`), and vision inputs.

{% hint style="info" %}
API access is part of **Unsloth Studio (Beta)**. Make sure you're on the latest version; earlier builds don't expose the external API. See Installation to install or update.
{% endhint %}

### ⚡ Quickstart

1. **Install or update Unsloth Studio.** Earlier versions don't expose the external API. See Installation.
2. **Launch Studio.** Note the port it starts on (usually `8000` or `8888`). You'll see it in the terminal output and in the browser URL (`http://localhost:PORT`).
3. **Load a model.** Click **New Chat**, pick or search a model (GGUF), and wait for it to finish loading.
4. **Create an API key.** In Studio, click your **Unsloth** avatar in the bottom-left → **Settings** → **API Keys** → type a key name → **Create**. Copy the `sk-unsloth-…` value that appears. Studio only shows it once.
5. **Point your client at Studio.** Use `http://localhost:PORT` as the base URL and your `sk-unsloth-…` key for auth. Jump to the recipe for your tool below.

<div data-with-frame="true"><figure><img src="/files/PxQ3x37GwzkPPjHW6pVh" alt="" width="375"><figcaption></figcaption></figure></div>
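Step 5 can be sketched with nothing but Python's standard library. A minimal sketch; the port, key, and model ID below are placeholders for your own values, and the request is only built here, not sent:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8888"   # substitute the port Studio printed
API_KEY = "sk-unsloth-xxxxxxxxxxxx"  # substitute your real key

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but don't send) an OpenAI-style chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("gpt-oss-20b-GGUF", "Hello!")
```

Once Studio is running with a model loaded, `urllib.request.urlopen(req)` sends the request and returns the completion as JSON.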

### 🔑 Creating an API key

Keys are created from inside Studio at **Unsloth → Settings → API Keys**.

1. Open the sidebar and click your **Unsloth** avatar at the bottom-left.
2. Go to **Settings** → **API Keys**.
3. Enter a friendly name (e.g. `claude-code-macbook`) and, optionally, set an expiry.
4. Click **Create**.
5. **Copy the key.** Studio stores only a hash and you won't be able to view it again.

<div data-with-frame="true"><figure><img src="/files/h74Myk0Gm7aygIC5vAsH" alt="" width="563"><figcaption></figcaption></figure></div>

All keys start with the `sk-unsloth-` prefix. Revoke a key from the same page at any time. Requests made with a revoked key will fail with `401 Unauthorized`.

{% hint style="warning" %}
Treat your API key like a password. Anyone with the key and network access to your Studio instance can send requests to your loaded model.
{% endhint %}

### <i class="fa-terminal">:terminal:</i> Unsloth Studio run command

Alternatively, you can load a model and have an API key created for you automatically by using the run command with our `unsloth` CLI tool. To do so:

1. **Install or update Unsloth Studio.** Earlier versions don't expose the external API. See Installation.
2. **Load a GGUF model** using the run command. This also launches the UI on the default port. The endpoint URL and API key are printed to the console, ready to use with your client of choice.

   ```bash
   unsloth studio run --model unsloth/gpt-oss-120b-GGUF --gguf-variant UD-Q4_K_XL
   ```

What each argument does:

* `--model`: the model's repo ID on Hugging Face Hub.
* `--gguf-variant`: the **quantization** variant to load, in this case `UD-Q4_K_XL`. Available variants depend on what the HF repo actually contains.

<figure><img src="/files/MaID7j6ybUcV10UN2VhO" alt=""><figcaption></figcaption></figure>

🌐 **Endpoints**\
Studio exposes these endpoints on whichever port it booted on (typically `http://localhost:8000` or `http://localhost:8888`):

| Endpoint                    | Compatible with             | Use it from                                                           |
| --------------------------- | --------------------------- | --------------------------------------------------------------------- |
| `POST /v1/messages`         | Anthropic Messages API      | Claude Code, Anthropic SDK, OpenClaw, anything that speaks Anthropic  |
| `POST /v1/chat/completions` | OpenAI Chat Completions API | OpenAI SDK, opencode, Cursor, Continue, Cline, Open WebUI, curl, etc. |
| `GET /v1/models`            | OpenAI models list          | List the models currently loaded in Studio                            |

Authenticate with an `Authorization: Bearer sk-unsloth-…` header on every request.

{% hint style="info" %}
You don't need to run different servers for the two formats. Studio handles both on the same port.
{% endhint %}

### 🖇️ Connecting your client

Unsloth lets you run local LLMs via most frameworks, including [Claude Code](/docs/basics/claude-code.md), [Codex](/docs/basics/codex.md), [OpenClaw](/docs/integrations/openclaw.md), [OpenCode](/docs/integrations/opencode.md), and more. Click the specific tool below for a guide:

{% columns %}
{% column width="50%" %}
{% content-ref url="/pages/w020xJgdCTBtTvfHtvye" %}
[Claude Code](/docs/basics/claude-code.md)
{% endcontent-ref %}

{% content-ref url="/pages/PCjZ57h5pE0QccKyJMYD" %}
[OpenAI Codex](/docs/basics/codex.md)
{% endcontent-ref %}

{% content-ref url="/pages/viZvzp58ObzZkXtCm0qv" %}
[Python SDK](/docs/integrations/connect-python-sdk-to-unsloth.md)
{% endcontent-ref %}
{% endcolumn %}

{% column width="50%" %}
{% content-ref url="/pages/CwQEpEmkKPmyEYdnEngt" %}
[OpenClaw](/docs/integrations/openclaw.md)
{% endcontent-ref %}

{% content-ref url="/pages/qaA8ZjTxsH2GTuBOHyra" %}
[OpenCode](/docs/integrations/opencode.md)
{% endcontent-ref %}

{% content-ref url="/pages/1cwX0SOqPoqLx7fQ2sIS" %}
[Curl & HTTP](/docs/integrations/connect-curl-and-http-to-unsloth.md)
{% endcontent-ref %}
{% endcolumn %}
{% endcolumns %}

### 🧰 Tool calling

Both endpoints support function / tool calling in their native format, plus an Unsloth-specific shorthand for Studio's built-in tools.

**OpenAI-style tools:** send `tools` and `tool_choice` to `/v1/chat/completions` exactly as you would with OpenAI. Claude Code (via `/v1/messages`), opencode, Cursor, Continue, and Cline all work out of the box.

**Anthropic-style tools:** send `tools` (with `input_schema`) and `tool_choice` to `/v1/messages` exactly as you would with Claude.
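The two tool formats differ mainly in where the JSON Schema lives. A side-by-side sketch in Python (the `get_weather` tool and its schema are hypothetical):

```python
# One hypothetical tool, expressed in both dialects.
schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}

# OpenAI shape: the schema sits under function.parameters.
openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": schema,
    },
}

# Anthropic shape: the schema sits directly under input_schema.
anthropic_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "input_schema": schema,
}
```

Send `openai_tool` in the `tools` array of `/v1/chat/completions`, and `anthropic_tool` in the `tools` array of `/v1/messages`.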

**Studio server side tools:** Studio can execute Python, web search, and bash *server-side* and stream the results back as `tool_result` events. Opt in by adding these extra fields to either endpoint:

```json
{
  "messages": [{"role": "user", "content": "What is 123 * 456? Use Python."}],
  "stream": true,
  "enable_tools": true,
  "enabled_tools": ["python", "web_search", "terminal"],
  "session_id": "my-session"
}
```

The model sees each tool's output on its next turn.
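The payload above can also be assembled programmatically. A minimal sketch, with field names taken from the example (the session ID is arbitrary):

```python
def server_tools_payload(prompt: str, tools: list[str], session_id: str) -> dict:
    """Build a request body that opts in to Studio's server-side tools."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "enable_tools": True,       # master switch for server-side tools
        "enabled_tools": tools,     # e.g. ["python", "web_search", "terminal"]
        "session_id": session_id,   # lets tool state persist across turns
    }

body = server_tools_payload("What is 123 * 456? Use Python.", ["python"], "my-session")
```

POST this body to either `/v1/chat/completions` or `/v1/messages` as usual.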

{% hint style="info" %}
If you're using the Anthropic `/v1/messages` endpoint, `tool_choice` maps cleanly: Anthropic `auto` → OpenAI `auto`, Anthropic `any` → OpenAI `required`, Anthropic `{type: "tool", name: "x"}` → OpenAI `{type: "function", function: {name: "x"}}`, Anthropic `none` → OpenAI `none`.
{% endhint %}
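That mapping can be written down directly. A sketch following the hint above, not Studio's actual code (it accepts both the shorthand strings and Anthropic's `{"type": …}` dict form):

```python
def anthropic_to_openai_tool_choice(choice):
    """Map an Anthropic-style tool_choice to its OpenAI equivalent."""
    if isinstance(choice, dict):
        if choice.get("type") == "tool":
            # Named tool: {"type": "tool", "name": "x"}
            return {"type": "function", "function": {"name": choice["name"]}}
        choice = choice["type"]  # e.g. {"type": "auto"} -> "auto"
    return {"auto": "auto", "any": "required", "none": "none"}[choice]
```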

### ❔ Troubleshooting

**`401 Unauthorized`**: either the `Authorization` header is missing or the key is wrong. Keys must be passed as `Authorization: Bearer sk-unsloth-…`. If you lost the key, create a new one from **Settings → API Keys**; Studio doesn't show old keys after creation.

**`Lost connection to the model server`**: Studio couldn't reach the underlying llama.cpp server. Usually the model finished loading but then crashed, or its tab was closed inside Studio. Reload the model from **New Chat** and retry.

**Claude Code shows the default Anthropic model, not my local one**: check that all three env vars are exported in the **same** shell where you run `claude`:

```bash
echo $ANTHROPIC_BASE_URL
echo $ANTHROPIC_AUTH_TOKEN
echo $ANTHROPIC_MODEL
```

Then run `/model` inside Claude Code to confirm. On Windows PowerShell use `$env:ANTHROPIC_BASE_URL` etc.

**`stream: true` returns a single JSON blob instead of SSE**: make sure you're hitting the right path (`/v1/messages` or `/v1/chat/completions`) and that your HTTP client is actually consuming the response as a stream rather than buffering it.

**I can't find the name of the model to add to opencode (or OpenClaw, or any other client)**: ask Studio directly. `GET /v1/models` returns the exact model ID to plug into the client's "Model ID" field:

```bash
curl http://localhost:8888/v1/models \
  -H "Authorization: Bearer sk-unsloth-xxxxxxxxxxxx"
```

You'll get back a JSON payload of the form `{"data": [{"id": "gpt-oss-20b-GGUF", ...}]}`. Copy the `id` value; that's the string opencode's **Model ID** field (left column) and OpenClaw's `models[].id` expect. The display name on the right can be whatever you want users to see.
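A quick way to pull those IDs out in Python (the response string below is the illustrative shape from above, not real output):

```python
import json

# Illustrative /v1/models response body; your real IDs will differ.
sample = '{"data": [{"id": "gpt-oss-20b-GGUF", "object": "model"}]}'

def model_ids(models_json: str) -> list[str]:
    """Extract the model IDs to paste into a client's Model ID field."""
    return [entry["id"] for entry in json.loads(models_json)["data"]]
```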

**Tool calls aren't executed**: the model needs to support tool calling for client-side tools (`tools` / `tool_choice`). For Studio's built-in tools, remember to set `enable_tools: true` **and** list the ones you want in `enabled_tools` (e.g. `["python", "web_search"]`).

