# Connect Python SDK to Unsloth

Unsloth serves three OpenAI-compatible dialects at the same base URL (Chat Completions, Responses, and Anthropic Messages), so every mainstream Python SDK works against it.

You change only the `base_url` and `api_key` on the client; everything else (streaming, tool calling, vision, structured output) behaves the way the SDK documents. This page covers the two SDKs most developers reach for first: the official **OpenAI Python SDK** and the official **Anthropic Python SDK**.

{% hint style="info" %}
If you're not sure what URL / key / model name to use, read the API overview first. It walks you through starting Unsloth, loading a model, and creating an `sk-unsloth-…` key.
{% endhint %}

### 🔑 Prerequisites

Before you run any of the snippets below you'll need:

* **Unsloth running locally** with a model loaded (note the port: typically `8000` or `8888`).
* **An `sk-unsloth-…` API key** created from **Settings → API**.
* **A model name.** The name of the GGUF model inside Unsloth (e.g. `qwen-local`, `gpt-oss-20b-GGUF`). If you forget it, run:

  ```bash
  curl http://localhost:8888/v1/models -H "Authorization: Bearer sk-unsloth-…"
  ```

  and copy the `id` field.

Set the key as an env var so you never paste it into code:

```bash
export UNSLOTH_STUDIO_AUTH_TOKEN=sk-unsloth-xxxxxxxxxxxx
```

#### 🤖 OpenAI SDK

Unsloth's `/v1/chat/completions` endpoint is a drop-in for the OpenAI Python SDK. The client treats Unsloth like any other OpenAI-compatible provider.

**1. Install the SDK:**

```bash
pip install openai
```

**2. Create a client** pointed at Unsloth:

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8888/v1",              # your unsloth port + /v1
    api_key=os.environ["UNSLOTH_STUDIO_AUTH_TOKEN"],     # your sk-unsloth-… key
)
```

#### Basic chat completion

```python
response = client.chat.completions.create(
    model="default",                               # the name you gave the model in unsloth or default
    messages=[
        {"role": "user", "content": "Give me two facts about Paris"}
    ],
)
print(response.choices[0].message.content)
```

<figure><img src="/files/7snUN5tBaJx5oz98JeTd" alt=""><figcaption></figcaption></figure>

#### Streaming

Set `stream=True` and iterate over the returned generator:

```python
stream = client.chat.completions.create(
    model="qwen-local",
    messages=[{"role": "user", "content": "Write a haiku about locally-run LLMs."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
```

<figure><img src="/files/X4f2SFD41fP9wJ122aFz" alt=""><figcaption></figcaption></figure>

#### Images (vision)

Attach an image as an `image_url` content part. Unsloth accepts either an HTTP(S) URL or a `data:` base64 URI:

```python
import base64
from pathlib import Path

img_b64 = base64.b64encode(Path("test.jpg").read_bytes()).decode()

response = client.chat.completions.create(
    model="default",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"},
                },
                {"type": "text", "text": "What's in this image?"},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

{% hint style="info" %}
The loaded model must be multimodal. If you load a text-only model, the vision request will succeed structurally but the model won't be able to "see" the image.
{% endhint %}

<figure><img src="/files/x8uFNTG7lrLrGFKNlP0i" alt=""><figcaption></figcaption></figure>

#### Function calling (OpenAI tools)

Pass OpenAI-style `tools` and (optionally) `tool_choice`, and Unsloth forwards them to the backend. Your client is responsible for executing each tool call and returning the result on the next turn:

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "What's the weather in Perth right now?"}],
    tools=tools,
    tool_choice="auto",
)

tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, tool_call.function.arguments)
```

<figure><img src="/files/EzbvhYbLNuXDYrEMERUU" alt=""><figcaption></figcaption></figure>

#### Unsloth server-side tools (shorthand)

In addition to OpenAI-style client-side tools, Unsloth can execute **Python**, **bash**, and **web search** server-side and stream the results back automatically. Opt in via the `extra_body` parameter so the fields pass straight through to Unsloth:

```python
stream = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "What is 123 * 456? Use Python to compute it."}],
    stream=True,
    extra_body={
        "enable_tools": True,
        "enabled_tools": ["python", "web_search"],
        "session_id": "my-session",
    },
)
for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
```

<figure><img src="/files/OTaxZQ04zucqL6PkKO9X" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/NCU9TbcHNrK8ZNnkVxuU" alt=""><figcaption></figcaption></figure>

The `session_id` is optional. Use it to persist tool state (e.g. a Python kernel) across calls.

{% hint style="info" %}
`enabled_tools` currently supports `"python"`, `"bash"`, and `"web_search"`. Tool results are streamed back as `tool_result` events so the model can see them on its next turn.
{% endhint %}

**Listing models**

```python
models = client.models.list()
for m in models.data:
    print(m.id)
```

<figure><img src="/files/46EVD72fh9T5vieWhKik" alt=""><figcaption></figcaption></figure>

#### 🧠 Anthropic SDK

Unsloth's `/v1/messages` endpoint is a drop-in for the Anthropic Python SDK.

**1. Install the SDK:**

```bash
pip install anthropic
```

**2. Create a client** pointed at Unsloth:

```python
import os
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8888",                 # your unsloth port (no /v1 here - the SDK adds it)
    api_key="dummy",                                  # any non-empty placeholder value
    default_headers={"Authorization": f"Bearer {os.environ['UNSLOTH_STUDIO_AUTH_TOKEN']}"},  # your sk-unsloth-… key
)
```

#### Basic message

```python
message = client.messages.create(
    model="default",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Say hello in three languages."}
    ],
)
print(message.content[0].text)
```

<figure><img src="/files/PZnQ9ViIvVaomi1NtnPr" alt=""><figcaption></figcaption></figure>

#### Streaming

The SDK exposes a context manager that yields text deltas:

```python
with client.messages.stream(
    model="default",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain LoRA in two sentences."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```

#### Images (vision)

Anthropic-style image content uses a `source` block with base64 data:

```python
import base64
from pathlib import Path

img_b64 = base64.standard_b64encode(Path("photo.jpg").read_bytes()).decode()

message = client.messages.create(
    model="default",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": img_b64,
                    },
                },
                {"type": "text", "text": "What's in this image?"},
            ],
        }
    ],
)
print(message.content[0].text)
```

<figure><img src="/files/qX7ueMXCe3wzsJCVNMNX" alt=""><figcaption></figcaption></figure>

#### Tool calling (Anthropic tools)

Pass Anthropic-style `tools` with an `input_schema` and Unsloth forwards them natively:

```python
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Tokyo'"},
            },
            "required": ["city"],
        },
    }
]

message = client.messages.create(
    model="default",
    max_tokens=1024,
    tools=tools,
    tool_choice={"type": "auto"},
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
)

for block in message.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```

<figure><img src="/files/wkHcigdom5yOxcvgqVxj" alt=""><figcaption></figcaption></figure>

#### Unsloth server-side tools (shorthand)

The same `enable_tools` / `enabled_tools` / `session_id` shorthand works against `/v1/messages`; pass it through `extra_body`:

```python
with client.messages.stream(
    model="default",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Search for Python 3.13 features and summarize."}],
    extra_body={
        "enable_tools": True,
        "enabled_tools": ["web_search", "python"],
        "session_id": "my-session",
    },
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```

<figure><img src="/files/ntTIw6iAHo0O6h42EIgB" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/Col5kHmGaiXuyetEoP8X" alt=""><figcaption></figcaption></figure>

Unsloth emits custom `tool_result` SSE events for the model's view of each tool call's output. The Anthropic SDK passes these through its event stream unchanged.

#### JSON decoding (`response_format`)

Unsloth supports OpenAI-style structured outputs via `response_format`. Pass a JSON Schema and the model is constrained to produce JSON matching it.

````python
import json
import os
import re
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8888/v1",
    api_key=os.environ["UNSLOTH_STUDIO_AUTH_TOKEN"],
)

response = client.chat.completions.create(
    model="default",
    stream=False,
    temperature=0.0,
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Pick a country: Japan, Egypt, or Peru. Explain why in one sentence.",
        },
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "country_pick",
            "schema": {
                "type": "object",
                "properties": {
                    "country": {"type": "string", "enum": ["Japan", "Egypt", "Peru"]},
                    "reason":  {"type": "string"},
                },
                "required": ["country", "reason"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    },
)

raw = response.choices[0].message.content
# Strip the markdown fence Gemma 4 wraps around the JSON, then parse.
cleaned = re.sub(r"^```(?:json)?\s*", "", raw)
cleaned = re.sub(r"\s*```$", "", cleaned)
parsed = json.loads(cleaned)

print(json.dumps(parsed, indent=2))
print()
print("country:", parsed["country"])
print("reason :", parsed["reason"])
````

The `strict: True` flag tells Unsloth to enforce the schema during decoding rather than relying on the model to comply on its own. `additionalProperties: False` and `required` work as in standard JSON Schema.

The terminal output should look roughly like this:

<figure><img src="/files/EeDN7C7BZ7JOYPCVdC6s" alt=""><figcaption></figcaption></figure>


### 🧪 Choosing an SDK

Both SDKs work against Unsloth. The right choice depends on the rest of your stack:

* Use the **OpenAI SDK** if your code already depends on the OpenAI Python package, you want OpenAI-style `tools` / `tool_choice`, or you plan to call the Responses API.
* Use the **Anthropic SDK** if your code already depends on the Anthropic package, you prefer Anthropic's `input_schema` tool format, or you want the Anthropic-native streaming event types.

You can use both in the same project. Unsloth serves them on the same port, so a single `sk-unsloth-…` key authenticates both.
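The only configuration that differs between the two clients is the base URL suffix. One way to keep them consistent is deriving both from a single port setting (the `UNSLOTH_PORT` variable here is a hypothetical convenience, not something Unsloth defines):

```python
import os

PORT = os.environ.get("UNSLOTH_PORT", "8888")        # hypothetical helper var

OPENAI_BASE_URL = f"http://localhost:{PORT}/v1"      # OpenAI SDK: include /v1
ANTHROPIC_BASE_URL = f"http://localhost:{PORT}"      # Anthropic SDK adds /v1 itself

# client = OpenAI(base_url=OPENAI_BASE_URL, api_key=...)
# client = Anthropic(base_url=ANTHROPIC_BASE_URL, ...)
```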

### ❔ Troubleshooting

**`401 Unauthorized`** The `UNSLOTH_STUDIO_AUTH_TOKEN` env var isn't set, or the key is wrong. Re-export and confirm with `echo $UNSLOTH_STUDIO_AUTH_TOKEN`.

**`404 Not Found` from the OpenAI SDK** Check that `base_url` ends in `/v1`. The OpenAI SDK appends endpoint paths to the base URL as-is.

**`404 Not Found` from the Anthropic SDK** Check that `base_url` does **not** end in `/v1`. The Anthropic SDK adds `/v1/messages` itself.

**`extra_body` fields aren't reaching Unsloth** Make sure you're on a recent `openai` / `anthropic` SDK. Older versions silently drop unknown fields. Upgrade with `pip install -U openai anthropic`.

**Streaming "hangs" then dumps everything at once** Whatever is wrapping your output is buffering. In a script, `print(..., flush=True)`; in a notebook, it's usually fine; behind a proxy, disable response buffering on the proxy.

For endpoint-level issues (wrong port, model not loading, lost connection, etc.) see the API overview page.

