# Connect API Providers & Model Servers to Unsloth

Learn how to run models from Ollama, llama.cpp, vLLM, OpenAI, Anthropic, and other providers through a single local UI interface with [Unsloth](/docs/new/studio.md), an open-source repo for running and training LLMs.

Once connected, you can run models with code execution, tool-calling, thinking, and other features in the same Unsloth chat interface used for both local and cloud models.

Unsloth uniquely supports [prompt caching](#prompt-caching) (to save you many tokens without accuracy degradation) while preserving access to provider-native capabilities, such as OpenAI’s built-in [web search](#web-search-and-thinking) and [code execution](#code-execution).

### Providers

Connections fall into two groups: hosted API providers that run models for you, and model servers that you run or control.

**Cloud Providers -** Hosted APIs that use an account API key:

| Connection | Capabilities                            | Setup guide                                                       |
| ---------- | --------------------------------------- | ----------------------------------------------------------------- |
| OpenAI     | Search, code, thinking                  | [OpenAI →](/docs/integrations/connections/openai.md)              |
| Anthropic  | Search, code, thinking                  | [Anthropic →](/docs/integrations/connections/anthropic-claude.md) |
| OpenRouter | Many hosted models through one API key. | [OpenRouter →](/docs/integrations/connections/openrouter.md)      |

**Model Servers -** Inference servers running locally, on your network, or on your remote machine:

| Server    | Description                  | Guide                                                                                                     |
| --------- | ---------------------------- | --------------------------------------------------------------------------------------------------------- |
| Llama.cpp | Efficient GGUF model serving | [Llama.cpp →](/docs/integrations/connections/connect-llama.cpp-to-unsloth-run-ggufs-with-llama-server.md) |
| vLLM      | High-throughput serving      | [vLLM →](/docs/integrations/connections/vllm.md)                                                          |
| Ollama    | Simple local model server    | [Ollama →](/docs/integrations/connections/ollama.md)                                                      |

### Quickstart

To run an external provider's model, add an API key and select which models Unsloth should show. In this example, we’ll use [OpenAI](https://platform.openai.com/api-keys). The same setup works for Anthropic, and other providers.

{% stepper %}
{% step %}

#### Create API

Create a new API key from the provider’s dashboard and copy it.

<figure><img src="/files/Pmi2ri2cEhyBvFDxBNPF" alt="" width="563"><figcaption></figcaption></figure>
{% endstep %}

{% step %}

#### Setup Unsloth Studio

Now we will need to install and setup [Unsloth](/docs/new/studio.md), which will enable you to run the cloud models in a UI interface. [See here](/docs/new/studio/install.md) for more detailed instructions.

{% tabs %}
{% tab title="MacOS" %}

#### Step 1: Setup Unsloth

Launch the `terminal` from Mac, then install Unsloth by entering the command below.

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
```

The environment and required packages will now be installed. Type `Y` and press Enter when prompted to continue. After setup finishes, the server will be available locally on port `8888`.

<figure><img src="/files/kAxiYilqsmP233htYNpi" alt="" width="375"><figcaption></figcaption></figure>

{% hint style="info" %}
If you skipped starting the app during installation, you can launch it later with `unsloth studio -p 8888`. To allow connections from other devices on your network, use `unsloth studio -H 0.0.0.0 -p 8888` instead.
{% endhint %}

#### Step 2: Start Unsloth

Open your browser of choice and type `http://127.0.0.1:8888`  in the URL box. If this is your first time installing Unsloth, you will be forwarded to the Password page where you will need to create a new password. You should then see the Chat Page as shown below.

<figure><img src="/files/ryuI6lvessgKynLGfv1K" alt="" width="375"><figcaption></figcaption></figure>
{% endtab %}

{% tab title="Windows" %}

#### Step 1: Setup Unsloth

Open the Start Menu, search for `PowerShell`, and launch it. Copy & enter the install command:

```powershell
irm https://unsloth.ai/install.ps1 | iex
```

it will begin installing automatically. After installation finishes, PowerShell will ask if you want to start Unsloth Studi&#x6F;**.**

<figure><img src="/files/kAxiYilqsmP233htYNpi" alt="" width="375"><figcaption></figcaption></figure>

You can also launch it with the following command:

```bash
unsloth studio -H 0.0.0.0 -p 8888
```

{% hint style="info" %}
If you would like to have your instance accessible by clients outside of your PC/computer.\
Add `-H 0.0.0.0` to the `unsloth studio` command.
{% endhint %}

#### Step 2: Start Unsloth

Open `http://127.0.0.1:8888` in your browser. On first launch, create a new password to continue to the Chat page. **Unsloth Studio** is now installed and ready to use.

<figure><img src="/files/ryuI6lvessgKynLGfv1K" alt="" width="375"><figcaption></figcaption></figure>
{% endtab %}

{% tab title="Linux, WSL" %}

#### Step 1: Setup Unsloth

{% tabs %}
{% tab title="Linux" %}
Open your terminal application. You can launch it by pressing `Ctrl + Alt + T`, or by searching for `Terminal` in your system's application menu.
{% endtab %}

{% tab title="WSL" %}
Click the Windows Start Menu, type the name of your installed distro (e.g. `Ubuntu`), then open it.

{% hint style="warning" %}
On **WSL**, make sure your **NVIDIA drivers** are installed on **Windows** (not inside WSL) and that the **CUDA toolkit** is installed inside your WSL distro. See the System Requirements below for details.
{% endhint %}
{% endtab %}
{% endtabs %}

To install, copy and run the install command:

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
```

Then:

1. Click inside the terminal window
2. Paste the command with `Ctrl + Shift + V`
3. Press `Enter`

Unsloth will start setting up the environment and installing the required packages as shown below. Type **Y** and Press `Enter` when asked if you want to allow Studio to start now. This will start Unsloth on your local **8888** port.

<figure><img src="/files/uQP4sGPAd6C4MBSFdUTm" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
If you chose not to start Unsloth during the installation process, you can always start the Unsloth app using `unsloth studio -p 8888` . If you would like to have your Unsloth instance accessible by clients outside of your PC/computer, add `-H 0.0.0.0` to the `unsloth studio` command.
{% endhint %}

#### Step 2: Start Unsloth

Open your browser of choice and type `http://127.0.0.1:8888`  in the URL box. If this is your first time installing Unsloth, you will be forwarded to the Password page where you will need to create a new password. After, Unsloth should now open on the Chat Page as shown below.

<figure><img src="/files/CresfHYJ3aP1rTTlj3YF" alt="" width="375"><figcaption></figcaption></figure>
{% endtab %}
{% endtabs %}
{% endstep %}

{% step %}

#### Configure Connections

Next, connect your provider to Unsloth.

1. Open **Settings** → **Connections**, then click **Add Provider.**
2. Select the provider you want to add, then paste the API key you copied earlier.
3. Click **Reload Models** to refresh the list with models available to your account.
4. Choose the models you want to enable, then hit save.&#x20;

<figure><img src="/files/GaQDk4hQbPpOhVAIBKNJ" alt=""><figcaption></figcaption></figure>
{% endstep %}

{% step %}

#### Ready to Chat

The models you enabled will now appear under **External** in the **Select Model** dropdown.

Unsloth dynamically exposes compatible reasoning levels and generation controls for different models.

<div data-with-frame="true"><figure><img src="/files/LuODDubUnxNX700szMOk" alt="" width="373"><figcaption></figcaption></figure></div>
{% endstep %}
{% endstepper %}

### Connect a Model Server

Use this flow for [**llama.cpp**](/docs/integrations/connections/connect-llama.cpp-to-unsloth-run-ggufs-with-llama-server.md), [**vLLM**](/docs/integrations/connections/vllm.md), and [**Ollama**](/docs/integrations/connections/ollama.md). Start or locate the server you want to connect. &#x20;

{% tabs %}
{% tab title="llama.cpp " %}
Start `llama-server` with the model you want to serve:

```bash
llama-server \
  --model /path/to/model.gguf \
  --host 0.0.0.0 \
  --port 8080
```

This exposes an API endpoint at: `http://localhost:8080/v1`

To require an API key, add:

```bash
--api-key 1234-myapi-key
```

{% endtab %}

{% tab title="vLLM" %}
Start the `vLLM` server with the model you want to serve:

```bash
  vllm serve unsloth/gemma-4-26B-A4B-it \
  --dtype auto \
```

To require an API key, add:

```bash
  --api-key token-abc123
```

This exposes an API endpoint at: `http://localhost:8000/v1`
{% endtab %}

{% tab title="Ollama" %}
Start `Ollama`, then pull the model you want to use:

```bash
ollama serve
ollama pull qwen3:14b
```

This exposes an API endpoint at: `http://localhost:11434/v1`
{% endtab %}
{% endtabs %}

Now we can connect the model server.&#x20;

Open **Settings → Connections**, then click **Add Provider**.

Select llama.cpp, vLLM, or Ollama then Paste the server **Base URL**.

<div data-with-frame="true"><figure><img src="/files/rcIydWpY9PZFkFP0tbBc" alt="" width="563"><figcaption></figcaption></figure></div>

* llama.cpp example: `http://localhost:8080/v1`
* Ollama example: `http://localhost:11434/v1`

Click **Load Models** to fetch available model IDs, or enter model IDs manually if your server does not expose `/models`.

Then, after you click **Add Provider,** The models you enabled will now appear under **External** in the **Select Model** dropdown.

### Web Search & Thinking

Provider-side web search is available for supported models from OpenAI, Anthropic, OpenRouter, Mistral, Gemini, and Kimi.

<div data-with-frame="true"><figure><img src="/files/9hxbjfC8Cjif2CPpcAlG" alt="" width="563"><figcaption></figcaption></figure></div>

The Think control adapts to the selected model: some models use an on/off toggle, while reasoning-effort models use model specific thinking levels.&#x20;

### Code Execution

When enabled, supported OpenAI and Anthropic models can run code in a provider sandbox to solve problems, analyse data, and work with files.\
\
Anthropic models use Claude’s provider-side Code execution tool.

<div data-with-frame="true"><figure><img src="/files/u855tCuYk5dXEqW74c2y" alt="" width="563"><figcaption></figcaption></figure></div>

OpenAI uses reusable containers, which you can create, delete, and select from **Code Execution** settings.

Select the same container in a new thread to continue with its files and state.

### Prompt Caching

Prompt caching reduces latency and cost when requests reuse the same long prefix. It is supported for compatible providers and servers, including OpenAI, Anthropic, and llama.cpp.

<figure><img src="/files/rd7uqDkUz6YnddW01aRl" alt=""><figcaption></figcaption></figure>

Use the **Prompt caching** setting in the side panel to control caching behaviour for supported connections.

For llama.cpp, prompt caching is enabled by default and can be disabled when starting `llama-server` with:

```bash
--no-cache-prompt
```

### Troubleshooting

If a provider fails to connect, check that the API key belongs to the selected provider and has access to the model you chose.

If a model does not appear after clicking **Reload Models**, it may not be available for your account. You can still use Unsloth’s default model list or choose another model.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://unsloth.ai/docs/integrations/connections.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
