Connect API Providers & Model Servers to Unsloth

Guide to connect OpenAI, Anthropic, Ollama, llama.cpp, vLLM and other providers to Unsloth. Add API keys or model server URLs, load models, and use external models in chat.

Learn how to run models from Ollama, llama.cpp, vLLM, OpenAI, Anthropic, and other providers through a single local interface with Unsloth, an open-source repo for running and training LLMs.

Once connected, you can run models with tool-calling, thinking, and other features in the same Unsloth chat interface used for both local and cloud models.

Providers

Connections fall into two groups: hosted API providers that run models for you, and model servers that you run or control.

Cloud Providers - Hosted APIs that use an account API key:

  • OpenAI: search, code, thinking

  • Anthropic: search, code, thinking

  • OpenRouter: many hosted models through one API key

Model Servers - Inference servers running locally, on your network, or on a remote machine:

  • llama.cpp: efficient GGUF model serving

  • vLLM: high-throughput serving

  • Ollama: simple local model server

Quickstart

To run an external provider's model, add an API key and select which models Unsloth should show. This example uses OpenAI; the same setup works for Anthropic and other providers.

1

Create an API Key

Create a new API key from the provider’s dashboard and copy it.

2

Set Up Unsloth Studio

Next, install and set up Unsloth, which lets you run cloud models from a web UI. See here for more detailed instructions.

Step 1: Set Up Unsloth

On a Mac, open Terminal and install Unsloth by entering the command below.

curl -fsSL https://unsloth.ai/install.sh | sh

The environment and required packages will now be installed. Type Y and press Enter when prompted to continue. After setup finishes, the server will be available locally on port 8888.

If you skipped starting the app during installation, you can launch it later with:

unsloth studio -p 8888

To allow connections from other devices on your network, use this instead:

unsloth studio -H 0.0.0.0 -p 8888
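If you launch Studio from a script, it can help to wait until the port answers before opening the browser. The sketch below is a hypothetical helper, not part of Unsloth; it assumes curl is installed and treats any HTTP response as a sign that the server is up.

```shell
# Hypothetical helper (not part of Unsloth): poll a URL until a server answers.
# Any HTTP response counts as "up"; connection errors and timeouts are retried.
wait_for_http() {
  local url="$1"
  local tries="${2:-30}"   # one attempt per second, 30 attempts by default
  local i=1
  while [ "$i" -le "$tries" ]; do
    # Without -f, curl exits 0 for any HTTP status, including redirects.
    if curl -s -o /dev/null --max-time 2 "$url"; then
      echo "up: $url"
      return 0
    fi
    # Sleep only if another attempt follows.
    if [ "$i" -lt "$tries" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  echo "no response after $tries attempt(s): $url" >&2
  return 1
}
```

For example, on macOS, wait_for_http http://127.0.0.1:8888 && open http://127.0.0.1:8888 opens Studio in your default browser once it responds.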

Step 2: Start Unsloth

Open your browser and go to http://127.0.0.1:8888. If this is your first time installing Unsloth, you will be redirected to the Password page, where you create a new password. You should then see the Chat page.

3

Configure Connections

Next, connect your provider to Unsloth.

  1. Open Settings → Connections, then click Add Provider.

  2. Select the provider you want to add, then paste the API key you copied earlier.

  3. Click Reload Models to refresh the list with models available to your account.

  4. Choose the models you want to enable, then click Save.

4

Ready to Chat

The models you enabled will now appear under External in the Select Model dropdown.

Unsloth shows the reasoning levels and generation controls that each model supports, so the available options change with the selected model.

Connect a Model Server

Use this flow for llama.cpp, vLLM, and Ollama.

Start or locate the server you want to connect.

Start llama-server with the model you want to serve:

This exposes an API endpoint at: http://localhost:8080/v1

To require an API key, add:
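The exact command depends on your model and llama.cpp build. As a minimal sketch, assuming llama-server is on your PATH and model.gguf stands in for your own GGUF file, the hypothetical helper below starts the server on port 8080 (llama-server's default) and adds --api-key only when you pass a key. Set DRY_RUN=1 to print the command instead of running it.

```shell
# Hypothetical helper (not part of Unsloth or llama.cpp): start llama-server,
# optionally requiring an API key. DRY_RUN=1 prints the command instead.
start_llama_server() {
  local model="$1"          # path to a .gguf file (placeholder)
  local port="${2:-8080}"   # llama-server's default port
  local api_key="${3:-}"    # optional; empty means no key is required
  # Build the argument list in the function's positional parameters.
  set -- llama-server -m "$model" --port "$port"
  if [ -n "$api_key" ]; then
    set -- "$@" --api-key "$api_key"
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

For example, start_llama_server ./model.gguf 8080 my-secret-key starts a server that expects clients to send that key as an Authorization: Bearer header. llama-server listens on 127.0.0.1 by default, so add --host 0.0.0.0 if Unsloth runs on a different machine. For comparison, vllm serve <model> --api-key <key> serves an OpenAI-compatible API on port 8000, and Ollama listens on port 11434.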

Now we can connect the model server.

Open Settings → Connections, then click Add Provider.

Select llama.cpp, vLLM, or Ollama, then paste the server's Base URL.

  • llama.cpp example: http://localhost:8080/v1

  • vLLM example: http://localhost:8000/v1

  • Ollama example: http://localhost:11434/v1

Click Load Models to fetch available model IDs, or enter model IDs manually if your server does not expose /models.

After you click Add Provider, the models you enabled appear under External in the Select Model dropdown.
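If Load Models finds nothing, you can query the server's OpenAI-style /models route from a terminal to see whether it is exposed and which IDs it reports. This is a minimal sketch with hypothetical helper names; it assumes curl is installed and that the Base URL is one of the /v1 URLs shown in the examples.

```shell
# Hypothetical helpers (not part of Unsloth): build "<base>/models" from a
# Base URL and fetch it from an OpenAI-compatible server.
models_url() {
  local base="${1%/}"   # drop one trailing slash, if present
  echo "$base/models"
}

list_models() {
  local url
  url="$(models_url "$1")"
  if [ -n "${2:-}" ]; then
    # Servers started with an API key expect it as a bearer token.
    curl -sS --max-time 5 -H "Authorization: Bearer $2" "$url"
  else
    curl -sS --max-time 5 "$url"
  fi
}
```

For example, list_models http://localhost:8080/v1 | jq -r '.data[].id' prints one model ID per line (jq is optional). A 404 response suggests the server does not expose /models, in which case enter the model IDs manually.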

Web Search & Thinking

Provider-side web search is available for supported models from OpenAI, Anthropic, OpenRouter, Mistral, Gemini, and Kimi.

The Think control adapts to the selected model: some models use an on/off toggle, while reasoning-effort models use model-specific thinking levels.

Code Execution

When enabled, supported OpenAI and Anthropic models can run code in a provider sandbox to solve problems, analyse data, and work with files. Anthropic models use Claude’s provider-side code execution tool.

OpenAI uses reusable containers, which you can create, delete, and select in the Code Execution settings.

Select the same container in a new thread to continue with its files and state.

Prompt Caching

Prompt caching reduces latency and cost when requests reuse the same long prefix. It is supported for compatible providers and servers, including OpenAI, Anthropic, and llama.cpp.

Use the Prompt caching setting in the side panel to control caching behaviour for supported connections.

For llama.cpp, prompt caching is enabled by default and can be disabled when starting llama-server with:

Troubleshooting

If a provider fails to connect, check that the API key belongs to the selected provider and has access to the model you chose.

If a model does not appear after clicking Reload Models, it may not be available for your account. You can still use Unsloth’s default model list or choose another model.
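To tell a bad key apart from a settings problem in Unsloth, you can call the provider's own models endpoint directly. This is a minimal sketch: the helper name and the OPENAI_API_KEY / ANTHROPIC_API_KEY variable names are assumptions, and it covers only OpenAI and Anthropic.

```shell
# Hypothetical helper (not part of Unsloth): print the HTTP status returned by
# the provider's models endpoint for the key in the matching environment variable.
check_key() {
  case "$1" in
    openai)
      [ -n "${OPENAI_API_KEY:-}" ] || { echo "OPENAI_API_KEY is not set" >&2; return 1; }
      curl -sS --max-time 10 -o /dev/null -w '%{http_code}\n' \
        -H "Authorization: Bearer $OPENAI_API_KEY" \
        https://api.openai.com/v1/models
      ;;
    anthropic)
      [ -n "${ANTHROPIC_API_KEY:-}" ] || { echo "ANTHROPIC_API_KEY is not set" >&2; return 1; }
      curl -sS --max-time 10 -o /dev/null -w '%{http_code}\n' \
        -H "x-api-key: $ANTHROPIC_API_KEY" \
        -H "anthropic-version: 2023-06-01" \
        https://api.anthropic.com/v1/models
      ;;
    *)
      echo "usage: check_key openai|anthropic" >&2
      return 2
      ;;
  esac
}
```

A 200 means the provider accepted the key; a 401 usually means the key is wrong or belongs to a different provider. Remove -o /dev/null -w '%{http_code}\n' to see the model list the key can access.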
