Connect API Providers & Model Servers to Unsloth
Guide to connect OpenAI, Anthropic, Ollama, llama.cpp, vLLM and other providers to Unsloth. Add API keys or model server URLs, load models, and use external models in chat.
Learn how to run models from Ollama, llama.cpp, vLLM, OpenAI, Anthropic, and other providers through a single local interface with Unsloth, an open-source project for running and training LLMs.
Once connected, you can run models with tool-calling, thinking, and other features in the same Unsloth chat interface used for both local and cloud models.
Providers
Connections fall into two groups: hosted API providers that run models for you, and model servers that you run or control.
Cloud Providers - Hosted APIs that use an account API key, such as OpenAI and Anthropic.
Model Servers - Inference servers running locally, on your network, or on a remote machine you control, such as llama.cpp, vLLM, and Ollama.
Quickstart
To run an external provider's model, add an API key and select which models Unsloth should show. In this example, we’ll use OpenAI. The same setup works for Anthropic and other providers.
Set Up Unsloth Studio
First, install and set up Unsloth so that you can run cloud models in its UI. See the installation documentation for more detailed instructions.
Step 1: Set Up Unsloth (macOS)
Open the Terminal app on your Mac, then install Unsloth by running the command below.
curl -fsSL https://unsloth.ai/install.sh | sh
The environment and required packages will now be installed. Type Y and press Enter when prompted to continue. After setup finishes, the server will be available locally on port 8888.

If you skipped starting the app during installation, you can launch it later with unsloth studio -p 8888. To allow connections from other devices on your network, use unsloth studio -H 0.0.0.0 -p 8888 instead.
Step 2: Start Unsloth
Open your browser and go to http://127.0.0.1:8888. If this is your first time installing Unsloth, you will be redirected to the Password page, where you create a new password. You should then see the Chat page.

Step 1: Set Up Unsloth (Windows)
Open the Start Menu, search for PowerShell, and launch it. Copy and run the Windows install command. Installation begins automatically. After installation finishes, PowerShell will ask if you want to start Unsloth Studio.

You can also launch it later with the unsloth studio command. To make your instance accessible to clients outside your PC, add -H 0.0.0.0 to the unsloth studio command.
Step 2: Start Unsloth
Open http://127.0.0.1:8888 in your browser. On first launch, create a new password to continue to the Chat page. Unsloth Studio is now installed and ready to use.

Step 1: Set Up Unsloth (Linux and WSL)
On Linux, open your terminal application. You can launch it by pressing Ctrl + Alt + T, or by searching for Terminal in your system's application menu.
On WSL, click the Windows Start Menu, type the name of your installed distro (e.g. Ubuntu), then open it.
On WSL, make sure your NVIDIA drivers are installed on Windows (not inside WSL) and that the CUDA toolkit is installed inside your WSL distro. See the System Requirements below for details.
To install, copy and run the install command:
Then:
Click inside the terminal window.
Paste the command with Ctrl + Shift + V.
Press Enter.
Unsloth will start setting up the environment and installing the required packages. Type Y and press Enter when asked if you want to allow Studio to start now. This starts Unsloth on local port 8888.

If you chose not to start Unsloth during installation, you can start it at any time with unsloth studio -p 8888. To make your Unsloth instance accessible to clients outside your PC, add -H 0.0.0.0 to the unsloth studio command.
Step 2: Start Unsloth
Open your browser and go to http://127.0.0.1:8888. If this is your first time installing Unsloth, you will be redirected to the Password page, where you create a new password. Unsloth then opens on the Chat page.
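If the page does not load, it can help to confirm that something is listening on the port before looking at the browser. The following is a minimal sketch, assuming the default address 127.0.0.1 and port 8888 described above; `is_port_open` is a hypothetical helper name, not part of Unsloth.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 8888, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open()` with no arguments checks the local default. A result of `False` suggests that the server has not been started or is running on a different port.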

Configure Connections
Next, connect your provider to Unsloth.
Open Settings → Connections, then click Add Provider.
Select the provider you want to add, then paste the API key from your provider account.
Click Reload Models to refresh the list with models available to your account.
Choose the models you want to enable, then click Save.

Connect a Model Server
Use this flow for llama.cpp, vLLM, and Ollama.
Start or locate the server you want to connect.
Start llama-server with the model you want to serve:
This exposes an API endpoint at: http://localhost:8080/v1
To require an API key, add the --api-key option followed by your key.
Start the vLLM server with the model you want to serve:
To require an API key, add the --api-key option followed by your key.
This exposes an API endpoint at: http://localhost:8000/v1
Start Ollama, then pull the model you want to use:
This exposes an API endpoint at: http://localhost:11434/v1
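When the server runs on another machine, only the host part of the Base URL changes. The sketch below builds the Base URL from the ports listed above (8080, 8000, and 11434). It assumes each server is running on the port shown in this guide; the `DEFAULT_PORTS` table and `base_url` helper are illustrative names, not part of Unsloth.

```python
# Ports as listed in this guide; a server started on another port needs its own value.
DEFAULT_PORTS = {
    "llama.cpp": 8080,
    "vLLM": 8000,
    "Ollama": 11434,
}


def base_url(server: str, host: str = "localhost") -> str:
    """Build the /v1 Base URL for a model server on the given host."""
    return f"http://{host}:{DEFAULT_PORTS[server]}/v1"
```

For example, `base_url("vLLM", host="192.168.1.20")` gives the URL to paste for a vLLM server on that (hypothetical) network address.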
Now we can connect the model server.
Open Settings → Connections, then click Add Provider.
Select llama.cpp, vLLM, or Ollama, then paste the server Base URL.

llama.cpp example:
http://localhost:8080/v1
Ollama example:
http://localhost:11434/v1
Click Load Models to fetch available model IDs, or enter model IDs manually if your server does not expose /models.
After you click Add Provider, the models you enabled appear under External in the Select Model dropdown.
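To see which model IDs a server reports before adding it, you can query its /models endpoint directly. This is a minimal sketch, assuming the server returns the OpenAI-style listing format (`{"data": [{"id": ...}]}`) and, if a key is required, accepts it as a Bearer token. The function names are illustrative, and this is not necessarily how Unsloth itself loads models.

```python
import json
import urllib.request
from typing import Optional


def models_url(base_url: str) -> str:
    """Build the model-listing URL from a Base URL such as http://localhost:8080/v1."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: dict) -> list:
    """Extract model IDs from an OpenAI-style {"data": [{"id": ...}, ...]} response."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_model_ids(base_url: str, api_key: Optional[str] = None, timeout: float = 5.0) -> list:
    """Fetch and return the model IDs exposed by the server at base_url."""
    request = urllib.request.Request(models_url(base_url))
    if api_key:
        # Assumption: the server expects the key as a Bearer token.
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_model_ids(json.load(response))
```

With a server running, `list_model_ids("http://localhost:8080/v1")` returns the IDs to enable. If the call fails because the server does not expose /models, enter the model IDs manually as described above.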
Web Search & Thinking
Provider-side web search is available for supported models from OpenAI, Anthropic, OpenRouter, Mistral, Gemini, and Kimi.

The Think control adapts to the selected model: some models use an on/off toggle, while reasoning-effort models use model-specific thinking levels.
Code Execution
When enabled, supported OpenAI and Anthropic models can run code in a provider sandbox to solve problems, analyse data, and work with files. Anthropic models use Claude’s provider-side Code execution tool.

OpenAI uses reusable containers, which you can create, delete, and select from Code Execution settings.
Select the same container in a new thread to continue with its files and state.
Prompt Caching
Prompt caching reduces latency and cost when requests reuse the same long prefix. It is supported for compatible providers and servers, including OpenAI, Anthropic, and llama.cpp.
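To get a feel for when caching can help, compare how much of one request is repeated at the start of the next. The sketch below is only an illustration: it uses whitespace-separated words as a stand-in for tokens, and it does not reflect how any provider or server implements its cache.

```python
def shared_prefix_length(previous: list, current: list) -> int:
    """Count how many leading items two sequences have in common."""
    count = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        count += 1
    return count


# Two requests that share the same system prompt (words stand in for tokens).
system = "You are a helpful assistant for the ACME docs.".split()
first = system + "How do I install it?".split()
second = system + "How do I update it?".split()

reused = shared_prefix_length(first, second)
print(f"{reused} of {len(second)} items match the previous request's prefix")
```

The longer the shared prefix relative to the whole request, the more work a cache can skip. Content that changes between requests is therefore best placed at the end of the prompt.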

Use the Prompt caching setting in the side panel to control caching behaviour for supported connections.
For llama.cpp, prompt caching is enabled by default and can be disabled when starting llama-server with:
Troubleshooting
If a provider fails to connect, check that the API key belongs to the selected provider and has access to the model you chose.
If a model does not appear after clicking Reload Models, it may not be available for your account. You can still use Unsloth’s default model list or choose another model.
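To check a key outside Unsloth, you can ask the provider's model-listing endpoint whether it accepts the key. The sketch below is a rough check under stated assumptions: it covers only OpenAI and Anthropic, it assumes their public /v1/models endpoints and header conventions (including the `2023-06-01` Anthropic version string), and the function names are illustrative.

```python
import urllib.error
import urllib.request


def build_models_request(provider: str, api_key: str) -> urllib.request.Request:
    """Build a model-listing request using the provider's header convention."""
    if provider == "openai":
        return urllib.request.Request(
            "https://api.openai.com/v1/models",
            headers={"Authorization": f"Bearer {api_key}"},
        )
    if provider == "anthropic":
        return urllib.request.Request(
            "https://api.anthropic.com/v1/models",
            headers={"x-api-key": api_key, "anthropic-version": "2023-06-01"},
        )
    raise ValueError(f"unsupported provider: {provider}")


def key_can_list_models(provider: str, api_key: str, timeout: float = 10.0) -> bool:
    """Return True if the provider accepts the key for listing models."""
    request = build_models_request(provider, api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # Covers rejected keys (for example 401) and other HTTP error statuses.
        # Network failures raise URLError and are left to propagate.
        return False
```

A `False` result for a key that works with a different provider usually means the key was pasted under the wrong provider entry.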