# How to Use MCP Servers with Local LLMs

This step-by-step guide shows you how to connect **Model Context Protocol (MCP)** servers to local LLMs like [Qwen](/docs/models/qwen3.6.md) or [Gemma](/docs/models/gemma-4.md), so any model you run can call external tools and services via MCP. With MCP connected, a local model can securely work with your local files, apps, databases, and tools instead of only chatting from memory, giving you a more useful, private, and interchangeable AI assistant that can act on your real environment.

We'll use the open-source repos [Unsloth](https://github.com/unslothai/unsloth) and [llama.cpp](#llama.cpp-guide) as they are popular frameworks for local model inference/deployment. MCP works for local GGUF models and cloud [provider models](/docs/integrations/connections.md). **We'll also show how** [**multiple MCP Servers**](#using-multiple-mcp-servers) **can be utilized.**

MCP tools work alongside other model capabilities such as [code execution](/docs/new/studio/chat.md#code-execution) and [web search](/docs/new/studio/chat.md#advanced-web-search), so a single model can search the web, run code, and call your connected services in the same thread.

### Use Cases

Once an MCP server is connected, you can ask your local model to do many automated tasks. A few examples:

* **Search docs:** “Find the relevant docs and summarize the setup steps.” - Context7 can be used.
* **Analyze a codebase:** “Map out this repo and explain where authentication, billing, and data access happen.” - The GitHub official MCP and GitMCP can be used to analyze repos.
* **Search the web with embeddings** - Exa's MCP Server can be used for semantic web searches with contextual embedding support.
* **Debug website UIs** - The Playwright and Chrome DevTools MCP Servers can be used to drive websites and track down issues.

### Quickstart

We'll cover two ways to connect a local model on your device to MCP Servers. Both use open-source packages, [Unsloth](https://github.com/unslothai/unsloth) and [llama.cpp](https://github.com/ggml-org/llama.cpp), to run, serve, and deploy your model.

<a href="/pages/txFFdu3I50OSJ0ZpPo0Z#unsloth-guide" class="button primary">Unsloth MCP Guide</a><a href="/pages/txFFdu3I50OSJ0ZpPo0Z#llama.cpp-guide" class="button primary">Llama.cpp MCP Guide</a>

### 🦥 Unsloth Guide

In this example we'll use Unsloth to connect any local model like [Qwen3.6](/docs/models/qwen3.6.md) or [Gemma 4](/docs/models/gemma-4.md) with the MCP Servers: [Vercel](https://mcp.vercel.com), [Context7](https://context7.com/), [Exa](https://exa.ai/) and [Hugging Face](https://huggingface.co/docs/hub/en/agents-mcp). We'll then ask a model what it can do with them. The same steps work for any MCP server.

{% stepper %}
{% step %}

#### Setup Unsloth Studio

First, install and set up [Unsloth](/docs/new/studio.md), which lets you run models in a browser UI. [See here](/docs/new/studio/install.md) for more detailed instructions.

{% tabs %}
{% tab title="macOS" %}

#### Step 1: Setup Unsloth

Open the Terminal app on your Mac, then install Unsloth by entering the command below.

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
```

The environment and required packages will now be installed. Type `Y` and press Enter when prompted to continue. After setup finishes, the server will be available locally on port `8888`.

<figure><img src="/files/kAxiYilqsmP233htYNpi" alt="" width="375"><figcaption></figcaption></figure>

{% hint style="info" %}
If you skipped starting the app during installation, you can launch it later with `unsloth studio -p 8888`. To allow connections from other devices on your network, use `unsloth studio -H 0.0.0.0 -p 8888` instead.
{% endhint %}

#### Step 2: Start Unsloth

Open your browser and go to `http://127.0.0.1:8888`. If this is your first time installing Unsloth, you will be taken to the Password page to create a new password. You should then see the Chat page.
{% endtab %}

{% tab title="Windows" %}

#### Step 1: Setup Unsloth

Open the Start Menu, search for `PowerShell`, and launch it. Copy & enter the install command:

```powershell
irm https://unsloth.ai/install.ps1 | iex
```

Installation will begin automatically. After it finishes, PowerShell will ask if you want to start Unsloth Studio.

<figure><img src="/files/kAxiYilqsmP233htYNpi" alt="" width="375"><figcaption></figcaption></figure>

You can also launch it later with the following command:

```bash
unsloth studio -p 8888
```

{% hint style="info" %}
To make your instance accessible to other devices on your network, add `-H 0.0.0.0` to the `unsloth studio` command.
{% endhint %}

#### Step 2: Start Unsloth

Open `http://127.0.0.1:8888` in your browser. On first launch, create a new password to continue to the Chat page. **Unsloth Studio** is now installed and ready to use.
{% endtab %}

{% tab title="Linux, WSL" %}

#### Step 1: Setup Unsloth

{% tabs %}
{% tab title="Linux" %}
Open your terminal application. You can launch it by pressing `Ctrl + Alt + T`, or by searching for `Terminal` in your system's application menu.
{% endtab %}

{% tab title="WSL" %}
Click the Windows Start Menu, type the name of your installed distro (e.g. `Ubuntu`), then open it.

{% hint style="warning" %}
On **WSL**, make sure your **NVIDIA drivers** are installed on **Windows** (not inside WSL) and that the **CUDA toolkit** is installed inside your WSL distro. See the [installation guide](/docs/new/studio/install.md) for details.
{% endhint %}
{% endtab %}
{% endtabs %}

To install, copy and run the install command:

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
```

To run it:

1. Click inside the terminal window
2. Paste the command with `Ctrl + Shift + V`
3. Press `Enter`

Unsloth will start setting up the environment and installing the required packages as shown below. Type **Y** and press `Enter` when asked if you want to allow Studio to start now. This starts Unsloth on local port **8888**.

<figure><img src="/files/uQP4sGPAd6C4MBSFdUTm" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
If you chose not to start Unsloth during installation, you can start it at any time with `unsloth studio -p 8888`. To make your Unsloth instance accessible to other devices on your network, add `-H 0.0.0.0` to the `unsloth studio` command.
{% endhint %}

#### Step 2: Start Unsloth

Open your browser and go to `http://127.0.0.1:8888`. If this is your first time installing Unsloth, you will be taken to the Password page to create a new password. Unsloth should then open on the Chat page.
{% endtab %}
{% endtabs %}
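
If the Chat page does not load, a quick check from the terminal can help separate a server problem from a browser problem. The sketch below assumes the default host and port used above. The `studio_url` and `check_studio` helper names are made up for this example, and the check only confirms that something answers on the port, not that it is Unsloth.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: build the Studio URL and check that something answers on it.

studio_url() {
  # Defaults match the host and port used in this guide.
  local host="${1:-127.0.0.1}" port="${2:-8888}"
  printf 'http://%s:%s\n' "$host" "$port"
}

check_studio() {
  local url code
  url="$(studio_url "$@")"
  # -s: quiet, -o /dev/null: discard the body, -w: print only the HTTP status code.
  # --max-time keeps the check from hanging if nothing is listening.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")" || code="000"
  if [ "$code" = "000" ]; then
    echo "No response from $url (is Unsloth running?)"
    return 1
  fi
  echo "Got HTTP $code from $url"
}
```

Run `check_studio` for the defaults, or pass a different host and port as arguments.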
{% endstep %}

{% step %}

#### **Enable MCP**

Click on "MCP" in the chat toolbar.

<figure><img src="/files/3IMplzUQRhaBXBubKhPF" alt=""><figcaption></figcaption></figure>

Unsloth Studio has built-in MCP support for Context7, Exa, and Hugging Face. Turning on Exa [web search](/docs/new/studio/chat.md#advanced-web-search) disables Studio's default search tool.
{% endstep %}

{% step %}

#### **Adding custom MCP servers**

To add the Vercel MCP server, click on "Add custom MCP", and you will get a pop-up:

<figure><img src="/files/LuT0jn405M75LDwWNsUl" alt=""><figcaption></figcaption></figure>

Fill in the server details:

1. **Display name**: a friendly label, e.g. `Vercel`.
2. **URL**: the server's base endpoint, e.g. `https://mcp.vercel.com`.
3. Choose an authentication method below.

{% tabs %}
{% tab title="OAuth sign-in" %}
For servers that require browser-based authentication (GitHub, Linear, Vercel, etc.), turn on **Use OAuth sign-in**. A browser window will open on first connect so you can authorize Unsloth.
{% endtab %}

{% tab title="Custom header" %}
For servers that authenticate with a token, leave OAuth off and click **Add header** under **Custom headers**. Add an `Authorization` header with your token:

```
Authorization: Bearer <your-token>
```

{% endtab %}
{% endtabs %}
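
To check a token outside the UI, you can send an MCP `initialize` request yourself. This is a minimal sketch, assuming the server speaks MCP's Streamable HTTP transport. `MCP_URL` and `MCP_TOKEN` are placeholders you set yourself, the helper names are made up for this example, and the `protocolVersion` string is an assumption that may need adjusting for your server. Unsloth's own **Test connection** check may work differently.

```bash
#!/usr/bin/env bash
# Sketch: probe a token-authenticated MCP server with a JSON-RPC "initialize" request.
# Assumptions: Streamable HTTP transport; helper names are hypothetical.

auth_header() {
  # Format the token the same way as the custom header shown above.
  printf 'Authorization: Bearer %s' "$1"
}

initialize_body() {
  # Minimal "initialize" request; the protocolVersion value is an assumption and
  # may need to match what your server supports.
  cat <<'EOF'
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"curl-probe","version":"0.1.0"}}}
EOF
}

probe_mcp() {
  local url="$1" token="$2"
  # Streamable HTTP servers may answer with plain JSON or an event stream,
  # so the request accepts both.
  curl -sS --max-time 15 -X POST "$url" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -H "$(auth_header "$token")" \
    -d "$(initialize_body)"
}
```

Call it as `probe_mcp "$MCP_URL" "$MCP_TOKEN"`. A reply that includes `serverInfo` suggests the URL and token were accepted, while an unauthorized error points at the header or token.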

<figure><img src="/files/ypHq3qstANJwVCFci8e5" alt="" width="563"><figcaption></figcaption></figure>
{% endstep %}

{% step %}

#### **Test & add**

Click **Test connection** to confirm Unsloth can reach the server. Once it succeeds, click **Add server** to save it.

If **Test connection** fails, check that the URL is the server's base endpoint (not a docs page) and that your authentication method is correct. See Troubleshooting below.

<figure><img src="/files/9g2gk2UgAZAFp8Oqfw9D" alt="" width="563"><figcaption></figcaption></figure>
{% endstep %}

{% step %}

#### **Verify tools loaded**

The server now appears in the MCP Servers list. Unsloth fetches its tools automatically and shows a confirmation, e.g. *Refreshed "Vercel" (18 tools)*.

Each server has controls to **toggle** it on/off, **refresh** its tools, **edit** it, or **delete** it. Make sure both the server's toggle and the **Use MCP Servers** master toggle are on, then close the dialog.

Enabled servers are highlighted in the list. Click a server again to disable it.

<figure><img src="/files/uTXK1w9siZsaQZgBRtFe" alt=""><figcaption></figcaption></figure>
{% endstep %}

{% step %}

#### **Use it in chat**

Pick any model from the **Select model** dropdown and start chatting. The model can now call the server's tools on its own when your request calls for it.

In the example below, a local `gemma-4-E2B-it-GGUF` model was asked *"Can you use Vercel MCP server?"* and reported the actions it can take: managing projects, analyzing logs, listing teams, generating access links, checking domains, and searching Vercel's documentation.

<figure><img src="/files/15xRAK3yz6zYoODaKsdf" alt=""><figcaption></figcaption></figure>
{% endstep %}
{% endstepper %}

#### Using multiple MCP servers

You can also enable several MCP servers at once. Here we enable all three of Unsloth Studio's default MCP servers: Exa, Context7, and Hugging Face.

If you ask "Can Unsloth support Qwen finetuning", the model calls Exa to search the web for details:

<figure><img src="/files/TCzCDw6SVZ9nvvFnR5xO" alt=""><figcaption></figcaption></figure>

Follow up with "Search Unsloth docs on how to do this", and the model uses Context7 to search the docs:

<figure><img src="/files/JsDySnc2Tzv3oJlgJsEZ" alt=""><figcaption></figcaption></figure>

Then type "Search Hugging Face for unsloth/Qwen models", and Hugging Face's MCP Server will be called:

<figure><img src="/files/nIM0obtKckwz0uBD6I8J" alt=""><figcaption></figcaption></figure>

#### Another Specific Use Case Example

Once a server is connected, ask the model to do real work in plain language. A few examples with the Vercel server:

* **Debug a failing build**: *"Pull the build logs for my latest deployment and tell me why it failed."*
* **Check deployment status**: *"List my most recent deployments and their status."*
* **Search the docs**: *"Search the Vercel docs for how to set up a custom domain."*
* **Domain research**: *"Is `myproject.dev` available, and how much would it cost?"*

### 🦙 Llama.cpp Guide

{% stepper %}
{% step %}

#### Install or build llama.cpp

**macOS:**

```bash
brew install llama.cpp
```

On **Linux, Windows, or WSL**, build from source:

```bash
git clone https://github.com/ggml-org/llama.cpp
cmake -S llama.cpp -B llama.cpp/build -DCMAKE_BUILD_TYPE=Release
cmake --build llama.cpp/build --config Release -j --target llama-server llama-cli
```
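
A source build leaves the binaries in `llama.cpp/build/bin` rather than on your `PATH`. The sketch below assumes you ran the commands above from the current directory and adds that folder to `PATH` for the current shell session only. `add_to_path` is a helper name made up here. With multi-config generators such as Visual Studio on Windows, the binaries may land in a `Release` subfolder instead.

```bash
#!/usr/bin/env bash
# Sketch: make a source-built llama-server callable by name in this shell session.
# Assumption: llama.cpp was cloned and built in the current directory, as above.

add_to_path() {
  local dir="$1"
  # Only prepend the directory if it is not already on PATH.
  case ":$PATH:" in
    *":$dir:"*) ;;
    *) PATH="$dir:$PATH" ;;
  esac
  export PATH
}

LLAMA_BIN="$PWD/llama.cpp/build/bin"
add_to_path "$LLAMA_BIN"

# Report whether the binary can now be found.
if command -v llama-server >/dev/null 2>&1; then
  echo "llama-server found at: $(command -v llama-server)"
else
  echo "llama-server not found; check that the build finished and that $LLAMA_BIN exists"
fi
```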

{% endstep %}

{% step %}

#### Start llama-server with a GGUF

We are using [Gemma 4](/docs/models/gemma-4.md) E4B GGUF in this example:

```bash
llama-server \
  -hf unsloth/gemma-4-E4B-it-GGUF:UD-Q4_K_XL \
  --alias local \
  --host 127.0.0.1 \
  --port 8080 \
  --no-ui \
  --temp 1.0 \
  --top-p 0.95 \
  --top-k 64 \
  --chat-template-kwargs '{"enable_thinking":false}'
```

For the bigger Gemma 4 26B-A4B model:

```bash
llama-server \
  -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_M \
  --alias local \
  --host 127.0.0.1 \
  --port 8080 \
  --no-ui \
  --temp 1.0 \
  --top-p 0.95 \
  --top-k 64 \
  --chat-template-kwargs '{"enable_thinking":false}'
```

For more inference parameter adjustments, see our [Gemma 4 guide](/docs/models/gemma-4.md).
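
The first launch has to download and load the GGUF, so requests sent too early will fail. A small polling loop against llama-server's `/health` endpoint can wait until the model is ready. This is a sketch: the function name `wait_for_llama`, the retry count, and the delay are choices made for this example.

```bash
#!/usr/bin/env bash
# Sketch: poll llama-server's /health endpoint until the model has finished loading.

wait_for_llama() {
  local base="${1:-http://127.0.0.1:8080}" attempts="${2:-60}" delay="${3:-2}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors (the server answers 503 while loading).
    if curl -sf --max-time 5 "$base/health" >/dev/null 2>&1; then
      echo "llama-server is ready at $base"
      return 0
    fi
    # Skip the pause after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "llama-server did not become ready at $base" >&2
  return 1
}
```

Run `wait_for_llama` in a second terminal after starting the server. It returns a non-zero status if the server never answers, so it can gate later commands with `&&`.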

You can test the server:

```bash
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer none" \
  -d '{
    "model": "local",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```
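
MCP hosts typically hand each MCP tool to the model as an OpenAI-style `tools` entry, so it is worth confirming that your model and server can emit tool calls at all. The request below is a sketch: the `list_directory` tool definition is a made-up stand-in (no MCP server is involved yet), the helper names are hypothetical, and depending on your llama.cpp version tool calling may require starting the server with `--jinja`.

```bash
#!/usr/bin/env bash
# Sketch: ask the local model a question that should trigger a tool call.
# "list_directory" is a hypothetical stand-in for a tool an MCP host would register.

tool_call_request() {
  cat <<'EOF'
{
  "model": "local",
  "messages": [
    {"role": "user", "content": "List the files in /tmp using the available tool."}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "list_directory",
        "description": "List the files in a directory.",
        "parameters": {
          "type": "object",
          "properties": {
            "path": {"type": "string", "description": "Directory to list"}
          },
          "required": ["path"]
        }
      }
    }
  ]
}
EOF
}

send_tool_call_request() {
  local base="${1:-http://127.0.0.1:8080}"
  # Same endpoint and headers as the hello test above, with a tools array added.
  curl -sS --max-time 120 "$base/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer none" \
    -d "$(tool_call_request)"
}
```

Run `send_tool_call_request` while the server is up. If the returned message contains a `tool_calls` array naming `list_directory`, the model can drive tools. A plain-text answer suggests tool calling is not enabled or not supported by the model's chat template.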

{% endstep %}

{% step %}

#### Create an MCP filesystem sandbox

```bash
mkdir -p ~/mcp-workspace
cd ~/mcp-workspace
pwd
```

Copy the absolute path. Create a separate project folder for your MCP host:

```bash
mkdir -p ~/llama-mcp
cd ~/llama-mcp
```

Create `server_config.json`:

{% code expandable="true" %}

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/ABSOLUTE/PATH/TO/mcp-workspace"
      ],
      "env": {}
    }
  }
}
```

{% endcode %}

Replace the path with your real workspace path. IBM’s `mcp-cli` docs use this same `server_config.json` shape and the same `npx -y @modelcontextprotocol/server-filesystem /path/to/allowed/files` filesystem configuration.
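
To avoid editing the path by hand, the config can be generated from the workspace folder itself. The sketch below writes the same JSON shape shown above. `write_server_config` is a helper name made up for this example, and it assumes the workspace path contains no double quotes or backslashes.

```bash
#!/usr/bin/env bash
# Sketch: write server_config.json with the workspace's real absolute path filled in.
# write_server_config is a hypothetical helper, not part of mcp-cli.

write_server_config() {
  local workspace out
  # Resolve the workspace to an absolute path; fail if it does not exist.
  workspace="$(cd "$1" && pwd)" || return 1
  out="${2:-server_config.json}"
  # Note: paths containing double quotes or backslashes would need JSON escaping.
  cat > "$out" <<EOF
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "$workspace"
      ],
      "env": {}
    }
  }
}
EOF
  echo "Wrote $out for workspace $workspace"
}
```

For the folders used in this guide, that would be `write_server_config ~/mcp-workspace ~/llama-mcp/server_config.json`.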
{% endstep %}

{% step %}

#### Run a terminal MCP host against llama.cpp

Use IBM’s `mcp-cli`; it is a command-line MCP client/host with chat mode, tool discovery, and custom OpenAI-compatible provider support. Its docs recommend `uvx mcp-cli --help`, support project `server_config.json`, and support runtime custom OpenAI-compatible providers via `--api-base` and `--api-key`.

Run:

```bash
uvx mcp-cli \
  --provider llamacpp \
  --api-base http://127.0.0.1:8080/v1 \
  --api-key none \
  --model local \
  --server filesystem \
  --config-file server_config.json
```

Then try:

```
List the files in the filesystem workspace.
```

Then:

```
Create hello.txt with a one-line greeting, then read it back.
```

`mcp-cli` has tool-call confirmation enabled by default, so you should see prompts before tool execution.

**Full code example:**

{% code expandable="true" %}

```bash
# terminal 1
llama-server -hf unsloth/gemma-4-E4B-it-GGUF:UD-Q4_K_XL \
  --alias local --host 127.0.0.1 --port 8080 --no-ui \
  --temp 1.0 --top-p 0.95 --top-k 64 \
  --chat-template-kwargs '{"enable_thinking":false}'

# terminal 2, in the folder with server_config.json
uvx mcp-cli \
  --provider llamacpp \
  --api-base http://127.0.0.1:8080/v1 \
  --api-key none \
  --model local \
  --server filesystem \
  --config-file server_config.json
```

{% endcode %}
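
Before running the two terminals, it can save time to confirm that every command this guide relies on is installed. This is a small sketch using the shell's built-in `command -v`; `missing_commands` is a helper name made up for this example.

```bash
#!/usr/bin/env bash
# Sketch: check that the commands used in this guide are installed before starting.
# missing_commands is a hypothetical helper.

missing_commands() {
  local cmd missing=""
  for cmd in "$@"; do
    # command -v succeeds only if the shell can find the command.
    if ! command -v "$cmd" >/dev/null 2>&1; then
      missing="$missing $cmd"
    fi
  done
  # Print the missing names without the leading space; print an empty line if none.
  echo "${missing# }"
}

missing="$(missing_commands llama-server npx uvx curl)"
if [ -n "$missing" ]; then
  echo "Missing commands: $missing"
else
  echo "All required commands found."
fi
```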
{% endstep %}
{% endstepper %}

### Troubleshooting

If a server fails to connect or its tools don't appear, check the URL is the server's base endpoint (e.g. `https://mcp.vercel.com`), not a docs or dashboard page. For OAuth servers, complete the browser sign-in when it opens; for token-based servers, verify the `Authorization` header and that the token is valid.

Click **Refresh** if tools don't show up after connecting, and make sure both the individual server toggle and the **Use MCP Servers** master toggle are on.
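
For the llama.cpp setup, if `mcp-cli` reports no tools, it can help to confirm that the filesystem server starts on its own. The sketch below sends a single MCP `initialize` message over stdio. It assumes Node.js and `npx` are installed, that `~/mcp-workspace` exists, and that the `protocolVersion` string is one your server version accepts. The helper names are made up for this example, and the exact reply and exit behavior may vary between server versions.

```bash
#!/usr/bin/env bash
# Sketch: start the filesystem MCP server by hand and send it one "initialize" message.
# Helper names are hypothetical; the protocolVersion value is an assumption.

stdio_initialize() {
  # MCP's stdio transport exchanges one JSON-RPC message per line.
  printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"stdio-probe","version":"0.1.0"}}}'
}

probe_filesystem_server() {
  local workspace="${1:-$HOME/mcp-workspace}"
  # The server reads the message from stdin and writes its reply to stdout.
  stdio_initialize | npx -y @modelcontextprotocol/server-filesystem "$workspace"
}
```

Run `probe_filesystem_server`. A JSON reply containing `serverInfo` suggests the server starts correctly, while an error about the directory points at the path in `server_config.json`.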

### Security notes

Only connect MCP servers you trust. Review requested permissions and keep human confirmation enabled for actions that read private data, change deployments, purchase domains, or modify projects. Be especially careful when combining MCP servers with web search or other tools, because prompt-injected content can try to trigger unwanted tool calls.

### Popular MCP Servers

Here is a list of some popular and useful MCP Servers you can connect to:

| MCP server                       | Best for                                 | Why it’s useful                                                                                                                                                                                                  |
| -------------------------------- | ---------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **GitHub MCP**                   | Repos, issues, PRs, code search, Actions | Official GitHub MCP with remote and local setup, covering repos, issues, pull requests, Actions, code security, and more. ([GitHub](https://github.com/github/github-mcp-server))                                |
| **Context7**                     | Fresh library docs and examples          | Pulls current, version-specific docs for coding assistants instead of relying on stale training data. Uses `https://mcp.context7.com/mcp`. ([GitHub](https://github.com/upstash/context7))                       |
| **Notion MCP**                   | Docs, notes, tasks, project knowledge    | Good for teams using Notion for specs, PRDs, roadmaps, or notes. Hosted MCP can read and write workspace content. ([Notion Developers](https://developers.notion.com/guides/mcp/overview))                       |
| **Slack MCP**                    | Team conversation search                 | Lets AI tools query Slack messages, channels, files, threads, and member info; actions depend on permissions. ([Slack](https://slack.com/help/articles/48855576908307-Guide-to-the-Slack-MCP-server))            |
| **Linear MCP**                   | Issues, projects, product workflows      | Official remote MCP for finding, creating, and updating Linear issues, projects, comments, and related objects. ([Linear](https://linear.app/changelog/2025-05-01-mcp))                                          |
| **Vercel MCP**                   | Deployments, logs, docs, domains         | Useful for frontend/web workflows: inspect deployments, logs, docs, and project context. Check client compatibility first. ([Vercel](https://vercel.com/docs/agent-resources/vercel-mcp))                        |
| **Sentry MCP**                   | Production debugging                     | Helps agents inspect Sentry issues, traces, errors, and performance data in human-in-the-loop dev workflows. ([GitHub](https://github.com/getsentry/sentry-mcp))                                                 |
| **Filesystem / local files MCP** | Local project files                      | Useful for local LLM setups, usually via `stdio`. Reference servers are best treated as examples or starting points. ([GitHub](https://github.com/modelcontextprotocol/servers))                                 |


