# How to Run Local LLMs with Claude Code

This step-by-step guide, with screenshots, shows how to connect open-source LLMs and APIs to Claude Code entirely locally. It works with any open model, such as Qwen3.6, DeepSeek, and Gemma.

In this tutorial we use [Gemma 4](/docs/zh/mo-xing/gemma-4.md) and [Qwen3.5](/docs/zh/mo-xing/qwen3.5.md), both strong agentic and coding models that run on devices with 24GB of RAM or unified memory. For inference we use [Unsloth Studio](https://github.com/unslothai/unsloth) and [`llama.cpp`](https://github.com/ggml-org/llama.cpp), which let you run and serve LLMs on macOS, Linux, and Windows. You can swap in [any other model](/docs/zh/mo-xing/tutorials.md) by updating the model name in the scripts.

For model quantization, we use Unsloth [Dynamic GGUFs](/docs/zh/ji-chu-zhi-shi/unsloth-dynamic-2.0-ggufs.md), which let you run any quantized LLM while retaining as much accuracy as possible.

## Claude Code Setup

Before setting up the local LLM, install Claude Code. Claude Code is a terminal-based coding agent that understands your codebase and handles complex Git workflows through natural language.

{% tabs %}
{% tab title="macOS, Linux, WSL" %}

#### **Install Claude Code:**

Paste the following into your terminal to install Claude Code:

```bash
curl -fsSL https://claude.ai/install.sh | bash
```

Once the installation finishes, go to your project folder and type `claude` in your shell to start.

```bash
cd ~/projects/my-project
claude
```

{% endtab %}
{% tab title="Windows" %}

#### **Install Claude Code:**

Open `PowerShell` and run the following to install Claude Code:

```powershell
irm https://claude.ai/install.ps1 | iex
```

Once the installation finishes, go to your project folder and type `claude` in PowerShell to start.

```powershell
cd /path/to/your/project
claude
```

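{% endtab %}
{% endtabs %}

### :detective: Fixing 90% slower inference in Claude Code

{% hint style="warning" %}
Recent versions of Claude Code prepend a Claude Code Attribution header, which **invalidates the KV cache and makes inference 90% slower with local models**.
{% endhint %}

The attribution is a single line prepended to the **start of the system prompt** (`x-anthropic-billing-header: cc_version=...; cch=...;`). Its value changes on every request, so the entire prompt prefix misses the KV cache on every turn. The simplest fix is to disable it when launching Claude Code, which requires no file edits:

{% code overflow="wrap" %}
```bash
claude --settings '{"env":{"CLAUDE_CODE_ATTRIBUTION_HEADER":"0","CLAUDE_CODE_ENABLE_TELEMETRY":"0"}}' --model unsloth/gemma-4-26B-A4B-it-GGUF
```
{% endcode %}

{% hint style="info" %}
Recent Claude Code versions also honor `export CLAUDE_CODE_ATTRIBUTION_HEADER=0`. Older versions ignore the shell variable, so the `--settings` form (or the settings file below) is the reliable option.
{% endhint %}

To make the fix permanent, set `CLAUDE_CODE_ATTRIBUTION_HEADER` to `0` under `"env"` in `~/.claude/settings.json`. For example, run `cat > ~/.claude/settings.json`, paste the content below, press Enter, then press CTRL+D to save. Note that `cat >` overwrites the file, so if you already have a `~/.claude/settings.json`, instead add `"CLAUDE_CODE_ATTRIBUTION_HEADER" : "0"` to its `"env"` section and leave the rest of the file unchanged.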

```json
{
  "promptSuggestionEnabled": false,
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "0",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    "CLAUDE_CODE_ATTRIBUTION_HEADER": "0"
  },
  "attribution": {
    "commit": "",
    "pr": ""
  },
  "plansDirectory": "./plans",
  "prefersReducedMotion": true,
  "terminalProgressBarEnabled": false,
  "effortLevel": "high"
}
```
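
If you already have a settings file, merging the key in is safer than overwriting the whole file. The following is a minimal Python sketch of that merge, not an official tool: the helper names `merge_env` and `update_settings_file` are hypothetical, and it assumes the file is plain JSON with an optional top-level `"env"` object, as in the example above.

```python
import json
from pathlib import Path

# Environment key from the settings example above.
KV_CACHE_ENV = {"CLAUDE_CODE_ATTRIBUTION_HEADER": "0"}


def merge_env(settings: dict, extra_env: dict) -> dict:
    """Return a copy of `settings` with `extra_env` merged into its "env" block."""
    merged = dict(settings)
    env = dict(merged.get("env", {}))
    env.update(extra_env)  # existing env keys are kept, matching keys are updated
    merged["env"] = env
    return merged


def update_settings_file(path: Path, extra_env: dict) -> dict:
    """Merge `extra_env` into the JSON file at `path`, creating the file if missing."""
    settings = json.loads(path.read_text()) if path.exists() else {}
    merged = merge_env(settings, extra_env)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merged, indent=2) + "\n")
    return merged
```

To apply it, you could call `update_settings_file(Path.home() / ".claude" / "settings.json", KV_CACHE_ENV)`. All other keys in the file are left as they were.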

## 📖 Quickstart Tutorial

{% columns %}
{% column %}
Before starting, set up the specific model you plan to use. We use [Unsloth](/docs/zh/xin/studio.md) (a web UI) and llama.cpp, both open-source frameworks for running and serving LLMs on Mac, Linux, and Windows devices.

Unsloth also offers self-healing [tool calling](/docs/zh/xin/studio/chat.md#auto-healing-tool-calling) and [web search](/docs/zh/xin/studio/chat.md#code-execution). The example on the right shows Claude Code connected to Unsloth:
{% endcolumn %}

{% column %}

{% endcolumn %}
{% endcolumns %}

## 🦥 Unsloth Tutorial

In this tutorial, we use [Unsloth](https://github.com/unslothai/unsloth) and its UI to serve a local model and connect it to Claude Code. Unsloth runs on Windows, WSL, Linux, and macOS.

{% columns %}
{% column %}
* Search, download, and [run GGUF](/docs/zh/xin/studio.md#run-models-locally) and safetensor models
* [**Self-healing** tool calling](/docs/zh/xin/studio.md#execute-code--heal-tool-calling) + **web search**
* [**Code execution**](/docs/zh/xin/studio.md#run-models-locally) (Python, Bash)
* [Automatic inference](/docs/zh/xin/studio.md#model-arena) parameter selection (temp, top-p, etc.)
* Fast CPU + GPU inference via llama.cpp
* [Train LLMs](/docs/zh/xin/studio.md#no-code-training) 2x faster with 70% less VRAM

Installation instructions follow:
{% endcolumn %}

{% column %}

{% endcolumn %}
{% endcolumns %}

{% tabs %}
{% tab title="MacOS" %}

#### Step 1: Set up Unsloth

Open `Terminal` on your Mac and run the command below to install Unsloth.

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
```

Unsloth sets up the environment and installs the required packages. When asked whether to let Studio launch now, type **Y** and press `Enter`. This starts Unsloth on local port **8888**.

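{% hint style="info" %}
If you chose not to launch Unsloth during installation, you can start the Unsloth app at any time with `unsloth studio -p 8888`. To make your Unsloth instance reachable by clients outside your computer, add `-H 0.0.0.0` to the `unsloth studio` command.
{% endhint %}

#### Step 2: Launch Unsloth

Open your browser and enter `http://127.0.0.1:8888` in the address bar. If this is your first time installing Unsloth, you are taken to a password page to create a new password. After that, Unsloth opens the chat page.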

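{% endtab %}
{% tab title="Windows" %}

#### Step 1: Set up Unsloth

Open the Start menu, search for `PowerShell`, and launch it. Copy and run the install command:

```powershell
irm https://unsloth.ai/install.ps1 | iex
```

Installation starts automatically. When it finishes, PowerShell asks whether you want to launch Unsloth Studio.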

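You can also launch it with the following command:

```bash
unsloth studio -H 0.0.0.0 -p 8888
```

{% hint style="info" %}
The `-H 0.0.0.0` option in the `unsloth studio` command is what makes your instance reachable by clients outside your computer.
{% endhint %}

#### Step 2: Launch Unsloth

Open `http://127.0.0.1:8888` in your browser. On first launch, create a new password to continue to the chat page. **Unsloth Studio** is now installed and ready to use.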

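{% endtab %}
{% tab title="Linux, WSL" %}

#### Step 1: Set up Unsloth

{% tabs %}
{% tab title="Linux" %}
Open your terminal application. You can press `Ctrl + Alt + T` or search for `Terminal` to launch it.
{% endtab %}

{% tab title="WSL" %}
Click the Windows Start menu, type the name of your installed distribution (for example `Ubuntu`), and open it.

{% hint style="warning" %}
On **WSL**, make sure your **NVIDIA driver** is installed on **Windows** (not inside WSL) and that the **CUDA toolkit** is installed in your WSL distribution. See the system requirements for details.
{% endhint %}
{% endtab %}
{% endtabs %}

To install, copy and run the install command:

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
```

To paste and run it in the terminal:

1. Click inside the terminal window
2. Paste with `Ctrl + Shift + V`
3. Press `Enter`

Unsloth sets up the environment and installs the required packages. When asked whether to let Studio launch now, type **Y** and press `Enter`. This starts Unsloth on local port **8888**.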

{% endtab %}
{% endtabs %}

### Model loading + API guide

{% stepper %}
{% step %}

#### Choose a model

Before using the API, load a model from the **Select model** dropdown at the top left of the chat page.

In this guide we use `unsloth/gemma-4-26B-A4B-it-GGUF` with the recommended `UD-Q4_K_XL` quant.
{% endstep %}

{% step %}

#### Test the model

Before connecting a client, send a short message:

{% hint style="info" %}
This confirms that the model loaded correctly and is ready to respond.
{% endhint %}
{% endstep %}

{% step %}

#### **Unsloth API key**

In Studio, open **Settings → API** to view or create your API key.

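Treat your API key like a password and do not expose it in screenshots or repositories.
{% endstep %}
{% endstepper %}

## ⚙️ Connect Claude Code

Now that the local LLM is set up, configure Claude Code to use it. You can connect with `unsloth start` as described below, or [manually](#connect-manually).

#### ⚡ Using `unsloth start`

The fastest way to point Claude Code at a local model is the `unsloth start` command. With Unsloth running and a model loaded, run this in your terminal:

```bash
unsloth start claude
```

This generates an API key, sets the `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, and `ANTHROPIC_MODEL` variables, applies the KV-cache fix for you, and launches Claude Code with your loaded model. You do not need to export anything or edit `~/.claude/settings.json` by hand. By default it uses the model already loaded in Unsloth. To load and use a specific model, pass `--model`:

```bash
unsloth start claude --model unsloth/gemma-4-26B-A4B-it-GGUF
```

Connecting to Unsloth on another machine? Create a key there (**Settings → API**), pass it with `--api-key`, and point `UNSLOTH_STUDIO_URL` at that server.

#### 🔌 Connect manually

If you prefer to set things up by hand, set the following environment variables. They do not persist between sessions by default.

{% tabs %}
{% tab title="MacOS, Linux, WSL" %}
**Configuration:**

Set the local API URL:

```bash
export ANTHROPIC_BASE_URL="http://localhost:8888"
```

Copy your key from Unsloth Studio → Settings → API (or from the console output of `unsloth run`, where it appears as `sk-unsloth-...`) and set it. Also set an empty `ANTHROPIC_API_KEY` so Claude Code does not prompt you for a cloud key:

```bash
export ANTHROPIC_AUTH_TOKEN="sk-unsloth-xxxxxxxxxxxx"
export ANTHROPIC_API_KEY=""
```

Optional: set the model currently loaded in Unsloth as the default.

```bash
export ANTHROPIC_MODEL="unsloth/gemma-4-26B-A4B-it-GGUF"
```

Use the full model ID, exactly as it appears in `GET http://localhost:8888/v1/models`. This is the same value you pass to `claude --model`.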
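
A mismatched model ID is easy to miss, so it can help to check it before launching. Below is a small Python sketch of such a check. It rests on two assumptions that this guide does not confirm: that `/v1/models` returns an OpenAI-style body of the form `{"data": [{"id": ...}]}`, and that the server accepts the key as a Bearer token. The function names are hypothetical.

```python
import json
import urllib.request


def served_model_ids(payload: dict) -> list:
    """Extract model IDs from an OpenAI-style /v1/models body (assumed shape)."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def fetch_models(base_url: str, token: str = "") -> dict:
    """GET {base_url}/v1/models, sending the key as a Bearer token (assumption)."""
    request = urllib.request.Request(base_url.rstrip("/") + "/v1/models")
    if token:
        request.add_header("Authorization", "Bearer " + token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def check_model(model_id: str, payload: dict) -> bool:
    """True only if `model_id` matches a served ID exactly."""
    return model_id in served_model_ids(payload)
```

With the server running, `check_model("unsloth/gemma-4-26B-A4B-it-GGUF", fetch_models("http://localhost:8888", "sk-unsloth-..."))` should return `True` when the ID matches. The comparison is exact, so a shortened name without the `unsloth/` prefix would fail.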
{% endtab %}
{% tab title="Windows" %}
**Configuration:**

Set the local API URL in PowerShell:

```powershell
$env:ANTHROPIC_BASE_URL = "http://localhost:8888"
```

Copy your key from **Unsloth Studio → Settings → API** and set it:

```powershell
$env:ANTHROPIC_AUTH_TOKEN = "sk-unsloth-xxxxxxxxxxxx"
```

**Optional:** set the model currently loaded in Unsloth as the default.

```powershell
$env:ANTHROPIC_MODEL = "unsloth/gemma-4-26B-A4B-it-GGUF"
```

{% hint style="info" %}
The model name should be the model currently loaded in Unsloth Studio.
{% endhint %}
{% endtab %}
{% endtabs %}

### Launch Claude Code

Launch Claude Code with the model currently loaded in Unsloth. We use `unsloth/gemma-4-26B-A4B-it-GGUF`, but any Unsloth-compatible model works.

```shellscript
claude --model unsloth/gemma-4-26B-A4B-it-GGUF
```

{% hint style="info" %}
To make local models a bit faster, you can also launch with `--bare --exclude-dynamic-system-prompt-sections`. See "Optional: shrink the system prompt" below.
{% endhint %}

Claude Code should open and show the selected model.

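{% hint style="warning" %}
First see [#fixing-90-slower-inference-in-claude-code](#fixing-90-slower-inference-in-claude-code "mention") to fix open models running 90% slower because of KV cache invalidation.
{% endhint %}

Try this prompt to research and rank high-quality SFT datasets:

{% code overflow="wrap" %}
```
You can only work in project/. Do not search for CLAUDE.md; this is it. Use web search to find 10 real instruction/chat/SFT datasets on Hugging Face, briefly summarize your findings as you research, and explain how each dataset is relevant to SFT, then create sft_report.md as a polished Markdown report with rank, dataset name, creator, 3-5 relevant tags, a short plain-language summary, and why it is useful for SFT. Keep everything concise and readable, with no giant metadata dumps, pasted raw descriptions, overly long tag lists, or irrelevant datasets. The task is complete once sft_report.md contains 10 clean, well-written dataset entries, ending with: "Successfully finetuned a model with Unsloth!"
```
{% endcode %}

After you submit the prompt, the agent searches the web, evaluates the results, and writes the final report. This can take a few minutes.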

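{% hint style="info" %}
Some workflows may require you to approve actions or answer follow-up questions.
{% endhint %}

When it finishes, the generated `sft_report.md` should look similar to this.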

{% hint style="warning" %}
If you see `Unable to connect to API (ConnectionRefused)`, remember to unset `ANTHROPIC_BASE_URL` with `unset ANTHROPIC_BASE_URL`.

If open models run 90% slower, [see here first](#fixing-90-slower-inference-in-claude-code) to fix the KV cache invalidation.
{% endhint %}

### Optional: shrink the system prompt

Claude Code is built for Anthropic's hosted models, so its default system prompt is large. With a local model, you can shrink it by adding two flags at launch, which gives faster responses and better KV-cache reuse:

{% code overflow="wrap" %}
```shellscript
claude --model unsloth/gemma-4-26B-A4B-it-GGUF --bare --exclude-dynamic-system-prompt-sections
```
{% endcode %}

{% hint style="info" %}
`--bare` skips auto-discovery of hooks, skills, plugins, MCP servers, and CLAUDE.md (Claude keeps Bash and file read/write), and `--exclude-dynamic-system-prompt-sections` moves per-machine sections out of the prompt prefix. Both shorten the prompt and improve KV-cache reuse, which makes local models noticeably faster. They are optional and do not change the connection setup above.
{% endhint %}

### Optional: tune the Unsloth server

Claude Code uses the model running in Unsloth. You can customize its behavior when starting the server.

```bash
# Serve as a coding agent backend: --disable-tools passes the agent's own
# tools through when driving Claude Code (or any external coding agent)
unsloth run \
    --model unsloth/gemma-4-26B-A4B-it-GGUF \
    --disable-tools \
    --reasoning off \
    -p 8888
```

{% hint style="warning" %}
Use `--disable-tools`. By default, Unsloth Studio runs its own server-side tools, which swallow the agent's tool calls, so Claude Code answers but never edits files. `--disable-tools` switches to passthrough mode so that Claude Code's own Write/Edit/Bash tools are used.
{% endhint %}

Use `--reasoning off` to turn thinking off, or `--reasoning on` to turn it on for models that support reasoning.

```bash
# Expose the API on your local network
unsloth run \
    --model unsloth/gemma-4-26B-A4B-it-GGUF \
    -H 0.0.0.0 \
    -p 8888
```

This starts the server on `0.0.0.0:8888`, which allows other devices on your local network to connect. Use `-p` to change the port, and `-H 0.0.0.0` if you want phones, laptops, or other devices on your network to connect. For more advanced runtime configuration, see the [API tuning](https://unsloth.ai/docs/basics/api#unsloth-run-command) section of the main tutorial.

## 🦙 Llama.cpp Tutorial

Before starting, set up the specific model you plan to use. We use `llama.cpp`, an open-source framework for running LLMs on Mac, Linux, Windows, and other devices. Llama.cpp includes `llama-server`, which lets you serve and deploy LLMs efficiently. The model is served on port 8001, and all agent tools are routed through a single OpenAI-compatible endpoint.

#### Qwen3.5 Tutorial

We use [Qwen3.5](/docs/zh/mo-xing/qwen3.5.md)-35B-A3B with settings tuned for fast, accurate coding tasks. If you are short on VRAM and want a **smarter** model, **Qwen3.5-27B** is a good choice, but it is about 2x slower than 35B-A3B. You can also use other Qwen3.5 variants such as 9B, 4B, or 2B.

{% hint style="info" %}
If you have enough VRAM, [**Qwen3-Coder-Next**](/docs/zh/mo-xing/qwen3-coder-next.md) is also an excellent choice.
{% endhint %}

{% stepper %}
{% step %}

#### Install llama.cpp

We need to install `llama.cpp` to deploy and serve local LLMs for Claude Code and similar tools. We follow the official build instructions to get correct GPU bindings and maximum performance. Change `-DGGML_CUDA=ON` to `-DGGML_CUDA=OFF` if you have no GPU or only want CPU inference. **For Apple Mac / Metal devices**, set `-DGGML_CUDA=OFF` and continue as usual; Metal support is on by default.

```bash
apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev git-all -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-mtmd-cli llama-server llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp
```

{% endstep %}

{% step %}

#### Download and use the model locally

Download the model with the `hf` CLI that ships with the `huggingface_hub` Python package (install it first with `pip install huggingface_hub hf_transfer`). We use the **UD-Q4\_K\_XL** quant for the best size/accuracy balance. You can find all Unsloth GGUF uploads in our [collection here](/docs/zh/kai-shi-shi-yong/unsloth-model-catalog.md). If the download stalls, see [Hugging Face Hub, XET debugging](/docs/zh/ji-chu-zhi-shi/troubleshooting-and-faqs/hugging-face-hub-xet-debugging.md).

```bash
hf download unsloth/Qwen3.5-35B-A3B-GGUF \
    --local-dir unsloth/Qwen3.5-35B-A3B-GGUF \
    --include "*UD-Q4_K_XL*" # Use "*UD-Q2_K_XL*" for Dynamic 2-bit
```
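
If you would rather download from a Python script, the same filter can be expressed with `snapshot_download` from `huggingface_hub`. This is a sketch that mirrors the command above rather than a replacement for it. The helper names `download_args` and `download` are hypothetical, and the `allow_patterns` argument stands in for the CLI's `--include` flag.

```python
def download_args(repo_id: str, quant: str = "UD-Q4_K_XL") -> dict:
    """Build keyword arguments that mirror the `hf download` command above."""
    return {
        "repo_id": repo_id,
        "local_dir": repo_id,  # same folder layout as --local-dir above
        "allow_patterns": ["*" + quant + "*"],  # only fetch files for this quant
    }


def download(repo_id: str, quant: str = "UD-Q4_K_XL") -> str:
    """Download the matching files and return the local folder path."""
    # Imported here so download_args works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_args(repo_id, quant))
```

Calling `download("unsloth/Qwen3.5-35B-A3B-GGUF")` starts the transfer. Pass `quant="UD-Q2_K_XL"` for the Dynamic 2-bit files.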

{% hint style="success" %}
We used `unsloth/Qwen3.5-35B-A3B-GGUF`, but you can use other variants such as 27B, or a model like `unsloth/`[`Qwen3-Coder-Next`](/docs/zh/mo-xing/qwen3-coder-next.md)`-GGUF`.
{% endhint %}

{% endstep %}

{% step %}

#### Start llama-server

To deploy Qwen3.5 for agentic workloads, we use `llama-server`. We apply [Qwen's recommended sampling parameters](/docs/zh/mo-xing/qwen3.5.md#recommended-settings) for thinking mode: `temp 0.6`, `top_p 0.95`, `top_k 20`. These values change if you use non-thinking mode or other tasks.

Run this command in a new terminal (use `tmux` or open a new terminal). The settings below should **fit entirely on a 24GB GPU (RTX 4090), using 23GB**. `--fit on` also offloads automatically, but lower `--ctx-size` if you see poor performance.

{% hint style="info" %}
We used `--cache-type-k q8_0 --cache-type-v q8_0` to quantize the KV cache and reduce VRAM use. For full precision, use `--cache-type-k bf16 --cache-type-v bf16`. Note that a bf16 KV cache can be slightly slower on some machines.
{% endhint %}

```bash
./llama.cpp/llama-server \
    --model unsloth/Qwen3.5-35B-A3B-GGUF/Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf \
    --alias "unsloth/Qwen3.5-35B-A3B" \
    --temp 0.6 \
    --top-p 0.95 \
    --top-k 20 \
    --min-p 0.00 \
    --port 8001 \
    --kv-unified \
    --cache-type-k q8_0 --cache-type-v q8_0
```

{% hint style="success" %}
You can also disable thinking for Qwen3.5, which can improve performance on agentic coding tasks. To disable thinking in llama.cpp, add the following to the llama-server command: `--chat-template-kwargs "{\"enable_thinking\": false}"`

{% endhint %}
{% endstep %}
{% endstepper %}

### Launch Claude Code with llama-server

{% hint style="success" %}
We used `unsloth/GLM-4.7-Flash-GGUF`, but you can use any similar model, such as `unsloth/Qwen3.6-27B-GGUF`.
{% endhint %}

{% hint style="warning" %}
First see [#fixing-90-slower-inference-in-claude-code](#fixing-90-slower-inference-in-claude-code "mention") to fix open models running 90% slower because of KV cache invalidation.
{% endhint %}

Go to your project folder (`mkdir project ; cd project`) and run:

```bash
claude --model unsloth/GLM-4.7-Flash
```

To use Qwen3.6-35B-A3B instead, change it to:

```bash
claude --model unsloth/Qwen3.6-35B-A3B
```
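
The launch commands above only reach `llama-server` if Claude Code is pointed at port 8001 instead of the Unsloth port. The sketch below shows one way to do that from Python without touching your shell profile. It assumes that the `ANTHROPIC_BASE_URL`, `ANTHROPIC_API_KEY`, and `ANTHROPIC_AUTH_TOKEN` variables from the manual connection section apply to `llama-server` as well, which this guide does not state explicitly. The helper names are hypothetical.

```python
import os
import subprocess


def claude_env(base_url: str = "http://localhost:8001", token: str = "") -> dict:
    """Copy the current environment and point Claude Code at a local server."""
    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = base_url
    env["ANTHROPIC_API_KEY"] = ""  # empty, as in the manual connection section
    if token:
        env["ANTHROPIC_AUTH_TOKEN"] = token  # only needed if the server expects a key
    # Honored by recent Claude Code versions; keeps the prompt prefix cacheable.
    env["CLAUDE_CODE_ATTRIBUTION_HEADER"] = "0"
    return env


def launch_claude(model: str, base_url: str = "http://localhost:8001") -> int:
    """Run `claude --model <model>` with the environment above; returns its exit code."""
    return subprocess.run(["claude", "--model", model], env=claude_env(base_url)).returncode
```

For example, `launch_claude("unsloth/GLM-4.7-Flash")` starts Claude Code in the current folder. Because the variables are set only for the child process, your shell environment is unchanged afterwards.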

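To let Claude Code execute commands without any approval, run the following **(warning: this lets Claude Code execute and run code however it likes, with no approval at all!)**

{% code overflow="wrap" %}
```bash
claude --model unsloth/GLM-4.7-Flash --dangerously-skip-permissions
```
{% endcode %}

Try this prompt to install Unsloth and run a simple fine-tune:

{% code overflow="wrap" %}
```
You can only work in the current working directory project/. Do not search for CLAUDE.md; this is it. Install Unsloth in a virtual environment using uv. If possible, first use `python -m venv unsloth_env`, then run `source unsloth_env/bin/activate`. See https://unsloth.ai/docs/get-started/install/pip-install for how to do it (fetch and read it). Then follow the instructions at https://github.com/unslothai/unsloth to do a simple Unsloth fine-tuning run. You can use 1 GPU.
```
{% endcode %}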

After a short wait, Unsloth is installed into a venv via uv and loaded:

Finally, you will see a model successfully fine-tuned with Unsloth!

{% hint style="warning" %}
If you see `Unable to connect to API (ConnectionRefused)`, remember to unset `ANTHROPIC_BASE_URL` with `unset ANTHROPIC_BASE_URL`.

If open models run 90% slower, [see here first](#fixing-90-slower-inference-in-claude-code) to fix the KV cache invalidation.
{% endhint %}

[^1]: This is required!