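📙Devstral 2 - How to Run Guide

A guide to running Mistral's Devstral 2 models locally: 123B-Instruct-2512 and Small-2-24B-Instruct-2512.

Devstral 2 is Mistral's new family of coding and agentic LLMs for software engineering, available in 24B and 123B sizes. The 123B model achieves state-of-the-art results on SWE-bench, coding, tool-calling and agentic use cases. The 24B model runs in 25GB of RAM/VRAM, and the 123B model in 128GB.

Update: December 13, 2025

We have fixed issues in Devstral's chat template, and results should be noticeably better. Both the 24B and 123B uploads have been updated. Please also install the latest llama.cpp as of December 13, 2025!

Devstral 2 supports vision and a 256k context window, and uses the same architecture as Ministral 3. You can now run and fine-tune both models locally with Unsloth.

All Devstral 2 uploads use our Unsloth Dynamic 2.0 methodology, which delivers the best performance on the Aider Polyglot and 5-shot MMLU benchmarks.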

Devstral-Small-2-24B Devstral-2-123B

Devstral 2 - Unsloth Dynamic GGUFs:

Devstral-Small-2-24B-Instruct-2512

Devstral-2-123B-Instruct-2512

Devstral-Small-2-24B-Instruct-2512-GGUF

Devstral-2-123B-Instruct-2512-GGUF

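🖥️ Running Devstral 2

See our step-by-step guides below for running the Devstral 24B and the larger Devstral 123B models. Both models support vision, but vision is not yet supported in llama.cpp.

⚙️ Usage Guide

Here are the recommended inference settings:

Temperature of about 0.15
Min_P of 0.01 (optional, but 0.01 works well; the llama.cpp default is 0.1)
Use --jinja to enable the system prompt.
Maximum context length = 262,144
Recommended minimum context: 16,384
Install the latest llama.cpp, since a pull request from December 13, 2025 fixes related issues.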
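
As a rough illustration, the settings above can be collected into a llama-cli argument list. This is a minimal sketch: `SamplingSettings` and `llama_cli_args` are hypothetical names introduced here, and the flags are the ones used elsewhere in this guide (--model, --jinja, --temp, --ctx-size) plus llama.cpp's --min-p.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingSettings:
    """Recommended Devstral 2 settings from the list above (hypothetical container)."""
    temperature: float = 0.15    # recommended temperature
    min_p: float = 0.01          # optional, but 0.01 works well
    ctx_size: int = 16_384       # recommended minimum context
    max_ctx_size: int = 262_144  # maximum context length


def llama_cli_args(model_path: str, s: SamplingSettings = SamplingSettings()) -> list[str]:
    """Build the argument list for llama-cli from the settings."""
    if not 0 < s.ctx_size <= s.max_ctx_size:
        raise ValueError(f"ctx_size must be in 1..{s.max_ctx_size}, got {s.ctx_size}")
    return [
        "--model", model_path,
        "--jinja",  # enables the system prompt via the chat template
        "--temp", str(s.temperature),
        "--min-p", str(s.min_p),
        "--ctx-size", str(s.ctx_size),
    ]


print(" ".join(llama_cli_args("model.gguf")))
```

The resulting list can be appended to `./llama.cpp/llama-cli` in a `subprocess.run` call; the context check only guards against requesting more than the 262,144-token maximum.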

🎩Devstral-Small-2-24B

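The full-precision (Q8) Devstral-Small-2-24B GGUF fits in 25GB of RAM/VRAM. Text only for now.

✨ Running Devstral-Small-2-24B-Instruct-2512 in llama.cpp

Obtain the latest llama.cpp from GitHub here. You can also follow the build instructions below. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or only want CPU inference. For Apple Mac / Metal devices, set -DGGML_CUDA=OFF and continue as usual; Metal support is on by default.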

apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-mtmd-cli llama-server llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp
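
The CUDA on/off choice described above can be scripted. The sketch below is one possible way to do it, assuming that finding `nvidia-smi` on the PATH is a good enough signal that a CUDA GPU is present; both helper names are hypothetical.

```python
import shutil


def cmake_configure_args(use_cuda: bool) -> list[str]:
    """Return the cmake configure command from the build steps, with CUDA toggled."""
    cuda_flag = "-DGGML_CUDA=ON" if use_cuda else "-DGGML_CUDA=OFF"
    return [
        "cmake", "llama.cpp", "-B", "llama.cpp/build",
        "-DBUILD_SHARED_LIBS=OFF", cuda_flag, "-DLLAMA_CURL=ON",
    ]


def has_nvidia_gpu() -> bool:
    """Heuristic (an assumption): a visible nvidia-smi binary means a CUDA GPU is usable."""
    return shutil.which("nvidia-smi") is not None


# Print the configure command that matches this machine.
print(" ".join(cmake_configure_args(has_nvidia_gpu())))
```

On a Mac or a CPU-only machine this prints the -DGGML_CUDA=OFF variant.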

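If you want to use llama.cpp directly to load the model, you can do so as shown below; the suffix after the colon (here UD-Q4_K_XL) is the quantization type. This pulls the model directly from Hugging Face: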

./llama.cpp/llama-cli \
    -hf unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF:UD-Q4_K_XL \
    --jinja -ngl 99 --ctx-size 16384 \
    --temp 0.15

Download the model as follows (after running pip install huggingface_hub hf_transfer). You can choose UD-Q4_K_XL or other quantized versions.

# !pip install huggingface_hub hf_transfer
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id = "unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF",
    local_dir = "unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF",
    allow_patterns = ["*UD-Q4_K_XL*", "*mmproj-F16*"], # For Q4_K_XL
)
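
To see which files a pair of `allow_patterns` would keep, the glob matching can be approximated offline with Python's `fnmatch`. This is a sketch only: `select_files` is a hypothetical helper, the file list is illustrative rather than the real repository listing, and the Q8_0 file name in particular is made up.

```python
from fnmatch import fnmatch


def select_files(filenames: list[str], allow_patterns: list[str]) -> list[str]:
    """Keep the files that match at least one glob pattern."""
    return [f for f in filenames if any(fnmatch(f, p) for p in allow_patterns)]


# Illustrative listing; only the UD-Q4_K_XL and mmproj names appear in this guide.
repo_files = [
    "Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL.gguf",
    "Devstral-Small-2-24B-Instruct-2512-Q8_0.gguf",  # hypothetical file name
    "mmproj-F16.gguf",
    "README.md",
]

selected = select_files(repo_files, ["*UD-Q4_K_XL*", "*mmproj-F16*"])
print(selected)
```

With these patterns only the UD-Q4_K_XL weights and `mmproj-F16.gguf` are kept, so other quantizations are not downloaded.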

Run the model in conversation mode:

./llama.cpp/llama-cli \
    --model unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF/Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL.gguf \
    --mmproj unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF/mmproj-F16.gguf \
    --ctx-size 16384 \
    --n-gpu-layers 99 \
    --seed 3407 \
    --prio 2 \
    --temp 0.15 \
    --jinja

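👀Devstral and Vision

To try Devstral's image capabilities, first download an image, such as this one from Unsloth's FP8 reinforcement learning release.
We fetch the image with wget https://unsloth.ai/cgi/image/fp8grpolarge_KharloZxEEaHAY2X97CEX.png?width=3840%26quality=80%26format=auto -O unsloth_fp8.png, which saves it as "unsloth_fp8.png".
Then, once the model has loaded, load the image with /image unsloth_fp8.png.
Finally, prompt it with "Describe this image" to get a description of the image.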
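
The same image could also be sent through an OpenAI-compatible chat endpoint instead of the interactive /image command. The sketch below only builds the message payload, using the standard OpenAI `image_url` content part with a base64 data URL. It assumes the server was started with --mmproj and that your llama.cpp build accepts image input for this model; `image_message` is a hypothetical helper.

```python
import base64


def image_message(image_bytes: bytes, prompt: str, mime: str = "image/png") -> dict:
    """Build a user message that carries a text prompt plus an inline image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }


# Usage (requires the downloaded file):
#   with open("unsloth_fp8.png", "rb") as f:
#       messages = [image_message(f.read(), "Describe this image")]
```

The resulting `messages` list can be passed to a chat completions call in place of a plain text message.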

🚚Devstral-2-123B

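The full-precision (Q8) Devstral-2-123B GGUF fits in 128GB of RAM/VRAM. Text only for now.

✨ Run Devstral-2-123B-Instruct-2512 Tutorial

Obtain the latest llama.cpp from GitHub here. You can also follow the build instructions below. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or only want CPU inference.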

apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-mtmd-cli llama-server llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp

You can pull the model directly from Hugging Face:

./llama.cpp/llama-cli \
    -hf unsloth/Devstral-2-123B-Instruct-2512-GGUF:UD-Q2_K_XL \
    --jinja -ngl 99 --ctx-size 16384 \
    --temp 0.15

Download the model as follows (after running pip install huggingface_hub hf_transfer). You can choose UD-Q4_K_XL or other quantized versions; the example below downloads UD-Q2_K_XL.

# !pip install huggingface_hub hf_transfer
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id = "unsloth/Devstral-2-123B-Instruct-2512-GGUF",
    local_dir = "unsloth/Devstral-2-123B-Instruct-2512-GGUF",
    allow_patterns = ["*UD-Q2_K_XL*", "*mmproj-F16*"],
)

Run the model in conversation mode:

./llama.cpp/llama-cli \
    --model unsloth/Devstral-2-123B-Instruct-2512-GGUF/Devstral-2-123B-Instruct-2512-UD-Q2_K_XL.gguf \
    --mmproj unsloth/Devstral-2-123B-Instruct-2512-GGUF/mmproj-F16.gguf \
    --ctx-size 16384 \
    --n-gpu-layers 99 \
    --seed 3407 \
    --prio 2 \
    --temp 0.15 \
    --jinja

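🦥 Fine-tuning Devstral 2 with Unsloth

Just like Ministral 3, Unsloth supports fine-tuning Devstral 2. Training is 2x faster, uses 70% less VRAM and supports 8x longer context lengths. Devstral 2 fits comfortably on an L4 GPU with 24GB of VRAM.

Unfortunately, Devstral 2 slightly exceeds the 16GB VRAM limit, so fine-tuning it for free on Google Colab is not possible for now. However, you can fine-tune the model for free using our free Kaggle notebook, which offers access to dual GPUs. Just change the notebook's Magistral model name to unsloth/Devstral-Small-2-24B-Instruct-2512.

We made free Unsloth notebooks to fine-tune Ministral 3, and they directly support Devstral 2 since the two share the same architecture! Change the model name to use the desired model.

Ministral-3B-Instruct vision notebook (change the model name to Devstral 2)
Ministral-3B-Instruct GRPO notebook (change the model name to Devstral 2)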

Devstral vision fine-tuning notebook (Google Colab, colab.research.google.com)

Devstral Sudoku GRPO reinforcement learning notebook (Google Colab, colab.research.google.com)

😎Llama-server Serving & Deployment

To deploy Devstral 2 for production, we use llama-server. In a new terminal (for example via tmux), deploy the model with:

./llama.cpp/llama-server \
    --model unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF/Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL.gguf \
    --mmproj unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF/mmproj-F16.gguf \
    --alias "unsloth/Devstral-Small-2-24B-Instruct-2512" \
    --n-gpu-layers 999 \
    --prio 3 \
    --min-p 0.01 \
    --ctx-size 16384 \
    --port 8001 \
    --jinja

After running the above command, the server starts and listens on port 8001.
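
Loading a large GGUF can take a while, so a client may want to wait until the server is ready before sending requests. The sketch below polls llama-server's /health endpoint, which answers with HTTP 200 once the model has loaded. The helper names, retry count and delay are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status such as 503 while loading.
        return False


def wait_until_ready(check, attempts: int = 30, delay: float = 1.0, sleep=time.sleep) -> bool:
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if check():
            return True
        sleep(delay)
    return False


# Usage against the server started above:
#   ready = wait_until_ready(lambda: is_healthy("http://127.0.0.1:8001"))
```

Passing the check as a callable keeps the retry logic independent of the HTTP call, which also makes it easy to test without a running server.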

Then, in a new terminal, after running pip install openai, run:

from openai import OpenAI
import json
openai_client = OpenAI(
    base_url = "http://127.0.0.1:8001/v1",
    api_key = "sk-no-key-required",
)
completion = openai_client.chat.completions.create(
    model = "unsloth/Devstral-Small-2-24B-Instruct-2512",
    messages = [{"role": "user", "content": "What is 2+2?"},],
)
print(completion.choices[0].message.content)

This should simply print 4.
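
The request above relies on the server's default temperature. To apply the recommended sampling settings per request, the keyword arguments can be built as in the sketch below. It assumes llama-server accepts a `min_p` field in the request body; because `min_p` is not a standard OpenAI parameter, it is passed through the openai client's `extra_body` option. `chat_kwargs` is a hypothetical helper.

```python
def chat_kwargs(model: str, prompt: str,
                temperature: float = 0.15, min_p: float = 0.01) -> dict:
    """Build keyword arguments for chat.completions.create with the recommended settings."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        # Non-standard sampling field, merged into the JSON request body by the client.
        "extra_body": {"min_p": min_p},
    }


kwargs = chat_kwargs("unsloth/Devstral-Small-2-24B-Instruct-2512", "What is 2+2?")
# Usage: completion = openai_client.chat.completions.create(**kwargs)
```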

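🧰Tool Calling with Devstral 2 Tutorial

After following the Devstral 2 llama-server setup above, we can load some tools and see Devstral in action! Let's create some tools: copy, paste and execute them in Python.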

import json, subprocess, random
from typing import Any
# Arithmetic tools: arguments may arrive as strings, so cast them to float
def add_number(a: float | str, b: float | str) -> float:
    return float(a) + float(b)
def multiply_number(a: float | str, b: float | str) -> float:
    return float(a) * float(b)
def substract_number(a: float | str, b: float | str) -> float:
    return float(a) - float(b)
def write_a_story() -> str:
    return random.choice([
        "Once upon a time, in a galaxy far, far away...",
        "There were two friends who loved sloths and code...",
        "The world was ending because every sloth evolved to have superhuman intelligence...",
        "Unbeknownst to one friend, the other accidentally coded a program to evolve sloths...",
    ])
def terminal(command: str) -> str:
    # Crude substring filter: rejects any command containing these strings
    if "rm" in command or "sudo" in command or "dd" in command or "chmod" in command:
        msg = "Cannot execute 'rm, sudo, dd, chmod' commands since they are dangerous"
        print(msg); return msg
    print(f"Executing terminal command `{command}`")
    try:
        return str(subprocess.run(command, capture_output = True, text = True, shell = True, check = True).stdout)
    except subprocess.CalledProcessError as e:
        return f"Command failed: {e.stderr}"
def python(code: str) -> str:
    # Runs the code and returns the resulting global variables as a string
    data = {}
    exec(code, data)
    del data["__builtins__"]
    return str(data)
# Maps each tool name to the Python function that implements it
MAP_FN = {
    "add_number": add_number,
    "multiply_number": multiply_number,
    "substract_number": substract_number,
    "write_a_story": write_a_story,
    "terminal": terminal,
    "python": python,
}
tools = [
    {
        "type": "function",
        "function": {
            "name": "add_number",
            "description": "Add two numbers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "string",
                        "description": "The first number.",
                    },
                    "b": {
                        "type": "string",
                        "description": "The second number.",
                    },
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "multiply_number",
            "description": "Multiply two numbers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "string",
                        "description": "The first number.",
                    },
                    "b": {
                        "type": "string",
                        "description": "The second number.",
                    },
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "substract_number",
            "description": "Subtract two numbers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "string",
                        "description": "The first number.",
                    },
                    "b": {
                        "type": "string",
                        "description": "The second number.",
                    },
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_a_story",
            "description": "Writes a random story.",
            "parameters": {
                "type": "object",
                "properties": {},
                "required": [],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "terminal",
            "description": "Perform operations from the terminal.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {
                        "type": "string",
                        "description": "The command you wish to launch, e.g. `ls`, `rm`, etc.",
                    },
                },
                "required": ["command"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "python",
            "description": "Call a Python interpreter with some Python code that will be run.",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "The Python code to run",
                    },
                },
                "required": ["code"],
            },
        },
    },
]
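
The schemas above all share one shape, so they could also be generated from the Python functions themselves. This is a minimal sketch, assuming every parameter is declared as a string as in the list above; `make_tool_schema` is a hypothetical helper, not part of any library, and `add_number` is repeated only to keep the block self-contained.

```python
import inspect


def make_tool_schema(fn, description: str, param_descriptions: dict[str, str]) -> dict:
    """Build an OpenAI-style function tool schema from a function's signature."""
    params = list(inspect.signature(fn).parameters)
    missing = [p for p in params if p not in param_descriptions]
    if missing:
        raise ValueError(f"missing descriptions for parameters: {missing}")
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": description,
            "parameters": {
                "type": "object",
                # Every parameter is typed as a string, matching the schemas above.
                "properties": {
                    p: {"type": "string", "description": param_descriptions[p]}
                    for p in params
                },
                "required": params,
            },
        },
    }


def add_number(a, b):
    return float(a) + float(b)


schema = make_tool_schema(
    add_number,
    "Add two numbers.",
    {"a": "The first number.", "b": "The second number."},
)
print(schema["function"]["name"], schema["function"]["parameters"]["required"])
```

Generating the list this way keeps the tool names in the schemas and in `MAP_FN` from drifting apart, at the cost of losing per-parameter types.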

We then ask a simple question, chosen at random from a list of possible messages, to test the model:

import random
messages = [{
    "role": "user",
    "content": [random.choice([
        {"type": "text", "text": "Could you write me a story?"},
        {"type": "text", "text": "What is today's date plus 3 days?"},
        {"type": "text", "text": "Get the current time in nanoseconds."},
        {"type": "text", "text": "Create a Fibonacci function in Python and find fib(20)."},
    ])],
}]

We then use the code below (copy, paste and execute), which parses the function calls automatically. Devstral 2 might make multiple calls in tandem!

temperature = 0.15
from openai import OpenAI
openai_client = OpenAI(
    base_url = "http://127.0.0.1:8001/v1",
    api_key = "sk-no-key-required",
)
# Use whichever model the server reports first
model_name = next(iter(openai_client.models.list())).id
print(f"Using model = {model_name}")
has_tool_calls = True
original_messages_len = len(messages)
while has_tool_calls:
    print(f"Current messages = {messages}")
    response = openai_client.chat.completions.create(
        model = model_name,
        messages = messages,
        temperature = temperature,
        tools = tools if tools else None,
        tool_choice = "auto" if tools else None,
    )
    tool_calls = response.choices[0].message.tool_calls or []
    content = response.choices[0].message.content or ""
    tool_calls_dict = [tc.to_dict() for tc in tool_calls] if tool_calls else tool_calls
    messages.append({"role": "assistant", "tool_calls": tool_calls_dict, "content": content,})
    # Run every requested tool and feed its output back to the model
    for tool_call in tool_calls:
        fx, args, _id = tool_call.function.name, tool_call.function.arguments, tool_call.id
        out = MAP_FN[fx](**json.loads(args))
        messages.append({"role": "tool", "tool_call_id": _id, "name": fx, "content": str(out),})
    # Keep looping only while the model is still requesting tools
    # (a for/else here would always end the loop after the first pass)
    has_tool_calls = len(tool_calls) > 0
print(json.dumps(messages[original_messages_len:], indent = 2))
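
The loop above assumes the model always names a known tool and sends valid JSON arguments. A more defensive dispatch step could return the error text to the model instead of raising, as in this sketch; `dispatch_tool_call` is a hypothetical helper, and the error message formats are arbitrary.

```python
import json


def dispatch_tool_call(name: str, arguments: str, registry: dict) -> str:
    """Run one tool call and always return a string, even on failure."""
    fn = registry.get(name)
    if fn is None:
        return f"Error: unknown tool '{name}'"
    try:
        kwargs = json.loads(arguments or "{}")
    except json.JSONDecodeError as e:
        return f"Error: arguments are not valid JSON ({e.msg})"
    if not isinstance(kwargs, dict):
        return "Error: arguments must be a JSON object"
    try:
        return str(fn(**kwargs))
    except Exception as e:  # report the failure to the model rather than crash the loop
        return f"Error: {type(e).__name__}: {e}"


# Small demo registry standing in for MAP_FN.
registry = {"add_number": lambda a, b: float(a) + float(b)}
print(dispatch_tool_call("add_number", '{"a": "1", "b": "2"}', registry))
print(dispatch_tool_call("no_such_tool", "{}", registry))
```

In the loop, `out = MAP_FN[fx](**json.loads(args))` could then become `out = dispatch_tool_call(fx, args, MAP_FN)`, so a malformed call turns into a tool message the model can react to.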

After about a minute, we get the following result, shown here in JSON form:

[
  {
    "role": "assistant",
    "tool_calls": [
      {
        "id": "JviLK0wUveWguuKQHgZdFdYI2adu85jy",
        "function": {
          "arguments": "{}",
          "name": "write_a_story"
        },
        "type": "function"
      }
    ],
    "content": null
  },
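  {
    "role": "tool",
    "tool_call_id": "JviLK0wUveWguuKQHgZdFdYI2adu85jy",
    "name": "write_a_story",
    "content": "Once upon a time, in a galaxy far, far away..."
  },
  {
    "role": "assistant",
    "tool_calls": null,
    "content": "In a galaxy far, far away, where the stars shone with an otherworldly glow, there was a planet called Eldoria. Eldoria was a world of stark contrasts: bustling cities nestled between towering mountains, while vast deserts stretched endlessly beneath twin suns. The people of Eldoria were known for their mastery of energy, especially their legendary laser swords.\n\nThese swords were more than weapons; they were extensions of the wielder's soul, forged from rare crystals that pulsed with the essence of the universe. Each sword was unique, its color and power reflecting its owner's personality and spirit. The most skilled warriors, known as the Swordborn, could wield their swords with such precision and grace that they seemed to dance with the fabric of reality.\n\nAmong the Swordborn was a warrior named Kael. Kael was an outcast, once a guardian of the Temple of the Sacred Sword, exiled for defying the Council's orders. The Council sought to monopolize the power of the laser swords and use it to control the people of Eldoria. Kael believed the swords should be used to protect, not to dominate.\n\nOne day, Kael received a distress signal from a small village on the edge of the desert. The village was under attack by a band of rogue Swordborn led by the ruthless warlord Vexis. Vexis intended to seize the village's ancient relic, a crystal said to amplify the power of any laser sword tenfold. If Vexis succeeded, his army would become unstoppable and Eldoria would fall into darkness.\n\nKael knew he had to act. He strapped on his own sword, a blade that glowed deep blue and brimmed with cosmic energy, and set out across the desert. The journey was perilous, with sandstorms and traps laid by Vexis's scouts at every turn. But Kael pressed on, driven by the memory of those he had once sworn to protect.\n\nWhen he reached the village, the battle was already raging. Vexis's warriors wielded their swords with brutal efficiency, cutting down the defenders with ease. Kael leapt into the fray, his blue blade flashing as he disarmed and defeated his foes one by one. Seeing their savior arrive, the villagers rallied, taking up their own swords and fighting desperately to reclaim their home.\n\nKael confronted Vexis in the village square. The warlord's sword glowed a sickly green, coursing with dark energy. “You're too late, Kael,” Vexis sneered. “The relic is mine, and with it I will rule Eldoria.” Kael stood his ground and raised his sword. “Over my dead body,” he replied.\n\nThe two warriors clashed, sparks flying from their blades. Kael could feel the relic's power flowing through Vexis's sword, but he refused to back down. He channeled his own energy, his blade burning ever brighter as he gradually withstood Vexis's onslaught. Finally, with one desperate lunge, Kael disarmed Vexis, sending his sword to the ground.\n\nVexis was furious in defeat, but Kael did not kill him. Instead, he offered Vexis a choice: “Join me and protect Eldoria together, or leave this place and never return.” Stunned, and faced with defeat and the truth, Vexis chose to stand with Kael.\n\nWith Vexis's forces turned into allies, Kael and the villagers reclaimed the relic and used its power to restore balance to Eldoria. The Temple of the Sacred Sword was rebuilt, and the laser swords were once again wielded by those who wished to protect rather than rule.\n\nKael's legend spread, and he became a symbol of hope for the people of Eldoria. His story reminds people that even in the darkest hour, the light of courage and justice can prevail. And so the legacy of the Swordborn lived on, their laser swords a beacon of strength and unity in the shadows of the galaxy."
  }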
]


Last updated 4 days ago

