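GLM-4.7-Flash: How to Run Locally

Run and fine-tune GLM-4.7-Flash locally on your device!

GLM-4.7-Flash is Z.ai's new 30B MoE reasoning model built for local deployment, delivering best-in-class performance for coding, agentic workflows, and chat. It uses about 3.6B active parameters, supports 200K context, and leads on SWE-Bench, GPQA, and reasoning/chat benchmarks.

GLM-4.7-Flash runs on 24GB of RAM/VRAM/unified memory (32GB for full precision), and you can now fine-tune it with Unsloth. To run GLM 4.7 Flash on vLLM, see the GLM-4.7-Flash in vLLM section.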

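Jan 21 update: llama.cpp fixed a bug that specified the wrong scoring_func: "softmax" (it should be "sigmoid"). The bug caused looping and poor outputs. We have updated the GGUFs. Please re-download the model for better outputs.

You can now use Z.ai's recommended parameters and get great results:

For general use-case: --temp 1.0 --top-p 0.95
For tool-calling: --temp 0.7 --top-p 1.0
Repeat penalty: disable it, or set --repeat-penalty 1.0

Jan 22: Faster inference is here, as the Flash Attention (FA) fix for CUDA has now been merged.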


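GLM-4.7-Flash GGUF to run: unsloth/GLM-4.7-Flash-GGUF

⚙️ Usage Guide

For best performance, make sure your total available memory (VRAM + system RAM) exceeds the size of the quantized model file you download. If it does not, llama.cpp can still run by offloading to SSD/HDD, but inference will be slower.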
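The memory rule above can be sketched as a quick check. This is a minimal illustration, not part of llama.cpp or Unsloth: the function name `fits_in_memory` and the example sizes are assumptions chosen for the example.

```python
def fits_in_memory(model_file_gb: float, vram_gb: float, ram_gb: float) -> bool:
    """Return True when VRAM + system RAM exceeds the quantized file size."""
    return (vram_gb + ram_gb) > model_file_gb

# Hypothetical machine: 8 GB VRAM + 16 GB RAM, with an 18 GB quantized file
print(fits_in_memory(18.0, vram_gb=8.0, ram_gb=16.0))

# Hypothetical smaller machine: 8 GB VRAM + 8 GB RAM (would need disk offloading)
print(fits_in_memory(18.0, vram_gb=8.0, ram_gb=8.0))
```

When the check fails, the model can still run with disk offloading as described above, only more slowly.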

After speaking with the Z.ai team, they recommend using their GLM-4.7 sampling parameters:

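| Default settings (most tasks) | Terminal Bench, SWE Bench Verified |
| --- | --- |
| temperature = 1.0 | temperature = 0.7 |
| top_p = 0.95 | top_p = 1.0 |
| repeat penalty = disabled or 1.0 | repeat penalty = disabled or 1.0 |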

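For general use-case: --temp 1.0 --top-p 0.95
For tool-calling: --temp 0.7 --top-p 1.0
If using llama.cpp, set --min-p 0.01, since llama.cpp's default is 0.05
You may sometimes need to experiment to find which values work best for your use case.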
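The recommended values can be kept in one place and turned into llama.cpp command-line flags. The sketch below is only a convenience helper: the names `PRESETS` and `sampling_flags` are hypothetical, while the flag names and values are the ones listed above.

```python
# Recommended sampling values copied from the list above
PRESETS = {
    "general": {"temp": 1.0, "top_p": 0.95},
    "tool_calling": {"temp": 0.7, "top_p": 1.0},
}

def sampling_flags(use_case: str, min_p: float = 0.01) -> list[str]:
    """Build the llama.cpp sampling flags for a use case."""
    preset = PRESETS[use_case]
    return [
        "--temp", str(preset["temp"]),
        "--top-p", str(preset["top_p"]),
        "--min-p", str(min_p),
        "--repeat-penalty", "1.0",  # 1.0 means the repeat penalty is off
    ]

print(" ".join(sampling_flags("general")))
print(" ".join(sampling_flags("tool_calling")))
```

The resulting list can be appended to a llama-cli or llama-server command.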

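For now, we do not recommend running this GGUF with Ollama because of potential chat template compatibility issues. The GGUF works well on llama.cpp (or backends such as LM Studio and Jan).

Remember to disable repeat penalty! Or set --repeat-penalty 1.0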
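To see why a repeat penalty of 1.0 is equivalent to disabling it, here is a minimal sketch. It assumes the common repetition-penalty rule, where the logit of an already-seen token is divided by the penalty when positive and multiplied by it otherwise. The function `apply_repeat_penalty` is illustrative and is not llama.cpp's actual code.

```python
def apply_repeat_penalty(logits: dict[int, float], seen: set[int], penalty: float) -> dict[int, float]:
    """Penalize the logits of tokens that already appeared in the context."""
    out = dict(logits)
    for tok in seen:
        if tok in out:
            # Positive logits shrink, negative logits grow more negative
            out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = {0: 2.0, 1: -1.0, 2: 0.5}  # hypothetical token logits
seen = {0, 1}                       # tokens that already appeared

# With penalty 1.0, dividing or multiplying by 1 leaves every logit unchanged
print(apply_repeat_penalty(logits, seen, 1.0) == logits)
# With penalty 2.0, the seen tokens are pushed down
print(apply_repeat_penalty(logits, seen, 2.0))
```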

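Maximum context window: 202,752

🖥️ Run GLM-4.7-Flash

Depending on your use case, you will need different settings. Some GGUFs end up similar in size because the model architecture (like gpt-oss) has dimensions that are not divisible by 128, so parts of the model cannot be quantized to lower bit widths.

Because this guide uses 4-bit, you will need around 18GB of RAM/unified memory. We recommend at least 4-bit precision for best performance.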
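A rough back-of-the-envelope estimate shows where a figure of this size comes from. The sketch below multiplies the parameter count by an average number of bits per weight. The 4.8 bits/weight value is an assumption picked for illustration, not a published figure for UD-Q4_K_XL, and real GGUF files mix several bit widths.

```python
def quantized_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

# 30B parameters at an assumed average of 4.8 bits per weight
print(round(quantized_size_gb(30e9, 4.8), 1))

# The same model at 16 bits per weight, for comparison
print(round(quantized_size_gb(30e9, 16), 1))
```

The estimate covers weights only; the context (KV cache) needs additional memory on top.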

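Remember to disable repeat penalty! Or set --repeat-penalty 1.0

Llama.cpp Tutorial (GGUF):

Instructions to run in llama.cpp (note we will use 4-bit to fit most devices):

Get the latest llama.cpp from GitHub (https://github.com/ggml-org/llama.cpp). You can also follow the build instructions below. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or only want CPU inference. For Apple Mac / Metal devices, set -DGGML_CUDA=OFF and then continue as usual - Metal support is on by default.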

apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-mtmd-cli llama-server llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp

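You can pull the model directly from Hugging Face. You can increase the context up to 200K as your RAM/VRAM allows.

You can also try Z.ai's recommended GLM-4.7 sampling parameters:

For general use-case: --temp 1.0 --top-p 0.95
For tool-calling: --temp 0.7 --top-p 1.0
Remember to disable repeat penalty!

Follow this for general instruction use-cases: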

./llama.cpp/llama-cli \
    -hf unsloth/GLM-4.7-Flash-GGUF:UD-Q4_K_XL \
    --ctx-size 16384 \
    --temp 1.0 --top-p 0.95 --min-p 0.01

Follow this for tool-calling use-cases:

./llama.cpp/llama-cli \
    -hf unsloth/GLM-4.7-Flash-GGUF:UD-Q4_K_XL \
    --ctx-size 16384 \
    --temp 0.7 --top-p 1.0 --min-p 0.01

Download the model with the commands below (after running pip install huggingface_hub). You can choose UD-Q4_K_XL or other quantized versions. If downloads get stuck, see Hugging Face Hub, XET debugging.

pip install -U huggingface_hub
hf download unsloth/GLM-4.7-Flash-GGUF \
    --local-dir unsloth/GLM-4.7-Flash-GGUF \
    --include "*UD-Q4_K_XL*"

Then run the model in conversation mode:

./llama.cpp/llama-cli \
    --model unsloth/GLM-4.7-Flash-GGUF/GLM-4.7-Flash-UD-Q4_K_XL.gguf \
    --ctx-size 16384 \
    --seed 3407 \
    --temp 1.0 \
    --top-p 0.95 \
    --min-p 0.01

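Also, adjust the context window as required, up to 202752.

➿ Reducing repetition and looping

Jan 21 update: llama.cpp fixed a bug that specified the wrong "scoring_func": "softmax" (it should be sigmoid), which caused looping and poor outputs. We have updated the GGUFs. Please re-download the model for better outputs.

This means you can now use Z.ai's recommended parameters and get great results:

For general use-case: --temp 1.0 --top-p 0.95
For tool-calling: --temp 0.7 --top-p 1.0
If using llama.cpp, set --min-p 0.01, since llama.cpp's default is 0.05
Remember to disable repeat penalty! Or set --repeat-penalty 1.0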
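The effect of lowering min-p can be illustrated with a small sketch. Min-p sampling keeps only tokens whose probability is at least min_p times the probability of the most likely token. The function `min_p_filter` and the probabilities below are hypothetical, and the code is a simplified illustration rather than llama.cpp's implementation.

```python
def min_p_filter(probs: list[float], min_p: float) -> list[float]:
    """Zero out tokens below min_p * max(probs), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

probs = [0.90, 0.06, 0.03, 0.01]  # hypothetical next-token probabilities

# min_p = 0.05 gives a threshold of 0.045, so the two rarest tokens are dropped
print(min_p_filter(probs, 0.05))
# min_p = 0.01 gives a threshold of 0.009, so all four tokens stay available
print(min_p_filter(probs, 0.01))
```

A lower min-p therefore leaves more low-probability tokens in play, which is the behavior the recommended 0.01 setting asks for.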

We added "scoring_func": "sigmoid" to config.json for the main model.
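To show how the two scoring functions differ, here is a minimal sketch applied to a few made-up router logits. It does not reproduce GLM-4.7-Flash's actual routing code; the logits and function names are assumptions for illustration only.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Scores that compete with each other and sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits: list[float]) -> list[float]:
    """Independent scores, each between 0 and 1."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

router_logits = [2.0, 1.0, -1.0, 0.0]  # hypothetical values for 4 experts

print(softmax(router_logits))
print(sigmoid(router_logits))
```

Both functions rank the experts in the same order here, but the resulting weights differ, so a model trained with one scoring function can behave poorly when run with the other.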

🐦 Flappy Bird example with UD-Q4_K_XL

As an example, we had the following long conversation using UD-Q4_K_XL, with the command ./llama.cpp/llama-cli --model unsloth/GLM-4.7-Flash-GGUF/GLM-4.7-Flash-UD-Q4_K_XL.gguf --fit on --temp 1.0 --top-p 0.95 --min-p 0.01 :

Hi
What is 2+2
Create a Flappy Bird game in Python
Create a totally different game in Rust
Find bugs in both games
Make the first game I mentioned a standalone HTML file
Find bugs and show the fixed game

It rendered the following Flappy Bird HTML game:

Flappy Bird game in HTML form

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
    <title>Flappy Bird Fixed</title>
    <style>
        body {
            margin: 0;
            display: flex;
            justify-content: center;
            align-items: center;
            height: 100vh;
            background-color: #222;
            font-family: 'Arial', sans-serif;
            overflow: hidden;
            user-select: none;
            -webkit-user-select: none;
            touch-action: none; /* Prevent zooming on mobile */
        }

        #game-container {
            position: relative;
            box-shadow: 0 0 20px rgba(0,0,0,0.5);
        }

        canvas {
            background-color: #87CEEB;
            display: block;
            border-radius: 4px;
        }

        /* UI overlay */
        #ui-layer {
            position: absolute;
            top: 0;
            left: 0;
            width: 100%;
            height: 100%;
            pointer-events: none;
            display: flex;
            flex-direction: column;
            justify-content: center;
            align-items: center;
            text-align: center;
        }

        #score-display {
            position: absolute;
            top: 40px;
            left: 50%;
            transform: translateX(-50%);
            font-size: 48px;
            font-weight: bold;
            color: white;
            text-shadow: 3px 3px 0 #000;
            z-index: 10;
            font-family: 'Courier New', Courier, monospace;
        }

        #start-screen, #game-over-screen {
            background: rgba(0, 0, 0, 0.7);
            width: 100%;
            height: 100%;
            display: flex;
            flex-direction: column;
            justify-content: center;
            align-items: center;
            color: white;
            pointer-events: auto; /* Allow clicks */
            cursor: pointer;
        }

        h1 { margin: 0 0 10px 0; font-size: 60px; text-shadow: 4px 4px 0 #000; line-height: 1; }
        p { font-size: 22px; margin: 10px 0; color: #ddd; }
        
        .btn {
            background: linear-gradient(to bottom, #ffeb3b, #fbc02d);
            border: 3px solid #fff;
            color: #333;
            padding: 15px 40px;
            font-size: 28px;
            font-weight: bold;
            cursor: pointer;
            border-radius: 8px;
            box-shadow: 0 6px 0 #c49000, 0 10px 10px rgba(0,0,0,0.3);
            text-transform: uppercase;
            transition: all 0.1s;
            margin-top: 10px;
        }

        .btn:active {
            transform: translateY(4px);
            box-shadow: 0 2px 0 #c49000, 0 4px 4px rgba(0,0,0,0.3);
        }

        .score-board {
            background: #ded895;
            border: 2px solid #543847;
            padding: 20px 40px;
            border-radius: 10px;
            box-shadow: 4px 4px 0 #543847;
            margin-bottom: 30px;
            display: none;
            border: 4px solid #543847;
        }
        
        .score-board h2 { margin: 0 0 5px 0; color: #e86101; font-size: 40px; }
        .score-board span { font-size: 20px; color: #543847; display: block; text-align: center; }

    </style>
</head>
<body>

    <div id="game-container">
        <canvas id="gameCanvas" width="400" height="600"></canvas>
        
        <div id="score-display">0</div>

        <div id="ui-layer">
            <div id="start-screen">
                <h1>FLAPPY<br>BIRD</h1>
                <p>Tap or press Space to start</p>
                <button class="btn" style="display:none;" id="touch-instruction">Tap to Start</button>
            </div>

            <div id="game-over-screen">
                <h1>GAME OVER</h1>
                <div class="score-board" id="score-board">
                    <h2>Score: <span id="final-score">0</span></h2>
                </div>
                <button class="btn" id="restart-btn">Try Again</button>
            </div>
        </div>
    </div>

<script>
    const canvas = document.getElementById('gameCanvas');
    const ctx = canvas.getContext('2d');

    // --- Constants ---
    const GRAVITY = 0.35; // Slightly stronger gravity for a better feel
    const JUMP_STRENGTH = -6.5;
    const PIPE_GAP = 180;
    const PIPE_WIDTH = 60;
    const PIPE_SPEED = 2.5;
    const PIPE_SPAWN_RATE = 100;

    // --- State ---
    let frames = 0;
    let score = 0;
    let isGameOver = false;
    let isPlaying = false;
    let gameLoopId;

    const ui = {
        startScreen: document.getElementById('start-screen'),
        gameOverScreen: document.getElementById('game-over-screen'),
        scoreDisplay: document.getElementById('score-display'),
        scoreBoard: document.getElementById('score-board'),
        finalScore: document.getElementById('final-score'),
        restartBtn: document.getElementById('restart-btn')
    };

    const bird = {
        x: 80,
        y: 150,
        radius: 12, // Fixed radius
        velocity: 0,
        
        draw: function() {
            // Rotate the bird based on velocity for visual flair
            let angle = Math.min(Math.PI / 4, Math.max(-Math.PI / 4, (this.velocity * 0.1)));
            
            ctx.save();
            ctx.translate(this.x, this.y);
            ctx.rotate(angle);
            
            // Draw the body
            ctx.fillStyle = '#FFD700';
            ctx.beginPath();
            ctx.arc(0, 0, this.radius, 0, Math.PI * 2);
            ctx.fill();
            
            // Eye
            ctx.fillStyle = 'white';
            ctx.beginPath();
            ctx.arc(4, -4, 4, 0, Math.PI * 2);
            ctx.fill();
            ctx.fillStyle = 'black';
            ctx.beginPath();
            ctx.arc(6, -4, 2, 0, Math.PI * 2);
            ctx.fill();
            
            // Wing
            ctx.fillStyle = '#FFA500';
            ctx.beginPath();
            ctx.arc(-4, 4, 5, 0, Math.PI * 2);
            ctx.fill();

            ctx.restore();
        },

        update: function() {
            this.velocity += GRAVITY;
            this.y += this.velocity;
        },

        jump: function() {
            this.velocity = JUMP_STRENGTH;
        },

        reset: function() {
            this.y = 150;
            this.velocity = 0;
        }
    };

    let pipes = [];

    function createPipe() {
        const minHeight = 50;
        const maxPos = canvas.height - PIPE_GAP - minHeight;
        const topHeight = Math.floor(Math.random() * (maxPos - minHeight + 1)) + minHeight;
        
        pipes.push({
            x: canvas.width,
            topHeight: topHeight,
            bottomY: topHeight + PIPE_GAP,
            width: PIPE_WIDTH,
            passed: false
        });
    }

    function drawPipes() {
        ctx.fillStyle = '#2ecc71';
        ctx.strokeStyle = '#27ae60';
        ctx.lineWidth = 2;
        
        pipes.forEach(pipe => {
            // Top pipe
            ctx.fillRect(pipe.x, 0, pipe.width, pipe.topHeight);
            ctx.strokeRect(pipe.x, 0, pipe.width, pipe.topHeight);
            
            // Bottom pipe
            ctx.fillRect(pipe.x, pipe.bottomY, pipe.width, canvas.height - pipe.bottomY);
            ctx.strokeRect(pipe.x, pipe.bottomY, pipe.width, canvas.height - pipe.bottomY);

            // Pipe caps
            const capH = 20;
            ctx.fillStyle = '#27ae60'; 
            ctx.fillRect(pipe.x - 2, pipe.topHeight - capH, pipe.width + 4, capH);
            ctx.fillRect(pipe.x - 2, pipe.bottomY, pipe.width + 4, capH);
        });
    }

    function updatePipes() {
        if (frames % PIPE_SPAWN_RATE === 0) createPipe();

        for (let i = 0; i < pipes.length; i++) {
            let p = pipes[i];
            p.x -= PIPE_SPEED;

            // --- Fixed collision detection ---
            // Treat the bird as a circle with radius 'bird.radius'
            // Treat each pipe as a rectangle: x, x+w, y_top, y_bottom
            let birdLeft = bird.x - bird.radius;
            let birdRight = bird.x + bird.radius;
            let birdTop = bird.y - bird.radius;
            let birdBottom = bird.y + bird.radius;

            // Horizontal overlap
            if (birdRight > p.x && birdLeft < p.x + p.width) {
                // Vertical overlap (hit the top pipe or the bottom pipe)
                if (birdTop < p.topHeight || birdBottom > p.bottomY) {
                    gameOver();
                }
            }

            // --- Fixed scoring ---
            // If the pipe has moved off the left edge of the screen and has not been scored yet
            if (p.x + p.width < 0 && !p.passed) {
                score++;
                p.passed = true;
                ui.scoreDisplay.innerText = score;
            }

            if (p.x < -60) {
                pipes.shift();
                i--;
            }
        }
    }

    function checkCollisions() {
        // Ground
        if (bird.y + bird.radius >= canvas.height) {
            gameOver();
        }
        // Ceiling
        if (bird.y - bird.radius <= 0) {
            bird.y = bird.radius;
            bird.velocity = 0;
        }
    }

    function drawBackground() {
        // Clear the canvas
        ctx.clearRect(0, 0, canvas.width, canvas.height);
        
        // Ground
        ctx.fillStyle = '#654321';
        ctx.fillRect(0, canvas.height - 10, canvas.width, 10);
        
        // Clouds
        ctx.fillStyle = "rgba(255, 255, 255, 0.6)";
        for(let i=0; i<4; i++) {
            let x = (frames * 0.5 + i * 150) % (canvas.width + 100) - 50;
            let y = (i * 40) + 20;
            let scale = 1 + (Math.sin(frames * 0.02 + i) * 0.1);
            let size = 30 * scale;
            ctx.beginPath();
            ctx.arc(x, y, size, 0, Math.PI * 2);
            ctx.arc(x + 20*scale, y - 10*scale, size * 1.2, 0, Math.PI * 2);
            ctx.arc(x + 40*scale, y, size, 0, Math.PI * 2);
            ctx.fill();
        }
    }

    function update() {
        if (!isPlaying) return;
        bird.update();
        updatePipes();
        checkCollisions();
        frames++;
    }

    function draw() {
        drawBackground();
        drawPipes();
        bird.draw();
    }

    function loop() {
        update();
        draw();
        if (isPlaying || !isGameOver) {
            gameLoopId = requestAnimationFrame(loop);
        }
    }

    function startGame() {
        isPlaying = true;
        isGameOver = false;
        
        // UI
        ui.startScreen.style.display = 'none';
        ui.gameOverScreen.style.display = 'none';
        ui.scoreBoard.style.display = 'none';
        
        // Logic
        bird.reset();
        pipes = [];
        score = 0;
        frames = 0;
        ui.scoreDisplay.innerText = '0';
        
        loop();
    }

    function gameOver() {
        isPlaying = false;
        isGameOver = true;
        cancelAnimationFrame(gameLoopId);
        
        ui.finalScore.innerText = score;
        ui.gameOverScreen.style.display = 'flex';
        ui.scoreBoard.style.display = 'block';
    }

    // --- Input handling ---

    function handleInput(e) {
        if (e.type === 'keydown' && e.code === 'Space') e.preventDefault();

        if (isPlaying) {
            bird.jump();
        } else if (!isGameOver) {
            // Click on the start screen (or any click before the game has started)
            startGame();
        }
    }

    // Keyboard
    window.addEventListener('keydown', (e) => {
        if (e.code === 'Space') handleInput(e);
    });

    // Mouse / Touch
    window.addEventListener('mousedown', handleInput);
    window.addEventListener('touchstart', (e) => {
        // Prevent zoom/scroll
        // e.preventDefault(); 
        handleInput(e);
    }, {passive: false});

    // UI interactions
    ui.restartBtn.addEventListener('click', (e) => {
        e.stopPropagation();
        startGame();
    });
    
    // Allow clicking the "Game Over" overlay to restart
    ui.gameOverScreen.addEventListener('mousedown', (e) => {
        if(e.target === ui.gameOverScreen) startGame();
    });
    ui.gameOverScreen.addEventListener('touchstart', (e) => {
        if(e.target === ui.gameOverScreen) {
            e.preventDefault();
            startGame();
        }
    });

    // Initial draw
    drawBackground();
    bird.reset();
    bird.draw();

</script>
</body>
</html>

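We also took some screenshots (4-bit works):

🦥 Fine-tuning GLM-4.7-Flash

Unsloth now supports fine-tuning GLM-4.7-Flash; however, you will need transformers v5. The 30B model does not fit on a free Colab GPU, but you can still use our notebook. 16-bit LoRA fine-tuning of GLM-4.7-Flash uses around 60GB of VRAM:

GLM-4.7-Flash SFT LoRA notebook

You may run out of memory on an A100 with 40GB of VRAM. Use an H100 or an A100 with 80GB of VRAM for smoother runs.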


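When fine-tuning MoEs, it is generally not a good idea to fine-tune the router layer, so we disable it by default. If you want to preserve the model's reasoning capabilities (optional), you can use a mix of direct-answer and chain-of-thought examples. Use at least 75% reasoning and 25% non-reasoning examples in your dataset so the model retains its reasoning capabilities.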
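One way to build such a mix is sketched below. This is a minimal illustration and not Unsloth's API: the function `mix_examples` and the placeholder example strings are hypothetical, and the seed simply reuses the value used elsewhere in this guide.

```python
import random

def mix_examples(reasoning: list, direct: list, reasoning_share: float = 0.75, seed: int = 3407) -> list:
    """Build a shuffled dataset with the requested share of reasoning examples."""
    # Largest total size that both pools can support at the requested ratio
    total = min(int(len(reasoning) / reasoning_share),
                int(len(direct) / (1.0 - reasoning_share)))
    n_reasoning = round(total * reasoning_share)
    n_direct = total - n_reasoning
    rng = random.Random(seed)
    mixed = rng.sample(reasoning, n_reasoning) + rng.sample(direct, n_direct)
    rng.shuffle(mixed)
    return mixed

# Placeholder examples standing in for real dataset rows
reasoning = [f"cot-{i}" for i in range(300)]
direct = [f"direct-{i}" for i in range(200)]

mixed = mix_examples(reasoning, direct)
share = sum(x.startswith("cot-") for x in mixed) / len(mixed)
print(len(mixed), share)
```

Sampling without replacement keeps every example unique; the smaller pool caps the final dataset size.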

🦙 Llama-server serving & deployment

To deploy GLM-4.7-Flash for production, we use llama-server. In a new terminal (for example via tmux), deploy the model with:

./llama.cpp/llama-server \
    --model unsloth/GLM-4.7-Flash-GGUF/GLM-4.7-Flash-UD-Q4_K_XL.gguf \
    --alias "unsloth/GLM-4.7-Flash" \
    --seed 3407 \
    --temp 1.0 \
    --top-p 0.95 \
    --min-p 0.01 \
    --ctx-size 16384 \
    --port 8001

Then in a new terminal, after running pip install openai, run:

from openai import OpenAI
import json
openai_client = OpenAI(
    base_url = "http://127.0.0.1:8001/v1",
    api_key = "sk-no-key-required",
)
completion = openai_client.chat.completions.create(
    model = "unsloth/GLM-4.7-Flash",
    messages = [{"role": "user", "content": "What is 2+2?"},],
)
print(completion.choices[0].message.content)

This will print

The user asks a simple question: "What is 2+2?" The answer is 4. Provide the answer.

2 + 2 = 4.

💻 GLM-4.7-Flash in vLLM

You can now use our new FP8 Dynamic quant of the model for premium and fast inference. First install vLLM from nightly:

uv pip install --upgrade --force-reinstall vllm --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly/cu130
uv pip install --upgrade --force-reinstall git+https://github.com/huggingface/transformers.git
uv pip install --force-reinstall numba

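Then serve Unsloth's dynamic FP8 version of the model. We enable FP8 to reduce KV cache memory usage by 50%, and we run on 4 GPUs. If you have 1 GPU, use CUDA_VISIBLE_DEVICES='0' and set --tensor-parallel-size 1 or remove this argument. To disable FP8, remove --quantization fp8 (if you added it) and --kv-cache-dtype fp8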

export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:False
CUDA_VISIBLE_DEVICES='0,1,2,3' vllm serve unsloth/GLM-4.7-Flash-FP8-Dynamic \
    --served-model-name unsloth/GLM-4.7-Flash \
    --tensor-parallel-size 4 \
    --tool-call-parser glm47 \
    --reasoning-parser glm45 \
    --enable-auto-tool-choice \
    --dtype bfloat16 \
    --seed 3407 \
    --max-model-len 200000 \
    --gpu-memory-utilization 0.95 \
    --max_num_batched_tokens 16384 \
    --port 8001 \
    --kv-cache-dtype fp8

You can then call the served model via the OpenAI API:

from openai import AsyncOpenAI, OpenAI
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8001/v1"
client = OpenAI( # or AsyncOpenAI
    api_key=openai_api_key,
    base_url=openai_api_base,
)

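⭐ vLLM GLM-4.7-Flash Speculative Decoding

We found that using the MTP (multi-token prediction) module from GLM 4.7 Flash makes generation throughput drop from 13,000 tokens/s to 1,300 tokens/s on 1 B200 (10x slower)! On Hopper, it should hopefully be fine.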

    --speculative-config.method mtp \
    --speculative-config.num_speculative_tokens 1

With MTP: only 1,300 tokens/s throughput on 1xB200 (130 tokens/s decode per user)

Without MTP: 13,000 tokens/s throughput on 1xB200 (still 130 tokens/s decode per user)
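One way to read these numbers is sketched below, assuming that aggregate throughput is roughly the per-user decode speed multiplied by the number of concurrent streams. That relationship is an assumption for illustration and was not measured here; the function name `concurrent_streams` is hypothetical.

```python
def concurrent_streams(total_tokens_per_s: float, per_user_tokens_per_s: float) -> float:
    """Estimate how many parallel streams a given aggregate throughput implies."""
    return total_tokens_per_s / per_user_tokens_per_s

without_mtp = concurrent_streams(13_000, 130)
with_mtp = concurrent_streams(1_300, 130)
slowdown = 13_000 / 1_300

print(without_mtp, with_mtp, slowdown)
```

Under this reading, per-user speed stays the same while the number of streams the GPU can sustain drops by the same 10x factor as the total throughput.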

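🔨 Tool calling with GLM-4.7-Flash

See the Tool Calling Guide for more details on how to do tool calling. In a new terminal (if using tmux, press CTRL+B then D to detach), we create some tools such as adding two numbers, executing Python code, executing Linux commands, and more: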

import json, subprocess, random
from typing import Any
def add_number(a: float | str, b: float | str) -> float:
    return float(a) + float(b)
def multiply_number(a: float | str, b: float | str) -> float:
    return float(a) * float(b)
def substract_number(a: float | str, b: float | str) -> float:
    return float(a) - float(b)
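def write_a_story() -> str:
    return random.choice([
        "A long time ago in a galaxy far far away...",
        "There were 2 friends who loved sloths and code...",
        "The world was ending because every sloth evolved to have superhuman intelligence...",
        "Unbeknownst to one friend, the other accidentally coded a program to evolve sloths...",
    ])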
def terminal(command: str) -> str:
    if "rm" in command or "sudo" in command or "dd" in command or "chmod" in command:
        msg = "Cannot execute 'rm, sudo, dd, chmod' commands since they are dangerous"
        print(msg); return msg
    print(f"Executing terminal command `{command}`")
    try:
        return str(subprocess.run(command, capture_output = True, text = True, shell = True, check = True).stdout)
    except subprocess.CalledProcessError as e:
        return f"Command failed: {e.stderr}"
def python(code: str) -> str:
    data = {}
    exec(code, data)
    del data["__builtins__"]
    return str(data)
MAP_FN = {
    "add_number": add_number,
    "multiply_number": multiply_number,
    "substract_number": substract_number,
    "write_a_story": write_a_story,
    "terminal": terminal,
    "python": python,
}
tools = [
    {
        "type": "function",
        "function": {
            "name": "add_number",
            "description": "Add two numbers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "string",
                        "description": "The first number.",
                    },
                    "b": {
                        "type": "string",
                        "description": "The second number.",
                    },
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "multiply_number",
            "description": "Multiply two numbers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "string",
                        "description": "The first number.",
                    },
                    "b": {
                        "type": "string",
                        "description": "The second number.",
                    },
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "substract_number",
            "description": "Subtract two numbers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "string",
                        "description": "The first number.",
                    },
                    "b": {
                        "type": "string",
                        "description": "The second number.",
                    },
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_a_story",
            "description": "Writes a random story.",
            "parameters": {
                "type": "object",
                "properties": {},
                "required": [],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "terminal",
            "description": "Perform operations from the terminal.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {
                        "type": "string",
                        "description": "The command you wish to launch, e.g. `ls`, `rm`, etc.",
                    },
                },
                "required": ["command"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "python",
            "description": "Call a Python interpreter with some Python code that will be run.",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "The Python code to run",
                    },
                },
                "required": ["code"],
            },
        },
    },
]
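The substring test in `terminal` above also rejects harmless commands such as `echo format`, because "format" contains "rm". A token-based check is sketched below as one possible alternative. The names `BLOCKED` and `is_blocked` are hypothetical, and this is still not a sandbox: shell operators such as `;` can hide a blocked program inside a single token.

```python
import shlex

BLOCKED = {"rm", "sudo", "dd", "chmod"}

def is_blocked(command: str) -> bool:
    """Return True if any token of the command is a blocked program name."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # Unparseable input (for example unbalanced quotes) is rejected
    # Compare only the last path component, so /bin/rm is treated like rm
    return any(tok.split("/")[-1] in BLOCKED for tok in tokens)

print(is_blocked("ls -la"))
print(is_blocked("echo format"))
print(is_blocked("sudo rm -rf /"))
print(is_blocked("/bin/rm file.txt"))
```

For anything beyond a demo, running model-generated commands inside an isolated container is the safer choice.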

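We then use the function below (copy, paste, and execute it), which parses the function calls automatically and calls the OpenAI endpoint for any model: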

import json
from openai import OpenAI
def unsloth_inference(
    messages,
    temperature = 0.7,
    top_p = 1.0,
    top_k = -1,
    min_p = 0.01,
    repetition_penalty = 0.0,
):
    messages = messages.copy()
    openai_client = OpenAI(
        base_url = "http://127.0.0.1:8001/v1",
        api_key = "sk-no-key-required",
    )
    # Use whichever model the server reports first
    model_name = next(iter(openai_client.models.list())).id
    print(f"Using model = {model_name}")
    has_tool_calls = True
    original_messages_len = len(messages)
    while has_tool_calls:
        print(f"Current messages = {messages}")
        response = openai_client.chat.completions.create(
            model = model_name,
            messages = messages,
            temperature = temperature,
            top_p = top_p,
            tools = tools if tools else None,
            tool_choice = "auto" if tools else None,
            # repetition_penalty is forwarded as dry_multiplier; 0.0 leaves it disabled
            extra_body = {"top_k": top_k, "min_p": min_p, "dry_multiplier": repetition_penalty,}
        )
        tool_calls = response.choices[0].message.tool_calls or []
        content = response.choices[0].message.content or ""
        tool_calls_dict = [tc.to_dict() for tc in tool_calls] if tool_calls else tool_calls
        messages.append({"role": "assistant", "tool_calls": tool_calls_dict, "content": content,})
        # Run every requested tool and feed its output back to the model
        for tool_call in tool_calls:
            fx, args, _id = tool_call.function.name, tool_call.function.arguments, tool_call.id
            out = MAP_FN[fx](**json.loads(args))
            messages.append({"role": "tool", "tool_call_id": _id, "name": fx, "content": str(out),})
        # Stop once the model answers without requesting any tool
        if not tool_calls:
            has_tool_calls = False
    return messages
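The dispatch step inside the loop can be tried offline, without a running server. The sketch below is a simplified stand-in: `TOOLS` plays the role of MAP_FN, `run_tool_call` is a hypothetical helper, and the tool call values are made up to mirror the name, arguments, and id fields read from the response.

```python
import json

def add_number(a, b):
    return float(a) + float(b)

TOOLS = {"add_number": add_number}  # stand-in for MAP_FN

def run_tool_call(name: str, arguments_json: str, call_id: str) -> dict:
    """Execute one tool call and build the matching "tool" message."""
    out = TOOLS[name](**json.loads(arguments_json))
    return {"role": "tool", "tool_call_id": call_id, "name": name, "content": str(out)}

# Hypothetical tool call: the arguments arrive as a JSON string
msg = run_tool_call("add_number", '{"a": "2", "b": "3"}', "call_0")
print(msg)
```

The returned message is what gets appended to the conversation so the model can read the tool's result on its next turn.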

After launching GLM-4.7-Flash via llama-server, as in the Llama-server section above (or see the Tool Calling Guide for more details), we can make some tool calls:

Tool call for mathematical operations with GLM 4.7

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "What is today's date plus 3 days?"}],
}]
unsloth_inference(messages, temperature = 1.0, top_p = 0.95, top_k = -1, min_p = 0.01)

Tool call to execute generated Python code with GLM-4.7-Flash

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "Create a Fibonacci function in Python and find fib(20)."}],
}]
unsloth_inference(messages, temperature = 1.0, top_p = 0.95, top_k = -1, min_p = 0.01)

Benchmarks

GLM-4.7-Flash is the best-performing model in the 30B class on every benchmark below except AIME 25 and LCB v6.

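| Benchmark | GLM-4.7-Flash | Qwen3-30B-A3B-Thinking-2507 | GPT-OSS-20B |
| --- | --- | --- | --- |
| AIME 25 | 91.6 | 85.0 | 91.7 |
| GPQA | 75.2 | 73.4 | 71.5 |
| LCB v6 | 64.0 | 66.0 | 61.0 |
| HLE | 14.4 | 9.8 | 10.9 |
| SWE-bench Verified | 59.2 | 22.0 | 34.0 |
| τ²-Bench | 79.5 | 49.0 | 47.7 |
| BrowseComp | 42.8 | 2.29 | 28.3 |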
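The per-benchmark leaders can be checked directly from the scores listed above. The sketch below copies those scores into a dictionary; the variable names are hypothetical.

```python
# Scores copied from the benchmark table
scores = {
    "AIME 25": {"GLM-4.7-Flash": 91.6, "Qwen3-30B-A3B-Thinking-2507": 85.0, "GPT-OSS-20B": 91.7},
    "GPQA": {"GLM-4.7-Flash": 75.2, "Qwen3-30B-A3B-Thinking-2507": 73.4, "GPT-OSS-20B": 71.5},
    "LCB v6": {"GLM-4.7-Flash": 64.0, "Qwen3-30B-A3B-Thinking-2507": 66.0, "GPT-OSS-20B": 61.0},
    "HLE": {"GLM-4.7-Flash": 14.4, "Qwen3-30B-A3B-Thinking-2507": 9.8, "GPT-OSS-20B": 10.9},
    "SWE-bench Verified": {"GLM-4.7-Flash": 59.2, "Qwen3-30B-A3B-Thinking-2507": 22.0, "GPT-OSS-20B": 34.0},
    "τ²-Bench": {"GLM-4.7-Flash": 79.5, "Qwen3-30B-A3B-Thinking-2507": 49.0, "GPT-OSS-20B": 47.7},
    "BrowseComp": {"GLM-4.7-Flash": 42.8, "Qwen3-30B-A3B-Thinking-2507": 2.29, "GPT-OSS-20B": 28.3},
}

# Highest-scoring model for each benchmark
leaders = {bench: max(row, key=row.get) for bench, row in scores.items()}
not_led = [bench for bench, model in leaders.items() if model != "GLM-4.7-Flash"]

print(leaders)
print(not_led)
```

GLM-4.7-Flash leads five of the seven rows; GPT-OSS-20B is ahead on AIME 25 by 0.1 points and Qwen3-30B-A3B-Thinking-2507 is ahead on LCB v6 by 2.0 points.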

