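🌠Qwen3-Coder-Next: How to Run Locally

A guide to running Qwen3-Coder-Next locally!

Qwen has released Qwen3-Coder-Next, an 80B MoE model (3B active parameters) with 256K context, built for fast agentic coding and local use. Its performance is comparable to models with 10-20x more active parameters.

It runs on 46GB of RAM/VRAM/unified memory (85GB for 8-bit) and is a non-thinking model, which gives ultra-fast code responses. The model excels at long-horizon reasoning, complex tool use, and recovery from execution failures.

Feb 19 update: Tool calling should now work even better after llama.cpp's parsing fix.

New! See the quantization benchmarks for our Dynamic GGUFs!

Feb 4: llama.cpp fixed a bug in the calculation of the vectorized key_gdiff. This fixes the earlier looping and output issues. We updated the GGUFs - please re-download them and update llama.cpp for better outputs.

You will also learn how to run the model with Codex & Claude Code. For fine-tuning, Qwen3-Coder-Next fits on a single B200 GPU when using bf16 LoRA in Unsloth.

Qwen3-Coder-Next Unsloth Dynamic GGUFs to run: unsloth/Qwen3-Coder-Next-GGUF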


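⚙️ Usage Guide

Don't have 46GB of RAM or unified memory? No worries, you can run a smaller quant such as 3-bit. Ideally the model size should not exceed the sum of your compute resources (disk space + RAM + VRAM ≥ size of quant). If the quant fits fully on your device, expect 20+ tokens/s. If it doesn't fit, it will still work by offloading, but it will be slower.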
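The sizing rule above can be sketched as a small check. This is a minimal illustration, not part of Unsloth or llama.cpp: the function name `placement` and the example hardware sizes are hypothetical, and real memory use also includes the KV cache and runtime overhead, which this sketch ignores.

```python
def placement(quant_gb: float, vram_gb: float, ram_gb: float, disk_gb: float) -> str:
    """Classify where a quant of `quant_gb` gigabytes would have to live.

    Hypothetical helper: it only compares sizes and ignores the KV cache
    and runtime overhead, so treat the result as a rough guide.
    """
    if quant_gb <= vram_gb + ram_gb:
        return "fits in memory"          # fastest case
    if quant_gb <= vram_gb + ram_gb + disk_gb:
        return "runs with disk offload"  # works, but slower
    return "does not fit"


# Example machine (made up): 24GB VRAM, 32GB RAM, 100GB free disk.
print(placement(46, vram_gb=24, ram_gb=32, disk_gb=100))  # 4-bit quant
print(placement(85, vram_gb=24, ram_gb=32, disk_gb=100))  # 8-bit quant
```

On this example machine the 4-bit quant fits in memory, while the 8-bit quant would need disk offload.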

For optimal performance, Qwen recommends the following settings:

Temperature = 1.0
Top_P = 0.95
Top_K = 40
Min_P = 0.01 (the llama.cpp default is 0.05)
Repeat penalty = disabled or 1.0
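As a sketch, the recommended settings can be kept in one place and split into the arguments the OpenAI Python client accepts by name (`temperature`, `top_p`) and the ones passed through `extra_body`, which is how the tool-calling code later in this guide sends `top_k`, `min_p` and `repetition_penalty`. The names `RECOMMENDED`, `STANDARD_KEYS` and `to_request_kwargs` are illustrative only.

```python
RECOMMENDED = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 40,
    "min_p": 0.01,
    "repetition_penalty": 1.0,  # 1.0 means the penalty is disabled
}

# Settings that chat.completions.create() accepts as named arguments.
STANDARD_KEYS = {"temperature", "top_p"}


def to_request_kwargs(settings: dict) -> dict:
    """Split settings into named arguments plus an `extra_body` dict."""
    standard = {k: v for k, v in settings.items() if k in STANDARD_KEYS}
    extra = {k: v for k, v in settings.items() if k not in STANDARD_KEYS}
    return {**standard, "extra_body": extra}


kwargs = to_request_kwargs(RECOMMENDED)
print(kwargs)
```

The result can be unpacked into a request, for example `client.chat.completions.create(model=..., messages=..., **kwargs)`.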

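The model natively supports up to 262,144 tokens of context, but you can set it to 32,768 tokens to reduce memory use.

🖥️ Run Qwen3-Coder-Next

Depending on your use case you will need different settings. Because this guide uses 4-bit, you will need around 46GB of RAM/unified memory. We recommend at least 3-bit precision for best performance.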


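NOTE: This model supports only non-thinking mode and does not generate <think></think> blocks in its output, so specifying enable_thinking=False is no longer required.

Llama.cpp Tutorial (GGUF):

Instructions to run in llama.cpp (we use 4-bit so that it fits on most devices):

Obtain the latest llama.cpp on GitHub here. You can also follow the build instructions below. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or only want CPU inference. For Apple Mac / Metal devices, set -DGGML_CUDA=OFF and then continue as usual - Metal support is on by default.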

apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-mtmd-cli llama-server llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp

You can pull the model directly from Hugging Face. You can increase the context to 256K if your RAM/VRAM allows. Using --fit on will also determine the context length automatically.

You can use the recommended parameters: temperature=1.0, top_p=0.95, top_k=40

./llama.cpp/llama-cli \
    -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_XL \
    --temp 1.0 --top-p 0.95 --min-p 0.01 --top-k 40

Alternatively, download the model first (after running pip install huggingface_hub hf_transfer). You can choose UD-Q4_K_XL or another quantized version. If the download gets stuck, see Hugging Face Hub, XET debugging.

hf download unsloth/Qwen3-Coder-Next-GGUF \
    --local-dir unsloth/Qwen3-Coder-Next-GGUF \
    --include "*UD-Q4_K_XL*"

Then run the model in conversation mode:

./llama.cpp/llama-cli \
    --model unsloth/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-UD-Q4_K_XL.gguf \
    --seed 3407 \
    --temp 1.0 \
    --top-p 0.95 \
    --min-p 0.01 \
    --top-k 40

Also adjust the context window as required, up to a maximum of 262,144.
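One way to keep the context size within the supported range is to build the argument list in a small script. This is only a sketch: `llama_cli_args` is a hypothetical helper, the 32,768 default is simply the reduced value mentioned earlier in this guide, and it relies on llama.cpp's `--ctx-size` flag.

```python
MAX_CONTEXT = 262_144  # native context length stated in this guide


def llama_cli_args(model_path: str, ctx_size: int = 32_768) -> list:
    """Build a llama-cli argument list with the context size clamped."""
    if ctx_size <= 0:
        raise ValueError("ctx_size must be positive")
    ctx_size = min(ctx_size, MAX_CONTEXT)  # never request more than the model supports
    return [
        "./llama.cpp/llama-cli",
        "--model", model_path,
        "--ctx-size", str(ctx_size),
        "--temp", "1.0", "--top-p", "0.95", "--min-p", "0.01", "--top-k", "40",
    ]


args = llama_cli_args(
    "unsloth/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-UD-Q4_K_XL.gguf",
    ctx_size=300_000,  # too large on purpose, gets clamped to 262144
)
print(" ".join(args))
```

The list can be passed to `subprocess.run(args)` once llama.cpp has been built as described above.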

🦙 Llama-server serving & deployment

To deploy Qwen3-Coder-Next for production, we use llama-server. Open a new terminal (for example via tmux), then deploy the model with:

./llama.cpp/llama-server \
    --model unsloth/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-UD-Q4_K_XL.gguf \
    --alias "unsloth/Qwen3-Coder-Next" \
    --seed 3407 \
    --temp 1.0 \
    --top-p 0.95 \
    --min-p 0.01 \
    --top-k 40 \
    --port 8001

Then, in a new terminal, after running pip install openai, you can query the model:

from openai import OpenAI
import json
openai_client = OpenAI(
    base_url = "http://127.0.0.1:8001/v1",
    api_key = "sk-no-key-required",
)
completion = openai_client.chat.completions.create(
    model = "unsloth/Qwen3-Coder-Next",
    messages = [{"role": "user", "content": "Create a Flappy Bird game in HTML"},],
)
print(completion.choices[0].message.content)

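You will get output like the following:

Here is a complete, working Flappy Bird game contained in a single file.

It uses HTML5 Canvas for graphics and JavaScript for the physics (gravity, collision detection, scoring). No external images or downloads are required; the game draws the bird and pipes with code.

### How to run:
1. Copy the code block below.
2. Create a new file on your computer named `game.html`.
3. Paste the code into the file and save it.
4. Double-click `game.html` to open it in your web browser.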
```html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Simple Flappy Bird</title>
    <style>
        body {
            margin: 0;
            min-height: 100vh;
            display: flex;
            flex-direction: column;
            justify-content: center;
            align-items: center;
            background-color: #333;
            font-family: 'Courier New', Courier, monospace;
            color: white;
        }

        #game-container {
            position: relative;
            box-shadow: 0 0 20px rgba(0,0,0,0.5);
        }

        canvas {
            display: block;
            background-color: #70c5ce; /* Sky blue */
            border: 4px solid #000;
        }

        #ui-layer {
            position: absolute;
            top: 0;
            left: 0;
            width: 100%;
            height: 100%;
            pointer-events: none; /* Let clicks pass through to the canvas */
            display: flex;
            flex-direction: column;
            justify-content: center;
            align-items: center;
        }

        .message {
            background: rgba(255, 255, 255, 0.8);
            padding: 20px;
            border-radius: 10px;
            text-align: center;
            color: #333;
        }

        #score-board {
            position: absolute;
            top: 20px;
            width: 100%;
            text-align: center;
            font-size: 40px;
            font-weight: bold;
            color: white;
            text-shadow: 2px 2px 0 #000;
            z-index: 10;
            pointer-events: none;
        }
    </style>
</head>
<body>

    <h1>Flappy Bird Clone</h1>

    <div id="game-container">
        <div id="score-board">0</div>
        <canvas id="birdCanvas" width="320" height="480"></canvas>

        <div id="ui-layer">
            <div id="start-screen" class="message">
                <h2>Ready to Fly?</h2>
                <p>Press <strong>Space</strong> or <strong>Click</strong> to Jump</p>
                <p>Press Space to Start</p>
            </div>
            <div id="game-over-screen" class="message" style="display: none;">
                <h2>Game Over</h2>
                <p>Score: <span id="final-score">0</span></p>
                <p>Best: <span id="best-score">0</span></p>
                <p>Press <strong>Space</strong> to Restart</p>
            </div>
        </div>
    </div>

    <script>
        // --- Configuration ---
        const canvas = document.getElementById('birdCanvas');
        const ctx = canvas.getContext('2d');

        // DOM elements
        const scoreElement = document.getElementById('score-board');
        const startScreen = document.getElementById('start-screen');
        const gameOverScreen = document.getElementById('game-over-screen');
        const finalScoreSpan = document.getElementById('final-score');
        const bestScoreSpan = document.getElementById('best-score');

        // Game variables
        let frames = 0;
        let score = 0;
        let highScore = Number(localStorage.getItem('flappyHighScore')) || 0;
        let gameState = 'START'; // START, PLAYING, GAMEOVER
        const gravity = 0.25;

        // --- Bird object ---
        const bird = {
            x: 50,
            y: 150,
            width: 30,
            height: 30,
            velocity: 0,
            jumpStrength: 4.5,
            radius: 15,
            draw: function() {
                // Body
                ctx.fillStyle = "#FFD700"; // Gold
                ctx.beginPath();
                ctx.arc(this.x + this.radius, this.y + this.radius, this.radius, 0, Math.PI * 2);
                ctx.fill();
                ctx.lineWidth = 2;
                ctx.stroke();

                // Eye
                ctx.fillStyle = "white";
                ctx.beginPath();
                ctx.arc(this.x + this.radius + 5, this.y + this.radius - 5, 5, 0, Math.PI * 2);
                ctx.fill();
                ctx.fillStyle = "black";
                ctx.beginPath();
                ctx.arc(this.x + this.radius + 7, this.y + this.radius - 5, 2, 0, Math.PI * 2);
                ctx.fill();

                // Beak
                ctx.fillStyle = "orange";
                ctx.beginPath();
                ctx.moveTo(this.x + this.radius + 10, this.y + this.radius);
                ctx.lineTo(this.x + this.radius + 20, this.y + this.radius + 5);
                ctx.lineTo(this.x + this.radius + 10, this.y + this.radius + 10);
                ctx.closePath();
                ctx.fill();
                ctx.stroke();
            },
            update: function() {
                this.velocity += gravity;
                this.y += this.velocity;

                // Floor collision
                if (this.y + this.height >= canvas.height) {
                    this.y = canvas.height - this.height;
                    gameOver();
                }

                // Ceiling collision (optional: prevents flying over the pipes)
                if (this.y < 0) {
                    this.y = 0;
                    this.velocity = 0;
                }
            },
            jump: function() {
                this.velocity = -this.jumpStrength;
            },
            reset: function() {
                this.y = 150;
                this.velocity = 0;
            }
        };

        // --- Pipes ---
        const pipes = {
            position: [],
            width: 50,
            gap: 120, // Space between the top and bottom pipe
            dx: 2, // Movement speed

            draw: function() {
                for (let i = 0; i < this.position.length; i++) {
                    let p = this.position[i];
                    let topY = p.y;
                    let bottomY = p.y + this.gap;

                    ctx.fillStyle = "#228B22"; // Forest green

                    // Top pipe
                    ctx.fillRect(p.x, 0, this.width, topY);
                    ctx.strokeRect(p.x, 0, this.width, topY);

                    // Bottom pipe
                    ctx.fillRect(p.x, bottomY, this.width, canvas.height - bottomY);
                    ctx.strokeRect(p.x, bottomY, this.width, canvas.height - bottomY);
                }
            },

            update: function() {
                // Add a new pipe every 120 frames (about 2 seconds)
                if (frames % 120 === 0) {
                    // Random height for the top pipe:
                    // min height 50, max height canvas - gap - 50
                    let maxY = canvas.height - this.gap - 50;
                    let randomY = Math.floor(Math.random() * (maxY - 50 + 1) + 50);

                    this.position.push({
                        x: canvas.width,
                        y: randomY
                    });
                }

                for (let i = 0; i < this.position.length; i++) {
                    let p = this.position[i];
                    p.x -= this.dx;

                    // Collision detection
                    // Check whether the bird's X range overlaps the pipe's X range
                    if (bird.x + bird.width > p.x && bird.x < p.x + this.width) {
                        // Check whether the bird's Y hits the top or bottom pipe
                        if (bird.y < p.y || bird.y + bird.height > p.y + this.gap) {
                            gameOver();
                        }
                    }

                    // Update the score once the bird has passed the pipe
                    if (p.x + this.width < bird.x && !p.passed) {
                        score++;
                        scoreElement.innerText = score;
                        p.passed = true;
                    }

                    // Remove pipes that have left the screen
                    if (p.x + this.width <= 0) {
                        this.position.shift();
                        // The array got shorter, so step the index back
                        i--;
                    }
                }
            },

            reset: function() {
                this.position = [];
            }
        };

        // --- Background (grass) ---
        const background = {
            draw: function() {
                // Draw the grass
                ctx.fillStyle = "#7cfc00"; // Lawn green
                ctx.fillRect(0, canvas.height - 20, canvas.width, 20);
                ctx.beginPath();
                ctx.moveTo(0, canvas.height - 20);
                ctx.lineTo(canvas.width, canvas.height - 20);
                ctx.stroke();
            }
        };

        // --- Game control functions ---

        function loop() {
            // Clear the canvas
            ctx.clearRect(0, 0, canvas.width, canvas.height);

            // Draw the background
            background.draw();

            if (gameState === 'START') {
                bird.draw();
                // Draw the ground line
                ctx.fillStyle = "#ded895";
                ctx.fillRect(0, canvas.height - 10, canvas.width, 10);
            }
            else if (gameState === 'PLAYING') {
                bird.update();
                bird.draw();
                pipes.update();
                pipes.draw();
                frames++;
            }
            else if (gameState === 'GAMEOVER') {
                pipes.draw();
                bird.draw();
                // No frame or position updates here, everything stays frozen
            }

            requestAnimationFrame(loop);
        }

        function resetGame() {
            gameState = 'PLAYING';
            startScreen.style.display = 'none';
            gameOverScreen.style.display = 'none';
            score = 0;
            frames = 0;
            scoreElement.innerText = score;
            bird.reset();
            pipes.reset();
        }

        function gameOver() {
            if (gameState === 'GAMEOVER') return; // Already handled this frame
            gameState = 'GAMEOVER';

            // Update the high score
            if (score > highScore) {
                highScore = score;
                localStorage.setItem('flappyHighScore', highScore);
            }

            finalScoreSpan.innerText = score;
            bestScoreSpan.innerText = highScore;
            gameOverScreen.style.display = 'block';
        }

        // --- Input handling ---

        function handleInput(e) {
            // Stop Space from scrolling the page and touches from also firing mousedown
            if ((e.type === 'keydown' && e.code === 'Space') || e.type === 'touchstart') {
                e.preventDefault();
            }

            if (e.code === 'Space' || e.type === 'mousedown' || e.type === 'touchstart') {
                switch (gameState) {
                    case 'START':
                        resetGame();
                        bird.jump();
                        break;
                    case 'PLAYING':
                        bird.jump();
                        break;
                    case 'GAMEOVER':
                        resetGame();
                        bird.jump();
                        break;
                }
            }
        }

        window.addEventListener('keydown', handleInput);
        canvas.addEventListener('mousedown', handleInput);
        canvas.addEventListener('touchstart', handleInput);

        // Start the render loop
        loop();
    </script>
</body>
</html>
```

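### Features of this version:
1.  **Physics:** Realistic gravity and jump behavior.
2.  **Collision detection:** The game ends when you hit a pipe, the floor, or the ceiling.
3.  **Score system:** You earn 1 point for every pipe you pass.
4.  **High score:** Uses the browser's LocalStorage to remember your best score even after reloading the page.
5.  **Responsive controls:** Play with the **Space key**, a **mouse click**, or **touch** (for mobile).
6.  **Graphics:** The bird is drawn with code (including the eye and beak) and the pipes have outlines, so there are no broken image links.

We extracted the HTML and ran it, and the generated Flappy Bird example worked!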
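Extracting the HTML from the reply can be done in a few lines. The sketch below assumes the reply wraps the page in a Markdown fence tagged `html`, as in the output above; `extract_html` is a hypothetical helper, not part of the OpenAI client.

```python
import re
from typing import Optional

FENCE = "`" * 3  # Markdown code fence marker


def extract_html(reply: str) -> Optional[str]:
    """Return the contents of the first html code fence in `reply`, if any."""
    pattern = re.escape(FENCE) + r"html\s*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply, flags=re.DOTALL)
    return match.group(1).strip() if match else None


# Tiny stand-in for completion.choices[0].message.content
reply = f"Here is the game:\n{FENCE}html\n<!DOCTYPE html>\n<html></html>\n{FENCE}\nEnjoy!"
html = extract_html(reply)
print(html)
```

The extracted string can then be saved with `pathlib.Path("game.html").write_text(html, encoding="utf-8")` and opened in a browser.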

👾 OpenAI Codex & Claude Code

To run the model for local agentic coding workloads, follow our guide. Just change the model name from 'GLM-4.7-Flash' to 'Qwen3-Coder-Next' and make sure you follow the correct Qwen3-Coder-Next parameters and usage instructions. Use the llama-server we just set up.

Claude Code

OpenAI Codex

After following the instructions for Claude Code, for example, you will see:

We can then ask, for example: Create a Python game for Chess :

If you see API Error: 400 {"error":{"code":400,"message":"request (16582 tokens) exceeds the available context size (16384 tokens), try increasing it","type":"exceed_context_size_error","n_prompt_tokens":16582,"n_ctx":16384}} it means you need to increase the context length, or see Qwen3-Coder-Next
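The fields in this error are enough to pick a larger context size. The following is a rough sketch: `suggest_ctx_size` is a hypothetical helper, and both the 4,096-token headroom for the reply and the rounding up to a power of two are assumptions rather than llama.cpp requirements.

```python
import json
from typing import Optional

# Error body modeled on the message above.
error_body = (
    '{"error":{"code":400,"message":"request (16582 tokens) exceeds the '
    'available context size (16384 tokens), try increasing it",'
    '"type":"exceed_context_size_error","n_prompt_tokens":16582,"n_ctx":16384}}'
)


def suggest_ctx_size(body: str, headroom: int = 4096, max_ctx: int = 262_144) -> Optional[int]:
    """Suggest a context size for an exceed_context_size_error, else None."""
    error = json.loads(body).get("error", {})
    if error.get("type") != "exceed_context_size_error":
        return None
    needed = error["n_prompt_tokens"] + headroom  # prompt plus room for the reply
    rounded = 1 << (needed - 1).bit_length()      # next power of two >= needed
    return min(rounded, max_ctx)                  # never exceed the native limit


print(suggest_ctx_size(error_body))
```

For the error shown here this suggests 32,768, which can be passed to llama-server through its `--ctx-size` flag.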

🎱 FP8 Qwen3-Coder-Next in vLLM

You can now use our new FP8 Dynamic quant of the model for premium, fast inference. First install vLLM from nightly. Change --extra-index-url https://wheels.vllm.ai/nightly/cu130 to match your CUDA version, which you can find via nvidia-smi - only cu129 and cu130 are currently supported.

If you use vLLM / SGLang, try our FP8-Dynamic quants, which can boost throughput by 25% or more. See Qwen3-Coder-Next

# Install uv for faster environment installs if you don't have it yet
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create a new Python environment - not needed if you are fine with changing your whole system
uv venv unsloth_fp8 --python 3.12 --seed
source unsloth_fp8/bin/activate

uv pip install --upgrade --force-reinstall vllm --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly/cu130
uv pip install --upgrade --force-reinstall git+https://github.com/huggingface/transformers.git
uv pip install --force-reinstall numba

Then serve Unsloth's dynamic FP8 version of the model. You can also enable FP8 for the KV cache to cut its memory usage by 50% by adding --kv-cache-dtype fp8. We served it on 4 GPUs, but if you have only 1 GPU, use CUDA_VISIBLE_DEVICES='0' instead and set --tensor-parallel-size 1 or remove that argument. Use tmux to launch the command below in a new terminal, then press CTRL+B+D to detach - use tmux attach-session -t0 to return to it.

export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:False
CUDA_VISIBLE_DEVICES='0,1,2,3' vllm serve unsloth/Qwen3-Coder-Next-FP8-Dynamic \
    --served-model-name unsloth/Qwen3-Coder-Next \
    --tensor-parallel-size 4 \
    --tool-call-parser qwen3_coder \
    --enable-auto-tool-choice \
    --dtype bfloat16 \
    --seed 3407 \
    --max-model-len 200000 \
    --gpu-memory-utilization 0.93 \
    --port 8001

You should see output like the one below. See Qwen3-Coder-Next for how to actually use Qwen3-Coder-Next through the OpenAI API and tool calling - this works for both vLLM and llama-server.
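If you switch between machines with different GPU counts, the serve command can be assembled programmatically. This is only a sketch: `vllm_serve_command` is a hypothetical helper that reuses a subset of the flags from the command above and derives `--tensor-parallel-size` from the number of GPU ids.

```python
def vllm_serve_command(gpu_ids, fp8_kv_cache=False):
    """Return (env, argv) for serving the FP8 model on the given GPUs."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in gpu_ids)}
    argv = [
        "vllm", "serve", "unsloth/Qwen3-Coder-Next-FP8-Dynamic",
        "--served-model-name", "unsloth/Qwen3-Coder-Next",
        "--tensor-parallel-size", str(len(gpu_ids)),  # one shard per visible GPU
        "--tool-call-parser", "qwen3_coder",
        "--enable-auto-tool-choice",
        "--port", "8001",
    ]
    if fp8_kv_cache:
        argv += ["--kv-cache-dtype", "fp8"]  # optional FP8 KV cache
    return env, argv


env, argv = vllm_serve_command([0], fp8_kv_cache=True)
print(env, " ".join(argv))
```

To launch it, something like `subprocess.run(argv, env={**os.environ, **env})` should work once vLLM is installed.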

🔧 Tool Calling with Qwen3-Coder-Next

In a new terminal, we create some tools, such as adding 2 numbers, executing Python code, and running Linux commands:

import json, subprocess, random
from typing import Any
def add_number(a: float | str, b: float | str) -> float:
    return float(a) + float(b)
def multiply_number(a: float | str, b: float | str) -> float:
    return float(a) * float(b)
def substract_number(a: float | str, b: float | str) -> float:
    return float(a) - float(b)
def write_a_story() -> str:
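    return random.choice([
        "A long time ago in a galaxy far far away...",
        "There were 2 friends who loved sloths and code...",
        "The world was ending because every sloth evolved to have superhuman intelligence...",
        "Unbeknownst to one friend, the other accidentally coded a program to evolve sloths...",
    ])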
def terminal(command: str) -> str:
    if "rm" in command or "sudo" in command or "dd" in command or "chmod" in command:
        msg = "Cannot execute 'rm, sudo, dd, chmod' commands since they are dangerous"
        print(msg); return msg
    print(f"Executing terminal command `{command}`")
    try:
        return str(subprocess.run(command, capture_output = True, text = True, shell = True, check = True).stdout)
    except subprocess.CalledProcessError as e:
        return f"Command failed: {e.stderr}"
def python(code: str) -> str:
    data = {}
    exec(code, data)
    del data["__builtins__"]
    return str(data)
MAP_FN = {
    "add_number": add_number,
    "multiply_number": multiply_number,
    "substract_number": substract_number,
    "write_a_story": write_a_story,
    "terminal": terminal,
    "python": python,
}
tools = [
    {
        "type": "function",
        "function": {
            "name": "add_number",
            "description": "Add two numbers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "string",
                        "description": "The first number.",
                    },
                    "b": {
                        "type": "string",
                        "description": "The second number.",
                    },
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "multiply_number",
            "description": "Multiply two numbers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "string",
                        "description": "The first number.",
                    },
                    "b": {
                        "type": "string",
                        "description": "The second number.",
                    },
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "substract_number",
            "description": "Subtract two numbers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "string",
                        "description": "The first number.",
                    },
                    "b": {
                        "type": "string",
                        "description": "The second number.",
                    },
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_a_story",
            "description": "Writes a random story.",
            "parameters": {
                "type": "object",
                "properties": {},
                "required": [],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "terminal",
            "description": "Perform operations from the terminal.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {
                        "type": "string",
                        "description": "The command you wish to launch, e.g. `ls`, `rm`, ...",
                    },
                },
                "required": ["command"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "python",
            "description": "Call a Python interpreter with some Python code that will be run.",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "The Python code to run",
                    },
                },
                "required": ["code"],
            },
        },
    },
]
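Note that the `terminal` tool above filters commands with a plain substring test, so it also rejects harmless commands that merely contain "rm" or "dd" (for example `echo format` or `echo add`). A token-based check is one possible refinement. The sketch below is an assumption-laden alternative, not the guide's method: `is_blocked` and `BLOCKED` are hypothetical names, and it is still not a sandbox, since shell metacharacters such as `ls;rm x` can slip past it.

```python
import shlex

BLOCKED = {"rm", "sudo", "dd", "chmod"}


def is_blocked(command: str) -> bool:
    """True if any shell token names a blocked program."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unbalanced quotes: refuse rather than guess
    # Compare the last path component so /bin/rm is caught as well.
    return any(tok.rsplit("/", 1)[-1] in BLOCKED for tok in tokens)


print(is_blocked("rm -rf build"))  # blocked program as its own token
print(is_blocked("echo format"))   # "rm" only appears inside another word
```

For anything beyond a demo, running model-issued commands inside a container or a restricted user account is the safer design.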

Next, use the function below (copy, paste and run it). It automatically parses the function calls and calls the OpenAI endpoint for the model:

from openai import OpenAI
def unsloth_inference(
    messages,
    temperature = 1.0,
    top_p = 0.95,
    top_k = 40,
    min_p = 0.01,
    repetition_penalty = 1.0,
):
    messages = messages.copy()
    openai_client = OpenAI(
        base_url = "http://127.0.0.1:8001/v1",
        api_key = "sk-no-key-required",
    )
    model_name = next(iter(openai_client.models.list())).id
    print(f"Using model = {model_name}")
    has_tool_calls = True
    original_messages_len = len(messages)
    while has_tool_calls:
        print(f"Current messages = {messages}")
        response = openai_client.chat.completions.create(
            model = model_name,
            messages = messages,
            temperature = temperature,
            top_p = top_p,
            tools = tools if tools else None,
            tool_choice = "auto" if tools else None,
            extra_body = {"top_k": top_k, "min_p": min_p, "repetition_penalty" :repetition_penalty,}
        )
        tool_calls = response.choices[0].message.tool_calls or []
        content = response.choices[0].message.content or ""
        tool_calls_dict = [tc.to_dict() for tc in tool_calls] if tool_calls else tool_calls
        messages.append({"role": "assistant", "tool_calls": tool_calls_dict, "content": content,})
        for tool_call in tool_calls:
            fx, args, _id = tool_call.function.name, tool_call.function.arguments, tool_call.id
            out = MAP_FN[fx](**json.loads(args))
            messages.append({"role": "tool", "tool_call_id": _id, "name": fx, "content": str(out),})
        # Call the model again only if it requested tools in this round
        has_tool_calls = len(tool_calls) > 0
    return messages

Below we show several ways of running tool calling for different use cases:

Execute generated Python code

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "Create a Fibonacci function in Python and find fib(20)."}],
}]
unsloth_inference(messages, temperature = 1.0, top_p = 0.95, top_k = 40, min_p = 0.00)

Execute arbitrary terminal functions

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "Write 'I'm a happy Sloth' to a file, then print it back to me."}],
}]
messages = unsloth_inference(messages, temperature = 1.0, top_p = 1.0, top_k = 40, min_p = 0.00)

We checked whether the file was created, and it was!

See the Tool Calling Guide for more tool calling examples.

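📐 Benchmarks

GGUF Quantization Benchmarks

Below are quantization benchmark results from third-party evaluators.

The benchmarks were run by a third-party contributor on the Aider Polyglot server, comparing Unsloth's GGUF quants on the Aider Polyglot benchmark as score versus VRAM. Notably, the 3-bit UD-IQ3_XXS quant comes close to BF16 performance, which makes 3-bit a reasonable minimum for most use cases. NVFP4 slightly outperforms the BF16 reference, but this may be sampling noise from the limited number of runs. Still, the overall pattern of 1-bit → 2-bit → 3-bit → 6-bit improving steadily suggests that the benchmark captures meaningful quality differences between the Unsloth GGUFs. The non-Unsloth FP8 appears to underperform both UD-IQ3_XXS and UD-Q6_K_XL, which may reflect differences in the quantization pipeline or, again, insufficient sampling.

Benjamin Marie (third-party) benchmarked Qwen3-Coder-Next using Unsloth and Qwen GGUFs on a mixed suite of 750 prompts (LiveCodeBench v6, MMLU Pro, GPQA, Math500), reporting both overall accuracy and relative error increase (how much more often the quantized model answers incorrectly than the original). The graphs clearly show that Unsloth's Q4_K_M quant outperforms the standard Q4_K_M. As expected, Q3_K_M does worse on Live Code Bench v6, but on HumanEval it surprisingly beats the standard Q4_K_M. Q4_K_M appears to be the most efficient choice, and using at least Q4_K_M is recommended.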
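To make the second metric concrete, the sketch below computes a relative error increase under one common definition: the error rate is 1 minus accuracy, and the increase is measured relative to the original model's error rate. The exact formula used in the benchmark is not given here, so treat this as an assumption; the accuracies in the example are made up and are not benchmark results.

```python
def relative_error_increase(base_accuracy: float, quant_accuracy: float) -> float:
    """Fractional growth in error rate of the quantized model vs. the original."""
    base_error = 1.0 - base_accuracy
    quant_error = 1.0 - quant_accuracy
    if base_error <= 0:
        raise ValueError("baseline has no errors; the ratio is undefined")
    return (quant_error - base_error) / base_error


# Hypothetical accuracies for illustration only.
increase = relative_error_increase(base_accuracy=0.80, quant_accuracy=0.78)
print(f"{increase:.1%}")
```

With these made-up numbers a 2-point drop in accuracy corresponds to 10% more errors, which is why the relative metric can look much larger than the raw accuracy gap.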

Qwen3-Coder-Next is the most capable model for its size, and its performance is comparable to models with 10-20x more active parameters.

| Benchmark | Qwen3-Coder-Next (80B) | DeepSeek-V3.2 (671B) | GLM-4.7 (358B) | MiniMax M2.1 (229B) |
| --- | --- | --- | --- | --- |
| SWE-Bench Verified (w/ SWE-Agent) | 70.6 | 70.2 | 74.2 | 74.8 |
| SWE-Bench Multilingual (w/ SWE-Agent) | 62.8 | 62.3 | 63.7 | 66.2 |
| SWE-Bench Pro (w/ SWE-Agent) | 44.3 | 40.9 | 40.6 | 34.6 |
| Terminal-Bench 2.0 (w/ Terminus-2 json) | 36.2 | 39.3 | 37.1 | 32.6 |
| Aider | 66.2 | 69.9 | 52.1 | 61.0 |
