Tool Calling Guide for Local LLMs

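Tool calling lets an LLM, instead of guessing an answer in text, output a structured request that triggers a specific function (such as "search a file", "run a calculator" or "call an API"). Tool calls make outputs more reliable and up to date, because the model takes real actions (querying a system, verifying facts, enforcing a schema) instead of hallucinating.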

In this tutorial, you'll learn how to use local LLMs with tool calling, with examples for math, stories, Python code and terminal functions. Inference runs locally via llama.cpp, llama-server and its OpenAI-compatible endpoint.

Our guide should work for nearly any model, including:

Qwen3-Coder-Next, Qwen3-Coder and other Qwen models
GLM-4.7, 4.6, GLM-4.7-Flash, and Kimi K2.5, Kimi K2 Thinking
DeepSeek-V3.1, DeepSeek-V3.2 and MiniMax
gpt-oss, NVIDIA Nemotron 3 Nano and Devstral 2

Qwen3-Coder-Next tutorial · GLM-4.7-Flash tutorial

🔨 Tool Calling Setup

The first step is to get the latest llama.cpp from GitHub. You can also follow the build instructions below. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or just want CPU inference.

apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-mtmd-cli llama-server llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp
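If you script the build, the flag change above can be automated. The sketch below rebuilds the same cmake configure command and turns -DGGML_CUDA=ON only when nvidia-smi is found on the PATH. Treating nvidia-smi as a proxy for "CUDA is available" is an assumption of this sketch, not something llama.cpp itself checks, and the helper names are hypothetical.

```python
import shutil

def cmake_configure_cmd(use_cuda: bool) -> list[str]:
    """Return the cmake configure command from above, with the CUDA flag set programmatically."""
    return [
        "cmake", "llama.cpp", "-B", "llama.cpp/build",
        "-DBUILD_SHARED_LIBS=OFF",
        f"-DGGML_CUDA={'ON' if use_cuda else 'OFF'}",
    ]

def has_nvidia_gpu() -> bool:
    # Heuristic (assumption): NVIDIA driver installs normally put nvidia-smi on the PATH
    return shutil.which("nvidia-smi") is not None

cmd = cmake_configure_cmd(use_cuda=has_nvidia_gpu())
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```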

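Once llama-server is running (the model-specific sections below show the launch commands), open a new terminal (if you're using tmux, press CTRL+B then D to detach) and create some tools, such as adding two numbers, executing Python code and executing Linux commands: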

import json, subprocess, random
from typing import Any
def add_number(a: float | str, b: float | str) -> float:
    return float(a) + float(b)
def multiply_number(a: float | str, b: float | str) -> float:
    return float(a) * float(b)
def substract_number(a: float | str, b: float | str) -> float:
    return float(a) - float(b)
def write_a_story() -> str:
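    return random.choice([
        "A long time ago in a galaxy far, far away...",
        "There were two friends who loved sloths and code...",
        "The world was ending because every sloth had gained superhuman intelligence...",
        "Unbeknownst to one friend, the other accidentally wrote a program to evolve sloths...",
    ])
def terminal(command: str) -> str:
    # Naive substring blocklist: it also rejects harmless commands that merely contain
    # these letters (e.g. "address" contains "dd"). It is not a real sandbox.
    if "rm" in command or "sudo" in command or "dd" in command or "chmod" in command:
        msg = "Cannot execute 'rm, sudo, dd, chmod' commands since they are dangerous"
        print(msg); return msg
    print(f"Executing terminal command `{command}`")
    try:
        return str(subprocess.run(command, capture_output = True, text = True, shell = True, check = True).stdout)
    except subprocess.CalledProcessError as e:
        return f"Command failed: {e.stderr}"
def python(code: str) -> str:
    # Runs arbitrary model-generated code in this process (no sandboxing) and
    # returns the resulting global variables as a string
    data = {}
    exec(code, data)
    del data["__builtins__"]
    return str(data)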
MAP_FN = {
    "add_number": add_number,
    "multiply_number": multiply_number,
    "substract_number": substract_number,
    "write_a_story": write_a_story,
    "terminal": terminal,
    "python": python,
}
tools = [
    {
        "type": "function",
        "function": {
            "name": "add_number",
            "description": "Add two numbers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "string",
                        "description": "The first number.",
                    },
                    "b": {
                        "type": "string",
                        "description": "The second number.",
                    },
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "multiply_number",
            "description": "Multiply two numbers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "string",
                        "description": "The first number.",
                    },
                    "b": {
                        "type": "string",
                        "description": "The second number.",
                    },
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "substract_number",
            "description": "Subtract two numbers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "string",
                        "description": "The first number.",
                    },
                    "b": {
                        "type": "string",
                        "description": "The second number.",
                    },
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_a_story",
            "description": "Writes a random story.",
            "parameters": {
                "type": "object",
                "properties": {},
                "required": [],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "terminal",
            "description": "Perform operations from the terminal.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {
                        "type": "string",
                        "description": "The command you wish to launch, e.g. `ls`, `rm`, ...",
                    },
                },
                "required": ["command"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "python",
            "description": "Call a Python interpreter with some Python code that will be run.",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "The Python code to run",
                    },
                },
                "required": ["code"],
            },
        },
    },
]

Then use the function below (copy, paste and execute it). It parses the tool calls automatically and calls the OpenAI-compatible endpoint for any model:

In this example we're using Devstral 2. When switching models, make sure you use the correct sampling parameters; you can find all of them in our guide here.
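One way to keep per-model settings straight is a small lookup table keyed by the --alias you give llama-server. The values below simply mirror the calls this guide makes for each model (Devstral 2, GLM-4.7 and Qwen3-Coder-Next); the fallback DEFAULTS are copied from unsloth_inference's signature. The SAMPLING_PRESETS and sampling_for names are hypothetical, and the Qwen3-Coder-Next alias is assumed, since this guide does not show its launch command. Check each model's own guide for the authoritative values.

```python
# Fallback values: the defaults in unsloth_inference's signature
DEFAULTS = {"temperature": 0.7, "top_p": 0.95, "top_k": 40, "min_p": 0.01}

# Keyed by the --alias passed to llama-server; values mirror the calls in this guide
SAMPLING_PRESETS = {
    "unsloth/Devstral-Small-2-24B-Instruct-2512": {"temperature": 0.15, "top_p": 1.0, "top_k": -1, "min_p": 0.0},
    "unsloth/GLM-4.7": {"temperature": 0.7, "top_p": 1.0, "top_k": -1, "min_p": 0.0},
    # Alias assumed for illustration; use whatever you pass to --alias
    "unsloth/Qwen3-Coder-Next": {"temperature": 1.0, "top_p": 0.95, "top_k": 40, "min_p": 0.0},
}

def sampling_for(model_name: str) -> dict:
    """Return sampling kwargs for unsloth_inference, falling back to its defaults."""
    return {**DEFAULTS, **SAMPLING_PRESETS.get(model_name, {})}

print(sampling_for("unsloth/GLM-4.7"))
# {'temperature': 0.7, 'top_p': 1.0, 'top_k': -1, 'min_p': 0.0}
```

Since unsloth_inference accepts these as keyword arguments, a call could look like unsloth_inference(messages, **sampling_for(model_name)).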

from openai import OpenAI
def unsloth_inference(
    messages,
    temperature = 0.7,
    top_p = 0.95,
    top_k = 40,
    min_p = 0.01,
    repetition_penalty = 1.0,
):
    messages = messages.copy()
    openai_client = OpenAI(
        base_url = "http://127.0.0.1:8001/v1",
        api_key = "sk-no-key-required",
    )
    model_name = next(iter(openai_client.models.list())).id
    print(f"Using model = {model_name}")
    has_tool_calls = True
    original_messages_len = len(messages)
    while has_tool_calls:
        print(f"Current messages = {messages}")
        response = openai_client.chat.completions.create(
            model = model_name,
            messages = messages,
            temperature = temperature,
            top_p = top_p,
            tools = tools if tools else None,
            tool_choice = "auto" if tools else None,
            extra_body = {"top_k": top_k, "min_p": min_p, "repetition_penalty" :repetition_penalty,}
        )
        tool_calls = response.choices[0].message.tool_calls or []
        content = response.choices[0].message.content or ""
        tool_calls_dict = [tc.to_dict() for tc in tool_calls] if tool_calls else tool_calls
        messages.append({"role": "assistant", "tool_calls": tool_calls_dict, "content": content,})
        for tool_call in tool_calls:
            fx, args, _id = tool_call.function.name, tool_call.function.arguments, tool_call.id
            out = MAP_FN[fx](**json.loads(args))
            messages.append({"role": "tool", "tool_call_id": _id, "name": fx, "content": str(out),})
        # Stop once the model replies without requesting any tool
        if len(tool_calls) == 0:
            has_tool_calls = False
    return messages
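The core of unsloth_inference is a loop: send the conversation, run any requested tools, append their results, and repeat until the model replies without tool calls. The sketch below isolates that control flow with a scripted fake model so it runs without a server. run_tool_loop and fake_model are hypothetical names, the plain-dict reply shape stands in for the OpenAI response object, and the max_turns cap is an addition of this sketch so a model that keeps calling tools cannot loop forever.

```python
import json
from typing import Callable

def run_tool_loop(chat: Callable[[list], dict], messages: list, map_fn: dict, max_turns: int = 8) -> list:
    """Call `chat` until it returns no tool calls (or max_turns is reached)."""
    messages = list(messages)
    for _ in range(max_turns):
        reply = chat(messages)  # expected shape: {"content": str, "tool_calls": [...]}
        tool_calls = reply.get("tool_calls") or []
        messages.append({"role": "assistant", "content": reply.get("content", ""), "tool_calls": tool_calls})
        if not tool_calls:  # plain answer: stop looping
            break
        for tc in tool_calls:
            fn = tc["function"]
            out = map_fn[fn["name"]](**json.loads(fn["arguments"]))
            messages.append({"role": "tool", "tool_call_id": tc["id"], "name": fn["name"], "content": str(out)})
    return messages

def fake_model(messages: list) -> dict:
    """Scripted stand-in for the LLM: first requests add_number, then answers."""
    if messages[-1]["role"] == "user":
        return {"content": "", "tool_calls": [{"id": "call_0", "type": "function",
                "function": {"name": "add_number", "arguments": '{"a": "40", "b": "2"}'}}]}
    return {"content": f"The answer is {messages[-1]['content']}.", "tool_calls": []}

history = run_tool_loop(fake_model, [{"role": "user", "content": "What is 40 + 2?"}],
                        {"add_number": lambda a, b: float(a) + float(b)})
print(history[-1]["content"])  # The answer is 42.0.
```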

Below we show several ways to run tool calling for different use cases:

Writing a story:

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "Could you write me a story ?"}],
}]
unsloth_inference(messages, temperature = 0.15, top_p = 1.0, top_k = -1, min_p = 0.00)
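unsloth_inference returns the whole message list, including intermediate tool calls and tool results. To get only the model's final reply, take the last assistant message that has text. The final_answer helper below is a hypothetical convenience, shown on a hand-written list shaped like the function's output rather than a real run.

```python
def final_answer(messages: list[dict]) -> str:
    """Return the content of the last assistant message that has text."""
    for m in reversed(messages):
        if m.get("role") == "assistant" and m.get("content"):
            return m["content"]
    return ""

# Hand-written example in the same shape unsloth_inference returns
example = [
    {"role": "user", "content": [{"type": "text", "text": "Could you write me a story ?"}]},
    {"role": "assistant", "content": "", "tool_calls": [{"id": "call_0", "type": "function",
     "function": {"name": "write_a_story", "arguments": "{}"}}]},
    {"role": "tool", "tool_call_id": "call_0", "name": "write_a_story", "content": "A long time ago..."},
    {"role": "assistant", "content": "Here is your story: A long time ago...", "tool_calls": []},
]
print(final_answer(example))  # Here is your story: A long time ago...
```

With a live server this becomes print(final_answer(unsloth_inference(messages, ...))).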

Mathematical operations:

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "What is today's date plus 3 days?"}],
}]
unsloth_inference(messages, temperature = 0.15, top_p = 1.0, top_k = -1, min_p = 0.00)
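There is no dedicated date tool in this setup, so the model has to improvise, for example by calling the python tool. The generated string below is an illustrative guess at such code, not actual model output; it is run through a copy of the python tool defined above so you can see the shape of the result that goes back to the model.

```python
import datetime

def python(code: str) -> str:
    # Same approach as the python tool above: exec the code, return its globals
    data = {}
    exec(code, data)
    del data["__builtins__"]
    return str(data)

# Illustrative code a model might send for "today's date plus 3 days" (not real model output)
generated = (
    "import datetime\n"
    "result = (datetime.date.today() + datetime.timedelta(days=3)).isoformat()\n"
)
print(python(generated))  # the dict holds the datetime module and 'result': 'YYYY-MM-DD'
print("expected:", (datetime.date.today() + datetime.timedelta(days=3)).isoformat())
```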

Execute generated Python code

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "Create a Fibonacci function in Python and find fib(20)."}],
}]
unsloth_inference(messages, temperature = 0.15, top_p = 1.0, top_k = -1, min_p = 0.00)
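The python tool above returns only the global variables left after exec, so anything the generated code prints (for instance a final print(fib(20))) never reaches the model. A possible variant, sketched below under the hypothetical name python_with_stdout, also captures standard output with contextlib.redirect_stdout and drops non-data globals such as function objects. Like the original, it executes arbitrary code with no sandboxing; to try it you would register it in MAP_FN under the "python" key instead of the original.

```python
import contextlib
import io

def python_with_stdout(code: str) -> str:
    """Variant of the python tool that also returns anything the code prints.
    Like the original, this runs arbitrary code with no sandboxing."""
    data: dict = {}
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, data)
    data.pop("__builtins__", None)
    # Keep only plain data values so function objects don't clutter the reply
    values = {k: v for k, v in data.items() if isinstance(v, (int, float, str, list, dict, tuple))}
    return f"stdout:\n{buf.getvalue()}\nglobals: {values}"

# Example of code a model might generate for the Fibonacci prompt above
code = (
    "def fib(n):\n"
    "    a, b = 0, 1\n"
    "    for _ in range(n):\n"
    "        a, b = b, a + b\n"
    "    return a\n"
    "print(fib(20))\n"
)
print(python_with_stdout(code))
```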

Execute arbitrary terminal functions

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "Write 'I'm a happy Sloth' to a file, then print it back to me."}],
}]
messages = unsloth_inference(messages, temperature = 0.15, top_p = 1.0, top_k = -1, min_p = 0.00)
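The terminal tool's guard is a plain substring check, so it rejects harmless commands whose text merely contains those letters (cat address.txt contains "dd") while still being easy to get around. A slightly stricter sketch tokenizes the command with shlex, splitting on shell punctuation such as ; && |, and compares each word's base name with the same blocklist. is_blocked is a hypothetical helper; it remains a heuristic (backticks or variable expansion can still bypass it), not a sandbox, so for real isolation run tools in a container or VM.

```python
import os
import shlex

BLOCKED = {"rm", "sudo", "dd", "chmod"}  # same list as the terminal tool

def is_blocked(command: str) -> bool:
    """Heuristic: block if any shell word (split on whitespace and ; && | etc.) is a blocked program."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    try:
        tokens = list(lexer)
    except ValueError:  # e.g. unbalanced quotes: refuse rather than guess
        return True
    # basename catches absolute paths such as /bin/rm
    return any(os.path.basename(tok.strip("`")) in BLOCKED for tok in tokens)

for cmd in ["cat address.txt", "ls; rm -rf /tmp/x", "/bin/rm file", 'echo "hi" > a.txt && cat a.txt']:
    print(f"{cmd!r}: {'blocked' if is_blocked(cmd) else 'allowed'}")
```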

🌠 Qwen3-Coder-Next Tool Calling

In a new terminal, create some tools, such as adding two numbers, executing Python code and executing Linux commands:

import json, subprocess, random
from typing import Any
def add_number(a: float | str, b: float | str) -> float:
    return float(a) + float(b)
def multiply_number(a: float | str, b: float | str) -> float:
    return float(a) * float(b)
def substract_number(a: float | str, b: float | str) -> float:
    return float(a) - float(b)
def write_a_story() -> str:
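    return random.choice([
        "A long time ago in a galaxy far, far away...",
        "There were two friends who loved sloths and code...",
        "The world was ending because every sloth had gained superhuman intelligence...",
        "Unbeknownst to one friend, the other accidentally wrote a program to evolve sloths...",
    ])
def terminal(command: str) -> str:
    # Naive substring blocklist: it also rejects harmless commands that merely contain
    # these letters (e.g. "address" contains "dd"). It is not a real sandbox.
    if "rm" in command or "sudo" in command or "dd" in command or "chmod" in command:
        msg = "Cannot execute 'rm, sudo, dd, chmod' commands since they are dangerous"
        print(msg); return msg
    print(f"Executing terminal command `{command}`")
    try:
        return str(subprocess.run(command, capture_output = True, text = True, shell = True, check = True).stdout)
    except subprocess.CalledProcessError as e:
        return f"Command failed: {e.stderr}"
def python(code: str) -> str:
    # Runs arbitrary model-generated code in this process (no sandboxing) and
    # returns the resulting global variables as a string
    data = {}
    exec(code, data)
    del data["__builtins__"]
    return str(data)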
MAP_FN = {
    "add_number": add_number,
    "multiply_number": multiply_number,
    "substract_number": substract_number,
    "write_a_story": write_a_story,
    "terminal": terminal,
    "python": python,
}
tools = [
    {
        "type": "function",
        "function": {
            "name": "add_number",
            "description": "Add two numbers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "string",
                        "description": "The first number.",
                    },
                    "b": {
                        "type": "string",
                        "description": "The second number.",
                    },
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "multiply_number",
            "description": "Multiply two numbers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "string",
                        "description": "The first number.",
                    },
                    "b": {
                        "type": "string",
                        "description": "The second number.",
                    },
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "substract_number",
            "description": "Subtract two numbers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "string",
                        "description": "The first number.",
                    },
                    "b": {
                        "type": "string",
                        "description": "The second number.",
                    },
                },
                "required": ["a", "b"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_a_story",
            "description": "Writes a random story.",
            "parameters": {
                "type": "object",
                "properties": {},
                "required": [],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "terminal",
            "description": "Perform operations from the terminal.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {
                        "type": "string",
                        "description": "The command you wish to launch, e.g. `ls`, `rm`, ...",
                    },
                },
                "required": ["command"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "python",
            "description": "Call a Python interpreter with some Python code that will be run.",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "The Python code to run",
                    },
                },
                "required": ["code"],
            },
        },
    },
]

We then use the function below, which parses the tool calls automatically and calls the OpenAI-compatible endpoint for any LLM:

from openai import OpenAI
def unsloth_inference(
    messages,
    temperature = 1.0,
    top_p = 0.95,
    top_k = 40,
    min_p = 0.01,
    repetition_penalty = 1.0,
):
    messages = messages.copy()
    openai_client = OpenAI(
        base_url = "http://127.0.0.1:8001/v1",
        api_key = "sk-no-key-required",
    )
    model_name = next(iter(openai_client.models.list())).id
    print(f"Using model = {model_name}")
    has_tool_calls = True
    original_messages_len = len(messages)
    while has_tool_calls:
        print(f"Current messages = {messages}")
        response = openai_client.chat.completions.create(
            model = model_name,
            messages = messages,
            temperature = temperature,
            top_p = top_p,
            tools = tools if tools else None,
            tool_choice = "auto" if tools else None,
            extra_body = {"top_k": top_k, "min_p": min_p, "repetition_penalty" :repetition_penalty,}
        )
        tool_calls = response.choices[0].message.tool_calls or []
        content = response.choices[0].message.content or ""
        tool_calls_dict = [tc.to_dict() for tc in tool_calls] if tool_calls else tool_calls
        messages.append({"role": "assistant", "tool_calls": tool_calls_dict, "content": content,})
        for tool_call in tool_calls:
            fx, args, _id = tool_call.function.name, tool_call.function.arguments, tool_call.id
            out = MAP_FN[fx](**json.loads(args))
            messages.append({"role": "tool", "tool_call_id": _id, "name": fx, "content": str(out),})
        # Stop once the model replies without requesting any tool
        if len(tool_calls) == 0:
            has_tool_calls = False
    return messages

Below we show several ways to run tool calling for different use cases:

Execute generated Python code

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "Create a Fibonacci function in Python and find fib(20)."}],
}]
unsloth_inference(messages, temperature = 1.0, top_p = 0.95, top_k = 40, min_p = 0.00)

Execute arbitrary terminal functions

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "Write 'I'm a happy Sloth' to a file, then print it back to me."}],
}]
messages = unsloth_inference(messages, temperature = 1.0, top_p = 1.0, top_k = 40, min_p = 0.00)

We checked whether the file had been created, and it had!

⚡ GLM-4.7-Flash + GLM 4.7 Tool Calling

We first download GLM-4.7 or GLM-4.7-Flash with a short Python script, then launch it with llama-server in a separate terminal (for example using tmux). In this example we download the large GLM-4.7 model:

# !pip install huggingface_hub hf_transfer
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id = "unsloth/GLM-4.7-GGUF",
    local_dir = "unsloth/GLM-4.7-GGUF",
    allow_patterns = ["*UD-Q2_K_XL*",], # For Q2_K_XL
)
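allow_patterns takes shell-style wildcard patterns, so "*UD-Q2_K_XL*" keeps only files whose path contains UD-Q2_K_XL. To sanity-check a pattern before starting a large download, you can try it locally with Python's fnmatch; this sketch assumes fnmatch matches the same way huggingface_hub applies these patterns. select_files is a hypothetical helper, and apart from the first filename (taken from the llama-server command below) the repo file names are made up.

```python
from fnmatch import fnmatch

def select_files(filenames: list[str], allow_patterns: list[str]) -> list[str]:
    """Keep filenames matching at least one glob pattern (fnmatch semantics assumed)."""
    return [f for f in filenames if any(fnmatch(f, p) for p in allow_patterns)]

# The first name appears in the llama-server command below; the others are hypothetical
repo_files = [
    "UD-Q2_K_XL/GLM-4.7-UD-Q2_K_XL-00001-of-00003.gguf",
    "UD-Q4_K_XL/GLM-4.7-UD-Q4_K_XL-00001-of-00005.gguf",
    "README.md",
]
print(select_files(repo_files, ["*UD-Q2_K_XL*"]))
# ['UD-Q2_K_XL/GLM-4.7-UD-Q2_K_XL-00001-of-00003.gguf']
```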

If the download succeeds, the GGUF shards are saved under unsloth/GLM-4.7-GGUF/UD-Q2_K_XL/.

Now launch it with llama-server in a new terminal (use tmux if you like):

./llama.cpp/llama-server \
    --model unsloth/GLM-4.7-GGUF/UD-Q2_K_XL/GLM-4.7-UD-Q2_K_XL-00001-of-00003.gguf \
    --alias "unsloth/GLM-4.7" \
    --threads -1 \
    --fit on \
    --prio 3 \
    --min_p 0.01 \
    --ctx-size 16384 \
    --port 8001 \
    --jinja
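Loading a large GGUF can take a while, and requests sent before the model is ready will fail. llama-server exposes a /health endpoint that answers HTTP 200 once it is ready, so a small polling helper can gate your tool-calling script. wait_for_server is a hypothetical name, it uses only the standard library, and its timeout values are arbitrary choices for this sketch.

```python
import time
import urllib.error
import urllib.request

def wait_for_server(base_url: str = "http://127.0.0.1:8001", timeout_s: float = 600.0,
                    interval_s: float = 2.0) -> bool:
    """Poll GET {base_url}/health until it answers 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # refused, or a non-200 status (raised as HTTPError) while still loading
        time.sleep(interval_s)
    return False
```

For example, call wait_for_server() at the top of your tool-calling script and stop if it returns False.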

Once the model has loaded, the server listens on port 8001, as set by --port 8001.

Now, in a new terminal, run the Python code below. Remember to first run the code from the Tool Calling Guide setup (the tools and unsloth_inference). We use GLM 4.7's optimal parameters, temperature = 0.7 and top_p = 1.0.

Tool calling for mathematical operations with GLM 4.7

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "What is today's date plus 3 days?"}],
}]
unsloth_inference(messages, temperature = 0.7, top_p = 1.0, top_k = -1, min_p = 0.00)

Tool calling to execute generated Python code with GLM 4.7

messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "Create a Fibonacci function in Python and find fib(20)."}],
}]
unsloth_inference(messages, temperature = 0.7, top_p = 1.0, top_k = -1, min_p = 0.00)

📙 Devstral 2 Tool Calling

We first download Devstral 2 with a short Python script, then launch it with llama-server in a separate terminal (for example using tmux):

# !pip install huggingface_hub hf_transfer
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id = "unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF",
    local_dir = "unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF",
    allow_patterns = ["*UD-Q4_K_XL*", "*mmproj-F16*"], # For Q4_K_XL
)

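If the download succeeds, the model and mmproj-F16.gguf files are saved under unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF/.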

Now launch it with llama-server in a new terminal (use tmux if you like):

./llama.cpp/llama-server \
    --model unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF/Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL.gguf \
    --mmproj unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF/mmproj-F16.gguf \
    --alias "unsloth/Devstral-Small-2-24B-Instruct-2512" \
    --threads -1 \
    --fit on \
    --prio 3 \
    --min_p 0.01 \
    --ctx-size 16384 \
    --port 8001 \
    --jinja
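unsloth_inference does not hard-code a model name: it asks the server via openai_client.models.list() and uses the first id, which with llama-server should be the --alias you passed (here unsloth/Devstral-Small-2-24B-Instruct-2512). The sketch below does the same lookup without the OpenAI SDK by parsing an OpenAI-style /v1/models response body. The helper names are hypothetical, and the sample JSON is hand-written in that shape, not captured from a real server.

```python
import json
import urllib.request

def first_model_id(models_json: str) -> str:
    """Extract the first model id from an OpenAI-style /v1/models response body."""
    return json.loads(models_json)["data"][0]["id"]

def fetch_first_model_id(base_url: str = "http://127.0.0.1:8001/v1") -> str:
    # Network call: only works while llama-server is running
    with urllib.request.urlopen(f"{base_url}/models", timeout=10) as resp:
        return first_model_id(resp.read().decode("utf-8"))

sample = '{"object": "list", "data": [{"id": "unsloth/Devstral-Small-2-24B-Instruct-2512", "object": "model"}]}'
print(first_model_id(sample))  # unsloth/Devstral-Small-2-24B-Instruct-2512
```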

If successful, the server will be listening on port 8001.

Then call the model with the same messages as in the setup examples, using only Devstral's recommended parameter, temperature = 0.15. Remember to run the Tool Calling Guide setup code first.
