FunctionGemma: How to Run and Fine-tune

Learn how to run and fine-tune FunctionGemma locally on your device and phone.

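FunctionGemma is a new 270M-parameter model from Google, designed for function calling and fine-tuning. It is based on Gemma 3 270M and trained specifically for text-only tool calling, and its small size makes it well suited to deployment on phones.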

You can run the full-precision model in 550MB of RAM (CPU), and you can now fine-tune it locally with Unsloth. Thanks to Google DeepMind for partnering with Unsloth to provide day-zero support!

Running Tutorial | Fine-tuning FunctionGemma

Run the FunctionGemma GGUF: unsloth/functiongemma-270m-it-GGUF

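Free notebooks:

Fine-tune to reason/think before tool calls using our FunctionGemma notebook
Do multi-turn tool calling in a free multi-turn tool calling notebook
Fine-tune to enable mobile actions (calendar, setting timers) in our mobile actions notebook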

⚙️ Usage Guide

Google recommends the following inference settings:

top_k = 64
top_p = 0.95
temperature = 1.0
Maximum context length = 32,768
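As a minimal sketch, the recommended settings can be kept in one place and turned into command-line arguments. The flag names (`--top-k`, `--top-p`, `--temp`, `--ctx-size`) are the ones used in the llama.cpp commands later in this guide; the dictionary names and the helper `to_llama_cli_flags` are hypothetical and exist only for illustration.

```python
# Recommended FunctionGemma inference settings, as listed in this guide.
RECOMMENDED_SETTINGS = {
    "top_k": 64,
    "top_p": 0.95,
    "temperature": 1.0,
    "ctx_size": 32768,
}

# Mapping from setting name to the llama-cli flag used in this guide.
FLAG_NAMES = {
    "top_k": "--top-k",
    "top_p": "--top-p",
    "temperature": "--temp",
    "ctx_size": "--ctx-size",
}

def to_llama_cli_flags(settings):
    """Hypothetical helper: flatten a settings dict into a list of CLI arguments."""
    flags = []
    for key, value in settings.items():
        flags += [FLAG_NAMES[key], str(value)]
    return flags

print(" ".join(to_llama_cli_flags(RECOMMENDED_SETTINGS)))
```

Keeping the values in a single dictionary makes it harder for the llama.cpp command line and the Python `generate` call to drift apart.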

You can see the chat template format by running the following:

def get_today_date():
    """ Gets today's date """
    return {"today_date": "18 December 2025"}

# Assumes `tokenizer` is the FunctionGemma tokenizer, loaded beforehand
tokenizer.apply_chat_template(
    [
        {"role" : "user", "content" : "what is today's date?"},
    ],
    tools = [get_today_date], add_generation_prompt = True, tokenize = False,
)

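FunctionGemma chat template format:

FunctionGemma requires a system or developer message that reads: You are a model that can do function calling with the following functions. The Unsloth versions have this built in if you forget to pass one, so please use unsloth/functiongemma-270m-it.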

<bos><start_of_turn>developer\nYou are a model that can do function calling with the following functions<start_function_declaration>declaration:get_today_date{description:<escape>Gets today's date<escape>,parameters:{type:<escape>OBJECT<escape>}}<end_function_declaration><end_of_turn>\n<start_of_turn>user\nwhat is today's date?<end_of_turn>\n<start_of_turn>model\n
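To make the layout of that prompt concrete, here is a rough sketch that rebuilds the same string by hand for a function with no parameters. The helpers `render_declaration` and `render_prompt` are hypothetical; in practice `tokenizer.apply_chat_template` produces this text, and this sketch does not attempt to cover functions that take arguments.

```python
DEVELOPER_PROMPT = "You are a model that can do function calling with the following functions"

def render_declaration(name, description):
    """Render a declaration for a function that takes no parameters."""
    return (
        "<start_function_declaration>"
        f"declaration:{name}{{description:<escape>{description}<escape>,"
        "parameters:{type:<escape>OBJECT<escape>}}"
        "<end_function_declaration>"
    )

def render_prompt(declarations, user_message):
    """Assemble developer turn, user turn, and the opening of the model turn."""
    parts = [
        "<bos><start_of_turn>developer\n",
        DEVELOPER_PROMPT,
        *declarations,
        "<end_of_turn>\n",
        "<start_of_turn>user\n",
        user_message,
        "<end_of_turn>\n",
        "<start_of_turn>model\n",
    ]
    return "".join(parts)

prompt = render_prompt(
    [render_declaration("get_today_date", "Gets today's date")],
    "what is today's date?",
)
print(repr(prompt))
```

The developer turn carries both the fixed instruction sentence and every function declaration, and the prompt ends with an open model turn so that generation starts inside it.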

🖥️ Run FunctionGemma

See the local desktop guide below, or see our phone deployment guide.

Llama.cpp Tutorial (GGUF):

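Instructions for running in llama.cpp (the commands below use the full-precision BF16 GGUF, since the model is small enough to fit most devices):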

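Get the latest llama.cpp on GitHub here. You can also follow the build instructions below. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or only want CPU inference. For Apple Mac / Metal devices, set -DGGML_CUDA=OFF and then continue as usual; Metal support is on by default.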

apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-mtmd-cli llama-server llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp

You can pull the model directly from Hugging Face. Because the model is so small, we will use the unquantized full-precision BF16 variant.

./llama.cpp/llama-cli \
    -hf unsloth/functiongemma-270m-it-GGUF:BF16 \
    --jinja -ngl 99 --ctx-size 32768 \
    --top-k 64 --top-p 0.95 --temp 1.0
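A back-of-the-envelope calculation shows why BF16 is practical here. The sketch below assumes the rounded 270M parameter count from the model name, 2 bytes per parameter for BF16, and that weights dominate memory; it ignores the KV cache and runtime overhead, and real quantized GGUF files mix bit widths, so treat the 4-bit number as an idealized lower bound. The helper name `weight_memory_mb` is hypothetical.

```python
def weight_memory_mb(num_params, bits_per_param):
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bits_per_param / 8 / 1e6

NUM_PARAMS = 270_000_000  # rounded parameter count, taken from the model name

bf16_mb = weight_memory_mb(NUM_PARAMS, 16)  # BF16: 2 bytes per parameter
q4_mb = weight_memory_mb(NUM_PARAMS, 4)     # idealized 4-bit quantization

print(f"BF16 weights:  ~{bf16_mb:.0f} MB")
print(f"4-bit weights: ~{q4_mb:.0f} MB")
```

The BF16 estimate of about 540 MB is in line with the roughly 550MB figure quoted in the introduction.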

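Alternatively, download the model with the script below (after running pip install huggingface_hub hf_transfer). You can choose BF16 or another quantized version, though going below 4-bit is not recommended given how small the model is.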

# !pip install huggingface_hub hf_transfer
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id = "unsloth/functiongemma-270m-it-GGUF",
    local_dir = "unsloth/functiongemma-270m-it-GGUF",
    allow_patterns = ["*BF16*"],
)

Then run the model in conversation mode:

./llama.cpp/llama-cli \
    --model unsloth/functiongemma-270m-it-GGUF/functiongemma-270m-it-BF16.gguf \
    --ctx-size 32768 \
    --n-gpu-layers 99 \
    --seed 3407 \
    --prio 2 \
    --top-k 64 \
    --top-p 0.95 \
    --temp 1.0 \
    --jinja

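📱 Phone Deployment

Because of its small size, you can also run and deploy FunctionGemma on your phone. We collaborated with PyTorch on a workflow that uses quantization-aware training (QAT) to recover 70% of the accuracy and then deploys directly to edge devices.

Deploy FunctionGemma locally to Pixel 8 and iPhone 15 Pro for inference speeds of roughly 50 tokens/s
Get privacy-first, instant responses and offline capabilities
Use our free Colab notebook to fine-tune Qwen3 0.6B and export it for phone deployment. Just change the model to Gemma3 and follow the Gemma 3 Executorch docs.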

📱Run LLMs on your Phone

See our iOS and Android tutorials for phone deployment:

iOS Tutorial | Android Tutorial

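🦥 Fine-tuning FunctionGemma

Google notes that FunctionGemma is intended to be fine-tuned for your specific function-calling task, including multi-turn use cases. Unsloth now supports fine-tuning FunctionGemma. We created 2 fine-tuning notebooks that show how to train the model for free in a Colab notebook, via full fine-tuning or LoRA: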

Reason-before-tool-calling fine-tuning notebook (Google Colab)

Mobile Actions fine-tuning notebook (Google Colab)

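In the reason-before-tool-calling fine-tuning notebook, we fine-tune the model to "think/reason" before making a function call. Chain-of-thought reasoning is becoming increasingly important for improving tool-use capabilities.

FunctionGemma is a small model specialized for function calling. It uses its own distinct chat template. When given tool definitions and a user prompt, it generates a structured output. We can then parse this output to execute the tool, retrieve the results, and use them to generate a final answer.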

Turn type and content:

Developer Prompt:
<start_of_turn>developer
You can do function calling with the following functions:

Function Declaration:
<start_function_declaration>declaration:get_weather{
description: "Get weather for city",
parameters: { city: STRING }
}
<end_function_declaration>
<end_of_turn>

User Turn:
<start_of_turn>user
What is the weather like in Paris?
<end_of_turn>

Function Call:
<start_of_turn>model
<start_function_call>call:get_weather{
city: "paris"
}
<end_function_call>

Function Response:
<start_function_response>response:get_weather{temperature:26}
<end_function_response>

Assistant Closing:
The weather in Paris is 26 degrees Celsius.
<end_of_turn>
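The function response turn is the piece your own code has to supply after running the tool. A minimal sketch of formatting it is shown below; `format_value` and `format_function_response` are hypothetical helpers, and wrapping strings in `<escape>` markers is an assumption based on how string values appear elsewhere in the FunctionGemma template. Normally the chat template does this for you when you pass a tool message.

```python
def format_value(value):
    """Strings are wrapped in <escape> markers; other values use str()."""
    if isinstance(value, str):
        return f"<escape>{value}<escape>"
    return str(value)

def format_function_response(name, result):
    """Render a tool result dict as a FunctionGemma-style response block."""
    body = ",".join(f"{key}:{format_value(val)}" for key, val in result.items())
    return f"<start_function_response>response:{name}{{{body}}}<end_function_response>"

print(format_function_response("get_weather", {"temperature": 26}))
```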

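Here, we implement a simplified version with a single thinking block (rather than interleaved reasoning), using <think></think>. Our model interaction therefore looks like this:

Thinking + Function Call:
<start_of_turn>model
<think>
The user wants the weather for Paris. I have the get_weather tool. I should call it with the city argument.
</think>
<start_function_call>call:get_weather{
city: "paris"
}
<end_function_call>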
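When consuming output in this shape, the reasoning usually needs to be separated from the call before parsing. The following sketch assumes the `<think>` tags survive decoding as plain text; `split_thinking` is a hypothetical helper, not part of Unsloth or the notebooks.

```python
import re

THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(text):
    """Return (thought, remainder); thought is None when no <think> block exists."""
    match = THINK_PATTERN.search(text)
    if match is None:
        return None, text.strip()
    thought = match.group(1).strip()
    remainder = (text[:match.start()] + text[match.end():]).strip()
    return thought, remainder

sample = (
    "<think>\nThe user wants the weather for Paris. I have the get_weather tool. "
    "I should call it with the city argument.\n</think>\n"
    "<start_function_call>call:get_weather{city:<escape>paris<escape>}<end_function_call>"
)
thought, call = split_thinking(sample)
print("Thought:", thought)
print("Call:   ", call)
```

Logging the thought separately is handy for debugging, while only the remainder is passed on to the tool-call parser.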

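🪗 Fine-tuning FunctionGemma for Mobile Actions

We also created a notebook that shows how to make FunctionGemma perform mobile actions. In the Mobile Actions fine-tuning notebook, we also enabled evaluation, and show that fine-tuning for on-device actions works well, as seen in the decreasing evaluation loss.

For example, given the prompt: Please set a reminder for a "Team Sync Meeting" this Friday, June 6th, 2025, at 2 PM.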

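[{'role': 'developer',
  'content': 'Current date and time given in YYYY-MM-DDTHH:MM:SS format: 2025-06-04T15:29:23\nDay of week is Wednesday\nYou are a model that can do function calling with the following functions\n',
  'tool_calls': None},
 {'role': 'user',
  'content': 'Please set a reminder for a "Team Sync Meeting" this Friday, June 6th, 2025, at 2 PM.',
  'tool_calls': None}]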
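The developer message grounds the model in the current date and weekday, which is what lets it resolve a phrase like "this Friday". A small sketch of generating that message at runtime follows; `build_developer_content` is a hypothetical helper and the English wording of the message is assumed from the example above. A fixed weekday list is used instead of `strftime("%A")` so the output does not depend on the system locale.

```python
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def build_developer_content(now):
    """Build the developer message text for a given timestamp."""
    return (
        "Current date and time given in YYYY-MM-DDTHH:MM:SS format: "
        f"{now.isoformat(timespec='seconds')}\n"
        f"Day of week is {WEEKDAYS[now.weekday()]}\n"
        "You are a model that can do function calling with the following functions\n"
    )

now = datetime(2025, 6, 4, 15, 29, 23)
messages = [
    {"role": "developer", "content": build_developer_content(now), "tool_calls": None},
]
print(messages[0]["content"])
```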

We fine-tune the model so that it can output:

<start_of_turn>user
Please set a reminder for a "Team Sync Meeting" this Friday, June 6th, 2025, at 2 PM.<end_of_turn>
<start_of_turn>model
<start_function_call>call:create_calendar_event{body:None,datetime:2025-06-06 14:00:00,email:None,first_name:None,last_name:None,phone_number:None,query:None,subject:None,title:<escape>Team Sync Meeting<escape>,to:None}<end_function_call><start_function_response>

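🏃‍♂️ Multi-turn Tool Calling with FunctionGemma

We also created a notebook that shows how to make FunctionGemma do multi-turn tool calling. In the multi-turn tool calling notebook, we show that FunctionGemma is capable of calling tools across a long multi-turn conversation.

You first have to specify your tools, as shown below: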

def get_today_date():
    """
    Gets today's date

    Returns:
        today_date: Today's date in format 18 December 2025
    """
    from datetime import datetime
    today_date = datetime.today().strftime("%d %B %Y")
    return {"today_date": today_date}

def get_current_weather(location: str, unit: str = "celsius"):
    """
    Gets the current weather in a given location.

    Args:
        location: The city and state, e.g. "San Francisco, CA, USA" or "Sydney, Australia"
        unit: The unit to return the temperature in. (choices: ["celsius", "fahrenheit"])

    Returns:
        temperature: The current temperature in the given location
        weather: The current weather in the given location
    """
    if "San Francisco" in location.title():
        return {"temperature": 15, "weather": "sunny"}
    elif "Sydney" in location.title():
        return {"temperature": 25, "weather": "cloudy"}
    else:
        return {"temperature": 30, "weather": "rainy"}

def add_numbers(x: float | str, y: float | str):
    """
    Adds 2 numbers together

    Args:
        x: First number
        y: Second number

    Returns:
        result: x + y
    """
    return {"result" : float(x) + float(y)}

def multiply_numbers(x: float | str, y: float | str):
    """
    Multiplies 2 numbers together

    Args:
        x: First number
        y: Second number

    Returns:
        result: x * y
    """
    return {"result" : float(x) * float(y)}
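The signatures and docstrings matter because the tool description shown to the model is derived from them when you pass Python functions as `tools`. The sketch below is not the library's implementation; it is a simplified illustration, using only `inspect`, of the kind of information that can be read off a function. `describe_tool` is a hypothetical helper, and the example tool is a shortened copy of `get_current_weather`.

```python
import inspect

def describe_tool(fn):
    """Hypothetical helper: summarize a tool from its signature and docstring."""
    doc = inspect.getdoc(fn) or ""
    summary = doc.splitlines()[0] if doc else ""
    params = {}
    for name, param in inspect.signature(fn).parameters.items():
        # A parameter without a default value must be supplied by the model.
        params[name] = {"required": param.default is inspect.Parameter.empty}
    return {"name": fn.__name__, "description": summary, "parameters": params}

def get_current_weather(location: str, unit: str = "celsius"):
    """
    Gets the current weather in a given location.

    Args:
        location: The city and state
        unit: The unit to return the temperature in.
    """
    return {"temperature": 30, "weather": "rainy"}

print(describe_tool(get_current_weather))
```

A clear first docstring line and sensible defaults therefore translate directly into a clearer declaration for the model.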

We then create a mapping for all the tools:

FUNCTION_MAPPING = {
    "get_today_date" : get_today_date,
    "get_current_weather" : get_current_weather,
    "add_numbers": add_numbers,
    "multiply_numbers": multiply_numbers,
}
TOOLS = list(FUNCTION_MAPPING.values())

We also need some tool invocation and parsing code:

#@title FunctionGemma parsing code (expandable)
import re
def extract_tool_calls(text):
    def cast(v):
        # Try int, then float, then booleans; otherwise return the unquoted string
        try: return int(v)
        except ValueError:
            try: return float(v)
            except ValueError: return {'true': True, 'false': False}.get(v.lower(), v.strip("'\""))

    return [{
        "name": name,
        "arguments": {
            k: cast((v1 or v2).strip())
            for k, v1, v2 in re.findall(r"(\w+):(?:<escape>(.*?)<escape>|([^,}]*))", args)
        }
    } for name, args in re.findall(r"<start_function_call>call:(\w+)\{(.*?)\}<end_function_call>", text, re.DOTALL)]

def process_tool_calls(output, messages):
    calls = extract_tool_calls(output)
    if not calls: return messages
    # Record the model's tool calls as an assistant message
    messages.append({
        "role": "assistant",
        "tool_calls": [{"type": "function", "function": call} for call in calls]
    })
    # Execute each requested tool and collect the results
    results = [
        {"name": c['name'], "response": FUNCTION_MAPPING[c['name']](**c['arguments'])}
        for c in calls
    ]
    messages.append({ "role": "tool", "content": results })
    return messages

def _do_inference(model, messages, max_new_tokens = 128):
    inputs = tokenizer.apply_chat_template(
        messages, tools = TOOLS, add_generation_prompt = True, return_dict = True, return_tensors = "pt",
    )
    output = tokenizer.decode(inputs["input_ids"][0], skip_special_tokens = False)

    out = model.generate(**inputs.to(model.device), max_new_tokens = max_new_tokens,
                         top_p = 0.95, top_k = 64, temperature = 1.0,)
    generated_tokens = out[0][len(inputs["input_ids"][0]):]
    return tokenizer.decode(generated_tokens, skip_special_tokens = True)
    
def do_inference(model, messages, print_assistant = True, max_new_tokens = 128):
    output = _do_inference(model, messages, max_new_tokens = max_new_tokens)
    messages = process_tool_calls(output, messages)
    # If a tool ran, generate again so the model can use the tool results
    if messages[-1]["role"] == "tool":
        output = _do_inference(model, messages, max_new_tokens = max_new_tokens)
    messages.append({"role": "assistant", "content": output})
    if print_assistant: print(output)
    return messages
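The parse, dispatch, and append cycle can be tried without loading a model by feeding it a canned output string. The sketch below is a self-contained dry run that reuses the same regular expressions as the parsing code above; `parse_calls`, `run_tools`, and `TOOLS_BY_NAME` are hypothetical names, arguments are left as strings rather than cast, and returning an error entry for an unknown tool is an added assumption (the code above would raise a `KeyError` instead).

```python
import re

def parse_calls(text):
    """Compact parser using the same regular expressions as the code above."""
    calls = []
    for name, args in re.findall(
        r"<start_function_call>call:(\w+)\{(.*?)\}<end_function_call>", text, re.DOTALL
    ):
        pairs = re.findall(r"(\w+):(?:<escape>(.*?)<escape>|([^,}]*))", args)
        arguments = {k: (v1 or v2).strip() for k, v1, v2 in pairs}
        calls.append({"name": name, "arguments": arguments})
    return calls

def add_numbers(x, y):
    return {"result": float(x) + float(y)}

TOOLS_BY_NAME = {"add_numbers": add_numbers}

def run_tools(output, messages):
    """Append the assistant tool-call message and the tool results to messages."""
    calls = parse_calls(output)
    if not calls:
        return messages
    messages.append({
        "role": "assistant",
        "tool_calls": [{"type": "function", "function": c} for c in calls],
    })
    results = []
    for c in calls:
        tool = TOOLS_BY_NAME.get(c["name"])
        if tool is None:
            response = {"error": f"unknown tool: {c['name']}"}
        else:
            response = tool(**c["arguments"])
        results.append({"name": c["name"], "response": response})
    messages.append({"role": "tool", "content": results})
    return messages

canned_output = "<start_function_call>call:add_numbers{x:2,y:3.5}<end_function_call>"
messages = run_tools(canned_output, [{"role": "user", "content": "What is 2 + 3.5?"}])
print(messages[-1])
```

After this step the message list ends with a `tool` message, which is the condition `do_inference` checks before asking the model for its final answer.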

Now we can call the model!

from unsloth import FastLanguageModel
import torch
max_seq_length = 4096 # Can choose any sequence length!
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/functiongemma-270m-it",
    max_seq_length = max_seq_length, # Choose any for long context!
    load_in_4bit = False,  # 4 bit quantization to reduce memory
    load_in_8bit = False, # [NEW!] A bit more accurate, uses 2x memory
    load_in_16bit = True, # [NEW!] Enables 16bit LoRA
    full_finetuning = False, # [NEW!] We have full finetuning now!
    # token = "hf_...", # use one if using gated models
)

messages = []
messages.append({"role": "user", "content": "What is today's date?"})
messages = do_inference(model, messages, max_new_tokens = 128)