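# IBM Granite 4.1 - How to Run Locally

IBM has released Granite-4.1 models in three sizes: **3B**, **8B**, and **30B**. Granite-4.1 is a family of long-context dense models built for instruction following, tool calling, chat, RAG, and coding use cases. The models are highly competitive for their size and were trained on 15T tokens.

Learn how to run the Unsloth Granite-4.1 Dynamic GGUFs, or how to fine-tune the models and train them with RL. You can fine-tune Granite-4.1 in a free notebook for a support-agent use case.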

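**Granite-4.1 model family:**

* **Granite-4.1-3B Dense:** A lightweight, efficient model for local, edge, and high-volume tasks. Best suited to fast classification, extraction, simple RAG, function calling, and fine-tuning on small GPUs.
* **Granite-4.1-8B Dense:** A balanced model for local assistants, RAG, coding, multilingual chat, and tool-use workflows. A good default when you want higher quality while keeping memory use practical.
* **Granite-4.1-30B Dense:** The most capable Granite-4.1 model. Best suited to more demanding enterprise assistants, long-context tasks, complex RAG, coding, multilingual workflows, and agentic tool-calling use cases.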

### ⚙️ Usage Guide

For deterministic, instruction-following responses, use the following settings:

`temperature=0.0`, `top_p=1.0`, `top_k=0`

* Temperature of `0.0`
* Top\_K = `0`
* Top\_P = `1.0`
* Recommended minimum context: `16,384`
* Maximum context window: `131,072` tokens
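
To see why these settings give deterministic output, here is a minimal conceptual sketch of a sampler in plain Python. It is an illustration only, not llama.cpp's or Unsloth's actual implementation; `sample_token` and `GRANITE_SETTINGS` are hypothetical names. It assumes the common convention that a temperature of `0.0` means greedy decoding, `top_k=0` disables top-k filtering, and `top_p=1.0` disables nucleus filtering.

```python
import math
import random

def sample_token(logits, temperature=0.0, top_k=0, top_p=1.0, rng=None):
    """Pick a token index from raw logits (illustrative sketch only)."""
    rng = rng or random.Random()
    # Temperature 0 is treated as greedy decoding: always take the top logit.
    if temperature <= 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Rank candidates by logit; top_k <= 0 means "keep every candidate".
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    # Softmax over the kept candidates at the given temperature.
    peak = logits[order[0]]
    weights = [math.exp((logits[i] - peak) / temperature) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]
    # top_p = 1.0 keeps the whole distribution; smaller values trim the tail.
    kept, cumulative = [], 0.0
    for idx, p in zip(order, probs):
        kept.append((idx, p))
        cumulative += p
        if cumulative >= top_p:
            break
    return rng.choices([i for i, _ in kept], weights=[p for _, p in kept])[0]

# The settings recommended above, expressed as keyword arguments.
GRANITE_SETTINGS = {"temperature": 0.0, "top_k": 0, "top_p": 1.0}

if __name__ == "__main__":
    # With temperature 0.0 the same index is returned on every call.
    print(sample_token([0.1, 2.5, 1.0], **GRANITE_SETTINGS))
```

With these settings the random branch is never reached, so repeated runs on the same prompt pick the same token at every step.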

#### Unsloth Granite-4.1 uploads

* [`unsloth/granite-4.1-3b-GGUF`](https://huggingface.co/unsloth/granite-4.1-3b-GGUF)
* [`unsloth/granite-4.1-8b-GGUF`](https://huggingface.co/unsloth/granite-4.1-8b-GGUF)
* [`unsloth/granite-4.1-30b-GGUF`](https://huggingface.co/unsloth/granite-4.1-30b-GGUF)

## Run Granite-4.1 Tutorials

<a href="/pages/e02293ea19a4c81500c51201ca5896a12ead136c#unsloth-studio-guide" class="button primary">Run in Unsloth Studio</a><a href="/pages/e02293ea19a4c81500c51201ca5896a12ead136c#llama.cpp-run-granite-4.1-tutorial" class="button secondary">Run in llama.cpp</a>

{% hint style="warning" %}
Do not use **CUDA 13.2**, as it may produce gibberish output. NVIDIA is working on a fix.
{% endhint %}

### 🦥 Unsloth Studio Guide

This tutorial uses [Unsloth Studio](/docs/jp/xin-zhe/studio.md), a new web UI for running and training LLMs. With Unsloth Studio you can run models locally on **Mac, Windows**, and Linux with **audio**, image, and text input, and you can also:

{% columns %}
{% column %}

* Search for, download, and [run GGUFs](/docs/jp/xin-zhe/studio.md#run-models-locally) and safetensor models
* **Compare** models **side by side**
* [**Self-healing** tool calling](/docs/jp/xin-zhe/studio.md#execute-code--heal-tool-calling) + **web search**
* [**Code execution**](/docs/jp/xin-zhe/studio.md#run-models-locally) (Python, Bash)
* [Automatic inference](/docs/jp/xin-zhe/studio.md#model-arena) parameter tuning (temp, top-p, etc.)
* [Train LLMs](/docs/jp/xin-zhe/studio.md#no-code-training) 2x faster with 70% less VRAM
  {% endcolumn %}

{% column %}

<div data-with-frame="true"><figure><img src="/files/c32867f999db074387ac16732ce548485cc593de" alt=""><figcaption></figcaption></figure></div>
{% endcolumn %}
{% endcolumns %}

{% stepper %}
{% step %}

#### Install Unsloth

**MacOS, Linux, WSL:**

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
```

**Windows PowerShell:**

```powershell
irm https://unsloth.ai/install.ps1 | iex
```

{% endstep %}

{% step %}

#### Set up Unsloth Studio (one time only)

Setup automatically installs Node.js (via nvm), builds the frontend, installs all required Python dependencies, and builds llama.cpp with CUDA support.

{% hint style="info" %}
**WSL users:** You will be prompted for your `sudo` password so that the build dependencies can be installed (`cmake`, `git`, `libcurl4-openssl-dev`).
{% endhint %}
{% endstep %}

{% step %}

#### Launch Unsloth

**MacOS, Linux, WSL:**

```bash
source unsloth_studio/bin/activate
unsloth studio -H 0.0.0.0 -p 8888
```

**Windows PowerShell:**

```powershell
& .\unsloth_studio\Scripts\unsloth.exe studio -H 0.0.0.0 -p 8888
```

<div data-with-frame="true"><figure><img src="/files/698ae7636b7c9b8a8122c6fbdabc1bd2273fdb2c" alt="" width="375"><figcaption></figcaption></figure></div>

**Then open `http://localhost:8888` in your browser.**
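
If the page does not load, one quick way to check whether anything is listening on the port is a small Python probe. This is a sketch that only uses the standard library; `is_port_open` is a hypothetical helper, and the host and port are the values from the launch command above.

```python
import socket

def is_port_open(host="localhost", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError if nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("Studio reachable:", is_port_open())
```

A `False` result usually means the server has not finished starting or was launched on a different port.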
{% endstep %}

{% step %}

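#### Search for and download Granite 4.1

On first launch, you will need to create a password to secure your account, and you will use it to sign in again later. Then open the [Studio Chat](/docs/jp/xin-zhe/studio/chat.md) tab, search for Granite 4.1 in the search bar, and download the model and quantization you want.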
{% endstep %}

{% step %}

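#### Run Granite 4.1

Inference parameters should be set automatically when you use Unsloth Studio, but you can also change them manually. You can edit the context length, chat template, and other settings as well.

For more details, see the [Unsloth Studio inference guide](/docs/jp/xin-zhe/studio/chat.md).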
{% endstep %}
{% endstepper %}

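### 🦙 Llama.cpp Tutorial

1. Get the latest `llama.cpp` from [GitHub](https://github.com/ggml-org/llama.cpp). You can also follow the build instructions below. Change `-DGGML_CUDA=ON` to `-DGGML_CUDA=OFF` if you do not have a GPU or only want CPU inference. On Apple Mac / Metal devices, set `-DGGML_CUDA=OFF` and then continue as usual; Metal support is enabled by default.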

```shell
apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-server llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp
```
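
The choice between `-DGGML_CUDA=ON` and `-DGGML_CUDA=OFF` can be sketched as a small Python helper that assembles the configure command. This is only an illustration: `cmake_configure_args` and `looks_like_cuda_machine` are hypothetical names, and checking for `nvidia-smi` on the `PATH` is an assumed heuristic for detecting an NVIDIA GPU, not something llama.cpp prescribes.

```python
import shutil

def looks_like_cuda_machine():
    """Heuristic (assumption): an NVIDIA driver install ships nvidia-smi."""
    return shutil.which("nvidia-smi") is not None

def cmake_configure_args(use_cuda, source_dir="llama.cpp",
                         build_dir="llama.cpp/build"):
    """Build the cmake configure command used in the steps above."""
    cuda_flag = "ON" if use_cuda else "OFF"
    return [
        "cmake", source_dir, "-B", build_dir,
        "-DBUILD_SHARED_LIBS=OFF",
        f"-DGGML_CUDA={cuda_flag}",  # OFF for CPU-only and Apple Metal builds
        "-DLLAMA_CURL=ON",
    ]

if __name__ == "__main__":
    # Print the command rather than running it, so it can be reviewed first.
    print(" ".join(cmake_configure_args(looks_like_cuda_machine())))
```

The resulting list can be passed to `subprocess.run(..., check=True)` if you prefer to drive the build from a script.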

2. To have `llama.cpp` load the model directly, you can run the command below. `UD-Q4_K_XL` is the quantization type. You can switch to other quantizations such as `Q4_K_M`, `Q5_K_M`, `Q8_0`, or full-precision BF16 where available.

```shell
./llama.cpp/llama-cli \
    -hf unsloth/granite-4.1-30b-GGUF:UD-Q4_K_XL
```

3. Alternatively, download the model via Hugging Face after installing `huggingface_hub` and `hf_transfer`.

```python
# !pip install huggingface_hub hf_transfer
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id = "unsloth/granite-4.1-30b-GGUF",
    local_dir = "unsloth/granite-4.1-30b-GGUF",
    allow_patterns = ["*UD-Q4_K_XL*"],
)
```
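
After the download finishes, it can help to confirm which `.gguf` files actually landed in `local_dir` before passing a path to `llama-cli`. The sketch below uses only the standard library; `find_gguf_files` is a hypothetical helper, and it assumes the quantization name appears in the file name, as it does in the path used in the next step.

```python
from pathlib import Path

def find_gguf_files(local_dir, quant="UD-Q4_K_XL"):
    """Return sorted .gguf paths under local_dir whose name contains quant."""
    root = Path(local_dir)
    # rglob also covers files that were placed in a subfolder of the repo.
    return sorted(p for p in root.rglob("*.gguf") if quant in p.name)

if __name__ == "__main__":
    for path in find_gguf_files("unsloth/granite-4.1-30b-GGUF"):
        print(path)
```

An empty result means the pattern in `allow_patterns` matched nothing or the download went to a different directory.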

4. Run Unsloth's Flappy Bird test.

```shell
./llama.cpp/llama-cli \
    --model unsloth/granite-4.1-30b-GGUF/granite-4.1-30b-UD-Q4_K_XL.gguf \
    --n-gpu-layers 99 \
    --seed 3407 \
    --prio 2 \
    --temp 0.0 \
    --top-k 0 \
    --top-p 1.0 \
    -p "Create a single-file Python pygame implementation of Flappy Bird."
```

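Edit `--threads 32` to set the number of CPU threads, `--ctx-size 16384` to set the context length, and `--n-gpu-layers 99` to control GPU offloading. If your GPU runs out of memory, try lowering the number of GPU layers. Remove `--n-gpu-layers` if you are using CPU-only inference.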
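
The flag choices above can be sketched as a Python helper that assembles the `llama-cli` argument list. `llama_cli_command` is a hypothetical name; the flags are the ones used in this guide, and `--n-gpu-layers` is simply left out when `n_gpu_layers` is `None`, matching the CPU-only advice.

```python
import shlex

def llama_cli_command(model_path, threads=None, ctx_size=16384,
                      n_gpu_layers=99):
    """Assemble a llama-cli invocation with the settings from this guide."""
    cmd = [
        "./llama.cpp/llama-cli",
        "--model", model_path,
        "--ctx-size", str(ctx_size),
        "--temp", "0.0",
        "--top-k", "0",
        "--top-p", "1.0",
    ]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    # Omit the flag entirely for CPU-only inference.
    if n_gpu_layers is not None:
        cmd += ["--n-gpu-layers", str(n_gpu_layers)]
    return cmd

if __name__ == "__main__":
    model = "unsloth/granite-4.1-30b-GGUF/granite-4.1-30b-UD-Q4_K_XL.gguf"
    # CPU-only run with 32 threads; printed so it can be copied into a shell.
    print(shlex.join(llama_cli_command(model, threads=32, n_gpu_layers=None)))
```

Keeping the arguments as a list avoids shell quoting problems if you later pass them to `subprocess.run`.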

5. For conversation mode:

```shell
./llama.cpp/llama-cli \
    --model unsloth/granite-4.1-30b-GGUF/granite-4.1-30b-UD-Q4_K_XL.gguf \
    --conversation \
    --n-gpu-layers 99 \
    --seed 3407 \
    --prio 2 \
    --temp 0.0 \
    --top-k 0 \
    --top-p 1.0
```

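### Fine-tune Granite-4.1 with Unsloth

Unsloth supports fine-tuning Granite-4.1 models, including 3B, 8B, and 30B. Training is 2x faster, uses less VRAM, and supports longer context lengths. Granite-4.1-3B and Granite-4.1-8B are good starting points for local fine-tuning, while Granite-4.1-30B is the strongest model for high-accuracy enterprise workflows.

* **Granite-4.0** [**free fine-tuning notebook**](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Granite4.0.ipynb) **(change the model name to Granite-4.1)**

In this notebook, you train a model to act as a support agent that understands customer interactions and provides analysis and recommendations. This setup lets you train a bot that assists support staff in real time. The notebook also shows how to train a model on data stored in a Google Sheet.

#### Unsloth configuration for Granite-4.1

If you are using an older version of Unsloth or fine-tuning locally, install the latest version of Unsloth: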

```python
!pip install --upgrade unsloth
```

```python
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/granite-4.1-8b",
    max_seq_length = 2048,   # Context length - can be longer, but uses more memory
    dtype = None,            # None for auto-detection
    load_in_4bit = True,     # 4-bit uses much less memory
    load_in_8bit = False,    # Slightly more accurate, but uses 2x the memory
    full_finetuning = False, # Full fine-tuning is now supported!
    # token = "hf_...",      # Use this for gated models
)
```

To force-reinstall the latest Unsloth and Unsloth Zoo:

```shell
pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo
```

You can change the model name to any Granite-4.1 model:

```python
model_name = "unsloth/granite-4.1-3b"
model_name = "unsloth/granite-4.1-8b"
model_name = "unsloth/granite-4.1-30b"
```

For the 30B model, use a larger GPU or a multi-GPU setup. If you run out of memory, reduce `max_seq_length` or use a lower-bit quantization.
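
As a rough guide to why the 30B model needs more memory, the weights alone take about (parameter count × bits per weight ÷ 8) bytes. The sketch below applies that arithmetic; `estimate_weight_gib` is a hypothetical helper, the parameter counts are the nominal 3B / 8B / 30B sizes, and the result ignores activations, optimizer state, the KV cache, and framework overhead, so real usage will be higher.

```python
def estimate_weight_gib(params_billion, bits_per_weight):
    """Approximate memory for model weights only, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

if __name__ == "__main__":
    for size in (3, 8, 30):
        for bits in (4, 8, 16):
            gib = estimate_weight_gib(size, bits)
            print(f"{size}B at {bits}-bit: about {gib:.1f} GiB for weights")
```

Under these assumptions the 30B weights come to roughly 14 GiB at 4-bit and about 56 GiB at 16-bit, which is why lowering the bit width is the first lever to try.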

