# NVIDIA Nemotron 3 Nano Omni - How to Run Locally

NVIDIA Nemotron-3-Nano-Omni-30B-A3B is an open hybrid reasoning MoE model with 30B total and 3B active parameters. It is built for multimodal agentic workloads, including video. It takes **audio**, **video**, text, images, and documents as input and outputs text. The model runs in **25GB RAM** at 4-bit and in 36GB at 8-bit. It has a **256K context**, and Nemotron 3 Nano Omni is the **strongest omni** model and the best-performing open multimodal model for its size. We collaborated with NVIDIA to support it from day one!\
**GGUF:** [Nemotron-3-Nano-Omni-30B-A3B-Reasoning](https://huggingface.co/unsloth/Nemotron-3-Nano-30B-A3B-GGUF)

### ⚙️ Usage Guide

NVIDIA recommends the following settings for inference:

{% columns %}
{% column %}
**Thinking mode:**

* `temperature = 0.6`
* `top_p = 0.95`
{% endcolumn %}

{% column %}
**Instruct mode:**

* `temperature = 0.2`
{% endcolumn %}
{% endcolumns %}

### Run Nemotron-3-Nano-Omni

Depending on your use case, you will need to [use different settings](#usage-guide). Some GGUFs end up with similar sizes because the model architecture (as with [gpt-oss](/docs/jp/moderu/gpt-oss-how-to-run-and-fine-tune.md)) has dimensions that are not divisible by 128, so some parts cannot be quantized to lower bit widths.

**GGUF:** [Nemotron-3-Nano-Omni-30B-A3B-Reasoning](https://huggingface.co/unsloth/Nemotron-3-Nano-30B-A3B-GGUF)

The 4-bit version of the model needs about 25GB of RAM, and the 8-bit version needs 36GB. These guides use `UD-Q4_K_XL`, which offers a good balance between size and accuracy.

{% hint style="warning" %}
Multimodal/vision GGUFs currently do not work in **Ollama** because of the separate `mmproj` vision files. Use a llama.cpp-compatible backend instead. Do not use **CUDA 13.2**, as it may produce gibberish output. NVIDIA is working on a fix.
{% endhint %}

### 🦥 Unsloth Studio Guide

For this tutorial, we'll use [Unsloth Studio](/docs/jp/shii/studio.md), a new web UI for running and training LLMs. With Unsloth Studio, you can run models locally with **audio**, image, and text input on **Mac, Windows**, and Linux, and also:

{% columns %}
{% column %}
* Search, download, and [run GGUF](/docs/jp/shii/studio.md#run-models-locally) and safetensors models
* **Compare** models **side by side**
* [**Self-healing** tool calling](/docs/jp/shii/studio.md#execute-code--heal-tool-calling) + **web search**
* [**Code execution**](/docs/jp/shii/studio.md#run-models-locally) (Python, Bash)
* [Automatic inference](/docs/jp/shii/studio.md#model-arena) parameter tuning (temp, top-p, etc.)
* [Train LLMs](/docs/jp/shii/studio.md#no-code-training) 2x faster with 70% less VRAM
{% endcolumn %}

{% column %}

{% endcolumn %}
{% endcolumns %}

{% stepper %}
{% step %}
#### Install Unsloth

**macOS, Linux, WSL:**

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
```

**Windows PowerShell:**

```powershell
irm https://unsloth.ai/install.ps1 | iex
```
{% endstep %}

{% step %}
#### Set up Unsloth Studio (one time only)

Setup automatically installs Node.js (via nvm), builds the frontend, installs all Python dependencies, and builds llama.cpp with CUDA support.

{% hint style="info" %}
**WSL users:** you will be prompted for `sudo` to install the build dependencies (`cmake`, `git`, `libcurl4-openssl-dev`).
{% endhint %}
{% endstep %}

{% step %}
#### Launch Unsloth

**macOS, Linux, WSL:**

```bash
source unsloth_studio/bin/activate
unsloth studio -H 0.0.0.0 -p 8888
```

**Windows PowerShell:**

```powershell
unsloth studio -H 0.0.0.0 -p 8888
```

Then open `http://127.0.0.1:8888` in your browser.
{% endstep %}

{% step %}
#### Search for and download NVIDIA-Nemotron-3-Nano-30B-A3B-Omni

On first launch, you'll need to create a password to secure your account; you'll use it to sign in again later. Then, in the [Unsloth Chat](/docs/jp/shii/studio/chat.md) tab, type Nemotron-3-Nano-Omni into the search bar and download the model and quant you want.

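As a rough sketch of the memory guidance at the top of this page (about 25GB of RAM for the 4-bit version and 36GB for the 8-bit version), the helper below picks the highest precision that fits a given memory budget. `pick_precision` is a hypothetical name, and real usage also depends on context length and the `mmproj` file, so treat the result as a starting point only.

```python
# Hypothetical helper: map available memory (GB) to the precision this page
# says fits. Thresholds are the page's figures: ~36GB for 8-bit, ~25GB for 4-bit.
from typing import Optional

# Ordered from highest precision to lowest.
MEMORY_REQUIREMENTS_GB = [
    ("8-bit", 36.0),
    ("4-bit", 25.0),
]


def pick_precision(available_gb: float) -> Optional[str]:
    """Return the highest precision whose stated requirement fits, else None."""
    for precision, required_gb in MEMORY_REQUIREMENTS_GB:
        if available_gb >= required_gb:
            return precision
    return None
```

For example, `pick_precision(32)` returns `"4-bit"`, which corresponds to the `UD-Q4_K_XL` quant used in these guides.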
{% endstep %}

{% step %}
#### Run Nemotron-3-Nano-30B-A3B-Omni

Unsloth Studio sets inference parameters automatically, but you can still change them manually. You can also edit the context length, chat template, and other settings. For details, see the [Unsloth Studio inference guide](/docs/jp/shii/studio/chat.md).

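If you drive the model from code instead of the UI, NVIDIA's recommended settings from the usage guide can be kept as small presets. This is a minimal sketch that assumes you pass them as keyword arguments to an OpenAI-compatible client, such as the one used with `llama-server` later on this page; `SAMPLING_PRESETS` and `sampling_for` are hypothetical names. Instruct mode only specifies `temperature`, so `top_p` is left to the server's default there.

```python
# Hypothetical presets mirroring the recommended settings on this page.
SAMPLING_PRESETS = {
    # Thinking mode: temperature 0.6, top_p 0.95
    "thinking": {"temperature": 0.6, "top_p": 0.95},
    # Instruct mode: only temperature (0.2) is specified
    "instruct": {"temperature": 0.2},
}


def sampling_for(mode: str) -> dict:
    """Return a copy of the preset for 'thinking' or 'instruct'."""
    try:
        return dict(SAMPLING_PRESETS[mode])
    except KeyError:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(SAMPLING_PRESETS)}"
        ) from None
```

For example: `openai_client.chat.completions.create(model=..., messages=..., **sampling_for("instruct"))`. Returning a copy keeps callers from accidentally mutating the shared presets.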
{% endstep %}
{% endstepper %}

### 🦙 Llama.cpp Tutorial

Steps to run the model in llama.cpp (we use 4-bit so it fits on most devices):

{% stepper %}
{% step %}
Get the latest `llama.cpp` on [GitHub here](https://github.com/ggml-org/llama.cpp). You can also follow the build instructions below. Change `-DGGML_CUDA=ON` to `-DGGML_CUDA=OFF` if you don't have a GPU or just want CPU inference. **For Apple Mac / Metal devices**, set `-DGGML_CUDA=OFF` and continue as usual; Metal support is on by default.

{% code overflow="wrap" %}
```bash
apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-mtmd-cli llama-server llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp
```
{% endcode %}
{% endstep %}

{% step %}
**Let's grab an image first!** You can also upload your own images. We'll use our mini logo, which shows how finetunes are made with Unsloth:

{% code overflow="wrap" %}
```bash
wget https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/unsloth%20made%20with%20love.png -O unsloth.png
```
{% endcode %}

Then grab a second image:

{% code overflow="wrap" %}
```bash
wget https://files.worldwildlife.org/wwfcmsprod/images/Sloth_Sitting_iStock_3_12_2014/story_full_width/8l7pbjmj29_iStock_000011145477Large_mini__1_.jpg -O picture.png
```
{% endcode %}
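If `wget` isn't available (for example on Windows), a minimal standard-library sketch can fetch the same two images. `IMAGES` and `fetch` are hypothetical names; the URLs and file names are exactly the ones from the `wget` commands above, so `picture.png` is still a JPEG file saved under a `.png` name.

```python
# Minimal sketch: download the example images without wget.
import os
import urllib.request

# File name -> source URL, copied from the wget commands above.
IMAGES = {
    "unsloth.png": "https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/unsloth%20made%20with%20love.png",
    "picture.png": "https://files.worldwildlife.org/wwfcmsprod/images/Sloth_Sitting_iStock_3_12_2014/story_full_width/8l7pbjmj29_iStock_000011145477Large_mini__1_.jpg",
}


def fetch(url: str, dest: str) -> str:
    """Download url to dest unless dest already exists; return dest."""
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest
```

Run `for name, url in IMAGES.items(): fetch(url, name)` to save both files in the current directory. Skipping existing files makes the step safe to re-run.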

{% endstep %}

{% step %}
Now let's download the model manually with the code below, which first installs `huggingface_hub`. If the download gets stuck, see [Hugging Face Hub, XET debugging](/docs/jp/ji-ben/troubleshooting-and-faqs/hugging-face-hub-xet-debugging.md).

{% code overflow="wrap" %}
```bash
pip install huggingface_hub
hf download unsloth/NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning-GGUF \
    --local-dir unsloth/NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning-GGUF \
    --include "*mmproj-BF16*" \
    --include "*UD-Q4_K_XL*" # Use "*UD-Q2_K_XL*" for dynamic 2-bit
```
{% endcode %}
{% endstep %}

{% step %}
Then run the model in conversation mode:

{% code overflow="wrap" %}
```bash
./llama.cpp/llama-cli \
    --model unsloth/NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning-GGUF/NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning-UD-Q4_K_XL.gguf \
    --mmproj unsloth/NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning-GGUF/mmproj-BF16.gguf \
    --temp 0.6 \
    --top-p 0.95 \
    --min-p 0.01
```
{% endcode %}
{% endstep %}

{% step %}
You should then see something like this:

{% endstep %}

{% step %}
Next, use `/image` to load both images and ask "What is this image?":

{% endstep %}

{% step %}
And for the sloth image:

{% endstep %}
{% endstepper %}

#### Llama-server serving and deployment

To deploy Nemotron 3 Nano Omni locally, use `llama-server`. In a new terminal (for example, inside `tmux`), deploy the model with:

```bash
./llama.cpp/llama-server \
    -hf unsloth/NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning-GGUF:UD-Q4_K_XL \
    --alias "unsloth/NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning" \
    --prio 3 \
    --temp 0.6 \
    --top-p 0.95 \
    --port 8001
```

If you downloaded the model manually, use:

{% code overflow="wrap" %}
```bash
./llama.cpp/llama-server \
    --model unsloth/NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning-GGUF/NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning-UD-Q4_K_XL.gguf \
    --mmproj unsloth/NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning-GGUF/mmproj-BF16.gguf \
    --alias "unsloth/NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning" \
    --prio 3 \
    --temp 0.6 \
    --top-p 0.95 \
    --port 8001
```
{% endcode %}

Then, in a new terminal, install the OpenAI client with `pip install openai` and run:

```python
from openai import OpenAI

openai_client = OpenAI(
    base_url = "http://127.0.0.1:8001/v1",
    api_key = "sk-no-key-required",
)
completion = openai_client.chat.completions.create(
    model = "unsloth/NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning",
    messages = [
        {"role": "user", "content": "What is 2+2?"},
    ],
)
message = completion.choices[0].message
# The thinking trace arrives as an extra `reasoning_content` field;
# getattr avoids an AttributeError if the server does not send it.
print(getattr(message, "reasoning_content", None))
print(message.content)
```

You should see something like this:

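Loading a 30B GGUF can take a while, and llama-server answers requests with an error until the model is ready. As a sketch, assuming the server from this section listens on port 8001, you can poll llama-server's `/health` endpoint, which returns HTTP 200 once the model is loaded, before running the client code. `server_is_ready` and `wait_until_ready` are hypothetical helper names.

```python
# Sketch: wait for llama-server to finish loading before sending requests.
# Assumes the server started above listens on http://127.0.0.1:8001.
import time
import urllib.request


def server_is_ready(base_url: str = "http://127.0.0.1:8001", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError/HTTPError (e.g. 503 while the model loads), refused
        # connections, and timeouts are all subclasses of OSError.
        return False


def wait_until_ready(base_url: str = "http://127.0.0.1:8001",
                     max_wait: float = 300.0, interval: float = 2.0) -> bool:
    """Poll /health every `interval` seconds for up to `max_wait` seconds."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        if server_is_ready(base_url):
            return True
        time.sleep(interval)
    return False
```

Calling `wait_until_ready()` before `chat.completions.create(...)` turns a startup race into a simple boolean check.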
#### Image input via the OpenAI-compatible server

Here we'll use `picture.png`, the sloth image we downloaded in the [#llama.cpp-tutorial](#llama.cpp-tutorial "mention"):

{% code expandable="true" %}
```python
from openai import OpenAI
import base64
import mimetypes

image_link = "picture.png"

def file_to_data_url(path: str) -> str:
    # Encode a local file as a base64 data URL (MIME type guessed from the extension).
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{data}"

openai_client = OpenAI(
    base_url = "http://127.0.0.1:8001/v1",
    api_key = "sk-no-key-required",
)
completion = openai_client.chat.completions.create(
    model = "unsloth/NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning",
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is this image?",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": file_to_data_url(image_link),
                    },
                },
            ],
        }
    ],
)
message = completion.choices[0].message
# getattr avoids an AttributeError if `reasoning_content` is not returned.
print(getattr(message, "reasoning_content", None))
print(message.content)
```
{% endcode %}

You should see output like this:

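The `wget` command in the llama.cpp tutorial saves a JPEG under the name `picture.png`, so `mimetypes.guess_type` labels it `image/png` in the data URL of the image-input example. llama.cpp may still decode it, but a more robust sketch sniffs the file's leading magic bytes instead of trusting the extension. `sniff_image_mime` is a hypothetical helper; it recognizes only PNG, JPEG, GIF, and WebP signatures and falls back to `mimetypes` for anything else.

```python
# Sketch: build a data URL whose MIME type matches the file's actual contents.
import base64
import mimetypes

# Leading "magic bytes" of a few common image formats.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_image_mime(path: str) -> str:
    """Guess the MIME type from the file header, falling back to the extension."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, mime in _SIGNATURES:
        if head.startswith(magic):
            return mime
    # WebP layout: b"RIFF" + 4-byte little-endian size + b"WEBP"
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "image/webp"
    return mimetypes.guess_type(path)[0] or "application/octet-stream"


def file_to_data_url(path: str) -> str:
    """Encode a local image as a base64 data URL for `image_url` content parts."""
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{sniff_image_mime(path)};base64,{data}"
```

This `file_to_data_url` keeps the same signature as the one in the image-input example, so it can be swapped in without touching the request code.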
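### 🦥 Fine-tuning Nemotron 3 Nano Omni

Unsloth supports the entire [Nemotron](/docs/jp/moderu/nemotron-3.md) model family. Nemotron 3 Nano Omni is well suited to multimodal agentic datasets, and you can train it on audio, vision, or text with Unsloth. Fine-tuning on **video input** is not currently supported.

For text-only training and notebooks, you can start from the existing [Nemotron 3 Nano fine-tuning flow](/docs/jp/moderu/nemotron-3.md#fine-tuning-nemotron-3-and-rl). For multimodal adapters, make sure your dataset contains the modalities your agent actually needs:

* **Computer use:** screenshots, UI state, cursor/context, and the expected next action
* **Document intelligence:** PDFs, screenshots, charts, tables, and structured extraction targets
* **Audio understanding:** audio clips, sampled frames, summaries, timestamps, events, and follow-up questions
* **Agent loops:** observe → reason → act → verify examples

For Omni, don't simply reuse your text-only VRAM budget. The multimodal encoders, projector weights, image tokens, audio chunks, and long contexts all add to memory use. Start with a short context and a small batch size, then scale up gradually.

### Benchmarks

Nemotron 3 Nano Omni is the strongest omni model for its size. It is also the most efficient open multimodal model while leading in accuracy, and it outperforms Qwen3-Omni-30B-A3B on every benchmark.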
