# MiniMax M3 - How to Run Locally

MiniMax M3 is a new **\~428B-parameter (23B active)** open model for coding, agentic workflows, collaborative tasks, and multimodal chat. This multimodal model accepts text, image, and video input and has a **1M-token context window**. The unquantized bf16 weights take about **855GB**, while the 1-bit GGUF cuts this to just **128GB (-85%)**: [**MiniMax-M3 GGUF**](https://huggingface.co/unsloth/MiniMax-M3-GGUF)

The model performs on par with Gemini 3.1 Pro, scoring 59% on SWE-Bench Pro, 66% on Terminal-Bench 2.1, 34.8% on SWE-fficiency, and 28.8% on KernelBench Hard. Thanks to MiniMax for providing day-zero access.

{% columns %}
{% column width="50%" %}
You can now run MiniMax M3 directly in [Unsloth Studio](#unsloth-studio-guide). Example: 5-bit MiniMax M3 running locally on a single 512GB M3 Ultra via Unsloth Studio:

{% hint style="info" %}
The MiniMax-M3 GGUFs are still experimental. MiniMax-M3 itself is natively multimodal, but the current experimental GGUFs are **text-only** and do not support MiniMax Sparse Attention.
{% endhint %}
{% endcolumn %}

{% column width="50%" %}

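{% endcolumn %}
{% endcolumns %}

#### :gear: Usage Guide

The smallest GGUF quant, `UD-IQ1_M`, uses **128GB** of disk space. Because the file size does not include the KV cache or context allocation, aim for at least **133GB of RAM** (or combined RAM + VRAM) to run the model. For the best results we recommend `UD-IQ3_XXS`, which is **159GB**. The **4-bit** `UD-IQ4_XS` quant is **208GB**, and `UD-Q4_K_XL` is **265GB**. These are better suited to 256GB+ or 512GB-class systems, multi-GPU servers, or systems that combine CPU RAM with GPU offloading.

**Table: Inference hardware requirements** (units = total memory: RAM + VRAM, or unified memory)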

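| 1-bit  | 2-bit  | 3-bit      | 4-bit      | 5-bit  | 8-bit      |
| ------ | ------ | ---------- | ---------- | ------ | ---------- |
| 133 GB | 148 GB | 164-200 GB | 213-270 GB | 325 GB | 460-470 GB |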
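
To read the table programmatically, the sketch below is a hypothetical shell helper (`suggest_bits` is our illustration, not part of Unsloth or llama.cpp). It prints the largest bit-width tier whose lower-bound memory requirement from the table fits a given total of RAM + VRAM (or unified memory) in GB. It only encodes the table's numbers; real headroom also depends on context length and KV cache size.

```bash
# Hypothetical helper (not part of Unsloth or llama.cpp): print the largest
# bit-width tier from the hardware table whose lower-bound memory requirement
# fits in TOTAL_GB of RAM + VRAM (or unified memory). Returns 1 if none fits.
suggest_bits() {
  local total_gb="$1" best="" tier min_gb
  # "tier:minimum GB" pairs, taken from the lower end of each table range.
  for tier in 1-bit:133 2-bit:148 3-bit:164 4-bit:213 5-bit:325 8-bit:460; do
    min_gb="${tier#*:}"        # text after the colon, e.g. 133
    if [ "$total_gb" -ge "$min_gb" ]; then
      best="${tier%%:*}"       # text before the colon, e.g. 1-bit
    fi
  done
  [ -n "$best" ] || return 1
  printf '%s\n' "$best"
}

suggest_bits 256   # prints 4-bit
```

Because the helper uses the lower end of each range, treat its answer as an upper limit: on a 256GB system the 4-bit tier fits `UD-IQ4_XS` (208GB) but not necessarily `UD-Q4_K_XL` (265GB).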

{% hint style="success" %}
For best performance, make sure your total available memory, including VRAM and system RAM, exceeds the size of the quantized model file by a comfortable margin.
{% endhint %}

#### Recommended Settings

MiniMax recommends the following parameters for best performance: `temperature=1.0`, `top_p=0.95`, `top_k=40`.

{% columns %}
{% column %}
| `temperature = 1.0` |
| ------------------- |
| `top_p = 0.95`      |
| `top_k = 40`        |
{% endcolumn %}

{% column %}
* **Maximum context window:** `1,048,576`
* Default system prompt:

{% code overflow="wrap" %}
```
You are a helpful assistant. Your name is MiniMax-M3, and you were built by MiniMax.
```
{% endcode %}
{% endcolumn %}
{% endcolumns %}

## Run MiniMax-M3 Tutorials

In this tutorial we use the smallest current quant, `UD-IQ1_M`, because MiniMax-M3 is very large. If your machine has enough memory, replace `UD-IQ1_M` with `UD-IQ4_XS`, `UD-Q4_K_XL`, or another quant. You can also run MiniMax-M3 in [Unsloth Studio](#run-in-unsloth-studio).

### 🦥 Unsloth Studio Guide

{% hint style="success" %}
You can now run MiniMax M3 in [Unsloth Studio](#unsloth-studio-guide) ✨. Make sure you are using [`v0.1.463-beta`](https://github.com/unslothai/unsloth/tree/v0.1.462-beta) or later, or `2026.6.6`.
{% endhint %}

MiniMax M3 can now be run and trained in [Unsloth Studio](/docs/zh/xin-de/studio.md), our new open-source web UI for local AI. Unsloth Studio lets you run models locally on **macOS**, **Windows**, and Linux, and supports:

{% columns %}
{% column %}
* Searching, downloading, and [running GGUF](/docs/zh/xin-de/studio.md#run-models-locally) and safetensor models
* [**Self-healing** tool calling](/docs/zh/xin-de/studio.md#execute-code--heal-tool-calling) + **web search**
* [**Code execution**](/docs/zh/xin-de/studio.md#run-models-locally) (Python, Bash)
* [Automatic inference](/docs/zh/xin-de/studio.md#model-arena) parameter tuning (temp, top-p, etc.)
* Fast CPU + GPU inference via llama.cpp
* [Training LLMs](/docs/zh/xin-de/studio.md#no-code-training) 2x faster with 70% less VRAM
{% endcolumn %}

{% column %}

{% endcolumn %}
{% endcolumns %}

{% stepper %}
{% step %}
#### Install Unsloth

Make sure you are using the latest [`v0.1.463-beta`](https://github.com/unslothai/unsloth/tree/v0.1.462-beta) or `2026.6.6`. Run the following in your terminal:

**macOS, Linux, WSL:**

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
```

**Windows PowerShell:**

```powershell
irm https://unsloth.ai/install.ps1 | iex
```
{% endstep %}

{% step %}
#### Launch Unsloth

**macOS, Linux, WSL, and Windows:**

```bash
unsloth studio -H 0.0.0.0 -p 8888
```

Then open `http://127.0.0.1:8888` (or your specific URL) in your browser.
{% endstep %}

{% step %}
#### Search and Download MiniMax M3

On first launch, you need to create a password to secure your account and then log in again. Then go to the [Unsloth Chat](/docs/zh/xin-de/studio/chat.md) tab, search for MiniMax M3 in the search bar, and download the model and quant you want.
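
Since even the smallest quant, `UD-IQ1_M`, needs about 128GB of disk space, it can help to check free space before starting a download. The sketch below is a hypothetical helper (`has_free_gb` is our name, not a Studio feature) built on the standard POSIX `df -Pk` output, which works on both macOS and Linux. This guide does not say which directory Studio downloads into, so pass whichever directory you expect to hold the files; `$HOME` is only an example.

```bash
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# NEED_GB of free space. GB is treated as GiB (1024^3 bytes), which errs on
# the side of requiring slightly more space than decimal gigabytes.
has_free_gb() {
  local dir="$1" need_gb="$2" avail_kb
  # `df -Pk` prints POSIX-format output in 1024-byte blocks;
  # column 4 of the data line is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Example: UD-IQ1_M needs about 128GB on disk.
if has_free_gb "$HOME" 128; then
  echo "Enough free space for UD-IQ1_M"
else
  echo "Not enough free space for UD-IQ1_M"
fi
```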

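{% endstep %}

{% step %}
#### Run MiniMax M3

Unsloth Studio should set the inference parameters automatically, but you can still change them manually. You can also edit the context length, chat template, and other settings. For more information, see our [Unsloth Studio inference guide](/docs/zh/xin-de/studio/chat.md).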

{% endstep %}
{% endstepper %}

### 🦙 Llama.cpp Guide

{% stepper %}
{% step %}
Get the specific `llama.cpp` PR [**on GitHub here**](https://github.com/ggml-org/llama.cpp/pull/24523), or follow the build instructions below. Change `-DGGML_CUDA=ON` to `-DGGML_CUDA=OFF` if you don't have a GPU or only want CPU inference. **For Apple Mac / Metal devices**, set `-DGGML_CUDA=OFF` and continue as usual; Metal support is on by default.

```bash
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git fetch origin pull/24523/head:minimax-m3
git checkout minimax-m3
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j --target llama-cli llama-server
```
{% endstep %}

{% step %}
You can now use `llama.cpp` to download and load the model directly, similar to `ollama run`. First, pick the quant type you want, for example `UD-IQ1_M`. Use `export LLAMA_CACHE="folder"` to force `llama.cpp` to save to a specific location. Note that this download can be very slow, so it is usually better to use the manual download flow in the next step.

```bash
export LLAMA_CACHE="unsloth/MiniMax-M3-GGUF"
./build/bin/llama-cli \
    -hf unsloth/MiniMax-M3-GGUF:UD-IQ1_M \
    --temp 1.0 \
    --top-p 0.95 \
    --top-k 40
```

{% hint style="info" %}
Note: MiniMax Sparse Attention is not supported yet, so inference falls back to dense attention.
{% endhint %}
{% endstep %}

{% step %}
To download the model manually, install `huggingface_hub` with `pip install huggingface_hub`, then run the command below. If the download gets stuck, see: [Hugging Face Hub, XET debugging](/docs/zh/ji-chu-zhi-shi/troubleshooting-and-faqs/hugging-face-hub-xet-debugging.md)

```bash
hf download unsloth/MiniMax-M3-GGUF \
    --local-dir unsloth/MiniMax-M3-GGUF \
    --include "*UD-IQ1_M*" # use "*UD-IQ4_XS*" for 4-bit
```
{% endstep %}

{% step %}
Edit `--threads 32` to set the number of CPU threads, `--ctx-size 32768` to set the context length, and `--n-gpu-layers 2` to set how many layers are offloaded to the GPU. Lower `--n-gpu-layers` if your GPU runs out of memory, and remove it entirely for CPU-only inference. Remember that MSA (MiniMax Sparse Attention) is not supported yet, so keep `--ctx-size` moderate: at very long contexts, dense attention uses a lot of memory.

{% code overflow="wrap" %}
```bash
./build/bin/llama-cli \
    --model unsloth/MiniMax-M3-GGUF/UD-IQ1_M/MiniMax-M3-UD-IQ1_M-00001-of-00004.gguf \
    --threads 32 \
    --ctx-size 32768 \
    --n-gpu-layers 2 \
    --temp 1.0 \
    --top-p 0.95 \
    --top-k 40
```
{% endcode %}
{% endstep %}
{% endstepper %}

## 📊 Benchmarks
