# DeepSeek-V3-0324: How to Run Locally

{% hint style="info" %}
See <https://docs.unsloth.ai/basics/deepseek-r1-0528-how-to-run-locally> (updated May 28, 2025) to learn how to run DeepSeek faster and more efficiently!
{% endhint %}

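DeepSeek is at it again! After releasing V3, R1 Zero and R1 in December 2024 and January 2025, DeepSeek updated the V3 checkpoint and released a March update!

According to DeepSeek, MMLU-Pro rose by 5.3 points to 81.2%. **GPQA rose by 9.3 points**. AIME rose by 19.8 points and LiveCodeBench by 10.0 points! They provided a plot comparing the new checkpoint with the previous V3 checkpoint and with other models such as GPT 4.5 and Claude Sonnet 3.7. <mark style="background-color:blue;">**But how do we run a 671 billion parameter model locally?**</mark>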

<table data-full-width="true"><thead><tr><th>MoE Bits</th><th>Type</th><th>Disk Size</th><th>Accuracy</th><th>Link</th><th>Details</th></tr></thead><tbody><tr><td>1.78bit</td><td>IQ1_S</td><td><strong>173GB</strong></td><td>OK</td><td><a href="https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF/tree/main/UD-IQ1_S">Link</a></td><td>2.06/1.56bit</td></tr><tr><td>1.93bit</td><td>IQ1_M</td><td><strong>183GB</strong></td><td>Fair</td><td><a href="https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF/tree/main/UD-IQ1_M">Link</a></td><td>2.5/2.06/1.56</td></tr><tr><td>2.42bit</td><td>IQ2_XXS</td><td><strong>203GB</strong></td><td><mark style="background-color:blue;"><strong>Suggested</strong></mark></td><td><a href="https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF/tree/main/UD-IQ2_XXS">Link</a></td><td>2.5/2.06bit</td></tr><tr><td>2.71bit</td><td>Q2_K_XL</td><td><strong>231GB</strong></td><td><mark style="background-color:purple;"><strong>Suggested</strong></mark></td><td><a href="https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF/tree/main/UD-Q2_K_XL">Link</a></td><td>3.5/2.5bit</td></tr><tr><td>3.5bit</td><td>Q3_K_XL</td><td><strong>320GB</strong></td><td>Great</td><td><a href="https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF/tree/main/UD-Q3_K_XL">Link</a></td><td>4.5/3.5bit</td></tr><tr><td>4.5bit</td><td>Q4_K_XL</td><td><strong>406GB</strong></td><td>Best</td><td><a href="https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF/tree/main/UD-Q4_K_XL">Link</a></td><td>5.5/4.5bit</td></tr></tbody></table>

{% hint style="success" %}
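DeepSeek V3's original upload is in float8 and takes 715GB. Using Q4\_K\_M roughly halves the file size to about 404GB, and our dynamic 1.78bit quant is about 151GB. **We recommend our 2.7bit quant to balance size and accuracy! The 2.4bit one also works well!**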
{% endhint %}
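The relationship between disk size and bit width can be checked with a little arithmetic. The sketch below is only an illustration: it assumes 1 GB = 10^9 bytes and uses the 671 billion parameter count quoted above, and the helper names are hypothetical. It shows that the average bits stored per weight sits somewhat above the "MoE bits" label, which is consistent with the mixed bit widths listed in the table's details column.

```python
# Rough size arithmetic for the quants in the table above.
# Assumptions: 1 GB = 1e9 bytes; 671e9 parameters (the figure quoted in this guide).
PARAMS = 671e9


def effective_bits_per_weight(disk_gb: float, params: float = PARAMS) -> float:
    """Average number of bits stored per parameter for a file of disk_gb gigabytes."""
    return disk_gb * 1e9 * 8 / params


def size_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Approximate file size in GB if every parameter used bits_per_weight bits."""
    return params * bits_per_weight / 8 / 1e9


for name, gb in [("UD-IQ1_S", 173), ("UD-Q2_K_XL", 231), ("UD-Q4_K_XL", 406)]:
    print(f"{name}: about {effective_bits_per_weight(gb):.2f} bits per weight on average")
```

Real files also contain metadata and tensors kept at higher precision, so treat these numbers as estimates rather than exact values.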

## :gear: Official Recommended Settings

According to [DeepSeek](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324), these are the recommended settings for inference:

* <mark style="background-color:blue;">**Temperature of 0.3**</mark> (possibly 0.0 for coding, [see here](https://api-docs.deepseek.com/quick_start/parameter_settings))
* Min\_P of 0.00 (optional, but 0.01 works well; llama.cpp's default is 0.1)
* Chat template: `<｜User｜>Create a simple playable Flappy Bird Game in Python. Place the final game inside of a markdown section.<｜Assistant｜>`
* A BOS token of `<｜begin▁of▁sentence｜>` is added automatically during tokenization (do NOT add it manually!)
* DeepSeek also mentioned using a <mark style="background-color:green;">**system prompt**</mark> (optional). It is in Chinese: `该助手为DeepSeek Chat，由深度求索公司创造。\n今天是3月24日，星期一。` which translates to: `The assistant is DeepSeek Chat, created by DeepSeek.\nToday is Monday, March 24th.`
* <mark style="background-color:orange;">**For KV cache quantization, use 8bit, NOT 4bit. We found 4bit to be noticeably worse.**</mark>
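As a minimal sketch of the chat template above, the prompt string can be assembled as follows. The function and constant names are hypothetical, and the sketch covers only a single user turn without a system prompt. It deliberately leaves the BOS token out, because the tokenizer adds it automatically.

```python
# Special tokens copied from the chat template in this guide.
USER_TAG = "<｜User｜>"
ASSISTANT_TAG = "<｜Assistant｜>"
BOS_TOKEN = "<｜begin▁of▁sentence｜>"

# Sampling settings recommended above (0.01 is the optional min_p value).
SAMPLING = {"temp": 0.3, "min_p": 0.01}


def build_prompt(user_message: str) -> str:
    """Wrap one user message in the template, without a BOS token."""
    if BOS_TOKEN in user_message:
        # The tokenizer adds BOS itself; a manual copy would duplicate it.
        raise ValueError("Do not add the BOS token manually.")
    return f"{USER_TAG}{user_message}{ASSISTANT_TAG}"


print(build_prompt("Create a simple playable Flappy Bird Game in Python."))
```

The resulting string is what gets passed to `--prompt` in the commands further down.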

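## 📖 Tutorial: How to Run DeepSeek-V3 in llama.cpp

1. Obtain the latest `llama.cpp` from [GitHub here](https://github.com/ggml-org/llama.cpp). You can also follow the build instructions below. Change `-DGGML_CUDA=ON` to `-DGGML_CUDA=OFF` if you don't have a GPU or only want CPU inference. **For Apple Mac / Metal devices**, set `-DGGML_CUDA=OFF` and continue as usual; Metal support is enabled by default.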

{% hint style="warning" %}
Note: compiling with `-DGGML_CUDA=ON` for GPUs may take about 5 minutes. A CPU-only build takes about 1 minute. You may also be interested in llama.cpp's precompiled binaries.
{% endhint %}

```bash
apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-quantize llama-cli llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp
```

2. Download the model with the snippet below (after running `pip install huggingface_hub hf_transfer`). You can choose `UD-IQ1_S` (dynamic 1.78bit quant) or other quantized versions such as `Q4_K_M`. <mark style="background-color:green;">**We recommend our 2.7bit dynamic quant**</mark><mark style="background-color:green;">**&#x20;**</mark><mark style="background-color:green;">**`UD-Q2_K_XL`**</mark><mark style="background-color:green;">**&#x20;**</mark><mark style="background-color:green;">**to balance size and accuracy**</mark>. More versions at: <https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF>

{% code overflow="wrap" %}

```python
# !pip install huggingface_hub hf_transfer
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id = "unsloth/DeepSeek-V3-0324-GGUF-UD",
    local_dir = "unsloth/DeepSeek-V3-0324-GGUF-UD",
    allow_patterns = ["*UD-Q2_K_XL*"], # Dynamic 2.7bit (230GB). Use "*UD-IQ1_S*" for dynamic 1.78bit (151GB)
)
```

{% endcode %}
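llama.cpp loads a split GGUF when it is pointed at the first shard, so it helps to confirm where that shard landed after the download. The helper below is a hypothetical sketch, not part of `huggingface_hub`; it only assumes that shard file names contain the quant name followed by `-00001-of-`.

```python
from pathlib import Path


def first_shard(local_dir: str, quant: str) -> Path:
    """Return the path of the first GGUF shard for `quant` under `local_dir`.

    Assumes shard names look like "<model>-<quant>-00001-of-0000N.gguf".
    """
    shards = sorted(Path(local_dir).rglob(f"*{quant}*-00001-of-*.gguf"))
    if not shards:
        raise FileNotFoundError(f"No first shard for {quant} found under {local_dir}")
    return shards[0]


# Example usage after the download above has finished:
# print(first_shard("unsloth/DeepSeek-V3-0324-GGUF-UD", "UD-Q2_K_XL"))
```

The printed path is the value to pass to `--model`.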

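3. Run Unsloth's Flappy Bird test, as described in our 1.58bit dynamic quant write-up for DeepSeek R1.
4. Edit `--threads` to set the number of CPU threads (for example 32), `--ctx-size` to set the context length (for example 16384), and `--n-gpu-layers` to set how many layers are offloaded to the GPU (for example 2). Lower `--n-gpu-layers` if your GPU runs out of memory, and remove it entirely for CPU-only inference.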

<pre class="language-bash" data-overflow="wrap"><code class="lang-bash">./llama.cpp/llama-cli \
    --model unsloth/DeepSeek-V3-0324-GGUF-UD/UD-Q2_K_XL/DeepSeek-V3-0324-UD-Q2_K_XL-00001-of-00006.gguf \
    <a data-footnote-ref href="#user-content-fn-1">--cache-type-k q8_0 </a>\
    <a data-footnote-ref href="#user-content-fn-2">--threads 20</a> \
    <a data-footnote-ref href="#user-content-fn-3">--n-gpu-layers 2</a> \
    -no-cnv \
    --prio 3 \
    --temp 0.3 \
    --min-p 0.01 \
    <a data-footnote-ref href="#user-content-fn-4">--ctx-size 4096</a> \
    --seed 3407 \
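    --prompt "&#x3C;｜User｜>Create a Flappy Bird game in Python. You must include these things:\n1. You must use pygame.\n2. The background color should be randomly chosen and is a light shade. Start with a light blue color.\n3. Pressing SPACE multiple times will accelerate the bird.\n4. The bird's shape should be randomly chosen as a square, circle or triangle. The color should be randomly chosen as a dark color.\n5. Place on the bottom some land colored as dark brown or yellow chosen randomly.\n6. Make a score shown on the top right side. Increment if you pass pipes and don't hit them.\n7. Make randomly spaced pipes with enough space. Color them randomly as dark green or light brown or a dark gray shade.\n8. When you lose, show the best score. Make the text inside the screen. Pressing q or Esc will quit the game. Restarting is pressing SPACE again.\nThe final game should be inside a markdown section in Python. Check your code for errors and fix them before the final markdown section.&#x3C;｜Assistant｜>"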
</code></pre>

<details>

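<summary>If we run the command above, we get two very different results.<br><br><strong>Standard 2-bit version:</strong> Click to view the result <em><mark style="color:red;"><strong>(seizure warning!)</strong></mark></em><br><strong>Dynamic 2-bit version:</strong> See the result below:</summary>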

<img src="https://2657992854-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxhOjnexMCB3dmuQFQ2Zq%2Fuploads%2Fgit-blob-5ab85f63c9fb12f0ce701dd3cfd540aa1503a636%2FOld.gif?alt=media" alt="" data-size="original">

Standard 2-bit. Fails on the background and fails on collisions.

</details>

<div align="center"><figure><img src="https://2657992854-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxhOjnexMCB3dmuQFQ2Zq%2Fuploads%2Fgit-blob-97735d09e3d399cf74f8e7112eef8ca56b0e10e9%2FNew.gif?alt=media" alt="" width="240"><figcaption><p>Dynamic 2-bit. Successfully creates a playable game.</p></figcaption></figure></div>

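5. Like DeepSeek-R1, V3 has 61 layers. For example, with a 24GB GPU or an 80GB GPU, you can expect to offload the following number of layers after rounding down (reduce by 1 if you run out of memory):

| Quant   | File Size | 24GB GPU | 80GB GPU | 2x80GB GPU |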
| ------- | ----- | -------- | -------- | ---------- |
| 1.73bit | 173GB | 5        | 25       | 56         |
| 2.22bit | 183GB | 4        | 22       | 49         |
| 2.51bit | 212GB | 2        | 19       | 32         |

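### Running on Mac / Apple devices

For Apple Metal devices, be careful with `--n-gpu-layers`. If the machine runs out of memory, reduce it. On a machine with 128GB of unified memory, you should be able to offload roughly 59 layers.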

```bash
./llama.cpp/llama-cli \
    --model unsloth/DeepSeek-V3-0324-GGUF-UD/UD-IQ1_S/DeepSeek-V3-0324-UD-IQ1_S-00001-of-00003.gguf \
    --cache-type-k q8_0 \
    --threads 16 \
    --prio 2 \
    --temp 0.3 \
    --ctx-size 8192 \
    --seed 3407 \
    --n-gpu-layers 59 \
    -no-cnv \
    --prompt "<｜User｜>Create a Flappy Bird game in Python.<｜Assistant｜>"
```

## :8ball: Heptagon Test

We also test our dynamic quants with the heptagon test from [r/Localllama](https://www.reddit.com/r/LocalLLaMA/comments/1j7r47l/i_just_made_an_animation_of_a_ball_bouncing/), which asks the model to create a basic physics engine that simulates balls bouncing inside a moving, enclosed heptagon.

<figure><img src="https://2657992854-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxhOjnexMCB3dmuQFQ2Zq%2Fuploads%2Fgit-blob-1371de5a93c6c5b0e43e8bb51980d84554b199f4%2Fsnapshot.jpg?alt=media" alt="" width="563"><figcaption><p>The goal is to make the heptagon spin, and the balls inside the heptagon should move with it.</p></figcaption></figure>

{% code overflow="wrap" %}

```bash
./llama.cpp/llama-cli \
    --model unsloth/DeepSeek-V3-0324-GGUF-UD/UD-Q2_K_XL/DeepSeek-V3-0324-UD-Q2_K_XL-00001-of-00006.gguf \
    --cache-type-k q8_0 \
    --threads 20 \
    --n-gpu-layers 2 \
    -no-cnv \
    --prio 3 \
    --temp 0.3 \
    --min-p 0.01 \
    --ctx-size 4096 \
    --seed 3407 \
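    --prompt "<｜User｜>Write a Python program that shows 20 balls bouncing inside a spinning heptagon:\n- All balls have the same radius.\n- All balls have a number on it from 1 to 20.\n- All balls drop from the heptagon center when starting.\n- Colors are: #f8b862, #f6ad49, #f39800, #f08300, #ec6d51, #ee7948, #ed6d3d, #ec6800, #ec6800, #ee7800, #eb6238, #ea5506, #ea5506, #eb6101, #e49e61, #e45e32, #e17b34, #dd7a56, #db8449, #d66a35\n- The balls should be affected by gravity and friction, and they must bounce off the rotating walls realistically. There should also be collisions between balls.\n- The material of all the balls determines that their impact bounce height will not exceed the radius of the heptagon, but higher than ball radius.\n- All balls rotate with friction, the numbers on the ball can be used to indicate the spin of the ball.\n- The heptagon is spinning around its center, and the speed of spinning is 360 degrees per 5 seconds.\n- The heptagon size should be large enough to contain all the balls.\n- Do not use the pygame library; implement collision detection algorithms and collision response etc. by yourself. The following Python libraries are allowed: tkinter, math, numpy, dataclasses, typing, sys.\n- All codes should be put in a single Python file.<｜Assistant｜>"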
```

{% endcode %}

<table data-view="cards"><thead><tr><th></th><th data-type="files"></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td>Non-dynamic 2bit. Fails - <mark style="background-color:red;">seizure warning</mark> again!</td><td><a href="https://2657992854-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxhOjnexMCB3dmuQFQ2Zq%2Fuploads%2Fgit-blob-0e0d101cc282b869a97fc679da5a9e98141c6f62%2Funsloth-q2_k_rotate.txt?alt=media">unsloth-q2_k_rotate.txt</a></td><td><a href="https://2657992854-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxhOjnexMCB3dmuQFQ2Zq%2Fuploads%2Fgit-blob-d2579e8d1145572548ed69b0e335925f47130ba3%2FInShot_20250325_185636426.gif?alt=media">InShot_20250325_185636426.gif</a></td></tr><tr><td>Dynamic 2bit. Actually solves the heptagon puzzle correctly!!</td><td><a href="https://2657992854-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxhOjnexMCB3dmuQFQ2Zq%2Fuploads%2Fgit-blob-96202dad58c0d0004cdb51408332ac78b425138f%2Funsloth-q2_k_xl_rotate.txt?alt=media">unsloth-q2_k_xl_rotate.txt</a></td><td><a href="https://2657992854-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxhOjnexMCB3dmuQFQ2Zq%2Fuploads%2Fgit-blob-669f5b08b6bf220a38dce445d4830fcd80d52be3%2FInShot_20250325_181710554.gif?alt=media">InShot_20250325_181710554.gif</a></td></tr><tr><td>Original float8</td><td><a href="https://2657992854-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxhOjnexMCB3dmuQFQ2Zq%2Fuploads%2Fgit-blob-902430fe71d14f19cd31d3fe271f9a636e91304e%2Ffp8-heptagon.txt?alt=media">fp8-heptagon.txt</a></td><td><a href="https://2657992854-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxhOjnexMCB3dmuQFQ2Zq%2Fuploads%2Fgit-blob-4501a6aef616d02057b89e96baa9c1d5ed57666f%2FInShot_20250325_181423756.gif?alt=media">InShot_20250325_181423756.gif</a></td></tr></tbody></table>

The dynamic 2.7 bit quant is only 230GB, yet it actually solves the heptagon puzzle! The full outputs of all three versions (including full fp8) are below:

<details>

<summary>Dynamic 2bit heptagon code</summary>

```python
import tkinter as tk
import math
import numpy as np
from dataclasses import dataclass
from typing import List, Tuple, Optional

# Constants
HEPTAGON_RADIUS = 300
BALL_RADIUS = 20
GRAVITY = 0.2
FRICTION = 0.99
BOUNCE_FACTOR = 0.8
ROTATION_SPEED = 360 / 5  # degrees per second
SPIN_FRICTION = 0.98
BALL_COLORS = [
    "#f8b862", "#f6ad49", "#f39800", "#f08300", "#ec6d51",
    "#ee7948", "#ed6d3d", "#ec6800", "#ec6800", "#ee7800",
    "#eb6238", "#ea5506", "#ea5506", "#eb6101", "#e49e61",
    "#e45e32", "#e17b34", "#dd7a56", "#db8449", "#d66a35"
]

@dataclass
class Ball:
    x: float
    y: float
    vx: float
    vy: float
    number: int
    spin: float = 0.0
    color: str = "#000000"

@dataclass
class Wall:
    x1: float
    y1: float
    x2: float
    y2: float

class BouncingBalls:
    def __init__(self, root):
        self.root = root
        self.canvas = tk.Canvas(root, width=800, height=800, bg="white")
        self.canvas.pack()
        self.balls: List[Ball] = []
        self.walls: List[Wall] = []
        self.heptagon_angle = 0
        self.last_time = 0
        
        self.setup_balls()
        self.setup_heptagon()
        
        self.root.after(16, self.update)
        self.root.bind("<space>", self.reset_balls)
    
    def setup_balls(self):
        for i in range(20):
            ball = Ball(
                x=400,
                y=400,
                vx=np.random.uniform(-5, 5),
                vy=np.random.uniform(-5, 5),
                number=i+1,
                color=BALL_COLORS[i]
            )
            self.balls.append(ball)
    
    def setup_heptagon(self):
        # Create initial heptagon walls
        self.update_heptagon_walls(0)
    
    def update_heptagon_walls(self, angle):
        self.walls = []
        center_x, center_y = 400, 400
        angle_rad = math.radians(angle)
        
        for i in range(7):
            angle1 = angle_rad + 2 * math.pi * i / 7
            angle2 = angle_rad + 2 * math.pi * (i + 1) / 7
            
            x1 = center_x + HEPTAGON_RADIUS * math.cos(angle1)
            y1 = center_y + HEPTAGON_RADIUS * math.sin(angle1)
            x2 = center_x + HEPTAGON_RADIUS * math.cos(angle2)
            y2 = center_y + HEPTAGON_RADIUS * math.sin(angle2)
            
            self.walls.append(Wall(x1, y1, x2, y2))
    
    def reset_balls(self, event=None):
        for ball in self.balls:
            ball.x = 400
            ball.y = 400
            ball.vx = np.random.uniform(-5, 5)
            ball.vy = np.random.uniform(-5, 5)
            ball.spin = np.random.uniform(-5, 5)
    
    def update(self):
        current_time = self.root.after_idle(self.root.after, 16, self.update)
        if self.last_time == 0:
            self.last_time = current_time
            return
        
        # Calculate delta time (approximate)
        dt = 0.016  # Assuming ~60 FPS
        
        # Update heptagon rotation
        self.heptagon_angle += ROTATION_SPEED * dt
        self.update_heptagon_walls(self.heptagon_angle)
        
        # Update balls
        for ball in self.balls:
            # Apply gravity
            ball.vy += GRAVITY
            
            # Apply friction
            ball.vx *= FRICTION
            ball.vy *= FRICTION
            ball.spin *= SPIN_FRICTION
            
            # Move ball
            ball.x += ball.vx
            ball.y += ball.vy
            
            # Check wall collisions
            self.check_wall_collisions(ball)
            
            # Check collisions with other balls
            for other in self.balls:
                if other.number != ball.number:
                    self.check_ball_collision(ball, other)
        
        # Draw everything
        self.draw()
    
    def check_wall_collisions(self, ball):
        for wall in self.walls:
            # Find the closest point on the wall segment to the ball
            closest = self.closest_point_on_segment(
                wall.x1, wall.y1, wall.x2, wall.y2, ball.x, ball.y
            )
            
            # Calculate distance to the wall
            dx = ball.x - closest[0]
            dy = ball.y - closest[1]
            distance = math.sqrt(dx*dx + dy*dy)
            
            if distance < BALL_RADIUS:
                # Collision detected
                # Calculate normal vector
                nx = dx / distance
                ny = dy / distance
                
                # Calculate relative velocity along the normal
                v_rel = ball.vx * nx + ball.vy * ny
                
                if v_rel < 0:  # Moving toward the wall
                    # Calculate impulse
                    j = -(1 + BOUNCE_FACTOR) * v_rel
                    
                    # Apply impulse
                    ball.vx += j * nx
                    ball.vy += j * ny
                    
                    # Add some spin based on the collision
                    ball.spin += (ball.vx * ny - ball.vy * nx) * 0.1
                    
                    # Move the ball out of the collision
                    penetration = BALL_RADIUS - distance
                    ball.x += penetration * nx
                    ball.y += penetration * ny
    
    def check_ball_collision(self, ball1, ball2):
        dx = ball2.x - ball1.x
        dy = ball2.y - ball1.y
        distance = math.sqrt(dx*dx + dy*dy)
        
        if distance < 2 * BALL_RADIUS:
            # Collision detected
            nx = dx / distance
            ny = dy / distance
            
            # Calculate relative velocity
            v_rel_x = ball2.vx - ball1.vx
            v_rel_y = ball2.vy - ball1.vy
            v_rel = v_rel_x * nx + v_rel_y * ny
            
            if v_rel < 0:  # Moving toward each other
                # Calculate impulse
                j = -(1 + BOUNCE_FACTOR) * v_rel / 2
                
                # Apply impulse
                ball1.vx -= j * nx
                ball1.vy -= j * ny
                ball2.vx += j * nx
                ball2.vy += j * ny
                
                # Add spin based on the collision
                ball1.spin += (ball1.vx * ny - ball1.vy * nx) * 0.05
                ball2.spin += (ball2.vx * ny - ball2.vy * nx) * 0.05
                
                # Separate the balls
                penetration = 2 * BALL_RADIUS - distance
                ball1.x -= penetration * nx * 0.5
                ball1.y -= penetration * ny * 0.5
                ball2.x += penetration * nx * 0.5
                ball2.y += penetration * ny * 0.5
    
    @staticmethod
    def closest_point_on_segment(x1, y1, x2, y2, x, y):
        # Vector from point to line segment start
        dx = x - x1
        dy = y - y1
        
        # Segment vector
        sx = x2 - x1
        sy = y2 - y1
        
        # Projection of the point onto the segment
        dot = dx * sx + dy * sy
        len_sq = sx * sx + sy * sy
        param = dot / len_sq if len_sq != 0 else -1
        
        if param < 0:
            return x1, y1
        elif param > 1:
            return x2, y2
        else:
            return x1 + param * sx, y1 + param * sy
    
    def draw(self):
        self.canvas.delete("all")
        
        # Draw heptagon
        points = []
        for wall in self.walls:
            points.extend([wall.x1, wall.y1])
        self.canvas.create_polygon(points, fill="", outline="black", width=2)
        
        # Draw balls
        for ball in self.balls:
            # Draw ball
            self.canvas.create_oval(
                ball.x - BALL_RADIUS, ball.y - BALL_RADIUS,
                ball.x + BALL_RADIUS, ball.y + BALL_RADIUS,
                fill=ball.color, outline="black"
            )
            
            # Draw number with rotation based on spin
            angle = ball.spin * 10  # Scale spin for visual effect
            self.canvas.create_text(
                ball.x, ball.y,
                text=str(ball.number),
                font=("Arial", 12, "bold"),
                angle=angle
            )

if __name__ == "__main__":
    root = tk.Tk()
    root.title("Bouncing Balls in a Spinning Heptagon")
    app = BouncingBalls(root)
    root.mainloop()
```

</details>

<details>

<summary>Non-dynamic 2bit heptagon code</summary>

```python
import tkinter as tk
import math
import random
from dataclasses import dataclass
from typing import List, Tuple, Optional
import sys

# Constants
WIDTH, HEIGHT = 800, 800
HEPTAGON_RADIUS = 300
BALL_RADIUS = 15
GRAVITY = 0.5
FRICTION = 0.999
ELASTICITY = 0.8
ROTATION_SPEED = 2 * math.pi / 5  # 360 degrees per 5 seconds
SPIN_DECAY = 0.99

# Colors for the balls
BALL_COLORS = [
    "#f8b862", "#f6ad49", "#f39800", "#f08300", "#ec6d51",
    "#ee7948", "#ed6d3d", "#ec6800", "#ec6800", "#ee7800",
    "#eb6238", "#ea5506", "#ea5506", "#eb6101", "#e49e61",
    "#e45e32", "#e17b34", "#dd7a56", "#db8449", "#d66a35"
]

@dataclass
class Ball:
    x: float
    y: float
    vx: float
    vy: float
    radius: float
    color: str
    number: int
    spin: float = 0.0

@dataclass
class Heptagon:
    center_x: float
    center_y: float
    radius: float
    angle: float = 0.0

class BouncingBalls:
    def __init__(self, root):
        self.root = root
        self.canvas = tk.Canvas(root, width=WIDTH, height=HEIGHT, bg="white")
        self.canvas.pack()
        
        self.heptagon = Heptagon(WIDTH//2, HEIGHT//2, HEPTAGON_RADIUS)
        self.balls = []
        self.setup_balls()
        
        self.root.after(0, self.update)
        self.root.mainloop()
    
    def setup_balls(self):
        center_x, center_y = WIDTH//2, HEIGHT//2
        for i in range(20):
            self.balls.append(Ball(
                x=center_x,
                y=center_y,
                vx=0,
                vy=0,
                radius=BALL_RADIUS,
                color=BALL_COLORS[i],
                number=i+1,
                spin=0
            ))
    
    def update(self):
        self.canvas.delete("all")
        
        # Update heptagon angle
        self.heptagon.angle += ROTATION_SPEED / 60  # Assuming 60 FPS
        
        # Draw heptagon
        self.draw_heptagon()
        
        # Update and draw balls
        for ball in self.balls:
            # Apply gravity
            ball.vy += GRAVITY
            
            # Update position
            ball.x += ball.vx
            ball.y += ball.vy
            
            # Apply friction
            ball.vx *= FRICTION
            ball.vy *= FRICTION
            
            # Apply spin decay
            ball.spin *= SPIN_DECAY
            
            # Check collision with heptagon walls
            self.check_heptagon_collision(ball)
            
            # Check collision with other balls
            for other in self.balls:
                if other != ball:
                    if self.check_ball_collision(ball, other):
                        self.resolve_ball_collision(ball, other)
            
            # Draw the ball
            self.draw_ball(ball)
        
        self.root.after(16, self.update)  # ~60 FPS
    
    def draw_heptagon(self):
        center_x, center_y = self.heptagon.center_x, self.heptagon.center_y
        points = []
        for i in range(7):
            angle = self.heptagon.angle + i * 2 * math.pi / 7
            x = center_x + self.heptagon.radius * math.cos(angle)
            y = center_y + self.heptagon.radius * math.sin(angle)
            points.append((x, y))
        
        # Draw heptagon
        self.canvas.create_polygon(
            [points[0], points[1], points[2], points[3], 
             points[4], points[5], points[6]],
            outline="black", fill="", width=2
        )
    
    def draw_ball(self, ball):
        self.canvas.create_oval(
            ball.x - ball.radius,
            ball.y - ball.radius,
            ball.x + ball.radius,
            ball.y + ball.radius,
            fill=ball.color,
            outline="black"
        )
        
        # Draw the number
        self.canvas.create_text(
            ball.x, ball.y,
            text=str(ball.number),
            fill="black"
        )
    
    def check_heptagon_collision(self, ball):
        center_x, center_y = WIDTH//2, HEIGHT//2
        
        # Check distance from center
        dx = ball.x - center_x
        dy = ball.y - center_y
        dist = math.sqrt(dx**2 + dy**2)
        
        if dist + ball.radius > self.heptagon.radius:
            # Find the normal vector from center to ball
            angle = math.atan2(dy, dx)
            normal_x = math.cos(angle)
            normal_y = math.sin(angle)
            
            # Move ball back inside heptagon
            overlap = (dist + ball.radius) - self.heptagon.radius
            ball.x -= overlap * normal_x
            ball.y -= overlap * normal_y
            
            # Reflect velocity
            dot_product = ball.vx * normal_x + ball.vy * normal_y
            ball.vx -= 2 * dot_product * normal_x * ELASTICITY
            ball.vy -= 2 * dot_product * normal_y * ELASTICITY
    
    def check_ball_collision(self, ball1, ball2):
        dx = ball2.x - ball1.x
        dy = ball2.y - ball1.y
        distance = math.sqrt(dx**2 + dy**2)
        return distance < (ball1.radius + ball2.radius)
    
    def resolve_ball_collision(self, ball1, ball2):
        dx = ball2.x - ball1.x
        dy = ball2.y - ball1.y
        distance = math.sqrt(dx**2 + dy**2)
        
        # Normal vector
        nx = dx / distance
        ny = dy / distance
        
        # Relative velocity
        dvx = ball2.vx - ball1.vx
        dvy = ball2.vy - ball1.vy
        
        # Calculate impulse
        impulse = 2 * (dvx * nx + dvy * ny) / 2
        impulse *= ELASTICITY
        
        # Apply impulse
        ball1.vx -= impulse * nx
        ball1.vy -= impulse * ny
        ball2.vx += impulse * nx
        ball2.vy += impulse * ny
        
        # Separate balls to prevent sticking
        overlap = (ball1.radius + ball2.radius) - distance
        ball1.x -= overlap * nx / 2
        ball1.y -= overlap * ny / 2
        ball2.x += overlap * nx / 2
        ball2.y += overlap * ny / 2
    
    def run(self):
        self.root.mainloop()

if __name__ == "__main__":
    root = tk.Tk()
    root.title("Bouncing Balls in a Spinning Heptagon")
    app = BouncingBalls(root)
    app.run()
```

</details>

<details>

<summary>Float8 heptagon code</summary>

```python
import tkinter as tk
import math
import numpy as np
from dataclasses import dataclass
from typing import List, Tuple, Optional

# Constants
WIDTH, HEIGHT = 800, 800
CENTER_X, CENTER_Y = WIDTH // 2, HEIGHT // 2
GRAVITY = 0.2
FRICTION = 0.99
BALL_RADIUS = 15
HEPTAGON_RADIUS = 300
ROTATION_SPEED = 360 / 5  # degrees per second
ELASTICITY = 0.8
SPIN_FRICTION = 0.98

# Colors for the balls
BALL_COLORS = [
    '#f8b862', '#f6ad49', '#f39800', '#f08300', '#ec6d51',
    '#ee7948', '#ed6d3d', '#ec6800', '#ec6800', '#ee7800',
    '#eb6238', '#ea5506', '#ea5506', '#eb6101', '#e49e61',
    '#e45e32', '#e17b34', '#dd7a56', '#db8449', '#d66a35'
]

@dataclass
class Ball:
    x: float
    y: float
    vx: float
    vy: float
    radius: float
    color: str
    number: int
    spin: float = 0.0

    def move(self):
        self.x += self.vx
        self.y += self.vy
        self.vy += GRAVITY
        self.vx *= FRICTION
        self.vy *= FRICTION
        self.spin *= SPIN_FRICTION

    def collide_with_ball(self, other: 'Ball'):
        dx = other.x - self.x
        dy = other.y - self.y
        distance = math.hypot(dx, dy)
        
        if distance < self.radius + other.radius:
            # Calculate collision normal
            nx = dx / distance
            ny = dy / distance
            
            # Calculate relative velocity
            dvx = other.vx - self.vx
            dvy = other.vy - self.vy
            
            # Calculate impulse
            impulse = 2 * (dvx * nx + dvy * ny) / (1/self.radius + 1/other.radius)
            
            # Apply impulse
            self.vx += impulse * nx / self.radius
            self.vy += impulse * ny / self.radius
            other.vx -= impulse * nx / other.radius
            other.vy -= impulse * ny / other.radius
            
            # Separate balls to prevent sticking
            overlap = (self.radius + other.radius - distance) / 2
            self.x -= overlap * nx
            self.y -= overlap * ny
            other.x += overlap * nx
            other.y += overlap * ny
            
            # Transfer some spin
            transfer = impulse * 0.01
            self.spin -= transfer
            other.spin += transfer

class HeptagonBounceSimulator:
    def __init__(self, root):
        self.root = root
        self.canvas = tk.Canvas(root, width=WIDTH, height=HEIGHT, bg='white')
        self.canvas.pack()
        
        self.balls = self.create_balls()
        self.heptagon_angle = 0
        self.last_time = 0
        self.running = True
        
        self.root.bind('<space>', self.toggle_pause)
        self.root.bind('<Escape>', lambda e: root.destroy())
        
        self.last_time = self.root.after(0, self.update)
    
    def create_balls(self) -> List[Ball]:
        balls = []
        for i in range(20):
            # Start all balls at the center with a small random velocity
            angle = np.random.uniform(0, 2 * math.pi)
            speed = np.random.uniform(0.5, 2)
            vx = math.cos(angle) * speed
            vy = math.sin(angle) * speed
            
            balls.append(Ball(
                x=CENTER_X,
                y=CENTER_Y,
                vx=vx,
                vy=vy,
                radius=BALL_RADIUS,
                color=BALL_COLORS[i],
                number=i+1,
                spin=np.random.uniform(-2, 2)
            ))
        return balls
    
    def toggle_pause(self, event):
        self.running = not self.running
        if self.running:
            self.last_time = self.root.after(0, self.update)
    
    def get_heptagon_vertices(self) -> List[Tuple[float, float]]:
        vertices = []
        for i in range(7):
            angle = math.radians(self.heptagon_angle + i * 360 / 7)
            x = CENTER_X + HEPTAGON_RADIUS * math.cos(angle)
            y = CENTER_Y + HEPTAGON_RADIUS * math.sin(angle)
            vertices.append((x, y))
        return vertices
    
    def check_ball_heptagon_collision(self, ball: Ball):
        vertices = self.get_heptagon_vertices()
        closest_dist = float('inf')
        closest_normal = (0, 0)
        closest_edge = None
        
        # Check collision with each edge of the heptagon
        for i in range(len(vertices)):
            p1 = vertices[i]
            p2 = vertices[(i + 1) % len(vertices)]
            
            # Vector from p1 to p2
            edge_x = p2[0] - p1[0]
            edge_y = p2[1] - p1[1]
            edge_length = math.hypot(edge_x, edge_y)
            
            # Normalize edge vector
            edge_x /= edge_length
            edge_y /= edge_length
            
            # Normal vector (perpendicular to edge, pointing inward)
            nx = -edge_y
            ny = edge_x
            
            # Vector from p1 to ball
            ball_to_p1_x = ball.x - p1[0]
            ball_to_p1_y = ball.y - p1[1]
            
            # Project ball onto the edge normal
            projection = ball_to_p1_x * nx + ball_to_p1_y * ny
            
            # If the projection is negative, the ball is outside the heptagon
            if projection < ball.radius:
                # Find the closest point on the edge to the ball
                edge_proj = ball_to_p1_x * edge_x + ball_to_p1_y * edge_y
                edge_proj = max(0, min(edge_length, edge_proj))
                closest_x = p1[0] + edge_proj * edge_x
                closest_y = p1[1] + edge_proj * edge_y
                
                # Distance from the ball to the closest point on the edge
                dist = math.hypot(ball.x - closest_x, ball.y - closest_y)
                
                if dist < closest_dist:
                    closest_dist = dist
                    closest_normal = (nx, ny)
                    closest_edge = (p1, p2)
        
        if closest_dist < ball.radius:
            # Calculate bounce response
            dot_product = ball.vx * closest_normal[0] + ball.vy * closest_normal[1]
            
            # Apply bounce with elasticity
            ball.vx -= (1 + ELASTICITY) * dot_product * closest_normal[0]
            ball.vy -= (1 + ELASTICITY) * dot_product * closest_normal[1]
            
            # Add some spin based on the impact
            edge_vec = (closest_edge[1][0] - closest_edge[0][0], 
                        closest_edge[1][1] - closest_edge[0][1])
            edge_length = math.hypot(edge_vec[0], edge_vec[1])
            if edge_length > 0:
                edge_vec = (edge_vec[0]/edge_length, edge_vec[1]/edge_length)
                # Cross product of velocity and edge direction
                spin_effect = (ball.vx * edge_vec[1] - ball.vy * edge_vec[0]) * 0.1
                ball.spin += spin_effect
            
            # Move ball out of the heptagon to prevent sticking
            penetration = ball.radius - closest_dist
            ball.x += penetration * closest_normal[0]
            ball.y += penetration * closest_normal[1]
    
    def update(self):
        if not self.running:
            return
        
        # Clear canvas
        self.canvas.delete('all')
        
        # Update heptagon rotation
        self.heptagon_angle += ROTATION_SPEED / 60  # Assuming ~60 FPS
        
        # Draw heptagon
        vertices = self.get_heptagon_vertices()
        self.canvas.create_polygon(vertices, outline='black', fill='', width=2)
        
        # Update and draw balls
        for i, ball in enumerate(self.balls):
            # Move ball
            ball.move()
            
            # Check collisions with heptagon
            self.check_ball_heptagon_collision(ball)
            
            # Draw ball
            self.canvas.create_oval(
                ball.x - ball.radius, ball.y - ball.radius,
                ball.x + ball.radius, ball.y + ball.radius,
                fill=ball.color, outline='black'
            )
            
            # Draw number with rotation based on spin
            angle = ball.spin * 10  # Scale spin for visible rotation
            self.canvas.create_text(
                ball.x, ball.y,
                text=str(ball.number),
                font=('Arial', 10, 'bold'),
                angle=angle
            )
        
        # Check ball-ball collisions
        for i in range(len(self.balls)):
            for j in range(i + 1, len(self.balls)):
                self.balls[i].collide_with_ball(self.balls[j])
        
        # Schedule next update
        self.last_time = self.root.after(16, self.update)  # ~60 FPS

if __name__ == '__main__':
    root = tk.Tk()
    root.title('Bouncing Balls in a Spinning Heptagon')
    simulator = HeptagonBounceSimulator(root)
    root.mainloop()
```

</details>

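## :detective: Extra Findings & Tips

1. In our empirical tests, lower KV cache quantization (4bit) seems to degrade generation quality. More testing is needed, but we suggest `q8_0` cache quantization. The goal of cache quantization is to support longer context lengths, since the KV cache uses quite a bit of memory.
2. We found `down_proj` in this model to be extremely sensitive to quantization. We had to redo some of our dynamic quants that used 2 bits for `down_proj`, and we now use at least 3 bits for all of these matrices.
3. Using `llama.cpp`'s Flash Attention backend does give slightly faster decoding. Compile with `-DGGML_CUDA_FA_ALL_QUANTS=ON`. It is also best to set your CUDA architecture, which you can look up at <https://developer.nvidia.com/cuda-gpus>, to reduce compile time, for example with `-DCMAKE_CUDA_ARCHITECTURES="80"`.
4. Using `min_p=0.01` is probably enough. `llama.cpp` defaults to 0.1, which is probably not necessary. Since the temperature is set to 0.3 anyway, we are unlikely to sample low-probability tokens, so removing very unlikely tokens is a good idea. DeepSeek recommends a temperature of 0.0 for coding tasks.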
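To make the `min_p` tip concrete, here is a simplified sketch of min-p filtering: a token is kept only if its probability is at least `min_p` times the probability of the most likely token. This is an illustration on a made-up four-token distribution, not llama.cpp's implementation, which works on logits and interacts with the other samplers; the function name is hypothetical.

```python
from typing import List


def min_p_filter(probs: List[float], min_p: float) -> List[float]:
    """Zero out tokens below min_p * max(probs), then renormalize the rest."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


# Made-up next-token distribution, for illustration only.
probs = [0.90, 0.06, 0.035, 0.005]

print(min_p_filter(probs, 0.01))  # only the least likely token falls below the threshold
print(min_p_filter(probs, 0.1))   # the 0.1 value cited above keeps just the top token here
```

On this toy distribution, `min_p=0.01` trims only the tail, while 0.1 removes every alternative to the top token.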

[^1]: Must use 8bit, not 4bit

[^2]: Number of CPU threads on your machine

[^3]: Approximately 2 for a 24GB GPU. Approximately 18 for an 80GB GPU.

[^4]: Context length


