# DeepSeek-V3-0324: How to Run Locally

{% hint style="info" %}
See <https://docs.unsloth.ai/basics/deepseek-r1-0528-how-to-run-locally> (updated May 28, 2025) to learn how to run DeepSeek faster and more efficiently!
{% endhint %}

DeepSeek is at it again! After releasing V3, R1 Zero and R1 in December 2024 and January 2025, DeepSeek updated its V3 checkpoints/models and released a March update!

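According to DeepSeek, MMLU-Pro improved by +5.3% to 81.2%. **GPQA improved by +9.3 percentage points**. AIME rose +19.8% and LiveCodeBench +10.0%! They provided a plot comparing it with the previous V3 checkpoint and with other models such as GPT 4.5 and Claude Sonnet 3.7. <mark style="background-color:blue;">**But how do we run a 671 billion parameter model locally?**</mark>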

<table data-full-width="true"><thead><tr><th>MoE Bits</th><th>Type</th><th>Disk Size</th><th>Accuracy</th><th>Link</th><th>Details</th></tr></thead><tbody><tr><td>1.78bit</td><td>IQ1_S</td><td><strong>173GB</strong></td><td>Ok</td><td><a href="https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF/tree/main/UD-IQ1_S">Link</a></td><td>2.06/1.56bit</td></tr><tr><td>1.93bit</td><td>IQ1_M</td><td><strong>183GB</strong></td><td>Fair</td><td><a href="https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF/tree/main/UD-IQ1_M">Link</a></td><td>2.5/2.06/1.56bit</td></tr><tr><td>2.42bit</td><td>IQ2_XXS</td><td><strong>203GB</strong></td><td><mark style="background-color:blue;"><strong>Suggested</strong></mark></td><td><a href="https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF/tree/main/UD-IQ2_XXS">Link</a></td><td>2.5/2.06bit</td></tr><tr><td>2.71bit</td><td>Q2_K_XL</td><td><strong>231GB</strong></td><td><mark style="background-color:purple;"><strong>Suggested</strong></mark></td><td><a href="https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF/tree/main/UD-Q2_K_XL">Link</a></td><td>3.5/2.5bit</td></tr><tr><td>3.5bit</td><td>Q3_K_XL</td><td><strong>320GB</strong></td><td>Great</td><td><a href="https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF/tree/main/UD-Q3_K_XL">Link</a></td><td>4.5/3.5bit</td></tr><tr><td>4.5bit</td><td>Q4_K_XL</td><td><strong>406GB</strong></td><td>Best</td><td><a href="https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF/tree/main/UD-Q4_K_XL">Link</a></td><td>5.5/4.5bit</td></tr></tbody></table>

{% hint style="success" %}
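DeepSeek V3's original upload is in float8 and takes 715GB. Using Q4\_K\_M halves the file size to about 404GB, and our dynamic 1.78bit quant fits in around 173GB. **We recommend the 2.7bit quant to balance size and accuracy! The 2.4bit one also works well!**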
{% endhint %}
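
As a rough sanity check on the sizes above, the short sketch below converts a disk size into an average number of bits stored per parameter. It assumes 1 GB = 1e9 bytes and treats the whole file as weight storage, which is a simplification; the function name is illustrative and not part of any Unsloth or llama.cpp tooling.

```python
# Rough sketch: average bits per parameter implied by a file size.
# Assumptions (not from DeepSeek or Unsloth): 1 GB = 1e9 bytes, and the
# whole file is counted as weight storage (metadata is ignored).
TOTAL_PARAMS = 671e9  # parameter count quoted in this guide


def effective_bits_per_param(size_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Return the average number of bits stored per parameter."""
    return size_gb * 1e9 * 8 / params


# Disk sizes copied from the table above, plus the 715GB float8 upload.
sizes_gb = {"UD-IQ1_S": 173, "UD-Q2_K_XL": 231, "UD-Q4_K_XL": 406, "float8": 715}

for name, size in sizes_gb.items():
    print(f"{name}: {effective_bits_per_param(size):.2f} bits/param")
```

The averages come out somewhat above the nominal MoE bit widths, which is at least consistent with the mixed precisions listed in the Details column. Treat the numbers as estimates only, since the parameter count is rounded.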

## :gear: Official Recommended Settings

According to [DeepSeek](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324), these are the recommended settings for inference:

* <mark style="background-color:blue;">**Temperature of 0.3**</mark> (maybe 0.0 for coding, as [seen here](https://api-docs.deepseek.com/quick_start/parameter_settings))
* Min\_P of 0.00 (optional, but 0.01 works well; the llama.cpp default is 0.1)
* Chat template: `<｜User｜>Create a simple playable Flappy Bird Game in Python. Place the final game inside of a markdown section.<｜Assistant｜>`
* A BOS token of `<｜begin▁of▁sentence｜>` is added automatically during tokenization (do NOT add it manually!)
* DeepSeek also mentions using a <mark style="background-color:green;">**system prompt**</mark> (optional). It is in Chinese: `该助手为DeepSeek Chat，由深度求索公司创造。\n今天是3月24日，星期一。` which translates to: `The assistant is DeepSeek Chat, created by DeepSeek.\nToday is Monday, March 24th.`
* <mark style="background-color:orange;">**For KV cache quantization, use 8bit, NOT 4bit. We found 4bit to do noticeably worse.**</mark>
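
The settings above can be collected in a few lines of Python. This is a minimal sketch for a single user turn only; `build_prompt` and `RECOMMENDED` are hypothetical names chosen for illustration, and only the template tokens and values come from the list above.

```python
# Minimal sketch of the single-turn chat template described above.
# `build_prompt` and `RECOMMENDED` are illustrative names, not a library API.
USER_TAG = "<｜User｜>"
ASSISTANT_TAG = "<｜Assistant｜>"

# Sampling and cache settings taken from the recommendations above.
RECOMMENDED = {"temp": 0.3, "min_p": 0.01, "cache_type_k": "q8_0"}


def build_prompt(user_message: str) -> str:
    """Wrap one user message in the DeepSeek-V3 chat template.

    The BOS token is deliberately NOT prepended here, because the
    tokenizer adds it automatically.
    """
    return f"{USER_TAG}{user_message}{ASSISTANT_TAG}"


print(build_prompt("Create a simple playable Flappy Bird Game in Python."))
```

The resulting string is what would be passed to `--prompt` in the commands later in this guide.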

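## 📖 Tutorial: How to Run DeepSeek-V3 in llama.cpp

1. Obtain the latest `llama.cpp` from [GitHub here](https://github.com/ggml-org/llama.cpp). You can also follow the build instructions below. Change `-DGGML_CUDA=ON` to `-DGGML_CUDA=OFF` if you don't have a GPU or only want CPU inference. **For Apple Mac / Metal devices**, set `-DGGML_CUDA=OFF` and then continue as usual, since Metal support is on by default.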

{% hint style="warning" %}
Note that compiling with `-DGGML_CUDA=ON` for GPUs might take about 5 minutes, while a CPU-only build takes about 1 minute. You might also be interested in llama.cpp's precompiled binaries.
{% endhint %}

```bash
apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-quantize llama-cli llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp
```

2. Download the model with the snippet below (after running `pip install huggingface_hub hf_transfer`). You can choose `UD-IQ1_S` (dynamic 1.78bit quant) or another quantized version such as `Q4_K_M`. <mark style="background-color:green;">**I recommend using our 2.7bit dynamic quant**</mark><mark style="background-color:green;">**&#x20;**</mark><mark style="background-color:green;">**`UD-Q2_K_XL`**</mark><mark style="background-color:green;">**&#x20;**</mark><mark style="background-color:green;">**to balance size and accuracy**</mark>. More versions are available at: <https://huggingface.co/unsloth/DeepSeek-V3-0324-GGUF>

{% code overflow="wrap" %}

```python
# !pip install huggingface_hub hf_transfer
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id = "unsloth/DeepSeek-V3-0324-GGUF-UD",
    local_dir = "unsloth/DeepSeek-V3-0324-GGUF-UD",
    allow_patterns = ["*UD-Q2_K_XL*"], # Dynamic 2.7bit (230GB). Use "*UD-IQ1_S*" for dynamic 1.78bit (173GB)
)
```

{% endcode %}
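
Because each quant is split into several `-0000N-of-0000M.gguf` files, it can be worth confirming that every shard arrived before launching llama.cpp. The helper below is a small sketch of that check. `missing_shards` is a hypothetical name, and the example folder path in the comment is an assumption about where `snapshot_download` placed the files.

```python
import re
from pathlib import Path
from typing import List

# Matches the "-00001-of-00006.gguf" suffix used by split GGUF files.
SHARD_RE = re.compile(r"-(\d{5})-of-(\d{5})\.gguf$")


def missing_shards(folder: str) -> List[int]:
    """Return the shard numbers that are absent from `folder`.

    An empty list means every shard announced by the file names is present.
    Raises FileNotFoundError when the folder holds no split GGUF files.
    """
    found = set()
    total = None
    for path in Path(folder).glob("*.gguf"):
        match = SHARD_RE.search(path.name)
        if match is None:
            continue  # not a split GGUF file
        found.add(int(match.group(1)))
        total = int(match.group(2))
    if total is None:
        raise FileNotFoundError(f"no split GGUF shards found in {folder}")
    return [n for n in range(1, total + 1) if n not in found]


# Example call (assumed path, adjust to your download location):
# missing_shards("unsloth/DeepSeek-V3-0324-GGUF-UD/UD-Q2_K_XL")
```

If the returned list is not empty, re-running the download snippet above should fetch the missing files.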

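3. Run Unsloth's Flappy Bird test, as described in our 1.58bit dynamic quant write-up for DeepSeek R1.
4. Edit `--threads` to set the number of CPU threads (20 in the command below), `--ctx-size` to set the context length (4096 below), and `--n-gpu-layers` to set how many layers are offloaded to the GPU (2 below). Lower `--n-gpu-layers` if your GPU runs out of memory, and remove the flag entirely for CPU-only inference.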

<pre class="language-bash" data-overflow="wrap"><code class="lang-bash">./llama.cpp/llama-cli \
    --model unsloth/DeepSeek-V3-0324-GGUF-UD/UD-Q2_K_XL/DeepSeek-V3-0324-UD-Q2_K_XL-00001-of-00006.gguf \
    <a data-footnote-ref href="#user-content-fn-1">--cache-type-k q8_0 </a>\
    <a data-footnote-ref href="#user-content-fn-2">--threads 20</a> \
    <a data-footnote-ref href="#user-content-fn-3">--n-gpu-layers 2</a> \
    -no-cnv \
    --prio 3 \
    --temp 0.3 \
    --min-p 0.01 \
    <a data-footnote-ref href="#user-content-fn-4">--ctx-size 4096</a> \
    --seed 3407 \
    --prompt "&#x3C;｜User｜>Create a Flappy Bird game in Python. You must include these things:\n1. You must use pygame.\n2. The background color should be randomly chosen and is a light shade. Start with a light blue color.\n3. Pressing SPACE multiple times will accelerate the bird.\n4. The bird's shape should be randomly chosen as a square, circle or triangle. The color should be randomly chosen as a dark color.\n5. Place on the bottom some land colored as dark brown or yellow chosen randomly.\n6. Make a score shown on the top right side. Increment if you pass pipes and don't hit them.\n7. Make randomly spaced pipes with enough space. Color them randomly as dark green or light brown or a dark gray shade.\n8. When you lose, show the best score. Make the text inside the screen. Pressing q or Esc will quit the game. Restarting is pressing SPACE again.\nThe final game should be inside a markdown section in Python. Check your code for errors and fix them before the final markdown section.&#x3C;｜Assistant｜>"
</code></pre>

<details>

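<summary>If we run the command above, we get two very different results.<br><br><strong>Standard 2-bit version:</strong> Click to view the result <em><mark style="color:red;"><strong>(seizure warning!)</strong></mark></em><br><strong>Dynamic 2-bit version:</strong> See the result below:</summary>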

<img src="/files/5fe98e3a96a176b3447ae07103a3205ec0f94c6e" alt="" data-size="original">

Standard 2-bit. Fails on the background and fails on collisions.

</details>

<div align="center"><figure><img src="/files/e00c6c2a8132ae460a5f8e6d1e8af0b6ae5ee294" alt="" width="240"><figcaption><p>Dynamic 2-bit. Succeeds in creating a playable game.</p></figcaption></figure></div>
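
The flag choices from step 4 can also be assembled programmatically, which makes the CPU-only case (no `--n-gpu-layers`) explicit. The following is a sketch under stated assumptions: `build_llama_cli_args` is a hypothetical helper, and it only uses flags that already appear in the command above.

```python
from typing import List, Optional


def build_llama_cli_args(
    model_path: str,
    threads: int,
    ctx_size: int,
    n_gpu_layers: Optional[int] = None,
    prompt: str = "",
) -> List[str]:
    """Build an argument list for llama-cli using the flags shown above.

    Pass n_gpu_layers=None for CPU-only inference; the flag is then omitted.
    """
    args = [
        "./llama.cpp/llama-cli",
        "--model", model_path,
        "--cache-type-k", "q8_0",   # 8bit KV cache, as recommended
        "--threads", str(threads),
        "-no-cnv",
        "--prio", "3",
        "--temp", "0.3",
        "--min-p", "0.01",
        "--ctx-size", str(ctx_size),
        "--seed", "3407",
    ]
    if n_gpu_layers is not None:
        args += ["--n-gpu-layers", str(n_gpu_layers)]
    args += ["--prompt", prompt]
    return args


# CPU-only example: no --n-gpu-layers flag is emitted.
print(build_llama_cli_args("model.gguf", threads=20, ctx_size=4096))
```

The returned list can be handed to `subprocess.run(args)`, which avoids shell quoting problems with the long prompt string.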

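5. Like DeepSeek-R1, V3 has 61 layers. For example, with a 24GB GPU or an 80GB GPU, you can expect to offload roughly the following number of layers after rounding (reduce by 1 if you run out of memory):

| Quant   | File Size | 24GB GPU | 80GB GPU | 2x80GB GPU |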
| ------- | ----- | -------- | -------- | ---------- |
| 1.73bit | 173GB | 5        | 25       | 56         |
| 2.22bit | 183GB | 4        | 22       | 49         |
| 2.51bit | 212GB | 2        | 19       | 32         |

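### Running on Mac / Apple devices

For Apple Metal devices, be careful with `--n-gpu-layers`. If the machine runs out of memory, reduce it. On a machine with 128GB of unified memory, you should be able to offload about 59 layers.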

```bash
./llama.cpp/llama-cli \
    --model unsloth/DeepSeek-V3-0324-GGUF-UD/UD-IQ1_S/DeepSeek-V3-0324-UD-IQ1_S-00001-of-00003.gguf \
    --cache-type-k q8_0 \
    --threads 16 \
    --prio 2 \
    --temp 0.3 \
    --ctx-size 8192 \
    --seed 3407 \
    --n-gpu-layers 59 \
    -no-cnv \
    --prompt "<｜User｜>Create a Flappy Bird game in Python.<｜Assistant｜>"
```

## :8ball: Heptagon Test

We also test our dynamic quants with the Heptagon Test from [r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/comments/1j7r47l/i_just_made_an_animation_of_a_ball_bouncing/), which asks the model to create a basic physics engine that simulates balls bouncing inside a rotating, enclosed heptagon.

<figure><img src="/files/d38353ab2a5265a67c2480d370a152680663bb43" alt="" width="563"><figcaption><p>The goal is to make the heptagon spin, and the balls in the heptagon should move.</p></figcaption></figure>

{% code overflow="wrap" %}

```bash
./llama.cpp/llama-cli \
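    --model unsloth/DeepSeek-V3-0324-GGUF-UD/UD-Q2_K_XL/DeepSeek-V3-0324-UD-Q2_K_XL-00001-of-00006.gguf \
    --cache-type-k q8_0 \
    --threads 20 \
    --n-gpu-layers 2 \
    -no-cnv \
    --prio 3 \
    --temp 0.3 \
    --min-p 0.01 \
    --ctx-size 4096 \
    --seed 3407 \
    --prompt "<｜User｜>Write a Python program that shows 20 balls bouncing inside a spinning heptagon:\n- All balls have the same radius.\n- All balls have a number on them from 1 to 20.\n- All balls drop from the heptagon center when starting.\n- Colors are: #f8b862, #f6ad49, #f39800, #f08300, #ec6d51, #ee7948, #ed6d3d, #ec6800, #ec6800, #ee7800, #eb6238, #ea5506, #ea5506, #eb6101, #e49e61, #e45e32, #e17b34, #dd7a56, #db8449, #d66a35\n- The balls should be affected by gravity and friction, and they must bounce off the rotating walls realistically. There should also be collisions between balls.\n- The material of all the balls determines that their impact bounce height will not exceed the radius of the heptagon, but will be higher than the ball radius.\n- All balls rotate with friction, and the numbers on the balls can be used to indicate the spin of the ball.\n- The heptagon is spinning around its center, and the speed of spinning is 360 degrees per 5 seconds.\n- The heptagon size should be large enough to contain all the balls.\n- Do not use the pygame library; implement collision detection algorithms, collision response, etc. by yourself. The following Python libraries are allowed: tkinter, math, numpy, dataclasses, typing, sys.\n- All code should be put in a single Python file.<｜Assistant｜>"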
```

{% endcode %}

<table data-view="cards"><thead><tr><th></th><th data-type="files"></th><th data-hidden data-card-cover data-type="files"></th></tr></thead><tbody><tr><td>Non-dynamic 2-bit. Fails - <mark style="background-color:red;">SEIZURE WARNING</mark> again!</td><td><a href="/files/7e8f62d1dcdc8ccf322aa80dccdcbff48e6f2d12">/files/7e8f62d1dcdc8ccf322aa80dccdcbff48e6f2d12</a></td><td><a href="/files/a2d69ffd41e6ebd7cc1b44b0df875013fed7c2e1">/files/a2d69ffd41e6ebd7cc1b44b0df875013fed7c2e1</a></td></tr><tr><td>Dynamic 2-bit. Actually solves the heptagon puzzle correctly!!</td><td><a href="/files/d7962c47f1ceba3b53d88f5e2042a56991d24716">/files/d7962c47f1ceba3b53d88f5e2042a56991d24716</a></td><td><a href="/files/7a4b44c8cc8455c0c3298766356a3d04ae747c01">/files/7a4b44c8cc8455c0c3298766356a3d04ae747c01</a></td></tr><tr><td>Original float8</td><td><a href="/files/0ceeaeae184021a08e02895dd041c1bc092aeda7">/files/0ceeaeae184021a08e02895dd041c1bc092aeda7</a></td><td><a href="/files/bd72ca709d2cb6790de5168bfbbdd28f6134d6d6">/files/bd72ca709d2cb6790de5168bfbbdd28f6134d6d6</a></td></tr></tbody></table>

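The dynamic 2.7bit quant, which is only 230GB in size, actually manages to solve the heptagon puzzle! The full outputs for all 3 versions (including full fp8) are below: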

<details>

<summary>Dynamic 2-bit Heptagon code</summary>

```python
import tkinter as tk
import math
import numpy as np
from dataclasses import dataclass
from typing import List, Tuple, Optional

# Constants
HEPTAGON_RADIUS = 300
BALL_RADIUS = 20
GRAVITY = 0.2
FRICTION = 0.99
BOUNCE_FACTOR = 0.8
ROTATION_SPEED = 360 / 5  # degrees per second
SPIN_FRICTION = 0.98
BALL_COLORS = [
    "#f8b862", "#f6ad49", "#f39800", "#f08300", "#ec6d51",
    "#ee7948", "#ed6d3d", "#ec6800", "#ec6800", "#ee7800",
    "#eb6238", "#ea5506", "#ea5506", "#eb6101", "#e49e61",
    "#e45e32", "#e17b34", "#dd7a56", "#db8449", "#d66a35"
]

@dataclass
class Ball:
    x: float
    y: float
    vx: float
    vy: float
    number: int
    spin: float = 0.0
    color: str = "#000000"

@dataclass
class Wall:
    x1: float
    y1: float
    x2: float
    y2: float

class BouncingBalls:
    def __init__(self, root):
        self.root = root
        self.canvas = tk.Canvas(root, width=800, height=800, bg="white")
        self.canvas.pack()
        self.balls: List[Ball] = []
        self.walls: List[Wall] = []
        self.heptagon_angle = 0
        self.last_time = 0
        
        self.setup_balls()
        self.setup_heptagon()
        
        self.root.after(16, self.update)
        self.root.bind("<space>", self.reset_balls)
    
    def setup_balls(self):
        for i in range(20):
            ball = Ball(
                x=400,
                y=400,
                vx=np.random.uniform(-5, 5),
                vy=np.random.uniform(-5, 5),
                number=i+1,
                color=BALL_COLORS[i]
            )
            self.balls.append(ball)
    
    def setup_heptagon(self):
        # Create initial heptagon walls
        self.update_heptagon_walls(0)
    
    def update_heptagon_walls(self, angle):
        self.walls = []
        center_x, center_y = 400, 400
        angle_rad = math.radians(angle)
        
        for i in range(7):
            angle1 = angle_rad + 2 * math.pi * i / 7
            angle2 = angle_rad + 2 * math.pi * (i + 1) / 7
            
            x1 = center_x + HEPTAGON_RADIUS * math.cos(angle1)
            y1 = center_y + HEPTAGON_RADIUS * math.sin(angle1)
            x2 = center_x + HEPTAGON_RADIUS * math.cos(angle2)
            y2 = center_y + HEPTAGON_RADIUS * math.sin(angle2)
            
            self.walls.append(Wall(x1, y1, x2, y2))
    
    def reset_balls(self, event=None):
        for ball in self.balls:
            ball.x = 400
            ball.y = 400
            ball.vx = np.random.uniform(-5, 5)
            ball.vy = np.random.uniform(-5, 5)
            ball.spin = np.random.uniform(-5, 5)
    
    def update(self):
        current_time = self.root.after_idle(self.root.after, 16, self.update)
        if self.last_time == 0:
            self.last_time = current_time
            return
        
        # Calculate delta time (approximate)
        dt = 0.016  # Assuming ~60 FPS
        
        # Update heptagon rotation
        self.heptagon_angle += ROTATION_SPEED * dt
        self.update_heptagon_walls(self.heptagon_angle)
        
        # Update balls
        for ball in self.balls:
            # Apply gravity
            ball.vy += GRAVITY
            
            # Apply friction
            ball.vx *= FRICTION
            ball.vy *= FRICTION
            ball.spin *= SPIN_FRICTION
            
            # Move ball
            ball.x += ball.vx
            ball.y += ball.vy
            
            # Check wall collisions
            self.check_wall_collisions(ball)
            
            # Check collisions with other balls
            for other in self.balls:
                if other.number != ball.number:
                    self.check_ball_collision(ball, other)
        
        # Draw everything
        self.draw()
    
    def check_wall_collisions(self, ball):
        for wall in self.walls:
            # Find closest point on wall segment to ball
            closest = self.closest_point_on_segment(
                wall.x1, wall.y1, wall.x2, wall.y2, ball.x, ball.y
            )
            
            # Calculate distance to wall
            dx = ball.x - closest[0]
            dy = ball.y - closest[1]
            distance = math.sqrt(dx*dx + dy*dy)
            
            if distance < BALL_RADIUS:
                # Collision detected
                # Calculate normal vector
                nx = dx / distance
                ny = dy / distance
                
                # Calculate relative velocity along the normal
                v_rel = ball.vx * nx + ball.vy * ny
                
                if v_rel < 0:  # Moving toward the wall
                    # Calculate impulse
                    j = -(1 + BOUNCE_FACTOR) * v_rel
                    
                    # Apply impulse
                    ball.vx += j * nx
                    ball.vy += j * ny
                    
                    # Add some spin based on collision
                    ball.spin += (ball.vx * ny - ball.vy * nx) * 0.1
                    
                    # Move ball out of collision
                    penetration = BALL_RADIUS - distance
                    ball.x += penetration * nx
                    ball.y += penetration * ny
    
    def check_ball_collision(self, ball1, ball2):
        dx = ball2.x - ball1.x
        dy = ball2.y - ball1.y
        distance = math.sqrt(dx*dx + dy*dy)
        
        if distance < 2 * BALL_RADIUS:
            # Collision detected
            nx = dx / distance
            ny = dy / distance
            
            # Calculate relative velocity
            v_rel_x = ball2.vx - ball1.vx
            v_rel_y = ball2.vy - ball1.vy
            v_rel = v_rel_x * nx + v_rel_y * ny
            
            if v_rel < 0:  # Moving toward each other
                # Calculate impulse
                j = -(1 + BOUNCE_FACTOR) * v_rel / 2
                
                # Apply impulse
                ball1.vx -= j * nx
                ball1.vy -= j * ny
                ball2.vx += j * nx
                ball2.vy += j * ny
                
                # Add spin based on collision
                ball1.spin += (ball1.vx * ny - ball1.vy * nx) * 0.05
                ball2.spin += (ball2.vx * ny - ball2.vy * nx) * 0.05
                
                # Separate balls
                penetration = 2 * BALL_RADIUS - distance
                ball1.x -= penetration * nx * 0.5
                ball1.y -= penetration * ny * 0.5
                ball2.x += penetration * nx * 0.5
                ball2.y += penetration * ny * 0.5
    
    @staticmethod
    def closest_point_on_segment(x1, y1, x2, y2, x, y):
        # Vector from segment start to point
        dx = x - x1
        dy = y - y1
        
        # Segment vector
        sx = x2 - x1
        sy = y2 - y1
        
        # Projection of point onto segment
        dot = dx * sx + dy * sy
        len_sq = sx * sx + sy * sy
        param = dot / len_sq if len_sq != 0 else -1
        
        if param < 0:
            return x1, y1
        elif param > 1:
            return x2, y2
        else:
            return x1 + param * sx, y1 + param * sy
    
    def draw(self):
        self.canvas.delete("all")
        
        # Draw heptagon
        points = []
        for wall in self.walls:
            points.extend([wall.x1, wall.y1])
        self.canvas.create_polygon(points, fill="", outline="black", width=2)
        
        # Draw balls
        for ball in self.balls:
            # Draw ball
            self.canvas.create_oval(
                ball.x - BALL_RADIUS, ball.y - BALL_RADIUS,
                ball.x + BALL_RADIUS, ball.y + BALL_RADIUS,
                fill=ball.color, outline="black"
            )
            
            # Draw number with rotation based on spin
            angle = ball.spin * 10  # Scale spin for visual effect
            self.canvas.create_text(
                ball.x, ball.y,
                text=str(ball.number),
                font=("Arial", 12, "bold"),
                angle=angle
            )

if __name__ == "__main__":
    root = tk.Tk()
    root.title("Bouncing Balls in a Spinning Heptagon")
    app = BouncingBalls(root)
    root.mainloop()
```

</details>

<details>

<summary>Non-dynamic 2-bit Heptagon code</summary>

```python
import tkinter as tk
import math
import random
from dataclasses import dataclass
from typing import List, Tuple, Optional
import sys

# Constants
WIDTH, HEIGHT = 800, 800
HEPTAGON_RADIUS = 300
BALL_RADIUS = 15
GRAVITY = 0.5
FRICTION = 0.999
ELASTICITY = 0.8
ROTATION_SPEED = 2 * math.pi / 5  # 360 degrees per 5 seconds
SPIN_DECAY = 0.99

# Ball colors
BALL_COLORS = [
    "#f8b862", "#f6ad49", "#f39800", "#f08300", "#ec6d51",
    "#ee7948", "#ed6d3d", "#ec6800", "#ec6800", "#ee7800",
    "#eb6238", "#ea5506", "#ea5506", "#eb6101", "#e49e61",
    "#e45e32", "#e17b34", "#dd7a56", "#db8449", "#d66a35"
]

@dataclass
class Ball:
    x: float
    y: float
    vx: float
    vy: float
    radius: float
    color: str
    number: int
    spin: float = 0.0

@dataclass
class Heptagon:
    center_x: float
    center_y: float
    radius: float
    angle: float = 0.0

class BouncingBalls:
    def __init__(self, root):
        self.root = root
        self.canvas = tk.Canvas(root, width=WIDTH, height=HEIGHT, bg="white")
        self.canvas.pack()
        
        self.heptagon = Heptagon(WIDTH//2, HEIGHT//2, HEPTAGON_RADIUS)
        self.balls = []
        self.setup_balls()
        
        self.root.after(0, self.update)
        self.root.mainloop()
    
    def setup_balls(self):
        center_x, center_y = WIDTH//2, HEIGHT//2
        for i in range(20):
            self.balls.append(Ball(
                x=center_x,
                y=center_y,
                vx=0,
                vy=0,
                radius=BALL_RADIUS,
                color=BALL_COLORS[i],
                number=i+1,
                spin=0
            ))
    
    def update(self):
        self.canvas.delete("all")
        
        # Update heptagon angle
        self.heptagon.angle += ROTATION_SPEED / 60  # Assuming 60 FPS
        
        # Draw heptagon
        self.draw_heptagon()
        
        # Update and draw balls
        for ball in self.balls:
            # Apply gravity
            ball.vy += GRAVITY
            
            # Update position
            ball.x += ball.vx
            ball.y += ball.vy
            
            # Apply friction
            ball.vx *= FRICTION
            ball.vy *= FRICTION
            
            # Apply spin decay
            ball.spin *= SPIN_DECAY
            
            # Check collision with heptagon walls
            self.check_heptagon_collision(ball)
            
            # Check collision with other balls
            for other in self.balls:
                if other != ball:
                    if self.check_ball_collision(ball, other):
                        self.resolve_ball_collision(ball, other)
            
            # Draw ball
            self.draw_ball(ball)
        
        self.root.after(16, self.update)  # ~60 FPS
    
    def draw_heptagon(self):
        center_x, center_y = self.heptagon.center_x, self.heptagon.center_y
        points = []
        for i in range(7):
            angle = self.heptagon.angle + i * 2 * math.pi / 7
            x = center_x + self.heptagon.radius * math.cos(angle)
            y = center_y + self.heptagon.radius * math.sin(angle)
            points.append((x, y))
        
        # Draw heptagon
        self.canvas.create_polygon(
            [points[0], points[1], points[2], points[3], 
             points[4], points[5], points[6]],
            outline="black", fill="", width=2
        )
    
    def draw_ball(self, ball):
        self.canvas.create_oval(
            ball.x - ball.radius,
            ball.y - ball.radius,
            ball.x + ball.radius,
            ball.y + ball.radius,
            fill=ball.color,
            outline="black"
        )
        
        # Draw number
        self.canvas.create_text(
            ball.x, ball.y,
            text=str(ball.number),
            fill="black"
        )
    
    def check_heptagon_collision(self, ball):
        center_x, center_y = WIDTH//2, HEIGHT//2
        
        # Check distance from center
        dx = ball.x - center_x
        dy = ball.y - center_y
        dist = math.sqrt(dx**2 + dy**2)
        
        if dist + ball.radius > self.heptagon.radius:
            # Find the normal vector from center to ball
            angle = math.atan2(dy, dx)
            normal_x = math.cos(angle)
            normal_y = math.sin(angle)
            
            # Move ball back inside heptagon
            overlap = (dist + ball.radius) - self.heptagon.radius
            ball.x -= overlap * normal_x
            ball.y -= overlap * normal_y
            
            # Reflect velocity
            dot_product = ball.vx * normal_x + ball.vy * normal_y
            ball.vx -= 2 * dot_product * normal_x * ELASTICITY
            ball.vy -= 2 * dot_product * normal_y * ELASTICITY
    
    def check_ball_collision(self, ball1, ball2):
        dx = ball2.x - ball1.x
        dy = ball2.y - ball1.y
        distance = math.sqrt(dx**2 + dy**2)
        return distance < (ball1.radius + ball2.radius)
    
    def resolve_ball_collision(self, ball1, ball2):
        dx = ball2.x - ball1.x
        dy = ball2.y - ball1.y
        distance = math.sqrt(dx**2 + dy**2)
        
        # Normal vector
        nx = dx / distance
        ny = dy / distance
        
        # Relative velocity
        dvx = ball2.vx - ball1.vx
        dvy = ball2.vy - ball1.vy
        
        # Calculate impulse
        impulse = 2 * (dvx * nx + dvy * ny) / 2
        impulse *= ELASTICITY
        
        # Apply impulse
        ball1.vx -= impulse * nx
        ball1.vy -= impulse * ny
        ball2.vx += impulse * nx
        ball2.vy += impulse * ny
        
        # Separate balls to prevent sticking
        overlap = (ball1.radius + ball2.radius) - distance
        ball1.x -= overlap * nx / 2
        ball1.y -= overlap * ny / 2
        ball2.x += overlap * nx / 2
        ball2.y += overlap * ny / 2
    
    def run(self):
        self.root.mainloop()

if __name__ == "__main__":
    root = tk.Tk()
    root.title("Bouncing Balls in a Spinning Heptagon")
    app = BouncingBalls(root)
    app.run()
```

</details>

<details>

<summary>Float8 Heptagon code</summary>

```python
import tkinter as tk
import math
import numpy as np
from dataclasses import dataclass
from typing import List, Tuple, Optional

# Constants
WIDTH, HEIGHT = 800, 800
CENTER_X, CENTER_Y = WIDTH // 2, HEIGHT // 2
GRAVITY = 0.2
FRICTION = 0.99
BALL_RADIUS = 15
HEPTAGON_RADIUS = 300
ROTATION_SPEED = 360 / 5  # degrees per second
ELASTICITY = 0.8
SPIN_FRICTION = 0.98

# Ball colors
BALL_COLORS = [
    '#f8b862', '#f6ad49', '#f39800', '#f08300', '#ec6d51',
    '#ee7948', '#ed6d3d', '#ec6800', '#ec6800', '#ee7800',
    '#eb6238', '#ea5506', '#ea5506', '#eb6101', '#e49e61',
    '#e45e32', '#e17b34', '#dd7a56', '#db8449', '#d66a35'
]

@dataclass
class Ball:
    x: float
    y: float
    vx: float
    vy: float
    radius: float
    color: str
    number: int
    spin: float = 0.0

    def move(self):
        self.x += self.vx
        self.y += self.vy
        self.vy += GRAVITY
        self.vx *= FRICTION
        self.vy *= FRICTION
        self.spin *= SPIN_FRICTION

    def collide_with_ball(self, other: 'Ball'):
        dx = other.x - self.x
        dy = other.y - self.y
        distance = math.hypot(dx, dy)
        
        if distance < self.radius + other.radius:
            # Calculate collision normal
            nx = dx / distance
            ny = dy / distance
            
            # Calculate relative velocity
            dvx = other.vx - self.vx
            dvy = other.vy - self.vy
            
            # Calculate impulse
            impulse = 2 * (dvx * nx + dvy * ny) / (1/self.radius + 1/other.radius)
            
            # Apply impulse
            self.vx += impulse * nx / self.radius
            self.vy += impulse * ny / self.radius
            other.vx -= impulse * nx / other.radius
            other.vy -= impulse * ny / other.radius
            
            # Separate balls to prevent sticking
            overlap = (self.radius + other.radius - distance) / 2
            self.x -= overlap * nx
            self.y -= overlap * ny
            other.x += overlap * nx
            other.y += overlap * ny
            
            # Transfer some spin
            transfer = impulse * 0.01
            self.spin -= transfer
            other.spin += transfer

class HeptagonBounceSimulator:
    def __init__(self, root):
        self.root = root
        self.canvas = tk.Canvas(root, width=WIDTH, height=HEIGHT, bg='white')
        self.canvas.pack()
        
        self.balls = self.create_balls()
        self.heptagon_angle = 0
        self.last_time = 0
        self.running = True
        
        self.root.bind('<space>', self.toggle_pause)
        self.root.bind('<Escape>', lambda e: root.destroy())
        
        self.last_time = self.root.after(0, self.update)
    
    def create_balls(self) -> List[Ball]:
        balls = []
        for i in range(20):
            # Start all balls at center with small random velocity
            angle = np.random.uniform(0, 2 * math.pi)
            speed = np.random.uniform(0.5, 2)
            vx = math.cos(angle) * speed
            vy = math.sin(angle) * speed
            
            balls.append(Ball(
                x=CENTER_X,
                y=CENTER_Y,
                vx=vx,
                vy=vy,
                radius=BALL_RADIUS,
                color=BALL_COLORS[i],
                number=i+1,
                spin=np.random.uniform(-2, 2)
            ))
        return balls
    
    def toggle_pause(self, event):
        self.running = not self.running
        if self.running:
            self.last_time = self.root.after(0, self.update)
    
    def get_heptagon_vertices(self) -> List[Tuple[float, float]]:
        vertices = []
        for i in range(7):
            angle = math.radians(self.heptagon_angle + i * 360 / 7)
            x = CENTER_X + HEPTAGON_RADIUS * math.cos(angle)
            y = CENTER_Y + HEPTAGON_RADIUS * math.sin(angle)
            vertices.append((x, y))
        return vertices
    
    def check_ball_heptagon_collision(self, ball: Ball):
        vertices = self.get_heptagon_vertices()
        closest_dist = float('inf')
        closest_normal = (0, 0)
        closest_edge = None
        
        # Check collision with each edge of the heptagon
        for i in range(len(vertices)):
            p1 = vertices[i]
            p2 = vertices[(i + 1) % len(vertices)]
            
            # Vector from p1 to p2
            edge_x = p2[0] - p1[0]
            edge_y = p2[1] - p1[1]
            edge_length = math.hypot(edge_x, edge_y)
            
            # Normalize edge vector
            edge_x /= edge_length
            edge_y /= edge_length
            
            # Normal vector (perpendicular to edge, pointing inward)
            nx = -edge_y
            ny = edge_x
            
            # Vector from p1 to ball
            ball_to_p1_x = ball.x - p1[0]
            ball_to_p1_y = ball.y - p1[1]
            
            # Project ball onto edge normal
            projection = ball_to_p1_x * nx + ball_to_p1_y * ny
            
            # If projection is negative, ball is outside the heptagon
            if projection < ball.radius:
                # Find closest point on edge to ball
                edge_proj = ball_to_p1_x * edge_x + ball_to_p1_y * edge_y
                edge_proj = max(0, min(edge_length, edge_proj))
                closest_x = p1[0] + edge_proj * edge_x
                closest_y = p1[1] + edge_proj * edge_y
                
                # Distance from ball to closest point on edge
                dist = math.hypot(ball.x - closest_x, ball.y - closest_y)
                
                if dist < closest_dist:
                    closest_dist = dist
                    closest_normal = (nx, ny)
                    closest_edge = (p1, p2)
        
        if closest_dist < ball.radius:
            # Calculate bounce response
            dot_product = ball.vx * closest_normal[0] + ball.vy * closest_normal[1]
            
            # Apply bounce with elasticity
            ball.vx -= (1 + ELASTICITY) * dot_product * closest_normal[0]
            ball.vy -= (1 + ELASTICITY) * dot_product * closest_normal[1]
            
            # Add some spin based on impact
            edge_vec = (closest_edge[1][0] - closest_edge[0][0], 
                        closest_edge[1][1] - closest_edge[0][1])
            edge_length = math.hypot(edge_vec[0], edge_vec[1])
            if edge_length > 0:
                edge_vec = (edge_vec[0]/edge_length, edge_vec[1]/edge_length)
                # Cross product of velocity and edge direction
                spin_effect = (ball.vx * edge_vec[1] - ball.vy * edge_vec[0]) * 0.1
                ball.spin += spin_effect
            
            # Move ball out of the heptagon to prevent sticking
            penetration = ball.radius - closest_dist
            ball.x += penetration * closest_normal[0]
            ball.y += penetration * closest_normal[1]
    
    def update(self):
        if not self.running:
            return
        
        # Clear canvas
        self.canvas.delete('all')
        
        # Update heptagon rotation
        self.heptagon_angle += ROTATION_SPEED / 60  # Assuming ~60 FPS
        
        # Draw heptagon
        vertices = self.get_heptagon_vertices()
        self.canvas.create_polygon(vertices, outline='black', fill='', width=2)
        
        # Update and draw balls
        for i, ball in enumerate(self.balls):
            # Move ball
            ball.move()
            
            # Check collisions with heptagon
            self.check_ball_heptagon_collision(ball)
            
            # Draw ball
            self.canvas.create_oval(
                ball.x - ball.radius, ball.y - ball.radius,
                ball.x + ball.radius, ball.y + ball.radius,
                fill=ball.color, outline='black'
            )
            
            # Draw number with rotation based on spin
            angle = ball.spin * 10  # Scale spin for visible rotation
            self.canvas.create_text(
                ball.x, ball.y,
                text=str(ball.number),
                font=('Arial', 10, 'bold'),
                angle=angle
            )
        
        # Check ball-ball collisions
        for i in range(len(self.balls)):
            for j in range(i + 1, len(self.balls)):
                self.balls[i].collide_with_ball(self.balls[j])
        
        # Schedule next update
        self.last_time = self.root.after(16, self.update)  # ~60 FPS

if __name__ == '__main__':
    root = tk.Tk()
    root.title('Bouncing Balls in a Spinning Heptagon')
    simulator = HeptagonBounceSimulator(root)
    root.mainloop()
```

</details>

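## :detective: Extra Findings & Tips

1. In our empirical tests, lower KV cache quantization (4bit) seems to degrade generation quality. More testing is needed, but we suggest using `q8_0` cache quantization. The goal of quantizing the cache is to support longer context lengths, since the KV cache uses quite a bit of memory.
2. We found `down_proj` in this model to be extremely sensitive to quantization. We had to redo some of our dynamic quants that used 2bits for `down_proj`, and we now use 3bits as the minimum for all of these matrices.
3. Using `llama.cpp`'s Flash Attention backend does give somewhat faster decoding speeds. Compile with `-DGGML_CUDA_FA_ALL_QUANTS=ON`. It is also best to look up your GPU's CUDA architecture at <https://developer.nvidia.com/cuda-gpus> and pass it at compile time, for example `-DCMAKE_CUDA_ARCHITECTURES="80"`, to reduce compilation time.
4. Using `min_p=0.01` is probably enough. `llama.cpp` defaults to 0.1, which is probably not necessary. Since a temperature of 0.3 is already used, low-probability tokens are very unlikely to be sampled anyway, so a small threshold that removes only the very unlikely tokens is sufficient. DeepSeek recommends a temperature of 0.0 for coding tasks.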
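
To make the interaction between temperature and `min_p` in the last tip concrete, here is a small pure-Python sketch. It uses the common definition of min-p filtering (keep tokens whose probability is at least `min_p` times the top probability). The function names and the toy logits are illustrative assumptions, and the order in which llama.cpp applies its samplers may differ from this simplified version.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_keep(probs: List[float], min_p: float) -> List[int]:
    """Return indices of tokens with probability >= min_p * max(probs)."""
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]


# Toy logits for four candidate tokens (made up for illustration).
logits = [2.0, 1.0, 0.0, -1.0]

for temperature in (1.0, 0.3):
    probs = softmax_with_temperature(logits, temperature)
    for min_p in (0.01, 0.1):
        kept = min_p_keep(probs, min_p)
        print(f"temp={temperature} min_p={min_p} keeps tokens {kept}")
```

In this toy case, lowering the temperature concentrates probability on the top token, so even a small `min_p` already discards most of the tail.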

[^1]: Must use 8bit, not 4bit

[^2]: The number of CPU threads your machine has

[^3]: About 2 for a 24GB GPU. About 18 for an 80GB GPU.

[^4]: Context length

