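🌠 QwQ-32B: How to Run Effectively

How to run QwQ-32B effectively with our bug fixes, without endless generations, plus GGUFs.

Qwen released QwQ-32B, a reasoning model whose performance is comparable to DeepSeek-R1 on many benchmarks. However, people have been running into endless generations, heavy repetition, <think> token issues, and finetuning issues. We hope this guide helps you debug and fix most of them!

Our uploads with the bug fixes work well for finetuning, vLLM, and Transformers. If you use llama.cpp or an engine that uses llama.cpp as its backend, follow the llama.cpp instructions in this guide to fix endless generations.

Unsloth QwQ-32B uploads with our bug fixes are listed here: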

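⚙️ Official Recommended Settings

According to Qwen, these are the recommended settings for inference:

Temperature of 0.6
Top_K of 40 (or anywhere from 20 to 40)
Min_P of 0.00 (optional, but 0.01 works well; the llama.cpp default is 0.1)
Top_P of 0.95
Repetition Penalty of 1.0 (1.0 means disabled in llama.cpp and Transformers)
Chat template: <|im_start|>user\nCreate a Flappy Bird game in Python.<|im_end|>\n<|im_start|>assistant\n<think>\n

llama.cpp uses min_p = 0.1 by default, which might cause issues. Force it to 0.0.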
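As a minimal sketch, the settings and chat template above can be captured in a few lines of Python. The names QWQ_SETTINGS and build_qwq_prompt are hypothetical helpers introduced for illustration; they are not part of any library.

```python
# Recommended sampling settings, copied from the list above.
QWQ_SETTINGS = {
    "temperature": 0.6,
    "top_k": 40,
    "top_p": 0.95,
    "min_p": 0.0,
    "repeat_penalty": 1.0,  # 1.0 means disabled
}

def build_qwq_prompt(user_message: str) -> str:
    """Wrap a single user turn in the chat template shown above (hypothetical helper)."""
    return (
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
        "<think>\n"  # the template opens the thinking block for the model
    )

if __name__ == "__main__":
    print(build_qwq_prompt("Create a Flappy Bird game in Python."))
```

The prompt ends with <think>\n so that the model starts in its reasoning phase, matching the template line above.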

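👍 Recommended settings for llama.cpp

We noticed that many people use a Repetition Penalty greater than 1.0, for example 1.1 to 1.5. This actually interferes with llama.cpp's sampling mechanisms. A repetition penalty is meant to penalize repeated generations, but we found that it does not work as expected.

Turning off the Repetition Penalty (setting it to 1.0) also works, but we found that keeping it on helps penalize endless generations.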
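To make the idea concrete, here is a toy sketch of a repetition penalty over a small dictionary of logits. It assumes the common scheme of dividing positive logits and multiplying non-positive ones by the penalty; it is an illustration only and not a copy of llama.cpp's code. The function name and the sample tokens are made up.

```python
def apply_repeat_penalty(logits, recent_tokens, penalty):
    """Return a copy of `logits` where recently seen tokens are made less likely."""
    penalized = dict(logits)
    for token in set(recent_tokens):
        if token not in penalized:
            continue
        if penalized[token] > 0:
            penalized[token] /= penalty  # shrink positive logits
        else:
            penalized[token] *= penalty  # push non-positive logits further down
    return penalized

toy_logits = {"the": 2.0, "cat": -1.0, "sat": 0.5}
print(apply_repeat_penalty(toy_logits, ["the", "cat"], 1.1))
print(apply_repeat_penalty(toy_logits, ["the", "cat"], 1.0) == toy_logits)
```

With a penalty of 1.0 the logits come back unchanged, which is why 1.0 is treated as "disabled".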

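To use it, we found you must also edit the order of samplers in llama.cpp so that the other samplers run before the Repetition Penalty is applied, otherwise you will get endless generations. So add this: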

--samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc"

By default, llama.cpp uses this order:

--samplers "dry;top_k;typ_p;top_p;min_p;xtc;temperature"

We have essentially swapped temperature and dry, and moved min_p forward. This means the samplers are applied in this order:

top_k=40
top_p=0.95
min_p=0.0
temperature=0.6
dry
typ_p
xtc
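The first four steps of that chain can be sketched on a toy distribution. This is a simplified illustration written for this guide, not llama.cpp's implementation: the function names, the toy logits, and the omission of dry, typ_p, and xtc are all assumptions made to keep the example short.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert a {token: logit} dict into probabilities."""
    peak = max(logits.values())
    exps = {t: math.exp((l - peak) / temperature) for t, l in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

def top_k(logits, k):
    """Keep the k highest-logit tokens."""
    kept = sorted(logits, key=logits.get, reverse=True)[:k]
    return {t: logits[t] for t in kept}

def top_p(logits, p):
    """Keep the smallest high-probability set whose cumulative mass reaches p."""
    probs = softmax(logits)
    kept, mass = {}, 0.0
    for t in sorted(probs, key=probs.get, reverse=True):
        kept[t] = logits[t]
        mass += probs[t]
        if mass >= p:
            break
    return kept

def min_p(logits, p):
    """Drop tokens whose probability is below p times the top probability."""
    probs = softmax(logits)
    floor = p * max(probs.values())
    return {t: l for t, l in logits.items() if probs[t] >= floor}

def truncate_then_temperature(logits, k=40, p_top=0.95, p_min=0.0, temperature=0.6):
    """Apply top_k, top_p, min_p, then temperature, in that order."""
    kept = min_p(top_p(top_k(logits, k), p_top), p_min)
    return softmax(kept, temperature)

toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -3.0}
final_probs = truncate_then_temperature(toy_logits, k=3, p_top=0.9)
print(final_probs)
```

Because the truncation steps run first, temperature only reshapes the probabilities of the tokens that survived; it does not change which tokens are kept.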

If you still run into issues, you can increase --repeat-penalty from 1.0 to 1.2 or 1.3.

Courtesy of @krist486 for bringing llama.cpp's sampling directions to my attention.

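☀️ Dry Repetition Penalty

We investigated using the dry penalty with a value of 0.8, as suggested in https://github.com/ggml-org/llama.cpp/blob/master/examples/main/README.md, but we actually found that it tends to cause syntax issues, especially for coding. If you still run into issues, you can increase the dry penalty to 0.8.

If you decide to use the dry penalty, our swapped sampling order can also help.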

🦙 Tutorial: How to Run QwQ-32B in Ollama

Install ollama if you haven't already!

apt-get update
apt-get install pciutils -y
curl -fsSL https://ollama.com/install.sh | sh

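Run the model! Note that if it fails, you can call ollama serve in another terminal. We include our suggested parameters in param in our Hugging Face upload!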

ollama run hf.co/unsloth/QwQ-32B-GGUF:Q4_K_M
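Once the model is running, you could also query it from Python through Ollama's local REST API. The sketch below assumes Ollama is listening on its default port 11434 and passes the recommended settings explicitly; build_payload and generate are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumes the default local port
MODEL = "hf.co/unsloth/QwQ-32B-GGUF:Q4_K_M"

def build_payload(prompt):
    """Build a non-streaming generate request with the recommended settings."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,
        "options": {
            "temperature": 0.6,
            "top_k": 40,
            "top_p": 0.95,
            "min_p": 0.0,
            "repeat_penalty": 1.0,
        },
    }

def generate(prompt):
    """Send the request to a running Ollama server and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling generate("Create a Flappy Bird game in Python.") requires the server to be up; build_payload can be inspected without it.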

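📖 Tutorial: How to Run QwQ-32B in llama.cpp

Obtain the latest llama.cpp from GitHub (https://github.com/ggml-org/llama.cpp). You can also follow the build instructions below. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU or only want CPU inference. For Apple Mac / Metal devices, set -DGGML_CUDA=OFF and continue as usual; Metal support is on by default.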

apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=ON -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-quantize llama-cli llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp

Download the model with the script below (after running pip install huggingface_hub hf_transfer). You can choose Q4_K_M or another quantized version (or BF16 full precision). More versions are at: https://huggingface.co/unsloth/QwQ-32B-GGUF

# !pip install huggingface_hub hf_transfer
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id = "unsloth/QwQ-32B-GGUF",
    local_dir = "unsloth-QwQ-32B-GGUF",
    allow_patterns = ["*Q4_K_M*"], # For Q4_K_M
)
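After the download finishes, it can be handy to confirm which GGUF files actually landed on disk before pointing llama.cpp at them. This is a small sketch using only the standard library; find_gguf_files is a hypothetical helper, and the directory name is the local_dir used above.

```python
from pathlib import Path

def find_gguf_files(local_dir, quant="Q4_K_M"):
    """Return sorted .gguf paths under local_dir whose file name contains quant."""
    return sorted(Path(local_dir).rglob(f"*{quant}*.gguf"))
```

For example, find_gguf_files("unsloth-QwQ-32B-GGUF") should list the Q4_K_M file once the download above has completed, and an empty list means the pattern did not match anything.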

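Run Unsloth's Flappy Bird test, which saves the output to Q4_K_M_yes_samplers.txt
Edit --threads 32 for the number of CPU threads, --ctx-size 16384 for the context length, and --n-gpu-layers 99 for how many layers to offload to the GPU. Try lowering it if your GPU runs out of memory, and remove it for CPU-only inference.
We use --repeat-penalty 1.1 and --dry-multiplier 0.5, which you can adjust.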

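./llama.cpp/llama-cli \
    --model unsloth-QwQ-32B-GGUF/QwQ-32B-Q4_K_M.gguf \
    --threads 32 \
    --ctx-size 16384 \
    --n-gpu-layers 99 \
    --seed 3407 \
    --prio 2 \
    --temp 0.6 \
    --repeat-penalty 1.1 \
    --dry-multiplier 0.5 \
    --min-p 0.01 \
    --top-k 40 \
    --top-p 0.95 \
    -no-cnv \
    --samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc" \
    --prompt "<|im_start|>user\nCreate a Flappy Bird game in Python. You must include these things:\n1. You must use pygame.\n2. The background color should be randomly chosen and is a light shade. Start with a light blue color.\n3. Pressing SPACE multiple times will accelerate the bird.\n4. The bird's shape should be randomly chosen as a square, circle or triangle. The color should be randomly chosen as a dark color.\n5. Place on the bottom some land colored as dark brown or yellow chosen randomly.\n6. Make a score shown on the top right side. Increment if you pass pipes and don't hit them.\n7. Make randomly spaced pipes with enough space. Color them randomly as dark green or light brown or a dark gray shade.\n8. When you lose, show the best score. Make the text inside the screen. Pressing q or Esc will quit the game. Restarting is pressing SPACE again.\nThe final game should be inside a markdown section in Python. Check your code for errors and fix them before the final markdown section.<|im_end|>\n<|im_start|>assistant\n<think>\n" \
        2>&1 | tee Q4_K_M_yes_samplers.txt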
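If you prefer to drive llama-cli from a script, the same flags can be assembled as an argument list. This is a sketch under the assumption that the binary and model sit at the paths used above; build_llama_cli_args is a hypothetical helper, and the use_sampler_fix switch exists only to make it easy to compare runs with and without the custom sampler order.

```python
import shlex

SAMPLER_ORDER = "top_k;top_p;min_p;temperature;dry;typ_p;xtc"

def build_llama_cli_args(model_path, prompt, use_sampler_fix=True):
    """Assemble the llama-cli argument list used in the command above."""
    args = [
        "./llama.cpp/llama-cli",
        "--model", model_path,
        "--threads", "32",
        "--ctx-size", "16384",
        "--n-gpu-layers", "99",
        "--seed", "3407",
        "--prio", "2",
        "--temp", "0.6",
        "--repeat-penalty", "1.1",
        "--dry-multiplier", "0.5",
        "--min-p", "0.01",
        "--top-k", "40",
        "--top-p", "0.95",
        "-no-cnv",
    ]
    if use_sampler_fix:
        args += ["--samplers", SAMPLER_ORDER]
    return args + ["--prompt", prompt]

args = build_llama_cli_args(
    "unsloth-QwQ-32B-GGUF/QwQ-32B-Q4_K_M.gguf",
    "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n<think>\n",
)
print(shlex.join(args))  # shell-quoted form, for inspection
```

Passing the list to subprocess.run(args) would launch the run; passing a list rather than a single string avoids shell quoting problems with the long prompt.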

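The full input, taken from our 1.58bit blog at https://unsloth.ai/blog/deepseekr1-dynamic, is:

<|im_start|>user
Create a Flappy Bird game in Python. You must include these things:
1. You must use pygame.
2. The background color should be randomly chosen and is a light shade. Start with a light blue color.
3. Pressing SPACE multiple times will accelerate the bird.
4. The bird's shape should be randomly chosen as a square, circle or triangle. The color should be randomly chosen as a dark color.
5. Place on the bottom some land colored as dark brown or yellow chosen randomly.
6. Make a score shown on the top right side. Increment if you pass pipes and don't hit them.
7. Make randomly spaced pipes with enough space. Color them randomly as dark green or light brown or a dark gray shade.
8. When you lose, show the best score. Make the text inside the screen. Pressing q or Esc will quit the game. Restarting is pressing SPACE again.
The final game should be inside a markdown section in Python. Check your code for errors and fix them before the final markdown section.<|im_end|>
<|im_start|>assistant
<think>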

The beginning and the end of the final Python output, with the thinking parts removed:

import pygame
import random
import sys

pygame.init()
### Continued

class Bird:
    def __init__(self):
        ### Continued

def main():
    best_score = 0
    current_score = 0
    game_over = False
    pipes = []
    first_time = True  # Track the first game play

    # Initial setup
    background_color = (173, 216, 230)  # Light blue initially
    land_color = random.choice(land_colors)
    bird = Bird()

    while True:
        for event in pygame.event.get():
            ### Continued

        if not game_over:
            # Update bird and pipes
            bird.update()
            ### Continued

        # Drawing
        ### Continued
        pygame.display.flip()
        clock.tick(60)

if __name__ == "__main__":
    main()

The full final Python output (with the thinking parts removed):

import pygame
import random
import sys

pygame.init()
WIDTH, HEIGHT = 800, 600
screen = pygame.display.set_mode((WIDTH, HEIGHT))
pygame.display.set_caption("Flappy Bird Clone")
clock = pygame.time.Clock()

# Colors
pipe_colors = [(0, 100, 0), (210, 180, 140), (50, 50, 50)]
land_colors = [(139, 69, 19), (255, 255, 0)]

# Game constants
GRAVITY = 0.5
PIPE_SPEED = 5
BIRD_SIZE = 30
LAND_HEIGHT = 50
PIPE_WIDTH = 50
PIPE_GAP = 150

class Bird:
    def __init__(self):
        self.x = WIDTH // 2
        self.y = HEIGHT // 2
        self.velocity = 0
        self.shape = random.choice(['square', 'circle', 'triangle'])
        self.color = (random.randint(0, 100), random.randint(0, 100), random.randint(0, 100))
        self.rect = pygame.Rect(self.x - BIRD_SIZE//2, self.y - BIRD_SIZE//2, BIRD_SIZE, BIRD_SIZE)
    
    def update(self):
        self.velocity += GRAVITY
        self.y += self.velocity
        self.rect.y = self.y - BIRD_SIZE//2
        self.rect.x = self.x - BIRD_SIZE//2  # Keep x centered
    
    def draw(self):
        if self.shape == 'square':
            pygame.draw.rect(screen, self.color, self.rect)
        elif self.shape == 'circle':
            pygame.draw.circle(screen, self.color, (self.rect.centerx, self.rect.centery), BIRD_SIZE//2)
        elif self.shape == 'triangle':
            points = [
                (self.rect.centerx, self.rect.top),
                (self.rect.left, self.rect.bottom),
                (self.rect.right, self.rect.bottom)
            ]
            pygame.draw.polygon(screen, self.color, points)

def spawn_pipe():
    pipe_x = WIDTH
    top_height = random.randint(50, HEIGHT - PIPE_GAP - LAND_HEIGHT)
    rect_top = pygame.Rect(pipe_x, 0, PIPE_WIDTH, top_height)
    bottom_y = top_height + PIPE_GAP
    bottom_height = (HEIGHT - LAND_HEIGHT) - bottom_y
    rect_bottom = pygame.Rect(pipe_x, bottom_y, PIPE_WIDTH, bottom_height)
    color = random.choice(pipe_colors)
    return {
        'rect_top': rect_top,
        'rect_bottom': rect_bottom,
        'color': color,
        'scored': False
    }

def main():
    best_score = 0
    current_score = 0
    game_over = False
    pipes = []
    first_time = True  # Track the first game play

    # Initial setup
    background_color = (173, 216, 230)  # Light blue initially
    land_color = random.choice(land_colors)
    bird = Bird()

    while True:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                pygame.quit()
                sys.exit()
            if event.type == pygame.KEYDOWN:
                if event.key == pygame.K_ESCAPE or event.key == pygame.K_q:
                    pygame.quit()
                    sys.exit()
                if event.key == pygame.K_SPACE:
                    if game_over:
                        # Reset the game
                        bird = Bird()
                        pipes.clear()
                        current_score = 0
                        if first_time:
                            # First restart after the initial game over
                            background_color = (random.randint(200, 255), random.randint(200, 255), random.randint(200, 255))
                            first_time = False
                        else:
                            background_color = (random.randint(200, 255), random.randint(200, 255), random.randint(200, 255))
                        land_color = random.choice(land_colors)
                        game_over = False
                    else:
                        # Make the bird jump
                        bird.velocity = -15  # Initial upward velocity

        if not game_over:
            # Update bird and pipes
            bird.update()

            # Move pipes left
            remove_pipes = []
            for pipe in pipes:
                pipe['rect_top'].x -= PIPE_SPEED
                pipe['rect_bottom'].x -= PIPE_SPEED
                # Check if the bird has passed the pipe
                if not pipe['scored'] and bird.rect.x > pipe['rect_top'].right:
                    current_score += 1
                    pipe['scored'] = True
                # Check if the pipe is off-screen
                if pipe['rect_top'].right < 0:
                    remove_pipes.append(pipe)
            # Remove off-screen pipes
            for p in remove_pipes:
                pipes.remove(p)

            # Spawn a new pipe if needed
            if not pipes or pipes[-1]['rect_top'].x < WIDTH - 200:
                pipes.append(spawn_pipe())

            # Check collisions
            land_rect = pygame.Rect(0, HEIGHT - LAND_HEIGHT, WIDTH, LAND_HEIGHT)
            bird_rect = bird.rect
            # Check pipe collisions
            for pipe in pipes:
                if bird_rect.colliderect(pipe['rect_top']) or bird_rect.colliderect(pipe['rect_bottom']):
                    game_over = True
                    break
            # Check collisions with the ground and the top of the screen
            if bird_rect.bottom >= land_rect.top or bird_rect.top <= 0:
                game_over = True

            if game_over:
                if current_score > best_score:
                    best_score = current_score

        # Drawing
        screen.fill(background_color)
        # Draw pipes
        for pipe in pipes:
            pygame.draw.rect(screen, pipe['color'], pipe['rect_top'])
            pygame.draw.rect(screen, pipe['color'], pipe['rect_bottom'])
        # Draw land
        pygame.draw.rect(screen, land_color, (0, HEIGHT - LAND_HEIGHT, WIDTH, LAND_HEIGHT))
        # Draw bird
        bird.draw()
        # Draw score
        font = pygame.font.SysFont(None, 36)
        score_text = font.render(f'Score: {current_score}', True, (0, 0, 0))
        screen.blit(score_text, (WIDTH - 150, 10))
        # Game over screen
        if game_over:
            over_text = font.render('Game Over!', True, (255, 0, 0))
            best_text = font.render(f'Best: {best_score}', True, (255, 0, 0))
            restart_text = font.render('Press SPACE to restart', True, (255, 0, 0))
            screen.blit(over_text, (WIDTH//2 - 70, HEIGHT//2 - 30))
            screen.blit(best_text, (WIDTH//2 - 50, HEIGHT//2 + 10))
            screen.blit(restart_text, (WIDTH//2 - 100, HEIGHT//2 + 50))
        
        pygame.display.flip()
        clock.tick(60)

if __name__ == "__main__":
    main()

When we run it, we get a working game!

Now try the same without our fix! Remove --samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc". This saves the output to Q4_K_M_no_samplers.txt

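./llama.cpp/llama-cli \
    --model unsloth-QwQ-32B-GGUF/QwQ-32B-Q4_K_M.gguf \
    --threads 32 \
    --ctx-size 16384 \
    --n-gpu-layers 99 \
    --seed 3407 \
    --prio 2 \
    --temp 0.6 \
    --repeat-penalty 1.1 \
    --dry-multiplier 0.5 \
    --min-p 0.01 \
    --top-k 40 \
    --top-p 0.95 \
    -no-cnv \
    --prompt "<|im_start|>user\nCreate a Flappy Bird game in Python. You must include these things:\n1. You must use pygame.\n2. The background color should be randomly chosen and is a light shade. Start with a light blue color.\n3. Pressing SPACE multiple times will accelerate the bird.\n4. The bird's shape should be randomly chosen as a square, circle or triangle. The color should be randomly chosen as a dark color.\n5. Place on the bottom some land colored as dark brown or yellow chosen randomly.\n6. Make a score shown on the top right side. Increment if you pass pipes and don't hit them.\n7. Make randomly spaced pipes with enough space. Color them randomly as dark green or light brown or a dark gray shade.\n8. When you lose, show the best score. Make the text inside the screen. Pressing q or Esc will quit the game. Restarting is pressing SPACE again.\nThe final game should be inside a markdown section in Python. Check your code for errors and fix them before the final markdown section.<|im_end|>\n<|im_start|>assistant\n<think>\n" \
        2>&1 | tee Q4_K_M_no_samplers.txt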

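You will get some looping, but more problematically, incorrect Python syntax and many other issues. For example, the code below looks correct but is wrong! See line 39: pipes.clear() ### <<< NameError: name 'pipes' is not defined. Did you forget to import 'pipes'?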

import pygame
import random

pygame.init()

# Constants
WIDTH, HEIGHT = 800, 600
GROUND_HEIGHT = 20
GRAVITY = 0.7
PIPE_SPEED = -3
BIRD_SIZE = 45
MIN_GAP = 130
MAX_GAP = 200
PIPE_COLORS = [(0, 96, 0), (205, 133, 63), (89, 97, 107)]
DARK_BROWN = (94, 72, 4)
YELLOW = (252, 228, 6)

screen = pygame.display.set_mode((WIDTH, HEIGHT))
clock = pygame.time.Clock()

def random_light_color():
    return (
        random.randint(180, 230),
        random.randint(190, 300),
        random.randint(250, 255)
    )

def reset_game():
    global bird_x, bird_y
    global pipes, score
    global background_color, land_color
    global bird_shape, bird_color

    # Bird properties
    bird_x = WIDTH * 0.3
    bird_y = HEIGHT // 2
    bird_vel = -5  # Initial upward thrust

    pipes.clear() ### <<< NameError: name 'pipes' is not defined. Did you forget to import 'pipes'?

If you use --repeat-penalty 1.5, it gets even worse and more obvious, with syntax that is outright invalid.

import pygame
from random import randint  # For generating colors/shapes/positions randomly 
pygame.init()

# Constants:
WIDTH, HEIGHT =456 ,702   #
BACKGROUND_COLOR_LIGHTS=['lightskyblue']
GAP_SIZE=189           #

BIRD_RADIUS=3.  
PIPE_SPEED=- ( )    ? 
class Game():
def __init__(self):
        self.screen_size=( )

def reset_game_vars():
    global current_scor e
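   # Reset to zero and other initial states.

# Main game loop:
while running :
     for event in pygame.event.get() : 
        if quit ... etc

pygame.quit()
print("Code is simplified. Due to time constraints, a full working version requires further implementation.")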

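You might wonder whether Q4_K_M is to blame. BF16, that is full precision, should work fine, right? Incorrect: when using a Repetition Penalty, the outputs fail again if we do not use our fix of --samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc".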
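When comparing runs like these, a quick automated syntax check of the saved output can save time. The following is a rough sketch that assumes the output file contains a closing </think> tag followed by a final fenced python block; the helper names are hypothetical. It only catches syntax errors: a runtime problem such as the NameError on pipes shown earlier still parses cleanly and needs an actual run to surface.

```python
import ast

FENCE = "`" * 3  # built at runtime so this example stays fence-safe

def strip_thinking(text):
    """Drop everything up to and including the last </think> tag, if present."""
    marker = "</think>"
    return text.rsplit(marker, 1)[-1] if marker in text else text

def last_python_block(text):
    """Return the body of the last fenced python block, or None if there is none."""
    opener = FENCE + "python"
    start = text.rfind(opener)
    if start == -1:
        return None
    body_start = start + len(opener)
    end = text.find(FENCE, body_start)
    return text[body_start:end if end != -1 else len(text)].strip("\n")

def syntax_error_of(code):
    """Return None when the code parses, otherwise a short error description."""
    try:
        ast.parse(code)
        return None
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"
```

A typical use would be reading Q4_K_M_no_samplers.txt, passing its text through strip_thinking and last_python_block, and printing the result of syntax_error_of.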

🌄 Still doesn't work? Try Min_p = 0.1, Temperature = 1.5

According to the Min_p paper https://arxiv.org/pdf/2407.01082, for more creative and diverse outputs, and if you still see repetitions, try disabling top_p and top_k!

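./llama.cpp/llama-cli --model unsloth-QwQ-32B-GGUF/QwQ-32B-Q4_K_M.gguf \
    --threads 32 --n-gpu-layers 99 \
    --ctx-size 16384 \
    --temp 1.5 \
    --min-p 0.1 \
    --top-k 0 \
    --top-p 1.0 \
    -no-cnv \
    --prompt "<|im_start|>user\nCreate a Flappy Bird game in Python. You must include these things:\n1. You must use pygame.\n2. The background color should be randomly chosen and is a light shade. Start with a light blue color.\n3. Pressing SPACE multiple times will accelerate the bird.\n4. The bird's shape should be randomly chosen as a square, circle or triangle. The color should be randomly chosen as a dark color.\n5. Place on the bottom some land colored as dark brown or yellow chosen randomly.\n6. Make a score shown on the top right side. Increment if you pass pipes and don't hit them.\n7. Make randomly spaced pipes with enough space. Color them randomly as dark green or light brown or a dark gray shade.\n8. When you lose, show the best score. Make the text inside the screen. Pressing q or Esc will quit the game. Restarting is pressing SPACE again.\nThe final game should be inside a markdown section in Python. Check your code for errors and fix them before the final markdown section.<|im_end|>\n<|im_start|>assistant\n<think>\n"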

Another approach is to disable min_p directly, since llama.cpp uses min_p = 0.1 by default!

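./llama.cpp/llama-cli --model unsloth-QwQ-32B-GGUF/QwQ-32B-Q4_K_M.gguf \
    --threads 32 --n-gpu-layers 99 \
    --ctx-size 16384 \
    --temp 0.6 \
    --min-p 0.0 \
    --top-k 40 \
    --top-p 0.95 \
    -no-cnv \
    --prompt "<|im_start|>user\nCreate a Flappy Bird game in Python. You must include these things:\n1. You must use pygame.\n2. The background color should be randomly chosen and is a light shade. Start with a light blue color.\n3. Pressing SPACE multiple times will accelerate the bird.\n4. The bird's shape should be randomly chosen as a square, circle or triangle. The color should be randomly chosen as a dark color.\n5. Place on the bottom some land colored as dark brown or yellow chosen randomly.\n6. Make a score shown on the top right side. Increment if you pass pipes and don't hit them.\n7. Make randomly spaced pipes with enough space. Color them randomly as dark green or light brown or a dark gray shade.\n8. When you lose, show the best score. Make the text inside the screen. Pressing q or Esc will quit the game. Restarting is pressing SPACE again.\nThe final game should be inside a markdown section in Python. Check your code for errors and fix them before the final markdown section.<|im_end|>\n<|im_start|>assistant\n<think>\n"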

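🤔 <think> token not shown?

Some people report that because <think> is added by default in the chat template, some systems do not output the thinking traces correctly. You will have to manually edit the Jinja template and remove the <think>\n at the end of the generation prompt. The model then has to add <think>\n itself during inference, which might not always succeed. DeepSeek also edited all of its models to add a <think> token by default, to force the model into reasoning mode.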

So change {%- if add_generation_prompt %} {{- '<|im_start|>assistant\n<think>\n' }} {%- endif %} to {%- if add_generation_prompt %} {{- '<|im_start|>assistant\n' }} {%- endif %}

That is, remove <think>\n
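The same edit can be applied programmatically to a template held as a string. This sketch assumes the generation prompt appears in the template exactly as written above; if the spacing differs in your copy, the function raises instead of silently doing nothing. remove_forced_think is a hypothetical helper name.

```python
# The Jinja source contains a literal backslash followed by n, hence the doubled backslashes.
OLD_GENERATION_PROMPT = "{{- '<|im_start|>assistant\\n<think>\\n' }}"
NEW_GENERATION_PROMPT = "{{- '<|im_start|>assistant\\n' }}"

def remove_forced_think(chat_template):
    """Return the template with the trailing <think> removed from the generation prompt."""
    if OLD_GENERATION_PROMPT not in chat_template:
        raise ValueError("generation prompt with <think> not found; the template may differ")
    return chat_template.replace(OLD_GENERATION_PROMPT, NEW_GENERATION_PROMPT)

snippet = "{%- if add_generation_prompt %} " + OLD_GENERATION_PROMPT + " {%- endif %}"
print(remove_forced_think(snippet))
```

With Transformers, the template is available as the string attribute tokenizer.chat_template, so one option would be to pass that string through this function and assign the result back.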

Full Jinja template with the <think>\n part removed

Extra Notes

We first thought that maybe:

QwQ's context length was not natively 128K, but rather 32K extended with YaRN. For example, in the README for https://huggingface.co/Qwen/QwQ-32B we see:

{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}

We tried overriding llama.cpp's YaRN handling, but nothing changed.

--override-kv qwen2.context_length=int:131072 \
--override-kv qwen2.rope.scaling.type=str:yarn \
--override-kv qwen2.rope.scaling.factor=float:4 \
--override-kv qwen2.rope.scaling.original_context_length=int:32768 \
--override-kv qwen2.rope.scaling.attn_factor=float:1.13862943649292 \
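The numbers in these overrides follow from the rope_scaling config by simple arithmetic. The sketch below assumes the attention factor uses the rule of thumb from the YaRN paper, 0.1 * ln(factor) + 1, which reproduces the value above up to float rounding.

```python
import math

rope_scaling = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}

# Extended context = original context length times the scaling factor.
extended_context = int(
    rope_scaling["original_max_position_embeddings"] * rope_scaling["factor"]
)

# Attention scaling factor, assuming the YaRN rule of thumb 0.1 * ln(s) + 1.
attn_factor = 0.1 * math.log(rope_scaling["factor"]) + 1.0

print(extended_context)  # 131072
print(attn_factor)
```

This matches qwen2.context_length=int:131072 and, to about nine decimal places, the attn_factor passed in the override.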

We also thought the RMS Layernorm epsilon might be wrong: not 1e-5 but maybe 1e-6. For example, one config has rms_norm_eps=1e-06, while another has rms_norm_eps=1e-05. We overrode it as well, but that did not work either:

--override-kv qwen2.attention.layer_norm_rms_epsilon=float:0.000001 \

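We also tested whether the tokenizer IDs match between llama.cpp and normal Transformers, courtesy of @kalomaze. They matched, so this was not the culprit.

We provide our experimental results below:

✏️ Tokenizer Bug Fixes

We also found a few issues that specifically affect finetuning! The EOS token is correct, but the PAD token should probably be "<|vision_pad|>" instead. We have updated it at: https://huggingface.co/unsloth/QwQ-32B/blob/main/tokenizer_config.json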

"eos_token": "<|im_end|>",
"pad_token": "<|endoftext|>",

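As a small illustration of the change described in this section, the pad token can be swapped in a config dictionary without touching the EOS token. This sketch treats the two entries shown above as the starting values, which is an assumption, and with_pad_token is a hypothetical helper rather than a Transformers API.

```python
import json

def with_pad_token(config, pad_token="<|vision_pad|>"):
    """Return a copy of a tokenizer config dict with the pad token replaced."""
    updated = dict(config)
    updated["pad_token"] = pad_token
    return updated

# Starting values, assumed to be the two entries quoted above.
original = json.loads('{"eos_token": "<|im_end|>", "pad_token": "<|endoftext|>"}')
patched = with_pad_token(original)
print(json.dumps(patched, indent=2))
```

The original dictionary is left untouched, so the two versions can be compared side by side.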
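🛠️ Dynamic 4-bit Quants

We also uploaded dynamic 4-bit quants, which increase accuracy compared with naive 4-bit quantization! We attach the QwQ quantization error analysis plot, covering both activation and weight quantization errors:

We uploaded the dynamic 4-bit quants to: https://huggingface.co/unsloth/QwQ-32B-unsloth-bnb-4bit

Since vLLM 0.7.3 (February 20, 2025), https://github.com/vllm-project/vllm/releases/tag/v0.7.3, vLLM supports loading Unsloth dynamic 4-bit quants!

All our GGUFs are at https://huggingface.co/unsloth/QwQ-32B-GGUF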
