AMD AI Reinforcement Learning Hackathon with Unsloth

Learn hands-on techniques for applying reinforcement learning to AI models with Unsloth from Daniel Han, the creator of Unsloth.

You can find Unsloth's GitHub repository here: https://github.com/unslothai/unsloth

Here is the link to our AMD fine-tuning notebook:

https://github.com/unslothai/notebooks/blob/main/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_BF16.ipynb

wget 'https://raw.githubusercontent.com/unslothai/notebooks/refs/heads/main/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_BF16.ipynb'

To upgrade Unsloth / Unsloth Zoo:

uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.0 --upgrade --force-reinstall

pip uninstall unsloth unsloth_zoo -y && \
    pip install git+https://github.com/unslothai/unsloth-zoo git+https://github.com/unslothai/unsloth --no-deps --force-reinstall --no-cache-dir
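After a forced reinstall like the one above, it can help to confirm what actually ended up in the environment. The following is a minimal sketch, not part of the Unsloth tooling: the helper names `installed_versions` and `rocm_status` are hypothetical, and the sketch assumes that a ROCm build of PyTorch exposes a non-empty `torch.version.hip` and reports the GPU through `torch.cuda.is_available()`.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: installed version, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def rocm_status():
    """Return (hip_version, gpu_visible).

    Gives (None, False) when torch is missing or is not a ROCm build.
    """
    try:
        import torch
    except ImportError:
        return None, False
    # Assumption: ROCm builds set torch.version.hip; other builds leave it as None.
    hip = getattr(torch.version, "hip", None)
    return hip, bool(hip) and torch.cuda.is_available()


if __name__ == "__main__":
    print(installed_versions(["torch", "unsloth", "unsloth_zoo", "bitsandbytes"]))
    print(rocm_status())
```

If `rocm_status()` reports no HIP version, the torch wheel was probably not pulled from the ROCm index, and rerunning the torch install command above is a reasonable first step.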

For bitsandbytes:

pip install "unsloth[amd] @ git+https://github.com/unslothai/unsloth"

If you see:

error: Failed to install: bitsandbytes-1.33.7rc0-py3-none-manylinux_2_24_x86_64.whl (bitsandbytes==1.33.7rc0 (from https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl))
  Caused by: Wheel version does not match filename (0.49.2.dev0 != 1.33.7rc0), which indicates a malformed wheel. If this is intentional, set `UV_SKIP_WHEEL_FILENAME_CHECK=1`.

Do not set UV_SKIP_WHEEL_FILENAME_CHECK. Instead, install with pip only (not uv), because uv breaks the bitsandbytes install:

pip install "unsloth[amd] @ git+https://github.com/unslothai/unsloth"

If possible, consider adding a check in a PR to catch issues like this.
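To make the error message easier to read, here is a simplified sketch of the kind of comparison it describes: the version encoded in the wheel's filename versus the version recorded in the wheel's metadata. This is an illustration only, not uv's actual implementation. The function names are hypothetical, and a real check would normalize versions before comparing them rather than comparing raw strings.

```python
def wheel_filename_version(filename):
    """Extract the version field from a wheel filename.

    Wheel filenames follow the layout
    {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    so the version is the second dash-separated field.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    return parts[1]


def check_wheel(filename, metadata_version):
    """Return None if the versions agree, else a message describing the mismatch."""
    name_version = wheel_filename_version(filename)
    if name_version == metadata_version:
        return None
    return f"Wheel version does not match filename ({metadata_version} != {name_version})"


# Values taken from the error message quoted above.
print(check_wheel(
    "bitsandbytes-1.33.7rc0-py3-none-manylinux_2_24_x86_64.whl",
    "0.49.2.dev0",
))
```

In the error above, the filename claims version 1.33.7rc0 while the metadata inside the wheel says 0.49.2.dev0, which is why the install is rejected.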

For AMD installation instructions, see our guide:

AMD
