
# Unsloth Benchmarks

* For more detailed benchmarks, read our [Llama 3.3 blog](https://unsloth.ai/blog/llama3-3).
* Unsloth has also been benchmarked independently by [🤗Hugging Face](https://huggingface.co/blog/unsloth-trl).

{% hint style="warning" %}
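If your speeds look slower at first, it is most likely because `torch.compile` typically needs about 5 minutes (or longer) to warm up and finish compiling. Make sure you measure throughput **after** it has fully loaded, because Unsloth should be much faster over longer runs.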
{% endhint %}
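
One way to follow the advice above is to discard a number of warm-up steps before timing anything. The sketch below is a minimal illustration, not part of Unsloth: `measure_throughput`, `step_fn`, `warmup_steps`, and `measured_steps` are hypothetical names, and `step_fn` is assumed to run one training step, block until it has finished, and return the number of tokens it processed.

```python
import time
from typing import Callable


def measure_throughput(
    step_fn: Callable[[], int],
    warmup_steps: int,
    measured_steps: int,
) -> float:
    """Return tokens per second, ignoring the first `warmup_steps` steps."""
    if measured_steps <= 0:
        raise ValueError("measured_steps must be positive")

    # Warm-up: compilation and caching costs land here and are not timed.
    for _ in range(warmup_steps):
        step_fn()

    # Timed region: only steady-state steps contribute to the result.
    tokens = 0
    start = time.perf_counter()
    for _ in range(measured_steps):
        tokens += step_fn()
    elapsed = time.perf_counter() - start
    return tokens / elapsed
```

With a real GPU training step, `step_fn` would likely need to synchronize the device before returning so that the timer does not stop while work is still queued.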

Tested on H100 and [Blackwell](/docs/zh/bo-ke/fine-tuning-llms-with-blackwell-rtx-50-series-and-unsloth.md) GPUs. We used the Alpaca dataset with a batch size of 2, 4 gradient accumulation steps, and rank = 32, and applied QLoRA to all linear layers (q, k, v, o, gate, up, down):

<table data-full-width="false"><thead><tr><th>Model</th><th>VRAM</th><th>🦥Unsloth speed</th><th>🦥VRAM reduction</th><th>🦥Longer context</th><th>😊Hugging Face + FA2</th></tr></thead><tbody><tr><td>Llama 3.3 (70B)</td><td>80GB</td><td>2x</td><td>>75%</td><td>13x longer</td><td>1x</td></tr><tr><td>Llama 3.1 (8B)</td><td>80GB</td><td>2x</td><td>>70%</td><td>12x longer</td><td>1x</td></tr></tbody></table>
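
To make the benchmark settings easier to reproduce, they can be collected in one place. The sketch below is plain Python rather than an Unsloth API: `BenchmarkConfig` and its field names are hypothetical, the `*_proj` module names are the ones Llama models use in Hugging Face Transformers (an assumption about how the q, k, v, o, gate, up, and down layers would be addressed), and the effective batch size assumes a single GPU.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BenchmarkConfig:
    """Settings stated for the speed benchmark (field names are illustrative)."""

    dataset: str = "Alpaca"
    per_device_batch_size: int = 2
    gradient_accumulation_steps: int = 4
    lora_rank: int = 32
    load_in_4bit: bool = True  # QLoRA trains adapters on a 4-bit base model
    # Assumed module names for the q, k, v, o, gate, up, and down layers.
    target_modules: Tuple[str, ...] = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    )

    @property
    def effective_batch_size(self) -> int:
        # Sequences contributing to each optimizer step on a single GPU.
        return self.per_device_batch_size * self.gradient_accumulation_steps


config = BenchmarkConfig()
print(config.effective_batch_size)
```

With a batch size of 2 and 4 accumulation steps, each optimizer step sees 2 × 4 = 8 sequences.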

## Context length benchmarks

{% hint style="info" %}
The more data you have, the less VRAM Unsloth uses, thanks to our [gradient checkpointing](https://unsloth.ai/blog/long-context) algorithm combined with Apple's CCE algorithm!
{% endhint %}

### **Llama 3.1 (8B) max. context length**

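We tested Llama 3.1 (8B) Instruct with 4-bit QLoRA on all linear layers (Q, K, V, O, gate, up and down), rank = 32, and a batch size of 1. We padded all sequences to a fixed maximum sequence length to mimic long-context fine-tuning workloads.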

| GPU VRAM | 🦥Unsloth context length | Hugging Face + FA2 |
| -------- | ------------------------ | ------------------ |
| 8 GB     | 2,972                    | OOM                |
| 12 GB  | 21,848          | 932                |
| 16 GB  | 40,724          | 2,551              |
| 24 GB  | 78,475          | 5,789              |
| 40 GB  | 153,977         | 12,264             |
| 48 GB  | 191,728         | 15,502             |
| 80 GB  | 342,733         | 28,454             |
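
The Unsloth column above grows almost linearly with VRAM. The short check below computes the gain in context length per additional GB between consecutive rows; it uses only the numbers in the table, and `ROWS` and `tokens_per_gb` are names introduced here for illustration.

```python
# (VRAM in GB, Unsloth max context length) copied from the table above.
ROWS = [
    (8, 2_972),
    (12, 21_848),
    (16, 40_724),
    (24, 78_475),
    (40, 153_977),
    (48, 191_728),
    (80, 342_733),
]


def tokens_per_gb(rows):
    """Extra context tokens per extra GB between consecutive rows."""
    return [
        (ctx_hi - ctx_lo) / (gb_hi - gb_lo)
        for (gb_lo, ctx_lo), (gb_hi, ctx_hi) in zip(rows, rows[1:])
    ]


slopes = tokens_per_gb(ROWS)
for (gb, _), slope in zip(ROWS[1:], slopes):
    print(f"up to {gb} GB: {slope:.1f} tokens per GB")
```

Every interval comes out at roughly 4,719 tokens per additional GB. Treat any interpolation to other VRAM sizes as an estimate, since the table only reports these seven points.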

### **Llama 3.3 (70B) max. context length**

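We tested Llama 3.3 (70B) Instruct on a single 80GB A100 with 4-bit QLoRA on all linear layers (Q, K, V, O, gate, up and down), rank = 32, and a batch size of 1. We padded all sequences to a fixed maximum sequence length to mimic long-context fine-tuning workloads.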

| GPU VRAM | 🦥Unsloth context length | Hugging Face + FA2 |
| -------- | ------------------------ | ------------------ |
| 48 GB    | 12,106                   | OOM                |
| 80 GB  | 89,389          | 6,916              |
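
The "longer context" multipliers in the summary table at the top of this page can be cross-checked against the 80 GB rows of the two context length tables. The sketch below does only that arithmetic; `MAX_CONTEXT_80GB` and `context_ratio` are names introduced here for illustration.

```python
# Max context lengths at 80 GB, copied from the two tables above.
MAX_CONTEXT_80GB = {
    "Llama 3.1 (8B)": {"unsloth": 342_733, "hf_fa2": 28_454},
    "Llama 3.3 (70B)": {"unsloth": 89_389, "hf_fa2": 6_916},
}


def context_ratio(unsloth: int, hf_fa2: int) -> float:
    """How many times longer the Unsloth context is than the baseline."""
    return unsloth / hf_fa2


ratios = {
    model: context_ratio(**row) for model, row in MAX_CONTEXT_80GB.items()
}
for model, ratio in ratios.items():
    print(f"{model}: {ratio:.2f}x longer context")
```

The ratios round to about 12x for Llama 3.1 (8B) and about 13x for Llama 3.3 (70B), which matches the summary table.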

