# FP16 vs BF16 for RL

### Float16 vs Bfloat16

A paper titled "**Defeating the Training-Inference Mismatch via FP16**" shows that using Float16 precision for reinforcement learning can be significantly better than using Bfloat16.

In fact, with Bfloat16 the mismatch gets progressively worse the longer the generation is.
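A toy illustration of why length can matter, not a reproduction of the paper's experiment: Float16 keeps 10 mantissa bits while Bfloat16 keeps 7, so a value such as a per-token log-probability is rounded more coarsely in Bfloat16, and the summed rounding error grows with sequence length. The helpers `to_fp16`, `to_bf16`, and `accumulated_error`, as well as the synthetic log-probabilities, are assumptions made for this sketch. It uses only the Python standard library.

```python
import random
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE float16 value (10 mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (7 mantissa bits).

    bfloat16 is the top 16 bits of a float32, so we round the float32
    bit pattern to nearest-even and clear the low 16 bits.
    Only intended for ordinary finite values (no inf/NaN handling).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def accumulated_error(logprobs, cast):
    """Sum of absolute per-token rounding errors for one sequence."""
    return sum(abs(cast(lp) - lp) for lp in logprobs)

random.seed(0)
# Synthetic per-token log-probabilities in [-8, -0.01] (hypothetical data).
tokens = [-random.uniform(0.01, 8.0) for _ in range(4096)]

for length in (256, 1024, 4096):
    seq = tokens[:length]
    fp16_err = accumulated_error(seq, to_fp16)
    bf16_err = accumulated_error(seq, to_bf16)
    print(f"length={length:5d}  fp16={fp16_err:.4f}  bf16={bf16_err:.4f}")
```

In this simplified model the Bfloat16 column is larger at every length and both columns grow as more tokens are added. Real training-inference mismatch also depends on kernels and hardware, which this sketch does not capture.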

We ran our own investigation and **found Float16 to be more stable** than Bfloat16, with much smaller gradient norms.

{% columns %}
{% column width="50%" %}

{% endcolumn %} {% column width="50%" %}

{% endcolumn %}
{% endcolumns %}

### :exploding\_head:A100 Cascade Attention Bug

Reportedly, older vLLM versions (before 0.11.0) had broken attention mechanisms on A100 and similar GPUs. Please update vLLM! We also disable Cascade Attention in vLLM by default during Unsloth reinforcement learning if we detect an older vLLM version.
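If you manage vLLM yourself, a version check along these lines could mirror that behavior. This is a minimal sketch, not Unsloth's actual implementation: `parse_version`, `needs_cascade_workaround`, and `vllm_extra_kwargs` are hypothetical helpers, and `disable_cascade_attn` is assumed to be the vLLM engine argument that turns Cascade Attention off, so check it against the vLLM version you have installed.

```python
from importlib import metadata

# From the text above: versions before 0.11.0 are affected.
FIXED_VERSION = (0, 11, 0)

def parse_version(version: str) -> tuple:
    """Turn '0.10.2' or '0.11.0rc1+cu128' into a tuple of three ints.

    Pre-release and local suffixes are ignored, so '0.11.0rc1' is
    treated the same as '0.11.0'.
    """
    parts = []
    for piece in version.split("+")[0].split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)

def needs_cascade_workaround(version: str) -> bool:
    """True if this vLLM version predates the fixed release."""
    return parse_version(version) < FIXED_VERSION

def vllm_extra_kwargs() -> dict:
    """Extra engine kwargs to pass to vLLM, or {} if none are needed."""
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return {}  # vLLM is not installed, nothing to work around
    if needs_cascade_workaround(installed):
        # Assumption: `disable_cascade_attn` is the relevant engine argument.
        return {"disable_cascade_attn": True}
    return {}

print(needs_cascade_workaround("0.10.2"), needs_cascade_workaround("0.11.0"))
```

Upgrading vLLM remains the better fix. The workaround only matters when you are pinned to an older release.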

Hardware also changes the results: newer and more expensive GPUs show a smaller KL difference between the inference side and the training side.
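One common way to quantify such a gap is to compare the log-probabilities that the inference engine and the trainer assign to the same sampled tokens. The sketch below is an assumption about how this could be measured, not necessarily how the numbers in this page were produced. `kl_estimates` is a hypothetical helper and the example values are made up.

```python
import math

def kl_estimates(logp_infer, logp_train):
    """Estimate KL(inference || training) from tokens sampled by the inference engine.

    Returns (k1, k3):
      k1 is the plain mean of logp_infer - logp_train (unbiased, can be negative);
      k3 is the mean of exp(d) - 1 - d with d = logp_train - logp_infer
         (lower variance, never negative).
    """
    if len(logp_infer) != len(logp_train) or not logp_infer:
        raise ValueError("need two non-empty, equally long lists")
    n = len(logp_infer)
    k1 = sum(a - b for a, b in zip(logp_infer, logp_train)) / n
    k3 = sum(math.exp(b - a) - 1.0 - (b - a)
             for a, b in zip(logp_infer, logp_train)) / n
    return k1, k3

# Hypothetical per-token log-probabilities for the same four sampled tokens.
infer = [-1.00, -2.00, -0.50, -3.00]
train = [-1.10, -2.05, -0.50, -2.90]
k1, k3 = kl_estimates(infer, train)
print(f"k1={k1:.5f}  k3={k3:.5f}")
```

When both sides agree exactly, both estimates are zero. The larger the numerical mismatch between inference and training, the larger `k3` becomes.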

### :fire:Using Float16 in Unsloth RL

To use Float16 precision in Unsloth GRPO and RL, you only need to set `dtype = torch.float16`, and we take care of the rest!

{% code overflow="wrap" %}
```python
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Can be increased for longer reasoning traces
lora_rank = 32  # Larger rank = smarter, but slower

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-4B-Base",
    max_seq_length = max_seq_length,
    load_in_4bit = False,  # False for LoRA 16bit
    fast_inference = True,  # Enable vLLM fast inference
    max_lora_rank = lora_rank,
    gpu_memory_utilization = 0.9,  # Reduce if out of memory
    dtype = torch.float16,  # Use torch.float16, torch.bfloat16
)
```
{% endcode %}