# FP16 vs BF16 for RL

### Float16 vs Bfloat16

The paper "**Defeating the Training-Inference Mismatch via FP16**" shows that float16 precision can work dramatically better than bfloat16 for reinforcement learning.

In fact, the longer the generation, the worse the results get with bfloat16.
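One way to build intuition for the length effect: float16 stores 10 mantissa bits while bfloat16 stores only 7, so every bfloat16 value carries a coarser rounding error, and a longer generation has more tokens for those errors to pile up on. The sketch below is only an illustration using the Python standard library, not the paper's experiment. The helper names and the synthetic log-probabilities are hypothetical.

```python
import struct


def round_to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE float16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_bfloat16(x: float) -> float:
    """Round a Python float to bfloat16 (the top 16 bits of a float32),
    using round-to-nearest-even on the discarded bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def total_abs_rounding_error(values, round_fn) -> float:
    """Sum of |round_fn(v) - v| over all values."""
    return sum(abs(round_fn(v) - v) for v in values)


def synthetic_logprobs(num_tokens: int):
    """Made-up per-token log-probabilities, for illustration only."""
    return [-(0.05 + 0.013 * i) for i in range(num_tokens)]


if __name__ == "__main__":
    # float16 can represent 1 + 2**-10, bfloat16 rounds it back to 1.0
    print(round_to_float16(1 + 2**-10), round_to_bfloat16(1 + 2**-10))
    for num_tokens in (64, 256, 1024):
        values = synthetic_logprobs(num_tokens)
        fp16_err = total_abs_rounding_error(values, round_to_float16)
        bf16_err = total_abs_rounding_error(values, round_to_bfloat16)
        print(f"{num_tokens:5d} tokens  fp16={fp16_err:.5f}  bf16={bf16_err:.5f}")
```

The trade-off is range: float16 has fewer exponent bits than bfloat16, so it overflows at much smaller magnitudes.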

We investigated this ourselves and **DO find float16 to be more stable** than bfloat16, with much smaller gradient norms.

{% columns %}
{% column width="50%" %}

{% endcolumn %}

{% column width="50%" %}

{% endcolumn %}
{% endcolumns %}

### :exploding\_head: A100 Cascade Attention Bug

Older vLLM versions (before 0.11.0) had broken attention mechanisms on A100 and similar GPUs. Please update vLLM! If we detect an older vLLM version, we also disable cascade attention in vLLM by default during Unsloth reinforcement learning.
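If you want to check your own environment, a version test along these lines is enough to tell whether you are below the 0.11.0 threshold mentioned above. This is a minimal sketch, not Unsloth's actual detection logic. The function names are hypothetical, and pre-release suffixes such as `rc1` are ignored for simplicity.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# From the text above: vLLM versions before 0.11.0 are affected.
FIXED_VLLM_VERSION = (0, 11, 0)


def parse_release(version_string: str) -> tuple:
    """Extract the leading numeric release, e.g. '0.10.2+cu128' -> (0, 10, 2)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_affected_vllm(version_string: str) -> bool:
    """True if this vLLM version predates the fixed release."""
    return parse_release(version_string) < FIXED_VLLM_VERSION


if __name__ == "__main__":
    try:
        installed = version("vllm")
    except PackageNotFoundError:
        print("vLLM is not installed in this environment.")
    else:
        if is_affected_vllm(installed):
            print(f"vLLM {installed} predates 0.11.0, please update.")
        else:
            print(f"vLLM {installed} is 0.11.0 or newer.")
```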

Different hardware also changes the results: newer and more expensive GPUs show a smaller KL difference between the inference and training sides.
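To measure this kind of mismatch yourself, you can compare the log-probabilities that the inference engine and the trainer assign to the same sampled tokens. The sketch below uses two common sample-based KL estimators (the plain mean log-ratio, and the non-negative `exp(r) - 1 - r` form). It assumes you already have matching per-token log-probability lists from both sides. The function name is hypothetical, and this is not necessarily how the KL numbers on this page were computed.

```python
import math


def kl_estimates(infer_logprobs, train_logprobs):
    """Estimate KL(inference || training) from tokens sampled by the inference side.

    Both inputs hold the log-probability of the *same* sampled tokens,
    one list from the inference engine and one from the training forward pass.
    Returns (k1, k3): the mean log-ratio estimator and the non-negative estimator.
    """
    if len(infer_logprobs) != len(train_logprobs):
        raise ValueError("Both sides must score the same tokens.")
    if not infer_logprobs:
        raise ValueError("Need at least one token.")

    # r = log p_train(token) - log p_infer(token)
    log_ratios = [t - i for i, t in zip(infer_logprobs, train_logprobs)]
    n = len(log_ratios)
    k1 = -sum(log_ratios) / n
    k3 = sum(math.exp(r) - 1.0 - r for r in log_ratios) / n
    return k1, k3


if __name__ == "__main__":
    # Made-up numbers: the trainer disagrees slightly on each token.
    infer = [-1.0, -2.0, -0.5]
    train = [-1.1, -1.9, -0.5]
    k1, k3 = kl_estimates(infer, train)
    print(f"k1={k1:.6f}  k3={k3:.6f}")
```

The second estimator is never negative for any sample, which makes it easier to read when plotting the mismatch over training steps.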

### :fire: Using float16 in Unsloth RL

To use float16 precision in Unsloth GRPO and RL, you just need to set `dtype = torch.float16` and we'll take care of the rest!

{% code overflow="wrap" %}
```python
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Can increase for longer reasoning traces
lora_rank = 32  # Larger rank = smarter, but slower

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-4B-Base",
    max_seq_length = max_seq_length,
    load_in_4bit = False,  # False for LoRA 16bit
    fast_inference = True,  # Enable vLLM fast inference
    max_lora_rank = lora_rank,
    gpu_memory_utilization = 0.9,  # Reduce if out of memory
    dtype = torch.float16,  # Use torch.float16 or torch.bfloat16
)
```
{% endcode %}