⁉️FP16 vs BF16 for RL
Defeating the Training-Inference Mismatch via FP16 (https://arxiv.org/pdf/2510.26788) shows that training in float16 rather than bfloat16 greatly reduces the numerical mismatch between the training forward pass and the inference engine used for rollouts, which makes RL fine-tuning more stable.
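A quick way to see why the dtype matters: bfloat16 keeps float32's exponent range but has only 7 explicit mantissa bits versus 10 for float16, so bf16 rounds roughly 8x more coarsely in the typical activation range, and the training and rollout engines drift further apart. The snippet below is a standalone illustration in plain PyTorch (our own demo, not code from the paper):

import torch

# Values in a typical activation range, well inside both formats' exponent range.
x = torch.rand(100_000) + 0.5

# Mean absolute rounding error after a single cast back to float32.
err_fp16 = (x - x.to(torch.float16).float()).abs().mean().item()
err_bf16 = (x - x.to(torch.bfloat16).float()).abs().mean().item()

print(f"fp16 rounding error: {err_fp16:.2e}")
print(f"bf16 rounding error: {err_bf16:.2e}")  # roughly 8x larger (7 vs 10 mantissa bits)

To use FP16 with Unsloth, set dtype = torch.float16 when loading the model: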
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Can increase for longer reasoning traces
lora_rank = 32 # Larger rank = smarter, but slower
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-4B-Base",
    max_seq_length = max_seq_length,
    load_in_4bit = False, # False for LoRA 16bit
    fast_inference = True, # Enable vLLM fast inference
    max_lora_rank = lora_rank,
    gpu_memory_utilization = 0.9, # Reduce if out of memory
    dtype = torch.float16, # torch.float16 as recommended; torch.bfloat16 to compare
)
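The dtype choice should also carry into the RL trainer so that the optimizer and the vLLM rollouts stay in the same precision. Below is a minimal sketch assuming TRL's GRPOConfig (as used in Unsloth's RL notebooks); the hyperparameter values here are placeholders, only the fp16/bf16 flags are the point:

from trl import GRPOConfig

training_args = GRPOConfig(
    learning_rate = 5e-6,
    per_device_train_batch_size = 8,
    num_generations = 8,                          # completions sampled per prompt
    max_prompt_length = 256,
    max_completion_length = max_seq_length - 256,
    fp16 = True,                                  # match dtype = torch.float16 above
    bf16 = False,                                 # keep training and rollouts in the same dtype
    max_steps = 100,
    output_dir = "outputs",
)

Pass this config to GRPOTrainer together with the model loaded above so both sides of the training-inference loop run in float16.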