👁️🗨️ Vision Reinforcement Learning (VLM RL)
Train Vision/multimodal models via GRPO and RL with Unsloth!
import os
os.environ['UNSLOTH_VLLM_STANDBY'] = '1' # Enable memory efficient GRPO with vLLM; set before importing Unsloth

from unsloth import FastVisionModel

lora_rank = 16 # LoRA rank used below; suggested 8, 16, 32, 64, 128

model, tokenizer = FastVisionModel.from_pretrained(
    model_name = "Qwen/Qwen2.5-VL-7B-Instruct",
    max_seq_length = 16384, # Must be this large to fit images in the context
    load_in_4bit = True, # False for LoRA 16bit
    fast_inference = True, # Enable vLLM fast inference
    gpu_memory_utilization = 0.8, # Reduce if out of memory
)

# Add a LoRA adapter to the model for parameter-efficient fine-tuning
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers = False, # fast_inference doesn't support finetune_vision_layers yet :(
    finetune_language_layers = True, # False if not finetuning language layers
    finetune_attention_modules = True, # False if not finetuning attention layers
    finetune_mlp_modules = True, # False if not finetuning MLP layers
    r = lora_rank, # Choose any number > 0! Suggested 8, 16, 32, 64, 128
    lora_alpha = lora_rank*2, # *2 speeds up training
    use_gradient_checkpointing = "unsloth", # Reduces memory usage
    random_state = 3407,
)
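With the model and LoRA adapter in place, the GRPO loop itself runs through TRL's GRPOTrainer. The sketch below is a minimal, illustrative setup rather than a tuned recipe: dummy_reward and the tiny dataset are placeholders you would replace with real reward functions and an image dataset, and the hyperparameters are only starting points.

from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder reward: a real run would score correctness, formatting, etc.
def dummy_reward(completions, **kwargs):
    return [0.0 for _ in completions]

# Placeholder dataset: real rows would also carry the image for each prompt
dataset = Dataset.from_list([
    {"prompt": [{"role": "user", "content": "Describe the image."}]},
] * 16)

training_args = GRPOConfig(
    learning_rate = 5e-6,
    temperature = 1.0,
    per_device_train_batch_size = 4,
    gradient_accumulation_steps = 1,
    num_generations = 4, # Completions sampled per prompt for the group advantage
    max_prompt_length = 1024,
    max_completion_length = 1024,
    max_steps = 100,
    output_dir = "outputs",
)

trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer, # The processor returned by FastVisionModel.from_pretrained
    reward_funcs = [dummy_reward],
    args = training_args,
    train_dataset = dataset,
)
trainer.train()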

🦋 Qwen 2.5 VL Vision RL Issues and Quirks


🏅 Reward Functions to reduce gibberish
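Vision models trained with RL can drift into repeated characters or unreadable text, so a common fix is a reward term that explicitly penalizes such output. The function below is only a minimal sketch of that idea (the name no_gibberish_reward and the scoring heuristic are illustrative, not Unsloth's): it follows TRL's reward-function signature and assumes conversational-format completions.

import re

def no_gibberish_reward(completions, **kwargs):
    """Score each completion in [-2, 1]: readable ASCII text scores high,
    non-ASCII-heavy or heavily repetitive text is penalized."""
    scores = []
    for completion in completions:
        text = completion[0]["content"] # Conversational format: list of message dicts
        if not text:
            scores.append(-1.0)
            continue
        # Fraction of plain-ASCII characters in the completion
        ascii_ratio = sum(ch.isascii() for ch in text) / len(text)
        # Long runs of a single character are a common gibberish pattern
        repeat_penalty = 1.0 if re.search(r"(.)\1{9,}", text) else 0.0
        scores.append(2.0 * ascii_ratio - 1.0 - repeat_penalty)
    return scores

In practice you would pass this alongside your task reward, e.g. reward_funcs = [accuracy_reward, no_gibberish_reward] (accuracy_reward being whatever correctness reward your task defines), so the model is rewarded for being both correct and readable.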
🏁 GSPO Reinforcement Learning
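GSPO (Group Sequence Policy Optimization) replaces GRPO's per-token importance ratios with a single sequence-level ratio. Assuming a recent TRL version that exposes importance_sampling_level on GRPOConfig, the switch is a one-line config change; the clipping values below follow the much tighter range the GSPO paper recommends and are illustrative, not tuned.

from trl import GRPOConfig

gspo_args = GRPOConfig(
    importance_sampling_level = "sequence", # "token" is standard GRPO; "sequence" gives GSPO
    epsilon = 3e-4, # Left clipping bound; GSPO uses far tighter clipping than GRPO
    epsilon_high = 4e-4, # Right clipping bound
    num_generations = 4,
    max_steps = 100,
    output_dir = "outputs",
)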



