Fine-tuning LLMs on Intel GPUs with Unsloth

Learn how to train and fine-tune large language models on Intel GPUs.

You can now fine-tune LLMs on your local Intel device with Unsloth! Read our guide on exactly how to get started with training your own custom model.

Before you begin, make sure you have:

  • Intel GPU: Data Center GPU Max Series, Arc Series, or Intel Ultra AIPC

  • OS: Linux (Ubuntu 22.04+ recommended) or Windows 11 (recommended)

  • Windows only: Install Intel oneAPI Base Toolkit (select version 2025.2.1)

  • Intel Graphics driver: Latest recommended driver for Windows/Linux

  • Python: 3.10+

Build Unsloth with Intel Support

1. Create a new conda environment (Optional)

conda create -n unsloth-xpu python==3.10
conda activate unsloth-xpu
2. Install Unsloth

git clone https://github.com/unslothai/unsloth.git
cd unsloth
pip install .[intel-gpu-torch290]

Linux only: Install vLLM (Optional). You can also install vLLM for inference and RL; please follow vLLM's guide.

3. Verify your environment

import torch
print(f"PyTorch version: {torch.__version__}")
print(f"XPU available: {torch.xpu.is_available()}")
print(f"XPU device count: {torch.xpu.device_count()}")
print(f"XPU device name: {torch.xpu.get_device_name(0)}")
4. Start fine-tuning

You can directly use our Unsloth notebooks or view our dedicated fine-tuning or reinforcement learning guides.

Windows Only - Runtime Configurations

In Command Prompt with Administrator privilege, enable long path support in the Windows registry:

powershell -Command "Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' -Name 'LongPathsEnabled' -Value 1"

This setting only needs to be applied once per machine; it does not need to be reconfigured before each run. Then:

  1. Download level-zero-win-sdk-1.20.2.zip from GitHub

  2. Unzip the level-zero-win-sdk-1.20.2.zip

  3. In Command Prompt, under the conda environment unsloth-xpu:

Example 1: QLoRA Fine-tuning with SFT

This example demonstrates how to fine-tune a Qwen3-32B model using 4-bit QLoRA on an Intel GPU. QLoRA significantly reduces memory requirements, making it possible to fine-tune large models on consumer-grade hardware.
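A minimal sketch of what this can look like is shown below. It mirrors the pattern used in Unsloth's public SFT notebooks, but the dataset, prompt template, and hyperparameters are placeholders to adapt, and exact trainer argument names can vary with your installed trl version.

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

max_seq_length = 2048

# Load the base model with 4-bit quantized weights (QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-32B",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these small low-rank matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
)

# Any instruction dataset works; here Alpaca-style rows are flattened into one text field.
dataset = load_dataset("yahma/alpaca-cleaned", split="train")

def to_text(example):
    text = (f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}")
    return {"text": text + tokenizer.eos_token}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        max_steps=60,
        learning_rate=2e-4,
        logging_steps=1,
        output_dir="outputs",
    ),
)
trainer.train()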

Example 2: Reinforcement Learning GRPO

GRPO is a reinforcement learning technique for aligning language models with human preferences. This example shows how to train a model to follow a specific XML output format using multiple reward functions.
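A condensed sketch of this setup is shown below. It uses trl's GRPOTrainer as in Unsloth's GRPO notebooks, but the model, dataset, reward shapes, and hyperparameters are illustrative placeholders rather than exact notebook code.

import re
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import load_dataset

# Load a 4-bit base model and attach LoRA adapters (pick a model that fits your GPU).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-4B",
    max_seq_length=1024,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)

# Reward 1: completion matches the expected <reasoning>...</reasoning><answer>...</answer> layout.
def format_reward(completions, **kwargs):
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    return [1.0 if re.search(pattern, c, re.DOTALL) else 0.0 for c in completions]

# Reward 2: gently penalize very long completions (an illustrative second objective).
def length_reward(completions, **kwargs):
    return [-0.001 * len(c) for c in completions]

# GRPOTrainer expects a "prompt" column.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.rename_column("question", "prompt")

trainer = GRPOTrainer(
    model=model,
    reward_funcs=[format_reward, length_reward],
    train_dataset=dataset,
    args=GRPOConfig(
        num_generations=4,            # completions sampled per prompt (the "group")
        max_completion_length=256,
        per_device_train_batch_size=4,
        learning_rate=5e-6,
        max_steps=100,
        output_dir="grpo_outputs",
    ),
)
trainer.train()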

What is GRPO?

GRPO improves upon traditional RLHF by:

  • Using group-based normalization for more stable training (see the sketch after this list)

  • Supporting multiple reward functions for multi-objective optimization

  • Being more memory efficient than PPO
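
To make the group-based normalization point concrete, the sketch below shows the basic idea of GRPO-style advantages: the completions sampled for one prompt form a group, and each reward is normalized against its own group's mean and standard deviation. This is a simplified illustration, not Unsloth's or trl's internal implementation.

import torch

def group_normalized_advantages(rewards, num_generations):
    # rewards: one reward per completion, laid out as num_generations completions per prompt.
    grouped = rewards.view(-1, num_generations)
    mean = grouped.mean(dim=1, keepdim=True)
    std = grouped.std(dim=1, keepdim=True)
    return ((grouped - mean) / (std + 1e-4)).view(-1)

# Example: two prompts, three sampled completions each.
rewards = torch.tensor([1.0, 0.0, 0.5, 2.0, 2.0, 0.0])
print(group_normalized_advantages(rewards, num_generations=3))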

Troubleshooting

Out of Memory (OOM) Errors

If you run out of memory, try these solutions (example values follow the list):

  1. Reduce batch size: Lower per_device_train_batch_size.

  2. Use a smaller model: Start with a smaller model to reduce memory requirements.

  3. Reduce sequence length: Lower max_seq_length.

  4. Reduce LoRA rank: Use r=8 instead of r=16 or r=32.

  5. For GRPO, reduce number of generations: Lower num_generations.
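
As an illustration, the knobs above map onto settings like these; the values are starting points, not recommendations.

# Example memory-saving adjustments; tune for your GPU.
per_device_train_batch_size = 1   # (1) smaller batches
gradient_accumulation_steps = 8   #     keeps the effective batch size up
max_seq_length = 1024             # (3) shorter sequences
lora_rank = 8                     # (4) pass r=8 to FastLanguageModel.get_peft_model
num_generations = 2               # (5) GRPO only: fewer completions per prompt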

(Windows Only) Intel Ultra AIPC iGPU Shared Memory

For Intel Ultra AIPC with recent GPU drivers on Windows, the shared GPU memory for the integrated GPU typically defaults to 57% of system memory. For larger models (e.g., Qwen3-32B), or when fine-tuning with a longer max sequence length, a larger batch size, or a higher LoRA rank, you can increase the available VRAM by raising the percentage of system memory allocated to the iGPU.

You can adjust this by modifying the registry:

  • Path: Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\MemoryManager

  • Key to change: SystemPartitionCommitLimitPercentage (set to a larger percentage)
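
For example, from a Command Prompt with Administrator privilege (the value 70 below is only an illustration; if the value does not exist yet you may need New-ItemProperty instead, and a reboot is typically required for the change to take effect):

powershell -Command "Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\MemoryManager' -Name 'SystemPartitionCommitLimitPercentage' -Value 70 -Type DWord"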
