Unsloth AMD PyTorch Synthetic Data Hackathon

Tips & tricks, troubleshooting and guide to run Unsloth on an AMD GPU.

Once you get access to a MI300 machine, you will see a Jupyter Notebook interface:

First, update Unsloth and confirm everything works as expected - click on Terminal

Then run the below in the Terminal to update Unsloth - ensure the version is 2025.10.5 or higher.
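A typical update command looks like the below (a sketch, assuming a pip-based install; the exact pinned versions in the hackathon image may differ):

```shell
# Upgrade Unsloth and its companion package, then print the version
# (the guide asks for 2025.10.5 or higher).
pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo
python -c "import unsloth; print(unsloth.__version__)"
```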

To make a new Notebook or Terminal, click on the PLUS button


🦋TUTORIAL 1: Confirming Unsloth works

Confirm our simple Llama 3.2 1B / 3B conversational notebook runs as expected in a new Terminal.

You should see the below (it'll take 2 minutes). If anything breaks, try updating Unsloth first via the Updating Unsloth section.

🦥TUTORIAL 2: Running synthetic data generation


Now let's try the example at https://github.com/edamamez/Unsloth-AMD-Fine-Tuning-Synthetic-Data and also https://www.amd.com/en/developer/resources/technical-articles/2025/10x-model-fine-tuning-using-synthetic-data-with-unsloth.html

First, make a new Terminal again via the PLUS button.

Run vLLM to load up Llama 3.3 70B Instruct in a new Terminal (use the PLUS button for a new Terminal)
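The serve command below mirrors the one quoted later in this guide's vLLM troubleshooting section; the port and memory settings are starting points, not requirements:

```shell
# Serve Llama 3.3 70B Instruct with vLLM on port 8001.
# --max-model-len and --gpu-memory-utilization follow the example
# used elsewhere in this guide; tune them for your GPU.
vllm serve Unsloth/Llama-3.3-70B-Instruct \
    --port 8001 \
    --max-model-len 48000 \
    --gpu-memory-utilization 0.85
```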

You will see:

Wait until you see INFO: Application startup complete. Then click the PLUS button to open a new tab.

Install synthetic-data-kit https://github.com/meta-llama/synthetic-data-kit in a new Terminal window.
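Installation is typically a single pip install (per the project's README; pinning a version is optional):

```shell
# Install Meta's synthetic-data-kit CLI from PyPI.
pip install synthetic-data-kit
```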

Get config.yaml either from https://raw.githubusercontent.com/edamamez/Unsloth-AMD-Fine-Tuning-Synthetic-Data/refs/heads/main/config.yaml, or below:


Check that synthetic-data-kit works via its system check. If you see errors, confirm vLLM is still running in the first cell.
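A sketch of the check, assuming the kit's system-check subcommand and that your config.yaml's API base points at the running vLLM server (check synthetic-data-kit --help for the exact flags in your installed version):

```shell
# Verify the kit can reach the vLLM OpenAI-compatible endpoint
# described in config.yaml.
synthetic-data-kit -c config.yaml system-check
```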

Now, get some files we will use for processing:

Now let's ingest the data and process it:
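An ingest call looks roughly like the below; the file name is a placeholder, so substitute the documents downloaded in the previous step:

```shell
# Parse a source document into plain text for generation
# ("report.pdf" is hypothetical -- use your downloaded files).
synthetic-data-kit -c config.yaml ingest report.pdf
```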

Now, either create Q&A (question & answer) pairs or CoT (chain of thought) pairs (this might take 3 minutes):
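The two variants differ only in the --type flag (paths here are illustrative; the kit writes parsed text under the output directory named in config.yaml):

```shell
# Generate question-answer pairs from the ingested text.
synthetic-data-kit -c config.yaml create data/output/report.txt --type qa

# Or generate chain-of-thought pairs instead:
synthetic-data-kit -c config.yaml create data/output/report.txt --type cot
```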

Now let's ask an LLM to curate the data, calling the LLM as a judge to remove less desirable synthetic data rows, then save the output (this might take 3 minutes).
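The curate-then-save flow can be sketched as below; file names, the threshold value, and the output format follow the kit's README conventions, so check --help for the exact flags:

```shell
# Judge each generated pair with the LLM and keep only rows scoring
# above the quality threshold, then export in a fine-tuning format.
# Paths and the threshold are illustrative.
synthetic-data-kit -c config.yaml curate data/generated/report_qa_pairs.json --threshold 7.5
synthetic-data-kit -c config.yaml save-as data/cleaned/report_qa_pairs_cleaned.json --format ft
```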

Once again, SHUT DOWN the vLLM service to save VRAM!!! Go to the previous tab, and CTRL+C 3 times. Or see How do I free AMD GPU memory?

Now get the notebook which we will run at https://github.com/unslothai/notebooks/blob/main/nb/Synthetic_Data_Hackathon.ipynb:
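One way to fetch the notebook from the Terminal (the raw URL below is derived from the GitHub link above; download it through the browser if this path changes):

```shell
# Download the hackathon notebook into the current directory.
wget https://raw.githubusercontent.com/unslothai/notebooks/main/nb/Synthetic_Data_Hackathon.ipynb
```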


If you get Out of Memory errors, shut down your vLLM instance - see How do I free AMD GPU memory?

Click on the left folder button and open up "Synthetic_Data_Hackathon.ipynb" (double click)

Then run all!

You will see in the middle of the notebook:

See https://github.com/edamamez/Unsloth-AMD-Fine-Tuning-Synthetic-Data/blob/main/tutorial.ipynb for more details

🐬TUTORIAL 3: GPT-OSS Reinforcement Learning Auto Kernel Creation

You can run this as a notebook or via Python script!

Python script: https://github.com/unslothai/notebooks/blob/main/python_scripts/gpt_oss_(20B)_GRPO_BF16.py

Notebook: https://github.com/unslothai/notebooks/blob/main/nb/gpt_oss_(20B)_GRPO_BF16.ipynb

Then again like Tutorial 2, open the file "Auto_Kernels_RL.ipynb" and restart and run all!

If you run it and scroll down, you will see kernels being auto-generated via RL-discovered strategies:

♦️TUTORIAL 4: GPT-OSS Reinforcement Learning 2048 Game

You can run this as a notebook or via Python script!

Python script: https://github.com/unslothai/notebooks/blob/main/python_scripts/gpt_oss_(20B)_GRPO_BF16.py

Notebook: https://github.com/unslothai/notebooks/blob/main/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_BF16.ipynb

Then again like Tutorial 3, open the file "gpt_oss_(20B)_Reinforcement_Learning_2048_Game_BF16.ipynb" and restart and run all!

When you scroll down, you will see the RL algorithm auto creating strategies to win 2048!

🌻Optimal vLLM commands on AMD

To serve models on AMD GPUs, please use the following commands, which will boost performance. Confirm aiter and flash-attention are installed, or see Updating vLLM to the latest on AMD.

For MI300X, MI325X and Radeon GPUs:
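As a sketch: vLLM's ROCm builds expose an AITER toggle via the VLLM_ROCM_USE_AITER environment variable, so an MI300X-style invocation might look like the below. The model and flags mirror the Llama example used elsewhere in this guide; confirm the variable against your vLLM version's docs before relying on it:

```shell
# Enable the AITER kernels for ROCm, then serve as usual.
# VLLM_ROCM_USE_AITER is a vLLM environment variable, but treat this
# whole line as an assumption to verify for your build.
VLLM_ROCM_USE_AITER=1 vllm serve Unsloth/Llama-3.3-70B-Instruct \
    --port 8001 --max-model-len 48000 --gpu-memory-utilization 0.85
```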

For MI355X, do the below:

🛠️Troubleshooting and FAQs

🆓How do I free AMD GPU memory?

If you are on a Docker image (like the hackathon), run the below in a new Terminal:

If on a local machine, simply run rocm-smi -d 0 --showpids and then sudo kill -9 XXXX, where XXXX is the PID listed for the specific process that uses the most VRAM.
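The look-up-and-kill step can be scripted. This is a sketch that assumes each process row of rocm-smi -d 0 --showpids starts with a numeric PID; output formats vary across ROCm versions, so eyeball the table first:

```shell
# Print the first whitespace-separated field of every line that starts
# with a number -- i.e. the PIDs in rocm-smi's process table.
extract_pids() {
  awk '$1 ~ /^[0-9]+$/ {print $1}'
}

# Typical usage on a machine with ROCm (commented out here):
#   rocm-smi -d 0 --showpids | extract_pids | xargs -r sudo kill -9
```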

For the Docker image like in the hackathon, after running the first cell, you might see something like below:

Then look for the process which is using the VRAM (like vLLM), and type sudo kill -9 XXXX where XXXX is the PID listed on the left column like below:

Confirm all GPU memory is freed via rocm-smi -d 0 --showpids. For example, the below shows 0 memory usage:

If on the other hand you see the below, then rerun the first Docker cell to kill the process again.

📝torch.OutOfMemoryError: HIP out of memory / RuntimeError: Engine process failed to start.

Please see How do I free AMD GPU memory? for checking if your GPU is using memory from another process and try deleting that process that is using memory.

Also try amd-smi process --gpu 0 to list every process using the GPU along with its VRAM usage:

▶️No platform detected for vLLM, upgrading vLLM, gpt-oss on vLLM

If you are running vllm serve Unsloth/gpt-oss-20b, you might be using an old vLLM version. Run python -c "import vllm; print(vllm.__version__)" to get the vLLM version.

In the pre-built hackathon Docker image, you will get 0.7.4, which unfortunately does not support newer models like GPT-OSS. Other models still work, for example: vllm serve Unsloth/Llama-3.3-70B-Instruct --port 8001 --max-model-len 48000 --gpu-memory-utilization 0.85

🧁Updating vLLM to the latest on AMD


To get the latest vLLM, please see https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html#install-specific-revisions, specifically run the below after clearing all processes using the AMD GPU via How do I free AMD GPU memory?
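The build starts from a source checkout. The folder name vllm_build matters because the troubleshooting entries below assume it; cloning into a folder literally named vllm shadows the installed package when Python resolves import vllm:

```shell
# Clone vLLM source into vllm_build -- NOT "vllm", which would shadow
# the installed package during "import vllm".
git clone https://github.com/vllm-project/vllm.git vllm_build
cd vllm_build
# ...then follow the ROCm build steps from the vLLM installation docs.
```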

Executing the above will show the following (reminder: shut down all processes using the GPU first! See How do I free AMD GPU memory?)

(OPTIONAL, collapsible code) Building Flash Attention takes 30 minutes to 1 hour, so this step is optional if you do not want to wait; I would generally skip it. Expand this section if you want to install Flash Attention.

You will see:

To monitor the progress of the Flash-Attention build (which might take very long), watch the [296/2206]-style progress counter.

(NOT OPTIONAL) Then build aiter, the AI Tensor Engine for ROCm (this will take 5 minutes):
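A sketch based on the aiter repository's README (check it for the current steps; the --recursive flag matters because the repo vendors submodules):

```shell
# Clone aiter with its submodules and build it in-place.
git clone --recursive https://github.com/ROCm/aiter.git
cd aiter
python3 setup.py develop
```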

(NOT OPTIONAL) Then build vLLM:

You will see the below (please wait 5 to 10 minutes!)

Confirm vLLM, torch got updated via
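One way to print both versions at once (a sketch):

```shell
# Both packages expose __version__; compare against the versions
# required below.
python -c "import vllm, torch; print('vllm:', vllm.__version__, '| torch:', torch.__version__)"
```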

which should show vLLM is 0.11.0 or higher, and torch MUST be 2.8.0 as of October 2025. Then type vllm to confirm vLLM works as expected.

📖Running unsloth/gpt-oss-20b in vLLM


After updating vLLM via Updating vLLM to the latest on AMD, you can run gpt-oss-20b! See Optimal vLLM commands on AMD for optimal commands to run vLLM on AMD GPUs (you might get faster inference!)
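A minimal serve command, mirroring the vllm serve Unsloth/gpt-oss-20b invocation quoted in the troubleshooting section above (the port and context-length flags are illustrative):

```shell
# Requires the updated vLLM build (0.11.0+); the stock 0.7.4 image
# cannot load GPT-OSS.
vllm serve Unsloth/gpt-oss-20b --port 8001 --max-model-len 8192
```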

⁉️RuntimeError: User specified an unsupported autocast device_type 'hip'

Please update Unsloth! See Updating Unsloth below.

🐛NotImplementedError: Unsloth currently ok

🆕Updating Unsloth

First, update Unsloth and confirm everything works as expected - click on Terminal

Then run the below in the Terminal to update Unsloth - ensure the version is 2025.10.5 or higher.

You must RESTART the runtime as well

⁉️terminate called after throwing an instance of 'std::logic_error' what()

Please verify you are on torch==2.8.0. Rerun the below:
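A hedged sketch for getting back onto torch 2.8.0; the ROCm wheel index name depends on your ROCm release, so rocm6.4 below is an assumption to adjust:

```shell
# Reinstall torch 2.8.0 from the PyTorch ROCm wheel index, then verify.
# The "rocm6.4" suffix is an assumption -- match it to your ROCm version.
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/rocm6.4
python -c "import torch; print(torch.__version__)"
```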

System has not been booted, Failed to connect to bus

You might see the below:

Please message us so we can reboot the machine!

🐛Configured ROCm binary not found - get_native_library()

This indicates bitsandbytes is not installed correctly like below:

Please see Updating Unsloth to update bitsandbytes and Unsloth!

NotImplementedError: Cannot copy out of meta tensor; no data!

This means you are out of memory. See How do I free AMD GPU memory? for freeing GPU memory.

💭Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")

Please re-install vLLM: use vllm_build (not vllm) as the folder you git clone into. See Updating vLLM to the latest on AMD.

😯ModuleNotFoundError: No module named 'vllm'

Please do not rm -rf the vllm_build folder which you built from. Otherwise, reinstall vLLM via Updating vLLM to the latest on AMD.

📒ipykernel>6.30.1 breaks progress bars.

If you see the below:

For now, ignore it - you just won't see progress bars when downloading and uploading models.

🐛AssertionError: No MXFP4 MoE backend

If you are running gpt-oss-20b and see this during vLLM, please reinstall vLLM via Updating vLLM to the latest on AMD

🤕NotImplementedError: Could not run `aten::empty_strided`

Please use .to("cuda") and not .to("hip"). Also update Unsloth via Updating Unsloth.

🐛NotImplementedError: Could not run 'aten::empty.memory_format'

Please see Updating Unsloth to update bitsandbytes and Unsloth!
