Fine-Tuning LLMs on NVIDIA DGX Station with Unsloth
A tutorial on fine-tuning LLMs on the NVIDIA DGX Station using notebooks from Unsloth.
You can now train LLMs locally on your NVIDIA DGX Station with Unsloth. The DGX Station has over 200GB of VRAM and more than 700GB of unified GPU/CPU memory, combining a Grace CPU and a Blackwell GPU in a tightly connected system designed for large-scale AI workloads. Linked by NVLink-C2C, the CPU and GPU remain distinct but work together far more efficiently than in a traditional CPU-GPU setup.
In this guide, we’ll use Unsloth notebooks to train Qwen3.5 and gpt-oss-120b on the DGX Station. Thank you to NVIDIA for providing early-access DGX Station hardware to test Unsloth on!
Quickstart
You will need Python 3 installed, and in particular the development headers. Our system has Python 3.12, so we will install the 3.12 dev headers:
sudo apt update
sudo apt install python3.12-dev
Then create a fresh virtual environment in which to install Unsloth. This way we minimize dependency conflicts and preserve the state of the current working environment.
python3 -m venv .unsloth
source .unsloth/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
Install torch from the CUDA 13 index first, otherwise we could get the CPU-only build or a mismatch in architecture and capabilities!


Now we can install Unsloth:
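The original install command isn't shown here; a typical install into the active virtual environment is the plain PyPI package:

```shell
# Install Unsloth from PyPI into the active .unsloth virtual environment
pip install unsloth
```

If your setup needs a specific variant, check the Unsloth README for the recommended extras for your CUDA and torch versions.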


Now let's install xformers and optionally build flash-attention from source. Both packages take time to build, so please be patient.
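For example (the flash-attention source build is optional; `MAX_JOBS` caps parallel compile jobs to keep memory use in check, and the value here is just a reasonable guess for this machine):

```shell
# xformers from PyPI (pre-built wheels when available)
pip install xformers

# Optional: build flash-attention from source -- this can take a long time
pip install ninja packaging
MAX_JOBS=8 pip install flash-attn --no-build-isolation
```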


For the Qwen 3.5 MoE model we'll also want to download two kernel packages, flash-linear-attention and causal-conv1d, to make it fast.
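Both packages are on PyPI; exact versions can matter for your torch/CUDA combination, so check each project's docs if a build fails:

```shell
# Kernel packages used by the Qwen 3.5 MoE notebook
pip install flash-linear-attention causal-conv1d
```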

If you don’t already have a notebook client, install one. For this guide we will use Jupyter Notebook:
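Installing the classic Jupyter Notebook client is one `pip` command:

```shell
# Jupyter Notebook in the same virtual environment
pip install notebook
```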
Finally, we download the actual Unsloth notebooks to run. There are 250+ notebooks for LLM training, as well as Python scripts.
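The notebooks are collected in Unsloth's notebooks repository on GitHub (unslothai/notebooks at the time of writing), so one clone fetches all of them:

```shell
# Grab the full Unsloth notebook collection
git clone https://github.com/unslothai/notebooks
cd notebooks
```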
Training Tutorials
Now we can launch Jupyter Notebook and open the UI in a browser.
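From the repository directory, launching the server prints a localhost URL with an access token:

```shell
# Start the notebook server; it prints a http://localhost:8888/?token=... URL
jupyter notebook
```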

Copy the localhost URL (including the token parameter) and paste it into your browser. You should see something like:
The nb folder has all the notebooks to run.

Qwen3.5-35B-A3B Training
Open the file nb/Qwen3_5_MoE.ipynb. Skip past the installation section since we already installed everything we need beforehand. Navigate to the Unsloth section and start executing cells from there.

The notebook covers model setup, dataset preparation, and trainer configuration. Each step can take some time as we are downloading a very large model, initializing billions of weights, and further optimizing to make it run fast.

Training is very fast with the default settings. The DGX Station has plenty of memory, so you can adjust the default training hyperparameters to really push the memory and compute. Once training is done, you can save the model for later, push it to the Hugging Face Hub to share with others, or export it to a quantized format.
gpt-oss-120b Training
Open the file nb/gpt-oss-(120B)_A100-Fine-tuning.ipynb. Skip past the installation section since we already installed the prerequisites and navigate to the Unsloth section. We can start running the notebook from there. The notebook will use around 72 GB of GPU memory and take about 10 minutes.

Each cell can take some time to run, as we need to download the model, initialize the weights, and further optimize for a fast experience. The notebook goes through dataset preprocessing and trainer setup. Once we reach the trainer.train() cell and execute it, training begins.

Once training is complete, we can save the model for later use, push it to the Hugging Face Hub to share with the world, or export it to GGUF format.

Read more about NVIDIA's DGX Station at https://www.nvidia.com/en-us/products/workstations/dgx-station/