🎨 Run Qwen-Image-2512 in stable-diffusion.cpp Tutorial

Tutorial for using Qwen-Image-2512 in stable-diffusion.cpp.

Qwen-Image-2512 is Qwen's new text-to-image foundation model, and you can now run it on your local device via stable-diffusion.cpp. See below for instructions:

📖 stable-diffusion.cpp Tutorial

stable-diffusion.cpp is an open-source library for efficient, local inference of diffusion image models, written in pure C/C++.

You don't need a GPU to run it; a CPU with enough RAM will work. For best results, ensure your total usable memory (RAM + VRAM, or unified memory) is larger than the GGUF size; e.g. the 4-bit (Q4_K_M) unsloth/Qwen-Image-Edit-2512-GGUF is 13.1 GB, so you should have 13.2+ GB of combined memory.

This tutorial focuses on machines with CUDA available, but instructions for building on Apple silicon or CPU-only machines are similar and available in the repo.

#1. Setup environment

We will be building from source, so first make sure your build tools are installed:

sudo apt update
sudo apt install -y git cmake build-essential pkg-config

The Releases Page may have pre-built binaries available for your hardware if you don't want to go through the build process.

Make sure CUDA environment variables are set:

export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"

You can confirm they are set correctly by running:

nvcc --version  # if not found, install nvidia-cuda-toolkit
ldconfig -p | grep -E 'libcudart\.so|libcublas\.so'

We can now clone the repo and build:
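A minimal sketch of the clone-and-build step, assuming the upstream leejet/stable-diffusion.cpp repository and its current CMake CUDA flag (`-DSD_CUDA=ON`); older checkouts used `-DSD_CUBLAS=ON`, so check the README of your revision:

```shell
# Clone the repo together with its submodules (required for ggml)
git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp

# Configure with CUDA support and build in Release mode.
# -DSD_CUDA=ON is the flag in recent revisions; older ones used -DSD_CUBLAS=ON.
cmake -B build -DSD_CUDA=ON
cmake --build build --config Release -j"$(nproc)"
```

For a CPU-only build, simply drop the `-DSD_CUDA=ON` flag.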

Confirm sd-cli was built:
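For example, by printing its help text (the binary lands under `build/bin/` by default; the name has varied between releases, so substitute `sd` if `sd-cli` is not present):

```shell
# If this prints the usage/options listing, the build succeeded
./build/bin/sd-cli --help
```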

#2. Download Models

Diffusion models typically need three components: a Variational Autoencoder (VAE) that maps between image pixel space and latent space, a text encoder that turns your prompt into input embeddings, and the diffusion transformer itself. Both the diffusion model and the text encoder can be in GGUF format, while the VAE is typically a safetensors file. Let's download the models we will use:
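A sketch of the download step using the Hugging Face CLI. The repository and quant filenames below are illustrative assumptions (Qwen-Image-2512 uses Qwen2.5-VL as its text encoder); confirm the exact names on each model page before downloading:

```shell
# Install the Hugging Face CLI if you don't already have it
pip install -U "huggingface_hub[cli]"

# Diffusion transformer (Q4_K_M GGUF) -- filename is illustrative,
# check the repo's file listing for the exact quant you want
huggingface-cli download unsloth/Qwen-Image-2512-GGUF \
  --include "*Q4_K_M*" --local-dir models

# Text encoder (Qwen2.5-VL GGUF) and VAE (safetensors) --
# repo names here are assumptions; use the ones linked from the tutorial
huggingface-cli download unsloth/Qwen2.5-VL-7B-Instruct-GGUF \
  --include "*Q4_K_M*" --local-dir models
```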

We are using Q4 GGUF variants, but you can try smaller or larger quant types depending on how much VRAM/RAM you have.


Workflow and Hyperparameters

You can view our detailed 🎯 Workflow and Hyperparameters Guide.

#3. Inference

We can now run the binary we built. Here is a basic text-to-image command:
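A hedged sketch of a generation command. The model paths match the illustrative filenames above, and the flag names (`--diffusion-model`, `--vae`, `--qwen2vl`, etc.) are assumptions based on sd-cli conventions; run `./build/bin/sd-cli --help` to confirm the exact options for your build:

```shell
./build/bin/sd-cli \
  --diffusion-model models/qwen-image-2512-Q4_K_M.gguf \
  --vae models/qwen_image_vae.safetensors \
  --qwen2vl models/Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf \
  -p "a cozy cabin in a snowy forest at dusk, warm light in the windows" \
  -W 1024 -H 1024 \
  --steps 20 --cfg-scale 2.5 --seed 42 \
  -o output.png
```

Width, height, step count, and CFG scale here are starting points; see the Workflow and Hyperparameters Guide linked above for recommended settings.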

