How to Run Models with Unsloth Studio
Run AI models, LLMs and GGUFs locally with Unsloth Studio.
Unsloth Studio lets you run AI models 100% offline on your computer. Run model formats like GGUF and safetensors from Hugging Face or from your local files.
Works on macOS, Windows, Linux and WSL setups, including CPU-only machines. No GPU required!
Search + Download + Run any model like GGUFs, LoRA adapters, safetensors etc.
Compare two different model outputs side-by-side
Self-healing tool calling, web search, code execution, and calling OpenAI-compatible APIs
Auto inference parameter tuning (temp, top-p etc.) and edit chat templates
Upload images, audio, PDFs, code, DOCX and more file types to chat with.
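Since Studio can call OpenAI-compatible APIs, any local server exposing that interface works with standard tooling. As a minimal sketch (the host, port and model name below are placeholder assumptions — match them to whatever server you actually run), a chat completion request that also sets the sampling parameters mentioned above looks like:

```shell
# Hypothetical local OpenAI-compatible endpoint; adjust host/port to your server.
# temperature and top_p are the same sampling parameters Studio can auto-tune.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "top_p": 0.9
  }'
```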

Using Unsloth Studio Chat
Search and run models
You can search and download any model via Hugging Face or use local files.
Studio supports a wide range of model types, including GGUF, vision-language, and text-to-speech models. Run the latest models like Qwen3.5 or NVIDIA Nemotron 3.
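If you prefer to fetch model files yourself rather than through Studio's search, a single GGUF can be pulled from Hugging Face with the `huggingface-cli` tool. A hedged sketch — the repo and filename below are illustrative placeholders, not specific recommendations:

```shell
# Download one GGUF file from a Hugging Face repo into the current directory.
# Repo name and filename are placeholders; substitute the model you want.
huggingface-cli download unsloth/SomeModel-GGUF some-model.Q4_K_M.gguf --local-dir .
```

Studio can then run the downloaded file as a local model.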

Unsloth Studio Chat automatically works on multi-GPU setups for inference.

Model Arena
Studio Chat lets you compare any two models side-by-side with the same prompt, e.g. a base model and its LoRA adapter. Inference runs for one model first, then the second (parallel inference is in development).

After training, you can compare the base and fine-tuned models side by side with the same prompt, making it easy to see how fine-tuning changed the model's responses and whether it improved results for your use case.

Adding Files as Context
Studio Chat supports multimodal inputs directly in the conversation. You can attach documents, images, or audio as additional context for a prompt.

This makes it easy to test how a model handles real-world inputs such as PDFs, screenshots, or reference material. Files are processed locally and included as context for the model.
Using GGUF Models with llama.cpp
After fine-tuning a model or adapter in Studio, you can export it to GGUF and run local inference with llama.cpp directly in Studio Chat. Unsloth Studio is powered by llama.cpp and Hugging Face.
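Outside of Studio Chat, the same exported GGUF can also be run with llama.cpp's own binaries. A minimal sketch, assuming you have built llama.cpp and substituting your own model path (the filename and prompt below are placeholders):

```shell
# One-off generation with llama-cli (model path and prompt are placeholders).
llama-cli -m ./my-finetune.Q4_K_M.gguf -p "Hello, how are you?" -n 128 --temp 0.7

# Or serve the same model behind an OpenAI-compatible HTTP API with llama-server.
llama-server -m ./my-finetune.Q4_K_M.gguf --port 8080
```

`llama-server` exposes the `/v1/chat/completions` endpoint, so the OpenAI-compatible API support described earlier can point at it directly.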