# How to Run Local LLMs with Docker: Step-by-Step Guide

You can now run any model, including Unsloth [Dynamic GGUFs](https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs), on Mac, Windows or Linux with a single line of code or **no code** at all. We collaborated with Docker to simplify model deployment, and Unsloth now powers most GGUF models on Docker.

Before you start, review the [hardware requirements](#hardware-info--performance) and [performance tips](#hardware-info--performance) for running LLMs on your device.

<a href="#method-1-docker-terminal" class="button primary">Docker Terminal Tutorial</a><a href="#method-2-docker-desktop-no-code" class="button primary">Docker no-code Tutorial</a>

To get started, run OpenAI [gpt-oss](https://unsloth.ai/docs/models/gpt-oss-how-to-run-and-fine-tune) with a single command:

```bash
docker model run ai/gpt-oss:20B
```

Or to run a specific [Unsloth model](https://unsloth.ai/docs/get-started/unsloth-model-catalog) / quant from Hugging Face:

```bash
docker model run hf.co/unsloth/gpt-oss-20b-GGUF:F16
```

{% hint style="success" %}
You don’t need Docker Desktop; Docker CE is enough to run models.
{% endhint %}
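On Linux with Docker CE, the Model Runner ships as a separate CLI plugin. A minimal setup sketch for Debian/Ubuntu, assuming Docker's apt repository is already configured (the `docker-model-plugin` package name comes from Docker's DMR install guide; check that guide for other distros):

```bash
# Install the Docker Model Runner plugin alongside Docker CE (Debian/Ubuntu)
sudo apt-get update
sudo apt-get install -y docker-model-plugin

# Confirm the plugin is available
docker model version
```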

#### **Why Unsloth + Docker?**

We work with model labs such as Google’s Gemma team to fix model bugs and boost accuracy. Our Dynamic GGUFs consistently outperform other quantization methods, giving you high-accuracy, efficient inference.

If you use Docker, you can run models instantly with zero setup. Docker uses [Docker Model Runner](https://github.com/docker/model-runner) (DMR), which lets you run LLMs as easily as containers with no dependency issues. DMR uses Unsloth models and `llama.cpp` under the hood for fast, efficient, up-to-date inference.

## :gear: Hardware Info + Performance

For the best performance, aim for your VRAM + RAM combined to be at least equal to the size of the quantized model you're downloading. If you have less, the model will still run, but significantly slower.

Make sure your device also has enough disk space to store the model. If the model only barely fits in memory, expect roughly 5 tokens/s, depending on model size.
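A quick way to see what you have to work with, assuming a Linux machine (macOS equivalents differ, e.g. `sysctl hw.memsize` for RAM):

```bash
free -h        # total and available RAM (Linux)
df -h .        # free disk space on the current filesystem
# VRAM, if an NVIDIA GPU is present:
nvidia-smi --query-gpu=memory.total --format=csv 2>/dev/null || true
```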

Having extra RAM/VRAM available improves inference speed, and additional VRAM gives the biggest boost (provided the entire model fits in VRAM).

{% hint style="info" %}
**Example:** If you're downloading gpt-oss-20b (F16) and the model is 13.8 GB, ensure that both your disk space and your combined RAM + VRAM exceed 13.8 GB.
{% endhint %}
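A quant's on-disk size is roughly parameters × bits ÷ 8, plus a few percent for metadata and embedding tables. A quick back-of-the-envelope check for a 20B model at 4-bit:

```bash
# Rough size estimate: params * bits / 8 (add ~5-10% for overhead)
awk 'BEGIN { params = 20e9; bits = 4; printf "%.1f GB\n", params * bits / 8 / 1e9 }'
# → 10.0 GB
```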

**Quantization recommendations:**

* For models under 30B parameters, use at least 4-bit (Q4).
* For models 70B parameters or larger, use a minimum of 2-bit quantization (e.g., UD\_Q2\_K\_XL).

## ⚡ Step-by-Step Tutorials

Below are **two ways** to run models with Docker: one using the [terminal](#method-1-docker-terminal), and the other using [Docker Desktop](#method-2-docker-desktop-no-code) with no code:

### Method #1: Docker Terminal

{% stepper %}
{% step %}

#### Install Docker

Docker Model Runner is available in **both** [Docker Desktop](https://docs.docker.com/ai/model-runner/get-started/#docker-desktop) and [Docker CE](https://docs.docker.com/ai/model-runner/get-started/#docker-engine).
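Once installed, you can sanity-check the setup from a terminal (the second command requires the Model Runner plugin and a running Docker daemon):

```bash
docker --version     # Docker CLI is on PATH
docker model --help  # Model Runner plugin is installed
```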
{% endstep %}

{% step %}

#### Run the model

Decide on a model to run, then run the command via terminal.

* Browse the verified catalog of trusted models available on [Docker Hub](https://hub.docker.com/r/ai) or [Unsloth's Hugging Face](https://huggingface.co/unsloth) page.
* Open a terminal to run the commands. To verify that `docker` is installed, type `docker` and press Enter.
* Docker Hub defaults to Unsloth Dynamic 4-bit quants; you can select your own quantization level (see step #3).

For example, to run OpenAI `gpt-oss-20b` in a single command:

```bash
docker model run ai/gpt-oss:20B
```

Or to run a specific [Unsloth](https://unsloth.ai/docs/get-started/unsloth-model-catalog) gpt-oss quant from Hugging Face:

```bash
docker model run hf.co/unsloth/gpt-oss-20b-GGUF:UD-Q8_K_XL
```

**This is how running gpt-oss-20b should look via CLI:**

<div><figure><img src="https://3215535692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxhOjnexMCB3dmuQFQ2Zq%2Fuploads%2FNQQGqQfLuX2a40i07Es1%2Funknown.png?alt=media&#x26;token=6006e370-a298-4e5f-af9f-5c4e8c4a0fc8" alt="" width="563"><figcaption><p>gpt-oss-20b from Docker Hub</p></figcaption></figure> <figure><img src="https://3215535692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxhOjnexMCB3dmuQFQ2Zq%2Fuploads%2FFtzowkLonKo4N2LzEnhe%2Fgptoss%20ud8kxl.png?alt=media&#x26;token=08eb6244-0626-4ad8-adbe-f5c4e5aa6d72" alt="" width="563"><figcaption><p>gpt-oss-20b with Unsloth's UD-Q8_K_XL quantization</p></figcaption></figure></div>
{% endstep %}

{% step %}

#### To run a specific quantization level:

If you want to run a specific quantization of a model, append `:` and the quantization name to the model name (e.g., `Q4` for Docker Hub models or `UD-Q4_K_XL` for Unsloth quants). You can view all available quantizations on each model’s Docker Hub page, e.g. the listed quantizations for gpt-oss [here](https://hub.docker.com/r/ai/gpt-oss#gptoss).

The same applies to Unsloth quants on Hugging Face: visit the [model’s HF page](https://huggingface.co/unsloth/gpt-oss-20b-GGUF?show_file_info=gpt-oss-20b-Q2_K_L.gguf), choose a quantization, then run something like: `docker model run hf.co/unsloth/gpt-oss-20b-GGUF:Q2_K_L`
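Pulled quants can take a lot of disk space. The DMR CLI mirrors Docker's container workflow for managing them; run `docker model --help` to confirm which subcommands your version supports (the ones sketched below come from the model-runner docs):

```bash
docker model pull hf.co/unsloth/gpt-oss-20b-GGUF:Q2_K_L   # download without chatting
docker model list                                         # models cached locally
docker model rm hf.co/unsloth/gpt-oss-20b-GGUF:Q2_K_L     # free the disk space
```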

<div><figure><img src="https://3215535692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxhOjnexMCB3dmuQFQ2Zq%2Fuploads%2FI7MrphUugkU8eZ1f7lJz%2FScreenshot%202025-11-16%20at%2010.52.25%E2%80%AFPM.png?alt=media&#x26;token=ae777fdb-258e-46d0-b06e-b68434f7fa58" alt="" width="563"><figcaption><p>gpt-oss quantization levels on <a href="https://hub.docker.com/r/ai/gpt-oss#gptoss">Docker Hub</a></p></figcaption></figure> <figure><img src="https://3215535692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxhOjnexMCB3dmuQFQ2Zq%2Fuploads%2F73UWIs9VAZ6Iz1omgZSm%2FScreenshot%202025-11-16%20at%2010.54.53%E2%80%AFPM.png?alt=media&#x26;token=e8dee7f3-1b8b-4dda-a455-26df58102d16" alt="" width="563"><figcaption><p>Unsloth gpt-oss quantization levels on<a href="https://huggingface.co/unsloth/gpt-oss-20b-GGUF"> Hugging Face</a></p></figcaption></figure></div>
{% endstep %}
{% endstepper %}

### Method #2: Docker Desktop (no code)

{% stepper %}
{% step %}

#### Install Docker Desktop

Docker Model Runner is already available in [Docker Desktop](https://docs.docker.com/ai/model-runner/get-started/#docker-desktop).

1. Decide on a model to run, open Docker Desktop, then click the 'Models' tab.
2. Click 'Add models +' or browse Docker Hub, then search for the model.

Browse the verified model catalog available on [Docker Hub](https://hub.docker.com/r/ai).

<div><figure><img src="https://3215535692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxhOjnexMCB3dmuQFQ2Zq%2Fuploads%2F6R8uxbedMklVHsriLgpZ%2FScreenshot%202025-11-16%20at%206.36.49%E2%80%AFAM.png?alt=media&#x26;token=0fb849f5-bc72-4883-9b25-1a756334ab4b" alt=""><figcaption><p>#1. Click 'Models' tab then 'Add models +'</p></figcaption></figure> <figure><img src="https://3215535692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxhOjnexMCB3dmuQFQ2Zq%2Fuploads%2Fj7I69xgxDeHXkvVbjemq%2FScreenshot%202025-11-16%20at%206.46.47%E2%80%AFAM.png?alt=media&#x26;token=3e25d495-f34b-47d6-9185-f145610eed10" alt=""><figcaption><p>#2. Search for your desired model.</p></figcaption></figure></div>
{% endstep %}

{% step %}

#### Pull the model

Click the model you want to run to see available quantizations.

* Quantizations range from 1–16 bits. For models under 30B parameters, use at least 4-bit (`Q4`).
* Choose a size that fits your hardware: ideally, your combined unified memory, RAM, or VRAM should be equal to or greater than the model size. For example, an 11GB model runs well on 12GB unified memory.

<div><figure><img src="https://3215535692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxhOjnexMCB3dmuQFQ2Zq%2Fuploads%2FdQnwVwsMhYtWTzAGFih3%2FScreenshot%202025-11-16%20at%206.47.26%E2%80%AFAM.png?alt=media&#x26;token=b1f0e29a-b3de-4d96-b79a-931441744565" alt=""><figcaption><p>#3. Select which quantization you would like to pull.</p></figcaption></figure> <figure><img src="https://3215535692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxhOjnexMCB3dmuQFQ2Zq%2Fuploads%2FFGMZEXgQjBdVqP0vFTX3%2FScreenshot%202025-11-16%20at%206.54.09%E2%80%AFAM.png?alt=media&#x26;token=b7f0c0d8-ebef-4e7e-99a9-c44c4c5f0844" alt=""><figcaption><p>#4. Wait for model to finish downloading, then Run it.</p></figcaption></figure></div>
{% endstep %}

{% step %}

#### Run the model

Type any prompt in the 'Ask a question' box and use the LLM like you would use ChatGPT.

<figure><img src="https://3215535692-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FxhOjnexMCB3dmuQFQ2Zq%2Fuploads%2F9nVjWcVsYK9CeT8gk3nQ%2FScreenshot%202025-11-16%20at%206.54.50%E2%80%AFAM.png?alt=media&#x26;token=d7e5b63d-9c3e-42b0-882c-de046bbcfc9a" alt="" width="563"><figcaption><p>An example of running Qwen3-4B <code>UD-Q8_K_XL</code></p></figcaption></figure>
{% endstep %}
{% endstepper %}

#### **To run the latest models:**

You can run any new model on Docker as long as it’s supported by `llama.cpp` or vLLM and available on Docker Hub.

### What Is the Docker Model Runner?

The Docker Model Runner (DMR) is an open-source tool that lets you pull and run AI models as easily as you run containers. GitHub: <https://github.com/docker/model-runner>

It provides a consistent runtime for models, similar to how Docker standardized app deployment. Under the hood, it uses optimized backends (like `llama.cpp`) for smooth, hardware-efficient inference on your machine.

Whether you’re a researcher, developer, or hobbyist, you can now:

* Run open models locally in seconds.
* Avoid dependency hell; everything is handled inside Docker.
* Share and reproduce model setups effortlessly.
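DMR also exposes an OpenAI-compatible REST API, so existing OpenAI client code can target a local model. A sketch assuming Docker Desktop's host-side TCP access is enabled on its default port 12434 (the port and endpoint path are taken from the Model Runner docs; verify them for your setup):

```bash
# Chat completion via DMR's OpenAI-compatible endpoint (llama.cpp backend)
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/gpt-oss:20B",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'
```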
