🐋 DeepSeek-R1: How to Run Locally
A guide on how you can run our 1.58-bit Dynamic Quants for DeepSeek-R1 using llama.cpp.
Using llama.cpp (recommended)
First, install the build dependencies and compile llama.cpp with CUDA support:
apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggerganov/llama.cpp
cmake llama.cpp -B llama.cpp/build \
-DBUILD_SHARED_LIBS=ON -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-quantize llama-cli llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp

Then download the model weights from Hugging Face:

# pip install huggingface_hub hf_transfer
# import os  # Optional: enable hf_transfer for faster downloads
# os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id = "unsloth/DeepSeek-R1-GGUF",
    local_dir = "DeepSeek-R1-GGUF",
    allow_patterns = ["*UD-IQ1_S*"],  # Select quant type UD-IQ1_S for 1.58-bit
)
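
The 1.58-bit quant arrives as several split GGUF files. Recent llama.cpp builds load a split model automatically when pointed at the first shard, so merging is optional; if you do want a single file, the llama-gguf-split tool built above can merge them. A minimal sketch, assuming the shards landed under DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/ with the usual -00001-of-0000N naming (check your local directory for the exact file names):

./llama.cpp/llama-gguf-split --merge \
DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
DeepSeek-R1-UD-IQ1_S-merged.gguf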

[Table: how many layers you can offload to the GPU for each quant — columns: Quant, File Size, 24GB GPU, 80GB GPU, 2x80GB GPU. The run example below shows where the chosen layer count goes.]
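
Once you know how many layers fit on your GPU, launch the model with llama-cli. A minimal sketch, not the guide's exact invocation: the shard path, --n-gpu-layers 7 (a plausible value for a 24GB GPU on the 1.58-bit quant), and the 0.6 temperature DeepSeek recommends for R1 are assumptions to adjust for your setup:

# --cache-type-k q4_0 quantizes the K cache to save memory;
# --n-gpu-layers sets how many layers to offload (see the table above)
./llama.cpp/llama-cli \
--model DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
--cache-type-k q4_0 \
--threads 16 \
--n-gpu-layers 7 \
--temp 0.6 \
--ctx-size 8192 \
--prompt "<｜User｜>Why is the sky blue?<｜Assistant｜>"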
Running on Mac / Apple devices
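On Apple silicon, llama.cpp uses the Metal backend instead of CUDA. A minimal build sketch, assuming a recent checkout (Metal is typically enabled by default on macOS, so -DGGML_METAL=ON is mostly explicit documentation); the llama-cli invocation above then works unchanged, offloading layers into unified memory:

cmake llama.cpp -B llama.cpp/build \
-DBUILD_SHARED_LIBS=ON -DGGML_METAL=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-quantize llama-cli llama-gguf-split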
Run in Ollama/Open WebUI
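Ollama can also serve GGUF models, and Open WebUI can then talk to the local Ollama instance as usual. A hedged sketch using a Modelfile that points at the merged GGUF from earlier — the model name deepseek-r1-1.58bit is arbitrary, and Ollama can also pull some GGUF repos directly via the hf.co/ prefix, though very large multi-shard quants may need this local-file route:

# Create a Modelfile pointing at the merged 1.58-bit GGUF
cat > Modelfile <<'EOF'
FROM ./DeepSeek-R1-UD-IQ1_S-merged.gguf
EOF
ollama create deepseek-r1-1.58bit -f Modelfile
ollama run deepseek-r1-1.58bit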
DeepSeek Chat Template
[Table: chat-template tokens — columns: Token, R1, Distill Qwen, Distill Llama]
[Table: original tokens of the distill base models — columns: Token, Qwen 2.5 32B Base, Llama 3.3 70B Instruct]
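
For reference, a single-turn R1 prompt rendered with these tokens looks like this (token spellings are from the DeepSeek tokenizer — note the fullwidth ｜ bars and ▁ underscores):

<｜begin▁of▁sentence｜><｜User｜>What is 1+1?<｜Assistant｜>

R1 then writes its reasoning between <think> and </think> before the final answer and ends the turn with <｜end▁of▁sentence｜>.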
GGUF R1 Table
[Table: available DeepSeek-R1 GGUF quants — columns: MoE Bits, Type, Disk Size, Accuracy, Link, Details. The 1.58-bit UD-IQ1_S quant used above is one of the rows.]