SGLang Deployment & Inference Guide
A guide to saving your LLMs and deploying them with SGLang for production serving
💻 Installing SGLang
# OPTIONAL use a virtual environment
python -m venv unsloth_env
source unsloth_env/bin/activate
# Install Rust (needed to build outlines-core), then SGLang
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env && sudo apt-get install -y pkg-config libssl-dev
pip install --upgrade pip && pip install uv
uv pip install "sglang" && uv pip install unsloth

Alternatively, run SGLang through the official Docker image:

docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server --model-path unsloth/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 30000

🐛 Debugging SGLang Installation issues
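If the install or launch fails, a quick sanity check is to confirm the package imports and that a running server answers on its health endpoints. A minimal sketch, assuming a server started on port 30000 as above:

# Verify the Python package is importable and note its version
python3 -c "import sglang; print(sglang.__version__)"
# With the server running, these endpoints should respond if it is healthy
curl http://localhost:30000/health
curl http://localhost:30000/get_model_info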
🚚 Deploying SGLang models
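Outside Docker, the same server can be launched directly from the pip install. A minimal sketch, assuming a two-GPU machine (--tp sets the tensor-parallel degree; the /v1 route is SGLang's OpenAI-compatible API):

# Launch the server directly; --tp 2 shards the model across two GPUs
python3 -m sglang.launch_server --model-path unsloth/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 30000 --tp 2
# Query the OpenAI-compatible chat endpoint
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "unsloth/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'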

🦥 Deploying Unsloth finetunes in SGLang
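SGLang serves standard Hugging Face checkpoints, so one common route is to merge your LoRA adapters into the base weights with Unsloth's save_pretrained_merged and point SGLang at the result. A sketch, where the lora_model and merged_model paths are placeholders:

# Merge LoRA adapters into 16-bit weights with Unsloth
python3 - <<'PY'
from unsloth import FastLanguageModel
# "lora_model" is a placeholder for your saved adapter directory
model, tokenizer = FastLanguageModel.from_pretrained("lora_model")
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
PY
# Serve the merged checkpoint with SGLang
python3 -m sglang.launch_server --model-path merged_model --port 30000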
🚃 gpt-oss-20b: Unsloth & SGLang Deployment Guide
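As a concrete example, Unsloth's gpt-oss-20b upload can be served with the same launch command (a sketch; gpt-oss support may require a recent SGLang version):

python3 -m sglang.launch_server --model-path unsloth/gpt-oss-20b --host 0.0.0.0 --port 30000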
💎 FP8 Online Quantization
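SGLang can quantize a BF16 checkpoint to FP8 on the fly at load time via the --quantization flag. A minimal sketch, assuming a GPU with FP8 support (e.g. Hopper or Ada):

# Quantize the checkpoint to FP8 online while loading
python3 -m sglang.launch_server --model-path unsloth/Llama-3.1-8B-Instruct --quantization fp8 --port 30000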
⚡ Benchmarking SGLang

Benchmark runs report the following columns:

Batch/Input/Output | TTFT (s) | ITL (s) | Input Throughput (tok/s) | Output Throughput (tok/s)
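These metrics come from SGLang's bundled serving benchmark. A minimal invocation against a server already running on the default port (a sketch; flags beyond --backend and --num-prompts vary by version):

# Reports TTFT, ITL, and input/output token throughput
python3 -m sglang.bench_serving --backend sglang --num-prompts 100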
🏃 SGLang Interactive Offline Mode
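For batch inference without an HTTP server, SGLang exposes an offline Engine API. A minimal sketch, run via a heredoc to keep the example self-contained:

python3 - <<'PY'
import sglang as sgl

# Offline engine: loads the model in-process, no server involved
llm = sgl.Engine(model_path="unsloth/Llama-3.1-8B-Instruct")
prompts = ["The capital of France is", "Explain KV caching in one sentence:"]
sampling_params = {"temperature": 0.8, "top_p": 0.95, "max_new_tokens": 64}
outputs = llm.generate(prompts, sampling_params)
for prompt, out in zip(prompts, outputs):
    print(prompt, "->", out["text"])
llm.shutdown()
PY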
🎇 GGUFs in SGLang
🎬 High-throughput GGUF serving with SGLang
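Assuming your SGLang build supports GGUF checkpoints (support has varied across versions, so treat this as an assumption to verify against the SGLang docs), serving a local GGUF file is a matter of pointing --model-path at it:

# ASSUMPTION: relies on GGUF load support in your SGLang version;
# some versions may also require --tokenizer-path for a matching tokenizer
python3 -m sglang.launch_server --model-path ./Llama-3.1-8B-Instruct-Q4_K_M.gguf --port 30000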