Connect vLLM to Unsloth for Local Chat Inference
Setup
Common vLLM arguments
vllm serve unsloth/gemma-4-26B-A4B-it \
--dtype auto \
--host 0.0.0.0 \
--port 8000 \
--api-key token-abc123 \
--max-model-len 8192 \
--gpu-memory-utilization 0.9Last updated
Was this helpful?



