LoRA Hot Swapping Guide
🍧 vLLM LoRA Hot Swapping / Dynamic LoRAs
Set the environment variable that allows LoRA adapters to be updated at runtime:
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=True
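
Start the vLLM server with LoRA support enabled. Here --max-loras 4 allows up to 4 adapters in a batch and --max-lora-rank 64 caps the adapter rank at 64:
vllm serve unsloth/Llama-3.1-8B-Instruct \
    --quantization fp8 \
    --kv-cache-dtype fp8 \
    --gpu-memory-utilization 0.8 \
    --max-model-len 65536 \
    --enable-lora \
    --max-loras 4 \
    --max-lora-rank 64

To load a LoRA adapter into the running server, POST its name and path to the load_lora_adapter endpoint:
curl -X POST http://localhost:8000/v1/load_lora_adapter \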
    -H "Content-Type: application/json" \
    -d '{
        "lora_name": "LORA_NAME",
        "lora_path": "/path/to/LORA"
    }'
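
Once an adapter is loaded, requests select it by passing its lora_name in the model field, and the matching unload_lora_adapter endpoint removes it again. The sketch below wraps these calls in small shell functions. It assumes the server started above is listening on localhost:8000. The function names (lora_payload, load_lora, unload_lora, list_models, chat_with_lora) and the VLLM_URL variable are hypothetical helpers for illustration, not part of vLLM, and the JSON is built without escaping, so it only suits simple names and prompts.

```shell
# Hypothetical helpers around vLLM's runtime LoRA endpoints.
# Assumption: the server is reachable at VLLM_URL (default: localhost:8000).
VLLM_URL="${VLLM_URL:-http://localhost:8000}"

# Build the JSON body for load/unload requests.
# One argument:  {"lora_name": ...}
# Two arguments: also includes "lora_path".
# No JSON escaping is done, so avoid quotes or backslashes in the arguments.
lora_payload() {
  if [ "$#" -ge 2 ]; then
    printf '{"lora_name": "%s", "lora_path": "%s"}' "$1" "$2"
  else
    printf '{"lora_name": "%s"}' "$1"
  fi
}

# Load an adapter: load_lora NAME PATH
load_lora() {
  curl -sS -X POST "$VLLM_URL/v1/load_lora_adapter" \
    -H "Content-Type: application/json" \
    -d "$(lora_payload "$1" "$2")"
}

# Unload an adapter by name: unload_lora NAME
unload_lora() {
  curl -sS -X POST "$VLLM_URL/v1/unload_lora_adapter" \
    -H "Content-Type: application/json" \
    -d "$(lora_payload "$1")"
}

# List the models the server currently exposes, including loaded adapters.
list_models() {
  curl -sS "$VLLM_URL/v1/models"
}

# Send a chat request routed to an adapter: chat_with_lora NAME PROMPT
chat_with_lora() {
  curl -sS "$VLLM_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$1\", \"messages\": [{\"role\": \"user\", \"content\": \"$2\"}]}"
}

# Example usage (requires a running server):
#   load_lora LORA_NAME /path/to/LORA
#   list_models
#   chat_with_lora LORA_NAME "Hello"
#   unload_lora LORA_NAME
```

Keeping the payload construction in its own function makes it easy to inspect the request body before anything is sent to the server.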

