How to Run Local LLMs with OpenAI Codex
Run open models locally on your own device with OpenAI Codex.
📖 #1: Setup Tutorial
1
Install llama.cpp
apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev git-all -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
-DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-mtmd-cli llama-server llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp
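Before moving on, it can be useful to confirm that the build and copy steps above actually produced the four targets. The following is a minimal sketch: `check_llama_binaries` is a hypothetical helper name (not part of llama.cpp), and it assumes the binaries were copied into the `llama.cpp` directory as in the last command.

```shell
# Sketch: confirm the four build targets exist and are executable in a directory.
# check_llama_binaries is a hypothetical helper, not something llama.cpp ships.
check_llama_binaries() {
  local dir="${1:-llama.cpp}"   # assumed location, matching the cp step above
  local missing=0
  local bin
  for bin in llama-cli llama-mtmd-cli llama-server llama-gguf-split; do
    if [ -x "$dir/$bin" ]; then
      echo "found:   $dir/$bin"
    else
      echo "missing: $dir/$bin" >&2
      missing=1
    fi
  done
  # Exit status is 0 only when every binary was found.
  return "$missing"
}
```

Running `check_llama_binaries llama.cpp` from the directory where the repository was cloned should list all four binaries; a non-zero exit status suggests the build or the copy step did not complete.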
2
3
Start llama-server
./llama.cpp/llama-server \
--model unsloth/GLM-4.7-Flash-GGUF/GLM-4.7-Flash-UD-Q4_K_XL.gguf \
--alias "unsloth/GLM-4.7-Flash" \
--temp 1.0 \
--top-p 0.95 \
--min-p 0.01 \
--port 8001 \
--kv-unified \
--cache-type-k q8_0 --cache-type-v q8_0 \
--flash-attn on \
--batch-size 4096 --ubatch-size 1024 \
--ctx-size 131072
OpenAI Codex CLI Tutorial
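Before installing Codex, it may be worth confirming that the llama-server started in the previous step is reachable, since loading a large model can take a while. The sketch below polls the server's `/health` endpoint on port 8001 (the port passed via `--port` above). `wait_for_llama_server` is a hypothetical helper name, and the retry count and 2-second delay are arbitrary assumptions.

```shell
# Sketch: poll llama-server's /health endpoint until it answers successfully.
# wait_for_llama_server is a hypothetical helper; the defaults are assumptions.
wait_for_llama_server() {
  local base_url="${1:-http://localhost:8001}"  # port taken from --port above
  local attempts="${2:-30}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors (e.g. while the model loads).
    if curl --silent --fail --max-time 2 "$base_url/health" > /dev/null 2>&1; then
      echo "llama-server is ready at $base_url"
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep 2
    fi
    i=$((i + 1))
  done
  echo "llama-server did not respond at $base_url" >&2
  return 1
}
```

Calling `wait_for_llama_server` with no arguments checks `http://localhost:8001`. Once it succeeds, `curl http://localhost:8001/v1/models` should list the alias set with `--alias`.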
Install
# Option A: Homebrew
brew install --cask codex
# Option B: apt and npm (Debian/Ubuntu)
apt update
apt install nodejs npm -y
npm install -g @openai/codex
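After installing, a quick check that the expected commands are on your `PATH` can save debugging later. This is a minimal sketch; `missing_commands` is a hypothetical helper name, and which commands you pass depends on the install route you chose.

```shell
# Sketch: print every given command name that cannot be found on PATH.
# missing_commands is a hypothetical helper name used for illustration.
missing_commands() {
  local cmd
  local rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "$cmd"
      rc=1
    fi
  done
  # Exit status is 0 only when every command was found.
  return "$rc"
}
```

For the npm route, `missing_commands node npm codex` should print nothing; for the Homebrew route, `missing_commands codex` is enough. Any name it prints still needs to be installed or added to `PATH`.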





