# Basics

- [Inference & Deployment](/docs/basics/inference-and-deployment.md): Learn how to save your finetuned model so you can run it in your favorite inference engine.
- [Saving to GGUF](/docs/basics/inference-and-deployment/saving-to-gguf.md): Saving models to 16-bit GGUF so you can use them in Ollama, Jan AI, Open WebUI, and more!
- [Speculative Decoding](/docs/basics/inference-and-deployment/saving-to-gguf/speculative-decoding.md): Speculative decoding with llama-server, llama.cpp, vLLM, and more for 2x faster inference.
- [vLLM Deployment & Inference Guide](/docs/basics/inference-and-deployment/vllm-guide.md): Guide to saving and deploying LLMs with vLLM for production serving.
- [vLLM Engine Arguments](/docs/basics/inference-and-deployment/vllm-guide/vllm-engine-arguments.md)
- [LoRA Hot Swapping Guide](/docs/basics/inference-and-deployment/vllm-guide/lora-hot-swapping-guide.md)
- [Saving to Ollama](/docs/basics/inference-and-deployment/saving-to-ollama.md)
- [Deploying models to LM Studio](/docs/basics/inference-and-deployment/lm-studio.md): Saving models to GGUF so you can run and deploy them in LM Studio.
- [How to install LM Studio CLI in Linux Terminal](/docs/basics/inference-and-deployment/lm-studio/how-to-install-lm-studio-cli-in-linux-terminal.md): Guide to installing the LM Studio CLI in a terminal instance, without a UI.
- [SGLang Deployment & Inference Guide](/docs/basics/inference-and-deployment/sglang-guide.md): Guide to saving and deploying LLMs with SGLang for production serving.
- [llama-server & OpenAI endpoint Deployment Guide](/docs/basics/inference-and-deployment/llama-server-and-openai-endpoint.md): Deploying via llama-server with an OpenAI-compatible endpoint.
- [How to Run and Deploy LLMs on your iOS or Android Phone](/docs/basics/inference-and-deployment/deploy-llms-phone.md): Tutorial for fine-tuning your own LLM and deploying it on your iOS or Android phone with ExecuTorch.
- [Troubleshooting Inference](/docs/basics/inference-and-deployment/troubleshooting-inference.md): Tips for resolving issues when running or saving your model.
- [How to Run Local LLMs with Claude Code](/docs/basics/claude-code.md): Guide to using open models with Claude Code locally on your device.
- [How to Run Local LLMs with OpenAI Codex](/docs/basics/codex.md): Guide to using open models with OpenAI Codex locally on your device.
- [Multi-GPU Fine-tuning with Unsloth](/docs/basics/multi-gpu-training-with-unsloth.md): Learn how to fine-tune LLMs across multiple GPUs with parallelism in Unsloth.
- [Multi-GPU Fine-tuning with Distributed Data Parallel (DDP)](/docs/basics/multi-gpu-training-with-unsloth/ddp.md): Learn how to use the Unsloth CLI to train on multiple GPUs with Distributed Data Parallel (DDP)!
- [Fine-tuning Embedding Models with Unsloth Guide](/docs/basics/embedding-finetuning.md): Learn how to easily fine-tune embedding models with Unsloth.
- [Fine-tune MoE Models 12x Faster with Unsloth](/docs/basics/faster-moe.md): Guide to training MoE LLMs locally with Unsloth.
- [Text-to-Speech (TTS) Fine-tuning Guide](/docs/basics/text-to-speech-tts-fine-tuning.md): Learn how to fine-tune TTS & STT voice models with Unsloth.
- [Unsloth Dynamic 2.0 GGUFs](/docs/basics/unsloth-dynamic-2.0-ggufs.md): A big new upgrade to our Dynamic Quants!
- [Unsloth Dynamic GGUFs on Aider Polyglot](/docs/basics/unsloth-dynamic-2.0-ggufs/unsloth-dynamic-ggufs-on-aider-polyglot.md): Performance of Unsloth Dynamic GGUFs on Aider Polyglot Benchmarks
- [Tool Calling Guide for Local LLMs](/docs/basics/tool-calling-guide-for-local-llms.md)
- [Vision Fine-tuning](/docs/basics/vision-fine-tuning.md): Learn how to fine-tune vision/multimodal LLMs with Unsloth
- [Troubleshooting & FAQs](/docs/basics/troubleshooting-and-faqs.md): Tips to solve issues, and frequently asked questions.
- [Hugging Face Hub, XET debugging](/docs/basics/troubleshooting-and-faqs/hugging-face-hub-xet-debugging.md): Debugging and troubleshooting stalled, stuck, or slow downloads.
- [Chat Templates](/docs/basics/chat-templates.md): Learn the fundamentals and customization options of chat templates, including Conversational, ChatML, ShareGPT, Alpaca formats, and more!
- [Unsloth Environment Flags](/docs/basics/unsloth-environment-flags.md): Advanced flags that may help if you encounter breaking fine-tunes or want to disable certain features.
- [Continued Pretraining](/docs/basics/continued-pretraining.md): Also known as continued fine-tuning. Unsloth allows you to continually pretrain so a model can learn a new language.
- [Finetuning from Last Checkpoint](/docs/basics/finetuning-from-last-checkpoint.md): Checkpointing allows you to save your finetuning progress so you can pause it and then continue.
- [Unsloth Benchmarks](/docs/basics/unsloth-benchmarks.md): Unsloth recorded benchmarks on NVIDIA GPUs.
