# Basics

- [How to use Unsloth as an API endpoint](https://unsloth.ai/docs/basics/api.md)
- [Inference & Deployment](https://unsloth.ai/docs/basics/inference-and-deployment.md): Learn how to save your finetuned model so you can run it in your favorite inference engine.
- [Saving to GGUF](https://unsloth.ai/docs/basics/inference-and-deployment/saving-to-gguf.md)
- [Speculative Decoding](https://unsloth.ai/docs/basics/inference-and-deployment/saving-to-gguf/speculative-decoding.md): Speculative Decoding with llama-server, llama.cpp, vLLM and more for 2x faster inference
- [vLLM Deployment & Inference Guide](https://unsloth.ai/docs/basics/inference-and-deployment/vllm-guide.md): Guide on saving and deploying LLMs to vLLM for serving LLMs in production
- [vLLM Engine Arguments](https://unsloth.ai/docs/basics/inference-and-deployment/vllm-guide/vllm-engine-arguments.md)
- [LoRA Hot Swapping Guide](https://unsloth.ai/docs/basics/inference-and-deployment/vllm-guide/lora-hot-swapping-guide.md)
- [Saving to Ollama](https://unsloth.ai/docs/basics/inference-and-deployment/saving-to-ollama.md)
- [Deploying models to LM Studio](https://unsloth.ai/docs/basics/inference-and-deployment/lm-studio.md): Saving models to GGUF so you can run and deploy them to LM Studio
- [How to install LM Studio CLI in Linux Terminal](https://unsloth.ai/docs/basics/inference-and-deployment/lm-studio/how-to-install-lm-studio-cli-in-linux-terminal.md): Guide to installing the LM Studio CLI in a terminal instance, without a UI.
- [SGLang Deployment & Inference Guide](https://unsloth.ai/docs/basics/inference-and-deployment/sglang-guide.md): Guide on saving and deploying LLMs to SGLang for serving LLMs in production
- [Unsloth Inference](https://unsloth.ai/docs/basics/inference-and-deployment/unsloth-inference.md): Learn how to run your finetuned model with Unsloth's faster inference.
- [llama-server & OpenAI endpoint Deployment Guide](https://unsloth.ai/docs/basics/inference-and-deployment/llama-server-and-openai-endpoint.md): Deploying via llama-server with an OpenAI compatible endpoint
- [How to Run and Deploy LLMs on your iOS or Android Phone](https://unsloth.ai/docs/basics/inference-and-deployment/deploy-llms-phone.md): Tutorial for fine-tuning your own LLM and deploying it on your Android or iPhone with ExecuTorch.
- [Troubleshooting Inference](https://unsloth.ai/docs/basics/inference-and-deployment/troubleshooting-inference.md): If you're experiencing issues when running or saving your model.
- [Deploying LLMs with Hugging Face Jobs](https://unsloth.ai/docs/basics/inference-and-deployment/deploying-llms-with-hugging-face-jobs.md): Using Hugging Face Jobs and Skills to fine-tune LFM with Codex / Claude Code via a SKILL.
- [How to Run Local LLMs with Claude Code](https://unsloth.ai/docs/basics/claude-code.md): Guide to use open models with Claude Code on your local device.
- [How to Run Local LLMs with OpenAI Codex](https://unsloth.ai/docs/basics/codex.md): Use open models with OpenAI Codex on your device locally.
- [Multi-GPU Fine-tuning with Unsloth](https://unsloth.ai/docs/basics/multi-gpu-training-with-unsloth.md): Learn how to fine-tune LLMs across multiple GPUs with parallelism using Unsloth.
- [Multi-GPU Fine-tuning with Distributed Data Parallel (DDP)](https://unsloth.ai/docs/basics/multi-gpu-training-with-unsloth/ddp.md): Learn how to use the Unsloth CLI to train on multiple GPUs with Distributed Data Parallel (DDP)!
- [Fine-tuning Embedding Models with Unsloth Guide](https://unsloth.ai/docs/basics/embedding-finetuning.md): Learn how to easily fine-tune embedding models with Unsloth.
- [Fine-tune MoE Models 12x Faster with Unsloth](https://unsloth.ai/docs/basics/faster-moe.md): Guide to training MoE LLMs locally with Unsloth.
- [Text-to-Speech (TTS) Fine-tuning Guide](https://unsloth.ai/docs/basics/text-to-speech-tts-fine-tuning.md): Learn how to fine-tune TTS & STT voice models with Unsloth.
- [Unsloth Dynamic 2.0 GGUFs](https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs.md): A big new upgrade to our Dynamic Quants!
- [Unsloth Dynamic GGUFs on Aider Polyglot](https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs/unsloth-dynamic-ggufs-on-aider-polyglot.md): Performance of Unsloth Dynamic GGUFs on Aider Polyglot Benchmarks
- [Tool Calling Guide for Local LLMs](https://unsloth.ai/docs/basics/tool-calling-guide-for-local-llms.md)
- [Vision Fine-tuning](https://unsloth.ai/docs/basics/vision-fine-tuning.md): Learn how to fine-tune vision/multimodal LLMs with Unsloth
- [Troubleshooting & FAQs](https://unsloth.ai/docs/basics/troubleshooting-and-faqs.md): Tips to solve issues, and frequently asked questions.
- [Hugging Face Hub, XET debugging](https://unsloth.ai/docs/basics/troubleshooting-and-faqs/hugging-face-hub-xet-debugging.md): Debugging, troubleshooting stalled, stuck downloads and slow downloads
- [Chat Templates](https://unsloth.ai/docs/basics/chat-templates.md): Learn the fundamentals and customization options of chat templates, including Conversational, ChatML, ShareGPT, Alpaca formats, and more!
- [Unsloth Environment Flags](https://unsloth.ai/docs/basics/unsloth-environment-flags.md): Advanced flags that may be useful if your finetunes are breaking or you want to disable specific features.
- [Continued Pretraining](https://unsloth.ai/docs/basics/continued-pretraining.md): Also known as Continued Finetuning. Unsloth allows you to continue pretraining so a model can learn a new language.
- [Finetuning from Last Checkpoint](https://unsloth.ai/docs/basics/finetuning-from-last-checkpoint.md): Checkpointing allows you to save your finetuning progress so you can pause it and then continue.
- [Unsloth Benchmarks](https://unsloth.ai/docs/basics/unsloth-benchmarks.md): Unsloth recorded benchmarks on NVIDIA GPUs.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://unsloth.ai/docs/basics.md?ask=<question>
```

The question should be specific, self-contained, written in natural language, and URL-encoded.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when:

- the answer is not explicitly present in the current page,
- you need clarification or additional context, or
- you want to retrieve related documentation sections.
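As a minimal sketch, the request above can be issued from Python's standard library; the question text here is purely illustrative, and `build_ask_url` is a hypothetical helper, not part of any Unsloth API:

```python
from urllib.parse import quote

def build_ask_url(question: str) -> str:
    """Build the documentation query URL, URL-encoding the question."""
    return f"https://unsloth.ai/docs/basics.md?ask={quote(question)}"

url = build_ask_url("How do I export a finetuned model to GGUF?")
print(url)

# To actually perform the GET request (requires network access):
# import urllib.request
# answer = urllib.request.urlopen(url).read().decode("utf-8")
```

The response body is plain text containing the answer along with relevant excerpts and sources.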
