# Unsloth Documentation

## 🇺🇸 English

- [Unsloth Docs](/docs/get-started/readme.md): Unsloth is an open-source framework for running and training models.
- [Fine-tuning for Beginners](/docs/get-started/fine-tuning-for-beginners.md)
- [Unsloth Requirements](/docs/get-started/fine-tuning-for-beginners/unsloth-requirements.md): Here are Unsloth's requirements including system and GPU VRAM requirements.
- [FAQ + Is Fine-tuning Right For Me?](/docs/get-started/fine-tuning-for-beginners/faq-+-is-fine-tuning-right-for-me.md): If you're unsure whether fine-tuning is right for you, see here! Learn about fine-tuning misconceptions, how it compares to RAG, and more.
- [Unsloth Notebooks](/docs/get-started/unsloth-notebooks.md): Fine-tuning notebooks: Explore the Unsloth catalog.
- [Unsloth Model Catalog](/docs/get-started/unsloth-model-catalog.md)
- [Unsloth Installation](/docs/get-started/install.md): Learn to install Unsloth locally or online.
- [Install Unsloth via pip and uv](/docs/get-started/install/pip-install.md): To install Unsloth locally via pip or uv, follow the steps below.
- [Install Unsloth on macOS](/docs/get-started/install/mac.md)
- [How to Fine-Tune LLMs on Windows with Unsloth (Step-by-Step Guide)](/docs/get-started/install/windows-installation.md): See how to install Unsloth on Windows to start fine-tuning LLMs locally.
- [Install Unsloth via Docker](/docs/get-started/install/docker.md): Install Unsloth using our official Docker container
- [Updating Unsloth](/docs/get-started/install/updating.md): To update Unsloth or use an older version, follow the steps below.
- [Fine-tuning LLMs on AMD GPUs with Unsloth Guide](/docs/get-started/install/amd.md): Learn how to fine-tune large language models (LLMs) on AMD GPUs with Unsloth.
- [Fine-tuning LLMs on Intel GPUs with Unsloth](/docs/get-started/install/intel.md): Learn how to train and fine-tune large language models on Intel GPUs.
- [Fine-tuning LLMs Guide](/docs/get-started/fine-tuning-llms-guide.md): Learn all the basics and best practices of fine-tuning. Beginner-friendly.
- [Datasets Guide](/docs/get-started/fine-tuning-llms-guide/datasets-guide.md): Learn how to create & prepare a dataset for fine-tuning.
- [LoRA fine-tuning Hyperparameters Guide](/docs/get-started/fine-tuning-llms-guide/lora-hyperparameters-guide.md): Learn step-by-step the best LLM fine-tuning settings - LoRA rank & alpha, epochs, batch size + gradient accumulation, QLoRA vs. LoRA, target modules, and more.
- [What Model Should I Use for Fine-tuning?](/docs/get-started/fine-tuning-llms-guide/what-model-should-i-use.md)
- [Tutorial: How to Finetune Llama-3 and Use In Ollama](/docs/get-started/fine-tuning-llms-guide/tutorial-how-to-finetune-llama-3-and-use-in-ollama.md): Beginner's Guide for creating a customized personal assistant (like ChatGPT) to run locally on Ollama
- [Reinforcement Learning (RL) Guide](/docs/get-started/reinforcement-learning-rl-guide.md): Learn all about Reinforcement Learning (RL) and how to train your own DeepSeek-R1 reasoning model with Unsloth using GRPO. A complete guide from beginner to advanced.
- [Reinforcement Learning GRPO with 7x Longer Context](/docs/get-started/reinforcement-learning-rl-guide/grpo-long-context.md): Learn how Unsloth enables ultra long context RL fine-tuning.
- [Vision Reinforcement Learning (VLM RL)](/docs/get-started/reinforcement-learning-rl-guide/vision-reinforcement-learning-vlm-rl.md): Train Vision/multimodal models via GRPO and RL with Unsloth!
- [FP8 Reinforcement Learning](/docs/get-started/reinforcement-learning-rl-guide/fp8-reinforcement-learning.md): Train reinforcement learning (RL) and GRPO in FP8 precision with Unsloth.
- [Tutorial: Train your own Reasoning model with GRPO](/docs/get-started/reinforcement-learning-rl-guide/tutorial-train-your-own-reasoning-model-with-grpo.md): Beginner's Guide to transforming a model like Llama 3.1 (8B) into a reasoning model by using Unsloth and GRPO.
- [Advanced Reinforcement Learning Documentation](/docs/get-started/reinforcement-learning-rl-guide/advanced-rl-documentation.md): Advanced settings and documentation for using Unsloth with GRPO.
- [GSPO Reinforcement Learning](/docs/get-started/reinforcement-learning-rl-guide/advanced-rl-documentation/gspo-reinforcement-learning.md): Train with GSPO (Group Sequence Policy Optimization) RL in Unsloth.
- [RL Reward Hacking](/docs/get-started/reinforcement-learning-rl-guide/advanced-rl-documentation/rl-reward-hacking.md): Learn what reward hacking is in reinforcement learning and how to counter it.
- [FP16 vs BF16 for RL](/docs/get-started/reinforcement-learning-rl-guide/advanced-rl-documentation/fp16-vs-bf16-for-rl.md): "Defeating the Training-Inference Mismatch via FP16" (https://arxiv.org/pdf/2510.26788) shows how using float16 is better than bfloat16 for RL.
- [Memory Efficient RL](/docs/get-started/reinforcement-learning-rl-guide/memory-efficient-rl.md)
- [Preference Optimization Training - DPO, ORPO & KTO](/docs/get-started/reinforcement-learning-rl-guide/preference-dpo-orpo-and-kto.md): Learn about preference alignment fine-tuning with DPO, ORPO, or KTO via Unsloth by following the steps below.
- [Introducing Unsloth Studio](/docs/new/studio.md): Run and train AI models locally with Unsloth Studio.
- [Get started with Unsloth Studio](/docs/new/studio/start.md): A guide for getting started with the fine-tuning studio, data recipes, model exporting, and chat.
- [How to Run models with Unsloth Studio](/docs/new/studio/chat.md): Run AI models, LLMs and GGUFs locally with Unsloth Studio.
- [Unsloth Studio Installation](/docs/new/studio/install.md): Learn how to install Unsloth Studio on your local device.
- [Unsloth Data Recipes](/docs/new/studio/data-recipe.md): Learn how to create, build and edit datasets with Unsloth Studio's Data Recipes.
- [Export models with Unsloth Studio](/docs/new/studio/export.md): Learn how to export your safetensors or LoRA model files to GGUF or other formats.
- [Unsloth Updates](/docs/new/changelog.md): Unsloth Changelog for our latest releases, improvements and fixes.
- [Qwen3.6 - How to Run Locally](/docs/models/qwen3.6.md): Run the new Qwen3.6-35B-A3B model locally!
- [Gemma 4 - How to Run Locally](/docs/models/gemma-4.md): Run Google’s new Gemma 4 models locally, including E2B, E4B, 26B A4B, and 31B.
- [Gemma 4 Fine-tuning Guide](/docs/models/gemma-4/train.md): Train Gemma 4 by Google with Unsloth.
- [Qwen3.5 - How to Run Locally](/docs/models/qwen3.5.md): Run the new Qwen3.5 LLMs including Medium: Qwen3.5-35B-A3B, 27B, 122B-A10B, Small: Qwen3.5-0.8B, 2B, 4B, 9B and 397B-A17B on your local device!
- [Qwen3.5 Fine-tuning Guide](/docs/models/qwen3.5/fine-tune.md): Learn how to fine-tune Qwen3.5 LLMs with Unsloth.
- [Qwen3.5 GGUF Benchmarks](/docs/models/qwen3.5/gguf-benchmarks.md): See how Unsloth Dynamic GGUFs perform + analysis of perplexity, KL divergence & MXFP4.
- [GLM-5.1 - How to Run Locally](/docs/models/glm-5.1.md): Run the new GLM-5.1 model by Z.ai on your own local device!
- [MiniMax-M2.7 - How to Run Locally](/docs/models/minimax-m27.md): Run MiniMax-M2.7 LLM locally on your own device!
- [NVIDIA Nemotron 3 Nano - How To Run Guide](/docs/models/nemotron-3.md): Run & fine-tune NVIDIA Nemotron 3 Nano locally on your device!
- [NVIDIA Nemotron-3-Super: How To Run Guide](/docs/models/nemotron-3/nemotron-3-super.md): Run & fine-tune NVIDIA Nemotron-3-Super-120B-A12B locally on your device!
- [Qwen3-Coder-Next: How to Run Locally](/docs/models/qwen3-coder-next.md): Guide to run Qwen3-Coder-Next locally on your device!
- [GLM-4.7-Flash: How To Run Locally](/docs/models/glm-4.7-flash.md): Run & fine-tune GLM-4.7-Flash locally on your device!
- [Kimi K2.5: How to Run Locally Guide](/docs/models/kimi-k2.5.md): Guide on running Kimi-K2.5 on your own local device!
- [gpt-oss: How to Run Guide](/docs/models/gpt-oss-how-to-run-and-fine-tune.md): Run & fine-tune OpenAI's new open-source models!
- [gpt-oss Reinforcement Learning](/docs/models/gpt-oss-how-to-run-and-fine-tune/gpt-oss-reinforcement-learning.md)
- [Tutorial: How to Train gpt-oss with RL](/docs/models/gpt-oss-how-to-run-and-fine-tune/gpt-oss-reinforcement-learning/tutorial-how-to-train-gpt-oss-with-rl.md): Learn to train OpenAI gpt-oss with GRPO to autonomously beat 2048 locally or on Colab.
- [Tutorial: How to Fine-tune gpt-oss](/docs/models/gpt-oss-how-to-run-and-fine-tune/tutorial-how-to-fine-tune-gpt-oss.md): Learn step-by-step how to train OpenAI gpt-oss locally with Unsloth.
- [Long Context gpt-oss Training](/docs/models/gpt-oss-how-to-run-and-fine-tune/long-context-gpt-oss-training.md)
- [Large language model (LLMs) Tutorials](/docs/models/tutorials.md): Discover the latest LLMs and learn how to run and fine-tune models locally for optimal performance with Unsloth.
- [GLM-5: How to Run Locally Guide](/docs/models/tutorials/glm-5.md): Run the new GLM-5 model by Z.ai on your own local device!
- [Qwen3 - How to Run & Fine-tune](/docs/models/tutorials/qwen3-how-to-run-and-fine-tune.md): Learn to run & fine-tune Qwen3 locally with Unsloth + our Dynamic 2.0 quants.
- [Qwen3-VL: How to Run Guide](/docs/models/tutorials/qwen3-how-to-run-and-fine-tune/qwen3-vl-how-to-run-and-fine-tune.md): Learn to fine-tune and run Qwen3-VL locally with Unsloth.
- [Qwen3-2507: Run Locally Guide](/docs/models/tutorials/qwen3-how-to-run-and-fine-tune/qwen3-2507.md): Run Qwen3-30B-A3B-2507 and 235B-A22B Thinking and Instruct versions locally on your device!
- [MiniMax-M2.5: How to Run Guide](/docs/models/tutorials/minimax-m25.md): Run MiniMax-M2.5 locally on your own device!
- [Qwen3-Coder: How to Run Locally](/docs/models/tutorials/qwen3-coder-how-to-run-locally.md): Run Qwen3-Coder-30B-A3B-Instruct and 480B-A35B locally with Unsloth Dynamic quants.
- [Gemma 3 - How to Run Guide](/docs/models/tutorials/gemma-3-how-to-run-and-fine-tune.md): How to run Gemma 3 effectively with our GGUFs on llama.cpp, Ollama, Open WebUI and how to fine-tune with Unsloth!
- [Gemma 3n: How to Run & Fine-tune](/docs/models/tutorials/gemma-3-how-to-run-and-fine-tune/gemma-3n-how-to-run-and-fine-tune.md): Run Google's new Gemma 3n locally with Dynamic GGUFs on llama.cpp, Ollama, Open WebUI and fine-tune with Unsloth!
- [DeepSeek-OCR 2: How to Run & Fine-tune Guide](/docs/models/tutorials/deepseek-ocr-2.md): Guide on how to run and fine-tune DeepSeek-OCR-2 locally.
- [GLM-4.7: How to Run Locally Guide](/docs/models/tutorials/glm-4.7.md): A guide on how to run Z.ai GLM-4.7 model on your own local device!
- [How to Run Qwen-Image-2512 Locally in ComfyUI](/docs/models/tutorials/qwen-image-2512.md): Step-by-step tutorial for running Qwen-Image-2512 on your local device with ComfyUI.
- [Run Qwen-Image-2512 in stable-diffusion.cpp Tutorial](/docs/models/tutorials/qwen-image-2512/stable-diffusion.cpp.md): Tutorial for using Qwen-Image-2512 in stable-diffusion.cpp.
- [Devstral 2 - How to Run Guide](/docs/models/tutorials/devstral-2.md): Guide for running Mistral Devstral 2 models locally: 123B-Instruct-2512 and Small-2-24B-Instruct-2512.
- [Ministral 3 - How to Run Guide](/docs/models/tutorials/ministral-3.md): Guide to running or fine-tuning Mistral Ministral 3 models locally on your device.
- [DeepSeek-OCR: How to Run & Fine-tune](/docs/models/tutorials/deepseek-ocr-how-to-run-and-fine-tune.md): Guide on how to run and fine-tune DeepSeek-OCR locally.
- [Kimi K2 Thinking: Run Locally Guide](/docs/models/tutorials/kimi-k2-thinking-how-to-run-locally.md): Guide on running Kimi-K2-Thinking and Kimi-K2 on your own local device!
- [GLM-4.6: Run Locally Guide](/docs/models/tutorials/glm-4.6-how-to-run-locally.md): A guide on how to run Z.ai GLM-4.6 and GLM-4.6V-Flash model on your own local device!
- [Qwen3-Next: Run Locally Guide](/docs/models/tutorials/qwen3-next.md): Run Qwen3-Next-80B-A3B-Instruct and Thinking versions locally on your device!
- [FunctionGemma: How to Run & Fine-tune](/docs/models/tutorials/functiongemma.md): Learn how to run and fine-tune FunctionGemma locally on your device and phone.
- [DeepSeek-V3.1: How to Run Locally](/docs/models/tutorials/deepseek-v3.1-how-to-run-locally.md): A guide on how to run DeepSeek-V3.1 and Terminus on your own local device!
- [DeepSeek-R1-0528: How to Run Locally](/docs/models/tutorials/deepseek-r1-0528-how-to-run-locally.md): A guide on how to run DeepSeek-R1-0528 including Qwen3 on your own local device!
- [Liquid LFM2.5: How To Run & Fine-tune](/docs/models/tutorials/lfm2.5.md): Run and fine-tune LFM2.5 Instruct and Vision locally on your device!
- [Magistral: How to Run & Fine-tune](/docs/models/tutorials/magistral-how-to-run-and-fine-tune.md): Meet Magistral - Mistral's new reasoning models.
- [IBM Granite 4.0](/docs/models/tutorials/ibm-granite-4.0.md): How to run IBM Granite-4.0 with Unsloth GGUFs on llama.cpp, Ollama and how to fine-tune!
- [Llama 4: How to Run & Fine-tune](/docs/models/tutorials/llama-4-how-to-run-and-fine-tune.md): How to run Llama 4 locally using our dynamic GGUFs, which recover accuracy compared to standard quantization.
- [Grok 2](/docs/models/tutorials/grok-2.md): Run xAI's Grok 2 model locally!
- [Devstral: How to Run & Fine-tune](/docs/models/tutorials/devstral-how-to-run-and-fine-tune.md): Run and fine-tune Mistral Devstral 1.1, including Small-2507 and 2505.
- [How to Run Local LLMs with Docker: Step-by-Step Guide](/docs/models/tutorials/how-to-run-llms-with-docker.md): Learn how to run Large Language Models (LLMs) with Docker & Unsloth on your local device.
- [DeepSeek-V3-0324: How to Run Locally](/docs/models/tutorials/deepseek-v3-0324-how-to-run-locally.md): How to run DeepSeek-V3-0324 locally using our dynamic quants, which recover accuracy.
- [DeepSeek-R1: How to Run Locally](/docs/models/tutorials/deepseek-r1-how-to-run-locally.md): A guide on how you can run our 1.58-bit Dynamic Quants for DeepSeek-R1 using llama.cpp.
- [DeepSeek-R1 Dynamic 1.58-bit](/docs/models/tutorials/deepseek-r1-how-to-run-locally/deepseek-r1-dynamic-1.58-bit.md): See performance comparison tables for Unsloth's Dynamic GGUF Quants vs Standard IMatrix Quants.
- [Phi-4 Reasoning: How to Run & Fine-tune](/docs/models/tutorials/phi-4-reasoning-how-to-run-and-fine-tune.md): Learn to run & fine-tune Phi-4 reasoning models locally with Unsloth + our Dynamic 2.0 quants.
- [QwQ-32B: How to Run effectively](/docs/models/tutorials/qwq-32b-how-to-run-effectively.md): How to run QwQ-32B effectively with our bug fixes and without endless generations + GGUFs.
- [Inference & Deployment](/docs/basics/inference-and-deployment.md): Learn how to save your finetuned model so you can run it in your favorite inference engine.
- [Saving to GGUF](/docs/basics/inference-and-deployment/saving-to-gguf.md): Saving models to 16-bit for GGUF so you can use them in Ollama, Jan AI, Open WebUI, and more!
- [Speculative Decoding](/docs/basics/inference-and-deployment/saving-to-gguf/speculative-decoding.md): Speculative Decoding with llama-server, llama.cpp, vLLM and more for 2x faster inference
- [vLLM Deployment & Inference Guide](/docs/basics/inference-and-deployment/vllm-guide.md): Guide on saving and deploying LLMs to vLLM for serving LLMs in production
- [vLLM Engine Arguments](/docs/basics/inference-and-deployment/vllm-guide/vllm-engine-arguments.md)
- [LoRA Hot Swapping Guide](/docs/basics/inference-and-deployment/vllm-guide/lora-hot-swapping-guide.md)
- [Saving to Ollama](/docs/basics/inference-and-deployment/saving-to-ollama.md)
- [Deploying models to LM Studio](/docs/basics/inference-and-deployment/lm-studio.md): Saving models to GGUF so you can run and deploy them to LM Studio
- [How to install LM Studio CLI in Linux Terminal](/docs/basics/inference-and-deployment/lm-studio/how-to-install-lm-studio-cli-in-linux-terminal.md): LM Studio CLI installation guide without a UI in a terminal instance.
- [SGLang Deployment & Inference Guide](/docs/basics/inference-and-deployment/sglang-guide.md): Guide on saving and deploying LLMs to SGLang for serving LLMs in production
- [llama-server & OpenAI endpoint Deployment Guide](/docs/basics/inference-and-deployment/llama-server-and-openai-endpoint.md): Deploying via llama-server with an OpenAI compatible endpoint
- [How to Run and Deploy LLMs on your iOS or Android Phone](/docs/basics/inference-and-deployment/deploy-llms-phone.md): Tutorial for fine-tuning your own LLM and deploying it on your Android or iPhone with ExecuTorch.
- [Troubleshooting Inference](/docs/basics/inference-and-deployment/troubleshooting-inference.md): Solutions for issues you may experience when running or saving your model.
- [How to Run Local LLMs with Claude Code](/docs/basics/claude-code.md): Guide to use open models with Claude Code on your local device.
- [How to Run Local LLMs with OpenAI Codex](/docs/basics/codex.md): Use open models with OpenAI Codex on your device locally.
- [Multi-GPU Fine-tuning with Unsloth](/docs/basics/multi-gpu-training-with-unsloth.md): Learn how to fine-tune LLMs on multiple GPUs and parallelism with Unsloth.
- [Multi-GPU Fine-tuning with Distributed Data Parallel (DDP)](/docs/basics/multi-gpu-training-with-unsloth/ddp.md): Learn how to use the Unsloth CLI to train on multiple GPUs with Distributed Data Parallel (DDP)!
- [Fine-tuning Embedding Models with Unsloth Guide](/docs/basics/embedding-finetuning.md): Learn how to easily fine-tune embedding models with Unsloth.
- [Fine-tune MoE Models 12x Faster with Unsloth](/docs/basics/faster-moe.md): Train MoE LLMs locally using Unsloth Guide.
- [Text-to-Speech (TTS) Fine-tuning Guide](/docs/basics/text-to-speech-tts-fine-tuning.md): Learn how to fine-tune TTS & STT voice models with Unsloth.
- [Unsloth Dynamic 2.0 GGUFs](/docs/basics/unsloth-dynamic-2.0-ggufs.md): A big new upgrade to our Dynamic Quants!
- [Unsloth Dynamic GGUFs on Aider Polyglot](/docs/basics/unsloth-dynamic-2.0-ggufs/unsloth-dynamic-ggufs-on-aider-polyglot.md): Performance of Unsloth Dynamic GGUFs on Aider Polyglot Benchmarks
- [Tool Calling Guide for Local LLMs](/docs/basics/tool-calling-guide-for-local-llms.md)
- [Vision Fine-tuning](/docs/basics/vision-fine-tuning.md): Learn how to fine-tune vision/multimodal LLMs with Unsloth
- [Troubleshooting & FAQs](/docs/basics/troubleshooting-and-faqs.md): Tips to solve issues, and frequently asked questions.
- [Hugging Face Hub, XET debugging](/docs/basics/troubleshooting-and-faqs/hugging-face-hub-xet-debugging.md): Debugging and troubleshooting stalled, stuck, or slow downloads.
- [Chat Templates](/docs/basics/chat-templates.md): Learn the fundamentals and customization options of chat templates, including Conversational, ChatML, ShareGPT, Alpaca formats, and more!
- [Unsloth Environment Flags](/docs/basics/unsloth-environment-flags.md): Advanced flags which may be useful if you see breaking fine-tunes or want to disable certain features.
- [Continued Pretraining](/docs/basics/continued-pretraining.md): Also known as continued fine-tuning. Unsloth allows you to continually pretrain a model so it can learn a new language.
- [Finetuning from Last Checkpoint](/docs/basics/finetuning-from-last-checkpoint.md): Checkpointing allows you to save your finetuning progress so you can pause it and then continue.
- [Unsloth Benchmarks](/docs/basics/unsloth-benchmarks.md): Unsloth recorded benchmarks on NVIDIA GPUs.
- [3x Faster LLM Training with Unsloth Kernels + Packing](/docs/blog/3x-faster-training-packing.md): Learn how Unsloth increases training throughput and eliminates padding waste for fine-tuning.
- [500K Context Length Fine-tuning](/docs/blog/500k-context-length-fine-tuning.md): Learn how to enable >500K token context window fine-tuning with Unsloth.
- [Quantization-Aware Training (QAT)](/docs/blog/quantization-aware-training-qat.md): Quantize models to 4-bit with Unsloth and PyTorch to recover accuracy.
- [Fine-Tuning LLMs on NVIDIA DGX Station with Unsloth](/docs/blog/dgx-station.md): NVIDIA DGX Station tutorial on how to fine-tune with notebooks from Unsloth.
- [How to Fine-tune LLMs with Unsloth & Docker](/docs/blog/how-to-fine-tune-llms-with-unsloth-and-docker.md): Learn how to fine-tune LLMs or do Reinforcement Learning (RL) with Unsloth's Docker image.
- [Fine-tuning LLMs with NVIDIA DGX Spark and Unsloth](/docs/blog/fine-tuning-llms-with-nvidia-dgx-spark-and-unsloth.md): Tutorial on how to fine-tune and do reinforcement learning (RL) with OpenAI gpt-oss on NVIDIA DGX Spark.
- [Fine-tuning LLMs with Blackwell, RTX 50 series & Unsloth](/docs/blog/fine-tuning-llms-with-blackwell-rtx-50-series-and-unsloth.md): Learn how to fine-tune LLMs on NVIDIA's Blackwell RTX 50 series and B200 GPUs with our step-by-step guide.
