# Unsloth Inference

Unsloth natively supports 2x faster inference. For an inference-only notebook, click [here](https://colab.research.google.com/drive/1aqlNQi7MMJbynFDyOQteD2t0yVfjb9Zh?usp=sharing).

All QLoRA, LoRA, and non-LoRA inference paths are 2x faster. This requires no code changes and no new dependencies.

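<pre class="language-python"><code class="lang-python"><strong>from unsloth import FastLanguageModel
</strong>from transformers import TextStreamer

# max_seq_length, dtype, and load_in_4bit are assumed to be defined earlier,
# for example in your training setup.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lora_model", # The model you used for training
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

# Illustrative placeholder prompt; replace it with your own input.
inputs = tokenizer(["Your prompt here"], return_tensors = "pt").to("cuda")
text_streamer = TextStreamer(tokenizer) # Prints tokens as they are generated
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)
</code></pre>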

#### NotImplementedError: A UTF-8 locale is required. Got ANSI

Sometimes when you execute a cell, [this error](https://github.com/googlecolab/colabtools/issues/3409) can appear. To fix it, run the following in a new cell:

```python
import locale
locale.getpreferredencoding = lambda: "UTF-8"
```
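The override above can also be wrapped in a small helper that reports what changed. The sketch below is only an illustration and not part of Unsloth; `force_utf8_locale` is a hypothetical name. It assumes you may have callers that pass the optional `do_setlocale` argument of the standard `locale.getpreferredencoding`, so the replacement accepts and ignores any arguments.

```python
import locale

def force_utf8_locale():
    """Hypothetical helper: make locale.getpreferredencoding always report UTF-8.

    Returns the encoding reported before the override and the one reported after.
    """
    before = locale.getpreferredencoding()
    # Accept and ignore any arguments (such as do_setlocale) so that
    # existing callers keep working after the override.
    locale.getpreferredencoding = lambda *args, **kwargs: "UTF-8"
    after = locale.getpreferredencoding()
    return before, after

before, after = force_utf8_locale()
print(f"preferred encoding before: {before!r}, after: {after!r}")
```

As with the two-line version, the override only lasts for the current Python session, so it would need to be run again after a runtime restart.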

