Unsloth Inference

Learn how to run your finetuned model with Unsloth's faster inference.

Unsloth natively supports 2x faster inference. For our inference-only notebook, click here.

All QLoRA, LoRA and non-LoRA inference paths are 2x faster. This requires no code changes or new dependencies.

from unsloth import FastLanguageModel
from transformers import TextStreamer

max_seq_length = 2048 # Should match the value used during training
dtype = None          # Auto-detect; set torch.float16 or torch.bfloat16 to force
load_in_4bit = True   # Load the model in 4-bit (QLoRA)

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

inputs = tokenizer(["Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,"], return_tensors = "pt").to("cuda")
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)
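
The snippet above feeds a raw text prompt. If your finetune was trained on a chat format, you will usually want to build the prompt with the tokenizer's chat template instead. Below is a minimal sketch assuming a standard Hugging Face apply_chat_template and an instruct-style model; the example message is illustrative only.

# Build a chat-formatted prompt before generating (hypothetical example prompt)
messages = [
    {"role": "user", "content": "Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Append the assistant turn marker so the model responds
    return_tensors = "pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(input_ids = input_ids, streamer = text_streamer, max_new_tokens = 64)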

NotImplementedError: A UTF-8 locale is required. Got ANSI

Sometimes when you execute a cell, this error can appear. To solve it, run the following in a new cell:

import locale
locale.getpreferredencoding = lambda: "UTF-8" # Force UTF-8 as the preferred encoding to fix the ANSI locale error
