💧Liquid LFM2.5: How to Run & Fine-tune

Run and fine-tune LFM2.5 Instruct and Vision locally on your device!

Liquid AI has released LFM2.5, including its instruct and vision models. LFM2.5-1.2B-Instruct is a hybrid reasoning model with 1.17B parameters, trained on 28T tokens and with RL. It delivers best-in-class performance in the 1B class for instruction following, tool use, and agentic tasks.

LFM2.5 runs in under 1 GB of RAM and reaches 239 tok/s decoding on an AMD CPU. You can also fine-tune it locally with Unsloth.

Text LFM2.5-Instruct Vision LFM2.5-VL

Dynamic GGUFs: LFM2.5-1.2B-Instruct-GGUF

16-bit Instruct: LFM2.5-1.2B-Instruct

Model specifications:

Parameters: 1.17B
Architecture: 16 layers (10 double-gated LIV convolution blocks + 6 GQA blocks)
Training budget: 28T tokens
Context length: 32,768 tokens
Vocabulary size: 65,536
Languages: English, Arabic, Chinese, French, German, Japanese, Korean, Spanish

⚙️ Usage Guide

Liquid AI recommends these settings for inference:

temperature = 0.1
top_k = 50
top_p = 0.1
repetition_penalty = 1.05
Maximum context length: 32,768
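To see why these settings make decoding almost greedy, here is a simplified pure-Python sketch that applies them to a toy logit vector. The order it uses is repetition penalty, then temperature, then top-k, then top-p, which roughly follows Hugging Face's logits processors. This is not llama.cpp's exact sampler chain. The 4-token vocabulary and all helper names are hypothetical.

```python
import math

def apply_repetition_penalty(logits, previous_tokens, penalty):
    """CTRL-style penalty: make already generated tokens less likely."""
    out = list(logits)
    for tok in set(previous_tokens):
        # positive logits are divided, negative ones multiplied, so both shrink in probability
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

def softmax(values):
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def filtered_distribution(logits, temperature=0.1, top_k=50, top_p=0.1):
    """Return {token_id: probability} after temperature, top-k and top-p filtering."""
    probs = softmax([x / temperature for x in logits])
    # top-k: keep the k most likely tokens
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

toy_logits = [2.0, 1.9, 0.5, -1.0]  # hypothetical logits over a 4-token vocabulary
penalized = apply_repetition_penalty(toy_logits, previous_tokens=[0], penalty=1.05)
print(filtered_distribution(penalized))
```

With `temperature = 0.1` the distribution becomes very peaked. With `top_p = 0.1` the single most likely token usually clears the threshold on its own, so generation behaves close to greedy decoding.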

Chat template format

LFM2.5 uses a ChatML-like format:

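from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2.5-1.2B-Instruct")
prompt = tokenizer.apply_chat_template([
    {"role": "system", "content": "You are a helpful assistant trained by Liquid AI."},
    {"role": "user", "content": "What is C. elegans?"},
], add_generation_prompt=True, tokenize=False)
print(prompt)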

LFM2.5 chat template:

<|startoftext|><|im_start|>system
You are a helpful assistant trained by Liquid AI.<|im_end|>
<|im_start|>user
What is C. elegans?<|im_end|>
<|im_start|>assistant
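To inspect the prompt layout without downloading a tokenizer, the minimal pure-Python sketch below reproduces the string above. `render_lfm_chat` is a hypothetical helper and is not part of any library. The authoritative template is the Jinja chat template shipped with the model, which `apply_chat_template` and llama.cpp's `--jinja` use. That template may handle cases such as tool definitions differently.

```python
def render_lfm_chat(messages, add_generation_prompt=True):
    """Approximate the LFM2.5 ChatML-like prompt string for plain-text messages."""
    parts = ["<|startoftext|>"]
    for message in messages:
        # each turn: <|im_start|>role, newline, content, <|im_end|>, newline
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # open an assistant turn for the model to complete
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_lfm_chat([
    {"role": "system", "content": "You are a helpful assistant trained by Liquid AI."},
    {"role": "user", "content": "What is C. elegans?"},
])
print(prompt)
```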

Tool use

LFM2.5 supports function calling via the special tokens <|tool_call_start|> and <|tool_call_end|>. Provide the tools as a JSON list in the system prompt:

<|startoftext|><|im_start|>system
List of tools: [{"name": "get_weather", "description": "Gets the current weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}]<|im_end|>
<|im_start|>user
What is the weather like in Paris?<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_weather(city="Paris")]<|tool_call_end|>
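In the assistant turn above, the model emits the call as a Python-style list between the two special tokens. Your application then has to extract that call. The minimal sketch below assumes this exact `[name(arg=value, ...)]` shape and ignores positional arguments. `parse_tool_calls` is a hypothetical helper, and the system-prompt prefix is taken from the example. The parser uses `ast.parse` and `ast.literal_eval` rather than `eval`, so model output is never executed.

```python
import ast
import json
import re

TOOLS = [{"name": "get_weather", "description": "Gets the current weather",
          "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}]
# Tool list serialized into the system prompt, mirroring the example above
system_prompt = "List of tools: " + json.dumps(TOOLS)

TOOL_CALL_RE = re.compile(r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL)

def parse_tool_calls(text):
    """Return [(function_name, kwargs), ...] for every tool-call span in `text`."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        tree = ast.parse(body.strip(), mode="eval")  # parse only, never execute
        if not isinstance(tree.body, ast.List):
            raise ValueError("expected a list of calls, e.g. [f(x=1)]")
        for node in tree.body.elts:
            if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
                raise ValueError("unsupported tool-call syntax")
            # literal_eval accepts only literals (strings, numbers, lists, dicts, ...)
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            calls.append((node.func.id, kwargs))
    return calls

reply = '<|tool_call_start|>[get_weather(city="Paris")]<|tool_call_end|>'
print(parse_tool_calls(reply))
```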

🖥️ Run LFM2.5-1.2B-Instruct

📖 llama.cpp Guide (GGUF)

1. Build llama.cpp

Get the latest llama.cpp from GitHub. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU.

apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-server
cp llama.cpp/build/bin/llama-* llama.cpp

2. Run directly from Hugging Face

./llama.cpp/llama-cli \
    -hf LiquidAI/LFM2.5-1.2B-Instruct-GGUF:Q4_K_M \
    --jinja --ctx-size 32768 \
    --temp 0.1 --top-k 50 --top-p 0.1 --repeat-penalty 1.05

3. Or download the model first

import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"  # faster downloads; requires `pip install hf_transfer`
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="LiquidAI/LFM2.5-1.2B-Instruct-GGUF",
    local_dir="LiquidAI/LFM2.5-1.2B-Instruct-GGUF",
    allow_patterns=["*Q4_K_M*"],
)

4. Run in conversation mode

./llama.cpp/llama-cli \
    --model LiquidAI/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q4_K_M.gguf \
    --ctx-size 32768 \
    --n-gpu-layers 99 \
    --seed 3407 \
    --prio 2 \
    --temp 0.1 \
    --top-k 50 \
    --top-p 0.1 \
    --repeat-penalty 1.05 \
    --jinja

🦥 Fine-tuning LFM2.5 with Unsloth

Unsloth supports fine-tuning LFM2.5 models. The 1.2B model fits comfortably on a free Colab T4 GPU, and training runs 2x faster with 50% less VRAM.

Free Colab notebook:

LFM2.5 is recommended for agentic tasks, data extraction, RAG, and tool use. It is not recommended for knowledge-intensive tasks or programming.

Unsloth configuration for LFM2.5

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="LiquidAI/LFM2.5-1.2B-Instruct",
    max_seq_length=4096,
    load_in_4bit=False,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules = ["q_proj", "k_proj", "v_proj", "out_proj", "in_proj",
                      "w1", "w2", "w3"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)
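With `r=16` and `lora_alpha=16`, the low-rank update is scaled by alpha / r = 1. The arithmetic below is a generic LoRA sketch, not Unsloth code. It estimates how many trainable parameters each targeted linear layer gains. The 2048 hidden size is a hypothetical placeholder and not a confirmed LFM2.5 dimension. Check `model.config` for the real sizes, or call PEFT's `model.print_trainable_parameters()` for the exact total.

```python
def lora_added_params(in_features, out_features, r):
    """LoRA adds A (r x in_features) and B (out_features x r) beside a frozen weight."""
    return r * (in_features + out_features)

def lora_scaling(lora_alpha, r):
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return lora_alpha / r

# Hypothetical square projection; not the verified LFM2.5 hidden size
hidden = 2048
print(lora_added_params(hidden, hidden, r=16))
print(lora_scaling(lora_alpha=16, r=16))
```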

Training setup

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,  # assumed: a datasets.Dataset with a "text" column, prepared beforehand
    dataset_text_field="text",
    max_seq_length=4096,
    dataset_num_proc=2,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

trainer.train()
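On a single GPU, these arguments give each optimizer step 2 × 4 = 8 sequences, so 60 steps cover 480 examples. The sketch below reproduces the shape of the `linear` schedule with 5 warmup steps. It mirrors the formula behind transformers' `get_linear_schedule_with_warmup`. `linear_lr` is a hypothetical helper meant only for inspecting the schedule.

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=5, max_steps=60):
    """Learning rate at `step`: linear warmup from 0, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))

per_device_batch, grad_accum, num_gpus = 2, 4, 1  # values from TrainingArguments above
effective_batch = per_device_batch * grad_accum * num_gpus
examples_seen = effective_batch * 60  # max_steps=60

for step in (0, 1, 5, 30, 60):
    print(step, linear_lr(step))
print("effective batch:", effective_batch, "examples seen:", examples_seen)
```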

Saving and export

# Save the LoRA adapters
model.save_pretrained("lfm25_lora")
tokenizer.save_pretrained("lfm25_lora")

# Merge and save in 16-bit
model.save_pretrained_merged("lfm25_merged", tokenizer, save_method="merged_16bit")

# Export to GGUF
model.save_pretrained_gguf("lfm25_gguf", tokenizer, quantization_method="q4_k_m")
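Before you load an exported file into llama.cpp, you can sanity-check it by reading the GGUF header. A GGUF file starts with the 4-byte magic `GGUF` followed by a uint32 format version, which is little-endian in the common case. The helper below is a hypothetical sketch. The exact file names that `save_pretrained_gguf` writes into `lfm25_gguf` are not specified here, so the loop simply globs for `*.gguf`.

```python
import struct
from pathlib import Path

def read_gguf_version(path):
    """Return the GGUF format version, or raise if the file lacks the GGUF magic."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not start with the GGUF magic")
    (version,) = struct.unpack("<I", header[4:8])  # little-endian uint32
    return version

# Output directory from the export step above; file names inside are an assumption
for gguf_file in sorted(Path("lfm25_gguf").glob("*.gguf")):
    print(gguf_file.name, "version", read_gguf_version(gguf_file))
```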

🎉 llama-server Serving & Deployment

To serve LFM2.5 in production with an OpenAI-compatible API:

./llama.cpp/llama-server \
    --model LiquidAI/LFM2.5-1.2B-Instruct-GGUF/LFM2.5-1.2B-Instruct-Q4_K_M.gguf \
    --alias "LiquidAI/LFM2.5-1.2B-Instruct" \
    --threads -1 \
    --n-gpu-layers 99 \
    --ctx-size 32768 \
    --port 8001 \
    --temp 0.1 \
    --top-k 50 \
    --top-p 0.1 \
    --repeat-penalty 1.05 \
    --jinja

Test with the OpenAI client:

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8001/v1",
    api_key="sk-no-key-required",
)

completion = client.chat.completions.create(
    model="LiquidAI/LFM2.5-1.2B-Instruct",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
print(completion.choices[0].message.content)
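The endpoint is plain HTTP, so the `openai` package is optional. The minimal stdlib sketch below assumes the server from the command above is listening on port 8001 and exposes the OpenAI-style `/v1/chat/completions` route. `build_chat_request` and `send_chat_request` are hypothetical helpers. Like the `openai` example, the request sends no sampling parameters and relies on the server-side settings.

```python
import json
import urllib.request

def build_chat_request(prompt, base_url="http://127.0.0.1:8001/v1",
                       model="LiquidAI/LFM2.5-1.2B-Instruct"):
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer sk-no-key-required"},
        method="POST",
    )

def send_chat_request(request, timeout=60):
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Usage, with llama-server running: `print(send_chat_request(build_chat_request("What is 2+2?")))`.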

📊 Benchmarks

LFM2.5-1.2B-Instruct delivers best-in-class performance in the 1B class, with fast CPU inference and low memory usage.

💧 Liquid LFM2.5-VL-1.6B Guide

LFM2.5-VL-1.6B is a vision LLM built on LFM2.5-1.2B-Base and tuned for stronger real-world performance. You can now fine-tune it locally with Unsloth.


Dynamic GGUFs: LFM2.5-VL-1.6B-GGUF

16-bit Instruct: LFM2.5-VL-1.6B

Model specifications:

LM backbone: LFM2.5-1.2B-Base
Vision encoder: SigLIP2 NaFlex shape-optimized 400M
Context length: 32,768 tokens
Vocabulary size: 65,536
Languages: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish
Native resolution processing: handles images up to 512×512 pixels without upscaling and preserves non-standard aspect ratios without distortion
Tiling strategy: splits large images into non-overlapping 512×512 patches and adds a thumbnail encoding for global context
Inference-time flexibility: user-adjustable maximum image tokens and tile count for speed/quality trade-offs without retraining
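The tiling rule can be illustrated with some back-of-the-envelope arithmetic. The sketch below is a simplification that rests on two assumptions. Images that fit within 512×512 are encoded once. Larger images are cut into a ceil(w/512) × ceil(h/512) grid, with partial edge tiles counted as whole tiles, plus one thumbnail. The actual LFM2.5-VL processor may resize images first and caps the tile count, which the specifications above describe as user-adjustable. Treat the numbers as illustrative only. `plan_tiles` is a hypothetical helper.

```python
import math

def plan_tiles(width, height, tile=512):
    """Return (columns, rows, images_encoded) under a simplified tiling rule."""
    if width <= tile and height <= tile:
        # small images are encoded once at native resolution, no thumbnail needed
        return 1, 1, 1
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    # non-overlapping tiles plus one downscaled thumbnail for global context
    return cols, rows, cols * rows + 1

for size in [(400, 300), (1024, 768), (2048, 512)]:
    print(size, plan_tiles(*size))
```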

⚙️ Usage Guide

Liquid AI recommends these settings for inference:

Text: temperature=0.1, min_p=0.15, repetition_penalty=1.05
Vision: min_image_tokens=64, max_image_tokens=256, do_image_splitting=True
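Unlike the text model's top-k/top-p settings, the VL text recommendation uses `min_p`. The minimal sketch below shows min-p filtering on a toy distribution. Tokens whose probability falls below `min_p` times the top token's probability are dropped, and the rest are renormalized. Temperature and repetition penalty are left out for brevity. This is an illustration, not the sampler code of any particular runtime, and `min_p_filter` is a hypothetical helper.

```python
def min_p_filter(probs, min_p=0.15):
    """Keep tokens with probability >= min_p * max(probs), then renormalize."""
    threshold = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}
    mass = sum(kept.values())
    return {i: p / mass for i, p in kept.items()}

toy_probs = [0.6, 0.3, 0.06, 0.04]  # hypothetical next-token distribution
print(min_p_filter(toy_probs))
```

Because the threshold scales with the top probability, min-p keeps more candidates when the model is uncertain and fewer when one token dominates.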

Chat template format

LFM2.5-VL uses a ChatML-like format:

tokenizer.apply_chat_template([
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What's in this image?"}
        ]
    },
    {"role": "assistant", "content": "I can see a cat sitting on a couch."}
], tokenize=False)

LFM2.5-VL chat template:

<|startoftext|><|im_start|>system
You are a helpful multimodal assistant by Liquid AI.<|im_end|>
<|im_start|>user
<image>Describe this image.<|im_end|>
<|im_start|>assistant
This image shows a Caenorhabditis elegans (C. elegans) nematode.<|im_end|>
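The sketch below makes the multimodal layout concrete by flattening a message list into the string above. Image parts become an `<image>` placeholder, and text parts are inlined in order. `render_vl_chat` is a hypothetical helper. The real processor presumably expands the placeholder into the actual image tokens, whose count depends on settings such as `min_image_tokens`, `max_image_tokens`, and tiling. This sketch therefore shows only the text layout.

```python
def render_vl_chat(messages, system=None, add_generation_prompt=False):
    """Approximate the LFM2.5-VL prompt string, using <image> as the image placeholder."""
    parts = ["<|startoftext|>"]
    if system is not None:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    for message in messages:
        content = message["content"]
        if isinstance(content, list):
            # image parts become a placeholder; text parts are inlined in order
            content = "".join(
                "<image>" if part["type"] == "image" else part["text"] for part in content
            )
        parts.append(f"<|im_start|>{message['role']}\n{content}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

conversation = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image."},
    ]},
    {"role": "assistant",
     "content": "This image shows a Caenorhabditis elegans (C. elegans) nematode."},
]
rendered = render_vl_chat(conversation, system="You are a helpful multimodal assistant by Liquid AI.")
print(rendered)
```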

🖥️ Run LFM2.5-VL-1.6B

📖 llama.cpp Guide (GGUF)

1. Build llama.cpp

Get the latest llama.cpp from GitHub. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you don't have a GPU.

apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-server
cp llama.cpp/build/bin/llama-* llama.cpp

2. Run directly from Hugging Face

./llama.cpp/llama-cli \
  -hf LiquidAI/LFM2.5-VL-1.6B-GGUF:Q4_0 \
  --image test_image.jpg \
  --image-max-tokens 64 \
  -p "What's in this image?" \
  -n 128

🦥 Fine-tuning LFM2.5-VL with Unsloth

Unsloth supports fine-tuning LFM2.5 models. The 1.6B model fits comfortably on a free Colab T4 GPU, and training runs 2x faster with 50% less VRAM.

Free Colab notebook:

LFM2.5-VL-1.6B SFT LoRA Notebook

Unsloth configuration for LFM2.5-VL

from unsloth import FastVisionModel
import torch

model, tokenizer = FastVisionModel.from_pretrained(
    model_name = "LiquidAI/LFM2.5-VL-1.6B",
    max_seq_length = 4096,
    load_in_4bit = False,
)

model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = False, # Keep False for now
    finetune_language_layers   = True,  # False if not fine-tuning the language layers
    finetune_attention_modules = True,  # False if not fine-tuning the attention layers
    finetune_mlp_modules       = True,  # False if not fine-tuning the MLP layers
    r = 16,
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
)

Training setup

from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

FastVisionModel.for_training(model) # Enable training mode!

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    data_collator = UnslothVisionDataCollator(model, tokenizer), # Required!
    train_dataset = converted_dataset, # list of {"messages": [...]} conversations, prepared beforehand
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 30,  # num_train_epochs = 1,  # use this instead of max_steps for a full training run
        learning_rate = 2e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.001,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",     # set to "wandb" to log to Weights & Biases
        remove_unused_columns = False,
        dataset_text_field = "",
        dataset_kwargs = {"skip_prepare_dataset": True},
        max_length = 2048,
    ),
)

trainer.train()
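`converted_dataset` has to exist before the trainer is built. Unsloth's published vision notebooks typically build it as a list of `{"messages": [...]}` conversations, where the user content mixes text and image parts. The sketch below assumes that layout and a hypothetical raw dataset with `image` and `caption` fields. The instruction text, field names, and placeholder data are illustrative, so adapt them to your data.

```python
def convert_to_conversation(sample, instruction="Describe this image."):
    """Map one raw sample to a messages-style conversation.

    Assumes hypothetical fields: sample["image"] (e.g. a PIL image) and sample["caption"].
    """
    return {
        "messages": [
            {"role": "user", "content": [
                {"type": "text", "text": instruction},
                {"type": "image", "image": sample["image"]},
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": sample["caption"]},
            ]},
        ]
    }

# Placeholder data; in practice iterate over your loaded image dataset
raw_samples = [{"image": "worm.png", "caption": "A C. elegans nematode on agar."}]
converted_dataset = [convert_to_conversation(s) for s in raw_samples]
print(converted_dataset[0]["messages"][1])
```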

Saving and export

# Save the LoRA adapters
model.save_pretrained("lfm25_lora")
tokenizer.save_pretrained("lfm25_lora")

# Merge and save in 16-bit
model.save_pretrained_merged("lfm25_merged", tokenizer, save_method="merged_16bit")

# Export to GGUF
model.save_pretrained_gguf("lfm25_gguf", tokenizer, quantization_method="q4_k_m")

📊 Benchmarks

LFM2.5-VL-1.6B delivers strong results and scores highest on six of the nine benchmarks below:

Model | MMStar | MM-IFEval | BLINK | InfoVQA (Val) | OCRBench (v2) | RealWorldQA | MMMU (Val) | MMMB (avg.) | Multilingual MMBench (avg.)
LFM2.5-VL-1.6B | 50.67 | 52.29 | 48.82 | 62.71 | 41.44 | 64.84 | 40.56 | 76.96 | 65.90
LFM2-VL-1.6B | 49.87 | 46.35 | 44.50 | 58.35 | 35.11 | 65.75 | 39.67 | 72.13 | 60.57
InternVL3.5-1B | 50.27 | 36.17 | 44.19 | 60.99 | 33.53 | 57.12 | 41.89 | 68.93 | 58.32
FastVLM-1.5B | 53.13 | 24.99 | 43.29 | 23.92 | 26.61 | 61.56 | 38.78 | 64.84 | 50.89

📚 Resources


Last updated 21 days ago
