🥝 Kimi K2.5: How to Run Locally

A guide to running Kimi-K2.5 on your own local device!

Kimi-K2.5 is the new model from Moonshot that achieves SOTA performance in vision, coding, agentic and chat tasks. The 1T-parameter hybrid reasoning model requires 600 GB of disk space, while the quantized Unsloth Dynamic 1.8-bit version reduces this to 240 GB (-60% size): Kimi-K2.5-GGUF

All uploads use Unsloth Dynamic 2.0 for SOTA Aider and 5-shot MMLU performance. See how our Dynamic 1–2 bit GGUFs perform on coding benchmarks.

⚙️ Recommended Requirements

You need >240 GB of disk space to run the 1-bit quant!

The only requirement is disk space + RAM + VRAM ≥ 240 GB. That means you do not need that much RAM or VRAM (GPU) to run the model, but it will be much slower.

The 1.8-bit (UD-TQ1_0) quant runs on a single 24GB GPU if you offload all MoE layers to system RAM (or a fast SSD). With ~256GB of RAM, expect around 10 tokens/s. The full Kimi K2.5 model is 630GB and typically requires at least 4× H200 GPUs.

If the model fits, you will get >40 tokens/s when using a B200.

To run the model in near full precision, you can use the 4-bit or 5-bit quants. You can also use a higher bit width to be on the safe side.

For strong performance, aim for >240GB of unified memory (or combined RAM+VRAM) to reach 10+ tokens/s. If you are below that, it still works, but speed will drop (llama.cpp can still run via mmap/disk offload) and may fall from ~10 tokens/s to <2 tokens/s.

We recommend UD-Q2_K_XL (375GB) as a good balance between size and quality. Rule of thumb: RAM+VRAM ≈ the quant size; otherwise it still works, just slower because of offloading.
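The rules of thumb above can be turned into a quick check before downloading anything. The following is only a minimal sketch: the helper classify_fit and its return labels are hypothetical, and the quant sizes are the ones quoted in this guide.

```python
# Quant sizes quoted in this guide, in GB.
QUANT_SIZES_GB = {"UD-TQ1_0": 240, "UD-Q2_K_XL": 375}


def classify_fit(quant_gb: float, ram_gb: float, vram_gb: float, disk_gb: float) -> str:
    """Classify how a quant is expected to run under the guide's rules of thumb.

    Hypothetical helper; the labels are illustrative, not llama.cpp output.
    """
    if disk_gb + ram_gb + vram_gb < quant_gb:
        # Hard requirement from the guide: disk + RAM + VRAM >= quant size.
        return "does not fit"
    if ram_gb + vram_gb >= quant_gb:
        # Rule of thumb: RAM + VRAM roughly equal to the quant size.
        return "fits in memory"
    # Still works via mmap/disk offload, but noticeably slower.
    return "runs with disk offload"


if __name__ == "__main__":
    # Example machine: 256 GB RAM, one 24 GB GPU, 1 TB of free disk.
    for name, size in QUANT_SIZES_GB.items():
        print(name, "->", classify_fit(size, ram_gb=256, vram_gb=24, disk_gb=1000))
```

On the example machine the 240GB quant falls into the "fits in memory" case, while the 375GB quant would rely on disk offload.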

🥝 Run Kimi K2.5 Guide

Kimi-K2.5 requires different sampling parameters for different use cases.

Currently there is no vision support for the model, but hopefully llama.cpp will support it soon.

To run the model in full precision, you only need the 4-bit or 5-bit Dynamic GGUFs (e.g. UD-Q4_K_XL), because the model was originally released in INT4 format.

You can choose a higher-bit quantization just to be safe against small quantization differences, but in most cases this is unnecessary.

Differences between Kimi K2.5 and Kimi K2 Thinking

Both models use a modified DeepSeek V3 MoE architecture.
rope_scaling.beta_fast: K2.5 uses 32.0 vs. K2 Thinking's 1.0.
MoonViT is the native-resolution vision encoder with 200M parameters. It is similar to the one used in Kimi-VL-A3B-Instruct.

🌙 Usage Guide:

According to Moonshot AI, these are the recommended settings for Kimi K2.5 inference:

Default settings (Instant mode): temperature = 0.6
Thinking mode: temperature = 1.0
top_p = 0.95
min_p = 0.01

Set the temperature to 1.0 to reduce repetition and incoherence.
Suggested context length = 98,304 (up to 256K)
Note: Using different tools may require different settings

We recommend setting min_p to 0.01 to suppress the occurrence of unlikely tokens with low probabilities. Disable the repeat penalty, or set repeat penalty = 1.0, if needed.
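To keep the two modes apart when scripting, the settings above can be held in one place and turned into llama.cpp flags. This is a sketch under assumptions: the names PRESETS and sampling_flags are hypothetical, and sharing top_p and min_p across both modes is an assumption, since only the temperature is listed per mode. The flags themselves (--temp, --top-p, --min-p) are the ones used in the commands in this guide.

```python
# Sampling presets from this guide. Only the temperature differs per mode;
# reusing top_p and min_p for both modes is an assumption.
PRESETS = {
    "instant": {"temp": 0.6, "top_p": 0.95, "min_p": 0.01},
    "thinking": {"temp": 1.0, "top_p": 0.95, "min_p": 0.01},
}


def sampling_flags(mode: str) -> list[str]:
    """Return llama-cli / llama-server sampling flags for the given mode."""
    preset = PRESETS[mode]  # raises KeyError for unknown modes
    return [
        "--temp", str(preset["temp"]),
        "--top-p", str(preset["top_p"]),
        "--min-p", str(preset["min_p"]),
    ]


if __name__ == "__main__":
    print(" ".join(sampling_flags("thinking")))
```

The returned list can be appended to an argument list passed to subprocess.run, which avoids retyping the values for each mode.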

Chat template for Kimi K2.5

Running tokenizer.apply_chat_template([{"role": "user", "content": "What is 1+1?"},]) gives:

<|im_system|>system<|im_middle|>You are Kimi, an AI assistant created by Moonshot AI.<|im_end|><|im_user|>user<|im_middle|>What is 1+1?<|im_end|><|im_assistant|>assistant<|im_middle|><think>
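To make the structure of that prompt easier to see, here is a small sketch that rebuilds the same string by hand. It is not the official template: render_prompt is a hypothetical helper, the default system prompt is copied from the output above, and the per-role token pattern is inferred from that single example, so multi-turn or tool-call formatting may differ in the real template.

```python
# Default system prompt as it appears in the rendered example above.
DEFAULT_SYSTEM = "You are Kimi, an AI assistant created by Moonshot AI."


def render_prompt(messages: list[dict], system: str = DEFAULT_SYSTEM) -> str:
    """Rebuild the prompt string for simple user/assistant turns.

    Assumption: each turn is <|im_ROLE|>ROLE<|im_middle|>CONTENT<|im_end|>,
    as inferred from the single-turn example in this guide.
    """
    parts = [f"<|im_system|>system<|im_middle|>{system}<|im_end|>"]
    for message in messages:
        role = message["role"]
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role in this sketch: {role}")
        parts.append(f"<|im_{role}|>{role}<|im_middle|>{message['content']}<|im_end|>")
    # Generation prompt: the assistant turn is opened and thinking starts.
    parts.append("<|im_assistant|>assistant<|im_middle|><think>")
    return "".join(parts)


if __name__ == "__main__":
    print(render_prompt([{"role": "user", "content": "What is 1+1?"}]))
```

For real use, prefer tokenizer.apply_chat_template, which applies the template shipped with the model.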

✨ Run Kimi K2.5 in llama.cpp

For this guide we use the smallest 1-bit quant, which is 240GB in size. Feel free to change the quantization type to 2-bit, 3-bit, etc. To run the model in near full precision, you can use the 4-bit or 5-bit quants. You can also use a higher bit width to be on the safe side.

Get the latest llama.cpp from GitHub (https://github.com/ggml-org/llama.cpp). You can also follow the build instructions below. Change -DGGML_CUDA=ON to -DGGML_CUDA=OFF if you do not have a GPU or only want CPU inference.

apt-get update
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev -y
git clone https://github.com/ggml-org/llama.cpp
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON
cmake --build llama.cpp/build --config Release -j --clean-first --target llama-cli llama-mtmd-cli llama-server llama-gguf-split
cp llama.cpp/build/bin/llama-* llama.cpp

If you want to use llama.cpp directly to load models, you can do the following: the suffix (:UD-TQ1_0) is the quantization type. You can also download the files via Hugging Face first (the hf download step below). This is similar to ollama run. Use export LLAMA_CACHE="folder" to force llama.cpp to save to a specific location.

LLAMA_SET_ROWS=1 makes llama.cpp a bit faster! Use it! --fit on automatically fits models optimally across all your GPUs and CPUs.

export LLAMA_CACHE="unsloth/Kimi-K2.5-GGUF"
LLAMA_SET_ROWS=1 ./llama.cpp/llama-cli \
    -hf unsloth/Kimi-K2.5-GGUF:UD-TQ1_0 \
    --temp 1.0 \
    --min-p 0.01 \
    --top-p 0.95 \
    --ctx-size 16384 \
    --seed 3407

--fit on will automatically fit the model to your system. If you do not use --fit on and have roughly 360GB of combined GPU memory, drop any -ot ".ffn_.*_exps.=CPU" override from your command for maximum speed.

Use --fit on for automatic fitting across GPUs and CPUs. If that does not work, see below:

Try -ot ".ffn_.*_exps.=CPU" to offload all MoE layers to the CPU! This effectively lets you fit all non-MoE layers on one GPU, which improves generation speed. You can adjust the regex to keep more layers on the GPU if you have more GPU capacity.

If you have a bit more GPU memory, try -ot ".ffn_(up|down)_exps.=CPU". This offloads the up and down projection MoE layers.

Try -ot ".ffn_(up)_exps.=CPU" if you have even more GPU memory. This offloads only the up projection MoE layers.

And finally, offload all MoE layers via -ot ".ffn_.*_exps.=CPU". This uses the least VRAM.

You can also customize the regex. For example, -ot "\.(6|7|8|9|[0-9][0-9]|[0-9][0-9][0-9])\.ffn_(gate|up|down)_exps.=CPU" offloads the gate, up and down MoE layers, but only from layer 6 onwards.
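To see which tensors each of the -ot patterns above would select, the regex part (everything before =CPU) can be tried out in Python. This is a sketch under assumptions: the tensor names below are illustrative and assume the blk.<layer>.<tensor>.weight naming scheme, and re.search is assumed to behave like llama.cpp's matching for these simple patterns.

```python
import re

# Regex parts of the -ot overrides from this guide (text before "=CPU").
PATTERNS = {
    "all experts": r".ffn_.*_exps.",
    "up and down": r".ffn_(up|down)_exps.",
    "up only": r".ffn_(up)_exps.",
    "layer 6 onwards": r"\.(6|7|8|9|[0-9][0-9]|[0-9][0-9][0-9])\.ffn_(gate|up|down)_exps.",
}

# Illustrative tensor names (assumed naming scheme, not read from a GGUF file).
TENSORS = [
    "blk.5.ffn_gate_exps.weight",
    "blk.5.ffn_up_exps.weight",
    "blk.6.ffn_down_exps.weight",
    "blk.42.ffn_gate_exps.weight",
    "blk.42.attn_q.weight",
]


def offloaded(pattern: str) -> list[str]:
    """Return the tensors that the pattern would send to the CPU."""
    return [name for name in TENSORS if re.search(pattern, name)]


if __name__ == "__main__":
    for label, pattern in PATTERNS.items():
        print(f"{label}: {offloaded(pattern)}")
```

The narrower the pattern, the fewer expert tensors leave the GPU, which is why the later patterns need more VRAM.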

Download the model with the commands below (after installing the tools with pip install huggingface_hub hf_transfer). We recommend using our 2-bit dynamic quant UD-Q2_K_XL to balance size and accuracy. All versions are available at: huggingface.co/unsloth/Kimi-K2.5-GGUF

pip install -U huggingface_hub
hf download unsloth/Kimi-K2.5-GGUF \
    --local-dir unsloth/Kimi-K2.5-GGUF \
    --include "*UD-TQ1_0*" # Use "*UD-Q2_K_XL*" for Dynamic 2bit

If you find that downloads get stuck at around 90 to 95%, please see our troubleshooting guide.

Run any prompt.
Edit --ctx-size 16384 to set the context length. You can also leave it out for automatic context length detection via --fit on.

LLAMA_SET_ROWS=1 ./llama.cpp/llama-cli \
    --model unsloth/Kimi-K2.5-GGUF/UD-TQ1_0/Kimi-K2.5-UD-TQ1_0-00001-of-00005.gguf \
    --temp 1.0 \
    --min-p 0.01 \
    --top-p 0.95 \
    --ctx-size 16384 \
    --seed 3407

As an example, try: "Create a Flappy Bird game in HTML", and you will get:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Flappy Bird</title>
    <style>
        body {
            margin: 0;
            padding: 0;
            background: #222;
            display: flex;
            justify-content: center;
            align-items: center;
            height: 100vh;
            font-family: 'Segoe UI', sans-serif;
            overflow: hidden;
            touch-action: none;
        }
        
        #game-container {
            position: relative;
            width: 400px;
            height: 600px;
            background: linear-gradient(to bottom, #70c5ce 0%, #70c5ce 80%, #c23810 80%, #c23810 100%);
            box-shadow: 0 0 20px rgba(0,0,0,0.5);
            overflow: hidden;
        }
        
        canvas {
            display: block;
        }
        
        .overlay {
            position: absolute;
            top: 50%;
            left: 50%;
            transform: translate(-50%, -50%);
            text-align: center;
            color: white;
            text-shadow: 2px 2px 0 #000;
            font-weight: bold;
            pointer-events: none;
        }
        
        .game-title {
            font-size: 48px;
            margin-bottom: 20px;
        }
        
        .score-display {
            font-size: 36px;
            margin-bottom: 10px;
        }
        
        .best-score {
            font-size: 24px;
            color: #ffe;
        }
        
        .instruction {
            font-size: 20px;
            animation: pulse 1s infinite;
        }
        
        @keyframes pulse {
            0%, 100% { opacity: 1; }
            50% { opacity: 0.5; }
        }
        
        .hidden { display: none; }
    </style>
</head>
<body>
    <div id="game-container">
        <canvas id="canvas" width="400" height="600"></canvas>
        
        <!-- Start Screen -->
        <div id="start-screen" class="overlay">
            <div class="game-title">FLAPPY BIRD</div>
            <div class="instruction">Click or Space to Fly</div>
        </div>
        
        <!-- Game Over Screen -->
        <div id="game-over-screen" class="overlay hidden">
            <div class="game-title">GAME OVER</div>
            <div class="score-display">Score: <span id="final-score">0</span></div>
            <div class="best-score">Best: <span id="best-score">0</span></div>
            <div class="instruction">Click to Restart</div>
        </div>
        
        <!-- Score Counter -->
        <div id="current-score" class="overlay hidden" style="top: 10%; font-size: 72px; color: white; text-shadow: 4px 4px 0 #000;">
            0
        </div>
    </div>

    <script>
        const canvas = document.getElementById('canvas');
        const ctx = canvas.getContext('2d');
        
        // Game constants
        const GRAVITY = 0.4;
        const JUMP_STRENGTH = -7;
        const PIPE_SPEED = 3;
        const PIPE_SPAWN_RATE = 120; // frames
        const PIPE_GAP = 120;
        
        // Game state
        let bird = { x: 50, y: 200, velocity: 0, radius: 15, wingState: 0 };
        let pipes = [];
        let score = 0;
        let bestScore = localStorage.getItem('flappyBest') || 0;
        let frameCount = 0;
        let isGameOver = false;
        let isPlaying = false;
        
        // DOM elements
        const startScreen = document.getElementById('start-screen');
        const gameOverScreen = document.getElementById('game-over-screen');
        const currentScoreDisplay = document.getElementById('current-score');
        const finalScoreEl = document.getElementById('final-score');
        const bestScoreEl = document.getElementById('best-score');
        
        // Input handling
        function handleInput(e) {
            if (!isPlaying) {
                if (isGameOver) {
                    resetGame();
                } else {
                    startGame();
                }
            } else if (!isGameOver) {
                bird.velocity = JUMP_STRENGTH;
                bird.wingState = 1;
            }
        }
        
        document.addEventListener('keydown', (e) => {
            if (e.code === 'Space' || e.code === 'ArrowUp') handleInput(e);
        });
        canvas.addEventListener('pointerdown', handleInput);
        
        function startGame() {
            isPlaying = true;
            isGameOver = false;
            startScreen.classList.add('hidden');
            currentScoreDisplay.classList.remove('hidden');
            resetGameState();
            gameLoop();
        }
        
        function resetGameState() {
            bird = { x: 50, y: 200, velocity: 0, radius: 15, wingState: 0 };
            pipes = [];
            score = 0;
            frameCount = 0;
            currentScoreDisplay.textContent = score;
        }
        
        function resetGame() {
            isGameOver = false;
            isPlaying = true;
            gameOverScreen.classList.add('hidden');
            currentScoreDisplay.classList.remove('hidden');
            resetGameState();
            gameLoop();
        }
        
        function spawnPipe() {
            const minHeight = 100;
            const maxHeight = 400;
            const topHeight = Math.floor(Math.random() * (maxHeight - minHeight + 1) + minHeight);
            const bottomHeight = canvas.height - topHeight - PIPE_GAP;
            
            pipes.push({
                x: canvas.width,
                topHeight: topHeight,
                bottomY: topHeight + PIPE_GAP,
                bottomHeight: bottomHeight,
                passed: false
            });
        }
        
        function update() {
            if (isGameOver) return;
            
            // Bird physics
            bird.velocity += GRAVITY;
            bird.y += bird.velocity;
            
            // Floor/ceiling collision
            if (bird.y + bird.radius > canvas.height || bird.y - bird.radius < 0) {
                gameOver();
                return;
            }
            
            // Pipe spawning
            frameCount++;
            if (frameCount % PIPE_SPAWN_RATE === 0) {
                spawnPipe();
            }
            
            // Pipe movement and collision
            for (let i = pipes.length - 1; i >= 0; i--) {
                const pipe = pipes[i];
                pipe.x -= PIPE_SPEED;
                
                // Remove off-screen pipes
                if (pipe.x + 60 < 0) {
                    pipes.splice(i, 1);
                    continue;
                }
                
                // Collision check (simplified rectangle-circle model)
                const pipeWidth = 60;
                const pipeX = pipe.x;
                const pipeLeft = pipeX;
                const pipeRight = pipeX + pipeWidth;
                
                // Bird is a circle, pipes are rectangles
                const birdLeft = bird.x - bird.radius + 4; // +4 for beak offset
                const birdRight = bird.x + bird.radius + 2;
                const birdTop = bird.y - bird.radius;
                const birdBottom = bird.y + bird.radius;
                
                // Horizontal collision check
                if (birdRight > pipeLeft && birdLeft < pipeRight) {
                    // Collision with top pipe
                    if (birdTop < pipe.topHeight) {
                        gameOver();
                        return;
                    }
                    // Collision with bottom pipe
                    if (birdBottom > pipe.bottomY) {
                        gameOver();
                        return;
                    }
                }
                
                // Scoring
                if (pipe.x + pipeWidth < bird.x && !pipe.passed) {
                    pipe.passed = true;
                    score++;
                    currentScoreDisplay.textContent = score;
                }
            }
            
            // Animate wings
            if (bird.wingState > 0) {
                bird.wingState = (bird.wingState + 0.2) % 2;
            }
        }
        
        function draw() {
            // Clear the canvas
            ctx.clearRect(0, 0, canvas.width, canvas.height);
            
            // Draw pipes
            pipes.forEach(pipe => {
                // Top pipe
                ctx.fillStyle = '#46c';
                ctx.fillRect(pipe.x, 0, 60, pipe.topHeight);
                ctx.fillStyle = '#34a';
                ctx.fillRect(pipe.x, pipe.topHeight - 20, 60, 20); // Cap
                
                // Bottom pipe
                ctx.fillStyle = '#46c';
                ctx.fillRect(pipe.x, pipe.bottomY, 60, canvas.height - pipe.bottomY);
                ctx.fillStyle = '#34a';
                ctx.fillRect(pipe.x, pipe.bottomY - 20, 60, 20); // Cap
            });
            
            // Draw bird (circle with beak)
            ctx.fillStyle = '#e3bc4e';
            ctx.beginPath();
            ctx.arc(bird.x, bird.y, bird.radius, 0, Math.PI * 2);
            ctx.fill();
            
            // Beak
            ctx.fillStyle = '#e04c4c';
            ctx.beginPath();
            ctx.moveTo(bird.x + bird.radius - 4, bird.y - 4);
            ctx.lineTo(bird.x + bird.radius + 10, bird.y);
            ctx.lineTo(bird.x + bird.radius - 4, bird.y + 4);
            ctx.fill();
            
            // Eye
            ctx.fillStyle = 'black';
            ctx.beginPath();
            ctx.arc(bird.x + 5, bird.y - 6, 3, 0, Math.PI * 2);
            ctx.fill();
            
            // Wing
            ctx.fillStyle = '#c4a';
            ctx.beginPath();
            ctx.ellipse(bird.x - 5, bird.y + 5, 10, 6, 0, 0, Math.PI * 2);
            ctx.fill();
        }
        
        function gameOver() {
            isGameOver = true;
            isPlaying = false;
            
            // Update best score
            if (score > bestScore) {
                bestScore = score;
                localStorage.setItem('flappyBest', bestScore);
            }
            
            // Show game-over screen
            currentScoreDisplay.classList.add('hidden');
            gameOverScreen.classList.remove('hidden');
            finalScoreEl.textContent = score;
            bestScoreEl.textContent = bestScore;
        }
        
        function gameLoop() {
            if (!isPlaying) return;
            
            update();
            draw();
            requestAnimationFrame(gameLoop);
        }
        
        // Initial draw
        draw();
    </script>
</body>
</html>

✨ Deploy with llama-server and OpenAI's completion library

Using --kv-unified can speed up inference serving in llama.cpp! See https://www.reddit.com/r/LocalLLaMA/comments/1qnwa33/glm_47_flash_huge_performance_improvement_with_kvu/

After installing llama.cpp as described in the Kimi K2.5 llama.cpp section above, you can use the following to launch an OpenAI-compatible server:

LLAMA_SET_ROWS=1 ./llama.cpp/llama-server \
    --model unsloth/Kimi-K2.5-GGUF/UD-TQ1_0/Kimi-K2.5-UD-TQ1_0-00001-of-00005.gguf \
    --special \
    --alias "unsloth/Kimi-K2.5" \
    --min-p 0.01 \
    --ctx-size 16384 \
    --port 8001 \
    --kv-unified

Then use OpenAI's Python library after running pip install openai:

from openai import OpenAI

# Point the client at the local llama-server started above.
openai_client = OpenAI(
    base_url = "http://127.0.0.1:8001/v1",
    api_key = "sk-no-key-required",
)
# Same model name as the --alias passed to llama-server.
completion = openai_client.chat.completions.create(
    model = "unsloth/Kimi-K2.5",
    messages = [{"role": "user", "content": "What is 1+1?"},],
)
print(completion.choices[0].message.content)

This prints the model's reply, and the other llama-server window logs the request as it is processed.
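The recommended sampling settings can also be sent with each request instead of being fixed on the server. The sketch below only builds the keyword arguments; build_chat_request is a hypothetical helper, the mode argument merely selects the temperature from the usage guide, and passing min_p through extra_body assumes that llama-server accepts this field in the request body.

```python
def build_chat_request(prompt: str, mode: str = "thinking") -> dict:
    """Build keyword arguments for chat.completions.create().

    The mode only picks the temperature (1.0 for thinking, 0.6 for instant);
    it does not switch the model's mode on the server.
    """
    temperatures = {"thinking": 1.0, "instant": 0.6}
    return {
        "model": "unsloth/Kimi-K2.5",  # the --alias used when starting llama-server
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperatures[mode],
        "top_p": 0.95,
        # min_p is not a standard OpenAI parameter, so it goes into extra_body
        # (assumption: llama-server reads it from the request body).
        "extra_body": {"min_p": 0.01},
    }


if __name__ == "__main__":
    print(build_chat_request("What is 1+1?"))
```

With the client from the example above, the call would then be openai_client.chat.completions.create(**build_chat_request("What is 1+1?")).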

📊 Benchmarks

Below you can see further benchmarks in table form:

Reasoning & Knowledge

Benchmark | Kimi K2.5 | GPT-5.2 | Claude 4.5 Opus | Gemini 3 Pro | DeepSeek V3.2 | Qwen3-VL-235B-A22B-Thinking
HLE-Full | 30.1 | 34.5 | 30.8 | 37.5 | 25.1† | -
HLE-Full (with tools) | 50.2 | 45.5 | 43.2 | 45.8 | 40.8† | -
AIME 2025 | 96.1 | 100 | 92.8 | 95.0 | 93.1 | -
HMMT 2025 (Feb) | 95.4 | 99.4 | 92.9* | 97.3* | 92.5 | -
IMO-AnswerBench | 81.8 | 86.3 | 78.5* | 83.1* | 78.3 | -
GPQA-Diamond | 87.6 | 92.4 | 87.0 | 91.9 | 82.4 | -
MMLU-Pro | 87.1 | 86.7* | 89.3* | 90.1 | 85.0 | -

Image & Video

Benchmark

Kimi K2.5

GPT-5.2

Claude 4.5 Opus

Gemini 3 Pro

DeepSeek V3.2

Qwen3-VL-235B-A22B-Thinking

MMMU-Pro

78.5

79.5*

74.0

81.0

69.3

CharXiv (RQ)

77.5

82.1

67.2*

81.4

66.1

MathVision

84.2

83.0

77.1*

86.1*

74.6

MathVista (mini)

90.1

82.8*

80.2*

89.8*

85.8

ZeroBench

ZeroBench (with tools)

12*

OCRBench

92.3

80.7*

86.5*

90.3*

87.5

OmniDocBench 1.5

88.8

85.7

87.7*

88.5

82.0*

InfoVQA (val)

92.6

84*

76.9*

57.2*

89.5

SimpleVQA

71.2

55.8*

69.7*

56.8*

WorldVQA

46.3

28.0

36.8

47.4

23.5

VideoMMMU

86.6

85.9

84.4*

87.6

80.0

MMVU

80.4

80.8*

77.3

77.5

71.1

MotionBench

70.4

64.8

60.3

70.3

VideoMME

87.4

86.0*

88.4*

79.0

LongVideoBench

79.8

76.5*

67.2*

77.7*

65.6*

LVBench

75.9

73.5*

63.6

Coding

Benchmark

Kimi K2.5

GPT-5.2

Claude 4.5 Opus

Gemini 3 Pro

DeepSeek V3.2

Qwen3-VL-235B-A22B-Thinking

SWE-Bench Verified

76.8

80.0

80.9

76.2

73.1

SWE-Bench Pro

50.7

55.6

55.4*

SWE-Bench Multilingual

73.0

72.0

77.5

65.0

70.2

Terminal Bench 2.0

50.8

54.0

59.3

54.2

46.4

PaperBench

63.5

63.7*

72.9*

47.1

CyberGym

41.3

50.6

39.9*

17.3*

SciCode

48.7

52.1

49.5

56.1

38.9

OJBench (cpp)

57.4

54.6*

68.5*

54.7*

LiveCodeBench (v6)

85.0

82.2*

87.4*

83.3

Long Context

Benchmark

Kimi K2.5

GPT-5.2

Claude 4.5 Opus

Gemini 3 Pro

DeepSeek V3.2

Qwen3-VL-235B-A22B-Thinking

Longbench v2

61.0

54.5*

64.4*

68.2*

59.8*

AA-LCR

70.0

72.3*

71.3*

65.3*

64.3*

Agentic Search

Benchmark

Kimi K2.5

GPT-5.2

Claude 4.5 Opus

Gemini 3 Pro

DeepSeek V3.2

Qwen3-VL-235B-A22B-Thinking

BrowseComp

60.6

65.8

37.0

37.8

51.4

BrowseComp (with context management)

74.9

65.8

57.8

59.2

67.6

BrowseComp (Agent Swarm)

78.4

WideSearch (item-f1)

72.7

76.2*

57.0

32.5*

WideSearch (item-f1 Agent Swarm)

79.0

DeepSearchQA

77.1

71.3*

76.1*

63.2*

60.9*

FinSearchCompT2&T3

67.8

66.2*

49.9

59.1*

Seal-0

57.4

45.0

47.7*

45.5*

49.5*

Notes

* = score re-evaluated by the authors (not previously publicly available).
† = DeepSeek V3.2 score corresponds to its text-only subset (as noted in the footnotes).
- = not evaluated / not available.
