Qwen3.5 GGUF Benchmarks
See how Unsloth Dynamic GGUFs perform + analysis of perplexity, KL divergence & MXFP4.
We updated all Qwen3.5 Unsloth Dynamic quants, which are now SOTA at nearly all bit widths. We ran over 150 KL Divergence benchmarks, totaling 9TB of GGUFs, and uploaded all research artifacts.
We also fixed a tool-calling chat template issue (this affects all quant uploaders and quant types, regardless of where you run the model or where you downloaded it from).
Mar 5 Update: Redownload Qwen3.5-35B, 27B, 122B and 397B.
All GGUFs now updated with an improved quantization algorithm.
All use our new imatrix data. You should see improvements in chat, coding, long-context, and tool-calling use cases.
New benchmarks for Qwen3.5-122B-A10B and Qwen3.5-35B-A3B are out now!
Want to see how to run the model + hardware requirements? Read our inference guide.
99.9% KL Divergence shows Unsloth Dynamic Q4_K_XL, IQ3_XXS and others are SOTA on the Pareto frontier:


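As a refresher on the metric: each per-token KLD value compares the full-precision model's next-token distribution with the quantized model's on the same context, and "99.9% KLD" is the 99.9th percentile of those values. A minimal sketch of the computation (the logit arrays below are random stand-ins; in practice a tool such as llama.cpp's llama-perplexity produces the per-token values):

```python
import numpy as np

def per_token_kld(logits_fp: np.ndarray, logits_q: np.ndarray) -> np.ndarray:
    """KL(P_fp || P_q) per token, where each row of the inputs holds the
    vocabulary logits for one token position."""
    def log_softmax(x):
        # Numerically stable log-softmax over the vocabulary axis
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

    logp = log_softmax(logits_fp)   # reference (full precision)
    logq = log_softmax(logits_q)    # quantized model
    p = np.exp(logp)
    return (p * (logp - logq)).sum(axis=-1)

# Hypothetical shapes: 8 token positions, a 32k vocabulary
rng = np.random.default_rng(0)
ref = rng.normal(size=(8, 32_000))
quant = ref + rng.normal(scale=0.05, size=ref.shape)  # small perturbation
kld = per_token_kld(ref, quant)
print(np.percentile(kld, 99.9))  # "99.9% KLD" is this percentile over all tokens
```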
Imatrix definitely helps reduce KLD & PPL, at the cost of 5-10% slower inference.
We tested our GGUFs against many other providers.
Quantizing ssm_out (Mamba layers) is not a good idea, and ffn_down_exps is also sensitive.
We're retiring MXFP4 from all Dynamic GGUF quants (Q2_K_XL, Q3_K_XL and Q4_K_XL), except for the pure MXFP4_MOE quant.

1) Some tensors are very sensitive to quantization
We made over 9TB of research artifacts available for the community to investigate further on our Experiments page. It includes KLD metrics and all 121 configs we tested.
We varied bit widths across each tensor type and generated best and worst Pareto frontier plots below against 99.9% KLD.
Among the best tensors to quantize, ffn_up_exps and ffn_gate_exps are generally fine at 3-bit; ffn_down_exps is slightly more sensitive.
Among the worst, ssm_out dramatically increases KLD while the disk space savings are minuscule (for example, ssm_out at q2_k does dramatically worse); a back-of-the-envelope sketch follows below. Quantizing any attn_* tensor is especially sensitive in hybrid architectures, so leaving them in higher precision works well.
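To put the minuscule ssm_out savings in perspective, here is the back-of-the-envelope sketch. The layer count and hidden size are made-up stand-ins rather than Qwen3.5's real shapes, and the bits-per-weight figures are llama.cpp's nominal values:

```python
# Hypothetical numbers to illustrate why quantizing ssm_out barely saves space.
total_params = 35e9          # ~35B-parameter model (stand-in)
layers = 48                  # stand-in layer count
d_model = 4096               # stand-in hidden size
ssm_out_params = layers * d_model * d_model  # one square projection per Mamba layer

def size_gb(params, bits_per_weight):
    return params * bits_per_weight / 8 / 1e9

# Dropping ssm_out from q8_0 (8.5 bpw) to q2_k (2.5625 bpw)...
saved = size_gb(ssm_out_params, 8.5) - size_gb(ssm_out_params, 2.5625)
total_q4 = size_gb(total_params, 4.5)        # whole model at Q4_K (4.5 bpw)
print(f"ssm_out savings: {saved:.2f} GB out of ~{total_q4:.0f} GB "
      f"({saved / total_q4:.1%})")           # only a few percent, for a tensor
                                             # that wrecks KLD at 2-bit
```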

Tensor type vs bits on 99.9% KL Divergence
We plot all quant levels against 99.9% KLD, sorted from worst to best. Quantizing the ffn_* layers too heavily is not a good idea.
However, some bit widths are good, especially 3-bit: leaving ffn_* (down, up, gate) at around iq3_xxs seems to be the best compromise between disk space and 99.9% KLD change, while 2-bit causes noticeably more degradation.

MXFP4 is much worse on many tensors: using it for attn_gate, attn_q, ssm_beta or ssm_alpha is not a good idea, and Q4_K is better. Note that MXFP4 uses 4.25 bits per weight whilst Q4_K uses 4.5 bits per weight, yet it's still better to use Q4_K than MXFP4 when choosing between them.
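The size gap between the two 4-bit formats is small, which is why the quality difference dominates the choice. A quick sketch of the arithmetic for a hypothetical 35B-parameter model:

```python
def size_gb(params, bits_per_weight):
    # bits -> bytes -> GB
    return params * bits_per_weight / 8 / 1e9

params = 35e9                  # hypothetical 35B-parameter model
mxfp4 = size_gb(params, 4.25)  # MXFP4: 4.25 bits per weight
q4_k = size_gb(params, 4.5)    # Q4_K:  4.5 bits per weight
print(f"MXFP4: {mxfp4:.2f} GB, Q4_K: {q4_k:.2f} GB "
      f"(Q4_K is only {q4_k / mxfp4 - 1:.1%} larger)")  # ~5.9% larger on disk
```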


2) Imatrix works very well
Imatrix definitely helps weight the quantization process in the right way. For example, ssm_out at 2-bit was previously really bad, but imatrix reduces its 99.9% KLD by a lot.
Imatrix generally helps most at lower bits, and it works across all quants and bit widths.
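For readers who want to reproduce this, the standard llama.cpp imatrix workflow looks roughly like the sketch below. The binary names and flags come from recent llama.cpp builds and may differ across versions, and the file paths are placeholders:

```python
import subprocess

# 1) Collect importance statistics from a calibration text file.
subprocess.run([
    "./llama-imatrix",
    "-m", "model-BF16.gguf",   # full-precision base model
    "-f", "calibration.txt",   # calibration data (chat, tool-calling, long context)
    "-o", "imatrix.dat",       # importance matrix output
], check=True)

# 2) Quantize using those statistics so sensitive weights keep more precision.
subprocess.run([
    "./llama-quantize",
    "--imatrix", "imatrix.dat",
    "model-BF16.gguf", "model-Q4_K_M.gguf", "Q4_K_M",
], check=True)
```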

I-quants (iq3_xxs, iq2_s, etc.) make inference 5-10% slower; they're definitely better in terms of efficiency, but there is a tradeoff.
| Quant | Prompt processing (tokens/s) | Token generation (tokens/s) |
| --- | --- | --- |
| mxfp4 | 1978.69 | 90.67 |
| q4_k | 1976.44 | 90.38 |
| q3_k | 1972.61 | 91.36 |
| q6_k | 1964.55 | 90.50 |
| q2_k | 1964.20 | 90.77 |
| q8_0 | 1964.17 | 90.33 |
| q5_k | 1947.74 | 90.72 |
| iq3_xxs | 2030.94 | 85.68 |
| iq2_xxs | 1997.64 | 85.79 |
| iq3_s | 1990.12 | 84.37 |
| iq2_xs | 1967.85 | 85.19 |
| iq2_s | 1952.50 | 85.04 |
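Taking the generation column above at face value (we are assuming it reflects token-generation throughput), the I-quant slowdown can be computed directly from the table:

```python
# Token-generation speeds (tokens/s) copied from the table above.
k_quants = {"mxfp4": 90.67, "q4_k": 90.38, "q3_k": 91.36, "q6_k": 90.50,
            "q2_k": 90.77, "q8_0": 90.33, "q5_k": 90.72}
i_quants = {"iq3_xxs": 85.68, "iq2_xxs": 85.79, "iq3_s": 84.37,
            "iq2_xs": 85.19, "iq2_s": 85.04}

k_avg = sum(k_quants.values()) / len(k_quants)
i_avg = sum(i_quants.values()) / len(i_quants)
print(f"I-quants are ~{1 - i_avg / k_avg:.1%} slower at generation")  # roughly 6%,
# consistent with the 5-10% figure quoted above
```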
3) Perplexity & KLD can be misleading
Perplexity and KLD can be misleading because they're highly influenced by calibration. Most GGUFs are evaluated on the Wikitext test set with 512-token context windows, so results shift a lot if the GGUF's imatrix calibration set also includes Wikipedia-like, 512-token samples (as most GGUFs do). That's why our GGUFs sometimes show higher perplexity: our imatrix data instead uses long-context chat and tool-calling examples.
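For reference, these numbers are typically produced with llama.cpp's perplexity tool. A rough sketch of the two-step process (flags are from recent llama.cpp builds and may vary by version; paths are placeholders):

```python
import subprocess

# 1) Save the full-precision model's logits as the KLD reference.
subprocess.run([
    "./llama-perplexity",
    "-m", "model-BF16.gguf",
    "-f", "wiki.test.raw",                 # the usual Wikitext test file
    "-c", "512",                           # the common 512-token evaluation context
    "--kl-divergence-base", "logits.bin",  # write reference logits here
], check=True)

# 2) Evaluate a quantized model against that reference; this reports
#    perplexity plus mean/percentile/max KL divergence.
subprocess.run([
    "./llama-perplexity",
    "-m", "model-Q4_K_XL.gguf",
    "--kl-divergence-base", "logits.bin",
    "--kl-divergence",
], check=True)
```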

Benjamin's recent MiniMax-M2.5 analysis shows how perplexity and KLD can be very misleading. Unsloth Dynamic IQ2_XXS performs better than AesSedai's IQ3_S on real-world evals (LiveCodeBench v6, MMLU Pro) despite being 11GB smaller. Yet AesSedai's perplexity and KLD benchmarks suggest the opposite (KLD: 0.3552 vs 0.2441; PPL: 9.0338 vs 8.2849; lower is better).


This mismatch shows that lower perplexity or KLD doesn't necessarily translate to better real-world performance. The graph also shows UD-Q4_K_XL outperforming other Q4 quants while being ~8GB smaller. This doesn't mean perplexity or KLD is useless; they still provide a rough signal. So, going forward, we'll publish perplexity and KLD for every quant so the community has a reference point.
4) March 5th 2026 Update - more robustness
We further enhanced our quantization method for Qwen3.5 MoEs to directly reduce Maximum KLD. The 99.9% percentile is what is generally used, but Maximum KLD is useful for catching massive outliers. Our new method pushes Maximum KLD down substantially versus the pre-March 5th uploads.

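The gap between the two statistics is easy to see on synthetic data: a single pathological token sets the maximum outright but barely moves the 99.9th percentile. A small illustration with made-up values:

```python
import numpy as np

rng = np.random.default_rng(0)
kld = rng.exponential(scale=0.02, size=1_000_000)  # typical per-token KLD values
kld[0] = 8.0                                       # one massive outlier token

print(f"99.9% KLD: {np.percentile(kld, 99.9):.3f}")  # ~0.14, barely affected
print(f"Max KLD:   {kld.max():.3f}")                 # 8.0, set by the one outlier
```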
| Quant | Old size (GB) | New size (GB) | Old Max KLD | New Max KLD |
| --- | --- | --- | --- | --- |
| UD-Q2_K_XL | 12.0 | 11.3 | 8.237 | 8.155 |
| UD-Q3_K_XL | 16.1 | 15.5 | 5.505 | 5.146 |
| UD-Q4_K_XL | 19.2 | 20.7 (+7.8%) | 5.894 | 2.877 (-51%) |
| UD-Q5_K_XL | 23.2 | 24.6 (+6%) | 5.536 | 3.210 (-42%) |
Full Benchmarks
| Provider | Quant | Size (GB) | Perplexity | 99.9% KLD | Mean KLD |
| --- | --- | --- | --- | --- | --- |
| AesSedai | IQ3_S | 12.65 | 6.9152 | 1.8669 | 0.0613 |
| AesSedai | IQ4_XS | 16.4 | 6.6447 | 0.8067 | 0.0235 |
| AesSedai | Q4_K_M | 20.62 | 6.5665 | 0.3171 | 0.0096 |
| AesSedai | Q5_K_M | 24.45 | 6.5356 | 0.21 | 0.0058 |
| Ubergarm | Q4_0 | 19.79 | 6.5784 | 0.4829 | 0.0142 |
| Unsloth | IQ2_XXS | 9.09 | 7.716 | 4.2221 | 0.1846 |
| Unsloth | Q2_K_XL | 12.04 | 7.0438 | 2.9092 | 0.097 |
| Unsloth | IQ3_XXS | 13.12 | 6.7829 | 1.5296 | 0.0501 |
| Unsloth | IQ3_S | 14.13 | 6.7715 | 1.4193 | 0.0457 |
| Unsloth | Q3_K_M | 15.54 | 6.732 | 0.9726 | 0.0324 |
| Unsloth | Q3_K_XL | 16.06 | 6.7245 | 0.9539 | 0.0308 |
| Unsloth | MXFP4_MOE | 18.17 | 6.6 | 0.7789 | 0.0272 |
| Unsloth | Q4_K_M | 18.49 | 6.6053 | 0.5478 | 0.0192 |
| Unsloth | Q4_K_L | 18.82 | 6.5905 | 0.4828 | 0.015 |
| Unsloth | Q4_K_XL | 19.17 | 6.5918 | 0.4097 | 0.0137 |
| Unsloth | Q5_K_XL | 23.22 | 6.5489 | 0.236 | 0.0069 |
| Unsloth | Q6_K_S | 26.56 | 6.5456 | 0.2226 | 0.0065 |
| Unsloth | Q6_K_XL | 28.22 | 6.5392 | 0.1437 | 0.0041 |
| Unsloth | Q8_K_XL | 36.04 | 6.5352 | 0.1033 | 0.0026 |
| bartowski | Qwen_IQ2_XXS | 8.15 | 9.3427 | 6.0607 | 0.3457 |
| bartowski | Qwen_Q2_K_L | 11.98 | 7.5504 | 3.8095 | 0.1559 |
| bartowski | Qwen_IQ3_XXS | 12.94 | 7.0938 | 2.1563 | 0.0851 |
| bartowski | Qwen_Q3_K_M | 14.95 | 6.772 | 1.7779 | 0.0585 |
| bartowski | Qwen_Q3_K_XL | 15.97 | 6.8245 | 1.7516 | 0.0627 |
| bartowski | Qwen_IQ4_XS | 17.42 | 6.6234 | 0.7265 | 0.0234 |
| bartowski | Qwen_Q4_K_M | 19.77 | 6.6097 | 0.5771 | 0.0182 |
| bartowski | Qwen_Q5_K_M | 23.11 | 6.5828 | 0.3549 | 0.0106 |
| noctrex | MXFP4_MOE_BF16 | 20.55 | 6.5948 | 0.7939 | 0.0248 |
| noctrex | MXFP4_MOE_F16 | 20.55 | 6.5937 | 0.7614 | 0.0247 |

