mmlw-retrieval-e5-small-onnx

ONNX-exported version of sdadas/mmlw-retrieval-e5-small for use with fastembed (ONNX Runtime backend).

model.onnx — dynamic int8 quantized model (~113 MB)
tokenizer.json, config.json — tokenizer + config (same as source model)

Embedding dimensions

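384, the same dimensionality as the sentence-transformers MiniLM family, so it fits vector indexes sized for those models. Vectors from different models are not comparable, so existing documents must be re-embedded after switching.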

Usage with fastembed

from fastembed import TextEmbedding
from fastembed.common.model_description import ModelSource, PoolingType

TextEmbedding.add_custom_model(
    model="Infojura/mmlw-retrieval-e5-small-onnx",
    dim=384,
    sources=ModelSource(hf="Infojura/mmlw-retrieval-e5-small-onnx"),
    pooling=PoolingType.MEAN,
    normalization=True,
)

model = TextEmbedding("Infojura/mmlw-retrieval-e5-small-onnx")

# E5 models expect "query: " on search queries and "passage: " on indexed documents:
query_embedding = next(model.embed(["query: Jaka jest sygnatura sprawy?"]))
passage_embeddings = list(model.embed(["passage: Sąd Okręgowy w Warszawie..."]))
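
Forgetting a prefix is an easy mistake, so it can help to wrap it in small helpers. The sketch below is illustrative only: `as_query`, `as_passage`, and `rank_passages` are hypothetical names, not part of fastembed, and the vectors are toy 2-dimensional stand-ins for real 384-dimensional embeddings. It assumes the embeddings are unit length (as requested by `normalization=True`), so a plain dot product equals cosine similarity.

```python
QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "


def as_query(text):
    """Prepend the E5 query prefix (hypothetical helper)."""
    return QUERY_PREFIX + text.strip()


def as_passage(text):
    """Prepend the E5 passage prefix (hypothetical helper)."""
    return PASSAGE_PREFIX + text.strip()


def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs, best match first.

    Assumes all vectors are already L2-normalized, so the dot
    product is the cosine similarity.
    """
    scores = [
        sum(q * p for q, p in zip(query_vec, vec))
        for vec in passage_vecs
    ]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Toy unit vectors standing in for real embeddings.
toy_query = [1.0, 0.0]
toy_passages = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]
ranking = rank_passages(toy_query, toy_passages)
print(as_query("Jaka jest sygnatura sprawy?"))
print(ranking)
```

With the real model, `as_query(...)` and `as_passage(...)` would be applied to the texts before they are passed to `model.embed`, and the resulting vectors fed to `rank_passages`.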

Export pipeline

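# 1. Export the source model to an fp32 ONNX graph.
uv run python -m optimum.exporters.onnx \
    --model sdadas/mmlw-retrieval-e5-small \
    --task feature-extraction \
    ./fp32/
# 2. Create the output directory, then apply dynamic int8 quantization to the weights.
mkdir -p ./int8
uv run python -c "
from onnxruntime.quantization import quantize_dynamic, QuantType
quantize_dynamic('./fp32/model.onnx', './int8/model.onnx', weight_type=QuantType.QInt8)
"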

Exported with optimum==2.1.0, optimum-onnx==0.1.0, transformers==4.57.6.
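
After quantization it is worth checking that the int8 embeddings stay close to the fp32 ones. The following is a minimal sketch of such a parity check, not the procedure used for this model. It assumes the exported graph returns token-level hidden states, which are then mean-pooled over non-padding tokens and L2-normalized (the behavior requested by `pooling=PoolingType.MEAN, normalization=True`). All function names are hypothetical and the token vectors are toy values; real ones would come from running `./fp32/model.onnx` and `./int8/model.onnx` on the same input.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [value / max(count, 1) for value in total]


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Toy outputs for one 3-token input whose last token is padding.
fp32_tokens = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
int8_tokens = [[0.98, 0.01], [0.02, 1.01], [7.0, 7.0]]
mask = [1, 1, 0]

fp32_vec = l2_normalize(mean_pool(fp32_tokens, mask))
int8_vec = l2_normalize(mean_pool(int8_tokens, mask))
similarity = cosine(fp32_vec, int8_vec)
print(f"cosine(fp32, int8) = {similarity:.4f}")
```

What counts as "close enough" depends on the application; a retrieval benchmark on real queries is a better final check than any fixed cosine threshold.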

Measured performance (on Polish legal retrieval)

Baseline: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2. Each metric reads baseline → this model.

MRR on Polish SAOS corpus: 0.528 → 0.868 (+64%)
R@1: 0.438 → 0.815 (+86%)

See ADR-004 in Infojura/taado-converters for the evaluation harness.
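
For readers unfamiliar with the metrics, here is a small sketch of how MRR and R@1 are commonly computed. It is not the harness referenced above: the function names and toy data are hypothetical, and it assumes exactly one relevant document per query. The last helper reproduces the relative-gain arithmetic behind the percentages.

```python
def reciprocal_rank(ranked_ids, relevant_id):
    """1 / rank of the relevant document, or 0.0 if it was not retrieved."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_id) pairs."""
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


def recall_at_k(runs, k=1):
    """Fraction of queries whose relevant document appears in the top k."""
    hits = sum(1 for ranked, rel in runs if rel in ranked[:k])
    return hits / len(runs)


def relative_gain(baseline, new):
    """Relative improvement of `new` over `baseline` as a fraction."""
    return (new - baseline) / baseline


# Toy retrieval results: four queries, relevant document is always "a".
toy_runs = [
    (["a", "b", "c"], "a"),  # rank 1
    (["b", "a", "c"], "a"),  # rank 2
    (["c", "b", "a"], "a"),  # rank 3
    (["b", "c", "d"], "a"),  # not retrieved
]
toy_mrr = mean_reciprocal_rank(toy_runs)
toy_r1 = recall_at_k(toy_runs, k=1)
mrr_gain = relative_gain(0.528, 0.868)
print(f"MRR={toy_mrr:.3f} R@1={toy_r1:.2f} reported MRR gain={mrr_gain:.0%}")
```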

License

Apache 2.0 (inherited from source model sdadas/mmlw-retrieval-e5-small). Full license: https://www.apache.org/licenses/LICENSE-2.0

Attribution

Original model by Sławomir Dadas:

@misc{dadas2024mmlw,
  title={Multilingual and multilabel extension of the Polish language models},
  author={Dadas, Sławomir},
  year={2024},
}
