---
license: mit
language:
  - en
tags:
  - slug-generation
  - onnx
  - embedding-to-text
  - url-slug
  - beam-search
library_name: onnxruntime
pipeline_tag: summarization
---

# vec2slug-v1-openai-large

Generate URL slugs directly from text embeddings, without re-feeding
source text through a language model. Designed to piggyback on embeddings
a system already has for search or deduplication.

| | |
|---|---|
| **Parameters** | 24.8M |
| **Architecture** | Transformer decoder, 6L, d=512 |
| **Input** | OpenAI `text-embedding-3-small` (1536d) |
| **Vocab** | BPE, 5000 subwords |
| **Token F1** | 0.306 |
| **ONNX size** | 95.1 MiB |
| **Inference (CPU)** | ~41 ms (M-series), ~160 ms (budget VPS) |

The model runs 14 to 19× faster and is approximately 85× cheaper than a
Haiku-class LLM call for the same task, including the cost of computing a
fresh embedding. With existing embeddings (the intended use case), it is
approximately 2,000× cheaper.

This is the **larger** of two variants. It achieves the higher Token F1 of the two, at 2× the inference cost of the smaller model.

See also: [vec2slug-v1-openai-small](https://huggingface.co/hashintel/vec2slug-v1-openai-small)

## Quickstart

```bash
# install dependencies
pip install onnxruntime numpy

# or run directly with uv
uv run inference.py . --input embeddings.npy
```

```python
from inference import OnnxPredictor
import numpy as np

predictor = OnnxPredictor.from_dir(".")

# embeddings: [N, 1536] float32 from OpenAI text-embedding-3-small
embeddings = np.load("embeddings.npy").astype(np.float32)
slugs = predictor.predict(embeddings)
# ["how-neural-networks-learn", "climate-change-solutions", ...]
```

PyTorch inference (requires `torch`):

```python
import numpy as np
from inference import PyTorchPredictor

predictor = PyTorchPredictor.from_dir(".")
embeddings = np.load("embeddings.npy").astype(np.float32)  # [N, 1536]
slugs = predictor.predict(embeddings)
```

## Examples

Predictions on held-out test samples (beam search, width 4). The model
sees only the 1536-dim embedding, never the source text.

| Source text | Reference slug | Predicted slug |
|---|---|---|
| Children's book about astronomy and living on Mars | `can-we-live-on-mars` | `can-we-live-on-mars` |
| Teaching resources for Martin Luther King Jr. Day | `celebrating-martin-luther-king-jr-day` | `celebrating-martin-luther-king-jr-day` |
| Article about Waldorf education practices | `12-things-may-not-know-waldorf-education` | `10-things-you-didnt-know-about-waldorf-education` |

The third example illustrates the typical case: the model captures the
topic correctly but diverges in specific wording. The common failure mode
is overgeneralization rather than incoherence.

## How it works

The model is a prefix-conditioned transformer decoder. A precomputed text
embedding is linearly projected into the decoder's hidden space and placed
at position 0 as a prefix token. The decoder then autoregressively generates
BPE subword tokens that form a kebab-case URL slug.
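
The prefix conditioning described above can be sketched roughly as follows. This is only an illustration of the shapes involved, not the shipped implementation: the weights are random stand-ins, and the names `W_proj`, `b_proj`, `tok_emb`, and `build_decoder_input` are hypothetical. Positional encoding and the decoder layers themselves are omitted.

```python
import numpy as np

EMBED_DIM = 1536  # input embedding size (from the table above)
D_MODEL = 512     # decoder hidden size (from the table above)
VOCAB = 5000      # BPE vocabulary size (from the table above)

rng = np.random.default_rng(0)
# Hypothetical, randomly initialised parameters standing in for trained weights.
W_proj = rng.normal(scale=0.02, size=(EMBED_DIM, D_MODEL)).astype(np.float32)
b_proj = np.zeros(D_MODEL, dtype=np.float32)
tok_emb = rng.normal(scale=0.02, size=(VOCAB, D_MODEL)).astype(np.float32)


def build_decoder_input(embedding, token_ids):
    """Return the [1 + T, D_MODEL] sequence that would be fed to the decoder."""
    # Linear projection of the text embedding; this row occupies position 0.
    prefix = embedding @ W_proj + b_proj
    # Embeddings of the subword tokens generated so far, shape [T, D_MODEL].
    tokens = tok_emb[np.asarray(token_ids, dtype=np.int64)]
    return np.concatenate([prefix[None, :], tokens], axis=0)


embedding = rng.normal(size=EMBED_DIM).astype(np.float32)
x = build_decoder_input(embedding, [17, 42, 7])
print(x.shape)  # (4, 512)
```

At the first decoding step the token list is empty, so the decoder would see only the projected embedding and predict the first subword from it.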

Beam search uses bounded additive length reward with score-based optimal
stopping ([Huang et al. 2017](https://doi.org/10.18653/v1/D17-1227)). All
decoding parameters are stored in `model.json`.
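
As a rough sketch of the scoring idea, assuming the usual formulation of a bounded length reward: each hypothesis is scored by its log-probability plus a per-token reward that stops growing past a length cap, and search can end once no open hypothesis could still overtake the best finished one. The function and parameter names here are hypothetical, and the example values are made up; the values the model actually uses live in `model.json`.

```python
def bounded_reward_score(log_prob, length, reward, max_rewarded_len):
    """Log-probability plus a per-token reward, capped at max_rewarded_len tokens."""
    return log_prob + reward * min(length, max_rewarded_len)


def can_stop(best_finished_score, open_log_probs, reward, max_rewarded_len):
    """True when no open hypothesis can still beat the best finished one."""
    if not open_log_probs:
        return True
    # Appending tokens can only lower the log-probability, and the reward is
    # capped, so this is an upper bound on the score of any continuation.
    upper_bound = max(open_log_probs) + reward * max_rewarded_len
    return best_finished_score >= upper_bound


# Illustrative values only (not taken from model.json).
REWARD, CAP = 0.5, 6
finished = bounded_reward_score(log_prob=-4.0, length=5,
                                reward=REWARD, max_rewarded_len=CAP)
print(finished)                                     # -1.5
print(can_stop(finished, [-4.8, -5.2], REWARD, CAP))  # True
print(can_stop(finished, [-4.2], REWARD, CAP))        # False
```

The cap is what makes early stopping sound: with an unbounded reward, a long enough continuation could always catch up, so there would be no finite upper bound to compare against.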

## Files

| File | Description |
|---|---|
| `model.onnx` | ONNX model (forward pass only) |
| `model.json` | Sidecar: vocabulary, beam search config, stopwords |
| `model.pt` | PyTorch weights (`state_dict`) |
| `tokenizer.json` | BPE tokenizer (HuggingFace `tokenizers` format) |
| `inference.py` | Standalone inference script (`uv run` compatible) |
| `manifest.train.json` | Training configuration and results |
| `manifest.onnx.json` | Export verification (tolerance, argmax agreement) |
| `history.train.jsonl` | Training loss/metric curves |

## Training

Trained on 2.3M documents from FineWeb-Edu with slugs extracted
from source URLs. The extraction pipeline filters on language, slug format,
Gopher repetition, and token count.
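
To make the slug-extraction step concrete, here is a minimal sketch of pulling a candidate slug from a URL and applying a format check. The exact rules of the real pipeline are not stated here, so the regular expression, the stripped file extensions, and the `extract_slug` helper are all assumptions; the language, Gopher repetition, and token-count filters are not shown.

```python
import re
from urllib.parse import unquote, urlparse

# Assumed slug format: lowercase alphanumeric words joined by single hyphens,
# with at least two words.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)+")


def extract_slug(url):
    """Return the last path segment if it looks like a kebab-case slug, else None."""
    path = unquote(urlparse(url).path).rstrip("/")
    segment = path.rsplit("/", 1)[-1].lower()
    # Drop a few common page extensions (an assumed, non-exhaustive list).
    segment = re.sub(r"\.(html?|php|aspx?)$", "", segment)
    return segment if SLUG_RE.fullmatch(segment) else None


print(extract_slug("https://example.com/blog/can-we-live-on-mars/"))
# can-we-live-on-mars
print(extract_slug("https://example.com/index.html"))
# None
```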

BPE vocabulary (5,000 subwords) with `-` as a special token. Trained for 36 epochs with label smoothing (0.1) and position-aware EOS loss weighting. Best checkpoint at step 70,560.

## Evaluation

Evaluated on 5,000 held-out test samples using the full beam search
decoding pipeline.

| Metric | Value |
|---|---|
| Token F1 (macro) | 0.306 |
| Exact match | 2.1% |
| ROUGE-L | 0.284 |
| BERTScore F1 | 0.872 |
| Validity | 100% |
| Vocab diversity | 97.8% |

Token F1 splits both slugs on hyphens and computes set-overlap F1 (order
ignored). ROUGE-L measures the longest common subsequence and penalizes
misordered words. BERTScore computes contextual embedding similarity via
roberta-large; the floor is high (~0.82) because short English slugs are
not widely separated in that embedding space.
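
The Token F1 definition above can be sketched in a few lines. This assumes that "macro" means the unweighted mean of per-sample scores and that a pair with no shared tokens scores 0; `token_f1` and `macro_token_f1` are illustrative names, not functions from the evaluation code.

```python
def token_f1(reference, predicted):
    """Set-overlap F1 between two slugs split on hyphens (word order ignored)."""
    ref = set(reference.split("-"))
    pred = set(predicted.split("-"))
    overlap = len(ref & pred)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def macro_token_f1(pairs):
    """Unweighted mean of per-sample Token F1 over (reference, predicted) pairs."""
    return sum(token_f1(ref, pred) for ref, pred in pairs) / len(pairs)


score = token_f1(
    "12-things-may-not-know-waldorf-education",
    "10-things-you-didnt-know-about-waldorf-education",
)
print(round(score, 3))  # 0.533
```

For the Waldorf example from the table, 4 of the 8 predicted tokens appear among the 7 reference tokens, giving precision 0.5, recall 4/7, and F1 of 8/15.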

## Limitations

- Requires precomputed embeddings from OpenAI `text-embedding-3-small`.
  Other embedding models will produce poor results.
- Trained on English web content. Non-English or domain-specific text
  may produce generic or inaccurate slugs.
- Slugs reflect patterns in the training URLs, which include SEO-influenced
  and editorially inconsistent sources.
- The primary failure mode is overgeneralization: the model captures the
  topic but may miss specific angles or proper nouns (`asm` instead of
  `wasm` for a WebAssembly article).

## Links

- [Blog post](https://hash.dev/blog/vec2slug)
- [Training code](https://github.com/hashintel/labs)
- [vec2slug-v1-openai-small](https://huggingface.co/hashintel/vec2slug-v1-openai-small)

## Citation

```bibtex
@misc{vec2slug2026,
  title={vec2slug: URL Slug Generation from Text Embeddings},
  author={Mahmoud, Bilal and {HASH}},
  year={2026},
  url={https://github.com/hashintel/labs}
}
```