---
license: mit
language:
- en
tags:
- slug-generation
- onnx
- embedding-to-text
- url-slug
- beam-search
library_name: onnxruntime
pipeline_tag: summarization
---

# vec2slug-v1-openai-large

Generate URL slugs directly from text embeddings, without re-feeding source text through a language model. Designed to piggyback on embeddings a system already has for search or deduplication.

| | |
|---|---|
| **Parameters** | 24.8M |
| **Architecture** | Transformer decoder, 6L, d=512 |
| **Input** | OpenAI `text-embedding-3-small` (1536d) |
| **Vocab** | BPE, 5000 subwords |
| **Token F1** | 0.306 |
| **ONNX size** | 95.1 MiB |
| **Inference (CPU)** | ~41ms (M-series), ~160ms (budget VPS) |

14 to 19× faster and approximately 85× cheaper than a Haiku-class LLM call for the same task, including the cost of computing a fresh embedding. With existing embeddings (the intended use case), approximately 2,000× cheaper.

This is the **larger** of two variants. It achieves the best Token F1 but at 2× the inference cost of the smaller model.

See also: [Vec2Slug V1-Openai-Small](https://huggingface.co/hashintel/vec2slug-v1-openai-small)

## Quickstart

```bash
# install dependencies
pip install onnxruntime numpy

# or run directly with uv
uv run inference.py . --input embeddings.npy
```

```python
from inference import OnnxPredictor
import numpy as np

predictor = OnnxPredictor.from_dir(".")

# embeddings: [N, 1536] float32 from OpenAI text-embedding-3-small,
# here loaded from the same file used in the CLI example above
embeddings = np.load("embeddings.npy").astype(np.float32)
slugs = predictor.predict(embeddings)
# ["how-neural-networks-learn", "climate-change-solutions", ...]
```

PyTorch inference (requires `torch`):

```python
from inference import PyTorchPredictor

predictor = PyTorchPredictor.from_dir(".")
# embeddings: the same [N, 1536] float32 array as in the ONNX example
slugs = predictor.predict(embeddings)
```

## Examples

Predictions on held-out test samples (beam search, width 4). The model sees only the 1536-dim embedding, never the source text.


| Source text | Reference slug | Predicted slug |
|---|---|---|
| Children's book about astronomy and living on Mars | `can-we-live-on-mars` | `can-we-live-on-mars` |
| Teaching resources for Martin Luther King Jr. Day | `celebrating-martin-luther-king-jr-day` | `celebrating-martin-luther-king-jr-day` |
| Article about Waldorf education practices | `12-things-may-not-know-waldorf-education` | `10-things-you-didnt-know-about-waldorf-education` |

The third example illustrates the typical case: the model captures the topic correctly but diverges in specific wording. The common failure mode is overgeneralization rather than incoherence.

## How it works

The model is a prefix-conditioned transformer decoder. A precomputed text embedding is linearly projected into the decoder's hidden space and placed at position 0 as a prefix token. The decoder then autoregressively generates BPE subword tokens that form a kebab-case URL slug.

Beam search uses bounded additive length reward with score-based optimal stopping ([Huang et al. 2017](https://doi.org/10.18653/v1/D17-1227)). All decoding parameters are stored in `model.json`.

## Files

| File | Description |
|---|---|
| `model.onnx` | ONNX model (forward pass only) |
| `model.json` | Sidecar: vocabulary, beam search config, stopwords |
| `model.pt` | PyTorch weights (`state_dict`) |
| `tokenizer.json` | BPE tokenizer (HuggingFace `tokenizers` format) |
| `inference.py` | Standalone inference script (`uv run` compatible) |
| `manifest.train.json` | Training configuration and results |
| `manifest.onnx.json` | Export verification (tolerance, argmax agreement) |
| `history.train.jsonl` | Training loss/metric curves |

## Training

Trained on 2.3M documents from FineWeb-Edu with slugs extracted from source URLs. The extraction pipeline filters on language, slug format, Gopher repetition, and token count. BPE vocabulary (5,000 subwords) with `-` as a special token.

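
As a rough illustration of the slug-extraction step described above, the sketch below takes the last path segment of a URL and keeps it only if it looks like a kebab-case slug. The function name `extract_slug`, the regular expression, and the `min_tokens`/`max_tokens` thresholds are hypothetical choices for this example and are not taken from the actual pipeline. The language and Gopher-repetition filters are not reproduced here.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical format rule: lowercase alphanumeric tokens joined by single hyphens.
KEBAB_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def extract_slug(url: str, min_tokens: int = 2, max_tokens: int = 12) -> Optional[str]:
    """Return the last path segment of `url` if it passes the format checks, else None."""
    path = urlparse(url).path.rstrip("/")
    segment = path.rsplit("/", 1)[-1]
    # Drop a trailing file extension such as ".html" (an illustrative choice).
    segment = re.sub(r"\.[A-Za-z0-9]{1,5}$", "", segment)
    if not KEBAB_RE.fullmatch(segment):
        return None
    # Assumed token-count filter: count hyphen-separated tokens.
    n_tokens = segment.count("-") + 1
    if not (min_tokens <= n_tokens <= max_tokens):
        return None
    return segment


print(extract_slug("https://example.com/blog/can-we-live-on-mars.html"))
print(extract_slug("https://example.com/index.php?id=42"))
```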

Trained for 36 epochs with label smoothing (0.1) and position-aware EOS loss weighting. Best checkpoint at step 70,560.

## Evaluation

Evaluated on 5,000 held-out test samples using the full beam search decoding pipeline.

| Metric | Value |
|---|---|
| Token F1 (macro) | 0.306 |
| Exact match | 2.1% |
| ROUGE-L | 0.284 |
| BERTScore F1 | 0.872 |
| Validity | 100% |
| Vocab diversity | 97.8% |

Token F1 splits both slugs on hyphens and computes set-overlap F1 (order ignored). ROUGE-L measures the longest common subsequence and penalizes misordered words. BERTScore computes contextual embedding similarity via roberta-large; the floor is high (~0.82) because short English slugs are not widely separated in that embedding space.

## Limitations

- Requires precomputed embeddings from OpenAI `text-embedding-3-small`. Other embedding models will produce poor results.
- Trained on English web content. Non-English or domain-specific text may produce generic or inaccurate slugs.
- Slugs reflect patterns in the training URLs, which include SEO-influenced and editorially inconsistent sources.
- The primary failure mode is overgeneralization: the model captures the topic but may miss specific angles or proper nouns (`asm` instead of `wasm` for a WebAssembly article).

## Links

- [Blog post](https://hash.dev/blog/vec2slug)
- [Training code](https://github.com/hashintel/labs)
- [Vec2Slug V1-Openai-Small](https://huggingface.co/hashintel/vec2slug-v1-openai-small)

## Citation

```bibtex
@misc{vec2slug2026,
  title={vec2slug: URL Slug Generation from Text Embeddings},
  author={Mahmoud, Bilal and {HASH}},
  year={2026},
  url={https://github.com/hashintel/labs}
}
```
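
## Appendix: Token F1 sketch

The Token F1 metric described under Evaluation (split both slugs on hyphens, compute set-overlap F1, ignore order) can be sketched as follows. This is a minimal reading of that one-sentence definition, not the evaluation code itself: the names `token_f1` and `macro_token_f1` are hypothetical, and the sketch assumes no stopword removal or other normalization, since the card does not say whether the evaluation applies any.

```python
from typing import Iterable, Tuple


def token_f1(reference: str, predicted: str) -> float:
    """Set-overlap F1 between the hyphen-separated tokens of two slugs."""
    ref = {t for t in reference.split("-") if t}
    pred = {t for t in predicted.split("-") if t}
    overlap = len(ref & pred)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def macro_token_f1(pairs: Iterable[Tuple[str, str]]) -> float:
    """Unweighted mean of per-sample Token F1 (assumed meaning of 'macro')."""
    scores = [token_f1(ref, pred) for ref, pred in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# The Waldorf row from the Examples table: 4 shared tokens, 7 reference, 8 predicted.
score = token_f1(
    "12-things-may-not-know-waldorf-education",
    "10-things-you-didnt-know-about-waldorf-education",
)
print(round(score, 3))  # 0.533
```

Because the comparison is over sets, a prediction with the right words in the wrong order still scores 1.0 here, which is why ROUGE-L is reported alongside it.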