initial upload

Files changed (9)

README.md +132 -0
history.train.jsonl +35 -0
inference.py +528 -0
manifest.onnx.json +35 -0
manifest.train.json +45 -0
model.json +0 -0
model.onnx +3 -0
model.pt +3 -0
tokenizer.json +0 -0

README.md ADDED

	@@ -0,0 +1,132 @@

+---
+license: mit
+language:
+  - en
+tags:
+  - slug-generation
+  - onnx
+  - embedding-to-text
+  - url-slug
+  - beam-search
+library_name: onnxruntime
+pipeline_tag: text2text-generation
+---
+# vec2slug-v1-large
+Generate URL slugs directly from text embeddings, without re-feeding
+source text through a language model.
+| | |
+|---|---|
+| **Parameters** | 24.8M |
+| **Architecture** | Transformer decoder, 6L, d=512 |
+| **Input** | OpenAI `text-embedding-3-small` (1536d) |
+| **Vocab** | BPE, 5000 subwords |
+| **Token F1** | 0.306 |
+| **ONNX size** | 95.1 MiB |
+| **Inference (CPU)** | ~66 ms (M-series), ~258 ms (budget VPS) |
+
+This is the **larger** of two variants. It achieves the higher Token F1 of the two, at 2.2x the inference cost of the smaller model.
+See also: [Vec2Slug V1-Small](https://huggingface.co/hashintel/vec2slug-v1-small)
+## Quickstart
+```bash
+# install dependencies
+pip install onnxruntime numpy
+# or run directly with uv
+uv run inference.py . --input embeddings.npy
+```
+```python
+from inference import OnnxPredictor
+import numpy as np
+predictor = OnnxPredictor.from_dir(".")
+# embeddings: [N, 1536] float32 from OpenAI text-embedding-3-small
+embeddings = np.load("embeddings.npy").astype(np.float32)
+slugs = predictor.predict(embeddings)
+# ["how-neural-networks-learn", "climate-change-solutions", ...]
+```
+PyTorch inference (requires `torch`):
+```python
+from inference import PyTorchPredictor
+predictor = PyTorchPredictor.from_dir(".")
+slugs = predictor.predict(embeddings)
+```
+## How it works
+The model is a prefix-conditioned transformer decoder. A precomputed text
+embedding is linearly projected into the decoder's hidden space and placed
+at position 0 as a prefix token. The decoder then autoregressively generates
+BPE subword tokens that form a kebab-case URL slug.
+Beam search uses bounded additive length reward with score-based optimal
+stopping ([Huang et al. 2017](https://arxiv.org/abs/1702.02429)). All
+decoding parameters are stored in `model.json`.
+## Files
+| File | Description |
+|---|---|
+| `model.onnx` | ONNX model (forward pass only) |
+| `model.json` | Sidecar: vocabulary, beam search config, stopwords |
+| `model.pt` | PyTorch weights (`state_dict`) |
+| `tokenizer.json` | BPE tokenizer (HuggingFace `tokenizers` format) |
+| `inference.py` | Standalone inference script (`uv run` compatible) |
+| `manifest.train.json` | Training configuration and results |
+| `manifest.onnx.json` | Export verification (tolerance, argmax agreement) |
+| `history.train.jsonl` | Training loss/metric curves |
+## Training
+Trained on 2.3M documents from FineWeb-Edu with slugs extracted
+from source URLs. The extraction pipeline filters on language, slug format,
+Gopher repetition, and token count.
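+BPE vocabulary (5,000 subwords) with `-` as a special token. Trained with label smoothing (0.1) and position-aware EOS loss weighting. `history.train.jsonl` logs evaluations through step 70,000 (epoch 40); the best checkpoint by validation loss is at step 64,000 (epoch 36), as recorded in `manifest.train.json`.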
+## Evaluation
+Evaluated on 5,000 held-out test samples using the full beam search
+decoding pipeline.
+| Metric | Value |
+|---|---|
+| Token F1 (macro) | 0.306 |
+| Exact match | 2.1% |
+| Validity | 100% |
+| Vocab diversity | 97.8% |
+## Limitations
+- Requires precomputed embeddings from OpenAI `text-embedding-3-small`.
+  Other embedding models will produce poor results.
+- Trained on English web content. Non-English or domain-specific text
+  may produce generic or inaccurate slugs.
+- Slugs reflect patterns in the training URLs, which include SEO-influenced
+  and editorially inconsistent sources.
+## Links
+- [Blog post](https://hash.dev/blog/vec2slug)
+- [Training code](https://github.com/hashintel/labs)
+- [Vec2Slug V1-Small](https://huggingface.co/hashintel/vec2slug-v1-small)
+## Citation
+```bibtex
+@misc{vec2slug2025,
+  title={vec2slug: URL Slug Generation from Text Embeddings},
+  author={Mahmoud, Bilal},
+  year={2025},
+  url={https://github.com/hashintel/labs}
+}
+```
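Note on the README's "How it works" section: the prefix conditioning it describes can be illustrated with array shapes alone. The sketch below uses random weights and is not the released model; `W_proj`, `tok_emb`, `build_decoder_input`, and `causal_mask` are hypothetical names chosen for illustration, and the dimensions are taken from the README table.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, embed_dim, vocab_size = 1536, 512, 5000  # values from the README table

# Hypothetical random weights standing in for the trained parameters.
W_proj = (rng.standard_normal((input_dim, embed_dim)) * 0.02).astype(np.float32)
tok_emb = (rng.standard_normal((vocab_size, embed_dim)) * 0.02).astype(np.float32)


def build_decoder_input(embedding: np.ndarray, token_ids: np.ndarray) -> np.ndarray:
    """Place the projected embedding at position 0, followed by token embeddings."""
    prefix = (embedding @ W_proj)[:, None, :]        # [batch, 1, embed_dim]
    tokens = tok_emb[token_ids]                      # [batch, seq_len, embed_dim]
    return np.concatenate([prefix, tokens], axis=1)  # [batch, seq_len + 1, embed_dim]


def causal_mask(length: int) -> np.ndarray:
    """Boolean mask that is True where attention is blocked (j > i)."""
    return np.triu(np.ones((length, length), dtype=bool), k=1)


embedding = rng.standard_normal((2, input_dim)).astype(np.float32)
token_ids = np.array([[1, 17, 42], [1, 8, 99]], dtype=np.int64)  # arbitrary ids

seq = build_decoder_input(embedding, token_ids)
mask = causal_mask(seq.shape[1])
print(seq.shape, mask.shape)
```

Because the prefix occupies position 0 and the mask is causal, every generated token can attend to the projected embedding, which is how the decoder is conditioned without a separate encoder.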

history.train.jsonl ADDED

	@@ -0,0 +1,35 @@

+{"step": 2000, "epoch": 2, "train_loss": 4.106095605669401, "val_loss": 3.5962798564910887, "tok_f1": 0.1444776942012236, "mean_words": 4.9355, "lr": 0.0003, "wall_time": 1779222902.593431}
+{"step": 4000, "epoch": 3, "train_loss": 3.413264048563655, "val_loss": 3.324702338409424, "tok_f1": 0.2016711557233616, "mean_words": 4.7035, "lr": 0.0003, "wall_time": 1779223948.1597261}
+{"step": 6000, "epoch": 4, "train_loss": 3.2278357425950444, "val_loss": 3.215394763946533, "tok_f1": 0.22583124481727423, "mean_words": 4.8295, "lr": 0.0003, "wall_time": 1779224990.457532}
+{"step": 8000, "epoch": 5, "train_loss": 3.1305528082016782, "val_loss": 3.154838008880615, "tok_f1": 0.24623787544155193, "mean_words": 4.788, "lr": 0.0003, "wall_time": 1779226040.401222}
+{"step": 10000, "epoch": 6, "train_loss": 3.0648372187956725, "val_loss": 3.1069913497924806, "tok_f1": 0.24793856168341463, "mean_words": 4.969, "lr": 0.0003, "wall_time": 1779227077.782281}
+{"step": 12000, "epoch": 7, "train_loss": 3.0171820744484097, "val_loss": 3.0755839088439942, "tok_f1": 0.2596720575176457, "mean_words": 4.941, "lr": 0.0003, "wall_time": 1779228114.887825}
+{"step": 14000, "epoch": 8, "train_loss": 2.9800491321919766, "val_loss": 3.052339796447754, "tok_f1": 0.2559486954222248, "mean_words": 5.0645, "lr": 0.0003, "wall_time": 1779229152.6426811}
+{"step": 16000, "epoch": 9, "train_loss": 2.9487837862643955, "val_loss": 3.0353144744873046, "tok_f1": 0.26408666970284617, "mean_words": 4.755, "lr": 0.0003, "wall_time": 1779230163.828542}
+{"step": 18000, "epoch": 11, "train_loss": 2.918691721206019, "val_loss": 3.02962755279541, "tok_f1": 0.2688920691236868, "mean_words": 5.12, "lr": 0.0003, "wall_time": 1779231166.284846}
+{"step": 20000, "epoch": 12, "train_loss": 2.896327673036307, "val_loss": 3.018091846084595, "tok_f1": 0.2724128310415075, "mean_words": 4.878, "lr": 0.0003, "wall_time": 1779232199.0462751}
+{"step": 22000, "epoch": 13, "train_loss": 2.87821229420086, "val_loss": 3.0077461536407473, "tok_f1": 0.27315177722604195, "mean_words": 5.1035, "lr": 0.0003, "wall_time": 1779233254.125132}
+{"step": 24000, "epoch": 14, "train_loss": 2.8617876689077253, "val_loss": 2.998493883895874, "tok_f1": 0.2770256465756466, "mean_words": 4.8905, "lr": 0.0003, "wall_time": 1779234303.187704}
+{"step": 26000, "epoch": 15, "train_loss": 2.846088374496488, "val_loss": 2.9906312114715576, "tok_f1": 0.27703381985661396, "mean_words": 4.896, "lr": 0.0003, "wall_time": 1779235355.874115}
+{"step": 28000, "epoch": 16, "train_loss": 2.8328490578439105, "val_loss": 2.983960963058472, "tok_f1": 0.2795972222222222, "mean_words": 4.9435, "lr": 0.0003, "wall_time": 1779236406.483165}
+{"step": 30000, "epoch": 17, "train_loss": 2.820020103981039, "val_loss": 2.97227031211853, "tok_f1": 0.28214252634620285, "mean_words": 5.0595, "lr": 0.0003, "wall_time": 1779237476.096491}
+{"step": 32000, "epoch": 18, "train_loss": 2.8092726084687767, "val_loss": 2.968260679626465, "tok_f1": 0.28473659257409256, "mean_words": 4.924, "lr": 0.0003, "wall_time": 1779238523.0281012}
+{"step": 34000, "epoch": 20, "train_loss": 2.79349008795453, "val_loss": 2.977187242126465, "tok_f1": 0.2865114801864802, "mean_words": 4.9075, "lr": 0.0003, "wall_time": 1779239577.0179908}
+{"step": 36000, "epoch": 21, "train_loss": 2.783505980300933, "val_loss": 2.9694487785339354, "tok_f1": 0.288755238062591, "mean_words": 4.858, "lr": 0.0003, "wall_time": 1779240639.721827}
+{"step": 38000, "epoch": 22, "train_loss": 2.774734211295068, "val_loss": 2.965319557952881, "tok_f1": 0.2830145099181864, "mean_words": 4.9315, "lr": 0.0003, "wall_time": 1779241690.4812958}
+{"step": 40000, "epoch": 23, "train_loss": 2.7663396469081585, "val_loss": 2.960056104660034, "tok_f1": 0.29040886058386056, "mean_words": 4.988, "lr": 0.0003, "wall_time": 1779242737.81421}
+{"step": 42000, "epoch": 24, "train_loss": 2.75957179015756, "val_loss": 2.957438604736328, "tok_f1": 0.2905343975468975, "mean_words": 4.9165, "lr": 0.0003, "wall_time": 1779243786.238262}
+{"step": 44000, "epoch": 25, "train_loss": 2.7523164791037815, "val_loss": 2.9523234798431397, "tok_f1": 0.29058897613824086, "mean_words": 4.9375, "lr": 0.0003, "wall_time": 1779244830.177305}
+{"step": 46000, "epoch": 26, "train_loss": 2.7447811277795235, "val_loss": 2.9494457813262938, "tok_f1": 0.28798811188811185, "mean_words": 5.0245, "lr": 0.0003, "wall_time": 1779245868.350689}
+{"step": 48000, "epoch": 27, "train_loss": 2.7385771292894536, "val_loss": 2.946452843475342, "tok_f1": 0.28848719752469754, "mean_words": 4.876, "lr": 0.0003, "wall_time": 1779246903.871413}
+{"step": 50000, "epoch": 29, "train_loss": 2.728870005215236, "val_loss": 2.957064482879639, "tok_f1": 0.290686912515589, "mean_words": 4.911, "lr": 0.0003, "wall_time": 1779247946.015985}
+{"step": 52000, "epoch": 30, "train_loss": 2.7219258368258132, "val_loss": 2.9526238201141357, "tok_f1": 0.2944186653216065, "mean_words": 4.7625, "lr": 0.0003, "wall_time": 1779248976.7653491}
+{"step": 54000, "epoch": 31, "train_loss": 2.7171959208950165, "val_loss": 2.9489395374298097, "tok_f1": 0.28971268453768456, "mean_words": 4.812, "lr": 0.0003, "wall_time": 1779250006.8798962}
+{"step": 56000, "epoch": 32, "train_loss": 2.711857982278668, "val_loss": 2.949110791015625, "tok_f1": 0.29125145589704415, "mean_words": 4.9335, "lr": 0.0003, "wall_time": 1779251042.63035}
+{"step": 58000, "epoch": 33, "train_loss": 2.7074541541301547, "val_loss": 2.9462409435272217, "tok_f1": 0.2962148821766469, "mean_words": 4.908, "lr": 0.0003, "wall_time": 1779252071.914974}
+{"step": 60000, "epoch": 34, "train_loss": 2.70361461964871, "val_loss": 2.944313480758667, "tok_f1": 0.29103940960999786, "mean_words": 4.9475, "lr": 0.0003, "wall_time": 1779253094.764807}
+{"step": 62000, "epoch": 35, "train_loss": 2.698599462362122, "val_loss": 2.942076708984375, "tok_f1": 0.29306238744915214, "mean_words": 4.841, "lr": 0.0003, "wall_time": 1779254107.271397}
+{"step": 64000, "epoch": 36, "train_loss": 2.6947960017598676, "val_loss": 2.937381767654419, "tok_f1": 0.295903315556992, "mean_words": 4.934, "lr": 0.0003, "wall_time": 1779255123.1438122}
+{"step": 66000, "epoch": 38, "train_loss": 2.687774037942866, "val_loss": 2.948435255050659, "tok_f1": 0.2897239565989566, "mean_words": 4.964, "lr": 0.0003, "wall_time": 1779256134.4341109}
+{"step": 68000, "epoch": 39, "train_loss": 2.6818021759542097, "val_loss": 2.9472034103393554, "tok_f1": 0.2949354034854035, "mean_words": 4.946, "lr": 0.0003, "wall_time": 1779257196.270357}
+{"step": 70000, "epoch": 40, "train_loss": 2.678613240182306, "val_loss": 2.9431504138946534, "tok_f1": 0.29160234944793767, "mean_words": 5.032, "lr": 0.0003, "wall_time": 1779258283.2707899}
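Note on `history.train.jsonl`: each line is a standalone JSON object, so the file can be scanned with the standard library. The following is a minimal sketch; `load_history` and `best_by_val_loss` are hypothetical helper names, and `SAMPLE` holds three records copied from the file above with some fields trimmed.

```python
import json

# Three records copied from history.train.jsonl (fields trimmed for brevity).
SAMPLE = """\
{"step": 62000, "epoch": 35, "val_loss": 2.942076708984375, "tok_f1": 0.29306238744915214}
{"step": 64000, "epoch": 36, "val_loss": 2.937381767654419, "tok_f1": 0.295903315556992}
{"step": 66000, "epoch": 38, "val_loss": 2.948435255050659, "tok_f1": 0.2897239565989566}
"""


def load_history(text: str) -> list:
    """Parse JSON Lines: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def best_by_val_loss(records: list) -> dict:
    """Return the evaluation record with the lowest validation loss."""
    return min(records, key=lambda record: record["val_loss"])


records = load_history(SAMPLE)
best = best_by_val_loss(records)
print(best["step"], best["epoch"], best["val_loss"])
```

Run against the full file (for example via `Path("history.train.jsonl").read_text()`), the same helper should select step 64000, whose `val_loss` equals the `best_val_loss` recorded in `manifest.train.json`.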

inference.py ADDED

	@@ -0,0 +1,528 @@

+# /// script
+# requires-python = ">=3.12"
+# dependencies = [
+#     "numpy>=1.24",
+#     "onnxruntime>=1.16",
+# ]
+# ///
+"""vec2slug: generate URL slugs from text embeddings.
+Standalone inference script for vec2slug models. Loads an ONNX (or
+PyTorch) model and its JSON sidecar, runs beam search decoding, and
+returns kebab-case slugs.
+Usage as a library:
+    from inference import OnnxPredictor
+    predictor = OnnxPredictor.from_dir(".")
+    slugs = predictor.predict(embeddings)  # [N, input_dim] float32
+Usage from the command line:
+    uv run inference.py .                          # random demo
+    uv run inference.py . --input embeddings.npy   # real embeddings
+PyTorch backend (requires torch):
+    from inference import PyTorchPredictor
+    predictor = PyTorchPredictor.from_dir(".")
+"""
+from __future__ import annotations
+import argparse
+import json
+import sys
+from abc import ABC, abstractmethod
+from pathlib import Path
+from typing import TypedDict
+import numpy as np
+class ModelConfig(TypedDict):
+    input_dim: int
+    embed_dim: int
+    num_heads: int
+    num_layers: int
+    max_slug_tokens: int
+    vocab_size: int
+class TokenConfig(TypedDict):
+    pad: int
+    bos: int
+    eos: int
+    unk: int
+    hyphen: int
+class BeamSearchConfig(TypedDict):
+    beam_width: int
+    length_reward: float
+    reward_cap: int
+    min_decode_tokens: int
+    min_slug_words: int
+class Sidecar(TypedDict):
+    model: ModelConfig
+    tokens: TokenConfig
+    vocab: dict[str, str]  # token_id (str) -> token
+    beam_search: BeamSearchConfig
+    stopwords: list[str]
+def _log_softmax(x: np.ndarray) -> np.ndarray:
+    """Numerically stable log-softmax over a 1-D array."""
+    x_max = x.max()
+    shifted = x - x_max
+    return shifted - np.log(np.exp(shifted).sum())
+class SlugPredictor(ABC):
+    """Beam search slug predictor. Subclasses provide the forward pass."""
+    def __init__(self, sidecar: Sidecar):
+        tokens = sidecar["tokens"]
+        self.pad_idx = tokens["pad"]
+        self.bos_idx = tokens["bos"]
+        self.eos_idx = tokens["eos"]
+        self.unk_idx = tokens["unk"]
+        self.hyphen_idx = tokens["hyphen"]
+        self.id_to_token: dict[int, str] = {
+            int(k): v for k, v in sidecar["vocab"].items()
+        }
+        beam = sidecar["beam_search"]
+        self.beam_width: int = beam["beam_width"]
+        self.length_reward: float = beam["length_reward"]
+        self.reward_cap: int = beam["reward_cap"]
+        self.min_decode_tokens: int = beam["min_decode_tokens"]
+        self.min_slug_words: int = beam["min_slug_words"]
+        self.max_length: int = sidecar["model"]["max_slug_tokens"]
+        self.max_content_tokens: int = max(self.max_length - 1, 0)
+        self.stopwords: frozenset[str] = frozenset(sidecar["stopwords"])
+    def predict(self, embeddings: np.ndarray) -> list[str]:
+        """Predict slugs for a batch of embeddings.
+        Args:
+            embeddings: float32 array of shape [N, input_dim].
+        Returns:
+            List of kebab-case slug strings, one per embedding.
+        """
+        slugs = []
+        for i in range(len(embeddings)):
+            candidates = self._beam_search(embeddings[i : i + 1])
+            slugs.append(candidates[0][0] if candidates else "")
+        return slugs
+    def predict_topk(
+        self, embeddings: np.ndarray, k: int = 5
+    ) -> list[list[tuple[str, float]]]:
+        """Return top-k slug candidates with scores for each embedding."""
+        results = []
+        for i in range(len(embeddings)):
+            candidates = self._beam_search(embeddings[i : i + 1])
+            results.append(candidates[:k])
+        return results
+    @abstractmethod
+    def _forward(self, embeddings: np.ndarray, token_ids: np.ndarray) -> np.ndarray:
+        """Run the model: (embeddings, token_ids) -> logits.
+        Args:
+            embeddings: [batch, input_dim] float32
+            token_ids:  [batch, seq_len] int64
+        Returns:
+            logits: [batch, seq_len, vocab_size] float32
+        """
+        raise NotImplementedError
+    def _decode_tokens(self, indices: list[int]) -> str:
+        """Decode token indices to a slug string, stopping at EOS."""
+        parts: list[str] = []
+        for idx in indices:
+            if idx == self.eos_idx:
+                break
+            if idx in (self.pad_idx, self.bos_idx):
+                continue
+            if idx == self.hyphen_idx:
+                parts.append("-")
+            else:
+                token = self.id_to_token.get(idx)
+                if token is not None:
+                    parts.append(token)
+        return "".join(parts)
+    def _score(self, log_prob: float, tokens: list[int]) -> float:
+        """Score a completed beam using bounded additive length reward.
+        score = log_prob + r * min(word_count, B) + penalties
+        """
+        slug = self._decode_tokens(tokens).strip("-")
+        words = slug.split("-") if slug else []
+        word_count = len([w for w in words if w])
+        score = log_prob + self.length_reward * min(word_count, self.reward_cap)
+        # Trailing stopword penalty
+        if words and words[-1] in self.stopwords:
+            score -= 1.0
+        # Repetition penalty
+        content = [w for w in words if w and w not in self.stopwords]
+        if len(content) != len(set(content)):
+            score -= 2.0
+        return score
+    def _partial_score(self, log_prob: float, tokens: list[int]) -> float:
+        """Optimistic partial score for active beam ranking."""
+        slug = self._decode_tokens(tokens).strip("-")
+        words = [w for w in slug.split("-") if w] if slug else []
+        return log_prob + self.length_reward * min(len(words), self.reward_cap)
+    def _beam_search(self, embedding: np.ndarray) -> list[tuple[str, float]]:
+        """Beam search with score-based optimal stopping.
+        Uses bounded additive length reward with the Huang et al. (2017)
+        stopping criterion: stop when the best completed beam provably
+        dominates every active beam's upper bound.
+        """
+        bos = self.bos_idx
+        eos = self.eos_idx
+        pad = self.pad_idx
+        unk = self.unk_idx
+        k = self.beam_width
+        r = self.length_reward
+        B = self.reward_cap
+        active: list[tuple[float, list[int]]] = [(0.0, [bos])]
+        best_finished_score = -float("inf")
+        completed: list[tuple[float, list[int]]] = []
+        stopped_by_bound = False
+        for _step in range(self.max_length):
+            if not active:
+                break
+            candidates: list[tuple[float, list[int]]] = []
+            # Batch all active beams into a single forward pass
+            max_len = max(len(t) for _, t in active)
+            padded = [t + [pad] * (max_len - len(t)) for _, t in active]
+            input_ids = np.array(padded, dtype=np.int64)
+            embedding_batch = np.tile(embedding, (len(active), 1))
+            all_logits = self._forward(embedding_batch, input_ids)
+            for beam_idx, (log_prob, tokens) in enumerate(active):
+                next_logits = all_logits[beam_idx, len(tokens) - 1, :].copy()
+                content_length = len(tokens) - 1  # exclude BOS
+                force_eos = content_length >= self.max_content_tokens
+                # Suppress PAD and UNK always
+                next_logits[pad] = -np.inf
+                if unk is not None:
+                    next_logits[unk] = -np.inf
+                if force_eos:
+                    # Force EOS, but charge its model probability
+                    log_probs = _log_softmax(next_logits)
+                    top_indices = np.array([eos])
+                else:
+                    if content_length < self.min_decode_tokens:
+                        next_logits[eos] = -np.inf
+                    slug_so_far = self._decode_tokens(tokens[1:]).strip("-")
+                    words = slug_so_far.split("-") if slug_so_far else []
+                    if len(words) < self.min_slug_words:
+                        next_logits[eos] = -np.inf
+                    if words and words[-1] in self.stopwords:
+                        next_logits[eos] = -np.inf
+                    log_probs = _log_softmax(next_logits)
+                    top_count = min(k, len(log_probs))
+                    top_indices = np.argpartition(log_probs, -top_count)[-top_count:]
+                    top_indices = top_indices[np.argsort(log_probs[top_indices])[::-1]]
+                for j in range(len(top_indices)):
+                    token_id = int(top_indices[j])
+                    token_lp = float(log_probs[token_id])
+                    if not np.isfinite(token_lp):
+                        continue
+                    new_log_prob = log_prob + token_lp
+                    new_tokens = tokens + [token_id]
+                    if token_id == eos:
+                        score = self._score(new_log_prob, new_tokens)
+                        completed.append((new_log_prob, new_tokens))
+                        best_finished_score = max(best_finished_score, score)
+                    else:
+                        candidates.append((new_log_prob, new_tokens))
+            # Rank by partial objective for consistent pruning
+            candidates.sort(
+                key=lambda x: self._partial_score(x[0], x[1]), reverse=True
+            )
+            active = candidates[:k]
+            # Optimal stopping: best completed dominates all active upper bounds
+            if active and best_finished_score > -float("inf"):
+                max_active_lp = max(lp for lp, _ in active)
+                upper_bound = max_active_lp + r * B
+                if best_finished_score >= upper_bound:
+                    stopped_by_bound = True
+                    break
+        # Force-finish active beams by charging EOS probability
+        if active and not stopped_by_bound:
+            max_len = max(len(t) for _, t in active)
+            padded = [t + [pad] * (max_len - len(t)) for _, t in active]
+            input_ids = np.array(padded, dtype=np.int64)
+            embedding_batch = np.tile(embedding, (len(active), 1))
+            finish_logits = self._forward(embedding_batch, input_ids)
+            for bi, (log_prob, tokens) in enumerate(active):
+                nl = finish_logits[bi, len(tokens) - 1, :].copy()
+                nl[pad] = -np.inf
+                if unk is not None:
+                    nl[unk] = -np.inf
+                lp = _log_softmax(nl)
+                eos_lp = float(lp[eos])
+                if np.isfinite(eos_lp):
+                    completed.append((log_prob + eos_lp, tokens + [eos]))
+                else:
+                    completed.append((log_prob - 5.0, tokens + [eos]))
+        # Deduplicate and rank
+        scored = [
+            (self._score(log_prob, tokens), tokens)
+            for log_prob, tokens in completed
+        ]
+        scored.sort(key=lambda x: -x[0])
+        seen: set[str] = set()
+        results: list[tuple[str, float]] = []
+        for score, tokens in scored:
+            slug = self._decode_tokens(tokens).strip("-")
+            if not slug or slug in seen:
+                continue
+            seen.add(slug)
+            results.append((slug, score))
+        return results
+class OnnxPredictor(SlugPredictor):
+    """ONNX Runtime inference. No torch dependency."""
+    def __init__(self, session, sidecar: Sidecar):
+        super().__init__(sidecar)
+        self.session = session
+    @classmethod
+    def from_dir(cls, model_dir: str | Path) -> OnnxPredictor:
+        """Load from a directory containing model.onnx and model.json."""
+        import onnxruntime as ort
+        model_dir = Path(model_dir)
+        session = ort.InferenceSession(str(model_dir / "model.onnx"))
+        sidecar = json.loads((model_dir / "model.json").read_text())
+        return cls(session, sidecar)
+    def _forward(self, embeddings: np.ndarray, token_ids: np.ndarray) -> np.ndarray:
+        return self.session.run(
+            None,
+            {"src_embedding": embeddings, "token_ids": token_ids},
+        )[0]
+def _load_pytorch_model(model_dir: Path, model_config: ModelConfig):
+    """Build and load the SlugDecoder. Requires torch.
+    The model is a prefix-conditioned transformer decoder: the source
+    embedding is projected into decoder space and placed at position 0,
+    followed by BOS and autoregressive token embeddings.
+    """
+    import torch
+    from torch import Tensor, nn
+    class DecoderBlock(nn.Module):
+        def __init__(self, embed_dim: int, num_heads: int, dropout: float):
+            super().__init__()
+            self.ln1 = nn.LayerNorm(embed_dim)
+            self.attn = nn.MultiheadAttention(
+                embed_dim, num_heads, dropout=dropout, batch_first=True
+            )
+            self.ln2 = nn.LayerNorm(embed_dim)
+            self.ffn = nn.Sequential(
+                nn.Linear(embed_dim, embed_dim * 4),
+                nn.GELU(),
+                nn.Dropout(dropout),
+                nn.Linear(embed_dim * 4, embed_dim),
+                nn.Dropout(dropout),
+            )
+        def forward(self, x: Tensor, attn_mask: Tensor) -> Tensor:
+            normed = self.ln1(x)
+            x = (
+                x
+                + self.attn(
+                    normed, normed, normed, attn_mask=attn_mask, is_causal=True
+                )[0]
+            )
+            x = x + self.ffn(self.ln2(x))
+            return x
+    class SlugDecoder(nn.Module):
+        def __init__(
+            self,
+            vocab_size: int,
+            embed_dim: int,
+            num_heads: int,
+            num_layers: int,
+            input_dim: int,
+            max_length: int,
+            dropout: float = 0.1,
+        ):
+            super().__init__()
+            self.embed_dim = embed_dim
+            self.max_length = max_length
+            self.embedding_projection = nn.Linear(input_dim, embed_dim)
+            self.token_embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
+            self.position_embedding = nn.Embedding(max_length + 1, embed_dim)
+            self.dropout = nn.Dropout(dropout)
+            self.blocks = nn.ModuleList(
+                [DecoderBlock(embed_dim, num_heads, dropout) for _ in range(num_layers)]
+            )
+            self.ln_final = nn.LayerNorm(embed_dim)
+            self.output_projection = nn.Linear(embed_dim, vocab_size)
+        def forward(self, embeddings: Tensor, target_ids: Tensor) -> Tensor:
+            prefix = self.embedding_projection(embeddings).unsqueeze(1)
+            token_emb = self.token_embedding(target_ids)
+            seq = torch.cat([prefix, token_emb], dim=1)
+            positions = torch.arange(seq.size(1), device=seq.device)
+            seq = seq + self.position_embedding(positions)
+            seq = self.dropout(seq)
+            attn_mask = nn.Transformer.generate_square_subsequent_mask(
+                seq.size(1), device=seq.device
+            )
+            for block in self.blocks:
+                seq = block(seq, attn_mask)
+            seq = self.ln_final(seq)
+            return self.output_projection(seq[:, 1:, :])
+    model = SlugDecoder(
+        vocab_size=model_config["vocab_size"],
+        embed_dim=model_config["embed_dim"],
+        num_heads=model_config["num_heads"],
+        num_layers=model_config["num_layers"],
+        input_dim=model_config["input_dim"],
+        max_length=model_config["max_slug_tokens"],
+    )
+    model.load_state_dict(
+        torch.load(model_dir / "model.pt", map_location="cpu", weights_only=True)
+    )
+    model.eval()
+    return model
+class PyTorchPredictor(SlugPredictor):
+    """PyTorch inference. Requires: pip install torch"""
+    def __init__(self, model, sidecar: Sidecar):
+        super().__init__(sidecar)
+        self.model = model
+    @classmethod
+    def from_dir(cls, model_dir: str | Path) -> PyTorchPredictor:
+        """Load from a directory containing model.pt and model.json."""
+        model_dir = Path(model_dir)
+        sidecar = json.loads((model_dir / "model.json").read_text())
+        model = _load_pytorch_model(model_dir, sidecar["model"])
+        return cls(model, sidecar)
+    def _forward(self, embeddings: np.ndarray, token_ids: np.ndarray) -> np.ndarray:
+        import torch
+        with torch.no_grad():
+            logits = self.model(
+                torch.from_numpy(embeddings),
+                torch.from_numpy(token_ids),
+            )
+            return logits.numpy()
+def main():
+    parser = argparse.ArgumentParser(
+        description="Generate URL slugs from text embeddings",
+    )
+    parser.add_argument(
+        "model_dir",
+        type=Path,
+        help="Directory containing model.json and model.onnx (or model.pt for --backend pytorch)",
+    )
+    parser.add_argument(
+        "--input",
+        type=Path,
+        default=None,
+        help="Path to .npy file with embeddings (shape [N, input_dim])",
+    )
+    parser.add_argument(
+        "--backend",
+        choices=["onnx", "pytorch"],
+        default="onnx",
+        help="Inference backend (default: onnx)",
+    )
+    parser.add_argument(
+        "--topk",
+        type=int,
+        default=1,
+        help="Number of candidates per embedding (default: 1)",
+    )
+    args = parser.parse_args()
+    # Load model
+    if args.backend == "onnx":
+        predictor = OnnxPredictor.from_dir(args.model_dir)
+    else:
+        predictor = PyTorchPredictor.from_dir(args.model_dir)
+    # Load or generate embeddings
+    sidecar = json.loads((args.model_dir / "model.json").read_text())
+    input_dim = sidecar["model"]["input_dim"]
+    if args.input is not None:
+        embeddings = np.load(args.input).astype(np.float32)
+        print(f"Loaded {len(embeddings)} embeddings from {args.input}", file=sys.stderr)
+    else:
+        embeddings = np.random.randn(3, input_dim).astype(np.float32)
+        print(
+            "No --input provided, using random embeddings (results will be nonsensical)",
+            file=sys.stderr,
+        )
+    # Predict
+    if args.topk > 1:
+        results = predictor.predict_topk(embeddings, k=args.topk)
+        for i, candidates in enumerate(results):
+            print(f"[{i}]")
+            for slug, score in candidates:
+                print(f"  {score:+.2f}  {slug}")
+    else:
+        slugs = predictor.predict(embeddings)
+        for slug in slugs:
+            print(slug)
+if __name__ == "__main__":
+    main()
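Note on `_score` and the stopping check in `_beam_search`: the two ideas can be restated in isolation. The sketch below is a simplified restatement that works on slug strings rather than token ids, not a replacement for the script's methods. `bounded_score` and `can_stop` are hypothetical names, and the reward `0.5` and cap `5` are made-up values, since the real ones live in `model.json`, which is not rendered in this diff.

```python
def bounded_score(log_prob, slug, length_reward, reward_cap, stopwords=frozenset()):
    """log_prob + r * min(word_count, cap), minus the two penalties used in _score."""
    words = [w for w in slug.strip("-").split("-") if w]
    score = log_prob + length_reward * min(len(words), reward_cap)
    if words and words[-1] in stopwords:
        score -= 1.0  # trailing stopword penalty
    content = [w for w in words if w not in stopwords]
    if len(content) != len(set(content)):
        score -= 2.0  # repeated content word penalty
    return score


def can_stop(best_finished, active_log_probs, length_reward, reward_cap):
    """True when no active beam can still beat the best finished hypothesis.

    A beam's log-probability can only decrease as tokens are appended, and the
    length reward never exceeds r * cap, so max(active) + r * cap is an upper
    bound on any future score (penalties only lower it further).
    """
    if not active_log_probs:
        return True
    upper_bound = max(active_log_probs) + length_reward * reward_cap
    return best_finished >= upper_bound


R, CAP = 0.5, 5  # hypothetical values for illustration only
finished = bounded_score(-3.0, "how-neural-networks-learn", R, CAP)
print(finished)                                   # four words: -3.0 + 0.5 * 4
print(can_stop(finished, [-4.0, -5.0], R, CAP))   # bound is -4.0 + 2.5
print(can_stop(finished, [-2.0], R, CAP))         # bound is -2.0 + 2.5
```

Capping the reward is what makes the bound finite: with an uncapped additive reward, an active beam could always grow its score by adding words, and the search could never prove it was done early.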

manifest.onnx.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "exported_at": "2026-05-23T17:36:12.280199+00:00",
+  "torch_version": "2.12.0",
+  "artifacts": [
+    "model.onnx"
+  ],
+  "sidecar": "model.json",
+  "onnx_size_bytes": 99694368,
+  "sidecar_size_bytes": 105072,
+  "verification": {
+    "onnxruntime_version": "1.26.0",
+    "random_inputs": {
+      "batch_1_max_diff": 1.9073486328125e-05,
+      "batch_4_max_diff": 2.6226043701171875e-05
+    },
+    "real_embeddings": {
+      "prediction_set": "seq2seq_bpe_d512_l6_t24_eos_seq2seq_test.parquet",
+      "n_samples": 5000,
+      "tolerance": {
+        "atol": 0.0001,
+        "rtol": 1e-05
+      },
+      "max_absolute_diff": 2.8908252716064453e-05,
+      "mean_absolute_diff": 2.8676997771981405e-06,
+      "p95_absolute_diff": 2.0503997802734375e-05,
+      "p99_absolute_diff": 2.342522202525288e-05,
+      "argmax_agreement": 5000,
+      "argmax_agreement_rate": 1.0,
+      "wilson_ci_95": [
+        0.9992322698624194,
+        1.0
+      ]
+    }
+  }
+}
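Note on `wilson_ci_95` in `manifest.onnx.json`: the manifest does not say how the interval was computed. The sketch below is the textbook Wilson score interval, and the choice `z = 1.96` is an assumption; with 5000 agreements out of 5000 samples it gives a lower bound that agrees with the reported 0.99923... to within rounding. `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is assumed)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(5000, 5000)
print(f"{low:.10f} {high:.10f}")
```

Unlike the normal-approximation interval, which collapses to zero width when every sample agrees, the Wilson interval still yields a nontrivial lower bound, which is why it suits an `argmax_agreement_rate` of 1.0.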

manifest.train.json ADDED

	@@ -0,0 +1,45 @@

+{
+  "schema_version": 1,
+  "variant": "seq2seq",
+  "encoder": "openai",
+  "seed": 42,
+  "compression": null,
+  "tokenizer": "bpe",
+  "model": {
+    "input_dim": 1536,
+    "vocab_size": 5000,
+    "embed_dim": 512,
+    "num_heads": 8,
+    "num_layers": 6,
+    "dropout": 0.1,
+    "max_slug_tokens": 24
+  },
+  "training": {
+    "lr": 0.0003,
+    "weight_decay": 0.0001,
+    "batch_size": 1024,
+    "patience": 10,
+    "epochs": 50,
+    "eval_every": 2000,
+    "val_max_samples": 5000,
+    "checkpoint_every": 5000,
+    "keep_last_checkpoints": 5,
+    "f1_n_samples": 2000
+  },
+  "results": {
+    "best_val_loss": 2.937381767654419,
+    "best_step": 64000,
+    "total_steps": 64000,
+    "n_params": 24840072
+  },
+  "artifacts": [
+    "best.pt",
+    "tokenizer.json",
+    "history.jsonl",
+    "step_040000.pt",
+    "step_045000.pt",
+    "step_050000.pt",
+    "step_055000.pt",
+    "step_060000.pt"
+  ]
+}
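Note on `n_params` in `manifest.train.json`: the figure can be cross-checked against the `SlugDecoder` definition in `inference.py` by summing layer sizes by hand. The sketch assumes PyTorch's default parameterisation (biases on every `nn.Linear`, a fused input projection plus an output projection in `nn.MultiheadAttention`, weight and bias in `nn.LayerNorm`); `linear_params` and `count_params` are hypothetical helper names.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus bias of a dense layer."""
    return in_features * out_features + out_features


def count_params(input_dim, vocab_size, embed_dim, num_layers, max_slug_tokens):
    layer_norm = 2 * embed_dim  # scale and shift
    attention = (
        linear_params(embed_dim, 3 * embed_dim)  # fused q/k/v input projection
        + linear_params(embed_dim, embed_dim)    # attention output projection
    )
    ffn = (
        linear_params(embed_dim, 4 * embed_dim)
        + linear_params(4 * embed_dim, embed_dim)
    )
    block = 2 * layer_norm + attention + ffn
    return (
        linear_params(input_dim, embed_dim)      # embedding_projection
        + vocab_size * embed_dim                 # token_embedding
        + (max_slug_tokens + 1) * embed_dim      # position_embedding (+1 for prefix)
        + num_layers * block
        + layer_norm                             # ln_final
        + linear_params(embed_dim, vocab_size)   # output_projection
    )


# Values from the "model" section of manifest.train.json.
total = count_params(
    input_dim=1536, vocab_size=5000, embed_dim=512, num_layers=6, max_slug_tokens=24
)
print(total)  # 24840072, the same as n_params in the manifest
```

`num_heads` does not appear because splitting the projection into heads changes the tensor layout, not the number of weights.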

model.json ADDED

The diff for this file is too large to render. See raw diff

model.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1cc982c9e13af132fa31fdabf8cd3b3be04660f12f3bc72706273cb57bbc8f9f
+size 99694368

model.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c22ce1b1b571d2eec1498b09f24afca25b7b1a4848587bab2cfa26f39c81e33e
+size 99382065
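Note on the Git LFS pointers for `model.onnx` and `model.pt`: each pointer records the SHA-256 digest and byte size of the real file, so a download can be checked locally. The sketch below uses only the standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the pointer text is copied from the `model.onnx` entry above.

```python
import hashlib
from pathlib import Path

# Pointer content copied from the model.onnx entry in this commit.
MODEL_ONNX_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:1cc982c9e13af132fa31fdabf8cd3b3be04660f12f3bc72706273cb57bbc8f9f
size 99694368
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a pointer into (algorithm, digest, size)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return algo, digest, int(fields["size"])


def matches_pointer(path, pointer_text, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest in the pointer."""
    algo, digest, size = parse_lfs_pointer(pointer_text)
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo}")
    path = Path(path)
    if path.stat().st_size != size:
        return False  # cheap check before hashing the whole file
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == digest


print(parse_lfs_pointer(MODEL_ONNX_POINTER))
```

After downloading the weights, `matches_pointer("model.onnx", MODEL_ONNX_POINTER)` would report whether the local file is the one this commit refers to.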

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff