---
license: other
license_name: lfm1.0
license_link: https://www.liquid.ai/license
base_model: LiquidAI/LFM2.5-350M-Base
datasets:
- juanquivilla/sotto-transcript-cleanup
tags:
- speech-to-text
- transcript-cleanup
- disfluency-correction
- sotto-asr
- lfm2
- liquid-ai
- text2text-generation
library_name: transformers
pipeline_tag: text-generation
language:
- en
---

# SottoASR Transcript Cleanup — LFM2.5-350M (bf16)

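[sotto.app](https://sotto.app) · [MLX 5-bit (recommended for deployment)](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit) · [MLX 4-bit](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-4bit) · [Training Dataset](https://huggingface.co/datasets/juanquivilla/sotto-transcript-cleanup)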

## Overview

This is the **full-precision (bf16) fine-tuned** [LiquidAI/LFM2.5-350M-Base](https://huggingface.co/LiquidAI/LFM2.5-350M-Base) model for cleaning up speech-to-text transcripts. It is the fine-tuned SLM (Small Language Model) that powers on-device transcript cleanup in [**SottoASR**](https://sotto.app), a local, privacy-first speech-to-text application for macOS.

**For on-device deployment, use the [MLX 5-bit quantized version](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit)** (233MB; ROUGE-L 0.926 vs. 0.931 for this bf16 model, about 0.5% lower).

## What It Does

Takes raw, unpunctuated ASR output and produces clean, properly formatted text:

| Input (raw ASR) | Output (cleaned) |
|---|---|
| `uh the server is uh running low on memory` | The server is running low on memory. |
| `use redis wait no memcached is better` | Use Memcached. |
| `so basically the the api is um throttling our requests` | The API is throttling our requests. |
| `lets go ahead and really focus on the performance issue` | Let's go ahead and really focus on the performance issue. |
| `send the email to john period` | Send the email to John. |
| `me and the team is working on fixing it` | The team and I are working on fixing it. |

It handles filler removal, crutch-word removal, self-corrections, false starts, grammar fixes, misheard-word correction, dictation commands (period→., comma→,, slash→/), and list formatting, while preserving wording that is already clean.
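Each row above is a single prompt/completion pair. The sketch below wraps a raw transcript in the `### Input:` / `### Output:` template used in the Usage section and trims the decoded completion. The helper names `build_prompt` and `extract_cleaned`, and the assumption that any text the model generates past its answer would begin with a new `###` header, are illustrative only and are not part of the model's published interface.

```python
# Hypothetical helpers (not part of the model release) for the prompt
# template shown in the Usage section of this card.

PROMPT_TEMPLATE = "### Input:\n{raw}\n\n### Output:\n"

# Assumption: if generation runs past the cleaned text, the overflow starts
# with a new "###" header (e.g. "### Input:"), so we cut everything from there.
STOP_MARKER = "\n###"


def build_prompt(raw: str) -> str:
    """Wrap a raw ASR transcript in the Input/Output prompt template."""
    # Strip leading/trailing whitespace that ASR engines sometimes emit.
    return PROMPT_TEMPLATE.format(raw=raw.strip())


def extract_cleaned(completion: str) -> str:
    """Return only the cleaned text from a decoded completion."""
    # Keep the text before the first (assumed) follow-on "###" header.
    text = completion.split(STOP_MARKER, 1)[0]
    return text.strip()


if __name__ == "__main__":
    print(repr(build_prompt("  uh the server is uh running low on memory ")))
    print(extract_cleaned("The server is running low on memory.\n\n### Input:\nfoo"))
```

When the model stops at its end-of-sequence token instead, `extract_cleaned` simply strips surrounding whitespace from the decoded text.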
## Performance

| Metric | This Model (350M) | Prompted Qwen3.5-2B | Improvement (relative) |
|--------|-------------------|---------------------|------------------------|
| **ROUGE-L** | **0.931** | 0.891 | **+4.5%** |
| **Exact Match** | **56%** | 37% | **+51%** |
| **Self-Correction (ROUGE-L)** | **0.869** | 0.742 | **+17.1%** |
| **Zero-Filler Rate** | **90%** | 82% | **+9.8%** |
| **Inference Time** | **0.12s** | 1.0s | **8.3x faster** |
| **Model Size** | **354M params** | 2B params | **5.7x smaller** |

### Per-Category Scores

| Category | ROUGE-L | Description |
|----------|---------|-------------|
| preserve_wording | 0.987 | Clean input passes through unchanged |
| list_formatting | 0.972 | Spoken lists → numbered format |
| dictation_commands | 0.971 | period→., comma→,, slash→/ |
| filler_removal | 0.955 | uh, um, uhm, er, ah |
| short | 0.940 | Brief utterances (2-10 words) |
| mixed | 0.928 | Multiple overlapping disfluencies |
| false_start | 0.926 | Stutters and restarts |
| long_dictation | 0.918 | 100+ word passages |
| misheard_words | 0.913 | ASR errors (post gress→Postgres) |
| grammar | 0.906 | gonna→going to, me and him→he and I |
| crutch_words | 0.892 | basically, you know, I mean |
| self_correction | 0.869 | Speaker changes mind mid-sentence |

## Training

- **Base model:** [LiquidAI/LFM2.5-350M-Base](https://huggingface.co/LiquidAI/LFM2.5-350M-Base) (hybrid convolution + attention architecture, 32K context)
- **Dataset:** [juanquivilla/sotto-transcript-cleanup](https://huggingface.co/datasets/juanquivilla/sotto-transcript-cleanup), 124K synthetic pairs
- **Method:** Two-stage full fine-tuning:
  1. **Stage 1:** Full fine-tune on the full 124K-example dataset (LR 1e-5, 3 epochs, ~22 min on an RTX 4090)
  2. **Stage 2:** Full fine-tune concentrated on 14K hard-pattern examples (LR 2e-6, 1 epoch, 27 seconds)
- **Data sources:** Qwen3.5-35B (95K), Grok 4.20 (29K), hand-crafted (235)
- **Key finding:** Full fine-tuning substantially outperforms LoRA for small models (+7% ROUGE-L on the same data)

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

MODEL_ID = "juanquivilla/sotto-cleanup-lfm25-350m"

# `dtype=` needs a recent transformers release (older ones use `torch_dtype=`);
# `device_map="auto"` requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Wrap the raw ASR transcript in the Input/Output prompt template.
raw = "uh the server is uh running low on memory"
prompt = f"### Input:\n{raw}\n\n### Output:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding for deterministic cleanup.
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens (skip the prompt).
new_tokens = out[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())
# → "The server is running low on memory."
```

## Quantized Variants

| Variant | Size | ROUGE-L | Zero-Filler Rate | Link |
|---------|------|---------|------------------|------|
| **bf16 (this model)** | 676MB | 0.931 | 90% | — |
| **MLX 5-bit (recommended)** | 233MB | 0.926 | 99% | [sotto-cleanup-lfm25-350m-mlx-5bit](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-5bit) |
| MLX 4-bit | 190MB | 0.897 | 99% | [sotto-cleanup-lfm25-350m-mlx-4bit](https://huggingface.co/juanquivilla/sotto-cleanup-lfm25-350m-mlx-4bit) |

## Part of SottoASR

[**SottoASR**](https://sotto.app) is a local, privacy-first speech-to-text application for macOS. Press a hotkey, speak, and clean text appears at your cursor. All processing happens on-device; no audio or text is ever sent to a cloud service. This model powers the transcript cleanup step.

## License

This model inherits the [LFM 1.0 license](https://www.liquid.ai/license) from the base model.