M18 - Translation Service
Spec version: v1.0 (Phase 2)
Depends on: M03 (bus), X04 (config), X03 (observability), transformers, torch
Depended on by: UI marketplace + chat (one-click translate), M19 STT (with translate_to_en=true)
1. Responsibility
Provide trans.text@1.0. Translate between languages, with strong emphasis on:
- German ↔ English (default)
- German ↔ Plattdeutsch (Niederrhein-specific, Christof's domain)
- Major European languages
- Optionally Arabic, Turkish, Russian, Ukrainian (useful in refugee-context emergencies)
2. File layout
hearthnet/services/translation/
├── __init__.py
├── service.py
└── backends/
    ├── __init__.py
    ├── base.py
    ├── nllb.py            # facebook/nllb-200-distilled-600M
    └── plattdeutsch.py    # specialised fine-tune, optional
3. Public API
3.1 backends/base.py
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TranslationResult:
    text: str
    from_lang: str      # ISO 639-1 where one exists, else ISO 639-3 (e.g. "nds")
    to_lang: str
    confidence: float   # 0..1 if backend supports; else 1.0 placeholder
    ms: int             # wall-clock translation time in milliseconds


class TranslationBackend(Protocol):
    name: str
    languages_pairs: list[tuple[str, str]]   # supported (from, to) pairs
    max_chars: int

    async def warm(self) -> None: ...
    async def close(self) -> None: ...

    async def translate(
        self,
        text: str,
        *,
        from_lang: str,          # "auto" supported
        to_lang: str,
        domain: str | None,
    ) -> TranslationResult: ...

    def detect_language(self, text: str) -> str | None: ...
    def health(self) -> dict: ...
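To make the protocol concrete, here is a minimal sketch of a test double that satisfies it structurally. `UppercaseBackend` and its upper-casing "translation" are hypothetical and exist only for illustration. `TranslationResult` is repeated so the block runs on its own.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationResult:
    text: str
    from_lang: str
    to_lang: str
    confidence: float
    ms: int


class UppercaseBackend:
    """Hypothetical stub: 'translates' by upper-casing the input."""

    name = "uppercase-stub"
    languages_pairs = [("de", "en"), ("en", "de")]
    max_chars = 4000

    async def warm(self) -> None:
        pass  # a real backend would load model weights here

    async def close(self) -> None:
        pass  # a real backend would release device memory here

    async def translate(self, text, *, from_lang, to_lang, domain=None):
        started = time.perf_counter()
        if from_lang == "auto":
            # Assumption: fall back to "de" when the stub cannot detect anything.
            from_lang = self.detect_language(text) or "de"
        out = text.upper()
        ms = int((time.perf_counter() - started) * 1000)
        # No real confidence score is available, so use the 1.0 placeholder.
        return TranslationResult(out, from_lang, to_lang, 1.0, ms)

    def detect_language(self, text):
        return "de" if text else None

    def health(self) -> dict:
        return {"backend": self.name, "ok": True}


result = asyncio.run(
    UppercaseBackend().translate("Guten Morgen", from_lang="auto", to_lang="en")
)
```

Because `TranslationBackend` is a `Protocol`, a stub like this needs no inheritance to be accepted wherever a backend is expected.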
3.2 Concrete backends
class NllbBackend(TranslationBackend):
    """facebook/nllb-200-distilled-600M (or larger variants).
    200+ language pairs out of the box."""

    def __init__(
        self,
        model: str = "facebook/nllb-200-distilled-600M",
        device: str = "auto",
        max_chars: int = TRANSLATION_MAX_CHARS,
    ):
        ...


class PlattdeutschBackend(TranslationBackend):
    """Optional specialised fine-tune.
    If a Plattdeutsch fine-tune is present in models_dir, registers the de↔nds pair.
    Otherwise no-op (the backend reports zero language pairs and is filtered out)."""

    def __init__(
        self,
        models_dir: Path,
        device: str = "auto",
    ):
        ...
3.3 service.py
class TranslationService:
    name = "translation"
    version = "1.0"

    def __init__(self, config: TranslationConfig):
        self._backends: list[TranslationBackend] = self._build_backends(config)

    def capabilities(self) -> list[tuple[CapabilityDescriptor, Callable, ParamsPredicate]]:
        """One trans.text entry per backend. params declare languages_pairs."""

    async def start(self) -> None: ...
    async def stop(self) -> None: ...
    def health(self) -> dict: ...

    async def handle_translate(self, req: RouteRequest) -> dict:
        """CAP2 §4.10."""
3.4 params_compatible predicate
def params_compatible(offered: dict, requested: dict) -> bool:
    if "backend" in requested and requested["backend"] != offered.get("backend"):
        return False
    pair = (requested.get("from"), requested.get("to"))
    if pair[0] == "auto":
        # auto-detect; backend must support at least one source → target pair
        return any(t == pair[1] for (_, t) in offered.get("languages_pairs", []))
    return pair in offered.get("languages_pairs", [])
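The predicate can be exercised with a few sample requests. The sketch below is a variant, not the spec's definition: it assumes descriptors may cross the bus as JSON, where tuples arrive as two-element lists, and normalises them first. `normalise_pairs` is a hypothetical helper introduced for this example.

```python
def normalise_pairs(raw):
    """Turn [["de", "en"], ...] or [("de", "en"), ...] into a set of tuples."""
    return {tuple(p) for p in raw}


def params_compatible(offered: dict, requested: dict) -> bool:
    # A request pinned to a specific backend only matches that backend.
    if "backend" in requested and requested["backend"] != offered.get("backend"):
        return False
    pairs = normalise_pairs(offered.get("languages_pairs", []))
    src, dst = requested.get("from"), requested.get("to")
    if src == "auto":
        # Any source will do as long as the target is reachable.
        return any(t == dst for (_, t) in pairs)
    return (src, dst) in pairs


# Offered params as they might look after a JSON round trip (assumption).
offered = {"backend": "nllb", "languages_pairs": [["de", "en"], ["en", "de"]]}

checks = [
    params_compatible(offered, {"from": "de", "to": "en"}),      # exact pair
    params_compatible(offered, {"from": "auto", "to": "en"}),    # any source
    params_compatible(offered, {"from": "de", "to": "nds"}),     # unsupported
    params_compatible(offered, {"from": "de", "to": "en", "backend": "plattdeutsch"}),
]
```

The first two requests match; the last two do not, one because no backend pair reaches `nds` and one because the requested backend differs from the offered one.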
4. Behaviour
4.1 Auto-detection
When from is "auto":
- Call detect_language(text) (the NLLB backend handles detection internally)
- Substitute the detected language for from
- Translate
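The three steps above can be sketched as a small resolver that runs before translation. `resolve_source_lang`, `DetectionFailed` and `toy_detect` are hypothetical names; the toy detector is a keyword stand-in for a backend's `detect_language()`, not a real language identifier.

```python
class DetectionFailed(Exception):
    """Hypothetical error; the errors table maps detection failure to bad_request."""


def resolve_source_lang(detect, text: str, from_lang: str) -> str:
    """Return a concrete source language, running detection only for "auto"."""
    if from_lang != "auto":
        return from_lang
    detected = detect(text)
    if detected is None:
        raise DetectionFailed("could not detect source language")
    return detected


def toy_detect(text: str):
    # Toy stand-in for detect_language(): looks for one common word per language.
    words = text.lower().split()
    if "ich" in words or "der" in words:
        return "de"
    if "the" in words:
        return "en"
    return None


explicit = resolve_source_lang(toy_detect, "anything", "fr")
detected = resolve_source_lang(toy_detect, "Ich bin hier", "auto")
```

Resolving the language before the backend call also means cache keys and the returned `from_lang` can carry the concrete code rather than "auto".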
4.2 Domain hints
domain: "everyday" | "medical" | "legal" | "emergency" is advisory. NLLB ignores it; specialised fine-tunes may use it.
4.3 Niederrhein focus
PlattdeutschBackend is Christof's local interest. When installed:
- Registers pairs ("de", "nds") and ("nds", "de")
- Optionally ("en", "nds") if the fine-tune extends to it
- Used by the marketplace UI's "auf Platt" button in M08 settings ext
4.4 Length limits
- Single request: ≤ TRANSLATION_MAX_CHARS (4000)
- For longer texts, callers chunk by paragraph and recombine
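A caller-side chunker could look like the following sketch. It assumes paragraphs are separated by blank lines and packs them greedily into chunks. `chunk_by_paragraph` is a hypothetical helper, not part of the service API.

```python
TRANSLATION_MAX_CHARS = 4000


def chunk_by_paragraph(text: str, max_chars: int = TRANSLATION_MAX_CHARS) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks <= max_chars."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if len(para) > max_chars:
            # Assumption: oversized paragraphs are the caller's problem to split.
            raise ValueError("single paragraph exceeds max_chars; split it further")
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


long_text = "a" * 3000 + "\n\n" + "b" * 3000 + "\n\n" + "c" * 500
chunks = chunk_by_paragraph(long_text)
```

Each chunk is then sent as its own request and the translated chunks are rejoined with `"\n\n".join(...)`, which preserves paragraph boundaries.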
4.5 Batching
Internally, requests that arrive within a 100 ms window are batched, up to 8 strings per forward pass, which improves GPU utilisation. Results are demultiplexed back to the individual requests, so batching is transparent to callers.
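One way to get this behaviour is an asyncio micro-batcher that flushes when either the window elapses or the batch is full. This is a sketch under assumptions: `MicroBatcher` and `fake_forward` are hypothetical, and a real backend would run a model forward pass where `fake_forward` upper-cases strings.

```python
import asyncio


class MicroBatcher:
    """Collects requests for up to window_s seconds or max_batch items."""

    def __init__(self, run_batch, window_s=0.1, max_batch=8):
        self._run_batch = run_batch   # async callable: list[str] -> list[str]
        self._window_s = window_s
        self._max_batch = max_batch
        self._pending = []            # (text, future) pairs awaiting a flush
        self._timer = None            # task that flushes when the window ends

    async def submit(self, text: str) -> str:
        fut = asyncio.get_running_loop().create_future()
        self._pending.append((text, fut))
        if len(self._pending) >= self._max_batch:
            await self._flush()       # batch is full: do not wait for the timer
        elif self._timer is None:
            self._timer = asyncio.create_task(self._flush_later())
        return await fut

    async def _flush_later(self) -> None:
        await asyncio.sleep(self._window_s)
        self._timer = None
        await self._flush()

    async def _flush(self) -> None:
        batch, self._pending = self._pending, []
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        if not batch:
            return
        try:
            outputs = await self._run_batch([text for text, _ in batch])
        except Exception as exc:
            # Fail every request in the batch rather than leaving callers hanging.
            for _, fut in batch:
                if not fut.done():
                    fut.set_exception(exc)
            return
        # Demultiplex: outputs come back in the same order as the inputs.
        for (_, fut), out in zip(batch, outputs):
            if not fut.done():
                fut.set_result(out)


async def demo():
    calls = []

    async def fake_forward(texts):
        calls.append(len(texts))
        return [t.upper() for t in texts]

    batcher = MicroBatcher(fake_forward, window_s=0.05)
    results = await asyncio.gather(*(batcher.submit(f"s{i}") for i in range(10)))
    return calls, results


calls, results = asyncio.run(demo())
```

With ten concurrent requests, the first eight go out as soon as the batch fills and the remaining two go out when the window expires; each caller only ever sees its own result.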
4.6 Caching
In-memory LRU cache mapping (text_hash, from, to) → result, max 10k entries. This is a big win for the marketplace UI, which re-translates the same posts on every refresh.
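A minimal sketch of such a cache, built on `collections.OrderedDict`. The class name `TranslationCache` and the choice of SHA-256 for `text_hash` are assumptions; the spec fixes only the key shape and the entry limit.

```python
import hashlib
from collections import OrderedDict


class TranslationCache:
    """LRU cache keyed by (text_hash, from, to), bounded to max_entries."""

    def __init__(self, max_entries: int = 10_000):
        self._max = max_entries
        self._data = OrderedDict()

    @staticmethod
    def _key(text: str, from_lang: str, to_lang: str):
        # Hash the text so long posts do not bloat the key space.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return (digest, from_lang, to_lang)

    def get(self, text: str, from_lang: str, to_lang: str):
        key = self._key(text, from_lang, to_lang)
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, text: str, from_lang: str, to_lang: str, result) -> None:
        key = self._key(text, from_lang, to_lang)
        self._data[key] = result
        self._data.move_to_end(key)
        if len(self._data) > self._max:
            self._data.popitem(last=False)   # evict the least recently used


cache = TranslationCache(max_entries=2)
cache.put("Hallo", "de", "en", "Hello")
cache.put("Danke", "de", "en", "Thanks")
cache.get("Hallo", "de", "en")               # touch "Hallo" so "Danke" is oldest
cache.put("Bitte", "de", "en", "Please")     # evicts "Danke"
```

Keying on the language pair as well as the text keeps a de→en result from being served for a de→nds request on the same post.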
5. Errors
| Condition | Wire code |
|---|---|
| Pair not supported by any backend | not_found |
| Text too long | bad_request |
| Detection failed | bad_request |
| Backend OOM | capacity_exceeded |
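The table can be expressed as a lookup from exception type to wire code. The exception class names below are hypothetical, since the spec defines only the conditions and the codes; `wire_code_for` and `check_length` are likewise illustrative helpers.

```python
from typing import Optional


# Hypothetical exception types, one per condition in the table.
class PairNotSupported(Exception): ...
class TextTooLong(Exception): ...
class DetectionFailed(Exception): ...
class BackendOutOfMemory(Exception): ...


WIRE_CODES = {
    PairNotSupported: "not_found",
    TextTooLong: "bad_request",
    DetectionFailed: "bad_request",
    BackendOutOfMemory: "capacity_exceeded",
}


def wire_code_for(exc: Exception) -> Optional[str]:
    """Return the wire code for a known condition, or None if unmapped."""
    for exc_type, code in WIRE_CODES.items():
        if isinstance(exc, exc_type):
            return code
    return None


def check_length(text: str, max_chars: int = 4000) -> None:
    """Reject over-long input before it reaches a backend."""
    if len(text) > max_chars:
        raise TextTooLong(f"{len(text)} chars exceeds limit of {max_chars}")


try:
    check_length("x" * 4001)
    code = None
except TextTooLong as exc:
    code = wire_code_for(exc)
```

Returning `None` for unmapped exceptions leaves the decision about unexpected failures to the caller instead of guessing a code.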
6. Configuration
config.translation.enabled = True
config.translation.backends = [
    TranslationBackendConfig(name="nllb", model="facebook/nllb-200-distilled-600M", device="auto"),
    # Path does not expand "~" on its own, so call expanduser() explicitly
    TranslationBackendConfig(name="plattdeutsch", models_dir=Path("~/.hearthnet/models/plattdeutsch").expanduser()),
]
Constants: TRANSLATION_MAX_CHARS.
7. Tests
Unit
- test_descriptor_schema_validates
- test_params_compatible_pair_must_match
- test_auto_detect_substitutes_source_lang
- test_text_too_long_rejected
- test_cache_hit_returns_immediately
Integration
- test_german_to_english_quality (BLEU above floor)
- test_plattdeutsch_pair_registered_when_finetune_present
- test_marketplace_one_click_translate_end_to_end
8. Cross-references
| What | Where |
|---|---|
| trans.text@1.0 wire | CAP2 §4.10 |
| STT translate-to-EN feature | M19 §4.3 |
| Marketplace one-click | M08 ext |
| Niederrhein context | Christof's domain |
9. Open questions
- Fine-tune in-the-loop. A community could fine-tune the Plattdeutsch model on its own corpus over time. Reserved.
- Document-level translation. Currently per-string. Document-coherence translation (better than chunked) is Phase 3.
- Glossary support. Domain glossaries (technical terms, names) preserved across translation. Phase 2.5.