---
license: other
license_name: desert-ant-labs-source-available-1.0
license_link: LICENSE.md
language:
- en
tags:
- audio
- speech-enhancement
- denoising
- dereverberation
- on-device
- core-ml
- onnx
pipeline_tag: audio-to-audio
---

# Clear: on-device speech enhancement

48 kHz on-device speech enhancement. Takes noisy mono or stereo audio (phone mic, untreated room, traffic), returns a podcast-ready file: denoised, dereverbed, voice warm and present.

## Try it

- **Browser demo:** [`huggingface.co/spaces/desert-ant-labs/clear-demo`](https://huggingface.co/spaces/desert-ant-labs/clear-demo). Twelve real recordings, raw vs cleaned, sample-aligned A/B.
- **Drop-in SDKs** for iOS, Android, and Web are coming soon. Email for early access.

## Variants

| Variant | Character | When to use |
|---|---|---|
| **`clear-studio`** | Quiet, studio-like; silences near zero | Default. Works across the full range of input quality: phone audio, laptop mic, untreated rooms, USB / XLR podcast captures. |
| **`clear-natural`** | Room tone, breath, lip texture preserved | Treated podcast studios, USB / XLR captures, voiceover where the original sound is intentional. |

If the source is already clean and you want the model to stay invisible, pick `clear-natural`. Otherwise `clear-studio` is the default.

## Files

Both variants share the same architecture and realtime cost; only the weights differ.

| Variant | File | Format | Size |
|---|---|---|---:|
| `clear-studio` | `clear-studio.mlpackage.zip` | Core ML mlpackage (fp16) | ~3.8 MB |
| `clear-studio` | `clear-studio.mlmodelc.zip` | Core ML mlmodelc (fp16, precompiled) | ~3.8 MB |
| `clear-studio` | `clear-studio.onnx` | ONNX (fp32) | ~8.5 MB |
| `clear-natural` | `clear-natural.mlpackage.zip` | Core ML mlpackage (fp16) | ~3.8 MB |
| `clear-natural` | `clear-natural.mlmodelc.zip` | Core ML mlmodelc (fp16, precompiled) | ~3.8 MB |
| `clear-natural` | `clear-natural.onnx` | ONNX (fp32) | ~8.5 MB |

## Use

### ONNX

```python
from huggingface_hub import hf_hub_download
import onnxruntime as ort

path = hf_hub_download("desert-ant-labs/clear", "clear-studio.onnx")
session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
```

## Inputs and outputs

- **Architecture:** DeepFilterNet 3 (DFN3-half).
- **Sample rate:** 48 kHz, mono or stereo (per-channel inference).
- **Inference contract:** `spec` / `feat_erb` / `feat_spec` → `spec_enhanced`. STFT, ERB, and ISTFT are host-side DSP, not part of the model graph.

## Performance

Both variants run at the same speed. Enhancing a 5-minute clip on the Apple Neural Engine:

| Device | Chip | Mono | Stereo |
|---|---|---:|---:|
| iPhone 15 Pro | A17 Pro | 4.88 s (61× realtime) | 6.53 s (46×) |
| iPhone 17 Pro | A19 Pro | 3.70 s (81× realtime) | 5.16 s (58×) |

Cold model load is ~0.6 s; later loads ~100 ms via the system ANE cache.

## Limitations

- Trained on English speech; non-English speech still benefits but has not been measured against per-language ground truth.
- Heavy background music or multi-speaker overlap degrades quality.
- Mastering is informational only; verify against the platform's actual loudness target before publishing.

## Built on

- [DeepFilterNet 3](https://github.com/Rikorose/DeepFilterNet) by Rikorose, MIT. Fine-tuned on the Desert Ant Labs speech corpus.

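As a worked check of the realtime factors quoted in the Performance table, the sketch below divides the clip duration by each processing time. The timings are copied from the table; the names `CLIP_SECONDS`, `timings`, and `realtime_factor` are illustrative and not part of any Clear API.

```python
# Realtime factor = clip duration / processing time.
CLIP_SECONDS = 5 * 60  # the 5-minute clip from the Performance table

# Processing times in seconds, copied from the Performance table.
timings = {
    ("iPhone 15 Pro", "mono"): 4.88,
    ("iPhone 15 Pro", "stereo"): 6.53,
    ("iPhone 17 Pro", "mono"): 3.70,
    ("iPhone 17 Pro", "stereo"): 5.16,
}


def realtime_factor(processing_seconds: float, clip_seconds: float = CLIP_SECONDS) -> float:
    """Return how many seconds of audio are processed per second of wall time."""
    return clip_seconds / processing_seconds


factors = {key: realtime_factor(seconds) for key, seconds in timings.items()}

for (device, layout), factor in factors.items():
    print(f"{device} {layout}: {factor:.0f}x realtime")
```

The same figures suggest that stereo takes roughly 1.3 to 1.4 times as long as mono rather than twice as long, although the table does not say why.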
## License

Released under the **Desert Ant Labs Source-Available License v1.0** (see [`LICENSE.md`](LICENSE.md)).

- **Free for commercial use up to 100,000 Monthly Active Users (MAU).**
- Above 100,000 MAU a commercial license is required. Contact Desert Ant Labs.

## Citation

```bibtex
@software{clear_2026,
  title = {Clear: on-device speech enhancement},
  author = {Desert Ant Labs},
  year = {2026},
  url = {https://huggingface.co/desert-ant-labs/clear},
}
```

---

© 2026 Desert Ant Labs