---
license: apache-2.0
tags:
- audio
- speech
- foundation-model
- next-token-prediction
- research
language:
- en
- es
- fr
- de
- th
- hi
- ar
- zh
---

# 🥤 SODA Preliminary Run

**SODA** (**S**caling **O**pen **D**iscrete **A**udio) is a suite of discrete audio foundation models using next-token prediction on interleaved semantic, acoustic, and text tokens.

🌐 **Project Page:** [https://soda-audio.github.io](https://soda-audio.github.io/)

This model is part of our SODA Research collection (e.g., IsoFLOP sweep, preliminary runs, or ablations).

Note that this model (**soda-600m-prelim**) is the same model as https://huggingface.co/WillHeld/blueberry

As a preliminary experiment, we were not strict with our data criteria, and we ended up training this model on multiple languages in the following ratios: English (72.52%), Spanish (15.04%), French (7.13%), German (4.73%), Thai (0.20%), Hindi (0.16%), Arabic (0.12%), and Chinese (0.10%). This model was trained on 500B tokens in total.

**For full usage instructions and more information, please refer to the SODA-4B-base model card:**

👉 **[SODA-4B-base](https://huggingface.co/soda-research/soda-4b-base)**

📈 **WandB**: https://wandb.ai/marin-community/marin/runs/exp1699_marin_yodas2-b5edae/
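
The language ratios and the 500B-token total stated above can be combined into rough per-language token counts. The following is a minimal sketch, assuming the percentages are shares of training tokens (this card does not say whether they refer to tokens, audio hours, or examples). The names `TOTAL_TOKENS`, `LANGUAGE_SHARE_PCT`, and `approx_tokens_per_language` are illustrative only and are not part of any SODA code.

```python
# Rough per-language token estimate from the figures stated in this card.
# ASSUMPTION: the percentages are shares of training tokens; the card
# does not specify the unit.

TOTAL_TOKENS = 500e9  # "500B tokens in total"

# Language mix (percent) as listed in this card.
LANGUAGE_SHARE_PCT = {
    "English": 72.52,
    "Spanish": 15.04,
    "French": 7.13,
    "German": 4.73,
    "Thai": 0.20,
    "Hindi": 0.16,
    "Arabic": 0.12,
    "Chinese": 0.10,
}


def approx_tokens_per_language(total_tokens, share_pct):
    """Return {language: estimated token count} for the given percentage mix."""
    return {lang: total_tokens * pct / 100.0 for lang, pct in share_pct.items()}


if __name__ == "__main__":
    estimates = approx_tokens_per_language(TOTAL_TOKENS, LANGUAGE_SHARE_PCT)
    for lang, tokens in estimates.items():
        print(f"{lang:8s} ~{tokens / 1e9:.1f}B tokens")
```

Under that assumption, English accounts for roughly 362.6B tokens, while each of the four smallest languages (Thai, Hindi, Arabic, Chinese) accounts for about 1B tokens or fewer.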