---
license: apache-2.0
tags:
- audio
- speech
- foundation-model
- next-token-prediction
- research
language:
- en
- es
- fr
- de
- th
- hi
- ar
- zh
---

# 🥤 SODA Preliminary Run

**SODA** (**S**caling **O**pen **D**iscrete **A**udio) is a suite of discrete audio foundation models trained with next-token prediction over interleaved semantic, acoustic, and text tokens.

🌐 **Project Page:** [https://soda-audio.github.io](https://soda-audio.github.io/)

This model is part of our SODA Research collection, which includes IsoFLOP sweeps, preliminary runs, and ablations.

Note that this model (**soda-600m-prelim**) is the same model as [WillHeld/blueberry](https://huggingface.co/WillHeld/blueberry).

Because this was a preliminary experiment, we did not apply strict data-selection criteria. As a result, this model was trained on multiple languages in the following proportions: English (72.52%), Spanish (15.04%), French (7.13%), German (4.73%), Thai (0.20%), Hindi (0.16%), Arabic (0.12%), and Chinese (0.10%).
This model was trained on 500B tokens in total.
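The language mix can be turned into rough per-language token counts. The sketch below assumes the listed percentages are shares of the 500B training tokens. The card does not state whether the ratios are measured in tokens, hours, or utterances, so treat these figures as illustrative estimates only.

```python
# Illustrative estimate only: ASSUMES the language percentages are shares of
# the total training tokens (the model card does not specify the unit).
TOTAL_TOKENS = 500e9  # 500B tokens, as stated in the model card

LANGUAGE_SHARE_PCT = {
    "English": 72.52,
    "Spanish": 15.04,
    "French": 7.13,
    "German": 4.73,
    "Thai": 0.20,
    "Hindi": 0.16,
    "Arabic": 0.12,
    "Chinese": 0.10,
}


def tokens_per_language(total_tokens, shares_pct):
    """Return approximate token counts per language from percentage shares."""
    return {lang: total_tokens * pct / 100.0 for lang, pct in shares_pct.items()}


if __name__ == "__main__":
    counts = tokens_per_language(TOTAL_TOKENS, LANGUAGE_SHARE_PCT)
    # Sanity check: the listed shares should add up to 100%.
    print(f"sum of shares: {sum(LANGUAGE_SHARE_PCT.values()):.2f}%")
    for lang, n in counts.items():
        print(f"{lang:>8}: ~{n / 1e9:.1f}B tokens")
```

Under this assumption, English accounts for roughly 362.6B tokens and Chinese for only about 0.5B. The lowest-resource languages are therefore represented at a scale several hundred times smaller than English.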

**For full usage instructions and more information, please refer to the SODA-4B-base model card:**  
👉 **[SODA-4B-base](https://huggingface.co/soda-research/soda-4b-base)**

📈 **Weights & Biases run:** https://wandb.ai/marin-community/marin/runs/exp1699_marin_yodas2-b5edae/