---
tags:
- audio-to-audio
- text-to-speech
- speech-to-text
license: cc-by-nc-sa-4.0
language:
- zh
- en
- de
- ja
- fr
- es
- ko
- ar
pipeline_tag: audio-to-audio
inference: false
extra_gated_prompt: >-
  You agree to not use the model to generate contents that violate DMCA or local
  laws.
extra_gated_fields:
  Country: country
  Specific date: date_picker
  I agree to use this model for non-commercial use ONLY: checkbox
---

# Fish Agent V0.1 3B

**Fish Agent V0.1 3B** is a voice-to-voice model that captures and generates environmental audio information, not only speech content. Its architecture is semantic-token-free: it does not depend on separate semantic encoders or decoders such as Whisper or CosyVoice.

It is also a state-of-the-art text-to-speech (TTS) model, trained on about 700,000 hours of multilingual audio.

The model was produced by continued pretraining of Qwen-2.5-3B-Instruct on 200B voice and text tokens.

## Supported Languages
The model supports the following languages, listed with their approximate training data sizes:
- English (en): ~300,000 hours
- Chinese (zh): ~300,000 hours
- German (de): ~20,000 hours
- Japanese (ja): ~20,000 hours
- French (fr): ~20,000 hours
- Spanish (es): ~20,000 hours
- Korean (ko): ~20,000 hours
- Arabic (ar): ~20,000 hours

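
As a quick sanity check on the figures above, the per-language hours can be tallied to see how the training data is distributed. The sketch below uses only the approximate numbers from the list; the name `TRAINING_HOURS` is illustrative and not part of any Fish Speech API.

```python
# Approximate training hours per language, copied from the list above.
TRAINING_HOURS = {
    "en": 300_000,
    "zh": 300_000,
    "de": 20_000,
    "ja": 20_000,
    "fr": 20_000,
    "es": 20_000,
    "ko": 20_000,
    "ar": 20_000,
}

total_hours = sum(TRAINING_HOURS.values())

# Share of the total contributed by each language, as a fraction in [0, 1].
shares = {lang: hours / total_hours for lang, hours in TRAINING_HOURS.items()}

for lang, hours in TRAINING_HOURS.items():
    print(f"{lang}: ~{hours:,} h ({shares[lang]:.1%})")
print(f"total: ~{total_hours:,} h")
```

The listed figures sum to roughly 720,000 hours, in line with the approximately 700,000 hours quoted for the full dataset. English and Chinese together account for about five sixths of it, so coverage of the other six languages is comparatively thin.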

For detailed information and implementation guidelines, please visit our [Fish Speech GitHub repository](https://github.com/fishaudio/fish-speech).

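
Because the checkpoint is gated, downloading it requires accepting the terms on the model page and authenticating with a Hugging Face account. A minimal sketch using `huggingface_hub.snapshot_download` follows. The repository id `fishaudio/fish-agent-v0.1-3b`, the default local directory, and the helper name `download_checkpoint` are assumptions made for illustration; check the model page and the Fish Speech repository for the actual values and the recommended workflow.

```python
import os

# Assumed repository id; verify it on the model page before use.
REPO_ID = "fishaudio/fish-agent-v0.1-3b"


def download_checkpoint(local_dir: str = "checkpoints/fish-agent-v0.1-3b") -> str:
    """Download the gated checkpoint and return its local path (hypothetical helper)."""
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        # Token of an account that has accepted the gate. If HF_TOKEN is unset,
        # huggingface_hub falls back to the token saved by a previous login.
        token=os.environ.get("HF_TOKEN"),
    )
```

Calling `download_checkpoint()` would fetch the files into the given directory. Running inference on them then follows the instructions in the Fish Speech repository.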

## Citation
If you find this repository helpful in your work, please consider citing:

```bibtex
@misc{fish-agent-0.1,
  author = {Shijia Liao and Tianyu Li and Rcell and others},
  title = {Fish Agent V0.1 3B},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/fishaudio/fish-speech}}
}
```

## License
This model and its associated code are released under the CC-BY-NC-SA-4.0 license, which permits non-commercial use with appropriate attribution and requires derivative works to be shared under the same terms.