Spaces: fahin-one/submission_svara_2

fahin-one committed on Nov 27, 2025

Commit

a1911f8

1 Parent(s): a64eac2

Initial upload of full project to HF Space

Files changed (34)

.gitattributes +1 -0
.gitignore +21 -0
Dockerfile +28 -0
README.md +25 -10
^^^^^^^^^XX +0 -0
adapters/bn/README.md +207 -0
adapters/bn/adapter_config.json +43 -0
adapters/bn/adapter_model.safetensors +3 -0
adapters/bn/chat_template.jinja +93 -0
adapters/bn/checkpoint-500/README.md +207 -0
adapters/bn/checkpoint-500/adapter_config.json +43 -0
adapters/bn/checkpoint-500/adapter_model.safetensors +3 -0
adapters/bn/checkpoint-500/optimizer.pt +3 -0
adapters/bn/checkpoint-500/rng_state.pth +3 -0
adapters/bn/checkpoint-500/scaler.pt +3 -0
adapters/bn/checkpoint-500/scheduler.pt +3 -0
adapters/bn/checkpoint-500/trainer_state.json +104 -0
adapters/bn/checkpoint-500/training_args.bin +3 -0
adapters/bn/checkpoint-795/README.md +207 -0
adapters/bn/checkpoint-795/adapter_config.json +43 -0
adapters/bn/checkpoint-795/adapter_model.safetensors +3 -0
adapters/bn/checkpoint-795/optimizer.pt +3 -0
adapters/bn/checkpoint-795/rng_state.pth +3 -0
adapters/bn/checkpoint-795/scaler.pt +3 -0
adapters/bn/checkpoint-795/scheduler.pt +3 -0
adapters/bn/checkpoint-795/trainer_state.json +139 -0
adapters/bn/checkpoint-795/training_args.bin +3 -0
adapters/bn/special_tokens_map.json +26 -0
adapters/bn/tokenizer.json +3 -0
adapters/bn/tokenizer_config.json +0 -0
app.py +65 -0
inference.py +108 -0
requirements.txt +25 -0
submission_svara_2 +1 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+adapters/bn/tokenizer.json filter=lfs diff=lfs merge=lfs -text

.gitignore ADDED

@@ -0,0 +1,21 @@

+# Ignore raw data and logs
+data/
+logs/
+checkpoints/
+wandb/
+tensorboard/
+# Ignore compressed archives
+*.zip
+*.tar
+*.tar.gz
+*.7z
+# Ignore temporary files
+__pycache__/
+*.pyc
+*.pyo
+*.tmp
+*.log
+.DS_Store
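
Note on the rules above: a pattern such as `checkpoints/` matches only directories with exactly that name, so the `adapters/bn/checkpoint-500/` and `adapters/bn/checkpoint-795/` folders listed in this commit are not covered by it. The sketch below is a deliberately simplified model of the directory-only rules, not a full gitignore matcher, and the helper name is hypothetical.

```python
from pathlib import PurePosixPath

# Directory-only patterns from the .gitignore above (trailing "/" stripped).
IGNORED_DIR_NAMES = {"data", "logs", "checkpoints", "wandb", "tensorboard", "__pycache__"}


def ignored_by_directory_rule(path: str) -> bool:
    """Return True if any parent directory of `path` has an ignored name.

    Simplified: handles only the plain "name/" patterns, which git matches
    against directories of that exact name at any depth.
    """
    parent_dirs = PurePosixPath(path).parts[:-1]
    return any(name in IGNORED_DIR_NAMES for name in parent_dirs)


print(ignored_by_directory_rule("checkpoints/run1/optimizer.pt"))            # True
print(ignored_by_directory_rule("adapters/bn/checkpoint-500/optimizer.pt"))  # False
```

If the optimizer and scheduler states are not meant to ship with the Space, a pattern that matches the `checkpoint-*` directory names would be needed; whether that is desired here is not stated in the commit.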

Dockerfile ADDED

@@ -0,0 +1,28 @@

+FROM pytorch/pytorch:2.1.2-cuda12.1-cudnn8-runtime
+WORKDIR /app
+# Install system dependencies for audio
+RUN apt-get update && apt-get install -y \
+    libsndfile1 \
+    ffmpeg \
+    git \
+    && rm -rf /var/lib/apt/lists/*
+# Install Python deps
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+# Copy application code
+COPY . .
+# Create cache dir for Transformers
+RUN mkdir -p /app/cache
+ENV HF_HOME=/app/cache
+RUN chmod -R 777 /app
+# Expose port
+EXPOSE 7860
+# Run
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]

README.md CHANGED

@@ -1,10 +1,25 @@
----
-title: Submission Svara 2
-emoji: 🌍
-colorFrom: indigo
-colorTo: pink
-sdk: docker
-pinned: false
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Svara TTS Submission - Team [Your Name/Team]
+This repository contains the inference pipeline for the **Voice Tech for All** challenge. It serves a fine-tuned `kenpath/svara-tts-v1` model capable of synthesizing speech in 11 Indian languages via dynamic LoRA adapter switching.
+## 🏗 Architecture
+- **Base Model:** `kenpath/svara-tts-v1` (Llama-based architecture predicting discrete audio codes).
+- **Vocoder:** `SNAC` (Multi-Scale Neural Audio Codec) at 24kHz.
+- **Adaptation:** 11 LoRA adapters loaded dynamically at runtime to minimize VRAM usage.
+- **Serving:** FastAPI server exposing a REST endpoint.
+## 🚀 Usage
+### Endpoint: `POST /generate`
+Generates audio from text for a specific language.
+**Parameters:**
+- `text` (string): The text to synthesize.
+- `language` (string): The language code (matches the adapter folder names, e.g., `hi`, `bn`).
+**Example cURL:**
+```bash
+curl -X POST "https://your-space-url.hf.space/generate" \
+     -F "text=नमस्ते, आप कैसे हैं?" \
+     -F "language=hi" \
+     --output output.wav
+```
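
For readers who prefer Python over cURL, the same request can be sketched with the standard library alone. This is a minimal illustration, assuming the server accepts the two form fields exactly as `curl -F` sends them (multipart/form-data); the helper name `build_generate_request` is hypothetical and the base URL is the README's placeholder, not a real endpoint.

```python
import urllib.request
import uuid


def build_generate_request(base_url: str, text: str, language: str) -> urllib.request.Request:
    """Build a multipart/form-data POST for the /generate endpoint (not sent here)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in (("text", text), ("language", language)):
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")
    body = "".join(parts).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


request = build_generate_request("https://your-space-url.hf.space", "নমস্কার", "bn")
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` and writing the response body to `output.wav` would mirror the cURL call, provided the Space is actually running.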

^^^^^^^^^XX ADDED

File without changes

adapters/bn/README.md ADDED

@@ -0,0 +1,207 @@

+---
+base_model: kenpath/svara-tts-v1
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:kenpath/svara-tts-v1
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.16.0

adapters/bn/adapter_config.json ADDED

@@ -0,0 +1,43 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "kenpath/svara-tts-v1",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "wv",
+    "o_proj",
+    "wk",
+    "k_proj",
+    "v_proj",
+    "q_proj",
+    "wq",
+    "proj",
+    "wo"
+  ],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
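
As a reading aid for the config above: in standard LoRA the low-rank update is multiplied by `lora_alpha / r`, and with rank-stabilized LoRA (`use_rslora: true`) by `lora_alpha / sqrt(r)`. The short sketch below applies that rule to the three relevant fields copied from this file; the function name is illustrative only.

```python
import json
import math

# Only the fields that affect scaling, copied from adapter_config.json above.
config = json.loads('{"r": 8, "lora_alpha": 32, "use_rslora": false}')


def lora_scaling(cfg: dict) -> float:
    """Return the multiplier applied to the low-rank update B @ A."""
    if cfg.get("use_rslora"):
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    return cfg["lora_alpha"] / cfg["r"]


scaling = lora_scaling(config)
print(scaling)  # 4.0
```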

adapters/bn/adapter_model.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aa246c0e3683124cdcd8c7d3a8456a46c994b8c468878c8d9f872efdc135e142
+size 18379784

adapters/bn/chat_template.jinja ADDED

@@ -0,0 +1,93 @@

+{{- bos_token }}
+{%- if custom_tools is defined %}
+    {%- set tools = custom_tools %}
+{%- endif %}
+{%- if not tools_in_user_message is defined %}
+    {%- set tools_in_user_message = true %}
+{%- endif %}
+{%- if not date_string is defined %}
+    {%- if strftime_now is defined %}
+        {%- set date_string = strftime_now("%d %b %Y") %}
+    {%- else %}
+        {%- set date_string = "26 Jul 2024" %}
+    {%- endif %}
+{%- endif %}
+{%- if not tools is defined %}
+    {%- set tools = none %}
+{%- endif %}
+{#- This block extracts the system message, so we can slot it into the right place. #}
+{%- if messages[0]['role'] == 'system' %}
+    {%- set system_message = messages[0]['content']|trim %}
+    {%- set messages = messages[1:] %}
+{%- else %}
+    {%- set system_message = "" %}
+{%- endif %}
+{#- System message #}
+{{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
+{%- if tools is not none %}
+    {{- "Environment: ipython\n" }}
+{%- endif %}
+{{- "Cutting Knowledge Date: December 2023\n" }}
+{{- "Today Date: " + date_string + "\n\n" }}
+{%- if tools is not none and not tools_in_user_message %}
+    {{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
+    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
+    {{- "Do not use variables.\n\n" }}
+    {%- for t in tools %}
+        {{- t | tojson(indent=4) }}
+        {{- "\n\n" }}
+    {%- endfor %}
+{%- endif %}
+{{- system_message }}
+{{- "<|eot_id|>" }}
+{#- Custom tools are passed in a user message with some extra guidance #}
+{%- if tools_in_user_message and not tools is none %}
+    {#- Extract the first user message so we can plug it in here #}
+    {%- if messages | length != 0 %}
+        {%- set first_user_message = messages[0]['content']|trim %}
+        {%- set messages = messages[1:] %}
+    {%- else %}
+        {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
+{%- endif %}
+    {{- '<|start_header_id|>user<|end_header_id|>\n\n' -}}
+    {{- "Given the following functions, please respond with a JSON for a function call " }}
+    {{- "with its proper arguments that best answers the given prompt.\n\n" }}
+    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
+    {{- "Do not use variables.\n\n" }}
+    {%- for t in tools %}
+        {{- t | tojson(indent=4) }}
+        {{- "\n\n" }}
+    {%- endfor %}
+    {{- first_user_message + "<|eot_id|>"}}
+{%- endif %}
+{%- for message in messages %}
+    {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
+        {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' }}
+    {%- elif 'tool_calls' in message %}
+        {%- if not message.tool_calls|length == 1 %}
+            {{- raise_exception("This model only supports single tool-calls at once!") }}
+        {%- endif %}
+        {%- set tool_call = message.tool_calls[0].function %}
+        {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
+        {{- '{"name": "' + tool_call.name + '", ' }}
+        {{- '"parameters": ' }}
+        {{- tool_call.arguments | tojson }}
+        {{- "}" }}
+        {{- "<|eot_id|>" }}
+    {%- elif message.role == "tool" or message.role == "ipython" %}
+        {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
+        {%- if message.content is mapping or message.content is iterable %}
+            {{- message.content | tojson }}
+        {%- else %}
+            {{- message.content }}
+        {%- endif %}
+        {{- "<|eot_id|>" }}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
+{%- endif %}
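
To make the template above easier to follow, here is a plain-Python sketch of its simplest path only: no tools, no tool calls, and the fallback date string. It is an approximation for illustration, not a replacement for the Jinja template. The default `bos_token` value is an assumption, since the token itself is defined in the tokenizer files rather than in this template, and nothing in this diff confirms that the TTS inference code actually uses the chat template.

```python
def render_prompt(
    messages,
    bos_token="<|begin_of_text|>",  # assumed default; the real value comes from the tokenizer
    date_string="26 Jul 2024",      # the template's fallback when strftime_now is undefined
    add_generation_prompt=True,
):
    """Mirror the no-tools branch of the chat template as string concatenation."""
    messages = list(messages)
    system_message = ""
    # A leading system message is pulled out and placed in the system header block.
    if messages and messages[0]["role"] == "system":
        system_message = messages[0]["content"].strip()
        messages = messages[1:]

    pieces = [
        bos_token,
        "<|start_header_id|>system<|end_header_id|>\n\n",
        "Cutting Knowledge Date: December 2023\n",
        f"Today Date: {date_string}\n\n",
        system_message,
        "<|eot_id|>",
    ]
    for message in messages:
        pieces.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        pieces.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(pieces)


print(render_prompt([{"role": "user", "content": "hello"}]))
```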

adapters/bn/checkpoint-500/README.md ADDED

@@ -0,0 +1,207 @@

+---
+base_model: kenpath/svara-tts-v1
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:kenpath/svara-tts-v1
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.16.0

adapters/bn/checkpoint-500/adapter_config.json ADDED

@@ -0,0 +1,43 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "kenpath/svara-tts-v1",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "wv",
+    "o_proj",
+    "wk",
+    "k_proj",
+    "v_proj",
+    "q_proj",
+    "wq",
+    "proj",
+    "wo"
+  ],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

adapters/bn/checkpoint-500/adapter_model.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8465d913974758b1dee3d0e63d968733d8f1d8b9e026d8f35827e2b6dd40922b
+size 18379784

adapters/bn/checkpoint-500/optimizer.pt ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fdc17a6f375be7718ba586e69aad417004f1a52c16ab6dab8d317f3fdd1078af
+size 9584634

adapters/bn/checkpoint-500/rng_state.pth ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a6aed2b10bee4226d583ef0795da2b39268d8b034159ae8b85200ff38e62abab
+size 14244

adapters/bn/checkpoint-500/scaler.pt ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8b518d56b3c6725bead57518a740c30779ace967ae41891564a1adfcc41ae4af
+size 988

adapters/bn/checkpoint-500/scheduler.pt ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8440834bedbdd291243e37f57b580c1e6bef4a868c200733affded7216504b85
+size 1064

adapters/bn/checkpoint-500/trainer_state.json ADDED

@@ -0,0 +1,104 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.888468809073724,
+  "eval_steps": 500,
+  "global_step": 500,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.1890359168241966,
+      "grad_norm": 0.6747773289680481,
+      "learning_rate": 0.00028188679245283015,
+      "loss": 4.7024,
+      "step": 50
+    },
+    {
+      "epoch": 0.3780718336483932,
+      "grad_norm": 0.5653451681137085,
+      "learning_rate": 0.00026301886792452826,
+      "loss": 4.4922,
+      "step": 100
+    },
+    {
+      "epoch": 0.5671077504725898,
+      "grad_norm": 0.7042025327682495,
+      "learning_rate": 0.0002441509433962264,
+      "loss": 4.5527,
+      "step": 150
+    },
+    {
+      "epoch": 0.7561436672967864,
+      "grad_norm": 0.6175432205200195,
+      "learning_rate": 0.00022528301886792453,
+      "loss": 4.5609,
+      "step": 200
+    },
+    {
+      "epoch": 0.945179584120983,
+      "grad_norm": 0.6139667630195618,
+      "learning_rate": 0.00020641509433962262,
+      "loss": 4.5874,
+      "step": 250
+    },
+    {
+      "epoch": 1.1323251417769375,
+      "grad_norm": 0.6452978849411011,
+      "learning_rate": 0.00018754716981132072,
+      "loss": 4.4433,
+      "step": 300
+    },
+    {
+      "epoch": 1.3213610586011342,
+      "grad_norm": 0.5474215149879456,
+      "learning_rate": 0.00016867924528301886,
+      "loss": 4.388,
+      "step": 350
+    },
+    {
+      "epoch": 1.5103969754253308,
+      "grad_norm": 0.811112642288208,
+      "learning_rate": 0.00014981132075471697,
+      "loss": 4.4651,
+      "step": 400
+    },
+    {
+      "epoch": 1.6994328922495274,
+      "grad_norm": 0.7931979894638062,
+      "learning_rate": 0.00013094339622641508,
+      "loss": 4.3723,
+      "step": 450
+    },
+    {
+      "epoch": 1.888468809073724,
+      "grad_norm": 0.6235228776931763,
+      "learning_rate": 0.00011207547169811319,
+      "loss": 4.4056,
+      "step": 500
+    }
+  ],
+  "logging_steps": 50,
+  "max_steps": 795,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.5224731872903168e+16,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
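
One thing the log above shows is that the learning rate falls by the same amount between every pair of logged steps, which is what a linear decay schedule would produce. The sketch below checks this directly on the `(step, learning_rate)` pairs copied from `log_history`. The observed per-step decrement is very close to 3e-4 / 795, which would be consistent with a peak rate of 3e-4 decaying to zero over `max_steps` = 795; that peak value is an inference from the numbers, not something the commit states.

```python
# (step, learning_rate) pairs copied from log_history in trainer_state.json above.
logged = [
    (50, 0.00028188679245283015),
    (100, 0.00026301886792452826),
    (150, 0.0002441509433962264),
    (200, 0.00022528301886792453),
    (250, 0.00020641509433962262),
    (300, 0.00018754716981132072),
    (350, 0.00016867924528301886),
    (400, 0.00014981132075471697),
    (450, 0.00013094339622641508),
    (500, 0.00011207547169811319),
]


def per_step_decrements(points):
    """Learning-rate drop per optimizer step between consecutive log entries."""
    return [
        (lr_a - lr_b) / (step_b - step_a)
        for (step_a, lr_a), (step_b, lr_b) in zip(points, points[1:])
    ]


decrements = per_step_decrements(logged)
# A constant decrement means the logged points lie on a straight line.
is_linear = max(decrements) - min(decrements) < 1e-12
print(is_linear, decrements[0])
```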

adapters/bn/checkpoint-500/training_args.bin ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bb5894c37dcda07d3b6dc6d125319810ab7fca777cf1e2f1e7b24c9af6cdc573
+size 5368

adapters/bn/checkpoint-795/README.md ADDED

@@ -0,0 +1,207 @@

+---
+base_model: kenpath/svara-tts-v1
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:kenpath/svara-tts-v1
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.16.0

adapters/bn/checkpoint-795/adapter_config.json ADDED

@@ -0,0 +1,43 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "kenpath/svara-tts-v1",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "wv",
+    "o_proj",
+    "wk",
+    "k_proj",
+    "v_proj",
+    "q_proj",
+    "wq",
+    "proj",
+    "wo"
+  ],
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

adapters/bn/checkpoint-795/adapter_model.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aa246c0e3683124cdcd8c7d3a8456a46c994b8c468878c8d9f872efdc135e142
+size 18379784
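
The three-line files shown for the `.safetensors`, `.pt`, `.pth`, and `.bin` entries are Git LFS pointers: a spec version, a SHA-256 object id, and a byte size. Comparing the object ids from this diff shows that the top-level `adapters/bn/adapter_model.safetensors` is byte-identical to the `checkpoint-795` copy, while `checkpoint-500` holds different weights of the same size. A small sketch of that comparison (the parser is a hypothetical helper, not part of the repository):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the three adapter_model.safetensors entries in this diff.
top_level = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:aa246c0e3683124cdcd8c7d3a8456a46c994b8c468878c8d9f872efdc135e142
size 18379784
""")
checkpoint_500 = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:8465d913974758b1dee3d0e63d968733d8f1d8b9e026d8f35827e2b6dd40922b
size 18379784
""")
checkpoint_795 = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:aa246c0e3683124cdcd8c7d3a8456a46c994b8c468878c8d9f872efdc135e142
size 18379784
""")

print(top_level["digest"] == checkpoint_795["digest"])  # True
print(top_level["digest"] == checkpoint_500["digest"])  # False
```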

adapters/bn/checkpoint-795/optimizer.pt ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f2d8ca4910b2d06ca8a6300901836c679c4638a5cb62cde0072fd4c01d317361
+size 9584634

adapters/bn/checkpoint-795/rng_state.pth ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4f8664dee33a75e16cd244ee571724c5ca643d48198e78132a298003b0c5b830
+size 14244

adapters/bn/checkpoint-795/scaler.pt ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:373ec1ace89608530ff2ea47177c94dea66acdbf718ea1a11c0fa66f6af52dd4
+size 988

adapters/bn/checkpoint-795/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dc385e4975e71baea60dded77250967d6eeaacdb26e791049a369ffb1f066f33
+size 1064

adapters/bn/checkpoint-795/trainer_state.json ADDED

	@@ -0,0 +1,139 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 3.0,
+  "eval_steps": 500,
+  "global_step": 795,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.1890359168241966,
+      "grad_norm": 0.6747773289680481,
+      "learning_rate": 0.00028188679245283015,
+      "loss": 4.7024,
+      "step": 50
+    },
+    {
+      "epoch": 0.3780718336483932,
+      "grad_norm": 0.5653451681137085,
+      "learning_rate": 0.00026301886792452826,
+      "loss": 4.4922,
+      "step": 100
+    },
+    {
+      "epoch": 0.5671077504725898,
+      "grad_norm": 0.7042025327682495,
+      "learning_rate": 0.0002441509433962264,
+      "loss": 4.5527,
+      "step": 150
+    },
+    {
+      "epoch": 0.7561436672967864,
+      "grad_norm": 0.6175432205200195,
+      "learning_rate": 0.00022528301886792453,
+      "loss": 4.5609,
+      "step": 200
+    },
+    {
+      "epoch": 0.945179584120983,
+      "grad_norm": 0.6139667630195618,
+      "learning_rate": 0.00020641509433962262,
+      "loss": 4.5874,
+      "step": 250
+    },
+    {
+      "epoch": 1.1323251417769375,
+      "grad_norm": 0.6452978849411011,
+      "learning_rate": 0.00018754716981132072,
+      "loss": 4.4433,
+      "step": 300
+    },
+    {
+      "epoch": 1.3213610586011342,
+      "grad_norm": 0.5474215149879456,
+      "learning_rate": 0.00016867924528301886,
+      "loss": 4.388,
+      "step": 350
+    },
+    {
+      "epoch": 1.5103969754253308,
+      "grad_norm": 0.811112642288208,
+      "learning_rate": 0.00014981132075471697,
+      "loss": 4.4651,
+      "step": 400
+    },
+    {
+      "epoch": 1.6994328922495274,
+      "grad_norm": 0.7931979894638062,
+      "learning_rate": 0.00013094339622641508,
+      "loss": 4.3723,
+      "step": 450
+    },
+    {
+      "epoch": 1.888468809073724,
+      "grad_norm": 0.6235228776931763,
+      "learning_rate": 0.00011207547169811319,
+      "loss": 4.4056,
+      "step": 500
+    },
+    {
+      "epoch": 2.0756143667296785,
+      "grad_norm": 0.7440807819366455,
+      "learning_rate": 9.320754716981131e-05,
+      "loss": 4.4061,
+      "step": 550
+    },
+    {
+      "epoch": 2.264650283553875,
+      "grad_norm": 0.7756364941596985,
+      "learning_rate": 7.433962264150943e-05,
+      "loss": 4.3203,
+      "step": 600
+    },
+    {
+      "epoch": 2.4536862003780717,
+      "grad_norm": 0.7223451733589172,
+      "learning_rate": 5.547169811320754e-05,
+      "loss": 4.2439,
+      "step": 650
+    },
+    {
+      "epoch": 2.6427221172022684,
+      "grad_norm": 0.7152644395828247,
+      "learning_rate": 3.6603773584905655e-05,
+      "loss": 4.308,
+      "step": 700
+    },
+    {
+      "epoch": 2.831758034026465,
+      "grad_norm": 0.8246577382087708,
+      "learning_rate": 1.773584905660377e-05,
+      "loss": 4.402,
+      "step": 750
+    }
+  ],
+  "logging_steps": 50,
+  "max_steps": 795,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.4184540042776576e+16,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
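
Reviewer note: the logged learning rates look like a straight-line decay. The sketch below checks that on four rows copied from `log_history`. The helper names are illustrative, and the implied starting rate is inferred from the log only: the actual training arguments live in `training_args.bin`, which this diff does not show.

```python
# (step, learning_rate, loss) rows copied from log_history above
LOG = [
    (50, 0.00028188679245283015, 4.7024),
    (250, 0.00020641509433962262, 4.5874),
    (500, 0.00011207547169811319, 4.4056),
    (750, 1.773584905660377e-05, 4.402),
]


def lr_slope(log):
    """Average change in learning rate per optimizer step."""
    (s0, lr0, _), (s1, lr1, _) = log[0], log[-1]
    return (lr1 - lr0) / (s1 - s0)


def is_linear(log, tol=1e-10):
    """True if every logged rate lies on the line through the first row."""
    slope = lr_slope(log)
    s0, lr0, _ = log[0]
    return all(abs(lr0 + slope * (s - s0) - lr) < tol for s, lr, _ in log)


slope = lr_slope(LOG)
print("linear decay:", is_linear(LOG))
print(f"implied starting rate over 795 steps: {-slope * 795:.2e}")
print(f"loss change, step 50 -> 750: {LOG[-1][2] - LOG[0][2]:+.4f}")
```

The loss moves only modestly over the run, so the final checkpoint is not obviously better than `checkpoint-500`; no evaluation metric is recorded in this file (`best_metric` is null) to settle that.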

adapters/bn/checkpoint-795/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bb5894c37dcda07d3b6dc6d125319810ab7fca777cf1e2f1e7b24c9af6cdc573
+size 5368

adapters/bn/special_tokens_map.json ADDED

	@@ -0,0 +1,26 @@

+{
+  "additional_special_tokens": [
+    "<|audio|>"
+  ],
+  "bos_token": {
+    "content": "<|begin_of_text|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|eot_id|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<custom_token_7>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

adapters/bn/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:044e2a10201774018db120391980464472baabf223bd353cea49b17da0b66abc
+size 22849546

adapters/bn/tokenizer_config.json ADDED

The diff for this file is too large to render. See raw diff

app.py ADDED

	@@ -0,0 +1,65 @@

+import os
+import io
+import soundfile as sf
+from fastapi import FastAPI, Form, HTTPException
+from fastapi.responses import StreamingResponse
+from inference import SvaraEngine
+# Configuration
+BASE_MODEL = "kenpath/svara-tts-v1"
+ADAPTERS_DIR = "./adapters"
+# Auto-discover adapters in the folder
+adapter_map = {}
+if os.path.exists(ADAPTERS_DIR):
+    for name in os.listdir(ADAPTERS_DIR):
+        path = os.path.join(ADAPTERS_DIR, name)
+        if os.path.isdir(path):
+            adapter_map[name] = path
+# Initialize App & Engine
+app = FastAPI(title="Team Submission - Svara TTS API")
+engine = None
+@app.on_event("startup")
+async def startup_event():
+    global engine
+    # Initialize engine once on startup
+    if not adapter_map:
+        raise RuntimeError("No adapters found in ./adapters folder!")
+    engine = SvaraEngine(BASE_MODEL, adapter_map)
+@app.get("/")
+def home():
+    return {"status": "healthy", "languages_loaded": list(adapter_map.keys())}
+@app.post("/generate")
+async def generate_audio(text: str = Form(...), language: str = Form(...)):
+    """
+    Endpoint strictly for the judges.
+    Input: Text and Language Code (e.g., 'hi', 'bn').
+    Output: WAV Audio Stream.
+    """
+    if language not in adapter_map:
+        raise HTTPException(status_code=400, detail=f"Language '{language}' not supported.")
+    if engine is None:
+        raise HTTPException(status_code=503, detail="Engine is not initialized yet.")
+    try:
+        # Run Inference
+        audio_data, sr = engine.synthesize(text, language)
+        if audio_data is None:
+            raise HTTPException(status_code=500, detail="Model failed to generate audio tokens.")
+        # Convert to WAV in-memory
+        buffer = io.BytesIO()
+        sf.write(buffer, audio_data, sr, format='WAV')
+        buffer.seek(0)
+        return StreamingResponse(buffer, media_type="audio/wav")
+    except HTTPException:
+        # Pass deliberate HTTP errors through instead of re-wrapping them below
+        raise
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=str(e))
+if __name__ == "__main__":
+    import uvicorn
+    uvicorn.run(app, host="0.0.0.0", port=7860)
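
Reviewer note: for anyone testing the endpoint, a client call might look like the sketch below. It uses only the standard library and sends the two `Form(...)` fields as a URL-encoded body. The base URL is a hypothetical local address (the port matches the `uvicorn.run` call above), and `build_generate_request` and `save_speech` are illustrative names, not part of the commit.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_generate_request(base_url: str, text: str, language: str) -> Request:
    """Build a form-encoded POST for the /generate endpoint."""
    body = urlencode({"text": text, "language": language}).encode("utf-8")
    return Request(
        url=base_url.rstrip("/") + "/generate",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def save_speech(req: Request, out_path: str) -> None:
    """Send the request and write the returned WAV bytes to disk."""
    with urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())


if __name__ == "__main__":
    # Hypothetical local address; adjust to wherever the Space is served
    req = build_generate_request("http://localhost:7860", "আমি ভালো আছি", "bn")
    print(req.full_url, req.get_method())
    # save_speech(req, "out.wav")  # uncomment once the server is running
```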

inference.py ADDED

	@@ -0,0 +1,108 @@

+import torch
+import torchaudio
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from peft import PeftModel
+from snac import SNAC
+class SvaraEngine:
+    def __init__(self, base_model_id, adapter_map, device="cuda"):
+        self.device = device if torch.cuda.is_available() else "cpu"
+        print(f"🚀 Initializing Svara Engine on {self.device}...")
+        # 1. Load Tokenizer & Base Model (Llama-based)
+        self.tokenizer = AutoTokenizer.from_pretrained(base_model_id)
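+        # float16 is only reliable on GPU; fall back to float32 when running on CPU
+        dtype = torch.float16 if self.device == "cuda" else torch.float32
+        self.base_model = AutoModelForCausalLM.from_pretrained(
+            base_model_id,
+            torch_dtype=dtype,
+            device_map=self.device
+        )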
+        # 2. Load SNAC Decoder (The Vocoder)
+        print("🔊 Loading SNAC Decoder...")
+        self.snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval().to(self.device)
+        # 3. Load LoRA Adapters dynamically
+        # We load the first one to initialize PEFT, then add others
+        first_lang = list(adapter_map.keys())[0]
+        print(f"🔗 Loading initial adapter: {first_lang}...")
+        self.model = PeftModel.from_pretrained(
+            self.base_model,
+            adapter_map[first_lang],
+            adapter_name=first_lang
+        )
+        for lang, path in adapter_map.items():
+            if lang == first_lang: continue
+            print(f"   - Loading adapter: {lang}")
+            self.model.load_adapter(path, adapter_name=lang)
+        print("✅ System Ready.")
+    def reconstruct_audio(self, audio_ids):
+        """
+        Svara/Orpheus Specific: Converts flat tokens back into SNAC hierarchical codes.
+        Structure: 7 tokens per frame -> [1 code (L1), 2 codes (L2), 4 codes (L3)]
+        """
+        # Filter out special text tokens, keep only audio tokens
+        # Note: 128258 is the standard start offset for audio tokens in this architecture
+        audio_ids = [tok - 128258 for tok in audio_ids if tok >= 128258]
+        # Truncate to multiple of 7 (SNAC requirement)
+        num_frames = len(audio_ids) // 7
+        audio_ids = audio_ids[:num_frames * 7]
+        if len(audio_ids) == 0:
+            return None
+        # Reshape: (N, 7)
+        codes = torch.tensor(audio_ids, dtype=torch.int32).reshape(-1, 7).to(self.device)
+        # Unpack layers (SNAC Specific Layout)
+        # Layer 1: Col 0
+        l1 = codes[:, 0]
+        # Layer 2: Cols 1, 4
+        l2 = torch.stack((codes[:, 1], codes[:, 4])).t().flatten()
+        # Layer 3: Cols 2, 3, 5, 6
+        l3 = torch.stack((codes[:, 2], codes[:, 3], codes[:, 5], codes[:, 6])).t().flatten()
+        # Input to SNAC must be a list of tensors [L1, L2, L3] with batch dim
+        snac_input = [
+            l1.unsqueeze(0), # (1, T1)
+            l2.unsqueeze(0), # (1, T2)
+            l3.unsqueeze(0)  # (1, T3)
+        ]
+        with torch.no_grad():
+            audio_hat = self.snac_model.decode(snac_input)
+        return audio_hat.cpu().squeeze().numpy()
+    def synthesize(self, text, language_code):
+        # 1. Switch LoRA Adapter
+        if language_code in self.model.peft_config:
+            self.model.set_adapter(language_code)
+        else:
+            print(f"⚠️ Warning: Language {language_code} not found, using active adapter.")
+        # 2. Prepare Prompt (Orpheus/Svara Format)
+        # Svara expects text followed by a start token.
+        # NOTE: only the tokenized text is passed here; no start token is appended explicitly.
+        input_ids = self.tokenizer(text, return_tensors="pt").input_ids.to(self.device)
+        # 3. Generate Tokens
+        with torch.no_grad():
+            output_ids = self.model.generate(
+                input_ids,
+                max_new_tokens=1024, # Adjust based on length
+                do_sample=True,
+                temperature=0.7,     # 0.7 is sweet spot for Svara
+                top_p=0.9,
+                repetition_penalty=1.1
+            )
+        # 4. Extract only new tokens (audio codes)
+        generated_ids = output_ids[0, input_ids.shape[1]:].tolist()
+        # 5. Decode to Audio
+        audio_waveform = self.reconstruct_audio(generated_ids)
+        return audio_waveform, 24000 # SNAC sample rate
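
Reviewer note: the 7-token frame layout used in `reconstruct_audio` can be checked without loading any model. The sketch below mirrors the column selection from the diff in plain Python. The `per_position_offset` parameter is an assumption added for illustration: the upstream Orpheus decoder reportedly also subtracts 4096 times the position within the frame from each token, and this commit does not confirm whether this checkpoint needs that. The default of 0 reproduces the behavior in the diff.

```python
def split_frames(tokens, frame_size=7, per_position_offset=0):
    """Split a flat token list into SNAC layers (L1, L2, L3).

    Frame layout, as in reconstruct_audio:
    position 0 -> L1, positions 1 and 4 -> L2, positions 2, 3, 5, 6 -> L3.
    per_position_offset is a hypothetical knob (see the note above).
    """
    num_frames = len(tokens) // frame_size  # incomplete trailing frame is dropped
    l1, l2, l3 = [], [], []
    for f in range(num_frames):
        frame = [
            tokens[f * frame_size + p] - p * per_position_offset
            for p in range(frame_size)
        ]
        l1.append(frame[0])
        l2.extend([frame[1], frame[4]])
        l3.extend([frame[2], frame[3], frame[5], frame[6]])
    return l1, l2, l3


# Two full frames plus one stray token that gets truncated
l1, l2, l3 = split_frames(list(range(14)) + [99])
print(l1)  # [0, 7]
print(l2)  # [1, 4, 8, 11]
print(l3)  # [2, 3, 5, 6, 9, 10, 12, 13]
```

If the decoded codes fall outside the SNAC codebook range at runtime, the offset handling is the first thing worth checking against the base model's reference decoder.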

requirements.txt ADDED

	@@ -0,0 +1,25 @@

+# fastapi==0.109.0
+# uvicorn==0.27.0
+# python-multipart==0.0.9
+# torch==2.1.2
+# torchaudio==2.1.2
+# transformers==4.37.0
+# peft==0.7.1
+# snac==1.2.0
+# soundfile==0.12.1
+# scipy==1.11.4
+# accelerate==0.26.1
+# numpy<2.0.0
+fastapi
+uvicorn
+python-multipart
+torch
+torchaudio
+transformers
+peft
+snac
+soundfile
+scipy
+accelerate
+numpy

submission_svara_2 ADDED

	@@ -0,0 +1 @@


+Subproject commit a64eac25f4d1b48eb40fabe14a77b7fd99927d47