fahin-one committed on
Commit
a1911f8
·
1 Parent(s): a64eac2

Initial upload of full project to HF Space

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ adapters/bn/tokenizer.json filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,21 @@
+ # Ignore raw data and logs
+ data/
+ logs/
+ checkpoints/
+ wandb/
+ tensorboard/
+
+ # Ignore compressed archives
+ *.zip
+ *.tar
+ *.tar.gz
+ *.7z
+
+ # Ignore temporary files
+ __pycache__/
+ *.pyc
+ *.pyo
+ *.tmp
+ *.log
+ .DS_Store
+
Dockerfile ADDED
@@ -0,0 +1,28 @@
+ FROM pytorch/pytorch:2.1.2-cuda12.1-cudnn8-runtime
+
+ WORKDIR /app
+
+ # Install system dependencies for audio
+ RUN apt-get update && apt-get install -y \
+ libsndfile1 \
+ ffmpeg \
+ git \
+ && rm -rf /var/lib/apt/lists/*
+
+ # Install Python deps
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY . .
+
+ # Create cache dir for Transformers
+ RUN mkdir -p /app/cache
+ ENV HF_HOME=/app/cache
+ RUN chmod -R 777 /app
+
+ # Expose port
+ EXPOSE 7860
+
+ # Run
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,10 +1,25 @@
- ---
- title: Submission Svara 2
- emoji: 🌍
- colorFrom: indigo
- colorTo: pink
- sdk: docker
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Svara TTS Submission - Team [Your Name/Team]
+
+ This repository contains the inference pipeline for the **Voice Tech for All** challenge. It serves a fine-tuned `kenpath/svara-tts-v1` model that synthesizes speech in 11 Indian languages by switching LoRA adapters at runtime.
+
+ ## 🏗 Architecture
+ - **Base Model:** `kenpath/svara-tts-v1` (Llama-based architecture predicting discrete audio codes).
+ - **Vocoder:** `SNAC` (Multi-Scale Neural Audio Codec) at 24 kHz.
+ - **Adaptation:** 11 LoRA adapters loaded dynamically at runtime to minimize VRAM usage.
+ - **Serving:** FastAPI server exposing a REST endpoint.
+
+ ## 🚀 Usage
+
+ ### Endpoint: `POST /generate`
+ Generates audio from text for a specific language.
+
+ **Parameters:**
+ - `text` (string): The text to synthesize.
+ - `language` (string): The language code (matches the adapter folder names, e.g., `hi`, `bn`).
+
+ **Example cURL:**
+ ```bash
+ curl -X POST "https://your-space-url.hf.space/generate" \
+ -F "text=नमस्ते, आप कैसे हैं?" \
+ -F "language=hi" \
+ --output output.wav
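The cURL call in the README sends `text` and `language` as multipart form fields (`-F`) and saves the response body as a WAV file. A minimal Python sketch of the same request, using only the standard library, could look like the following. The helper names `build_multipart` and `synthesize` are hypothetical, and the sketch assumes the server accepts plain multipart form fields and returns raw audio bytes, as the cURL example suggests; the server code itself is not shown in this diff.

```python
import uuid
import urllib.request
from typing import Dict, Optional, Tuple


def build_multipart(fields: Dict[str, str],
                    boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode text fields as multipart/form-data (what curl's -F sends).

    Returns the request body and the matching Content-Type header value.
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")  # closing delimiter
    body = "".join(parts).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def synthesize(url: str, text: str, language: str,
               out_path: str = "output.wav") -> str:
    """POST text and language to the endpoint and write the response to disk.

    Assumption: the response body is the audio file itself.
    """
    body, content_type = build_multipart({"text": text, "language": language})
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path
```

A call such as `synthesize("https://your-space-url.hf.space/generate", "नमस्ते, आप कैसे हैं?", "hi")` would mirror the cURL example; the URL is the README's own placeholder.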
^^^^^^^^^XX ADDED
File without changes
adapters/bn/README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: kenpath/svara-tts-v1
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:kenpath/svara-tts-v1
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.16.0
adapters/bn/adapter_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "kenpath/svara-tts-v1",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.1,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "qalora_group_size": 16,
+ "r": 8,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "wv",
+ "o_proj",
+ "wk",
+ "k_proj",
+ "v_proj",
+ "q_proj",
+ "wq",
+ "proj",
+ "wo"
+ ],
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
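The config above sets `r` to 8, `lora_alpha` to 32, and `use_rslora` to false. In standard LoRA the low-rank update is multiplied by `lora_alpha / r`, and the rank-stabilized variant uses `lora_alpha / sqrt(r)` instead. The sketch below works out that factor for this config and counts the extra weights one adapted linear layer would carry. The function names are hypothetical, and the 4096 layer width is purely an illustrative assumption because the base model's dimensions do not appear in this diff.

```python
import json
import math


def lora_scaling(config: dict) -> float:
    """Scaling applied to the low-rank update B @ A.

    Standard LoRA uses lora_alpha / r; rank-stabilized LoRA uses
    lora_alpha / sqrt(r).
    """
    r = config["r"]
    alpha = config["lora_alpha"]
    if config.get("use_rslora", False):
        return alpha / math.sqrt(r)
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Weights added to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Only the fields this sketch needs, copied from adapter_config.json.
config = json.loads('{"r": 8, "lora_alpha": 32, "use_rslora": false}')

print(lora_scaling(config))                      # 4.0
# 4096 is an assumed layer width, not a value taken from the source.
print(lora_param_count(4096, 4096, config["r"]))  # 65536
```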
adapters/bn/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aa246c0e3683124cdcd8c7d3a8456a46c994b8c468878c8d9f872efdc135e142
+ size 18379784
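The three lines above are a Git LFS pointer, not the weights themselves: the repository stores the spec version, the SHA-256 of the real file, and its size in bytes. A small sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, and it assumes the plain `key value` layout shown in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash and size fields."""
    # Each non-empty line is "<key> <value>"; split only on the first space.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:aa246c0e3683124cdcd8c7d3a8456a46c994b8c468878c8d9f872efdc135e142
size 18379784
"""

info = parse_lfs_pointer(pointer)
print(info["algo"], info["size"])  # sha256 18379784
```

Hashing a downloaded file with `hashlib.sha256` and comparing the result with `info["digest"]` is one way to check that the adapter was fetched intact.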
adapters/bn/chat_template.jinja ADDED
@@ -0,0 +1,93 @@
+ {{- bos_token }}
+ {%- if custom_tools is defined %}
+ {%- set tools = custom_tools %}
+ {%- endif %}
+ {%- if not tools_in_user_message is defined %}
+ {%- set tools_in_user_message = true %}
+ {%- endif %}
+ {%- if not date_string is defined %}
+ {%- if strftime_now is defined %}
+ {%- set date_string = strftime_now("%d %b %Y") %}
+ {%- else %}
+ {%- set date_string = "26 Jul 2024" %}
+ {%- endif %}
+ {%- endif %}
+ {%- if not tools is defined %}
+ {%- set tools = none %}
+ {%- endif %}
+
+ {#- This block extracts the system message, so we can slot it into the right place. #}
+ {%- if messages[0]['role'] == 'system' %}
+ {%- set system_message = messages[0]['content']|trim %}
+ {%- set messages = messages[1:] %}
+ {%- else %}
+ {%- set system_message = "" %}
+ {%- endif %}
+
+ {#- System message #}
+ {{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
+ {%- if tools is not none %}
+ {{- "Environment: ipython\n" }}
+ {%- endif %}
+ {{- "Cutting Knowledge Date: December 2023\n" }}
+ {{- "Today Date: " + date_string + "\n\n" }}
+ {%- if tools is not none and not tools_in_user_message %}
+ {{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
+ {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
+ {{- "Do not use variables.\n\n" }}
+ {%- for t in tools %}
+ {{- t | tojson(indent=4) }}
+ {{- "\n\n" }}
+ {%- endfor %}
+ {%- endif %}
+ {{- system_message }}
+ {{- "<|eot_id|>" }}
+
+ {#- Custom tools are passed in a user message with some extra guidance #}
+ {%- if tools_in_user_message and not tools is none %}
+ {#- Extract the first user message so we can plug it in here #}
+ {%- if messages | length != 0 %}
+ {%- set first_user_message = messages[0]['content']|trim %}
+ {%- set messages = messages[1:] %}
+ {%- else %}
+ {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
+ {%- endif %}
+ {{- '<|start_header_id|>user<|end_header_id|>\n\n' -}}
+ {{- "Given the following functions, please respond with a JSON for a function call " }}
+ {{- "with its proper arguments that best answers the given prompt.\n\n" }}
+ {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
+ {{- "Do not use variables.\n\n" }}
+ {%- for t in tools %}
+ {{- t | tojson(indent=4) }}
+ {{- "\n\n" }}
+ {%- endfor %}
+ {{- first_user_message + "<|eot_id|>"}}
+ {%- endif %}
+
+ {%- for message in messages %}
+ {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
+ {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' }}
+ {%- elif 'tool_calls' in message %}
+ {%- if not message.tool_calls|length == 1 %}
+ {{- raise_exception("This model only supports single tool-calls at once!") }}
+ {%- endif %}
+ {%- set tool_call = message.tool_calls[0].function %}
+ {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
+ {{- '{"name": "' + tool_call.name + '", ' }}
+ {{- '"parameters": ' }}
+ {{- tool_call.arguments | tojson }}
+ {{- "}" }}
+ {{- "<|eot_id|>" }}
+ {%- elif message.role == "tool" or message.role == "ipython" %}
+ {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
+ {%- if message.content is mapping or message.content is iterable %}
+ {{- message.content | tojson }}
+ {%- else %}
+ {{- message.content }}
+ {%- endif %}
+ {{- "<|eot_id|>" }}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
+ {%- endif %}
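To make the template easier to follow, here is a rough plain-Python rendering of its simplest path only: no tools, text-only messages, and an optional leading system message. This is not how the template is executed in practice (it is a Jinja file), and `render_prompt` is a hypothetical helper. The `<|begin_of_text|>` default for `bos_token` is an assumption, since the tokenizer's actual BOS token is not shown in this diff; the `26 Jul 2024` default mirrors the template's fallback when `strftime_now` is undefined.

```python
def render_prompt(messages,
                  bos_token="<|begin_of_text|>",   # assumed BOS token
                  date_string="26 Jul 2024",       # template's fallback date
                  add_generation_prompt=True):
    """Mimic the template's no-tools path for plain text messages."""
    messages = list(messages)

    # A leading system message is pulled out and placed in the system block.
    system_message = ""
    if messages and messages[0]["role"] == "system":
        system_message = messages[0]["content"].strip()
        messages = messages[1:]

    out = [
        bos_token,
        "<|start_header_id|>system<|end_header_id|>\n\n",
        "Cutting Knowledge Date: December 2023\n",
        f"Today Date: {date_string}\n\n",
        system_message,
        "<|eot_id|>",
    ]

    # Every remaining message gets a role header, trimmed content, and <|eot_id|>.
    for message in messages:
        out.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )

    # The trailing assistant header cues the model to start generating.
    if add_generation_prompt:
        out.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(out)


prompt = render_prompt([{"role": "user", "content": " hello "}])
```

Unlike the template, this sketch tolerates an empty message list rather than failing on `messages[0]`.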
adapters/bn/checkpoint-500/README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: kenpath/svara-tts-v1
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:kenpath/svara-tts-v1
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.16.0
adapters/bn/checkpoint-500/adapter_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "kenpath/svara-tts-v1",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.1,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "qalora_group_size": 16,
+ "r": 8,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "wv",
+ "o_proj",
+ "wk",
+ "k_proj",
+ "v_proj",
+ "q_proj",
+ "wq",
+ "proj",
+ "wo"
+ ],
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
adapters/bn/checkpoint-500/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8465d913974758b1dee3d0e63d968733d8f1d8b9e026d8f35827e2b6dd40922b
+ size 18379784
adapters/bn/checkpoint-500/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fdc17a6f375be7718ba586e69aad417004f1a52c16ab6dab8d317f3fdd1078af
+ size 9584634
adapters/bn/checkpoint-500/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a6aed2b10bee4226d583ef0795da2b39268d8b034159ae8b85200ff38e62abab
+ size 14244
adapters/bn/checkpoint-500/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8b518d56b3c6725bead57518a740c30779ace967ae41891564a1adfcc41ae4af
+ size 988
adapters/bn/checkpoint-500/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8440834bedbdd291243e37f57b580c1e6bef4a868c200733affded7216504b85
+ size 1064
adapters/bn/checkpoint-500/trainer_state.json ADDED
@@ -0,0 +1,104 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 1.888468809073724,
+ "eval_steps": 500,
+ "global_step": 500,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.1890359168241966,
+ "grad_norm": 0.6747773289680481,
+ "learning_rate": 0.00028188679245283015,
+ "loss": 4.7024,
+ "step": 50
+ },
+ {
+ "epoch": 0.3780718336483932,
+ "grad_norm": 0.5653451681137085,
+ "learning_rate": 0.00026301886792452826,
+ "loss": 4.4922,
+ "step": 100
+ },
+ {
+ "epoch": 0.5671077504725898,
+ "grad_norm": 0.7042025327682495,
+ "learning_rate": 0.0002441509433962264,
+ "loss": 4.5527,
+ "step": 150
+ },
+ {
+ "epoch": 0.7561436672967864,
+ "grad_norm": 0.6175432205200195,
+ "learning_rate": 0.00022528301886792453,
+ "loss": 4.5609,
+ "step": 200
+ },
+ {
+ "epoch": 0.945179584120983,
+ "grad_norm": 0.6139667630195618,
+ "learning_rate": 0.00020641509433962262,
+ "loss": 4.5874,
+ "step": 250
+ },
+ {
+ "epoch": 1.1323251417769375,
+ "grad_norm": 0.6452978849411011,
+ "learning_rate": 0.00018754716981132072,
+ "loss": 4.4433,
+ "step": 300
+ },
+ {
+ "epoch": 1.3213610586011342,
+ "grad_norm": 0.5474215149879456,
+ "learning_rate": 0.00016867924528301886,
+ "loss": 4.388,
+ "step": 350
+ },
+ {
+ "epoch": 1.5103969754253308,
+ "grad_norm": 0.811112642288208,
+ "learning_rate": 0.00014981132075471697,
+ "loss": 4.4651,
+ "step": 400
+ },
+ {
+ "epoch": 1.6994328922495274,
+ "grad_norm": 0.7931979894638062,
+ "learning_rate": 0.00013094339622641508,
+ "loss": 4.3723,
+ "step": 450
+ },
+ {
+ "epoch": 1.888468809073724,
+ "grad_norm": 0.6235228776931763,
+ "learning_rate": 0.00011207547169811319,
+ "loss": 4.4056,
+ "step": 500
+ }
+ ],
+ "logging_steps": 50,
+ "max_steps": 795,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1.5224731872903168e+16,
+ "train_batch_size": 1,
+ "trial_name": null,
+ "trial_params": null
+ }
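The logged `learning_rate` values drop by the same amount every 50 steps, which is what a linear decay to zero over `max_steps` (795) would produce. The sketch below checks that reading against a few log entries. Two of its constants are inferred from the numbers rather than stated anywhere in this diff: a peak learning rate of 3e-4, and a scheduler count that trails `global_step` by two. Treat both as assumptions; the training arguments themselves live in `training_args.bin`, which is only an LFS pointer here.

```python
import math

MAX_STEPS = 795   # "max_steps" in trainer_state.json
PEAK_LR = 3e-4    # assumption: inferred from the logged values
LAG = 2           # assumption: scheduler appears two steps behind global_step


def linear_decay_lr(step: int, peak_lr: float = PEAK_LR,
                    max_steps: int = MAX_STEPS, lag: int = LAG) -> float:
    """Learning rate under linear decay to zero, evaluated `lag` steps behind."""
    return peak_lr * (max_steps - (step - lag)) / max_steps


# A few (step, learning_rate) pairs copied from log_history.
logged = {
    50: 0.00028188679245283015,
    100: 0.00026301886792452826,
    250: 0.00020641509433962262,
    500: 0.00011207547169811319,
}

for step, lr in logged.items():
    # Each logged value agrees with the assumed schedule to floating-point precision.
    assert math.isclose(linear_decay_lr(step), lr, rel_tol=1e-9)
```

If the assumptions hold, the same function gives the rate at any other step, for example `linear_decay_lr(795)` for the final one.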
adapters/bn/checkpoint-500/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bb5894c37dcda07d3b6dc6d125319810ab7fca777cf1e2f1e7b24c9af6cdc573
+ size 5368
adapters/bn/checkpoint-795/README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: kenpath/svara-tts-v1
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:kenpath/svara-tts-v1
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.16.0
adapters/bn/checkpoint-795/adapter_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "kenpath/svara-tts-v1",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.1,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 8,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "wv",
+     "o_proj",
+     "wk",
+     "k_proj",
+     "v_proj",
+     "q_proj",
+     "wq",
+     "proj",
+     "wo"
+   ],
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
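Reading the adapter config above: with `r` = 8, `lora_alpha` = 32 and `use_rslora` = false, standard LoRA scales the low-rank update by `lora_alpha / r`. The sketch below works that number out and estimates how many trainable parameters one adapted linear layer adds. The helper names and the 3072-wide layer are illustrative assumptions, not values taken from this commit.

```python
import json
import math

# Subset of the fields shown in adapter_config.json above.
config = json.loads(
    '{"r": 8, "lora_alpha": 32, "lora_dropout": 0.1, "use_rslora": false}'
)

def lora_scaling(cfg):
    """Factor applied to the low-rank update B @ A before it is added to W."""
    if cfg.get("use_rslora", False):
        # Rank-stabilised variant divides by sqrt(r) instead of r.
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    return cfg["lora_alpha"] / cfg["r"]

def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)

scaling = lora_scaling(config)
# Hypothetical square projection; the real hidden size is not shown in this diff.
extra_params = lora_param_count(3072, 3072, config["r"])
print(scaling, extra_params)
```

A small rank with a larger alpha, as here, keeps the adapter file small (the `adapter_model.safetensors` pointer below reports about 18 MB) while still giving the update a noticeable weight.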
adapters/bn/checkpoint-795/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aa246c0e3683124cdcd8c7d3a8456a46c994b8c468878c8d9f872efdc135e142
+ size 18379784
adapters/bn/checkpoint-795/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f2d8ca4910b2d06ca8a6300901836c679c4638a5cb62cde0072fd4c01d317361
+ size 9584634
adapters/bn/checkpoint-795/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4f8664dee33a75e16cd244ee571724c5ca643d48198e78132a298003b0c5b830
+ size 14244
adapters/bn/checkpoint-795/scaler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:373ec1ace89608530ff2ea47177c94dea66acdbf718ea1a11c0fa66f6af52dd4
+ size 988
adapters/bn/checkpoint-795/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc385e4975e71baea60dded77250967d6eeaacdb26e791049a369ffb1f066f33
+ size 1064
adapters/bn/checkpoint-795/trainer_state.json ADDED
@@ -0,0 +1,139 @@
+ {
+   "best_global_step": null,
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 3.0,
+   "eval_steps": 500,
+   "global_step": 795,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.1890359168241966,
+       "grad_norm": 0.6747773289680481,
+       "learning_rate": 0.00028188679245283015,
+       "loss": 4.7024,
+       "step": 50
+     },
+     {
+       "epoch": 0.3780718336483932,
+       "grad_norm": 0.5653451681137085,
+       "learning_rate": 0.00026301886792452826,
+       "loss": 4.4922,
+       "step": 100
+     },
+     {
+       "epoch": 0.5671077504725898,
+       "grad_norm": 0.7042025327682495,
+       "learning_rate": 0.0002441509433962264,
+       "loss": 4.5527,
+       "step": 150
+     },
+     {
+       "epoch": 0.7561436672967864,
+       "grad_norm": 0.6175432205200195,
+       "learning_rate": 0.00022528301886792453,
+       "loss": 4.5609,
+       "step": 200
+     },
+     {
+       "epoch": 0.945179584120983,
+       "grad_norm": 0.6139667630195618,
+       "learning_rate": 0.00020641509433962262,
+       "loss": 4.5874,
+       "step": 250
+     },
+     {
+       "epoch": 1.1323251417769375,
+       "grad_norm": 0.6452978849411011,
+       "learning_rate": 0.00018754716981132072,
+       "loss": 4.4433,
+       "step": 300
+     },
+     {
+       "epoch": 1.3213610586011342,
+       "grad_norm": 0.5474215149879456,
+       "learning_rate": 0.00016867924528301886,
+       "loss": 4.388,
+       "step": 350
+     },
+     {
+       "epoch": 1.5103969754253308,
+       "grad_norm": 0.811112642288208,
+       "learning_rate": 0.00014981132075471697,
+       "loss": 4.4651,
+       "step": 400
+     },
+     {
+       "epoch": 1.6994328922495274,
+       "grad_norm": 0.7931979894638062,
+       "learning_rate": 0.00013094339622641508,
+       "loss": 4.3723,
+       "step": 450
+     },
+     {
+       "epoch": 1.888468809073724,
+       "grad_norm": 0.6235228776931763,
+       "learning_rate": 0.00011207547169811319,
+       "loss": 4.4056,
+       "step": 500
+     },
+     {
+       "epoch": 2.0756143667296785,
+       "grad_norm": 0.7440807819366455,
+       "learning_rate": 9.320754716981131e-05,
+       "loss": 4.4061,
+       "step": 550
+     },
+     {
+       "epoch": 2.264650283553875,
+       "grad_norm": 0.7756364941596985,
+       "learning_rate": 7.433962264150943e-05,
+       "loss": 4.3203,
+       "step": 600
+     },
+     {
+       "epoch": 2.4536862003780717,
+       "grad_norm": 0.7223451733589172,
+       "learning_rate": 5.547169811320754e-05,
+       "loss": 4.2439,
+       "step": 650
+     },
+     {
+       "epoch": 2.6427221172022684,
+       "grad_norm": 0.7152644395828247,
+       "learning_rate": 3.6603773584905655e-05,
+       "loss": 4.308,
+       "step": 700
+     },
+     {
+       "epoch": 2.831758034026465,
+       "grad_norm": 0.8246577382087708,
+       "learning_rate": 1.773584905660377e-05,
+       "loss": 4.402,
+       "step": 750
+     }
+   ],
+   "logging_steps": 50,
+   "max_steps": 795,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 3,
+   "save_steps": 500,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": true
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 2.4184540042776576e+16,
+   "train_batch_size": 1,
+   "trial_name": null,
+   "trial_params": null
+ }
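The `log_history` entries above lower the learning rate by the same amount every 50 steps, which is what a linear decay to zero over `max_steps` would produce. The check below uses only the first three logged points. The implied peak of roughly 3e-4 is an inference from the log, not a value stated anywhere in this commit, and the variable names are illustrative.

```python
# (step, learning_rate) pairs copied from the first three log_history entries.
logged = [
    (50, 0.00028188679245283015),
    (100, 0.00026301886792452826),
    (150, 0.0002441509433962264),
]
MAX_STEPS = 795  # "max_steps" in trainer_state.json

def slope(p, q):
    """Learning-rate change per optimizer step between two log points."""
    return (q[1] - p[1]) / (q[0] - p[0])

s1 = slope(logged[0], logged[1])
s2 = slope(logged[1], logged[2])

# A linear schedule has the same slope between any two log points.
is_linear = abs(s1 - s2) < 1e-12

# If the rate falls to zero over MAX_STEPS steps, the starting value
# is the per-step decrease multiplied by MAX_STEPS.
implied_peak = -s1 * MAX_STEPS

print(is_linear, round(implied_peak, 6))
```

The loss over the same run moves only from about 4.70 to the 4.2 to 4.4 range, so the schedule shape is easier to read off the log than any convergence trend.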
adapters/bn/checkpoint-795/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bb5894c37dcda07d3b6dc6d125319810ab7fca777cf1e2f1e7b24c9af6cdc573
+ size 5368
adapters/bn/special_tokens_map.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "additional_special_tokens": [
+     "<|audio|>"
+   ],
+   "bos_token": {
+     "content": "<|begin_of_text|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|eot_id|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<custom_token_7>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
adapters/bn/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:044e2a10201774018db120391980464472baabf223bd353cea49b17da0b66abc
+ size 22849546
adapters/bn/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
app.py ADDED
@@ -0,0 +1,67 @@
+ import os
+ import io
+ import soundfile as sf
+ from fastapi import FastAPI, Form, HTTPException
+ from fastapi.responses import StreamingResponse
+ from inference import SvaraEngine
+
+ # Configuration
+ BASE_MODEL = "kenpath/svara-tts-v1"
+ ADAPTERS_DIR = "./adapters"
+
+ # Auto-discover adapters in the folder
+ adapter_map = {}
+ if os.path.exists(ADAPTERS_DIR):
+     for name in os.listdir(ADAPTERS_DIR):
+         path = os.path.join(ADAPTERS_DIR, name)
+         if os.path.isdir(path):
+             adapter_map[name] = path
+
+ # Initialize App & Engine
+ app = FastAPI(title="Team Submission - Svara TTS API")
+ engine = None
+
+ @app.on_event("startup")
+ async def startup_event():
+     global engine
+     # Initialize engine once on startup
+     if not adapter_map:
+         raise RuntimeError("No adapters found in ./adapters folder!")
+     engine = SvaraEngine(BASE_MODEL, adapter_map)
+
+ @app.get("/")
+ def home():
+     return {"status": "healthy", "languages_loaded": list(adapter_map.keys())}
+
+ @app.post("/generate")
+ async def generate_audio(text: str = Form(...), language: str = Form(...)):
+     """
+     Endpoint strictly for the judges.
+     Input: Text and Language Code (e.g., 'hi', 'bn').
+     Output: WAV Audio Stream.
+     """
+     if language not in adapter_map:
+         raise HTTPException(status_code=400, detail=f"Language '{language}' not supported.")
+
+     try:
+         # Run Inference
+         audio_data, sr = engine.synthesize(text, language)
+
+         if audio_data is None:
+             raise HTTPException(status_code=500, detail="Model failed to generate audio tokens.")
+
+         # Convert to WAV in-memory
+         buffer = io.BytesIO()
+         sf.write(buffer, audio_data, sr, format='WAV')
+         buffer.seek(0)
+
+         return StreamingResponse(buffer, media_type="audio/wav")
+
+     except HTTPException:
+         raise  # Pass through HTTP errors raised above instead of re-wrapping them
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=str(e))
+
+ if __name__ == "__main__":
+     import uvicorn
+     uvicorn.run(app, host="0.0.0.0", port=7860)
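For anyone exercising the `/generate` endpoint defined in `app.py` above, a minimal client sketch follows. It uses only the standard library and sends `text` and `language` as form fields, which is what the `Form(...)` parameters expect. The function names, the output path and the `localhost` address are hypothetical; port 7860 is the one the Dockerfile exposes.

```python
import urllib.parse
import urllib.request

def build_generate_request(base_url, text, language):
    """Build a form-encoded POST for /generate; nothing is sent here."""
    body = urllib.parse.urlencode({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

def fetch_wav(base_url, text, language, out_path="out.wav"):
    """Send the request and save the returned WAV bytes (needs a running server)."""
    req = build_generate_request(base_url, text, language)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as fh:
        fh.write(resp.read())
    return out_path

if __name__ == "__main__":
    # Only builds the request, so this runs without a server.
    req = build_generate_request("http://localhost:7860", "নমস্কার", "bn")
    print(req.full_url, req.get_method())
```

Calling `fetch_wav("http://localhost:7860", "নমস্কার", "bn")` against a running container should write the audio to `out.wav`. A language code with no matching adapter folder gets a 400 response, which `urlopen` surfaces as `urllib.error.HTTPError`.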
inference.py ADDED
@@ -0,0 +1,108 @@
+ import torch
+ import torchaudio
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+ from snac import SNAC
+
+ class SvaraEngine:
+     def __init__(self, base_model_id, adapter_map, device="cuda"):
+         self.device = device if torch.cuda.is_available() else "cpu"
+         print(f"🚀 Initializing Svara Engine on {self.device}...")
+
+         # 1. Load Tokenizer & Base Model (Llama-based)
+         self.tokenizer = AutoTokenizer.from_pretrained(base_model_id)
+         self.base_model = AutoModelForCausalLM.from_pretrained(
+             base_model_id,
+             torch_dtype=torch.float16,
+             device_map=self.device
+         )
+
+         # 2. Load SNAC Decoder (The Vocoder)
+         print("🔊 Loading SNAC Decoder...")
+         self.snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval().to(self.device)
+
+         # 3. Load LoRA Adapters dynamically
+         # We load the first one to initialize PEFT, then add others
+         first_lang = list(adapter_map.keys())[0]
+         print(f"🔗 Loading initial adapter: {first_lang}...")
+         self.model = PeftModel.from_pretrained(
+             self.base_model,
+             adapter_map[first_lang],
+             adapter_name=first_lang
+         )
+
+         for lang, path in adapter_map.items():
+             if lang == first_lang: continue
+             print(f" - Loading adapter: {lang}")
+             self.model.load_adapter(path, adapter_name=lang)
+
+         print("✅ System Ready.")
+
+     def reconstruct_audio(self, audio_ids):
+         """
+         Svara/Orpheus Specific: Converts flat tokens back into SNAC hierarchical codes.
+         Structure: 7 tokens per frame -> [1 code (L1), 2 codes (L2), 4 codes (L3)]
+         """
+         # Filter out special text tokens, keep only audio tokens
+         # Note: 128258 is the standard start offset for audio tokens in this architecture
+         audio_ids = [tok - 128258 for tok in audio_ids if tok >= 128258]
+
+         # Truncate to multiple of 7 (SNAC requirement)
+         num_frames = len(audio_ids) // 7
+         audio_ids = audio_ids[:num_frames * 7]
+
+         if len(audio_ids) == 0:
+             return None
+
+         # Reshape: (N, 7)
+         codes = torch.tensor(audio_ids, dtype=torch.int32).reshape(-1, 7).to(self.device)
+
+         # Unpack layers (SNAC Specific Layout)
+         # Layer 1: Col 0
+         l1 = codes[:, 0]
+         # Layer 2: Cols 1, 4
+         l2 = torch.stack((codes[:, 1], codes[:, 4])).t().flatten()
+         # Layer 3: Cols 2, 3, 5, 6
+         l3 = torch.stack((codes[:, 2], codes[:, 3], codes[:, 5], codes[:, 6])).t().flatten()
+
+         # Input to SNAC must be a list of tensors [L1, L2, L3] with batch dim
+         snac_input = [
+             l1.unsqueeze(0),  # (1, T1)
+             l2.unsqueeze(0),  # (1, T2)
+             l3.unsqueeze(0)   # (1, T3)
+         ]
+
+         with torch.no_grad():
+             audio_hat = self.snac_model.decode(snac_input)
+
+         return audio_hat.cpu().squeeze().numpy()
+
+     def synthesize(self, text, language_code):
+         # 1. Switch LoRA Adapter
+         if language_code in self.model.peft_config:
+             self.model.set_adapter(language_code)
+         else:
+             print(f"⚠️ Warning: Language {language_code} not found, using active adapter.")
+
+         # 2. Prepare Prompt (Orpheus/Svara Format)
+         # Svara expects text followed by a start token.
+         input_ids = self.tokenizer(text, return_tensors="pt").input_ids.to(self.device)
+
+         # 3. Generate Tokens
+         with torch.no_grad():
+             output_ids = self.model.generate(
+                 input_ids,
+                 max_new_tokens=1024,  # Adjust based on length
+                 do_sample=True,
+                 temperature=0.7,  # 0.7 is sweet spot for Svara
+                 top_p=0.9,
+                 repetition_penalty=1.1
+             )
+
+         # 4. Extract only new tokens (audio codes)
+         generated_ids = output_ids[0, input_ids.shape[1]:].tolist()
+
+         # 5. Decode to Audio
+         audio_waveform = self.reconstruct_audio(generated_ids)
+
+         return audio_waveform, 24000  # SNAC sample rate
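The frame layout described in the `reconstruct_audio` docstring above can be easier to follow without tensors. The sketch below redoes the same column split on plain Python lists: 7 codes per frame, column 0 to layer 1, columns 1 and 4 to layer 2, columns 2, 3, 5 and 6 to layer 3. It covers only the de-interleaving step. The ID offset and the SNAC decode call are left out, and the function name is a hypothetical one.

```python
def split_frames(codes):
    """Split a flat list of codes into three layers, 7 codes per frame.

    Column layout copied from reconstruct_audio:
    layer 1 <- col 0, layer 2 <- cols 1 and 4, layer 3 <- cols 2, 3, 5, 6.
    """
    usable = len(codes) - len(codes) % 7  # drop a trailing partial frame
    layer1, layer2, layer3 = [], [], []
    for start in range(0, usable, 7):
        frame = codes[start:start + 7]
        layer1.append(frame[0])
        layer2.extend([frame[1], frame[4]])
        layer3.extend([frame[2], frame[3], frame[5], frame[6]])
    return layer1, layer2, layer3

# Two full frames plus two leftover codes that get discarded.
l1, l2, l3 = split_frames(list(range(16)))
print(len(l1), len(l2), len(l3))
```

Each frame contributes 1, 2 and 4 codes to the three layers, so the layers come out with lengths in a 1:2:4 ratio. That matches the `(1, T1)`, `(1, T2)`, `(1, T3)` shapes noted in the comments on `snac_input`.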
requirements.txt ADDED
@@ -0,0 +1,25 @@
+ # fastapi==0.109.0
+ # uvicorn==0.27.0
+ # python-multipart==0.0.9
+ # torch==2.1.2
+ # torchaudio==2.1.2
+ # transformers==4.37.0
+ # peft==0.7.1
+ # snac==1.2.0
+ # soundfile==0.12.1
+ # scipy==1.11.4
+ # accelerate==0.26.1
+ # numpy<2.0.0
+
+ fastapi
+ uvicorn
+ python-multipart
+ torch
+ torchaudio
+ transformers
+ peft
+ snac
+ soundfile
+ scipy
+ accelerate
+ numpy
submission_svara_2 ADDED
@@ -0,0 +1 @@
+ Subproject commit a64eac25f4d1b48eb40fabe14a77b7fd99927d47