wt5719001 hynt committed on
Commit e40867d · 0 Parent(s)

Duplicate from hynt/ZipVoice-Vietnamese-2500h

Co-authored-by: Nguyen Thien Hy <hynt@users.noreply.huggingface.co>

Files changed (5)
  1. .gitattributes +35 -0
  2. README.md +57 -0
  3. config.json +26 -0
  4. iter-525000-avg-2.pt +3 -0
  5. tokens.txt +360 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
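Note: the attributes above route matching files through Git LFS. As a rough illustration of which files in this commit are affected, the sketch below checks file names against a handful of the patterns. It uses Python's `fnmatch` as a stand-in for gitattributes matching, which is an assumption: real gitattributes rules (for example `**` and directory patterns) are richer, so this only approximates the simple `*.ext` cases. The helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatch

# A subset of the patterns from the .gitattributes diff (not the full list).
LFS_PATTERNS = ["*.pt", "*.bin", "*.safetensors", "*.onnx", "*.tar.*", "*tfevents*"]


def is_lfs_tracked(filename: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: match the basename against simple glob patterns.

    fnmatch is only a stand-in for gitattributes matching and covers
    the plain `*.ext` style patterns used here.
    """
    basename = filename.rsplit("/", 1)[-1]
    return any(fnmatch(basename, pattern) for pattern in patterns)


# The five files listed under "Files changed".
files = [".gitattributes", "README.md", "config.json", "iter-525000-avg-2.pt", "tokens.txt"]
tracked = [name for name in files if is_lfs_tracked(name)]
print(tracked)
```

Under these assumptions only the checkpoint matches, which is consistent with `iter-525000-avg-2.pt` appearing in the diff as a three-line LFS pointer rather than as binary content.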
README.md ADDED
@@ -0,0 +1,57 @@
+ ---
+ tags:
+ - text-to-speech
+ - vietnamese
+ - ai-model
+ - deep-learning
+ license: cc-by-nc-sa-4.0
+ library_name: pytorch
+ datasets:
+ - PhoAudioBook
+ - ViVoice
+ - UEH
+ model_name: ZipVoice-Vietnamese-2500h
+ language: vi
+ ---
+
+ # 🛑 Important Note ⚠️
+ This Text-to-Speech (TTS) model is provided solely for research, experimentation, and technology development purposes. Any audio content generated by the model does not represent the voice, identity, opinions, or endorsement of any real individual or organization. The authors and related parties assume no responsibility for any misuse, unlawful activities, violations of privacy, personality rights, intellectual property rights, or any direct or indirect damages arising from the use of this model.
+
+ Users bear full responsibility and legal liability for the deployment, distribution, and use of the model. The use of this model for impersonation, voice cloning of individuals without lawful consent, creating misleading content, fraud, manipulation of public opinion, or any purpose that violates applicable laws is strictly prohibited. When using or sharing generated audio, it is strongly recommended to clearly disclose that the content is AI-generated and to comply fully with all applicable legal regulations, platform policies, and ethical standards.
+
+ # 🎙️ ZipVoice-Vietnamese-2500h
+ ZipVoice is a series of fast, high-quality zero-shot TTS models based on flow matching.
+
+ Key features:
+ 1. Small and fast: only 123M parameters.
+
+ 2. High-quality voice cloning: state-of-the-art performance in speaker similarity, intelligibility, and naturalness.
+
+ 3. Multilingual: supports Chinese and English.
+
+ 4. Multi-mode: supports both single-speaker and dialogue speech generation.
+
+ This checkpoint is a compact fine-tuned version of ZipVoice trained on 2500 hours of Vietnamese speech.
+
+ 🔗 For more fine-tuning and inference experiments, visit: https://github.com/k2-fsa/ZipVoice.
+
+ 📜 **License:** [CC-BY-NC-SA-4.0](https://spdx.org/licenses/CC-BY-NC-SA-4.0) (non-commercial research use only).
+
+ ---
+
+ ## 📌 Model Details
+
+ - **Dataset:** PhoAudioBook, ViVoice, TeacherDinh-UEH.
+ - **Total dataset duration:** 2500 hours
+ - **Data processing techniques:**
+   - Remove all background music from the audio using Facebook's Demucs model: https://github.com/facebookresearch/demucs
+   - Exclude audio files shorter than 1 second or longer than 30 seconds.
+   - Keep the default punctuation marks unchanged.
+   - Normalize text to lowercase.
+ - **Training Configuration:**
+   - **Base Model:** ZipVoice with the espeak-ng `vi` voice as the tokenizer
+   - **GPU:** RTX 3090
+   - **Batch Size:** Max duration 200
+   - **Training Progress:** Stopped at **525,000 steps at epoch 11**
+
+ ---
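Note: two of the data-processing rules in the README (the 1 to 30 second duration window and lowercase normalization with punctuation left alone) are simple enough to sketch. The snippet below is only an illustration under assumptions: the `Utterance` record and the `prepare` function are hypothetical names, not part of ZipVoice or of the authors' pipeline, and the Demucs music-removal step is not covered.

```python
from dataclasses import dataclass

MIN_SECONDS = 1.0   # README: clips shorter than 1 second are not used
MAX_SECONDS = 30.0  # README: clips longer than 30 seconds are not used


@dataclass
class Utterance:
    """Hypothetical record: a transcript and its audio duration in seconds."""
    text: str
    duration: float


def prepare(utterances):
    """Keep clips inside the duration window and lowercase their text.

    Punctuation is left untouched, matching the README's rule to keep
    the default punctuation marks unchanged.
    """
    kept = []
    for utt in utterances:
        if MIN_SECONDS <= utt.duration <= MAX_SECONDS:
            kept.append(Utterance(text=utt.text.lower(), duration=utt.duration))
    return kept


samples = [
    Utterance("Xin Chào, Việt Nam!", 2.5),  # inside the window, kept
    Utterance("Quá ngắn", 0.4),             # shorter than 1 second, dropped
    Utterance("Quá dài", 31.0),             # longer than 30 seconds, dropped
]
prepared = prepare(samples)
```

The window is treated as inclusive here because the README only excludes clips strictly shorter than 1 second or strictly longer than 30 seconds.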
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "model" : {
+     "fm_decoder_downsampling_factor" : [1,2,4,2,1],
+     "fm_decoder_num_layers" : [2,2,4,4,4],
+     "fm_decoder_cnn_module_kernel" : [31,15,7,15,31],
+     "fm_decoder_feedforward_dim" : 1536,
+     "fm_decoder_num_heads" : 4,
+     "fm_decoder_dim" : 512,
+     "text_encoder_num_layers" : 4,
+     "text_encoder_feedforward_dim" : 512,
+     "text_encoder_cnn_module_kernel" : 9,
+     "text_encoder_num_heads" : 4,
+     "text_encoder_dim" : 192,
+     "query_head_dim" : 32,
+     "value_head_dim" : 12,
+     "pos_head_dim" : 4,
+     "pos_dim" : 48,
+     "time_embed_dim" : 192,
+     "text_embed_dim" : 192,
+     "feat_dim": 100
+   },
+   "feature" : {
+     "sampling_rate": 24000,
+     "type": "vocos"
+   }
+ }
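Note: a quick sanity check of this config might look like the sketch below. It embeds a subset of the values from the diff and reads the three list-valued `fm_decoder_*` entries as one value per decoder stack. That reading is an assumption drawn from the matching list lengths, not something the commit states.

```python
import json

# Subset of config.json from the diff, embedded so the sketch is self-contained.
CONFIG_TEXT = """
{
  "model": {
    "fm_decoder_downsampling_factor": [1, 2, 4, 2, 1],
    "fm_decoder_num_layers": [2, 2, 4, 4, 4],
    "fm_decoder_cnn_module_kernel": [31, 15, 7, 15, 31],
    "fm_decoder_dim": 512,
    "text_encoder_num_layers": 4,
    "feat_dim": 100
  },
  "feature": {"sampling_rate": 24000, "type": "vocos"}
}
"""

config = json.loads(CONFIG_TEXT)
model = config["model"]

# Assumption: each list holds one entry per decoder stack, so lengths must agree.
per_stack_keys = (
    "fm_decoder_downsampling_factor",
    "fm_decoder_num_layers",
    "fm_decoder_cnn_module_kernel",
)
stack_counts = {len(model[key]) for key in per_stack_keys}
if len(stack_counts) != 1:
    raise ValueError("per-stack lists have mismatched lengths")

num_stacks = stack_counts.pop()
total_decoder_layers = sum(model["fm_decoder_num_layers"])
print(num_stacks, total_decoder_layers, config["feature"]["sampling_rate"])
```

With the values above this reports 5 stacks and 16 decoder layers in total, alongside the 24000 Hz sampling rate.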
iter-525000-avg-2.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d0866f1a66c4fb3a2b5cb5f0fb5cbf6a9491a3fc305cf6467f6a655b5fdeea67
+ size 491164130
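Note: the three lines above are a Git LFS pointer, not the checkpoint itself. A minimal sketch of reading the pointer and checking a downloaded file against it could look like this. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for illustration, not part of any Git LFS tooling.

```python
import hashlib

POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:d0866f1a66c4fb3a2b5cb5f0fb5cbf6a9491a3fc305cf6467f6a655b5fdeea67
size 491164130
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of the pointer into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Stream the file and compare its byte count and hash with the pointer."""
    hasher = hashlib.new(pointer["algo"])
    total = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
size_mib = pointer["size"] / 2**20  # roughly 468 MiB
```

Calling `matches_pointer("iter-525000-avg-2.pt", pointer)` on a local copy would then confirm that the download is complete and unmodified.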
tokens.txt ADDED
@@ -0,0 +1,360 @@
+ _ 0
+ ^ 1
+ $ 2
+ 3
+ ! 4
+ ' 5
+ ( 6
+ ) 7
+ , 8
+ - 9
+ . 10
+ : 11
+ ; 12
+ ? 13
+ a 14
+ b 15
+ c 16
+ d 17
+ e 18
+ f 19
+ h 20
+ i 21
+ j 22
+ k 23
+ l 24
+ m 25
+ n 26
+ o 27
+ p 28
+ q 29
+ r 30
+ s 31
+ t 32
+ u 33
+ v 34
+ w 35
+ x 36
+ y 37
+ z 38
+ æ 39
+ ç 40
+ ð 41
+ ø 42
+ ħ 43
+ ŋ 44
+ œ 45
+ ǀ 46
+ ǁ 47
+ ǂ 48
+ ǃ 49
+ ɐ 50
+ ɑ 51
+ ɒ 52
+ ɓ 53
+ ɔ 54
+ ɕ 55
+ ɖ 56
+ ɗ 57
+ ɘ 58
+ ə 59
+ ɚ 60
+ ɛ 61
+ ɜ 62
+ ɞ 63
+ ɟ 64
+ ɠ 65
+ ɡ 66
+ ɢ 67
+ ɣ 68
+ ɤ 69
+ ɥ 70
+ ɦ 71
+ ɧ 72
+ ɨ 73
+ ɪ 74
+ ɫ 75
+ ɬ 76
+ ɭ 77
+ ɮ 78
+ ɯ 79
+ ɰ 80
+ ɱ 81
+ ɲ 82
+ ɳ 83
+ ɴ 84
+ ɵ 85
+ ɶ 86
+ ɸ 87
+ ɹ 88
+ ɺ 89
+ ɻ 90
+ ɽ 91
+ ɾ 92
+ ʀ 93
+ ʁ 94
+ ʂ 95
+ ʃ 96
+ ʄ 97
+ ʈ 98
+ ʉ 99
+ ʊ 100
+ ʋ 101
+ ʌ 102
+ ʍ 103
+ ʎ 104
+ ʏ 105
+ ʐ 106
+ ʑ 107
+ ʒ 108
+ ʔ 109
+ ʕ 110
+ ʘ 111
+ ʙ 112
+ ʛ 113
+ ʜ 114
+ ʝ 115
+ ʟ 116
+ ʡ 117
+ ʢ 118
+ ʲ 119
+ ˈ 120
+ ˌ 121
+ ː 122
+ ˑ 123
+ ˞ 124
+ β 125
+ θ 126
+ χ 127
+ ᵻ 128
+ ⱱ 129
+ 0 130
+ 1 131
+ 2 132
+ 3 133
+ 4 134
+ 5 135
+ 6 136
+ 7 137
+ 8 138
+ 9 139
+ ̧ 140
+ ̃ 141
+ ̪ 142
+ ̯ 143
+ ̩ 144
+ ʰ 145
+ ˤ 146
+ ε 147
+ ↓ 148
+ # 149
+ " 150
+ ↑ 151
+ ̺ 152
+ ̻ 153
+ g 154
+ ʦ 155
+ X 156
+ ̝ 157
+ ̊ 158
+ a1 159
+ a2 160
+ a3 161
+ a4 162
+ a5 163
+ ai1 164
+ ai2 165
+ ai3 166
+ ai4 167
+ ai5 168
+ an1 169
+ an2 170
+ an3 171
+ an4 172
+ an5 173
+ ang1 174
+ ang2 175
+ ang3 176
+ ang4 177
+ ang5 178
+ ao1 179
+ ao2 180
+ ao3 181
+ ao4 182
+ ao5 183
+ b0 184
+ c0 185
+ ch0 186
+ d0 187
+ e1 188
+ e2 189
+ e3 190
+ e4 191
+ e5 192
+ ei1 193
+ ei2 194
+ ei3 195
+ ei4 196
+ ei5 197
+ en1 198
+ en2 199
+ en3 200
+ en4 201
+ en5 202
+ eng1 203
+ eng2 204
+ eng3 205
+ eng4 206
+ eng5 207
+ er2 208
+ er3 209
+ er4 210
+ er5 211
+ f0 212
+ g0 213
+ g2 214
+ g3 215
+ g4 216
+ g5 217
+ h0 218
+ i1 219
+ i2 220
+ i3 221
+ i4 222
+ i5 223
+ ia1 224
+ ia2 225
+ ia3 226
+ ia4 227
+ ia5 228
+ ian1 229
+ ian2 230
+ ian3 231
+ ian4 232
+ ian5 233
+ iang1 234
+ iang2 235
+ iang3 236
+ iang4 237
+ iang5 238
+ iao1 239
+ iao2 240
+ iao3 241
+ iao4 242
+ iao5 243
+ ie1 244
+ ie2 245
+ ie3 246
+ ie4 247
+ ie5 248
+ in1 249
+ in2 250
+ in3 251
+ in4 252
+ in5 253
+ ing1 254
+ ing2 255
+ ing3 256
+ ing4 257
+ ing5 258
+ iong1 259
+ iong2 260
+ iong3 261
+ iong4 262
+ iu1 263
+ iu2 264
+ iu3 265
+ iu4 266
+ iu5 267
+ j0 268
+ k0 269
+ l0 270
+ m0 271
+ m1 272
+ m2 273
+ m4 274
+ m5 275
+ n0 276
+ n2 277
+ n3 278
+ n4 279
+ n5 280
+ ng5 281
+ o1 282
+ o2 283
+ o3 284
+ o4 285
+ o5 286
+ ong1 287
+ ong2 288
+ ong3 289
+ ong4 290
+ ong5 291
+ ou1 292
+ ou2 293
+ ou3 294
+ ou4 295
+ ou5 296
+ p0 297
+ q0 298
+ r0 299
+ s0 300
+ sh0 301
+ t0 302
+ u1 303
+ u2 304
+ u3 305
+ u4 306
+ u5 307
+ ua1 308
+ ua2 309
+ ua3 310
+ ua4 311
+ uai1 312
+ uai2 313
+ uai3 314
+ uai4 315
+ uai5 316
+ uan1 317
+ uan2 318
+ uan3 319
+ uan4 320
+ uan5 321
+ uang1 322
+ uang2 323
+ uang3 324
+ uang4 325
+ uang5 326
+ ue1 327
+ ue2 328
+ ue3 329
+ ue4 330
+ ui1 331
+ ui2 332
+ ui3 333
+ ui4 334
+ ui5 335
+ un1 336
+ un2 337
+ un3 338
+ un4 339
+ un5 340
+ uo1 341
+ uo2 342
+ uo3 343
+ uo4 344
+ uo5 345
+ v2 346
+ v3 347
+ v4 348
+ ve3 349
+ ve4 350
+ w0 351
+ x0 352
+ y0 353
+ z0 354
+ zh0 355
+ ê1 356
+ ê2 357
+ ê3 358
+ ê4 359
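Note: each line of `tokens.txt` appears to pair a token with its integer id. The sketch below loads such a table, assuming the two fields are separated by whitespace. The line for id 3 shows no visible token in this view, and the sketch treats it as a space token whose whitespace was collapsed in rendering. That is a guess, and `load_token_table` is a hypothetical helper, not a ZipVoice function.

```python
def load_token_table(text: str) -> dict[str, int]:
    """Map each token to its id, splitting on the last run of whitespace."""
    table = {}
    for line in text.splitlines():
        parts = line.rsplit(maxsplit=1)
        if not parts:
            continue  # skip empty lines
        if len(parts) == 1:
            # Assumption: a line holding only an id belongs to the space token.
            token, idx = " ", parts[0]
        else:
            token, idx = parts
        table[token] = int(idx)
    return table


# A few lines in the same shape as the file above (not the full table).
SAMPLE = "_ 0\n^ 1\n$ 2\n  3\n! 4\na 14\nai1 164\nê4 359\n"
table = load_token_table(SAMPLE)
```

Splitting from the right keeps multi-character tokens such as `ai1` intact, since only the trailing id is separated from the rest of the line.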