Text-to-Speech
Transformers
Safetensors
higgs_multimodal_qwen3
text-generation
speech-generation
voice-agent
expressive-speech
controllable-tts
multilingual-tts
Mirror from boson-sglang/higgs-audio-v3-tts-4b
- .gitattributes +1 -0
- LICENSE +166 -0
- README.md +488 -0
- assets/model_architecture.png +0 -0
- chat_template.jinja +54 -0
- config.json +112 -0
- model.safetensors +3 -0
- model.safetensors.index.json +934 -0
- tokenizer.json +3 -0
- tokenizer_config.json +78 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
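The hunk above adds `tokenizer.json` to the set of paths stored through Git LFS. As a rough illustration of what that attribute line means, the sketch below lists which patterns in a `.gitattributes` text carry `filter=lfs`. The helper name `lfs_patterns` and the `SAMPLE` string are hypothetical, and the parsing is simplified: it splits on whitespace and does not handle quoted patterns that contain spaces.

```python
def lfs_patterns(gitattributes_text: str) -> list[str]:
    """Return the path patterns whose attributes include filter=lfs.

    Simplified assumption: each non-comment line is a pattern followed by
    whitespace-separated attributes; quoted patterns are not handled.
    """
    patterns = []
    for raw in gitattributes_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


# Lines taken from the hunk above, after the change is applied.
SAMPLE = """\
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text
"""

if __name__ == "__main__":
    for pattern in lfs_patterns(SAMPLE):
        print(pattern)
```

Running this on the post-change attributes should list `tokenizer.json` alongside the three existing patterns, which is consistent with the file appearing in the commit as a small LFS pointer (+3 lines) rather than as the full tokenizer payload.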
LICENSE
ADDED
@@ -0,0 +1,166 @@
BOSON HIGGS AUDIO v3 RESEARCH AND NON-COMMERCIAL LICENSE AGREEMENT
Last Updated: May 21, 2026
I. INTRODUCTION
This License Agreement (the "Agreement") is entered into by and between Licensee (as defined below) and Boson AI USA, Inc. ("Boson", "we", or "us"). This Agreement governs Your use, reproduction, distribution, and modification of the Higgs Materials.
This Agreement is intended to allow Research and Non-Commercial use of the Higgs Materials free of charge. Any Commercial use of the Higgs Materials requires a separate written license from Boson. For clarity, this Agreement is a research and non-commercial source-available model license and is not an open source license.
The Higgs Materials may incorporate third-party components made available under their own open source licenses. Your obligations under those third-party licenses (including, where applicable, the retention of upstream copyright, patent, trademark, and attribution notices) are addressed in a separate "NOTICE" file accompanying the Higgs Materials and are in addition to Your obligations under this Agreement. This Agreement governs the Higgs Materials and any Derivative Work as a whole.
By clicking "I Accept", or by accessing, using, or distributing any portion or element of the Higgs Materials or any Derivative Work, You agree that You have read, understood, and are bound by the terms of this Agreement. If You are acting on behalf of an entity, You represent that You are authorized to bind that entity, and "You" includes both you and that entity.
II. RESEARCH AND NON-COMMERCIAL USE LICENSE
Subject to the terms of this Agreement, Boson grants You a non-exclusive, worldwide, non-transferable, non-sublicensable, royalty-free, limited license (terminable as set forth in Section IV(h)) under Boson's intellectual property rights embodied in the Higgs Materials to use, reproduce, distribute, copy, create Derivative Works of, and make modifications to the Higgs Materials solely for a Research Purpose or a Non-Commercial Purpose.
"Research Purpose" means academic or scientific advancement that is not primarily intended for commercial advantage or monetary compensation to You or others, including peer-reviewed publication, benchmarking, and reproducibility studies. For clarity, Research Purpose may include non-production benchmarking, evaluation, security testing, and red-teaming by commercial entities, in each case to the extent expressly permitted in Section III.
"Non-Commercial Purpose" means any purpose other than a Research Purpose that is not primarily intended for commercial advantage or monetary compensation, such as personal use (i.e., hobbyist projects) or short-term evaluation and testing.
III. COMMERCIAL USE
Any use of the Higgs Materials, Derivative Works, or any output generated using the foregoing for a Commercial Purpose requires a separate written license agreement from Boson. No commercial rights are granted under this Agreement.
"Commercial Purpose" means any purpose that is primarily intended for or directed toward commercial advantage or monetary compensation to You or others, including but not limited to: (i) production use, hosted use (including via an API, SaaS offering, plug-in, or end-user application), or any use made available to end users outside Your organization; (ii) any use in connection with a product or service for which You charge a fee, sell advertising, or otherwise generate revenue, whether directly or indirectly; (iii) Your or Your Affiliate's internal business or organizational operations beyond limited evaluation as permitted below; and (iv) any use described in Section IV(b)(i) (model training, fine-tuning, distillation, or improvement of non-Boson models), which is independently prohibited by that section absent a separate written agreement with Boson.
For clarity, limited internal evaluation, benchmarking, security testing, or red-teaming of the Higgs Materials by a for-profit entity is permitted as a Research Purpose or Non-Commercial Purpose, provided that such use is (a) not deployed in production, (b) not made available to end users or customers, (c) not used to generate revenue, and (d) not used to train, fine-tune, distill, or improve any non-Boson model. Your status as a for-profit entity (including the amount of funding You have raised or revenue You have generated) does not, by itself, make Your use a Commercial Purpose.
For further clarity, the publication of research results, benchmark outputs, illustrative samples, or non-commercial demonstrations in academic papers, research reports, conference talks, blog posts, public model cards, or non-commercial repositories is not a Commercial Purpose solely because such materials are made publicly accessible, provided that such use (a) is not monetized, (b) is not used to provide a product or service, and (c) complies with the Acceptable Use Policy and the Use Restrictions in Section IV(b). Outputs generated solely for permitted evaluation, benchmarking, academic research, security testing, red-teaming, or non-commercial demonstration may be retained, displayed, and distributed as part of such permitted activity, provided they are not monetized, used in a product or service, or used to train, fine-tune, distill, or improve non-Boson models.
To obtain a commercial license, contact Boson at:
Website: https://boson.ai
Email: contact@boson.ai
IV. GENERAL TERMS
Your license under this Agreement is subject to the following terms.
a. Distribution and Attribution.
If You distribute or make available the Higgs Materials or any Derivative Work to a third party, or make available any permitted non-commercial application, demonstration, repository, or service that uses any portion of the foregoing, You shall:
(i) provide a copy of this Agreement to that third party, together with any NOTICE file that accompanies the Higgs Materials;
(ii) retain the following attribution notice within a "NOTICE" text file distributed as part of such copies:
"Boson Higgs Audio v3 is licensed under the Boson Higgs Audio v3 Research and Non-Commercial License, Copyright (c) Boson AI USA, Inc. All Rights Reserved.";
(iii) for public-facing applications or demonstrations, make a reasonable attribution such as:
"Built with Higgs Audio v3 licensed from Boson AI USA, Inc.";
For repositories, model cards, papers, reports, or technical documentation, attribution in the README, model card, NOTICE file, or equivalent documentation is sufficient; and
(iv) if You create, modify, fine-tune, distill, or otherwise improve any AI model or software using the Higgs Materials and distribute or make it available, You shall state in the model card, documentation, repository description, and distribution page that the model or software is "Derived from Higgs Audio v3, licensed from Boson AI USA, Inc." You shall not use "Higgs Audio v3" (or any confusingly similar name, including without limitation "Higgs Audio 4", "Higgs Pro", or "Boson Audio") as the leading or primary element of Your model or software name, or otherwise in a manner that suggests official status, endorsement by, affiliation with, or successor status to Boson.
If You create a Derivative Work, You may add Your own attribution notices to the "NOTICE" text file, provided You clearly indicate which attributions apply to the Higgs Materials and state that You changed the Higgs Materials and how You modified them.
b. Use Restrictions.
Your use of the Higgs Materials and any Derivative Work, including any output or results thereof, must comply with all applicable laws and regulations (including Trade Control Laws) and must adhere to the Acceptable Use Policy, which is incorporated by reference into this Agreement. Updates to the Acceptable Use Policy apply prospectively and will not retroactively terminate rights for prior compliant use, except where reasonably necessary to address legal compliance, security, fraud, abuse, or safety risks.
You will not, and will not permit any third party to:
(i) use the Higgs Materials, Derivative Works, or any output or results thereof to create, train, fine-tune, distill, or improve any foundational generative AI model, large language model, audio model, speech model, or other machine-learning model that is intended to generate, synthesize, clone, transform, or materially improve speech, audio, language, or multimodal generation capabilities (excluding Higgs Audio v3 itself and Derivative Works thereof for permitted purposes), unless expressly authorized in a separate written agreement with Boson. For clarity, this restriction does not prohibit the creation or use of non-generative classifiers, evaluators, safety filters, watermark or provenance detectors, or benchmark tools solely for permitted Research or Non-Commercial Purposes, provided that (a) they are not used, directly or indirectly, to train, fine-tune, distill, or improve any non-Boson generative model, and (b) they are not designed or intended to enable, accelerate, or contribute to the development of any audio, speech, language, or multimodal generation system by You or any third party;
(ii) use the Higgs Materials or any output to generate audio that impersonates, clones, simulates, or imitates any real person (including their voice, name, likeness, or persona) without that person's explicit, verifiable consent, which may include written or legally valid electronic consent. You may not submit or use reference audio, speaker embeddings, voice prompts, or other materials for voice cloning or voice adaptation unless You have all necessary rights, licenses, and consents for such use. For public-facing, commercial, telephony, political, endorsement, or other high-risk uses, You must maintain records sufficient to demonstrate such consent and produce them on Boson's reasonable request;
(iii) use the Higgs Materials or any output to create content intended to defraud, harass, stalk, threaten, defame, or deceive any person;
(iv) use the Higgs Materials or any output to generate audio depicting any minor in any sexual, exploitative, or otherwise harmful context;
(v) use the Higgs Materials or any output for (A) deceptive political persuasion, election-related deception, campaign impersonation, voter suppression, or synthetic robocalls; or (B) generating audio that falsely depicts, simulates, or attributes statements to any political candidate, elected official, government official, party representative, or other public figure, unless You have that person's explicit prior authorization and provide all legally required disclosures (including, where applicable, clear and conspicuous labeling that the content is synthetic). For the avoidance of doubt, subsection (B) is not intended to prohibit bona fide journalism, commentary, education, scholarship, parody, or satire that does not falsely present synthetic content as authentic;
(vi) use the Higgs Materials, Derivative Works, or any output or results thereof to identify, verify, authenticate, track, surveil, or profile any person based on voice or other biometric characteristics, except with a lawful basis and the explicit consent required by applicable law;
(vii) make public-facing synthetic audio, telephony, advertising, political, endorsement, or other high-risk content without clear and conspicuous disclosure that the content is AI-generated or synthetic where required by law or where a reasonable person would otherwise be misled;
(viii) remove, obscure, alter, or fail to reproduce any copyright, trademark, attribution, watermark, provenance signal, or other proprietary notice included in or output by the Higgs Materials;
(ix) reverse engineer, decompile, or otherwise attempt to extract non-public components, training data, source data, trade secrets, or proprietary information from the Higgs Materials, except to the extent such restriction is prohibited by applicable law. For the avoidance of doubt, this restriction does not prohibit ordinary inspection, loading, conversion, quantization, fine-tuning, or other use of the model weights as made available by Boson under this Agreement; or
(x) bring any legal action, claim, charge, or demand challenging Boson's ownership of, or rights in, the Higgs Materials.
No Third-Party Rights Granted. This Agreement does not grant You any rights to use, reproduce, clone, simulate, imitate, or otherwise exploit any third-party voice, likeness, name, image, persona, performance, recording, copyrighted work, or other protected content. You are solely responsible for obtaining all rights, licenses, consents, and permissions required for any reference audio, speaker embedding, prompt, input, output, and any downstream use thereof. Nothing in the Higgs Materials or any output thereof shall be construed as a representation or warranty that any voice, likeness, recording, copyrighted work, or other content embedded in or generated by the Higgs Materials is cleared for Your intended use.
c. Representations and Warranties.
You represent and warrant that: (i) You are of the age required by applicable law to enter into this Agreement and, if acting on behalf of an entity, are authorized to bind that entity; (ii) neither You nor any party that owns or controls You is listed on any U.S. or other applicable government list of prohibited or restricted parties, and You are not located in, organized under the laws of, or ordinarily resident in any country or territory subject to comprehensive U.S. trade sanctions; (iii) You will not knowingly make the Higgs Materials or any Derivative Work available to any party listed on any U.S. or other applicable government list of prohibited or restricted parties, and will use commercially reasonable measures to prevent such access by parties You know or reasonably should know to be so listed; (iv) You will not use the Higgs Materials in connection with the design, development, production, or use of nuclear, chemical, or biological weapons, or missile technology; and (v) all information You provide to Boson in connection with this Agreement is true, accurate, and complete.
d. Intellectual Property.
(i) Trademark License. No trademark licenses are granted under this Agreement. Neither Boson nor Licensee may use any name or mark owned by, or associated with, the other party or any of its Affiliates, except as required to comply with Section IV(a). Boson hereby grants You a limited, non-exclusive, non-transferable license to use "Higgs Audio v3" (the "Mark") solely as a descriptive reference required to comply with the attribution and notice obligations in Sections IV(a)(ii)-(iv), and not as the leading or primary name of any product, model, or service. All goodwill arising out of Your use of the Mark inures to the benefit of Boson.
(ii) Ownership of Derivative Works. Subject to Boson's ownership of the Higgs Materials and any Derivative Works made by or for Boson, and subject to the restrictions in this Agreement, as between You and Boson, You are the owner of any Derivative Works You create.
(iii) Ownership of Outputs. As between You and Boson, You own the outputs generated from the Higgs Materials or Derivative Works to the extent permitted by applicable law, provided that Boson grants no rights in any output generated in violation of this Agreement. Boson makes no representation that any output is non-infringing or fit for any purpose, and You are solely responsible for Your use of any output.
(iv) Disputes. If You or Your Affiliate institutes litigation or other proceedings against Boson or any of Boson's licensees, customers, or end users (including a cross-claim or counterclaim in a lawsuit) alleging that the Higgs Materials, Derivative Works, or any output or results thereof, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by You, then any licenses granted to You under this Agreement shall terminate automatically as of the date such litigation or claim is filed or instituted. You will indemnify, defend, and hold harmless Boson and its Affiliates, officers, directors, employees, and agents from and against any claim, charge, demand, loss, or cause of action by any third party arising out of or related to Your use or distribution of the Higgs Materials or Derivative Works, Your outputs, or Your breach of this Agreement.
(v) Feedback. From time to time, You may provide Boson with verbal or written suggestions, comments, ideas, or other feedback related to Boson's existing or prospective technology, products, or services (collectively, "Feedback"). You are not obligated to provide Feedback, but to the extent You do, You hereby grant Boson a perpetual, irrevocable, royalty-free, fully paid-up, sublicensable, transferable, non-exclusive, worldwide right and license to use, reproduce, distribute, modify, and otherwise exploit the Feedback in any manner without restriction. All Feedback is provided "AS IS" and You make no warranties regarding any Feedback.
e. Audit and Reporting.
This Section IV(e) applies only to Licensees that (i) distribute or make available the Higgs Materials or any Derivative Work to third parties, (ii) make the Higgs Materials or any Derivative Work available as part of a hosted demonstration, service, or product, or (iii) Boson reasonably believes are using the Higgs Materials for a Commercial Purpose without a commercial license. It does not apply to individual hobbyist or personal Non-Commercial use.
For Licensees within the scope above, Boson reserves the right, no more than once per calendar year and upon reasonable prior written notice, to request a written report describing Your use of the Higgs Materials and Derivative Works, including the purposes of such use and the approximate number of users or end users (if any). You shall respond to any such request within thirty (30) days. If, following such request, Boson reasonably believes You are using the Higgs Materials for a Commercial Purpose without a commercial license, Boson may require You to either (i) cease such use, or (ii) enter into a commercial license on then-current terms.
f. Disclaimer of Warranty.
UNLESS REQUIRED BY APPLICABLE LAW, THE HIGGS MATERIALS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN "AS IS" BASIS, WITH ALL FAULTS, WITHOUT WARRANTIES OF ANY KIND, WHETHER EXPRESS, IMPLIED, STATUTORY, OR ARISING FROM COURSE OF DEALING OR USAGE OF TRADE. BOSON DISCLAIMS ALL WARRANTIES, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, ACCURACY, AND ANY WARRANTY THAT THE HIGGS MATERIALS WILL BE UNINTERRUPTED OR ERROR-FREE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS AND LAWFULNESS OF USING OR REDISTRIBUTING THE HIGGS MATERIALS AND ASSUME ANY AND ALL RISKS ASSOCIATED WITH SUCH USE.
g. Limitation of Liability.
IN NO EVENT WILL BOSON OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT (INCLUDING NEGLIGENCE), STRICT LIABILITY, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF OR RELATING TO THIS AGREEMENT OR THE HIGGS MATERIALS, FOR ANY LOST PROFITS, LOST REVENUE, LOSS OF DATA, COST OF SUBSTITUTE GOODS, OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY, OR PUNITIVE DAMAGES, EVEN IF BOSON OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING. IN ANY EVENT, BOSON'S AGGREGATE LIABILITY ARISING OUT OF OR RELATING TO THIS AGREEMENT SHALL NOT EXCEED ONE HUNDRED U.S. DOLLARS (US$100).
h. Term and Termination.
The term of this Agreement commences upon Your acceptance of this Agreement or first access to the Higgs Materials and continues until terminated. Boson may terminate this Agreement upon Your breach of any term of this Agreement, Your violation of the Acceptable Use Policy, or Your violation of applicable law, in each case by providing You written notice (which may include notice posted at https://boson.ai or in the Higgs Materials repository). For material breaches that are capable of cure, Boson will provide You a reasonable opportunity to cure (not less than thirty (30) days) before terminating, except for breaches of Section IV(b) or violations of applicable law, which Boson may treat as non-curable. This Agreement terminates automatically and without notice upon the events specified in Section IV(d)(iv).
Boson may, at any time, discontinue making the Higgs Materials available prospectively (including by removing them from distribution channels), but such prospective discontinuation does not, by itself, terminate the rights already granted to Licensees who are then in compliance with this Agreement.
Upon termination of this Agreement, You shall (i) immediately cease all use of the Higgs Materials and Derivative Works, (ii) delete all copies of the Higgs Materials and Derivative Works from any computer, server, or storage medium under Your control, and (iii) certify such deletion in writing to Boson upon request. Sections III, IV(a)(ii), IV(b) (with respect to any prior use, any outputs generated before termination, and the No Third-Party Rights Granted paragraph), IV(c), IV(d), IV(f), IV(g), IV(h), and IV(i) shall survive any termination or expiration of this Agreement.
i. Governing Law and Jurisdiction.
This Agreement is governed by and construed in accordance with the laws of the State of California, without regard to its choice-of-law principles. The UN Convention on Contracts for the International Sale of Goods does not apply. The federal courts located in the Northern District of California and the state courts located in Santa Clara County, California shall have exclusive jurisdiction of any dispute arising out of or relating to this Agreement, and You consent to personal jurisdiction in those courts and waive any objection to venue.
j. Miscellaneous.
This Agreement, together with the Acceptable Use Policy and any NOTICE file accompanying the Higgs Materials, constitutes the entire agreement between the parties regarding the subject matter hereof and supersedes any prior or contemporaneous understandings. No waiver of any term shall be effective unless in writing and signed by Boson. If any provision of this Agreement is held unenforceable, the remainder shall remain in full force. You may not assign or transfer this Agreement, in whole or in part, without Boson's prior written consent; any attempted assignment in violation of this section is void. Boson may assign this Agreement freely.
V. DEFINITIONS
"Acceptable Use Policy" means the Boson AI Acceptable Use Policy available at https://boson.ai/acceptable-use, as updated by Boson from time to time, subject to the prospective-update limitation in Section IV(b).
"Affiliate" means, with respect to a party, any entity that directly or indirectly controls, is controlled by, or is under common control with that party, where "control" means direct or indirect ownership of more than fifty percent (50%) of the voting interests of the subject entity.
"Agreement" means this Boson Higgs Audio v3 Research and Non-Commercial License Agreement, including the incorporated Acceptable Use Policy.
"Boson" or "we" or "us" means Boson AI USA, Inc.
"Commercial Purpose" — see Section III.
"Derivative Work" means (a) any derivative work of the Higgs Materials as recognized under U.S. copyright law and (b) any modification to a Model, and any other model created or trained based on or derived from a Model or a Model's outputs, including without limitation fine-tuned models, distilled models, quantized models, converted or format-transformed models, and low-rank adaptation ("LoRA") weights. For clarity, (i) the outputs of a Model are not themselves Derivative Works; (ii) evaluation reports, benchmark results, academic papers, and other non-model analytical materials created from outputs are not Derivative Works, but in each case remain subject to the use restrictions in this Agreement; and (iii) quantized, converted, format-transformed, or optimized versions of a Model are Derivative Works and may be distributed only under this Agreement and only for a Research Purpose or Non-Commercial Purpose.
"Documentation" means any specifications, manuals, technical reports, model cards, and other written materials provided by Boson related to the Higgs Materials.
"Higgs Audio v3" means the Boson Higgs Audio v3 foundational audio language model, including model architecture, trained model weights, inference code, training code, fine-tuning code, audio tokenizer, and related elements, in each case as developed by Boson and made available at https://github.com/boson-ai/higgs-audio, https://huggingface.co/bosonai/, or otherwise.
"Higgs Materials" means, collectively, Higgs Audio v3, the Documentation, and any other materials made available by Boson under this Agreement.
"Licensee" or "You" or "Your" means the individual or entity exercising rights under this Agreement.
"Mark" — see Section IV(d)(i).
"Model" means Higgs Audio v3 and any other machine-learning model included in the Higgs Materials.
"Non-Commercial Purpose" — see Section II.
"Research Purpose" — see Section II.
"Trade Control Laws" means all applicable U.S. and non-U.S. export control, economic sanctions, and trade laws and regulations, including those administered by the U.S. Department of Commerce Bureau of Industry and Security, the U.S. Department of the Treasury Office of Foreign Assets Control, and the U.S. Department of State.
README.md
ADDED
@@ -0,0 +1,488 @@
---
license: other
license_name: boson-higgs-audio-v3-research-and-non-commercial-license
license_link: LICENSE
language:
- af
- ar
- as
- ast
- az
- ba
- be
- bg
- bn
- bs
- ca
- ceb
- ckb
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fr
- ga
- gl
- gu
- ha
- he
- hi
- hr
- ht
- hu
- hy
- id
- is
- it
- jv
- ka
- kab
- kam
- kea
- kk
- kn
- ko
- ky
- la
- lb
- lg
- ln
- lt
- luo
- lv
- mhr
- mi
- mk
- ml
- mn
- mr
- ms
- mt
- ne
- nl
- no
- nso
- ny
- oc
- om
- pa
- pl
- ps
- pt
- ro
- ru
- rw
- sd
- sk
- sl
- sn
- so
- sq
- sr
- sv
- sw
- ta
- te
- tg
- tl
- tr
- ug
- uk
- umb
- ur
- uz
- vi
- xh
- zh
- zu
tags:
- text-to-speech
- speech-generation
- voice-agent
- expressive-speech
- controllable-tts
- multilingual-tts
pipeline_tag: text-to-speech
library_name: transformers
---
|
| 116 |
+
|
| 117 |
+
# Higgs Audio v3 TTS
|
| 118 |
+
|
| 119 |
+
<div align="left" style="display: flex; justify-content: flex-start; margin-top: 10px;"> <a href="https://www.boson.ai/blog/higgs-audio-v3-tts"> <img src="https://img.shields.io/badge/💚-Boson%20Blog-228B22" style="margin-right: 5px;"> </a> </div>
|
| 120 |
+
|
| 121 |
+
Higgs Audio v3 TTS is built for voice chat: it **speaks, not just reads**. It turns model responses into expressive conversational speech across **100 languages**, with **zero-shot voice cloning** and **inline control** over emotion, style, prosody, pauses, and sound effects.
|
| 122 |
+
|
| 123 |
+
> [!TIP]
|
| 124 |
+
> Released for research and non-commercial use under the **Boson Higgs Audio v3 Research and Non-Commercial License**. Production, hosted APIs, or revenue-generating use requires a separate commercial license. Prohibited: voice cloning without consent, impersonation, fraud, election deception, biometric surveillance, or any unlawful use.
|
| 125 |
+
|
| 126 |
+

|
| 127 |
+
|
| 128 |
+
The Higgs autoregressive decoder consumes interleaved text and audio tokens. Audio is encoded by the **Higgs Tokenizer** into 8 codebooks at 25 fps, staggered via a **delay pattern**, then mapped to backbone hidden states through a **multi-codebook fused embedding**. Output codes pass through a **multi-codebook fused head**, are de-delayed, and are decoded back to a waveform.
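To make the delay pattern and the de-delay step concrete, here is a minimal sketch. It assumes the common scheme in which codebook `k` is shifted right by `k` frames. The README does not state the actual offsets or padding codes, so `PAD`, `apply_delay`, and `remove_delay` are hypothetical names used only for illustration.

```python
PAD = -1  # hypothetical filler id, not the model's real padding code


def apply_delay(codes, pad=PAD):
    """Shift codebook k right by k frames (assumed: one frame of delay per codebook)."""
    num_codebooks = len(codes)
    return [
        [pad] * k + list(row) + [pad] * (num_codebooks - 1 - k)
        for k, row in enumerate(codes)
    ]


def remove_delay(delayed):
    """Undo apply_delay and recover the frame-aligned codes."""
    num_codebooks = len(delayed)
    num_frames = len(delayed[0]) - (num_codebooks - 1)
    return [row[k:k + num_frames] for k, row in enumerate(delayed)]


# Toy input: 3 codebooks x 3 frames (the model itself uses 8 codebooks).
frames = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
delayed = apply_delay(frames)
for row in delayed:
    print(row)
```

Under this assumption, each delayed column holds codes that belong to different frames, which is why the generated codes have to be de-delayed before they are passed to the decoder.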
|
| 129 |
+
|
| 130 |
+
| Component | Spec |
|---|---|
| Backbone | ~4B autoregressive decoder (36 L, hidden=2560, GQA 32/8) |
| Multi-codebook embedding / head | Fused single-tensor, tied with text embedding |
| Context length | 8,192 tokens (training sequence length) |
| Audio tokens | 8 codebooks × 1026 vocab, delay pattern |
| Sample rate | 24 kHz |
| Frame rate | 25 fps (40 ms / frame) |
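The figures in the table imply some simple arithmetic, sketched below using only the sample rate, frame rate, and codebook count listed above. The helper names are hypothetical. How frames map onto positions in the 8,192-token context is not stated here, so the sketch does not try to estimate a maximum audio duration.

```python
import math

SAMPLE_RATE_HZ = 24_000  # from the spec table
FRAME_RATE_FPS = 25      # from the spec table
NUM_CODEBOOKS = 8        # from the spec table


def samples_per_frame():
    """Waveform samples covered by one 40 ms audio frame."""
    return SAMPLE_RATE_HZ // FRAME_RATE_FPS


def frames_for(seconds):
    """Number of audio frames needed to cover a duration (rounded up)."""
    return math.ceil(seconds * FRAME_RATE_FPS)


def codes_for(seconds):
    """Total codebook entries for a duration: one code per codebook per frame."""
    return frames_for(seconds) * NUM_CODEBOOKS


print(samples_per_frame(), frames_for(10), codes_for(10))
```

For a 10-second clip this gives 250 frames and 2,000 individual codes, with each frame covering 960 samples.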
|
| 138 |
+
|
| 139 |
+
## Supported Languages
|
| 140 |
+
|
| 141 |
+
The model reaches **single-digit WER/CER on 100 languages**, which split into two tiers.
|
| 142 |
+
|
| 143 |
+
### WER/CER under 5 — polished, production-quality (83)
|
| 144 |
+
|
| 145 |
+
🇿🇦 Afrikaans · 🇸🇦🇪🇬 Arabic · 🇦🇲 Armenian · 🇮🇳 Assamese · 🇪🇸 Asturian · 🇦🇿 Azerbaijani · 🇷🇺 Bashkir · 🇪🇸 Basque · 🇧🇾 Belarusian · 🇧🇩🇮🇳 Bengali · 🇧🇦 Bosnian · 🇧🇬 Bulgarian · 🇪🇸 Catalan · 🇵🇭 Cebuano · 🇮🇶 Central Kurdish · 🇨🇳 Chinese · 🇭🇷 Croatian · 🇨🇿 Czech · 🇩🇰 Danish · 🇳🇱🇧🇪 Dutch · 🇷🇺 Eastern Mari · 🇺🇸🇬🇧🇦🇺 English · 🌐 Esperanto · 🇪🇪 Estonian · 🇫🇮 Finnish · 🇫🇷🇨🇦 French · 🇪🇸 Galician · 🇬🇪 Georgian · 🇩🇪🇦🇹 German · 🇬🇷 Greek · 🇮🇳 Gujarati · 🇭🇹 Haitian Creole · 🇳🇬 Hausa · 🇮🇱 Hebrew · 🇮🇳 Hindi · 🇭🇺 Hungarian · 🇮🇩 Indonesian · 🇮🇹 Italian · 🇮🇩 Javanese · 🇮🇳 Kannada · 🇰🇿 Kazakh · 🇷🇼 Kinyarwanda · 🇰🇬 Kyrgyz · 🇱🇻 Latvian · 🇨🇩 Lingala · 🇱🇹 Lithuanian · 🇰🇪 Luo · 🇲🇰 Macedonian · 🇲🇾🇮🇩 Malay · 🇮🇳 Malayalam · 🇲🇹 Maltese · 🇳🇿 Māori · 🇮🇳 Marathi · 🇲🇳 Mongolian · 🇳🇵 Nepali · 🇳🇴 Norwegian · 🇫🇷 Occitan · 🇮🇷🇦🇫 Persian · 🇵🇱 Polish · 🇵🇹🇧🇷 Portuguese · 🇷🇴 Romanian · 🇷🇺 Russian · 🇿🇦 Sepedi · 🇷🇸 Serbian · 🇿🇼 Shona · 🇸🇰 Slovak · 🇸🇮 Slovene · 🇪🇸🇲🇽 Spanish · 🇹🇿🇰🇪 Swahili · 🇸🇪 Swedish · 🇵🇭 Tagalog · 🇹🇯 Tajik · 🇮🇳🇱🇰 Tamil · 🇮🇳 Telugu · 🇹🇷 Turkish · 🇺🇦 Ukrainian · 🇵🇰🇮🇳 Urdu · 🇨🇳 Uyghur · 🇺🇿 Uzbek · 🇻🇳 Vietnamese · 🇿🇦 Xhosa · 🇿🇦 Zulu · 🇰🇷 Korean
|
| 146 |
+
|
| 147 |
+
### WER/CER between 5 and 10 — usable, but less polished (17)
|
| 148 |
+
|
| 149 |
+
🇦🇱 Albanian · 🇲🇼🇿🇲 Chichewa/Nyanja · 🇮🇳🇵🇰 Eastern Punjabi · 🇺🇬 Ganda · 🇮🇸 Icelandic · 🇮🇪 Irish · 🇩🇿 Kabyle · 🇨🇻 Kabuverdianu · 🇰🇪 Kamba · 🇻🇦 Latin · 🇱🇺 Luxembourgish · 🇪🇹🇰🇪 Oromo · 🇦🇫🇵🇰 Pashto · 🇵🇰🇮🇳 Sindhi · 🇸🇴 Somali · 🇦🇴 Umbundu · 🇬🇧 Welsh
|
| 150 |
+
|
| 151 |
+
## Control Tokens
|
| 152 |
+
|
| 153 |
+
All tags follow `<|category:value|>` syntax and can be inserted mid-utterance.
|
| 154 |
+
|
| 155 |
+
- **Emotion**: `elation`, `amusement`, `enthusiasm`, `determination`, `pride`, `contentment`, `affection`, `relief`, `contemplation`, `confusion`, `surprise`, `awe`, `longing`, `arousal`, `anger`, `fear`, `disgust`, `bitterness`, `sadness`, `shame`, `helplessness`
- **Style**: `singing`, `shouting`, `whispering`
- **Sound effects**: `cough`, `laughter`, `crying`, `screaming`, `burping`, `humming`, `sigh`, `sniff`, `sneeze`
- **Prosody**
  - Speed: `speed_very_slow` (≈0.65×), `speed_slow` (≈0.85×), `speed_fast` (≈1.2×), `speed_very_fast` (≈1.4×)
  - Pauses: `pause` (≈400–700 ms), `long_pause` (≈700–1500 ms)
  - Pitch: `pitch_low` (≈−3 st), `pitch_high` (≈+2.5 st)
  - Delivery: `expressive_high`, `expressive_low`
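A small client-side check can catch misspelled tags before a request is sent. The sketch below is not part of any Higgs or SGLang-Omni API. `CONTROL_VALUES`, `tag`, and `find_unknown_tags` are hypothetical helpers, and the category prefixes (`emotion`, `style`, `sfx`, `prosody`) are taken from the quick-start examples in this README.

```python
import re

# Values copied from the list above; category prefixes follow the quick-start examples.
CONTROL_VALUES = {
    "emotion": {
        "elation", "amusement", "enthusiasm", "determination", "pride",
        "contentment", "affection", "relief", "contemplation", "confusion",
        "surprise", "awe", "longing", "arousal", "anger", "fear", "disgust",
        "bitterness", "sadness", "shame", "helplessness",
    },
    "style": {"singing", "shouting", "whispering"},
    "sfx": {
        "cough", "laughter", "crying", "screaming", "burping",
        "humming", "sigh", "sniff", "sneeze",
    },
    "prosody": {
        "speed_very_slow", "speed_slow", "speed_fast", "speed_very_fast",
        "pause", "long_pause", "pitch_low", "pitch_high",
        "expressive_high", "expressive_low",
    },
}

TAG_RE = re.compile(r"<\|([a-z_]+):([a-z_]+)\|>")


def tag(category, value):
    """Format one control tag, rejecting values not listed in the README."""
    if value not in CONTROL_VALUES.get(category, set()):
        raise ValueError(f"unknown control tag {category}:{value}")
    return f"<|{category}:{value}|>"


def find_unknown_tags(text):
    """Return (category, value) pairs in text that are not in the known vocabulary."""
    return [
        (category, value)
        for category, value in TAG_RE.findall(text)
        if value not in CONTROL_VALUES.get(category, set())
    ]


print(tag("emotion", "amusement"))
print(find_unknown_tags("<|sfx:laughter|>Haha <|sfx:yawn|>Aah"))
```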
|
| 163 |
+
|
| 164 |
+
## Evaluation Benchmarks
|
| 165 |
+
|
| 166 |
+
### Multilingual Voice Clone
|
| 167 |
+
|
| 168 |
+
We evaluate Higgs Audio v3 TTS on public multilingual TTS suites and our internal 111-language Higgs-Multilingual set, covering both common and lower-resource languages.
|
| 169 |
+
|
| 170 |
+
WER / CER (↓, ×100) macro-averaged across each benchmark's language set. Lower is better; **bold** marks the best per row. All numbers are reproducible end-to-end with original metrics and normalization.
|
| 171 |
+
|
| 172 |
+
<div style="font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;width:100%;padding:16px 0;overflow-x:auto;-webkit-overflow-scrolling:touch">
|
| 173 |
+
<table style="width:100%;border-collapse:collapse;font-size:clamp(11px,1.4vw,13px)">
|
| 174 |
+
<thead><tr>
|
| 175 |
+
<th style="padding:18px 24px;text-align:center;font-weight:600;border-bottom:2px solid #16a34a;color:#16a34a">Benchmark</th>
|
| 176 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #16a34a;color:#16a34a;font-size:14px">Higgs Audio v2</th>
|
| 177 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #16a34a;color:#16a34a;font-size:14px">Higgs Audio v3</th>
|
| 178 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #16a34a;color:#16a34a;font-size:14px">Fish Audio S2 Pro</th>
|
| 179 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #16a34a;color:#16a34a;font-size:14px">Qwen3-TTS-1.7B</th>
|
| 180 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #16a34a;color:#16a34a;font-size:14px">VibeVoice-7B</th>
|
| 181 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #16a34a;color:#16a34a;font-size:14px">IndexTTS-2</th>
|
| 182 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #16a34a;color:#16a34a;font-size:14px">MiMo-Audio-7B-Instruct</th>
|
| 183 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #16a34a;color:#16a34a;font-size:14px">MOSS-TTS-v1.5</th>
|
| 184 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #16a34a;color:#16a34a;font-size:14px">OmniVoice</th>
|
| 185 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #16a34a;color:#16a34a;font-size:14px">ChatterBox</th>
|
| 186 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #16a34a;color:#16a34a;font-size:14px">FireRedTTS-2</th>
|
| 187 |
+
</tr></thead>
|
| 188 |
+
<tbody>
|
| 189 |
+
<tr>
|
| 190 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">SeedTTS</td>
|
| 191 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">2.10</td>
|
| 192 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-weight:600">1.11</td>
|
| 193 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">1.31</td>
|
| 194 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">1.30</td>
|
| 195 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">3.59</td>
|
| 196 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">1.63</td>
|
| 197 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">3.70</td>
|
| 198 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">1.73</td>
|
| 199 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">1.21</td>
|
| 200 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">17.00</td>
|
| 201 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">1.72</td>
|
| 202 |
+
</tr>
|
| 203 |
+
<tr>
|
| 204 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">CV3</td>
|
| 205 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">21.19</td>
|
| 206 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-weight:600">4.41</td>
|
| 207 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">4.60</td>
|
| 208 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">7.73</td>
|
| 209 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">11.66</td>
|
| 210 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">129.26</td>
|
| 211 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">71.55</td>
|
| 212 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">6.11</td>
|
| 213 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">4.92</td>
|
| 214 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">32.62</td>
|
| 215 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">19.20</td>
|
| 216 |
+
</tr>
|
| 217 |
+
<tr>
|
| 218 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">MiniMax-Multilingual</td>
|
| 219 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">49.86</td>
|
| 220 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-weight:600">2.74</td>
|
| 221 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">5.15</td>
|
| 222 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">27.41</td>
|
| 223 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">8.21</td>
|
| 224 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">112.91</td>
|
| 225 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">85.67</td>
|
| 226 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">3.78</td>
|
| 227 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">2.98</td>
|
| 228 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">49.30</td>
|
| 229 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">12.52</td>
|
| 230 |
+
</tr>
|
| 231 |
+
<tr>
|
| 232 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">Higgs-Multilingual</td>
|
| 233 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">52.24</td>
|
| 234 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-weight:600">3.61</td>
|
| 235 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">8.68</td>
|
| 236 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">97.09</td>
|
| 237 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">13.74</td>
|
| 238 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">57.71</td>
|
| 239 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">59.61</td>
|
| 240 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">21.28</td>
|
| 241 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">3.63</td>
|
| 242 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">57.52</td>
|
| 243 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">33.69</td>
|
| 244 |
+
</tr>
|
| 245 |
+
</tbody>
|
| 246 |
+
</table>
|
| 247 |
+
</div>
|
| 248 |
+
|
| 249 |
+
### Emergent TTS
|
| 250 |
+
|
| 251 |
+
Win-rate (↑) per category — judge preference vs the BASELINE row; **bold** marks the highest win-rate per column. For a fair comparison, every model shares the same reference audio per prompt, and we run the benchmark text verbatim — no inline control tags inserted.
|
| 252 |
+
|
| 253 |
+
<div style="font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;width:100%;padding:16px 0;overflow-x:auto;-webkit-overflow-scrolling:touch">
|
| 254 |
+
<table style="width:100%;border-collapse:collapse;font-size:clamp(11px,1.4vw,13px)">
|
| 255 |
+
<thead><tr>
|
| 256 |
+
<th style="padding:18px 24px;text-align:center;font-weight:600;border-bottom:2px solid #16a34a;color:#16a34a">Model</th>
|
| 257 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #16a34a;color:#16a34a;font-size:14px">Overall ↑</th>
|
| 258 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #16a34a;color:#16a34a;font-size:14px">Emotions ↑</th>
|
| 259 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #16a34a;color:#16a34a;font-size:14px">Foreign Words ↑</th>
|
| 260 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #16a34a;color:#16a34a;font-size:14px">Paralinguistics ↑</th>
|
| 261 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #16a34a;color:#16a34a;font-size:14px">Complex Pronunciation ↑</th>
|
| 262 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #16a34a;color:#16a34a;font-size:14px">Questions ↑</th>
|
| 263 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #16a34a;color:#16a34a;font-size:14px">Syntactic Complexity ↑</th>
|
| 264 |
+
</tr></thead>
|
| 265 |
+
<tbody>
|
| 266 |
+
<tr>
|
| 267 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">Higgs Audio v3</td>
|
| 268 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-weight:600">53.65%</td>
|
| 269 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">53.75%</td>
|
| 270 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-weight:600">48.75%</td>
|
| 271 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-weight:600">68.57%</td>
|
| 272 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">25.10%</td>
|
| 273 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-weight:600">61.43%</td>
|
| 274 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-weight:600">60.71%</td>
|
| 275 |
+
</tr>
|
| 276 |
+
<tr>
|
| 277 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">Fish Audio S2 Pro</td>
|
| 278 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">43.80%</td>
|
| 279 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">53.04%</td>
|
| 280 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">33.93%</td>
|
| 281 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">53.75%</td>
|
| 282 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">18.16%</td>
|
| 283 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">55.00%</td>
|
| 284 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">45.71%</td>
|
| 285 |
+
</tr>
|
| 286 |
+
<tr>
|
| 287 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">Qwen3-TTS-1.7B</td>
|
| 288 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">38.84%</td>
|
| 289 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">45.54%</td>
|
| 290 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">24.64%</td>
|
| 291 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">44.29%</td>
|
| 292 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-weight:600">30.00%</td>
|
| 293 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">53.39%</td>
|
| 294 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">34.11%</td>
|
| 295 |
+
</tr>
|
| 296 |
+
<tr>
|
| 297 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">OmniVoice</td>
|
| 298 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">40.82%</td>
|
| 299 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-weight:600">61.07%</td>
|
| 300 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">28.75%</td>
|
| 301 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">52.68%</td>
|
| 302 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">13.67%</td>
|
| 303 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">45.00%</td>
|
| 304 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">40.36%</td>
|
| 305 |
+
</tr>
|
| 306 |
+
</tbody>
|
| 307 |
+
</table>
|
| 308 |
+
</div>
|
| 309 |
+
|
| 310 |
+
## Usage
|
| 311 |
+
|
| 312 |
+
### SGLang Usage
|
| 313 |
+
|
| 314 |
+
Pair the weights in this repo with [**SGLang-Omni**](https://github.com/sgl-project/sglang-omni) — a production serving stack with continuous batching for multi-codebook decoding and the same inline tag controls. The Higgs TTS cookbook walks you through installation, server launch, request examples, and the full API reference.
|
| 315 |
+
|
| 316 |
+
See the [Higgs TTS cookbook](https://sgl-project.github.io/sglang-omni/cookbook/higgs_tts.html) for the full details.
|
| 317 |
+
|
| 318 |
+
|
| 319 |
+
#### Throughput
|
| 320 |
+
|
| 321 |
+
Throughput on Seed-TTS EN (full set, **N=1088** per run). Client `--max-concurrency` sweep against a Higgs server (`max_running_requests=16`, bf16, CUDA Graph on). Each row is the **mean of 3 runs**. Hardware: **1× H100**.
|
| 322 |
+
|
| 323 |
+
<div style="font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;width:100%;padding:16px 0;overflow-x:auto;-webkit-overflow-scrolling:touch">
|
| 324 |
+
<table style="width:100%;border-collapse:collapse;font-size:clamp(11px,1.4vw,13px)">
|
| 325 |
+
<thead><tr>
|
| 326 |
+
<th style="padding:18px 24px;text-align:center;font-weight:600;border-bottom:2px solid #ea580c;color:#ea580c">Concurrency</th>
|
| 327 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #ea580c;color:#ea580c;font-size:14px">Throughput (req/s)</th>
|
| 328 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #ea580c;color:#ea580c;font-size:14px">Mean latency</th>
|
| 329 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #ea580c;color:#ea580c;font-size:14px">RTF (per-req)</th>
|
| 330 |
+
<th style="padding:18px 24px;text-align:center;font-weight:500;white-space:nowrap;border-bottom:2px solid #ea580c;color:#ea580c;font-size:14px">audio_s/s</th>
|
| 331 |
+
</tr></thead>
|
| 332 |
+
<tbody>
|
| 333 |
+
<tr>
|
| 334 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">1</td>
|
| 335 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">1.62</td>
|
| 336 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">617 ms</td>
|
| 337 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">0.147</td>
|
| 338 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">6.89</td>
|
| 339 |
+
</tr>
|
| 340 |
+
<tr>
|
| 341 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">2</td>
|
| 342 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">2.70</td>
|
| 343 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">742 ms</td>
|
| 344 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">0.180</td>
|
| 345 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">11.37</td>
|
| 346 |
+
</tr>
|
| 347 |
+
<tr>
|
| 348 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">4</td>
|
| 349 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">5.45</td>
|
| 350 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">733 ms</td>
|
| 351 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">0.177</td>
|
| 352 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">22.84</td>
|
| 353 |
+
</tr>
|
| 354 |
+
<tr>
|
| 355 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">8</td>
|
| 356 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">8.91</td>
|
| 357 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">898 ms</td>
|
| 358 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">0.217</td>
|
| 359 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">37.38</td>
|
| 360 |
+
</tr>
|
| 361 |
+
<tr>
|
| 362 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">16</td>
|
| 363 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">14.74</td>
|
| 364 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">1079 ms</td>
|
| 365 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">0.262</td>
|
| 366 |
+
<td style="padding:16px 24px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">61.84</td>
|
| 367 |
+
</tr>
|
| 368 |
+
</tbody>
|
| 369 |
+
</table>
|
| 370 |
+
</div>
|
| 371 |
+
|
| 372 |
+
- **Concurrency** — Maximum number of in-flight client requests (`--max-concurrency`).
|
| 373 |
+
- **Throughput (req/s)** — Completed requests divided by total benchmark wall-clock time.
|
| 374 |
+
- **Mean latency** — Average end-to-end time per request (send to full response received).
|
| 375 |
+
- **RTF (per-req)** — Average ratio of processing time to generated audio duration per request (<1 is faster than real time).
|
| 376 |
+
- **audio_s/s** — Total seconds of audio produced divided by total benchmark wall-clock time.
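The metric definitions above can be written as a short calculation. This is a sketch under stated assumptions, not the benchmark script itself. `RequestRecord` and `summarize` are hypothetical names, and end-to-end latency stands in for "processing time" when computing RTF.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    latency_s: float  # send to full response received
    audio_s: float    # duration of the generated audio


def summarize(records, wall_clock_s):
    """Aggregate per-request records into the four reported metrics."""
    n = len(records)
    return {
        "throughput_req_s": n / wall_clock_s,
        "mean_latency_ms": 1000 * sum(r.latency_s for r in records) / n,
        # Assumption: latency is used as the per-request processing time.
        "rtf_per_req": sum(r.latency_s / r.audio_s for r in records) / n,
        "audio_s_per_s": sum(r.audio_s for r in records) / wall_clock_s,
    }


# Made-up records for illustration only; these are not benchmark data.
stats = summarize(
    [RequestRecord(latency_s=1.0, audio_s=4.0), RequestRecord(latency_s=0.5, audio_s=5.0)],
    wall_clock_s=2.0,
)
print(stats)
```

As a sanity check on the table, at concurrency 1 the throughput should be roughly the inverse of the mean latency: 1 / 0.617 s ≈ 1.62 req/s, which matches the first row.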
|
| 377 |
+
|
| 378 |
+
To reproduce the results, follow the instructions in [this script](https://github.com/sgl-project/sglang-omni/blob/main/benchmarks/eval/benchmark_tts_seedtts.py).
|
| 379 |
+
|
| 380 |
+
<details>
|
| 381 |
+
<summary><b>Quick-start</b></summary>
|
| 382 |
+
|
| 383 |
+
#### Prerequisites
|
| 384 |
+
|
| 385 |
+
Install `sglang-omni` (see its [installation guide](https://github.com/sgl-project/sglang-omni#installation)), then pull the weights:
|
| 386 |
+
|
| 387 |
+
```bash
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx
hf download bosonai/higgs-audio-v3-tts-4b
```
|
| 391 |
+
|
| 392 |
+
#### Launch the server
|
| 393 |
+
|
| 394 |
+
Pipeline: `preprocessing → audio_encoder → tts_engine → vocoder`.
|
| 395 |
+
|
| 396 |
+
```bash
sgl-omni serve --model-path bosonai/higgs-audio-v3-tts-4b --port 8000
```
|
| 399 |
+
|
| 400 |
+
#### Zero-shot synthesis
|
| 401 |
+
|
| 402 |
+
```bash
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello, how are you?"}' \
  --output output.wav
```
|
| 408 |
+
|
| 409 |
+
#### Voice cloning
|
| 410 |
+
|
| 411 |
+
Supplying the reference transcript (`text`) materially improves cloning fidelity.
|
| 412 |
+
|
| 413 |
+
```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/audio/speech",
    json={
        "input": "Have a nice day and enjoy the Southern California sunshine.",
        # Reference clip to clone, plus its transcript.
        "references": [{
            "audio_path": "ref.wav",
            "text": "Hey, Adam here. Let's create something that feels real, sounds human, and connects every time.",
        }],
        "temperature": 0.8, "top_k": 50, "max_new_tokens": 1024,
    },
)
resp.raise_for_status()  # fail loudly instead of saving an error body as audio
with open("output.wav", "wb") as f:
    f.write(resp.content)
```
|
| 430 |
+
|
| 431 |
+
#### Streaming (Server-Sent Events)
|
| 432 |
+
|
| 433 |
+
Set `"stream": true` to receive base64-encoded WAV chunks as the vocoder emits them — sub-second time-to-first-audio. Each event carries `audio.data` (base64 WAV bytes); the terminal event has `finish_reason: "stop"` plus usage metadata.
|
| 434 |
+
|
| 435 |
+
```python
import base64
import json

import requests

with requests.post(
    "http://localhost:8000/v1/audio/speech",
    json={"input": "Get the trust fund to the bank early.", "stream": True},
    stream=True,
) as resp, open("output.wav", "wb") as f:
    for line in resp.iter_lines():
        # Skip keep-alives, non-data lines, and the end-of-stream sentinel.
        if not line or not line.startswith(b"data: ") or line == b"data: [DONE]":
            continue
        event = json.loads(line[6:])  # drop the "data: " prefix
        if event.get("finish_reason") == "stop":
            break
        audio = event.get("audio") or {}
        if audio.get("data"):
            f.write(base64.b64decode(audio["data"]))
```
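The streaming loop above appends each decoded chunk to one file. If every chunk is a complete WAV file with its own header (an assumption, since the README only says the chunks are base64 WAV bytes), some players may stop after the first header. The sketch below merges such chunks into one well-formed WAV using only the standard library. `merge_wav_chunks` is a hypothetical helper, not part of the server API.

```python
import io
import wave


def merge_wav_chunks(chunks):
    """Concatenate complete WAV byte strings into a single WAV byte string.

    Assumes every chunk shares the channel count, sample width, and sample
    rate of the first chunk.
    """
    out = io.BytesIO()
    writer = None
    for chunk in chunks:
        with wave.open(io.BytesIO(chunk), "rb") as reader:
            if writer is None:
                writer = wave.open(out, "wb")
                writer.setparams(reader.getparams())
            writer.writeframes(reader.readframes(reader.getnframes()))
    if writer is None:
        raise ValueError("no chunks to merge")
    writer.close()  # patches the header with the final frame count
    return out.getvalue()
```

To use it with the loop above, collect each `base64.b64decode(audio["data"])` result in a list and write `merge_wav_chunks(chunks)` to `output.wav` once the stream ends.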

#### Inline control tokens

Embed `<|emotion:…|>`, `<|style:…|>`, `<|prosody:…|>`, and `<|sfx:…|>` tokens directly in `input`. Two rules:

1. **Delivery tokens first.** Emotion, style, and the prosody *speed / pitch / expressive* tokens shape the whole turn, so put them at the start of `input`. Positional tokens (`<|prosody:pause|>`, `<|prosody:long_pause|>`, `<|sfx:…|>`) go inline exactly where they fire.
2. **Pair every `<|sfx:…|>` with its onomatopoeia.** For example, `<|sfx:laughter|>Haha`, `<|sfx:sigh|>Uh`, `<|sfx:sneeze|>Achoo`. The written sound gives the model the acoustic cue to realize the effect.

Example (amusement + laughter):

```bash
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "<|emotion:amusement|><|prosody:expressive_high|>Wait, wait, that was kind of hilarious. <|sfx:laughter|>Hehe, no, seriously, I was not ready for that."}' \
  --output output.wav
```
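
The two rules can also be applied programmatically. The sketch below assembles the same `input` string as the curl example; `sfx` and `build_input` are hypothetical helper names, and the emotion, style, prosody ordering of the leading tokens is an assumption (the rules only say delivery tokens go at the start).

```python
import json


def sfx(name, onomatopoeia):
    """Rule 2: pair a positional sfx token with its written sound."""
    return f"<|sfx:{name}|>{onomatopoeia}"


def build_input(text, emotion=None, style=None, prosody=None):
    """Rule 1: place delivery tokens ahead of the spoken text."""
    prefix = ""
    if emotion:
        prefix += f"<|emotion:{emotion}|>"
    if style:
        prefix += f"<|style:{style}|>"
    if prosody:
        prefix += f"<|prosody:{prosody}|>"
    return prefix + text


# The sfx token sits inline, exactly where the laugh should fire.
text = (
    "Wait, wait, that was kind of hilarious. "
    + sfx("laughter", "Hehe")
    + ", no, seriously, I was not ready for that."
)
payload = {"input": build_input(text, emotion="amusement", prosody="expressive_high")}
print(json.dumps(payload))
```

The resulting `payload` can be passed as the `json=` argument of the `requests.post` calls shown earlier.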
</details>

### API Usage

For zero-ops deployment, use the [**Boson AI API**](https://bosonai.mintlify.app/overview).

## Citation

```bibtex
@misc{bosonai_higgs_audio_tts_v3_2026,
  title = {Higgs Audio v3 TTS: Conversational Speech for Voice AI from Boson AI},
  author = {Boson AI},
  year = {2026},
  howpublished = {https://huggingface.co/boson-sglang/higgs-audio-v3-tts-4b},
}
```

## License

Boson Higgs Audio v3 Research and Non-Commercial License. See [LICENSE](./LICENSE).
|
assets/model_architecture.png
ADDED
|
chat_template.jinja
ADDED
|
@@ -0,0 +1,54 @@
{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0]['role'] == 'system' %}
        {{- messages[0]['content'] }}
    {%- else %}
        {{- 'You are a helpful assistant.' }}
    {%- endif %}
    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
    {%- if messages[0]['role'] == 'system' %}
        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
    {%- else %}
        {{- '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- for message in messages %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {{- '<|im_start|>' + message.role }}
        {%- if message.content %}
            {{- '\n' + message.content }}
        {%- endif %}
        {%- for tool_call in message.tool_calls %}
            {%- if tool_call.function is defined %}
                {%- set tool_call = tool_call.function %}
            {%- endif %}
            {{- '\n<tool_call>\n{"name": "' }}
            {{- tool_call.name }}
            {{- '", "arguments": ' }}
            {{- tool_call.arguments | tojson }}
            {{- '}\n</tool_call>' }}
        {%- endfor %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- message.content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
{%- endif %}
|
config.json
ADDED
|
@@ -0,0 +1,112 @@
{
  "_hidden_size": 2560,
  "_vocab_size": 151936,
  "architectures": [
    "HiggsMultimodalQwen3ForConditionalGeneration"
  ],
  "audio_encoder_config": {
    "_name_or_path": "",
    "architectures": null,
    "chunk_size_feed_forward": 0,
    "dtype": null,
    "encoder_type": "discrete",
    "id2label": {
      "0": "LABEL_0",
      "1": "LABEL_1"
    },
    "is_encoder_decoder": false,
    "label2id": {
      "LABEL_0": 0,
      "LABEL_1": 1
    },
    "max_chunk_size": 50,
    "mel_per_sample": 8,
    "model_type": "higgs_audio_encoder",
    "num_codebooks": 8,
    "out_dim": 2560,
    "output_attentions": false,
    "output_hidden_states": false,
    "problem_type": null,
    "qwen3_aut_config": null,
    "return_dict": true,
    "tie_word_embeddings": true,
    "use_delay_pattern": true,
    "vocab_size": 1026,
    "whisper_config": null
  },
  "audio_token_id": -100,
  "ignore_index": -100,
  "model_type": "higgs_multimodal_qwen3",
  "text_config": {
    "_name_or_path": "/ceph/models/Qwen3-4B-Base",
    "architectures": [
      "Qwen3ForCausalLM"
    ],
    "attention_bias": false,
    "attention_dropout": 0.0,
    "bos_token_id": 151643,
    "dtype": "bfloat16",
    "eos_token_id": 151643,
    "head_dim": 128,
    "hidden_act": "silu",
    "hidden_size": 2560,
    "initializer_range": 0.02,
    "intermediate_size": 9728,
    "layer_types": [
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention",
      "full_attention"
    ],
    "max_position_embeddings": 32768,
    "max_window_layers": 36,
    "model_type": "qwen3",
    "num_attention_heads": 32,
    "num_hidden_layers": 36,
    "num_key_value_heads": 8,
    "pad_token_id": null,
    "rms_norm_eps": 1e-06,
    "rope_parameters": {
      "rope_theta": 1000000,
      "rope_type": "default"
    },
    "sliding_window": null,
    "tie_word_embeddings": true,
    "use_cache": true,
    "use_sliding_window": false,
    "vocab_size": 151936
  },
  "transformers_version": "5.5.0"
}
|
model.safetensors
ADDED
|
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2f7965264c360b38180885006944aa16bd1de20f4e6cff79f6473bfcf8ae3d5a
size 9309834930
|
model.safetensors.index.json
ADDED
|
@@ -0,0 +1,934 @@
| 1 |
+
{
|
| 2 |
+
"metadata": {
|
| 3 |
+
"total_size": 8489763794
|
| 4 |
+
},
|
| 5 |
+
"weight_map": {
|
| 6 |
+
"body.layers.0.input_layernorm.weight": "model.safetensors",
|
| 7 |
+
"body.layers.0.mlp.down_proj.weight": "model.safetensors",
|
| 8 |
+
"body.layers.0.mlp.gate_proj.weight": "model.safetensors",
|
| 9 |
+
"body.layers.0.mlp.up_proj.weight": "model.safetensors",
|
| 10 |
+
"body.layers.0.post_attention_layernorm.weight": "model.safetensors",
|
| 11 |
+
"body.layers.0.self_attn.k_norm.weight": "model.safetensors",
|
| 12 |
+
"body.layers.0.self_attn.k_proj.weight": "model.safetensors",
|
| 13 |
+
"body.layers.0.self_attn.o_proj.weight": "model.safetensors",
|
| 14 |
+
"body.layers.0.self_attn.q_norm.weight": "model.safetensors",
|
| 15 |
+
"body.layers.0.self_attn.q_proj.weight": "model.safetensors",
|
| 16 |
+
"body.layers.0.self_attn.v_proj.weight": "model.safetensors",
|
| 17 |
+
"body.layers.1.input_layernorm.weight": "model.safetensors",
|
| 18 |
+
"body.layers.1.mlp.down_proj.weight": "model.safetensors",
|
| 19 |
+
"body.layers.1.mlp.gate_proj.weight": "model.safetensors",
|
| 20 |
+
"body.layers.1.mlp.up_proj.weight": "model.safetensors",
|
| 21 |
+
"body.layers.1.post_attention_layernorm.weight": "model.safetensors",
|
| 22 |
+
"body.layers.1.self_attn.k_norm.weight": "model.safetensors",
|
| 23 |
+
"body.layers.1.self_attn.k_proj.weight": "model.safetensors",
|
| 24 |
+
"body.layers.1.self_attn.o_proj.weight": "model.safetensors",
|
| 25 |
+
"body.layers.1.self_attn.q_norm.weight": "model.safetensors",
|
| 26 |
+
"body.layers.1.self_attn.q_proj.weight": "model.safetensors",
|
| 27 |
+
"body.layers.1.self_attn.v_proj.weight": "model.safetensors",
|
| 28 |
+
"body.layers.10.input_layernorm.weight": "model.safetensors",
|
| 29 |
+
"body.layers.10.mlp.down_proj.weight": "model.safetensors",
|
| 30 |
+
"body.layers.10.mlp.gate_proj.weight": "model.safetensors",
|
| 31 |
+
"body.layers.10.mlp.up_proj.weight": "model.safetensors",
|
| 32 |
+
"body.layers.10.post_attention_layernorm.weight": "model.safetensors",
|
| 33 |
+
"body.layers.10.self_attn.k_norm.weight": "model.safetensors",
|
| 34 |
+
"body.layers.10.self_attn.k_proj.weight": "model.safetensors",
|
| 35 |
+
"body.layers.10.self_attn.o_proj.weight": "model.safetensors",
|
| 36 |
+
"body.layers.10.self_attn.q_norm.weight": "model.safetensors",
|
| 37 |
+
"body.layers.10.self_attn.q_proj.weight": "model.safetensors",
|
| 38 |
+
"body.layers.10.self_attn.v_proj.weight": "model.safetensors",
|
| 39 |
+
"body.layers.11.input_layernorm.weight": "model.safetensors",
|
| 40 |
+
"body.layers.11.mlp.down_proj.weight": "model.safetensors",
|
| 41 |
+
"body.layers.11.mlp.gate_proj.weight": "model.safetensors",
|
| 42 |
+
"body.layers.11.mlp.up_proj.weight": "model.safetensors",
|
| 43 |
+
"body.layers.11.post_attention_layernorm.weight": "model.safetensors",
|
| 44 |
+
"body.layers.11.self_attn.k_norm.weight": "model.safetensors",
|
| 45 |
+
"body.layers.11.self_attn.k_proj.weight": "model.safetensors",
|
| 46 |
+
"body.layers.11.self_attn.o_proj.weight": "model.safetensors",
|
| 47 |
+
"body.layers.11.self_attn.q_norm.weight": "model.safetensors",
|
| 48 |
+
"body.layers.11.self_attn.q_proj.weight": "model.safetensors",
|
| 49 |
+
"body.layers.11.self_attn.v_proj.weight": "model.safetensors",
|
| 50 |
+
"body.layers.12.input_layernorm.weight": "model.safetensors",
|
| 51 |
+
"body.layers.12.mlp.down_proj.weight": "model.safetensors",
|
| 52 |
+
"body.layers.12.mlp.gate_proj.weight": "model.safetensors",
|
| 53 |
+
"body.layers.12.mlp.up_proj.weight": "model.safetensors",
|
| 54 |
+
"body.layers.12.post_attention_layernorm.weight": "model.safetensors",
|
| 55 |
+
"body.layers.12.self_attn.k_norm.weight": "model.safetensors",
|
| 56 |
+
"body.layers.12.self_attn.k_proj.weight": "model.safetensors",
|
| 57 |
+
"body.layers.12.self_attn.o_proj.weight": "model.safetensors",
|
| 58 |
+
"body.layers.12.self_attn.q_norm.weight": "model.safetensors",
|
| 59 |
+
"body.layers.12.self_attn.q_proj.weight": "model.safetensors",
|
| 60 |
+
"body.layers.12.self_attn.v_proj.weight": "model.safetensors",
|
| 61 |
+
"body.layers.13.input_layernorm.weight": "model.safetensors",
|
| 62 |
+
"body.layers.13.mlp.down_proj.weight": "model.safetensors",
|
| 63 |
+
"body.layers.13.mlp.gate_proj.weight": "model.safetensors",
|
| 64 |
+
"body.layers.13.mlp.up_proj.weight": "model.safetensors",
|
| 65 |
+
"body.layers.13.post_attention_layernorm.weight": "model.safetensors",
|
| 66 |
+
"body.layers.13.self_attn.k_norm.weight": "model.safetensors",
|
| 67 |
+
"body.layers.13.self_attn.k_proj.weight": "model.safetensors",
|
| 68 |
+
"body.layers.13.self_attn.o_proj.weight": "model.safetensors",
|
| 69 |
+
"body.layers.13.self_attn.q_norm.weight": "model.safetensors",
|
| 70 |
+
"body.layers.13.self_attn.q_proj.weight": "model.safetensors",
|
| 71 |
+
"body.layers.13.self_attn.v_proj.weight": "model.safetensors",
|
| 72 |
+
"body.layers.14.input_layernorm.weight": "model.safetensors",
|
| 73 |
+
"body.layers.14.mlp.down_proj.weight": "model.safetensors",
|
| 74 |
+
"body.layers.14.mlp.gate_proj.weight": "model.safetensors",
|
| 75 |
+
"body.layers.14.mlp.up_proj.weight": "model.safetensors",
|
| 76 |
+
"body.layers.14.post_attention_layernorm.weight": "model.safetensors",
|
| 77 |
+
"body.layers.14.self_attn.k_norm.weight": "model.safetensors",
|
| 78 |
+
"body.layers.14.self_attn.k_proj.weight": "model.safetensors",
|
| 79 |
+
"body.layers.14.self_attn.o_proj.weight": "model.safetensors",
|
| 80 |
+
"body.layers.14.self_attn.q_norm.weight": "model.safetensors",
|
| 81 |
+
"body.layers.14.self_attn.q_proj.weight": "model.safetensors",
|
| 82 |
+
"body.layers.14.self_attn.v_proj.weight": "model.safetensors",
|
| 83 |
+
"body.layers.15.input_layernorm.weight": "model.safetensors",
|
| 84 |
+
"body.layers.15.mlp.down_proj.weight": "model.safetensors",
|
| 85 |
+
"body.layers.15.mlp.gate_proj.weight": "model.safetensors",
|
| 86 |
+
"body.layers.15.mlp.up_proj.weight": "model.safetensors",
|
| 87 |
+
"body.layers.15.post_attention_layernorm.weight": "model.safetensors",
|
| 88 |
+
"body.layers.15.self_attn.k_norm.weight": "model.safetensors",
|
| 89 |
+
"body.layers.15.self_attn.k_proj.weight": "model.safetensors",
|
| 90 |
+
"body.layers.15.self_attn.o_proj.weight": "model.safetensors",
|
| 91 |
+
"body.layers.15.self_attn.q_norm.weight": "model.safetensors",
|
| 92 |
+
"body.layers.15.self_attn.q_proj.weight": "model.safetensors",
|
| 93 |
+
"body.layers.15.self_attn.v_proj.weight": "model.safetensors",
|
| 94 |
+
"body.layers.16.input_layernorm.weight": "model.safetensors",
|
| 95 |
+
"body.layers.16.mlp.down_proj.weight": "model.safetensors",
|
| 96 |
+
"body.layers.16.mlp.gate_proj.weight": "model.safetensors",
|
| 97 |
+
"body.layers.16.mlp.up_proj.weight": "model.safetensors",
|
| 98 |
+
"body.layers.16.post_attention_layernorm.weight": "model.safetensors",
|
| 99 |
+
"body.layers.16.self_attn.k_norm.weight": "model.safetensors",
|
| 100 |
+
"body.layers.16.self_attn.k_proj.weight": "model.safetensors",
|
| 101 |
+
"body.layers.16.self_attn.o_proj.weight": "model.safetensors",
|
| 102 |
+
"body.layers.16.self_attn.q_norm.weight": "model.safetensors",
|
| 103 |
+
"body.layers.16.self_attn.q_proj.weight": "model.safetensors",
|
| 104 |
+
"body.layers.16.self_attn.v_proj.weight": "model.safetensors",
|
| 105 |
+
"body.layers.17.input_layernorm.weight": "model.safetensors",
|
| 106 |
+
"body.layers.17.mlp.down_proj.weight": "model.safetensors",
|
| 107 |
+
"body.layers.17.mlp.gate_proj.weight": "model.safetensors",
|
| 108 |
+
"body.layers.17.mlp.up_proj.weight": "model.safetensors",
|
| 109 |
+
"body.layers.17.post_attention_layernorm.weight": "model.safetensors",
|
| 110 |
+
"body.layers.17.self_attn.k_norm.weight": "model.safetensors",
|
| 111 |
+
"body.layers.17.self_attn.k_proj.weight": "model.safetensors",
|
| 112 |
+
"body.layers.17.self_attn.o_proj.weight": "model.safetensors",
|
| 113 |
+
"body.layers.17.self_attn.q_norm.weight": "model.safetensors",
|
| 114 |
+
"body.layers.17.self_attn.q_proj.weight": "model.safetensors",
|
| 115 |
+
"body.layers.17.self_attn.v_proj.weight": "model.safetensors",
|
| 116 |
+
"body.layers.18.input_layernorm.weight": "model.safetensors",
|
| 117 |
+
"body.layers.18.mlp.down_proj.weight": "model.safetensors",
|
| 118 |
+
"body.layers.18.mlp.gate_proj.weight": "model.safetensors",
|
| 119 |
+
"body.layers.18.mlp.up_proj.weight": "model.safetensors",
|
| 120 |
+
"body.layers.18.post_attention_layernorm.weight": "model.safetensors",
|
| 121 |
+
"body.layers.18.self_attn.k_norm.weight": "model.safetensors",
|
| 122 |
+
"body.layers.18.self_attn.k_proj.weight": "model.safetensors",
|
| 123 |
+
"body.layers.18.self_attn.o_proj.weight": "model.safetensors",
|
| 124 |
+
"body.layers.18.self_attn.q_norm.weight": "model.safetensors",
|
| 125 |
+
"body.layers.18.self_attn.q_proj.weight": "model.safetensors",
|
| 126 |
+
"body.layers.18.self_attn.v_proj.weight": "model.safetensors",
|
| 127 |
+
"body.layers.19.input_layernorm.weight": "model.safetensors",
|
| 128 |
+
"body.layers.19.mlp.down_proj.weight": "model.safetensors",
|
| 129 |
+
"body.layers.19.mlp.gate_proj.weight": "model.safetensors",
|
| 130 |
+
"body.layers.19.mlp.up_proj.weight": "model.safetensors",
|
| 131 |
+
"body.layers.19.post_attention_layernorm.weight": "model.safetensors",
|
| 132 |
+
"body.layers.19.self_attn.k_norm.weight": "model.safetensors",
|
| 133 |
+
"body.layers.19.self_attn.k_proj.weight": "model.safetensors",
|
| 134 |
+
"body.layers.19.self_attn.o_proj.weight": "model.safetensors",
|
| 135 |
+
"body.layers.19.self_attn.q_norm.weight": "model.safetensors",
|
| 136 |
+
"body.layers.19.self_attn.q_proj.weight": "model.safetensors",
|
| 137 |
+
"body.layers.19.self_attn.v_proj.weight": "model.safetensors",
|
| 138 |
+
"body.layers.2.input_layernorm.weight": "model.safetensors",
|
| 139 |
+
"body.layers.2.mlp.down_proj.weight": "model.safetensors",
|
| 140 |
+
"body.layers.2.mlp.gate_proj.weight": "model.safetensors",
|
| 141 |
+
"body.layers.2.mlp.up_proj.weight": "model.safetensors",
|
| 142 |
+
"body.layers.2.post_attention_layernorm.weight": "model.safetensors",
|
| 143 |
+
"body.layers.2.self_attn.k_norm.weight": "model.safetensors",
|
| 144 |
+
"body.layers.2.self_attn.k_proj.weight": "model.safetensors",
|
| 145 |
+
"body.layers.2.self_attn.o_proj.weight": "model.safetensors",
|
| 146 |
+
"body.layers.2.self_attn.q_norm.weight": "model.safetensors",
|
| 147 |
+
"body.layers.2.self_attn.q_proj.weight": "model.safetensors",
|
| 148 |
+
"body.layers.2.self_attn.v_proj.weight": "model.safetensors",
|
| 149 |
+
"body.layers.20.input_layernorm.weight": "model.safetensors",
|
| 150 |
+
"body.layers.20.mlp.down_proj.weight": "model.safetensors",
|
| 151 |
+
"body.layers.20.mlp.gate_proj.weight": "model.safetensors",
|
| 152 |
+
"body.layers.20.mlp.up_proj.weight": "model.safetensors",
|
| 153 |
+
"body.layers.20.post_attention_layernorm.weight": "model.safetensors",
|
| 154 |
+
"body.layers.20.self_attn.k_norm.weight": "model.safetensors",
|
| 155 |
+
"body.layers.20.self_attn.k_proj.weight": "model.safetensors",
|
| 156 |
+
"body.layers.20.self_attn.o_proj.weight": "model.safetensors",
|
| 157 |
+
"body.layers.20.self_attn.q_norm.weight": "model.safetensors",
|
| 158 |
+
"body.layers.20.self_attn.q_proj.weight": "model.safetensors",
|
| 159 |
+
"body.layers.20.self_attn.v_proj.weight": "model.safetensors",
|
| 160 |
+
"body.layers.21.input_layernorm.weight": "model.safetensors",
|
| 161 |
+
"body.layers.21.mlp.down_proj.weight": "model.safetensors",
|
| 162 |
+
"body.layers.21.mlp.gate_proj.weight": "model.safetensors",
|
| 163 |
+
"body.layers.21.mlp.up_proj.weight": "model.safetensors",
|
| 164 |
+
"body.layers.21.post_attention_layernorm.weight": "model.safetensors",
|
| 165 |
+
"body.layers.21.self_attn.k_norm.weight": "model.safetensors",
|
| 166 |
+
"body.layers.21.self_attn.k_proj.weight": "model.safetensors",
|
| 167 |
+
"body.layers.21.self_attn.o_proj.weight": "model.safetensors",
|
| 168 |
+
"body.layers.21.self_attn.q_norm.weight": "model.safetensors",
|
| 169 |
+
"body.layers.21.self_attn.q_proj.weight": "model.safetensors",
|
| 170 |
+
"body.layers.21.self_attn.v_proj.weight": "model.safetensors",
|
| 171 |
+
"body.layers.22.input_layernorm.weight": "model.safetensors",
|
| 172 |
+
"body.layers.22.mlp.down_proj.weight": "model.safetensors",
|
| 173 |
+
"body.layers.22.mlp.gate_proj.weight": "model.safetensors",
|
| 174 |
+
"body.layers.22.mlp.up_proj.weight": "model.safetensors",
|
| 175 |
+
"body.layers.22.post_attention_layernorm.weight": "model.safetensors",
|
| 176 |
+
"body.layers.22.self_attn.k_norm.weight": "model.safetensors",
|
| 177 |
+
"body.layers.22.self_attn.k_proj.weight": "model.safetensors",
|
| 178 |
+
"body.layers.22.self_attn.o_proj.weight": "model.safetensors",
|
| 179 |
+
"body.layers.22.self_attn.q_norm.weight": "model.safetensors",
|
| 180 |
+
"body.layers.22.self_attn.q_proj.weight": "model.safetensors",
|
| 181 |
+
"body.layers.22.self_attn.v_proj.weight": "model.safetensors",
|
| 182 |
+
"body.layers.23.input_layernorm.weight": "model.safetensors",
|
| 183 |
+
"body.layers.23.mlp.down_proj.weight": "model.safetensors",
|
| 184 |
+
"body.layers.23.mlp.gate_proj.weight": "model.safetensors",
|
| 185 |
+
"body.layers.23.mlp.up_proj.weight": "model.safetensors",
|
| 186 |
+
"body.layers.23.post_attention_layernorm.weight": "model.safetensors",
|
| 187 |
+
"body.layers.23.self_attn.k_norm.weight": "model.safetensors",
|
| 188 |
+
"body.layers.23.self_attn.k_proj.weight": "model.safetensors",
|
| 189 |
+
"body.layers.23.self_attn.o_proj.weight": "model.safetensors",
|
| 190 |
+
"body.layers.23.self_attn.q_norm.weight": "model.safetensors",
|
| 191 |
+
"body.layers.23.self_attn.q_proj.weight": "model.safetensors",
|
| 192 |
+
"body.layers.23.self_attn.v_proj.weight": "model.safetensors",
|
| 193 |
+
"body.layers.24.input_layernorm.weight": "model.safetensors",
|
| 194 |
+
"body.layers.24.mlp.down_proj.weight": "model.safetensors",
|
| 195 |
+
"body.layers.24.mlp.gate_proj.weight": "model.safetensors",
|
| 196 |
+
"body.layers.24.mlp.up_proj.weight": "model.safetensors",
|
| 197 |
+
"body.layers.24.post_attention_layernorm.weight": "model.safetensors",
|
| 198 |
+
"body.layers.24.self_attn.k_norm.weight": "model.safetensors",
|
| 199 |
+
"body.layers.24.self_attn.k_proj.weight": "model.safetensors",
|
| 200 |
+
"body.layers.24.self_attn.o_proj.weight": "model.safetensors",
|
| 201 |
+
"body.layers.24.self_attn.q_norm.weight": "model.safetensors",
|
| 202 |
+
"body.layers.24.self_attn.q_proj.weight": "model.safetensors",
|
| 203 |
+
"body.layers.24.self_attn.v_proj.weight": "model.safetensors",
|
| 204 |
+
"body.layers.25.input_layernorm.weight": "model.safetensors",
|
| 205 |
+
"body.layers.25.mlp.down_proj.weight": "model.safetensors",
|
| 206 |
+
"body.layers.25.mlp.gate_proj.weight": "model.safetensors",
|
| 207 |
+
"body.layers.25.mlp.up_proj.weight": "model.safetensors",
|
| 208 |
+
"body.layers.25.post_attention_layernorm.weight": "model.safetensors",
|
| 209 |
+
"body.layers.25.self_attn.k_norm.weight": "model.safetensors",
|
| 210 |
+
"body.layers.25.self_attn.k_proj.weight": "model.safetensors",
|
| 211 |
+
"body.layers.25.self_attn.o_proj.weight": "model.safetensors",
|
| 212 |
+
"body.layers.25.self_attn.q_norm.weight": "model.safetensors",
|
| 213 |
+
"body.layers.25.self_attn.q_proj.weight": "model.safetensors",
|
| 214 |
+
"body.layers.25.self_attn.v_proj.weight": "model.safetensors",
|
| 215 |
+
"body.layers.26.input_layernorm.weight": "model.safetensors",
|
| 216 |
+
"body.layers.26.mlp.down_proj.weight": "model.safetensors",
|
| 217 |
+
"body.layers.26.mlp.gate_proj.weight": "model.safetensors",
|
| 218 |
+
"body.layers.26.mlp.up_proj.weight": "model.safetensors",
|
| 219 |
+
"body.layers.26.post_attention_layernorm.weight": "model.safetensors",
|
| 220 |
+
"body.layers.26.self_attn.k_norm.weight": "model.safetensors",
|
| 221 |
+
"body.layers.26.self_attn.k_proj.weight": "model.safetensors",
|
| 222 |
+
"body.layers.26.self_attn.o_proj.weight": "model.safetensors",
|
| 223 |
+
"body.layers.26.self_attn.q_norm.weight": "model.safetensors",
|
| 224 |
+
"body.layers.26.self_attn.q_proj.weight": "model.safetensors",
|
| 225 |
+
"body.layers.26.self_attn.v_proj.weight": "model.safetensors",
|
| 226 |
+
"body.layers.27.input_layernorm.weight": "model.safetensors",
|
| 227 |
+
"body.layers.27.mlp.down_proj.weight": "model.safetensors",
|
| 228 |
+
"body.layers.27.mlp.gate_proj.weight": "model.safetensors",
|
| 229 |
+
"body.layers.27.mlp.up_proj.weight": "model.safetensors",
|
| 230 |
+
"body.layers.27.post_attention_layernorm.weight": "model.safetensors",
|
| 231 |
+
"body.layers.27.self_attn.k_norm.weight": "model.safetensors",
|
| 232 |
+
"body.layers.27.self_attn.k_proj.weight": "model.safetensors",
|
| 233 |
+
"body.layers.27.self_attn.o_proj.weight": "model.safetensors",
|
| 234 |
+
"body.layers.27.self_attn.q_norm.weight": "model.safetensors",
|
| 235 |
+
"body.layers.27.self_attn.q_proj.weight": "model.safetensors",
|
| 236 |
+
"body.layers.27.self_attn.v_proj.weight": "model.safetensors",
|
| 237 |
+
"body.layers.28.input_layernorm.weight": "model.safetensors",
|
| 238 |
+
"body.layers.28.mlp.down_proj.weight": "model.safetensors",
|
| 239 |
+
"body.layers.28.mlp.gate_proj.weight": "model.safetensors",
|
| 240 |
+
"body.layers.28.mlp.up_proj.weight": "model.safetensors",
|
| 241 |
+
"body.layers.28.post_attention_layernorm.weight": "model.safetensors",
|
| 242 |
+
"body.layers.28.self_attn.k_norm.weight": "model.safetensors",
|
| 243 |
+
"body.layers.28.self_attn.k_proj.weight": "model.safetensors",
|
| 244 |
+
"body.layers.28.self_attn.o_proj.weight": "model.safetensors",
|
| 245 |
+
"body.layers.28.self_attn.q_norm.weight": "model.safetensors",
|
| 246 |
+
"body.layers.28.self_attn.q_proj.weight": "model.safetensors",
|
| 247 |
+
"body.layers.28.self_attn.v_proj.weight": "model.safetensors",
|
| 248 |
+
"body.layers.29.input_layernorm.weight": "model.safetensors",
|
| 249 |
+
"body.layers.29.mlp.down_proj.weight": "model.safetensors",
|
| 250 |
+
"body.layers.29.mlp.gate_proj.weight": "model.safetensors",
|
| 251 |
+
"body.layers.29.mlp.up_proj.weight": "model.safetensors",
|
| 252 |
+
"body.layers.29.post_attention_layernorm.weight": "model.safetensors",
|
| 253 |
+
"body.layers.29.self_attn.k_norm.weight": "model.safetensors",
|
| 254 |
+
"body.layers.29.self_attn.k_proj.weight": "model.safetensors",
|
| 255 |
+
"body.layers.29.self_attn.o_proj.weight": "model.safetensors",
|
| 256 |
+
"body.layers.29.self_attn.q_norm.weight": "model.safetensors",
|
| 257 |
+
"body.layers.29.self_attn.q_proj.weight": "model.safetensors",
|
| 258 |
+
"body.layers.29.self_attn.v_proj.weight": "model.safetensors",
|
| 259 |
+
"body.layers.3.input_layernorm.weight": "model.safetensors",
|
| 260 |
+
"body.layers.3.mlp.down_proj.weight": "model.safetensors",
|
| 261 |
+
"body.layers.3.mlp.gate_proj.weight": "model.safetensors",
|
| 262 |
+
"body.layers.3.mlp.up_proj.weight": "model.safetensors",
|
| 263 |
+
"body.layers.3.post_attention_layernorm.weight": "model.safetensors",
|
| 264 |
+
"body.layers.3.self_attn.k_norm.weight": "model.safetensors",
|
| 265 |
+
"body.layers.3.self_attn.k_proj.weight": "model.safetensors",
|
| 266 |
+
"body.layers.3.self_attn.o_proj.weight": "model.safetensors",
|
| 267 |
+
"body.layers.3.self_attn.q_norm.weight": "model.safetensors",
|
| 268 |
+
"body.layers.3.self_attn.q_proj.weight": "model.safetensors",
|
| 269 |
+
"body.layers.3.self_attn.v_proj.weight": "model.safetensors",
|
| 270 |
+
"body.layers.30.input_layernorm.weight": "model.safetensors",
|
| 271 |
+
"body.layers.30.mlp.down_proj.weight": "model.safetensors",
|
| 272 |
+
"body.layers.30.mlp.gate_proj.weight": "model.safetensors",
|
| 273 |
+
"body.layers.30.mlp.up_proj.weight": "model.safetensors",
|
| 274 |
+
"body.layers.30.post_attention_layernorm.weight": "model.safetensors",
|
| 275 |
+
"body.layers.30.self_attn.k_norm.weight": "model.safetensors",
|
| 276 |
+
"body.layers.30.self_attn.k_proj.weight": "model.safetensors",
|
| 277 |
+
"body.layers.30.self_attn.o_proj.weight": "model.safetensors",
|
| 278 |
+
"body.layers.30.self_attn.q_norm.weight": "model.safetensors",
|
| 279 |
+
"body.layers.30.self_attn.q_proj.weight": "model.safetensors",
|
| 280 |
+
"body.layers.30.self_attn.v_proj.weight": "model.safetensors",
|
| 281 |
+
"body.layers.31.input_layernorm.weight": "model.safetensors",
|
| 282 |
+
"body.layers.31.mlp.down_proj.weight": "model.safetensors",
|
| 283 |
+
"body.layers.31.mlp.gate_proj.weight": "model.safetensors",
|
| 284 |
+
"body.layers.31.mlp.up_proj.weight": "model.safetensors",
|
| 285 |
+
"body.layers.31.post_attention_layernorm.weight": "model.safetensors",
|
| 286 |
+
"body.layers.31.self_attn.k_norm.weight": "model.safetensors",
|
| 287 |
+
"body.layers.31.self_attn.k_proj.weight": "model.safetensors",
|
| 288 |
+
"body.layers.31.self_attn.o_proj.weight": "model.safetensors",
|
| 289 |
+
"body.layers.31.self_attn.q_norm.weight": "model.safetensors",
|
| 290 |
+
"body.layers.31.self_attn.q_proj.weight": "model.safetensors",
|
| 291 |
+
"body.layers.31.self_attn.v_proj.weight": "model.safetensors",
|
| 292 |
+
"body.layers.32.input_layernorm.weight": "model.safetensors",
|
| 293 |
+
"body.layers.32.mlp.down_proj.weight": "model.safetensors",
|
| 294 |
+
"body.layers.32.mlp.gate_proj.weight": "model.safetensors",
|
| 295 |
+
"body.layers.32.mlp.up_proj.weight": "model.safetensors",
|
| 296 |
+
"body.layers.32.post_attention_layernorm.weight": "model.safetensors",
|
| 297 |
+
"body.layers.32.self_attn.k_norm.weight": "model.safetensors",
|
| 298 |
+
"body.layers.32.self_attn.k_proj.weight": "model.safetensors",
|
| 299 |
+
"body.layers.32.self_attn.o_proj.weight": "model.safetensors",
|
| 300 |
+
"body.layers.32.self_attn.q_norm.weight": "model.safetensors",
|
| 301 |
+
"body.layers.32.self_attn.q_proj.weight": "model.safetensors",
|
| 302 |
+
"body.layers.32.self_attn.v_proj.weight": "model.safetensors",
|
| 303 |
+
"body.layers.33.input_layernorm.weight": "model.safetensors",
|
| 304 |
+
"body.layers.33.mlp.down_proj.weight": "model.safetensors",
|
| 305 |
+
"body.layers.33.mlp.gate_proj.weight": "model.safetensors",
|
| 306 |
+
"body.layers.33.mlp.up_proj.weight": "model.safetensors",
|
| 307 |
+
"body.layers.33.post_attention_layernorm.weight": "model.safetensors",
|
| 308 |
+
"body.layers.33.self_attn.k_norm.weight": "model.safetensors",
|
| 309 |
+
"body.layers.33.self_attn.k_proj.weight": "model.safetensors",
|
| 310 |
+
"body.layers.33.self_attn.o_proj.weight": "model.safetensors",
|
| 311 |
+
"body.layers.33.self_attn.q_norm.weight": "model.safetensors",
|
| 312 |
+
"body.layers.33.self_attn.q_proj.weight": "model.safetensors",
|
| 313 |
+
"body.layers.33.self_attn.v_proj.weight": "model.safetensors",
|
| 314 |
+
"body.layers.34.input_layernorm.weight": "model.safetensors",
|
| 315 |
+
"body.layers.34.mlp.down_proj.weight": "model.safetensors",
|
| 316 |
+
"body.layers.34.mlp.gate_proj.weight": "model.safetensors",
|
| 317 |
+
"body.layers.34.mlp.up_proj.weight": "model.safetensors",
|
| 318 |
+
"body.layers.34.post_attention_layernorm.weight": "model.safetensors",
|
| 319 |
+
"body.layers.34.self_attn.k_norm.weight": "model.safetensors",
|
| 320 |
+
"body.layers.34.self_attn.k_proj.weight": "model.safetensors",
|
| 321 |
+
"body.layers.34.self_attn.o_proj.weight": "model.safetensors",
|
| 322 |
+
"body.layers.34.self_attn.q_norm.weight": "model.safetensors",
|
| 323 |
+
"body.layers.34.self_attn.q_proj.weight": "model.safetensors",
|
| 324 |
+
"body.layers.34.self_attn.v_proj.weight": "model.safetensors",
|
| 325 |
+
"body.layers.35.input_layernorm.weight": "model.safetensors",
|
| 326 |
+
"body.layers.35.mlp.down_proj.weight": "model.safetensors",
|
| 327 |
+
"body.layers.35.mlp.gate_proj.weight": "model.safetensors",
|
| 328 |
+
"body.layers.35.mlp.up_proj.weight": "model.safetensors",
|
| 329 |
+
"body.layers.35.post_attention_layernorm.weight": "model.safetensors",
|
| 330 |
+
"body.layers.35.self_attn.k_norm.weight": "model.safetensors",
|
| 331 |
+
"body.layers.35.self_attn.k_proj.weight": "model.safetensors",
|
| 332 |
+
"body.layers.35.self_attn.o_proj.weight": "model.safetensors",
|
| 333 |
+
"body.layers.35.self_attn.q_norm.weight": "model.safetensors",
|
| 334 |
+
"body.layers.35.self_attn.q_proj.weight": "model.safetensors",
|
| 335 |
+
"body.layers.35.self_attn.v_proj.weight": "model.safetensors",
|
| 336 |
+
"body.layers.4.input_layernorm.weight": "model.safetensors",
|
| 337 |
+
"body.layers.4.mlp.down_proj.weight": "model.safetensors",
|
| 338 |
+
"body.layers.4.mlp.gate_proj.weight": "model.safetensors",
|
| 339 |
+
"body.layers.4.mlp.up_proj.weight": "model.safetensors",
|
| 340 |
+
"body.layers.4.post_attention_layernorm.weight": "model.safetensors",
|
| 341 |
+
"body.layers.4.self_attn.k_norm.weight": "model.safetensors",
|
| 342 |
+
"body.layers.4.self_attn.k_proj.weight": "model.safetensors",
|
| 343 |
+
"body.layers.4.self_attn.o_proj.weight": "model.safetensors",
|
| 344 |
+
"body.layers.4.self_attn.q_norm.weight": "model.safetensors",
|
| 345 |
+
"body.layers.4.self_attn.q_proj.weight": "model.safetensors",
|
| 346 |
+
"body.layers.4.self_attn.v_proj.weight": "model.safetensors",
|
| 347 |
+
"body.layers.5.input_layernorm.weight": "model.safetensors",
|
| 348 |
+
"body.layers.5.mlp.down_proj.weight": "model.safetensors",
|
| 349 |
+
"body.layers.5.mlp.gate_proj.weight": "model.safetensors",
|
| 350 |
+
"body.layers.5.mlp.up_proj.weight": "model.safetensors",
|
| 351 |
+
"body.layers.5.post_attention_layernorm.weight": "model.safetensors",
|
| 352 |
+
"body.layers.5.self_attn.k_norm.weight": "model.safetensors",
|
| 353 |
+
"body.layers.5.self_attn.k_proj.weight": "model.safetensors",
|
| 354 |
+
"body.layers.5.self_attn.o_proj.weight": "model.safetensors",
|
| 355 |
+
"body.layers.5.self_attn.q_norm.weight": "model.safetensors",
|
| 356 |
+
"body.layers.5.self_attn.q_proj.weight": "model.safetensors",
|
| 357 |
+
"body.layers.5.self_attn.v_proj.weight": "model.safetensors",
|
| 358 |
+
"body.layers.6.input_layernorm.weight": "model.safetensors",
|
| 359 |
+
"body.layers.6.mlp.down_proj.weight": "model.safetensors",
|
| 360 |
+
"body.layers.6.mlp.gate_proj.weight": "model.safetensors",
|
| 361 |
+
"body.layers.6.mlp.up_proj.weight": "model.safetensors",
|
| 362 |
+
"body.layers.6.post_attention_layernorm.weight": "model.safetensors",
|
| 363 |
+
"body.layers.6.self_attn.k_norm.weight": "model.safetensors",
|
| 364 |
+
"body.layers.6.self_attn.k_proj.weight": "model.safetensors",
|
| 365 |
+
"body.layers.6.self_attn.o_proj.weight": "model.safetensors",
|
| 366 |
+
"body.layers.6.self_attn.q_norm.weight": "model.safetensors",
|
| 367 |
+
"body.layers.6.self_attn.q_proj.weight": "model.safetensors",
|
| 368 |
+
"body.layers.6.self_attn.v_proj.weight": "model.safetensors",
|
| 369 |
+
"body.layers.7.input_layernorm.weight": "model.safetensors",
|
| 370 |
+
"body.layers.7.mlp.down_proj.weight": "model.safetensors",
|
| 371 |
+
"body.layers.7.mlp.gate_proj.weight": "model.safetensors",
|
| 372 |
+
"body.layers.7.mlp.up_proj.weight": "model.safetensors",
|
| 373 |
+
"body.layers.7.post_attention_layernorm.weight": "model.safetensors",
|
| 374 |
+
"body.layers.7.self_attn.k_norm.weight": "model.safetensors",
|
| 375 |
+
"body.layers.7.self_attn.k_proj.weight": "model.safetensors",
|
| 376 |
+
"body.layers.7.self_attn.o_proj.weight": "model.safetensors",
|
| 377 |
+
"body.layers.7.self_attn.q_norm.weight": "model.safetensors",
|
| 378 |
+
"body.layers.7.self_attn.q_proj.weight": "model.safetensors",
|
| 379 |
+
"body.layers.7.self_attn.v_proj.weight": "model.safetensors",
|
| 380 |
+
"body.layers.8.input_layernorm.weight": "model.safetensors",
|
| 381 |
+
"body.layers.8.mlp.down_proj.weight": "model.safetensors",
|
| 382 |
+
"body.layers.8.mlp.gate_proj.weight": "model.safetensors",
|
| 383 |
+
"body.layers.8.mlp.up_proj.weight": "model.safetensors",
|
| 384 |
+
"body.layers.8.post_attention_layernorm.weight": "model.safetensors",
|
| 385 |
+
"body.layers.8.self_attn.k_norm.weight": "model.safetensors",
|
| 386 |
+
"body.layers.8.self_attn.k_proj.weight": "model.safetensors",
|
| 387 |
+
"body.layers.8.self_attn.o_proj.weight": "model.safetensors",
|
| 388 |
+
"body.layers.8.self_attn.q_norm.weight": "model.safetensors",
|
| 389 |
+
"body.layers.8.self_attn.q_proj.weight": "model.safetensors",
|
| 390 |
+
"body.layers.8.self_attn.v_proj.weight": "model.safetensors",
|
| 391 |
+
"body.layers.9.input_layernorm.weight": "model.safetensors",
|
| 392 |
+
"body.layers.9.mlp.down_proj.weight": "model.safetensors",
|
| 393 |
+
"body.layers.9.mlp.gate_proj.weight": "model.safetensors",
|
| 394 |
+
"body.layers.9.mlp.up_proj.weight": "model.safetensors",
|
| 395 |
+
"body.layers.9.post_attention_layernorm.weight": "model.safetensors",
|
| 396 |
+
"body.layers.9.self_attn.k_norm.weight": "model.safetensors",
|
| 397 |
+
"body.layers.9.self_attn.k_proj.weight": "model.safetensors",
|
| 398 |
+
"body.layers.9.self_attn.o_proj.weight": "model.safetensors",
|
| 399 |
+
"body.layers.9.self_attn.q_norm.weight": "model.safetensors",
|
| 400 |
+
"body.layers.9.self_attn.q_proj.weight": "model.safetensors",
|
| 401 |
+
"body.layers.9.self_attn.v_proj.weight": "model.safetensors",
|
| 402 |
+
"body.norm.weight": "model.safetensors",
|
| 403 |
+
"tied.embedding.modality_embeddings.0.embedding.weight": "model.safetensors",
|
| 404 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.conv_t1.bias": "model.safetensors",
|
| 405 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.conv_t1.weight": "model.safetensors",
|
| 406 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.res_unit1.conv1.bias": "model.safetensors",
|
| 407 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.res_unit1.conv1.weight": "model.safetensors",
|
| 408 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.res_unit1.conv2.bias": "model.safetensors",
|
| 409 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.res_unit1.conv2.weight": "model.safetensors",
|
| 410 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.res_unit1.snake1.alpha": "model.safetensors",
|
| 411 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.res_unit1.snake2.alpha": "model.safetensors",
|
| 412 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.res_unit2.conv1.bias": "model.safetensors",
|
| 413 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.res_unit2.conv1.weight": "model.safetensors",
|
| 414 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.res_unit2.conv2.bias": "model.safetensors",
|
| 415 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.res_unit2.conv2.weight": "model.safetensors",
|
| 416 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.res_unit2.snake1.alpha": "model.safetensors",
|
| 417 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.res_unit2.snake2.alpha": "model.safetensors",
|
| 418 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.res_unit3.conv1.bias": "model.safetensors",
|
| 419 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.res_unit3.conv1.weight": "model.safetensors",
|
| 420 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.res_unit3.conv2.bias": "model.safetensors",
|
| 421 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.res_unit3.conv2.weight": "model.safetensors",
|
| 422 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.res_unit3.snake1.alpha": "model.safetensors",
|
| 423 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.res_unit3.snake2.alpha": "model.safetensors",
|
| 424 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.0.snake1.alpha": "model.safetensors",
|
| 425 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.conv_t1.bias": "model.safetensors",
|
| 426 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.conv_t1.weight": "model.safetensors",
|
| 427 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.res_unit1.conv1.bias": "model.safetensors",
|
| 428 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.res_unit1.conv1.weight": "model.safetensors",
|
| 429 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.res_unit1.conv2.bias": "model.safetensors",
|
| 430 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.res_unit1.conv2.weight": "model.safetensors",
|
| 431 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.res_unit1.snake1.alpha": "model.safetensors",
|
| 432 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.res_unit1.snake2.alpha": "model.safetensors",
|
| 433 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.res_unit2.conv1.bias": "model.safetensors",
|
| 434 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.res_unit2.conv1.weight": "model.safetensors",
|
| 435 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.res_unit2.conv2.bias": "model.safetensors",
|
| 436 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.res_unit2.conv2.weight": "model.safetensors",
|
| 437 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.res_unit2.snake1.alpha": "model.safetensors",
|
| 438 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.res_unit2.snake2.alpha": "model.safetensors",
|
| 439 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.res_unit3.conv1.bias": "model.safetensors",
|
| 440 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.res_unit3.conv1.weight": "model.safetensors",
|
| 441 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.res_unit3.conv2.bias": "model.safetensors",
|
| 442 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.res_unit3.conv2.weight": "model.safetensors",
|
| 443 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.res_unit3.snake1.alpha": "model.safetensors",
|
| 444 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.res_unit3.snake2.alpha": "model.safetensors",
|
| 445 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.1.snake1.alpha": "model.safetensors",
|
| 446 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.conv_t1.bias": "model.safetensors",
|
| 447 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.conv_t1.weight": "model.safetensors",
|
| 448 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.res_unit1.conv1.bias": "model.safetensors",
|
| 449 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.res_unit1.conv1.weight": "model.safetensors",
|
| 450 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.res_unit1.conv2.bias": "model.safetensors",
|
| 451 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.res_unit1.conv2.weight": "model.safetensors",
|
| 452 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.res_unit1.snake1.alpha": "model.safetensors",
|
| 453 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.res_unit1.snake2.alpha": "model.safetensors",
|
| 454 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.res_unit2.conv1.bias": "model.safetensors",
|
| 455 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.res_unit2.conv1.weight": "model.safetensors",
|
| 456 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.res_unit2.conv2.bias": "model.safetensors",
|
| 457 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.res_unit2.conv2.weight": "model.safetensors",
|
| 458 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.res_unit2.snake1.alpha": "model.safetensors",
|
| 459 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.res_unit2.snake2.alpha": "model.safetensors",
|
| 460 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.res_unit3.conv1.bias": "model.safetensors",
|
| 461 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.res_unit3.conv1.weight": "model.safetensors",
|
| 462 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.res_unit3.conv2.bias": "model.safetensors",
|
| 463 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.res_unit3.conv2.weight": "model.safetensors",
|
| 464 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.res_unit3.snake1.alpha": "model.safetensors",
|
| 465 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.res_unit3.snake2.alpha": "model.safetensors",
|
| 466 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.2.snake1.alpha": "model.safetensors",
|
| 467 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.conv_t1.bias": "model.safetensors",
|
| 468 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.conv_t1.weight": "model.safetensors",
|
| 469 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.res_unit1.conv1.bias": "model.safetensors",
|
| 470 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.res_unit1.conv1.weight": "model.safetensors",
|
| 471 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.res_unit1.conv2.bias": "model.safetensors",
|
| 472 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.res_unit1.conv2.weight": "model.safetensors",
|
| 473 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.res_unit1.snake1.alpha": "model.safetensors",
|
| 474 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.res_unit1.snake2.alpha": "model.safetensors",
|
| 475 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.res_unit2.conv1.bias": "model.safetensors",
|
| 476 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.res_unit2.conv1.weight": "model.safetensors",
|
| 477 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.res_unit2.conv2.bias": "model.safetensors",
|
| 478 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.res_unit2.conv2.weight": "model.safetensors",
|
| 479 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.res_unit2.snake1.alpha": "model.safetensors",
|
| 480 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.res_unit2.snake2.alpha": "model.safetensors",
|
| 481 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.res_unit3.conv1.bias": "model.safetensors",
|
| 482 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.res_unit3.conv1.weight": "model.safetensors",
|
| 483 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.res_unit3.conv2.bias": "model.safetensors",
|
| 484 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.res_unit3.conv2.weight": "model.safetensors",
|
| 485 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.res_unit3.snake1.alpha": "model.safetensors",
|
| 486 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.res_unit3.snake2.alpha": "model.safetensors",
|
| 487 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.3.snake1.alpha": "model.safetensors",
|
| 488 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.conv_t1.bias": "model.safetensors",
|
| 489 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.conv_t1.weight": "model.safetensors",
|
| 490 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.res_unit1.conv1.bias": "model.safetensors",
|
| 491 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.res_unit1.conv1.weight": "model.safetensors",
|
| 492 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.res_unit1.conv2.bias": "model.safetensors",
|
| 493 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.res_unit1.conv2.weight": "model.safetensors",
|
| 494 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.res_unit1.snake1.alpha": "model.safetensors",
|
| 495 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.res_unit1.snake2.alpha": "model.safetensors",
|
| 496 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.res_unit2.conv1.bias": "model.safetensors",
|
| 497 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.res_unit2.conv1.weight": "model.safetensors",
|
| 498 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.res_unit2.conv2.bias": "model.safetensors",
|
| 499 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.res_unit2.conv2.weight": "model.safetensors",
|
| 500 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.res_unit2.snake1.alpha": "model.safetensors",
|
| 501 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.res_unit2.snake2.alpha": "model.safetensors",
|
| 502 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.res_unit3.conv1.bias": "model.safetensors",
|
| 503 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.res_unit3.conv1.weight": "model.safetensors",
|
| 504 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.res_unit3.conv2.bias": "model.safetensors",
|
| 505 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.res_unit3.conv2.weight": "model.safetensors",
|
| 506 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.res_unit3.snake1.alpha": "model.safetensors",
|
| 507 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.res_unit3.snake2.alpha": "model.safetensors",
|
| 508 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.block.4.snake1.alpha": "model.safetensors",
|
| 509 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.conv1.bias": "model.safetensors",
|
| 510 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.conv1.weight": "model.safetensors",
|
| 511 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.conv2.bias": "model.safetensors",
|
| 512 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.conv2.weight": "model.safetensors",
|
| 513 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_decoder.snake1.alpha": "model.safetensors",
|
| 514 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.conv1.bias": "model.safetensors",
|
| 515 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.conv1.weight": "model.safetensors",
|
| 516 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.res_unit1.conv1.bias": "model.safetensors",
|
| 517 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.res_unit1.conv1.weight": "model.safetensors",
|
| 518 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.res_unit1.conv2.bias": "model.safetensors",
|
| 519 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.res_unit1.conv2.weight": "model.safetensors",
|
| 520 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.res_unit1.snake1.alpha": "model.safetensors",
|
| 521 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.res_unit1.snake2.alpha": "model.safetensors",
|
| 522 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.res_unit2.conv1.bias": "model.safetensors",
|
| 523 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.res_unit2.conv1.weight": "model.safetensors",
|
| 524 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.res_unit2.conv2.bias": "model.safetensors",
|
| 525 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.res_unit2.conv2.weight": "model.safetensors",
|
| 526 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.res_unit2.snake1.alpha": "model.safetensors",
|
| 527 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.res_unit2.snake2.alpha": "model.safetensors",
|
| 528 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.res_unit3.conv1.bias": "model.safetensors",
|
| 529 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.res_unit3.conv1.weight": "model.safetensors",
|
| 530 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.res_unit3.conv2.bias": "model.safetensors",
|
| 531 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.res_unit3.conv2.weight": "model.safetensors",
|
| 532 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.res_unit3.snake1.alpha": "model.safetensors",
|
| 533 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.res_unit3.snake2.alpha": "model.safetensors",
|
| 534 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.0.snake1.alpha": "model.safetensors",
|
| 535 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.conv1.bias": "model.safetensors",
|
| 536 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.conv1.weight": "model.safetensors",
|
| 537 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.res_unit1.conv1.bias": "model.safetensors",
|
| 538 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.res_unit1.conv1.weight": "model.safetensors",
|
| 539 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.res_unit1.conv2.bias": "model.safetensors",
|
| 540 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.res_unit1.conv2.weight": "model.safetensors",
|
| 541 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.res_unit1.snake1.alpha": "model.safetensors",
|
| 542 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.res_unit1.snake2.alpha": "model.safetensors",
|
| 543 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.res_unit2.conv1.bias": "model.safetensors",
|
| 544 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.res_unit2.conv1.weight": "model.safetensors",
|
| 545 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.res_unit2.conv2.bias": "model.safetensors",
|
| 546 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.res_unit2.conv2.weight": "model.safetensors",
|
| 547 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.res_unit2.snake1.alpha": "model.safetensors",
|
| 548 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.res_unit2.snake2.alpha": "model.safetensors",
|
| 549 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.res_unit3.conv1.bias": "model.safetensors",
|
| 550 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.res_unit3.conv1.weight": "model.safetensors",
|
| 551 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.res_unit3.conv2.bias": "model.safetensors",
|
| 552 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.res_unit3.conv2.weight": "model.safetensors",
|
| 553 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.res_unit3.snake1.alpha": "model.safetensors",
|
| 554 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.res_unit3.snake2.alpha": "model.safetensors",
|
| 555 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.1.snake1.alpha": "model.safetensors",
|
| 556 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.conv1.bias": "model.safetensors",
|
| 557 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.conv1.weight": "model.safetensors",
|
| 558 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.res_unit1.conv1.bias": "model.safetensors",
|
| 559 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.res_unit1.conv1.weight": "model.safetensors",
|
| 560 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.res_unit1.conv2.bias": "model.safetensors",
|
| 561 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.res_unit1.conv2.weight": "model.safetensors",
|
| 562 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.res_unit1.snake1.alpha": "model.safetensors",
|
| 563 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.res_unit1.snake2.alpha": "model.safetensors",
|
| 564 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.res_unit2.conv1.bias": "model.safetensors",
|
| 565 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.res_unit2.conv1.weight": "model.safetensors",
|
| 566 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.res_unit2.conv2.bias": "model.safetensors",
|
| 567 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.res_unit2.conv2.weight": "model.safetensors",
|
| 568 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.res_unit2.snake1.alpha": "model.safetensors",
|
| 569 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.res_unit2.snake2.alpha": "model.safetensors",
|
| 570 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.res_unit3.conv1.bias": "model.safetensors",
|
| 571 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.res_unit3.conv1.weight": "model.safetensors",
|
| 572 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.res_unit3.conv2.bias": "model.safetensors",
|
| 573 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.res_unit3.conv2.weight": "model.safetensors",
|
| 574 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.res_unit3.snake1.alpha": "model.safetensors",
|
| 575 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.res_unit3.snake2.alpha": "model.safetensors",
|
| 576 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.2.snake1.alpha": "model.safetensors",
|
| 577 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.conv1.bias": "model.safetensors",
|
| 578 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.conv1.weight": "model.safetensors",
|
| 579 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.res_unit1.conv1.bias": "model.safetensors",
|
| 580 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.res_unit1.conv1.weight": "model.safetensors",
|
| 581 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.res_unit1.conv2.bias": "model.safetensors",
|
| 582 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.res_unit1.conv2.weight": "model.safetensors",
|
| 583 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.res_unit1.snake1.alpha": "model.safetensors",
|
| 584 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.res_unit1.snake2.alpha": "model.safetensors",
|
| 585 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.res_unit2.conv1.bias": "model.safetensors",
|
| 586 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.res_unit2.conv1.weight": "model.safetensors",
|
| 587 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.res_unit2.conv2.bias": "model.safetensors",
|
| 588 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.res_unit2.conv2.weight": "model.safetensors",
|
| 589 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.res_unit2.snake1.alpha": "model.safetensors",
|
| 590 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.res_unit2.snake2.alpha": "model.safetensors",
|
| 591 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.res_unit3.conv1.bias": "model.safetensors",
|
| 592 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.res_unit3.conv1.weight": "model.safetensors",
|
| 593 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.res_unit3.conv2.bias": "model.safetensors",
|
| 594 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.res_unit3.conv2.weight": "model.safetensors",
|
| 595 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.res_unit3.snake1.alpha": "model.safetensors",
|
| 596 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.res_unit3.snake2.alpha": "model.safetensors",
|
| 597 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.3.snake1.alpha": "model.safetensors",
|
| 598 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.conv1.bias": "model.safetensors",
|
| 599 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.conv1.weight": "model.safetensors",
|
| 600 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.res_unit1.conv1.bias": "model.safetensors",
|
| 601 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.res_unit1.conv1.weight": "model.safetensors",
|
| 602 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.res_unit1.conv2.bias": "model.safetensors",
|
| 603 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.res_unit1.conv2.weight": "model.safetensors",
|
| 604 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.res_unit1.snake1.alpha": "model.safetensors",
|
| 605 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.res_unit1.snake2.alpha": "model.safetensors",
|
| 606 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.res_unit2.conv1.bias": "model.safetensors",
|
| 607 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.res_unit2.conv1.weight": "model.safetensors",
|
| 608 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.res_unit2.conv2.bias": "model.safetensors",
|
| 609 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.res_unit2.conv2.weight": "model.safetensors",
|
| 610 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.res_unit2.snake1.alpha": "model.safetensors",
|
| 611 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.res_unit2.snake2.alpha": "model.safetensors",
|
| 612 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.res_unit3.conv1.bias": "model.safetensors",
|
| 613 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.res_unit3.conv1.weight": "model.safetensors",
|
| 614 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.res_unit3.conv2.bias": "model.safetensors",
|
| 615 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.res_unit3.conv2.weight": "model.safetensors",
|
| 616 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.res_unit3.snake1.alpha": "model.safetensors",
|
| 617 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.res_unit3.snake2.alpha": "model.safetensors",
|
| 618 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.block.4.snake1.alpha": "model.safetensors",
|
| 619 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.conv1.bias": "model.safetensors",
|
| 620 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.conv1.weight": "model.safetensors",
|
| 621 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.conv2.bias": "model.safetensors",
|
| 622 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.conv2.weight": "model.safetensors",
|
| 623 |
+
"tied.embedding.modality_embeddings.0.model.acoustic_encoder.snake1.alpha": "model.safetensors",
|
| 624 |
+
"tied.embedding.modality_embeddings.0.model.decoder_semantic.conv1.weight": "model.safetensors",
|
| 625 |
+
"tied.embedding.modality_embeddings.0.model.decoder_semantic.conv2.weight": "model.safetensors",
|
| 626 |
+
"tied.embedding.modality_embeddings.0.model.decoder_semantic.conv_blocks.0.conv.bias": "model.safetensors",
|
| 627 |
+
"tied.embedding.modality_embeddings.0.model.decoder_semantic.conv_blocks.0.conv.weight": "model.safetensors",
|
| 628 |
+
"tied.embedding.modality_embeddings.0.model.decoder_semantic.conv_blocks.0.res_units.0.conv1.weight": "model.safetensors",
|
| 629 |
+
"tied.embedding.modality_embeddings.0.model.decoder_semantic.conv_blocks.0.res_units.0.conv2.weight": "model.safetensors",
|
| 630 |
+
"tied.embedding.modality_embeddings.0.model.decoder_semantic.conv_blocks.0.res_units.1.conv1.weight": "model.safetensors",
|
| 631 |
+
"tied.embedding.modality_embeddings.0.model.decoder_semantic.conv_blocks.0.res_units.1.conv2.weight": "model.safetensors",
|
| 632 |
+
"tied.embedding.modality_embeddings.0.model.decoder_semantic.conv_blocks.1.conv.bias": "model.safetensors",
|
| 633 |
+
"tied.embedding.modality_embeddings.0.model.decoder_semantic.conv_blocks.1.conv.weight": "model.safetensors",
|
| 634 |
+
"tied.embedding.modality_embeddings.0.model.decoder_semantic.conv_blocks.1.res_units.0.conv1.weight": "model.safetensors",
|
| 635 |
+
"tied.embedding.modality_embeddings.0.model.decoder_semantic.conv_blocks.1.res_units.0.conv2.weight": "model.safetensors",
|
| 636 |
+
"tied.embedding.modality_embeddings.0.model.decoder_semantic.conv_blocks.1.res_units.1.conv1.weight": "model.safetensors",
|
| 637 |
+
"tied.embedding.modality_embeddings.0.model.decoder_semantic.conv_blocks.1.res_units.1.conv2.weight": "model.safetensors",
|
| 638 |
+
"tied.embedding.modality_embeddings.0.model.encoder_semantic.conv.weight": "model.safetensors",
|
| 639 |
+
"tied.embedding.modality_embeddings.0.model.encoder_semantic.conv_blocks.0.conv.bias": "model.safetensors",
|
| 640 |
+
"tied.embedding.modality_embeddings.0.model.encoder_semantic.conv_blocks.0.conv.weight": "model.safetensors",
|
| 641 |
+
"tied.embedding.modality_embeddings.0.model.encoder_semantic.conv_blocks.0.res_units.0.conv1.weight": "model.safetensors",
|
| 642 |
+
"tied.embedding.modality_embeddings.0.model.encoder_semantic.conv_blocks.0.res_units.0.conv2.weight": "model.safetensors",
|
| 643 |
+
"tied.embedding.modality_embeddings.0.model.encoder_semantic.conv_blocks.0.res_units.1.conv1.weight": "model.safetensors",
|
| 644 |
+
"tied.embedding.modality_embeddings.0.model.encoder_semantic.conv_blocks.0.res_units.1.conv2.weight": "model.safetensors",
|
| 645 |
+
"tied.embedding.modality_embeddings.0.model.encoder_semantic.conv_blocks.1.conv.bias": "model.safetensors",
|
| 646 |
+
"tied.embedding.modality_embeddings.0.model.encoder_semantic.conv_blocks.1.conv.weight": "model.safetensors",
|
| 647 |
+
"tied.embedding.modality_embeddings.0.model.encoder_semantic.conv_blocks.1.res_units.0.conv1.weight": "model.safetensors",
|
| 648 |
+
"tied.embedding.modality_embeddings.0.model.encoder_semantic.conv_blocks.1.res_units.0.conv2.weight": "model.safetensors",
|
| 649 |
+
"tied.embedding.modality_embeddings.0.model.encoder_semantic.conv_blocks.1.res_units.1.conv1.weight": "model.safetensors",
|
| 650 |
+
"tied.embedding.modality_embeddings.0.model.encoder_semantic.conv_blocks.1.res_units.1.conv2.weight": "model.safetensors",
|
| 651 |
+
"tied.embedding.modality_embeddings.0.model.fc.bias": "model.safetensors",
|
| 652 |
+
"tied.embedding.modality_embeddings.0.model.fc.weight": "model.safetensors",
|
| 653 |
+
"tied.embedding.modality_embeddings.0.model.fc1.bias": "model.safetensors",
|
| 654 |
+
"tied.embedding.modality_embeddings.0.model.fc1.weight": "model.safetensors",
|
| 655 |
+
"tied.embedding.modality_embeddings.0.model.fc2.bias": "model.safetensors",
|
| 656 |
+
"tied.embedding.modality_embeddings.0.model.fc2.weight": "model.safetensors",
|
| 657 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.0.codebook.cluster_size": "model.safetensors",
|
| 658 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.0.codebook.embed": "model.safetensors",
|
| 659 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.0.codebook.embed_avg": "model.safetensors",
|
| 660 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.0.codebook.inited": "model.safetensors",
|
| 661 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.0.project_in.bias": "model.safetensors",
|
| 662 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.0.project_in.weight": "model.safetensors",
|
| 663 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.0.project_out.bias": "model.safetensors",
|
| 664 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.0.project_out.weight": "model.safetensors",
|
| 665 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.1.codebook.cluster_size": "model.safetensors",
|
| 666 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.1.codebook.embed": "model.safetensors",
|
| 667 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.1.codebook.embed_avg": "model.safetensors",
|
| 668 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.1.codebook.inited": "model.safetensors",
|
| 669 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.1.project_in.bias": "model.safetensors",
|
| 670 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.1.project_in.weight": "model.safetensors",
|
| 671 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.1.project_out.bias": "model.safetensors",
|
| 672 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.1.project_out.weight": "model.safetensors",
|
| 673 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.2.codebook.cluster_size": "model.safetensors",
|
| 674 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.2.codebook.embed": "model.safetensors",
|
| 675 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.2.codebook.embed_avg": "model.safetensors",
|
| 676 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.2.codebook.inited": "model.safetensors",
|
| 677 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.2.project_in.bias": "model.safetensors",
|
| 678 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.2.project_in.weight": "model.safetensors",
|
| 679 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.2.project_out.bias": "model.safetensors",
|
| 680 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.2.project_out.weight": "model.safetensors",
|
| 681 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.3.codebook.cluster_size": "model.safetensors",
|
| 682 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.3.codebook.embed": "model.safetensors",
|
| 683 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.3.codebook.embed_avg": "model.safetensors",
|
| 684 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.3.codebook.inited": "model.safetensors",
|
| 685 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.3.project_in.bias": "model.safetensors",
|
| 686 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.3.project_in.weight": "model.safetensors",
|
| 687 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.3.project_out.bias": "model.safetensors",
|
| 688 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.3.project_out.weight": "model.safetensors",
|
| 689 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.4.codebook.cluster_size": "model.safetensors",
|
| 690 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.4.codebook.embed": "model.safetensors",
|
| 691 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.4.codebook.embed_avg": "model.safetensors",
|
| 692 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.4.codebook.inited": "model.safetensors",
|
| 693 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.4.project_in.bias": "model.safetensors",
|
| 694 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.4.project_in.weight": "model.safetensors",
|
| 695 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.4.project_out.bias": "model.safetensors",
|
| 696 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.4.project_out.weight": "model.safetensors",
|
| 697 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.5.codebook.cluster_size": "model.safetensors",
|
| 698 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.5.codebook.embed": "model.safetensors",
|
| 699 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.5.codebook.embed_avg": "model.safetensors",
|
| 700 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.5.codebook.inited": "model.safetensors",
|
| 701 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.5.project_in.bias": "model.safetensors",
|
| 702 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.5.project_in.weight": "model.safetensors",
|
| 703 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.5.project_out.bias": "model.safetensors",
|
| 704 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.5.project_out.weight": "model.safetensors",
|
| 705 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.6.codebook.cluster_size": "model.safetensors",
|
| 706 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.6.codebook.embed": "model.safetensors",
|
| 707 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.6.codebook.embed_avg": "model.safetensors",
|
| 708 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.6.codebook.inited": "model.safetensors",
|
| 709 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.6.project_in.bias": "model.safetensors",
|
| 710 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.6.project_in.weight": "model.safetensors",
|
| 711 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.6.project_out.bias": "model.safetensors",
|
| 712 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.6.project_out.weight": "model.safetensors",
|
| 713 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.7.codebook.cluster_size": "model.safetensors",
|
| 714 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.7.codebook.embed": "model.safetensors",
|
| 715 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.7.codebook.embed_avg": "model.safetensors",
|
| 716 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.7.codebook.inited": "model.safetensors",
|
| 717 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.7.project_in.bias": "model.safetensors",
|
| 718 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.7.project_in.weight": "model.safetensors",
|
| 719 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.7.project_out.bias": "model.safetensors",
|
| 720 |
+
"tied.embedding.modality_embeddings.0.model.quantizer.quantizers.7.project_out.weight": "model.safetensors",
|
| 721 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layer_norm.bias": "model.safetensors",
|
| 722 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layer_norm.weight": "model.safetensors",
|
| 723 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.0.attention.k_proj.bias": "model.safetensors",
|
| 724 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.0.attention.k_proj.weight": "model.safetensors",
|
| 725 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.0.attention.out_proj.bias": "model.safetensors",
|
| 726 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.0.attention.out_proj.weight": "model.safetensors",
|
| 727 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.0.attention.q_proj.bias": "model.safetensors",
|
| 728 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.0.attention.q_proj.weight": "model.safetensors",
|
| 729 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.0.attention.v_proj.bias": "model.safetensors",
|
| 730 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.0.attention.v_proj.weight": "model.safetensors",
|
| 731 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.0.feed_forward.intermediate_dense.bias": "model.safetensors",
|
| 732 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.0.feed_forward.intermediate_dense.weight": "model.safetensors",
|
| 733 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.0.feed_forward.output_dense.bias": "model.safetensors",
|
| 734 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.0.feed_forward.output_dense.weight": "model.safetensors",
|
| 735 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.0.final_layer_norm.bias": "model.safetensors",
|
| 736 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.0.final_layer_norm.weight": "model.safetensors",
|
| 737 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.0.layer_norm.bias": "model.safetensors",
|
| 738 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.0.layer_norm.weight": "model.safetensors",
|
| 739 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.1.attention.k_proj.bias": "model.safetensors",
|
| 740 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.1.attention.k_proj.weight": "model.safetensors",
|
| 741 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.1.attention.out_proj.bias": "model.safetensors",
|
| 742 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.1.attention.out_proj.weight": "model.safetensors",
|
| 743 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.1.attention.q_proj.bias": "model.safetensors",
|
| 744 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.1.attention.q_proj.weight": "model.safetensors",
|
| 745 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.1.attention.v_proj.bias": "model.safetensors",
|
| 746 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.1.attention.v_proj.weight": "model.safetensors",
|
| 747 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.1.feed_forward.intermediate_dense.bias": "model.safetensors",
|
| 748 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.1.feed_forward.intermediate_dense.weight": "model.safetensors",
|
| 749 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.1.feed_forward.output_dense.bias": "model.safetensors",
|
| 750 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.1.feed_forward.output_dense.weight": "model.safetensors",
|
| 751 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.1.final_layer_norm.bias": "model.safetensors",
|
| 752 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.1.final_layer_norm.weight": "model.safetensors",
|
| 753 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.1.layer_norm.bias": "model.safetensors",
|
| 754 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.1.layer_norm.weight": "model.safetensors",
|
| 755 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.10.attention.k_proj.bias": "model.safetensors",
|
| 756 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.10.attention.k_proj.weight": "model.safetensors",
|
| 757 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.10.attention.out_proj.bias": "model.safetensors",
|
| 758 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.10.attention.out_proj.weight": "model.safetensors",
|
| 759 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.10.attention.q_proj.bias": "model.safetensors",
|
| 760 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.10.attention.q_proj.weight": "model.safetensors",
|
| 761 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.10.attention.v_proj.bias": "model.safetensors",
|
| 762 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.10.attention.v_proj.weight": "model.safetensors",
|
| 763 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.10.feed_forward.intermediate_dense.bias": "model.safetensors",
|
| 764 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.10.feed_forward.intermediate_dense.weight": "model.safetensors",
|
| 765 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.10.feed_forward.output_dense.bias": "model.safetensors",
|
| 766 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.10.feed_forward.output_dense.weight": "model.safetensors",
|
| 767 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.10.final_layer_norm.bias": "model.safetensors",
|
| 768 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.10.final_layer_norm.weight": "model.safetensors",
|
| 769 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.10.layer_norm.bias": "model.safetensors",
|
| 770 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.10.layer_norm.weight": "model.safetensors",
|
| 771 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.11.attention.k_proj.bias": "model.safetensors",
|
| 772 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.11.attention.k_proj.weight": "model.safetensors",
|
| 773 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.11.attention.out_proj.bias": "model.safetensors",
|
| 774 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.11.attention.out_proj.weight": "model.safetensors",
|
| 775 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.11.attention.q_proj.bias": "model.safetensors",
|
| 776 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.11.attention.q_proj.weight": "model.safetensors",
|
| 777 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.11.attention.v_proj.bias": "model.safetensors",
|
| 778 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.11.attention.v_proj.weight": "model.safetensors",
|
| 779 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.11.feed_forward.intermediate_dense.bias": "model.safetensors",
|
| 780 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.11.feed_forward.intermediate_dense.weight": "model.safetensors",
|
| 781 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.11.feed_forward.output_dense.bias": "model.safetensors",
|
| 782 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.11.feed_forward.output_dense.weight": "model.safetensors",
|
| 783 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.11.final_layer_norm.bias": "model.safetensors",
|
| 784 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.11.final_layer_norm.weight": "model.safetensors",
|
| 785 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.11.layer_norm.bias": "model.safetensors",
|
| 786 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.11.layer_norm.weight": "model.safetensors",
|
| 787 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.2.attention.k_proj.bias": "model.safetensors",
|
| 788 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.2.attention.k_proj.weight": "model.safetensors",
|
| 789 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.2.attention.out_proj.bias": "model.safetensors",
|
| 790 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.2.attention.out_proj.weight": "model.safetensors",
|
| 791 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.2.attention.q_proj.bias": "model.safetensors",
|
| 792 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.2.attention.q_proj.weight": "model.safetensors",
|
| 793 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.2.attention.v_proj.bias": "model.safetensors",
|
| 794 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.2.attention.v_proj.weight": "model.safetensors",
|
| 795 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.2.feed_forward.intermediate_dense.bias": "model.safetensors",
|
| 796 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.2.feed_forward.intermediate_dense.weight": "model.safetensors",
|
| 797 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.2.feed_forward.output_dense.bias": "model.safetensors",
|
| 798 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.2.feed_forward.output_dense.weight": "model.safetensors",
|
| 799 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.2.final_layer_norm.bias": "model.safetensors",
|
| 800 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.2.final_layer_norm.weight": "model.safetensors",
|
| 801 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.2.layer_norm.bias": "model.safetensors",
|
| 802 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.2.layer_norm.weight": "model.safetensors",
|
| 803 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.3.attention.k_proj.bias": "model.safetensors",
|
| 804 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.3.attention.k_proj.weight": "model.safetensors",
|
| 805 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.3.attention.out_proj.bias": "model.safetensors",
|
| 806 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.3.attention.out_proj.weight": "model.safetensors",
|
| 807 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.3.attention.q_proj.bias": "model.safetensors",
|
| 808 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.3.attention.q_proj.weight": "model.safetensors",
|
| 809 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.3.attention.v_proj.bias": "model.safetensors",
|
| 810 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.3.attention.v_proj.weight": "model.safetensors",
|
| 811 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.3.feed_forward.intermediate_dense.bias": "model.safetensors",
|
| 812 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.3.feed_forward.intermediate_dense.weight": "model.safetensors",
|
| 813 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.3.feed_forward.output_dense.bias": "model.safetensors",
|
| 814 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.3.feed_forward.output_dense.weight": "model.safetensors",
|
| 815 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.3.final_layer_norm.bias": "model.safetensors",
|
| 816 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.3.final_layer_norm.weight": "model.safetensors",
|
| 817 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.3.layer_norm.bias": "model.safetensors",
|
| 818 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.3.layer_norm.weight": "model.safetensors",
|
| 819 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.4.attention.k_proj.bias": "model.safetensors",
|
| 820 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.4.attention.k_proj.weight": "model.safetensors",
|
| 821 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.4.attention.out_proj.bias": "model.safetensors",
|
| 822 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.4.attention.out_proj.weight": "model.safetensors",
|
| 823 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.4.attention.q_proj.bias": "model.safetensors",
|
| 824 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.4.attention.q_proj.weight": "model.safetensors",
|
| 825 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.4.attention.v_proj.bias": "model.safetensors",
|
| 826 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.4.attention.v_proj.weight": "model.safetensors",
|
| 827 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.4.feed_forward.intermediate_dense.bias": "model.safetensors",
|
| 828 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.4.feed_forward.intermediate_dense.weight": "model.safetensors",
|
| 829 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.4.feed_forward.output_dense.bias": "model.safetensors",
|
| 830 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.4.feed_forward.output_dense.weight": "model.safetensors",
|
| 831 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.4.final_layer_norm.bias": "model.safetensors",
|
| 832 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.4.final_layer_norm.weight": "model.safetensors",
|
| 833 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.4.layer_norm.bias": "model.safetensors",
|
| 834 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.4.layer_norm.weight": "model.safetensors",
|
| 835 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.5.attention.k_proj.bias": "model.safetensors",
|
| 836 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.5.attention.k_proj.weight": "model.safetensors",
|
| 837 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.5.attention.out_proj.bias": "model.safetensors",
|
| 838 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.5.attention.out_proj.weight": "model.safetensors",
|
| 839 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.5.attention.q_proj.bias": "model.safetensors",
|
| 840 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.5.attention.q_proj.weight": "model.safetensors",
|
| 841 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.5.attention.v_proj.bias": "model.safetensors",
|
| 842 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.5.attention.v_proj.weight": "model.safetensors",
|
| 843 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.5.feed_forward.intermediate_dense.bias": "model.safetensors",
|
| 844 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.5.feed_forward.intermediate_dense.weight": "model.safetensors",
|
| 845 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.5.feed_forward.output_dense.bias": "model.safetensors",
|
| 846 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.5.feed_forward.output_dense.weight": "model.safetensors",
|
| 847 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.5.final_layer_norm.bias": "model.safetensors",
|
| 848 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.5.final_layer_norm.weight": "model.safetensors",
|
| 849 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.5.layer_norm.bias": "model.safetensors",
|
| 850 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.5.layer_norm.weight": "model.safetensors",
|
| 851 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.6.attention.k_proj.bias": "model.safetensors",
|
| 852 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.6.attention.k_proj.weight": "model.safetensors",
|
| 853 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.6.attention.out_proj.bias": "model.safetensors",
|
| 854 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.6.attention.out_proj.weight": "model.safetensors",
|
| 855 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.6.attention.q_proj.bias": "model.safetensors",
|
| 856 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.6.attention.q_proj.weight": "model.safetensors",
|
| 857 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.6.attention.v_proj.bias": "model.safetensors",
|
| 858 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.6.attention.v_proj.weight": "model.safetensors",
|
| 859 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.6.feed_forward.intermediate_dense.bias": "model.safetensors",
|
| 860 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.6.feed_forward.intermediate_dense.weight": "model.safetensors",
|
| 861 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.6.feed_forward.output_dense.bias": "model.safetensors",
|
| 862 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.6.feed_forward.output_dense.weight": "model.safetensors",
|
| 863 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.6.final_layer_norm.bias": "model.safetensors",
|
| 864 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.6.final_layer_norm.weight": "model.safetensors",
|
| 865 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.6.layer_norm.bias": "model.safetensors",
|
| 866 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.6.layer_norm.weight": "model.safetensors",
|
| 867 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.7.attention.k_proj.bias": "model.safetensors",
|
| 868 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.7.attention.k_proj.weight": "model.safetensors",
|
| 869 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.7.attention.out_proj.bias": "model.safetensors",
|
| 870 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.7.attention.out_proj.weight": "model.safetensors",
|
| 871 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.7.attention.q_proj.bias": "model.safetensors",
|
| 872 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.7.attention.q_proj.weight": "model.safetensors",
|
| 873 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.7.attention.v_proj.bias": "model.safetensors",
|
| 874 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.7.attention.v_proj.weight": "model.safetensors",
|
| 875 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.7.feed_forward.intermediate_dense.bias": "model.safetensors",
|
| 876 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.7.feed_forward.intermediate_dense.weight": "model.safetensors",
|
| 877 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.7.feed_forward.output_dense.bias": "model.safetensors",
|
| 878 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.7.feed_forward.output_dense.weight": "model.safetensors",
|
| 879 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.7.final_layer_norm.bias": "model.safetensors",
|
| 880 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.7.final_layer_norm.weight": "model.safetensors",
|
| 881 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.7.layer_norm.bias": "model.safetensors",
|
| 882 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.7.layer_norm.weight": "model.safetensors",
|
| 883 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.8.attention.k_proj.bias": "model.safetensors",
|
| 884 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.8.attention.k_proj.weight": "model.safetensors",
|
| 885 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.8.attention.out_proj.bias": "model.safetensors",
|
| 886 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.8.attention.out_proj.weight": "model.safetensors",
|
| 887 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.8.attention.q_proj.bias": "model.safetensors",
|
| 888 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.8.attention.q_proj.weight": "model.safetensors",
|
| 889 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.8.attention.v_proj.bias": "model.safetensors",
|
| 890 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.8.attention.v_proj.weight": "model.safetensors",
|
| 891 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.8.feed_forward.intermediate_dense.bias": "model.safetensors",
|
| 892 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.8.feed_forward.intermediate_dense.weight": "model.safetensors",
|
| 893 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.8.feed_forward.output_dense.bias": "model.safetensors",
|
| 894 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.8.feed_forward.output_dense.weight": "model.safetensors",
|
| 895 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.8.final_layer_norm.bias": "model.safetensors",
|
| 896 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.8.final_layer_norm.weight": "model.safetensors",
|
| 897 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.8.layer_norm.bias": "model.safetensors",
|
| 898 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.8.layer_norm.weight": "model.safetensors",
|
| 899 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.9.attention.k_proj.bias": "model.safetensors",
|
| 900 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.9.attention.k_proj.weight": "model.safetensors",
|
| 901 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.9.attention.out_proj.bias": "model.safetensors",
|
| 902 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.9.attention.out_proj.weight": "model.safetensors",
|
| 903 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.9.attention.q_proj.bias": "model.safetensors",
|
| 904 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.9.attention.q_proj.weight": "model.safetensors",
|
| 905 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.9.attention.v_proj.bias": "model.safetensors",
|
| 906 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.9.attention.v_proj.weight": "model.safetensors",
|
| 907 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.9.feed_forward.intermediate_dense.bias": "model.safetensors",
|
| 908 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.9.feed_forward.intermediate_dense.weight": "model.safetensors",
|
| 909 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.9.feed_forward.output_dense.bias": "model.safetensors",
|
| 910 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.9.feed_forward.output_dense.weight": "model.safetensors",
|
| 911 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.9.final_layer_norm.bias": "model.safetensors",
|
| 912 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.9.final_layer_norm.weight": "model.safetensors",
|
| 913 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.9.layer_norm.bias": "model.safetensors",
|
| 914 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.9.layer_norm.weight": "model.safetensors",
|
| 915 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.pos_conv_embed.conv.bias": "model.safetensors",
|
| 916 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.pos_conv_embed.conv.parametrizations.weight.original0": "model.safetensors",
|
| 917 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.encoder.pos_conv_embed.conv.parametrizations.weight.original1": "model.safetensors",
|
| 918 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.feature_extractor.conv_layers.0.conv.weight": "model.safetensors",
|
| 919 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.feature_extractor.conv_layers.0.layer_norm.bias": "model.safetensors",
|
| 920 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.feature_extractor.conv_layers.0.layer_norm.weight": "model.safetensors",
|
| 921 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.feature_extractor.conv_layers.1.conv.weight": "model.safetensors",
|
| 922 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.feature_extractor.conv_layers.2.conv.weight": "model.safetensors",
|
| 923 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.feature_extractor.conv_layers.3.conv.weight": "model.safetensors",
|
| 924 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.feature_extractor.conv_layers.4.conv.weight": "model.safetensors",
|
| 925 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.feature_extractor.conv_layers.5.conv.weight": "model.safetensors",
|
| 926 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.feature_extractor.conv_layers.6.conv.weight": "model.safetensors",
|
| 927 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.feature_projection.layer_norm.bias": "model.safetensors",
|
| 928 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.feature_projection.layer_norm.weight": "model.safetensors",
|
| 929 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.feature_projection.projection.bias": "model.safetensors",
|
| 930 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.feature_projection.projection.weight": "model.safetensors",
|
| 931 |
+
"tied.embedding.modality_embeddings.0.model.semantic_model.masked_spec_embed": "model.safetensors",
|
| 932 |
+
"tied.embedding.text_embedding.weight": "model.safetensors"
|
| 933 |
+
}
|
| 934 |
+
}
|
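The weight map above is a flat JSON object from tensor name to shard file, so it can be inspected with ordinary dictionary tools. The following is a minimal sketch, not part of the commit, that groups tensor names by encoder layer index. `WEIGHT_MAP` holds a handful of entries copied from the diff, and `group_by_layer` and `LAYER_RE` are hypothetical helper names introduced for illustration.

```python
import re
from collections import defaultdict

# A few entries copied from the weight map above; the real index lists many more.
WEIGHT_MAP = {
    "tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.3.attention.k_proj.bias": "model.safetensors",
    "tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.3.attention.k_proj.weight": "model.safetensors",
    "tied.embedding.modality_embeddings.0.model.semantic_model.encoder.layers.9.layer_norm.weight": "model.safetensors",
    "tied.embedding.modality_embeddings.0.model.semantic_model.encoder.pos_conv_embed.conv.bias": "model.safetensors",
    "tied.embedding.text_embedding.weight": "model.safetensors",
}

# Matches the numeric layer index in names such as "...encoder.layers.3.attention...".
LAYER_RE = re.compile(r"\.encoder\.layers\.(\d+)\.")


def group_by_layer(weight_map):
    """Group tensor names by encoder layer index; names without one go under None."""
    groups = defaultdict(list)
    for name in weight_map:
        match = LAYER_RE.search(name)
        layer = int(match.group(1)) if match else None
        groups[layer].append(name)
    return dict(groups)


groups = group_by_layer(WEIGHT_MAP)
shards = set(WEIGHT_MAP.values())
```

In this excerpt every tensor points at the same file, so `shards` contains only `model.safetensors`.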
tokenizer.json
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:eb883de2de5adc5113f1f02b54830a0ea7cd6ef191cde65c41aceb3737d4d1c1
|
| 3 |
+
size 11433924
|
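The three added lines are a Git LFS pointer: the repository stores this small text stub while the actual `tokenizer.json` lives in LFS storage. As a rough sketch, the pointer can be read as space-separated key/value lines. `parse_lfs_pointer` is a hypothetical helper name, and the parsing assumes exactly the `version`, `oid`, and `size` keys shown in the diff.

```python
# Pointer text copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:eb883de2de5adc5113f1f02b54830a0ea7cd6ef191cde65c41aceb3737d4d1c1
size 11433924
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line, then split the oid into hash algorithm and digest."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = parse_lfs_pointer(POINTER)
```

The `size` field says the real tokenizer file is 11433924 bytes, which is consistent with the `.gitattributes` change that routes `tokenizer.json` through LFS.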
tokenizer_config.json
ADDED
|
@@ -0,0 +1,78 @@
|
| 1 |
+
{
|
| 2 |
+
"add_prefix_space": false,
|
| 3 |
+
"backend": "tokenizers",
|
| 4 |
+
"bos_token": null,
|
| 5 |
+
"clean_up_tokenization_spaces": false,
|
| 6 |
+
"eos_token": "<|endoftext|>",
|
| 7 |
+
"errors": "replace",
|
| 8 |
+
"extra_special_tokens": [
|
| 9 |
+
"<|asr|>",
|
| 10 |
+
"<|streaming_asr|>",
|
| 11 |
+
"<|tts|>",
|
| 12 |
+
"<|streaming_tts|>",
|
| 13 |
+
"<|audio_cont_txt|>",
|
| 14 |
+
"<|audio|>",
|
| 15 |
+
"<|audio_end|>",
|
| 16 |
+
"<|text|>",
|
| 17 |
+
"<|text_end|>",
|
| 18 |
+
"<|eoc|>",
|
| 19 |
+
"<|user|>",
|
| 20 |
+
"<|assistant|>",
|
| 21 |
+
"<|system|>",
|
| 22 |
+
"<|await_audio|>",
|
| 23 |
+
"<|ref_audio|>",
|
| 24 |
+
"<|ref_text|>",
|
| 25 |
+
"<|emotion:elation|>",
|
| 26 |
+
"<|emotion:amusement|>",
|
| 27 |
+
"<|emotion:enthusiasm|>",
|
| 28 |
+
"<|emotion:determination|>",
|
| 29 |
+
"<|emotion:pride|>",
|
| 30 |
+
"<|emotion:contentment|>",
|
| 31 |
+
"<|emotion:affection|>",
|
| 32 |
+
"<|emotion:relief|>",
|
| 33 |
+
"<|emotion:contemplation|>",
|
| 34 |
+
"<|emotion:confusion|>",
|
| 35 |
+
"<|emotion:surprise|>",
|
| 36 |
+
"<|emotion:awe|>",
|
| 37 |
+
"<|emotion:longing|>",
|
| 38 |
+
"<|emotion:arousal|>",
|
| 39 |
+
"<|emotion:anger|>",
|
| 40 |
+
"<|emotion:fear|>",
|
| 41 |
+
"<|emotion:disgust|>",
|
| 42 |
+
"<|emotion:bitterness|>",
|
| 43 |
+
"<|emotion:sadness|>",
|
| 44 |
+
"<|emotion:shame|>",
|
| 45 |
+
"<|emotion:helplessness|>",
|
| 46 |
+
"<|env:music|>",
|
| 47 |
+
"<|env:noise|>",
|
| 48 |
+
"<|style:singing|>",
|
| 49 |
+
"<|style:shouting|>",
|
| 50 |
+
"<|style:whispering|>",
|
| 51 |
+
"<|sfx:cough|>",
|
| 52 |
+
"<|sfx:laughter|>",
|
| 53 |
+
"<|sfx:crying|>",
|
| 54 |
+
"<|sfx:screaming|>",
|
| 55 |
+
"<|sfx:burping|>",
|
| 56 |
+
"<|sfx:humming|>",
|
| 57 |
+
"<|sfx:sigh|>",
|
| 58 |
+
"<|sfx:sniff|>",
|
| 59 |
+
"<|sfx:sneeze|>",
|
| 60 |
+
"<|prosody:speed_very_slow|>",
|
| 61 |
+
"<|prosody:speed_slow|>",
|
| 62 |
+
"<|prosody:speed_fast|>",
|
| 63 |
+
"<|prosody:speed_very_fast|>",
|
| 64 |
+
"<|prosody:pitch_low|>",
|
| 65 |
+
"<|prosody:pitch_high|>",
|
| 66 |
+
"<|prosody:pause|>",
|
| 67 |
+
"<|prosody:long_pause|>",
|
| 68 |
+
"<|chatml|>",
|
| 69 |
+
"<|prosody:expressive_high|>",
|
| 70 |
+
"<|prosody:expressive_low|>"
|
| 71 |
+
],
|
| 72 |
+
"is_local": true,
|
| 73 |
+
"model_max_length": 131072,
|
| 74 |
+
"pad_token": "<|endoftext|>",
|
| 75 |
+
"split_special_tokens": false,
|
| 76 |
+
"tokenizer_class": "Qwen2Tokenizer",
|
| 77 |
+
"unk_token": null
|
| 78 |
+
}
|
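Most entries in `extra_special_tokens` follow a `<|category:value|>` pattern (`emotion`, `env`, `style`, `sfx`, `prosody`), while the rest are bare tags such as `<|tts|>`. The sketch below, which is not part of the commit, groups a sample of the tokens by category using only the standard library. `TOKENS` is a subset copied from the config, `group_tokens` and `TAG_RE` are hypothetical names, and the `"control"` bucket for bare tags is a label chosen here for illustration, not a term from the model's documentation.

```python
import re
from collections import defaultdict

# A sample of the special tokens listed in tokenizer_config.json above.
TOKENS = [
    "<|tts|>",
    "<|emotion:elation|>",
    "<|emotion:anger|>",
    "<|env:music|>",
    "<|style:whispering|>",
    "<|sfx:laughter|>",
    "<|prosody:speed_slow|>",
    "<|prosody:pause|>",
    "<|chatml|>",
]

# "<|name|>" or "<|category:value|>"; the ":value" part is optional.
TAG_RE = re.compile(r"^<\|([a-z_]+)(?::([a-z_]+))?\|>$")


def group_tokens(tokens):
    """Group tokens by category; bare tags go under the assumed 'control' bucket."""
    groups = defaultdict(list)
    for token in tokens:
        match = TAG_RE.match(token)
        if match is None:
            continue  # skip anything that is not a <|...|> tag
        category, value = match.group(1), match.group(2)
        if value is None:
            groups["control"].append(category)
        else:
            groups[category].append(value)
    return dict(groups)


grouped = group_tokens(TOKENS)
```

How these tags are meant to be placed in a prompt is not shown in this diff, so the sketch only classifies them.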