wavy-jung committed
Commit 43402ae · verified · 1 Parent(s): 273bbfc

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ assets/performance/flops-vs-mmlu.jpg filter=lfs diff=lfs merge=lfs -text
37
+ assets/performance/kanana-1.5-radar.png filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,194 @@
1
+ Apache License
2
+ ==============
3
+
4
+ _Version 2.0, January 2004_
5
+ _&lt;<http://www.apache.org/licenses/>&gt;_
6
+
7
+ ### Terms and Conditions for use, reproduction, and distribution
8
+
9
+ #### 1. Definitions
10
+
11
+ “License” shall mean the terms and conditions for use, reproduction, and
12
+ distribution as defined by Sections 1 through 9 of this document.
13
+
14
+ “Licensor” shall mean the copyright owner or entity authorized by the copyright
15
+ owner that is granting the License.
16
+
17
+ “Legal Entity” shall mean the union of the acting entity and all other entities
18
+ that control, are controlled by, or are under common control with that entity.
19
+ For the purposes of this definition, “control” means **(i)** the power, direct or
20
+ indirect, to cause the direction or management of such entity, whether by
21
+ contract or otherwise, or **(ii)** ownership of fifty percent (50%) or more of the
22
+ outstanding shares, or **(iii)** beneficial ownership of such entity.
23
+
24
+ “You” (or “Your”) shall mean an individual or Legal Entity exercising
25
+ permissions granted by this License.
26
+
27
+ “Source” form shall mean the preferred form for making modifications, including
28
+ but not limited to software source code, documentation source, and configuration
29
+ files.
30
+
31
+ “Object” form shall mean any form resulting from mechanical transformation or
32
+ translation of a Source form, including but not limited to compiled object code,
33
+ generated documentation, and conversions to other media types.
34
+
35
+ “Work” shall mean the work of authorship, whether in Source or Object form, made
36
+ available under the License, as indicated by a copyright notice that is included
37
+ in or attached to the work (an example is provided in the Appendix below).
38
+
39
+ “Derivative Works” shall mean any work, whether in Source or Object form, that
40
+ is based on (or derived from) the Work and for which the editorial revisions,
41
+ annotations, elaborations, or other modifications represent, as a whole, an
42
+ original work of authorship. For the purposes of this License, Derivative Works
43
+ shall not include works that remain separable from, or merely link (or bind by
44
+ name) to the interfaces of, the Work and Derivative Works thereof.
45
+
46
+ “Contribution” shall mean any work of authorship, including the original version
47
+ of the Work and any modifications or additions to that Work or Derivative Works
48
+ thereof, that is intentionally submitted to Licensor for inclusion in the Work
49
+ by the copyright owner or by an individual or Legal Entity authorized to submit
50
+ on behalf of the copyright owner. For the purposes of this definition,
51
+ “submitted” means any form of electronic, verbal, or written communication sent
52
+ to the Licensor or its representatives, including but not limited to
53
+ communication on electronic mailing lists, source code control systems, and
54
+ issue tracking systems that are managed by, or on behalf of, the Licensor for
55
+ the purpose of discussing and improving the Work, but excluding communication
56
+ that is conspicuously marked or otherwise designated in writing by the copyright
57
+ owner as “Not a Contribution.”
58
+
59
+ “Contributor” shall mean Licensor and any individual or Legal Entity on behalf
60
+ of whom a Contribution has been received by Licensor and subsequently
61
+ incorporated within the Work.
62
+
63
+ #### 2. Grant of Copyright License
64
+
65
+ Subject to the terms and conditions of this License, each Contributor hereby
66
+ grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free,
67
+ irrevocable copyright license to reproduce, prepare Derivative Works of,
68
+ publicly display, publicly perform, sublicense, and distribute the Work and such
69
+ Derivative Works in Source or Object form.
70
+
71
+ #### 3. Grant of Patent License
72
+
73
+ Subject to the terms and conditions of this License, each Contributor hereby
74
+ grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free,
75
+ irrevocable (except as stated in this section) patent license to make, have
76
+ made, use, offer to sell, sell, import, and otherwise transfer the Work, where
77
+ such license applies only to those patent claims licensable by such Contributor
78
+ that are necessarily infringed by their Contribution(s) alone or by combination
79
+ of their Contribution(s) with the Work to which such Contribution(s) was
80
+ submitted. If You institute patent litigation against any entity (including a
81
+ cross-claim or counterclaim in a lawsuit) alleging that the Work or a
82
+ Contribution incorporated within the Work constitutes direct or contributory
83
+ patent infringement, then any patent licenses granted to You under this License
84
+ for that Work shall terminate as of the date such litigation is filed.
85
+
86
+ #### 4. Redistribution
87
+
88
+ You may reproduce and distribute copies of the Work or Derivative Works thereof
89
+ in any medium, with or without modifications, and in Source or Object form,
90
+ provided that You meet the following conditions:
91
+
92
+ * **(a)** You must give any other recipients of the Work or Derivative Works a copy of
93
+ this License; and
94
+ * **(b)** You must cause any modified files to carry prominent notices stating that You
95
+ changed the files; and
96
+ * **(c)** You must retain, in the Source form of any Derivative Works that You distribute,
97
+ all copyright, patent, trademark, and attribution notices from the Source form
98
+ of the Work, excluding those notices that do not pertain to any part of the
99
+ Derivative Works; and
100
+ * **(d)** If the Work includes a “NOTICE” text file as part of its distribution, then any
101
+ Derivative Works that You distribute must include a readable copy of the
102
+ attribution notices contained within such NOTICE file, excluding those notices
103
+ that do not pertain to any part of the Derivative Works, in at least one of the
104
+ following places: within a NOTICE text file distributed as part of the
105
+ Derivative Works; within the Source form or documentation, if provided along
106
+ with the Derivative Works; or, within a display generated by the Derivative
107
+ Works, if and wherever such third-party notices normally appear. The contents of
108
+ the NOTICE file are for informational purposes only and do not modify the
109
+ License. You may add Your own attribution notices within Derivative Works that
110
+ You distribute, alongside or as an addendum to the NOTICE text from the Work,
111
+ provided that such additional attribution notices cannot be construed as
112
+ modifying the License.
113
+
114
+ You may add Your own copyright statement to Your modifications and may provide
115
+ additional or different license terms and conditions for use, reproduction, or
116
+ distribution of Your modifications, or for any such Derivative Works as a whole,
117
+ provided Your use, reproduction, and distribution of the Work otherwise complies
118
+ with the conditions stated in this License.
119
+
120
+ #### 5. Submission of Contributions
121
+
122
+ Unless You explicitly state otherwise, any Contribution intentionally submitted
123
+ for inclusion in the Work by You to the Licensor shall be under the terms and
124
+ conditions of this License, without any additional terms or conditions.
125
+ Notwithstanding the above, nothing herein shall supersede or modify the terms of
126
+ any separate license agreement you may have executed with Licensor regarding
127
+ such Contributions.
128
+
129
+ #### 6. Trademarks
130
+
131
+ This License does not grant permission to use the trade names, trademarks,
132
+ service marks, or product names of the Licensor, except as required for
133
+ reasonable and customary use in describing the origin of the Work and
134
+ reproducing the content of the NOTICE file.
135
+
136
+ #### 7. Disclaimer of Warranty
137
+
138
+ Unless required by applicable law or agreed to in writing, Licensor provides the
139
+ Work (and each Contributor provides its Contributions) on an “AS IS” BASIS,
140
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied,
141
+ including, without limitation, any warranties or conditions of TITLE,
142
+ NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are
143
+ solely responsible for determining the appropriateness of using or
144
+ redistributing the Work and assume any risks associated with Your exercise of
145
+ permissions under this License.
146
+
147
+ #### 8. Limitation of Liability
148
+
149
+ In no event and under no legal theory, whether in tort (including negligence),
150
+ contract, or otherwise, unless required by applicable law (such as deliberate
151
+ and grossly negligent acts) or agreed to in writing, shall any Contributor be
152
+ liable to You for damages, including any direct, indirect, special, incidental,
153
+ or consequential damages of any character arising as a result of this License or
154
+ out of the use or inability to use the Work (including but not limited to
155
+ damages for loss of goodwill, work stoppage, computer failure or malfunction, or
156
+ any and all other commercial damages or losses), even if such Contributor has
157
+ been advised of the possibility of such damages.
158
+
159
+ #### 9. Accepting Warranty or Additional Liability
160
+
161
+ While redistributing the Work or Derivative Works thereof, You may choose to
162
+ offer, and charge a fee for, acceptance of support, warranty, indemnity, or
163
+ other liability obligations and/or rights consistent with this License. However,
164
+ in accepting such obligations, You may act only on Your own behalf and on Your
165
+ sole responsibility, not on behalf of any other Contributor, and only if You
166
+ agree to indemnify, defend, and hold each Contributor harmless for any liability
167
+ incurred by, or claims asserted against, such Contributor by reason of your
168
+ accepting any such warranty or additional liability.
169
+
170
+ _END OF TERMS AND CONDITIONS_
171
+
172
+ ### APPENDIX: How to apply the Apache License to your work
173
+
174
+ To apply the Apache License to your work, attach the following boilerplate
175
+ notice, with the fields enclosed by brackets `[]` replaced with your own
176
+ identifying information. (Don't include the brackets!) The text should be
177
+ enclosed in the appropriate comment syntax for the file format. We also
178
+ recommend that a file or class name and description of purpose be included on
179
+ the same “printed page” as the copyright notice for easier identification within
180
+ third-party archives.
181
+
182
+ Copyright [yyyy] [name of copyright owner]
183
+
184
+ Licensed under the Apache License, Version 2.0 (the "License");
185
+ you may not use this file except in compliance with the License.
186
+ You may obtain a copy of the License at
187
+
188
+ http://www.apache.org/licenses/LICENSE-2.0
189
+
190
+ Unless required by applicable law or agreed to in writing, software
191
+ distributed under the License is distributed on an "AS IS" BASIS,
192
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
193
+ See the License for the specific language governing permissions and
194
+ limitations under the License.
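Following the appendix above, here is a minimal sketch of the boilerplate notice placed at the top of a Python source file using `#` comments. The year and owner (`2025 Example Corp.`) are hypothetical placeholders for the `[yyyy] [name of copyright owner]` fields, and `has_apache_header` is an illustrative helper, not part of any licensing tool.

```python
# Copyright 2025 Example Corp.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# NOTE: "2025 Example Corp." is a hypothetical placeholder for the
# [yyyy] [name of copyright owner] fields described in the appendix.
"""Hypothetical module showing where the Apache-2.0 boilerplate notice goes."""

# The key line a simple checker can look for in a file's header comments.
APACHE_MARKER = 'Licensed under the Apache License, Version 2.0 (the "License");'


def has_apache_header(source: str, max_lines: int = 20) -> bool:
    """Return True if APACHE_MARKER appears as a comment within the first max_lines lines."""
    for line in source.splitlines()[:max_lines]:
        # Drop the leading "#" comment prefix and spaces before comparing.
        if line.lstrip("# ").strip() == APACHE_MARKER:
            return True
    return False
```

A header check like this only looks at the top of a file, which matches the appendix's advice to keep the notice on the same "printed page" as the file's description.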
README.md CHANGED
@@ -3,131 +3,1014 @@ language:
3
  - en
4
  - ko
5
  library_name: transformers
6
- license: unlicense
7
  pipeline_tag: text-generation
8
  model_id: kakaocorp/kanana-1.5-2.1b-base
9
  repo: kakaocorp/kanana-1.5-2.1b-base
10
- developers: KananaAlpha LLM
11
  training_regime: bf16 mixed precision
12
- results: '| mmlu (5-shots) [acc] | kmmlu-direct (5-shots) [exact_match] | haerae (5-shots) [acc_norm] | gsm8k (5-shots) [exact_match_strict] | humaneval (0-shots) [pass@1] | mbpp (3-shots) [pass@1] |
13
- |------------------------|----------------------------------------|-------------------------------|----------------------------------------|--------------------------------|---------------------------|
14
- | 56.26 | 45.25 | 76.72 | 53.60 | 53.66 | 56.65 |'
15
- model_summary: Kanana-1.5-2.1b-base is an auto-regressive language model that
16
- uses an optimized transformer architecture. Kanana-1.5-2.1b-base uses a tokenizer
17
- with a vocabulary of 128K tokens, and supports sequence length of 32k.
18
- Grouped-Query Attention (GQA) is used for all models to improve inference efficiency.
19
- training_data: Kanana-1.5-2.1b-base was continuously pretrained from kakaocorp/kanana-essence-2.1b-dus-v1.0.0. Neither the pretraining nor the fine-tuning datasets include Kakao user data.
20
- model-index:
21
- - name: kanana-1.5-2.1b-base
22
- results:
23
- - task:
24
- type: multiple_choice
25
- name: mmlu
26
- dataset:
27
- name: mmlu (5-shots)
28
- type: hails/mmlu_no_train
29
- metrics:
30
- - type: acc
31
- value: 56.26
32
- name: acc
33
- - task:
34
- type: generate_until
35
- name: kmmlu
36
- dataset:
37
- name: kmmlu-direct (5-shots)
38
- type: HAERAE-HUB/KMMLU
39
- metrics:
40
- - type: exact_match
41
- value: 45.25
42
- name: exact_match
43
- - task:
44
- type: multiple_choice
45
- name: haerae
46
- dataset:
47
- name: haerae (5-shots)
48
- type: HAERAE-HUB/HAE_RAE_BENCH
49
- metrics:
50
- - type: acc_norm
51
- value: 76.72
52
- name: acc_norm
53
- - task:
54
- type: generate_until
55
- name: gsm8k
56
- dataset:
57
- name: gsm8k (5-shots)
58
- type: openai/gsm8k
59
- metrics:
60
- - type: exact_match
61
- value: 53.60
62
- name: exact_match_strict
63
- - task:
64
- type: generate_until
65
- name: humaneval
66
- dataset:
67
- name: humaneval (0-shots)
68
- type: openai/openai_humaneval
69
- metrics:
70
- - type: pass@1
71
- value: 53.66
72
- name: pass@1
73
- - task:
74
- type: generate_until
75
- name: mbpp
76
- dataset:
77
- name: mbpp (3-shots)
78
- type: google-research-datasets/mbpp
79
- metrics:
80
- - type: pass@1
81
- value: 56.65
82
- name: pass@1
83
  ---
84
- # Model Card for kakaocorp/kanana-1.5-2.1b-base
85
 
86
- <!-- Provide a quick summary of what the model is/does. -->
87
 
88
- Kanana-1.5-2.1b-base is an auto-regressive language model that uses an optimized transformer architecture. Kanana-1.5-2.1b-base uses a tokenizer with a vocabulary of 128K tokens, and supports sequence length of 32k. Grouped-Query Attention (GQA) is used for all models to improve inference efficiency.
89
 
90
- ## Model Details
91
 
92
- ### Model Description
93
 
94
- <!-- Provide a longer summary of what this model is. -->
95
 
96
 
97
 
98
- - **Developed by:** KananaAlpha LLM
99
- - **Language(s) (NLP):** ['en', 'ko']
100
- - **License:** unlicense
101
 
102
- ## Training Details
103
 
104
- ### Training Data
105
 
106
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
107
 
108
- Kanana-1.5-2.1b-base was continuously pretrained from kakaocorp/kanana-essence-2.1b-v1.0.0. Neither the pretraining nor the fine-tuning datasets include Kakao user data.
109
 
110
- ### Training Procedure
111
 
112
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
113
 
114
- #### Training Hyperparameters
115
 
116
- - **Training regime:** bf16 mixed precision <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
117
 
118
- ## Evaluation
119
 
120
- <!-- This section describes the evaluation protocols and provides the results. -->
121
 
122
- ### Results for General Tasks
123
 
124
- | mmlu (5-shots) [acc] | kmmlu-direct (5-shots) [exact_match] | haerae (5-shots) [acc_norm] | gsm8k (5-shots) [exact_match_strict] | humaneval (0-shots) [pass@1] | mbpp (3-shots) [pass@1] |
125
- |------------------------|----------------------------------------|-------------------------------|----------------------------------------|--------------------------------|---------------------------|
126
- | 56.26 | 45.25 | 76.72 | 53.60 | 53.66 | 56.65 |
127
 
128
- ### Results for Long-Context Tasks
129
- | context length | ruler_niah_mk_2 [ruler_recall] | ruler_niah_mk_3 [ruler_recall] | ruler_niah_mv [ruler_recall] | json_kv [substring_exact_match] | niah [avg] | avg |
130
- |---------------|--------------------------------|--------------------------------|------------------------------|----------------------------------|------------|-------|
131
- | 8192 | 100.00 | 99.00 | 97.00 | 100.00 | 98.92 | 98.98 |
132
- | 16384 | 99.00 | 97.00 | 95.75 | 100.00 | 99.21 | 98.19 |
133
- | 32768 | 95.00 | 95.00 | 86.00 | 100.00 | 99.07 | 95.01 |
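The `avg` column above is not defined in the card. Judging from the numbers, it matches the unweighted mean of the five score columns to its left (including `niah [avg]`). The sketch below recomputes it from the listed values; treat this reading as an inference from the data, not a documented definition.

```python
# Scores copied from the long-context table, in column order:
# ruler_niah_mk_2, ruler_niah_mk_3, ruler_niah_mv, json_kv, niah [avg]
rows = {
    8192: [100.00, 99.00, 97.00, 100.00, 98.92],
    16384: [99.00, 97.00, 95.75, 100.00, 99.21],
    32768: [95.00, 95.00, 86.00, 100.00, 99.07],
}
# Values from the table's final "avg" column.
reported_avg = {8192: 98.98, 16384: 98.19, 32768: 95.01}


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


for length, scores in rows.items():
    recomputed = round(mean(scores), 2)
    print(f"{length:>6} tokens: recomputed {recomputed:.2f}, reported {reported_avg[length]:.2f}")
```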
3
  - en
4
  - ko
5
  library_name: transformers
6
+ license: apache-2.0
7
  pipeline_tag: text-generation
8
  model_id: kakaocorp/kanana-1.5-2.1b-base
9
  repo: kakaocorp/kanana-1.5-2.1b-base
10
+ developers: Kanana LLM
11
  training_regime: bf16 mixed precision
12
  ---
13
+ # Kanana
14
+ <p align="center">
15
+ <br>
16
+ <picture>
17
+ <source media="(prefers-color-scheme: dark)" srcset="assets/logo/kanana-logo-light.png">
18
+ <source media="(prefers-color-scheme: light)" srcset="assets/logo/kanana-logo-dark.png">
19
+ <img alt="Kanana Logo" src="assets/logo/kanana-logo-light.png" width="400" style="margin: 40px auto;">
20
+ </picture>
21
+ <br>
22
+ <p align="center">
23
+ 🤗 <a href="https://kko.kakao.com/kananallm">HF Models</a> &nbsp; |
24
+ &nbsp; 📕 <a href="https://tech.kakao.com/posts/707">Blog Post</a> &nbsp; |
25
+ &nbsp; 📜 <a href="https://arxiv.org/abs/2502.18934">Technical Report</a>
26
 
27
 
28
+ <br>
29
 
30
+ ## 🔥 News
31
 
32
+ - ✨`2025/05/23`: Published a [blog post](https://tech.kakao.com/posts/707) about `Kanana 1.5` models and released 🤗[HF model weights](https://kko.kakao.com/kananallm).
33
+ - 📜`2025/02/27`: Released [Technical Report](https://arxiv.org/abs/2502.18934) and 🤗[HF model weights](https://huggingface.co/collections/kakaocorp/kanana-nano-21b-67a326cda1c449c8d4172259).
34
+ - 📕`2025/01/10`: Published a [blog post](https://tech.kakao.com/posts/682) about the development of `Kanana Nano` model.
35
+ - 📕`2024/11/14`: Published blog posts ([pre-training](https://tech.kakao.com/posts/661), [post-training](https://tech.kakao.com/posts/662)) about the development of `Kanana` models.
36
+ - ▶️`2024/11/06`: Published a [presentation video](https://youtu.be/HTBl142x9GI?si=o_we6t9suYK8DfX3) about the development of the `Kanana` models.
37
 
38
+ <br>
39
 
40
+ ## Table of Contents
41
 
42
+ - [Kanana 1.5](#kanana-15)
43
+ - [Performance](#performance)
44
+ - [Kanana 1.5 Base Models](#kanana-15-base-models)
45
+ - [Kanana 1.5 Instruct Models](#kanana-15-instruct-models)
46
+ - [Long Context](#long-context)
47
+ - [Processing 32K+ Length](#processing-32k-length)
48
+ - [Kanana 1.0](#kanana-10)
49
+ - [License](#license)
50
+ - [Citation](#citation)
51
+ - [Contributors](#contributors)
52
+ - [Contact](#contact)
53
 
54
+ <br>
55
 
56
+ ## Kanana 1.5
57
+ `Kanana 1.5`, a newly introduced version of the Kanana model family, substantially improves **coding, mathematics, and function-calling capabilities** over the previous version, extending its applicability to more complex real-world problems. It can now handle __up to 32K tokens natively and up to 128K tokens with YaRN__, which helps the model stay coherent over long documents and extended conversations. Kanana 1.5 also delivers more natural and accurate conversations through a __refined post-training process__.
58
 
59
+ <p align="center">
60
+ <picture>
61
+ <img src="assets/performance/kanana-1.5-radar.png" width="700" style="margin: 40px auto;">
62
+ </picture>
63
 
64
+ > [!Note]
65
+ > Neither the pre-training nor the post-training data includes Kakao user data.
66
 
67
+ <br>
68
 
69
+ ### Performance
70
 
71
+ #### Kanana 1.5 Base Models
72
+ <table>
73
+ <tr>
74
+ <th>Models</th>
75
+ <th>MMLU</th>
76
+ <th>KMMLU</th>
77
+ <th>HAERAE</th>
78
+ <th>HumanEval</th>
79
+ <th>MBPP</th>
80
+ <th>GSM8K</th>
81
+ </tr>
82
+ <tr>
83
+ <td><strong>Kanana-Flag-1.5-32.5B</strong></td>
84
+ <td align="center">76.76</td>
85
+ <td align="center">61.90</td>
86
+ <td align="center">89.18</td>
87
+ <td align="center">73.17</td>
88
+ <td align="center">65.60</td>
89
+ <td align="center">81.50</td>
90
+ </tr>
91
+ <tr>
92
+ <td><strong>Kanana-Flag-32.5B</strong></td>
93
+ <td align="center">77.68</td>
94
+ <td align="center">62.10</td>
95
+ <td align="center">90.47</td>
96
+ <td align="center">51.22</td>
97
+ <td align="center">63.40</td>
98
+ <td align="center">70.05</td>
99
+ </tr>
100
+ <tr>
101
+ <td><strong>Kanana-Essence-1.5-9.8B</strong></td>
102
+ <td align="center">68.27</td>
103
+ <td align="center">52.78</td>
104
+ <td align="center">86.34</td>
105
+ <td align="center">64.63</td>
106
+ <td align="center">61.60</td>
107
+ <td align="center">71.57</td>
108
+ </tr>
109
+ <tr>
110
+ <td><strong>Kanana-Essence-9.8B</strong></td>
111
+ <td align="center">67.61</td>
112
+ <td align="center">50.57</td>
113
+ <td align="center">84.97</td>
114
+ <td align="center">40.24</td>
115
+ <td align="center">53.60</td>
116
+ <td align="center">63.61</td>
117
+ </tr>
118
+ <tr>
119
+ <td><strong>Kanana-1.5-8B</strong></td>
120
+ <td align="center">64.24</td>
121
+ <td align="center">48.94</td>
122
+ <td align="center">82.77</td>
123
+ <td align="center">61.59</td>
124
+ <td align="center">57.80</td>
125
+ <td align="center">63.53</td>
126
+ </tr>
127
+ <tr>
128
+ <td><strong>Kanana-8B</strong></td>
129
+ <td align="center">64.22</td>
130
+ <td align="center">48.30</td>
131
+ <td align="center">83.41</td>
132
+ <td align="center">40.24</td>
133
+ <td align="center">51.40</td>
134
+ <td align="center">57.09</td>
135
+ </tr>
136
+ <tr>
137
+ <td><strong>Kanana-Nano-1.5-3B</strong></td>
138
+ <td align="center">59.23</td>
139
+ <td align="center">47.30</td>
140
+ <td align="center">78.00</td>
141
+ <td align="center">46.34</td>
142
+ <td align="center">46.80</td>
143
+ <td align="center">61.79</td>
144
+ </tr>
145
+ <tr>
146
+ <td><strong>Kanana-1.5-2.1B</strong></td>
147
+ <td align="center">56.30</td>
148
+ <td align="center">45.10</td>
149
+ <td align="center">77.46</td>
150
+ <td align="center">52.44</td>
151
+ <td align="center">47.00</td>
152
+ <td align="center">55.95</td>
153
+ </tr>
154
+ <tr>
155
+ <td><strong>Kanana-Nano-2.1B</strong></td>
156
+ <td align="center">54.83</td>
157
+ <td align="center">44.80</td>
158
+ <td align="center">77.09</td>
159
+ <td align="center">31.10</td>
160
+ <td align="center">46.20</td>
161
+ <td align="center">46.32</td>
162
+ </tr>
163
+ </table>
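To put numbers on the improvements claimed for Kanana 1.5 at the smallest scale, the sketch below computes per-benchmark differences between the two 2.1B rows of the base-model table (Kanana-1.5-2.1B minus Kanana-Nano-2.1B), using only the values listed there.

```python
# Column order follows the base-model table header.
BENCHMARKS = ["MMLU", "KMMLU", "HAERAE", "HumanEval", "MBPP", "GSM8K"]
kanana_1_5_2_1b = [56.30, 45.10, 77.46, 52.44, 47.00, 55.95]
kanana_nano_2_1b = [54.83, 44.80, 77.09, 31.10, 46.20, 46.32]


def score_deltas(new, old):
    """Map each benchmark name to (new - old), rounded to two decimals."""
    return {name: round(n - o, 2) for name, n, o in zip(BENCHMARKS, new, old)}


deltas = score_deltas(kanana_1_5_2_1b, kanana_nano_2_1b)
for name, delta in deltas.items():
    print(f"{name:<10} {delta:+.2f}")
```

On these base-model scores the largest gain is on HumanEval (+21.34 points), followed by GSM8K (+9.63), which is consistent with the coding and mathematics improvements described for Kanana 1.5.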
164
 
165
 
166
+ #### Kanana 1.5 Instruct Models
167
+ <table>
168
+ <tr>
169
+ <th>Models</th>
170
+ <th>MT-Bench</th>
171
+ <th>KoMT-Bench</th>
172
+ <th>IFEval</th>
173
+ <th>HumanEval+</th>
174
+ <th>MBPP+</th>
175
+ <th>GSM8K (0-shot)</th>
176
+ <th>MATH</th>
177
+ <th>MMLU (0-shot, CoT)</th>
178
+ <th>KMMLU (0-shot, CoT)</th>
179
+ <th>FunctionChatBench</th>
180
+ </tr>
181
+ <tr>
182
+ <td><strong>Kanana-Flag-1.5-32.5B</strong></td>
183
+ <td align="center">8.13</td>
184
+ <td align="center">8.12</td>
185
+ <td align="center">82.70</td>
186
+ <td align="center">79.88</td>
187
+ <td align="center">71.96</td>
188
+ <td align="center">93.03</td>
189
+ <td align="center">75.96</td>
190
+ <td align="center">82.76</td>
191
+ <td align="center">64.10</td>
192
+ <td align="center">67.17</td>
193
+ </tr>
194
+ <tr>
195
+ <td><strong>Kanana-Flag-32.5B</strong></td>
196
+ <td align="center">8.33</td>
197
+ <td align="center">8.03</td>
198
+ <td align="center">84.59</td>
199
+ <td align="center">78.66</td>
200
+ <td align="center">69.84</td>
201
+ <td align="center">91.66</td>
202
+ <td align="center">58.08</td>
203
+ <td align="center">81.08</td>
204
+ <td align="center">64.19</td>
205
+ <td align="center">65.67</td>
206
+ </tr>
207
+ <tr>
208
+ <td><strong>Kanana-Essence-1.5-9.8B</strong></td>
209
+ <td align="center">7.88</td>
210
+ <td align="center">7.35</td>
211
+ <td align="center">76.34</td>
212
+ <td align="center">72.56</td>
213
+ <td align="center">66.93</td>
214
+ <td align="center">90.07</td>
215
+ <td align="center">62.02</td>
216
+ <td align="center">72.85</td>
217
+ <td align="center">52.00</td>
218
+ <td align="center">51.43</td>
219
+ </tr>
220
+ <tr>
221
+ <td><strong>Kanana-Essence-9.8B</strong></td>
222
+ <td align="center">7.81</td>
223
+ <td align="center">7.65</td>
224
+ <td align="center">80.20</td>
225
+ <td align="center">72.56</td>
226
+ <td align="center">68.52</td>
227
+ <td align="center">84.91</td>
228
+ <td align="center">42.24</td>
229
+ <td align="center">70.64</td>
230
+ <td align="center">50.76</td>
231
+ <td align="center">26.77</td>
232
+ </tr>
233
+ <tr>
234
+ <td><strong>Kanana-1.5-8B*</strong></td>
235
+ <td align="center">7.76</td>
236
+ <td align="center">7.63</td>
237
+ <td align="center">80.11</td>
238
+ <td align="center">76.83</td>
239
+ <td align="center">67.99</td>
240
+ <td align="center">87.64</td>
241
+ <td align="center">67.54</td>
242
+ <td align="center">68.82</td>
243
+ <td align="center">48.28</td>
244
+ <td align="center">58.00</td>
245
+ </tr>
246
+ <tr>
247
+ <td><strong>Kanana-8B</strong></td>
248
+ <td align="center">7.13</td>
249
+ <td align="center">6.92</td>
250
+ <td align="center">76.91</td>
251
+ <td align="center">62.20</td>
252
+ <td align="center">43.92</td>
253
+ <td align="center">79.23</td>
254
+ <td align="center">37.68</td>
255
+ <td align="center">66.50</td>
256
+ <td align="center">47.43</td>
257
+ <td align="center">17.37</td>
258
+ </tr>
259
+ <tr>
260
+ <td><strong>Kanana-Nano-1.5-3B</strong></td>
261
+ <td align="center">7.01</td>
262
+ <td align="center">6.52</td>
263
+ <td align="center">70.08</td>
264
+ <td align="center">70.73</td>
265
+ <td align="center">64.29</td>
266
+ <td align="center">80.36</td>
267
+ <td align="center">56.70</td>
268
+ <td align="center">59.69</td>
269
+ <td align="center">37.60</td>
270
+ <td align="center">55.37</td>
271
+ </tr>
272
+ <tr>
273
+ <td><strong>Kanana-1.5-2.1B*</strong></td>
274
+ <td align="center">7.01</td>
275
+ <td align="center">6.54</td>
276
+ <td align="center">68.61</td>
277
+ <td align="center">68.90</td>
278
+ <td align="center">65.08</td>
279
+ <td align="center">81.43</td>
280
+ <td align="center">60.62</td>
281
+ <td align="center">53.87</td>
282
+ <td align="center">32.93</td>
283
+ <td align="center">53.70</td>
284
+ </tr>
285
+ <tr>
286
+ <td><strong>Kanana-Nano-2.1B</strong></td>
287
+ <td align="center">6.40</td>
288
+ <td align="center">5.90</td>
289
+ <td align="center">71.97</td>
290
+ <td align="center">63.41</td>
291
+ <td align="center">62.43</td>
292
+ <td align="center">72.32</td>
293
+ <td align="center">29.26</td>
294
+ <td align="center">52.48</td>
295
+ <td align="center">38.51</td>
296
+ <td align="center">26.10</td>
297
+ </tr>
298
+ </table>
299
 
300
+ > \* Models released under the Apache 2.0 license were trained on more recent data than the other models.
301
 
302
+ #### Long Context
303
+ ##### Kanana-1.5-32.5B-Base
304
+ Below are the Needle-in-a-Haystack results for the `Kanana-1.5-32.5B-Base` model, which was trained with a target context length of 32K.
305
+ - (left): evaluated with native 32K context length
306
+ - (right): extended to 128K context length using YaRN
307
 
308
+ <p align="center">
309
+ <picture>
310
+ <img src="assets/performance/niah-32.5b-base.png" width="1000" style="margin: 40px auto;">
311
+ </picture>
312
 
313
+ ##### Kanana-1.5-32.5B-Instruct
314
+ Below are the Needle-in-a-Haystack results for the `Kanana-1.5-32.5B-Instruct` model, which was trained with a target context length of 32K.
315
+ - (left): evaluated with native 32K context length
316
+ - (right): extended to 128K context length using YaRN
317
 
318
+ <p align="center">
319
+ <picture>
320
+ <img src="assets/performance/niah-32.5b-inst.png" width="1000" style="margin: 40px auto;">
321
+ </picture>
322
+
323
+ <br>
324
+
325
+ #### Processing 32K+ Length
326
+ Currently, the `config.json` uploaded to Hugging Face is configured for sequences of up to 32,768 tokens. To process longer sequences, YaRN must be applied. Adding the following parameters to `config.json` applies YaRN and allows sequences of up to 128K tokens:
327
+ ```json
328
+ "rope_scaling": {
329
+ "factor": 4.4,
330
+ "original_max_position_embeddings": 32768,
331
+ "type": "yarn",
332
+ "beta_fast": 64,
333
+ "beta_slow": 2
334
+ },
335
+ ```
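A minimal sketch of making this edit with Python's standard `json` module, assuming the checkpoint's `config.json` has already been downloaded to a local directory (the path below is hypothetical). It merges the `rope_scaling` block above into the existing config and checks the arithmetic: 32,768 × 4.4 ≈ 144K positions, which covers a 128K (131,072-token) window.

```python
import json
from pathlib import Path

# rope_scaling values copied from the snippet above.
YARN_ROPE_SCALING = {
    "factor": 4.4,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
    "beta_fast": 64,
    "beta_slow": 2,
}


def apply_yarn(config: dict) -> dict:
    """Return a copy of config with the YaRN rope_scaling block added; other keys are untouched."""
    patched = dict(config)
    patched["rope_scaling"] = dict(YARN_ROPE_SCALING)
    return patched


def effective_context(rope_scaling: dict) -> int:
    """Positions covered after scaling: original length times the scaling factor."""
    return int(rope_scaling["original_max_position_embeddings"] * rope_scaling["factor"])


def patch_config_file(path: Path) -> None:
    """Hypothetical helper: rewrite a local config.json in place with YaRN enabled."""
    config = json.loads(path.read_text(encoding="utf-8"))
    path.write_text(json.dumps(apply_yarn(config), indent=2), encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical local path to a downloaded checkpoint's config.json.
    cfg_path = Path("kanana-1.5-2.1b-base/config.json")
    if cfg_path.exists():
        patch_config_file(cfg_path)
    print(effective_context(YARN_ROPE_SCALING))  # 144179
```

Keeping the patch as a pure function (`apply_yarn`) makes it easy to inspect the result before writing anything back to disk.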
336
+
337
+ <br>
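The YaRN config names `factor`, `beta_fast`, and `beta_slow` without explaining them. As a rough illustration of YaRN's "NTK-by-parts" scheme: rotary pairs that complete roughly `beta_fast` or more full turns within the original 32,768-token window keep their frequency, pairs that complete fewer than roughly `beta_slow` turns have their frequency divided by `factor`, and a linear ramp blends the two in between. The pure-Python sketch below is modeled on how open-source RoPE code commonly implements YaRN, not on Kanana's own code; the head dimension (128) and RoPE base (500,000) are hypothetical values not taken from Kanana's config.

```python
import math


def find_correction_dim(num_rotations, dim, base, original_len):
    """Rotary pair index that completes `num_rotations` full turns over original_len tokens."""
    return (dim * math.log(original_len / (num_rotations * 2 * math.pi))) / (2 * math.log(base))


def yarn_inv_freq(dim, base, factor, original_len, beta_fast, beta_slow):
    """Return (original, scaled) inverse frequencies for each rotary pair under YaRN."""
    half = dim // 2
    original = [1.0 / (base ** (2 * i / dim)) for i in range(half)]
    # Pairs up to `low` turn at least ~beta_fast times: keep their frequency.
    # Pairs beyond `high` turn fewer than ~beta_slow times: divide by factor.
    low = max(math.floor(find_correction_dim(beta_fast, dim, base, original_len)), 0)
    high = min(math.ceil(find_correction_dim(beta_slow, dim, base, original_len)), dim - 1)
    if low == high:
        high += 0.001  # avoid a zero-width ramp
    scaled = []
    for i, freq in enumerate(original):
        # Linear ramp clamped to [0, 1]: 0 keeps the frequency, 1 fully interpolates.
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        scaled.append(freq * (1.0 - ramp) + (freq / factor) * ramp)
    return original, scaled


def yarn_attention_factor(factor):
    """Attention-temperature term commonly paired with YaRN: 0.1 * ln(factor) + 1."""
    return 0.1 * math.log(factor) + 1.0 if factor > 1 else 1.0


if __name__ == "__main__":
    # Hypothetical head_dim=128 and RoPE base=500000; factor and betas from the config above.
    orig, new = yarn_inv_freq(128, 500000.0, 4.4, 32768, 64, 2)
    kept = sum(1 for o, n in zip(orig, new) if math.isclose(o, n))
    print(f"{kept} of {len(orig)} rotary pairs keep their original frequency")
    print(f"attention factor: {yarn_attention_factor(4.4):.4f}")
```

Because the high-frequency pairs are left untouched, short-range positional detail is preserved while the low-frequency pairs stretch to cover the longer window.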
338
+
339
+
340
+ ## Kanana 1.0
341
+
342
+ ### Kanana 1.0 Introduction
343
+
344
+ <details>
345
+ <summary>View details about Kanana 1.0</summary>
346
+ We introduce Kanana, a series of bilingual language models (developed by [Kakao](https://github.com/kakao)) that demonstrate exceptional performance in Korean and competitive performance in English. The computational cost of Kanana is significantly lower than that of state-of-the-art models of similar size. The technical report details the techniques employed during pre-training to achieve compute-efficient yet competitive models, including high-quality data filtering, staged pre-training, depth up-scaling, and pruning and distillation. It also outlines the methods used in post-training, including supervised fine-tuning and preference optimization, which aim to improve the models' ability to interact seamlessly with users. Finally, the report describes approaches for adapting the language models to specific scenarios, such as embedding, function calling, and Retrieval Augmented Generation (RAG). The Kanana model series spans 2.1B to 32.5B parameters, and the 2.1B models (base, instruct, embedding, function call, and RAG) have been publicly released to promote research on Korean language models.
347
+
348
+ > Neither the pre-training nor the post-training data includes Kakao user data.
349
+
350
+ <p align="center">
351
+ <picture>
352
+ <img src="assets/performance/flops-vs-mmlu.jpg" width="700" style="margin: 40px auto;">
353
+ </picture>
354
+
355
+ </details>
356
+
357
+
358
+ ### Kanana 1.0 Models Performance
359
+ <details>
360
+ <summary>View detailed performance of Kanana 1.0 models</summary>
361
+
362
+ Below is a partial report on the performance of the `Kanana` model series. Please refer to the [Technical Report](https://arxiv.org/abs/2502.18934) for the full results.
363
+
364
+ #### Pre-trained Model Performance
365
+
366
+ <table>
367
+ <tr>
368
+ <th>Models</th>
369
+ <th>MMLU</th>
370
+ <th>KMMLU</th>
371
+ <th>HAE-RAE</th>
372
+ <th>HumanEval</th>
373
+ <th>MBPP</th>
374
+ <th>GSM8K</th>
375
+ </tr>
376
+ <tr>
377
+ <th colspan="7" height="30px">27b+ scale</th>
378
+ </tr>
379
+ <tr>
380
+ <td>Kanana-Flag-32.5b</td>
381
+ <td align="center">77.68</td>
382
+ <td align="center">62.10</td>
383
+ <td align="center"><strong>90.47</strong></td>
384
+ <td align="center"><strong>51.22</strong></td>
385
+ <td align="center">63.40</td>
386
+ <td align="center">70.05</td>
387
+ </tr>
388
+ <tr>
389
+ <td>Qwen2.5-32b</td>
390
+ <td align="center"><strong>83.10</strong></td>
391
+ <td align="center"><strong>63.15</strong></td>
392
+ <td align="center">75.16</td>
393
+ <td align="center">50.00</td>
394
+ <td align="center"><strong>73.40</strong></td>
395
+ <td align="center"><strong>82.41</strong></td>
396
+ </tr>
397
+ <tr>
398
+ <td>Gemma-2-27b</td>
399
+ <td align="center">75.45</td>
400
+ <td align="center">51.16</td>
401
+ <td align="center">69.11</td>
402
+ <td align="center"><strong>51.22</strong></td>
403
+ <td align="center">64.60</td>
404
+ <td align="center">74.37</td>
405
+ </tr>
406
+ <tr>
407
+ <td>EXAONE-3.5-32b</td>
408
+ <td align="center">72.68</td>
409
+ <td align="center">46.36</td>
410
+ <td align="center">82.22</td>
411
+ <td align="center">-</td>
412
+ <td align="center">-</td>
413
+ <td align="center">-</td>
414
+ </tr>
415
+ <tr>
416
+ <td>Aya-Expanse-32b</td>
417
+ <td align="center">74.52</td>
418
+ <td align="center">49.57</td>
419
+ <td align="center">80.66</td>
420
+ <td align="center">-</td>
421
+ <td align="center">-</td>
422
+ <td align="center">-</td>
423
+ </tr>
424
+ <tr>
425
+ <th colspan="7" height="30px">7b+ scale</th>
426
+ </tr>
427
+ <tr>
428
+ <td>Kanana-Essence-9.8b</td>
429
+ <td align="center">67.61</td>
430
+ <td align="center">50.57</td>
431
+ <td align="center"><strong>84.98</strong></td>
432
+ <td align="center">40.24</td>
433
+ <td align="center">53.60</td>
434
+ <td align="center">63.61</td>
435
+ </tr>
436
+ <tr>
437
+ <td>Llama-3.1-8b</td>
438
+ <td align="center">65.18</td>
439
+ <td align="center">41.02</td>
440
+ <td align="center">61.78</td>
441
+ <td align="center">35.37</td>
442
+ <td align="center">48.60</td>
443
+ <td align="center">50.87</td>
444
+ </tr>
445
+ <tr>
446
+ <td>Qwen2.5-7b</td>
447
+ <td align="center"><strong>74.19</strong></td>
448
+ <td align="center"><strong>51.68</strong></td>
449
+ <td align="center">67.46</td>
450
+ <td align="center"><strong>56.71</strong></td>
451
+ <td align="center"><strong>63.20</strong></td>
452
+ <td align="center"><strong>83.85</strong></td>
453
+ </tr>
454
+ <tr>
455
+ <td>Gemma-2-9b</td>
456
+ <td align="center">70.34</td>
457
+ <td align="center">48.18</td>
458
+ <td align="center">66.18</td>
459
+ <td align="center">37.20</td>
460
+ <td align="center">53.60</td>
461
+ <td align="center">68.16</td>
462
+ </tr>
463
+ <tr>
464
+ <td>EXAONE-3.5-7.8b</td>
465
+ <td align="center">65.36</td>
466
+ <td align="center">45.30</td>
467
+ <td align="center">77.54</td>
468
+ <td align="center">-</td>
469
+ <td align="center">-</td>
470
+ <td align="center">-</td>
471
+ </tr>
472
+ <tr>
473
+ <td>Aya-Expanse-8b</td>
474
+ <td align="center">62.52</td>
475
+ <td align="center">40.11</td>
476
+ <td align="center">71.95</td>
477
+ <td align="center">-</td>
478
+ <td align="center">-</td>
479
+ <td align="center">-</td>
480
+ </tr>
481
+ <tr>
482
+ <th colspan="7" height="30px">2b+ scale</th>
483
+ </tr>
484
+ <tr>
485
+ <td>Kanana-Nano-2.1b</td>
486
+ <td align="center">54.83</td>
487
+ <td align="center">44.80</td>
488
+ <td align="center"><strong>77.09</strong></td>
489
+ <td align="center">31.10</td>
490
+ <td align="center">46.20</td>
491
+ <td align="center">46.32</td>
492
+ </tr>
493
+ <tr>
494
+ <td>Llama-3.2-3b</td>
495
+ <td align="center">56.40</td>
496
+ <td align="center">35.57</td>
497
+ <td align="center">47.66</td>
498
+ <td align="center">25.61</td>
499
+ <td align="center">39.00</td>
500
+ <td align="center">27.37</td>
501
+ </tr>
502
+ <tr>
503
+ <td>Qwen2.5-3b</td>
504
+ <td align="center"><strong>65.57</strong></td>
505
+ <td align="center"><strong>45.28</strong></td>
506
+ <td align="center">61.32</td>
507
+ <td align="center"><strong>37.80</strong></td>
508
+ <td align="center"><strong>55.60</strong></td>
509
+ <td align="center"><strong>69.07</strong></td>
510
+ </tr>
511
+ <tr>
512
+ <td>Gemma-2-2b</td>
513
+ <td align="center">52.89</td>
514
+ <td align="center">30.67</td>
515
+ <td align="center">45.55</td>
516
+ <td align="center">20.12</td>
517
+ <td align="center">28.20</td>
518
+ <td align="center">24.72</td>
519
+ </tr>
520
+ <tr>
521
+ <td>EXAONE-3.5-2.4b</td>
522
+ <td align="center">59.27</td>
523
+ <td align="center">43.58</td>
524
+ <td align="center">69.65</td>
525
+ <td align="center">-</td>
526
+ <td align="center">-</td>
527
+ <td align="center">-</td>
528
+ </tr>
529
+ <tr>
530
+ <th colspan="7" height="30px">70b+ scale</th>
531
+ </tr>
532
+ <tr>
533
+ <td>Llama-3.1-70b</td>
534
+ <td align="center">78.93</td>
535
+ <td align="center">53.00</td>
536
+ <td align="center">76.35</td>
537
+ <td align="center">57.32</td>
538
+ <td align="center">66.60</td>
539
+ <td align="center">81.73</td>
540
+ </tr>
541
+ <tr>
542
+ <td>Qwen2.5-72b</td>
543
+ <td align="center">86.12</td>
544
+ <td align="center">68.57</td>
545
+ <td align="center">80.84</td>
546
+ <td align="center">55.49</td>
547
+ <td align="center">76.40</td>
548
+ <td align="center">92.04</td>
549
+ </tr>
550
+ </table>
551
+
552
+ <br>
553
+
554
+
555
+ #### Post-trained Model Performance
556
+
557
+ ##### Instruction-following Benchmarks
558
+ <table>
559
+ <tr>
560
+ <th>Models</th>
561
+ <th>MT-Bench</th>
562
+ <th>LogicKor</th>
563
+ <th>KoMT-Bench</th>
564
+ <th>WildBench</th>
565
+ <th>IFEval</th>
566
+ </tr>
567
+ <tr>
568
+ <th colspan="6" height="30px">27b+ scale</th>
569
+ </tr>
570
+ <tr>
571
+ <td>Kanana-Flag-32.5b</td>
572
+ <td align="center">8.356</td>
573
+ <td align="center"><strong>9.524</strong></td>
574
+ <td align="center"><strong>8.058</strong></td>
575
+ <td align="center">54.14</td>
576
+ <td align="center"><strong>0.856</strong></td>
577
+ </tr>
578
+ <tr>
579
+ <td>Qwen2.5-32b</td>
580
+ <td align="center">8.331</td>
581
+ <td align="center">8.988</td>
582
+ <td align="center">7.847</td>
583
+ <td align="center">51.13</td>
584
+ <td align="center">0.822</td>
585
+ </tr>
586
+ <tr>
587
+ <td>Gemma-2-27b</td>
588
+ <td align="center">8.088</td>
589
+ <td align="center">8.869</td>
590
+ <td align="center">7.373</td>
591
+ <td align="center">46.46</td>
592
+ <td align="center">0.817</td>
593
+ </tr>
594
+ <tr>
595
+ <td>EXAONE-3.5-32b</td>
596
+ <td align="center"><strong>8.375</strong></td>
597
+ <td align="center">9.202</td>
598
+ <td align="center">7.907</td>
599
+ <td align="center"><strong>54.30</strong></td>
600
+ <td align="center">0.845</td>
601
+ </tr>
602
+ <tr>
603
+ <td>Aya-Expanse-32b</td>
604
+ <td align="center">7.788</td>
605
+ <td align="center">8.941</td>
606
+ <td align="center">7.626</td>
607
+ <td align="center">48.36</td>
608
+ <td align="center">0.735</td>
609
+ </tr>
610
+ <tr>
611
+ <th colspan="6" height="30px">7b+ scale</th>
612
+ </tr>
613
+ <tr>
614
+ <td>Kanana-Essence-9.8b</td>
615
+ <td align="center">7.769</td>
616
+ <td align="center">8.964</td>
617
+ <td align="center">7.706</td>
618
+ <td align="center">47.27</td>
619
+ <td align="center">0.799</td>
620
+ </tr>
621
+ <tr>
622
+ <td>Llama-3.1-8b</td>
623
+ <td align="center">7.500</td>
624
+ <td align="center">6.512</td>
625
+ <td align="center">5.336</td>
626
+ <td align="center">33.20</td>
627
+ <td align="center">0.772</td>
628
+ </tr>
629
+ <tr>
630
+ <td>Qwen2.5-7b</td>
631
+ <td align="center">7.625</td>
632
+ <td align="center">7.952</td>
633
+ <td align="center">6.808</td>
634
+ <td align="center">41.31</td>
635
+ <td align="center">0.760</td>
636
+ </tr>
637
+ <tr>
638
+ <td>Gemma-2-9b</td>
639
+ <td align="center">7.633</td>
640
+ <td align="center">8.643</td>
641
+ <td align="center">7.029</td>
642
+ <td align="center">40.92</td>
643
+ <td align="center">0.750</td>
644
+ </tr>
645
+ <tr>
646
+ <td>EXAONE-3.5-7.8b</td>
647
+ <td align="center"><strong>8.213</strong></td>
648
+ <td align="center"><strong>9.357</strong></td>
649
+ <td align="center"><strong>8.013</strong></td>
650
+ <td align="center"><strong>50.98</strong></td>
651
+ <td align="center"><strong>0.826</strong></td>
652
+ </tr>
653
+ <tr>
654
+ <td>Aya-Expanse-8b</td>
655
+ <td align="center">7.131</td>
656
+ <td align="center">8.357</td>
657
+ <td align="center">7.006</td>
658
+ <td align="center">38.50</td>
659
+ <td align="center">0.645</td>
660
+ </tr>
661
+ <tr>
662
+ <th colspan="6" height="30px">2b+ scale</th>
663
+ </tr>
664
+ <tr>
665
+ <td>Kanana-Nano-2.1b</td>
666
+ <td align="center">6.400</td>
667
+ <td align="center">7.964</td>
668
+ <td align="center">5.857</td>
669
+ <td align="center">25.41</td>
670
+ <td align="center">0.720</td>
671
+ </tr>
672
+ <tr>
673
+ <td>Llama-3.2-3b</td>
674
+ <td align="center">7.050</td>
675
+ <td align="center">4.452</td>
676
+ <td align="center">3.967</td>
677
+ <td align="center">21.91</td>
678
+ <td align="center">0.767</td>
679
+ </tr>
680
+ <tr>
681
+ <td>Qwen2.5-3b</td>
682
+ <td align="center">6.969</td>
683
+ <td align="center">6.488</td>
684
+ <td align="center">5.274</td>
685
+ <td align="center">25.76</td>
686
+ <td align="center">0.355</td>
687
+ </tr>
688
+ <tr>
689
+ <td>Gemma-2-2b</td>
690
+ <td align="center">7.225</td>
691
+ <td align="center">5.917</td>
692
+ <td align="center">4.835</td>
693
+ <td align="center">28.71</td>
694
+ <td align="center">0.428</td>
695
+ </tr>
696
+ <tr>
697
+ <td>EXAONE-3.5-2.4b</td>
698
+ <td align="center"><strong>7.919</strong></td>
699
+ <td align="center"><strong>8.941</strong></td>
700
+ <td align="center"><strong>7.223</strong></td>
701
+ <td align="center"><strong>41.68</strong></td>
702
+ <td align="center"><strong>0.790</strong></td>
703
+ </tr>
704
+ <tr>
705
+ <th colspan="6" height="30px">70b+ scale</th>
706
+ </tr>
707
+ <tr>
708
+ <td>Llama-3.1-70b</td>
709
+ <td align="center">8.275</td>
710
+ <td align="center">8.250</td>
711
+ <td align="center">6.970</td>
712
+ <td align="center">46.50</td>
713
+ <td align="center">0.875</td>
714
+ </tr>
715
+ <tr>
716
+ <td>Qwen2.5-72b</td>
717
+ <td align="center">8.619</td>
718
+ <td align="center">9.214</td>
719
+ <td align="center">8.281</td>
720
+ <td align="center">55.25</td>
721
+ <td align="center">0.861</td>
722
+ </tr>
723
+ </table>
724
+
725
+ <br>
726
+
727
+ ##### General Benchmarks
728
+
729
+ <table>
730
+ <tr>
731
+ <th>Models</th>
732
+ <th>MMLU</th>
733
+ <th>KMMLU</th>
734
+ <th>HAE-RAE</th>
735
+ <th>HumanEval+</th>
736
+ <th>MBPP+</th>
737
+ <th>GSM8K</th>
738
+ <th>MATH</th>
739
+ </tr>
740
+ <tr>
741
+ <th colspan="8" height="30px">27b+ scale</th>
742
+ </tr>
743
+ <tr>
744
+ <td>Kanana-Flag-32.5b</td>
745
+ <td align="center">81.08</td>
746
+ <td align="center"><strong>64.19</strong></td>
747
+ <td align="center"><strong>68.18</strong></td>
748
+ <td align="center">77.44</td>
749
+ <td align="center">69.84</td>
750
+ <td align="center">90.83</td>
751
+ <td align="center">57.82</td>
752
+ </tr>
753
+ <tr>
754
+ <td>Qwen2.5-32b</td>
755
+ <td align="center"><strong>84.40</strong></td>
756
+ <td align="center">59.37</td>
757
+ <td align="center">48.30</td>
758
+ <td align="center"><strong>82.32</strong></td>
759
+ <td align="center"><strong>71.96</strong></td>
760
+ <td align="center"><strong>95.30</strong></td>
761
+ <td align="center"><strong>81.90</strong></td>
762
+ </tr>
763
+ <tr>
764
+ <td>Gemma-2-27b</td>
765
+ <td align="center">78.01</td>
766
+ <td align="center">49.98</td>
767
+ <td align="center">46.02</td>
768
+ <td align="center">70.12</td>
769
+ <td align="center">70.90</td>
770
+ <td align="center">91.05</td>
771
+ <td align="center">53.80</td>
772
+ </tr>
773
+ <tr>
774
+ <td>EXAONE-3.5-32b</td>
775
+ <td align="center">78.30</td>
776
+ <td align="center">55.44</td>
777
+ <td align="center">52.27</td>
778
+ <td align="center">78.66</td>
779
+ <td align="center">70.90</td>
780
+ <td align="center">93.56</td>
781
+ <td align="center">76.80</td>
782
+ </tr>
783
+ <tr>
784
+ <td>Aya-Expanse-32b</td>
785
+ <td align="center">74.49</td>
786
+ <td align="center">42.35</td>
787
+ <td align="center">51.14</td>
788
+ <td align="center">64.63</td>
789
+ <td align="center">65.61</td>
790
+ <td align="center">75.06</td>
791
+ <td align="center">42.82</td>
792
+ </tr>
793
+ <tr>
794
+ <th colspan="8" height="30px">7b+ scale</th>
795
+ </tr>
796
+ <tr>
797
+ <td>Kanana-Essence-9.8b</td>
798
+ <td align="center">70.64</td>
799
+ <td align="center">50.76</td>
800
+ <td align="center"><strong>47.16</strong></td>
801
+ <td align="center">72.56</td>
802
+ <td align="center">69.05</td>
803
+ <td align="center">84.91</td>
804
+ <td align="center">42.24</td>
805
+ </tr>
806
+ <tr>
807
+ <td>Llama-3.1-8b</td>
808
+ <td align="center">71.18</td>
809
+ <td align="center">39.24</td>
810
+ <td align="center">40.91</td>
811
+ <td align="center">60.98</td>
812
+ <td align="center">57.67</td>
813
+ <td align="center">82.71</td>
814
+ <td align="center">49.86</td>
815
+ </tr>
816
+ <tr>
817
+ <td>Qwen2.5-7b</td>
818
+ <td align="center"><strong>77.23</strong></td>
819
+ <td align="center">46.87</td>
820
+ <td align="center">37.50</td>
821
+ <td align="center">73.78</td>
822
+ <td align="center"><strong>70.63</strong></td>
823
+ <td align="center"><strong>91.58</strong></td>
824
+ <td align="center"><strong>75.22</strong></td>
825
+ </tr>
826
+ <tr>
827
+ <td>Gemma-2-9b</td>
828
+ <td align="center">73.47</td>
829
+ <td align="center">44.47</td>
830
+ <td align="center">39.77</td>
831
+ <td align="center">59.76</td>
832
+ <td align="center">64.55</td>
833
+ <td align="center">87.72</td>
834
+ <td align="center">48.10</td>
835
+ </tr>
836
+ <tr>
837
+ <td>EXAONE-3.5-7.8b</td>
838
+ <td align="center">72.62</td>
839
+ <td align="center"><strong>52.09</strong></td>
840
+ <td align="center">46.02</td>
841
+ <td align="center"><strong>79.27</strong></td>
842
+ <td align="center">66.67</td>
843
+ <td align="center">89.99</td>
844
+ <td align="center">73.50</td>
845
+ </tr>
846
+ <tr>
847
+ <td>Aya-Expanse-8b</td>
848
+ <td align="center">61.23</td>
849
+ <td align="center">35.78</td>
850
+ <td align="center">39.20</td>
851
+ <td align="center">42.68</td>
852
+ <td align="center">56.88</td>
853
+ <td align="center">78.85</td>
854
+ <td align="center">30.80</td>
855
+ </tr>
856
+ <tr>
857
+ <th colspan="8" height="30px">2b+ scale</th>
858
+ </tr>
859
+ <tr>
860
+ <td>Kanana-Nano-2.1b</td>
861
+ <td align="center">52.48</td>
862
+ <td align="center"><strong>38.51</strong></td>
863
+ <td align="center"><strong>33.52</strong></td>
864
+ <td align="center">63.41</td>
865
+ <td align="center">62.43</td>
866
+ <td align="center">72.32</td>
867
+ <td align="center">29.26</td>
868
+ </tr>
869
+ <tr>
870
+ <td>Llama-3.2-3b</td>
871
+ <td align="center">56.09</td>
872
+ <td align="center">3.07</td>
873
+ <td align="center">17.05</td>
874
+ <td align="center">56.71</td>
875
+ <td align="center">50.26</td>
876
+ <td align="center">66.57</td>
877
+ <td align="center">38.18</td>
878
+ </tr>
879
+ <tr>
880
+ <td>Qwen2.5-3b</td>
881
+ <td align="center"><strong>69.18</strong></td>
882
+ <td align="center">38.33</td>
883
+ <td align="center">32.39</td>
884
+ <td align="center">67.68</td>
885
+ <td align="center"><strong>64.02</strong></td>
886
+ <td align="center"><strong>84.00</strong></td>
887
+ <td align="center"><strong>65.72</strong></td>
888
+ </tr>
889
+ <tr>
890
+ <td>Gemma-2-2b</td>
891
+ <td align="center">57.69</td>
892
+ <td align="center">6.99</td>
893
+ <td align="center">7.95</td>
894
+ <td align="center">35.37</td>
895
+ <td align="center">45.24</td>
896
+ <td align="center">49.81</td>
897
+ <td align="center">21.68</td>
898
+ </tr>
899
+ <tr>
900
+ <td>EXAONE-3.5-2.4b</td>
901
+ <td align="center">63.19</td>
902
+ <td align="center">14.27</td>
903
+ <td align="center">14.20</td>
904
+ <td align="center"><strong>70.73</strong></td>
905
+ <td align="center">59.79</td>
906
+ <td align="center">83.78</td>
907
+ <td align="center">64.04</td>
908
+ </tr>
909
+ <tr>
910
+ <th colspan="8" height="30px">70b+ scale</th>
911
+ </tr>
912
+ <tr>
913
+ <td>Llama-3.1-70b</td>
914
+ <td align="center">83.48</td>
915
+ <td align="center">39.08</td>
916
+ <td align="center">53.41</td>
917
+ <td align="center">75.61</td>
918
+ <td align="center">66.40</td>
919
+ <td align="center">91.66</td>
920
+ <td align="center">63.98</td>
921
+ </tr>
922
+ <tr>
923
+ <td>Qwen2.5-72b</td>
924
+ <td align="center">87.14</td>
925
+ <td align="center">65.78</td>
926
+ <td align="center">60.80</td>
927
+ <td align="center">81.10</td>
928
+ <td align="center">75.66</td>
929
+ <td align="center">95.45</td>
930
+ <td align="center">82.60</td>
931
+ </tr>
932
+ </table>
933
+
934
+ <br>
935
+
936
+ #### Embedding Model Performance
937
+ <table>
938
+ <tr>
939
+ <td align="center">Backbone</td>
940
+ <td align="center">Kanana-Nano-2.1b</td>
941
+ <td align="center">Llama-3.2-3b</td>
942
+ <td align="center">Qwen2.5-3b</td>
943
+ <td align="center">Llama-3.2-1b</td>
944
+ <td align="center">Qwen2.5-1.5b</td>
945
+ </tr>
946
+ <tr>
947
+ <td align="center">English</td>
948
+ <td align="center">51.56</td>
949
+ <td align="center">53.28</td>
950
+ <td align="center"><strong>54.00</strong></td>
951
+ <td align="center">48.77</td>
952
+ <td align="center">50.60</td>
953
+ </tr>
954
+ <tr>
955
+ <td align="center">Korean</td>
956
+ <td align="center"><strong>65.00</strong></td>
957
+ <td align="center">59.43</td>
958
+ <td align="center">62.10</td>
959
+ <td align="center">54.68</td>
960
+ <td align="center">54.60</td>
961
+ </tr>
962
+ <tr>
963
+ <td align="center">Avg.</td>
964
+ <td align="center"><strong>58.28</strong></td>
965
+ <td align="center">56.35</td>
966
+ <td align="center">58.05</td>
967
+ <td align="center">51.73</td>
968
+ <td align="center">52.60</td>
969
+ </tr>
970
+ </table>
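The "Avg." row appears to be the unweighted mean of the English and Korean scores. A quick check under that assumption, with the values copied from the table above (the helper names `mean_of_languages` and `mismatches` are hypothetical):

```python
# Scores copied from the embedding table: (English, Korean, reported Avg.).
SCORES = {
    "Kanana-Nano-2.1b": (51.56, 65.00, 58.28),
    "Llama-3.2-3b": (53.28, 59.43, 56.35),
    "Qwen2.5-3b": (54.00, 62.10, 58.05),
    "Llama-3.2-1b": (48.77, 54.68, 51.73),
    "Qwen2.5-1.5b": (50.60, 54.60, 52.60),
}


def mean_of_languages(english: float, korean: float) -> float:
    """Unweighted mean of the two language scores (assumed definition of Avg.)."""
    return (english + korean) / 2


def mismatches(scores: dict, tolerance: float = 0.006) -> list:
    """Backbones whose reported Avg. differs from the computed mean by more
    than `tolerance`; 0.006 allows for rounding to two decimals."""
    return [
        name
        for name, (en, ko, avg) in scores.items()
        if abs(mean_of_languages(en, ko) - avg) > tolerance
    ]


if __name__ == "__main__":
    for name, (en, ko, avg) in SCORES.items():
        print(f"{name}: computed {mean_of_languages(en, ko):.3f}, reported {avg:.2f}")
    print(mismatches(SCORES))  # []
```

Two backbones (Llama-3.2-3b and Llama-3.2-1b) have means ending in 5 at the third decimal, so the reported values differ from the computed ones only by rounding.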
971
+
972
+ </details>
973
+
974
+
975
+ <br>
976
+
977
+ ## License
978
+ The `Kanana 1.5` models are licensed under the [Apache 2.0 License](./LICENSE).
979
+
980
+ <br>
981
+
982
+ ## Citation
983
+
984
+ ```
985
+ @misc{kananallmteam2025kananacomputeefficientbilinguallanguage,
986
+ title={Kanana: Compute-efficient Bilingual Language Models},
987
+ author={Kanana LLM Team and Yunju Bak and Hojin Lee and Minho Ryu and Jiyeon Ham and Seungjae Jung and Daniel Wontae Nam and Taegyeong Eo and Donghun Lee and Doohae Jung and Boseop Kim and Nayeon Kim and Jaesun Park and Hyunho Kim and Hyunwoong Ko and Changmin Lee and Kyoung-Woon On and Seulye Baeg and Junrae Cho and Sunghee Jung and Jieun Kang and EungGyun Kim and Eunhwa Kim and Byeongil Ko and Daniel Lee and Minchul Lee and Miok Lee and Shinbok Lee and Gaeun Seo},
988
+ year={2025},
989
+ eprint={2502.18934},
990
+ archivePrefix={arXiv},
991
+ primaryClass={cs.CL},
992
+ url={https://arxiv.org/abs/2502.18934},
993
+ }
994
+ ```
995
+
996
+ <br>
997
+
998
+ ## Contributors
999
+ - Language Model Training: Yunju Bak, Doohae Jung, Boseop Kim, Nayeon Kim, Hojin Lee, Jaesun Park, Minho Ryu
1000
+ - Language Model Alignment: Jiyeon Ham, Seungjae Jung, Hyunho Kim, Hyunwoong Ko, Changmin Lee, Daniel Wontae Nam
1001
+ - AI Engineering: Youmin Kim, Hyeongju Kim
1002
+
1003
+ <details>
1004
+ <summary>Contributors for Kanana 1.0</summary>
1005
+
1006
+ - Pre-training: Yunju Bak, Doohae Jung, Boseop Kim, Nayeon Kim, Hojin Lee, Jaesun Park, Minho Ryu
1007
+ - Post-training: Jiyeon Ham, Seungjae Jung, Hyunho Kim, Hyunwoong Ko, Changmin Lee, Daniel Wontae Nam, Kyoung-Woon On
1008
+ - Adaptation: Seulye Baeg, Junrae Cho, Taegyeong Eo, Sunghee Jung, Jieun Kang, EungGyun Kim, Eunhwa Kim, Byeongil Ko, Daniel Lee, Donghun Lee, Minchul Lee, Miok Lee, Shinbok Lee, Minho Ryu, Gaeun Seo
1009
+
1010
+ </details>
1011
+
1012
+ <br>
1013
+
1014
+ ## Contact
1015
+ - Kanana LLM Team Technical Support: kanana-llm@kakaocorp.com
1016
+ - Business & Partnership Contact: alpha.k@kakaocorp.com
assets/logo/kanana-logo-dark.png ADDED
assets/logo/kanana-logo-light.png ADDED
assets/performance/flops-vs-mmlu.jpg ADDED

Git LFS Details

  • SHA256: 72eb65fd674025c294d4abeb4f47dec5953ba93014abd32eb02d896126d0cb5e
  • Pointer size: 131 Bytes
  • Size of remote file: 333 kB
assets/performance/kanana-1.5-radar.png ADDED

Git LFS Details

  • SHA256: c48e8fb28055ffd94ecc0ab0f3a257c54965422e9cf466a2b9e6cba05891fee5
  • Pointer size: 131 Bytes
  • Size of remote file: 874 kB
assets/performance/niah-32.5b-base.png ADDED
assets/performance/niah-32.5b-inst.png ADDED