Commit 6e62017 (verified) by SAnocha · 1 parent: e4cf469

Update README

Files changed (1)
  1. README.md +38 -20
README.md CHANGED
@@ -1,7 +1,7 @@
  ---
  license: gemma
  ---
- *Gemma-SEA-LION-v4-27B (Base Model) Last updated: 2025-08-18*
 
  ---
 
@@ -9,13 +9,16 @@ license: gemma
 
  <!-- Provide a quick summary of what the model is/does. -->
 
  **SEA-LION** is a collection of Large Language Models (LLMs) which have been pretrained and instruct-tuned
  for the Southeast Asia (SEA) region.
 
- Gemma-SEA-LION-v4-27B is a multilingual model which has undergone continued pre-training on
- approximately **500B** tokens across 11 SEA languages: Bahasa Indonesia, Burmese, Chinese, English,
- Khmer, Lao, Malay, Tagalog, Tamil, Thai and Vietnamese.
-
 
  ## Model Details
 
@@ -26,7 +29,7 @@ Khmer, Lao, Malay, Tagalog, Tamil, Thai and Vietnamese.
  SEA-LION stands for *Southeast Asian Languages In One Network*.
 
  We performed continued pre-training in English and SEA languages on Gemma 3 27B IT,
- a decoder model using the Gemma 3 architecture, to create Gemma-SEA-LION-v4-27B.
 
  For tokenization, the model employs the default tokenizer used in Gemma 3 27B IT.
 
@@ -64,7 +67,7 @@ fine-tuning and related security measures. In no event shall the authors be held
 
  <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 
- It is important for users to be aware that our model exhibits certain limitations that warrant consideration.
  Like many LLMs, the model can hallucinate and occasionally generates irrelevant content,
  introducing fictional elements that are not grounded in the provided context.
  Users should also exercise caution in interpreting and validating the model's responses
@@ -201,34 +204,49 @@ We evaluated Gemma-SEA-LION-v4-27B on general language capabilities.
 
  **Testing Data**
 
- General NLP Behaviour
-
- For the evaluation of general language capabilities, we employed the SEA-HELM evaluation benchmark
  across a variety of tasks. These tasks include Question Answering (QA), Sentiment Analysis (Sentiment),
- Toxicity Detection (Toxicity), Translation in both directions (Eng>Lang & Lang>Eng),
  Abstractive Summarisation (Abssum), Causal Reasoning (Causal), Natural Language Inference (NLI),
- and linguistic diagnostics (LINDSEA).
 
 
  #### Factors
 
  <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
 
- Our evaluations were set based on task. For all tasks, the model is expected to provide an answer tag
- from which the answer is automatically extracted. For tasks where options are provided,
- the answer should comprise one of the pre-defined options. The scores for each task is normalised to account
- for baseline performance due to random chance.
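The pre-commit wording above normalised each task score to account for random-chance performance. One common way to do that, assumed here rather than taken from SEA-HELM's code, is a linear rescaling in which the chance baseline maps to 0 and a perfect score maps to 100, with below-chance scores clipped at 0. The helper `normalise_score` is hypothetical.

```python
def normalise_score(raw: float, baseline: float, max_score: float = 100.0) -> float:
    """Rescale a raw task score so that chance level maps to 0 and max_score to 100.

    Assumption: linear rescaling with clipping at 0; SEA-HELM's exact formula may differ.
    """
    if not 0.0 <= baseline < max_score:
        raise ValueError("baseline must lie in [0, max_score)")
    scaled = (raw - baseline) / (max_score - baseline) * 100.0
    # Scores below random chance are clipped to 0 instead of going negative.
    return max(0.0, scaled)


# A 4-option multiple-choice task has a 25% random baseline.
print(normalise_score(62.5, baseline=25.0))  # 50.0
```

With this scheme, a model that guesses uniformly at random scores 0 on every task regardless of how many options the task has, which makes scores comparable across tasks.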
 
 
  #### Metrics
 
  <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 
- The evaluation was done **five-shot** with native prompts on a sample of 100-1000 instances for each dataset.
 
  ### Results
 
- For details on Gemma-SEA-LION-v4-27B performance, please refer to the SEA-HELM leaderboard, [Leaderboard results on SEA-HELM](https://leaderboard.sea-lion.ai/).
 
 
  #### Summary
@@ -246,12 +264,12 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
  - **Hours used:** 214 hrs
  - **Cloud Provider:** SMC H200
  - **Compute Region:** Singapore
- - **Carbon Emitted:** appx. 35.27 - 98 kg CO2 e
 
  ## More Information
 
  This is the repository for the commercial instruction-tuned model.
- The model has not been aligned for safety. Developers and users should perform their own safety
  fine-tuning and related security measures. In no event shall the authors be held liable
  for any claims, damages, or other liabilities arising from the use of the released weights and codes.
 

  ---
  license: gemma
  ---
+ *Gemma-SEA-LION-v4-27B (Base Model) Last updated: 2025-08-20*
 
  ---
 

 
  <!-- Provide a quick summary of what the model is/does. -->
 
+ Last updated: 2025-08-20
+
+
+ Gemma-SEA-LION-v4-27B is based on Gemma 3 (which supports over 100 languages)
+ and is a multilingual model which has undergone continued pre-training on approximately **500B** tokens across 11 SEA languages:
+ Bahasa Indonesia, Burmese, Chinese, English, Khmer, Lao, Malay, Tagalog, Tamil, Thai and Vietnamese.
+
  **SEA-LION** is a collection of Large Language Models (LLMs) which have been pretrained and instruct-tuned
  for the Southeast Asia (SEA) region.
 
 
  ## Model Details
 
 
  SEA-LION stands for *Southeast Asian Languages In One Network*.
 
  We performed continued pre-training in English and SEA languages on Gemma 3 27B IT,
+ a decoder model using the Gemma 3 architecture, to create *Gemma-SEA-LION-v4-27B*.
 
  For tokenization, the model employs the default tokenizer used in Gemma 3 27B IT.
 

 
  <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 
+ *The model was not tested for robustness against adversarial prompting.* It is important for users to be aware that our model exhibits certain limitations that warrant consideration.
  Like many LLMs, the model can hallucinate and occasionally generates irrelevant content,
  introducing fictional elements that are not grounded in the provided context.
  Users should also exercise caution in interpreting and validating the model's responses
 
 
  **Testing Data**
 
+ For the evaluation of general language capabilities, we employed the [SEA-HELM evaluation benchmark](https://arxiv.org/abs/2502.14301)
  across a variety of tasks. These tasks include Question Answering (QA), Sentiment Analysis (Sentiment),
+ Toxicity Detection (Toxicity), Metaphor Understanding, Translation in both directions (Eng>Lang & Lang>Eng),
  Abstractive Summarisation (Abssum), Causal Reasoning (Causal), Natural Language Inference (NLI),
+ Linguistic Diagnostics (LINDSEA) and Global-MMLU-Lite.
 
 
  #### Factors
 
  <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
 
+ All evaluations were run with the model-specific generation parameters defined in the model config.
+ Each evaluation consisted of 8 runs with different seeds, and the final results were averaged across these runs.
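The seed-averaging step described above amounts to taking the mean of a task's score over the 8 runs. The sketch below assumes each run yields one scalar score per task; `aggregate_runs` is a hypothetical helper and the scores are placeholders, not reported results. It also returns the sample standard deviation, which the README does not mention but which is a cheap way to see how sensitive a task is to the seed.

```python
from statistics import mean, stdev


def aggregate_runs(scores: list[float]) -> tuple[float, float]:
    """Average one task's score over independent seeded runs.

    Returns (mean, sample standard deviation). The README only states that
    results were averaged; the spread is an extra diagnostic added here.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return mean(scores), stdev(scores)


# Placeholder scores for 8 runs (one per seed) of a single task; not real results.
runs = [70.0, 72.0, 71.0, 69.0, 73.0, 70.0, 71.0, 72.0]
avg, spread = aggregate_runs(runs)
print(f"mean={avg:.2f} stdev={spread:.2f}")
```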
+
+ For all tasks, the model was expected to provide an answer tag from which the answer was automatically extracted.
+ For tasks where options were provided, the answer had to be one of the pre-defined options.
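Answer-tag extraction with an option check might look like the following. The README does not say what the tag looks like, so the `Answer:` pattern in `ANSWER_TAG` and the helper `extract_answer` are assumptions for illustration. In this sketch the last tag in the response wins, and a response whose answer is not one of the provided options is treated as unparseable (returned as `None`).

```python
import re
from typing import Optional

# Hypothetical tag format: the README does not specify SEA-HELM's actual answer tag.
ANSWER_TAG = re.compile(r"Answer:\s*(.+)", re.IGNORECASE)


def extract_answer(response: str, options: Optional[list[str]] = None) -> Optional[str]:
    """Pull the answer from the last answer tag in a model response.

    If `options` is given, the extracted answer must match one of them
    (case-insensitively) and the canonical option string is returned;
    otherwise the response counts as unparseable and None is returned.
    """
    matches = ANSWER_TAG.findall(response)  # each match is the text after the tag, up to end of line
    if not matches:
        return None
    answer = matches[-1].strip()
    if options is None:
        return answer
    for option in options:
        if answer.lower() == option.lower():
            return option
    return None


print(extract_answer("Let me think.\nAnswer: b", ["A", "B", "C", "D"]))  # B
```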
+
+ The evaluation was done **five-shot** with native prompts on a sample of 100-1000 instances for each dataset.
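Five-shot evaluation means each test item is preceded by five solved examples from the same task. A minimal sketch of assembling such a prompt, assuming a fixed `Question:`/`Answer:` template (SEA-HELM's native-language templates are not reproduced in the README) and the hypothetical names `Example` and `build_few_shot_prompt`:

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    answer: str


def build_few_shot_prompt(pool: list[Example], query: str, k: int = 5, seed: int = 0) -> str:
    """Assemble a k-shot prompt: k solved examples followed by the unanswered query.

    The "Question:/Answer:" template is a placeholder; the README says native
    prompts were used but does not give their wording.
    """
    if len(pool) < k:
        raise ValueError(f"need at least {k} examples, got {len(pool)}")
    rng = random.Random(seed)  # local RNG so shot selection is reproducible
    shots = rng.sample(pool, k)
    blocks = [f"Question: {ex.question}\nAnswer: {ex.answer}" for ex in shots]
    blocks.append(f"Question: {query}\nAnswer:")  # the model continues from here
    return "\n\n".join(blocks)


pool = [Example(f"question {i}", f"answer {i}") for i in range(6)]
print(build_few_shot_prompt(pool, "new question", k=5, seed=1))
```

Passing the seed explicitly makes the choice of shots reproducible; whether SEA-HELM varies the shots across its seeded runs is not stated.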
 
 
  #### Metrics
 
  <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 
+ The following metrics were used:
+ | Task | Metric |
+ |---------------------------------|-------------------------------|
+ | Sentiment Analysis | Accuracy |
+ | Extractive QA (ID, VI, TH, TA) | ChrF++ |
+ | MCQ-QA (TL, MY, MS) | Accuracy |
+ | Metaphor | Accuracy |
+ | Abstractive Summarisation | Rouge-L |
+ | Translations | MetricX-24 score (with reference) |
+ | Causal Reasoning | Accuracy |
+ | Natural Language Inference | Accuracy |
+ | LINDSEA | Accuracy |
+ | Global MMLU Lite | Accuracy |
+ | Toxicity Detection | Accuracy |
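The table lists Rouge-L for abstractive summarisation. ROUGE-L scores a candidate against a reference by the longest common subsequence (LCS) of tokens they share. The sketch below computes the sentence-level F1 variant from scratch with whitespace tokenisation; that tokenisation is an assumption, and it fits Thai, Lao, Khmer and Burmese poorly because they do not consistently separate words with spaces. SEA-HELM's actual ROUGE-L configuration is not given in the README, and `lcs_length` and `rouge_l_f1` are hypothetical helpers.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                curr.append(prev[j - 1] + 1)  # extend the LCS ending before both tokens
            else:
                curr.append(max(prev[j], curr[j - 1]))  # drop a token from one side
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Sentence-level ROUGE-L F1 over whitespace tokens (precision and recall weighted equally)."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# LCS is "the cat on the mat" (5 of 6 tokens on each side), so F1 = 5/6.
print(round(rouge_l_f1("the cat sat on the mat", "the cat is on the mat"), 4))  # 0.8333
```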
+
 
 
  ### Results
 
+ Coming soon.
 
 
  #### Summary

  - **Hours used:** 214 hrs
  - **Cloud Provider:** SMC H200
  - **Compute Region:** Singapore
+ - **Carbon Emitted:** approx. 98 kg CO2e
 
  ## More Information
 
  This is the repository for the commercial instruction-tuned model.
+ The model has *not* been aligned for safety. Developers and users should perform their own safety
  fine-tuning and related security measures. In no event shall the authors be held liable
  for any claims, damages, or other liabilities arising from the use of the released weights and codes.