Update README
README.md
CHANGED
|
@@ -1,7 +1,7 @@
|
|
| 1 |
---
|
| 2 |
license: gemma
|
| 3 |
---
|
| 4 |
-
*Gemma-SEA-LION-v4-27B (Base Model) Last updated: 2025-08-
|
| 5 |
|
| 6 |
---
|
| 7 |
|
|
@@ -9,13 +9,16 @@ license: gemma
|
|
| 9 |
|
| 10 |
<!-- Provide a quick summary of what the model is/does. -->
|
| 11 |
|
| 12 |
**SEA-LION** is a collection of Large Language Models (LLMs) which have been pretrained and instruct-tuned
|
| 13 |
for the Southeast Asia (SEA) region.
|
| 14 |
|
| 15 |
-
Gemma-SEA-LION-v4-27B is a multilingual model which has undergone continued pre-training on
|
| 16 |
-
approximately **500B** tokens across 11 SEA languages: Bahasa Indonesia, Burmese, Chinese, English,
|
| 17 |
-
Khmer, Lao, Malay, Tagalog, Tamil, Thai and Vietnamese.
|
| 18 |
-
|
| 19 |
|
| 20 |
## Model Details
|
| 21 |
|
|
@@ -26,7 +29,7 @@ Khmer, Lao, Malay, Tagalog, Tamil, Thai and Vietnamese.
|
|
| 26 |
SEA-LION stands for *Southeast Asian Languages In One Network*.
|
| 27 |
|
| 28 |
We performed continued pre-training in English and SEA languages on Gemma 3 27B IT,
|
| 29 |
-
a decoder model using the Gemma 3 architecture, to create Gemma-SEA-LION-v4-27B.
|
| 30 |
|
| 31 |
For tokenization, the model employs the default tokenizer used in Gemma 3 27B IT.
|
| 32 |
|
|
@@ -64,7 +67,7 @@ fine-tuning and related security measures. In no event shall the authors be held
|
|
| 64 |
|
| 65 |
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
| 66 |
|
| 67 |
-
It is important for users to be aware that our model exhibits certain limitations that warrant consideration.
|
| 68 |
Like many LLMs, the model can hallucinate and occasionally generates irrelevant content,
|
| 69 |
introducing fictional elements that are not grounded in the provided context.
|
| 70 |
Users should also exercise caution in interpreting and validating the model's responses
|
|
@@ -201,34 +204,49 @@ We evaluated Gemma-SEA-LION-v4-27B on general language capabilities.
|
|
| 201 |
|
| 202 |
**Testing Data**
|
| 203 |
|
| 204 |
-
|
| 205 |
-
|
| 206 |
-
For the evaluation of general language capabilities, we employed the SEA-HELM evaluation benchmark
|
| 207 |
across a variety of tasks. These tasks include Question Answering (QA), Sentiment Analysis (Sentiment),
|
| 208 |
-
Toxicity Detection (Toxicity), Translation in both directions (Eng>Lang & Lang>Eng),
|
| 209 |
Abstractive Summarisation (Abssum), Causal Reasoning (Causal), Natural Language Inference (NLI),
|
| 210 |
-
|
| 211 |
|
| 212 |
|
| 213 |
#### Factors
|
| 214 |
|
| 215 |
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
|
| 216 |
|
| 217 |
-
|
| 218 |
-
|
| 219 |
-
|
| 220 |
-
|
| 221 |
|
| 222 |
|
| 223 |
#### Metrics
|
| 224 |
|
| 225 |
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
|
| 226 |
|
| 227 |
-
The
|
| 228 |
|
| 229 |
### Results
|
| 230 |
|
| 231 |
-
|
| 232 |
|
| 233 |
|
| 234 |
#### Summary
|
|
@@ -246,12 +264,12 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
|
|
| 246 |
- **Hours used:** 214 hrs
|
| 247 |
- **Cloud Provider:** SMC H200
|
| 248 |
- **Compute Region:** Singapore
|
| 249 |
-
- **Carbon Emitted:** appx.
|
| 250 |
|
| 251 |
## More Information
|
| 252 |
|
| 253 |
This is the repository for the commercial instruction-tuned model.
|
| 254 |
-
The model has not been aligned for safety. Developers and users should perform their own safety
|
| 255 |
fine-tuning and related security measures. In no event shall the authors be held liable
|
| 256 |
for any claims, damages, or other liabilities arising from the use of the released weights and codes.
|
| 257 |
|
|
|
|
| 1 |
---
|
| 2 |
license: gemma
|
| 3 |
---
|
| 4 |
+
*Gemma-SEA-LION-v4-27B (Base Model) Last updated: 2025-08-20*
|
| 5 |
|
| 6 |
---
|
| 7 |
|
|
|
|
| 9 |
|
| 10 |
<!-- Provide a quick summary of what the model is/does. -->
|
| 11 |
|
| 12 |
+
Last updated: 2025-08-20
|
| 13 |
+
|
| 14 |
+
|
| 15 |
+
Gemma-SEA-LION-v4-27B is based on Gemma 3 (which supports over 100 languages)
|
| 16 |
+
and is a multilingual model which has undergone continued pre-training on approximately **500B** tokens across 11 SEA languages:
|
| 17 |
+
Bahasa Indonesia, Burmese, Chinese, English, Khmer, Lao, Malay, Tagalog, Tamil, Thai and Vietnamese.
|
| 18 |
+
|
| 19 |
**SEA-LION** is a collection of Large Language Models (LLMs) which have been pretrained and instruct-tuned
|
| 20 |
for the Southeast Asia (SEA) region.
|
| 21 |
|
| 22 |
|
| 23 |
## Model Details
|
| 24 |
|
|
|
|
| 29 |
SEA-LION stands for *Southeast Asian Languages In One Network*.
|
| 30 |
|
| 31 |
We performed continued pre-training in English and SEA languages on Gemma 3 27B IT,
|
| 32 |
+
a decoder model using the Gemma 3 architecture, to create *Gemma-SEA-LION-v4-27B*.
|
| 33 |
|
| 34 |
For tokenization, the model employs the default tokenizer used in Gemma 3 27B IT.
|
| 35 |
|
|
|
|
| 67 |
|
| 68 |
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
| 69 |
|
| 70 |
+
*The model was not tested for robustness against adversarial prompting.* It is important for users to be aware that our model exhibits certain limitations that warrant consideration.
|
| 71 |
Like many LLMs, the model can hallucinate and occasionally generates irrelevant content,
|
| 72 |
introducing fictional elements that are not grounded in the provided context.
|
| 73 |
Users should also exercise caution in interpreting and validating the model's responses
|
|
|
|
| 204 |
|
| 205 |
**Testing Data**
|
| 206 |
|
| 207 |
+
For the evaluation of general language capabilities, we employed the [SEA-HELM evaluation benchmark](https://arxiv.org/abs/2502.14301)
|
| 208 |
across a variety of tasks. These tasks include Question Answering (QA), Sentiment Analysis (Sentiment),
|
| 209 |
+
Toxicity Detection (Toxicity), Metaphor Understanding, Translation in both directions (Eng>Lang & Lang>Eng),
|
| 210 |
Abstractive Summarisation (Abssum), Causal Reasoning (Causal), Natural Language Inference (NLI),
|
| 211 |
+
Linguistic Diagnostics (LINDSEA) and Global-MMLU-Lite.
|
| 212 |
|
| 213 |
|
| 214 |
#### Factors
|
| 215 |
|
| 216 |
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
|
| 217 |
|
| 218 |
+
All evaluations were run with the model-specific generation parameters defined in the model config.
|
| 219 |
+
Each evaluation comprised 8 runs with different seeds, and the final results were averaged across these runs.
|
| 220 |
+
|
| 221 |
+
For all tasks, the model was expected to provide an answer tag from which the answer was automatically extracted.
|
| 222 |
+
For tasks where options were provided, the answer should comprise one of the pre-defined options.
|
| 223 |
+
|
| 224 |
+
The evaluation was done **five-shot** with native prompts on a sample of 100-1000 instances for each dataset.
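
The protocol described above (extract an answer from a tag, check it against the allowed options, score each run, then average across seeds) could be sketched roughly as follows. This is only an illustrative sketch, not the SEA-HELM implementation: the `<answer>...</answer>` tag format and the function names `extract_answer`, `accuracy` and `average_over_seeds` are assumptions, since the README does not specify them.

```python
import re
from statistics import mean

# Assumed tag format; the README only says "an answer tag" without defining it.
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL | re.IGNORECASE)


def extract_answer(response, options=None):
    """Return the text inside the last answer tag, or None if it is missing
    or (when options are given) not one of the pre-defined options."""
    matches = ANSWER_PATTERN.findall(response)
    if not matches:
        return None
    answer = matches[-1].strip()
    if options is not None and answer not in options:
        return None
    return answer


def accuracy(responses, gold, options=None):
    """Fraction of responses whose extracted answer equals the gold label."""
    predictions = [extract_answer(r, options) for r in responses]
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def average_over_seeds(per_seed_scores):
    """Final score: the mean of the per-seed scores (8 runs in the README)."""
    return mean(per_seed_scores)
```

In this sketch a response with no tag, or with an answer outside the option set, simply counts as incorrect; the actual benchmark may handle such cases differently.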
|
| 225 |
|
| 226 |
|
| 227 |
#### Metrics
|
| 228 |
|
| 229 |
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
|
| 230 |
|
| 231 |
+
The following metrics were used:
|
| 232 |
+
| Task | Metric |
|
| 233 |
+
|---------------------------------|-------------------------------|
|
| 234 |
+
| Sentiment Analysis | Accuracy |
|
| 235 |
+
| Extractive QA (ID, VI, TH, TA) | ChrF++ |
|
| 236 |
+
| MCQ-QA (TL, MY, MS) | Accuracy |
|
| 237 |
+
| Metaphor | Accuracy |
|
| 238 |
+
| Abstractive Summarisation | Rouge-L |
|
| 239 |
+
| Translations | MetricX-24 score (with reference) |
|
| 240 |
+
| Causal Reasoning | Accuracy |
|
| 241 |
+
| Natural Language Inference | Accuracy |
|
| 242 |
+
| LINDSEA | Accuracy |
|
| 243 |
+
| Global MMLU Lite | Accuracy |
|
| 244 |
+
| Toxicity Detection | Accuracy |
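
To illustrate one of the metrics in the table, here is a minimal sketch of ROUGE-L as an F1 score over the longest common subsequence (LCS) of tokens. It assumes plain whitespace tokenization and equal weighting of precision and recall; the benchmark's actual implementation and tokenization, particularly for languages written without spaces, are not described in the README and may differ. The names `lcs_length` and `rouge_l_f1` are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between a candidate and a reference summary,
    using whitespace tokens (an assumption made for this sketch)."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_l_f1("a b c d", "a c")` has an LCS of length 2, giving precision 0.5 and recall 1.0.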
|
| 245 |
+
|
| 246 |
|
| 247 |
### Results
|
| 248 |
|
| 249 |
+
Coming soon.
|
| 250 |
|
| 251 |
|
| 252 |
#### Summary
|
|
|
|
| 264 |
- **Hours used:** 214 hrs
|
| 265 |
- **Cloud Provider:** SMC H200
|
| 266 |
- **Compute Region:** Singapore
|
| 267 |
+
- **Carbon Emitted:** approx. 98 kg CO2e
|
| 268 |
|
| 269 |
## More Information
|
| 270 |
|
| 271 |
This is the repository for the commercial instruction-tuned model.
|
| 272 |
+
The model has *not* been aligned for safety. Developers and users should perform their own safety
|
| 273 |
fine-tuning and related security measures. In no event shall the authors be held liable
|
| 274 |
for any claims, damages, or other liabilities arising from the use of the released weights and codes.
|
| 275 |
|