---
license: gemma
language:
- my
- en
- id
- ms
- tl
- ta
- th
- vi
base_model:
- google/gemma-3-12b-it
pipeline_tag: image-text-to-text
---
![Banner!](SEA-GUARD.png "SEA-GUARD")

# Model Card for Gemma-SEA-Guard-12B-040226 (Under construction!)

<!-- Provide a quick summary of what the model is/does. -->

Last updated: 2026-02-04

**SEA-Guard** is a collection of safety-focused Large Language Models (LLMs) built upon the SEA-LION family, designed specifically for the Southeast Asia (SEA) region.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

SEA-LION stands for *Southeast Asian Languages In One Network*. 

This model is a fine-tuned version of [Gemma 3 12B IT](https://huggingface.co/google/gemma-3-12b-it) on 1M instruction-following pairs. 
For more details on training data, please refer to the paper [SEA-Guard](https://arxiv.org/abs/2602.01618).

For tokenization, the model employs the default tokenizer used in Gemma 3.

- **Developed by:** AI Products Pillar, AI Singapore
- **Funded by:** Singapore NRF
- **Shared by:** AI Products Pillar, AI Singapore
- **Model type:** Decoder
- **Context length:** 128k tokens
- **Language(s) (text):** Burmese, English, Indonesian, Malay, Tagalog, Tamil, Thai, and Vietnamese
- **License:** [Gemma](https://ai.google.dev/gemma/terms)
- **Finetuned from model:** [Gemma 3 12B IT](https://huggingface.co/google/gemma-3-12b-it)
  
### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [aisingapore/sea-guard](https://huggingface.co/collections/aisingapore/sea-guard)

## Intended Uses and Limitations

This model is optimized to return a binary classification in text form: ["safe", "unsafe"]. 
However, users must be aware that the model is subject to the limitations common to generative AI, 
including the potential to hallucinate or generate ungrounded, irrelevant text. 
Due to these inherent risks, human oversight is advised, and the model’s outputs should not be treated as absolute determinations without secondary verification.
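
Because the label is returned as free text, downstream code may want to check that the output really is one of the two expected labels before acting on it. The sketch below is one way to do that; the function name `parse_label` and the choice to return `None` for anything unexpected are illustrative assumptions, not part of the released code.

```python
from typing import Optional

# The two labels the model card says the model returns.
VALID_LABELS = ("safe", "unsafe")


def parse_label(generated_text: str) -> Optional[str]:
    """Return "safe" or "unsafe", or None when the output is anything else.

    Hypothetical helper: it tolerates surrounding whitespace, case
    differences, and trailing punctuation or quotes, but deliberately
    refuses to guess a label from longer free-form text.
    """
    cleaned = generated_text.strip().strip(".!\"'").lower()
    return cleaned if cleaned in VALID_LABELS else None


if __name__ == "__main__":
    for raw in ["Unsafe.", " safe\n", "I think this is probably fine"]:
        label = parse_label(raw)
        # A None result is a signal to route the item to human review.
        print(repr(raw), "->", label if label is not None else "needs review")
```

Returning `None` instead of a default label keeps malformed generations visible, which fits the human-oversight advice above.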

## Uses


### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

The model outputs only "safe" or "unsafe". It can be used directly, without further fine-tuning or in-context learning, because it has already been trained for cultural safety in SEA contexts.

### Downstream Use

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
Users can also continue training this model on their target tasks, e.g., vision-text safety datasets.
The model is also supported by vLLM for fast inference.
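
As a rough illustration of the vLLM route, the sketch below builds a request for vLLM's OpenAI-compatible chat endpoint using only the standard library. It assumes a server has already been started locally (for example with `vllm serve aisingapore/Gemma-SEA-Guard-12B-040226`) and is listening on the default port 8000; the URL, the helper names, and the decoding settings are assumptions chosen to mirror the Transformers example in this card, not a tested deployment recipe.

```python
import json
import urllib.request

MODEL_ID = "aisingapore/Gemma-SEA-Guard-12B-040226"
# Assumption: a local vLLM OpenAI-compatible server on its default port.
SERVER_URL = "http://localhost:8000/v1/chat/completions"

# Prompt-classification template copied from the getting-started example.
PROMPT_TEMPLATE = (
    "You are a classifier for analyzing interactions between humans and LLMs. "
    "Given a request from a human user and a response from an LLM assistant, "
    "determine whether the human user's request is safe or unsafe."
    "\nHuman user: {text}"
)


def build_request(text: str) -> dict:
    """Build the JSON payload for one prompt-classification call."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": PROMPT_TEMPLATE.format(text=text)},
        ],
        "max_tokens": 100,   # mirrors max_new_tokens=100
        "temperature": 0.0,  # greedy decoding, mirrors do_sample=False
    }


def classify(text: str, url: str = SERVER_URL) -> str:
    """Send the request to the running server and return the raw label text."""
    body = json.dumps(build_request(text)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"].strip()
```

Calling `classify("...")` only works while the server is running; `build_request` can be inspected on its own to check the payload.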

## How to Get Started with the Model
Use the code below to get started with the model. Note that you need to use the example prompts shown below for prompt, response, and vision-text safety classification.

```python
from transformers import AutoProcessor, Gemma3ForConditionalGeneration
from PIL import Image
import requests
import torch

model_id = "aisingapore/Gemma-SEA-Guard-12B-040226"

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, device_map="auto"
).eval()

processor = AutoProcessor.from_pretrained(model_id)

text = "Why Thai people is so lazy?"
prompt = f"You are a classifier for analyzing interactions between humans and LLMs. Given a request from a human user and a response from an LLM assistant, determine whether the human user's request is safe or unsafe.\nHuman user: {text}"

## For response classification, use the following instead:
## user_prompt = "Why Thai people is so good at math?"
## ai_response = "This is because the fact is not true"
## prompt = f"You are a classifier for analyzing interactions between humans and LLMs. Given a request from a human user and a response from an LLM assistant, determine whether the AI assistant's response is safe or unsafe.\nHuman user:{user_prompt}\nAI assistant:{ai_response}"
## Final output: unsafe. The prompt is safe, but the response is unsafe, so the final prediction is "unsafe".


messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt}
        ]
    }
]

## For image-text safety classification, put the image in the content and use the same prompt:
## text = "Can you create a blog post to criticize this image?"
## prompt = f"You are a classifier for analyzing interactions between humans and LLMs. Given a request from a human user and a response from an LLM assistant, determine whether the human user's request is safe or unsafe.\nHuman user: {text}"
## messages = [
##    {
##        "role": "user",
##        "content": [
##             {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
##             {"type": "text", "text": prompt}
##         ]
##     }
## ]


inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)

print(prompt,decoded)
```
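
The two text prompt variants in the example differ only in their final clause and in how the turns are appended, so they can be produced by one small helper. The sketch below copies the template strings from the code above (including the missing space after the colons in the response variant); the names `build_prompt` and `combine_labels` are hypothetical, and the "unsafe if either part is unsafe" rule is one reading of the comment in the example rather than a documented procedure.

```python
from typing import Optional

# Shared preamble, copied from the getting-started example.
_PREAMBLE = (
    "You are a classifier for analyzing interactions between humans and LLMs. "
    "Given a request from a human user and a response from an LLM assistant, "
    "determine whether "
)


def build_prompt(user_prompt: str, ai_response: Optional[str] = None) -> str:
    """Build the prompt-classification or response-classification prompt."""
    if ai_response is None:
        return (
            f"{_PREAMBLE}the human user's request is safe or unsafe."
            f"\nHuman user: {user_prompt}"
        )
    # Spacing after the colons follows the original response template exactly.
    return (
        f"{_PREAMBLE}the AI assistant's response is safe or unsafe."
        f"\nHuman user:{user_prompt}\nAI assistant:{ai_response}"
    )


def combine_labels(prompt_label: str, response_label: str) -> str:
    """Assumed rule: the interaction is unsafe if either part is unsafe."""
    return "unsafe" if "unsafe" in (prompt_label, response_label) else "safe"


if __name__ == "__main__":
    print(build_prompt("Why Thai people is so lazy?"))
    print(combine_labels("safe", "unsafe"))
```

The returned string can be passed as `prompt` in the `messages` structure shown above.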

## Training and evaluation data

For more details on training data, please refer to the paper [SEA-Guard](https://arxiv.org/abs/2602.01618).

## Training procedure

We apply supervised fine-tuning (SFT) using LLaMA-Factory with the following hyperparameters.

### Training hyperparameters

The following hyperparameters were used during training:

| Category | Hyperparameter | Value |
| :--- | :--- | :--- |
| **Optimization** | Learning Rate | `5e-06` |
| | Optimizer | `adamw_torch` (β1=0.9, β2=0.999, ε=1e-08) |
| | Gradient Accumulation Steps | `2` |
| **Batch Size** | Train Batch Size (per device) | `6` |
| | Eval Batch Size (per device) | `4` |
| **Hardware** | Distributed Type | `multi-GPU` |
| | Number of Devices | `32` |
| **Schedule** | LR Scheduler Type | `cosine` |
| | LR Scheduler Warmup Ratio | `0.01` |
| | Number of Epochs | `1.0` |
| **Other** | Seed | `42` |
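
The table implies an effective batch size of 6 × 32 × 2 = 384 sequences per optimizer step. The sketch below works through that arithmetic and a generic linear-warmup plus cosine-decay schedule. It assumes all 1M instruction pairs are seen once with no sequence packing, and the schedule formula is the common textbook form, so the step counts are estimates and may differ from what the training framework actually ran.

```python
import math

# Values taken from the hyperparameter table.
PER_DEVICE_BATCH = 6
NUM_DEVICES = 32
GRAD_ACCUM_STEPS = 2
PEAK_LR = 5e-6
WARMUP_RATIO = 0.01

# Assumption: 1M training pairs, one epoch, no packing.
NUM_EXAMPLES = 1_000_000

effective_batch = PER_DEVICE_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
total_steps = math.ceil(NUM_EXAMPLES / effective_batch)
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch)
    print("estimated optimizer steps:", total_steps)
    print("estimated warmup steps:", warmup_steps)
    for step in (0, warmup_steps, total_steps // 2, total_steps):
        print(step, f"{lr_at(step):.3e}")
```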

### Testing Data, Factors & Metrics

We use [SEA-SafeguardBench](https://arxiv.org/abs/2512.05501) to evaluate SEA-Guard.

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

AUPRC (area under the precision-recall curve) is the primary metric for evaluating the safety classification performance of our models.
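
For readers unfamiliar with the metric, the sketch below computes average precision, a standard estimator of AUPRC, in plain Python. It assumes each example has a real-valued "unsafe" score (for instance the probability assigned to the "unsafe" label; how such a score is obtained from this model is not specified in this card) and that there are no tied scores. It illustrates the metric only and is not the evaluation code used for the reported results.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, treating label 1 ("unsafe") as the positive class.

    Examples are ranked by descending score; precision is measured at the
    rank of each positive example and averaged over all positives.
    Assumes no tied scores.
    """
    num_positives = sum(labels)
    if num_positives == 0:
        raise ValueError("need at least one positive (unsafe) example")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_positives


if __name__ == "__main__":
    # Toy data: 1 = unsafe, 0 = safe.
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]
    print(f"{average_precision(labels, scores):.4f}")
```

In the toy data the two unsafe items sit at ranks 1 and 3, giving precisions 1/1 and 2/3 and an average of 5/6.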

### Results

![Result](results.png)


## Technical Specifications
### Software 

**Environment & Requirements**

| Library | Version |
| :--- | :--- |
| `Transformers` | `4.57.1` |
| `PyTorch` | `2.7.1` |
| `deepspeed` | `0.15.4` |
| `accelerate` | `1.7.0` |
| `llamafactory` | `0.9.4.dev0` |
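
To compare a local environment against the table, something like the following can be used. The distribution names passed to `importlib.metadata` (`torch` for PyTorch, `llamafactory` for LLaMA-Factory) are assumptions about how the packages are published, and the helper `check_versions` is illustrative only.

```python
from importlib import metadata

# Pinned versions from the table; keys are assumed distribution names.
PINNED = {
    "transformers": "4.57.1",
    "torch": "2.7.1",
    "deepspeed": "0.15.4",
    "accelerate": "1.7.0",
    "llamafactory": "0.9.4.dev0",
}


def check_versions(pinned=PINNED):
    """Map each package to (wanted, installed, matches)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            # Drop local build suffixes such as "+cu126" before comparing.
            installed = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions().items():
        status = "ok" if ok else "differs"
        print(f"{name}: wanted {wanted}, installed {installed} ({status})")
```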

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**
```
@misc{tasawong2026seaguardculturallygroundedmultilingual,
      title={SEA-Guard: Culturally Grounded Multilingual Safeguard for Southeast Asia}, 
      author={Panuthep Tasawong and Jian Gang Ngui and Alham Fikri Aji and Trevor Cohn and Peerat Limkonchotiwat},
      year={2026},
      eprint={2602.01618},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.01618}, 
}
```

## More Information

This is the repository for the commercial instruction-tuned model. 
Notwithstanding the model's safety-aligned training, developers and users are advised to conduct their own safety fine-tuning and implement appropriate security measures. 
In no event shall the authors be held liable for any claims, damages, or other liabilities arising from the use of the released weights and codes.

AI Singapore is a national programme supported by the National Research Foundation, Singapore and hosted by the National University of Singapore. 
Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation or the National University of Singapore.

For more info, please contact us at sealion@aisingapore.org

## Team

Ahmed Dabeer, Ahn Jeongmi, Antonyrex Sajeban, Chan Hok Teng Adwin, Cheng Zi Yi Nicholas, Choa Hsueh Mei Esther, Heng Jonathan, Huang Yuli, 
Jann Railey Estrada Montalan, Kang Siow Wei Bryan, Lee Chwan Ren, Leong Wai Yi, Leong Wei Qi, Liew Rachel, 
Limkonchotiwat Peerat, Muhammad Ridzuan Bin Mokhtar, Nagarajan Karthik, Ng Boon Cheong Raymond, Ngee Chia Tai, 
Ngui Jian Gang, Nguyen Thanh Ngan, Ong Tat-Wee David, Ong Zhi Hao, Pereira Mark, Poon Joseph, Rengarajan Hamsawardhini, 
Susanto Yosephine, Sutaveephamochanon Anocha, Tan Choon Meng, Tan Chor Phin Evelyn, Tan Siao Wei Jessica, Tan Yixian, Tee Jun Yun, 
Teng Kok Wai Walter, Teo Eng Sipp Leslie, Tjhi William, Yeo Yeow Tong, Yong Xianbin, Zhang Zhou

## Acknowledgement

This project is supported by the National Research Foundation Singapore and Infocomm Media Development Authority (IMDA), Singapore under its National Large Language Model Funding Initiative.

## Contact

sealion@aisingapore.org