KomdigiITS-0.8B-DFK
Multimodal Classification

Qwen3.5-0.8B · LoRA · Vision-Language
✺
Overview

A LoRA adapter fine-tuned on Qwen3.5-0.8B as a vision-language model for multimodal content classification. The model analyzes social media screenshots and classifies them into four categories: netral (neutral), disinformasi (disinformation), fitnah (defamation), and ujaran kebencian (hate speech).

Trained using the SITA framework with Unsloth's SFT pipeline. Given an image, the model produces a structured analysis consisting of a classification label and a detailed Indonesian-language explanation of any violations found.

♦ Note: This is the final checkpoint from Workshop 3 (final-qwen35-0.8b-ws3), trained on the DFK VLM Dataset V3 with augmented train/val splits.
✺
Model Details

Identity

Developed by: DFK Tim 3 ITS
Model type: VLM (LoRA adapter)
Language: Indonesian

Architecture

Arch: Qwen3_5ForCausalLM
Parameters: 0.8B (base)
Precision: bfloat16
✺
Uses

Direct Use

Image-based content moderation for Indonesian social media. Given a screenshot, the model produces a structured analysis: a classification label (netral, disinformasi, fitnah, or ujaran kebencian) followed by detailed reasoning in Indonesian.

Out-of-Scope Use

This model is not intended for general-purpose vision-language tasks. It is specialized for the DFK disinformation detection pipeline and should not be used for content moderation in other languages or domains without further fine-tuning.

✺
Evaluation

Evaluated on the held-out validation split using greedy decoding (temperature=0.0). Generated text is scored with BERTScore (bert-base-multilingual-cased).

Accuracy: 92.5%
F1 Macro: 89.3%
F1 Weighted: 92.8%
BERTScore F1: 79.5%
Per-Class Breakdown
Netral: P 0.954 · R 0.941 · F1 0.948 · n=970
Ujaran Kebencian: P 0.982 · R 0.930 · F1 0.955 · n=867
Disinformasi: P 0.943 · R 0.888 · F1 0.915 · n=392
Fitnah: P 0.651 · R 0.901 · F1 0.756 · n=213
BERTScore Details
Precision: 0.797
Recall: 0.793
F1: 0.795
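The headline classification metrics can be recomputed from the per-class breakdown. The sketch below assumes standard single-label definitions: macro F1 is the unweighted mean of per-class F1, weighted F1 is the support-weighted mean, and accuracy equals the support-weighted mean of per-class recall. The names `PER_CLASS` and `aggregate` are illustrative and are not part of the evaluation code.

```python
# Per-class (precision, recall, F1, support), copied from the breakdown above.
PER_CLASS = {
    "netral": (0.954, 0.941, 0.948, 970),
    "ujaran kebencian": (0.982, 0.930, 0.955, 867),
    "disinformasi": (0.943, 0.888, 0.915, 392),
    "fitnah": (0.651, 0.901, 0.756, 213),
}


def aggregate(per_class):
    """Combine per-class scores into accuracy, macro F1, and weighted F1."""
    rows = list(per_class.values())
    total = sum(n for _, _, _, n in rows)
    macro_f1 = sum(f1 for _, _, f1, _ in rows) / len(rows)
    weighted_f1 = sum(f1 * n for _, _, f1, n in rows) / total
    # In single-label classification, recall * support is the number of
    # correct predictions for that class, so this sum is overall accuracy.
    accuracy = sum(r * n for _, r, _, n in rows) / total
    return {"accuracy": accuracy, "f1_macro": macro_f1, "f1_weighted": weighted_f1}


summary = aggregate(PER_CLASS)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Because the inputs are already rounded to three decimals, the results agree with the reported figures only to within rounding.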
✺
Training Details

Training Data

Dataset: dfk_vlm_dataset_v3 (augmented on fitnah class)
Split mode: Fixed splits (train_aug.csv / val_aug.csv)
Train size: 14,293 samples
Val size: 2,831 samples

Label Classes

Netral: Factual content or non-DFK material; no violation detected
Disinformasi: Claims that contradict established facts, not directed at a specific person
Fitnah: False claims directed at a specific individual (defamation)
Ujaran Kebencian: Hate speech targeting ethnicity, religion, race, or intergroup identity (SARA)
Dataset Distribution
Train (aug): 14,293 total
Netral: 3,883 (27.2%)
Fitnah: 3,846 (26.9%)
Ujaran Kebencian: 3,484 (24.4%)
Disinformasi: 3,080 (21.5%)
Val (aug): 2,831 total
Netral: 970 (34.3%)
Ujaran Kebencian: 867 (30.6%)
Disinformasi: 765 (27.0%)
Fitnah: 229 (8.1%)

LoRA Configuration

r: 16
Alpha: 16
Dropout: 0.1
Targets: all-linear
Vision layers: ✓ fine-tuned
Language layers: ✓ fine-tuned
Attention modules: ✓ fine-tuned
MLP modules: ✓ fine-tuned
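To put r=16 and alpha=16 in perspective, the sketch below computes how many trainable parameters LoRA adds to a single linear layer and the scaling factor applied to the low-rank update. It assumes the standard (non rank-stabilized) LoRA formulation, where the update is (alpha / r) · B · A. The 1024 × 1024 layer shape is hypothetical and chosen only for illustration; it is not taken from the Qwen3.5-0.8B architecture.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


def lora_scaling(alpha: float, r: int) -> float:
    """Scaling factor applied to B @ A in standard LoRA."""
    return alpha / r


R, ALPHA = 16, 16          # values from the configuration above
D_IN, D_OUT = 1024, 1024   # hypothetical layer shape, for illustration only

added = lora_param_count(D_IN, D_OUT, R)
frozen = D_IN * D_OUT
print(f"LoRA params: {added}, frozen weight params: {frozen}")
print(f"fraction trainable for this layer: {added / frozen:.4f}")
print(f"scaling factor: {lora_scaling(ALPHA, R)}")
```

With alpha equal to r, the scaling factor is 1, so the low-rank update is added to the frozen weight without rescaling.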

Hyperparameters

Epochs: 3
Batch size: 32
Learning rate: 2e-4
Optimizer: AdamW 8-bit
Max sequence length: 2048
Gradient accumulation steps: 1
Gradient checkpointing: unsloth
Seed: 3407
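These settings imply a rough training schedule. The sketch below estimates optimizer steps under two assumptions that the card does not state: training ran on a single device, and the final partial batch of each epoch was kept. The `eval_steps` and `save_steps` values come from the full training config; the function name `schedule` is illustrative.

```python
import math

TRAIN_SIZE = 14_293
BATCH_SIZE = 32
GRAD_ACCUM = 1
EPOCHS = 3
EVAL_STEPS = 50    # from the full training config
SAVE_STEPS = 100   # from the full training config


def schedule(train_size, batch_size, grad_accum, epochs, num_devices=1):
    """Return (effective batch size, steps per epoch, total optimizer steps).

    Assumes the last partial batch of each epoch is kept (no drop_last).
    """
    effective_batch = batch_size * grad_accum * num_devices
    steps_per_epoch = math.ceil(train_size / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs


effective_batch, steps_per_epoch, total_steps = schedule(
    TRAIN_SIZE, BATCH_SIZE, GRAD_ACCUM, EPOCHS
)
print(f"effective batch size: {effective_batch}")
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
print(f"evaluations: {total_steps // EVAL_STEPS}, checkpoints: {total_steps // SAVE_STEPS}")
```

Under these assumptions the run takes 447 steps per epoch and 1,341 steps overall, with an evaluation every 50 steps providing the eval_loss values used for checkpoint selection.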

Trainer

Type: unsloth_vlm_sft (Unsloth VLM SFT trainer)
Train on: Responses only
Instruction part: <|im_start|>user\n
Response part: <|im_start|>assistant\n
Best model: Selected by eval_loss (lower is better)
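Training on responses only means the loss is computed on assistant tokens, while prompt tokens are masked out. The sketch below illustrates the idea on plain token-id lists. It is not Unsloth's implementation, and the token ids in the usage lines are made up; in practice the marker ids would come from tokenizing the instruction and response parts listed above.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def find_subsequence(seq, sub, start=0):
    """Return the first index >= start where sub occurs in seq, or -1."""
    for i in range(start, len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return -1


def mask_to_responses(input_ids, instruction_ids, response_ids):
    """Build labels that keep only tokens following each response marker.

    Everything up to and including a response marker is masked; tokens are
    kept until the next instruction marker or the end of the sequence.
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    pos = 0
    while True:
        start = find_subsequence(input_ids, response_ids, pos)
        if start == -1:
            break
        begin = start + len(response_ids)
        end = find_subsequence(input_ids, instruction_ids, begin)
        if end == -1:
            end = len(input_ids)
        labels[begin:end] = input_ids[begin:end]
        pos = end
    return labels


# Made-up ids: [1, 2] stands for the user marker, [1, 3] for the assistant marker.
example_ids = [1, 2, 10, 11, 9, 1, 3, 20, 21, 9]
example_labels = mask_to_responses(example_ids, [1, 2], [1, 3])
print(example_labels)
```

Only the three tokens after the assistant marker keep their ids as labels, so the prompt, including the retrieved facts, contributes nothing to the loss.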
Prompt Template

Each sample is formatted as a single user/assistant exchange using the qwen3.5_chatml template:

<|im_start|>user
Anda adalah seorang analis konten media sosial ahli. Diberikan tangkapan layar
dari sebuah konten, tentukan label kategori pelanggaran dan berikan analisis
detail mengenai pelanggaran yang ditemukan.
Ringkasan: {ringkasan}
Klaim: {klaim}
Fakta: {fakta}
<image>
<|im_end|>
<|im_start|>assistant
Label: {label}

Analisis: {analisis} <|im_end|>

Input Fields

Ringkasan: Content summary. In the RAG pipeline this is the concatenation of the image caption (from a captioning model) and any user-provided text (e.g. post caption, tweet text). It effectively holds all available textual context about the content.
Klaim: The core claim extracted from the content, used as a web search query for fact-checking. Generated by an LLM from the ringkasan. In simpler setups it can also be a direct caption or user-provided text.
Fakta: Verification context retrieved via web search. Contains numbered search results with titles, descriptions, and source URLs. If no relevant sources are found, it defaults to "Tidak ditemukan sumber yang valid."
<image>: Screenshot of the social media post being analyzed.
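The text portion of the user turn can be assembled from these fields as sketched below. The function name `build_user_text` is hypothetical. The sketch assumes that the line breaks inside the instruction sentence in the template are display wrapping, and it leaves the image placeholder and the <|im_start|>/<|im_end|> markers to the chat template.

```python
INSTRUCTION = (
    "Anda adalah seorang analis konten media sosial ahli. Diberikan tangkapan layar "
    "dari sebuah konten, tentukan label kategori pelanggaran dan berikan analisis "
    "detail mengenai pelanggaran yang ditemukan."
)
# Default used when the web search returns no relevant sources.
NO_FACTS = "Tidak ditemukan sumber yang valid."


def build_user_text(ringkasan: str, klaim: str, fakta: str = "") -> str:
    """Assemble the text part of the user message from the three input fields."""
    fakta = fakta.strip() or NO_FACTS
    return (
        f"{INSTRUCTION}\n"
        f"Ringkasan: {ringkasan.strip()}\n"
        f"Klaim: {klaim.strip()}\n"
        f"Fakta: {fakta}"
    )


# Made-up example values, for illustration only.
user_text = build_user_text(
    ringkasan="Tangkapan layar unggahan tentang kebijakan baru.",
    klaim="Pemerintah menghapus subsidi mulai besok.",
)
print(user_text)
```

Keeping the fallback string identical to the one used during training avoids presenting the model with an empty Fakta field it has never seen.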

Output Fields

Label: One of netral, disinformasi, fitnah, or ujaran kebencian.
Analisis: Free-form Indonesian-language explanation of why the content was assigned its label, referencing the image, context, and any retrieved facts.
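Downstream code usually needs the label and the analysis as separate values. The sketch below parses a generated response in the format shown in the prompt template. The names `Prediction` and `parse_output` are hypothetical, and treating an unrecognized label as `None` is a design choice of this sketch rather than documented behavior.

```python
import re
from typing import NamedTuple, Optional

LABELS = ("netral", "disinformasi", "fitnah", "ujaran kebencian")

# Captures the label line and everything after "Analisis:".
_PATTERN = re.compile(r"Label:\s*(.*?)\s*Analisis:\s*(.*)", re.DOTALL)


class Prediction(NamedTuple):
    label: Optional[str]  # None when the label is missing or not one of LABELS
    analisis: str


def parse_output(text: str) -> Prediction:
    """Split a generated response into its label and analysis parts."""
    text = text.replace("<|im_end|>", "").strip()
    match = _PATTERN.search(text)
    if match is None:
        # Malformed output: keep the raw text so it can be inspected later.
        return Prediction(None, text)
    label = match.group(1).strip().lower()
    return Prediction(label if label in LABELS else None, match.group(2).strip())


# Made-up model output, for illustration only.
result = parse_output("Label: fitnah\n\nAnalisis: Konten menuduh seseorang tanpa bukti. <|im_end|>")
print(result.label)
print(result.analisis)
```

Returning `None` for an unexpected label lets the caller route malformed generations to manual review instead of silently counting them as one of the four classes.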
Full Training Config
experiment_name: final-qwen35-0.8b-ws3
seed: 3407

reporting:
  wandb: true
  wandb_project: "DFK3"

model:
  name: unsloth_vlm
  pretrained: unsloth/Qwen3.5-0.8B
  kwargs:
    load_in_4bit: false
    chat_template: "sita/templates/qwen3.5_chatml.jinja"

adapter:
  name: unsloth_vlm_lora
  kwargs:
    finetune_vision_layers: true
    finetune_language_layers: true
    finetune_attention_modules: true
    finetune_mlp_modules: true
    r: 16
    lora_alpha: 16
    lora_dropout: 0.1
    bias: "none"
    target_modules: "all-linear"
    use_gradient_checkpointing: "unsloth"
    random_state: 3407

dataset:
  name: dfk_vlm_dataset_v3

training:
  num_epochs: 3
  batch_size: 32
  learning_rate: 2e-4
  gradient_accumulation_steps: 1
  logging_steps: 1
  save_steps: 100
  eval_steps: 50
  extra:
    seed: 3407
    max_length: 2048
    load_best_model_at_end: true
    metric_for_best_model: eval_loss
    greater_is_better: false

trainer:
  name: unsloth_vlm_sft
  kwargs:
    train_on_responses_only: true
    instruction_part: "<|im_start|>user\n"
    response_part: "<|im_start|>assistant\n"
    optim: adamw_8bit

evaluation:
  name: vlm_gen
  kwargs:
    max_new_tokens: 512
    temperature: 0.0
    bert_model: bert-base-multilingual-cased
    batch_size: 16
    num_workers: 11

✺
Model Sources
✺
Framework Versions
TRL: 0.22.2
Transformers: 5.3.0
PyTorch: 2.11.0+cu128
Datasets: 4.3.0
PEFT: 0.19.0
Tokenizers: 0.22.2
Repository: aitf-its-tim3-dfk/KomdigiITS-0.8B-DFK-MultimodalClassification (adapter)