KomdigiITS-0.8B-DFK
Multimodal Classification

Qwen3.5-0.8B · LoRA · Vision-Language

✺

Overview

A LoRA adapter fine-tuned on Qwen3.5-0.8B as a Vision-Language Model for multimodal content classification. The model analyzes social media screenshots and classifies them into four categories: netral, disinformasi, fitnah, and ujaran kebencian.

Trained using the SITA framework with Unsloth's SFT pipeline. Given an image, the model produces a structured analysis: a classification label followed by a detailed Indonesian-language explanation of any violations found.

♦ Note: This is the final checkpoint from Workshop 3 (final-qwen35-0.8b-ws3), trained on the DFK VLM Dataset V3 with augmented train/val splits.

✺

Model Details

Identity

Developed by: DFK Tim 3 ITS

Model type: VLM, LoRA adapter

Language: Indonesian

Architecture

Base model: unsloth/Qwen3.5-0.8B

Arch: Qwen3_5ForCausalLM

Parameters: 0.8B (base)

Precision: bfloat16

✺

Uses

Direct Use

Image-based content moderation classification for Indonesian social media. Given a screenshot, the model produces a structured analysis with a classification label (netral, disinformasi, fitnah, or ujaran kebencian) and a detailed reasoning in Indonesian.

Out-of-Scope Use

This model is not intended for general-purpose vision-language tasks. It is specialized for the DFK disinformation detection pipeline and should not be used for content moderation in other languages or domains without further fine-tuning.

✺

Evaluation

Evaluated on the held-out validation split using greedy decoding (temperature=0.0) and BERTScore (bert-base-multilingual-cased).

Accuracy: 92.5%

F1 Macro: 89.3%

F1 Weighted: 92.8%

BERTScore F1: 79.5%

Per-Class Breakdown

Netral: P 0.954 · R 0.941 · F1 0.948 · n=970

Ujaran Kebencian: P 0.982 · R 0.930 · F1 0.955 · n=867

Disinformasi: P 0.943 · R 0.888 · F1 0.915 · n=392

Fitnah: P 0.651 · R 0.901 · F1 0.756 · n=213
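The headline scores can be recovered from the per-class rows. The following is a minimal sketch, assuming the n values are the per-class support used at evaluation time and that accuracy equals the support-weighted recall; the variable names are illustrative and not part of the SITA framework.

```python
# Per-class recall, F1 and support copied from the breakdown above.
per_class = {
    "netral":           {"r": 0.941, "f1": 0.948, "n": 970},
    "ujaran kebencian": {"r": 0.930, "f1": 0.955, "n": 867},
    "disinformasi":     {"r": 0.888, "f1": 0.915, "n": 392},
    "fitnah":           {"r": 0.901, "f1": 0.756, "n": 213},
}

total = sum(row["n"] for row in per_class.values())

# Macro F1: unweighted mean over classes, so fitnah counts as much as netral.
f1_macro = sum(row["f1"] for row in per_class.values()) / len(per_class)

# Weighted F1: mean over classes, weighted by support.
f1_weighted = sum(row["f1"] * row["n"] for row in per_class.values()) / total

# Accuracy: correctly classified samples (recall * support) over all samples.
accuracy = sum(row["r"] * row["n"] for row in per_class.values()) / total

print(f"accuracy={accuracy:.4f} f1_macro={f1_macro:.4f} f1_weighted={f1_weighted:.4f}")
```

Up to rounding of the three-decimal inputs, the results agree with the reported accuracy, macro F1, and weighted F1. The gap between macro and weighted F1 comes almost entirely from the low precision on fitnah.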

BERTScore Details

Precision: 0.797

Recall: 0.793

F1: 0.795

✺

Training Details

Training Data

Dataset: dfk_vlm_dataset_v3 (augmented on fitnah class)

Split mode: Fixed splits (train_aug.csv / val_aug.csv)

Train size: 14,293 samples

Val size: 2,831 samples

Label Classes

Netral: Factual content or non-DFK material; no violation detected

Disinformasi: Claims that contradict established facts, not directed at a specific person

Fitnah: False claims directed at a specific individual (defamation)

Ujaran Kebencian: Hate speech targeting ethnicity, religion, race, or intergroup identity (SARA)

Dataset Distribution

Train (aug): 14,293 total

Netral: 3,883 (27.2%)

Fitnah: 3,846 (26.9%)

Ujaran Kebencian: 3,484 (24.4%)

Disinformasi: 3,080 (21.6%)

Val (aug): 2,831 total

Netral: 970 (34.3%)

Ujaran Kebencian: 867 (30.6%)

Disinformasi: 765 (27.0%)

Fitnah: 229 (8.1%)
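As a quick check on the distribution above, the class shares can be recomputed from the raw counts. This is a small sketch with illustrative names; the counts are copied from the lists above.

```python
# Sample counts per class, copied from the distribution above.
train = {"netral": 3883, "fitnah": 3846, "ujaran kebencian": 3484, "disinformasi": 3080}
val = {"netral": 970, "ujaran kebencian": 867, "disinformasi": 765, "fitnah": 229}

def shares(counts):
    """Return each class's share of the split as a percentage."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}

train_pct = shares(train)
val_pct = shares(val)

for label in train:
    print(f"{label:<17} train {train_pct[label]:5.1f}%  val {val_pct[label]:5.1f}%")
```

The fitnah share is roughly a quarter of the training split but under a tenth of the validation split, which is consistent with the augmentation being applied to the fitnah class.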

LoRA Configuration

r: 16

Alpha: 16

Dropout: 0.1

Targets: all-linear

Vision: ✓ finetuned

Language: ✓ finetuned

Attention: ✓ finetuned

MLP: ✓ finetuned
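To make the r and Alpha values concrete, the sketch below works through the standard LoRA arithmetic: the low-rank update is scaled by alpha / r, and each adapted linear layer gains r * (d_in + d_out) trainable parameters. It assumes plain LoRA scaling (not rank-stabilized LoRA), and the layer shape is hypothetical, chosen only for illustration rather than taken from Qwen3.5-0.8B.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Standard LoRA scales the low-rank update B @ A by alpha / r."""
    return alpha / r

def lora_added_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

r, alpha = 16, 16          # values from the configuration above
d_in, d_out = 1024, 1024   # hypothetical layer shape, not taken from the model

scale = lora_scaling(r, alpha)
added = lora_added_params(d_in, d_out, r)
frozen = d_in * d_out      # weights of the frozen base layer

print(f"scale={scale} added={added} frozen={frozen} ratio={added / frozen:.4f}")
```

With r equal to alpha the scaling factor is 1, so the adapter output is added to the frozen layer's output without rescaling.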

Hyperparameters

Epochs: 3

Batch size: 32

LR: 2e-4

Optimizer: AdamW 8-bit

Max seq len: 2048

Grad accum: 1

Grad ckpt: unsloth

Seed: 3407
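These hyperparameters imply a fairly short schedule. The sketch below estimates the number of optimizer steps, assuming a single device and that the last partial batch of each epoch is kept; neither assumption is stated in the card. The eval_steps and save_steps values come from the full training config.

```python
import math

train_size = 14_293   # training samples
batch_size = 32
grad_accum = 1
epochs = 3
eval_steps = 50       # from the full training config
save_steps = 100      # from the full training config

# Assumption: one device, last partial batch of each epoch is not dropped.
steps_per_epoch = math.ceil(train_size / (batch_size * grad_accum))
total_steps = steps_per_epoch * epochs

num_evals = total_steps // eval_steps   # periodic evaluations during training
num_saves = total_steps // save_steps   # periodic checkpoints during training

print(steps_per_epoch, total_steps, num_evals, num_saves)
```

Under these assumptions the best checkpoint is chosen from about a dozen saved candidates, each paired with an eval_loss measurement.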

Trainer

Type: unsloth_vlm_sft (Unsloth VLM SFT trainer)

Train on: Responses only

Instr part: <|im_start|>user\n

Resp part: <|im_start|>assistant\n

Best model: Selected by eval_loss (lower is better)
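Training on responses only means the loss is computed on assistant tokens and the prompt tokens are ignored. The sketch below illustrates the idea on toy integer token ids; it is not Unsloth's implementation, and the marker ids and helper names are hypothetical stand-ins for the tokenized instruction and response parts.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def find_all(seq, pattern):
    """Return every start index where `pattern` occurs as a contiguous run in `seq`."""
    n = len(pattern)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == pattern]

def mask_non_response(tokens, instruction_part, response_part):
    """Copy tokens into labels only inside assistant spans; mask everything else."""
    labels = [IGNORE_INDEX] * len(tokens)
    instr_starts = find_all(tokens, instruction_part)
    for start in find_all(tokens, response_part):
        begin = start + len(response_part)
        # The span ends at the next user marker, or at the end of the sequence.
        later = [i for i in instr_starts if i > start]
        end = later[0] if later else len(tokens)
        labels[begin:end] = tokens[begin:end]
    return labels

# Toy ids: [1, 2] stands for "<|im_start|>user\n", [1, 3] for "<|im_start|>assistant\n",
# and 4 for "<|im_end|>". Real ids depend on the tokenizer.
USER, ASSISTANT = [1, 2], [1, 3]
tokens = [1, 2, 10, 11, 4, 1, 3, 20, 21, 4]

labels = mask_non_response(tokens, USER, ASSISTANT)
print(labels)
```

Only the tokens after the assistant marker keep their ids as labels, so the gradient never rewards the model for reproducing the prompt.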

Prompt Template

Each sample is formatted as a single user/assistant exchange using the qwen3.5_chatml template:

<|im_start|>user
Anda adalah seorang analis konten media sosial ahli. Diberikan tangkapan layar
dari sebuah konten, tentukan label kategori pelanggaran dan berikan analisis
detail mengenai pelanggaran yang ditemukan.
Ringkasan: {ringkasan}
Klaim: {klaim}
Fakta: {fakta}
<image>
<|im_end|>
<|im_start|>assistant
Label: {label}

Analisis: {analisis}
<|im_end|>

Input Fields

Ringkasan: Content summary. In the RAG pipeline this is the concatenation of the image caption (from a captioning model) and any user-provided text (e.g. post caption, tweet text). Effectively holds all available textual context about the content.

Klaim: The core claim extracted from the content, used as a web search query for fact-checking. Generated by an LLM from the ringkasan. Can also be a direct caption or user-provided text in simpler setups.

Fakta: Verification context retrieved via web search. Contains numbered search results with titles, descriptions, and source URLs. If no relevant sources are found, defaults to "Tidak ditemukan sumber yang valid."

<image>: Screenshot of the social media post being analyzed.
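The user turn of the template can be assembled from these fields with plain string formatting. The sketch below is an illustration only: the function and constant names are hypothetical, it assumes the instruction is a single line (the line breaks in the template above may just be wrapping), and it leaves the chat markers and the handling of the <image> placeholder to the chat template and processor.

```python
INSTRUCTION = (
    "Anda adalah seorang analis konten media sosial ahli. Diberikan tangkapan layar "
    "dari sebuah konten, tentukan label kategori pelanggaran dan berikan analisis "
    "detail mengenai pelanggaran yang ditemukan."
)
# Default text for Fakta when the web search returns nothing usable.
NO_SOURCES = "Tidak ditemukan sumber yang valid."

def build_user_prompt(ringkasan: str, klaim: str, fakta: str = "") -> str:
    """Fill the user turn of the prompt template; empty fakta falls back to NO_SOURCES."""
    fakta = fakta.strip() or NO_SOURCES
    return (
        f"{INSTRUCTION}\n"
        f"Ringkasan: {ringkasan}\n"
        f"Klaim: {klaim}\n"
        f"Fakta: {fakta}\n"
        "<image>"
    )

# Made-up placeholder values, used only to show the resulting layout.
prompt = build_user_prompt(ringkasan="contoh ringkasan", klaim="contoh klaim")
print(prompt)
```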

Output Fields

Label: One of netral, disinformasi, fitnah, or ujaran kebencian.

Analisis: Free-form Indonesian-language explanation of why the content was assigned its label, referencing the image, context, and any retrieved facts.
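Because the output follows a fixed "Label: ... / Analisis: ..." layout, it can be parsed into a structured record. The following is a minimal sketch, assuming the model reproduces the template exactly; the function name and the choice to return None on malformed or unknown labels are illustrative decisions, not part of the original pipeline.

```python
import re

LABELS = ("netral", "disinformasi", "fitnah", "ujaran kebencian")

# Label line, one or more newlines, then the analysis running to the end of the text.
OUTPUT_RE = re.compile(
    r"Label:\s*(?P<label>.+?)\s*\n+\s*Analisis:\s*(?P<analisis>.*)",
    re.DOTALL,
)

def parse_output(text: str):
    """Return {'label', 'analisis'} for a well-formed generation, else None."""
    text = text.replace("<|im_end|>", "").strip()
    match = OUTPUT_RE.search(text)
    if match is None:
        return None
    label = match.group("label").strip().lower()
    if label not in LABELS:
        return None  # reject labels outside the four known classes
    return {"label": label, "analisis": match.group("analisis").strip()}

# Made-up generation, used only to exercise the parser.
sample = "Label: ujaran kebencian\n\nAnalisis: contoh analisis<|im_end|>"
result = parse_output(sample)
print(result)
```

Returning None instead of guessing a label keeps malformed generations visible to downstream code, which matters when the labels drive moderation decisions.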

Full Training Config

experiment_name: final-qwen35-0.8b-ws3
seed: 3407
reporting:
  wandb: true
  wandb_project: "DFK3"
model:
  name: unsloth_vlm
  pretrained: unsloth/Qwen3.5-0.8B
  kwargs:
    load_in_4bit: false
    chat_template: "sita/templates/qwen3.5_chatml.jinja"
adapter:
  name: unsloth_vlm_lora
  kwargs:
    finetune_vision_layers: true
    finetune_language_layers: true
    finetune_attention_modules: true
    finetune_mlp_modules: true
    r: 16
    lora_alpha: 16
    lora_dropout: 0.1
    bias: "none"
    target_modules: "all-linear"
    use_gradient_checkpointing: "unsloth"
    random_state: 3407
dataset:
  name: dfk_vlm_dataset_v3
training:
  num_epochs: 3
  batch_size: 32
  learning_rate: 2e-4
  gradient_accumulation_steps: 1
  logging_steps: 1
  save_steps: 100
  eval_steps: 50
  extra:
    seed: 3407
    max_length: 2048
    load_best_model_at_end: true
    metric_for_best_model: eval_loss
    greater_is_better: false
trainer:
  name: unsloth_vlm_sft
  kwargs:
    train_on_responses_only: true
    instruction_part: "<|im_start|>user\n"
    response_part: "<|im_start|>assistant\n"
    optim: adamw_8bit

evaluation:
  name: vlm_gen
  kwargs:
    max_new_tokens: 512
    temperature: 0.0
    bert_model: bert-base-multilingual-cased
    batch_size: 16
    num_workers: 11

✺

Model Sources

Framework: SITA

W&B Run: DFK3 / final-qwen35-0.8b-ws3

✺

Framework Versions

TRL: 0.22.2

Transformers: 5.3.0

PyTorch: 2.11.0+cu128

Datasets: 4.3.0

PEFT: 0.19.0

Tokenizers: 0.22.2


Model tree for aitf-its-tim3-dfk/KomdigiITS-0.8B-DFK-MultimodalClassification

Base model: Qwen/Qwen3.5-0.8B-Base

Finetuned: Qwen/Qwen3.5-0.8B