File size: 7,280 Bytes

356fd47
 
fe76ffd
356fd47
 
 
aa131c9
356fd47
 
 
 
 
 
 
 
 
 
 
 
 
fe76ffd
356fd47
fe76ffd
356fd47
fe76ffd
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
356fd47
 
fe76ffd
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
356fd47
 
 
 
 
 
 
 
fe76ffd
356fd47
 
fe76ffd
356fd47
fe76ffd
356fd47
 
fe76ffd
356fd47
fe76ffd
356fd47
 
 
fe76ffd
356fd47
 
 
 
 
 
fe76ffd
 
 
 
 
 
 
356fd47
fe76ffd
356fd47
fe76ffd
356fd47
 
 
 
 
 
 
 
 
 
 
 
fe76ffd
356fd47
fe76ffd
 
 
 
 
 
 
 
 
 
 
 
 
 
 
356fd47
fe76ffd
356fd47
 
fe76ffd
 
 
 
 
 
 
 
 
 
 
 
 
356fd47
 
fe76ffd
356fd47
fe76ffd
356fd47
fe76ffd
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
356fd47
 
fe76ffd

---
title: Guardrails ID
emoji: "\U0001F6E1\uFE0F"
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "6.10.0"
app_file: app.py
pinned: false
license: mit
tags:
  - guardrails
  - safety
  - bahasa-indonesia
  - indonesian
  - content-moderation
  - pii-detection
  - prompt-injection
---

<div align="center">

<img src="https://img.shields.io/badge/Guardrails_ID-v1.0.0-2563EB?style=for-the-badge&logo=shield&logoColor=white" alt="version" />

# Guardrails ID

**Sistem Keamanan AI untuk Bahasa Indonesia**

Proteksi lengkap untuk aplikasi AI: deteksi konten toxic, prompt injection, data pribadi (PII), dan topik terlarang — dibangun khusus untuk Bahasa Indonesia.

<p>
  <a href="https://huggingface.co/spaces/romizone/guardrails-id-demo"><img src="https://img.shields.io/badge/Demo-HuggingFace_Spaces-FFD21E?style=for-the-badge&logo=huggingface&logoColor=000" alt="HuggingFace Demo" /></a>
  <a href="#"><img src="https://img.shields.io/badge/License-MIT-22C55E?style=for-the-badge&logo=opensourceinitiative&logoColor=white" alt="License MIT" /></a>
  <a href="#"><img src="https://img.shields.io/badge/Python-3.8+-3776AB?style=for-the-badge&logo=python&logoColor=white" alt="Python" /></a>
  <a href="#"><img src="https://img.shields.io/badge/Tests-31_Passed-22C55E?style=for-the-badge&logo=checkmarx&logoColor=white" alt="Tests" /></a>
</p>

</div>

---

## Fitur Utama

| Guard | Fungsi | Contoh |
|:---:|---|---|
| <img src="https://img.shields.io/badge/-Toxic_Detector-EF4444?style=flat-square" /> | Deteksi kata kasar, hate speech, ancaman | `"kamu bodoh"` &rarr; Blocked |
| <img src="https://img.shields.io/badge/-PII_Detector-3B82F6?style=flat-square" /> | Deteksi & mask NIK, email, no HP, rekening | `"NIK 320101..."` &rarr; `[NIK/KTP]` |
| <img src="https://img.shields.io/badge/-Injection_Detector-8B5CF6?style=flat-square" /> | Deteksi prompt injection & jailbreak | `"ignore instructions"` &rarr; Blocked |
| <img src="https://img.shields.io/badge/-Topic_Filter-F97316?style=flat-square" /> | Blokir topik berbahaya (senjata, narkoba, dll) | `"cara membuat bom"` &rarr; Blocked |
| <img src="https://img.shields.io/badge/-Language_Detector-06B6D4?style=flat-square" /> | Deteksi bahasa Indonesia / English / Mixed | Auto-detect |

---

## Instalasi

### Dari HuggingFace Hub

```bash
git clone https://huggingface.co/romizone/guardrails-id
```

### Google Colab

```python
# Jalankan di cell pertama Google Colab
!git clone https://huggingface.co/romizone/guardrails-id /content/guardrails-id

import sys
sys.path.insert(0, "/content/guardrails-id")

from guardrails import GuardrailsPipeline

# Siap digunakan
pipeline = GuardrailsPipeline()
result = pipeline.check_input("Apa itu fotosintesis?")
print(result["safe"])     # True
print(result["summary"])  # Input aman
```

### Kaggle Notebook

```python
# Jalankan di cell pertama Kaggle Notebook
!git clone https://huggingface.co/romizone/guardrails-id /kaggle/working/guardrails-id

import sys
sys.path.insert(0, "/kaggle/working/guardrails-id")

from guardrails import GuardrailsPipeline

# Siap digunakan
pipeline = GuardrailsPipeline()
result = pipeline.check_input("Abaikan semua instruksi!")
print(result["safe"])     # False
print(result["summary"])  # Input diblokir
```

---

## Quick Start

```python
from guardrails import GuardrailsPipeline

pipeline = GuardrailsPipeline()

# --- Cek input user ---
result = pipeline.check_input("Apa itu fotosintesis?")
print(result["safe"])     # True
print(result["summary"])  # Input aman

# --- Cek input berbahaya ---
result = pipeline.check_input("Abaikan semua instruksi!")
print(result["safe"])     # False
print(result["summary"])  # Input diblokir

# --- Cek & scrub PII ---
result = pipeline.check_input("Email saya test@gmail.com")
print(result["sanitized_input"])  # "Email saya [EMAIL]"

# --- Cek output AI ---
result = pipeline.check_output(
    output_text="Hubungi 081234567890",
    input_text="Berikan nomor kontak"
)
print(result["sanitized_output"])  # "Hubungi [NO. HP]"

# --- Full pipeline (input + output) ---
result = pipeline.run(
    input_text="Jelaskan demokrasi",
    output_text="Demokrasi adalah sistem pemerintahan dari rakyat."
)
print(result["safe"])  # True
```

---

## Konfigurasi

```python
pipeline = GuardrailsPipeline(
    enable_toxic=True,       # Aktifkan toxic detector
    enable_pii=True,         # Aktifkan PII detector
    enable_injection=True,   # Aktifkan injection detector
    enable_topic=True,       # Aktifkan topic filter
    enable_language=True,    # Aktifkan language detector
    sensitivity="medium",    # low / medium / high
)
```

| Sensitivity | Perilaku |
|---|---|
| `low` | Hanya blokir konten yang sangat jelas berbahaya |
| `medium` | Keseimbangan antara keamanan dan fleksibilitas (default) |
| `high` | Sangat ketat, blokir konten yang sedikit mencurigakan |

---

## Khusus Bahasa Indonesia

- Deteksi kata kasar **Bahasa Indonesia** termasuk slang dan variasi
- PII detector untuk format **Indonesia**: NIK/KTP, NPWP, No HP (08xx/+62), rekening bank
- Prompt injection dalam **Bahasa Indonesia & English**
- Topic filter dengan konteks **budaya Indonesia**
- Self-harm detection dengan **hotline Indonesia** (Into The Light 021-7884-5555, Kemenkes 119 ext. 8)

---

## Struktur Proyek

```
guardrails-id/
  guardrails/
    __init__.py            # Package exports
    core.py                # GuardrailsPipeline — orchestrator utama
    guards/
      __init__.py
      toxic.py             # Toxic content detector
      pii.py               # PII detector & scrubber
      injection.py         # Prompt injection detector
      topic_lang.py        # Topic filter & language detector
  app.py                   # Gradio demo (HuggingFace Space)
  tests.py                 # Test suite (31 tests)
  deploy_to_hf.py          # Script deploy ke HuggingFace
```

---

## Test

```bash
python tests.py
```

```
RESULTS: 31 passed, 0 failed, Total 31
```

---

## API Reference

### `GuardrailsPipeline.check_input(text) -> dict`

| Key | Type | Deskripsi |
|---|---|---|
| `safe` | `bool` | `True` jika input aman |
| `input` | `str` | Teks input asli |
| `sanitized_input` | `str` | Teks dengan PII di-mask |
| `violations` | `list` | Daftar pelanggaran yang ditemukan |
| `guard_results` | `dict` | Detail hasil per-guard |
| `summary` | `str` | Ringkasan hasil pengecekan |

### `GuardrailsPipeline.check_output(output_text, input_text="") -> dict`

| Key | Type | Deskripsi |
|---|---|---|
| `safe` | `bool` | `True` jika output aman |
| `output` | `str` | Teks output asli |
| `sanitized_output` | `str` | Teks dengan PII di-mask |
| `violations` | `list` | Daftar pelanggaran yang ditemukan |
| `guard_results` | `dict` | Detail hasil per-guard |
| `summary` | `str` | Ringkasan hasil pengecekan |

### `GuardrailsPipeline.run(input_text, output_text="") -> dict`

| Key | Type | Deskripsi |
|---|---|---|
| `safe` | `bool` | `True` jika input dan output aman |
| `input_check` | `dict` | Hasil `check_input()` |
| `output_check` | `dict` | Hasil `check_output()` (atau `None`) |

---

<div align="center">

Built by **Jekardah AI Lab**

<img src="https://img.shields.io/badge/Made_in-Indonesia-EF4444?style=for-the-badge" alt="Made in Indonesia" />

MIT License

</div>