---
license: mit
base_model: WeiboAI/VibeThinker-3B
base_model_relation: adapter
library_name: mlx
pipeline_tag: text-generation
language:
- en
tags:
- mlx
- lora
- security
- bug-bounty
- vulnerability-triage
- vibethinker
- llm-as-judge
---

# VibeThinker-3B — Bug-Bounty Triage (LoRA adapter)

A LoRA fine-tune of [**WeiboAI/VibeThinker-3B**](https://huggingface.co/WeiboAI/VibeThinker-3B)
that triages bug-bounty / vulnerability-disclosure submissions into a structured
verdict — disposition, severity, confidence, and a rationale — and is hardened
against prompt-injection and AI-generated "slop" reports.

> Project name: **VibeBounty**. This repo hosts the trained **LoRA adapter** (mlx-lm
> format); fuse it onto the base model to get a standalone model.

## What it does

Given a report (title, asset, description, steps, impact), it emits a JSON verdict
over a 9-class disposition taxonomy:

`valid_impactful · valid_low · corroborated_surge · likely_duplicate ·
out_of_scope · theoretical_no_poc · self_inflicted · accepted_risk · slop`

plus a severity estimate, a confidence gated by claim-reliability, and questions
for the researcher.

## Files

| file | purpose |
|------|---------|
| `adapters/adapters.safetensors` | final LoRA adapter (iter 2000, mlx-lm) |
| `adapters/adapter_config.json` | adapter / training config |
| `lora_config.yaml` | full mlx-lm LoRA recipe |

## Usage (Apple Silicon / MLX)

```bash
pip install mlx-lm huggingface_hub
hf download macmacmacmac/vibebounty --local-dir vibebounty

# fuse adapter -> standalone model
mlx_lm.fuse --model WeiboAI/VibeThinker-3B \
  --adapter-path vibebounty/adapters --save-path vibethinker-bbtriage

# generate
mlx_lm.generate --model vibethinker-bbtriage \
  --prompt "Triage this report: IDOR in invoice download ..."
```

Or load the base + adapter directly with mlx-lm without fusing
(`--adapter-path vibebounty/adapters`).

## Training

- **Base:** WeiboAI/VibeThinker-3B (Qwen2.5-3B lineage)
- **Method:** LoRA (rank 16, scale 20, all 36 layers; q/k/v/o + MLP proj), `mask_prompt`
- **Iters:** 2000, batch 8, seq 2048, lr 1e-4, AdamW
- **Data:** ~18k bug-bounty reports labeled from **real disclosure outcomes**
  (substate / severity / bounty / CVE), rendered as chat with reasoning targets
- **Train loss** 3.4 → <0.7; **val loss** ~1.06

## Sample verdicts

```json
// IDOR: GET /api/v2/invoices/{id} returns other tenants' invoices
{"disposition": "valid_impactful", "severity_estimate": "high",
 "reasoning": "IDOR / broken-authz against an authenticated API; incrementing id
 walks the table -> crosses a real trust boundary with demonstrated impact.",
 "confidence": 0.9}

// Log4Shell report with an EXTERNAL CORROBORATION block (CVE-2021-44228, CISA KEV)
{"disposition": "corroborated_surge", "severity_estimate": "critical",
 "reasoning": "Maps to a publicly disclosed advisory confirmed by the live feed
 (actively exploited) -> corroborated, not spam.",
 "used_external_corroboration": true, "confidence": 0.9}
```

## Evaluation (held-out 300 reports, offline)

| metric | heuristic + defense baseline |
|---|---|
| accept / reject accuracy | **97.3%** |
| disposition accuracy (9-class) | 56.3% |
| macro-F1 | 0.191 |
| severity within-1 | 71.0% |
| adversarial defense suite | **6 / 6 pass** |

## Defense layer (model-independent)

Verdicts are guarded by ground-truth checks the model can't talk past:
prompt-injection isolation, **claim-level verification** (fabricated code symbols → `slop`),
and **threat-intel corroboration** (CVE/KEV/OSV → `corroborated_surge`, never spam).
Offline adversarial suite: **6/6**.

## Training data & provenance

~18k bug-bounty / vulnerability-disclosure reports compiled from **publicly
disclosed** sources — primarily disclosed **HackerOne** reports plus additional
public bug-bounty and **Web3** disclosure corpora. Every example's label is
derived from the **real adjudicated outcome** recorded in the data (HackerOne
`substate`, severity, bounty amount, vote count, and any associated CVE) and
mapped onto the 9-class disposition taxonomy — the labels are **not synthetic**.
Each report is rendered as chat (system + user report → assistant reasoning +
verdict JSON); when a CVE is present, a live threat-intel corroboration block is
rendered exactly as the inference pipeline emits it. ~300 reports are held out as
a test split for evaluation.

## Academic grounding

The triage flow and its defenses are grounded in recent literature:

- **VibeThinker** (arXiv:2606.16140) — small-model verifiable reasoning; the base model + the claim-level-reliability idea behind confidence gating.
- **From Reviewers' Lens: Bug Bounty Invalid Reasons with LLMs** (arXiv:2511.18608) — predicting *why* a report is invalid; informs the disposition taxonomy + rationale output.
- **Triage in SE: A Systematic Review** (arXiv:2511.08607) — metadata + retrieval beats text-only → we blend report metadata and threat-intel corroboration.
- **CaSey: Streamlining Vulnerability Triage with LLMs** (arXiv:2501.18908) — realistic LLM CWE/severity accuracy; keeps expectations honest.
- **JudgeDeceiver** (arXiv:2403.17710), **Adversarial Attacks on LLM-as-a-Judge** (arXiv:2504.18333), **CUA/JMA** (arXiv:2505.13348), **RobustJudge** (arXiv:2506.09443) — LLM judges (incl. 3B) are injectable → the prompt-injection guard + **model-independent ground-truth overrides**.
- **Stumbling Blocks** (arXiv:2402.11638) + paraphrase-attack results (Krishna et al. 2023; Sadasivan et al.) — AI-text detectors collapse under paraphrase → we **ground via retrieval / claim verification** (fabricated code symbols → `slop`), not detection.

## Intended use & limitations

Decision-support "sidecar" for analysts, not an autonomous adjudicator. It reflects
the biases of the disclosure outcomes it was trained on; always keep a human in the
loop for accept/reject and severity. License inherits from the base model — verify
before redistribution.