Reflections: Qwen3-1.7B Actionability Classifier (LoRA Adapter)
LoRA adapter for Qwen/Qwen3-1.7B that classifies whether the latest sentence in a smart-glasses conversation transcript warrants proactive assistant intervention (label 1) or should be ignored (label 0).
This adapter is the on-device gate inside the Reflections open-source smart-glasses assistant. It runs in ~200 ms on Apple Silicon (MPS) and lets the pipeline reject low-value turns without paying the latency or cost of a frontier LLM call.
Use case
Reflections streams audio + video from MentraOS smart glasses, transcribes with Soniox, attributes speakers with an active-speaker-detection model, then asks this classifier whether the latest finalized sentence is actionable. When P(actionable) >= GLASSES_GATE_THRESHOLD (default 0.25), the pipeline escalates to Claude Haiku with tools (web search, Google Maps, Google Calendar). Otherwise the turn is dropped silently.
The classifier is not a general chat model. It is trained to output a single label (0 or 1) given five structured context inputs:
- Transcript: recent speaker-attributed turns, with the target sentence marked.
- Memory: short summary of prior sessions (read from memory.md).
- Available tools: names of tools the agent could call this turn (e.g. send_message, create_calendar_event).
- Location: a coarse description + lat/lon (used by maps-style tools).
- Entity list: known people in the wearer's life, with facts (allergies, preferences, etc.).
Prompts are rendered into Qwen's ChatML format and the score is softmax(logits)[1] over the two-token vocabulary {0, 1} at the <label> position.
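The restricted softmax described above can be sketched as follows. This is a minimal illustration rather than the Reflections implementation: the function name p_actionable and the choice to pass the two label logits as plain floats are assumptions made here. In practice the two values would be read from the model's logits at the <label> position, at the token ids for "0" and "1".

```python
import math


def p_actionable(logit_0: float, logit_1: float) -> float:
    """Softmax over only the two label logits; returns P(label == 1).

    Hypothetical helper: logit_0 and logit_1 stand for the logits of the
    tokens "0" and "1" at the <label> position.
    """
    # Subtract the max before exponentiating so large logits cannot overflow.
    m = max(logit_0, logit_1)
    e0 = math.exp(logit_0 - m)
    e1 = math.exp(logit_1 - m)
    return e1 / (e0 + e1)


# Equal logits give an undecided score of 0.5.
score = p_actionable(2.0, 2.0)
```

Because every other vocabulary entry is ignored, the result depends only on the difference between the two logits, so it is equivalent to a sigmoid of logit_1 - logit_0.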
How to use
The adapter is intended to be loaded onto the Qwen3-1.7B base model with PEFT:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
BASE = "Qwen/Qwen3-1.7B"
ADAPTER = "rushilsaraf/qwen3-actionable-v2-adapter"
tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()
# In the live Reflections pipeline, prompts are rendered by
# packages/proactivity/render.py and scored as:
# logits = model(input_ids).logits[0, -1]
# p_actionable = torch.softmax(logits[[tok0, tok1]], dim=0)[1].item()
For end-to-end use, install Reflections and run python -m apps.viewer; the LoRA loads automatically from this Hub repo (override with REFLECTIONS_LORA_MODEL_ID).
Training
| Setting | Value |
|---|---|
| Training base | unsloth/qwen3-1.7b-unsloth-bnb-4bit (Unsloth 4-bit) |
| Inference base | Qwen/Qwen3-1.7B (float16) |
| Framework | Unsloth + TRL SFT |
| Hardware | Single T4 (free Colab tier) |
| Wall-clock | ~50 minutes |
| Examples | ~400 (synthetic, labeled) |
Adapter config
| Parameter | Value |
|---|---|
| PEFT type | LoRA |
| Rank (r) | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Task type | CAUSAL_LM |
| Trainable parameters | ~8.7M (0.51% of base) |
| Adapter size on disk | ~33 MB |
| PEFT version | 0.18.x (compatible with 0.19.x at inference) |
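The trainable-parameter and on-disk figures in the table can be sanity-checked with a little arithmetic. The sketch below assumes the published Qwen3-1.7B shapes (hidden size 2048, MLP intermediate size 6144, 28 layers, 16 query heads and 8 key/value heads of dimension 128), none of which are stated in this card, and it assumes the adapter weights are stored as float32. The helper and constant names are illustrative.

```python
# Assumed Qwen3-1.7B shapes (not stated in this card).
HIDDEN = 2048
INTERMEDIATE = 6144
LAYERS = 28
Q_OUT = 16 * 128   # query heads * head dimension
KV_OUT = 8 * 128   # key/value heads * head dimension
RANK = 8           # LoRA rank r from the table above

# (in_features, out_features) of each targeted projection in one layer.
TARGETS = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    # LoRA adds A (rank x in) and B (out x rank) to each wrapped linear layer.
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total = per_layer * LAYERS
size_mib = total * 4 / 2**20  # 4 bytes per float32 weight (assumed dtype)

print(f"{total:,} trainable parameters, about {size_mib:.0f} MiB")
```

Under these assumptions the total comes to 8,716,288 parameters, which at 4 bytes each is roughly 33 MiB, in line with the ~8.7M and ~33 MB figures in the table.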
Gate thresholds (Reflections-side)
There are two distinct knobs; do not conflate them:
- GLASSES_GATE_THRESHOLD (default 0.25): the live gate used by the agent worker. Sentences scoring below this are silently dropped.
- REASONING_TRIGGER (0.45): only used by the offline smoke-test path (scripts/smoke_full_transcript.py, scripts/smoke_server.py) to decide whether to also generate a reasoning trace.
The live path never reads REASONING_TRIGGER.
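To make the separation concrete, here is a minimal sketch of how the two knobs might be applied. It is not taken from the Reflections code: the function names are hypothetical, reading GLASSES_GATE_THRESHOLD from an environment variable is an assumption, and so is the direction of the REASONING_TRIGGER comparison (a trace is generated when the score is at or above the trigger).

```python
import os
from typing import Mapping

DEFAULT_GATE_THRESHOLD = 0.25  # documented default for the live gate
REASONING_TRIGGER = 0.45       # documented value for the smoke-test path


def read_gate_threshold(env: Mapping[str, str] = os.environ) -> float:
    # Assumption: the live threshold is overridable through the environment.
    return float(env.get("GLASSES_GATE_THRESHOLD", DEFAULT_GATE_THRESHOLD))


def live_gate(p_actionable: float, threshold: float) -> str:
    # Live path: escalate to the downstream LLM or drop the turn silently.
    return "escalate" if p_actionable >= threshold else "drop"


def smoke_wants_reasoning(p_actionable: float) -> bool:
    # Offline smoke tests only; the live path never consults this value.
    # Assumption: scores at or above the trigger request a reasoning trace.
    return p_actionable >= REASONING_TRIGGER


decision = live_gate(0.30, read_gate_threshold({}))
```

With the defaults, a score of 0.30 passes the live gate but would not request a reasoning trace in the smoke-test path, which is why the two values should be tuned independently.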
Performance (held-out synthetic test set)
| Metric | Value |
|---|---|
| Test accuracy | 88.6% |
| Train accuracy | 94.3% |
| Train-test gap | +5.7 points |
| Raw Qwen3-1.7B (no LoRA) | 51.1% |
| Lift from LoRA | +37.5 points |
| Mean inference latency (Apple Silicon MPS) | ~196 ms |
| p95 latency | ~257 ms |
| Throughput | ~5 classifications / sec |
The benchmark is a synthetic dataset matched to the training distribution. Real-world ASR transcripts are not yet part of the evaluation set; see Limitations below.
Limitations
- English only.
- Synthetic training data ceiling. The 400-example training set was generated to cover entity / memory / tool / location signals. Real-world ASR disfluencies are not represented.
- Weak categories. Per-category breakdowns show tool_dependent at ~40% and location_dependent at ~60% accuracy. Adding ~25 paired-negative examples per category should fix the imbalance in the next training cycle.
- Not a general classifier. The model expects the exact 5-input prompt structure produced by packages/proactivity/render.py in Reflections. Out-of-distribution prompts will produce unreliable scores.
- Not safety-critical. Do not use for medical, legal, or moderation decisions. This is a latency-saving gate in front of a stronger downstream LLM, not a standalone judgment.
License
- Base model (Qwen/Qwen3-1.7B): Apache 2.0 (see the Qwen3 license).
- This LoRA adapter: MIT, matching the Reflections repository.
Combined use of base + adapter remains subject to the Apache 2.0 license of the Qwen3 weights.
Citation
If you use this adapter, please reference the Reflections project and the Qwen3 base model:
@misc{qwen3-2025,
title = {Qwen3 Technical Report},
author = {Qwen Team},
year = {2025},
url = {https://huggingface.co/Qwen/Qwen3-1.7B}
}